Regular expression to fetch beginning of string or a symbol - javascript

I am writing a function to find an attribute's value in a given string, given the attribute name.
The input strings look like those below:
sip:+19999999999#trunkgroup2:5060;user=phone
<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;user=phone;transport=udp>
<sip:19999999999;tgrp=0306001;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>
<sip:+19999999999;tgrp=SMPPDIN;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>
After a few hours I came up with this regular expression: /(\Wsip[:,+,=]+)(\w+)/g, but it does not work for the first example, as there is no non-word character before the attribute name.
How can I fix this expression to handle both cases, <sip... and sip..., only when it is at the beginning of the string?
I use this function to extract both sip and tgrp values.

Replace \W with \b, and use
\b(sip[:+=]+)(\w+)
Or, to match at the beginning of a string:
^\W?(sip[:+=]+)(\w+)
See the first regex demo and the second regex demo.
As \W is a consuming pattern matching any non-word char (a char other than a letter/digit/_), it requires a char before sip, so you won't have a match at the start of the string. A \b word boundary will match both at the start of the string and when there is a non-word char before s.
If you literally need to find a match at the beginning of a string after an optional non-word char, the \W must be replaced with ^\W? where ^ matches the start of a string, and \W? matches 1 or 0 non-word chars.
Also, note that , inside a character class is matched as a literal ,. If you mean to use it to enumerate chars, you should remove it.
Pattern details:
\b - a word boundary
OR
^ - start of string
\W? - 1 or 0 (due to the ? quantifier) non-word chars (i.e. chars other than letters/digits and _)
(sip[:+=]+) - Group 1: sip substring followed with one or more :, + or = chars
(\w+) - Group 2: one or more word chars.
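As a rough illustration of how the two patterns behave on the sample inputs from the question, here is a minimal JavaScript sketch. The helper name getAttr is hypothetical (it is not from the original post), and it assumes the attribute name is a plain word such as sip or tgrp, so no regex escaping is done.

```javascript
// Hypothetical helper: returns the value that follows `name` plus any of : + =,
// or null when the attribute is absent. Assumes `name` is a plain word.
function getAttr(input, name) {
  const re = new RegExp("\\b" + name + "[:+=]+(\\w+)");
  const m = re.exec(input);
  return m ? m[1] : null;
}

const samples = [
  "sip:+19999999999#trunkgroup2:5060;user=phone",
  "<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;user=phone;transport=udp>",
];

for (const s of samples) {
  console.log(getAttr(s, "sip"), getAttr(s, "tgrp"));
}

// Anchored variant: only matches sip at the very start of the string,
// optionally preceded by one non-word char such as <.
const atStart = /^\W?(sip[:+=]+)(\w+)/;
console.log(atStart.test(samples[0]), atStart.test("x;" + samples[0]));
```

The \b version finds the attribute anywhere in the string, which is what makes it usable for tgrp as well; the ^\W? version is the one to pick if sip must only be accepted at the start.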

For the beginning of the line use ^, and to make < optional use ?
^<?(sip[:,+,=]+)(\w+)

Related

Trouble with regular expression catching strings that begins and ends with alpha character and contains dash

I'm trying to create a regular expression that matches strings such as:
N1-112S
So far I have succeeded with the following (although I'm not really sure why it works):
item.match(/^\D.-/)
I'd like to further bolster the results by ensuring that the last character is A-Z as well.
I'd appreciate some help on a good regular expression for matching this pattern. Thanks!
If you plan to match a string that starts with an uppercase ASCII letter, then has a digit, then a hyphen, then 1 or more digits and then an ASCII letter at the end of the string use
/^[A-Z]\d-\d+[A-Z]$/.test(item)
See the regex demo. Also, to test if a regex matches some string or not, I'd recommend RegExp#test.
Pattern details
^ - start of string
[A-Z] - an uppercase ASCII letter
\d - an ASCII digit
- - a hyphen
\d+ - 1+ digits
[A-Z] - an uppercase ASCII letter
$ - end of string.
Variations
To match any alphanumeric chars after hyphen till the end of string, you need to change the above pattern a bit:
/^[A-Z]\d-[\dA-Z]*[A-Z]$/
The second \d+ is changed to [\dA-Z]*, any 0 or more ASCII digits or letters.
If there can be any chars after -, use .* (or [^]* to also match line breaks) instead of \d+:
/^[A-Z]\d-.*[A-Z]$/
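To see how the three variants differ, here is a small sketch that runs each of them with RegExp#test over a few strings. The variable names and the extra test strings (other than N1-112S) are made up for illustration.

```javascript
// The three patterns from the answer, from strictest to loosest.
const strict = /^[A-Z]\d-\d+[A-Z]$/;      // digits only after the hyphen
const alnum = /^[A-Z]\d-[\dA-Z]*[A-Z]$/;  // digits or uppercase letters after the hyphen
const anyChars = /^[A-Z]\d-.*[A-Z]$/;     // anything except line breaks after the hyphen

for (const item of ["N1-112S", "N1-1A2S", "N1-x_yZ", "N1-112"]) {
  console.log(item, strict.test(item), alnum.test(item), anyChars.test(item));
}
```

All three reject N1-112, because each pattern still requires an uppercase letter right before the end of the string.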

Do not allow '.'(dot) anywhere in a string (regular expression)

I have a regular expression for allowing unicode chars in names(Spanish, Japanese etc), but I don't want to allow '.'(dot) anywhere in the string.
I have tried this regex but it fails when string length is less than 3. I am using xRegExp.
^[^.][\\pL ,.'-‘’][^.]+$
For Example:
NOËL // true
Sanket ketkar // true
.sank // false
san. ket // false
NOËL.some // false
Basically it should return false when name has '.' in it.
Your pattern ^[^.][\\pL ,.'-‘’][^.]+$ matches at least 3 characters because you use 3 character classes, where the first 2 each match exactly 1 character and the last one matches 1 or more times.
You could remove the dot from your character class and repeat only that character class 1 or more times, which also matches names shorter than 3 characters.
^[\p{L} ,'‘’-]+$
Regex demo
Or you could use a negated character class:
^[^.\r\n]+$
^ Start of string
[^.\r\n]+ Negated character class, match any char except a dot or newline
$ End of string
Regex demo
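Both patterns can be checked against the examples from the question. The sketch below uses JavaScript's built-in RegExp rather than XRegExp (an assumption made here to keep it self-contained); natively, \p{L} is only recognized when the u flag is set.

```javascript
// Allow-list: Unicode letters, space, comma, apostrophes and hyphen (no dot).
const allowList = /^[\p{L} ,'‘’-]+$/u;
// Deny-list: anything except a dot or a line break.
const noDot = /^[^.\r\n]+$/;

// "NO\u00CBL" is NOËL; "Li" is an extra two-char name added for illustration.
const names = ["NO\u00CBL", "Sanket ketkar", ".sank", "san. ket", "NO\u00CBL.some", "Li"];
for (const name of names) {
  console.log(JSON.stringify(name), allowList.test(name), noDot.test(name));
}
```

The two differ on input such as digits: the allow-list rejects them, while the negated class accepts anything that is not a dot or a line break.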
You could try:
^[\p{L},\-\s‘’]+(?!\.)$
As seen here: https://regex101.com/r/ireqbW/5
Explanation -
The first part of the regex [\p{L},\-\s‘’]+ matches one or more Unicode letters, commas, hyphens, whitespace chars (given by \s) or curly quotes (‘ ’)
(?!\.) is a negative lookahead, which tells the regex that the match must not be followed by a . (it is redundant here, since $ already requires the end of the string right after the match)
^[^.]+$
It will match any non-empty string that does not contain a dot between the start and the end of the string.
If there is a dot somewhere between start to end (i.e. anywhere) it will fail.

Regex to validate hyphen (-) at start and end of the string

I'm validating whether a string ("-test-") contains hyphens (-) at the start and end, using a regex. I found a regex to restrict hyphens at the start and end of the string.
/^(?!-)[a-zA-Z0-9-' ]+[^-]$/i
This regex validates as expected when the string contains more than one char ("aa"), with or without a hyphen. But it's not working as expected when I simply pass a one-character string ("a") without a hyphen.
It also needs to allow special characters and alphanumeric characters like "$abcd&". I need to restrict only the hyphen at the start and end of the string.
Could you help me out with this?
The pattern you have matches a string that consists of at least 2 chars because [a-zA-Z0-9-' ]+ needs 1 char to match and [^-] requires another char to be present.
You may revamp the lookahead to also fail a string that ends with -:
/^(?!-)(?!.*-$).+$/
       ^^^^^^^^
See the regex demo
Details
^ - start of a string
(?!-)(?!.*-$) - negative lookaheads that fail the match if the string starts with - or ends with -
.+ - any 1 or more chars other than line break chars (use [\s\S] to match any char)
$ - end of string.
An unrolled version for this pattern would be
^[^-]+(?:-+[^-]+)*$
See this regex demo
Details
^ - start of string
[^-]+ - 1 or more chars other than -
(?:-+[^-]+)* - 0+ sequences of:
  -+ - 1+ hyphens
  [^-]+ - 1 or more chars other than -
$ - end of string.
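A quick way to compare the lookahead version and the unrolled version is to run both over the strings mentioned in the question plus a few edge cases. This is only a sketch; the variable names and the extra inputs are made up for illustration.

```javascript
const withLookaheads = /^(?!-)(?!.*-$).+$/;
const unrolled = /^[^-]+(?:-+[^-]+)*$/;

const inputs = ["-test-", "a", "aa", "$abcd&", "a-b", "-a", "a-", "-"];
for (const s of inputs) {
  console.log(JSON.stringify(s), withLookaheads.test(s), unrolled.test(s));
}
```

On single-line input the two agree. They can differ when the string contains line breaks, since . does not match them while [^-] does.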
To allow any character but only disallow hyphen at start and end:
^(?!-).*[^-]$
^ start of string
(?!-) negative lookahead: fails if the next char is a hyphen
.* match any amount of any character
[^-] match one character, that is not a hyphen
$ at the end
See demo at regex101

Regex match character before and after underscore

I have to write a regex which matches the following:
String should start with a letter - [a-zA-Z]
String can contain letters, spaces, digits, _ and - (underscore and hyphen)
String should not end with _ or - (underscore and hyphen)
Underscore character should not have a space before or after it.
I came up with the following regex, but it doesn't seem to work
/^[a-zA-Z0-9]+(\b_|_\b)[a-zA-Z0-9]+$/
Test case:
HelloWorld // Match
Hello_World //Match
Hello _World // doesn't match
Hello_ World // doesn't match
Hello _ World // doesn't match
Hello_World_1 // Match
He110_W0rld // Match
Hello - World // Match
Hello-World // Match
_HelloWorld // doesn't match
Hello_-_World // match
You may use
^(?!.*(?:[_-]$|_ | _))[a-zA-Z][\w -]*$
See the regex demo
Explanation:
^ - start of string
(?!.*(?:[_-]$|_ | _)) - after some chars (.*) there must not appear ((?!...)) a _ or - at the end of string ([_-]$), nor space+_ or _+space
[a-zA-Z] - the first char matched and consumed must be an ASCII letter
[\w -]* - 0+ word (\w = [a-zA-Z0-9_]) chars or space or -
$ - end of string
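The pattern can be checked against the test cases listed in the question with a short table-driven sketch. The expected values below are copied from the question; the variable names are made up for illustration.

```javascript
const nameRe = /^(?!.*(?:[_-]$|_ | _))[a-zA-Z][\w -]*$/;

// [input, expected result as listed in the question]
const cases = [
  ["HelloWorld", true],
  ["Hello_World", true],
  ["Hello _World", false],
  ["Hello_ World", false],
  ["Hello _ World", false],
  ["Hello_World_1", true],
  ["He110_W0rld", true],
  ["Hello - World", true],
  ["Hello-World", true],
  ["_HelloWorld", false],
  ["Hello_-_World", true],
];

// Collect every case where the regex disagrees with the expected result.
const failures = cases.filter(([s, expected]) => nameRe.test(s) !== expected);
console.log(failures.length === 0 ? "all cases behave as listed" : failures);
```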
You could use this one:
^(?!^[ _-]|.*[ _-]$|.* _|.*_ )[\w -]*$
regex tester
For the test cases I used modifier gm to match each line individually.
If an empty string should not be considered acceptable, then change the final * to a +:
^(?!^[ _-]|.*[ _-]$|.* _|.*_ )[\w -]+$
Meaning of each part
^ and $ match the beginning/ending of the input
(?! ): list of things that should not match:
|: logical OR
^[ _-]: starts with any of these three characters
.*[ _-]$: ends with any of these three characters
.* _: has space followed by underscore anywhere
.*_ (with a trailing space): has underscore followed by space anywhere
[\w -]: any alphanumeric character or underscore (also matched by \w) or space or hyphen
*: zero or more times
+: one or more times
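The effect of the gm modifiers mentioned above can be sketched as follows: with m, ^ and $ match at line boundaries, and with g, String.prototype.match returns every matching line. The input lines are taken from the question's test cases; the variable names are made up for illustration.

```javascript
// The + version of the pattern, so empty lines are not matched.
const lineRe = /^(?!^[ _-]|.*[ _-]$|.* _|.*_ )[\w -]+$/gm;

const text = ["HelloWorld", "Hello _World", "_HelloWorld", "Hello-World"].join("\n");

// match() with the g flag returns an array of matched lines, or null if none.
const matched = text.match(lineRe) || [];
console.log(matched);
```

When validating a single value instead of a block of lines, drop both flags so that ^ and $ anchor to the whole string.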
What about this?
^[a-zA-Z](\B_\B|[a-zA-Z0-9 -])*[a-zA-Z0-9 ]$
Broken down:
^
[a-zA-Z] allowed characters at beginning
(
\B_\B underscore with no word boundary on either side (i.e. with a word char both before and after it)
| or
[a-zA-Z0-9 -] other allowed characters
)*
[a-zA-Z0-9 ] allowed characters at end
$
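Because _ is itself a word char, \B_\B only matches an underscore that has a word char on both sides. The sketch below shows what that means for some of the question's test cases; the variable name is made up for illustration.

```javascript
const bRe = /^[a-zA-Z](\B_\B|[a-zA-Z0-9 -])*[a-zA-Z0-9 ]$/;

for (const s of ["Hello_World", "Hello _World", "Hello_ World", "Hello_-_World", "A"]) {
  console.log(JSON.stringify(s), bRe.test(s));
}
```

Note that this is stricter than the question's list in two places: Hello_-_World is rejected because each underscore touches a hyphen, and a single-letter string is rejected because the pattern needs at least two chars.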
Oh! I love me some regex!
Would this work? /^[a-z]$|^[a-z](?:_(?=[^ ]))?(?:[a-z\d -][^ ]_[^ ])*[a-z\d -]*[^_-]$/i
I was a tad unsure of rule 4--do you mean underscores can have a space before or after or neither, but not before and after?

Regex: string up to 20char long, without specific characters

I am trying to make regexp for validating string not containing
^ ; , & . < > | and having 1-20 characters. Any other Unicode characters are valid (asian letters for example).
How to do it?
You can use the following:
^[^^;,&.<>|]{1,20}$
Explanation:
^ assert starting of the string
[^ start of negated character class ([^ ])
^;,&.<>| all the characters you don't want to match
] close the negated character class
{1,20} range of matches
$ assert ending of the string
It will match any character other than specified characters within range of 1-20.
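As a sketch of how this behaves, assuming JavaScript (the question does not name a language): without the m flag, ^ and $ anchor to the whole string. The second regex adds the u flag so that the {1,20} limit counts code points rather than UTF-16 code units, which matters for characters outside the Basic Multilingual Plane. The variable names and probe strings are made up for illustration.

```javascript
const valid = /^[^^;,&.<>|]{1,20}$/;
const validU = /^[^^;,&.<>|]{1,20}$/u; // counts code points, not UTF-16 units

// "\u65E5\u672C\u8A9E" is three CJK characters.
const probes = ["hello", "\u65E5\u672C\u8A9E", "", "a;b", "a^b", "x".repeat(20), "x".repeat(21)];
for (const p of probes) {
  console.log(JSON.stringify(p), valid.test(p));
}

// 20 emoji are 40 UTF-16 code units, so only the u version accepts them.
const emoji20 = "\u{1F600}".repeat(20);
console.log(valid.test(emoji20), validU.test(emoji20));
```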
Your regex \w[^;,&.<>|]{1,20} contains \w that might not match all Unicode letters (I guess your regex flavor does not match Unicode letters with \w). Anyway, the \w only matches 1 character in your pattern.
Also, you say you need to exclude ^ but it is missing in your pattern.
When you want to validate length, you also must use ^/$ anchors to mark the beginning and end of a string.
To create a pattern for some range that does not match specific characters, you need a negated character class with anchors around it, and the length is set with limiting quantifiers:
^[^^;,&.<>|]{1,20}$
Or (this version makes sure we only match at the beginning and end of the string, never a line):
\A[^^;,&.<>|]{1,20}\z
Note that inside a character class, almost all special characters do not require escaping (only some of them, none in your case). Even the ^ caret symbol.
See demo
