Regex match character before and after underscore - javascript

I have to write a regex with matches following:
String should start with alphabets - [a-zA-Z]
String can contain alphabets, spaces, numbers, _ and - (underscore and hyphen)
String should not end with _ or - (underscore and hyphen)
Underscore character should not have space before and after.
I came up with the following regex, but it doesn't seems to work
/^[a-zA-Z0-9]+(\b_|_\b)[a-zA-Z0-9]+$/
Test case:
HelloWorld // Match
Hello_World //Match
Hello _World // doesn't match
Hello_ World // doesn't match
Hello _ World // doesn't match
Hello_World_1 // Match
He110_W0rld // Match
Hello - World // Match
Hello-World // Match
_HelloWorld // doesn't match
Hello_-_World // match

You may use
^(?!.*(?:[_-]$|_ | _))[a-zA-Z][\w -]*$
See the regex demo
Explanation:
^ - start of string
(?!.*(?:[_-]$|_ | _)) - after some chars (.*) there must not appear ((?!...)) a _ or - at the end of string ([_-]$), nor space+_ or _+space
[a-zA-Z] - the first char matched and consumed must be an ASCII letter
[\w -]* - 0+ word (\w = [a-zA-Z0-9_]) chars or space or -
$ - end of string

You could use this one:
^(?!^[ _-]|.*[ _-]$|.* _|.*_ )[\w -]*$
regex tester
For the test cases I used modifier gm to match each line individually.
If emtpy string should not be considered as acceptable, then change the final * to a +:
^(?!^[ _-]|.*[ _-]$|.* _|.*_ )[\w -]+$
Meaning of each part
^ and $ match the beginning/ending of the input
(?! ): list of things that should not match:
|: logical OR
^[ _-]: starts with any of these three characters
.*[ _-]$: ends with any of these three characters
.* _: has space followed by underscore anywhere
.*_: has underscore followed by space anywhere
[\w -]: any alphanumeric character or underscore (also matched by \w) or space or hyphen
*: zero or more times
+: one or more times

What about this?
^[a-zA-Z](\B_\B|[a-zA-Z0-9 -])*[a-zA-Z0-9 ]$
Broken down:
^
[a-zA-Z] allowed characters at beginning
(
\B_\B underscore with no word-boundary
| or
[a-zA-Z0-9 -] other allowed characters
)*
[a-zA-Z0-9 ] allowed characters at end
$

Oh! I love me some regex!
Would this work? /^[a-z]$|^[a-z](?:_(?=[^ ]))?(?:[a-z\d -][^ ]_[^ ])*[a-z\d -]*[^_-]$/i
I was a tad unsure of rule 4--do you mean underscores can have a space before or after or neither, but not before and after?

Related

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept less than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help ?
You have: [^\s] (requires matching at least one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching at least one non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string and no space after it (so here we exclude the
initial space).
[a-z0-9 ]* - A sequence of letters, digits and spaces, possibly empty
(the content before the last letter(see below).
[a-z0-9]$ - The last letter and the end of string (so here we exclude the
terminal space).
You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
See the regex demo.
NOTE: If only one whitespace is allowed between the alphanumeric chars use
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
^^
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) and 1+ digit/letters
$ - end of string.
See the regex graph:

Regex to validate hypen(-) at start and end of the string

I'm validating a string("-test-") whether it contains hypens(-) at start and end of the string using regex. So i found an regex to restrict hypen at start and end of regex.
/^(?!-)[a-zA-Z0-9-' ]+[^-]$/i
This regex was validating as expected when the string contains more than one char("aa") with or without hypen. But its not working as expected when i'm simply passing one character string("a") without hypen.
And also these need to allow special characters and alphanumeric characters like "$abcd&". Need to restirct oly hypen at start and end of the string.
Could you guys help out of this..
The pattern you have matches a string that consists of at least 2 chars because [a-zA-Z0-9-' ]+ needs 1 char to match and [^-] requires another char to be present.
You may revamp the lookahead to also fail a string that ends with -:
/^(?!-)(?!.*-$).+$/
^^^^^^^^
See the regex demo
Details
^ - start of a string
(?!-)(?!.*-$) - negative lookaheads that fail the match if the string starts with - or ends with -
.+ - any 1 or more chars other than line break chars (use [\s\S] to match any char)
$ - end of string.
An unrolled version for this pattern would be
^[^-]+(?:-+[^-]+)*$
See this regex demo
Details
^ - start of string
[^-]+ - 1 or more chars other than -
(?:-+[^-]+)* - 0+ sequences of
-+ - 1+ hyphens
[^-]+ - 1 or more chars other than -
$ - end of string.
To allow any character but only disallow hyphen at start and end:
^(?!-).*[^-]$
^ start of string
(?!-) look ahead if there is no hyphen
.* match any amount of any character
[^-] match one character, that is not a hyphen
$ at the end
See demo at regex101

Regular expression to fetch beginning of string or a symbol

I am writing a function to find attributes value from given string and given attribute name.
The input stings look like those below:
sip:+19999999999#trunkgroup2:5060;user=phone
<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;user=phone;transport=udp>
<sip:19999999999;tgrp=0306001;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>
<sip:+19999999999;tgrp=SMPPDIN;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>
After few hours I came out with this regular expression: /(\Wsip[:,+,=]+)(\w+)/g, but this is not working for the first example - as there is no not a word character before the attributes name.
How can I fix this expression to fetch both cases - <sip... and sip.. only when it is the beginning of the string.
I use this function to extract both sip and tgrp values.
Replace \W with \b, and use
\b(sip[:+=]+)(\w+)
Or, to match at the beginning of a string:
^\W?(sip[:+=]+)(\w+)
See the first regex demo and the second regex demo.
As \W is a consuming pattern matching any non-word char (a char other than a letter/digit/_) you won't have a match at the start of the string. A \b word boundary will match at the start of the string and in case there is a non-word char before s.
If you literally need to find a match at the beginning of a string after an optional non-word char, the \W must be replaced with ^\W? where ^ match the start of a string, and \W? matches 1 or 0 non-word chars.
Also, note that , inside a character class is matched as a literal ,. If you mean to use it to enumerate chars, you should remove it.
Pattern details:
\b - a word boundary
OR
^ - start of string
\W? - 1 or 0 (due to the ? quantifier) non-word chars (i.e. chars other than letters/digits and _)
(sip[:+=]+) - Group 1: sip substring followed with one or more :, + or = chars
(\w+) - Group 2: one or more word chars.
for begining of line use ^ and to make < is optional use ?
^<?(sip[:,+,=]+)(\w+)

Capture between pattern of digits

I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a javascript based regex engine.
You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part will match one or more digits, a colon, one or more digits and (?:(?!\d+:\d)[\s\S])* will match any char, zero or more occurrences, that do not start a sequence of one or more digits followed with a colon and a digit. See this regex demo.
As the tempered greedy token is a resource consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now, the () is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed with 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.
you can use or not the ñ-Ñ, but you should be ok this way
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include the break lines, you can add the \n or \r to the set,
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try ! also tested in https://regex101.com/
for more chars:
^[a-zA-Z0-9!##\$%\^\&*)(+=._-]+$

What does the regular expression \|(?=\w=>) mean?

I am an amateur in JavaScript. I saw this other (now deleted) question, and it made me wonder. Can you tell me what does the below regular expression exactly mean?
split(/\|(?=\w=>)/)
Does it split the string with |?
The regular expression is contained in the slashes.
It means
\| # A pipe symbol. It needs to be scaped with a backslash
# because otherwise it means "OR"
(?= # a so-called lookahead group. It checks if its contents match
# at the current position without actually advancing in the string
\w=> # a word character (a-z, A-Z, 0-9, _) followed by =>
) # end of lookahead group.
It splits the string on | but only if its followed by a char in [a-zA-Z0-9_] and =>
Example:
It will split a|b=> on the |
It will not split a|b on the |
It splits the string on every '|' followed by (?) an alphanumerical character (\w, shorthand for [a-zA-Z0-9_]) + the character sequence '=>'.
Here's a link that can help you understand regular expressions in javascript
Breakdown of the regular expression:
/ regular expression literal start delimiter
\| match | in the string, | is a special character in regex, so \ is used to escape it
(?= Is a lookahead expression, it checks to see if a string follows the expression without matching it
\w=> matches any alphanumeric string (including _), followed by =>
)/ marks the end of the lookahead expression and the end of the regex
In short, the string will be split on | if it is followed by any alphanumeric character or underscore and then =>.
In this case, the pipe character is escaped so it's treated as a literal pipe. The split occurs on pipes that are followed by any alphanumeric and '=>'.
The '|' is also used in regular expressions as a sort of OR operator. For example:
split(/k|i|tt|y/)
Would split on either a 'k', an 'i', a 'tt' or a 'y' character.
Trimming the delimiting characters, we get \|(?=\w=>)
| is a special character in regex, so it should be escaped with a backslash as \|
(?=REGEX) is syntax for positive look ahead: matches only if REGEX matches, but doesn't consume the substring that matches REGEX. The match to the REGEX doesn't become part of the matched result. Had it been mere \|\w=>, the parent string would be split around |a=> instead of |.
Thus /\|(?=\w=>)/ matches only those | characters that are followed by \w=>. It matches |a=> but not |a>, || etc.
Consider the example string from the linked question: a=>aa|b=>b||b|c=>cc. If it wasn't for the lookahead, split will yield an array of [a=>aa, b||b, cc]. With lookahead, you'll get [a=>aa, b=>b||b, c=>cc], which is the desired output.

Categories