Trying to Invalidate #Mentions using regex [duplicate] - javascript

I have a text like this;
[Some Text][1][Some Text][2][Some Text][3][Some Text][4]
I want to match [Some Text][2] with this regex;
/\[.*?\]\[2\]/
But it returns [Some Text][1][Some Text][2]
How can I match only [Some Text][2]?
Note: Some Text can contain any character, including [ and ], and the numbers in square brackets can be any number, not only 1 and 2. The Some Text that I want to match can be at the beginning of the line, and there can be multiple Some Texts.

The \[.*?\]\[2\] pattern works like this:
\[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
.*? - matches any 0+ chars other than line break chars, as few as possible, but as many as needed for a successful match, as there are subsequent patterns, see below
\]\[2\] - ][2] substring.
So, the .*? gets expanded upon each failure until it finds the leftmost ][2]. Note that lazy quantifiers do not guarantee the "shortest" match: the match still starts at the leftmost [ from which the rest of the pattern can succeed.
Solution
Instead of a .*? (or .*) use negated character classes that match any char but the boundary char.
\[[^\]\[]*\]\[2\]
See this regex demo.
Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
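As a quick check of the difference, the sketch below runs both patterns against the sample string from the question. The variable names are only illustrative, and the negated-class variant assumes the text between the brackets contains no [ or ] of its own.

```javascript
// Sample input from the question.
const input = "[Some Text][1][Some Text][2][Some Text][3][Some Text][4]";

// Lazy dot: the match still starts at the leftmost "[" and is
// expanded until "][2]" is found.
const lazyMatch = input.match(/\[.*?\]\[2\]/)[0];

// Negated character class: the text part cannot cross a "[" or "]".
const negatedMatch = input.match(/\[[^\]\[]*\]\[2\]/)[0];

console.log(lazyMatch);    // [Some Text][1][Some Text][2]
console.log(negatedMatch); // [Some Text][2]
```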
Other examples:
Strings between angle brackets: <[^<>]*> matches <...> with no < and > inside
Strings between parentheses: \([^()]*\) matches (...) with no ( and ) inside
Strings between double quotation marks: "[^"]*" matches "..." with no " inside
Strings between curly braces: \{[^{}]*} matches {...} with no { and } inside
In other situations, when the starting pattern is a multichar string or complex pattern, use a tempered greedy token, (?:(?!start).)*?. To match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
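A minimal sketch of the tempered greedy token on the abc ... def example above, next to a plain lazy pattern for comparison. The variable names are made up for the illustration.

```javascript
const text = "abc 0 abc 1 def";

// Plain lazy dot: starts at the first "abc" and runs to the first "def".
const plainLazy = text.match(/abc.*?def/)[0];

// Tempered token: each consumed char must not start another "abc",
// so the match cannot run across a second starting delimiter.
const tempered = text.match(/abc(?:(?!abc).)*?def/)[0];

console.log(plainLazy); // abc 0 abc 1 def
console.log(tempered);  // abc 1 def
```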

You could try the below regex,
(?!^)(\[[A-Z].*?\]\[\d+\])

Related

Do not allow '.'(dot) anywhere in a string (regular expression)

I have a regular expression for allowing unicode chars in names(Spanish, Japanese etc), but I don't want to allow '.'(dot) anywhere in the string.
I have tried this regex but it fails when string length is less than 3. I am using xRegExp.
^[^.][\\pL ,.'-‘’][^.]+$
For Example:
NOËL // true
Sanket ketkar // true
.sank // false
san. ket // false
NOËL.some // false
Basically it should return false when name has '.' in it.
Your pattern ^[^.][\\pL ,.'-‘’][^.]+$ matches at least 3 characters because it uses 3 character classes: the first 2 each match exactly 1 character, and the last one matches 1 or more times.
You could remove the dot from your character class and repeat that single class 1 or more times, so that strings shorter than 3 characters also match.
^[\p{L} ,'‘’-]+$
Or you could use a negated character class:
^[^.\r\n]+$
^ Start of string
[^.\r\n]+ Negated character class, match any char except a dot or newline
$ End of string
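The two patterns from this answer can be checked against the names in the question roughly as follows. This sketch uses the native RegExp with the u flag instead of XRegExp, which is an assumption about the environment (an engine with Unicode property escapes, ES2018 or later); the variable names are illustrative.

```javascript
// Allow-list version: Unicode letters, space, comma, quotes, hyphen.
// \p{L} needs the "u" flag in native JavaScript regexes.
const lettersOnly = /^[\p{L} ,'‘’-]+$/u;

// Deny-list version: anything except a dot or a line break.
const noDot = /^[^.\r\n]+$/;

// Names from the question ("\u00CB" is the letter Ë).
const names = ["NO\u00CBL", "Sanket ketkar", ".sank", "san. ket", "NO\u00CBL.some"];

for (const name of names) {
  console.log(JSON.stringify(name), lettersOnly.test(name), noDot.test(name));
}
// The first two names pass both patterns; the three containing a dot fail both.
```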
You could try:
^[\p{L},\-\s‘’]+(?!\.)$
As seen here: https://regex101.com/r/ireqbW/5
Explanation -
The first part of the regex, [\p{L},\-\s‘’]+, matches one or more Unicode letters, commas, hyphens, curly quotes or whitespace characters (given by \s).
(?!\.) is a negative lookahead, which tells the regex that the match must not be followed by a .
^[^.]+$
It will match any non-empty string that does not contain a dot between the start and the end of the string.
If there is a dot somewhere between start to end (i.e. anywhere) it will fail.

Regular expression to fetch beginning of string or a symbol

I am writing a function to find an attribute's value from a given string and a given attribute name.
The input strings look like those below:
sip:+19999999999#trunkgroup2:5060;user=phone
<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;user=phone;transport=udp>
<sip:19999999999;tgrp=0306001;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>
<sip:+19999999999;tgrp=SMPPDIN;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>
After a few hours I came up with this regular expression: /(\Wsip[:,+,=]+)(\w+)/g, but it does not work for the first example, as there is no non-word character before the attribute name.
How can I fix this expression to cover both cases, <sip... and sip..., only when it is at the beginning of the string?
I use this function to extract both sip and tgrp values.
Replace \W with \b, and use
\b(sip[:+=]+)(\w+)
Or, to match at the beginning of a string:
^\W?(sip[:+=]+)(\w+)
See the first regex demo and the second regex demo.
As \W is a consuming pattern matching any non-word char (a char other than a letter/digit/_) you won't have a match at the start of the string. A \b word boundary will match at the start of the string and in case there is a non-word char before s.
If you literally need to find a match at the beginning of a string after an optional non-word char, the \W must be replaced with ^\W?, where ^ matches the start of the string and \W? matches 1 or 0 non-word chars.
Also, note that , inside a character class is matched as a literal ,. If you mean to use it to enumerate chars, you should remove it.
Pattern details:
\b - a word boundary
OR
^ - start of string
\W? - 1 or 0 (due to the ? quantifier) non-word chars (i.e. chars other than letters/digits and _)
(sip[:+=]+) - Group 1: sip substring followed with one or more :, + or = chars
(\w+) - Group 2: one or more word chars.
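To see the effect of the change, here is a small sketch that runs the original pattern and the anchored one over three of the sample inputs from the question. The helper sipValue is hypothetical and exists only for this illustration.

```javascript
// Sample inputs copied from the question.
const inputs = [
  "sip:+19999999999#trunkgroup2:5060;user=phone",
  "<sip:+19999999999;tgrp=0180401;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;user=phone;transport=udp>",
  "<sip:19999999999;tgrp=0306001;trunk-context=aaaa.aaaa.ca#10.10.10.100:8000;transport=udp>",
];

const original = /(\Wsip[:,+,=]+)(\w+)/;  // requires a non-word char before "sip"
const anchored = /^\W?(sip[:+=]+)(\w+)/;  // start of string, optional non-word char

// Hypothetical helper: returns capture group 2 (the value), or null.
function sipValue(re, s) {
  const m = re.exec(s);
  return m ? m[2] : null;
}

for (const s of inputs) {
  console.log(sipValue(original, s), sipValue(anchored, s));
}
// The original pattern yields null for the first input; the anchored one
// yields "19999999999" for all three.
```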
For the beginning of a line use ^, and to make < optional use ?
^<?(sip[:,+,=]+)(\w+)

Capture between pattern of digits

I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a javascript based regex engine.
You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part will match one or more digits, a colon, one or more digits and (?:(?!\d+:\d)[\s\S])* will match any char, zero or more occurrences, that do not start a sequence of one or more digits followed with a colon and a digit. See this regex demo.
As the tempered greedy token is a resource consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now, the tempered greedy token (?:(?!\d+:\d)[\s\S])* is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed with 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.
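Both patterns can be tried on the sample text from the question as sketched below. The whitespace cleanup at the end (trim and collapsing runs of whitespace into one space) is an extra step assumed here so the output looks like the expected matches; it is not part of either regex.

```javascript
// Sample text from the question; records may span line breaks.
const sample = "1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå";

const temperedRe = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
const unrolledRe = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

// Assumed post-processing: normalize whitespace in each raw match.
const tidy = (m) => m.trim().replace(/\s+/g, " ");

const temperedMatches = (sample.match(temperedRe) || []).map(tidy);
const unrolledMatches = (sample.match(unrolledRe) || []).map(tidy);

console.log(temperedMatches);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
console.log(unrolledMatches); // same three strings
```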
You can include ñ and Ñ or leave them out, but this should work:
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include line breaks, you can add \n or \r to the set:
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try! Also tested on https://regex101.com/
For more chars:
^[a-zA-Z0-9!##\$%\^\&*)(+=._-]+$

Javascript Regular Expression .*?

I have the following regular expressions:
var regEx = /^\W*(.*?)\W*$/;
var regEx2 = /^\W*(.*)\W*$/;
What does (.*?) actually mean? What's the difference between (.*?) and (.*)?
Why does regEx.exec("abc ") return ['abc ', 'abc'] in Javascript?
Why does regEx2.exec("abc ") return ['abc ', 'abc '] in Javascript?
1. Adding ? after a quantifier (*, +, {n,m}, etc.) makes it reluctant/lazy, as opposed to the default greedy matching. It's quite intuitive from the name: greedy means it will try to match as much as possible, and lazy means it will try to match as little as possible.
2. There is no non-word character at the start of the string, so the leading \W* matches the empty string. Then (.*?) matches as few characters as possible, checking after each one whether the trailing \W*$ can match the rest. So (.*?) matches and captures "abc", and the trailing \W* (non-word) matches the space.
3. Almost the same as above, but (.*) eats up as much as possible, so it matches and captures "abc ", and the trailing \W* is left with the empty string, which it matches.
For 2 and 3, the 2nd element in the returned array is the text captured by the first capturing group in the regex. The 1st element in the array is the text that matches the entire regex.
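The behaviour described above can be reproduced with a short sketch like this one. The variable names differ from the question's regEx and regEx2 only to make the roles explicit.

```javascript
const lazyRe = /^\W*(.*?)\W*$/;   // regEx from the question
const greedyRe = /^\W*(.*)\W*$/;  // regEx2 from the question

const lazyResult = lazyRe.exec("abc ");
const greedyResult = greedyRe.exec("abc ");

// Index 0 is the whole match, index 1 is the first capturing group.
console.log(JSON.stringify(Array.from(lazyResult)));   // ["abc ","abc"]
console.log(JSON.stringify(Array.from(greedyResult))); // ["abc ","abc "]
```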
What does (.*?) actually mean?
Non-greedily match any character zero or more times, in a matching group.
Why does regEx.exec("abc ") return ['abc ', 'abc'] in Javascript?
You get one member of the array for each matching group. The element at index 0 is the entire match, the next element is from the first (and only) matching group above.
Why does regEx2.exec("abc ") return ['abc ', 'abc '] in Javascript?
For the same reason as above, except this time, the greedy match will match the space at the end as well, so your first capture group is identical to the full match in this case.
Okay, I find the easiest thing to do when looking at regular expressions is to break them down and write out what each part is doing.
So taking the first regular expression /^\W*(.*?)\W*$/
^ Start of search string
\W* Match a non-word character zero or more times
( Start of group
.*? Match any character (except a line terminator) zero or more times but as few as possible
) End of group
\W* Match a non-word character zero or more times
$ End of search string
The exec method searches the text and returns an array of strings (or null if it fails). The string at element 0 is the substring matched by the entire expression, strings after this are those which correspond to the individual capture groups.
So for your first example, the entire expression matches "abc " but the (.*?) group captures "abc", and so you get two items in your array.
