I have a string in JavaScript which looks like the following:
"This {{#is}} a $|test$| string with $|#string$| delimiters {{as}} follows"
And I have a regex which is used to extract the strings between $|.*?$| and {{.*?}} as follows:
/{{(.*?)}}|\$\|(.*?)\$\|/g
Example: https://regex101.com/r/mV3uR1/1
I would like to combine the alternation so there is only one capturing group, e.g.:
/{{|\$\|(.*?)}}|\$\|/g
But this seems to ignore the lazy ? in (.*?), and it matches everything from the first $| up to {{as}}.
Example: https://regex101.com/r/qZ7iI5/1
Why is that happening?
If I enhance that regex to include parentheses as follows, it does work:
/({{|\$\|)(.*?)(}}|\$\|)/g
Example: https://regex101.com/r/fT5qH0/1
But this then also captures the curly braces/dollar-pipe in their own groups, which is what I am trying to avoid: I only care about the string between these delimiters, so I only want one capturing group.
Can anybody shed some light on this please?
Let's compare:
working regex:
/({{|\$\|)(.*?)(}}|\$\|)/g
and not working regex:
/{{|\$\|(.*?)}}|\$\|/g
In the 2nd regex, (.*?) has to be followed by }}, and the alternation applies to the whole \$\|(.*?)}} sub-pattern, so effectively it means match:
{{ OR \$\|(.*?)}} OR \$\|
Whereas in the first regex, the grouping confines each alternation to the delimiters before and after (.*?).
You can use non-capturing groups as well:
/(?:{{|\$\|)(.*?)(?:}}|\$\|)/g
Now it means:
{{ OR \$\| followed by (.*?) followed by }} OR \$\|.
The ?: at the start of each delimiter group makes it non-capturing, so (.*?) is the only capturing group.
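A minimal sketch that runs both patterns on the example string from the question, to make the difference visible. It assumes a runtime with String.prototype.matchAll (ES2020 or later, e.g. a current Node.js):

```javascript
const input = "This {{#is}} a $|test$| string with $|#string$| delimiters {{as}} follows";

// Without grouping, the top-level | splits the pattern into three alternatives:
// "{{", "$|(.*?)}}" and "$|".
const ungrouped = /{{|\$\|(.*?)}}|\$\|/g;

// With non-capturing groups, the alternations only cover the delimiters,
// so (.*?) is the single capturing group.
const grouped = /(?:{{|\$\|)(.*?)(?:}}|\$\|)/g;

// Full matches of the ungrouped pattern.
// ungroupedMatches is ["{{", "$|test$| string with $|#string$| delimiters {{as}}"]
const ungroupedMatches = [...input.matchAll(ungrouped)].map((m) => m[0]);

// Group 1 of the grouped pattern: only the text between the delimiters.
// groupedCaptures is ["#is", "test", "#string", "as"]
const groupedCaptures = [...input.matchAll(grouped)].map((m) => m[1]);

console.log(ungroupedMatches);
console.log(groupedCaptures);
```

The second ungrouped match shows why the lazy ? seemed to be ignored: the alternative \$\|(.*?)}} simply has no }} to stop at until {{as}}.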
You can try this. See demo. By making the other groups non-capturing, you will have only one capturing group.
https://regex101.com/r/cT0hV4/13
Note, however, that this will also match mismatched delimiters such as {{asd$|.
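If each opening delimiter must pair with its own closing delimiter, one option (a sketch, not taken from the answers here) is to keep the two-branch pattern from the question, /{{(.*?)}}|\$\|(.*?)\$\|/g, and take whichever of its two groups took part in each match. The sample string is invented for illustration, and ES2020 (matchAll, ??) is assumed:

```javascript
const sample = "a {{x}} b $|y$| c {{asd$| d";

// Non-capturing-group version: opening and closing delimiters are independent.
const loose = /(?:{{|\$\|)(.*?)(?:}}|\$\|)/g;

// Two-branch version from the question: {{ must close with }}, $| with $|.
const paired = /{{(.*?)}}|\$\|(.*?)\$\|/g;

// looseValues is ["x", "y", "asd"]: the mismatched "{{asd$|" is accepted.
const looseValues = [...sample.matchAll(loose)].map((m) => m[1]);

// Exactly one of group 1 / group 2 is defined per match; the other is undefined.
// pairedValues is ["x", "y"]
const pairedValues = [...sample.matchAll(paired)].map((m) => m[1] ?? m[2]);

console.log(looseValues, pairedValues);
```

Using ?? rather than || keeps an empty capture such as the one from {{}}, since "" is falsy but not null or undefined.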
Your regex {{|\$\|(.*?)}}|\$\| will match any of the following 3 alternatives:
{{
\$\|(.*?)}} (look at the start and end of this alternative and you will understand)
\$\|
That is the reason you are getting that match.
Goal
The goal is to match, in JavaScript, a string between two delimiter characters (the delimiters themselves can be included in the match).
For example, this string should be fully matched: $ test string $. It can appear anywhere in a larger string. That alone would be trivial; however, we also want to allow escaping the delimiter, e.g. The price is 5\$ to 10\$.
Summarized:
Match any string that is enclosed by two $ signs.
Do not match it if the dollar signs are escaped using \$.
Solution using negative lookbehind
A solution that achieves this goal perfectly is: (?<!\\)\$(.*?)(?<!\\)\$.
Problem
This solution uses negative lookbehind, which older versions of Safari (before 16.4) do not support. How can the same matches be achieved without using negative lookbehind (i.e. on those Safari versions)?
A solution that partially works is [^\\]\$(.*?)[^\\]\$. However, this will also match the character in front of the $ sign if it is not a \.
You might rule out what you don't want by matching it, and capture what you want to keep in group 1:
\\\$.*?\$|\$.*?\\\$|(\$.*?\$)
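A minimal sketch of how that pattern is typically consumed from JavaScript: iterate over all matches and keep only those in which group 1 took part. The sample string is invented for illustration, and String.prototype.matchAll (ES2020) is assumed to be available:

```javascript
const text = "The price is 5\\$ to 10\\$, but $x$ is math.";

// The first two alternatives consume dollar pairs involving an escaped \$;
// only the third alternative captures, so group 1 is undefined for the others.
const ruleOut = /\\\$.*?\$|\$.*?\\\$|(\$.*?\$)/g;

// wanted is ["$x$"]: the "\$ to 10\$" match is discarded by the filter.
const wanted = [...text.matchAll(ruleOut)]
  .filter((m) => m[1] !== undefined)
  .map((m) => m[1]);

console.log(wanted);
```

Note that group 1 here still includes the surrounding $ signs.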
You may use this regex and grab your inner text using capture group #1 as you are already doing in your current regex using lookbehind:
(?:^|[^\\])\$((?:\\.|[^$])*)\$
RegEx Details:
(?:^|[^\\]): Match start position or a non-backslash character in a non-capturing group
\$: Match starting $
(: Start capturing group
(?:\\.|[^$])*: Match any escaped character or a non-$ character. Repeat this group 0 or more times
): End capturing group
\$: Match closing $
PS: This regex will give the same matches as your current regex: (?<!\\)\$(.*?)(?<!\\)\$
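A small sketch comparing the lookbehind-free regex with the original lookbehind one on an invented sample. It assumes an engine that supports lookbehind for the comparison half (e.g. a current Node.js/V8); in an engine without it, the second regex literal is a syntax error and should be dropped:

```javascript
const price = "The price is 5\\$ to 10\\$, but $x + \\$y$ is math.";

// Lookbehind-free version: (?:^|[^\\]) consumes the character before the
// opening $ (or matches at the very start), so that character ends up in
// m[0]; the wanted text is in group 1.
// (?:\\.|[^$])* lets escaped characters such as \$ appear inside the pair.
const noLookbehind = /(?:^|[^\\])\$((?:\\.|[^$])*)\$/g;
const inner = [...price.matchAll(noLookbehind)].map((m) => m[1]);

// Original lookbehind version, for comparison.
const withLookbehind = /(?<!\\)\$(.*?)(?<!\\)\$/g;
const viaLookbehind = [...price.matchAll(withLookbehind)].map((m) => m[1]);

// Both arrays hold the single string  x + \$y  (written "x + \\$y" in JS).
console.log(inner, viaLookbehind);
```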
I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it does match the sections needed, but it would also match the abc part of 4_abc, which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end are not really working like I hoped they would. If I use them on the example
a_d Dd_da 4_d d_4
they would block matching the second Dd_da because the space was already consumed by the first match. Sadly I can't use lookbehind because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but does not match Dd_da (and the a_d match includes the trailing space)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.
You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
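A quick check of the \b version on the input from the question, with 4_abc appended to show the unwanted case is rejected:

```javascript
const line = "a_d Dd_da 4_d d_4 4_abc";

// \b is zero-width, so no separator character is consumed and adjacent
// words can still match. It also rejects the "abc" in "4_abc", because
// "_" and "a" are both word characters (no boundary between them).
const word = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;

// With the g flag, String.prototype.match returns all full matches.
// found is ["a_d", "Dd_da", "d_4"]
const found = line.match(word);
console.log(found);
```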
I am trying to match up to the first occurrence of &. Right now it matches up to the last occurrence of &.
My regular expression is
(?!^)(http[^\\]+)\&
And I'm trying to match against this text:
https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg
What I need is:
http://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052
Use the non-greedy mode like this:
/(?!^)(http[^\\]+?)&/
In non-greedy mode (or lazy mode) the match will be as short as possible.
If you want to get rid of the &, just wrap it in a lookahead group so it won't be part of the match, like this:
/(?!^)(http[^\\]+?)(?=&)/
Or you could optimize the regular expression, as @apsillers suggested in the comment below, like this:
/(?!^)(http[^\\&]+)/
Note: & is not a special character, so you don't need to escape it.
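A sketch that runs the greedy pattern from the question and the three suggested alternatives against the URL from the question, to show they differ only in where the match stops:

```javascript
const link = "https://www.google.com/url?rct3Dj&sa3Dt&url3Dhttp://business.itbusinessnet.com/article/WorldStage-Supports-Massive-4K-Video-Mapping-at-Adobe-MAX-with-Christie-Boxer-4K-Projectors---4820052&ct3Dga&cd3DCAEYACoTOTEwNTAyMzI0OTkyNzU0OTI0MjIaMTBmYTYxYzBmZDFlN2RlZjpjb206ZW46VVM&usg3DAFQjCNE6oIhIxR6qRMBmLkHOJTKLvamLFg";

// (?!^) refuses a match starting at index 0, so the leading "https://www.google..."
// is skipped and the embedded "http://business..." is used instead.

// Greedy: [^\\]+ runs to the end and backtracks only as far as the LAST "&".
const greedy = link.match(/(?!^)(http[^\\]+)&/)[1];

// Lazy: [^\\]+? stops at the FIRST "&" that follows.
const lazy = link.match(/(?!^)(http[^\\]+?)&/)[1];

// Lookahead: the "&" is required but not part of the match, so m[0] is enough.
const lookahead = link.match(/(?!^)(http[^\\]+?)(?=&)/)[0];

// Negated class: "&" can never be consumed, so no laziness is needed.
const negated = link.match(/(?!^)(http[^\\&]+)/)[1];

console.log(lazy === lookahead && lazy === negated); // true
console.log(greedy.length > lazy.length);            // true
```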
Say I have an array of words, for example: (hi|ll|this|that|etc) and I want to find it in the following text:
Hi, I'll match this and ll too
I'm using: \\b(hi|ll|this|that|etc)\\b
But I want to only match whole words, excluding words found in contractions. Basically, treat apostrophes as another "word separator". In this case, it shouldn't match the "ll" in "I'll".
Ideas?
Use the apostrophe in addition to \b to begin and end a match:
(?:\b|')(hi|ll|this|that|etc)(?:\b|')
(?:...) means a non-capturing group.
If you want to match just whole words, you can try:
(?:^|(?=[^']).\b)(hi|ll|th(?:is|at)|etc)\b
and get the words from group 1. However, the \b will still allow matching fragments such as -this or #ll; I don't know whether that is the desired result.
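A sketch of reading group 1 from that pattern in JavaScript, using the sentence from the question with " -this" appended (an invented addition) to show the fragment caveat. String.prototype.matchAll (ES2020) is assumed:

```javascript
const sentence = "Hi, I'll match this and ll too -this";

// (?=[^']). consumes one non-apostrophe character before the word, and \b
// requires the word to start right after it, so the "ll" in "I'll" is skipped.
const wholeWord = /(?:^|(?=[^']).\b)(hi|ll|th(?:is|at)|etc)\b/g;

// hits is ["this", "ll", "this"]: "Hi" is skipped because the pattern is
// case-sensitive, and the last "this" comes from "-this".
const hits = [...sentence.matchAll(wholeWord)].map((m) => m[1]);
console.log(hits);
```

Adding the i flag would also pick up the leading "Hi".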
I'm working on a special regex to match a JavaScript regex.
For now I have this regex working:
/\/(.*)?\/([i|g|m]+)?/
For example:
'/^foo/'.match(/\/(.*)?\/([i|g|m]+)?/) => ["/^foo/", "^foo", undefined]
'/^foo/i'.match(/\/(.*)?\/([i|g|m]+)?/) => ["/^foo/i", "^foo", "i"]
Now I need to get this regex working with:
'^foo'.match(/\/(.*)?\/([i|g|m]+)?/) => ["^foo", "^foo", undefined]
Unfortunately my previous regex doesn't work for that one.
Can someone help me to find a regex matching this example (and others too):
'^foo'.match([a regex]) => ["^foo", "^foo", undefined]
A regular expression to match a regular expression is
/\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/
To break it down,
\/ matches a literal /
(?![*+?]) is necessary because /* starts a comment, not a regular expression.
[^\r\n\[/\\] matches any character except \r, \n, [, / or \, i.e. anything that neither starts an escape sequence or a character group nor ends the regex.
\[...\] matches a character group which can contain an un-escaped /.
\\. matches a prefix of an escape sequence
+ is necessary because // is a line comment, not a regular expression.
(?:g...)? matches any combination of non-repeating regular expression flags. So ugly.
This doesn't attempt to pair parentheses, or check that repetition modifiers are not applied to themselves, but filters out most of the other ways that regular expressions fail to syntax check.
If you need one that matches just the body, just strip off everything else:
/(?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+/
or alternatively, add "/" at the beginning and end of your input.
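A quick sketch exercising the full pattern and the body-only variant on a few inputs, including the comment cases from the breakdown. The helper describe is hypothetical and not part of the original answer:

```javascript
// Pattern from the answer: group 1 is the body, group 2 the non-repeating g/i/m flags.
const regexLiteral = /\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/;

// Body-only variant, for input without surrounding slashes.
const regexBody = /(?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+/;

// Hypothetical helper: returns { body, flags } or null when nothing matches.
function describe(input) {
  const m = input.match(regexLiteral);
  return m ? { body: m[1], flags: m[2] } : null;
}

console.log(describe("/^foo/i"));  // body "^foo", flags "i"
console.log(describe("/^foo/"));   // body "^foo", flags "" (empty, not undefined)
console.log(describe("/[/]x/g"));  // body "[/]x": the / inside [...] is allowed
console.log(describe("/a\\/b/m")); // body "a\/b": \/ is consumed by \\.
console.log(describe("/*foo*/"));  // null: a block comment, not a regex
console.log(describe("//"));       // null: a line comment
console.log("^foo".match(regexBody)[0]); // "^foo"
```

Because the flags group wraps an optional group, a regex without flags yields "" in group 2 rather than the undefined shown in the question.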