Regular expression match specific key words - javascript

I am trying to use a regexp to match some specific keywords.
In the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or postfix. The regexp I am using now is \b(IF|ELSE)\b, and it matches all of the IFs.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks in advance for any help.
I am using http://regexr.com/ for testing.
It needs to work in JS.

I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It consists of three groups:
(?:^|\s) - Matches either the beginning of the line, or a single whitespace character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line, or a single whitespace character.
Regexr
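A minimal JavaScript sketch of this pattern, run against the sample lines from the question. The names sampleSource and keywordRe are made up for illustration:

```javascript
// Sample input from the question, joined into one multi-line string.
const sampleSource = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// g: find every occurrence, m: let ^ and $ match at line breaks.
const keywordRe = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;

// Group 1 holds the keyword without the surrounding whitespace.
const keywords = [...sampleSource.matchAll(keywordRe)].map((m) => m[1]);
console.log(keywords); // [ 'IF', 'IF' ]
```

Because the surrounding whitespace is consumed as part of each match, two keywords separated by a single space (for example IF ELSE) would yield only the first one.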

You can do it with lookaround (lookahead + lookbehind). This is what you really want, as it states exactly what you are searching for: instead of checking for other characters around the match, such as the start of the string or whitespace, it matches "IF or ELSE not surrounded by dots".
/(?<!\.)(IF|ELSE)(?!\.)/g
Explanation:
Use the g flag to find all occurrences.
(?<!X)Y is a negative lookbehind, which matches a Y not preceded by an X.
Y(?!X) is a negative lookahead, which matches a Y not followed by an X.
Working example: https://regex101.com/r/oS2dZ6/1
PS: if you don't have to write the regex for JS, it is better to test it against a flavor that supports lookbehind, such as PCRE, which regex101.com offers.
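For reference, here is how the lookaround pattern might be used from JavaScript. This is only a sketch and assumes an engine with ES2018 lookbehind support (for example a current Node.js); the variable names are illustrative:

```javascript
// Assumption: the engine supports lookbehind (ES2018 or later).
const sampleLines = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
];

// IF or ELSE, not preceded by a dot and not followed by a dot.
const notDotted = /(?<!\.)(IF|ELSE)(?!\.)/g;

// Keep only the lines that contain at least one such keyword.
const matchingLines = sampleLines.filter((line) => line.match(notDotted) !== null);
console.log(matchingLines); // [ 'IF A > B THEN STOP', 'IF B < C THEN STOP' ]
```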


How to not match given prefix in RegEx without negative lookbehind?

Goal
The goal is matching a string in JavaScript without certain delimiters, i.e. a string between two characters (the characters can be included in the match).
For example, this string should be fully matched: $ test string $. This can appear anywhere in a string. That would be trivial, however, we want to allow escaping the syntax, e.g. The price is 5\$ to 10\$.
Summarized:
Match any string that is enclosed by two $ signs.
Do not match it if the dollar signs are escaped using \$.
Solution using negative lookbehind
A solution that achieves this goal perfectly is: (?<!\\)\$(.*?)(?<!\\)\$.
Problem
This solution uses negative lookbehind, which is not supported on Safari. How can the same matches be achieved without using negative lookbehind (i.e. on Safari)?
A solution that partially works is (?<!\\)\$(.*?)(?<!\\)\$. However, this will also match the character in front of the $ sign if it is not a \.
You can rule out what you don't want by matching it, and capture what you want to keep in group 1:
\\\$.*?\$|\$.*?\\\$|(\$.*?\$)
Regex demo
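A small sketch of how the "match what you don't want, capture what you do" pattern could be consumed in JavaScript. The sample sentence and variable names are made up for illustration:

```javascript
// The first two alternatives swallow spans that involve an escaped \$;
// only the third alternative fills capture group 1.
const skipOrKeep = /\\\$.*?\$|\$.*?\\\$|(\$.*?\$)/g;

const sampleText = String.raw`The price is 5\$ to 10\$, but $ test string $ should match.`;

// Discard matches where group 1 did not participate.
const kept = [...sampleText.matchAll(skipOrKeep)]
  .filter((m) => m[1] !== undefined)
  .map((m) => m[1]);
console.log(kept); // [ '$ test string $' ]
```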
You may use this regex and grab your inner text using capture group #1 as you are already doing in your current regex using lookbehind:
(?:^|[^\\])\$((?:\\.|[^$])*)\$
RegEx Demo
RegEx Details:
(?:^|[^\\]): Match start position or a non-backslash character in a non-capturing group
\$: Match starting $
(: Start capturing group
(?:\\.|[^$])*: Match any escaped character or a non-$ character. Repeat this group 0 or more times
): End capturing group
\$: Match closing $
PS: This regex will give same matches as your current regex: (?<!\\)\$(.*?)(?<!\\)\$
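As a rough illustration, the lookbehind-free regex can be used like this. The sample sentence and names are assumptions, not taken from the question:

```javascript
// No lookbehind: the character before the opening $ is matched (and consumed)
// by the non-capturing group, so read the inner text from group 1.
const delimited = /(?:^|[^\\])\$((?:\\.|[^$])*)\$/g;

const priceText = String.raw`The price is 5\$ to 10\$, but $ test string $ should match.`;

const innerTexts = [...priceText.matchAll(delimited)].map((m) => m[1]);
console.log(innerTexts); // [ ' test string ' ]
```

Note that m[0] can include the character in front of the opening $, which is why the answer reads the result from capture group 1 rather than from the whole match.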

javascript regex to extract {abc} but not '{abc}' [duplicate]

I want to replace mm units with cm units in my code. Since there are a large number of such replacements, I use a regexp.
I wrote this expression:
(?!a-zA-Z)mm(?!a-zA-Z)
But it still matches words like summa, gamma and dummy.
How do I write the regexp correctly?
Use character classes and change the first (?!...) lookahead into a lookbehind:
(?<![a-zA-Z])mm(?![a-zA-Z])
^^^^^^^^^^^^^ ^^^^^^^^^^^
See the regex demo
The pattern matches:
(?<![a-zA-Z]) - a negative lookbehind that fails the match if there is an ASCII letter immediately to the left of the current location
mm - a literal substring
(?![a-zA-Z]) - a negative lookahead that fails the match if there is an ASCII letter immediately to the right of the current location
NOTE: If you need to make your pattern Unicode-aware, replace [a-zA-Z] with [^\W\d_] (and use re.U flag if you are using Python 2.x).
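A quick JavaScript sketch of the replacement, assuming an engine with lookbehind support (ES2018 or later). The sample string is made up, and only the unit text is changed; the numeric values are not converted:

```javascript
// Assumption: lookbehind is available in the engine running this.
const unitRe = /(?<![a-zA-Z])mm(?![a-zA-Z])/g;

const beforeText = "width: 10mm; summa, gamma, dummy; gap = 5 mm";
const afterText = beforeText.replace(unitRe, "cm");
console.log(afterText); // width: 10cm; summa, gamma, dummy; gap = 5 cm
```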
There's no need to use lookaheads and lookbehinds, so if you wish to simplify your pattern you can try something like this:
\d+\s?(mm)\b
This assumes that your millimetre symbol always follows a number, with an optional space in between, which I think is a reasonable assumption in this case.
The \b checks for a word boundary to make sure the mm is not part of a word such as dummy.
Demo here
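To actually swap the unit with this approach, one option is to move the capture group onto the number so it can be re-inserted in the replacement. That rearrangement is my own assumption, not part of the pattern above; a sketch:

```javascript
// Variation on the pattern above: capture the number (and optional space)
// instead of the unit, so "$1" can put it back in front of "cm".
const numberThenUnit = /(\d+\s?)mm\b/g;

const sizes = "10mm and 5 mm, summa, 3mmx";
const converted = sizes.replace(numberThenUnit, "$1cm");
console.log(converted); // 10cm and 5 cm, summa, 3mmx
```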

RegEx matching help: won't match on each appearance

I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it matches the sections I need, but it also matches the abc part of 4_abc, which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end are not really working like I hoped they would. If I use them on the example
a_d Dd_da 4_d d_4
they block the second match, Dd_da, because the space before it was already consumed by the first match. Sadly, I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.
You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
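A short check of the \b version against the input from the question (the variable names are illustrative):

```javascript
// \b is zero-width, so the space between tokens is not consumed
// and stays available for the next match.
const identifierRe = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;

const tokens = "a_d Dd_da 4_d d_4".match(identifierRe);
console.log(tokens); // [ 'a_d', 'Dd_da', 'd_4' ]
```

The d in 4_d is skipped because there is no word boundary between _ and d; both are word characters.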

Regex to match char after string

I'm attempting to match the first 3 letters that could be a-z followed by a specific character.
For testing I'm using a regex online tester.
I thought this should work (without success):
^[a-z]{0,3}$[z]
My test string is abcz.
Hope you can tell me what I'm doing wrong.
If you need to match a whole string abcz, use
/^[a-z]{0,3}z$/
            ^^
or - if the 3 letters are compulsory:
/^[a-z]{3}z$/
See the regex demo.
The $[z] in your pattern attempts to match a z after the end-of-string anchor, so the regex always fails.
Details:
^ - string start
[a-z]{0,3} - 0 to 3 lowercase ASCII letters (to require 3 letters, remove 0,)
z - a z
$ - end of string anchor.
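To compare the variants side by side, here is a small test harness. The extra inputs besides abcz are my own additions for illustration:

```javascript
const upToThree = /^[a-z]{0,3}z$/;    // 0 to 3 letters, then z
const exactlyThree = /^[a-z]{3}z$/;   // exactly 3 letters, then z
const originalRe = /^[a-z]{0,3}$[z]/; // pattern from the question: z is required after the end anchor

const report = ["abcz", "z", "abcdz"].map((input) => ({
  input,
  upToThree: upToThree.test(input),
  exactlyThree: exactlyThree.test(input),
  original: originalRe.test(input),
}));
console.log(report);
// abcz  -> upToThree: true,  exactlyThree: true,  original: false
// z     -> upToThree: true,  exactlyThree: false, original: false
// abcdz -> upToThree: false, exactlyThree: false, original: false
```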
You've placed the end-of-line anchor too early:
/^[a-z]{0,3}[z]$/m
You can see a working version here
You can do away with the [] around z. Square brackets are used to define a range or list of characters to match - as you're matching only one they're not needed here.
/^[a-z]{0,3}z$/m

How to match all words starting with dollar sign but not slash dollar

I want to match all words that start with a dollar sign, but not those that start with a backslash followed by a dollar sign.
I have already tried a few regexes:
(?:(?!\\)\$\w+)
\\(\\?\$\w+)\b
String
$10<i class="">$i01d</i>\$id
Expected result
*$10*
*$i01d*
but not this
*$id*
After finding all the expected matching words, I want to replace them with values from my object.
One option is to eliminate escape sequences first, and then match the cleaned-up string:
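// Drop every escape sequence (a backslash plus the character after it), then match what is left
const s = String.raw`$10<i class="">$i01d</i>\$id`;
const found = s.replace(/\\./g, '').match(/\$\w+/g);
console.log(found);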
The big problem here is that you need a negative lookbehind, which older JavaScript engines (before ES2018) do not support. It's possible to emulate it crudely, but I will offer an alternative which, while not great, will work:
var input = '$10<i class="">$i01d</i>\\$id';
var regex = /\b\w+\b\$(?!\\)/g;
// Sample implementation of a string reversal function. There are better implementations out there.
function reverseString(string) {
  return string.split("").reverse().join("");
}
var reverseInput = reverseString(input);
var matches = reverseInput
  .match(regex)
  .map(reverseString);
console.log(matches);
It is not elegant but it will do the job. Here is how it works:
JavaScript does support a lookahead expression ((?=)) and a negative lookahead ((?!)). Since a negative lookahead is the mirror image of a negative lookbehind, you can reverse the string and reverse the regex, which will match exactly what you want. Since all the matches come out reversed, you also need to reverse them back to the original.
It is not elegant, as I said, since it does a lot of string manipulations but it does produce exactly what you want.
See this in action on Regex101
Regex explanation
Normally, "match x as long as it's not preceded by y" is expressed as (?<!y)x, so in your case the regex would be
/(?<!\\)\$\b\w+\b/g
demonstration (not JavaScript)
where
(?<!\\) //do not match a preceding "\"
\$ //match literal "$"
\b //word boundary
\w+ //one or more word characters
\b //second word boundary, hence making the match a word
When the input is reversed, all the tokens have to be reversed as well in order to match. Furthermore, the negative lookbehind gets inverted into a negative lookahead of the form x(?!y), so the new regular expression is
/\b\w+\b\$(?!\\)/g;
This is more difficult than it appears at first blush. How like Regular Expressions!
If you have look-behind available, you can try:
/(?<!\\)\$\w+/g
This is NOT available in JS engines that predate ES2018. Alternatively, you could specify a boundary that you know exists and use a capture group like:
/\s(\$\w+)/g
Unfortunately, you cannot rely on word boundaries via \b because there's no such boundary before '\'.
Also, this is a cool site for testing your regex expressions. And this explains the word boundary anchor.
If you're using a language that supports negative lookbehind assertions, you can use something like this.
(?<!\\)\$\w+
I think this is the cleanest approach, but unfortunately it's not supported by all languages.
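For completeness, a sketch of the lookbehind version running in JavaScript. It assumes an engine that implements ES2018 lookbehind (for example a current Node.js); on engines without it, the regex literal is a syntax error:

```javascript
// Assumption: the engine supports lookbehind (ES2018 or later).
const html = String.raw`$10<i class="">$i01d</i>\$id`;

// A $ that is not preceded by a backslash, followed by word characters.
const unescapedWords = html.match(/(?<!\\)\$\w+/g);
console.log(unescapedWords); // [ '$10', '$i01d' ]
```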
This is a hackier implementation that may work as well.
(?:(^\$\w+)|[^\\](\$\w+))
This matches either
A literal $ at the beginning of a line followed by one or more word characters. Or...
A literal $ that is preceded by any character except a backslash, followed by one or more word characters.
Here is a working example.
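Since this pattern puts the word in one of two capture groups depending on which branch matched, the caller has to pick whichever group is set. A minimal sketch, with illustrative variable names:

```javascript
// Group 1 is set when the $ is at the very start; group 2 when it follows
// some character other than a backslash.
const twoGroupRe = /(?:(^\$\w+)|[^\\](\$\w+))/g;

const markup = String.raw`$10<i class="">$i01d</i>\$id`;

// Take whichever group participated in each match.
const words = [...markup.matchAll(twoGroupRe)].map((m) => m[1] || m[2]);
console.log(words); // [ '$10', '$i01d' ]
```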
