Regex to extract function notation with nested functions (javascript)

I am trying to extract function notation from a string in javascript (so I can't use lookbehind), here are some examples:
f(2) should match f(
f(10)g(8) should match f( and g(
f(2+g(3)) should match f( and g(
f(2+g(sqrt(10))) should match f( and g(
f(g(2)) should match f( and g(
Right now I am using
/\b[a-z]\([^x]/g
because I don't want to match a string of letters (such as sqrt), only a single letter followed by a parenthesis. The problem I am having is with the last one in the list (nested functions). ( is not part of what \b catches, so it doesn't match.
My current plan is to add a space after every ( using something like
input = input.replace(/\([^\s]/g, '( ');
Which splits the nested function so that \b comes into play [becomes f( g( 3))] but before I started messing with the input string, I thought I would ask here if there was a better way to do it. Obviously regex is not something I am super strong with but I am trying to learn so an explanation with the answer would be appreciated (though I will take any pointers that I can google myself too! I am not entirely sure of what to search for here.)

The point here is that [^x] is a negated character class: it still has to match a character, so it consumes the symbol after ( and thereby prevents overlapping matches. To check that the next character is not x without consuming it, use a lookahead:
\b[a-z]\((?!x)
         ^^^^^
See regex demo
Perhaps you want to fail a match only if an x is the only letter inside f() or g():
\b[a-z]\((?!x\))
From Regular-expressions.info:
Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u). The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point.
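The difference between consuming the next character and merely looking at it can be seen by running both patterns over the question's inputs. This is a minimal sketch; the helper name extract and the extra f(x) sample are assumptions added for illustration.

```javascript
// Compare a consuming negated class with a zero-width negative lookahead.
const consuming = /\b[a-z]\([^x]/g;   // pattern from the question
const lookahead = /\b[a-z]\((?!x)/g;  // pattern from the answer

const samples = ["f(2)", "f(10)g(8)", "f(2+g(3))", "f(2+g(sqrt(10)))", "f(g(2))", "f(x)"];

// String.prototype.match with a /g regex returns every match, or null if none.
const extract = (regex, text) => text.match(regex) || [];

for (const text of samples) {
  console.log(text, extract(consuming, text), extract(lookahead, text));
}
```

For f(g(2)) the consuming pattern swallows the g as part of the first match, so the inner call is never seen, while the lookahead version reports both f( and g(.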

I think you just have to remove [^x]:
"f(g(2))".match(/\b[a-z]\(/g)
// ["f(", "g("]

Related

Regex is capturing too much of the string

I have these three regex statements and I'm using Javascript
/(?<=[AND])\s?\[(.*?)\]/
/(?<=[OR])\s?\[(.*?)\]/
/(?<=[NOT])\s?\[(.*?)\]/
given a string like -> AND [thing1, thing2, thing3] OR [thing4, thing5] NOT [thing6] I would expect the matches for the patterns to return in order
thing1, thing2, thing3
thing4, thing5
thing6
When a user enters a string like this -> AND [thing1, thing2, thing3 OR [thing4, thing5]
the first pattern returns
thing1, thing2, thing3 OR [thing4, thing5
I'm trying to figure out how to prevent the regex from matching when there is a boolean keyword present before a closing bracket. I've tried messing around with adding [^NOT]|[^OR] ^NOT|^OR in the capturing group but nothing I've done works right (regex newb here).
Also if there are any other potentially obvious mistakes with my current regex you see please point them out.
Edit: Sorry everyone for the slow response, it took me a little while to decipher what your regexes are doing. Thank you for the info about the brackets being capturing groups, I see what I was doing wrong beforehand.
Why not combine each of these patterns into a single matching group? You seem to have some fundamental misconceptions, specifically regarding the use of square brackets ([]) in your pattern.
It's not clear why you've elected to include your logical tokens in these square brackets - in RegExp, these are used specifically to denote character sets. The way you've expressed this in your original pattern matches one of the two or three characters in the set literally, and not the entire word (as appears to be your intention). You also seem to have fallen victim to the same misconception in your attempt to include these logical tokens in the last group in your pattern.
Instead, use alternatives (denoted by vertical pipes |) correctly:
const test1 = `AND [thing1, thing2, thing3] OR [thing4, thing5] NOT [thing6]`;
const test2 = `AND [thing1, thing2, thing3 OR [thing4, thing5]`;
const pattern = /(?<=AND|OR|NOT)(?:\s?)\[(.*?)(?:\]|\s(AND|OR|NOT))/g;
console.log([...test1.matchAll(pattern)].map(match => match[1]));
console.log([...test2.matchAll(pattern)].map(match => match[1]));
regex101
Since you're new to JavaScript, I'd recommend using a utility like regex101 to build your patterns - by default, a pane in the right-hand side of the window explains, in plain English, what each part of your pattern actually does, which you can compare to what you expect it to do and adjust accordingly.
Using [AND] means a character class, which matches a single character: A, N, or D
Using [^NOT] means a negated character class, which matches any single character except N, O, or T
Using ^NOT|^OR means matching either NOT or OR at the start of the string
If you want 3 alternatives, you can use an alternation in a group: (?:AND|OR|NOT)
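A quick way to convince yourself of the difference is to test each construct against single characters and whole keywords. In this sketch the ^ and $ anchors are an addition for illustration, so that each test has to cover the entire input.

```javascript
// [AND] is a character class: it matches ONE character that is A, N, or D.
console.log(/^[AND]$/.test("D"));    // true
console.log(/^[AND]$/.test("AND"));  // false

// (?:AND|OR|NOT) is an alternation: it matches one whole keyword.
const keyword = /^(?:AND|OR|NOT)$/;
console.log(keyword.test("AND"));    // true
console.log(keyword.test("D"));      // false

// [^NOT] is a negated class: any single character except N, O, or T.
console.log(/^[^NOT]$/.test("A"));   // true
console.log(/^[^NOT]$/.test("N"));   // false
```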
Using Javascript, you might assert that between the opening and the closing square bracket there is no AND OR NOT
(?<=(?:AND|OR|NOT)\s?\[)(?![^\][]*\b(?:AND|OR|NOT)\b)[^\][]*(?=\])
(?<=(?:AND|OR|NOT)\s?\[) Positive lookbehind, assert any of the alternatives to the left
(?![^\][]*\b(?:AND|OR|NOT)\b) Negative lookahead, assert that none of the alternatives occurs before the next square bracket
[^\][]* Match optional chars other than [ and ]
(?=\]) Positive lookahead, assert ] to the right
Regex demo
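The pattern above can be tried directly against the two strings from the question. This is a sketch that assumes a JavaScript engine with lookbehind support; the names bracketContents and listGroups are made up for the example.

```javascript
// Pattern from the answer: text between "[" and "]" that follows a keyword
// and does not itself contain AND, OR, or NOT as a whole word.
const bracketContents =
  /(?<=(?:AND|OR|NOT)\s?\[)(?![^\][]*\b(?:AND|OR|NOT)\b)[^\][]*(?=\])/g;

const wellFormed = "AND [thing1, thing2, thing3] OR [thing4, thing5] NOT [thing6]";
const missingBracket = "AND [thing1, thing2, thing3 OR [thing4, thing5]";

// With the /g flag, match() returns all full matches, or null if none.
const listGroups = (text) => text.match(bracketContents) || [];

console.log(listGroups(wellFormed));
console.log(listGroups(missingBracket));
```

For the malformed input only the second group is returned, because the first bracket's contents run into the OR keyword before any closing bracket.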

Get the Opposite of a Regular Expression [duplicate]

Is it possible to write a regex that returns the converse of a desired result? Regexes are usually inclusive - finding matches. I want to be able to transform a regex into its opposite - asserting that there are no matches. Is this possible? If so, how?
http://zijab.blogspot.com/2008/09/finding-opposite-of-regular-expression.html states that you should bracket your regex with
/^((?!^ MYREGEX ).)*$/
but this doesn't seem to work. If I have the regex
/[a|b]./
the string "abc" returns false with both my regex and the converse suggested by zijab:
/^((?!^[a|b].).)*$/
Is it possible to write a regex's converse, or am I thinking incorrectly?
Couldn't you just check to see if there are no matches? I don't know what language you are using, but how about this pseudocode?
if (!'Some String'.match(someRegularExpression)) {
// do something...
}
If you can only change the regex, then the one you got from your link should work:
/^((?!REGULAR_EXPRESSION_HERE).)*$/
The reason your inverted regex isn't working is because of the '^' inside the negative lookahead:
/^((?!^[ab].).)*$/
      ^ # WRONG
Maybe it's different in vim, but in every regex flavor I'm familiar with, the caret matches the beginning of the string (or the beginning of a line in multiline mode). But I think that was just a typo in the blog entry.
You also need to take into account the semantics of the regex tool you're using. For example, in Perl, this is true:
"abc" =~ /[ab]./
But in Java, this isn't:
"abc".matches("[ab].")
That's because the regex passed to the matches() method is implicitly anchored at both ends (i.e., /^[ab].$/).
Taking the more common, Perl semantics, /[ab]./ means the target string contains a sequence consisting of an 'a' or 'b' followed by at least one (non-line separator) character. In other words, at ANY point, the condition is TRUE. The inverse of that statement is, at EVERY point the condition is FALSE. That means, before you consume each character, you perform a negative lookahead to confirm that the character isn't the beginning of a matching sequence:
(?![ab].).
And you have to examine every character, so the regex has to be anchored at both ends:
/^(?:(?![ab].).)*$/
That's the general idea, but I don't think it's possible to invert every regex--not when the original regexes can include positive and negative lookarounds, reluctant and possessive quantifiers, and who-knows-what.
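To see the inversion at work, the original and inverted patterns can be run side by side. This is a small sketch with sample strings chosen for illustration; it sticks to single-line inputs, since the dot does not match line terminators and the two patterns could then both report false.

```javascript
const original = /[ab]./;               // "contains a or b followed by a character"
const inverted = /^(?:(?![ab].).)*$/;   // "no position starts such a sequence"

const samples = ["abc", "xyz", "xa", "", "cab", "b"];

for (const text of samples) {
  // On single-line input the two results should always be opposites.
  console.log(JSON.stringify(text), original.test(text), inverted.test(text));
}
```

Note that "xa" and "b" count as non-matching for the original pattern, because the a or b is not followed by another character.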
You can invert the character set by writing a ^ at the start ([^…]). So the opposite expression of [ab] (match either a or b) is [^ab] (match neither a nor b).
But the more complex your expression gets, the more complex is the complementary expression too. An example:
You want to match the literal foo. An expression that matches anything except a string containing foo would have to match either
any string that’s shorter than foo (^.{0,2}$), or
any three characters long string that’s not foo (^([^f]..|f[^o].|fo[^o])$), or
any longer string that does not contain foo.
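All together this may work, but treat it as untested; as written it rejects some strings that contain no foo, such as a lone o: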
^[^fo]*(f+($|[^o]|o($|[^fo]*)))*$
But note: this applies only to foo.
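Hand-written complements are easy to get subtly wrong, so it may help to check any candidate against a plain substring test. The sketch below does that by brute force for a candidate derived from a three-state automaton (no progress, seen f, seen fo). Both that derivation and the helper allStrings are assumptions of this example, not the pattern from the answer.

```javascript
// Candidate complement of "contains foo", built from a three-state automaton.
// This derivation is an assumption of this sketch.
const noFoo = /^(?:[^f]|f+(?:of+)*(?:[^fo]|o[^fo]))*(?:f+(?:of+)*o?)?$/;

// Enumerate every string over a small alphabet, shortest first.
function* allStrings(alphabet, maxLength) {
  let current = [""];
  for (let length = 0; length <= maxLength; length++) {
    yield* current;
    current = current.flatMap((prefix) => alphabet.map((ch) => prefix + ch));
  }
}

// Keep every string where the regex and the substring test fail to be opposites.
const disagreements = [...allStrings(["f", "o", "x"], 6)].filter(
  (text) => noFoo.test(text) === text.includes("foo")
);

console.log(disagreements.length);
```

An empty disagreement list only covers the strings that were enumerated, so it is evidence rather than proof. In practice, negating the result of a normal match is far simpler than maintaining a pattern like this.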
You can also do this (in Python) by using re.split and splitting on your regular expression, which returns all the parts that don't match the regex (see "how to find the converse of a regex").
In perl you can anti-match with $string !~ /regex/;.
With grep, you can use --invert-match or -v.
Java Regexps have an interesting way of doing this (can test here) where you can create a greedy optional match for the string you want, and then match data after it. If the greedy match fails, it's optional so it doesn't matter, if it succeeds, it needs some extra data to match the second expression and so fails.
It looks counter-intuitive, but works.
Eg (foo)?+.+ matches bar, foox and xfoo but won't match foo (or an empty string).
It might be possible in other dialects, but couldn't get it to work myself (they seem more willing to backtrack if the second match fails?)

How to replace string between two string with the same length

I have an input string like this:
ABCDEFG[HIJKLMN]OPQRSTUVWXYZ
How can I replace each character in the string between the [] with an X (resulting in the same number of Xs as there were characters)?
For example, with the input above, I would like an output of:
ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
I am using JavaScript's RegEx for this and would prefer if answers could be an implementation that does this using JavaScript's RegEx Replace function.
I am new to RegEx so please explain what you do and (if possible) link articles to where I can get further help.
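Use replace() with a callback that receives the match, and build the X's with Array(m.length - 1).join("X"). The match includes both brackets, and joining an array of n empty slots yields n - 1 separators, so this produces one X per enclosed character: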
var str = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ"
str = str.replace(/\[[A-Z]*\]/g,(m)=>"["+Array(m.length-1).join("X")+"]")
console.log(str);
We could also use [^\]]* instead of [A-Z]* in the regex to match any character other than the closing bracket (a plain .* would be greedy and run to the last ] in the string).
There are thousands of resources about regular expressions. For JavaScript specifically, see Regular Expressions on MDN, but the best way to learn, in my opinion, is by practicing; I find regex101 useful.
const str="ABCDEFG[HIJKLMN]OPQRSTUVWXYZ";
const run = str => str.replace(/\[.*]/, m => m.replace(/[^\[\]]/g, "X"));
console.log(run(str));
The first pattern /\[.*]/ is to select letters inside bracket [] and the second pattern /[^\[\]]/ is to replace the letters to "X"
We can observe that every individual letter you wish to match is followed by a series of zero or more non-'[' characters, until a ']' is found. This is quite simple to express in JavaScript-friendly regex:
/[A-Z](?=[^\[]*\])/g
regex101 example
(?= ) is a "positive lookahead assertion"; it peeks ahead of the current matching point, without consuming characters, to verify its contents are matched. In this case, [^\[]*\] matches exactly what I described above.
Now you can substitute each [A-Z] matched with a single 'X'.
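Put together, the substitution might look like the following sketch, which uses the sample input from the question; the variable names are chosen for the example.

```javascript
const input = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ";

// Replace each capital letter that can reach a "]" without crossing a "[".
const masked = input.replace(/[A-Z](?=[^\[]*\])/g, "X");

console.log(masked); // ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
```

One caveat with this approach: the lookahead only checks for a closing bracket ahead, not for an opening bracket behind, so a stray ] as in "AB]CD" would turn the input into "XX]CD".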
You can use the following solution to replace a string between two square brackets:
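const rxp = /\[.*?\]/g;
const result = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ".replace(rxp, (x) => {
  // x includes both brackets, so subtract 2 to count only the inner characters
  return "[" + "X".repeat(x.length - 2) + "]";
});
console.log(result);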

How to match all words starting with dollar sign but not slash dollar

I want to match all words that start with a dollar sign, but not those that start with a backslash followed by a dollar sign.
I have already tried a few regexes.
(?:(?!\\)\$\w+)
\\(\\?\$\w+)\b
String
$10<i class="">$i01d</i>\$id
Expected result
*$10*
*$i01d*
but not this
*$id*
After finding all the expected matching words, I want to replace them using my object.
One option is to eliminate escape sequences first, and then match the cleaned-up string:
s = String.raw`$10<i class="">$i01d</i>\$id`
found = s.replace(/\\./g, '').match(/\$\w+/g)
console.log(found)
The big problem here is that you need a negative lookbehind, which JavaScript did not support before ES2018. It's possible to emulate it crudely, but I will offer an alternative which, while not great, will work:
var input = '$10<i class="">$i01d</i>\\$id';
var regex = /\b\w+\b\$(?!\\)/g;
//sample implementation of a string reversal function. There are better implementations out there
function reverseString(string) {
return string.split("").reverse().join("");
}
var reverseInput = reverseString(input);
var matches = reverseInput
.match(regex)
.map(reverseString);
console.log(matches);
It is not elegant but it will do the job. Here is how it works:
JavaScript does support positive lookahead ((?=...)) and negative lookahead ((?!...)). Since a negative lookahead is the mirror image of a negative lookbehind, you can reverse the string and reverse the regex, which will match exactly what you want. Since all the matches come out reversed, you also need to reverse them back to the original.
It is not elegant, as I said, since it does a lot of string manipulations but it does produce exactly what you want.
See this in action on Regex101
Regex explanation Normally, the "match x long as it's not preceded by y" will be expressed as (?<!y)x, so in your case, the regex will be
/(?<!\\)\$\b\w+\b/g
demonstration (not JavaScript)
where
(?<!\\) //do not match a preceding "\"
\$ //match literal "$"
\b //word boundary
\w+ //one or more word characters
\b //second word boundary, hence making the match a word
When the input is reversed, all the tokens have to be reversed as well in order to match. Furthermore, the negative lookbehind gets inverted into a negative lookahead of the form x(?!y) so the new regular expression is
/\b\w+\b\$(?!\\)/g;
This is more difficult than it appears at first blush. How like Regular Expressions!
If you have look-behind available, you can try:
/(?<!\\)\$\w+/g
This was NOT available in JS before ES2018. Alternatively, you could specify a boundary that you know exists and use a capture group like:
/\s(\$\w+)/g
Unfortunately, you cannot rely on word boundaries via \b because there's no such boundary before '\'.
Also, this is a cool site for testing your regex expressions. And this explains the word boundary anchor.
If you're using a language that supports negative lookbehind assertions you can use something like this.
(?<!\\)\$\w+
I think this is the cleanest approach, but unfortunately it's not supported by all languages.
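For reference, here is how the lookbehind version might be used from JavaScript, assuming a runtime with ES2018 lookbehind support (for example, a current Node.js). The lookup table values is hypothetical and stands in for the asker's replacement object.

```javascript
const text = String.raw`$10<i class="">$i01d</i>\$id`;

// "$" followed by word characters, but only when not preceded by a backslash.
const unescapedDollarWords = /(?<!\\)\$\w+/g;

console.log(text.match(unescapedDollarWords));

// Hypothetical lookup table; unknown words are left untouched.
const values = { "$10": "ten", "$i01d": "item" };
const replaced = text.replace(unescapedDollarWords, (word) =>
  values[word] !== undefined ? values[word] : word
);

console.log(replaced);
```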
This is a hackier implementation that may work as well.
(?:(^\$\w+)|[^\\](\$\w+))
This matches either
A literal $ at the beginning of a line followed by one or more word characters. Or...
A literal $ that is preceded by any character except a backslash, followed by one or more word characters.
Here is a working example.
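In JavaScript, the two capture groups of this pattern can be collected with matchAll. The sketch below uses the pattern exactly as written above; the variable names are chosen for the example.

```javascript
const text = String.raw`$10<i class="">$i01d</i>\$id`;

// Group 1 captures a word at the very start; group 2 captures a word that
// follows any character other than a backslash.
const pattern = /(?:(^\$\w+)|[^\\](\$\w+))/g;

// Exactly one of the two groups is set for each match.
const words = [...text.matchAll(pattern)].map((m) => m[1] || m[2]);

console.log(words);
```

Because the character before the dollar sign is consumed as part of the match, two adjacent words such as "$a$b" yield only the first one. That is the trade-off compared with a true lookbehind.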

Regular expression match specific key words

I am trying to use regexp to match some specific key words.
For the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or postfix. The regexp I am using now is \b(IF|ELSE)\b, and it will give me all the IFs back.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
And I am using http://regexr.com/ for test.
Need to work with JS.
I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It's comprised of three groups:
(?:^|\s) - Matches either the beginning of the line, or a single space character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line, or a single space character.
Regexr
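As a sketch of how this behaves on the sample from the question, the pattern can be applied with the g and m flags and the keyword read from capture group 1. The variable names are chosen for the example.

```javascript
const source = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// m makes ^ and $ match at line breaks; g finds every occurrence.
const keywordPattern = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;

// The surrounding whitespace is part of the match, so read group 1.
const keywords = [...source.matchAll(keywordPattern)].map((m) => m[1]);

console.log(keywords); // [ 'IF', 'IF' ]
```

Since the whitespace after a keyword is consumed, two keywords separated by a single space (as in "IF ELSE") would produce only the first one.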
You can do it with lookaround (lookahead + lookbehind). This is what you really want, as it explicitly matches what you are searching for: you don't want to check for other characters like string start or whitespace around the match, but to match exactly "IF or ELSE not surrounded by dots".
/(?<!\.)(IF|ELSE)(?!\.)/g
explanation:
use the g-flag to find all occurrences
(?<!X)Y is a negative lookbehind which matches a Y not preceded by an X
Y(?!X) is a negative lookahead which matches a Y not followed by an X
working example: https://regex101.com/r/oS2dZ6/1
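The lookaround pattern can be checked line by line as in the sketch below, which assumes a JavaScript engine with lookbehind support. The variable names and the extra LIFT example are additions for illustration.

```javascript
const lines = ["IF A > B THEN STOP", "IF B < C THEN STOP", "LOL.IF", "IF.LOL", "IF.ELSE"];

// IF or ELSE with no dot immediately before or after.
const notDotted = /(?<!\.)(IF|ELSE)(?!\.)/g;

const hits = lines.map((line) => line.match(notDotted) || []);
console.log(hits);

// Without word boundaries the pattern also fires inside longer words.
console.log("LIFT".match(notDotted));
console.log("LIFT".match(/(?<!\.)\b(IF|ELSE)\b(?!\.)/g));
```

Only the first two lines produce a match. Adding \b on both sides, as in the last line, is one way to stop the keyword from matching inside a longer word.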
PS: if you don't have to write regex for JS better use a tool which supports the posix standard like regex101.com
