Regex Lazy mode doesn't work as expected - javascript

Given the following string:
FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA
I'm trying to match every substring that contains CABDA with the following regex:
C.*?A.*?B.*?D.*?A
The only thing I find then is
CFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDA
Which in itself is not wrong - but I should be finding CSAYBRHXQQGUDA
What am I missing?
You can test it here if you'd like
Any help is appreciated.

A lazy quantifier doesn't mean that the engine tries to find the shortest possible substring. It only means that the quantifier starts by matching as few characters as it can and backtracks towards more characters, as opposed to matching as many characters as it can and backtracking towards fewer.
The start position is chosen the same way in both cases: it is the first position, scanning from left to right, at which a match is possible. For example:
x+?y
when matched against:
xxxy
will still match xxxy and not just xy since it was able to start from the first x and backtrack towards more xes.
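
A minimal sketch of the point above. The x+?y case comes from the answer; the second string (aXbXc) is made up for illustration, to show that laziness changes where a match ends, not where it starts.

```javascript
// Lazy vs. greedy: both start at the leftmost possible position.
const lazy = 'xxxy'.match(/x+?y/);
const greedy = 'xxxy'.match(/x+y/);
console.log(lazy[0], lazy.index);     // xxxy 0
console.log(greedy[0], greedy.index); // xxxy 0

// Laziness only affects how far the match extends to the right.
const lazyEnd = 'aXbXc'.match(/a.*?X/)[0];  // stops at the first X
const greedyEnd = 'aXbXc'.match(/a.*X/)[0]; // runs to the last X
console.log(lazyEnd, greedyEnd);            // aX aXbX
```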

You can use this regex based on negated character classes:
/C[^C]*?A[^A]*?B[^B]*?D[^D]*?A/
RegEx Demo
This finds CSAYBRHXQQGUDA in your given input.
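
As a quick check, here is a small sketch that runs this regex against the string from the question. The variable names are just for illustration.

```javascript
const input = 'FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA';

// Each negated class stops the match from running across another copy of
// the letter that opens its segment, so earlier C's fail and the engine
// moves on to the next one.
const negated = /C[^C]*?A[^A]*?B[^B]*?D[^D]*?A/;
const found = input.match(negated);
console.log(found[0]); // CSAYBRHXQQGUDA
```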

(?=(C.*?A.*?B.*?D.*?A))
Put your expression inside a lookahead with a capturing group to get all matches, including overlapping ones. See the demo:
https://regex101.com/r/fM9lY3/46
If you want to find only the shortest match, you can use:
C(?:(?!C|A|B|D).)*A(?:(?!C|A|B|D).)*B(?:(?!C|A|B|D).)*D(?:(?!C|A|B|D).)*A
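
A rough sketch of both ideas in JavaScript, assuming an engine with String.prototype.matchAll (ES2020). The lookahead version collects one candidate per starting C, and the shortest candidate is picked afterwards; the second regex finds the short match directly. Variable names are illustrative only.

```javascript
const input = 'FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA';

// The lookahead consumes nothing, so the engine retries at every position
// and Group 1 holds the (possibly overlapping) candidate found there.
const overlapping = /(?=(C.*?A.*?B.*?D.*?A))/g;
const candidates = [...input.matchAll(overlapping)].map(m => m[1]);

// Pick the shortest candidate by length.
const shortest = candidates.reduce((a, b) => (b.length < a.length ? b : a));
console.log(shortest); // CSAYBRHXQQGUDA

// The second pattern forbids C, A, B and D between the required letters.
const tempered = /C(?:(?!C|A|B|D).)*A(?:(?!C|A|B|D).)*B(?:(?!C|A|B|D).)*D(?:(?!C|A|B|D).)*A/;
const direct = input.match(tempered)[0];
console.log(direct); // CSAYBRHXQQGUDA
```

Note that the second pattern is stricter than "shortest": it rejects any candidate that has one of the four letters between two required ones.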

Related

Regex exclude doesn't exclude string only first character

Firstly we have the following string:
aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd
We want our regex to find whitespace and any special characters like {}, but if { is followed by exactly ignoreme} then the whole {ignoreme} should be excluded
This is where we are right now:
(?!{ignoreme})[\s\[\]{}()<>\\'"|^`]
The problem is that our regex finds the } after ignoreme
Here is the link https://regex101.com/r/bU1oG0/2
Any help is appreciated,
Thanks

The } is matched because your (?!{ignoreme}) lookahead only prevents a match at a { that starts {ignoreme}. The closing } does not start a {ignoreme} sequence, so the character class still matches it. Also, JavaScript had no lookbehind before ES2018, so a pattern like (?<!{ignoreme)} may not be available.
This kind of issue can be handled with a regex that matches what you do not need, and matches and captures what you do need:
/{ignoreme}|([\s[\]{}()<>\\'"|^`])/g
See the regex demo
Now, {ignoreme} is matched (and you do not have to use this value), while ([\s[\]{}()<>\\'"|^`]) is captured into Group 1, whose value is the one you need.
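
A small sketch of how Group 1 could be read in JavaScript, using the string from the question. The braces around ignoreme are escaped here as a precaution (an assumption on my part; the unescaped form in the answer behaves the same in non-unicode mode), and the variable names are illustrative.

```javascript
const subject = 'aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd';
const pattern = /\{ignoreme\}|([\s[\]{}()<>\\'"|^`])/g;

const specials = [];
let m;
while ((m = pattern.exec(subject)) !== null) {
  // Group 1 is undefined when the {ignoreme} branch matched, so skip those.
  if (m[1] !== undefined) {
    specials.push({ char: m[1], index: m.index });
  }
}
console.log(specials.map(s => s.char)); // [ ' ', ' ', '{', '}', '}' ]
```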

Matching multiple optional characters depending on each other

I want to match all valid prefixes of substitute followed by other characters, so that
sub/abc/def matches the sub part.
substitute/abc/def matches the substitute part.
subt/abc/def either doesn't match or only matches the sub part, not the t.
My current Regex is /^s(u(b(s(t(i(t(u(te?)?)?)?)?)?)?)?)?/, which works, however this seems a bit verbose.
Is there any better (as in, less verbose) way to do this?

This does the same as the regex in your question:
^s(?:ubstitute|ubstitut|ubstitu|ubstit|ubsti|ubst|ubs|ub|u)?
The above regex always tries to match the longest possible word first. It checks for substitute; if that fails, it moves on to the next alternative, substitut, and so on down to u.
DEMO 1 DEMO 2
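
If typing out the alternation by hand is the part that feels verbose, it can be generated. This is only a sketch: prefixRegex is a hypothetical helper, not something from the answer, and it assumes the word contains only letters so that no regex escaping is needed.

```javascript
// Hypothetical helper: builds ^s(?:ubstitute|ubstitut|...|ub|u)? from a word.
// Assumes the word has no regex metacharacters.
function prefixRegex(word) {
  const head = word[0];
  const tails = [];
  for (let end = word.length; end >= 2; end--) {
    tails.push(word.slice(1, end)); // longest tail first
  }
  return new RegExp('^' + head + '(?:' + tails.join('|') + ')?');
}

const re = prefixRegex('substitute');
console.log(re.source);
console.log('sub/abc/def'.match(re)[0]);        // sub
console.log('substitute/abc/def'.match(re)[0]); // substitute
console.log('subt/abc/def'.match(re)[0]);       // sub
```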

You could use a two-step approach:
1. Find the first word of the subject with this simple pattern: ^(\w+)
2. Use the word extracted in step 1 as your regex pattern, e.g. ^subs, and test it against the word substitute.
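
The two steps might look roughly like this. matchPrefixOf is a made-up name for illustration, and the sketch relies on \w+ only ever capturing word characters, so the extracted word is safe to use as a pattern without escaping.

```javascript
// Hypothetical helper: returns the leading word if it is a prefix of `full`,
// otherwise null.
function matchPrefixOf(subject, full) {
  const first = /^(\w+)/.exec(subject);       // step 1: extract the first word
  if (first === null) return null;
  const asPattern = new RegExp('^' + first[1]); // step 2: use it as a pattern
  return asPattern.test(full) ? first[1] : null;
}

console.log(matchPrefixOf('sub/abc/def', 'substitute'));        // sub
console.log(matchPrefixOf('substitute/abc/def', 'substitute')); // substitute
console.log(matchPrefixOf('subt/abc/def', 'substitute'));       // null
```

Unlike the alternation, this version rejects subt outright instead of matching only its sub part; the question allows either behaviour.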

Javascript regex: is there anyway to write a regex which gives true if backreference is NOT matched

So here is my problem: I'm checking an input of two years separated by a hyphen, like:
2001-2015
To test this, I use the simple regex
/^([0-9]{4})-([0-9]{4})$/
I know the groups aren't needed, and (19|20)[0-9]{2} is a closer match for a basic year expression, but bear with me.
Now, if my requirement was to match the two years only if they are the same, I could have used a backreference like:
/^([0-9]{4})-\1$/
which matches 2000-2000 but not 2000-2014
My actual requirement is exactly the opposite. I want it to match if the years are different but not if they're same. That is, 2000-2014 should match. 2000-2000 should not.
And using the negative of the boolean I find is not an option. I need this for a huuuge regex which is supposed to match a whole lot of different date formats. This is just a part of it.
Is there any way to achieve this?

You can use a negative lookahead to achieve this:
^([0-9]{4})-(?!\1)[0-9]{4}$
Demo
This is almost the same pattern, except it inserts a condition check using the backreference.
(?!\1) will fail if \1 matches at its position.
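
A quick sanity check of the pattern in JavaScript, using the inputs from the question (the variable name is illustrative):

```javascript
const differentYears = /^([0-9]{4})-(?!\1)[0-9]{4}$/;

console.log(differentYears.test('2000-2014')); // true  (years differ)
console.log(differentYears.test('2000-2000')); // false (same year)
console.log(differentYears.test('2001-2015')); // true
```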

You can use negative lookahead:
\b(\d{4})-(?!\1)\d{4}\b
RegEx Demo

Use a negative lookahead, like this:
^([0-9]{4})-(?!\1)[0-9]{4}$
It works on your example.
Explanation: (?!\1) asserts that it is impossible to match \1 at that position. After that, you just add the 4-digit requirement.

Match full sentences skipping spurious dots

I need to match complete sentences ending at the full stop, but I'm stuck on trying to skip false dots.
To keep it simple, I've started with this syntax [^.]+[^ ] which works fine with normal sentences, but, as you can see, it breaks at every dot.
My regex101
So, at the first sentence, the result should be:
Recent studies have described a pattern associated with specific object (e.g., face-related and building-related) in human occipito-temporal cortex.
and so on.

Just use a lookahead to require that the dot ending the match is followed by whitespace or by the end-of-line anchor $.
(.*?\.)(?=\s|$)
DEMO
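
A short sketch of this in JavaScript. The first sentence is the one from the question; the second sentence is invented here just to have something to split.

```javascript
const text = 'Recent studies have described a pattern associated with specific object (e.g., face-related and building-related) in human occipito-temporal cortex. A second sentence follows.';

const sentenceRe = /(.*?\.)(?=\s|$)/g;
const rawSentences = text.match(sentenceRe);

// Every match after the first begins with the space that separated it from
// the previous sentence, so trim before using the results.
const sentences = rawSentences.map(s => s.trim());
console.log(sentences.length); // 2
console.log(sentences[1]);     // A second sentence follows.
```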

Expanding upon this, here is a regex that doesn't use reluctant (lazy) matching and is potentially more efficient:
(?:[^.]+|\.\S)+\.
And if you would like to match only the sentences themselves, without the leading space that the regex from the accepted answer leaves on every match after the first, you can use this:
\S(?:[^.]+|\.\S)+\.
Here is a regex demo.
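
For comparison, a sketch of the \S-prefixed variant on the same kind of input (first sentence from the question, second one invented for illustration). No trimming is needed because each match has to start on a non-space character.

```javascript
const text = 'Recent studies have described a pattern associated with specific object (e.g., face-related and building-related) in human occipito-temporal cortex. A second sentence follows.';

// A dot only continues the sentence when a non-space character follows it.
const noLazy = /\S(?:[^.]+|\.\S)+\./g;
const sentences = text.match(noLazy);
console.log(sentences.length); // 2
console.log(sentences[1]);     // A second sentence follows.
```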

Javascript Regex for all words not between certain characters

I'm trying to return a count of all words NOT between square brackets. So given ..
[don't match these words] but do match these
I get a count of 4 for the last four words.
This works in .net:
\b(?<!\[)[\w']+(?!\])\b
but it won't work in Javascript because it doesn't support lookbehind
Any ideas for a pure js regex solution?

Ok, I think this should work:
\[[^\]]+\](?:^|\s)([\w']+)(?!\])\b|(?:^|\s)([\w']+)(?!\])\b
You can test it here:
http://regexpal.com/
If you need an alternative with text in square brackets coming after the main text, it could be added as a second alternative and the current second one would become third.
It's a bit complicated but I can't think of a better solution right now.
If you need to do something with the actual matches you will find them in the capturing groups.
UPDATE:
Explanation:
So, we've got two options here:
1. \[[^\]]+\](?:^|\s)([\w']+)(?!\])\b
This is saying:
\[[^\]]+\] - match everything in square brackets (don't capture)
(?:^|\s) - followed by line start or a space. Looking at it now, the caret makes no sense here, so this becomes just \s
([\w']+) - match and capture the following word characters, as long as (?!\]) the next character is not a closing bracket. The lookahead is probably unnecessary too, so let's try removing it
\b - and match a word boundary
2. (?:^|\s)([\w']+)(?!\])\b
If option 1 cannot be found, do just the word matching, without looking for square brackets, since the first part already ensured that they are not here.
Ok, so I removed all the things that we don't need (they stayed there because I tried quite a few options before it worked :-) and the revised regex is the one below:
\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b
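
A sketch of reading the capturing groups of the revised regex in JavaScript, on the sample input from the question only. I have not checked it against other bracket layouts, so treat it as an illustration of the group handling rather than a general solution.

```javascript
const bracketAware = /\[[^\]]+\]\s([\w']+)(?!\])\b|(?:^|\s)([\w']+)\b/g;
const text = "[don't match these words] but do match these";

// The word is in Group 1 when the first alternative matched (bracketed text
// followed by a word) and in Group 2 otherwise.
const words = [];
let m;
while ((m = bracketAware.exec(text)) !== null) {
  words.push(m[1] !== undefined ? m[1] : m[2]);
}
console.log(words.length); // 4
console.log(words);        // [ 'but', 'do', 'match', 'these' ]
```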

I would use something like \[[^\]]*\] to remove the words between square brackets, and then split the resulting string on spaces to count the remaining words.
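
In JavaScript that could look roughly like the sketch below. countOutsideBrackets is a made-up name, and the filter step is an addition so that an input with no remaining words counts as 0.

```javascript
// Hypothetical helper: counts whitespace-separated words outside [brackets].
function countOutsideBrackets(text) {
  return text
    .replace(/\[[^\]]*\]/g, ' ') // drop every bracketed group
    .split(/\s+/)                // split on runs of whitespace
    .filter(Boolean)             // discard empty strings at the edges
    .length;
}

console.log(countOutsideBrackets("[don't match these words] but do match these")); // 4
```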

Chris, resurrecting this question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a general question about how to exclude patterns in regex.)
Here's our simple regex (see it at work on regex101, looking at the Group captures in the bottom right panel):
\[[^\]]*\]|(\b\w+\b)
The left side of the alternation matches complete [bracketed groups]. We will ignore these matches. The right side matches and captures words to Group 1, and we know they are the right words because they were not matched by the expression on the left.
This program shows how to use the regex (see the count result in the online demo):
<script>
var subject = '[match ye not these words] but do match these';
var regex = /\[[^\]]*\]|(\b\w+\b)/g;
var group1Caps = [];
var match = regex.exec(subject);
// put Group 1 captures in an array
while (match != null) {
    // Group 1 is only set when the right side of the alternation matched
    if (match[1] != null) group1Caps.push(match[1]);
    match = regex.exec(subject);
}
document.write("<br>*** Number of Matches ***<br>");
document.write(group1Caps.length);
</script>
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...

We Keep Coding

JavaScript is the programming language of the Web.
