Regex to match phrases not containing a palindrome - javascript

Is there a way to match a word that does not contain a palindrome, however long that palindrome may be?
For instance, when looking for 6-character palindromes, foo and bar would match, but xbarrabzz and 1xoxxoxa14 would not.

Use a negative lookahead. For example, for palindromes of length 5 or 6 (three mirrored letters, with the middle letter either shared or doubled):
^(?:(.)(?!(.)(.)\3?\2\1))*$
But you would have to add another lookahead for each length (which I leave as an exercise for the reader).

You can use \b(?:(?!(\w)(\w)\2?\1)\w)+\b.
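It's a simple negative lookahead that checks whether the word contains a structure like xyx or xyyx. Every palindrome of length 3 or more has one of these at its center, so this rejects words containing any palindrome of that length or longer. A 2-letter pair such as the oo in foo is still allowed.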
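To compare the two answers, here is a small sketch that runs both patterns over the words from the question. The extra word aba is an assumption, added only to show where the patterns differ: it contains a 3-letter palindrome but none of length 5 or 6.

```javascript
// First answer: rejects any palindrome of length 5 or 6 (abcba, abccba).
const noPal5or6 = /^(?:(.)(?!(.)(.)\3?\2\1))*$/;

// Second answer: rejects xyx and xyyx, which sit at the center of every
// palindrome of length 3 or more.
const noPal3plus = /\b(?:(?!(\w)(\w)\2?\1)\w)+\b/;

const words = ["foo", "bar", "xbarrabzz", "1xoxxoxa14", "aba"];
for (const w of words) {
  console.log(w, noPal5or6.test(w), noPal3plus.test(w));
}
// prints:
// foo true true
// bar true true
// xbarrabzz false false
// 1xoxxoxa14 false false
// aba true false
```

The \b anchors in the second pattern assume each test string is a single word. To check whole strings that may contain spaces, ^ and $ would be the safer anchors.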


RegExp to match dot but not i.e or e.g

I'd like to match the third dot in this string:
"Test i.e and some more e.g. And"
That is, find the first dot that isn't part of "i.e" or "e.g".
So far, I have
(?!i\.e|e\.g)(\.)
But it still seems to capture all the dots.
Different ideas...
1.) To match a dot at a non-word boundary:
\.\B
2.) Or, if it is always the very last character, just use the end anchor:
\.$
3.) But if you want to match the last dot when other characters follow it, use a lookahead:
\.(?![^.]*\.)
At each dot, the lookahead checks that no other dot follows (with any number of non-dot characters [^.]* in between).
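Here is a small sketch that runs the question's attempt and the three ideas above against the sample string. The variable names are illustrative. The index comments refer to positions in that string, where the dots sit at 6, 24 and 26.

```javascript
const text = "Test i.e and some more e.g. And";

// The question's attempt. The lookahead runs at the dot itself, and text that
// starts with "." can never start with "i.e" or "e.g", so every dot passes.
const attempt = [...text.matchAll(/(?!i\.e|e\.g)(\.)/g)].map((m) => m.index);
// attempt: [6, 24, 26]

// 1.) A dot at a non-word boundary, i.e. not followed by a word character.
const idea1 = text.search(/\.\B/); // 26

// 2.) The end anchor only helps when the dot ends the string; here it doesn't.
const idea2 = /\.$/.test(text); // false

// 3.) The last dot: no other dot anywhere after it.
const idea3 = text.search(/\.(?![^.]*\.)/); // 26
```

String.prototype.matchAll is assumed to be available (ES2020 or later).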
At the time of writing, JavaScript did not support lookbehinds (they were added in ES2018), which made this a hard task to do with the language. If you ask me, I would suggest traversing the text yourself and collecting the dots; this is very simple to do and could well be faster than regexps.
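Here is a minimal sketch of the traversal suggested above. It assumes a "real" dot is one not immediately followed by a letter, digit or underscore, which is the same rule as \.\B. The helper names are hypothetical.

```javascript
// Hypothetical helper: true for the ASCII characters that \w matches.
function isWordChar(ch) {
  return (
    (ch >= "a" && ch <= "z") ||
    (ch >= "A" && ch <= "Z") ||
    (ch >= "0" && ch <= "9") ||
    ch === "_"
  );
}

// Hypothetical helper: walks the string once and collects the indices of
// dots that are not immediately followed by a word character.
function findStandaloneDots(text) {
  const indices = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== ".") continue;
    const next = text[i + 1]; // undefined past the end of the string
    if (next === undefined || !isWordChar(next)) indices.push(i);
  }
  return indices;
}

const dots = findStandaloneDots("Test i.e and some more e.g. And");
// dots: [26], the dot after "e.g"
```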

Regex Lazy mode doesn't work as expected

Given the following string:
FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA
I'm trying to match every substring that contains the letters C, A, B, D, A in that order with the following regex:
C.*?A.*?B.*?D.*?A
The only match I get is
CFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDA
That match is not wrong in itself, but I should be finding CSAYBRHXQQGUDA.
What am I missing?
Any help is appreciated.
A lazy quantifier doesn't mean that the regex tries to match the smallest possible substring. It just means that the quantifier tries to match as few characters as it can and backtracks towards more, as opposed to matching as many characters as it can and backtracking towards fewer.
The starting position is found the same way in both cases: the leftmost one that can produce a match. For example:
x+?y
when matched against:
xxxy
will still match xxxy and not just xy, since a match is possible starting from the first x, with the lazy quantifier backtracking towards more x's.
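Here is a short sketch of the difference on the same string. The lazy quantifier only changes how many x's are tried first, not where the match starts.

```javascript
const s = "xxxy";

const greedyX = s.match(/x+/)[0]; // "xxx": as many x's as possible
const lazyX = s.match(/x+?/)[0];  // "x": as few x's as possible

// Adding y does not move the start: the engine still begins at index 0,
// and x+? grows one x at a time until y can match.
const lazyXY = s.match(/x+?y/);   // lazyXY[0] === "xxxy", lazyXY.index === 0
```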
You can use this regex based on negated character classes:
/C[^C]*?A[^A]*?B[^B]*?D[^D]*?A/
This finds CSAYBRHXQQGUDA in your given input.
(?=(C.*?A.*?B.*?D.*?A))
Put your expression inside a lookahead to get all (overlapping) matches. See demo:
https://regex101.com/r/fM9lY3/46
If you want to find only the shortest matches (no C, A, B or D between the five letters), you can use
C(?:(?!C|A|B|D).)*A(?:(?!C|A|B|D).)*B(?:(?!C|A|B|D).)*D(?:(?!C|A|B|D).)*A
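Here is a sketch that runs the question's regex and the three answers above on the question's input. The variable names are illustrative. It assumes a runtime with String.prototype.matchAll (ES2020).

```javascript
const input =
  "FFSMQWUNUPZRJMTHACFELGHDZEJWFDWVPYOZEVEJKQWHQAHOCIYWGVLPSHFESCGEUCJGYLGDWPIWIDWZZXRUFXERABQJOXZALQOCSAYBRHXQQGUDADYSORTYZQPWGMBLNAQOFODSNXSZFURUNPMZGHTA";

// The question's lazy regex: the leftmost C that can start a match wins,
// so the result begins at "CFELGHD..." and ends at "...GUDA".
const lazyMatch = input.match(/C.*?A.*?B.*?D.*?A/)[0];

// Negated classes: each gap may not contain the letter it is searching for.
const negatedMatch = input.match(/C[^C]*?A[^A]*?B[^B]*?D[^D]*?A/)[0];

// A capture inside a lookahead yields one match per possible start (each C).
const allMatches = [...input.matchAll(/(?=(C.*?A.*?B.*?D.*?A))/g)].map((m) => m[1]);
const shortest = allMatches.reduce((a, b) => (b.length < a.length ? b : a));

// No C, A, B or D allowed between the five letters.
const tightest = input.match(
  /C(?:(?!C|A|B|D).)*A(?:(?!C|A|B|D).)*B(?:(?!C|A|B|D).)*D(?:(?!C|A|B|D).)*A/
)[0];

// negatedMatch, shortest and tightest are all "CSAYBRHXQQGUDA" for this input.
```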

Matching multiple optional characters depending on each other

I want to match all valid prefixes of substitute followed by other characters, so that
sub/abc/def matches the sub part.
substitute/abc/def matches the substitute part.
subt/abc/def either doesn't match or only matches the sub part, not the t.
My current regex is /^s(u(b(s(t(i(t(u(te?)?)?)?)?)?)?)?)?/, which works; however, it seems a bit verbose.
Is there any better (as in, less verbose) way to do this?
This does the same as the regex in your question:
^s(?:ubstitute|ubstitut|ubstitu|ubstit|ubsti|ubst|ubs|ub|u)?
The regex above always tries the longest possible word first. It checks for substitute; if that matches, it is used. Otherwise the engine moves on to the next alternative, ubstitut (i.e. substitut), and so on down to u.
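If the alternation still feels verbose to write by hand, it could be generated. The prefixRegex helper below is a hypothetical sketch that assumes the word contains no regex metacharacters. It rebuilds exactly the pattern from this answer and checks it against the three examples from the question.

```javascript
// Hypothetical helper: builds ^s(?:ubstitute|ubstitut|...|u)? for any word,
// listing longer alternatives first so the longest matching prefix wins.
// Assumes `word` contains no regex metacharacters.
function prefixRegex(word) {
  const rest = word.slice(1);
  const alternatives = [];
  for (let n = rest.length; n >= 1; n--) {
    alternatives.push(rest.slice(0, n));
  }
  return new RegExp("^" + word[0] + "(?:" + alternatives.join("|") + ")?");
}

const re = prefixRegex("substitute");
// re.source === "^s(?:ubstitute|ubstitut|ubstitu|ubstit|ubsti|ubst|ubs|ub|u)?"

const results = ["sub/abc/def", "substitute/abc/def", "subt/abc/def"].map(
  (s) => s.match(re)[0]
);
// results: ["sub", "substitute", "sub"]
```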
You could use a two-step approach:
1.) Find the first word of the subject using this simple pattern: ^(\w+)
2.) Use the word extracted in step 1 as your regex pattern, e.g. ^subs, against the word substitute.
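Here is a minimal sketch of the two-step idea. The function name isPrefixCommand and its default command name are hypothetical. Unlike the single regex, a word like subt simply doesn't match, which the question allows.

```javascript
// Hypothetical helper. Step 1: take the first word of the subject with ^(\w+).
// Step 2: use that word as an anchored pattern against the full command name.
function isPrefixCommand(subject, command = "substitute") {
  const m = subject.match(/^(\w+)/);
  if (!m) return false;
  // \w+ only captures [A-Za-z0-9_], so the word is safe to use as a pattern.
  return new RegExp("^" + m[1]).test(command);
}

isPrefixCommand("sub/abc/def");        // true
isPrefixCommand("substitute/abc/def"); // true
isPrefixCommand("subt/abc/def");       // false
```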

Match full sentences skipping spurious dots

I need to match complete sentences ending at the full stop, but I'm stuck on trying to skip false dots.
To keep it simple, I started with [^.]+[^ ], which works fine with normal sentences but, as you can see, breaks at every dot.
So, at the first sentence, the result should be:
Recent studies have described a pattern associated with specific object (e.g., face-related and building-related) in human occipito-temporal cortex.
and so on.
Just use a lookahead to require that the match ends at a dot followed by whitespace or by the end-of-line anchor $.
(.*?\.)(?=\s|$)
Expanding on this, here is a regex that doesn't use reluctant (lazy) matching and is potentially more efficient:
(?:[^.]+|\.\S)+\.
And if you would like to match the sentences themselves and drop the leading space that every match after the first gets with the regex from the accepted answer, you can use this:
\S(?:[^.]+|\.\S)+\.
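Here is a sketch comparing the three patterns on a small sample. The first sentence is the one from the question; the second sentence is invented for illustration.

```javascript
// Sample text: sentence 1 is from the question, sentence 2 is made up.
const text =
  "Recent studies have described a pattern associated with specific object " +
  "(e.g., face-related and building-related) in human occipito-temporal cortex. " +
  "A second sentence follows here.";

// Accepted answer: lazy match up to a dot followed by whitespace or the end.
// The second match keeps the space before "A".
const lazySentences = text.match(/(.*?\.)(?=\s|$)/g);

// No lazy quantifier: runs of non-dots, or a dot glued to a non-space.
// The second match also starts with the space.
const greedySentences = text.match(/(?:[^.]+|\.\S)+\./g);

// Same, but each match must start on a non-space character.
const trimmedSentences = text.match(/\S(?:[^.]+|\.\S)+\./g);
// trimmedSentences[1] === "A second sentence follows here."
```

One caveat, as a general property of backtracking engines rather than something measured here: (?:[^.]+|\.\S)+ nests one quantifier inside another. On long input that never reaches a final dot, it can backtrack a great deal before failing.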

Since "a+?" is Lazy, Why does "a+?b" Match "aaab"?

While learning regular expressions in JavaScript from JavaScript: The Definitive Guide, I was confused by this passage:
But /a+?/ matches one or more occurrences of the letter a, matching as
few characters as necessary. When applied to the same string, this
pattern matches only the first letter a.
…
Now let’s use the nongreedy version: /a+?b/. This should match the
letter b preceded by the fewest number of a’s possible. When applied
to the same string “aaab”, you might expect it to match only one a and
the last letter b. In fact, however, this pattern matches the entire
string, just like the greedy version of the pattern.
Why is this so?
This is the explanation from the book:
This is because regular-expression pattern matching is done by finding
the first position in the string at which a match is possible. Since a
match is possible starting at the first character of the
string, shorter matches starting at subsequent characters are never
even considered.
I don't understand. Can anyone give me a more detailed explanation?
Okay, so you have your search space, "aaabc", and your pattern, /a+?b/
Does /a+?b/ match "a"? No.
Does /a+?b/ match "aa"? No.
Does /a+?b/ match "aaa"? No.
Does /a+?b/ match "aaab"? Yes.
Since you're matching literal characters and not any sort of wildcard, the regular expression a+?b is effectively the same as a+b anyway. The only type of sequence either one will match is a string of one or more a characters followed by a single b character. The non-greedy modifier makes no difference here, as the only thing an a can possibly match is an a.
The non-greedy qualifier becomes interesting when it's applied to something that can take on lots of different values, like . (edit: or cases where there's interesting stuff to the left of something like a+?).
Edit: if you're expecting a+?b to match just the last a before the b in aaab, that's not how it works. Searching for a pattern in a string implicitly means searching for the earliest occurrence of the pattern. Thus, although starting from the last a does give a substring that matches the pattern, it's not the first substring that matches.
The Engine Attempts a Match at the Beginning of the String
Can anyone give me a more detailed explanation?
Yes.
In short: a lazy quantifier such as +? does not look for the shortest match globally, at the level of the entire string, but locally, from the engine's current position in the string.
How the Engine Works
When you try a regex against the string aaab, the engine first tries to find a match starting at the very first position in the string, which is the position before the first a. If the engine cannot find a match at the first position, it moves on and tries again from the second position (between the first and second a).
So can a match be found by the regex a+?b at the first position? Yes.
a matches the first a
The +? quantifier tells the engine to match the fewest a chars necessary. Since we are looking to return a match, "necessary" means that the following tokens (in this case, b) must be allowed to match. Here, the fewest a chars that let the b match is all of the remaining a chars.
b matches
In detail, the second point is a bit more complex (the engine tries to match b against the second a, fails, backtracks...), but you don't need to worry about that.
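Here is a sketch that makes the difference between "leftmost" and "shortest" concrete. The shortestMatch helper is hypothetical. It deliberately does what the engine does NOT do: it tries the pattern at every start position (using the sticky y flag) and keeps the shortest result.

```javascript
// What the engine does: try position 0 first, and only move on if no match
// at all is possible there.
const m = "aaab".match(/a+?b/);   // m[0] === "aaab", m.index === 0
const greedy = "aaab".match(/a+b/)[0]; // "aaab", same as the lazy version

// Hypothetical helper, NOT how the engine works: try the pattern anchored at
// every start position and keep the shortest match found.
function shortestMatch(pattern, text) {
  const sticky = new RegExp(pattern.source, "y"); // "y" = match only at lastIndex
  let best = null;
  for (let i = 0; i < text.length; i++) {
    sticky.lastIndex = i;
    const found = sticky.exec(text);
    if (found && (best === null || found[0].length < best.length)) {
      best = found[0];
    }
  }
  return best;
}

shortestMatch(/a+?b/, "aaab"); // "ab", which starts at index 2, not 0
```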
'?' after a+ means the minimum number of characters needed to satisfy the expression. /a+/ means one 'a', or as many as it can find before some other character. To satisfy /a+?/ (since it's non-greedy), a single 'a' is enough.
To satisfy /a+?b/, since there is a 'b' at the end, it needs to match one or more 'a's and then reach that 'b'. /a+/ doesn't have to reach a 'b' because the regex doesn't ask for one; /a+?b/ does.
Just think about it. What other meaning /a+?b/ could have?
Hope this helps
