Negative lookahead Regular Expression

I want to match all strings ending in ".htm" unless it ends in "foo.htm". I'm generally decent with regular expressions, but negative lookaheads have me stumped. Why doesn't this work?
/(?!foo)\.htm$/i.test("/foo.htm"); // returns true. I want false.
What should I be using instead? I think I need a "negative lookbehind" expression (if JavaScript supported such a thing, which I know it doesn't).

The problem is pretty simple really. This will do it:
/^(?!.*foo\.htm$).*\.htm$/i.test("/foo.htm"); // returns false

What you are describing (your intention) is a negative look-behind, and Javascript has no support for look-behinds.
Look-aheads look forward from the position at which they are placed, and you've placed yours just before the dot. So what you've got actually says "anything ending in .htm as long as the three characters starting at that position (.ht) are not foo", which is always true.
Usually, the substitute for negative look-behinds is to match more than you need, and extract only the part you actually do need. This is hacky, and depending on your precise situation you can probably come up with something else, but something like this:
// Checks that the last 3 characters before the dot are not foo:
/(?!foo).{3}\.htm$/i.test("/foo.htm"); // returns false
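To make the position of each assertion concrete, here is a minimal sketch comparing the question's pattern, the anchored lookahead, and the .{3} workaround. The helper name endsInHtmNotFoo is an assumption made for illustration only.

```javascript
// Original attempt: the lookahead runs at the position of the dot,
// where the next three characters are ".ht", never "foo".
const original = /(?!foo)\.htm$/i;

// Anchored version: the lookahead runs at the start of the string and
// scans ahead to check that the string does not end in "foo.htm".
const anchored = /^(?!.*foo\.htm$).*\.htm$/i;

// The .{3} workaround: needs at least three characters before the dot.
const threeChars = /(?!foo).{3}\.htm$/i;

// Hypothetical helper wrapping the anchored pattern.
function endsInHtmNotFoo(path) {
  return anchored.test(path);
}

console.log(original.test("/foo.htm"));   // true (not what was wanted)
console.log(endsInHtmNotFoo("/foo.htm")); // false
console.log(endsInHtmNotFoo("/bar.htm")); // true
console.log(threeChars.test("/foo.htm")); // false
console.log(threeChars.test("/a.htm"));   // false: only two characters before the dot
console.log(endsInHtmNotFoo("/a.htm"));   // true
```

Note that the .{3} version also rejects short names such as "/a.htm", which may or may not matter for your input.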

As mentioned, JavaScript does not support negative look-behind assertions.
But you could use a workaround:
/(foo)?\.htm$/i.test("/foo.htm") && RegExp.$1 != "foo";
This will match everything that ends with .htm but it will store "foo" into RegExp.$1 if it matches foo.htm, so you can handle it separately.

Like Renesis mentioned, "lookbehind" is not supported in JavaScript, so maybe just use two regexps in combination:
!/foo\.htm$/i.test(teststring) && /\.htm$/i.test(teststring)

This answer probably arrives a little later than necessary, but I'll leave it here in case someone runs into the same issue now (7 years, 6 months after the question was asked).
Lookbehinds are now part of the ECMAScript 2018 standard and are supported at least in the latest version of Chrome. However, you can solve the puzzle with or without them.
A solution with negative lookahead:
let testString = `html.htm app.htm foo.tm foo.htm bar.js 1to3.htm _.js _.htm`;
testString.match(/\b(?!foo)[\w-.]+\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]
A solution with negative lookbehind:
testString.match(/\b[\w-.]+(?<!foo)\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]
A solution with (technically) positive lookahead:
testString.match(/\b(?=[^f])[\w-.]+\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]
etc.
On this sample input, all these RegExps return the same matches, although they are not strictly equivalent: the lookahead rejects any name that starts with "foo", the lookbehind rejects any name whose part before ".htm" ends with "foo", and the last pattern rejects any name that starts with "f". Roughly, the message they pass to the JS engine is the following.
Please find in this string all sequences of characters that:
Are separated from other text (like words);
Consist of one or more letters of the English alphabet, underscores, hyphens, dots or digits;
End with ".htm";
Apart from that, the part of the sequence before ".htm" can be anything but "foo".
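The three patterns agree on the sample string above but diverge on other input. The sketch below runs them on a second, made-up sample (the string and variable names are assumptions for illustration); the lookbehind pattern needs an engine with ES2018 lookbehind support.

```javascript
const lookahead  = /\b(?!foo)[\w-.]+\.htm\b/gi;   // rejects names starting with "foo"
const lookbehind = /\b[\w-.]+(?<!foo)\.htm\b/gi;  // rejects names ending in "foo" before ".htm"
const firstChar  = /\b(?=[^f])[\w-.]+\.htm\b/gi;  // rejects names starting with "f"

// Hypothetical sample chosen so that the three patterns disagree.
const sample = "fab.htm xfoo.htm foobar.htm";

console.log(sample.match(lookahead));  // ["fab.htm", "xfoo.htm"]
console.log(sample.match(lookbehind)); // ["fab.htm", "foobar.htm"]
console.log(sample.match(firstChar));  // ["xfoo.htm"]
```

Only the lookbehind version checks what immediately precedes ".htm", which is what the original question asked for.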

String.prototype.endsWith (ES6)
console.log( /* !(not)endsWith */
!"foo.html".endsWith("foo.htm"), // true
!"barfoo.htm".endsWith("foo.htm"), // false (here you go)
!"foo.htm".endsWith("foo.htm"), // false (here you go)
!"test.html".endsWith("foo.htm"), // true
!"test.htm".endsWith("foo.htm") // true
);
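The calls above only show the negated half of the test. A complete check also needs the ".htm" suffix test; here is a minimal sketch, where the helper name isAllowedHtm is an assumption and lowercasing is used to mimic the /i flag from the question.

```javascript
// Hypothetical helper: true for names ending in ".htm" but not in "foo.htm".
// Lowercasing first mimics the case-insensitive /i flag used in the question.
function isAllowedHtm(name) {
  const lower = name.toLowerCase();
  return lower.endsWith(".htm") && !lower.endsWith("foo.htm");
}

console.log(isAllowedHtm("test.htm"));   // true
console.log(isAllowedHtm("/foo.htm"));   // false
console.log(isAllowedHtm("barFOO.HTM")); // false
console.log(isAllowedHtm("foo.html"));   // false (does not end in ".htm")
```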

You could emulate the negative lookbehind with something like
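/^(.|..|.*[^f]..|.*f[^o].|.*fo[^o])\.htm$/ (note the leading ^; without it, the single-character alternative can match just the last "o" of foo.htm), but a programmatic approach would be better.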

Related

Regex expression for exactly known pattern without "cutting into" the string not working

I am currently developing a web-application where I work with java, javascript, html, jquery, etc. and at some point I need to check that whether an input matches a known pattern and only proceed if it is true.
The pattern should be [at least one but max 3 numbers between 0-9]/[exactly 4 numbers between 0-9], so the only acceptable variations should be like
1/2014 or 23/2015 or 123/2016.
and nothing else, and I CANNOT accept something like 1234/3012 or anything else, and this is my problem right here, it accepts everything in which it can find the above pattern, so like from 12345/6789 it accepts and saves 345/6789.
I am a total newbie with regex, so I checked out http://regexr.com and this is the code I have in my javascript:
$.validator.addMethod("hatarozat", function(value, element) {
return (this.optional(element) || /[0-9]{1,3}(?:\/)[0-9]{4}/i.test(value));
}, "Hibás határozat szám!");
So this is my regex: /[0-9]{1,3}(?:\/)[0-9]{4}/i
which I built up using the above website. What could be the problem, or how can I achieve what I described? I tried /^[0-9]{1,3}(?:\/)[0-9]{4}$/i but this doesn't seem to work. Please, can anyone help me? I have everything else done and am getting pretty stressed over something that looks so simple yet I cannot solve. Thank you!

Your last regex with the anchors (^ and $) is correct. What prevents your code from working is the this.optional(element) || part. If that call returns true (which is probably the case here), the || short-circuits: the whole expression is true and the regex is never checked, so no error is shown.
So, use
return /^[0-9]{1,3}\/[0-9]{4}$/.test(value);
Note you do not need the (?:...) with \/ as the grouping does not do anything important here and is just redundant. The anchors are important, since you want the whole string to match the pattern (and ^ anchors the regex at the start of the string and $ does that at the end of the string.)
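The effect of the anchors can be checked outside the validator plugin. This is a minimal sketch; the function name isValidHatarozat is an assumption, not part of the original code.

```javascript
const unanchored = /[0-9]{1,3}\/[0-9]{4}/;
const anchored = /^[0-9]{1,3}\/[0-9]{4}$/;

// Hypothetical stand-alone validator using the anchored pattern.
function isValidHatarozat(value) {
  return anchored.test(value);
}

// Without anchors the regex finds the pattern inside a longer string.
console.log("12345/6789".match(unanchored)[0]); // "345/6789"

console.log(isValidHatarozat("1/2014"));     // true
console.log(isValidHatarozat("123/2016"));   // true
console.log(isValidHatarozat("1234/3012"));  // false
console.log(isValidHatarozat("12345/6789")); // false
```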

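You need to use one of the following special characters in your regular expression:
^ and $
or \b
so either of these two regexps will work (the ^ and $ version requires the whole input to match, while the \b version also accepts the pattern surrounded by other text):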
/\b[0-9]{1,3}(?:\/)[0-9]{4}\b/i;
or
/^[0-9]{1,3}(?:\/)[0-9]{4}$/i

Greedy Regex for varying number of new line [duplicate]

var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre.*?<\/pre>/gm );
alert(arr); // null
I'd want the PRE block be picked up, even though it spans over newline characters. I thought the 'm' flag does it. Does not.
Found the answer here before posting. Since I thought I knew JavaScript (read three books, worked hours) and there wasn't an existing solution on SO, I'll dare to post anyway. throw stones here
So the solution is:
var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[\s\S]*?<\/pre>/gm );
alert(arr); // <pre>...</pre> :)
Does anyone have a less cryptic way?
Edit: this is a duplicate but since it's harder to find than mine, I don't remove.
It proposes [^] as a "multiline dot". What I still don't understand is why [.\n] does not work. Guess this is one of the sad parts of JavaScript..

DON'T use (.|[\r\n]) instead of . for multiline matching.
DO use [\s\S] instead of . for multiline matching
Also, avoid greediness where not needed by using *? or +? quantifier instead of * or +. This can have a huge performance impact.
See the benchmark I have made: https://jsben.ch/R4Hxu
Using [^]: fastest
Using [\s\S]: 0.83% slower
Using (.|\r|\n): 96% slower
Using (.|[\r\n]): 96% slower
NB: You can also use [^], but a comment below advises against it.

[.\n] does not work because . has no special meaning inside of [], it just means a literal .. (.|\n) would be a way to specify "any character, including a newline". If you want to match all newlines, you would need to add \r as well to include Windows and classic Mac OS style line endings: (.|[\r\n]).
That turns out to be somewhat cumbersome, as well as slow, (see KrisWebDev's answer for details), so a better approach would be to match all whitespace characters and all non-whitespace characters, with [\s\S], which will match everything, and is faster and simpler.
In general, you shouldn't try to use a regexp to match the actual HTML tags. See, for instance, these questions for more information on why.
Instead, try actually searching the DOM for the tag you need (using jQuery makes this easier, but you can always do document.getElementsByTagName("pre") with the standard DOM), and then search the text content of those results with a regexp if you need to match against the contents.
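The character-class point above can be verified directly. This is a small sketch using the string from the question; the variable names are chosen for illustration, and the s flag requires an ES2018-capable engine.

```javascript
const html = "<pre>aaaa\nbbb\nccc</pre>ddd";

// Inside [], the dot is a literal ".", so this class matches only "." or "\n".
const literalDot = /<pre>[.\n]*?<\/pre>/;

// [\s\S] matches any character, including line breaks.
const anyChar = /<pre>[\s\S]*?<\/pre>/;

console.log(literalDot.test(html));               // false
console.log(literalDot.test("<pre>..\n.</pre>")); // true
console.log(anyChar.exec(html)[0] === "<pre>aaaa\nbbb\nccc</pre>"); // true

// A plain dot stops at line breaks unless the s (dotAll) flag is set.
console.log(/<pre>.*?<\/pre>/.test(html));  // false
console.log(/<pre>.*?<\/pre>/s.test(html)); // true
```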

You do not specify your environment and version of JavaScript (ECMAScript), and I realise this post was from 2009, but just for completeness:
With the release of ECMAScript 2018 we can now use the s flag to cause . to match \n (see https://stackoverflow.com/a/36006948/141801).
Thus:
let s = 'I am a string\nover several\nlines.';
console.log('String: "' + s + '".');
let r = /string.*several.*lines/s; // Note 's' modifier
console.log('Match? ' + r.test(s)); // 'test' returns true
This is a recent addition and will not work in many current environments; for example, Node v8.7.0 does not seem to recognise it. It works in Chromium, though, and I'm using it in a TypeScript test I'm writing. Presumably it will become more mainstream as time goes by.

Now there's the s (single line) modifier, which lets the dot match newlines as well :)
\s will also match new lines :D
Just add the s behind the slash
/<pre>.*?<\/pre>/gms

[.\n] doesn't work, because dot in [] (by regex definition; not javascript only) means the dot-character. You can use (.|\n) (or (.|[\n\r])) instead.

I have tested it (Chrome) and it's working for me (both [^] and [^\0]), by changing the dot (.) with either [^\0] or [^] , because dot doesn't match line break (See here: http://www.regular-expressions.info/dot.html).
var ss= "<pre>aaaa\nbbb\nccc</pre>ddd";
var arr= ss.match( /<pre[^\0]*?<\/pre>/gm );
alert(arr); //Working

In addition to above-said examples, it is an alternate.
^[\\w\\s]*$
Where \w matches word characters and \s matches whitespace (including newlines)

[\\w\\s]*
This one was beyond helpful for me, especially for matching multiple things that include new lines, every single other answer ended up just grouping all of the matches together.

RegEx inner content

Using JavaScript, I'm looking to pinpoint text that's inside two other strings WITHOUT including those strings. For example:
input: ONE example TWO
regular expression: (?=ONE).+(?=TWO)
matches: ONE example
I want: example
I'm really surprised that the question mark (which is supposed to just include that string in the query but not the result) works at the end of the string, but not at the start.

Ah-ha! I figured it out.
For example, here's how to get the text inside parentheses without the parentheses themselves:
(?<=\().+(?=\))
Here's a nice reference: http://www.regular-expressions.info/lookaround.html
Part of my confusion was javascript's fault. It evidently doesn't support "lookbehinds" natively. I found this workaround though:
http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript

(I use Python's re module to show the examples -- exactly how to do this depends on your regexp implementation [some don't have groups, for example -- or backreferences])
Use a backwards assertion, not a forward assertion, for the first assertion.
>>> re.search(r"(?<=ONE).+(?=TWO)", "ONE x a b TWO").group()
' x a b '
The problem is that the zero width assertion (?=ONE) matches the text "ONE", but doesn't "consume" it -- i.e. it just checks that it's there, but leaves the string as-is. Then the .+ starts reading text, and does consume it.
Backwards assertions don't look ahead, they look behind, so .+ doesn't get run until whatever is behind it is "ONE".
It is probably better not to bother with these at all, but use groups. Consider:
>>> re.search(r"ONE(.+)TWO", "ONE x a b TWO").group(1)
' x a b '
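Since the question is about JavaScript, here is a rough equivalent of the two Python examples above. It is a sketch only: the variable names are made up, and the lookbehind variant assumes an engine with ES2018 support.

```javascript
const input = "ONE example TWO";

// Capture group: match the delimiters too, but read only group 1.
const grouped = input.match(/ONE(.+)TWO/);
const inner = grouped ? grouped[1] : null;
console.log(JSON.stringify(inner)); // " example "
console.log(inner.trim());          // example

// Lookbehind plus lookahead: the delimiters are checked but not consumed.
const lookaround = input.match(/(?<=ONE).+(?=TWO)/);
console.log(JSON.stringify(lookaround[0])); // " example "
```

The capture-group form works in every JavaScript engine, which is a reason to prefer it when older browsers matter.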

Javascript Regular Expressions Lookbehind Failing

I am hoping that this will have a pretty quick and simple answer. I am using regular-expressions.info to help me get the right regular expression to turn URL-encoded, ISO-8859-1 pound sign ("%A3"), into a URL-encoded UTF-8 pound sign ("%C2%A3").
In other words I just want to swap %A3 with %C2%A3, when the %A3 is not already prefixed with %C2.
So I would have thought the following would work:
Regular Expression: (?!(\%C2))\%A3
Replace With: %C2%A3
But it doesn't and I can't figure out why!
I assume my syntax is just slightly wrong, but I can't figure it out! Any ideas?
FYI - I know that the following will work (and have used this as a workaround in the meantime), but really want to understand why the former doesn't work.
Regular Expression: ([^\%C2])\%A3
Replace With: $1%C2%A3
TIA!

Why not just replace ((%C2)?%A3) with %C2%A3, making the prefix an optional part of the match? It means that you're "replacing" text with itself even when it's already right, but I don't foresee a performance issue.
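A short sketch of this idea, next to the pattern from the question to show why that one fails: its lookahead is evaluated at the start of %A3, where the next characters are never %C2. The helper name normalizePound is an assumption for illustration.

```javascript
// The question's pattern: at the position of "%A3" the lookahead sees "%A3",
// so the assertion always passes and the prefix is added a second time.
const broken = /(?!%C2)%A3/g;
console.log("%C2%A3".replace(broken, "%C2%A3")); // "%C2%C2%A3"

// Hypothetical helper: make the prefix an optional part of the match.
function normalizePound(encoded) {
  return encoded.replace(/(?:%C2)?%A3/g, "%C2%A3");
}

console.log(normalizePound("price=%A310"));    // "price=%C2%A310"
console.log(normalizePound("price=%C2%A310")); // "price=%C2%A310"
console.log(normalizePound("%A3%C2%A3"));      // "%C2%A3%C2%A3"
```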

Unfortunately, the (?!) syntax is negative lookahead. To the best of my knowledge, JavaScript does not support negative lookbehind.
What you could do is go forward with the replacement anyway, and end up with %C2%C2%A3 strings, but these could easily be converted in a second pass to the desired %C2%A3.

You could replace
(^.?.?|(?!%C2)...)%A3
with
$1%C2%A3

I would suggest you use the functional form of Javascript String.replace (see the section "Specifying a function as a parameter"). This lets you put arbitrary logic, including state if necessary, into a regexp-matching session. For your case, I'd use a simpler regexp that matches a superset of what you want, then in the function call you can test whether it meets your exact criteria, and if it doesn't then just return the matched string as is.
The only problem with this approach is that if you have overlapping potential matches, you have the possibility of missing the second match, since there's no way to return a value to tell the replace() method that it isn't really a match after all.
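One way this might look for the %A3 case is sketched below. It matches every %A3 and uses the offset argument that replace() passes to the callback to inspect the preceding text. The function name fixPoundSign is hypothetical.

```javascript
// Hypothetical helper: add the %C2 prefix only where it is missing.
function fixPoundSign(encoded) {
  // With no capture groups, the callback receives (match, offset, wholeString).
  return encoded.replace(/%A3/g, (match, offset, whole) => {
    // Look at the (up to) three characters before the match.
    const before = whole.slice(Math.max(0, offset - 3), offset);
    // Returning the match unchanged means "not really a match after all".
    return before === "%C2" ? match : "%C2%A3";
  });
}

console.log(fixPoundSign("cost=%A35"));      // "cost=%C2%A35"
console.log(fixPoundSign("cost=%C2%A35"));   // "cost=%C2%A35"
console.log(fixPoundSign("%A3 and %C2%A3")); // "%C2%A3 and %C2%A3"
```

Because the regex here matches only %A3 itself, potential matches cannot overlap, which sidesteps the caveat mentioned above for this particular case.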

Why is my RegExp construction not accepted by JavaScript?

I'm using a RegExp to validate some user input on an ASP.NET web page. It's meant to enforce the construction of a password (i.e. between 8 and 20 long, at least one upper case character, at least one lower case character, at least one number, at least one of the characters ##!$% and no use of letters L or O (upper or lower) or numbers 0 and 1. This RegExp works fine in my tester (Expresso) and in my C# code.
This is how it looks:
(?-i)^(?=.{8,20})(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])
(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]*$
(Line break added for formatting)
However, when I run the code it lives in in IE6 or IE7 (haven't tried other browsers as this is an internal app and we're a Microsoft shop), I get a runtime error saying 'Syntax error in regular expression'. That's it - no further information in the error message aside from the line number.
What is it about this that JavaScript doesn't like?

Well, there are two ways of defining a Regex in Javascript:
a. Through a Regexp object constructor:
var re = new RegExp("pattern","flags");
re.test(myTestString);
b. Using a regular expression literal:
var re = /pattern/flags;
You should also note that JS does not support some of the tenets of Regular Expressions. For a non-comprehensive list of features unsupported in JS, check out the regular-expressions.info site.
Specifically speaking, you appear to be setting some flags on the expression (for example, the case insensitive flag). I would suggest that you use the /i flag (as indicated by the syntax above) instead of using (?-i)
That would make your Regex as follows (Positive Lookahead appears to be supported):
/^(?=.{8,20})(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]*$/i;
For a very good article on the subject, check out Regular Expressions in JavaScript.
Edit (after Howard's comment)
If you are simply assigning this Regex pattern to a RegularExpressionValidator control, then you will not have the ability to set Regex options (such as ignore case). Also, you will not be able to use the Regex literal syntax supported by Javascript. Therefore, the only option that remains is to make your pattern intrinsically case insensitive. For example, [a-h] would have to be written as [A-Ha-h]. This would make your Regex quite long-winded, I'm sorry to say.
Here is a solution to this problem, though I cannot vouch for its legitimacy. Some other options that come to mind may be to turn off client-side validation altogether and validate exclusively on the server. This will give you access to the full Regex flavour implemented by the System.Text.RegularExpressions.Regex object. Alternatively, use a CustomValidator and create your own JS function which applies the Regex match using the patterns that I (and others) have suggested.

I'm not familiar with C#'s regular expression syntax, but is this (at the start)
(?-i)
meant to turn the case insensitivity pattern modifier on? If so, that's your problem. JavaScript doesn't support specifying pattern modifiers inside the expression. There are two ways to do this in JavaScript:
var re = /pattern/i
var re = new RegExp('pattern','i');
Give one of those a try, and your expression should be happy.

As Cerberus mentions, (?-i) is not supported in JavaScript regexps. So, you need to get rid of that and use /i. Something to keep in mind is that there is no standard for regular expression syntax; it is different in each language, so testing in something that uses the .NET regular expression engine is not a valid test of how it will work in JavaScript. Instead, try and look for a reference on JavaScript regular expressions, such as this one.
Your check for 8-20 characters also does not do what you intend. The lookahead ensures that there are at least 8 characters, but it does not limit the string to 20, since the character class with the Kleene star (the * operator) at the end can match as many characters as are provided. What you want instead is to replace the * at the end with {8,20} and remove the length lookahead from the beginning.
var re = /^(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]{8,20}$/i;
On the other hand, I'm not really sure why you would want to restrict the length of passwords, unless there's a hard database limit (which there shouldn't be, since you shouldn't be storing passwords in plain text in the database, but instead hashing them down to something fixed size using a secure hash algorithm with a salt). And as mentioned, I don't see a reason to be so restrictive on the set of characters you allow. I'd recommend something more like this:
var re = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[##!$%])[a-zA-Z0-9##!$%]{8,}$/i;
Also, why would you forbid 1, 0, L and O from your passwords (and it looks like you're trying to forbid I as well, which you forgot to mention)? This will make it very hard for people to construct good passwords, and since you never see a password as you type it, there's no reason to worry about letters which look confusingly similar. If you want to have a more permissive regexp:
var re = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[##!$%]).{8,}$/i;
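The point about the length lookahead can be checked with a stripped-down sketch. The character class is simplified to [a-z] here purely for illustration; it is not the password rule from the question.

```javascript
// Lookahead without an end anchor: only guarantees at least 8 characters.
const uncapped = /^(?=.{8,20})[a-z]*$/;

// Quantifier on the consuming part: enforces both bounds.
const capped = /^[a-z]{8,20}$/;

// Alternative: keep the lookahead but anchor it with $.
const cappedLookahead = /^(?=.{8,20}$)[a-z]*$/;

const thirty = "a".repeat(30);

console.log(uncapped.test(thirty));          // true (upper bound not enforced)
console.log(capped.test(thirty));            // false
console.log(cappedLookahead.test(thirty));   // false
console.log(uncapped.test("a".repeat(7)));   // false
console.log(capped.test("a".repeat(8)));     // true
```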

Are you enclosing the regexp in / / characters?
var regexp = /[]/;
return regexp.test();

(?-i)
Doesn't exist in JS Regexp. Flags can be specified as “new RegExp('pattern', 'i')”, or literal syntax “/pattern/i”.
(?=
Exists in modern implementations of JS Regexp, but is dangerously buggy in IE. Lookahead assertions should be avoided in JS for this reason.
between 8 and 20 long, at least one upper case character, at least one lower case character, at least one number, at least one of the characters ##!$% and no use of letters L or O (upper or lower) or numbers 0 and 1.
Do you have to do this in RegExp, and do you have to put all the conditions in one RegExp? Because those are easy conditions to match using multiple RegExps, or even simple string matching:
if (
s.length<8 || s.length>20 ||
s==s.toLowerCase() || s==s.toUpperCase() ||
s.indexOf('0')!=-1 || s.indexOf('1')!=-1 ||
s.toLowerCase().indexOf('l')!=-1 || s.toLowerCase().indexOf('o')!=-1 ||
(s.indexOf('#')==-1 && s.indexOf('!')==-1 && s.indexOf('$')==-1 && s.indexOf('%')==-1)
)
alert('Bad password!');
(These are really cruel and unhelpful password rules if meant for end-users BTW!)

I would use this regular expression:
/(?=[^2-9]*[2-9])(?=[^a-hj-km-np-z]*[a-hj-km-np-z])(?=[^A-HJ-KM-NP-Z]*[A-HJ-KM-NP-Z])(?=[^##!$%]*[##!$%])^[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]{8,}$/
The [^a-z]*[a-z] will make sure that the match is made as early as possible instead of expanding the .* and doing backtracking.

(?-i) is supposed to turn case-insensitivity off. Everybody seems to be assuming you're trying to turn it on, but that would be (?i). Anyway, you don't want it to be case-insensitive, since you need to ensure that there are both uppercase and lowercase letters. Since case-sensitive matching is the default, prefacing a regex with (?-i) is pointless even in those flavors (like .NET) that support inline modifiers.
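A quick way to see why the i flag must stay off for this kind of rule is sketched below. The pattern is a simplified stand-in for the password regex, not the original, and the variable names are assumptions.

```javascript
// Same pattern with and without the i flag.
const withFlag = /^(?=.*[a-z])(?=.*[A-Z]).{8,}$/i;
const withoutFlag = /^(?=.*[a-z])(?=.*[A-Z]).{8,}$/;

// With /i, [A-Z] also matches lowercase letters, so the
// "at least one uppercase" requirement is silently lost.
console.log(withFlag.test("alllowercase"));    // true
console.log(withoutFlag.test("alllowercase")); // false
console.log(withoutFlag.test("MixedCase99"));  // true
```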

We Keep Coding

JavaScript is the programming language of the Web.
