Javascript RegExp to match user ID pattern

Javascript RegExp to match user ID pattern - javascript

I am using C# code below to detect if a string is formatted E123456, H123456 or T123456.
Regex(#"\b[eht]\d{6}")
I am trying to use the Javascript equivalent but am having difficulties.
So far I have, but it's returning false each time when it should be returning true.
RegExp("\b[eht]\d{6}")
Any help will be appreciated, or a good link to RegExp formatting.

I believe the issue you are having is due to the fact that when using the RegExp constructor with a string argument, special characters such as slashes and quotation marks must be escaped with the backslash character. Also, use the i flag if you want to allow both upper and lower case matches.
To make the RegExp with the constructor method, you would use:
new RegExp("\\b[eht]\\d{6}", "i")
Or to make a RegExp literal, go with:
var regExName = /\b[eht]\d{6}/i
Also, if you want to experiment more with RegEx's in JavaScript, http://regexr.com/ is a wonderful site that I highly recommend!

Your regex only matches lower case characters while your user ids have upper case E, H and T. So either use upper case in the regex string (RegExp("\b[EHT]\d{6}")) or use the i flag (RegExp("\b[eht]\d{6}",'i'))
Reference for the flags: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/RegExp
Online tester for the upper case solution: https://regex101.com/r/yW9zF2/1
Online tester for the flag solution: https://regex101.com/r/mZ1vX9/1

There is some trouble with escaping the string, and as I see the regex should be case insensitive. Try this regex expression:
/\b[eht]\d{6}/i
Or using the RegExp constructor:
new RegExp("\\b[eht]\\d{6}", "i")

Related

Get the Opposite of a Regular Expression [duplicate]

Is it possible to write a regex that returns the converse of a desired result? Regexes are usually inclusive - finding matches. I want to be able to transform a regex into its opposite - asserting that there are no matches. Is this possible? If so, how?
http://zijab.blogspot.com/2008/09/finding-opposite-of-regular-expression.html states that you should bracket your regex with
/^((?!^ MYREGEX ).)*$/
, but this doesn't seem to work. If I have regex
/[a|b]./
, the string "abc" returns false with both my regex and the converse suggested by zijab,
/^((?!^[a|b].).)*$/
. Is it possible to write a regex's converse, or am I thinking incorrectly?

Couldn't you just check to see if there are no matches? I don't know what language you are using, but how about this pseudocode?
if (!'Some String'.match(someRegularExpression))
// do something...
If you can only change the regex, then the one you got from your link should work:
/^((?!REGULAR_EXPRESSION_HERE).)*$/

The reason your inverted regex isn't working is because of the '^' inside the negative lookahead:
/^((?!^[ab].).)*$/
^ # WRONG
Maybe it's different in vim, but in every regex flavor I'm familiar with, the caret matches the beginning of the string (or the beginning of a line in multiline mode). But I think that was just a typo in the blog entry.
You also need to take into account the semantics of the regex tool you're using. For example, in Perl, this is true:
"abc" =~ /[ab]./
But in Java, this isn't:
"abc".matches("[ab].")
That's because the regex passed to the matches() method is implicitly anchored at both ends (i.e., /^[ab].$/).
Taking the more common, Perl semantics, /[ab]./ means the target string contains a sequence consisting of an 'a' or 'b' followed by at least one (non-line separator) character. In other words, at ANY point, the condition is TRUE. The inverse of that statement is, at EVERY point the condition is FALSE. That means, before you consume each character, you perform a negative lookahead to confirm that the character isn't the beginning of a matching sequence:
(?![ab].).
And you have to examine every character, so the regex has to be anchored at both ends:
/^(?:(?![ab].).)*$/
That's the general idea, but I don't think it's possible to invert every regex--not when the original regexes can include positive and negative lookarounds, reluctant and possessive quantifiers, and who-knows-what.

You can invert the character set by writing a ^ at the start ([^…]). So the opposite expression of [ab] (match either a or b) is [^ab] (match neither a nor b).
But the more complex your expression gets, the more complex is the complementary expression too. An example:
You want to match the literal foo. An expression, that does match anything else but a string that contains foo would have to match either
any string that’s shorter than foo (^.{0,2}$), or
any three characters long string that’s not foo (^([^f]..|f[^o].|fo[^o])$), or
any longer string that does not contain foo.
All together this may work:
^[^fo]*(f+($|[^o]|o($|[^fo]*)))*$
But note: This does only apply to foo.

You can also do this (in python) by using re.split, and splitting based on your regular expression, thus returning all the parts that don't match the regex, how to find the converse of a regex

In perl you can anti-match with $string !~ /regex/;.

With grep, you can use --invert-match or -v.

Java Regexps have an interesting way of doing this (can test here) where you can create a greedy optional match for the string you want, and then match data after it. If the greedy match fails, it's optional so it doesn't matter, if it succeeds, it needs some extra data to match the second expression and so fails.
It looks counter-intuitive, but works.
Eg (foo)?+.+ matches bar, foox and xfoo but won't match foo (or an empty string).
It might be possible in other dialects, but couldn't get it to work myself (they seem more willing to backtrack if the second match fails?)

Javascript RegExp anomaly [duplicate]

I am trying to build a regexp from static text plus a variable in javascript. Obviously I am missing something very basic, see comments in code below. Help is very much appreciated:
var test_string = "goodweather";
// One regexp we just set:
var regexp1 = /goodweather/;
// The other regexp we built from a variable + static text:
var regexp_part = "good";
var regexp2 = "\/" + regexp_part + "weather\/";
// These alerts now show the 2 regexp are completely identical:
alert (regexp1);
alert (regexp2);
// But one works, the other doesn't ??
if (test_string.match(regexp1))
alert ("This is displayed.");
if (test_string.match(regexp2))
alert ("This is not displayed.");

First, the answer to the question:
The other answers are nearly correct, but fail to consider what happens when the text to be matched contains a literal backslash, (i.e. when: regexp_part contains a literal backslash). For example, what happens when regexp_part equals: "C:\Windows"? In this case the suggested methods do not work as expected (The resulting regex becomes: /C:\Windows/ where the \W is erroneously interpreted as a non-word character class). The correct solution is to first escape any backslashes in regexp_part (the needed regex is actually: /C:\\Windows/).
To illustrate the correct way of handling this, here is a function which takes a passed phrase and creates a regex with the phrase wrapped in \b word boundaries:
// Given a phrase, create a RegExp object with word boundaries.
function makeRegExp(phrase) {
// First escape any backslashes in the phrase string.
// i.e. replace each backslash with two backslashes.
phrase = phrase.replace(/\\/g, "\\\\");
// Wrap the escaped phrase with \b word boundaries.
var re_str = "\\b"+ phrase +"\\b";
// Create a new regex object with "g" and "i" flags set.
var re = new RegExp(re_str, "gi");
return re;
}
// Here is a condensed version of same function.
function makeRegExpShort(phrase) {
return new RegExp("\\b"+ phrase.replace(/\\/g, "\\\\") +"\\b", "gi");
}
To understand this in more depth, follows is a discussion...
In-depth discussion, or "What's up with all these backslashes!?"
JavaScript has two ways to create a RegExp object:
/pattern/flags - You can specify a RegExp Literal expression directly, where the pattern is delimited using a pair of forward slashes followed by any combination of the three pattern modifier flags: i.e. 'g' global, 'i' ignore-case, or 'm' multi-line. This type of regex cannot be created dynamically.
new RegExp("pattern", "flags") - You can create a RegExp object by calling the RegExp() constructor function and pass the pattern as a string (without forward slash delimiters) as the first parameter and the optional pattern modifier flags (also as a string) as the second (optional) parameter. This type of regex can be created dynamically.
The following example demonstrates creating a simple RegExp object using both of these two methods. Lets say we wish to match the word "apple". The regex pattern we need is simply: apple. Additionally, we wish to set all three modifier flags.
Example 1: Simple pattern having no special characters: apple
// A RegExp literal to match "apple" with all three flags set:
var re1 = /apple/gim;
// Create the same object using RegExp() constructor:
var re2 = new RegExp("apple", "gim");
Simple enough. However, there are significant differences between these two methods with regard to the handling of escaped characters. The regex literal syntax is quite handy because you only need to escape forward slashes - all other characters are passed directly to the regex engine unaltered. However, when using the RegExp constructor method, you pass the pattern as a string, and there are two levels of escaping to be considered; first is the interpretation of the string and the second is the interpretation of the regex engine. Several examples will illustrate these differences.
First lets consider a pattern which contains a single literal forward slash. Let's say we wish to match the text sequence: "and/or" in a case-insensitive manner. The needed pattern is: and/or.
Example 2: Pattern having one forward slash: and/or
// A RegExp literal to match "and/or":
var re3 = /and\/or/i;
// Create the same object using RegExp() :
var re4 = new RegExp("and/or", "i");
Note that with the regex literal syntax, the forward slash must be escaped (preceded with a single backslash) because with a regex literal, the forward slash has special meaning (it is a special metacharacter which is used to delimit the pattern). On the other hand, with the RegExp constructor syntax (which uses a string to store the pattern), the forward slash does NOT have any special meaning and does NOT need to be escaped.
Next lets consider a pattern which includes a special: \b word boundary regex metasequence. Say we wish to create a regex to match the word "apple" as a whole word only (so that it won't match "pineapple"). The pattern (as seen by the regex engine) needs to be: \bapple\b:
Example 3: Pattern having \b word boundaries: \bapple\b
// A RegExp literal to match the whole word "apple":
var re5 = /\bapple\b/;
// Create the same object using RegExp() constructor:
var re6 = new RegExp("\\bapple\\b");
In this case the backslash must be escaped when using the RegExp constructor method, because the pattern is stored in a string, and to get a literal backslash into a string, it must be escaped with another backslash. However, with a regex literal, there is no need to escape the backslash. (Remember that with a regex literal, the only special metacharacter is the forward slash.)
Backslash SOUP!
Things get even more interesting when we need to match a literal backslash. Let's say we want to match the text sequence: "C:\Program Files\JGsoft\RegexBuddy3\RegexBuddy.exe". The pattern to be processed by the regex engine needs to be: C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe. (Note that the regex pattern to match a single backslash is \\ i.e. each must be escaped.) Here is how you create the needed RegExp object using the two JavaScript syntaxes
Example 4: Pattern to match literal back slashes:
// A RegExp literal to match the ultimate Windows regex debugger app:
var re7 = /C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe/;
// Create the same object using RegExp() constructor:
var re8 = new RegExp(
"C:\\\\Program Files\\\\JGsoft\\\\RegexBuddy3\\\\RegexBuddy\\.exe");
This is why the /regex literal/ syntax is generally preferred over the new RegExp("pattern", "flags") method - it completely avoids the backslash soup that can frequently arise. However, when you need to dynamically create a regex, as the OP needs to here, you are forced to use the new RegExp() syntax and deal with the backslash soup. (Its really not that bad once you get your head wrapped 'round it.)
RegexBuddy to the rescue!
RegexBuddy is a Windows app that can help with this backslash soup problem - it understands the regex syntaxes and escaping requirements of many languages and will automatically add and remove backslashes as required when pasting to and from the application. Inside the application you compose and debug the regex in native regex format. Once the regex works correctly, you export it using one of the many "copy as..." options to get the needed syntax. Very handy!

You should use the RegExp constructor to accomplish this:
var regexp2 = new RegExp(regexp_part + "weather");
Here's a related question that might help.

The forward slashes are just Javascript syntax to enclose regular expresions in. If you use normal string as regex, you shouldn't include them as they will be matched against. Therefore you should just build the regex like that:
var regexp2 = regexp_part + "weather";

I would use :
var regexp2 = new RegExp(regexp_part+"weather");
Like you have done that does :
var regexp2 = "/goodweather/";
And after there is :
test_string.match("/goodweather/")
Wich use match with a string and not with the regex like you wanted :
test_string.match(/goodweather/)

While this solution may be overkill for this specific question, if you want to build RegExps programmatically, compose-regexp can come in handy.
This specific problem would be solved by using
import {sequence} from 'compose-regexp'
const weatherify = x => sequence(x, /weather/)
Strings are escaped, so
weatherify('.')
returns
/\.weather/
But it can also accept RegExps
weatherify(/./u)
returns
/.weather/u
compose-regexp supports the whole range of RegExps features, and let one build RegExps from sub-parts, which helps with code reuse and testability.

Negate random regular expression

Is there a way to negate any regular expression? I'm using regular expressions to validate input on a form. I'm now trying to create a button that sanitizes my input. Is there a way so I can use the regular expression used for the validating also for stripping the invalid characters?
I'm using this regex for validation of illegal characters
<input data-val-regex-pattern="[^|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*" type="text" />
When clicking on a button next to it, I'm calling this function:
$('#button').click(function () {
var inputElement = $(this).prev();
var regex = new RegExp(inputElement.attr('data-val-regex-pattern'), 'g');
var value = inputElement.val();
inputElement.val(value.replace(regex, ''));
});
At the moment the javascript is doing the exact opposite of what I'm trying to accomplish. I need to find a way to 'reverse' the regex.
Edit: I'm trying to reverse the regex in the javascript function. The regex in the data-val-regex-pattern-attribute is doing his job for validation.

To find the invalid characters, just take the ^ off from your regex. The carret is the negative of everything that is inside the brackets.
data-val-regex-pattern="[|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*"
This will return the undesired characters so you can replace them.
Also, as you want to take off a lot of non-word characters, you could try a simpler regex. If you want only word characters and spaces, you could use something like this:
data-val-regex-pattern="[\W\S]*"

Your reges is as so:
[^|<>:\?'\*\[\]\=%\$\+,;~&\{\}]*
That means, it matches any non-invalid character multiple times.
Then you replace this for empty, so you leave only the bad characters.
Try this instead, without the negation (hat moved somewhere else):
[|^<>:\?'\*\[\]\=%\$\+,;~&\{\}]*

The following answer is to the general question of negating a regular expression. In your specific case you just need to negate a character group, or more precisely remove the negation of a character group - which is detailed in other answers.
Regular languages – those consisting of all strings entirely by matched some RE – are in fact closed under negation: there is another RE which matches exactly those strings the original RE does not. It is however not trivial to construct, which perhaps explains why RE implementations often do not offer a negation operator.
However the Javascript regexp language has extensions that make it more expressive than regular languages; in particular there is the construct of negative lookahead.
If R1 is a regexp then
^(?!.*(R1))
matches precisely the strings that does not contain a match for R1.
And
^(?!R1$)
matches precisely the strings where the whole string is not a match for R1.
Ie. negation.
For rewriting any substring not matching a given regexp, the above is insufficient. One would have to do something like
((?!R1).)*
Which would catch any substring not containing a subsubstring that matches R1. - But consideration of the edge cases show that this does not quite do what we are after. For example ((?!ab).)* matches "b" in "ab", because "ab" is not a substring of "b".
One can cheat, and make your regexp like;
(.*)(R1|$)
And rewrite to T1$2
Where T1 is the target string you want to rewrite to.
This should rewrite any portion of the string not matching R1 to T1. However I would be very careful about any edge cases for this. So much so that it might be better to write the regexp from scratch rather than trying a general approach.

RegularExpression RegExp.test

I want to match a string with following regular expression -
^\d{4}-\d{5}$|^\d{4}-\d{6}$
which is regex for a zip code with 4 digits-then 5 OR 6 digits after dash.
I am hoping my regex is correct as I have tested it on some online RegEx tester.
and for matching my string with above regex in jquery, I am using:
var regExpTest = new RegExp("^\d{4}-\d{5}$|^\d{4}-\d{6}$");
alert(regExpTest.test("1234-123456"));
But I am always getting false, can anyone please guide what is going wrong here?
Thank you!

Because the regular expression constructor takes a string as its argument, you need to escape the backslash \ wherever you use it. In your example, anywhere you have a \d needs to be \\d. You can see what happens if you don't by testing your code in Firebug or Chrome's developer tools:
new RegExp("^\d{4}-\d{5}$|^\d{4}-\d{6}$");
//-> /^d{4}-d{5}$|^d{4}-d{6}$/
Notice the slashes are gone? Now watch what happens when we escape each backslash:
new RegExp("^\\d{4}-\\d{5}$|^\\d{4}-\\d{6}$");
//-> /^\d{4}-\d{5}$|^\d{4}-\d{6}$/
So that should fix your problem. However, it's much easier to use the literal grammar for regular expressions when you're not using a variable to create them:
var regExpTest = /^\d{4}-\d{5}$|^\d{4}-\d{6}$/;
alert(regExpTest.test("1234-123456"));
//-> "true"
This way, you can write the expression without having to worry about double-escaping.

Javascript String pattern Validation

I have a string and I want to validate that string so that it must not contain certain characters like '/' '\' '&' ';' etc... How can I validate all that at once?

You can solve this with regular expressions!
mystring = "hello"
yourstring = "bad & string"
validRegEx = /^[^\\\/&]*$/
alert(mystring.match(validRegEx))
alert(yourstring.match(validRegEx))
matching against the regex returns the string if it is ok, or null if its invalid!
Explanation:
JavaScript RegEx Literals are delimited like strings, but with slashes (/'s) instead of quotes ("'s).
The first and last characters of the validRegEx cause it to match against the whole string, instead of just part, the carat anchors it to the beginning, and the dollar sign to the end.
The part between the brackets ([ and ]) are a character class, which matches any character so long as it's in the class. The first character inside that, a carat, means that the class is negated, to match the characters not mentioned in the character class. If it had been omited, the class would match the characters it specifies.
The next two sequences, \\ and \/ are backslash escaped because the backslash by itself would be an escape sequence for something else, and the forward slash would confuse the parser into thinking that it had reached the end of the regex, (exactly similar to escaping quotes in strings).
The ampersand (&) has no special meaning and is unescaped.
The remaining character, the kleene star, (*) means that whatever preceeded it should be matched zero or more times, so that the character class will eat as many characters that are not forward or backward slashes or ampersands, including none if it cant find any. If you wanted to make sure the matched string was non-empty, you can replace it with a plus (+).

I would use regular expressions.
See this guide from Mozillla.org. This article does also give a good introduction to regular expressions in JavaScript.

Here is a good article on Javascript validation. Remember you will need to validate on the server side too. Javascript validation can easily be circumvented, so it should never be used for security reasons such as preventing SQL Injection or XSS attacks.

You could learn regular expressions, or (probably simpler if you only check for one character at a time) you could have a list of characters and then some kind of sanitize function to remove each one from the string.
var myString = "An /invalid &string;";
var charList = ['/', '\\', '&', ';']; // etc...
function sanitize(input, list) {
for (char in list) {
input = input.replace(char, '');
}
return input
}
So then:
sanitize(myString, charList) // returns "An invalid string"

You can use the test method, with regular expressions:
function validString(input){
return !(/[\\/&;]/.test(input));
}
validString('test;') //false

You can use regex. For example if your string matches:
[\\/&;]+
then it is not valid. Look at:
http://www.regular-expressions.info/javascriptexample.html

You could probably use a regular expression.

As the others have answered you can solve this with regexp but remember to also check the value server-side. There is no guarantee that the user has JavaScript activated. Never trust user input!

We Keep Coding

JavaScript is the programming language of the Web.