what is max length of string to test regular expression in javascript


I am using the following regex. If I pass a lengthy string to check, regex101.com shows a timeout message. Is there an ideal length for testing a regular expression?
^(\d+[\s\r\n\t,]*){1,100}$
https://regex101.com/r/eC5qO7/1

I suggest running a split, then making sure what is split is a number, so:
var test = "123,456 789 101112 asdf";
var numbers = test.split(/\s*,\s*|\s+/);
numbers.forEach(function(n) {
  if (n % 1 !== 0) {
    alert(n + " is not an integer");
  }
});
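Note that the `n % 1` check relies on numeric coercion, so it also accepts tokens such as "1e3" or an empty string. A stricter variant might look like the sketch below. The helper name `findInvalidTokens` is hypothetical, and it assumes that only runs of ASCII digits count as valid numbers.

```javascript
// Hypothetical helper: returns every token that is not a run of ASCII digits.
function findInvalidTokens(text) {
  return text
    .split(/\s*,\s*|\s+/)           // same separators as in the answer above
    .filter(token => token !== "")  // skip empties from leading/trailing separators
    .filter(token => !/^\d+$/.test(token));
}

console.log(findInvalidTokens("123,456 789 101112 asdf")); // ["asdf"]
console.log(findInvalidTokens("12, 1e3, 4.5"));            // ["1e3", "4.5"]
```

Returning the offending tokens instead of alerting inside the loop keeps the check easy to reuse and test.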

The catastrophic backtracking is caused by the fact that the [\s\r\n\t,] character class has a * quantifier applied to it, and then a + quantifier is set to the whole group. The regex backtracks into each digit to search for optional whitespace and commas, which creates a huge number of possibilities that the engine tries before running into the "catastrophe".
Besides, there is another potential bottleneck: \s in the character class can also match \r, \n and \t, so listing them again is redundant. See "Do character classes present the same risk?".
Without atomic groups and possessive quantifiers, the regex optimization is only possible by means of making one of the "separators" obligatory. In this case, it is clearly a comma (judging by the example string). Since you just want to validate the number of input numbers separated with commas and optional spaces, you can use a simpler regex:
^(?:[0-9]+\s*,\s*){1,100}$
Here, it fails gracefully, and here it matches the string OK.
If a comma at the end is optional, use
^(?:\d+\s*,\s*){1,99}\d+,?\s*$
See demo
Also note you do not need the i modifier, as there are no letters in the pattern.
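As a rough illustration of the second pattern, here is a small sketch. The function name `isNumberList` is made up for the example, and the sample inputs are assumptions rather than the asker's real data. Note that the pattern as written needs at least two numbers, because the group must match once before the final \d+.

```javascript
// The comma is mandatory inside the group, so there is only one way to
// split the input into "number + separator" chunks.
const numberList = /^(?:\d+\s*,\s*){1,99}\d+,?\s*$/;

// Hypothetical wrapper around the pattern from the answer.
function isNumberList(input) {
  return numberList.test(input);
}

console.log(isNumberList("123, 456,789"));           // true
console.log(isNumberList("123, 456,789,"));          // true (trailing comma allowed)
console.log(isNumberList("123 456"));                // false (no comma)
console.log(isNumberList("123"));                    // false (needs two numbers)
console.log(isNumberList("1,".repeat(5000) + "x"));  // false
```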

Related

Odd RegEx request for Javascript

I'm having trouble with a certain RegEx replacement string for later use in Javascript.
We have quite a bit of text that was stored in a rather odd format that we aren't allowed to fix.
But we do need to find all the "network path" strings inside it, following these rules:
A. The matches always start with 2 backslashes.
B. The matching characters should stop as soon as it hits a first occurrence of any 1 of these:
A < character
A space
A line feed
A carriage return
A & character
A literal "\r" or "\n" string (but only if occurring at end of line)
We "almost" have it working with /\\\\[^ &<\s]*/gi as shown in this RegEx Tester page:
https://regex101.com/r/T4cDOL/5
Even if we get it working, the RegEx has to be even further "escape escaped" before we put it in
our JavaScript code, but that's also not working as expected.

From your example, it seems you literally have a backslash followed by an n and a backslash followed by an r (as opposed to a newline or carriage return), which means you can't only use a negated character class (since you need to handle a sequence of two characters). I'd use a positive lookahead to know where to stop, so I can use an alternation for that part.
You haven't said what parts of those strings should match, so I've had to guess a bit, but here's my best guess (with useful input from Niet the Dark Absol):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;
That says:
Match starting with \\
Take everything prior to the lookahead (non-greedy)
Lookahead: An alternation of:
A space, &, <, carriage return (\r, character 13), or a newline (\n, character 10); or
A backslash followed by r or n if that's either at the end of a line or followed by a space (so we get the \nancy but not the \n after it).
Updated regex101
You might want to have more characters than just a space after the \r/\n. If so, make it a character class (and/or use \s for "whitespace" if that applies):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$|[ others]))/gmi;
// −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−^^^^^^^^^
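To see the first pattern in action, here is a minimal sketch. The helper name `findNetworkPaths` and the sample text are invented for illustration; `String.raw` is used so the backslashes in the sample stay literal.

```javascript
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// Hypothetical helper: collects every match as a plain string.
function findNetworkPaths(text) {
  return Array.from(text.matchAll(rex), m => m[0]);
}

// Line 1 ends with a literal backslash + n; a real newline follows it.
const sample =
  String.raw`Open \\host\nancy\n` + "\n" +
  String.raw`or \\srv\docs<br> and \\fs01\pub&more`;

console.log(findNetworkPaths(sample));
// Three matches: \\host\nancy, \\srv\docs and \\fs01\pub
```

One thing to keep in mind: the lookahead needs one of the terminators to be present, so a path that runs to the very end of the input with nothing after it is not matched by this pattern.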

Modify regex to disallow spaces

I'm working with the following regex:
var currentVal = $(this).val();
//making sure it's only numbers, decimals, and commas
var isValid = /^[0-9,.]*$/.test(currentVal);
and I'm trying to modify it so that it disallows spaces as well. I tried adding /s within the regex, but it still allows them. I'm new to regex, and it gets confusing quickly.

The brackets [] in your regex form a character class. ^ means start of string and $ means end of string, and * applies the character class greedily zero or more times. As written, the regex already allows only the characters listed in the class: the digits 0 through 9, commas, and periods. If you added \s inside the class, you would be allowing whitespace characters as well, which would cause exactly the problem you describe.
console.log(/^[0-9,.]*$/.test("123 903"));
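A few more inputs make the behaviour easier to see. This is only a sketch with made-up sample values; the variable name `numericInput` is not from the question.

```javascript
const numericInput = /^[0-9,.]*$/;

console.log(numericInput.test("1,234.50")); // true
console.log(numericInput.test("123 903"));  // false: the space is not in the class
console.log(numericInput.test(""));         // true: * also accepts zero characters
console.log(/^[0-9,.]+$/.test(""));         // false: + requires at least one character
```

If an empty field should be rejected too, swapping * for + as in the last line is one way to do it.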

javascript regex : possible to have a range in the quantifier? [duplicate]

I've written a regular expression that matches any number of letters with any number of single spaces between the letters. I would like that regular expression to also enforce a minimum and maximum number of characters, but I'm not sure how to do that (or if it's possible).
My regular expression is:
[A-Za-z](\s?[A-Za-z])+
I realized it was only matching two sets of letters surrounding a single space, so I modified it slightly to fix that. The original question is still the same though.
Is there a way to enforce a minimum of three characters and a maximum of 30?

Yes
Just like + means one or more you can use {3,30} to match between 3 and 30
For example [a-z]{3,30} matches between 3 and 30 lowercase alphabet letters
From the documentation of Java's Pattern class (the quantifier syntax is the same in JavaScript):
X{n,m} X, at least n but not more than m times
In your case, matching 3-30 letters followed by spaces could be accomplished with:
([a-zA-Z]\s){3,30}
That version requires trailing whitespace. If you don't want that, you can use the following (letter plus space 2 to 29 times, then a final letter):
([a-zA-Z]\s){2,29}[a-zA-Z]
If you'd like the whitespace characters to count toward the limit, divide those numbers by 2 to get
([a-zA-Z]\s){1,14}[a-zA-Z]
You can add \s? to that last one if the trailing whitespace is optional. These were all tested on RegexPlanet
If you'd like the entire string altogether to be between 3 and 30 characters you can use lookaheads adding (?=^.{3,30}$) at the beginning of the RegExp and removing the other size limitations
All that said, in all honesty I'd probably just test the String's .length property. It's more readable.
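The .length approach could look roughly like this. It is a sketch, not a drop-in solution: `isValidName` is a hypothetical name, and the shape check assumes "letters separated by at most one whitespace character, no leading or trailing whitespace", which is how I read the question.

```javascript
// Shape only: a letter, then any number of (optional single whitespace + letter).
const lettersWithSingleSpaces = /^[A-Za-z](?:\s?[A-Za-z])*$/;

// Hypothetical helper: the length limits live outside the regex.
function isValidName(value) {
  return value.length >= 3 &&
         value.length <= 30 &&
         lettersWithSingleSpaces.test(value);
}

console.log(isValidName("A A"));          // true
console.log(isValidName("AB"));           // false: too short
console.log(isValidName("AA  A"));        // false: two consecutive spaces
console.log(isValidName("A".repeat(31))); // false: too long
```

Keeping the limits as plain numbers means they can change without anyone having to re-read the regex.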

This is what you are looking for
^[a-zA-Z](\s?[a-zA-Z]){2,29}$
^ is the start of string
$ is the end of string
(\s?[a-zA-Z]){2,29} would match (\s?[a-zA-Z]) 2 to 29 times..

Actually Benjamin's answer will lead to the complete solution to the OP's question.
Using lookaheads it is possible to restrict the total number of characters AND restrict the match to a set combination of letters and (optional) single spaces.
The regex that solves the entire problem would become
(?=^.{3,30}$)^([A-Za-z][\s]?)+$
This will match AAA and A A, and it will fail to match AA  A, since that contains two consecutive spaces.
I tested this at http://regexpal.com/ and it does the trick.
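For anyone who wants to check this locally rather than in an online tester, here is a small sketch using the regex exactly as given above. The sample strings are my own.

```javascript
const limited = /(?=^.{3,30}$)^([A-Za-z][\s]?)+$/;

console.log(limited.test("AAA"));          // true
console.log(limited.test("A A"));          // true
console.log(limited.test("AA  A"));        // false: two consecutive spaces
console.log(limited.test("AB"));           // false: fewer than 3 characters
console.log(limited.test("A".repeat(31))); // false: more than 30 characters
console.log(limited.test("AB "));          // true: a trailing space is accepted
```

The last case shows that the pattern, as written, lets the string end in whitespace, which may or may not be what you want.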

You should use
[a-zA-Z ]{20}
[For allowed characters]{for limiting of the number of characters}

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + so that digits are not found anywhere in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite. ^ and $ are "anchors": they mean "start" and "end". It's a little more complicated than that, but for now you can treat them as the start and end of a line; look up the various regular expression modifiers if you're interested in learning more. Unfortunately ^ has an overloaded meaning: as the first character inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings, and that the definition in your head applies only inside a character class!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "one or more non-digit characters followed by one or more non-space characters", so it'll match lots of strings! I think that perhaps you don't realise that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up a sequence of matchers that each match the corresponding bit of the string.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexs is http://regexr.com/3b9h7 - which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to debuggerx is awesome!
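If you'd rather try it in a console, here is a short sketch comparing the pattern from the question with the suggested one. The variable names are mine, and the inputs are the ones from the question plus a couple of extras.

```javascript
const original = /^\D+[^\s]+$/;               // from the question
const suggested = /^[^\d\s!#£$%^&*()+=]+$/;   // single negated class

// \D+ happily consumes "James Smith" (spaces are non-digits),
// then [^\s]+ consumes "25" (digits are non-whitespace).
console.log(original.test("James Smith25"));  // true
console.log(original.test("James1"));         // true

console.log(suggested.test("James"));         // true
console.log(suggested.test("James1"));        // false
console.log(suggested.test("James Smith"));   // false
console.log(suggested.test("James!"));        // false
```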

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/ can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that have no digits and spaces, and the symbols you do not allow.
Mind that to match a literal -, ] or [ with a character class, you either need to escape them, or use at the start or end of the expression. To play it safe, escape them.
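A quick sketch of why the hyphen matters. The patterns below are illustrative and not part of the question: an unescaped - between two characters forms a range, which can silently pull in characters you never listed.

```javascript
// "+-." is read as the range from "+" to ".", which also contains "," and "-".
console.log(/[+-.]/.test(","));   // true (probably not intended)

// Escaping the hyphen, or moving it to the end, makes it a literal.
console.log(/[+\-.]/.test(","));  // false
console.log(/[+.-]/.test(","));   // false
console.log(/[+.-]/.test("-"));   // true

// Negated class that also rejects -, [ and ], with all three escaped.
const noBracketsOrHyphen = /^[^\d\s\-\[\]]+$/;
console.log(noBracketsOrHyphen.test("ab"));   // true
console.log(noBracketsOrHyphen.test("a-b"));  // false
console.log(noBracketsOrHyphen.test("a[b"));  // false
```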

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats the immediately preceding element 0 or more times
So the regex matches any string that:
1. has a beginning,
2. contains 0 or more characters that are not whitespace, digits, or any of the symbols listed in the character class `[...]` (in this example !#£$%^&*()+=),
3. has an ending.
NOTE:
If the symbols you want to exclude also include a hyphen (-), don't put it between other characters, because there it is a metacharacter that forms a range. Put it last in the character class.

Regex to allow certain special characters - escape issue

I'm working on a JavaScript regex that I intend to use with the jQuery validate plugin (I'll add this as an additional method). It must (among other rules):
test if at least one of the following special characters is entered:
!, ", #, $, %, &, ', (, ), *, +,-, .,/, :, ;, <, =, >, ?, #, [, \, ], ^, _, `, {, |, }, ~
not allow 3 consecutive identical characters:
passed:
aa
99
++
not passed:
aaa
999
+++
My regex has problems with the rules mentioned above.
I think the issue is related to escaping. I've tried escaping + and - to no avail. Can anyone help? This is my regex: http://regexr.com/3ack3

This is one of those requirements where you can really simplify your life by using multiple regexes, rather than trying to cram all the logic into one complex regex with many assertions. Here's some JavaScript that implements your requirement:
var specialCharRegex = /[!"#$%&'()*+.\/:;<=>?#\[\\\]^_`{|}~-]/;
var threeConsecutiveRegex = /(.)\1\1/;
var input = prompt();
if (specialCharRegex.test(input) && !threeConsecutiveRegex.test(input)) {
alert('passed');
} else {
alert('failed');
} // end if
http://jsfiddle.net/t8609xv2/
Some notes on the trickier points:
inside the bracket expression, the following four special characters had to be backslash-escaped: /[\]. (Forward slash because it delimits the regex, backslash because it's the escape character, and the brackets because they delimit the bracket expression.)
inside the bracket expression, the dash had to be moved to the end, because otherwise it would likely specify a character range. When at the end, it never specifies a range, so it's always safer to put it there.
This modular approach also benefits maintainability, as you will more easily be able to make changes (modify/add/remove regexes, or change the if-test logic) at a later point in time.
Another benefit is that you could test each regex independently, which could allow you to provide a more accurate error message to the user, as opposed to just saying something like "invalid password".
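To make the error-message point concrete, here is one possible sketch. The `rules` array, the `validate` function and the message texts are all invented for the example; the two regexes are the ones from the code above.

```javascript
// Each rule pairs a user-facing message with a predicate on the input.
const rules = [
  {
    message: "must contain at least one special character",
    passes: s => /[!"#$%&'()*+.\/:;<=>?#\[\\\]^_`{|}~-]/.test(s),
  },
  {
    message: "must not repeat a character three times in a row",
    passes: s => !/(.)\1\1/.test(s),
  },
];

// Hypothetical helper: returns the messages of all rules that failed.
function validate(input) {
  return rules.filter(rule => !rule.passes(input)).map(rule => rule.message);
}

console.log(validate("aa+"));   // []
console.log(validate("ab+++")); // only the "three times in a row" message
console.log(validate("aaa"));   // both messages
```

An empty array means the input passed; anything else can be shown to the user as-is.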
Edit: Here's how you can whitelist the chars that are accepted in the input:
var specialCharRegex = /[!"#$%&'()*+.\/:;<=>?#\[\\\]^_`{|}~-]/;
var threeConsecutiveRegex = /(.)\1\1/;
var nonWhitelistCharRegex = /[^a-zA-Z0-9!"#$%&'()*+.\/:;<=>?#\[\\\]^_`{|}~-]/;
var input = prompt();
if (specialCharRegex.test(input) && !threeConsecutiveRegex.test(input) && !nonWhitelistCharRegex.test(input)) {
alert('passed');
} else {
alert('failed');
} // end if
http://jsfiddle.net/t8609xv2/2/

^(?=.*[!"#$%&'()*+,,\/:;<=>?#\[\]^_`{|}~-])(?!.*(.)\1\1).*$
Try this. See demo.
https://regex101.com/r/wX9fR1/10
You need a positive lookahead to check for special characters, and a negative lookahead to check whether a character occurs 3 times in a row.

You can use this regex:
^(?!.*?(.)\1{2})(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[!##$%^&*()_=\[\]{};':"\\|,.<>\/?+-]).{8,20}$
RegEx Demo
You might be able to shorten it using:
^(?!.*?(.)\1{2})(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[\W_]).{8,20}$
i.e. using non-word property \W instead of listing each and every special character.
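Here is a small sketch of the shortened pattern with a few sample inputs. The inputs and the variable name `passwordRule` are assumptions for illustration. Note that this regex adds requirements (lowercase, uppercase, digit, 8 to 20 characters) beyond the two rules in the question.

```javascript
const passwordRule =
  /^(?!.*?(.)\1{2})(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[\W_]).{8,20}$/;

console.log(passwordRule.test("Passw0rd!"));   // true
console.log(passwordRule.test("Passsw0rd!"));  // false: "sss" repeats three times
console.log(passwordRule.test("password1!"));  // false: no uppercase letter
console.log(passwordRule.test("Passw0rdX"));   // false: no special character
console.log(passwordRule.test("Pw0!"));        // false: shorter than 8 characters
```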
