Javascript regular expression quantifier


I am trying to write a JavaScript regular expression that matches a minimum and maximum number of words, based on this pattern: one or more letters followed by a space. This pattern matches one word followed by a space (for example, "one "):
(^[a-zA-Z]+\s$)
When I add the range quantifier {1,3}, it doesn't match two occurrences of the pattern (for example, "one two "). What do I need to change in the regular expression to match a minimum and maximum number of occurrences of this pattern?
(^[a-zA-Z]+\s$){1,3}
Any explanation is greatly appreciated.

Take ^ and $ out of the quantified group, because the beginning and the end of the string can each be matched only once, not once per repetition.
^([a-zA-Z]+\s){1,3}$
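A quick check of both patterns in Node.js or a browser console. This is a minimal sketch; the variable names and sample strings are made up for illustration.

```javascript
// Anchors inside the repeated group: without the m flag, ^ can only match at
// index 0 and $ only at the very end, so a second repetition can never succeed.
const anchorsInside = /(^[a-zA-Z]+\s$){1,3}/;

// Anchors outside the group: the whole string must be 1 to 3 "word + whitespace" units.
const anchorsOutside = /^([a-zA-Z]+\s){1,3}$/;

const samples = ["one ", "one two ", "one two three ", "one two three four ", "one two"];

// Print each sample with the result of both patterns.
for (const s of samples) {
  console.log(JSON.stringify(s), anchorsInside.test(s), anchorsOutside.test(s));
}
```

Only the second pattern accepts "one two " and "one two three "; neither accepts four words or a string without the trailing space.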

The following will work exactly as specified:
^([a-zA-Z]+ ){1,3}$
Replace the space with \s to match any single whitespace character:
^([a-zA-Z]+\s){1,3}$
Add a quantifier to the \s to set how many whitespace characters are acceptable. The following allows one or more by adding +:
^([a-zA-Z]+\s+){1,3}$
If the whitespace at the end is optional, then the following will work:
^([a-zA-Z]+(\s[a-zA-Z]+){0,2})\s*$
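The four variants above differ only in how they treat whitespace. A small sketch that runs them side by side (the object keys and inputs are illustrative names, not part of the answer):

```javascript
// The four variants from the answer above.
const variants = {
  literalSpace: /^([a-zA-Z]+ ){1,3}$/,                    // exactly one space after each word
  anyWhitespace: /^([a-zA-Z]+\s){1,3}$/,                  // exactly one whitespace char after each word
  whitespaceRun: /^([a-zA-Z]+\s+){1,3}$/,                 // one or more whitespace chars after each word
  optionalTrailing: /^([a-zA-Z]+(\s[a-zA-Z]+){0,2})\s*$/, // trailing whitespace is optional
};

// A space, a tab, a run of spaces, and no trailing whitespace.
const inputs = ["one two ", "one\ttwo ", "one   two ", "one two"];

for (const [name, re] of Object.entries(variants)) {
  console.log(name, inputs.map((s) => re.test(s)));
}
```

Note that the last variant allows exactly one whitespace character between words, so "one   two " is rejected by it even though trailing whitespace is optional.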

(^[a-zA-Z]+\s$) starts scanning at the start of the string (^), scans for a word ([a-zA-Z]+), scans for a whitespace character (\s), and then expects the end of the string ($).
When you have two words, the end of the string is not found after the first word, so the match fails. If you take out $, the second word still fails because it is not at the start of the string.
So the ^ and $ anchors have to go around the quantified group, not inside it.
To make it more generic:
(\S+\s*){1,3}
\S+: one or more non-whitespace characters
\s*: zero or more whitespace characters
This matches words even when there is no space at the end of the string. Without anchors, however, it also matches the first one to three words inside a longer string. To force the whole string to match, put ^ at the front and $ at the end:
^(\S+\s*){1,3}$
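The effect of the anchors on the generic pattern can be seen with a four-word string. A minimal sketch; the sample text is made up:

```javascript
const unanchored = /(\S+\s*){1,3}/;
const anchored = /^(\S+\s*){1,3}$/;

const text = "one two three four";

// Without anchors the pattern simply finds up to three words somewhere in the string.
console.log(unanchored.test(text));                         // true
console.log(JSON.stringify(text.match(unanchored)[0]));     // "one two three "

// With anchors the whole string must consist of one to three words.
console.log(anchored.test(text));                           // false

// \S accepts any non-whitespace character, including punctuation.
console.log(anchored.test("one, two!"));                    // true
```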

Related

Regex max string length between commas but ignores whitespace

I am trying to set the max string length between commas to 4.
I am using ^([^,?]{0,4},)*[^,?]{0,4}$, which works fine. However if the user adds a space before the word, the current code counts that whitespace.
Example: 'this','will','be','fine'. <-- this works.
'this',' will','not','work' <-- this does not work. Notice the whitespace before will in ' will'. How do I modify my regex so that it does not count this whitespace?

You can use either the string form or the regex literal form:
Validators.pattern('\\s*[^\\s,?]{0,4}(?:\\s*,\\s*[^\\s,?]{0,4})*\\s*')
Validators.pattern(/^\s*[^\s,?]{0,4}(?:\s*,\s*[^\s,?]{0,4})*\s*$/)
Adjust the pattern by removing \s* wherever you do not want to allow whitespace.
Whenever a regex matches in the regex101.com tester but does not work in your code, always check the Code Generator page. See also the questions "Regex not working in Angular Validators.pattern() while working in online regex testers" and "Regular expression works on regex101.com, but not on prod".
Details:
^ - start of string
\s* - zero or more whitespaces
[^\s,?]{0,4} - zero to four chars other than whitespace, comma and a question mark
(?:\s*,\s*[^\s,?]{0,4})* - zero or more sequences of a comma surrounded by zero or more whitespaces, followed by zero to four chars other than whitespace, comma and a question mark
\s* - zero or more whitespaces
$ - end of string
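Outside Angular, the same patterns can be checked with a plain RegExp. This sketch assumes the documented Angular behavior of wrapping a string pattern in ^ and $, and reproduces that by hand; it also shows why every backslash is doubled in the string form. The sample inputs are illustrative.

```javascript
// Regex-literal form: the anchors are written explicitly.
const literal = /^\s*[^\s,?]{0,4}(?:\s*,\s*[^\s,?]{0,4})*\s*$/;

// String form: each backslash is doubled inside a JS string literal.
// Validators.pattern adds ^ and $ to a string pattern, so they are added by hand here.
const source = "\\s*[^\\s,?]{0,4}(?:\\s*,\\s*[^\\s,?]{0,4})*\\s*";
const fromString = new RegExp("^" + source + "$");

// The question's original pattern, which counts the leading space as a character.
const original = /^([^,?]{0,4},)*[^,?]{0,4}$/;

for (const s of ["this,will,be,fine", "this, will,not,work", "this,words"]) {
  console.log(JSON.stringify(s), original.test(s), literal.test(s), fromString.test(s));
}
```

Both new forms accept " will" because the leading space is consumed by \s*, and both still reject a five-character item such as "words".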

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept fewer than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help?

You have: [^\s] (matches exactly one non-whitespace character), [a-z0-9 ]+ (matches at least one alphanumeric or space character), and [^\s] again (matches exactly one more non-whitespace character). So, in total, the string needs at least 3 characters.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
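A minimal sketch comparing the question's pattern with the word-boundary version on short inputs (the sample strings are made up):

```javascript
// The question's pattern: three mandatory pieces, so at least 3 characters.
const original = /^[^\s][a-z0-9 ]+[^\s]$/i;

// Word-boundary version from the answer above: \b right after ^ and right before $
// requires a letter or digit at both ends, without consuming any characters.
const withBoundaries = /^\b[a-z0-9 ]+\b$/i;

for (const s of ["a", "aa", "aaa", "a b", " aa", "aa "]) {
  console.log(JSON.stringify(s), original.test(s), withBoundaries.test(s));
}
```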

Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string, not followed by a space (this excludes a leading space).
[a-z0-9 ]* - A possibly empty sequence of letters, digits and spaces (the content before the last letter or digit; see below).
[a-z0-9]$ - The last letter or digit, then the end of the string (this excludes a trailing space).
Add the i flag, as in the question, if uppercase letters should also be accepted.
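A quick check of the lookahead-based pattern; the i flag is added here (an assumption, matching the question's /i) so that uppercase letters are accepted. The sample strings are illustrative.

```javascript
// (?! ) rejects a leading space; [a-z0-9]$ rejects a trailing space.
const re = /^(?! )[a-z0-9 ]*[a-z0-9]$/i;

for (const s of ["a", "aa", "Hello World 2", " a", "a ", "a  b"]) {
  console.log(JSON.stringify(s), re.test(s));
}
```

Because [a-z0-9 ]* may be empty, a single character is enough to match, which fixes the 3-character minimum.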

You should rewrite the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
NOTE: If only one whitespace character is allowed between the alphanumeric chunks, use
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) followed by 1+ letters/digits
$ - end of string.
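The difference between \s+ and a single \s only shows up with runs of whitespace. A minimal sketch with made-up inputs:

```javascript
// Any run of whitespace between alphanumeric chunks.
const anyWhitespaceRun = /^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i;
// Exactly one whitespace character between chunks.
const singleWhitespace = /^[a-z0-9]+(?:\s[a-z0-9]+)*$/i;

for (const s of ["a", "ab", "a b", "a  b", " a", "a "]) {
  console.log(JSON.stringify(s), anyWhitespaceRun.test(s), singleWhitespace.test(s));
}
```

Both patterns start and end with [a-z0-9]+, so leading and trailing whitespace is rejected by construction rather than by a lookahead.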

regex to coordinates WGS84?

I'm trying this regular expression, but I can't correctly validate the trailing whitespace and the letter:
/^\d{0,2}(\-\d{0,2})?(\-\d{0,2})?(\ ?\d[W,E]?)?$/
Example values:
33-39-10 N //OK
85-50 W //OK
-85-50 E //Wrong
What's wrong?

The quantifier in \d{0,2} also allows zero digits, so the pattern can match the leading - in the third example.
In the character class [W,E] the comma is a literal member of the class, so it would also accept a comma. Omit it and list only the letters you allow: [ENW].
If only the third group is optional, you could try including the trailing whitespace before the end-of-string anchor $:
^\d{2}(-\d{2})(-\d{2})? [ENW] $
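A quick test of that pattern. Note the two literal spaces, one before the letter and one before $, so each value must end with a trailing space; the sample strings below assume that format.

```javascript
// Two digits, a mandatory "-dd" group, an optional third "-dd" group,
// then a space, one of E/N/W, and a trailing space.
const re = /^\d{2}(-\d{2})(-\d{2})? [ENW] $/;

for (const s of ["33-39-10 N ", "85-50 W ", "-85-50 E ", "33-39-10 N"]) {
  console.log(JSON.stringify(s), re.test(s));
}

// Inside a character class the comma is an ordinary character.
console.log(/^[W,E]$/.test(",")); // true
```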

I have used this regular expression : ^(?!\-)\d{0,2}?(\-\d{0,2}).+\s(N|E|W|S)$
Using a negative lookahead, we have excluded anything that starts with a dash (-).
(?!\-) = starting at the current position, ensures that the given pattern (a dash) does not match.
\s(N|E|W|S) matches a whitespace character (\s) followed by one of the letters, using the alternation operator |.
You may also use \s+(N|E|W|S), where + matches between one and unlimited times, as many times as possible, giving back as needed.
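The same sample values run through this lookahead-based pattern. A minimal sketch; unlike the previous answer's pattern, this one expects the letter to be the last character, so the values are written without a trailing space:

```javascript
// (?!\-) rejects a leading dash; \s(N|E|W|S)$ requires a space and a direction letter at the end.
const re = /^(?!\-)\d{0,2}?(\-\d{0,2}).+\s(N|E|W|S)$/;

for (const s of ["33-39-10 N", "85-50 W", "-85-50 E", "33-39-10 N "]) {
  console.log(JSON.stringify(s), re.test(s));
}
```

The last sample fails because of its trailing space, which the other answer's pattern requires.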

Lookaheads to delimit text

I'm trying to delimit a huge text with several documents inside. Each document starts with the word 'MINISTÉRIO', so I'm trying to use lookaheads to capture everything from one MINISTÉRIO until the next MINISTÉRIO:
(MINISTÉRIO)[\s\S]*?(^(?=\1))
http://regexr.com/3dk6k
I also was trying to:
(^MINISTÉRIO)[\s\S]*?(?=\1)
http://regexr.com/3dk6h
Neither is working. I have two questions: why is my regex not working (I think it should)? And how do I fix it?
Thanks!

Issue Description
The pattern /(MINISTÉRIO)[\s\S]*?(^(?=\1))/gm matches the word MINISTÉRIO at any place in the text, capturing it into Group 1. [\s\S]*? then lazily matches any character, 0 or more times, up to the beginning of a line that is followed by the word MINISTÉRIO. As a result, the last "document", which runs from some place in the string to its end, is never matched: no MINISTÉRIO follows it, and you cannot use the $ anchor to stop at the end of the string because, with the m flag, $ matches at the end of every line.
Using /(^MINISTÉRIO)[\s\S]*?(?=\1)/g, you match and capture the MINISTÉRIO word at the beginning of the whole string only (there is no m flag), and then match as few characters as possible up to the first MINISTÉRIO substring anywhere in the string; there is no check for the beginning of a line.
Solution
You may use an unrolled regex like
/^MINISTÉRIO\b.*(?:\n(?!MINISTÉRIO\b).*)*/gm
When the text is long, lazy matching like in your pattern takes too much time; an unrolled pattern, which consumes whole lines with .* and only runs the lookahead after each newline, can greatly increase performance.
In short:
^MINISTÉRIO\b - matches MINISTÉRIO as a whole word at the start of a line:
^ - start of a line (due to /m modifier)
MINISTÉRIO\b - a whole word MINISTÉRIO as \b is a word boundary
.*(?:\n(?!MINISTÉRIO\b).*)* - matches any text that is not MINISTÉRIO at the start of a line:
.* - 0+ chars other than a newline
(?:\n(?!MINISTÉRIO\b).*)* - 0+ sequences of:
\n(?!MINISTÉRIO\b) - a newline not followed with MINISTÉRIO as a whole word
.* - 0+ chars other than a newline
It is basically the same as /^MINISTÉRIO\b(?:(?!^MINISTÉRIO\b)[\s\S])*/gm, but should be much faster as the tempered greedy token ((?:(?!^MINISTÉRIO\b)[\s\S])*) is rather resource consuming.
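A minimal sketch of splitting a text into documents with the unrolled pattern. It assumes "\n" line endings (with "\r\n", the . in .* stops before "\r" and each match would end after its first line); the sample text is invented for illustration.

```javascript
// Unrolled pattern from the answer: a MINISTÉRIO line, then every following line
// that does not itself start with MINISTÉRIO as a whole word.
const docPattern = /^MINISTÉRIO\b.*(?:\n(?!MINISTÉRIO\b).*)*/gm;

const text = [
  "MINISTÉRIO DA SAÚDE",
  "first document, line 1",
  "first document, line 2",
  "MINISTÉRIO DA FAZENDA",
  "second document, line 1",
].join("\n");

// With the g flag, String.prototype.match returns all full matches (or null).
const documents = text.match(docPattern) ?? [];
documents.forEach((doc, i) => console.log(`--- document ${i + 1} ---\n${doc}`));
```

The last document is found even though nothing follows it, which is the case the lazy lookahead version misses.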

Regex to match multiple white spaces until one character

I'm trying to create a regex that will detect markdown quoting blocks.
Here is my regex so far:
(?:^|\n)[ \t]*(>[ \t]*\S(?:(?!\n(\s*\n)+[^>])[\s\S])*)
(?:^|\n) beginning of string, or a line break
[ \t]* optional spaces
>[ \t]*\S `>` followed by optional spaces and a non-whitespace character
( ... [\s\S])* capture any following character, including whitespace, multiple times
(?!\n(\s*\n)+[^>]) stop capturing if the next characters are at least 2 line breaks mixed with other whitespace, followed by anything but `>`
Everything works fine, except for the negative lookahead: the capture stops if more than 2 line breaks are encountered.
Regex101 shows me 4 matches when I want 3. Any pointers?

I believe your issue is here:
(?!\n(\s*\n)+[^>])
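The [^>] can also match a \n: with three or more line breaks, (\s*\n)+ backtracks and leaves a newline for [^>] to match, so the lookahead succeeds and the capture stops even when the next non-blank line starts with >. Exclude whitespace from that class: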
(?!\n(\s*\n)+[^>\s])
http://regex101.com/r/mT4wC4
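A small sketch showing how the two versions treat a run of blank lines before another > line. The input string is made up; the patterns are the question's and the answer's, with the g flag so that String.prototype.matchAll can be used.

```javascript
// The question's pattern and the fix differ only in the final character class.
const original = /(?:^|\n)[ \t]*(>[ \t]*\S(?:(?!\n(\s*\n)+[^>])[\s\S])*)/g;
const fixed = /(?:^|\n)[ \t]*(>[ \t]*\S(?:(?!\n(\s*\n)+[^>\s])[\s\S])*)/g;

// Two quoted lines separated by two blank lines, then ordinary text.
const input = "> a\n\n\n> b\n\n\ntext";

// Collect capture group 1 (the quote block) of every match.
const blocks = (re) => [...input.matchAll(re)].map((m) => m[1]);

console.log(JSON.stringify(blocks(original))); // ["> a","> b"]
console.log(JSON.stringify(blocks(fixed)));    // ["> a\n\n\n> b"]
```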
