how does javascript regex lazy match work?


For this string
abc.com/file/some.png?v=123
how do I match .png? I use
/\..*?\?/
but it matches .com/file/some.png? instead. Why is the lazy match rule not working here?

There are lots of variants to this answer. I will propose matching the first file suffix after the last / character.
That can be done with this regex
/(?!.*\/)\.\w+\?/
Explanation
(?!.*\/)\.\w+\?
(?!.*\/) Negative lookahead: assert that it is impossible to match the regex below starting at this position
    .* Match any single character that is NOT a line break character (line feed, carriage return, line separator, paragraph separator), between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    \/ Match the character “/” literally
\. Match the character “.” literally
\w+ Match a single character that is a “word character” (ASCII letter, digit, or underscore only), between one and unlimited times, as many times as possible, giving back as needed (greedy)
\? Match the character “?” literally
Created with RegexBuddy
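
As for why the lazy quantifier in the question does not help: a regex engine reports the leftmost match, so it commits to the first "." it finds (the one before "com"). Laziness only controls how far the match extends from that starting point. The sketch below compares the question's pattern with the answer's pattern. The third pattern, which uses a lookahead to leave the "?" out of the match, is an illustrative assumption and not part of the answer.

```javascript
const url = "abc.com/file/some.png?v=123";

// Question's pattern: the match starts at the first "." and the lazy .*?
// grows one character at a time until a "?" can follow.
const lazy = url.match(/\..*?\?/)[0];

// Answer's pattern: the negative lookahead rejects any starting position
// that still has a "/" somewhere after it.
const afterLastSlash = url.match(/(?!.*\/)\.\w+\?/)[0];

// Assumed variant: match only the suffix and require, but do not consume, the "?".
const suffixOnly = url.match(/\.\w+(?=\?)/)[0];

console.log(lazy);           // .com/file/some.png?
console.log(afterLastSlash); // .png?
console.log(suffixOnly);     // .png
```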

Related

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept fewer than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help?

You have: [^\s] (requires matching at least one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching at least one non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
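
A quick way to check the minimum-length behaviour described above is to run both patterns over a few inputs. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```javascript
// Pattern from the question: three mandatory parts, so at least 3 characters.
const original = /^[^\s][a-z0-9 ]+[^\s]$/i;
// Pattern from the answer: word boundaries consume no characters.
const withBoundaries = /^\b[a-z0-9 ]+\b$/i;

const samples = ["a", "aa", "aaa", "a b", " ab", "ab "];
for (const s of samples) {
  // Prints the quoted input followed by the result of each pattern.
  console.log(JSON.stringify(s), original.test(s), withBoundaries.test(s));
}
```

With these inputs only the first two differ: "a" and "aa" are rejected by the original pattern and accepted by the word-boundary version.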

Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string, not followed by a space (this excludes a leading space).
[a-z0-9 ]* - A sequence of letters, digits and spaces, possibly empty (the content before the last character, see below).
[a-z0-9]$ - The last letter or digit and the end of the string (this excludes a trailing space).

You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
NOTE: If only one whitespace character is allowed between the alphanumeric chars, use \s instead of \s+:
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) and 1+ digit/letters
$ - end of string.
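
The difference between the two variants above only shows up when words are separated by more than one whitespace character. A small sketch, with sample strings chosen for illustration:

```javascript
const anyWhitespaceRun = /^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i;
const singleWhitespace = /^[a-z0-9]+(?:\s[a-z0-9]+)*$/i;

console.log(anyWhitespaceRun.test("a"));     // true
console.log(anyWhitespaceRun.test("ab  cd")); // true  (two spaces allowed)
console.log(singleWhitespace.test("ab  cd")); // false (only one allowed)
console.log(anyWhitespaceRun.test(" ab"));   // false (leading space)
console.log(anyWhitespaceRun.test("ab "));   // false (trailing space)
```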

Match Specific File Path Regex

so far I have this regex $fileregex = /([a-z]:\\\\([^\\\\^\\.])*)|(\/[^\/.])/i; but I am very confused on what to do next.
I want to match strings in this format
c:\\something\\else\\something
c:\\something\\else\\something.whatever
/etc/whatever/something/here
/etc/here.txt
/
c:\\
But I don't want to match, for example
c:\oneslash\text.txt
\etc\hi
I am really stuck on my regex, especially on repeating the optional path, as one could just request the root. Can anyone help me out with the regex?

This one should work:
preg_match_all('%[A-Za-z]:\\\\\\\\(.*?\\\\\\\\)*.*|/(.*?/)*.*%m', $input, $regs, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($regs[0]); $i++) {
# Matched text = $regs[0][$i];
}
Description of the Regex:
Match either the regular expression below (attempting the next alternative only if this one fails)
    [A-Za-z] Match a single character present in the list below
        A character in the range between “A” and “Z”
        A character in the range between “a” and “z”
    : Match the character “:” literally
    \\\\ Match the character “\” literally
    \\\\ Match the character “\” literally
    ( Match the regular expression below and capture its match into backreference number 1
        . Match any single character that is not a line break character
            *? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
        \\\\ Match the character “\” literally
        \\\\ Match the character “\” literally
    )* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    . Match any single character that is not a line break character
        * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
| Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
    / Match the character “/” literally
    ( Match the regular expression below and capture its match into backreference number 2
        . Match any single character that is not a line break character
            *? Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
        / Match the character “/” literally
    )* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    . Match any single character that is not a line break character
        * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)

Have you tried this:
/^([a-zA-Z]\:|\\\\[^\/\\:*?"<>|]+\\[^\/\\:*?"<>|]+)(\\[^\/\\:*?"<>|]+)+(\.[^\/\\:*?"<>|]+)$/
This regular expression matches Windows-style file paths with single backslashes, on local drives and on network paths. The file extension is required, so it does not cover the Unix-style paths or the bare roots from the question.
Further references : The regular expression library
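
For comparison, here is a hedged JavaScript sketch of an anchored pattern for the formats listed in the question. It is not taken from either answer: the pattern, the variable names and the assumption that path segments may contain anything except "/" and "\" are all illustrative. String.raw is used so that the backslashes in the samples are kept exactly as typed.

```javascript
// Either a drive letter, ":" and two literal backslashes, then segments
// separated by two backslashes; or "/" followed by segments separated by "/".
const filePath =
  /^(?:[a-z]:\\\\(?:[^\\\/]+\\\\)*[^\\\/]*|\/(?:[^\\\/]+\/)*[^\\\/]*)$/i;

const shouldMatch = [
  String.raw`c:\\something\\else\\something`,
  String.raw`c:\\something\\else\\something.whatever`,
  "/etc/whatever/something/here",
  "/etc/here.txt",
  "/",
  String.raw`c:\\`,
];
const shouldNotMatch = [
  String.raw`c:\oneslash\text.txt`,
  String.raw`\etc\hi`,
];

for (const p of shouldMatch) console.log(p, filePath.test(p));    // all true
for (const p of shouldNotMatch) console.log(p, filePath.test(p)); // all false
```

Anchoring with ^ and $ is a design choice here: it validates a whole string, whereas the unanchored PHP pattern above finds paths inside a larger text.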

Checking an Array Element for Two Blank Space Characters

How would you be able to tell if an array element is made up of three words (i.e. if it has two blank space characters in it)? It might look something like "abc def ghi". I am trying to search through an array for elements of this form and will remove this while others of the format "jkl xyz" or '"jkl"' would remain in the array.

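You can use the search function with the following regex; it returns the index of the first match, or -1 if there is none (the g flag has no effect on search, so it is omitted):
str.search(/\b(\w+ \w+ \w+)\b/);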

You can use a regex like:
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def def") // true
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def ") // false
It means:
^ Start of string
[^\s]+ 1 or more non-whitespace characters
\s a whitespace character
[^\s]+ 1 or more non-whitespace characters
\s a whitespace character
[^\s]+ 1 or more non-whitespace characters
$ End of string

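var myArray = ["abc def ghi","jkl xyz","gty slp","zxc vbn jkl"];
// Loop backwards so that removing an element does not shift the ones still to be checked.
for (var i = myArray.length - 1; i >= 0; --i) {
    if (/\w+ \w+ \w+/.test(myArray[i])) {
        myArray.splice(i, 1); // remove one element at index i
    }
}
console.log(myArray);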
Outputs:
["jkl xyz", "gty slp"]
RegexExplanation:
\w+ \w+ \w+
-----------
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
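
An alternative that avoids mutating the array while iterating is Array.prototype.filter. This is a minimal sketch under two assumptions that are not in the answers above: "three words" means exactly three runs of non-whitespace separated by single spaces, and the helper name isThreeWords is made up.

```javascript
const myArray = ["abc def ghi", "jkl xyz", "gty slp", "zxc vbn jkl", '"jkl"'];

// Anchored, so an element with four or more words is not treated as three.
const isThreeWords = (s) => /^\S+ \S+ \S+$/.test(s);

// filter() returns a new array and leaves the original untouched.
const kept = myArray.filter((s) => !isThreeWords(s));

console.log(kept); // [ 'jkl xyz', 'gty slp', '"jkl"' ]
```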

Is there a better way to write a regex that does not match on leading and trailing spaces along with a character limit?

The regex I have is...
^[A-z0-9]*[A-z0-9\s]{0,20}[A-z0-9]*$
The ultimate goal of this regex is not to allow leading and trailing spaces, while limiting the characters that are entered to 20, which the above regex doesn't do a good job at.
I found some questions similar to this, and the closest one would be How to validate a user name with regex?, but it did not limit the number of chars. It did solve the problem of leading and trailing spaces.
I also saw a way using negation and another negative lookahead, but that didn't work out so well for me.
Is there a better way to write the regex above with the 20 character limit? The repetition of the allowed characters is pretty ugly, especially when the list of allowed characters is large and specific.

Update:
I like this one even better. We use a negative lookahead to make sure there isn't ^\s (whitespace at the beginning of the string) or \s$ whitespace at the end of the string. And then match 1 alphanumeric character. We repeat this 1-20 times.
/^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i
^ (?# beginning of string)
(?: (?# non-capture group for repetition)
(?! (?# begin negative lookahead)
^\s (?# whitespace at beginning of string)
| (?# OR)
\s$ (?# whitespace at end of string)
) (?# end negative lookahead)
[a-z0-9\s] (?# match one alphanumeric/whitespace character)
){1,20} (?# repeat this process 1-20 times)
$ (?# end of string)
Initial:
I use a negative lookahead at the beginning of the string ((?!...)) to make sure that we don't start off with whitespace. Then we check for 0-19 alphanumeric (case-insensitive thanks to the i modifier) or whitespace characters. Finally, we make sure we end with a pure alphanumeric character (no whitespace), which avoids lookbehinds (these are only available in JavaScript engines that support ES2018 or later).
/^(?!\s)[a-z0-9\s]{0,19}[a-z0-9]$/i
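
Both patterns from this answer can be checked side by side. The sketch below uses made-up sample inputs, including strings of exactly 20 and 21 characters to probe the length limit.

```javascript
const updated = /^(?:(?!^\s|\s$)[a-z0-9\s]){1,20}$/i;
const initial = /^(?!\s)[a-z0-9\s]{0,19}[a-z0-9]$/i;

const samples = [
  "a",               // single character
  "John Smith 42",   // inner spaces are fine
  " lead",           // leading space
  "trail ",          // trailing space
  "x".repeat(20),    // exactly at the limit
  "x".repeat(21),    // one over the limit
];

for (const s of samples) {
  console.log(JSON.stringify(s), updated.test(s), initial.test(s));
}
```

For these inputs the two patterns agree: the first, second and fifth samples are accepted and the others are rejected.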

Hmm, if you need to exclude the single character text, I would go with:
^[A-z0-9][A-z0-9\s]{0,18}[A-z0-9]$
If a single character is also acceptable:
^[A-z0-9](?:[A-z0-9\s]{0,18}[A-z0-9])?$

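I think your regex does not actually limit the input length: the leading and trailing [A-z0-9]* can each match any number of characters, so only the middle part is capped at 20.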
Are you aware that character range [A-z] includes characters [\]^_`?
I think I'd do something like this:
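input = input.trim().replace(/\s+/g, ' '); // g flag: collapse every run of whitespace, not just the first
if (input.length > MAX_INPUT_LENGTH ||
! /^[a-z ]+$/i.test(input) ) { // RegExp objects have test(), not match()
// raise exception?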
}
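
The point about [A-z] is easy to verify: the range runs over character codes from "A" (65) to "z" (122), which includes the six punctuation characters between "Z" and "a". A short sketch, with hypothetical variable names:

```javascript
const loose = /^[A-z]+$/;     // range used in the question
const strict = /^[A-Za-z]+$/; // letters only

console.log(loose.test("foo_bar"));   // true: "_" falls inside A-z
console.log(strict.test("foo_bar"));  // false
console.log(loose.test("[\\]^_`"));   // true: all six extra characters
console.log(strict.test("FooBar"));   // true
```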

\S matches a non-whitespace character. Therefore this should match what you're looking for:
^\S.{0,18}\S$
That is, a non-space character \S, followed by up to 18 of any type of character . (space or not), and finally a non-space character.
The only limitation of the above regex is that the value must be at least 2 characters. If you need to allow 1 character, you can use:
^\S(.{0,18}\S)?$
If you're looking to validate a user name (as you implied but didn't explicitly state) you're probably looking to allow only numbers, letters, and underscores. In that case, ^\w{1,20}$ will suffice.
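
Note that the \S-based pattern restricts only the ends and the length, not the characters in between, whereas \w{1,20} also rules out spaces and punctuation. A minimal sketch with illustrative inputs:

```javascript
const nonSpaceEnds = /^\S(.{0,18}\S)?$/;
const userName = /^\w{1,20}$/;

console.log(nonSpaceEnds.test("a"));            // true
console.log(nonSpaceEnds.test("a!b c"));        // true: any inner characters
console.log(nonSpaceEnds.test("a "));           // false: trailing space
console.log(nonSpaceEnds.test("y".repeat(21))); // false: too long
console.log(userName.test("user_name1"));       // true
console.log(userName.test("user name"));        // false: space not allowed
```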

Use this pattern (the lookbehind needs an engine that supports it; in JavaScript that means ES2018 or later): ^(?!\s).{0,20}(?<!\s)$
^(?!\s) start of line, not followed by whitespace
.{0,20} followed by 0 to 20 characters
(?<!\s)$ end of line, not preceded by whitespace
Or this pattern: ^(\S.{0,18}\S)?$

Regex - Not containing a string and not ending with a /

How do I create a regular expression that matches strings which don't contain "umbraco" and don't end with a /?
This is what I have so far, but I'm unable to get it fully working. Any help would be appreciated.
(?!umbraco)(?![/]$)
Test strings would be:
http://www.domain.com/umbraco/login.aspx - shouldn't match
http://www.domain.com/pages/1/ - shouldn't match
http://www.domain.com/pages/1 - should match

It should be this regex:
^(?!.*?umbraco).*?[^\/]$
Online Demo: http://regex101.com/r/lM0cS9
Explanation:
^ assert position at start of a line
(?!.*?umbraco) Negative Lookahead - Assert that it is impossible to match the regex below
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed
umbraco matches the characters umbraco literally (case sensitive)
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed
[^\/] match a single character not present in the list below
\/ matches the character / literally
$ assert position at end of a line
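
The pattern can be checked against the three test strings from the question. The sketch below is JavaScript (the question does not name a language, so that is an assumption), and the variable names are illustrative.

```javascript
const pattern = /^(?!.*?umbraco).*?[^\/]$/;

const urls = [
  "http://www.domain.com/umbraco/login.aspx", // contains "umbraco"
  "http://www.domain.com/pages/1/",           // ends with "/"
  "http://www.domain.com/pages/1",            // should match
];

const results = urls.map((u) => pattern.test(u));
console.log(results); // [ false, false, true ]
```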

This should be the regex
^(?!.*?umbraco).*?[^\/\s*\n*]$
demo http://rubular.com/r/tEhY7JFjXK
