Capture between pattern of digits - javascript

I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a javascript based regex engine.

You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part will match one or more digits, a colon, one or more digits and (?:(?!\d+:\d)[\s\S])* will match any char, zero or more occurrences, that do not start a sequence of one or more digits followed with a colon and a digit. See this regex demo.
As the tempered greedy token is a resource consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now, the () is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed with 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.

you can use or not the ñ-Ñ, but you should be ok this way
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include the break lines, you can add the \n or \r to the set,
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try ! also tested in https://regex101.com/
for more chars:
^[a-zA-Z0-9!##\$%\^\&*)(+=._-]+$

Related

Regex: select until first space or comma occurrence

I have following example of american addresses.
6301 Stonewood Dr Apt-728, Plano TX-75024
13323 Maham Road, Apt # 1621, Dallas, TX 75240
17040 Carlson Drive, #1027 Parker, CO 80134
3465 25th St., San Francisco, CA 94110
I want to extract city from using regex
Plano, Dallas, Parker, San Francisco
I am using following regex which is working for first example
(?<=[,.|•]).*\s+(?=[\s,.]?CA?[\s,.-]?[\d]{4,})
can you help me for the same as?
You can match the comma, then all except A-Z and capture from the first occurrence of A-Z.
,[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$
Explanation
,[^A-Z,]*?\b Match a comma, then any char except A-Z or a comma till a word boundary
([A-Z][^,]*?) Capture group 1 Match A-Z and then any char except a comma as least as possible
,?\s*[A-Z]{2} match optional comma, optional whiteapace chars and 2 uppecase chars A-Z
[-\s]\d{4,}\s* Match either - or a whitespace char and then 4 or more digits followed by optional whiteapace chars
$ end of string
Regex demo
You can use
,(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)
See the regex demo. The necessary value is in Group 1.
Details:
, - a comma
(?:\s*#\d+)? - an optional sequence of zero or more whitespaces, # and then one or more digits
\s* - zero or more whitespaces
([^\s,][^,]*) - Group 1: a char other than whitespace and comma and then zero or more non-comma chars
(?=\W+[A-Z]{2}\W*\d{4,}\s*$) - a positive lookahead that requires (immediately on the right)
\W+ - one or more non-word chars
[A-Z]{2} - two uppercase ASCII letters
\W* - zero or more non-word chars
\d{4,} - gfour or more digits
\s* - zero or more whitespaces
$ - end of string.
Another approach (assuming the structure of ending is more or less fixed)
.+\s(\w+?),?.{4}\d{4,}
The best guess I could achieve was starting from the end of the string looking for a chain of non-spacing characters (being the portion you are looking for) followed by a space, a chain of capital letters, then an option space/dash and in the end a chain of numbers.
([^\s]+?)\,?\s[A-Z]+[\s\-]?\d+$
Being the first group, the target you are aiming for.
This is a live example with your use case embedded:
https://regexr.com/6nkq5
(as a side note, the demo on regexr may tell you the expression took more than 250ms and can't render.. you just slightly edit the test case to make it update and show you the actual result)
As long as your match comes always after the (exactly) two country letters, you can use that simple condition to match your city.
(?<= )[A-Za-z ]+(?=,? [A-Z]{2})
Your match [A-Za-z ]+ will be found between
(?<= ): a space and
(?=,? [A-Z]{2}): an optional comma + a space + two uppercase letters
Check the demo here.

Regex match any js number

So as an exercise I wanted to match any JS number. This is the one I could come up with:
/^(-?)(0|([1-9]\d*?|0)(\.\d+)?)$/
This however doesn't match the new syntax with underscore separators (1_2.3_4). I tried a couple of things but I couldn't come up with something that would work. How could I express all JS numbers in one regex?
For the format in the question, you could use:
^-?\d+(?:_\d+)*(?:\.\d+(?:_\d+)*)?$
See a regex demo.
Or allowing only a single leading zero:
^-?(?:0|[1-9]\d*)(?:_\d+)*(?:\.\d+(?:_\d+)*)?$
The pattern matches:
^ Start of string
-? Match an optional -
(?:0|[1-9]\d*) Match either 0 or 1-9 and optional digits
(?:_\d+)* Optionally repeat matching _ and 1+ digits
(?: Non capture group
\.\d+(?:_\d+)* Match . and 1+ digits and optionally repeat matching _ and 1+ digits
)? Close non capture group
$ End of string
See another regex demo.
how about this?
^(-?)(0|([1-9]*?(\_)?(\d)|0|)(\.\d+)?(\_)?(\d))$

how does javascript regex lazy match work?

For this string
abc.com/file/some.png?v=123
how do I match .png? I use
/\..*?\?/
but it is matching .com/file/some.png?, so why is the lazy match rule not working here?
There are lots of variants to this answer. I will propose matching the first file suffix after the last / character.
That can be done with this regex
/(?!.*\/)\.\w+\?/
Explaination
(?!.*/)\.\w+\?
Options: Case insensitive; Dot doesn’t match line breaks; ^$ match at line breaks
Assert that it is impossible to match the regex below starting at this position (negative lookahead) (?!.*/)
Match any single character that is NOT a line break character (line feed, carriage return, line separator, paragraph separator) .*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “/” literally /
Match the character “.” literally \.
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) \w+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the question mark character \?
\1
Insert a backslash \
Insert the character “1” literally 1
Created with RegexBuddy

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept less than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help ?
You have: [^\s] (requires matching at least one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching at least one non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string and no space after it (so here we exclude the
initial space).
[a-z0-9 ]* - A sequence of letters, digits and spaces, possibly empty
(the content before the last letter(see below).
[a-z0-9]$ - The last letter and the end of string (so here we exclude the
terminal space).
You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
See the regex demo.
NOTE: If only one whitespace is allowed between the alphanumeric chars use
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
^^
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) and 1+ digit/letters
$ - end of string.
See the regex graph:

Regex to allow numbers and digits but only one comma

I have a task where the user should be able to edit the first line of an address field but they should only be able to use one comma but can put that one comma anywhere in the string.
I was wondering if there was a way that this could be done in JavaScript?
so far I have tried:
^[a-zA-Z0-9\&\-\,\.\/\'_ ]+$
But this regex allows me to enter multiple commas.
So I want the regex to allow the user to do this:
21, Tash Place N13 2IJ
or this:
,Tash Place 21 N13 2IJ
But not this:
21, Tash Place, N13, 2IJ
Any help would be appreciated
You may use
/^[-a-zA-Z0-9&.\/'_ ]*(?:,[-a-zA-Z0-9&.\/'_ ]*)?$/
See the regex demo.
Here,
^ - matches the start of string,
[-a-zA-Z0-9&.\/'_ ]* - matches 0+ letters, digits or -./'_ symbols, then
(?:,[-a-zA-Z0-9&.\/'_ ]*)? - an optional sequence (1 or 0 occurrences) of:
, - a comma (thus, only one is allowed)
[-a-zA-Z0-9&.\/'_ ]* - matches 0+ letters, digits or -./'_ symbols, then
$ - end of string.
Another way is to add a (?!(?:[^,]*,){2}) negative lookahead to your regex:
/^(?!(?:[^,]*,){2})[-a-zA-Z0-9&.\/',_ ]+$/
^^^^^^^^^^^^^^^^^
See another regex demo
The (?!(?:[^,]*,){2}) lookahead will fail the match if there are 2 sequences of 0+ chars other than , and then a , in the string.

Categories