Regex: select until first space or comma occurrence - javascript

I have following example of american addresses.
6301 Stonewood Dr Apt-728, Plano TX-75024
13323 Maham Road, Apt # 1621, Dallas, TX 75240
17040 Carlson Drive, #1027 Parker, CO 80134
3465 25th St., San Francisco, CA 94110
I want to extract city from using regex
Plano, Dallas, Parker, San Francisco
I am using following regex which is working for first example
(?<=[,.|•]).*\s+(?=[\s,.]?CA?[\s,.-]?[\d]{4,})
can you help me for the same as?

You can match the comma, then all except A-Z and capture from the first occurrence of A-Z.
,[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$
Explanation
,[^A-Z,]*?\b Match a comma, then any char except A-Z or a comma till a word boundary
([A-Z][^,]*?) Capture group 1 Match A-Z and then any char except a comma as least as possible
,?\s*[A-Z]{2} match optional comma, optional whiteapace chars and 2 uppecase chars A-Z
[-\s]\d{4,}\s* Match either - or a whitespace char and then 4 or more digits followed by optional whiteapace chars
$ end of string
Regex demo

You can use
,(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)
See the regex demo. The necessary value is in Group 1.
Details:
, - a comma
(?:\s*#\d+)? - an optional sequence of zero or more whitespaces, # and then one or more digits
\s* - zero or more whitespaces
([^\s,][^,]*) - Group 1: a char other than whitespace and comma and then zero or more non-comma chars
(?=\W+[A-Z]{2}\W*\d{4,}\s*$) - a positive lookahead that requires (immediately on the right)
\W+ - one or more non-word chars
[A-Z]{2} - two uppercase ASCII letters
\W* - zero or more non-word chars
\d{4,} - gfour or more digits
\s* - zero or more whitespaces
$ - end of string.

Another approach (assuming the structure of ending is more or less fixed)
.+\s(\w+?),?.{4}\d{4,}

The best guess I could achieve was starting from the end of the string looking for a chain of non-spacing characters (being the portion you are looking for) followed by a space, a chain of capital letters, then an option space/dash and in the end a chain of numbers.
([^\s]+?)\,?\s[A-Z]+[\s\-]?\d+$
Being the first group, the target you are aiming for.
This is a live example with your use case embedded:
https://regexr.com/6nkq5
(as a side note, the demo on regexr may tell you the expression took more than 250ms and can't render.. you just slightly edit the test case to make it update and show you the actual result)

As long as your match comes always after the (exactly) two country letters, you can use that simple condition to match your city.
(?<= )[A-Za-z ]+(?=,? [A-Z]{2})
Your match [A-Za-z ]+ will be found between
(?<= ): a space and
(?=,? [A-Z]{2}): an optional comma + a space + two uppercase letters
Check the demo here.

Related

Javascript: Regex to exclude whitespace and special characters

I need a regex to validate,
Should be of length 18
First 5 characters should be either (xyz34|xyz12)
Remaining 13 characters should be alphanumeric only letters and numbers, no whitespace or special characters is allowed.
I have a pattern like here, '/^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13}/g'
But this is allowing whitespace and special characters like ($,% and etc) which is violating the rule #3.
Any suggestion to exclude this whitespace and special characters and to strictly check that it must be letters and numbers?
You should not quantify lookarounds. They are non-consuming patterns, i.e. the consecutive positive lookaheads check the presence of their patterns but do not advance the regex index, they check the text at the same position. It makes no sense repeating them 13 times. ^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13} is equal to ^(xyz34|xyz12)(?=.*[a-zA-Z])(?=.*[0-9]), and means the string can start with xyz34 or xyz12 and then should have at least 1 letter and at least 1 digits.
You may consider fixing the issue by using a consuming pattern like this:
If you do not care if the last 13 chars contain only digits or only letters, use the patterns suggested by other users, like /^(?:xyz34|xyz12)[a-zA-Z\d]{13}$/ or /^xyz(?:34|12)[a-zA-Z0-9]{13}$/
If there must be at least 1 digit and at least 1 letter among those 13 alphanumeric chars, use /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/.
See the regex demo #1 and the regex demo #2.
NOTE: these are regex literals, do not use them inside single- or double quotes!
Details
^ - start of string
xyz - a common prefix
(?:34|12) - a non-capturing group matching 34 or 12
(?=[a-zA-Z]*\d) - there must be at least 1 digit after any 0+ letters to the right of the current location
(?=\d*[a-zA-Z]) - there must be at least 1 letter after any 0+ digtis to the right of the current location
[a-zA-Z\d]{13} - 13 letters or digits
$ - end of string.
JS demo:
var strs = ['xyz34abcdefghijkl1','xyz341bcdefghijklm','xyz34abcdefghijklm','xyz341234567890123','xyz14a234567890123'];
var rx = /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/;
for (var s of strs) {
console.log(s, "=>", rx.test(s));
}
.* will match any string, for your requirment you can use this:
/^xyz(34|12)[a-zA-Z0-9]{13}$/g
regex fiddle
/^(xyz34|xyz12)[a-zA-Z0-9]{13}$/
This should work,
^ asserts position at the start of a line
1st Capturing Group (xyz34|xyz12)
1st Alternative xyz34 matches the characters xyz34 literally (case sensitive)
2nd Alternative xyz12 matches the characters xyz12 literally (case sensitive)
Match a single character present in the list below [a-zA-Z0-9]{13}
{13} Quantifier — Matches exactly 13 times

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept less than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help ?
You have: [^\s] (requires matching at least one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching at least one non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string and no space after it (so here we exclude the
initial space).
[a-z0-9 ]* - A sequence of letters, digits and spaces, possibly empty
(the content before the last letter(see below).
[a-z0-9]$ - The last letter and the end of string (so here we exclude the
terminal space).
You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
See the regex demo.
NOTE: If only one whitespace is allowed between the alphanumeric chars use
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
^^
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) and 1+ digit/letters
$ - end of string.
See the regex graph:

Match the alphanumeric words(NOT NUMERIC-ONLY words) which have unique digits

Using regular expression, I want to select only the words which:
are alphanumeric
do not contain only numbers
do not contain only alphabets
have unique numbers(1 or more)
I am not really good with the regex but so far, I have tried [^\d\s]*(\d+)(?!.*\1) which takes me nowhere close to the desired output :(
Here are the input strings:
I would like abc123 to match but not 123.
ab12s should also match
Only number-words like 1234 should not match
Words containing same numbers like ab22s should not match
234 should not match
hel1lo2haha3hoho4
hel1lo2haha3hoho3
Expected Matches:
abc123
ab12s
hel1lo2haha3hoho4
You can use
\b(?=\d*[a-z])(?=[a-z]*\d)(?:[a-z]|(\d)(?!\w*\1))+\b
https://regex101.com/r/TimjdW/3
Anchor the start and end of the pattern at word boundaries with \b, then:
(?=\d*[a-z]) - Lookahead for an alphabetical character somewhere in the word
(?=[a-z]*\d) - Lookahead for a digit somewhere in the word
(?:[a-z]|(\d)(?!\w*\1))+ Repeatedly match either:
[a-z] - Any alphabetical character, or
(\d)(?!\w*\1) - A digit which does not occur again in the same word
Here is a bit shorter & faster regex to make it happen since it doesn't assert negative lookahead for each character:
/\b(?=[a-z]*\d)(?=\d*[a-z])(?!\w*(\d)\w*\1)[a-z\d]+\b/ig
RegEx Demo
RegEx Details:
\b: Word boundary
(?=[a-z]*\d): Make sure we have at least a digit
(?=\d*[a-z]): Make sure we have at least a letter
(?!\w*(\d)\w*\1): Make sure digits are not repeated anywhere in the word
[a-z\d]+: Match 1+ alphanumericals
\b: Word boundary
You could assert all the conditions using one negative lookahead:
\b(?![a-z]+\b|\d+\b|\w*(\d)\w*\1)[a-z\d]+\b
See live demo here
The important parts are starting match from \b and immediately looking for the conditions:
[a-z]+\b Only alphabetic
\d+\b Only numeric
\w*(\d)\w*\1 Has a repeating digit
You can use this
\b(?!\w*(\d)\w*\1)(?=(?:[a-z]+\d+)|(?:\d+[a-z]+))[a-z0-9]+\b
\b - Word boundary.
(?!\w*(\d)\w*\1) - Condition to check unique digits.
(?=(?:[a-z]+\d+)|(?:\d+[a-z]+)) - Condition to check alphanumeric words.
[a-z0-9]+ - Matches a to z and 0 to 9
Demo

Capture between pattern of digits

I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a javascript based regex engine.
You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part will match one or more digits, a colon, one or more digits and (?:(?!\d+:\d)[\s\S])* will match any char, zero or more occurrences, that do not start a sequence of one or more digits followed with a colon and a digit. See this regex demo.
As the tempered greedy token is a resource consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now, the () is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed with 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.
you can use or not the ñ-Ñ, but you should be ok this way
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include the break lines, you can add the \n or \r to the set,
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try ! also tested in https://regex101.com/
for more chars:
^[a-zA-Z0-9!##\$%\^\&*)(+=._-]+$

Checking an Array Element for Two Blank Space Characters

How would you be able to tell if an array element is made up of three words (i.e. if it has two blank space characters in it)? It might look something like "abc def ghi". I am trying to search through an array for elements of this form and will remove this while others of the format "jkl xyz" or '"jkl"' would remain in the array.
You can use search function with following regex :
str.search(/\b(\w+ \w+ \w+)\b/g);
Read the detail in Demo
You can use a regex like:
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def def") // true
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def ") // false
It means:
^ Start of string
[^\s]+ 1 or more none space characters
\s a space character
[^\s]+ 1 or more none space characters
\s a space character
[^\s]+ 1 or more none space characters
\s a space character
$ End of string
var myArray = ["abc def ghi","jkl xyz","gty slp","zxc vbn jkl"];
for (i=0;i<myArray.length;++i) {
if (/\w+ \w+ \w+/.test(myArray[i])) {
myArray.splice([i], 1);
}
};
console.log(myArray);
Outputs:
["jkl xyz", "gty slp"]
CODEPEN DEMO
RegexExplanation:
\w+ \w+ \w+
-----------
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Categories