Regex improvement match - javascript

I need some help to improve a regex!
In JavaScript I have a regular expression which looks for pairs of numbers in a filename
var nums = str.match(/[\d]{1,}[\d]{1,}/gi);
This will match
DV_Banner_1200x627.jpg
DV_Banner_1200y627.jpg
DV_Banner_1200 x 627.jpg
DV_Banner_1200 x627.jpg
DV_Banner_1200 627.jpg
with (1200,627)
I have tried to improve the regex, just in case there are more than two pairs of numbers, to look for the following:
number (1 digit or more) + whitespace (1 or more) + x (zero or once) + whitespace (1 or more) + number (1 digit or more)
This should fail on the second example (using a 'y' instead of an 'x'). I thought it would be:
[\d]{1,}[\s]?[x]?[\s]?[\d]{1,}
but it grabs all the digits in
DV_Banner_1200 x 627 01.jpg
with (1200,627,01) whereas I only want the first two numbers. I've written the code to deal only with the first two, but I was wondering where I was going wrong. Only a level 17 regex wizard can save me now! Thanks

I used \d+\s?x?\s?\d+ as my regex (the same thing, just replacing {1,} with + and removing the unnecessary []). You can see the outcome of it here.
The reason it matches the 01 is all the ? quantifiers. It matches the first \d+ (1 digit: 0), then zero \s, zero x, and zero \s, followed by the second \d+ (another 1 digit: 1).
The regex
(\d+)(?:\s?x\s?|\s)(\d+)
should do the trick. Test it here
(?:...) is a non-capture group. So it allows alternation while not assigning a back reference to it. This part matches the characters in between the two numbers (either has an x or a <space>).
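As a minimal sketch of how that pattern could be used in JavaScript (the helper name extractSize is made up for illustration), the two capture groups give the width and height directly, and without the g flag only the first pair is returned:

```javascript
// Pattern from the answer above: digits, then an "x" with an optional
// whitespace character on each side (or one whitespace alone), then digits.
const sizePattern = /(\d+)(?:\s?x\s?|\s)(\d+)/;

// Hypothetical helper: returns [width, height] as strings, or null.
function extractSize(filename) {
  const m = filename.match(sizePattern);
  return m ? [m[1], m[2]] : null;
}

console.log(extractSize("DV_Banner_1200x627.jpg"));      // [ '1200', '627' ]
console.log(extractSize("DV_Banner_1200 x 627 01.jpg")); // [ '1200', '627' ]
console.log(extractSize("DV_Banner_1200y627.jpg"));      // null
```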

Just try with following regex:
(\d+)(?:(?: ?x ?)| )(\d+)
demo

You say you want "one or more" whitespace characters around the "x", but you have used the ? quantifier, which means "zero or one". Because you have also marked the "x" as optional, the pattern will match any number of two or more digits: your first [\d]{1,} will match against 0, then your second one will match on 1.
Note that you do not need to enclose a single atom in a character class: [\d] can be more simply written as \d. Also {1,}, meaning "one or more", is more easily written as +.
As you want "one or more" whitespace character on either side of the "x", I would go with:
\d+(?:(?:\s+x\s+)|\s+)\d+
Note that (?: ... ) is a "non-capture group", so these bits won't form part of your match array. However, I don't think you want "one or more" whitespace character, as that won't match your first example. Instead, try this:
\d+(?:(?:\s*x\s*)|\s+)\d+
Where the * quantifier means "zero-or-more".
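To see the difference, here is a small sketch that runs the all-optional pattern and the last pattern above against the problem filename, using the g flag so every match is listed (the variable names are illustrative only):

```javascript
const name = "DV_Banner_1200 x 627 01.jpg";

// Every separator is optional, so any run of two or more digits matches.
const loose = /\d+\s?x?\s?\d+/g;

// A separator is required: an "x" with optional whitespace, or whitespace alone.
const strict = /\d+(?:(?:\s*x\s*)|\s+)\d+/g;

console.log(name.match(loose));  // [ '1200 x 627', '01' ]
console.log(name.match(strict)); // [ '1200 x 627' ]
```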


How to replace consecutive non alphanumeric values with _ [duplicate]

What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?
They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
By default a quantifier is greedy, which means it matches as many characters as possible.
A ? after a quantifier makes it "ungreedy" (lazy), which means it matches as few characters as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online e.g. on Regexr, see the greedy example here
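The same behaviour can be reproduced in JavaScript. This is only a sketch, with String.prototype.match and the g flag standing in for PHP's preg_match_all:

```javascript
const text = "abab";

// Greedy: .* grabs as much as possible, so one match spans the whole string.
const greedy = text.match(/a.*b/g);
console.log(greedy); // [ 'abab' ]

// Lazy: .*? grabs as little as possible, so each "ab" is its own match.
const lazy = text.match(/a.*?b/g);
console.log(lazy);   // [ 'ab', 'ab' ]
```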
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
In RegEx, {i,f} means "between i to f matches". Let's take a look at the following examples:
{3,7} means between 3 to 7 matches
{,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0); not every flavor accepts an omitted lower bound (JavaScript, for one, treats it as literal text), so {0,10} is the portable form
{3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity)
{0,} means a lower limit of 0 and no upper limit (some flavors, such as Python's re module, also accept {,})
{5} means exactly 5
Most good languages contain abbreviations, and so does RegEx:
+ is the shorthand for {1,}
* is the shorthand for {0,}
? is the shorthand for {0,1}
This means + requires at least 1 match while * accepts any number of matches or no matches at all and ? accepts no more than 1 match or zero matches.
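A quick way to convince yourself of these equivalences is to run each shorthand next to its explicit form. The sketch below uses JavaScript and writes the lower bound explicitly, since JavaScript does not treat forms like {,1} as a quantifier; the sample strings are arbitrary:

```javascript
// Each pair holds a shorthand quantifier and its explicit {min,max} form.
const pairs = [
  [/^a+$/, /^a{1,}$/],
  [/^a*$/, /^a{0,}$/],
  [/^a?$/, /^a{0,1}$/],
];
const samples = ["", "a", "aa", "aaa"];

// True when both patterns of every pair agree on every sample string.
const allAgree = pairs.every(([shorthand, explicit]) =>
  samples.every((s) => shorthand.test(s) === explicit.test(s))
);

console.log(allAgree);          // true
console.log(/^a+$/.test(""));   // false: + needs at least one "a"
console.log(/^a*$/.test(""));   // true: * accepts zero occurrences
console.log(/^a?$/.test("aa")); // false: ? accepts at most one
```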
Credit: Codecademy.com
+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.
Consider the following string to match:
ab
The pattern (ab.*) matches, and the capture group contains ab.
The pattern (ab.+) does not match and returns nothing.
But if you change the string to the following, the pattern (ab.+) returns aba:
aba
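The same three cases, sketched in JavaScript (the variable names are illustrative):

```javascript
// (ab.*): the .* may match nothing, so "ab" alone is enough.
const starOnAb = "ab".match(/(ab.*)/);
console.log(starOnAb[1]);  // ab

// (ab.+): the .+ needs at least one more character after "ab".
const plusOnAb = "ab".match(/(ab.+)/);
console.log(plusOnAb);     // null

const plusOnAba = "aba".match(/(ab.+)/);
console.log(plusOnAba[1]); // aba
```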
+ is minimal one, * can be zero as well.
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example:
for example we have an array:
numbers = [5, 15]
The regex ^[0-9]+ matches both 5 and 15, because each contains at least one digit.
^[0-9]* also matches both, and in addition it matches an empty string. The difference is that the + operator requires at least one occurrence of the preceding expression, while * is satisfied by none at all.

How does the following code mean two consecutive numbers?

This is from an exercise on FCC beta, and I cannot understand how the following code means two consecutive numbers, seeing as \D* means zero or more non-digits and \d means a digit. How does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't really match anything except an empty string, because it consists only of lookaheads and nothing precedes them to consume characters.
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to use a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match any digit that is followed by zero or more non-digits and then another digit. A few examples (matched digits marked underneath):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put brackets around both numbers. Ie:
(\d)(?=\D*(\d))
Here it is in action.
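A small JavaScript sketch of both patterns on the example strings above. The g flag is added here so that every matching digit is listed, and the variable names are illustrative:

```javascript
// A digit followed by any number of non-digits and then another digit.
const digitBeforeDigit = /\d(?=\D*\d)/g;

const results = ["2 hhebuehi3", "245673", "2v jugn45"].map((s) =>
  s.match(digitBeforeDigit)
);
console.log(results[0]); // [ '2' ]
console.log(results[1]); // [ '2', '4', '5', '6', '7' ]
console.log(results[2]); // [ '2', '4' ]

// With capture groups, each match also reports the digit found by the lookahead.
const captured = [..."2v jugn45".matchAll(/(\d)(?=\D*(\d))/g)].map((m) => [
  m[1],
  m[2],
]);
console.log(captured); // [ [ '2', '4' ], [ '4', '5' ] ]
```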
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d character. This means that your lookaheads will actually match something which isn't just an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
This matches a string with 5 or more word characters, including at least 1 digit. It can match two digits, but it can also match 1 digit, 3 digits, 12 digits, etc. To require two consecutive digits in a string of 5 or more characters, specify the number of digits you want in the second lookahead:
let regex = /(?=\w{5,})(?=\D*\d{2})/;
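let string1 = "abcd2";
// regex1: five or more word characters ahead, and at least one digit ahead
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1)); // true
// regex2: same length check, but two consecutive digits are required
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1)); // false
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2)); // true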
My original answer was about Regex in a vacuum and I glossed over the fact that you were using Regex in conjunction with JavaScript, which works a little differently when testing a Regex against a string. I still don't know why your original regex was supposed to match two numbers, but I hope this is a bit more helpful.
(?=...) Positive lookahead
\w{5,} matches any word character (equal to [a-zA-Z0-9_])
{5,} matches between 5 and unlimited times
\D* matches any character that's not a digit (equal to [^0-9])
* matches between zero and unlimited times
\d matches a digit (equal to [0-9])
This expression is global - so tries to match all
You can always check your expression using regex101

Javascript regex for extracting certain part of string

I need to extract certain part of Javascript string. I was thinking to do it with regex, but couldn't come up with one which does it correctly.
String can have variable length & can contain all possible characters in all possible combinations.
What I need to extract from it, is 10 adjacent characters, that match one of next two possible combinations:
9 numbers & 1 letter "X" (capital letter "X", not X as variable letter!)
10 numbers
So, if input string is this: "[1X,!?X22;87654321X9]ddee", it should return only "87654321X9".
I hope I've explained it good enough. Thanks in advance!
This Regex will work:
\d{9}X|\d{8}X\d|\d{7}X\d{2}|\d{6}X\d{3}|\d{5}X\d{4}|\d{4}X\d{5}|\d{3}X\d{6}|\d{2}X\d{7}|\d{1}X\d{8}|\d{10}|X\d{9}
As described, it needs to match 9 digits and the letter X, and the letter can be at any position in the sequence.
\d{9}X # will match 9 digits and the letter at the end
\d{8}X\d # will match 8 digits, the letter, then a digit again
...
\d{1}X\d{8} # will match 1 digit, the letter, then 8 digits
\d{10} # will match 10 digits
Edited to match only X
You can use this much simpler regex:
/(?!\d*X\d*X)[\dX]{10}/
RegEx Breakup:
(?!\d*X\d*X) # negative lookahead to fail the match if there are 2 X ahead
[\dX]{10} # match a digit or X 10 times
Since more than one X is not allowed due to the negative lookahead, this regex will only allow either 10 digits or else 9 digits and a single X.
RegEx Demo
This regex has a few advantages over the other answer:
Much simpler regex that is easier to read and maintain
Takes fewer than half the steps to complete, which can be a substantial difference on larger text.
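A minimal sketch of the lookahead version in use, assuming a made-up helper name findCode; it returns the first ten-character run of digits with at most one X, or null:

```javascript
// Negative lookahead rejects a start position when two X's lie ahead in the
// same run of digits and X's; the character class then takes ten characters.
const tenCharPattern = /(?!\d*X\d*X)[\dX]{10}/;

// Hypothetical helper: first matching ten-character code, or null.
function findCode(input) {
  const m = input.match(tenCharPattern);
  return m ? m[0] : null;
}

console.log(findCode("[1X,!?X22;87654321X9]ddee")); // 87654321X9
console.log(findCode("abc0123456789"));             // 0123456789
console.log(findCode("12X4567X90"));                // null
```

One thing to keep in mind: the lookahead is not limited to the ten-character window, so a second X further along the same run (as in "1234X67890X") also causes the match to be rejected.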

Regex to match only when expression match is no more than 12 characters long

I am trying to create a regular expression (Java/JavaScript) that matches the following regex, but only when there are fewer than 13 characters total (and a minimum of 4).
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?) ← originally posted
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ [A-Z]+)?)
These values should (and do) match:
MED-123
COTA-1224
MED4
COTB-892K777
MED-33 DDD
MED-234J5678
This value matches, but I don't want it to (I want to match only if there are no more than 12 characters total):
COT-1111J11111111111111
See http://regexr.com/3bs7b http://regexr.com/3bsfv
I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching.
I feel like I am missing something simple...thanks in advance for your help!
You can use negative look-ahead:
(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)
Since your expression already makes sure that a match starts with COT or MED and that there is at least one digit after that, it already guarantees that there are at least 4 characters.
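A short JavaScript sketch of this approach against the values from the question. It assumes each value is tested as its own string, because the lookahead counts every character from the start of the match to the end of the input:

```javascript
// (?!.{13,}) fails the match when 13 or more characters follow the start.
const pattern =
  /(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)/;

const shouldMatch = [
  "MED-123",
  "COTA-1224",
  "MED4",
  "COTB-892K777",
  "MED-33 DDD",
  "MED-234J5678",
];
const tooLong = "COT-1111J11111111111111";

const allMatch = shouldMatch.every((value) => pattern.test(value));
console.log(allMatch);              // true
console.log(pattern.test(tooLong)); // false
```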
I have tried grouping my expression and putting {4,12} at the end, but
that just makes it look for 4 to 12 instances of the whole expression
matching.
This looks for 4 to 12 instances of the whole expression because you didn't add a word boundary \b. Your regex works fine, just add a word boundary and your desired outcome would be achieved. Take a look at this DEMO.
Your regex seems to be very clumsy and looks a little bit hard to read. It is also very limited to certain characters example JK except if you want it to be that way. For a more general pattern, you can check this out
(COT|MED)[AB]?-?[\dJK]{1,8}(\s+D{1,3})?\b
(COT|MED): matches either COT or MED
[AB]?: matches A or B which is optional because of the presence of ?
-?: matches - which is also optional
[\dJK]{1,8}: This matches a number,or J or K with a length of at least one character and a maximum of eight characters.
(\s+D{1,3})?: matches one or more whitespace characters followed by one to three D characters; the whole group is optional
\b: with respect to your question this seems to be the most important and it creates a boundary for the words that have already been matched. This means that anything exceeding the matched pattern would not be captured.
See the demo here DEMO2
The answer you are looking for is
(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)
See regex demo
Note that it is almost impossible to check the length of a phrase that is not a whole string or that has spaces inside since boundaries are a bit "blurred". Thus, (?!\S{13}) is a kind of a workaround that just makes sure you do not have a string without whitespace that is 13 characters long or longer.
The regex breakdown:
(?!\S{13}) - Check if the substring that follows does not consist of 13 non-whitespace characters
(?:COT|MED) - Any of the values in the alternation (COT or MED)
[ABCD]?-? - Optional A, B, C, D and then an optional -
\d{1,4} - 1 to 4 digits
(?:[JK]+\d*|(?: [A-Z]+)?) - a group of 2 alternatives:
[JK]+\d* - J or K, 1 or more times, and then 0 or more digits
(?: [A-Z]+)? - optional space and 1 or more Latin uppercase letters
As this answer suggests, you could solve it this way, with a second lookahead that limits the whole string to between 4 and 12 characters:
(?=(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?))(?=.{4,12}$)

javascript regular expression for -999x999

I need a regular expression for:
-[n digits]x[n digits]
I already tried this:
var s = "path/path/name-799x1024.jpg";
s.replace(/\d/g, "");
But this gets only the digits.
Here is a small jsfiddle: http://jsfiddle.net/aq6dp49n/
The outcome I try to get is:
path/path/name.jpg
How do I add the - and the small x between the two digits?
The regular expression that would match that is /-\d+x\d+/. Hence:
s.replace(/-\d+x\d+/, "")
Should work.
Here's what the regex means: the first - tells it that it should look for a - character. Then you have \d+ which means "one or more of \d", where \d is short-hand for the character class [0-9], i.e., all digits. After that you have x, which means it will look for the character x, and finally you have \d+ again, which is the same as before.
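Put together as a small sketch (the helper name stripSize is made up for illustration). Note that replace returns a new string rather than modifying s, so the result has to be used or assigned:

```javascript
// Hypothetical helper: removes a "-<digits>x<digits>" part such as "-799x1024".
function stripSize(path) {
  return path.replace(/-\d+x\d+/, "");
}

const s = "path/path/name-799x1024.jpg";
console.log(stripSize(s));                    // path/path/name.jpg
console.log(s);                               // path/path/name-799x1024.jpg (unchanged)
console.log(stripSize("path/path/name.jpg")); // path/path/name.jpg
```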
To match
-[n digits]x[n digits]
You would want
match(/-[0-9]{n}x[0-9]{n}\b/)
Though if you want an arbitrary (one or more) number of digits, use + in place of {n}. In the case of your example, you'd want 3 and 4 for your values of n.
Here's a step-by-step explanation of what this does:
/-[0-9]{3}x[0-9]{4}\b/
- matches the character - literally
[0-9]{3} match a single character present in the list below
Quantifier: {3} Exactly 3 times
0-9 a single character in the range between 0 and 9
x matches the character x literally (case sensitive)
[0-9]{4} match a single character present in the list below
Quantifier: {4} Exactly 4 times
0-9 a single character in the range between 0 and 9
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
To remove the last size-like part of a string, this should do:
"path/path/name-799x1024.jpg".replace(/(.*)-[0-9]+x[0-9]+/, "$1");
// "path/path/name.jpg"
"path/path/name-10x12-799x1024.jpg".replace(/(.*)-[0-9]+x[0-9]+/, "$1");
// "path/path/name-10x12.jpg"
This takes advantage of the fact that regexps are greedy, so the (.*) absorbs (and saves) as much preceding text as possible before finding the next match.
(I prefer to use [0-9] in place of \d because it's more explicit: in some regex flavors, such as .NET or Python 3 on Unicode strings, \d also matches non-Latin numerals. In JavaScript the two are equivalent, so in this case it doesn't matter.)
