Difference between regex repeater and submatch recur - javascript

Can anyone explain to me the differences between these two regex approaches:
/(\d)\1/
/(\d){2,}/
As far as I can see they both match for at least one recurrence of a subexpression. If they, in fact, do the same thing, are there any performance issues that distinguish them?

No, they don't do the same thing.
/(\d)\1/
matches
11 and 22 and 33
The parentheses put the matched digit in a capture group, and \1 refers back to whatever that group captured, so you match two equal digits in a row.
while
/(\d){2,}/
matches
12 and 22 and 123456789 and 22222222
Here you match two or more ({2,}) digits in a row. They can be different digits.

/(\d)\1/ - Match a digit, capture it in group 1, and then match the same digit again, using a backreference.
/(\d){2,}/ - Match 2 digits or more. Only the last digit will be captured in the group. Each digit is matched independently; they don't have to be the same.
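A minimal sketch of the difference, runnable in Node or a browser console. The sample strings and variable names are made up for illustration.

```javascript
// Backreference: the second digit must equal the first.
const sameDigitTwice = /(\d)\1/;
// Quantified group: any two or more digits in a row.
const twoOrMoreDigits = /(\d){2,}/;

const samples = ["11", "12", "a7b", "9955"];
const results = samples.map((s) => ({
  input: s,
  backreference: sameDigitTwice.test(s),
  quantifier: twoOrMoreDigits.test(s),
}));
console.log(results);

// A quantified group keeps only its last iteration.
const lastCaptured = "123".match(twoOrMoreDigits)[1];
console.log(lastCaptured); // "3"
```

Only "12" separates the two patterns here: it has two digits in a row, but they are not equal.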

Related

Regexp with a maximum of 2 consecutive equal chars and other options

I'm totally new to regular expressions and I have to build one with the following requirements:
between 8 and 15 chars
at least 1 alphabetic char (a-z,A-Z)
at least 1 non alphabetic (all the others)
at least 1 CAPITAL letter
at least 1 non-capital letter
maximum of 2 consecutive equal chars (e.g.: 'g' accepted, 'gg' accepted, 'ggg' not)
I tried this one, but it only works with a maximum of 5 consecutive equal chars (I don't understand why). What am I doing wrong?
var regexp = /^((?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z])(.{8,15})(?!.*(.)\1{2}))$/;
EDIT: it works with
asdfghjkl1Q
asdfghjkl1QQQ
asdfghjkl1QQQQQ
it does not work with
asdfghjkl1QQQQQQ
asdfghjkl1QQQQQq
What I'm trying to obtain is:
WORKING with :
asdfghjkl1Q
asdfghjkl1QQ
asdfghjkl11
NOT WORKING with:
asdfghjkl1QQQ
asdfghjkl1QQq
asdfghjkl111
I think you don't need the outer capturing group so you might omit it.
You could first check for the 8,15 characters until the end of the string $ using a lookahead (?=.{8,15}$)
If all the lookaheads match, then match any character one or more times .+
Try it like this:
^(?=.{8,15}$)(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z])(?!.*(.)\1{2}).+$
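A quick way to check the pattern above against sample inputs. `isValidPassword` is a hypothetical helper name. Two results may be surprising: "asdfghjkl11" is rejected because it has no capital letter, and "asdfghjkl1QQq" is accepted because the backreference comparison is case-sensitive.

```javascript
const passwordPattern =
  /^(?=.{8,15}$)(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z])(?!.*(.)\1{2}).+$/;

// Hypothetical helper: true when the candidate satisfies every lookahead.
const isValidPassword = (candidate) => passwordPattern.test(candidate);

console.log(isValidPassword("asdfghjkl1Q"));   // true
console.log(isValidPassword("asdfghjkl1QQ"));  // true
console.log(isValidPassword("asdfghjkl1QQQ")); // false: "QQQ" is three in a row
console.log(isValidPassword("asdfghjkl1QQq")); // true: "Q" and "q" are different characters
console.log(isValidPassword("asdfghjkl11"));   // false: no capital letter
console.log(isValidPassword("short1Q"));       // false: only 7 characters
```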

Insert hyphen when regex group is not null or empty

I have a numeric code that varies in length from 6 to 11 digits,
separated by a hyphen after every 3 digits.
possible combinations
123-456
123-456-78
123-456-7890
So, here I am trying to convert the user entered code to this format even when entered with spaces and hyphens in the middle.
For Ex:
123 456-7 -> 123-456-7
123456 789 -> 123-456-789
123456 -> 123-456
Valid user input format is 3digits[space or hyphen]3digits[space or hyphen]0to5digits
I tried it like this
code.replace(/^(\d{3})[- ]?(\d{3})[- ]?(\d{0,5})$/,'$1-$2-$3');
But when there are only 6 digits there is a hyphen(-) at the end of the number which is not desired.
123-456-
Could anybody help me with this? Thank you.
The easiest way is probably to just do this with a second replace:
code.replace(/^(\d{3})[- ]?(\d{3})[- ]?(\d{0,5})$/,'$1-$2-$3')
.replace(/-$/, '');
This chains a second replace call, which says "replace a - at the end of the string with an empty string". Or in other words, "if the last character of the string is - then delete it".
I find this approach more intuitive than trying to fit this logic all into a replace command, but this is also possible:
code.replace(
/^(\d{3})[- ]?(\d{3})[- ]?(\d{0,5})$/,
(match, g1, g2, g3) => g1 + '-' + g2 + (g3 ? '-' + g3 : '')
)
I think it's less obvious at a glance what this code is doing, but the net result is the same. If the third group ($3) matched nothing (i.e. your string only contained 6 digits), then the final - is left out of the replacement.
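The two approaches can be compared side by side. This is a sketch only: both function names are hypothetical, and it uses the question's {0,5} for the last group so that 11-digit codes are accepted.

```javascript
const CODE_PATTERN = /^(\d{3})[- ]?(\d{3})[- ]?(\d{0,5})$/;

// Approach 1: always insert both hyphens, then trim a trailing one.
function formatWithTwoReplaces(code) {
  return code.replace(CODE_PATTERN, "$1-$2-$3").replace(/-$/, "");
}

// Approach 2: a replacer function decides whether the second hyphen is needed.
function formatWithReplacer(code) {
  return code.replace(CODE_PATTERN, (match, first, second, rest) =>
    rest ? `${first}-${second}-${rest}` : `${first}-${second}`
  );
}

for (const input of ["123 456-7", "123456 789", "123456"]) {
  console.log(input, "->", formatWithTwoReplaces(input), formatWithReplacer(input));
}
```

A replacement string cannot contain conditional logic, which is why the second approach passes a function to replace instead.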
I believe this will do it for you - replace
^(\d{3})[ -]?()|(\d{3})[ -]?(\d{1,5})
with
$1$3-$2$4
It has two alternations.
^(\d{3})[ -]?() matches start of line and then captures the first group of three digits ($1), optionally followed by a space or a hyphen. Finally it captures an empty group ($2).
(\d{3})[ -]?(\d{1,5}) matches and captures ($3) three digits, optionally followed by a space or a hyphen. Then it matches and captures ($4) the remaining 1-5 digits if they're present.
Since the global flag is set it will make one or two iterations for each sequence of digits. The first will match the first alternation, capturing the first three digits into group 1. Group 2 will be empty.
By the second iteration the first match has already consumed the first three digits, so this time the second alternation will match and capture the next three digits into group 3 and then the remaining digits into group 4.
Note! If there are only three digits left after the first match, neither alternation will match, leaving the last three digits as they are.
So in the first iteration, group 1 holds the digits 123, and groups 2, 3 and 4 are empty. In the second iteration, groups 1 and 2 are empty, group 3 holds the digits 456, and group 4 holds digits 7 to 11.
This gives 123- for the first replacement and 456-7... for the second.
There's no syntax checking in this though. It assumes the digits have been entered as you stated they would.
See it here at regex101.
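A small sketch of the alternation approach in JavaScript, where groups that did not take part in a match are substituted as empty strings. `hyphenate` is a hypothetical name.

```javascript
// The g flag lets replace run once per alternation match.
const groupPattern = /^(\d{3})[ -]?()|(\d{3})[ -]?(\d{1,5})/g;

// Hypothetical helper applying the $1$3-$2$4 replacement described above.
const hyphenate = (code) => code.replace(groupPattern, "$1$3-$2$4");

console.log(hyphenate("123 456-7"));    // "123-456-7"
console.log(hyphenate("123456 789"));   // "123-456-789"
console.log(hyphenate("123456"));       // "123-456"
console.log(hyphenate("123-456-7890")); // "123-456-7890"
```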

Match exactly 11 occurrences of the same digit in a group

I need to match exactly 11 occurrences of the same digit in a group, like:
11111111111
55555555555
But not:
11111000111
55552225555
What I've tried so far can get 11 occurrences of digits:
/([0-9]){11}/g
/\d{11}/g
But it will match any 11 digits.
I've managed to do this:
/(0{11}|1{11}|2{11}|3{11}|4{11}|5{11}|6{11}|7{11}|8{11}|9{11})/g
Is there any other easier way to do it?
/(\d)\1{10}/
This matches the first digit and uses a reference to that digit \1 to match it ten more times. Note that this will also match if the digit repeats 12 or more times, and if other digits start the string, but this seems to be desired.
You should use a backreference: ((\d)\2{10})
The \2 matches "the same thing as was matched by the 2nd capturing group (parentheses)".
https://regex101.com/r/QESWrJ/1
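A short check of the backreference pattern against the question's samples. The anchored variant at the end is an assumption about what "exactly 11" could mean when the whole string is being validated. It is not taken from either answer.

```javascript
const elevenSame = /(\d)\1{10}/;

const samples = [
  "11111111111",
  "55555555555",
  "11111000111",
  "55552225555",
  "111111111111", // twelve ones
];
const matching = samples.filter((s) => elevenSame.test(s));
console.log(matching); // the two eleven-digit runs plus the twelve ones

// Hypothetical stricter variant: the whole string must be one digit repeated 11 times.
const exactlyElevenSame = /^(\d)\1{10}$/;
console.log(exactlyElevenSame.test("11111111111"));  // true
console.log(exactlyElevenSame.test("111111111111")); // false
```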

Javascript regex for extracting certain part of string

I need to extract certain part of Javascript string. I was thinking to do it with regex, but couldn't come up with one which does it correctly.
String can have variable length & can contain all possible characters in all possible combinations.
What I need to extract from it, is 10 adjacent characters, that match one of next two possible combinations:
9 numbers & 1 letter "X" (capital letter "X", not X as variable letter!)
10 numbers
So, if input string is this: "[1X,!?X22;87654321X9]ddee", it should return only "87654321X9".
I hope I've explained it good enough. Thanks in advance!
This Regex will work:
\d{9}X|\d{8}X\d|\d{7}X\d{2}|\d{6}X\d{3}|\d{5}X\d{4}|\d{4}X\d{5}|\d{3}X\d{6}|\d{2}X\d{7}|\d{1}X\d{8}|\d{10}|X\d{9}
As described, it needs to match 9 digits and the letter X, and the X can be at any position in the sequence.
\d{9}X # will match 9 digits and an X at the end
\d{8}X\d # will match 8 digits, an X, then a digit again
...
\d{1}X\d{8} # will match 1 digit, an X, then 8 digits
\d{10} # will match 10 digits
Edited to match only X
You can use this much simpler regex:
/(?!\d*X\d*X)[\dX]{10}/
RegEx Breakup:
(?!\d*X\d*X) # negative lookahead to fail the match if there are 2 X ahead
[\dX]{10} # match a digit or X 10 times
Since more than one X is not allowed due to the negative lookahead, this regex will only allow either 10 digits or else 9 digits and a single X.
RegEx Demo
This regex has a few advantages over the other answer:
Much simpler regex that is easier to read and maintain
Takes fewer than half the steps to complete, which can make a substantial difference on larger text.
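A sketch of the lookahead version applied to the question's input string. `extractCode` is a hypothetical helper name.

```javascript
const tenCharCode = /(?!\d*X\d*X)[\dX]{10}/;

// Hypothetical helper: returns the first 10-character run, or null if none is found.
function extractCode(text) {
  const found = text.match(tenCharCode);
  return found ? found[0] : null;
}

console.log(extractCode("[1X,!?X22;87654321X9]ddee")); // "87654321X9"
console.log(extractCode("abc0123456789"));             // "0123456789"
console.log(extractCode("1234X678X0"));                // null: two X in the run
```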

Regex to match only when expression match is no more than 12 characters long

I am trying to create a regular expression (Java/JavaScript) that matches the following regex, but only when there are fewer than 13 characters total (and a minimum of 4).
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?) ← originally posted
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ [A-Z]+)?)
These values should (and do) match:
MED-123
COTA-1224
MED4
COTB-892K777
MED-33 DDD
MED-234J5678
This value matches, but I don't want it to (I want to only match if there are fewer than 13 characters total):
COT-1111J11111111111111
See http://regexr.com/3bs7b http://regexr.com/3bsfv
I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching.
I feel like I am missing something simple...thanks in advance for your help!
You can use negative look-ahead:
(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)
Since your expression already makes sure that a match starts with COT or MED and that there is at least one digit after that, it already guarantees that there are at least 4 characters.
"I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching."
This looks for 4 to 12 instances of the whole expression because you didn't add a word boundary \b. Your regex works fine, just add a word boundary and your desired outcome would be achieved. Take a look at this DEMO.
Your regex seems clumsy and is a little hard to read. It is also limited to certain characters, for example J and K, unless you want it that way. For a more general pattern, you can check this out:
(COT|MED)[AB]?-?[\dJK]{1,8}(\s+D{1,3})?\b
(COT|MED): matches either COT or MED
[AB]?: matches A or B which is optional because of the presence of ?
-?: matches - which is also optional
[\dJK]{1,8}: matches a digit, J, or K, at least one and at most eight times.
(\s+D{1,3})?: matches one or more whitespace characters followed by one to three D characters; the whole group is optional
\b: with respect to your question this seems to be the most important and it creates a boundary for the words that have already been matched. This means that anything exceeding the matched pattern would not be captured.
See the demo here DEMO2
The answer you are looking for is
(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)
See regex demo
Note that it is almost impossible to check the length of a phrase that is not a whole string or that has spaces inside since boundaries are a bit "blurred". Thus, (?!\S{13}) is a kind of a workaround that just makes sure you do not have a string without whitespace that is 13 characters long or longer.
The regex breakdown:
(?!\S{13}) - Check if the substring that follows does not consist of 13 non-whitespace characters
(?:COT|MED) - Any of the values in the alternation (COT or MED)
[ABCD]?-? - Optional A, B, C, D and then an optional -
\d{1,4} - 1 to 4 digits
(?:[JK]+\d*|(?: [A-Z]+)?) - a group of 2 alternatives:
[JK]+\d* - J or K, 1 or more times, and then 0 or more digits
(?: [A-Z]+)? - optional space and 1 or more Latin uppercase letters
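The breakdown above can be tried against the question's sample values. In this sketch `findCode` is a hypothetical helper name, and the pattern is copied from the answer unchanged.

```javascript
const codePattern =
  /(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function findCode(text) {
  const found = text.match(codePattern);
  return found ? found[0] : null;
}

const wanted = ["MED-123", "COTA-1224", "MED4", "COTB-892K777", "MED-33 DDD", "MED-234J5678"];
for (const value of wanted) {
  console.log(value, "->", findCode(value)); // each value matches in full
}

// 23 non-whitespace characters in a row, so the lookahead rejects it.
console.log(findCode("COT-1111J11111111111111")); // null
```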
As this answer suggests, you could also solve it with two lookaheads:
(?=(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?))(?=.{4,12}$)
