Javascript regex for extracting certain part of string - javascript

I need to extract a certain part of a JavaScript string. I was thinking of doing it with a regex, but couldn't come up with one that does it correctly.
The string can have variable length and can contain all possible characters in all possible combinations.
What I need to extract from it is 10 adjacent characters that match one of the following two combinations:
9 digits & 1 letter "X" (the capital letter "X", not X as a variable letter!)
10 digits
So, if the input string is "[1X,!?X22;87654321X9]ddee", it should return only "87654321X9".
I hope I've explained it well enough. Thanks in advance!

This Regex will work:
\d{9}X|\d{8}X\d|\d{7}X\d{2}|\d{6}X\d{3}|\d{5}X\d{4}|\d{4}X\d{5}|\d{3}X\d{6}|\d{2}X\d{7}|\d{1}X\d{8}|\d{10}|X\d{9}
As described, it needs to match either 10 digits, or 9 digits and the letter X, where the X can be at any position in the sequence.
\d{9}X # will match 9 digits and an X at the end
\d{8}X\d # will match 8 digits, an X, then one more digit
...
\d{1}X\d{8} # will match 1 digit, an X, then 8 digits
\d{10} # will match 10 digits
X\d{9} # will match an X followed by 9 digits
Edited to match only X

You can use this much simpler regex:
/(?!\d*X\d*X)[\dX]{10}/
RegEx Breakup:
(?!\d*X\d*X) # negative lookahead to fail the match if there are 2 X ahead
[\dX]{10} # match a digit or X 10 times
Since more than one X is not allowed due to the negative lookahead, this regex will only allow either 10 digits or else 9 digits and a single X.
RegEx Demo
This regex has a few advantages over the other answer:
Much simpler regex that is easier to read and maintain
Takes fewer than half the steps to complete, which can be a substantial difference on larger text.
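To compare the two answers, here is a minimal sketch that runs both patterns against the sample input from the question. The helper name extractCode and the extra test strings are assumptions made for illustration only. It also shows one edge case where the two patterns seem to disagree: the negative lookahead keeps scanning past the 10-character window, so a second X right after an otherwise valid window makes it reject that window.

```javascript
// Sample input from the question.
const input = "[1X,!?X22;87654321X9]ddee";

// The two patterns from the answers above.
const explicitPattern = /\d{9}X|\d{8}X\d|\d{7}X\d{2}|\d{6}X\d{3}|\d{5}X\d{4}|\d{4}X\d{5}|\d{3}X\d{6}|\d{2}X\d{7}|\d{1}X\d{8}|\d{10}|X\d{9}/;
const lookaheadPattern = /(?!\d*X\d*X)[\dX]{10}/;

// Hypothetical helper: return the first match, or null if there is none.
function extractCode(pattern, text) {
  const match = text.match(pattern);
  return match ? match[0] : null;
}

console.log(extractCode(explicitPattern, input));           // "87654321X9"
console.log(extractCode(lookaheadPattern, input));          // "87654321X9"
console.log(extractCode(lookaheadPattern, "no code here")); // null

// Edge case: the lookahead also sees the X that follows the window.
const edge = "123456789XX";
console.log(extractCode(explicitPattern, edge));  // "123456789X"
console.log(extractCode(lookaheadPattern, edge)); // null
```

If inputs like the last one can occur, the longer alternation is the safer choice; otherwise the shorter pattern should behave the same.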

Related

How does the following code mean two consecutive numbers?

This is from an exercise on the FCC beta, and I cannot understand how the following code means two consecutive numbers, seeing as \D* means zero or more non-digits and \d means a digit. How does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't really match anything except an empty string, because both parts are lookaheads and nothing in the pattern consumes any characters.
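A small sketch of that behaviour, using the regex from the question. The test strings are made up for illustration. Lookaheads are zero-width, so a successful match has length 0 even though both conditions were checked:

```javascript
// The regex from the question: two lookaheads and nothing that consumes text.
const checkPass = /(?=\w{5,})(?=\D*\d)/;

const match = "abc123".match(checkPass);
console.log(match[0].length); // 0 (the matched text is the empty string)
console.log(match.index);     // 0 (matched at the very start)

// test() still reports whether both conditions hold at some position.
console.log(checkPass.test("abc123")); // true
console.log(checkPass.test("abcdef")); // false: no digit
console.log(checkPass.test("ab1"));    // false: fewer than 5 word characters
```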
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to use a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match every digit that is followed by zero or more non-digits and then another digit, i.e. every digit except the last one in the string. A few examples (matched digits marked):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put parentheses (capture groups) around both digits, i.e.:
(\d)(?=\D*(\d))
Here it is in action.
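As a rough sketch in JavaScript, the capturing version can be run with matchAll (which needs the g flag). The variable names are assumptions for illustration. Each result pairs a digit with the next digit found after any non-digits:

```javascript
// Capture a digit and, via the lookahead, the next digit after any non-digits.
const digitPairs = /(\d)(?=\D*(\d))/g;

// Collect [matchedDigit, followingDigit] for every match.
const pairs = Array.from("2v jugn45".matchAll(digitPairs), (m) => [m[1], m[2]]);

console.log(pairs); // [["2", "4"], ["4", "5"]]
```

The last digit ("5") is never matched on its own, because no further digit follows it.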
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d character. This means that your lookaheads will actually match something which isn't just an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
This matches a string containing a run of 5 or more word characters and at least 1 digit. It can match a string with two digits, but it can also match one with 1 digit, 3 digits, 12 digits, etc. In order to require two consecutive digits in a string of 5-or-more characters, you should specify the number of digits you want in the second lookahead:
let regex = /(?=\w{5,})(?=\D*\d{2})/;
let string1 = "abcd2";
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1)); // true: 5 word characters, one digit
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1)); // false: no two consecutive digits
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2)); // true: "23" is two consecutive digits
My original answer was about Regex in a vacuum and I glossed over the fact that you were using Regex in conjunction with JavaScript, which works a little differently when comparing Regex to a string. I still don't know why your original answer was supposed to match two numbers, but I hope this is a bit more helpful.
(?= ... ) Positive lookahead
\w{5,} matches any word character (equal to [a-zA-Z0-9_])
{5,} matches between 5 and unlimited times
\D* matches any character that's not a digit (equal to [^0-9])
* matches between zero and unlimited times
\d matches a digit (equal to [0-9])
With the g (global) flag the expression tries to find all matches; without it, as in the question, only the first match is reported.
You can always check your expression using regex101

Is there a difference between \d and \d+? [duplicate]

https://www.freecodecamp.com/challenges/find-numbers-with-regular-expressions
I was doing a lesson in FCC, and they mentioned that the digit selector \d finds one digit and that adding a + after the selector (\d+) allows it to match more than one digit.
I experimented with it a bit, and noticed that it's the g right after the expression that searches for every number, not the +. I tried using \d+ without the g after the expression, and it only matched the first number in the string.
Basically, whether I use \d or \d+, as long as I have the g after the expression, it will find all of the numbers. So my question is, what is the difference between the two?
// Setup
var testString = "There are 3 cats but 4 dogs.";
var expression = /\d+/g;
var digitCount = testString.match(expression).length;
The g at the end means global, ie. that you want to search for all occurrences. Without it, you'll just get the first match.
\d, as you know, means a single digit. You can add quantifiers to specify how many consecutive digits you want to match.
\d means a single digit
\d+ means one or more consecutive digits
So let's say we have a string like this:
123 456
7890123
/\d/g will match [1,2,3,4,5,6,7,8,9,0,1,2,3]
/\d/ will match 1
/\d+/ will match 123
/\d+/g will match [123,456,7890123]
You could also use /\d{1,3}/g to say you want to match all occurrences where there are from 1 to 3 digits in a sequence.
Another common quantifier is the star symbol, which means 0 or more. For example /1\d*/g would match all sequences of digits that start with 1, and have 0 or more digits after it.
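The cases above can be checked directly with String.prototype.match. This is only a sketch; the variable names are made up, and the two-line sample string is written with an explicit newline:

```javascript
const text = "123 456\n7890123";

const singleDigits = text.match(/\d/g);       // every digit, one match per digit
const firstDigit = text.match(/\d/)[0];       // no g flag: first match only
const firstRun = text.match(/\d+/)[0];        // first run of consecutive digits
const allRuns = text.match(/\d+/g);           // every run of consecutive digits
const upToThree = text.match(/\d{1,3}/g);     // runs of 1 to 3 digits
const startingWithOne = text.match(/1\d*/g);  // a 1 followed by 0 or more digits

console.log(singleDigits.length); // 13
console.log(firstDigit);          // "1"
console.log(firstRun);            // "123"
console.log(allRuns);             // ["123", "456", "7890123"]
console.log(upToThree);           // ["123", "456", "789", "012", "3"]
console.log(startingWithOne);     // ["123", "123"]
```

Note how the quantifier decides how long each match is, while the g flag decides how many matches are returned.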
Counting the occurrences of \d will find the number of digits in the string.
Counting the occurrences of \d+ will find the number of integers in the string.
For example:
123 456 789
Has 9 digits, but 3 integers.
\d means any digit from 0 to 9, the + says "one or more times".
As long as your numbers are single digit there is no difference, but in the string "I have 23 cows", \d would match 2 alone whereas \d+ would match 23.

How to match digit in middle of a string efficiently in javascript?

I have strings like
XXX-1234
XXXX-1234
XX - 4321
ABCDE - 4321
AB -5677
So there will be letters at the beginning, then a hyphen, and then 4 digits. The number of letters may vary, but the number of digits is always 4.
Now I need to match the first 2 positions of the digits. So I tried a long process.
temp_digit=mystring;
temp_digit=temp_digit.replace(/ /g,'');
temp_digit=temp_digit.split("-");
if(temp_digit[1].substring(0,2)=='12') {}
Now, is there any way to do this more efficiently using regex / pattern matching, something like string.match(regexp)? I'm not good with regex patterns. How can I find the first two of the 4 digits in the above strings? It would also be great if the solution could match digits without hyphens, like XXX 1234, but this is optional.
Try a regular expression that finds at least one letter [a-zA-Z]+, followed by some space if necessary \s*, followed by a hyphen -, followed by some more space if necessary \s*. It then captures the first two digits (\d{2}) after that prefix:
[a-zA-Z]+\s*-\s*(\d{2})
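A minimal sketch of using that pattern from JavaScript, run over the sample strings from the question. The helper name firstTwoDigits is an assumption for illustration. Capture group 1 holds the two digits, and the helper returns null when the pattern does not match:

```javascript
const prefixPattern = /[a-zA-Z]+\s*-\s*(\d{2})/;

// Sample strings from the question.
const samples = ["XXX-1234", "XXXX-1234", "XX - 4321", "ABCDE - 4321", "AB -5677"];

// Hypothetical helper: return the captured digits, or null if nothing matches.
function firstTwoDigits(text) {
  const match = text.match(prefixPattern);
  return match ? match[1] : null;
}

console.log(samples.map(firstTwoDigits)); // ["12", "12", "43", "43", "56"]
console.log(firstTwoDigits("XXX 1234"));  // null, because this pattern requires the hyphen
```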
may vary but number of digits are same = 4
Now I need to match the first 2 positions from the digits.
Also it would be great it the solution can match digits without hyphens like XXX 1234 But this is optional.
Do you really need to check it starts with letters? How about matching ANY 4 digit number, and capturing only the first 2 digits?
Regex
/\b(\d{2})\d{2}\b/
Matches:
\b a word boundary
(\d{2}) 2 digits, captured in group 1, and assigned to match[1].
\d{2} 2 more digits (not captured).
\b a word boundary
Code
var regex = /\b(\d{2})\d{2}\b/;
var str = 'ABCDE 4321';
var result = str.match(regex)[1];
document.body.innerText += result;
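One thing to watch with the snippet above: str.match(regex) returns null when nothing matches, so indexing [1] directly would throw. A slightly more defensive sketch follows; the helper name leadingPair and the extra test strings are assumptions for illustration:

```javascript
const fourDigits = /\b(\d{2})\d{2}\b/;

// Hypothetical helper: first two digits of a standalone 4-digit number, or null.
function leadingPair(text) {
  const match = text.match(fourDigits);
  return match ? match[1] : null;
}

console.log(leadingPair("ABCDE 4321")); // "43"
console.log(leadingPair("XXX-1234"));   // "12"
console.log(leadingPair("XXX 1234"));   // "12" (works without the hyphen too)
console.log(leadingPair("ID 12345"));   // null: five digits, so no boundary after four
console.log(leadingPair("no digits"));  // null
```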
If there are always 4 digits at the end, you can simply slice it:
str.trim().slice(-4,-2);
here's a jsfiddle with the example strings:
https://jsfiddle.net/mckinleymedia/6suffmmm/

Regex to match only when expression match is no more than 12 characters long

I am trying to create a regular expression (Java/JavaScript) that matches the following regex, but only when there are fewer than 13 characters total (and a minimum of 4).
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?) ← originally posted
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ [A-Z]+)?)
These values should (and do) match:
MED-123
COTA-1224
MED4
COTB-892K777
MED-33 DDD
MED-234J5678
This value matches, but I don't want it to (I want to match only if there are fewer than 13 characters total):
COT-1111J11111111111111
See http://regexr.com/3bs7b http://regexr.com/3bsfv
I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching.
I feel like I am missing something simple...thanks in advance for your help!
You can use negative look-ahead:
(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)
Since your expression already makes sure that a match starts with COT or MED and that there is at least one digit after that, it already guarantees that there are at least 4 characters.
I have tried grouping my expression and putting {4,12} at the end, but
that just makes it look for 4 to 12 instances of the whole expression
matching.
This looks for 4 to 12 instances of the whole expression because you didn't add a word boundary \b. Your regex works fine, just add a word boundary and your desired outcome would be achieved. Take a look at this DEMO.
Your regex seems very clumsy and is a little hard to read. It is also limited to certain characters, for example J and K, unless you want it that way. For a more general pattern, you can check this out:
(COT|MED)[AB]?-?[\dJK]{1,8}(\s+D{1,3})?\b
(COT|MED): matches either COT or MED
[AB]?: matches A or B which is optional because of the presence of ?
-?: matches - which is also optional
[\dJK]{1,8}: matches a digit, J or K, at least one and at most eight times.
(\s+D{1,3})?: matches one or more whitespace characters followed by 1 to 3 D characters; the whole group is optional
\b: with respect to your question this is the most important part. It asserts a word boundary at the end of the match, so the match fails if further word characters follow the matched pattern.
See the demo here DEMO2
The answer you are looking for is
(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)
See regex demo
Note that it is almost impossible to check the length of a phrase that is not a whole string or that has spaces inside since boundaries are a bit "blurred". Thus, (?!\S{13}) is a kind of a workaround that just makes sure you do not have a string without whitespace that is 13 characters long or longer.
The regex breakdown:
(?!\S{13}) - Check if the substring that follows does not consist of 13 non-whitespace characters
(?:COT|MED) - Any of the values in the alternation (COT or MED)
[ABCD]?-? - Optional A, B, C, D and then an optional -
\d{1,4} - 1 to 4 digits
(?:[JK]+\d*|(?: [A-Z]+)?) - a group of 2 alternatives:
[JK]+\d* - J or K, 1 or more times, and then 0 or more digits
(?: [A-Z]+)? - optional space and 1 or more Latin uppercase letters
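As a quick check, here is a sketch that runs the pattern from this answer over the values listed in the question. The helper name findCode is an assumption for illustration, and the pattern is used unanchored, exactly as written above:

```javascript
const codePattern = /(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)/;

// Values from the question that should match.
const shouldMatch = ["MED-123", "COTA-1224", "MED4", "COTB-892K777", "MED-33 DDD", "MED-234J5678"];
// Value from the question that is too long and should be rejected.
const tooLong = "COT-1111J11111111111111";

// Hypothetical helper: return the matched text, or null if there is no match.
function findCode(text) {
  const match = text.match(codePattern);
  return match ? match[0] : null;
}

console.log(shouldMatch.map(findCode)); // each value is returned unchanged
console.log(findCode(tooLong));         // null
```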
As this answer suggests, you could solve it this way:
(?=(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?))(?={4 , 12})

Regular Expression formatting

I have an input box and the condition is to allow the user to enter only numbers, the numbers entered should be in the following format in groups of 4, ex: 4444 5555 and the maximum number of characters to be entered in the textbox should be 9. I am pretty new to regex, so have no clue of how to start. A working sample in fiddle would be of great help.
If the requirement is strictly 10 numbers in the above grouping with spaces in the middle, the regex is simple:
/^\d{4}\s\d{4}\s\d{2}$/
Here \d matches a single numeric character and {4} repeats the previous token (\d) exactly 4 times, so \d{4} matches 4 numeric characters. \s matches one whitespace character and, like \d{4}, \d{2} matches 2 numeric characters. The ^ and $ mean the start and the end of the string to be matched, respectively.
Hope this helps.
If the length is fixed then you can just use \d to represent a digit
/^\d\d\d\d \d\d\d\d \d\d\d\d \d\d$/
or use the {n} multiplier instead
/^\d{4} \d{4} \d{4} \d\d$/
if instead the total length is arbitrary and you just want to be sure that every four digits you have a space things are just slightly more complex:
/^(\d{4} )*\d{1,4}$/
the meaning is that you want zero or more groups formed with 4 digits and one space followed by 1 to 4 digits. In the last part you can use {0,4} if you also want to accept an empty string as a valid response.
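A possible way to wire the arbitrary-length pattern into a validation function is sketched below. The function name isGroupedDigits and the optional maxLength parameter are assumptions for illustration; the 9-character limit comes from the question:

```javascript
const groupedDigits = /^(\d{4} )*\d{1,4}$/;

// Hypothetical validator: digits in groups of 4 separated by single spaces,
// optionally limited to maxLength characters in total.
function isGroupedDigits(value, maxLength = Infinity) {
  return value.length <= maxLength && groupedDigits.test(value);
}

console.log(isGroupedDigits("4444 5555", 9));    // true
console.log(isGroupedDigits("4444 5555 66"));    // true (no length limit given)
console.log(isGroupedDigits("4444 5555 66", 9)); // false: longer than 9 characters
console.log(isGroupedDigits("44445555", 9));     // false: missing space after 4 digits
console.log(isGroupedDigits("444 5555", 9));     // false: first group has only 3 digits
console.log(isGroupedDigits("", 9));             // false: at least one digit is required
```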
If you want 1 or more of something, use '+'. For example, 4+ would be 1 or more consecutive '4's.
Use * for things that you want 0 or more of!
Use parentheses for groups of characters or groups of other groups.
If you want a space in between, then use the space character between two of them.
It looks like you want 1 or more '4's followed by 0 or more (space followed by 1 or more '4's).
The regular expression "4+( 4+)*" would match all of the following strings:
44444
4 44 4
4 4 4
4
4444444444
4 4
44444444444444 44444444444444444 4444444444444444
4444 4444 44
As per the example provided, this regex will help:
/^[0-9][0-9 ]*$/
This represents digits with spaces in between, e.g. 444 444. But if you enter ' 444 444', i.e. a space first and then the digits, it won't be allowed.
To allow that as well, you can use /^[0-9 ]*$/
^ represents the start and $ represents the end. So between start and end you can write digits and spaces.
