Overlapping named capturing groups


I'm using named capturing groups to validate and extract data out of a product number. The format of the product number looks like this:
1102961D048.075
Chars   Field        Example
1-2     gender_code  11
1-6     style        110296
7-8     width_code   1D
9-11    color_code   048
12      delimiter    . (ignored)
13-15   size_code    075
My current code looks like this:
const validateMpn = (mpn) => {
  const regex = /(?<style>\d{6})(?<width>\d{1}[ABDE])(?<color_code>\d{3})\.(?<size_code>\d{3})/gi
  const match = regex.exec(mpn)
  if (!match) {
    return null
  }
  return match.groups
}
const str1 = '1102961D048.075'
const str2 = '1200322A001.085'
const match1 = validateMpn(str1)
const match2 = validateMpn(str2)
console.log(match1)
console.log(match2)
As gender_code and style overlap, I'm not sure how to capture both. Therefore I have the following questions:
Is it possible to do this with only one regular expression?
If yes, how could I accomplish this?

Sure, just place gender inside the style group:
const validateMpn = (mpn) => {
  const regex = /(?<style>(?<gender>\d{2})\d{4})(?<width>\d{1}[ABDE])(?<color_code>\d{3})\.(?<size_code>\d{3})/gi
  const match = regex.exec(mpn)
  if (!match) {
    return null
  }
  return match.groups
}
const str1 = '1102961D048.075'
const str2 = '1200322A001.085'
const match1 = validateMpn(str1)
const match2 = validateMpn(str2)
console.log(match1)
console.log(match2)
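As a runnable sketch of the nested-group idea, the following validates the whole string and returns both overlapping fields. The helper name parseMpn, the ^...$ anchors, and the gender_code/width_code names are choices made here to match the question's table, not part of the answer:

```javascript
// Hypothetical helper: parse a product number with one regex.
// gender_code is nested inside style, so both groups capture
// overlapping text (chars 1-2 and chars 1-6).
const MPN_RE = /^(?<style>(?<gender_code>\d{2})\d{4})(?<width_code>\d[ABDE])(?<color_code>\d{3})\.(?<size_code>\d{3})$/i;

function parseMpn(mpn) {
  const match = MPN_RE.exec(mpn);
  // Copy into a plain object (match.groups has a null prototype).
  return match ? { ...match.groups } : null;
}

console.log(parseMpn('1102961D048.075'));
console.log(parseMpn('1200322A001.085'));
console.log(parseMpn('1102961X048.075')); // width letter not in [ABDE] -> null
```

Dropping the g flag is deliberate here: a regex stored outside the function with g would carry lastIndex between exec calls, so the second call could fail.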

I suggest using separate capture groups for the first two characters and the four that follow. Then form the style by concatenating the first two capture groups:
const input = "1102961D048.075";
// \. matches only a literal period; an unescaped . would match any character
const regex = /(.{2})(.{4})(.{2})(.{3})\.(.{3})/g;
const match = regex.exec(input);
console.log("gender_code: " + match[1]);
console.log("style: " + match[1] + match[2]);
As a style note, I prefer not using named capture groups, because they tend to result in a bloated regex which is hard to read.
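To get named fields back without named groups, the positional matches can be destructured into an object. A sketch, where parseMpnPositional is a hypothetical name and the digit/letter classes and ^...$ anchors are borrowed from the question so that it validates as well as extracts:

```javascript
// Sketch: positional groups, then build a named object by hand.
// style is formed by concatenating groups 1 and 2.
function parseMpnPositional(mpn) {
  const match = /^(\d{2})(\d{4})(\d[ABDE])(\d{3})\.(\d{3})$/i.exec(mpn);
  if (!match) return null;
  // Skip index 0 (the full match) and destructure the groups.
  const [, gender, rest, width, color, size] = match;
  return {
    gender_code: gender,
    style: gender + rest,
    width_code: width,
    color_code: color,
    size_code: size,
  };
}

console.log(parseMpnPositional('1102961D048.075'));
```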

Yes, you can capture gender_code with a positive lookahead, using this regex:
(?=(..))(\d{6})(\d{1}[ABDE])(\d{3})\.(\d{3})
The named-group version below uses named capture groups, which were added in ECMAScript 2018. When this answer was written only Chrome supported them; current versions of all major browsers and Node.js now do:
const validateMpn = (mpn) => {
  const regex = /(?=(?<gender_code>\d\d))(?<style>\d{6})(?<width>\d{1}[ABDE])(?<color_code>\d{3})\.(?<size_code>\d{3})/gi
  const match = regex.exec(mpn)
  if (!match) {
    return null
  }
  return match.groups
}
const str1 = '1102961D048.075'
const str2 = '1200322A001.085'
const match1 = validateMpn(str1)
const match2 = validateMpn(str2)
console.log(match1)
console.log(match2)
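Why the lookahead works: (?=...) matches zero characters, so a group inside it records text that the rest of the pattern is still free to consume. The small illustration below (not part of the answer) contrasts ordinary matching with lookahead capture, then applies the same trick to the product number:

```javascript
// A lookahead is zero-width: a group inside it captures text
// without consuming it, so later tokens can match the same text.
const digits = '110296';

// Plain matching consumes each pair, so matches cannot overlap.
const consumed = digits.match(/\d\d/g);

// A capture inside a lookahead lets every position start a new pair.
const overlapping = Array.from(digits.matchAll(/(?=(\d\d))/g), (m) => m[1]);

console.log(consumed);    // ['11', '02', '96']
console.log(overlapping); // ['11', '10', '02', '29', '96']

// Same idea on the product number: the lookahead reads gender_code,
// then style consumes those same two digits.
const mpnMatch = /^(?=(?<gender_code>\d{2}))(?<style>\d{6})/.exec('1102961D048.075');
console.log(mpnMatch.groups.gender_code, mpnMatch.groups.style); // 11 110296
```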

Related

Lowercase everything after first appearance of the character in a string in JS

One option is to use a regular expression:
str.replace(/\.([^.]*?)$/, (m) => m.toLowerCase())
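A runnable form of that one-liner, wrapped in a hypothetical helper lowerAfterLastDot. The capturing group and lazy quantifier are not strictly needed, because [^.]* cannot cross another dot and $ pins the match to the end; a string without a dot is returned unchanged:

```javascript
// Lowercase everything after the last "." ("." itself is unaffected).
// [^.]* cannot cross another dot and $ anchors to the end,
// so only the final segment is matched.
const lowerAfterLastDot = (str) =>
  str.replace(/\.[^.]*$/, (m) => m.toLowerCase());

console.log(lowerAfterLastDot('qwery.ABC.ABC')); // qwery.ABC.abc
console.log(lowerAfterLastDot('NODOT'));         // NODOT (no match, unchanged)
```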

Another approach is to split the string at ".", convert the last part with .toLowerCase(), and finally .join() everything back together.
const t = 'qwery.ABC.ABC';
const parts = t.split(".");
console.log(parts.slice(0, -1).join(".") + "." + parts[parts.length - 1].toLowerCase());
It is debatable whether that is actually a cleaner variant. For readability, it is usually a good idea to wrap this use case in a utility function.
const t = "qwery.ABC.ABC";
// Lowercase only the part after the last occurrence of `separator`.
const lastBitToLowerCase = (text, separator) => {
  const parts = text.split(separator);
  const head = parts.slice(0, -1).join(separator);
  const last = parts[parts.length - 1].toLowerCase();
  return `${head}${separator}${last}`;
};
const result = lastBitToLowerCase(t, "."); // "qwery.ABC.abc"
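A variant that avoids split and join altogether, swapped in here rather than taken from the answer: lastIndexOf finds the final separator, and only the text after it is lowercased. The name lastBitToLowerCaseIdx and the choice to return the input unchanged when no separator exists are assumptions (a split-based version would prepend a separator in that case):

```javascript
// Variant without split/join: locate the last separator with lastIndexOf
// and lowercase only the text after it.
function lastBitToLowerCaseIdx(text, separator) {
  const idx = text.lastIndexOf(separator);
  if (idx === -1) return text; // assumed behavior: no separator, no change
  const cut = idx + separator.length;
  return text.slice(0, cut) + text.slice(cut).toLowerCase();
}

console.log(lastBitToLowerCaseIdx('qwery.ABC.ABC', '.')); // qwery.ABC.abc
console.log(lastBitToLowerCaseIdx('a::B::CD', '::'));     // a::B::cd
```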

Regex using negative lookahead:
const re = /\.((?:.(?!\.))+)$/;
const inputs = [
"qwerty.ABC.ABC",
"yuiop.uu",
"QWERT.YUIOP"
];
inputs.forEach(input => {
  const result = input.replace(re, x => x.toLowerCase());
  console.log(input, "-->", result);
});
Regex described here: https://regexr.com/6qk6r

reg expression on OS 10 vs OS 11 [duplicate]

I am looking for an alternative for this:
(?<=\.\d\d)\d
(Match third digit after a period.)
I'm aware I can solve it by using other methods, but I have to use a regular expression and more importantly I have to use replace on the string, without adding a callback.

Turn the lookbehind into a consuming pattern and use a capturing group, as shown below:
var s = "some string.005";
var rx = /\.\d\d(\d)/;
var m = s.match(rx);
if (m) {
  console.log(m[1]);
}
Or, to get all matches:
const s = "some string.005 some string.006";
const rx = /\.\d\d(\d)/g;
let result = [], m;
while ((m = rx.exec(s)) !== null) {
  result.push(m[1]);
}
console.log( result );
An example with matchAll (reusing s and the global rx from the previous snippet; matchAll requires the g flag):
const result = Array.from(s.matchAll(rx), x=>x[1]);
EDIT:
To remove the 3 from str.123 under your current constraints, use the same capturing approach: capture the text you want to keep, restore it in the result with $n backreferences in the replacement pattern, and match only what you need to remove.
var s = "str.123";
var rx = /(\.\d\d)\d/;
var res = s.replace(rx, "$1");
console.log(res);
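With the g flag, the same capture-and-restore idea handles every occurrence in a single replace call, using a plain replacement string and no callback. A minimal sketch, with removeThirdDigit as a hypothetical name:

```javascript
// Emulates (?<=\.\d\d)\d without lookbehind: consume the prefix in a
// capturing group and put it back with $1 in the replacement string.
const removeThirdDigit = (s) => s.replace(/(\.\d\d)\d/g, '$1');

console.log(removeThirdDigit('str.123'));         // str.12
console.log(removeThirdDigit('a.005 and b.006')); // a.00 and b.00
```

Unlike a real lookbehind, the (\.\d\d) prefix is consumed, so two matches cannot share characters; for this pattern that is harmless because each match starts at its own period.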

How to match identical strings in Javascript?

Take these two urls:
const url1 = '/user/{username}/edit'
const url2 = '/user/harry/edit'
Is there a way to match these two URLs and return true, since they are similar?
I tried the following, which is probably the worst solution:
const url1 = '/user/{username}/edit'
const url2 = '/user/harry/edit'
const split1 = url1.split('/')
const split2 = url2.split('/')
let matchCount = 0
let notMatchedCount = 0
// compare segments at the same index (x is the segment, i its position)
split1.forEach((x, i) => {
  if (x === split2[i]) {
    matchCount++
  } else {
    notMatchedCount++
  }
})
if (matchCount > notMatchedCount) {
  console.log('Match Found')
} else {
  console.log('Match not found')
}
EDIT
The solution was to use the PathToRegExp package. Thanks to @ChiragRavindra!

You could use a regex to test the URL:
\/user\/ matching /user/
\w+ matching 1 or more word characters ([a-zA-Z0-9_]+)
\/edit matching /edit
const url1 = '/user/{username}/edit';
const urlCorrect = '/user/harry/edit';
const urlWrong = '/users/harry/edit';
// generate a regex string by escaping the slashes and replacing each {word} placeholder with \w+
const regexString = url1.replace(/\{\w+\}/g, '\\w+').replace(/\//g, '\\/');
console.log('generated regex: ' + regexString);
// anchor with ^ and $ so the whole URL must match, not just a substring
const regex = new RegExp('^' + regexString + '$');
// test using the generated regex
console.log(regex.test(urlCorrect));
console.log(regex.test(urlWrong));
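A slightly more defensive sketch of the template idea escapes every regex metacharacter in the literal parts and anchors the pattern, so a longer URL such as /admin/user/harry/edit is rejected. It assumes placeholders always look like {name} and that a placeholder stands for exactly one path segment; templateToRegex is a hypothetical helper, and PathToRegExp (mentioned in the question's edit) covers far more cases:

```javascript
// Hypothetical helper: turn "/user/{username}/edit" into an anchored RegExp.
// Assumes placeholders look like {name} and each matches one path segment.
function templateToRegex(template) {
  const pattern = template
    .split(/\{\w+\}/)                                            // literal pieces
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))  // escape metacharacters
    .join('[^/]+');                                              // placeholder = one segment
  return new RegExp('^' + pattern + '$');
}

const re = templateToRegex('/user/{username}/edit');
console.log(re.test('/user/harry/edit'));       // true
console.log(re.test('/users/harry/edit'));      // false
console.log(re.test('/admin/user/harry/edit')); // false (anchored)
```

Using [^/]+ instead of \w+ is a design choice: it accepts values such as harry.potter while still refusing to span more than one segment.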

I would suggest looking at this library:
NPM - String similarity library
The library returns a score for how similar two strings are.
It is then up to you to set the threshold (as a percentage) above which you treat them as the same.

Creating a regex to replace each matched character of a string with same character

In my application, I have an alphanumeric string being passed into my function. This string is typically 17 characters, but not always. I'm trying to write a regex that matches all but the last 4 characters in the string, and replaces them with X (to mask it).
For example
Input: HGHG8686HGHG8686H
Output: XXXXXXXXXXXXX686H
The Regex I wrote to perform the replace on the string is as follows
[a-zA-Z0-9].{12}
Code:
const maskedString = string.replace(/[a-zA-Z0-9].{12}/g, 'X');
The issue I'm having is that it's replacing all but the last 4 characters in the string with just that single X. It doesn't know to do that for every matched character. Any ideas?

You can pass a function to replace to do this; something like this works:
var str = "HGHG8686HGHG8686H"
var regexp = /[a-zA-Z0-9]+(?=....)/g;
var modifiedStr = str.replace(regexp, function (match) {
  // one X per matched character, so the masked part keeps its length
  return 'X'.repeat(match.length);
});
console.log(modifiedStr);

The simple version (easier to read):
const maskedString = string.replace(/(.{4})$|(^(..)|(.))/g, 'X$1');
Restricted to [a-zA-Z0-9]:
const maskedString = string.replace(/([a-zA-Z0-9]{4})$|(^([a-zA-Z0-9]{2})|([a-zA-Z0-9]{1}))/g, 'X$1');
Note: the pattern matches the first two characters together to offset the final match, which emits an extra X in front of the last four characters. The replacement must be 'X$1'; in a JavaScript string '\1' is not a backreference.

Use a lookahead (?=...) to make sure at least four characters follow the one being replaced.
const regex = /.(?=....)/g;
//             ^           MATCH ANYTHING
//              ^^^^^^^^   THAT IS FOLLOWED BY FOUR CHARS
function fix(str) { return str.replace(regex, 'X'); }
const test = "HGHG8686HGHG8686H";
// CODE BELOW IS MERELY FOR DEMO PURPOSES
const input = document.getElementById("input");
const output = document.getElementById("output");
function populate() { output.textContent = fix(input.value); }
input.addEventListener("input", populate);
input.value = test;
populate();
<p><label>Input: </label><input id="input"></p>
<p>Output: <span id="output"></span></p>
A non-regexp solution:
const test = "HGHG8686HGHG8686H";
function fix(str) {
return 'X'.repeat(str.length - 4) + str.slice(-4);
}
console.log(fix(test));
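Note that String#repeat is not available in Internet Explorer, and it throws a RangeError when the string is shorter than four characters (negative count).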
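The lookahead version generalizes easily. A hedged sketch, where maskAllButLast and its visible/maskChar parameters are hypothetical names chosen here: the RegExp constructor builds .(?=.{n}) for any number of visible characters, and a string shorter than that simply has no match:

```javascript
// Hypothetical generalization of the lookahead answer: replace every
// character that still has at least `visible` characters after it.
function maskAllButLast(str, visible = 4, maskChar = 'X') {
  // e.g. /.(?=.{4})/g; short strings have no match and are returned as-is.
  // maskChar is used as a replacement string, so avoid "$" in it.
  const re = new RegExp(`.(?=.{${visible}})`, 'g');
  return str.replace(re, maskChar);
}

console.log(maskAllButLast('HGHG8686HGHG8686H')); // XXXXXXXXXXXXX686H
console.log(maskAllButLast('12345678', 2, '*'));  // ******78
console.log(maskAllButLast('abc'));               // abc
```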

You can achieve this with the following loop:
var str = "HGHG8686HGHG8686H"
var replaced = ''
var match = str.match(/.+/)
// one X for every character except the last four
for (let i = 0; i < match[0].length - 4; i++) {
  replaced += "X"
}
replaced += match[0].slice(-4)
console.log(replaced);

We Keep Coding
