Dynamically compare accuracy of regex patterns for match - javascript

Supposing I have two regexes and both match a string, but one of them matches it in a stricter way, is there a way to figure that out programmatically?
For example, I'm matching this string:
/path/on/file/system
and I have the following two regular expressions:
const opt1 = /\/path\/on/;
const opt2 = /\/path/;
I can see with my eyes that opt1 is stricter, but how can javascript know about that?
Is converting the regex to a string and checking for character length a good measure of strictness?

You can implement this as follows:
1. Sort your regular expressions by length.
2. Loop through the sorted array and check each expression for a match.
3. Return the longest matching expression, treating length as a rough measure of strictness.
function check(arrayOfRegEx, str) {
// sort a copy of the regex array by source length (shortest first)
var sortedArr = arrayOfRegEx.slice().sort(function(a, b) {
return a.source.length - b.source.length || a.source.localeCompare(b.source);
});
var mostStrict = null;
sortedArr.forEach(function(reg) {
// test the original regex directly; the last (longest) match wins
if (reg.test(str)) {
mostStrict = reg;
}
});
return mostStrict;
}
var result = check([/\/path/, /\/test\/test/, /\/path\/on/], '/path/on/file/system');
console.log(result); // returns /\/path\/on/
And of course you can tweak this function to fit your needs
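Keep in mind that source length is only a rough heuristic for strictness. The sketch below uses a pair of illustrative patterns (strict and loose are names assumed for this example, not taken from the question) to show a longer regex that is nonetheless looser:

```javascript
// Illustrative patterns (assumed for this example, not from the question).
const strict = /\/path\/on/;       // matches only the literal "/path/on"
const loose = /\/[a-z]+\/[a-z]+/;  // matches any two lowercase path segments

const sample = '/path/on/file/system';

// Both patterns match the sample string.
console.log(strict.test(sample), loose.test(sample));

// The looser pattern has the longer source text.
console.log(strict.source.length < loose.source.length);

// The looser pattern accepts input that the strict one rejects.
console.log(strict.test('/foo/bar'), loose.test('/foo/bar'));
```

A length-based ranking is therefore most reasonable when all candidates are plain literal prefixes, as in the question.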

Related

Get all occurrences of value in between two patterns in JavaScript

I have the following long string. How do I extract all the values that are in between "url=" and "," so that I then have the following array?
"load("#bazel_tools//tools/build_defs/repo:http.bzl","http_jar")definclude_java_deps():http_jar(name="com_google_inject_guice",sha256="b378ffc35e7f7125b3c5f3a461d4591ae1685e3c781392f0c854ed7b7581d6d2",url="https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar",)http_jar(name="org_sonatype_sisu_inject_cglib",sha256="42e1dfb26becbf1a633f25b47e39fcc422b85e77e4c0468d9a44f885f5fa0be2",url="https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar",)http_jar(name="javax_inject_javax_inject",sha256="91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",url="https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",)"
[
https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar,
https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar,
https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar
]
I've tried the following, but it only gives me the first occurrence, and I need them all. Thanks!
var arr = contents.split('url=').pop().split(',')
for(i in arr) {
console.log(arr[i]);
}
You can solve this by using a Regular Expression
const regEx = /(?:url=")([^,]+)(?:",)/gm;
const string = 'load("#bazel_tools//tools/build_defs/repo:http.bzl","http_jar")definclude_java_deps():http_jar(name="com_google_inject_guice",sha256="b378ffc35e7f7125b3c5f3a461d4591ae1685e3c781392f0c854ed7b7581d6d2",url="https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar",)http_jar(name="org_sonatype_sisu_inject_cglib",sha256="42e1dfb26becbf1a633f25b47e39fcc422b85e77e4c0468d9a44f885f5fa0be2",url="https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar",)http_jar(name="javax_inject_javax_inject",sha256="91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",url="https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",)'
const matches = string.matchAll(regEx);
for (const match of matches) {
console.log(match[1]);
}
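If you want the matches collected into an array, as the question asks, rather than logged one by one, a minimal sketch could look like the following. The shortened sample string and the example.com URLs are placeholders assumed for illustration, and urlPattern is a slightly simplified variant of the expression above:

```javascript
// Shortened stand-in for the long input string (placeholder URLs).
const sample =
  'http_jar(name="a",sha256="00",url="https://example.com/a.jar",)' +
  'http_jar(name="b",sha256="11",url="https://example.com/b.jar",)';

// Capture everything between url=" and the closing ", sequence.
const urlPattern = /url="([^",]+)",/g;

// matchAll returns an iterator; Array.from maps each match to its first group.
const urls = Array.from(sample.matchAll(urlPattern), (m) => m[1]);

console.log(urls);
```

Note that matchAll requires the g flag on the regular expression.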
Or with string methods
const string = 'load("#bazel_tools//tools/build_defs/repo:http.bzl","http_jar")definclude_java_deps():http_jar(name="com_google_inject_guice",sha256="b378ffc35e7f7125b3c5f3a461d4591ae1685e3c781392f0c854ed7b7581d6d2",url="https://repo1.maven.org/maven2/com/google/inject/guice/4.0/guice-4.0.jar",)http_jar(name="org_sonatype_sisu_inject_cglib",sha256="42e1dfb26becbf1a633f25b47e39fcc422b85e77e4c0468d9a44f885f5fa0be2",url="https://repo1.maven.org/maven2/org/sonatype/sisu/inject/cglib/2.2.1-v20090111/cglib-2.2.1-v20090111.jar",)http_jar(name="javax_inject_javax_inject",sha256="91c77044a50c481636c32d916fd89c9118a72195390452c81065080f957de7ff",url="https://repo1.maven.org/maven2/javax/inject/javax.inject/1/javax.inject-1.jar",)'
const arr = string.split('url="');
const urls = arr
.filter((subStr) => subStr.includes('https://'))
.map((subStr) => subStr.split('",)')[0]);
console.log(urls);
The RegEx solution is of course far more flexible.
Be aware that regular expressions can be "unsafe", which means they might take extremely long to evaluate depending on the input. There are libraries that can help you detect such expressions.

How to take value using regular expressions?

I have a string such as "Categ=All&Search=Jucs&Kin=LUU". How can I get an array of its values, [All,Jucs,LUU]?
Here is an example
let x = /(\b\w+)$|(\b\w+)\b&/g;
let y = "Categories=All&Search=Filus";
console.log(y.match(x));
but I don't want the & character in the result.
Since this looks like a URL query string, you can treat it as one and parse the data without needing a regex.
let query = "Categ=All&Search=Jucs&Kin=LUU",
parser = new URLSearchParams(query),
values = [];
parser.forEach(function(v, k){
values.push(v);
});
console.log(values);
Docs: https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams
Note: This may not work in IE, if that's something you care about.
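The same idea can be written more compactly, assuming a runtime that provides URLSearchParams as a global (current browsers and Node.js do). This is only a shorter sketch of the approach above:

```javascript
const query = "Categ=All&Search=Jucs&Kin=LUU";

// values() returns an iterator over the parameter values in order.
const values = Array.from(new URLSearchParams(query).values());

console.log(values);
```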
Loop through all matches and take only the first group, ignoring the =
let x = /=([^&]+)/g;
let y = "Categories=All&Search=Filus";
let match;
while (match = x.exec(y)) {
console.log(match[1]);
}
To get the expected result, split the string and filter by index to separate keys from values:
1. Use split(/[^A-Za-z0-9]/) to split the string on any character other than letters and digits
2. Use filter with the index to pick the even elements (keys) or the odd elements (values)
var str1 = "Categ=All&Search=Jucs&Kin=LUU";
function splitter(str, index){
return str.split(/[^A-Za-z0-9]/).filter((v,i)=>i%2=== index);
}
console.log(splitter(str1, 0)) //["Categ", "Search", "Kin"]
console.log(splitter(str1, 1))//["All", "Jucs", "LUU"]
codepen - https://codepen.io/nagasai/pen/yWMYwz?editors=1010

How to match a string against an array of pattern in Javascript?

I've two variables:
var input = "user1#gmail.com";
var preferredPatterns = ["*#gmail.com", "*#yahoo.com", "*#live.com"];
I want to match the input against the preferred pattern array. If any of the patterns matches, I have to do a certain task (in this example, the input is a definite match). How can I match against an array of patterns in JavaScript?
You can compile your patterns (if they are valid regular expressions) into one for performance:
var masterPattern = new RegExp(patterns.join('|'));
Putting it all together:
var input = 'user1#gmail.com';
var preferredPatterns = [
".*#gmail.com$",
".*#yahoo.com$",
".*#live.com$"
];
var masterPattern = new RegExp(preferredPatterns.join('|'));
console.log(masterPattern.test(input));
// true
You need to use the RegExp constructor when the pattern comes from a variable.
var input = 'user1#gmail.com';
var preferredPatterns = [".*#gmail\\.com$", ".*#yahoo\\.com$", ".*#live\\.com$"];
for (var i = 0; i < preferredPatterns.length; i++) {
// build a regex from each pattern string and log the ones that match
if (input.match(new RegExp(preferredPatterns[i]))) {
console.log(preferredPatterns[i])
}
}
Dot is a special meta-character in regex which matches any character. You need to escape the dot in the regex to match a literal dot.
As @zerkms said, you could use the below list of patterns also.
var preferredPatterns = ["#gmail\\.com$", "#yahoo\\.com$", "#live\\.com$"];
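If the domains are kept as plain strings, the escaping can be done programmatically before joining them into one expression. The following is a minimal sketch; escapeRegExp and domainPattern are names assumed for this example, and the "#" separator simply follows the sample data used above:

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const domains = ['gmail.com', 'yahoo.com', 'live.com'];

// Builds /#(?:gmail\.com|yahoo\.com|live\.com)$/
const domainPattern = new RegExp(
  '#(?:' + domains.map(escapeRegExp).join('|') + ')$'
);

console.log(domainPattern.test('user1#gmail.com'));
console.log(domainPattern.test('user1#gmailxcom'));
```

Because the dots are escaped, the second input is rejected, whereas an unescaped "gmail.com" pattern would accept it.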
Try this helper function:
/**
* Returns an integer representing the number of items in the patterns
* array that occur as substrings of the target string
*/
function check(str, patterns) {
return patterns.reduce(function (previous, current) {
return previous + (str.indexOf(current) == -1 ? 0 : 1);
}, 0);
}
check("user#gmail.com", ["#gmail.com", "#yahoo.com", "#live.com"]); // returns 1
check("user#live.com", ["#gmail.com", "#yahoo.com", "#live.com"]); // returns 1
If you want a general approach to matching against a list of regular expressions then some version of Avinash Raj's answer will work.
Based on the fact that you are specifying certain domains, you might want to match any valid email address using the regex here, and if it matches then check if the domain is a preferred one. There are a number of different ways you could do that of course. Here's just a simple example, splitting on the # and using jQuery.inArray() to check if the domain is preferred.
var preferredDomains = ["gmail.com", "yahoo.com", "live.com"];
function isValid(inputVal) {
var re = /^([\w-]+(?:\.[\w-]+)*)#((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$/i;
return re.test(inputVal) && $.inArray(inputVal.split('#')[1], preferredDomains) > -1;
}
The advantage here is that the underlying regex doesn't change, just the much easier to read/maintain list of domains. You could tweak this to capture the domain in a group, instead of using split().
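As a sketch of the capture-group variation, without jQuery: the address pattern below is deliberately simplified and is an assumption made for this example (it is not the full validation regex above), and hasPreferredDomain is a hypothetical name. The "#" separator follows the sample data.

```javascript
const preferredDomains = ["gmail.com", "yahoo.com", "live.com"];

// Simplified shape check (assumed for this sketch): some non-separator
// characters, the "#" separator, then the domain captured in group 1.
const addressPattern = /^[^#\s]+#([^#\s]+)$/;

function hasPreferredDomain(inputVal) {
  const match = addressPattern.exec(inputVal);
  // match[1] holds the captured domain when the overall shape is valid.
  return match !== null && preferredDomains.includes(match[1].toLowerCase());
}

console.log(hasPreferredDomain("user1#gmail.com"));
console.log(hasPreferredDomain("user1#example.org"));
```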
Without a regexp, you may do as follows:
function matchPattern(xs, [y,...ys]){
function helper([x,...xs]){
return "*" + xs.join('') === y ? true
: xs.length ? helper(xs)
: false;
}
return helper(xs) ? y
: ys.length ? matchPattern(xs,ys)
: "No Match..!";
}
var input = "user1#gmail.com",
preferredPatterns = ["*#yahoo.com", "*#live.com", "*#gmail.com"];
result = matchPattern(input, preferredPatterns);
console.log(result);
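preferredPatterns.forEach(function(element, index){
// turn the "*" wildcard into ".*" and escape literal dots before matching
var regex = new RegExp('^' + element.replace(/\./g, '\\.').replace(/\*/g, '.*') + '$');
if(regex.test(input)){
console.log('matching ' + element)
}
})
You can write your own custom logic for the case where the string matches any pattern.
You may iterate through the array and then use a regex to compare against the individual items.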

replace non matches between delimiters

I have an input string:
12345,3244,654,ffgv,87676,988ff,87657
I'm having difficulty transforming all terms in the string that are not five-digit numbers to a constant 34567 using regular expressions. So, the output would be like this:
12345,34567,34567,34567,87676,34567,87657
For this, I looked at two options:
negated character class: Not useful because it does not execute directly on this expression ,[^\d{5}],
lookahead and lookbehind: Issue here is that it doesn't include non-matched part in the result of this expression ,(?!\d{5}) or (?<!\d{5}), for the purpose of substitution/replace.
Once the desired expression is found, it would give a result so that one can replace non-matched part using tagged regions like \1, \2.
Is there any mechanism in regular expression tools to achieve the output as mentioned in the above example?
Edit: I really appreciate those who have answered non-regex solutions, but I would be more thankful if you provide a regex-based solution.
You don't need regex for this. You can use str.split to split the string at commas first, then for each item check that its length is exactly 5 and that it contains only digits (using str.isdigit). Lastly, combine all the items using str.join.
>>> s = '12345,3244,654,ffgv,87676,988ff,87657'
>>> ','.join(x if len(x) == 5 and x.isdigit() else '34567' for x in s.split(','))
'12345,34567,34567,34567,87676,34567,87657'
Javascript version:
function isdigit(s){
for(var i=0; i <s.length; i++){
if(!(s[i] >= '0' && s[i] <= '9')){
return false;
}
}
return true;
}
var arr = "12345,3244,654,ffgv,87676,988ff,87657".split(",");
for(var i=0; i < arr.length; i++){
// replace anything that is not exactly five digits
if(arr[i].length !== 5 || ! isdigit(arr[i])) arr[i] = '34567';
}
var output = arr.join(",");
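Try the following: /\b(?!\d{5})[^,]+\b/g
It anchors the match at a word boundary (\b),
followed by a negative look-ahead that rejects terms starting with five digits ((?!\d{5})),
followed by one or more characters other than a comma ([^,]+), up to the next word boundary.
Note that a term with more than five digits also starts with five digits, so it is left unchanged as well.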
const expression = /\b(?!\d{5})[^,]+\b/g;
const input = '12345,3244,654,ffgv,87676,988ff,87657';
const expectedOutput = '12345,34567,34567,34567,87676,34567,87657';
const output = input.replace(expression, '34567');
console.log(output === expectedOutput, expectedOutput, output);
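If terms with more than five digits should also be replaced, one possible variation (a sketch, not part of the original answer) is to add a word boundary inside the look-ahead, so that only terms consisting of exactly five digits are protected. The names exactFiveDigits and withLongNumber are assumed for this example:

```javascript
// The \b inside the look-ahead requires the five digits to be the whole
// term, not just its first five characters.
const exactFiveDigits = /\b(?!\d{5}\b)[^,]+\b/g;

// Sample input with a six-digit term added for illustration.
const withLongNumber = '12345,123456,654,ffgv,87676';

const replaced = withLongNumber.replace(exactFiveDigits, '34567');
console.log(replaced); // 12345,34567,34567,34567,87676
```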
This approach uses /\b(\d{5})|(\w+)\b/g:
we match on boundaries (\b)
our first capture group captures "good strings"
our looser capture group gets the leftovers (bad strings)
our replacer() function knows the difference
const str = '12345,3244,654,ffgv,87676,988ff,87657';
const STAND_IN = '34567';
const massageString = (str) => {
const pattern = /\b(\d{5})|(\w+)\b/g;
const replacer = (match, goodstring, badstring) => {
if (goodstring) {
return goodstring;
} else {
return STAND_IN;
}
}
const r = str.replace(pattern,replacer);
return r;
};
console.log( massageString(str) );
I think the following would work for values no longer than 5 alphanumeric characters:
(,(?!\d{5})\w{1,5})
If values can be longer than 5 alphanumeric characters, drop the upper bound 5 from the expression above:
(,(?!\d{5})\w{1,})
and you can replace using:
,34567
You can see a demo on regex101. Of course, there might be faster non-regex methods for specific languages as well (Python, Perl or JS).

How to extract a text using regex?

My Text
1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#
Result Should Be:
JASSER-PC
anas kayyat
anas kayyat
I am using :
(?<=###)(.+)(?=#-#)
But it gives me that :
JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat
Older JavaScript engines don't support look-behind assertions (i.e. (?<=…) and (?<!…)); these were only added to the language in ES2018. If you need to support such engines, you can't use that regular expression, but you can use this:
###(.+)(?=#-#)
Then just take the matched string of the first group. Additionally, to only match as little as possible, make the + quantifier non-greedy by using +?.
The group (.+) will match as much as it can (it's "greedy"). To make it find a minimal match you can use (.+?).
JavaScript engines that predate ES2018 do not support lookbehinds. Make the quantifier non-greedy, and use:
var regex = /###(.+?)#-#/g;
var strings = [];
var result;
while ((result = regex.exec(input)) != null) {
strings.push(result[1]);
}
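In a runtime that implements ES2018 look-behind assertions (an assumption about your environment; current Node.js and evergreen browsers do), the original pattern also works once the quantifier is made lazy. A minimal sketch, with lazyLookaround as a name chosen for this example:

```javascript
const input =
  '1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#';

// Look-behind for "###", lazy match, look-ahead for "#-#".
const lazyLookaround = /(?<=###).+?(?=#-#)/g;

// With the g flag, match() returns all matched substrings.
const names = input.match(lazyLookaround);

console.log(names); // [ 'JASSER-PC', 'anas kayyat', 'anas kayyat' ]
```

Since the delimiters sit in look-around assertions, no capture group is needed and match() returns the wanted strings directly.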
I'll give you a non-regex answer, since using regular expressions isn't always appropriate, whether because of speed or the readability of the regex itself:
function getText(text) {
var arr = text.split("###"); // arr now contains [1618148163,JASSER-PC#-#1125015374,anas kayyat#-#1543243035,anas kayyat#-#]
var newarr = [];
for(var i = 0; i < arr.length; i++) {
var index = arr[i].indexOf("#-#");
if(index != -1) { // if an array element doesn't contain "#-#", we ignore it
newarr.push(arr[i].substring(0, index));
}
}
return newarr;
}
Now, using
getText("1618148163###JASSER-PC#-#1125015374###anas kayyat#-#1543243035###anas kayyat#-#");
returns what you wanted.
