Regex pattern that will match similar or same strings in subject

I'm not sure how to title this question properly. I've been trying to build something but haven't gotten it to work, so it's probably best to show a few examples of what I want to accomplish.
// Let's say I have a list of some tags/slugs.
$subjects = [
'this-is-one',
'might-be-two',
'yessir',
'indeednodash',
'but-it-might'
];
$patterns = [
'this-is-one', // should match $subjects[0]
'mightbetwoorthree', // should match $subjects[1]
'yes-sir', // should match $subjects[2]
'indeednodash', // should match $subjects[3]
'but-it-might-be-long-as-well' // should match $subjects[4]
];
As you can see, some of the patterns do not exactly match the corresponding subject, and that is my problem. I want a regex that matches all of these variations.
I tried something basic inside a foreach loop, but of course it doesn't work, because it requires the whole pattern to appear in the subject:
if (preg_match("/\b$pattern\b/", $subject)) { /* ... */ }
Any suggestions, explanations, and code samples are welcome. I'm trying to wrap my mind around regex, but it's not going well.
I'm tagging JavaScript as well, because the solution doesn't necessarily have to involve PHP or preg_match.

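function getMatchesOf(pattern, subjects) {
  var result = [];
  // Compare letters only: drop dashes and any other non a-z characters.
  pattern = pattern.replace(/[^a-z]/g, '');
  subjects.forEach(function(subject) {
    var _subject = subject.replace(/[^a-z]/g, '');
    // A subject matches when the stripped pattern contains it.
    if (pattern.includes(_subject))
      result.push(subject);
  });
  return result;
}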
var subjects = [
'this-is-one',
'might-be-two',
'yessir',
'indeednodash',
'but-it-might'
];
var patterns = [
'this-is-one',
'mightbe',
'yes-sir',
'indeednodash',
'but-it-might-be-long-as-well'
];
console.log(patterns[0] + " matches: ", getMatchesOf(patterns[0], subjects));
console.log(patterns[4] + " matches: ", getMatchesOf(patterns[4], subjects));
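
A minimal sketch of one way to relax the check above, assuming "similar" means that one slug contains the other once everything except letters is stripped. The helper names normalize, isSimilar, and getSimilar are hypothetical, and this is a variation on the answer rather than the answerer's own method:

```javascript
var subjects = ['this-is-one', 'might-be-two', 'yessir', 'indeednodash', 'but-it-might'];

// Hypothetical helper: keep lowercase letters only.
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z]/g, '');
}

// Assumption: two slugs are "similar" when one normalized form contains the other.
function isSimilar(pattern, subject) {
  var p = normalize(pattern);
  var s = normalize(subject);
  if (p === '' || s === '') return false; // an empty string would match everything
  return p.includes(s) || s.includes(p);
}

function getSimilar(pattern, subjects) {
  return subjects.filter(function (subject) {
    return isSimilar(pattern, subject);
  });
}

console.log(getSimilar('mightbe', subjects));
console.log(getSimilar('yes-sir', subjects));
console.log(getSimilar('but-it-might-be-long-as-well', subjects));
```

Checking containment in both directions lets a shorter pattern such as 'mightbe' find 'might-be-two', which the one-directional check does not. Very short patterns will match many subjects, so a minimum length may be worth adding.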

Related

Javascript cut string by begin and end and store in array

I need an algorithm that does something like this:
var example = "Hello $$user$$ your real name is $$realname$$. Have a good day"
Output --> ["Hello ", "$$user$$", " your real name is ", "$$realname$$", ". Have a good day"]
In other words, split the string at a chosen delimiter and collect the pieces, delimited parts included, in a string array. Can someone help me out?
I'm looking for a solution in JavaScript/jQuery.

It seems you want to split on the pattern $$...$$, so you could use /(\$\$.*?\$\$)/. Wrapping the pattern in a capture group keeps the matched parts in the result, and the lazy quantifier (?) makes each match as short as possible:
example.split(/(\$\$.*?\$\$)/)
#[ 'Hello ',
# '$$user$$',
# ' your real name is ',
# '$$realname$$',
# '. Have a good day' ]
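
A short sketch of why both the capture group and the lazy quantifier matter here. The variable names withGroup, withoutGroup, and greedy are only illustrative:

```javascript
var example = "Hello $$user$$ your real name is $$realname$$. Have a good day";

// Capture group: the matched delimiters are kept in the result array.
var withGroup = example.split(/(\$\$.*?\$\$)/);

// No capture group: the matched delimiters are dropped.
var withoutGroup = example.split(/\$\$.*?\$\$/);

// Greedy quantifier: one match spans from the first $$ to the last $$.
var greedy = example.split(/(\$\$.*\$\$)/);

console.log(withGroup);
console.log(withoutGroup);
console.log(greedy);
```

With the greedy version the two placeholders and the text between them come back as a single element, which is why the lazy form is used.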

Yes, this is possible with plain JavaScript. It is slightly tricky, but it works.
var strings = [], tokens = [];
var str = "Hello $$user$$ your real name is $$realname$$. Have a good day".replace(/\$\$(.*?)\$\$/g, "\$\$TOKEN$1\$\$").split("$");
for (var i = 0; i < str.length; i++) {
if (str[i].indexOf("TOKEN") === 0) {
// This is a token.
tokens.push(str[i].replace("TOKEN", ""));
} else {
strings.push(str[i]);
}
}
str = str.map(function (v) {
if (v.indexOf("TOKEN") === 0)
return "$$" + v.replace("TOKEN", "") + "$$";
return v;
});
console.log(str);
console.log(strings);
console.log(tokens);
The code above splits the string into its parts and also collects the plain strings and the tokens in separate arrays. The first console.log (str) prints what you asked for:
[
"Hello ",
"$$user$$",
" your real name is ",
"$$realname$$",
". Have a good day"
]
Kindly note that an array is written [value, value], not {value, value}.

String.split()
The split() method splits a String object into an array of strings by separating the string into substrings.
var example = "Hello $$user$$ your real name is $$realname$$. Have a good day";
var exSplit = example.split("$$");
var userIdx = exSplit.indexOf("user");
var nameIdx = exSplit.indexOf("realname");
document.querySelector(".user").innerHTML = exSplit[userIdx];
document.querySelector(".name").innerHTML = exSplit[nameIdx];
<div class="user"></div>
<div class="name"></div>
Though, if I may suggest, variables can handle this type of operation without all of the hassle.

Regex: How to determine if a string contains the entire set of a given substring?

Say I have an array of substrings ['abcd', 'xyz', '091823', '9-+#$_#$*']. How can I use regex to make sure that a given string contains ALL of these?

It sounds like you want to take a dynamic array of search terms and check whether another string contains all of the search terms. I don't think you will find a regex-only solution; you should loop through the terms and check each one individually. Also, if the terms are dynamic you probably don't want to use regex unless you know they don't need sanitization (for example, '9-+#$_#$*' contains special regex characters).
Here is a sample solution:
var terms = ['abcd', 'xyz', '091823', '9-+#$_#$*'];
var tests = {
success: 'abcd 091823 xyz 9-+#$_#$*',
failure: 'abcd 091823 xyz'
}
var testKeys = Object.keys(tests);
for(var test of testKeys){
var pass = true;
for(var term of terms){
if(tests[test].indexOf(term) == -1){
pass = false;
break;
}
}
document.body.innerHTML += `${test}: ${pass}<br>`
}

With ES6, you can use the includes() method instead of regular expressions:
// When `str` doesn't contain all substrings
var str = 'there is no desired substring here';
var result = ['abcd', 'xyz', '091823', '9-+#$_#$*'].every(function(e) {
return str.includes(e);
});
console.log(result);

Try using lookarounds:
/(?=.*abcd)(?=.*xyz)(?=.*091823)(?=.*9-\+#\$_#\$\*)/
For the general case of any array of strings to test:
function containsAll(str, arrayOfStr) {
return new RegExp(arrayOfStr.map(function (s) {
return '(?=.*' + s.replace(/([-\\[\]{}.^$+*()|?])/g, '\\$1') + ')';
}).join('')).test(str);
}
containsAll(str, ['abcd', 'xyz', '091823', '9-+#$_#$*']);
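
A self-contained sketch of the lookahead approach with a usage example. The names escapeRegExp and containsAllTerms are hypothetical; the sketch anchors the pattern at the start and uses [\s\S]* instead of .* on the assumption that the input may contain line breaks, which . does not match:

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one lookahead per term; every lookahead must succeed at position 0.
function containsAllTerms(str, terms) {
  var lookaheads = terms.map(function (term) {
    return '(?=[\\s\\S]*' + escapeRegExp(term) + ')';
  }).join('');
  return new RegExp('^' + lookaheads).test(str);
}

var terms = ['abcd', 'xyz', '091823', '9-+#$_#$*'];
console.log(containsAllTerms('abcd 091823 xyz 9-+#$_#$*', terms)); // true
console.log(containsAllTerms('abcd 091823 xyz', terms));           // false
```

Because lookaheads consume no input, the order of the terms in the string does not matter. An empty terms array yields a pattern that matches any string.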

get all but last occurrences of a pattern in javascript

Given the following patterns:
"profile[foreclosure_defenses_attributes][0][some_text]"
"something[something_else_attributes][0][hello_attributes][0][other_stuff]"
I am able to extract the last part using non-capturing groups:
var regex = /(?:\w+(\[\w+\]\[\d+\])+)(\[\w+\])/;
var str = "profile[foreclosure_defenses_attributes][0][properties_attributes][0][other_stuff]";
var match = regex.exec(str);
["profile[foreclosure_defenses_attributes][0][properties_attributes][0][other_stuff]", "[properties_attributes][0]", "[other_stuff]"]
However, I want to be able to get everything but the last part. In other words, everything but [some_text] or [other_stuff].
I cannot figure out how to do this with noncapturing groups. How else can I achieve this?

Something like this?
It is shorter, and it matches from the end of the string, so it still works when there are more [] items.
var regex = /(.*)(?:\[\w+\])$/;
var a = "something[something_else_attributes][0][hello_attributes][0][other_stuff11][other_stuff22][other_stuff33][other_stuff44]".match(regex)[1];
a;
Or using replace, though it is less performant:
var regex = /(.*)(?:\[\w+\])$/;
var a = "something[something_else_attributes][0][hello_attributes][0][other_stuff11][other_stuff22][other_stuff33][other_stuff44]".replace(regex, function(_,$1){ return $1});
a;

If those really are your strings:
var regex = /(.*)\[/;
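
A brief usage sketch for the pattern above. Because .* is greedy, it runs to the last [ in the string, so group 1 holds everything before the final bracketed part. The function name allButLast is hypothetical, and returning the input unchanged when there is no bracket is an assumption:

```javascript
function allButLast(str) {
  var m = /(.*)\[/.exec(str);
  // No "[" at all: assume the caller wants the string back untouched.
  return m ? m[1] : str;
}

console.log(allButLast("profile[foreclosure_defenses_attributes][0][some_text]"));
// "profile[foreclosure_defenses_attributes][0]"
console.log(allButLast("something[something_else_attributes][0][hello_attributes][0][other_stuff]"));
// "something[something_else_attributes][0][hello_attributes][0]"
```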

Change occurrences of sum(something) to something_sum

Admittedly I'm terrible with RegEx and pattern replacements, so I'm wondering if anyone can help me out with this one as I've been trying now for a few hours and in the process of pulling my hair out.
Examples:
sum(Sales) needs to be converted to Sales_sum
max(Sales) needs to be converted to Sales_max
min(Revenue) needs to be converted to Revenue_min
The only available prefixed words will be sum, min, max, avg, xcount - not sure if this makes a difference in the solution.
Hopefully that's enough information to kind of show what I'm trying to do. Is this possible via RegEx?
Thanks in advance.

There are a few possible ways, for example:
var str = "min(Revenue)";
var arr = str.match(/([^(]+)\(([^)]+)/);
var result = arr[2]+'_'+arr[1];
result is then "Revenue_min".
Here's a more complex example following your comment, handling multiple matches and lowercasing the function name:
var str = "SUM(Sales) + MIN(Revenue)";
var result = str.replace(/\b([^()]+)\(([^()]+)\)/g, function(_,a,b){
return b+'_'+a.toLowerCase()
});
Result: "Sales_sum + Revenue_min"

Try with:
var input = 'sum(Sales)',
matches = input.match(/^([^(]*)\(([^)]*)/),
output = matches[2] + '_' + matches[1];
console.log(output); // Sales_sum
Also:
var input = 'sum(Sales)',
output = input.replace(/^([^(]*)\(([^)]*)\)/, '$2_$1');

You can use replace with tokens:
'sum(Sales)'.replace(/(\w+)\((\w+)\)/, '$2_$1')

Using a whitelist for your list of prefixed words:
output = input.replace(/\b(sum|min|max|avg|xcount)\((.*?)\)/gi,function(_,a,b) {
return b+"_"+a.toLowerCase();
});
Added \b, a word boundary. This prevents something like "haxcount(xorz)" from becoming "haxorz_xcount"
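
A runnable sketch of the whitelist idea. The function name renameAggregates is hypothetical, and the sketch assumes the column name should keep its case while the function name is lowercased, as in the Sales_sum example from the question:

```javascript
// Only the five known prefixes are rewritten; \b stops matches inside longer words.
var AGGREGATES = /\b(sum|min|max|avg|xcount)\(([^()]+)\)/gi;

function renameAggregates(input) {
  return input.replace(AGGREGATES, function (_, fn, column) {
    return column + '_' + fn.toLowerCase();
  });
}

console.log(renameAggregates('SUM(Sales) + MIN(Revenue)')); // "Sales_sum + Revenue_min"
console.log(renameAggregates('xcount(Orders)'));            // "Orders_xcount"
console.log(renameAggregates('haxcount(xorz)'));            // unchanged
console.log(renameAggregates('count(Orders)'));             // unchanged, not whitelisted
```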

Find chars in string but prefer consecutive chars with NFA without atomic grouping

I'm trying to create a regex that will find chars anywhere in a string, but I would prefer that it match consecutive chars first.
Let me give an example, assume s = 'this is a test test string' and I'm searching for tst I would want to find it like so:
// Correct
// v vv
s = 'this is a test test string'
And not:
// Incorrect
// v v v
s = 'this is a test test string'
Also if s = 'this is a test test tst string'
// Correct
// vvv
s = 'this is a test test tst string'
A couple of things to note:
The search chars are user supplied (tst in this case)
I'm using JavaScript, so I can't use atomic grouping, which I suspect would make this a lot easier
My best try is something like this:
var find = 'tst';
var rStarts = [];
var rEnds = [];
for (var i = 0; i < find.length - 1; i++) {
rStarts.push( '(' + find[i] + find[i + 1] );
rEnds.push( find[i] + '[^]*?' + find[i + 1] + ')' );
}
But halfway through I realized I had no idea where I was going with it.
Any ideas how to do this?

You can do something like this:
Compute regexps for all combinations of substrings of the needle in the order you prefer and match them sequentially. So for your test, you can do the following matches:
/(tst)/
/(ts).*(t)/
/(t).*(st)/ // <- this one matches
/(t).*(s).*(t)/
Computing the regexps is tricky and making them in the right order depends on whether you prefer a 4-1-1 split over a 2-2-2 split.
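
A minimal sketch of this idea, assuming "preferred" simply means fewer chunks first and that the needle is short (the number of candidates doubles with every extra character). The names escapeRegExp, chunkings, and findPreferred are hypothetical, and the gap between chunks is matched lazily with [\s\S]*? rather than the .* shown above:

```javascript
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Every way to cut `needle` into contiguous chunks, fewest chunks first.
function chunkings(needle) {
  var gaps = needle.length - 1; // each gap between chars is either cut or not
  var out = [];
  for (var mask = 0; mask < (1 << gaps); mask++) {
    var chunks = [], start = 0;
    for (var i = 0; i < gaps; i++) {
      if (mask & (1 << i)) {
        chunks.push(needle.slice(start, i + 1));
        start = i + 1;
      }
    }
    chunks.push(needle.slice(start));
    out.push(chunks);
  }
  return out.sort(function (a, b) { return a.length - b.length; });
}

// Try the candidate regexes in order and return the first one that matches.
function findPreferred(needle, haystack) {
  var candidates = chunkings(needle);
  for (var c = 0; c < candidates.length; c++) {
    var source = candidates[c].map(function (chunk) {
      return '(' + escapeRegExp(chunk) + ')';
    }).join('[\\s\\S]*?');
    var match = new RegExp(source).exec(haystack);
    if (match) return { chunks: candidates[c], index: match.index, text: match[0] };
  }
  return null;
}

console.log(findPreferred('tst', 'this is a test test tst string')); // chunks: ['tst']
console.log(findPreferred('tst', 'this is a test test string'));     // chunks: ['t', 'st']
```

The order among candidates with the same number of chunks is left to the sort here; a real implementation would need an explicit tie-break that reflects which split is preferred.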

This finds the shortest collection of a supplied group of letters:
function findChars(chars,string)
{
var rx = new RegExp(chars.split("").join(".*?"),"g");
var finds = [];
var res; // declared locally to avoid leaking a global
while((res = rx.exec(string)))
{
finds.push(res[0]);
rx.lastIndex -= res[0].length-1;
}
finds.sort(function(a,b) { return a.length-b.length; })
return finds[0];
}
var s2 = 'this is a test test tst string';
console.log(findChars('tst',s2));//"tst"
console.log(findChars('ess',s2));//"est ts"

Well, I'm still not sure what you're looking for exactly, but maybe that will do for a first try:
.*?(t)(s)(t)|.*?(t)(s).*?(t)|.*?(t).*?(s)(t)|(t).*?(s).*?(t)
regex101 demo
I'm capturing each of the letters here, but if you don't mind grouping them...
.*?(tst)|.*?(ts).*?(t)|.*?(t).*?(st)|(t).*?(s).*?(t)
This will match the parts you mentioned in your question.

You can use lookaheads to mimic atomic groups, as discussed in this article. This regex seems to do what you want:
/^(?:(?=(.*?tst))\1|(?=(.*?ts.+?t))\2|(?=(.*?t.+?st))\3|(?=(.*?t.+?s.+?t))\4)/
...or in human-readable form:
^
(?:
(?=(.*?tst))\1
|
(?=(.*?ts.+?t))\2
|
(?=(.*?t.+?st))\3
|
(?=(.*?t.+?s.+?t))\4
)
ref
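
A small sketch for trying the regex above and seeing which alternative succeeded. The helper whichAlternative is hypothetical; it just reports the first capture group that took part in the match:

```javascript
var preferConsecutive = /^(?:(?=(.*?tst))\1|(?=(.*?ts.+?t))\2|(?=(.*?t.+?st))\3|(?=(.*?t.+?s.+?t))\4)/;

function whichAlternative(s) {
  var m = preferConsecutive.exec(s);
  if (!m) return null;
  for (var i = 1; i < m.length; i++) {
    // Groups from alternatives that did not match are undefined.
    if (m[i] !== undefined) return { alternative: i, text: m[i] };
  }
  return null;
}

console.log(whichAlternative('this is a test test tst string'));
// { alternative: 1, text: 'this is a test test tst' }
console.log(whichAlternative('this is a test test string'));
// { alternative: 3, text: 'this is a test' }
```

Note that each alternative matches from the start of the string up to the end of the found chars, so the positions of the individual letters still have to be extracted from the matched text.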
