Regex match character set without previous duplication


What is the best way to create a dynamic regex to match a distinct set of characters (characters and their order provided during runtime).
character set: abcd
character format: ??j? (a question mark represents a character from the character set)
Example
abjd = match
bdja = match
dbja = match
ab = no match
aajd = no match
abjdd = no match
abj = no match
I have created a regex builder (in js) as follows:
// characters are the character set
// wordFormat is the character format
// replace(str, search, replacement) replaces search in str with replacement
var chars = "[" + characters + "]{1}";
var afterSpecialConversion = replace(wordFormat, "?", chars);
var myRegex = new RegExp("^" + afterSpecialConversion + "$", "gi");
Unfortunately this does not achieve the result as it does not consider duplicate items. I thought about using matching groups to avoid duplicates however I don't know how to negate the already existing character group from the remainder of the set.
Also given character set aabcd now a can exist twice. Any suggestions?

Your regex-builder approach is correct (though a bit of a maintainability mess, so document it carefully), but not quite sophisticated enough. What you need to do is use lookaheads.
I've provided an example regex on Regex101 for the demo in your question.
The more general principle is to replace each set of n question marks with a pattern that matches this:
(?:([<chars>])(?!.*\<m>)){<n>}
Where <chars> is the character set you want to use, <m> is the index of the set of question marks (starting from 1 - more on this in a moment), and <n> is the number of question marks in the group. This yields regex-builder code that looks like this:
function getRe(pattern, chars) {
  var re = "^";
  var qMarkGroup = 1; // index of the capturing group for the current run of "?"
  var qMarkCount = 0; // length of the current run of "?"
  for (var i = 0; i < pattern.length; i++) {
    var char = pattern[i];
    if (char === "?") {
      qMarkCount += 1;
    } else {
      if (qMarkCount > 0) {
        re += "(?:([" + chars + "])(?!.*\\" + qMarkGroup + ")){" + qMarkCount + "}";
        qMarkCount = 0;
        qMarkGroup += 1;
      }
      // Always emit the literal character, even when no run of "?" precedes it
      re += char;
    }
  }
  // Need to do this again in case we have a group of question marks at the end of the pattern
  if (qMarkCount > 0) {
    re += "(?:([" + chars + "])(?!.*\\" + qMarkGroup + ")){" + qMarkCount + "}";
  }
  re += "$";
  // No "g" flag: the pattern is anchored, and "g" makes test() depend on lastIndex
  return new RegExp(re, "i");
}
Code demo on Repl.it
Obviously, this function definition is very verbose, to demonstrate the principles involved. Feel free to golf it (but remember to watch out for fencepost issues like I've described in the comments).
Additionally, be sure to sanitize the inputs. This is an example and will break if someone, for instance, puts in ] in chars.
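As a rough sketch of the sanitizing step, assuming the only goal is to make chars safe between the square brackets: the characters that are special inside a character class (backslash, ], ^ and -) can be backslash-escaped before the pattern is built. The helper name escapeForCharClass is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: escape the characters that are special inside [...]
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]^-]/g, "\\$&");
}

// Build a one-character class from untrusted input and try it out.
var unsafe = "a]^-\\";
var charClass = new RegExp("^[" + escapeForCharClass(unsafe) + "]$");
console.log(charClass.test("]")); // the literal ] is treated as data
console.log(charClass.test("b")); // b is not part of the set
```

The escaped string can then be passed as the chars argument of a builder like the one above. Literal characters in the pattern itself would need their own escaping.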

Related

How to get total sum of matches from a loop?

I'm trying to loop through an array to check whether any of the words in the array are in a body of text:
for(var i = 0; i < wordArray.length; i++ ) {
if(textBody.indexOf(wordArray[i]) >= 1) {
console.log("One or two words.");
// do something
}
else if (textBody.indexOf(wordArray[i]) >= 3) {
console.log("Three or more words.");
// do something
}
else {
console.log("No words match.");
// do something
}
}
where >= 1 and >= 3 are supposed to determine the number of matched words (although it might just be determining their index position in the array? In its current state it logs hundreds of duplicate strings from the if / else statement).
How do I set the if / else statement to do actions based off of the amount of matched words?
Any help would be greatly appreciated!

Try this:
for (var i = 0; i < wordArray.length; i++) {
var regex = new RegExp('\\b' + wordArray[i] + '\\b', 'ig');
var matches = textBody.match(regex);
var numberOfMatches = matches ? matches.length : 0;
console.log(wordArray[i] + ' found ' + numberOfMatches + " times");
}
indexOf will do partial matches. For example "This is a bust".indexOf("bus") would match even though that is probably not what you want. It is better to use a regular expression with the word boundary token \b to eliminate partial word matches. In the RegExp constructor you need to escape the backslash, so \b becomes \\b. The regex uses the i flag to ignore case and the g flag to find all matches. Replace the console.log line with your if/else logic based on the numberOfMatches variable.
UPDATE: Per your clarification you would change the above to
var numberOfMatches = 0;
for (var i = 0; i < wordArray.length; i++) {
var regex = new RegExp('\\b' + wordArray[i] + '\\b', 'ig');
var matches = textBody.match(regex);
numberOfMatches += matches ? matches.length : 0;
}
console.log(numberOfMatches);
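One caveat with building a regex from each word: a word that contains regex metacharacters (for example a dot) would be interpreted as a pattern. A possible sketch, assuming the words should always be treated literally, escapes them first. The names escapeRegExp and countWholeWord are made up for this illustration.

```javascript
// Hypothetical helper: escape regex metacharacters so the word is matched literally
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count whole-word, case-insensitive occurrences of one word
function countWholeWord(textBody, word) {
  var regex = new RegExp("\\b" + escapeRegExp(word) + "\\b", "ig");
  var matches = textBody.match(regex);
  return matches ? matches.length : 0;
}

// "axb" is not counted, because the dot is matched literally
console.log(countWholeWord("a.b axb A.B", "a.b"));
```

Note that \b only behaves as expected when the word starts and ends with a word character (letter, digit or underscore).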

indexOf() provides the index of the first match, not the number of matches. So currently you're testing whether the first match is at index 1 or later, and then whether it is at index 3 or later, rather than counting the number of matches.
I can think of a couple different approaches off the top of my head that would work, but I'm not going to write them for you because this sounds like school work. One would be to use match: see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match and Count number of matches of a regex in Javascript
If you're scared of using regex, or can't be assed to spend the time learning how they work, you could get the index of the match, and if it matches make a substring excluding the portion up to that match, and test if it matches again, while incrementing a counter. indexOf() will return -1 if no matches are found.
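The indexOf approach described above could look roughly like this. It is only a sketch: countOccurrences is a made-up name, it counts non-overlapping substring matches (so partial words count too), and it passes a start position to indexOf instead of cutting substrings.

```javascript
// Count non-overlapping occurrences of word in text using indexOf only
function countOccurrences(text, word) {
  if (word.length === 0) return 0; // avoid an endless loop on an empty search string
  var count = 0;
  var pos = text.indexOf(word);
  while (pos !== -1) { // indexOf returns -1 when there are no more matches
    count += 1;
    pos = text.indexOf(word, pos + word.length); // continue after this match
  }
  return count;
}

console.log(countOccurrences("A test to test how many test in this", "test"));
```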

You can split the text into words with a regExp and then find all occurrences of your word this way:
var text = "word1, word2, word word word word3";
var allWords = text.split(/\b/);
var getOccurrenceCount = function(word, allWords) {
return allWords.reduce(function(count, nextWord) {
count += word == nextWord ? 1 : 0;
return count;
}, 0);
};
getOccurrenceCount("word", allWords);

This may help you:
You have to use .match instead of .indexOf (which only gets the index of the first occurrence inside the string)
var textBody = document.getElementById('inside').innerHTML;
var wordArray = ['check','test'];
for(var i = 0; i < wordArray.length; i++ ) {
var regex = new RegExp( wordArray[i], 'g' );
var wordCount = (textBody.match(regex) || []).length;
console.log(wordCount + " times the word ["+ wordArray[i] +"]");
}
<body>
<p id="inside">
this is your test, check the test, how many test words check
</p>
</body>

I would first put the array into a hashmap, something like
_.each(array, function(a){map[a]=1})
Second, split the string into an array on spaces and punctuation marks.
Then loop through the new array to check whether each word exists in the map.
Make sure to compare the words case-insensitively.
This approach improves the run time to linear.
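The steps above can be sketched in plain JavaScript without a utility library. This is only an illustration under some assumptions: countMatchedWords is a hypothetical name, and a "word" is taken to be any run of ASCII letters and digits.

```javascript
// Count how many words of textBody appear in wordArray (case-insensitive)
function countMatchedWords(textBody, wordArray) {
  // Step 1: put the wanted words into a lookup map
  var wanted = Object.create(null);
  wordArray.forEach(function (word) {
    wanted[word.toLowerCase()] = true;
  });
  // Step 2: split the text on anything that is not a letter or digit
  var tokens = textBody.toLowerCase().split(/[^a-z0-9]+/);
  // Step 3: one pass over the tokens, one map lookup per token
  var total = 0;
  tokens.forEach(function (token) {
    if (wanted[token]) total += 1;
  });
  return total;
}

console.log(countMatchedWords(
  "this is your test, check the test, how many test words check",
  ["check", "test"]
));
```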

Yes, .indexOf gives you the first position of the word in the string. There are many ways to count a word in a string; I'm sharing my crazy version:
function matchesCount(word, str) {
  // Doubling the separator lets adjacent occurrences ("test test") both count
  return (' ' + str.replace(/[^A-Za-z]+/g, '  ') + ' ')
    .split(' ' + word + ' ').length - 1;
}
console.log(matchesCount('test', 'A test to test how many test in this'));

Javascript: Back-Referencing in Regex without remembering last match

Is there anything that doesn't remember the last matched value in back-referencing?
(abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc.
In a string I need to match a pattern that matches abc=def or def=abc with back-referencing, but I can match only abc=abc or def=def.
I can do it with a regex pattern like (abc|def)=(abc|def). This matches all the cases: abc=abc, def=def, abc=def, def=abc.
In my case abc or def is a very long thing to search for, and if I have to change the regex I have to change it in both places. Is there any way to back-reference the group without it remembering the match?
https://regex101.com/r/bL9wU1/1

You cannot use back references for this task.
Backreferences match the same text as previously matched by a capturing group.
That is, they do not use the same pattern defined in a capturing group.
A workaround is to build a dynamic regex pattern with blocks and use a RegExp constructor:
var ss = [ "abc=abc", "def=def", "abc=def", "def=abc"]; // Test strings
var block = "(?:abc|def)"; // Define the pattern block
var rx = RegExp(block + "=" + block); // Build the regex dynamically
document.body.innerHTML += "Pattern: <b>" + rx.source
+ "</b><br/>"; // Display resulting pattern
for (var s = 0; s < ss.length; s++) { // Demo
document.body.innerHTML += "Testing \"<i>" + ss[s] + "</i>\"... ";
document.body.innerHTML += "Matched: <b>" + rx.test(ss[s]) + "</b><br/>";
}
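For readers without a browser at hand, the difference can also be checked in a console. This is a small sketch: the ^ and $ anchors are added here only to make the comparison strict and are not part of the answer's pattern.

```javascript
// A backreference repeats the captured text, so both sides must be identical.
var withBackreference = /^(abc|def)=\1$/;

// Reusing the pattern source repeats the pattern, so the two sides may differ.
var block = "(?:abc|def)";
var withRepeatedBlock = new RegExp("^" + block + "=" + block + "$");

["abc=abc", "abc=def"].forEach(function (s) {
  console.log(s, withBackreference.test(s), withRepeatedBlock.test(s));
});
```

Because the long alternative lives in one variable, a later change to it only has to be made in one place.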

Add colon (:) after every 2nd character using Javascript

I have a string and want to add a colon after every 2nd character (but not after the last set), eg:
12345678
becomes
12:34:56:78
I've been using .replace(), eg:
mystring = mystring.replace(/(.{2})/g, NOT SURE WHAT GOES HERE)
but none of the regex for : I've used work and I haven't been able to find anything useful on Google.
Can anyone point me in the right direction?

Without the need to remove any trailing colons:
mystring = mystring.replace(/..\B/g, '$&:')
\B matches a zero-width non-word boundary; in other words, when it hits the end of the string, it won't match (as that is considered to be a word boundary) and therefore won't perform the replacement (hence no trailing colon, either).
$& contains the matched substring (so you don't need to use a capture group).
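One thing to keep in mind is that \B depends on word characters (letters, digits and underscore), so the trick fits input like the digits in the question. The sketch below compares it with a negative lookahead for the end of the string, which does not care what the characters are. Both function names are hypothetical.

```javascript
// Variant from the answer above: relies on \B, so it assumes word characters
function colonsWithNonBoundary(s) {
  return s.replace(/..\B/g, "$&:");
}

// Alternative: skip the replacement only when the pair ends the string
function colonsWithLookahead(s) {
  return s.replace(/..(?!$)/g, "$&:");
}

console.log(colonsWithNonBoundary("12345678")); // "12:34:56:78"
console.log(colonsWithNonBoundary("ab-cd"));    // "ab-c:d" (the hyphen shifts the pairs)
console.log(colonsWithLookahead("ab-cd"));      // "ab:-c:d"
```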

mystring = mystring.replace(/(..)/g, '$1:').slice(0,-1)
This is what comes to mind immediately. I just strip off the final character to get rid of the colon at the end.
If you want to use this for odd length strings as well, you just need to make the second character optional. Like so:
mystring = mystring.replace(/(..?)/g, '$1:').slice(0,-1)

If you're looking for approach other than RegEx, try this:
var str = '12345678';
var output = '';
for(var i = 0; i < str.length; i++) {
output += str.charAt(i);
if(i % 2 == 1 && i > 0) {
output += ':';
}
}
alert(output.substring(0, output.length - 1));
Working JSFiddle

A somewhat different approach without regex could be using Array.prototype.reduce:
Array.prototype.reduce.call('12345678', function(acc, item, index){
return acc += index && index % 2 === 0 ? ':' + item : item;
}, ''); //12:34:56:78

mystring = mystring.replace(/(.{2})/g, ':$1').slice(1)
try this

Easy, just match every group of up-to 2 characters and join the array with ':'
mystring.match(/.{1,2}/g).join(':')
var mystring = '12345678';
document.write(mystring.match(/.{1,2}/g).join(':'))
no string slicing / trimming required.

It's easier if you tweak what you're searching for to avoid an end-of-line colon (using a negative lookahead):
mystring = mystring.replace(/(.{2})(?!$)/g, '$1:');

mystring = mystring.replace(/(.{2})/g, '$1:').replace(/:$/, '')
Give that a try

I like my approach the best :)
function colonizer(strIn){
  var rebuiltString = '';
  strIn.split('').forEach(function(ltr, i){
    rebuiltString += ltr;
    // Add a colon after every second character, except at the very end
    if (i % 2 === 1 && i < strIn.length - 1) {
      rebuiltString += ':';
    }
  });
  return rebuiltString;
}
alert(colonizer('Nicholas Abrams'));
Here is a demo
http://codepen.io/anon/pen/BjjNJj

Regex match quotes inside bracket regex

I'm working on a regex that must match only the text inside quotes but not in a comment; my matches must be only the strings in bold
<"love";>
>/*"love"*/<
<>'love'<>
"lo
more love
ve"
I'm stuck on this:
/(?:((\"|\')(.|\n)*?(\"|\')))(?=(?:\/\**\*\/))/gm
The first part, (?:((\"|\')(.|\n)*?(\"|\'))), matches all the strings
the second part, (?=(?:\/\**\*\/)), is meant to exclude text inside quotes inside /* "mystring" */
but my logic is clearly wrong
Any suggestion?
Thanks

Maybe you just need to use a negative lookahead to check for the comment end */?
But first, I'd split the string into separate lines
var arrayOfLines = input_str.split(/\r?\n/);
or, without empty lines:
var arrayOfLines = input_str.match(/[^\r\n]+/g);
and then use this regex:
["']([^'"]+)["'](?!.*\*\/)
Sample code:
var rebuilt_string = '';
var re = /["']([^'"]+)["'](?!.*\*\/)/g;
var subst = '<b>$1</b>';
for (var i = 0; i < arrayOfLines.length; i++)
{
  rebuilt_string = rebuilt_string + arrayOfLines[i].replace(re, subst) + "\r\n";
}

The way to avoid commented parts is to match them before. The global pattern looks like this:
/(capture parts to avoid)|target/
Then use a callback function for the replacement (when the capture group exists, return the match without change; otherwise, replace the match with what you want).
Example:
var result = text.replace(/(\/\*[^*]*(?:\*+(?!\/)[^*]*)*\*\/)|"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'/g,
function (m, g1) {
if (g1) return g1;
return '<b>' + m + '</b>';
});
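The same "match what you want to skip first" idea can also be used to collect the strings instead of wrapping them. The sketch below reuses the pattern from this answer; the function name quotedStringsOutsideComments and the sample text (modelled on the question) are assumptions made for the illustration.

```javascript
// Collect quoted strings that are not inside /* ... */ comments
function quotedStringsOutsideComments(text) {
  var re = /(\/\*[^*]*(?:\*+(?!\/)[^*]*)*\*\/)|"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'/g;
  var found = [];
  text.replace(re, function (m, comment) {
    // Group 1 is set only when a comment was matched; those are skipped
    if (!comment) found.push(m);
    return m;
  });
  return found;
}

var sample = '<"love";>\n>/*"love"*/<\n<>\'love\'<>\n"lo\nmore love\nve"';
console.log(quotedStringsOutsideComments(sample));
```

With this sample, the quoted string inside the comment is left out, while the multi-line string at the end is kept as a single match.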

Regex, grab only one instance of each letter

I have a paragraph that's broken up into an array, split at the periods. I'd like to perform a regex on index[i], replacing its contents with one instance of each letter that index[i]'s string value has.
So; index[i]:"This is a sentence" would return --> index[i]:"thisaenc"
I read this thread. But I'm not sure if that's what I'm looking for.

Not sure how to do this in regex, but here's a very simple function to do it without using regex:
function charsInString(input) {
  var output = '';
  for (var pos = 0; pos < input.length; pos++) {
    // Declare char locally instead of leaking an implicit global
    var char = input.charAt(pos).toLowerCase();
    if (output.indexOf(char) == -1 && char != ' ') { output += char; }
  }
  return output;
}
alert(charsInString('This is a sentence'));

As I'm pretty sure what you need cannot be achieved using a single regular expression, I offer a more general solution:
// collapseSentences(ary) will collapse each sentence in ary
// into a string containing its constituent chars
// @param {Array} ary the array of strings to collapse
// @return {Array} the collapsed sentences
function collapseSentences(ary){
  var result = [];
  ary.forEach(function(line){
    var tmp = {};
    line.toLowerCase().split('').forEach(function(c){
      if (c >= 'a' && c <= 'z') {
        tmp[c] = true; // only the key matters; it marks the letter as seen
      }
    });
    result.push(Object.keys(tmp).join(''));
  });
  return result;
}
which should do what you want except that the order of characters in each sentence cannot be guaranteed to be preserved, though in most cases it is.
Given:
var index=['This is a sentence','This is a test','this is another test'],
result=collapseSentences(index);
result contains:
["thisaenc","thisae", "thisanoer"]

(\w)(?!.*?\1)
This yields a match for each of the right characters, but as if you were reading right-to-left instead.
This finds a word character, then looks ahead for the character just matched, and keeps the match only if that character does not occur again later in the string.
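A lookahead version of this idea can be tried in JavaScript as follows. It is a sketch with made-up function names: the first function keeps the last occurrence of each letter, and the second reverses the string before and after so that the first occurrences are kept, in their original order.

```javascript
// Keep each word character only where it does not occur again later
function lastOccurrenceLetters(sentence) {
  var matches = sentence.toLowerCase().match(/(\w)(?!.*?\1)/g);
  return matches ? matches.join("") : "";
}

// Reverse, filter, reverse back: this keeps first occurrences instead
function firstOccurrenceLetters(sentence) {
  var reversed = sentence.split("").reverse().join("");
  return lastOccurrenceLetters(reversed).split("").reverse().join("");
}

console.log(lastOccurrenceLetters("This is a sentence"));  // "hiastnce"
console.log(firstOccurrenceLetters("This is a sentence")); // "thisaenc"
```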

Never mind, I managed:
justC = "";
if (color[i+1].match(/A/g)) {justC += " L_A";}
if (color[i+1].match(/B/g)) {justC += " L_B";}
if (color[i+1].match(/C/g)) {justC += " L_C";}
if (color[i+1].match(/D/g)) {justC += " L_D";}
if (color[i+1].match(/E/g)) {justC += " L_E";}
else {color[i+1] = "L_F";}
It's not exactly what my question may have led you to believe I wanted, but the printout for this is what I was after, for use in a class: <span class="L_A L_C L_E"></span>

How about:
var re = /(.)((.*?)\1)/g;
var str = 'This is a sentence';
var x = str.toLowerCase();
x = x.replace(/ /g, '');
// Keep removing the later copy of a repeated character until none are left
while (x.match(re)) {
  x = x.replace(re, '$1$3');
}

I don't think this can be done in one fell regex swoop. You are going to need to use a loop.
While my example was not written in your language of choice, it doesn't seem to use any regex features not present in javascript.
perl -e '$foo="This is a sentence"; while ($foo =~ s/((.).*?)\2/$1/ig) { print "<$1><$2><$foo>\n"; } print "$foo\n";'
Producing:
This aenc
