Regex lookbehind workaround for Javascript? - javascript

I am terrible at regex so I will communicate my question a bit unconventionally in the name of trying to better describe my problem.
var TheBadPattern = /(\d{2}:\d{2}:\d{2},\d{3})/;
var TheGoodPattern = /([a-zA-Z0-9\-,.;:'"])(?:\r\n?|\n)([a-zA-Z0-9\-])/gi;
// My goal is to then do this
inputString = inputString.replace(TheGoodPattern, '$1 $2');
Question: I want to match all the good patterns and do the subsequent find/replace UNLESS they are preceded by the bad pattern. Any ideas on how? I was able to accomplish this in other languages that support lookbehind, but I am at a loss without it. (PS: from what I understand, JS does not support lookahead/lookbehind, or if you prefer, '?<!', '?<='.)

JavaScript does support lookaheads. And since you only need a lookbehind (and not a lookahead, too), there is a workaround (which doesn't really aid the readability of your code, but it works!). So what you can do is reverse both the string and the pattern.
inputString = inputString.split("").reverse().join("");
var pattern = /([a-z0-9\-])(?:\n\r?|\r)([a-z0-9\-,.;:'"])(?!\d{3},\d{2}:\d{2}:\d{2})/gi;
inputString = inputString.replace(pattern, '$1 $2');
inputString = inputString.split("").reverse().join("");
Note that you had redundantly included the upper-case letters (they are taken care of by the i modifier).
I would actually test it for you if you supplied some example input.
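As a minimal sketch of the reversal trick on a simpler, made-up case: suppose we want every "good" followed by a digit, unless it is preceded by "bad ". The helper names reverseString and findNotPrecededByBad and the good/bad patterns are assumptions for illustration only, not part of the question.

```javascript
// Hypothetical helper: reverse a string character by character.
function reverseString(s) {
    return s.split("").reverse().join("");
}

// Hypothetical example: find "good<digit>" unless preceded by "bad ".
function findNotPrecededByBad(text) {
    var reversed = reverseString(text);
    // In the reversed text "good1" reads "1doog" and "bad " reads " dab",
    // so the lookbehind becomes a negative lookahead.
    var reversedPattern = /\ddoog(?! dab)/g;
    var matches = reversed.match(reversedPattern) || [];
    // Un-reverse each match, then restore the original left-to-right order.
    return matches.map(function (m) { return reverseString(m); }).reverse();
}

console.log(findNotPrecededByBad("find good1 but not bad good3 and good2"));
```

Both the text and every piece of the pattern have to be reversed, which is why this gets hard to read as the patterns grow.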

I have also used the reverse methodology recommended by m.buettner, and it can get pretty tricky depending on your patterns. I find that workaround works well if you are matching simple patterns or strings.
With that said I thought I would go a bit outside the box just for fun. This solution is not without its own foibles, but it also works and it should be easy to adapt to existing code with medium to complicated regular expressions.
http://jsfiddle.net/52QBx/
js:
function negativeLookBehind(lookBehindRegExp, matchRegExp, modifiers)
{
    var text = $('#content').html();
    var badGoodRegex = regexMerge(lookBehindRegExp, matchRegExp, modifiers);
    // match() returns null when nothing matches, so fall back to an empty array
    var badGoodMatches = text.match(badGoodRegex) || [];
    var placeHolderMap = {};
    // temporarily hide every "lookbehind + match" occurrence behind a placeholder
    for (var i = 0; i < badGoodMatches.length; i++)
    {
        var match = badGoodMatches[i];
        var placeHolder = "${item" + i + "}";
        placeHolderMap[placeHolder] = match;
        $('#content').html($('#content').html().replace(match, placeHolder));
    }
    text = $('#content').html();
    var goodRegex = matchRegExp;
    var goodMatches = text.match(goodRegex);
    // swap the hidden text back in
    for (var prop in placeHolderMap)
    {
        $('#content').html($('#content').html().replace(prop, placeHolderMap[prop]));
    }
    return goodMatches;
}
function regexMerge(regex1, regex2, modifiers)
{
    /*this whole concept could be its own beast, so I just asked to have modifiers for the combined expression passed in rather than determined from the two regexes passed in.*/
    return new RegExp(regex1.source + regex2.source, modifiers);
}
var result = negativeLookBehind(/(bad )/gi, /(good\d)/gi, "gi");
alert(result);
html:
<div id="content">Some random text trying to find good1 text but only when that good2 text is not preceded by bad text so bad good3 should not be found bad good4 is a bad oxymoron anyway.</div>
The main idea is find all the total patterns (both the lookbehind and the real match) and temporarily remove those from the text being searched. I utilized a map as the values being hidden could vary and thus each replacement had to be reversible. Then we can run just the regex for the items you really wanted to find without the ones that would have matched the lookbehind getting in the way. After the results are determined we swap back in the original items and return the results. It is a quirky, yet functional, workaround.
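For comparison, engines that implement the ES2018 lookbehind assertions can express the same check directly, with no placeholders. This is only a sketch, assuming such an engine; older browsers throw a SyntaxError on this regex literal, which is why the workarounds above exist.

```javascript
var text = "Some random text trying to find good1 text but only when that good2 text " +
    "is not preceded by bad text so bad good3 should not be found bad good4 " +
    "is a bad oxymoron anyway.";

// (?<!bad ) is a negative lookbehind: the match must not be preceded by "bad ".
var lookbehindResult = text.match(/(?<!bad )good\d/gi);
console.log(lookbehindResult);
```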

Related

How would I go about splitting a string by two brackets with regex?

I have been working with Discord.js and Node to build a quick bot to look something up. I need a way to find all the occurrences that appear between two square brackets and store them in an array of strings. For now I'm using string.split() with some regex, but I am unsure of the regex to use.
I have tried using a few different ones, including /[^\[\[]+(?=\]\])/g and \[\[(.*?)\]\] - I don't mind having the actual brackets in the results, I can remove them manually with string.replace().
I am also working on a fallback with the normal string.split() and other string functions, not relying on regex, but I'm still curious about a possible regex version.
The result with the first regex is totally incorrect. For example, if I try "does [[this]] work [at all]?" the output is "[[]]" and "[at all]", when it really shouldn't take the "at all", but it should show the "[[this]]".
With the second regex I get somewhat closer, it gives back "this"(correct) and "[at all]" (again, it shouldn't take the "at all").
I don't mind having the brackets in the output, I can remove them manually myself, but I need to find all occurrences that are specifically between two brackets.
Try this regex:
\[\[([^[\]]|(?R))*\]\]
What you are trying to do is called Matching Balanced Constructs. More info at the link.
Upon further testing, unfortunately JS does not support (?R) so this becomes far more difficult. You could use the XRegExp.matchRecursive addon from the XRegExp package.
And your expression \[\[(.*?)\]\] should work. Working example below.
var str = 'does [[this]] work [at all] with another double [[here]]?';
var result = str.match(/\[\[(.*?)\]\]/g);
var newDiv = document.createElement("div");
newDiv.innerHTML = result;
document.body.appendChild(newDiv);
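If only the text inside the double brackets is wanted, one option is to loop with exec() and collect the first capture group, since match() with the g flag returns whole matches only. A small sketch (the names doubleBracketPattern and captures are just for illustration):

```javascript
var str = 'does [[this]] work [at all] with another double [[here]]?';
var doubleBracketPattern = /\[\[(.*?)\]\]/g;
var captures = [];
var m;
// With the g flag, each exec() call resumes after the previous match.
while ((m = doubleBracketPattern.exec(str)) !== null) {
    captures.push(m[1]); // group 1 is the text between [[ and ]]
}
console.log(captures);
```

The single-bracket "[at all]" is skipped because the pattern requires two opening and two closing brackets.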
Try my solution
var str = "does [[this]] work [at all]?";
var regexp = /\[([a-z0-9\s]+)\]/ig;
var resultArray = str.match(regexp);
resultArray = resultArray.map((item) => {
return item.replace(/(\[|\])/g, "");
})
console.log(resultArray);

How to make RegExp find the first match only starting from an arbitrary index?

I'm creating a transpiler from an obscure scripting language (thyme) to JavaScript, in JavaScript.
As a whole it's far too complex to interpret through regex alone, but being able to use regex would save me a lot of finger stamina.
Here's the situation:
I have a source code
I have an index which I know is the starting point of the thing I'm trying to capture
I want to capture only the first occurrence that matches my regexp, as later matches would just waste performance without knowing their context at this time.
var src = "++this is my ++cool source ++code";
var idx = 13;
var regex = /(\+\+[^\{\[\(\)\]\}\;\,\?\:\.\=\+\-\*\/\<\>\%\&\|\^\!\~ \n\r\t]+)/g;
var my_capture = ???
How do I make it so that the above snippet would result in my_capture == "++cool"?
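One possible approach, offered as a sketch rather than a definitive answer: a regex with the g flag starts searching at its lastIndex property, so setting lastIndex before a single exec() call yields the first match at or after that index. The variable found is an assumption added for illustration.

```javascript
var src = "++this is my ++cool source ++code";
var idx = 13;
var regex = /(\+\+[^\{\[\(\)\]\}\;\,\?\:\.\=\+\-\*\/\<\>\%\&\|\^\!\~ \n\r\t]+)/g;

// Start the search at idx instead of at the beginning of the string.
regex.lastIndex = idx;
var found = regex.exec(src); // one call, so later matches are never computed
var my_capture = found ? found[1] : null;
console.log(my_capture);
```

If the match must begin exactly at idx rather than anywhere after it, the sticky flag (y) in place of g may be closer to what is wanted.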

How to find any of the specific characters exists in a string

I'm looking for a way to check whether any of a given set of characters exists in a string. That is, if any of the given characters is present in the string, it should return true.
Right now I am doing it with arrays and loops, but honestly I feel that is not a good way. Is there an easier way without an array or loop?
var special = ['$', '%', '#'];
var mystring = ' using it to replace VLOOKUP entirely.$ But there are still a few lookups that you are not sure how to perform. Most importantly, you would like to be able to look up a value based on multiple criteria within separate columns.';
var exists = false;
$.each(special, function(index, item) {
if (mystring.indexOf(item) >= 0) {
exists = true;
}
});
console.info(exists);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Try with a regex:
var patt = /[$%#]/;
console.log(patt.test("using it to replace VLOOKUP entirely.$ But there are still a few lookups that you are not sure how to perform. Most importantly, you would like to be able to look up a value based on multiple criteria within separate columns."));
Be aware that [x] in a regex matches a single character only.
If you wanted to search for, say, replace, then [replace] would match any string containing one of the characters r, e, p, l, a or c.
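A quick illustration of that difference (the variable names here are made up for the example):

```javascript
var characterClass = /[replace]/; // any ONE of the characters r, e, p, l, a, c
var literalWord = /replace/;      // the exact sequence "replace"

var classOnCat = characterClass.test("cat");            // "c" and "a" are in the class
var wordOnCat = literalWord.test("cat");                // "cat" does not contain "replace"
var wordOnPhrase = literalWord.test("find and replace");

console.log(classOnCat, wordOnCat, wordOnPhrase);
```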
Another thing to be aware of with regex is escaping. Using a simple escape regex found here -> Is there a RegExp.escape function in Javascript? I've made a more generic find-in-string.
Of course, you asked about given characters in a string, so this is more of an addendum for anyone finding this post on SO. Looking at your original question with its array of strings, it might be easy for people to think that the array is something you could just pass to the regex. IOW: your question wasn't how can I find out if $, %, # exist in a string.
var mystring = ' using it to replace VLOOKUP entirely.$ But there are still a few lookups that you are not sure how to perform. Most importantly, you would like to be able to look up a value based on multiple criteria within separate columns.';
function makeStrSearchRegEx(findlist) {
return new RegExp('('+findlist.map(
s=>s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join('|')+')');
}
var re = makeStrSearchRegEx(['$', '%', '#', 'VLOOKUP']);
console.log(re.test(mystring)); //true
console.log(re.test('...VLOOKUP..')); //true
console.log(re.test('...LOOKUP..')); //false
The best way is to use regular expressions. You can read more about it here.
In your case you should do something like this:
const specialCharacters = /[$%#]/;
const myString = ' using it to replace VLOOKUP entirely.$ But there are still a few lookups that you are not sure how to perform. Most importantly, you would like to be able to look up a value based on multiple criteria within separate columns.';
if(specialCharacters.test(myString)) {
console.info("Exists...");
}
Please note that it is good practice to store a regular expression in a variable, so that it is not recreated (which is not the fastest operation) each time you use it.

match css rules in javascript

I need to create a regular expression to find a class inside a css file.
For example I have this css file:
#label-blu{
}
.label-blu, .test{
}
.label-blu-not-match{
}
.label-blu{
}
.label-blu span{
}
In this case I need to return 3 matches.
This is my regular expression:
var css = data;
var find_css = 'label-blu';
var found = css.match(/([#|\.]?)([\w|:|\s|\.]+)/gmi).length;
console.log('found: ' + found);
The variable data holds the entire css string.
How can I solve this?
Thanks
There are two points. First, \w does not match a hyphen:
("word-does-not-include-hyphen").replace(/\w+/g, 'test')
Second, are you sure you should be matching against the css label text label-blu, rather than the full css text itself? Currently you are finding the pieces of label-blu on either side of the hyphen...
var css = 'label-blu';
var found = css.match(/([#|\.]?)([\w|:|\s|\.]+)/gmi);
/// which gives ['label','blu']
Which is the reason for the returned length of two, rather than three. Were you not hoping to match the three items in the css text i.e
#label-blu
.label-blu-not-match
.label-blu
If so you will need to use a different text to match, the entire css, rather than just the string 'label-blu'.
However if you are trying to match:
#label-blu
.label-blu, .test
.label-blu
.label-blu span
Then you will need a different RegExp and the entire css string. Just need clarification on which route you need?
update
It's still not clear exactly what you wish to match out of the css text, which is why I outlined the two options above. However, on the assumption you want to match the last four items I mention (and assuming you don't wish to match label-blu-not-match) then the following should help:
http://jsfiddle.net/5d7JX/
var found = csstext.match(/[#\.]label-blu([,:\s\.][^\{]*)?\{/gmi);
However, the above is not foolproof for all possible css formats, nor does it protect against matches within the css rule-sets themselves. Generally speaking, using only regular expressions to scan through code that is usually quite complicated to parse into something logical is frowned upon, unless you are solving a very specific use-case.
update 2
Yes excluding the ID selectors just involves removing the # part of the Reg Exp...
var found = csstext.match(/\.label-blu([,:\s\.][^\{]*)?\{/gmi);
I recommend that you read up on your regular expressions, this site is a good place:
http://www.regular-expressions.info/
update 3
To include a variable as part of a regular expression you will need to make sure you escape its characters so the string is treated literally and any special characters won't interfere. As far as I'm aware there isn't a built-in function to escape or quote for regular expressions in JavaScript; however you can find one here:
How to escape regular expression in javascript?
So if you add this to your code:
RegExp.quote = function(str) {
return (str+'').replace(/([.?*+^$[\]\\(){}|-])/g, "\\$1");
};
You then also need to convert your regexp to the object equivalent:
var reg = new RegExp('\\.label-blu([,:\\s\\.][^\\{]*)?\\{', 'gmi');
var found = csstext.match(reg);
And then add this:
var label = 'label-blu';
var reg = new RegExp('\\.' + RegExp.quote(label) + '([,:\\s\\.][^\\{]*)?\\{', 'gmi');
var found = csstext.match(reg);
http://jsfiddle.net/5d7JX/1/
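Putting those pieces together, a self-contained sketch might look like the following. The names escapeRegExp, countClassSelectors and sampleCss are hypothetical, and a local helper is used here in place of extending RegExp; the same caveats about parsing css with regular expressions still apply.

```javascript
// Hypothetical helper: escape regex metacharacters so the label is matched literally.
function escapeRegExp(str) {
    return String(str).replace(/[.*+?^${}()|[\]\\-]/g, '\\$&');
}

// Count selectors that start with .className and are not a longer class name.
function countClassSelectors(cssText, className) {
    var reg = new RegExp('\\.' + escapeRegExp(className) + '([,:\\s.][^{]*)?\\{', 'gmi');
    var found = cssText.match(reg);
    return found ? found.length : 0; // match() returns null when nothing matches
}

// The css from the question, joined into one string.
var sampleCss = [
    '#label-blu{', '}',
    '.label-blu, .test{', '}',
    '.label-blu-not-match{', '}',
    '.label-blu{', '}',
    '.label-blu span{', '}'
].join('\n');

console.log(countClassSelectors(sampleCss, 'label-blu')); // 3
```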
In your example, if you use:
var findClass = /(\.label-blu)(?!-)/g;
var found = css.match(findClass).length;
it should return 3...
maybe a better solution is:
var findClass = /(\.label-blu)[\s{,]+/g;
var found = css.match(findClass).length;
to cover a possibility when you might have something else rather than '-' added to your wanted class and it will only look for the class that's followed by a 'space' a '{' or a ','...
let me know if you have any questions

Javascript Regex to select every non-alphanumeric character AND whitespace?

I'm new to JS, tried using -
/[0-9a-z]+$/gi
/[^0-9a-z]+$/gi
neither worked. Can anyone tell me where I am going wrong?
Replace
var sentence_split = arr.split(/[0-9a-z]+$/gi);
with
var sentence_split = arr.split(/[^0-9a-z]+/gi);
... if you prefer to go this way.
Explanation: the original regex was anchored (with $) to the end of the string, and split by words rather than by the symbols separating them.
Still, there's more than one way to do the things you do: I'd probably go just with:
var words = sentence.match(/(\w+)/g);
... capturing sequences of word-consisting symbols instead of splitting the phrase by something that separates them. Here's a Fiddle to play with.
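To make the difference concrete, here is a small sketch on a made-up sentence (the variable names are illustrative). Note that splitting leaves an empty string behind when the text ends with a separator, while matching does not:

```javascript
var sentence = "Hello, world!";

// Split on runs of non-alphanumeric characters.
var bySplit = sentence.split(/[^0-9a-z]+/i);
// Collect runs of word characters instead.
var byMatch = sentence.match(/\w+/g);

console.log(bySplit); // [ 'Hello', 'world', '' ]
console.log(byMatch); // [ 'Hello', 'world' ]
```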
UPDATE: And one last thing. I felt a bit... uneasy about wasting sort just to get max essentially. I don't know if you share these thoughts, still here's how I would update the searching code:
var longest;
words.forEach(function(e) {
if (! longest || longest.length < e.length) {
longest = e;
}
});
It's forEach, because I'm a bit lazy and imagine having a luxury of NOT working with IE8-; still, it's quite easy to rewrite this into a regular for... array-walking routine.
Updated fiddle.
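For reference, the two steps above (extract the words, then scan once for the longest) can be combined into one function. This is only a sketch; the name longestWord is made up, and a plain for loop is used so it also runs where forEach is unavailable.

```javascript
// Hypothetical helper: return the longest run of word characters in a sentence.
function longestWord(sentence) {
    var words = sentence.match(/\w+/g) || []; // match() returns null if there are no words
    var longest = "";
    for (var i = 0; i < words.length; i++) {
        // Strict comparison keeps the first word when several share the maximum length.
        if (words[i].length > longest.length) {
            longest = words[i];
        }
    }
    return longest;
}

console.log(longestWord("The quick brown fox jumped over the lazy dog")); // "jumped"
```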
