Different behaviours when using regex in javascript

I'm using JavaScript and a regex to scan a sentence for a particular word ('schiz'), then return the match along with the 5 words before and after it.
However, I'm running into an odd situation: a regex created with new RegExp() doesn't behave the same way as the equivalent regex literal.
In the following code, if I use:
reg = /([^\s]+\s){0,5}schiz([^\s]+\s){0,5}/g
then it returns the expected result. But since the query word needs to be a variable, I have to use new RegExp() to create the regex:
reg = RegExp("([^\s]+\s){0,5}"+query+"([^\s]+\s){0,5}","g");
With query set to "schiz", this doesn't give the same result. Can anyone explain why?
Here is the entire snippet:
var matchItem = "<p><strong>Indications</strong></p>\n" +
"<p>- Used to treat \"resistant schizophrenia\" (resistant meaning pt has tried 2 other antipsychotics to little effect)</p>\n" +
"<p>- Better for refractory schizophrenia than chlorpromazine</p>";
var query = "schiz";
var reg = RegExp("([^\s]+\s){0,5}"+query+"([^\s]+\s){0,5}","g");
//reg = /([^\s]+\s){0,5}schiz([^\s]+\s){0,5}/g
var ms = ("" + matchItem).match(reg);
if(ms!=null){
ms = ms.join("...");
}
return ms;

\ is an escape character in a string literal, so "\s" reaches the RegExp constructor as a plain "s". You have to double each backslash:
"([^\\s]+\\s){0,5}"+query+"([^\\s]+\\s){0,5}"

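To see why the two forms differ, it may help to print the pattern each constructor call actually receives. The following is a minimal sketch, assuming the query should be matched literally; escapeRegExp and buildContextRegex are hypothetical helper names, not part of the question.

```javascript
// In a string literal "\s" is just "s", so the constructor never sees a backslash.
var broken = new RegExp("([^\s]+\s){0,5}schiz", "g");
var fixed = new RegExp("([^\\s]+\\s){0,5}schiz", "g");
console.log(broken.source); // ([^s]+s){0,5}schiz
console.log(fixed.source);  // ([^\s]+\s){0,5}schiz

// Hypothetical helper: escape regex metacharacters in a user-supplied query.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the "up to 5 words either side" regex for any query.
function buildContextRegex(query) {
  return new RegExp("([^\\s]+\\s){0,5}" + escapeRegExp(query) + "([^\\s]+\\s){0,5}", "g");
}
```

Escaping the query is optional for a plain word like "schiz", but it keeps characters such as "." or "(" in a query from being read as regex syntax.
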
Related

JavaScript - strip everything before and including a character

I am relatively new to RegEx and am trying to achieve something which I think may be quite simple for someone more experienced than I.
I would like to construct a snippet in JavaScript which will take an input and strip anything before and including a specific character - in this case, an underscore.
Thus 0_test, 1_anotherTest, 2_someOtherTest would become test, anotherTest and someOtherTest, respectively.
Thanks in advance!
You can use the following regex (which is mainly worthwhile when the delimiter character is not known in advance; for a plain _, see Alex's solution):
^[^_]*_
Explanation:
^ - Beginning of a string
[^_]* - Any number of characters other than _
_ - Underscore
And replace with empty string.
var re = /^[^_]*_/;
var str = '1_anotherTest';
var subst = '';
document.getElementById("res").innerHTML = str.replace(re, subst);
<div id="res"/>
If you have to match before a digit, and you do not know which digit it can be, then the regex way is better (with the /^[^0-9]*[0-9]/ or /^\D*\d/ regex).
Alternatively, skip the regex and simply read from the position after the underscore to the end:
var str = "2_someOtherTest";
var res = str.substr(str.indexOf('_') + 1);
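The two approaches above can be compared side by side. This is a small sketch with hypothetical function names (stripThroughRegex, stripThroughIndex); it uses slice in place of substr, which behaves the same way when only a start index is given.

```javascript
// Regex version: remove everything up to and including the first underscore.
function stripThroughRegex(str) {
  return str.replace(/^[^_]*_/, "");
}

// indexOf version: keep everything after the first underscore.
// If there is no underscore, indexOf returns -1, so the whole string is kept.
function stripThroughIndex(str) {
  return str.slice(str.indexOf("_") + 1);
}

var inputs = ["0_test", "1_anotherTest", "2_someOtherTest"];
console.log(inputs.map(stripThroughRegex)); // [ 'test', 'anotherTest', 'someOtherTest' ]
console.log(inputs.map(stripThroughIndex)); // [ 'test', 'anotherTest', 'someOtherTest' ]
```

Both versions leave a string without an underscore unchanged and cut only at the first underscore.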

Firefox Extension How to pass string type variable into javascript replace regex parameter?

I've been working on a Firefox extension for several days now and there is one thing I can't solve.
I generate a list of regexes as strings, and I want to pass one of those strings to JavaScript's replace function as the regex parameter. Here are examples of the strings:
/(https?:\/\/(www\.)?rapidgator\.net\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g
/(https?:\/\/(www\.)?ul\.to\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g
/(https?:\/\/(www\.)?uploadable\.ch\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g
/(https?:\/\/(www\.)?180upload\.com\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g
To keep it simple: I managed to read the file, take the first line, and assign it to a variable:
var rapidgator = "/(https?:\/\/(www\.)?rapidgator\.net\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g";
I want the string to be a "replace parameter" like this:
var rep = rep.replace(rapidgator,"<a href='$1'>$1</a>");
But I can't get that to work.
I've tried using a RegExp object, and that didn't work either.
var rapidgator = new RegExp("(https?:\/\/(www\.)?rapidgator\.net\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))", "g");
How can I make that work? Thank you for your advice :)
If you can get the regex, why not let it remain a regex literal?
var rapidgator = /(https?:\/\/(www\.)?rapidgator\.net\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g;
If you want to go through the RegExp constructor, escape each \ with another backslash, drop the / delimiters, and pass the flags as the second argument, as in:
var rapidgator = new RegExp("(https?:\\/\\/(www\\.)?rapidgator\\.net\\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))","g")
You need to double each backslash when writing your regex inside double quotes. The forward slashes need no escaping there:
var rapidgator = new RegExp("(https?://(www\\.)?rapidgator\\.net\\b([-a-zA-Z0-9#:%_\\+.~#?&/=]*))", "g");
And to match a literal backslash, you need four backslashes in the string ("\\\\"), which the regex engine receives as \\.
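When the patterns arrive as text in /pattern/flags form, for instance read line by line from a file, the backslashes are already real characters and nothing needs doubling; the text only has to be split into pattern and flags. Below is a rough sketch under that assumption. parseRegexLiteral is a hypothetical helper, and String.raw stands in for a line read from the file.

```javascript
// Hypothetical helper: turn text such as "/ab+c/gi" into a RegExp object.
function parseRegexLiteral(text) {
  var lastSlash = text.lastIndexOf("/");
  if (text.charAt(0) !== "/" || lastSlash === 0) {
    throw new Error("Not in /pattern/flags form: " + text);
  }
  var pattern = text.slice(1, lastSlash); // between the outer slashes
  var flags = text.slice(lastSlash + 1);  // whatever follows the last slash
  return new RegExp(pattern, flags);
}

// String.raw keeps the backslashes, as if the line had been read from a file.
var line = String.raw`/(https?:\/\/(www\.)?rapidgator\.net\b([-a-zA-Z0-9#:%_\+.~#?&//=]*))/g`;
var rapidgator = parseRegexLiteral(line);

var rep = "see http://rapidgator.net/file/abc for details";
rep = rep.replace(rapidgator, "<a href='$1'>$1</a>");
console.log(rep);
```

The double escaping described above only applies to patterns typed into JavaScript source as string literals.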

Javascript Regexp Duplicate Line Matching not working correctly

I am writing JavaScript code to parse some grammar files. It is quite a lot of code, so I will post only the relevant parts here. I am using a JavaScript RegExp to match a duplicate line held within a string. The string contains, for example (assume the string name is lines):
if
else
;
print
{
}
test1
test1
=
+
-
*
/
(
)
num
string
comment
id
test2
test2
What should happen is that a match is found on 'test1' and 'test2'. The code should then delete the duplicate, leaving one instance each of test1 and test2. What actually happens is no match at all. I am confident in my regex, but JavaScript may be doing something I am not expecting. Here is the code doing the work on the string given above:
var rex = new RegExp("(.*)(\r?\n\1)+","g");
var re = '/(.*)(\r?\n\1)+/g';
rex.lastIndex = 0;
var m = rex.exec(lines);
if (m) {
alert("Found Duplicate");
var linenum = lines.search(re); //Get line number of error
alert("Error: Symbol Defined twice\n");
alert("Error occured on line: " + linenum);
lines = lines.replace(rex,""); //Gets rid of the duplicate
}
It never gets into the if(m) statement. Therefore no match is found. I tested the regex here: http://regexpal.com/ using the regex in my code as well as the example text provided. It matches just fine, so I am at kind of a loss. If anyone can help, it would be great.
Thank you.
Edit:
Forgot to add, I am testing this in firefox, and it only has to work in firefox. Not sure if that matters.
First error: \ in a JS string is also an escape character.
var rex = new RegExp("(.*)(\r?\n\1)+","g");
should be written
var rex = new RegExp("(.*)(\\r?\\n\\1)+","g");
// or, shorter:
var rex = /(.*)(\r?\n\1)+/g;
if you want to make it work. In the case of the RegExp constructor, you’re passing the pattern as a string to the constructor function. This means you need to escape each \ backslash that occurs in the pattern. If you use a regexp literal, you don’t need to escape them, since they’re not in a string, but retain their ‘normal’ properties in the regexp pattern.
Second error, your expression
var re = '/(.*)(\r?\n\1)+/g';
is wrong. What you’re doing here is assigning a string literal to a variable. I’m assuming you meant to assign a regular expression literal, which should be written like this:
var re = /(.*)(\r?\n\1)+/g;
Third error: the last line
lines = lines.replace(rex,""); //Gets rid of the duplicate
removes both instances of all duplicate lines! If you want to keep the first instance of each duplicate, you should use
lines = lines.replace(rex, "$1");
And finally, this method only detects two consecutive identical lines. Is that what you want, or do you need to detect any duplicates, wherever they may be?
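If duplicates can appear anywhere rather than only on adjacent lines, a regex with a backreference becomes awkward, and a line-by-line pass may be simpler. The following is a sketch of that alternative, not the approach used in the question; dedupeLines is a hypothetical helper name.

```javascript
// Hypothetical helper: keep the first occurrence of every line and
// report the 1-based line numbers of the repeats that were dropped.
function dedupeLines(text) {
  var seen = new Set();
  var kept = [];
  var duplicates = [];
  text.split(/\r?\n/).forEach(function (line, index) {
    if (seen.has(line)) {
      duplicates.push({ symbol: line, line: index + 1 });
    } else {
      seen.add(line);
      kept.push(line);
    }
  });
  return { text: kept.join("\n"), duplicates: duplicates };
}

var result = dedupeLines("if\ntest1\ntest1\n=\ntest2\nid\ntest2");
console.log(result.text);
console.log(result.duplicates);
```

Here the second test2 is reported even though another line sits between it and the first one.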
var str = 'if\nelse\n;\nprint\n{\n}\ntest1\ntest1\n=\n+\n-\n*\n/\n(\n)\nnum\nstring\ncomment\nid\ntest2\ntest2\ntest2\ntest2\ntest2';
console.log(str);
str = str.replace(/\r\n?/g,'\n');
// I prefer replacing all the newline characters with \n's here
str = str.replace(/(^|\n)([^\n]*)(\n\2)+/g,function(m0,m1,m2,m3,ind) {
  // 1-based line number of the first occurrence (skip the captured newline, if any)
  var line = str.slice(0,ind + m1.length).split(/\n/).length;
  var msg = '[Found duplicate]';
  msg += '\nFollowing symbol defined more than once';
  msg += '\n\tsymbol: ' + m2;
  msg += '\n\ton line ' + line;
  console.log(msg);
  return m1 + m2;
});
console.log(str);
Alternatively, you can skip the newline-normalizing replace and change the pattern to
/(^|\r\n?|\n)([^\r\n]*)((?:\r\n?|\n)\2)+/g
Note that [^\n]* will also catch multiple empty lines. If you want to make sure it matches (and replaces) non-empty lines then you might want to use [^\n]+.
[EDIT]
For the record, the m parameters are the arguments passed to the callback: m0 is the whole match, m1 is the 1st subgroup ((^|\n)), m2 is the 2nd subgroup (([^\n]*)) and m3 is the last subgroup ((\n\2)), which holds only its final repetition. I could have used arguments[n] instead, but these are shorter.
As for the return value: JavaScript's regex flavor had no lookbehind when this was written (it arrived in ES2018), so the pattern also captures the preceding newline, if there is one (the first line has none). The callback therefore has to return that newline together with the line itself, which is why it shouldn't return m2 alone.
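Another way to avoid capturing the preceding newline is the m (multiline) flag, which lets ^ and $ match at line boundaries. The sketch below assumes line endings have already been normalized to \n; the variable names are illustrative only. Anchoring both ends also means a line is only treated as a duplicate when the whole next line is identical, not when the next line merely starts with the same text.

```javascript
// With the m flag, ^ and $ match at the start and end of every line.
var consecutiveDuplicates = /^(.*)(\n\1)+$/gm;

var input = "a\nb\nb\nb\nc";
var deduped = input.replace(consecutiveDuplicates, "$1");
console.log(JSON.stringify(deduped)); // "a\nb\nc"

// "test" is a prefix of "test2", but the lines are not equal, so nothing changes.
var prefixCase = "test\ntest2".replace(consecutiveDuplicates, "$1");
console.log(JSON.stringify(prefixCase)); // "test\ntest2"
```

Because the newline is no longer part of the match, a plain "$1" replacement is enough and no callback is required.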

Javascript Complex RegEx with variables

I am using this tool to build a regex http://www.gethifi.com/tools/regex
I found that the one below works for me if, for example, I am looking to match $aazz[AB]:
var regex = /[\+\=\-\*\^\\]\$aazz\[AB\]/g;
I have read the other posts on the RegEx constructor in Javascript but cannot manage to make the following work:
var preToken = "[\+\=\-\*\^\\]";
var toFind = "\$aazz\[AB\]";
var stringToReplace = "/" + preToken + toFind + "/";
var regex = new RegExp(stringToReplace, "g");
Here is the jsbin http://jsbin.com/ifeday/3/edit
Thanks
When creating regular expressions from strings, you need to double every backslash, because the string literal consumes one level of escaping before the regex engine sees the pattern.
\ becomes \\
\\ becomes \\\\
So you can try the following (inside a character class, not everything needs escaping):
var preToken = "[+=\\-*^\\\\]";
var toFind = "\\$aazz\\[AB\\]";
Also, the string source for your regular expression should not be wrapped in / delimiters, but I see in your jsBin that you've already corrected that.
Update your jsBin with these variable declarations and it should work.
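In environments that support template literals, String.raw offers another way to write the pieces without doubling backslashes, since it leaves them untouched. This is a sketch of that alternative, not what the answer above uses; the sample text is made up for illustration.

```javascript
// String.raw leaves backslashes untouched, so each piece reads like regex source.
var preToken = String.raw`[\+\=\-\*\^\\]`;
var toFind = String.raw`\$aazz\[AB\]`;
var regex = new RegExp(preToken + toFind, "g");

// Only occurrences preceded by one of + = - * ^ \ should match.
var sample = "x+$aazz[AB] y-$aazz[AB] z$aazz[AB]";
console.log(sample.match(regex)); // [ '+$aazz[AB]', '-$aazz[AB]' ]
```

The third occurrence is skipped because "z" is not one of the characters in the preToken class.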

Javascript regex grouping

I am trying to create a regular expression that would easily change an input name such as "holes[0][shots][0][unit]" into "holes[0][shots][1][unit]". I'm basically cloning an HTML input and would like to make sure its position is incremented.
I got my regex built and working correctly using this (awesome) tool : http://gskinner.com/RegExr/
Here is my current regex :
(.*\[shots\]\[)([0-9]+)(\].*\])
and I am using a replace such as :
$12$3
This transforms "holes[0][shots][0][unit]" into "holes[0][shots][2][unit]", which is exactly what I want. However, when I try this in JavaScript (http://jsfiddle.net/PH2Rh/):
var str = "holes[0][shots][0][units]";
var reg =new RegExp("(.*\[shots\]\[)([0-9]+)(\].*\])", "g");
console.log(str.replace(reg,'$1'));
I get the following output : holes[0
I don't understand how my first group can be "holes[0", since I included the whole [shots][ part in it.
I'd appreciate any input on this. Thank you.
In a string literal, a single \ before a bracket is dropped, so the RegExp constructor receives (.*[shots][)([0-9]+)(].*]) and the unescaped brackets are read as character classes. To escape a bracket within a string literal, you have to use two backslashes, \\:
var reg = new RegExp("(.*\\[shots\\]\\[)([0-9]+)(\\].*\\])", "g");
A preferable solution is to use RegEx literals:
var reg = /(.*\[shots\]\[)([0-9]+)(\].*\])/g;
This appears to work:
var str = "holes[0][shots][0][units]";
var reg = /(.*\[shots\]\[)([0-9]+)(\].*\])/;
console.log(str.replace(reg,'$1'));
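Since the stated goal is to increment the index rather than hard-code it, a replacer function may fit better than a replacement string, and it avoids placing a digit directly after $1. A minimal sketch, with parameter names chosen here for readability:

```javascript
var str = "holes[0][shots][0][units]";
var reg = /(.*\[shots\]\[)([0-9]+)(\].*\])/;

// The callback receives the whole match followed by each captured group.
var incremented = str.replace(reg, function (match, before, index, after) {
  return before + (parseInt(index, 10) + 1) + after;
});
console.log(incremented); // holes[0][shots][1][units]
```

Only the index after [shots] changes; the first [0] belongs to the first group and is copied through unchanged.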
