Javascript Regex with variable and $1 - javascript

I have read How do you pass a variable to a Regular Expression javascript
I'm looking to build a regular expression from a variable, then use it to find a value and replace it.
section = 'abc';
reg = new RegExp('\[' + section + '\]\[\d+\]','g');
num = duplicate.replace(reg,"$1++");
where $1 should be the number matched by \d+, plus 1.
Even without the increment, it doesn't work.
It returns something like:
[abc]$1
Any idea?

Your regex is on the right track, however to perform any kind of operation you must use a replacement callback:
section = "abc";
reg = new RegExp("(\\["+section+"\\]\\[)(\\d+)(\\])","g");
num = duplicate.replace(reg,function(_,before,number,after) {
return before + (parseInt(number,10)+1) + after;
});
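A self-contained run of the callback approach might look like the sketch below. The sample input in duplicate and the helper name escapeRegExp are assumptions added for illustration; they are not part of the original question.

```javascript
// Hypothetical helper: escape regex metacharacters so `section` is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var section = 'abc';
var duplicate = 'name[abc][7] other[xyz][7] name[abc][19]'; // assumed sample input

// Three groups: the prefix "[abc][", the digits, and the closing "]".
var reg = new RegExp('(\\[' + escapeRegExp(section) + '\\]\\[)(\\d+)(\\])', 'g');

var num = duplicate.replace(reg, function (match, before, number, after) {
  // The callback receives the captured groups as strings, so arithmetic is possible here.
  return before + (parseInt(number, 10) + 1) + after;
});

console.log(num); // name[abc][8] other[xyz][7] name[abc][20]
```

Only the [abc] entries are incremented; the [xyz] entry is left alone because the section name is part of the pattern.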

I think you need to read up more on Regular Expressions. Because the backslashes in your string literal are dropped before RegExp ever sees them, your current regular expression comes out to:
/[abc][d+]/g
Which will match an "a" "b" or "c", followed by a "d" or "+", like: ad or c+ or bd or even zebra++ etc.
A great resource to get started is: http://www.regular-expressions.info/javascript.html
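To see this for yourself, you can inspect the source property of the constructed RegExp. This is a small sketch; the test strings are the ones mentioned above plus one assumed bracketed input.

```javascript
// In a string literal, '\[' '\]' and '\d' are not recognized escapes,
// so the backslashes are dropped before RegExp sees the text.
var section = 'abc';
var broken = new RegExp('\[' + section + '\]\[\d+\]', 'g');

var brokenSource = broken.source;
console.log(brokenSource); // [abc][d+]

// Fresh literals are used below so the g flag's lastIndex state does not interfere.
var matchesAd = /[abc][d+]/.test('ad');
var matchesZebra = /[abc][d+]/.test('zebra++');
var matchesIntended = /[abc][d+]/.test('[abc][5]');

console.log(matchesAd);       // true
console.log(matchesZebra);    // true
console.log(matchesIntended); // false
```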

I see at least two problems.
The \ character has a special meaning in JavaScript strings. It is used to escape special characters in the string. For example: \n is a new line, and \r is a carriage return. You can also escape quotes and apostrophes to include them in your string: "This isn't a normally \"quoted\" string... It has actual \" characters inside the string as well as delimiting it."
The second problem is that, in order to use a backreference ($1, $2, etc.) you must provide a capturing group in your pattern (the regex needs to know what to backreference). Try changing your pattern to:
'\\[' + section + '\\]\\[(\\d+)\\]'
Note the double-backslashes. This escapes the backslash character itself, allowing it to be a literal \ in a string. Also note the use of ( and ) (the capturing group). This tells the regex what to capture for $1.
After the regex is instantiated, with section === 'abc':
new RegExp('\\[' + section + '\\]\\[(\\d+)\\]', 'g');
Your pattern is now:
/\[abc\]\[(\d+)\]/g
And your .replace will replace each match with the captured digits followed by a literal ++ (for example, [abc][5] becomes 5++). It will not do any arithmetic on the captured value.
Demo: http://jsfiddle.net/U46yx/
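As a quick check of the corrected pattern, something like the following can be run in a console. The input string is an assumed sample.

```javascript
var section = 'abc';
var reg = new RegExp('\\[' + section + '\\]\\[(\\d+)\\]', 'g');

var fixedSource = reg.source;
console.log(fixedSource); // \[abc\]\[(\d+)\]

var input = 'x[abc][5]y'; // assumed sample input
var replaced = input.replace(reg, '$1++');

// The whole match "[abc][5]" is replaced by group 1 plus the literal text "++".
console.log(replaced); // x5++y
```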

Related

regex preceded by two or more special characters

I am stuck creating a regex such that, if the word is preceded or followed by more than one special character on either side, the regex 'exec' method returns null. Only if the word is wrapped in exactly one bracket on each side should 'exec' give a result. Below is the regular expression I have come up with.
Only if the string is like "(test)" should regex.exec return values; for other combinations such as "((test))", "((test)" or "(test))" it should return null. The code below does not return null when it should. Please suggest a fix.
var w1 = "\(test\)";
alert(new RegExp('(^|[' + '\(\)' + '])(' + w1 + ')(?=[' + '\(\)' + ']|$)', 'g').exec("this is ((test))"))
If you have a list of words and want to filter them, you can do the following.
string.split(' ').filter(function(word) {
  // keep the word only if it neither starts nor ends with 2 or more special characters
  return !(/^[!##$%^&*()]{2,}.+/).test(word) && !(/[!##$%^&*()]{2,}$/).test(word);
});
The split() function splits a string at a space character and returns an array of words, which we can then filter.
To keep the valid words, we will test two regex expressions to see if the word starts or ends with 2 or more special characters respectively.
RegEx Breakdown
^ - Expression starts with the following
[] - A single character in the block
!##$%^&*() - These are the special characters I used. Replace them with the ones you want.
{2,} - Matches 2 or more of the preceding item (here, a character from the block)
.+ - Matches 1 or more of any character
$ - Expression ends with the following
To use the exec function the same way, do this:
!(/^[!##$%^&*()]{2,}.+/).exec(string) && !(/[!##$%^&*()]{2,}$/).exec(string)
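A runnable version of the word filter might look like this. It assumes the intent is to drop any word that starts or ends with two or more special characters, so the two tests are combined with &&. The character set and the name keepCleanWords are assumptions for illustration.

```javascript
// Assumed set of "special" characters; replace with the ones you care about.
var startsWithTwoSpecials = /^[!@#$%^&*()]{2,}/;
var endsWithTwoSpecials = /[!@#$%^&*()]{2,}$/;

// Hypothetical helper: keep only words that pass both checks.
function keepCleanWords(text) {
  return text.split(' ').filter(function (word) {
    return !startsWithTwoSpecials.test(word) && !endsWithTwoSpecials.test(word);
  });
}

var kept = keepCleanWords('this is (test) ((test)) ((test) (test))');
console.log(kept); // [ 'this', 'is', '(test)' ]
```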
If I understand correctly, you are looking for any string which contains (test), anywhere in it, and exactly that, right?
In that case, what you probably need is the following:
var regExp = /.*[^(]\(test\)[^)].*/;
alert(regExp.exec("this is ((test))")); // → null
alert(regExp.exec("this is (test))" )); // → null
alert(regExp.exec("this is ((test)" )); // → null
alert(regExp.exec("this is (test) ...")); // → ["this is (test) ..."]
Explanation:
.* matches any character (except newline) between zero and unlimited times, as many times as possible.
[^(] matches a single character that is not the literal character (
[^)] matches a single character that is not the literal character )
This makes sure your test string is in the given string, but only when it is wrapped with exactly one parenthesis on each side. Note that both character classes must consume a character, so (test) at the very start or very end of the string will not match.
You can use the following regex:
(^|[^(])(\(test\))(?!\))
See regex demo here, replace with $1<span style="new">$2</span>.
The regex features an alternation group (^|[^(]) that matches either the start of the string (^) or any character other than (. This alternation is a workaround, since the JS regex engine did not support look-behinds when this was written (they were added in ES2018).
Then, (\(test\)) matches and captures (test). Note that the round brackets are escaped. If they were not, they would be treated as capturing group delimiters.
The (?!\)) is a look-ahead that makes sure there is no literal ) right after test). Look-aheads are supported fully by JS regex engine.
A JS snippet:
var re = /(^|[^(])(\(test\))(?!\))/gi;
var str = 'this is (test)\nthis is ((test))\nthis is ((test)\nthis is (test))\nthis is ((test\nthis is test))';
var subst = '$1<span style="new">$2</span>';
var result = str.replace(re, subst);
alert(result);
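Since the question was about exec returning null, here is a small sketch that runs the same pattern through exec for the four cases from the question. The g flag is left off so that exec carries no lastIndex state between calls.

```javascript
var re = /(^|[^(])(\(test\))(?!\))/;

var samples = [
  'this is (test)',
  'this is ((test))',
  'this is ((test)',
  'this is (test))'
];

var results = samples.map(function (s) {
  var m = re.exec(s);
  // Group 2 holds "(test)" when the word is wrapped in exactly one bracket pair.
  return m === null ? null : m[2];
});

console.log(results); // [ '(test)', null, null, null ]
```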

Understanding some JavaScript with a RegExp

I have the following js code
var regex = new RegExp('([\'"]?)((?:\\\\\\1|.)+?)\\1(,|$)', 'g'),
key = regex.exec( m ),
val = regex.exec( m );
I would like to understand it.
In particular:
why there are all those backslash in the definition of the RegExp? I can clearly see that \\1 is a reference to the first saved element. Why in a new RegExp using ' and not " we need to use \\1 and not simple \1?
why there is a comma between the two definitions of key and val? I may guess that it depends on the "instances" found using "g", but it is still not very clear to me.
I tried to execute the code with
m = 'batman, robin'
and the result is pretty much a mess, which I cannot really explain.
The code is taken from JQuery Cookbook, 2.12
why there are all those backslash in the definition of the RegExp?
"\\" is a string whose value is \. One backslash is used as an escape, the second for the value. Then, within the regex you also need to escape the backslash character again because backslash characters are used to mean special things within regex.
For example
"\\1"
is a string whose value is \1, which, in a regular expression, matches the first captured group.
"\\\\"
is a string whose value is \\, which, in a regular expression, matches a single \ character.
"\\\\\\1"
is a string whose value is \\\1, which, in a regular expression, matches a single \ followed by the first captured group.
This need to escape backslashes, and then escape them again is called "double escaping". The reason you need to double escape is so that you have the correct value within the regular expression. The first escape is to make sure that the string has the correct value, the second escape is so that the regular expression matches the correct pattern.
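The two layers of escaping can be checked directly by comparing string lengths and the source of both forms. This is only an illustrative sketch; the variable names are made up.

```javascript
// Each pair of backslashes in the string literal is one backslash in the string value.
var backrefLength = '\\1'.length;          // value is \1
var escapedSlashLength = '\\\\\\1'.length; // value is \\\1

console.log(backrefLength);      // 2
console.log(escapedSlashLength); // 4

// The constructor form and the literal form describe the same pattern.
var fromString = new RegExp('([\'"]?)((?:\\\\\\1|.)+?)\\1(,|$)', 'g');
var fromLiteral = /(['"]?)((?:\\\1|.)+?)\1(,|$)/g;

var samePattern = fromString.source === fromLiteral.source;
console.log(samePattern); // true
```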
why there is a comma between the two definitions of key and val?
The code you posted is a variable declaration. It's easier to see when formatted:
var regex = ...,
key = ...,
val = ...;
Each of the variable names in the list is declared via the var keyword. It is the same as declaring the variables first and assigning them separately:
var regex,
key,
val;
regex = ...
key = ...
val = ...
Which is the same as declaring each var with a different var keyword:
var regex = ...
var key = ...
var val = ...
There's a difference between writing dynamic regex objects and static regex objects. When you initialize a regex object with a string, the string has to be transformed into a regex object. However, the '\' character has a special meaning not only within regex patterns but also within JavaScript strings, hence the double escape.
Edit: Regarding your second question. You can do multiple declarations with comma, like so:
var one = 'one',
two = 'two',
three = 'three';
2nd Edit: Here's what happens with your string once it compiles into a RegEx object.
/(['"]?)((?:\\\1|.)+?)\1(,|$)/g
The regex is better represented as a regex literal:
var regex = /(['"]?)((?:\\\1|.)+?)\1(,|$)/g;
Backslashes are used to escape special characters. For example, if your regex needs to match a literal period, writing . will not work, since . matches any character: you need to "escape" the period with a backslash: \..
Backslashes that are not themselves part of an escape sequence must be escaped, so if you want to match just a backslash in the text, you must escape it with a backslash: \\.
Your regular expression is so complicated when passed into the RegExp constructor because you are representing the above regular expression as a string, which adds another "layer" of escaping. Thus, every single backslash must be escaped by yet another backslash, and because the string is enclosed in single quotes, your single quote must also be escaped with a backslash:
var regex = new RegExp('([\'"]?)((?:\\\\\\1|.)+?)\\1(,|$)', 'g'),
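As for the "mess" with m = 'batman, robin', a sketch like the one below shows what the two exec calls return. Each call yields an array of the full match plus the three groups, and the g flag makes the second call continue where the first one stopped.

```javascript
var regex = /(['"]?)((?:\\\1|.)+?)\1(,|$)/g;
var m = 'batman, robin';

var key = regex.exec(m); // first call starts at index 0
var val = regex.exec(m); // second call resumes after the comma

console.log(key[0]); // batman,
console.log(key[2]); // batman
console.log(val[2]); // " robin" (note the leading space)
```

Group 1 is the optional quote (empty here), group 2 is the value, and group 3 is the comma or end of string that terminated it.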

what does `\\s` mean in regular expression

I have a problem with regular expression:
var regex = new RegExp('(^|\\s)' + clsName + '(\\s|$)');
What does (^|\\s) mean? Isn't it equal to (^|\s), what does (^|) mean?
Am I right, it means that the string should start with any letter or white space? I tried to test with browser and console.log but still can't get any solution.
In all tutorials \s is used to be a space pattern not \\s.
Ok i got it, the problem was:
When using the RegExp constructor: for each backslash in your regular expression, you have to type \\ in the RegExp constructor. (In JavaScript strings, \\ represents a single backslash!) For example, the following regular expressions match all leading and trailing whitespaces (\s); note that \\s is passed as part of the first argument of the RegExp constructor:
re = /^\s+|\s+$/g
re = new RegExp('^\\s+|\\s+$','g')
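A short console check makes the difference between \s and \\s inside a string visible. The padded sample string is an assumption.

```javascript
// '\s' is not a recognized string escape, so the backslash is dropped.
var singleIsPlainS = '\s' === 's';
// '\\s' is a two-character string: a backslash followed by s.
var doubleLength = '\\s'.length;

console.log(singleIsPlainS); // true
console.log(doubleLength);   // 2

var fromLiteral = /^\s+|\s+$/g;
var fromString = new RegExp('^\\s+|\\s+$', 'g');

var sameSource = fromLiteral.source === fromString.source;
var trimmed = '  padded  '.replace(fromString, '');

console.log(sameSource); // true
console.log(trimmed);    // padded
```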
(^|\\s) means: start of the string (^) OR (|) a whitespace character (\\s in the string, which becomes \s in the pattern).
If clsName is "abc", for example, it builds the pattern (^|\s)abc(\s|$). That searches for "abc" at the start, middle, or end of the string, delimited by whitespace or the string boundaries, so these are valid:
"abc"
"abc x"
"x abc"
"x abc y"
Note that here you are using a string to build a RegExp. JavaScript ignores escape characters it doesn't know - '\s' would be the same as 's', which isn't right.
Another option is to use word boundaries, but that might fail in some cases (for example, searching for btn would also match btn-primary):
var regex = new RegExp('\\b' + clsName + '\\b');
I'd also warn that clsName might contain regex meta-characters, so you may want to escape it.
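Putting those points together, a class check could be sketched roughly as follows. The names escapeRegExp and hasClass are hypothetical helpers, and the class strings are assumed samples.

```javascript
// Hypothetical helper: escape regex metacharacters in the class name.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true if clsName appears as a whole whitespace-delimited token.
function hasClass(classAttribute, clsName) {
  var regex = new RegExp('(^|\\s)' + escapeRegExp(clsName) + '(\\s|$)');
  return regex.test(classAttribute);
}

var wholeToken = hasClass('x btn y', 'btn');
var partialToken = hasClass('btn-primary large', 'btn');
var wordBoundary = new RegExp('\\bbtn\\b').test('btn-primary');

console.log(wholeToken);   // true
console.log(partialToken); // false
console.log(wordBoundary); // true, which is the false positive mentioned above
```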
Why not just split the string on " "?
var string = 'abc defh ij klm';
var elements = string.split(' ');
var clsName = 'abc';
elements.filter(function (el) {
return el === clsName;
});
No need for a RegEx like the one you posted.

How do I ignore $1 replace backreferencing in javascript

I have a string that a user can edit at any time, and a regex that is run against it in order to add it to an XML document and then save it. The user can put '$1' in the string. I just want the text '$1' to be saved, but I have to perform a regular expression replace with the same string that $1 is in. It replaces the $1 with a character from the regex match every time.
How do I find, and replace, the $1 in this string?
Example of what is happening:
string1 = '<item id="1">i have $100</item>'
regexp = new RegExp('<item id="1"([^<]|<[^\/]|<\/[^i]|<\/i[^t]|<\/it[^e]|<\/ite[^m]|<\/item[^>])*<\/item>');
data = '<data><item id="1">i have no money</item><item id="2">i have no money</item></data>'
data = data.replace(regexp, string1);
Results
<data><item id="1">i have >00</item><item id="2">i have no money</item></data>
If you have a variable string that you want to put in your replace() call which might possibly have $N's in it, you can prevent the $N from being treated as a backreference by replacing $ with $$. Apparently, unlike special characters in a JS regex pattern, the $ character in a replacement string cannot be escaped with a \ - it must be escaped with a preceding $ (go figure).
In your example, you could do the following to fix the issue:
data = data.replace(regexp, string1.replace(/\$/g, '$$$$'));
This turns every $ in string1 into $$ (the inner replacement string '$$$$' itself produces a literal $$), preventing them from being treated as backreferences.
(Note: I found this little nugget here)
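A minimal sketch of the $$ escape, using a simplified pattern with one capturing group in place of the one from the question (that simplification is an assumption). It also shows a different technique, a function replacement, whose return value is inserted without any $ processing.

```javascript
var string1 = '<item id="1">i have $100</item>';
// Simplified stand-in for the question's pattern (assumption).
var regexp = /<item id="1">([^<]*)<\/item>/;
var data = '<data><item id="1">i have no money</item><item id="2">i have no money</item></data>';

// Naive use: $1 inside string1 is expanded, so the literal text is lost.
var naive = data.replace(regexp, string1);

// Escape every $ as $$ before using the string as a replacement.
var escaped = data.replace(regexp, string1.replace(/\$/g, '$$$$'));

// Alternative technique: a function's return value is used verbatim.
var viaFunction = data.replace(regexp, function () { return string1; });

console.log(escaped);
console.log(escaped === viaFunction); // true
```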
This should only happen if you have a capturing group in the regex.
If you don't want your groups to capture, then place ?: inside the start of the group.
/foo(?:bar)/
You can escape the $ by doubling it. Eg:
var replacement = '<item id="1">i have $$100</item>';
Useful when you have capturing groups and need to write a literal $.

regex to parse string with escaped characters

I am reading information out of a formatted string.
The format looks like this:
"foo:bar:beer:123::lol"
Everything between the ":" is data I want to extract with regex. If a : is followed by another : (like "::") the data for this has to be "" (an empty string).
Currently I am parsing it with this regex:
(.*?)(:|$)
Now it came to my mind that ":" may exist within the data, as well. So it has to be escaped.
Example:
"foo:bar:beer:\::1337"
How can I change my regular expression so that it matches the "\:" as data, too?
Edit: I am using JavaScript as programming language. It seems to have some limitations regarding complex regular expressions. The solution should work in JavaScript, as well.
Thanks,
McFarlane
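var myregexp = /((?:\\.|[^\\:])*)(?::|$)/g;
var matches = [];
var match = myregexp.exec(subject);
while (match != null) {
  // match[1] is the captured field (without the delimiter)
  matches.push(match[1]);
  // an empty match does not advance lastIndex; step past it to avoid looping forever
  if (match[0] === '') {
    myregexp.lastIndex++;
  }
  match = myregexp.exec(subject);
}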
Input: "foo:bar:beer:\\:::1337"
Output: ["foo", "bar", "beer", "\\:", "", "1337", ""]
You'll always get an empty string as the last match. This is unavoidable given the requirement that you also want empty strings to match between delimiters (and the lack of lookbehind assertions in JavaScript at the time of writing).
Explanation:
( # Match and capture:
(?: # Either match...
\\. # an escaped character
| # or
[^\\:] # any character except backslash or colon
)* # zero or more times
) # End of capturing group
(?::|$) # Match (but don't capture) a colon or end-of-string
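The same pattern can also be driven by String.prototype.matchAll (ES2020), which steps past empty matches on its own. This is a variant sketch rather than the code from the answer, and the function name splitEscaped is an assumption.

```javascript
// Hypothetical helper: collect group 1 of every match of the pattern explained above.
function splitEscaped(subject) {
  var pattern = /((?:\\.|[^\\:])*)(?::|$)/g;
  return Array.from(subject.matchAll(pattern), function (m) {
    return m[1];
  });
}

// The string literal below has the value foo:bar:beer:\:::1337
var fields = splitEscaped('foo:bar:beer:\\:::1337');
console.log(fields); // [ 'foo', 'bar', 'beer', '\\:', '', '1337', '' ]
```

The trailing empty string is the final match at the end of the input, as described above.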
Here's a solution:
function tokenize(str) {
var reg = /((\\.|[^\\:])*)/g;
var array = [];
while(reg.lastIndex < str.length) {
var match = reg.exec(str);
array.push(match[0].replace(/\\(\\|:)/g, "$1"));
reg.lastIndex++;
}
return array;
}
It splits a string into tokens at each : character.
You can escape the : character with \ if you want it to be part of a token.
You can escape the \ with \ if you want it to be part of a token.
Any other \ won't be interpreted (i.e. \a remains \a).
So you can put any data in your tokens, provided that the data is correctly escaped beforehand.
Here is an example with the string \a:b:\n::\\:\::x, which should give these tokens: \a, b, \n, <empty string>, \, :, x.
>>> tokenize("\\a:b:\\n::\\\\:\\::x");
["\a", "b", "\n", "", "\", ":", "x"]
In an attempt to be clearer: the string put into the tokenizer will be interpreted, and it has 2 special characters: \ and :
\ has a special meaning only if followed by \ or :, and will effectively "escape" these characters, meaning that they lose their special meaning for the tokenizer and are treated as normal characters (and thus become part of tokens).
: is the marker separating 2 tokens.
I realize the OP didn't ask for slash escaping, but other viewers could need a complete parsing library allowing any character in data.
Use a negative lookbehind assertion.
(.*?)((?<!\\):|$)
This will only match : if it's not preceded by \. Lookbehind requires an engine that supports ES2018 or later.
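In an engine with lookbehind support, the same idea can be used directly with split. A small sketch, using the example string from the question plus one assumed input with an empty field:

```javascript
// Split on every colon that is not immediately preceded by a backslash.
var unescapedColon = /(?<!\\):/;

// The string literal below has the value foo:bar:beer:\::1337
var tokens = 'foo:bar:beer:\\::1337'.split(unescapedColon);
console.log(tokens); // [ 'foo', 'bar', 'beer', '\\:', '1337' ]

var withEmpty = 'a::b'.split(unescapedColon);
console.log(withEmpty); // [ 'a', '', 'b' ]
```

One limitation of this approach: an escaped backslash directly before a colon (\\:) is still treated as an escaped colon, so it does not handle backslash escaping in the data.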
