A simple javascript regex backreference - javascript

I have the following string that I am attempting to match:
REQS_HOME->31
The following Javascript code is attempting to match this:
pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\d+)';
parsedResult = pathCookieValue.match(pathRegExPattern);
cookieToDelete = docType + '_ScrollPos_' + $3;
alert(parsedResult); // output - null
Assume the following:
docTypeHome = "REQS_HOME"
pathCookieValue = "REQS_HOME->31"
Firstly, am I calling my match function properly? And secondly, how do I access the digits matched by the third group, which I am trying to reach with the backreference $3?
I need to extract the value 31.

Your digit-matching part needs to double-up on the backslashes:
pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\\d+)';
When you build up a regular expression from string parts, the string syntax itself will "eat" a backslash. Thus, the regex you were winding up with was just d+, without the backslash.
The "31" (or whatever the number ends up being) will be in parsedResult[3]. Note that it will be a string, so if you need a number you'll want to convert it first, for example with Number() or parseInt().
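Here is a minimal sketch of the corrected code, using the sample values from the question. Wrapping the string in new RegExp() is optional, because match() converts a string argument into a RegExp anyway. It is written out here only to make that conversion visible.

```javascript
// Sample values taken from the question.
const docTypeHome = "REQS_HOME";
const pathCookieValue = "REQS_HOME->31";

// "\\d" in a string literal yields the two characters \d that the regex engine needs.
const pathRegExPattern = "(" + docTypeHome + ")" + "(->)" + "(\\d+)";
const parsedResult = pathCookieValue.match(new RegExp(pathRegExPattern));

// parsedResult[3] is the text of the third group; Number() turns it into a number.
const scrollPos = parsedResult ? Number(parsedResult[3]) : null;
console.log(parsedResult && parsedResult[3]); // 31 (a string)
console.log(scrollPos); // 31 (a number)
```

If docTypeHome could ever contain regex metacharacters such as . or +, those characters would need escaping before the pattern is built.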

Related

How to convert string with concatenation to real string in javascript?

I have a string which looks like this on a page response (saved as autoResponse):
... hexMD5('\262' + '****' + '\155\135\053\325\374\315\264\062\232\354\242\205\217\034\154\005'); ...
In order to capture this, I use:
var hex = autoResponse.split('hexMD5(')[1].split(')')[0];
This now gives me this string:
'\262' + '****' + '\155\135\053\325\374\315\264\062\232\354\242\205\217\034\154\005'
If I put this directly into the hexMD5() method, it treats the ' and + symbols and the white space as part of the secret.
I tried to use replace() to remove them like so:
while(hex.split("'").length !== 1) hex = hex.replace("'", "");
while(hex.split("+").length !== 1) hex = hex.replace("+", "");
while(hex.split(" ").length !== 1) hex = hex.replace(" ", "");
However, when I then call hexMD5(hex), it gives me an incorrect hex. Is there any way I can convert hex to a string that combines the pieces as if I had hardcoded it like this:
hexMD5('\262' + '****' + '\155\135\053\325\374\315\264\062\232\354\242\205\217\034\154\005');
Any help would be appreciated.
You can use a single, much simpler RegExp for this:
hex = hex.replace(/' ?\+ ?'/g, '');
That matches a single quote, an optional space, a plus sign, another optional space and another single quote, and replaces each such match with nothing, removing the joins between the pieces. (You need the \ before the + because + is a special character in regexes and must be escaped.)
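One caveat, stated as an assumption: if the captured text contains the escapes as literal backslash-and-digit characters (likely when it comes from a page response), removing the joins still leaves the outer quotes and raw sequences like \262. The result then differs from what the hardcoded literal evaluates to. The sketch below joins the pieces and also decodes the octal escapes. It assumes single-quoted pieces only, no quotes or backslashes escaped inside them, and no escape types other than octal. The helper name joinQuotedPieces is hypothetical.

```javascript
// Hypothetical helper: rebuilds the string that the concatenated literals evaluate to.
// Assumes only single-quoted pieces joined by + and only octal escapes such as \262.
function joinQuotedPieces(source) {
  // Collect each '...' piece and drop its surrounding quotes.
  const pieces = (source.match(/'[^']*'/g) || []).map((p) => p.slice(1, -1));
  // Decode octal escapes (\0 up to \377) into the characters they stand for.
  return pieces
    .join("")
    .replace(/\\([0-3]?[0-7]{1,2})/g, (_, oct) => String.fromCharCode(parseInt(oct, 8)));
}

// The captured text, with real backslashes as they would appear in the response.
const captured = "'\\262' + '****' + '\\155\\135\\053'";
console.log(joinQuotedPieces(captured));
```

With the hex variable from the question (taken before any quote stripping), the intended call would be hexMD5(joinQuotedPieces(hex)). Whether that reproduces the expected hash depends on the assumptions above holding for the real response.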

How to write a regular expression that only matches strings or variables that starts with +?

I am trying to write a regular expression that matches a pattern only if it contains a string or variable name that has the + sign before it.
So, for example, using this text as input:
+ "string1" + variable1 +"string2" + variable2
I want the regex to match and return the text. But if the text is in one of these forms:
+ "string1" + variable1 error + variable2
+ "string1" + variable1 +
(in the first case a string or variable has no + before it; in the second, nothing follows the last +), I want the regular expression not to match the text, and so return nothing (null).
I have written this so far:
/^ *\+ *("(?:[^"]*)"|(?:[a-zA-Z]\w*)) *(.*)$/
However, it matches whenever the text starts with +"string" or + variable, even if the part after that has errors.
You can require + "string" or + var substrings one or more times by enclosing that pattern in a capturing group and applying the + quantifier to the group, anchored with ^ and $ so the whole input must consist of such units:
^(\+\s*(?:"[^"]+"|[a-zA-Z]\w*)\s*)+$
If you test several inputs at once, one per line, add the m modifier so that ^ and $ match at the start and end of each line rather than of the whole string.
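Here is a small check of the answer's pattern against the three inputs from the question. The wrapper matchConcatenation is hypothetical. It returns the whole text on a match and null otherwise, which is the behaviour the question asks for.

```javascript
// Pattern from the answer: one or more "+ operand" units, anchored at both ends.
const concatPattern = /^(\+\s*(?:"[^"]+"|[a-zA-Z]\w*)\s*)+$/;

// Hypothetical wrapper: the whole text if every part is "+ string" or "+ name", else null.
function matchConcatenation(text) {
  const m = text.match(concatPattern);
  return m ? m[0] : null;
}

console.log(matchConcatenation('+ "string1" + variable1 +"string2" + variable2')); // the whole text
console.log(matchConcatenation('+ "string1" + variable1 error + variable2')); // null
console.log(matchConcatenation('+ "string1" + variable1 +')); // null
```

Because of "[^"]+", an empty string literal such as + "" is rejected. Changing it to "[^"]*" would accept empty strings.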

Javascript nested square brackets in string

I am looking for an easier (and less hacky) way to get the substring inside matching square brackets in a string. For example, let's say this is the string:
[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ
I want the substring:
ABC[D][E[FG]]HIJK[LMN]
Right now, I am looping through the string and counting the opening and closing brackets, and when those counts are equal, I take the substring between the first opening bracket and the last closing bracket.
Is there an easier way to do this (i.e. with regex), so that I don't need to loop through every character?
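For reference, here is a minimal sketch of the counting loop the question describes. It assumes square brackets are the only nesting characters and that the wanted group is the first top-level one. The function name firstBracketedContent is hypothetical.

```javascript
// Hypothetical helper: one pass, tracking bracket depth, returning the inside of
// the first balanced top-level [...] group (without its outer brackets).
function firstBracketedContent(str) {
  let depth = 0;
  let start = -1;
  for (let i = 0; i < str.length; i++) {
    if (str[i] === "[") {
      if (depth === 0) start = i; // remember where the outermost group opens
      depth++;
    } else if (str[i] === "]" && depth > 0) {
      depth--;
      if (depth === 0) return str.slice(start + 1, i); // drop the outer brackets
    }
  }
  return null; // no balanced group found
}

console.log(firstBracketedContent("[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ")); // ABC[D][E[FG]]HIJK[LMN]
```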
Here's another approach, an ugly hack which turns the input into a JS array representation and then parses it using JSON.parse:
function parse(str) {
return JSON.parse('[' +
str.split('') . join(',') . // insert commas
replace(/\[,/g, '[') . // clean up leading commas
replace(/,]/g, ']') . // clean up trailing commas
replace(/\w/g, '"$&"') // quote strings
+ ']');
}
>> parse('A[B]C')
<< ["A", ["B"], "C"]
Now a stringifier to turn arrays back into the bracketed form:
function stringify(array) {
return Array.isArray(array) ? '[' + array.map(stringify).join('') + ']' : array;
}
Now your problem can be solved as follows (stringify keeps the group's outer brackets, which .slice(1, -1) strips):
stringify(parse("[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ")[0]).slice(1, -1)
Not sure if I get the question right (sorry about that).
So you mean that, given a string of characters X, you would like to check whether the string Y is contained within X, where Y is ABC[D][E[FG]]HIJK[LMN]?
If so then you could simply do:
var str = "[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ";
var res = str.match(/ABC\[D]\[E\[FG]]HIJK\[LMN]/);
The above returns a match whose first element is the literal string Y, since Y occurs inside str.
Note that each [ is escaped with a \. In a regex, square brackets with letters between them (e.g. [asd]) form a character set, which matches any single one of the listed characters.
You can test the regex here:
https://regex101.com/r/zK3vZ3/1
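Here is a quick illustration of the escaping point, using a made-up sample string alongside the regex from this answer.

```javascript
const sample = "x [asd] y";
// Unescaped: [asd] is a character set, so this finds the first single a, s or d.
console.log(sample.match(/[asd]/)[0]); // a
// Escaped: \[ makes the bracket literal, so the whole "[asd]" text is matched.
console.log(sample.match(/\[asd]/)[0]); // [asd]

// The answer's regex: match() returns an array whose first element is the matched text.
const str = "[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ";
console.log(str.match(/ABC\[D]\[E\[FG]]HIJK\[LMN]/)[0]); // ABC[D][E[FG]]HIJK[LMN]
```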
I think the problem is to get all characters from an opening square bracket up to the corresponding closing square bracket. Balancing groups (a .NET regex feature) are not available in JavaScript, but there is a workaround: we can nest several optional groups between these square brackets.
The following regex matches up to 3 nested [...] groups, and you can nest more non-capturing groups to support deeper levels:
\[[^\]\[]*(?:
\[[^\]\[]*(?:
\[[^\]\[]*(?:\[[^\]\[]*\])*\]
)*[^\]\[]*
\][^\]\[]*
)*[^\]\[]*
\]
However, performance may suffer with this much backtracking.
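JavaScript has no free-spacing mode, so the multi-line layout above has to become a single regex literal. Here is a sketch of that, with the g flag added (an assumption, to collect every top-level group), run on the question's string.

```javascript
// The multi-line pattern above joined into one regex literal, with the g flag.
const nested = /\[[^\]\[]*(?:\[[^\]\[]*(?:\[[^\]\[]*(?:\[[^\]\[]*\])*\])*[^\]\[]*\][^\]\[]*)*[^\]\[]*\]/g;

const str = "[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ";
const groups = str.match(nested);
console.log(groups); // two matches: the outer group and [STUVW]
// Stripping the outer brackets of the first match gives the requested substring.
console.log(groups[0].slice(1, -1)); // ABC[D][E[FG]]HIJK[LMN]
```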
UPDATE
Use XRegExp:
var str = '[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ';
// First match:
var res = XRegExp.matchRecursive(str, '\\[', ']');
document.body.innerHTML = "Getting the first match:<br/><pre>" + JSON.stringify(res, 0, 4) + "</pre><br/>And now, multiple matches (add \"g\" modifier when defining the XRegExp)";
// Multiple matches:
res = XRegExp.matchRecursive(str, '\\[', ']', 'g');
document.body.innerHTML += "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>

Check if string contains a regex & no js

I have a string, and I need to make sure that it contains only a regular expression and no JavaScript, because I'm creating a new script from the string and a JavaScript snippet would be a security risk.
Exact scenario:
JS in a Mozilla add-on loads its configuration as JSON through an HTTP request (the JSON contains {"something": "^(?:http|https)://(?:.*)"})
JS creates a PAC file (proxy configuration script) that uses the "something" regex from the configuration
Any ideas how to escape the string without destroying the regex in it?
It seems that most of the standard JavaScript functionality is available (source), so you can just do:
try {
RegExp(json.something+'');
pacFile += 'RegExp(' + JSON.stringify(json.something+'') + ')';
} catch(e) {/*handle invalid regexp*/}
And you don't need to worry, because RegExp("console.log('test')") only produces the valid regexp /console.log('test')/ and executes nothing.
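Here is the same idea wrapped in a small function, as a sketch. The name toPacRegExpSource is hypothetical, and the sketch assumes the generated text ends up evaluated as ordinary JavaScript inside the PAC file. Invalid patterns are rejected rather than emitted.

```javascript
// Hypothetical helper: validates a pattern and returns PAC source text that rebuilds it.
function toPacRegExpSource(pattern) {
  const text = pattern + "";
  try {
    RegExp(text); // throws a SyntaxError if the text is not a valid regex
  } catch (e) {
    return null; // invalid pattern: emit nothing
  }
  // JSON.stringify quotes and escapes the text, so it can only form a string literal.
  return "RegExp(" + JSON.stringify(text) + ")";
}

console.log(toPacRegExpSource("^(?:http|https)://(?:.*)")); // RegExp("^(?:http|https)://(?:.*)")
console.log(toPacRegExpSource("console.log('test')")); // RegExp("console.log('test')")
console.log(toPacRegExpSource("(")); // null
```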
You can use a regular expression to pull apart a JavaScript regular expression.
Then you should convert the regex to a lexically simpler subset of JavaScript that avoids all the non-context-free weirdness about what / means, and any irregularities in the input regex.
var REGEXP_PARTS = "(?:"
// A regular character
+ "[^/\r\n\u2028\u2029\\[\\\\]"
// An escaped character, charset reference or backreference
+ "|\\\\[^\r\n\u2028\u2029]"
// A character set
+ "|\\[(?!\\])(?:[^\\]\\\\]|\\\\[^\r\n\u2028\u2029])+\\]"
+ ")";
var REGEXP_REGEXP = new RegExp(
// A regex starts with a slash
"^[/]"
// It cannot be lexically ambiguous with a line or block comment
+ "(?![*/])"
// Capture the body in group 1
+ "(" + REGEXP_PARTS + "+)"
// The body is terminated by a slash
+ "[/]"
// Capture the flags in group 2
+ "([gmi]{0,3})$");
var match = myString.match(REGEXP_REGEXP);
if (match) {
var ctorExpression =
"(new RegExp("
// JSON.stringify escapes special chars in the body, so will
// preserve token boundaries.
+ JSON.stringify(match[1])
+ "," + JSON.stringify(match[2])
+ "))";
alert(ctorExpression);
}
which will result in an expression that is in a well-understood subset of JavaScript.
The complex regex above is not part of the TCB (trusted computing base). The only part that must function correctly for security to hold is the construction of ctorExpression, including the use of JSON.stringify.

Javascript RegEx non-capturing prefix

I am trying to do some string replacement with RegEx in JavaScript. The scenario is a single-line string containing a long comma-delimited list of numbers, in which duplicates are possible.
An example string is: 272,2725,2726,272,2727,297,272 (the string may or may not end in a comma)
In this example, I am trying to match each occurrence of the whole number 272. (3 matches expected)
The example regex I'm trying to use is: (?:^|,)272(?=$|,)
The problem I am having is that the second and third matches are including the leading comma, which I do not want. I am confused because I thought (?:^|,) would match, but not capture. Can someone shed light on this for me? An interesting bit is that the trailing comma is excluded from the result, which is what I want.
For what it is worth, if I were using C# there is syntax for prefix matching that does what I want: (?<=^|,)
However, it appears to be unsupported in JavaScript.
Lastly, I know I could workaround it using string splitting, array manipulation and rejoining, but I want to learn.
Use word boundaries instead:
\b272\b
ensures that 272 matches only as a whole number, not as the start of 2725.
(?:...) matches and doesn't capture - but whatever it matches will be part of the overall match.
A lookaround assertion like (?=...) is different: It only checks if it is possible (or impossible) to match the enclosed regex at the current point, but it doesn't add to the overall match.
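The difference is easy to see on the question's string. The last line assumes an engine with ES2018 lookbehind support. Engines without it throw a SyntaxError when parsing that regex literal.

```javascript
const list = "272,2725,2726,272,2727,297,272";

// (?:^|,) is consumed, so the leading comma becomes part of the match.
console.log(list.match(/(?:^|,)272(?=$|,)/g)); // [ '272', ',272', ',272' ]

// Word boundaries consume nothing, so only the digits are matched.
console.log(list.match(/\b272\b/g)); // [ '272', '272', '272' ]
console.log(list.replace(/\b272\b/g, "X")); // X,2725,2726,X,2727,297,X

// With ES2018 lookbehind, the C# pattern from the question works as written.
console.log(list.match(/(?<=^|,)272(?=$|,)/g)); // [ '272', '272', '272' ]
```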
Here is a way to create a JavaScript lookbehind that has worked in all the cases I needed.
This is an example; one can do many more complex and flexible things. The main point here is that, in some cases, it is possible to create a RegExp non-capturing prefix (lookbehind) construct in JavaScript.
This example is designed to extract fields surrounded by braces '{...}' that consist of a single character repeated exactly five times. The braces are not returned with the field.
This is just an example to show the idea at work, not necessarily a prelude to an application.
function testGetSingleRepeatedCharacterInBraces()
{
var leadingHtmlSpaces = ' ' ;
// The '(?:\b|\B(?={))' acts as a prefix non-capturing group.
// That is, this works (?:\b|\B(?=WhateverYouLike))
var regex = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g ;
var string = '' ;
string = 'Message has no fields' ;
document.write( 'String => "' + string
+ '"<br>' + leadingHtmlSpaces + 'fields => '
+ getMatchingFields( string, regex )
+ '<br>' ) ;
string = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}' ;
document.write( 'String => "' + string
+ '"<br>' + leadingHtmlSpaces + 'fields => '
+ getMatchingFields( string, regex )
+ '<br>' ) ;
} ;
function getMatchingFields( stringToSearch, regex )
{
var matches = stringToSearch.match( regex ) ;
return matches ? matches : [] ;
} ;
Output:
String => "Message has no fields"
fields =>
String => "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}"
fields => LLLLL,11111,22222,EEEEE,_____,55555
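A hedged observation on the example above: in (?:\b|\B(?={)), the second alternative requires the next character to be {, while the group after it must start with a word character. That alternative can therefore never succeed, and the prefix behaves like a plain \b. It works on the sample because { is a non-word character, but it also accepts a field with no opening brace. In engines with ES2018 lookbehind, (?<=\{) checks for the brace directly. Here is a small comparison. The input "x LLLLL}" is made up for illustration.

```javascript
// The answer's regex and a variant using a real lookbehind for the opening brace.
const answerRegex = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g;
const lookbehindRegex = /(?<=\{)(([0-9a-zA-Z_])\2{4})(?=\})/g;

// Made-up input: the field has a closing brace but no opening one.
const noOpeningBrace = "x LLLLL}";
console.log(noOpeningBrace.match(answerRegex)); // [ 'LLLLL' ]
console.log(noOpeningBrace.match(lookbehindRegex)); // null

// On the answer's sample string, the lookbehind variant yields the same six fields.
const sample = "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}";
console.log(sample.match(lookbehindRegex).join(",")); // LLLLL,11111,22222,EEEEE,_____,55555
```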
