I have a string, and I need to make sure that it contains only a regular expression and no JavaScript. I am generating a new script from the string, so an embedded JavaScript snippet would be a security risk.
Exact scenario:
JS in a Mozilla addon loads its configuration as JSON through an HTTP request (the JSON contains {"something": "^(?:http|https)://(?:.*)"}).
The JS then creates a PAC file (proxy configuration script) that uses the "something" regex from the configuration.
Any ideas how to escape the string without destroying the regex in it?
It seems that most of the standard JavaScript functionality is available in PAC files (source), so you can just do:
try {
    RegExp(json.something+'');
    pacFile += 'RegExp(' + JSON.stringify(json.something+'') + ')';
} catch(e) {/*handle invalid regexp*/}
There is no need to worry, because RegExp("console.log('test')") only produces the valid regexp /console.log('test')/ and executes nothing.
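As a rough sketch of that idea, the validation and embedding steps can be wrapped in one helper. The name buildPacRegexExpression is hypothetical and not part of any addon API; the only assumption is that the pattern arrives as a string.

```javascript
// Hypothetical helper: returns JavaScript source text that recreates the
// given pattern as a RegExp, or null if the pattern is not a valid regex.
function buildPacRegexExpression(pattern) {
  var source = String(pattern);
  try {
    new RegExp(source); // throws a SyntaxError on an invalid pattern
  } catch (e) {
    return null;
  }
  // JSON.stringify emits a quoted string literal, so the pattern is only
  // ever parsed as string data, never as code.
  return 'new RegExp(' + JSON.stringify(source) + ')';
}

var expr = buildPacRegexExpression('^(?:http|https)://(?:.*)');
// expr is the text: new RegExp("^(?:http|https)://(?:.*)")
```

One caveat: engines older than ES2019 treat U+2028 and U+2029 as line terminators inside string literals, so patterns containing those characters may need to be rejected or escaped separately there.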
You can use a regular expression to pull apart a JavaScript regular expression.
Then you should convert the regex to a lexically simpler subset of JavaScript that avoids all the non-context-free weirdness about what / means, and any irregularities in the input regex.
var REGEXP_PARTS = "(?:"
    // A regular character
    + "[^/\r\n\u2028\u2029\\[\\\\]"
    // An escaped character, charset reference or backreference
    + "|\\\\[^\r\n\u2028\u2029]"
    // A character set
    + "|\\[(?!\\])(?:[^\\]\\\\]|\\\\[^\r\n\u2028\u2029])+\\]"
    + ")";
var REGEXP_REGEXP = new RegExp(
    // A regex starts with a slash
    "^[/]"
    // It cannot be lexically ambiguous with a line or block comment
    + "(?![*/])"
    // Capture the body in group 1
    + "(" + REGEXP_PARTS + "+)"
    // The body is terminated by a slash
    + "[/]"
    // Capture the flags in group 2
    + "([gmi]{0,3})$");
var match = myString.match(REGEXP_REGEXP);
if (match) {
    var ctorExpression =
        "(new RegExp("
        // JSON.stringify escapes special chars in the body, so will
        // preserve token boundaries.
        + JSON.stringify(match[1])
        + "," + JSON.stringify(match[2])
        + "))";
    alert(ctorExpression);
}
which will result in an expression that is in a well-understood subset of JavaScript.
The complex regex above is not in the trusted computing base (TCB). The only part that needs to function correctly for security to hold is the ctorExpression, including the use of JSON.stringify.
I have 3 font files, and I read their content in order to find matches for a font name. The font names are Wingdings, Wingdings 2, and Wingdings 3. When I search for the Wingdings font name, it matches all 3 files, but I need only the file that is associated with exactly that font name. I tried the indexOf method, but it didn't help. The only rational way seems to be a regular expression, but I cannot think of the right one. One more thing to mention is that I have to pass a parameter into the RegExp, something like
var regExp = new RegExp('\\^' + fontName + '$\\', 'g');
if (currentFileContent.search(regExp) !== -1) {...}
Any help will be greatly appreciated.
It seems you are trying to use regex delimiters in a RegExp constructor. You only need /.../ in the literal notation.
Note that you must not escape the start-of-string and end-of-string anchors: escaped, they lose their special meaning in the regex. Also, \\ matches a single \, but nothing can be matched after the end of string ($).
Also, you can use the RegExp#test() function to check if the string matches the pattern (avoid the g modifier with it: a global regex keeps state in lastIndex between test() calls):
var regExp = RegExp('^' + fontName + '$');
if (regExp.test(currentFileContent)) { ... }
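The reason to be careful with the g modifier is that a global regex remembers where its last match ended in lastIndex, so repeated test() calls on the same object can alternate between true and false. A minimal illustration (the variable names are made up for this sketch):

```javascript
var globalRe = /Wingdings/g;
var first = globalRe.test('Wingdings');  // true, lastIndex is now 9
var second = globalRe.test('Wingdings'); // false, the search resumed at index 9
var third = globalRe.test('Wingdings');  // true again, lastIndex was reset to 0

// Without the g flag every call starts from the beginning of the string
var plainRe = /Wingdings/;
var stable = plainRe.test('Wingdings') && plainRe.test('Wingdings'); // true
```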
If font names contain special characters, use escapeRegExp function from MDN:
function escapeRegExp(string){
    return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
And then
var regExp = RegExp('^' + escapeRegExp(fontName) + '$');
And the final note: if the font names appear inside a larger string, and you need to match Wingdings but not Wingdings3, use
var regExp = RegExp('\\b' + escapeRegExp(fontName) + '\\b');
The \b is a word boundary.
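To see what the word boundary does and does not cover, here is a small check. The sample strings are made up for illustration, and escapeRegExp is repeated so the snippet runs on its own.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var re = RegExp('\\b' + escapeRegExp('Wingdings') + '\\b');

var a = re.test('Font name: Wingdings.');  // true: "." is not a word character
var b = re.test('Font name: Wingdings3');  // false: no boundary between "s" and "3"
var c = re.test('Font name: Wingdings 3'); // true: the space creates a boundary
```

The last case shows that \b alone does not tell "Wingdings" apart from a name that continues after a space.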
UPDATE
To make sure you only match a font name that is not followed by optional whitespace and a digit, use a (?!\\s*\\d) lookahead when declaring a RegExp:
var fontName = "Wingding";
var contents = "Font name: Wingding, the other file: Font name: Wingding 2. And so forth. ";
var rExp = RegExp(fontName + '(?!\\s*\\d)');
if (rExp.test(contents)) {
    document.write(fontName + " was found in '<i>" + contents + "</i>'.");
}
I send a request to a remote service, and the service returns fields with patterns such as:
[a-zA-Zа-яА-ЯёЁ'+-]{1,100}
[0-9a-zA-Zа-яА-ЯёЁ'+-]{2,10}
The square brackets contain the allowed symbols.
The curly brackets contain the minimum and maximum number of symbols.
So I have fields and their patterns.
How can I validate entered data against an incoming pattern?
Send the string to the RegExp constructor and use test.
For example:
string = "[a-zA-Zа-яА-ЯёЁ'+-]{1,100}"
pattern = new RegExp(string)
alert(pattern.test("This works, привет, 123"));
alert(pattern.test("$☛☛"));
Depending on your situation, you might want to add "^" and "$" to the pattern.
You can use a JavaScript regular expression to solve this.
For example:
"some test".match(/[a-zA-Zа-яА-ЯёЁ'+-]{1,100}/)
which returns ["some"]
or
/[a-zA-Zа-яА-ЯёЁ'+-]{1,100}/.test("some test")
which returns true
A simple example:
var s = "hello123";
var r1 = "[a-zA-Zа-яА-ЯёЁ'+-]{1,100}"; // the pattern you were given
var reg1 = RegExp("^" + r1 + "$"); // the pattern enclosed in `^` `$`
var r2 = "[0-9a-zA-Zа-яА-ЯёЁ'+-]{2,10}";
var reg2 = RegExp("^" + r2 + "$");
alert(reg1.test(s)); // false
alert(reg2.test(s)); // true
The regular expression has the pattern you mentioned, but enclosed between ^ and $ - meaning "the whole expression". The first expression fails because there is a number in s which is not allowed. The second expression passes - it has only numbers and letters, and between 2 and 10 characters total.
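The anchoring step can be packaged as a small reusable function. This is only a sketch: matchesWholePattern is a hypothetical name, and it assumes the service's patterns are valid JavaScript regex syntax. Wrapping the pattern in (?:...) is an extra precaution so that the anchors still apply to the whole pattern if it ever contains a top-level alternation.

```javascript
// Hypothetical helper: true if `value` matches `pattern` from start to end.
function matchesWholePattern(value, pattern) {
  // Without the (?:...) wrapper, "^a|b$" would mean "starts with a OR ends with b".
  var re = new RegExp('^(?:' + pattern + ')$');
  return re.test(value);
}

var namePattern = "[a-zA-Zа-яА-ЯёЁ'+-]{1,100}";
var codePattern = "[0-9a-zA-Zа-яА-ЯёЁ'+-]{2,10}";

var r1 = matchesWholePattern('hello123', namePattern); // false: digits are not allowed
var r2 = matchesWholePattern('hello123', codePattern); // true
var r3 = matchesWholePattern('привет', namePattern);   // true
```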
I have the following string that I am attempting to match:
REQS_HOME->31
The following Javascript code is attempting to match this:
pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\d+)';
parsedResult = pathCookieValue.match(pathRegExPattern);
cookieToDelete = docType + '_ScrollPos_' + $3;
alert(parsedResult); // output - null
Assume the following:
docTypeHome = "REQS_HOME"
pathCookieValue = "REQS_HOME->31"
Firstly, I am not calling my match function properly. And secondly, how do I access the value where I am attempting to match the digit values using the backreference operator?
I need to extract the value 31.
Your digit-matching part needs to double-up on the backslashes:
pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\\d+)';
When you build up a regular expression from string parts, the string syntax itself will "eat" a backslash. Thus, the regex you were winding up with was just d+, without the backslash.
The "31" (or whatever the number ends up being) will be in parsedResult[3]. Note that it'll be a string, so if you need it to be a number you'll want to convert it first, via the Number constructor, or parseInt(), or whatever.
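Put together, the corrected match and the extraction of the third group might look like the sketch below. The variable scrollId is a hypothetical name introduced only for this example.

```javascript
var docTypeHome = 'REQS_HOME';
var pathCookieValue = 'REQS_HOME->31';

// In a string literal '\d' collapses to 'd', so this is the same string as '(d+)'
var broken = '(\d+)';

// '\\d' in the string literal becomes \d in the regex
var pathRegExPattern = '(' + docTypeHome + ')' + '(->)' + '(\\d+)';
var parsedResult = pathCookieValue.match(pathRegExPattern);

// Group 3 holds the digits as a string; convert it if a number is needed
var scrollId = parsedResult ? parseInt(parsedResult[3], 10) : null;
```

Checking parsedResult for null before indexing avoids a TypeError when the cookie value does not match.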
I am trying to do some string replacement with RegEx in JavaScript. The scenario is a single-line string containing a long comma-delimited list of numbers, in which duplicates are possible.
An example string is: 272,2725,2726,272,2727,297,272 (The string may or may not end in a comma.)
In this example, I am trying to match each occurrence of the whole number 272. (3 matches expected)
The example regex I'm trying to use is: (?:^|,)272(?=$|,)
The problem I am having is that the second and third matches are including the leading comma, which I do not want. I am confused because I thought (?:^|,) would match, but not capture. Can someone shed light on this for me? An interesting bit is that the trailing comma is excluded from the result, which is what I want.
For what it is worth, if I were using C# there is syntax for prefix matching that does what I want: (?<=^|,)
However, it appears to be unsupported in JavaScript.
Lastly, I know I could work around it using string splitting, array manipulation and rejoining, but I want to learn.
Use word boundaries instead:
\b272\b
ensures that only 272 matches, but not 2725.
(?:...) matches and doesn't capture - but whatever it matches will be part of the overall match.
A lookaround assertion like (?=...) is different: It only checks if it is possible (or impossible) to match the enclosed regex at the current point, but it doesn't add to the overall match.
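The difference is easy to see by running the patterns against the example string. The capture-group and replace variants at the end are an additional sketch, not part of the answer above; they keep the original prefix and either read the number from group 1 or put the prefix back with $1.

```javascript
var list = '272,2725,2726,272,2727,297,272';

// The non-capturing group still consumes the comma it matches
var withGroup = list.match(/(?:^|,)272(?=$|,)/g);
// withGroup: ['272', ',272', ',272']

// Word boundaries are zero-width, so only the digits are matched
var withBoundary = list.match(/\b272\b/g);
// withBoundary: ['272', '272', '272']

// Alternative: keep the prefix but read the number from a capture group
var re = /(?:^|,)(272)(?=$|,)/g;
var captured = [];
var m;
while ((m = re.exec(list)) !== null) {
  captured.push(m[1]);
}
// captured: ['272', '272', '272']

// For replacement, capture the prefix and re-insert it with $1
var replaced = list.replace(/(^|,)272(?=$|,)/g, '$1X');
// replaced: 'X,2725,2726,X,2727,297,X'
```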
Here is a way to create a JavaScript look behind that has worked in all cases I needed. This is an example. One can do many more complex and flexible things. The main point here is that in some cases, it is possible to create a RegExp non-capturing prefix (look behind) construct in JavaScript.
This example is designed to extract all fields that are surrounded by braces '{...}'. The braces are not returned with the field. This is just an example to show the idea at work, not necessarily a prelude to an application.
function testGetSingleRepeatedCharacterInBraces()
{
    var leadingHtmlSpaces = ' ' ;
    // The '(?:\b|\B(?={))' acts as a prefix non-capturing group.
    // That is, this works (?:\b|\B(?=WhateverYouLike))
    var regex = /(?:\b|\B(?={))(([0-9a-zA-Z_])\2{4})(?=})/g ;
    var string = '' ;
    string = 'Message has no fields' ;
    document.write( 'String => "' + string
        + '"<br>' + leadingHtmlSpaces + 'fields => '
        + getMatchingFields( string, regex )
        + '<br>' ) ;
    string = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}' ;
    document.write( 'String => "' + string
        + '"<br>' + leadingHtmlSpaces + 'fields => '
        + getMatchingFields( string, regex )
        + '<br>' ) ;
} ;
function getMatchingFields( stringToSearch, regex )
{
    var matches = stringToSearch.match( regex ) ;
    return matches ? matches : [] ;
} ;
Output:
String => "Message has no fields"
fields =>
String => "{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}"
fields => LLLLL,11111,22222,EEEEE,_____,55555
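For comparison, a different way to get the same output is to match the braces and read the field from a capture group. This is an alternative sketch, not the technique shown above; getBracedFields is a hypothetical name. Unlike the \b-based prefix, this pattern requires the opening brace to actually be present.

```javascript
// Hypothetical helper: returns every 5-character run of one repeated
// word character that is enclosed in braces, without the braces.
function getBracedFields(text) {
  var re = /\{(([0-9a-zA-Z_])\2{4})\}/g;
  var fields = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    fields.push(m[1]); // group 1 is the field without its braces
  }
  return fields;
}

var sample = '{LLLLL}Message {11111}{22222} {ffffff}abc def{EEEEE} {_____} {4444} {666666} {55555}';
var fields = getBracedFields(sample);
// fields: ['LLLLL', '11111', '22222', 'EEEEE', '_____', '55555']
```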
I'm trying to make a regexp that will match numbers, excluding numbers that are part of other words or numbers inside certain html tags. The part for matching numbers works well but I can't figure out how to find the numbers inside the html.
Current code:
//number regexp part
var prefix = '\\b()';//for future use
var baseNumber = '((\\+|-)?([\\d,]+)(?:(\\.)(\\d+))?)';
var SIBaseUnit = 'm|kg|s|A|K|mol|cd';
var SIPrefix = 'Y|Z|E|P|T|G|M|k|h|ia|d|c|m|µ|n|p|f|a|z|y';
var SIUnit = '(?:('+SIPrefix+')?('+SIBaseUnit+'))';
var generalSuffix = '(PM|AM|pm|am|in|ft)';
var suffix = '('+SIUnit+'|'+generalSuffix+')?\\b';
var number = '(' + prefix + baseNumber + suffix + ')';
//trying to make it match only when not within tags or inside excluded tags
var htmlBlackList = 'script|style|head'
var htmlStartTag = '<[^(' + htmlBlackList + ')]\\b[^>]*?>';
var reDecimal = new RegExp(htmlStartTag + '[^<]*?' + number + '[^>]*?<');
<script>
var htmlFragment = "<script>alert('hi')</script>";
var style = "<style>.foo { font-size: 14pt }</style>";
// ...
</script>
<!-- turn off this style for now
<style> ... </style>
-->
Good luck getting a regular expression to figure that out.
You're using JavaScript, so I'm guessing you're probably running in a browser. Which means you have access to the DOM, giving you access to the browser's very capable HTML parser. Use it.
The [^...] construct is a negated character class and only works on single characters, not on compound expressions like (script|style|head). What you want is a negative lookahead, (?!...):
var htmlStartTag = '<(?!(' + htmlBlackList + ')\\b)[^>]*?>';
(?! ... ) means 'not followed by ...' but [^ ... ] means 'a single character not in ...'.
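A quick comparison of the two constructs, using made-up tag strings. The lookahead variant here uses a non-capturing group, which is a small deviation from the line above that does not change what is matched.

```javascript
var htmlBlackList = 'script|style|head';

// Negated character class: excludes single characters, not whole words
var charClass = new RegExp('<[^(' + htmlBlackList + ')]\\b[^>]*?>');
// Negative lookahead: rejects a tag whose name is in the list
var lookahead = new RegExp('<(?!(?:' + htmlBlackList + ')\\b)[^>]*?>');

var c1 = charClass.test('<div>');    // false: "d" is one of the excluded characters
var l1 = lookahead.test('<div>');    // true
var l2 = lookahead.test('<script>'); // false
var l3 = lookahead.test('<style type="text/css">'); // false
```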
I'm trying to make a regexp that will match numbers, excluding numbers that are part of other words or numbers inside certain html tags.
Regex cannot parse HTML. Do not use regex to parse HTML. Do not pass Go. Do not collect £200.
To ‘only match something not-within something else’ you would need a negative lookbehind assertion (“(?<!”), but JavaScript regexps did not support lookbehind before ES2018, and most other regex implementations don't support the complex variable-length lookbehind you'd need to have any hope of matching a context like being inside a tag. Even if you did have variable-length lookbehind, that'd still not reliably parse HTML, because as previously mentioned many times every day, regex cannot parse HTML.
Use an HTML parser. A browser HTML parser will be able to digest even partial input without complaining.
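Once the browser has parsed the markup, the remaining work is a tree walk. The sketch below is one possible shape for it, under the assumption that the goal is to collect the text outside blacklisted elements and then run the number regex on each piece. collectText is a hypothetical name, and it relies only on the standard DOM node properties nodeType, nodeName, nodeValue and childNodes.

```javascript
// Sketch: collect text from a DOM subtree, skipping blacklisted elements.
function collectText(node, blacklist, out) {
  out = out || [];
  if (node.nodeType === 3) { // 3 = text node
    out.push(node.nodeValue);
    return out;
  }
  var isBlacklisted = node.nodeType === 1 && // 1 = element node
    blacklist.indexOf(node.nodeName.toLowerCase()) !== -1;
  if (!isBlacklisted) {
    // Comment nodes have no children, so their content is never collected
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      collectText(children[i], blacklist, out);
    }
  }
  return out;
}

// Assumed browser usage:
// var texts = collectText(document.body, ['script', 'style', 'head']);
// texts.forEach(function (t) { /* run the number regex on t */ });
```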