Prevent tainting properties of the RegExp constructor in JavaScript - javascript

This is a bit of a conundrum. I have an idea of how I might be able to fix it, but I'm wondering if there's a (much) easier way.
In short, whenever a regular expression is executed in JavaScript, certain properties are assigned values on the RegExp constructor. For instance:
/foo/.test('football')
//-> true
RegExp.input
//-> "football"
RegExp.rightContext
//-> "tball"
I'd like to execute a regular expression without affecting these properties. If that's not possible (and I don't think it is), I'd like to at least restore them to their previous values afterwards.
I know input/$_ is writeable, but most of the others aren't, it seems. One option might be to reconstruct a regular expression that would reapply all these values, but I think that would be quite difficult.
The reason I want this is because I'm writing a shim of a native API, and testing it using the test262 suite. The test262 suite fails on certain tests where it checks to see if the RegExp object has unexpected values for these properties.

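You can try to create a wrapper function for test (it lives on RegExp.prototype; note that delete is not guaranteed to reset these properties in every engine):
var fTest = RegExp.prototype.test;
RegExp.prototype.test = function() {
var bReturn = fTest.apply(this, arguments);
delete RegExp.input;
delete RegExp.rightContext;
return bReturn;
}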

This is the final result. It's a little more robust than my initial effort; it properly escapes sub-expressions, makes sure they appear in the right order and doesn't stop when it finds an empty one:
/**
* Constructs a regular expression to restore tainted RegExp properties
*/
function createRegExpRestore () {
var lm = RegExp.lastMatch,
ret = {
input: RegExp.input
},
esc = /[.?*+^$[\]\\(){}|-]/g,
reg = [],
cap = {};
// Create a snapshot of all the 'captured' properties
for (var i = 1; i <= 9; i++)
cap['$'+i] = RegExp['$'+i];
// Escape any special characters in the lastMatch string
lm = lm.replace(esc, '\\$&');
// Now, iterate over the captured snapshot
for (var i = 1; i <= 9; i++) {
var m = cap['$'+i];
// If it's empty, add an empty capturing group
if (!m)
lm = '()' + lm;
// Else find the escaped string in lm and wrap it to capture it
else
lm = lm.replace(m.replace(esc, '\\$&'), '($&)');
// Push to `reg` and chop `lm`
reg.push(lm.slice(0, lm.indexOf('(') + 1));
lm = lm.slice(lm.indexOf('(') + 1);
}
// Create the property-reconstructing regular expression
ret.exp = RegExp(reg.join('') + lm, RegExp.multiline ? 'm' : '');
return ret;
}
It does what I originally thought to be difficult. This should restore all the properties to their former values, if you use it like so:
var
// Create a 'restore point' for RegExp
old = createRegExpRestore(),
// Run your own regular expression
test = someOtherRegEx.test(someValue);
// Restore the previous values by running the RegExp
old.exp.test(old.input);
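A smaller variation on the same idea can be sketched as follows. It is not the code from the answer above: the helper names are hypothetical, it assumes the engine exposes the non-standard legacy properties RegExp.input, RegExp.lastMatch and RegExp.leftContext, and it deliberately ignores the $1 to $9 captures. It restores the remaining values by re-running a literal, sticky match at the recorded position.

```javascript
// Hypothetical helpers, not part of the answer above.
// They assume the legacy static RegExp properties are available.

// Record enough state to re-run an equivalent match later.
function snapshotRegExpStatics() {
  return {
    input: RegExp.input,
    lastMatch: RegExp.lastMatch,
    index: RegExp.leftContext.length // where the last match started
  };
}

// Re-run a literal match of lastMatch at the recorded position.
function restoreRegExpStatics(snap) {
  // Escape regex metacharacters so lastMatch is matched literally.
  const escaped = snap.lastMatch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const re = new RegExp(escaped, 'y'); // sticky: match only at lastIndex
  re.lastIndex = snap.index;
  return re.test(snap.input);
}

/foo/.test('football');
const snap = snapshotRegExpStatics();
/\d+/.test('abc 123');      // taints the static properties
restoreRegExpStatics(snap); // repeats the earlier match on 'football'
console.log(RegExp.input, RegExp.rightContext);
```

Because the capture groups are not reconstructed, this only covers cases where the $1 to $9 values do not matter.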


How to use Javascript to change link [duplicate]

I've got a data-123 string.
How can I remove data- from the string while leaving the 123?
var ret = "data-123".replace('data-','');
console.log(ret); //prints: 123
Docs.
For all occurrences to be discarded use:
var ret = "data-123".replace(/data-/g,'');
PS: The replace function returns a new string and leaves the original string unchanged, so use the function return value after the replace() call.
This doesn't have anything to do with jQuery. You can use the JavaScript replace function for this:
var str = "data-123";
str = str.replace("data-", "");
You can also pass a regex to this function. In the following example, it would remove everything except digits and dots:
str = str.replace(/[^0-9\.]+/g, "");
You can use "data-123".replace('data-','');, as mentioned, but as replace() only replaces the FIRST instance of the matching text, if your string was something like "data-123data-" then
"data-123data-".replace('data-','');
will only replace the first matching text. And your output will be "123data-"
So if you want all matches of text to be replaced in string you have to use a regular expression with the g flag like that:
"data-123data-".replace(/data-/g,'');
And your output will be "123"
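When the text to remove comes from a variable rather than a regex literal, it needs escaping before it is turned into a global regex. The following is a minimal sketch of that; the helper name removeAllLiteral is made up for illustration.

```javascript
// Hypothetical helper: remove every occurrence of a literal substring.
function removeAllLiteral(text, needle) {
  if (needle === '') return text; // nothing to remove; avoids an empty pattern
  // Escape regex metacharacters so the needle is matched literally.
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return text.replace(new RegExp(escaped, 'g'), '');
}

console.log('data-123data-'.replace('data-', ''));       // first occurrence only
console.log(removeAllLiteral('data-123data-', 'data-')); // every occurrence
console.log(removeAllLiteral('a.b.c', '.'));             // the dot is treated literally
```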
You can use slice(), if you will know in advance how many characters need slicing off the original string. It returns characters between a given start point to an end point.
string.slice(start, end);
Here are some examples showing how it works:
var mystr = ("data-123").slice(5); // This just defines a start point so the output is "123"
var mystr = ("data-123").slice(5,7); // This defines a start and an end so the output is "12"
Plain old JavaScript will suffice - jQuery is not necessary for such a simple task:
var myString = "data-123";
var myNewString = myString.replace("data-", "");
See: .replace() docs on MDN for additional information and usage.
1- If the sequence to remove is in your string:
let myString = "mytest-text";
let myNewString = myString.replace("mytest-", "");
the answer is text
2- If you want to remove the first 3 characters:
"mytest-text".substring(3);
the answer is est-text
Ex:-
var value="Data-123";
var removeData=value.replace("Data-","");
alert(removeData);
Hopefully this will work for you.
Performance
On 2021.01.14 I performed tests on macOS High Sierra 10.13.6 with Chrome v87, Safari v13.1.2 and Firefox v84 for the chosen solutions.
Results
For all browsers
solutions Ba, Cb, and Db are fast/fastest for long strings
solutions Ca, Da are fast/fastest for short strings
solutions Ab and E are slow for long strings
solutions Ba, Bb and F are slow for short strings
Details
I performed 2 test cases:
short string - 10 chars
long string - 1 000 000 chars
The snippet below presents the solutions Aa, Ab, Ba, Bb, Ca, Cb, Da, Db, E and F:
// https://stackoverflow.com/questions/10398931/how-to-strToRemove-text-from-a-string
// https://stackoverflow.com/a/10398941/860099
function Aa(str,strToRemove) {
return str.replace(strToRemove,'');
}
// https://stackoverflow.com/a/63362111/860099
function Ab(str,strToRemove) {
return str.replaceAll(strToRemove,'');
}
// https://stackoverflow.com/a/23539019/860099
function Ba(str,strToRemove) {
let re = strToRemove.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // regexp escape char
return str.replace(new RegExp(re),'');
}
// https://stackoverflow.com/a/63362111/860099
function Bb(str,strToRemove) {
let re = strToRemove.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // regexp escape char
return str.replaceAll(new RegExp(re,'g'),'');
}
// https://stackoverflow.com/a/27098801/860099
function Ca(str,strToRemove) {
let start = str.indexOf(strToRemove);
return str.slice(0,start) + str.slice(start+strToRemove.length, str.length);
}
// https://stackoverflow.com/a/27098801/860099
function Cb(str,strToRemove) {
let start = str.search(strToRemove);
return str.slice(0,start) + str.slice(start+strToRemove.length, str.length);
}
// https://stackoverflow.com/a/23181792/860099
function Da(str,strToRemove) {
let start = str.indexOf(strToRemove);
return str.substr(0, start) + str.substr(start + strToRemove.length);
}
// https://stackoverflow.com/a/23181792/860099
function Db(str,strToRemove) {
let start = str.search(strToRemove);
return str.substr(0, start) + str.substr(start + strToRemove.length);
}
// https://stackoverflow.com/a/49857431/860099
function E(str,strToRemove) {
return str.split(strToRemove).join('');
}
// https://stackoverflow.com/a/45406624/860099
function F(str,strToRemove) {
var n = str.search(strToRemove);
while (str.search(strToRemove) > -1) {
n = str.search(strToRemove);
str = str.substring(0, n) + str.substring(n + strToRemove.length, str.length);
}
return str;
}
let str = "data-123";
let strToRemove = "data-";
[Aa,Ab,Ba,Bb,Ca,Cb,Da,Db,E,F].map( f=> console.log(`${f.name.padEnd(2,' ')} ${f(str,strToRemove)}`));
This snippet only presents the functions used in the performance tests - it does not perform the tests itself!
And here are example results for Chrome (chart not included).
This little function I made has always worked for me :)
String.prototype.deleteWord = function (searchTerm) {
var str = this;
var n = str.search(searchTerm);
while (str.search(searchTerm) > -1) {
n = str.search(searchTerm);
str = str.substring(0, n) + str.substring(n + searchTerm.length, str.length);
}
return str;
}
// Use it like this:
var string = "text is the cool!!";
string.deleteWord('the'); // Returns text is cool!!
I know it is not the best, but it has always worked for me :)
str.split('Yes').join('No');
This will replace all the occurrences of that specific string from original string.
I was used to the C# String.Remove method.
In JavaScript, there is no remove function for strings, but there is the substr function.
You can use the substr function once or twice to remove characters from a string.
You can make the following function to remove the characters from a start index to the end of the string, just like the first overload of the C# method, String.Remove(int startIndex):
function Remove(str, startIndex) {
return str.substr(0, startIndex);
}
and/or you can also make the following function to remove a given count of characters from a start index, just like the second overload of the C# method, String.Remove(int startIndex, int count):
function Remove(str, startIndex, count) {
return str.substr(0, startIndex) + str.substr(startIndex + count);
}
and then you can use these two functions or one of them for your needs!
Example:
alert(Remove("data-123", 0, 5));
Output: 123
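Since the two functions above share one name, the second definition replaces the first. A possible way around that is sketched below: a single function with an optional count. The name removeAt is hypothetical, and slice is used here instead of substr, which is considered a legacy feature.

```javascript
// Hypothetical single function covering both String.Remove overloads.
function removeAt(text, startIndex, count) {
  if (count === undefined) {
    // One-argument form: drop everything from startIndex onwards.
    return text.slice(0, startIndex);
  }
  // Two-argument form: drop `count` characters starting at startIndex.
  return text.slice(0, startIndex) + text.slice(startIndex + count);
}

console.log(removeAt('data-123', 0, 5)); // removes the "data-" prefix
console.log(removeAt('data-123', 4));    // keeps only the first 4 characters
```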
Using match() and Number() to return a number variable:
Number(("data-123").match(/\d+$/));
// strNum = 123
Here's what the statement above does...working middle-out:
str.match(/\d+$/) - returns an array containing matches to any length of numbers at the end of str. In this case it returns an array containing a single string item ['123'].
Number() - converts it to a number type. Because the array returned from .match() contains a single element Number() will return the number.
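One thing to watch for: when the string has no trailing digits, match() returns null and Number(null) is 0, which can look like a valid result. A small sketch with an explicit guard (the helper name trailingNumber is made up for illustration):

```javascript
// Hypothetical helper: parse the digits at the end of a string.
function trailingNumber(text) {
  const match = text.match(/\d+$/); // null when there are no trailing digits
  return match === null ? null : Number(match[0]);
}

console.log(trailingNumber('data-123'));      // the parsed number
console.log(trailingNumber('data-'));         // null: nothing to parse
console.log(Number('data-'.match(/\d+$/)));   // unguarded version yields 0
```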
Update 2023
There are many ways to solve this problem, but I believe this is the simplest:
const newString = string.split("data-").pop();
console.log(newString); /// 123
For doing such a thing there are a lot of different ways. A further way could be the following:
let str = 'data-123';
str = str.split('-')[1];
console.log('The remaining string is:\n' + str);
Basically the above code splits the string at the '-' char into two array elements and gets the second one, that is the one with the index 1, ignoring the first array element at the 0 index.
The following is one liner version:
console.log('The remaining string is:\n' + 'data-123'.split('-')[1]);
Another possible approach would be to add a method to the String prototype as follows:
String.prototype.remove = function (s){return this.replace(s,'')}
// After that it will be used like this:
a = 'ktkhkiksk kiksk ktkhkek kcklkekaknk kmkekskskakgkekk';
a = a.remove('k');
console.log(a);
Notice that the above snippet removes only the first instance of the string you want to remove. But you can improve it a bit as follows:
String.prototype.removeAll = function (s){return this.replaceAll(s,'')}
// After that it will be used like this:
a = 'ktkhkiksk kiksk ktkhkek kcklkekaknk kmkekskskakgkekk';
a = a.removeAll('k');
console.log(a);
The above snippet instead will remove all instances of the string passed to the method.
Of course you don't need to implement the functions into the prototype of the String object: you can implement them as simple functions too if you wish (I will show the remove all function, for the other you will need to use just replace instead of replaceAll, so it is trivial to implement):
function strRemoveAll(s,r)
{
return s.replaceAll(r,'');
}
// you can use it as:
let a = 'ktkhkiksk kiksk ktkhkek kcklkekaknk kmkekskskakgkekk'
b = strRemoveAll (a,'k');
console.log(b);
Of course much more is possible.
Another way to replace all instances of a string is to use the new (as of August 2020) String.prototype.replaceAll() method.
It accepts either a string or RegEx as its first argument, and replaces all matches found with its second parameter, either a string or a function to generate the string.
As far as support goes, at time of writing, this method has adoption in current versions of all major desktop browsers* (even Opera!), except IE. For mobile, iOS Safari (iOS 13.7+), Android Chrome (v85+), and Android Firefox (v79+) are all supported as well.
* This includes Edge/Chrome v85+, Firefox v77+, Safari 13.1+, and Opera v71+
It'll take time for users to update to supported browser versions, but now that there's wide browser support, time is the only obstacle.
References:
MDN
Can I Use - Current Browser Support Information
TC39 Proposal Repo for .replaceAll()
You can test your current browser in the snippet below:
//Example courtesy of MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replaceAll
const p = 'The quick brown fox jumps over the lazy dog. If the dog reacted, was it really lazy?';
const regex = /dog/gi;
try {
console.log(p.replaceAll(regex, 'ferret'));
// expected output: "The quick brown fox jumps over the lazy ferret. If the ferret reacted, was it really lazy?"
console.log(p.replaceAll('dog', 'monkey'));
// expected output: "The quick brown fox jumps over the lazy monkey. If the monkey reacted, was it really lazy?"
console.log('Your browser is supported!');
} catch (e) {
console.log('Your browser is unsupported! :(');
}
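Make sure that if you are matching strings in a loop with a regex that has the g flag, you create a new regex in each iteration (or reset its lastIndex). As of 9/21/21 this still behaves the same way: a global regex keeps its position in lastIndex between test() and exec() calls, so a shared instance can appear to miss every other match. As far as I can tell, replace() itself resets lastIndex for a global regex, so the symptom usually comes from test() or exec() being called on the same object. This threw me for a loop when I encountered it the first time: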
yourArray.forEach((string) => {
string.replace(new RegExp(__your_regex__), '___desired_replacement_value___');
})
If you try and do it like so, don't be surprised if only every other one works
let reg = new RegExp('your regex');
yourArray.forEach((string) => {
string.replace(reg, '___desired_replacement_value___');
})
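The "every other match" symptom can be reproduced with test() on a shared global regex. The following is a minimal sketch; the pattern and inputs are made up for illustration.

```javascript
const shared = /data-/g; // one global regex reused across calls
const inputs = ['data-1', 'data-2', 'data-3', 'data-4'];

// test() on a global regex resumes from lastIndex, so the results alternate.
const withShared = inputs.map((s) => shared.test(s));

// Resetting lastIndex (or creating a new regex) before each call avoids that.
const withReset = inputs.map((s) => {
  shared.lastIndex = 0;
  return shared.test(s);
});

console.log(withShared);
console.log(withReset);
```

Dropping the g flag has the same effect when only a yes/no answer is needed, because a regex that is neither global nor sticky ignores lastIndex.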

Break out of replace global loop

I have a RegExp, doing a string replace, with global set. I only need one replace, but I'm using global because there's a second set of pattern matching (a mathematical equation that determines acceptable indices for the start of the replace) that I can't readily express as part of a regex.
var myString = //function-created string
myString = myString.replace(myRegex, function(){
if (/* this index is okay */){
//!! want to STOP searching now !!//
return //my return string
} else {
return arguments[0];
//return the string we matched (no change)
//continue on to the next match
}
}, "g");
If even possible, how do I break out of the string global search?
Thanks
Possible Solution
A solution (that doesn't work in my scenario for performance reasons, since I have very large strings with thousands of possible matches to very complex RegExp running hundreds or thousands of times):
var matched = false;
var myString = //function-created string
myString = myString.replace(myRegex, function(){
if (!matched && /* this index is okay */){
matched = true;
//!! want to STOP searching now !!//
return //my return string
} else {
return arguments[0];
//return the string we matched (no change)
//continue on to the next match
}
}, "g");
Use RegExp.exec() instead. Since you only do replacement once, I make use of that fact to simplify the replacement logic.
var myString = "some string";
// NOTE: The g flag is important!
var myRegex = /some_regex/g;
// Default value when no match is found
var result = myString;
var arr = null;
while ((arr = myRegex.exec(myString)) != null) {
// arr.index gives the starting index of the match
if (/* index is OK */) {
// Assign new value to result
result = myString.substring(0, arr.index) +
/* replacement */ +
myString.substring(myRegex.lastIndex);
break;
}
// Increment lastIndex of myRegex if the regex matches an empty string
// This is important to prevent infinite loop
if (arr[0].length == 0) {
myRegex.lastIndex++;
}
}
This code exhibits the same behavior as String.match(), since it also increments the index by 1 if the last match is empty to prevent infinite loop.
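The loop above can be wrapped into a runnable helper, sketched here under some assumptions: the function name replaceFirstAccepted and the isIndexOk callback are hypothetical stand-ins for the asker's own index check.

```javascript
// Hypothetical helper: replace only the first match whose index is accepted.
function replaceFirstAccepted(text, pattern, isIndexOk, replacement) {
  // Work on a global copy so the caller's regex state is left untouched.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  let match;
  while ((match = re.exec(text)) !== null) {
    if (isIndexOk(match.index)) {
      // Splice the replacement in and stop searching immediately.
      return text.slice(0, match.index) + replacement +
             text.slice(match.index + match[0].length);
    }
    // Step past empty matches to avoid an infinite loop.
    if (match[0].length === 0) re.lastIndex++;
  }
  return text; // no acceptable match: return the input unchanged
}

// Replace the first "-" that sits at an index greater than 1.
console.log(replaceFirstAccepted('a-b-c-d', /-/, (i) => i > 1, '+'));
```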
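You can wrap the call in try-catch and throw an exception to exit the replace function. (Referencing an undeclared variable also throws, but a name like stop does not work in browsers, where window.stop exists.)
var i = 0;
try{
"aaaaa".replace ( /./g, function( a, b ){
//Exit the loop once three matches have been processed
if ( i === 3 ){
throw new Error( "stop" ); //abort the replace
}
//Increment i
i++
})
}
catch( err ){
}
alert ( "i = " + i ); //Shows 3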
I question your logic about performance. I think some points made in the comments are valid. But, what do I know... ;)
However, this is one way of doing what you want. Again, I think this, performance wise, isn't the best...:
var myString = "This is the original string. Let's see if the original will change...";
var myRegex = new RegExp('original', 'g');
var matched=false;
document.write(myString+'<br>');
myString = myString.replace(myRegex, function (match) {
if ( !matched ) {
matched = true;
return 'replaced';
} else {
return match;
}
});
document.write(myString);
It's pretty much like your "Possible Solution". And it doesn't "abort" after the replace (hence my performance reservation). But it does what you asked for. It replaces the first instance, sets a flag and after that just returns the matched string.
See it work here.
Regards.

Regular Expressions required format

I want to validate following text using regular expressions
integer(1..any)/'fs' or 'sf'/ + or - /integer(1..any)/(h) or (m) or (d)
samples :
1) 8fs+60h
2) 10sf-30m
3) 2fs+3h
4) 15sf-20m
I tried with this:
function checkRegx(str,id){
var arr = strSplit(str);
var regx_FS =/\wFS\w|\d{0,9}\d[hmd]/gi;
for (var i in arr){
var str_ = arr[i];
console.log(str_);
var is_ok = str_.match(regx_FS);
var err_pos = str_.search(regx_FS);
if(is_ok){
console.log(' ID from ok ' + id);
$('#'+id).text('Format Error');
break;
}else{
console.log(' ID from fail ' + id);
$('#'+id).text('');
}
}
}
but it is not working.
Can anyone please help me make this correct?
This should do it:
/^[1-9]\d*(?:fs|sf)[-+][1-9]\d*[hmd]$/i
You were close, but you seem to be missing some basic regex comprehension.
First of all, the ^ and $ just make sure you're matching the entire string. Otherwise any junk before or after will count as valid.
The formation [1-9]\d* allows for any integer from 1 upwards (and any number of digits long).
(?:fs|sf) is an alternation (the ?: is to make the group non-capturing) to allow for both options.
[-+] and [hmd] are character classes allowing to match any one of the characters in there.
That final i allows the letters to be lowercase or uppercase.
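To see the pieces working together, the expression can be run against a few inputs. This is only a quick sketch; the variable names and the extra negative samples are made up for illustration.

```javascript
const formatPattern = /^[1-9]\d*(?:fs|sf)[-+][1-9]\d*[hmd]$/i;

const samples = [
  '8fs+60h', '10sf-30m', '2fs+3h', '15SF-20D', // expected to match
  '0fs+1h',  // leading zero is rejected by [1-9]
  '8fs60h',  // missing + or -
  '8fs+60x'  // unit is not h, m or d
];

for (const sample of samples) {
  console.log(sample, formatPattern.test(sample));
}
```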
I don't see how the expression you tried relates in any way to the description you gave us. What you want is
/\d+(fs|sf)[+-]\d+[hmd]/
Since you seem to know a bit about regular expressions I won't give a step-by-step explanation :-)
If you need exclude zero from the "integer" matches, use [1-9]\d* instead. Not sure whether by "(1..any)" you meant the number of digits or the number itself.
Looking at the code, you
should not use for in enumerations on arrays
will need string start and end anchors to check whether str_ exactly matches the regex (instead of only some part)
don't need the global flag on the regex
might rather use the RegExp test method than match - you don't need a result string but only whether it did match or not
are not using the err_pos variable anywhere, and it will hardly work with search
function checkRegx(str, id) {
var arr = strSplit(str);
var regx_FS = /^\d+(fs|sf)[+-]\d+[hmd]$/i;
for (var i=0; i<arr.length; i++) {
var str = arr[i];
console.log(str);
if (regx_FS.test(str)) {
console.log(' ID from ok ' + id);
$('#'+id).text('Format Error');
break;
} else {
console.log(' ID from fail ' + id);
$('#'+id).text('');
}
}
}
Btw, it would be better to separate the validation (regex, array split, iteration) from the output (id, jQuery, logs) into two functions.
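That separation could look roughly like the sketch below. The function names and the report callback are hypothetical, and the entries are assumed to be already split into an array.

```javascript
// Pure validation: no DOM access, no logging.
const entryPattern = /^\d+(?:fs|sf)[+-]\d+[hmd]$/i;

function findInvalidEntries(entries) {
  return entries.filter((entry) => !entryPattern.test(entry));
}

// Output is kept separate; `report` is a hypothetical callback
// (it could set the text of an element, for example).
function reportValidation(entries, report) {
  const invalid = findInvalidEntries(entries);
  report(invalid.length === 0 ? '' : 'Format Error');
  return invalid;
}

reportValidation(['8fs+60h', 'oops'], (message) => console.log(message));
```

Keeping the check free of side effects makes it easy to test without a page or jQuery.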
Try something like this:
/^\d+(?:fs|sf)[-+]\d+[hmd]$/i

Regex to check whether string starts with, ignoring case differences

I need to check whether a word starts with a particular substring ignoring the case differences. I have been doing this check using the following regex search pattern but that does not help when there is difference in case across the strings.
my case sensitive way:
var searchPattern = new RegExp('^' + query);
if (searchPattern.test(stringToCheck)) {}
Pass the i modifier as second argument:
new RegExp('^' + query, 'i');
Have a look at the documentation for more information.
You don't need a regular expression at all, just compare the strings:
if (stringToCheck.substr(0, query.length).toUpperCase() == query.toUpperCase())
Demo: http://jsfiddle.net/Guffa/AMD7V/
This also handles cases where you would need to escape characters to make the RegExp solution work, for example if query="4*5?" which would always match everything otherwise.
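The two approaches can be compared side by side. This is a sketch with made-up function names; it uses slice rather than substr, and it does not try to handle characters whose upper-case form has a different length.

```javascript
// Plain string comparison, no regex involved.
function startsWithIgnoreCase(text, query) {
  return text.slice(0, query.length).toUpperCase() === query.toUpperCase();
}

// RegExp variant: the query must be escaped before it becomes a pattern.
function startsWithIgnoreCaseRegExp(text, query) {
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped, 'i').test(text);
}

console.log(startsWithIgnoreCase('Hello World', 'hello'));
console.log(startsWithIgnoreCaseRegExp('abc', '4*5?'));
// Without escaping, "4*5?" can match the empty string at the start:
console.log(new RegExp('^' + '4*5?', 'i').test('abc'));
```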
I think all the previous answers are correct. Here is another example similar to SERPRO's, but the difference is that there is no new constructor:
Notice: i ignores the case and ^ means "starts with".
var whateverString = "My test String";
var pattern = /^my/i;
var result = pattern.test(whateverString);
if (result === true) {
console.log(pattern, "pattern matched!");
} else {
console.log(pattern, "pattern did NOT match!");
}
Here is the jsfiddle (old version) if you would like to give it a try.
On this page you can see that modifiers can be added as a second parameter. In your case you are looking for 'i' (case insensitive)
//Syntax
var patt=new RegExp(pattern,modifiers);
//or more simply:
var patt=/pattern/modifiers;
For cases like these, JS regular expressions offer a feature called 'flags'. They make regular expressions more flexible and widely applicable.
Here, the flag that could be used is the 'i' flag, which ignores case (upper and lower) when matching.
Literal Notation:
let string = 'PowerRangers'
let regex = /powerrangers/i
let result = regex.test(string) // true
Using the JS 'RegExp' constructor:
let string = 'PowerRangers'
let regex = new RegExp('powerrangers', 'i')
let result = regex.test(string)
2022, ECMA 11
Just created this helper function; I find it more useful and cleaner than modifying the regex and recreating one every time.
/**
* @param {string} str
* @param {RegExp} search
* @returns {boolean}
*/
function regexStartsWith (str, search, {caseSensitive = true} = {})
{
var source = search.source
if (!source.startsWith('^')) source = '^' + source
var flags = search.flags
if (!caseSensitive && !flags.includes('i')) flags += 'i'
var reg = new RegExp(source, flags)
return reg.test(str)
}
Use it this way:
regexStartsWith('can you Fi nD me?', /fi.*nd/, {caseSensitive: false})

Template and Place holders algorithm

First a quick definition :)
Template - A string which may contain placeholders (example:"hello [name]")
Placeholder - A substring within square brackets (example: "name" in "hello [name]").
Properties map - A valid object with strings as values
I need to write code that replaces placeholders (along with brackets) with the matching values in the properties map.
example:
for the following properties map:
{
"name":"world",
"my":"beautiful",
"a":"[b]",
"b":"c",
"c":"my"
}
Expected results:
"hello name" -> "hello name"
"hello [name]" -> "hello world"
"[b]" -> "c"
"[a]" -> "c" (because [a]->[b]->c)
"[[b]]" -> "my" (because [[b]]->[c]->my)
"hello [my] [name]" -> "hello beautiful world"
var map = {
"name":"world",
"my":"beautiful",
"a":"[b]",
"b":"c",
"c":"my"
};
var str = "hello [my] [name] [[b]]";
do {
var strBeforeReplace = str;
for (var k in map) {
if (!map.hasOwnProperty(k)) continue;
var needle = "[" + k + "]";
str = str.replace(needle, map[k]);
}
var strChanged = str !== strBeforeReplace;
} while (strChanged);
document.write(str); //hello beautiful world my
The answer by @chris is excellent, I just want to provide an alternative solution using regular expressions that works "the other way round", i.e., not by looking for occurrences of the "placeholder versions" of all items in the properties map, but by repeatedly looking for occurrences of the placeholder itself, and substituting it with the corresponding value from the property map. This has two advantages:
If the property map grows very large, this solution should have better performance (still to be benchmarked though).
The placeholder and the way substitutions work can easily be modified by adjusting the regular expression and the substitution function (might not be an issue here).
The downside is, of course, that the code is a little more complex (partly due to the fact that JavaScript lacks a nice way of substituting regular expression matches using custom functions, so that's what substituteRegExp is for):
function substituteRegExp(string, regexp, f) {
// substitute all matches of regexp in string with the value
// returned by f given a match and the corresponding group values
var found;
var lastIndex = 0;
var result = "";
while (found = regexp.exec(string)) {
var subst = f.apply(this, found);
result += string.slice(lastIndex, found.index) + subst;
lastIndex = found.index + found[0].length;
}
result += string.slice(lastIndex);
return result;
}
function templateReplace(string, values) {
// repeatedly substitute [key] placeholders in string by values[key]
var placeholder = /\[([a-zA-Z0-9]+)\]/g;
while (true) {
var newString = substituteRegExp(string, placeholder, function(match, key) {
return values[key];
});
if (newString == string)
break;
string = newString;
}
return string;
}
alert(templateReplace("hello [[b]] [my] [name]", {
"name":"world",
"my":"beautiful",
"a":"[b]",
"b":"c",
"c":"my"
})); // -> "hello my beautiful world"
Update: I did some little profiling to compare the two solutions (jsFiddle at http://jsfiddle.net/n8Fyv/1/, I also used Firebug). While @chris' solution is faster for small strings (no need for parsing the regular expression etc), this solution performs a lot better for large strings (in the order of thousands of characters). I did not compare for different sizes of the property map, but expect even bigger differences there.
In theory, this solution has runtime O(k n) where k is the depth of nesting of placeholders and n is the length of the string (assuming dictionary/hash lookups need constant time), while @chris' solution is O(k n m) where m is the number of items in the property map. All of this is only relevant for large inputs, of course.
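For comparison, String.prototype.replace also accepts a replacement function, so the same repeated substitution can be sketched without a separate helper. The function name, the maxPasses limit and the handling of unknown keys are assumptions added for illustration, not part of the answers above.

```javascript
// Sketch: repeated [key] substitution using replace() with a callback.
// `maxPasses` is a hypothetical safety limit against cyclic maps.
function templateReplaceWithCallback(template, values, maxPasses = 100) {
  const placeholder = /\[([a-zA-Z0-9]+)\]/g;
  let current = template;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = current.replace(placeholder, (match, key) =>
      // Leave unknown placeholders untouched instead of inserting "undefined".
      Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
    );
    if (next === current) break; // nothing left to substitute
    current = next;
  }
  return current;
}

const exampleMap = { name: 'world', my: 'beautiful', a: '[b]', b: 'c', c: 'my' };
console.log(templateReplaceWithCallback('hello [[b]] [my] [name]', exampleMap));
```

Each pass resolves one level of nesting, so the loop runs once per nesting level plus a final pass that detects no change.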
If you're familiar with .NET's String.Format, then you should take a look at this JavaScript implementation. It supports number formatting too, just like String.Format.
Here's an example of how to use it:
var result = String.Format("Hello {my} {name}", map);
However, it would require some modification to do recursive templates.
