Regex to identify specific strings and not just keys

I am creating a calculator for my website and I am using the eval()
function. I have a whitelist of allowed terms and keys, and I want to
make sure that only terms that appear exactly as whitelisted are let
through. For example, if I have a term called SQRT, I want only the
complete term SQRT to be let through. Right now every occurrence of
the individual letters in SQRT is let through as well, for example T.
var input1 = "0";
var keywords = ["sqrt", "sin", "cos", "tan", "log","exp"];
function evalue(){
var keys = [];
keywords.forEach(element => {
keys.push("{^"+ element +"$}");
});
var whitelist = new RegExp("[^-()/1234567890*+." + keys.join() + "]", "g");
input1 = document.getElementById("calcInput1").value;
input1 = input1.replace(whitelist, '');
keywords.forEach(element => {
var elSearch = new RegExp(element, "g");
input1 = input1.replace(elSearch, "Math."+element)
});
input1 = eval(input1);
input1 = (Math.round((input1*100)))/100;
document.getElementById("calcResult").innerHTML=input1;
console.log(input1);
}
I thought that by separating the terms into {^$}, they would only find those specific terms and let them through.

[^xxxx] is a negated character class: it matches any single character that is not listed inside the brackets. If you put a word there, it is treated as a set of individual characters, not as a word.
I would solve your problem by tokenizing the string using the keywords and numbers and filtering out whatever doesn't match:
var keywords = ["sqrt", "sin", "cos", "tan", "log","exp"];
var re = new RegExp('(' + keywords.join('|') + '|[0-9]+)');
var input = document.querySelector('input');
document.querySelector('button').addEventListener('click', function() {
var text = input.value.split(re).filter(Boolean).map(function(token) {
if (token.match(re)) {
return token;
}
return '';
}).join('');
console.log(text);
});
<input/>
<button>eval</button>
And a suggestion: if you want a proper calculator, you will probably need a parser, and you can use a parser generator for that. I wrote an article about different parser generators in JS, "5 Parserów i Generatorów Parserów w JavaScript" (5 parser generators in JavaScript). It is in Polish, but it contains example code and you can read it with Google Translate. Use the original post to check the example source code, because Google Translate collapses the code into one line.
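As a rough sketch of the tokenizing idea, extended to the operators from the question's whitelist: match only whole keywords, decimal numbers, and the characters - + * / ( ), and drop everything else. The function name sanitizeExpression and the exact token pattern are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: keeps only whitelisted tokens and prefixes keywords with "Math."
const keywords = ["sqrt", "sin", "cos", "tan", "log", "exp"];

// A token is a whole keyword, a decimal number, or a single operator/parenthesis.
const tokenPattern = new RegExp(
  "(?:" + keywords.join("|") + ")|\\d+(?:\\.\\d+)?|[-+*/()]",
  "g"
);

function sanitizeExpression(input) {
  // match() with a global regex returns every token; anything in between is dropped
  const tokens = input.match(tokenPattern) || [];
  return tokens
    .map((token) => (keywords.includes(token) ? "Math." + token : token))
    .join("");
}

console.log(sanitizeExpression("sqrt(16)+t2")); // Math.sqrt(16)+2
```

The stray "t" is discarded because it is not a complete keyword. The result can still be a syntactically invalid expression (for example unbalanced parentheses), so a call to eval would need a try/catch, and a real parser remains the more robust option.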

Related

Regular expression to eliminate a word

I have this string:
var chain = "providerId=12$familyId=123&brandId=1122112$officeId=21&";
I need to do a method that erases a certain word with regular expressions.
Example:
var word = "familyId";
var newChain = deleteParam(chain, word);
console.log(newChain);
Result = "providerId=12$brandId=1122112$officeId=21&";
Delete : familyId=123&
I tried to do the method in the following way, but it does not work:
function deleteParam(chain, word) {
var exp = new RegExp(param, "=[0-9]&");
var str = chain.replace(exp, ""); // Delete
return str
}
Please, I need your help, I can not make this method work, because I do not understand well how to build regular expressions.
Excuse me all, English is not my native language
thank you very much to all.

You can use something like this: new RegExp(param + "=[^&]*&")
Here is an example:
var chain = "providerId=12$familyId=123&brandId=1122112$officeId=21&";
var toRemove = "familyId";
var pattern = new RegExp(toRemove + "=[^&]*&");
var newChain = chain.replace(pattern, "");
console.log(newChain);
If you're looking to process a GET request's search parameters, it is better to use a pattern like this: (&|\?)*parameterName=[^&]*
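Wrapped up as the deleteParam function the question asks for, the same pattern could look like the sketch below. The escapeRegExp helper is an addition of mine (not from the answer); it is there on the assumption that the word to remove might contain characters that are special in a regex.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function deleteParam(chain, word) {
  // word, "=", everything up to the next "&", and the "&" itself
  const pattern = new RegExp(escapeRegExp(word) + "=[^&]*&");
  return chain.replace(pattern, "");
}

const chain = "providerId=12$familyId=123&brandId=1122112$officeId=21&";
console.log(deleteParam(chain, "familyId"));
// providerId=12$brandId=1122112$officeId=21&
```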

javascript string replace unicode troubles

I want to be able to transliterate names into different languages. I'm starting with Armenian.
My html is like this:
<input type="text" class="name"></input><br>
<p class="transliterated"></p>
<button id="button">transliterate!</button>
My javascript is like this:
var buttonEl = document.getElementById("button");
buttonEl.addEventListener("click", getArmenian);
function getArmenian() {
var inputEl = document.getElementsByClassName("name");
var outputEl = document.getElementsByClassName("transliterated");
for (var i = 0; i < inputEl.length; i++) {
var nameEl = inputEl[i].value;
var ayb = '&#x561';
var ben = '&#x562';
var nameEl = nameEl.replace(/a/gi, ayb);
var nameEl = nameEl.replace(/b/gi, ben);
outputEl[i].innerHTML = nameEl;
}
}
In the above example, I'm picking out the letters a and b, and replacing them with the Armenian characters 'ayb' and 'ben', respectively.
So far so good.
The pickle starts here: I've defined variables for all the letters of the Armenian alphabet the same way I did for 'ayb' and I used replace to replace the respective letter of the English alphabet. This is not a sophisticated transliteration at this point.
The 'x' is problematic, since each code contains an 'x' so I just search and replace the 'x' first, and that mini-problem is solved.
But this doesn't distinguish between 'r' and 'R' when searching. How can I fix that? Right now, if I transliterate 'Rob' it gives me '&#x57C + &#x585 + &#x562', which I am happy with, but I didn't knowingly program it to recognize the capital letter 'R'.
Once I do that, how do I keep this thing from replacing the 'C' in '&#x57C' which is the letter 'ra'?

As NULL-POINTER mentioned, it is because of the "i" flag in your regex pattern. I think you will run into a lot of these "problematic" occurrences if you do it with regexes, as multiple letters might have this exact problem. I'd recommend using a hash to represent the transliteration instead. I have made an example that maps a, b, and c to x, y, and z. Of course you'd want to write your own hash, but here's the gist of the idea:
var hash = {
a: "x",
b: "y",
c: "z"
};
var str = "abcdddcba";
// first, do a split so each letter is a part of the array
var translate = str.split("")
// then when you have an array of letters, you can map it to the new values in the hash, and default it to itself if no match is found
.map((letter) => hash[letter] || letter)
// next, join all of the letters back together
.join("");
You can check this out on a fiddle I made for you: https://jsfiddle.net/fwkr94Le/
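Applied to the Armenian case from the question, the hash idea might look like the sketch below. Because every input letter is looked up exactly once, the output of one replacement is never fed into another, so the 'x' and 'C' collisions from the question cannot occur. The code points for ayb, ben, ra and the 'o' replacement are taken from the question; the separate entry for capital 'R' is an assumption added to show that the lookup is case-sensitive.

```javascript
// Case-sensitive lookup table; only a few letters are filled in for illustration
const armenian = {
  a: "\u0561", // ayb
  b: "\u0562", // ben
  o: "\u0585", // code point from the question's 'Rob' output
  r: "\u057C", // ra
  R: "\u054C", // capital ra (assumed mapping for the uppercase letter)
};

function transliterate(name) {
  // Each Latin letter is replaced in a single pass; unmapped letters stay as they are
  return name.replace(/[A-Za-z]/g, (letter) => armenian[letter] || letter);
}

console.log(transliterate("Rob"));
```

Using the characters themselves instead of HTML entities such as '&#x561' also means the result can be assigned with textContent rather than innerHTML.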

Most efficient way to extract parts of string

I have a bunch of strings in the format 'TYPE_1_VARIABLE_NAME'.
The goal is to get 3 variables in the form of:
varType = 'TYPE',
varNumber = '1',
varName = 'VARIABLE_NAME'
What's the most efficient way of achieving this?
I know I can use:
var firstUnderscore = str.indexOf('_')
varType = str.slice(0, firstUnderscore)
varNumber = str.slice(firstUnderscore+1,firstUnderscore+2)
varName = str.slice(firstUnderscore+3)
but this feels like a poor way of doing it. Is there a better way? RegEx?
Or should I just rename the variable to 'TYPE_1_variableName' and do a:
varArray = str.split('_')
and then get them with:
varType = varArray[0],
varNumber = varArray[1],
varName = varArray[2]
Any help appreciated. jQuery also ok.

Regex solution
Given that the first and second underscores are the delimiters, this regex approach will extract the parts (allowing underscores in the last part):
//input data
var string = 'TYPE_1_VARIABLE_NAME';
//extract parts using .match()
var parts = string.match(/([^_]+)_([^_]+)_(.+)/);
//indexes 1 through 3 contain the parts
var varType = parts[1];
var varNumber = parts[2];
var varName = parts[3];
Given that the first part consists of word characters and the second of digits, this more specific regex could be used instead:
var parts = string.match(/(\w+)_(\d+)_(.+)/);
Non-regex solution
Using .split('_'), you could do this:
//input data
var string = 'TYPE_1_VARIABLE_NAME';
//extract parts using .split()
var parts = string.split('_');
//indexes 0 and 1 contain the first parts
//the rest of the parts array contains the last part
var varType = parts[0];
var varNumber = parts[1];
var varName = parts.slice(2).join('_');
Both approaches take about the same amount of code; for strings this short, any performance difference is unlikely to matter.

You could use a regex with split; the captured groups are included in the resulting array:
var string='TYPE_1_VARIABLE_NAME';
var div=string.split(/^([A-Z]+)_(\d+)_(\w+)$/);
console.log('type:'+div[1]);
console.log('num:'+div[2]);
console.log('name:'+div[3]);

Here's an answer I found here:
var array = str.split('_'),
type = array[0], number = array[1], name = array[2];
ES6 standardises destructuring assignment, which allows you to do what Firefox has supported for quite a while now:
var [type, number, name] = str.split('_');
You can check browser support using Kangax's compatibility table.
Here's a sample Fiddle
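One caveat with plain destructuring here: because the name part itself contains underscores, `var [type, number, name] = str.split('_')` would leave name as just 'VARIABLE'. A small sketch using a rest element avoids that; the function name splitVariable is made up for illustration.

```javascript
function splitVariable(str) {
  // The rest element collects every piece after the first two
  const [type, number, ...rest] = str.split("_");
  // Re-join the remaining pieces so underscores in the name survive
  return { type, number, name: rest.join("_") };
}

console.log(splitVariable("TYPE_1_VARIABLE_NAME"));
// { type: 'TYPE', number: '1', name: 'VARIABLE_NAME' }
```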

replace characters in a string using an associative array mapping

I have an associative array/object such as this:
mymap = {'e':'f', 'l':'g'};
And I want to replace all matching characters in a string using the above as a simple cypher, but only replacing existing characters. As an example,
input = "hello world";
output = input.map(mymap); //how can I do this?
//output is "hfggo worgd"
Balancing performance (for large input) and code size are of interest.
My application is replacing unicode characters with latex strings using this map, but I'm happy to stick with the more general question.

The following works:
mymap = {'e':'f', 'l':'g'};
var replacechars = function(c){
return mymap[c] || c;
};
input = "hello world";
output = input.split('').map(replacechars).join('');
although having to split and then join the input seems quite round-about, particularly if this is applied to a wall of text.

Another way would be to loop over the object properties and use a regex for each replacement:
var input = 'hello world';
// start from the input so that each replacement builds on the previous one
var output = input;
for (var prop in mymap) {
if (mymap.hasOwnProperty(prop)) {
var re = new RegExp(prop, 'g');
output = output.replace(re, mymap[prop]);
}
}
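A possible middle ground, sketched here under the assumption that every key in the map is a single character (one UTF-16 code unit): build one character class from all the keys and do the whole substitution in a single replace call with a callback. This avoids both the split/join round trip and the risk that a later replacement rewrites the output of an earlier one. The names escapeForCharClass and applyMap are hypothetical.

```javascript
const mymap = { e: "f", l: "g" };

// Hypothetical helper: escapes the characters that are special inside [...]
function escapeForCharClass(ch) {
  return ch.replace(/[\\\]^-]/g, "\\$&");
}

// One character class containing every key of the map
const mapPattern = new RegExp(
  "[" + Object.keys(mymap).map(escapeForCharClass).join("") + "]",
  "g"
);

function applyMap(input) {
  // Each matched character is looked up exactly once
  return input.replace(mapPattern, (ch) => mymap[ch]);
}

console.log(applyMap("hello world")); // hfggo worgd
```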

Javascript Regexp loop all matches

I'm trying to do something similar with stack overflow's rich text editor. Given this text:
[Text Example][1]
[1][http://www.example.com]
I want to loop each [string][int] that is found which I do this way:
var Text = "[Text Example][1]\n[1][http: //www.example.com]";
// Find resource links
var arrMatch = null;
var rePattern = new RegExp(
"\\[(.+?)\\]\\[([0-9]+)\\]",
"gi"
);
while (arrMatch = rePattern.exec(Text)) {
console.log("ok");
}
This works great, it logs 'ok' for each [string][int]. What I need to do, though, is for each match found, replace the initial match with components of the second match.
So in the loop $2 would represent the int part originally matched, and I would run this regexp (pseudo):
while (arrMatch = rePattern.exec(Text)) {
var FindIndex = $2; // This would be 1 in our example
new RegExp("\\[" + FindIndex + "\\]\\[(.+?)\\]", "g")
// Replace original match now with hyperlink
}
This would match
[1][http://www.example.com]
End result for first example would be:
<a href="http://www.example.com">Text Example</a>
Edit
I've gotten as far as this now:
var Text = "[Text Example][1]\n[1][http: //www.example.com]";
// Find resource links
reg = new RegExp(
"\\[(.+?)\\]\\[([0-9]+)\\]",
"gi");
var result;
while ((result = reg.exec(Text)) !== null) {
var LinkText = result[1];
var Match = result[0];
Text = Text.replace(new RegExp(Match, "g"), '" + LinkText + "');
}
console.log(Text);

I agree with Jason that it’d be faster/safer to use an existing Markdown library, but you’re looking for String.prototype.replace (also, use RegExp literals!):
var Text = "[Text Example][1]\n[1][http: //www.example.com]";
var rePattern = /\[(.+?)\]\[([0-9]+)\]/gi;
console.log(Text.replace(rePattern, function(match, text, urlId) {
// return an appropriately-formatted link
return `${text}`;
}));

I managed to do it in the end with this:
var Text = "[Text Example][1]\n[1][http: //www.example.com]";
// Find resource links
reg = new RegExp(
"\\[(.+?)\\]\\[([0-9]+)\\]",
"gi");
var result;
while (result = reg.exec(Text)) {
var LinkText = result[1];
var Match = result[0];
var LinkID = result[2];
var FoundURL = new RegExp("\\[" + LinkID + "\\]\\[(.+?)\\]", "g").exec(Text);
Text = Text.replace(Match, '<a href="' + FoundURL[1] + '">' + LinkText + '</a>');
}
console.log(Text);

Here we're using the exec method. Called in a while loop with a global regex, it returns all matches one by one, along with the position of each matched string.
var input = "A 3 numbers in 333";
var regExp = /\b(\d+)\b/g, match;
while (match = regExp.exec(input))
console.log("Found", match[1], "at", match.index);
// → Found 3 at 2 // Found 333 at 15
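In environments that support ES2020, String.prototype.matchAll gives the same information without managing the loop by hand. A minimal sketch (the function name findNumbers is made up for illustration):

```javascript
function findNumbers(input) {
  // matchAll requires the g flag and yields one match object per occurrence
  const pattern = /\b(\d+)\b/g;
  return Array.from(input.matchAll(pattern), (m) => ({
    value: m[1],
    index: m.index,
  }));
}

console.log(findNumbers("A 3 numbers in 333"));
// [ { value: '3', index: 2 }, { value: '333', index: 15 } ]
```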

Using back-references to restrict the match, the code will match if your text is:
[Text Example][1]\n[1][http://www.example.com]
and the code will not match if your text is:
[Text Example][1]\n[2][http://www.example.com]
var re = /\[(.+?)\]\[([0-9]+)\s*.*\s*\[(\2)\]\[(.+?)\]/gi;
var str = '[Text Example][1]\n[1][http://www.example.com]';
var subst = '$1';
var result = str.replace(re, subst);
console.log(result);
\number is used inside a regex to refer back to a captured group, and $number is used in the replacement string of the replace function in the same way, to refer to the group results.

This format is based on Markdown. There are several JavaScript ports available. If you don't want the whole syntax, then I recommend stealing the portions related to links.

Another way to iterate over all matches without relying on the subtleties of exec and match is to use the string replace function with the regex as the first parameter and a function as the second one. When used like this, the function receives the whole match as the first argument, the captured groups as the next arguments, and then the match offset and the whole input string:
var text = "[Text Example][1]\n[1][http: //www.example.com]";
// Find resource links
var arrMatch = null;
var rePattern = new RegExp("\\[(.+?)\\]\\[([0-9]+)\\]", "gi");
text.replace(rePattern, function(match, g1, g2, index){
// Do whatever
})
You can even iterate over all groups of each match using the arguments object available inside the function, excluding the first argument (the whole match) and the last two (the offset and the input string).
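A sketch of that idea using rest parameters instead of the arguments object. The function name collectGroups is hypothetical. It relies on the fact that captured groups are strings (or undefined), so the first numeric argument after the match is the offset.

```javascript
function collectGroups(text, pattern) {
  const rows = [];
  text.replace(pattern, (match, ...rest) => {
    // rest holds the captured groups, then the offset, then the whole string
    const offsetPosition = rest.findIndex((value) => typeof value === "number");
    rows.push({
      match: match,
      groups: rest.slice(0, offsetPosition),
      offset: rest[offsetPosition],
    });
    return match; // the return value is ignored here; replace is used only to iterate
  });
  return rows;
}

const sample = "[Text Example][1]\n[1][http: //www.example.com]";
console.log(collectGroups(sample, /\[(.+?)\]\[([0-9]+)\]/gi));
```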

I know it's old, but since I stumbled upon this post, I want to set things straight.
First of all, your way of approaching this problem is too complicated, and when the solution to a supposedly simple problem becomes too complicated, it is time to stop and think about what went wrong.
Second, your solution is inefficient: you first find what you want to replace and then search the same text again for the referenced link information. So the computational complexity eventually becomes O(n^2).
It is disappointing to see so many upvotes on something wrong, because people who come here learn mostly from the accepted solution, assume it is a legitimate answer, and use this concept in their projects, which then end up badly implemented.
The approach to this problem is pretty simple. All you need to do is find all the referenced links in the text, save them in a dictionary, and only then search for the placeholders and replace them using the dictionary. That's it. In this case the complexity is just O(n).
So this is how it goes:
const text = `
[2][https://en.wikipedia.org/wiki/Scientific_journal][5][https://en.wikipedia.org/wiki/Herpetology]
The Wells and Wellington affair was a dispute about the publication of three papers in the Australian Journal of [Herpetology][5] in 1983 and 1985. The publication was established in 1981 as a [peer-reviewed][1] [scientific journal][2] focusing on the study of [3][https://en.wikipedia.org/wiki/Amphibian][amphibians][3] and [reptiles][4] ([herpetology][5]). Its first two issues were published under the editorship of Richard W. Wells, a first-year biology student at Australia's University of New England. Wells then ceased communicating with the journal's editorial board for two years before suddenly publishing three papers without peer review in the journal in 1983 and 1985. Coauthored by himself and high school teacher Cliff Ross Wellington, the papers reorganized the taxonomy of all of Australia's and New Zealand's [amphibians][3] and [reptiles][4] and proposed over 700 changes to the binomial nomenclature of the region's herpetofauna.
[1][https://en.wikipedia.org/wiki/Academic_peer_review]
[4][https://en.wikipedia.org/wiki/Reptile]
`;
const linkRefs = {};
const linkRefPattern = /\[(?<id>\d+)\]\[(?<link>[^\]]+)\]/g;
const linkPlaceholderPattern = /\[(?<text>[^\]]+)\]\[(?<refid>\d+)\]/g;
const parsedText = text
.replace(linkRefPattern, (...[,,,,,ref]) => (linkRefs[ref.id] = ref.link, ''))
.replace(linkPlaceholderPattern, (...[,,,,,placeholder]) => `<a href="${linkRefs[placeholder.refid]}">${placeholder.text}</a>`)
.trim();
console.log(parsedText);
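The same two-pass idea can be packaged as a small function that is easy to test. This is a sketch under a few assumptions of mine: the name resolveLinks is made up, positional groups are used instead of named ones, and a placeholder whose reference id was never defined is left untouched rather than turned into a broken link.

```javascript
function resolveLinks(text) {
  const refs = new Map();

  // Pass 1: collect [id][url] definitions and remove them from the text
  const withoutRefs = text.replace(/\[(\d+)\]\[([^\]]+)\]/g, (match, id, url) => {
    refs.set(id, url);
    return "";
  });

  // Pass 2: turn [label][id] placeholders into anchors using the dictionary
  return withoutRefs
    .replace(/\[([^\]]+)\]\[(\d+)\]/g, (match, label, id) =>
      refs.has(id) ? `<a href="${refs.get(id)}">${label}</a>` : match
    )
    .trim();
}

console.log(resolveLinks("[Text Example][1]\n[1][http://www.example.com]"));
// <a href="http://www.example.com">Text Example</a>
```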
