Loop to single statement - javascript

In one of our applications we have the following lines:
while (text.indexOf(' ') !== -1)
text = text.replace(' ', '_');
while (text.indexOf('*') !== -1)
text = text.replace('*', 'x');
As far as I know I could also write it like this to avoid the loops:
text = text.replace(/ /g, '_');
text = text.replace(/*/g, 'x');
Which of the two versions would be better programming style? Is there any difference (performance, result, errors, ...) between these two? Do we have to avoid loops if possible?

I have noticed that regexes tend to confuse new or inexperienced developers, so they would find the first option easier to read and understand.
However, the second option is:
Short and concise, and also easy to read (provided you are familiar with regex);
Theoretically faster, since you let native code do all the heavy lifting instead of interpreting the loop and reassigning a string variable on every iteration. Regexes do carry some overhead, but it is not noticeable in real-life scenarios, especially in this case.
As for errors, it does not throw any as long as your regex is correct. Ironically, yours is not: the special character * must be escaped. So, here is one reason for choosing option 1 :)
text.replace(/\*/g, 'x');
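If the search text comes from a variable rather than being typed into a regex literal, escaping by hand is easy to forget. A minimal sketch, assuming you want to treat the search text as a plain literal; the helper names escapeRegExp and replaceAllLiteral are made up for this example:

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace all occurrences of a literal search string.
function replaceAllLiteral(text, search, replacement) {
  return text.replace(new RegExp(escapeRegExp(search), 'g'), replacement);
}

console.log(replaceAllLiteral('2 * 3 * 4', '*', 'x')); // 2 x 3 x 4
```

The replacement string is still interpreted by replace(), so patterns such as $& in it keep their special meaning.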

No brainer - your second option. Don't invoke a loop if you can avoid it. Javascript is a functional language; use it!
An even better solution is to chain them as follows:
text = text.replace(/ /g, "_").replace(/\*/g, "x");
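To check that both versions give the same result, here is a small sketch that wraps each one in a function (the function names are made up for this example) and compares them on a sample string:

```javascript
// Loop version from the question: replaces one occurrence per iteration.
function replaceWithLoop(text) {
  while (text.indexOf(' ') !== -1) text = text.replace(' ', '_');
  while (text.indexOf('*') !== -1) text = text.replace('*', 'x');
  return text;
}

// Regex version: one pass per character, using the global flag.
function replaceWithRegex(text) {
  return text.replace(/ /g, '_').replace(/\*/g, 'x');
}

const sample = 'a b * c';
console.log(replaceWithLoop(sample));                             // a_b_x_c
console.log(replaceWithLoop(sample) === replaceWithRegex(sample)); // true

// Caveat: the loop never terminates if the replacement contains the
// search text (e.g. replacing ' ' with '_ '), while the regex version
// simply replaces each original occurrence once.
```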


Finding text strings in JavaScript

I have a large valid JavaScript file (utf-8), from which I need to extract all text strings automatically.
For simplicity, the file doesn't contain any comment blocks in it, only valid ES6 JavaScript code.
Once I find an occurrence of ' or " or `, I'm supposed to scan for the end of the text block, which is where I got stuck, given all the possible variations, like "'", '"', "\'", '\"', '", `\``, etc.
Is there a known and/or reusable algorithm for detecting the end of a valid ES6 JavaScript text block?
UPDATE-1: My JavaScript file isn't just large, I also have to process it as a stream, in chunks, so Regex is absolutely not usable. I didn't want to complicate my question by mentioning joined chunks of code; I will figure that out myself if I have an algorithm that can work for a single piece of code that's in memory.
UPDATE-2: I got this working initially, thanks to the many pieces of advice given here, but then I got stuck again, because of regular expressions.
Examples of Regular Expressions that break any of the text detection techniques suggested so far:
/'/
/"/
/\`/
Having studied the matter more closely, by reading this: How does JavaScript detect regular expressions?, I'm afraid that detecting regular expressions in JavaScript is a whole new ball game, worth a separate question, or else it gets too complicated. But I would very much appreciate it if somebody could point me in the right direction with this issue...
UPDATE-3: After much research I found, with regret, that I cannot come up with an algorithm that would work in my case, because the presence of regular expressions makes the task far more complicated than I initially thought. According to the following: When parsing Javascript, what determines the meaning of a slash?, determining the beginning and end of regular expressions in JavaScript is one of the most complex and convoluted tasks. And without it we cannot figure out whether the symbols ', " and ` are opening a text block or are inside a regular expression.
The only way to parse JavaScript is with a JavaScript parser. Even if you were able to use regular expressions, at the end of the day they are not powerful enough to do what you are trying to do here.
You could either use one of several existing parsers, that are very easy to use, or you could write your own, simplified to focus on the string extraction problem. I hardly imagine you want to write your own parser, even a simplified one. You will spend much more time writing it and maintaining it than you might think.
For instance, an existing parser will handle something like the following without breaking a sweat.
`foo${"bar"+`baz`}`
The obvious candidates for parsers to use are esprima and babel.
By the way, what are you planning to do with these strings once you extract them?
If you only need an approximate answer, or if you want to get the string literals exactly as they appear in the source code, then a regular expression can do the job.
Given the string literal "\n", do you expect a single-character string containing a newline or the two characters backslash and n?
In the former case you need to interpret escape sequences exactly like a JavaScript interpreter does. What you need is a lexer for JavaScript, and many people have already programmed this piece of code.
In the latter case the regular expression has to recognize escape sequences like \x40 and \u2026, so even in that case you should copy the code from an existing JavaScript lexer.
See https://github.com/douglascrockford/JSLint/blob/master/jslint.js, function tokenize.
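The difference between the two readings of "\n" can be shown in a few lines. This is only an illustration: JSON.parse is used as a stand-in for a real lexer, and it works here only because this particular literal happens to be valid JSON (escapes such as \x40, or single-quoted strings, are not).

```javascript
// The source text of the literal: quote, backslash, n, quote.
const sourceText = String.raw`"\n"`;

// "As it appears in the source": just strip the surrounding quotes.
const rawContents = sourceText.slice(1, -1);

// "As the interpreter sees it": the escape sequence is decoded.
const cookedValue = JSON.parse(sourceText);

console.log(rawContents.length); // 2 (backslash and n)
console.log(cookedValue.length); // 1 (a single newline character)
```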
Try the code below (a naive scan that ignores escape sequences, comments and regular expressions):
var txt = "var z,b \n;z=10;\n b='321`1123`321321';\n c='321`321`312`3123`';";
function fetchStrings(txt) {
var result = [];
for (var i = 0; i < txt.length; i++) {
// Characters that can open a string literal
if (txt[i] == "'" || txt[i] == '"' || txt[i] == "`") {
// Find the next occurrence of the same quote character
var end = txt.indexOf(txt[i], i + 1);
// Stop if the string is never closed
if (end === -1) break;
result.push(txt.slice(i + 1, end));
// Jump to the closing quote; the loop's i++ then moves past it
i = end;
}
}
return result;
}
console.log(fetchStrings(txt));
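A scan like this breaks as soon as a string contains an escaped quote. Here is a slightly more careful sketch that honors backslash escapes. It is not a JavaScript lexer: it assumes the input has no comments, no regular expression literals and no template literals, and the name scanQuotedStrings is made up for this example.

```javascript
// Returns the raw contents of '...' and "..." literals, skipping escaped characters.
function scanQuotedStrings(source) {
  const found = [];
  let i = 0;
  while (i < source.length) {
    const quote = source[i];
    if (quote !== "'" && quote !== '"') {
      i++;
      continue;
    }
    // Walk forward until the matching, unescaped closing quote.
    let j = i + 1;
    while (j < source.length && source[j] !== quote) {
      if (source[j] === '\\') j++; // skip the character after a backslash
      j++;
    }
    if (j >= source.length) break; // unterminated string: give up
    found.push(source.slice(i + 1, j));
    i = j + 1;
  }
  return found;
}

const sampleCode = String.raw`var a = "it's"; var b = 'say \'hi\''; var c = "x\\";`;
console.log(scanQuotedStrings(sampleCode).length); // 3
```

The contents are returned exactly as written in the source, so escape sequences are still undecoded.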

Regex Delimiter in PapaParse

I wish to ask, is it possible to use regexes as delimiters in PapaParse? Something like:
Papa.parse(string,
{
delimiter:regex
}
);
I am trying to match a specific pattern of CSVs like so:
/([A-Za-z]{2}[0-9]+,?)/g
i.e. I want exactly 2 letters, any amount of numbers, and a comma (or not, in the case of the last element).
Since string.split has a wonderful habit of returning anything but null when nothing matches regex patterns, I was hoping that my answer would lie in PapaParse. If this is not possible, then I will do something more long-winded, but hopefully I can be laz-... efficient this time. :)
Trying to do the following:
Papa.parse('ACB5,dsa',{delimiter:'[A-Za-z]{2}[0-9]+,?'});
Results in
["ACB5","dsa"]
Thank you for your time.
edit
Trying out the regex on regexr.com shows that it works with values like
AB544444444444,BC5,
aa5,
At this point, I realize that this was actually a dozy question, considering how a delimiter is the thing that separates what you want to break up.
I'm writing the longer-winded version now, so I'll put that up soon.
As Matt (and common sense) rightly say, yes, the delimiter is just ye olde comma. I was looking for a way to separate the results based on a regex, which past me had thought would have some similarity to how string.split works. This is the snippet I was trying to shrink down.
var result = null;
var regex = /([A-Za-z]{2}[0-9]+,?)/g; //Any two letters, one or more digits, and an optional comma
$(document).ready( function() {
$('#inputGraphText').on('input', function(e){ //on text input...
result = $(this).val().split(','); //split using the complex delimiter ','. Also adds a few "" elements to the array for some reason.
var tidy = new Array();
result.forEach(function(elem){
if(elem.search(regex) > -1){
tidy.push(elem.replace(/[, ]/g,''));//Tidy up the result: strip commas and spaces (a regex literal, not a quoted string)
}
});
$('#first').html(tidy); //place to print out the tidied results
})
});
Obviously, this is not terribly snazzy (and completely misses out on using PapaParse), but it is what I originally set out to do.
Any better alternatives will take pride of place, but for now, this is fine.
My apologies for the confusion.
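One possible tidier alternative, without jQuery or PapaParse, is to split on the comma and keep only the fields that match the pattern as a whole. This is a sketch under the assumption that a valid field is exactly two letters followed by one or more digits; the name extractIds is made up for this example.

```javascript
// Anchored, non-global pattern: the whole field must be two letters plus digits.
const idPattern = /^[A-Za-z]{2}[0-9]+$/;

function extractIds(input) {
  return input
    .split(',')
    .map(function (field) { return field.trim(); })
    .filter(function (field) { return idPattern.test(field); });
}

console.log(extractIds('AB544444444444,BC5,aa5,')); // [ 'AB544444444444', 'BC5', 'aa5' ]
console.log(extractIds('ACB5,dsa'));                // []
```

Filtering also drops the empty strings that split(',') produces for a trailing comma.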

Performance about replace() or substr() in Javascript

I was wondering about the performance of string.replace() versus string.substr() in JavaScript. Let me explain what I'm doing.
I've a string like
str = "a.aa.a.aa."
I just have to "pop" the last element of str, and I always know what character it is (e.g., it's a dot here).
It's simple, and there are many ways to do it, like
str = str.substr(0, str.length-1) // same as using slice()
or
str = str.replace(/\.$/, '')
Which method would you use? Why? Is there a performance penalty with either method? The length of the string is negligible.
(this is my first post, so if I'm doing something wrong please, notify me!)
For performance tests in JavaScript use jsPerf.com
I created a test case for your question here, which shows that substr is a lot faster (at least in Firefox).
If you just want the last character in the string, then use the subscript, not some replacement:
str[str.length-1]
Do you have to do this thousands of times in a loop? If not (and "Length of string is negligible"), any way will do.
That said, I'd prefer the first option, since it makes the intention of trimming the last character clearer than the second one. It is also faster, in case you do need to run this a zillion times: in the regex case, you need to not only build a new string but also compile a RegExp and run it against the input.
When you have this kind of doubt, either pick what you like the best (style-speaking, as running this only once doesn't matter much), or use http://jsperf.com.
For this very example, see here why substr is better :-).
The substr way should always be faster than any kind of RegExp. But the performance difference should be minor.
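Apart from speed, the two approaches behave differently when the string does not end with the expected character. A short sketch (using slice, which the question notes is equivalent here, since substr is considered a legacy method):

```javascript
const withDot = 'a.aa.a.aa.';
const withoutDot = 'a.aa.a.aa';

// slice removes the last character unconditionally.
console.log(withDot.slice(0, -1));    // a.aa.a.aa
console.log(withoutDot.slice(0, -1)); // a.aa.a.a

// The regex removes the last character only if it is a dot.
console.log(withDot.replace(/\.$/, ''));    // a.aa.a.aa
console.log(withoutDot.replace(/\.$/, '')); // a.aa.a.aa
```

If the trailing dot is guaranteed, as in the question, both give the same result and the choice is mostly about readability.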

Fastest way to remove hyphens from a string

I have IDs that look like: 185-51-671 but they can also have letters at the end, 175-1-7b
All I want to do is remove the hyphens, as a pre-processing step. Show me some cool ways to do this in javascript? I figure there are probably quite a few questions like this one, but I'm interested to see what optimizations people will come up with for "just hyphens"
Thanks!
edit: I am using jQuery, so I guess .replace(a,b) does the trick (replacing a with b)
numberNoHyphens = number.replace("-","");
any other alternatives?
edit #2:
So, just in case anyone is wondering, the correct answer was
numberNoHyphens = number.replace(/-/g,"");
and you need the "g" which is the pattern switch or "global flag" because
numberNoHyphens = number.replace(/-/,"");
will only match and replace the first hyphen
You need to include the global flag:
var str="185-51-671";
var newStr = str.replace(/-/g, "");
This is not faster, but
str.split('-').join('');
should also work.
I set up a jsperf test if anyone wants to add and compare their methods, but it's unlikely anything will be faster than the replace method.
http://jsperf.com/remove-hyphens-from-string
var str='185-51-671';
str=str.replace(/-/g,'');
This gets much easier with String.prototype.replaceAll(). Check the browser support for this built-in method first.
const str = '185-51-671';
console.log(str.replaceAll('-', ''));
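For comparison, here are the approaches from this thread side by side. The replaceAll call is guarded with a feature check, falling back to the regex on runtimes that lack the method; that fallback is an assumption of this sketch, not something the answers above prescribe.

```javascript
const id = '175-1-7b';

// A string pattern replaces only the first match.
const firstOnly = id.replace('-', '');

// The global flag replaces every match.
const viaRegex = id.replace(/-/g, '');

// Splitting on the hyphen and joining the pieces back together.
const viaSplit = id.split('-').join('');

// replaceAll where available, otherwise fall back to the regex.
const viaReplaceAll = typeof id.replaceAll === 'function'
  ? id.replaceAll('-', '')
  : id.replace(/-/g, '');

console.log(firstOnly);     // 1751-7b
console.log(viaRegex);      // 17517b
console.log(viaSplit);      // 17517b
console.log(viaReplaceAll); // 17517b
```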
Some of these answers, prior to edits, did not remove all of the hyphens. You would need to use .replaceAll("-","")
If you are working in R rather than JavaScript, the tidyverse (stringr) has functions for this. str_remove replaces the first match of the given pattern in a string with an empty string (""), effectively removing it; to remove every hyphen, use str_remove_all:
str_remove_all(x, '-')

Processing Javascript RegEx submatches

I am trying to write some JavaScript RegEx to replace user-inputted tags with real HTML tags, so [b] will become <b> and so forth. The RegEx I am using looks like this
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;
with the following JavaScript
s.replace(exptags,"<$1>$2</$1>");
this works fine for single nested tags, for example:
[b]hello[/b] [u]world[/u]
but if the tags are nested inside each other it will only match the outer tags, for example
[b]foo [u]to the[/u] bar[/b]
this will only match the b tags. How can I fix this? Should I just loop until the starting string is the same as the outcome? I have a feeling that the ((.){1,}?) pattern is wrong also?
Thanks
The easiest solution would be to replace all the tags, whether they are closed or not, and let .innerHTML work out whether they are matched. It will be much more resilient that way.
var tagreg = /\[(\/?)(b|u|i|s|center|code)]/ig
div.innerHTML="[b][i]helloworld[/b]".replace(tagreg, "<$1$2>") //no closing i
//div.innerHTML == "<b><i>helloworld</i></b>"
AFAIK you can't express recursion with regular expressions.
You can however do that with .NET's System.Text.RegularExpressions using balanced matching. See more here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
If you're using .NET you can probably implement what you need with a callback.
If not, you may have to roll your own little javascript parser.
Then again, if you can afford to hit the server you can use the full parser. :)
What do you need this for, anyway? If it is for anything other than a preview I highly recommend doing the processing server-side.
You could just repeatedly apply the regexp until it no longer matches. That would do odd things like "[b][b]foo[/b][/b]" => "<b>[b]foo</b>[/b]" => "<b><b>foo</b></b>", but as far as I can see the end result will still be a sensible string with matching (though not necessarily properly nested) tags.
Or if you want to do it 'right', just write a simple recursive descent parser. Though people might expect "[b]foo[u]bar[/b]baz[/u]" to work, which is tricky to recognise with a parser.
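The "repeatedly apply until it no longer matches" idea can be sketched as a function that stops once a pass leaves the string unchanged. The name bbcodeToHtml and the maxPasses safety limit are assumptions of this example, and the pattern is a tidied form of the one in the question. No HTML escaping is done here.

```javascript
const tagPattern = /\[(b|u|i|s|center|code)\](.+?)\[\/\1\]/ig;

function bbcodeToHtml(input, maxPasses = 20) {
  let current = input;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = current.replace(tagPattern, '<$1>$2</$1>');
    if (next === current) break; // nothing left to replace
    current = next;
  }
  return current;
}

console.log(bbcodeToHtml('[b]foo [u]to the[/u] bar[/b]')); // <b>foo <u>to the</u> bar</b>
console.log(bbcodeToHtml('[b][b]foo[/b][/b]'));            // <b><b>foo</b></b>
```

Each pass converts the outermost pairs it can find, so the number of passes grows with the nesting depth.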
The reason the nested block doesn't get replaced is because the match, for [b], places the position after [/b]. Thus, everything that ((.){1,}?) matches is then ignored.
It is possible to write a recursive parser server-side: Perl uses qr// and Ruby probably has something similar.
Though, you don't necessarily need true recursion. You can use a relatively simple loop to handle the string equivalently:
var s = '[b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]';
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;
while (s.match(exptags)) {
s = s.replace(exptags, "<$1>$2</$1>");
}
document.writeln('<div>' + s + '</div>'); // after
In this case, it'll make 2 passes:
0: [b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]
1: <b>hello</b> <u>world</u> <b>foo [u]to the[/u] bar</b>
2: <b>hello</b> <u>world</u> <b>foo <u>to the</u> bar</b>
Also, a few suggestions for cleaning up the RegEx:
var exptags = /\[(b|u|i|s|center|code)\](.+?)\[\/(\1)\]/ig;
{1} is assumed when no other count specifiers exist
{1,} can be shortened to +
Agree with Richard Szalay, but his regex didn't get quoted right:
var exptags = /\[(b|u|i|s|center|code)](.*)\[\/\1]/ig;
is cleaner. Note that I also change .+? to .*. There are two problems with .+?:
you won't match [u][/u], since there isn't at least one character between them (+)
a non-greedy match won't deal as nicely with the same tag nested inside itself (?)
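The trade-off between .+? and .* is easiest to see on a few inputs, each run through a single replace pass. The helper toHtml is made up for this sketch. Note the third case: the greedy version has its own problem with sibling tags of the same kind.

```javascript
const lazy = /\[(b|u|i|s|center|code)\](.+?)\[\/\1\]/ig;
const greedy = /\[(b|u|i|s|center|code)\](.*)\[\/\1\]/ig;

function toHtml(text, pattern) {
  return text.replace(pattern, '<$1>$2</$1>');
}

// Empty tag pair: only .* matches.
console.log(toHtml('[u][/u]', lazy));   // [u][/u]
console.log(toHtml('[u][/u]', greedy)); // <u></u>

// Same tag nested inside itself: .+? pairs the outer opener with the inner closer.
console.log(toHtml('[b]a[b]b[/b]c[/b]', lazy));   // <b>a[b]b</b>c[/b]
console.log(toHtml('[b]a[b]b[/b]c[/b]', greedy)); // <b>a[b]b[/b]c</b>

// Sibling tags of the same kind: .* swallows everything up to the last closer.
console.log(toHtml('[b]x[/b] y [b]z[/b]', lazy));   // <b>x</b> y <b>z</b>
console.log(toHtml('[b]x[/b] y [b]z[/b]', greedy)); // <b>x[/b] y [b]z</b>
```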
Yes, you will have to loop. Alternatively, since your tags look so much like HTML ones, you could replace [b] with <b> and [/b] with </b> separately. (.){1,}? is the same as (.*?) - that is, any symbols, least possible sequence length.
Updated: Thanks to MrP, (.){1,}? is (.)+?, my bad.
How about:
tagreg=/\[(.?)?(b|u|i|s|center|code)\]/gi;
"[b][i]helloworld[/i][/b]".replace(tagreg, "<$1$2>");
"[b]helloworld[/b]".replace(tagreg, "<$1$2>");
For me the above produces:
<b><i>helloworld</i></b>
<b>helloworld</b>
This appears to do what you want, and has the advantage of needing only a single pass.
Disclaimer: I don't code often in JS, so if I made any mistakes please feel free to point them out :-)
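One thing to watch in the pattern above: (.?)? accepts any single character before the tag name, not just a slash. A sketch comparing it with a stricter \/? group (the variable names are made up for this example):

```javascript
const loose = /\[(.?)?(b|u|i|s|center|code)\]/gi;
const strict = /\[(\/?)(b|u|i|s|center|code)\]/gi;

// The loose pattern turns an unknown tag into HTML.
console.log('[xb]'.replace(loose, '<$1$2>'));  // <xb>
console.log('[xb]'.replace(strict, '<$1$2>')); // [xb]

// Both handle the intended tags in a single pass.
console.log('[b][i]helloworld[/i][/b]'.replace(strict, '<$1$2>')); // <b><i>helloworld</i></b>
```

Neither version checks that tags are balanced or escapes HTML already present in the input.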
You are right about the inner pattern being troublesome.
((.){1,}?)
That is doing a captured match at least once and then the whole thing is captured. Every character inside your tag will be captured as a group.
You are also capturing your closing element name when you don't need it and are using {1} when that is implied. Below is a cleaned-up version:
/\[(b|u|i|s|center|code)](.+?)\[\/\1]/ig
Not sure about the other problem.
