Substring arguments best practice - javascript

The JavaScript String object has two substring functions substring and substr.
substring takes two parameters beginIndex and endIndex.
substr also takes two parameters beginIndex and length.
It's trivial to convert any range between the two variants, but I wonder if there's any significance to how the two would normally be used (in day-to-day programming). I tend to favor the index/length variant but I have no good explanation as to why.
I guess it depends on what kind of programming you do, but if you have strong opinion on the matter, I'd like to hear it.
When is an (absolute, relative) range more suited than an (absolute, absolute) one, and vice versa?
Update:
This is not a JavaScript question per se (JavaScript just happens to implement both variants [which I think is stupid]), but what practical implications does the relative vs. absolute range have? I'm looking for a solid argument for why we prefer one over the other. To broaden the debate a bit, how would you prefer to design your data structures for use with either approach?

I prefer the startIndex, endIndex variant (substring) because String.substring() operates the same way in Java and I feel it makes me more efficient to stick to the same concepts in whatever language I use most often (when possible).
If I were doing more C# work, I might use the other variant more because that is how String.Substring() works in C#.
To answer your comment about JavaScript having both, it looks like substr() was added to browsers after substring() (reference - it seems that although substr() was part of JavaScript 1.0, most browser vendors didn't implement it until later). This suggests to me that even the implementers of the early language recognized the duplication of functionality. I'd suggest substring() came first in an attempt to leverage the JavaScript trademark. Regardless, it seems that they recognized this duplication in ECMA-262 and took some small steps toward removing it:
substring(): ECMA Version: ECMA-262
substr(): ECMA Version: None, although ECMA-262 ed. 3 has a non-normative section suggesting uniform semantics for substr
Personally I wouldn't mind a substring() where the second parameter can be negative, which would return the characters between the first parameter and the length of the string minus the second parameter. Of course you can already achieve that more explicitly and I imagine the design would be confusing to many developers:
String s1 = "The quick brown fox jumps over the lazy dog";
String s2 = s1.substring(20, -13); // "jumps over"

When is an (absolute, relative) range more suited than an (absolute, absolute) one, and vice versa?
The former, when you know how much, the latter when you know where.
I presume substring is implemented in terms of substr:
substring( b, e ) {
return substr( b, e - b );
}
or substr in terms of substring:
substr( b, l) {
return substring( b, b + l );
}
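As a rough JavaScript sketch of the two conversions above and of the "how much" vs. "where" distinction (the helper names byLength and byEnd are made up for illustration, and this says nothing about how engines actually implement either method):

```javascript
// Hypothetical helpers; names are illustrative, not from any library.
function byLength(str, begin, length) {
  // (absolute, relative) range expressed with substring
  return str.substring(begin, begin + length);
}

function byEnd(str, begin, end) {
  // (absolute, absolute) range expressed with substr
  return str.substr(begin, end - begin);
}

const s = "The quick brown fox";

// "Know how much": take a fixed-width field of 5 characters.
const field = byLength(s, 4, 5);

// "Know where": take the text between two located positions.
const start = s.indexOf("quick");
const end = s.indexOf(" brown");
const between = byEnd(s, start, end);

console.log(field, between); // quick quick
```
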

I slightly prefer the startIndex, endIndex variant, since then to get the last bit of a string I can do:
string foo = bar.substring(5, bar.length());
instead of:
string foo = bar.substr(5, bar.length() - 5);

It depends on the case, but I more often find I know exactly how many characters I want to take out, and prefer the start-with-length parameterization. But I could easily see a case where I've searched a long string for two tokens and now have their indexes. While it's trivial math to use either variant, in this case I might prefer the start and end indexes.
Also, from a document writer's perspective, having two parameters of the same basic meaning is probably easier to write about and an easier mnemonic.
Each of these functions does neat saves when given strange values, such as an end smaller than a start, a negative length, a negative start, or a length or end beyond the string's end.
For JavaScript the best practice is to use substring over substr because it's supported in more (albeit usually older) browsers. If they'd gone with BasicScript instead would there have been a MID() and a MIDDLE() function? Who doesn't love BASIC syntax?
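To make the "strange values" remark concrete, here is a small sketch of how the two JavaScript methods treat out-of-range arguments. The sample string is my own.

```javascript
const s = "abcdef";

// substring: swaps the arguments when end < start, clamps negatives to 0
// and clamps anything past the end to the string length.
console.log(s.substring(4, 1));   // "bcd"
console.log(s.substring(-2, 3));  // "abc"
console.log(s.substring(2, 100)); // "cdef"

// substr: a negative start counts from the end, a negative length
// yields an empty string, and an oversized length is clamped.
console.log(s.substr(-2));        // "ef"
console.log(s.substr(2, -1));     // ""
console.log(s.substr(2, 100));    // "cdef"
```
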


What is the time complexity or Big O notation for "str.replace()" built In function in Javascript?

I am confused if the time complexity for str.replace() function is O(n) or O(1), for example:
var str = "Hello World";
str = str.replace("Hello", "Hi");
console.log(str);
//===> str = "Hi World"
Is it always the same answer or does it depend on what we replace?
Any thoughts or helpful links?!
Firstly it should be
str = str.replace("Hello", "Hi");
Secondly,
searching for a substring inside a string can be done in linear time using an algorithm such as KMP.
Replacing in the worst case will take linear time as well.
So overall time complexity: O(n)
Here n is dependent on the string str.
In the worst case it will end up traversing the whole string and still not find the searchValue given to the replace function.
It's definitely not O(1) (see a comparison of string searching algorithms), but ECMAScript 6 doesn't dictate the search algorithm:
Search string for the first occurrence of searchString and let pos be the index within string of the first code unit of the matched substring and let matched be searchString. If no occurrences of searchString were found, return string.
So it depends on the implementation.
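A minimal sketch of the quoted step for a plain string search value, to show where the cost sits. replaceFirst is a hypothetical name, and the sketch ignores the special "$" replacement patterns and function replacers that the real method supports:

```javascript
// Hypothetical helper mirroring the quoted spec step for a string searchValue.
function replaceFirst(string, searchString, replacement) {
  // The search itself: its cost depends on the engine's algorithm.
  const pos = string.indexOf(searchString);
  // No occurrence found: return the input unchanged.
  if (pos === -1) return string;
  // Building the result copies the whole string, which is linear in its length.
  return string.slice(0, pos) + replacement + string.slice(pos + searchString.length);
}

console.log(replaceFirst("Hello World", "Hello", "Hi")); // "Hi World"
```

Even with a very fast search, the result is a new string, so the copy alone keeps the operation from being O(1).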
Is it always the same answer or does it depend on what we replace?
Generally, it will be slower for longer search strings. How much slower is implementation-dependent.
You'll really have to look into implementation details for a complete answer. But to start there's V8's runtime-strings.cc and builtins-string-gen.cc. It's a deep dive--and I don't know C++, so I'm not entirely sure if I'm even looking at the right files, but they seem to use different approaches depending on the size of the needle and the depth of recursion needed to build a search tree.
For example, in builtins-string-gen.cc there's a block under ES6 #sec-string.prototype.replace that checks if the search_string has a length of 1, and if the subject_string length is greater than 255 (0xFF). When those conditions are true it looks like Runtime_StringReplaceOneCharWithString in runtime-strings.cc is called which in turn will try calling StringReplaceOneCharWithString first with a tree-traversable subject_string.
If that search hits the recursion limit, the runtime makes another call to StringReplaceOneCharWithString but this time with a flattened subject_string.
So, my partially educated guess here is you're always looking at some kind of linear time. Possibly O(mn) when hitting the recursion limit and doing a follow-on naive search. I don't know for sure that it's a naive search, but a flattened string to me implies traversing the subject_string step-by-step instead of through a search tree.
And possibly something less than O(mn) when the tree-traversal doesn't hit the recursion limit, though I'm not entirely sure what they're gaining by walking the subject_string recursively.
For the actual time complexity in JavaScript implementations, you'll probably want to ask the runtime devs directly, or see whether what they're doing resembles other string searching algorithms to figure out which cases run in what time complexity.

Math.pow alternative "**" ES7 polyfill for IE11

I'm trying to evaluate an expression that contains a power operator, written in a string as **, e.g. eval("(22**3)/12*6+3/2"). The problem is that Internet Explorer 11 does not recognize this and throws a syntax error. Which polyfill should I use to overcome this? Right now I'm using Modernizr 2.6.2.
example equation would be,
((1*2)*((3*(4*5)*(1+3)**(4*5))/((1+3)**(4*5)-1)-1)/6)/7
((1*2)*((3*(4*5)*(1+3)**(4*5))/((1+3)**(4*5)-1)-1)/6)/7*58+2*5
(4*5+4-5.5*5.21+14*36**2+69/0.258+2)/(12+65)
If it is not possible to do this, what are the possible alternatives?
You cannot polyfill operators - only library members (prototypes, constructors, properties).
As your operation is confined to an eval call, you could attempt to write your own expression parser, but that would be a lot of work.
(As an aside, you shouldn't be using eval anyway, for very good reasons that I won't get into in this posting).
Another (hack-ish) option is to use a regular expression to identify trivial cases of x**y and convert them to Math.pow:
function detectAndFixTrivialPow( expressionString ) {
var pattern = /(\w+)\*\*(\w+)/i;
var fixed = expressionString.replace( pattern, 'Math.pow($1,$2)' );
return fixed;
}
eval( detectAndFixTrivialPow( "foo**bar" ) );
You can use a regular expression to replace the occurrences of ** with Math.pow() invocations:
let expression = "(22**3)/12*6+3/2"
let processed = expression.replace(/(\w+)\*\*(\w+)/g, 'Math.pow($1,$2)');
console.log(processed);
console.log(eval(processed));
Things might get complicated if you start using nested or chained power expressions though.
I think you need to do some preprocessing of the input. Here is how I would approach this:
Find "**" in string.
Check what is on the left and right.
Extract "full expressions" from left and right - if there is just a number - take it as is, and if there is a bracket - find the matching one and take whatever is inside as an expression.
Replace the 2 expressions with Math.pow(left, right)
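The steps above could be sketched roughly like this. rewritePow is a hypothetical helper of my own, and it assumes there is no whitespace around ** and no unary minus directly on an operand:

```javascript
// Hypothetical helper: rewrites every a**b in an arithmetic string to Math.pow(a,b).
// Assumes no whitespace around "**" and no unary minus directly on an operand.
function rewritePow(expr) {
  const isWord = (ch) => ch !== undefined && /[\w.]/.test(ch);
  let at;
  // Work from the rightmost "**" so chained powers stay right-associative.
  while ((at = expr.lastIndexOf("**")) !== -1) {
    // Left operand: a bracketed group (scan back to its matching "(")
    // and/or a run of word characters such as a number or a name.
    let start = at;
    if (expr[start - 1] === ")") {
      let depth = 0;
      do {
        start--;
        if (expr[start] === ")") depth++;
        if (expr[start] === "(") depth--;
      } while (depth > 0 && start > 0);
    }
    while (isWord(expr[start - 1])) start--;

    // Right operand: word characters, then an optional bracketed group.
    // This also covers a Math.pow(...) call produced by an earlier pass.
    let end = at + 2;
    while (isWord(expr[end])) end++;
    if (expr[end] === "(") {
      let depth = 0;
      do {
        if (expr[end] === "(") depth++;
        if (expr[end] === ")") depth--;
        end++;
      } while (depth > 0 && end < expr.length);
    }

    const left = expr.slice(start, at);
    const right = expr.slice(at + 2, end);
    expr = expr.slice(0, start) + "Math.pow(" + left + "," + right + ")" + expr.slice(end);
  }
  return expr;
}

console.log(rewritePow("(22**3)/12*6+3/2")); // (Math.pow(22,3))/12*6+3/2
console.log(rewritePow("2**3**2"));          // Math.pow(2,Math.pow(3,2))
```

The rewritten string contains no ** any more, so it can be passed to whatever evaluates the expression on IE11.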
You can use Babel online to convert javascript for IE 11.

Most efficient regex for checking if a string contains at least 3 alphanumeric characters

I have this regex:
(?:.*[a-zA-Z0-9].*){3}
I use it to see if a string has at least 3 alphanumeric characters in it. It seems to work.
Examples of strings it should match:
'a3c'
'_0_c_8_'
' 9 9d '
However, I need it to work faster. Is there a better way to use regex to match the same patterns?
Edit:
I ended up using this regex for my purposes:
(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}
(no modifiers needed)
The most efficient regex approach is to use the principle of contrast, i.e. using opposite character classes side by side. Here is a regex that can be used to check if a string has 3 Latin script letters or digits:
^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}
See demo.
In case you need a full string match, you will need to append .* (or .*$ if you want to guarantee you will match all up to the end of string/line, but in my tests on regexhero, .* yields better performance):
^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}.*
Also, a lot depends on the engine. PCRE has auto-optimizations in place that consist of auto-possessification (i.e. it turns the * into *+, giving [^a-zA-Z0-9]*+).
See more details on password validation optimizations here.
(?:.*?[a-zA-Z0-9]){3}.*
You can use this. It is much faster and takes far fewer steps than yours. See demo. You would probably want to use ^ and $ anchors too, to make sure there are no partial matches.
https://regex101.com/r/nS2lT4/32
The reason is
(?:.*[a-zA-Z0-9].*){3}
   ^^
This actually consumes the whole string, and then the engine has to backtrack. When using the other regex, this is avoided.
Just consider this. Regular expressions are powerful because they're expressive and very flexible (with features such as look-ahead, greedy consumption and back-tracking). There will almost always be a cost to that, however minor.
If you want raw speed (and you're willing to give up the expressiveness), you may find that it's faster to bypass regular expressions altogether and just evaluate the string, such as with the following pseudo-code:
def hasThreeAlphaNums(str):
    alphanums = 0
    for pos = 0 to len(str) - 1:
        if str[pos] in set "[a-zA-Z0-9]":
            alphanums++
            if alphanums == 3:
                return true
    return false
It's a parser (a very simple one in this case), a tool that can be even more powerful than regular expressions. For a more concrete example, consider the following C code:
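#include <ctype.h>
int hasThreeAlphaNums (const char *str) {
    int count = 0;
    /* Walk the string until the terminating NUL. */
    for (; *str != '\0'; str++)
        /* Cast to unsigned char: isalnum is undefined for negative values. */
        if (isalnum ((unsigned char) *str))
            if (++count == 3)
                return 1;
    return 0;
}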
Now, as to whether or not that's faster for this specific case, that depends on many factors, such as whether the language is interpreted or compiled, how efficient the regex is under the covers, and so on.
That's why the mantra of optimisation is "Measure, don't guess!" You should evaluate the possibilities in your target environment.
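For what it's worth, the same hand-rolled loop could look like this in JavaScript. This is only a sketch: hasThreeAlphaNums here is my own port, limited to ASCII letters and digits like the [a-zA-Z0-9] class in the question, and whether it beats the regex in a given engine is exactly the thing you'd have to measure.

```javascript
// Hypothetical port of the loop above; counts ASCII letters and digits without a regex.
function hasThreeAlphaNums(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    const isAlnum =
      (c >= 48 && c <= 57) ||  // 0-9
      (c >= 65 && c <= 90) ||  // A-Z
      (c >= 97 && c <= 122);   // a-z
    // Stop as soon as the third alphanumeric character is seen.
    if (isAlnum && ++count === 3) return true;
  }
  return false;
}

console.log(hasThreeAlphaNums("_0_c_8_")); // true
console.log(hasThreeAlphaNums("__a_"));    // false
```
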

"+" Quantifier not doing its job?

I'm a novice programmer making a simple calculator in JavaScript for a school project, and instead of using eval() to evaluate a string, I made my own function calculate(exp).
Essentially, my program uses order of operations (PEMDAS, or Parenthesis, Exponents, Multiplication/Division, Addition/Subtraction) to evaluate a string expression. One of my regex patterns is like so ("mdi" for multiplication/division):
mdi = /(-?\d+(\.\d+)?)([\*\/])(-?\d+(\.\d+)?)/g; // line 36 on JSFiddle
What this does is:
-?\d+ finds an integer number
(\.\d+)? matches the decimal if there is one
[\*\/] matches the operator used (* or / for multiplication or division)
/g matches every occurrence in the string expression.
I loop through this regular expression's matches with the following code:
while((res = mdi.exec(exp)) !== null) { // line 69 on JSFiddle
exp = exp.replace(mdi,
function(match,$1,$3,$4,$5) {
if($4 == "*")
return parseFloat($1) * parseFloat($5);
else
return parseFloat($1) / parseFloat($5);
});
exp = exp.replace(doN,""); // this gets rid of double negatives
}
However, this does not work all the time. It only works with numbers with an absolute value less than 10. I cannot do any operations on numbers like 24 and -5232000321, even though the regex should match it with the + quantifier. It works with small numbers, but crashes and uses up most of my CPU when the numbers are larger than 10.
For example, when the expression 5*.5 is inputted, 2.5 is outputted, but when you input 75*.5 and press enter, the program stops.
I'm not really sure what's happening here, because I can't locate the source of the error for some reason - nothing is showing up even though I have console.log() all over my code for debugging, but I think it is something wrong with this regex. What is happening?
The full code (so far) is here at JSFiddle.net, but please be aware that it may crash. If you have any other suggestions, please tell me as well.
Thanks for any help.
The problem is
bzp = /^.\d/;
while((res = bzp.exec(result)) !== null) {
result = result.replace(bzp,
function($match) {
console.log($match + " -> 0 + " + $match);
return "0" + $match;
});
}
It keeps prepending zeros with no limit.
With that code removed, it works well.
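If you do want to keep that step, here is one possible fix. I'm assuming the intent was to put a 0 in front of a leading decimal point (".5" becoming "0.5"). prependZero is a name I made up.

```javascript
// Assumption: the goal is to turn ".5" into "0.5"; prependZero is a hypothetical name.
function prependZero(result) {
  const bzp = /^\.\d/; // escaped dot: matches only a literal leading "."
  // A single replace is enough: "0.5" no longer matches, so no loop is needed.
  return result.replace(bzp, (match) => "0" + match);
}

console.log(prependZero(".5"));   // "0.5"
console.log(prependZero("37.5")); // "37.5"

// Why the original looped forever: an unescaped "." matches any character,
// so a result like "37.5" matches, becomes "037.5", matches again, and so on.
console.log(/^.\d/.test("037.5")); // true
```
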
I have also cleaned your code, declared variables, and made it more maintainable: Demo
If you have any other suggestions, please tell me as well.
As pointed out in the comments, parsing your input by iteratively applying regular expressions is very ad-hoc. A better approach would be to actually construct a grammar for your input language and parse based on that. Here's an example grammar that basically matches your input language:
expr ::= term ( additiveOperator term )*
term ::= factor ( multiplicativeOperator factor )*
factor ::= number | '(' expr ')'
additiveOperator ::= '+' | '-'
multiplicativeOperator ::= '*' | '/'
The syntax here is pretty similar to regular expressions, where parentheses denote groups, * denotes zero-or-more repetitions, and | denotes alternatives. The symbols enclosed in single quotes are literals, whereas everything else is symbolic. Note that this grammar doesn't handle unary operators (based on your post it sounds like you assume a single negative sign for negative numbers, which can be parsed by the number parser).
There are several parser-generator libraries for JavaScript, but I prefer combinator-style parsers where the parser is built functionally at runtime rather than having to run a separate tool to generate the code for your parser. Parsimmon is a nice combinator parser for JavaScript, and the API is pretty easy to wrap your head around.
A parser usually returns some sort of a tree data structure corresponding to the parsed syntax (i.e. an abstract syntax tree). You then traverse this data structure in order to calculate the value of the arithmetic expression.
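To illustrate the parse-then-traverse idea without any library, here is a small hand-written recursive-descent sketch for that grammar. It is not Parsimmon and not the code from the fiddle. The function names and the node shape are my own, and it only handles unsigned numbers.

```javascript
// Hypothetical hand-written parser for the grammar above (not Parsimmon).
function tokenize(input) {
  const tokens = input.match(/\d*\.?\d+|[-+*/()]/g) || [];
  // Reject anything the token pattern silently skipped.
  if (tokens.join("") !== input.replace(/\s+/g, "")) {
    throw new SyntaxError("unrecognized character in input");
  }
  return tokens;
}

function parse(input) {
  const tokens = tokenize(input);
  let pos = 0;

  // factor ::= number | '(' expr ')'
  function factor() {
    const tok = tokens[pos++];
    if (tok === "(") {
      const node = expr();
      if (tokens[pos++] !== ")") throw new SyntaxError("expected ')'");
      return node;
    }
    if (tok === undefined || !/^[\d.]/.test(tok)) throw new SyntaxError("expected number");
    return { type: "num", value: parseFloat(tok) };
  }

  // term ::= factor ( multiplicativeOperator factor )*
  function term() {
    let node = factor();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      node = { type: "bin", op, left: node, right: factor() };
    }
    return node;
  }

  // expr ::= term ( additiveOperator term )*
  function expr() {
    let node = term();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      node = { type: "bin", op, left: node, right: term() };
    }
    return node;
  }

  const tree = expr();
  if (pos !== tokens.length) throw new SyntaxError("unexpected token");
  return tree;
}

// Walk the tree to compute the value.
function evaluate(node) {
  if (node.type === "num") return node.value;
  const l = evaluate(node.left);
  const r = evaluate(node.right);
  switch (node.op) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    default:  return l / r;
  }
}

console.log(evaluate(parse("2+3*4")));   // 14
console.log(evaluate(parse("(2+3)*4"))); // 20
console.log(evaluate(parse("75*.5")));   // 37.5
```

Each grammar rule maps to one function, so operator precedence falls out of the call structure instead of out of the order in which regexes are applied.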
I created a fiddle demonstrating parsing and evaluating of arithmetic expressions. I didn't integrate any of this into your existing calculator interface, but if you can understand how to use the parser
Mathematical expressions are not usually parsed and evaluated with regular expressions because of the number of permutations and combinations available. The fastest way so far is postfix notation, because other notations are not as fast. As they mention on Wikipedia:
In comparison testing of reverse Polish notation with algebraic
notation, reverse Polish has been found to lead to faster
calculations, for two reasons. Because reverse Polish calculators do
not need expressions to be parenthesized, fewer operations need to be
entered to perform typical calculations. Additionally, users of
reverse Polish calculators made fewer mistakes than for other types of
calculator. Later research clarified that the increased speed
from reverse Polish notation may be attributed to the smaller number
of keystrokes needed to enter this notation, rather than to a smaller
cognitive load on its users. However, anecdotal evidence suggests
that reverse Polish notation is more difficult for users to learn than
algebraic notation.
Full article: Reverse Polish Notation
And here you can also see other notations that are still far better than regex.
Calculator Input Methods
I would therefore suggest you change your algorithm to a more efficient one. Personally, I would prefer postfix.
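As a rough idea of what evaluating postfix input involves, here is a stack-based sketch. evalPostfix is a hypothetical helper, and it assumes the tokens are already in postfix order and separated by spaces. Converting from ordinary infix input is a separate step not shown here.

```javascript
// Hypothetical helper: evaluates space-separated postfix tokens with a stack.
function evalPostfix(input) {
  const stack = [];
  for (const tok of input.trim().split(/\s+/)) {
    if (tok === "+" || tok === "-" || tok === "*" || tok === "/") {
      // An operator consumes the two most recent operands.
      const right = stack.pop();
      const left = stack.pop();
      if (left === undefined) throw new SyntaxError("not enough operands");
      if (tok === "+") stack.push(left + right);
      else if (tok === "-") stack.push(left - right);
      else if (tok === "*") stack.push(left * right);
      else stack.push(left / right);
    } else {
      const n = parseFloat(tok);
      if (Number.isNaN(n)) throw new SyntaxError("bad token: " + tok);
      stack.push(n);
    }
  }
  if (stack.length !== 1) throw new SyntaxError("malformed expression");
  return stack[0];
}

console.log(evalPostfix("3 4 5 * +")); // 23
console.log(evalPostfix("75 .5 *"));   // 37.5
```
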

Performance about replace() or substr() in Javascript

I was wondering about JavaScript performance when using string.replace() or string.substr(). Let me explain what I'm doing.
I've a string like
str = "a.aa.a.aa."
I just have to "pop" last element in str where I always know what type of character it is (e.g, it's a dot here).
It's so simple, I can follow a lot of ways, like
str = str.substr(0, str.length-1) // same as using slice()
or
str = str.replace(/\.$/, '')
Which methods would you use? Why? Is there some lack in performance using this or that method? Length of the string is negligible.
(this is my first post, so if I'm doing something wrong please, notify me!)
For performance tests in JavaScript use jsPerf.com
I created a test case for your question here, which shows that substr is a lot faster (at least in Firefox).
If you just want the last character in the string, then use the subscript, not some replacement:
str[str.length-1]
Do you have to do this thousands of times in a loop? If not (and "Length of string is negligible"), any way will do.
That said, I'd prefer the first option, since it makes the intention of trimming the last character more clear than the second one (oh, and it's faster, in case you do need to run this a zillion times. Since in the regex case, you need to not only build a new string but also compile a RegExp and run it against the input.)
When you have this kind of doubt, either pick what you like the best (style-speaking, as running this only once doesn't matter much), or use http://jsperf.com.
For this very example, see here why substr is better :-).
The substr way should always be faster than any kind of RegExp. But the performance difference should be minor.
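If you'd rather check locally than rely on an online benchmark, something along these lines gives a first impression. It's only a sketch: timeIt is a made-up helper, and a loop like this is far cruder than a proper benchmark (no warm-up, no repeated runs), so treat the numbers as a hint at best.

```javascript
const str = "a.aa.a.aa.";

const viaSubstr = str.substr(0, str.length - 1); // drop the last char by length
const viaSlice = str.slice(0, -1);               // same result, negative end index
const viaReplace = str.replace(/\.$/, "");       // drops a trailing dot only

// Hypothetical timing helper; real benchmarks need warm-up and many runs.
function timeIt(fn, iterations) {
  const started = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return Date.now() - started; // elapsed milliseconds
}

const substrMs = timeIt(() => str.substr(0, str.length - 1), 1e6);
const replaceMs = timeIt(() => str.replace(/\.$/, ""), 1e6);

console.log({ viaSubstr, viaSlice, viaReplace, substrMs, replaceMs });
```

Note the behavioural difference too: the regex version leaves the string alone when it doesn't end in a dot, while substr and slice always drop the last character.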
