why the third option is better than regex?

why the third option is better than regex? - javascript

I think regex is pretty fast and the third option is confusing. What do you think?
http://jqfundamentals.com/book/ch09s12.html
// old way
if (type == 'foo' || type == 'bar') { ... }
// better
if (/^(foo|bar)$/.test(type)) { ... }
// object literal lookup
if (({ foo : 1, bar : 1 })[type]) { ... }

I'll humbly disagree with Rebecca Murphey and vote for simplicity, for the first option.
I think regex is pretty fast
Machine code is even faster, but we don't use it.
the third option is confusing
It's only confusing if you're unfamiliar with the trick. (And for people not used to seeing regex to compare two strings, second option will be even more confusing.)

I just made a rudimentary benchmark and I'm honestly not sure how she got those results...
http://jsbin.com/uzuxi4/2/edit
Regex seems to scale the best, but the first is by far the fastest on all modern browsers. The last is excruciatingly slow. I understand the complexity theory between the three, but in practice, it doesn't seem that she's correct.
Let alone the fact that the first also has the best readability, it also seems to be the fastest. I even nested loops to take advantage of any browser caching of literal tables or constants (to no avail).
Edit:
It appears that when an object is explicitly created, she is indeed correct, however: http://jsbin.com/uzuxi4/4/edit
function __hash() {
...
var type = 'bar';
var testobj = { foo : 1, bar : 1 };
var c = 0;
for (i = 0; i < 1000; i++) {
if (testobj[type]) {
for (j = 0; j < 10000; j++) {
if (testobj[type]) { c++; }
}
}
}
...
}
We see that once the object has an internal reference, the seek time drops to about 500 ms which is probably the plateau. Object key lookup may be the best for larger data-sets, but in practice I don't really see it as a viable option for every-day use.

The first option involves
potentially two string compares.
The second option involves a parse each time.
The third option does a simple hash of the string and then a hash table look
up, which is the most efficient in this case, in terms of the amount of work that needs to be done.
The third option also scales better than the other two as more alternative strings are added, because the first two are O(n) and the third is O(1) in the average case.
If we want to talk about which option is prettier / more maintainable, that's a whole separate conversation.

The first case should really be done with === to avoid any type coercions, but depending on the number of alternatives you need to check it can become O(N), however depending on your code most JS engines will be able to a simple pointer check for the comparison.
In the second case you use a RegExp, and while RegExps are very fast, they tend to be slower for simple equality decisions than more direct equality comparisons. Simple string comparisons like yours are likely to be a pointer compare in a modern JS engine, but if you use a regexp the regexp must read every character.
The third case is more tricky -- if you do have a lot of values to check it may be faster, especially if you cache the object rather than repeatedly recreating it as it will simply be a hash lookup -- the exact performance of the lookup depends on the engine though.
I suspect a switch statement would beat the object literal case though.
Out of curiosity I made a test (which you can see here), the fastest approach (in a webkit nightly at least) seems to be a switch statement, followed by if, followed by the object, with regexp's last.

Just wanted to weigh in here and remind everyone that this is an open-source book with contributions from many people! The section being discussed, indeed, is based on content provided by a community member. If you have suggestions for improving the section, by all means, please open an issue on the repository, or better, fork the repo and send me a pull request :)
That said, I have just set up a jsPerf test (http://jsperf.com/string-tests), and at least in Chrome, the results are the opposite of what the book says. I've opened an issue on the book, and will try to deal with this in the near future.
Finally, two things:
I want to echo what another commenter said: perf optimizations are fun to talk about, and while there are some that really do matter, many don't. It's important to keep perspective on how much -- or little -- of a difference stuff like this makes.
I also want to echo the commenter who said, essentially, that readability is in the eyes of the beholder. Something confusing to one person may be perfectly clear to another. I do believe we should strive for readability, but I think there's a happy medium. Reading code that was a bit perplexing to me at first opened my eyes to a lot of great techniques; I'd have hated if it had been written so the complete newb that I was at the time could understand it.

The object literal lookup is optimized with a hash lookup which only requires one logical check instead of n. In a longer list you also won't have to repeat "type == " a zillion times.

For simplicity and readability the first will win every time. It might not be as fast, but who cares unless it is in a heavily run loop.
Good compilers should optimize things like this away.

Related

Writing high-performance Javascript code without getting deoptimised

When writing performance-sensitive code in Javascript which operates on large numeric arrays (think a linear algebra package, operating on integers or floating-point numbers), one always wants the the JIT to help out as much as possible. Roughly this means:
We always want our arrays to be packed SMIs (small integers) or packed Doubles, depending on whether we're doing integer or floating-point calculations.
We always want to be passing the same type of thing to functions, so that they don't get labelled "megamorphic" and deoptimised. For instance, we always want to be calling vec.add(x, y) with both x and y being packed SMI arrays, or both packed Double arrays.
We want functions to be inlined as much as possible.
When one strays outside of these cases, a sudden and drastic performance dropoff occurs. This can happen for various innocuous reasons:
You might turn a packed SMI array into a packed Double array via a seemingly innocuous operation, like the equivalent of myArray.map(x => -x). This is actually the "best" bad case, since packed Double arrays are still very fast.
You might turn a packed array into a generic boxed array, for example by mapping the array over a function which (unexpectedly) returned null or undefined. This bad case is fairly easy to avoid.
You might deoptimise a whole function such as vec.add() by passing in too many types of things and turning it megamorphic. This could happen if you want to do "generic programming", where vec.add() is used both in cases where you're not being careful about types (so it sees a lot of types come in) and in cases where you want to eke out maximum performance (it should only ever receive boxed doubles, for instance).
My question is more of a soft question, about how one writes high-performance Javascript code in light of the considerations above, while still keeping the code nice and readable. Some specific sub-questions so that you know what kind of answer I'm aiming for:
Is there a set of guidelines somewhere on how to program while staying in the world of packed SMI arrays (for instance)?
Is possible to do generic high-performance programming in Javascript without using something like a macro system to inline things like vec.add() into callsites?
How does one modularise high-performance code into libaries in light of things like megamorphic call sites and deoptimisations? For instance, if I am happily using Linear Algebra package A at high speed, and then I import a package B that depends on A, but B calls it with other types and deoptimises it, suddenly (without my code changing) my code runs slower.
Are there any good easy to use measurement tools for checking what the Javascript engine is doing internally with types?

V8 developer here. Given the amount of interest in this question, and the lack of other answers, I can give this a shot; I'm afraid it won't be the answer you were hoping for though.
Is there a set of guidelines somewhere on how to program while staying in the world of packed SMI arrays (for instance)?
Short answer: it's right here: const guidelines = ["keep your integers small enough"].
Longer answer: giving a comprehensive set of guidelines is difficult for various reasons. In general, our opinion is that JavaScript developers should write code that makes sense to them and their use case, and JavaScript engine developers should figure out how to run that code fast on their engines. On the flip side, there are obviously some limitations to that ideal, in the sense that some coding patterns will always have higher performance costs than others, regardless of engine implementation choices and optimization efforts.
When we talk about performance advice, we try to keep that in mind, and carefully estimate what recommendations have a high likelihood of remaining valid across many engines and many years, and also are reasonably idiomatic/non-intrusive.
Getting back to the example at hand: using Smis internally is supposed to be an implementation detail that user code doesn't need to know about. It'll make some cases more efficient, and shouldn't hurt in other cases. Not all engines use Smis (for example, AFAIK Firefox/Spidermonkey historically hasn't; I've heard that for some cases they do use Smis these days; but I don't know any details and can't speak with any authority on the matter). In V8, the size of Smis is an internal detail, and has actually been changing over time and over versions. On 32-bit platforms, which used to be the majority use case, Smis have always been 31-bit signed integers; on 64-bit platforms they used to be 32-bit signed integers, which recently seemed like the most common case, until in Chrome 80 we shipped "pointer compression" for 64-bit architectures, which required lowering Smi size to the 31 bits known from 32-bit platforms. If you happened to have based an implementation on the assumption that Smis are typically 32 bits, you'd get unfortunate situations like this.
Thankfully, as you noted, double arrays are still very fast. For numerics-heavy code, it probably makes sense to assume/target double arrays. Given the prevalence of doubles in JavaScript, it is reasonable to assume that all engines have good support for doubles and double arrays.
Is possible to do generic high-performance programming in Javascript without using something like a macro system to inline things like vec.add() into callsites?
"generic" is generally at odds with "high-performance". This is unrelated to JavaScript, or to specific engine implementations.
"Generic" code means that decisions have to be made at runtime. Every time you execute a function, code has to run to determine, say, "is x an integer? If so, take that code path. Is x a string? Then jump over here. Is it an object? Does it have .valueOf? No? Then maybe .toString()? Maybe on its prototype chain? Call that, and restart from the beginning with its result". "High-performance" optimized code is essentially built on the idea to drop all these dynamic checks; that's only possible when the engine/compiler has some way to infer types ahead of time: if it can prove (or assume with high enough probability) that x is always going to be an integer, then it only needs to generate code for that case (guarded by a type check if unproven assumptions were involved).
Inlining is orthogonal to all this. A "generic" function can still get inlined. In some cases, the compiler might be able to propagate type information into the inlined function to reduce polymorphism there.
(For comparison: C++, being a statically compiled language, has templates to solve a related problem. In short, they let the programmer explicitly instruct the compiler to create specialized copies of functions (or entire classes), parameterized on given types. That's a nice solution for some cases, but not without its own set of drawbacks, for example long compile times and large binaries. JavaScript, of course, has no such thing as templates. You could use eval to build a system that's somewhat similar, but then you'd run into similar drawbacks: you'd have to do the equivalent of the C++ compiler's work at runtime, and you'd have to worry about the sheer amount of code you're generating.)
How does one modularise high-performance code into libaries in light of things like megamorphic call sites and deoptimisations? For instance, if I am happily using Linear Algebra package A at high speed, and then I import a package B that depends on A, but B calls it with other types and deoptimises it, suddenly (without my code changing) my code runs slower.
Yes, that's a general problem with JavaScript. V8 used to implement certain builtins (things like Array.sort) in JavaScript internally, and this problem (which we call "type feedback pollution") was one of the primary reasons why we have entirely moved away from that technique.
That said, for numerical code, there aren't all that many types (only Smis and doubles), and as you noted they should have similar performance in practice, so while type feedback pollution is indeed a theoretical concern, and in some cases can have significant impact, it's also fairly likely that in linear algebra scenarios you won't see a measurable difference.
Also, inside the engine there are many more situations than "one type == fast" and "more than one type == slow". If a given operation has seen both Smis and doubles, that's totally fine. Loading elements from two kinds of arrays is fine too. We use the term "megamorphic" for the situation when a load has seen so many different types that it's given up on tracking them individually and instead uses a more generic mechanism that scales better to large numbers of types -- a function containing such loads can still get optimized. A "deoptimization" is the very specific act of having to throw away optimized code for a function because a new type is seen that hasn't been seen previously, and that the optimized code therefore isn't equipped to handle. But even that is fine: just go back to unoptimized code to collect more type feedback, and optimize again later. If this happens a couple of times, then it's nothing to worry about; it only becomes a problem in pathologically bad cases.
So the summary of all that is: don't worry about it. Just write reasonable code, let the engine deal with it. And by "reasonable", I mean: what makes sense for your use case, is readable, maintainable, uses efficient algorithms, doesn't contain bugs like reading beyond the length of arrays. Ideally, that's all there is to it, and you don't need to do anything else. If it makes you feel better to do something, and/or if you're actually observing performance issues, I can offer two ideas:
Using TypeScript can help. Big fat warning: TypeScript's types are aimed at developer productivity, not execution performance (and as it turns out, those two perspectives have very different requirements from a type system). That said, there is some overlap: e.g. if you consistently annotate things as number, then the TS compiler will warn you if you accidentally put null into an array or function that's supposed to only contain/operate on numbers. Of course, discipline is still required: a single number_func(random_object as number) escape hatch can silently undermine everything, because the correctness of the type annotations is not enforced anywhere.
Using TypedArrays can also help. They have a little more overhead (memory consumption and allocation speed) per array compared to regular JavaScript arrays (so if you need many small arrays, then regular arrays are probably more efficient), and they're less flexible because they can't grow or shrink after allocation, but they do provide the guarantee that all elements have exactly one type.
Are there any good easy to use measurement tools for checking what the Javascript engine is doing internally with types?
No, and that's intentional. As explained above, we don't want you to specifically tailor your code to whatever patterns V8 can optimize particularly well today, and we don't believe that you really want to do that either. That set of things can change in either direction: if there's a pattern you'd love to use, we might optimize for that in a future version (we have previously toyed with the idea of storing unboxed 32-bit integers as array elements... but work on that hasn't started yet, so no promises); and sometimes if there's a pattern we used to optimize for in the past, we might decide to drop that if it gets in the way of other, more important/impactful optimizations. Also, things like inlining heuristics are notoriously difficult to get right, so making the right inlining decision at the right time is an area of ongoing research and corresponding changes to engine/compiler behavior; which makes this another case where it would be unfortunate for everyone (you and us) if you spent a lot of time tweaking your code until some set of current browser versions does approximately the inlining decisions you think (or know?) are best, only to come back half a year later to realize that then-current browsers have changed their heuristics.
You can, of course, always measure performance of your application as a whole -- that's what ultimately matters, not what choices specifically the engine made internally. Beware of microbenchmarks, for they are misleading: if you only extract two lines of code and benchmark those, then chances are that the scenario will be sufficiently different (e.g., different type feedback) that the engine will make very different decisions.

Replacement of "-' with "_" using a loop in javascript

I have applied the same method to replace "-" with "_" in c++ and it is working properly but in javascript, it is not replacing at all.
function replace(str)
{
for(var i=0;i<str.length;i++)
{
if(str.charAt(i)=="-")
{
str.charAt(i) = "_";
}
}
return str;
}

It's simple enough in javascript, it's not really worth making a new function:
function replace(str){
return str.replace(/-/g, '_') // uses regular expression with 'g' flag for 'global'
}
console.log(replace("hello-this-is-a-string"))
This does not alter the original string, however, because strings in javascript are immutable.
If you are dead set on avoiding the builtin(maybe you want to do more complex processing), reduce() can be useful:
function replace(str){
return [...str].reduce((s, letter) => s += letter == '-' ? '_' : letter , '')
}
console.log(replace("hello-this-is-a-string"))

This is yet another case of "make sure you read the error message". In the case of
str.charAt(i) = "_";
the correct description of what happens is not "it is not replacing at all", as you would have it; it is "it generates a run-time JavaScript error", which in Chrome is worded as
Uncaught ReferenceError: Invalid left-hand side in assignment
In Firefox, it is
ReferenceError: invalid assignment left-hand side
That should have given you the clue you needed to track down your problem.
I repeat: read error messages closely. Then read them again, and again. In the great majority of cases, they tell you exactly what the problem is (if you only take the time to try to understand them), or at least point you in right direction.
Of course, reading the error message assumes you know how to view the error message. Do you? In most browsers, a development tools window is available--often via F12--which contains a "console", displaying error messages. If you don't know about devtools, or the console, then make sure you understand them before you write a single line of code.
Or, you could have simply searched on Stack Overflow, since (almost) all questions have already been asked there, especially those from newcomers. For example, I searched in Google for
can i use charat in javascript to replace a character in a string?
and the first result was How do I replace a character at a particular index in JavaScript?, which has over 400 upvotes, as does the first and accepted answer, which reads:
In JavaScript, strings are immutable, which means the best you can do is create a new string with the changed content, and assign the variable to point to it.
As you learn to program, or learn a new languages, you will inevitably run into things you don't know, or things that confuse you. What to do? Posting to Stack Overflow is almost always the worst alternative. After all, as you know, it's not a tutorial site, and it's not a help site, and it's not a forum. It's a Q&A site for interesting programming questions.
In the best case, you'll get snarky comments and answers which will ruin your day; in the worst case, you'll get down-voted, and close-voted, which is not just embarrassing, but may actually prevent you from asking questions in the future. Since you want to make sure you are able to ask questions when you really need to, you are best off taking much more time doing research, including Google and SO searches, on simple beginner questions before posting. Or find a forum which is designed to help new folks. Or ask the person next to you if there is one. Or run through one or more tutorials.
But why write it yourself at all?
However, unless you are working on this problem as a way of teaching yourself JavaScript, as a kind of training exercise, there is no reason to write it at all. It has already been written hundreds, or probably thousands, of times in the history of computing. And the overwhelming majority of those implementations are going to be better, faster, cleaner, less buggy, and more featureful than whatever you will write. So your job as a "programmer" is not to write something that converts dashes to underscores; it's to find and use something that does.
As the wise man said, today we don't write algorithms any more; we string together API calls. Our job is to find, and understand, the APIs to call.
Finding the API is not at all hard with Google. In this case, it could be helpful if you knew that strings with underscores are sometimes called "snake-cased", but even without knowing that you can find something on the first page of Google results with a query such as "javascript convert string to use underscores library".
The very first result, and the one you should take a look at, is underscore.string, a collection of string utilities written in the spirit of the versatile "underscore" library, and designed to be used with it. It provides an API called underscored. In addition to dealing with "dasherized" input (your case), it also handles other string/identifier formats such as "camelCase".
Even if you don't want to import this particular library and use it (which you probably should), you would be much better off stealing or cloning its code, which in simplified form is:
str
.replace(/([a-z\d])([A-Z]+)/g, '$1_$2')
.replace(/[-\s]+/g, '_')
This is not as complicated as it looks. The first part is to deal with camelCase input, so that "catsAndDogs" becomes "cats-and-dogs". the second line is what handles the dashes and spaces). Look closely--it replaces runs of multiple dashes with a single underscore. That could easily be something that you want to do too, depending on who is doing what with the transformed string downstream. That's a perfect example of something that someone else writing a professional-level library for this task has thought of that you might not when writing your home-grown solution.
Note that this well-regarded, finely-turned library does not use the split/join trick to do global replaces in strings, as another answer suggests. That approach went out of use almost a decade ago.
Besides saving yourself the trouble of writing it yourself, and ending up with better code, if you take time time to understand what it's doing, you will also end up knowing more about regular expressions.

You can easily replace complete string using .split() and .join().
function replace(str){
return str.split("-").join("_");
}
console.log(replace("hello-this-is-a-string"))

What is the relative processing speed of JSONObject.item vs JSONObject['item']?

I'm working on speeding up some old JavaScript code that uses a lot of the following structure:
var obj = {
attr1: value,
attr2: value2,
...
attrN: valueN
};
someFunction(obj['attr1']);
JSHint gives me the following advice:
['attr1'] is better written in dot notation.
So it prefers obj.attr1 over obj['attr1']. I understand the aesthetic reason for this warning (explained here), but which notation is faster? I would think that the former would be more efficient because the latter involves the conversion of a string literal, but I have nothing other than speculation to back that up.
Any help you can offer would be much appreciated.

They're almost even. See these two jsperf examples:
http://jsperf.com/dot-operator-vs-array-notation
http://jsperf.com/dot-notation-vs-bracket-notation/2
Both of them show them being within 1% similar, however, they both show that array notation is ever-so-slightly faster.
EDIT:
Browsing through newly created jsperfs, I found these two:
http://jsperf.com/mpaaa
http://jsperf.com/property-dot-versus-string
They both show almost the same, and, actually, after testing them multiple times, they showed different results (sometimes dot was faster, sometimes array notation)
It's a tie
ANOTHER EDIT:
The browserscope is wrong, although, at least for me, it shows some very uneven tests in other browsers, I tested it in one of the ones it showed a huge difference in, only to find similar results to what I already had found

JavaScript code convention - object declaration

I'm developing a JavaScript Style Guide and I'm aware that the only rule that can be applied on code conventions is to be consistent but I'm curious about this question since none of the major frameworks use this convention.
What are the advantages and disadvantages of vertical align the colons/equals when declaring objects/variables?
var a = 1,
ab = 2,
abc = 3;
var obj = {
a : 1,
ab : 2,
abc : 3
};

Better code readability. That's about it.
Disadvantage is when using find to find the value of a variable as there are more spaces than required. So a search for variablename = may result in nothing due to it actually being defined as variablename[space][space][space]=.

Searching in the code would be one obvious disadvantage:
Imagine that I inherited your system, or must work on it. In the progress, I need to figure out what the a variable is. The system has a huge codebase with lots of lines, so simply scrolling through and looking at every line is not an option.
If the code standard was variable = 123, then I could simply do a "search-in-files" for variable[SPACE]=[SPACE].
This is where the standard you posted might be a bad idea. I don't know how many spaces are going to between the variable name and your equal sign. Of course I could search for a[SPACE]=, a[SPACE][SPACE]=, a[SPACE][SPACE][SPACE]= etc., but it'd be a lot harder.
The only way to search for the declarations (in a raw code editor with no "find declaration" helper), would be to do a Regex search for variable\s+= (one or more spaces) or variable\s*= (zero or more spaces), which will take longer and requires an editor with Regex search capabilities.

Putting each variable on a separate line will benefit you when you go to read it again (or someone else reads it) because it will be clear that these are different variables, so increased readability is the main factor.
It mainly is a benefit in the long term when it comes down to passing the code to another programmer to maintain or modify something or if you come back to it after a long period of time and can't remember the code.
It is the same in most languages.
Also aligning the variables with eachother (and the values) can make the code look very neat and tidy but it is a lot of hassle to do and most people don't do this. You would align the start of the line but not anything else afterwards.
Imagine what would happen if something changed and you had to realign everything.
NOT GOOD. Waste of time which could be spent programming.

initialise javascript array with zeros

I am testing different methods to initialise a large javascript array with zeros. So far a simple for loop with push(0) seems to outperform the other approaches by far and large (see http://jsperf.com/initialise-array-with-zeros), but I am having doubts about the validity of this test.
In practice you would create such a large array only once and cache it, so that later when you need a large initialised array again you can simply slice it. Therefore I believe the most important evaluation is the time it takes the first time this code is executed, rather than an average over many trials.
Does anyone disagree? Or does anybody know how/where I can test the timings of only one round?
Edit: In response to some misconceptions as to the rationale of allocating an array with so many zeros I would like to clarify two things.
There will be no sparsity. I need to create more than one large array and use them for computations. These copies will be filled with floats and the chance for a float to be exactly zero is negligible.
Not all computations are performed sequentially over the array. I believe that a function that generates the array in the process would be inefficient compared to overwriting values in an array that is passed by reference (see e.g. gl-matrix.js).
My solution is therefore to create one large zero-filled array once and then take a slice() whenever a new array is needed, then pass that copy by reference to any function to work with it. Slice is super-duper-mega fast in any browser.
Now, although you may still have concerns why I want to do this, what I am really interested in is if it is at all possible to evaluate the performance of different initialisation methods at the first time run. I want to have this timing because in my situation I will certainly only run this once.
And yes, my jsperf code likely misses some solutions. So if you have an approach that I didn't think of, please feel free to add it! Thanks!

Testing the operation only once is very complicated, as the performance varies a lot depending on what else the computer is doing. You would have to run that single test a lot of times, and reset to the same conditions between each test. The reason that jsperf runs the test a lot of times is to get a good average to weed out the anomalies.
You should test this in different browsers, to see which method is the best overall. You will see that you get very varying results.
In Internet Explorer, the fastest methods is actually neither of the ones you tested, but a simple loop that assigns the zeroes:
for (var i = 0; i < numzeros; i++) zeros[i] = 0;

Starting ES6, you can use fill like:
var totals = [].fill.call({ length: 5 }, 0);

There's no practical task that would amount to "initialise javascript array with zeros", especially a big one. You should rethink why you need 0's there. Is this a sparse array and you need 0 as default value? Then just add a conditional on access to set retrieved value to 0 instead of wasting memory and initialization time.

We Keep Coding

JavaScript is the programming language of the Web.