I recently wrote some code that grabs text from two separate elements and then subtracts them, and to my surprise I didn't have to convert them to integers first. I did a little looking around, and it seems JavaScript converts strings to numbers when using the subtraction operator. Does anyone know if it is OK to leave them as strings, or should they first be converted to integers as a best practice? And if so, why? Thank you.
example:
"10" - "6" = 4
[…] to my surprise I didn't have to convert them to integers first […]
A couple more surprises, then:
JavaScript resolves many operations (such as arithmetic) by implicitly coercing incompatible values to a different type, instead of raising an exception. This makes JavaScript “weakly typed” to that extent.
There is no integer type built into JavaScript; the only number type is IEEE-754 floating-point.
So, your string values were coerced to floating-point values, in the context of the arithmetic operation. JavaScript didn't tell you this was happening.
This is a source of bugs that can stay hidden for a long time: as long as the string values convert successfully to numbers, the operation succeeds silently, even when you might have expected those values to raise an error.
js> "1e15" - "0x45" // The reader might have expected this to raise an error.
999999999999931
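Those two strings convert cleanly because the subtraction operator applies the same conversion Number() does, which accepts scientific and hexadecimal notation:
js> Number("1e15")
1000000000000000
js> Number("0x45")
69
js> 1e15 - 69
999999999999931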
The brief “Wat” presentation by Gary Bernhardt is packed with other surprising (and hilarious) results of JavaScript's implicit type coercion.
Does anyone know if it is OK to leave them as strings, or should they first be converted to [numbers] as a best practice?
Yes, in my opinion you should do arithmetic only on explicitly-converted numbers, because (as you discovered) for newcomers reading the code, the implicit coercion rules are not always obvious.
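A minimal sketch of converting explicitly and rejecting bad input (the variable names here are only illustrative):
var a = Number("10");            // or parseInt("10", 10), or the unary plus: +"10"
var b = Number("6");
if (isNaN(a) || isNaN(b)) {
  throw new Error("expected numeric input");  // fail loudly instead of letting NaN propagate
}
console.log(a - b);              // 4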
I'm trying to figure out how my script will behave when it runs in a browser set to a Chinese (or other) locale that uses Chinese numerals (or another non-Latin digit set). I can't seem to find any info on this on the interwebs.
Looking at the page
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString
we see examples of localized numbers when converting from number to string, but what about the other way around? I tried parseInt("一二三") in the IE11 debug console, which returned NaN, but I'm not using Chinese Windows. Could someone test this?
My confusion comes from JavaScript having loosely typed data, so what if I end up running into an implicit string-to-number conversion, such as this:
var a = "١٢٣";
var b = .01;
console.log(a*b);
Mind you, my variables a and b could come from user input in a more complex example. How can you make sure that input written with a non-Latin digit system is converted to the right internal number representation before you do arithmetic, if parseInt and implicit conversion don't work?
It won't work, for several reasons. First, notice that while there is a toLocaleString, there is no fromLocaleString or "parseLocaleInt" going the other way. Second, JavaScript's implicit coercion (whether triggered by == or by arithmetic operators such as *) uses the standard Number conversion, which only understands ASCII digits (plus hex and scientific notation), so it cannot do what you are describing.
This coercion can still be very dangerous or useful depending on your point of view, e.g.
0 == false is true
but 0 === false is false. It certainly isn't as powerful as you think it is, though.
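Running the question's own snippet shows the limit of the coercion that does happen: the * operator converts via the standard Number rules, which only understand ASCII digits, so the localized string simply becomes NaN (assuming nothing pre-processes the input):
var a = "١٢٣";                  // Arabic-Indic digits for 123
var b = .01;
console.log(a * b);             // NaN -- ToNumber("١٢٣") is NaN, and NaN * .01 is NaN
console.log(parseInt(a, 10));   // NaN as well; parseInt only reads ASCII digits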
Suppose we have a huge string, named str1, say 5 million characters long, and then str2 = str1.substr(5555, 100), so that str2 is 100 characters long and is a substring of str1 starting at position 5555 (or any other randomly selected position).
How does JavaScript store str2 internally? Is the string content copied, or is the new string a sort of virtual reference to the original string, storing only the position and size?
I know this is implementation dependent, and the ECMAScript standard (probably) does not define what's under the hood of the string implementation. But I want to hear from some expert who knows V8 or SpiderMonkey from the inside well enough to clarify this.
Thank you
AFAIK V8 has four string representations:
ASCII
UTF-16
concatenation of multiple strings
slice of another string
Adventures in the land of substrings and RegExps has great explanations and illustrations.
Thus, it does not have to copy the string; it just stores beginning and ending markers that point into the other string.
SpiderMonkey does the same thing. (See Large substrings ~9000x faster in Firefox than Chrome: why? ... though the answer for Chrome is outdated.)
This can give real speed boosts, but sometimes it is undesirable, since it can cause a small string to hold onto the memory of the much larger parent string (V8 bug report).
This old blog post of mine explains it, as well as some other string representation forms: https://web.archive.org/web/20170607033600/http://blog.cdleary.com:80/2012/01/string-representation-in-spidermonkey/
Search for "dependent string". I think I know what you might be getting at with the question: they can be problematic things, at times, because if there are no references to the original, you can keep a giant string around in order to keep a bitty little substring that's actually semantically reachable. There are things that an implementation could do to mitigate that problem, like record information on a GC-generation basis to see if such one-dependent-string entities exist and collapse them to their minimal size, but last I knew of that was not being done. (Essentially with that kind of approach you're recovering runtime_refcount == 1 style information at GC-sweep time.)
I've been playing around with Typed Arrays in JavaScript.
var buffer = new ArrayBuffer(64);
var int32View = new Int32Array(buffer);
I imagine normal arrays ([1, 257, true]) in JavaScript have poor performance because their values could be of any type, so reaching an offset in memory is not trivial.
I originally thought that JavaScript array subscripts worked the same as objects (as they have many similarities), and were hash map based, requiring a hash based lookup. But I haven't found much credible information to confirm this.
So, I'd assume the reason why Typed Arrays perform so well is because they work like normal arrays in C, where they're always typed. Given the initial code example above, and wishing to get the 10th value in the typed array...
var value = int32View[10];
The type is Int32, so each value must consist of 32 bits or 4 bytes.
The subscript is 10.
So the location in memory of that value is <array offset> + (4 * 10), and then 4 bytes are read from there to get the value.
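A DataView over the same buffer makes that byte arithmetic explicit (just a sanity-check sketch; the little-endian flag below matches what typed arrays use on most current hardware):
int32View[10] = 1234;                                  // store something at index 10
var view = new DataView(buffer);
var byteOffset = 10 * Int32Array.BYTES_PER_ELEMENT;    // 10 * 4 = 40
console.log(view.getInt32(byteOffset, true));          // 1234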
I basically just want to confirm my assumptions. Are my thoughts on this correct? If not, please elaborate.
I checked out the V8 source to see if I could answer it myself, but my C is rusty and I'm not too familiar with C++.
Typed Arrays were designed by the WebGL standards committee, for performance reasons. Typically Javascript arrays are generic and can hold objects, other arrays and so on - and the elements are not necessarily sequential in memory, like they would be in C. WebGL requires buffers to be sequential in memory, because that's how the underlying C API expects them. If Typed Arrays are not used, passing an ordinary array to a WebGL function requires a lot of work: each element must be inspected, the type checked, and if it's the right thing (e.g. a float) then copy it out to a separate sequential C-like buffer, then pass that sequential buffer to the C API. Ouch - lots of work! For performance-sensitive WebGL applications this could cause a big drop in the framerate.
On the other hand, like you suggest in the question, Typed Arrays use a sequential C-like buffer already in their behind-the-scenes storage. When you write to a typed array, you are indeed assigning to a C-like array behind the scenes. For the purposes of WebGL, this means the buffer can be used directly by the corresponding C API.
Note your memory address calculation isn't quite enough: the browser must also bounds-check the array, to prevent out-of-range accesses. This has to happen with any kind of Javascript array, but in many cases clever Javascript engines can omit the check when it can prove the index value is already within bounds (such as looping from 0 to the length of the array). It also has to check the array index is really a number and not a string or something else! But it is in essence like you describe, using C-like addressing.
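A quick, purely illustrative way to see the bounds check from script: out-of-range accesses never touch memory outside the buffer, they just fail quietly:
var floats = new Float32Array(4);   // hypothetical 4-element typed array
floats[10] = 1.5;                   // out-of-range write: silently ignored
console.log(floats[10]);            // undefined -- no exception, no read past the buffer
console.log(floats.length);         // still 4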
BUT... that's not all! In some cases clever Javascript engines can also deduce the type of ordinary Javascript arrays. In an engine like V8, if you make an ordinary Javascript array and only store floats in it, V8 may optimistically decide it's an array of floats and optimise the code it generates for that. The performance can then be equivalent to typed arrays. So typed arrays aren't actually necessary to reach maximum performance: just use arrays predictably (with every element the same type) and some engines can optimise for that as well.
So why do typed arrays still need to exist?
Optimisations like deducing the type of arrays are really complicated. If V8 deduces an ordinary array has only floats in it, then you store an object in an element, it has to de-optimise and regenerate code that makes the array generic again. It's quite an achievement that all this works transparently. Typed Arrays are much simpler: they're guaranteed to be one type, and you just can't store other things like objects in them.
Optimisations are never guaranteed to happen; you may store only floats in an ordinary array, but the engine may decide for various reasons not to optimise it.
The fact they're much simpler means other less-sophisticated javascript engines can easily implement them. They don't need all the advanced deoptimisation support.
Even with really advanced engines, proving optimisations can be used is extremely difficult and can sometimes be impossible. A typed array significantly simplifies the level of proof the engine needs to be able to optimise around it. A value returned from a typed array is certainly of a certain type, and engines can optimise for the result being that type. A value returned from an ordinary array could in theory have any type, and the engine may not be able to prove it will always have the same type result, and therefore generates less efficient code. Therefore code around a typed array is more easily optimised.
Typed arrays remove the opportunity to make a mistake. You just can't accidentally store an object and suddenly get far worse performance.
So, in short, ordinary arrays can in theory be as fast as typed arrays. But typed arrays make it much easier to reach peak performance.
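A small sketch of that last point (the values here are arbitrary): an ordinary array happily accepts anything, while a typed array coerces everything to its one element type:
var plain = [1.5, 2.5, 3.5];
plain[1] = {};                       // legal: the array becomes generic and may be de-optimised

var typed = new Float64Array([1.5, 2.5, 3.5]);
typed[1] = {};                       // coerced with ToNumber, so the element becomes NaN
console.log(typed[1]);               // NaN -- it can only ever hold 64-bit floats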
Yes, you are mostly correct. With a standard JavaScript array, the JavaScript engine has to assume that the data in the array is all objects. It can still store this as a C-like array/vector, where the access to the memory is still like you described. The problem is that the data is not the value, but something referencing that value (the object).
So, performing a[i] = b[i] + 2 requires the engine to:
1. access the object in b at index i;
2. check what type the object is;
3. extract the value out of the object;
4. add 2 to the value;
5. create a new object holding the value computed in step 4;
6. assign the new object from step 5 into a at index i.
With a typed array, the engine can:
1. access the value in b at index i (including placing it in a CPU register);
2. increment the value by 2;
3. write the new value from step 2 into a at index i.
NOTE: These are not the exact steps a JavaScript engine will perform, as that depends on the code being compiled (including surrounding code) and the engine in question.
This allows the resulting computations to be much more efficient. Also, the typed arrays have a memory layout guarantee (arrays of n-byte values) and can thus be used to directly interface with data (audio, video, etc.).
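As a (hypothetical) illustration of both points, here is the a[i] = b[i] + 2 example done with typed arrays, plus a byte-level view of the same memory:
var b = new Int32Array([10, 20, 30]);        // three 32-bit integers, contiguous in memory
var a = new Int32Array(b.length);
for (var i = 0; i < b.length; i++) {
  a[i] = b[i] + 2;                           // load, add, store -- no object boxing involved
}
// The guaranteed layout also lets the same bytes be viewed as raw data (audio, video, etc.):
var rawBytes = new Uint8Array(a.buffer);     // 12 bytes sharing a's memory, no copy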
When it comes to performance, things can change fast. As AshleysBrain says, it comes down to whether the VM can deduce that a normal array can be implemented as a typed array quickly and accurately. That depends on the particular optimizations of the particular JavaScript VM, and it can change in any new browser version.
This Chrome developer comment provides some guidance that worked as of June 2012:
Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow.
Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays. (Fixing this is possible, but it's low priority.)
Micro-benchmarks such as JSPerf are not reliable for real-world performance.
If I might elaborate on the last point, I've seen this phenomenon with Java for years. When you test the speed of a small piece of code by running it over and over again in isolation, the VM optimizes the heck out of it. It makes optimizations which only make sense for that specific test. Your benchmark can get a hundredfold speed improvement compared to running the same code inside another program, or compared to running it immediately after running several different tests that optimize the same code differently.
I'm not really a contributor to any JavaScript engine; I've only done some reading on V8, so my answer might not be completely accurate:
Well, values in arrays (only normal arrays with no holes/gaps, not sparse ones; sparse arrays are treated as objects) are all either pointers or numbers with a fixed length (in V8 they are 32 bits: if the value is a 31-bit integer, it's tagged with a 0 bit at the end, otherwise it's a pointer).
So I don't think finding the memory location is any different from a typed array, since the number of bytes is the same throughout the array. But the difference is that if the element is an object, you have to add one unboxing layer, which doesn't happen for typed arrays.
And of course, accessing typed arrays definitely doesn't involve the type checks that a normal array has (though those might be removed in highly optimized code, which is only generated for hot code).
For writing, if it's the same type it shouldn't be much slower. If it's a different type, then the JS engine might generate polymorphic code for it, which is slower.
You can also try making some benchmarks on jsperf.com to confirm.
This came as a huge surprise to me, and I'd like to understand this result. I made a test in jsperf that is basically supposed to take a string (part of a URL that I'd like to check) and check for the presence of 4 items (that are, in fact, present in the string).
It checks in 5 ways:
1. plain indexOf;
2. split the string, then indexOf;
3. regex search;
4. regex match;
5. split the string, loop through the array of items, and then check if any of them matches the things it's supposed to match.
To my huge surprise, number 5 is the fastest in Chrome 21. This is what I can't explain.
In Firefox 14, the plain indexOf is the fastest, that one I can believe.
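To make the comparison concrete, here is a rough sketch of what approaches 1 and 5 can look like; the URL and the items below are made up for illustration, and the real test code is in the jsperf link:
var url = "/products/list/42/details";               // hypothetical string to check
var items = ["products", "list", "42", "details"];   // hypothetical items, all present

// 1. plain indexOf
var allFound1 = items.every(function (item) {
  return url.indexOf(item) !== -1;
});

// 5. split the string, then loop and compare with ===
var parts = url.split("/");
var allFound5 = items.every(function (item) {
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] === item) { return true; }
  }
  return false;
});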
I'm also surprised, but Chrome uses V8, a highly optimized JavaScript engine that pulls all kinds of tricks. And the guys at Google probably have the largest set of JavaScript to run to test the performance of their implementation. So my guess is this happens:
The compiler notices that the array is a string array (the type can be determined at compile time, so no runtime checks are necessary).
In the loop, since you use ===, built-in CPU opcodes for comparing strings (repe cmpsb) can be used, so no functions are being called (unlike in any other test case).
After the first loop, everything important (the array, the strings to compare against) is in CPU caches. Locality rulez them all.
All the other approaches need to invoke functions and locality might be an issue for the regexp versions because they build a parse tree.
I have added two more tests: http://jsperf.com/finding-components-of-a-url/2
The single RegExp is fastest now (on Chrome). Also, RegExp literals are faster than string literals converted to RegExp.
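For that last point, the difference is roughly this (a sketch, not the exact test code):
var re1 = /details/;                   // literal: the pattern is known at parse time, so engines can prepare it ahead of time
var re2 = new RegExp("details");       // built from a string at runtime
console.log(re1.test("/42/details"));  // true
console.log(re2.test("/42/details"));  // true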