How does Array.push() method work exactly in Javascript?

How does Array.push() method work exactly in Javascript? - javascript

while I was taking online CS class, the topic was about an overview of how Array and memory works, and the teacher used C as an Example as that you cannot simply add extra element to an existing Array while the Array is already full with element. As a beginner developer who started with JavaScript, even though I know JavaScript is a high level language and Array.push() is already a familiar function to me, but it seems like it doesn't fit such context, out of curiosity, I've search through google and StackOverflow, I just don't see people discussing why JavaScript can just add extra elements to an existing Array.
Does JavaScript simply create a new Array with added element and point the variable I've already assigned to to the new Array or something else?

Arrays in javascript are not the same as an array in C. In C you'll allocate a new array with type and a fixed size and if you want to alter the size you'll have to create a new one. For javascript arrays on the other hand if you have a look at the description of arrays over at MDN you'll see this line.
Arrays are list-like objects whose prototype has methods to perform traversal and mutation operations. Neither the length of a JavaScript array nor the types of its elements are fixed. Since an array's length can change at any time, and data can be stored at non-contiguous locations in the array
So while it is called "array" it is more like a list which lets you resize it freely (and also store anything, because javascript).
There is also another question about the semantics of the push/pop methods here on SO that can shine some more light on those: How does the Javascript Array Push code work internally

A JavaScript Array is not like a C array, it's more like a C++ std::vector. It grows in length by dynamically allocating memory when needed.
As stated in the reference documentation for std::vector:
Vectors usually occupy more space than static arrays, because more memory is allocated to handle future growth. This way a vector does not need to reallocate each time an element is inserted, but only when the additional memory is exhausted. [...] Reallocations are usually costly operations in terms of performance.

Related

Constant time to access array element while storing items of multiple data types

How do relatively modern languages such as ruby/python/js etc may store multiple data types in arrays and are still able to access any element from the array using its index in O(1) time?
As far as I understand, we do simple mathematics to determine the memory address pointing to any element, and we do so by the index multiplied by the size of each element of the array.

Firstly, neither the Ruby Language Specification nor the Python Language Specification nor the ECMAScript Language Specification prescribe any particular implementation strategy for arrays (or lists as they are called in Python). Every implementor is free to implement them however they wish.
Secondly, lumping them all together doesn't make much sense. For example, in ECMAScript, arrays are really just objects with numeric properties, and actually, those numeric properties aren't even really numeric, they are strings.
Third, they don't really store multiple data types. E.g. Ruby only has one data type: objects. Since everything is an object, everything has the same type, so there is no problem storing objects in arrays.
Fourth, at least the Ruby Language Specification does not actually guarantee that array access is O(1). It is highly likely that a Ruby Implementation which does not provide O(1) access would be rejected by the community, but it would not violate any spec.
Now, of course, any implementor is allowed to be as clever as they want to be. E.g. V8 detects when all values of an array are numbers and then stores the array differently. But that is a private internal implementation detail of V8.

In memory, a heterogeneous array is an array of pointers. Every array element stores the memory address of the item at that position in the array.
Since memory addresses are all the same size, you can find the address of each address by multiplying the array index by the address size, and adding it to the base address of the array.

Are Javascript Arrays really Arrays?

I am a beginner to Javascript and learning the concepts. I came across the below code snippet as part of my learning process.
<script>
var data=[];
data.push("100");
data.push(100);
var object=[];
object.string="100";
object.number=100;
data.push(object);
console.log(data);
</script>
The above code snippet defines an Array and pushes a String, Number and an Object into it. I think this violates the definition of an Array which reads as follows:
Array is a container which can hold a fix number of items and these
items should be of the same type.
I would like to know if Array is the right term to be used in this case as the elements are not of the same type.

You understand/read them incorrect (in case of javascript).
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array
Arrays are list-like objects whose prototype has methods to perform traversal and mutation operations. Neither the length of a JavaScript array nor the types of its elements are fixed.
And yes, this definition is purely for Javascript. For ex: Java arrays are strictly type based.

The "should be" here most likely refers to the fact that logically homogeneous collections are easier to deal with than heterogeneous collections. If you have an array of numbers, you can do aggregate operations such as summing them easily; if you have a mixed array of stuff, you need to pick it apart one by one and can't easily do anything with it. That's often a sign of a badly structured program.
There's no technical reason why collections must be homogeneous, especially in a dynamically typed language like Javascript (more so in statically typed languages).

What reasons are there for using Arrays instead of Objects to store lists other than helper functions?

TL:DR
Beside the use of the convent Array helper functions (which I could theoretically create for objects), and considering the performance advantage of Object lookups, what reason could be given to use an Array instead of an Object?
Objects
From what I understand, because JavaScript objects use hash tables to lookup their key -> data pairs, the look-up time, no matter the length of the object is very small.
For example if I want a really fast dictionary look up, in the past I've (and we can condense the syntax but that's besides the point) stored dictionary data in JSON as
"apple" : "apple",
and then used
if (Dictionary.apple) console.log("Yep it's a word!");
And the result return very very fast regardless of whether my dictionary contains 30,000 words or 300,000.
Arrays
On the other hand, unless I know the number an array item is attached to, I have to loop through the entire array, causing larger lookup times the further the item is down the list.
The good thing I know of about using an array is that I get access to convenient functions such as slice, but these could probably be created for use with objects.
My Question
So, considering the lookup efficiency of objects, I'd currently choose an object over an array for every situation. But I could easily be wrong about this.
Beside the use of the convent Array helper functions (which I could theoretically create for objects), and considering the performance advantage of Object lookups, what reason could be given to use an Array instead of an Object?

You're comparing apples to oranges here. If you need to map from arbitrary string keys to values, as in your example with "apple", then you use an object. (In ES2015, you might alternatively use a Map instance.)
If you have a whole bunch of oranges, and you want to keep them in a list numbered from 0, you put the oranges in an array and index by which (numbered) orange you want.
The process of locating a property on an object is the same whether the object is a plain Object instance or an Array instance. In modern JavaScript runtime environments, it's safe to assume that the process for looking up number-indexed array properties is appropriately optimized to be even faster than the hash lookup for arbitrary string-named properties. That, however, is a completely separate issue from the nature of the work you need to do and the choice of data structure. Either you have a list of things, such that the order of the things in the list is the salient relationship between them, or you have named things that you need to access by those names. The two situations are conceptually different.

One big difference is the order of elements.
Looping through objects keys can't guarantee any specific order.
Looping through array keys will always give you the same order of elements.

Are the advantages of Typed Arrays in JavaScript is that they work the same or similar in C?

I've been playing around with Typed Arrays in JavaScript.
var buffer = new ArrayBuffer(16);
var int32View = new Int32Array(buffer);
I imagine normal arrays ([1, 257, true]) in JavaScript have poor performance because their values could be of any type, therefore, reaching an offset in memory is not trivial.
I originally thought that JavaScript array subscripts worked the same as objects (as they have many similarities), and were hash map based, requiring a hash based lookup. But I haven't found much credible information to confirm this.
So, I'd assume the reason why Typed Arrays perform so well is because they work like normal arrays in C, where they're always typed. Given the initial code example above, and wishing to get the 10th value in the typed array...
var value = int32View[10];
The type is Int32, so each value must consist of 32 bits or 4 bytes.
The subscript is 10.
So the location in memory of that value is <array offset> + (4 * 10), and then read 4 bytes to get the total value.
I basically just want to confirm my assumptions. Is my thoughts around this correct, and if not, please elaborate.
I checked out the V8 source to see if I could answer it myself, but my C is rusty and I'm not too familiar with C++.

Typed Arrays were designed by the WebGL standards committee, for performance reasons. Typically Javascript arrays are generic and can hold objects, other arrays and so on - and the elements are not necessarily sequential in memory, like they would be in C. WebGL requires buffers to be sequential in memory, because that's how the underlying C API expects them. If Typed Arrays are not used, passing an ordinary array to a WebGL function requires a lot of work: each element must be inspected, the type checked, and if it's the right thing (e.g. a float) then copy it out to a separate sequential C-like buffer, then pass that sequential buffer to the C API. Ouch - lots of work! For performance-sensitive WebGL applications this could cause a big drop in the framerate.
On the other hand, like you suggest in the question, Typed Arrays use a sequential C-like buffer already in their behind-the-scenes storage. When you write to a typed array, you are indeed assigning to a C-like array behind the scenes. For the purposes of WebGL, this means the buffer can be used directly by the corresponding C API.
Note your memory address calculation isn't quite enough: the browser must also bounds-check the array, to prevent out-of-range accesses. This has to happen with any kind of Javascript array, but in many cases clever Javascript engines can omit the check when it can prove the index value is already within bounds (such as looping from 0 to the length of the array). It also has to check the array index is really a number and not a string or something else! But it is in essence like you describe, using C-like addressing.
BUT... that's not all! In some cases clever Javascript engines can also deduce the type of ordinary Javascript arrays. In an engine like V8, if you make an ordinary Javascript array and only store floats in it, V8 may optimistically decide it's an array of floats and optimise the code it generates for that. The performance can then be equivalent to typed arrays. So typed arrays aren't actually necessary to reach maximum performance: just use arrays predictably (with every element the same type) and some engines can optimise for that as well.
So why do typed arrays still need to exist?
Optimisations like deducing the type of arrays is really complicated. If V8 deduces an ordinary array has only floats in it, then you store an object in an element, it has to de-optimise and regenerate code that makes the array generic again. It's quite an achievement that all this works transparently. Typed Arrays are much simpler: they're guaranteed to be one type, and you just can't store other things like objects in them.
Optimisations are never guaranteed to happen; you may store only floats in an ordinary array, but the engine may decide for various reasons not to optimise it.
The fact they're much simpler means other less-sophisticated javascript engines can easily implement them. They don't need all the advanced deoptimisation support.
Even with really advanced engines, proving optimisations can be used is extremely difficult and can sometimes be impossible. A typed array significantly simplifies the level of proof the engine needs to be able to optimise around it. A value returned from a typed array is certainly of a certain type, and engines can optimise for the result being that type. A value returned from an ordinary array could in theory have any type, and the engine may not be able to prove it will always have the same type result, and therefore generates less efficient code. Therefore code around a typed array is more easily optimised.
Typed arrays remove the opportunity to make a mistake. You just can't accidentally store an object and suddenly get far worse performance.
So, in short, ordinary arrays can in theory be equally fast as typed arrays. But typed arrays make it much easier to reach peak performance.

Yes, you are mostly correct. With a standard JavaScript array, the JavaScript engine has to assume that the data in the array is all objects. It can still store this as a C-like array/vector, where the access to the memory is still like you described. The problem is that the data is not the value, but something referencing that value (the object).
So, performing a[i] = b[i] + 2 requires the engine to:
access the object in b at index i;
check what type the object is;
extract the value out of the object;
add 2 to the value;
create a new object with the newly computed value from 4;
assign the new object from step 5 into a at index i.
With a typed array, the engine can:
access the value in b at index i (including placing it in a CPU register);
increment the value by 2;
assign the new object from step 2 into a at index i.
NOTE: These are not the exact steps a JavaScript engine will perform, as that depends on the code being compiled (including surrounding code) and the engine in question.
This allows the resulting computations to be much more efficient. Also, the typed arrays have a memory layout guarantee (arrays of n-byte values) and can thus be used to directly interface with data (audio, video, etc.).

When it comes to performance, things can change fast. As AshleysBrain says, it comes down to whether the VM can deduce that a normal array can be implemented as a typed array quickly and accurately. That depends on the particular optimizations of the particular JavaScript VM, and it can change in any new browser version.
This Chrome developer comment provides some guidance that worked as of June 2012:
Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow.
Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays. (Fixing this is possible, but it's low priority.)
Micro-benchmarks such as JSPerf are not reliable for real-world performance.
If I might elaborate on the last point, I've seen this phenomenon with Java for years. When you test the speed of a small piece of code by running it over and over again in isolation, the VM optimizes the heck out of it. It makes optimizations which only make sense for that specific test. Your benchmark can get a hundredfold speed improvement compared to running the same code inside another program, or compared to running it immediately after running several different tests that optimize the same code differently.

I'm not really contributor to any javascript engine, only had some readings on v8, so my answer might not be completely true:
Well values in arrays(only normal arrays with no holes/gaps, not sparse. Sparse arrays are treated as objects.) are all either pointers or a number with a fixed length(in v8 they are 32 bit, if a 31 bit integer then it's tagged with a 0 bit in the end, else it's a pointer).
So I don't think finding the memory location is any different than a typedArray, since the number of the bytes are the same all over the array. But the difference comes that if it's an a object, then you have to add one unboxing layer, which doesn't happen for normal typedArrays.
And ofcourse when accessing typedArrays, definitely doesn't have type checking's that a normal array have(though that might be remove in a higly optimized code, which is only generated for hot code).
For Writing, if it's the same type shouldn't be much slower. If it's a different type then the JS engine might generate polymorphic code for it, which is slower.
You can also try making some benchmarks on jsperf.com to confirm.

Are JavaScript arrays actually linked lists?

I'm new to Javascript, and notice that you don't need to specify an array's size and often see people dynamically creating arrays one element at time. This would be a huge performance problem in other languages as you would constantly need to reallocate memory for the array as it increases in size.
Is this not a problem in JavaScript? If so, then is there a list structure available?

Javascript arrays are typically implemented as hashmaps (just like Javascript objects) with one added feature: there is an attribute length, which is one higher than the highest positive integer that has been used as a key. Nothing stops you from also using strings, floating-point numbers, even negative numbers as keys. Nothing except good sense.

It most likely depends on what JavaScript engine you use.
Internet Explorer uses a mix of sparse arrays and dense arrays to make that work. Some of the more gory details are explained here: http://blogs.msdn.com/b/jscript/archive/2008/04/08/performance-optimization-of-arrays-part-ii.aspx.

The thing about dynamic languages is, well, that they're dynamic. Just like ArrayList in Java, or arrays in Perl, PHP, and Python, an Array in JavaScript will allocate a certain amount of memory and when it gets to be too big, the language automatically appends to the object. Is it as efficient as C++ or even Java? No (C++ can run circles around even the best implementations of JS), but people aren't building Quake in JS (just yet).
It is actually better to think of them as HashMaps with some specialized methods too anyway -- after all, this is valid: var a = []; a['cat']='meow';.

No.
What JavaScript arrays are and aren't is determined by the language specification specifically section 15.4. Array is defined in terms of the operations it provides not implementation details of the memory layout of any particular data structure.
Could Array be implemented on top of a linked list? Yes. This might make certain operations faster such as shift and unshift efficient, but Array also is frequently accessed by index which is not efficient with linked lists.
It's also possible to get the best of both worlds without linked lists. Continguous memory data structures, such as circular queues have both efficient insertion/removal from the front and efficient random access.
In practice, most interpreters optimize dense arrays by using a data structure based around a resizable or reallocable array similar to a C++ vector or Java ArrayList.

Javascript arrays are not true arrays like in C/C++ or other languages. Therefore, they aren't as efficient, but they are arguably easier to use and do not throw out of bounds exceptions.

They are actually more like custom objects that use the properties as indexes.
Example:
var a = { "1": 1, "2": 2};
a.length = 2;
for(var i=0;i<a.length;i++)
console.log(a[i]);
a will behave almost like an array, and you can also call functions from the Array.prototype on it.

We Keep Coding

JavaScript is the programming language of the Web.