I have about a million rows in JavaScript and I need to store a metadata object for each of the rows. Given the following two different object types:
{0: {'e': 0, 'v': 'This is a value'}}
And:
{0: '0This is a value'}
What would be the difference in memory between a million objects of the first type and a million objects of the second type? That is:
[obj1, obj1, obj1, ...] // array of 1M
[obj2, obj2, obj2, ...] // array of 1M
V8 developer here. The answer is still "it depends", because engines for a dynamic language tend to adapt to what you're doing, so a tiny testcase is very likely not representative of the behavior of a real application. One high-level rule of thumb that will always hold true: a single string takes less memory than an object wrapping that string. How much less? Depends.
That said, I can give a specific answer for your specific example. For the following code:
const kCount = 1000000;
let a = new Array(kCount);
for (let i = 0; i < kCount; i++) {
// Version 1 (comment out the one or the other):
a[i] = {0: {'e': 0, 'v': 'This is a value'}};
// Version 2:
a[i] = {0: '0This is a value'};
}
gc();
Running with --expose-gc --trace-gc, I'm seeing:
Version 1: 244.5 MB
Version 2: 206.4 MB
(Nearly current V8, x64, d8 shell. This is what @paulsm4 suggested you could do in DevTools yourself.)
The breakdown is as follows:
the array itself will need 8 bytes per entry
an object created from an object literal has a header of 3 pointers and preallocated space for 4 named properties (unused here), total 7 * 8 = 56 bytes
its backing store for indexed properties allocates space for 17 entries even though only one will be used; add a 2-pointer header and that's 19 pointers = 152 bytes
in version 1 we have an inner object that detects that two (and only two) named properties are needed, so it gets trimmed to a size of 5 (3 header, 2 for "e" and "v") pointers = 40 bytes
in version 2 there's no inner object, just a pointer to a string
the string literals are deduplicated, and 0 is stored as a "Smi" directly in the pointer, so neither of these needs extra space.
Summing up:
Version 1: 8+56+152+40 = 256 bytes per object
Version 2: 8+56+152 = 216 bytes per object
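As a quick sanity check, the per-object arithmetic above can be redone in a few lines. This is a sketch that assumes 8-byte (uncompressed) pointers, as in the x64 d8 build used for the measurement, and it assumes the "MB" printed by --trace-gc is really MiB (1024 × 1024 bytes). Under that assumption the predictions land within about half a MiB of the measured 244.5 and 206.4, the remainder presumably being the rest of the heap.

```javascript
// Per-entry byte counts from the breakdown above (x64, 8-byte pointers).
const POINTER_SIZE = 8;
const arraySlot = 1 * POINTER_SIZE;         // one slot in the outer array
const outerObject = (3 + 4) * POINTER_SIZE; // 3-pointer header + 4 preallocated property slots
const elements = (2 + 17) * POINTER_SIZE;   // backing store: 2-pointer header + 17 entries
const innerObject = (3 + 2) * POINTER_SIZE; // inner {e, v} object trimmed to 5 pointers

const version1 = arraySlot + outerObject + elements + innerObject;
const version2 = arraySlot + outerObject + elements;

// One million entries, converted to MiB (assumed unit of the GC trace).
const toMiB = (bytesPerEntry) => (bytesPerEntry * 1e6) / (1024 * 1024);
console.log(version1, toMiB(version1).toFixed(1)); // 256 244.1
console.log(version2, toMiB(version2).toFixed(1)); // 216 206.0
```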
However, things will change dramatically if not all strings are the same, if the objects have more or fewer named or indexed properties, if they come from constructors rather than literals, if they grow or shrink over the course of their lifetimes, and a bunch of other factors. Frankly, I don't think any particularly useful insight can be gleaned from these numbers (specifically, while they might seem quite inefficient, they're unlikely to occur in practice in this way -- I bet you're not actually storing so many zeros, and wrapping the actual data into a single-property {0: ...} object doesn't look realistic either).
Let's see! If I drop all the obviously-redundant information from the small test, and at the same time force creation of a fresh, unique string for every entry, I'll be left with this loop to fill the array:
for (let i = 0; i < kCount; i++) {
a[i] = i.toString();
}
which consumes only ~31 MB total. Prefer an actual object for the metadata?
function Metadata(e, v) {
this.e = e;
this.v = v;
}
for (let i = 0; i < kCount; i++) {
a[i] = new Metadata(i, i.toString());
}
Now we're at ~69 MB. As you can see: dramatic changes ;-)
So to determine the memory requirements of your actual, complete app, and any implementation alternatives for it, you'll have to measure things yourself.
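If the app runs in Node.js rather than d8, a rough way to do that measurement is to compare process.memoryUsage().heapUsed before and after building the data. This is a minimal sketch, not a profiler: heapUsed covers the whole V8 heap, so treat the result as an approximation, and run with --expose-gc so that garbage is collected before each reading. The helper name measureHeap is hypothetical; Metadata mirrors the constructor used above.

```javascript
// Rough heap-growth measurement for Node.js.
// Run with: node --expose-gc measure.js  (without the flag, gc() is
// unavailable and the numbers include garbage not yet collected).
function measureHeap(fill) {
  const collect = typeof globalThis.gc === "function" ? globalThis.gc : () => {};
  collect();
  const before = process.memoryUsage().heapUsed;
  const data = fill(); // keep a reference so the data stays reachable
  collect();
  const after = process.memoryUsage().heapUsed;
  return { bytes: after - before, data };
}

const kCount = 1000000;
function Metadata(e, v) {
  this.e = e;
  this.v = v;
}

const { bytes } = measureHeap(() => {
  const a = new Array(kCount);
  for (let i = 0; i < kCount; i++) {
    a[i] = new Metadata(i, i.toString());
  }
  return a;
});
console.log((bytes / (1024 * 1024)).toFixed(1) + " MiB");
```

For a per-object breakdown rather than a single number, a DevTools heap snapshot is the better tool.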
Chrome docs say that retained size is "the size of memory that is freed once the object itself is deleted along with its dependent objects that were made unreachable from GC roots", which is fair enough. However, even for simple objects, retained size is often 3x the shallow size. I understand that V8 needs to store a reference to the hidden shape, probably some data for GC and so on, but sometimes objects have hundreds of extra "retained" bytes, which seems to be a problem when you need to have millions of such objects. Let's take a look at a simple example:
class TestObject {
constructor( x, y, z ) {
this.x = x;
this.y = y;
this.z = z;
}
}
window.arr = [];
for ( let i = 0; i < 100000; i++ ) {
window.arr.push( new TestObject( Math.random(), Math.random(), Math.random() ) );
}
Here's the memory snapshot:
Shallow size is 24 bytes, which matches perfectly with the fact that we're storing 3 x 8-byte doubles. "Extra" size is 36 bytes, which is enough for 9 x 4-byte pointers (assuming that pointer compression is on). If we add three extra properties, the extra size will be 72 (!) bytes, so it depends on the number of properties. What is being stored there? Is it possible to avoid such massive memory overhead?
V8 developer here.
Shallow size is the object itself, consisting of the standard object header (3 pointers) and 3 in-object properties, which are again pointers. That's 6 (compressed) pointers of 4 bytes each = 24 bytes.
Additional retained size is the storage for the three properties. Each of them is a "HeapNumber", consisting of a 4-byte map pointer plus an 8-byte payload. So that's 3 properties times 12 bytes = 36 bytes. (Armed with this knowledge, it shouldn't be surprising that with another three properties, which presumably are also numbers, this doubles to 72.)
Added up, each object occupies a total of 24+36 = 60 bytes.
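The same arithmetic as a tiny function, with the assumptions spelled out: compressed 4-byte pointers, all properties stored in-object, and every value a double (boxed as a HeapNumber) rather than a small integer (Smi), which holds for Math.random() results. With six numeric properties the boxed part comes out at 72 bytes, matching the question's observation; the shallow part would also grow, assuming the extra properties land in-object too. The function name is hypothetical.

```javascript
// Per-object cost from the answer above, assuming pointer compression
// (4-byte pointers), in-object properties, and double values (HeapNumbers).
const COMPRESSED_POINTER = 4;
const HEADER_POINTERS = 3; // map, properties, elements
const HEAP_NUMBER = 4 + 8; // map pointer + 8-byte double payload

function bytesPerObject(numberProperties) {
  const shallow = (HEADER_POINTERS + numberProperties) * COMPRESSED_POINTER;
  const boxedNumbers = numberProperties * HEAP_NUMBER;
  return { shallow, boxedNumbers, total: shallow + boxedNumbers };
}

console.log(bytesPerObject(3).total);          // 60
console.log(bytesPerObject(3).total * 100000); // 6000000 (bytes for 100,000 objects)
console.log(bytesPerObject(6).boxedNumbers);   // 72
```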
Map and prototype don't count for each object's retained size because they are shared by all objects, so freeing one object wouldn't allow them to be freed as well.
One idea to save memory (if you feel that it is important) is to "transpose" your data organization: instead of 1 array containing 100,000 objects with 3 numbers each, you could have 1 object containing 3 arrays with 100,000 numbers each. Depending on your use case, this may or may not be a feasible approach: if the triples of numbers come and go a lot, then storing them in a single huge array would be unpleasant; whereas if it's a static data set, then both models might be fairly equivalent in usability. If you did this, you'd avoid the repeated per-object overhead; additionally arrays can store double numbers inline (as long as the entire array contains only numbers), so you'd be able to store the same 300K numbers with only 2.4MB total memory consumption.
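A minimal sketch of that "transposed" layout, assuming each column only ever holds numbers so V8 can keep the doubles unboxed. The names createColumns, lengthAt and payloadBytes are hypothetical. Arrays grown with push() may reserve some spare capacity beyond the 8 bytes per element; a fixed-size Float64Array per column is an alternative that avoids this, and three large typed arrays are exactly the "few large arrays" case they are geared towards.

```javascript
// "Transposed" layout: one object holding three parallel arrays instead of
// many {x, y, z} objects. While each array holds only numbers, the doubles
// can be stored unboxed (8 bytes each, no HeapNumber per value).
function createColumns(count) {
  const columns = { x: [], y: [], z: [] };
  for (let i = 0; i < count; i++) {
    columns.x.push(Math.random());
    columns.y.push(Math.random());
    columns.z.push(Math.random());
  }
  return columns;
}

// Accessing "object i" becomes reading index i from each column.
function lengthAt(columns, i) {
  return Math.hypot(columns.x[i], columns.y[i], columns.z[i]);
}

// Payload estimate for unboxed doubles: 3 columns * count * 8 bytes.
function payloadBytes(count) {
  return 3 * count * 8;
}

const columns = createColumns(100000);
console.log(lengthAt(columns, 0));
console.log(payloadBytes(100000)); // 2400000
```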
If you try replacing the 3-property objects with many small TypedArrays, you'll see a significant increase in memory usage, because TypedArrays have much bigger per-object overhead than simple objects. They are geared towards having a few large arrays, not many small ones.
Here is a typical example of what I need to do:
$testArr = array(2.05080E6,29400,420);
$stockArrays = array(
array(2.05080E6,29400,0),
array(2.05080E6,9800,420),
array(1.715E6,24500,280),
array(2.05080E6,29400,140),
array(2.05080E6,4900,7));
I need to identify the stockArray that is the least different. A few clarifications:
The numeric values of array elements at each position are guaranteed not to overlap (i.e. arr[0] will always have the biggest values, arr[1] will be at least an order of magnitude smaller, etc.).
The absolute values of the differences do not count when determining least different. Only the number of differing array indices matters.
Positional differences do have a weighting. Thus in my example stockArr[1] is "more different", even though it too, like its stockArr[0] and stockArr[3] counterparts, differs in only one index position, because that index position is more significant.
The number of stockArrays elements will typically be less than 10 but could potentially be much more (though never into 3 figures).
The stock arrays will always have the same number of elements. The test array will have the same or fewer elements. However, when it has fewer, testArr would be padded out so that potentially matching elements are always in the same place as in the stockArray, e.g.
$testArray(29400,140)
would be transformed to
$testArray(0,29400,140);
prior to being subjected to difference testing.
Finally, a tie is possible. For instance, in my example above the matches would be stockArrays[0] and stockArrays[3].
In my example the result would be
$result = array(0=>array(0,0,1),3=>array(0,0,1));
indicating that the least different stock arrays are at indices 0 & 3 with the differences being at position 2.
In PHP I would handle all of this with array_diff as my starting point. For Node/JavaScript I would probably be tempted to use the php.js array_diff port, though I would be inclined to explore a bit given that in the worst-case scenario it is an O(n²) affair.
I am a newbie when it comes to Golang, so I am not sure how I would implement this problem there. I have noted that Node does have an array_diff npm module.
One off-beat idea I have had is converting the array to a padded string (smaller array elements are 0-padded) and effectively doing an XOR on the ordinal value of each character, but I have dismissed that as probably a rather nutty thing to do.
I am concerned with speed but not at all costs. In an ideal world the same solution (algorithm) would be used in each target language though in reality the differences between them might mean that is not possible/not a good idea.
Perhaps someone here might be able to point me to less pedestrian ways of accomplishing this - i.e. not just array_diff ports.
Here's the equivalent of the array_diff solution: (assuming I didn't make a mistake)
package main
import "fmt"
func FindLeastDifferent(needle []float64, haystack [][]float64) int {
if len(haystack) == 0 {
return -1
}
var currentIndex, currentDiff int
for i, arr := range haystack {
diff := 0
for j := range needle {
if arr[j] != needle[j] {
diff++
}
}
if i == 0 || diff < currentDiff {
currentDiff = diff
currentIndex = i
}
}
return currentIndex
}
func main() {
idx := FindLeastDifferent(
[]float64{2.05080E6, 29400, 420},
[][]float64{
{2.05080E6, 29400, 0},
{2.05080E6, 9800, 420},
{1.715E6, 24500, 280},
{2.05080E6, 29400, 140},
{2.05080E6, 4900, 7},
{2.05080E6, 29400, 420},
},
)
fmt.Println(idx)
}
Like you said, it's O(n * m), where n is the number of elements in the needle array and m is the number of arrays in the haystack.
If you don't know the haystack ahead of time, then there's probably not much you can do to improve this. But if, instead, you're storing this list in a database, I think your intuition about string search has some potential. PostgreSQL for example supports string similarity indexes. (And here's an explanation of a similar idea for regular expressions: http://swtch.com/~rsc/regexp/regexp4.html)
One other idea: if your arrays are really big you can calculate fuzzy hashes (http://ssdeep.sourceforge.net/) which would make your n smaller.
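The Go function above counts differing positions but returns a single index, and it ignores both the positional weighting and the ties described in the question. Here is a JavaScript sketch (Node being one of the asker's targets) of the full rules as I read them. The ranking is an assumption: fewer differing positions wins, and on equal counts a difference at an earlier (larger-magnitude) index counts as worse. All function names are hypothetical. The cost is still O(n * m).

```javascript
// Pad the test array at the front so positions line up with the stock arrays.
function padFront(test, length) {
  return new Array(length - test.length).fill(0).concat(test);
}

// 0/1 vector: 1 wherever the stock array differs from the test array.
function diffVector(test, stock) {
  return stock.map((value, j) => (value === test[j] ? 0 : 1));
}

// Assumed ranking: fewer differences first; on equal counts, a difference
// at an earlier index is worse. Negative means a is less different than b.
function compareDiffs(a, b) {
  const count = (d) => d.reduce((sum, bit) => sum + bit, 0);
  if (count(a) !== count(b)) return count(a) - count(b);
  for (let j = 0; j < a.length; j++) {
    if (a[j] !== b[j]) return a[j] - b[j];
  }
  return 0;
}

// Returns {index: diffVector} for every stock array tied for least different.
function findLeastDifferent(test, stockArrays) {
  const result = {};
  if (stockArrays.length === 0) return result;
  const padded = padFront(test, stockArrays[0].length);
  const diffs = stockArrays.map((stock) => diffVector(padded, stock));
  const best = diffs.reduce((b, d) => (compareDiffs(d, b) < 0 ? d : b));
  diffs.forEach((d, i) => {
    if (compareDiffs(d, best) === 0) result[i] = d;
  });
  return result;
}

const example = findLeastDifferent([2.0508e6, 29400, 420], [
  [2.0508e6, 29400, 0],
  [2.0508e6, 9800, 420],
  [1.715e6, 24500, 280],
  [2.0508e6, 29400, 140],
  [2.0508e6, 4900, 7],
]);
console.log(example); // indices 0 and 3, each with diff vector [0, 0, 1]
```

This reproduces the expected $result from the question for the sample data.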
Suppose I have map like this:
var map = {"a" : 100, "b" : 200, "c": 700};
And I want an Array consisting of "a" 100 times, "b" 200 times and "c" 700 times:
map_array = [a, a, a, a, ... a, b, b, b, ... b, c, c, c, ... c]
Simple solution is to just loop the frequency times and push in the array:
var map_array = [];
for (var key in map)
{
for (var i = 1; i <= map[key]; i++)
{
map_array.push(key);
}
}
But this will obviously take a long time to process with large data. Can we rework the above code to make it more efficient?
It seems to me that the real problem here is constructing the sub-arrays of repeated "a"'s, "b"'s, and "c"'s. Once you have them, you can just concat them to make your final array.
So, what we really want is a function f(x, n) which creates an array filled with n x's.
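As a baseline for that decomposition, here is a sketch that uses the built-in Array.prototype.fill (ES2015) for f and then concatenates the per-key sub-arrays. fill is not one of the methods benchmarked in this answer, so its speed relative to them is untested here; expand is a hypothetical helper name.

```javascript
// f(x, n): an array containing n copies of x, via Array.prototype.fill.
function f(x, n) {
  return new Array(n).fill(x);
}

// Build one sub-array per key, then concatenate them in key order.
function expand(map) {
  const parts = Object.keys(map).map((key) => f(key, map[key]));
  return [].concat(...parts);
}

const result = expand({ a: 100, b: 200, c: 700 });
console.log(result.length);                       // 1000
console.log(result[0], result[100], result[999]); // a b c
```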
So, as a standard testbed, I'm going to define a pair of clock functions. The first measures the time it takes some array-filling function to create 500000 arrays, each containing 2187 "a"'s. The second measures the time it takes some array-filling function to create 500 arrays, each containing 1594323 "a"'s. I chose powers of three because some of my algorithms are binary-based, and I wanted to avoid any coincidences. Regardless, all of the algorithms will work for any n.
var clock1=function(f)
{
var m,t;
m=500000;
t=Date.now();
while(m--)
{
f("a", 2187);
}
t=Date.now()-t;
return t;
};
var clock2=function(f)
{
var m,t;
m=500;
t=Date.now();
while(m--)
{
f("a", 1594323);
}
t=Date.now()-t;
return t;
};
I'm running this test on my local machine running plain v8 in strict mode. Below are some candidates for f:
Linear Method
As already suggested by Alex, you can do this using a linear loop. Simply define an array and run a loop which executes n times, each time adding one x to our array.
var f=function(x,n)
{
var y;
y=Array(n);
while(n--)
{
y[n]=x;
}
return y;
};
We can optimize by using a counting variable, n, to avoid calling push or reading y.length, as well as by pre-initializing the array to the desired length. (Both suggested by Alex.) My backwards while loop is just an old habit that may boost performance slightly.
This function takes 2200ms to pass clock1, and 90658ms to pass clock2.
Partial Binary Method
We can also try constructing it using binary concatenation. The idea is that you start out with a single-element array, then, if its length is significantly less than the target length, you concat it with itself, effectively doubling it. When you get close to the target size, switch back to adding elements one at a time until it reaches its target size:
var f=function(x,n)
{
var y,m;
y=[x];
m=1;
while(m<n)
{
if(m*2<=n)
{
y=y.concat(y);
m*=2;
}
else
{
y[m]=x;
m++;
}
}
return y;
};
Here, m is just a counting variable to keep track of the size of y.
This function takes 3630ms to pass clock1, and 42591ms to pass clock2, making it 65% slower than the linear method for small arrays, but 112% faster for large ones.
Full Binary Method
We can boost performance still further, however, by using full binary construction. The partial binary method suffers because it is forced to switch to element-by-element addition when it approaches its target length (on average, about 75% of the way through). We can fix this:
First, convert the target size into binary and save it to an array. Now, define y to be a single-element array and z to be an empty array. Then, loop (backwards) through the binary array: in each iteration, if the respective binary digit is 1, save the current y into z; then concat y with itself. Finally, concat all of the elements of z together. The result is your complete array.
So, in order to fill an array of length 700, it first converts 700 to binary:
[1,0,1,0,1,1,1,1,0,0]
Looping backwards across it, it performs 9 concats and 6 additions to z, generating a z which looks like this:
[4,8,16,32,128,512]
// I've written the lengths of the sub-arrays rather than the arrays themselves.
When it concat's everything in z together, it gets a single array of length 700, our result.
var f=function(x,n)
{
var y,z,c;
c=0;
y=[x];
z=[];
while(n>0)
{
if(n%2)
{
z[c++]=y;
n--;
}
if(n===0)
{
break;
}
n/=2;
y=y.concat(y);
}
return z.concat.apply([],z);
};
To optimize, I've compressed the binary conversion step and the loop together here. z.concat.apply([],z) uses a bit of apply magic to flatten z, an array of arrays, into a single array. For some reason, this is faster than concating to z on the fly. The second if statement prevents it from doubling y one last time after the computation is complete.
This function takes 3157ms to pass clock1 and 26809ms to pass clock2, making it 15% faster than the partial binary method for small arrays and 59% faster for large ones. It is still 44% slower than the linear method for small arrays.
Binary String Method
The concat function is weird. The larger the arrays to be concatenated, the more efficient it becomes, relatively speaking. In other words, combining two arrays of length 100 is significantly faster than combining four arrays of length 50 using concat. As a result, as the involved arrays get larger, concat becomes more efficient than push or direct assignment. This is one of the main reasons why the binary methods are faster than the linear method for large arrays. Unfortunately, concat also suffers because it copies the involved arrays each time. Because arrays are objects, this gets quite costly. Strings are less complex than arrays, so perhaps using them would avoid this drain? We can simply use string addition (analogous to concatenation) to construct our array, and then split the resulting string.
A full binary method based on strings would look like this:
var f=function(x,n)
{
var y,z;
// Note: the final split("") assumes x is a single character.
y=""+x;
z="";
while(n>0)
{
if(n%2)
{
z+=y;
n--;
}
if(n===0)
{
break;
}
y+=y;
n/=2;
}
return z.split("");
};
This function takes 3484ms to pass clock1 and 14534ms to pass clock2, making it 10% slower than the array-based full binary method at computing small arrays, but 85% faster for large arrays.
So, overall, it's a mixed bag. The linear method gets very good performance on smaller arrays and is extremely simple. The binary string method, however, is a whopping 524% faster on large arrays, and is actually slightly less complex than the binary array method.
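One caveat of the binary string method is that split("") only works when x is a single character, which is not true for arbitrary map keys. A sketch of a variant that keeps the same doubling idea but puts a separator after each copy; the name fillViaString is hypothetical, it assumes the separator ("\u0000" by default) never occurs inside x, and its timings are not measured here.

```javascript
// Binary string method with a separator, so multi-character x survives split().
function fillViaString(x, n, sep = "\u0000") {
  if (n === 0) return [];
  let unit = String(x) + sep; // current block of copies, doubled each round
  let acc = "";               // collects the blocks for the 1-bits of n
  while (n > 0) {
    if (n % 2) {
      acc += unit;
      n--;
    }
    if (n === 0) break;
    unit += unit;
    n /= 2;
  }
  // Drop the trailing separator so split() doesn't yield an empty last item.
  return acc.slice(0, -sep.length).split(sep);
}

console.log(fillViaString("abc", 5).join(",")); // abc,abc,abc,abc,abc
```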
Hope this helps!
ES6 (ECMAScript 2015) added a new string method, String.prototype.repeat().
It can solve your issue neatly: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat
You can do something like this:
const map = {"a" : 10, "b" : 20, "c": 7};
const keys = Object.keys(map);
let finalArr = [];
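keys.forEach(key=>{
// Assumes keys contain no spaces; a count of 0 is skipped because "".split(" ") would yield [""].
if (map[key] > 0) {
finalArr = [...finalArr,...((key+" ").repeat(map[key]).trim().split(" "))];
}
})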
console.log(finalArr);
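Defining the array length up front may be more performant; at the very least, your garbage collector will be happier. Note that a plain object has no length property, so the total has to be summed first:
var total = 0;
for (var key in map) {
total += map[key];
}
var map_array = new Array(total);
var c = 0;
for (var key in map) {
var max = map[key];
for (var i = 1; i <= max; i++) {
map_array[c] = key;
c++;
}
}
That's more performant than using map():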
http://jsperf.com/map-vs-forin/3
EDIT: I don't recommend this solution, but check the comments on this answer to get the most performant answer.
var arrays = Object.keys(map).map(function(key) {
var i = 0, l = map[key], s = "";
for (; i < l; ++i) {
s += key + ",";
}
// Drop the trailing comma; otherwise split() adds an empty string at the end.
return l > 0 ? s.slice(0, -1).split(",") : [];
});
It actually returns three arrays with values, but you can flatten them later with:
var map_array = [].concat.apply([], arrays);
http://jsperf.com/map-vs-forin
Performance associated with Arrays and Objects in JavaScript (especially Google V8) would be very interesting to document. I find no comprehensive article on this topic anywhere on the Internet.
I understand that some Objects use classes as their underlying data structure. If there are a lot of properties, it is sometimes treated as a hash table?
I also understand that Arrays are sometimes treated like C++ Arrays (i.e. fast random indexing, slow deletion and resizing). And, other times, they are treated more like Objects (fast indexing, fast insertion/removal, more memory). And, maybe sometimes they are stored as linked lists (i.e. slow random indexing, fast removal/insertion at the beginning/end)
What is the precise performance of Array/Object retrievals and manipulations in JavaScript? (specifically for Google V8)
More specifically, what is the performance impact of:
Adding a property to an Object
Removing a property from an Object
Indexing a property in an Object
Adding an item to an Array
Removing an item from an Array
Indexing an item in an Array
Calling Array.pop()
Calling Array.push()
Calling Array.shift()
Calling Array.unshift()
Calling Array.slice()
Any articles or links for more details would be appreciated, as well. :)
EDIT: I am really wondering how JavaScript arrays and objects work under the hood. Also, in what context does the V8 engine "know" to "switch-over" to another data structure?
For example, suppose I create an array with...
var arr = [];
arr[10000000] = 20;
arr.push(21);
What's really going on here?
Or... what about this...???
var arr = [];
//Add lots of items
for(var i = 0; i < 1000000; i++)
arr[i] = Math.random();
//Now I use it like a queue...
while(arr.length > 0)
{
var item = arr.shift();
//Do something with item...
}
For conventional arrays, the performance would be terrible; whereas, if a LinkedList was used... not so bad.
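For reference, a common way to use a plain array as a queue without calling shift() at all is to keep a head index and advance it on dequeue, compacting occasionally. This is a sketch, not a built-in: ArrayQueue is a hypothetical name, and the compaction threshold of 1024 is arbitrary.

```javascript
// Queue on top of a plain array: dequeue advances a head index instead of
// shifting every remaining element down by one.
class ArrayQueue {
  constructor() {
    this.items = [];
    this.head = 0;
  }
  enqueue(value) {
    this.items.push(value);
  }
  dequeue() {
    if (this.head === this.items.length) return undefined;
    const value = this.items[this.head];
    this.items[this.head] = undefined; // drop the reference for the GC
    this.head++;
    // Compact once more than half the slots are consumed, so they don't pile up.
    if (this.head > 1024 && this.head * 2 > this.items.length) {
      this.items = this.items.slice(this.head);
      this.head = 0;
    }
    return value;
  }
  get size() {
    return this.items.length - this.head;
  }
}

const queue = new ArrayQueue();
for (let i = 0; i < 5; i++) queue.enqueue(i);
console.log(queue.dequeue(), queue.size); // 0 4
```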
I created a test suite, precisely to explore these issues (and more) (archived copy).
And in that sense, you can see the performance issues in this 50+ test case tester (it will take a long time).
Also, as its name suggests, it explores using the native linked-list nature of the DOM structure.
(Currently down, rebuild in progress.) More details on my blog regarding this.
The summary is as follows:
V8 Array is Fast, VERY FAST
Array push / pop / shift is approx. 20x+ faster than any object equivalent.
Surprisingly, Array.shift() is fast: approx. 6x slower than an array pop, but approx. 100x faster than an object attribute deletion.
Amusingly, Array.push( data ); is faster than Array[nextIndex] = data by almost 20 (dynamic array) to 10 (fixed array) times over.
Array.unshift(data) is slower, as expected, and is approx. 5x slower than adding a new property.
Nulling a value in an array (array[index] = null) is approx. 4x+ faster than deleting it (delete array[index], after which the slot reads as undefined).
Surprisingly, nulling a value in an object (obj[attr] = null) is approx. 2x slower than just deleting the attribute (delete obj[attr]).
Unsurprisingly, mid-array Array.splice(index,0,data) is slow, very slow.
Surprisingly, Array.splice(index,1,data) has been optimized (no length change) and is 100x faster than a plain insertion with Array.splice(index,0,data).
Unsurprisingly, the divLinkedList is inferior to an array in all sectors, except dll.splice(index,1) removal (where it broke the test system).
BIGGEST SURPRISE of it all [as jjrv pointed out], V8 array writes are slightly faster than V8 reads =O
Note: These metrics apply only to large arrays/objects which V8 does not "entirely optimise out". There can be very isolated optimised performance cases for arrays/objects smaller than an arbitrary size (24?). More details can be seen extensively across several Google I/O videos.
Note 2: These wonderful performance results are not shared across browsers, especially *cough* IE. Also, the test is huge, so I have yet to fully analyze and evaluate the results: please edit them in =)
Updated Note (Dec 2012): Google representatives have videos on YouTube describing the inner workings of Chrome itself (like when it switches from a linked-list array to a fixed array, etc.), and how to optimize for them. See GDC 2012: From Console to Chrome for more.
At a basic level that stays within the realms of JavaScript, properties on objects are much more complex entities. You can create properties with setters/getters, with differing enumerability, writability, and configurability. An item in an array isn't able to be customized in this way: it either exists or it doesn't. At the underlying engine level this allows for a lot more optimization in terms of organizing the memory that represents the structure.
In terms of distinguishing an array from an object (dictionary), JS engines have always drawn explicit lines between the two. That's why there's a multitude of articles on methods of trying to make a semi-fake Array-like object that behaves like one but allows other functionality. The reason this separation even exists is because the JS engines themselves store the two differently.
Properties can be stored on an array object but this simply demonstrates how JavaScript insists on making everything an object. The indexed values in an array are stored differently from any properties you decide to set on the array object that represents the underlying array data.
Whenever you're using a legit array object and using one of the standard methods of manipulating that array you're going to be hitting the underlying array data. In V8 specifically, these are essentially the same as a C++ array so those rules will apply. If for some reason you're working with an array that the engine isn't able to determine with confidence is an array, then you're on much shakier ground. With recent versions of V8 there's more room to work though. For example, it's possible to create a class that has Array.prototype as its prototype and still gain efficient access to the various native array manipulation methods. But this is a recent change.
Specific links to recent changes to array manipulation may come in handy here:
http://code.google.com/p/v8/source/detail?r=10024
http://code.google.com/p/v8/source/detail?r=9849
http://code.google.com/p/v8/source/detail?r=9747
As a bit of extra, here are Array Pop and Array Push directly from V8's source as it was at the time, both implemented in JS itself (current V8 no longer implements these builtins in JavaScript):
function ArrayPop() {
if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
throw MakeTypeError("called_on_null_or_undefined",
["Array.prototype.pop"]);
}
var n = TO_UINT32(this.length);
if (n == 0) {
this.length = n;
return;
}
n--;
var value = this[n];
this.length = n;
delete this[n];
return value;
}
function ArrayPush() {
if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
throw MakeTypeError("called_on_null_or_undefined",
["Array.prototype.push"]);
}
var n = TO_UINT32(this.length);
var m = %_ArgumentsLength();
for (var i = 0; i < m; i++) {
this[i+n] = %_Arguments(i);
}
this.length = n + m;
return this.length;
}
I'd like to complement the existing answers with an investigation into how implementations behave regarding growing arrays: if they implement them the "usual" way, one would see many quick pushes with rare, interspersed slow pushes, at which point the implementation copies the internal representation of the array from one buffer to a larger one.
You can see this effect very nicely; this is from Chrome:
16: 4ms
40: 8ms 2.5
76: 20ms 1.9
130: 31ms 1.7105263157894737
211: 14ms 1.623076923076923
332: 55ms 1.5734597156398105
514: 44ms 1.5481927710843373
787: 61ms 1.5311284046692606
1196: 138ms 1.5196950444726811
1810: 139ms 1.5133779264214047
2731: 299ms 1.5088397790055248
4112: 341ms 1.5056755767118273
6184: 681ms 1.5038910505836576
9292: 1324ms 1.5025873221216042
Even though each push is profiled, the output contains only those that take time above a certain threshold. For each test I customized the threshold to exclude all the pushes that appear to be representing the fast pushes.
So the first number represents which element has been inserted (the first line is for the 17th element), the second is how long it took (the benchmark writes to many arrays in parallel), and the last value is the first number divided by the first number of the previous line.
All lines that have less than 2ms execution time are excluded for Chrome.
You can see that Chrome grows the array by a factor of about 1.5 each time, plus an offset that matters for small arrays; the indices above fit new = old + floor(old / 2) + 16 exactly.
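The growth rule can be checked against the measured indices with a few lines. This is a sketch of a rule fitted to the numbers above; treat it as an observation about the V8 version used for the test, not a guarantee for other versions. The function names are hypothetical.

```javascript
// Capacity sequence implied by the Chrome numbers: each reallocation grows
// the backing store by half its current size plus 16 slots.
function nextCapacity(capacity) {
  return capacity + (capacity >> 1) + 16;
}

function capacitySequence(start, steps) {
  const result = [start];
  for (let i = 1; i < steps; i++) {
    result.push(nextCapacity(result[result.length - 1]));
  }
  return result;
}

console.log(capacitySequence(16, 14).join(" "));
// 16 40 76 130 211 332 514 787 1196 1810 2731 4112 6184 9292
```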
For Firefox, it's a power of two:
126: 284ms
254: 65ms 2.015873015873016
510: 28ms 2.0078740157480315
1022: 58ms 2.003921568627451
2046: 89ms 2.0019569471624266
4094: 191ms 2.0009775171065494
8190: 364ms 2.0004885197850513
I had to put the threshold up quite a bit in Firefox, that's why we start at #126.
With IE, we get a mix:
256: 11ms 256
512: 26ms 2
1024: 77ms 2
1708: 113ms 1.66796875
2848: 154ms 1.6674473067915691
4748: 423ms 1.6671348314606742
7916: 944ms 1.6672283066554338
It grows by a factor of two at first and then by a factor of about five thirds (5/3).
So all common implementations use the "normal" way for arrays (instead of going crazy with ropes, for example).
Here's the benchmark code and here's the fiddle it's in.
var arrayCount = 10000;
var dynamicArrays = [];
for(var j=0;j<arrayCount;j++)
dynamicArrays[j] = [];
var lastLongI = 1;
for(var i=0;i<10000;i++)
{
var before = Date.now();
for(var j=0;j<arrayCount;j++)
dynamicArrays[j][i] = i;
var span = Date.now() - before;
if (span > 10)
{
console.log(i + ": " + span + "ms" + " " + (i / lastLongI));
lastLongI = i;
}
}
While running under node.js 0.10 (built on v8) I was seeing CPU usage that seemed excessive for the workload. I traced one performance problem to a function that was checking for the existence of a string in an array. So I ran some tests.
loaded 90,822 hosts
loading config took 0.087 seconds (array)
loading config took 0.152 seconds (object)
Loading 91k entries into an array (with validate & push) is faster than setting obj[key]=value.
In the next test, I looked up every hostname in the list one time (91k iterations, to average the lookup time):
searching config took 87.56 seconds (array)
searching config took 0.21 seconds (object)
The application here is Haraka (an SMTP server), and it loads the host_list once at startup (and after changes) and subsequently performs this lookup millions of times during operation. Switching to an object was a huge performance win.
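A sketch of the two lookup styles being compared, with hypothetical hostnames and helper names (this is not Haraka's code). The array version scans the whole list on every miss, which is why it scales so badly; the object version is a hash lookup. In ES2015+ runtimes, a Set gives the same kind of lookup with a dedicated API. Object.create(null) avoids inherited keys such as "constructor" looking like a hit when the check is a truthiness test.

```javascript
const hostList = ["mx1.example.com", "mx2.example.com", "relay.example.net"];

// Linear scan: O(n) per lookup.
function inArray(list, host) {
  return list.indexOf(host) !== -1;
}

// Object keyed by hostname, with no prototype to inherit keys from.
function toLookupObject(list) {
  const lookup = Object.create(null);
  for (const host of list) lookup[host] = true;
  return lookup;
}

const hostObject = toLookupObject(hostList);
const hostSet = new Set(hostList); // ES2015+

console.log(inArray(hostList, "relay.example.net"));   // true
console.log(hostObject["relay.example.net"] === true); // true
console.log(hostSet.has("constructor"));               // false
```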
Context: I'm building a little site that reads an RSS feed, and updates/checks the feed in the background. I have one array to store data to display, and another which stores IDs of records that have been shown.
Question: How many items can an array hold in JavaScript before things start getting slow or sluggish? I'm not sorting the array, but am using jQuery's inArray function to do a comparison.
The website will be left running and updating, and it's unlikely that the browser will be restarted / refreshed that often.
If I should think about clearing some records from the array, what is the best way to remove some records after a limit, like 100 items?
The maximum length until "it gets sluggish" is totally dependent on your target machine and your actual code, so you'll need to test on that (those) platform(s) to see what is acceptable.
However, the maximum length of an array according to the ECMA-262 5th Edition specification is bound by an unsigned 32-bit integer due to the ToUint32 abstract operation, so the longest possible array could have 2^32 - 1 = 4,294,967,295 (about 4.29 billion) elements.
No need to trim the array, simply address it as a circular buffer (index % maxlen). This will ensure it never goes over the limit (implementing a circular buffer means that once you get to the end you wrap around to the beginning again - not possible to overrun the end of the array).
For example:
var container = new Array();
var maxlen = 100;
var index = 0;
// 'store' 1538 items (only the last 'maxlen' items are kept)
for (var i = 0; i < 1538; i++) {
container[index++ % maxlen] = "storing" + i;
}
// get the item at index 11, counting from the oldest retained item (index 0)
var eleventh = container[(index + 11) % maxlen];
// get the item at index 35, counting from the oldest retained item
var thirtyfifth = container[(index + 35) % maxlen];
// print out all 100 elements that we have left in the array; note
// that it doesn't matter if we address past 100 (circular buffer),
// so we'll simply get back to the beginning if we do that.
for (i = 0; i < 200; i++) {
document.write(container[(index + i) % maxlen] + "<br>\n");
}
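The snippet above assumes the buffer has already filled up at least once; before that, offsets relative to index land on empty slots. A sketch of a small wrapper that also tracks how many items are stored, so reads work from the start. RingBuffer is a hypothetical name, not a built-in.

```javascript
// Fixed-capacity ring buffer that tracks its fill level.
class RingBuffer {
  constructor(capacity) {
    this.items = new Array(capacity);
    this.capacity = capacity;
    this.next = 0;  // slot that the next push() writes to
    this.count = 0; // number of valid items (at most capacity)
  }
  push(value) {
    this.items[this.next] = value;
    this.next = (this.next + 1) % this.capacity;
    if (this.count < this.capacity) this.count++;
  }
  // i = 0 is the oldest retained item, i = count - 1 the newest.
  get(i) {
    if (i < 0 || i >= this.count) return undefined;
    const oldest = (this.next - this.count + this.capacity) % this.capacity;
    return this.items[(oldest + i) % this.capacity];
  }
}

const buffer = new RingBuffer(100);
for (let i = 0; i < 1538; i++) buffer.push("storing" + i);
console.log(buffer.get(0));  // storing1438
console.log(buffer.get(99)); // storing1537
```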
Like @maerics said, your target machine and browser will determine performance.
But for some real world numbers, on my 2017 enterprise Chromebook, running the operation:
console.time();
Array(x).fill(0).filter(x => x < 6).length
console.timeEnd();
x=5e4 takes 16ms, good enough for 60fps
x=4e6 takes 250ms, which is noticeable but not a big deal
x=3e7 takes 1300ms, which is pretty bad
x=4e7 takes 11000ms and allocates an extra 2.5GB of memory
So, on that machine, around 30 million elements is a practical upper limit, because the JavaScript VM falls off a cliff at 40 million elements and will probably crash the process.
EDIT: In the code above, I'm actually filling the array with elements and looping over them, simulating the minimum of what an app might want to do with an array. If you just run Array(2**32-1), you're creating a sparse array that's closer to an empty JavaScript object with a length, like {length: 4294967295}. If you actually tried to use all those 4 billion elements, you'd definitely crash the JavaScript process.
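The difference between setting a length and actually storing elements can be seen directly. This sketch shows that the largest allowed length (2^32 - 1) is accepted without storing anything, that filling is what creates real elements, and that one past the spec limit is rejected with a RangeError.

```javascript
// Array(n) only sets length; it doesn't create n elements.
const sparse = new Array(2 ** 32 - 1); // largest length the spec allows
console.log(sparse.length); // 4294967295
console.log(0 in sparse);   // false: index 0 is a hole, nothing stored

// Filling is what actually creates every element.
const dense = new Array(3).fill(0);
console.log(0 in dense);    // true

// One past the limit is rejected.
try {
  new Array(2 ** 32);
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```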
You could try something like this to test and trim the length:
http://jsfiddle.net/orolo/wJDXL/
var longArray = [1, 2, 3, 4, 5, 6, 7, 8];
if (longArray.length >= 6) {
longArray.length = 3;
}
alert(longArray); //1, 2, 3
I have built a performance framework that manipulates and graphs millions of datasets, and even then, the JavaScript calculation latency was on the order of tens of milliseconds. Unless you're worried about going over the array size limit, I don't think you have much to worry about.
It will be very browser-dependent. 100 items doesn't sound like a large number; I expect you could go a lot higher than that. Thousands shouldn't be a problem. What may be a problem is the total memory consumption.
I have shamelessly pulled some pretty big datasets into memory, and although it did get sluggish, that took maybe 15 MB of data upwards with pretty intense calculations on the dataset. I doubt you will run into problems with memory unless you have intense calculations on the data and many, many rows. Profiling and benchmarking with different mock result sets will be your best bet to evaluate performance.