Related
Yesterday I asked this question on how to iterate over a Map without allocating. A v8 developer replied with the following:
But if you're only interested in processing the values anyway, you can avoid it being created by iterating over just the values: for (let v of map.values()) {...} does not allocate any short-lived objects. The same is true for iterating over map.keys().
I'm trying to replicate this in a test but I'm having trouble. Here's a demo I made that reproduces the problem I'm having with a simple Set. Paste the following code in an empty HTML file in its script tags:
let a = new Set([ 1, 2, 3, 4, 5 ]);
let b = [ 1, 2, 3, 4, 5 ];
let sum = 0;
function setIterate() {
for (let item of a.values()) {
sum += item;
}
}
function arrayIterate() {
for (let i = 0; i < b.length; i++) {
sum += b[i];
}
}
setInterval(setIterate, 1);
// setInterval(arrayIterate, 1);
If you open this HTML file and then hit Shift+Escape it should open the Chrome Task Manager. If you then look at the Javascript Memory column, you'll see the memory slowly growing over time.
However, if you comment the line setInterval(setIterate, 1); and uncomment the line setInterval(arrayIterate, 1); and refresh the page you'll see that the memory usage stays constant.
That is, iterating over a Set produces garbage build-up over-time while iterating over an array does not.
Is there any way to iterate a Set without producing this garbage build-up over time? I'm developing a browser game and iterating over many Sets and Maps in my render loop and getting occasional frame-spikes due to the GC running.
Why does iterating over a Set's values allocate and create garbage?
It doesn't. Doing a million calls to setIterate with --trace-gc (in d8 or node) shows zero activity. If the function allocated even a single byte per iteration (which it can't; the minimum allocation granularity is two pointers, i.e. 8 bytes), then the young generation would have filled up at least once in that time.
Is there any way to iterate a Set without producing this garbage build-up over time?
Absolutely; for example how you're doing it.
why does the Task Manager show memory increasing over time?
I don't see it doing that. I've copy-pasted your snippet exactly as-is and let it sit for a couple of minutes; the "JavaScript memory" column of Chrome's task manager for that tab hasn't budged by a single kilobyte. (Which actually only proves that this test isn't really a stress test: it will start allocating like crazy once sum exceeds 1<<30, but that's not because of the set or the array...)
Wild guess: maybe you have some extension installed that injects code into every page and causes allocations? At any rate, whatever it is, it's not the for..of-loops over Sets.
I am trying to understand how to approach math problems such as the following excerpt, which was demonstrated in a pagination section of a tutorial I was following.
const renderResults = (arrayOfItems, pageNum = 1, resultsPerPage = 10) => {
const start = (pageNum - 1) * resultsPerPage;
const end = pageNum * resultsPerPage;
arrayOfItems.splice(start, end).forEach(renderToScreenFunction);
};
In the tutorial this solution was just typed out and not explained, which got me thinking, had I not seen the solution, I would not have been able to think of it in such a way.
I understood the goal of the problem, and how splice works to break the array into parts. But it was not obvious to me how to obtain the start and end values for using the splice method on an array of of indefinite length. How should have I gone about thinking to solve this problem?
Please understand, I am learning programming in my spare time and what might seem simple to most, I have always been afraid and struggle with math and I am posting this question in hopes to get better.
I would really appreciate if anyone could explain how does one go about solving such problems in theory. And what area of mathematics/programming should I study to get better at such problems. Any pointers would be a huge help. Many thanks.
OK, so what you're starting with is
a list of things to display that's, well, it's as long as it is.
a page number, such that the first page is page 1
a page size (number of items per page)
So to know which elements in the list to show, you need to think about what the page number and page size say about how many elements you have to skip. If you're on page 1, you don't need to skip any elements. What if you're on page 5?
Well, the first page skips nothing. The second page will have to skip the number of elements per page. The third page will have to skip twice the number of elements per page, and so on. We can generalize that and see that for page p, you need to skip p - 1 times the number of elements per page. Thus for page 5 you need to skip 4 times the number of elements per page.
To show that page after skipping over the previous pages is easy: just show the next elements-per-page elements.
Note that there are two details that the code you posted does not appear to address. These details are:
What if the actual length of the list is not evenly divisible by the page size?
What if a page far beyond the actual length of the list is requested?
For the first detail, you just need to test for that situation after you've figured out how far to skip forward.
Your function has an error, in the Splice method
arrayOfItems.splice(start, end).forEach(renderToScreenFunction);
The second argument must be the length to extract, not the final
index. You don't need to calculate the end index, but use the
resultsPerPage instead.
I've rewrite the code without errors, removing the function wrapper for better understanding, and adding some comments...
// set the initial variables
const arrayOfItems =['a','b','c','d','e','f','g','h','i','j','k','l','m'];
const pageNum = 2;
const resultsPerPage = 5;
// calculate start index
const start = (pageNum - 1) * resultsPerPage; // (2-1)*5=5
// generate a new array with elements from arrayOfItems from index 5 to 10
const itemsToShow = arrayOfItems.splice(start, resultsPerPage) ;
// done! output the results iterating the resulting array
itemsToShow.forEach( x=> console.log(x) )
Code explanation :
Sets the initial parameters
Calculate the start index of the array, corresponding to the page you try to get. ( (pageNum - 1) * resultsPerPage )
Generates a new array, extracting resultsPerPage items from arrayOfItems , starting in the start index (empty array is returned if the page does not exist)
Iterates the generated array (itemsToShow) to output the results.
The best way to understand code, is sometimes try to run it and observe the behavior and results.
Given an array having .length 100 containing elements having values 0 to 99 at the respective indexes, where the requirement is to find element of of array equal to n : 51.
Why is using a loop to iterate from start of array to end faster than iterating both start to end and end to start?
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start");
for (let i = 0; i < len; i++) {
if (arr[i] === n) break;
}
console.timeEnd("iterate from start");
const arr = Array.from({length: 100}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start and end");
for (let i = 0, k = len - 1; i < len && k >= 0; i++, k--) {
if (arr[i] === n || arr[k] === n) break;
}
console.timeEnd("iterate from start and end");
jsperf https://jsperf.com/iterate-from-start-iterate-from-start-and-end/1
The answer is pretty obvious:
More operations take more time.
When judging the speed of code, you look at how many operations it will perform. Just step through and count them. Every instruction will take one or more CPU cycles, and the more there are the longer it will take to run. That different instructions take a different amount of cycles mostly does not matter - while an array lookup might be more costly than integer arithmetic, both of them basically take constant time and if there are too many, it dominates the cost of our algorithm.
In your example, there are few different types of operations that you might want to count individually:
comparisons
increments/decrements
array lookup
conditional jumps
(we could be more granular, such as counting variable fetch and store operations, but those hardly matter - everything is in registers anyway - and their number basically is linear to the others).
Now both of your code iterate about 50 times - they element on which they break the loop is in the middle of the array. Ignoring off-by-a-few errors, those are the counts:
| forwards | forwards and backwards
---------------+------------+------------------------
>=/===/< | 100 | 200
++/-- | 50 | 100
a[b] | 50 | 100
&&/||/if/for | 100 | 200
Given that, it's not unexpected that doing twice the works takes considerably longer.
I'll also answer a few questions from your comments:
Is additional time needed for the second object lookup?
Yes, every individual lookup counts. It's not like they could be performed at once, or optimised into a single lookup (imaginable if they had looked up the same index).
Should there be two separate loops for each start to end and end to start?
Doesn't matter for the number of operations, just for their order.
Or, put differently still, what is the fastest approach to find an element in an array?
There is no "fastest" regarding the order, if you don't know where the element is (and they are evenly distributed) you have to try every index. Any order - even random ones - would work the same. Notice however that your code is strictly worse, as it looks at each index twice when the element is not found - it does not stop in the middle.
But still, there are a few different approaches at micro-optimising such a loop - check these benchmarks.
let is (still?) slower than var, see Why is using `let` inside a `for` loop so slow on Chrome? and Why is let slower than var in a for loop in nodejs?. This tear-up and tear-down (about 50 times) of the loop body scope in fact does dominate your runtime - that's why your inefficient code isn't completely twice as slow.
comparing against 0 is marginally faster than comparing against the length, which puts looping backwards at an advantage. See Why is iterating through an array backwards faster than forwards, JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing and Are loops really faster in reverse?
in general, see What's the fastest way to loop through an array in JavaScript?: it changes from engine update to engine update. Don't do anything weird, write idiomatic code, that's what will get optimised better.
#Bergi is correct. More operations is more time. Why? More CPU clock cycles.
Time is really a reference to how many clock cycles it takes to execute the code.
In order to get to the nitty-gritty of that you need to look at the machine level code (like assembly level code) to find the true evidence. Each CPU (core?) clock cycle can execute one instruction, so how many instructions are you executing?
I haven't counted the clock cycles in a long time since programming Motorola CPUs for embedded applications. If your code is taking longer then it is in fact generating a larger instruction set of machine code, even if the loop is shorter or runs an equal amount of times.
Never forget that your code is actually getting compiled into a set of commands that the CPU is going to execute (memory pointers, instruction-code level pointers, interrupts, etc.). That is how computers work and its easier to understand at the micro controller level like an ARM or Motorola processor but the same is true for the sophisticated machines that we are running on today.
Your code simply does not run the way you write it (sounds crazy right?). It is run as it is compiled to run as machine level instructions (writing a compiler is no fun). Mathematical expression and logic can be compiled in to quite a heap of assembly, machine level code and that is up to how the compiler chooses to interpret it (it is bit shifting, etc, remember binary mathematics anyone?)
Reference:
https://software.intel.com/en-us/articles/introduction-to-x64-assembly
Your question is hard to answer but as #Bergi stated the more operations the longer, but why? The more clock cycles it takes to execute your code. Dual core, quad core, threading, assembly (machine language) it is complex. But no code gets executed as you have written it. C++, C, Pascal, JavaScript, Java, unless you are writing in assembly (even that compiles down to machine code) but it is closer to actual execution code.
A masters in CS and you will get to counting clock cycles and sort times. You will likely make you own language framed on machine instruction sets.
Most people say who cares? Memory is cheap today and CPUs are screaming fast and getting faster.
But there are some critical applications where 10 ms matters, where an immediate interrupt is needed, etc.
Commerce, NASA, a Nuclear power plant, Defense Contractors, some robotics, you get the idea . . .
I vote let it ride and keep moving.
Cheers,
Wookie
Since the element you're looking for is always roughly in the middle of the array, you should expect the version that walks inward from both the start and end of the array to take about twice as long as one that just starts from the beginning.
Each variable update takes time, each comparison takes time, and you're doing twice as many of them. Since you know it will take one or two less iterations of the loop to terminate in this version, you should reason it will cost about twice as much CPU time.
This strategy is still O(n) time complexity since it only looks at each item once, it's just specifically worse when the item is near the center of the list. If it's near the end, this approach will have a better expected runtime. Try looking for item 90 in both, for example.
Selected answer is excellent. I'd like to add another aspect: Try findIndex(), it's 2-3 times faster than using loops:
const arr = Array.from({length: 900}, (_, i) => i);
const n = 51;
const len = arr.length;
console.time("iterate from start");
for (let i = 0; i < len; i++) {
if (arr[i] === n) break;
}
console.timeEnd("iterate from start");
console.time("iterate using findIndex");
var i = arr.findIndex(function(v) {
return v === n;
});
console.timeEnd("iterate using findIndex");
The other answers here cover the main reasons, but I think an interesting addition could be mentioning cache.
In general, sequentially accessing an array will be more efficient, particularly with large arrays. When your CPU reads an array from memory, it also fetches nearby memory locations into cache. This means that when you fetch element n, element n+1 is also probably loaded into cache. Now, cache is relatively big these days, so your 100 int array can probably fit comfortably in cache. However, on an array of much larger size, reading sequentially will be faster than switching between the beginning and the end of the array.
Performance associated with Arrays and Objects in JavaScript (especially Google V8) would be very interesting to document. I find no comprehensive article on this topic anywhere on the Internet.
I understand that some Objects use classes as their underlying data structure. If there are a lot of properties, it is sometimes treated as a hash table?
I also understand that Arrays are sometimes treated like C++ Arrays (i.e. fast random indexing, slow deletion and resizing). And, other times, they are treated more like Objects (fast indexing, fast insertion/removal, more memory). And, maybe sometimes they are stored as linked lists (i.e. slow random indexing, fast removal/insertion at the beginning/end)
What is the precise performance of Array/Object retrievals and manipulations in JavaScript? (specifically for Google V8)
More specifically, what it the performance impact of:
Adding a property to an Object
Removing a property from an Object
Indexing a property in an Object
Adding an item to an Array
Removing an item from an Array
Indexing an item in an Array
Calling Array.pop()
Calling Array.push()
Calling Array.shift()
Calling Array.unshift()
Calling Array.slice()
Any articles or links for more details would be appreciated, as well. :)
EDIT: I am really wondering how JavaScript arrays and objects work under the hood. Also, in what context does the V8 engine "know" to "switch-over" to another data structure?
For example, suppose I create an array with...
var arr = [];
arr[10000000] = 20;
arr.push(21);
What's really going on here?
Or... what about this...???
var arr = [];
//Add lots of items
for(var i = 0; i < 1000000; i++)
arr[i] = Math.random();
//Now I use it like a queue...
for(var i = 0; i < arr.length; i++)
{
var item = arr[i].shift();
//Do something with item...
}
For conventional arrays, the performance would be terrible; whereas, if a LinkedList was used... not so bad.
I created a test suite, precisely to explore these issues (and more) (archived copy).
And in that sense, you can see the performance issues in this 50+ test case tester (it will take a long time).
Also as its name suggest, it explores the usage of using the native linked list nature of the DOM structure.
(Currently down, rebuilt in progress) More details on my blog regarding this.
The summary is as followed
V8 Array is Fast, VERY FAST
Array push / pop / shift is ~approx 20x+ faster than any object equivalent.
Surprisingly Array.shift() is fast ~approx 6x slower than an array pop, but is ~approx 100x faster than an object attribute deletion.
Amusingly, Array.push( data ); is faster than Array[nextIndex] = data by almost 20 (dynamic array) to 10 (fixed array) times over.
Array.unshift(data) is slower as expected, and is ~approx 5x slower than a new property adding.
Nulling the value array[index] = null is faster than deleting it delete array[index] (undefined) in an array by ~approx 4x++ faster.
Surprisingly Nulling a value in an object is obj[attr] = null ~approx 2x slower than just deleting the attribute delete obj[attr]
Unsurprisingly, mid array Array.splice(index,0,data) is slow, very slow.
Surprisingly, Array.splice(index,1,data) has been optimized (no length change) and is 100x faster than just splice Array.splice(index,0,data)
unsurprisingly, the divLinkedList is inferior to an array on all sectors, except dll.splice(index,1) removal (Where it broke the test system).
BIGGEST SURPRISE of it all [as jjrv pointed out], V8 array writes are slightly faster than V8 reads =O
Note: These metrics applies only to large array/objects which v8 does not "entirely optimise out". There can be very isolated optimised performance cases for array/object size less then an arbitrary size (24?). More details can be seen extensively across several google IO videos.
Note 2: These wonderful performance results are not shared across browsers, especially
*cough* IE. Also the test is huge, hence I yet to fully analyze and evaluate the results : please edit it in =)
Updated Note (dec 2012): Google representatives have videos on youtubes describing the inner workings of chrome itself (like when it switches from a linkedlist array to a fixed array, etc), and how to optimize them. See GDC 2012: From Console to Chrome for more.
At a basic level that stays within the realms of JavaScript, properties on objects are much more complex entities. You can create properties with setters/getters, with differing enumerability, writability, and configurability. An item in an array isn't able to be customized in this way: it either exists or it doesn't. At the underlying engine level this allows for a lot more optimization in terms of organizing the memory that represents the structure.
In terms of identifying an array from an object (dictionary), JS engines have always made explicit lines between the two. That's why there's a multitude of articles on methods of trying to make a semi-fake Array-like object that behaves like one but allows other functionality. The reason this separation even exists is because the JS engines themselves store the two differently.
Properties can be stored on an array object but this simply demonstrates how JavaScript insists on making everything an object. The indexed values in an array are stored differently from any properties you decide to set on the array object that represents the underlying array data.
Whenever you're using a legit array object and using one of the standard methods of manipulating that array you're going to be hitting the underlying array data. In V8 specifically, these are essentially the same as a C++ array so those rules will apply. If for some reason you're working with an array that the engine isn't able to determine with confidence is an array, then you're on much shakier ground. With recent versions of V8 there's more room to work though. For example, it's possible to create a class that has Array.prototype as its prototype and still gain efficient access to the various native array manipulation methods. But this is a recent change.
Specific links to recent changes to array manipulation may come in handy here:
http://code.google.com/p/v8/source/detail?r=10024
http://code.google.com/p/v8/source/detail?r=9849
http://code.google.com/p/v8/source/detail?r=9747
As a bit of extra, here's Array Pop and Array Push directly from V8's source, both implemented in JS itself:
function ArrayPop() {
if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
throw MakeTypeError("called_on_null_or_undefined",
["Array.prototype.pop"]);
}
var n = TO_UINT32(this.length);
if (n == 0) {
this.length = n;
return;
}
n--;
var value = this[n];
this.length = n;
delete this[n];
return value;
}
function ArrayPush() {
if (IS_NULL_OR_UNDEFINED(this) && !IS_UNDETECTABLE(this)) {
throw MakeTypeError("called_on_null_or_undefined",
["Array.prototype.push"]);
}
var n = TO_UINT32(this.length);
var m = %_ArgumentsLength();
for (var i = 0; i < m; i++) {
this[i+n] = %_Arguments(i);
}
this.length = n + m;
return this.length;
}
I'd like to complement existing answers with an investigation to the question of how implementations behave regarding growing arrays: If they implement them the "usual" way, one would see many quick pushes with rare, interspersed slow pushes at which point the implementation copies the internal representation of the array from one buffer to a larger one.
You can see this effect very nicely, this is from Chrome:
16: 4ms
40: 8ms 2.5
76: 20ms 1.9
130: 31ms 1.7105263157894737
211: 14ms 1.623076923076923
332: 55ms 1.5734597156398105
514: 44ms 1.5481927710843373
787: 61ms 1.5311284046692606
1196: 138ms 1.5196950444726811
1810: 139ms 1.5133779264214047
2731: 299ms 1.5088397790055248
4112: 341ms 1.5056755767118273
6184: 681ms 1.5038910505836576
9292: 1324ms 1.5025873221216042
Even though each push is profiled, the output contains only those that take time above a certain threshold. For each test I customized the threshold to exclude all the pushes that appear to be representing the fast pushes.
So the first number represents which element has been inserted (the first line is for the 17th element), the second is how long it took (for many arrays the benchmark is done for in parallel), and the last value is the division of the first number by that of the one in the former line.
All lines that have less than 2ms execution time are excluded for Chrome.
You can see that Chrome increases array size in powers of 1.5, plus some offset to account for small arrays.
For Firefox, it's a power of two:
126: 284ms
254: 65ms 2.015873015873016
510: 28ms 2.0078740157480315
1022: 58ms 2.003921568627451
2046: 89ms 2.0019569471624266
4094: 191ms 2.0009775171065494
8190: 364ms 2.0004885197850513
I had to put the threshold up quite a bit in Firefox, that's why we start at #126.
With IE, we get a mix:
256: 11ms 256
512: 26ms 2
1024: 77ms 2
1708: 113ms 1.66796875
2848: 154ms 1.6674473067915691
4748: 423ms 1.6671348314606742
7916: 944ms 1.6672283066554338
It's a power of two at first and then it moves to powers of five thirds.
So all common implementations use the "normal" way for arrays (instead of going crazy with ropes, for example).
Here's the benchmark code and here's the fiddle it's in.
var arrayCount = 10000;
var dynamicArrays = [];
for(var j=0;j<arrayCount;j++)
dynamicArrays[j] = [];
var lastLongI = 1;
for(var i=0;i<10000;i++)
{
var before = Date.now();
for(var j=0;j<arrayCount;j++)
dynamicArrays[j][i] = i;
var span = Date.now() - before;
if (span > 10)
{
console.log(i + ": " + span + "ms" + " " + (i / lastLongI));
lastLongI = i;
}
}
While running under node.js 0.10 (built on v8) I was seeing CPU usage that seemed excessive for the workload. I traced one performance problem to a function that was checking for the existence of a string in an array. So I ran some tests.
loaded 90,822 hosts
loading config took 0.087 seconds (array)
loading config took 0.152 seconds (object)
Loading 91k entries into an array (with validate & push) is faster than setting obj[key]=value.
In the next test, I looked up every hostname in the list one time (91k iterations, to average the lookup time):
searching config took 87.56 seconds (array)
searching config took 0.21 seconds (object)
The application here is Haraka (a SMTP server) and it loads the host_list once at startup (and after changes) and subsequently performs this lookup millions of times during operation. Switching to an object was a huge performance win.
Context: I'm building a little site that reads an rss feed, and updates/checks the feed in the background. I have one array to store data to display, and another which stores ID's of records that have been shown.
Question: How many items can an array hold in Javascript before things start getting slow, or sluggish. I'm not sorting the array, but am using jQuery's inArray function to do a comparison.
The website will be left running, and updating and its unlikely that the browser will be restarted / refreshed that often.
If I should think about clearing some records from the array, what is the best way to remove some records after a limit, like 100 items.
The maximum length until "it gets sluggish" is totally dependent on your target machine and your actual code, so you'll need to test on that (those) platform(s) to see what is acceptable.
However, the maximum length of an array according to the ECMA-262 5th Edition specification is bound by an unsigned 32-bit integer due to the ToUint32 abstract operation, so the longest possible array could have 232-1 = 4,294,967,295 = 4.29 billion elements.
No need to trim the array, simply address it as a circular buffer (index % maxlen). This will ensure it never goes over the limit (implementing a circular buffer means that once you get to the end you wrap around to the beginning again - not possible to overrun the end of the array).
For example:
var container = new Array ();
var maxlen = 100;
var index = 0;
// 'store' 1538 items (only the last 'maxlen' items are kept)
for (var i=0; i<1538; i++) {
container [index++ % maxlen] = "storing" + i;
}
// get element at index 11 (you want the 11th item in the array)
eleventh = container [(index + 11) % maxlen];
// get element at index 11 (you want the 11th item in the array)
thirtyfifth = container [(index + 35) % maxlen];
// print out all 100 elements that we have left in the array, note
// that it doesn't matter if we address past 100 - circular buffer
// so we'll simply get back to the beginning if we do that.
for (i=0; i<200; i++) {
document.write (container[(index + i) % maxlen] + "<br>\n");
}
Like #maerics said, your target machine and browser will determine performance.
But for some real world numbers, on my 2017 enterprise Chromebook, running the operation:
console.time();
Array(x).fill(0).filter(x => x < 6).length
console.timeEnd();
x=5e4 takes 16ms, good enough for 60fps
x=4e6 takes 250ms, which is noticeable but not a big deal
x=3e7 takes 1300ms, which is pretty bad
x=4e7 takes 11000ms and allocates an extra 2.5GB of memory
So around 30 million elements is a hard upper limit, because the javascript VM falls off a cliff at 40 million elements and will probably crash the process.
EDIT: In the code above, I'm actually filling the array with elements and looping over them, simulating the minimum of what an app might want to do with an array. If you just run Array(2**32-1) you're creating a sparse array that's closer to an empty JavaScript object with a length, like {length: 4294967295}. If you actually tried to use all those 4 billion elements, you'll definitely crash the javascript process.
You could try something like this to test and trim the length:
http://jsfiddle.net/orolo/wJDXL/
var longArray = [1, 2, 3, 4, 5, 6, 7, 8];
if (longArray.length >= 6) {
longArray.length = 3;
}
alert(longArray); //1, 2, 3
I have built a performance framework that manipulates and graphs millions of datasets, and even then, the javascript calculation latency was on order of tens of milliseconds. Unless you're worried about going over the array size limit, I don't think you have much to worry about.
It will be very browser dependant. 100 items doesn't sound like a large number - I expect you could go a lot higher than that. Thousands shouldn't be a problem. What may be a problem is the total memory consumption.
I have shamelessly pulled some pretty big datasets in memory, and altough it did get sluggish it took maybe 15 Mo of data upwards with pretty intense calculations on the dataset. I doubt you will run into problems with memory unless you have intense calculations on the data and many many rows. Profiling and benchmarking with different mock resultsets will be your best bet to evaluate performance.