In V8, why does a preallocated array consume less memory?

Consider the following two alternatives:
const mb_before = process.memoryUsage().heapUsed / 1024 / 1024;
const n = 15849;
const o = 115;
const entries = [];
for (var i = 0; i < n; i++) {
  const subarr = [];
  for (var j = 0; j < o; j++) {
    subarr.push(Math.random());
  }
  entries.push(subarr);
}
const mb_after = process.memoryUsage().heapUsed / 1024 / 1024;
console.log('arr using ' + (mb_after - mb_before) + ' megabyte');
// arr using 15.110992431640625 megabyte
and
const mb_before = process.memoryUsage().heapUsed / 1024 / 1024;
const n = 15849;
const o = 115;
const entries = new Array(n);
for (var i = 0; i < n; i++) {
  const subarr = new Array(o);
  for (var j = 0; j < o; j++) {
    subarr[j] = Math.random();
  }
  entries[i] = subarr;
}
const mb_after = process.memoryUsage().heapUsed / 1024 / 1024;
console.log('arr using ' + (mb_after - mb_before) + ' megabyte');
// arr using 12.118911743164062 megabyte
From my understanding the two arrays' size should be identical, only the way they were instantiated differs. How can it be explained that the resulting memory usage is consistently different?

I believe this has to do with the way array memory is allocated. When you instantiate an array with a specific size, as in the second example, V8 can allocate a backing store of exactly that size up front.
When you grow an array element by element instead, V8 allocates a little extra capacity to absorb future growth, and each time the array outgrows its backing store the over-allocation gets larger. That unused slack capacity is the extra memory you see in the first example.
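As a rough illustration (this is only a sketch, not V8's exact internals; it assumes the commonly cited growth rule of roughly newCapacity = oldCapacity + oldCapacity / 2 + 16), you can estimate how much slack a push-grown backing store ends up with compared to an exact preallocation:
// Hypothetical estimate of backing-store capacity under an assumed growth rule.
function estimatedCapacity(finalLength) {
  let capacity = 0;
  for (let length = 1; length <= finalLength; length++) {
    if (length > capacity) {
      // assumed rule: grow by ~50% plus a small constant when full
      capacity = capacity + (capacity >> 1) + 16;
    }
  }
  return capacity;
}
const o = 115;
console.log(estimatedCapacity(o)); // 130 under this assumed rule: ~15 slack slots per subarray
console.log(new Array(o).length);  // exactly 115 slots requested up front
With 15,849 subarrays, roughly 15 unused 8-byte slots per subarray already amounts to around 2 MB, which is a sizable chunk of the ~3 MB gap measured in the question.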

I don't find this surprising at all. Although standard arrays aren't really arrays at all,* JavaScript engines optimize by default: they treat them as though they were really arrays when they can.
In your first example, V8 doesn't know how big each of the arrays is going to get — it just keeps growing, and in order to treat it as an optimized array (rather than an object with special properties), V8 has to keep reallocating and copying to make it bigger periodically. So it's not surprising that the most recent proactive allocation left a lot of extra room in case it kept growing.
In your second example, you've given V8 a big old clue in advance of how big you intend to make the array. So it's reasonable that V8 would use that information to optimize the allocation it does for the underlying true array.
* (that's a post on my anemic little blog)
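(Not part of either answer, just a methodology note: heapUsed snapshots are noisy, so forcing a collection before each measurement makes comparisons like the one in the question a little more trustworthy. A minimal sketch, assuming Node is started with --expose-gc so that global.gc is available; the file name is only an example.)
// Run with: node --expose-gc measure.js
function measureMB(build) {
  global.gc();                                   // settle the heap before the "before" snapshot
  const before = process.memoryUsage().heapUsed;
  const kept = build();                          // keep a reference so the result isn't collected
  global.gc();
  const after = process.memoryUsage().heapUsed;
  console.log(((after - before) / 1024 / 1024) + ' megabyte');
  return kept;
}

// e.g. the preallocated variant from the question:
const entries = measureMB(function () {
  const arr = new Array(15849);
  for (var i = 0; i < arr.length; i++) {
    const subarr = new Array(115);
    for (var j = 0; j < subarr.length; j++) subarr[j] = Math.random();
    arr[i] = subarr;
  }
  return arr;
});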

Related

Why is the byte counter of these buffer segments showing 8TB?

I'm aware that I'm doing something computationally unfeasible: essentially "Get all substrings of a string in JavaScript", but with the bytes of a 1 MB exe instead of the characters in a string.
But I wanted to see how many bytes all the segments would add up to, at least until my program crashed. Well, it does crash, but I think my byte count is wrong.
const fs = require("fs");
const bytesPerKB = 1000;
const bytesPerMB = bytesPerKB * 1000;
const bytesPerGB = bytesPerMB * 1000;
function getAllSegments(buffer, skip = 1) {
  let i, j, result = [], bytes = 0;
  for (i = 0; i < buffer.length; i += skip) {
    if (i % 1000 === 0) console.log('getting ranges for byte', i, 'with a total of', bytes / bytesPerGB, 'GB stored')
    for (j = i + 1; j < buffer.length + 1; j++) {
      const entry = buffer.slice(i, j)
      bytes += entry.length
      result.push(entry);
    }
  }
  return result;
}
console.log('ready')
fs.promises.readFile('../data/scraped/test-1MB.exe').then(data => {
  console.log('read file', data)
  let segments = getAllSegments(data, 10000)
  console.log('segments', segments);
})
output: (console screenshot not included; it showed the byte counter reaching roughly 8 TB before the crash)
I'm pretty sure I don't have 8 TBs of storage on my PC, much less 8TB of swap space allocated. What'd I do wrong with the byte counting math?
For every start position in the buffer, you accumulate the lengths of all possible slices running from that start position to the end of the buffer, then you repeat that for the next start position. There are gazillions of overlapping, duplicated byte ranges being counted, so of course it all adds up to far more than the size of the file or the size of your memory.
As for memory usage: buffer.slice() returns a new Buffer object that references the original memory, which is why memory usage doesn't blow up. Each of your sub-buffers is not a separate copy of the data, just a new Buffer object that "points" into the existing buffer with an offset and a length.
From the doc for buffer.slice():
Returns a new Buffer that references the same memory as the original, but offset and cropped by the start and end indices.
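For scale, the total can be estimated without materializing anything: each start position i contributes (L - i)·(L - i + 1)/2 bytes, and summing that over every skip-th start position comes out to roughly L³ / (6·skip). A back-of-the-envelope sketch (the exact figure depends on the real file size, so treat the output as an order-of-magnitude estimate):
// Analytic estimate of the byte counter for a buffer of length L and a given skip.
const L = 1024 * 1024; // assume a ~1 MB file
const skip = 10000;
let total = 0;
for (let i = 0; i < L; i += skip) {
  const remaining = L - i;
  total += remaining * (remaining + 1) / 2; // bytes contributed by all slices starting at i
}
console.log((total / 1e12).toFixed(1) + ' TB'); // on the order of tens of terabytes
So a counter in the terabytes is exactly what the math predicts, even though the process itself only holds small Buffer views.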

Knapsack variant in JavaScript

I have tried to implement this knapsack problem solution algorithm in JavaScript, but the solution s_opt I get has a total weight greater than L_max.
What am I doing wrong?
I suspect it could be something related to closures in recursion.
/*
GENERAL:
Assume we have a knapsack and we want to bring as much stuff as possible.
Of each thing we have several variants to choose from. Each of these variants have
different value and takes different amount of space.
DEFINITIONS:
L_max = integer, size of the knapsack for the entire problem having N items
l = matrix, having the elements l[i-1][j-1] representing the space taken
by variant j of item i (-1 since indexing the matrices has index starting on zero, i.e. item i is stored at position i-1)
p = matrix, having the elements p[i-1][j-1] representing the value given by variant j of item i
n = total number of items (used in a sub-problem)
N = total number of items (used in the full problem, N >= n)
s_opt = vector having the optimal combination of variant selections s_i, i.e. s_opt = arg max p_sum
*/
function knapsack(L_max, l, p) {
  // constructing (initializing) - they are private members
  var self = this; // in order for private functions to be able to read variables
  this.N = l.length;
  var DCached = []; // this is only used by a private function so it doesn't need to be made public using this.*
  this.s_opt = [];
  this.p_mean = null;
  this.L_max = L_max;
  // define public optimization function for the entire problem
  // when this is completed the user can read
  // s_opt to get the solution and
  // p_mean to know the quality of the solution
  this.optimize = function() {
    self.p_mean = D(self.N, self.L_max) / Math.max(1, self.N);
  }
  // define private sub-problem optimization function
  var D = function(n, r) {
    if (r < 0)
      return -Infinity;
    if (n == 0)
      return 0;
    if (DCached[n-1] != null) {
      if (DCached[n-1][r-1] != null) {
        return DCached[n-1][r-1];
      }
    }
    var p_max = -Infinity;
    var p_sum;
    var J = l[n-1].length;
    for (var j = 0; j < J; j++) {
      p_sum = p[n-1][j] + D(n-1, r - l[n-1][j]);
      if (p_sum > p_max) {
        p_max = p_sum;
        self.s_opt[n-1] = j;
      }
    }
    DCached[n-1] = [];
    DCached[n-1][r-1] = p_max;
    return p_max;
  }
}
The client using this knapsack solver does the following:
var knapsackSolution = new knapsack(5,l,p);
knapsackSolution.optimize();
// now the client can access knapsackSolution.s_opt containing the solution.
I found a solution. When solving a sub-problem D(n, r), the code in the question returned the optimized value, but it didn't manage the s_opt array properly. In the modified solution, pasted below, I fixed this: instead of returning only the optimized value of the knapsack, an array of the chosen variants (i.e. the arg of the max) is returned as well. The cache is also modified to store both parts of the solution (the max value and the arg max).
The code below also adds a feature: the user can now pass a value maxComputingComplexity that limits the computational size of the problem in a somewhat heuristic manner.
/*
GENERAL:
Assume we have a knapsack and we want to bring as much stuff as possible.
Of each thing we have several variants to choose from. Each of these variants have
different value and takes different amount of space.
The quantity of each variant is one.
DEFINITIONS:
L_max = integer, size of the knapsack, e.g. max number of letters, for the entire problem having N items
l = matrix, having the elements l[i-1][j-1] representing the space taken
by variant j of item i (-1 since indexing the matrices has index starting on zero, i.e. item i is stored at position i-1)
p = matrix, having the elements p[i-1][j-1] representing the value given by variant j of item i
maxComputingComplexity = value limiting the product L_max*self.N*M_max in order to make the optimization
complete in limited amount of time. It has a serious implication, since it may cut the list of alternatives
so that only the first alternatives are used in the computation, meaning that the input should be well
ordered
n = total number of items (used in a sub-problem)
N = total number of items (used in the full problem, N >= n)
M_i = number of variants of item i
s_i = which variant is chosen to pack of item i
s = vector of elements s_i representing a possible solution
r = maximum total space in the knapsack, i.e. sum(l[i][s_i]) <= r
p_sum = sum of the values of the selected variants, i.e. sum(p[i][s_i]
s_opt = vector having the optimal combination of variant selections s_i, i.e. s_opt = arg max p_sum
In order to solve this, let us see p_sum as a function
D(n,r) = p_sum (just seeing it as a function of the sub-problem n combined with the maximum total space r)
RESULT:
*/
function knapsack(L_max, l, p, maxComputingComplexity) {
  // constructing (initializing) - they are private members
  var self = this; // in order for private functions to be able to read variables
  this.N = l.length;
  var DCached = []; // this is only used by a private function so it doesn't need to be made public using this.*
  //this.s_opt = [];
  //this.p_mean = null;
  this.L_max = L_max;
  this.maxComputingComplexity = maxComputingComplexity;
  //console.log("knapsack: Creating knapsack. N=" + N + ". L_max=" + L_max + ".");
  // object to store the solution (both big problem and sub-problems)
  function result(p_max, s_opt) {
    this.p_max = p_max; //max value
    this.s_opt = s_opt; //arg max value
  }
  // define public optimization function for the entire problem
  // when this is completed the user can read
  // s_opt to get the solution and
  // p_mean to know the quality of the solution
  // computing complexity O(L_max*self.N*M_max),
  // think O=L_max*N*M_max => M_max=O/L_max/N => 3=x/140/20 => x=3*140*20 => x=8400
  this.optimize = function() {
    var M_max = Math.max(maxComputingComplexity / (L_max*self.N), 2); //totally useless if not at least two
    console.log("optimize: Setting M_max =" + M_max);
    return D(self.N, self.L_max, M_max);
    //self.p_mean = mainResult.D / Math.max(1,self.N);
    // console.log...
  }
  // Define private sub-problem optimization function.
  // The function reads two "global" variables, p and l,
  // and as arguments it takes
  // n delimiting which sub-set of items may be included (from p and l)
  // r setting the max space that this sub-set of items may take
  // Based on these arguments the function optimizes D
  // and returns
  // D the max value that can be obtained by combining the things
  // s_opt the selection (array of length n) of things optimizing D
  var D = function(n, r, M_max) {
    // Start by checking whether the value is already cached...
    if (DCached[n-1] != null) {
      if (DCached[n-1][r-1] != null) {
        //console.log("knapsack.D: n=" + n + " r=" + r + " returning from cache.");
        return DCached[n-1][r-1];
      }
    }
    var D_result = new result(-Infinity, []); // here we will manage the result
    //D_result.s_opt[n-1] = 0; // just put something there to start with
    if (r < 0) {
      //D_result.p_max = -Infinity;
      return D_result;
    }
    if (n == 0) {
      D_result.p_max = 0;
      return D_result;
    }
    var p_sum;
    //self.s_opt[n] = 0; not needed
    var J = Math.min(l[n-1].length, M_max);
    var D_minusOneResult; //storing the result when optimizing all previous items given a max length
    for (var j = 0; j < J; j++) {
      D_minusOneResult = D(n-1, r - l[n-1][j], M_max);
      p_sum = p[n-1][j] + D_minusOneResult.p_max;
      if (p_sum > D_result.p_max) {
        D_result.p_max = p_sum;
        D_result.s_opt = D_minusOneResult.s_opt.slice(); // copy so the cached sub-result is not mutated
        D_result.s_opt[n-1] = j;
      }
    }
    DCached[n-1] = [];
    DCached[n-1][r-1] = D_result;
    //console.log("knapsack.D: n=" + n + " r=" + r + " p_max= "+ p_max);
    return D_result;
  }
}
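A hypothetical usage sketch (the l and p matrices below are made-up example data, not from the question; the complexity budget of 8400 is the figure from the comment above):
// Two items, each with a few variants: l[i][j] = space taken, p[i][j] = value.
var l = [[1, 2, 3],   // item 1: space per variant
         [2, 3]];     // item 2: space per variant
var p = [[2, 4, 7],   // item 1: value per variant
         [3, 6]];     // item 2: value per variant

var solver = new knapsack(5, l, p, 8400); // L_max = 5
var best = solver.optimize();             // returns a result object
console.log(best.p_max);                  // best total value achievable within L_max
console.log(best.s_opt);                  // chosen variant index for each item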

Javascript Typed array vs simple array: performance

What I'm basically trying to do is map an array of data points into a WebGL vertex buffer (Float32Array) in realtime (I'm working on animated parametric surfaces). I've assumed that representing data points with Float32Arrays (either one Float32Array per component: [xx...x, yy...y], or interleaved: xyxy...xy) should be faster than storing them in an array of points: [[x, y], [x, y], ... [x, y]], since that is effectively a nested hash-like structure. However, to my surprise, the typed-array approach leads to a slowdown of about 15% in all the major browsers (not counting array creation time). Here's a little test I've set up:
var points = 250000, iters = 100;
function map_2a(x, y) { return Math.sin(x) + y; }
var output = new Float32Array(3 * points);
// generate data
var data = [];
for (var i = 0; i < points; i++)
  data[i] = [Math.random(), Math.random()];
// run
console.time('native');
(function() {
  for (var iter = 0; iter < iters; iter++)
    for (var i = 0, to = 0; i < points; i++, to += 3) {
      output[to] = data[i][0];
      output[to + 1] = data[i][1];
      output[to + 2] = map_2a(data[i][0], data[i][1]);
    }
}());
console.timeEnd('native');
// generate data
var data = [new Float32Array(points), new Float32Array(points)];
for (var i = 0; i < points; i++) {
  data[0][i] = Math.random();
  data[1][i] = Math.random();
}
// run
console.time('typed');
(function() {
  for (var iter = 0; iter < iters; iter++)
    for (var i = 0, to = 0; i < points; i++, to += 3) {
      output[to] = data[0][i];
      output[to + 1] = data[1][i];
      output[to + 2] = map_2a(data[0][i], data[1][i]);
    }
}());
console.timeEnd('typed');
Is there anything I'm doing wrong?
I think your problem is that you are not comparing the same code. In the first example, you have one large array filled with very small arrays. In the second example, you have two very large arrays, and both of them need to be indexed. The profile is different.
If I structure the first example to be more like the second (two large generic arrays), then the Float32Array implementation far outperforms the generic array implementation.
Here is a jsPerf profile to show it.
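A minimal sketch of that restructuring, so both versions read from two large flat arrays (it reuses points, iters, output and map_2a from the snippet in the question; the console.time label is illustrative):
// Generic arrays laid out the same way as the typed version:
// one flat array per component instead of 250,000 tiny [x, y] arrays.
var dataX = [], dataY = [];
for (var i = 0; i < points; i++) {
  dataX[i] = Math.random();
  dataY[i] = Math.random();
}
console.time('generic-flat');
(function() {
  for (var iter = 0; iter < iters; iter++)
    for (var i = 0, to = 0; i < points; i++, to += 3) {
      output[to] = dataX[i];
      output[to + 1] = dataY[i];
      output[to + 2] = map_2a(dataX[i], dataY[i]);
    }
}());
console.timeEnd('generic-flat');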
In V8, values can be SMIs (31/32-bit small integers), doubles, or pointers to heap objects. So I guess that when you operate on floats read from a Float32Array, they have to be converted to doubles first, whereas with ordinary arrays the numbers are already stored as doubles.

What is the most efficient structure to organize buckets in a space partitioning structure on JavaScript - objects or arrays?

I need to create a structure that will partition space of a terrain map in buckets (tiles). I have 2 options. I could store buckets in objects:
var bucket = {
  "0,0": [...],
  "0,1": [...],
  "0,2": [...],
  ...
};
var x = 1, y = 2;
var accessed_bucket = bucket[x + "," + y];
or I could use an array for this:
var width = 512, height = 512;
var bucket = [];
for (var i = 0, len = width * height; i < len; ++i)
  bucket[i] = [];
var x = 1, y = 2;
var accessed_bucket = bucket[x % width + y * width];
The object option is probably more flexible, as I don't have to predefine a width/height, and I don't need to allocate the memory in advance. But I guess access will be slower due to the number-to-string conversion and the internal hashing. My question is: is the array option supposed to be faster? If so, by how much? And what is a good benchmark to test both options?
Edit: as requested by @Niklas B., this is a benchmark. According to it, arrays are astonishingly faster. I'm not sure it is a relevant benchmark, though.
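For reference, a minimal sketch of the kind of micro-benchmark the question asks about (the bucket dimensions and repetition count are arbitrary; sink only exists to keep the reads from being optimized away):
var width = 512, height = 512, reps = 20, sink = 0;

// option 1: object keyed by "x,y" strings
var objBucket = {};
for (var y = 0; y < height; y++)
  for (var x = 0; x < width; x++)
    objBucket[x + "," + y] = [];

// option 2: flat array indexed by x + y * width
var arrBucket = [];
for (var i = 0, len = width * height; i < len; ++i)
  arrBucket[i] = [];

console.time('object keys');
for (var r = 0; r < reps; r++)
  for (var y = 0; y < height; y++)
    for (var x = 0; x < width; x++)
      sink += objBucket[x + "," + y].length;
console.timeEnd('object keys');

console.time('array index');
for (var r = 0; r < reps; r++)
  for (var y = 0; y < height; y++)
    for (var x = 0; x < width; x++)
      sink += arrBucket[x + y * width].length;
console.timeEnd('array index');
console.log(sink); // prevent dead-code elimination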

array.push(element) vs array[array.length] = element [duplicate]

I was wondering if there is a reason to choose
array.push(element)
over
array[array.length] = element
or vice-versa.
Here's a simple example where I have an array of numbers and I want to make a new array of those numbers multiplied by 2:
var numbers = [5, 7, 20, 3, 13];
var arr1 = [];
var len = numbers.length;
for (var i = 0; i < len; i++) {
  arr1.push(numbers[i] * 2);
}
alert(arr1);
var arr2 = [];
for (var i = 0; i < len; i++) {
  arr2[arr2.length] = numbers[i] * 2;
}
alert(arr2);
The fastest way to do it with current JavaScript engines, while also using minimal code, is to store the last element first, thereby allocating the full set of array indices, and then count backwards to 0 while storing the elements, taking advantage of nearby memory storage positions and minimizing cache misses.
var arr3 = [];
for (var i = len; i > 0;) {
  i--;
  arr3[i] = numbers[i] * 2;
}
alert(arr3);
Note that if the number of elements being stored is "big enough" in the view of the JavaScript engine, then the array will be created as a "sparse" array and never converted to a regular flat array.
Yes, I can back this up. The only problem is that JavaScript optimizers are extremely aggressive in throwing away calculations whose results aren't used. So in order for the work to be measured fairly, all the results have to be stored (temporarily). One further optimization that I believed to be obsolete, but that actually improves the speed even further, is to pre-initialize the array using new Array(length). That's an old-hat trick that for a while made no difference, but now, in the days of extreme JavaScript engine optimization, it appears to make a difference again.
<script>
function arrayFwd(set) {
  var x = [];
  for (var i = 0; i < set.length; i++)
    x[x.length] = set[i];
  return x;
}
function arrayRev(set) {
  var x = new Array(set.length);
  for (var i = set.length; i > 0;) {
    i--;
    x[i] = set[i];
  }
  return x;
}
function arrayPush(set) {
  var x = [];
  for (var i = 0; i < set.length; i++)
    x.push(set[i]);
  return x;
}
results = []; /* we'll store the results so that
                 optimizers don't realize the results are not used
                 and thus skip the function's work completely */
function timer(f, n) {
  return function(x) {
    var n1 = new Date(), i = n;
    do { results.push(f(x)); } while (i-- > 0); // do something here
    return (new Date() - n1) / n;
  };
}
set = [];
for (i = 0; i < 4096; i++)
  set[i] = (i) * (i + 1) / 2;
timers = {
  forward: timer(arrayFwd, 500),
  backward: timer(arrayRev, 500),
  push: timer(arrayPush, 500)
};
for (k in timers) {
  document.write(k, ' = ', timers[k](set), ' ms<br />');
}
</script>
Opera 12.15:
forward = 0.12 ms
backward = 0.04 ms
push = 0.09 ms
Chrome (latest, v27):
forward = 0.07 ms
backward = 0.022 ms
push = 0.064 ms
(for comparison, when results are not stored, Chrome produces these numbers:
forward = 0.032 ms
backward = 0.008 ms
push = 0.022 ms
This is almost four times faster versus doing the array forwards, and almost three times faster versus doing push.)
IE 10:
forward = 0.028 ms
backward = 0.012 ms
push = 0.038 ms
Strangely, Firefox still shows push as faster. There must be some code re-writing going on under the hood with Firefox when push is used, because accessing a property and invoking a function are both slower than using an array index in terms of pure, un-enhanced JavaScript performance.
