JavaScript 'var' Data/Object Sizes

Does JavaScript optimize the size of variables stored in memory? For instance, will a variable that has a boolean value take up less space than one that has an integer value?
Basically, will the following array:
var array = new Array(8192);
for (var i = 0; i < array.length; i++)
array[i] = true;
be any smaller in the computer's memory than:
var array = new Array(8192);
for (var i = 0; i < array.length; i++)
array[i] = 9;

Short answer: Yes.
Booleans will generally (depending on the user agent and implementation) take up 4 bytes, while integers will take up 8.
Check out this other StackOverflow question to see how some others managed to measure memory footprints in JS: JavaScript object size
Edit: Section 8.5 of the ECMAScript Spec states the following:
The Number type has exactly 18437736874454810627 values, representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic
... so all numbers should, regardless of implementation, be 8 bytes.

Well, JS has only one number type, which is a 64-bit float. Each character in a string is 16 bits (source: Douglas Crockford's JavaScript: The Good Parts). Handling of booleans is thus probably interpreter-specific. If I remember correctly, though, the V8 engine handles the 'Boolean' object as a C bool.
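If you want to see what your own engine does, a rough way to compare is to fill two large arrays and watch the heap. The sketch below is mine (not from the answers above); it assumes Node.js started with --expose-gc, and the numbers will differ between engines and versions:
// Rough sketch: compare the heap cost of a boolean-filled vs a number-filled array.
// Assumes Node.js run with --expose-gc; results are engine- and version-specific.
function measure(fillValue) {
  global.gc();
  const before = process.memoryUsage().heapUsed;
  const arr = new Array(8192 * 128);
  for (let i = 0; i < arr.length; i++) arr[i] = fillValue;
  global.gc();
  const bytes = process.memoryUsage().heapUsed - before;
  console.log(typeof fillValue, arr.length, 'elements ->', bytes, 'bytes');
}

measure(true); // booleans
measure(9);    // numbers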

Related

Javascript compare object size in memory

I have about a million rows in javascript and I need to store an object for metadata for each of the rows. Given the following two different object types:
{0: {'e': 0, 'v': 'This is a value'}}
And:
{0: '0This is a value'}
What would be the difference in memory between a million objects of the first type and a million objects of the second type? That is:
[obj1, obj1, obj1, ...] // array of 1M
[obj2, obj2, obj2, ...] // array of 1M
V8 developer here. The answer is still "it depends", because engines for a dynamic language tend to adapt to what you're doing, so a tiny testcase is very likely not representative of the behavior of a real application. One high-level rule of thumb that will always hold true: a single string takes less memory than an object wrapping that string. How much less? Depends.
That said, I can give a specific answer for your specific example. For the following code:
const kCount = 1000000;
let a = new Array(kCount);
for (let i = 0; i < kCount; i++) {
// Version 1 (comment out one or the other):
a[i] = {0: {'e': 0, 'v': 'This is a value'}};
// Version 2:
a[i] = {0: '0This is a value'};
}
gc();
running with --expose-gc --trace-gc, I'm seeing:
Version 1: 244.5 MB
Version 2: 206.4 MB
(Nearly current V8, x64, d8 shell. This is what #paulsm4 suggested you could do in DevTools yourself.)
The breakdown is as follows:
the array itself will need 8 bytes per entry
an object created from an object literal has a header of 3 pointers and preallocated space for 4 named properties (unused here), total 7 * 8 = 56 bytes
its backing store for indexed properties allocates space for 17 entries even though only one will be used, plus header that's 19 pointers = 152 bytes
in version 1 we have an inner object that detects that two (and only two) named properties are needed, so it gets trimmed to a size of 5 (3 header, 2 for "e" and "v") pointers = 40 bytes
in version 2 there's no inner object, just a pointer to a string
the string literals are deduplicated, and 0 is stored as a "Smi" directly in the pointer, so neither of these needs extra space.
Summing up:
Version 1: 8+56+152+40 = 256 bytes per object
Version 2: 8+56+152 = 216 bytes per object
However, things will change dramatically if not all strings are the same, if the objects have more or fewer named or indexed properties, if they come from constructors rather than literals, if they grow or shrink over the course of their lifetimes, and a bunch of other factors. Frankly, I don't think any particularly useful insight can be gleaned from these numbers (specifically, while they might seem quite inefficient, they're unlikely to occur in practice in this way -- I bet you're not actually storing so many zeros, and wrapping the actual data into a single-property {0: ...} object doesn't look realistic either).
Let's see! If I drop all the obviously-redundant information from the small test, and at the same time force creation of a fresh, unique string for every entry, I'll be left with this loop to fill the array:
for (let i = 0; i < kCount; i++) {
a[i] = i.toString();
}
which consumes only ~31 MB total. Prefer an actual object for the metadata?
function Metadata(e, v) {
this.e = e;
this.v = v;
}
for (let i = 0; i < kCount; i++) {
a[i] = new Metadata(i, i.toString());
}
Now we're at ~69 MB. As you can see: dramatic changes ;-)
So to determine the memory requirements of your actual, complete app, and any implementation alternatives for it, you'll have to measure things yourself.

Portable hashCode implementation for binary data

I am looking for a portable algorithm for creating a hashCode for binary data. None of the binary data is very long -- I am Avro-encoding keys for use in kafka.KeyedMessages -- we're probably talking anywhere from 2 to 100 bytes in length, but most of the keys are in the 4 to 8 byte range.
So far, my best solution is to convert the data to a hex string, and then do a hashCode of that. I'm able to make that work in both Scala and JavaScript. Assuming I have defined b: Array[Byte], the Scala looks like this:
b.map("%02X" format _).mkString.hashCode
It's a little more elaborate in JavaScript -- luckily someone already ported the basic hashCode algorithm to JavaScript -- but the point is that by creating a hex string to represent the binary data, I can ensure the hashing algorithm works off the same inputs.
On the other hand, I have to create an object twice the size of the original just to create the hashCode. Luckily most of my data is tiny, but still -- there has to be a better way to do this.
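For reference, here is a sketch (mine, helper names hypothetical) of the same hex-then-hashCode approach in JavaScript, with Java's String.hashCode written out inline instead of pulling in a port:
// Sketch: hex-encode the bytes (like the Scala "%02X" format), then apply
// Java's String.hashCode algorithm to the hex string.
function hexHashCode(bytes) {
  let hex = '';
  for (const b of bytes) {
    hex += (b & 0xff).toString(16).padStart(2, '0').toUpperCase();
  }
  let hash = 0;
  for (let i = 0; i < hex.length; i++) {
    // Math.imul + "| 0" reproduces Java's 32-bit overflow behaviour
    hash = (Math.imul(31, hash) + hex.charCodeAt(i)) | 0;
  }
  return hash;
}

hexHashCode([1, 2, 3]); // same value as "010203".hashCode in Scala/Java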
Instead of padding the data as its hex value, I presume you could just coerce the binary data into a String so the String has the same number of bytes as the binary data. It would be all garbled, more control characters than printable characters, but it would be a string nonetheless. Do you run into portability issues though? Endian-ness, Unicode, etc.
Incidentally, if you got this far reading and don't already know this -- you can't just do:
val b: Array[Byte] = ...
b.hashCode
Luckily I already knew that before I started, because I ran into that one early on.
Update
Based on the first answer given, it appears at first blush that java.util.Arrays.hashCode(Array[Byte]) would do the trick. However, if you follow the javadoc trail, you'll see that this is the algorithm behind it, which is based on the algorithm for List combined with the algorithm for byte.
int hashCode = 1;
for (byte e : list) hashCode = 31*hashCode + (e==null ? 0 : e.intValue());
As you can see, all it's doing is creating a Long representing the value. At a certain point, the number gets too big and it wraps around. This is not very portable. I can get it to work for JavaScript, but you have to import the npm module long. If you do, it looks like this:
function bufferHashCode(buffer) {
  const Long = require('long');
  var hashCode = new Long(1);
  for (var value of buffer.values()) { hashCode = hashCode.multiply(31).add(value); }
  return hashCode;
}
bufferHashCode(new Buffer([1,2,3]));
// hashCode = Long { low: 30817, high: 0, unsigned: false }
And you do get the same results when the data wraps around, sort of, though I'm not sure why. In Scala:
java.util.Arrays.hashCode(Array[Byte](1,2,3,4,5,6,7,8,9,10))
// res30: Int = -975991962
Note that the result is an Int. In JavaScript:
bufferHashCode(new Buffer([1,2,3,4,5,6,7,8,9,10]));
// hashCode = Long { low: -975991962, high: 197407, unsigned: false }
So I have to take the low bytes and ignore the high, but otherwise I get the same results.
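If you only need the Java-compatible int, you can (assuming the same long npm module as above) take just the low 32 bits of the result, which is effectively what Java's int arithmetic does:
// Sketch: reduce the 64-bit Long result to a signed 32-bit value, like a Java int.
const javaStyleInt = bufferHashCode(new Buffer([1,2,3,4,5,6,7,8,9,10])).toInt();
// javaStyleInt === -975991962, matching the Scala result above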
This functionality is already available in the Java standard library; look at the Arrays.hashCode() method.
Because your binary data are Array[Byte], here is how you can verify it works:
println(java.util.Arrays.hashCode(Array[Byte](1,2,3))) // prints 30817
println(java.util.Arrays.hashCode(Array[Byte](1,2,3))) // prints 30817
println(java.util.Arrays.hashCode(Array[Byte](2,2,3))) // prints 31778
Update: It is not true that the Java implementation boxes the bytes. Of course, there is conversion to int, but there's no way around that. This is the Java implementation:
public static int hashCode(byte a[]) {
if (a == null) return 0;
int result = 1;
for (byte element : a) result = 31 * result + element;
return result;
}
Update 2
If what you need is a JavaScript implementation that gives the same results as a Scala/Java implementation, then you can extend the algorithm by, e.g., taking only the rightmost 31 bits:
def hashCode(a: Array[Byte]): Int = {
if (a == null) {
0
} else {
var hash = 1
var i: Int = 0
while (i < a.length) {
hash = 31 * hash + a(i)
hash = hash & Int.MaxValue // taking only the rightmost 31 bits
i += 1
}
hash
}
}
and JavaScript:
var hashCode = function(arr) {
if (arr == null) return 0;
var hash = 1;
for (var i = 0; i < arr.length; i++) {
hash = hash * 31 + arr[i]
hash = hash % 0x80000000 // taking only the rightmost 31 bits in integer representation
}
return hash;
}
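For small inputs, where nothing overflows or is masked away, this matches java.util.Arrays.hashCode exactly, for example:
hashCode([1, 2, 3]); // 30817, same as java.util.Arrays.hashCode(Array[Byte](1,2,3)) above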
Why do the two implementations produce the same results? In Java, integer overflow behaves as if the addition were performed without loss of precision and then the bits above 32 were thrown away, and & Int.MaxValue throws away the 32nd bit. In JavaScript, there is no loss of precision for integers up to 2^53, a limit the expression 31 * hash + a(i) never exceeds. % 0x80000000 then behaves as taking the rightmost 31 bits. The case without overflows is obvious.
This is the meat of the algorithm used in the Java library:
int result = 1;
for (byte element : a) result = 31 * result + element;
You comment:
this algorithm isn't very portable
Incorrect. If we are talking about Java, then provided that we all agree on the type of the result, the algorithm is 100% portable.
Yes the computation overflows, but it overflows exactly the same way on all valid implementations of the Java language. A Java int is specified to be 32 bits signed two's complement, and the behavior of the operators when overflow occurs is well-defined ... and the same for all implementations. (The same goes for long ... though the size is different, obviously.)
I'm not an expert, but my understanding is that Scala's numeric types have the same properties as Java's. JavaScript is different, being based on IEEE 754 double-precision floating point. However, with care you should be able to code the Java algorithm portably in JavaScript. (I think #Mifeet's version is wrong ...)
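For completeness, here is a sketch (mine, not from any of the answers) of how you could get exactly Java's Arrays.hashCode semantics in JavaScript, without the 31-bit masking, by emulating 32-bit overflow with Math.imul and | 0:
// Sketch: reproduce Java's Arrays.hashCode(byte[]) including its 32-bit wraparound.
// Int8Array is used so the elements are signed bytes, as in Java.
function javaArrayHashCode(bytes) {
  if (bytes == null) return 0;
  let hash = 1;
  for (let i = 0; i < bytes.length; i++) {
    // Math.imul multiplies as 32-bit integers; "| 0" truncates the sum back to 32 bits.
    hash = (Math.imul(31, hash) + bytes[i]) | 0;
  }
  return hash;
}

javaArrayHashCode(new Int8Array([1, 2, 3]));                       // 30817
javaArrayHashCode(new Int8Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])); // -975991962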

Javascript -- reliable to use numbers as integers? [duplicate]

This question already has answers here:
What is JavaScript's highest integer value that a number can go to without losing precision?
(21 answers)
Closed 7 years ago.
Why is it apparently safe to use numbers as integers in Javascript? What I mean is that a loop such as the one below is generally "trusted" to run the expected number of times even though the final loop requires an exact compare of (10000 == 10000) when these two values are floats and not ints. Is there some sort of built-in rounding feature that makes this safe and reliable -- or is this horrible and untrustworthy coding? Thanks.
--edit--
It is interesting that there is a declared safe integer range. I was not aware of MAX_SAFE_INTEGER. We all know the standard whine that 2 + 2 = 3.9999. I note that MAX_SAFE_INTEGER is listed as ECMAScript-6 so does this imply that IEEE-754 does not actually mention a safe integer range?
var cnt = 0;
for (var i=0 ; i<=10000 ; i++){
// loop 10001 times
cnt++;
}
alert('cnt = '+ cnt);
IEEE-754 double-precision floating point numbers (the kind used by JavaScript) have a very wide range over which they precisely represent integers, specifically -9,007,199,254,740,991 through 9,007,199,254,740,991. (Those values are being added to JavaScript's Number function as constants: MIN_SAFE_INTEGER and MAX_SAFE_INTEGER.) Outside that range, you could indeed run into trouble.
In fact, if it weren't for safety, this loop would never end:
var n, safety;
safety = 0;
for (n = 9007199254740990; n != 9007199254740999; ++n) {
if (++safety === 20) { // Long after `n` should have matched
snippet.log("Had to break out!");
break;
}
snippet.log("n = " + n);
}
<!-- Script provides the `snippet` object, see http://meta.stackexchange.com/a/242144/134069 -->
<script src="http://tjcrowder.github.io/simple-snippets-console/snippet.js"></script>
Firstly, the final loop iteration doesn't require the two values to be equal, just that i is less than or equal to 10000; the loop ends once i exceeds it.
Secondly, the Number type in JavaScript holds integer values accurately up (and down) to a point. You can access the Number.MAX_SAFE_INTEGER and Number.MIN_SAFE_INTEGER properties to see what the safe range is for your browser/engine.
And if you want to check if an instance of Number is an integer, just use the Number.isInteger() method.
var i = 10, e = document.getElementById('message');
if(Number.isInteger(i)) {
e.innerHTML = "i is an integer";
} else {
e.innerHTML = "i is not an integer";
}
<div id='message'></div>
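To make the boundary concrete, here is a small illustration (mine, not from the answer) of where exact integer behaviour stops; Number.isSafeInteger has been available since ES2015:
// Integers are exact up to Number.MAX_SAFE_INTEGER (2^53 - 1); beyond that,
// distinct mathematical integers can map to the same double.
console.log(Number.isSafeInteger(9007199254740991)); // true  (MAX_SAFE_INTEGER)
console.log(Number.isSafeInteger(9007199254740992)); // false
console.log(9007199254740992 === 9007199254740993);  // true -- the odd value is not representable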

How to generate random numbers in a very large range via javascript?

I was using this function for a long time and was happy with it. You probably saw it millions of times. It is even in the example section of the MDN documentation for Math.random()!
function random(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min
};
However, when I called it on a really large range it performed really poorly. Here are some results:
for(var i=0;i<100;i++) { console.log(random(0, 34359738368)) }
34064924616
6800671568
30945277424
2591785504
16404206304
29609031808
14821448928
10712020504
26471102024
21454653384
33180253592
28189739360
27189739528
1159593656
24058421888
13727549496
21995862272
20907450968
28767901872
8055552544
2856286816
28137132160
22775692392
21141911808
16418994064
28151646560
19928528408
11100796192
24022825648
17873139800
10310184976
7425284936
27043756016
2521657024
2864339728
8080550424
8812058632
8867252312
18571554760
19600873680
33687248280
14707542936
28864740112
26338252144
7877957776
28207487968
2268429496
14461565136
28062983608
5637084472
29651319832
31910601904
19776200528
16996597392
2478335752
4751145704
24803500872
21899551216
23144535632
19854787112
8490486080
14932659320
8625736560
11379900040
32357265704
33852039680
2826278800
4648275784
27363699728
14164020752
22279817656
25238815424
16569505656
30065335928
9904863008
26944796040
23179908064
19887944032
27944730648
16242926184
6518696400
25727832240
7496221976
19014687568
5685988776
34324757344
12538943128
21639530152
9532790800
25800487608
34329978920
10871183016
23748271688
23826614456
11774681408
667541072
1316689640
4539806456
2323113432
7782744448
Hardly random at all. All numbers are even.
My question is this: What is the CANONICAL way (if any) to overcome this problem? I have the impression that the above random function is the go-to function for random numbers in range. Thanks in advance.
The WebCrypto API (supported in draft by all the major browsers) provides cryptographically random numbers....
/* assuming that window.crypto.getRandomValues is available */
var array = new Uint32Array(10);
window.crypto.getRandomValues(array);
console.log("Your lucky numbers:");
for (var i = 0; i < array.length; i++) {
console.log(array[i]);
}
W3C standard
https://www.w3.org/TR/WebCryptoAPI/
Example from here.
https://developer.mozilla.org/en-US/docs/Web/API/RandomSource/getRandomValues
The answer in general is don't use Math.random. It gets the job done, but it's not especially good. On top of that, any number in Javascript greater than 0xffffffffUL isn't represented by integer values--it's an IEEE 754 value with a behavior noted on the MDN site: "Note that as numbers in JavaScript are IEEE 754 floating point numbers with round-to-nearest-even behavior...."
And that's what you're seeing.
If you want larger random numbers, then you'll probably have to get something like Mersenne Twister or Blum-Blum-Shub 32-bit random integer values and multiply them. That will eliminate the rounding-off problem.
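One way to do that (a sketch of my own, not from the answer) is to combine two 32-bit words from crypto.getRandomValues into a 53-bit integer and then reduce it to the requested range; note that the simple modulo reduction below has a small bias for ranges that don't divide 2^53:
// Sketch: a large-range random integer built from two crypto-quality 32-bit words.
// Assumes window.crypto.getRandomValues is available and max - min < 2^53.
function cryptoRandom(min, max) {
  const words = new Uint32Array(2);
  window.crypto.getRandomValues(words);
  // 21 high bits + 32 low bits = 53 bits, the exact-integer limit of a double
  const value = (words[0] % 0x200000) * 0x100000000 + words[1];
  return min + (value % (max - min + 1));
}

cryptoRandom(0, 34359738368); // produces odd results too, unlike the example above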
That's weird! Well, you know there is no such thing as truly random when it comes to computers; there is always an algorithm involved. So you found a number that causes all-even results for this particular algorithm.
I tried it out, and it isn't necessarily caused by large numbers; more likely it depends on some kind of factorization of the number instead. Just try another number, even a larger one if you like, and you should get output that isn't all even. For example, 134359738368, which is even larger, doesn't produce only even (or only odd) numbers.

Can you get powers of 10 faster than O(log n)?

I know that exponentiation is O(log n) or worse for most cases, but I'm getting lost trying to understand of how numbers are represented themselves. Take JavaScript, for example, because it has several native number formats:
100000 === 1E5 && 100000 === 0303240
>>> true
Internally, don't they all end up being stored and manipulated as binary values stored in memory? If so, is the machine able to store the decimal and scientific-notation representations as fast as it does the octal?
And thus, would you expect +("1E" + n) to be faster than Math.pow(10, n)?
Mostly this question is about how 1E(n) works, but in trying to think about the answer myself I became more curious about how the number is parsed and stored in the first place. I would appreciate any explanation you can offer.
I don't think the string manipulation could be faster, because at the very least the concatenation creates a new object (memory allocation, more work for the GC), whereas Math.pow usually comes down to a single machine instruction.
Moreover, some modern JS VMs do hotspot optimisation, producing machine code from JavaScript. There is a chance of that for Math.pow, but it's nearly impossible for the string trick.
If you are 100% sure that Math.pow is slow in your application (which I find hard to believe), you could use an array lookup, which should be about as fast as possible: [1,10,100,1000,10000,...][n]. The array would be relatively small and the complexity is O(1).
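A minimal sketch of that lookup-table idea (the names are mine):
// Sketch: precompute powers of ten once, then answer each query with an array index.
// Powers of 10 are exactly representable as doubles up to 1e22, so 23 entries suffice here.
const POWERS_OF_TEN = (() => {
  const table = [];
  for (let i = 0, p = 1; i <= 22; i++, p *= 10) table.push(p);
  return table;
})();

function pow10(n) {
  return POWERS_OF_TEN[n]; // undefined outside the precomputed range
}

pow10(5); // 100000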
but I'm getting lost trying to understand of how numbers are represented themselves. Take JavaScript, for example, because it has several native number formats:
Internally, don't they all end up being stored and manipulated as binary values stored in memory?
Yep. In JavaScript there is only one number type, a 64-bit float, therefore
1 === 1.0
http://www.hunlock.com/blogs/The_Complete_Javascript_Number_Reference
If so, is the machine able to store the decimal and scientific-notation representations as fast as it does the octal?
Yes, again, because there is only one type. (Maybe there is a minute difference, but it should be negligible.)
However, for this specific case there is a limit on the numbers that can be represented, roughly 1.8e308 (Number.MAX_VALUE), so the runtime is O(~308) = O(1); all larger numbers are represented as +/- Infinity.
And thus, would you expect +("1E" + n) to be faster than Math.pow(10, n)?
Not quite! 1E100 is faster than Math.pow(10,n)
However +("1E"+n) is slower than Math.pow(10,n);
Not because of string and memory allocation, but because the JS interpreter has to parse the string and convert it into a number, and that is slower than the native Math.pow(num,num) operation.
jsperf test
I ran a jsperf on the options.
var sum = 0;
for (var i = 1; i < 20; ++i){
sum += +("1E" + i);
}
is slow because of string concatenation.
var sum = 0;
for (var i = 0; i < 20; ++i){
Math.pow(10, i);
}
is therefore faster, since it operates on numbers only.
var sum = 0;
sum += 1e0;
sum += 1e1;
...
sum += 1e19;
is fastest, but likely only because the 1eX literals are constants whose values are computed at parse time.
To get the best peformance, you might want to precompute the answers for yourself.
Math.pow doesn't distinguish between numbers so it is just as slow for every number, provided that the interpreter doesn't optimize for integers. It is likely to allocate just a few floats to run. I am ignoring parsing time.
"1E"+n will allocate 2~3 string objects which might have quite a substantial memory overhead, destroy intermediates, and reparse it as a number. Unlikely to be faster than pow. I am again ignoring the parse time.
