Caching in javascript

Caching in javascript - javascript

Is there a limit on the ammount of data I can store inside a javascript variable? If yes:
Is it limited by javascript, or by browser? (is it a fixed number, or a variable number?)
What if the limit is reached, or exceeded? Does the browser crash, or javascript throws an error?
If I am making a lot of ajax calls, to different pages, and I want to store the result of these ajax calls in a global variable in javascript for future use(to free up the amount of queries to the server, and quicken the response the user gets), is it guaranteed that my data will be stored in this variable?
For example:
function afterAjaxResponse(responseText) {
cache[ajaxIdentifier]=responseText;
}
Is there a limit on how many data I can store in the "cache" object? If yes, can I check somehow if the data to be stored still fits in it, and if not, free up the cache? (for example with a try/catch)
EDIT: The possible duplicate doesn't answer my question, because I want to know the limit of a javascript object, not a string, and it also doesn't answer to what happens when the limit is reached.
There must be a limit, but it would be nice to know, if that limit comes from javascript or the browser, and if I can check somehow if that limit is reached, to solve the problem accordingly.

The only hard-limit i can think of looking at your sample is the array size, that is defined in the ECMAScript standard as being the maximum value that can be represented in an unsigned 32bit integer (via ToUint32):
ToUint32: (Unsigned 32 Bit Integer)
The abstract operation ToUint32 converts its argument to one of 2^32
integer values in the range 0 through 2^32−1, inclusive.
No other limits are present on a generic variable itself, other than the memory available for allocation, if you have enough memory that variable will be stored, if not, it will not (i guess it will not fail gracefully).
There is no way for you to know if something went wrong during allocation, the best approach is to decide beforehand how much memory at max your cache will use and stick to that maximum size (limiting array size or using a circular array considering that it's a cache).

Related

Find JSON object size without parsing it to string

I know I can get the size of a JSON object in bytes by using JSON.parse(data).length.
//UTF-8 etc can be ignored for now but parsing takes time for huge data which I don't want.
Is there any way to get its size in MB without transforming it to a string?

We have next options:
Recurring calculation like object-sizeof library
object to string/buffer:
JSON.stringify({a:1}).length
v8.serialize({a:1}).length

For security reasons, Javascript is not allowed to access or mutate information about the device, therefore, determining exactly how many bytes an object occupies should be impossible.
With that being said, the following Javascript command DOES exist (within chrome only):
window.performance.memory
This returns an object with the amount of bytes the window can use at maximum, the amount of bytes used including free space, and the amount of bytes actually used. You could, theoretically, use that to determine the amount of bytes used before an object was created and after, and calculate the difference. The memory-stats project for instance utilizes that command.
However, the values in this object never change except if chrome was launched with the "--enable-precise-memory-info" flag. You therefore cannot use this command in a (production) application (the MDN docs indicate the same). You can only approach the amount of memory an object occupies by counting all the strings and numbers and multiplying that by how much bytes a string usually occupies (which is what the object-sizeof library does).
If you are just interested in the size of the object and do not wish to use that information in a production app, you can simply do so by making a timeline recording in the Chrome Devtools.

There is no native way to calculate the size of an object in Javascript but there is a node module that gives you the size of an object in bytes.
object-sizeof
This would be an example of what you need:  
var sizeof = require('object-sizeof');
// 2B per character, 6 chars total => 12B
console.log(`${sizeof({abc: 'def'})/1024} MB`);

What is the complexity of JSON.parse() in JavaScript?

The title says it all. I'm going to be parsing a very large JSON string and was curious what the complexity of this built in method was.
I would hope that it's θ(n) where n is the number of characters in the string since it can determine whether there is a syntax error or not.
I tried searching but couldn't come up with anything.

JSON is very simple grammar that does not require even lookaheads. As soon as GC is not involved then it is purely O(n).

I do not know of the implementations in browsers, but your assumption is correct to a certain point. If the JSON includes mainly strings, it will be straight forward and very linear. If you have many floating points, it will take a bit of time to convert the numbers, but again quite linear (numbers with more digits take slightly longer, but in comparison to a long string... very similar).
Since in most cases arrays and objects are declared as maps, the memory allocation grows as required and will generally be linear. Many (if not most) implementations will make use of Java as a backend. This means garbage collection and thus a quite impossible way to know for sure how much time will be required to transform all the data as it will very much depend on things such as the size of the memory model used on the target computer and how often the garbage collection runs. However, it should generally just grow as items are added to the map and it will thus mostly look like it is linear as well. I would not expect an implementation to make use of a realloc() which would mean copying data and thus being slower and slower as an array/object grows bigger and bigger.

Was curious a little more so to add more info I believe this is the "high-level" implementation of JSON.parse. I tried finding if Chromium has their own source for it and not sure if this is it? This is going off of the source from Github.
Things to note:
worst case is probably the scenario of handling objects which requires O(N) time where N is the number of characters.
in the case a reviver function is passed, it has to rewalk the entire object after it's created but that only happens once so it's fairly negligible. Also depends what the reviver function is doing and you will have to account for it's own time complexity.

Are the advantages of Typed Arrays in JavaScript is that they work the same or similar in C?

I've been playing around with Typed Arrays in JavaScript.
var buffer = new ArrayBuffer(16);
var int32View = new Int32Array(buffer);
I imagine normal arrays ([1, 257, true]) in JavaScript have poor performance because their values could be of any type, therefore, reaching an offset in memory is not trivial.
I originally thought that JavaScript array subscripts worked the same as objects (as they have many similarities), and were hash map based, requiring a hash based lookup. But I haven't found much credible information to confirm this.
So, I'd assume the reason why Typed Arrays perform so well is because they work like normal arrays in C, where they're always typed. Given the initial code example above, and wishing to get the 10th value in the typed array...
var value = int32View[10];
The type is Int32, so each value must consist of 32 bits or 4 bytes.
The subscript is 10.
So the location in memory of that value is <array offset> + (4 * 10), and then read 4 bytes to get the total value.
I basically just want to confirm my assumptions. Is my thoughts around this correct, and if not, please elaborate.
I checked out the V8 source to see if I could answer it myself, but my C is rusty and I'm not too familiar with C++.

Typed Arrays were designed by the WebGL standards committee, for performance reasons. Typically Javascript arrays are generic and can hold objects, other arrays and so on - and the elements are not necessarily sequential in memory, like they would be in C. WebGL requires buffers to be sequential in memory, because that's how the underlying C API expects them. If Typed Arrays are not used, passing an ordinary array to a WebGL function requires a lot of work: each element must be inspected, the type checked, and if it's the right thing (e.g. a float) then copy it out to a separate sequential C-like buffer, then pass that sequential buffer to the C API. Ouch - lots of work! For performance-sensitive WebGL applications this could cause a big drop in the framerate.
On the other hand, like you suggest in the question, Typed Arrays use a sequential C-like buffer already in their behind-the-scenes storage. When you write to a typed array, you are indeed assigning to a C-like array behind the scenes. For the purposes of WebGL, this means the buffer can be used directly by the corresponding C API.
Note your memory address calculation isn't quite enough: the browser must also bounds-check the array, to prevent out-of-range accesses. This has to happen with any kind of Javascript array, but in many cases clever Javascript engines can omit the check when it can prove the index value is already within bounds (such as looping from 0 to the length of the array). It also has to check the array index is really a number and not a string or something else! But it is in essence like you describe, using C-like addressing.
BUT... that's not all! In some cases clever Javascript engines can also deduce the type of ordinary Javascript arrays. In an engine like V8, if you make an ordinary Javascript array and only store floats in it, V8 may optimistically decide it's an array of floats and optimise the code it generates for that. The performance can then be equivalent to typed arrays. So typed arrays aren't actually necessary to reach maximum performance: just use arrays predictably (with every element the same type) and some engines can optimise for that as well.
So why do typed arrays still need to exist?
Optimisations like deducing the type of arrays is really complicated. If V8 deduces an ordinary array has only floats in it, then you store an object in an element, it has to de-optimise and regenerate code that makes the array generic again. It's quite an achievement that all this works transparently. Typed Arrays are much simpler: they're guaranteed to be one type, and you just can't store other things like objects in them.
Optimisations are never guaranteed to happen; you may store only floats in an ordinary array, but the engine may decide for various reasons not to optimise it.
The fact they're much simpler means other less-sophisticated javascript engines can easily implement them. They don't need all the advanced deoptimisation support.
Even with really advanced engines, proving optimisations can be used is extremely difficult and can sometimes be impossible. A typed array significantly simplifies the level of proof the engine needs to be able to optimise around it. A value returned from a typed array is certainly of a certain type, and engines can optimise for the result being that type. A value returned from an ordinary array could in theory have any type, and the engine may not be able to prove it will always have the same type result, and therefore generates less efficient code. Therefore code around a typed array is more easily optimised.
Typed arrays remove the opportunity to make a mistake. You just can't accidentally store an object and suddenly get far worse performance.
So, in short, ordinary arrays can in theory be equally fast as typed arrays. But typed arrays make it much easier to reach peak performance.

Yes, you are mostly correct. With a standard JavaScript array, the JavaScript engine has to assume that the data in the array is all objects. It can still store this as a C-like array/vector, where the access to the memory is still like you described. The problem is that the data is not the value, but something referencing that value (the object).
So, performing a[i] = b[i] + 2 requires the engine to:
access the object in b at index i;
check what type the object is;
extract the value out of the object;
add 2 to the value;
create a new object with the newly computed value from 4;
assign the new object from step 5 into a at index i.
With a typed array, the engine can:
access the value in b at index i (including placing it in a CPU register);
increment the value by 2;
assign the new object from step 2 into a at index i.
NOTE: These are not the exact steps a JavaScript engine will perform, as that depends on the code being compiled (including surrounding code) and the engine in question.
This allows the resulting computations to be much more efficient. Also, the typed arrays have a memory layout guarantee (arrays of n-byte values) and can thus be used to directly interface with data (audio, video, etc.).

When it comes to performance, things can change fast. As AshleysBrain says, it comes down to whether the VM can deduce that a normal array can be implemented as a typed array quickly and accurately. That depends on the particular optimizations of the particular JavaScript VM, and it can change in any new browser version.
This Chrome developer comment provides some guidance that worked as of June 2012:
Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow.
Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays. (Fixing this is possible, but it's low priority.)
Micro-benchmarks such as JSPerf are not reliable for real-world performance.
If I might elaborate on the last point, I've seen this phenomenon with Java for years. When you test the speed of a small piece of code by running it over and over again in isolation, the VM optimizes the heck out of it. It makes optimizations which only make sense for that specific test. Your benchmark can get a hundredfold speed improvement compared to running the same code inside another program, or compared to running it immediately after running several different tests that optimize the same code differently.

I'm not really contributor to any javascript engine, only had some readings on v8, so my answer might not be completely true:
Well values in arrays(only normal arrays with no holes/gaps, not sparse. Sparse arrays are treated as objects.) are all either pointers or a number with a fixed length(in v8 they are 32 bit, if a 31 bit integer then it's tagged with a 0 bit in the end, else it's a pointer).
So I don't think finding the memory location is any different than a typedArray, since the number of the bytes are the same all over the array. But the difference comes that if it's an a object, then you have to add one unboxing layer, which doesn't happen for normal typedArrays.
And ofcourse when accessing typedArrays, definitely doesn't have type checking's that a normal array have(though that might be remove in a higly optimized code, which is only generated for hot code).
For Writing, if it's the same type shouldn't be much slower. If it's a different type then the JS engine might generate polymorphic code for it, which is slower.
You can also try making some benchmarks on jsperf.com to confirm.

Best way to encode/decode large quantities of data for a JavaScript client?

I'm writing a standalone javascript application with Spine, Node.js, etc.(Here is an earlier incarnation of it if you are interested). Basically, the application is an interactive 'number property' explorer. The idea being that you can select any number, and see what properties it possesses. Is it a prime, or triangular, etc? Where are other numbers that share the same properties? That kind of thing.
At the moment I can pretty easily show like numbers 1-10k, but I would like to show properties for numbers 1-million, or even better 1-billion.
I want my client to download a set of static data files, and then use them to present the information to the user. I don't want to write a server backend.
Currently I'm using JSON for the data files. For some data, I know a simple algorithm to derive the information I'm looking for on the client side, and I use that (ie, is it even?). For the harder numbers, I pre compute them, and then store the values in JSON parseable data files. I've kinda gone a little overboard with the whole thing - I implemented a pure javascript bloom filter and when that didn't scale to 1 million for primes, I tried using CONCISE bitmaps underneath (which didn't help). Eventually I realized that it doesn't matter too much how 'compressed' I get my data, if I'm representing it as JSON.
So the question is - I want to display 30 properties for each number, and I want to show a million numbers...thats like 30 million data points. I want the javascript app to download this data and present it to the user, but I don't want the user to have to download megabytes of information to use the app...
What options do I have for efficiently sending these large sets of data to my javascript only solution?
Can I convert to binary and then read binary on the client side? Examples, please!

How about just computing these data points on the client?
You'll save yourself a lot of headache. You can pre-compute the index chart and leave the rest of the data-points to be processed only when the user selects a particular number.
For the properties exhibited per number. Pure JavaScript on modern desktops is blindingly fast (if you stay away from DOM), I think you'll find processing speed differences are negligible between the algorithmic vs pre-computed JSON solution and you'll be saving yourself a lot of pain and unnecessary bandwith usage.
As for the initial index chart, this displays only the number of properties per number and can be transferred as an array:
'[18,12,9,11,9,7,8,2,6,1,4, ...]'
or in JSON:
{"i": [18,12,9,11,9,7,8,2,6,1,4, ...]}
Note that this works the same for a logarithmic scale since either way you can only attach a value to 1 point in the screen at any one time. You just have to cater the contents of the array accordingly (by returning logarithmic values sequentially on a 1-2K sized array).
You can even use a DEFLATE algorithm to compress it further, but since you can only display a limited amount of numbers on screen (<1-2K pixels on desktop), I would recommend you create your solution around this fact, for example by checking if you can calculate 2K *30 = 60K properties on the go with minimal impact, which will probably be faster than asking the server at this point to give you some JSON.
UPDATE 10-Jan-2012
I just saw your comment about users being able to click on a particular property and get a list of numbers that display that property.
I think the intial transfer of number of properties above can be jazzed up to include all properties in the initial payload, bearing in mind that you only want to transfer the values for numbers displayed in the initial logarithmic scale you wish to display (that means that you can skip numbers if they are not going to be represented on screen when a user first loads the page or clicks on a property). Anything beyond the initial payload can be calculated on the client.
{
"n": [18,12,9,11,9,7,8,2,6,1,4, ...] // number of properties x 1-2K
"p": [1,2,3,5,7,13,...] // prime numbers x 1-2K
"f": [1,2,6, ...] // factorials x 1-2K
}
My guess is that a JSON object like this will be around 30-60K, but you can further reduce this by removing properties whose algorithms are not recursive and letting the client calculate those locally.
If you want an alternative way to compress those arrays when you get to large numbers, you can format your array as a VECTOR instead of a list of numbers, storing differences between one number and the next, this will keep space down when you are dealing with large numbers (>1000). An example of the JSON above using vectors would be as follows:
{
"n": [18,-6,-3,2,-2,-2,1,-6,4,-5,-1, ...] // vectorised no of properties x 1-2K
"p": [1,1,2,2,2,6,...] // vectorised prime numbers x 1-2K
"f": [1,1,4, ...] // vectorised factorials x 1-2K
}

I would say the easiest way would be to break the dataset out into multiple data files. The "client" can then download the files as-needed based on what number(s) the user is looking for.
One advantage of this is that you can tune the size of the data files as you see fit, from one number per file up to all of the numbers in one file. The client only has to know how to pick the file it's numbers are in. This does require there to be some server, but all it needs to do is serve out the static data files.
To reduce the data load, you can also cache the data files using local storage within the browser.

Greasemonkey Storage

Is there any limit on how much data can be stored using GM_setValue?

GM stores it in properties. Open about:config and look for them.
According to http://diveintogreasemonkey.org/api/gm_getvalue.html, you can find them in the greasemonkey.scriptvals branch.
This sqlite info on its limits shows some default limits for strings and blobs, but they may be changed by Firefox.

More information is in the Greasespot Wiki:
The Firefox preference store is not designed for storing large amounts of data. There are no hard limits, but very large amounts of data may cause Firefox to consume more memory and/or run more slowly.2
The link refers to a discussion in the Greasemonkey Mailinglist. Anthony Lieuallen answers the same question as you posted:
I've just tested this. Running up to a 32 meg string seems to work
without major issues, but 64 or 128 starts to thrash the disk for
virtual memory a fair deal.

According to the site you provided, "The value argument can be a string, boolean, or integer."
Obviously, a string can hold far more information than an integer or boolean.
Since GreaseMonkey scripts are JavaScript, the max length for a GM_setValue is the max length of a JavaScript string. Actually, the JavaScript engine (browser specific) determines the max length of a string.
I do not know any specifics, but you could write a script to determine max length.
Keep doubling length until you get an error. Then, try a value halfway between maxGoodLen and minBadLen until maxGoodLen = maxBadLen - 1.

We Keep Coding

JavaScript is the programming language of the Web.