JavaScript performance when handling large arrays - javascript

I'm currently writing an image editing program in JavaScript. I chose JS because I wanted to learn more about it. The average image I'm handling is about 3000 × 4000 pixels. When converted into imageData (for editing the pixels), that adds up to 48,000,000 values I have to deal with. That's why I decided to introduce web workers and let each of them edit only the n-th part of the array. Pretending that I have ten web workers, each worker would have to deal with 4,800,000 values.
To be able to use web workers, I divide the big array by the number of threads I've chosen. The piece of code I use looks like this:
while (pixelArray.length > 0) {
    cD.pixelsSliced.push(pixelArray.splice(0, chunks)); // Chop off a chunk from the picture array
}
Later, after the workers have done something with the array, they save their results into another array. Each worker has an ID and saves its part at the position of its ID in that array (to make sure the chunks stay in the correct order). I use $.map to flatten that array of arrays (looking like [[1231], [123213123], [213123123]]) into one big array ([231231231413431]), from which I will later create the imageData I need. It looks like this:
cD.newPixels = jQuery.map(pixelsnew, function (n) {
    return n;
});
After this array (cD.newPixels) is created, I create the imageData and copy the pixels into the imageData object like so:
cD.imageData = cD.context.createImageData(cD.width, cD.height);
for (var i = 0; i < cD.imageData.data.length; i += 4) { // Build imageData
    cD.imageData.data[i + eD.offset["r"]] = cD.newPixels[i + eD.offset["r"]];
    cD.imageData.data[i + eD.offset["g"]] = cD.newPixels[i + eD.offset["g"]];
    cD.imageData.data[i + eD.offset["b"]] = cD.newPixels[i + eD.offset["b"]];
    cD.imageData.data[i + eD.offset["a"]] = cD.newPixels[i + eD.offset["a"]];
}
Now I do realize that I'm dealing with a huge amount of data here and that I probably shouldn't use the browser for image editing but a different language (I'm using Java at uni). However, I was wondering if you have any tips regarding performance, because frankly I was pretty surprised when I tried a big image for the first time. I didn't think it would take that long to load the image (first piece of code). Firefox actually thinks my script is broken. The other two pieces of code are the ones I found to slow the script down (which is expected). So yeah, I would be thankful for any tips.
Thank you

I would recommend looking into Transferable Objects instead of structured cloning when using Web Workers. Web Workers normally use structured cloning to pass objects; in other words, a copy is made. This can take a lot of time for large objects such as big images.
When using Transferable Objects, data is transferred from one context to another. In other words, it's zero-copy, which should improve the performance of sending data to a worker.
For more info check:
http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#transferable-objects
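As a minimal sketch of what that looks like in practice (the worker file name and message shape are placeholders, not taken from your code): the second argument to postMessage lists the buffers to transfer, and a transferred buffer is no longer usable in the sending context.
var worker = new Worker('editWorker.js');   // placeholder worker script

// ImageData.data is a Uint8ClampedArray, and its underlying ArrayBuffer is transferable.
var buffer = imageData.data.buffer;

// Zero-copy: the buffer listed in the second argument is moved, not cloned.
worker.postMessage({ width: imageData.width, height: imageData.height, pixels: buffer }, [buffer]);

// After this call, `buffer` (and imageData.data) is detached in the main thread.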
Also, another idea might be to move the task of splitting the large array and putting it back together to a web worker.
Just brainstorming here, but perhaps you could first spawn a Web Worker, let's call it the mother worker. This worker could split the array and then spawn 10 child workers that perform the heavy-duty task and send their parts back to their mother.
The mother finally puts it all back together and sends it back to the main application.
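A very rough sketch of that idea (file names, chunk handling and the message format are all assumptions, not code from the question); note that spawning workers from inside a worker requires nested worker support in the browser:
// motherWorker.js -- hypothetical "mother" worker
var NUM_CHILDREN = 10;

self.onmessage = function (e) {
    var pixels = new Uint8ClampedArray(e.data);                // full pixel buffer from the main thread
    var chunkSize = Math.ceil(pixels.length / NUM_CHILDREN);
    var result = new Uint8ClampedArray(pixels.length);
    var finished = 0;

    for (var i = 0; i < NUM_CHILDREN; i++) {
        (function (offset) {
            var chunk = pixels.slice(offset, offset + chunkSize); // this child's slice (own buffer)
            var child = new Worker('childWorker.js');             // hypothetical child script

            child.onmessage = function (ev) {
                result.set(new Uint8ClampedArray(ev.data), offset); // put the edited chunk back in place
                if (++finished === NUM_CHILDREN) {
                    self.postMessage(result.buffer, [result.buffer]); // reassembled image back to main thread
                }
            };

            child.postMessage(chunk.buffer, [chunk.buffer]);        // transfer the chunk, zero-copy
        })(i * chunkSize);
    }
};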

Related

Building large strings in JavaScript; Is the join method most efficient?

In writing a database to disk as a text file of JSON strings, I've been experimenting with how to most efficiently build the string of text that is ultimately converted to a blob for download to disk.
There are a number of questions that advise against concatenating a string with the + operator in a loop, and instead recommend writing the component strings to an array and then using the join method to build one large string.
The best explanation I came across explaining why can be found here, by Joel Mueller:
"In JavaScript (and C# for that matter) strings are immutable. They can never be changed, only replaced with other strings. You're probably aware that combined + "hello " doesn't directly modify the combined variable - the operation creates a new string that is the result of concatenating the two strings together, but you must then assign that new string to the combined variable if you want it to be changed.
So what this loop is doing is creating a million different string objects, and throwing away 999,999 of them. Creating that many strings that are continually growing in size is not fast, and now the garbage collector has a lot of work to do to clean up after this."
The thread here was also helpful.
However, using the join method didn't allow me to build the string I was aiming for without getting the error:
allocation size overflow
I was trying to write 50,000 JSON strings from a database into one text file, which simply may have been too large no matter what. I think it was reaching over 350MB. I was just testing the limit of my application and picked something far larger than a user of the application will likely ever create. So, this test case was likely unreasonable.
Nonetheless, this leaves me with three questions about working with large strings.
For the same amount of data overall, does altering the number of array elements joined in a single join operation affect the efficiency in terms of not hitting an allocation size overflow?
For example, I tried writing the JSON strings to a pseudo 3-D array of 100 (and then 50) elements per dimension; and then looped through the outer two dimensions joining them together. 100^3 = 1,000,000 or 50^3 = 125,000 both provide more than enough entries to hold the 50,000 JSON strings. I know I'm not including the 0 index, here.
So, the 50,000 strings were held in an array from a[1][1][1] to a[5][100][100] in the first attempt and from a[1][1][1] to a[20][50][50] in the second attempt. If the dimensions are i, j, k from outer to inner, I joined all the k elements in each a[i][j]; then joined all of those i × j joins; and lastly joined all of these i joins into the final text string.
All attempts still hit the allocation size overflow before completing.
So, is there any difference between joining 50,000 smaller strings in one join versus 50 larger strings, if the total data is the same?
Is there a better, more efficient way to build large strings than the join method?
Does the same principle described by Joel Mueller regarding string concatenation apply to reducing a string through substring, such as string = string.substring(position)?
The context of this third question is that when I read a text file in as a string and break it down into its component JSON strings before writing to the database, I use an array that is a map of the file layout; so I know the length of each JSON string in advance and repeat three statements inside a loop:
l = map[i].l;
str = text.substring(0, l);
text = text.substring(l);
It would appear that, since strings are immutable, this sort of reverse concatenation step is as inefficient as using the + operator to concatenate.
Would it be more efficient not to delete str from text each iteration, and instead just keep track of the increasing start and end positions for the substrings as I step through the loop reading the entire text string?
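For what it's worth, this is a sketch of the position-tracking variant I have in mind (variable names follow the loop above; map[i].l is assumed to hold each string's length):
var pos = 0;
for (var i = 0; i < map.length; i++) {
    l = map[i].l;
    str = text.substring(pos, pos + l);   // extract this record without shrinking `text`
    pos += l;                             // advance the cursor instead of reassigning `text`
    // ... process str ...
}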
Response to message about duplicate question
I got a message, I guess from the stackoverflow system itself, asking me to edit my question explaining why it is different from the proposed duplicate.
Reasons are:
The proposed duplicate asks specifically and exclusively about the maximum size of a single string. None of the three bolded questions, here, asks about the maximum size of a single string, although that is useful to know.
This question asks about the most efficient way of building large strings and that isn't addressed in the answers found in the proposed duplicate, apart from an efficient way of building a large test string. They don't address how to build a realistic string composed of actual application data.
This question provides a couple links to some information concerning the efficiency of building large strings that may be helpful to those interested in more than the maximum size alone.
This question also has a specific context of why the large string was being built, which led to some suggestions about how to handle that situation in a more efficient manner. Although, in the strictest sense, they don't specifically address the question by title, they do address the broader context of the question as presented, which is how to deal with the large strings, even if that means ways to work around them. Someone searching on this same topic might find real help in these suggestions that is not provided in the proposed duplicate.
So, although the proposed duplicate is somewhat helpful, it doesn't appear to be anywhere near a genuine duplicate of this question in its full context.
Additional Information
This doesn't answer the question concerning the most efficient way to build a large string, but it refers to the comments about how to get around the string size limit.
Converting each component string to a blob and holding them in an array, and then converting the array of blobs into a single blob, accomplished this. I don't know what the size limit of a single blob is, but did see 800MB in another question.
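A minimal sketch of that blob-of-blobs approach (the records variable and the JSON layout are placeholders for the application's actual data, not part of my code):
var parts = [];
for (var i = 0; i < records.length; i++) {
    // Each component string becomes one part; the parts may also be Blobs themselves.
    parts.push(JSON.stringify(records[i]) + '\n');
}
var file = new Blob(parts, { type: 'application/json' });   // one Blob without ever building one giant string
var url = URL.createObjectURL(file);                        // usable as the href of a download link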
A process (or starting point) for creating the blob to write the database to disk and then to read it back in again can be found here.
As for the idea of writing the blobs or strings to disk as they are generated on the client, as opposed to generating one giant string or blob for download: although that is the most logical and efficient method, it may not be possible in the scenario presented here of an offline application.
According to this question, web extensions no longer have access to the privileged javascript code necessary to accomplish this through the File API.
I asked this question related to the Streams API write stream method and something called StreamSaver.
In writing a database to disk as a text file of JSON strings.
I see no reason to store the data in a string or array of strings in this case. Instead you can write the data directly to the file.
In the simplest case you can write each string to the file separately.
To get better performance, you could first write some data to a smaller buffer, and then write that buffer to disk when it's full.
For best performance you could create a file of a certain size and create a memory mapping over that file. Then write/copy the data directly to the mapped memory (which is your file). The trick would be to know or guess the size up front, or you could resize the file when needed and then remap the file.
Joining or growing strings will trigger a lot of memory (re)allocations, which is unnecessary overhead in this case.
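As a sketch of the "write as you go" idea in a Node.js environment (an assumption on my part; the original question is about a browser-based offline app where this API is not available):
const fs = require('fs');
const out = fs.createWriteStream('database.json');   // hypothetical output file

for (const record of records) {                      // `records` is assumed
    out.write(JSON.stringify(record) + '\n');        // each string goes straight to the file
}
out.end();                                           // no giant in-memory string is ever built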
I don't want the user to have to download more than one file
If the goal is to let a user download that generated file, you could even do better by streaming those strings directly to the user without even creating a file. This also has the advantage that the user starts receiving data immediately instead of first having to wait till the whole file is generated.
Because the file size is not known up front, you could use chunked transfer encoding.
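A rough sketch of that streaming idea with Node's http module (server-side and purely illustrative; omitting the Content-Length header makes Node fall back to chunked transfer encoding):
const http = require('http');

http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'application/json' }); // no Content-Length => chunked
    for (const record of records) {                             // `records` is assumed
        res.write(JSON.stringify(record) + '\n');               // each chunk is sent as it is produced
    }
    res.end();
}).listen(8080);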

Node.js JSON.parse on object creation vs. with getter property

This is largely an 'am I doing it right / how can I do this better' kind of topic, with some concrete questions at the end. If you have other advice / remarks on the text below, even if I didn't specifically ask those questions, feel free to comment.
I have a MySQL table for users of my app that, along with a set of fixed columns, also has a text column containing a JSON config object. This is to store variable configuration data that cannot be stored in separate columns because it has different properties per user. There doesn't need to be any lookup / ordering / anything on the configuration data, so we decided this would be the best way to go.
When querying the database from my Node.JS app (running on Node 0.12.4), I assign the JSON text to an object and then use Object.defineProperty to create a getter property that parses the JSON string data when it is needed and adds it to the object.
The code looks like this:
user =
    uid: results[0].uid
    _c: results[0].user_config # JSON config data as string

Object.defineProperty user, 'config',
    get: ->
        @c = JSON.parse @_c if not @c?
        return @c
Edit: the above code is CoffeeScript; here's the (approximate) JavaScript equivalent for those of you who don't use CoffeeScript:
var user = {
    uid: results[0].uid,
    _c: results[0].user_config // JSON config data as string
};

Object.defineProperty(user, 'config', {
    get: function () {
        if (this.c === undefined) {
            this.c = JSON.parse(this._c);
        }
        return this.c;
    }
});
I implemented it this way because parsing JSON blocks the Node event loop, and the config property is only needed about half the time (this is in a middleware function for an express server) so this way the JSON would only be parsed when it is actually needed. The config data itself can range from 5 to around 50 different properties organised in a couple of nested objects, not a huge amount of data but still more than just a few lines of JSON.
Additionally, there are three of these JSON objects (I only showed one since they're all basically the same, just with different data in them). Each one is needed in different scenarios but all of the scenarios depend on variables (some of which come from external sources) so at the point of this function it's impossible to know which ones will be necessary.
So I had a couple of questions about this approach that I hope you guys can answer.
Is there a negative performance impact when using Object.defineProperty, and if yes, is it possible that it could negate the benefit from not parsing the JSON data right away?
Am I correct in assuming that not parsing the JSON right away will actually improve performance? We're looking at a continuously high number of requests and we need to process these quickly and efficiently.
Right now the three JSON data sets come from two different tables JOINed in an SQL query. This is to only have to do one query per request instead of up to four. Keeping in mind that there are scenarios where none of the JSON data is needed, but also scenarios where all three data sets are needed (and of course scenarios in between), could it be an improvement to only get the required JSON data from its table at the point when one of the data sets is actually needed? I avoided this because I feel like waiting for four separate SELECT queries to be executed would take longer than waiting for one query with two JOINed tables.
Are there other ways to approach this that would improve the general performance even more? (I know, this one's a bit of a subjective question, but ideas / suggestions of things I should check out are welcome). I'm not looking to spin off parsing the JSON data into a separate thread though, because as our service runs on a cluster of virtualised single-core servers, creating a child process would only increase overall CPU usage, which at high loads would have even more negative impact on performance.
Note: when I say performance it mainly means fast and efficient throughput rates. We prefer a somewhat larger memory footprint over heavier CPU usage.
We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil
- Donald Knuth
What do I get from that article? Too much time is spent in optimizing with dubious results instead of focusing on design and clarity.
It's true that JSON.parse blocks the event loop, but every synchronous call does - this is just code execution and is not a bad thing.
The root concern is not that it is blocking, but how long it is blocking. I remember a Strongloop instructor saying 10ms was a good rule of thumb for max execution time for a call in an app at cloud scale. >10ms is time to start optimizing - for apps at huge scale. Each app has to define that threshold.
So, how much execution time will your lazy init save? This article says it takes 1.5 s to parse a 15 MB JSON string - about 10,000 bytes/ms. 3 configs × 50 properties each × 30 bytes per key-value pair = 4,500 bytes - about half a millisecond.
When the time came to optimize, I would look at having your lazy init do the MySQL call. A config is needed only 50% of the time, it won't block the event loop, and an external call to a db absolutely dwarfs a JSON.parse().
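A sketch of what that could look like (the db.query call, the table name and the promise-based caching are my assumptions, not the poster's schema or code): the config query only runs the first time the config is actually requested.
var user = {
    uid: results[0].uid,
    _configPromise: null
};

user.getConfig = function () {
    if (user._configPromise === null) {
        user._configPromise = new Promise(function (resolve, reject) {
            // Hypothetical query; only issued on first access, then cached.
            db.query('SELECT user_config FROM user_config_table WHERE uid = ?', [user.uid],
                function (err, rows) {
                    if (err) return reject(err);
                    resolve(JSON.parse(rows[0].user_config));
                });
        });
    }
    return user._configPromise;   // subsequent calls reuse the cached promise
};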
All of this to say: What you are doing is not necessarily bad or wrong, but if the whole app is littered with these types of dubious optimizations, how does that impact feature addition and maintenance? The biggest problems I see revolve around time to market, not speed. Code complexity increases time to market.
Q1: Is there a negative performance impact when using Object.defineProperty...
Check out this site for a hint.
Q2: ...not parsing the JSON right away will actually improve performance...
IMHO: inconsequentially
Q3: Right now the three JSON data sets come from two different tables...
The majority of the db query cost is usually the out-of-process call and the network data transport (unless you have a really bad schema or config). Getting all the data in one call is the right move.
Q4: Are there other ways to approach this that would improve the general performance
Impossible to tell. The place to start is with an observed behavior, then profiler tools to identify the culprit, then code optimization.

how to pass large data to web workers

I am working with web workers and I am passing a large amount of data to a web worker, which takes a lot of time. I want to know the most efficient way to send the data.
I have tried the following code:
var worker = new Worker('js2.js');
worker.postMessage(buffer, [buffer]);
worker.postMessage(obj, [obj.mat2]);

if (buffer.byteLength) {
    alert('Transferables are not supported in your browser!');
}
UPDATE
Modern versions of Chrome, Edge, and Firefox now support SharedArrayBuffer (though not Safari at the time of this writing; see SharedArrayBuffer on MDN), so that would be another possibility for fast transfer of data, with a different set of trade-offs compared to a transferrable (see MDN for all the trade-offs and requirements of SharedArrayBuffer).
UPDATE:
According to Mozilla, SharedArrayBuffer has been disabled in all major browsers, so the option described in the following EDIT no longer applies.
Note that SharedArrayBuffer was disabled by default in all major
browsers on 5 January, 2018 in response to Spectre.
EDIT: There is now another option: sending a SharedArrayBuffer. This is part of ES2017, under shared memory and atomics, and is now supported in Firefox 54 Nightly. If you want to read about it you can look here. I will probably write something up some time and add it to my answer. I will try to add to the performance benchmark as well.
To answer the original question:
I am working on web workers and I am passing large amount of data to
web worker, which takes a lot of time. I want to know the efficient
way to send the data.
The alternative to @MichaelDibbets' answer, which sends a copy of the object to the web worker, is using a transferrable object, which is zero-copy.
It shows that you were intending to make your data transferrable, but I'm guessing it didn't work out. So I will explain what it means for some data to be transferrable for you and future readers.
Transferring objects "by reference" (although that isn't the perfect term for it as explained in the next quote) doesn't just work on any JavaScript Object. It has to be a transferrable data-type.
[With Web Workers] Most browsers implement the structured cloning
algorithm, which allows you to pass more complex types in/out of
Workers such as File, Blob, ArrayBuffer, and JSON objects. However,
when passing these types of data using postMessage(), a copy is still
made. Therefore, if you're passing a large 50MB file (for example),
there's a noticeable overhead in getting that file between the worker
and the main thread.
Structured cloning is great, but a copy can take hundreds of
milliseconds. To combat the perf hit, you can use Transferable
Objects.
With Transferable Objects, data is transferred from one context to
another. It is zero-copy, which vastly improves the performance of
sending data to a Worker. Think of it as pass-by-reference if you're
from the C/C++ world. However, unlike pass-by-reference, the 'version'
from the calling context is no longer available once transferred to
the new context. For example, when transferring an ArrayBuffer from
your main app to Worker, the original ArrayBuffer is cleared and no
longer usable. Its contents are (quite literally) transferred to the
Worker context.
- Eric Bidelman Developer at Google, source: html5rocks
The only problem is there are only two things that are transferrable as of now. ArrayBuffer, and MessagePort. (Canvas Proxies are hopefully coming later). ArrayBuffers cannot be manipulated directly through their API and should be used to create a typed array object or a DataView to give a particular view into the buffer and be able to read and write to it.
From the html5rocks link
To use transferrable objects, use a slightly different signature of
postMessage():
worker.postMessage(arrayBuffer, [arrayBuffer]);
window.postMessage(arrayBuffer, targetOrigin, [arrayBuffer]);
The worker case, the first argument is the data and the second is the
list of items that should be transferred. The first argument doesn't
have to be an ArrayBuffer by the way. For example, it can be a JSON
object:
worker.postMessage({data: int8View, moreData: anotherBuffer}, [int8View.buffer, anotherBuffer]);
So according to that, your
var worker = new Worker('js2.js');
worker.postMessage(buffer, [buffer]);
worker.postMessage(obj, [obj.mat2]);
should be performing at great speeds and should be transferred zero-copy. The only problem would be if your buffer or obj.mat2 is not an ArrayBuffer or transferrable. You may be confusing an ArrayBuffer with a typed-array view; what you should be sending is the view's underlying buffer.
So suppose you have this ArrayBuffer and its Int32 representation. (Though the variable is titled view, it is not a DataView; but DataViews do have a buffer property just as typed arrays do. Also, at the time this was written, MDN used the name 'view' for the result of calling a typed array's constructor, so I assumed it was a good way to name it.)
var buffer = new ArrayBuffer(90000000);
var view = new Int32Array(buffer);
for (var c = 0; c < view.length; c++) {
    view[c] = 42;
}
This is what you should not do (send the view)
worker.postMessage(view);
This is what you should do (send the ArrayBuffer)
worker.postMessage(buffer, [buffer]);
These are the results after running this test on plnkr.
Average for sending views is 144.12690000608563
Average for sending ArrayBuffers is 0.3522000042721629
EDIT: As stated by @Bergi in the comments, you don't need the buffer variable at all if you have the view, because you can just send view.buffer like so:
worker.postMessage(view.buffer, [view.buffer]);
Just as a side note for future readers: if you just send an ArrayBuffer without the last argument specifying which ArrayBuffers to transfer, you will not send the ArrayBuffer transferrably.
In other words when sending transferrables you want this:
worker.postMessage(buffer, [buffer]);
Not this:
worker.postMessage(buffer);
EDIT: And one last note: since you are sending a buffer, don't forget to turn your buffer back into a view once it's received by the web worker. Once it's a view you can manipulate it (read from and write to it) again.
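A minimal sketch of the worker side (a hypothetical js2.js, not the asker's actual file): wrap the received ArrayBuffer in a typed-array view before touching it, and transfer it back the same way.
self.onmessage = function (e) {
    var view = new Int32Array(e.data);              // e.data is the transferred ArrayBuffer
    for (var i = 0; i < view.length; i++) {
        view[i] = view[i] * 2;                      // now it can be read and written
    }
    self.postMessage(view.buffer, [view.buffer]);   // transfer it back, zero-copy again
};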
And for the bounty:
I am also interested in official size limits for firefox/chrome (not
only time limit). However answer the original question qualifies for
the bounty (;
As to a web browser's limit on sending something of a certain size, I am not completely sure, but in that entry on html5rocks by Eric Bidelman, when talking about workers, he did bring up a 50 MB file being transferred in hundreds of milliseconds without using a transferrable data-type, and, as shown through my test, in only around a millisecond using a transferrable data-type. And 50 MB is honestly pretty large.
Purely my own opinion, but I don't believe there is a limit on the size of the file you send with a transferrable or non-transferrable data-type other than the limits of the data type itself. Of course your biggest worry would probably be the browser stopping long-running scripts if it has to copy the whole thing and it is not zero-copy and transferrable.
Hope this post helps. Honestly I knew nothing about transferrables before this, but it was fun figuring them out through some tests and through that blog post by Eric Bidelman.
I had issues with webworkers too, until I just passed a single argument to the webworker.
So instead of
worker.postMessage(buffer, [buffer]);
worker.postMessage(obj, [obj.mat2]);
Try
var myobj = { buffer: buffer, obj: obj };
worker.postMessage(myobj);
This way I found it gets passed by reference and it's insanely fast. I post back and forth over 20,000 data elements in a single push every 5 seconds without noticing the data transfer.
I've been exclusively working with chrome though, so I don't know how it'll hold up in other browsers.
Update
I've done some testing for some stats.
tmp = new ArrayBuffer(90000000);
test = new Int32Array(tmp);
for (c = 0; c < test.length; c++) {
    test[c] = 42;
}
for (c = 0; c < 4; c++) {
    window.setTimeout(function () {
        // Cloning the array. "We" will have lost the array once it's sent to the web worker.
        // This is to make sure we don't have to repopulate it.
        testsend = new Int32Array(test);
        // Marking time. The sister mark is in the web worker.
        console.log("sending at at " + window.performance.now());
        // Post the clone to the thread.
        FieldValueCommunicator.worker.postMessage(testsend);
    }, 1000 * c);
}
Results of the tests. I don't know if this falls in your category of slow or not, since you did not define "slow".
sending at at 28837.418999988586
recieved at 28923.06199995801
86 ms
sending at at 212387.9840001464
recieved at 212504.72499988973
117 ms
sending at at 247635.6210000813
recieved at 247760.1259998046
125 ms
sending at at 288194.15999995545
recieved at 288304.4079998508
110 ms
It depends on how large the data is
I found this article that says the better strategy is to pass large data to a web worker and back in small bits. In addition, it also discourages the use of ArrayBuffers.
Please have a look: https://developers.redhat.com/blog/2014/05/20/communicating-large-objects-with-web-workers-in-javascript

StarDict support for JavaScript and a Firefox OS App

I wrote a dictionary app in the spirit of GoldenDict (www.goldendict.org, also see Google Play Store for more information) for Firefox OS: http://tuxor1337.github.io/firedict and https://marketplace.firefox.com/app/firedict
Since apps for ffos are based on HTML, CSS and JavaScript (WebAPI etc.), I had to write everything from scratch. At first, I wrote a basic library for synchronous and asynchronous access to StarDict dictionaries in JavaScript: https://github.com/tuxor1337/stardict.js
Although the app can be called stable by now, overall performance is still a bit sluggish. For some dictionaries, I have a list of words of almost 1,000,000 entries! That's huge. Indexing takes a really long time (up to several minutes per dictionary) and lookup as well. At the moment, the words are stored in an IndexedDB object store. Is there another alternative? With the current solution (words accessed and inserted using binary search) the overall experience is pretty slow. Maybe it would become faster, if there was some locale sort support by IndexedDB... Actually, I'm not even storing the terms themselves in the DB but only their offsets in the *.syn/*.idx file. I hope to save some memory doing that. But of course I'm not able to use any IDB sorting functionality with this configuration...
Maybe it's not the best idea to do the sorting in memory, because now the app is killed by the kernel due to an OOM on some devices (e.g. ZTE Open). A dictionary with more than 500,000 entries will definitely exceed 100 MB in memory. (That's only 200 bytes per entry, and if you suppose the keyword strings are UTF-8, you'll exceed 100 MB immediately...)
Feel free to contribute directly to the project on GitHub. Otherwise, I would be glad to hear your advice concerning the above issues.
I am working on a pure JavaScript implementation of an MDict parser (https://github.com/fengdh/mdict-js) similar to your StarDict project. MDict is another popular dictionary format with rich formatting (embedded images/audio/CSS etc.), which is widely supported on Windows/Linux/iOS/Android/Windows Phone. I have some ideas to share and hope you can apply them to improve stardict.js in the future.
An MDict dictionary file (mdx/mdd) divides keywords and records into (optionally compressed) blocks, each containing around 2,000 entries, and also provides a keyword block index table and a record block index table to help with quick look-up. Because of its compact data structure, I can implement my MDict parser to scan directly on the dictionary file with a small pre-loaded index table and no need for IndexedDB.
Each keyword block index looks like:
{
    num_entries: ..,
    first_word: ..,
    last_word: ..,
    comp_size: ..,    // size in compression
    decomp_size: ..,  // size after decompression
    offset: ..,       // offset in mdx file
    index: ..
}
In a keyword block, each entry is a pair of [keyword, offset].
Each record block index looks like:
{
    comp_size: ..,    // size in compression
    decomp_size: ..   // size after decompression
}
Given a word, use binary search to locate the keyword block that may contain it (a sketch follows these steps).
Slice the keyword block and load all keys in it, filter out the matched one, and get its record offset.
Use binary search to locate the record block containing the word's record.
Slice the record block and retrieve its record (a definition in text or a resource in an ArrayBuffer) directly.
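A sketch of the first step, assuming the keyword block index entries look like the structure shown above (purely illustrative, not code from mdict-js):
function findKeywordBlock(blockIndex, word) {
    var lo = 0, hi = blockIndex.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        var block = blockIndex[mid];
        if (word < block.first_word) {
            hi = mid - 1;
        } else if (word > block.last_word) {
            lo = mid + 1;
        } else {
            return block;        // the word falls within this block's first/last range
        }
    }
    return null;                 // not present in the dictionary
}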
Since each block contains only around 2,000 entries, it is fast enough to look up a word among 100K-1M dictionary entries within 100 ms, a quite decent value for human interaction. mdict-js parses the file header only, so it is super fast and has low memory usage.
In the same way, it is possible to retrieve a list of neighboring words for a given phrase, even with wildcards.
Please take a look on my online demo here: http://fengdh.github.io/mdict-js/
(You have to choose a local MDict dictionary: a mdx + optional mdd file)

Have I reached the limits of the size of objects JavaScript in my browser can handle?

I'm embedding a large array in <script> tags in my HTML, like this (nothing surprising):
<script>
var largeArray = [/* lots of stuff in here */];
</script>
In this particular example, the array has 210,000 elements. That's well below the theoretical maximum of 2^31 - by 4 orders of magnitude. Here's the fun part: if I save the JS source for the array to a file, that file is >44 megabytes (46,573,399 bytes, to be exact).
If you want to see for yourself, you can download it from GitHub. (All the data in there is canned, so much of it is repeated. This will not be the case in production.)
Now, I'm really not concerned about serving that much data. My server gzips its responses, so it really doesn't take all that long to get the data over the wire. However, there is a really nasty tendency for the page, once loaded, to crash the browser. I'm not testing at all in IE (this is an internal tool). My primary targets are Chrome 8 and Firefox 3.6.
In Firefox, I can see a reasonably useful error in the console:
Error: script stack space quota is exhausted
In Chrome, I simply get the sad-tab page.
Cut to the chase, already
Is this really too much data for our modern, "high-performance" browsers to handle?
Is there anything I can do* to gracefully handle this much data?
Incidentally, I was able to get this to work (read: not crash the tab) on-and-off in Chrome. I really thought that Chrome, at least, was made of tougher stuff, but apparently I was wrong...
Edit 1
@Crayon: I wasn't looking to justify why I'd like to dump this much data into the browser at once. Short version: either I solve this one (admittedly not-that-easy) problem, or I have to solve a whole slew of other problems. I'm opting for the simpler approach for now.
@various: right now, I'm not especially looking for ways to actually reduce the number of elements in the array. I know I could implement Ajax paging or what-have-you, but that introduces its own set of problems for me in other regards.
@Phrogz: each element looks something like this:
{
    dateTime: new Date(1296176400000),
    terminalId: 'terminal999',
    'General___BuildVersion': '10.05a_V110119_Beta',
    'SSM___ExtId': 26680,
    'MD_CDMA_NETLOADER_NO_BCAST___Valid': 'false',
    'MD_CDMA_NETLOADER_NO_BCAST___PngAttempt': 0
}
@Will: but I have a computer with a 4-core processor, 6 gigabytes of RAM, over half a terabyte of disk space ...and I'm not even asking for the browser to do this quickly - I'm just asking for it to work at all! ☹
Edit 2
Mission accomplished!
With the spot-on suggestions from Juan as well as Guffa, I was able to get this to work! It would appear that the problem was just in parsing the source code, not actually working with it in memory.
To summarize the comment quagmire on Juan's answer: I had to split up my big array into a series of smaller ones, and then Array#concat() them, but that wasn't enough. I also had to put them into separate var statements. Like this:
var arr0 = [...];
var arr1 = [...];
var arr2 = [...];
/* ... */
var bigArray = arr0.concat(arr1, arr2, ...);
To everyone who contributed to solving this: thank you. The first round is on me!
*other than the obvious: sending less data to the browser
Here's what I would try: you said it's a 44MB file. That surely takes more than 44MB of memory; I'm guessing this takes much more than 44MB of RAM, maybe half a gig. Could you just cut down the data until the browser doesn't crash and see how much memory the browser uses?
Even apps that run only on the server would be well served to not read a 44MB file and keep it in memory. Having said all that, I believe the browser should be able to handle it, so let me run some tests.
(Using Windows 7, 4GB of memory)
First Test
I cut the array in half and there were no problems, uses 80MB, no crash
Second Test
I split the array into two separate arrays that together still contain all the data; it uses 160MB, no crash
Third Test
Since Firefox said it ran out of stack, the problem is probably that it can't parse the array at once. I created two separate arrays, arr1 and arr2, then did arr3 = arr1.concat(arr2); it ran fine and uses only slightly more memory, around 165MB.
Fourth Test
I am creating 7 of those arrays (22MB each) and concatenating them to test browser limits. It takes about 10 seconds for the page to finish loading. Memory goes up to 1.3GB, then it goes back down to 500MB. So yeah, Chrome can handle it. It just can't parse it all at once because it uses some kind of recursion, as can be noticed by the console's error message.
Answer
Create separate arrays (less than 20MB each) and then concat them. Each array should be on its own var statement, instead of doing multiple declarations with a single var.
I would still consider fetching only the necessary part; keeping it all may make the browser sluggish. However, if it's an internal task, this should be fine.
Last point: You're not at maximum memory levels, just max parsing levels.
Yes, it's too much to ask of a browser.
That amount of data would be manageable if it already was data, but it isn't data yet. Consider that the browser has to parse that huge block of source code while checking that the syntax adds up for all of it. Once parsed into valid code, the code has to run to produce the actual array.
So, all of the data will exist in (at least) two or three versions at once, each with a certain amount of overhead. As the array literal is a single statement, each step will have to include all of the data.
Dividing the data into several smaller arrays would possibly make it easier on the browser.
Do you really need all the data? can't you stream just the data currently needed using AJAX? Similar to Google Maps - you can't fit all the map data into browser's memory, they display just the part you are currently seeing.
Remember that 40 megs of hard data can be inflated to much more in the browser's internal representation. For example, the JS interpreter may use a hashtable to implement the array, which would add additional memory overhead. Also, I expect that the browser stores both the source code and the JS memory; that alone doubles the amount of data.
JS is designed to provide client-side UI interaction, not handle loads of data.
EDIT:
Btw, do you really think users will like downloading 40 megabytes worth of code? There are still many users with less than broadband internet access. And execution of the script will be suspended until all the data is downloaded.
EDIT2:
I had a look at the data. That array will definitely be represented as a hashtable. Also, many of the items are objects, which will require reference tracking... that is additional memory.
I guess the performance would be better if it were a simple vector of primitive data.
EDIT3: The data could certainly be simplified. The bulk of it is repeating strings, which could be encoded in some way as integers or something. Also, my Opera is having trouble just displaying the text, let alone interpreting it.
EDIT4: Forget the DateTime objects! Use Unix epoch timestamps or strings, but not objects!
EDIT5: Your processor doesn't matter because JS is single-threaded. And your RAM doesn't matter much either; most browsers are 32-bit, so they can't use much of that memory.
EDIT6: Try changing the array indices to sequential integers (0, 1, 2, 3...). That might make the browser use a more efficient array data structure. You can use constants to access the array items efficiently. This is going to cut down the array size by a huge chunk.
Try retrieving the data with Ajax as a JSON page. I don't know the exact size, but I've been able to pull large amounts of data into Google Chrome that way.
Use lazy loading. Have pointers to the data and get it when the user asks.
This technique is used in various places to manage millions of records of data.
[Edit]
I found what I was looking for. Virtual scrolling in the jqgrid. That's 500k records being lazy loaded.
I would try having it as one big string with a separator between each "item", then use split, something like:
var largeString = "item1,item2,.......";
var largeArray = largeString.split(",");
Hopefully the string won't exhaust the stack so fast.
Edit: in order to test it, I created a dummy array with 200,000 simple items (each item one number) and Chrome loaded it in an instant. 2,000,000 items? A couple of seconds, but no crash. A 6,000,000-item array (50 MB file) made Chrome load for about 10 seconds but still, no crash either way.
So this leads me to believe the problem is not with the array itself but rather its contents. Optimize the contents down to simple items, then parse them "on the fly", and it should work.
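To illustrate what I mean (the data layout and field names are made up for the example): keep the payload as one delimited string, split it into small strings cheaply, and only build an object when an item is actually accessed.
var largeString = "1296176400000|terminal999|10.05a,1296176400000|terminal998|10.05a";  // dummy data
var items = largeString.split(",");          // cheap: an array of short strings

function getItem(i) {                        // parse a single item on the fly
    var parts = items[i].split("|");
    return {
        dateTime: new Date(Number(parts[0])),
        terminalId: parts[1],
        buildVersion: parts[2]
    };
}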
