I am working with web workers and I am passing a large amount of data to the worker, which takes a lot of time. I want to know the most efficient way to send the data.
I have tried the following code:
var worker = new Worker('js2.js');
worker.postMessage( buffer,[ buffer]);
worker.postMessage(obj,[obj.mat2]);
if (buffer.byteLength) {
alert('Transferables are not supported in your browser!');
}
UPDATE
Modern versions of Chrome, Edge, and Firefox now support SharedArrayBuffer (though not Safari at the time of this writing; see SharedArrayBuffer on MDN), so that would be another possibility for fast data transfer, with a different set of trade-offs compared to a transferrable (see MDN for all the trade-offs and requirements of SharedArrayBuffer).
UPDATE:
According to Mozilla, SharedArrayBuffer has been disabled in all major browsers, so the option described in the following EDIT no longer applies.
Note that SharedArrayBuffer was disabled by default in all major
browsers on 5 January, 2018 in response to Spectre.
EDIT: There is now another option: sending a SharedArrayBuffer. This is part of ES2017, under shared memory and atomics, and is now supported in Firefox 54 Nightly. If you want to read about it you can look here. I will probably write something up at some point and add it to my answer, and I will try to add it to the performance benchmark as well.
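Just to illustrate the idea, a minimal sketch of sharing memory with a worker might look like the following (assuming SharedArrayBuffer is enabled in your browser, which nowadays generally requires cross-origin isolation; js2.js is the worker file from the question, the rest is illustrative):
var sharedBuffer = new SharedArrayBuffer(1024);      // memory visible to both threads
var sharedView = new Int32Array(sharedBuffer);
var worker = new Worker('js2.js');
// Note: no transfer list. Both sides keep access to the same memory.
worker.postMessage(sharedBuffer);

// In js2.js (the worker):
// self.onmessage = function (e) {
//   var view = new Int32Array(e.data);
//   Atomics.store(view, 0, 42);   // a write the main thread can observe via sharedView
// };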
To answer the original question:
I am working with web workers and I am passing a large amount of data to the worker, which takes a lot of time. I want to know the most efficient way to send the data.
The alternative to @MichaelDibbets' answer, which sends a copy of the object to the web worker, is using a transferrable object, which is zero-copy.
Your code shows that you were intending to make your data transferrable, but I'm guessing it didn't work out. So I will explain what it means for some data to be transferrable, for you and future readers.
Transferring objects "by reference" (although that isn't the perfect term for it, as explained in the next quote) doesn't work on just any JavaScript object. It has to be a transferrable data type.
[With Web Workers] Most browsers implement the structured cloning
algorithm, which allows you to pass more complex types in/out of
Workers such as File, Blob, ArrayBuffer, and JSON objects. However,
when passing these types of data using postMessage(), a copy is still
made. Therefore, if you're passing a large 50MB file (for example),
there's a noticeable overhead in getting that file between the worker
and the main thread.
Structured cloning is great, but a copy can take hundreds of
milliseconds. To combat the perf hit, you can use Transferable
Objects.
With Transferable Objects, data is transferred from one context to
another. It is zero-copy, which vastly improves the performance of
sending data to a Worker. Think of it as pass-by-reference if you're
from the C/C++ world. However, unlike pass-by-reference, the 'version'
from the calling context is no longer available once transferred to
the new context. For example, when transferring an ArrayBuffer from
your main app to Worker, the original ArrayBuffer is cleared and no
longer usable. Its contents are (quite literally) transferred to the
Worker context.
- Eric Bidelman, Developer at Google; source: html5rocks
The only problem is that there are only two things that are transferrable as of now: ArrayBuffer and MessagePort. (Canvas proxies are hopefully coming later.) ArrayBuffers cannot be manipulated directly through their API; they should be used to create a typed array or a DataView, which gives a particular view into the buffer and lets you read and write to it.
From the html5rocks link
To use transferrable objects, use a slightly different signature of
postMessage():
worker.postMessage(arrayBuffer, [arrayBuffer]);
window.postMessage(arrayBuffer, targetOrigin, [arrayBuffer]);
In the worker case, the first argument is the data and the second is the
list of items that should be transferred. The first argument doesn't
have to be an ArrayBuffer by the way. For example, it can be a JSON
object:
worker.postMessage({data: int8View, moreData: anotherBuffer}, [int8View.buffer, anotherBuffer]);
So according to that your
var worker = new Worker('js2.js');
worker.postMessage(buffer, [ buffer]);
worker.postMessage(obj, [obj.mat2]);
should be performing at great speed and transferring the data zero-copy. The only problem would be if your buffer or obj.mat2 is not an ArrayBuffer or otherwise transferrable. You may be confusing ArrayBuffers with typed-array views: it is the view's underlying buffer that you should be transferring.
So suppose you have this ArrayBuffer and its Int32 representation. (Although the variable is named view, it is not a DataView; DataViews do have a buffer property, just as typed arrays do. Also, at the time this was written, MDN used the name 'view' for the result of calling a typed array constructor, so I assumed it was a good way to name it.)
var buffer = new ArrayBuffer(90000000);
var view = new Int32Array(buffer);
for (var c = 0; c < view.length; c++) {
  view[c] = 42;
}
This is what you should not do (send the view)
worker.postMessage(view);
This is what you should do (send the ArrayBuffer)
worker.postMessage(buffer, [buffer]);
These are the results after running this test on plnkr.
Average for sending views is 144.12690000608563
Average for sending ArrayBuffers is 0.3522000042721629
EDIT: As stated by @Bergi in the comments, you don't need the buffer variable at all if you have the view, because you can just send view.buffer like so:
worker.postMessage(view.buffer, [view.buffer]);
Just as a side note for future readers: if you send an ArrayBuffer without the last argument specifying which ArrayBuffers should be transferred, you will not send the ArrayBuffer transferrably.
In other words when sending transferrables you want this:
worker.postMessage(buffer, [buffer]);
Not this:
worker.postMessage(buffer);
EDIT: And one last note: since you are sending a buffer, don't forget to turn your buffer back into a view once it's received by the web worker. Once it's a view, you can manipulate it (read from and write to it) again.
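For example, the receiving side could look roughly like this (a sketch; only the onmessage/postMessage calls are standard, the rest is illustrative):
// js2.js - sketch of the receiving side
self.onmessage = function (e) {
  // e.data is the transferred ArrayBuffer; wrap it in a view to read/write it
  var view = new Int32Array(e.data);
  view[0] += 1;                                  // manipulate the data
  // hand it back to the main thread the same zero-copy way
  self.postMessage(view.buffer, [view.buffer]);
};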
And for the bounty:
I am also interested in official size limits for firefox/chrome (not
only time limit). However answer the original question qualifies for
the bounty (;
As to a web browser's limit on sending something of a certain size, I am not completely sure, but in that html5rocks entry Eric Bidelman, when talking about workers, did bring up a 50 MB file being transferred in hundreds of milliseconds without using a transferrable data type, and, as shown in my test, in only around a millisecond using a transferrable data type. And 50 MB is honestly pretty large.
Purely my own opinion, but I don't believe there is a limit on the size of the data you send, transferrable or not, other than the limits of the data type itself. Of course, your biggest worry would probably be the browser stopping long-running scripts if it has to copy the whole thing because it is not zero-copy and transferrable.
Hope this post helps. Honestly I knew nothing about transferrables before this, but it was fun figuring them out through some tests and through that blog post by Eric Bidelman.
I had issues with webworkers too, until I just passed a single argument to the webworker.
So instead of
worker.postMessage( buffer,[ buffer]);
worker.postMessage(obj,[obj.mat2]);
Try
var myobj = {buffer:buffer,obj:obj};
worker.postMessage(myobj);
This way I found it gets passed by reference and it's insanely fast. I post over 20,000 data elements back and forth in a single push every 5 seconds without noticing the data transfer.
I've been working exclusively with Chrome though, so I don't know how it'll hold up in other browsers.
Update
I've done some testing for some stats.
var tmp = new ArrayBuffer(90000000);
var test = new Int32Array(tmp);
for (var c = 0; c < test.length; c++) {
  test[c] = 42;
}
for (var c = 0; c < 4; c++) {
  window.setTimeout(function () {
    // Cloning the array. "We" will have lost the array once it's sent to the web worker.
    // This is to make sure we don't have to repopulate it.
    var testsend = new Int32Array(test);
    // Marking time. The sister mark is in the web worker.
    console.log("sending at at " + window.performance.now());
    // Post the clone to the thread.
    FieldValueCommunicator.worker.postMessage(testsend);
  }, 1000 * c);
}
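The worker side isn't shown in the post; a plausible sketch of the "sister mark" (an assumption, including the log wording, and note that a worker's performance.now() clock may not share the page's time origin exactly) would be:
// Sketch of the worker side of this benchmark
self.onmessage = function (e) {
  // e.data is the cloned Int32Array
  console.log("recieved at " + performance.now());
};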
Results of the tests. I don't know if this falls into your category of slow or not, since you did not define "slow".
sending at at 28837.418999988586
recieved at 28923.06199995801
86 ms
sending at at 212387.9840001464
recieved at 212504.72499988973
117 ms
sending at at 247635.6210000813
recieved at 247760.1259998046
125 ms
sending at at 288194.15999995545
recieved at 288304.4079998508
110 ms
It depends on how large the data is
I found this article which says that the better strategy is to pass large data to a web worker and back in small bits. In addition, it also discourages the use of ArrayBuffers.
Please have a look: https://developers.redhat.com/blog/2014/05/20/communicating-large-objects-with-web-workers-in-javascript
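A rough sketch of that chunking strategy could look like this (the chunk size and message shape are arbitrary choices for illustration, not taken from the article):
// Send a large typed array to a worker in smaller slices.
var CHUNK = 1 << 20;                              // elements per message (arbitrary)
function sendInChunks(worker, bigView) {
  for (var offset = 0; offset < bigView.length; offset += CHUNK) {
    // slice() copies just this piece into its own small buffer,
    // which is then handed over zero-copy
    var piece = bigView.slice(offset, offset + CHUNK);
    worker.postMessage({ offset: offset, total: bigView.length, chunk: piece },
                       [piece.buffer]);
  }
}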
Related
Using the Fetch API I'm able to make a network request for a large asset of binary data (say more than 500 MB) and then convert the Response to either a Blob or an ArrayBuffer.
Afterwards, I can either do worker.postMessage and let the standard structured clone algorithm copy the Blob over to a Web Worker, or transfer the ArrayBuffer over to the worker context (making it effectively no longer available from the main thread).
At first, it would seem much preferable to fetch the data as an ArrayBuffer, since a Blob is not transferrable and will thus need to be copied over. However, Blobs are immutable, so it seems the browser doesn't store them in the JS heap associated with the page but rather in a dedicated Blob storage space, and thus what ends up being copied over to the worker context is just a reference.
I've prepared a demo to try out the difference between the two approaches: https://blobvsab.vercel.app/. I'm fetching 656 MB worth of binary data using both approaches.
Something interesting I've observed in my local tests, is that copying the Blob is even faster than transferring the ArrayBuffer:
Blob copy time from main thread to worker: 1.828125 ms
ArrayBuffer transfer time from main thread to worker: 3.393310546875 ms
This is a strong indicator that dealing with Blobs is actually pretty cheap. Since they're immutable, the browser seems to be smart enough to pass a reference around rather than copying the underlying binary data along with it.
Here are the heap memory snapshots I've taken when fetching as a Blob:
The first two snapshots were taken after the resulting Blob of the fetch was copied over to the worker context using postMessage. Notice that neither of those heaps includes the 656 MB.
The latter two snapshots were taken after I used a FileReader to actually access the underlying data, and as expected, the heap grew a lot.
Now, this is what happens with fetching directly as an ArrayBuffer:
Here, since the binary data was simply transferred to the worker thread, the heap of the main thread is small, but the worker heap contains the entire 656 MB, even before reading this data.
Now, looking around on SO I see that What is the difference between an ArrayBuffer and a Blob? mentions a lot of underlying differences between the two structures, but I haven't found a good reference on whether one should be worried about copying a Blob between execution contexts versus what would seem to be the inherent advantage of ArrayBuffers, namely that they're transferrable. However, my experiments show that copying the Blob might actually be faster, and thus preferable.
It seems to be up to each browser vendor how they store and handle Blobs. I've found this Chromium documentation describing that all Blobs are transferred from each renderer process (i.e. a page in a tab) to the browser process, and that way Chrome can even offload the Blob to secondary memory if needed.
Does anyone have more insight into all of this? If I can choose to fetch some large binary data over the network and move it to a Web Worker, should I prefer a Blob or an ArrayBuffer?
No, it's not expensive at all to postMessage a Blob.
The cloning steps of a Blob are
Their serialization steps, given value and serialized, are:
Set serialized.[[SnapshotState]] to value’s snapshot state.
Set serialized.[[ByteSequence]] to value’s underlying byte sequence.
Their deserialization steps, given serialized and value, are:
Set value’s snapshot state to serialized.[[SnapshotState]].
Set value’s underlying byte sequence to serialized.[[ByteSequence]].
In other words, nothing is copied; both the snapshot state and the byte sequence are passed by reference (even though the wrapping JS object is not).
However regarding your full project, I wouldn't advise using Blobs here for two reasons:
The fetch algorithm first fetches as an ArrayBuffer internally. Requesting a Blob adds an extra step there (which consumes memory).
You'll probably need to read that Blob from the Worker, adding yet another step (which will also consume memory, since here the data will actually get copied).
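Putting those points together, the ArrayBuffer route would look roughly like this (a sketch; the URL and worker file name are placeholders):
var worker = new Worker('worker.js');
fetch('/large-asset.bin')
  .then(function (res) { return res.arrayBuffer(); })
  .then(function (buf) {
    // zero-copy hand-off; buf is detached on the main thread afterwards
    worker.postMessage(buf, [buf]);
  });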
I'm working on a PhoneGap app and find it most efficient to cache as much user data as possible on the device when the user logs in. (Profile information, etc.) Due to project constraints, I can't use local storage.
I'm making an API call and pulling back JSON data that I use to power the app. My specific question: Is it somewhat safe to assume that the byte size of the JSON result will be roughly equal to the memory consumed? i.e. if the API call response is 200k of JSON data, will about 200k of memory be used to store it in a JavaScript object?
The best answer is, "Test it."
As several commenters have said, under many common conditions there will be a rough 1:1 correspondence, but there are so many caveats and gotchas, some of which are engine-specific, that it's impossible to say with any certainty.
In other words, if you test it and it looks like the assumption holds under your particular use-case, then 'somewhat safe' is a reasonable assumption. Just don't bet the farm on it.
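One rough way to run that test (a sketch; performance.memory is a non-standard, Chrome-only API, and apiResponse stands in for your parsed payload, so treat the numbers as a ballpark only):
var json = JSON.stringify(apiResponse);
var byteSize = new Blob([json]).size;                 // byte length of the JSON text
var before = performance.memory ? performance.memory.usedJSHeapSize : 0;
var copy = JSON.parse(json);                          // materialize a fresh object graph
var after = performance.memory ? performance.memory.usedJSHeapSize : 0;
console.log('JSON bytes:', byteSize, 'approximate heap growth:', after - before);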
I'm embedding a large array in <script> tags in my HTML, like this (nothing surprising):
<script>
var largeArray = [/* lots of stuff in here */];
</script>
In this particular example, the array has 210,000 elements. That's well below the theoretical maximum of 2^31 - by 4 orders of magnitude. Here's the fun part: if I save the JS source for the array to a file, that file is >44 megabytes (46,573,399 bytes, to be exact).
If you want to see for yourself, you can download it from GitHub. (All the data in there is canned, so much of it is repeated. This will not be the case in production.)
Now, I'm really not concerned about serving that much data. My server gzips its responses, so it really doesn't take all that long to get the data over the wire. However, there is a really nasty tendency for the page, once loaded, to crash the browser. I'm not testing at all in IE (this is an internal tool). My primary targets are Chrome 8 and Firefox 3.6.
In Firefox, I can see a reasonably useful error in the console:
Error: script stack space quota is exhausted
In Chrome, I simply get the sad-tab page:
Cut to the chase, already
Is this really too much data for our modern, "high-performance" browsers to handle?
Is there anything I can do* to gracefully handle this much data?
Incidentally, I was able to get this to work (read: not crash the tab) on-and-off in Chrome. I really thought that Chrome, at least, was made of tougher stuff, but apparently I was wrong...
Edit 1
#Crayon: I wasn't looking to justify why I'd like to dump this much data into the browser at once. Short version: either I solve this one (admittedly not-that-easy) problem, or I have to solve a whole slew of other problems. I'm opting for the simpler approach for now.
#various: right now, I'm not especially looking for ways to actually reduce the number of elements in the array. I know I could implement Ajax paging or what-have-you, but that introduces its own set of problems for me in other regards.
#Phrogz: each element looks something like this:
{dateTime:new Date(1296176400000),
terminalId:'terminal999',
'General___BuildVersion':'10.05a_V110119_Beta',
'SSM___ExtId':26680,
'MD_CDMA_NETLOADER_NO_BCAST___Valid':'false',
'MD_CDMA_NETLOADER_NO_BCAST___PngAttempt':0}
#Will: but I have a computer with a 4-core processor, 6 gigabytes of RAM, over half a terabyte of disk space ...and I'm not even asking for the browser to do this quickly - I'm just asking for it to work at all! ☹
Edit 2
Mission accomplished!
With the spot-on suggestions from Juan as well as Guffa, I was able to get this to work! It would appear that the problem was just in parsing the source code, not actually working with it in memory.
To summarize the comment quagmire on Juan's answer: I had to split up my big array into a series of smaller ones, and then Array#concat() them, but that wasn't enough. I also had to put them into separate var statements. Like this:
var arr0 = [...];
var arr1 = [...];
var arr2 = [...];
/* ... */
var bigArray = arr0.concat(arr1, arr2, ...);
To everyone who contributed to solving this: thank you. The first round is on me!
*other than the obvious: sending less data to the browser
Here's what I would try: you said it's a 44MB file. That surely takes more than 44MB of memory; I'm guessing it takes well over 44MB of RAM, maybe half a gig. Could you just cut down the data until the browser doesn't crash and see how much memory the browser uses?
Even apps that run only on the server would be well served to not read a 44MB file and keep it in memory. Having said all that, I believe the browser should be able to handle it, so let me run some tests.
(Using Windows 7, 4GB of memory)
First Test
I cut the array in half and there were no problems, uses 80MB, no crash
Second Test
I split the array into two separate arrays that together still contain all the data. It uses 160MB, no crash.
Third Test
Since Firefox said it ran out of stack, the problem is probably that it can't parse the array all at once. I created two separate arrays, arr1 and arr2, then did arr3 = arr1.concat(arr2); it ran fine and uses only slightly more memory, around 165MB.
Fourth Test
I am creating 7 of those arrays (22MB each) and concatenating them to test browser limits. It takes about 10 seconds for the page to finish loading. Memory goes up to 1.3GB, then it goes back down to 500MB. So yeah, Chrome can handle it. It just can't parse it all at once, because it uses some kind of recursion, as can be noticed from the console's error message.
Answer
Create separate arrays (less than 20MB each) and then concatenate them. Each array should be in its own var statement, instead of doing multiple declarations with a single var.
I would still consider fetching only the necessary part; it may otherwise make the browser sluggish. However, if it's an internal task, this should be fine.
Last point: You're not at maximum memory levels, just max parsing levels.
Yes, it's too much to ask of a browser.
That amount of data would be manageable if it already were data, but it isn't data yet. Consider that the browser has to parse that huge block of source code while checking that the syntax adds up for all of it. Once parsed into valid code, the code has to run to produce the actual array.
So, all of the data will exist in (at least) two or three versions at once, each with a certain amount of overhead. As the array literal is a single statement, each step will have to include all of the data.
Dividing the data into several smaller arrays would possibly make it easier on the browser.
Do you really need all the data? Can't you stream just the data currently needed using AJAX? Similar to Google Maps - you can't fit all the map data into the browser's memory; they display just the part you are currently seeing.
Remember that 40 megs of hard data can be inflated to much more in the browser's internal representation. For example, the JS interpreter may use a hashtable to implement the array, which would add additional memory overhead. Also, I expect that the browser stores both the source code and the JS memory; that alone doubles the amount of data.
JS is designed to provide client-side UI interaction, not handle loads of data.
EDIT:
Btw, do you really think users will like downloading 40 megabytes worth of code? There are still many users with less than broadband internet access. And execution of the script will be suspended until all the data is downloaded.
EDIT2:
I had a look at the data. That array will definitely be represented as a hashtable. Also, many of the items are objects, which will require reference tracking... that is additional memory.
I guess the performance would be better if it were a simple vector of primitive data.
EDIT3: The data could certainly be simplified. The bulk of it is repeating strings, which could be encoded in some way as integers or something. Also, my Opera is having trouble just displaying the text, let alone interpreting it.
EDIT4: Forget the DateTime objects! Use Unix epoch timestamps or strings, but not objects!
EDIT5: Your processor doesn't matter because JS is single-threaded. And your RAM doesn't matter much either; most browsers are 32-bit, so they can't use much of that memory.
EDIT6: Try changing the array indices to sequential integers (0, 1, 2, 3...). That might make the browser use a more efficient array data structure. You can use constants to access the array items efficiently. This is going to cut down the array size by a huge chunk.
Try retrieving the data with Ajax as a JSON page. I don't know the exact size, but I've been able to pull large amounts of data into Google Chrome that way.
Use lazy loading. Have pointers to the data and get it when the user asks.
This technique is used in various places to manage millions of records of data.
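A minimal sketch of that idea: keep only ids ("pointers") in memory and fetch a record when the user actually asks for it (the URL scheme here is hypothetical):
var cache = {};
function getRecord(id, callback) {
  if (cache[id]) { callback(cache[id]); return; }
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/records/' + id, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      cache[id] = JSON.parse(xhr.responseText);   // only this record lands in memory
      callback(cache[id]);
    }
  };
  xhr.send();
}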
[Edit]
I found what I was looking for. Virtual scrolling in the jqgrid. That's 500k records being lazy loaded.
I would try having it as one big string with a separator between each "item", then use split, something like:
var largeString = "item1,item2,.......";
var largeArray = largeString.split(",");
Hopefully the string won't exhaust the stack so fast.
Edit: in order to test it I created a dummy array with 200,000 simple items (each item one number) and Chrome loaded it in an instant. 2,000,000 items? A couple of seconds, but no crash. A 6,000,000-item array (a 50 MB file) made Chrome load for about 10 seconds, but still no crash either way.
So this leads me to believe the problem is not with the array itself but rather with its contents. Optimize the contents to simple items, then parse them "on the fly", and it should work.
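For instance, a sketch of that "parse on the fly" idea (the field layout is made up for illustration):
// Each record is kept as a compact delimited string, e.g. "1296176400000|terminal999|26680",
// and an object is only built when it is actually needed.
var rawItems = largeString.split(",");
function getItem(i) {
  var f = rawItems[i].split("|");
  return {
    dateTime: new Date(+f[0]),     // timestamp stored as a number, not a Date literal
    terminalId: f[1],
    extId: +f[2]
  };
}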
Our web application has a feature which uses Flash (AS3) to take photos using the user's web cam, then passes the resulting byte array to PHP where it is reconstructed and saved on the server.
However, we need to be able to take this web application offline, and we have chosen Gears to do so. The user takes the app offline, performs his tasks, then when he's reconnected to the server, we "sync" the data back with our central database.
We don't have PHP to interact with Flash anymore, but we still need to allow users to take and save photos. We don't know how to save a JPG that Flash creates in a local database. Our hope was that we could save the byte array, a serialized string, or somehow actually persist the object itself, then pass it back to either PHP or Flash (and then PHP) to recreate the JPG.
We have tried:
- passing the byte array to Javascript instead of PHP, but javascript doesn't seem to be able to do anything with it (the object seems to be stripped of its methods)
- stringifying the byte array in Flash, and then passing it to Javascript, but we always get the same string:
ÿØÿà
Now we are thinking of serializing the string in Flash, passing it to Javascript, then on the return route, passing that string back to Flash which will then pass it to PHP to be reconstructed as a JPG. (whew). Since no one on our team has extensive Flash background, we're a bit lost.
Is serialization the way to go? Is there a more realistic way to do this? Does anyone have any experience with this sort of thing? Perhaps we can build a javascript class that is the same as the byte array class in AS?
I'm not sure why you would want to use JavaScript here. Anyway, the string you pasted looks like the beginning of a JPG header. The problem is that a JPG will certainly contain NULs (characters with 0 as their value). This will most likely truncate the string (as seems to be the case with the sample you posted). If you want to "stringify" the JPG, the standard approach is encoding it as Base64.
If you want to persist data locally, however, there's a way to do it in Flash. It's simple, but it has some limitations.
You can use a local Shared Object for this. By default, there's a 100 KB limit, which is rather inadequate for image files; you could ask the user to allot more space to your app, though. In any case, I'd try to store the image as a JPG, not as raw pixels, since the difference in size is very significant.
Shared Objects will handle serialization/deserialization for you transparently. There are some caveats: not every object can really be serialized; for starters, it has to have a parameterless constructor; DisplayObjects such as Sprites, MovieClips, etc., won't work. It is possible to serialize a ByteArray, however, so you could save your JPGs locally (if the user allows the extra space). You should use AMF3 as the encoding scheme (which is the default, I think); also, you should map the class you're serializing with registerClassAlias to preserve the type of the serialized object (otherwise it will be treated as a plain Object). You only need to do this once in the app life cycle, but it must be done before any read/write of the Shared Object.
Something along the lines of:
registerClassAlias("flash.utils.ByteArray",ByteArray);
I'd use Shared Objects rather than JavaScript. Just keep in mind that you'll most likely have to ask the user to give you more space for storing the images (which seems reasonable enough if you're allowing them to work offline), and that the user could delete the data at any time (just as they could delete their browser's cookies).
Edit
I realize I didn't really pay much attention to the "we have chosen Gears to do so" part of your question.
In that case, you could give the Base64 approach a try to pass the data to JS. From the ActionScript side it's easy (grab one of the many available Base64 encoders/decoders out there), and I assume Gears' API must have an encoder/decoder available already (or at least it shouldn't be hard to find one). At that point you'll probably have to turn that into a Blob and store it to disk (maybe using the Blob API, but I'm not sure, as I don't have experience with Gears).
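A rough sketch of the JavaScript end of that hand-off, assuming the ActionScript side passes a Base64 string (for example via ExternalInterface); the Gears storage part is left out since that API depends on your setup:
function base64ToByteValues(b64) {
  var binary = atob(b64);                 // Base64 -> binary string
  var bytes = [];
  for (var i = 0; i < binary.length; i++) {
    bytes.push(binary.charCodeAt(i));     // 0-255 byte values; NULs survive intact
  }
  return bytes;                           // persist these, or hand them back to Flash later
}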
I am playing around with CouchDB to test if it is "possible" [1] to store scientific data (simulated and experimental raw data + metadata). A big pro is the schema-less approach of CouchDB: we have to be very flexible with the metadata, as the set of parameters changes very often.
Up to now I have some code to feed raw data, plots (both as attachments), and hierarchical metadata (as JSON) into CouchDB documents, and I have written some prototype JavaScript for filtering and display. But the filtering is done on the client side (a.k.a. the browser): the map function simply returns everything.
How could I change the map function (or push a second one) of a specific _design document with simple browser JS?
I do not think that a temporary view would yield any performance gain...
Thanks for your time and answers.
[1]: of course it is possible, but is it also useful? feasible? reasonable?
[added]
Ah, jquery.couch.js (version 0.9.0) provides a saveDoc() function, which can update the _design document with the new map function.
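A rough sketch of what that could look like with the jquery.couch.js 0.9.x API (database, design document, and view names are placeholders, and updating a design document normally requires admin rights):
var db = $.couch.db("experiments");
db.openDoc("_design/filters", {
  success: function (ddoc) {
    ddoc.views = ddoc.views || {};
    // add (or replace) a second view with its own map function
    ddoc.views.by_parameter = {
      map: "function (doc) { if (doc.metadata && doc.metadata.parameter) { " +
           "emit(doc.metadata.parameter, null); } }"
    };
    db.saveDoc(ddoc, { success: function () { /* view saved */ } });
  }
});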
But I also tried out the query function, which uses a temporary view. Okay, "do not use this in the real product, only during development"... But scientific research is steady development, right?
Temporary views do get cached, as I noticed, and this works well for ~1000 documents per DB. A second plus: all users (think 1 to 3, so elaborate user management would be overkill) can work with their own temporary views.
Never ever use temporary views. They are really only there for dev and debugging purposes. For more information, see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views (specifically the bold "NOTE").
And yes, because design documents are really just documents with special powers, you can run your GET/POST/PUT/DELETE methods on them. However, you will usually need admin privileges to do this. So, if you are allowing a piece of client-side software to do that, you are making your entire database publicly readable/writable - this may be fine for your application, but it is important to remember.
For example, if you restrict access to your database but put the username and password in client-side JavaScript, then anyone can see that username and password.
Cheers.
I've written some helper functions for jquery.couch and design docs; take a look at:
https://github.com/grischaandreew/jquery.couch.js