Web Worker Creating Garbage - javascript

I'm working on a project that utilizes web workers. It seems that the workers are generating quite a bit of extra garbage that has to be collected from the message passing.
I'm sending three things to the worker via postMessage from the main thread. The first is just a number, the second is an array with 7 numbers, and the third is a Date object. The first two are properties of an object, as seen below. This is called every 16ms on requestAnimationFrame for about 20 objects. The GC ends up collecting 12MB every 2 seconds or so. I'm wondering if there is a way to do this without creating so much garbage? Thanks for any help!
// planetnum (a property of the object) is just a number like: 1
// planetele (a property of the object) looks like this:
// [19.22942, 313.4868, 0.04441, 0.7726, 170.5310, 73.9893, 84.3234]
// date is just a Date object
// posted to the worker like so:
planetWorker.postMessage({
    "planetnum": planet.num,
    "planetele": planet.ele,
    "date": datet
});
// the worker.js file uses that information to do calculations
// and sends back the planet number with xyz coordinates (4 numbers)
postMessage({data: {planetnum: planetnum, planetpos: planetpos}});

I tried two different avenues and ended up using a combination of them. First, before I sent some of the elements over I used JSON.stringify to convert them to strings, then JSON.parse to get them back once they were sent to the worker. For the array I ended up using transferable objects. Here is a simplified example of what I did:
var ast = [];
ast.elements = new Float64Array([0.3871, 252.2507, 0.20563, 7.005, 77.4548, 48.3305, 0.2408]);
ast.num = 1;
var astnumJ = JSON.stringify(ast.num); // Probably not needed, just an example

// From the main thread, post a message to the worker
asteroidWorker.postMessage({
    "asteroidnum": astnumJ,
    "asteroidele": ast.elements.buffer
}, [ast.elements.buffer]);
This transfers the array to the worker instead of copying it, which reduces the garbage created. The buffer is then no longer accessible in the main thread, so once the worker posts its message back, you have to send the array back to the main thread or it won't be accessible as a property of ast anymore. In my case, because I have 20-30 ast objects, I need to make sure they all have their elements restored via postMessage before I call another update on them. I did this with a simple counter in a loop (see the sketch after the onmessage handler below).
// In worker.js
asteroidele = new Float64Array(e.data.asteroidele); // cast to type
asteroidnum = JSON.parse(e.data.asteroidnum);       // parse JSON
// Do calculations with this information in the worker, then return it to the main thread

// Post message from worker back to main
self.postMessage({
    asteroidnum: asteroidnum,
    asteroidpos: asteroidpos, // Calculated position from the elements
    asteroidele: asteroidele  // Return the elements buffer back to main
});

// Main thread worker onmessage function
asteroidWorker.onmessage = function(e) {
    var data1 = e.data;
    ast.ele = data1.asteroidele; // Restore elements back to the ast object
};
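For reference, a rough sketch of the counter mentioned above (asteroids and updateAsteroids are illustrative names, not from the original code): the next batch of postMessage calls is only issued once every object has had its elements posted back.

var restoredCount = 0;

asteroidWorker.onmessage = function (e) {
    var data1 = e.data;
    asteroids[data1.asteroidnum].ele = data1.asteroidele;  // restore the transferred elements
    asteroids[data1.asteroidnum].pos = data1.asteroidpos;  // store the calculated position
    restoredCount++;
    if (restoredCount === asteroids.length) {              // all 20-30 objects are back
        restoredCount = 0;
        updateAsteroids();                                 // now it's safe to post the next update
    }
};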
Not sure this is the best approach yet, but it does work for sending the array to and from the worker without creating a bunch of extra garbage. I think the best approach here will be to send the array to the worker once and leave it there, then just return updated positions; a sketch of that idea follows. Still working on that.
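A rough sketch of that idea, with computePosition standing in for the existing orbital calculation and the message shapes purely illustrative:

// Main thread, once at startup: transfer the elements into the worker and leave them there
asteroidWorker.postMessage(
    { type: 'init', num: ast.num, ele: ast.elements.buffer },
    [ast.elements.buffer]
);

// Main thread, every frame: only a tiny message crosses the boundary
asteroidWorker.postMessage({ type: 'tick', date: Date.now() });

// worker.js
var elements = {};                                    // elements live here for the worker's lifetime
self.onmessage = function (e) {
    if (e.data.type === 'init') {
        elements[e.data.num] = new Float64Array(e.data.ele);
    } else {
        for (var num in elements) {
            var pos = computePosition(elements[num], e.data.date);  // hypothetical calculation
            self.postMessage({ asteroidnum: num, asteroidpos: pos });
        }
    }
};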

Related

JSON.stringify is very slow for large objects

I have a very big object in JavaScript (about 10MB).
When I stringify it, it takes a long time, so I send it to the backend and parse it into an object (actually nested objects with arrays). That parsing also takes a long time, but it's not the problem in this question.
The problem:
How can I make JSON.stringify faster? Any ideas or alternatives? I need a JavaScript solution: libraries I can use, or ideas.
What I've tried
I googled a lot and it looks like there is no better performance than JSON.stringify, or my googling skills have gotten rusty!
Result
I'll accept any suggestion that may solve the long save (sending to the backend) in the request (I know it's a big request).
Code sample of the problem (details):
Request URL: http://localhost:8081/systemName/controllerA/update.html;jsessionid=FB3848B6C0F4AD9873EA12DBE61E6008
Request Method: POST
Status Code: 200 OK
I'm sending a POST to the backend, and then in Java I read the parameter with
request.getParameter("BigPostParameter")
and convert it to an object using
public boolean fromJSON(String string) {
    if (string != null && !string.isEmpty()) {
        ObjectMapper json = new ObjectMapper();
        DateFormat dateFormat = new SimpleDateFormat(YYYY_MM_DD_T_HH_MM_SS_SSS_Z);
        dateFormat.setTimeZone(TimeZone.getDefault());
        json.setDateFormat(dateFormat);
        json.configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true);
        WebObject object;
        // Logger.getLogger("JSON Tracker").log(Level.SEVERE, "Start");
        try {
            object = json.readValue(string, this.getClass());
        } catch (IOException ex) {
            Logger.getLogger(JSON_ERROR).log(Level.SEVERE, "JSON Error: {0}", ex.getMessage());
            return false;
        }
        // Logger.getLogger("JSON Tracker").log(Level.SEVERE, "END");
        return this.setThis(object);
    }
    return false;
}
It is called like this:
BigObject someObj = new BigObject();
someObj.fromJSON(request.getParameter("BigPostParameter"))
P.S.: FYI, the line object = json.readValue(string, this.getClass()); is also very, very slow.
Again, to summarize:
The first problem is the posting time (stringify), a JavaScript bottleneck.
The second problem is parsing that stringified string back into an object (using Jackson). The stringified object mainly contains SVG tag content in a style column; the other columns are mostly strings and ints.
As commenters said, there is no way to make the parsing itself faster.
If the concern is that the app is blocked while it's stringifying/parsing, then try to split the data into separate objects, stringify them in pieces, and assemble them back into one object before saving on the server.
If the loading time of the app is not a problem, you could try an ad-hoc incremental-change approach on top of the existing app:
... App loading
    Load map data
    Make a full copy of the data
... End loading

... App working, without changes

... When saving changes
    Diff the copy against the changed data to get a JSON diff
    Send the changes (much smaller than the full data)

... On server
    Apply the JSON diff to the full data stored on the server
    Save the changed data
I used json-diff (https://github.com/andreyvit/json-diff) to calculate the changes; there are a few similar libraries.
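A rough sketch of the diff-and-send flow, assuming json-diff's diff() export; mapData and postToServer are illustrative names:

var jsonDiff = require('json-diff');

// At load time: keep a pristine copy of the data
var original = JSON.parse(JSON.stringify(mapData));

// ... the user edits mapData ...

// When saving: compute and send only the (much smaller) diff
var changes = jsonDiff.diff(original, mapData);
postToServer(JSON.stringify(changes));  // postToServer is whatever you already use to save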
Parsing is a slow process. If what you want is to POST a 10MB object, turn it into a file, a blob, or a buffer, and send that file/blob/buffer using FormData instead of application/json or application/x-www-form-urlencoded.
Reference
An example using express/multer
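A rough sketch of that idea (the URL and parameter name come from the question; note that the server would then need multipart handling instead of reading a plain request parameter):

var blob = new Blob([JSON.stringify(bigObject)], { type: 'application/json' });

var form = new FormData();
form.append('BigPostParameter', blob, 'payload.json');

fetch('/systemName/controllerA/update.html', { method: 'POST', body: form })
    .then(function (res) { console.log('saved', res.status); });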
Solution
Well, as with most big "repeatable" problems, you can use async!
But wait, isn't JS still single-threaded even when it does async? Yes... but you can use Web Workers to get true parallelism and serialize an object much faster by parallelizing the process.
General Approach
mainPage.js
//= Functions / Classes =============================================================|
// To tell JSON.stringify that this chunk is already processed, don't touch it
class SerializedChunk {
  constructor(data) { this.data = data }
  toJSON() { return this.data }
}

// Attach all events and props we need on workers to handle this use case
const mapCommonBindings = w => {
  w.addEventListener('message', e => w._res(e.data), false)
  w.addEventListener('error', e => w._rej(e.data), false)
  w.solve = async obj => {
    w._state && await w._state.catch(_ => _) // Wait for any older queued task to complete
    w._state = new Promise((_res, _rej) => {
      // Give this object promise bindings that can be handled by the event bindings
      // (just make sure not to fire 2 errors or 2 messages at the same time)
      Object.assign(w, { _res, _rej })
    })
    w.postMessage(obj)
    return await w._state // Return the final output, when we get the `message` event
  }
}
//= Initialization ===================================================================|
// Let's make our 10 workers
const workers = Array(10).fill(0).map(_ => new Worker('worker.js'))
workers.forEach(mapCommonBindings)

// A helper function that schedules workers in a round-robin
workers.schedule = async task => {
  workers._c = ((workers._c || -1) + 1) % workers.length
  const worker = workers[workers._c]
  return await worker.solve(task)
}

// A helper used below that takes an object [key, value] pair and uses a worker to solve it
const _asyncHandleValuePair = async ([key, value]) => [key, new SerializedChunk(
  await workers.schedule(value)
)]
//= Final Function ===================================================================|
// The new function (you could improve the runtime by changing how this function schedules tasks)
// Note! This is async now, obviously
const jsonStringifyThreaded = async o => {
  const f_pairs = await Promise.all(Object.entries(o).map(_asyncHandleValuePair))

  // Take all final processed pairs, create a new object, JSON.stringify the top level
  const final = f_pairs.reduce((acc, [key, chunk]) => (
    acc[key] = chunk, // Add the current key / chunk to the object
    acc               // Return the object to the next reduce step
  ), {})              // Seed an empty object that will contain all the data
  return JSON.stringify(final)
}

/* lots of other code, up to the function that actually uses this code */

async function submitter() {
  // other stuff
  const payload = await jsonStringifyThreaded(input.value)
  await server.send(payload)
  console.log('Done!')
}
worker.js
self.addEventListener('message', function (e) {
  const obj = e.data
  self.postMessage(JSON.stringify(obj))
}, false)
Notes:
This works the following way:
Creates a list of 10 workers, and adds a few methods and props to them
We care about async .solve(Object): String which solves our tasks using promises while masking away callback hell
Use a new method: async jsonStringifyThreaded(Object): String which does the JSON.stringify asynchronously
We break the object into entries and solve each one in parallel (this can be optimized to be recursive to a certain depth; use your best judgement :))
Processed chunks are cast into SerializedChunk which the JSON.stringify will use as is, and not try to process (since it has .toJSON())
Internally if the number of keys exceeds the workers, we round-robin back to the first worker and overschedule them (remember, they can handle queued tasks)
Optimizations
You may want to consider a few more things to improve performance:
Use of Transferable Objects, which will decrease the overhead of passing objects to the web workers significantly
Redesign jsonStringifyThreaded() to schedule more objects at deeper levels.
You can explore libraries like fast-json-stringify, which use a template schema while converting the JSON object to boost performance. See the article below.
https://developpaper.com/how-to-improve-the-performance-of-json-stringify/
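For the last point, a rough sketch of how fast-json-stringify is typically used: you compile a stringifier from a schema once and reuse it (the schema below is purely illustrative).

const fastJson = require('fast-json-stringify');

const stringify = fastJson({
    type: 'object',
    properties: {
        key:    { type: 'integer' },
        style:  { type: 'string' },                         // e.g. the svg/style content
        values: { type: 'array', items: { type: 'number' } }
    }
});

console.log(stringify({ key: 1, style: '<svg>...</svg>', values: [1, 2, 3] }));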

Node.js Socket IO: How to continuously save socket data to MongoDB

I'm building a 3D game in the browser using THREE.js. Lots of fun, but I came across the following situation:
An object in my 3D scene is continuously moving around, driven by user input. I need to save the object's position to my database in real-time.
Let's start at the front-end. Angular.js is watching my object's position using its built-in $watch functionality. The object's position can change multiple times per second.
On each change, I emit an event to the backend Node.js server using Socket IO, like so:
socket.emit('update', {
    id: id,
    position: position
});
In the back-end, the event is caught and immediately emitted to the other members in the same Socket.IO room. This way, everyone in this room gets the most real-time update possible.
Now, because the event can happen multiple times per second, I don't want to update my MongoDB collection on each change, since this would cause a lot of overhead. Instead, I'm looking for a way to save data to the database only occasionally.
I've come up with a solution using Node.js's setInterval function, which saves data every 1000ms. For each distinct id (which is unique per object) received on the backend, a new key is created on a JavaScript object, thus keeping track of changes on a per-object basis.
The (simplified) code on the backend:
let update_queue = new Object();

// ...

// Update Event
socket.on('update', (msg) => {

    // Flag Changes
    if (!update_queue[msg.id]) update_queue[msg.id] = { changes: true };

    // Set Interval Timer
    if (!update_queue[msg.id].timer) {
        update_queue[msg.id].timer = setInterval(() => {
            if (!update_queue[msg.id].changes) {
                clearInterval(update_queue[msg.id].timer);
                return;
            }

            // This saves data to MongoDB
            Object3DCollection.update(msg.id, msg.position)
                .then((res) => {
                    console.log('saved');
                });

            // Unflag Changes
            update_queue[msg.id].changes = false;
        }, 1000);
    }

    // Immediate Broadcast to Socket Room
    socket.broadcast.to('some_room').emit('object_updated', msg);
});
The Question
Is this a proper way of handling very frequent socket data while still saving it to a database? Or are there other suggestions/solutions that are more robust or work better?
Note
I do not want to wait for my object to be saved to the database and then emit the saved data to the rest of the socket room. The delay of database write operations is not suitable for the real-time game situation I'm dealing with.
Thanks in advance! All suggestions/solutions are appreciated and will be considered.

Will Rx.Observable.groupBy clean up empty streams?

In a Node application I'm trying to process a stream of events using RxJS. The event stream is a list of changes to many documents. I'm using groupBy to partition the stream into new streams by documentId. But I'm wondering: once a document is closed on the client and no new events are added to the stream for that documentId, will groupBy dispose of that document's stream once it is empty? If not, how would I do that manually? I want to avoid a memory leak caused by new document streams being created but never destroyed.
What I'd suggest doing:
instead of just having a documentChanges observable, have a documentEvents observable.
Clients will send documentOpened events when they open a document, documentChanged events when they change a document and documentClosed events when they close a document.
By sending all 3 types of events through the same observable, you establish and guarantee an ordering. If a client sends documentOpened, documentChanged and documentClosed events in that order, then your server will see them in that order. Note that there won't be any guarantees about the order of events sent by 2 different clients; this just lets you ensure that the events sent by a particular client arrive in order.
And then, this is how you'd use groupByUntil:
documentEvents
    .groupByUntil(
        function (e) { return e.documentId; }, // key
        null,                                  // element
        function (group) {                     // duration selector
            var documentId = group.key;
            return group.filter(function (e) { return e.eventType === 'documentClosed'; });
        })
    .flatMap(function (eventsForDocument) {
        var documentId = eventsForDocument.key;
        return eventsForDocument.whatever(...);
    })
    .subscribe(...);
Another option that is a lot simpler: you can just expire the group after an idle period. Depending on what you are doing with the events this may be more than sufficient. This example expires a group if the document has not been edited in 5 minutes. If more edits come in then a new group is spun up.
var idleTime = 5 * 60 * 1000;
events
    .groupByUntil(
        function (e) { return e.documentId; },
        null,
        function (g) { return g.debounce(idleTime); })
    .flatMap...
Since you included the .NET tag, I'll cover Rx.NET as well.
Your question is phrased a bit incorrectly. Streams are empty if and only if they never have an event, so they can't become empty. A stream that isn't emitting data doesn't typically consume much in the way of resources, though.
In .NET, groups will not terminate until the source terminates. Use `GroupByUntil`, which allows you to specify a durationSelector stream for each group. Observable.Timer often works well for this.
This means that you may get multiple non-concurrent streams with the same key appearing over time, but if (as is often the case) your group streams are flattened at some point, it won't matter.
In rxjs, we also have groupByUntil.
In Rx-Java, the groupByUntil method, which behaved similarly, was rolled into groupBy - see https://github.com/ReactiveX/RxJava/pull/1727 and https://github.com/benjchristensen/RxJava/commit/b9302956832e3e77579f63fd9db25aa60eb4192a for more details.
http://reactivex.io/documentation/operators/groupby.html says:
If you unsubscribe from one of the GroupedObservables, that GroupedObservable will be terminated. If the source Observable later emits an item whose key matches the GroupedObservable that was terminated in this way, groupBy will create and emit a new GroupedObservable to match the key.
So, in Rx-Java you must unsubscribe from a grouped observable stream to terminate it. takeUntil with a timer stream can work for this.
Addendum:
In response to your comment, a stream will not terminate until a downstream operator unsubscribes from it. The duration selector of groupByUntil would cause termination. If a document will not be opened again once closed, then you can just send a "documentclosed" event into the stream and use a regular groupBy with a takeWhile testing for the "documentClosed".
The reason it's important that the document is not opened again is that, with groupBy (in RxJS and Rx.NET), a new group will not be created if an already-seen key reappears.
If this is a problem, then you will need to use groupByUntil and use a published stream to watch for the documentClosed event - using a published stream will ensure you don't get subscription side effects.
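For reference, a minimal sketch of the addendum's groupBy + takeWhile idea in the RxJS 4 style used above, assuming each event carries documentId and eventType and that a closed document is never reopened:

documentEvents
    .groupBy(function (e) { return e.documentId; })
    .flatMap(function (group) {
        // The downstream subscription to this group ends as soon as its
        // documentClosed event arrives (valid when documents never reopen)
        return group.takeWhile(function (e) { return e.eventType !== 'documentClosed'; });
    })
    .subscribe(function (e) {
        // handle documentOpened / documentChanged events here
    });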

ChromeWorker set pointer to the memory or send big data. How?

We are trying to send 500 KB through a ChromeWorker and get an out-of-memory error in the console.
This code:
let charArray = ctypes.ArrayType(ctypes.char);
let base641 = new charArray(9999999);

var {ChromeWorker} = Cu.import("resource://gre/modules/Services.jsm", null);
var chworker = new ChromeWorker(self.data.url("async.js"));
chworker.onmessage = function(e) {
    console.error(e.data);
};
chworker.postMessage(base641);
It returns this error:
stack:"#undefined:undefined:undefined
CuddlefishLoader/options<.load#resource://gre/modules/commonjs/sdk/loader/cuddlefish.js:129:9
run#resource://gre/modules/commonjs/sdk/addon/runner.js:169:9
startup/<#resource://gre/modules/commonjs/sdk/addon/runner.js:113:7
Handler.prototype.process#resource://gre/modules/Promise-backend.js:863:11
this.PromiseWalker.walkerLoop#resource://gre/modules/Promise-backend.js:742:7"
What is undefined? Can we pass a pointer to the memory through a ChromeWorker?
To pass a ctypes array, you'd take the address() and post that in a message, then construct another pointer from that address on the other side. This works, but is rather nasty, of course. You'd also need to make sure that garbage collection does not collect the data while the other side is still using it!
You might be better off using ArrayBuffer/typed arrays which you can pass directly, also from non-privileged code.
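A minimal sketch of that suggestion, assuming ChromeWorker accepts a transfer list like a regular Worker (the 500 KB buffer and message shape are illustrative):

var bytes = new Uint8Array(500 * 1024);
// ... fill bytes with the base64 data ...
chworker.postMessage({ payload: bytes.buffer }, [bytes.buffer]);

// In async.js (the worker):
self.onmessage = function (e) {
    var view = new Uint8Array(e.data.payload);
    // ... use view; the buffer was transferred, not copied ...
};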

Loading large amount of data into memory - most efficient way to do this?

I have a web-based documentation searching/viewing system that I'm developing for a client. Part of this system is a search system that allows the client to search for a term[s] contained in the documentation. I've got the necessary search data files created, but there's a lot of data that needs to be loaded, and it takes anywhere from 8-20 seconds to load all the data. The data is broken into 40-100 files, depending on what documentation needs to be searched. Each file is anywhere from 40-350kb.
Also, this application must be able to run on the local file system, as well as through a webserver.
When the webpage loads up, I can generate a list of which search data files I need to load. This entire list must be loaded before the webpage can be considered functional.
With that preface out of the way, let's look at how I'm doing it now.
After I know that the entire webpage is loaded, I call a loadData() function
function loadData(){
    var d = new Date();
    var curr_min = d.getMinutes();
    var curr_sec = d.getSeconds();
    var curr_mil = d.getMilliseconds();
    console.log("test.js started background loading, time is: " + curr_min + ":" + curr_sec + ":" + curr_mil);
    recursiveCall();
}

function recursiveCall(){
    if(file_array.length > 0){
        var string = file_array.pop();
        setTimeout(function(){ $.getScript(string, recursiveCall); }, 1);
    }
    else{
        var d = new Date();
        var curr_min = d.getMinutes();
        var curr_sec = d.getSeconds();
        var curr_mil = d.getMilliseconds();
        console.log("test.js stopped background loading, time is: " + curr_min + ":" + curr_sec + ":" + curr_mil);
    }
}
This processes an array of files sequentially, taking a 1ms break between files. That helps prevent the browser from being completely locked up during the loading process, but the browser still tends to get bogged down by loading the data. Each of the files that I'm loading looks like this:
AddToBookData(0,[0,1,2,3,4,5,6,7,8]);
AddToBookData(1,[0,1,2,3,4,5,6,7,8]);
AddToBookData(2,[0,1,2,3,4,5,6,7,8]);
Where each line is a function call that is adding data to an array. The "AddToBookData" function simply does the following:
function AddToBookData(index1, value1){
    BookData[BookIndex].push([index1, value1]);
}
This is the existing system. After loading all the data, "AddToBookData" can get called 100,000+ times.
I figured that was pretty inefficient, so I wrote a script to take the test.js file which contains all the function calls above, and processed it to change it into a giant array which is equal to the data structure that BookData is creating. Instead of making all the function calls that the old system did, I simply do the following:
var test_array = [..........(data structure I need).......];
BookData[BookIndex] = test_array;
I was expecting to see a performance increase because I was removing all the function calls above, but in reality this method takes slightly more time to create the exact same data structure. I should note that "test_array" holds slightly over 90,000 elements in my real-world test.
It seems that both methods of loading data have roughly the same CPU utilization. I was surprised to find this, since I was expecting the second method to require little CPU time, since the data structure is being created before hand.
Please advise?
Looks like there are two basic areas for optimising the data loading, that can be considered and tackled separately:
Downloading the data from the server. Rather than one large file you should gain wins from parallel loads of multiple smaller files. Experiment with number of simultaneous loads, bear in mind browser limits and diminishing returns of having too many parallel connections. See my parallel vs sequential experiments on jsfiddle but bear in mind that the results will vary due to the vagaries of pulling the test data from github - you're best off testing with your own data under more tightly controlled conditions.
Building your data structure as efficiently as possible. Your result looks like a multi-dimensional array, this interesting article on JavaScript array performance may give you some ideas for experimentation in this area.
But I'm not sure how far you'll really be able to go with optimising the data loading alone. To solve the actual problem with your application (the browser locking up for too long), have you considered options such as the following?
Using Web Workers
Web Workers might not be supported by all your target browsers, but should prevent the main browser thread from locking up while it processes the data.
For browsers without workers, you could consider increasing the setTimeout interval slightly to give the browser time to service the user as well as your JS. This will make things actually slightly slower but may increase user happiness when combined with the next point.
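For the worker path, here's a rough sketch of how the parsing could be moved off the main thread, assuming the data files can be fetched as JSON text (parse-worker.js, updateProgress and the reworked recursiveCall are illustrative, not from the question):

var parserWorker = new Worker('parse-worker.js');
parserWorker.onmessage = function (e) {
    BookData[e.data.bookIndex] = e.data.entries;   // merge the parsed structure
    updateProgress();                              // see the progress-bar point below
    recursiveCall();                               // move on to the next file
};

function recursiveCall() {
    if (file_array.length > 0) {
        $.get(file_array.pop(), function (text) {
            parserWorker.postMessage({ bookIndex: BookIndex, text: text });
        }, 'text');
    }
}

// parse-worker.js
self.onmessage = function (e) {
    var entries = JSON.parse(e.data.text);         // or whatever heavier processing you need
    self.postMessage({ bookIndex: e.data.bookIndex, entries: entries });
};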
Providing feedback of progress
For both worker-capable and worker-deficient browsers, take some time to update the DOM with a progress bar. You know how many files you have left to load so progress should be fairly consistent and although things may actually be slightly slower, users will feel better if they get the feedback and don't think the browser has locked up on them.
Lazy Loading
As suggested by jira in his comment. If Google Instant can search the entire web as we type, is it really not possible to have the server return a file with all locations of the search keyword within the current book? This file should be much smaller and faster to load than the locations of all words within the book, which is what I assume you are currently trying to get loaded as quickly as you can?
I tested three methods of loading the same 9,000,000 point dataset into Firefox 3.64.
1) Stephen's getJSON method
2) My function-based push method
3) My pre-processed array-appending method
I ran my tests two ways: The first iteration of testing I imported 100 files containing 10,000 rows of data, each row containing 9 data elements [0,1,2,3,4,5,6,7,8]
For the second iteration I tried combining files, so that I was importing 1 file with 9 million data points.
This was a lot larger than the dataset I'll be using, but it helps demonstrate the speed of the various import methods.
              Separate files    Combined file
JSON:         34 seconds        34 seconds
FUNC-BASED:   17.5 seconds      24 seconds
ARRAY-BASED:  23 seconds        46 seconds
Interesting results, to say the least. I closed out the browser after loading each webpage, and ran the tests 4 times each to minimize the effect of network traffic/variation. (ran across a network, using a file server). The number you see is the average, although the individual runs differed by only a second or two at most.
Instead of using $.getScript to load JavaScript files containing function calls, consider using $.getJSON. This may boost performance. The files would now look like this:
{
    "key" : 0,
    "values" : [0,1,2,3,4,5,6,7,8]
}
After receiving the JSON response, you could then call AddToBookData on it, like this:
function AddToBookData(json) {
    BookData[BookIndex].push([json.key, json.values]);
}
If your files have multiple sets of calls to AddToBookData, you could structure them like this:
[
    {
        "key" : 0,
        "values" : [0,1,2,3,4,5,6,7,8]
    },
    {
        "key" : 1,
        "values" : [0,1,2,3,4,5,6,7,8]
    },
    {
        "key" : 2,
        "values" : [0,1,2,3,4,5,6,7,8]
    }
]
And then change the AddToBookData function to compensate for the new structure:
function AddToBookData(json) {
    $.each(json, function(index, data) {
        BookData[BookIndex].push([data.key, data.values]);
    });
}
Addendum
I suspect that regardless of what method you use to transport the data from the files to the BookData array, the true bottleneck is the sheer number of requests. Must the files be fragmented into 40-100? If you changed to JSON format, you could load a single file that looks like this:
{
    "file1" : [
        {
            "key" : 0,
            "values" : [0,1,2,3,4,5,6,7,8]
        },
        // all the rest...
    ],
    "file2" : [
        {
            "key" : 1,
            "values" : [0,1,2,3,4,5,6,7,8]
        },
        // yadda yadda
    ]
}
Then you could do one request, load all the data you need, and move on... Although the browser may initially lock up (although, maybe not), it would probably be MUCH faster this way.
Here is a nice JSON tutorial, if you're not familiar: http://www.webmonkey.com/2010/02/get_started_with_json/
Fetch all the data as a string, and use split(). This is the fastest way to build an array in JavaScript.
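A rough sketch of that suggestion (the file name and the comma/newline format are assumptions, not from the original answer):

$.get('bookdata.txt', function (text) {
    var rows = text.split('\n');
    for (var i = 0; i < rows.length; i++) {
        if (!rows[i]) continue;                        // skip blank lines
        var cols = rows[i].split(',').map(Number);     // e.g. "0,0,1,2,3,4,5,6,7,8"
        BookData[BookIndex].push([cols[0], cols.slice(1)]);
    }
}, 'text');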
There's an excellent article about a very similar problem, from the people who built the Flickr search: http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/
