I'm working with code that handles all AJAX requests using Web Workers (when available). These workers do little more than handle XMLHttpRequest objects (no extra computation). All requests created by the workers are asynchronous (request.open("get",url,true)).
Recently, I got a couple of issues regarding this code and I started to wonder whether I should spend time fixing it or just dump the whole solution.
My research so far suggests that this code may actually be hurting performance. However, I wasn't able to find any credible source supporting this. My only two findings are:
a two-year-old jQuery feature suggestion to use web workers for AJAX calls
this SO question that seems to ask about something a bit different (using synchronous requests in web workers vs AJAX calls)
Can someone point me to a reliable source discussing this issue? Or, are there any benchmarks that may dispel my doubts?
[EDIT] This question gets a little bit more interesting when the WebWorker is also responsible for parsing the result (JSON.parse). Does asynchronous parsing improve performance?
I have created a proper benchmark for that on jsperf. Depending on the browser, the WebWorker approach is 85-95% slower than a raw AJAX call.
Notes:
since network response time can differ between requests, I'm testing only new XMLHttpRequest() and JSON.parse(jsonString); there are no real AJAX calls being made
WebWorker setup and teardown operations are not being measured
note that I'm testing a single request; results for the WebWorker approach may be better for multiple simultaneous requests
Calvin Metcalf explained to me that comparing sync and async on jsperf won't give accurate results, and he created another benchmark that eliminates the async overhead. The results still show that the WebWorker approach is significantly slower.
From the Reddit discussion I learned that data passed between the main page and a WebWorker is copied and has to be serialized in the process. Therefore, using a WebWorker only for parsing doesn't make much sense: the data will have to be serialized and deserialized anyway before you can use it on the main page.
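For illustration, a minimal sketch of that round trip (names are placeholders): posting the parsed object back triggers a structured clone, i.e. another serialize/deserialize cycle on top of the parse.

// worker.js - parse off the main thread, then post the result back.
self.onmessage = function (e) {
  var parsed = JSON.parse(e.data); // the parse happens off the main thread...
  self.postMessage(parsed);        // ...but the result is copied (structure-cloned)
};                                 // back to the page, adding a serialize/deserialize cost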
First thing to remember is that web workers rarely make things faster in the sense of taking less time; they make things faster in the sense that they offload computation to a background thread so that processing related to user interaction is not blocked. For instance, once you take into account transferring the data, a huge calculation might take 8 seconds instead of 4. But if it were done on the main thread, the entire page would be frozen for 4 seconds, which is likely unacceptable.
With that in mind, moving just the AJAX calls off the main thread won't gain you anything, as AJAX calls are non-blocking. But if you have to parse JSON or, even better, extract a small subset out of a large request, then a web worker can help you out.
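A minimal sketch of that idea, assuming a hypothetical endpoint returning a large JSON array with made-up record fields; only the small extracted subset is copied back to the page:

// worker.js
self.onmessage = function (e) {
  var xhr = new XMLHttpRequest();
  xhr.open("get", e.data.url, true);
  xhr.onload = function () {
    var records = JSON.parse(xhr.responseText); // heavy parse stays off the main thread
    var subset = records.filter(function (r) { return r.active; })
                        .map(function (r) { return r.id; });
    self.postMessage(subset); // only the small subset is cloned back to the page
  };
  xhr.send();
};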
A caveat I've heard but not confirmed is that workers use a different cache than the main page, so if the same resources are being loaded in the main thread and the worker, it could cause a large duplication of effort.
You are optimizing your code in the wrong place.
AJAX requests already run in a separate thread and return to the main event loop once they are fulfilled (calling the defined callback function).
Web workers are an interface to threads, meant for computationally expensive operations. Just like in classical desktop applications when you don't want to block the interface with computations that take a long time.
Asynchronous IO is an important concept of Javascript.
First, your request is already asynchronous: the IO is non-blocking, and while the request is in flight you can run any other JavaScript code. Executing the callback in a worker is much more interesting than executing the request there.
Second, JavaScript engines execute all code in the same thread; if you create new threads, you need to handle data communication with the worker message API (see Semaphore).
In conclusion, the asynchronous, single-threaded nature of JavaScript is powerful; use it as much as possible and create workers only if you really need to, for example for a long-running JavaScript process.
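To illustrate the point (the URL is a placeholder): the request below already lets other code run while the IO is in flight; only the callback competes for the main thread.

var request = new XMLHttpRequest();
request.open("get", "/data.json", true);
request.onload = function () {
  // Only this callback runs on the main thread alongside other work;
  // if this part is expensive, it is the part worth moving to a worker.
  var data = JSON.parse(request.responseText);
  console.log(data);
};
request.send();
console.log("this line runs before the response arrives");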
From my experience, Web Workers should not be used for the AJAX calls themselves. First of all, those calls are already asynchronous, meaning your code will continue running while you wait for the response to come back.
Now, handling the response is definitely something you could use a Web Worker for. Some examples:
Parsing the response to build a large model
Computing large amounts of data from the response
Using a Shared Web Worker with a template engine, in conjunction with the AJAX response, to build the HTML that will then be returned for appending to the DOM.
Edit: another good read would be: Opinion about synchronous requests in web workers
Related
So I was looking at this module, and I cannot understand why it uses callbacks.
I thought that memory caching is supposed to be fast and that is also the purpose someone would use caching, because it's fast... like instant.
Adding callbacks implies that you need to wait for something.
But how long do you actually need to wait? If the result comes back very fast, aren't you slowing things down by wrapping everything in callbacks, plus promises on top (because as a user of this module you are forced to promisify those callbacks)?
By design, JavaScript is asynchronous for most of its external calls (http, 3rd-party libraries, ...).
As mentioned here:
Javascript is a single threaded language. This means it has one call stack and one memory heap. As expected, it executes code in order and must finish executing a piece of code before moving onto the next. It's synchronous, but at times that can be harmful. For example, if a function takes a while to execute or has to wait on something, it freezes everything up in the meanwhile.
Having synchronous functions would block the thread and the execution of the script. To avoid any blocking (due to networking, file access, etc.), it is recommended to fetch this information asynchronously.
Most of the time, the redis caching will take a few ms. However, making it asynchronous guards against a possible network lag and keeps your application up and running for your customers even when the occasional connectivity error occurs.
TLDR: reading from memory is fast, reading from the network is slower and shouldn't block the process
You're right: reading from a memory cache is very fast. It's as fast as accessing any variable (micro- or nanoseconds), so there is no good reason to implement this as a callback/promise, which would be significantly slower. But this is only true if you're reading from the nodejs process's own memory.
The problem with using redis from node is that the cache is stored on another machine (the redis server), or at least in another process. So even if redis reads the data very quickly, it still has to go over the network to return to your node server, and that isn't always guaranteed to be fast (usually a few milliseconds at least). For example, if your redis server is not physically close to your nodejs server, or you have too many network requests, the request can take longer to reach redis and return to your server. Now imagine if this were blocking by default: it would prevent your server from doing anything else until the request completed, resulting in very poor performance as your server sits idle waiting for the network round trip. That's the reason why any I/O (disk, network, ...) operation in nodejs should be async by default.
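A minimal sketch with the node-redis client (v4-style promise API; the key name is a placeholder) showing why the call is awaited: the lookup is O(1) inside redis, but the value still has to make a network round trip.

import { createClient } from "redis";

const client = createClient(); // defaults to localhost:6379
await client.connect();

// GET returns a promise: the dictionary lookup is O(1) inside the redis
// process, but the value travels over the network to reach this process.
const value = await client.get("user:42");
console.log(value);

await client.quit();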
Alex, you remarked, "I thought that memory caching is supposed to be fast and that is also the purpose someone would use caching, because it's fast... like instant." And you're very nearly right.
Now, what does Redis actually mean?
It means REmote DIctionary Server.
~ FAQ - Redis
Yes, a dictionary usually performs in O(1) time. However, note that this performance only holds from the point of view of procedures running inside the process that holds the dictionary. Accessing the memory owned by the Redis process from another process involves a channel of operations that is not O(1).
So, because Redis is a REmote DIctionary Server, asynchronous APIs are required to access its service.
As has already been answered here, your redis instance could be on your machine (and accessing redis RAM storage is nearly as fast as accessing a regular javascript variable), but it could also be on another machine/cloud/cluster/you name it. In that case, network latency could be problematic, and that's why the promise/callback syntax exists.
If you are 100% confident that your Redis instance will always live on the same machine as your code, and that the tiny latency of its calls is acceptable, you could just use the async/await syntax (ES2017) to write the calls in a sequential, synchronous-looking style and avoid explicit callbacks or promise chains :)
But I'm not sure it is worth it in terms of coding habits and scalability. Every project is different, though, and it might suit yours.
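For example, assuming a promise-based redis client (such as node-redis's v4 API) and a hypothetical key scheme, the call reads like a blocking one while the event loop keeps running:

async function getUser(id) {
  // Looks sequential, but does not block the thread while redis replies.
  const cached = await client.get("user:" + id);
  return cached ? JSON.parse(cached) : null;
}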
In order to make a web app responsive, you use asynchronous non-blocking requests. I can envision two ways of accomplishing this. One is to use deferreds/promises. The other is Web Workers. With Web Workers we end up introducing another process, and we incur the overhead of having to marshal data back and forth. I was looking for some kind of performance metrics to help understand when to choose simple non-blocking callbacks over Web Workers.
Is there some way of deciding which to use without having to prototype both approaches? I see lots of tutorials online about Web Workers, but I don't see many success/failure stories. All I know is I want a responsive app. I'm thinking of using a Web Worker as the interface to an in-memory data structure (essentially a DB) that may be anywhere from 0.5-15MB, which the user can query and update.
As I understand javascript processing, it is possible to take a single long-running task and slice it up so that it periodically yields control allowing other tasks a slice of processing time. Would that be a sign to use Web Workers?
Deferred/Promises and Web Workers address different needs:
Deferreds/promises are constructs to assign a reference to a result that is not yet available, and to organize code that runs once the result becomes available or a failure is returned.
Web Workers perform actual work asynchronously (using operating system threads, not processes, so they are relatively lightweight).
In other words, JavaScript being single-threaded, you cannot use deferreds/promises to make code run in the background: once the code that fulfills the promise runs, no other code will run (you may change the order of execution, e.g. using setTimeout(), but that does not make your web app more responsive per se). Still, you might somehow be able to create the illusion of an asynchronous query by, e.g., iterating over an array of values and incrementing the index every few milliseconds (e.g. using setInterval), but that's hardly practical.
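For illustration only, a minimal sketch of that slicing idea using setTimeout (and, as said, it's hardly practical for real queries):

function processInSlices(items, handleItem, onDone) {
  var i = 0;
  (function slice() {
    var stop = Math.min(i + 1000, items.length);
    for (; i < stop; i++) handleItem(items[i]); // do a small chunk of work
    if (i < items.length) {
      setTimeout(slice, 0); // yield to the event loop, then continue
    } else {
      onDone();
    }
  })();
}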
In order to perform work such as your query asynchronously, and thus offload this work from your app's UI, you need something that actually works asynchronously. I see several options:
use an IndexedDB which provides an asynchronous API,
run your own in-memory data structure, and use web workers, as you indicated, to perform the actual query (see the sketch after this list),
use a server-side script engine such as NodeJS to run your code, then use client-side ajax to start the query (plus a promise to process the results),
use a database accessible over HTTP (e.g. Redis, CouchDB), and from the client issue an asynchronous GET (i.e. ajax) to query the DB (plus a promise to process the results),
develop a hybrid web app using e.g. Parse.
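To make the second option concrete, here is a minimal sketch (names and the query shape are hypothetical) of a worker owning the in-memory structure and answering queries:

// query-worker.js - owns the in-memory "DB" and answers queries.
var store = new Map(); // populated elsewhere, e.g. from an initial message
self.onmessage = function (e) {
  var result = store.get(e.data.key);
  self.postMessage({ id: e.data.id, result: result }); // only the result is copied back
};

// main.js - the UI thread stays free while the worker searches.
var worker = new Worker("query-worker.js");
worker.onmessage = function (e) { console.log("query", e.data.id, "->", e.data.result); };
worker.postMessage({ id: 1, key: "alpha" });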
Which approach is the best in your case? Hard to tell without exact requirements, but here are the dimensions I would look at:
Code complexity — if you already have code for your data structure, probably Web Workers are a good fit, otherwise IndexedDB looks more sensible.
Performance — if you need consistent performance, a server-side implementation or DB seems more appropriate.
Architecture/complexity — do you want all processing to be done client side or can you afford the effort (cost) to manage a server side implementation?
I found this book a useful read.
Knowing that
Javascript is strictly single-threaded and setTimeout doesn't spawn new threads; instead it follows the event-loop model
Worker threads are a new HTML5 feature, and their support is still not available in all HTML5 browsers
Which one should I be using for background data download purposes? If you have any experience or any benchmark data available, please share.
Not sure how much it matters to the question, but for the sake of completeness I would like to mention that the data is expected to be in XML format, and multiple server-side services will be invoked to get it. Is there a framework already available that caters to data downloading for both handheld-device browsers and desktop browsers?
Javascript is single threaded, but that doesn't mean data isn't downloaded in parallel. If you make an asynchronous AJAX call, you are downloading data in the background while the rest of your code is running.
Web workers are meant to do CPU-heavy work off the main thread. They won't help you get data any faster.
Worker threads are not supported in all major browsers yet, so I don't think it would be a great idea to involve worker threads in your design unless you only want to support the latest browsers.
All the AJAX calls can be made in parallel, but I guess your problem will be knowing when every AJAX response has arrived and the data has been inserted/updated into the browser's database. One way would be to chain all the AJAX calls, but that forces you to hard-code the sequence of calls into your code base. If that is not desirable, you need a separate async flow that keeps checking whether all the AJAX calls have been responded to. You can achieve this by setting a flag for each AJAX call, setting it to true when the response has been used, and then repeatedly checking the status of all the flags in another flow (started using setTimeout), as in the sketch below.
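A minimal sketch of that flag-checking approach, using jQuery as elsewhere in this thread (the URLs and the update step are placeholders):

var done = [false, false, false];

["/svc/1", "/svc/2", "/svc/3"].forEach(function (url, i) {
  $.get(url, function (xml) {
    // ...insert/update the browser's database with `xml` here...
    done[i] = true; // mark this call's response as processed
  });
});

(function checkAll() {
  if (done.every(Boolean)) {
    console.log("all responses processed");
  } else {
    setTimeout(checkAll, 100); // the separate flow: poll the flags again shortly
  }
})();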
I am not aware of any existing framework that takes care of such a complex activity, so you will have to write the code yourself. Hope this gives you some direction.
Here is a thread showing how to run a bunch of parallel AJAX calls with jQuery: Parallel asynchronous Ajax requests using jQuery
The only reason I can think of to use a worker thread would be to have it constantly talking with the server, but in that case websockets are a better solution: http://www.html5rocks.com/en/tutorials/websockets/basics/
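A minimal WebSocket sketch of that constantly-talking pattern (the endpoint and messages are placeholders):

var socket = new WebSocket("wss://example.com/updates");
socket.onopen = function () { socket.send("subscribe"); }; // one persistent connection
socket.onmessage = function (event) {
  console.log("server pushed:", event.data); // no polling, no worker needed
};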
Web workers are only supported on IE10+ so you will still need a fall back solution if you want to support older versions of IE. http://msdn.microsoft.com/en-us/library/ie/hh673568(v=vs.85).aspx
I want to know what you think about this. Is it recommended to use synchronous requests (XMLHttpRequest) in a web worker? What problems might I run into?
I have been testing this in my app and I haven't found any problems. But I'm wary of this synchronous behaviour because of old experiences with jQuery and AJAX.
My app gets a large amount of data from several tables in a database, and this takes time. For each batch of data retrieved from a table, I need to process it immediately so as not to delay the whole thing too much. Meanwhile, the user is interacting with the browser, so the UI could get blocked, and I thought that web workers would work well here.
Do you think this is a good solution? Or should I try asynchronous requests?
Thank you.
I don't have hard facts, but since you asked for opinions... :)
There is a telling issue in Chrome: too many Web Workers can cause a silent crash (the cap is ~60-100, according to this bug report). The general problem is that Web Workers are resource intensive, at least with v8.
Assuming you're going to end up making multiple HTTP calls, if you're doing synchronous HTTP calls in a Web Worker (a minimal sketch of such a call follows this list):
In a sense, you're trading asynchronous HTTP calls for asynchronous Web Workers, which only adds another intermediary into the mix, and you still have to manage things asynchronously.
If you go the simpler and more resource efficient route and only use one Web Worker you'll spend a lot of time waiting for it to give you responses.
If, on the other hand, you use multiple Web Workers you will likely need to keep track of which one is free, which one is busy, etc., in which case you will be creating a home-grown scheduler instead of using what comes baked into the browser.
Finally, Web Workers are expensive (apparently), and you might end up creating multiple Web Workers just so they can sit around and wait for an HTTP call to finish.
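Here is the promised sketch of a synchronous call inside a worker (the URL is a placeholder). The third argument, false, blocks this worker thread, but not the main thread, until the response arrives:

// worker.js
self.onmessage = function (e) {
  var xhr = new XMLHttpRequest();
  xhr.open("get", e.data.url, false); // synchronous: blocks only this worker
  xhr.send();
  self.postMessage(xhr.responseText); // the worker sat idle during the whole call
};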
I do not consider myself an expert on the matter, so please take this for what it's worth.
Update: Adding some pros / cons for various scenarios.
Some pros / cons that come to mind when choosing between making synchronous and asynchronous HTTP calls when using a Web Worker:
Generally, synchronous requests will be easier to write and will result in code that is easy to follow. A downside of synchronous requests is they might encourage writing long functions that should be carved out into separate, smaller functions.
If you're making a single call, there is no difference in the time it takes to finish between the two methods and synchronous is better because it's a bit simpler. I say it's only a bit simpler because a single asynch call with one callback listener is really rather simple.
If you're making multiple calls that have to happen in a specific sequence, like loading a user's profile data and then getting the local weather based on their address, synchronous calls will be better because it will be easier to write and a lot easier to read. The main thing about reading it is the sequential dependencies in the calls will be clearly outlined by the choice of making the calls synchronously and their order in the function. The more calls there are, the more this will matter. If there are many calls, the difference in complexity is likely to be drastic.
If you have to make multiple calls that do not need to happen in any specific order, then asynchronous requests are better, because the overall process is likely to be orders of magnitude faster than with synchronous requests. The more calls you're making, or the slower the connection, the more significant the difference in total elapsed time will be; this difference grows very quickly. From the perspective of someone reading the code, I think using synchronous requests in this situation would be somewhat misleading, as it would suggest a sequential nature to the calls even though there is none. Writing a series of async requests that are not dependent on each other isn't too bad either: you just set up a counter, make all the calls, increment the counter in each callback, and you're finished when the counter equals the number of calls you made, as in the sketch below.
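A minimal sketch of that counter pattern (the URLs are placeholders):

var urls = ["/api/a", "/api/b", "/api/c"];
var results = [];
var completed = 0;

urls.forEach(function (url, i) {
  var xhr = new XMLHttpRequest();
  xhr.open("get", url, true); // asynchronous, so all calls run in parallel
  xhr.onload = function () {
    results[i] = xhr.responseText;
    completed += 1;
    if (completed === urls.length) {
      console.log("all responses received", results); // finished
    }
  };
  xhr.send();
});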
Update: #rbrundritt makes a very interesting and relevant observation in a comment on this answer:
One thing I found working with web workers is that they appear to each gain their own HTTP limit. Browsers limit the number of concurrent HTTP requests to around 8 or 12, depending on the browser, before throttling, which can be a bottleneck if you have a lot of requests to process. I've found that if I pass my requests to web workers, each can do 8 to 12 concurrent requests before throttling starts. This can be a huge benefit for some apps.
#rbrundritt
I just wanted to add a note (that is too long for a comment) that the limit of ~60 web workers mentioned in #tiffon's answer doesn't appear to exist in Chromium anymore. I can create 500 workers in Chrome like this with no errors/crashes:
// Spawn 500 trivial workers from Blob URLs, let them run, then terminate them.
let workers = [];
for(let i = 0; i < 500; i++) {
  let worker = new Worker(URL.createObjectURL(new Blob([`console.log('cool${i}')`])));
  worker.postMessage({});
  workers.push(worker);
}
// Give the workers time to start and log before cleaning up.
await new Promise(r => setTimeout(r, 15000));
for(let i = 0; i < workers.length; i++) {
  workers[i].terminate();
}
Though, it's hard to imagine a situation where hundreds of workers would be appropriate, given core counts don't currently go higher than around 64.
I understand why browser vendors don't want to help me block their UI thread. However, I don't understand why there is:
no sleep(2) in Web Workers
no synchronous WebSockets API
There is a synchronous FileSystem API. There is also a synchronous IndexedDB API. To me, it seems like a contradiction.
The reason there's no sleep() function available to WebWorkers is simple: you don't need it. sleep is a synchronous function (it blocks until it returns), which doesn't make sense in the asynchronous context of WebWorkers.
If you send a message to a WebWorker, it doesn't block waiting for a response; the response is sent as a message to your message handler function. If you want to wait a certain amount of time before sending a response, you wouldn't use sleep, you'd use setTimeout and fire a message off when your function gets called.
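A minimal sketch of that: delaying a reply with setTimeout instead of sleeping (the message contents are placeholders):

// worker.js
self.onmessage = function (e) {
  // Instead of blocking with sleep(2), schedule the reply and return at once;
  // the worker stays free to handle other messages in the meantime.
  setTimeout(function () {
    self.postMessage("replying 2 seconds after: " + e.data);
  }, 2000);
};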
Similarly, if you're using WebWorkers for WebSocket data transmission, you'd receive a message from the main thread, send a packet via websocket asynchronously, then in the response handler you'd send a message back to the main thread. There's no logical place to use a synchronous sleep function.
As far as why there's not a synchronous mode for WebSockets like there is for the filesystem, the primary difference is that the filesystem isn't accessed across the network.
Generally, asynchronous APIs are preferable for network-based functions, so I guess I don't see it as much of a contradiction.
IDB is only supported by 3 browsers, none of which have implemented the synchronous API, so I don't see it as a shining example of synchronous APIs. In fact, I think the real contradiction is that people would define an API and not bother to implement it.
It is not obvious at all: TCP is a network protocol too, right? And it is quite often used in synchronous mode, because that makes applications simpler to develop and debug.
In my opinion, async mode is the obvious choice in the context of single-threaded applications, when you don't want I/O to block the UI. It is much less obvious if you intend to use web workers, for instance, to handle background I/O. It would indeed be convenient to have synchronous WebSockets in conjunction with web workers.
Finally, it is simply not safe to assume that a file read call will complete, and do so quickly. You should always have a timeout, or accept the fact that your app is going to hang if the IO doesn't respond.
For me it is quite obvious.
The FileSystem API and IndexedDB API work on the order of milliseconds, so you can trust that you'll have your data right away. The WebSockets API, by contrast, must be at least 100 times slower: the data has to fly over the wild internet, so it's obvious to make it asynchronous. Your response may even never come back.
IndexedDB will not block execution for a long time; most likely it will give a result in a few milliseconds, and we are not expecting to store millions of records in IndexedDB. The same goes for the File API: most calls will execute quickly.
Also, a synchronous API would lead to race conditions and would require multi-thread synchronization etc., which would increase programming complexity. Message-based threading, instead, is easier to program and frees us from synchronization issues.
Also, most JavaScript engines are stable, and people are familiar with the async way of programming. It's the easier, and currently the only, way to write a worker; changing this would require a huge rewrite of JavaScript engines. Introducing more native synchronous APIs would make worker programming more complicated, and different OSes, architectures, and devices would introduce even more complexity.
Since V8 implemented ES2017 async/await, I can use that with Promise-enabled libraries, and I don't need the synchronous API so badly anymore.
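For instance, a promise-based sleep becomes a one-liner once await is available (a sketch, usable inside a worker's async code):

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function work() {
  await sleep(2000); // pauses this function without blocking the worker's event loop
  postMessage("done");
}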