I want to know what you think about this. Is it recommended to use synchronous requests (XMLHttpRequest) in a web worker? What problems might I run into?
I have been testing this in my app and I haven't found any trouble, but I'm wary of this synchronous behaviour because of old experiences with jQuery and AJAX.
My app gets a large amount of data from several tables in a database, and this takes some time. For each batch of data retrieved from a table, I need to process it right away so as not to delay the whole thing too much. Meanwhile, the user is interacting with the browser, so the UI could get blocked, and I thought web workers would work well here.
Do you think this is a good solution? Or should I try asynchronous requests?
Thank you.
I don't have hard facts, but since you asked for opinions... :)
There is a telling issue in Chrome: too many Web Workers can cause a silent crash (the cap is roughly 60-100, according to this bug report). The general problem is that Web Workers are resource intensive, at least with V8.
Assuming you're going to end up making multiple HTTP calls, if you're doing synchronous HTTP calls in a Web Worker:
In a sense, you're trading asynchronous HTTP calls for asynchronous Web Workers, which only adds another intermediary into the mix; you still have to manage things asynchronously.
If you go the simpler and more resource-efficient route and only use one Web Worker, you'll spend a lot of time waiting for it to give you responses.
If, on the other hand, you use multiple Web Workers you will likely need to keep track of which one is free, which one is busy, etc., in which case you will be creating a home-grown scheduler instead of using what comes baked into the browser.
Finally, Web Workers are expensive (apparently), and you might end up creating multiple Web Workers just so they can sit around and wait for an HTTP call to finish.
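For concreteness, this is roughly the pattern under discussion: a synchronous XHR running inside a worker, so the worker (not the page) blocks while waiting. The URL and message shape are placeholders, not anything from the question:

// worker.js (hypothetical sketch)
self.onmessage = function (e) {
  var xhr = new XMLHttpRequest();
  // the third argument `false` makes the request synchronous;
  // the worker blocks here, but the main thread stays responsive
  xhr.open("GET", e.data.url, false);
  xhr.send(null);
  // hand the raw response text back to the main thread
  self.postMessage(xhr.responseText);
};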
I do not consider myself an expert on the matter, so please take this for what it's worth.
Update: Adding some pros / cons for various scenarios.
Some pros / cons that come to mind when choosing between making synchronous and asynchronous HTTP calls when using a Web Worker:
Generally, synchronous requests are easier to write and result in code that is easy to follow. A downside is that they might encourage writing long functions that should instead be carved out into separate, smaller functions.
If you're making a single call, there is no difference in the time it takes to finish between the two methods, and synchronous is better because it's a bit simpler. I say only a bit simpler because a single async call with one callback listener is really rather simple.
If you're making multiple calls that have to happen in a specific sequence, like loading a user's profile data and then getting the local weather based on their address, synchronous calls will be better because they are easier to write and a lot easier to read. The main point about readability is that the sequential dependencies between the calls are clearly outlined by the choice to make them synchronously and by their order within the function. The more calls there are, the more this matters; with many calls, the difference in complexity is likely to be drastic.
If you have to make multiple calls that do not need to happen in any specific order, then asynchronous requests are better because the overall process will be much faster than with synchronous requests: synchronous calls take roughly the sum of their individual round-trip times, while parallel asynchronous calls finish in roughly the time of the slowest one, so the gap widens with every extra call and with any slowdown in the connection. From the perspective of someone reading the code, using synchronous requests in this situation would also be somewhat misleading, as it suggests a sequential dependency between the calls even though there is none. From the perspective of writing a series of async requests that do not depend on each other, it shouldn't be too bad: you just set up a counter, make all the calls, increment the counter in each callback, and you're finished when the counter equals the number of calls you made (see the sketch below).
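A minimal sketch of that counter pattern, with placeholder endpoints (with promises, Promise.all expresses the same idea more directly):

// hypothetical sketch of the counter pattern described above
var urls = ["/api/a", "/api/b", "/api/c"];     // placeholder endpoints
var completed = 0;
var results = [];

urls.forEach(function (url, i) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);                  // asynchronous
  xhr.onload = function () {
    results[i] = xhr.responseText;
    completed += 1;
    if (completed === urls.length) {
      // every independent call has finished; continue here
      console.log("all done", results);
    }
  };
  xhr.send();
});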
Update: #rbrundritt makes a very interesting and relevant observation in a comment on this answer:
One thing I found working with web workers is that they each appear to get their own HTTP connection limit. Browsers limit the number of concurrent HTTP requests to around 8 or 12, depending on the browser, before throttling, which can be a bottleneck if you have a lot of requests to process. I've found that if I pass my requests to web workers, each one can do 8 to 12 concurrent requests before the throttling kicks in. This can be a huge benefit for some apps.
#rbrundritt
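If you wanted to experiment with that observation, a rough sketch might look like this. The file names and endpoints are hypothetical, and the per-worker connection pool is the comment's claim rather than anything guaranteed by a spec:

// main.js: fan independent requests out across a couple of workers
var urls = ["/api/1", "/api/2", "/api/3", "/api/4"];   // placeholders
var workerCount = 2;
var workers = [];
for (var i = 0; i < workerCount; i++) {
  var w = new Worker("fetch-worker.js");
  w.onmessage = function (e) { console.log(e.data.url, e.data.status); };
  workers.push(w);
}
urls.forEach(function (url, i) {
  workers[i % workerCount].postMessage(url);
});

// fetch-worker.js: each worker performs its own requests
self.onmessage = async function (e) {
  var res = await fetch(e.data);    // fetch() is available inside workers
  self.postMessage({ url: e.data, status: res.status });
};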
I just wanted to add a note (that is too long for a comment) that the limit of ~60 web workers mentioned in #tiffon's answer doesn't appear to exist in Chromium anymore. I can create 500 workers in Chrome like this with no errors/crashes:
let workers = [];
for (let i = 0; i < 500; i++) {
  let worker = new Worker(URL.createObjectURL(new Blob([`console.log('cool${i}')`])));
  worker.postMessage({});
  workers.push(worker);
}
await new Promise(r => setTimeout(r, 15000));
for (let i = 0; i < workers.length; i++) {
  workers[i].terminate();
}
Though, it's hard to imagine a situation where hundreds of workers would be appropriate, given core counts don't currently go higher than around 64.
This topic was on my mind for a long time.
Let's assume we have two typical web servers: one in Node.js and the other in Java (or any other language with threads).
Why would Node perform better (handle more I/O- or network-bound requests per second) than a Java server just because it uses async/await? Isn't that just syntactic sugar that utilizes the same threads Java/C#/C++ use behind the scenes?
There is no reason to expect Node to be faster than a server written in Java. Why do you think it might be?
It seems the other answers here (so far) are explaining the benefits of asynchronous programming in JS compared to single-threaded synchronous operations -- that's obvious, but not the question.
The key point everyone agrees on is that certain operations are inherently slow (e.g.: waiting for network requests, waiting for disk/database access), and it's efficient to let the CPU do something else while such operations are in flight. Using several threads in your application is one well-established way to do that; but of course that's only possible in languages that give you threads. Many traditional server implementations (in Java, C, C++, ...) use one thread per request (or, equivalently, a thread pool to distribute incoming requests over). These threads can block waiting for, say, the database -- that's okay, the operating system will put the waiting thread to sleep and let the CPU work on another thread (handling another request) in the meantime. The end result is fairly similar to what you get with Node.
JavaScript, of course, doesn't make threads available to the programmer. But instead, it has this concept of scheduling requests with the JavaScript engine and providing a callback to be invoked upon completion of the request. That's how the overall system behaves similarly to a traditional threaded programming language: user code can say "do this stuff, then schedule a database access, and when the result is available, continue with this [callback] code here", and while waiting for the database request, the CPU gets to execute some other code. What you want to avoid is the CPU sitting around idly waiting while there is other work waiting for the CPU to have time for it, and both approaches (Java threads and JavaScript callbacks) accomplish that.
Finally, async/await (just like Promises) is indeed just syntactic sugar that makes it easier to write callback-based code. Code using async/await isn't any faster than old-style code using callbacks directly, just prettier and less error-prone. It also isn't any faster than a (well-written) Java-based server.
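As a rough illustration of that equivalence, here is a toy example where the slow I/O is simulated with setTimeout (the query function and data are made up):

// a stand-in for a slow I/O operation, e.g. a database query
function fakeQuery(sql, callback) {
  setTimeout(function () { callback(null, [{ id: 1, name: "Ada" }]); }, 100);
}

// callback style: schedule the I/O, continue in the callback when it completes
function loadUserCallback(done) {
  fakeQuery("SELECT ...", function (err, rows) {
    if (err) return done(err);
    done(null, rows[0]);            // the CPU was free while the "query" ran
  });
}

// async/await style: the same behaviour, just written sequentially
function fakeQueryPromise(sql) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve([{ id: 1, name: "Ada" }]); }, 100);
  });
}
async function loadUserAwait() {
  var rows = await fakeQueryPromise("SELECT ...");
  return rows[0];                   // still non-blocking under the hood
}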
Node.js is popular because it's convenient to use the same language for the client and server parts of an app. From a performance point of view, it's not better than traditional alternatives, it's just also not worse (or at least not much; in practice how efficiently you design your app matters more than whether you implement it in Java or JavaScript). Don't fall for the hype :-)
Asynchrony (async/await) is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource is sometimes slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait.
In an asynchronous process (thread), the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/hh191443(v=vs.110)?redirectedfrom=MSDN#threads Though you should understand the differences between threads and async/await programming.
With asynchronous programming, utilization is the key. A single request can be split into smaller tasks (fetching intermediate results, reading, writing, establishing connections, and so on), and much of its time is spent waiting on those dependent tasks. An asynchronous model uses that waiting time to handle other incoming requests: it registers a callback in a queue, saves the state, and becomes available for the next request. As a result, it can handle more requests.
Understand more on handling requests: https://www.youtube.com/watch?v=cCOL7MC4Pl0
So I was looking at this module, and I cannot understand why it uses callbacks.
I thought that memory caching is supposed to be fast and that is also the purpose someone would use caching, because it's fast... like instant.
Adding callbacks implies that you need to wait for something.
But how long do you actually need to wait? If the result comes back very quickly, aren't you slowing things down by wrapping everything in callbacks, plus promises on top (because as a user of this module you are forced to promisify those callbacks)?
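For what it's worth, the promisifying itself is cheap. A toy sketch in Node (the cache object below is made up to stand in for whatever callback-style module you're using):

const { promisify } = require("util");

// a toy stand-in for a callback-style cache: cache.get(key, (err, value) => ...)
const cache = {
  store: { greeting: "hello" },
  get(key, callback) {
    // callback APIs conventionally defer the callback even when the data is
    // already in memory, so callers always see consistent async ordering
    process.nextTick(() => callback(null, this.store[key]));
  },
};

const getAsync = promisify(cache.get.bind(cache));

getAsync("greeting").then(value => console.log(value)); // "hello", one tick later

The cost of the wrapper is a microtask or a tick, not milliseconds, so the callback/promise layer is not what slows a cache down.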
By design, JavaScript is asynchronous for most of its external calls (HTTP, third-party libraries, ...).
As mentioned here:
JavaScript is a single-threaded language. This means it has one call stack and one memory heap. As expected, it executes code in order and must finish executing a piece of code before moving on to the next. It's synchronous, but at times that can be harmful. For example, if a function takes a while to execute or has to wait on something, it freezes everything up in the meantime.
A synchronous function would block the thread and the execution of the script. To avoid any blocking (due to networking, file access, etc.), it is recommended to fetch this information asynchronously.
Most of the time, the Redis call will take a few milliseconds. Handling it asynchronously, however, guards against possible network lag and keeps your application up and running, with only a small number of connectivity errors surfacing to your customers.
TL;DR: reading from memory is fast; reading from the network is slower and shouldn't block the process.
You're right: reading from a memory cache is very fast, as fast as accessing any variable (microseconds or nanoseconds), so there would be no good reason to put it behind a callback/promise, which would only make it slower. But this is only true if you're reading from the Node.js process's own memory.
The problem with using Redis from Node is that the cached data is stored on another machine (the Redis server), or at least in another process. So even if Redis reads the data very quickly, it still has to travel over the network back to your Node server, and that isn't always guaranteed to be fast (usually at least a few milliseconds). For example, if your Redis server is not physically close to your Node.js server, or there are too many network requests in flight, the round trip can take longer. Now imagine if this were blocking by default: it would prevent your server from doing anything else until the request completed, resulting in very poor performance, with your server sitting idle waiting for the network trip. That's the reason why any I/O (disk, network, ...) operation in Node.js should be async by default.
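As a concrete sketch, assuming the promise-based node-redis v4 API, even a "fast" read is an await point because of that network round trip:

// sketch assuming the node-redis v4 promise API
const { createClient } = require("redis");

async function readThroughCache(key) {
  const client = createClient();        // defaults to localhost:6379
  await client.connect();
  const value = await client.get(key);  // a network round trip, even on localhost
  await client.quit();
  return value;                         // null if the key does not exist
}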
Alex, you remarked: "I thought that memory caching is supposed to be fast and that is also the purpose someone would use caching, because it's fast... like instant." And you're very nearly right.
Now, what does Redis actually mean?
It means REmote DIctionary Server.
~ FAQ - Redis
Yes, a dictionary usually performs in O(1) time. However, that performance is only observed by procedures running inside the process that holds the dictionary. Accessing memory owned by the Redis process from another process goes through a channel of operations that is not O(1).
So, because Redis is a REmote DIctionary Server, asynchronous APIs are required to access its service.
As has already been answered here, your Redis instance could be on your own machine (and accessing Redis's RAM storage is then nearly as fast as accessing a regular JavaScript variable), but it could also be on another machine, cloud, cluster, you name it. In that case, network latency could be a problem, and that's why the promise/callback syntax is there.
If you are 100% confident that your Redis instance will always live on the same machine as your code, and that waiting on those calls is fine, you could just use the ES2017 await syntax to write them as if they were sequential, synchronous steps and avoid explicit callbacks or promise chains :)
But I'm not sure it is worth it, in terms of coding habits and scalability. Every project is different, though, and that might suit yours.
When calling back to the same server, at what point am I better off making one bigger call versus multiple parallel requests?
In my particular case, assume that the server processing time (not including request processing, etc) is linear (e.g. 1 big call asking for 3 bits of data takes the same processing time as 3 smaller calls).
I know that if I have 1000 calls, I am better off batching them so as not to incur all the network overhead. But if I only have 2, I'm assuming parallel requests are probably better.
Is this right?
If so, where is the cutoff?
TL;DR: It depends on a number of factors that are highly dependent on your setup. If performance is a huge concern of yours, run tests, either with a third-party application like Wireshark or with some performance-testing code on the server. In general, though, limit the number of parallel requests to under a handful if possible, by concatenating them.
In general, a few requests in parallel are okay. A modern browser will attempt to run them in parallel as much as possible over its available TCP connections.
That being said, this starts to get bloated because every single request you make to your server using the HTTP/1.x protocol comes with headers, which can be huge, as they contain things like the referrer and browser cookies. The request body might be one character, but the request itself will be much larger.
Furthermore, the scenario changes with HTTP/2 (or even SPDY), the newer transfer protocol. Requests over the wire are treated differently there and don't always carry the extra weight of header metadata that HTTP/1.x requests do. So, if your server and browser support HTTP/2, you might be able to run more requests in parallel.
For the most part, though, you'll be running over HTTP/1.x, which means anything more than a couple of requests in parallel can cause a serious hit to total completion time (in the scenario you described for server processing time) compared with one large load.
There's one other thing to consider, which is application-dependent: when does that data matter? If you batch what would have been a ton of small requests into one larger one, none of the return data comes back until the entire operation is complete server-side. If you need to display data sooner, or you want to load things step by step under slower network conditions, the trade-off may favour multiple small requests.
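To make the trade-off concrete, here is a rough sketch of the two shapes, with made-up endpoints:

// several small parallel requests: partial results can be used as they arrive
const urls = ["/api/profile", "/api/orders", "/api/weather"];  // placeholders
const parallel = Promise.all(urls.map(u => fetch(u).then(r => r.json())));

// one batched request: less HTTP/1.x header overhead, but nothing is usable
// until the whole server-side operation has completed
const batched = fetch("/api/batch?parts=profile,orders,weather").then(r => r.json());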
Hope this explanation helps.
Definitely read up on the FAQ for HTTP/2: it also covers some of the same performance issues you'll run into with HTTP/1.x in the scenario you described.
I'm working with code that handles all AJAX requests using Web Workers (when available). These workers do almost nothing more than handle the XMLHttpRequest object (no extra computation). All requests created by the workers are asynchronous (request.open("get", url, true)).
Recently, a couple of issues were reported against this code and I started to wonder whether I should spend time fixing it or just dump the whole solution.
My research so far suggests that this code may be actually hurting performance. However, I wasn't able to find any credible source supporting this. My only two findings are:
a two-year-old jQuery feature suggestion to use web workers for AJAX calls
this SO question that seems to ask about something a bit different (using synchronous requests in web workers vs AJAX calls)
Can someone point me to a reliable source discussing this issue? Or, are there any benchmarks that may dispel my doubts?
[EDIT] This question gets a little more interesting when the WebWorker is also responsible for parsing the result (JSON.parse). Does asynchronous parsing improve performance?
I have created a proper benchmark for this on jsPerf. Depending on the browser, the WebWorker approach is 85-95% slower than a raw AJAX call.
Notes:
since network response time can differ for each request, I'm testing only new XMLHttpRequest() and JSON.parse(jsonString); no real AJAX calls are being made
WebWorker setup and teardown operations are not being measured
note that I'm testing a single request; results for the WebWorker approach may be better for multiple simultaneous requests
Calvin Metcalf explained to me that comparing sync and async on jsperf won't give accurate results and he created another benchmark that eliminates async overhead. Results still show that WebWorker approach is significantly slower.
From the Reddit discussion I learned that data passed between the main page and a WebWorker is copied and has to be serialized in the process. Therefore, using a WebWorker for parsing alone doesn't make much sense: the data has to be serialized and deserialized anyway before you can use it on the main page.
The first thing to remember is that web workers rarely make things faster in the sense of taking less time; they make things faster in the sense that they offload computation to a background thread, so that processing related to user interaction is not blocked. For instance, once you account for transferring the data, a huge calculation might take 8 seconds instead of 4. But if it were done on the main thread, the entire page would be frozen for 4 seconds, which is likely unacceptable.
With that in mind, moving just the AJAX calls off the main thread won't gain you anything, as AJAX calls are non-blocking. But if you have to parse JSON or, even better, extract a small subset out of a large response, then a web worker can help you out, as sketched below.
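A rough sketch of that idea, with a made-up file name, URL, and payload shape; the point is that the worker parses and reduces the payload so only a small result is copied back to the page:

// parse-worker.js: parse the JSON off the main thread and return only a subset
self.onmessage = function (e) {
  var parsed = JSON.parse(e.data);                 // the heavy part
  var subset = parsed.items ? parsed.items.slice(0, 10) : [];
  self.postMessage(subset);                        // only the small result is copied back
};

// main.js
var worker = new Worker("parse-worker.js");
worker.onmessage = function (e) { console.log("first items:", e.data); };
fetch("/api/large.json")                           // placeholder URL
  .then(function (r) { return r.text(); })         // keep it as text; let the worker parse it
  .then(function (text) { worker.postMessage(text); });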
A caveat I've heard but not confirmed is that workers use a different cache than the main page, so if the same resources are being loaded in both the main thread and the worker, it could cause a large duplication of effort.
You are optimizing your code in the wrong place.
AJAX requests already run in a separate thread and return to the main event loop once they complete (and then call the defined callback function).
Web workers are an interface to threads, meant for computationally expensive operations. Just like in classical desktop applications when you don't want to block the interface with computations that take a long time.
Asynchronous IO is an important concept of Javascript.
First, your request is already asynchronous: the I/O is non-blocking, and while the request is in flight you can run any other JavaScript code. Running the callback's processing in a worker is much more worthwhile than running the request itself there.
Second, JavaScript engines execute all code in the same thread; if you create new threads, you need to handle data communication through the worker message API (see Semaphore).
In conclusion, the asynchronous, single-threaded nature of JavaScript is powerful. Use it as much as possible, and create workers only if you really need them, for example for a long-running JavaScript computation.
From my experience, Web Workers should not be used for the AJAX calls themselves. First of all, those calls are already asynchronous, meaning other code keeps running while you're waiting for the response to come back.
Handling the response, on the other hand, is definitely something you could use a Web Worker for. Some examples:
Parsing the response to build a large model
Computing large amounts of data from the response
Using a Shared Web Worker with a template engine in conjunction with the AJAX response to build the HTML which will then be returned for appending to the DOM.
Edit: another good read would be: Opinion about synchronous requests in web workers
I understand why browser vendors don't want to help me block their UI thread. However, I don't understand why there is:
no sleep(2) in Web Workers
no synchronous WebSockets API
There is a synchronous FileSystem API. There is also a synchronous IndexedDB API. To me, it seems like a contradiction.
The reason there's no sleep() function available to WebWorkers is simple: you don't need it. sleep is a synchronous function (it blocks until it returns), which doesn't make sense in the asynchronous context of WebWorkers.
If you send a message to a WebWorker, it doesn't block waiting for a response; the response is sent as a message to your message handler function. If you want to wait a certain amount of time before sending a response, you wouldn't use sleep, you'd use setTimeout and fire a message off when your function gets called.
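A small sketch of that pattern inside a worker (the one-second delay and message contents are arbitrary):

// worker.js: reply one second after a message arrives, without blocking
self.onmessage = function (e) {
  setTimeout(function () {
    self.postMessage("replying to: " + e.data);
  }, 1000);               // the worker stays responsive during the wait
};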
Similarly, if you're using WebWorkers for WebSocket data transmission, you'd receive a message from the main thread, send a packet via websocket asynchronously, then in the response handler you'd send a message back to the main thread. There's no logical place to use a synchronous sleep function.
As far as why there's not a synchronous mode for WebSockets like there is for the filesystem, the primary difference is that the filesystem isn't accessed across the network.
Generally, asynchronous APIs are preferable for network-based functions, so I guess I don't see it as much of a contradiction.
IDB is only supported by 3 browsers, none of which have implemented the synchronous API, so I don't see it as a shining example of a synchronous API. In fact, I think the real contradiction is that people would define an API and then not bother to implement it.
That is not obvious at all: TCP is a network protocol too, right? And it is quite often used in synchronous mode, because that makes applications simpler to develop and debug.
In my opinion, async mode is the obvious choice in the context of single-threaded applications, when you don't want I/O to block a UI. It is much less obvious if you intend to use web workers, for instance, to handle background I/O. It would indeed be convenient to have synchronous WebSockets in conjunction with web workers.
Finally, it is simply not careful to assume that a file read call will complete, and complete quickly. You should always have a timeout, or accept the fact that your app is going to hang if the I/O doesn't respond.
For me it is quite obvious.
The FileSystem API and IndexedDB API respond on the order of milliseconds, so you can trust that you have your data right away. The WebSockets API, on the other hand, can easily be a hundred times slower: the data must fly over the wild internet, so it's obvious to make it asynchronous. Your response may even never come back.
IndexedDB will not block execution for long; most likely it will return a result in a few milliseconds, and we are not expecting to store millions of records in IndexedDB. The same goes for the File API: most calls will execute quickly.
Also, a synchronous API would invite race conditions and require multi-thread synchronization and so on, which would increase programming complexity. Message-based threading, by contrast, is easier to program and frees us from synchronization issues.
Also, most JavaScript engines are stable, and people are familiar with asynchronous programming; it is the easier and, currently, the only way to write workers. Changing this would require a huge rewrite of the JavaScript engines, and introducing more native APIs would make worker programming more complicated. Different operating systems, architectures, and devices would introduce yet more complexity.
Since V8 has implemented ES2017 async/await, I can use that with Promise-enabled libraries, and I don't need the synchronous API so badly anymore.
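For example, a minimal sketch of a worker using await with fetch (which returns promises and is available in workers); the message shape is made up:

// worker.js: sequential-looking code with no synchronous, blocking API
self.onmessage = async function (e) {
  const response = await fetch(e.data.url);   // URL supplied by the page
  const data = await response.json();
  self.postMessage(data);
};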