In order to make a web app responsive you use asynchronous, non-blocking requests. I can envision two ways of accomplishing this. One is to use deferreds/promises. The other is Web Workers. With Web Workers we end up introducing another process, and we incur the overhead of having to marshal data back and forth. I was looking for some kind of performance metrics to help understand when to choose simple non-blocking callbacks over Web Workers.
Is there some way of deciding which to use without having to prototype both approaches? I see lots of tutorials online about Web Workers, but I don't see many success/failure stories. All I know is I want a responsive app. I'm thinking of using a Web Worker as the interface to an in-memory data structure (essentially a DB) that may be anywhere from 0.5-15MB, which the user can query and update.
As I understand JavaScript processing, it is possible to take a single long-running task and slice it up so that it periodically yields control, allowing other tasks a slice of processing time. Would having to do that be a sign to use Web Workers instead?
Deferreds/promises and Web Workers address different needs:
Deferreds/promises are constructs for assigning a reference to a result that is not yet available, and for organizing code that runs once the result becomes available or a failure is returned.
Web Workers perform actual work asynchronously (using operating system threads, not processes, so they are relatively lightweight).
In other words, JavaScript being single-threaded, you cannot use deferreds/promises to run code asynchronously: once the code that fulfills the promise runs, no other code will run (you may change the order of execution, e.g. using setTimeout(), but that does not make your web app more responsive per se). Still, you might somehow be able to create the illusion of an asynchronous query, e.g. by iterating over an array of values and incrementing the index every few milliseconds (e.g. using setInterval), but that's hardly practical.
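To make that last point concrete, here is a minimal sketch of the setInterval trick; the function name scanInChunks, the chunk size, and the parameters are made up for illustration:

function scanInChunks(items, predicate, onDone) {
    // Fake an "asynchronous" scan by processing a chunk of items per timer
    // tick; the event loop regains control between chunks, so the UI stays
    // responsive, but the overall scan becomes much slower.
    var results = [];
    var i = 0;
    var timer = setInterval(function () {
        var end = Math.min(i + 1000, items.length); // 1000 items per tick
        for (; i < end; i++) {
            if (predicate(items[i])) results.push(items[i]);
        }
        if (i >= items.length) {
            clearInterval(timer); // stop ticking once the scan is complete
            onDone(results);
        }
    }, 0);
}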
In order to perform work such as your query asynchronously, and thus offload that work from your app's UI, you need something that actually works asynchronously. I see several options:
use an IndexedDB, which provides an asynchronous API (see the sketch after this list),
run your own in-memory data structure, and use web workers, as you indicated, to perform the actual query,
use a server-side script engine such as NodeJS to run your code, then use client-side ajax to start the query (plus a promise to process the results),
use a database accessible over HTTP (e.g. Redis, CouchDB), and from the client issue an asynchronous GET (i.e. ajax) to query the DB (plus a promise to process the results),
develop a hybrid web app using e.g. Parse.
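For the first option, here is a minimal sketch of an asynchronous IndexedDB query (the database and store names are made up); every request reports its result via an event instead of blocking:

var openReq = indexedDB.open('videoDB', 1);
openReq.onupgradeneeded = function () {
    // First run: create the object store that will hold the records.
    openReq.result.createObjectStore('videos', { keyPath: 'id' });
};
openReq.onsuccess = function () {
    var db = openReq.result;
    var store = db.transaction('videos', 'readonly').objectStore('videos');
    var query = store.getAll(); // asynchronous; no result is returned here
    query.onsuccess = function () {
        console.log('query results:', query.result);
    };
};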
Which approach is the best in your case? Hard to tell without exact requirements, but here are the dimensions I would look at:
Code complexity — if you already have code for your data structure, Web Workers are probably a good fit; otherwise IndexedDB looks more sensible.
Performance — if you need consistent performance, a server-side implementation or database seems more appropriate.
Architecture/complexity — do you want all processing done client-side, or can you afford the effort (cost) of managing a server-side implementation?
I found this book a useful read.
Related
I learned that Node.js is single-threaded and non-blocking. Here I saw a nice explanation: How, in general, does Node.js handle 10,000 concurrent requests?
But the first answer says:
The seemingly mysterious thing is how both the approaches above manage to run workload in "parallel"? The answer is that the database is threaded. So our single-threaded app is actually leveraging the multi-threaded behaviour of another process: the database.
(1) This gets me confused. Take a simple Express application as an example, where I have:
var express = require('express');
var monk = require('monk');

var router = express.Router();
var db = monk('localhost:27017/databaseName');

router.get('/QueryVideo', function(req, res) {
    var collection = db.get('videos');
    // find() is non-blocking; the callback runs once MongoDB responds.
    collection.find({}, function(err, videos) {
        if (err) throw err;
        res.render('index', { videos: videos });
    });
});
When my router responds to multiple requests by running this simple MongoDB query, are those queries handled by different threads? I know there is only one thread in Node to route client requests, though.
(2) My second question is: how does such a single-threaded Node application ensure security? I don't know much about security, but it looks like concurrent requests should be isolated (at least given different memory spaces?). Even multi-threaded applications can't fully ensure this, because threads still share many things. I know this may not be a reasonable question, but in today's cloud services isolation seems to be an important topic. I am lost in topics such as serverless, Wasm, and Wasm-based serverless using the Node.js environment.
Thank you for your help!!
Since you asked about my answer, I guess I can help clarify.
1
For the specific case of handling multiple parallel client requests which triggers multiple parallel MongoDB queries you asked:
Are those queries handled by different threads?
On node.js, since MongoDB connects via the network stack (TCP/IP), all parallel requests are handled in a single thread. The magic is a system API that allows your program to wait in parallel. Node.js uses libuv to select which API to use, depending on the OS, at compile time. But which API does not matter: it is enough to know that all modern OSes have APIs that allow you to wait on multiple sockets in parallel (instead of the usual waiting on a single socket in multiple threads/processes). These APIs are collectively called asynchronous I/O APIs.
As for MongoDB... I don't know much about MongoDB. Mongo may be implemented with multiple threads or it may be single-threaded like node.js. Disk I/O is itself handled in parallel by the OS without using threads, instead using I/O channels (e.g. PCI lanes) and DMA channels. Basically, both threads/processes and asynchronous I/O are generally implemented by the OS (at least on Linux and Mac) using the same underlying system: OS events. And OS events are just functions that handle interrupts. Anyway, this is straying quite far from the discussion about databases...
I do know that MySQL and PostgreSQL are both multithreaded when processing SQL queries (query processing in SQL basically consists of operations that loop through rows and filter the result; this requires both I/O and CPU, which is why they're multithreaded).
If you are still curious how computers can do things (like wait for I/O) without the CPU executing a single instruction you can check out my answers to the following related questions:
Is there any other way to implement a "listening" function without an infinite while loop?
What is the mechanism that allows the scheduler to switch which threads are executing?
2
Security is ensured by the language being interpreted and by making sure the interpreter does not have any stack overflow or underflow bugs. For the most part this is true for all modern JavaScript engines. The main mechanism for injecting and executing foreign code via program input is a buffer overflow or underflow. Being able to execute foreign code then allows you to access memory; if you cannot execute foreign code, being able to access memory is kind of moot.
There is a second mechanism for injecting foreign code which is prevalent in some programming language cultures: code eval (I'm looking at you, PHP!). Using string substitution to construct database queries in any language opens you up to a code-eval attack (more commonly called SQL injection) regardless of your program's memory model. JavaScript itself has an eval() function. To protect against this, JavaScript programmers simply consider eval evil. Basically, protection against eval comes down to good programming practices, and Node.js being open source allows anyone to look at the code and report any cases where a code-evaluation attack is possible. Historically Node.js has been quite good in this regard, so your main guarantee of security from code eval is Node's reputation.
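To make the SQL-injection point concrete, here is a hedged sketch assuming the npm mysql package (the table, connection settings, and input are made up):

var mysql = require('mysql');
var connection = mysql.createConnection({
    host: 'localhost', user: 'app', database: 'shop'
});

var userInput = '0 OR 1=1'; // attacker-controlled value

// DANGEROUS: string substitution pastes the input straight into the SQL,
// so the WHERE clause now matches every row in the table.
connection.query('SELECT * FROM users WHERE id = ' + userInput, handle);

// SAFE: the driver escapes the value bound to the placeholder.
connection.query('SELECT * FROM users WHERE id = ?', [userInput], handle);

function handle(err, rows) {
    if (err) throw err;
    console.log(rows);
}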
(1) The big picture goes like this: in nodejs there are two types of threads, the event loop (single) and the workers (a pool). As long as you don't block the event loop, nodejs hands a blocking I/O call to a worker thread and goes on to service the next request. The worker places the completed I/O back on the event loop for the next course of action.
In short, the main thread's rule is: "Do something else when you need to wait, come back and continue when the wait is over, and do this one thing at a time."
And this reactive mechanism has nothing to do with threads running in another process (i.e. the database). The database may deploy another type of thread-management scheme.
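A minimal sketch of this division of labour, using a file read that Node hands to libuv's worker pool:

var fs = require('fs');

// The blocking read is handed off to libuv's worker pool; the single
// main thread does not wait for it.
fs.readFile('/etc/hosts', 'utf8', function (err, data) {
    // Runs back on the event loop once the worker completes the I/O.
    if (err) throw err;
    console.log('read finished:', data.length, 'chars');
});

console.log('main thread moved on without waiting');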
(2) The 'memory space' in your question is within the same process space. A thread belongs to a process (i.e. Express app A) and never runs in another process's (i.e. Fastify app B) space.
I've heard that node.js is fast when it performs I/O tasks like querying a database, and that because JavaScript is single-threaded and uses an event loop, it uses far fewer resources compared with spawning a separate thread for every concurrent database query. But doesn't that mean the maximum number of concurrent queries it can make is still limited by how many concurrent connections a particular database can handle? If that's the case, what are the advantages of node.js over something like vert.x, which can use multiple event loops instead of just one?
Since node.js 12 (available since node.js 10 as an experimental feature) you can use worker threads, much like you may already be using worker verticles with vert.x. You also have child processes and cluster mode with node.
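A minimal worker_threads sketch (the CPU-heavy workload is made up; one file acts as both main script and worker):

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
    // Main thread: spawn a worker running this same file.
    const worker = new Worker(__filename);
    worker.on('message', (result) => console.log('worker said:', result));
} else {
    // Worker thread: the CPU-heavy loop runs off the event loop.
    let sum = 0;
    for (let i = 0; i < 1e8; i++) sum += i;
    parentPort.postMessage(sum); // report back to the main thread
}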
Either way (vert.x or node) you need to keep in mind the golden rule, aka "do not block the event loop or the worker pool", and think about the size of your thread pool according to your capacity.
I think it's often better to keep your process single-threaded and to scale it as new, completely isolated instances that can be hosted in multiple places in your infrastructure, or even in the same place (using docker/kubernetes/whatever, or even with systemd for example), and to perform the scaling dynamically according to the increase in requests, your infrastructure's ability to scale, and of course the size of the connection pool your database can handle.
When you design your application as a reactive or event-driven one (the right way to use node or vert.x) and a stateless one, it's easier to handle scaling that way.
So I was looking at this module, and I cannot understand why it uses callbacks.
I thought that memory caching is supposed to be fast, and that is also why someone would use caching: because it's fast... like instant.
Adding callbacks implies that you need to wait for something.
But how long do you actually need to wait? If the result comes back very fast, aren't you slowing things down by wrapping everything in callbacks plus promises on top (because, as a user of this module, you are forced to promisify those callbacks)?
By design, JavaScript is asynchronous for most of its external calls (HTTP, third-party libraries, ...).
As mentioned here:
JavaScript is a single-threaded language. This means it has one call stack and one memory heap. As expected, it executes code in order and must finish executing a piece of code before moving onto the next. It's synchronous, but at times that can be harmful. For example, if a function takes a while to execute or has to wait on something, it freezes everything up in the meanwhile.
A synchronous function will block the thread and the execution of the script. To avoid any blocking (due to networking, file access, etc.), it is recommended to fetch this information asynchronously.
Most of the time, the Redis cache lookup will take a few milliseconds. However, handling it asynchronously shields you from a possible network lag and keeps your application up and running, with only a tiny number of connectivity errors for your customers.
TLDR: reading from memory is fast, reading from the network is slower and shouldn't block the process.
You're right, reading from a memory cache is very fast. It's as fast as accessing any variable (micro- or nanoseconds), so there would be no good reason to implement this as a callback/promise, as that would be significantly slower. But this is only true if you're reading from the nodejs process's own memory.
The problem with using Redis from node is that the memory cache is stored on another machine (the Redis server), or at least in another process. So even if Redis reads the data very quickly, the data still has to go over the network to return to your node server, which isn't always guaranteed to be fast (usually a few milliseconds at least). For example, if you're using a Redis server which is not physically close to your nodejs server, or you have too many network requests, the request can take longer to reach Redis and return to your server. Now imagine if this were blocking by default: it would prevent your server from doing anything else until the request completed, resulting in very poor performance as your server sits idle waiting for the network round trip. That's the reason why any I/O (disk, network, ...) operation in nodejs should be async by default.
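For example, here is a minimal sketch assuming the node-redis v4 client (the key name is made up); the GET is a network round trip, so the client hands back a promise rather than blocking:

const { createClient } = require('redis');

async function readFromCache() {
    const client = createClient(); // defaults to localhost:6379
    await client.connect();
    // The value comes back over the network, so we await it instead of
    // blocking the event loop while the reply travels.
    const value = await client.get('user:42');
    await client.quit();
    return value;
}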
Alex, you remarked: "I thought that memory caching is supposed to be fast, and that is also why someone would use caching: because it's fast... like instant." And you're very nearly right.
Now, what does Redis actually mean?
It means REmote DIctionary Server.
~ FAQ - Redis
Yes, a dictionary usually performs in O(1) time. However, note that this performance is only perceived from the facade of procedures running inside the process holding the dictionary. Access to the memory owned by the Redis process from another process is a channel of operations that is not O(1).
So, because Redis is a REmote DIctionary Server, asynchronous APIs are required to access its service.
As has already been answered here, your Redis instance could be on your machine (and accessing Redis's RAM storage is nearly as fast as accessing a regular JavaScript variable), but it could also be on another machine/cloud/cluster/you name it. In that case, network latency could be problematic; that's why the promise/callback syntax exists.
If you are 100% confident that your Redis instance will always live on the same machine your code runs on, and you accept suspending your handler on each call, you could just use async/await syntax to write the calls in a synchronous-looking style and avoid the callbacks or the promise chains :)
But I'm not sure it is worth it, in terms of coding habits and scalability. Every project is different, though, and it might suit yours.
I'm working with code that handles all AJAX requests using Web Workers (when available). These workers do almost nothing more than XMLHttpRequest handling (no extra computation). All requests created by the workers are asynchronous (request.open("get", url, true)).
Recently, I got a couple of issues regarding this code and started to wonder if I should spend time fixing it or just dump the whole solution.
My research so far suggests that this code may be actually hurting performance. However, I wasn't able to find any credible source supporting this. My only two findings are:
a 2-year-old jQuery feature suggestion to use web workers for AJAX calls
this SO question, which seems to ask about something a bit different (using synchronous requests in web workers vs. AJAX calls)
Can someone point me to a reliable source discussing this issue? Or, are there any benchmarks that may dispel my doubts?
[EDIT] This question gets a little more interesting when the WebWorker is also responsible for parsing the result (JSON.parse). Does asynchronous parsing improve performance?
I have created a proper benchmark for that on jsPerf. Depending on the browser, the WebWorker approach is 85-95% slower than a raw AJAX call.
Notes:
since network response time can differ for each request, I'm testing only new XMLHttpRequest() and JSON.parse(jsonString); there are no real AJAX calls being made.
WebWorker setup and teardown operations are not being measured
note that I'm testing a single request; results for the WebWorker approach may be better for multiple simultaneous requests
Calvin Metcalf explained to me that comparing sync and async on jsperf won't give accurate results and he created another benchmark that eliminates async overhead. Results still show that WebWorker approach is significantly slower.
From the Reddit discussion I learned that data passed between the main page and a WebWorker is copied and has to be serialized in the process. Therefore, using a WebWorker only for parsing doesn't make much sense: the data will have to be serialized and deserialized anyway before you can use it on the main page.
The first thing to remember is that web workers rarely make things faster in the sense of taking less time; they make things faster in the sense that they offload computation to a background thread so that processing related to user interaction is not blocked. For instance, once you take into account transferring the data, a huge calculation might take 8 seconds instead of 4. But if it were done on the main thread, the entire page would be frozen for 4 seconds, which is likely unacceptable.
With that in mind, moving just the AJAX calls off the main thread won't gain you anything, as AJAX calls are non-blocking. But if you have to parse JSON or, even better, extract a small subset out of a large response, then a web worker can help you out.
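For example, here is a minimal sketch (worker.js, hugeJsonString, and the items field are made-up names): the worker does the heavy JSON.parse and posts back only the small subset the page needs:

// worker.js
self.onmessage = function (e) {
    var parsed = JSON.parse(e.data); // heavy parse runs off the main thread
    self.postMessage(parsed.items.slice(0, 10)); // ship back a small subset
};

// main page
var worker = new Worker('worker.js');
worker.onmessage = function (e) {
    console.log('first 10 items:', e.data);
};
worker.postMessage(hugeJsonString); // the string is copied to the worker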
A caveat I've heard but not confirmed is that workers use a different cache than the main page, so if the same resources are being loaded in the main thread and the worker, it could cause a large duplication of effort.
You are optimizing your code in the wrong place.
AJAX requests already run outside the main thread and return to the main event loop once they fulfil (and call the defined callback function).
Web workers are an interface to threads, meant for computationally expensive operations. Just as in classical desktop applications, you don't want to block the interface with computations that take a long time.
Asynchronous I/O is an important concept in JavaScript.
First, your request is already asynchronous: the I/O is non-blocking, and while your request is in flight you can run any other JavaScript code. Executing the callback's work in a worker is much more interesting than executing the request there.
Second, JavaScript engines execute all code in the same thread; if you create new threads, you need to handle data communication through the worker message API (see Semaphore).
In conclusion, the asynchronous and single-threaded nature of JavaScript is powerful; use it as much as possible, and create workers only if you really need to, for example for a long-running JavaScript computation.
From my experience, Web Workers should not be used for the AJAX calls themselves. First of all, AJAX calls are already asynchronous, meaning code will still run while you're waiting for the information to come back.
Now, using a worker to handle the response is definitely something you could use a Web Worker for. Some examples:
Parsing the response to build a large model
Computing large amounts of data from the response
Using a Shared Web Worker with a template engine in conjunction with the AJAX response to build the HTML which will then be returned for appending to the DOM.
Edit: another good read would be: Opinion about synchronous requests in web workers
Knowing that
JavaScript is strictly single-threaded and setTimeout doesn't spawn new threads; instead it follows the event-loop model
Worker threads are a newer HTML5 feature whose support is still not available in all HTML5 browsers
which one should I be using for background data download purposes? If you have any experience or any benchmark data available, please share.
Not sure how it matters to the question, but for the sake of completeness I would like to mention that the data is expected to be in XML format and multiple server-side services will be invoked to get it. Is there a framework already available that caters to data downloading for both handheld-device browsers and desktop browsers?
JavaScript is single-threaded, but that doesn't mean data isn't downloaded in parallel. If you make an asynchronous AJAX call, you are downloading data in the background while the rest of your code is running.
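A minimal sketch (the URL is made up):

var xhr = new XMLHttpRequest();
xhr.open('GET', '/data/feed.xml', true); // true = asynchronous
xhr.onload = function () {
    // Fires later, once the download has finished in the background.
    console.log('downloaded', xhr.responseText.length, 'characters');
};
xhr.send();
console.log('request sent; the rest of the code keeps running');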
Web workers are meant to do CPU-heavy work off the main thread. They won't help you get data any faster.
Worker threads are not supported in all major browsers yet, so I don't think it would be a great idea to involve worker threads in your design unless you only want to support the latest browsers.
All the AJAX calls can be made in parallel, but I guess your problem will be knowing when every AJAX response has come back and the data has been inserted/updated in the browser's database. One way to know that all AJAX responses have been dealt with is to chain the AJAX calls, but this forces you to hard-code the sequence of calls into your code base. If that is not desirable, you need a separate asynchronous flow that keeps checking whether all the AJAX calls have been responded to. You can achieve this by setting a flag for each AJAX call, setting it to true when the response has been handled, and then checking the status of all the flags in another flow (started using setTimeout), as sketched below.
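Here is a minimal sketch of that flag approach (the URLs and the saveToBrowserDb helper are made up; jQuery is assumed for the AJAX calls):

var urls = ['/data/part1', '/data/part2', '/data/part3'];
var done = urls.map(function () { return false; });

urls.forEach(function (url, i) {
    $.get(url, function (data) {
        saveToBrowserDb(data); // hypothetical helper storing the response
        done[i] = true;        // flag this call as handled
    });
});

// Separate asynchronous flow: poll the flags until everything is in.
(function checkFlags() {
    if (done.every(Boolean)) {
        console.log('all responses stored');
    } else {
        setTimeout(checkFlags, 100); // check again shortly
    }
})();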
I am not aware of any existing framework that takes care of such a complex activity, so you will have to write the code for it yourself. Hope this gives you some direction.
Here is a thread where they show how to run a bunch of parallel AJAX calls with jQuery: Parallel asynchronous Ajax requests using jQuery
The only reason I can think of to use a worker thread would be to have it constantly talking to the server, but in that case WebSockets are a better solution: http://www.html5rocks.com/en/tutorials/websockets/basics/
Web Workers are only supported in IE10+, so you will still need a fallback solution if you want to support older versions of IE: http://msdn.microsoft.com/en-us/library/ie/hh673568(v=vs.85).aspx