Bull Queue Concurrency Questions

Bull Queue Concurrency Questions - javascript

I need help understanding how Bull Queue (bull.js) processes concurrent jobs.
Suppose I have 10 Node.js instances that each instantiate a Bull Queue connected to the same Redis instance:
const bullQueue = require('bull');
const queue = new bullQueue('taskqueue', {...})
const concurrency = 5;
queue.process('jobTypeA', concurrency, job => {...do something...});
Does this mean that globally across all 10 node instances there will be a maximum of 5 (concurrency) concurrently running jobs of type jobTypeA? Or am I misunderstanding and the concurrency setting is per-Node instance?
What happens if one Node instance specifies a different concurrency value?
Can I be certain that jobs will not be processed by more than one Node instance?

The TL;DR is: under normal conditions, jobs are being processed only once. If things go wrong (say Node.js process crashes), jobs may be double processed.
Quoting from Bull's official README.md:
Important Notes
The queue aims for an "at least once" working strategy. This means that in some situations, a job could be processed more than once. This mostly happens when a worker fails to keep a lock for a given job during the total duration of the processing.
When a worker is processing a job it will keep the job "locked" so other workers can't process it.
It's important to understand how locking works to prevent your jobs from losing their lock - becoming stalled - and being restarted as a result. Locking is implemented internally by creating a lock for lockDuration on interval lockRenewTime (which is usually half lockDuration). If lockDuration elapses before the lock can be renewed, the job will be considered stalled and is automatically restarted; it will be double processed. This can happen when:
The Node process running your job processor unexpectedly terminates.
Your job processor was too CPU-intensive and stalled the Node event loop, and as a result, Bull couldn't renew the job lock (see #488 for how we might better detect this). You can fix this by breaking your job processor into smaller parts so that no single part can block the Node event loop. Alternatively, you can pass a larger value for the lockDuration setting (with the tradeoff being that it will take longer to recognize a real stalled job).
As such, you should always listen for the stalled event and log this to your error monitoring system, as this means your jobs are likely getting double-processed.
As a safeguard so problematic jobs won't get restarted indefinitely (e.g. if the job processor aways crashes its Node process), jobs will be recovered from a stalled state a maximum of maxStalledCount times (default: 1).

Bull is designed for processing jobs concurrently with "at least once" semantics, although if the processors are working correctly, i.e. not stalling or crashing, it is in fact delivering "exactly once". However you can set the maximum stalled retries to 0 (maxStalledCount https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queue) and then the semantics will be "at most once".
Having said that I will try to answer to the 2 questions asked by the poster:
What happens if one Node instance specifies a different concurrency value?
I will assume you mean "queue instance". If so, the concurrency is specified in the processor. If the concurrency is X, what happens is that at most X jobs will be processed concurrently by that given processor.
Can I be certain that jobs will not be processed by more than one Node instance?
Yes, as long as your job does not crash or your max stalled jobs setting is 0.

I spent a bunch of time digging into it as a result of facing a problem with too many processor threads.
The short story is that bull's concurrency is at a queue object level, not a queue level.
If you dig into the code the concurrency setting is invoked at the point in which you call .process on your queue object. This means that even within the same Node application if you create multiple queues and call .process multiple times they will add to the number of concurrent jobs that can be processed.
One contributor posted the following:
Yes, It was a little surprising for me too when I used Bull first
time. Queue options are never persisted in Redis. You can have as many
Queue instances per application as you want, each can have different
settings. The concurrency setting is set when you're registering a
processor, it is in fact specific to each process() function call, not
Queue. If you'd use named processors, you can call process() multiple
times. Each call will register N event loop handlers (with Node's
process.nextTick()), by the amount of concurrency (default is 1).
So the answer to your question is: yes, your processes WILL be processed by multiple node instances if you register process handlers in multiple node instances.

Ah Welcome! This is a meta answer and probably not what you were hoping for but a general process for solving this:
Read the documentation ultra carefully to identify which guarantees your solution aims to provide:
You can specify a concurrency argument. Bull will then call your
handler in parallel respecting this maximum value.
I personally don't really understand this or the guarantees that bull provides. Since it's not super clear:
Dive into source to better understand what is actually happening. I usually just trace the path to understand:
https://github.com/OptimalBits/bull/blob/f05e67724cc2e3845ed929e72fcf7fb6a0f92626/lib/queue.js#L629
https://github.com/OptimalBits/bull/blob/f05e67724cc2e3845ed929e72fcf7fb6a0f92626/lib/queue.js#L651
https://github.com/OptimalBits/bull/blob/f05e67724cc2e3845ed929e72fcf7fb6a0f92626/lib/queue.js#L658
... more this is pretty big :p
If the implementation and guarantees offered are still not clear than create test cases to try and invalidate assumptions it sounds like:
Initialize process for the same queue with 2 different concurrency values
Create a queue and two workers, set a concurrent level of 1, and a callback that logs message process then times out on each worker, enqueue 2 events and observe if both are processed concurrently or if it is limited to 1
IMO the biggest thing is:
Can I be certain that jobs will not be processed by more than one Node
instance?
If exclusive message processing is an invariant and would result in incorrectness for your application, even with great documentation, I would highly recommend to perform due diligence on the library :p

Looking into it more, I think Bull doesn't handle being distributed across multiple Node instances at all, so the behavior is at best undefined.

Related

Why is Non blocking asynchronous single-threaded faster for IO than blocking multi-threaded for some applications

It helps me understand things by using real world comparison, in this case fastfood.
In java, for synchronous blocking I understand that each request processed by a thread can only be completed one at a time. Like ordering through a drive through, so if im tenth in line I have to wait for the 9 cars ahead of me. But, I can open up more threads such that multiple orders are completed simultaneously.
In javascript you can have asynchronous non-blocking but single threaded. As I understand it, multiple requests are made, and those request are immediately accepted, but the request is processed by some background process at some later time before returning. I don't understand how this would be faster. If you order 10 burgers at the same time the 10 requests would be put in immediately but since there is only one cook (single thread) it still takes the same time to create the 10 burgers.
I mean I understand the reasoning, of why non blocking async single thread "should" be faster for somethings, but the more I ask myself questions the less I understand it which makes me not understand it.
I really dont understand how non blocking async single threaded can be faster than sync blocking multithreaded for any type of application including IO.

Non-blocking async single threaded is sometimes faster
That's unlikely. Where are you getting this from?
In multi-threaded synchronous I/O, this is roughly how it works:
The OS and appserver platform (e.g. a JVM) work together to create 10 threads. These are data structures represented in memory, and a scheduler running at the kernel/OS level will use these data structures to tell one of your CPU cores to 'jump to' some point in the code to run the commands it finds there.
The datastructure that represents a thread contains more or less the following items:
What is the location in memory of the instruction we were running
The entire 'stack'. If some function invokes a second function, then we need to remember all local variables and the point we were at in that original method, so that when the second method 'returns', it knows how to do that. e.g. your average java program is probably ~20 methods deep, so that's 20x the local vars, 20 places in code to track. This is all done on stacks. Each thread has one. They tend to be fixed size for the entire app.
What cache page(s) were spun up in the local cache of the core running this code?
The code in the thread is written as follows: All commands to interact with 'resources' (which are orders of magnitude slower than your CPU; think network packets, disk access, etc) are specified to either return the data requested immediately (only possible if everything you asked for is already available and in memory). If that is impossible, because the data you wanted just isn't there yet (let's say the packet carrying the data you want is still on the wire, heading to your network card), there's only one thing to do for the code that powers the 'get me network data' function: Wait until that packet arrives and makes its way into memory.
To not just do nothing at all, the OS/CPU will work together to take that datastructure that represents the thread, freeze it, find another such frozen datastructure, unfreeze it, and jump to the 'where did we leave things' point in the code.
That's a 'thread switch': Core A was running thread 1. Now core A is running thread 2.
The thread switch involves moving a bunch of memory around: All those 'live' cached pages, and that stack, need to be near that core for the CPU to do the job, so that's a CPU loading in a bunch of pages from main memory, which does take some time. Not a lot (nanoseconds), but not zero either. Modern CPUs can only operate on the data loaded in a nearby cachepage (which are ~64k to 1MB in size, no more than that, a thousand+ times less than what your RAM sticks can store).
In single-threaded asynchronous I/O, this is roughly how it works:
There's still a thread of course (all things run in one), but this time the app in question doesn't multithread at all. Instead, it, itself, creates the data structures required to track multiple incoming connections, and, crucially, the primitives used to ask for data work differently. Remember that in the synchronous case, if the code asks for the next bunch of bytes from the network connection then the thread will end up 'freezing' (telling the kernel to find some other work to do) until the data is there. In asynchronous modes, instead the data is returned if available, but if not available, the function 'give me some data!' still returns, but it just says: Sorry bud. I have 0 new bytes for you.
The app itself will then decide to go work on some other connection, and in that way, a single thread can manage a bunch of connections: Is there data for connection #1? Yes, great, I shall process this. No? Oh, okay. Is there data for connection #2? and so on and so forth.
Note that, if data arrives on, say, connection #5, then this one thread, to do the job of handling this incoming data, will presumably need to load, from memory, a bunch of state info, and may need to write it.
For example, let's say you are processing an image, and half of the PNG data arrives on the wire. There's not a lot you can do with it, so this one thread will create a buffer and store half of the PNG inside it. As it then hops to another connection, it needs to load the ~15% of the image it alrady got, and add onto that buffer the 10% of the image that just arrived in a network packet.
This app is also causing a bunch of memory to be moved around into and out of cache pages just the same, so in that sense it's not all that different, and if you want to handle 100k things at once, you're inevitably going to end up having to move stuff into and out of cache pages.
So what is the difference? Can you put it in fry cook terms?
Not really, no. It's all just data structures.
The key difference is in what gets moved into and out of those cache pages.
In the case of async it is exactly what the code you wrote wants to buffer. No more, no less.
In the case of synchronous, it's that 'datastructure representing a thread'.
Take java, for example: That means at the very least the entire stack for that thread. That's, depending on the -Xss parameter, about 128k worth of data. So, if you have 100k connections to be handled simultaneously, that's 12.8GB of RAM just for those stacks!
If those incoming images really are all only about 4k in size, you could have done it with 4k buffers, for only 0.4GB of memory needed at most, if you handrolled that by going async.
That is where the gain lies for async: By handrolling your buffers, you can't avoid moving memory into and out of cache pages, but you can ensure it's smaller chunks. and that will be faster.
Of course, to really make it faster, the buffer for storing state in the async model needs to be small (not much point to this if you need to save 128k into memory before you can operate on it, that's how large those stacks were already), and you need to handle so many things at once (10k+ simultaneous).
There's a reason we don't write all code in assembler or why memory managed languages are popular: Handrolling such concerns is tedious and error-prone. You shouldn't do it unless the benefits are clear.
That's why synchronous is usually the better option, and in practice, often actually faster (those OS thread schedulers are written by expert coders and tweaked extremely well. You don't stand a chance to replicate their work) - that whole 'by handrolling my buffers I can reduce the # of bytes that need to be moved around a ton!' thing needs to outweigh the losses.
In addition, async is complicated as a programming model.
In async mode, you can never block. Wanna do a quick DB query? That could block, so you can't do that, you have to write your code as: Okay, fire off this job, and here's some code to run when it gets back. You can't 'wait for an answer', because in async land, waiting is not allowed.
In async mode, anytime you ask for data, you need to be capable of dealing with getting half of what you wanted. In synchronized mode, if you ask for 4k, you get 4k. The fact that your thread may freeze during this task until the 4k is available is not something you need to worry about, you write your code as if it just arrives as you ask for it, complete.
Bbbuutt... fry cooks!
Look, CPU design just isn't simple enough to put in terms of a restaurant like this.

You are mentally moving the bottleneck from your process (the burger orderer) to the other process (the burger maker).
This will not make your application faster.
When considering the single-threaded async model, the real benefit is that your process is not blocked while waiting for the other process.
In other words, do not associate async with the word fast but with the word free. Free to do other work.

RxJS share memory across node.js instances using Redis

We're working on a project where we are creating an event processor using RxJS. We have a few 'rules', so to speak, where input is provided from a few different source and output has to be generated based on the number of times an input is above a set value (simple rule).
Now, all this works without any problems, but we want to move the project from beta to production. This means running multiple instances of Node.JS with RxJS on top of it.
We're wondering if it's possible for RxJS to share its memory using Redis for example. This way when one of the instances dies for whatever reason, another one can pick up where the dead one stopped. Ensuring that the amount of times the value was above the set value is retained.
This would also allow us to spread the load over multiple instances if the 'rules' get more complex and the amount of data increases.
Is something like this possible with RxJS, or should we build our own administration around it?

You can't share memory between node.js processes, as far as I know. Doing so would be super-unsafe, since then you're dealing with concurrency problems that can't be mitigated with javascript (what happens when one process interrupts another?)
That said, you can pass messages back and forth with redis. Generally, what I do is establish a work queue as a redis queue. Some servers push into the work queue. Some workers will pull from the queue and process data and deal with the results as needed.
A concrete example is generating outbound emails in response to some REST event (new post or whatever). The webapp does a LPUSH to a known queue. The worker process can use BRPOPLPUSH to atomically pull an entry from the work queue and push it into an "in process" queue. Once the mail is sent, it can be removed from the in process queue. If there's a server crash or long timeout, entries in the in process queue can be pushed back into the work queue and re-tried.
But you're not going to get a fancy shared-memory solution here, I don't think at least.

What would happen if a variable were manipulated more than once at the exact same time? Is it possible? [duplicate]

Lets assume I run this piece of code.
var score = 0;
for (var i = 0; i < arbitrary_length; i++) {
async_task(i, function() { score++; }); // increment callback function
}
In theory I understand that this presents a data race and two threads trying to increment at the same time may result in a single increment, however, nodejs(and javascript) are known to be single threaded. Am I guaranteed that the final value of score will be equal to arbitrary_length?

Am I guaranteed that the final value of score will be equal to
arbitrary_length?
Yes, as long as all async_task() calls call the callback once and only once, you are guaranteed that the final value of score will be equal to arbitrary_length.
It is the single-threaded nature of Javascript that guarantees that there are never two pieces of Javascript running at the exact same time. Instead, because of the event driven nature of Javascript in both browsers and node.js, one piece of JS runs to completion, then the next event is pulled from the event queue and that triggers a callback which will also run to completion.
There is no such thing as interrupt driven Javascript (where some callback might interrupt some other piece of Javascript that is currently running). Everything is serialized through the event queue. This is an enormous simplification and prevents a lot of stickly situations that would otherwise be a lot of work to program safely when you have either multiple threads running concurrently or interrupt driven code.
There still are some concurrency issues to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access. While only one will ever be accessing it at any given time, it is still possible that a piece of code that contains several asynchronous operations could leave some state in an "in between" state while it was in the middle of several async operations at a point where some other async operation could run and could attempt to access that data.
You can read more about the event driven nature of Javascript here: How does JavaScript handle AJAX responses in the background? and that answer also contains a number of other references.
And another similar answer that discusses the kind of shared data race conditions that are possible: Can this code cause a race condition in socket io?
Some other references:
how do I prevent event handlers to handle multiple events at once in javascript?
Do I need to be concerned with race conditions with asynchronous Javascript?
JavaScript - When exactly does the call stack become "empty"?
Node.js server with multiple concurrent requests, how does it work?
To give you an idea of the concurrency issues that can happen in Javascript (even without threads and without interrupts, here's an example from my own code.
I have a Raspberry Pi node.js server that controls the attic fans in my house. Every 10 seconds it checks two temperature probes, one inside the attic and one outside the house and decides how it should control the fans (via relays). It also records temperature data that can be presented in charts. Once an hour, it saves the latest temperature data that was collected in memory to some files for persistence in case of power outage or server crash. That saving operation involves a series of async file writes. Each one of those async writes yields control back to the system and then continues when the async callback is called signaling completion. Because this is a low memory system and the data can potentially occupy a significant portion of the available RAM, the data is not copied in memory before writing (that's simply not practical). So, I'm writing the live in-memory data to disk.
At any time during any of these async file I/O operations, while waiting for a callback to signify completion of the many file writes involved, one of my timers in the server could fire, I'd collect a new set of temperature data and that would attempt to modify the in-memory data set that I'm in the middle of writing. That's a concurrency issue waiting to happen. If it changes the data while I've written part of it and am waiting for that write to finish before writing the rest, then the data that gets written can easily end up corrupted because I will have written out one part of the data, the data will have gotten modified from underneath me and then I will attempt to write out more data without realizing it's been changed. That's a concurrency issue.
I actually have a console.log() statement that explicitly logs when this concurrency issue occurs on my server (and is handled safely by my code). It happens once every few days on my server. I know it's there and it's real.
There are many ways to work around those types of concurrency issues. The simplest would have been to just make a copy in memory of all the data and then write out the copy. Because there are not threads or interrupts, making a copy in memory would be safe from concurrency (there would be no yielding to async operations in the middle of the copy to create a concurrency issue). But, that wasn't practical in this case. So, I implemented a queue. Whenever I start writing, I set a flag on the object that manages the data. Then, anytime the system wants to add or modify data in the stored data while that flag is set, those changes just go into a queue. The actual data is not touched while that flag is set. When the data has been safely written to disk, the flag is reset and the queued items are processed. Any concurrency issue was safely avoided.
So, this is an example of concurrency issues that you do have to be concerned about. One great simplifying assumption with Javascript is that a piece of Javascript will run to completion without any thread of getting interrupted as long as it doesn't purposely return control back to the system. That makes handling concurrency issues like described above lots, lots easier because your code will never be interrupted except when you consciously yield control back to the system. This is why we don't need mutexes and semaphores and other things like that in our own Javascript. We can use simple flags (just a regular Javascript variable) like I described above if needed.
In any entirely synchronous piece of Javascript, you will never be interrupted by other Javascript. A synchronous piece of Javascript will run to completion before the next event in the event queue is processed. This is what is meant by Javascript being an "event-driven" language. As an example of this, if you had this code:
console.log("A");
// schedule timer for 500 ms from now
setTimeout(function() {
console.log("B");
}, 500);
console.log("C");
// spin for 1000ms
var start = Date.now();
while(Data.now() - start < 1000) {}
console.log("D");
You would get the following in the console:
A
C
D
B
The timer event cannot be processed until the current piece of Javascript runs to completion, even though it was likely added to the event queue sooner than that. The way the JS interpreter works is that it runs the current JS until it returns control back to the system and then (and only then), it fetches the next event from the event queue and calls the callback associated with that event.
Here's the sequence of events under the covers.
This JS starts running.
console.log("A") is output.
A timer event is schedule for 500ms from now. The timer subsystem uses native code.
console.log("C") is output.
The code enters the spin loop.
At some point in time part-way through the spin loop the previously set timer is ready to fire. It is up to the interpreter implementation to decide exactly how this works, but the end result is that a timer event is inserted into the Javascript event queue.
The spin loop finishes.
console.log("D") is output.
This piece of Javascript finishes and returns control back to the system.
The Javascript interpreter sees that the current piece of Javascript is done so it checks the event queue to see if there are any pending events waiting to run. It finds the timer event and a callback associated with that event and calls that callback (starting a new block of JS execution). That code starts running and console.log("B") is output.
That setTimeout() callback finishes execution and the interpreter again checks the event queue to see if there are any other events that are ready to run.

Node uses an event loop. You can think of this as a queue. So we can assume, that your for loop puts the function() { score++; } callback arbitrary_length times on this queue. After that the js engine runs these one by one and increase score each time. So yes. The only exception if a callback is not called or the score variable is accessed from somewhere else.
Actually you can use this pattern to do tasks parallel, collect the results and call a single callback when every task is done.
var results = [];
for (var i = 0; i < arbitrary_length; i++) {
async_task(i, function(result) {
results.push(result);
if (results.length == arbitrary_length)
tasksDone(results);
});
}

No two invocations of the function can happen at the same time (b/c node is single threaded) so that will not be a problem. The only problem would be ifin some cases async_task(..) drops the callback. But if, e.g., 'async_task(..)' was just calling setTimeout(..) with the given function, then yes, each call will execute, they will never collide with each other, and 'score' will have the value expected, 'arbitrary_length', at the end.
Of course, the 'arbitrary_length' can't be so great as to exhaust memory, or overflow whatever collection is holding these callbacks. There is no threading issue however.

I do think it’s worth noting for others that view this, you have a common mistake in your code. For the variable i you either need to use let or reassign to another variable before passing it into the async_task(). The current implementation will result in each function getting the last value of i.

How many events can Node.js queue?

From what I see, if an event in Node take a "long time" to be dispatched, Node creates some kind of "queue of events", and they are triggered as soon as possible, one by one.
How long can this queue be?

While this may seem like a simple question, it is actually a rather complex problem; unfortunately, there's no simple number that anyone can give you.
First: wall time doesn't really play a part in anything here. All events are dispatched in the same fashion, whether or not things are taking "a long time." In other words, all events pass through a "queue."
Second: there is no single queue. There are many places where different kinds of events can be dispatched into JS. (The following assumes you know what a tick is.)
There are the things you (or the libraries you use) pass to process.nextTick(). They are called at the end of the current tick until the nextTick queue is empty.
There are the things you (or the libraries you use) pass to setImmediate(). They are called at the start of the next tick. (This means that nextTick tasks can add things to the current tick indefinitely, preventing other operations from happening whereas setImmediate tasks can only add things to the queue for the next tick.)
I/O events are handled by libuv via epoll/kqueue/IOCP on Linux/Mac/Windows respectively. When the OS notifies libuv that I/O has happened, it in turn invokes the appropriate handler in JS. A given tick of the event loop may process zero or more I/O events; if a tick takes a long time, I/O events will queue in an operating system queue.
Signals sent by the OS.
Native code (C/C++) executed on a separate thread may invoke JS functions. This is usually accomplished through the libuv work queue.
Since there are many places where work may be queued, it is not easy to answer "how many items are currently queued", much less what the absolute limit of those queues are. Essentially, the hard limit for the size of your task queues is available RAM.
In practice, your app will:
Hit V8 heap constraints
For I/O, max out the number of allowable open file descriptors.
...well before the size of any queue becomes problematic.
If you're just interested in whether or not your app under heavy load, toobusy may be of interest -- it times each tick of the event loop to determine whether or not your app is spending an unusual amount of time processing each tick (which may indicate that your task queues are very large).

Handlers for a specific event are called synchronously (in the order they were added) as soon as the event is emitted, they are not delayed at all.
The total number of event handlers is limited only by v8 and/or the amount of available RAM.

I believe you're talking about operations that can take an undefined amount of time to complete, such as an http request or filesystem access.
Node gives you a method to complete these types of operations asynchronously, meaning that you can tell node, or a 3rd party library, to start an operation, and then call some code (a function that you define) to inform you when the operation is complete. This can be done through event listeners, or callback functions, both of which have their own limitations.
With event listeners the maximum amount of listeners you can have is dependent on the maximum array size of your environment. In the case of node.js the javascript engine is v8, but according to this post there is a maximum set out by the 5th ECMA standard of ~4billion elements, which is a limit that you shouldn't ever overcome.
With callbacks the limitation you have is the max call stack size, meaning how deep your functions can call each other. For instance you can have a callback calling a callback calling a callback calling another callback, etc etc. The call stack size dictates how may callbacks calling callbacks you can have. Note that the call stack size can be a limitation with event listeners as well as they're essentially callbacks that can be executed multiple times.
And these are the limitations with each.

What's an event-loop and how is it different than using other models?

I have been looking into Node.JS and all the documentation and blogs talk about how it uses an event-loop rather than a per-request model.
I am having some confusion understanding the difference. I feel like I am 80% there understanding it but not fully getting it yet.

A threaded model will spawn a new thread for every request. This means that you get quite some overhead in terms of computation and memory. An event loop runs in a single thread, which means you don't get the overhead.
The result of this is that you must change your programming model. Because all these different things are happening in the same thread, you cannot block. This means you cannot wait for something to happen because that would block the whole thread. Instead you define a callback that is called once the action is complete. This is usually referred to as non-blocking I/O.
Pseudo example for blocking I/O:
row = db_query('SELECT * FROM some_table');
print(row);
Pseudo example for non-blocking I/O:
db_query('SELECT * FROM some_table', function (row) {
print(row);
});
This example uses lambdas (anonymous functions) like they are used in JavaScript all the time. JS makes heavy use of events, and that's exactly what callbacks are about. Once the action is complete, an event is fired which triggers the callback. This is why it is often referred to as an evented model or also asynchronous model.
The implementation of this model uses a loop that processes and fires these events. That's why it is called an event queue or event loop.
Prominent examples of event queue frameworks include:
EventMachine (Ruby)
Tornado (Python)
node.js (V8 server-side JavaScript)

Think of incoming requests or callbacks as events, that are enqueued and processed.
That is exactly the same what is done in most of the GUI systems. The system can't know when a user will click a button or do some interaction. But when he does, the event will propagated to the event loop, which is basically a loop that checks for new events in the queue and process them.
The advantage is, that you don't have to wait for results for yourself. Instead, you register callback functions that are executed when the event is triggered. This allows the framework to handle I/O stuff and you can easily rely on it's internal efficiency when dealing with long-taking actions instead of blocking processes by yourself.
In short, everythings runs in parallel but your code. There will never be two fragments of callback functions running concurrently – the event loop is a single thread. The processes that execute stuff externally and finally propagate events however can be distributed in multiple threads/processes.

An evented loop allows you to handle the time it takes to talk to the hard drive or network. take this list of time:
Source | CPU Cycles
L1 | 3 Cycles
L2 | 14 Cycles
RAM | 250 Cycles
Disk | 41,000,000 Cycles
Network| 240,000,000 Cycles
That time you're running curl in PHP is just wasting CPU.

We Keep Coding

JavaScript is the programming language of the Web.