I want to understand the internal workings of Node.js, so I am intentionally including a computation task (a for loop). But I see it still blocks the main thread.
Here is my script
console.log("start");
for (let i = 0; i < 10; i++) {
  console.log(i);
}
console.log("end")
And the output is:
start
0
1
2
....
9
end
But according to the Node.js architecture, shouldn't high-computation tasks be executed by a different thread picked from the thread pool while the event loop continues executing non-blocking tasks?
I am referencing the Node.js internal architecture described in a linked article.
Can someone please explain the architecture and behavior of the script?
By default, nodejs uses only ONE thread to run your Javascript with. That means that (unless you engage WorkerThreads which are essentially an entirely separate VM), only one piece of Javascript is ever running at once. Nodejs does not "detect" some long running piece of Javascript and move it to another thread. It has no features like that at all. If you have some long running piece of synchronous Javascript, it will block the event loop and block all other Javascript and all other event processing.
Internal to its implementation, nodejs has a thread pool that it uses for certain types of native code (the internal implementations of file I/O and crypto operations). That pool only backs the asynchronous implementations of file I/O and crypto operations - it does not parallelize the running of your Javascript.
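For example, here is a minimal sketch (the iteration count and the 'secret'/'salt' inputs are just placeholders) showing that a thread-pool-backed operation like crypto.pbkdf2 does its native work off the main thread while your Javascript continues:

const crypto = require('crypto');
// the expensive key derivation runs on libuv's thread pool (native code)
crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  console.log('hash ready - the callback runs back on the main JS thread');
});
console.log('main JS thread continues immediately');

The second log prints first; only the native hashing is parallelized, never your own Javascript.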
So, your script you show:
console.log("start");
for (let i = 0; i < 10; i++) {
  console.log(i);
}
console.log("end")
Is entirely synchronous, runs sequentially, and blocks all other Javascript from running while it runs, because it occupies the one thread used for running Javascript.
Nodejs gets its excellent scalability from its asynchronous I/O model that does not have to use a separate thread in order to have lots of asynchronous operations in flight at the same time. But, keep in mind that these asynchronous I/O operations all have native code behind them (some of which may use threads in their native code implementations).
But, if you have long running synchronous Javascript operations (like, say, image analysis written in Javascript), then those typically need to be moved out of the main event loop thread, either by shunting them off to WorkerThreads or to other processes, or to a native code implementation that may use OS threads.
But according to the Node.js architecture, shouldn't high-computation tasks be executed by a different thread picked from the thread pool while the event loop continues executing non-blocking tasks?
No, that is not how nodejs works and is not a correct interpretation of the diagram you show. The thread pool is NOT used for running your Javascript. It is used for internal implementation of some APIs such as file I/O and some crypto operations. It is not used for running your Javascript. There is just one main thread for running your Javascript (unless you specifically run your code in a WorkerThread).
I want to understand the internal workings of Node.js, so I am intentionally including a computation task (a for loop). But I see it still blocks the main thread.
Yes, a for loop (that does not contain an await statement that is awaiting a promise) will completely occupy the single Javascript thread and will block the event loop from processing other events while the for loop is running.
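To illustrate moving a long-running synchronous loop off the main thread, here is a rough single-file sketch using the built-in worker_threads module (the loop and the workerData value are made up for illustration):

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // main thread: spawn a worker and keep the event loop free
  const worker = new Worker(__filename, { workerData: 1e9 });
  worker.on('message', (sum) => console.log('result from worker:', sum));
  console.log('main thread is not blocked while the worker loops');
} else {
  // worker thread: the blocking for loop runs here instead of on the main thread
  let sum = 0;
  for (let i = 0; i < workerData; i++) sum += i;
  parentPort.postMessage(sum);
}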
JS executes its code synchronously. There are a few things that appear "asynchronous", like setInterval or setTimeout for example, but that's not the full story. Asynchronous does not mean things get done in parallel. Take a look at setTimeout: by calling it you add the callback to the task queue; later the event loop grabs it from the queue, puts it onto the stack, and executes it synchronously. If you want to execute something truly in parallel, then you should consider using a worker thread.
There are absolutely no threads in JS (unless you explicitly use worker threads). Javascript uses cooperative multi-tasking which means that a function will always complete before the next one will start. The only other way to yield control back to the scheduler is to separate a task out into another function that is called asynchronously. So in your example, e.g., you could do:
console.log("start");
setTimeout(() => {
  for (let i = 0; i < 10; i++) {
    console.log(i);
  }
}, 0);
console.log("end")
and you would get:
start
end
0
1
..
9
This also answers your question about heavy computations: unless you use the relatively new worker threads, you cannot run heavy computations in node.js "in the background" without the use of native code.
So if you really have heavy loads you have three options:
worker threads,
native code that is multi-threaded, e.g., written in C/C++, or
breaking your computation down into small pieces, each one yielding control back to the scheduler when done (e.g., using map/reduce) - see the sketch below.
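Here is a rough sketch of that third option, assuming the work can be split per item (the array contents and chunk size are arbitrary): process a slice, then hand control back to the event loop with setImmediate before processing the next slice.

const items = Array.from({ length: 1000000 }, (_, i) => i);
let total = 0;

function processChunk(start) {
  const end = Math.min(start + 10000, items.length);
  for (let i = start; i < end; i++) {
    total += items[i];                     // stand-in for the real per-item work
  }
  if (end < items.length) {
    setImmediate(() => processChunk(end)); // yield so queued I/O and timers can run
  } else {
    console.log('done:', total);
  }
}

processChunk(0);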
Related
I just read an article about the event loop in JavaScript.
I found two contradictory phrases and I would be glad if someone could clarify.
A downside of this model is that if a message takes too long to
complete, the web application is unable to process user interactions
like click or scroll. The browser mitigates this with the "a script is
taking too long to run" dialog
A very interesting property of the event loop model is that
JavaScript, unlike a lot of other languages, never blocks. Handling
I/O is typically performed via events and callbacks, so when the
application is waiting for an IndexedDB query to return or an XHR
request to return, it can still process other things like user input
So, when is the first one true and when is the second one true?
"A very interesting property of the event loop model is that
JavaScript, unlike a lot of other languages, never blocks.
This is misleading. Without clever programming, JavaScript would always block the UI thread, because script execution and UI rendering share a single thread by design. At a smooth sixty frames a second, that means your application logic must always cooperatively yield control (or simply complete execution) within about 16 milliseconds, otherwise your UI will freeze or stutter.
Because of this, most JavaScript APIs that might take a long time (eg. network requests) are designed in such a way to use techniques (eg callbacks, promises) to circumvent this problem, so that they do not block the event loop, avoiding the UI becoming unresponsive.
Put another way: host environments (eg a Web browser or a Node.js runtime instance) are specifically designed to enable the use of an event-based programming model (originally inspired by programming environments like Hypercard on the Mac) whereby the host environment can be asked to perform a long-running task (eg run a timer) without blocking the main thread of execution, and your program is notified later, via an "event", when the long-running task is complete, enabling your program to pick up where it left off.
Both are correct, even though I agree it is somewhat wrongly expressed.
So by points:
It's true that if a synchronous task takes too long to complete, the event loop "gets stuck" there and then all other queued tasks can't run till it finishes.
Here it is talking about asynchronous tasks: even if an HTTP request, an I/O request, or any other async operation takes a long time to complete, all the synchronous tasks can keep doing their job, like processing user input.
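A small sketch of both points (the 2-second busy-wait and the '/api/data' URL are made up for illustration):

// point 1: a long synchronous task blocks everything queued behind it
setTimeout(() => console.log('timer callback'), 0);
const start = Date.now();
while (Date.now() - start < 2000) { /* busy-wait: nothing else can run */ }
console.log('loop finished'); // prints first; the timer callback only runs afterwards

// point 2: waiting on an asynchronous request does not block
fetch('/api/data').then(() => console.log('response handled'));
console.log('still free to process clicks and other events while the request is in flight');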
There are two types of code inside Javascript
Synchronous (executed one statement at a time, in order).
Asynchronous (deferred so it can run in the future).
Synchronous code
Say you want to find the prime numbers from 1 to 10000000. With synchronous code you write a function that performs the calculation and finds the prime numbers in the given range. But what happens with synchronous code? The JavaScript engine cannot do any other task until that one is finished.
Asynchronous Code
If you wrap the same code inside a callback, most simply with the setTimeout method, JavaScript puts that function into the event queue and performs other operations in the meantime. When the timer expires, the timeout fires the callback, but only when there is nothing on the call stack does the event loop pass the first thing in the event queue to the stack. So this is more about finding idle time to perform the heavy operation.
Use JavaScript workers to perform heavy mathematical tasks, not
setTimeout, because the callback will still block the engine once it
is on the call stack.
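For instance, here is a hedged sketch of the prime example with a Web Worker (the file name prime-worker.js and the message shape are invented for illustration):

// main.js - the page stays responsive while the worker computes
const worker = new Worker('prime-worker.js');
worker.onmessage = (event) => console.log('primes found:', event.data.length);
worker.postMessage({ from: 1, to: 10000000 });

// prime-worker.js - runs on its own thread
self.onmessage = (event) => {
  const { from, to } = event.data;
  const primes = [];
  for (let n = Math.max(from, 2); n <= to; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) primes.push(n);
  }
  postMessage(primes);
};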
I thought that they were basically the same thing — writing programs that split tasks between processors (on machines that have 2+ processors). Then I'm reading this, which says:
Async methods are intended to be non-blocking operations. An await
expression in an async method doesn’t block the current thread while
the awaited task is running. Instead, the expression signs up the rest
of the method as a continuation and returns control to the caller of
the async method.
The async and await keywords don't cause additional threads to be
created. Async methods don't require multithreading because an async
method doesn't run on its own thread. The method runs on the current
synchronization context and uses time on the thread only when the
method is active. You can use Task.Run to move CPU-bound work to a
background thread, but a background thread doesn't help with a process
that's just waiting for results to become available.
and I'm wondering whether someone can translate that to English for me. It seems to draw a distinction between asynchronicity (is that a word?) and threading and imply that you can have a program that has asynchronous tasks but no multithreading.
Now I understand the idea of asynchronous tasks such as the example on pg. 467 of Jon Skeet's C# In Depth, Third Edition
async void DisplayWebsiteLength ( object sender, EventArgs e )
{
label.Text = "Fetching ...";
using ( HttpClient client = new HttpClient() )
{
Task<string> task = client.GetStringAsync("http://csharpindepth.com");
string text = await task;
label.Text = text.Length.ToString();
}
}
The async keyword means "This function, whenever it is called, will not be called in a context in which its completion is required for everything after its call to be called."
In other words, writing it in the middle of some task
int x = 5;
DisplayWebsiteLength();
double y = Math.Pow((double)x,2000.0);
, since DisplayWebsiteLength() has nothing to do with x or y, will cause DisplayWebsiteLength() to be executed "in the background", like
processor 1 | processor 2
-------------------------------------------------------------------
int x = 5; | DisplayWebsiteLength()
double y = Math.Pow((double)x,2000.0); |
Obviously that's a stupid example, but am I correct or am I totally confused or what?
(Also, I'm confused about why sender and e aren't ever used in the body of the above function.)
Your misunderstanding is extremely common. Many people are taught that multithreading and asynchrony are the same thing, but they are not.
An analogy usually helps. You are cooking in a restaurant. An order comes in for eggs and toast.
Synchronous: you cook the eggs, then you cook the toast.
Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
Now does it make sense that multithreading is only one kind of asynchrony? Threading is about workers; asynchrony is about tasks. In multithreaded workflows you assign tasks to workers. In asynchronous single-threaded workflows you have a graph of tasks where some tasks depend on the results of others; as each task completes it invokes the code that schedules the next task that can run, given the results of the just-completed task. But you (hopefully) only need one worker to perform all the tasks, not one worker per task.
It will help to realize that many tasks are not processor-bound. For processor-bound tasks it makes sense to hire as many workers (threads) as there are processors, assign one task to each worker, assign one processor to each worker, and have each processor do the job of nothing else but computing the result as quickly as possible. But for tasks that are not waiting on a processor, you don't need to assign a worker at all. You just wait for the message to arrive that the result is available and do something else while you're waiting. When that message arrives then you can schedule the continuation of the completed task as the next thing on your to-do list to check off.
So let's look at Jon's example in more detail. What happens?
Someone invokes DisplayWebsiteLength. Who? We don't care.
It sets a label, creates a client, and asks the client to fetch something. The client returns an object representing the task of fetching something. That task is in progress.
Is it in progress on another thread? Probably not. Read Stephen's article on why there is no thread.
Now we await the task. What happens? We check to see if the task has completed between the time we created it and we awaited it. If yes, then we fetch the result and keep running. Let's suppose it has not completed. We sign up the remainder of this method as the continuation of that task and return.
Now control has returned to the caller. What does it do? Whatever it wants.
Now suppose the task completes. How did it do that? Maybe it was running on another thread, or maybe the caller that we just returned to allowed it to run to completion on the current thread. Regardless, we now have a completed task.
The completed task asks the correct thread -- again, likely the only thread -- to run the continuation of the task.
Control passes immediately back into the method we just left at the point of the await. Now there is a result available so we can assign text and run the rest of the method.
It's just like in my analogy. Someone asks you for a document. You send away in the mail for the document, and keep on doing other work. When it arrives in the mail you are signalled, and when you feel like it, you do the rest of the workflow -- open the envelope, pay the delivery fees, whatever. You don't need to hire another worker to do all that for you.
In-browser Javascript is a great example of an asynchronous program that has no multithreading.
You don't have to worry about multiple pieces of code touching the same objects at the same time: each function will finish running before any other javascript is allowed to run on the page. (Update: Since this was written, JavaScript has added async functions and generator functions. These functions do not always run to completion before any other javascript is executed: whenever they reach a yield or await keyword, they yield execution to other javascript, and can continue execution later, similar to C#'s async methods.)
However, when doing something like an AJAX request, no code is running at all, so other javascript can respond to things like click events until that request comes back and invokes the callback associated with it. If one of these other event handlers is still running when the AJAX request gets back, its handler won't be called until they're done. There's only one JavaScript "thread" running, even though it's possible for you to effectively pause the thing you were doing until you have the information you need.
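A minimal sketch of that behaviour (the URL is a placeholder): the async function gives up the thread at await, the last log runs, and the continuation resumes once the response arrives.

async function fetchLength() {
  console.log('before await');
  const response = await fetch('https://example.com'); // yields here
  const text = await response.text();
  console.log('after await:', text.length);
}
fetchLength();
console.log('this line runs while the request is still in flight');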
In C# applications, the same thing happens any time you're dealing with UI elements--you're only allowed to interact with UI elements when you're on the UI thread. If the user clicked a button, and you wanted to respond by reading a large file from the disk, an inexperienced programmer might make the mistake of reading the file within the click event handler itself, which would cause the application to "freeze" until the file finished loading because it's not allowed to respond to any more clicking, hovering, or any other UI-related events until that thread is freed.
One option programmers might use to avoid this problem is to create a new thread to load the file, and then tell that thread's code that when the file is loaded it needs to run the remaining code on the UI thread again so it can update UI elements based on what it found in the file. Until recently, this approach was very popular because it was what the C# libraries and language made easy, but it's fundamentally more complicated than it has to be.
If you think about what the CPU is doing when it reads a file at the level of the hardware and Operating System, it's basically issuing an instruction to read pieces of data from the disk into memory, and to hit the operating system with an "interrupt" when the read is complete. In other words, reading from disk (or any I/O really) is an inherently asynchronous operation. The concept of a thread waiting for that I/O to complete is an abstraction that the library developers created to make it easier to program against. It's not necessary.
Now, most I/O operations in .NET have a corresponding ...Async() method you can invoke, which returns a Task almost immediately. You can add callbacks to this Task to specify code that you want to have run when the asynchronous operation completes. You can also specify which thread you want that code to run on, and you can provide a token which the asynchronous operation can check from time to time to see if you decided to cancel the asynchronous task, giving it the opportunity to stop its work quickly and gracefully.
Until the async/await keywords were added, C# was much more obvious about how callback code gets invoked, because those callbacks were in the form of delegates that you associated with the task. In order to still give you the benefit of using the ...Async() operation, while avoiding complexity in code, async/await abstracts away the creation of those delegates. But they're still there in the compiled code.
So you can have your UI event handler await an I/O operation, freeing up the UI thread to do other things, and more-or-less automatically returning to the UI thread once you've finished reading the file--without ever having to create a new thread.
I went through the link below and understood single-threaded JavaScript and its asynchronous nature a little.
https://www.sohamkamani.com/blog/2016/03/14/wrapping-your-head-around-async-programming/
But I still have a question. JavaScript is single threaded and always moves forward, sequentially, until it finishes its execution.
Whenever we call a function that takes a callback, that callback is executed after the function receives its response, and execution of the rest of the JavaScript continues during the wait for that response. If execution happens in sequence, how is the callback's execution resumed once the response is received? It feels like the thread is moving backwards to execute the callback.
Shouldn't the thread of execution always move forward?
Please clarify.
It's true that JavaScript is (now) specified to have only a single active thread per realm (roughly: a global environment and its contents).¹ But I wouldn't call it "single-threaded;" you can have multiple threads via workers. They do not share a common global environment, which makes it dramatically easier to reason about code and not worry about the values of variables changing out from under you unexpectedly, but they can communicate via messaging and even access shared memory (with all the complications that brings, including the values of shared memory slots changing out from under you unexpectedly).
But running on a single thread and having asynchronous callbacks are not at all in conflict. A JavaScript thread works on the basis of a job queue that jobs get added to. A job is a unit of code that runs to completion (no other code in the realm can run until it does). When that unit of code is done running to completion, the thread picks up the next job from the queue and runs that. One job cannot interrupt another job. Jobs running on the main thread (the UI thread in browsers) cannot be suspended in the middle (mostly²), though jobs on worker threads can be (via Atomics.wait). If a job is suspended, no other job in the realm will run until that job is resumed and completed.
So for instance, consider:
console.log("one");
setTimeout(function() {
console.log("three");
}, 10);
console.log("two");
When you run that, you see
one
two
three
in the console. Here's what happened:
A job for the main script execution was added to the job queue
The main JavaScript thread for the browser picked up that job
It ran the first console.log, setTimeout, and last console.log
The job terminated
The main JavaScript thread idled for a bit
The browser's timer mechanism determined that it was time for that setTimeout callback to run and added a job to the job queue to run it
The main JavaScript thread picked up that job and ran that final console.log
If the main JavaScript thread were tied up (for instance, while (true);), jobs would just pile up in the queue and never get processed, because that job never completes.
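For example, a variation of the snippet above (don't actually run this in a tab you care about):

console.log("one");
setTimeout(function() {
  console.log("this never runs");
}, 10);
while (true); // this job never completes, so the queued timer job is never picked up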
¹ The JavaScript specification was silent on the topic of threading until fairly recently. Browsers and Node.js used a single-active-thread-per-realm model (mostly), but some much less common environments didn't. I vaguely recall an early fork of V8 (the JavaScript engine in Chromium-based browsers and Node.js) that added multiple threading, but it never went anywhere. The Java virtual machine can run JavaScript code via its scripting support, and that code is multi-threaded (or at least it was with the Rhino engine; I have no idea whether Narwhal changes that), but again that's quite niche.
² "A job is a unit of code that runs to completion." and "Jobs running on th emain thread...cannot be suspended in the middle..." Two caveats here:
alert, confirm, and prompt — those 90's synchronous user interactions — suspend a job on the main UI thread while waiting on the user. This is antiquated behavior that's grandfathered in (and is being at least partially phased out).
Naturally, the host process — browser, etc. — can terminate the entire environment a job is running in while the job is running. For instance, when a web page becomes "unresponsive," the browser can kill it. But that's not just the job, it's the entire environment the job was running in.
Just to add to T.J. Crowder's answer above:
The callbacks that need to be executed are kept in a job queue that the event loop processes. Whenever a callback is ready to be executed (for example, after an asynchronous action has finished), it is added to that queue.
As explained by T.J. Crowder, you can imagine the event loop draining that queue. Whenever there is a callback to execute, the loop runs it on the main thread, and nothing else runs while that is happening. This is why JavaScript can be thought of as a single-threaded language.
You can learn more about Event Loops and how they work in this amazing talk by Philip Roberts.
I have read tons of articles and docs, but this topic is still not clear enough to me. A quote from one answer, https://stackoverflow.com/a/46004461/630169:
As long as the code contained inside the async/await is non-blocking
it won't block, for example, db calls, network calls, filesystem calls.
But if the code contained inside async/await is blocking, then it will
block the entire Node.js process, for example, infinite loops, CPU
intensive tasks like image processing, etc.
However, Understanding the node.js event loop says:
Of course, on the backend, there are threads and processes for DB
access and process execution.
In C# it is enough to mark a function async and call it with await, and .NET puts it on another thread. However, it confused me that things are organized differently in Node.js and an async/await function can still block the main thread.
So the question is: how do I write (organize) an arbitrary async/await function in Node.js so that I can be sure it runs asynchronously, in a separate thread or process? Is there a good code example? Some npm module? It would also be good if it is not much trickier than the C# variant. Thanks!
For example, here is a function I would like to make non-blocking - I want to turn a synchronous DB call into an asynchronous (non-blocking) one:
var Database = require('better-sqlite3');
var db = new Database('./my_db.sqlite');
async function DBRequest() {
  // better-sqlite3 is synchronous: prepare() builds a Statement and all() runs
  // the query right here, blocking the thread - the async keyword alone does
  // not move this work off the main thread
  var rows = db.prepare("SELECT * FROM table").all();
  return rows;
};
Note: better-sqlite3 — synchronous module.
Well here's some example code. Decided it was a worthwhile exercise to provide it.
You can write long-running blocking code in such a way that it can yield execution time to other functions
var array = new Array(100);
function processNext() {
  if (array.length === 0) return;
  var item = array.shift();      // gets the first item from the array and removes it
  processItem(item);             // your blocking work (~0.5 seconds); processItem is a placeholder for it
  setTimeout(processNext, 500);  // wait 0.5 seconds and then process the next one
  // during this waiting time, other code will run on your server
}
processNext();
Admittedly I'm a novice and this may be a very bad idea for reasons I don't know about.
You're really at the mercy of the library you're using here - if the code is synchronous and not I/O bound, then there's really not much else you can do within your Node process to make it asynchronous.
Your only real alternative is to move that code into its own process, which consequently makes it I/O bound from your app's point of view; that way your app can wait on it without blocking its own thread.
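As a rough sketch of that approach, assuming a hypothetical ./db-worker.js that runs the synchronous better-sqlite3 query and replies with process.send(rows), you can wrap a forked child process in a Promise so the caller can simply await it:

const { fork } = require('child_process');

function queryInChildProcess(sql) {
  return new Promise((resolve, reject) => {
    const child = fork('./db-worker.js');       // hypothetical worker script
    child.once('message', (rows) => { resolve(rows); child.kill(); });
    child.once('error', reject);
    child.send(sql);                            // the worker runs the blocking query
  });
}

async function main() {
  const rows = await queryInChildProcess('SELECT * FROM my_table'); // placeholder SQL
  console.log(rows);                            // the main thread was never blocked
}
main();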
My interest was not for nothing - people have asked similar questions before me, both on Stack Overflow and elsewhere (and here):
But what's with longish, CPU-bound tasks?
How do you avoid blocking the event loop, when the task at hand is
not I/O bound, and lasts more than a few fractions of a millisecond?
You simply can not, because there's no way ... well, there was not
before threads_a_gogo.
and to solve them they have already created a bunch of modules, and some, such as threads, work both in the browser and in Node.js:
threads
Threads à gogo
nPool
threadpool-js
threadpool
etc.
It would be good if someone could attach async/await to them or provide a good example - that would be nice.
By the way, here someone ran tests on Threads à gogo - the results with threads are 40x faster than with Cluster. So the single-threaded idea of Node.js does not always work well.
I have some tasks I want to do in JS that are resource intensive. For this question, let's assume they are some heavy calculations, rather than system access. Now I want to run tasks A, B and C at the same time, and execute some function D when they are done.
The async library provides a nice scaffolding for this:
async.parallel([A, B, C], D);
If what I am doing is just calculations, then this will still run synchronously (unless the library is putting the tasks on different threads itself, which I expect is not the case). How do I make this actually parallel? What is typically done by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
2022 notice: this answer predates the introduction of worker threads in Node.js
How do I make this be actually parallel?
First, you won't really be running in parallel while in a single node application. A node application runs on a single thread and only one event at a time is processed by node's event loop. Even when running on a multi-core box you won't get parallelism of processing within a node application.
That said, you can get processing parallelism on a multicore machine by forking the code into separate node processes or by spawning child processes. This, in effect, allows you to create multiple instances of node itself and to communicate with those processes in different ways (e.g. stdout, the process fork IPC mechanism). Additionally, you could choose to separate the functions (by responsibility) into their own node app/server and call it via RPC.
What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
It is not starting a new process. Underneath, when async.parallel is used in node.js, it is using process.nextTick(). And nextTick() allows you to avoid blocking the caller by deferring work onto a new stack so you can interleave cpu intensive tasks, etc.
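A tiny illustration of that deferral:

console.log('A');
process.nextTick(() => console.log('C')); // deferred until the current operation completes
console.log('B');
// prints A, B, C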
Long story short
Node doesn't make it easy "out of the box" to achieve multiprocessor concurrency. Node instead gives you a non-blocking design and an event loop that leverages a single thread without sharing memory. Since no two pieces of your JavaScript run at the same time or share data/memory, locks aren't needed - Node is lock free. One node process leverages one thread, and this makes node both safe and powerful.
When you need to split work up among multiple processes then use some sort of message passing to communicate with the other processes / servers. e.g. IPC/RPC.
For more see:
Awesome answer from SO on What is Node.js... with tons of goodness.
Understanding process.nextTick()
Asynchronous and parallel are not the same thing. Asynchronous means that you don't have to wait for synchronization. Parallel means that you can be doing multiple things at the same time. Node.js is only asynchronous, but it's only ever 1 thread. It can only work on 1 thing at once. If you have a long running computation, you should start another process and then just have your node.js process asynchronously wait for results.
To do this you could use child_process.spawn and then read data from stdin.
http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
var spawn = require('child_process').spawn;
var process2 = spawn('sh', ['./computationProgram', 'parameter']);
process2.stderr.on('data', function (data) {
  // handle error output
});
process2.stdout.on('data', function (data) {
  // handle data results
});
Keep in mind I/O is parallelized by Node.js; only your JavaScript callbacks are single threaded.
Assuming you are writing a server, an alternative to adding the complexity of spawning processes or forking is to simply build stateless node servers and run an instance per core, or better yet run many instances each in their own virtualized micro server. Coordinate incoming requests using a reverse proxy or load balancer.
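One common way to run an instance per core on a single machine is Node's built-in cluster module - a minimal sketch (port 3000 and the response text are arbitrary):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // fork one worker per CPU core
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
} else {
  // each worker is an independent single-threaded Node instance
  http.createServer((req, res) => res.end('handled by pid ' + process.pid)).listen(3000);
}

Each worker is a full Node process with its own event loop; the master process distributes incoming connections among them.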
You could also offload computation to another server, maybe MongoDB (using MapReduce) or Hadoop.
To be truly hardcore, you could write a Node plugin in C++ and have fine-grained control of parallelizing the computation code. The speed up from C++ might negate the need of parallelization anyway.
You can always write code to perform computationally intensive tasks in another language best suited for numeric computation, and e.g. expose them through a REST API.
Finally, you could perhaps run the code on the GPU using node-cuda or something similar depending on the type of computation (not all can be optimized for GPU).
Yes, you can fork and spawn other processes, but it seems to me one of the major advantages of node is not having to worry much about parallelization and threading, and therefore bypassing a great amount of complexity altogether.
Depending on your use case you can use something like
task.js Simplified interface for getting CPU intensive code to run on all cores (node.js, and web)
An example would be
function blocking(exampleArgument) {
  // block thread
}
// turn the blocking pure function into a worker task
const blockingAsync = task.wrap(blocking);
// run the task on an autoscaling worker pool
blockingAsync('exampleArgumentValue').then(result => {
  // do something with result
});
I just recently came across parallel.js, but it seems to actually use multiple cores and also has map/reduce-type features.
http://adambom.github.io/parallel.js/