I know that CPU-intensive work in the main process will block the UI process. I have another question: does a long-running I/O operation in the main process also block the UI?
Recently, I have been using Electron to develop a desktop file-management application.
Step 1:
My UI process uses asynchronous IPC (provided by Electron) to tell the main process to fetch the file list from the network (it fetches only file metadata, not file contents).
Step 2:
The main process fetches the file list from the network, stores it in SQLite (I use TypeORM), then selects part of the list from SQLite and sends it back to the UI process.
Sometimes step 2 takes tens of seconds (for example, when I fetch 10,000 file entries from the network), and my UI slows down.
So, I have two questions:
+ Does a long-running I/O operation in the main process block the UI?
+ What's the best way to do I/O (database or local file) in an Electron application?
Potentially, I/O can block your application. Node offers blocking and non-blocking I/O operations. You'll want to use the non-blocking variants.
The Node docs have a section on blocking vs non-blocking I/O. Two code samples from that page, one blocking, one non-blocking:
// Blocking:
const fs = require('fs');
const data = fs.readFileSync('/file.md'); // blocks here until the file is read

// Non-blocking:
const fs = require('fs');
fs.readFile('/file.md', (err, data) => {
  if (err) throw err;
  // use data here
});
The second question ("what's the best way?") is opinionated and off-topic so I'll focus on the first:
Does long time IO operation in main process block the UI ?
No, it does not. I/O in Electron happens either on the Chromium side or the Node.js side; in both cases JavaScript's I/O execution model uses an event loop. The operation is queued and then performed either on a background thread pool (DNS queries, for example) or using the operating system's native non-blocking I/O facilities (socket writes, for example).
The one caveat is that browsers do offer some (older) APIs that are blocking (like a synchronous XMLHttpRequest), but you are likely not using those.
For more details see our event loop and timers tutorial.
Related
I just started getting into child_process, and all I know is that it's good for delegating blocking functions (e.g. looping over a huge array) to child processes.
I use TypeORM to communicate with the MySQL database. I was wondering if there's a benefit to moving some of the asynchronous database work to child processes. I read in another thread (unfortunately I couldn't find it in my browser history) that there's no good reason to delegate async functions to child processes. Is that true?
example code:
child.js
import {createConnection} from "./dbConnection";
import {SomeTable} from "./entity/SomeTable";

process.on('message', (m) => {
  createConnection().then(async (connection) => {
    let repository = connection.getRepository(SomeTable);
    let results = await repository
      .createQueryBuilder("t")
      .orderBy("t.postId", "DESC")
      .getMany();
    process.send(results);
  });
});
main.js
const cp = require('child_process');
const child = cp.fork('./child.js');
child.send('Please fetch some data');
child.on('message', (m) => {
  console.log(m);
});
The big gain of JavaScript is its asynchronous nature...
When you call an asynchronous function, the code continues to execute without waiting for the answer; only when the operation is done does its callback run with the result.
Your database call is already asynchronous, so you would be spawning another Node process for nothing. Since the database itself takes all the load, having more Node.js processes wouldn't help.
Take the same example but with a file write. What could make the write to the disk faster? Nothing much, really... But do we care? No, because our Node.js is not blocked and keeps answering requests and handling tasks. The only thing you might want to check is not to send a thousand file writes at the same time; if they are big there would be a negative impact on the file system, but since a write is not CPU intensive, Node will run just fine.
Child processes really are a great tool, but it is rare to need them. I too wanted to use some when I heard about them, but the thing is that you will most likely not need them at all... The only time I decided to use one was to create a CPU-intensive worker. It made sure to spawn one child process per core (since Node is single-threaded) and respawn any faulty ones.
Despite being single-threaded, how is Node.js faster?
I haven't run any tests to gather statistics, but while digging through the Node.js forums, I find everyone saying it's faster and more lightweight.
But no matter how lightweight it is, how can a single-threaded server be faster than multi-threaded ones?
First, why is a program faster when multi-threaded ?
It's partly because a multi-threaded program can run on multiple cores, but the main reason, by far, is that when a thread is waiting for some IO operation (which is very often the case, especially in a server), the other threads can still make progress.
Now, what about node ?
Node isn't single threaded. The user script in JS is executed in one thread but all IO operations are natively handled by libuv and the OS which are multi-threaded.
More explanation here.
In practice, this means that several requests are handled in parallel. Here's a very (very) simplified example of a possible sequence of actions:
user script | node + OS "threads" (libuv)
-------------------------------------------------------------
receive and analyze request 1 |
ask node for file 1 | fetching file 1
receive and analyze request 2 | fetching file 1
ask node for file 2 | fetching file 1, fetching file 2
prepare response header 1 | fetching file 2
tell node to send file 1 | send file 1, fetching file 2
prepare response header 2 | send file 1
tell node to send file 2 | send file 1, send file 2
The whole architecture of node (and io.js) makes it simple to have a high level of parallelism. The user thread is only called by the event loop for very short tasks which stop at the next IO operation (well, not really only IO, but most often) when your code gives to node a callback that will be called when the operation finished.
Of course this only works when you're using the asynchronous functions of Node. Any time you use a function ending in "Sync" like writeFileSync, you're defeating the concurrency.
Node.js is not single-threaded: see https://nodejs.org/about/ :
many connections can be handled concurrently
In fact, it does not expose system threads to your JavaScript: the V8 engine runs your code on one thread, while the libuv library handles I/O through asynchronous callbacks (backed, where needed, by an internal thread pool).
Also, you can use additional child processes through child_process.fork.
Finally, none of this by itself determines the speed of a response or the overall speed of the engine. Multi-threading is there for scalability.
Because Node.js won't wait for the response. Instead it follows event-driven programming with callbacks: once a request is submitted, it is pushed onto an event queue. A single thread handles each request, but that thread only submits the request and moves on to the next one; it never waits for the response. Once a request has been processed, its callback function is executed.
Node.js uses an event-driven model in which only one thread executes the events. I understand the first event executed will be the user's JS code. A simple web-server example from the Node.js website is below:
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');
The first event executed will perform the above steps. Later on, the event loop will wait for events to be enqueued. My question is: which thread enqueues events? Is there a separate thread that does that? If yes, can multiple threads enqueue events as well?
Thank you.
My question is: which thread enqueues events? Is there a separate thread that does that? If yes, can multiple threads enqueue events as well?
At its core, the JavaScript interpreter is a loop around the select() system call. The select() system call informs the OS about the file handles your program is interested in, and the OS will block execution of your program until there are events on any of those file handles. Basically, select() returns when there's data on some I/O channel your program is interested in.
In pseudocode, it looks something like this:
while (1) {
  select(files_to_write, files_to_read, NULL, NULL, timeout)
  process_writable(files_to_write)
  process_readable(files_to_read)
  timeout = process_timers()
}
This single function allows the interpreter to implement both asynchronous I/O and setTimeout/setInterval. (Technically, this is only partly true. Node.js uses either select or poll or epoll etc. based on what functions are available and what functions are more efficient on your OS)
So, who enqueues events? The OS.
There is one exception. Disk I/O in Node.js is handled by a pool of separate threads, so for this kind of I/O it is a disk I/O thread that enqueues events to the event loop.
It could be implemented without threads. For example, the Tcl programming language (which predates javascript but also has built-in event loop) implements disk I/O in the main thread using features such as kqueue (on BSD based OS like MacOS) or aio (on Linux and several other OSes) or overlapped-i/o (Windows). But the node.js developers simply chose threading to handle disk i/o.
For more how this works at the C level see this answer: I know that callback function runs asynchronously, but why?
I have some tasks I want to do in JS that are resource intensive. For this question, let's assume they are some heavy calculations rather than system access. Now I want to run tasks A, B and C at the same time, and execute some function D when they are done.
The async library provides a nice scaffolding for this:
async.parallel([A, B, C], D);
If what I am doing is just calculations, then this will still run synchronously (unless the library is putting the tasks on different threads itself, which I expect is not the case). How do I make this be actually parallel? What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
2022 notice: this answer predates the introduction of worker threads in Node.js
How do I make this be actually parallel?
First, you won't really be running in parallel while in a single node application. A node application runs on a single thread and only one event at a time is processed by node's event loop. Even when running on a multi-core box you won't get parallelism of processing within a node application.
That said, you can get processing parallelism on a multicore machine by forking the code into separate Node processes or by spawning child processes. This, in effect, allows you to create multiple instances of Node itself and to communicate with those processes in different ways (e.g. stdout, the process-fork IPC mechanism). Additionally, you could choose to separate the functions (by responsibility) into their own Node app/server and call them via RPC.
What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
It is not starting a new process. Underneath, when async.parallel is used in node.js, it is using process.nextTick(). And nextTick() allows you to avoid blocking the caller by deferring work onto a new stack so you can interleave cpu intensive tasks, etc.
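A tiny sketch of that deferral (`deferredWork` is a made-up name for illustration): the caller's code after the call finishes before the deferred callback runs, so the caller is never blocked.

```javascript
// process.nextTick defers the callback until the current call stack
// unwinds, so calling deferredWork never blocks the caller.
function deferredWork(done) {
  process.nextTick(() => done('worked'));
}

console.log('before');
deferredWork((msg) => console.log('deferred:', msg));
console.log('after'); // prints before "deferred: worked"
```

This interleaving is what lets a library like async schedule many tasks without stalling the event loop between them.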
Long story short
Node doesn't make it easy "out of the box" to achieve multiprocessor concurrency. Node instead gives you a non-blocking design and an event loop running on a single thread with no shared memory. Because nothing shares memory, locks aren't needed: Node is lock-free. One Node process uses one thread, and this makes Node both safe and powerful.
When you need to split work up among multiple processes then use some sort of message passing to communicate with the other processes / servers. e.g. IPC/RPC.
For more see:
Awesome answer from SO on What is Node.js... with tons of goodness.
Understanding process.nextTick()
Asynchronous and parallel are not the same thing. Asynchronous means that you don't have to wait for synchronization. Parallel means that you can be doing multiple things at the same time. Node.js is only asynchronous, but it's only ever one thread: it can only work on one thing at once. If you have a long-running computation, you should start another process and then just have your Node.js process asynchronously wait for the results.
To do this you could use child_process.spawn and then read data from stdin.
http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
var spawn = require('child_process').spawn;
var process2 = spawn('sh', ['./computationProgram', 'parameter']);

process2.stderr.on('data', function (data) {
  // handle error input
});

process2.stdout.on('data', function (data) {
  // handle data results
});
Keep in mind I/O is parallelized by Node.js; only your JavaScript callbacks are single threaded.
Assuming you are writing a server, an alternative to adding the complexity of spawning processes or forking is to simply build stateless node servers and run an instance per core, or better yet run many instances each in their own virtualized micro server. Coordinate incoming requests using a reverse proxy or load balancer.
You could also offload computation to another server, maybe MongoDB (using MapReduce) or Hadoop.
To be truly hardcore, you could write a Node plugin in C++ and have fine-grained control of parallelizing the computation code. The speed up from C++ might negate the need of parallelization anyway.
You can always write code to perform computationally intensive tasks in another language best suited for numeric computation, and e.g. expose them through a REST API.
Finally, you could perhaps run the code on the GPU using node-cuda or something similar depending on the type of computation (not all can be optimized for GPU).
Yes, you can fork and spawn other processes, but it seems to me one of the major advantages of node is to not much have to worry about parallelization and threading, and therefor bypass a great amount of complexity altogether.
Depending on your use case you can use something like
task.js Simplified interface for getting CPU intensive code to run on all cores (node.js, and web)
An example would be:
function blocking(exampleArgument) {
  // block thread
}

// turn blocking pure function into a worker task
const blockingAsync = task.wrap(blocking);

// run task on an autoscaling worker pool
blockingAsync('exampleArgumentValue').then(result => {
  // do something with result
});
Just recently came across parallel.js but it seems to be actually using multi-core and also has map reduce type features.
http://adambom.github.io/parallel.js/
I have a very simple utility script that I've written in JavaScript for node.js which reads a file, does a few calculations, then writes an output file. The source in its current form looks something like this:
fs.readFile(inputPath, function (err, data) {
  if (err) throw err;
  // do something with the data
  fs.writeFile(outputPath, output, function (err) {
    if (err) throw err;
    console.log("File successfully written.");
  });
});
This works fine, but I'm wondering if there is any disadvantage in this case to using the synchronous variety of these functions instead, like this:
var data = fs.readFileSync(inputPath);
// do something with the data
fs.writeFileSync(outputPath, output);
console.log("File successfully written.");
To me, this is much simpler to read and understand than the callback variety. Is there any reason to use the former method in this case?
I realize that speed isn't an issue at all with this simple script I'm running locally, but I'm interested in understanding the theory behind it. When does using the async methods help, and when does it not? Even in a production application, if I'm only reading a file, then waiting to perform the next task, is there any reason to use the asynchronous method?
What matters is what ELSE your Node process needs to do while the synchronous IO happens. In the case of a simple shell script run at the command line by a single user, synchronous IO is totally fine, since if you were doing asynchronous IO all you'd be doing is waiting for the IO to come back anyway.
However, in a network service with multiple users you can NEVER use ANY synchronous IO calls (which is kind of the whole point of node, so believe me when I say this). To do so will cause ALL connected clients to halt processing and it is complete doom.
Rule of thumb: shell script: OK, network service: verboten!
For further reading, I made several analogies in this answer.
Basically, when node does asynchronous IO in a network server, it can ask the OS to do many things: read a few files, make some DB queries, send out some network traffic, and while waiting for that async IO to be ready, it can do memory/CPU things in the main event thread. Using this architecture, node gets pretty good performance/concurrency.
However, when a synchronous IO operation happens, the entire node process just blocks and does absolutely nothing. It just waits. No new connections can be received. No processing happens, no event loop ticks, no callbacks, nothing. Just 1 synchronous operation stalls the entire server for all clients. You must not do it at all. It doesn't matter how fast it is or anything like that. It doesn't matter local filesystem or network request. Even if you spend 10ms reading a tiny file from disk for each client, if you have 100 clients, client 100 will wait a full second while that file is read one at a time over and over for clients 1-99.
Asynchronous code does not block the flow of execution, allowing your program to perform other tasks while waiting for an operation to complete.
In the first example, your code can continue running without waiting for the file to be written. In your second example, the code execution is "blocked" until the file is written. This is why synchronous code is known as "blocking" while asynchronous is known as "non-blocking."