Will my node.js lock up while comparing lists?

Will my node.js lock up while comparing lists? - javascript

So, basically i have 2 lists on a node.js app that i'm planning, one of them i get from a database and the other from copying the names of files on a server and storing that inside a list.
The problem is: both of them have ~750000 strings each, and i need to search for each string in one inside the other.
I am pretty new to node, so i'm wondering, will my app lock itself to other users while comparing the lists, with it being single threaded and all? comparing two huge lists like these seems pretty cpu intensive to me.

It depends of how you are comparing the lists and how your application is built, but generally yes. For example, this code would block execution:
server.on("request", (req, res)=>{
// This function will not be called while the comparasion is running.
// Requests will have to wait until the call stack empties (until nothing
// is running).
}
compareLists();
Maybe you should take a look at worker threads or cluster in order to make your Node.js application multi-threaded, or you could just execute another Node.js script in parallel with child process. Also, see this guide about the JavaScript event loop.

Related

Is moving database works to child processes a good idea in node.js?

I just started getting into child_process and all I know is that it's good for delegating blocking functions (e.g. looping a huge array) to child processes.
I use typeorm to communicate with the mysql database. I was wondering if there's a benefit to move some of the asynchronous database works to child processes. I read it in another thread (Unfortunately I couldn't find it in the browser history) that there's no good reason to delegate async functions to child processes. Is it true?
example code:
child.js
import {createConnection} "./dbConnection";
import {SomeTable} from "./entity/SomeTable";
process.on('message', (m)=> {
createConnection().then(async connection=>{
let repository = connection.getRepository(SomeTable);
let results = await repository
.createQueryBuilder("t")
.orderBy("t.postId", "DESC")
.getMany();
process.send(results);
})
});
main.js
const cp = require('child_process');
const child = cp.fork('./child.js');
child.send('Please fetch some data');
child.on('message', (m)=>{
console.log(m);
});

The big gain about Javascript is its asynchronous nature...
What happens when you call an asynchronous function is that the code continues to execute, not waiting for the answer. And just when the function is done, and an answer is given does it then continue on with that part.
Your database call is already asynchronous. So you would spawn another node process for completely nothing. Since your database takes all the heat, having more nodeJS processes wouldn't help on that part.
Take the same example but with a file write. What could make the write to the disk faster? Nothing much really... But do we care? Nope because our NodeJS is not blocked and keeps answering requests and handling tasks. The only thing that you might want to check is to not send a thousand file writes at the same time, if they are big there would be a negative impact on the file system, but since a write is not CPU intensive, node will run just fine.
child processes really are a great tool, but it is rare to need it. I too wanted to use some when I heard about them, but the thing is that you will certainly not need them at all... The only time I decided to use it was to create a CPU intensive worker. It would make sure it spawns one child process per Core (since node is single threaded) and respawn any faulty ones.

dealing with long server side calculations in meteor

I am using jimp (https://www.npmjs.com/package/jimp) in meteor JS to generate an image server side. In other words I am 'calculating' the pixels of the image using a recursive algorithm. The algorithm takes quite some time to complete.
The issue I am having is that this seems to completely block the meteor server. Users trying to visit the webpage while an image is being generated are forced to wait. The website is therefore not rendered at all.
Is there any (meteor) way to run the heavy recursive algorithm in a thread or something so that it does not block the entire website?

Node (and consequently meteor) runs in a single process which blocks on CPU activity. In short, node works really well when you are IO-bound, but as soon as you do anything that's compute-bound you need another approach.
As was suggested in the comments above, you'll need to offload this CPU-intensive activity to another process which could live on the same server (if you have multiple cores) or a different server.
We have a similar problem at Edthena were we need to transcode a subset of our video files. For now I decided to use a meteor-based solution, because it was easy to set up. Here's what we did:
When new transcode jobs need to happen, we insert a "video job" document in to the database.
On a separate server (we max out the full CPU when transcoding), we have an app which calls observe like this:
Meteor.startup(function () {
// Listen for non-failed transcode jobs in creation order. Use a limit of 1 to
// prevent multiple jobs of this type from running concurrently.
var selector = {
type: 'transcode',
state: { $ne: 'failed' },
};
var options = {
sort: { createdAt: 1 }, limit: 1,
};
VideoJobs.find(selector, options).observe({
added: function (videoJob) {
transcode(videoJob);
}, });
});
As the comments indicate this allows only one job to be called at a time, which may or may not be what you want. This has the further limitation that you can only run it on one app instance (multiple instances calling observe would simultaneously complete the job). So it's a pretty simplistic job queue, but it may work for your purposes for a while.
As you scale, you could use a more robust mechanism for dequeuing and processing the tasks like Amazon's sqs service. You can also explore other meteor-based solutions like job-collection.

I believe you're looking for Meteor.defer(yourFunction).
Relevant Kadira article: https://kadira.io/academy/meteor-performance-101/content/make-your-app-faster

Thanks for the comments and answers! It seems to be working now. What I did is what David suggested. I am running a meteor app on the same server. This app deals with the generating of the images. However, this resulted in the app still eating away all the processing power.
As a result of this I set a slightly lower priority on the generating algorithm with the renice command on the PID. (https://www.nixtutor.com/linux/changing-priority-on-linux-processes/) This works! Any time a user logs into the website the other (client) meteor application gains priority over the generating algorithm. Absolutely no delay at all anymore now.
The only issue I am having now is that whenever the server restarts I somehow have to rerun or run the (re)nice command.
Since I am using meteor up for deployment both apps run the same user and the same 'command': node main.js. I am currently trying to figure out how to run the nice command within the startup script of meteor up. (located at /etc/init/.conf)

Node.js, MongoDB, and Concurrency

I'm working on a game prototype and worried about the following case: Browser does AJAX to Node.JS, which has to do several MongoDB operations using async.series.
What prevents multiple requests at the same time causing the database issues? New events (i.e. db operations) seem like they could be run out of order or in between the async.series steps.
In other words, what happens if a user does AJAX calls very quickly, before the prior ones have finished their async.series. Hopefully that makes sense.
If this is indeed an issue, what is the proper way to handle it?

First and foremost, #fmodos's comment should be completely disregarded. It is wrong on many levels but most simply you could have any number of nodes running (say on Heroku) and there is no guarantee that subsequent requests will hit the same node.
Now, I'm going to answer your question by asking more questions. (You really didn't give me a choice here)
What are these operations doing? Inserting documents? Updating existing documents? Removing documents? This is very important because if all you're doing is simply inserting documents then why does it matter if one finishes for before the other? If you're updating documents then you should NOT be issuing a find, grabbing a ref to the object, and then calling save. (I'm making the assumption you're using mongoose, if you're not, I would) Instead what you should be doing is using built in mongo functions like $inc which properly handle concurrent requests.
http://docs.mongodb.org/manual/reference/operator/update/inc/
Does that help at all? If not, please let me know and I will give it another shot.

Mongo has database wide read/write locks. It gives preference to writes of the same collection first then fulfills reads. So, if by chance, you have Bill writing to the db and Joe is reading at the same time, Bill's write will execute first while Joe waits until the write is complete and then he is given all the data (including Bill's).

Parallelizing tasks in Node.js

I have some tasks I want to do in JS that are resource intensive. For this question, lets assume they are some heavy calculations, rather then system access. Now I want to run tasks A, B and C at the same time, and executing some function D when this is done.
The async library provides a nice scaffolding for this:
async.parallel([A, B, C], D);
If what I am doing is just calculations, then this will still run synchronously (unless the library is putting the tasks on different threads itself, which I expect is not the case). How do I make this be actually parallel? What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?

2022 notice: this answer predates the introduction of worker threads in Node.js
How do I make this be actually parallel?
First, you won't really be running in parallel while in a single node application. A node application runs on a single thread and only one event at a time is processed by node's event loop. Even when running on a multi-core box you won't get parallelism of processing within a node application.
That said, you can get processing parallelism on multicore machine via forking the code into separate node processes or by spawning child process. This, in effect, allows you to create multiple instances of node itself and to communicate with those processes in different ways (e.g. stdout, process fork IPC mechanism). Additionally, you could choose to separate the functions (by responsibility) into their own node app/server and call it via RPC.
What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
It is not starting a new process. Underneath, when async.parallel is used in node.js, it is using process.nextTick(). And nextTick() allows you to avoid blocking the caller by deferring work onto a new stack so you can interleave cpu intensive tasks, etc.
Long story short
Node doesn't make it easy "out of the box" to achieve multiprocessor concurrency. Node instead gives you a non-blocking design and an event loop that leverages a thread without sharing memory. Multiple threads cannot share data/memory, therefore locks aren't needed. Node is lock free. One node process leverages one thread, and this makes node both safe and powerful.
When you need to split work up among multiple processes then use some sort of message passing to communicate with the other processes / servers. e.g. IPC/RPC.
For more see:
Awesome answer from SO on What is Node.js... with tons of goodness.
Understanding process.nextTick()

Asynchronous and parallel are not the same thing. Asynchronous means that you don't have to wait for synchronization. Parallel means that you can be doing multiple things at the same time. Node.js is only asynchronous, but its only ever 1 thread. It can only work on 1 thing at once. If you have a long running computation, you should start another process and then just have your node.js process asynchronously wait for results.
To do this you could use child_process.spawn and then read data from stdin.
http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
var spawn = require('child_process').spawn;
var process2 = spawn('sh', ['./computationProgram', 'parameter'] );
process2.stderr.on('data', function (data) {
//handle error input
});
process2.stdout.on('data', function (data) {
//handle data results
});

Keep in mind I/O is parallelized by Node.js; only your JavaScript callbacks are single threaded.
Assuming you are writing a server, an alternative to adding the complexity of spawning processes or forking is to simply build stateless node servers and run an instance per core, or better yet run many instances each in their own virtualized micro server. Coordinate incoming requests using a reverse proxy or load balancer.
You could also offload computation to another server, maybe MongoDB (using MapReduce) or Hadoop.
To be truly hardcore, you could write a Node plugin in C++ and have fine-grained control of parallelizing the computation code. The speed up from C++ might negate the need of parallelization anyway.
You can always write code to perform computationally intensive tasks in another language best suited for numeric computation, and e.g. expose them through a REST API.
Finally, you could perhaps run the code on the GPU using node-cuda or something similar depending on the type of computation (not all can be optimized for GPU).
Yes, you can fork and spawn other processes, but it seems to me one of the major advantages of node is to not much have to worry about parallelization and threading, and therefor bypass a great amount of complexity altogether.

Depending on your use case you can use something like
task.js Simplified interface for getting CPU intensive code to run on all cores (node.js, and web)
A example would be
function blocking (exampleArgument) {
// block thread
}
// turn blocking pure function into a worker task
const blockingAsync = task.wrap(blocking);
// run task on a autoscaling worker pool
blockingAsync('exampleArgumentValue').then(result => {
// do something with result
});

Just recently came across parallel.js but it seems to be actually using multi-core and also has map reduce type features.
http://adambom.github.io/parallel.js/

How does NodeJS handle async file IO?

Having worked with NodeJS for some time now, I've been wondering about how node handles file operations internally.
Considering the following pseudo code:
initialize http server
on connection:
modify_some_file:
on success:
print "it worked"
Let's consider two users A & B that try to access the page nearly simultaneously. Let's further assume A is the first one to connect, then the following happens:
A connects
NodeJS initializes the file operation and tells the operating system to be notified once it is done
And here's what I'm wondering about: Let's say, the file operation isn't done yet and B connects, what does node do? How and when does it access the file when it is still in the process of "being modified"?
I hope my question is somewhat clear ;)
Looking forward to your answers!

AFAIK, Node won't care.
At least on Unix, it's perfectly legal to have multiple writers to the same file. Sometimes that's not a problem (say your file consists of fixed-size records, where writer #1 writes to record X and writer #2 writes to record Y, with X !== Y), and sometimes it is (same example: when both writers want to write to record X).
In Node, the problems are mitigated because I/O operations "take turns", but I think there's still potential of two writers getting in each others way. It's up to the programmer to make sure that doesn't happen.
With Node, you could use the *Sync() versions of the fs operations (but those will block your app during the operation), use append mode (which is only atomic up to certain write sizes I think, and it depends on your requirements if appending is actually useful), use some form of locking, or use something like a queue where write operations would be put onto the queue and there's a single queue consumer to handle the writes.

We Keep Coding

JavaScript is the programming language of the Web.