NodeJS promise blocking requests

I am quite confused about why my promise is blocking the node app's requests.
Here is my simplified code:
var express = require('express');
var SomeModule = require('somemodule');

var app = express();
var someModule = new SomeModule();

app.get('/', function (req, res) {
    res.status(200).send('Main');
});

app.get('/status', function (req, res) {
    res.status(200).send('Status');
});

// Init promise: then(onFulfilled, onRejected, onProgress) -- Q-style progress callback
someModule.doSomething({}).then(function () {}, function () {}, function (progress) {
    console.log(progress);
});

var server = app.listen(3000, function () {
    var host = server.address().address;
    var port = server.address().port;
    console.log('Example app listening at http://%s:%s in %s environment', host, port, app.get('env'));
});
And the module:
var q = require('q');

function SomeModule() {
    this.doSomething = function () {
        return q.Promise(function (resolve, reject, notify) {
            for (var i = 0; i < 10000; i++) {
                notify('Progress ' + i);
            }
            resolve();
        });
    };
}

module.exports = SomeModule;
Obviously this is very simplified. The promise function does some work that takes anywhere from 5 to 30 minutes and has to run only when the server starts up.
There is NO async operation in that promise function. It's just a lot of data processing, loops, etc.
I want to be able to make requests right away, though. So what I expect is that when I run the server, I can go right away to 127.0.0.1:3000 and see Main, and the same for any other request.
Eventually I want to see the progress of that task by accessing /status, but I'm sure I can make that work once the server behaves as expected.
At the moment, when I open / it just hangs until the promise job finishes.
Obviously I'm doing something wrong...

If your task is IO-bound, go with process.nextTick. If your task is CPU-bound, asynchronous calls won't offer much performance-wise. In that case you need to delegate the task to another process. An example solution would be to spawn a child process, do the work, and pipe the results back to the parent process when done.
See nodejs.org/api/child_process.html for more.
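For illustration, here is a minimal sketch of that approach using child_process.fork() and its built-in message channel; the file names and the message shape are made up:

// main.js
var fork = require('child_process').fork;

var worker = fork('./worker.js');
worker.on('message', function (msg) {
    console.log('Progress from worker:', msg.progress);
});
worker.on('exit', function (code) {
    console.log('Worker finished with code', code);
});

// worker.js -- runs in its own process, so the loop cannot block the server
for (var i = 0; i < 10000; i++) {
    // ... heavy CPU work here ...
    process.send({ progress: i }); // process.send exists when started via fork()
}
process.exit(0);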
If your application needs to do this often, then forking lots of child processes quickly becomes a resource hog; each time you fork, a new V8 process is loaded into memory. In this case it is probably better to use one of the multiprocessing modules, like Node's own Cluster. This module offers easy creation of, and communication between, master and worker processes and can remove a lot of complexity from your code.
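A rough sketch of the Cluster variant, assuming one dedicated worker just for the heavy job (untested; names are illustrative):

var cluster = require('cluster');

if (cluster.isMaster) {
    var worker = cluster.fork(); // one dedicated worker for the background job
    worker.on('message', function (msg) {
        console.log('Progress:', msg.progress);
    });
    // ... set up the express server here; it stays responsive ...
} else {
    for (var i = 0; i < 10000; i++) {
        process.send({ progress: i }); // report back to the master
    }
    process.exit(0);
}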
See also a related question: Node.js - Sending a big object to child_process is slow

Javascript in node.js runs on a single main thread. So, if you do some giant loop that is processor-bound, it will hog that one thread and no other JS will run in node.js until that operation is done.
So, when you call:
someModule.doSomething()
and that is all synchronous, then it does not return until it is done executing, and thus the lines of code that follow don't execute until the doSomething() method returns. And, just so you understand, using promises with synchronous, CPU-hogging code does not help your cause at all. If it's synchronous and CPU-bound, it's just going to take a long time to run before anything else can run.
If there is I/O involved in the loop (like disk I/O or network I/O), then there are opportunities to use async I/O operations and make the code non-blocking. But if not, and it's just a lot of CPU work, then it will block until done and no other code will run.
Your opportunities for changing this are:
Run the CPU-consuming code in another process. Either create a separate program that you run as a child process (that you can pass input to and get output from) or create a separate server that you can then make async requests to.
Break the blocking work into chunks where you execute ~100ms of work at a time, then yield the processor back to the event loop (using something like setTimeout()) to allow other things in the event queue to be serviced and run before you pick up and run the next chunk of work. You can see Best way to iterate over an array without blocking the UI for ideas on how to chunk synchronous work.
As an example, you could chunk your current loop. This runs up to 100ms of cycles and then breaks execution to give other things a chance to run. You can set the cycle time to whatever you want.
function SomeModule() {
    this.doSomething = function () {
        return q.Promise(function (resolve, reject, notify) {
            var cntr = 0, numIterations = 10000, timePerSlice = 100;
            function run() {
                if (cntr < numIterations) {
                    var start = Date.now();
                    while (Date.now() - start < timePerSlice && cntr < numIterations) {
                        notify('Progress ' + cntr);
                        ++cntr;
                    }
                    // give some other things a chance to run and then call us again
                    // setImmediate() is also an option here, but setTimeout() gives all
                    // other operations a chance to run alongside this operation
                    setTimeout(run, 10);
                } else {
                    resolve();
                }
            }
            run();
        });
    };
}


Long-running asynchronous file copies block browser requests

Express.js serving a Remix app. The server-side code sets several timers at startup that do various background jobs every so often, one of which checks if a remote Jenkins build is finished. If so, it copies several large PDFs from one network path to another network path (both on GSA).
One function creates an array of chained glob+copyFile promises:
import { copyFile } from 'node:fs/promises';
import { promisify } from "util";
import glob from "glob";
...
async function getFiles() {
    let result: Promise<void>[] = [];
    let globPromise = promisify(glob);
    for (let wildcard of wildcards) { // lots of file wildcards here
        result.push(globPromise(wildcard).then(
            (files: string[]) => {
                if (files.length < 1) {
                    // do error stuff
                } else {
                    for (let srcFile of files) {
                        let tgtFile = tgtDir + basename(srcFile);
                        return copyFile(srcFile, tgtFile);
                    }
                }
            },
            (reason: any) => {
                // do error stuff
            }));
    }
    return result;
}
Another async function gets that array and does Promise.allSettled on it:
copyPromises = await getFiles();
console.log("CALLING ALLSETTLED.THEN()...");
return Promise.allSettled(copyPromises).then(
    (results) => {
        console.log("ALLSETTLED COMPLETE...");
Between the "CALLING" and "COMPLETE" messages, which can take on the order of several minutes, the server no longer responds to browser requests, which timeout.
However, during this time my other active backend timers can still be seen running and completing just fine in the server console log (I made one run every 5 seconds for test purposes, and it runs quite smoothly over and over while those file copies are crawling along).
So it's not blocking the server as a whole, it's seemingly just preventing browser requests from being handled. And once the "COMPLETE" message pops up in the log, browser requests are served up normally again.
The Express startup script basically just does this for Remix:
const { createRequestHandler } = require("@remix-run/express");
...
app.all(
    "*",
    createRequestHandler({
        build: require(BUILD_DIR),
        mode: process.env.NODE_ENV,
    })
);
What's going on here, and how do I solve this?
It's apparent no further discussion is forthcoming, and I've not determined why the async I/O functions are preventing server responses, so I'll go ahead and post an answer that was basically Konrad Linkowski's workaround from the comments: use the OS to do the copies instead of copyFile(). It boils down to this in place of the glob+copyFile calls inside getFiles:
const exec = util.promisify(require('node:child_process').exec);
...
async function getFiles() {
    ...
    result.push( exec("copy /y " + wildcard + " " + tgtDir) );
    ...
}
This does not exhibit any of the request-crippling behavior; for the entire time the copies are chugging away (many minutes), browser requests are handled instantly.
It's an OS-specific solution and thus non-portable as-is, but that's fine in our case since we will likely be using a Windows server for this app for many years to come. And certainly if needed, runtime OS-detection could be used to make the commands run on other OSes.
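For instance, a hedged sketch of such detection (the Unix command is an assumption, not part of the original workaround):

// hypothetical: pick a copy command based on the platform
const cmd = process.platform === "win32"
    ? "copy /y " + wildcard + " " + tgtDir
    : "cp -f " + wildcard + " " + tgtDir;
result.push(exec(cmd));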
I guess that this is due to node's libuv using a threadpool with synchronous access for file system operations, and the pool size is only 4. See https://kariera.future-processing.pl/blog/on-problems-with-threads-in-node-js/ for a demonstration of the problem, or Nodejs - What is maximum thread that can run same time as thread pool size is four? for an explanation of how this is normally not a problem in network-heavy applications.
So if you have a filesystem-access-heavy application, try increasing the thread pool by setting the UV_THREADPOOL_SIZE environment variable.
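For example (a sketch; the pool size shown is arbitrary, and the variable must be set before libuv first creates the pool, which is why it is usually passed on the command line):

// from the shell, before starting node:
//   UV_THREADPOOL_SIZE=16 node server.js
// or, at the very top of the entry script, before any fs/dns/crypto call
// has forced libuv to create the pool:
process.env.UV_THREADPOOL_SIZE = "16";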

Running background task in node.js

How do we run background tasks in node.js, so that I can perform multiple requests in parallel without blocking the UI? Once a request completes, I want to inform the user that the operation is complete and the result is ready. While one request is being processed, the end user should be free to perform other operations as well.
Javascript is single-threaded, so your programs must be asynchronous. Javascript uses events (callbacks, timeouts) to orchestrate the running of many quick function invocations. Nodejs doesn't have a UI to block, but these quick functions can be blocked by a slow function. This asynchronous nature of Javascript applies to both nodejs and browser code.
I believe your question is about how to deal with operations taking a lot of time.
One approach is to break up those operations into a lot of chunks -- quick function calls -- and have each of them use setTimeout() to start the next one. It's hard to suggest how to do this without understanding your long running function. Here's a question with some answers on the topic: How to break up a long running function in javascript, but keep performance
Summarizing the answers there, you'd use this sort of code:
function processInBatches(data, callbackWhenDone) {
    let n = 0;
    const max = data.length;
    const batchSize = 100;
    function doBatch() {
        try {
            for (let i = 0; i < batchSize && n < max; ++i, ++n) {
                doComputation(data[n]);
            }
            if (n < max) setTimeout(doBatch, 0);
            else callbackWhenDone(null, data);
        } catch (error) {
            callbackWhenDone(error);
        }
    }
    doBatch();
}
The setTimeout(fn, 0) at the end of each batch starts the next batch by queuing a timeout in Javascript's event loop, so other items in that queue get a chance to run. Each batch knows where to start because doBatch() updates n as it runs. When the whole lot finishes, it invokes the callback you passed in. The callback uses the typical nodejs pattern where the first parameter, if not null, is an error.
Another approach is to use Worker Threads. You can put your computation's Javascript code into a separate Javascript context and have it run. You'll need the knack of passing the data back and forth from the main nodejs process to the thread. Make sure you have a reliable debugger setup before you try this; it can be confusing.
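A minimal sketch of that approach with Node's worker_threads module (the file names and the workload are made up for illustration):

// main.js
const { Worker } = require('node:worker_threads');

const worker = new Worker('./heavy-task.js', { workerData: { n: 1e8 } });
worker.on('message', (result) => console.log('result:', result));
worker.on('error', (err) => console.error('worker failed:', err));

// heavy-task.js
const { parentPort, workerData } = require('node:worker_threads');

let sum = 0;
for (let i = 0; i < workerData.n; i++) sum += i; // stand-in for real CPU work
parentPort.postMessage(sum);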

How to execute socket.io endpoints synchronously

My application is using socket.io, from what I gather, socket.io executes asynchronously. Most of the time this is not a problem, however there is a particular case where 2 users in my app may call the same socket endpoint at the same time, and this causes issues.
What I would rather have is for each socket endpoint to wait until the one before it finishes executing, before it gets executed. If they run asynchronously, I get unexpected results.
On the server I have the following...
// Establish a connection with a WebSocket.
io.on("connection", socket => {
    socket.on("add_song", async (data) => {
        PlaylistHandler.add_song(io, socket, data);
    });
    ...
    ...
add_song gets called at the same time by two different io connections (2 different users). I don't want the function PlaylistHandler.add_song to run in parallel for each, so I tried using async/await...
await PlaylistHandler.add_song(io, socket, data);
That didn't solve anything, I suspect because there are two different io connections making the call.
Is there any way to make the socket calls themselves run sequentially rather than in parallel?
await doesn't block the event loop, so it indeed doesn't matter here. Both await PlaylistHandler.add_song calls will be executed in their respective io.on listeners in parallel.
Your best/easiest shot is to set a flag calculating = true at the beginning of your add_song, and to postpone any add_song call while one is already running.
Hope this snippet will inspire you into achieving a workable solution:
let calculating = false;

function add_song(io, socket, data) {
    if (calculating) {
        setTimeout(function () {
            add_song(io, socket, data);
        }, 500); // depends on how often you want to check; reduce/increase the timeout depending on how time-sensitive checking should be
        return;
    }
    calculating = true;
    // Do all your usual add_song processing.
    // After the final operation of add_song:
    calculating = false;
}
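Another common way to serialize the calls, not from the original answer but sketched here for comparison, is to chain each call onto a shared promise so the next one starts only after the previous one settles (this assumes add_song returns a promise):

let queue = Promise.resolve();

function add_song_sequential(io, socket, data) {
    queue = queue
        .then(() => PlaylistHandler.add_song(io, socket, data))
        .catch((err) => console.error(err)); // keep the chain alive on failure
    return queue;
}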

How to be sure two clients are not requesting at the same time corrupting state

I am new to Node JS and I am building a small app that relies on the filesystem.
Let's say that the goal of my app is to fill a file like this:
1
2
3
4
..
And I want a new line to be written to the file at each request, in the right order.
Can I achieve that?
I know I can't leave my question here without any code, so here it is. I am using an Express JS server:
(We imagine that the file contains only 1 at the first code launch)
import express from 'express'
import fs from 'fs'

let app = express();

app.all('*', function (req, res, next) {
    // At every request, I want to write my file
    writeFile()
    next()
})

app.get('/', function (req, res) {
    res.send('Hello World')
})

app.listen(3000, function (req, res) {
    console.log('listening on port 3000')
})

function writeFile() {
    // I get the file
    let content = fs.readFileSync('myfile.txt', 'utf-8')
    // I get an array of the numbers
    let numbers = content.split('\n').map(item => {
        return parseInt(item)
    })
    // I compute the new number and push it to the list
    let new_number = numbers[numbers.length - 1] + 1
    numbers.push(new_number)
    // I write back the file
    fs.writeFileSync('myfile.txt', numbers.join('\n'))
}
My guess was that because the process is synchronous, nothing else could happen at the same moment, but I was really not sure...
If I am unclear, please tell me in the comments.
If I understood you correctly, what you are scared of is a race condition: if two clients reach the HTTP server at the same time, the file could be saved with the number incremented only once instead of twice.
The simple fix is to make sure the shared resource is only accessed or modified by one request at a time. In this case, using synchronous methods fixes your problem, because while they are executing the whole Node process is blocked and will not do anything else.
If you replace the synchronous methods with their asynchronous counterparts without any other concurrency-control measures, then your code is definitely vulnerable to race conditions and corrupted state.
Now if this is the only thing your application is doing, it's probably best to keep it this way, as it's very simple. But let's say you want to add other functionality; in that case you probably want to avoid synchronous methods, as they block the process and prevent any concurrency.
A simple way to add concurrency control is to keep a counter of pending requests. If nothing is pending (counter === 0), we read and write the file right away; either way we add to the counter. Once writing to the file is finished, we decrease the counter and, if more requests arrived in the meantime, repeat:
app.all('*', function (req, res, next) {
    // At every request, I want to write my file
    writeFile();
    next();
});

let counter = 0;

function writeFile() {
    if (counter === 0) {
        counter++;
        work(function onWriteFileDone() {
            counter--;
            if (counter > 0) {
                work(onWriteFileDone);
            }
        });
    } else {
        counter++;
    }

    function work(callback) {
        // I get the file
        fs.readFile('myfile.txt', 'utf-8', function (err, content) {
            // ignore the error because life is too short on stackoverflow questions...
            // I get an array of the numbers
            // (note: .map(item => parseInt(item)), not .map(parseInt), which would pass the index as the radix)
            let numbers = content.split('\n').map(item => parseInt(item));
            // I compute the new number and push it to the list
            let new_number = numbers[numbers.length - 1] + 1;
            numbers.push(new_number);
            // I write back the file
            fs.writeFile('myfile.txt', numbers.join('\n'), callback);
        });
    }
}
Of course this function doesn't take any arguments, but if you want to add some, you'll have to use a queue instead of the counter and store the arguments in the queue.
Now don't write your own concurrency mechanisms; there are a lot of them in the Node ecosystem. For example, you can use the async module, which provides a queue.
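For instance, a sketch of the same idea with async.queue (concurrency 1 serializes the writes; work() is the helper from the snippet above, and the empty task object is a placeholder):

const async = require('async');

// a queue with concurrency 1 runs one task at a time, in order
const writeQueue = async.queue(function (task, done) {
    work(done);
}, 1);

app.all('*', function (req, res, next) {
    writeQueue.push({}); // the task object could carry per-request arguments
    next();
});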
Note that if you only have one process, you don't have to worry about multiple threads, since in node.js one process has only one thread of execution at a time. But if multiple processes write to the file, that makes things more complicated; let's keep that for another question if it's not already covered. Operating systems provide a few different ways to handle this, and you could also use your own lock files, a dedicated process that does the writing, or a message queue process.

Run 1000 requests so that only 10 run at a time

With node.js I want to http.get a number of remote urls in a way that only 10 (or n) run at a time.
I also want to retry a request if an exception occurs locally (m times), but when the status code returns an error (5XX, 4XX, etc.) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests finished I want to get the list of all request urls and response status codes in a list which I want to sort/group/manipulate, so I need to wait for all requests to finish.
It seems like promises are recommended for every async problem, but I end up nesting too many of them and it quickly becomes undecipherable.
There are lots of ways to approach running 10 requests at a time.
Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.
Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then each time one completes, start another one.
In all cases, you will have to manually write some retry code, and as with all retry code, you will have to very carefully decide which types of errors you retry, how soon you retry them, how much you back off between retry attempts, and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);

let remoteUrls = [...]; // large array of URLs

const maxRetryCnt = 3;
const retryDelay = 500;

Promise.map(remoteUrls, function (url) {
    let retryCnt = 0;
    function run() {
        return get(url).then(function (result) {
            // do whatever you want with the result here
            return result;
        }).catch(function (err) {
            // decide what your retry strategy is here;
            // isRetryableError() is a placeholder for your own check.
            // catch all errors here so other URLs continue to execute
            if (isRetryableError(err) && retryCnt < maxRetryCnt) {
                ++retryCnt;
                // try again after a short delay
                // chain onto previous promise so Promise.map() is still
                // respecting our concurrency value
                return Promise.delay(retryDelay).then(run);
            }
            // make value be null if no retries succeeded
            return null;
        });
    }
    return run();
}, { concurrency: 10 }).then(function (allResults) {
    // everything done here and allResults contains results with null for err URLs
});
The simple way is to use the async library; it has a .parallelLimit method that does exactly what you need.
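A rough sketch of what that could look like, reusing the remoteUrls array from the example above (the task shape and result format are illustrative, not from the original answer):

const async = require('async');
const https = require('node:https');

// one task per URL; each task calls its callback exactly once
const tasks = remoteUrls.map((url) => (cb) => {
    https.get(url, (res) => {
        res.resume(); // drain the body so the socket is released
        cb(null, { url, status: res.statusCode });
    }).on('error', (err) => cb(null, { url, error: err.message })); // report, don't abort the batch
});

async.parallelLimit(tasks, 10, (err, results) => {
    // results is in the same order as remoteUrls
    console.log(results);
});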
