How to run an infinite blocking process in NodeJS? - javascript

I have a set of API endpoints in Express. One of them receives a request and starts a long-running process that blocks other incoming Express requests.
My goal is to make this process non-blocking. To better understand the inner logic of the Node event loop and how to do this properly, I want to replace the real long-running function with a dummy long-running, blocking function that starts when I send a request to its endpoint.
I suspect that different ways of making the dummy function block could cause Node to handle the blocking differently.
So, my question is: how can I write a basic blocking process as a function that runs infinitely?

You can use node-webworker-threads.
var Worker = require('webworker-threads').Worker;

// spawn five workers; each one computes Fibonacci numbers off the main thread
for (var i = 0; i < 5; i++) {
  var worker = new Worker(workerBody);
  worker.onmessage = onWorkerMessage;
  worker.postMessage(Math.ceil(Math.random() * 30));
}

// keep the main event loop spinning to show it stays responsive
(function spin() {
  setImmediate(spin);
})();

// this function runs inside each worker thread
function workerBody() {
  function fibo(n) {
    return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1;
  }
  this.onmessage = function (event) {
    postMessage(fibo(event.data));
  };
}

// main-thread handler: log the result and send the worker another job
function onWorkerMessage(event) {
  console.log("[" + this.thread.id + "] " + event.data);
  this.postMessage(Math.ceil(Math.random() * 30));
}
https://github.com/audreyt/node-webworker-threads

So, my question is: how can I write a basic blocking process as a function that runs infinitely?
function block() {
  // not sure why you need that, though
  while (true);
}
I suspect that different ways of making the dummy function block could cause Node to handle the blocking differently.
Not really. I can't think of a "special way" to block the engine differently.
My goal is to make this process non-blocking.
If it really is that long-running, you should offload it to another thread.

There are shortcut ways to do a quick fix if it's a one-time thing; you can use an npm module that does the job.
But the right way to do it is to set up a common design pattern called 'work queues'. You will need a queuing mechanism such as RabbitMQ or ZeroMQ. Whenever you get a computation-heavy task, instead of doing it in the same thread, you send it to the queue along with the relevant id values. A separate Node process, commonly called a 'worker', listens for new jobs on the queue and processes them as they arrive (see the sketch after this answer). This is the work queue pattern, and you can read up on it here:
https://www.rabbitmq.com/tutorials/tutorial-one-javascript.html
I would strongly advise you to learn this pattern, as you will come across many tasks that require this kind of mechanism. With this in place you can also scale your Node servers and your workers independently.
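A minimal sketch of that producer/worker split, assuming the amqplib client used in the RabbitMQ tutorial linked above; the queue name, payload shape, and doExpensiveWork are made up for illustration:
// producer side - runs inside the Express process
const amqp = require('amqplib');

async function enqueueHeavyTask(payload) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('heavy_tasks', { durable: true });
  // persistent message so the job survives a broker restart
  ch.sendToQueue('heavy_tasks', Buffer.from(JSON.stringify(payload)), { persistent: true });
  await ch.close();
  await conn.close();
}

// worker side - a separate Node process that does the heavy lifting
async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('heavy_tasks', { durable: true });
  ch.prefetch(1); // take one job at a time per worker
  ch.consume('heavy_tasks', (msg) => {
    const job = JSON.parse(msg.content.toString());
    doExpensiveWork(job); // hypothetical: your long-running computation
    ch.ack(msg);          // acknowledge only after the job is done
  });
}
Because the worker is a separate process, a slow job never blocks the Express event loop, and you can start more worker processes to drain the queue faster.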

I am not sure what exactly your 'long processing' is, but in general you can approach this kind of problem in two different ways.
Option 1:
Use the webworker-threads module as #serkan pointed out. The usual 'thread' limitations apply in this scenario. You will need to communicate with the Worker via messages.
This method is preferable only when the logic is too complicated to be broken down into smaller independent problems (explained in option 2). Depending on the complexity, you should also consider whether native code would serve the purpose better.
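For reference, a minimal sketch of the same idea using Node's built-in worker_threads module (available in modern Node versions; this is not part of the original webworker-threads answer, and the file names are made up):
// main.js
const { Worker } = require('worker_threads');

const worker = new Worker('./fib-worker.js');
worker.on('message', (result) => console.log('fib =', result));
worker.postMessage(30); // ask the worker to compute fib(30)

// fib-worker.js
const { parentPort } = require('worker_threads');

function fib(n) {
  return n > 1 ? fib(n - 1) + fib(n - 2) : 1;
}

parentPort.on('message', (n) => {
  // the heavy computation runs here, off the main thread
  parentPort.postMessage(fib(n));
});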
Option 2:
Break down the problem into smaller problems. Solve a part of the problem, schedule the next part to be executed later, and yield to let NodeJS process other events.
For example, consider calculating the factorial of a number.
Sync way:
function factorial(inputNum) {
  let result = 1;
  while (inputNum) {
    result = result * inputNum;
    inputNum--;
  }
  return result;
}
Async way:
function factorial(inputNum) {
  return new Promise(resolve => {
    let result = 1;
    const calcFactOneLevel = () => {
      result = result * inputNum;
      inputNum--;
      if (inputNum) {
        // note: recursively scheduled process.nextTick callbacks run before any
        // pending I/O, so for very large inputs setImmediate is friendlier
        return process.nextTick(calcFactOneLevel);
      }
      resolve(result);
    };
    calcFactOneLevel();
  });
}
The code in the second example does not tie up the Node process in one long synchronous run. You can send the response when the returned promise resolves.
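A sketch of how the asynchronous version might be wired into an Express route (the route path and handler shown are made up for illustration):
const express = require('express');
const app = express();

app.get('/factorial/:n', async (req, res) => {
  // the chunked factorial above yields between steps,
  // so other requests keep being served while it runs
  const result = await factorial(Number(req.params.n));
  res.json({ result: String(result) });
});

app.listen(3000);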

Related

How to run expensive code parallel in the same file

I'm trying to run a piece of JavaScript code asynchronously to the main thread. I don't necessarily need the code to actually run on a different thread (so performance does not need to be better than sequential execution), but I want it to be executed in parallel with the main thread, meaning no freezing.
Additionally, all the code needs to be contained within a single function.
My example workload is as follows:
function work() {
  for (let i = 0; i < 100000; i++)
    console.log("Async");
}
Additionally, I may have some work on the main thread (which is allowed to freeze the site; this is just for testing):
function seqWork() {
  for (let i = 0; i < 100000; i++)
    console.log("Sequential");
}
The expected output should be something like this:
Sequential
Async
Sequential
Sequential
Async
Sequential
Async
Async
...
You get the point.
Disclaimer: I am absolutely inexperienced in JavaScript and in working with async and await.
What I've tried
I did some research, and found these 3 options:
1. async/await
Seems like the obvious choice. So I tried this:
let f = async function f() {
  await work();
}
f();
seqWork();
Output:
Async (100000)
Sequential (100000)
I also tried:
let f = async function f() {
  let g = () => new Promise((res, rej) => {
    work();
    res();
  });
  await g();
}
f();
seqWork();
Output:
Async (100000)
Sequential (100000)
So neither method worked. Both also freeze the browser during the async output, so it seems that it has absolutely no effect(?). I may be doing something very wrong here, but I don't know what.
2. Promise.all
This seems to be praised as the solution for any expensive task, but it only seems like a reasonable choice if you have many blocking tasks and you want to "combine" them into just one blocking task that is faster than executing them sequentially. There are certainly use cases for this, but for my task it is useless, because I only have one task to execute asynchronously, and the main "thread" should keep running during that task.
3. Worker
This seemed to me like the most promising option, but I have not got it working yet. The main problem is that you seem to need a second script file. I cannot do that, and even in local testing with a second file Firefox blocks the loading of that script.
This is what I've tried, and I have not found any other options in my research. I'm starting to think that something like this is straight up not possible in JS, but it seems like quite a simple task. Again, I don't need this to be actually executed in parallel; it would be enough if the event loop alternated between calling a statement from the main thread and the async "thread". Coming from Java: there, multithreading can be simulated even on a single hardware thread.
Edit: Context
I have some Java code that gets converted to JavaScript (I have no control over the conversion) using TeaVM. Java natively supports multithreading, and a lot of my code relies on that being possible. Since JavaScript apparently does not really support true multithreading, TeaVM converts Thread in the most simplistic way to JS: calling Thread.start() directly calls Thread.run(), which makes it completely unusable. I want to create a better multithreading emulation here which can, pretty much, execute the thread code without modification. It is not ideal, but inserting "yielding" statements into the Java code would be possible.
TeaVM has a handy feature which allows you to write native Java methods annotated with matching JS code that will be converted directly into that code. The problem is that you cannot set the method body, so you can't make it an async method.
One thing I'm now trying to do is implement a JS-native "yield" / "pause" function (to not use the keyword in JS) which I can call to allow other code to run right from the Java method. The method basically has to briefly block the execution of the calling code and instead invoke execution of other queued tasks. I'm not sure whether that is possible with the main code not being in an async function. (I cannot alter the generated JS code.)
The only other way I can think of to work around this would be to let the JS method call all the blocking code, referring back to the Java code. The main problem, though, is that this means splitting up the method body of the Java method into many small chunks, as Java does not support something like yield return from C#. This basically means a complete rework of every single parallel-executed piece of code, which I would desperately try to avoid. Also, you could not "yield" from within a called method, making it way less modular. At that point I may as well just call the method chunks from within Java directly from an internal event loop.
Since JavaScript is single-threaded, the choice is between
running some code asynchronously in the main thread, or
running the same code in a worker thread (i.e. one that is not the main thread).
Cooperative Multitasking
If you want to run heavy code in the main thread without undue blocking it would need to be written to multitask cooperatively. This requires long running synchronous tasks to periodically yield control to the task manager to allow other tasks to run. In terms of JavaScript you could achieve this by running a task in an asynchronous function that periodically waits for a system timer of short duration. This has potential because await saves the current execution context and returns control to the task manager while an asynchronous task is performed. A timer call ensures that the task manager can actually loop and do other things before returning control to the asynchronous task that started the timer.
Awaiting a promise that is already fulfilled would only interleave execution of jobs in the microtask queue without returning to the event loop proper, and is not suitable for this purpose.
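A small illustration of that point (not part of the original answer): awaiting an already-resolved promise only hops through the microtask queue, so a pending timer callback still cannot run until the loop below has finished. The pattern after this example uses a real timer instead.
async function microtaskOnlyYield() {
  setTimeout(() => console.log('timer fired'), 0); // queued on the event loop
  for (let i = 0; i < 5; i++) {
    await Promise.resolve(); // only defers to the microtask queue
    console.log('still looping', i);
  }
  // 'timer fired' is logged only here, after the whole loop is done
}
microtaskOnlyYield();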
Calling code pattern:
doWork()
  .then( data => console.log("work done"));
Work code:
async function doWork() {
  for( let i = 1; i < 10000; ++i) {
    // do stuff
    if( i % 1000 == 0) {
      // let other things happen:
      await new Promise( resolve => setTimeout(resolve, 4));
    }
  }
}
Note this draws on historical practice and might suit the purpose of getting prototype code working quickly. I wouldn't think it particularly suitable for a commercial production environment.
Worker Threads
A localhost server can be used to serve worker code from a URL so development can proceed. A common method is to use a node/express server listening on a port of the loopback address known as localhost.
You will need to install Node and install Express using npm (which is installed with Node). It is not my intention to go into the node/express ecosystem - there is abundant material about it on the web.
If you are still looking for a minimalist static file server to serve files from the current working directory, here's one I wrote earlier. Again, there are any number of similar examples available on the net.
"use strict";
/*
* express-cwd.js
* node/express server for files in current working directory
* Terminal or shortcut/desktop launcher command: node express-cwd
*/
const express = require('express');
const path = require('path');
const process = require("process");
const app = express();
app.get( '/*', express.static( process.cwd())); // serve files from working directory
const ip='::1'; // local host
const port=8088; // port 8088
const server = app.listen(port, ip, function () {
  console.log( path.parse(module.filename).base + ' listening at http://localhost:%s', port);
});
Promise Delays
The inline promise delay shown in "Work code" above can be written as a function (not called yield, since that is a reserved word). For example:
const timeOut = msec => new Promise( r=>setTimeout(r, msec));
An example of executing blocking code in sections:
"use strict";
// update page every 500 msec
const counter = document.getElementById("count");
setInterval( ()=> counter.textContent = +counter.textContent + 1, 500);
function block_500ms() {
let start = Date.now();
let end = start + 500;
for( ;Date.now() < end; );
}
// synchronously block for 4 seconds
document.getElementById("block")
.addEventListener("click", ()=> {
for( var i = 8; i--; ) {
block_500ms();
}
console.log( "block() done");
});
// block for 500 msec 8 times, with timeout every 500 ms
document.getElementById("block8")
.addEventListener("click", async ()=> {
for( var i = 8; i--; ) {
block_500ms();
await new Promise( resolve=>setTimeout(resolve, 5))
}
console.log("block8() done");
});
const timeOut = msec => new Promise( r=>setTimeout(r, msec));
document.getElementById("blockDelay")
.addEventListener("click", async ()=> {
for( var i = 8; i--; ) {
block_500ms();
await timeOut();
}
console.log("blockDelay(1) done");
});
Up Counter: <span id="count">0</span>
<p>
<button id="block" type="button" >Block counter for 4 seconds</button> - <strong> with no breaks</strong>
<p>
<button id="block8" type="button" >Block for 4 seconds </button> - <strong> with short breaks every 500 ms (inline)</strong>
<p>
<button id="blockDelay" type="button" >Block for 4 seconds </button> - <strong> with short breaks every 500 ms (using promise function) </strong>
Some jerkiness may be noticeable between interleaved sections of blocking code, but a total freeze is avoided. Timeout values are determined by experiment - the shorter the value that still works acceptably, the better.
Caution
Program design must ensure that variables holding input data, intermediate results and accumulated output data are not corrupted by main-thread code that may or may not run part-way through the heavy code's execution.
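As a small illustration of that caution (all names here are hypothetical), taking a private snapshot of the input before the first await keeps later mutations from corrupting the calculation:
let sharedInput = [1, 2, 3, 4]; // may be modified by other main-thread code

async function sumSlowly() {
  const snapshot = sharedInput.slice(); // copy before yielding
  let total = 0;
  for (let i = 0; i < snapshot.length; i++) {
    total += snapshot[i];
    // yield to the event loop; sharedInput may change here,
    // but the snapshot we are summing cannot
    await new Promise(resolve => setTimeout(resolve, 4));
  }
  return total;
}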

Emscripten sandwiched by asynchronous Javascript Code

I'm trying to use Emscripten to write software that runs in the browser but also on other architectures (e.g. Android, a standalone PC app).
The Software structure is something like this:
main_program_loop() {
    if (gui.button_clicked()) {
        run_async(some_complex_action, gui.text_field.to_string())
    }
    if (some_complex_action_has_finished())
    {
        make_use_of(get_result_from_complex_action());
    }
}
some_complex_action(string_argument)
{
    some_object = read_local(string_argument);
    interm_res = simple_computation(some_object);
    other_object = expensive_computation(interm_res);
    send_remote(some_object.member_var, other_object);
    return other_object.member_var;
}
Let's call main_program_loop the GUI or frontend, some_complex_action the intermediate layer, and read_local, send_remote and expensive_computation the backend or lower layer.
Now, the frontend and backend would be architecture-specific (e.g. for JavaScript, read_local could use IndexedDB and send_remote could use fetch),
but the intermediate layer should make up more than 50% of the code (that's why I do not want to write it twice in two different languages, and instead write it once in C and transpile it to JavaScript; for Android I would use JNI).
Problems come in because in JavaScript the functions on the lowest layer (fetch etc.) run asynchronously (they return a promise or require a callback).
One approach I tried was to use promises and send IDs through the intermediate layer:
var promises = {};
var last_id = 0;

function handle_click() {
  var id = Module.ccall('some_complex_action', 'number', ['string'], [text_field.value]);
  promises[id].then((result) => make_use_of(result));
}

recv_remote: function(str) {
  promises[last_id] = fetch(get_url(str)).then((response) => response.arrayBuffer());
  last_id += 1;
  return last_id - 1;
}
It works for the simple case of
some_complex_action(char *str)
{
    return recv_remote(str);
}
But for real cases it seems to be getting really complicated, maybe impossible. (I tried an approach where I gave every function a state, and every time a backend function finishes the function is called again and advances its state, but the code started getting complicated like hell.) To compare: if I were to call some_complex_action from C or Java, I'd just call it in a thread separate from the GUI thread, and inside that thread everything would happen synchronously.
I wish I could just call some_complex_action from an async function and put await inside recv_remote, but of course I can only put await directly in the async function, not in some function called down the line. So that idea did not work out either.
Ideally I could somehow stop execution of the intermediate Emscripten-transpiled code until the backend function has completed, then return from the backend function with the result and continue executing the transpiled code.
Has anyone used Emterpreter and can imagine that it could help me get to my goal?
Any ideas what I could do?

Stopping synchronous function after 2 seconds

I'm using the npm library jsdiff, which has a function that determines the difference between two strings. This is a synchronous function, but given two large, very different strings, it will take extremely long periods of time to compute.
diff = jsdiff.diffWords(article[revision_comparison.field], content[revision_comparison.comparison]);
This function is called in a stack that handles a request through Express. How can I, for the sake of the user, make the experience more bearable? I think my two options are:
Cancelling the synchronous function somehow.
Cancelling the user request somehow. (But would this keep the function still running?)
Edit: I should note that, given two very large and different strings, I want different logic to take place in the code. Therefore simply waiting for the process to finish is unnecessary and an unnecessary load - I definitely don't want it to run for any long period of time.
Fork a child process for that specific task; you can even create a queue to limit the number of child processes that can run at any given moment.
Here is a basic example of a worker that receives the relevant request data from the master, performs the heavy synchronous operation without blocking the main (master) thread, and once it has finished returns the outcome to the master. Note that the Express req and res objects themselves cannot be serialized over the IPC channel, so only plain data is sent, and the master keeps track of which response belongs to which job.
Worker (Fork Example) :
// worker.js
process.on('message', function(msg) {
  /* > Your jsdiff logic goes here */
  // change this for your heavy synchronous work:
  var input = msg.input;
  var outcome = false;
  if (input == 'testlongerstring') { outcome = true; }
  // Pass results back to parent process, tagged with the job id:
  process.send({ id: msg.id, outcome: outcome });
});
And from your Master :
var cp = require('child_process');
var child = cp.fork(__dirname + '/worker.js');
var pending = {}; // job id -> Express res
var nextId = 0;
child.on('message', function(msg) {
  // Receive results from child process and finish the matching request
  console.log('received: ' + msg.outcome);
  var res = pending[msg.id];
  delete pending[msg.id];
  res.send(msg.outcome); // end response with data
});
You can perfectly well send some work to the child along with the data it needs like this (from the Master): (imagine app = express)
app.get('/stringCheck/:input', function(req, res) {
  var id = nextId++;
  pending[id] = res; // remember where to send the result
  child.send({ id: id, input: req.params.input });
});
I found this on jsdiff's repository:
All methods above which accept the optional callback method will run in sync mode when that parameter is omitted and in async mode when supplied. This allows for larger diffs without blocking the event loop. This may be passed either directly as the final parameter or as the callback field in the options object.
This means that you should be able to add a callback as the last parameter, making the function asynchronous. It will look something like this:
jsdiff.diffWords(article[x], content[y], function(err, diff) {
  // add whatever you need
});
Now, you have several choices:
Return directly to the user and keep the function running in the background.
Set a 2 second timeout (or whatever limit fits your application) using setTimeout, as outlined in this answer.
If you go with option 2, your code should look something like this:
var finished = false;
jsdiff.diffWords(article[x], content[y], function(err, diff) {
  if (finished) return; // the timeout below already fired
  finished = true;
  return callback(err, diff);
});
// if this fires first, the diff took more than 2000ms (2 seconds)
setTimeout(function() {
  if (finished) return; // the diff already completed in time
  finished = true;
  return callback();
}, 2000);

Balancing clean design against browser limits

I'm trying to design an interface to a model (part of an MVC/MVP design) which could represent data stored remotely (on a server) or locally. This requires the interface to be asynchronous, i.e. I request some data from the model and a callback provides the actual data. However, the model could also be stored locally, in which case the request could be satisfied synchronously.
The issue I am running into is that there could conceivably be a lot of calls to the model to generate the view, but if the model is synchronous this could result in a stack overflow, since each callback recurses one level deeper and no browser JavaScript implementation supports tail-call optimization. I thought about using setTimeout, but its minimum delay is 4ms, and anything approaching a stack overflow (which under my version of Chrome is a bit over 10k frames) would be unacceptably slow with setTimeout (over 40s!) because of all those unnecessary 4ms waits between iterations through the model.
So I'm wondering how to solve this problem, and I'm not coming up with any good solutions. The browser doesn't seem capable of doing what I need it to, and I can't have two different interfaces - one for local and one for remote - because the calling code shouldn't have to care.
Update: Here is the source of the recursion:
I'm using continuation.js to turn synchronous-looking code like this:
function test2() {
  var i
  for (i=0;i<10;i++) {
    AJaX(some_query+i,cont(tab))
    console.log(tab.status)
  }
}
Into this:
function test2() {
  var i, tab;
  i = 0;
  function _$loop_0(_$loop_0__$cont) {
    if (i < 10) {
      AJaX(some_query + i, function (arguments, _$param0) {
        tab = _$param0;
        console.log(tab.status);
        i++;
        _$loop_0(_$loop_0__$cont);
      }.bind(this, arguments));
    } else {
      _$loop_0__$cont();
    }
  }
  _$loop_0(function () {
  });
}

Node.js and Mutexes

I'm wondering if mutexes/locks are required for data access within Node.js. For example, let's say I've created a simple server. The server provides a couple of protocol methods to add to and remove from an internal array. Do I need to protect the internal array with some type of mutex?
I understand JavaScript (and thus Node.js) is single-threaded. I'm just not clear on how events are handled. Do events interrupt? If so, my app could be in the middle of reading the array, get interrupted to run an event callback which changes the array, and then continue processing the array, which has now been changed by the event callback.
Locks and mutexes are indeed necessary sometimes, even if Node.js is single-threaded.
Suppose you have two files that must have the same content and not having the same content is considered an inconsistent state. Now suppose you need to change them without blocking the server. If you do this:
fs.writeFile('file1', 'content', function (error) {
  if (error) {
    // ...
  } else {
    fs.writeFile('file2', 'content', function (error) {
      if (error) {
        // ...
      } else {
        // ready to continue
      }
    });
  }
});
you fall into an inconsistent state between the two calls, when another function in the same script may be able to read the two files.
The rwlock module is perfect to handle these cases.
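As a rough illustration of the idea (a hand-rolled promise-based mutex, not the rwlock module's own API), the two writes can be serialized so that no other section holding the same lock runs in between:
const fs = require('fs');

// minimal promise-based mutex: lock() resolves with a release function
function createMutex() {
  let tail = Promise.resolve();
  return function lock() {
    let release;
    const next = new Promise(resolve => { release = resolve; });
    const acquired = tail.then(() => release);
    tail = tail.then(() => next);
    return acquired;
  };
}

const lock = createMutex();

async function writeBoth(content) {
  const release = await lock(); // wait for any other locked section to finish
  try {
    await fs.promises.writeFile('file1', content);
    await fs.promises.writeFile('file2', content); // no other locked code runs in between
  } finally {
    release(); // let the next waiter proceed
  }
}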
I'm wondering if mutexes/locks are required for data access within Node.js.
Nope! Events are handled the moment there's no other code to run, this means there will be no contention, as only the currently running code has access to that internal array. As a side-effect of node being single-threaded, long computations will block all other events until the computation is done.
I understand Javascript (and thus Node.js) is single threaded. I'm just not clear on how events are handled. Do events interrupt?
Nope, events are not interrupted. For example, if you put a while(true){} into your code, it would stop any other code from being executed, because there is always another iteration of the loop to be run.
If you have a long-running computation, it is a good idea to use process.nextTick, as this will allow it to be run when nothing else is running (I'm fuzzy on this: the example below shows that I'm probably right about it running uninterrupted, probably).
If you have any other questions, feel free to stop into #node.js and ask questions. Also, I asked a couple people to look at this and make sure I'm not totally wrong ;)
var count = 0;
var numIterations = 100;
while(numIterations--) {
  process.nextTick(function() {
    count = count + 1;
  });
}
setTimeout(function() {
  console.log(count);
}, 2);
//
//=> 100
//
Thanks to AAA_awright of #node.js :)
I was looking for a solution for Node mutexes. Mutexes are sometimes necessary - you could be running multiple instances of your Node application and may want to ensure that only one of them is doing some particular thing. All the solutions I could find were either not cross-process or depended on Redis.
So I made my own solution using file locks: https://github.com/Perennials/mutex-node
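As a rough sketch of the file-lock idea (not the mutex-node API itself), the 'wx' flag makes the open call fail if the lock file already exists, which gives atomic cross-process acquisition on a local filesystem:
const fs = require('fs');

// try to acquire the lock; resolves true if we got it, false if another process holds it
async function tryLock(lockPath) {
  try {
    const handle = await fs.promises.open(lockPath, 'wx'); // fails with EEXIST if taken
    await handle.close();
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// release by deleting the lock file
// note: a crashed process can leave a stale lock file behind; real implementations handle this
async function unlock(lockPath) {
  await fs.promises.unlink(lockPath);
}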
Mutexes are definitely necessary for a lot of back end implementations. Consider a class where you need to maintain synchronicity of async execution by constructing a promise chain.
let _ = new WeakMap();

class Foobar {
  constructor() {
    _.set(this, { pc: Promise.resolve() });
  }
  doSomething(x) {
    return new Promise((resolve, reject) => {
      // chain onto the stored promise so calls run strictly one after another
      _.get(this).pc = _.get(this).pc.then(() => {
        const y = x; // placeholder for some value obtained asynchronously
        resolve(y);
      });
    });
  }
}
How can you be sure that a promise is not left dangling via a race condition? It's frustrating that Node hasn't made mutexes native, since JavaScript is so inherently asynchronous and bringing third-party modules into the process space is always a security risk.
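For completeness, a brief usage sketch of the class above (the argument values are made up for illustration); because each call chains onto the stored promise, the second doSomething only resolves after the first:
const f = new Foobar();

f.doSomething('first').then(v => console.log('got', v));
f.doSomething('second').then(v => console.log('got', v));
// logs 'got first' before 'got second', regardless of timing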
