I'm building a site and one particular operation triggers a long, server-side process to run. This operation can't be run twice at the same time, so I need to implement some sort of protection. It also can't be made synchronous, because the server needs to continue responding to other requests while it runs.
To that end I've constructed this small concept test, using sleep 5 as a substitute for my actual long-running process (requires express and child-process-promise, runs on a system with a sleep command but substitute whatever for Windows):
var site = require("express")();
var exec = require("child-process-promise").exec;
var busy = false;
site.get("/test", function (req, res) {
if (busy) {
res.json({status:"busy"});
} else {
busy = true; // <-- set busy before we start
exec("sleep 5").then(function () {
res.json({status:"ok"});
}).catch(function (err) {
res.json({status:err.message});
}).then(function () {
busy = false; // <-- finally: clear busy
});
}
});
site.listen(8082);
The intention of this is when "/test" is requested it triggers a long operation, and if "/test" is requested again while it is running, it replies with "busy" and does nothing.
My question is, is this implementation safe and correct? It appears to work in my cursory tests but it's suspiciously simple. Is this the proper way to essentially implement a mutex + a "try-lock" operation, or is there some more appropriate Node.js construct? Coming from languages where I'm used to standard multithreading practices, I'm not quite comfortable with Node's single-threaded-but-asynchronous nature yet.
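You're fine. JavaScript code can't run concurrently with other JS code in Node: each handler runs to completion before the next event is processed, so nothing will change the busy flag out from under you between the check and the assignment. No need for multithreading-style monitors or critical sections.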
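If the same guard is needed in more than one handler, the flag can be wrapped in a small helper. This is only a sketch: `makeTryLock` and `tryRun` are hypothetical names (not part of Node.js or express), and a 50 ms timer stands in for the real long-running process.

```javascript
// Wraps an async task so that only one invocation can be in flight at a time.
// makeTryLock is a hypothetical helper name, not a Node.js or express API.
function makeTryLock(task) {
  let busy = false;
  return function tryRun(...args) {
    if (busy) {
      return null; // "try-lock" failed: a previous run has not settled yet
    }
    busy = true; // no other JS can run between the check above and this line
    return new Promise((resolve) => resolve(task(...args)))
      .finally(() => { busy = false; }); // released on success and on failure
  };
}

// Example: a 50 ms timer stands in for the long-running process.
const runJob = makeTryLock(
  () => new Promise((resolve) => setTimeout(() => resolve("ok"), 50))
);
const first = runJob();  // starts the job and returns a promise
const second = runJob(); // returns null because the first run is still pending
```

In the express handler from the question, a `null` return would map to the `{status:"busy"}` reply.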
I'm trying to run a piece of JavaScript code asynchronously to the main thread. I don't necessarily need the code to actually run on a different thread (so performance does not need to be better than sequential execution), but I want the code to be executed in parallel to the main thread, meaning no freezing.
Additionally, all the code needed needs to be contained within a single function.
My example workload is as follows:
function work() {
for(let i=0; i<100000; i++)
console.log("Async");
}
Additionally, I may have some work on the main thread (which is allowed to freeze the site, just for testing):
function seqWork() {
for(let i=0; i<100000; i++)
console.log("Sequential");
}
The expected output should be something like this:
Sequential
Async
Sequential
Sequential
Async
Sequential
Async
Async
...
You get the point.
Disclaimer: I am absolutely inexperienced in JavaScript and in working with async and await.
What I've tried
I did some research, and found these 3 options:
1. async/await
Seems like the obvious choice. So I tried this:
let f= async function f() {
await work();
}
f();
seqWork();
Output:
Async (100000)
Sequential (100000)
I also tried:
let f = async function f() {
let g = () => new Promise((res,rej) => {
work();
res();
});
await g();
}
f();
seqWork();
Output:
Async (100000)
Sequential (100000)
So both methods did not work. They also both freeze the browser during the async output, so it seems that that has absolutely no effect(?) I may be doing something very wrong here, but I don't know what.
2. Promise.all
This seems to be praised as the solution for any expensive task, but only seems like a reasonable choice if you have many blocking tasks and you want to "combine" them into just one blocking task that is faster than executing them sequentially. There are certainly use cases for this, but for my task it is useless, because I only have one task to execute asynchronously, and the main "thread" should keep running during that task.
3. Worker
This seemed to me like the most promising option, but I have not got it working yet. The main problem is that you seem to need a second script. I cannot do that, but even in local testing with a second file Firefox is blocking the loading of that script.
This is what I've tried, and I have not found any other options in my research. I'm starting to think that something like this is straight up not possible in JS, but it seems like quite a simple task. Again, I don't need this to be actually executed in parallel; it would be enough if the event loop alternated between calling a statement from the main thread and one from the async "thread". Java, where I'm coming from, is also able to simulate multithreading on a single hardware thread.
Edit: Context
I have some Java code that gets converted to JavaScript (I have no control over the conversion) using TeaVM. Java natively supports multithreading, and a lot of my code relies on that being possible. Now since JavaScript apparently does not really support real multithreading, TeaVM converts Thread in the most simplistic way to JS: calling Thread.start() directly calls Thread.run(), which makes it completely unusable. I want to create a better multithreading emulation here which can, pretty much, execute the thread code without modification. It is not ideal, but inserting "yielding" statements into the Java code would be possible.
TeaVM has a handy feature which allows you to write native Java methods annotated with matching JS code that will be converted directly into that code. Problem is, you cannot set the method body so you can't make it an async method.
One thing I'm now trying to do is implement a JS native "yield" / "pause" (to not use the keyword in JS) function which I can call to allow for other code to run right from the java method. The method basically has to briefly block the execution of the calling code and instead invoke execution of other queued tasks. I'm not sure whether that is possible with the main code not being in an async function. (I cannot alter the generated JS code)
The only other way I can think of to work around this would be to let the JS method call all the blocking code, referring back to the Java code. The main problem, though, is that this means splitting up the body of the Java method into many small chunks, as Java does not support something like yield return from C#. This basically means a complete rework of every single piece of code executed in parallel, which I would desperately try to avoid. Also, you could not "yield" from within a called method, making it way less modular. At that point I may as well just call the method chunks from within Java directly from an internal event loop.
Since JavaScript is single threaded the choice is between
running some code asynchronously in the main thread, or
running the same code in a worker thread (i.e. one that is not the main thread).
Cooperative Multitasking
If you want to run heavy code in the main thread without undue blocking it would need to be written to multitask cooperatively. This requires long running synchronous tasks to periodically yield control to the task manager to allow other tasks to run. In terms of JavaScript you could achieve this by running a task in an asynchronous function that periodically waits for a system timer of short duration. This has potential because await saves the current execution context and returns control to the task manager while an asynchronous task is performed. A timer call ensures that the task manager can actually loop and do other things before returning control to the asynchronous task that started the timer.
Awaiting a promise that is already fulfilled would only interleave execution of jobs in the microtask queue without returning to the event loop proper, and is not suitable for this purpose.
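The difference can be seen in a small experiment. This is a sketch for illustration only (the `order` array and the labels are made up here): a zero-delay timer is queued first, yet every `await` on an already-fulfilled promise finishes before it fires.

```javascript
const order = [];

// A timer callback is a task handled by the event loop proper.
setTimeout(() => order.push("timer"), 0);

// Awaiting already-fulfilled promises only schedules microtasks.
(async () => {
  for (let i = 0; i < 3; i++) {
    await Promise.resolve();
    order.push("microtask " + i);
  }
})();

// Inspect the order once everything has run.
setTimeout(() => console.log(order.join(", ")), 20);
// prints "microtask 0, microtask 1, microtask 2, timer"
```

The timer callback, standing in for "other things" such as rendering or input handling, only gets its turn after the whole loop has finished, which is why the delay needs to be a real timer.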
Calling code pattern:
doWork()
.then( data => console.log("work done"));
Work code:
async function doWork() {
for( let i = 1; i < 10000; ++i) {
// do stuff
if( i%1000 == 0) {
// let other things happen:
await new Promise( resolve=>setTimeout(resolve, 4))
}
}
}
Note this draws on historical practice and might suit the purpose of getting prototype code working quickly. I wouldn't think it particularly suitable for a commercial production environment.
Worker Threads
A localhost server can be used to serve worker code from a URL so development can proceed. A common method is to use a node/express server listening on a port of the loopback address known as localhost.
You will need to install node and install express using NPM (which is installed with node). It is not my intention to go into the node/express eco-system - there is abundant material about it on the web.
If you are still looking for a minimalist static file server to serve files from the current working directory, here's one I wrote earlier. Again there are any number of similar examples available on the net.
"use strict";
/*
* express-cwd.js
* node/express server for files in current working directory
* Terminal or shortcut/desktop launcher command: node express-cwd
*/
const express = require('express');
const path = require('path');
const process = require("process");
const app = express();
app.use( express.static( process.cwd())); // serve files from working directory
const ip='::1'; // local host
const port=8088; // port 8088
const server = app.listen(port, ip, function () {
console.log( path.parse(module.filename).base + ' listening at http://localhost:%s', port);
})
Promise Delays
The inlined promise delay shown in "work code" above can be written as a function (not called yield, which is a reserved word). For example:
const timeOut = msec => new Promise( r=>setTimeout(r, msec));
An example of executing blocking code in sections:
"use strict";
// update page every 500 msec
const counter = document.getElementById("count");
setInterval( ()=> counter.textContent = +counter.textContent + 1, 500);
function block_500ms() {
let start = Date.now();
let end = start + 500;
for( ;Date.now() < end; );
}
// synchronously block for 4 seconds
document.getElementById("block")
.addEventListener("click", ()=> {
for( var i = 8; i--; ) {
block_500ms();
}
console.log( "block() done");
});
// block for 500 msec 8 times, with timeout every 500 ms
document.getElementById("block8")
.addEventListener("click", async ()=> {
for( var i = 8; i--; ) {
block_500ms();
await new Promise( resolve=>setTimeout(resolve, 5))
}
console.log("block8() done");
});
const timeOut = msec => new Promise( r=>setTimeout(r, msec));
document.getElementById("blockDelay")
.addEventListener("click", async ()=> {
for( var i = 8; i--; ) {
block_500ms();
await timeOut(5);
}
console.log("blockDelay(1) done");
});
Up Counter: <span id="count">0</span>
<p>
<button id="block" type="button" >Block counter for 4 seconds</button> - <strong> with no breaks</strong>
<p>
<button id="block8" type="button" >Block for 4 seconds </button> - <strong> with short breaks every 500 ms (inline)</strong>
<p>
<button id="blockDelay" type="button" >Block for 4 seconds </button> - <strong> with short breaks every 500 ms (using promise function) </strong>
Some jerkiness may be noticeable with interleaved sections of blocking code but the total freeze is avoided. Timeout values are determined by experiment - the shorter the value that works in an acceptable manner the better.
Caution
Program design must ensure that variables holding input data, intermediate results and accumulated output data are not corrupted by main thread code that may or may not be executed part way through the course of heavy code execution.
I have a wasm process (compiled from C++) that processes data inside a web application. Let's say the necessary code looks like this:
std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
process_data(data[i]);
if (i % 1000 == 0) {
bool is_cancelled = check_if_cancelled();
if (is_cancelled) {
break;
}
}
}
This code basically "runs/processes a query", similar to a SQL query interface.
However, queries may take several minutes to run/process and at any given time the user may cancel their query. The cancellation process would occur in the normal javascript/web application, outside of the service Worker running the wasm.
My question then is: what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process, so that it knows the query has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker (it needs to stay alive with its stored data, so another query can be run...).
What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?
Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.
Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:
1. Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...
2. Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, it destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.
3. Always run a second worker and in the UI just display the most recent. Feasible? No, it would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).
4. Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.
5. Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require a tremendous amount of rewriting of the parsing functions so that every function supports this (the actual data parsing is handled in several functions: filter, search, calculate, group by, sort, etc.).
6. Use IndexedDB and store the state in JavaScript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read the database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.
[Anything else?]
In the bounty then I was wondering three things:
If the above six analyses seem generally valid?
Are there other (perhaps better) approaches I'm missing?
Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.
For Chrome (only) you may use shared memory (a shared buffer as memory) and raise a flag in memory when you want to halt. I'm not a big fan of this solution (it is complex and is supported only in Chrome). It also depends on how your query works, and whether there are places where the lengthy query can check the flag.
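A minimal sketch of the flag itself, seen from the JavaScript side, might look like the following. All helper names (`createCancelFlag`, `requestCancel`, `clearCancel`, `isCancelled`, `processAll`) are hypothetical, and `processAll` is a plain-JS stand-in for the C++ loop. In a real setup the flag's buffer would be handed to the worker with postMessage, and how the wasm code reads the same memory depends on the toolchain and is not shown. Browsers may also only expose SharedArrayBuffer under certain security conditions.

```javascript
// One 32-bit slot in shared memory acts as the cancel flag.
// All function names here are hypothetical helpers, not a library API.
const CANCEL_SLOT = 0;

function createCancelFlag() {
  return new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
}

// Called on the main thread when the user clicks "cancel".
function requestCancel(flag) {
  Atomics.store(flag, CANCEL_SLOT, 1);
}

// Called before starting the next query.
function clearCancel(flag) {
  Atomics.store(flag, CANCEL_SLOT, 0);
}

// Called from inside the long-running loop on the worker side.
function isCancelled(flag) {
  return Atomics.load(flag, CANCEL_SLOT) === 1;
}

// Stand-in for the processing loop: polls the flag every 1000 items.
function processAll(items, flag, processItem) {
  let processed = 0;
  for (let i = 0; i < items.length; i++) {
    if (i % 1000 === 0 && isCancelled(flag)) {
      break; // stop early, keeping all loaded data intact
    }
    processItem(items[i]);
    processed++;
  }
  return processed;
}
```

Because the flag lives in shared memory, the worker does not need to return to its event loop to notice the cancellation, which is what makes this approach different from sending a message.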
Instead you should probably call the C++ function multiple times (e.g. for each query) and check if you should halt after each call (just send a message to the worker to halt).
What I mean by multiple times is to make the query in stages (multiple function calls for a single query). It may not be applicable in your case.
Regardless, AFAIK there is no way to send a signal to a WebAssembly execution (e.g. Linux kill). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.
I'm attaching a code snippet that may explain this idea.
worker.js:
... init webassembly
onmessage = function(q) {
// query received from main thread.
const result = ... call webassembly(q);
postMessage(result);
}
main.js:
const worker = new Worker("worker.js");
let cancel = false; // both flags are reassigned below, so they cannot be const
let processing = false;
worker.onmessage = function(r) {
// when worker has finished processing the query.
// r is the results of the processing.
processing = false;
if (cancel === true) {
// processing is done, but result is not required.
// instead of showing the results, update that the query was canceled.
cancel = false;
... update UI "canceled".
return;
}
... update UI "results r".
};
function onCancel() {
// Occurs when user clicks on the cancel button.
if (cancel) {
// sanity test - prevent this in UI.
throw "already cancelling";
}
cancel = true;
... update UI "canceling".
}
function onQuery(q) {
if (processing === true) {
// sanity test - prevent this in UI.
throw "already processing";
}
processing = true;
// Send the query to the worker.
// When the worker receives the message it will process the query via webassembly.
worker.postMessage(q);
}
An idea from user experience perspective:
You may create ~two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once. (it will just mean that in the backend the 2nd worker will run the next query, and when the 1st finishes the cancellation, cancellation will again become immediate).
Shared Thread
Since the worker and the C++ function that it called share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think a solid option would be to minimize the amount of time that the thread is blocked by instead initiating one iteration at a time from the main application.
It would look something like this.
main.js -> worker.js -> C++ function -> worker.js -> main.js
Breaking up the Loop
Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory.
C++ function then performs one iteration of the loop, increments the variable to keep track of loop position, and immediately breaks.
int x;
x = 0; // initialized counter at 0
std::vector<JSONObject> data;
for (size_t i = x; i < data.size(); i++)
{
process_data(data[i]);
x++; // increment counter
break; // stop function until told to iterate again starting at x
}
Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.
Canceling the Operation
From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)
let continueOperation = true
// here you can set to false at any time since the thread is not blocked here
worker.expensiveThreadBlockingFunction()
// results in one iteration of the loop being iterated until message is received below
worker.onmessage = function(e) {
if (continueOperation) {
worker.expensiveThreadBlockingFunction()
// execute worker function again, ultimately continuing the increment in C++
} else {
return false
// or send message to worker to reset C++ counter to prepare for next execution
}
}
Continuing the Operation
Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.
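The bookkeeping described above (a persistent position, one slice of work per call, and a distinct "done" signal) can be sketched in plain JavaScript. `createChunkedJob` and `runJob` are hypothetical names. The synchronous `while` loop in `runJob` only stands in for the main.js/worker.js message round trip; in real code each `step()` would be triggered by a message so that a cancel can arrive between steps.

```javascript
// Hypothetical helper: keeps the loop position between calls so that each
// call to step() blocks the thread only for one chunk of work.
function createChunkedJob(items, chunkSize, processItem) {
  let position = 0;
  return {
    step() {
      const end = Math.min(position + chunkSize, items.length);
      for (; position < end; position++) {
        processItem(items[position]);
      }
      // "done" is the distinct completion signal; "position" is where to resume.
      return { done: position >= items.length, position };
    },
    reset() {
      position = 0; // prepare for the next query
    },
  };
}

// Stand-in for the message round trip: keeps stepping until the job is
// complete or shouldContinue() says the user has cancelled.
function runJob(job, shouldContinue) {
  let status = { done: false, position: 0 };
  while (!status.done && shouldContinue(status)) {
    status = job.step();
  }
  return status;
}
```

Using a chunk larger than one item keeps the messaging overhead down; the right size is a trade-off between throughput and how quickly a cancel takes effect.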
Callbacks are asynchronous, so does that mean that if I run a lengthy computation in a callback it won't affect my main thread?
For example:
function compute(req,res){ // this is called in an expressjs route.
db.collection.find({'key':aString}).toArray(function(err, items) {
for(var i=0;i<items.length;i++){ // items length may be in thousands.
// Heavy/lengthy computation here, Which may take 5 seconds.
}
res.send("Done");
});
}
So, the call to the database is asynchronous. Does that mean the for loop inside the callback will NOT block the main thread?
And if it is blocking, How may I perform such things in an async way?
For the most part, node.js runs in a single thread. However, node.js allows you to make calls that execute low-level operations (file reads, network requests, etc.) which are handled by separate threads. As such, your database call most likely happens on a separate thread. But, when the database call returns, we return back to the main thread and your code will run in the main thread (blocking it).
The way to get around this is to move the computation out of the main thread by spinning up a separate process. You can use cluster to do this. See:
http://nodejs.org/api/cluster.html
Your main program will make the database call
When the database call finishes, it will call fork() and spin up a new process that runs your-calculations.js and sends an event to it with any input data
your-calculations.js will listen for an event and do the necessary processing when it handles the event
your-calculations.js will then send an event back to the main process when it has finished processing (it can send any output data back)
If the main process needs the output data, it can listen for the event that your-calculations.js emits
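The same send-input, compute, send-result flow can be tried in a single file. Note that this sketch swaps in Node's worker_threads module (a thread inside the same process) in place of cluster/fork(), purely so that it is self-contained. `computeOffThread` and the sum-of-squares workload are made-up stand-ins for your-calculations.js.

```javascript
const { Worker } = require("worker_threads");

// Source of the worker, kept inline so the sketch fits in one file.
// It plays the role of your-calculations.js.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let sum = 0;
  for (const n of workerData.items) {
    sum += n * n; // stand-in for the heavy computation
  }
  parentPort.postMessage(sum); // send the output data back
`;

// Hypothetical helper: runs the computation off the main thread and
// resolves with the result once the worker reports back.
function computeOffThread(items) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true, // treat the first argument as code, not a file path
      workerData: { items },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error("worker exited with code " + code));
    });
  });
}

computeOffThread([1, 2, 3]).then((result) => console.log(result)); // prints 14
```

While the worker is busy, the main thread stays free to answer other requests; in the express route, res.send() would go inside the `.then()` callback.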
If you can't use, or don't want to use, a separate thread, you can split up the long computation with setImmediate calls, e.g. (written quickly on my tablet, so it may be sloppy):
function compute(startIndex, max, array, partialResult, callback) {
var done = false;
var err = null;
var stop = startIndex + 100; // or some reasonable amount of calcs...
if (stop >= max) {
stop = max;
done = true;
}
// do calc from startIndex to stop, using partialResult as input
if (done)
callback(err, partialResult);
else
// setImmediate is a global function; extra arguments are passed on to compute.
// The idea is that you call yourself again with the start index advanced by 100.
setImmediate(compute, stop, max, array, partialResult, callback);
}
In between every 100 calculations node will have time to process other requests, handle other callbacks, etc. Of course, if they trigger another huge calculation, eventually things will grind to a halt.
For this problem I am using Node-Webkit (node.js) and Async, loading a Windows app.
The reason for this question is to definitively answer:
What asynchronous execution really means in JavaScript and Node.js.
My personal code problem is at the end of the question, under "The Case".
I am going to explain the problem I have with a schematic summary. (And I will update the info as you help me understand it.)
The Concept (theory)
Imagine a Primary Screen (JS, Html, css, ... Node.js frameworks) and a Background Procedure (JS execution every 10 min, JS internal checks, background Database Optimization, ...).
Whatever you do in the Primary Screen won't affect background execution (except in some important cases), and the Background can even change the Screen if it needs to (screen timers, info about online web status, ...)
Then the behaviour is like:
Thread 1: Your actions inside the App framework. Thread 2: Background App routines
Each action writes its output to the screen as soon as it finishes, regardless of the other actions running in parallel
The Interpretation (For me)
I think this is something that "Async" will handle without problems, as parallel execution.
async.parallel([
function(){ ... },
function(){ ... }
], callback); //optional callback
So Thread 1 and Thread 2 can work together correctly as long as they do not affect the same code or instruction.
The Content will change whenever either thread requests something of/to it.
The Implementation (Reality)
The code is not fully asynchronous during execution; there are synchronous parts with common actions that call the asynchronous code when needed.
Sync: Startup with containers -> Async: load multiple content and do general stuff -> Sync: Do an action in the screen -> ...
The Case
So here is my code, which is not working properly:
win.on('loaded', function() {
$( "#ContentProgram" ).load( "view/launcherWorkSpace.html", function() {
$("#bgLauncher").hide();
win.show();
async.parallel([
function() //**Background Process: Access to DB and return HTML content**
{
var datacontent = new data.GetActiveData();
var exeSQL = new data.conn(datacontent);
if(exeSQL.Res)
{
var r = exeSQL.Content;
if(r.Found)
{
logData = new data.activeSData(r);
$('#RelativeInfo').empty();
$("#RelativeInfo").html("<h4 class='text-success'>Data found: <b>" + logData.getName + "</b></h4>");
}
}
},
function() //**Foreground Process: See an effect on screen during load.**
{
$("#bgLauncher").fadeIn(400);
$("#centralAccess").delay(500).animate({bottom:0},200);
}
]);
});
});
As you can see, I'm not using "Callback()" because I don't need to (and it does the same).
I want the Foreground Process to run even if the Background Process has not finished, but the results of both only appear at the same time, once both requests have finished...
If I disconnect the DB manually, the first function takes 3 seconds before it throws an exception (which I won't handle). Until then, neither process outputs (shows on screen) anything. (The Foreground Process should be launched whatever happens to the Background Process.)
Thanks, and sorry for so much explanation for something that looks trivial.
EDITED
This is starting to get annoying... I tried without Async, just plain JavaScript with a callback, like this:
launchEffect(function () {
var datacontent = new data.GetActiveData();
var exeSQL = new data.conn(datacontent);
if(exeSQL.Res)
{
var r = exeSQL.Content;
if(r.Found)
{
logData = new data.activeData(r)
$('#RelativeInfo').empty();
$("#RelativeInfo").html("<h4 class='text-success'>Salon: <b>" + logData.getName + "</b></h4>");
}
}
});
});
});
function launchEffect(callback)
{
$("#bgLauncher").fadeIn(400);
$("#centralAccess").delay(500).animate({bottom:0},200);
callback();
}
Even with this... jQuery doesn't do anything until the callback returns...
node-webkit lets you run code written like code for node.js, but it is ultimately just a shim running in WebKit's JavaScript runtime and only has one thread, which means that most 'asynchronous' code will still block the execution of any other code.
If you were running node.js itself, you'd see different behavior because it can do genuinely asynchronous threading behind the scenes. If you want more threads, you'll need to supply them in your host app.
I'm starting to write a server in node.js and wondering whether or not I'm doing things the right way...
Basically my structure is like the following pseudocode:
function processStatus(file, data, status) {
...
}
function gotDBInfo(dbInfo) {
var myFile = dbInfo.file;
function gotFileInfo(fileInfo) {
var contents = fileInfo.contents;
function sentMessage(status) {
processStatus(myFile, contents, status);
}
sendMessage(myFile.name + contents, sentMessage);
}
checkFile(myFile, gotFileInfo);
}
checkDB(query, gotDBInfo);
In general, I'm wondering if this is the right way to code for node.js, and more specifically:
1) Is the VM smart enough to run "concurrently" (i.e. switch contexts) between each callback to not get hung up with lots of connected clients?
2) When garbage collection is run, will it clear the memory completely if the last callback (processStatus) finished?
Node.js is event-based; all code is basically a set of event handlers. The V8 engine runs any synchronous code in a handler to completion and then processes the next event.
An async call (network/file IO) posts an event to another thread to do the blocking IO (that's in libev and libeio AFAIK, I may be wrong on this). Your app can then handle other clients. When the IO task is done, an event is fired and your callback function is called.
Here's an example of an async call flow, simulating a Node app handling a client request:
function onRequest(req, res) {
// we have to do some IO and CPU intensive tasks before responding to the client
asyncCall(function callback1() {
// callback1() triggers after asyncCall() has done its part
// *note that some other code might have been executed in between*
moreAsyncCall(function callback2(data) {
// callback2() triggers after moreAsyncCall() has done its part
// note that some other code might have been executed in between
// res is in scope thanks to closure
res.end(data);
// callback2() returns here, Node can execute other code
// the client should receive a response
// the TCP connection may be kept alive though
});
// callback1() returns here, Node can execute other code
// we could have done the processing of asyncCall() synchronously
// in callback1(), but that would block for too long
// so we used moreAsyncCall() to *yield to other code*
// this is kind of like cooperative scheduling
});
// tasks are scheduled by calling asyncCall()
// onRequest() returns here, Node can execute other code
}
When V8 does not have enough memory, it will do garbage collection. It knows whether a chunk of memory is reachable from a live JavaScript object. I'm not sure if it will aggressively clean up memory upon reaching the end of a function.
References:
This Google I/O presentation discussed the GC mechanism of Chrome (hence V8).
http://platformjs.wordpress.com/2010/11/24/node-js-under-the-hood/
http://blog.zenika.com/index.php?post/2011/04/10/NodeJS