HapiJS: start a long-running background process - javascript

How should I implement a PHP exec-like call to a system function with HapiJS? The user submits a processing job that needs to run in the background for some time.
I somehow need to return a job id / session id to the user, run the job asynchronously, allow the user to check back for completion, and reroute when completed...
I bet there are existing solutions for this, but I'd highly welcome a pointer in the right direction.

Check out node's child process documentation: here
To do what you are describing, I would spawn a process without a callback and then use a little trick: sending signal 0 to a process that isn't running throws an error (see here).
const exec = require('child_process').exec;

// Launch the process
const child = exec('ls');
const pid = child.pid;

// Later, in another scope, when you want to check whether it is still running:
try {
  process.kill(pid, 0); // signal 0 tests for existence without killing anything
} catch (e) {
  console.log("it's finished");
}
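Building on that trick, here is a minimal sketch of the whole flow the question asks for, assuming hapi v17+ and an existing server instance (the route paths, the in-memory jobs Map, and 'long-running-command' are all illustrative, not an existing API):

const { spawn } = require('child_process');
const crypto = require('crypto');

const jobs = new Map(); // jobId -> pid (illustrative in-memory store)

// assumes an existing hapi `server` instance
server.route({
  method: 'POST',
  path: '/jobs',
  handler: () => {
    const child = spawn('long-running-command'); // the background job
    const jobId = crypto.randomUUID(); // Node 14.17+
    jobs.set(jobId, child.pid);
    return { jobId }; // the client polls with this id
  }
});

server.route({
  method: 'GET',
  path: '/jobs/{id}',
  handler: (request) => {
    const pid = jobs.get(request.params.id);
    if (pid === undefined) {
      return { error: 'unknown job' };
    }
    try {
      process.kill(pid, 0); // signal 0: existence check only
      return { done: false };
    } catch (e) {
      return { done: true };
    }
  }
});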

Related

How to wait an arbitrary amount of time in nightwatch.js without web browser interaction

I'm migrating my tests from Intern.js to Nightwatch.js and I'm sure it's not the recommended way of writing tests, but my tests asynchronously make browser command calls (my tests use a class that has its own command queue). The Nightwatch command queue can be empty sometimes depending on what's executing, but will eventually be populated. I need a way to manually tell Nightwatch when it's done, and just wait until then, otherwise the browser will close too early.
I can do something like this:
finished() {
  let done = false;

  // will run at the end of my class's internal command queue
  this.command.then(() => {
    done = true;
  });

  // will cause nightwatch to wait until my command queue is done
  this.browser.perform(doneFn => {
    const check = () => {
      if (done) {
        return doneFn();
      }
      setTimeout(check, 10);
    };
    check();
  });
}
But in that case perform() will time out after waiting 10 seconds, and I don't seem to be able to configure that. I could otherwise run a sequence of ~9-second perform() calls, but that just sounds too hacky. I could rewrite everything to depend on the Nightwatch command queue instead, but that would also be a lot of work.
In the end it was easier to just switch to using Nightwatch's command queue instead. Anything that was just a .then() with Intern.js got changed to a .perform() in its various forms.
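Roughly, the change looked like this (doWork and verifyResult are hypothetical stand-ins for the real test steps):

// Before, with Intern.js-style promise chaining:
doWork().then(() => verifyResult());

// After, routed through Nightwatch's own command queue:
browser
  .perform(() => doWork())
  .perform(() => verifyResult());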

How to cancel a wasm process from within a webworker

I have a wasm process (compiled from c++) that processes data inside a web application. Let's say the necessary code looks like this:
std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
    process_data(data[i]);
    if (i % 1000 == 0) {
        bool is_cancelled = check_if_cancelled();
        if (is_cancelled) {
            break;
        }
    }
}
This code basically "runs/processes a query", similar to a SQL query interface.
However, queries may take several minutes to run/process and at any given time the user may cancel their query. The cancellation process would occur in the normal javascript/web application, outside of the service Worker running the wasm.
My question then is: what would be an example of how we could know that the user has clicked the 'cancel' button, and communicate that to the wasm process so that it knows the process has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill it (it needs to stay alive with its stored data, so another query can be run...).
What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?
Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.
Update: based on some research (and the initial answers/comments below), I think there are the following possible solutions, with some feedback on each:

1. Use two workers, with one worker storing the data and another worker processing it. That way the processing worker can be terminated and the data will always remain. Feasible? Not really, as it would take far too much time to copy ~500MB of data over to the worker whenever it starts. This could previously have been done using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...
2. Use a single worker with the Emterpreter and emscripten_sleep_with_yield. Feasible? No: the Emterpreter destroys performance (mentioned in the docs above) and slows down all queries by about 4-6x.
3. Always run a second worker and just display the most recent one in the UI. Feasible? No: without a shared data structure it would probably run into quite a few OOM errors at 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).
4. Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.
5. Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require rewriting a tremendous amount of the parsing functions so that every one supports this (the actual data parsing is handled in several functions: filter, search, calculate, group by, sort, etc.).
6. Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read the database there and fill the pointer, and process the data in C++ (see the sketch after this list). Feasible? Not sure, though this seems like the best solution if it can be implemented.

[Anything else?]
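For what it's worth, here is a minimal sketch of the memory handoff in option 6, assuming an Emscripten build (Module._malloc, Module._free, and Module.HEAPU8 are standard Emscripten exports; _process_data is a hypothetical exported C++ entry point):

// bytes: a Uint8Array previously read back from IndexedDB (assumed to exist)
const ptr = Module._malloc(bytes.length);  // allocate inside the wasm heap
Module.HEAPU8.set(bytes, ptr);             // copy the data into wasm memory
Module._process_data(ptr, bytes.length);   // hypothetical exported C++ function
Module._free(ptr);                         // release the buffer when done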
In the bounty, then, I was wondering three things:
1. Do the above six analyses seem generally valid?
2. Are there other (perhaps better) approaches I'm missing?
3. Would anyone be able to show a very basic example of doing #6? That seems like the best solution if it's possible and works cross-browser.
For Chrome (only) you may use shared memory (a shared buffer as the wasm memory), and raise a flag in that memory when you want to halt. I'm not a big fan of this solution (it's complex and supported only in Chrome). It also depends on how your query works, and whether there are places where the lengthy query can check the flag.
Instead you should probably call the C++ function multiple times (e.g. once per stage of the query) and check whether you should halt after each call (just send a message to the worker to halt).
What I mean by multiple times is making the query in stages (multiple function calls for a single query). It may not be applicable in your case.
Regardless, AFAIK there is no way to send a signal to a WebAssembly execution (e.g. a Linux kill). Therefore, you'll have to wait for the current operation to finish in order to complete the cancellation.
I'm attaching a code snippet that may explain this idea.
worker.js:
// ... init webassembly
onmessage = function (q) {
  // query received from main thread.
  const result = ... call webassembly(q);
  postMessage(result);
};
main.js:
const worker = new Worker("worker.js");
let cancel = false;
let processing = false;

worker.onmessage = function (r) {
  // when the worker has finished processing the query,
  // r is the result of the processing.
  processing = false;
  if (cancel === true) {
    // processing is done, but the result is no longer required.
    // instead of showing the results, update that the query was cancelled.
    cancel = false;
    // ... update UI "cancelled".
    return;
  }
  // ... update UI "results r".
};

function onCancel() {
  // occurs when the user clicks the cancel button.
  if (cancel) {
    // sanity test - prevent this in the UI.
    throw "already cancelling";
  }
  cancel = true;
  // ... update UI "cancelling".
}

function onQuery(q) {
  if (processing === true) {
    // sanity test - prevent this in the UI.
    throw "already processing";
  }
  processing = true;
  // Send the query to the worker.
  // When the worker receives the message it will process the query via webassembly.
  worker.postMessage(q);
}
An idea from a user-experience perspective:
You may create ~two workers. This takes twice the memory, but allows you to "cancel" "immediately" once (behind the scenes the 2nd worker will run the next query, and once the 1st finishes the cancellation, cancelling will again become immediate).
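For completeness, a hedged sketch of the shared-memory flag mentioned at the top of this answer, assuming SharedArrayBuffer is actually available (in current browsers this requires cross-origin isolation):

// main.js
const sab = new SharedArrayBuffer(4);
const flag = new Int32Array(sab);
worker.postMessage({ cmd: 'init', buffer: sab }); // hand the buffer to the worker
function onCancel() {
  Atomics.store(flag, 0, 1); // raise the halt flag; the worker polls it
}

// worker.js - check_if_cancelled() on the wasm side can be backed by an
// imported JS helper along these lines:
// const cancelled = () => Atomics.load(flag, 0) === 1;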
Shared Thread
Since the worker and the C++ function that it calls share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think a solid option would be to minimize the amount of time the thread is blocked by instead initiating one iteration at a time from the main application.
It would look something like this.
main.js -> worker.js -> C++ function -> worker.js -> main.js
Breaking up the Loop
Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory.
The C++ function then performs one iteration of the loop, increments the variable to keep track of the loop position, and immediately breaks.
int x = 0; // counter initialized at 0
std::vector<JSONObject> data;
for (size_t i = x; i < data.size(); i++)
{
    process_data(data[i]);
    x++;   // increment counter
    break; // stop the function until told to iterate again, starting at x
}
Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.
Canceling the Operation
From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)
let continueOperation = true;
// here you can set this to false at any time, since the thread is not blocked

worker.expensiveThreadBlockingFunction();
// results in one iteration of the loop running until the message below is received

worker.onmessage = function (e) {
  if (continueOperation) {
    // execute the worker function again, ultimately continuing the increment in C++
    worker.expensiveThreadBlockingFunction();
  } else {
    // or send a message to the worker to reset the C++ counter
    // to prepare for the next execution
    return false;
  }
};
Continuing the Operation
Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.
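A possible worker.js counterpart to the main.js sketch above (processOneChunk and resetCounter are hypothetical wasm exports; the real names depend on your build):

onmessage = function (e) {
  if (e.data === 'step') {
    // run one slice of the C++ loop; true once all rows are processed
    const finished = Module._processOneChunk();
    postMessage(finished ? 'done' : 'continue');
  } else if (e.data === 'reset') {
    Module._resetCounter(); // prepare the C++ counter for the next query
  }
};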

SailsJS test if process is already running

Say I have a long task that starts running when a person connects to InitializeDB. (Of course with authorization in the future, but I've left that out for now.)
'post /initializeDB':'OrderController.initializeAll',
Now the problem is that the initialize function should never be run twice. I know that ideally I'd set up a task manager which just starts a task in the background that I could poll.
However, for current simplicity (and to show a proof of concept): is it possible for a Sails route to "know" that another connection/route is already running, so that if I connect twice to /initializeDB it won't try to initialize the database twice?
You can use a variable in your controller - just toggle it to true when the process is running, something like that. So, in OrderController.js:
var initializeRunning = false;

module.exports = {
  initializeAll: function(req, res) {
    // return a benign result if already running
    if (initializeRunning) {
      return res.send({alreadyRunning: true});
    }
    // start running
    initializeRunning = true;
    // using setTimeout as a stand-in for a long async process
    setTimeout(function() {
      // finished the process
      res.send({complete: true});
      // if you want to allow this method to run again later, unset the toggle
      initializeRunning = false;
    }, 3000);
  },
};

Nodejs manage different threads

I'm a bit of a newbie with Node.js.
I'm working on a Node.js - Express solution.
I want to send an e-mail when some information is added to a MSSQL database.
This is working well for me. The problem is that I want to check every five minutes whether the information added to the database has been modified, and if not, send another e-mail.
The call to add information to the db is this route:
router.post('/postlinevalidation', function(req, res) {
  // insert function into mssql
  silkcartCtrl.sendMail(req, res);
});
The controller part for sending the e-mail:
exports.sendMail = function(req, res) {
  var emails = "";
  fs.readFile('./config/email.conf', 'utf8', function (err, data) {
    if (err) {
      return logger.error(err);
    }
    emails = data;
  });
  var minutes = 5, the_interval = minutes * 60 * 1000;
  var refreshId = setInterval(function() {
    logger.info("I am doing my 5 minutes check FL_PENDIENTE");
    var request = new sql.Request(req.dbsqlserver);
    var sqlpendinglinesvalidation = "SELECT [FK_IDCHECK],[FK_IDPEDIDO],[BK_IDPROVEEDOR],[DE_PROVEEDOR]" +
      ",[FK_FAMILIA],[BK_FAMILIA],[FK_SUBFAMILIA],[BK_SUBFAMILIA],[FK_ARTICULO]" +
      ",[BK_ARTICULO],[FL_VALIDAR],[DT_FECHA],[FL_PENDIENTE],[DES_CHECK],[QNT_PROPUESTA],[FECHA]" +
      " FROM TABLE" +
      " WHERE [FL_PENDIENTE] = 1";
    request.query(sqlpendinglinesvalidation, function (err, recordset) {
      if (recordset.length > 0) {
        var transporter = nodemailer.createTransport('smtps://user%40gmail.com:pwd@smtp.gmail.com');
        var mailOptions = {
          from: '"Mailer" <mail@mail.com>', // sender address
          to: emails, // list of receivers
          subject: 'Tienes compras pendientes de validar', // subject line
          text: 'Tienes compras pendientes de validar', // plaintext body
          html: '<b>Tienes compras pendientes de validar.</b>' // html body
        };
        // send mail with the defined transport object
        transporter.sendMail(mailOptions, function(error, info) {
          if (error) {
            return logger.error(error);
          }
          logger.info('Message sent: ' + info.response);
        });
      } else {
        clearInterval(refreshId);
        return true;
      }
    });
  }, the_interval);
};
As I said, this is working well.
I control the five minutes with setInterval.
But I supposed that every time the route postlinevalidation is called, a new thread is opened, so I will have several setInterval processes running.
I want to know how to manage this: if the controller function exports.sendMail is running when the route is called again, kill that process and start exports.sendMail again.
Thanks in advance
But I supposed that every time the route postlinevalidation is called, a new thread is opened, so I will have several setInterval processes running.
No, this is not how node.js works. You don't get multiple threads because of multiple setInterval() timers.
node.js by itself is single threaded. So, each time a route is called, it just creates an event in the node.js event queue and they are served FIFO, one at a time. At any point that one of the route handlers makes an async call, it essentially "yields" control back and the next item in the event queue gets to run until it yields or finishes.
Timers like setInterval() also use the event queue, so no additional threads are created by setInterval(). It is possible that node.js modules that use native code may themselves use threads, and node.js uses a small thread pool for disk management, but neither of those has anything to do with setInterval().
If you explicitly want to create another execution context for a long running operation in node.js to separate it from the single node.js thread, then that is usually done with the child process module that is part of node.js. You create a new process (which can be a node.js process or some other program running in the process) and you can then communicate with that other process.
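For illustration, a minimal sketch using the built-in child_process module (the worker script name is hypothetical):

const { fork } = require('child_process');

// runs mail-checker.js in its own process, with its own event loop
const child = fork('./mail-checker.js');
child.on('message', (msg) => console.log('from child:', msg));
child.send({ cmd: 'start' }); // communicate over the built-in IPC channel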
If the controller function exports.sendMail is running when the route is called again, kill that process and start exports.sendMail again.
This is something that would need to be an explicit feature of the nodemailer module in order for you to cancel an operation in process. How "in process" asynchronous operations are implemented and controlled is not a generic node.js thing, but is specific to how that specific module implements things and keeps track of things.
Looking into the code for the node-mailer and more specifically the smtp-connection module, it looks like it uses plain async node.js socket code. That means it does not create any new threads or processes on its own.
As for your setInterval() calls, you need to make sure that any body of code that creates a setInterval() keeps track of the interval timer ID and eventually clears the interval so it stops, and you don't keep piling up more and more interval timers. Another possibility is to have only one interval that does the checking for all outstanding operations (rather than a separate interval for each one).
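As a rough sketch of that last idea (the pending set and checkAndMail are hypothetical):

// one module-level timer that services all outstanding checks,
// instead of one setInterval() per request
const pending = new Set(); // ids of lines still awaiting validation

setInterval(function () {
  pending.forEach(function (id) {
    checkAndMail(id); // hypothetical: runs the query and mails if still pending
  });
}, 5 * 60 * 1000);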
From a quick look, I think you don't really need to put the sendMail function inside postlinevalidation. If you want to control it, you could run it in a different script from the express app. You can use something like pm2 or parallelshell to run multiple scripts at the same time.
If you are using setInterval then you can use clearInterval to stop it based on your condition. Whenever you call setInterval, it returns an id which you can use to stop that interval.
var interval = setInterval(doStuff, 5000);

function doStuff() {
  if (your_condition) {
    clearInterval(interval);
  }
}

Calling socket.disconnect in a forEach loop doesn't actually call disconnect on all sockets

I am new to the JavaScript world. Recently I was working on a chat application in Node.js. I have a method called gracefulShutdown, as follows.
var gracefulShutdown = function() {
  logger.info("Received kill signal, shutting down gracefully.");
  server.close();
  logger.info('Disconnecting all the socket.io clients');
  if (Object.keys(io.sockets.sockets).length == 0) process.exit();
  var _map = io.sockets.sockets,
      _socket;
  for (var _k in _map) {
    if (_map.hasOwnProperty(_k)) {
      _socket = _map[_k];
      _socket.disconnect(true);
    }
  }
  // ...code here...
  setTimeout(function() {
    logger.error("Could not close connections in time, shutting down");
    process.exit();
  }, 10 * 1000);
}
Here is what is happening in the disconnect listener. The removeDisconnectedClient method simply updates an entry in the db to indicate the removed client.
socket.on('disconnect', function() {
  removeDisconnectedClient(socket);
});
So in this case the disconnect event wasn't fired for all sockets; it was fired for only a few of them, seemingly at random. I was eventually able to fix it using setTimeout(fn, 0) with the help of a teammate.
I read about it online and understood only this much: setTimeout defers the execution of code by adding it to the end of the event queue. I read about JavaScript execution contexts, the call stack, and the event loop, but I couldn't put it all together in this context. I really don't understand why and how this issue occurred. Could someone explain it in detail, and what is the best way to solve or avoid it?
It is hard to say for sure without a little more context about the rest of the code in gracefulShutdown but I'm surprised it is disconnecting any of the sockets at all:
_socket = _map[ _k ];
socket.disconnect(true);
It appears that you are assigning an item from _map to the variable _socket but then calling disconnect on socket, which is a different variable. I'm guessing it is a typo and you meant to call disconnect on _socket?
Some of the sockets might be disconnecting for other reasons and the appearance that your loop is disconnecting some but not all the sockets is probably just coincidence.
As far as I can tell from the code you posted, socket should be undefined and you should be getting errors about trying to call the disconnect method on undefined.
From the method name where you use it, I can suppose that the application exits after attempting to disconnect all sockets. The nature of socket communication is asynchronous, so given a decent number of items in _map it can happen that not all disconnect messages are sent before the process exits.
You can increase the chances by calling exit after some timeout once all sockets have been disconnected. However, why would you disconnect manually? On connection interruption, remote sockets will automatically get disconnected...
UPDATE
Socket.io for Node.js doesn't have a callback to know for sure that the packet with the disconnect command was sent, at least in v0.9. I've debugged it and came to the conclusion that without modifying the sources it is not possible to catch that moment.
In the file "socket.io\lib\transports\websocket\hybi-16.js", a method write is called to send the disconnect packet:
WebSocket.prototype.write = function (data) {
  // ...
  this.socket.write(buf, 'binary');
  // ...
};
Whereas socket.write is defined in Node.js core transport "nodejs-{your-node-version}-src\core-modules-sources\lib\net.js" as
Socket.prototype.write = function(chunk, encoding, cb)
//cb is a callback to be called on writeRequest complete
However, as you see, this callback is not provided, so socket.io will not know about the packet having been sent.
At the same time, when disconnect() is called for the websocket, the member disconnected is set to true and the "disconnect" event is indeed broadcast, but synchronously. So an .on('disconnect') handler on the server socket doesn't give any valuable information about whether the packet was actually sent.
Solution
I can draw a general conclusion from this: if it is critical to make sure that all clients are informed immediately (rather than waiting for a heartbeat timeout, or if heartbeats are disabled), then this logic should be implemented manually.
You can send an ordinary message which tells the client that the server is shutting down, and have the client disconnect its socket as soon as the message is received. At the same time the server will be able to collect all the acknowledgements.
Server-side:
var sockets = [];
for (var _k in _map) {
  if (_map.hasOwnProperty(_k)) {
    sockets.push(_map[_k]);
  }
}
sockets.map(function (socket) {
  socket.emit('shutdown', function () {
    socket.isShutdown = true;
    var all = sockets.every(function (skt) {
      return skt.isShutdown;
    });
    if (all) {
      // wrap in a timeout to let the current tick finish before quitting
      setTimeout(function () {
        process.exit();
      });
    }
  });
});
Client-side, the behaviour is simple:
socket.on('shutdown', function () {
  socket.disconnect();
});
Thus we make sure each client has explicitly disconnected. We don't care about the server; it will be shut down shortly.
In the example code it looks like io.sockets.sockets is an Object; however, at least in the library version I am using, it is a mutable array which the socket.io library is free to modify each time you remove a socket with disconnect(true).
Thus, when you call disconnect(true), if the currently iterated item at index i is removed, an effect like this happens:
var a = [1, 2, 3, 4];
for (var i in a) {
  a.splice(i, 1); // remove item from array
  alert(i);
}
// alerts 0, 1
Thus, the disconnect(true) call will ask socket.io to remove the item from the array, and because you are both holding a reference to the same array, its contents are modified during the loop.
The solution is to create a copy of _map with slice() before the loop:
var _map = io.sockets.sockets.slice(); // copy of the original
This creates a copy of the original array, so the loop goes through all of the items.
The reason why calling setTimeout() would also work is that it defers the removal of the items from the array, allowing the whole loop to iterate without modifying the sockets array.
The problem here is that sockjs and socket.io use asynchronous "disconnect" methods. I.e., when you call disconnect, the socket is not immediately terminated; there is just a promise that it WILL be terminated. This has the following effect (assuming 3 sockets):
Your for loop grabs the first socket
The disconnect method is called on the first socket
Your for loop grabs the second socket
The disconnect method is called on the second socket
The disconnect method on the first socket finishes
Your for loop grabs the third socket
The disconnect method is called on the third socket
Program kills itself
Notice that sockets 2 and 3 haven't necessarily finished yet. This could be for a number of reasons.
Finally, setTimeout(fn, 0) is, as you said, deferring the final call, but it may not be consistent (I haven't dug into this too much). By that I mean you've scheduled the final termination to happen AFTER all your sockets have disconnected. The setTimeout and setInterval methods essentially act like a queue, and your position in the queue is dictated by the timer you set. Two intervals set for 10s each, where both run synchronously, will cause one to run AFTER the other.
After Socket.io 1.0, the library does not expose an array of the connected sockets. You can check that io.sockets.sockets.length is not equal to the number of open socket objects. Your best bet is to broadcast a 'disconnect' message to all the clients that you want to drop, and in an on('disconnect') handler on the client side close the actual WebSocket.
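A minimal sketch of that broadcast approach (the event name is arbitrary):

// server
io.emit('server-shutdown'); // reaches every connected client

// client
socket.on('server-shutdown', function () {
  socket.disconnect(); // close the connection from the client side
});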
