Efficient closure structure in node.js - javascript

I'm starting to write a server in node.js and wondering whether or not I'm doing things the right way...
Basically my structure is like the following pseudocode:
function processStatus(file, data, status) {
    ...
}

function gotDBInfo(dbInfo) {
    var myFile = dbInfo.file;
    function gotFileInfo(fileInfo) {
        var contents = fileInfo.contents;
        function sentMessage(status) {
            processStatus(myFile, contents, status);
        }
        sendMessage(myFile.name + contents, sentMessage);
    }
    checkFile(myFile, gotFileInfo);
}

checkDB(query, gotDBInfo);
In general, I'm wondering if this is the right way to code for node.js, and more specifically:
1) Is the VM smart enough to run "concurrently" (i.e. switch contexts) between each callback so as not to get hung up with lots of connected clients?
2) When garbage collection is run, will it clear the memory completely once the last callback (processStatus) has finished?

Node.js is event-based: all code is basically a set of event handlers. The V8 engine runs the synchronous code in a handler to completion and then processes the next event.
An async call (network/file IO) posts the blocking work to another thread (that's done by libev/libeio in older versions, libuv in current Node). Your app can then handle other clients. When the IO task is done, an event is fired and your callback function is invoked.
Here's an example of the async call flow, simulating a Node app handling a client request:
onRequest(req, res) {
    // we have to do some IO- and CPU-intensive work before responding to the client
    asyncCall(function callback1() {
        // callback1() triggers after asyncCall() has done its part
        // *note that some other code might have been executed in between*
        moreAsyncCall(function callback2(data) {
            // callback2() triggers after moreAsyncCall() has done its part
            // note that some other code might have been executed in between
            // res is in scope thanks to closure
            res.end(data);
            // callback2() returns here, Node can execute other code
            // the client should receive a response
            // the TCP connection may be kept alive though
        });
        // callback1() returns here, Node can execute other code
        // we could have done the remaining processing synchronously
        // in callback1(), but that would block for too long,
        // so we use moreAsyncCall() to *yield to other code*
        // this is kind of like cooperative scheduling
    });
    // tasks are scheduled by calling asyncCall()
    // onRequest() returns here, Node can execute other code
}
When V8 does not have enough memory, it runs garbage collection. It knows whether a chunk of memory is reachable from a live JavaScript object. I'm not sure whether it aggressively cleans up memory as soon as a function ends.
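To make the second question concrete, here is a rough sketch of the idea that closure memory becomes collectible once the last callback referencing it has returned. The names are made up, it must be run with node --expose-gc, and exactly when V8 collects is up to the engine:

function handleRequest() {
    const contents = new Array(1e6).fill("x"); // large value captured by the closure
    setTimeout(function processStatus() {
        console.log("items:", contents.length); // last reference to contents
    }, 100);
}

handleRequest();

setTimeout(function () {
    global.gc(); // processStatus has returned; contents is now unreachable
    console.log("heapUsed after gc:", process.memoryUsage().heapUsed);
}, 500);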
References:
This Google I/O presentation discussed the GC mechanism of Chrome (hence V8).
http://platformjs.wordpress.com/2010/11/24/node-js-under-the-hood/
http://blog.zenika.com/index.php?post/2011/04/10/NodeJS


How to run expensive code parallel in the same file

I'm trying to run a piece of JavaScript code asynchronously to the main thread. I don't necessarily need the code to actually run on a different thread (so performance does not need to be better than sequential execution), but I want the code to be executed in parallel to the main thread, meaning no freezing.
Additionally, all the code needs to be contained within a single function.
My example workload is as follows:
function work() {
    for (let i = 0; i < 100000; i++)
        console.log("Async");
}
Additionally, I may have some work on the main thread (which is allowed to freeze the page; this is just for testing):
function seqWork() {
    for (let i = 0; i < 100000; i++)
        console.log("Sequential");
}
The expected output should be something like this:
Sequential
Async
Sequential
Sequential
Async
Sequential
Async
Async
...
You get the point.
Disclaimer: I am absolutely inexperienced in JavaScript and in working with async and await.
What I've tried
I did some research, and found these 3 options:
1. async/await
Seems like the obvious choice. So I tried this:
let f = async function f() {
    await work();
};
f();
seqWork();
Output:
Async (100000)
Sequential (100000)
I also tried:
let f = async function f() {
    let g = () => new Promise((res, rej) => {
        work();
        res();
    });
    await g();
};
f();
seqWork();
Output:
Async (100000)
Sequential (100000)
So both methods did not work. They also both freeze the browser during the async output, so it seems that it has absolutely no effect(?) I may be doing something very wrong here, but I don't know what.
2. Promise.all
This seems to be praised as the solution for any expensive task, but it only seems like a reasonable choice if you have many blocking tasks and you want to "combine" them into just one blocking task that is faster than executing them sequentially. There are certainly use cases for this, but for my task it is useless, because I only have one task to execute asynchronously, and the main "thread" should keep running during that task.
3. Worker
This seemed to me like the most promising option, but I have not gotten it working yet. The main problem is that you seem to need a second script file. I cannot do that, and even in local testing with a second file, Firefox blocks the loading of that script.
This is what I've tried, and I have not found any other options in my research. I'm starting to think that something like this is straight-up not possible in JS, but it seems like a quite simple task. Again, I don't need this to be actually executed in parallel; it would be enough if the event loop alternated between calling a statement from the main thread and the async "thread". Coming from Java: there, it is also possible to simulate multithreading on a single hardware thread.
Edit: Context
I have some Java code that gets converted to JavaScript (I have no control over the conversion) using TeaVM. Java natively supports multithreading, and a lot of my code relies on that being possible. Now, since JavaScript apparently does not really support multithreading, TeaVM converts Thread in the most simplistic way to JS: calling Thread.start() directly calls Thread.run(), which makes it completely unusable. I want to create a better multithreading emulation here which can, pretty much, execute the thread code basically without modification. It is not ideal, but inserting "yielding" statements into the Java code would be possible.
TeaVM has a handy feature which allows you to write native Java methods annotated with matching JS code that will be converted directly into that code. The problem is, you cannot set the method body, so you can't make it an async method.
One thing I'm now trying to do is implement a JS-native "yield" / "pause" function (to avoid the reserved keyword) which I can call to allow other code to run right from the Java method. The method basically has to briefly block the execution of the calling code and instead invoke execution of other queued tasks. I'm not sure whether that is possible with the main code not being in an async function. (I cannot alter the generated JS code.)
The only other way I can think of to work around this would be to let the JS method call all the blocking code, referring back to the Java code. The main problem, though, is that this means splitting up the method body of the Java method into many small chunks, as Java does not support something like C#'s yield return. This basically means a complete rework of every single parallel-executed piece of code, which I would desperately like to avoid. Also, you could not "yield" from within a called method, making it far less modular. At that point I may as well just call the method chunks from within Java directly from an internal event loop.
Since JavaScript is single threaded, the choice is between:
running some code asynchronously in the main thread, or
running the same code in a worker thread (i.e. one that is not the main thread).
Cooperative Multitasking
If you want to run heavy code in the main thread without undue blocking it would need to be written to multitask cooperatively. This requires long running synchronous tasks to periodically yield control to the task manager to allow other tasks to run. In terms of JavaScript you could achieve this by running a task in an asynchronous function that periodically waits for a system timer of short duration. This has potential because await saves the current execution context and returns control to the task manager while an asynchronous task is performed. A timer call ensures that the task manager can actually loop and do other things before returning control to the asynchronous task that started the timer.
Awaiting a promise that is already fulfilled would only interleave execution of jobs in the microtask queue without returning to the event loop proper, and is not suitable for this purpose.
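A quick sketch of that distinction: the timer below is a macrotask, so it cannot fire until the loop of microtask-only awaits has completely finished:

async function microYieldOnly() {
    setTimeout(() => console.log("timer fired"), 0); // macrotask
    for (let i = 0; i < 3; i++) {
        await Promise.resolve(); // hops through the microtask queue only
        console.log("loop", i);
    }
}
microYieldOnly();
// prints: loop 0, loop 1, loop 2, then "timer fired";
// the timer had to wait for the whole loop, so nothing really yielded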
Calling code pattern:
doWork()
    .then(data => console.log("work done"));
Work code:
async function doWork() {
    for (let i = 1; i < 10000; ++i) {
        // do stuff
        if (i % 1000 == 0) {
            // let other things happen:
            await new Promise(resolve => setTimeout(resolve, 4));
        }
    }
}
Note this draws on historical practice and might suit the purpose of getting prototype code working quickly. I wouldn't think it particularly suitable for a commercial production environment.
Worker Threads
A localhost server can be used to serve worker code from a URL so development can proceed. A common method is to use a node/express server listening on a port of the loopback address known as localhost.
You will need to install node and install express using NPM (which is installed with node). It is not my intention to go into the node/express eco-system - there is abundant material about it on the web.
If you are still looking for a minimalist static file server to serve files from the current working directory, here's one I wrote earlier. Again there are any number of similar examples available on the net.
"use strict";
/*
* express-cwd.js
* node/express server for files in current working directory
* Terminal or shortcut/desktop launcher command: node express-cwd
*/
const express = require('express');
const path = require('path');
const process = require("process");
const app = express();
app.get( '/*', express.static( process.cwd())); // serve files from working directory
const ip='::1'; // local host
const port=8088; // port 8088
const server = app.listen(port, ip, function () {
console.log( path.parse(module.filename).base + ' listening at http://localhost:%s', port);
})
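As an aside, if needing a second script file is the obstacle (as in the question above), a worker can also be created from an inline string via a Blob URL. A sketch, subject to the page's content-security policy:

// worker source inlined as a string; no second file needed
const workerSrc = `
    for (let i = 0; i < 100000; i++) {
        postMessage("Async");
    }
`;
const blob = new Blob([workerSrc], { type: "application/javascript" });
const worker = new Worker(URL.createObjectURL(blob));
worker.onmessage = (e) => console.log(e.data);

// the main thread stays responsive while the worker loops
for (let i = 0; i < 100000; i++) console.log("Sequential");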
Promise Delays
The inlined promise delay shown in the "work code" above can be written as a function, one not named yield, which is a reserved word. For example:
const timeOut = msec => new Promise( r=>setTimeout(r, msec));
An example of executing blocking code in sections:
"use strict";
// update page every 500 msec
const counter = document.getElementById("count");
setInterval( ()=> counter.textContent = +counter.textContent + 1, 500);
function block_500ms() {
let start = Date.now();
let end = start + 500;
for( ;Date.now() < end; );
}
// synchronously block for 4 seconds
document.getElementById("block")
.addEventListener("click", ()=> {
for( var i = 8; i--; ) {
block_500ms();
}
console.log( "block() done");
});
// block for 500 msec 8 times, with timeout every 500 ms
document.getElementById("block8")
.addEventListener("click", async ()=> {
for( var i = 8; i--; ) {
block_500ms();
await new Promise( resolve=>setTimeout(resolve, 5))
}
console.log("block8() done");
});
const timeOut = msec => new Promise( r=>setTimeout(r, msec));
document.getElementById("blockDelay")
.addEventListener("click", async ()=> {
for( var i = 8; i--; ) {
block_500ms();
await timeOut();
}
console.log("blockDelay(1) done");
});
Up Counter: <span id="count">0</span>
<p>
<button id="block" type="button" >Block counter for 4 seconds</button> - <strong> with no breaks</strong>
<p>
<button id="block8" type="button" >Block for 4 seconds </button> - <strong> with short breaks every 500 ms (inline)</strong>
<p>
<button id="blockDelay" type="button" >Block for 4 seconds </button> - <strong> with short breaks every 500 ms (using promise function) </strong>
Some jerkiness may be noticeable with interleaved sections of blocking code, but a total freeze is avoided. Timeout values are determined by experiment; the shorter the value that works acceptably, the better.
Caution
Program design must ensure that variables holding input data, intermediate results, and accumulated output data are not corrupted by main-thread code that may run part way through the execution of the heavy code.
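A minimal illustration of that hazard, using a hypothetical shared results array:

let results = [];

async function heavy() {
    for (let i = 0; i < 4; i++) {
        results.push(i); // partial state is visible during the breaks
        await new Promise(r => setTimeout(r, 4)); // other code may run here
    }
}

heavy();
setTimeout(() => console.log("seen mid-task:", results), 6); // e.g. [ 0, 1 ]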

How does node process concurrent requests?

I have been reading up on Node.js lately, trying to understand how it handles multiple concurrent requests. I know Node.js has a single-threaded, event-loop-based architecture, and that at any given point in time only one statement is executing on the main thread, while blocking code/IO calls are handled by the worker threads (the default pool size is 4).
Now my question is, what happens when a web server built using NodeJs receives multiple requests? I know that there are lots of similar questions here, but haven't found a concrete answer to my question.
So as an example, let's say we have following code inside a route like /index:
app.use('/index', function(req, res, next) {
    console.log("hello index routes was invoked");
    readImage("path", function(err, content) {
        status = "Success";
        if (err) {
            console.log("err :", err);
            status = "Error";
        }
        else {
            console.log("Image read");
        }
        return res.send({ status: status });
    });
    var a = 4, b = 5;
    console.log("sum =", a + b);
});
Let's assume that the readImage() function takes around 1 minute to read the image.
If two requests, T1 and T2, come in concurrently, how is NodeJs going to process these requests?
Is it going to take the first request T1 and process it while queueing request T2? I assume that if any async/blocking work is encountered, like readImage, it is sent to a worker thread (and at some point later, when the async work is done, that thread notifies the main thread and the main thread starts executing the callback), and Node continues by executing the next line of code?
When it is done with T1, does it then process the T2 request? Is that correct? Or can it process T2 in between (meaning that while the code for readImage is running, it can start processing T2)?
Is that right?
Your confusion might be coming from not focusing on the event loop enough. Clearly you have an idea of how this works, but maybe you do not have the full picture yet.
Part 1, Event Loop Basics
When you call the use method, what happens behind the scenes is another thread is created to listen for connections.
However, when a request comes in, because we're in a different thread than the V8 engine (and cannot directly invoke the route function), a serialized call to the function is appended onto the shared event loop, for it to be called later. ('event loop' is a poor name in this context, as it operates more like a queue.)
At the end of the JavaScript file, the V8 engine will check if there are any running threads or messages in the event loop. If there are none, it will exit with a code of 0 (this is why server code keeps the process running). So the first timing nuance to understand is that no request will be processed until the synchronous end of the JavaScript file is reached.
If the event loop was appended to while the process was starting up, each function call on the event loop will be handled one by one, in its entirety, synchronously.
For simplicity, let me break down your example into something more expressive.
function callback() {
    setTimeout(function inner() {
        console.log('hello inner!');
    }, 0); // †
    console.log('hello callback!');
}
setTimeout(callback, 0);
setTimeout(callback, 0);
† setTimeout with a time of 0 is a quick and easy way to put something on the event loop without any timer complications, since no matter what, at least 0 ms will always have elapsed.
In this example, the output will always be:
hello callback!
hello callback!
hello inner!
hello inner!
Both serialized calls to callback are appended to the event loop before either of them is called. This is guaranteed. That happens because nothing can be invoked from the event loop until after the full synchronous execution of the file.
It can be helpful to think of the execution of your file, as the first thing on the event loop. Because each invocation from the event loop can only happen in series, it becomes a logical consequence, that no other event loop invocation can occur during its execution; Only when the previous invocation is finished, can the next event loop function be invoked.
Part 2, The inner Callback
The same logic applies to the inner callback as well, and can be used to explain why the program will never output:
hello callback!
hello inner!
hello callback!
hello inner!
Like you might expect.
By the end of the execution of the file, two serialized function calls will be on the event loop, both for callback. As the event loop is a FIFO (first in, first out), the setTimeout that came first will be invoked first.
The first thing callback does is perform another setTimeout. As before, this will append a serialized call, this time to the inner function, to the event loop. setTimeout immediately returns, and execution will move on to the first console.log.
At this time, the event loop looks like this:
1 [callback] (executing)
2 [callback] (next in line)
3 [inner] (just added by callback)
The return of callback is the signal for the event loop to remove that invocation from itself. This leaves 2 things in the event loop now: 1 more call to callback, and 1 call to inner.
Now callback is the next function in line, so it will be invoked next. The process repeats itself: a call to inner is appended to the event loop, a console.log prints hello callback!, and we finish by removing this invocation of callback from the event loop.
This leaves the event loop with 2 more functions:
1 [inner] (next in line)
2 [inner] (added by most recent callback)
Neither of these functions mess with the event loop any further. They execute one after the other, the second one waiting for the first one's return. Then when the second one returns, the event loop is left empty. This fact, combined with the fact that there are no other threads currently running, triggers the end of the process, which exits with a return code of 0.
Part 3, Relating to the Original Example
The first thing that happens in your example, is that a thread is created within the process which will create a server bound to a particular port. Note, this is happening in precompiled C++ code, not JavaScript, and is not a separate process, it's a thread within the same process. see: C++ Thread Tutorial.
So now, whenever a request comes in, the execution of your original code won't be disturbed. Instead, incoming connection requests will be opened, held onto, and appended to the event loop.
The use function is the gateway for catching the events of incoming requests. It's an abstraction layer, but for the sake of simplicity, it's helpful to think of the use function like you would a setTimeout, except that instead of waiting a set amount of time, it appends the callback to the event loop upon incoming HTTP requests.
So, let's assume that there are two requests coming in to the server: T1 and T2. In your question you say they come in concurrently; since this is technically impossible, I'm going to assume they are one after the other, with negligible time in between them.
Whichever request comes in first, will be handled first by the secondary thread from earlier. Once that connection has been opened, it's appended to the event loop, and we move on to the next request, and repeat.
At any point after the first request is added to the event loop, V8 can begin execution of the use callback.
A quick aside about readImage
Since it's unclear whether readImage is from a particular library, something you wrote, or otherwise, it's impossible to tell exactly what it will do in this case. There are only two possibilities, though, so here they are:
It's entirely synchronous, never using an alternate thread or the event loop
function readImage(path, callback) {
    let image = fs.readFileSync(path);
    callback(null, image);
    // a definition like this forces the callback to
    // fully return before readImage returns, which
    // means readImage will block any subsequent calls.
}
It's entirely asynchronous, and takes advantage of fs' async callback.
function readImage(path, callback) {
    fs.readFile(path, (err, data) => {
        callback(err, data);
    });
    // a definition like this forces readImage
    // to return immediately, and allows execution
    // to continue.
}
For the purposes of explanation, I'll be operating under the assumption that readImage will immediately return, as proper asynchronous functions should.
Once the use callback execution is started, the following will happen:
The first console log will print.
readImage will kick off a worker thread and immediately return.
The second console log will print.
During all of this, it's important to note that these operations happen synchronously; no other event loop invocation can start until they are finished. readImage may be asynchronous, but calling it is not; the callback and the use of a worker thread are what make it asynchronous.
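A tiny sketch of that point, using a stand-in readImage that defers only its callback:

function readImageLike(path, callback) {
    console.log("synchronous part of readImageLike"); // runs immediately
    setImmediate(() => callback(null, "data"));       // only this part is deferred
}

console.log("before");
readImageLike("path", () => console.log("callback"));
console.log("after");
// prints: before, synchronous part of readImageLike, after, callback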
After this use callback returns, the next request has probably already finished parsing and been added to the event loop, while V8 was busy doing our console logs and the readImage call.
So the next use callback is invoked, and repeats the same process: log, kick off a readImage thread, log again, return.
After this point, the readImage functions (depending on how long they take) have probably already retrieved what they needed and appended their callbacks to the event loop. So they will get executed next, in order of whichever one retrieved its data first. Remember, these operations were happening in separate threads, so they ran not only in parallel to the main JavaScript thread, but also in parallel to each other. So here it doesn't matter which one was called first; it matters which one finished first and got 'dibs' on the event loop.
Whichever readImage completed first will be the first one to execute. So, assuming no errors occurred, we'll print to the console, then write to the response for the corresponding request, held in lexical scope.
When that send returns, the next readImage callback will begin execution: console log, and writing to the response.
At this point, both readImage threads have died, and the event loop is empty, but the thread that holds the server port binding is keeping the process alive, waiting for something else to add to the event loop, and the cycle to continue.
I hope this helps you understand the mechanics behind the asynchronous nature of the example you provided.
For each incoming request, node handles it one by one. That means there must be order, just like a queue: first in, first served. When node starts processing a request, all synchronous code executes, and asynchronous work is passed to a worker thread, so node can start processing the next request. When the asynchronous part is done, it goes back to the main thread and keeps going.
So when your synchronous code takes too long, you block the main thread and node won't be able to handle other requests. It's easy to test:
app.use('/index', function(req, res, next) {
    // synchronous part
    console.log("hello index routes was invoked");
    var sum = 0;
    // useless heavy task to keep running and block the main thread
    for (var i = 0; i < 100000000000000000; i++) {
        sum += i;
    }
    // asynchronous part, passed to a worker thread
    readImage("path", function(err, content) {
        // when the worker thread finishes, this is added to the end of
        // the event loop and waits to be processed by the main thread
        status = "Success";
        if (err) {
            console.log("err :", err);
            status = "Error";
        }
        else {
            console.log("Image read");
        }
        return res.send({ status: status });
    });
    // continue the synchronous part at the same time
    var a = 4, b = 5;
    console.log("sum =", a + b);
});
Node won't start processing the next request until it has finished all the synchronous parts of the current one. That's why people say don't block the main thread.
There are a number of articles that explain this, such as this one.
The long and the short of it is that nodejs is not really a single threaded application; it's an illusion. The diagram at the top of the above link explains it reasonably well; however, as a summary:
NodeJS event-loop runs in a single thread
When it gets a request, it hands that request off to a new thread
So, in your code, your running application will have a PID of 1, for example. When you get request T1, it creates PID 2 to process that request (taking 1 minute). While that's running, you get request T2, which spawns PID 3, also taking 1 minute. Both PID 2 and 3 will end after their tasks are completed, but PID 1 will continue listening and handing off events as and when they come in.
In summary, NodeJS being 'single threaded' is true, but it's just an event-loop listener. When events are heard (requests), it passes them off to a pool of threads that execute asynchronously, meaning it's not blocking other requests.
You can simply create a child process by moving the readImage() function into a different file and using fork().
The parent file, parent.js:
const { fork } = require('child_process');

const forked = fork('child.js');

forked.on('message', (msg) => {
    console.log('Message from child', msg);
});

forked.send({ hello: 'world' });
The child file, child.js:
process.on('message', (msg) => {
    console.log('Message from parent:', msg);
});

let counter = 0;

setInterval(() => {
    process.send({ counter: counter++ });
}, 1000);
In the parent file above, we fork child.js (which will execute the file with the node command) and then we listen for the message event. The message event will be emitted whenever the child uses process.send, which we’re doing every second.
To pass down messages from the parent to the child, we can execute the send function on the forked object itself, and then, in the child script, we can listen to the message event on the global process object.
When executing the parent.js file above, it’ll first send down the { hello: 'world' } object to be printed by the forked child process and then the forked child process will send an incremented counter value every second to be printed by the parent process.
The V8 JS interpreter (i.e. Node) is basically single threaded. But the processes it kicks off can be async, for example fs.readFile.
As the express server runs, it will open new processes as it needs to complete requests. So the readImage function will be kicked off (usually asynchronously), meaning the results will return in any order. The server, however, manages which response goes to which request automatically.
So you will NOT have to manage which readImage response goes to which request.
So basically, T1 and T2 will not return concurrently; this is virtually impossible. They are both heavily reliant on the filesystem to complete the read, and they may finish in ANY ORDER (this cannot be predicted). Note that processes are handled by the OS layer and are by nature multithreaded (on a modern computer).
If you are looking for a queue system, it should not be too hard to implement/ensure that images are read/returned in the exact order that they are requested.
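For what it's worth, a minimal sketch of such a queue, assuming a promise-returning read (fs.promises.readFile is used here as a stand-in for readImage):

const fs = require("fs").promises;

let queue = Promise.resolve();
function readImageInOrder(path) {
    // each read starts only after the previous one has settled,
    // so results come back in request order
    const result = queue.then(() => fs.readFile(path));
    queue = result.catch(() => {}); // keep the chain alive after an error
    return result;
}

readImageInOrder("a.png").then(() => console.log("a done"));
readImageInOrder("b.png").then(() => console.log("b done")); // always after a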

Allowing only one async operation at a time

I'm building a site and one particular operation triggers a long, server-side process to run. This operation can't be run twice at the same time, so I need to implement some sort of protection. It also can't be made synchronous, because the server needs to continue responding to other requests while it runs.
To that end I've constructed this small concept test, using sleep 5 as a substitute for my actual long-running process (requires express and child-process-promise, runs on a system with a sleep command but substitute whatever for Windows):
var site = require("express")();
var exec = require("child-process-promise").exec;

var busy = false;

site.get("/test", function (req, res) {
    if (busy) {
        res.json({status:"busy"});
    } else {
        busy = true; // <-- set busy before we start
        exec("sleep 5").then(function () {
            res.json({status:"ok"});
        }).catch(function (err) {
            res.json({status:err.message});
        }).then(function () {
            busy = false; // <-- finally: clear busy
        });
    }
});

site.listen(8082);
The intention of this is that when "/test" is requested it triggers a long operation, and if "/test" is requested again while that operation is running, it replies with "busy" and does nothing.
My question is, is this implementation safe and correct? It appears to work in my cursory tests but it's suspiciously simple. Is this the proper way to essentially implement a mutex + a "try-lock" operation, or is there some more appropriate Node.js construct? Coming from languages where I'm used to standard multithreading practices, I'm not quite comfortable with Node's single-threaded-but-asynchronous nature yet.
You're fine: JavaScript code can't run concurrently with other JS code in Node. Nothing will change the busy flag out from under you. There is no need for multithreaded-style monitors or critical sections.
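If you later need the same guard on several routes, the busy flag generalizes to a small try-lock helper. This is a sketch, not a standard library API:

function makeTryLock() {
    let locked = false;
    return {
        acquire() { // returns false instead of blocking, like a try-lock
            if (locked) return false;
            locked = true;
            return true;
        },
        release() { locked = false; }
    };
}

const lock = makeTryLock();
site.get("/test", function (req, res) {
    if (!lock.acquire()) return res.json({ status: "busy" });
    exec("sleep 5")
        .then(() => res.json({ status: "ok" }))
        .catch(err => res.json({ status: err.message }))
        .then(() => lock.release());
});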

Calling socket.disconnect in a forEach loop doesn't actually call disconnect on all sockets

I am new to the JavaScript world. Recently I was working on a chat application in Node.js, where I have a method called gracefulShutdown, as follows.
var gracefulShutdown = function() {
    logger.info("Received kill signal, shutting down gracefully.");
    server.close();
    logger.info('Disconnecting all the socket.io clients');
    if (Object.keys(io.sockets.sockets).length == 0) process.exit();
    var _map = io.sockets.sockets,
        _socket;
    for (var _k in _map) {
        if (_map.hasOwnProperty(_k)) {
            _socket = _map[_k];
            _socket.disconnect(true);
        }
    }
    ...code here...
    setTimeout(function() {
        logger.error("Could not close connections in time, shutting down");
        process.exit();
    }, 10 * 1000);
}
Here is what happens in the disconnect listener. The removeDisconnectedClient method simply updates an entry in the db to indicate the removed client.
socket.on('disconnect', function() {
    removeDisconnectedClient(socket);
});
So in this case the disconnect event wasn't fired for all sockets; it was fired for only a few sockets, seemingly at random. Although I was able to fix it using setTimeout(fn, 0) with the help of a teammate.
I read about it online and understood only this much: setTimeout defers the execution of code by adding it to the end of the event queue. I read about JavaScript context, the call stack, and the event loop, but I couldn't put it all together in this context. I really don't understand why and how this issue occurred. Could someone explain it in detail, and what is the best way to solve or avoid it?
It is hard to say for sure without a little more context about the rest of the code in gracefulShutdown, but I'm surprised it is disconnecting any of the sockets at all:
_socket = _map[ _k ];
socket.disconnect(true);
It appears that you are assigning an item from _map to the variable _socket but then calling disconnect on socket, which is a different variable. I'm guessing it is a typo and you meant to call disconnect on _socket?
Some of the sockets might be disconnecting for other reasons and the appearance that your loop is disconnecting some but not all the sockets is probably just coincidence.
As far as I can tell from the code you posted, socket should be undefined and you should be getting errors about trying to call the disconnect method on undefined.
From the method name where you use it, I suppose the application exits after attempting to disconnect all sockets. The nature of socket communication is asynchronous, so given a decent number of items in _map, it can happen that not all disconnect messages are sent before the process exits.
You can increase the chances by calling exit after some timeout once all sockets are disconnected. However, why would you manually disconnect at all? On connection interruption, remote sockets will automatically get disconnected...
UPDATE
Socket.io for Node.js doesn't have a callback to know for sure that the packet with the disconnect command was sent, at least in v0.9. I've debugged this and came to the conclusion that without modifying the sources it is not possible to catch that moment.
In file "socket.io\lib\transports\websocket\hybi-16.js" a method write is called to send the disconnect packet
WebSocket.prototype.write = function (data) {
    ...
    this.socket.write(buf, 'binary');
    ...
}
Whereas socket.write is defined in Node.js core transport "nodejs-{your-node-version}-src\core-modules-sources\lib\net.js" as
Socket.prototype.write = function(chunk, encoding, cb)
//cb is a callback to be called on writeRequest complete
However, as you see, this callback is not provided, so socket.io will not know that the packet has been sent.
At the same time, when disconnect() is called for a websocket, the member disconnected is set to true, and the "disconnect" event is broadcast, indeed, but synchronously. So the .on('disconnect', ...) handler on the server socket doesn't give any valuable information about whether the packet was sent or not.
Solution
I can draw a general conclusion from this: if it is critical to make sure that all clients are immediately informed (rather than waiting for a heartbeat timeout, or in case heartbeat is disabled), then this logic should be implemented manually.
You can send an ordinary message which tells the client that the server is shutting down, and have the client call socket disconnect as soon as the message is received. At the same time, the server will be able to accept all acknowledgements.
Server-side:
var sockets = [];
for (var _k in _map) {
    if (_map.hasOwnProperty(_k)) {
        sockets.push(_map[_k]);
    }
}
sockets.map(function (socket) {
    socket.emit('shutdown', function () {
        socket.isShutdown = true;
        var all = sockets.every(function (skt) {
            return skt.isShutdown;
        });
        if (all) {
            // wrap in timeout to let the current tick finish before quitting
            setTimeout(function () {
                process.exit();
            });
        }
    })
})
Clients should behave simply:
socket.on('shutdown', function () {
    socket.disconnect();
});
Thus we make sure each client has explicitly disconnected. We don't care about the server; it will be shut down shortly.
In the example code it looks like io.sockets.sockets is an Object; however, at least in the library version I am using, it is a mutable array which the socket.io library is free to modify each time you remove a socket with disconnect(true).
Thus, when you call disconnect(true), if the currently iterated item at index i is removed, an effect like this happens:
var a = [1, 2, 3, 4];
for (var i in a) {
    a.splice(i, 1); // remove item from array
    alert(i);
}
// alerts 0, 1
Thus, the disconnect(true) call asks socket.io to remove the item from the array, and because you are both holding references to the same array, the contents of the array are modified during the loop.
The solution is to create a copy of the _map with slice() before the loop:
var _map = io.sockets.sockets.slice(); // copy of the original
This creates a copy of the original array, and thus the loop goes through all the items.
The reason why calling setTimeout() also works is that it defers the removal of the items from the array, allowing the whole loop to iterate without modifying the sockets array.
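For contrast, the same loop run over a copy visits every item:

var a = [1, 2, 3, 4];
var copy = a.slice(); // iterate the copy, mutate the original
for (var i in copy) {
    a.splice(a.indexOf(copy[i]), 1);
    alert(i);
}
// alerts 0, 1, 2, 3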
The problem here is that sockjs and socket.io use asynchronous disconnect methods, i.e. when you call disconnect, the socket is not immediately terminated; there is just a promise that it WILL be terminated. This has the following effect (assuming 3 sockets):
Your for loop grabs the first socket
The disconnect method is called on the first socket
Your for loop grabs the second socket
The disconnect method is called on the second socket
The disconnect method on the first socket finishes
Your for loop grabs the third socket
The disconnect method is called on the third socket
Program kills itself
Notice, that sockets 2 and 3 haven't necessarily finished yet. This could be for a number of reasons.
Finally, setTimeout(fn, 0) is, as you said, deferring the final call, but this may not be consistent (I haven't dug into it too much). By that I mean you've set the final termination to happen AFTER all your sockets have disconnected. The setTimeout and setInterval methods essentially act like a queue; your position in the queue is dictated by the timer you set. Two intervals set for 10s each, where both run synchronously, will cause one to run AFTER the other.
After Socket.io 1.0, the library does not expose an array of the connected sockets to you. You can check that io.sockets.sockets.length does not equal the number of open socket objects. Your best bet is to broadcast a 'disconnect' message to all the clients that you want to drop, and in the client-side handler for that message, close the actual WebSocket.
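A sketch of that broadcast approach for socket.io >= 1.0 (the event name here is made up):

// server side: ask every connected client to go away
io.emit('server_shutdown');

// client side: close the actual connection when asked
socket.on('server_shutdown', function () {
    socket.disconnect();
});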

Stop function execution from outer closure?

I want to run some scheduled jobs. These jobs can take a long time, and I want to give them a timeout. When a function has already been running for 60s, I want to stop its execution immediately, along with all calls made from it.
var x = function () {
    /* doing something for eventually a looong time */
}
x.kill()
Is this possible?
Because node.js is itself single threaded, no other code outside of x() will run until x() returns. So, you cannot conceptually do what you're trying to do with a single function that runs synchronously.
There are some libraries that add some sort of threading (with special limitations) that you could use. You can see this post for more details on some of those options: How to create threads in nodejs
If you show us what the code is doing inside of x(), then we could offer you some ideas on how to restructure that code to accomplish your goal. If it has asynchronous operations (or could be converted to use asynchronous operations), then x() will return and other code could run while node.js is waiting for those asynchronous operations to do their thing and you could signal to those asynchronous operations that they should stop doing their thing. But, the majority of the time would have to be waiting for asynchronous operations in order for that scheme to work.
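For that asynchronous case, the "signal it to stop" idea can be sketched with a shared flag; doOneAsyncStep below is a hypothetical async unit of work:

function startJob() {
    const state = { cancelled: false };
    (async function run() {
        for (let i = 0; i < 1000; i++) {
            if (state.cancelled) return; // cooperative cancellation point
            await doOneAsyncStep(i);     // hypothetical async unit of work
        }
    })();
    return { kill: () => { state.cancelled = true; } };
}

const job = startJob();
setTimeout(() => job.kill(), 60 * 1000); // stop it after 60 s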
If you can move the x() code to a separate process, then you could start a child process, give it the x() code to run, and then kill the entire node child process if it doesn't finish in a certain amount of time. This is obviously a bit of a heavyweight way to handle this function call, but it does give you the ability to kill the whole environment if needed. It also provides process isolation, if that's useful from a security or privacy point of view.
If you're using Node.js, you may consider using child_process to make asynchronous function calls which you can kill later in case they don't finish within a period of time.
But this approach needs you to separate function x into another JS file, say modulex.js, which implements:
function x() {
    // do whatever
}
x();
Meanwhile, in your main.js (or whatever name you give it), where you want to start running function x from modulex.js asynchronously and kill it later, you call it via child_process, which is one of the built-in features of Node.js:
var spawn = require('child_process').spawn;
var x = spawn('node', ['modulex.js']); // command and arguments are passed separately

x.stdout.on('data', function (data) {
    // fires whenever modulex.js prints output
    // data = any output printed by modulex.js
});

x.stderr.on('data', function (data) {
    // fires whenever modulex.js prints an error
    // data = any error printed by modulex.js
});

// kill `x` if it hasn't finished after a timeout:
setTimeout(function timeout() {
    x.kill();
}, 60 * 1000); // e.g. the 60 s limit from the question
This approach needs you to redesign the architecture of your node application slightly, but it copes with single-threaded JavaScript more efficiently.
I recommend reading this official documentation of child_process on node.js before getting started: https://nodejs.org/api/child_process.html
