What happens with unhandled socket.io events?

Does socket.io ignore/drop them?
The reason I'm asking is the following.
There is a client with several states. Each state has its own set of socket handlers. At different moments the server notifies the client of a state change and after that sends several state-dependent messages.
But! It takes some time for the client to change state and to set new handlers. In this case the client can miss some messages... because there are no handlers at that moment.
If I understand correctly, unhandled messages are lost to the client.
Maybe I'm missing the concept or doing something wrong... How do I handle this issue?

Unhandled messages are just ignored. It's just like when an event occurs and there are no event listeners for that event. The socket receives the message and doesn't find a handler for it, so nothing happens with it.
You could avoid missing messages by always having the handlers installed and then deciding in the handlers (based on other state) whether to do anything with the message or not.
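For example, a minimal sketch of that approach; the `clientState` variable, event names and `applyMove` function are hypothetical stand-ins for your own state handling:
let clientState = 'lobby';

socket.on('gameMove', (msg) => {
    // The handler is always installed; it just ignores messages that
    // don't apply to the current state.
    if (clientState !== 'inGame') return;
    applyMove(msg); // hypothetical function from your app
});

socket.on('stateChange', (newState) => {
    clientState = newState;
});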

jfriend00's answer is a good one, and you are probably fine just leaving the handlers in place and using logic in the callback to ignore events as needed. If you really want to manage the unhandled packets though, read on...
You can get the list of callbacks from the socket internals, and use it to compare to the incoming message header. This client-side code will do just that.
// Save a copy of the onevent function
socket._onevent = socket.onevent;

// Replace the onevent function with a handler that captures all messages
socket.onevent = function (packet) {
    // Compare the list of callbacks to the incoming event name
    if( !Object.keys(socket._callbacks).map(x => x.substr(1)).includes(packet.data[0]) ) {
        console.log(`WARNING: Unhandled Event: ${packet.data}`);
    }
    socket._onevent.apply(socket, Array.prototype.slice.call(arguments));
};
The object socket._callbacks contains the callbacks and the keys are the event names. They have a $ prepended to them, so you can trim that off the entire list by mapping substr(1) onto it. That results in a nice clean list of event names.
IMPORTANT NOTE: Normally you should not attempt to externally modify any object member starting with an underscore. Also, expect that any data in it is unstable. The underscore indicates it is for internal use in that object, class or function. Though this object is not stable, it should be up to date enough for us to use it, and we aren't modifying it directly.
The event name is stored in the first entry under packet.data. Just check to see if it is in the list, and raise the alarm if it is not. Now, when the server sends an event the client does not know about, the client will note it in the browser console.
Now you need to save the unhandled messages in a buffer, to play back once the handlers are available again. So to expand on our client-side code from before...
// Save a copy of the onevent function
socket._onevent = socket.onevent;

// Make buffer and configure buffer timings
socket._packetBuffer = [];
socket._packetBufferWaitTime = 1000; // in milliseconds
socket._packetBufferPopDelay = 50;   // in milliseconds

function isPacketUnhandled(packet) {
    return !Object.keys(socket._callbacks).map(x => x.substr(1)).includes(packet.data[0]);
}

// Define the function that will process the buffer
socket._packetBufferHandler = function(packet) {
    if( isPacketUnhandled(packet) ) {
        // packet can't be processed yet, restart wait cycle
        socket._packetBuffer.push(packet);
        console.log(`packet handling not completed, retrying`);
        setTimeout(socket._packetBufferHandler, socket._packetBufferWaitTime, socket._packetBuffer.pop());
    }
    else {
        // packet can be processed now, start going through buffer
        socket._onevent.apply(socket, Array.prototype.slice.call(arguments));
        if(socket._packetBuffer.length > 0) {
            setTimeout(socket._packetBufferHandler, socket._packetBufferPopDelay, socket._packetBuffer.pop());
        }
        else {
            console.log(`all packets in buffer processed`);
            socket._packetsWaiting = false;
        }
    }
}

// Replace the onevent function with a handler that captures all messages
socket.onevent = function (packet) {
    // Compare the list of callbacks to the incoming event name
    if( isPacketUnhandled(packet) ) {
        console.log(`WARNING: Unhandled Event: ${packet.data}`);
        socket._packetBuffer.push(packet);
        if(!socket._packetsWaiting) {
            socket._packetsWaiting = true;
            setTimeout(socket._packetBufferHandler, socket._packetBufferWaitTime, socket._packetBuffer.pop());
        }
    }
    socket._onevent.apply(socket, Array.prototype.slice.call(arguments));
};
Here the unhandled packets get pushed into the buffer and a timer is set running. Once the given amount of time has passed, it starts checking to see if the handlers for each item are ready. Each one is handled until all are exhausted or a handler is missing, which triggers another wait.
This can and will stack up unhandled calls until you blow out the client's allotted memory, so make sure that those handlers DO get loaded in a reasonable time span. And take care not to send it anything that will never get handled, because it will keep trying forever.
I tested it with really long strings and it was able to push them through, so what they are calling 'packet' is probably not a standard packet.
Tested with SocketIO version 2.2.0 on Chrome.
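As a side note, if you are on a newer Socket.IO release (v3 or later), there is a supported catch-all listener that avoids reaching into the underscored internals; a minimal sketch:
// onAny fires for every incoming event; listeners() tells us whether any
// regular handler is registered for that event name.
socket.onAny((eventName, ...args) => {
    if (!socket.listeners(eventName).length) {
        console.log(`WARNING: Unhandled Event: ${eventName}`, args);
    }
});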

Related

How to delay function call until its callback will be finished in other places

I'm using child_process to write commands to the console, and then subscribing to the 'data' event to get output from it. The problem is that sometimes outputs are merged with each other.
const { spawn } = require('child_process');
let command = spawn('vlc', { shell: true });

writeCommand(cmd, callback) {
    process.stdin.write(`${cmd}\n`);
    this.isBusy = true;
    this.process.stdout.on('data', (d) => {
        callback(d);
    });
}
The writeCommand function is used in several places; how can I delay it from executing until the output from the previous command has finished?
My output can look like (for status command for example):
( audio volume: 230 ) ( state stopped ) >
data events on a stream have zero guarantees that a whole "unit" of output will come together in a single data event. It could easily be broken up into multiple data events. So, this combined with the fact that you are providing multiple inputs which generate multiple outputs means that you need a way to parse both when you have a complete set of output and thus should call the callback with it and also how to delineate the boundaries between sets of output.
You don't show us what your output looks like so we can't offer any concrete suggestions on how to parse it in that way, but common delimiters are double line feeds or things like that. It would entirely depend upon what your output naturally does at the end or, if you control the content the child process creates, what you can insert at the end of the output.
Another work-around for the merged output would be to not send the 2nd command until the 1st one is done (perhaps by using some sort of pending queue). But, you will still need a way to parse the output to know when you actually have the completion of the previous output.
Another problem:
In the code you show, every time you call writeCommand(), you will add yet another listener for the data event. So, when you call it twice to send different commands, you will now have two listeners both listening for the same data and you will be processing the same response twice instead of just once.
let command = spawn('vlc', { shell: true });

writeCommand(cmd, callback) {
    process.stdin.write(`${cmd}\n`);
    this.isBusy = true;
    // every time writeCommand is called, it adds yet another listener
    this.process.stdout.on('data', (d) => {
        callback(d);
    });
}
If you really intend to call this multiple times and multiple commands could be "in flight" at the same time, then you really can't use this coding structure. You will probably need one permanent listener for the data event that is outside this function because you don't want to have more than one listener at the same time and since you've already found that the data from two commands can be merged, even if you separate them, you can't use this structure to capture the data appropriately for the second part of the merged output.
You can use a queuing mechanism to execute the next command after the first one is finished. You can also use a library like https://www.npmjs.com/package/p-limit to do it for you.
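Putting those two suggestions together, here is a minimal sketch of a pending-queue approach. It assumes the interactive prompt ends with `> ` (as in the status output above) so the prompt can serve as the delimiter; adjust the delimiter and the `vlc` invocation to whatever your child process actually emits:
const { spawn } = require('child_process');

const child = spawn('vlc', { shell: true });

let buffer = '';
let queue = [];     // pending { cmd, callback } entries
let current = null; // command currently awaiting its output

// One permanent 'data' listener that accumulates output until the
// prompt delimiter appears, then resolves the current command.
child.stdout.on('data', (chunk) => {
    buffer += chunk.toString();
    if (!buffer.includes('> ')) return; // assumed prompt delimiter
    const output = buffer;
    buffer = '';
    if (current) {
        const { callback } = current;
        current = null;
        callback(output);
    }
    // If no command was pending (e.g. the startup banner), the output is dropped.
    dequeue();
});

function dequeue() {
    if (current || queue.length === 0) return;
    current = queue.shift();
    child.stdin.write(`${current.cmd}\n`);
}

// Public API: commands are serialized, so outputs can't interleave.
function writeCommand(cmd, callback) {
    queue.push({ cmd, callback });
    dequeue();
}

// Usage:
writeCommand('status', (out) => console.log('status:', out));
writeCommand('volume 120', (out) => console.log('volume:', out));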

How to cancel a wasm process from within a webworker

I have a wasm process (compiled from c++) that processes data inside a web application. Let's say the necessary code looks like this:
std::vector<JSONObject> data;
for (size_t i = 0; i < data.size(); i++)
{
    process_data(data[i]);
    if (i % 1000 == 0) {
        bool is_cancelled = check_if_cancelled();
        if (is_cancelled) {
            break;
        }
    }
}
This code basically "runs/processes a query", similar to a SQL query interface.
However, queries may take several minutes to run/process and at any given time the user may cancel their query. The cancellation process would occur in the normal javascript/web application, outside of the service Worker running the wasm.
My question then is: what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process, so that it knows the process has been cancelled and can exit? Using worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker (it needs to stay alive with its stored data, so another query can be run...).
What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?
Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.
Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:
Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~ 500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...
Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.
Always run a second worker and in the UI just display the most recent. Feasible? No, would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).
Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.
Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but it would also require a tremendous amount of re-writing of the parsing functions so that every function supports this (the actual data parsing is handled in several functions -- filter, search, calculate, group by, sort, etc.).
Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.
[Anything else?]
In the bounty then I was wondering three things:
If the above six analyses seem generally valid?
Are there other (perhaps better) approaches I'm missing?
Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.
For Chrome (only) you may use shared memory (a shared buffer as memory), and raise a flag in memory when you want to halt. I'm not a big fan of this solution (it is complex and is supported only in Chrome). It also depends on how your query works, and whether there are places where the lengthy query can check the flag.
Instead you should probably call the c++ function multiple times (e.g. for each query) and check if you should halt after each call (just send a message to the worker to halt).
What I mean by multiple times is to make the query in stages (multiple function calls for a single query). It may not be applicable in your case.
Regardless, AFAIK there is no way to send a signal to a Webassembly execution (e.g. Linux kill). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.
I'm attaching a code snippet that may explain this idea.
worker.js:
... init webassembly

onmessage = function(q) {
    // query received from main thread.
    const result = ... call webassembly(q);
    postMessage(result);
}
main.js:
const worker = new Worker("worker.js");
let cancel = false;
let processing = false;

worker.onmessage = function(r) {
    // when worker has finished processing the query.
    // r is the results of the processing.
    processing = false;
    if (cancel === true) {
        // processing is done, but result is not required.
        // instead of showing the results, update that the query was canceled.
        cancel = false;
        ... update UI "canceled".
        return;
    }
    ... update UI "results r".
};

function onCancel() {
    // Occurs when user clicks on the cancel button.
    if (cancel) {
        // sanity test - prevent this in UI.
        throw "already cancelling";
    }
    cancel = true;
    ... update UI "canceling".
}

function onQuery(q) {
    if (processing === true) {
        // sanity test - prevent this in UI.
        throw "already processing";
    }
    processing = true;
    // Send the query to the worker.
    // When the worker receives the message it will process the query via webassembly.
    worker.postMessage(q);
}
An idea from user experience perspective:
You may create ~two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once. (it will just mean that in the backend the 2nd worker will run the next query, and when the 1st finishes the cancellation, cancellation will again become immediate).
Shared Thread
Since the worker and the C++ function that it calls share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think a solid option would be to minimize the amount of time that the thread is blocked by instead initiating one iteration at a time from the main application.
It would look something like this.
main.js -> worker.js -> C++ function -> worker.js -> main.js
Breaking up the Loop
Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory.
The C++ function then performs one iteration of the loop, increments the variable to keep track of the loop position, and immediately breaks.
int x = 0; // initialized counter at 0
std::vector<JSONObject> data;
for (size_t i = x; i < data.size(); i++)
{
    process_data(data[i]);
    x++;   // increment counter
    break; // stop function until told to iterate again starting at x
}
Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.
Canceling the Operation
From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)
let continueOperation = true;
// here you can set to false at any time since the thread is not blocked here

worker.expensiveThreadBlockingFunction();
// results in one iteration of the loop being iterated until message is received below

worker.onmessage = function(e) {
    if (continueOperation) {
        worker.expensiveThreadBlockingFunction();
        // execute worker function again, ultimately continuing the increment in C++
    } else {
        return false;
        // or send message to worker to reset C++ counter to prepare for next execution
    }
};
Continuing the Operation
Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.
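To make the worker side of this protocol concrete, here is a minimal sketch. It assumes the C++ loop body is exported to JavaScript (e.g. via Emscripten) as hypothetical functions `process_chunk`, which runs one slice of the loop and returns 1 when everything is done, and `reset_counter`, which resets the loop position; the glue file name is also made up:
// worker.js
importScripts('query_module.js'); // Emscripten-generated glue (assumed name)

onmessage = function (e) {
    if (e.data === 'iterate') {
        // Run one bounded slice of work, then yield back to the event loop
        // so a pending 'cancel' message can be handled before the next slice.
        const done = Module.ccall('process_chunk', 'number', [], []);
        postMessage(done ? 'finished' : 'continue');
    } else if (e.data === 'cancel') {
        Module.ccall('reset_counter', null, [], []);
        postMessage('cancelled');
    }
};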

How to avoid multiple node processes doing repetitive things?

I have a module in Node.js which repeatedly picks a document from MongoDB and processes it. One document should be processed only once. I also want to use multiple processes. I want to run the same module (process) on different processors, which run independently.
The problem is that the same document might be picked and processed by two different workers. How can multiple processes know that a particular document is being processed by some other worker, so they don't touch it? And there is no way for my independent processes to communicate. I cannot use a parent which forks multiple processes and acts as a bridge between them. How can I avoid this kind of problem in Node.js?
One way to do it is to assign a unique numeric ID to each of your MongoDB documents, and to assign a unique numeric identifier to each of your node.js workers.
For example, have an env var called NUM_WORKERS, and then in your node.js module:
var NumWorkers = process.env.NUM_WORKERS || 1;
You then need to assign a unique, contiguous instance number (in the range 0 to NumWorkers-1) to each of your workers (e.g. via a command line parameter read by your node.js process when it initializes). You can store that in a variable called MyWorkerInstanceNum.
When you pick a document from MongoDB, call the following function (passing the document's unique documentId as a parameter):
function isMine(documentId){
    //
    // Example: documentId = 10
    //          NumWorkers = 4
    //          (10 % 4)   = 2
    // If MyWorkerInstanceNum is 2, return true, else return false.
    //
    return ((documentId % NumWorkers) === MyWorkerInstanceNum);
}
Only continue to actually process the document if isMine() returns true.
So, multiple workers may "pick" a document, but only one worker will actually process it.
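For completeness, a minimal sketch of the wiring this answer describes; the exact argv position and defaults are just illustrative assumptions, e.g. launched as `NUM_WORKERS=4 node worker.js 2`:
const NumWorkers = parseInt(process.env.NUM_WORKERS, 10) || 1;
const MyWorkerInstanceNum = parseInt(process.argv[2], 10) || 0;

function isMine(documentId) {
    return (documentId % NumWorkers) === MyWorkerInstanceNum;
}

// Example: with NUM_WORKERS=4 and instance 2, only IDs 2, 6, 10, ... match.
console.log(isMine(10)); // true for instance 2, false otherwise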
Simply keep a transaction log of the document being processed by its unique ID. In the transaction log table for the processed documents, write the status as one of the following (for example):
requested
initiated
processed
failed
You may also want a column in that table for stderr/stdout in case you want to know why something failed or succeeded, and timestamps - that sort of thing.
When you initialize the processing of the document in your Node app, look up the document by ID and check its status. If it doesn't exist, then you're free to process it.
Pseudo-code (sorry, I'm not a Mongo guy!):
db.collection.list('collectionName', function(err, doc) {
    db.collection.find(doc.id, 'transactions', function(err, trx) {
        if (trx === undefined || trx.status === 'failed') {
            DocProcessor.child.process(doc)
        } else {
            // don't need to process it, it's already been done
        }
    })
})
You'll also want to enable concurrency locking on the transactions log collection so that you ensure a row (and subsequent job) can't be duplicated. If this becomes a challenge to ensure docs are being queued properly, consider adding in an AMQP service to handle queuing of the docs. Set up a handler to manage distribution of the child processes and transaction logging. Flow would be something like:
MQ ⇢ Log ⇢ Handler ⇢ Doc processor children
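If the transaction log lives in MongoDB, a common way to get that concurrency locking is an atomic claim with findOneAndUpdate, so only one worker can move a document from 'requested' to 'initiated'. A minimal sketch using the official Node.js driver (collection and field names are illustrative):
// Atomically claim the next pending document: the filter and the update run
// as a single atomic operation, so two workers can never both flip the same
// document from 'requested' to 'initiated'.
async function claimNext(transactions) {
    const claimed = await transactions.findOneAndUpdate(
        { status: 'requested' },
        { $set: { status: 'initiated', startedAt: new Date() } }
    );
    // Depending on driver version the result is the document itself or a
    // { value } wrapper; either way, a null/empty result means nothing to do.
    return claimed;
}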

Calling socket.disconnect in a forEach loop doesn't actually call disconnect on all sockets

I am new to the JavaScript world. Recently I was working on a chat application in Node.js. So I have a method called gracefulShutdown as follows.
var gracefulShutdown = function() {
    logger.info("Received kill signal, shutting down gracefully.");
    server.close();
    logger.info('Disconnecting all the socket.io clients');
    if (Object.keys(io.sockets.sockets).length == 0) process.exit();
    var _map = io.sockets.sockets,
        _socket;
    for (var _k in _map) {
        if (_map.hasOwnProperty(_k)) {
            _socket = _map[_k];
            _socket.disconnect(true);
        }
    }
    ...code here...
    setTimeout(function() {
        logger.error("Could not close connections in time, shutting down");
        process.exit();
    }, 10 * 1000);
}
Here is what is happening in the disconnect listener. The removeDisconnectedClient method simply updates an entry in the db to indicate the removed client.
socket.on('disconnect', function() {
    removeDisconnectedClient(socket);
});
So in this case the disconnect event wasn't fired for all sockets. It was fired for only a few sockets randomly from the array. Although I was able to fix it using setTimeout(fn, 0) with the help of a teammate.
I read about it online and understood only this much: setTimeout defers the execution of code by adding it to the end of the event queue. I read about the JavaScript context, call stack and event loop, but I couldn't put all of it together in this context. I really don't understand why and how this issue occurred. Could someone explain it in detail? And what is the best way to solve or avoid such issues?
It is hard to say for sure without a little more context about the rest of the code in gracefulShutdown but I'm surprised it is disconnecting any of the sockets at all:
_socket = _map[ _k ];
socket.disconnect(true);
It appears that you are assigning an item from _map to the variable _socket but then calling disconnect on socket, which is a different variable. I'm guessing it is a typo and you meant to call disconnect on _socket?
Some of the sockets might be disconnecting for other reasons and the appearance that your loop is disconnecting some but not all the sockets is probably just coincidence.
As far as I can tell from the code you posted, socket should be undefined and you should be getting errors about trying to call the disconnect method on undefined.
From the name of the method where you use this, I suppose the application exits after attempting to disconnect all sockets. The nature of socket communication is asynchronous, so given you have a decent number of items in _map, it can happen that not all disconnect messages will be sent before the process exits.
You can increase the chances by calling exit after some timeout once all sockets have been disconnected. However, why would you manually disconnect? On connection interruption remote sockets will automatically get disconnected...
UPDATE
Socket.io for Node.js doesn't have a callback to know for sure that the packet with the disconnect command was sent, at least in v0.9. I've debugged it and came to the conclusion that without modifying the sources it is not possible to catch that moment.
In file "socket.io\lib\transports\websocket\hybi-16.js" a method write is called to send the disconnect packet
WebSocket.prototype.write = function (data) {
    ...
    this.socket.write(buf, 'binary');
    ...
}
Whereas socket.write is defined in Node.js core transport "nodejs-{your-node-version}-src\core-modules-sources\lib\net.js" as
Socket.prototype.write = function(chunk, encoding, cb)
//cb is a callback to be called on writeRequest complete
However as you see this callback is not provided, so socket.io will not know about the packet having been sent.
At the same time, when disconnect() is called for the websocket, the member disconnected is set to true and the "disconnect" event is indeed broadcast, but synchronously. So the .on('disconnect', ...) handler on the server socket doesn't give any valuable information about whether the packet was sent or not.
Solution
I can make a general conclusion from this. If it is so critical to make sure that all clients are immediately informed (and not wait for a heartbeat timeout or if heartbeat is disabled) then this logic should be implemented manually.
You can send an ordinary message which tells the client that the server is shutting down, and call socket disconnect as soon as the message is received. At the same time the server will be able to accept all acknowledgements.
Server-side:
var sockets = [];
for (var _k in _map) {
    if (_map.hasOwnProperty(_k)) {
        sockets.push(_map[_k]);
    }
}
sockets.map(function (socket) {
    socket.emit('shutdown', function () {
        socket.isShutdown = true;
        var all = sockets.every(function (skt) {
            return skt.isShutdown;
        });
        if (all) {
            // wrap in timeout to let current tick finish before quitting
            setTimeout(function () {
                process.exit();
            });
        }
    });
});
Clients should simply behave like this:
socket.on('shutdown', function () {
    socket.disconnect();
});
Thus we make sure each client has explicitly disconnected. We don't care about server. It will be shutdown shortly.
In the example code it looks like io.sockets.sockets is an Object, however, at least in the library version I am using, it is a mutable array which the socket.io library is free to modify each time you are removing a socket with disconnect(true).
Thus, when you call disconnect(true), if the currently iterated item at index i is removed, an effect like this happens:
var a = [1, 2, 3, 4];
for (var i in a) {
    a.splice(i, 1); // remove item from array
    alert(i);
}
// alerts 0, 1
Thus, the disconnect(true) call will ask socket.io to remove the item from the array - and because both of you are holding a reference to the same array, the contents of the array are modified during the loop.
The solution is to create a copy of the _map with slice() before the loop:
var _map = io.sockets.sockets.slice(); // copy of the original
It would create a copy of the original array and thus should go through all the items in the array.
The reason why calling setTimeout() would also work is that it defers the removal of the items from the array, allowing the whole loop to iterate without modifying the sockets array.
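Putting it together, a minimal sketch of the fixed loop (assuming, as this answer does, that io.sockets.sockets behaves like an array in your socket.io version):
// Iterate over a snapshot so disconnect(true) can't mutate the array
// we are looping over.
var sockets = io.sockets.sockets.slice();
sockets.forEach(function (socket) {
    socket.disconnect(true);
});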
The problem here is that sockjs and socket.io use asynchronous "disconnect" methods. I.e., when you call disconnect, the socket is not immediately terminated; it is just a promise that it WILL be terminated. This has the following effect (assuming 3 sockets):
Your for loop grabs the first socket
The disconnect method is called on the first socket
Your for loop grabs the second socket
The disconnect method is called on the second socket
The disconnect method on the first socket finishes
Your for loop grabs the third socket
The disconnect method is called on the third socket
Program kills itself
Notice, that sockets 2 and 3 haven't necessarily finished yet. This could be for a number of reasons.
Finally, setTimeout(fn, 0) is, as you said, blocking the final call, but it may not be consistent (I haven't dug into this too much). By that I mean, you've set the final termination to be AFTER all your sockets have disconnected. The setTimeout and setInterval methods essentially act more like a queue. Your position in the queue is dictated by the timer you set. Two intervals set for 10s each, where they both run synchronously will cause one to run AFTER the other.
After Socket.io 1.0, the library does not expose an array of the connected sockets to you. You can check that io.sockets.sockets.length is not equal to the number of open socket objects. Your best bet is to broadcast a 'disconnect' message to all the clients that you want to drop, and in the 'disconnect' handler on the client side close the actual WebSocket.

Internals (client and server) of aborting an XMLHttpRequest

So I'm curious about the actual underlying behaviours that occur when aborting an async javascript request. There was some related info in this question but I've yet to find anything comprehensive.
My assumption has always been that aborting the request causes the browser to close the connection and stop processing it entirely, thus causing the server to do the same if it's been setup to do so. I imagine however that there might be browser-specific quirks or edge cases here I'm not thinking of.
My understanding is as follows, I'm hoping someone can correct it if necessary and that this can be a good reference for others going forwards.
Aborting the XHR request clientside causes the browser to internally close the socket and stop processing it. I would expect this behaviour rather than simply ignoring the data coming in and wasting memory. I'm not betting on IE on that though.
An aborted request on the server would be up to what's running there:
I know with PHP the default behaviour is to stop processing when the client socket is closed, unless ignore_user_abort() has been called. So closing XHR connections saves you server power as well.
I'm really interested to know how this could be handled in node.js, I assume some manual work would be needed there.
I have no idea really about other server languages / frameworks and how they behave but if anyone wants to contribute specifics I'm happy to add them here.
For the client, the best place to look is in the source, so let's do this! :)
Let's look at Blink's implementation of XMLHttpRequest's abort method (lines 1083-1119 in XMLHttpRequest.cpp):
void XMLHttpRequest::abort()
{
    WTF_LOG(Network, "XMLHttpRequest %p abort()", this);

    // internalAbort() clears |m_loader|. Compute |sendFlag| now.
    //
    // |sendFlag| corresponds to "the send() flag" defined in the XHR spec.
    //
    // |sendFlag| is only set when we have an active, asynchronous loader.
    // Don't use it as "the send() flag" when the XHR is in sync mode.
    bool sendFlag = m_loader;

    // internalAbort() clears the response. Save the data needed for
    // dispatching ProgressEvents.
    long long expectedLength = m_response.expectedContentLength();
    long long receivedLength = m_receivedLength;

    if (!internalAbort())
        return;

    // The script never gets any chance to call abort() on a sync XHR between
    // send() call and transition to the DONE state. It's because a sync XHR
    // doesn't dispatch any event between them. So, if |m_async| is false, we
    // can skip the "request error steps" (defined in the XHR spec) without any
    // state check.
    //
    // FIXME: It's possible open() is invoked in internalAbort() and |m_async|
    // becomes true by that. We should implement more reliable treatment for
    // nested method invocations at some point.
    if (m_async) {
        if ((m_state == OPENED && sendFlag) || m_state == HEADERS_RECEIVED || m_state == LOADING) {
            ASSERT(!m_loader);
            handleRequestError(0, EventTypeNames::abort, receivedLength, expectedLength);
        }
    }
    m_state = UNSENT;
}
So from this, it looks like the majority of the grunt work is done within internalAbort, which looks like this:
bool XMLHttpRequest::internalAbort()
{
    m_error = true;

    if (m_responseDocumentParser && !m_responseDocumentParser->isStopped())
        m_responseDocumentParser->stopParsing();

    clearVariablesForLoading();

    InspectorInstrumentation::didFailXHRLoading(executionContext(), this, this);

    if (m_responseLegacyStream && m_state != DONE)
        m_responseLegacyStream->abort();

    if (m_responseStream) {
        // When the stream is already closed (including canceled from the
        // user), |error| does nothing.
        // FIXME: Create a more specific error.
        m_responseStream->error(DOMException::create(!m_async && m_exceptionCode ? m_exceptionCode : AbortError, "XMLHttpRequest::abort"));
    }

    clearResponse();
    clearRequest();

    if (!m_loader)
        return true;

    // Cancelling the ThreadableLoader m_loader may result in calling
    // window.onload synchronously. If such an onload handler contains open()
    // call on the same XMLHttpRequest object, reentry happens.
    //
    // If, window.onload contains open() and send(), m_loader will be set to
    // non 0 value. So, we cannot continue the outer open(). In such case,
    // just abort the outer open() by returning false.
    RefPtr<ThreadableLoader> loader = m_loader.release();
    loader->cancel();

    // If abort() called internalAbort() and a nested open() ended up
    // clearing the error flag, but didn't send(), make sure the error
    // flag is still set.
    bool newLoadStarted = m_loader;
    if (!newLoadStarted)
        m_error = true;

    return !newLoadStarted;
}
I'm no C++ expert but from the looks of it, internalAbort does a few things:
Stops any processing it's currently doing on a given incoming response
Clears out any internal XHR state associated with the request/response
Tells the inspector to report that the XHR failed (this is really interesting! I bet it's where those nice console messages originate)
Closes either the "legacy" version of a response stream, or the modern version of the response stream (this is probably the most interesting part pertaining to your question)
Deals with some threading issues to ensure the error is propagated properly (thanks, comments).
After doing a lot of digging around, I came across an interesting function within HttpResponseBodyDrainer (lines 110-124) called Finish which to me looks like something that would eventually be called when a request is cancelled:
void HttpResponseBodyDrainer::Finish(int result) {
    DCHECK_NE(ERR_IO_PENDING, result);

    if (session_)
        session_->RemoveResponseDrainer(this);

    if (result < 0) {
        stream_->Close(true /* no keep-alive */);
    } else {
        DCHECK_EQ(OK, result);
        stream_->Close(false /* keep-alive */);
    }

    delete this;
}
It turns out that stream_->Close, at least in the BasicHttpStream, delegates to the HttpStreamParser::Close, which, when given a non-reusable flag (which does seem to happen when the request is aborted, as seen in HttpResponseDrainer), does close the socket:
void HttpStreamParser::Close(bool not_reusable) {
    if (not_reusable && connection_->socket())
        connection_->socket()->Disconnect();
    connection_->Reset();
}
So, in terms of what happens on the client, at least in the case of Chrome, it looks like your initial intuitions were correct as far as I can tell :) seems like most of the quirks and edge cases have to do with scheduling/event notification/threading issues, as well as browser-specific handling, e.g. reporting the aborted XHR to the devtools console.
In terms of the server, in the case of NodeJS you'd want to listen for the 'close' event on the http response object. Here's a simple example:
'use strict';

var http = require('http');

var server = http.createServer(function(req, res) {
    res.on('close', console.error.bind(console, 'Connection terminated before response could be sent!'));
    setTimeout(res.end.bind(res, 'yo'), 2000);
});

server.listen(8080);
Try running that and canceling the request before it completes. You'll see an error at your console.
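To reproduce that from the browser side, here is a small sketch that fires a request against the server above and aborts it before the 2-second timer expires (the URL and timing are just for this demo):
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://localhost:8080/');
xhr.onabort = function () {
    console.log('request aborted on the client');
};
xhr.send();

// Abort well before the server's 2-second res.end() fires, which should
// trigger the 'close' handler on the server response.
setTimeout(function () {
    xhr.abort();
}, 500);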
Hope you found this useful. Digging through the Chromium/Blink source was a lot of fun :)
