How to process websocket messages sequentially


I receive dozens of messages over a WebSocket, and they can arrive within a few milliseconds of each other. I need to process this data with operations that can sometimes take a little time (database insertions, for example).
Before a new message is processed, it is imperative that the previous one has finished being processed.
My first idea was to set up a queue with the Node.js library Bull (with Redis), but I'm afraid it would be too slow. The processing of these messages must remain fast.
I tried to use JS iterators/generators (something I had never used until now) and tested something like this:
const ws = new WebSocket(`${this.baseUrl}${this.path}`)
const duplex = WebSocket.createWebSocketStream(ws, { encoding: 'utf8' })
const messageGenerator = async function* (duplex) {
for await (const message of duplex) {
yield message
}
}
for await (let msg of messageGenerator(socketApi.duplex)) {
console.log('start process')
await this.messageHandler.handleMessage(msg, user)
console.log('end process')
}
log :
start process
start process
end process
end process
Unfortunately, as you can see, messages continue to be processed without waiting for the previous one to finish. Do you have a solution to this problem?
Should I finally use a queue with Redis to process the messages?
Thanks

I am not a Node.js guy, but I have thought about the same issue multiple times in other languages. I have concluded that it really matters how slow the message processing operations are, because if they are too slow (slower than a certain threshold that depends on the number of messages per second), this can cause a bottleneck on the websocket connection, and when this bottleneck builds up it can cause extreme delays for future messages.
If await and async behave the same way as in Python, then any operation you run with them is asynchronous, which means it will indeed not wait for the previous one to be processed.
So far I have thought of two options:
Keep processing the messages asynchronously, but write some additional logic in the code processing them to manage the ordering. For example, confirm that the previous message has already been processed before proceeding with the current message. This logic can be complex and slow because it runs in a separate thread and doesn't block the websocket messages.
Process the messages synchronously, one by one, but extremely fast by doing one single operation: storing them in Redis. This is much faster than storing them in a database and in most cases will be fast enough not to cause bottlenecks in the WS connection. Then use a separate process to get these messages from Redis and process them.
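If a Redis-backed queue feels too heavy, one lighter option is to chain the handlers on a single promise inside the process. The following is a minimal sketch under stated assumptions, not a drop-in fix: createSerialQueue, enqueue and the timer-based handleMessage are hypothetical names, and the queue lives in memory only, so pending messages are lost if the process exits. Note that in JavaScript, await does pause the surrounding async function until the promise settles, so this approach relies on the handler returning a promise that settles only when its work is finished.

```javascript
// Minimal in-process serial queue: each task starts only after the previous one settles.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => task());
    // Keep the chain usable even if a task rejects.
    tail = result.catch(() => {});
    return result;
  };
}

// Hypothetical slow handler standing in for messageHandler.handleMessage().
const log = [];
async function handleMessage(msg) {
  log.push(`start ${msg}`);
  await new Promise((resolve) => setTimeout(resolve, 20));
  log.push(`end ${msg}`);
}

const enqueue = createSerialQueue();
// Three messages "arrive" at the same instant but are handled one at a time.
const done = Promise.all(['a', 'b', 'c'].map((msg) => enqueue(() => handleMessage(msg))));
done.then(() => console.log(log.join(', ')));
// prints: start a, end a, start b, end b, start c, end c
```

In the question's setup, enqueue would presumably be called from the socket's message event, for example ws.on('message', (msg) => enqueue(() => this.messageHandler.handleMessage(msg, user))), assuming the ws package. If messages arrive faster than they are handled for long periods, the chain grows without bound, which is the point where an external queue becomes worth its cost.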

Related

NodeJS TCP server not working because of while loop

I'm going to illustrate my issue with this simple example.
What boggles my mind is why the server never gets created
and the socket never gets printed.
If I were to remove the while loop everything works.
What do I have to change to make the example below function?
const net = require('net');
net.createServer(socket => {
socket.setEncoding('utf-8');
console.log(socket);
}).listen(4242, '127.0.0.1');
console.log('do some while logic here')
while(true) { }

This happens because your socket creation is not an instant process. It needs to make system calls and so on. In other words, it is asynchronous. JavaScript works with a main loop and a callback queue. Basically, the main loop is what is being executed, and the callback queue holds the things waiting to be executed (see the MDN docs on this: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop).
What happens in your case is that your callback goes to the callback queue and waits to be executed, but it never gets to do so, because your main loop is blocked by the while (true) {} loop. If you want nonblocking behavior, you need to send the things that are inside your while loop to the callback queue instead. One way to do that in JavaScript is to use setTimeout, e.g.
const net = require('net');
net.createServer(socket => {
socket.setEncoding('utf-8');
console.log(socket);
}).listen(4242, '127.0.0.1');
console.log('do some while logic here')
function main() {
// do something here
setTimeout(main);
}
main()
This way you're not going to have a stack overflow issue and you get nonblocking behavior in your while loop.

Nodejs is an event driven system that runs your Javascript single threaded. That means that in order for things to work properly, you cannot hog the entire CPU in a while() loop (or any other kind of loop) unless the loop directly contains an await statement that is awaiting an actual promise tied to an asynchronous operation.
This is a basic principle of programming in nodejs and you have to learn how to structure your program logic into the event driven world. You don't show what you're really trying to do, but "polling" anything in a tight loop is generally not the correct way to program an event driven system.
So, in the code you show here:
const net = require('net');
net.createServer(socket => {
socket.setEncoding('utf-8');
console.log(socket);
}).listen(4242, '127.0.0.1');
console.log('do some while logic here')
while(true) { }
Your while loop just spins forever and never allows any events to get processed, and therefore your server can never get events about incoming connections. The events will just pile up in the event queue, but you never give nodejs a chance to go back to the event queue to process those events. To do so, you must finish what you're doing and return control back to the system (which is why you can't use the while(true) { } loop).
So, you really need to be thinking event-driven programming in nodejs. You set up event listeners and you execute code some time in the future when those events occur. You can artificially create events with setTimeout() or setInterval(), but doing that constantly or with really, really short time durations is just polling and is not an efficient way to program a nodejs server either.
If you show or describe for us what you're really trying to do in the rest of your code, we can advise the most important part of this question which is how to actually write that code in an event-driven fashion.
I repeat, learning how to program in an event-driven fashion is required for an efficient, scalable nodejs server process.
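If the "while logic" turns out to be a long computation rather than polling, one common way to keep the server responsive is to split the work into slices and return to the event loop between them. The sketch below assumes exactly that; processInChunks and the summing workload are hypothetical and only illustrate the shape of the technique.

```javascript
// Process a large job in small slices, returning to the event loop between slices.
function processInChunks(items, chunkSize, handleItem, onDone) {
  let index = 0;
  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      // Schedule the next slice; pending I/O callbacks get a chance to run first.
      setImmediate(runChunk);
    } else {
      onDone();
    }
  }
  runChunk();
}

// Hypothetical workload: sum the numbers 1..10000 in slices of 1000.
const numbers = Array.from({ length: 10000 }, (_, i) => i + 1);
let sum = 0;
const finished = new Promise((resolve) => {
  processInChunks(numbers, 1000, (n) => { sum += n; }, resolve);
});
finished.then(() => console.log(sum)); // prints 50005000
```

Each slice still blocks the thread while it runs, so the chunk size is a trade-off between throughput and how long incoming connections may have to wait.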

Is there a delay between task completion and callback function execution?

I am learning Node.js and some JavaScript. I have read up on things like queues and execution stacks.
I am trying to calculate the time taken by a websocket request to complete. A typical emit has this form:
microtime1 = getTimeNow;
socket.emit("message","data", function (error, data) {
// calculate time taken by using microtime module by getting updated time and calculating difference.
microtime2 = getTimeNow;
time = microtime2 - microtime1;
})
If I am sending multiple messages, can I rely on the callback getting executed without delay, or can there be a hold-up in the queue so that the callback won't get executed?
In other words, would callback only get called once it's in stack or does it get executed while it's waiting to be picked up in the queue ?
Hope, I was able to explain my question.

In other words, would callback only get called once it's in stack or does it get executed while it's waiting to be picked up in the queue ?
The callback gets executed after the event it is waiting for has completed.
So the callback should work just fine; however, there is a caveat. Because Node.js is single threaded, some other piece of code could be blocking the main thread.
For example, the simple view of execution may look like this: one event is processed, and then another one is processed after it.
However, in reality it may look more like this.
Only the main thread is single threaded. Things like I/O operations are done on other dedicated threads that notify the main thread when they are done, and then the callback can be executed.
The problem occurs if your main thread becomes busy while waiting for the network action to complete.
This is hard to predict, though, and depends on what the rest of the app is doing. If your app is not doing anything else, this likely won't be an issue. But, IMHO, a better way is to make hundreds or thousands of calls and take an average, which will account for other possible causes of discrepancies in the delta.
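The averaging idea can be sketched as follows. This is only an illustration: fakeEmit is a hypothetical stand-in for an emit with an acknowledgement callback (it is not the socket.io API), and the sample count of 20 is arbitrary. It uses process.hrtime.bigint(), Node's monotonic nanosecond clock, instead of the microtime module from the question.

```javascript
// Hypothetical stand-in for an emit that calls back once the server has answered.
function fakeEmit(data, ack) {
  setTimeout(() => ack(null, data), 5);
}

// Time a single round trip with the monotonic high-resolution clock.
function timeOnce() {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    fakeEmit('data', (error) => {
      if (error) return reject(error);
      const elapsedNs = process.hrtime.bigint() - start;
      resolve(Number(elapsedNs) / 1e6); // convert nanoseconds to milliseconds
    });
  });
}

// Take several samples one after another and average them.
async function averageLatency(samples) {
  let total = 0;
  for (let i = 0; i < samples; i++) {
    total += await timeOnce();
  }
  return total / samples;
}

const average = averageLatency(20);
average.then((ms) => console.log(`average round trip: ${ms.toFixed(2)} ms`));
```

Each measurement includes any time the callback spent waiting for the main thread to become free, which is why a single sample can be misleading.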
Additional data from c-sharpcorner.com
The above diagram shows the execution process of Node.js. Let's understand it step by step.
Step 1
Whenever a request comes to Node.js API, that incoming request is added to the event queue. This is because Node.js can't handle multiple requests simultaneously. The first time, the incoming request is added to the event queue.
Step 2
Now, you can see in the diagram that one loop is there which always checks if any event or request is available in event queue or not. If any requests are there, then according to the "First Come, First Served" property of queue, the requests will be served.
Step 3
This Node.js event loop is single threaded and performs non blocking i/o tasks, so it sends requests to C++ internal thread pool where lots of threads can be run. This C++ internal thread pool is the part of event loop developed in Libuv. This can handle multiple requests. Now, event loop checks again and again if any event is there in the Event Queue. If there is any, then it serves to the thread pool if the blocking process is there.
Step 4
Now, the internal thread pool handles a lot of requests, like database request, file request, and many more.
Step 5
Whenever any thread completes that task, the callback function calls and sends the response back to the event loop.
Step 6
Now, event loop sends back the response to the client whose request is completed.

Meteor server restarts itself upon slow requests

I have a Meteor app that is performing some calls that are currently hanging. I'm processing a lot of items in a loop that is then upserting to server-side Mongo. (I think this is done asynchronously.) I understand that upserting in a loop is not good.
This whole functionality seems to make the app hang for a while. I'm even noticing sock.js and websocket error out in the console. I think this is all due to DDP, async Mongo upserts, and the slow requests.
Here's some pseudocode for what I'm talking about:
for (1..A Lot of records) {
//Is this async?
Collection.upsert(record)
}
Eventually this function will complete. However, I'll notice that Meteor "restarts" (I think this is true because I see Accounts.onLogin being called again). It's almost like the client refreshes after the slow request has actually finished. This results in something that appears like an infinite loop.
My question is why the app is "restarting". Is this due to something in the framework and how it handles slow requests? I.e. does it queue up all bad requests and then eventually retry them automatically?

I am not sure about what exactly is going on here, but it sounds like the client isn't able to reach the server while it is "busy", and then the client connection over DDP times out, and ends up with a client refresh. The server process probably doesn't restart.
One technique for improving this is to implement a queue in your database. One piece of code detects there are a bunch of database upserts to do, so it records the information in a table which is used as a queue.
You set up a cron job (using e.g. the npm module node-cron) that looks for things in the queue on a regular basis. When it finds an unprocessed record, it does the upsert work needed, and then either updates a status value in the queue record to 'done', or simply deletes it from the queue. You can decide how many records to process at a time to minimise interruptions.
Another approach is to do the processing in another node process on your server, basically like a worker process. If this process is busy, it is not going to impact your front end. The same queueing technique can be used to make sure this doesn't get bogged down either.
You lose a little reactivity this way, but given it's some kind of bulk process, that shouldn't matter.
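The queue-plus-scheduled-worker idea can be sketched roughly as below. Everything here is a stand-in chosen for illustration: an array replaces the queue table, a Map replaces the Mongo collection, setInterval plays the role of the cron job, and the batch size of 100 is arbitrary. It is not Meteor code.

```javascript
// In-memory stand-ins for the queue table and the target collection (hypothetical).
const queue = [];
const store = new Map();

// Stand-in for Collection.upsert(record).
function upsert(record) {
  store.set(record.id, record);
}

// Process at most `batchSize` queued records, then return so other events can run.
function drainBatch(batchSize) {
  const batch = queue.splice(0, batchSize);
  batch.forEach(upsert);
  return batch.length;
}

// Producer: record the work to do instead of doing it inline.
for (let id = 0; id < 250; id++) {
  queue.push({ id, value: `record-${id}` });
}

// Consumer: a timer plays the role of the cron job and drains one batch per tick.
const drained = new Promise((resolve) => {
  const timer = setInterval(() => {
    drainBatch(100);
    if (queue.length === 0) {
      clearInterval(timer);
      resolve(store.size);
    }
  }, 10);
});
drained.then((count) => console.log(`${count} records upserted`)); // prints "250 records upserted"
```

Between ticks the process is free to answer client traffic, which is the property the answer is after; a real implementation would persist the queue so that work survives a restart.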

Is there a limit to how many promises can or should run concurrently?

Surprisingly google had trouble returning the result for this question.
I'm wondering how many promises can or should be run in parallel before queuing them and waiting for the next one to finish. I guess it might depend on the user's internet, but I figured it was worth asking.
If it's based on the user's ISP/connection type is there a way to test for the ideal amount of promises to send before starting a queue?
Also, I'm talking strictly from the client side. So, single thread js.
Example code:
function uploadToServer(requestData){
return Promise((...));
}
function sendRequests(requestArray){
var count = 0;
for(var requestData in requestArray){
if(count<idealAmount){
uploadToServer(idealAmount).then(count--);
count++;
}else{
// Logic to wait before attempting to fire event
}
}
}

Promises themselves have no particular coded limits. They are just a notification system and you could have millions of them just fine (as long as you had enough memory to hold those Javascript objects).
Now, if a promise represents an underlying asynchronous operation (which they usually do), there could very well be some limits to how many of that specific type of asynchronous operation can be in flight at the same time. For example, at some point you might run into limits of how many requests a single host would accept from you at the same time. Or, you might run into local resources issues with zillions of connections somewhere.
For things like node.js disk I/O operations, the underlying disk I/O sub-system already has a queuing system so that only a small number of operations are actually running at once and the rest are queued.
So, to answer a question about how many concurrent operations you can have, it can only be analyzed and answered in the context of a specific type of asynchronous request and sometimes even a specific type of receiving host.
If you know you're processing a large or potentially large array of requests and you'll be sending a network request for every item in the array, then it is common to code a limit yourself to avoid overwhelming either local resources or the target host resources. This is usually not done with a queue, but rather code that just launches N requests and then as one finishes, it launches the next one and so on. Both the Bluebird and Async libraries have methods for managing this for you. In Bluebird, it's the concurrency option for Promise.map(). I've also hand-coded loops that manage the number of concurrent connections several times myself and here are links to some of that code:
Promise.all consumes all my RAM
Javascript - how to control how many promises access network in parallel
Make several requests to an API that can only handle 20 request a minute
Loop through an api get request with variable URL
Choose proper async method for batch processing for max requests/sec
Nodejs: Async request with a list of URL
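The "launch N requests, then start the next one as each finishes" approach described above can be sketched like this. It is a minimal hand-rolled version, not the Bluebird or Async implementation: mapWithLimit is a hypothetical helper, and uploadToServer here is a timer-based stand-in that also counts how many calls overlap.

```javascript
// Run `worker` over `items`, keeping at most `limit` calls in flight at once.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Hypothetical upload that tracks how many calls are active simultaneously.
let inFlight = 0;
let maxInFlight = 0;
async function uploadToServer(requestData) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return requestData * 2;
}

const uploads = mapWithLimit([1, 2, 3, 4, 5, 6, 7], 3, uploadToServer);
uploads.then((results) => console.log(results, maxInFlight));
```

Results are stored by index, so they come back in input order even though requests finish in any order. In this sketch a rejected worker rejects the overall promise while the other runners keep going, so real code would need to decide how to handle failures.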

As #jfried00 mentioned, there can't be any limit on the number of promises running, as there's no such thing as running a Promise. Once you call an async function or run code like new Promise(res => something(res)), the method is run.
What you can do is limit the number of promise chains being resolved:
// ten promises ago:
let oldPromise = doSomethingAsync();
// and now:
// pass a function so the new work starts only after oldPromise resolves
oldPromise.then(() => doSomethingNewAsync());
But actually coding this on your own is gonna dye your hair grey rather quickly as my example has shown - error handling, finding the empty slots and keeping the flow in the right order will be hard.
That said it is possible and my framework, Scramjet, which I'll shamelessly plug here does what you need:
DataStream.from(requestArray)
.setOptions({maxParallel: 4})
.unorder(requestData => uploadToServer(requestData))
.run()
Scramjet will keep 4 promises resolving but won't try to keep order (there are other methods for that) and you can use any function - if it doesn't return a promise, it will work the same as if it did. Here's some more text on unordered transforms in scramjet. You can also peek at the source code if you'd rather do that yourself...

What would happen if a variable were manipulated more than once at the exact same time? Is it possible? [duplicate]

Let's assume I run this piece of code.
var score = 0;
for (var i = 0; i < arbitrary_length; i++) {
async_task(i, function() { score++; }); // increment callback function
}
In theory I understand that this presents a data race and two threads trying to increment at the same time may result in a single increment, however, nodejs(and javascript) are known to be single threaded. Am I guaranteed that the final value of score will be equal to arbitrary_length?

Am I guaranteed that the final value of score will be equal to
arbitrary_length?
Yes, as long as all async_task() calls call the callback once and only once, you are guaranteed that the final value of score will be equal to arbitrary_length.
It is the single-threaded nature of Javascript that guarantees that there are never two pieces of Javascript running at the exact same time. Instead, because of the event driven nature of Javascript in both browsers and node.js, one piece of JS runs to completion, then the next event is pulled from the event queue and that triggers a callback which will also run to completion.
There is no such thing as interrupt driven Javascript (where some callback might interrupt some other piece of Javascript that is currently running). Everything is serialized through the event queue. This is an enormous simplification and prevents a lot of sticky situations that would otherwise be a lot of work to program safely when you have either multiple threads running concurrently or interrupt driven code.
There still are some concurrency issues to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access. While only one will ever be accessing it at any given time, it is still possible that a piece of code that contains several asynchronous operations could leave some state in an "in between" state while it was in the middle of several async operations at a point where some other async operation could run and could attempt to access that data.
You can read more about the event driven nature of Javascript here: How does JavaScript handle AJAX responses in the background? and that answer also contains a number of other references.
And another similar answer that discusses the kind of shared data race conditions that are possible: Can this code cause a race condition in socket io?
Some other references:
how do I prevent event handlers to handle multiple events at once in javascript?
Do I need to be concerned with race conditions with asynchronous Javascript?
JavaScript - When exactly does the call stack become "empty"?
Node.js server with multiple concurrent requests, how does it work?
To give you an idea of the concurrency issues that can happen in Javascript (even without threads and without interrupts), here's an example from my own code.
I have a Raspberry Pi node.js server that controls the attic fans in my house. Every 10 seconds it checks two temperature probes, one inside the attic and one outside the house and decides how it should control the fans (via relays). It also records temperature data that can be presented in charts. Once an hour, it saves the latest temperature data that was collected in memory to some files for persistence in case of power outage or server crash. That saving operation involves a series of async file writes. Each one of those async writes yields control back to the system and then continues when the async callback is called signaling completion. Because this is a low memory system and the data can potentially occupy a significant portion of the available RAM, the data is not copied in memory before writing (that's simply not practical). So, I'm writing the live in-memory data to disk.
At any time during any of these async file I/O operations, while waiting for a callback to signify completion of the many file writes involved, one of my timers in the server could fire, I'd collect a new set of temperature data and that would attempt to modify the in-memory data set that I'm in the middle of writing. That's a concurrency issue waiting to happen. If it changes the data while I've written part of it and am waiting for that write to finish before writing the rest, then the data that gets written can easily end up corrupted because I will have written out one part of the data, the data will have gotten modified from underneath me and then I will attempt to write out more data without realizing it's been changed. That's a concurrency issue.
I actually have a console.log() statement that explicitly logs when this concurrency issue occurs on my server (and is handled safely by my code). It happens once every few days on my server. I know it's there and it's real.
There are many ways to work around those types of concurrency issues. The simplest would have been to just make a copy in memory of all the data and then write out the copy. Because there are not threads or interrupts, making a copy in memory would be safe from concurrency (there would be no yielding to async operations in the middle of the copy to create a concurrency issue). But, that wasn't practical in this case. So, I implemented a queue. Whenever I start writing, I set a flag on the object that manages the data. Then, anytime the system wants to add or modify data in the stored data while that flag is set, those changes just go into a queue. The actual data is not touched while that flag is set. When the data has been safely written to disk, the flag is reset and the queued items are processed. Any concurrency issue was safely avoided.
So, this is an example of concurrency issues that you do have to be concerned about. One great simplifying assumption with Javascript is that a piece of Javascript will run to completion without any threat of getting interrupted as long as it doesn't purposely return control back to the system. That makes handling concurrency issues like the ones described above much, much easier because your code will never be interrupted except when you consciously yield control back to the system. This is why we don't need mutexes and semaphores and other things like that in our own Javascript. We can use simple flags (just a regular Javascript variable) like I described above if needed.
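The flag-and-queue approach described above might look roughly like the following. This is a reconstruction for illustration only, not the author's actual code: GuardedStore, its fields and the timer-based slowWrite are all hypothetical names.

```javascript
// Hypothetical data manager: changes made during a save are queued and applied afterwards.
class GuardedStore {
  constructor() {
    this.data = [];
    this.saving = false; // the plain flag, no mutex needed
    this.pending = [];
  }

  add(value) {
    if (this.saving) {
      this.pending.push(value); // do not touch live data while it is being written
    } else {
      this.data.push(value);
    }
  }

  async save(writeChunk) {
    this.saving = true;
    try {
      for (const value of this.data) {
        await writeChunk(value); // control returns to the event loop here
      }
    } finally {
      this.saving = false;
      // Apply everything that arrived during the save.
      this.data.push(...this.pending.splice(0));
    }
  }
}

const guarded = new GuardedStore();
guarded.add(1);
guarded.add(2);

// Stand-in for an async file write that takes 10 ms.
const written = [];
const slowWrite = (value) =>
  new Promise((resolve) => setTimeout(() => { written.push(value); resolve(); }, 10));

const saved = guarded.save(slowWrite);
setTimeout(() => guarded.add(3), 5); // arrives while the save is in progress
saved.then(() => console.log(written, guarded.data));
// prints: [ 1, 2 ] [ 1, 2, 3 ]
```

The value added mid-save never reaches the data being written, yet it is not lost: it is appended once the flag is cleared. Checking and setting the flag is safe here precisely because nothing can interrupt the synchronous code between awaits.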
In any entirely synchronous piece of Javascript, you will never be interrupted by other Javascript. A synchronous piece of Javascript will run to completion before the next event in the event queue is processed. This is what is meant by Javascript being an "event-driven" language. As an example of this, if you had this code:
console.log("A");
// schedule timer for 500 ms from now
setTimeout(function() {
console.log("B");
}, 500);
console.log("C");
// spin for 1000ms
var start = Date.now();
while(Date.now() - start < 1000) {}
console.log("D");
You would get the following in the console:
A
C
D
B
The timer event cannot be processed until the current piece of Javascript runs to completion, even though it was likely added to the event queue sooner than that. The way the JS interpreter works is that it runs the current JS until it returns control back to the system and then (and only then), it fetches the next event from the event queue and calls the callback associated with that event.
Here's the sequence of events under the covers.
This JS starts running.
console.log("A") is output.
A timer event is scheduled for 500ms from now. The timer subsystem uses native code.
console.log("C") is output.
The code enters the spin loop.
At some point in time part-way through the spin loop the previously set timer is ready to fire. It is up to the interpreter implementation to decide exactly how this works, but the end result is that a timer event is inserted into the Javascript event queue.
The spin loop finishes.
console.log("D") is output.
This piece of Javascript finishes and returns control back to the system.
The Javascript interpreter sees that the current piece of Javascript is done so it checks the event queue to see if there are any pending events waiting to run. It finds the timer event and a callback associated with that event and calls that callback (starting a new block of JS execution). That code starts running and console.log("B") is output.
That setTimeout() callback finishes execution and the interpreter again checks the event queue to see if there are any other events that are ready to run.

Node uses an event loop. You can think of this as a queue. So we can assume that your for loop puts the function() { score++; } callback on this queue arbitrary_length times. After that, the JS engine runs these one by one and increases score each time. So yes. The only exception is if a callback is not called or the score variable is accessed from somewhere else.
Actually, you can use this pattern to run tasks in parallel, collect the results and call a single callback when every task is done.
var results = [];
for (var i = 0; i < arbitrary_length; i++) {
async_task(i, function(result) {
results.push(result);
if (results.length == arbitrary_length)
tasksDone(results);
});
}
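For comparison, the same collect-when-all-are-done pattern can be written with promises. The sketch below assumes a callback-style async_task, implemented here as a hypothetical timer-based stand-in in which later tasks finish first, and wraps it so that Promise.all can do the counting.

```javascript
// Hypothetical callback-style task; larger i finishes sooner to show the ordering.
function async_task(i, callback) {
  setTimeout(() => callback(i * 10), 30 - i * 5);
}

// Wrap the callback API in a promise.
function asyncTaskPromise(i) {
  return new Promise((resolve) => async_task(i, resolve));
}

const arbitrary_length = 4;
const tasks = [];
for (let i = 0; i < arbitrary_length; i++) {
  tasks.push(asyncTaskPromise(i));
}

// Resolves once every task has called back; results follow input order.
const allDone = Promise.all(tasks);
allDone.then((results) => console.log(results)); // prints [ 0, 10, 20, 30 ]
```

One difference from the counter-based version: results.push() stores values in completion order, whereas Promise.all returns them in the order the tasks were started.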

No two invocations of the function can happen at the same time (because node is single threaded), so that will not be a problem. The only problem would be if in some cases async_task(..) drops the callback. But if, e.g., 'async_task(..)' was just calling setTimeout(..) with the given function, then yes, each call will execute, they will never collide with each other, and 'score' will have the value expected, 'arbitrary_length', at the end.
Of course, the 'arbitrary_length' can't be so great as to exhaust memory, or overflow whatever collection is holding these callbacks. There is no threading issue however.

I do think it’s worth noting for others that view this, you have a common mistake in your code. For the variable i you either need to use let or reassign to another variable before passing it into the async_task(). The current implementation will result in each function getting the last value of i.
