How to process socket.io events in their incoming order - javascript

I have the following setup:
async function MyFunction(param) {
  // ... Do some computation
  await WriteToDB();
}

io.on('connection', (socket) => {
  socket.on('AnEvent', (param) => MyFunction(param));
});
When an event comes in, it calls an asynchronous function which does some computation and, at the end, writes the result to a database with another asynchronous call.
If MyFunction doesn't end with an asynchronous call that writes to the database, for example
function MyFunction(param) {
  // ... Do some computation
}
then it is obvious that all events will be processed in their incoming order: the processing of the next event only starts after the processing of the previous one finishes. However, because of the asynchronous call to the database, I don't know whether those incoming events will still be fully processed in order. I am afraid that the processing of the next event starts before the previous await WriteToDB() finishes. How do I change the code to fully process them in order?
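To make my concern concrete, here is a minimal illustration with a stand-in WriteToDB that just waits before resolving; two events arriving back to back interleave at the await:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for the real database write (assumption: it takes some time)
async function WriteToDB() {
  await sleep(100);
}

async function MyFunction(param) {
  console.log(`start ${param}`);
  await WriteToDB();
  console.log(`done ${param}`);
}

// Simulate two events arriving back to back
MyFunction(1);
MyFunction(2);
// Prints: start 1, start 2, done 1, done 2
// i.e. event 2 starts processing before event 1 has finished its DB write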

You are correct: once the handler is asynchronous, there's no guarantee that incoming events will be fully processed in order.
To achieve what you are asking, you need a message queue: push incoming events onto a queue and have a worker process them one at a time, only starting the next one once the previous one has finished.
const messageQueue = [];
let processing = false;

// Socket.IO handler: enqueue the message and start draining if idle
const eventHandler = (message) => {
  messageQueue.push(message);
  if (!processing) {
    processing = true;
    messageHandler();
  }
};

// Process messages strictly one at a time, in arrival order
const messageHandler = async () => {
  if (messageQueue.length === 0) {
    processing = false;
    return;
  }
  const message = messageQueue.shift();
  await MyFunction(message); // handle the message, including the DB write
  // Only once the previous message is fully processed, move on to the next one
  return messageHandler();
};
Of course, this example is pretty naive, but I hope it gives you a general idea of how to accomplish what you are asking.
If you find yourself needing a more robust message queue, look into RabbitMQ, BullMQ, or Kafka.
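If the only goal is to keep the handlers sequential, you don't strictly need a queue object; chaining each event onto the previous handler's promise gives the same ordering. A minimal sketch, assuming MyFunction returns a promise:

io.on('connection', (socket) => {
  // One chain per socket; each event waits for the previous one to finish
  let chain = Promise.resolve();
  socket.on('AnEvent', (param) => {
    chain = chain
      .then(() => MyFunction(param))
      .catch((err) => console.error('AnEvent failed', err)); // keep the chain alive on errors
  });
});

Note that this serializes events per connection; if the ordering must hold across all sockets, keep a single shared chain instead.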

Related

Creating a Readable stream from emitted data chunks

Short backstory: I am trying to create a Readable stream based on data chunks that are emitted back to my server from the client side with WebSockets. Here's a class I've created to "simulate" that behavior:
class DataEmitter extends EventEmitter {
  constructor() {
    super();
    const data = ['foo', 'bar', 'baz', 'hello', 'world', 'abc', '123'];
    // Every second, emit an event with a chunk of data
    const interval = setInterval(() => {
      this.emit('chunk', data.splice(0, 1)[0]);
      // Once there are no more items, emit an event
      // notifying that that is the case
      if (!data.length) {
        this.emit('done');
        clearInterval(interval);
      }
    }, 1e3);
  }
}
In this post, the dataEmitter in question will have been created like this.
// Our data is being emitted through events in chunks from some place.
// This is just to simulate that. We cannot change the flow - only listen
// for the events and do something with the chunks.
const dataEmitter = new DataEmitter();
Right, so I initially tried this:
const readable = new Readable();
dataEmitter.on('chunk', (data) => {
  readable.push(data);
});
dataEmitter.once('done', () => {
  readable.push(null);
});
But that results in this error:
Error [ERR_METHOD_NOT_IMPLEMENTED]: The _read() method is not implemented
So I did this, implementing read() as an empty function:
const readable = new Readable({
  read() {},
});
dataEmitter.on('chunk', (data) => {
  readable.push(data);
});
dataEmitter.once('done', () => {
  readable.push(null);
});
And it works when piping into a write stream, or sending the stream to my test API server. The resulting .txt file looks exactly as it should:
foobarbazhelloworldabc123
However, I feel like there's something quite wrong and hacky with my solution. I attempted to put the listener registration logic (.on('chunk', ...) and .once('done', ...)) within the read() implementation; however, read() seems to get called multiple times, and that results in the listeners being registered multiple times.
The Node.js documentation says this about the _read() method:
When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. _read() may continue reading from the resource and pushing data until readable.push() returns false. Only when _read() is called again after it has stopped should it resume pushing additional data into the queue.
After dissecting this, it seems that the consumer of the stream calls upon .read() when it's ready to read more data. And when it is called, data should be pushed into the stream. But, if it is not called, the stream should not have data pushed into it until the method is called again (???). So wait, does the consumer call .read() when it is ready for more data, or does it call it after each time .push() is called? Or both?? The docs seem to contradict themselves.
Implementing .read() on Readable is straightforward when you've got a basic resource to stream, but what would be the proper way of implementing it in this case?
And also, would someone be able to explain in better terms what the .read() method is on a deeper level, and how it should be implemented?
Thanks!
Response to the answer:
I did try registering the listeners within the read() implementation, but because it is called multiple times by the consumer, it registers the listeners multiple times.
Observing this code:
const readable = new Readable({
  read() {
    console.log('called');
    dataEmitter.on('chunk', (data) => {
      readable.push(data);
    });
    dataEmitter.once('done', () => {
      readable.push(null);
    });
  },
});
readable.pipe(createWriteStream('./data.txt'));
The resulting file looks like this:
foobarbarbazbazbazhellohellohellohelloworldworldworldworldworldabcabcabcabcabcabc123123123123123123123
Which makes sense, because the listeners are being registered multiple times.
It seems like the only purpose of actually implementing the read() method is to start receiving the chunks and pushing them into the stream only once the consumer is ready for that.
Based on these conclusions, I've come up with this solution.
class MyReadable extends Readable {
  // Keep track of whether or not the listeners have already
  // been added to the data emitter.
  #registered = false;

  _read() {
    // If the listeners have already been registered, do
    // absolutely nothing.
    if (this.#registered) return;
    // "Notify" the client via websockets that we're ready
    // to start streaming the data chunks.
    const emitter = new DataEmitter();
    const handler = (chunk: string) => {
      this.push(chunk);
    };
    emitter.on('chunk', handler);
    emitter.once('done', () => {
      this.push(null);
      // Clean up the listener once it's done (this is
      // assuming the #emitter object will still be used
      // in the future).
      emitter.off('chunk', handler);
    });
    // Mark the listeners as registered.
    this.#registered = true;
  }
}

const readable = new MyReadable();
readable.pipe(createWriteStream('./data.txt'));
But this implementation doesn't allow the consumer to control when things are pushed. I guess, however, that in order to achieve that sort of control, you'd need to communicate with the resource emitting the chunks to tell it to stop until the read() method is called again.
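For completeness, here is a rough sketch of what honoring backpressure could look like, assuming (hypothetically) that the chunk source exposed pause() and resume() methods: push() returning false would mean the internal buffer is full, and _read() being called again would mean the consumer is ready for more.

import { Readable } from 'stream';

class BackpressuredReadable extends Readable {
  constructor(source) {
    super();
    // source is a hypothetical emitter with pause()/resume() support
    this.source = source;
    this.source.on('chunk', (chunk) => {
      // push() returns false when the internal buffer is full;
      // stop the source until _read() is called again
      if (!this.push(chunk)) {
        this.source.pause();
      }
    });
    this.source.once('done', () => this.push(null));
  }

  _read() {
    // The consumer wants more data; let the source flow again
    this.source.resume();
  }
}

This way pause() is only called when the stream's internal buffer is full, so the consumer's reading pace propagates back to the producer.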

Asynchronous readline loop without async / await

I'd like to be using this function (which runs fine on my laptop), or something like it, on my embedded device. But two problems:
Node.js on our device is so old that its JavaScript doesn't support async / await. Given our hundreds of units in the field, updating is probably impractical.
Even if that weren't a problem, we have another problem: the file may be tens of megabytes in size, not something that could fit in memory all at once. And this for-await loop will set thousands of asynchronous tasks into motion, more or less all at once.
async function sendOldData(pathname) {
  const reader = ReadLine.createInterface({
    input: fs.createReadStream(pathname),
    crlfDelay: Infinity
  });
  for await (const line of reader) {
    const record = JSON.parse(line);
    sendOldRecord(record);
  }
}
function sendOldRecord(record) {...}
Promises work in this old version. I'm sure there is a graceful syntax for doing this sequentially with Promises:
read one line
massage its data
send that data to our server
sequentially, but asynchronously, so that the JavaScript event loop is not blocked while the data is sent to the server.
Please, could someone suggest the right syntax for doing this in my outdated JavaScript?
Make a queue, so each completed task takes the next one off the array:
function foo() {
  const reader = [
    '{"foo": 1}',
    '{"foo": 2}',
    '{"foo": 3}',
    '{"foo": 4}',
    '{"foo": 5}',
  ];

  function doNext() {
    if (!reader.length) {
      console.log('done');
      return;
    }
    const line = reader.shift();
    const record = JSON.parse(line);
    sendOldRecord(record, doNext);
  }

  doNext();
}

function sendOldRecord(record, done) {
  console.log(record);
  // whatever your async task is
  setTimeout(function () {
    done();
  }, Math.floor(2000 * Math.random()));
}
foo();
The Problem
Streams and asynchronous processing are somewhat of a pain to get to work well together while handling all possible error conditions, and things are even worse for the readline module. Since you seem to be saying that you can't use the for await () construct on the readable (which, even when it is supported, has various issues), things are even a bit more complicated.
The main problem with readline.createInterface() on a stream is that it reads a chunk of the file, parses that chunk for full lines and then synchronously sends all the lines in a tight for loop.
You can literally see the code here:
for (let n = 0; n < lines.length; n++) this[kOnLine](lines[n]);
The implementation of kOnLine does this:
this.emit('line', line);
So, this is a tight for loop that emits all the lines it has read. So ... if you try to do something asynchronous in your handling of the line event, the moment you hit an await or an asynchronous callback, this readline code will send the next line event before you're done processing the previous one. This makes it a pain to do asynchronous processing of the line events in sequential order, where you finish the asynchronous processing of one line before starting on the next one. IMO, this is a very busted design as it only really works with synchronous processing. You will notice that this for loop also doesn't care whether the readline object was paused either. It just pumps out all the lines it has without regard for anything.
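To make the problem concrete, here's a minimal sketch of the failure mode (processLine stands in for whatever hypothetical asynchronous per-line work you need to do):

import fs from 'fs';
import * as ReadLine from 'readline';

const rl = ReadLine.createInterface({ input: fs.createReadStream('temp.txt') });
rl.on('line', async (line) => {
  // As soon as this await yields, readline is free to emit the next 'line'
  // event, so several lines end up being processed concurrently rather
  // than strictly one after another.
  await processLine(line); // processLine is a hypothetical async function
});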
Discussion of Possible Solutions
So, what to do about that. Some part of a fix for this is in the asynchronous iterator interface to readline (but it has other problems which I've filed bugs on). But, the supposition of your question seems to be that you can't use that asynchronous iterator interface because your device may have an older version of nodejs. If that's the case, then I only know of two options:
Ditch the readline.createInterface() functionality entirely and either use a 3rd party module or do your own line boundary processing.
Cover the line event with your own code that supports asynchronous processing of lines without getting the next line in the middle of still processing the previous one.
A Solution
I've written an implementation for option #2, covering the line event with your own code. In my implementation, we just acknowledge that line events will arrive during our asynchronous processing of previous lines, but instead of notifying you about them, the input stream gets paused and these "early" lines get queued. With this solution, the readline code will read a chunk of data from the input stream, parse it into its full lines, and synchronously send all the line events for those full lines. But, upon receipt of the first line event, we pause the input stream and start queueing subsequent line events. So, you can asynchronously process a line and you won't get another one until you ask for the next line.
This code has a different way of communicating incoming lines to your code. Since we're in the age of promises for asynchronous code, I've added a promise-based reader.getNextLine() function to the reader object.
This lets you write code like this:
import fs from 'fs';

async function run(filename) {
  let reader = createLineReader({
    input: fs.createReadStream(filename),
    crlfDelay: Infinity
  });
  let line;
  let cntr = 0;
  while ((line = await reader.getNextLine()) !== null) {
    // simulate some asynchronous operation in the processing of the line
    console.log(`${++cntr}: ${line}`);
    await processLine(line);
  }
}

run("temp.txt").then(result => {
  console.log("done");
}).catch(err => {
  console.log(err);
});
And, here's the implementation of createLineReader():
import * as ReadLine from 'readline';

function createLineReader(options) {
  const stream = options.input;
  const reader = ReadLine.createInterface(options);

  // state machine variables
  let latchedErr = null;
  let isPaused = false;
  let readerClosed = false;
  const queuedLines = [];

  // resolves with line
  // resolves with null if no more lines
  // rejects with error
  reader.getNextLine = async function() {
    if (latchedErr) {
      // once we get an error, we're done
      throw latchedErr;
    } else if (queuedLines.length) {
      // if something in the queue, return the oldest from the queue
      const line = queuedLines.shift();
      if (queuedLines.length === 0 && isPaused) {
        reader.resume();
      }
      return line;
    } else if (readerClosed) {
      // if nothing in the queue and the reader is closed, then signify end of data
      return null;
    } else {
      // waiting for more line data to arrive
      return new Promise((resolve, reject) => {
        function clear() {
          reader.off('error', errorListener);
          reader.off('queued', queuedListener);
          reader.off('done', doneListener);
        }
        function queuedListener() {
          clear();
          resolve(queuedLines.shift());
        }
        function errorListener(e) {
          clear();
          reject(e);
        }
        function doneListener() {
          clear();
          resolve(null);
        }
        reader.once('queued', queuedListener);
        reader.once('error', errorListener);
        reader.once('done', doneListener);
      });
    }
  };

  reader.on('pause', () => {
    isPaused = true;
  }).on('resume', () => {
    isPaused = false;
  }).on('line', line => {
    queuedLines.push(line);
    if (!isPaused) {
      reader.pause();
    }
    // tell any queue listener that something was just added to the queue
    reader.emit('queued');
  }).on('close', () => {
    readerClosed = true;
    if (queuedLines.length === 0) {
      reader.emit('done');
    }
  });

  // propagate input stream errors so getNextLine() can reject with them
  stream.on('error', (err) => {
    latchedErr = err;
    // only notify if someone is currently awaiting a line; otherwise the
    // latched error is thrown on the next getNextLine() call
    if (reader.listenerCount('error') > 0) {
      reader.emit('error', err);
    }
  });

  return reader;
}
Explanation
Internally, the implementation takes each new line event and puts it into a queue. Then, reader.getNextLine() just pulls items from the queue or waits (with a promise) for the queue to get something put in it.
During operation, the readline object will get a chunk of data from your readstream and parse it into whole lines. The whole lines all get added to the queue (via line events). The readstream is then paused so it won't generate any more lines until the queue has been drained.
When the queue becomes empty, the readstream will be resumed so it can send more data to the reader object.
This is scalable to very large files because it only queues the whole lines found in one chunk of the file being read. Once those lines are queued, the input stream is paused so it won't put more into the queue. After the queue is drained, the input stream is resumed so it can send more data, and the cycle repeats...
Any errors in the readstream will trigger an error event on the readline object which will either reject a reader.getNextLine() that is already waiting for the next line or will reject the next time reader.getNextLine() is called.
Disclaimers
This has only been tested with file-based readstreams.
I would not recommend having more than one reader.getNextLine() in process at once as this code does not anticipate that and it's not even clear what that should do.
Basically, you can achieve this using a functional approach:
const arrayOfValues = [1, 2, 3, 4, 5];
const chainOfPromises = arrayOfValues.reduce((acc, item) => {
  return acc.then((result) => {
    // Here you can add your logic for parsing/sending the request
    // And here you are chaining the next promise request
    return yourAsyncFunction(item);
  });
}, Promise.resolve());
// Basically this will do
// Promise.resolve().then(_ => yourAsyncFunction(1)).then(_ => yourAsyncFunction(2)) and so on...
// The chain starts running as soon as it is built; attach a final handler
// to know when everything is done (or to catch errors)
chainOfPromises.then(() => console.log('all done'));
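For illustration, yourAsyncFunction can be any function that returns a promise; a hypothetical stand-in that simulates the per-item request shows that the items are processed strictly one after another:

// Hypothetical stand-in for the real per-item work (e.g. send one record to the server)
function yourAsyncFunction(item) {
  return new Promise((resolve) => {
    setTimeout(() => {
      console.log('processed', item);
      resolve();
    }, 100);
  });
}
// Output: "processed 1" ... "processed 5", one at a time and in order,
// because each .then() waits for the previous promise to settle first.

Note that this assumes arrayOfValues fits in memory; for the multi-megabyte file in the question, the lines would still need to be fed in incrementally (as in the readline-based answers above).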

Cancel current websocket handling when a new websocket request is made

I am using the npm ws module (or actually the wrapper called isomorphic-ws) for the websocket connection.
NPM Module: isomorphic-ws
I use it to receive some array data from a websocket++ server running on the same PC. This data is then processed and displayed as a series of charts.
Now the problem is that the handling itself takes a very long time. I use one message to calculate 16 charts and for each of them I need to calculate a lot of logarithms and other slow operations and all that in JS. Well, the whole refresh operation takes about 20 seconds.
Now I could actually live with that, but the problem is that when I get a new request, it is processed only after the whole message handler is finished. And if I get several requests in the meantime, all of them will be processed in the order they came in. So the requests queue up and the current state gets more and more outdated as time goes on...
I would like to have a way of detecting that there is another message waiting to be processed. If that is the case I could just stop the current handler at any time and start over... So when using npm ws, is there a way of telling that there is another message waiting to be processed?
Thanks
You need to create some sort of cancelable job wrapper. It's hard to give a concrete suggestion without seeing your code. But it could be something like this.
const processArray = array => {
  let canceled = false;
  const promise = new Promise((resolve, reject) => {
    const results = [];
    // do something with the array
    for (let i = 0; i < array.length; i++) {
      // check on each iteration if the job has been canceled
      if (canceled) return reject({ reason: 'canceled' });
      results.push(doSomething(array[i]));
    }
    resolve(results);
  });
  return {
    cancel: () => {
      canceled = true;
    },
    promise
  };
};
const job = processArray([1, 2, 3, ...1000000]) // huge array
// handle the success
job.promise.then(result => console.log(result))
// Cancel the job
job.cancel()
I'm sure there are libraries to serve this exact purpose. But I just wanted to give a basic example of how it could be done.
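One caveat with the sketch above: because the for loop runs synchronously inside the Promise executor, nothing else (including the cancel() call triggered by a newly arrived websocket message) can run until the loop finishes. To make cancellation effective mid-job, the work has to yield back to the event loop periodically, for example by processing the array in chunks. A rough sketch, reusing the hypothetical doSomething from above:

// A variation that yields between chunks, so a newly arrived websocket
// message gets a chance to run and call cancel() mid-job.
const processArrayInChunks = (array, chunkSize = 100) => {
  let canceled = false;
  const promise = new Promise((resolve, reject) => {
    let i = 0;
    const processChunk = () => {
      if (canceled) return reject({ reason: 'canceled' });
      const end = Math.min(i + chunkSize, array.length);
      for (; i < end; i++) {
        doSomething(array[i]); // the expensive per-item work (hypothetical, as above)
      }
      if (i >= array.length) return resolve();
      setTimeout(processChunk, 0); // yield so other events (and cancel()) can run
    };
    processChunk();
  });
  return { cancel: () => { canceled = true; }, promise };
};

Then, when a new message arrives, call job.cancel() on the job in progress and start a new job with the latest data.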

What's the difference between these two async functions in Node.js?

const fs = require("fs");

fs.readFile("aa.js", () => {
  console.log("1");
  process.nextTick(() => {
    console.log("3");
  });
});

fs.readFile("aa.js", () => {
  console.log("2");
  process.nextTick(() => {
    console.log("4");
  });
});
// the result is 1 3 2 4
const net = require("net");
const server = net.createServer(() => {}).listen(8080);

server.on("listening", () => {
  console.log("1");
  process.nextTick(() => {
    console.log("3");
  });
});

server.on("listening", () => {
  console.log("2");
  process.nextTick(() => {
    console.log("4");
  });
});
// the result is 1 2 3 4
IMO, these two async callbacks should behave the same,
but the results are different.
What's the reason behind the scenes?
The first one is a race between two completely separate asynchronous fs.readFile() operations. Whichever one completes first is likely to get both of its console logs in before the other. Because these are operations that take some measurable amount of time and they both have to do the exact same amount of work, it's likely that the one you started first will finish first, and that's what you're seeing. But, technically, it's an indeterminate race between the two asynchronous operations and they could finish in any order. Since one is likely to finish slightly before the other, its completion callback will be called before the other's, and it's also likely that the 2nd one won't yet be done before the next tick happens, which is why you see both log messages from whichever one finishes first.
Your second one is two event listeners for the exact same event. So, those two listeners are guaranteed to be called on the same tick, one after the other. When an eventEmitter object emits an event, it synchronously calls all the listeners for that event one after the other, all on the same tick. That's why you get 1 and then 2 before 3 and 4, which occur on future ticks.
One should not confuse an eventEmitter object with the event queue. They are not the same thing. In this case, your server object is a subclass of an eventEmitter. Some code internal to the server decides to emit the listening event to listeners of the server's eventEmitter. That decision to emit the event was likely the result of some asynchronous operation that came from the event queue. But actually emitting on the eventEmitter just makes synchronous function calls to the registered listeners. The event queue is not involved in this. Internal to the eventEmitter code, it literally has a for loop that loops through the matching event handlers and calls each one, one after the other. That's why you get 1, then 2.
In fact, here's the code reference inside the .emit() method on the EventEmitter class definition that shows looping through the matching listeners and calling them synchronously. And, here's a snippet of that code calling each listener one after the other:
const len = handler.length;
const listeners = arrayClone(handler, len);
for (var i = 0; i < len; ++i)
  Reflect.apply(listeners[i], this, args);
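To see that synchronous delivery directly, here's a small standalone illustration with a plain EventEmitter:

const EventEmitter = require("events");

const ee = new EventEmitter();
ee.on("ping", () => console.log("listener 1"));
ee.on("ping", () => console.log("listener 2"));

console.log("before emit");
ee.emit("ping"); // both listeners run synchronously, right here
console.log("after emit");
// Output: before emit, listener 1, listener 2, after emit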

What does Winston callback actually mean?

I am a beginner in Node js and was wondering if someone could help me out.
Winston allows you to pass in a callback which is executed when all transports have been logged - could someone explain what this means as I am slightly lost in the context of callbacks and Winston?
From https://www.npmjs.com/package/winston#events-and-callbacks-in-winston I am shown an example which looks like this:
logger.info('CHILL WINSTON!', { seriously: true }, function (err, level, msg, meta) {
  // [msg] and [meta] have now been logged at [level] to **every** transport.
});
Great... however I have several logger.info calls across my program, and was wondering what I should put into the callback. Also, do I need to do this for every logger.info call, or can I put all the logs into one function?
I was thinking of adding all of the log calls into an array and then using async.parallel so they all get logged at the same time? Good or bad idea?
The main aim is to log everything before my program continues with other tasks.
Explanation of the code above in callback and winston context would be greatly appreciated!
Winston allows you to pass in a callback which is executed when all transports have been logged
This means that if you have a logger that handles more than one transport (for instance, console and file), the callback will be executed only after the messages have been logged on all of them (in this case, on both the console and the file).
An I/O operation on a file will always take longer than just outputting a message on the console. Winston makes sure that the callback will be triggered, not at the end of the first transport logging, but at the end of the last one of them (that is, the one that takes longest).
You don't need to use a callback for every logger.info, but in this case it can help you make sure everything has been logged before continuing with the other tasks:
var winston = require('winston');
winston.add(winston.transports.File, { filename: './somefile.log' });
winston.level = 'debug';

const tasks = [
  x => { console.log('task1'); x(); },
  x => { console.log('task2'); x(); },
  x => { console.log('task3'); x(); }
];
let taskID = 0;
let complete = 0;

tasks.forEach(task => {
  task(() => winston.debug('CHILL WINSTON!', `logging task${++taskID}`, waitForIt));
});

function waitForIt() {
  // Executed every time a logger has logged all of its transports
  if (++complete === tasks.length) nowGo();
}

function nowGo() {
  // Now all loggers have logged all of their transports
  winston.log('debug', 'All tasks complete. Moving on!');
}
Sure... you probably won't define tasks that way, but it's just to show one way you could launch all the tasks in parallel and wait until everything has been logged before continuing with other tasks.
Just to explain the example code:
The const tasks is an array of functions; each one accepts a function x as a parameter, first performs the task at hand (in this case a simple console.log('task1');), then executes the function received as a parameter, x();
The function passed as a parameter to each one of those functions in the array is the () => winston.debug('CHILL WINSTON!', `logging task${++taskID}`, waitForIt)
The waitForIt, the third parameter in this winston.debug call, is the actual callback (the winston callback you inquired about).
Now, taskID counts the tasks that have been launched, while complete counts the loggers that have finished logging.
Being async, one could launch them as 1, 2, 3, but their loggers could end in a 1, 3, 2 sequence, for all we know. But since all of them will trigger the waitForIt callback once they're done, we just count how many have finished, then call the nowGo function when they all are done.
Compare it to
var winston = require('winston');
var logger = new winston.Logger({
  level: 'debug',
  transports: [
    new (winston.transports.Console)(),
    new (winston.transports.File)({ filename: './somefile.log' })
  ]
});

const tasks = [
  x => { console.log("task1"); x(); },
  x => { console.log("task2"); x(); },
  x => { console.log("task3"); x(); }
];
let taskID = 0;
let complete = 0;

tasks.forEach(task => {
  task(() => logger.debug('CHILL WINSTON!', `logging task${++taskID}`, (taskID === tasks.length) ? nowGo : null));
});

logger.on('logging', () => console.log(`# of complete loggers: ${++complete}`));

function nowGo() {
  // Stop listening to the logging event
  logger.removeAllListeners('logging');
  // Now all loggers have logged all of their transports
  logger.debug('All tasks complete. Moving on!');
}
In this case, the nowGo would be the callback, and it would be added only to the third logger.debug call. But if the second logger finished later than the third, it would have continued without waiting for the second one to finish logging.
In such a simple example it won't make a difference, since all of them finish equally fast, but I hope it's enough to get the concept across.
While at it, let me recommend the book Node.js Design Patterns by Mario Casciaro for more advanced async flow sequencing patterns. It also has a great EventEmitter vs callback comparison.
Hope this helped ;)
