Asynchronous readline loop without async / await

I'd like to use this function (which runs fine on my laptop), or something like it, on my embedded device. But there are two problems:
Node.js on our device is so old that its JavaScript doesn't support async / await. Given our hundreds of units in the field, updating is probably impractical.
Even if that weren't a problem, we have another problem: the file may be tens of megabytes in size, not something that could fit in memory all at once. And this for-await loop will set thousands of asynchronous tasks into motion, more or less all at once.
async function sendOldData(pathname) {
  const reader = ReadLine.createInterface({
    input: fs.createReadStream(pathname),
    crlfDelay: Infinity
  });
  for await (const line of reader) {
    const record = JSON.parse(line);
    sendOldRecord(record);
  }
}
function sendOldRecord(record) {...}
Promises work in this old version. I'm sure there is a graceful syntax for doing this sequentially with Promises:
read one line
massage its data
send that data to our server
sequentially, but asynchronously, so that the JavaScript event loop is not blocked while the data is sent to the server.
Please, could someone suggest the right syntax for doing this in my outdated JavaScript?

Make a queue, so that each step takes the next line off the array and only starts once the previous send has finished:
function foo() {
  const reader = [
    '{"foo": 1}',
    '{"foo": 2}',
    '{"foo": 3}',
    '{"foo": 4}',
    '{"foo": 5}',
  ];
  function doNext() {
    if (!reader.length) {
      console.log('done');
      return;
    }
    const line = reader.shift();
    const record = JSON.parse(line);
    sendOldRecord(record, doNext);
  }
  doNext();
}
function sendOldRecord(record, done) {
  console.log(record);
  // whatever your async task is; call done() when it completes
  // (plain setTimeout, since window does not exist in Node.js)
  setTimeout(function () {
    done();
  }, Math.floor(2000 * Math.random()));
}
foo();

The Problem
Getting streams and asynchronous processing to work well together while handling all possible error conditions is somewhat of a pain, and things are even worse with the readline module. Since you say that you can't use the for await () construct on the readable (which, even when it is supported, has various issues), things get a bit more complicated still.
The main problem with readline.createInterface() on a stream is that it reads a chunk of the file, parses that chunk for full lines and then synchronously sends all the lines in a tight for loop.
You can literally see the code here:
for (let n = 0; n < lines.length; n++) this[kOnLine](lines[n]);
The implementation of kOnLine does this:
this.emit('line', line);
So, this is a tight for loop that emits all the lines it has read. If you try to do something asynchronous in response to the line event, then the moment you hit an await or an asynchronous callback, this readline code will send the next line event before you're done processing the previous one. This makes it a pain to process line events asynchronously in sequential order, where you finish the asynchronous processing of one line before starting on the next. IMO, this is a very busted design as it only really works with synchronous processing. You will notice that this for loop doesn't care whether the readline object was paused, either. It just pumps out all the lines it has without regard for anything.
Discussion of Possible Solutions
So, what to do about that? Some part of a fix for this is in the asynchronous iterator interface to readline (but it has other problems which I've filed bugs on). But the supposition of your question seems to be that you can't use that asynchronous iterator interface because your device may have an older version of Node.js. If that's the case, then I only know of two options:
1. Ditch the readline.createInterface() functionality entirely and either use a 3rd party module or do your own line boundary processing.
2. Cover the line event with your own code that supports asynchronous processing of lines without getting the next line in the middle of still processing the previous one.
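For the first option (doing your own line boundary processing), here is a minimal sketch that uses neither readline nor async / await. It is an illustration, not a tested replacement: forEachLine and handleLine are hypothetical names, and it assumes the stream emits Buffers (no setEncoding() call), UTF-8 text, and lines ending in \n or \r\n.

```javascript
'use strict';
var StringDecoder = require('string_decoder').StringDecoder;

// Hypothetical helper: calls handleLine(line) for every line of `stream`,
// strictly one at a time. handleLine may return a Promise.
function forEachLine(stream, handleLine) {
  return new Promise(function (resolve, reject) {
    var decoder = new StringDecoder('utf8'); // copes with characters split across chunks
    var leftover = '';                       // partial last line of the previous chunk
    var failed = false;

    function fail(err) {
      if (failed) { return; }
      failed = true;
      stream.pause(); // stop reading; cleanup is left to the caller
      reject(err);
    }

    // Handle the given lines in order, then call done().
    function drain(lines, done) {
      var i = 0;
      function next() {
        if (failed) { return; }
        if (i >= lines.length) { done(); return; }
        var line = lines[i++];
        Promise.resolve()
          .then(function () { return handleLine(line); })
          .then(next, fail);
      }
      next();
    }

    stream.on('data', function (chunk) {
      stream.pause(); // no further chunks until these lines are handled
      var parts = (leftover + decoder.write(chunk)).split(/\r?\n/);
      leftover = parts.pop();
      drain(parts, function () { stream.resume(); });
    });
    stream.on('end', function () {
      var tail = leftover + decoder.end();
      drain(tail ? [tail] : [], resolve);
    });
    stream.on('error', fail);
  });
}
```

With the names from the question, usage might look like forEachLine(fs.createReadStream(pathname), function (line) { return sendOldRecord(JSON.parse(line)); }), assuming sendOldRecord returns a Promise. Only the lines of one chunk are held in memory at a time.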
A Solution
I've written an implementation for option #2, covering the line event with your own code. In my implementation, we just acknowledge that line events will arrive during our asynchronous processing of previous lines, but instead of notifying you about them, the input stream gets paused and these "early" lines get queued. With this solution the readline code will read a chunk of data from the input stream, parse it into its full lines, and synchronously send all the line events for those full lines. But, upon receipt of the first line event, we will pause the input stream and initiate queueing of subsequent line events. So, you can asynchronously process a line and you won't get another one until you ask for the next line.
This code has a different way of communicating incoming lines to your code. Since we're in the age of promises for asynchronous code, I've added a promise-based reader.getNextLine() function to the reader object.
This lets you write code like this:
import fs from 'fs';
async function run(filename) {
  let reader = createLineReader({
    input: fs.createReadStream(filename),
    crlfDelay: Infinity
  });
  let line;
  let cntr = 0;
  while ((line = await reader.getNextLine()) !== null) {
    // simulate some asynchronous operation in the processing of the line
    console.log(`${++cntr}: ${line}`);
    await processLine(line);
  }
}
run("temp.txt").then(result => {
  console.log("done");
}).catch(err => {
  console.log(err);
});
And, here's the implementation of createLineReader():
import * as ReadLine from 'readline';
function createLineReader(options) {
  const stream = options.input;
  const reader = ReadLine.createInterface(options);
  // state machine variables
  let latchedErr = null;
  let isPaused = false;
  let readerClosed = false;
  const queuedLines = [];
  // resolves with line
  // resolves with null if no more lines
  // rejects with error
  reader.getNextLine = async function() {
    if (latchedErr) {
      // once we get an error, we're done
      throw latchedErr;
    } else if (queuedLines.length) {
      // if something in the queue, return the oldest from the queue
      const line = queuedLines.shift();
      if (queuedLines.length === 0 && isPaused) {
        reader.resume();
      }
      return line;
    } else if (readerClosed) {
      // if nothing in the queue and the reader is closed, then signify end of data
      return null;
    } else {
      // waiting for more line data to arrive; if the reader is still paused
      // (e.g. the last chunk held exactly one line, so the queue emptied
      // without passing through the branch above), resume it first so that
      // this wait cannot stall
      if (isPaused) {
        reader.resume();
      }
      return new Promise((resolve, reject) => {
        function clear() {
          reader.off('error', errorListener);
          reader.off('queued', queuedListener);
          reader.off('done', doneListener);
        }
        function queuedListener() {
          clear();
          resolve(queuedLines.shift());
        }
        function errorListener(e) {
          clear();
          reject(e);
        }
        function doneListener() {
          clear();
          resolve(null);
        }
        reader.once('queued', queuedListener);
        reader.once('error', errorListener);
        reader.once('done', doneListener);
      });
    }
  };
  reader.on('pause', () => {
    isPaused = true;
  }).on('resume', () => {
    isPaused = false;
  }).on('line', line => {
    queuedLines.push(line);
    if (!isPaused) {
      reader.pause();
    }
    // tell any queue listener that something was just added to the queue
    reader.emit('queued');
  }).on('close', () => {
    readerClosed = true;
    if (queuedLines.length === 0) {
      reader.emit('done');
    }
  }).on('error', err => {
    // latch the error so that later getNextLine() calls reject with it
    latchedErr = err;
  });
  return reader;
}
Explanation
Internally, the implementation takes each new line event and puts it into a queue. Then, reader.getNextLine() just pulls items from the queue or waits (with a promise) for the queue to get something put in it.
During operation, the readline object will get a chunk of data from your readstream and parse it into whole lines. The whole lines will all get added to the queue (via line events). The readstream will be paused so it won't generate any more lines until the queue has been drained.
When the queue becomes empty, the readstream will be resumed so it can send more data to the reader object.
This is scalable to very large files because it will only queue the whole lines found in one chunk of the file being read. Once those lines are queued, the input stream is paused so it won't put more into the queue. After the queue is drained, the input stream is resumed so it can send more data, and the cycle repeats.
Any errors in the readstream will trigger an error event on the readline object which will either reject a reader.getNextLine() that is already waiting for the next line or will reject the next time reader.getNextLine() is called.
Disclaimers
This has only been tested with file-based readstreams.
I would not recommend having more than one reader.getNextLine() in process at once as this code does not anticipate that and it's not even clear what that should do.
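Since the question rules out async / await, the while loop around reader.getNextLine() can also be driven with plain promise chaining. The following is a minimal sketch under that assumption: consumeLines is a hypothetical helper name, and it works with any function that returns a promise for the next line (or null at the end). createLineReader() itself would also need its async function rewritten to return promises explicitly on such an old runtime; that part is not shown here.

```javascript
'use strict';

// Hypothetical helper: keeps calling getNextLine() until it resolves with null,
// and waits for processLine(line) to finish before asking for the next line.
function consumeLines(getNextLine, processLine) {
  return new Promise(function (resolve, reject) {
    function step() {
      getNextLine().then(function (line) {
        if (line === null) {
          resolve(); // end of data
          return undefined;
        }
        // processLine may return a plain value or a Promise
        return Promise.resolve(processLine(line)).then(step);
      }).catch(reject);
    }
    step();
  });
}

// Possible usage with the reader from this answer (not run here):
// consumeLines(function () { return reader.getNextLine(); }, processLine)
//   .then(function () { console.log('done'); })
//   .catch(function (err) { console.log(err); });
```

Each step starts its own short promise chain instead of nesting inside the previous one, so the number of pending promises should not grow with the number of lines.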

Basically, you can achieve this using a functional approach:
const arrayOfValues = [1,2,3,4,5];
const chainOfPromises = arrayOfValues.reduce((acc, item) => {
  return acc.then((result) => {
    // Here you can add your logic for parsing/sending request
    // And here you are chaining next promise request
    return yourAsyncFunction(item);
  });
}, Promise.resolve());
// Basically this will do
// Promise.resolve().then(_ => yourAsyncFunction(1)).then(_ => yourAsyncFunction(2)) and so on...
// The chain starts running as soon as it is built; attach handlers to see the outcome
chainOfPromises.then(() => console.log('all done')).catch(console.error);

Related

How to process socket.io events in their incoming order

I have the following setup:
async function MyFunction(param) {
  //... Do some computation
  await WriteToDB()
}
io.on('connection', (socket) => {
  socket.on('AnEvent', (param) => MyFunction(param))
})
When an event comes in, it calls an asynchronous function which does some computation and, at the end, writes the result to a database with another asynchronous call.
If MyFunction didn't have an asynchronous call to write to the database at the end, for example
function MyFunction(param) {
  //... Do some computation
}
then obviously all events would be processed in their incoming order. The processing of the next event would only start after the processing of the previous one finished. However, because of the asynchronous call to the database, I don't know if those incoming events will still be fully processed in order. I am afraid that the processing of the next event starts before the previous await WriteToDB() finishes. How do I change the code to fully process them in order?

You are correct that there's no guarantee that incoming events will be processed in order.
To achieve what you are asking, you would need a "Message Queue" that will periodically check for new messages and process them one by one.
const messageQueue = [];
// SocketIO adding Message to MessageQueue
const eventHandler = (message) => {
  messageQueue.push(message);
};
const messageHandler = () => {
  if (messageQueue.length === 0) {
    return;
  }
  const message = messageQueue.shift();
  // Handle Message
  // If successful, ask for next message
  return messageHandler();
};
Of course, my example is pretty naive, but I hope it gives you a general idea of how this can be accomplished.
If you find yourself needing a more robust message queue, look into RabbitMQ, BullMQ, or Kafka.
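For an in-process case like the one in the question, the queue can also be sketched as a promise chain rather than an explicit array. This is only an illustration: createSerialQueue and enqueue are hypothetical names and not part of socket.io.

```javascript
'use strict';

// Hypothetical helper: returns enqueue(task). Tasks are functions that may
// return a Promise; they run one at a time, in the order they were enqueued.
function createSerialQueue() {
  var tail = Promise.resolve();
  return function enqueue(task) {
    // start this task only after everything queued before it has settled
    var run = tail.then(function () { return task(); });
    // a failed task must not block the tasks queued after it
    tail = run.catch(function () {});
    return run; // lets the caller observe this task's result or error
  };
}

// Possible wiring for the handler from the question (not run here):
// var enqueue = createSerialQueue();
// socket.on('AnEvent', function (param) {
//   enqueue(function () { return MyFunction(param); }).catch(console.error);
// });
```

Whether a failed write should stop later events or just be logged is a design decision; this sketch lets the queue continue and leaves error handling to the caller.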

puppeteer-cluster: queue instead of execute

I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously, but they all fail while only some fail when I have the command execute directly.
I've taken the code straight from the examples and replaced execute with queue which I expected to work, except the code doesn't wait for the result. Is there a way to achieve this anyway?
So this works:
const screen = await cluster.execute(req.query.url);
But this breaks:
const screen = await cluster.queue(req.query.url);
Here's the full example with queue:
const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');
(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 2,
  });
  await cluster.task(async ({ page, data: url }) => {
    // make a screenshot
    await page.goto('http://' + url);
    const screen = await page.screenshot();
    return screen;
  });
  // setup server
  app.get('/', async function (req, res) {
    if (!req.query.url) {
      return res.end('Please specify url like this: ?url=example.com');
    }
    try {
      const screen = await cluster.queue(req.query.url);
      // respond with image
      res.writeHead(200, {
        'Content-Type': 'image/jpg',
        'Content-Length': screen.length //variable is undefined here
      });
      res.end(screen);
    } catch (err) {
      // catch error
      res.end('Error: ' + err.message);
    }
  });
  app.listen(3000, function () {
    console.log('Screenshot server listening on port 3000.');
  });
})();
What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.

Author of puppeteer-cluster here.
Quote from the docs:
cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.
cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.
When to use which function:
Use cluster.queue if you want to queue a large number of jobs (e.g. list of URLs). The task function needs to take care of storing the results by printing them to console or storing them into a database.
Use cluster.execute if your task function returns a result. This will still queue the job, so this is like calling queue in addition to waiting for the job to finish. In this scenario, there is most often an "idling cluster" present which is used when a request hits the server (like in your example code).
So, you definitely want to use cluster.execute, as you want to wait for the results of the task function. The reason you do not see any errors is (as quoted above) that the errors of the cluster.queue function are emitted via a taskerror event. The cluster.execute errors are thrown directly (the Promise is rejected). Most likely your jobs fail in both cases, but it is only visible with cluster.execute.

What is the correct pattern with generators and iterators for managing a stream

I am trying to figure out how to arrange a pair of routines to control writing to a stream using the generator/iterator functions in ES2015. It's a simple logging system for use in Node.js.
What I am trying to achieve is a function that external processes can call to write to a log. I am hoping that the new generator/iterator functions mean that, if the routine needs to suspend internally, this is transparent to the caller.
stream.write should normally return immediately, but can return false to say that the stream is full. In that case it needs to wait for stream.on('drain', cb) to fire before returning.
I am thinking that the actual software that writes to the stream is a generator function which yields when it is ready to accept another request, and that the function I provide to allow external people to call the stream is an iterator, but I might have this the wrong way round.
So, something like this:
var stopLogger = false;
var it = writer();
function writeLog(line) {
  it.next(line);
}
function *writer() {
  while (!stopLogger) {
    var line = yield;
    if (!stream.write(line)) {
      yield *waitDrain(); // can't continue until we get drain
    }
  }
}
function *waitDrain() {
  // Not sure what to do here to avoid waiting
  stream.on('drain', () => {/* do I yield here or something */});
}

I found the answer here: https://davidwalsh.name/async-generators
I had it backwards.
The code above should be:
var stopLogger = false;
function *writeLog(line) {
  yield writer(line);
}
var it = writeLog();
function writer(line) {
  if (stopLogger) {
    setTimeout(() => { it.next(); }, 1); // needed so we can get to the yield
  } else {
    if (stream.write(line)) {
      setTimeout(() => { it.next(); }, 1); // needed so we can get to the yield
    }
  }
}
stream.on('drain', () => {
  it.next();
});
I haven't actually tried this, just translated it from the above article, and there is some complication around errors etc., which the article suggests can be solved by enhancing the it iterator to return a promise that gets resolved in a "runGenerator" function. But it solved my main issue, which was how the pattern should work.
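The snippet above is untried, as noted. As an alternative sketch that swaps the hand-driven generator for a plain Promise, the "wait for drain when write() returns false" step could look like the following. writeLine is a hypothetical helper name; the sketch assumes one write at a time per stream and appends a newline to each line.

```javascript
'use strict';

// Hypothetical helper: writes one line and resolves once it is OK to write
// more, i.e. immediately if write() returned true, otherwise after 'drain'.
function writeLine(stream, line) {
  return new Promise(function (resolve, reject) {
    function onError(err) {
      stream.removeListener('drain', onDrain);
      reject(err);
    }
    function onDrain() {
      stream.removeListener('error', onError);
      resolve();
    }
    stream.once('error', onError);
    if (stream.write(line + '\n')) {
      // buffer still has room: no need to wait
      stream.removeListener('error', onError);
      resolve();
    } else {
      // buffer is full: wait until the stream has drained
      stream.once('drain', onDrain);
    }
  });
}

// Possible usage (not run here):
// writeLine(logStream, 'first')
//   .then(function () { return writeLine(logStream, 'second'); });
```

Callers chain the returned promises, so suspension is visible only as a promise that resolves a little later.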

Node.js WriteStream synchronous

I'm writing a purely synchronous, single threaded command line program in node.js, which needs to write a single binary file, for which I'm using WriteStream. My usage pattern is along the lines of:
var stream = fs.createWriteStream(file)
stream.write(buf1)
stream.write(buf2)
This seems to work, but the documentation says it's asynchronous and I want to make sure I'm not writing code that works 99% of the time. I don't care exactly when the data gets written as long as it's written in the specified order and no later than when the program exits, and the quantity of data is small so speed and memory consumption are not issues.
I've seen mention of stream.end() but it seems to work without it and I've also seen suggestions that calling it may actually be a bad idea if you're not using callbacks because it might end up getting called before all the data is written.
Is my approach correct (given that I want purely synchronous) or is there anything I need to watch out for?

You can do this. The only problem arises if you create two or more concurrent streams for the same path: the order of writes from the different streams will be undefined. By the way, there is a synchronous fs write stream implementation in Node: fs.SyncWriteStream. It's kind of private and requires an fd as an argument, but if you really want it...
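If the goal is simply ordered, fully synchronous output, a different technique from the write stream is to use the documented synchronous fs calls directly. A minimal sketch, with writeBuffersSync as a hypothetical helper name:

```javascript
'use strict';
var fs = require('fs');

// Hypothetical helper: writes the buffers to `file` in order, synchronously.
function writeBuffersSync(file, buffers) {
  var fd = fs.openSync(file, 'w');
  try {
    buffers.forEach(function (buf) {
      var offset = 0;
      // writeSync returns the number of bytes written, so loop until done
      while (offset < buf.length) {
        offset += fs.writeSync(fd, buf, offset, buf.length - offset);
      }
    });
  } finally {
    fs.closeSync(fd); // always release the descriptor
  }
}
```

When the function returns, every buffer has been handed to the operating system in the given order, at the cost of blocking the event loop while writing.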

I'm working on a timing-critical API, where a new file has to have been written and its stream completely handled before the next action can be performed. The solution, in my case (and, quite possibly, that of the OP's question) was to use:
writer.on('finish', () => {
  console.error('All writes are now complete.');
});
as per the fs Event: 'finish' documentation
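To make the 'finish' idea concrete, here is a small sketch that writes several buffers and reports completion through a Promise. writeAll is a hypothetical helper name; it assumes the data is small enough that backpressure from write() can be ignored, as in the question.

```javascript
'use strict';
var fs = require('fs');

// Hypothetical helper: writes the buffers in order and resolves once the
// stream has emitted 'finish', i.e. after end() and after all data is flushed.
function writeAll(file, buffers) {
  return new Promise(function (resolve, reject) {
    var stream = fs.createWriteStream(file);
    stream.on('error', reject);
    stream.on('finish', resolve);
    buffers.forEach(function (buf) {
      stream.write(buf); // writes on one stream are queued in call order
    });
    stream.end(); // no more data; 'finish' follows once everything is written
  });
}
```

Note that 'finish' is only emitted after end() has been called, so the listener alone does not fire for a stream that is left open.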

const fs = require('fs');

const writeToLocalDisk = (stream, path) => {
  return new Promise((resolve, reject) => {
    const istream = stream;
    const ostream = fs.createWriteStream(path);
    istream.pipe(ostream);
    // 'finish' fires on the write stream once all data has been flushed
    ostream.on("finish", () => {
      console.log(`Fetched ${path} from elsewhere`);
      resolve();
    });
    // reject on failure so the caller does not continue as if the write succeeded
    istream.on("error", reject);
    ostream.on("error", reject);
  });
};
// Then use an async function to perform sequential-like operation
async function sequential (stream) {
  const path = ""; // fill in the destination path
  await writeToLocalDisk(stream, path);
  console.log('other operation here');
}

How to read an entire text stream in node.js?

In RingoJS there's a function called read which allows you to read an entire stream until the end is reached. This is useful when you're making a command line application. For example you may write a tac program as follows:
#!/usr/bin/env ringo
var string = system.stdin.read(); // read the entire input stream
var lines = string.split("\n"); // split the lines
lines.reverse(); // reverse the lines
var reversed = lines.join("\n"); // join the reversed lines
system.stdout.write(reversed); // write the reversed lines
This allows you to fire up a shell and run the tac command. Then you type in as many lines as you wish to and after you're done you can press Ctrl+D (or Ctrl+Z on Windows) to signal the end of transmission.
I want to do the same thing in node.js but I can't find any function which would do so. I thought of using the readSync function from the fs library to simulate as follows, but to no avail:
fs.readSync(0, buffer, 0, buffer.length, null);
The file descriptor for stdin (the first argument) is 0. So it should read the data from the keyboard. Instead it gives me the following error:
Error: ESPIPE, invalid seek
at Object.fs.readSync (fs.js:381:19)
at repl:1:4
at REPLServer.self.eval (repl.js:109:21)
at rli.on.self.bufferedCmd (repl.js:258:20)
at REPLServer.self.eval (repl.js:116:5)
at Interface.<anonymous> (repl.js:248:12)
at Interface.EventEmitter.emit (events.js:96:17)
at Interface._onLine (readline.js:200:10)
at Interface._line (readline.js:518:8)
at Interface._ttyWrite (readline.js:736:14)
How would you synchronously collect all the data in an input text stream and return it as a string in node.js? A code example would be very helpful.

As Node.js is event- and stream-oriented, there is no API to wait until the end of stdin and buffer the result, but it's easy to do manually:
var content = '';
process.stdin.resume();
process.stdin.on('data', function(buf) { content += buf.toString(); });
process.stdin.on('end', function() {
  // your code here
  // e.g. print the lines in reverse order, like tac
  console.log(content.split('\n').reverse().join('\n'));
});
In most cases it's better not to buffer data and process incoming chunks as they arrive (using chain of already available stream parsers like xml or zlib or your own FSM parser)

The key is to use these two Stream events:
Event: 'data'
Event: 'end'
For stream.on('data', ...) you should collect your data into either a Buffer (if it is binary) or a string.
For on('end', ...) you should call a callback with your completed buffer or, if you are using a Promises library, resolve a promise with it.
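A minimal sketch of those two events wrapped in a Promise follows. readAll is a hypothetical helper name, and the text is assumed to be UTF-8.

```javascript
'use strict';

// Hypothetical helper: collects every chunk of `stream` and resolves with the
// whole content as a string once 'end' fires.
function readAll(stream) {
  return new Promise(function (resolve, reject) {
    var chunks = [];
    stream.on('data', function (chunk) {
      // chunks are Buffers unless setEncoding() was called; normalise to Buffer
      chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk);
    });
    stream.on('end', function () {
      // decode once at the end so multi-byte characters split across chunks survive
      resolve(Buffer.concat(chunks).toString('utf8'));
    });
    stream.on('error', reject);
  });
}

// Possible usage for a tac-like tool (not run here):
// readAll(process.stdin).then(function (text) {
//   process.stdout.write(text.split('\n').reverse().join('\n'));
// });
```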

Let me illustrate StreetStrider's answer.
Here is how to do it with concat-stream
var concat = require('concat-stream');
yourStream.pipe(concat(function(buf){
  // buf is a Node Buffer instance which contains the entire data in stream
  // if your stream sends textual data, use buf.toString() to get entire stream as string
  var streamContent = buf.toString();
  doSomething(streamContent);
}));
// error handling is still on stream
yourStream.on('error',function(err){
  console.error(err);
});
Please note that process.stdin is a stream.

There is a module for that particular task, called concat-stream.

If you are in async context and have a recent version of Node.js, here is a quick suggestion:
const chunks = []
for await (let chunk of readable) {
  chunks.push(chunk)
}
console.log(Buffer.concat(chunks))

On Windows, I had some problems with the other solutions posted here - the program would run indefinitely when there's no input.
Here is a TypeScript implementation for modern NodeJS, using async generators and for await - quite a bit simpler and more robust than using the old callback based APIs, and this worked on Windows:
import process from "process";
/**
 * Read everything from standard input and return a string.
 *
 * (If there is no data available, the Promise is rejected.)
 */
export async function readInput(): Promise<string> {
  const { stdin } = process;
  const chunks: Uint8Array[] = [];
  if (stdin.isTTY) {
    throw new Error("No input available");
  }
  for await (const chunk of stdin) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf8');
}
Example:
(async () => {
  const input = await readInput();
  console.log(input);
})();
(consider adding a try/catch, if you want to handle the Promise rejection and display a more user-friendly error-message when there's no input.)
