Creating a Readable stream from emitted data chunks - javascript

Short backstory: I am trying to create a Readable stream based on data chunks that are emitted back to my server from the client side with WebSockets. Here's a class I've created to "simulate" that behavior:
class DataEmitter extends EventEmitter {
  constructor() {
    super();

    const data = ['foo', 'bar', 'baz', 'hello', 'world', 'abc', '123'];

    // Every second, emit an event with a chunk of data
    const interval = setInterval(() => {
      this.emit('chunk', data.splice(0, 1)[0]);

      // Once there are no more items, emit an event
      // notifying that that is the case
      if (!data.length) {
        this.emit('done');
        clearInterval(interval);
      }
    }, 1e3);
  }
}
In this post, the dataEmitter in question will have been created like this:
// Our data is being emitted through events in chunks from some place.
// This is just to simulate that. We cannot change the flow - only listen
// for the events and do something with the chunks.
const dataEmitter = new DataEmitter();
Right, so I initially tried this:
const readable = new Readable();

dataEmitter.on('chunk', (data) => {
  readable.push(data);
});

dataEmitter.once('done', () => {
  readable.push(null);
});
But that results in this error:
Error [ERR_METHOD_NOT_IMPLEMENTED]: The _read() method is not implemented
So I did this, implementing read() as an empty function:
const readable = new Readable({
  read() {},
});

dataEmitter.on('chunk', (data) => {
  readable.push(data);
});

dataEmitter.once('done', () => {
  readable.push(null);
});
And it works when piping into a write stream, or sending the stream to my test API server. The resulting .txt file looks exactly as it should:
foobarbazhelloworldabc123
However, I feel like there's something quite wrong and hacky with my solution. I attempted to put the listener registration logic (.on('chunk', ...) and .once('done', ...)) within the read() implementation; however, read() seems to get called multiple times, and that results in the listeners being registered multiple times.
The Node.js documentation says this about the _read() method:
When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. _read() may continue reading from the resource and pushing data until readable.push() returns false. Only when _read() is called again after it has stopped should it resume pushing additional data into the queue.
After dissecting this, it seems that the consumer of the stream calls upon .read() when it's ready to read more data. And when it is called, data should be pushed into the stream. But, if it is not called, the stream should not have data pushed into it until the method is called again (???). So wait, does the consumer call .read() when it is ready for more data, or does it call it after each time .push() is called? Or both?? The docs seem to contradict themselves.
Implementing .read() on Readable is straightforward when you've got a basic resource to stream, but what would be the proper way of implementing it in this case?
And also, would someone be able to explain in better terms what the .read() method is on a deeper level, and how it should be implemented?
Thanks!
Response to the answer:
I did try registering the listeners within the read() implementation, but because it is called multiple times by the consumer, it registers the listeners multiple times.
Observing this code:
const readable = new Readable({
  read() {
    console.log('called');

    dataEmitter.on('chunk', (data) => {
      readable.push(data);
    });

    dataEmitter.once('done', () => {
      readable.push(null);
    });
  },
});

readable.pipe(createWriteStream('./data.txt'));
The resulting file looks like this:
foobarbarbazbazbazhellohellohellohelloworldworldworldworldworldabcabcabcabcabcabc123123123123123123123
Which makes sense, because the listeners are being registered multiple times.

Seems like the only purpose of actually implementing the read() method is to start receiving the chunks and pushing them into the stream only once the consumer is ready for that.
Based on these conclusions, I've come up with this solution.
class MyReadable extends Readable {
  // Keep track of whether or not the listeners have already
  // been added to the data emitter.
  #registered = false;

  _read() {
    // If the listeners have already been registered, do
    // absolutely nothing.
    if (this.#registered) return;

    // "Notify" the client via websockets that we're ready
    // to start streaming the data chunks.
    const emitter = new DataEmitter();

    const handler = (chunk) => {
      this.push(chunk);
    };

    emitter.on('chunk', handler);

    emitter.once('done', () => {
      this.push(null);

      // Clean up the listener once it's done (this is
      // assuming the emitter object will still be used
      // in the future).
      emitter.off('chunk', handler);
    });

    // Mark the listeners as registered.
    this.#registered = true;
  }
}

const readable = new MyReadable();

readable.pipe(createWriteStream('./data.txt'));
But this implementation doesn't allow for the consumer to control when things are pushed. I guess, however, in order to achieve that sort of control, you'd need to communicate with the resource emitting the chunks to tell it to stop until the read() method is called again.
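For what it's worth, here is a rough sketch of what that flow-control version could look like. The pause() and resume() methods on the emitter are hypothetical signals to the remote source (the DataEmitter above has no such methods, and the source is assumed to start emitting once the listeners are attached):

class BackpressureReadable extends Readable {
  #registered = false;

  constructor(emitter) {
    super();
    this.emitter = emitter;
  }

  _read() {
    if (this.#registered) {
      // The consumer is ready for more data again: tell the
      // source to continue (hypothetical resume() signal).
      this.emitter.resume();
      return;
    }

    this.emitter.on('chunk', (chunk) => {
      // push() returning false means the internal buffer is full,
      // so ask the source to stop until _read() is called again.
      if (!this.push(chunk)) this.emitter.pause();
    });

    this.emitter.once('done', () => this.push(null));

    this.#registered = true;
  }
}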

Related

How do I know when a Stream (Node.js, MongoDB) is ready for changes?

I want to make sure that a Stream is ready for changes before I can send data to the client. My code:
// Get the Stream (MongoDB collection)
let stream = collection.watch()

// Generate identifier to send it to the client
const identifier = uuid()

// Listen for changes
stream
  .on('change', () => {
    // Send something to WebSocket client
    webSocket.emit('identifier', identifier)
  })

// Mutate database collection to kick off the "change" event
await collection.updateOne()
The line with webSocket.emit is my problem. How do I know if the Stream is ready to receive events? It can happen that the change event never occurs, so webSocket.emit never gets invoked.
TL;DR
Basically, I need to send something to the client but need to make sure that the Stream is ready for receiving events before that.
This looks like a race condition where your update query is executed before the changeStream aggregation pipeline reaches the server. Basically you need to wait for the stream cursor to be set before triggering the change.
I couldn't find any "cursor ready" event, so as a workaround you can check its id. It is assigned by the server, so once it is available on the client, it more or less guarantees that all subsequent data changes will be captured.
Something like this should do the job:
async function streamReady(stream) {
  return new Promise(ok => {
    const i = setInterval(() => {
      if (stream.cursor.cursorState.cursorId) {
        clearInterval(i);
        return ok()
      }
    }, 1)
  });
}
Then in your code:
// Get the Stream (MongoDB collection)
let stream = collection.watch()

// Generate identifier to send it to the client
const identifier = uuid()

// Listen for changes
stream
  .on('change', () => {
    // Send something to WebSocket client
    webSocket.emit('identifier', identifier)
  })

await streamReady(stream);

// Mutate database collection to kick off the "change" event
await collection.updateOne()
Disclaimer:
The streamReady function above relies on cursorState. It is an internal field which can be changed without notice even in a patch version update of the driver.
I've managed to get this to work without using an undocumented piece of the API.
await new Promise<void>((resolve) => {
  changeStream.once('resumeTokenChanged', () => {
    resolve();
  });
});
I'm using the resumeTokenChanged event. This worked for me.
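Slotted into the flow from the question, that looks something like this (a sketch; changeStream is the stream returned by collection.watch(), and identifier is the generated uuid from above):

const changeStream = collection.watch()

changeStream.on('change', () => {
  // Send something to WebSocket client
  webSocket.emit('identifier', identifier)
})

// Wait for the first resume token before mutating the collection
await new Promise((resolve) => {
  changeStream.once('resumeTokenChanged', resolve)
})

// Mutate database collection to kick off the "change" event
await collection.updateOne()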

Handle events. One by one. Node.js

I have got this event listener:
manager.on('newOffer', (offer) => {
  console.log(`We have a new offer :#${offer.id}`.bgGreen);
  getInventory(function () {
    processTradeOffer(offer);
  });
});
My problem arises when many 'newOffer' events appear: they cannot be processed correctly because the program keeps firing 'newOffer' events without processing the previous ones first.
What I need (in my opinion): when a 'newOffer' appears, turn off the event listener, run the functions to completion, and then turn the listener on again. Is that possible?
Sounds like you need a queue with a concurrency of 1, something that async.queue() can provide:
const handler = (offer, done) => {
  console.log(`We have a new offer :#${offer.id}`.bgGreen);
  getInventory(() => {
    processTradeOffer(offer); // NB: if this is async, pass a callback that calls `done`
    done();
  });
};

const queue = require('async').queue(handler, 1);

...

// Add a new offer to the queue:
let offer = new Offer(); // or however you create `offer` objects
queue.push(offer);
(however, this could become a bottleneck if you can't handle offers fast enough, in which case you may need to rethink how offers get handled)
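Wiring that into the original listener is then just a matter of making the event handler push into the queue (a sketch, reusing the manager and queue from above):

manager.on('newOffer', (offer) => {
  // Offers are queued and `handler` processes them one at a time.
  queue.push(offer, (err) => {
    if (err) console.error(`Failed to process offer #${offer.id}`, err);
  });
});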

Exposing a Highland.js stream, but handling the end event internally

I am reading data from a PostgreSQL database using Node.js:
const readFromDatabase = function (callback) {
  pg.connect('pg://…', (errConnect, client, disconnect) => {
    if (errConnect) {
      return callback(errConnect);
    }

    const query = client.query('SELECT * FROM …');
    // …
  });
};
The query object now is an event emitter that emits row events whenever a row is received. Additionally, it emits an end event once all rows have been read.
What I would like to do now is to wrap this event emitter into a Highland.js stream and hand this over to the caller of my function. Basically this should do the job:
const stream = highland('row', query);
callback(null, stream);
Unfortunately, I still need to call the disconnect function once all rows have been read, and I don't want the caller to have to care about this. So how can I hand out the stream while still being able to register a callback for the end event?
I have seen that Highland.js offers the done function that does exactly what I need, but it also causes the stream to start flowing (which I do not want to do internally, that's up to my caller).
How can I solve this?
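One idea (a sketch, not a tested answer): since the question notes that the query emitter itself fires end once all rows have been read, the cleanup can be registered on the raw emitter before wrapping it, so the Highland stream handed to the caller never has to be consumed internally:

const readFromDatabase = function (callback) {
  pg.connect('pg://…', (errConnect, client, disconnect) => {
    if (errConnect) {
      return callback(errConnect);
    }

    const query = client.query('SELECT * FROM …');

    // The raw query emitter fires 'end' independently of whether
    // the Highland stream below has started flowing.
    query.once('end', () => disconnect());

    callback(null, highland('row', query));
  });
};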

Waiting for an Asynchronous method in NodeJS

I've looked high and low, and can only find how to write async functions, which I already understand.
What I am trying to do is run an async method in a triggered event [EventEmitter], but such a seemingly simple thing appears to be just not possible, as far as I can find.
Consider the following...
// Your basic async method..
function doSomething(callback) {
  var obj = { title: 'hello' };

  // Fire an event for event handlers to alter the object.
  // EventEmitters are called synchronously.
  eventobj.emit('alter_object', obj);

  callback(null, obj);
}

// When this event is fired, I want to manipulate the data.
eventobj.on('alter_object', function (obj) {
  obj.title += " world!";

  // Calling this async function here means that our
  // event handler will return before our data is retrieved.
  somemodule.asyncFunction(function (err, data) {
    obj.data = data;
  });
});
As you can see in the last few lines, the event handler will finish before the object's data property is added.
What I need is something where I can turn the async function into a sync function and get the results there and then. So, for example...
obj.data = somemodule.asyncFunction();
I've looked at the wait.for module and the async module, and none of these will work. I've even looked into yield, but it seems not to be fully implemented in the V8 engine yet.
I've also tried using a while loop to wait for the data to populate, but this just brings with it the CPU overload issue.
Has anyone experienced this and found a design pattern to get around this?
You cannot turn an async function into a synchronous one in node.js. It just cannot be done.
If you have an asynchronous result, you cannot return it synchronously or wait for it. You will have to redesign the interface to use an asynchronous interface (which nearly always involves passing in a callback that will be called when the result is ready).
If you're wanting to do something after you .emit() an event that is itself going to do something asynchronously, and you want to wait until after the async thing finishes, then the event emitter is probably not the right interface. You'd rather have a function call that returns a promise or takes a callback as an argument. You could manipulate an eventEmitter to do this, but you'd have to post back a second event when the async operation finished and have the original caller not do its second part until it receives the second event (which is really not a good way to go).
Bottom line - you need a different design that works with async responses (e.g. callbacks or promises).
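For illustration only, here is a minimal sketch of such a redesign (the modifiers list and addModifier() are made-up names): instead of emitting an event, doSomething() awaits each registered handler, so any async work finishes before the object is returned:

// Explicit list of (possibly async) modifiers instead of event listeners.
const modifiers = [];
function addModifier(fn) {
  modifiers.push(fn);
}

async function doSomething() {
  const obj = { title: 'hello' };

  // Await each modifier in turn; each may do async work on obj.
  for (const modify of modifiers) {
    await modify(obj);
  }

  return obj;
}

addModifier(async (obj) => {
  obj.title += ' world!';

  // Wrap the callback-style API in a promise so we can await it.
  obj.data = await new Promise((resolve, reject) => {
    somemodule.asyncFunction((err, data) => (err ? reject(err) : resolve(data)));
  });
});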
Seems what I wanted to do is just not possible, and the two models conflict. To achieve what I wanted in the end, I encapsulated my modules into an array-like object with some event methods that pass events on to the module's object, which inherits from the async-eventemitter class.
So, think of it like so...
My custom app modules may inherit the async-eventemitter module so they have the .on() and .emit(), etc. methods.
I create a customised array item that allows me to pass an event on to the module in question, which will handle it asynchronously.
The code I created (and this is by no means complete or perfect)...
// My private indexer for accessing the modules array (below) by name.
var module_dict = {};

// An array of my modules on my (express) app object
app.modules = [];

// Here I extended the array with easier ways to add and find modules.
// I've trimmed some code out here; let me know if you want the full code.
Object.defineProperty(app.modules, 'contains', { enumerable: false, ... });
Object.defineProperty(app.modules, 'find', { enumerable: false, ... });

// Allows us to add a hook/(async)event to a module, if it exists
Object.defineProperty(app.modules, 'on', {
  enumerable: false,
  configurable: false,
  value: function (modulename, action, func) {
    if (app.modules.contains(modulename)) {
      var modu = app.modules.find(modulename);
      if (modu.module && modu.module['on']) {
        // This will pass the event on to the module's object,
        // which will have inherited from async-eventemitter
        modu.module.on(action, func);
      }
    }
  }
});

Object.defineProperty(app.modules, 'once', {
  enumerable: false,
  configurable: false,
  value: function (modulename, action, func) {
    if (app.modules.contains(modulename)) {
      var modu = app.modules.find(modulename);
      if (modu.on) {
        modu.on(action, func);
      }
    }
  }
});
This then allows me to bind an event handler to a module by simply calling something like the following... .on(module_name, event_name, callback)
app.modules.on('my_special_module_name', 'loaded', function (err, data, next) {
  // ...async stuff, that then calls next to continue to the next event...
  if (data.filename.endsWith('.jpg'))
    data.dimensions = { width: 100, height: 100 };
  next(err, data);
});
And then to execute it I would do something like (express)...
app.get('/foo', function (req, res, next) {
  var data = {
    filename: 'bar.jpg'
  };

  // Now have event handlers alter/update our data
  // (eg, extend an object about a file with image data if that file is an image file).
  my_special_module.emit('loaded', data, function (err, data) {
    if (err) return next(err);
    res.send(data);
    next();
  });
});
Again, this is just an example of what I did, so I've probably missed something in my copy above, but effectively it's the design I ended up using, and it worked like a treat. I was able to extend data on an object before it was pushed out to my HTTP response, without having to replace the main [expressjs] object's standard EventEmitter model.
(e.g., I added image data for loaded files that were image files. If anyone wants the code, let me know; I am more than happy to share what I did.)

Ensure order that subscribers get updated

Is there a way to ensure the order in which subscribers get updated?
I've got a hot observable. My first subscriber does some sync work to update a variable, and my next subscriber then has to initialise a service (only once!), but only after that variable is guaranteed to be set.
It looks like this:
import App from './App'

var appSource = App.init() // gets the hot observable

// our second subscriber
appSource.take(1).subscribe(() => {
  // take 1 to only run this once
  nextService.init()
})
where App.init looks like this:
...
init() {
  var source = this.createObservable() // returns a hot interval observable that fetches a resource every few minutes

  // first subscriber, updates the `myVar` every few minutes
  source.subscribe((data) => this.myVar = data)

  return source
}
...
This currently works, but I am unsure whether it will always follow the order 100%.
EDIT:
As I've heard, subscribers will be invoked FIFO. So the order is somewhat assured.
I don't know if RxJS ever explicitly guarantees that observers are called in order of subscription. But, as you say, it usually works.
However, you might consider modelling your actual workflow instead of relying on implicit observer order.
It sounds like you need to know when your app is initialized so you can take further action. Instead of relying on knowledge of the internal workings of App.init, App could expose an API for this:
One (non-Rx) way is to let the caller supply a callback to init:
//...
init(callback) {
  var source = this.createObservable() // returns a hot interval observable that fetches a resource every few minutes

  // first subscriber, updates the `myVar` every few minutes
  source.subscribe((data) => {
    this.myVar = data;
    if (callback) {
      callback();
      callback = undefined;
    }
  })

  return source
}

// elsewhere
App.init(() => nextService.init());
Another option instead of a callback is to just have init return a Promise that you resolve (or an Rx.AsyncSubject that you signal) once initialization is complete.
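A sketch of the Promise variant, under the same assumptions as the callback version above:

init() {
  const source = this.createObservable()

  return new Promise((resolve) => {
    source.subscribe((data) => {
      this.myVar = data
      resolve() // resolving more than once is a harmless no-op
    })
  })
}

// elsewhere
App.init().then(() => nextService.init())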
And yet another option, though it requires a bit of a refactor, is to model this.myVar as the observable data that it is, i.e.:
init() {
  this.myVar = this.createObservable().replay(1);
  this.myVar.connect();

  // returns an observable that signals when we are initialized
  return this.myVar.first();
}

// elsewhere, you end up with this pattern...
const servicesToInit = [App, service1, service2, service3];

Rx.Observable
  .from(servicesToInit) // `from`, not `of`: we want one emission per service
  .concatMap(s => Rx.Observable.defer(() => s.init()))
  .toArray()
  .subscribe(results => {
    // all initializations complete
    // results is an array containing the value returned by each service's init observable
  });
Now, anything that wants to make use of myVar would always need to subscribe to it in some way to get the current and/or future values. It could never just synchronously ask for the current value.
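For instance, a consumer might do something like this rather than reading this.myVar synchronously (a sketch using the refactored init above):

// Take just the current (replayed) value of myVar.
App.myVar.first().subscribe((value) => {
  console.log('current value:', value)
})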
