Solving async producer-consumer problem in JS - javascript

I have a producer of data, and a consumer of data. The producer produces asynchronously, and in turn I would like the consumer to consume asynchronously when there is data to consume.
My immediate thought to solve this problem is to use some queue object that has an awaitable shift/get, much like the async queue (asyncio.Queue) in the Python standard library.
However, I searched and I couldn't find any JS libraries that have this type of data structure for me to use. I would have thought this would be a common pattern.
What is the common pattern for solving this problem in JS, and are there any libraries to help?

If the producer of the data is just spontaneously producing data and the consumer just wants to know when there's some new data, then this sounds like the consumer should just subscribe to an event that will be triggered any time there is new data. You can just use the EventEmitter class in node.js to create an emitter that the consumer can listen to and that the producer will trigger an event on whenever there's new data. No external library is needed, as the built-in EventEmitter has all the tools you need to register for notifications and to trigger them.
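For example, a minimal sketch of that publish/subscribe setup (the event name and the interval-based producer here are made up for illustration):

const EventEmitter = require('events');

const feed = new EventEmitter();

// consumer: subscribe once, react whenever new data arrives
feed.on('data', (item) => {
    console.log('consumed', item);
});

// producer: emit spontaneously, e.g. once per second
setInterval(() => {
    feed.emit('data', Math.random());
}, 1000);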
If the consumer of the data requests data and the producer then goes and gets it asynchronously, then this is just a typical asynchronous API. The API should probably return a promise and the producer will resolve the promise with the new data when it's ready or reject it if there was an error retrieving the data.
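As a sketch of such an API (the function name, URL, and the use of a global fetch() are placeholders, not from the question):

// producer side: an async API that resolves with the data or rejects on error
async function getData(id) {
    const response = await fetch(`https://example.com/data/${id}`); // placeholder URL
    if (!response.ok) {
        throw new Error(`request failed: ${response.status}`);
    }
    return response.json();
}

// consumer side: await the promise (or use .then/.catch)
getData(42)
    .then((data) => console.log(data))
    .catch((err) => console.error(err));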
With the little bit of description you've provided, I don't see any particular need for an elaborate queuing system. It just sounds like publish/subscribe or a simple event notification system. If the problem is more complicated, then please give us more details on the producer of the data so we can better match the tools available in node.js to the needs of your particular problem.

For a small, simple program, I would just write something like this.
var data = [];

function Consumer() {
    this.isConsuming = false;

    this.notify = function () {
        if (this.isConsuming) {
            return;
        }
        this.consumeNext();
    };

    this.consumeNext = async function () {
        this.isConsuming = true;
        if (data.length > 0) {
            //consume one datum
            console.log(await this.consume(data.shift()));
            //consume next datum
            this.consumeNext();
        } else {
            this.isConsuming = false;
        }
    };

    this.consume = async function (datum) {
        return datum * datum;
    };
}
var consumer = new Consumer();
//call consumer.notify() when your producer produces
data.push(1,2,3,4,5);
consumer.notify();

This will give you another idea: in my scenario the producer creates data every 1000 milliseconds, and the consumer waits until the producer has created new data and resolved its promise.
let dataArray = []
let consumerResolver = null

function producer() {
    setInterval(() => {
        const newData = "my new Data"
        dataArray.push(newData)
        if (consumerResolver) {
            consumerResolver()
        }
    }, 1000);
}

async function consumer() {
    while (true) {
        if (dataArray.length === 0) {
            const producerPromise = new Promise((resolve) => {
                consumerResolver = resolve
            })
            await producerPromise
        }
        consumerResolver = null
        const data = dataArray.shift()
        console.log(data)
    }
}
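If you want something closer to Python's asyncio.Queue, the resolver trick above can be wrapped into a small reusable queue with an awaitable shift(). A minimal sketch (this is not an existing library; the AsyncQueue name and its methods are made up for illustration):

class AsyncQueue {
    constructor() {
        this.items = [];   // buffered values not yet consumed
        this.waiters = []; // pending resolvers from shift() calls
    }
    push(item) {
        const waiter = this.waiters.shift();
        if (waiter) {
            waiter(item); // hand the value directly to a waiting consumer
        } else {
            this.items.push(item);
        }
    }
    shift() {
        if (this.items.length > 0) {
            return Promise.resolve(this.items.shift());
        }
        return new Promise((resolve) => this.waiters.push(resolve));
    }
}

// usage: the producer pushes, the consumer awaits
const queue = new AsyncQueue();
setInterval(() => queue.push(Date.now()), 1000);
(async () => {
    while (true) {
        console.log(await queue.shift());
    }
})();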

Related

Creating a Readable stream from emitted data chunks

Short backstory: I am trying to create a Readable stream based on data chunks that are emitted back to my server from the client side with WebSockets. Here's a class I've created to "simulate" that behavior:
class DataEmitter extends EventEmitter {
    constructor() {
        super();
        const data = ['foo', 'bar', 'baz', 'hello', 'world', 'abc', '123'];
        // Every second, emit an event with a chunk of data
        const interval = setInterval(() => {
            this.emit('chunk', data.splice(0, 1)[0]);
            // Once there are no more items, emit an event
            // notifying that that is the case
            if (!data.length) {
                this.emit('done');
                clearInterval(interval);
            }
        }, 1e3);
    }
}
In this post, the dataEmitter in question will have been created like this.
// Our data is being emitted through events in chunks from some place.
// This is just to simulate that. We cannot change the flow - only listen
// for the events and do something with the chunks.
const dataEmitter = new DataEmitter();
Right, so I initially tried this:
const readable = new Readable();

dataEmitter.on('chunk', (data) => {
    readable.push(data);
});

dataEmitter.once('done', () => {
    readable.push(null);
});
But that results in this error:
Error [ERR_METHOD_NOT_IMPLEMENTED]: The _read() method is not implemented
So I did this, implementing read() as an empty function:
const readable = new Readable({
    read() {},
});

dataEmitter.on('chunk', (data) => {
    readable.push(data);
});

dataEmitter.once('done', () => {
    readable.push(null);
});
And it works when piping into a write stream, or sending the stream to my test API server. The resulting .txt file looks exactly as it should:
foobarbazhelloworldabc123
However, I feel like there's something quite wrong and hacky with my solution. I attempted to put the listener registration logic (.on('chunk', ...) and .once('done', ...)) within the read() implementation; however, read() seems to get called multiple times, and that results in the listeners being registered multiple times.
The Node.js documentation says this about the _read() method:
When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. _read() may continue reading from the resource and pushing data until readable.push() returns false. Only when _read() is called again after it has stopped should it resume pushing additional data into the queue.
After dissecting this, it seems that the consumer of the stream calls upon .read() when it's ready to read more data. And when it is called, data should be pushed into the stream. But, if it is not called, the stream should not have data pushed into it until the method is called again (???). So wait, does the consumer call .read() when it is ready for more data, or does it call it after each time .push() is called? Or both?? The docs seem to contradict themselves.
Implementing .read() on Readable is straightforward when you've got a basic resource to stream, but what would be the proper way of implementing it in this case?
And also, would someone be able to explain in better terms what the .read() method is on a deeper level, and how it should be implemented?
Thanks!
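For illustration only, the contract described in the quoted docs is easiest to see with a purely pull-based source such as a simple counter; this sketch is unrelated to the WebSocket scenario above:

const { Readable } = require('stream');

// A pull-based source: each _read() call is the stream asking for more data,
// and push(null) signals the end.
class Counter extends Readable {
    #current = 0;
    _read() {
        if (this.#current >= 5) {
            this.push(null);
        } else {
            this.push(String(this.#current++));
        }
    }
}

new Counter().pipe(process.stdout); // prints 01234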
Response to the answer:
I did try registering the listeners within the read() implementation, but because it is called multiple times by the consumer, it registers the listeners multiple times.
Observing this code:
const readable = new Readable({
    read() {
        console.log('called');
        dataEmitter.on('chunk', (data) => {
            readable.push(data);
        });
        dataEmitter.once('done', () => {
            readable.push(null);
        });
    },
});
readable.pipe(createWriteStream('./data.txt'));
The resulting file looks like this:
foobarbarbazbazbazhellohellohellohelloworldworldworldworldworldabcabcabcabcabcabc123123123123123123123
Which makes sense, because the listeners are being registered multiple times.
Seems like the only purpose of actually implementing the read() method is to only start receiving the chunks and pushing them into the stream when the consumer is ready for that.
Based on these conclusions, I've come up with this solution.
class MyReadable extends Readable {
    // Keep track of whether or not the listeners have already
    // been added to the data emitter.
    #registered = false;

    _read() {
        // If the listeners have already been registered, do
        // absolutely nothing.
        if (this.#registered) return;

        // "Notify" the client via websockets that we're ready
        // to start streaming the data chunks.
        const emitter = new DataEmitter();

        const handler = (chunk) => {
            this.push(chunk);
        };

        emitter.on('chunk', handler);

        emitter.once('done', () => {
            this.push(null);
            // Clean up the listener once it's done (this is
            // assuming the #emitter object will still be used
            // in the future).
            emitter.off('chunk', handler);
        });

        // Mark the listeners as registered.
        this.#registered = true;
    }
}

const readable = new MyReadable();
readable.pipe(createWriteStream('./data.txt'));
But this implementation doesn't allow for the consumer to control when things are pushed. I guess, however, in order to achieve that sort of control, you'd need to communicate with the resource emitting the chunks to tell it to stop until the read() method is called again.
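One way to sketch that kind of flow control, assuming the emitting side can be told to pause and resume (the stop() and requestMore() methods below are hypothetical signals, not part of DataEmitter):

const { Readable } = require('stream');

class ControlledReadable extends Readable {
    constructor(emitter) {
        super();
        this.emitter = emitter;
        this.emitter.on('chunk', (chunk) => {
            // push() returning false means the internal buffer is full:
            // ask the source to pause until _read() is called again.
            if (!this.push(chunk)) {
                this.emitter.stop(); // hypothetical "pause sending" signal
            }
        });
        this.emitter.once('done', () => this.push(null));
    }
    _read() {
        // The consumer is ready for more data: resume the source.
        this.emitter.requestMore(); // hypothetical "resume sending" signal
    }
}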

Create new instance on every user request

Performance question - I'm trying to understand: if I have a node http server powered by express, is it considered bad practice to create a new class instance on every request sent by the user?
The class instance fetches data from another API and exposes some functionality to manipulate the fetched data.
example of the code:
//--- Handler.js ---
const _ = require("lodash");

class Handler {
    constructor() {
        this.fetchData = this.fetchData.bind(this);
        this.getA = this.getA.bind(this);
        this.getB = this.getB.bind(this);
        this.getC = this.getC.bind(this);
    }
    async fetchData(req, res, id) {
        const result = await fetch(...)
        this.data = result;
    }
    getA() {
        ...
        return this.data.A
    }
    getB() {
        ...
        return this.data.B
    }
    getC() {
        ...
        return this.data.C
    }
}

//---- controller.js ----
const Handler = require("../Handler/");

exports.getDataById = async function (req, res) {
    const handler = new Handler();
    return handler.getA();
}
Would it be better to do this instead
//---- controller.js ----
const fetchData = require("../Handler/");
const getA = require("../Handler/getA");
const getB = require("../Handler/getB");
const getC = require("../Handler/getC");

exports.getDataById = async function (req, res) {
    //no new handler instance created
    const data = fetchData(url)
    return getA(data);
}
Is it considered bad practice to create a new class instance on every request sent by the user?
No, it's not generically considered a bad practice. It's normal to need to create objects during the processing of an incoming request. Look at any database query which is often used during a request handler. It will likely create multiple objects.
Now, whether or not there is a more efficient way to do what you're doing is a different question and we would need to see your actual code in order to offer some advice on that topic.
Would it be better to do this instead?
I don't see a whole lot of reasons for you to put the data into an object before making a single operation on that object.
Something like you proposed (with the addition of await to make it work properly):
exports.getDataById = async function (req, res) {
    //no new handler instance created
    const data = await fetchData(url)
    return getA(data);
}
Seems perfectly fine to me. When to structure things into a class that holds the data in instance properties, versus just using a function that operates on the data, is a classic OOP question. It depends on lots of things that can only be judged from your real code: what you're doing with it, how it is most likely to be expanded in the future, how often you call multiple functions on the same data within one request, and so on.
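As a sketch of the function-based layout the question proposes (the module paths mirror the question's example, and fetch() / someUrl are placeholders):

//--- Handler/index.js ---
// Plain functions instead of a class; no per-request state is kept here.
async function fetchData(url) {
    const result = await fetch(url); // assumes a global fetch (Node 18+) or a polyfill
    return result.json();
}

function getA(data) {
    return data.A;
}

module.exports = { fetchData, getA };

//---- controller.js ----
const { fetchData, getA } = require("../Handler/");

exports.getDataById = async function (req, res) {
    const data = await fetchData(someUrl); // someUrl is a placeholder
    res.json(getA(data));
};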

Service Worker: how to do a synchronous queue?

I have a Service Worker that receives push messages from Firebase FCM. They cause notifications to show or to cancel. The user can have multiple devices (that's what the cancel is for: when the user already acted on notification A I try to dismiss it on all devices).
The problem I have is when one of the user's devices is offline or turned off altogether. Once the device goes online, firebase delivers all the messages it couldn't deliver before. So, for example, you'd get:
Show notif A with content X
Show notif A with content Y (replaces notif A)
Show notif B with content Z
Cancel notif A
The SW receives these messages in rapid succession. The problem is that cancelling a notification is a lot faster than showing one (~2ms vs 16ms). So the 4th message is handled before the first (or second) message actually created the notification, the result being that the notification is not being cancelled.
// EDIT: Heavily edited question below. Added example code and broke down my questions. Also edited the title to better reflect my actual underlying question.
I tried pushing the messages into a queue and handling them one by one. It turns out this can become a bit complicated, because everything in a SW is async and, to make matters worse, the SW can be killed at any time when the browser thinks it has finished its work. I tried to store the queue in a persistent manner, but since LocalStorage is unavailable in a SW I need to use the async IndexedDB API: more async calls that could cause problems (like losing items).
It's also possible that event.waitUntil thinks my worker is done before it's actually done, because I'm not correctly 'passing the torch' from promise to promise.
Here's a (lot of) simplified code of what I tried:
// Use localforage, simplified API for IndexedDB
importScripts("localforage.min.js");

// In memory..
var mQueue = []; // only accessed through get-/setQueue()
var mQueueBusy = false;

// Receive push messages..
self.addEventListener('push', function (event) {
    var data = event.data.json().data;
    event.waitUntil(addToQueue(data));
});

// Add to queue
function addToQueue(data) {
    return new Promise(function (resolve, reject) {
        // Get queue..
        getQueue()
            .then(function (queue) {
                // Push + store..
                queue.push(data);
                setQueue(queue)
                    .then(function (queue) {
                        handleQueue()
                            .then(function () {
                                resolve();
                            });
                    });
            });
    });
}

// Handle queue
function handleQueue(force) {
    return new Promise(function (resolve, reject) {
        // Check if busy
        if (mQueueBusy && !force) {
            resolve();
        } else {
            // Set busy..
            mQueueBusy = true;
            // Get queue..
            getQueue()
                .then(function (queue) {
                    // Check if we're done..
                    if (queue && queue.length <= 0) {
                        resolve();
                    } else {
                        // Shift first item
                        var queuedData = queue.shift();
                        // Store before continuing..
                        setQueue(queue)
                            .then(function (queue) {
                                // Now do work here..
                                doSomething(queuedData)
                                    .then(function () {
                                        // Call handleQueue with 'force=true' to go past (mQueueBusy)
                                        resolve(handleQueue(true));
                                    });
                            });
                    }
                });
        }
    });
}

// Get queue
function getQueue() {
    return new Promise(function (resolve, reject) {
        // Get from memory if it's there..
        if (mQueue && mQueue.length > 0) {
            resolve(mQueue);
        }
        // Read from indexed db..
        else {
            localforage.getItem("queue")
                .then(function (val) {
                    var queue = (val) ? JSON.parse(val) : [];
                    mQueue = queue;
                    resolve(mQueue);
                });
        }
    });
}

// Set queue
function setQueue(queue) {
    return new Promise(function (resolve, reject) {
        // Store queue to memory..
        mQueue = queue;
        // Write to indexed db..
        localforage.setItem("queue", mQueue)
            .then(function () {
                resolve(mQueue);
            });
    });
}

// Do something..
function doSomething(queuedData) {
    return new Promise(function (resolve, reject) {
        // just print something and resolve
        console.log(queuedData);
        resolve();
    });
}
The short version of my question - with my particular use-case in mind - is: how do I handle push messages synchronously without having to use more async APIs?
And if I would split those questions into multiple:
Am I right to assume I would need to queue those messages?
If so, how would one handle queues in SW?
I can't (completely) rely on global variables because the SW may be killed, and I can't use LocalStorage or similar synchronous APIs, so I need to use yet another async API like IndexedDB to do this. Is this assumption correct?
Is my code above the right approach?
Somewhat related: Since I need to pass the event.waitUntil from promise to promise until the queue is processed, am I right to call resolve(handleQueue()) inside handleQueue() to keep it going? Or should I do return handleQueue()? Or..?
To pre-empt the "why not use collapse_key" question: it's a chat app and every chat room has its own tag. A user can participate in more than 4 chat rooms, and since Firebase limits the number of collapse_keys to 4, I can't use that.
So I'm going to go out on a limb and say that serializing things to IDB could be overkill. As long as you wait until all your pending work is done before you resolve the promise passed to event.waitUntil(), the service worker should be kept alive. (If it takes minutes to finish that work, there's the chance that the service worker would be killed anyway, but for what you describe I'd say the risk of that is low.)
Here's a rough sketch of how I'd structure your code, taking advantage of native async/await support in all browsers that currently support service workers.
(I haven't actually tested any of this, but conceptually I think it's sound.)
// In your service-worker.js:
let isPushMessageHandlerRunning = false;
const queue = [];

self.addEventListener('push', event => {
    const data = event.data.json().data;
    event.waitUntil(queueData(data));
});

async function queueData(data) {
    queue.push(data);
    if (!isPushMessageHandlerRunning) {
        await handlePushDataQueue();
    }
}

async function handlePushDataQueue() {
    isPushMessageHandlerRunning = true;
    let data;
    while (data = queue.shift()) {
        // Await on something asynchronous, based on data.
        // e.g. showNotification(), getNotifications() + notification.close(), etc.
        await ...;
    }
    isPushMessageHandlerRunning = false;
}
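As a sketch of what could go in place of the await ... placeholder, assuming each queued message carries type, tag, title and body fields (those field names are assumptions, not from the question):

async function processPushData(data) {
    if (data.type === 'cancel') {
        // Close any notification that was shown with the same tag.
        const notifications = await self.registration.getNotifications({ tag: data.tag });
        notifications.forEach((notification) => notification.close());
    } else {
        // Show (or replace) the notification for this tag.
        await self.registration.showNotification(data.title, {
            tag: data.tag,
            body: data.body,
        });
    }
}

The loop body then becomes await processPushData(data);.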

JavaScript work queue

I've created this object which contains an array, which serves as a work queue.
It kind of works like this:
var work1 = new Work();
var work2 = new Work();
var queue = Workqueue.instance();

queue.add(work1) // Bluebird promise.
    .then(function addWork2() {
        return queue.add(work2);
    })
    .then(function toCommit() {
        return queue.commit();
    })
    .then(function done(results) {
        // obtain results here.
    })
    .catch(function (err) {});
It works in that case: I can add more than one task before I call commit.
However if it's like this:
var work1 = new Work();
var work2 = new Work();
var queue = Workqueue.instance();

queue.add(work1)
    .then(function toCommit1() {
        return queue.commit();
    })
    .then(function done1(result1) {
        // obtain result1 here.
    })
    .catch(function (err) {});

queue.add(work2)
    .then(function toCommit2() {
        return queue.commit();
    })
    .then(function done2(result2) {
        // obtain result2 here.
    })
    .catch(function (err) {});
Something may go wrong, because if the first commit is called after the second commit (two works/tasks are already added), the first commit handler expects a result but they all go to the second commit handler.
The task involves Web SQL database reads and may also involve network access, so it's a fairly complicated procedure and the problem described above can surface. If only I could have an addWorkAndCommit() that wraps the add and the commit together; but even then there is no guarantee, because addWorkAndCommit() cannot be "atomic" in a sense, since it involves asynchronous calls. So even two calls to addWorkAndCommit() may fail. (I don't know how to describe it other than by "atomic", since JavaScript is single-threaded, but this issue crops up.)
What can I do?
The problem is that there is a commit() but no notion of a transaction, so you cannot explicitly have two isolated transactions running in parallel. From my understanding the Javascript Workqueue is a proxy for a remote queue and the calls to add() and commit() map directly to some kind of remote procedure calls having a similar interface without transactions. I also understand that you would not care if the second add() actually happened after the first commit(), you just want to write two simple subsequent addWorkAndCommit() statements without synchronizing the underlying calls in client code.
What you can do is write a wrapper around the local Workqueue (or alter it directly if it is your code), so that each update of the queue creates a new transaction and a commit() always refers to one such transaction. The wrapper then delays new updates until all previous transactions are committed (or rolled back).
Adopting Benjamin Gruenbaum's recommendation to use a disposer pattern, here is one, written as an adapter method for Workqueue.instance():
Workqueue.transaction = function (work) { // `work` is a function
    var queue = this.instance();
    return Promise.resolve(work(queue)) // `Promise.resolve()` avoids an error if `work()` doesn't return a promise.
        .then(function () {
            return queue.commit();
        });
}
Now you can write :
// if the order matters,
// then add promises sequentially.
Workqueue.transaction(function (queue) {
    var work1 = new Work();
    var work2 = new Work();
    return queue.add(work1)
        .then(function () {
            return queue.add(work2);
        });
});

// if the order doesn't matter,
// add promises in parallel.
Workqueue.transaction(function (queue) {
    var work1 = new Work();
    var work2 = new Work();
    var promise1 = queue.add(work1);
    var promise2 = queue.add(work2);
    return Promise.all([promise1, promise2]);
});

// you can even pass `queue` around
Workqueue.transaction(function (queue) {
    var work1 = new Work();
    var promise1 = queue.add(work1);
    var promise2 = myCleverObject.doLotsOfAsyncStuff(queue);
    return Promise.all([promise1, promise2]);
});
In practice, an error handler should be included like this - Workqueue.transaction(function() {...}).catch(errorHandler);
Whatever you write, all you need to do is ensure that the callback function returns a promise that is an aggregate of all the component asynchronisms (component promises). When the aggregate promise resolves, the disposer will ensure that the transaction is committed.
As with all disposers, this one doesn't do anything you can't do without it. However it :
serves as a reminder of what you are doing by providing a named .transaction() method,
enforces the notion of a single transaction by constraining a Workqueue.instance() to one commit.
If for any reason you should ever need to do two or more commits on the same queue (why?), then you can always revert to calling Workqueue.instance() directly.

What is the correct pattern with generators and iterators for managing a stream

I am trying to figure out how to arrange a pair of routines to control writing to a stream using the generator/iterator functions in ES2015. It's a simple logging system to use in node.js.
What I am trying to achieve is a function that external processes can call to write to a log. I am hoping that the new generator/iterator features mean that any suspending that needs to happen inside this routine is transparent to the caller.
stream.write should normally return immediately, but can return false to say that the stream is full. In this case it needs to wait for stream.on('drain',cb) to fire before returning
I am thinking that the actual software that writes to the stream is a generator function which yields when it is ready to accept another request, and that the function I provide to allow external people to call the stream is an iterator, but I might have this the wrong way round.
So, something like this
var stopLogger = false;
var it = writer();

function writeLog(line) {
    it.next(line);
}

function* writer() {
    while (!stopLogger) {
        line = yield;
        if (!stream.write(line)) {
            yield* waitDrain(); // can't continue until we get drain
        }
    }
}

function* waitDrain() {
    // Not sure what to do here to avoid waiting
    stream.on('drain', () => { /* do I yield here or something */ });
}
I found the answer here https://davidwalsh.name/async-generators
I have it backwards.
The code above should be
var stopLogger = false;

function* writeLog(line) {
    yield writer(line);
}

var it = writeLog();

function writer(line) {
    if (stopLogger) {
        setTimeout(() => { it.next(); }, 1); // needed so we can get to yield
    } else {
        if (stream.write(line)) {
            setTimeout(() => { it.next(); }, 1); // needed so we can get to yield
        }
    }
}

stream.on('drain', () => {
    it.next();
});
I haven't quite tried this, just translated it from the above article, and there is some complication around errors etc., which the article suggests can be solved by enhancing the iterator to return a promise that gets resolved in a "runGenerator" function. But it solved my main issue, which was about how the pattern should work.
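For comparison, the write-and-wait-for-drain behaviour can also be expressed with a plain promise and async/await instead of generators; a minimal sketch, assuming stream is an ordinary Node.js writable stream:

function writeLog(line) {
    // write() returns false when the internal buffer is full;
    // in that case, wait for 'drain' before allowing more writes.
    if (stream.write(line)) {
        return Promise.resolve();
    }
    return new Promise((resolve) => stream.once('drain', resolve));
}

// usage: await each write so backpressure is respected
async function logLines(lines) {
    for (const line of lines) {
        await writeLog(line);
    }
}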
