Processing websocket onmessage events in order - javascript

I'm using a websocket API that streams real-time changes on a remote database, which I then reflect on a local database. The operations on my side must be done in the same order (I create and update records as the data comes in).
My problem is that the messages often arrive in bulk, faster than I can process them, and they end up being handled out of order. Is there a way to make onmessage wait until I'm ready to process the next one?
Here's an example of what I'm trying, without success:
async function doSomething(ev) {
  return new Promise(async (resolve, reject) => {
    // database operations, etc.
    resolve(true);
  });
}

ws.onmessage = async (ev) => {
  await doSomething(ev);
};

You can create a queue (e.g., an array of values) and use Promises (and possibly an AsyncIterator or a ReadableStream, if needed) to process the data in sequential order.
For example:
new ReadableStream({
  pull(controller) {
    controller.enqueue(event.data)
  }
})
// ..
reader.read()
  .then(({value, done}) => {
    // do stuff
  })
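If you don't need backpressure, a plain promise chain is often enough as a queue. Here is a minimal sketch (assuming doSomething(ev) returns a promise that resolves once the local database has been updated); each incoming message is appended to the chain, so messages are processed one at a time, in arrival order:

let pending = Promise.resolve();

ws.onmessage = (ev) => {
  // Append each message to the chain so it only runs after the previous one has finished.
  pending = pending
    .then(() => doSomething(ev))
    .catch((err) => {
      // Keep the chain alive even if one message fails to process.
      console.error("Failed to process message", err);
    });
};

Because onmessage itself never awaits anything, a burst of messages simply queues up behind the current database operation instead of interleaving with it.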

Related

Creating a Readable stream from emitted data chunks

Short backstory: I am trying to create a Readable stream based on data chunks that are emitted back to my server from the client side with WebSockets. Here's a class I've created to "simulate" that behavior:
class DataEmitter extends EventEmitter {
  constructor() {
    super();
    const data = ['foo', 'bar', 'baz', 'hello', 'world', 'abc', '123'];
    // Every second, emit an event with a chunk of data
    const interval = setInterval(() => {
      this.emit('chunk', data.splice(0, 1)[0]);
      // Once there are no more items, emit an event
      // notifying that that is the case
      if (!data.length) {
        this.emit('done');
        clearInterval(interval);
      }
    }, 1e3);
  }
}
In this post, the dataEmitter in question will have been created like this.
// Our data is being emitted through events in chunks from some place.
// This is just to simulate that. We cannot change the flow - only listen
// for the events and do something with the chunks.
const dataEmitter = new DataEmitter();
Right, so I initially tried this:
const readable = new Readable();
dataEmitter.on('chunk', (data) => {
  readable.push(data);
});
dataEmitter.once('done', () => {
  readable.push(null);
});
But that results in this error:
Error [ERR_METHOD_NOT_IMPLEMENTED]: The _read() method is not implemented
So I did this, implementing read() as an empty function:
const readable = new Readable({
  read() {},
});
dataEmitter.on('chunk', (data) => {
  readable.push(data);
});
dataEmitter.once('done', () => {
  readable.push(null);
});
And it works when piping into a write stream, or sending the stream to my test API server. The resulting .txt file looks exactly as it should:
foobarbazhelloworldabc123
However, I feel like there's something quite wrong and hacky with my solution. I attempted to put the listener registration logic (.on('chunk', ...) and .once('done', ...)) within the read() implementation; however, read() seems to get called multiple times, and that results in the listeners being registered multiple times.
The Node.js documentation says this about the _read() method:
When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. _read() may continue reading from the resource and pushing data until readable.push() returns false. Only when _read() is called again after it has stopped should it resume pushing additional data into the queue.
After dissecting this, it seems that the consumer of the stream calls upon .read() when it's ready to read more data. And when it is called, data should be pushed into the stream. But, if it is not called, the stream should not have data pushed into it until the method is called again (???). So wait, does the consumer call .read() when it is ready for more data, or does it call it after each time .push() is called? Or both?? The docs seem to contradict themselves.
Implementing .read() on Readable is straightforward when you've got a basic resource to stream, but what would be the proper way of implementing it in this case?
And also, would someone be able to explain in better terms what the .read() method is on a deeper level, and how it should be implemented?
Thanks!
Response to the answer:
I did try registering the listeners within the read() implementation, but because it is called multiple times by the consumer, it registers the listeners multiple times.
Observing this code:
const readable = new Readable({
  read() {
    console.log('called');
    dataEmitter.on('chunk', (data) => {
      readable.push(data);
    });
    dataEmitter.once('done', () => {
      readable.push(null);
    });
  },
});
readable.pipe(createWriteStream('./data.txt'));
The resulting file looks like this:
foobarbarbazbazbazhellohellohellohelloworldworldworldworldworldabcabcabcabcabcabc123123123123123123123
Which makes sense, because the listeners are being registered multiple times.
Seems like the only purpose of actually implementing the read() method is to only start receiving the chunks and pushing them into the stream when the consumer is ready for that.
Based on these conclusions, I've come up with this solution.
class MyReadable extends Readable {
  // Keep track of whether or not the listeners have already
  // been added to the data emitter.
  #registered = false;

  _read() {
    // If the listeners have already been registered, do
    // absolutely nothing.
    if (this.#registered) return;

    // "Notify" the client via websockets that we're ready
    // to start streaming the data chunks.
    const emitter = new DataEmitter();

    const handler = (chunk: string) => {
      this.push(chunk);
    };

    emitter.on('chunk', handler);
    emitter.once('done', () => {
      this.push(null);
      // Clean up the listener once it's done (this is
      // assuming the #emitter object will still be used
      // in the future).
      emitter.off('chunk', handler);
    });

    // Mark the listeners as registered.
    this.#registered = true;
  }
}

const readable = new MyReadable();
readable.pipe(createWriteStream('./data.txt'));
But this implementation doesn't allow for the consumer to control when things are pushed. I guess, however, in order to achieve that sort of control, you'd need to communicate with the resource emitting the chunks to tell it to stop until the read() method is called again.
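For what it's worth, here is a rough sketch of what that consumer-driven flow control could look like, assuming a hypothetical emitter that exposes pause() and resume() methods (the DataEmitter in this post does not, so treat this purely as an illustration of the _read()/push() contract):

class PausableReadable extends Readable {
  #registered = false;

  constructor(emitter) {
    super();
    this.emitter = emitter;
  }

  _read() {
    if (!this.#registered) {
      this.emitter.on('chunk', (chunk) => {
        // push() returning false means the internal buffer is full:
        // ask the source to stop sending until _read() is called again.
        if (!this.push(chunk)) this.emitter.pause();
      });
      this.emitter.once('done', () => this.push(null));
      this.#registered = true;
    }
    // The consumer wants more data: tell the source to keep sending.
    this.emitter.resume();
  }
}

The key point is that _read() is the consumer's "give me more" signal, and push() returning false is the "stop for now" signal back to the producer.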

Multiple fetch requests in a row

For a project, I have to generate a PDF that lists the vehicles in a DB. To do this, I use the JSPDF library.
1st step: I get data from a DB (via asynchronous requests on an API) and images from a server, which I store in an array.
2nd step: I call a function that generates the PDF with JSPDF.
The problem is that I need to have all my data retrieved before calling my generatePDF function, otherwise the fields and images are empty because they have not yet been retrieved from the DB or the server.
A solution I found is to use the setTimeout function to put a delay between each call. However, this makes the code very slow and inflexible, because you have to change the timeout manually depending on the amount of data and the number of images to retrieve. Moreover, it is impossible to determine exactly how long it will take to retrieve the data, especially since this can vary depending on the state of the network, so you have to allow for a margin that is often unnecessary.
Another solution is to use callbacks or to interweave fetch / ajax calls with .then / .done calls, but this becomes very complicated when it comes to retrieving the images since they are retrieved one by one and there are more than a hundred of them.
What would be the easiest way to do this in a clean and flexible way?
Thanks for your help and sorry for the long text, I tried to be as clear as possible :)
To do a series of asynchronous things in order, you start the next operation in the fulfillment handler of the previous operation.
An async function is the easiest way:
async function buildPDF() {
  const response = await fetch("/path/to/the/data");
  if (!response.ok) {
    throw new Error(`HTTP error ${response.status}`);
  }
  const data = await response.json(); // Or `.text()` or whatever
  const pdf = await createPDF(data);  // Assuming it returns a promise
  // ...
}
If you can't use async functions in your environment and don't want to transpile, you can write your fulfillment handlers as callbacks:
function buildPDF() {
  return fetch("/path/to/the/data")
    .then(response => {
      if (!response.ok) {
        throw new Error(`HTTP error ${response.status}`);
      }
      return response.json(); // Or `.text()` or whatever
    })
    .then(data => createPDF(data))
    .then(pdf => {
      // ...
    });
}
Note that I'm returning the result of the chain, so that the caller can handle errors.
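For the part of the question about retrieving a hundred-plus images, a common pattern is to start all the image requests with Promise.all and only generate the PDF once everything has resolved. A rough sketch; imageUrl, imageUrls and generatePDF are placeholders for your own data and function, not part of the answer above:

async function buildPDF() {
  const response = await fetch("/path/to/the/data");
  if (!response.ok) {
    throw new Error(`HTTP error ${response.status}`);
  }
  const vehicles = await response.json();

  // Fetch all images in parallel; Promise.all keeps them in the same order as imageUrls.
  const imageUrls = vehicles.map(vehicle => vehicle.imageUrl);
  const images = await Promise.all(imageUrls.map(async (url) => {
    const res = await fetch(url);
    if (!res.ok) {
      throw new Error(`HTTP error ${res.status} for ${url}`);
    }
    return res.blob();
  }));

  // Every field and every image is available at this point.
  return generatePDF(vehicles, images);
}

Promise.all also rejects as soon as any single request fails, so one .catch on the caller covers the whole batch.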

Incorporating async actions, promise.then() and recursive setTimeout whilst avoiding "deferred antipattern"

I have been reading up on methods to implement a polling function and found a great article at https://davidwalsh.name/javascript-polling. Now, using setTimeout rather than setInterval to poll makes a lot of sense, especially with an API that I have no control over and that has shown varying response times.
So I tried to implement such a solution in my own code in order to challenge my understanding of callbacks, promises and the event loop. I have followed the guidance outlined in the posts Is this a "Deferred Antipattern"? (to avoid anti-patterns) and promise resolve before inner promise resolved (to ensure promise resolution before a .then()), and this is where I am getting stuck. I have put some code together to simulate the scenario so I can highlight the issues.
My hypothetical scenario is this:
I have an API call to a server which responds with a userID. I then use that userID to make a request to another database server, which kicks off some machine-learning processing on a set of data that can take several minutes.
Due to the latency, the task is put onto a task queue, and once it is complete it updates a NoSQL database entry from isComplete: false to isComplete: true. This means we then need to poll the database every n seconds until we get a response indicating isComplete: true, at which point we cease the polling. I understand there are a number of solutions to polling an API, but I have yet to see one involving promises, conditional polling, and not following some of the anti-patterns mentioned in the previously linked post. If I have missed anything and this is a repeat, I do apologize in advance.
So far the process is outlined by the code below:
let state = false;
const userIdApi = () => {
return new Promise((res, rej) => {
console.log("userIdApi");
const userId = "uid123";
setTimeout(()=> res(userId), 2000)
})
}
const startProcessingTaskApi = userIdApi().then(result => {
return new Promise((res, rej) => {
console.log("startProcessingTaskApi");
const msg = "Task submitted";
setTimeout(()=> res(msg), 2000)
})
})
const pollDatabase = (userId) => {
return new Promise((res, rej) => {
console.log("Polling databse with " + userId)
setTimeout(()=> res(true), 2000)
})
}
Promise.all([userIdApi(), startProcessingTaskApi])
.then(([resultsuserIdApi, resultsStartProcessingTaskApi]) => {
const id = setTimeout(function poll(resultsuserIdApi){
console.log(resultsuserIdApi)
return pollDatabase(resultsuserIdApi)
.then(res=> {
state = res
if (state === true){
clearTimeout(id);
return;
}
setTimeout(poll, 2000, resultsuserIdApi);
})
},2000)
})
I have a question that relates to this code as it is failing to carry out the polling as I need:
I saw in the accepted answer of the post How do I access previous promise results in a .then() chain? that one should "break the chain" to avoid huge chains of .then() statements. I followed the guidance and it seemed to do the trick (before adding the polling); however, when I added console.log calls to every step, it seems that userIdApi is executed twice: once where it is used in the startProcessingTaskApi definition, and then again in the Promise.all line.
Is this a known occurrence? It makes sense why it happens; I am just wondering whether it is fine to send two requests to execute the same promise, or if there is a way to prevent the first request from happening and restrict the execution to the Promise.all statement?
I am fairly new to Javascript having come from Python so any pointers on where I may be missing some knowledge to be able to get this seemingly simple task working would be greatly appreciated.
I think you're almost there, it seems you're just struggling with the asynchronous nature of javascript. Using promises is definitely the way to go here and understanding how to chain them together is key to implementing your use case.
I would start by implementing a single method that wraps setTimeout to simplify things down.
function delay(millis) {
  return new Promise((resolve) => setTimeout(resolve, millis));
}
Then you can re-implement the "API" methods using the delay function.
const userIdApi = () => {
  return delay(2000).then(() => "uid123");
};

// Make userId an argument to this method (like pollDatabase) so we don't need to get it twice.
const startProcessingTaskApi = (userId) => {
  return delay(2000).then(() => "Task submitted");
};

const pollDatabase = (userId) => {
  return delay(2000).then(() => true);
};
You can continue polling the database by simply chaining another promise in the chain when your condition is not met.
function pollUntilComplete(userId) {
  return pollDatabase(userId).then((result) => {
    if (!result) {
      // Result is not ready yet, chain another database polling promise.
      return pollUntilComplete(userId);
    }
  });
}
Then you can put everything together to implement your use case.
userIdApi().then((userId) => {
  // Add task processing to the promise chain.
  return startProcessingTaskApi(userId).then(() => {
    // Add database polling to the promise chain.
    return pollUntilComplete(userId);
  });
}).then(() => {
  // Everything is done at this point.
  console.log('done');
}).catch((err) => {
  // An error occurred at some point in the promise chain.
  console.error(err);
});
This becomes a lot easier if you're able to actually use the async and await keywords.
Using the same delay function as in Jake's answer:
async function doItAll(userID) {
  await startProcessingTaskApi(userID);
  while (true) {
    if (await pollDatabase(userID)) break;
  }
}
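If the real API needs breathing room between polls (rather than relying on the artificial delay baked into pollDatabase above), a variant with an explicit wait and an attempt cap might look like this - the maxAttempts value is just an illustrative choice:

async function pollUntilCompleteWithTimeout(userID, maxAttempts = 30) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await pollDatabase(userID)) return; // isComplete is true, stop polling
    await delay(2000);                      // wait before polling again
  }
  throw new Error("Polling timed out");
}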

Service Worker: how to do a synchronous queue?

I have a Service Worker that receives push messages from Firebase FCM. They cause notifications to show or to cancel. The user can have multiple devices (that's what the cancel is for: when the user already acted on notification A I try to dismiss it on all devices).
The problem I have is when one of the user's devices is offline or turned off altogether. Once the device goes online, firebase delivers all the messages it couldn't deliver before. So, for example, you'd get:
Show notif A with content X
Show notif A with content Y (replaces notif A)
Show notif B with content Z
Cancel notif A
The SW receives these messages in rapid succession. The problem is that cancelling a notification is a lot faster than showing one (~2ms vs 16ms). So the 4th message is handled before the first (or second) message actually created the notification, the result being that the notification is not being cancelled.
// EDIT: Heavily edited question below. Added example code and broke down my questions. Also edited the title to better reflect my actual underlying question.
I tried pushing the messages in a queue and handling them one by one. Turns out this can become a bit complicated because everything in SW is async and, to make matters worse, it can be killed at any time when the browser thinks the SW finished its work. I tried to store the queue in a persistent manner but since LocalStorage is unavailable in SW I need to use the async IndexedDB API. More async calls that could cause problems (like losing items).
It's also possible that event.waitUntil thinks my worker is done before it's actually done, because I'm not correctly "passing the torch" from promise to promise.
Here's a (lot of) simplified code of what I tried:
// Use localforage, simplified API for IndexedDB
importScripts("localforage.min.js");

// In memory..
var mQueue = []; // only accessed through get-/setQueue()
var mQueueBusy = false;

// Receive push messages..
self.addEventListener('push', function(event) {
  var data = event.data.json().data;
  event.waitUntil(addToQueue(data));
});

// Add to queue
function addToQueue(data) {
  return new Promise(function(resolve, reject) {
    // Get queue..
    getQueue()
      .then(function(queue) {
        // Push + store..
        queue.push(data);
        setQueue(queue)
          .then(function(queue) {
            handleQueue()
              .then(function() {
                resolve();
              });
          });
      });
  });
}

// Handle queue
function handleQueue(force) {
  return new Promise(function(resolve, reject) {
    // Check if busy
    if (mQueueBusy && !force) {
      resolve();
    } else {
      // Set busy..
      mQueueBusy = true;
      // Get queue..
      getQueue()
        .then(function(queue) {
          // Check if we're done..
          if (queue && queue.length <= 0) {
            resolve();
          } else {
            // Shift first item
            var queuedData = queue.shift();
            // Store before continuing..
            setQueue(queue)
              .then(function(queue) {
                // Now do work here..
                doSomething(queuedData)
                  .then(function() {
                    // Call handleQueue with 'force=true' to go past (mQueueBusy)
                    resolve(handleQueue(true));
                  });
              });
          }
        });
    }
  });
}

// Get queue
function getQueue() {
  return new Promise(function(resolve, reject) {
    // Get from memory if it's there..
    if (mQueue && mQueue.length > 0) {
      resolve(mQueue);
    }
    // Read from indexed db..
    else {
      localforage.getItem("queue")
        .then(function(val) {
          var queue = (val) ? JSON.parse(val) : [];
          mQueue = queue;
          resolve(mQueue);
        });
    }
  });
}

// Set queue
function setQueue(queue) {
  return new Promise(function(resolve, reject) {
    // Store queue to memory..
    mQueue = queue;
    // Write to indexed db..
    localforage.setItem("queue", mQueue)
      .then(function() {
        resolve(mQueue);
      });
  });
}

// Do something..
function doSomething(queuedData) {
  return new Promise(function(resolve, reject) {
    // just print something and resolve
    console.log(queuedData);
    resolve();
  });
}
The short version of my question - with my particular use-case in mind - is: how do I handle push messages synchronously without having to use more async APIs?
And if I were to split that into multiple questions:
Am I right to assume I would need to queue those messages?
If so, how would one handle queues in SW?
I can't (completely) rely on global variables because the SW may be killed, and I can't use LocalStorage or similar synchronous APIs, so I need to use yet another async API like IndexedDB to do this. Is this assumption correct?
Is my code above the right approach?
Somewhat related: Since I need to pass the event.waitUntil from promise to promise until the queue is processed, am I right to call resolve(handleQueue()) inside handleQueue() to keep it going? Or should I do return handleQueue()? Or..?
To preempt the "why not use collapse_key" question: it's a chat app and every chat room has its own tag. A user can participate in more than 4 chat rooms, and since Firebase limits the number of collapse_keys to 4, I can't use that.
So I'm going to go out on a limb and say that serializing things to IDB could be overkill. As long as you wait until all your pending work is done before you resolve the promise passed to event.waitUntil(), the service worker should be kept alive. (If it takes minutes to finish that work, there's the chance that the service worker would be killed anyway, but for what you describe I'd say the risk of that is low.)
Here's a rough sketch of how I'd structure your code, taking advantage of native async/await support in all browsers that currently support service workers.
(I haven't actually tested any of this, but conceptually I think it's sound.)
// In your service-worker.js:
let isPushMessageHandlerRunning = false;
const queue = [];

self.addEventListener('push', event => {
  var data = event.data.json().data;
  event.waitUntil(queueData(data));
});

async function queueData(data) {
  queue.push(data);
  if (!isPushMessageHandlerRunning) {
    await handlePushDataQueue();
  }
}

async function handlePushDataQueue() {
  isPushMessageHandlerRunning = true;
  let data;
  while (data = queue.shift()) {
    // Await on something asynchronous, based on data.
    // e.g. showNotification(), getNotifications() + notification.close(), etc.
    await ...;
  }
  isPushMessageHandlerRunning = false;
}
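To make the "Await on something asynchronous" placeholder concrete, here is one way the whole handler could look. The shape of data (a type field plus tag, title and body) is an assumption about your FCM payload; showNotification() and getNotifications() are the standard service worker notification APIs:

async function handlePushDataQueue() {
  isPushMessageHandlerRunning = true;
  let data;
  while (data = queue.shift()) {
    if (data.type === 'show') {
      // Showing a notification replaces any existing one with the same tag.
      await self.registration.showNotification(data.title, {
        tag: data.tag,
        body: data.body,
      });
    } else if (data.type === 'cancel') {
      // Cancelling: close every currently displayed notification with this tag.
      const notifications = await self.registration.getNotifications({ tag: data.tag });
      notifications.forEach(notification => notification.close());
    }
  }
  isPushMessageHandlerRunning = false;
}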

Best way to call an API inside a for loop using Promises

I have 500 million objects, each of which has some number of contacts, as shown below:
var groupsArray = [
  {'G1': ['C1','C2','C3'....]},
  {'G2': ['D1','D2','D3'....]}
  ...
  {'G2000': ['D2001','D2002','D2003'....]}
  ...
]
I have two ways of implementing this in Node.js: one based on regular promises and another using Bluebird, as shown below.
Regular promises
...
var groupsArray = [
  {'G1': ['C1','C2','C3']},
  {'G2': ['D1','D2','D3']}
]

function ajax(url) {
  return new Promise(function(resolve, reject) {
    request.get(url, {json: true}, function(error, data) {
      if (error) {
        reject(error);
      } else {
        resolve(data);
      }
    });
  });
}

_.each(groupsArray, function(groupData) {
  _.each(groupData, function(contactlists, groupIndex) {
    // console.log(groupIndex)
    _.each(contactlists, function(contactData) {
      ajax('http://localhost:3001/api/getcontactdata/' + groupIndex + '/' + contactData).then(function(result) {
        console.log(result.body);
        // Code depending on result
      }).catch(function() {
        // An error occurred
      });
    })
  })
})
...
Using the Bluebird approach, I have used the concurrency option to control the queue of promises:
...
_.each(groupsArray, function(groupData) {
  _.each(groupData, function(contactlists, groupIndex) {
    var contacts = [];
    // console.log(groupIndex)
    _.each(contactlists, function(contactData) {
      contacts.push({
        contact_name: 'Contact ' + contactData
      });
    })
    groups.push({
      task_name: 'Group ' + groupIndex,
      contacts: contacts
    });
  })
})

Promise.each(groups, group =>
  Promise.map(group.contacts,
    contact => new Promise((resolve, reject) => {
      /*setTimeout(() =>
        resolve(group.task_name + ' ' + contact.contact_name), 1000);*/
      request.get('http://localhost:3001/api/getcontactdata/' + group.task_name + '/' + contact.contact_name, {json: true}, function(error, data) {
        if (error) {
          reject(error);
        } else {
          resolve(data);
        }
      });
    }).then(log => console.log(log.body)),
    {
      concurrency: 50
    }
  ).then(() => console.log())
).then(() => {
  console.log('All Done!!');
});
...
I want to know how to deal with 100 million API calls inside a loop using promises. Please advise on the best way to call the API asynchronously and deal with the responses later.
My answer using regular Node.js promises (this can probably easily be adapted to Bluebird or another library).
You could fire off all Promises at once using Promise.all:
var groupsArray = [
  {'G1': ['C1','C2','C3']},
  {'G2': ['D1','D2','D3']}
];

function ajax(url) {
  return new Promise(function(resolve, reject) {
    request.get(url, {json: true}, function(error, data) {
      if (error) {
        reject(error);
      } else {
        resolve(data);
      }
    });
  });
}

Promise.all(groupsArray.map(group => ajax("your-url-here")))
  .then(results => {
    // Code that depends on all results.
  })
  .catch(err => {
    // Handle the error.
  });
Using Promise.all attempts to run all your requests in parallel. This probably won't work well when you have 500 million requests to make all being attempted at the same time!
A more effective way to do it is to use the JavaScript reduce function to sequence your requests one after the other:
// ... Setup as before ...
const results = [];
groupsArray.reduce((prevPromise, group) => {
    return prevPromise.then(() => {
      return ajax("your-url-here")
        .then(result => {
          // Process a single result if necessary.
          results.push(result); // Collect your results.
        });
    });
  },
  Promise.resolve() // Seed promise.
)
  .then(() => {
    // Code that depends on all results.
  })
  .catch(err => {
    // Handle the error.
  });
This example chains together the promises so that the next one only starts once the previous one completes.
Unfortunately the sequencing approach will be very slow because it has to wait until each request has completed before starting a new one. Whilst each request is in progress (it takes time to make an API request) your CPU is sitting idle whereas it could be working on another request!
A more efficient, but complicated approach to this problem is to use a combination of the above approaches. You should batch your requests so that the requests in each batch (of say 10) are executed in parallel and then the batches are sequenced one after the other.
It's tricky to implement this yourself - although it's a great learning exercise, using a combination of Promise.all and the reduce function (a minimal sketch of that appears at the end of this answer) - but I'd suggest using the library async-await-parallel. There's a bunch of such libraries, but I use this one, it works well, and it easily does the job you want.
You can install the library like this:
npm install --save async-await-parallel
Here's how you would use it:
const parallel = require("async-await-parallel");

// ... Setup as before ...

const batchSize = 10;

parallel(groupsArray.map(group => {
  // We need to return a 'thunk' function, so that the jobs can be started when they are needed, rather than all at once.
  return () => {
    return ajax("your-url-here");
  };
}), batchSize)
  .then(() => {
    // Code that depends on all results.
  })
  .catch(err => {
    // Handle the error.
  });
This is better, but it's still a clunky way to make such a large amount of requests! Maybe you need to up the ante and consider investing time in proper asynchronous job management.
I've been using Kue lately for managing a cluster of worker processes. Using Kue with the Node.js cluster library allows you to get proper parallelism happening on a multi-core PC, and you can then easily extend it to multiple cloud-based VMs if you need even more grunt.
See my answer here for some Kue example code.
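For completeness, here is roughly what the DIY batching mentioned above looks like with plain promises: split the jobs into chunks, run each chunk with Promise.all, and sequence the chunks with reduce (reusing the ajax helper and groupsArray from earlier):

const batchSize = 10;

// Split the work into batches of `batchSize` groups.
const batches = [];
for (let i = 0; i < groupsArray.length; i += batchSize) {
  batches.push(groupsArray.slice(i, i + batchSize));
}

// Run each batch in parallel, but sequence the batches one after the other.
const allResults = [];
batches.reduce(
  (prevPromise, batch) =>
    prevPromise
      .then(() => Promise.all(batch.map(group => ajax("your-url-here"))))
      .then(batchResults => {
        allResults.push(...batchResults);
      }),
  Promise.resolve() // Seed promise.
)
  .then(() => {
    // Code that depends on all results (in allResults).
  })
  .catch(err => {
    // Handle the error.
  });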
In my opinion you have two problems coupled in one question - I'd decouple them.
#1 Loading of a large dataset
Operating on such a large dataset (500m records) will surely cause memory limit issues sooner or later - Node.js runs in a single thread and is limited by default to roughly 1.5GB of memory - after that your process will crash.
In order to avoid that, you could read your data as a stream from a CSV - I'll use scramjet as it'll help us with the second problem, but JSONStream or papaparse would do pretty well too:
$ npm install --save scramjet
Then let's read the data - I'd assume from a CSV:
const {StringStream} = require("scramjet");

const stream = require("fs")
  .createReadStream(pathToFile)
  .pipe(new StringStream('utf-8'))
  .csvParse()
Now we have a stream of objects that will return the data line by line, but only if we read it. Solved problem #1, now to "augment" the stream:
#2 Stream data asynchronous augmentation
No worries - that's just what you do: for every line of data you want to fetch some additional info (so, augment it) from some API, which by nature is asynchronous.
That's where scramjet kicks in with just a couple of additional lines:
stream
  .flatMap(groupData => Object.entries(groupData))
  .flatMap(([groupIndex, contactList]) => contactList.map(contactData => ([contactData, groupIndex])))
  // now you have a simple stream of entries for your call
  .map(([contactData, groupIndex]) => ajax('http://localhost:3001/api/getcontactdata/' + groupIndex + '/' + contactData))
  // and here you can print or do anything you like with your data stream
  .each(console.log)
After this you'd need to accumulate the data or output it to a stream - there are a number of options - for example: .toJSONArray().pipe(fileStream).
Using scramjet you are able to separate the process into multiple steps without much impact on performance. Using setOptions({maxParallel: 32}) you can control concurrency and, best of all, all this will run with a minimal memory footprint - much, much faster than if you were to load all the data into memory.
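As a sketch of where that setOptions call would slot in - the exact placement in the chain is my assumption, so double-check it against the scramjet docs:

stream
  .setOptions({ maxParallel: 32 }) // cap how many ajax calls are in flight at once
  .flatMap(groupData => Object.entries(groupData))
  .flatMap(([groupIndex, contactList]) => contactList.map(contactData => ([contactData, groupIndex])))
  .map(([contactData, groupIndex]) => ajax('http://localhost:3001/api/getcontactdata/' + groupIndex + '/' + contactData))
  .each(console.log)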
Let me know if this is helpful - your question is quite complex, so let me know if you run into any problems - I'll be happy to help. :)
