JavaScript stack overflow when dealing with a large dataset

I just created a small Node.js application for doing some network IO and ran into what I believe is a stack overflow caused by the depth of nested callbacks.
My application is basically a file transfer function composed of two network operations... let's call them fetchData() and sendData(). All they do is make an HTTP request to either GET or POST data.
I have my application setup like this:
function runTransfer(){
    // I need to transfer 100 million small documents,
    // so I need to fetch and send a group (1000) at a time
    fetchData(fetchDataCallback);

    function fetchDataCallback(data){
        // Now that I have my data, let's send it to my other 'store'
        sendData(data, sendDataCallback);
    }

    function sendDataCallback(){
        // Now that I got a 200 OK response we can fetch more data... and so on
        if(totalDocsFetched >= totalNumDocs){
            return; // This is where the application would finally end, once we fetched and sent all 100 million docs
        }
        fetchData(fetchDataCallback);
    }
}
Following this design pattern, I think that I would eventually get a stack overflow because fetchData -> fetchDataCallback -> sendData -> sendDataCallback -> fetchData -> fetchDataCallback -> sendData... and so forth until my stack explodes!
What kind of design pattern can I use here to ensure I don't get an overflow like this?
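Worth noting: because fetchData and sendData do asynchronous network IO, each callback is invoked from the event loop on a fresh stack, so the chain as written should not actually overflow (it would with synchronous callbacks). Still, a flat loop makes the control flow explicit. A minimal sketch, assuming fetchData and sendData are wrapped to return promises (the *Async wrapper names are hypothetical, not from the question):

    // A sketch, not the asker's code: assumes promise-returning wrappers
    // fetchDataAsync() and sendDataAsync() around the two HTTP operations.
    async function runTransfer(totalNumDocs) {
        let totalDocsFetched = 0;
        while (totalDocsFetched < totalNumDocs) {
            const batch = await fetchDataAsync();  // GET the next group of 1000 docs
            await sendDataAsync(batch);            // POST the group to the other store
            totalDocsFetched += batch.length;      // each await unwinds the stack
        }
    }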


Only one message is received from SQS (Node.js AWS SDK)

I created an SQS queue with default settings. I published two messages to it, and I would like to read them back at the same time. I tried it like this:
const sqsClient = new SQSClient({ region: REGION });

const params = {
    AttributeNames: ["SentTimestamp"],
    MaxNumberOfMessages: 5,
    MessageAttributeNames: ["All"],
    QueueUrl: queueURL,
    WaitTimeSeconds: 5,
};

const data = await sqsClient.send(new ReceiveMessageCommand(params));
const messages = data.Messages ?? [];
console.log(messages.length);
Unfortunately only one message is returned, no matter what I provide in MaxNumberOfMessages. What can cause this? How is it possible to fix this issue?
I was able to find a similar question, but it has only one answer, referring to a third-party library.
A ReceiveMessageCommand does not guarantee that you will get exactly the number of messages specified for MaxNumberOfMessages. In fact the documentation says the following:
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. Thus, only the messages on the sampled machines are returned. If the number of messages in the queue is small (fewer than 1,000), you most likely get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response. If this happens, repeat the request.
You must use long polling to receive multiple messages; this essentially means setting WaitTimeSeconds to a value greater than zero (5 seconds should be enough).
And you must have a larger number of messages in the queue to be able to fetch multiple messages with one call.
To summarize:
SQS is a distributed system; each call samples only a subset of its machines.
Messages are distributed across those machines, so if you have a small number of messages, it can happen that you fetch only one message, or none.
Test your code with a larger set of sent messages and put your receiving call in a loop (a sketch of such a loop follows).
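A sketch of that receive loop, reusing the question's SQSClient, REGION, and queueURL (the drainQueue name is just for illustration). It long-polls until a call comes back empty, and deletes what it consumed so messages are not redelivered after the visibility timeout:

    import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } from "@aws-sdk/client-sqs";

    const sqsClient = new SQSClient({ region: REGION });

    async function drainQueue(queueURL) {
        const received = [];
        while (true) {
            const data = await sqsClient.send(new ReceiveMessageCommand({
                QueueUrl: queueURL,
                MaxNumberOfMessages: 10, // an upper bound per call, never a guarantee
                WaitTimeSeconds: 5,      // > 0 enables long polling
            }));
            const messages = data.Messages ?? [];
            if (messages.length === 0) break; // nothing on the sampled machines
            received.push(...messages);
            for (const msg of messages) {
                // delete each consumed message so it is not delivered again
                await sqsClient.send(new DeleteMessageCommand({
                    QueueUrl: queueURL,
                    ReceiptHandle: msg.ReceiptHandle,
                }));
            }
        }
        return received;
    }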

Express retriggering API endpoint on request timeouts

Context:
My Express.js web server is currently serving an API which wraps a SOAP service (a legacy service which I can't change). The SOAP service takes a dynamic number of items to process and needs about 1.5 seconds per item. The Nginx server has a timeout of 60 seconds.
Problem:
For a request to this API which, let's say, takes more than 60 seconds to complete, I am observing that the service is getting re-triggered automatically (I am assuming by Express.js). So if the original request was expected to insert, say, 50 records into a table, the re-triggering of the API leaves me with 100 records inserted (duplication).
Here is a skeleton/sample of log that kind of shows the issue: (sensitive info stripped)
January 10, 2022 15:35:44 [... ee905] - Starting myAwesomeAPI() <-- Original API trigger
January 10, 2022 15:36:44 [... ff870] - Starting myAwesomeAPI() <-- Re-trigger happens
January 10, 2022 15:36:54 [... ee905] - Completed myAwesomeAPI() <-- Original API ends (inserts 50 records in the table)
January 10, 2022 15:37:54 [... ff870] - Completed myAwesomeAPI() <-- Re-triggered API ends (inserting another 50 records in the table resulting in duplication)
What I have tried:
To reproduce the issue and check whether the re-triggering can happen independently of Nginx, I kept the Nginx timeout at 60 seconds, changed my Express server's timeout to 10 seconds, and used 15 items to process (to force a timeout before processing could complete):
const express = require("express")
const app = express()
// setTimeout lives on the Node http.Server returned by listen(),
// not on the Express app itself; the port number here is arbitrary
const server = app.listen(3000)
server.setTimeout(10000) // sets all requests to have a 10-second timeout
// myAwesomeAPI code
Testing showed that after 10 seconds, the timeout "did" re-trigger the API and the 15 items were duplicated (I saw 30 records inserted). So this tells me that the API is getting re-triggered by Express.js.
Question(s):
How do I stop the re-trigger from happening? Is there an Express server configuration to enable/disable the automatic re-triggering on timeout?
Solutions & Ideas:
Since the max items = 100 (set by the team), increasing the Nginx and Express.js timeouts to 300 seconds would be a quick but dirty fix. I understand that tying async API calls to some approximation of time is pure foolishness (tell me about trying to explain this to the other engineers on my team ;-p), so I would like to avoid this approach.
Create a composite key from some combination of columns and enforce insert restrictions on the table. Combine this with checking whether the composite key is already present in the table to decide whether to skip or insert. This approach seems a bit better.
Another approach is to respond to the API call immediately on receipt (which closes the request) and then continue with the request processing; see the sketch after this list. Something like this (inspiration): https://www.bennadel.com/blog/3275-you-can-continue-to-process-an-express-js-request-after-the-client-response-has-been-sent.htm.
This would make me independent of the platform's timeout settings, but it takes away the real-time nature of the response (delivering statuses for the different items) and adds a bit more complexity for tracking request statuses via other lookups, etc.
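A rough sketch of that last idea; processItems and saveStatus are hypothetical stand-ins for the SOAP processing and a status store, and the route assumes JSON body parsing is enabled:

    app.post("/myAwesomeAPI", (req, res) => {
        const jobId = Date.now().toString(36);  // hypothetical job identifier
        res.status(202).json({ jobId });        // acknowledge before the slow SOAP work
        // the handler keeps running after the response has been sent
        processItems(req.body.items)
            .then((result) => saveStatus(jobId, { done: true, result }))
            .catch((err) => saveStatus(jobId, { done: true, error: err.message }));
    });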
If you have the ability to alter the front end, you can add a transaction ID to each request. Store the transaction's promise in an object keyed by the transaction ID; then, if you get an API request for an ongoing transaction, you can return the ongoing transaction's result instead of re-running it.
Something like this:
let transactions = {};

router.get('/myapi', async (req, res, next) => {
    try {
        // for a GET route the ID arrives in the query string, i.e. req.query
        let { transactionID } = req.query;
        delete req.query.transactionID;
        let transaction = transactions[transactionID];
        if (!transaction) {
            transaction = (async () => {
                let ret = await SOAPCall(req.query);
                // hold onto the transaction result for some period of time
                let to = setTimeout(() => {
                    delete transactions[transactionID];
                }, 5000);
                to.unref(); // don't hold up process exit (Node timers use unref, not detach)
                return ret;
            })();
            transactions[transactionID] = transaction;
        }
        let ret = await transaction;
        res.json(ret);
    }
    catch (err) { next(err) }
});
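With the ID read from the query string as above, a client retry of the same request, e.g. GET /myapi?transactionID=abc123 (the ID value is just an example), lands on the stored in-flight promise within the 5-second window and awaits it instead of firing a second SOAP call.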

How do I populate an array of Twilio fax results and send it via Express?

I am trying to get results from the Twilio API like so: Twilio => our secured API backend => our client app. We are doing this to protect our API keys and for other security purposes.
We have sending faxes working, and also checking for single fax instances. I am, however, having a hard time returning the list of faxes to our client app after the call completes, mainly because it's a repeating call. This is what we have so far for this problem:
app.post('/fax', function (req, res) {
    const faxList = [];
    const getFax = client.fax.faxes.each((fax) => {
        faxList.push(fax);
        console.log(faxList);
    });
});
Right now when I run this I see the array populated one by one, just like it should be, but I can't seem to return the final result after it completes.
From my searches online it looks like I need to utilize Promise.all to send my completed res.status(200).json(faxList); so Express can send the list of faxes to our app. I'm having issues setting up the Promise.all, as the faxList variable is just empty. It's almost as if the pushes to the array don't persist once the call completes.
Does this have something to do with the way Twilio has their fax API function set up (https://github.com/twilio/twilio-node/blob/master/lib/rest/fax/v1/fax.js), or is this me not understanding how Promise.all functions?
I'm newer to the Node side of JavaScript. I have more experience with other languages, so I apologize in advance.
I would try to get the whole list, if you have less than a page's worth of faxes (I think a page in Twilio is 50).
Like so:
// list() already returns a promise, so the extra new Promise wrapper isn't needed
// (and `empty` isn't a built-in; the original also never resolved on an empty list)
return client.fax.faxes.list().then(function (faxes) {
    if (faxes.length > 0) {
        return faxes;
    }
});
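Wired into the question's Express route, a minimal sketch, assuming client is an initialized Twilio REST client:

    app.post('/fax', function (req, res) {
        // list() resolves with the accumulated fax records once fetching completes
        client.fax.faxes.list({ limit: 50 })
            .then(function (faxList) {
                res.status(200).json(faxList);
            })
            .catch(function (err) {
                res.status(500).json({ error: err.message });
            });
    });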

Using publications to create a progress bar for subscriptions

I am calling a Google Analytics API multiple times and loading that data into a subscription. Now I want to create a progress bar to inform the user that data is being loaded and give a view of how long it is going to take.
I read that it's best to use publications to pass data from server to client. Is this true?
I created the following publication on the server. What it does is the following:
set the initial progressValue and the initial publication with id 1
keep looping while the progressValue is less than 100 and tell the client that the publication with id 1 is changing.
Below this code I have another publication running where progressValue is being set in steps in a loop.
When looking at the client, only the last progressValue gets posted. Before this I receive a lot of empty arrays, like:
[]
[]
[]
[]
...
Progress publication
What I want is for the client to receive every change in progressValue instead of only the last one. How can I solve this?
If there are any better suggestions on how to create a subscription progress bar, those answers will also be accepted.
if (Meteor.isServer) {
    let progressValue = 0;
    Meteor.publish('progress', function() {
        const self = this;
        let lastProgressValue = 0;
        const id = 1;
        self.added('progress', id, {
            progress: progressValue,
            total: 100,
        });
        while (progressValue < 100) {
            self.changed('progress', id, {
                progress: progressValue,
                total: 100,
            });
        }
        self.ready();
    });
...
Hmm... so, a couple of things here.
I read that it's best to use publications to pass data from server to client. Is this true?
This is the whole point of Meteor: it uses DDP, which means data is sent to the client automagically from the server. So the bulk of the work to manipulate data is actually handled client side, using minimongo.
Have a look at this article for a good discussion of the 'automagic' part:
http://richsilv.github.io/meteor/meteor-low-level-publications/
How do you do progress?
You don't want to try to handle the incrementing on the server side. Instead, you want to get a simple count from the server, perhaps using a reactive aggregate (see my answer here: How to reactively aggregate mongodb in meteor), and send that to the client. So the server does a count as a publication and tells the client '57 coming'.
Then, as your normal data publication, you send the 57 records to the client. ON THE CLIENT, you now basically do the same count as you did on the server, but as only some of the 57 records have been received by the client so far, you effectively get a progress counter by dividing the client count by the server's total to be sent.
Summary
On the SERVER: 2 publications, 1 reactive aggregate for the count of the records to be sent and 1 for the normal data being sent.
On the CLIENT: a function to count the records in the local minimongo collection - collection.find({}).count() - will do the trick. This will increment as each record is received from the server.
Progress is then simply the count on the client divided by the total the server said it would send; a client-side sketch follows.
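A client-side sketch of that division; the Records and Counts collection names and the 'progress' Session key are illustrative assumptions, not from the question:

    // Client-side: recompute whenever minimongo receives another record.
    // Assumes `Records` is the data collection and `Counts` holds the
    // server-published total under the _id 'recordsTotal'.
    Tracker.autorun(function () {
        const receivedCount = Records.find({}).count();  // records synced so far
        const totalDoc = Counts.findOne('recordsTotal'); // server-published total
        if (totalDoc && totalDoc.count > 0) {
            Session.set('progress', Math.round((receivedCount / totalDoc.count) * 100));
        }
    });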

HTTP calls in a loop in a Meteor app stop after a few calls

I'm trying to automate sending emails to a queue of email IDs stored in a collection called Emails. I am trying Meteor for the first time, so please pardon my lack of understanding if I have any.
I am using the following code (on isServer):
Meteor.methods({
    'sendEmails': function () {
        this.unblock();
        Emails.find({status: "no"}).forEach(function (obj) {
            var result = Meteor.http.call("GET", "http://someapidomain/email.php?email=" + obj.email);
            console.log(result.content);
        });
    }
});
This code is called at Meteor.startup.
When this app is run, the API is called and I get results for 13 emails, sometimes for 5 emails, sometimes 2 emails, and then nothing happens. Please help.
Let me know if more detail is required.
I would suggest doing
Emails.find({status: "no"}).fetch().forEach(...)
Note the fetch() in the chain, which ensures that all Mongo communication has completed prior to doing the HTTP calls. find() alone returns a reactive cursor, which means you're mixing Mongo activity and HTTP activity, which might not be playing nicely - just a theory.
Reference: http://docs.meteor.com/#/basic/Mongo-Collection-find
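Applied to the method from the question, the suggested change is the single fetch() call; everything else stays as the asker had it:

    Meteor.methods({
        'sendEmails': function () {
            this.unblock();
            // fetch() materializes the cursor into a plain array before iterating,
            // so cursor iteration and the HTTP calls no longer interleave
            Emails.find({status: "no"}).fetch().forEach(function (obj) {
                var result = Meteor.http.call("GET", "http://someapidomain/email.php?email=" + obj.email);
                console.log(result.content);
            });
        }
    });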
