How to improve the performance of my app? Heavy calculations in Node.js

I am using a trading bot I created with Node.js (Express), Postgres and React. React is only for the UI and all the work is done with Node. Postgres is used to store trade information in the DB.
The application only runs on my localhost.
The workflow is like this:
1. Fetch the list of every crypto there is on the trading platform (1 external API request)
2. Loop over each fetched crypto and perform some "heavy" calculations on it (150+ external API requests and lots of calls to async helper functions)
3. If a crypto meets the required conditions, buy it (2 external API requests per bought crypto), then insert the trade data in the DB (1 internal API request)
4. When all cryptos have been looped through, repeat
I tried to simplify it as much as I could because it is a little more complex than that, but this is the basic workflow.
This works fine but is really slow. The loop over each crypto (step 2) takes about a minute to complete (around 2.5 seconds per crypto).
Here is some code to give you an idea of the structure (again, simplified as much as possible, because it uses a lot of other helper functions and is a few thousand lines of code in total):
// Controller:
exports.strategyBacktesting = async (req, res, next) => {
  // ... some code ...
  const markets = await getMarkets();
  const calculationsResults = await calculations(markets);
  // ... more code ...
  res.status(200).json({
    // Return calculationsResults and other stuff for the UI
  });
};
// ----- Files in the utils folder

// Called from the controller
exports.getMarkets = async () => {
  // Get all the listed cryptos via an external API request
  // Loop through all the fetched data to filter the cryptos I want to work with
  // Return a Promise (an array containing all the cryptos I will do the calculations on)
};
// Called from the controller
exports.calculations = async (markets) => {
  // Loop through all the results from getMarkets() (the markets param, 150+ items) = all the cryptos to work with
  for (const cryptoPair of markets) {
    // In the loop:
    // Call the getPairCandles() function (below) to get all the data needed for this crypto
    // (the cryptoPair param is the object from the current iteration)
    const { /* data needed for the calculation */ } = await getPairCandles(cryptoPair);

    // Perform all the needed calculations (one iteration = one crypto) => this is where it takes a long time (around 2.5 seconds per crypto)
    // Calls lots of async helper functions and does the calculations

    // If the required conditions are met:
    // - 1 external async API call to buy the crypto
    // - 1 external async API call to set a sell limit
    // - 1 internal async API call to save the trade info in the DB
  }
  // When the loop is done on the 150+ cryptos, return a Promise for the controller to return to the UI
};
// Called from the calculations() function above
const getPairCandles = async (cryptoPair) => {
  // Make an external API request to get specific real-time data about one crypto (the cryptoPair param of the function)
  // Loop through some data in the object received to filter the wanted properties
  // Return a Promise containing 5 arrays (the data I need to calculate on this crypto)
};
I read about workers and such in Node.js that could help with heavy calculations. That said, how could I implement them? Are they even the right solution in my case? The thing is, the loop that performs the calculations is the same one that makes lots of calls to async functions returning Promises and to async APIs. This is the loop that takes around a minute to complete.
So I think I can greatly improve the performance, but I don't know how. What can I do about it? Could someone please point me in the right direction?
Thank you very much

Related

Calculating sum of multiple async callbacks

I am still learning JavaScript properly and have read a few threads on here about asynchronicity (1 2).
Unfortunately, I still have trouble finding a solution to my problem. I am calling an API asynchronously and would like to compute a sum from the responses I receive from multiple async calls.
Let's say I want my code to perform an action if my "sum" has a value of 40. Async call "A" returns 6, async call "B" returns 8, and so on until we finally reach the value of 40.
How can I create such a sum, as my calls are all async?
Would it require writing each async result to a database and pulling the value up in the next async call?
Is there a better solution for this?
Thank you for your help.
EDIT: To make things easier to understand I will add some source code:
Webhook.js
router.post('/', (req, res, next) => {
  if (middleware.active)
    middleware.handle(req.body); // <--- this gives me one result
  res.sendStatus(200);
});
Basically, I will receive multiple webhook calls. "middleware.handle" will send an API call to a third-party application. I want to take the result from that API call and add it to the result of another API call from another webhook request.
As you can see, I don't know when my webhook will be triggered, nor how many times it will be triggered before reaching my desired total of 40.
You can use Promise.all([apiCallOne, apiCallTwo]).then(values => sum(values)), with sum being a function that sums an array of numbers.
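A minimal sketch, assuming apiCallOne and apiCallTwo are Promises that each resolve to a number:

const sum = (values) => values.reduce((total, n) => total + n, 0);

Promise.all([apiCallOne, apiCallTwo]).then((values) => {
  if (sum(values) >= 40) {
    // perform the action once the combined total reaches 40
  }
});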
You can await both in a do/while loop and then do the rest of the logic:
async function myFunc() {
  let valA = 0;
  let valB = 0;
  do {
    // fetch() resolves with a Response, so parse the body before adding it to the running total
    valA += await fetch('https://api.url.com/a').then((res) => res.json());
    valB += await fetch('https://api.url.com/b').then((res) => res.json());
  } while ((valA + valB) < 40);
  // rest of the logic here
}
You can also run each one in its own do/while loop if you want to request a value from only one at a time and then check the condition.

Seat booking logic for meteor react application

I am working on a Meteor booking application where I have a limited number of seats. My application can be used by many users in parallel, so there is a possibility that two users try to book the same seat. But as per the business logic, seats can never be overbooked. In Java, I could have restricted parallel booking using a synchronized block. I don't have much experience with Meteor/React, so I'm not sure what the right way to achieve this would be.
My current thought is to use a reactive boolean to create a lock, so if the application gets two booking requests it processes them synchronously and fails the second one, since the seat will already be allocated by the first request. But I am afraid I will get into a deadlock, so I am seeking your opinion/help to implement this in a proper way.
Thanks for your advice!
I'm assuming here your backend is Node.js; seeing as you're using Meteor, you're already using npm, so a backend using Node makes sense.
In this case, let's say you're using Express or Koa to handle your requests: you can simply chain your tasks using promises, which will force the tasks to execute linearly.
Below is a simple working example. If you run the snippet you will notice I'm adding tasks every 700ms, but each task takes 1000ms to complete; as you can see, there is no overlapping and the tasks get done in order.
const delay = (ms) => new Promise((r) => setTimeout(r, ms));

let lastTask = Promise.resolve();

async function addTask(txt) {
  const ptask = lastTask;
  lastTask = (async () => {
    await ptask;
    console.log(`starting task ${txt}`);
    await delay(1000);
    console.log(`done task ${txt}`);
  })();
}

async function test() {
  for (let l = 0; l < 5; l += 1) {
    setTimeout(() => {
      console.log(`adding task ${l}`);
      addTask(l);
    }, l * 700);
  }
}

test();
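In your case, here is a hedged sketch of how the same chaining idea could wrap the actual booking work; the enqueue() helper, the Express route and bookSeat() are hypothetical names, not part of your app:

// Same queue as above, but callers pass in the work to run as a function
let lastBooking = Promise.resolve();

function enqueue(work) {
  const previous = lastBooking;
  lastBooking = (async () => {
    await previous.catch(() => {}); // wait for every earlier booking attempt, even failed ones
    return work();                  // then run this one
  })();
  return lastBooking;
}

// Booking attempts are processed strictly in arrival order, so a second request
// for the same seat sees it already taken and can be rejected.
app.post('/book/:seatId', async (req, res) => {
  try {
    const booked = await enqueue(() => bookSeat(req.params.seatId, req.body.userId));
    res.status(booked ? 200 : 409).json({ booked });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});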
If you are using pub/sub in Meteor, the job is done. Your bookings are reactive on a first come, first served basis. As long as your connection is on, when you write your first booking, the seat is taken.
E.g. (rough logic):
1. Publish your bookings within the desired scope.
2. Subscribe on the client within the same scope.
3. If the document's bookedOn field $exists (date of booking), make it unbookable/unclickable and have the UX show the necessary colors/experience.
When one books it, all users online on the platform and viewing that component will get the update.
It would be a bit of an 'issue' if you didn't use pub/sub, but come on ... you're on Meteor, you should use Meteor's native reactivity. Your boolean is Boolean(bookedOn), or just bookedOn.
I think the Meteor way to do it is to call a Meteor method with which the user occupies a seat.
In this method, check if the seat is already occupied.
More info here:
https://forums.meteor.com/t/if-multiple-users-are-trying-to-access-one-method-of-meteor-methods-how-to-make-method-as-synchronous-to-use-one-user-only-at-a-time/24969/8
But methods from different clients run concurrently on the server.
You'll have to use something like a semaphore. The easiest way would be to write a lock in Mongo and check that a lock does not already exist for this seat. Later, the lock can be destroyed by Mongo with a TTL index: https://docs.mongodb.com/manual/tutorial/expire-data/
You can read more about methods here https://guide.meteor.com/methods.html
To wrap it up, the pseudocode would be something like this:
acquireLock(userId, seatId); // read the lock; if it's free, write it, then read again just in case. If anything fails it should throw an error
takeSeat(userId, seatId);
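A rough sketch of what that could look like on a classic (pre-async-API) Meteor server, assuming a hypothetical SeatLocks collection and your existing Seats collection; the unique index makes the second concurrent insert fail, and the TTL index lets Mongo clean up stale locks:

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const SeatLocks = new Mongo.Collection('seatLocks');

Meteor.startup(() => {
  // one lock per seat, auto-expired after 60 seconds
  SeatLocks.rawCollection().createIndex({ seatId: 1 }, { unique: true });
  SeatLocks.rawCollection().createIndex({ createdAt: 1 }, { expireAfterSeconds: 60 });
});

Meteor.methods({
  takeSeat(seatId) {
    // acquireLock: the unique index throws for the second concurrent caller
    SeatLocks.insert({ seatId, userId: this.userId, createdAt: new Date() });
    // takeSeat: only mark the seat if it is still free
    const updated = Seats.update(
      { _id: seatId, bookedOn: { $exists: false } },
      { $set: { bookedOn: new Date(), bookedBy: this.userId } }
    );
    if (!updated) {
      throw new Meteor.Error('seat-taken', 'This seat is already booked');
    }
  },
});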

Duplicate Array Data Web Scraping

I can't seem to get the article duplicates out of my web scraper results; this is my code:
app.get("/scrape", function (req, res) {
request("https://www.nytimes.com/", function (error, response, html) {
// Load the HTML into cheerio and save it to a variable
// '$' becomes a shorthand for cheerio's selector commands, much like jQuery's '$'
var $ = cheerio.load(html);
var uniqueResults = [];
// With cheerio, find each p-tag with the "title" class
// (i: iterator. element: the current element)
$("div.collection").each(function (i, element) {
// An empty array to save the data that we'll scrape
var results = [];
// store scraped data in appropriate variables
results.link = $(element).find("a").attr("href");
results.title = $(element).find("a").text();
results.summary = $(element).find("p.summary").text().trim();
// Log the results once you've looped through each of the elements found with cheerio
db.Article.create(results)
.then(function (dbArticle) {
res.json(dbArticle);
}).catch(function (err) {
return res.json(err);
});
});
res.send("You scraped the data successfully.");
});
});
// Route for getting all Articles from the db
app.get("/articles", function (req, res) {
// Grab every document in the Articles collection
db.Article.find()
.then(function (dbArticle) {
res.json(dbArticle);
})
.catch(function (err) {
res.json(err);
});
});
Right now I am getting five copies of each article sent to the user. I have tried db.Article.distinct and various versions of this to filter the results down to only unique articles. Any tips?
In Short:
Switching var results = [] from an Array to an Object (var results = {}) did the trick for me. I still haven't figured out the exact reason for the duplicate insertion of documents in the database; I will update as soon as I find out.
Long Story:
You have multiple mistakes and points of improvement in your code. I will try pointing them out.
Let's address the mistakes first to make your code error-free.
Mistakes
1. Although mongoose's Model.create() / new Model() does seem to work when passed an Array, I haven't seen such a use before and it does not look appropriate.
If you intend to create documents one after another then represent your documents using an object instead of an Array. Using an array is more mainstream when you intend to create multiple documents at once.
So switch -
var results = [];
to
var results = {};
2. Sending response headers after they have already been sent will give you an error. I don't know if you have already noticed it, but it is pretty clear up front: once that error pops up, the remaining documents won't get stored because of the promise rejection, if you haven't set up a try/catch block.
The db.Article.create() calls inside $("div.collection").each(function (i, element) run asynchronously, so control won't wait for each document to be processed; instead it immediately executes res.send("You scraped the data successfully.");.
This effectively terminates the HTTP connection between the client and the server, and any further response-terminating calls like res.json(dbArticle) or res.json(err) will throw an error.
So, just comment out the res.json statements inside .create()'s then and catch handlers. This will terminate the response before all the articles are saved in the DB, but you need not worry, as your code will still work behind the scenes, saving the articles in the database for you (asynchronously).
If you want your response to be terminated only after you have successfully saved the data, then change your handler to something like this:
request('https://www.nytimes.com', (err, response, html) => {
  var $ = cheerio.load(html);
  var results = [];
  $("div.collection").each(function (i, element) {
    var ob = {};
    ob.link = $(element).find("a").attr("href");
    ob.title = $(element).find("a").text();
    ob.summary = $(element).find("p.summary").text().trim();
    results.push(ob);
  });
  db.Article.create(results)
    .then(function (dbArticles) {
      res.json(dbArticles);
    }).catch(function (err) {
      return res.json(err);
    });
});
After making the above changes (and even after just the first one), my version of your code ran fine. So if you want, you can continue with your current version, or you may read on for some points of improvement.
Points of Improvement
1. The era of callbacks is long gone:
Convert your implementation to use Promises, as they are more maintainable and easier to reason about. Here is what you can do:
Switch your request library from request to axios or any other library which supports Promises by default.
2. Make effective use of mongoose methods for insertion. You can perform a bulk insert of multiple documents in just one query, as sketched below. You may find the MongoDB docs on creating documents quite helpful.
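For example, a minimal sketch assuming the results array of scraped objects built in the corrected snippet above; insertMany sends the whole batch to the database in a single query and could replace the db.Article.create(results) call:

// One bulk insert instead of one create() per document.
// ordered: false lets the remaining documents insert even if one of them fails validation.
db.Article.insertMany(results, { ordered: false })
  .then((dbArticles) => res.json(dbArticles))
  .catch((err) => res.json(err));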
3. Consider a headless browser automation library such as Puppeteer or Nightmare.js for data scraping tasks. Trust me, they make life a whole lot easier than using cheerio or any other library for the same purpose. Their docs are really good and well maintained, so you won't have a hard time picking them up.

Run 1000 requests so that only 10 runs at a time

With node.js I want to http.get a number of remote urls in a way that only 10 (or n) run at a time.
I also want to retry a request if an exception occurs locally (m times), but when the status code returns an error (5XX, 4XX, etc.) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests finished I want to get the list of all request urls and response status codes in a list which I want to sort/group/manipulate, so I need to wait for all requests to finish.
It seems like promises are recommended for every async problem, but I end up nesting too many promises and it quickly becomes undecipherable.
There are lots of ways to approach running 10 requests at a time.
Async library - Use the async library with the .parallelLimit() method, where you can specify the number of requests you want running at one time.
Bluebird promise library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise, and then use Promise.map() with the concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then, each time one completes, start another one (see the sketch after this list).
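A hedged sketch of the manually coded approach: keep at most 10 requests in flight and start the next one whenever one finishes. fetchOne(url, cb) is a hypothetical helper that wraps https.get() and calls cb(err, statusCode) once the response has been consumed; retries are left out here.

function runLimited(urls, limit, done) {
  const results = new Array(urls.length);
  let next = 0;
  let active = 0;
  let finished = 0;

  if (urls.length === 0) return done(results);

  function startNext() {
    // start new requests until the concurrency limit is reached or the list is exhausted
    while (active < limit && next < urls.length) {
      const index = next++;
      active++;
      fetchOne(urls[index], (err, statusCode) => {
        results[index] = { url: urls[index], err, statusCode };
        active--;
        finished++;
        if (finished === urls.length) return done(results);
        startNext();
      });
    }
  }

  startNext();
}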
In all cases, you will have to manually write some retry code, and as with all retry code, you will have to decide very carefully which types of errors you retry, how soon you retry them, how much you back off between retry attempts, and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);

let remoteUrls = [...]; // large array of URLs
const maxRetryCnt = 3;
const retryDelay = 500;

Promise.map(remoteUrls, function(url) {
  let retryCnt = 0;
  function run() {
    return get(url).then(function(result) {
      // do whatever you want with the result here
      return result;
    }).catch(function(err) {
      // decide what your retry strategy is here
      // catch all errors here so other URLs continue to execute
      // isRetryableError() stands in for your own "is this error worth retrying" check
      if (isRetryableError(err) && retryCnt < maxRetryCnt) {
        ++retryCnt;
        // try again after a short delay
        // chain onto previous promise so Promise.map() is still
        // respecting our concurrency value
        return Promise.delay(retryDelay).then(run);
      }
      // make value be null if no retries succeeded
      return null;
    });
  }
  return run();
}, {concurrency: 10}).then(function(allResults) {
  // everything done here and allResults contains results with null for err URLs
});
The simple way is to use the async library; it has a .parallelLimit() method that does exactly what you need.
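A minimal sketch of that, assuming urls is your array of remote URLs (retries left out): build one task per URL, run at most 10 at a time, and collect { url, statusCode } results in the original order.

const async = require('async');
const https = require('https');

const tasks = urls.map((url) => (callback) => {
  https.get(url, (res) => {
    res.resume(); // drain the body so the socket is released
    callback(null, { url, statusCode: res.statusCode });
  }).on('error', (err) => callback(null, { url, err })); // report the error instead of aborting the whole batch
});

async.parallelLimit(tasks, 10, (err, results) => {
  // results holds one entry per URL; sort/group/manipulate them here
});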

NodeJS http and extremely large response bodies

At the moment, I'm trying to request a very large JSON object from an API (particularly this one) which, depending on various factors, can be upwards of a few MB. The problem, however, is that NodeJS takes forever to do anything and then just runs out of memory: the first line of my response callback never executes.
I could request each item individually, but that is a tremendous amount of requests. To quote a dev behind the new API:
Until now, if you wanted to get all the market orders for Tranquility you had to request every type per region individually. That would generally be 50+ regions multiplied by upwards of 13,000 types. Even if it was just 13,000 types and 50 regions, that is 650,000 requests required to get all the market information. And if you wanted to get all the data in the 5-minute cache window, it would require almost 2,200 requests per second.
Obviously, that is not a great idea.
I'm trying to get the array items into redis for use later, then follow the next url and repeat until the last page is reached. Is there any way to do this?
EDIT:
Here's the problem code. Visiting the URL works fine in-browser.
// ...
REGIONS.forEach((region) => {
  LOG.info(' * Grabbing data for `' + region.name + '#' + region.id + '`');
  var href = url + region.id + '/orders/all/', next = href;
  var page = 1;
  while (!!next) {
    https.get(next, (res) => {
      LOG.info(' * * Page ' + page++ + ' responded with ' + res.statusCode);
      // ...
The first LOG.info line executes, while the second does not.
It appears that you are doing a while(!!next) loop which is the cause of your problem. If you show more of the server code, we could advise more precisely and even suggest a better way to code it.
JavaScript runs your code single-threaded. That means one thread of execution runs to completion before any other events can be processed.
So, if you do:
while (!!next) {
  https.get(..., (res) => {
    // hoping this will run
  });
}
Then, your callback to https.get() will never get called. Your while loop just keeps running forever. As long as it is running, the callback from https.get() can never get called. That request is likely long since completed and there's an event sitting in the internal JS event queue to call the callback, but until your while() loop finishes, that event can't be processed. So you have a deadlock: the while() loop is waiting for something else to run to change its condition, but nothing else can run until the while() loop is done.
There are several other ways to do serial async iterations. In general, you can't use .forEach() or while().
Here are several schemes for async looping:
Node.js: How do you handle callbacks in a loop?
While loop with jQuery async AJAX calls
How to synchronize a sequence of promises?
How to use after and each in conjunction to create a synchronous loop in underscore js
Or, the async library which you mentioned also has functions for doing async looping.
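A rough sketch of the idea with async/await, assuming a hypothetical httpsGetJson() wrapper that promisifies https.get() and resolves with the parsed body plus the next-page URL:

async function grabRegion(region) {
  let next = url + region.id + '/orders/all/';
  let page = 1;
  while (next) {
    // the await yields back to the event loop, so the response events can actually fire
    const { body, nextUrl } = await httpsGetJson(next);
    LOG.info(' * * Page ' + page++ + ' returned ' + body.length + ' orders');
    // ...store the orders in redis here...
    next = nextUrl; // falsy once the last page is reached
  }
}

(async () => {
  // regions processed one after another; swap in a concurrency-limited map if that is too slow
  for (const region of REGIONS) {
    await grabRegion(region);
  }
})();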
First of all, a few MB of JSON payload is not exactly huge, so the route handler code might require some close scrutiny.
However, to actually deal with huge amounts of JSON, you can consume your request as a stream. JSONStream (along with many other similar libraries) allows you to do this in a memory-efficient way. You can specify the paths you need to process using JSONPath (an XPath analog for JSON) and then subscribe to the stream for matching data sets.
The following example from the JSONStream README illustrates this succinctly:
var request = require('request')
  , JSONStream = require('JSONStream')
  , es = require('event-stream')

request({url: 'http://isaacs.couchone.com/registry/_all_docs'})
  .pipe(JSONStream.parse('rows.*'))
  .pipe(es.mapSync(function (data) {
    console.error(data)
    return data
  }))
Use the stream functionality of the request module to process large amounts of incoming data. As data comes through the stream, parse it to a chunk of data that can be worked with, push that data through the pipe, and pull in the next chunk of data.
You might create a transform stream to manipulate a chunk of data that has been parsed and a write stream to store the chunk of data.
For example:
var stream = request({ url: your_url })
  .pipe(parseStream)
  .pipe(transformStream)
  .pipe(writeStream);

stream.on('finish', () => {
  setImmediate(() => process.exit(0));
});
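A hedged sketch of what parseStream and transformStream might look like, using JSONStream for incremental parsing and a Transform stream for per-item manipulation (the 'items.*' path and the field names are hypothetical and depend on the API's JSON layout):

const JSONStream = require('JSONStream');
const { Transform } = require('stream');

const parseStream = JSONStream.parse('items.*'); // emits one market order at a time

const transformStream = new Transform({
  objectMode: true,
  transform(order, encoding, callback) {
    // keep only the fields you need before handing the chunk to the write stream
    callback(null, { type: order.type, price: order.price });
  },
});

The writeStream would likewise be an object-mode Writable, for example one that pushes each chunk into redis.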
For info on creating streams, try https://bl.ocks.org/joyrexus/10026630
