Nodejs synchronous loops block execution - javascript

When I try to run a function in the background, it blocks every other request until it is done.
For example, if I execute that function and then make a GET request to a route that returns some information from the database, the response only comes after the function has finished executing, and I don't understand why.
This is the basic structure of my function that runs in the background (it finds the third-party requests from a page and then looks for the initiator request of each of them):
const thirdPartyReq = [];
let allRequests = [];
const findInitiatorReq = async () => {
allRequests = allRequests.reverse();
for(const [_, request] of thirdPartyReq.entries()) {
if(!request["Initiator Request"]) {
const fullRequest = request['Request URL'];
const parseUrl = new URL(fullRequest);
let hostname = parseUrl.hostname || null;
const domain = await extractDomain(hostname);
let pathname = parseUrl.pathname || null;
hostname = hostname.replace(/www./g, '')
let checkUrl;
const domainIndex = hostname.indexOf(domain) - 1;
const subdomain = (hostname.substr(0, domainIndex));
const queryString = parseUrl.search || '';
const noProtocol = hostname + pathname + queryString;
const noQueryString = hostname + pathname;
const requestProcessing = [fullRequest, noProtocol, noQueryString, hostname];
const requestIndex = allRequests.findIndex((el) => {
return (el.url == request['Request URL'] && el.thirdParty);
});
for(const [_, query] of requestProcessing.entries()) {
for(const [index, checkRequest] of allRequests.entries()) {
if(index > requestIndex) {
if(checkRequest.content && checkRequest.content.body) {
const contentBody = checkRequest.content.body;
if(contentBody.includes(query)) {
request['Initiator Request'] = checkRequest.url;
}
}
}
}
}
}
}
}
for(const [pageIndex, page] of results.entries()) {
const pageUrl = page.url;
const requests = page.requests;
const savedRequestUrls = [];
let parseUrl = new URL(pageUrl);
let hostname = parseUrl.hostname;
let requestsCounter = 0;
const pageDomain = await extractDomain(hostname);
if(!urlList.includes(pageUrl)) {
crawledUrls.push(pageUrl);
}
for(const [_, request] of Object.entries(requests)) {
if(request.url.indexOf('data:') == -1) {
parseUrl = new URL(request.url);
hostname = parseUrl.hostname;
let requestDomain = await extractDomain(hostname);
const reqObj = await findThirdPartyReq(pageUrl, request, requestDomain);
if(reqObj != null) {
request.thirdParty = true;
savedRequestUrls.push(reqObj);
}
// Store all requests that have a domain
if(requestDomain) {
request.page = pageUrl;
allRequests.push(request);
requestsCounter++;
}
}
}
findInitiatorReq();
}
I noticed that everything will work well if I remove this part of code:
for(const [_, query] of requestProcessing.entries()) {
for(const [index, checkRequest] of allRequests.entries()) {
if(index > requestIndex) {
if(checkRequest.content && checkRequest.content.body) {
const contentBody = checkRequest.content.body;
if(contentBody.includes(query)) {
request['Initiator Request'] = checkRequest.url;
}
}
}
}
}
This is the route that calls the function:
router.get('/cookies',async (req, res) => {
res.status(200).send(true);
const cookies = await myFunc();
});
Can anyone please tell me why that function is blocking everything until it returns a response and how can I fix this?

The obvious answer here is to convert your function into an asynchronous one. There are already multiple answers here on StackOverflow about that topic.
The gist: use asynchronous functions when you're running a heavy task. Bear in mind that Node.js is single-threaded, so it is somewhat expected that synchronous functions block the execution of other functions.
The tools you need for writing asynchronous functions are async/await (included without libraries/transpiling in the latest Node.js LTS) and Promises. Forget about callbacks, since they are a really bad design.
How to use async / await in js:
https://medium.com/@Abazhenov/using-async-await-in-express-with-node-8-b8af872c0016
https://medium.freecodecamp.org/how-to-write-beautiful-node-js-apis-using-async-await-and-the-firebase-database-befdf3a5ffee
How to use Promises and what they are:
Replacing callbacks with promises in Node.js
understanding javascript promise object

Well, you have a synchronous loop, which blocks execution. It will block eventually anyway, as it has to perform several heavy operations. The response to the client is sent, but you continue to work on other things afterwards, so other requests have to wait.
A probable solution is to spawn another Node process and handle the work there (similar to a Web Worker in the browser).
You can also try the async library: it has an eachSeries method, meant specifically for handling big chunks of data/arrays. See its documentation for further information.
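A lighter-weight alternative to a separate process, offered here only as a sketch, is to split the heavy loop into slices and yield to the event loop between slices with setImmediate, so that pending requests get a chance to run. The helper name processInChunks and the default chunk size are hypothetical and not part of the question's code.

```javascript
// Hypothetical helper: runs `handleItem` over `items` in slices and yields to
// the event loop between slices so that other requests can be handled.
const processInChunks = async (items, handleItem, chunkSize = 100) => {
  for (let start = 0; start < items.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, items.length);
    for (let i = start; i < end; i++) {
      handleItem(items[i], i);
    }
    // Let pending I/O callbacks (e.g. incoming HTTP requests) run.
    await new Promise((resolve) => setImmediate(resolve));
  }
};
```

In the question's code, the inner loop over allRequests would be the natural candidate for this treatment. For heavier work, Node's worker_threads module moves the computation off the main thread entirely.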

Related

Slow firebase function response from background activity

The function shown below puzzles me for two reasons:
the function execution terminates before all output is given
the function execution takes more than 3 minutes; a very long time (so long, that it might not be because of the "cold starts" issue only).
When searching for best practices I found a hint that background activities are slowed down after the function execution has terminated (https://cloud.google.com/functions/docs/bestpractices/tips#do_not_start_background_activities).
How can I create a function that terminates only after all output has been created and that avoids background activity?
Is there any way to speed up the get() processing?
Screenshot of the Firebase Functions dashboard
Screenshot of Firestore showing the document created to trigger the function
Please have a look on the code:
// The Cloud Functions for Firebase SDK to create Cloud Functions .
const functions = require("firebase-functions");
// The Firebase Admin SDK to access Firestore.
const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();
exports.evaluateScore = functions
.region('europe-west1')
.firestore
.document('organizations/{orgId}/persons/{personId}')
.onWrite(async (snap, context) => {
const newDocument = snap.after.exists ? snap.after.data() : null;
const oldDocument = snap.before.exists ? snap.before.data() : null;
console.log(`lastName: '${newDocument.personLastName}'; id: '${snap.after.id}'`);
// if only newDocument exists
if (newDocument != null && oldDocument == null ) {
const arrayNameSplit = snap.after.ref.path.split('/');
var orgId = arrayNameSplit[arrayNameSplit.length -3];
var listOfProfiles = newDocument.listOfProfiles;
console.log(`listOfProfiles: `, JSON.stringify(listOfProfiles));
for (let i = 0; i < listOfProfiles.length; i++) {
db.collection('organizations').doc(orgId).collection('profiles').doc(listOfProfiles[i]).get()
.then(docRef => {
const profile = docRef.data();
console.log(i, ' profileTitle:', JSON.stringify(profile.profileTitle))
}).catch(e => {
console.error('something went wrong', e)
});
}
}
});
You have asynchronous calls in your code, but are not telling the Cloud Functions runtime about that (through the return value). It is very likely that your database get() calls don't even complete at this stage.
To fix that problem, you can use await inside the loop or Promise.all:
exports.evaluateScore = functions
.region('europe-west1')
.firestore
.document('organizations/{orgId}/persons/{personId}')
.onWrite(async (snap, context) => {
const newDocument = snap.after.exists ? snap.after.data() : null;
const oldDocument = snap.before.exists ? snap.before.data() : null;
console.log(`lastName: '${newDocument.personLastName}'; id: '${snap.after.id}'`);
// if only newDocument exists
if (newDocument != null && oldDocument == null ) {
const arrayNameSplit = snap.after.ref.path.split('/');
var orgId = arrayNameSplit[arrayNameSplit.length -3];
var listOfProfiles = newDocument.listOfProfiles;
console.log(`listOfProfiles: `, JSON.stringify(listOfProfiles));
for (let i = 0; i < listOfProfiles.length; i++) {
const docRef = await db.collection('organizations').doc(orgId).collection('profiles').doc(listOfProfiles[i]).get();
const profile = docRef.data();
console.log(i, ' profileTitle:', JSON.stringify(profile.profileTitle))
}
}
});
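The Promise.all variant mentioned above is not shown in the answer. A minimal sketch of it, assuming the same Firestore layout as in the question, could look like the following. The helper name loadProfileTitles is hypothetical, and db is passed in as a parameter so the snippet does not depend on a particular initialization.

```javascript
// Hypothetical helper: fetches all profile documents in parallel and resolves
// once every get() has completed.
// `db` is assumed to be a Firestore instance such as admin.firestore().
const loadProfileTitles = async (db, orgId, listOfProfiles) => {
  const profilesRef = db.collection('organizations').doc(orgId).collection('profiles');
  const snapshots = await Promise.all(
    listOfProfiles.map((profileId) => profilesRef.doc(profileId).get())
  );
  // data() is undefined for a missing document, hence the guard.
  return snapshots.map((snapshot) => {
    const profile = snapshot.data();
    return profile ? profile.profileTitle : null;
  });
};
```

Inside the onWrite handler this would be called as `const titles = await loadProfileTitles(db, orgId, listOfProfiles);`, so the promise returned by the handler only settles after all reads have finished. Because the reads run concurrently, this is typically faster than awaiting each get() in turn.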
There may be more problems with your code, so I recommend reading the documentation on sync, async and promises, and how to create a minimal, complete, verifiable example for future questions.

How to correctly use 'async, await and promises' in nodejs, while allocating values to a variable returned from a time-consuming function?

Problem Statement:
Our aim is to fill the array ytQueryAppJs with the values returned from a time-consuming function, httpsYtGetFunc().
The values in ytQueryAppJs are used many times later in the code, so the array needs to be filled before the code proceeds further.
There are other arrays like ytQueryAppJs, one of them being ytCoverAppJs, that need to be filled in the same way as ytQueryAppJs.
The values in ytCoverAppJs in turn require the values from ytQueryAppJs, so a solution with clean code would be highly appreciated.
(I am an absolute beginner. I have never used async, await or promises and I'm unaware of the correct way to use it. Please guide.)
Flow (to focus on):
The user submits a queryValue in index.html.
An array ytQueryAppJs is logged in console, based on the query.
Expected Log in Console (similar to):
Current Log in Console:
Flow (originally required by the project):
User submits query in index.html.
The values of arrays, ytQueryAppJs, ytCoverAppJs, ytCoverUniqueAppJs, ytLiveAppJs, ytLiveUniqueAppJs gets logged in the console, based on the query.
Code to focus on, from 'app.js':
// https://stackoverflow.com/a/14930567/14597561
function compareAndRemove(removeFromThis, compareToThis) {
return (removeFromThis = removeFromThis.filter(val => !compareToThis.includes(val)));
}
// Declaring variables for the function 'httpsYtGetFunc'
let apiKey = "";
let urlOfYtGetFunc = "";
let resultOfYtGetFunc = "";
let extractedResultOfYtGetFunc = [];
// This function GETs data, parses it, pushes required values in an array.
async function httpsYtGetFunc(queryOfYtGetFunc) {
apiKey = "AI...MI"
urlOfYtGetFunc = "https://www.googleapis.com/youtube/v3/search?key=" + apiKey + "&part=snippet&q=" + queryOfYtGetFunc + "&maxResults=4&order=relevance&type=video";
let promise = new Promise((resolve, reject) => {
// GETting data and storing it in chunks.
https.get(urlOfYtGetFunc, (response) => {
const chunks = []
response.on('data', (d) => {
chunks.push(d)
})
// Parsing the chunks
response.on('end', () => {
resultOfYtGetFunc = JSON.parse((Buffer.concat(chunks).toString()))
// console.log(resultOfYtGetFunc)
// Extracting useful data, and allocating it.
for (i = 0; i < (resultOfYtGetFunc.items).length; i++) {
extractedResultOfYtGetFunc[i] = resultOfYtGetFunc.items[i].id.videoId;
// console.log(extractedResultOfYtGetFunc);
}
resolve(extractedResultOfYtGetFunc);
})
})
})
let result = await promise;
return result;
}
app.post("/", function(req, res) {
// Accessing the queryValue, user submitted in index.html. We're using body-parser package here.
query = req.body.queryValue;
// Fetching top results related to user's query and putting them in the array.
ytQueryAppJs = httpsYtGetFunc(query);
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
});
Complete app.post method from app.js:
(For better understanding of the problem.)
app.post("/", function(req, res) {
// Accessing the queryValue user submitted in index.html.
query = req.body.queryValue;
// Fetcing top results related to user's query and putting them in the array.
ytQueryAppJs = httpsYtGetFunc(query);
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
// Fetching 'cover' songs related to user's query and putting them in the array.
if (query.includes("cover") == true) {
ytCoverAppJs = httpsYtGetFunc(query);
console.log("ytCoverAppJs:");
console.log(ytCoverAppJs);
// Removing redundant values.
ytCoverUniqueAppJs = compareAndRemove(ytCoverAppJs, ytQueryAppJs);
console.log("ytCoverUniqueAppJs:");
console.log(ytCoverUniqueAppJs);
} else {
ytCoverAppJs = httpsYtGetFunc(query + " cover");
console.log("ytCoverAppJs:");
console.log(ytCoverAppJs);
// Removing redundant values.
ytCoverUniqueAppJs = compareAndRemove(ytCoverAppJs, ytQueryAppJs);
console.log("ytCoverUniqueAppJs:");
console.log(ytCoverUniqueAppJs);
}
// Fetching 'live performances' related to user's query and putting them in the array.
if (query.includes("live") == true) {
ytLiveAppJs = httpsYtGetFunc(query);
console.log("ytLiveAppJs:");
console.log(ytLiveAppJs);
// Removing redundant values.
ytLiveUniqueAppJs = compareAndRemove(ytLiveAppJs, ytQueryAppJs.concat(ytCoverUniqueAppJs));
console.log("ytLiveUniqueAppJs:");
console.log(ytLiveUniqueAppJs);
} else {
ytLiveAppJs = httpsYtGetFunc(query + " live");
console.log("ytLiveAppJs:");
console.log(ytLiveAppJs);
// Removing redundant values.
ytLiveUniqueAppJs = compareAndRemove(ytLiveAppJs, ytQueryAppJs.concat(ytCoverUniqueAppJs));
console.log("ytLiveUniqueAppJs:");
console.log(ytLiveUniqueAppJs);
}
// Emptying all the arrays.
ytQueryAppJs.length = 0;
ytCoverAppJs.length = 0;
ytCoverUniqueAppJs.length = 0;
ytLiveAppJs.length = 0;
ytLiveUniqueAppJs.length = 0;
});
Unfortunately you can't use async/await directly on the http module when making requests. You can install and use the axios module instead. In your case it will be something like this:
const axios = require('axios');
// Declaring variables for the function 'httpsYtGetFunc'
let apiKey = "";
let urlOfYtGetFunc = "";
let resultOfYtGetFunc = "";
let extractedResultOfYtGetFunc = [];
// This function GETs data, parses it, pushes required values in an array.
async function httpsYtGetFunc(queryOfYtGetFunc) {
apiKey = "AI...MI"
urlOfYtGetFunc = "https://www.googleapis.com/youtube/v3/search?key=" + apiKey + "&part=snippet&q=" + queryOfYtGetFunc + "&maxResults=4&order=relevance&type=video";
return axios.get(urlOfYtGetFunc).then(response => {
// do your data manipulations here; the parsed body is in response.data
return response.data;
})
.catch(err => {
// decide what happens on error
});
}
Or, with async/await:
const { data } = await axios.get(urlOfYtGetFunc);
// axios resolves with a response object; its data property holds what the API returned
If you still want to catch errors with async/await, you can use try/catch:
try{
const { data } = await axios.get(urlOfYtGetFunc);
}catch(err){
//In case of error do something
}
I have just looked at the code, and I think the issue is how you are handling the async code in the request handler. You are not awaiting the result of the call to httpsYtGetFunc in the handler body, so the handler carries on before the promise has settled, which is why you get Promise { <pending> }.
Another issue is that the array extractedResultOfYtGetFunc is not re-initialised on each call, so it may still hold values from a previous request. The method to add an item to an array is push.
To fix this you need to restructure your code slightly. A possible solution is something like this:
// Declaring variables for the function 'httpsYtGetFunc'
let apiKey = "";
let urlOfYtGetFunc = "";
let resultOfYtGetFunc = "";
let extractedResultOfYtGetFunc = [];
// This function GETs data, parses it, pushes required values in an array.
function httpsYtGetFunc(queryOfYtGetFunc) {
apiKey = "AI...MI";
urlOfYtGetFunc =
"https://www.googleapis.com/youtube/v3/search?key=" +
apiKey +
"&part=snippet&q=" +
queryOfYtGetFunc +
"&maxResults=4&order=relevance&type=video";
return new Promise((resolve, reject) => {
// GETting data and storing it in chunks.
https.get(urlOfYtGetFunc, (response) => {
const chunks = [];
response.on("data", (d) => {
chunks.push(d);
});
// Parsing the chunks
response.on("end", () => {
// Initialising the array
extractedResultOfYtGetFunc = []
resultOfYtGetFunc = JSON.parse(Buffer.concat(chunks).toString());
// console.log(resultOfYtGetFunc)
// Extracting useful data, and allocating it.
for (let i = 0; i < resultOfYtGetFunc.items.length; i++) {
// Adding the element to the array
extractedResultOfYtGetFunc.push(resultOfYtGetFunc.items[i].id.videoId);
// console.log(extractedResultOfYtGetFunc);
}
resolve(extractedResultOfYtGetFunc);
});
}).on("error", reject); // reject the promise on network errors
});
}
app.post("/", async function (req, res) {
query = req.body.queryValue;
// Fetching top results related to user's query and putting them in the array.
ytQueryAppJs = await httpsYtGetFunc(query);
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
});
Another option would be to use axios. The code for this would just be:
app.post("/", async function (req, res) {
query = req.body.queryValue;
// Fetching top results related to user's query and putting them in the array.
try{
const response = await axios.get(url); // replace with your URL
ytQueryAppJs = response.data; // the parsed API result, not yet reduced to video IDs
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
} catch(e) {
console.log(e);
}
});
Using Axios would be a quicker way as you don't need to write promise wrappers around everything, which is required as the node HTTP(S) libraries don't support promises out of the box.
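The question also asks how to fill the dependent arrays (ytCoverUniqueAppJs, ytLiveUniqueAppJs) once the fetches have finished. One possible shape for that, sketched under assumptions, is shown below. The helper collectResults is hypothetical, and fetchIds stands in for a promise-returning httpsYtGetFunc that resolves to a fresh array of video IDs on each call.

```javascript
// Removes every value of `compareToThis` from `removeFromThis` (as in the question).
const compareAndRemove = (removeFromThis, compareToThis) =>
  removeFromThis.filter((val) => !compareToThis.includes(val));

// Hypothetical helper: `fetchIds` stands in for httpsYtGetFunc and must return
// a promise that resolves to an array of video IDs.
const collectResults = async (query, fetchIds) => {
  const coverQuery = query.includes('cover') ? query : query + ' cover';
  const liveQuery = query.includes('live') ? query : query + ' live';
  // The three lookups do not depend on each other, so they can run concurrently.
  const [ytQuery, ytCover, ytLive] = await Promise.all([
    fetchIds(query),
    fetchIds(coverQuery),
    fetchIds(liveQuery),
  ]);
  // The de-duplication steps need the resolved arrays, so they come afterwards.
  const ytCoverUnique = compareAndRemove(ytCover, ytQuery);
  const ytLiveUnique = compareAndRemove(ytLive, ytQuery.concat(ytCoverUnique));
  return { ytQuery, ytCover, ytCoverUnique, ytLive, ytLiveUnique };
};
```

In an async route handler this could be used as `const results = await collectResults(req.body.queryValue, httpsYtGetFunc);`. Returning a new object per request also removes the need to empty shared arrays at the end of the handler.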

Cannot Pass Variables Through Different Scopes in DiscordJS

Alright so my problem is that in the first set of console.log(streamXXXX)s, where XXXX are the various variables, when I read their values they all read as they should, while in the second set they read as undefined. Is this a scope issue? Maybe an Async issue? I tried adding awaits to each time I make a web request but nothing seems to work, and one of the most interesting parts about this is the fact that there are no errors?
Anyways, my code is listed below, as well as a link to test it out in Repl using a sample bot I created. Below that is the list of libraries required for said program to run. Thanks!
if (!message.member.voiceChannel) return message.channel.send(`You do realize you have to be in a voice channel to do that, right ${message.author.username}?`)
if (!message.member.voiceConnection) message.member.voiceChannel.join().then(async connection => {
let streamURL = args.slice(1).join(" ")
let streamTitle = "";
let streamThumb = "";
let streamAuth = "";
let streamAuthThumb = "";
if (streamURL.includes("https://www.youtube.com") || streamURL.includes("https://youtu.be/") && !streamURL.includes(' ')) {
youtube.getVideo(streamURL)
.then(async results => {
let {
body
} = await snekfetch.get(`https://www.googleapis.com/youtube/v3/channels?part=snippet&id=${results.channel.id}&fields=items%2Fsnippet%2Fthumbnails&key=${ytapikey}`).query({
limit: 800
})
streamTitle = results.title
streamThumb = results.thumbnails.medium.url
streamAuth = results.channel.title
streamAuthThumb = body.items[0].snippet.thumbnails.medium.url
console.log(streamURL)
console.log(streamTitle)
console.log(streamThumb)
console.log(streamAuth)
console.log(streamAuthThumb)
})
.catch(console.error)
} else if (!streamURL.includes("https://www.youtube.com") || !streamURL.includes("https://youtu.be/")) {
youtube.searchVideos(streamURL)
.then(async results => {
let {
body
} = await snekfetch.get(`https://www.googleapis.com/youtube/v3/channels?part=snippet&id=${results[0].channel.id}&fields=items%2Fsnippet%2Fthumbnails&key=${ytapikey}`).query({
limit: 800
})
streamURL = results[0].url
streamTitle = results[0].title
streamThumb = results[0].thumbnails.default.medium.url
streamAuth = results[0].channel.title
streamAuthThumb = body.items[0].snippet.thumbnails.medium.url
}).catch(console.error)
} else {
return message.reply("I can only play videos from YouTube (#NotSponsored).")
}
console.log(streamURL)
console.log(streamTitle)
console.log(streamThumb)
console.log(streamAuth)
console.log(streamAuthThumb)
const stream = ytdl(streamURL, {
filter: 'audioonly'
})
const dispatcher = connection.playStream(stream)
dispatcher.on("end", end => {
return
})
let musicEmbed = new Discord.RichEmbed()
.setAuthor(streamAuth, streamAuthThumb)
.setTitle(`Now Playing: ${streamTitle}`)
.setImage(streamThumb)
.setColor(embedRed)
.setFooter(`${streamAuth} - ${streamTitle} (${streamURL}`)
await message.channel.send(musicEmbed)
})
Link to test out the program on a sample bot I made
Modules you will need to test this:
discord.js
simple-youtube-api
node-opus
ffmpeg
ffbinaries
ffmpeg-binaries
opusscript
snekfetch
node-fetch
ytdl-core
Thanks again!
The reason why your output is undefined is due to the way promises work and how you structured your code:
let streamTitle = "";
// 1. Promise created
youtube.getVideo(streamURL)
// 2. Promise still pending, skip for now
.then(async results => {
// 4. Promise fulfilled
console.log(results.title); // 5. Logged actual title
});
console.log(streamTitle); // 3. Logged ""
You already have the correct approach for your snekfetch requests, just need to apply it to the YT ones as well:
let streamTitle = "";
const results = await youtube.getVideo(streamURL);
streamTitle = results.title;
console.log(streamTitle); // Desired output
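The ordering described above can be reproduced without any Discord or YouTube dependency. The following is a small sketch in which getVideo is a hypothetical stand-in for youtube.getVideo() that resolves after a short delay.

```javascript
// Hypothetical stand-in for youtube.getVideo(): resolves after a short delay.
const getVideo = (url) =>
  new Promise((resolve) => setTimeout(() => resolve({ title: `Title of ${url}` }), 10));

// Without await: the variable is read before the .then() callback has run.
const withoutAwait = () => {
  let streamTitle = '';
  getVideo('some-url').then((results) => {
    streamTitle = results.title;
  });
  return streamTitle; // still the initial empty string
};

// With await: execution pauses here until the promise is fulfilled.
const withAwait = async () => {
  const results = await getVideo('some-url');
  return results.title;
};
```

Calling withoutAwait() returns the initial empty string, while awaiting withAwait() yields the title, which is the same difference as between the two sets of console.log calls in the question.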

Fetching data in the loop

I'm trying to connect to an external service called Pexels to get some photos. I'm doing that from Node.js, but it is really just a JavaScript issue. Unfortunately, Pexels only lets a user download at most 40 pictures per page.
https://api.pexels.com/v1/curated?per_page=40&page=1 // 40 is maximum
But I actually need more than that. I'd like to get 160 results, i.e. to combine four pages. To do that, I tried looping the request:
let pexelsData = [];
for(let i = 1; i < 5; i++) {
const randomPage = getRandomFromRange(1, 100); //pages should be randomized
const moreData = await axios.get(`https://api.pexels.com/v1/curated?per_page=40&page=${randomPage}`,
createHeaders('bearer ', keys.pexelsKey));
pexelsData = [ ...moreData.data.photos, ...pexelsData ];
}
Now I can use pexelsData, but it works very unreliably: sometimes it gets all the combined data, sometimes it crashes. Is there a correct and stable way of looping requests?
You are working with a third-party API, which has rate limits, so you should add rate limiting to your code. The simplest solution for you is to use p-limit or a similar approach from promise-fun.
It will look like this:
const pLimit = require('p-limit');
const limit = pLimit(1);
const input = [
limit(() => fetchSomething('foo')),
limit(() => fetchSomething('bar')),
limit(() => doSomething())
];
(async () => {
// Only one promise is run at once
const result = await Promise.all(input);
console.log(result);
})();
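If adding a dependency is not desired, a similar effect can be sketched by hand: fetch the pages one at a time, pause between requests, and retry a failed page a limited number of times. The helper names sleep and fetchPagesSequentially, as well as the default delay and retry count, are hypothetical; fetchPage is assumed to be any function that returns a promise resolving to an array of photos.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetches `pages` one at a time via `fetchPage`, waiting
// `delayMs` between requests and retrying a failed page up to `retries` times.
const fetchPagesSequentially = async (pages, fetchPage, { delayMs = 500, retries = 2 } = {}) => {
  const photos = [];
  for (const page of pages) {
    let attempt = 0;
    while (true) {
      try {
        photos.push(...(await fetchPage(page)));
        break;
      } catch (error) {
        if (attempt >= retries) throw error; // give up on this page
        attempt += 1;
      }
      await sleep(delayMs); // back off before retrying
    }
    await sleep(delayMs); // pause before requesting the next page
  }
  return photos;
};
```

With the question's code, fetchPage could be something like `(page) => axios.get(url, headers).then((res) => res.data.photos)`, where url and headers are built as in the original loop.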
You can break it into functions like this:
let images = [];
// Fetch a single page of curated photos.
const getResponse = async (i) => {
return axios.get(`https://api.pexels.com/v1/curated?per_page=40&page=${i}`);
};
// Fetch pages i..4 one after another, collecting the photos.
const getImage = async (i) => {
if (i < 5) {
try {
const response = await getResponse(i);
images = [...images, ...response.data.photos];
// here you will get all the images fetched so far in one array
console.log(images);
await getImage(i + 1);
} catch (error) {
console.log("catch error", error);
// getImage(i)
}
}
};
getImage(1); // initial call, starting from the first page

How can I get the raw download size of a request using Puppeteer?

That is, the total amount of data downloaded across all resources (including video/media), similar to that returned by Chrome DevTools' Network tab.
There doesn't seem to be any way to do this as of January 2018 that works with all resource types (listening for the response event fails for videos), and that correctly counts compressed resources.
The best workaround seems to be to listen for the Network.dataReceived event, and process the event manually:
const resources = {};
page._client.on('Network.dataReceived', (event) => {
const request = page._networkManager._requestIdToRequest.get(
event.requestId
);
if (!request || request.url().startsWith('data:')) {
return;
}
const url = request.url();
// encodedDataLength is supposed to be the amount of data received
// over the wire, but it's often 0, so just use dataLength for consistency.
// https://chromedevtools.github.io/devtools-protocol/tot/Network/#event-dataReceived
// const length = event.encodedDataLength > 0 ?
// event.encodedDataLength : event.dataLength;
const length = event.dataLength;
if (url in resources) {
resources[url] += length;
} else {
resources[url] = length;
}
});
// page.goto(...), etc.
// totalCompressedBytes is unavailable; see comment above
const totalUncompressedBytes = Object.values(resources).reduce((a, n) => a + n, 0);
The solution of @mjs works perfectly even in 2021. You just need to replace:
page._networkManager -> page._frameManager._networkManager
Full example that works for me:
const resources = {};
page._client.on('Network.dataReceived', (event) => {
const request = page._frameManager._networkManager._requestIdToRequest.get(
event.requestId
);
if (!request || request.url().startsWith('data:')) {
return;
}
const url = request.url();
const length = event.dataLength;
if (url in resources) {
resources[url] += length;
} else {
resources[url] = length;
}
});
await page.goto('https://stackoverflow.com/questions/48263345/how-can-i-get-the-raw-download-size-of-a-request-using-puppeteer');
const totalUncompressedBytes = Object.values(resources).reduce((a, n) => a + n, 0);
console.log(totalUncompressedBytes);
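Both snippets above depend on private Puppeteer fields whose location changed between versions. A possible way around that, sketched here under assumptions, is to keep your own requestId-to-URL map from the Network.requestWillBeSent event instead of reading Puppeteer's internal one. The helper createByteTracker is hypothetical, and the wiring shown in the trailing comment assumes a CDP session obtained through page.target().createCDPSession().

```javascript
// Hypothetical helper: tallies uncompressed bytes per URL from
// Chrome DevTools Protocol network events.
const createByteTracker = () => {
  const urlByRequestId = new Map();
  const bytesByUrl = {};
  return {
    // Handler for 'Network.requestWillBeSent': remember the URL of each requestId.
    onRequestWillBeSent(event) {
      urlByRequestId.set(event.requestId, event.request.url);
    },
    // Handler for 'Network.dataReceived': add the chunk's uncompressed length.
    onDataReceived(event) {
      const url = urlByRequestId.get(event.requestId);
      if (!url || url.startsWith('data:')) return;
      bytesByUrl[url] = (bytesByUrl[url] || 0) + event.dataLength;
    },
    // Sum over all URLs seen so far.
    total: () => Object.values(bytesByUrl).reduce((a, n) => a + n, 0),
  };
};

// Assumed wiring (not executed here):
//   const tracker = createByteTracker();
//   const client = await page.target().createCDPSession();
//   await client.send('Network.enable');
//   client.on('Network.requestWillBeSent', (e) => tracker.onRequestWillBeSent(e));
//   client.on('Network.dataReceived', (e) => tracker.onDataReceived(e));
```

As in the original answer, this counts dataLength (uncompressed bytes) rather than encodedDataLength.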
If you are using puppeteer, you have server side node... Why not pipe the request through a stream, or streams and then calculate the content size?
Also there is https://github.com/watson/request-stats
Also you may want to call page.waitForNavigation as you may be wrestling with async timing issues
const images_width = await page.$$eval('img', anchors => [].map.call(anchors, img => img.width));
const images_height = await page.$$eval('img', anchors => [].map.call(anchors, img => img.height));
