I have a JavaScript loop
for (var index = 0; index < this.excelData.results.length; index++) {
  let pmidList = this.excelData.results[index]["PMIDList"];
  if (pmidList.length == 0) {
    continue;
  }
  let count = 0;
  let pmidsList = pmidList.split(',');
  if (pmidsList.length > 200) {
    pmidList = pmidsList.slice(count, count + 200).join(',');
  } else {
    pmidList = pmidsList.join(",");
  }
  // Create some type of mini loop
  // pmidList is a comma separated string so I need to first put it into an array
  // then slice the array into 200 item segments
  let getJSONLink = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?'
  getJSONLink += 'db=pubmed&retmode=json&id=' + pmidList
  await axios.get(getJSONLink)
    .then(res => {
      let jsonObj = res.data.result;
      //Do Stuff with the data
    }).catch(function(error) {
      console.log(error);
    });
  //Loop
}
The entire process works fine EXCEPT when the PMIDList has more than 200 comma-separated items. The web service will only accept 200 at a time, so I need to add an inner loop that parses out the first 200 and loops back around to do the rest before going on to the next index. It would be nice to do this synchronously, since the web service only allows 3 requests a second. And since I'm using Vue, making the code wait is another issue.
The easiest option would be to use async / await within a while loop, mutating the original array. Something like:
async function doHttpStuff(params) {
  let getJSONLink = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?'
  getJSONLink += 'db=pubmed&retmode=json&id=' + params.join(',') // join with commas, as the API expects
  return axios.get(getJSONLink)
}
and use it inside your inner while loop:
...
let pmidsList = pmidList.split(',');
// I'll clone this, because mutating might have some side effects?
const clone = [...pmidsList];
while (clone.length) {
  const tmp = clone.splice(0, 200); // splice mutates the clone, so its length shrinks; take up to 200 ids, the service's limit
  const data = await doHttpStuff(tmp);
  // do things with data
}
...
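Since the web service also limits you to 3 requests per second, you may want to space the requests out. A minimal sketch, assuming a 400 ms pause is acceptable (anything above ~334 ms keeps you under 3 requests/second); the sleep helper is not part of the original answer:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)); // resolves after ms milliseconds

while (clone.length) {
  const tmp = clone.splice(0, 200);
  const data = await doHttpStuff(tmp);
  // do things with data
  await sleep(400); // stay under the 3 requests/second limit
}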
Beginner here...
So I wanted to make what I thought was a simple change, but it has proven more difficult. I have a .js script that runs on a timer as a function. The function queries data and brings back a list of objects, which the first loop turns into an array of record ids.
In its original format -
// get pledge objects
const pledges = await getOppPledges()
// Check if there are records
if (pledges && pledges.length > 0) {
  // get array of pledge opportunity ids
  for (let pledge of pledges) {
    let oppIds = pledges.map(p => p.Opportunity__c)
    let sfOpps = await getOpp(oppIds)
    // run loop to create oppty doc
    for await (let opp of sfOpps) {...
    }
  ...}
My problem is that the double "for" loop generates duplicate documents when there is more than one pledge. When I tried closing the first loop right before the second loop, "sfOpps" was unavailable to the second loop and the code broke.
I rearranged the code to something like this,
if (pledges && pledges.length > 0) {
  for (let pledge of pledges) {
    let oppIds = [pledge.Opportunity__c]
    let opp = await getOpp(oppIds)
But it seems to return "opp" as an array instead of an object, so this does not work either: anything I read off "opp", like "opp.OpportunityContactRoles.totalSize", comes back undefined.
Any input is appreciated!
If I understood your problem correctly, you shouldn't need the first loop at all: you already map the ids of all your pledges inside it, which is why the operations run n times (n being the length of the "pledges" array).
let oppIds = pledges.map(p => p.Opportunity__c) // You're already mapping all the ids at once
The correct way to do it would be
// get pledge objects
const pledges = await getOppPledges()
// Check if there are records
if (pledges && pledges.length > 0) {
  // get array of pledge opportunity ids
  let oppIds = pledges.map(p => p.Opportunity__c)
  let sfOpps = await getOpp(oppIds)
  // run loop to create oppty doc
  for await (let opp of sfOpps) {...
  }
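As for the rearranged per-pledge version: if getOpp always resolves to an array (an assumption based on your description), you would need to unwrap the single element before reading its fields, e.g.:

for (let pledge of pledges) {
  const opps = await getOpp([pledge.Opportunity__c]) // resolves to an array
  const opp = opps[0] // unwrap the single opportunity
  if (opp) {
    console.log(opp.OpportunityContactRoles.totalSize)
  }
}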
So... the code is like this
<script>
  console.log("Calculating the number of cases...");
  calculate_total();

  function calculate_total() {
    fetch('https://covid.ourworldindata.org/data/owid-covid-data.json')
      .then(res => {
        return res.json()
      })
      .then(raw_data => {
        var total_cases = 0;
        var new_cases = 0;
        var total_deaths = 0;
        for (const key in raw_data) {
          const country = raw_data[key];
          const country_data = country.data;
          const latest_data = country_data[country_data.length - 1];
          if (country.location != "World") {
            if (latest_data.total_cases != null) {
              total_cases += latest_data.total_cases;
            }
            if (latest_data.new_cases != null) {
              new_cases += latest_data.new_cases;
            }
            if (latest_data.total_deaths != null) {
              total_deaths += latest_data.total_deaths;
            }
          }
        }
        console.log("Number of total cases:" + total_cases);
        console.log("Number of new confirmed cases:" + new_cases);
        console.log("Number of deaths:" + total_deaths);
        document.getElementById("total_cases").innerHTML = total_cases;
        document.getElementById("new_cases").innerHTML = new_cases;
        document.getElementById("total_deaths").innerHTML = total_deaths;
      })
  }
</script>
The result actually shows what I want to see; however, it takes around 5 minutes until the result appears.
What should I change in order to get the result instantly, or at least with less waiting time?
The data is around 35 MB, so most of the waiting is the download plus the time to parse that long JSON string into an object; traversing it with the for loop afterwards is comparatively cheap.
You cannot do much except fetch only the fields you require (which is possible when the data comes from a GraphQL API, but not from a static JSON file like this one). A response containing only the fields you want to work with would significantly reduce both the transfer and the parse time.
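If you are stuck with this static JSON file, another option (a sketch, not part of the original answer) is to cache the computed totals, e.g. in localStorage, so the 35 MB download and parse happen at most once per day; show_totals is a hypothetical helper:

function show_totals(t) {
  document.getElementById("total_cases").innerHTML = t.total_cases;
  document.getElementById("new_cases").innerHTML = t.new_cases;
  document.getElementById("total_deaths").innerHTML = t.total_deaths;
}

function calculate_total_cached() {
  const cached = JSON.parse(localStorage.getItem("covid_totals") || "null");
  if (cached && Date.now() - cached.time < 24 * 60 * 60 * 1000) {
    show_totals(cached); // recent numbers are shown instantly
    return;
  }
  fetch("https://covid.ourworldindata.org/data/owid-covid-data.json")
    .then(res => res.json())
    .then(raw_data => {
      const totals = { total_cases: 0, new_cases: 0, total_deaths: 0, time: Date.now() };
      for (const key in raw_data) {
        const country = raw_data[key];
        const latest = country.data[country.data.length - 1];
        if (country.location != "World") {
          totals.total_cases += latest.total_cases || 0;
          totals.new_cases += latest.new_cases || 0;
          totals.total_deaths += latest.total_deaths || 0;
        }
      }
      localStorage.setItem("covid_totals", JSON.stringify(totals));
      show_totals(totals);
    });
}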
I'm trying to insert 10M+ rows into a MySQL database using Knex.js. Is there a way to use a for loop to insert arrays of length 10000? That seems to be the maximum size I am able to insert; anything larger gets "Error: ER_NET_PACKET_TOO_LARGE: Got a packet bigger than 'max_allowed_packet' bytes".
I tried using a promise chain but the chain would be very long to accommodate 10M records.
exports.seed = (knex) => {
  // Deletes ALL existing entries
  return knex('books').del()
    .then(() => {
      const fakeBooks = [];
      for (let i = 0; i < 10000; i += 1) {
        fakeBooks.push(createFakeBooks());
      }
      return knex('books').insert(fakeBooks)
        .then(() => {
          const fakeBooks1 = [];
          for (let i = 0; i < 10000; i += 1) {
            fakeBooks1.push(createFakeBooks());
          }
          return knex('books').insert(fakeBooks1)
            .then(() => {
              const fakeBooks2 = [];
              for (let i = 0; i < 10000; i += 1) {
                fakeBooks2.push(createFakeBooks());
              }
              ...
It's easier if you use async and await and ditch the thens. It can then be written like this:
exports.seed = async (knex) => {
  await knex('books').del();
  let fakeBooks = [];
  for (let i = 1; i <= 10000000; i += 1) {
    fakeBooks.push(createFakeBooks());
    if (i % 1000 === 0) {
      await knex('books').insert(fakeBooks);
      fakeBooks = [];
    }
  }
};
await pauses the function until the promise settles, without blocking the thread. The loop runs ten million times and inserts into the database every 1000 rows. You can change that to 10000 rows per batch, but you might as well use 1000 to be safe.
I only tried it with one million rows myself, as inserting ten million took too much time.
You can use https://knexjs.org/#Utility-BatchInsert, which is designed for inserting large numbers of rows into the DB.
await knex.batchInsert('books', create10MFakeBooks(), 5000)
However, you might want to create those books in smaller batches to avoid using gigabytes of memory, so MikaS's answer is valid: just use async / await and it is trivial to write.
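For example, a minimal sketch combining the two ideas; the 100000-row generation chunk is an arbitrary choice, and createFakeBooks is the helper from the question:

exports.seed = async (knex) => {
  await knex('books').del();
  const total = 10000000;
  const generateChunk = 100000; // generate this many rows at a time to keep memory bounded
  for (let done = 0; done < total; done += generateChunk) {
    const fakeBooks = [];
    for (let i = 0; i < generateChunk; i += 1) {
      fakeBooks.push(createFakeBooks());
    }
    // batchInsert splits the array into INSERT statements of 1000 rows each
    await knex.batchInsert('books', fakeBooks, 1000);
  }
};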
I would not use knex for this kind of job, but raw SQL.
There seem to be many questions about this problem out here, but none directly relate to my question AFAICT. Here is the problem statement:
This problem is the same as the previous problem (HTTP COLLECT) in that you need to use http.get(). However, this time you will be provided with three URLs as the first three command-line arguments.
You must collect the complete content provided to you by each of the URLs and print it to the console (stdout). You don't need to print out the length, just the data as a String; one line per URL. The catch is that you must print them out in the same order as the URLs are provided to you as command-line arguments.
Here is my original solution that fails:
var http = require('http')
var concat = require('concat-stream')
var args = process.argv.slice(2, 5)
var args_len = args.length
var results = []

args.forEach(function(arg, i) {
  http.get(arg, function(res) {
    res.setEncoding('utf8')
    res.pipe(concat(function(str) {
      results[i] = str
      if (results.length === args_len)
        results.forEach(function(val) {
          console.log(val)
        })
    }))
  }).on('error', console.error)
})
This is the solution they recommend:
var http = require('http')
var bl = require('bl')
var results = []
var count = 0

function printResults () {
  for (var i = 0; i < 3; i++)
    console.log(results[i])
}

function httpGet (index) {
  http.get(process.argv[2 + index], function (response) {
    response.pipe(bl(function (err, data) {
      if (err)
        return console.error(err)
      results[index] = data.toString()
      count++
      if (count == 3)
        printResults()
    }))
  })
}

for (var i = 0; i < 3; i++)
  httpGet(i)
What I fail to grok is the fundamental difference between my code and the official solution. I am doing the same thing as their solution when it comes to stuffing the replies into an array to reference later. They use a counter to count the number of callbacks, while I compare the lengths of two arrays (one whose length increases every callback); does that matter? When I try my solution outside the learnyounode program it seems to work just fine, but I know that probably means little. So, someone who knows node better than I: care to explain where I have gone wrong? TIA.
They use a counter to count the number of callbacks while I am comparing the length of two arrays (one whose length increases every callback); does that matter?
Yes, it does matter. The .length of an array is one more than its highest index; it does not reflect the actual number of assigned elements.
The difference surfaces only when the results from the asynchronous requests come back out of order. If you first assign index 0, then 1, then 2 and so on, the .length matches the number of assigned elements and would be the same as their counter. But now try out this:
var results = []
console.log(results.length) // 0 - as expected
results[1] = "lo ";
console.log(results.length) // 2 - sic!
results[0] = "Hel";
console.log(results.length) // 2 - didn't change!
results[3] = "ld!";
console.log(results.length) // 4
results[2] = "Wor";
console.log(results.length) // 4
If you tested the length after each assignment and output the array whenever the length reached 4, it would print
"Hello ld!"
"Hello World!"
(forEach skips unassigned indexes, which is why the hole at index 2 doesn't show up as undefined the first time.)
So it turns out there were two different issues here, one of which was pointed out by @Bergi above. The two issues are as follows:
The .length property does not actually return the number of assigned elements in the array. Rather it returns one more than the highest assigned index. This seems quite silly. Thanks to @Bergi for pointing this out.
I had also suspected the scoping of the i variable, fearing its value could change before the callbacks ran and cause a race condition. (In fairness, forEach passes each callback its own i, so this was more of a readability concern than an actual bug.)
My final solution ended up being as follows:
var http = require('http')
var concat = require('concat-stream')
var args = process.argv.slice(2, 5)
var args_len = args.length
var results = []
var count = 0

function get_url_save(url, idx) {
  http.get(url, function(res) {
    res.setEncoding('utf8')
    res.pipe(concat(function(str) {
      results[idx] = str
      if (++count === args_len)
        results.forEach(function(val) {
          console.log(val)
        })
    }))
  }).on('error', console.error)
}

args.forEach(function(arg, i) {
  get_url_save(arg, i)
})
Breaking the body of the outermost forEach into a named function makes the flow easier to follow; since i is passed in as a parameter, each request keeps its own index (though forEach already guarantees that on its own). The addition of the counter solves the issue described by @Bergi, since the .length property isn't as intuitive as one would imagine.
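For comparison, here is a sketch of the same program using Promises, which sidesteps both the counter and the .length pitfall; Promise.all resolves to the results in request order regardless of completion order (this was outside the scope of the original learnyounode exercise):

var http = require('http')
var concat = require('concat-stream')

function getUrl(url) {
  return new Promise(function(resolve, reject) {
    http.get(url, function(res) {
      res.setEncoding('utf8')
      res.pipe(concat(resolve)) // resolve with the full body once the stream ends
    }).on('error', reject)
  })
}

Promise.all(process.argv.slice(2, 5).map(getUrl))
  .then(function(results) {
    results.forEach(function(val) {
      console.log(val)
    })
  })
  .catch(console.error)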
I'm using the native driver for MongoDB. In the DB I have about 7 collections, and I want to create a variable that stores the total number of entries across all collections except the last one. Then I want to create another variable that stores the entries of the last collection, pass both variables through the res.render() command, and show them on the webpage.
The problem I'm having here is that I'm used to the synchronous execution of functions, which in this case goes straight out the window.
The code below is the way I'm thinking, if everything is executed in sync.
var count = 0;
db.listCollections().toArray(function(err, collection) {
  for (i = 1; i < collection.length; i++) {
    db.collection(collection[i].name).count(function(err, value) {
      count = count + value;
    })
  }
  var count2 = db.collection(collection[i].name).count(function(err, value) {
    return value;
  })
  res.render('index.html', {data1: count, data2: count2})
})
Obviously this doesn't do what I want, so I tried playing around with Promises, but ended up even more confused.
You could do something like this with Promises:
Get the collection names and iterate over them, returning either the count or the entries (for the last collection). Then sum up the individual counts and send everything to the client.
db.listCollections().toArray()
  .then(collections => {
    let len = collections.length - 1
    return Promise.all(collections.map(({name}, i) => {
      let curr = db.collection(name)
      return i < len ? curr.count() : curr.find().toArray()
    }))
  })
  .then(results => { // renamed from `res` so it doesn't shadow Express's `res` below
    let last = results.pop(),
        count = results.reduce((p, c) => p + c, 0)
    res.render('index.html', {count, last})
  })