Variable is overwritten in asynchronous function with async.forEach loop - javascript

I've stumbled upon (or, of course, created myself) an error that I cannot model in my head. I'm iteratively calling a URL using the webdriverio client with different IDs and parsing the resulting HTML. However, the html variable gets overwritten with the last element in the loop, so the array ends up containing multiple duplicates of the last html value:
async.forEach(test, function (id, callback) {
self.url('https://<api-page>?id=' + id).getHTML('table tbody', true).then(function(html) {
//Parse HTML
parser.write(html);
parser.end();
//Add course to person, proceed to next.
callback();
});
}, function (err) {
self.end().finally();
res.json(person);
});
Parsing is done using the htmlparser2 NPM library. The html variable always returns the last element, even though I can see it going through the different API ids with different data. I would think the error lies where I get the HTML and return it, but I cannot say why, and none of my fixes have worked.
Hopefully someone more skilled than me can see the error.
Thanks in advance,
Chris
UPDATE/Solution - See solution below

I am not sure I understood the context correctly, but the html variable is not overwritten; it is just the last chunk that you've retrieved from the self.url function call. If you want to have the whole result saved in a variable, you should append the result on every iteration. You probably need something like this:
var html = '';
async.forEach(test, function (id, callback) {
self.url('https://<api-page>?id=' + id).getHTML('table tbody', true).then(function (tmpHtml) {
//Parse HTML
parser.write(tmpHtml);
parser.end();
html += tmpHtml;
//Add course to person, proceed to next.
callback();
});
}, function (err) {
self.end().finally();
res.json(person);
});

I finally figured it out: I had missed that async.forEach executes the function in parallel, whereas the function I needed was async.timesSeries, which runs the iterations one at a time, waiting for each to finish before starting the next. I've attached the working code below:
async.timesSeries(3, function(n, next) {
self.url('<api-page>?id=' + n).then(function() {
console.log("URL Opened");
}).getHTML('table tbody', true).then(function(html) {
console.log("getHTML");
parser.write(html);
parser.end();
next();
});
}, function(err, results) {
//Add to person object!
self.end().finally();
res.json(person);
});
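To illustrate the difference between parallel and sequential iteration without any library, here is a minimal sketch. It assumes a hypothetical fetchHtml function standing in for the self.url(...).getHTML(...) chain, and it uses plain Promises with async/await rather than async.timesSeries, so treat it as an illustration of the idea, not as the webdriverio API:

```javascript
// Hypothetical stand-in for self.url(...).getHTML(...): resolves after a delay.
function fetchHtml(id) {
  return new Promise(function (resolve) {
    // Later ids resolve sooner, to imitate responses arriving out of order.
    setTimeout(function () { resolve('<td>' + id + '</td>'); }, 30 - id * 10);
  });
}

// Process the ids strictly one after another, collecting results in order.
async function fetchInSeries(ids) {
  const pages = [];
  for (const id of ids) {
    // The next request starts only after this one has resolved.
    pages.push(await fetchHtml(id));
  }
  return pages;
}

const seriesResult = fetchInSeries([0, 1, 2]);
seriesResult.then(function (pages) { console.log(pages); });
```

Because each iteration awaits the previous one, a shared resource (such as a single browser session or parser) is only ever used by one request at a time.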

Related

jQuery - not able to consistently access created DOM elements

I'm working with jQuery (version 3.2.1), and I'm finding that sometimes, for reasons that I cannot discern, jQuery is unable to locate jQuery-created DOM elements. I would say that this issue occurs about 1 out of every 10 times I refresh the page. In those instances, the element is undefined.
It's a bit of a long and involved script, so I'll attempt to distill it to its critical parts. First of all, the scripts are introduced in the index.html like so:
<body>
...
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script src="app.js"></script>
</body>
Pretty much what you'd expect. Here's the relevant (and abbreviated) code from app.js - the problem occurs in the loadItems() function:
function getQueryParamCat(queryParam) {
return $('.category-item[data-query_name=' + '\'' + queryParam + '\'' + ']');
}
function loadItems(queryParam) {
$.post('./get_items.php', {}, () => {
const queryParamCat = getQueryParamCat(queryParam);
if (queryParamCat[0]) {
// Leaving out categoryClick() - it triggers a click on the relevant DOM element
categoryClick(queryParamCat);
} else {
categoryClick($('category').first());
}
});
}
function loadCategories(callBack) {
$.post('./get_categories.php', {}, (data) => {
const categories = $.parseJSON(data);
$.each(categories, (i, value) => {
const cat = $('<category>').appendTo($('left')).html(value.name);
cat.attr('class', 'category-item');
cat.attr('data-query_name', value.name.toLowerCase());
cat.mousedown(function () {
categoryClick($(this));
});
});
return callBack;
});
}
$(document).ready(() => {
// Leaving out getParameterByName() - just gets a string from the url
const queryParam = getParameterByName();
loadCategories(loadItems(queryParam));
});
In brief summary:
the page loads and loadCategories() is called.
the client makes an AJAX request to get_categories.php, and the returned data is used to create a set of <category> DOM elements.
loadItems(queryParam) is then called as a callback, which then makes an additional AJAX request to get more data.
In the callback following that request, we ultimately want to call the categoryClick() function, passing in a <category> DOM element as the argument (the element to be 'clicked'). THIS IS WHERE THE PROBLEM OCCURS.
About 1 out of 10 times, the result of getQueryParamCat() comes back as r.fn.init [prevObject: r.fn.init(1)], which makes the value of queryParamCat[0] in the conditional in loadItems() evaluate to undefined. However, in those situations, $('category') also evaluates to r.fn.init [prevObject: r.fn.init(1)], meaning that $('category').first() is also undefined.
This problem only seems to affect elements that are created by jQuery - anything that was hard-coded in the HTML can be accessed, no problem. Why is it that jQuery is unable to consistently find these elements? Is it trying to find those elements before they've been successfully appended? I could understand if it failed all the time, but the inconsistency is confusing to me. Can anyone offer any suggestions as to how to make this code perform reliably?
Odd syntax; loadCategories expects a callback as an argument, but loadItems doesn't return anything, so loadCategories(loadItems(queryParam)); turns into loadCategories(undefined);.
Also, return callBack; doesn't do anything inside of a $.post function; it's not only not returning the value to the outer function's caller, it's also running async.
Maybe you meant to do something like this?
loadCategories(() => {
loadItems(queryParam)
});
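function loadCategories(callBack) {
$.post('./get_categories.php', {}, (data) => {
// ...
$.each(categories, (i, value) => {
// ...
});
// Invoke the callback only after every category element has been appended.
callBack();
});
}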
That ensures the callback is called after the meat of loadCategories is done.
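The difference between passing a function and passing the result of calling it can be seen in a small sketch. The two functions below are simplified, synchronous stand-ins (no jQuery, no AJAX) that only record the order in which they run; the log array is a hypothetical helper added for illustration:

```javascript
const log = [];

// Simplified stand-ins for the two AJAX-backed functions in the question.
function loadItems(queryParam) {
  log.push('loadItems(' + queryParam + ')');
}

function loadCategories(callBack) {
  log.push('loadCategories');
  // Guard against the "undefined passed as callback" case.
  if (typeof callBack === 'function') {
    callBack();
  }
}

// Mistake: loadItems runs first, and its return value (undefined) is passed on.
loadCategories(loadItems('a'));

// Fix: pass a function, so loadItems runs only when loadCategories invokes it.
loadCategories(() => loadItems('b'));

console.log(log);
```

In the first call the log shows loadItems running before loadCategories; in the second the order is reversed, which is what the original code intended.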

TypeError: Cannot read property 'latestTimestamp' of undefined [duplicate]

This question already has answers here:
Calling an asynchronous function within a for loop in JavaScript
(10 answers)
Closed 8 years ago.
I found lots of similar questions, but I still don't know what's wrong with my code. It seems that I cannot read global variable value (urls) in the callback function: I want to update the urls latestTimestamp value in the callback function(err, articles). Here is the code that went wrong:
var urls=[
{"url": "http://www.economist.com/feeds/print-sections/77/business.xml", "latestTimestamp": new Number(0)},
{"url": "http://news.sky.com/feeds/rss/home.xml", "latestTimestamp": new Number(0)},
]; // Example RSS Feeds;
// parse RssFeeds from given websites and write them into database
function parseRssFeeds(collection){
var feed = require('feed-read'); // require the feed-read module
// loop through our list of RSS feed urls
for (var j = 0; j < urls.length; j++)
{
console.log('Original url timestamp is: '+ urls[j].latestTimestamp.toString());
// fetch rss feed for the url:
feed(urls[j], function(err, articles)
{
// loop through the list of articles returned
for (var i = 0; i < articles.length; i++)
{
var message =
{"title": articles[i].title,
"link": articles[i].link,
"content": articles[i].content,
"published": articles[i].published.getTime()};
collection.insert(message, {safe:true}, function(err, docs) {
if (err) {
console.log('Insert error: '+err);
}else{
console.log('This item timestamp is: '+ message.published);
// get the latest timestamp
if (message.published >urls[j].latestTimestamp) {
console.log('update timestamp to be: '+ message.published);
urls[j].latestTimestamp = message.published;
}
}
});// end collection insert
} // end inner for loop
}) // end call to feed method
} // end urls for loop
}
Thanks for any help. The error is:
TypeError: Cannot read property 'latestTimestamp' of undefined
at /Users/Laura/Documents/IBM/project/TestList/app.js:244:37
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/collection/core.js:123:9
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/db.js:1131:7
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/db.js:1847:9
at Server.Base._callHandler (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/base.js:445:41)
at /Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/server.js:478:18
at MongoReply.parseBody (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/responses/mongo_reply.js:68:5)
at null.<anonymous> (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/server.js:436:20)
at emit (events.js:95:17)
at null.<anonymous> (/Users/Laura/Documents/IBM/project/TestList/node_modules/mongodb/lib/mongodb/connection/connection_pool.js:201:13)
This should probably be closed as a duplicate, but I'll put an answer here because the relationships between all the duplicate questions are often so hard to grasp for JavaScript programmers that haven't understood the fundamental problem.
There are two ways to solve this. One way is to change the way that you create the callback. Instead of using an inline anonymous function:
feed(urls[j], function(err, articles)
{
// loop through the list of articles returned
// ...
you'd create another function that returns the callback. You'd pass that function the URL, and that's what the returned function would use:
function makeFeedResultHandler(url) {
return function(err, articles) {
// loop through the list of articles returned
// ... code as before up to this line:
if (message.published > url.latestTimestamp) {
console.log('update timestamp to be: '+ message.published);
url.latestTimestamp = message.published;
}
// ... etc
};
}
Then you'd call "feed" like this:
feed(urls[j], makeFeedResultHandler(urls[j]));
The key difference is that each function passed to "feed" will have its own private copy of the object (well, a copy of the reference to be picky) from the "urls" array, so it won't need to refer to the variable "j" at all. That's the crux of the problem: "j" is shared by all the callbacks in your code. By the time the callbacks are invoked, the value of "j" is equal to the length of the "urls" array, so urls[j] is undefined.
The other approach would be to use the .forEach method, available in newer JavaScript implementations. That approach would get rid of "j" altogether:
urls.forEach(function(url) {
console.log('Original url timestamp is: '+ url.latestTimestamp.toString());
// fetch rss feed for the url:
feed(url, function(err, articles)
{
// loop through the list of articles returned
// ... code as previously, substituting "url" for "urls[j]" everywhere
});
});
Again, that makes sure that every callback sent to the "feed" function has its own copy of the "urls" element.
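The shared-variable problem can be reproduced without any RSS or database code. This is a minimal sketch with made-up data: the callbacks are stored in arrays and invoked after the loop has finished, which is effectively what happens with asynchronous callbacks. The names shared, captured and makeHandler are hypothetical:

```javascript
const urls = [{ name: 'a' }, { name: 'b' }];
const shared = [];
const captured = [];

// Factory: each returned function keeps its own reference to "url".
function makeHandler(url) {
  return function () { return url; };
}

for (var j = 0; j < urls.length; j++) {
  shared.push(function () { return urls[j]; }); // reads the shared "j" later
  captured.push(makeHandler(urls[j]));          // captures the element now
}

// The loop is done, so j === urls.length and urls[j] is undefined.
const sharedResults = shared.map(function (f) { return f(); });
const capturedResults = captured.map(function (f) { return f().name; });

console.log(sharedResults);   // every entry is undefined
console.log(capturedResults); // the elements, in order
```

The callbacks in shared all see the final value of j, while those produced by the factory still refer to the element they were created for.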
Expanding on what @Pointy said in his comment under your post:
The insert function you are using with MongoDB is async, but you are treating the callback like it is synchronous. What is essentially happening in your loop, is everything works as planned until you hit collection.insert. From there, the process breaks off and essentially says "I'm going to tell mongo to insert a record now.. and eventually I'll expect a response." Meanwhile, the loop continues on to the next index and doesn't synchronously wait until the callback fires.
By the time your callback fires, your loop is already done, and j no longer represents the index you expect, which is why urls[j] is coming up undefined. With this approach you also run the risk of getting a different index than the one you intended.
I would recommend reworking your loop to support the async nature of node. There is a great library called - oddly enough - async that makes this process super simple. The async.each() function should help you accomplish what you are trying to do.

Node: How to ensure a function runs ONLY after an object has been created?

I'm using nodejs with cheerio to scrape data from a website, and it creates an object from it. Then, it needs to take that object and use it in a function.
The issue is, my object is being created but before cheerio can properly parse the data and put it into the object, the next function is already running. Here's my code:
function getInfo(obj, link){
request(link, function(err, resp, body) {
if (err) {
console.log("Uh-oh: " + err);
throw err;
}
$ = cheerio.load(body);
function createProduct(obj, callback){
var product = {
name : $('#name').text(),
gender : obj.gender,
infoLink : link,
designer : $('.label').first().text(),
price : $('#price').first().text(),
description : $('.description').text(),
date : new Date()
}
product.systemName = (function(){
return product.name.replace(/\s+/g, ' ');
}());
callback(product);
}
createProduct(obj, function(product){
lookUp(product);
});
}); // end request callback
} // end getInfo
I'm getting mixed results here. Some product objects are being sent to the function just fine with all the details properly input. Some are missing descriptions, others are missing every cheerio-populated content. Others have some cheerio scraped content, but are missing certain bits. The gender and date attributes are always there, and the properties exist, but they're just blank (e.g. product.name returns "" rather than undefined).
I've checked each offending link and all pages contain the correct selectors to be scraped.
How can I set up the callback to ONLY function once the product object has been populated?
There are two possible asynchronous causes for these results:
cheerio.load has not finished before createProduct is called.
In createProduct, the product is not populated, or only partially populated (for example, the description), before callback is called (not sure).
You can use the async library to make the functions execute in sequence (by using async.series). If createProduct is asynchronous as well, you will have to sequence it in a similar way.
async.series([
function(callback){
$ = cheerio.load(body);
callback();
},
function(callback){
createProduct(obj, function(product){
lookUp(product);
});
callback();
}
]);

Proper way to modify an object property inside an anonymous function callback in JavaScript

I'm using Node.js + Express + nodejs-sqlite3 to make a form that, when submitted, will insert a new row into an sqlite3 database.
On query success I want to write a certain response.
So the small big problem is just: Modify a string that will be storing the html to be shown, inside the callback function of sqlite3.run()
I read about closures, and passing an object with methods to modify its own attributes. But it seems it's not working. It will pass the object attributes and methods, but no change will remain when the callback function ends. I read that objects will be passed as reference, not copies.
This is the code:
app.post("/insert.html", function(req, res){
function TheBody(){
this.html = "";
this.msg = "";
this.num = "";
}
TheBody.prototype.add = function(string){
this.html = this.html + string;
}
var body = new TheBody();
body.msg = req.body.message;
body.num = req.body.number;
var insertCallback = function(data){
return function(err){
if( err != null){
console.log("Can't insert new msg: " + err.message);
data.add("ERROR-DB");
} else {
console.log("Ok. Inserted: " + data.msg);
console.log(data.html);
data.add("OK - MSG: "+data.msg+" NUM: "+data.num);
console.log(data.html);
}
};
};
var db = new lite.Database('database.db');
var query = "INSERT INTO outbox (message, number) VALUES (?, ?)";
db.run(query, [body.msg, body.num], insertCallback(body) );
res.setHeader('Content-Type', 'text/html');
res.setHeader('Content-Length', body.html.length);
res.end(body.html);
});
On server side I'll see
Ok. Inserted: TestString
[Blank space since data.html still has no information]
OK - MSG: TestString NUM: TestNumber [Showing that indeed was modified inside the function]
But on the client side res.end(body.html); will send an empty string.
The object is not being passed as reference.
What's missing in the code, and what simpler alternatives do I have for changing a string variable inside an anonymous callback function?
I already know I could use response.write() to write directly from the function if that were simpler. But I discovered it would only work if I use response.end() inside the callback; otherwise (being outside, as it is now) it hits a race condition where the buffer is closed before sqlite3.run() is able to use response.write().
-------- Answered --------
As hinted by Justin Bicknell and confirmed by George P., nodejs-sqlite3 functions run asynchronously. So I was ending the stream to the client before the callback was called, and thus nothing was being printed.
This was more a problem of "This is SPART- nodejs, so write your stuff according to events" than a logic one. I find this kind of programming somewhat convoluted, but nobody other than me told me to use nodejs. For those wondering how one could impose an order on the database queries, nodejs-sqlite3 functions return a database object that is used to chain the next query.
Since I was printing the information to the client just once in every handled event, the resulting object ended up like this:
function TheBody(response){
this.response = response;
}
TheBody.prototype.printAll = function(string){
this.response.setHeader('Content-Type', 'text/html');
this.response.setHeader('Content-Length', string.length);
this.response.end(string);
}
I preferred that to cluttering the code with a lot of res.setHeader() lines.
node-sqlite3 methods are, by default, run in parallel (asynchronously). That means that your code is going through this timeline:
Your code calls db.run(...)
Your code calls res.end(...)
db.run completes and calls your callback.
This is the source of a huge number of questions here on SO, so you can almost certainly find a better answer than anything that I could write here in a reasonable amount of time.
I would start here: How does Asynchronous Javascript Execution happen? and when not to use return statement?
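The timeline above can be sketched in a few lines. The run function below is a hypothetical stand-in for db.run (it only calls back on a later tick), and done stands in for sending the response; neither is the node-sqlite3 or Express API. The point is only that the response is produced inside the callback:

```javascript
// Hypothetical stand-in for db.run: invokes the callback on a later tick.
function run(query, params, callback) {
  setTimeout(function () { callback(null); }, 0);
}

const events = [];

function handleInsert(msg, done) {
  events.push('run called');
  run('INSERT ...', [msg], function (err) {
    events.push('callback fired');
    // Build and send the response here, once the outcome is known.
    done(err ? 'ERROR-DB' : 'OK - MSG: ' + msg);
  });
  // This line runs before the callback above.
  events.push('handler returned');
}

const responseSent = new Promise(function (resolve) {
  handleInsert('TestString', resolve);
});
responseSent.then(function (body) { console.log(events, body); });
```

The events array records that the handler returns before the callback fires, which is why a res.end() placed after db.run() sends an empty body.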

How to deal with async function results in JavaScript

Coming from a c# background, I'm probably looking at JavaScript from a completely wrong perspective, so please bear with me.
Leaving the advantages of async aside for a minute,
let's say I simply want to retrieve a value from an SQLite database in an HTML5 page.
What I want to see is something like
var something = db.getPicture(1);
Now consider a (perhaps very naive) implementation of this:
this.getPicture = function(id)
{
this.database.transaction(function(tx)
{
tx.executeSql('SELECT ......', null, function(tx, results)
{
if (results.rows.length == 1)
return results.rows.item(0).Url; //This of course does not return
//anything to the caller of .getPicture(id)
});
},
function(error)
{
//do some error handling
},
function(tx)
{
//no error
});
};
First off, it's one big mess of nested functions and second... there's no way for me to return the result I got from the database as the value of the .getPicture() function.
And this is the easy version; what if I wanted to retrieve an index from a table first,
then use that index in the next query and so on...
Is this normal for JavaScript developers, am I doing it completely wrong, is there a solution, etc...
The basic pattern to follow in JavaScript (in asynchronous environments like a web browser or Node.js) is that the work you need to do when an operation is finished should happen in the "success" callback that the API provides. In your case, that'd be the function passed in to your "executeSql()" method.
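this.getPicture = function(id, whenFinished)
{
this.database.transaction(function(tx)
{
tx.executeSql('SELECT ......', null, function(tx, results)
{
// Hand the value to the caller's function instead of returning it.
if (results.rows.length == 1)
whenFinished(results.rows.item(0).Url);
});
},
function(error)
{
//do some error handling
});
};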
In that setup, the result of the database operation is passed as a parameter to the function provided when "getPicture()" was invoked.
Because JavaScript functions form closures, they have access to the local variables in the calling context. That is, the function you pass in to "getPicture()" as the "whenFinished" parameter will have access to the local variables that were live at the point "getPicture()" is called.
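For the follow-up case of running one query and then using its result in the next, one option is to wrap the callback-style function in a Promise. This is a sketch under assumptions: the getPicture below is a hypothetical stand-in that calls back asynchronously with a made-up URL instead of touching a database, and getPictureAsync is a name introduced here for illustration:

```javascript
// Hypothetical stand-in for the database lookup; calls back asynchronously.
function getPicture(id, whenFinished) {
  setTimeout(function () { whenFinished('/pictures/' + id + '.png'); }, 0);
}

// Wrap the callback API in a Promise so dependent steps can be chained.
function getPictureAsync(id) {
  return new Promise(function (resolve) {
    getPicture(id, resolve);
  });
}

const picture = getPictureAsync(1);
picture.then(function (url) {
  // Any work that needs the value happens here, after it has arrived.
  console.log(url);
});
```

The value is still delivered through a callback; the Promise only flattens the nesting, so a second lookup could be returned from the then handler instead of being nested inside the first.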
