I'm having some trouble understanding asynchronous functions. I've read the chapter in Mixu's Node Book but I still can't wrap my head around it.
Basically I want to request a resource, parse it for valid URLs (using the node package cheerio) and add every match to my redis set setname.
The problem is that in the end it's only adding the first match to the redis set.
function parse(url, setname)
{
request(url, function (error, response, body)
{
if (!error && response.statusCode == 200)
{
$ = cheerio.load(body)
// For every 'a' tag in the body
$('a').each(function()
{
// Add blog URL to redis if not already there.
var blog = $(this).attr('href')
console.log("test [all]: " + blog);
// filter valid URLs
var regex = /http:\/\/[^www]*.example.com\//
var result = blog.match(regex);
if(result != null)
{
console.log("test [filtered]: " + result[0]);
redis.sismember(setname, result[0], function(err, reply)
{
if(!reply)
{
redis.sadd(setname, result[0])
console.log("Added " + result[0])
}
redis.quit()
})
}
})
}
})
}
I'd be very grateful for pointers on how I'd have to restructure this so the redis.sadd method is working with the correct result.
The output of the current implementation looks like:
test [all]: http://test1.example.com/
test [filtered]: http://test1.example.com/
...
Added http://test2.example.com/
So it's adding the test1.example.com but not printing the "added" line, and it's not adding the test2.example.com but it's printing the "added" line for it.
Thank you!
The first issue is caused by redis.sismember() being asynchronous: when its callback is called, you have already overwritten the result variable so it will point to the last value it had, and not the value at the moment at which you called redis.sismember().
One way to solve that is to create a new scoped variable by wrapping the asynchronous function in a closure:
(function(result) {
redis.sismember(setname, result[0], function(err, reply) {
...
});
})(result);
Another option is to create a partial function that's used as callback:
redis.sismember(setname, result[0], function(result, err, reply) {
...
}.bind(this, result));
The second issue is, I think, caused by redis.quit() being called, which closes the Redis connection after the first sadd(). You're not checking err, but if you do it might tell you more.
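As a rough sketch of the second point: one way to avoid closing the connection too early is to count the replies that are still outstanding and call quit() only after the last one has arrived. The client below is a hypothetical in-memory stand-in (createFakeRedis and addAll are made-up names) that only mimics the callback shape of sadd() and quit(), so the example runs without a Redis server. It also assumes you are fine with calling sadd directly, since SADD ignores members that are already in the set.

```javascript
// Hypothetical stand-in for a redis client, so the sketch runs without a server.
// It only mimics the callback shape of sadd() and quit().
function createFakeRedis() {
  var sets = {};
  return {
    sets: sets,
    quitCalls: 0,
    sadd: function (setname, member, cb) {
      // Reply asynchronously, like a real network client would.
      setImmediate(function () {
        sets[setname] = sets[setname] || new Set();
        var added = sets[setname].has(member) ? 0 : 1;
        sets[setname].add(member);
        cb(null, added);
      });
    },
    quit: function () { this.quitCalls++; }
  };
}

// Add every URL and close the connection only after the last reply arrives.
function addAll(client, setname, urls, done) {
  var pending = urls.length;
  if (pending === 0) { client.quit(); return done(null); }
  urls.forEach(function (url) {
    client.sadd(setname, url, function (err, added) {
      if (err) console.error(err);
      else if (added) console.log("Added " + url);
      if (--pending === 0) {
        client.quit();
        done(null);
      }
    });
  });
}

var client = createFakeRedis();
var urls = ["http://test1.example.com/", "http://test2.example.com/"];
addAll(client, "blogs", urls, function () {
  console.log("quit calls: " + client.quitCalls);
});
```

Because url is a parameter of the forEach callback, each reply handler sees its own value, which also sidesteps the overwritten-variable problem.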
I did a couple of projects with node.js and I'm aware of the async behaviour and that one should usually use callback functions, etc. But one thing that bothers me is the following.
I'm developing an Alexa skill and I have a function that handles the User intent:
'MyFunction': function() {
var toSay = ""; // Holds info what Alexa says
// Lot of checks and calculations what needs to be said by Alexa (nothing special)
if(xyz) {
toSay = "XYZ";
}else if(abc) {
toSay = "ABC";
}else{
toSay = "Something";
}
// Here is the "tricky" part
if(someSpecialEvent) {
toSay += " "+askDatabaseForInput(); // Add some information from database to string
}
this.emit(':ask', toSay, this.t('REPROMT_SPEECH')); // Gives the Info to Alexa (code execution stops here)
}
As mentioned in the code, there is some code which is usually used to find out what the output to Alexa should be.
Only on rare events, "someSpecialEvent", I need to query the database and add information to the String "toSay".
Querying the DB would look something like:
function askDatabaseForInput() { // The function to query the DB
var params = {
TableName: "MyTable",
OtherValues: "..."
};
// Do the Query
docClient.query(params, function(err, data) {
// Of course here are some checks if everything worked, etc.
var item = data.Items[0];
return item; // Item SHOULD be returned
});
return infoFromDocClient; // Which is, of course not possible
}
Now I know that in the first function "'MyFunction'" I could just pass the variable "toSay" down to the DB function and then to the DB query, and if everything is fine, I would do the "this.emit()" in the DB query function. But to me this looks very dirty and not very reusable.
So is there a way I can use "askDatabaseForInput()" to return DB information and just add it to a String? This means making the asynchronous call synchronous.
Making a synchronous call wouldn't affect the user experience, as the code isn't doing anything else anyway and it just creates the String and is (maybe) waiting for DB input.
Thanks for any help.
So you could do 2 things:
Like the person who commented says you could use a callback:
function askDatabaseForInput(callback) {
var params = {
TableName: "MyTable",
OtherValues: "..."
};
docClient.query(params, function(err, data) {
if (err) {
callback(err, null)
} else {
var item = data.Items[0];
callback(null, item);
}
});
}
or you could use promises:
function askDatabaseForInput() {
var params = {
TableName: "MyTable",
OtherValues: "..."
};
return new Promise(function (resolve, reject) {
docClient.query(params, function(err, data) {
if (err) {
reject(err)
} else {
var item = data.Items[0];
resolve(item);
}
});
});
}
You can then either pass a callback function where you call askDatabaseForInput, or call askDatabaseForInput().then(...).
In the callback or the .then handler you would append what you retrieved from the database to the variable toSay.
hope this helps
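For what it's worth, here is a minimal sketch of how the promise version might be wired into the handler. The docClient stub, the buildSpeech helper and the tiny context object with emit are all hypothetical, added only so the snippet runs on its own; in the real skill, docClient and this.emit come from the SDKs.

```javascript
// Hypothetical stub standing in for the real docClient.
var docClient = {
  query: function (params, cb) {
    setImmediate(function () { cb(null, { Items: ["extra info"] }); });
  }
};

function askDatabaseForInput() {
  var params = { TableName: "MyTable" };
  return new Promise(function (resolve, reject) {
    docClient.query(params, function (err, data) {
      if (err) reject(err);
      else resolve(data.Items[0]);
    });
  });
}

// Hypothetical helper: builds the sentence and only touches the database when needed.
function buildSpeech(base, someSpecialEvent) {
  if (!someSpecialEvent) return Promise.resolve(base);
  return askDatabaseForInput().then(function (item) {
    return base + " " + item;
  });
}

var handlers = {
  'MyFunction': function () {
    var self = this;
    // emit is called exactly once, after the optional lookup has finished.
    return buildSpeech("XYZ", true).then(function (toSay) {
      self.emit(':ask', toSay);
    });
  }
};

// Hypothetical minimal context that records what would be handed to Alexa.
var emitted = [];
var done = handlers.MyFunction.call({
  emit: function (type, text) { emitted.push([type, text]); }
});
```

The call stays asynchronous, but the handler keeps a single emit at the end, which is roughly the "just add it to a string" shape the question was after.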
I'd like to write a feature like this:
Scenario: new Singleton create
When a new, unmatchable identity is received
Then a new tin record should be created
And a new bronze record should be created
And a new gold record should be created
which would tie to steps like this:
defineSupportCode(function ({ Before, Given, Then, When }) {
var expect = require('chai').expect;
var chanceGenerator = require('./helpers/chanceGenerator')
var request = require('./helpers/requestGenerator')
let identMap;
// reset identMap before each scenario
Before(function () {
identMap = [];
});
// should generate a valid identity
// persist it in a local variable so it can be tested in later steps
// and persist to the db via public endpoint
When('a new, unmatchable identity is received', function (callback) {
identMap.push(chanceGenerator.identity());
request.pubPostIdentity(identMap[identMap.length-1], callback);
});
// use the local variable to retrieve Tin that was persisted
// validate the tin persisted all the props that it should have
Then('a new tin record should be created', function (callback) {
request.pubGetIdentity(identMap[identMap.length-1], callback);
// var self = this;
// request.pubGetIdentity(identMap[identMap.length-1], callback, () => {
// console.log('never gets here...');
// self.callback();
// callback();
// });
// request.pubGetIdentity(identMap[identMap.length-1], (callback) => {
// console.log('never gets here...');
// self.callback();
// callback();
// });
});
The issue that I'm having is that I can't do anything in the Then callback. That is where I'd like to be able to verify the response has the right data.
Here are relevant excerpts from the helper files:
var pubPostIdentity = function (ident, callback) {
console.log('pubIdentity');
var options = {
method: 'POST',
url: 'http://cucumber.utu.ai:4020/identity/' + ident.platform + '/' + ident.platformId,
headers: {
'X-Consumer-Custom-Id': ident.botId + '_' + ident.botId
},
body: JSON.stringify(ident)
};
console.log('ident: ', ident);
request(options, (err, response, body) => {
if (err) {
console.log('pubPostIdentity: ', err);
callback(err);
}
console.log('pubPostIdentity: ', response.statusCode);
callback();
});
}
// accept an identity and retrieve from staging via identity public endpoint
var pubGetIdentity = function (ident, callback) {
console.log('pubGetIdentity');
var options = {
method: 'GET',
url: 'http://cucumber.utu.ai:4020/identity/' + ident.platform + '/' + ident.platformId,
headers: {
'X-Consumer-Custom-Id': ident.botId + '_' + ident.botId
}
};
request(options, (err, response) => {
if (err) {
console.log('pubGetIdentity: ', err);
callback(err);
}
console.log('pubGetIdentity: ', response.body);
callback();
});
}
Something that we are considering as an option is to re-write the feature to fit a different step definition structure. If we re-wrote the feature like this:
Scenario: new Singleton create
When a new, unmatchable 'TIN_RECORD' is received
Then the Identity Record should be created successfully
When the Identity Record is retrieved for 'tin'
Then a new 'tin' should be created
When the Identity Record is retrieved for 'bronze'
Then a new 'bronze' should be created
When the Identity Record is retrieved for 'gold'
Then a new 'gold' should be created
I believe it bypasses the instep callback issue we are wrestling with, but I really hate the breakdown of the feature. It makes the feature less readable and comprehensible to the business.
So... my question, the summary feature presented first, is it written wrong? Am I trying to get step definitions to do something that they shouldn't? Or is my lack of Js skills shining bright, and this should be very doable, I'm just screwing up the callbacks?
Firstly, I'd say your rewritten feature is wrong. You should never go back in the progression Given, When, Then. You are going back from the Then to the When, which is wrong.
Given is used for setting up preconditions. When is used for the actual test. Then is used for the assertions. Each scenario should be a single test, so should have very few When clauses. If you want, you can use Scenario Outlines to mix several very similar tests together.
In this case, I'd recommend taking it back to first principles and seeing if that works. Then build up slowly to get it working.
I suspect in this case that the problem is in some exception being thrown that isn't handled. You could try rewriting it to use promises instead, which will then be rejected on error. That gives better error reporting.
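A minimal sketch of the promise idea, under some assumptions: fakeRequest is a hypothetical stand-in for the request module, the localhost URL is made up, and thenTinRecordCreated is just the body a Then step could return (cucumber-js accepts a returned promise instead of a callback). Note the early return on error: in the callback helpers, execution continues past callback(err) and reads response.statusCode or response.body on an undefined response, which would throw.

```javascript
// Hypothetical stand-in for the `request` module, so the sketch runs offline.
function fakeRequest(options, cb) {
  setImmediate(function () {
    if (options.url.indexOf('/missing') !== -1) return cb(new Error('not found'));
    cb(null, { statusCode: 200, body: JSON.stringify({ platform: 'web' }) });
  });
}

// Promise-returning helper: errors reject, success resolves with the response.
function pubGetIdentity(ident) {
  var options = {
    method: 'GET',
    url: 'http://localhost/identity/' + ident.platform + '/' + ident.platformId
  };
  return new Promise(function (resolve, reject) {
    fakeRequest(options, function (err, response) {
      if (err) return reject(err); // stop here; response is undefined on error
      resolve(response);
    });
  });
}

// Shape of a step body: return the promise and assert inside .then().
function thenTinRecordCreated(ident) {
  return pubGetIdentity(ident).then(function (response) {
    var body = JSON.parse(response.body);
    if (body.platform !== ident.platform) throw new Error('platform mismatch');
    return body;
  });
}

var ok = thenTinRecordCreated({ platform: 'web', platformId: '1' });
var failed = pubGetIdentity({ platform: 'web', platformId: 'missing' })
  .catch(function (err) { return 'rejected: ' + err.message; });
```

With this shape the verification lives inside the step, and a thrown assertion or a failed request both surface as a rejected promise.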
I was recently building a scraper module to get some information with nodejs until I encountered this "little" problem. The modules that I'm using are cheeriojs and request.
Actually the module works like a charm if I call only one method at a time. It contains three functions and only two of them are exported. This is the code:
'use strict';
var request = require('request'),
cheerio = require('cheerio'),
counter = 0;
function find(term, cat, callback) {
// All the check for the parameters
scrape("http://.../search.php?search=" + encodeURIComponent(term), cat, callback);
}
function last(cat, callback) {
// All the check for the parameters
scrape("http://google.com/", cat, callback);
}
function scrape(url, cat, callback) {
request(url, function (error, response, body) {
if (!error && response.statusCode == 200) {
var $ = cheerio.load(body);
var result = [];
var items = $('.foo, .foo2').filter(function() {
// Condition to filter the resulted items
});
items.each(function(i, row) {
// Had to do another request inside here to scrape other information
request( $(".newpagelink").attr("href"), function(error, response, body) {
var name = $(".selector").text(),
surname = $(".selector2").text(),
link = cheerio.load(body)('.magnet').attr('href'); // This is the only thing that I'm scraping from the new page, the rest comes from the other "cheerio.load"
// Push an object in the array
result.push( { "name": name, "surname": surname, "link": link } );
// To check when the async requests are ended
counter++;
if(counter == items.length-1) {
callback(null, result);
}
});
});
}
});
}
exports.find = find;
exports.last = last;
The problem now, as I was saying, is that if I create a new node script "test.js" and I call only last OR find, it works perfectly! But if I call both the methods consecutively like this:
var mod = require("../index-tmp.js");
mod.find("bla", "blabla", function(err, data) {
if (err) throw err;
console.log(data.length + " find");
});
mod.last(function(err, data) {
console.log(data.length + " last");
});
The results are completely messed up: sometimes the script doesn't print anything at all, other times it prints the result of only "find" or only "last", and other times it returns a cheeriojs error (I won't add it here so as not to confuse you, because it's probably my script's fault). I also tried duplicating the scrape function so that each method has its own copy, but the same problems occur. I don't know what else to try; I hope you can tell me the cause of this behavior!
Your counter variable is global, not specific to each scrape call. It wouldn't work if you called find twice at the same time either, or last.
Move the declaration and initialisation of var counter = 0; into the scrape function, or even better right next to the result and items declarations.
From scanning your code quickly, this is probably due to the variable counter being global. These are asynchronous functions, so they will both act on counter at the same time. Move the declaration inside of the scrape function.
If you need more information about asynchronous programming, refer to Felix's great answer in this question.
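To illustrate, a small sketch with the counter declared inside scrape. fetchDetail is a hypothetical stand-in for the inner request() call, and scrape takes a plain array here instead of a cheerio selection. The sketch also compares against items.length rather than items.length - 1, on the assumption that the callback should only fire once every response is in.

```javascript
// Hypothetical async lookup standing in for the inner request() call.
function fetchDetail(item, cb) {
  setTimeout(function () { cb(null, item.toUpperCase()); }, Math.random() * 10);
}

function scrape(items, callback) {
  var result = [];
  var counter = 0; // local: each scrape() call gets its own counter
  if (items.length === 0) return callback(null, result);
  items.forEach(function (item, i) {
    fetchDetail(item, function (err, detail) {
      if (err) console.error(err);
      result[i] = detail; // keep the original order regardless of arrival order
      counter++;
      if (counter === items.length) callback(null, result);
    });
  });
}

// Two overlapping calls no longer interfere with each other.
var output = {};
scrape(["a", "b", "c"], function (err, data) { output.find = data; });
scrape(["x", "y"], function (err, data) { output.last = data; });
```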
I'm very new to Node.js, JavaScript in general, and also functional programming (which Node is, if I'm not mistaken?).
I'm currently working through the learnyounode tutorials.
I know I can find all the solutions and work it out just fine, but I'm a little curious why my code wouldn't work...
If anyone is familiar with learnyounode, I'm stuck at "Juggling async".
The code that i wrote:
var http = require("http");
var addriee = [process.argv[2], process.argv[3], process.argv[4]];
function getStuffFromNet(address, callback) {
http.get(address, function getShitDone(response) {
var dataToCallback = "";
response.on("error", function(data) {
callback(data, null);
});
response.on("data", function(data) {
dataToCallback+=data;
});
response.on("end", function(data) {
callback(null, dataToCallback);
});
});
};
function printToConsole(data) {
console.log(data);
}
printToConsole(getStuffFromNet(addriee[0]));
My goal was to reuse function that would get "stuff from net", the error i get is:
learnyounode run http-get3.js
undefined
/home/ubuntu/workspace/learnyounode/http-get3.js:17
callback(null, dataToCallback);
^
TypeError: undefined is not a function
at IncomingMessage.<anonymous> (/home/ubuntu/workspace/learnyounode/http-get3.js:17:7)
at IncomingMessage.emit (events.js:117:20)
at _stream_readable.js:944:16
at process._tickCallback (node.js:442:13)
Why is the last callback null and not data ?
Also, it might be handier to initialize the variable not as
var dataToCallback = "";
but as
var dataToCallback;
because otherwise you can't check whether any data has arrived with
typeof dataToCallback !== 'undefined'
Not sure about 's atm.
Also try to comment your code a lot more, especially while you're learning.
Example of some debugging level I have (noob or not I quickly find errors this way)
/**
* Divest the desired amount
*/
socket.on("divest", function (amount) {
error.debug(classname + "Divest is called [" + amount + "]");
invest.divest(hash, amount, function (err, callback) {
if (!err) {
error.debug(uid, name + " />divesting [CBACK]" + callback);
} else {
error.debug(uid, name + " />divesting [ERROR]" + err);
}
socket.emit("done", true);
});
});
Hope I helped.
To explain your situation: the data was read to the end and "callback" was invoked,
but no callback was passed to getStuffFromNet on the last line of your script, so "callback" was undefined.
If you wonder why the response.on("error", ...) handler wasn't triggered: it is only triggered by an error while reading the response. You were able to read data from the URLs, so only the response.on("data", ...) and response.on("end", ...) handlers fired.
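A minimal sketch of the corrected call, where the result is received in a callback instead of being taken from the return value. The local throwaway server is an assumption added only so the snippet runs without network access; in the exercise the addresses come from process.argv. Connection errors are reported on the request object returned by http.get, so the error handler is attached there.

```javascript
var http = require("http");

function getStuffFromNet(address, callback) {
  http.get(address, function (response) {
    var dataToCallback = "";
    response.setEncoding("utf8");
    response.on("data", function (chunk) { dataToCallback += chunk; });
    response.on("end", function () { callback(null, dataToCallback); });
  }).on("error", function (err) {
    // Connection-level failures are emitted on the request object.
    callback(err, null);
  });
}

// Throwaway local server (assumption) so the sketch runs offline.
var server = http.createServer(function (req, res) {
  res.setHeader("Connection", "close");
  res.end("hello");
});

var received;
server.listen(0, function () {
  var address = "http://127.0.0.1:" + server.address().port + "/";
  // Pass a callback; getStuffFromNet itself returns nothing useful.
  getStuffFromNet(address, function (err, data) {
    if (err) return console.error(err);
    console.log(data);
    received = data;
    server.close();
  });
});
```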
I am wondering why trying to run the following test suite fails when I try to delete the table I have stored entities in. The error I get is the following
1) Azure Storage cloud storage operations "after all" hook:
Error: The specified resource does not exist. RequestId:3745d709-fa5e-4a2b-b517-89edad3efdd2
Time:2013-12-03T22:26:39.5532356Z
If I comment out the actual insertion of data it fails every other time, and if I try to do the insertion of data it fails every time with an additional "The table specified does not exist.".
For the first case this seems to indicate that there is some kind of delay in the table creation, so in every other test it is successful, and for the second case it seems to indicate that even though my callbacks are being called after table creation, the table(s) still aren't ready for data insertion.
The test suite and associated code looks like this:
describe('cloud storage operations', function () {
var storage;
before(function (done) {
this.timeout(5000);
storage = AzureStorage.usingTable('TEST', done);
});
after(function (done) {
storage.deleteTable(done);
});
it('should store without trouble', function (done) {
storage.save(factory.createChangeSet()).then(done, done);
});
});
... // snipped from azure.js
var AzureStorage = function (storageClient, tableName, callback) {
assert(storageClient && tableName && partitionKey, "Missing parameters");
this.storageClient = storageClient;
this.tableName = tableName;
var defaultCallback = function (err) { if (err) { throw error; } };
this.storageClient.createTableIfNotExists(this.tableName, function () {
callback();
} || defaultCallback);
};
AzureStorage.usingTable = function (tableName, callback) {
return new AzureStorage(
azure.createTableService(accountName, accountKey)
, tableName
, callback
);
};
AzureStorage.prototype.deleteTable = function (callback) {
this.storageClient.deleteTable(this.tableName, callback);
};
I've hit this using the C# library as well, but I'm pretty sure the error message there indicated that the table could not be created because an operation was still in progress for a table of the same name. Thinking of the backend supporting storage, it makes sense that it would not be instant. The table needs to be removed from the 3 local replicas as well as the replicas in the paired data center.
With that kind of async operation, it is going to be challenging to build tables up and tear them down fast enough for tests.
A workaround might be to increment a value appended to the "TEST" table name that would be unique to that test run.
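A small sketch of that workaround, assuming a hypothetical helper named uniqueTableName. It appends a base-36 timestamp plus a per-process counter, which keeps the name alphanumeric and starting with a letter, as Azure table names need to be.

```javascript
var runCounter = 0;

// Hypothetical helper: builds a table name that differs on every call.
function uniqueTableName(prefix) {
  runCounter += 1;
  // Base-36 timestamp separates test runs; the counter separates calls within a run.
  return prefix + Date.now().toString(36) + "n" + runCounter;
}

var first = uniqueTableName("TEST");
var second = uniqueTableName("TEST");
console.log(first, second);
```

In the suite above it might be used as AzureStorage.usingTable(uniqueTableName('TEST'), done), so a table that is still being deleted from a previous run never collides with the new one.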