Having problems structuring Cheerio scraping


I think this may be just basic syntax. I'm coming from Java and am very new to JavaScript. For example, when I see a $ in all the examples, my mind goes blank.
The code for parsing the HTTP response (which contains a bunch of dog shows) looks like this (using the request library):
function parseRequest1(error, response, body) {
// TODO should check for error...
var Cheerio = require('cheerio');
parser = Cheerio.load(body);
var table2 = parser('.qs_table[bgcolor="#71828A"]');
var showList = [];
// skip over a bunch of crap to find the table. Each row with this BG color represents a dog show
var trows = parser('tr[bgcolor="#FFFFFF"]', table2);
trows.each(function(i, tablerow) {
var show = parseShow(tablerow);
if (show) // returns a null if something went wrong
showList.push(show);
});
// then do something with showList...
}
which is called by
Request.get(URL, parseRequest1);
So far, so good. Where I'm stuck is in how to write the parseShow function. I'd like to go something like
function parseShow(tableRow) {
var tds = parser('td', tableRow);
//and then go through the tds scraping info...
}
but I get an error:
TypeError: Object #<Object> has no method 'find'
at new module.exports (C:\Users\Morgan\WebstormProjects\agility\node_modules\cheerio\lib\cheerio.js:76:18)
at exports.load.initialize (C:\Users\Morgan\WebstormProjects\agility\node_modules\cheerio\lib\static.js:19:12)
at parseShow (C:\Users\Morgan\WebstormProjects\agility\routes\akc.js:20:15)
Looking at the stack trace, it looks like Cheerio is creating a new one. How am I supposed to pass the Cheerio parser down to the second function? Right now parser is a global var in the file.
I've tried a bunch of random things like these but they don't work either:
var tds = tableRow('td');
var tds = Cheerio('td', tableRow);
What I'm forced to do instead is write a bunch of disgusting, fragile code accessing tableRow.children[1], tableRow.children[3], etc. (the HTML has \r\n sequences all over creation, so many of the children are whitespace).

I know what you mean about the $(..). The $ is just a function name. I think it was chosen because it's short and catches the eye.
With Cheerio, and more generally jQuery, it is called with CSS selectors:
var table2 = $('.qs_table[bgcolor="#71828A"]');
The advantage of this is that table2 is now a selection object, which has a .find() method that can be called.
In jQuery (I'm not so sure about Cheerio), the selection object is also a collection, so it can hold multiple matched elements (or none).
The object model in JavaScript is a lot more dynamic than Java's, which can lead to much shorter, if more confusing, code.
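To make the point about $ concrete, here is a minimal sketch in plain JavaScript. The toy $ function and its filter and each methods are invented for illustration; this is not how Cheerio or jQuery are implemented. It only shows that $ is an ordinary identifier that can name a function returning a collection-like object.

```javascript
// `$` is a legal identifier in JavaScript, so it can name an ordinary function.
// This toy `$` is hypothetical and is NOT Cheerio or jQuery.
function $(items) {
  return {
    length: items.length,
    // Return a new wrapper holding only the items that pass the test.
    filter: function (test) {
      return $(items.filter(test));
    },
    // Call fn(index, item) for every item, mirroring the argument order
    // used by the .each() callbacks in the question.
    each: function (fn) {
      items.forEach(function (item, i) { fn(i, item); });
      return this;
    }
  };
}

var seen = [];
$([1, 2, 3, 4])
  .filter(function (n) { return n % 2 === 0; })
  .each(function (i, n) { seen.push(n); });

console.log(seen); // the even numbers, in order
```

Coming from Java, it may help to read $(...) as a static factory method with a very short name.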
The code to parse table rows:
$('tr[bgcolor="#FFFFFF"]').each(function(i, tablerow) {
  // tablerow is a raw element, so wrap it before calling .text()
  var show = $(tablerow).text();
  if (show) // skip rows that contain no text
    showList.push(show);
});
In your code above, parser(..) is used rather than $(..). However, once the object has been loaded with the body, you can just keep using it:
parser('tr[bgcolor="#FFFFFF"]').each(function(i, tablerow) {
or, to find only the rows of the table you want, use the following:
parser('.qs_table[bgcolor="#71828A"] tr[bgcolor="#FFFFFF"]').each(function(i, tablerow) {
The selector is CSS, so this will find all tr[bgcolor="#FFFFFF"] elements that are descendants of the .qs_table[bgcolor="#71828A"] element.
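As for passing the parser down to parseShow, one option is to hand it over as an argument instead of keeping it in a file-level variable. The sketch below assumes that calling the loaded function on a raw element wraps it, as in jQuery, so that .find() becomes available. The return convention of parseShow (an array of cell texts, or null) is made up for illustration.

```javascript
// Extract the text of every cell in one table row.
// `$` is the function returned by cheerio.load(); `tableRow` is the raw
// element that .each() hands to its callback.
function parseShow($, tableRow) {
  var cells = [];
  // Wrap the raw element first, then search inside it.
  $(tableRow).find('td').each(function (i, td) {
    cells.push($(td).text().trim());
  });
  // Hypothetical convention: null signals "nothing usable in this row".
  return cells.length > 0 ? cells : null;
}

function parseRequest1(error, response, body) {
  if (error) {
    console.error(error);
    return;
  }
  var cheerio = require('cheerio'); // loaded only when a response arrives
  var $ = cheerio.load(body);
  var showList = [];
  $('.qs_table[bgcolor="#71828A"] tr[bgcolor="#FFFFFF"]').each(function (i, tableRow) {
    var show = parseShow($, tableRow);
    if (show) {
      showList.push(show);
    }
  });
  // Returned only so the sketch is easy to try out; do something with
  // showList here instead if that suits the surrounding code better.
  return showList;
}
```

Because parseShow receives $ explicitly, it no longer depends on a global parser variable and can be tried against any loaded document.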

Related

Accessing prototype function from another Javascript file keeps returning 'undefined'

I've dug into this for a couple of hours, looking at Javascript prototype accessing another prototype function, Accessing a Javascript prototype function, Trigger one prototype function from another (including itself), Cannot call prototype method from another function and around 3-4 other similar questions, and thought "ok, that doesn't seem so bad" and went to implement a solution (or three) to my particular problem. Or so I had thought!
I have a JS file (compiled from Typescript) that contains an AppComponent class and several methods with it (shortened version focused on my specific trouble follows):
AppComponent = (function () {
function AppComponent() {
var _this = this;
this.gridNo = '1';
//... and so on...
}
AppComponent.prototype.MenuSelect = function (link) {
this.tabCount = 0;
this.tables = [];
utils_1.Logging(' MenuSelect: ' + JSON.stringify(link));
var grids = link.grids;
this.ws.emit('C:GDRDN', { ds: grids });
// build up some HTML to make a table of data and return it to
// the caller
return "grid stuff!";
};
.
.
.
}
The above is loaded up into Angular 2/Node (written by another co-worker) and works just fine in the context it was written: ie it displays tables of data ('grids') when called from other components written by that co-worker in TypeScript.
But when I am generating a menu and try to access the MenuSelect prototype directly from another, 'normal', JS file like so...
function createWHeelNavigation() {
basic.navigateFunction = function () {
var grids_selected = [ 4, 11 ];
var appcomp = new AppComponent();
output = appcomp.MenuSelect(grids_selected);
// minified.js function to add children content to a DOM element
$("grid_container").add(output);
}
// other navigation menu functions...
}
createWHeelNavigation();
...I continue to get "Uncaught ReferenceError: AppComponent is not defined" when I click on that particular 'basic' menu item, even though according to what I've read in SO and elsewhere that creating a 'new' instance of the object is the way to access its prototype methods.
So before I pull my hair out and go back to rocking in the corner of my office, whispering "mommy...", I thought I would pass this around to you fine people to see where I am going wrong. I have a niggling suspicion I should be using 'this' somewhere, but my eyes are crossing, and wish to be pointed in the right direction. Thanks for your time!

I continue to get "Uncaught ReferenceError: AppComponent is not defined"
This is a common JavaScript load-order issue. Make sure your JS/TS files are loaded in the right order, so that the file defining AppComponent runs before the file that uses it.
More
Please use modules if possible: https://basarat.gitbooks.io/typescript/content/docs/tips/outFile.html
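The ordering problem can be reproduced without a browser. In the sketch below, two functions stand in for the two script files; loadAppComponentFile and runMenuFile are hypothetical names, and the AppComponent body is cut down to the one method from the question.

```javascript
// Stand-in for the compiled file that defines AppComponent.
function loadAppComponentFile() {
  function AppComponent() {
    this.gridNo = '1';
  }
  AppComponent.prototype.MenuSelect = function (link) {
    return 'grid stuff!';
  };
  // Publish the constructor globally so other files can see it.
  globalThis.AppComponent = AppComponent;
}

// Stand-in for the menu file that uses AppComponent.
function runMenuFile() {
  var appcomp = new AppComponent();
  return appcomp.MenuSelect({ grids: [4, 11] });
}

var firstAttempt;
try {
  runMenuFile(); // runs before AppComponent exists
} catch (e) {
  firstAttempt = e.name;
}

loadAppComponentFile();
var secondAttempt = runMenuFile();

console.log(firstAttempt);  // ReferenceError
console.log(secondAttempt); // grid stuff!
```

If the compiled file is a module rather than a plain script, AppComponent never becomes a global at all, which is one more reason to import it explicitly.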

Why can a MongoDb cursor be indexed as if it was an array?

I noticed that if I execute a JavaScript script using the mongo command, the script can treat a cursor object as if it was an array.
var conn = new Mongo('localhost:27017');
var db = conn.getDB('learn');
db.test.remove({});
db.test.insert({foo: 'bar'});
var cur = db.test.find();
print(cur[0].foo); //prints: bar
print(cur[1]); // prints: undefined
This seems like it should be beyond the capabilities of the JavaScript language, since there is no way to "overload the subscript operator". So how does this actually work?

As the documentation says, this is a special ability of the mongo shell. It automagically converts cursor[0] to cursor.toArray()[0]. You can prove it by overriding toArray() with a function that prints something, or one that reads new Error().stack to get the call stack. Here it is:
at DBQuery.a.toArray ((shell):1:32)
at DBQuery.arrayAccess (src/mongo/shell/query.js:290:17)
at (shell):1:2
As you can see, indexing calls arrayAccess. How? Here we have a dbQueryIndexAccess function, which calls arrayAccess.
v8::Handle<v8::Value> arrayAccess = info.This()->GetPrototype()->ToObject()->Get(
v8::String::New("arrayAccess"));
...
v8::Handle<v8::Function> f = arrayAccess.As<v8::Function>();
...
return f->Call(info.This(), 1, argv);
And here is the code that sets this function as the indexed property handler. The V8 API lets us add such a handler:
DBQueryFT()->InstanceTemplate()->SetIndexedPropertyHandler(dbQueryIndexAccess);
... and injects it into the JS cursor class, which is originally defined in JS.
injectV8Function("DBQuery", DBQueryFT(), _global);
TL;DR: it is hacked into the C++ source code of the mongo shell.
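For comparison, since ES2015 a Proxy can intercept property reads in plain JavaScript, so a similar effect no longer needs engine-level code. The sketch below is not how the mongo shell does it (that is the C++ handler shown above); makeCursor is a hypothetical helper that forwards numeric indexes to toArray().

```javascript
// Hypothetical cursor-like object whose numeric indexes are forwarded
// to toArray(), loosely imitating the shell's arrayAccess behaviour.
function makeCursor(docs) {
  var cursor = {
    toArray: function () { return docs.slice(); }
  };
  return new Proxy(cursor, {
    get: function (target, prop, receiver) {
      // Property keys arrive as strings (or symbols); treat "0", "1", ...
      // as array indexes and pass everything else through untouched.
      if (typeof prop === 'string' && /^\d+$/.test(prop)) {
        return target.toArray()[Number(prop)];
      }
      return Reflect.get(target, prop, receiver);
    }
  });
}

var cur = makeCursor([{ foo: 'bar' }]);
console.log(cur[0].foo); // bar
console.log(cur[1]);     // undefined
```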

Alfresco JavaScript - How to get list of allowed values of a node property?

I'm doing something like this:
document.properties["my:customProperty"] = getSomehowTheProperty(document);
my:customProperty is a string, which has some allowed values in the content model.
How can I get the allowed values from the content model, so that I don't have to store them in a JavaScript array inside the script?
Or how else can I check, that the function getSomehowTheProperty returned an allowed value?
I tried to wrap it with try-catch:
try {
document.properties["my:customProperty"] = getSomehowTheProperty(document);
document.save();
} catch (e) {
document.properties["my:customProperty"] = "Default Value";
document.save();
}
But it looks like integrity is checked and the error is thrown at the end of executing the script, not inside the try block.
Googling "alfresco js allowed values of node properties" and similar queries gives me nothing.

To get that sort of information, you'll have to use the DictionaryService to get the PropertyDefinition.
Off the top of my head, you'll want to do something like:
QName customPropertyQ = QName.createQName("my:customProperty", namespaceService);
PropertyDefinition customPropertyT = dictionaryService.getProperty(customPropertyQ);
// getConstraints() returns the constraint definitions for the property; the
// allowed values are held by the list-of-values constraint among them
List<ConstraintDefinition> constraints = customPropertyT.getConstraints();
That would be in Java; see this blog post for details on how to work with the DictionaryService from JavaScript.

initialise timezoneJS with JSON files

I'm working with timezone-js: https://github.com/mde/timezone-js. I have a list of predefined timezones I want to work with. So I pre-parsed JSON Data of those timezones.
But how exactly am I supposed to use this data?
var _tz = timezoneJS.timezone;
_tz.loadingScheme = _tz.loadingSchemes.MANUAL_LOAD;
_tz.loadZoneJSONData('/major_cities.json', true);
I can read the data, like here. But how am I supposed to use the _tz variable to initialise timezoneJS?
I'm thinking that I'm supposed to do something like this first:
timezoneJS.timezone.loadZoneDataFromObject(_tz);
And then initialise timezoneJS. But if I initialise it now, I get an error that it can't find the default timezone: Uncaught Error: Error retrieving "null/northamerica" zoneinfo files, probably because I've supplied the JSON data.
I'd like to know what to do to use the JSON file, so that I can create timezoneJS.Date objects.

First of all, _tz is just a shortcut that saves writing timezoneJS.timezone in the subsequent lines.
Now, there are two options. If your file major_cities.json is on the server and you want to initialize timezoneJS, you just have to do what you wrote:
var _tz = timezoneJS.timezone;
_tz.loadingScheme = _tz.loadingSchemes.MANUAL_LOAD;
_tz.loadZoneJSONData('/major_cities.json', true);
and you're all set. The second option is that you already have an object containing the data from that file. In that case you should use loadZoneDataFromObject instead of loadZoneJSONData, namely:
var _tz = timezoneJS.timezone;
_tz.loadingScheme = _tz.loadingSchemes.MANUAL_LOAD;
_tz.loadZoneDataFromObject(majorCitiesObject);
After that, you should not call the init function, since timezoneJS has already been initialized by loadZoneJSONData. If you want to create a date, just call new timezoneJS.Date(). The following lines should give you a hint:
var timezoneName = 'Europe/London';
var newDate = new timezoneJS.Date(timezoneName);
console.log(newDate.toString());
console.log(newDate.toISOString());
The result should be something like:
2013-04-08 11:56:33
2013-04-08T10:56:33.019Z

How to add functions to some jQuery objects, but not others?

Let's say I have a <ul> list:
<ul class="products">
...
</ul>
I want to select it with jQuery, then add some functions to that object. For example, I'd like to add an addProduct(productData) function and a deleteProduct(productId) function. However, I'd like the functions to only be added to the object that's returned by the selector. So for example, something like this:
var productList = $.extend($('ul.products'), {
addProduct: function(productData) {
// add a new li item
},
deleteProduct: function(productId) {
// delete li with id
}
});
How would I do this using jQuery? The key point here is that I only want to add the functions to an instance returned by a jQuery selector. In other words, I don't want to modify jQuery's prototype or create a plugin, since those will make the functions available across everything, whereas I only want to add the functions to one specific instance.

If you only want the addProduct and deleteProduct methods to exist on that single jQuery object, then what you've got will work fine; but you'll have to keep a reference to that jQuery object (or only use it once) to preserve the addProduct and deleteProduct methods.
However, these addProduct and deleteProduct methods are unique to that particular jQuery object; they won't exist on any other jQuery objects you create:
var productList = $.extend($('ul.products'), {
addProduct: function(productData) {
// add a new li item
},
deleteProduct: function(productId) {
// delete li with id
}
});
// Using this particular jQuery object (productList) will work fine.
productList.addProduct();
productList.deleteProduct();
// However, addProduct() does not exist on new jQuery objects, even if they use
// the same selector.
$('ul.products').addProduct(); // error; [object Object] has no method 'addProduct'
The best way to do this would be to go back to basics and define separate addProduct and deleteProduct functions, which accept a jQuery object. If you wanted to restrict these functions so that they only work on elements matching the ul.products selector, you could do:
function addProduct(obj) {
  obj = obj.filter('ul.products');
  // do something with `obj` (it's a jQuery object)
}
This approach is recommended because it keeps the jQuery.fn API consistent; otherwise you'd be adding addProduct and deleteProduct to some jQuery objects but not others, or making their usage redundant in others. With this approach, however, addProduct and deleteProduct are always there, but don't get in anyone's way if they don't want to use them.
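A slightly fuller sketch of the standalone-function approach follows. The shape of productData (id and name fields) and the "product-<id>" naming scheme for the li elements are assumptions made for illustration; they do not come from the question.

```javascript
// Plain functions that take the jQuery object as an argument.
// Assumptions (not from the answer): productData has `id` and `name`
// fields, and each <li> gets the id "product-<id>".
function addProduct($list, productData) {
  var li = document.createElement('li');
  li.id = 'product-' + productData.id;
  li.textContent = productData.name; // set as text, never parsed as HTML
  // Ignore anything in the set that is not a ul.products element.
  return $list.filter('ul.products').append(li);
}

function deleteProduct($list, productId) {
  $list.filter('ul.products').find('#product-' + productId).remove();
  return $list;
}

// Usage in a page that has loaded jQuery:
//   var $products = $('ul.products');
//   addProduct($products, { id: 7, name: 'Teapot' });
//   deleteProduct($products, 7);
```

Nothing is attached to jQuery.fn here, so other jQuery objects on the page are unaffected.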
Historical Notes
This answer was originally written in November 2011, when jQuery 1.7 was released. Since then the API has changed considerably. The answer above is relevant to the current 2.0.0 version of jQuery.
Prior to 1.9, a little used method called jQuery.sub used to exist, which is related to what you're trying to do (but won't help you unless you change your approach). This creates a new jQuery constructor, so you could do;
var newjQuery = jQuery.sub();
newjQuery.fn.addProduct = function () {
// blah blah
};
newjQuery.fn.deleteProduct = function () {
// blah blah
};
var foo = newjQuery('ul.products');
foo.deleteProduct();
foo.addProduct();
var bar = jQuery('ul.products');
bar.deleteProduct(); // error
bar.addProduct(); // error
Be careful though, the $ method alias would reference the old jQuery object, rather than the newjQuery instance.
jQuery.sub was removed from jQuery in 1.9. It is now available as a plugin.

You can make your own jQuery methods as follows:
$.fn.addProduct = function(){
var myObject = $(this);
// do something
}
// Call it like:
$("ul.products").addProduct();
This is a bit tricky, though, because you are making methods that are very specific to lists. So, to be safe, you should at least add some checking on the object's type and handle the case where the current object is, let's say, an input element.
An alternative is to write a normal JavaScript function that receives the list as a parameter. That way you can make a more list-specific function.

I think you want to add a function to that DOM Object.
$(function(){
// [0] gets the first object in array, which is your selected element, you can also use .get(0) in jQuery
$("#test")[0].addProduct = function(info){
alert("ID: " + this.id + " - Param: " + info);
};
$("#test")[0].addProduct("productid");
});
The above script will alert "ID: test - Param: productid"
A live example: http://jsfiddle.net/jJ65A/1/
Or normal javascript
$(function(){
document.getElementById("test").addProduct = function(info){
alert(info);
};
});

I think you may just want to use delegate in jQuery:
$(".parentclass").delegate(".childclass","eventname",function(){});
