Click each element then scrape with horseman - javascript

I'm using a node.js module called horseman to scrape some data from a site which contains JavaScript. I'm having trouble figuring out how to click on each span element only if it contains a certain element, a table in this case. Clicking expands the span and reveals the data to scrape, which is hidden until then.
What I have right now
horseman
.open(url)
.click("span.title")
.waitForSelector("span.title")
.then(scrape)
The scrape function:
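function scrape() {
  return new Promise(function (resolve, reject) {
    return getLinks()
      .then(function (newLinks) {
        links = links.concat(newLinks);
        // nothing scraped yet: run scrape again
        if (links.length < 1) {
          return horseman
            .then(scrape);
        }
      })
      .then(resolve)
      // pass failures on instead of leaving the promise pending
      .catch(reject);
  });
}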
And the getLinks() function:
var links = [];
function getLinks() {
return horseman.evaluate(function () {
var links = [];
$("span.title").each(function (item) {
var link = {
title: $(this).text()
};
links.push(link);
});
return links;
});
}
My initial thoughts were that in the getLinks() function I could check if item contains table then click and then scrape, but not sure how to implement it.
The idea is to expand all the span elements, that are not already expanded, which means the data is visible and able to be scraped. I've hit a brick wall on what to do, so any help would be great!

The following code :
horseman
.open(url)
.click("span.title")
.waitForSelector("span.title")
.then(scrape)
...will not work, because the horseman .click() action only addresses a single element. Instead, try the following code, which works on many elements:
horseman
.open(url)
.evaluate(clickItems)
.waitForSelector("span.title XXX")
.then(scrape)
Where:
XXX is a selector for content inside span.title that only exists after the click, so that waitForSelector actually waits. For example, consider this markup:
<span class="title"><!-- this is the clickable item -->
<table>...</table>
<div class="show-on-click">Blah blah</div>
</span>
In the above example, you would use .waitForSelector('span.title .show-on-click'). You have to find a selector that matches nothing until the data appears (or use .wait(1000) instead).
The clickItems function is defined as follows (you already use jQuery, so I will as well):
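function clickItems() {
  // only the span.title elements that contain a table
  var $items = $('span.title:has(table)');
  $items.each(function (index, item) {
    // item is a plain DOM element, so wrap it before triggering the click
    $(item).click();
  });
}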
Note: As written, the :has(table) selector limits the clicks to the span.title elements that contain a table. You can drop that filter and click every span.title if the other clicks do not do anything.
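The question also asks to skip spans that are already expanded. Here is a minimal sketch of that check, kept free of jQuery and horseman so the logic can be read on its own. The names clickCollapsedItems, hasTable, isExpanded and fakeItem are hypothetical, and the fake items only stand in for DOM elements:

```javascript
// Hypothetical helper: clicks every item that contains a table and is not
// expanded yet, and returns how many items were clicked.
// `hasTable` and `isExpanded` are caller-supplied predicates (assumptions).
function clickCollapsedItems(items, hasTable, isExpanded) {
  var clicked = 0;
  items.forEach(function (item) {
    if (hasTable(item) && !isExpanded(item)) {
      item.click();
      clicked += 1;
    }
  });
  return clicked;
}

// Stand-in for a DOM element, only for illustration.
function fakeItem(table, expanded) {
  return {
    table: table,
    expanded: expanded,
    click: function () { this.expanded = true; }
  };
}

var items = [fakeItem(true, false), fakeItem(true, true), fakeItem(false, false)];
var clickedCount = clickCollapsedItems(
  items,
  function (item) { return item.table; },
  function (item) { return item.expanded; }
);
console.log(clickedCount); // 1: only the first item has a table and is collapsed
```

Inside the page (for example in the function passed to .evaluate()), hasTable could be function (el) { return el.querySelector('table') !== null; }, and isExpanded a test for whatever content only exists once the span is open.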

Related

I have a list of buttons with the same class name but different inner text, how do I get the value of the text on click?

My list is being populated with this block of code:
function addToHistory(cityName) {
let searchHistory = JSON.parse(localStorage.getItem("Weather Search History")) || [];
searchHistory.push(cityName);
localStorage.setItem("Weather Search History", JSON.stringify(searchHistory));
};
function updateHistory() {
let searchHistory = JSON.parse(localStorage.getItem("Weather Search History")) || [];
$("#searchHistory").html(searchHistory
.map(searchHistoryList => {
return (`<li><button class="btn btn-link"> ` + searchHistoryList + `</button></li>`);
})
.join(""));
};
and that works great. It pulls from an array in local storage that grows each time the user enters a search term, then populates the site's sidebar with that list.
However, I'm not sure how to read the text values of the buttons so that I can manipulate them.
I currently have:
$('#searchHistory').on('click', function () {
console.log($(???).val());
});
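You want .text() (jQuery) or innerText (plain JavaScript) rather than .val(). Because the handler is bound to #searchHistory, this refers to the whole list there; pass a selector to .on() so that this is the button that was clicked. You can also use event.target.
$('#searchHistory').on('click', 'button', function () {
  console.log($(this).text().trim());
});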
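The same idea in plain JavaScript, as a rough sketch: bind one listener on the list and work out which button was clicked by walking up from event.target. findClickedButton is a hypothetical helper name, and the browser part is skipped when no DOM is available:

```javascript
// Hypothetical helper: starting at the node that received the click, walk up
// the parent chain until a <button> is found or the container is reached.
function findClickedButton(target, container) {
  var node = target;
  while (node && node !== container) {
    if (node.tagName === 'BUTTON') {
      return node;
    }
    node = node.parentNode;
  }
  return null;
}

// Browser usage (skipped when there is no DOM, e.g. under Node.js).
if (typeof document !== 'undefined') {
  var historyList = document.getElementById('searchHistory');
  if (historyList) {
    historyList.addEventListener('click', function (event) {
      var button = findClickedButton(event.target, historyList);
      if (button) {
        console.log(button.textContent.trim());
      }
    });
  }
}
```

Because the listener sits on the list, it keeps working after updateHistory() replaces the list's contents.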
Try this in your function:
console.log($(this).html());
"this" refers to the element the handler runs for. jQuery has no innerHTML() method, so use .html() or the DOM property this.innerHTML.

Recreate Select Control Options Dynamically jQuery

I have the following function which I use to populate a Select control with options. I am grabbing values from objects on the document, and if a condition is met, throwing another value into a Select Control as an option...
function dispatchList() {
//grab list element
var list = document.getElementById("techName");
//foreach div assigned the .square class,
$('.square').each(function () {
//convert each div with .square class toString
var square = $(this).html().toString();
//grab availability value
var availability = $(this).find('tr:eq(4)').find('td').text();
//grab IP
var online = $(this).find('tr:eq(3)').find('td').text()
//if availability and IP values meet below condition...
if ((availability === "True") && (online.indexOf("10.") === 0)) {
//grab the name value from this div
var availableName = $(this).find('tr:eq(0)').find('td').text();
//create a new option element
var item = document.createElement("option");
//create a new text node containing the name of the tech
item.appendChild(document.createTextNode(availableName));
//append the new text node (option) to our select control
list.appendChild(item);
}
})
}
This function works great, but it runs when the document is ready. I need it to run when the document is ready, but also to recreate this list without refreshing the page. Ideally the select control could be emptied and recreated with a click event on a div.
This is the part I have struggled with. I have the following click event which it would make sense to chain this to, but I have not been able to work it out...
function availability() {
//for each element with a class of .square...
$('.square').each(function () {
//grab the id of each input element (button) contained in each .square div...
var btnId = $(this).find("input").attr("id");
//when .square div is clicked, also click its associated asp button...
$(this).on('click', function (clickEvent) {
document.getElementById(btnId).click();
//****AND ALSO TRIGGER THE dispatchList() FUNCTION TO REBUILD THE #techName LIST****
})
})
}
Can this be done without AJAX or some other post back on the select control?
Does the #techName list need to be emptied first, and then rebuilt?
Thank you for any advice!
$(document).ready(function () {
  $(".square").on('click', function () {
    // this is the .square div the handler is bound to, even when a child element was clicked
    var btnId = $(this).find('input').attr("id");
    document.getElementById(btnId).click();
    // dispatchList() only appends, so empty the select before rebuilding it
    $('#techName').empty();
    dispatchList();
  });
});
That's all I can do with the information in the question. The code is untested and was written directly in the browser, so please share a fiddle if you want it checked.
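Rebuilding the list is easier to reason about when the filtering rule is separated from the DOM work. The sketch below assumes the cell texts of each .square div have already been read into plain objects; availableTechNames and the name, ip and availability fields are hypothetical, and the sample data is made up:

```javascript
// Hypothetical helper: keeps the same rule as dispatchList() (availability is
// the string "True" and the IP starts with "10.") and returns the names.
function availableTechNames(squares) {
  return squares
    .filter(function (square) {
      return square.availability === 'True' && square.ip.indexOf('10.') === 0;
    })
    .map(function (square) {
      return square.name;
    });
}

// Made-up sample data standing in for the values read from the page.
var names = availableTechNames([
  { name: 'Ann', ip: '10.0.0.5', availability: 'True' },
  { name: 'Bob', ip: '192.168.1.7', availability: 'True' },
  { name: 'Cid', ip: '10.0.0.9', availability: 'False' }
]);
console.log(names); // [ 'Ann' ]
```

With the names in hand, the select can be emptied and one option appended per name, with no AJAX call or postback involved in the rebuild itself.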

How to display item from server once a link is clicked

student here! So I created a function that runs an $.ajax request, and for each item, I output html to the DOM. Well, check it out:
function getLinks (){
let blogLinks;
$.ajax(settings).then(function(data, status, xhr){
data.forEach(function(item, i, arr){
let id = item._id;
if (item.title){
blogLinks = `<li><a class="link" href=#${item.title}>${item.title}</a></li>`
$($findUl).append(blogLinks);
}
})
})
}
this var might be useful info as well:
let $blogViewAll =
$(`<div class="blog-view">
<h3>Check Out These Blogs</h3>
<div class="column-1">
<ul class="link-list"></ul>
</div>
<div class="column-2"></div>
</div>`);
let $findUl = $($blogViewAll).find('.link-list');
It's doing exactly what I want it to do and appending the links to the page. Now I want to click on the link and have the message body display on the page. So I'm trying this:
$findUl.on('click', function(e){
$.ajax(settings).then(function(data, status, xhr){
data.forEach(function(item, i, arr){
//since you clicked on that link
//I should look at what you clicked
//match it to the corresponding obj{item.body}
//and display it, but I don't know how to match and grab :(
})
})
});
And that's where I'm stuck! Any suggestions? Thanks!!!
(i'm using the tiny-za server btw)
You can access the link the user clicked through the event argument e. The handler is bound to the ul, so pass a selector to delegate to the links; e.currentTarget is then the link itself:
$findUl.on('click', 'a.link', function (e) {
  // this gives you the DOM node of the link that was clicked:
  console.log(e.currentTarget);
});
A little shortcut to get the title using jQuery:
$findUl.on('click', 'a.link', function (e) {
  var title = $(e.currentTarget).text();
});
Another note: you should really load the data from the server only once, not every time something is clicked (which is slow).
I'd recommend loading the data into an object when the page loads, then using that object's properties to append to the DOM. You'll be able to access the same object in later click handlers.
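A minimal sketch of that suggestion, assuming each item has the title and body properties mentioned in the question. indexByTitle and bodiesByTitle are hypothetical names, and the sample array stands in for the server response:

```javascript
// Hypothetical helper: build a lookup from title to body once, right after
// the data has been loaded. Items without a title are skipped, as in getLinks().
function indexByTitle(items) {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  var bodies = Object.create(null);
  items.forEach(function (item) {
    if (item.title) {
      bodies[item.title] = item.body;
    }
  });
  return bodies;
}

// Made-up sample data standing in for the server response.
var bodiesByTitle = indexByTitle([
  { _id: 1, title: 'First post', body: 'Hello' },
  { _id: 2, title: 'Second post', body: 'World' },
  { _id: 3, body: 'No title, so it gets no link' }
]);

// In the click handler, the text of the clicked link is the key.
console.log(bodiesByTitle['Second post']); // World
```

The click handler then needs no second $.ajax call: it reads the title from the clicked link and looks the body up in the object.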

Count every elements inside an iFrame with CasperJS

I am searching for a way to count every element inside a div. The problem is, the div is inside an iFrame.
casper.then(function() {
this.echo(this.evaluate(function() {
__utils__.getElementByXPath('//*[@id="main-frame"]', __utils__.findAll('#design-scrollbox-0')).length;
}));
});
What I am trying to do above is:
Getting the iFrame with XPath
Collecting every element
and finally getting the length of the returned array.
I'd love it if you could point me in the right direction. Sadly, I cannot use a jQuery version > 1.7, so jQuery.contents is not an option.
You could inject some other jQuery version, but you don't need it, since CasperJS provides a convenient way of changing into the iframe and doing stuff in its context. casper.withFrame is a shortcut for the PhantomJS functions page.switchToChildFrame and page.switchToParentFrame. It creates a new step from the callback where further steps can be nested.
There are certainly different ways to count elements, but probably the easiest is using
casper.getElementsInfo(selector).length
This is the function for printing the number of links I use for the proofs-of-concept:
function printLinks(){
try{
this.echo("elements: " + this.getElementsInfo("a").length);
} catch(e) {
this.echo("CAUGHT: " + e);
}
}
Proof-of-concept for iframes:
casper.start("http://jsfiddle.net/anjw2gnr/1/")
.then(printLinks)
.withFrame(0, printLinks)
//.withFrame(1, printLinks)
.then(function() {
console.log('Done', this.getCurrentUrl());
})
.run();
prints
elements: 33
elements: 2
Proof-of-concept for frames:
casper.start("https://docs.oracle.com/javase/7/docs/api/index.html")
.then(printLinks)
.withFrame(0, printLinks)
.withFrame(1, printLinks)
.then(function() {
console.log('Done', this.getCurrentUrl());
})
.run();
prints
CAUGHT: CasperError: Cannot get information from a: no elements found.
elements: 210
elements: 4024
So, if you want to count elements, but don't want to use a try-catch block, this is better:
casper.exists(selector) ? casper.getElementsInfo(selector).length : 0
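Wrapped in a function, that expression might look like the sketch below. countElements is a hypothetical name, and fakeCasper is only a stub that imitates exists() and getElementsInfo() so the snippet runs without CasperJS:

```javascript
// Hypothetical wrapper around the expression above. `casper` is any object
// exposing exists() and getElementsInfo(), as a CasperJS instance does.
function countElements(casper, selector) {
  return casper.exists(selector) ? casper.getElementsInfo(selector).length : 0;
}

// Stub standing in for CasperJS, only for illustration.
var fakeCasper = {
  elements: { a: [{}, {}] },
  exists: function (selector) {
    return Boolean(this.elements[selector]);
  },
  getElementsInfo: function (selector) {
    if (!this.exists(selector)) {
      throw new Error('Cannot get information from ' + selector + ': no elements found.');
    }
    return this.elements[selector];
  }
};

console.log(countElements(fakeCasper, 'a'));     // 2
console.log(countElements(fakeCasper, 'table')); // 0, without throwing
```

In a real script the call would go inside a step or a withFrame callback, for example countElements(this, 'a').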
You can use Casper's switchToChildFrame (see for example this link) to get 'into' the iframe.
(untested):
casper.then(function() {
// switch the context to first child frame
this.page.switchToChildFrame(0);
// ... execute casper commands in iframe context
// and switch back to parent frame if you need to
this.page.switchToParentFrame();
// ... execute casper commands in parent page context
});
To count elements you could try (untested also):
casper.then(function() {
var amount_elements = this.evaluate(function() {
var elements = document.querySelectorAll("#design-scrollbox-0");
// you can store in the DOM:
window.amount_elements = elements.length;
// and return the amount
return elements.length;
});
});
// the object stored in the DOM can be used later on:
casper.then(function() {
var amount_elements = this.evaluate(function() {
return window.amount_elements;
});
});
You can always use $('ul').children().length, which gives you the number of child elements of the selected element. Hope it helps.

Dynamically added input box problem?

I have dynamically added divs, each containing a text box. When I add a new div I can set a value on the current div, but not on the previously opened divs. How do I set the value of the text boxes in the previously opened divs?
Thank you
Here is a solution that refreshes ALL of them. (I understand the "previously open text box" part of your question, but it doesn't show in your code.) I assume the "rhythm" column of your table is an input/textarea HTML element, since you use its value.
Please note that I'm not sure what the vitalset function is supposed to accomplish, or what "vitals_form_readings_1_rhythm" is.
function queryDb(statement)
{
dbQuery = new air.SQLStatement();
dbQuery.sqlConnection = db;
dbQuery.text = statement //"SELECT * FROM rhythm";
//alert(dbQuery.text);
try {
dbQuery.execute();
} catch (error) {
air.trace("Error retrieving notes from DB:", error);
air.trace(error.message);
return;
}
return (dbQuery.getResult());
}
function crhythm()
{
var statement = "SELECT * FROM rhythm";
return queryDb(statement)
}
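function reading_speedcode()
{
  if (!cvitals) {
    // getResult() returns a SQLResult; its rows are in .data (null when empty)
    var result = crhythm();
    var rows = (result && result.data) || [];
    var i = 0;
    $(rows).each(function () {
      // local variable, so the crhythm() function is not overwritten
      var rhythm = this.crhythm;
      var pr = 'card_' + i;
      $('#rhythm1').append('<br/><td class="content_big" id="'+pr+'" name="'+pr+'">' + rhythm + ' </td>');
      i++;
    });
  }
}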
$(document).ready( function () {
reading_speedcode();
$('#rhythm1 .content_big').live('click', function(event) {
$('#rhythm1').empty()
reading_speedcode();
});
});
Now, several things about your code:
Variable naming: please use meaningful names.
It reads the full table when you need only one row.
Where is cvitals declared or assigned?
String parsing: jQuery is good at working with sets of elements, so there should be no need to parse "pr" to recover the row number.
If a value is inserted into (or deleted from) the rhythm table before your click, the vitalset logic fails. You might want to use the table id instead.
Make sure "#vitals_form_readings_1_rhythm" is unique, not retrieved from the table.
If you can answer my questions from the top of this post (the vitalset function, vitals_form_readings_1_rhythm, cvitals), I will try to improve the code.
