Navigating HTML with the Cheerio web page scraper - javascript

I'm following this tutorial on how to screen scrape with Cheerio for Node.js, and I'm two seconds away from just downloading the entire page and using plain JavaScript to extract the information I need. I'm sure that is much more difficult than actually using Cheerio, but I'm having difficulty understanding how to navigate the HTML with Cheerio.
How do I extract the number '2' from the second "blind-cow-white-number" div?
Here is the HTML:
<div id="mainCows" class="row-fluid">
<div class="zone zone-content">
<article class="projection-page content-item">
<article class="post post-page content-item blind-cow">
<h1>blind cow</h1>
<div>
<div class="blind-cow-header" style="margin-bottom:15px">
<div class="blind-cow-list"> my list </div>
<div style="margin: 0 auto; width: 90%; text-wrap: none; text-align: center;">
<div class="blind-cow-white-number"> 1 </div>
<div class="blind-cow-white-number"> 2 </div>
</div>
<div class="blind-cow-died"> 3 </div>
<table class="blind-cow-table">
<table class="blind-cow-table">
<div class="blind-cow-Locations"> </div>
<br>
<div class="blind-cow-footer">
</div>
</article>
</article>
</div>
</div>
How do I achieve this with Cheerio?
Is there a web screen scraper for Node.js that allows me to use XPath instead?

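Cheerio uses largely the same selector syntax and traversal API as jQuery, so you can select by class and take the second match (index 1):
$(".blind-cow-white-number").eq(1).html();
Note that .html() returns the inner HTML including the surrounding whitespace (" 2 " here); use .text().trim() instead if you only want the digit.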

Related

Click event is not working: it works for some elements but not for others

In my code the click handlers for .cmd and .normal work, but the one for #aboutme does not. I don't know why that event is being ignored while the other handlers work.
var normal = $(".normal");
var cmd = $(".cmd");
normal.on("click", function(){
$(".UI").show();
$(".console").hide();
});
cmd.on("click", function(){
$(".console").show();
$(".UI").hide();
});
$("#aboutme").on("click",function(){
console.log("okay");
});
My HTML code (the dashboard class acts as a wrapper):
<div class="dashboard ">
<div class="option">
<div class="normal">Normal</div>
<div class="cmd">Terminal</div>
</div>
<hr style="background-color: white;">
<div class="console">
</div>
<div class="UI ">
<div class="showcase">
<div id="aboutme">
<h2><span>»About</span></h2>
<p>Self-motivated fresher seeking a career in recognized organization to prove my skills and utilize my knowledge and intelligence in the growth of organization.</p>
</div>
<div id="Skills">
<h2><span>Skills</span></h2>
<p> <kbd>Programming Languages</kbd> : Python, Node.js, C++</p>
<p> <kbd>Platform & Development Tools</kbd> : VS Code , Spyder and Jupiter Notebook</p>
</div>
</div>
</div>
This seems to work. Can you provide your full HTML?
$("#aboutme").on("click",function(){
console.log("okay");
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<button id="aboutme">Test</button>

Scrapy: how can I crawl data that only appears after a click event?

I am trying to crawl this page: http://www.11st.co.kr/html/main.html
but I have run into some problems.
First, Scrapy cannot interpret JavaScript.
I want to get some 'href' data from that button (the one in the red square) so that I can crawl those links as well.
Site screencapture
I can't even use Selenium, because the button code is inside a script template, so XPath can't find it:
<script id="headerNavigationTemplate" type="text/x-handlebars-template">
{{#ifCond templateType '===' 'main'}}
<nav class="header_gnb" id="gnbNavArea">
{{else}}
<div class="header_gnb" id="gnbNavArea">
{{/ifCond}}
<div class="inner">
<h1 class="hide">대메뉴</h1>
<div class="gnb_l">
<div class="gnb_nav gnb_nav_category" id="gnbCategoryArea">
<p name="gnbNavBtn"><button type="button" class="gnb_btn_all" data-ga-event-category="PC_GNB" data-ga-event-action="전체보기 버튼" data-ga-event-label=""><span class="in_btn"><span class="ico"></span>전체보기</span></button></p>
<div class="gnb_nav_category_layer">
<div class="gnb_total_category">
<div class="row" id="navCtgrRow1"></div>
<div class="row" id="navCtgrRow2"></div>
<div class="row" id="navCtgrRow3"></div>
<div class="row" id="navCtgrRow4"></div>
<div class="row" id="navCtgrRow5"></div>
<div class="row" id="navCtgrRow6"></div>
<div class="row" id="navCtgrRow7"></div>
<div class="row" id="navCtgrRow8"></div>
<div class="row" id="navCtgrRow9"></div>
I want to get the data that is hidden in
//div[@class = "gnb_total_category"]/div
How can I crawl it?
Please help me.
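Please try the following script to get the required data:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://www.11st.co.kr/html/main.html')
# Click the "전체보기" (view all) button so the category layer is rendered
driver.find_element(By.XPATH, "//span[contains(text(), '전체보기')]").click()
# Print the text of the first row inside the category layer
print(driver.find_element(By.XPATH, '//div[@class="gnb_total_category"]/div').text)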

How to grab value from ASTNode

I parsed an entire HTML page into an ASTNode.
Snippet of it looks like this:
<div id="buildings-wrapper">
<div id="building-info">
<h2><span class="field-content">Britney Spears' House</span></h2>
<div class="building-field">
<div class="field-content">9999 Hollywood Blvd</div>
</div>
<div class="building-field">
<div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div>
</div>
<div class="building-field">
<div class="field-content">Locate on the stars map</div>
</div>
</div>
<div id="building-image">
<div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&buildingID=britneyspears" alt="Image of BritneySpears"></div>
</div>
</div>
I need to grab some information from the page, like the name of the building and its address. How do I do that with an ASTNode? I've read the XML DOM tutorials and they suggest using document.getElementById and the like, but I can't get to those functions from an ASTNode. I think I'm missing something simple, but what's the easiest way to access the values I need?

Search box: search in localStorage or in the HTML?

this is what I have:
All the data that you see is saved in localStorage. What I want now is a search that makes every session that does not match the search terms disappear.
What is the best approach: searching localStorage or searching the HTML? (jQuery/JavaScript)
HTML code:
<div id="content_wrapper" class="">
<div id="mastersearch" class="container hide">
<input type="text" id="txtmastersearch">
<div id="searchresults">
</div>
</div>
<div id="content" style="height: 520px;">
<style>
h2 {
margin-bottom: 0;
}
ul {
margin-bottom: 0;
}
</style>
<div class="container">
<div class="row">
<div class="span12">
<input id="searchbox" class="span12" type="search" placeholder="Search...">
</div>
</div>
<div id="here" class="row">
<div class="span6"><h2 class="before-blocks">Wednesday 3 April 2013</h2><ul id="here" class="sessionlist blocks unstyled"><li class="contentblock has-thumb">0:00 - 0:00<span class="ellipsis name" style="width: 95%;" '=""><b>tweede</b></span><span class="ellipsis"><em></em></span></li></ul></div><div class="span6"><h2 class="before-blocks">Sunday 21 April 2013</h2><ul id="here" class="sessionlist blocks unstyled"><li class="contentblock has-thumb"><a href="http://test1niels.m.niels.tapcrowd.com/sessions/view/21117">12:00 - 15:00<span class="ellipsis name" style="width: 95%;" '=""><b>html5 session</b></span>
Thanks in advance!
What is the best approach: searching localStorage or searching the HTML? (jQuery/JavaScript)
If you are asking which of these to choose:
get the data from localStorage;
parse the DOM;
then, I think, getting the data directly from localStorage is the better solution.
It is also logically cleaner, because the HTML is just the presentation layer, and reading data from the view instead of from the model isn't a good idea.
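As a rough sketch of the "search the model, not the view" idea: assuming the sessions are stored as a JSON array under a single key, filtering could look something like the code below. The key name `sessions`, the `name`/`date` fields, and the helper names are all hypothetical, since the question does not show how the data is laid out in storage. The in-memory storage object only stands in for `window.localStorage` so that the snippet also runs outside a browser.

```javascript
// Hypothetical storage key and session shape; the original post shows neither.
const STORAGE_KEY = "sessions";

// Read the saved sessions from a Storage-like object
// (window.localStorage in a browser).
function loadSessions(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : [];
}

// Keep only the sessions whose name contains the search term (case-insensitive).
// An empty or whitespace-only term returns every session.
function filterSessions(sessions, term) {
  const needle = term.trim().toLowerCase();
  if (needle === "") return sessions;
  return sessions.filter((s) => s.name.toLowerCase().includes(needle));
}

// Minimal in-memory stand-in for localStorage (only getItem is implemented).
function makeMemoryStorage(initial) {
  const data = new Map(Object.entries(initial));
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
  };
}

// Sample data using the two session names visible in the question's HTML.
const storage = makeMemoryStorage({
  sessions: JSON.stringify([
    { name: "tweede", date: "Wednesday 3 April 2013" },
    { name: "html5 session", date: "Sunday 21 April 2013" },
  ]),
});

const matches = filterSessions(loadSessions(storage), "HTML5");
console.log(matches.map((s) => s.name));
```

In the page itself you would pass `window.localStorage` to `loadSessions`, call `filterSessions` from the search box's input handler, and re-render the session list from the result.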

jade extend and include overwrites files for no reason

SlideBase.jade:
.slideWrap
  .slideInner
    block slides
slideSet1.jade:
extends SlideBase
append slides
  .slide set1slide1
  .slide set1slide2
  .slide set1slide3
slideSet2.jade:
extends SlideBase
append slides
  .slide set2slide1
  .slide set2slide2
  .slide set2slide3
output.jade:
#mySlides
  p some copy
  #slideZone
    include slideSet1.jade
    include slideSet2.jade
expected result:
<div id="mySlides">
<p>some copy</p>
<div id="slideZone">
<div class="slideWrap">
<div class="slideInner">
<div class="slide">set1slide1</div>
<div class="slide">set1slide2</div>
<div class="slide">set1slide3</div>
</div>
</div>
<div class="slideWrap">
<div class="slideInner">
<div class="slide">set2slide1</div>
<div class="slide">set2slide2</div>
<div class="slide">set2slide3</div>
</div>
</div>
</div>
</div>
actual result:
<div id="mySlides">
<p>some copy</p>
<div id="slideZone">
<div class="slideWrap">
<div class="slideInner">
<div class="slide">set1slide1</div>
<div class="slide">set1slide2</div>
<div class="slide">set1slide3</div>
</div>
</div>
<div class="slideWrap">
<div class="slideInner">
<div class="slide">set1slide1</div>
<div class="slide">set1slide2</div>
<div class="slide">set1slide3</div>
</div>
</div>
</div>
</div>
Rather than including slideSet2.jade, the Jade compiler just repeats slideSet1.jade in its place. What am I doing wrong here?
DISCLOSURE:
I am running CodeKit with Jade version 0.27.2, and any accepted answer must address why it's not working in my environment.
This issue was fixed in a newer version of Jade, so CodeKit's bundled version should be brought up to date.
