Scraping data from YouTube with Cheerio - javascript

var req = require('request');
var cheerio = require('cheerio');

req('https://www.youtube.com/channel/UCVRhrcoG6FOvHGKehYtvKHg/about', (err, response, body) => {
    if (!err) {
        let $ = cheerio.load(body);
        console.log($('style-scope.ytd-channel-about-metadata-renderer').html());
    } else {
        console.log(err);
    }
});
https://somon.is-inside.me/B49SiWJC.png
Hello, I'm trying to scrape the 'views' data from YouTube, but every time it logs null to the console.
There is a screenshot link above; I'm trying to fetch the data by class name, but I couldn't get it to work. Where is the error?

You need to add the class selector character ".".
Try this: console.log($('.style-scope .ytd-channel-about-metadata-renderer').html())
or this: console.log($('.ytd-channel-about-metadata-renderer').html())
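Putting that together, a minimal sketch of the corrected request (note that YouTube renders most of the About page client-side, so even the right class selector may still come back null from the raw HTML; a headless browser may be needed):
var req = require('request');
var cheerio = require('cheerio');

req('https://www.youtube.com/channel/UCVRhrcoG6FOvHGKehYtvKHg/about', (err, response, body) => {
    if (err) return console.log(err);
    let $ = cheerio.load(body);
    // note the leading "." so Cheerio matches a class, not a <style-scope> tag
    console.log($('.ytd-channel-about-metadata-renderer').html());
});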

Related

Nothing shows up in the console when scraping a website

I'm doing a personal project where I want to scrape some game rankings off a website, but I'm unable to locate the titles of the games I want in the page's HTML.
const request = require('request');
const cheerio = require('cheerio');

request('https://newzoo.com/insights/rankings/top-20-core-pc-games/', (error, response, html) => {
    if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        //var table = $('#ranking');
        //console.log(table.text());
        $('.ranking-row').each((i, el) => {
            const title = $(el).find('td').find('td:nth-child(1)').text();
            console.log(title);
        });
    }
});
Change
const title = $(el).find('td').find('td:nth-child(1)').text();
to
const title = $(el).find('td:nth-child(2)').text();
PS: To debug selectors, use the Chrome DevTools. If you go to this specific site and search for .ranking-row td td:nth-child(1), you will see that nothing is returned. But if you use .ranking-row td:nth-child(2), you get the desired result.
This is a simple selector error caused by looking for the same td twice and using the wrong index in nth-child.
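For reference, a minimal sketch of the full corrected loop (assuming the page still uses the .ranking-row table markup):
const request = require('request');
const cheerio = require('cheerio');

request('https://newzoo.com/insights/rankings/top-20-core-pc-games/', (error, response, html) => {
    if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        $('.ranking-row').each((i, el) => {
            // the second cell of each ranking row holds the game title
            const title = $(el).find('td:nth-child(2)').text().trim();
            console.log(title);
        });
    }
});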

convert cheerio.load() to a DOM object

I'm trying to learn how to make a web scraper and save content from a site into a text file using node. My issue is that to get the content, I am using cheerio and jquery (I think?), which I have no experience with. I'm trying to take the result I got from cheerio and convert it to a DOM object which I have much more experience dealing with. How can I take the html from cheerio and convert it to a DOM object? Thanks in advance!
const request = require('request');
const cheerio = require('cheerio');

request('https://www.wuxiaworld.com/novel/overgeared/og-chapter-153', (error, response, html) => {
    if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        console.log(html);
        html.getElementsByClassName('fr-view')[1]; // I want the ability to do this
    }
});
You are already using Cheerio; the first example in its documentation shows you how to select by class and get the HTML as a string.
You can change your code to look like this:
const request = require('request');
const cheerio = require('cheerio');

request('https://www.wuxiaworld.com/novel/overgeared/og-chapter-153', (error, response, html) => {
    if (!error && response.statusCode == 200) {
        const $ = cheerio.load(html);
        const result = $('.my-className').html(); // Cheerio finds by CSS selector, just like jQuery
        console.log(result);
    }
});
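If you really do want a browser-style DOM rather than Cheerio's jQuery-like API, one option (an assumption, not something the question requires) is the jsdom package:
const request = require('request');
const { JSDOM } = require('jsdom'); // npm install jsdom

request('https://www.wuxiaworld.com/novel/overgeared/og-chapter-153', (error, response, html) => {
    if (!error && response.statusCode == 200) {
        const dom = new JSDOM(html);
        // the familiar DOM methods are now available on dom.window.document
        const node = dom.window.document.getElementsByClassName('fr-view')[1];
        console.log(node && node.textContent);
    }
});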

Web Scraping Node.js in DOM page

I want to get information from a site using Node.js.
I tried so hard, and got so far. So, I want to get a magnet URI link; this link is in:
<div id="download">
<img src="/parse/s.rutor.org/i/magnet.gif">
How can I get this link from the div's href field using cheerio? I don't know jQuery; I just want to write a parser.
Here is my try:
const request = require('request');
const cheerio = require('cheerio');

request('http://s.new-rutor.org/torrent/562496/povorot-ne-tuda-5-krovnoe-rodstvo_wrong-turn-5-bloodlines-2012-bdrip-avc-p/', function(err, resp, body) {
    if (!err) {
        const $ = cheerio.load(body);
        var magnet = $('.href', '#downloads').text();
        // $('#downloads').find('href').text()
        console.log(magnet);
    }
});
That code only prints an empty string to the console.
Note: I'm using request-promise instead of request.
This code console.logs all a tags whose href contains 'magnet':
const request = require('request-promise');
const cheerio = require('cheerio');

request('http://s.new-rutor.org/torrent/562496/povorot-ne-tuda-5-krovnoe-rodstvo_wrong-turn-5-bloodlines-2012-bdrip-avc-p/').then(res => {
    const $ = cheerio.load(res);
    const links = $('a');
    links.each(i => {
        const link = links.eq(i).attr('href');
        if (link && link.includes('magnet')) {
            console.log(link);
        }
    });
});
eq selects the link at a specific index:
links.each(i => links.eq(i))
Then we grab the content of the href attribute (the magnet link) with:
links.eq(i).attr('href')
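A more direct alternative is an attribute selector, so you don't have to loop over every link. This is just a sketch and assumes the magnet link is an <a> tag inside the #download div shown in the question:
const request = require('request-promise');
const cheerio = require('cheerio');

request('http://s.new-rutor.org/torrent/562496/povorot-ne-tuda-5-krovnoe-rodstvo_wrong-turn-5-bloodlines-2012-bdrip-avc-p/').then(body => {
    const $ = cheerio.load(body);
    // first <a> inside #download whose href starts with "magnet:"
    const magnet = $('#download a[href^="magnet:"]').attr('href');
    console.log(magnet);
});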

Using Node.js to find the value of Bitcoin on a webpage in real time

I'm trying to make a .js file that will constantly have the price of bitcoin updated (every five minutes or so). I've tried tons of different ways to web scrape, but they always output either null or nothing. Here is my latest code; any ideas?
var express = require('express');
var path = require('path');
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

var app = express();
var url = 'https://blockchain.info/charts/';
var port = 9945;

function BTC() {
    request(url, function (err, res, body) {
        var $ = cheerio.load(body);
        var a = $(".market-price");
        var b = a.text();
        console.log(b);
    });
    setInterval(BTC, 300000);
}
BTC();

app.listen(port);
console.log('server is running on ' + port);
It successfully logs which port it's running on, so that's not the problem. But every time the function runs, the output is just a blank line.
UPDATE:
I changed the new code I got from Wartoshika and it stopped working, but I'm not sure why. Here it is:
function BTCPrice() {
    request('https://blockchain.info/de/ticker', (error, response, body) => {
        const data = JSON.parse(body);
        var value = (parseInt(data.USD.buy, 10) + parseInt(data.USD.sell, 10)) / 2;
        return value;
    });
};
console.log(BTCPrice());
If I console.log directly from inside the function it works, but when I console.log the return value of the function it outputs undefined. Any ideas?
I would rather use a JSON API to get the current bitcoin value instead of an HTML parser. With the JSON API you get a straightforward result set that is easy to parse.
Check out blockchain.info's Exchange Rates API.
The URL will look like https://blockchain.info/de/ticker
Working script:
const request = require('request');

function BTC() {
    // send a request to blockchain
    request('https://blockchain.info/de/ticker', (error, response, body) => {
        // parse the JSON answer and get the current bitcoin value
        const data = JSON.parse(body);
        const value = (parseInt(data.THB.buy, 10) + parseInt(data.THB.sell, 10)) / 2;
        console.log(value);
    });
}
BTC();
Using the value via a Promise:
const request = require('request');

function BTC() {
    return new Promise((resolve) => {
        // send a request to blockchain
        request('https://blockchain.info/de/ticker', (error, response, body) => {
            // parse the JSON answer and get the current bitcoin value
            const data = JSON.parse(body);
            const value = (parseInt(data.THB.buy, 10) + parseInt(data.THB.sell, 10)) / 2;
            resolve(value);
        });
    });
}
BTC().then(val => console.log(val));
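On Node 8+ the same Promise can also be consumed with async/await; just a usage sketch:
(async () => {
    const price = await BTC();
    console.log(price);
})();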
As the other answer stated, you should really use an API. You should also think about what type of price you want to request. If you just want a sort of index price that aggregates prices from multiple exchanges, use something like the CoinGecko API. Also, if you need real-time data, you need a websocket-based API, not a REST API.
If you need prices for a particular exchange, for example because you're building a trading bot for one or more exchanges, you'll need to communicate with each exchange's websocket API directly. For that I would recommend something like the Coygo API, a Node.js package that connects you directly to each exchange's real-time data feeds. You want something that doesn't add a middleman, since that would add latency to your data.
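For example, here is a minimal sketch against CoinGecko's public simple/price endpoint (the exact endpoint and response shape are my assumptions here; check the current CoinGecko docs):
const request = require('request-promise');

request({
    uri: 'https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd',
    json: true
}).then(data => {
    // expected shape: { bitcoin: { usd: <number> } }
    console.log(data.bitcoin.usd);
});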

Unable to get information from <div> Node spider with Cheerio

I'm trying to download the lat/long locations of CCTV cameras from the City of Baltimore website (a project on the surveillance state), but I can't get the console to log anything.
Here's the site: https://data.baltimorecity.gov/Public-Safety/CCTV-Locations/hdyb-27ak/data
And my code is:
const request = require('request');
const cheerio = require('cheerio');

let URL = 'https://data.baltimorecity.gov/Public-Safety/CCTV-Locations/hdyb-27ak/data';
let cameras = [];

request(URL, function(err, res, body) {
    if (!err && res.statusCode == 200) {
        let $ = cheerio.load(body);
        $('div.blist-t1-c140113793').each(function() {
            let camera = $(this);
            let location = camera.text();
            console.log(location);
            cameras.push(location);
        });
        console.log(cameras);
    }
});
I've tried setting the selector to blist-t1-c140113793 and blist-td blist-t1-c140113793, but neither has worked.
That's because the data for those divs is loaded asynchronously, after the page has rendered. JavaScript is not executed by Cheerio or any other such library. You'll need either to analyze the network traffic and figure out which HTTP call loads this data, or use something like Selenium, which actually executes JavaScript inside a browser.
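In this particular case the page is a Socrata open-data portal, and Socrata portals usually expose the underlying dataset as JSON through a SODA endpoint derived from the dataset ID (hdyb-27ak in the URL). A sketch, assuming that endpoint is still available; the field names inside each record are not guaranteed, so inspect the JSON first:
const request = require('request');

request({
    uri: 'https://data.baltimorecity.gov/resource/hdyb-27ak.json',
    json: true
}, function(err, res, rows) {
    if (!err && res.statusCode == 200) {
        // each element of rows is one camera record
        rows.forEach(row => console.log(row));
    }
});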
