Add space after each link in console - javascript

I am fetching all anchor-tag links via web scraping and want to print them with a separator between them. I tried appending "\n" in console.log, but there is still no space after the end of the first link: the second link's text starts right after it.
Code:
const axios = require("axios");
const cheerio = require("cheerio");
(async () => {
const html = await axios.get('https://www.xyz');
const $ = cheerio.load(html.data);
let data = []
$(".div-previews").each((i, elem) => {
console.log('data::', $(elem).find(".header-text a").text() + "\n"); // show links with space between them
});
})();

This should work better. I replaced your each() and find() with a single selector plus map() and join():
(async () => {
const html = await axios.get('https://www.xyz');
const $ = cheerio.load(html.data);
console.log('data::',
$(".div-previews .header-text a")
.map((i, el) => $(el).text().trim()) // Cheerio nodes have no textContent, so wrap them with $()
.get()
.join("\n") // or .join(" ")
)
})();
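The root cause is that .text() on a selection with several matches concatenates their text with nothing in between, while Array.prototype.join puts the separator between entries. A minimal dependency-free sketch of that difference, assuming the link texts have already been collected into an array (formatLinks is a hypothetical helper name and the sample strings are made up):

```javascript
// Hypothetical helper: trims each scraped link text, drops blank ones,
// and joins the rest with the chosen separator ("\n" or " ").
function formatLinks(texts, separator = "\n") {
  return texts
    .map(t => t.trim())        // scraped text often carries surrounding whitespace/newlines
    .filter(t => t.length > 0) // skip anchors with no visible text
    .join(separator);          // the separator goes only *between* entries
}

// Made-up sample of what the .map(...).get() chain might return.
const scraped = ["\n  Link 1\n", "Link 2", "   ", "Link 3 "];

// Gluing the texts together with no separator reproduces the symptom in the question:
console.log(scraped.map(t => t.trim()).join("")); // Link 1Link 2Link 3
// One link per line:
console.log(formatLinks(scraped));
// Space-separated:
console.log(formatLinks(scraped, " ")); // Link 1 Link 2 Link 3
```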
Example (runs in the browser with jQuery, where this is a real DOM element, so this.textContent works):
console.log('data::',
$(".div-previews .header-text a")
.map(function() {
return this.textContent
})
.get()
.join("\n") // or .join(" ")
)
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<div class="div-previews">
<div class="header-text">
<a href="#">Link 1</a>
</div>
</div>
<div class="div-previews">
<div class="header-text">
<a href="#">Link 2</a>
</div>
</div>
<div class="div-previews">
<div class="header-text">
<a href="#">Link 3</a>
<a href="#">Link 4</a>
</div>
</div>


How do I get text after single <br> tag in Cheerio

I'm trying to get some text using Cheerio that is placed after a single <br> tag.
I've already tried the following lines:
let price = $(this).nextUntil('.col.search_price.discounted.responsive_secondrow').find('br').text().trim();
let price = $(this).nextUntil('.col.search_price.discounted.responsive_secondrow.br').text().trim();
Here is the HTML I'm trying to scrape:
<div class="col search_price_discount_combined responsive_secondrow" data-price-final="5039">
<div class="col search_discount responsive_secondrow">
<span>-90%</span>
</div>
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39
</div>
</div>
I would like to get "ARS$ 50,39".
If you're comfortable assuming this text is the last child node, you can use .contents().last():
const cheerio = require("cheerio"); // 1.0.0-rc.12
const html = `
<div class="col search_price_discount_combined responsive_secondrow" data-price-final="5039">
<div class="col search_discount responsive_secondrow">
<span>-90%</span>
</div>
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39
</div>
</div>
`;
const $ = cheerio.load(html);
const sel = ".col.search_price.discounted.responsive_secondrow";
const text = $(sel).contents().last().text().trim();
console.log(text); // => ARS$ 50,39
If you aren't comfortable with that assumption, you can search through the children to find the first non-empty text node:
// ...
const text = $([...$(sel).contents()]
.find(e => e.type === "text" && $(e).text().trim()))
.text()
.trim();
console.log(text); // => ARS$ 50,39
If it's critical that the text node immediately follows a <br> tag specifically, you can try:
// ...
const contents = [...$(sel).contents()];
const text = $(contents.find((e, i) =>
e.type === "text" && contents[i-1]?.tagName === "br"
))
.text()
.trim();
console.log(text); // => ARS$ 50,39
If you want all of the immediate text children, see:
How to get a text that's separated by different HTML tags in Cheerio
cheerio: Get normal + text nodes
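The contents[i-1] check in the last snippet is just an array scan. Here's a dependency-free sketch of the same logic, assuming hand-built node objects shaped like the ones Cheerio's parser (domhandler) produces: text nodes with type "text" and a data string, element nodes with type "tag" and a name. The objects are simplified stand-ins, and textAfterBr is a hypothetical name.

```javascript
// Returns the trimmed text of the first non-blank text node that directly
// follows a <br> element, or undefined if there is none.
function textAfterBr(nodes) {
  for (let i = 1; i < nodes.length; i++) {
    const prev = nodes[i - 1];
    const node = nodes[i];
    const followsBr = prev.type === "tag" && prev.name === "br";
    if (followsBr && node.type === "text" && node.data.trim() !== "") {
      return node.data.trim();
    }
  }
  return undefined;
}

// Simplified stand-in for $(sel).contents() on the price <div> from the question.
const priceDivContents = [
  { type: "text", data: "\n" },
  { type: "tag", name: "span" },
  { type: "tag", name: "br" },
  { type: "text", data: "ARS$ 50,39\n" },
];

console.log(textAfterBr(priceDivContents)); // ARS$ 50,39
```

With Cheerio loaded, the same function could be called as textAfterBr([...$(sel).contents()]).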
You should be able to get the price by using:
$('.col.search_price.discounted.responsive_secondrow').html().trim().split('<br>')[1]
This gets the inner HTML of the element, trims surrounding whitespace, splits on <br>, and takes the second part.
See the example at https://jsfiddle.net/b7nt0m24/3/ (note: it uses jQuery, whose API is similar to Cheerio's).
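Splitting on the literal string '<br>' depends on how the HTML is serialized; in XML mode, for instance, the void element can come back as <br/>. A small sketch that tolerates <br>, <br/> and <br />, assuming you already have the inner HTML string (priceFromInnerHtml is a hypothetical name). The result is still HTML, so entities such as &amp; are not decoded.

```javascript
// Hypothetical helper: returns the trimmed segment after the first <br>
// (up to the next <br>, if any), or undefined when there is no <br>.
function priceFromInnerHtml(innerHtml) {
  // Matches <br>, <br/> and <br /> in any letter case.
  const parts = innerHtml.trim().split(/<br\s*\/?>/i);
  return parts.length > 1 ? parts[1].trim() : undefined;
}

const inner =
  '\n<span style="color: #888888;"><strike>ARS$ 503,99</strike></span><br>ARS$ 50,39\n';

console.log(priceFromInnerHtml(inner)); // ARS$ 50,39
console.log(priceFromInnerHtml(inner.replace("<br>", "<br/>"))); // ARS$ 50,39
```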

How to display a specified number of items through a link parameter?

I want to display, for example, 5 of 100 objects from json-server. Is there a query parameter for that, like the one for sorting?
const url = "http://localhost:8000/players?_sort=points&_order=desc";
let template = "";
fetch(url)
.then((res) => res.json())
.then((data) => {
data.forEach((player, idx) => {
template += `
<div class='modal-leaderboard__player-name'>
<h2>${idx + 1}. </h2>
<h2 data-player-rank>${player.name} </h2>
<h2 style='margin-left: auto'> <span data-points-rank>${player.points}</span> points</h2>
</div>
`;
});
this.rank.innerHTML += template;
});
If you mean, for example, the first 5 items, you can add a condition on idx. It is already a number and starts at 0, so compare it with 5:
...
data.forEach((player, idx) => {
if (idx < 5) {
template += ...
}
});
this.rank.innerHTML += template;
...
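Two other ways to get only the top five, sketched under assumptions: ask json-server to limit the response, or slice the array before building the template. The _limit query parameter is documented for json-server 0.x; the query syntax differs between json-server versions, so check the README for the version you run. topPlayersTemplate is a hypothetical helper that reuses the markup from the question.

```javascript
// json-server 0.x slicing: _limit caps how many records the server returns.
const limitedUrl =
  "http://localhost:8000/players?" +
  new URLSearchParams({ _sort: "points", _order: "desc", _limit: "5" });

// Client-side alternative: keep only the first n players, then build the markup.
function topPlayersTemplate(players, n) {
  return players
    .slice(0, n)
    .map(
      (player, idx) => `
<div class='modal-leaderboard__player-name'>
<h2>${idx + 1}. </h2>
<h2 data-player-rank>${player.name} </h2>
<h2 style='margin-left: auto'> <span data-points-rank>${player.points}</span> points</h2>
</div>
`
    )
    .join("");
}

console.log(limitedUrl);
// -> http://localhost:8000/players?_sort=points&_order=desc&_limit=5
```

Inside the fetch chain this would be used as this.rank.innerHTML += topPlayersTemplate(data, 5).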

Pass Cheerio element to Puppeteer to have it clicked

I'm scraping a website and I'm using Cheerio and Puppeteer.
I need to click a certain button with a given text. Here is my code:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.website.com', {waitUntil: 'networkidle0'});
const html = await page.content();
const $ = cheerio.load(html);
const items = [];
$('.grid-table-container').each((index, element) => {
items.push({
element: $($('.grid-option-name', element)[0]).contents().not($('.grid-option-name', element).children()).text(),
button: $('.grid-option-selectable>div', element)
});
});
for (const item of items) {
if (item.element === 'Foo Bar') {
await page.click(item.button);
}
}
Here is the markup I'm trying to scrape:
<div class="item-table"></div>
<div class="item-table"></div>
<div class="item-table"></div>
<div class="item-table"></div>
<div class="item-table"></div>
<div class="item-table"></div>
<div class="item-table">
<div class="grid-item">
<div class="grid-item-container">
<div class="grid-table-container">
<div class="grid-option-header">
<div class="grid-option-caption">
<div class="grid-option-name">
Foo Bar
<span>some other text</span>
</div>
</div>
</div>
<div class="grid-option-table">
<div class="grid-option">
<div class="grid-option-selectable">
<div></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="item-table"></div>
<div class="item-table"></div>
Clicking on a Cheerio element doesn't work. Is there any way to do it?
You could add jQuery to the page and do it there:
await page.addScriptTag({path: "jquery.js"})
await page.evaluate(() => {
// do jquery stuff here
})
There's no way to do this. Puppeteer is a totally different API from Cheerio. The two don't talk to each other or interoperate at all. The only thing you can do is snapshot an HTML string in Puppeteer and pass it to Cheerio.
Puppeteer works in the browser context on the live website, with native XPath and CSS capabilities: basically, all the power of the browser at your disposal.
On the other hand, Cheerio is a Node-based HTML parser that simulates a tiny portion of the browser environment. It offers a small subset of Puppeteer's functionality, so don't use Cheerio and Puppeteer together under most circumstances.
Taking a snapshot of the live site, then re-parsing the string into a tree Cheerio can work with is confusing, inefficient and offers few obvious advantages over using the actual thing that's right in front of you. It's like buying a bike just to carry it around.
The solution is to stick with Puppeteer ElementHandle objects:
const puppeteer = require("puppeteer"); // ^19.0.0
const html = `
<div class="item-table">
<div class="grid-item">
<div class="grid-item-container">
<div class="grid-table-container">
<div class="grid-option-header">
<div class="grid-option-caption">
<div class="grid-option-name">
Foo Bar
<span>some other text</span>
</div>
</div>
</div>
<div class="grid-option-table">
<div class="grid-option">
<div class="grid-option-selectable">
<div></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
// for testing purposes
const el = document.querySelector(".grid-option-selectable > div");
el.addEventListener("click", e => e.target.textContent = "clicked");
el.style.height = el.style.width = "50px";
</script>
`;
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
for (const el of await page.$$(".grid-item-container")) {
const text = await el.$eval(
".grid-option-name",
el => el.childNodes[0].textContent
);
const sel = ".grid-option-selectable > div";
if (text.trim() === "Foo Bar") {
const selectable = await el.$(sel);
await selectable.click();
}
console.log(await el.$eval(sel, el => el.textContent)); // => clicked
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Or perform your click in the browser:
await page.$$eval(".grid-item-container", els => els.forEach(el => {
const text = el.querySelector(".grid-option-name")
.childNodes[0].textContent.trim();
if (text === "Foo Bar") {
// query inside this container, not the whole document, so the matching row is clicked
el.querySelector(".grid-option-selectable > div").click();
}
}));
You might consider selecting using an XPath or iterating childNodes to examine all text nodes rather than assuming the text is at position 0, but I've left these as exercises to focus on the main point at hand.
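For the childNodes exercise, one possible sketch (ownText is a hypothetical name) joins only the element's direct text nodes, so it doesn't depend on "Foo Bar" being childNodes[0]. It uses the literal 3 (the DOM's Node.TEXT_NODE) and references nothing outside its own body, so it can be handed to Puppeteer, which serializes the function into the page, e.g. await el.$eval(".grid-option-name", ownText). The test objects are hand-built stand-ins for DOM nodes.

```javascript
// Joins the element's *direct* text nodes, skipping child elements such as <span>.
// Self-contained on purpose: Puppeteer stringifies functions passed to $eval,
// so closures over outer variables would not survive the trip into the page.
function ownText(el) {
  return Array.from(el.childNodes)
    .filter(node => node.nodeType === 3) // 3 === Node.TEXT_NODE
    .map(node => node.textContent)
    .join("")
    .trim();
}

// Hand-built stand-in for <div class="grid-option-name">Foo Bar<span>...</span></div>
const fakeOptionName = {
  childNodes: [
    { nodeType: 3, textContent: "\nFoo Bar\n" },
    { nodeType: 1, textContent: "some other text" },
    { nodeType: 3, textContent: "\n" },
  ],
};

console.log(ownText(fakeOptionName)); // Foo Bar
```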

Json file struggling with the length

So, I have almost everything working the way I want, but there's one mistake I'm struggling with. Every time I search for an item, the result count is repeated on each result.
When I search for "ox" there are 2 results, which is correct, but the count (2) is shown on both of them. I only want to display it once.
Code:
const resultHtml = (itemsMatch) => {
if (itemsMatch.length > 0) {
const html = itemsMatch
.map(
(item) => `
<span>${itemsMatch.length}</span>
<div class="card">
<div class="items-img">
</div>
<div class="items-info">
<h4>${item.title}</h4>
<small>${item.path}</small>
</div>
</div>
`
)
.join('');
//console.log(html);
itemList.innerHTML = html;
}
};
////
Question 2
One more question: I was trying to get the image from the JSON, but what I got was the path. Why do I get the path and not the image?
const resultHtml = (itemsMatch) => {
if (itemsMatch.length > 0) {
const html =
`<span class="items-results">${itemsMatch.length} Resultados</span>` +
itemsMatch
.map(
(item) => `
<div class="card">
<div class="items-img">
${item.image}
</div>
<div class="items-info">
<h4>${item.title}</h4>
<small>${item.path}</small>
</div>
</div>
`
)
.join('');
console.log(html);
itemList.innerHTML = html;
}
};
If you move <span>${itemsMatch.length}</span> out of your map() callback, it will not repeat for each item. See the MDN documentation for Array.prototype.map() for details.
Replace:
const html = itemsMatch
.map(
(item) => `
<span>${itemsMatch.length}</span>
... more HTML here
`
)
.join('');
With this:
const html = `<span>${itemsMatch.length}</span>` + (
itemsMatch
.map(
(item) => `
<div class="card">
<div class="items-img">
</div>
<div class="items-info">
<h4>${item.title}</h4>
<small>${item.path}</small>
</div>
</div>
`
)
.join('')
);
Regarding your image issue:
You are only outputting the path string, which is why the path is what gets printed. To display an image, use the path as the src of an <img> tag.
So, instead of just:
${item.image}
Use:
<img src="${item.image}">
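Putting both fixes together, here's a sketch that returns the markup as a string instead of writing to itemList, so it can be checked outside the browser. The count is rendered once, outside map(), and each image path goes into an <img> tag. The alt attribute and the renderResults name are additions for illustration; the item fields (title, path, image) follow the question's JSON, and the sample items in the test are made up.

```javascript
// Builds the results markup: one count header, then one card per matched item.
function renderResults(itemsMatch) {
  if (itemsMatch.length === 0) return "";
  const header = `<span class="items-results">${itemsMatch.length} Resultados</span>`;
  const cards = itemsMatch
    .map(
      item => `
<div class="card">
<div class="items-img">
<img src="${item.image}" alt="${item.title}">
</div>
<div class="items-info">
<h4>${item.title}</h4>
<small>${item.path}</small>
</div>
</div>
`
    )
    .join("");
  return header + cards;
}

// In the page: itemList.innerHTML = renderResults(itemsMatch);
```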

How to get child of div in cheerio

I am working with Cheerio and I am stuck: I want to get the href value from a child div of <div class="Card">.
<div class="Card">
<div class="title">
<a target="_blank" href="test">
Php </a>
</div>
<div>some content</div>
<div>some content</div>
<div>some content</div>
</div>
I got the first children correctly, but I want the href value of the a inside div class="title". I am new to Node, and I already searched for this but didn't find an appropriate answer.
var jobs = $("div.jobsearch-SerpJobCard",html);
Here is my script:
const rp = require('request-promise');
const $ = require('cheerio');
const potusParse = require('./potusParser');
const url = "";
rp(url)
.then((html)=>{
const Urls = [];
var jobs = $("div.Card",html);
for (let i = 2; i < jobs.length; i++) {
Urls.push(
$("div.Card > div[class='title'] >a", html)[i].attribs.href
);
}
console.log(Urls);
})
.catch(err => console.log(err));
It looks something like this (assuming $ = cheerio.load(html)):
$('.Card').map((i, card) => {
return {
link: $(card).find('.title a').text().trim(), // trim the whitespace around "Php"
href: $(card).find('.title a').attr('href'),
}
}).get()
