Puppeteer `textContent of undefined` - javascript

I'm trying to scrape some stats from basketball-reference using Puppeteer, but I'm having trouble accessing the textContent of a cell in each <tr> of the table. The DOM structure looks something like this:
<table id="per_game_stats">
<tbody>
<tr>
<td data-stat="g">22</td>
</tr>
<tr>
<td data-stat="g">23</td>
</tr>
<tr>
<td data-stat="g">24</td>
</tr>
</tbody>
</table>
Here's my Puppeteer code that tries to access those values
const statsPerGame = await page.evaluate(() => {
const rows = Array.from(document.querySelectorAll('table#per_game_stats > tbody > tr'))
return rows.map((element) => element.innerHTML)
})
console.log(statsPerGame) // ... '<td data-stat="g">22</td>'
Okay, so the rows definitely exist. I want to get the value using textContent, but an error is thrown:
const statsPerGame = await page.evaluate(() => {
const rows = Array.from(document.querySelectorAll('table#per_game_stats > tbody > tr'))
return rows.map((element) => element.querySelector('td[data-stat="g"]').textContent)
})
// TypeError: Cannot read property 'textContent' of null
Can someone help me out?
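A likely cause, which the simplified DOM above cannot confirm, is that at least one <tr> in the real table has no td[data-stat="g"] cell (for example a repeated header row), so querySelector returns null for that row. A minimal sketch under that assumption skips such rows instead of dereferencing null. The helper names cellText and extractColumn are made up for illustration:

```javascript
// Return the trimmed text of the first cell matching `selector` in `row`,
// or null when the row has no such cell (assumed cause of the TypeError).
function cellText(row, selector) {
  const cell = row.querySelector(selector);
  return cell ? cell.textContent.trim() : null;
}

// Collect the text for every row that actually contains the cell.
function extractColumn(rows, selector) {
  return rows
    .map((row) => cellText(row, selector))
    .filter((text) => text !== null);
}

// Hypothetical Puppeteer usage. Code passed to page.evaluate runs in the
// page, so the two helpers would have to be declared inside the callback:
//
// const statsPerGame = await page.evaluate(() => {
//   /* declare cellText and extractColumn here */
//   const rows = Array.from(
//     document.querySelectorAll('table#per_game_stats > tbody > tr')
//   );
//   return extractColumn(rows, 'td[data-stat="g"]');
// });
```

If the value turns out to live in a different element (say a <th>), logging the innerHTML of the rows that return null should show what to select instead.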

Related

Check if td element have certain value

I have a table with a number in each row:
<td id="table-number">{{ $loop->index + 1 }}</td>
Now I want to find the number "9" in the table rows.
Here is what I do:
const number = document.getElementById('table-number');
if(number.textContent.includes('9')) {
console.log('heyhey');
}
But it logs nothing. So what should I do? I expect to get the table number.
OK guys, I found the answer in this post. Sorry, I didn't search thoroughly. I need to upgrade my Google skills.
Assuming the <td> elements are produced in a loop and you want to know if any of them contain a 9, give the elements a class instead of an id. An id must be unique in the document, and getElementById only returns the first matching element, so only the first row was ever checked.
<td class="table-number">
and try something like this instead
const tableNumbers = Array.from(
document.querySelectorAll(".table-number"),
({ textContent }) => textContent
);
if (tableNumbers.some((content) => content.includes("9"))) {
console.log("heyhey");
}
You probably don't need an id or a class on the cells.
Use querySelectorAll to get a node list of all of the cells, coerce the node list to an array, and then find the cell with the text content that includes your query.
// Cache all the cells
const cells = document.querySelectorAll('td');
// Coerce the node list to an array on which you can
// use the `find` method to find the cell with the matching
// text
const found = [...cells].find(cell => {
return cell.textContent.includes('3');
});
if (found) console.log(found.textContent);
<table>
<tbody>
<tr>
<td>Cell 1</td>
<td>Cell 2</td>
</tr>
<tr>
<td>Cell 3</td>
<td>Cell 4</td>
</tr>
</tbody>
</table>
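One thing to keep in mind with both answers: includes does a substring test, so searching for "9" also matches 19, 29 or 90. If the goal is the row whose number is exactly 9 (an assumption, since the question does not say), an exact comparison may be closer to what is wanted. A small sketch on plain strings, with findExactIndex as a made-up helper name:

```javascript
// Substring match: every text that contains the digit 9.
function findContaining(texts, target) {
  return texts.filter((text) => text.includes(String(target)));
}

// Exact match: index of the first text equal to the target, or -1.
function findExactIndex(texts, target) {
  return texts.findIndex((text) => text.trim() === String(target));
}

// Stand-in for the cell contents; in the browser this could come from
// Array.from(document.querySelectorAll('.table-number'), (td) => td.textContent)
const cellTexts = ['1', '9', '19', '29'];

console.log(findContaining(cellTexts, 9)); // [ '9', '19', '29' ]
console.log(findExactIndex(cellTexts, 9)); // 1
```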

Sort table by two columns vanilla js [duplicate]

This question already has answers here:
Sorting an Array of Objects by two Properties
(5 answers)
Closed last year.
How can I sort my table by two columns (Name, Available) on page load? I can use vanilla JS.
<table class="table_sort">
<thead>
<tr>
<th class="sorted-asc">Name</th>
<th>Genre</th>
<th>Publish year</th>
<th>Quanity</th>
<th class="available">Available</th>
</tr>
</thead>
<tbody id="tbody">
<tr>
<td>aname1</td>
<td>genre1</td>
<td>year1</td>
<td>quantity1</td>
<td>2</td>
</tr>
<tr>
<td>name1</td>
<td>genre1</td>
<td>year1</td>
<td>quantity1</td>
<td>1</td>
</tr>
<tr>
<td>aname1</td>
<td>genre1</td>
<td>year1</td>
<td>quantity1</td>
<td>10</td>
</tr>
<tr>
<td>aname1</td>
<td>genre1</td>
<td>year1</td>
<td>quantity1</td>
<td>6</td>
</tr>
</tbody>
</table>
I tried to sort my table by two columns, Name and Available, but it does not work.
document.addEventListener('DOMContentLoaded', () => {
const table = document.querySelector('.table_sort');
const indexToSorting = [...table.tHead.rows[0].cells].findIndex(cell => cell.classList.contains('sorted-asc'));
const availableIndexes = [...table.tHead.rows[0].cells].findIndex(cell => cell.classList.contains('available'));
const sortedRows = [...table.tBodies[0].rows].sort((rowA, rowB) => {
let cellC;
let cellD;
const sortedRowsByAvailable = [...table.tBodies[0].rows].sort((rowC, rowD) => {
cellC = rowC.cells[availableIndexes].innerText;
cellD = rowD.cells[availableIndexes].innerText;
const availableComparison = cellC.localeCompare(cellD);
return availableComparison;
});
const cellA = rowA.cells[indexToSorting].innerText;
const cellB = rowB.cells[indexToSorting].innerText;
const nameComparison = cellA.localeCompare(cellB);
return nameComparison !== 0 ? nameComparison : sortedRowsByAvailable
});
table.tBodies[0].append(...sortedRows);
});
My table is sorted by name, but I need to sort it by two columns: name, then available. Where is my mistake? I don't understand, please help me.
The nested sort is not needed. A comparator has to return a number, but when the names are equal yours returns the array produced by the inner sort. Instead, compare the Available cells of the same two rows as a tie-breaker for the case where cellA and cellB are equal.
const rowAAvailable = rowA.cells[availableIndexes].innerText;
const rowBAvailable = rowB.cells[availableIndexes].innerText;
const nameComparison = cellA.localeCompare(cellB);
// numeric: true compares "10" and "2" as numbers instead of character by character
const availableComparison = rowAAvailable.localeCompare(rowBAvailable, undefined, { numeric: true });
return nameComparison !== 0 ? nameComparison : availableComparison
This sorts your table by name as the first priority and by Available as the second.
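The tie-breaker idea can be tried outside the DOM. The sketch below uses plain objects standing in for the table rows (the books array and the function name are illustrative only) and compares Available as a number, so that 10 sorts after 6:

```javascript
// Compare by name first; fall back to the numeric Available value on ties.
function compareByNameThenAvailable(a, b) {
  const byName = a.name.localeCompare(b.name);
  if (byName !== 0) return byName;
  return Number(a.available) - Number(b.available);
}

// Same Name/Available values as the table in the question.
const books = [
  { name: 'aname1', available: '2' },
  { name: 'name1', available: '1' },
  { name: 'aname1', available: '10' },
  { name: 'aname1', available: '6' },
];

// Copy before sorting so the original order stays untouched.
const sortedBooks = [...books].sort(compareByNameThenAvailable);

console.log(sortedBooks.map((b) => `${b.name}:${b.available}`).join(' '));
// aname1:2 aname1:6 aname1:10 name1:1
```

In the page itself the same comparator shape would read the two values from rowA.cells and rowB.cells, as in the answer above.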

How can I clear/refresh my table before getting new data from the API?

I am trying to retrieve data from an API when a button is clicked, but every time I click the button the old data stays in the table, which I don't want. I want the table to be cleared and then filled with the new data from the API.
const showData = document.querySelector('.showData')
const btn = document.querySelector('.shwData-btn')
btn.addEventListener('click', showdata)
function showdata(){
fetch('http://localhost:5000/posts')
.then(res => res.json())
.then(data=>{
data.forEach(item =>{
const id = item['_id']
const name = item.name
const email = item.email
const age = item.age
const tr = document.createElement('tr')
tr.innerHTML = `
<tr>
<td>${id}</td>
<td>${name}</td>
<td>${email}</td>
<td>${age}</td>
</tr>
`
showData.appendChild(tr)
})})}
HTML:
<button class="shwData-btn">Showdata</button>
<table class="showData">
<tr>
<td>id</td>
<td>email</td>
<td>name</td>
<td>age</td>
</tr>
</table>
You will have to render a blank table or clear all rows (<tr>) before populating it with data.
const showData = document.querySelector('.showData')
const btn = document.querySelector('.shwData-btn')
btn.addEventListener('click', showdata)
function showdata(){
fetch('http://localhost:5000/posts')
.then(res => res.json())
.then(data=>{
// Clear the previously rendered rows before adding the new ones.
// Only rows inside <tbody> are removed, so the column headings in <thead> survive.
// Note: $ is jQuery; without it, showData.tBodies[0].replaceChildren() does the same.
$(".showData tbody tr").remove();
data.forEach(item =>{
const id = item['_id']
const name = item.name
const email = item.email
const age = item.age
const tr = document.createElement('tr')
// tr is already a <tr>, so only the cells go inside it
tr.innerHTML = `
<td>${id}</td>
<td>${name}</td>
<td>${email}</td>
<td>${age}</td>
`
// Append to the <tbody>, not the <table>, so the selector above finds these rows on the next click
showData.tBodies[0].appendChild(tr)
})})}
HTML:
<button class="shwData-btn">Showdata</button>
<table class="showData">
<thead>
<tr>
<td>id</td>
<td>name</td>
<td>email</td>
<td>age</td>
</tr>
</thead>
<tbody></tbody>
</table>
I highly recommend having a look at this as well:
Delete all rows in an HTML table
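A jQuery-free variant is also possible: build the markup for all rows as one string and assign it to the <tbody>, which replaces the old rows in a single step. The sketch below assumes the headings live in a <thead> and the data rows in a <tbody>; escapeHtml and renderRows are made-up helper names. Escaping is included because values from an API that are interpolated into innerHTML could otherwise inject markup.

```javascript
// Escape text so API values cannot inject markup into the table.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the markup for all data rows at once.
function renderRows(items) {
  return items
    .map((item) => {
      const cells = [item._id, item.name, item.email, item.age]
        .map((value) => `<td>${escapeHtml(value)}</td>`)
        .join('');
      return `<tr>${cells}</tr>`;
    })
    .join('');
}

// Hypothetical browser usage (assumes <table class="showData"> has a <tbody>):
//
// fetch('http://localhost:5000/posts')
//   .then((res) => res.json())
//   .then((data) => {
//     // Assigning innerHTML discards the old rows and inserts the new ones.
//     document.querySelector('.showData tbody').innerHTML = renderRows(data);
//   });
```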

Mapping table children content with puppeteer

My goal is to fetch .textContent from different <td> tags, each of which lies within a separate <tr>.
I think the problem lies in the table variable, as I am not checking the correct element for children. Currently, the data variable only fetches the first <tr>, so price evaluates with this code. However, volume and turnover do not. I think it is a simple fix, but I just can't figure it out!
JavaScript:
try {
const tradingData = await page.evaluate(() => {
let table = document.querySelector("#trading-data tbody");
let tableData = Array.from(table.children);
let data = tableData.map(tradeData => {
console.log(tradeData);
let price = tradeData.querySelector(".quoteapi-price").textContent;
console.log(price);
let volume = tradeData.querySelector("quoteapi-volume").textContent;
console.log(volume);
let turnover = tradeData.querySelector("quoteapi-value").textContent;
console.log(turnover);
return { price, volume, turnover };
})
return data;
});
console.log(tradingData);
} catch (err) {
console.log(err);
}
HTML:
<table id="trading-data" class="qq_table">
<tbody>
<tr class="qq_tr_border_bot">
<td>Price</td>
<td class="qq_td_right quoteapi-number quoteapi-price" data-quoteapi="price">$0.105</td>
</tr>
<tr class="qq_tr_border_bot">
<td>Change</td>
<td class="qq_td_right pos" data-quoteapi="changeSignCSS">
<span data-quoteapi="change (signed)" class="quoteapi-number quoteapi-price quoteapi-change">0.005</span>
<span data-quoteapi="pctChange (pct)" class="quoteapi-number quoteapi-pct-change">(5.00%)</span>
</td>
</tr>
<tr class="qq_tr_border_bot">
<td>Volume</td>
<td class="qq_td_right quoteapi-number quoteapi-volume" data-quoteapi="volume scale=false">5,119,162</td>
</tr>
<tr>
<td>Turnover</td>
<td class="qq_td_right quoteapi-number quoteapi-value" data-quoteapi="value scale=false">$540,173</td>
</tr>
</tbody>
</table>
For example, this should return price="$0.11", volume="3,900,558", turnover="$412,187"
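Each <tr> holds only one of the three cells, so inside map the querySelector call returns null for the other two and reading .textContent throws. Two of your selectors are also missing the leading dot of a class selector ("quoteapi-volume" instead of ".quoteapi-volume"). You only need the map function when you are expecting multiple tables or tbodies. As this seems not to be the case in your example, you can query the <tbody> directly: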
const tradingData = await page.evaluate(() => {
let table = document.querySelector("#trading-data tbody");
let price = table.querySelector(".quoteapi-price").textContent;
let volume = table.querySelector(".quoteapi-volume").textContent;
let turnover = table.querySelector(".quoteapi-value").textContent;
return { price, volume, turnover };
});
console.log(tradingData);
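If more rows should be captured without writing one selector per value, another option is to treat every <tr> as a label/value pair and build the object from those pairs. This is only a sketch of that idea, assuming each row has exactly two cells as in the HTML above; pairsToRecord is a hypothetical helper name.

```javascript
// Turn [label, value] pairs into an object keyed by the lower-cased label.
function pairsToRecord(pairs) {
  return Object.fromEntries(
    pairs.map(([label, value]) => [label.trim().toLowerCase(), value.trim()])
  );
}

// Stand-in for the pairs read from the page; inside page.evaluate they
// could be collected with:
//   Array.from(document.querySelectorAll('#trading-data tbody tr'),
//     (tr) => [tr.cells[0].textContent, tr.cells[1].textContent])
const pairs = [
  ['Price', '$0.105'],
  ['Volume', '5,119,162'],
  ['Turnover', '$540,173'],
];

console.log(pairsToRecord(pairs));
// { price: '$0.105', volume: '5,119,162', turnover: '$540,173' }
```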

NodeJS: How can I scrape two different tables, that are visually part of the same table, into one JSON Object?

Here's an example of the table of data I'm scraping:
The elements in red are in <th> tags, while the elements in green are in <td> tags. The <tr> tags follow how the cells are grouped (i.e. '1' is in its own <tr>). HTML snippet:
EDIT: I forgot to add the surrounding div
<div class="table-cont">
<table class="tg-1">
<thead>
<tr>
<th class="tg-phtq">ID</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-0pky">1</td>
<td class="tg-0pky">2</td>
<td class="tg-0pky">3</td>
</tr>
</tbody>
</table>
<table class="tg-2">
<thead>
<tr>
<th class="tg-phtq">Sample1</th>
<th class="tg-phtq">Sample2</th>
<...the rest of the table code matches the pattern...>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-0pky">Swimm</td>
<td class="tg-dvpl">1:30</td>
<...>
</tr>
</tbody>
<...the rest of the table code...>
</table>
</div>
As you can see, in the HTML they're actually two different tables while they're displayed in the above example as only one. I want to generate a JSON object where the keys and values include the data from the two tables as if they were one, and output a single JSON Object.
Right now I'm scraping it with slightly modified JavaScript code I found in a tutorial:
EDIT: In the code below, I've been trying to find a way to select all relevant <th> tags from both tables and put them into the same array, and to do the same for the <tr> tags in the table bodies. I'm fairly sure that for the <th> I can just insert the element separately before the rest, but only because there's a single one. I've been having problems figuring out how to do that for both arrays and make sure all the items in the two arrays map correctly to each other.
EDIT 2: Possible solution? I tried XPath selectors, and I can use them in DevTools to select everything I want, but page.evaluate doesn't accept them, and page.$x('XPath') returns JSHandle#node values where I need an array. I don't know where to go from there.
let scrapeMemberTable = async (page) => {
  // Return the evaluated result so the caller receives the rows
  return await page.evaluate(() => {
    let ths = Array.from(document.querySelectorAll('div.table-cont > table.tg-2 > thead > tr > th'));
    let trs = Array.from(document.querySelectorAll('div.table-cont > table.tg-2 > tbody > tr'));
    // The two lines above are the main problem area: I haven't been
    // able to select all the head/body elements I want in just those two lines of code.
    // Just removing the table class "tg-2" seems to deselect the whole thing.
    const headers = ths.map(th => th.textContent);
    let results = [];
    trs.forEach(tr => {
      let r = {};
      let tds = Array.from(tr.querySelectorAll('td')).map(td => td.textContent);
      headers.forEach((k, i) => r[k] = tds[i]);
      results.push(r);
    });
    return results; // results is an array of plain objects, serializable as JSON
  });
};
...
results = results.concat( //merge into one array OBJ
await scrapeMemberTable(page)
);
...
Intended Result:
[
{
"ID": "1", <-- this is the goal
"Sample1": "Swimm",
"Sample2": "1:30",
"Sample3": "2:05",
"Sample4": "1:15",
"Sample5": "1:41"
}
]
Actual Result:
[
{
"Sample1": "Swimm",
"Sample2": "1:30",
"Sample3": "2:05",
"Sample4": "1:15",
"Sample5": "1:41"
}
]
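One possible approach is to read each table separately and then merge the resulting records row by row. The sketch below makes an assumption the question does not confirm: body row i of the first table belongs to body row i of the second. The names tableToRecords and mergeTables are made up for illustration, and the input is plain data so the merge step can be checked without a browser.

```javascript
// Turn one table's headers and body rows into an array of objects,
// pairing header i with cell i (extra cells are ignored).
function tableToRecords({ headers, rows }) {
  return rows.map((cells) => {
    const record = {};
    headers.forEach((key, i) => {
      record[key] = cells[i];
    });
    return record;
  });
}

// Merge the records of several tables row by row (assumed alignment).
function mergeTables(tables) {
  const perTable = tables.map(tableToRecords);
  const rowCount = Math.max(0, ...perTable.map((records) => records.length));
  return Array.from({ length: rowCount }, (_, i) =>
    Object.assign({}, ...perTable.map((records) => records[i] || {}))
  );
}

// Stand-in for the scraped data; inside page.evaluate it could be built with:
//   Array.from(document.querySelectorAll('div.table-cont > table'), (table) => ({
//     headers: Array.from(table.querySelectorAll('thead th'), (th) => th.textContent.trim()),
//     rows: Array.from(table.querySelectorAll('tbody tr'), (tr) =>
//       Array.from(tr.querySelectorAll('td'), (td) => td.textContent.trim())),
//   }))
const merged = mergeTables([
  { headers: ['ID'], rows: [['1', '2', '3']] },
  { headers: ['Sample1', 'Sample2'], rows: [['Swimm', '1:30']] },
]);

console.log(JSON.stringify(merged));
// [{"ID":"1","Sample1":"Swimm","Sample2":"1:30"}]
```

Reading each table in its own pass keeps the header-to-cell mapping local to that table, which avoids having to line up one combined <th> array with one combined <td> array.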
