node.js puppeteer "document is not defined" - javascript

I am trying to click a button that has no id or class from code, but my terminal always responds with:
document.getElementsByTagName("Accept Cookies");
^
ReferenceError: document is not defined
This is my code:
const puppeteer = require('puppeteer');
const product_url = "https://www.nike.com/launch"
async function givePage() {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
return page;
}
async function acceptCookies(page) {
await page.goto(product_url);
const btn = await page.waitForSelector('#cookie-settings-layout > div > div > div > div:nth-child(3) > div.ncss-col-md-6.ncss-col-sm-12.mb5-sm > button')
await btn.click()
}
async function notifyMe(page) {
await page.goto(product_url);
document.querySelector("button[type=\"submit\"]").click("Notify Me");
}
async function checkout() {
var page = await givePage();
await acceptCookies(page);
await notifyMe(page);
}
checkout();
What did I do wrong and how can I fix this?

There's no built-in variable in NodeJS named document, since it doesn't run in the browser.
If you want to access document, in Puppeteer there's a page.evaluate() function where you can access the document variable (as well as everything else inside client-side JS):
// ...
await page.evaluate(() => {
document.querySelector("button[type=\"submit\"]").click();
});
Note that the callback you pass to page.evaluate() runs in the browser, not in Node.js, so if you want a value back in your script, return it from the callback:
const result = await page.evaluate(() => {
var something = document.getElementById("something");
return something.innerText;
});
console.log(result); // will print in the console "blah blah blah"
Likewise if you want to pass variables to the callback you have to give them to the evaluate function:
await page.evaluate((name, age) => {
// do something with 'name' and 'age'
}, "John", 34);
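To tie this back to the question's "Notify Me" button, here is a minimal sketch that passes the label into page.evaluate() and returns whether a matching button was clicked. The helper name clickButtonByText is hypothetical, and it assumes the label is the button's trimmed text content, which the question does not confirm.

```javascript
// Hypothetical helper: clicks the first <button> whose trimmed text equals `label`.
// `page` is assumed to be a Puppeteer Page (anything with a compatible evaluate()).
async function clickButtonByText(page, label) {
  // The callback runs in the browser, so `document` is available there.
  // `label` must be passed as an argument; Node-side closures are not visible to it.
  return page.evaluate((wanted) => {
    const buttons = Array.from(document.querySelectorAll("button"));
    const match = buttons.find((b) => b.textContent.trim() === wanted);
    if (!match) return false; // a boolean serializes cleanly back to Node
    match.click();
    return true;
  }, label);
}
```

It could be called as `const clicked = await clickButtonByText(page, "Notify Me");` after page.goto() has resolved.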

You already have an example in your code of how to access elements. Instead of document.querySelector, use page.waitForSelector, as you did in acceptCookies.
document.querySelector('button[type="submit"]').click()
should be
await (await page.waitForSelector('button[type="submit"]')).click()
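Applied to the question's notifyMe function, the same idea might look like the sketch below. The url parameter is added here only to keep the example self-contained; the original code reads product_url from the enclosing scope.

```javascript
// Sketch of notifyMe using Puppeteer's own API instead of `document`.
// `page` is assumed to be a Puppeteer Page; `url` is a parameter added for this example.
async function notifyMe(page, url) {
  await page.goto(url);
  // waitForSelector resolves to an ElementHandle once the element is in the DOM.
  const button = await page.waitForSelector('button[type="submit"]');
  // ElementHandle.click() returns a promise, so await it before moving on.
  await button.click();
}
```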

In Node.js, you don't have access to web APIs such as window or document, so you can't use document.querySelector to select elements here.
Instead of handling clicks on DOM elements on server side, you should handle those clicks on the client-side only and then fetch the data from the server accordingly.

Related

Read a page stylesheet and check a specific property with puppeteer

What I am trying to do is:
Load the page
Gain access to the contents of an external css named "mystyle.css"
Check if ".some_class" border has the value "2px"
I have tried
describe('CSS tests', () => {
it('.some_class border is 2px', async function () {
await page.goto(<homepageurl>);
const stylesheet = await page.evaluate(() => {
return document.querySelector("link[href*='mystyle.css']");
});
console.log(stylesheet);
// rest of the code
});
});
I am getting an empty object {} as a result so I am lost and don't know how to carry on.
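The empty object is most likely the serialized form of the DOM element returned from page.evaluate(), since only serializable values survive the trip back to Node. One possible workaround, not from the original post, is to read the rendered value inside the browser and return a plain string. The sketch below assumes that checking the computed style of an element matching .some_class is acceptable in place of parsing mystyle.css itself; getBorderWidth is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the computed top border width (e.g. "2px")
// of the first element matching `selector`.
// `page` is assumed to be a Puppeteer Page.
async function getBorderWidth(page, selector) {
  // $eval runs the callback in the browser with the matched element.
  // Returning a string (not the element or a CSSStyleDeclaration) keeps it serializable.
  return page.$eval(selector, (el) => getComputedStyle(el).borderTopWidth);
}
```

Inside the test, the result of `await getBorderWidth(page, ".some_class")` could then be compared against "2px" with whatever assertion library is in use.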

What is the difference between page.$$(selector) and page.$$eval(selector, function) in puppeteer?

I'm trying to load page elements into an array and retrieve the innerHTML from both and be able to click on them.
var grabElements = await page.$$(selector);
await grabElements[0].click();
This allows me to grab my elements and click on them but it won't display innerHTML.
var elNum = await page.$$eval(selector, (element) => {
let n = []
element.forEach(e => {
n.push(e);
})
return n;
});
await elNum[0].click();
This lets me get the innerHTML if I push the innerHTML to n. If I push just the element e and try to click or get its innerHTML outside of the var declaration, it doesn't work. The innerHTML comes as undefined and if I click, I get an error saying elnum[index].click() is not a function. What am I doing wrong?
The difference between page.$$eval (and other evaluate-style methods, with the exception of evaluateHandle) and page.$$ is that the evaluate family only works with serializable values. As you discovered, you can't return elements from these methods because they're not serializable (they have circular references and would be useless in Node anyway).
On the other hand, page.$$ returns Puppeteer ElementHandles that are references to DOM elements that can be manipulated from Puppeteer's API in Node rather than in the browser. This is useful for many reasons, one of which is that ElementHandle.click() issues a totally different set of operations than running the native DOMElement.click() in the browser.
From the comments:
An example of what I'm trying to get is: <div class="class">This is the innerHTML text I want.</div>. On the page, it's text inside a clickable portion of the website. What I want to do is loop through the available options, then click on the ones that match the innerHTML I'm looking for.
Here's a simple example you should be able to extrapolate to your actual use case:
const puppeteer = require("puppeteer"); // ^19.1.0
const {setTimeout} = require("timers/promises");
const html = `
<div>
<div class="class">This is the innerHTML text I want.</div>
<div class="class">This is the innerHTML text I don't want.</div>
<div class="class">This is the innerHTML text I want.</div>
</div>
<script>
document.querySelectorAll(".class").forEach(e => {
e.addEventListener("click", () => e.textContent = "clicked");
});
</script>
`;
const target = "This is the innerHTML text I want.";
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
///////////////////////////////////////////
// approach 1 -- trusted Puppeteer click //
///////////////////////////////////////////
const handles = await page.$$(".class");
for (const handle of handles) {
if (target === (await handle.evaluate(el => el.textContent))) {
await handle.click();
}
}
// show that it worked and reset
console.log(await page.$eval("div", el => el.innerHTML));
await page.setContent(html);
//////////////////////////////////////////////
// approach 2 -- untrusted native DOM click //
//////////////////////////////////////////////
await page.$$eval(".class", (els, target) => {
els.forEach(el => {
if (target === el.textContent) {
el.click();
}
});
}, target);
// show that it worked and reset
console.log(await page.$eval("div", el => el.innerHTML));
await page.setContent(html);
/////////////////////////////////////////////////////////////////
// approach 3 -- selecting with XPath and using trusted clicks //
/////////////////////////////////////////////////////////////////
const xp = '//*[@class="class"][text()="This is the innerHTML text I want."]';
for (const handle of await page.$x(xp)) {
await handle.click();
}
// show that it worked and reset
console.log(await page.$eval("div", el => el.innerHTML));
await page.setContent(html);
///////////////////////////////////////////////////////////////////
// approach 4 -- selecting with XPath and using untrusted clicks //
///////////////////////////////////////////////////////////////////
await page.evaluate(xp => {
// https://stackoverflow.com/a/68216786/6243352
const $x = xp => {
const snapshot = document.evaluate(
xp, document, null,
XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
);
return [...Array(snapshot.snapshotLength)]
.map((_, i) => snapshot.snapshotItem(i))
;
};
$x(xp).forEach(e => e.click());
}, xp);
// show that it worked
console.log(await page.$eval("div", el => el.innerHTML));
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Output in all cases is:
<div class="class">clicked</div>
<div class="class">This is the innerHTML text I don't want.</div>
<div class="class">clicked</div>
Note that === might be too strict without calling .trim() on the textContent first. You may want an .includes() substring test instead, although the risk there is that it's too permissive. Or a regex may be the right tool. In short, use whatever makes sense for your use case rather than (necessarily) my === test.
With respect to the XPath approach, this answer shows a few options for dealing with whitespace and substrings.
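The matching options mentioned above (trimmed equality, substring, regex) can be sketched as a small predicate factory. The name makeTextMatcher and its mode argument are hypothetical; this is one way to organize the choice, not part of the answer's code.

```javascript
// Hypothetical helper: builds a predicate for comparing element text.
// - RegExp pattern: uses pattern.test() (avoid the g flag, which makes test() stateful)
// - mode "includes": substring test, which is more permissive
// - default "exact": trims the text, then compares with ===
function makeTextMatcher(pattern, mode = "exact") {
  if (pattern instanceof RegExp) {
    return (text) => pattern.test(text);
  }
  if (mode === "includes") {
    return (text) => text.includes(pattern);
  }
  return (text) => text.trim() === pattern;
}
```

In approach 1, the condition could then become `if (matches(await handle.evaluate(el => el.textContent)))`, with `const matches = makeTextMatcher(target);` defined once before the loop.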

get all spans and click them with puppeteer - fails with "node not visible" and other errors

I have an HTML page that has many div elements, each one with the following structure (the input id and name change):
<div class="item">
<div class="box">
<div class="img-block">
<label for="check-11">
<input id="check-11" name="result11" type="checkbox">
<span class="fake-input"></span>
</label>
</div>
</div>
</div>
I want to use puppeteer to get all the span with the 'fake-input' class and click on them.
The problem is that it never works, no matter what I try.
In every attempt the start is the same:
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto(baseUrl, { waitUntil: 'networkidle2' });
// FETCHING AND CLICKING
})();
I tried many things:
1:
await page.waitForSelector('span.fake-input');
await page.click('span.fake-input');
2:
await page.waitForSelector('span.fake-input')
.then(()=>{
console.log(`clicked!`);
page.click('span.fake-input')
});
3:
const spans = await page.evaluate(() => {
return Array.from(document.querySelectorAll('span'), el => el.textContent)
})
console.log('spans', spans)
for (let index = 0; index < 7; index++) {
const element = spans[index];
await page.click('span')
}
4:
await page.evaluate(selector=>{
return document.querySelector(selector).click();
},'span.fake-input')
console.log('clicked');
In every attempt the page either fails to find anything at all (the result is null or undefined, so the error is that "click" is not a function of null) or fails with the error "Node is either not visible or not an HTMLElement".
No matter the error, in any case I fail to fetch all the spans, and click on them.
Can anyone tell me what I'm doing wrong?
Use page.$$ to return multiple elements (equivalent of document.querySelectorAll). Use page.$ to return a single element (equivalent of document.querySelector).
If you want to extract a certain value from a group of elements, use page.$$eval and page.$eval for a single element.
e.g. to get ElementHandles back in your script and click them one at a time (note that the divs in your markup have the class "item", not the id "item"):
const spans = await page.$$('div.item label .fake-input')
for (const span of spans) await span.click()
If you are extracting a value from an element, pass a callback that returns what you need to extract
e.g.
const spanTexts = await page.$$eval('div.item label .fake-input', spans => spans.map(span => span.innerText))
console.log(spanTexts)
I should add that page.$$eval and page.$eval execute your callback in the browser context.
var obj = document.querySelectorAll("span.fake-input");
for(var i=0;i<obj.length;i++){
obj[i].click();
}
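Vanilla JavaScript is much easier here, as long as it runs in the browser context (for example inside page.evaluate), since document does not exist in Node.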
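The same loop can be run from Puppeteer by wrapping it in page.$$eval, as in the sketch below. The native DOM click() does not go through Puppeteer's scroll-into-view and mouse positioning steps, so it may sidestep the "Node is either not visible" error, though it is an untrusted click and pages can treat it differently. The helper name clickAllInBrowser is hypothetical.

```javascript
// Hypothetical helper: clicks every element matching `selector` with the
// native DOM click(), and returns how many elements were clicked.
// `page` is assumed to be a Puppeteer Page.
async function clickAllInBrowser(page, selector) {
  // $$eval runs querySelectorAll in the page and passes the resulting array
  // to the callback, which executes in the browser context.
  return page.$$eval(selector, (elements) => {
    elements.forEach((el) => el.click());
    return elements.length; // a number is serializable, the elements are not
  });
}
```

For the markup in the question this might be called as `await clickAllInBrowser(page, "span.fake-input");`.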

How do I get whole html from Apify Cheerio crawler?

I want to get the whole html not just text.
Apify.main(async () => {
const requestQueue = await Apify.openRequestQueue();
await requestQueue.addRequest({
url: //adress,
uniqueKey: makeid(100)
});
const handlePageFunction = async ({ request, $ }) => {
var content_to = $('.class')
};
// Set up the crawler, passing a single options object as an argument.
const crawler = new Apify.CheerioCrawler({
requestQueue,
handlePageFunction,
});
await crawler.run();
});
When I try this the crawler returns a complex object. I know I can extract the text from the content_to variable using .text(), but I need the whole HTML, including the tags. What should I do?
If I understand you correctly - you could just use .html() instead of .text(). This way you will get inner html instead of inner text of the element.
Another thing to mention: you can also destructure body from the handlePageFunction argument object:
const handlePageFunction = async ({ request, body, $ }) => {
body would have the whole raw html of the page.
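A minimal sketch of the difference between these options follows. extractHtml is a hypothetical helper that could be called from handlePageFunction with the $ and body it receives; the String() conversion assumes body may arrive as a Buffer rather than a string.

```javascript
// Hypothetical helper: collects markup (not text) for a selector and for the page.
// `$` is assumed to be the Cheerio instance passed to handlePageFunction,
// `body` the raw response body from the same argument object.
function extractHtml($, body, selector) {
  const selection = $(selector);
  return {
    innerHtml: selection.html(),  // markup inside the first matched element
    outerHtml: $.html(selection), // matched elements including their own tags
    pageHtml: String(body),       // whole page; String() also covers a Buffer body
  };
}
```

Inside the handler this might look like `const { outerHtml } = extractHtml($, body, '.class');`.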

Can I simulate pressing the "Enter" key in Puppeteer using only a frame reference?

I'd really like to submit a form in an iframe using Puppeteer, which I've found I can do pretty easily by going
page.keyboard.press('Enter');
However, for nearly everything else I want to do, all I need to pass around is a reference to the iframe I'm interested in. For instance, I may have a method that fills out and submits a form like so:
// Some other setup script
const page = await context.newPage();
const frame = page.frames().find(frame => frame.name() === 'myFrame'); // Iframe ref
// Utility method
async function useTheForm(frame) {
// ...
// Do other misc form setup
// ...
await frame.type('myInput', 'Some Value');
// TODO: Submit the form... somehow...
// "frame.keyboard" doesn't exist. Need some kind of ref like "frame.page"
// frame._frameManager._page.keyboard.press('Enter') works, but is kind of dirty...
}
// Use our utility method
useTheForm(frame);
I'd really like a way to submit the form using the "Enter" key without having to also keep track of and pass around a reference to page as well, but I'm hesitant to use intended-to-be-internal properties that aren't documented in the API.
You can focus an element in the iframe and then press a key with page.keyboard. Here is an example that presses Enter on a focused link in an iframe, causing the iframe to navigate (though this navigation seems to fail due to the site's iframe policy):
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch(
{ headless: false, defaultViewport: null });
const [page] = await browser.pages();
await page.goto('https://example.org/');
const data = await page.evaluate(() => {
document.body.appendChild(document.createElement('iframe')).src =
'https://example.org/?foo=bar';
});
await page.waitFor(3000);
console.log(page.frames().map(frame => frame.url()));
await page.frames()[1].focus('a');
await page.keyboard.press('Enter');
//await browser.close();
} catch (err) {
console.error(err);
}
})();
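An alternative that avoids the page reference entirely, and is not part of the answer above, is to press the key on an element handle obtained from the frame: ElementHandle.press() focuses the element and then sends the key. The sketch below assumes the form submits when Enter is pressed in the chosen field; pressEnterInFrame is a hypothetical helper name.

```javascript
// Hypothetical helper: presses Enter on an element inside a frame without
// needing a reference to the owning page.
// `frame` is assumed to be a Puppeteer Frame; `selector` targets the form field.
async function pressEnterInFrame(frame, selector) {
  const field = await frame.$(selector); // ElementHandle, or null if nothing matches
  if (!field) {
    throw new Error(`No element matches ${selector} in frame`);
  }
  // ElementHandle.press() focuses the element, then sends key down and key up.
  await field.press("Enter");
}
```

In the question's useTheForm, this could replace the TODO with `await pressEnterInFrame(frame, 'myInput');`, reusing the selector already passed to frame.type().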
