Node.js: puppeteer focus() function - javascript

I am trying to log in to a site using Puppeteer and then do some other things once I'm logged in. Connecting to the site works, but I have a problem with the focus() function. It takes a selector as a parameter, but when I pass one, it throws an error. The selector is correct: running document.querySelector("input.login-field") in the site's console returned this:
<input class="login-field" type="text" inputmode="email" autocapitalize="none" name="m" placeholder="Email or username" value="">
What's the problem?
Here's my code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({headless: false, slowMo: 25});
  const page = await browser.newPage();
  await page.goto("site");
  await page.focus("input.login-field");
  await page.keyboard.type("information");
  await browser.close();
})();

If you're sure the selector is correct and it works in the console in headful mode, try waiting until the page's scripts have been downloaded and run and the element you need has appeared in the DOM:
await page.goto("site");
await page.waitForSelector('input.login-field'); // <-- wait until it exists
await page.focus("input.login-field");
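The sketch below shows the "wait until it exists" idea in plain Node, with no browser involved. It polls with a timeout, which is roughly what a waitForSelector-style call does for you. waitForCondition and the fake DOM object are hypothetical stand-ins, not Puppeteer's actual implementation. In a real script you'd simply use page.waitForSelector as shown above.

```javascript
// A minimal sketch of "poll until a condition holds, or give up after a timeout".
// waitForCondition is a hypothetical helper, not Puppeteer's implementation.
function waitForCondition(check, { timeoutMs = 30000, intervalMs = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const tick = () => {
      let value;
      try {
        value = check(); // e.g. "does the element exist yet?"
      } catch (err) {
        reject(err);
        return;
      }
      if (value) {
        resolve(value); // condition met: hand back whatever check() found
      } else if (Date.now() - started >= timeoutMs) {
        reject(new Error(`Condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs); // not yet: try again shortly
      }
    };
    tick();
  });
}

// Simulated page: the input only "appears" 50 ms after navigation finished,
// the way a script-rendered login form might.
const fakeDom = { input: null };
setTimeout(() => { fakeDom.input = { className: "login-field" }; }, 50);

console.log(fakeDom.input); // null: an immediate focus() would fail at this point
waitForCondition(() => fakeDom.input, { timeoutMs: 1000, intervalMs: 10 })
  .then(input => console.log("found", input.className));
```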

Related

Iterating through a list of urls with puppeteer

I'm attempting to iterate through a list of URLs, and instead of puppeteer loading each page, it only loads one. What can I do to make this work?
const puppeteer = require('puppeteer');
async function main() {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.setViewport({width: 1200, height: 720});
  await page.goto('https://s23.a2zinc.net/clients/acmedia/americancoatingsshow2022/Public/Exhibitors.aspx?Index=All#', { waitUntil: 'networkidle0' }); // wait until page load
  const hrefs = await page.$$eval('a', as => as.map(a => a.href));
  for (let i = 0; i < hrefs.length; i++) {
    const url = hrefs[i];
    if (url.includes('eBooth.aspx')) {
      console.log(url)
      const page = await browser.newPage()
      await page.goto(`${url}`);
      await page.waitForNavigation({ waitUntil: 'networkidle0' });
    }
  }
}
main();
The main problem is the extra await page.waitForNavigation({ waitUntil: 'networkidle0' });, which never resolves. page.goto already waits for the navigation it starts, so you're asking Puppeteer to wait for a second navigation that never happens. After the default 30-second timeout, waitForNavigation rejects with a TimeoutError and the loop stops.
Only use page.waitForNavigation when you're doing something that triggers a navigation (such as clicking a link), not as part of a typical page.goto call. Remove this line and your code should work (more or less) as expected.
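Here's a plain-Node analogy (no browser) for why the stray waitForNavigation hangs. A navigation is an event that fires once, and a listener attached after it has fired never hears it. fakeGoto, waitAfterGoto, waitAlongsideTrigger and withTimeout are hypothetical stand-ins. The real Puppeteer idiom when an action such as a click triggers navigation is to start the wait and the action together: await Promise.all([page.waitForNavigation(), page.click(selector)]).

```javascript
const { EventEmitter, once } = require("events");

// Resolves to "timeout" if `promise` hasn't settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise(resolve => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for page.goto: triggers one "navigated" event and
// resolves after it has fired.
async function fakeGoto(tab) {
  await new Promise(resolve => setTimeout(resolve, 10));
  tab.emit("navigated");
}

// Mirrors the question: goto first, THEN wait for another navigation.
// The only "navigated" event already happened, so this times out.
async function waitAfterGoto(tab, ms) {
  await fakeGoto(tab);
  return withTimeout(once(tab, "navigated").then(() => "navigated"), ms);
}

// Mirrors Promise.all([page.waitForNavigation(), page.click(sel)]):
// the listener is attached before the trigger runs, so it sees the event.
async function waitAlongsideTrigger(tab, ms) {
  const [result] = await Promise.all([
    withTimeout(once(tab, "navigated").then(() => "navigated"), ms),
    fakeGoto(tab),
  ]);
  return result;
}
```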
Furthermore, you're opening a whole new page (browser tab) per link. That's 360 tabs by my count, liable to run most computers out of memory. Better to navigate a single page repeatedly or close pages after you're finished doing whatever you plan to do on these pages. If that's too slow, try running chunks in parallel or using a task queue.
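If one tab at a time is too slow, a small task queue with a concurrency limit is one way to run chunks in parallel without opening hundreds of tabs at once. This is a sketch with a hypothetical helper name, mapWithConcurrency. The commented-out worker assumes you open a page per task and close it when you're done.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight at once,
// keeping results in input order. mapWithConcurrency is a hypothetical helper.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner keeps claiming the next unprocessed index until none are left.
  // JS is single-threaded, so `next++` can't be claimed twice.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// With Puppeteer, a worker might look like this (not run here):
// const data = await mapWithConcurrency(hrefs, 5, async url => {
//   const page = await browser.newPage();
//   try {
//     await page.goto(url);
//     return await page.title(); // or whatever you scrape
//   } finally {
//     await page.close(); // don't leave 360 tabs open
//   }
// });
```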
Also, the links are available in the static HTML, so you might not need Puppeteer here, again, depending on what you're planning on doing on each page. If you can get all of the data from each page statically, you could have a massive speedup, completing 360 scrapes with fetch/cheerio in a fraction of the time it'd take Puppeteer.
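Here's a rough sketch of the "no browser" route: pulling the eBooth.aspx links out of static HTML. The regex is only a stand-in to keep the example dependency-free. It doesn't decode HTML entities such as &amp; and skips unquoted attributes, so a real parser like cheerio is the more robust choice. extractLinks is a hypothetical helper. One difference from a.href in the browser is that raw href attributes may be relative, so they're resolved against the page URL with the standard URL constructor.

```javascript
// Returns absolute URLs of <a> elements whose href contains `mustInclude`.
// Regex-based and deliberately rough; prefer a real HTML parser in practice.
function extractLinks(html, pageUrl, mustInclude) {
  // <a ... href="..."> or href='...'; requires whitespace before "href"
  // so attributes like data-href are not picked up.
  const hrefRe = /<a\s(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1/gi;
  const links = [];
  for (const match of html.matchAll(hrefRe)) {
    const raw = match[2].trim();
    if (!raw) continue; // skip href=""
    let absolute;
    try {
      absolute = new URL(raw, pageUrl).href; // resolve relative hrefs
    } catch {
      continue; // skip values that don't form a valid URL
    }
    if (absolute.includes(mustInclude)) links.push(absolute);
  }
  return links;
}

// With Node 18+, the HTML could come from the built-in fetch (not run here):
// const html = await (await fetch(listingUrl)).text();
// const hrefs = extractLinks(html, listingUrl, "eBooth.aspx");
```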
If you do stick with Puppeteer to bypass detection or deal with JS/interactivity, consider using domcontentloaded rather than networkidle0, which is usually unnecessarily strict and slow. The blog post linked explains the difference between the various loading conditions. See also my answer in the canonical thread Puppeteer wait until page is completely loaded for a deeper dive into page loading in Puppeteer.
a[href] is a more precise selector than a, because some a elements may have no href attribute. For those, a.href returns an empty string, which you'd want to discard anyway.
Here's how I'd write this (with the aforementioned caveat that Puppeteer might not be needed at all):
const puppeteer = require("puppeteer"); // ^14.3.0
let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  await page.setViewport({width: 1200, height: 720});
  const url = "https://s23.a2zinc.net/clients/acmedia/americancoatingsshow2022/Public/Exhibitors.aspx?Index=All#";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const hrefs = await page.$$eval("a[href]", els =>
    els.map(a => a.href).filter(e => e.includes("eBooth.aspx"))
  );
  console.log(hrefs.length); // => 360
  for (const url of hrefs) {
    await page.goto(url);
    // page is loaded; do your thing on this page
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

why puppeteer does not type in some inputs

Sometimes Puppeteer doesn't type into some input fields. Specifically, I tried to type something into the input field on https://webtor.io/, which has a single large input field. I hope someone can help me with this specific example.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://webtor.io/');
  await page.type(`input[type="text"]`, 'something', { delay: 50 });
})();
This happens because, when page.goto resolves, the page may still be rendering its HTML and running scripts. The text input sometimes doesn't exist yet, so typing fails. Wait until the network is idle before typing:
await page.goto('https://webtor.io/', {waitUntil: 'networkidle0'});
Alternatively, await page.waitForSelector('input[type="text"]') before page.type waits for the input itself.
For further details, see the thread "Puppeteer wait until page is completely loaded".

Puppeteer cannot goto web page to get selector

Update: the problem was resolved by adding cookies copied from a real browser session.
I'm trying to get half-price products from this website https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials. The website is rendered by AngularJS so I'm trying to use puppeteer for data scraping.
With headless: false, just a blank page shows up.
With headless: true, it throws an exception:
[Screenshot: Error while running with headless browser]
const puppeteer = require('puppeteer');
async function getProductNames(){
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.setViewport({ width: 1000, height: 926 });
  await page.goto("https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials");
  await page.waitForSelector('.product-name')
  console.log("Begin to evaluate JS")
  var productNames = await page.evaluate(() => {
    var div = document.querySelectorAll('.product-name');
    console.log(div) // runs in the page, so this logs to the browser console, not Node
    var productnames = []
    // leave it blank for now
    return productnames
  })
  console.log(productNames)
  await browser.close()
}
getProductNames();
P.S.: While looking into this issue, I found that the web page actually logs each page's data to the console, but I can't trace the request. If you can show me how, that would be great.
[Screenshot: The web page console log data]
Try passing the options parameter to page.goto(url[, options]):
await page.goto("https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials", { waitUntil: 'networkidle2' });
With networkidle2, navigation is considered finished only when there have been no more than 2 network connections for at least 500 ms.
See the Puppeteer documentation for the options parameter of page.goto.
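The sketch below makes "no more than N connections for at least 500 ms" concrete. It's a simplified model that takes a made-up timeline of request start/end times and reports when each condition would be satisfied. idleAt and the timeline are hypothetical. Puppeteer tracks real network activity internally, and its implementation differs.

```javascript
// Returns the time (ms) at which "at most `maxInflight` requests in flight
// for `quietMs` continuous ms" first holds, or null if it never does.
// `events` must be sorted by time: [{ t, type: "start" | "end" }, ...].
function idleAt(events, maxInflight, quietMs = 500) {
  let inflight = 0;
  let idleSince = 0; // navigation starts at t = 0 with nothing in flight
  for (const { t, type } of events) {
    // Did a long enough quiet window finish before this event?
    if (idleSince !== null && t - idleSince >= quietMs) {
      return idleSince + quietMs;
    }
    inflight += type === "start" ? 1 : -1;
    if (inflight > maxInflight) {
      idleSince = null; // too busy: the quiet window is broken
    } else if (idleSince === null) {
      idleSince = t; // a new quiet window starts now
    }
  }
  // No more events: the current quiet window (if any) simply runs out.
  return idleSince === null ? null : idleSince + quietMs;
}

// Three requests start at 0 ms and finish at 100, 200 and 300 ms.
const timeline = [
  { t: 0, type: "start" }, { t: 0, type: "start" }, { t: 0, type: "start" },
  { t: 100, type: "end" }, { t: 200, type: "end" }, { t: 300, type: "end" },
];
console.log("networkidle2:", idleAt(timeline, 2)); // 600
console.log("networkidle0:", idleAt(timeline, 0)); // 800
```

If you add a connection that opens at 350 ms and never closes, networkidle2 is still satisfied at 600 ms. networkidle0 never is. That's one reason networkidle0 can hang until the timeout on pages with long-lived connections.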

waitFor() doesn't find element which is displayed on the page

I'm trying to run my first Puppeteer script.
Puppeteer v1.20.0
Node v8.11.3
Npm v5.6.0
It's a basic example, but it doesn't work:
const puppeteer = require('puppeteer');
puppeteer.launch({headless: false}).then(async browser => {
  const page = await browser.newPage();
  await page.goto('https://www.linkedin.com/learning/login', { waitUntil: 'networkidle0' });
  console.log(0);
  await page.waitFor('#username');
  console.log(1);
  await browser.close();
});
When I run the script, Chromium starts and I can see the LinkedIn login page with the form and its #username field, but Puppeteer doesn't find the field. It prints 0 but never 1, then throws TimeoutError: waiting for selector "#username" failed: timeout 30000ms exceeded.
Increasing the timeout doesn't change anything, and if I check the console in Chromium, the field is there.
The LinkedIn login page works as an SPA, and I don't know if I'm going about this the right way.
Thank you in advance.
The username input is inside an iframe, so you can't access it like this; you need to access the iframe first:
await page.goto('https://www.linkedin.com/learning/login');
await page.waitForSelector('.authentication-iframe');
const frames = page.frames();
const myFrame = frames.find(f => f.url().includes("uas/login"));
const username = '123456@gmail.com';
const usernameField = await myFrame.$("#username");
await usernameField.type(username, {delay: 10});
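One fragile spot in the snippet above is frames.find(), which returns undefined when no frame matches. The next call then fails with a vague TypeError. Below is a defensive variant. findFrameByUrl is a hypothetical helper. It relies only on each frame having a url() method, as Puppeteer's Frame does, so it can be exercised with simple mock objects.

```javascript
// Finds the first frame whose URL contains `fragment`, or throws an error
// listing the URLs that were actually present.
function findFrameByUrl(frames, fragment) {
  const frame = frames.find(f => f.url().includes(fragment));
  if (!frame) {
    const seen = frames.map(f => f.url()).join(", ");
    throw new Error(`No frame URL contains "${fragment}". Frames: ${seen}`);
  }
  return frame;
}

// Usage with Puppeteer (not run here):
// const loginFrame = findFrameByUrl(page.frames(), "uas/login");
// await loginFrame.waitForSelector("#username");
// await loginFrame.type("#username", "123456@gmail.com", { delay: 10 });
```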

Usage of chrome headless for making PDF (puppeteer)

I'm wondering how I can generate a PDF with headless Chrome (for example, via Puppeteer). It seems like a good PDF maker, but only in Chrome using @media print. So here is my question:
Can I generate a PDF with Puppeteer in another browser (e.g., Mozilla Firefox) too? I think I can do that if I only want to print a static page with no inputs. But what if the page has inputs that users fill in and then save in IE? Can I use this somehow?
OK, I downloaded Puppeteer. Here's my code:
$scope.aClick = function(){
  const puppeteer = require('puppeteer');
  (async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('/vUrl_form.html', {waitUntil: 'networkidle'});
    await page.pdf({path: 'images/asd.pdf', format: 'A4'});
    browser.close();
  })();
};
and it still doesn't work (I don't know why, but the app won't run).
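No. At the time of this answer, Puppeteer only worked with Chromium/Chrome; Puppeteer v23 and later also officially support Firefox.
Also note that Puppeteer is a Node.js library. It can't be required and run inside an AngularJS controller in the browser, which is likely why your app won't run; run the script with Node instead. page.goto also needs a full URL (scheme and host), not a relative path like '/vUrl_form.html'.
If you want to use headless Mozilla Firefox, see https://developer.mozilla.org/en-US/Firefox/Headless_mode.
If you still want to use Puppeteer, here is a working snippet that creates a .pdf file: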
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle0'});
  // page.pdf() is currently supported only in headless mode.
  // @see https://bugs.chromium.org/p/chromium/issues/detail?id=753118
  await page.pdf({
    path: 'hn.pdf',
    format: 'letter'
  });
  await browser.close();
})();
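Today it's possible to drive Firefox with https://firefox-puppeteer.readthedocs.io/en/master/ (note that this is Mozilla's Firefox Puppeteer, a Python library built on the Marionette client, not the Node.js Puppeteer discussed here). Maybe that wasn't possible when people answered. But I can't find URL-to-PDF functionality there, only screenshots.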
