Puppeteer for scraping a page (with authentication) - javascript

I am using Puppeteer to scrape a page (a load test application), but I cannot enter a username and password on this page. Does anyone know Puppeteer well enough to help? This is the code:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto('https://d22syekf1i694k.cloudfront.net/', {waitUntil: 'networkidle2'});
await page.waitForSelector('input[name=username]');
await page.type('input[name=username]', 'Adenosine');
await page.$eval('input[name=username]', el => el.value = 'Adenosine');
await browser.close();
})();
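One possible way to fill in both fields is sketched below. It assumes the password field matches `input[name=password]`, which does not appear in the question; `fillLoginForm` and the placeholder password are hypothetical. If the form lives inside an iframe, the selectors would have to be resolved against that frame rather than the page.

```javascript
// Hypothetical helper: waits for each field, then types into it.
// `page` is a Puppeteer Page (or any object exposing the same two methods).
async function fillLoginForm(page, { username, password }, selectors = {}) {
  const {
    usernameSelector = 'input[name=username]', // taken from the question
    passwordSelector = 'input[name=password]', // assumption, not in the question
  } = selectors;

  await page.waitForSelector(usernameSelector);
  await page.type(usernameSelector, username);

  await page.waitForSelector(passwordSelector);
  await page.type(passwordSelector, password);
}

// Usage sketch. Not invoked automatically: it needs `npm install puppeteer`
// and network access. Call main() to run it.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.goto('https://d22syekf1i694k.cloudfront.net/', { waitUntil: 'networkidle2' });
    await fillLoginForm(page, { username: 'Adenosine', password: 'yourPassword' });
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Passing the selectors as an optional third argument keeps the helper reusable if the real field names turn out to differ.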

Related

Puppeteer returning empty array

I'm trying to pick some data from a followers page, but it always returns an empty array.
This is my code:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless:false});
const page = await browser.newPage();
await page.goto('https://www.instagram.com/accounts/login/', {
waitUntil: 'networkidle0',
});
// Wait for log in form
await Promise.all([
page.waitForSelector('[name="username"]'),
page.waitForSelector('[name="password"]'),
page.waitForSelector('[type="submit"]'),
]);
// Enter username and password
await page.type('[name="username"]', 'yourAccount');
await page.type('[name="password"]', 'yourPassword');
// Submit log in credentials and wait for navigation
await Promise.all([
page.click('[type="submit"]'),
page.waitForNavigation({
waitUntil: 'networkidle0',
}),
]);
await page.goto('https://www.instagram.com/publicProfile/followers/', {waitUntil: 'networkidle0'});
const teste = await page.evaluate(() => {
const followers = document.querySelectorAll("._aaco span");
let followersArray = []
followers.forEach((item) =>{
followersArray.push(item.innerText)
})
return followersArray
})
console.log(teste)
await browser.close();
})();
publicProfile in the URL stands in for the profile I chose; I replaced the real name for privacy reasons.
UPDATE: The problem is resolved. As Örvar said, the problem was that I wasn't logged in, so I searched for help here and found this question (Puppeteer Login to Instagram), which solved my problem.
When you use a tool like Puppeteer to get content from a site that requires a login, you also need to log in through Puppeteer, so that the site generates a user cookie for that browser session.
Log into Instagram using Puppeteer with user credentials
When Puppeteer has logged in with user credentials, you can run the code you have posted above.
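Since the answer hinges on the site having issued a user cookie, one way to confirm the login worked before scraping is to inspect the page's cookies. The sketch below is an illustration only: `hasCookie` and `assertLoggedIn` are hypothetical helpers, and the cookie name `sessionid` is an assumption that should be checked in the browser's developer tools. `page.cookies()` is an existing Puppeteer method, although newer releases may prefer browser-level cookie access.

```javascript
// Hypothetical helper: true when the page's cookies include a non-empty
// cookie with the given name. `page.cookies()` resolves to an array of
// objects with at least `name` and `value` properties.
async function hasCookie(page, cookieName) {
  const cookies = await page.cookies();
  return cookies.some((c) => c.name === cookieName && c.value !== '');
}

// Hypothetical guard to call between the login step and the scraping step.
// The default cookie name is an assumption, not something the source states.
async function assertLoggedIn(page, cookieName = 'sessionid') {
  if (!(await hasCookie(page, cookieName))) {
    throw new Error(`Not logged in: cookie "${cookieName}" is missing`);
  }
}
```

Calling `await assertLoggedIn(page)` right after the login navigation would turn a silent empty array into an explicit error.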

How to handle "accept cookies"?

I am trying to make a scraper that gets the reviews for hotels on tripadvisor.com. So far I have only been working on pagination, testing whether the browser would go all the way to the end, where there are no more pages.
Here is my code so far:
const puppeteer = require("puppeteer");
const cheerio = require("cheerio");
async function main() {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.tripadvisor.com/Hotels-g298656-Ankara-Hotels.html');
while(true) {
await page.click('a[class="nav next ui_button primary"]');
await page.waitForNavigation({waitUntil: 'networkidle0'});
}
}
main();
However, this stops when the 'accept cookies' popup appears. How can I handle this?
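A common approach is to wait briefly for the consent button and click it only if it shows up. The sketch below assumes that approach. `dismissIfPresent` and `clickThroughPages` are hypothetical helpers, and the banner selector has to be looked up in the browser's developer tools (the `#accept-cookies` value in the usage note is a placeholder, not TripAdvisor's real markup).

```javascript
// Hypothetical helper: clicks `selector` if it becomes visible within
// `timeout` ms. Resolves to true when clicked, false when it never appeared.
async function dismissIfPresent(page, selector, timeout = 5000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout });
  } catch (err) {
    // waitForSelector rejects when the wait fails (typically a timeout);
    // this sketch treats any such failure as "no banner".
    return false;
  }
  await page.click(selector);
  return true;
}

// Hypothetical pagination loop: stops once no "next" link is left and
// returns the number of pages visited, counting the first one.
async function clickThroughPages(page, nextSelector, bannerSelector) {
  let pagesVisited = 1;
  await dismissIfPresent(page, bannerSelector);
  while ((await page.$(nextSelector)) !== null) {
    // Start waiting for the navigation before clicking to avoid a race.
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle0' }),
      page.click(nextSelector),
    ]);
    pagesVisited += 1;
  }
  return pagesVisited;
}
```

With the question's code this could be called as `await clickThroughPages(page, 'a[class="nav next ui_button primary"]', '#accept-cookies')`. If the banner can appear on later pages, `dismissIfPresent` could also be called inside the loop with a short timeout.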

How do I open a new window page from a button in Puppeteer?

I'm trying to open a new window from a button in Puppeteer.
For example: I'm logging in to a website, and the moment I click the login button a fresh window should pop up and load the site the button points to. How can I do it?
You can do that by holding down the Shift key while calling page.click.
To catch the newly opened window, you can use waitForTarget.
const puppeteer = require('puppeteer')
;(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: null,
})
const context = browser.defaultBrowserContext()
const page = (await context.pages())[0]
await page.goto('https://www.amazon.com/gp/product/B093GQSVPX/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1', {waitUntil: 'load'})
await page.waitForSelector('a[title="Add to List"]', {visible: true})
await page.keyboard.down('Shift')
await page.click('a[title="Add to List"]')
await page.keyboard.up('Shift')
const popup = await browser.waitForTarget(
(target) => target.url().includes('www.amazon.com/ap/signin')
)
const popupPage = await popup.page()
await popupPage.waitForSelector('a.a-link-expander[role="button"]')
await popupPage.click('a.a-link-expander[role="button"]')
await popupPage.click('input#continue[type="submit"]')
await browser.close()
})()

Favicon missing in headless puppeteer

I'm writing a simple script to check whether all the resources load correctly (I check the status codes of the responses). I decided to use Puppeteer for this and wrote:
(async () => {
const browser = await puppeteer.launch({headless: false});
const [page] = await browser.pages();
page.on("response", (res) => {
const url = res.url(), status=res.status();
// my functionality goes here
if (url.includes("favicon")) console.log(status, url); // not logging in headless
});
await page.goto("https://stackoverflow.com/", {waitUntil: 'networkidle2'});
await browser.close();
})();
The issue is that when I run my application in headless mode, the favicon is missing from the responses. I assume this is because Puppeteer does not load a favicon in headless mode. Is there any built-in functionality or workaround?
For lack of a better solution, I'm currently reading the favicon's URL from the page and visiting it manually:
async function checkFavicon(page){
const iconUrl = await page.$eval("link[rel*='icon']", ({href}) => href);
await page.goto(iconUrl);
await page.goBack();
}
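A variation on the workaround above loads the icon in a separate tab, so the original page never navigates away, and returns the status code directly. This is only a sketch: `faviconStatus` is a hypothetical helper, and it does not cover sites that serve `/favicon.ico` without declaring a `<link>` tag.

```javascript
// Hypothetical helper: opens the favicon in a separate tab and returns its
// HTTP status, or null when the page declares no icon or no response arrives.
async function faviconStatus(browser, page) {
  const iconUrl = await page
    .$eval("link[rel*='icon']", (link) => link.href)
    .catch(() => null); // $eval rejects when nothing matches the selector
  if (!iconUrl) return null;

  const tab = await browser.newPage();
  try {
    const response = await tab.goto(iconUrl); // may resolve to null
    return response ? response.status() : null;
  } finally {
    await tab.close(); // close the extra tab even if goto throws
  }
}
```

It could be called after the main `page.goto`, for example `console.log(await faviconStatus(browser, page))`.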

Not able to capture image while generating pdf using puppeteer API

Node v8.11.1, headless Chrome.
I'm trying to generate a PDF, but somehow the background image is not captured in the PDF.
Below is the code. Any help is appreciated.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
await page.goto('http://54.201.139.151/', {waitUntil : 'networkidle0'});
await page.pdf({path: 'hn40.pdf', printBackground: true, width: '1024px' , height: '768px'});
await browser.close();
})();
Update: page.emulateMedia() has been dropped in favor of page.emulateMediaType().
As Rippo mentioned, the page needs to emulate the "screen" media type (originally done with page.emulateMedia("screen")) for this to work properly. I have updated your script below, but I changed the page to Google for testing.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://google.ca/', {waitUntil : 'networkidle2'});
await page.emulateMediaType('screen');
await page.pdf({path: 'hn40.pdf', printBackground: true, width: '1024px' , height: '768px'});
await browser.close();
})();
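Because the method name changed between Puppeteer versions, a script that has to run on both could pick whichever one exists. The following is a sketch under that assumption; `emulateScreen` and `savePdf` are hypothetical helpers, and the PDF options are the ones used in the script above.

```javascript
// Hypothetical helper: switches the page to "screen" CSS media, using
// whichever method the installed Puppeteer version provides.
async function emulateScreen(page) {
  if (typeof page.emulateMediaType === 'function') {
    await page.emulateMediaType('screen'); // newer Puppeteer versions
  } else {
    await page.emulateMedia('screen'); // older Puppeteer versions
  }
}

// Hypothetical wrapper: emulate screen media, then print with backgrounds.
async function savePdf(page, path) {
  await emulateScreen(page);
  await page.pdf({ path, printBackground: true, width: '1024px', height: '768px' });
}
```

In the script above, `await savePdf(page, 'hn40.pdf')` would replace the separate emulate and pdf calls.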
