I am trying to get a new tab and scrape the title of that page with Puppeteer.
This is what I have:
// use puppeteer
const puppeteer = require('puppeteer');
//set wait length in ms: 1000ms = 1sec
const short_wait_ms = 1000
async function run() {
  const browser = await puppeteer.launch({ headless: false, timeout: 0 });
  const page = await browser.newPage();
  await page.goto('https://biologyforfun.wordpress.com/2017/04/03/interpreting-random-effects-in-linear-mixed-effect-models/');
  // second page DOM elements
  const CLICKHERE_SELECTOR = '#post-2068 > div > div.entry-content > p:nth-child(2) > a:nth-child(1)';
  // main page
  await page.waitFor(short_wait_ms);
  await page.click(CLICKHERE_SELECTOR);
  // new tab opens - move to new tab
  let pages = await browser.pages();
  // go to the newly opened page
  // console.log title -- Generalized Linear Mixed Models in Ecology and in R
}
run();
I can't figure out how to use browser.pages() to start working on the new page.
According to the Puppeteer Documentation:
page.title()
returns: <Promise<string>> Returns page's title.
Shortcut for page.mainFrame().title().
Therefore, you should use page.title() for getting the title of the newly opened page.
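Alternatively, you may gain a slight performance boost by going through Puppeteer's internals. Note that these underscore-prefixed fields are private and undocumented, so they can change between Puppeteer versions; the public equivalent is page.evaluate(() => document.title).
page._frameManager._mainFrame.evaluate(() => document.title)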
Note: Make sure to use await when calling page.title(), since it returns a Promise; the title tag must be downloaded before Puppeteer can read its content.
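If the link really opens a new tab, the page variable still points at the original tab, so page.title() returns the old title. A minimal sketch, assuming the click opens exactly one new tab and that the new page has a title element: Puppeteer's browser.waitForTarget() is started together with the click and matches the target whose opener() is the current page's target, then title() is called on that new page. The helper name clickAndGetNewTabTitle is hypothetical.

```javascript
// Hypothetical helper: click an element that opens a new tab and return that tab's title.
// Assumes `browser` and `page` are Puppeteer Browser and Page objects, and that the
// new page has a <title> element.
async function clickAndGetNewTabTitle(browser, page, selector) {
  // Start waiting for the new target while clicking so it cannot be missed;
  // the new tab is the target whose opener is the current page's target.
  const [newTarget] = await Promise.all([
    browser.waitForTarget((target) => target.opener() === page.target()),
    page.click(selector),
  ]);
  const newPage = await newTarget.page();
  // The new tab may still be loading; wait until its <title> element exists.
  await newPage.waitForSelector('title');
  return newPage.title();
}
```

With the question's code, this would replace the browser.pages() step: const title = await clickAndGetNewTabTitle(browser, page, CLICKHERE_SELECTOR); console.log(title);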
You shouldn't need to move to the new tab.
To get the title of any page you can use:
const pageTitle = await page.title();
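Also, after you click something and are waiting for the new page to load, you should wait for the load event or for the network to be idle. Start waiting before (or together with) the click, for example with Promise.all, so a fast navigation isn't missed. In later Puppeteer versions, waitUntil: 'networkidle' and the networkIdleTimeout option were replaced by 'networkidle0' and 'networkidle2':
// Wait for redirection
await page.waitForNavigation({ waitUntil: 'networkidle0' });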
Check the docs: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagewaitfornavigationoptions
Related
I am trying to inspect a page with Playwright that contains an iframe document. When I click a button, a banner appears for a couple of minutes; once the work is done, the page needs to be reloaded for the banner to disappear. I check automatically every 5 minutes until the banner is no longer on the page, but this only works for the first loop iteration; after that the code breaks. What can I do to fix this?
A possible solution would be to go to the iframe's URL directly, but the document breaks if I do that, so I want to avoid it. It is also not how I would do it manually.
UnhandledPromiseRejectionWarning: frame.evaluate: Execution Context is not available in detached frame (are you trying to evaluate?)
const browser = await chromium.launch({
  args: ["--start-maximized", "--disable-notifications", '--disable-extensions', '--mute-audio'],
  defaultViewport: null,
  devtools: true,
  slowMo: 50,
  downloadsPath: "D:\\Lambda\\projects\\puppeteer_test\\data",
});
// Create a new incognito browser context with user credentials
const context = await browser.newContext({
  acceptDownloads: true,
  viewport: null,
  storageState: JSON.parse(storageState),
})
// Create a new page in a pristine context.
const page = await context.newPage()
// go to download your information
await page.goto("");
//select child frame
const frameDocUrl = await (await page.waitForSelector("iframe")).getAttribute("src")
const doc = await page.frame({url: frameDocUrl})
await doc.waitForLoadState('domcontentloaded');
/* waitForFile */
// refresh every 5 minute until notice of gathering file is gone
// then Pending becomes download
const frameUrl = await doc.url()
const fiveMinutes = 300000
let IsGatheringFile = await doc.$("//div[text()='A copy of your information is being created.']") ? true: false
while (IsGatheringFile) {
  // reload page
  console.log("going to reload")
  await doc.goto(frameUrl)
  // wait for 5 minutes
  console.log(`going to start waiting for 5 min starting in ${Date().split(" ")[4]}`)
  await doc.waitForTimeout(fiveMinutes)
  console.log("finish reloading")
  // check if notice is gone
  IsGatheringFile = await doc.$("//div[text()='A copy of your information is being created.']") ? true : false
}
console.log("finish waiting for data")
console.log("finish reloading the page until the banner is gone")
Solution:
After the page refresh/new navigation, recapture the iframe (get a new Frame object for it), because the Frame captured before the reload is detached.
const frameUrl = await doc.url()
await doc.goto(frameUrl)
Also note that you should update the variable you pass to the other parts of your script with the newly captured iframe.
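A minimal sketch of the recapture approach in Playwright, assuming the banner lives inside the first element matching the iframe selector: after each page.reload(), the iframe element is looked up again and elementHandle.contentFrame() returns a fresh Frame. The helper names, the options object and the maxAttempts cap are hypothetical; the banner XPath and the five-minute interval come from the question.

```javascript
// Hypothetical helpers (Playwright, Node.js). After page.reload() the Frame captured
// earlier is detached, so the iframe element is looked up again to get a fresh Frame.
async function reloadAndRecaptureFrame(page, iframeSelector) {
  await page.reload({ waitUntil: 'domcontentloaded' });
  const iframeHandle = await page.waitForSelector(iframeSelector);
  const frame = await iframeHandle.contentFrame();
  if (!frame) throw new Error(`No frame found for ${iframeSelector}`);
  await frame.waitForLoadState('domcontentloaded');
  return frame;
}

// Check the banner; while it is still there, wait, reload and recapture the frame.
// Returns the latest Frame so the rest of the script can keep using it.
async function waitForBannerToDisappear(page, frame, options) {
  const {
    iframeSelector = 'iframe',
    bannerSelector,
    intervalMs = 5 * 60 * 1000, // five minutes, as in the question
    maxAttempts = 12, // assumed upper bound so the loop cannot run forever
  } = options;
  for (let attempt = 1; ; attempt++) {
    if (!(await frame.$(bannerSelector))) return frame;
    if (attempt >= maxAttempts) {
      throw new Error(`Banner still present after ${maxAttempts} checks`);
    }
    await page.waitForTimeout(intervalMs);
    frame = await reloadAndRecaptureFrame(page, iframeSelector);
  }
}
```

With the question's variables (declaring doc with let instead of const), the loop could become doc = await waitForBannerToDisappear(page, doc, { bannerSelector: "//div[text()='A copy of your information is being created.']" }), and the returned frame replaces the detached one.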
Old hacky fix:
Instead of reloading the page, reload the iframe.
At the moment there is no frame.reload(), but the same effect can be achieved with frame.goto(frameUrl):
const frameUrl = await doc.url()
await doc.goto(frameUrl)
Note: the iframe can break. Reloading the whole page can fix that, but the old Frame object will then be detached.
This post is a bit old, but I will respond anyway, as I had this problem this week and just resolved it.
I am using Python, not Node, but I believe the logic is the same.
For me, just recapturing the iframe after page.reload() didn't work.
So I used the "old hacky fix": instead of reloading the whole page, I reloaded just the frame concerned.
My solution looks like this:
iframe.goto(iframe.url)
is_detached = iframe.is_detached()
if is_detached:
    iframe = page.main_frame.child_frames[-1]
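For Node.js, a rough equivalent of this Python fix is sketched below. Like the Python version, it assumes that after a detach the wanted frame is the main frame's last child frame; frame.isDetached(), page.mainFrame() and frame.childFrames() are Playwright's JavaScript counterparts of is_detached(), main_frame and child_frames. The helper name reloadFrame is hypothetical.

```javascript
// Hypothetical helper (Playwright, Node.js) mirroring the Python answer: reload only the
// iframe by navigating it to its own URL; if the Frame object ends up detached, assume
// the replacement is the main frame's last child frame.
async function reloadFrame(page, frame) {
  await frame.goto(frame.url());
  if (frame.isDetached()) {
    const children = page.mainFrame().childFrames();
    return children[children.length - 1];
  }
  return frame;
}
```

Callers should keep using the returned frame (for example iframe = await reloadFrame(page, iframe)), since the original object may be the detached one.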
I'm using Node.js Puppeteer to scrape a website. I've come across a situation where I need to go back in a new tab, but I couldn't find a way to do it in Puppeteer (I can do it manually on Windows by Ctrl+clicking the browser's Back button).
Below is an example where I need to launch many pages in parallel starting from a particular page:
const page = await browser.newPage();
await page.goto(myWebsiteUrl);
// going through some pages..
for (let i = 0; i < numberOfPagesInParallel; i++) {
  // instantiating many pages with go back
  const newBackPage = await page.gobackAndReturnNewPage(); // this is what I wish I could do, but it's not possible in Puppeteer
  const promise = processNewBackPageAsync(newBackPage);
  this.allPromises.push(promise);
}
await Promise.all([...this.allPromises])
I searched the Puppeteer API and the Chrome DevTools Protocol and didn't find any way to clone a tab or copy its history to another tab; this might be a useful feature to add to both Puppeteer and Chrome CDP.
However, there is a way to create a new page and go back in history without tracking the history yourself. The limitation of this solution is that the new page does not share or clone the history of the original page. I also tried to use Page.navigateToHistoryEntry, but since the history is owned by the original page, I got an error.
So here is a solution that creates a new page and goes to the previous URL in the history.
const puppeteer = require("puppeteer");
(async function() {
  // headless: false
  // to see the result in the browser
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // let's do some navigation
  await page.goto("http://localhost:5000");
  await page.goto("http://localhost:5000/page-one");
  await page.goto("http://localhost:5000/page-two");
  // access history and evaluate last url of page
  const session = await page.target().createCDPSession();
  const history = await session.send("Page.getNavigationHistory");
  const last = history.entries[history.entries.length - 2];
  // create a new page and go back
  // important: the page created here does not share the history
  const backPage = await browser.newPage();
  await backPage.goto(last.url);
  // see results
  await page.screenshot({ path: "page.png" });
  await backPage.screenshot({ path: "back-page.png" });
  // uncomment if you use headless chrome
  // await browser.close();
})();
References:
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-getNavigationHistory
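The snippet above takes entries.length - 2, which assumes the current entry is the last one in the history. The CDP Page.getNavigationHistory result also contains currentIndex, so a slightly more robust variant picks the entry just before the current one. This is a sketch; the helper names previousHistoryEntry and openBackPage are hypothetical, and, as the answer notes, the new tab still does not share the original tab's history.

```javascript
// Pick the entry just before the current one from a CDP Page.getNavigationHistory result.
// Returns null when there is nothing to go back to.
function previousHistoryEntry(history) {
  const index = history.currentIndex - 1;
  return index >= 0 ? history.entries[index] : null;
}

// Hypothetical helper: open the previous history URL of `page` in a new tab.
// The new tab gets a fresh history; it does not clone the original tab's history.
async function openBackPage(browser, page) {
  const session = await page.target().createCDPSession();
  try {
    const history = await session.send('Page.getNavigationHistory');
    const previous = previousHistoryEntry(history);
    if (!previous) return null;
    const backPage = await browser.newPage();
    await backPage.goto(previous.url);
    return backPage;
  } finally {
    await session.detach(); // release the CDP session once the history has been read
  }
}
```

In the question's loop, page.gobackAndReturnNewPage() could then be replaced by await openBackPage(browser, page).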
The problem was resolved by adding cookies from an actual browser.
I'm trying to get the half-price products from this website https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials. The website is rendered by AngularJS, so I'm trying to use Puppeteer for data scraping.
With headless: false, just a blank page shows up.
With headless: true, it throws an exception (screenshot: "Error while running with headless browser").
const puppeteer = require('puppeteer');
async function getProductNames(){
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.setViewport({ width: 1000, height: 926 });
  await page.goto("https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials");
  await page.waitForSelector('.product-name')
  console.log("Begin to evaluate JS")
  var productNames = await page.evaluate(() => {
    var div = document.querySelectorAll('.product-name');
    // this logs to the browser's console, not to the Node terminal
    console.log(div)
    var productnames = []
    // leave it blank for now
    return productnames
  })
  console.log(productNames)
  await browser.close()
}
getProductNames();
P.S. While looking into this issue, I found out that the web page actually console.logs the data for each page, but I can't trace the request. If you can show me how, that would be great.
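Messages the page prints with console.log can be forwarded to Node through Puppeteer's 'console' page event, and network responses can be observed with the 'response' event, which is one way to find the request that carries the data. A minimal sketch; the helper name and the idea of filtering response URLs by a substring are assumptions, since the site's actual data endpoint is not known here.

```javascript
// Hypothetical helper: collect what the page prints with console.log and the URLs of
// responses containing `urlFragment`. Assumes `page` is a Puppeteer Page; attach it
// before page.goto() so early messages and requests are not missed.
function traceConsoleAndResponses(page, urlFragment = '') {
  const consoleMessages = [];
  const responseUrls = [];
  page.on('console', (msg) => consoleMessages.push(msg.text()));
  page.on('response', (response) => {
    if (response.url().includes(urlFragment)) responseUrls.push(response.url());
  });
  return { consoleMessages, responseUrls };
}
```

After navigating, printing responseUrls with an empty fragment lists every response URL, which can then be narrowed down by inspecting which one returns the product data.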
Try adding the options parameter to the page.goto(url[, options]) method:
await page.goto("https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials", { waitUntil: 'networkidle2' })
With networkidle2, Puppeteer considers navigation to be finished only when there are no more than 2 network connections for at least 500 ms.
The other parameters of the options object are described in the page.goto section of the Puppeteer API documentation.
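Combining this answer with the question's goal, a sketch of reading the product names might look like the following. It assumes the names are the text content of the .product-name elements from the question; page.$$eval runs its callback inside the page and returns the serializable result to Node, unlike console.log inside page.evaluate, which only prints to the browser's console. The helper name readProductNames is hypothetical.

```javascript
// Hypothetical helper: navigate, wait for the network to settle, then return the text of
// every .product-name element. Assumes `page` is a Puppeteer Page.
async function readProductNames(page, url) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.waitForSelector('.product-name');
  // This callback runs in the browser; only its serializable return value reaches Node.
  return page.$$eval('.product-name', (elements) => elements.map((el) => el.textContent.trim()));
}
```

If the site still renders a blank page, the cookie workaround mentioned above may be needed as well.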
I want to start Chrome, open our website and read a value from Session Storage. However, window.sessionStorage does not seem to be the solution. I want to read the value of language from my website's sessionStorage. Note: I do not want to set sessionStorage; I want to read the values that are set when the landing page is opened.
const puppeteer = require('puppeteer');
run().then(() => console.log('Done')).catch(error => console.log(error));
async function run() {
  // Create a new browser. By default, the browser is headless,
  // which means it runs in the background and doesn't appear on
  // the screen. Setting `headless: false` opens up a browser
  // window so you can watch what happens.
  const browser = await puppeteer.launch({ headless: false });
  // Open a new page and navigate to the landing page
  const page = await browser.newPage();
  await page.goto('https://mywebsitelanding.com/landing');
  // Wait 5 seconds
  await new Promise(resolve => setTimeout(resolve, 5000));
  const sessionStorage = await page.evaluate(() => window.sessionStorage)
  console.log(window.sessionStorage)
  // const returnedCookie = await page.cookies();
  // console.log(returnedCookie)
  // Close the browser and exit the script
  await browser.close();
}
There is no window object in your Node.js script (it only exists inside the page), so log the value returned by page.evaluate instead:
const sessionStorage = await page.evaluate(() => window.sessionStorage)
console.log(sessionStorage)
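To read one value such as language, the lookup has to run inside the page with page.evaluate, and only its result comes back to Node. A minimal sketch; the key name language comes from the question, while the helper names are hypothetical. The second helper copies every entry into a plain object so the result is an ordinary serializable object rather than the Storage instance itself.

```javascript
// Hypothetical helper: read a single sessionStorage value from the page. The arrow
// function runs in the browser; `key` is passed as an argument because closures are
// not shared between Node and the page.
async function readSessionStorageItem(page, key) {
  return page.evaluate((k) => window.sessionStorage.getItem(k), key);
}

// Hypothetical helper: copy every sessionStorage entry into a plain object.
async function readAllSessionStorage(page) {
  return page.evaluate(() => {
    const entries = {};
    for (let i = 0; i < window.sessionStorage.length; i++) {
      const key = window.sessionStorage.key(i);
      entries[key] = window.sessionStorage.getItem(key);
    }
    return entries;
  });
}
```

After page.goto(...), console.log(await readSessionStorageItem(page, 'language')) prints the stored value, or null if the landing page has not set that key.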
I am trying to run my first Puppeteer script.
Puppeteer v1.20.0
Node v8.11.3
Npm v5.6.0
It is a basic example, but it doesn't work:
const puppeteer = require('puppeteer');
puppeteer.launch({headless: false}).then(async browser => {
  const page = await browser.newPage();
  await page.goto('https://www.linkedin.com/learning/login', { waitUntil: 'networkidle0' });
  console.log(0);
  await page.waitFor('#username');
  console.log(1);
  await browser.close();
});
When I run the script, Chromium starts and I can see the LinkedIn login page with the form and the #username field, but Puppeteer doesn't find the field. It prints 0 but never 1, and then throws TimeoutError: waiting for selector "#username" failed: timeout 30000ms exceeded.
Increasing the timeout doesn't change anything, and if I check the console in Chromium, the field is there.
The LinkedIn login page works as an SPA, and I don't know if I'm doing this the right way.
Thank you in advance.
The username input is inside an iframe, so you can't access it like this; you need to access the iframe first:
await page.goto('https://www.linkedin.com/learning/login');
await page.waitForSelector('.authentication-iframe');
const frames = await page.frames();
const myframe = frames.find(f => f.url().indexOf("uas/login") > 0);
const username = '123456#gmail.com';
const usernameField = await myframe.$("#username");
await usernameField.type(username, {delay: 10});
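An alternative to searching page.frames() by URL is to ask the iframe element for its frame with elementHandle.contentFrame(). A minimal sketch, assuming the .authentication-iframe class from the answer is still what the login page uses (it may have changed since); the helper name typeInsideIframe is hypothetical.

```javascript
// Hypothetical helper: type into a field that lives inside an iframe.
// Assumes `page` is a Puppeteer Page.
async function typeInsideIframe(page, iframeSelector, fieldSelector, text) {
  const iframeHandle = await page.waitForSelector(iframeSelector);
  const frame = await iframeHandle.contentFrame();
  if (!frame) throw new Error(`No frame found for ${iframeSelector}`);
  // Selectors are resolved inside the iframe's own document, not the top-level page.
  await frame.waitForSelector(fieldSelector);
  await frame.type(fieldSelector, text, { delay: 10 });
}
```

For the login form this would be called as await typeInsideIframe(page, '.authentication-iframe', '#username', username); waiting on the frame also avoids the undefined-frame error that frames.find() gives when no URL matches.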