I use this code:
const {chromium} = require('playwright');
(async () => {
const userDataDir = './NewData'; // '\N' is an escape that collapses to 'N', so spell the relative path explicitly
const browser = await chromium.launchPersistentContext(userDataDir,{headless:false});
const page = await browser.newPage();
await page.goto('https://www.google.com/')
})()
But the browser opens one blank tab before it opens Google.
I think this happens because I wrote the cookies incorrectly, but I would rather not rewrite them. How can I close the first tab?
Saving the context won't stop the browser from opening that empty tab. If you don't want an extra tab, reuse the existing one instead of creating a new page.
const context = await chromium.launchPersistentContext(userDataDir,{headless:false});
const [ page ] = context.pages();
await page.goto('https://www.google.com/')
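If the script might also run against a context that has no open page (an assumption; a persistent context normally starts with one), a small helper can fall back to newPage(). This is only a sketch, and pickExistingPage and openInFirstTab are hypothetical names:

```javascript
// Return the first page already open in the context, or undefined if there is none.
function pickExistingPage(context) {
  const [first] = context.pages(); // pages() is synchronous in Playwright
  return first;
}

// Reuse the initial blank tab when it exists; otherwise open a new one.
async function openInFirstTab(context, url) {
  const page = pickExistingPage(context) ?? (await context.newPage());
  await page.goto(url);
  return page;
}
```

Usage would look like: const page = await openInFirstTab(context, 'https://www.google.com/');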
Related
I'm using Node.js Puppeteer to scrape a website. I've come across a situation where I need to go back in a new tab, but I couldn't find a way to do it in Puppeteer (I can do it manually on Windows by Ctrl+clicking the browser's back button).
Below is an example where I need to launch many pages in parallel, starting from a particular page:
const page = await browser.newPage();
await page.goto(myWebsiteUrl);
// going through some pages..
for (let i = 0; i < numberOfPagesInParallel; i++) {
// instantiate many pages, each one step back in history
const newBackPage = await page.gobackAndReturnNewPage(); // hypothetical method: this is what I wish I could do, but it does not exist in Puppeteer
const promise = processNewBackPageAsync(newBackPage);
this.allPromises.push(promise);
}
await Promise.all([...this.allPromises])
I searched the Puppeteer API and the Chrome DevTools Protocol (CDP) and did not find any way to clone a tab or copy its history to another tab. It might be a useful feature to add to both Puppeteer and CDP.
However, you can create a new page and go back in history without tracking the history yourself. The limitation is that the new page does not share or clone the history of the original page. I also tried Page.navigateToHistoryEntry, but because the history is owned by the original page, I got an error.
So here is a solution that creates a new page and navigates it to the previous URL in the original page's history.
const puppeteer = require("puppeteer");
(async function() {
// headless: false
// to see the result in the browser
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
// let's do some navigation
await page.goto("http://localhost:5000");
await page.goto("http://localhost:5000/page-one");
await page.goto("http://localhost:5000/page-two");
// access history and evaluate last url of page
const session = await page.target().createCDPSession();
const history = await session.send("Page.getNavigationHistory");
const last = history.entries[history.entries.length - 2];
// create a new page and go back
// important: the page created here does not share the history
const backPage = await browser.newPage();
await backPage.goto(last.url);
// see results
await page.screenshot({ path: "page.png" });
await backPage.screenshot({ path: "back-page.png" });
// uncomment if you use headless chrome
// await browser.close();
})();
References:
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-getNavigationHistory
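The snippet above takes the second-to-last entry, which is the previous page only when the tab is sitting at the end of its history. The Page.getNavigationHistory result also carries a currentIndex field, so a slightly more defensive variant could look like the sketch below. previousEntry and openPreviousInNewTab are hypothetical helper names, and the new tab still does not inherit the history.

```javascript
// Pick the entry one step behind the current one in a Page.getNavigationHistory result.
// Returns null when the page is already at the start of its history.
function previousEntry(history) {
  const index = history.currentIndex - 1;
  return index >= 0 ? history.entries[index] : null;
}

// Open the previous URL of `page` in a new tab (the history itself is not copied).
async function openPreviousInNewTab(browser, page) {
  const session = await page.target().createCDPSession();
  const history = await session.send('Page.getNavigationHistory');
  await session.detach();
  const entry = previousEntry(history);
  if (!entry) return null;
  const backPage = await browser.newPage();
  await backPage.goto(entry.url);
  return backPage;
}
```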
I want to start Chrome, open our website, and read a value from session storage. However, it seems window.sessionStorage is not the solution. I want to read the value of language from the sessionStorage of my website. Note: I do not want to set sessionStorage. I want to read the values that are set when the landing page opens.
const puppeteer = require('puppeteer');
run().then(() => console.log('Done')).catch(error => console.log(error));
async function run() {
// Create a new browser. By default, the browser is headless,
// which means it runs in the background and doesn't appear on
// the screen. Setting `headless: false` opens up a browser
// window so you can watch what happens.
const browser = await puppeteer.launch({ headless: false });
// Open a new page and navigate to the landing page
const page = await browser.newPage();
await page.goto('https://mywebsitelanding.com/landing');
// Wait 5 seconds
await new Promise(resolve => setTimeout(resolve, 5000));
const sessionStorage = await page.evaluate(() => window.sessionStorage)
console.log(window.sessionStorage)
// const returnedCookie = await page.cookies();
// console.log(returnedCookie)
// Close the browser and exit the script
await browser.close();
}
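There is no window object in the Node.js script that drives Puppeteer; window exists only inside the function passed to page.evaluate(). Log the value returned from page.evaluate() instead: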
const sessionStorage = await page.evaluate(() => window.sessionStorage)
console.log(sessionStorage)
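If only one value is needed, it may be simpler to ask the page for that key directly. A minimal sketch, assuming the site stores the value under the key 'language' (the question implies this but does not show it); readSessionValue is a hypothetical helper name:

```javascript
// Read one sessionStorage key from the page; resolves to null if the key is absent.
// The arrow function runs in the browser, where `window` exists.
async function readSessionValue(page, key) {
  return page.evaluate((k) => window.sessionStorage.getItem(k), key);
}
```

Inside run() this would be called as: const language = await readSessionValue(page, 'language');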
I am connected to a browser using a ws endpoint (puppeteer.connect({ browserWSEndpoint: '' })).
When I launch the browser that I ultimately connect to, is there a way to launch this in incognito?
I know I can do something like this:
const incognito = await this.browser.createIncognitoBrowserContext();
But it seems like the incognito session is tied to the originally opened browser. I just want it to be by itself.
I also see you can do this:
const baseOptions: LaunchOptions = { args: ['--incognito']};
But I am not sure if this is the best way or not.
Any advice would be appreciated. Thank you!
The best way to accomplish your goal is to launch the browser directly into incognito mode by passing the --incognito flag to puppeteer.launch():
const browser = await puppeteer.launch({
args: [
'--incognito',
],
});
Alternatively, you can create a new incognito browser context after launching the browser using browser.createIncognitoBrowserContext():
const browser = await puppeteer.launch();
const context = await browser.createIncognitoBrowserContext();
You can check whether a browser context is incognito using browserContext.isIncognito():
if (context.isIncognito()) { /* ... */ }
The solutions above didn't work for me:
an incognito window is created, but a page created afterwards with browser.newPage() is no longer incognito.
The solution that worked for me was to create the page from the incognito context:
const browser = await puppeteer.launch();
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
Then you can use page, and it is an incognito page.
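As far as I know, newer Puppeteer releases (v22 and later) renamed createIncognitoBrowserContext() to createBrowserContext(); treat that as an assumption and check the changelog for your version. A sketch that tolerates both names, with hypothetical helper names isolatedContextMethod and newIsolatedPage:

```javascript
// Report which context-creating method this browser object exposes.
function isolatedContextMethod(browser) {
  if (typeof browser.createBrowserContext === 'function') return 'createBrowserContext';
  if (typeof browser.createIncognitoBrowserContext === 'function') return 'createIncognitoBrowserContext';
  throw new Error('This browser object cannot create an isolated context');
}

// Create an isolated (incognito-like) context and open a page inside it.
async function newIsolatedPage(browser) {
  const context = await browser[isolatedContextMethod(browser)]();
  return context.newPage(); // the page must come from the context, not from browser.newPage()
}
```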
For Puppeteer Sharp it's rather messy, but this seems to work. Hopefully it helps someone.
using (Browser browser = await Puppeteer.LaunchAsync(options))
{
// create the incognito context
var context = await browser.CreateIncognitoBrowserContextAsync();
// get the page created by default when launch async ran and close it whilst keeping the browser active
var browserPages = await browser.PagesAsync();
await browserPages[0].CloseAsync();
// create a new page using the incognito context
using (Page page = await context.NewPageAsync())
{
// do something
}
}
How can I get 3rd-party cookies from a website using Puppeteer?
For first party, I know I can use:
await page.cookies()
I was curious about this too and found a solution. It works with Chromium 75.0.3765.0 and Puppeteer 1.15.0 (updated May 2nd 2019).
Using the internal page._client object, we can call the Chrome DevTools Protocol directly:
(async() => {
const browser = await puppeteer.launch({});
const page = await browser.newPage();
await page.goto('https://stackoverflow.com', {waitUntil : 'networkidle2' });
// Here we can get all of the cookies
console.log(await page._client.send('Network.getAllCookies'));
})();
The returned object contains cookies for google.com and imgur.com, which we couldn't have obtained with normal in-page JavaScript.
You can create a Chrome DevTools Protocol session on the page target using target.createCDPSession(). Then you can send Network.getAllCookies to obtain a list of all browser cookies.
The page.cookies() function will only return cookies for the current URL. So we can filter out the current page cookies from all of the browser cookies to obtain a list of third-party cookies only.
const client = await page.target().createCDPSession();
const all_browser_cookies = (await client.send('Network.getAllCookies')).cookies;
const current_url_cookies = await page.cookies();
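// Compare against every first-party cookie domain, not only the first one (which may not exist)
const current_domains = new Set(current_url_cookies.map(cookie => cookie.domain));
const third_party_cookies = all_browser_cookies.filter(cookie => !current_domains.has(cookie.domain));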
console.log(all_browser_cookies); // All Browser Cookies
console.log(current_url_cookies); // Current URL Cookies
console.log(third_party_cookies); // Third-Party Cookies
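Another way to classify a cookie is to compare its domain with the page's hostname instead of with the cookies returned by page.cookies(). The sketch below is a plain domain-match heuristic, not a public-suffix-aware check, and isThirdParty is a hypothetical helper name:

```javascript
// True when the cookie's domain does not match the page's hostname or one of its parent domains.
function isThirdParty(cookie, pageUrl) {
  const host = new URL(pageUrl).hostname;
  const domain = cookie.domain.replace(/^\./, ''); // drop the leading dot of domain cookies
  return host !== domain && !host.endsWith('.' + domain);
}
```

With the variables from the snippet above, this could be used as: all_browser_cookies.filter(cookie => isThirdParty(cookie, page.url()));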
const browser = await puppeteer.launch({});
const page = await browser.newPage();
await page.goto('https://www.stackoverflow.com/', {waitUntil : 'networkidle0' });
// networkidle2, domcontentloaded and load are the other options for waitUntil
// Here we can get all of the cookies
var content = await page._client.send('Network.getAllCookies');
console.log(JSON.stringify(content, null, 4));
I am trying to switch to a new tab and scrape the title of that page with Puppeteer.
This is what I have:
// use puppeteer
const puppeteer = require('puppeteer');
//set wait length in ms: 1000ms = 1sec
const short_wait_ms = 1000
async function run() {
const browser = await puppeteer.launch({
headless: false, timeout: 0});
const page = await browser.newPage();
await page.goto('https://biologyforfun.wordpress.com/2017/04/03/interpreting-random-effects-in-linear-mixed-effect-models/');
// second page DOM elements
const CLICKHERE_SELECTOR = '#post-2068 > div > div.entry-content > p:nth-child(2) > a:nth-child(1)';
// main page
await page.waitFor(short_wait_ms);
await page.click(CLICKHERE_SELECTOR);
// new tab opens - move to new tab
let pages = await browser.pages();
//go to the newly opened page
//console.log title -- Generalized Linear Mixed Models in Ecology and in R
}
run();
I can't figure out how to use browser.pages() to start working on the new page.
According to the Puppeteer Documentation:
page.title()
returns: <Promise<string>> Returns page's title.
Shortcut for page.mainFrame().title().
Therefore, you should use page.title() for getting the title of the newly opened page.
Alternatively, you may gain a slight performance boost with the following, but it relies on private fields (_frameManager, _mainFrame) that can change between releases:
page._frameManager._mainFrame.evaluate(() => document.title)
Note: Make sure to use the await operator when calling page.title(), because it returns a Promise that resolves to the title string.
You shouldn't need to move to the new tab.
To get the title of any page you can use:
const pageTitle = await page.title();
Also, after you click something and are waiting for the new page to load, you should wait for the load event or for the network to be idle:
// Wait for redirection
// (very old Puppeteer versions used waitUntil: 'networkidle' with networkIdleTimeout; current ones use 'networkidle0' or 'networkidle2')
await page.waitForNavigation({ waitUntil: 'networkidle0' });
Check the docs: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagewaitfornavigationoptions
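If the clicked link opens in a new tab, page still refers to the original tab, so page.title() returns the original title. One way to get a handle on the new tab is to wait for a target whose opener is the current page. This is a sketch that assumes browser.waitForTarget() and target.opener() are available in your Puppeteer version; clickAndGetNewPage is a hypothetical helper name:

```javascript
// Click a link that opens a new tab and resolve to the Page object for that tab.
async function clickAndGetNewPage(browser, page, selector) {
  const [target] = await Promise.all([
    // Start waiting before the click so the new target is not missed.
    browser.waitForTarget((t) => t.opener() === page.target()),
    page.click(selector),
  ]);
  return target.page();
}
```

In the question's run() function this would be: const newPage = await clickAndGetNewPage(browser, page, CLICKHERE_SELECTOR); console.log(await newPage.title());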