I've tried just about everything and can't figure out how to get Puppeteer to work in my current browser window (where I'm logged in to Chrome) rather than in a new, cache-less, logged-out browser. I've tried starting Chrome beforehand with a remote debugging port, passing my user data in args when launching Puppeteer, launching both Chromium and my installed Chrome (via its path), etc. Here's my current code, which sets everything up:
const chromelauncher = require('chrome-launcher');
const puppeteer = require('puppeteer');
const request = require('request');
const util = require('util');

(async () => {
  const opts = {
    logLevel: 'info',
    output: 'json'
  };
  const chrome = await chromelauncher.launch({port: 9222});
  opts.port = chrome.port;
  // Connect to it using puppeteer.connect().
  const resp = await util.promisify(request)(`http://localhost:${opts.port}/json/version`);
  const {webSocketDebuggerUrl} = JSON.parse(resp.body);
  // `args` is a launch() option; connect() ignores it, so it is left out here.
  const browser = await puppeteer.connect({browserWSEndpoint: webSocketDebuggerUrl});
  const page = await browser.newPage();
  await page.setViewport({width: 1366, height: 768});
})();
I've run out of places to look. If something looks off, please let me know. Thanks!
You have to set userDataDir to reuse your existing profile data (cookies, cache, logged-in sessions).
puppeteer.launch({
userDataDir: 'PATH TO DATA FOLDER',
})
Your default profile folder is here (the user data directory is its parent):
Windows 7, 8.1, and 10: C:\Users\<username>\AppData\Local\Google\Chrome\User Data\Default
Mac OS X El Capitan: /Users/<username>/Library/Application Support/Google/Chrome/Default
Linux: /home/<username>/.config/google-chrome/Default
Another way is to open chrome://version and copy the Profile Path from there. Remove the trailing Default and you have your user data directory:
[Profile Path] C:\Users\Alice\AppData\Local\Google\Chrome\User Data\Default
[User Data Dir] C:\Users\Alice\AppData\Local\Google\Chrome\User Data
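So the code will look like this:
puppeteer.launch({
  // String.raw keeps the backslashes. In a plain template literal, \U, \A, etc.
  // are treated as escape sequences and the backslashes silently disappear.
  userDataDir: String.raw`C:\Users\Alice\AppData\Local\Google\Chrome\User Data`,
})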
Learn more about the data directory in Chromium's user data directory documentation.
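If you'd rather not hard-code the path, here is a small sketch that picks the default user data directory per platform. The helper name defaultChromeUserDataDir is made up for this example, and it assumes the standard install locations of stable Google Chrome (Chromium, Beta, Canary and portable installs use other folders), so treat chrome://version as the source of truth.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: returns stable Google Chrome's default *user data
// directory* (the parent of the "Default" profile folder). Assumes standard
// install locations; other channels or custom installs differ.
function defaultChromeUserDataDir(platform = process.platform, home = os.homedir()) {
  switch (platform) {
    case 'win32':
      // Normally the same as %LOCALAPPDATA%\Google\Chrome\User Data
      return path.win32.join(home, 'AppData', 'Local', 'Google', 'Chrome', 'User Data');
    case 'darwin':
      return path.posix.join(home, 'Library', 'Application Support', 'Google', 'Chrome');
    case 'linux':
      return path.posix.join(home, '.config', 'google-chrome');
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

You would then pass it as userDataDir: defaultChromeUserDataDir() to puppeteer.launch(). Chrome must not already be running with that profile, or the launch will fail.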
Another useful flag is --profile-directory, which lets you choose a named profile inside the user data directory:
--profile-directory=Default
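A minimal sketch of combining userDataDir with --profile-directory, assuming you want a profile folder named "Profile 1" (a placeholder; use the folder name that chrome://version shows). profileLaunchOptions and launchWithProfile are hypothetical helper names. Puppeteer passes args to Chrome without a shell, so a profile name containing a space needs no extra quoting.

```javascript
// Hypothetical helper: launch options that reuse an existing user data
// directory and select one profile folder inside it.
function profileLaunchOptions(userDataDir, profileDirectory = 'Default') {
  return {
    headless: false,
    userDataDir,
    args: [`--profile-directory=${profileDirectory}`],
  };
}

async function launchWithProfile(userDataDir, profileDirectory) {
  // Required lazily so the options helper above works without Chrome installed.
  const puppeteer = require('puppeteer');
  return puppeteer.launch(profileLaunchOptions(userDataDir, profileDirectory));
}
```

For example: launchWithProfile(String.raw`C:\Users\Alice\AppData\Local\Google\Chrome\User Data`, 'Profile 1').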
Not sure if this is useful in your case, since I use Chromium rather than my system Chrome, but I save the Chromium settings inside my project folder, PeaceOut.
I'm still learning Puppeteer, so I might not be doing everything the best way.
const puppeteer = require('puppeteer');
const url = 'https://example.com'; // placeholder: the site you want to open

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    devtools: true,
    // slowMo: 250, // slow down by 250ms
    // executablePath: path to a Chromium or Chrome executable to run
    userDataDir: 'C:\\Users\\TeDev\\Scrape\\PeaceOut\\bdata' // path to a user data directory
  });
  // Use tab 0 so Chromium doesn't leave an extra blank tab open.
  const [page] = await browser.pages();
  await page.setViewport({width: 1280, height: 1080});
  console.log(`Trying to access ${url}`);
  await page.goto(url);
})();
Add the --new-window argument when launching.
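If the goal is to drive a Chrome window you started yourself (as the question attempts with chrome-launcher and /json/version), Puppeteer can look up the WebSocket endpoint itself through the browserURL option of puppeteer.connect(). A sketch, assuming Chrome was started with --remote-debugging-port=9222; debuggerBrowserURL and attachToRunningChrome are illustrative names. Recent Chrome releases are reported to ignore the debugging port when Chrome runs on its default user data directory, so you may need to start it with a separate --user-data-dir.

```javascript
// Hypothetical helper: the HTTP address of Chrome's DevTools endpoint.
function debuggerBrowserURL(port = 9222, host = '127.0.0.1') {
  return `http://${host}:${port}`;
}

// Attach to an already running Chrome started with, for example:
//   chrome --remote-debugging-port=9222 --user-data-dir=<some folder>
async function attachToRunningChrome(port = 9222) {
  // Required lazily so the URL helper above can be used without Puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.connect({
    // connect() fetches webSocketDebuggerUrl from /json/version for us.
    browserURL: debuggerBrowserURL(port),
    defaultViewport: null, // keep the real window size instead of 800x600
  });
  // Reuse a tab that is already open in that window.
  const [page] = await browser.pages();
  return {browser, page};
}
```

When you're done, call browser.disconnect() instead of browser.close() so your window stays open.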
Related
I open a Puppeteer browser page and navigate to a specific site. If the proxy used by the browser gets banned, I want to switch it without closing or restarting the browser; refreshing the page would be the best option. However, I haven't been able to figure out how to do it. I've tried storing '--proxy-server=xxxx' in a variable and changing that variable whenever the proxy is banned, but that didn't work. I've tried many other things too, without success. Any help would be much appreciated.
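const puppeteer = require('puppeteer');
const randomUseragent = require('random-useragent');

async function launchWithProxy() {
  const proxies = get_proxy(); // my own helper that returns {address, port}
  const useragent = randomUseragent.getRandom();
  const browser = await puppeteer.launch({
    headless: false,
    args: [
      `--proxy-server=${proxies.address}:${proxies.port}`,
      // No extra quotes: args are passed without a shell, so quotes would end up in the UA.
      `--user-agent=${useragent}`,
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-infobars',
      '--window-position=0,0',
      '--ignore-certificate-errors',
      '--ignore-certificate-errors-spki-list'
    ]
  });
  const page = await browser.newPage();
  return {browser, page};
}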
Here's the logic I want, in pseudocode:
const proxy_banned = await check_proxy_ban();
if (proxy_banned) {
  // switch the Puppeteer browser's proxy
  // refresh the page
  // return from the function
} else {
  // return from the function
}
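Because --proxy-server is a launch flag, changing it requires a new browser process. A possible alternative, sketched here assuming Puppeteer v22 or later, is to give each proxy its own browser context: browser.createBrowserContext() accepts a proxyServer option (older versions expose the same option on createIncognitoBrowserContext()). openWithProxy and switchProxy are hypothetical names, and the {address, port} shape mirrors the question's get_proxy(). It is probably clearest to launch the browser without the global --proxy-server flag in this setup.

```javascript
// Formats the question's {address, port} proxy object for Chrome.
function proxyServerOf(proxy) {
  return `${proxy.address}:${proxy.port}`;
}

// Open `url` in a fresh browser context that routes through `proxy`.
async function openWithProxy(browser, proxy, url) {
  const context = await browser.createBrowserContext({proxyServer: proxyServerOf(proxy)});
  const page = await context.newPage();
  await page.goto(url);
  return {context, page};
}

// Drop the banned context and reload the same URL through a new proxy,
// without restarting the browser process.
async function switchProxy(browser, session, newProxy) {
  const url = session.page.url();
  await session.context.close();
  return openWithProxy(browser, newProxy, url);
}
```

In the question's terms, the branch would become something like if (await check_proxy_ban()) session = await switchProxy(browser, session, get_proxy()); where check_proxy_ban and get_proxy are the asker's own helpers. Each context has its own cookies and storage, so switching also discards the banned context's session state.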
This is my first time using Puppeteer. I want to open a Google Chrome page and navigate to a Chrome extension I have installed. I try to enable the extension, but when I run my script in headless: false mode, the browser pops up without my extension.
My code:
const puppeteer = require('puppeteer');

// my extension path
const StayFocusd = 'C:\\Users\\vasilis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Extensions\\laankejkbhbdhmipfmgcngdelahlfoji\\1.6.0_0';

async function run() {
  // this is where I try to enable my extension
  const browser = await puppeteer.launch({
    headless: false,
    ignoreDefaultArgs: [`--disable-extensions-except=${StayFocusd}`, '--enable-automation'],
  });
  const page = await browser.newPage();
  await new Promise(resolve => setTimeout(resolve, 3000)); // wait 3 seconds
  await browser.close();
}

run().catch(console.error);
The extension doesn't load, and I get no error at all. I'd appreciate your help.
Setting the --disable-extensions-except launch flag to your extension's path is not enough; you also need --load-extension to actually load the extension into the launched browser instance.
You also used ignoreDefaultArgs where you should have used args. ignoreDefaultArgs only filters flags out of Puppeteer's default list, so your extension flags were never passed to Chromium, and --enable-automation was removed instead: literally the opposite of what you expected.
Correct usage of puppeteer.launch:
const browser = await puppeteer.launch({
headless: false,
args: [
`--disable-extensions-except=${StayFocusd}`,
`--load-extension=${StayFocusd}`,
'--enable-automation'
]
})
You can make use of the official docs about Working with Chrome Extensions.
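The same flags scale to several extensions, since both switches accept a comma-separated list of folders. A sketch with hypothetical helper names; the paths must point to unpacked extension folders that contain manifest.json. Recent branded Google Chrome builds are reported to no longer honour --load-extension, while the Chrome for Testing build that Puppeteer downloads still does, so leave executablePath unset if an extension silently fails to load.

```javascript
// Hypothetical helper: builds both flags for one or more unpacked extensions.
function extensionArgs(extensionPaths) {
  const list = extensionPaths.join(',');
  return [
    `--disable-extensions-except=${list}`,
    `--load-extension=${list}`,
  ];
}

async function launchWithExtensions(extensionPaths) {
  // Required lazily so the helper above can be used without Chrome installed.
  const puppeteer = require('puppeteer');
  return puppeteer.launch({
    headless: false, // extensions have historically required headful Chrome
    args: extensionArgs(extensionPaths),
  });
}
```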
I want to start a Chromium browser instance headless, do some automated operations, and then make it visible before doing the rest of the work.
Is this possible to do using Puppeteer, and if it is, can you tell me how? And if it is not, is there any other framework or library for browser automation that can do this?
So far I've tried the following but it didn't work.
const browser = await puppeteer.launch({'headless': false});
browser.headless = true;
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'hn.pdf', format: 'A4'});
Short answer: It's not possible.
Chrome only lets you start the browser in either headless or non-headless mode. You have to choose when you launch the browser; it is not possible to switch at runtime.
What is possible, is to launch a second browser and reuse cookies (and any other data) from the first browser.
Long answer
You would assume that you could just reuse the data directory when calling puppeteer.launch, but this is currently not possible due to multiple bugs (#1268, #1270 in the puppeteer repo).
So the best approach is to save any cookies or local storage data that you need to share between the browser instances and restore that data when you launch the second browser. You then visit the website a second time. Be aware that any in-page JavaScript state (variables and so on) will be lost when you revisit the page.
Process
Summing up, the whole process should look like this (or vice versa for headless to headful):
Crawl in non-headless mode until you want to switch mode
Serialize cookies
Launch or reuse second browser (in headless mode)
Restore cookies
Revisit page
Continue crawling
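The process above could be sketched as follows, assuming cookies are all the state you need to carry over (localStorage would need a similar page.evaluate round trip). saveCookies, loadCookies and switchToHeadless are illustrative names, and cookies.json is an arbitrary file name. page.cookies() only returns cookies for the page's current URL, so call it on each domain you care about.

```javascript
const fs = require('fs');

// Serialize cookies to a JSON file.
function saveCookies(cookies, file) {
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
}

// Read them back as an array of cookie objects.
function loadCookies(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Sketch of the whole switch, headful -> headless.
async function switchToHeadless(url, cookieFile = 'cookies.json') {
  // Required lazily so the helpers above work without Chrome installed.
  const puppeteer = require('puppeteer');

  // Crawl in non-headless mode until you want to switch.
  let browser = await puppeteer.launch({headless: false});
  let [page] = await browser.pages();
  await page.goto(url);
  // ... interactive part, e.g. logging in ...
  saveCookies(await page.cookies(), cookieFile); // serialize cookies
  await browser.close();

  // Launch the second browser, headless this time.
  browser = await puppeteer.launch({headless: true});
  [page] = await browser.pages();
  await page.setCookie(...loadCookies(cookieFile)); // restore cookies
  await page.goto(url); // revisit; in-page JavaScript state is gone
  return {browser, page}; // continue crawling with these
}
```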
As mentioned, this isn't currently possible since the headless switch occurs via Chromium launch flags.
I usually do this with userDataDir, which the Chromium docs describe as follows:
The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.
Here's a simple example. This launches a browser headlessly, sets a local storage value on an arbitrary page, closes the browser, re-opens it headfully, retrieves the local storage value and prints it.
const puppeteer = require("puppeteer"); // ^18.0.4
const url = "https://www.example.com";
const opts = {userDataDir: "./data"};
let browser;
(async () => {
{
browser = await puppeteer.launch({...opts, headless: true});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.evaluate(() => localStorage.setItem("hello", "world"));
await browser.close();
}
{
browser = await puppeteer.launch({...opts, headless: false});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
const result = await page.evaluate(() => localStorage.getItem("hello"));
console.log(result); // => world
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Change const opts = {userDataDir: "./data"}; to const opts = {}; and you'll see null print instead of world; the user data doesn't persist.
The answer from a few years ago mentions issues with userDataDir and suggests a cookies solution. That's fine, but I haven't had any issues with userDataDir so either they've been resolved on the Puppeteer end or my use cases haven't triggered the issues.
There's a useful-looking answer from a reputable source in How to turn headless on after launch? but I haven't had a chance to try it yet.
I've had quite a few issues trying to get Puppeteer to run with my existing Chrome user data. I've simplified things down, and here is my current code. I don't know what else to do, as I've tried everything related to the issue:
const browser = await puppeteer.launch({
headless: false,
userDataDir: 'C:\\Users\\me\\AppData\\Local\\Google\\Chrome\\User Data'
})
const page = await browser.newPage();
This causes the following error:
error Error: Failed to launch chrome!
[8084:14860:0321/091939.752:ERROR:cache_util_win.cc(19)] Unable to move the cache: 0
[8084:14860:0321/091939.752:ERROR:cache_util.cc(140)] Unable to move cache folder C:\Users\me\AppData\Local\Google\Chrome\User Data\ShaderCache\GPUCache to C:\Users\me\AppData\Local\Google\Chrome\User Data\ShaderCache\old_GPUCache_000
[8084:14860:0321/091939.752:ERROR:disk_cache.cc(184)] Unable to create cache
[8084:14860:0321/091939.752:ERROR:shader_disk_cache.cc(622)] Shader Cache Creation failed: -2
The error comes from a Chrome process that is still running with that user data directory. Close all Chrome processes (check Task Manager), and the following will work:
const browser = await puppeteer.launch({
  headless: false,
  executablePath: `C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe`,
  userDataDir: `C:\\Users\\Marwan\\AppData\\Local\\Google\\Chrome\\User Data`,
});
Is there a way to test a Chrome extension using Puppeteer? For example can an extension detect that Chrome was launched in "test" mode to provide different UI, check content scripts are working, etc?
Passing --user-agent in puppeteer.launch() is a useful way to override the browser's UA with a custom value. Then, your extension can read back navigator.userAgent in its background page and identify that Chrome was launched with Puppeteer. At that point, you can provide different code paths for testing the crx vs. normal operation.
puppeteer_script.js
const puppeteer = require('puppeteer');
const CRX_PATH = '/path/to/crx/folder/';
puppeteer.launch({
headless: false, // extensions only supported in full chrome.
args: [
`--disable-extensions-except=${CRX_PATH}`,
`--load-extension=${CRX_PATH}`,
'--user-agent=PuppeteerAgent'
]
}).then(async browser => {
// ... do some testing ...
await browser.close();
});
Extension background.js
chrome.runtime.onInstalled.addListener(details => {
console.log(navigator.userAgent); // "PuppeteerAgent"
});
Alternatively, if you wanted to preserve the browser's original UA string, it gets tricky.
Launch Chrome and create a blank page in Puppeteer.
Set its title to a custom name.
Detect the tab's title update in your background script.
Set a global flag to reuse later.
background.js
let LAUNCHED_BY_PUPPETEER = false; // reuse in other parts of your crx as needed.
chrome.tabs.onUpdated.addListener((tabId, info, tab) => {
if (!LAUNCHED_BY_PUPPETEER && tab.title.includes('PuppeteerAgent')) {
chrome.tabs.remove(tabId);
LAUNCHED_BY_PUPPETEER = true;
}
});
puppeteer_script.js
const puppeteer = require('puppeteer');
const CRX_PATH = '/path/to/crx/folder/';
puppeteer.launch({
headless: false, // extensions only supported in full chrome.
args: [
`--disable-extensions-except=${CRX_PATH}`,
`--load-extension=${CRX_PATH}`,
]
}).then(async browser => {
const page = await browser.newPage();
await page.evaluate("document.title = 'PuppeteerAgent'");
// ... do some testing ...
await browser.close();
});
Note: The downside is that this approach requires the "tabs" permission in manifest.json.
Testing an extension page
Say you want to test your popup page UI. One way to do that is to navigate to its chrome-extension:// URL directly, then use Puppeteer to do the UI testing:
// Can we navigate to a chrome-extension page? YES!
const page = await browser.newPage();
await page.goto('chrome-extension://ipfiboohojhbonenbbppflmpfkakjhed/popup.html');
// click buttons, test UI elements, etc.
To create a stable extension id for testing, check out: https://stackoverflow.com/a/23877974/274673
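Rather than hard-coding the id, you can read it from the extension's background target once the browser has started. A sketch, assuming the extension declares a background service worker (Manifest V3) or background page (Manifest V2) and that its popup lives at popup.html; extensionIdFromUrl and openPopup are hypothetical names. A page opened this way is an ordinary tab rather than a real popup, so popup-specific sizing is not reproduced.

```javascript
// Hypothetical helper: extracts <id> from chrome-extension://<id>/...
function extensionIdFromUrl(url) {
  const {protocol, host} = new URL(url);
  if (protocol !== 'chrome-extension:') {
    throw new Error(`Not an extension URL: ${url}`);
  }
  return host;
}

// Open the extension's popup page in a normal tab of `browser`
// (a browser launched with --load-extension as shown earlier).
async function openPopup(browser, popupPath = 'popup.html') {
  // Wait for the background target: a service worker (MV3) or page (MV2).
  const target = await browser.waitForTarget(
    t => (t.type() === 'service_worker' || t.type() === 'background_page') &&
         t.url().startsWith('chrome-extension://'),
  );
  const id = extensionIdFromUrl(target.url());
  const page = await browser.newPage();
  await page.goto(`chrome-extension://${id}/${popupPath}`);
  return page;
}
```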