I've had quite a few issues trying to get Puppeteer to run in an existing Chrome window with its user data. I've simplified things down, and here is my current code. I don't know what else to do, as I've tried everything I could find related to the issue:
const browser = await puppeteer.launch({
headless: false,
userDataDir: 'C:\\Users\\me\\AppData\\Local\\Google\\Chrome\\User Data'
})
const page = await browser.newPage();
This causes the following error:
error Error: Failed to launch chrome!
[8084:14860:0321/091939.752:ERROR:cache_util_win.cc(19)] Unable to move the cache: 0
[8084:14860:0321/091939.752:ERROR:cache_util.cc(140)] Unable to move cache folder C:\Users\me\AppData\Local\Google\Chrome\User Data\ShaderCache\GPUCache to C:\Users\me\AppData\Local\Google\Chrome\User Data\ShaderCache\old_GPUCache_000
[8084:14860:0321/091939.752:ERROR:disk_cache.cc(184)] Unable to create cache
[8084:14860:0321/091939.752:ERROR:shader_disk_cache.cc(622)] Shader Cache Creation failed: -2
The issue is related to an open Chrome process. Make sure you close all of them in the task manager and the following will work:
browser = await puppeteer.launch({
headless: false,
executablePath: `C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe`,
userDataDir: `C:\\Users\\Marwan\\AppData\\Local\\Google\\Chrome\\User Data`,
});
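As a rough sketch of the point above, the launch can be wrapped so that the cache errors quoted in the question produce a hint about open Chrome processes. The helper names `explainLaunchError` and `launchWithProfile` are made up for illustration, and the substring checks are only an assumption based on the log shown in the question; other Chrome versions may word the error differently.

```javascript
// Hypothetical helper: maps the launch error quoted in the question to a hint.
function explainLaunchError(err) {
  const message = String(err && err.message ? err.message : err);
  // These substrings are taken from the error log in the question (assumption:
  // they indicate that another Chrome process holds the user data directory).
  const profileBusy =
    message.includes('Unable to move the cache') ||
    message.includes('Unable to create cache');
  return profileBusy
    ? 'The user data directory appears to be in use. Close every chrome.exe process and retry.'
    : message;
}

// Launches Chrome with an existing profile. `puppeteer` is passed in by the
// caller, e.g. require('puppeteer').
async function launchWithProfile(puppeteer, executablePath, userDataDir) {
  try {
    return await puppeteer.launch({ headless: false, executablePath, userDataDir });
  } catch (err) {
    throw new Error(explainLaunchError(err));
  }
}
```

It could be called as `launchWithProfile(require('puppeteer'), chromePath, userDataDir)`, where both paths are the ones from the snippet above.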
I installed a Chrome extension (Web Scraper) which is capable of scraping websites.
With this extension, I can pre-save different "scrapes" and trigger them from the application tab to finally retrieve a spreadsheet of scraped data.
What I want to do is start these pre-saved "scrapes" at regular time intervals from a Google Sheet / Apps Script.
I already tried Puppeteer, but I'm unable to access my Chrome profile (and so my pre-saved extension settings) from it.
async function startBrowser(){
let browser;
try {
console.log("Opening the browser......");
browser = await puppeteer.launch({
executablePath: "C:/Program Files/Google/Chrome/Application/chrome.exe",
userDataDir: "C:/Users/ASUS/AppData/Local/Google/Chrome/User Data",
headless: false,
});
} catch (err) {
console.log("Could not create a browser instance => : ", err);
}
return browser;
}
I always get a blank window from it.
Is there another way of doing it? Or do you know how to solve my problem with Puppeteer?
I am trying to build a simple scraper for this website by using puppeteer.
The code goes as follows:
const browser = await puppeteer.launch({
headless: false
});
const page = await browser.newPage();
let pagelink = "https://www.speisekarte.de/berlin/restaurants?page=1"
await page.waitFor(3 * 1000);
await page.goto(pagelink);
await page.waitFor(3 * 1000);
await page.waitForSelector("#notice")
However, I cannot access the cookie notice overlay, which should have the id "notice".
Calling await page.waitForSelector("#notice") does not work in my Puppeteer code.
Neither does document.getElementById("notice") when I run it manually in the Chromium console during the session, and it does not work in Firefox's console either. Funnily enough, queries like
document.querySelectorAll("button")
work as expected. I checked with a colleague, and she can access the element using the queries mentioned above in both her Chrome and her Firefox browser. She also uses a Mac. Any idea what is happening here? Any help would be much appreciated.
This is my first time using Puppeteer, and I want to open a Google Chrome page and navigate to a Chrome extension I have installed. I try to enable the extension, but when I run my script in headless: false mode, the browser pops up without it.
My code :
//my extension path
const StayFocusd = 'C:\\Users\\vasilis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Extensions\\laankejkbhbdhmipfmgcngdelahlfoji\\1.6.0_0';
async function run(){
//this is where I try to enable my extension
const browser = await puppeteer.launch({
headless: false,
ignoreDefaultArgs: [`--disable-extensions-except=${StayFocusd}`,"--enable-automation"],
}
);
const page = await browser.newPage();
sleep(3000);
await browser.close();
}
run();
So the extension does not load, and I get no error or anything. I would appreciate your help.
Setting the --disable-extensions-except launch flag with your extension path is not enough; you also need --load-extension to actually load the extension in the opened browser instance.
You also used ignoreDefaultArgs where you should have used args. ignoreDefaultArgs removes flags from Puppeteer's defaults instead of adding them, so Chromium did the opposite of what you expected.
Correct usage of puppeteer.launch:
const browser = await puppeteer.launch({
headless: false,
args: [
`--disable-extensions-except=${StayFocusd}`,
`--load-extension=${StayFocusd}`,
'--enable-automation'
]
})
You can make use of the official docs about Working with Chrome Extensions.
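If more than one unpacked extension is needed, both flags accept a comma-separated list of directories. The helper below is only a sketch of how the args array could be built; the name `extensionArgs` is made up for illustration, and it assumes a Chromium build that still honors these two flags.

```javascript
// Hypothetical helper: builds the two launch flags for one or more unpacked
// extension directories (both flags take a comma-separated list).
function extensionArgs(extensionDirs) {
  const list = extensionDirs.join(',');
  return [
    `--disable-extensions-except=${list}`, // keep only these extensions enabled
    `--load-extension=${list}`,            // actually load them
  ];
}
```

With the path from the question, this would be used as `puppeteer.launch({ headless: false, args: extensionArgs([StayFocusd]) })`.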
The problem was resolved by adding a cookie from an actual browser.
I'm trying to get half-price products from this website https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials. The website is rendered by AngularJS so I'm trying to use puppeteer for data scraping.
With headless: false, just a blank page shows up.
With headless: true, it throws an exception (image: Error while running with headless browser).
const puppeteer = require('puppeteer');
async function getProductNames(){
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setViewport({ width: 1000, height: 926 });
await page.goto("https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials");
await page.waitForSelector('.product-name')
console.log("Begin to evaluate JS")
var productNames = await page.evaluate(() => {
var div = document.querySelectorAll('.product-name');
console.log(div)
var productnames = []
// leave it blank for now
return productnames
})
console.log(productNames)
browser.close()
}
getProductNames();
P.S.: While looking into this issue, I found that the web page actually logs the data of each page with console.log, but I can't trace the request. If you can show me how, that would be great.
The web page console log data
Try adding the options parameter to the page.goto(url[, options]) method:
page.goto("https://shop.coles.com.au/a/richmond-south/specials/search/half-price-specials", { waitUntil: 'networkidle2' })
It will consider navigation to be finished only when there are no more than 2 network connections for at least 500 ms.
You can refer documentation about parameters of options object here: Goto Options parameter
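Putting this together with the selector from the question, a minimal sketch could look like the following. The function name `openAndWaitFor` and the 30 second default timeout are assumptions for illustration; `page` is an ordinary Puppeteer page object.

```javascript
// Navigates, waits for the network to go mostly idle, then waits for the
// selector and returns the trimmed text of every matching element.
async function openAndWaitFor(page, url, selector, timeoutMs = 30000) {
  // 'networkidle2': navigation counts as finished once there are no more
  // than 2 network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
  // Rejects with a timeout error if the element never appears.
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // The callback runs in the page context against all matching nodes.
  return page.$$eval(selector, (nodes) => nodes.map((n) => n.textContent.trim()));
}
```

In the question's script it would replace the goto/waitForSelector pair, e.g. `const names = await openAndWaitFor(page, url, '.product-name')`. If the page stays blank even then, the site may be blocking automated browsers, which this sketch does not address.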
I've tried just about everything and can't figure out how to get Puppeteer to work in my current browser window (where I'm logged in to Chrome) rather than a new, cache-less, logged-out browser. I've tried starting Chrome beforehand with a remote debugging port, loading user data in the args when launching Puppeteer, launching both Chromium and my current Chrome installation path, etc. Here's my current code:
const opts = {
logLevel: 'info',
output: 'json'
};
const chrome = await chromelauncher.launch( {port:9222 });
opts.port = chrome.port;
// Connect to it using puppeteer.connect().
const resp = await util.promisify(request)(`http://localhost:${opts.port}/json/version`);
const {webSocketDebuggerUrl} = JSON.parse(resp.body);
const browser = await puppeteer.connect({browserWSEndpoint: webSocketDebuggerUrl,
args: ["--disable-extensions"]});
const page = await browser.newPage();
await page.setViewport({ width: 1366, height: 768});
I've run out of resources to look, if something looks off please let me know. Thanks!
You have to use a userDataDir to reuse the cache.
puppeteer.launch({
userDataDir: 'PATH TO DATA FOLDER',
})
You can find your profile directory here:
Windows 7, 8.1, and 10: C:\Users\<username>\AppData\Local\Google\Chrome\User Data\Default
Mac OS X El Capitan: /Users/<username>/Library/Application Support/Google/Chrome/Default
Linux: /home/<username>/.config/google-chrome/Default
Another way is to open chrome://version and pick the path from there.
Now remove the trailing Default and you will get your data directory:
[Profile Path] C:\Users\Alice\AppData\Local\Google\Chrome\User Data\Default
[User Data Dir] C:\Users\Alice\AppData\Local\Google\Chrome\User Data
So the code will look like,
puppeteer.launch({
userDataDir: String.raw`C:\Users\Alice\AppData\Local\Google\Chrome\User Data`,
// <-- String.raw keeps the backslashes literal; a plain backtick string would silently drop them
})
Learn more about data directory here.
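A note on Windows paths in JavaScript: a plain backtick (template) string still processes backslash escapes, so the separators are lost unless they are doubled or the literal is tagged with String.raw. The snippet below is a small demonstration using the example path from above; the variable names are only illustrative.

```javascript
// Plain template literal: "\U", "\A", "\L", ... are treated as escapes and
// collapse to the bare letter, so every backslash disappears.
const plain = `C:\Users\Alice\AppData\Local\Google\Chrome\User Data`;

// String.raw leaves the backslashes untouched.
const raw = String.raw`C:\Users\Alice\AppData\Local\Google\Chrome\User Data`;

// Doubling the backslashes in an ordinary string gives the same result.
const escaped = 'C:\\Users\\Alice\\AppData\\Local\\Google\\Chrome\\User Data';

console.log(plain);           // C:UsersAliceAppDataLocalGoogleChromeUser Data
console.log(raw === escaped); // true
```

Forward slashes, as in "C:/Users/Alice/AppData/Local/Google/Chrome/User Data", avoid the escaping question altogether.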
Another interesting argument is --profile-directory. You can name a profile and use that:
--profile-directory=Default
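The two pieces (user data directory plus profile name) can be derived from the Profile Path shown on chrome://version. The following is a sketch under that assumption; `launchOptionsForProfile` is a made-up helper name, and it uses Node's path.win32 so the split works on a Windows-style path regardless of the host OS.

```javascript
const path = require('path');

// Hypothetical helper: splits a Windows Profile Path into the user data
// directory and the profile name, and returns puppeteer.launch() options.
function launchOptionsForProfile(profilePath) {
  const userDataDir = path.win32.dirname(profilePath);  // ...\User Data
  const profileName = path.win32.basename(profilePath); // e.g. Default
  return {
    headless: false,
    userDataDir,
    args: [`--profile-directory=${profileName}`],
  };
}
```

For example, `puppeteer.launch(launchOptionsForProfile(String.raw`C:\Users\Alice\AppData\Local\Google\Chrome\User Data\Default`))` would pass the User Data folder as userDataDir and Default as the profile.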
Not sure if this is useful in your case, as I use the Chromium browser rather than my system Chrome browser, but I save the Chromium settings inside my project, PeaceOut.
I'm still learning how to use puppeteer so might not be doing everything the best way.
const browser = await puppeteer.launch({
headless: false,
devtools: true,
// slowMo: 250 // slow down by 250ms
// executablePath <string> Path to a Chromium or Chrome executable to run
userDataDir: 'C:\\Users\\TeDev\\Scrape\\PeaceOut\\bdata'
// userDataDir <string> Path to a User Data Directory.
});
const page = await browser.pages();
await page[0].setViewport({ width: 1280, height: 1080 })
console.log(`Trying to access ${URL}`);
await page[0].goto(URL); // use tab 0, so Chromium doesn't show a blank tab.
Add --new-window argument when launching
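For the remote-debugging route described in the question, puppeteer.connect() can also be pointed at the debugging port directly through its browserURL option, which avoids fetching /json/version by hand. This is only a sketch: it assumes Chrome was already started with --remote-debugging-port=9222 and the desired user data directory, and `attachToRunningChrome` is a made-up name.

```javascript
// Attaches to an already running Chrome instead of launching a new one.
// `puppeteer` is passed in by the caller, e.g. require('puppeteer').
async function attachToRunningChrome(puppeteer, port = 9222) {
  const browser = await puppeteer.connect({
    browserURL: `http://127.0.0.1:${port}`, // Puppeteer resolves the WebSocket endpoint itself
    defaultViewport: null,                  // keep the window's own viewport size
  });
  // Reuse the first open tab if there is one, otherwise open a new tab.
  const pages = await browser.pages();
  const page = pages.length > 0 ? pages[0] : await browser.newPage();
  return { browser, page };
}
```

When finished, calling `browser.disconnect()` rather than `browser.close()` leaves the Chrome window open.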