How can I use Puppeteer with my current Chrome (keeping my credentials)? - javascript

I'm trying to use Puppeteer for scraping, and I need it to use my current Chrome so it keeps all my credentials, instead of re-logging in and typing my password each time, which is a real waste of time!
Is there a way to connect to it? How can I do that?
I'm currently using Node v11.1.0 and Puppeteer 1.10.0.
let scrape = async () => {
  const browser = await log()
  const page = await browser.newPage()
  const delayScroll = 200
  // Login
  await page.goto('somesite.com');
  await page.type('#login-email', '*******');
  await page.type('#login-password', "******");
  await page.click('#login-submit');
  // Wait for the login to complete
  await page.waitFor(1000);
}
It would be perfect if I didn't need any of that and could just go to the page headlessly (I don't want to see the page opening; I'm only using the scraped info in Node), but with my current Chrome, which doesn't need to log in to get the information I need. (In the end I want to use it as a Chrome extension.)
Thanks in advance if someone knows how to do that.

First, welcome to the community.
You can use Chrome instead of Chromium, but honestly, in my case I got a lot of errors and it made a mess of my personal tabs. Instead, you can create and save a profile, then log in once with a current or a new account.
In your code you have a function called "log"; I'm guessing that's where you launch Puppeteer.
const browser = await log()
Inside that function, pass arguments to create a relative directory for your profile data:
const browser = await puppeteer.launch({
  args: ["--user-data-dir=./Google/Chrome/User Data/"]
});
Run your application and log in with your account; the next time you run it, your session should still be there.
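For completeness, a sketch of what that log function might look like with a persistent profile. The profilePath value and the commented executablePath are assumptions for illustration; point executablePath at your own Chrome install if you want real Chrome instead of the bundled Chromium:
const puppeteer = require('puppeteer');

// Launches a browser that persists cookies and sessions in a local
// profile directory across runs, so you only have to log in once.
async function log() {
  const profilePath = './Google/Chrome/User Data/'; // assumed location
  return puppeteer.launch({
    headless: true, // the page never has to be visible
    // executablePath: '/usr/bin/google-chrome', // optional: real Chrome
    args: [`--user-data-dir=${profilePath}`],
  });
}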
If you have any doubts, please add a comment.

Related

Beginner problem with crashing code to lookup class or text on website

I wrote this code to check whether a certain text is on my website, and if it's not, it should send me a notification through Slack. When I run it in VS Code, it crashes after some time, maybe 15 minutes or so.
I want to clean it up, put it on a server, and run it remotely, but I need to be sure it won't keep crashing. I want to use it to watch some websites for changing information, and if something changes or disappears, send me a notification. The best part is that it works, but it crashes and I don't know why :(
Can someone help pinpoint what the problem might be? It would also be better if this tool could just look for text instead of a class, but I don't know how to do that.
// Puppeteer library
const pt = require('puppeteer')
const axios = require('axios')
process.setMaxListeners(0);

async function getText() {
  // launch browser in headless mode
  const browser = await pt.launch()
  // open a new page
  const page = await browser.newPage()
  // disable the navigation timeout
  await page.setDefaultNavigationTimeout(0);
  // go to the website
  await page.goto('https://mieciusio.pl/kontakt.html')
  // check whether the element is present
  if (await page.$("[class='p-style btn-resize-mode label-bloc-2-style label-1-style']")) {
    console.log("found")
  } else {
    // console.log("not found")
    await axios.post('https://hooks.slack.com/services/MYUniqeID', { text: 'Its changed' })
  }
  // close the browser so instances don't pile up; launching a new browser
  // every 12 s without ever closing it exhausts memory and is a likely
  // cause of the crashes
  await browser.close()
}
setInterval(getText, 12000)
I tried to find the answer online on YouTube, but it's hard; I looked at a lot of tutorials but couldn't find the right one for finding text on a website, or for stopping the crashes, because I don't know why it crashes.
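For the second part of the question, checking for text instead of a class, a minimal sketch (the helper name and the expected text are placeholders): read the rendered body text inside the page and search it.
// Returns true if the rendered page contains the given text.
// Assumes `page` has already navigated to the target URL.
async function pageHasText(page, text) {
  const body = await page.evaluate(() => document.body.innerText)
  return body.includes(text)
}

// Usage inside getText(), replacing the page.$ class check:
// if (await pageHasText(page, 'expected text')) { ... }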

How to reuse a cookie so that the website knows I've already accepted the terms?

Most websites, when you load them, ask you to accept cookies and the privacy terms; I think it's mainly in the EU.
I'm struggling with how to reuse the cookies so I don't have to keep clicking "accept all" every time I load up Chrome.
The way I'm thinking is that if I click "accept all" the first time and save the cookie, I can write code that fetches the cookie file, so the site knows I accepted its cookies and the prompt doesn't pop up again.
The website I'm using for this example is https://finviz.com/
const puppeteer = require('puppeteer')
const fs = require('fs').promises // promise-based API so readFile can be awaited
;(async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  // restore the saved cookies before navigating, so the banner never shows
  const cookiesString = await fs.readFile('./cookies.json')
  const cookies = JSON.parse(cookiesString)
  await page.setCookie(...cookies)
  await page.goto('https://finviz.com/')
})()
It is, at the very least, complicated to write an app that listens for cookies being set, copies them to a file, and puts them back when the browser is restarted. The same applies if you want to save the cookies manually.
But if you do that, then deleting the cookies would be unnecessary anyway, so you could simply allow cookies in the settings of your browser.
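That said, with Puppeteer the save/restore round trip is manageable. A minimal sketch, assuming the consent button's selector has been identified on the target site ('.accept-all' below is a placeholder): click it once, dump page.cookies() to disk, and on later runs restore the cookies before navigating.
const puppeteer = require('puppeteer')
const fs = require('fs').promises

;(async () => {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()
  try {
    // Later runs: restore the saved cookies, then navigate.
    const saved = JSON.parse(await fs.readFile('./cookies.json', 'utf8'))
    await page.setCookie(...saved)
    await page.goto('https://finviz.com/')
  } catch {
    // First run: no cookie file yet, so accept the banner and save.
    await page.goto('https://finviz.com/')
    await page.click('.accept-all') // placeholder selector for "accept all"
    await fs.writeFile('./cookies.json', JSON.stringify(await page.cookies()))
  }
  await browser.close()
})()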

uploading a file using puppeteer browserWSEndpoint

I am trying to upload a file using Puppeteer and browserWSEndpoint; the error message I am getting is
"Uncaught (in promise) Error: File chooser handling does not work with multiple connections to the same page".
Here is my code:
const puppeteer = require('puppeteer');

async function getTest() {
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'wss://chrome.browserless.io'
  });
  const page = (await browser.pages())[0];
  await page.goto('https://someWebSite');
  //DO STUFF
  console.log("before upload"); //code runs until here
  const [fileChooser] = await Promise.all([
    page.waitForFileChooser(),
    page.click('#uploadTrigger'),
  ]);
  await fileChooser.accept(['C:\\myProgram\\pic.jpg']);
  await page.click('#edit-submit');
}
getTest().then(console.log);
getTest().then(console.log);
I must mention that if I don't use browserWSEndpoint, and use this code at the beginning instead, everything works fine.
const browser = await puppeteer.launch({headless: false, defaultViewport:null});
Honestly, I am pretty lost with browserWSEndpoint. I used info from this post: How to run Puppeteer code in any web browser?, which led me to browserless.io; I copied the code and it works.
Now this is my precise question: my error says the file chooser "does not work with multiple connections to the same page". How exactly am I making multiple connections? Maybe I can resolve this issue and then I could use const [fileChooser].
My main issue is that I need to upload a file using browserless.
Others seem to have the same problem according to https://github.com/GoogleChrome/puppeteer/issues/4783, but using Chromium locally is not an option if I want to use browserless.
If you are the only client connected to that browser, you must be connected to a browser that doesn't support the file chooser. You should connect to Chromium 77.0.3844.0 (r674921) or higher.
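If upgrading the remote browser isn't an option, a possible workaround is to skip the file chooser entirely and set the file directly on the page's file input element with uploadFile. A minimal sketch, assuming the page exposes such an input (the selector here is a guess), and noting that how the file path is resolved for a remote browser depends on the Puppeteer version:
// Instead of waitForFileChooser, write the path straight into the input.
const input = await page.$('input[type="file"]'); // assumed selector
await input.uploadFile('C:\\myProgram\\pic.jpg');
await page.click('#edit-submit');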

Can the browser be turned headless mid-execution when it was started normally, or vice versa?

I want to start a Chromium browser instance headless, do some automated operations, and then turn it visible before doing the rest of the stuff.
Is this possible to do using Puppeteer, and if it is, can you tell me how? And if it is not, is there any other framework or library for browser automation that can do this?
So far I've tried the following but it didn't work.
const browser = await puppeteer.launch({'headless': false});
browser.headless = true;
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'hn.pdf', format: 'A4'});
Short answer: It's not possible
Chrome only allows you to start the browser in either headless or non-headless mode. You have to specify it when you launch the browser, and it is not possible to switch during runtime.
What is possible, is to launch a second browser and reuse cookies (and any other data) from the first browser.
Long answer
You would assume that you could just reuse the data directory when calling puppeteer.launch, but this is currently not possible due to multiple bugs (#1268, #1270 in the puppeteer repo).
So the best approach is to save any cookies or local storage data that you need to share between the browser instances, and restore the data when you launch the second browser. You then visit the website a second time. Be aware that any state the website holds in JavaScript variables will be lost when you recrawl the page.
Process
Summing up, the whole process should look like this (or the reverse for headless to headful); a sketch follows the list:
Crawl in non-headless mode until you want to switch mode
Serialize cookies
Launch or reuse second browser (in headless mode)
Restore cookies
Revisit page
Continue crawling
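A minimal sketch of that process, assuming cookies are the only state that needs to carry over (local storage would need a similar save/restore via page.evaluate):
const puppeteer = require('puppeteer');

(async () => {
  // 1. Crawl in non-headless mode until it's time to switch
  const first = await puppeteer.launch({ headless: false });
  const page = await first.newPage();
  await page.goto('https://example.com');

  // 2. Serialize the cookies before closing the first browser
  const cookies = await page.cookies();
  await first.close();

  // 3. Launch the second browser in headless mode and 4. restore the cookies
  const second = await puppeteer.launch({ headless: true });
  const page2 = await second.newPage();
  await page2.setCookie(...cookies);

  // 5. Revisit the page and 6. continue crawling
  await page2.goto('https://example.com');
  await second.close();
})();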
As mentioned, this isn't currently possible since the headless switch occurs via Chromium launch flags.
I usually do this with userDataDir, which the Chromium docs describe as follows:
The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.
Here's a simple example. This launches a browser headlessly, sets a local storage value on an arbitrary page, closes the browser, re-opens it headfully, retrieves the local storage value and prints it.
const puppeteer = require("puppeteer"); // ^18.0.4
const url = "https://www.example.com";
const opts = {userDataDir: "./data"};
let browser;
(async () => {
{
browser = await puppeteer.launch({...opts, headless: true});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.evaluate(() => localStorage.setItem("hello", "world"));
await browser.close();
}
{
browser = await puppeteer.launch({...opts, headless: false});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
const result = await page.evaluate(() => localStorage.getItem("hello"));
console.log(result); // => world
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Change const opts = {userDataDir: "./data"}; to const opts = {}; and you'll see null print instead of world; the user data doesn't persist.
The answer from a few years ago mentions issues with userDataDir and suggests a cookies solution. That's fine, but I haven't had any issues with userDataDir so either they've been resolved on the Puppeteer end or my use cases haven't triggered the issues.
There's a useful-looking answer from a reputable source in How to turn headless on after launch? but I haven't had a chance to try it yet.

Using page.getMetrics() to get page load time in puppeteer

I am trying to use Puppeteer to measure how fast a set of websites loads in my environment. My focus is on the quality and speed of the network connection, so I am happy just knowing the time taken for a page to load, for a layman's definition of load: when all images and HTML have been downloaded by the browser.
By using Puppeteer I can run the test repeatedly and measure the difference in load times precisely.
I can see that in 64.0.3240.0 (r508693) page.getMetrics and event: 'metrics' have landed, which should help me get what I am looking for.
But being a newbie in Node and JS, I am not sure how to read the page.getMetrics output and which of the different key/value pairs give useful information in my context.
My current pathetic attempt at reading metrics is as follows:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  page.on('load', () => console.log("Loaded: " + page.url()));
  await page.goto('https://google.com');
  const metrics = page.getMetrics();
  console.log(metrics.Documents, metrics.Frames, metrics.JSEventListeners);
  await page.goto('https://yahoo.com');
  await page.goto('https://bing.com');
  await page.goto('https://github.com/login');
  browser.close();
}
run();
Any help in getting this code to some thing more respectable is much appreciated :)
In recent versions you have page.metrics() available:
It will return an object with a bunch of numbers including:
The timestamp when the metrics sample was taken
Combined durations of all page layouts
Combined duration of all tasks performed by the browser.
Check out the docs for the full list
You can use it like this:
await page.goto('https://github.com/login');
const gitMetrics = await page.metrics();
console.log(gitMetrics.Timestamp)
console.log(gitMetrics.TaskDuration)
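Note that TaskDuration and friends measure browser work, not wall-clock load time. For the original goal here (time until the HTML and images have been downloaded), a sketch using the Navigation Timing API through page.evaluate may be closer to what's wanted; loadEventEnd approximates the layman's "fully loaded" moment:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait for the load event so the timing entries are complete.
  await page.goto('https://github.com/login', { waitUntil: 'load' });

  // Read the browser's own navigation timing for this page load.
  const loadTimeMs = await page.evaluate(() => {
    const [nav] = performance.getEntriesByType('navigation');
    return nav.loadEventEnd - nav.startTime;
  });
  console.log(`Page loaded in ${loadTimeMs} ms`);

  await browser.close();
})();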
