Authenticate using cookies for web scraping? - javascript

I built an app that uses Puppeteer to scrape data from LinkedIn. I log in using email and password but would like to pass in cookies to authenticate. Here is what I currently use:
const puppeteer = require("puppeteer");
(async () => {
try {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto("https://www.linkedin.com/login");
await page.waitForSelector(loginBtn);
await page.type("#username", username);
await page.type("#password", password);
await page.click(loginBtn, { delay: 30 });
await browser.close();
} catch (error) {
console.log(`Our error = ${error}`);
}
})();
I've seen websites like Phantombuster that use "li_at" cookies to authenticate. https://i.imgur.com/PI8fzao.png
How can I authenticate using cookies?

Disclaimer: I work at Phantombuster ;)
Since logging in sets a cookie in your browser on success, you can replace that step with the direct result:
await page.setCookie({ name: "li_at", value: "[cookie here]", domain: "www.linkedin.com" })
You should then be able to goto any of the website page as if you were authenticated by the login form.

Related

Puppeteer returning empty array

I'm trying pick some data from followers page, but always return a empty array.
That's my code:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless:false});
const page = await browser.newPage();
await page.goto('https://www.instagram.com/accounts/login/', {
waitUntil: 'networkidle0',
});
// Wait for log in form
await Promise.all([
page.waitForSelector('[name="username"]'),
page.waitForSelector('[name="password"]'),
page.waitForSelector('[type="submit"]'),
]);
// Enter username and password
await page.type('[name="username"]', 'yourAccount');
await page.type('[name="password"]', 'yourPassword');
// Submit log in credentials and wait for navigation
await Promise.all([
page.click('[type="submit"]'),
page.waitForNavigation({
waitUntil: 'networkidle0',
}),
]);
await page.goto('https://www.instagram.com/publicProfile /followers/', {waitUntil: 'networkidle0'});
const teste = await page.evaluate(() => {
const followers = document.querySelectorAll("._aaco span");
let followersArray = []
followers.forEach((item) =>{
followersArray.push(item.innerText)
})
return followersArray
})
console.log(teste)
await browser.close();
})();
publicProfile in the URL is a profile that I choose, but for privacy reasons e alterate for publicProfile.
UPDATE: The problem has resolved, as Örvar said the problem was that I wasn't logged, soo I search for help here and I found this (Puppeteer Login to Instagram) question that solved my question.
When you use a tool like Puppeteer to get content from a site where you need to login, you also need to login using Puppeteer, so the site that you are logging into will generate a user cookie.
Log into Instagram using Puppeteer with user credentials
When Puppeteer has logged in with user credentials, you can run the code you have posted above.

UnhandledPromiseRejectionWarning: Error: Evaluation failed theme is not defined

Before I start the question, I am new in JavaScript, and I have very basic knowledge of async js, but i need to solve this so i can have my first project functional.
I am trying to build a scraping app using Node and Puppeteer. Basically, the user enters a URL ("link" in the code below), puppeteer goes trough the website code, tries to find the specific piece and returns the data. That part I got working so far.
The problem is when a user enters a URL of a site that doesn't have that piece of code. In that case, I get UnhandledPromiseRejectionWarning: Error: Evaluation failed theme is not defined
What do I do so when there is an error like that, I can catch it and redirect the page instead of Getting Internal Server error.
app.post("/results", function(req, res) {
var link = req.body.link;
(async link => {
const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
const page = await browser.newPage()
await page.goto(link, { waitUntil: 'networkidle2'})
const data = await page.evaluate('theme.name');
await browser.close()
return data
})(link)
.then(data => {
res.render("index", {data: data, siteUrl: link});
})
})
You can extend the async part to the whole route handler and do whatever you want on catch:
app.post('/results', async (req, res) => {
try {
const link = req.body.link
const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
const page = await browser.newPage()
await page.goto(link, { waitUntil: 'networkidle2'})
const data = await page.evaluate('theme.name')
await browser.close()
res.render("index", {data: data, siteUrl: link})
} catch(e) {
// redirect or whatever
res.redirect('/')
}
});

Favicon missing in headless puppeteer

I'm writing a simple script to just check if all the resources load correctly (i check status codes of the responses). I've decided to use puppeteer for this and i wrote
(async () => {
const browser = await puppeteer.launch({headless: false});
const [page] = await browser.pages();
page.on("response", (res) => {
const url = res.url(), status=res.status();
// my functionality goes here
if (url.includes("favicon")) console.log(status, url); // not logging in headless
});
await page.goto("https://stackoverflow.com/", {waitUntil: 'networkidle2'});
await browser.close();
})();
the issue is that if i run my application in headless mode the favicon is missing from the responses, i assume it has something to do with puppeteer not loading a favicon in headless. Any built in functionality or workarounds?
From the lack of a better solution right now I'm evaluating favicons url and manually visiting it
async function checkFavicon(page){
const iconUrl = await page.$eval("link[rel*='icon']", ({href}) => href);
await page.goto(iconUrl);
await page.goBack();
}

How to accept a JavaScript alert popup in Puppeteer?

I have a script which uses Puppeteer to automatically log in to a corporate portal. The login uses SAML. So, when puppeteer opens up an instance of chromium and visits the page, a popup appears on screen to confirm the identity of the user. All I need to do is either manually click on "OK" button or press Enter from keyboard.
I have tried to simulate the pressing of the Enter key using puppeteer but it does not work.
The login screen -
Script -
const puppeteer = require('puppeteer');
async function startDb() {
const browser = await puppeteer.launch({
headless:false,
defaultViewport:null
});
const page = await browser.newPage();
await page.goto("https://example.com");
await page.waitFor(3000);
await page.keyboard.press('Enter');
console.log('Opened')
};
startDb();
**EDIT **
There is a solution proposed in this issue:
Basically just intercept the request, then fire the request off yourself using your favorite httpclient lib, and repond to the intercepted request with the response info.
const puppeteer = require('puppeteer');
const request = require('request');
const fs = require('fs');
(async () => {
const browser = await puppeteer.launch();
let page = await browser.newPage();
// Enable Request Interception
await page.setRequestInterception(true);
// Client cert files
const cert = fs.readFileSync('/path/to/cert.crt.pem');
const key = fs.readFileSync('/path/to/cert.key.pem');
page.on('request', interceptedRequest => {
// Intercept Request, pull out request options, add in client cert
const options = {
uri: interceptedRequest.url(),
method: interceptedRequest.method(),
headers: interceptedRequest.headers(),
body: interceptedRequest.postData(),
cert: cert,
key: key
};
// Fire off the request manually (example is using using 'request' lib)
request(options, function(err, resp, body) {
// Abort interceptedRequest on error
if (err) {
console.error(`Unable to call ${options.uri}`, err);
return interceptedRequest.abort('connectionrefused');
}
// Return retrieved response to interceptedRequest
interceptedRequest.respond({
status: resp.statusCode,
contentType: resp.headers['content-type'],
headers: resp.headers,
body: body
});
});
});
await page.goto('https://client.badssl.com/');
await browser.close();
})();
Before page.goto(), put this code:
page.on('dialog', async dialog => {
await dialog.accept();
});

I can't go from a page to another using page.goto() - Puppeteer

I'm trying to make a InstagramBot that logs in and then go to some profile, my code worked yesterday for awhile and than it just stopped working .
I've tried to clone my repository from github, but it does'n work either, sometimes it works again, but if I try to create another function, the code just ignore the line of the code that changes the page.
I've also tried to create a new page and then in this new page use the goto function and it worked, but the account doesn keep logged in
The version of puppeteer that I'm using: 1.16.0
The version of node.js that I'm using: v10.15.3
const puppeteer = require('puppeteer');
const BASE_URL = "https://www.instagram.com/accounts/login/?hl=en&source=auth_switcher";
const instagram = {
browser: null,
page: null,
profile_url: null,
initialize: async (profile) => {
instagram.browser = await puppeteer.launch({
headless: false
})
instagram.profile_url = await "https://www.instagram.com/" + profile;
instagram.page = await instagram.browser.newPage();
await instagram.page.goto(BASE_URL, {waitUntil: 'networkidle2'});
},
login: async(username, password) =>{
await instagram.page.waitFor(1000);
await instagram.page.type('input[name="username"]', username);
await instagram.page.type('input[name="password"', password);
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
await console.log(instagram.profile_url);
await instagram.page.goto(instagram.profile_url, {timeout: 0, waitUntil: 'domcontentloaded'}); // the code just ignore this line
await instagram.page.waitFor(1000);
},
getPhotosLinks: async() => {
console.log("Do something here");
}
}
module.exports = instagram;
It doesn't give any error message, just doesn't work
Replace
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
with
await Promise.all([
instagram.page.click('button[type="submit"]');,
instagram.page.waitForNavigation()
]);
and see if it works

Categories