Puppeteer returning empty array - javascript

I'm trying to pick some data from a followers page, but it always returns an empty array.
Here's my code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/accounts/login/', {
    waitUntil: 'networkidle0',
  });
  // Wait for log in form
  await Promise.all([
    page.waitForSelector('[name="username"]'),
    page.waitForSelector('[name="password"]'),
    page.waitForSelector('[type="submit"]'),
  ]);
  // Enter username and password
  await page.type('[name="username"]', 'yourAccount');
  await page.type('[name="password"]', 'yourPassword');
  // Submit log in credentials and wait for navigation
  await Promise.all([
    page.click('[type="submit"]'),
    page.waitForNavigation({
      waitUntil: 'networkidle0',
    }),
  ]);
  // Open the followers list of the chosen profile (no space in the URL)
  await page.goto('https://www.instagram.com/publicProfile/followers/', { waitUntil: 'networkidle0' });
  const teste = await page.evaluate(() => {
    // Collect the text of every follower entry currently in the DOM
    const followers = document.querySelectorAll('._aaco span');
    const followersArray = [];
    followers.forEach((item) => {
      followersArray.push(item.innerText);
    });
    return followersArray;
  });
  console.log(teste);
  await browser.close();
})();
publicProfile in the URL stands for a profile I chose; I replaced the real name with publicProfile for privacy reasons.
UPDATE: The problem is resolved. As Örvar said, the problem was that I wasn't logged in, so I searched for help here and found the question "Puppeteer Login to Instagram", which solved my problem.

When you use a tool like Puppeteer to get content from a site that requires you to log in, you also need to log in with Puppeteer, so that the site sets a user cookie in the browser that Puppeteer controls.
Log into Instagram using Puppeteer with user credentials
When Puppeteer has logged in with user credentials, you can run the code you have posted above.
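The scraping code only works once the login has produced a session cookie in the Puppeteer browser. One way to check that before navigating to the followers page is to inspect the page's cookies. This is a minimal sketch: `hasSessionCookie` is a hypothetical helper, and the cookie name `sessionid` is an assumption about Instagram that you should confirm in DevTools.

```javascript
// Minimal sketch: check whether the current page has a non-empty session cookie.
// ASSUMPTION: Instagram's session cookie is named "sessionid"; verify this in
// your browser's DevTools, since sites can rename their cookies.
async function hasSessionCookie(page, cookieName = 'sessionid') {
  // page.cookies() resolves to the cookies for the page's current URL.
  const cookies = await page.cookies();
  return cookies.some((cookie) => cookie.name === cookieName && cookie.value !== '');
}

// Usage, after the login step and before page.goto(...followers/):
//   if (!(await hasSessionCookie(page))) {
//     throw new Error('Login did not produce a session cookie');
//   }
```

Failing early like this turns a silent empty array into an explicit "not logged in" error.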

Related

Node using test for UI login

I used a blog post to set up Playwright login, and I need something similar for my app. When I use headless: false,
I can see it open the UI and fill in the user and password; however, it doesn't click the logon button. I adapted the code and tried the following. Am I missing something?
const { chromium } = require('playwright');
(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://app.com');
  await page.fill('input[type="text"]', 'user#test.com');
  await page.fill('input[type="password"]', 'Abcd1234!');
  // page.click('div[data-testid="LoginForm_Login_Button"]');
  page.click('div[id="logOnFormSubmit"]');
})();
You are currently using
page.click('div[id="logOnFormSubmit"]');
There is no div with that ID in your code example; the element with that ID is a button. You'd need to change that line to reflect this. The final code would look like this:
const { chromium } = require('playwright');
(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://app.com');
  await page.fill('input[type="text"]', 'user#test.com');
  await page.fill('input[type="password"]', 'Abcd1234!');
  // page.click('div[data-testid="LoginForm_Login_Button"]');
  // Target the button, and await the click so a selector timeout surfaces as an error
  await page.click('button[id="logOnFormSubmit"]');
})();
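When a click silently does nothing, a quick way to find the cause is to check which candidate selectors actually match an element. This is a debugging sketch, not part of Playwright: `firstMatchingSelector` is a hypothetical helper built only on `page.$()`, which resolves to an element handle or `null` in both Playwright and Puppeteer.

```javascript
// Debugging sketch: return the first selector in `candidates` that matches an
// element on the page, or null if none of them match.
async function firstMatchingSelector(page, candidates) {
  for (const selector of candidates) {
    const handle = await page.$(selector);
    if (handle) {
      // We only needed to know the element exists, so release the handle.
      await handle.dispose();
      return selector;
    }
  }
  return null;
}

// Usage:
//   const selector = await firstMatchingSelector(page, [
//     'div[id="logOnFormSubmit"]',
//     'button[id="logOnFormSubmit"]',
//   ]);
//   console.log('Matched:', selector);
```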

Authenticate using cookies for web scraping?

I built an app that uses Puppeteer to scrape data from LinkedIn. I log in using email and password but would like to pass in cookies to authenticate. Here is what I currently use:
const puppeteer = require("puppeteer");
// loginBtn, username and password are assumed to be defined elsewhere (not shown)
(async () => {
  try {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto("https://www.linkedin.com/login");
    await page.waitForSelector(loginBtn);
    await page.type("#username", username);
    await page.type("#password", password);
    await page.click(loginBtn, { delay: 30 });
    await browser.close();
  } catch (error) {
    console.log(`Our error = ${error}`);
  }
})();
I've seen websites like Phantombuster that use "li_at" cookies to authenticate. https://i.imgur.com/PI8fzao.png
How can I authenticate using cookies?
Disclaimer: I work at Phantombuster ;)
Since logging in sets a cookie in your browser on success, you can replace that step with the direct result:
await page.setCookie({ name: "li_at", value: "[cookie here]", domain: "www.linkedin.com" })
You should then be able to go to any page of the website as if you had authenticated through the login form.
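The same idea as a small function. This is a sketch assuming the cookie value is supplied through a hypothetical `LI_AT` environment variable, and using `https://www.linkedin.com/feed/` only as an example of a page that requires authentication; `loginWithCookie` is a hypothetical name. The cookie is set before the first navigation so that request already carries it.

```javascript
// Sketch: authenticate by injecting an existing "li_at" cookie instead of
// submitting the login form.
// ASSUMPTIONS: the value comes from a hypothetical LI_AT environment variable,
// and /feed/ is just an example of a page that needs authentication.
async function loginWithCookie(page, cookieValue = process.env.LI_AT) {
  if (!cookieValue) {
    throw new Error('No li_at cookie value provided');
  }
  // Same cookie as in the answer above, set before the first navigation.
  await page.setCookie({ name: 'li_at', value: cookieValue, domain: 'www.linkedin.com' });
  // Resolves to the navigation response.
  return page.goto('https://www.linkedin.com/feed/', { waitUntil: 'networkidle2' });
}
```

If the cookie has expired, the site will typically redirect to its login page, so checking `page.url()` after the navigation is a cheap sanity check.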

UnhandledPromiseRejectionWarning: Error: Evaluation failed theme is not defined

Before I start: I'm new to JavaScript and have only a very basic knowledge of async JS, but I need to solve this to get my first project working.
I am trying to build a scraping app using Node and Puppeteer. Basically, the user enters a URL ("link" in the code below), Puppeteer goes through the website's code, tries to find a specific piece, and returns the data. That part works so far.
The problem is when a user enters the URL of a site that doesn't have that piece of code. In that case, I get UnhandledPromiseRejectionWarning: Error: Evaluation failed theme is not defined
How can I catch an error like that and redirect the page instead of getting an Internal Server Error?
app.post("/results", function(req, res) {
  var link = req.body.link;
  (async link => {
    const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
    const page = await browser.newPage()
    await page.goto(link, { waitUntil: 'networkidle2'})
    const data = await page.evaluate('theme.name');
    await browser.close()
    return data
  })(link)
  .then(data => {
    res.render("index", {data: data, siteUrl: link});
  })
})
You can extend the async part to the whole route handler and handle the error however you like in the catch block. Closing the browser in a finally block also makes sure it is closed when the evaluation fails:
app.post('/results', async (req, res) => {
  let browser;
  try {
    const link = req.body.link;
    browser = await puppeteer.launch({ args: ['--no-sandbox'] });
    const page = await browser.newPage();
    await page.goto(link, { waitUntil: 'networkidle2' });
    const data = await page.evaluate('theme.name');
    res.render("index", { data: data, siteUrl: link });
  } catch (e) {
    // redirect or whatever
    res.redirect('/');
  } finally {
    // Runs on success and on failure, so the browser never leaks
    if (browser) await browser.close();
  }
});
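If a missing `theme` object is an expected case rather than a failure, another option is to check for it inside the page so `page.evaluate` returns `null` instead of throwing. This sketch assumes the site exposes the value as a global named `theme`, as the question's `'theme.name'` expression implies; `readThemeName` and `getThemeName` are hypothetical names.

```javascript
// Sketch: read `theme.name` without throwing when the page has no `theme`
// global. Puppeteer serializes this function and runs it in the page, so it
// must not reference any variables from the Node.js side.
function readThemeName() {
  // typeof on an undeclared identifier returns "undefined" instead of
  // throwing a ReferenceError.
  if (typeof theme === 'undefined' || theme === null) {
    return null;
  }
  return theme.name === undefined ? null : theme.name;
}

// Navigate to `link` and return the theme name, or null if there is none.
async function getThemeName(page, link) {
  await page.goto(link, { waitUntil: 'networkidle2' });
  return page.evaluate(readThemeName);
}
```

In the route, `if (data === null) return res.redirect('/');` then handles sites without the theme object, while the try/catch still covers navigation errors.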

Favicon missing in headless puppeteer

I'm writing a simple script that just checks whether all the resources load correctly (I check the status codes of the responses). I decided to use Puppeteer for this and wrote:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  page.on("response", (res) => {
    const url = res.url(), status = res.status();
    // my functionality goes here
    if (url.includes("favicon")) console.log(status, url); // not logging in headless
  });
  await page.goto("https://stackoverflow.com/", {waitUntil: 'networkidle2'});
  await browser.close();
})();
The issue is that if I run my application in headless mode, the favicon is missing from the responses. I assume it has something to do with Puppeteer not loading a favicon in headless mode. Is there any built-in functionality or workaround?
For lack of a better solution, I'm currently reading the favicon's URL from the page and visiting it manually:
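async function checkFavicon(page){
  // Read the (absolute) favicon URL from the <link rel="...icon..."> element
  const iconUrl = await page.$eval("link[rel*='icon']", ({href}) => href);
  // Visiting it directly makes the "response" listener fire for the favicon
  await page.goto(iconUrl);
  await page.goBack();
}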
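As an alternative to navigating the page away and back, the favicon's status can be checked from Node.js directly. This is a sketch under two assumptions: Node.js 18 or later (for the global `fetch`), and falling back to `/favicon.ico` at the site root when the page declares no `<link rel="icon">`, which is the conventional browser default. `resolveFaviconUrl` and `faviconStatus` are hypothetical helpers.

```javascript
// Sketch: resolve the favicon URL and fetch it from Node.js.
// ASSUMPTIONS: Node.js 18+ (global fetch); /favicon.ico is used when the page
// declares no icon link.
function resolveFaviconUrl(pageUrl, href) {
  // href may be absolute (as returned by the DOM .href property) or relative.
  return new URL(href || '/favicon.ico', pageUrl).href;
}

// fetchImpl is injectable so the function can be tested without network access.
async function faviconStatus(pageUrl, href, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(resolveFaviconUrl(pageUrl, href));
  return response.status;
}

// Usage with Puppeteer ($eval rejects when no icon link exists, hence the catch):
//   const href = await page.$eval("link[rel*='icon']", (el) => el.href).catch(() => null);
//   console.log(await faviconStatus(page.url(), href));
```

A plain `fetch` from Node does not send the browser's cookies or headers, so the status can differ from what the browser would get for resources that depend on them.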

I can't go from a page to another using page.goto() - Puppeteer

I'm trying to make an Instagram bot that logs in and then goes to some profile. My code worked yesterday for a while and then it just stopped working.
I've tried cloning my repository from GitHub, but it doesn't work either. Sometimes it works again, but if I try to create another function, the code just ignores the line that changes the page.
I've also tried creating a new page and calling goto on that new page, and it worked, but the account doesn't stay logged in.
The version of puppeteer that I'm using: 1.16.0
The version of node.js that I'm using: v10.15.3
const puppeteer = require('puppeteer');
const BASE_URL = "https://www.instagram.com/accounts/login/?hl=en&source=auth_switcher";
const instagram = {
  browser: null,
  page: null,
  profile_url: null,
  initialize: async (profile) => {
    instagram.browser = await puppeteer.launch({
      headless: false
    });
    instagram.profile_url = "https://www.instagram.com/" + profile;
    instagram.page = await instagram.browser.newPage();
    await instagram.page.goto(BASE_URL, {waitUntil: 'networkidle2'});
  },
  login: async (username, password) => {
    await instagram.page.waitFor(1000);
    await instagram.page.type('input[name="username"]', username);
    await instagram.page.type('input[name="password"]', password);
    await instagram.page.click('button[type="submit"]');
    await instagram.page.waitFor(1500);
    console.log(instagram.profile_url);
    await instagram.page.goto(instagram.profile_url, {timeout: 0, waitUntil: 'domcontentloaded'}); // the code just ignores this line
    await instagram.page.waitFor(1000);
  },
  getPhotosLinks: async () => {
    console.log("Do something here");
  }
};
module.exports = instagram;
It doesn't give any error message; it just doesn't work.
Replace
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
with
await Promise.all([
  instagram.page.click('button[type="submit"]'),
  instagram.page.waitForNavigation()
]);
and see if it works.
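The reason for `Promise.all` is a race: the click starts a navigation, and `waitForNavigation()` only sees navigations that begin after it is called. A small reusable version, as a sketch (`clickAndWaitForNavigation` is a hypothetical helper name):

```javascript
// Sketch: click an element that triggers a navigation and wait for that
// navigation to finish. waitForNavigation() is called first, in the same
// Promise.all as the click, so it is already listening when the navigation
// begins. Awaiting the click first and only then calling waitForNavigation()
// can miss a fast navigation and hang until the timeout.
async function clickAndWaitForNavigation(page, selector, navigationOptions = {}) {
  const [response] = await Promise.all([
    page.waitForNavigation(navigationOptions),
    page.click(selector),
  ]);
  return response; // the navigation's response (may be null)
}

// Usage inside login():
//   await clickAndWaitForNavigation(instagram.page, 'button[type="submit"]');
//   await instagram.page.goto(instagram.profile_url, { waitUntil: 'domcontentloaded' });
```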
