Before I get to the question: I am new to JavaScript and have only a very basic knowledge of async JS, but I need to solve this so that my first project is functional.
I am trying to build a scraping app using Node and Puppeteer. Basically, the user enters a URL ("link" in the code below), Puppeteer goes through the website's code, tries to find a specific piece, and returns the data. That part is working so far.
The problem occurs when a user enters the URL of a site that doesn't have that piece of code. In that case, I get UnhandledPromiseRejectionWarning: Error: Evaluation failed theme is not defined
How can I catch an error like that and redirect the page instead of getting an Internal Server Error?
app.post("/results", function(req, res) {
  var link = req.body.link;
  (async link => {
    const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
    const page = await browser.newPage()
    await page.goto(link, { waitUntil: 'networkidle2'})
    const data = await page.evaluate('theme.name');
    await browser.close()
    return data
  })(link)
    .then(data => {
      res.render("index", {data: data, siteUrl: link});
    })
})
You can extend the async part to the whole route handler and do whatever you want on catch:
app.post('/results', async (req, res) => {
  try {
    const link = req.body.link
    const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
    const page = await browser.newPage()
    await page.goto(link, { waitUntil: 'networkidle2'})
    const data = await page.evaluate('theme.name')
    await browser.close()
    res.render("index", {data: data, siteUrl: link})
  } catch(e) {
    // redirect or whatever
    res.redirect('/')
  }
});
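One caveat with the handler above: if page.goto or page.evaluate throws, browser.close() is never reached and the launched browser keeps running. A minimal sketch of one way to guard against that is to move the cleanup into finally. The helper name scrapeThemeName is hypothetical (it is not part of Puppeteer), and it takes the puppeteer module as a parameter only so that it can be exercised with a stub:

```javascript
// Hypothetical helper (not part of Puppeteer): runs the scrape and always
// closes the browser, whether or not the evaluation succeeds.
async function scrapeThemeName(puppeteer, link) {
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.goto(link, { waitUntil: 'networkidle2' });
    // Rejects if `theme` is not defined on the target page.
    return await page.evaluate('theme.name');
  } finally {
    // Runs on success and on failure alike, so no browser is left behind.
    await browser.close();
  }
}
```

Inside the route handler, the call would sit in the existing try block, for example const data = await scrapeThemeName(puppeteer, req.body.link), with the redirect left in catch.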
Related
I'm trying to pick some data from the followers page, but it always returns an empty array.
Here's my code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({headless:false});
  const page = await browser.newPage();
  await page.goto('https://www.instagram.com/accounts/login/', {
    waitUntil: 'networkidle0',
  });
  // Wait for log in form
  await Promise.all([
    page.waitForSelector('[name="username"]'),
    page.waitForSelector('[name="password"]'),
    page.waitForSelector('[type="submit"]'),
  ]);
  // Enter username and password
  await page.type('[name="username"]', 'yourAccount');
  await page.type('[name="password"]', 'yourPassword');
  // Submit log in credentials and wait for navigation
  await Promise.all([
    page.click('[type="submit"]'),
    page.waitForNavigation({
      waitUntil: 'networkidle0',
    }),
  ]);
  await page.goto('https://www.instagram.com/publicProfile/followers/', {waitUntil: 'networkidle0'});
  const teste = await page.evaluate(() => {
    const followers = document.querySelectorAll("._aaco span");
    let followersArray = []
    followers.forEach((item) =>{
      followersArray.push(item.innerText)
    })
    return followersArray
  })
  console.log(teste)
  await browser.close();
})();
publicProfile in the URL stands for a profile that I chose; for privacy reasons I replaced the real name with publicProfile.
UPDATE: The problem is resolved. As Örvar said, the problem was that I wasn't logged in, so I searched for help here and found this question (Puppeteer Login to Instagram), which solved my problem.
When you use a tool like Puppeteer to get content from a site where you need to login, you also need to login using Puppeteer, so the site that you are logging into will generate a user cookie.
Log into Instagram using Puppeteer with user credentials
When Puppeteer has logged in with user credentials, you can run the code you have posted above.
I am trying to send a PUT request to the final URL, but there is a redirect before the final URL. Sending a fetch request from inside the final URL's page would also be fine. When I open the DevTools console and write the fetch there, it works, but of course I need to do it from code.
When I set await page.setRequestInterception(true); and page.once('request', (req) => {...}), it sends the PUT request to the first page, which I don't want.
Let's say the first URL is https://example.com/first --> this redirects to the final URL
The final URL is https://example.com/final --> this is where I want to send the PUT request and retrieve the status code. I have tried setting a timer, and getting the current URL with page.url() and using some if/else statements, but neither worked.
Here is my current code:
app.get('/cookie', async (req, res) => {
  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: `C:/Program Files (x86)/Google/Chrome/Application/chrome.exe`,
    defaultViewport: null,
    args: ['--start-maximized'],
    slowMo: 150,
  });
  const page = await browser.newPage();
  await page.setUserAgent(randomUserAgent.getRandom());
  page.setDefaultNavigationTimeout(0);
  page.setJavaScriptEnabled(true);
  await page.goto(
    'finalURL',
    { waitUntil: 'load', timeout: 0 }
  );
  await delay(5000);
  await page.setRequestInterception(true);
  page.once('request', (request) => {
    request.continue({
      method: 'PUT',
    });
    page.setRequestInterception(false);
  });
  let statusCode;
  await page.waitForResponse((response) => {
    statusCode = response.status();
    return true;
  });
  res.json(statusCode);
});
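Not from the original post, but one possible direction: page.once('request', ...) rewrites whichever request happens to fire first, regardless of its URL. A rough sketch of a handler that checks the URL before overriding the method could look like the following. The helper name makePutInterceptor is hypothetical, the exact-match URL comparison is an assumption (a trailing slash or query string would need looser matching), and I have not verified this against the site in question:

```javascript
// Hypothetical helper: builds a 'request' handler that rewrites only the
// request for `finalUrl` to PUT and lets every other request pass unchanged.
function makePutInterceptor(finalUrl) {
  return (request) => {
    if (request.url() === finalUrl) {
      return request.continue({ method: 'PUT' });
    }
    return request.continue();
  };
}

// Intended wiring (assumes `page` is a Puppeteer Page and that interception
// is enabled before navigating, not after):
//   await page.setRequestInterception(true);
//   page.on('request', makePutInterceptor('https://example.com/final'));
//   const response = await page.goto('https://example.com/first');
//   const statusCode = response.status();
```

Because every intercepted request has to be continued, the handler is registered with page.on rather than page.once; otherwise later requests would stay paused while interception is enabled.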
I'm trying to build an automation for my LinkedIn account so that whenever I get a message, I can do something custom with it.
I'm using Puppeteer and a MutationObserver inside a page.evaluate call after loading my LinkedIn profile with my li_at session cookie.
But it fails to fire an event even when I can see the node's textContent changing.
const puppeteer = require('puppeteer');
(async function main() {
  try {
    const browser = await puppeteer.launch({ devtools: true, headless: false });
    const page = await browser.newPage();
    await page.setBypassCSP(true);
    await page.setDefaultNavigationTimeout(0);
    await page.setCookie({
      'name': 'li_at',
      'value': 'putYourSessionCookieHere',
      'domain': '.www.linkedin.com'
    })
    await page.goto('https://www.linkedin.com', {waitUntil: 'networkidle2'});
    await page.exposeFunction('puppeteerMutationListener', puppeteerMutationListener);
    await page.evaluate(() => {
      const target = document.querySelector('#messaging-nav-item .nav-item__badge-count');
      const observer = new MutationObserver((mutationsList) => {
        for (const mutation of mutationsList) {
          window.puppeteerMutationListener(
            mutation.removedNodes[0].textContent,
            mutation.addedNodes[0].textContent,
          );
        }
      });
      observer.observe(
        target,
        { childList: true},
      );
    });
  } catch (err) {
    console.error(err);
  }
})();
function puppeteerMutationListener(oldValue, newValue) {
  console.log(`${oldValue} -> ${newValue}`);
}
To reproduce the issue, you would need:
A LinkedIn account
A helpful coworker with a LinkedIn account who is up for messaging you, OR you can change the textContent of the node yourself
Any ideas on why this may be happening?
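One possible cause, which I can't confirm against LinkedIn's actual markup: with only { childList: true }, the observer reports added or removed child nodes, but not an in-place change to an existing text node's data. Separately, mutation.removedNodes[0].textContent throws inside the callback whenever a record has no removed node. A sketch that covers both cases is below. The names observerOptions and describeMutation are made up for illustration, and the helper would have to be defined inside the page.evaluate callback to be available in the page:

```javascript
// Observer options that also report in-place text edits, not only added or
// removed child nodes.
const observerOptions = {
  childList: true,
  characterData: true,
  characterDataOldValue: true,
  subtree: true,
};

// Hypothetical helper: turns one MutationRecord into an
// [oldValue, newValue] pair without assuming both node lists are non-empty.
function describeMutation(mutation) {
  if (mutation.type === 'characterData') {
    // The target is the text node whose data changed.
    return [mutation.oldValue, mutation.target.textContent];
  }
  const removed = mutation.removedNodes[0];
  const added = mutation.addedNodes[0];
  return [
    removed ? removed.textContent : null,
    added ? added.textContent : null,
  ];
}

// Intended use inside the page context:
//   const observer = new MutationObserver((records) => {
//     for (const record of records) {
//       window.puppeteerMutationListener(...describeMutation(record));
//     }
//   });
//   observer.observe(target, observerOptions);
```

It is also worth checking that target is not null at the time page.evaluate runs, since observer.observe throws if the element has not been rendered yet.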
I am using Jest + Puppeteer to run functional tests on a hosted web app.
Here is the test code:
const puppeteer = require('puppeteer');
const url = 'https://somewebsite.com';
const login = (async(page, login, password) =>{
  await page.goto(url)
  await page.waitForSelector('#mat-input-0')
  await page.type('#mat-input-0', login)
  await page.type('#mat-input-1', password)
  await page.click('button')
})
beforeEach(async () => {
  browser = await puppeteer.launch({ headless: false });
  page = await browser.newPage();
});
afterEach(async () => {
  await browser.close();
});
describe('login to website test', () => {
  test('non existent user try', async() => {
    jest.setTimeout(300000);
    await login(page, 'user#email.com', 'upsiforgoTTThepassword')
    await page.waitFor(1000)
    var element = await page.$eval('.mat-simple-snackbar', (element) => {
      return element.textContent.trim()
    })
    expect(element).toBe('User not Found')
  })
})
The problem is that if I use Puppeteer's await browser.close(); to exit the browser after the test ends, the test automatically fails and I get this error in the terminal:
● Test suite failed to run
Protocol error: Connection closed. Most likely the page has been closed.
If I don't close the browser after the test ends, it passes as it should.
I found out that if I comment out the preset in my jest.config.js, the error stops occurring:
// preset: "jest-puppeteer",
I'm trying to make an Instagram bot that logs in and then goes to some profile. My code worked for a while yesterday and then it just stopped working.
I've tried cloning my repository from GitHub, but that doesn't work either. Sometimes it works again, but if I try to create another function, the code just ignores the line that changes the page.
I've also tried creating a new page and calling goto on that new page, and that worked, but the account doesn't stay logged in.
The version of Puppeteer that I'm using: 1.16.0
The version of Node.js that I'm using: v10.15.3
const puppeteer = require('puppeteer');
const BASE_URL = "https://www.instagram.com/accounts/login/?hl=en&source=auth_switcher";
const instagram = {
  browser: null,
  page: null,
  profile_url: null,
  initialize: async (profile) => {
    instagram.browser = await puppeteer.launch({
      headless: false
    })
    instagram.profile_url = await "https://www.instagram.com/" + profile;
    instagram.page = await instagram.browser.newPage();
    await instagram.page.goto(BASE_URL, {waitUntil: 'networkidle2'});
  },
  login: async(username, password) =>{
    await instagram.page.waitFor(1000);
    await instagram.page.type('input[name="username"]', username);
    await instagram.page.type('input[name="password"]', password);
    await instagram.page.click('button[type="submit"]');
    await instagram.page.waitFor(1500);
    await console.log(instagram.profile_url);
    await instagram.page.goto(instagram.profile_url, {timeout: 0, waitUntil: 'domcontentloaded'}); // the code just ignores this line
    await instagram.page.waitFor(1000);
  },
  getPhotosLinks: async() => {
    console.log("Do something here");
  }
}
module.exports = instagram;
It doesn't give any error message; it just doesn't work.
Replace
await instagram.page.click('button[type="submit"]');
await instagram.page.waitFor(1500);
with
await Promise.all([
  instagram.page.click('button[type="submit"]'),
  instagram.page.waitForNavigation()
]);
and see if it works.
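The idea behind this pattern is that a fixed waitFor(1500) only guesses how long the login redirect takes, so the following goto can start while the post-login navigation is still in flight. Waiting on the navigation itself removes the guess. As a rough sketch, the pattern can be wrapped in a small helper. The name clickAndWaitForNavigation is hypothetical, and page is assumed to be a Puppeteer Page:

```javascript
// Hypothetical helper: starts listening for the navigation before the click
// is issued, then resolves once both the click and the navigation are done.
async function clickAndWaitForNavigation(page, selector) {
  const [response] = await Promise.all([
    page.waitForNavigation(), // registered first so the navigation is not missed
    page.click(selector),
  ]);
  // Whatever waitForNavigation resolved with (the main-frame response).
  return response;
}
```

In the login function above, it would be called as await clickAndWaitForNavigation(instagram.page, 'button[type="submit"]') in place of the click and the fixed wait.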