Will heroku crash if I run multiple instances of puppeteer? - javascript

So I created a simple Node.js app that automatically generates and emails PDFs using Puppeteer. Everything works perfectly on my local server, but once I deploy to Heroku the server times out if I try to create more than 2 PDFs. So if I only create 2 PDFs on my Heroku app it works without an issue, but as soon as I try to generate more than that, the app times out.
Here is the loop I use to generate each PDF:
for (let x = 1; x <= numTickets; x++) {
  console.log(x, " / ", numTickets);
  try {
    const browser = await puppeteer.launch({
      headless: true,
      args: ["--no-sandbox", "--disable-setuid-sandbox"],
    });
    const page = await browser.newPage();
    // compile html
    const content = await compile(template, {
      billing,
      id: id + "-" + `${x}`,
      campaign,
    });
    options.attachments.push({
      filename: `${billing.first_name}-housedoubleup-${id}-${x}.pdf`,
      path: `./${billing.first_name}-housedoubleup-${id}-${x}.pdf`,
      contentType: "application/pdf",
    });
    await page.setContent(content);
    await page.emulateMediaType("screen");
    await page.pdf({
      path: `${billing.first_name}-housedoubleup-${id}-${x}.pdf`,
      format: "a5",
      printBackground: true,
      landscape: true,
    });
    console.log("done");
  } catch (e) {
    console.log("Error -> ", e);
  }
  if (x === numTickets) {
    sendEmail(options);
  }
}
I'm wondering if the 512MB of RAM on my Heroku free tier is what's preventing the rest of the PDFs from being generated.
If anyone has any idea what could be going wrong, I'd really appreciate it :)

Every single iteration, your loop creates a new browser instance (a full Chromium process) with a new page, and none of them are ever closed. Try using
a single browser instance
a single page instance throughout the loop
and close the browser once the loop finishes.
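A minimal sketch of that refactor, assuming the same `compile`, `template`, `options`, and `sendEmail` helpers as the question (the `ticketPdfName` helper is added here purely for illustration):

```javascript
// Sketch: one Chromium process and one page shared by every ticket.
// compile/template/options/sendEmail are assumed to behave as in the question.
function ticketPdfName(firstName, id, x) {
  // one place to build the name used for both the file on disk and the attachment
  return `${firstName}-housedoubleup-${id}-${x}.pdf`;
}

async function generateTickets({ numTickets, billing, id, campaign, template, compile, options, sendEmail }) {
  const puppeteer = require("puppeteer"); // required lazily so the helper above has no dependency
  const browser = await puppeteer.launch({
    headless: true,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });
  try {
    const page = await browser.newPage(); // reused across iterations
    for (let x = 1; x <= numTickets; x++) {
      const filename = ticketPdfName(billing.first_name, id, x);
      const content = await compile(template, { billing, id: `${id}-${x}`, campaign });
      await page.setContent(content);
      await page.emulateMediaType("screen");
      await page.pdf({ path: filename, format: "a5", printBackground: true, landscape: true });
      options.attachments.push({ filename, path: `./${filename}`, contentType: "application/pdf" });
    }
  } finally {
    await browser.close(); // the original loop never closed its browsers
  }
  sendEmail(options);
}
```

Closing the browser matters as much as reusing it: every `launch()` the original loop made left a whole Chromium process running, which on a 512MB dyno would plausibly explain the timeouts.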

Related

Puppeteer Application Error: A client side exception has occurred

I am using Puppeteer with Next.js, trying to take a screenshot. It works fine on localhost but in production it returns an image with this error:
"Application error: a client-side exception has occurred (see the browser console for more information)"
Taking a screenshot
export const createImages = async (urlArray) => {
  try {
    const browser = await puppeteer.launch({
      headless: true,
      args: [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
      ],
      slowMo: 250, // slow down by 250ms
    })
    const page = await browser.newPage()
    for (let i = 0; i < urlArray.length; i++) {
      if (urlArray[i].address === "") continue
      await page.goto(urlArray[i].address, {
        waitUntil: "load",
        timeout: 30000,
      })
      const screenshotBase64 = await page.screenshot({
        encoding: "base64",
      })
      const screenshot = Buffer.from(
        screenshotBase64.replace(/^data:image\/\w+;base64,/, ""),
        "base64"
      )
      urlArray[i]["imgBase64"] = screenshot
    }
    await browser.close()
  } catch (err) {
    console.log(new Date(), "was not able to create images: ", err)
    return err
  }
  return 1
}
When I open the URL manually in production, the page loads fine! I have also tried encoding the image as binary instead, but it's still the same issue. Any ideas?
At first I was only listening for errors. But then I listened to all console messages using this command:
page.on('console', msg => console.log('PAGE LOG:', msg.text()))
I was able to see this error:
'THREE.WebGLRenderer: Error creating WebGL context.'
It pointed out that the GPU used on the server is blacklisted because it's too old.
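If you control the launch flags, one hedged workaround is to push Chromium onto software-rendered WebGL. `--use-gl=swiftshader` and `--ignore-gpu-blacklist` are real Chromium switches, but whether they resolve the THREE.js error depends on the Chromium build on the host, so treat this as a direction rather than a guarantee:

```javascript
// Sketch: launch args asking Chromium for software WebGL (SwiftShader) instead
// of the blacklisted server GPU. Effectiveness depends on the host's Chromium.
function launchArgsForSoftwareWebGL() {
  return [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-dev-shm-usage",
    "--use-gl=swiftshader",    // software GL implementation
    "--ignore-gpu-blacklist",  // don't disable GL just because the GPU is blacklisted
  ];
}

async function screenshotWithSoftwareGL(url) {
  const puppeteer = require("puppeteer"); // lazy require keeps the helper above dependency-free
  const browser = await puppeteer.launch({ headless: true, args: launchArgsForSoftwareWebGL() });
  try {
    const page = await browser.newPage();
    // forward page-side logs, as in the answer, so WebGL errors stay visible
    page.on("console", (msg) => console.log("PAGE LOG:", msg.text()));
    await page.goto(url, { waitUntil: "load", timeout: 30000 });
    return await page.screenshot({ encoding: "base64" });
  } finally {
    await browser.close();
  }
}
```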

Facebook authentication button fails to load in Chromium browser spawned by Apify PuppeteerCrawler

Background
I am trying to scrape some content from prycd.com. But first I need to log into my prycd.com account; only then can I access that page. I am using Apify with PuppeteerCrawler to automate the task.
Environment
node (v14.17.4)
apify-cli (v0.6.2)
Please note that my local machine is a Macbook but the same issue is seen when running in Apify cloud as well.
Issue
After I run the Apify actor using the command apify run -p on my local machine, Puppeteer tries to load the URL (https://www.prycd.com/comp-report) that I provided in the start URLs. This URL requires the user to be logged in, so a login overlay opens in the browser. The overlay has 3 authentication options: Google authentication, Facebook authentication, and native authentication.
Login Screen of prycd.com
However, when I run the Apify actor to load this page, the Facebook authentication button fails to load and throws the error shown in the console screenshot below. I only need the native login for my task, but since the Facebook authentication button fails to load, I am unable to operate the native login either. I tried googling to see if others have faced a similar issue but couldn't find anything.
Error seen in console
Approaches Tried
I have tried various approaches I am familiar with to fix this issue but none seem to work. These are some operations I tried:
Anti-scraping methods (Browser fingerprinting, checked presence of tracker, etc)
Directly calling the login API to obtain the session token.
Tried disabling loading of social authentication button (wasn't successful in doing this)
Played around with request headers and CORS.
Sample code snippet
/* eslint-disable indent */
const Apify = require('apify');

const { utils: { log, puppeteer } } = Apify;

Apify.main(async () => {
    const { startUrls, email, password } = await Apify.getInput();
    const requestList = await Apify.openRequestList('start-urls', startUrls);
    const handlePageTimeoutSecs = 180; // 3 minutes
    // const proxyConfiguration = await Apify.createProxyConfiguration();
    const width = 1280;
    const height = 800;
    const crawler = new Apify.PuppeteerCrawler({
        requestList,
        handlePageTimeoutSecs,
        // proxyConfiguration,
        useSessionPool: true,
        persistCookiesPerSession: true,
        launchContext: {
            useChrome: true,
            stealth: true,
            userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
            launchOptions: {
                args: [
                    `--window-size=${width},${height}`,
                ],
            },
        },
        preNavigationHooks: [
            async (crawlingContext) => {
                const { page, request } = crawlingContext;
                const blockedUrlPatterns = [
                    '.jpg', '.jpeg', '.png', '.svg', '.gif', '.webp', '.webm', '.ico', '.woff', '.eot',
                    '.tff', '.woff2',
                ];
                await puppeteer.blockRequests(page, {
                    urlPatterns: blockedUrlPatterns,
                    extraUrlPatterns: [],
                });
            },
        ],
        handlePageFunction: async (context) => {
            const { request: { url }, page } = context;
            log.info('Processing:', { url });
            // Wait for LOG IN form to show up
            await page.waitForSelector('#switchToEmailLink_SM_ROOT_COMP1', {
                visible: true,
                timeout: 10000,
            });
            await Promise.all([
                page.click('#switchToEmailLink_SM_ROOT_COMP1', { delay: 100 }),
            ]);
            log.info('Signing in to prycd.com');
            // Fill in email and password
            await page.evaluate((a, b) => {
                document.querySelector('#input_input_emailInput_SM_ROOT_COMP1').value = a;
                document.querySelector('#input_input_passwordInput_SM_ROOT_COMP1').value = b;
            }, email, password);
            // Click LOG IN button
            await Promise.all([
                page.click('div#okButton_SM_ROOT_COMP1 > button', { delay: 100 }),
                page.waitForNavigation(),
            ]);
        },
        handleFailedRequestFunction: async ({ request }) => {
            log.warning(`Request ${request.url} failed too many times`);
        },
        postNavigationHooks: [
            async (crawlingContext) => {
                const { request, response, session } = crawlingContext;
                // eslint-disable-next-line no-underscore-dangle
                if (session.retireOnBlockedStatusCodes(response._status)) {
                    log.error(`Page didn't load for ${request.url}`);
                }
            },
        ],
    });
    log.info('Starting the crawl.');
    await crawler.run();
    log.info('Crawl finished.');
});
Any help is highly appreciated.

How to use Puppeteer with Stripe Elements

Been slamming my head against this for a while now and no idea why this is happening.
I'm using react-stripe-elements and trying to write a test using Puppeteer. I simply cannot get Puppeteer to fill in the Card Elements.
I've tried a few approaches
Approach 1
I try to select the input by its name and then any input on the page by its class
await page.$eval('input[name="cardnumber"]')
await page.$eval(".InputElement");
I'm told that there's
Approach 2
I then tried to access the actual frame, my reasoning being that it's technically a page with a different origin. Again, nothing happens. Strangely, when I try to print out the contents of the frame, nothing happens either.
const cardExpiryFrameHandle = await page.$('iframe[name="__privateStripeFrame5"]')
const cardExpiryFrame = await cardExpiryFrameHandle.contentFrame()
const test = await cardExpiryFrame.content()
console.log(test);
When I console.log cardExpiryFrame, it exists. This should fit the API defined here https://pptr.dev/#?product=Puppeteer&version=v3.3.0&show=api-class-frame, but it absolutely refuses to work.
I also added arguments to disable some security features, because I tracked down an issue saying this might be the problem. I do so like this:
module.exports = {
  server: {
    command: `npm run-script start:test:server`,
    port: 3000,
    launchTimeout: 100000,
    debug: true,
    args: ['--disable-web-security', '--disable-features=IsolateOrigins,site-per-process'],
  },
  launch: {
    headless: false,
  },
}
Approach 3
I then tried to mimic what a human would do and clicked the div and then tried to type out the test card number.
await page.click(getClass(paymentFlowCardInput))
await page.keyboard.type('4242424242424242', { delay: '50' })
Again no luck. No errors.
Now I'm out of ideas - what do I do?
A good solution for this is using tab to switch to the next input. In my test I have an input for the cardholder name and I then tab to the CardElement component.
describe('Test Input', () => {
  test('Filling out card payment form', async () => {
    const browser = await puppeteer.launch({
      headless: false,
      slowMo: 100,
    });
    const page = await browser.newPage();
    await page.emulate({
      viewport: {
        width: 1280,
        height: 1080,
      },
      userAgent: '',
    });
    await page.goto('http://localhost:3000/asd/payment-requests/50-eur-by-2021-01-15t1200');
    await page.waitForSelector('#input-name');
    await page.click('input[name=card_name]');
    await page.type('input[name=card_name]', 'Joe Bloggs');
    await page.keyboard.press('Tab');
    await page.keyboard.type(card.number, { delay: 500 });
    await page.keyboard.type(card.month, { delay: 50 });
    await page.keyboard.type(card.year, { delay: 50 });
    await page.keyboard.type(card.cvc, { delay: 50 });
    await page.keyboard.type(card.zip, { delay: 50 });
    await browser.close();
  }, 90000);
});
You're likely running into this issue because your test isn't waiting for the CardElement to mount (and finish its animations), or the animations are slower than your delay. Here's an example of a Puppeteer test which takes those transitions into account, for your reference: https://github.com/stripe/react-stripe-elements/issues/137#issuecomment-352092164
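For completeness, Approach 2 (typing into the Stripe iframe directly) can also work once you wait for the frame to attach. A hedged sketch, where the `__privateStripeFrame` name prefix and the `input[name="cardnumber"]` selector come from the question, and Stripe's internal markup can change, so verify them in devtools first:

```javascript
// Sketch: locate the Stripe card iframe by its name prefix, then type inside it.
function isStripeFrame(name) {
  return typeof name === "string" && name.startsWith("__privateStripeFrame");
}

async function fillCardNumber(page, cardNumber) {
  // wait for the iframe element to mount before touching page.frames()
  await page.waitForSelector('iframe[name^="__privateStripeFrame"]');
  const frame = page.frames().find((f) => isStripeFrame(f.name()));
  await frame.waitForSelector('input[name="cardnumber"]');
  await frame.type('input[name="cardnumber"]', cardNumber, { delay: 50 });
}
```

Typing via `frame.type` avoids the Tab-order dependence of the accepted answer, at the cost of depending on Stripe's frame naming.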

Rendering a page after generating PDF with puppeteer?

I'm currently trying to generate a PDF with Puppeteer, then render a page with a "thank you" message to the user. Once the user hits that page, the Puppeteer PDF should begin downloading on the user's machine. But I'm having some trouble.
I can successfully redirect the user to the page I want them on after collecting some basic info from a form:
app.post("/generatepdf", function (req, res) {
  UserPdfRequest.create({ email: req.body.email, companyName: req.body.companyName }, function (err, createdRequest) {
    if (err) {
      console.log(err);
    } else {
      console.log(createdRequest);
      res.redirect("/" + createdRequest._id + "/pdf-download");
    }
  })
});
Then, I send them to my route which handles finding the user in question, generating the PDF, then rendering the Thank You page:
app.get("/:companyId/pdf-download", function (req, res) {
  UserPdfRequest.findById(req.params.companyId, function (err, foundRequest) {
    if (err) {
      console.log(err);
    } else {
      console.log(foundRequest);
      (async () => {
        const browser = await puppeteer.launch()
        const page = await browser.newPage()
        const url = 'http://localhost:3000/' + req.params.companyId + '/pdf-download';
        await page.goto(url, { waitUntil: 'networkidle0' });
        const buffer = await page.pdf({ format: "A4", printBackground: true });
        res.type('application/pdf')
        res.send(buffer)
        browser.close()
      })()
      res.render("pdfDownload", { email: foundRequest.email, companyName: foundRequest.companyName });
    }
  })
});
But when I land on the Thank You page, my PDF does not begin downloading. Furthermore, console.log(foundRequest) logs over and over again very rapidly in my terminal, and I also receive the following errors:
https://imgur.com/ZsApRHn
I know I'm probably in over my head here given I don't have much experience with async. I'm sure this is a simple fix I'm missing; however, any help (and explanation) would be extremely valuable and appreciated. Thank you for your time!
You are calling send and render on the same response object. You can either send the data or render HTML, but you cannot do both for the same request. Also note that your page.goto points back at this same /pdf-download route, so each request spawns another Puppeteer visit to itself, which is why foundRequest logs repeatedly.
Usually this is worked around by triggering the download in a new tab.
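A hedged sketch of that split, with two routes instead of one. The `/pdf-file` route name and the header helper are illustrative, and the rendered template is assumed to request the second route itself (e.g. via a hidden iframe or `window.open`):

```javascript
// Sketch: render the thank-you page on one route and stream the PDF from another,
// so no handler calls both res.render and res.send. Names beyond those in the
// question are assumptions; `app` is an Express app and `puppeteer` the module.
function pdfDownloadHeaders(name) {
  return {
    "Content-Type": "application/pdf",
    "Content-Disposition": `attachment; filename="${name}.pdf"`,
  };
}

function registerPdfRoutes(app, puppeteer) {
  app.get("/:companyId/pdf-download", (req, res) => {
    // render only; the template triggers a request to /:companyId/pdf-file
    res.render("pdfDownload", { companyId: req.params.companyId });
  });

  app.get("/:companyId/pdf-file", async (req, res) => {
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      // point this at a printable view, NOT back at a route that launches
      // Puppeteer again, or each request will recursively spawn another
      await page.goto(`http://localhost:3000/${req.params.companyId}/printable`, {
        waitUntil: "networkidle0",
      });
      const buffer = await page.pdf({ format: "A4", printBackground: true });
      res.set(pdfDownloadHeaders(req.params.companyId));
      res.send(buffer);
    } finally {
      await browser.close();
    }
  });
}
```

The `/printable` view is a placeholder for whatever page actually holds the PDF content.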

Puppeteer is not working the same on local vs prod

let templateHtml = fs.readFileSync(
  path.join(process.cwd(), '../signedDocs/template.html'),
  'utf8'
);
// making a compilable template out of the HTML file
let template = handlebars.compile(templateHtml);
console.log('creafte pdf 1');
// passing the data to the HTML
let html = template(dataPDF);
// constructing the path where the generated PDF file will be stored
let pdfPath = path.join(process.cwd(), '../signedDocs/' + userID + '.pdf');
console.log('creafte pdf 2');
// PDF configuration
let options = {
  width: '1230px',
  headerTemplate: '<p></p>',
  footerTemplate: '<p></p>',
  displayHeaderFooter: false,
  printBackground: true,
  pageRanges: '1-6',
  format: 'A4',
  preferCSSPageSize: true,
  margin: {
    top: '10px',
    right: '20px',
    bottom: '60px',
    left: '20px'
  },
  path: pdfPath
};
console.log('creafte pdf 3.1');
// starting the browser with Puppeteer
const browser = await puppeteer.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
  headless: true
});
console.log('creafte pdf 3.2');
// starting a new blank page
let page = await browser.newPage();
try {
  await page.goto(`data:text/html;charset=UTF-8,${html}`, {
    waitUntil: 'networkidle0' // wait until the page with modules has loaded
  });
} catch (err) {
  console.log(err);
}
console.log('creafte pdf 4');
try {
  await page.pdf(options); // to generate the PDF
} catch (err) {
  console.log('errrr on page.pdf');
  console.log(err);
}
console.log('done');
await followUpEmail(user);
console.log('email sent');
await browser.close(); // for closing the browser
The above code works perfectly fine on my localhost (running Node.js 10).
However, I have now deployed my API to an EC2 instance and it only runs until:
const browser = await puppeteer.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
  headless: false
});
I get the 3.1 console.log but nothing afterwards.
I'm starting to get the feeling it's something to do with my prod environment. However, after trying all kinds of different approaches today, I'm a bit lost.
Now I'm really hoping someone here has encountered this issue and has an answer or a direction!
So it turned out that npm does install a version of Chromium; however, it's missing a lot of dependencies.
I checked which dependencies were missing by using:
ldd chrome | grep not
I installed a few manually, but some aren't available in the package managers.
I then created a YUM config to install Chrome; installing it brought in the missing dependencies.
