I am using Puppeteer with Next.js to take a screenshot. It works fine on localhost, but in production it returns an image of a page showing this error:
Application error: a client-side exception has occurred (see the browser console for more information).
Here is the code that takes the screenshots:
export const createImages = async (urlArray) => {
  try {
    const browser = await puppeteer.launch({
      headless: true,
      args: [
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
      ],
      slowMo: 250, // slow down Puppeteer operations by 250ms
    })
    const page = await browser.newPage()
    for (let i = 0; i < urlArray.length; i++) {
      if (urlArray[i].address === "") continue
      await page.goto(urlArray[i].address, {
        waitUntil: "load",
        timeout: 30000,
      })
      // page.screenshot({ encoding: "base64" }) resolves to a plain base64
      // string (no data-URL prefix), so it converts to a Buffer directly
      const screenshotBase64 = await page.screenshot({
        encoding: "base64",
      })
      urlArray[i]["imgBase64"] = Buffer.from(screenshotBase64, "base64")
    }
    await browser.close()
  } catch (err) {
    console.log(new Date(), "was not able to create images: ", err)
    return err
  }
  return 1
}
When I open the URL manually in production, the page loads fine. I have also tried encoding the image as binary instead, but I get the same result. Any ideas?
At first I was only listening for page errors. But after I started logging all console messages from the page with this line:
page.on('console', msg => console.log('PAGE LOG:', msg.text()))
I was able to see this error:
'THREE.WebGLRenderer: Error creating WebGL context.'
That pointed to the cause: the GPU used on the server is blacklisted because it's old, so Chromium can't create a WebGL context.
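A workaround that is often suggested in this situation is forcing Chromium to render WebGL in software instead of on the blacklisted GPU. A minimal sketch of the launch call, assuming the bundled Chromium supports the standard SwiftShader flags:
const browser = await puppeteer.launch({
  headless: true,
  args: [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-dev-shm-usage",
    // render WebGL in software so a blacklisted/missing server GPU no longer matters
    "--use-gl=swiftshader",
    "--ignore-gpu-blacklist",
  ],
})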
I am trying to build a scraper to monitor web projects automatically.
So far so good: the script runs, but now I want to add a feature that automatically detects which libraries the projects use. The most powerful tool for this job is Wappalyzer. It has a Node package (https://www.npmjs.com/package/wappalyzer), and the docs say you can use it together with Puppeteer.
I managed to run Puppeteer and log the source code of the sites to the console, but I can't figure out the right way to pass the source code to Wappalyzer's analyze function.
Does anyone have a hint for me?
I tried this code, but I am getting a TypeError: url.split is not a function:
function getLibarys(url) {
  (async () => {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto(url);
    // get source code with puppeteer
    const html = await page.content();
    const wappalyzer = new Wappalyzer();
    (async function () {
      try {
        await wappalyzer.init()
        // Optionally set additional request headers
        const headers = {}
        const site = await wappalyzer.open(page, headers)
        // Optionally capture and output errors
        site.on('error', console.error)
        const results = await site.analyze()
        console.log(JSON.stringify(results, null, 2))
      } catch (error) {
        console.error(error)
      }
      await wappalyzer.destroy()
    })()
    await browser.close()
  })()
}
Fixed it by using the sample code from Wappalyzer.
function getLibarys(url) {
  const Wappalyzer = require('wappalyzer');
  const options = {
    debug: false,
    delay: 500,
    headers: {},
    maxDepth: 3,
    maxUrls: 10,
    maxWait: 5000,
    recursive: true,
    probe: true,
    proxy: false,
    userAgent: 'Wappalyzer',
    htmlMaxCols: 2000,
    htmlMaxRows: 2000,
    noScripts: false,
    noRedirect: false,
  };
  const wappalyzer = new Wappalyzer(options)
  ;(async function () {
    try {
      await wappalyzer.init()
      // Optionally set additional request headers
      const headers = {}
      // open() expects the URL string, not a Puppeteer page object
      const site = await wappalyzer.open(url, headers)
      // Optionally capture and output errors
      site.on('error', console.error)
      const results = await site.analyze()
      console.log(JSON.stringify(results, null, 2))
    } catch (error) {
      console.error(error)
    }
    await wappalyzer.destroy()
  })()
}
I do not know if you still need an answer to this, but this is what a Wappalyzer collaborator told me:
Normally you'd run Wappalyzer like this:
const Wappalyzer = require('wappalyzer')
const wappalyzer = new Wappalyzer()
await wappalyzer.init() // Launches a Puppeteer instance
const site = await wappalyzer.open(url)
If you want to use your own browser instance, you can skip wappalyzer.init() and assign the instance to wappalyzer.browser:
const Wappalyzer = require('wappalyzer')
const wappalyzer = new Wappalyzer()
wappalyzer.browser = await puppeteer.launch() // Use your own Puppeteer launch logic
const site = await wappalyzer.open(url)
You can find the discussion here.
Hope this helps.
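Putting the collaborator's two snippets together, a minimal end-to-end sketch (the function name getLibraries and the example URL are mine; error handling is kept to the essentials, and whether wappalyzer.destroy() closes a browser you assigned yourself is an assumption worth verifying):
const puppeteer = require('puppeteer')
const Wappalyzer = require('wappalyzer')

async function getLibraries(url) {
  const browser = await puppeteer.launch({ headless: true }) // your own launch logic
  const wappalyzer = new Wappalyzer()
  wappalyzer.browser = browser // skip wappalyzer.init()
  try {
    const site = await wappalyzer.open(url)
    site.on('error', console.error)
    const results = await site.analyze()
    console.log(JSON.stringify(results, null, 2))
  } finally {
    await wappalyzer.destroy()
    await browser.close().catch(() => {}) // in case destroy() left our browser open
  }
}

getLibraries('https://example.com')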
I have used Puppeteer to capture a screenshot of my React page, but it takes a blank screenshot instead of capturing the charts that are on the page. Here is my code.
const puppeteer = require('puppeteer');

const url = process.argv[2];
if (!url) {
  throw "Please provide URL as a first argument";
}

// an async function already returns a promise, so no new Promise() wrapper is needed
async function run() {
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
    headless: true,
    ignoreHTTPSErrors: true
  });
  const page = await browser.newPage();
  await page.goto(url, {
    timeout: 30000,
    waitUntil: "networkidle0"
  });
  const imgDataBase64 = await page.screenshot({ quality: 100, fullPage: true, encoding: "base64", type: "jpeg" });
  await browser.close();
  return imgDataBase64;
}

run().then(console.log).catch(console.error);
The reason could be that the document finishes loading before the chart renders, and Puppeteer takes the screenshot as soon as the document loads. Can anyone please help me with this? I need a way to be sure the chart has actually rendered after the document loads, so that the screenshot is captured properly. Thanks in advance.
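One common approach (a sketch, not from the original post) is to wait for an element the chart library actually renders before taking the screenshot; the .chart-container selector below is a placeholder that has to match your real markup:
await page.goto(url, { timeout: 30000, waitUntil: "networkidle0" });
// wait until the chart has drawn something, not just until the document loaded
await page.waitForSelector(".chart-container canvas", { timeout: 15000 });
const imgDataBase64 = await page.screenshot({ fullPage: true, encoding: "base64" });
If no stable selector exists, page.waitForFunction() with a predicate on the chart's own ready state is an alternative.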
I'm using Puppeteer to crawl some URLs, with Puppeteer's default Chromium browser. Everything works well, but when I run the crawling script and do other things in the background, so the focus is no longer on Puppeteer's Chromium window, it stops working properly: it waits for elements far too long and aborts operations. In other words, Puppeteer pauses (or freezes).
P.S. I'm also using the puppeteer-extra and puppeteer-extra-plugin-stealth NPM packages for advanced options.
Here is how I create the browser and the page:
async initiateCrawl(isDisableAsserts) {
  // Set the browser.
  this.isPlannedClose = false;
  const browser = await puppeteerExtra.launch({
    headless: false,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--start-maximized',
      '--disable-background-timer-throttling',
      '--disable-backgrounding-occluded-windows',
      '--disable-renderer-backgrounding'
    ]
  });
  const pid = browser.process().pid;
  browser.on('disconnected', () => {
    systemUtils.killProcess(pid);
    if (!this.isPlannedClose) {
      systemUtils.exit(Status.BROWSER_CLOSE, Color.RED, 0);
    }
  });
  process.on('SIGINT', () => {
    this.close(browser, true);
  });
  // Set the page and close the first empty tab.
  const page = await browser.newPage();
  const pages = await browser.pages();
  if (pages.length > 1) {
    await pages[0].close();
  }
  await page.setRequestInterception(true);
  await page.setJavaScriptEnabled(false);
  await page.setDefaultNavigationTimeout(this.timeout);
  page.on('request', (request) => {
    if (isDisableAsserts && ['image', 'stylesheet', 'font', 'script'].indexOf(request.resourceType()) !== -1) {
      request.abort();
    } else {
      request.continue();
    }
  });
  return {
    browser: browser,
    page: page
  };
}
I already looked at:
https://github.com/puppeteer/puppeteer/issues/3339
https://github.com/GoogleChrome/chrome-launcher/issues/169
https://www.gitmemory.com/issue/GoogleChrome/puppeteer/3339/530620329
Solutions that did not work:
const session = await page.target().createCDPSession();
await session.send('Page.enable');
await session.send('Page.setWebLifecycleState', {state: 'active'});
const chromeArgs = [
  '--disable-background-timer-throttling',
  '--disable-backgrounding-occluded-windows',
  '--disable-renderer-backgrounding'
];
const ops = {
  args: [
    '--kiosk',
    '--disable-background-timer-throttling',
    '--disable-backgrounding-occluded-windows',
    '--disable-renderer-backgrounding',
    '--disable-canvas-aa',
    '--disable-2d-canvas-clip-aa',
    '--disable-gl-drawing-for-tests',
    '--disable-dev-shm-usage',
    '--no-zygote',
    '--use-gl=desktop',
    '--enable-webgl',
    '--hide-scrollbars',
    '--mute-audio',
    '--start-maximized',
    '--no-first-run',
    '--disable-infobars',
    '--disable-breakpad',
    '--user-data-dir=' + tempFolder,
    '--no-sandbox',
    '--disable-setuid-sandbox'
  ],
  headless: false,
  timeout: 0
};
puppeteer = require('puppeteer');
browser = await puppeteer.launch(ops);
page = await browser.newPage();
Has anyone faced this issue before and have any idea how to solve this? Thanks.
My issue was solved when I updated to the latest puppeteer version (9.0.0).
Before I start the question: I am new to JavaScript and have only a very basic knowledge of async JS, but I need to solve this to get my first project working.
I am trying to build a scraping app using Node and Puppeteer. Basically, the user enters a URL ("link" in the code below), Puppeteer goes through the website's code, tries to find a specific piece of it, and returns the data. That part I have working so far.
The problem is when a user enters the URL of a site that doesn't have that piece of code. In that case, I get UnhandledPromiseRejectionWarning: Error: Evaluation failed: theme is not defined.
What do I do so that when there is an error like that, I can catch it and redirect the page instead of getting an Internal Server Error?
app.post("/results", function(req, res) {
var link = req.body.link;
(async link => {
const browser = await puppeteer.launch({ args: ['--no-sandbox'] })
const page = await browser.newPage()
await page.goto(link, { waitUntil: 'networkidle2'})
const data = await page.evaluate('theme.name');
await browser.close()
return data
})(link)
.then(data => {
res.render("index", {data: data, siteUrl: link});
})
})
You can extend the async part to the whole route handler and do whatever you want on catch:
app.post('/results', async (req, res) => {
  let browser
  try {
    const link = req.body.link
    browser = await puppeteer.launch({ args: ['--no-sandbox'] })
    const page = await browser.newPage()
    await page.goto(link, { waitUntil: 'networkidle2' })
    const data = await page.evaluate('theme.name')
    res.render("index", { data: data, siteUrl: link })
  } catch (e) {
    // redirect or whatever
    res.redirect('/')
  } finally {
    // close the browser even when evaluate() throws, so instances don't leak
    if (browser) await browser.close()
  }
});
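An alternative sketch (my variant, not part of the original answer) is to make the evaluation itself tolerant of pages where theme doesn't exist, so only genuine failures reach the catch block:
const data = await page.evaluate(() => {
  // guard against pages that never define `theme`
  return typeof theme !== 'undefined' ? theme.name : null;
});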
const templateHtml = fs.readFileSync(
  path.join(process.cwd(), '../signedDocs/template.html'),
  'utf8'
);
// compile the HTML file into a Handlebars template
const template = handlebars.compile(templateHtml);
console.log('create pdf 1');
// pass the data to the HTML
const html = template(dataPDF);
// construct the path where the generated PDF file will be stored
const pdfPath = path.join(process.cwd(), '../signedDocs/' + userID + '.pdf');
console.log('create pdf 2');
// PDF configuration
const options = {
  width: '1230px',
  headerTemplate: '<p></p>',
  footerTemplate: '<p></p>',
  displayHeaderFooter: false,
  printBackground: true,
  pageRanges: '1-6',
  format: 'A4',
  preferCSSPageSize: true,
  margin: {
    top: '10px',
    right: '20px',
    bottom: '60px',
    left: '20px'
  },
  path: pdfPath
};
console.log('create pdf 3.1');
// start the browser with Puppeteer
const browser = await puppeteer.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
  headless: true
});
console.log('create pdf 3.2');
// open a new blank page
const page = await browser.newPage();
try {
  await page.goto(`data:text/html;charset=UTF-8,${html}`, {
    waitUntil: 'networkidle0' // wait so the page's modules finish loading
  });
} catch (err) {
  console.log(err);
}
console.log('create pdf 4');
try {
  await page.pdf(options); // generate the PDF
} catch (err) {
  console.log('error on page.pdf');
  console.log(err);
}
console.log('done');
await followUpEmail(user);
console.log('email sent');
await browser.close(); // close the browser
The above code works perfectly fine on my localhost (running Node.js 10).
However, I have now deployed my API to an EC2 instance, and it only runs until:
const browser = await puppeteer.launch({
  args: ['--no-sandbox', '--disable-setuid-sandbox'],
  headless: false
});
I get the 3.1 console.log, but nothing afterwards.
I'm starting to get the feeling it's something to do with my prod environment. However, after trying all kinds of different approaches today, I'm a bit lost.
I'm really hoping someone here has encountered this issue and has an answer, or at least a direction!
So it turned out that npm does install a version of Chromium; however, it is missing a lot of shared-library dependencies.
I checked which dependencies were missing by using:
ldd chrome | grep not
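On the instance this prints the unresolved shared libraries in ldd's usual one-per-line format, e.g. (illustrative output, not a capture from the actual machine):
libatk-1.0.so.0 => not found
libgtk-3.so.0 => not found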
I installed a few manually; however, some are not available through the package managers.
I then created a yum repo config to install Chrome; installing it pulled in the missing dependencies.
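For reference, the repo definition commonly used for this looks like the following (a sketch of the standard Google Chrome yum repo file; verify the exact details against Google's current documentation):
# /etc/yum.repos.d/google-chrome.repo
[google-chrome]
name=google-chrome
baseurl=https://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl.google.com/linux/linux_signing_key.pub
With that in place, yum install google-chrome-stable installs Chrome together with the shared libraries the bundled Chromium was missing.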