How not to load any images using Puppeteer?

I am using puppeteer to navigate through pages as a mobile user. I used this starter code:
import puppeteer from 'puppeteer'; // Using ES6 Module
(async () => {
  const iPhone = puppeteer.devices['iPhone 11 Pro Max'];
  const browser = await puppeteer.launch({
    headless: false,
    slowMo: 90,
    defaultViewport: iPhone.viewport
  });
  const page = await browser.newPage();
  await page.emulate(iPhone);
  await page.goto("https://www.google.com/");
  await page.tap('input[type="search"]');
  await page.type('input[type="search"]', "boats");
  await page.keyboard.press("Enter");
  await page.waitForNavigation({ waitUntil: "networkidle2" });
  await page.tap('a[data-hveid="CAMQAQ"]');
})();
Note: This code might not navigate correctly in your browser because the selector for images is not robust. It is only temporary code.
This code runs fine, but I want to reduce data usage by not loading the images.
How can I block images, videos, and anything else that is heavy to load on the page?
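
One common approach, sketched here rather than taken from the thread, is to enable request interception and abort requests by resource type before navigating. The helper names shouldBlock and blockHeavyResources and the choice of blocked types are assumptions for illustration; setRequestInterception, resourceType, abort, and continue are Puppeteer's own methods.

```javascript
// Resource types to skip. This list is an assumption; adjust it as needed.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Decide whether a request of the given resource type should be dropped.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Enable interception on a Puppeteer page and abort heavy requests.
async function blockHeavyResources(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

Calling `await blockHeavyResources(page)` right after `browser.newPage()` and before `page.goto(...)` would apply the filter to every later navigation on that page. Adding 'stylesheet' or 'script' to the set saves more data but can break page layout or behavior.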

Related

Unable to render Hindi text properly in a PDF using Puppeteer in Node.js

I am generating a PDF using Puppeteer. The PDF contains text in English and Hindi. The problem is that the Hindi font does not render properly in Node.js.
const browser = await puppeteer.launch({
  args: ["--no-sandbox", "--disable-setuid-sandbox", "--headless", "--font-render-hinting=none"],
  ignoreDefaultArgs: ["--disable-extensions"],
  headless: true,
});
const page = await browser.newPage();
await browser.close()
res.set({
  "Content-Type": "application/pdf;charset=utf-8",
  "Content-Disposition": `inline; filename=${moment().valueOf()}_pdf.pdf`,
});
Expected result: (image)
Actual result: (image)
Does anyone know what I am doing wrong? Please let me know if any other information is required. Thanks in advance!

Puppeteer to save image open in the browser

I have a link to a (GIF) image, obtained manually via 'open in new tab'. I want Puppeteer to open the image and then save it to a file. In a normal browser I would right-click and choose 'save' from the context menu. Is there a simple way to perform this action in Puppeteer?

The code below saves the Wikipedia logo image as logo.png:
import * as fs from 'fs'
import puppeteer from 'puppeteer'
;(async () => {
  const wikipedia = 'https://www.wikipedia.org/'
  const browser = await puppeteer.launch()
  const page = (await browser.pages())[0]
  await page.goto(wikipedia)
  const image = await page.waitForSelector('img[src][alt="Wikipedia"]')
  const imgURL = await image.evaluate(img => img.getAttribute('src'))
  const pageNew = await browser.newPage()
  // The src attribute may be relative, so resolve it against the page URL
  const response = await pageNew.goto(new URL(imgURL, wikipedia).href, {timeout: 0, waitUntil: 'networkidle0'})
  const imageBuffer = await response.buffer()
  await fs.promises.writeFile('./logo.png', imageBuffer)
  await page.close()
  await pageNew.close()
  await browser.close()
})()

In Puppeteer it's possible to right-click, but it's not possible to automate navigation through the "save as" menu. However, there is a solution outlined in the top answer to this question:
How can I download images on a page using puppeteer?
You can write the images to disk directly from the page response.
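
Writing images to disk from the page response, as described above, might look like the following sketch. The helper names fileNameFromUrl and saveImageResponses and the output directory are hypothetical; page.on('response'), response.request().resourceType(), response.url(), and response.buffer() are Puppeteer's own API.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Derive a local file name from an image URL (hypothetical helper).
function fileNameFromUrl(url) {
  const name = path.posix.basename(new URL(url).pathname);
  return name || 'image';
}

// Register a listener that saves every http(s) image response into outDir.
function saveImageResponses(page, outDir) {
  page.on('response', async (response) => {
    if (response.request().resourceType() !== 'image') return;
    const url = response.url();
    if (!url.startsWith('http')) return; // skip data: URLs and similar
    const buffer = await response.buffer();
    await fs.promises.writeFile(path.join(outDir, fileNameFromUrl(url)), buffer);
  });
}
```

The listener has to be registered before `page.goto(...)` so that responses from the initial load are seen. In this sketch, files with the same base name overwrite each other.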

Puppeteer: Remove links from page

I am converting a webpage into a .pdf-file with the help of Node.js and Puppeteer.
This works fine, but I want to remove all links on this page before converting it to a .pdf-file because otherwise the .pdf-file includes these links which can't be opened in my app when someone clicks on them. Is there a way to do so?
The page is an .aspx page which uses javascript. The links all start with "javascript:__". It is an intranet page which shows our meals and I just want to display the mealplan as a .pdf.
What I have in my .js-file looks like this:
const puppeteer = require('puppeteer');
let url = 'http://my-url.de/meals.aspx'
let browser = await puppeteer.launch()
let page = await browser.newPage()
await page.goto(url, {waitUntil: 'networkidle2' })
await page.pdf({
  format: "A4",
  path: files[0],
  displayHeaderFooter: false,
  printBackground: true
})
In my app it says "URL can't be opened", that's why I want these links to be removed.

It seems that these are not proper links, at least they are not <a> tags with href pointing to a website.
Instead, you are dealing with links that require javascript to navigate and that's why these are not working in the pdf.
What you could do is transform all these invalid hrefs to something valid for a pdf before capturing the page.
Check my attempt below. It's possible that you will need to modify it a bit to suit your case, since I don't have access to the actual website you are trying to parse.
const puppeteer = require('puppeteer');
// The semicolon matters: without it the string would be "called" with the function below
let url = 'http://my-url.de/meals.aspx';
(async () => {
  let browser = await puppeteer.launch()
  let page = await browser.newPage()
  await page.goto(url, {
    waitUntil: 'networkidle2'
  })
  // Modifying the page here
  await page.evaluate(_ => {
    // Capture all links whose href starts with "javascript"
    // and change it to # instead.
    document.querySelectorAll('a[href^="javascript"]')
      .forEach(a => {
        a.href = '#'
      })
  });
  await page.pdf({
    format: "A4",
    path: files[0], // output path, taken from the code in the question
    displayHeaderFooter: false,
    printBackground: true
  })
  await browser.close()
})()
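
If the goal is to remove the links entirely rather than repoint them, one variation is to drop the href attribute, since an <a> element without an href is not a hyperlink. This is a sketch under that assumption; the function names removeJavascriptHrefs and pdfWithoutJsLinks are made up for illustration, and the PDF options mirror those in the question.

```javascript
// Runs inside the page: strips the href from every javascript: link so the
// link text stays but there is no clickable target. Returns how many changed.
function removeJavascriptHrefs() {
  const links = document.querySelectorAll('a[href^="javascript"]');
  links.forEach((a) => a.removeAttribute('href'));
  return links.length;
}

// Clean the already-loaded page, then print it to pdfPath.
async function pdfWithoutJsLinks(page, pdfPath) {
  const removed = await page.evaluate(removeJavascriptHrefs);
  await page.pdf({
    format: 'A4',
    path: pdfPath,
    displayHeaderFooter: false,
    printBackground: true,
  });
  return removed;
}
```

After `await page.goto(url, { waitUntil: 'networkidle2' })`, a call such as `await pdfWithoutJsLinks(page, files[0])` would produce the PDF with the link text intact but not clickable.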

Puppeteer creates bad pdf

I am using Puppeteer to create a PDF from my static local HTML file. The PDF is created but it's corrupted: Adobe Reader can't open the file and says 'Bad file handle'. Any suggestions?
I am using the standard code below:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('local_html_file', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'hn.pdf', format: 'A4'});
  await browser.close();
})();
I also tried setContent(), with the same result. The page.screenshot() function works, however.

Your code probably throws an exception. Check that the PDF file size is not zero, and try reading the file with the less or cat command: PDF-generating software sometimes writes error messages at the top of the file content.
const puppeteer = require('puppeteer');
(async () => {
  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('local_html_file', {waitUntil: 'networkidle2'});
    await page.pdf({path: 'hn.pdf', format: 'A4'});
  } catch (e) {
    // Log any error thrown while generating the PDF
    console.log(e);
  } finally {
    // Close the browser even if an earlier step failed
    if (browser) await browser.close();
  }
})();
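
The size and content check suggested above can also be scripted. The sketch below assumes the file should begin with the "%PDF-" marker, which is how a PDF normally starts; the helper names looksLikePdf and inspectPdf are hypothetical.

```javascript
const fs = require('node:fs');

// A PDF normally starts with "%PDF-"; an empty file or one that begins
// with an error message does not.
function looksLikePdf(buffer) {
  return buffer.length > 0 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Report the file size and whether the header looks like a PDF header.
function inspectPdf(filePath) {
  const buffer = fs.readFileSync(filePath);
  return { size: buffer.length, looksValid: looksLikePdf(buffer) };
}
```

Running `console.log(inspectPdf('hn.pdf'))` after the script finishes would show at a glance whether the file is empty or starts with something other than a PDF header.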

The issue was the PDF filename I gave: 'con.pdf'.
CON is a reserved name in Windows, hence the 'Bad file handle' error. :D
What a coincidence!
Thanks, everyone.
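
A small guard against this kind of surprise could check the output name against the Windows reserved device names before calling page.pdf. The function name isReservedWindowsName is hypothetical; the list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) follows the reserved names in Microsoft's file-naming documentation.

```javascript
// Reserved device names on Windows, matched case-insensitively.
const RESERVED_NAMES = /^(con|prn|aux|nul|com[1-9]|lpt[1-9])$/i;

// True if the file name (ignoring directories and extensions) is reserved.
function isReservedWindowsName(filePath) {
  const base = filePath.split(/[\\/]/).pop();
  const stem = base.split('.')[0];
  return RESERVED_NAMES.test(stem);
}
```

For example, `isReservedWindowsName('con.pdf')` is true while `isReservedWindowsName('hn.pdf')` is false, so a script could refuse or rename the output path before generating the PDF.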

Fetch rendered font using Chrome headless browser

I have been looking through the Chrome headless browser documentation but have been unable to find this information so far.
Is it possible to capture the rendered font on a website? This information is available through the Chrome dev console.

Puppeteer doesn't expose this API directly, but it's possible to use the raw devtools protocol to get the "Rendered Fonts" information:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.stackoverflow.com/');
  // page._client is a private API, so open a DevTools protocol session through the public API instead
  const client = await page.target().createCDPSession();
  await client.send('DOM.enable');
  await client.send('CSS.enable');
  const doc = await client.send('DOM.getDocument');
  const node = await client.send('DOM.querySelector', {nodeId: doc.root.nodeId, selector: 'h1'});
  const fonts = await client.send('CSS.getPlatformFontsForNode', {nodeId: node.nodeId});
  console.log(fonts);
  await browser.close();
})();
The devtools protocol documentation for CSS.getPlatformFontsForNode can be found here: https://chromedevtools.github.io/devtools-protocol/tot/CSS#method-getPlatformFontsForNode
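
According to the protocol documentation linked above, the result contains a fonts array whose entries carry familyName, isCustomFont, and glyphCount. A small sketch for summarizing it follows; the function name summarizeFonts and the sample data are illustrative assumptions, not real output.

```javascript
// Turn a CSS.getPlatformFontsForNode result into readable lines,
// most-used font (by glyph count) first.
function summarizeFonts(result) {
  return [...result.fonts]
    .sort((a, b) => b.glyphCount - a.glyphCount)
    .map((f) => `${f.familyName} (${f.glyphCount} glyphs${f.isCustomFont ? ', downloaded' : ''})`);
}

// Made-up sample in the documented shape, not captured from a real page.
const sampleResult = {
  fonts: [
    { familyName: 'Arial', isCustomFont: false, glyphCount: 3 },
    { familyName: 'Roboto', isCustomFont: true, glyphCount: 12 },
  ],
};
console.log(summarizeFonts(sampleResult));
```

In the script above, `summarizeFonts(fonts)` could replace the raw `console.log(fonts)` when only the font names and their relative usage matter.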
