Playwright (Puppeteer) context is an empty array at launch?

When using Puppeteer, I used to get the initial tab with these lines of code:
const browser = await puppeteer.launch()
const [page] = await browser.pages()
await page.goto('http://example.com')
The main purpose is to keep the number of tabs low, so my app runs lighter.
But when I use Playwright, it seems that the context doesn't contain any page yet.
const browser = await playwright.chromium.launch()
const context = await browser.newContext()
const [page] = await context.pages()
await page.goto('http://example.com')
My code runs, but I keep getting this error message:
(node:47248) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'goto' of undefined
Am I the only one getting this kind of error?

That's the same behavior you would get in puppeteer if you use createIncognitoBrowserContext.
const browser = await puppeteer.launch();
const context = await browser.createIncognitoBrowserContext();
const [page] = await context.pages(); // page is undefined here: the array is empty
await page.goto('http://example.com');
Both createIncognitoBrowserContext in Puppeteer and newContext in Playwright return a context that has no pages yet.
As you mentioned in your answer, you could use the default context or call newPage in the context you just created.
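A minimal sketch of the second option, assuming only that the context object exposes pages() and newPage(), as Playwright's BrowserContext and Puppeteer's incognito context both do. The helper name firstOrNewPage is made up for illustration. It reuses an existing tab when there is one and opens a new one otherwise, so the destructured page is never undefined. This may be the more portable route, since a default context may not be exposed in every Playwright version (worth checking against the version you use).

```javascript
// Hypothetical helper: return the first open page of a context,
// or open one if the context is still empty.
async function firstOrNewPage(context) {
  // pages() is synchronous in Playwright and async in Puppeteer;
  // awaiting the result works for both.
  const [page] = await context.pages();
  return page ?? context.newPage();
}

// Example usage with Playwright (not executed here):
//   const { chromium } = require('playwright');
//   const browser = await chromium.launch();
//   const context = await browser.newContext();
//   const page = await firstOrNewPage(context);
//   await page.goto('http://example.com');
//   await browser.close();
```

Keeping the fallback in one place means the rest of the script does not need to care whether the context came from launch, newContext, or an incognito context.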

After some trial and error, I got rid of the error with this code:
const browser = await playwright.chromium.launch()
const context = await browser.defaultContext()
const [page] = await context.pages()
await page.goto('http://example.com')
I changed newContext() to defaultContext().

Related

Puppeteer to save image open in the browser

I have a link to a (gif) image, obtained manually via 'open in new tab'. I want Puppeteer to open the image and then save it to a file. In a normal browser I would right-click and choose 'save' from the context menu. Is there a simple way to perform this action in Puppeteer?

The code below saves the Wikipedia logo image as logo.png:
import * as fs from 'fs'
import puppeteer from 'puppeteer'
;(async () => {
  const wikipedia = 'https://www.wikipedia.org/'
  const browser = await puppeteer.launch()
  // Reuse the tab that Puppeteer opens at launch
  const page = (await browser.pages())[0]
  await page.goto(wikipedia)
  // Read the logo's src attribute (a path relative to the page)
  const image = await page.waitForSelector('img[src][alt="Wikipedia"]')
  const imgURL = await image.evaluate(img => img.getAttribute('src'))
  // Open the image itself in a second tab and grab the raw response body
  const pageNew = await browser.newPage()
  const response = await pageNew.goto(new URL(imgURL, wikipedia).href, {timeout: 0, waitUntil: 'networkidle0'})
  const imageBuffer = await response.buffer()
  await fs.promises.writeFile('./logo.png', imageBuffer)
  await page.close()
  await pageNew.close()
  await browser.close()
})()
Please mark this as the accepted answer if it helps you.

In Puppeteer it's possible to right click, but it's not possible to automate the navigation through the "save as" menu. However, there is a solution outlined in the top answer here:
How can I download images on a page using puppeteer?
You can write the images to disk directly from the page response.
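Writing images straight from the page response could look roughly like the sketch below. It assumes Puppeteer's response objects (url(), buffer(), and request().resourceType()). The helper name saveImageResponse and the ./images directory are hypothetical, and the target directory is assumed to exist already.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: if `response` is an image, write its body into `dir`
// and return the file path; otherwise return null.
async function saveImageResponse(response, dir) {
  if (response.request().resourceType() !== 'image') return null;
  // Derive a file name from the URL path; fall back to a generic name.
  const name = path.basename(new URL(response.url()).pathname) || 'image';
  const file = path.join(dir, name);
  await fs.promises.writeFile(file, await response.buffer());
  return file;
}

// Example wiring with Puppeteer (not executed here):
//   page.on('response', (res) => {
//     saveImageResponse(res, './images').catch(console.error);
//   });
//   await page.goto('https://www.wikipedia.org/');
```

Because the listener sees every response, the images are captured during the normal page load and no second navigation is needed. Two images with the same file name would overwrite each other in this sketch.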

Get current page url with Playwright Automation tool?

How can I retrieve the current URL of the page in Playwright?
Something similar to browser.getCurrentUrl() in Protractor?

const { browser } = this.helpers.Playwright;
await browser.pages(); // list pages in the browser
// get current page
const { page } = this.helpers.Playwright;
const url = await page.url(); // get the url of the current page

To get the URL of the current page as a string (no await needed):
page.url()
Where "page" is an object of the Page class. You should already have a Page object, and there are various ways to instantiate it, depending on how your framework is set up: https://playwright.dev/docs/api/class-page
It can be imported with
import { Page } from '@playwright/test';
or created like this:
const { webkit } = require('playwright');
(async () => {
  const browser = await webkit.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  console.log(page.url()); // current URL as a string
  await browser.close();
})();

How to write data to a file using Puppeteer?

Puppeteer exposes a page.screenshot() method for saving a screenshot locally on your machine. Here are the docs.
See: https://github.com/GoogleChrome/puppeteer
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
Is there a way to save a data file in a similar fashion? I'm seeking something analogous to...
page.writeToFile({data, path,});

Since any Puppeteer script is an ordinary Node.js script, you can use anything you would use in Node, such as the good old fs module:
const fs = require('fs');
fs.writeFileSync('path/to/file.json', data);
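In an async Puppeteer script, the promise-based fs API may fit a little better than writeFileSync. The sketch below is one possible shape: writeJson is a hypothetical helper, and the object written in demo() is placeholder data standing in for whatever the script scraped.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: serialize any JSON-compatible value and write it to disk.
async function writeJson(filePath, data) {
  await fs.promises.writeFile(filePath, JSON.stringify(data, null, 2), 'utf8');
}

// Demo with placeholder data standing in for values returned by page.evaluate().
async function demo() {
  const file = path.join(os.tmpdir(), 'scraped-example.json');
  await writeJson(file, { title: 'Example Domain', links: 1 });
  // Read the file back to show the round trip.
  return JSON.parse(await fs.promises.readFile(file, 'utf8'));
}
```

Note that fs.writeFileSync and fs.promises.writeFile both expect a string or a Buffer, so objects need to go through JSON.stringify first.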

Way to scrape a JS-Rendered page?

I'm currently scraping a list of URLs on my site using the request-promise npm module.
This works well for what I need. However, I'm noticing that not all of my divs are appearing because some are rendered after the fact with JS. I know I can't run that JS code remotely to force the render, but is there any way to scrape the pages only after those elements are added in?
I'm doing this currently with Node, and would prefer to keep using Node if possible.
Here is what I have:
const urls = ['fake.com/link-1', 'fake.com/link-2', 'fake.com/link-3']
urls.forEach(url => {
  request(url)
    .then(function (html) {
      // get dummy dom
      const d_dom = new JSDOM(html);
      ....
    });
});
Any thoughts on how to accomplish this? Or if there is currently an alternative to Selenium as an npm module?

You will want to use Puppeteer, a Node library (maintained by the Chrome team at Google) that drives headless Chrome, to load and parse dynamic web pages.
Use page.goto() to go to a specific page, then use page.content() to get the HTML of the rendered page.
Here is an example of how to use it:
const { JSDOM } = require("jsdom");
const puppeteer = require('puppeteer')
const urls = ['fake.com/link-1', 'fake.com/link-2', 'fake.com/link-3']
urls.forEach(async url => {
  let dom = new JSDOM(await makeRequest(url))
  console.log(dom.window.document.title)
});
// Load the page in headless Chrome and return the HTML after scripts have run
async function makeRequest(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  let html = await page.content()
  await browser.close();
  return html
}
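The example above launches a separate browser for every URL. A variation that shares one browser and waits for a JS-rendered element before reading the page might look like the sketch below. The helper name scrapeTitles and the selector argument are assumptions for illustration; it relies only on Puppeteer's newPage(), goto(), waitForSelector(), title() and close().

```javascript
// Hypothetical helper: visit each URL in one shared browser and collect page titles.
// `browser` is anything with Puppeteer's newPage(); `selector` is an element
// that only exists once the client-side rendering has finished.
async function scrapeTitles(browser, urls, selector) {
  const results = [];
  for (const url of urls) {
    const page = await browser.newPage();
    try {
      await page.goto(url);
      await page.waitForSelector(selector); // wait for the JS-rendered element
      results.push({ url, title: await page.title() });
    } finally {
      await page.close(); // close the tab even if navigation fails
    }
  }
  return results;
}

// Example usage with Puppeteer (not executed here; URL and selector are placeholders):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   console.log(await scrapeTitles(browser, ['https://example.com/'], 'h1'));
//   await browser.close();
```

The loop is sequential on purpose: it keeps at most one tab open at a time, which trades speed for lower memory use.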

Fetch rendered font using Chrome headless browser

I have been looking through the Chrome headless browser documentation but have been unable to find this information so far.
Is it possible to capture the rendered font on a website? This information is available through the Chrome dev console.

Puppeteer doesn't expose this API directly, but it's possible to use the raw devtools protocol to get the "Rendered Fonts" information:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.stackoverflow.com/');
  await page._client.send('DOM.enable');
  await page._client.send('CSS.enable');
  const doc = await page._client.send('DOM.getDocument');
  const node = await page._client.send('DOM.querySelector', {nodeId: doc.root.nodeId, selector: 'h1'});
  const fonts = await page._client.send('CSS.getPlatformFontsForNode', {nodeId: node.nodeId});
  console.log(fonts);
  await browser.close();
})();
The devtools protocol documentation for CSS.getPlatformFontsForNode can be found here: https://chromedevtools.github.io/devtools-protocol/tot/CSS#method-getPlatformFontsForNode
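Since page._client is a private property and may not be available in every Puppeteer version, the same protocol calls can be sent through a CDP session obtained from the public API. The sketch below assumes page.target().createCDPSession() is available in the installed version; the helper name getRenderedFonts is hypothetical.

```javascript
// Hypothetical helper: ask the DevTools protocol which platform fonts
// were used to render the first element matching `selector`.
async function getRenderedFonts(page, selector) {
  // Public way to obtain a CDP session, instead of the private page._client.
  const client = await page.target().createCDPSession();
  try {
    await client.send('DOM.enable');
    await client.send('CSS.enable');
    const { root } = await client.send('DOM.getDocument');
    // Note: the protocol call below is not guarded against a selector that matches nothing.
    const { nodeId } = await client.send('DOM.querySelector', { nodeId: root.nodeId, selector });
    const { fonts } = await client.send('CSS.getPlatformFontsForNode', { nodeId });
    return fonts;
  } finally {
    await client.detach(); // release the session once done
  }
}

// Example usage with Puppeteer (not executed here):
//   const page = await browser.newPage();
//   await page.goto('https://www.stackoverflow.com/');
//   console.log(await getRenderedFonts(page, 'h1'));
```

Each entry in the returned array describes one platform font actually used for the node, as documented for CSS.getPlatformFontsForNode.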
