How can I retrieve the current URL of the page in Playwright?
Something similar to browser.getCurrentUrl() in Protractor?
const { browser } = this.helpers.Playwright;
await browser.pages(); // list pages in the browser

// get current page
const { page } = this.helpers.Playwright;
const url = page.url(); // get the URL of the current page (synchronous)
To get the URL of the current page as a string (no await needed):
page.url()
Where "page" is an object of the Page class. You should already have a Page object, and there are various ways to instantiate it, depending on how your framework is set up: https://playwright.dev/docs/api/class-page
The Page type can be imported with
import { Page } from '@playwright/test';
or a page can be created manually:

const { webkit } = require('playwright');

(async () => {
  const browser = await webkit.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('https://example.com');
  console.log(page.url()); // prints the current URL
  await browser.close();
})();
Related
I'm trying to find the number of users on this website using Node.js, but I'm not sure what process to use to get it. The value is nested under multiple divs with different class names. For now I'm just going to console.log it, but I'll do more with the data later on.
I was advised to use the package "puppeteer", but I'm not sure how it would fetch the value or what I'd need to do with it.
Thank you in advance.
You can use Puppeteer and obtain the value via a CSS selector. The typical code looks like this:
const pt = require('puppeteer');

async function getText() {
  // launch browser in headless mode
  const browser = await pt.launch();

  // open a new page
  const page = await browser.newPage();

  // navigate to the URL
  await page.goto('https://feds.lol/');

  // identify the element (replace the placeholder with the real CSS selector)
  const f = await page.$('YOUR_CSS_SELECTOR');

  // obtain its text content
  const text = await (await f.getProperty('textContent')).jsonValue();
  console.log('Text is: ' + text);

  await browser.close();
}

getText();
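If the element is easy to target, page.$eval is a more compact way to read the same text. A minimal sketch; '.user-count' is a hypothetical selector, not one taken from the site:

const pt = require('puppeteer');

(async () => {
  const browser = await pt.launch();
  const page = await browser.newPage();
  await page.goto('https://feds.lol/');
  // $eval runs the callback in the page against the first element matching the selector
  const text = await page.$eval('.user-count', el => el.textContent); // hypothetical selector
  console.log('Text is: ' + text);
  await browser.close();
})();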
However, I see this website is protected by a captcha.
I have a link to a (gif) image, obtained manually via 'open in new tab'. I want Puppeteer to open the image and then save it to a file. In a normal browser I would right-click and choose 'save' from the context menu. Is there a simple way to perform this action in Puppeteer?
The code below saves the Wikipedia logo image to a file named logo.png:
import * as fs from 'fs'
import puppeteer from 'puppeteer'

;(async () => {
  const wikipedia = 'https://www.wikipedia.org/'
  const browser = await puppeteer.launch()
  const page = (await browser.pages())[0]
  await page.goto(wikipedia)

  // find the logo and read its (relative) src attribute
  const image = await page.waitForSelector('img[src][alt="Wikipedia"]')
  const imgURL = await image.evaluate(img => img.getAttribute('src'))

  // open the image URL in a second tab and save the response body
  const pageNew = await browser.newPage()
  const response = await pageNew.goto(wikipedia + imgURL, { timeout: 0, waitUntil: 'networkidle0' })
  const imageBuffer = await response.buffer()
  await fs.promises.writeFile('./logo.png', imageBuffer)

  await page.close()
  await pageNew.close()
  await browser.close()
})()
In Puppeteer it's possible to right-click, but it's not possible to automate navigation through the "save as" menu. However, there is a solution outlined in the top answer here:
How can I download images on a page using puppeteer?
You can write the images to disk directly from the page response.
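A minimal sketch of that approach: listen for image responses and write their bodies to disk. The page URL and the filename derivation here are illustrative assumptions:

const fs = require('fs');
const path = require('path');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // save the body of every image response the page triggers
  page.on('response', async (response) => {
    if (response.request().resourceType() === 'image') {
      const buffer = await response.buffer();
      const name = path.basename(new URL(response.url()).pathname) || 'image'; // assumed naming scheme
      await fs.promises.writeFile(name, buffer);
    }
  });

  await page.goto('https://example.com/page-with-the-gif', { waitUntil: 'networkidle0' });
  await browser.close();
})();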
I am trying to convert an HTML web page into a PDF file using Puppeteer. I store a value in localStorage and read it back to change the font size of the h1. The problem is that if I store the value in localStorage inside an event listener, Puppeteer seems to ignore it and converts the page with the default font size. But if I call setItem outside of any event listener, Puppeteer picks up the localStorage value and converts the page with the new font size. I want it to work when I call setItem inside an event listener.
I have tried changing the event listener to 'beforeprint', but I got the same results.
let link = document.querySelector('a');
let heading = document.querySelector('h1');

window.addEventListener('DOMContentLoaded', () => {
  let fSize = localStorage.getItem('size');
  heading.style.fontSize = `${fSize}px`;
});

localStorage.setItem('size', 500); // if I call it here, puppeteer picks up the localStorage value

link.addEventListener('click', (e) => {
  localStorage.setItem('size', 500); // but if I call it here, it does not
});
<h1>A Heading</h1>
<a href="/download">download</a>
// puppeteer code snippet
const path = require('path');
const puppeteer = require('puppeteer');

let printPDF = async () => {
  const filePath = path.resolve('./file.pdf');
  const fileUrl = 'http://127.0.0.1:3000';
  const browser = await puppeteer.launch({
    args: ['--no-sandbox'],
    headless: true
  });
  const page = await browser.newPage();
  try {
    await page.goto(fileUrl);
    await page.pdf({
      format: 'A4',
      path: filePath,
      printBackground: true
    });
  } finally {
    // close the tab and the browser whether or not the PDF succeeded
    await page.close();
    await browser.close();
  }
};

app
  .route('/')
  .get(getIndexPage);

app.route('/download').get((req, res) => {
  printPDF().then(() => {
    res.sendFile('./downloadPage.html', {
      root: __dirname
    });
  }).catch(e => console.log(e));
});
Puppeteer opens an entirely new browser when you run it and that browser doesn't have the same localStorage data as the browser that you used to click your link. Your browser and the browser puppeteer spins up each have their own localStorage.
The reason it worked in the first case is that the setItem call ran every time the page was loaded, including when the puppeteer browser loaded it.
Any changes you make to a web page after it loads (like a click event) won't be there when puppeteer loads the page on its own a few seconds later. It's like a refresh.
Could you just pass the data you need from the client to the server?
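If you do need the value in the puppeteer-controlled browser, one option is to seed localStorage there before the page's own scripts run. A sketch, assuming the same 'size' key used above:

const page = await browser.newPage();

// runs in the page before any of its scripts, on every navigation
await page.evaluateOnNewDocument(() => {
  localStorage.setItem('size', 500);
});

await page.goto(fileUrl);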
When using puppeteer, I used to grab the initial tab with these lines of code:
const browser = await puppeteer.launch()
const [page] = await browser.pages()
await page.goto('http://example.com')
The main purpose of this is to keep the number of tabs down so my app runs lighter.
But when I use Playwright, it seems the context doesn't contain any page yet.
const browser = await playwright.chromium.launch()
const context = await browser.newContext()
const [page] = await context.pages()
await page.goto('http://example.com')
My code runs, but I keep getting this error message:
(node:47248) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'goto' of undefined
Am I the only one getting this kind of error?
That's the same behavior you would get in puppeteer if you use createIncognitoBrowserContext.
const browser = await puppeteer.launch();
const context = await browser.createIncognitoBrowserContext();
const [page] = await context.pages(); // page is undefined here
await page.goto('http://example.com'); // throws: Cannot read property 'goto' of undefined
Both createIncognitoBrowserContext in puppeteer and newContext in playwright create contexts that start with no pages.
As you mentioned in your answer, you could use the default context or call newPage in the context you just created.
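For instance, creating the page explicitly in the new context avoids destructuring undefined:

const browser = await playwright.chromium.launch()
const context = await browser.newContext()
const page = await context.newPage() // the context starts empty, so create its first page yourself
await page.goto('http://example.com')
await browser.close()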
After working to get rid of this error, I ended up with code like this:
const browser = await playwright.chromium.launch()
const context = await browser.defaultContext()
const [page] = await context.pages()
await page.goto('http://example.com')
I changed newContext() to defaultContext().
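Note that defaultContext() existed only in early Playwright releases; in current versions the shortest equivalent is to let the browser create the page (and an implicit context) for you:

const browser = await playwright.chromium.launch()
const page = await browser.newPage() // creates a fresh context plus its first page
await page.goto('http://example.com')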
I'm currently scraping a list of URLs on my site using the request-promise npm module.
This works well for what I need. However, I'm noticing that not all of my divs are appearing, because some are rendered afterwards with JS. I know I can't run that JS code remotely to force the render, but is there any way to scrape the pages only after those elements have been added?
I'm doing this currently with Node, and would prefer to keep using Node if possible.
Here is what I have:
const request = require('request-promise');
const { JSDOM } = require('jsdom');

const urls = ['fake.com/link-1', 'fake.com/link-2', 'fake.com/link-3'];

urls.forEach(url => {
  request(url)
    .then(function (html) {
      // get dummy dom
      const d_dom = new JSDOM(html);
      ....
    });
});
Any thoughts on how to accomplish this? Or if there is currently an alternative to Selenium as an npm module?
You will want to use puppeteer, a Node library maintained by the Chrome team at Google that drives headless Chrome, for loading and parsing dynamic web pages.
Use page.goto() to navigate to a specific page, then use page.content() to get the HTML content of the rendered page.
Here is an example of how to use it:
const { JSDOM } = require('jsdom');
const puppeteer = require('puppeteer');

const urls = ['fake.com/link-1', 'fake.com/link-2', 'fake.com/link-3'];

urls.forEach(async url => {
  let dom = new JSDOM(await makeRequest(url));
  console.log(dom.window.document.title);
});

async function makeRequest(url) {
  // launch a headless browser, render the page, and return the final HTML
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  let html = await page.content();
  await browser.close();
  return html;
}
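Launching a fresh browser for every URL works but is heavy. A sketch of a variant that reuses one browser and visits the URLs sequentially (same placeholder URLs as above):

const { JSDOM } = require('jsdom');
const puppeteer = require('puppeteer');

const urls = ['fake.com/link-1', 'fake.com/link-2', 'fake.com/link-3'];

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const url of urls) {
    await page.goto(url); // renders the page, including JS-added elements
    const dom = new JSDOM(await page.content());
    console.log(dom.window.document.title);
  }
  await browser.close();
})();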