I'm new to JavaScript and trying to understand how to run the following simple test, which loads the Google home page and gets its title. The title is then tested.
const puppeteer = require("puppeteer");
var page_title = "blank";
const assert = require("assert");
async function run() {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto("http://www.google.co.uk");
page_title = await page.title();
console.log("Page Title: ", page_title)
await browser.close();
}
run();
describe("Google", function() {
it("Title contains Google", async function() {
assert.equal(page_title, "Google");
});
});
The issue is that the describe/it block runs before page_title is obtained. Could someone advise how I should structure this?
You just need to read the Mocha documentation. No need to dig deeper: asynchronous code has its own section in the table of contents.
Mocha offers three ways to test asynchronous code:
callback: add a callback (usually named done) to it() and invoke it when your test is complete.
promise: return a promise from the test function.
async and await: pass an async function to it().
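The three styles can be compared without Mocha or a browser. This is a sketch under stated assumptions. fetchTitle is a hypothetical stand-in for the Puppeteer calls. runTest is a toy runner that only approximates how Mocha picks between callback style (the test function declares a parameter) and promise style. It is not Mocha's actual code.

```javascript
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical stand-in for the Puppeteer work: resolves with a title after 10 ms.
const fetchTitle = () => sleep(10, "Google");

const checkTitle = title => {
  if (title !== "Google") throw new Error(`Unexpected title: ${title}`);
};

// 1. Callback: declare a `done` parameter; call done() on success or done(err) on failure.
function callbackStyle(done) {
  fetchTitle().then(checkTitle).then(() => done(), done);
}

// 2. Promise: return the promise so the runner waits for it.
function promiseStyle() {
  return fetchTitle().then(checkTitle);
}

// 3. async/await: an async function returns a promise implicitly.
async function asyncStyle() {
  checkTitle(await fetchTitle());
}

// Toy runner, NOT Mocha's code: like Mocha, it treats a test function that
// declares a parameter as callback style, and otherwise waits on its return value.
// (Wrapping a callback API is the legitimate use of new Promise.)
function runTest(fn) {
  if (fn.length > 0) {
    return new Promise((resolve, reject) => fn(err => (err ? reject(err) : resolve())));
  }
  return Promise.resolve().then(() => fn());
}

runTest(callbackStyle)
  .then(() => runTest(promiseStyle))
  .then(() => runTest(asyncStyle))
  .then(() => console.log("all three styles passed"));
```

All three styles tell the runner when the test is finished. That signal is exactly what the original code was missing.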
So, revised with async and await, the test looks like this:
const puppeteer = require("puppeteer");
const assert = require("assert");
describe("Google", function() {
// Launching a browser can take longer than Mocha's default 2000 ms timeout.
this.timeout(30000);
it("Title contains Google", async () => {
const browser = await puppeteer.launch(); // headless by default
try {
const page = await browser.newPage();
await page.goto("http://www.google.co.uk");
const page_title = await page.title();
console.log("Page Title: ", page_title);
assert.equal(page_title, "Google");
} finally {
await browser.close(); // close the browser even if the assertion fails
}
});
});
My advice: skim every section in the table of contents, and read a brief explanation of async and await.
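You can reproduce the original failure in plain Node. In this simplified model, a hypothetical 10 ms delay stands in for the Puppeteer calls. The line after run() plays the role of the test, which in the question also runs before the browser work finishes. Because run() is not awaited, that line reads the title while it is still "blank".

```javascript
const { setTimeout: sleep } = require("timers/promises");

let page_title = "blank";
const events = [];

// Hypothetical stand-in for the question's run(): a 10 ms delay replaces Puppeteer.
async function run() {
  await sleep(10);
  page_title = "Google";
  events.push(`run() finished, title is ${page_title}`);
}

run(); // not awaited, so the next line executes immediately
events.push(`checked right away, title is ${page_title}`);

// The check lands first and still sees "blank"; run() finishes afterwards.
setTimeout(() => console.log(events), 50);
```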
I'm trying to scrape YouTube Shorts from a specific YouTube channel, using Puppeteer running on Meteor.js (Galaxy).
Here's the code I have so far:
import puppeteer from 'puppeteer';
import { YouTubeShorts } from '../imports/api/youTubeShorts'; //meteor mongo local instance
let URL = 'https://www.youtube.com/@ummahtoday1513/shorts'
const processShortsData = (iteratedData) => {
let documentExist = YouTubeShorts.findOne({ videoId:iteratedData.videoId })
if(documentExist === undefined) { //undefined meaning this incoming shorts in a new one
YouTubeShorts.insert({
videoId: iteratedData.videoId,
title: iteratedData.title,
thumbnail: iteratedData.thumbnail,
height: iteratedData.height,
width: iteratedData.width
})
}
}
const fetchShorts = () => {
puppeteer.launch({
headless:true,
args:[
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--single-process'
]
})
.then( async function(browser){
async function fetchingData(){
new Promise(async function(resolve, reject){
const page = await browser.newPage();
await Promise.all([
await page.setDefaultNavigationTimeout(0),
await page.waitForNavigation({waitUntil: "domcontentloaded"}),
await page.goto(URL, {waitUntil:["domcontentloaded", "networkidle2"]}),
await page.waitForSelector('ytd-rich-grid-slim-media', { visible:true }),
new Promise(async function(resolve,reject){
page.evaluate(()=>{
const trialData = document.getElementsByTagName('ytd-rich-grid-slim-media');
const titles = Array.from(trialData).map(i => {
const singleData = {
videoId: i.data.videoId,
title: i.data.headline.simpleText,
thumbnail: i.data.thumbnail.thumbnails[0].url,
height: i.data.thumbnail.thumbnails[0].height,
width: i.data.thumbnail.thumbnails[0].width,
}
return singleData
})
resolve(titles);
})
}),
])
await page.close()
})
await browser.close()
}
async function fetchAndProcessData(){
const datum = await fetchingData()
console.log('DATUM:', datum)
}
await fetchAndProcessData()
})
}
fetchShorts();
I am struggling with two things here:
Async, await, and promises, and
Finding the reason why Puppeteer outputs ProtocolError: Protocol error (Target.createTarget): Target closed. in the console.
I'm new to Puppeteer and trying to learn from various examples on Stack Overflow and Google in general, but I'm still having trouble getting it right.
A general word of advice: code slowly and test frequently, especially when you're in an unfamiliar domain. Try to minimize problems so you can understand what's failing. There are many issues here, giving the impression that the code was written in one fell swoop without incremental validation. There's no obvious entry point to debugging this.
Let's examine some failing patterns.
First, basically never use new Promise() when you're working with a promise-based API like Puppeteer. This is discussed in the canonical "What is the explicit promise construction antipattern and how do I avoid it?", so I won't repeat the answers there.
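Here is a sketch of what can go wrong. It uses a hypothetical getTitle() as a stand-in for a Puppeteer call. An async executor that never calls resolve, like the question's new Promise(async function (resolve, reject) {...}), yields a promise that never settles, so anything awaiting it would hang. Using the promise you already have avoids the problem.

```javascript
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical stand-in for a Puppeteer call that already returns a promise.
const getTitle = () => Promise.resolve("Shorts");

// Antipattern: the executor does its work but never calls resolve,
// so the wrapping promise never settles.
function wrapped() {
  return new Promise(async () => {
    await getTitle(); // result silently dropped
  });
}

// Preferred: use the promise the API already gives you.
async function direct() {
  return getTitle();
}

// Reports whether `promise` settles (either way) within `ms` milliseconds.
const settlesWithin = (promise, ms) =>
  Promise.race([promise.then(() => true, () => true), sleep(ms, false)]);

settlesWithin(wrapped(), 50).then(ok => console.log("wrapped settled:", ok)); // wrapped settled: false
settlesWithin(direct(), 50).then(ok => console.log("direct settled:", ok)); // direct settled: true
```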
Second, don't mix async/await and then. The point of promises is to flatten code and avoid pyramids of doom. If you find you have 5-6 deeply nested functions, you're misusing promises. In Puppeteer, there's basically no need for then.
Third, setting timeouts to infinity with page.setDefaultNavigationTimeout(0) suppresses errors. It's fine if you want a long delay, but if a navigation is taking more than a few minutes, something is wrong and you want an error so you can understand and debug it rather than having the script wait silently until you kill it, with no clear diagnostics as to what went wrong or where it failed.
Fourth, watch out for pointless calls to waitForNavigation. Code like this doesn't make much sense:
await page.waitForNavigation(...);
await page.goto(...);
What navigation are you waiting for? This seems ripe for triggering timeouts, or worse yet, infinite hangs after you've set navs to never timeout.
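Here is a minimal model of the hang. It uses a hypothetical FakePage built on Node's EventEmitter rather than Puppeteer. Waiting for a navigation that nothing has triggered never finishes, so the goto() after it is never reached. In real Puppeteer this wait would eventually time out, or hang forever with the timeout set to 0.

```javascript
const { EventEmitter, once } = require("events");
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical fake page, not Puppeteer: goto() emits "navigated" when its load finishes.
class FakePage extends EventEmitter {
  async goto(url) {
    await sleep(10);
    this.emit("navigated", url);
  }
  waitForNavigation() {
    return once(this, "navigated");
  }
}

// The question's order: nothing has started a navigation yet, so the first
// await never finishes and goto() is never reached.
async function waitThenGoto(page) {
  await page.waitForNavigation();
  await page.goto("https://example.com");
}

// Just navigate: goto() itself waits until the load is done.
async function gotoOnly(page) {
  await page.goto("https://example.com");
}

Promise.race([
  waitThenGoto(new FakePage()).then(() => "finished"),
  sleep(50, "timed out"),
]).then(result => console.log("wait then goto:", result)); // wait then goto: timed out
```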
Fifth, avoid premature abstractions. You have various helper functions but you haven't established functionally correct code, so these just add to the confused state of affairs. Start with correctness, then add abstractions once the cut points become obvious.
Sixth, avoid Promise.all() when all of the contents of the array are sequentially awaited. In other words:
await Promise.all([
await foo(),
await bar(),
await baz(),
await quux(),
garply(),
]);
is identical to:
await foo();
await bar();
await baz();
await quux();
await garply();
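Timers make the equivalence easy to check. In this sketch, task is a hypothetical helper that records when it finishes. With inner awaits, the tasks run one after the other. Without them, Promise.all actually runs them concurrently.

```javascript
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical task: resolves with `name` after `ms` ms and records when it finished.
const task = async (name, ms, log) => {
  await sleep(ms);
  log.push(name);
  return name;
};

// Inner awaits: each task completes before the next one even starts,
// so Promise.all only ever receives values that are already settled.
async function sequential(log) {
  return Promise.all([await task("slow", 30, log), await task("fast", 10, log)]);
}

// No inner awaits: both tasks start immediately and Promise.all waits for both.
async function concurrent(log) {
  return Promise.all([task("slow", 30, log), task("fast", 10, log)]);
}

const log = [];
sequential(log).then(result => console.log(result, log)); // [ 'slow', 'fast' ] [ 'slow', 'fast' ]
```

Both versions resolve to the same array. Only the finishing order and total time differ: with concurrent(), "fast" finishes first.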
Seventh, always return promises if you have them:
const fetchShorts = () => {
puppeteer.launch({
// ..
should be:
const fetchShorts = () => {
return puppeteer.launch({
// ..
This way, the caller can await the function's completion. Without it, it gets launched into the void and can never be connected with the caller's flow.
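A small sketch of the difference, with a hypothetical launch() standing in for puppeteer.launch():

```javascript
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical stand-in for puppeteer.launch(): resolves with "browser" after 10 ms.
const launch = () => sleep(10, "browser");

// No return: the chain still runs, but the caller only gets undefined.
const withoutReturn = () => {
  launch().then(browser => browser);
};

// return: the caller gets the promise and can wait for it.
const withReturn = () => {
  return launch().then(browser => browser);
};

(async () => {
  console.log(await withoutReturn()); // undefined, without waiting for the launch
  console.log(await withReturn()); // browser
})();
```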
Eighth, evaluate doesn't have access to variables in Node, so this pattern doesn't work:
new Promise(resolve => {
page.evaluate(() => resolve());
});
Instead, avoid the new promise antipattern and use the promise that Puppeteer already returns to you:
await page.evaluate(() => {});
Better yet, use $$eval here, since it abstracts the common pattern of selecting elements as the first step inside evaluate.
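One way to picture this is the following rough model. Puppeteer sends the function's source text to the page, so the function runs without its original closure. The sketch imitates that with Node's vm module. modelEvaluate is a hypothetical helper, not part of Puppeteer, and a vm context is not a browser.

```javascript
const vm = require("vm");

// Rough model of page.evaluate(), NOT Puppeteer's implementation: only the
// function's source text crosses over, and it runs in a separate global context.
function modelEvaluate(fn) {
  return vm.runInNewContext(`(${fn.toString()})()`, {});
}

const nodeOnlyValue = 42;

console.log(modelEvaluate(() => 6 * 7)); // 42: self-contained code works

try {
  modelEvaluate(() => nodeOnlyValue); // the closure variable does not exist over there
} catch (err) {
  console.log(err.name); // ReferenceError
}
```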
Putting all of this together, here's a rewrite:
const puppeteer = require("puppeteer"); // ^19.6.3
const url = "<Your URL>";
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.waitForSelector("ytd-rich-grid-slim-media");
const result = await page.$$eval("ytd-rich-grid-slim-media", els =>
els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
videoId,
title: headline.simpleText,
thumbnail: thumbnails[0].url,
height: thumbnails[0].height,
width: thumbnails[0].width,
}))
);
console.log(result);
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Note that I ensure browser cleanup with finally so the process doesn't hang in case the code throws.
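The mapping inside $$eval is plain JavaScript, so you can exercise it in Node against a hand-made object. The sketch assumes the data shape used in the question's code (videoId, headline.simpleText, thumbnail.thumbnails). toShort and fakeElement are hypothetical names, and the values are made up. This copy is for testing only: the callback passed to $$eval must keep its own inline version, since browser-side functions can't see Node variables.

```javascript
// Same destructuring as the $$eval callback, pulled out so it can run in Node.
const toShort = ({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
  videoId,
  title: headline.simpleText,
  thumbnail: thumbnails[0].url,
  height: thumbnails[0].height,
  width: thumbnails[0].width,
});

// Hypothetical stand-in for one ytd-rich-grid-slim-media element; the `data`
// shape is assumed from the question's code and the values are made up.
const fakeElement = {
  data: {
    videoId: "abc123",
    headline: {simpleText: "Example short"},
    thumbnail: {thumbnails: [{url: "https://example.com/thumb.jpg", height: 720, width: 405}]},
  },
};

console.log([fakeElement].map(toShort));
```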
Now, all we want is a bit of text, so there's no sense in loading much of the extra stuff YouTube downloads. You can speed up the script by blocking anything unnecessary to your goal:
const [page] = await browser.pages();
await page.setRequestInterception(true);
page.on("request", req => {
if (
req.url().startsWith("https://www.youtube.com") &&
["document", "script"].includes(req.resourceType())
) {
req.continue();
}
else {
req.abort();
}
});
// ...
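You can pull the filtering rule in that handler out into a pure predicate and check it without launching a browser. shouldContinue is a hypothetical name. The request handler itself runs in Node, so calling such a helper from it is fine, unlike inside evaluate.

```javascript
// Same rule as the request handler above, as a pure function:
// true means continue the request, false means abort it.
const shouldContinue = (url, resourceType) =>
  url.startsWith("https://www.youtube.com") &&
  ["document", "script"].includes(resourceType);

// Inside the handler it could be used as:
// page.on("request", req =>
//   shouldContinue(req.url(), req.resourceType()) ? req.continue() : req.abort()
// );

console.log(shouldContinue("https://www.youtube.com/", "document")); // true
console.log(shouldContinue("https://www.youtube.com/logo.png", "image")); // false
```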
Note that ["domcontentloaded", "networkidle2"] is basically the same as "networkidle2" since "domcontentloaded" will happen long before "networkidle2". But please avoid "networkidle2" here since all you need is some text, which doesn't depend on all network resources.
Once you've established correctness, if you're ready to factor this to a function, you can do so:
const fetchShorts = async () => {
const url = "<Your URL>";
let browser;
try {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.waitForSelector("ytd-rich-grid-slim-media");
return await page.$$eval("ytd-rich-grid-slim-media", els =>
els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
videoId,
title: headline.simpleText,
thumbnail: thumbnails[0].url,
height: thumbnails[0].height,
width: thumbnails[0].width,
}))
);
}
finally {
await browser?.close();
}
};
fetchShorts()
.then(shorts => console.log(shorts))
.catch(err => console.error(err));
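The return await inside try matters here. This minimal sketch uses a hypothetical work() function standing in for page.$$eval. With a plain return, finally (where browser.close() lives) would run before the work finishes.

```javascript
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical slow work standing in for page.$$eval(): records when it finishes.
const work = async log => {
  await sleep(10);
  log.push("work done");
  return "data";
};

// `return await`: execution pauses inside try, so finally runs after the work.
async function withReturnAwait(log) {
  try {
    return await work(log);
  } finally {
    log.push("cleanup"); // think browser.close()
  }
}

// Plain `return`: try is exited at once, so finally runs while the work is pending.
async function withPlainReturn(log) {
  try {
    return work(log);
  } finally {
    log.push("cleanup");
  }
}

const log = [];
withReturnAwait(log).then(() => console.log(log)); // [ 'work done', 'cleanup' ]
```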
But keep in mind, making the function responsible for managing the browser resource hampers its reusability and slows it down considerably. I usually let the caller handle the browser and make all of my scraping helpers accept a page argument:
const fetchShorts = async page => {
const url = "<Your URL>";
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.waitForSelector("ytd-rich-grid-slim-media");
return await page.$$eval("ytd-rich-grid-slim-media", els =>
els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
videoId,
title: headline.simpleText,
thumbnail: thumbnails[0].url,
height: thumbnails[0].height,
width: thumbnails[0].width,
}))
);
};
(async () => {
let browser;
try {
browser = await puppeteer.launch();
const [page] = await browser.pages();
console.log(await fetchShorts(page));
}
catch (err) {
console.error(err);
}
finally {
await browser?.close();
}
})();
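A side benefit of passing the page in is that you can exercise the helper with a fake. This sketch uses a hypothetical helper, fetchHeadings, shaped like fetchShorts(page). It also uses a hypothetical in-memory fake page that implements only goto and $$eval; this is not a Puppeteer object.

```javascript
// Hypothetical helper with the same shape as fetchShorts(page):
// it only uses the page object it is given.
const fetchHeadings = async page => {
  await page.goto("https://example.com");
  return page.$$eval("h1", els => els.map(el => el.textContent));
};

// Hypothetical in-memory fake implementing only the two methods the helper calls.
// It is not a Puppeteer Page: $$eval ignores the selector and hands the
// callback plain objects instead of DOM elements.
const makeFakePage = headings => {
  const visited = [];
  return {
    visited,
    goto: async url => {
      visited.push(url);
    },
    $$eval: async (selector, fn) => fn(headings.map(text => ({textContent: text}))),
  };
};

(async () => {
  const page = makeFakePage(["Example Domain"]);
  console.log(await fetchHeadings(page), page.visited);
  // [ 'Example Domain' ] [ 'https://example.com' ]
})();
```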
I have a web scraper that uses Puppeteer. I am writing tests for my initial method: loadMainPage
loadMainPage:
const loadMainPage = async () => {
try {
// load puppeteer headless browser
const browser = await puppeteer.launch({
headless: true,
});
const mainPage = await browser.newPage();
await mainPage.goto(URL, { waitUntil: ["domcontentloaded"] });
// make sure page loaded.
console.log(URL + " loaded...");
const links = await getPackLinks(mainPage);
// close mainPage
await mainPage.close();
// loop through all links/pages and run the scraper
if (mainPage.isClosed()) {
await loadSubPage(links[6], browser);
console.log("Closing browser session...");
await browser.close();
}
} catch (e) {
console.error(e);
}
};
My test file:
const puppeteer = require("puppeteer");
const { loadMainPage } = require("./scraper");
jest.mock("puppeteer");
describe("loadMainPage()", () => {
it("should launch a new browser session", () => {
loadMainPage();
expect(puppeteer.launch).toBeCalled();
});
it("should open a new page", () => {
loadMainPage();
expect(???)
});
});
All I want to do is test whether certain methods in the puppeteer module are called. My first test, checking that puppeteer.launch is called, works fine. launch() resolves to a Browser instance, which has a newPage() method. How can I test whether that method was called? newPage() in turn resolves to a Page, with its own methods that I also need to test. I tried mocking my own implementations with the factory function that jest.mock accepts, but it was getting to be too much, and I feel like I'm missing something. Any help?
I have a simple piece of code
describe('My First Puppeteer Test', () => {
it('Should launch the browser', async function() {
const browser = await puppeteer.launch({ headless: false})
const page = await browser.newPage()
await page.goto('https://github.com/login')
await page.type('#login_field', testLogin)
await page.type('#password', testPassword)
await page.click('[name="commit"]')
await page.waitForNavigation()
let [element] = await page.$x('//h3[@class="text-normal"]')
let helloText = await page.evaluate(element => element.textContent, element);
console.log(helloText);
browser.close();
})
})
Everything worked before, but today I get the following error and stack trace:
Error: Evaluation failed: TypeError: Cannot read properties of undefined (reading 'textContent')
at puppeteer_evaluation_script:1:21
at ExecutionContext._evaluateInternal (node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:221:19)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async ExecutionContext.evaluate (node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:110:16)
at async Context.<anonymous> (tests\example.tests.js:16:22)
How can I resolve this?
Kind regards
While I haven't tested the code due to the login and I assume your selectors are correct, the main problem is almost certainly that
await page.click('[name="commit"]')
await page.waitForNavigation()
creates a race condition. The docs clarify:
Bear in mind that if click() triggers a navigation event and there's a separate page.waitForNavigation() promise to be resolved, you may end up with a race condition that yields unexpected results. The correct pattern for click and wait for navigation is the following:
const [response] = await Promise.all([
page.waitForNavigation(waitOptions),
page.click(selector, clickOptions),
]);
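The race can be modeled without a browser. In this simplified sketch, FakePage is a hypothetical EventEmitter-based stand-in whose click() triggers a navigation that completes before click() itself resolves. Waiting afterwards misses the event, while starting the wait in the same Promise.all catches it.

```javascript
const { EventEmitter, once } = require("events");
const { setTimeout: sleep } = require("timers/promises");

// Hypothetical fake page, not Puppeteer: click() starts a navigation that
// completes (after 5 ms) before the click call itself resolves (after 20 ms).
class FakePage extends EventEmitter {
  async click() {
    setTimeout(() => this.emit("navigated"), 5);
    await sleep(20);
  }
  waitForNavigation() {
    return once(this, "navigated");
  }
}

// Racy: the listener is attached only after click() resolves, when the
// "navigated" event has already fired, so this wait never resolves.
async function clickThenWait(page) {
  await page.click();
  await page.waitForNavigation();
}

// Safe: both calls start in the same synchronous step, so the listener
// is in place before the navigation can finish.
async function clickAndWait(page) {
  await Promise.all([page.waitForNavigation(), page.click()]);
}

const outcome = promise =>
  Promise.race([promise.then(() => "resolved"), sleep(100, "still waiting")]);

outcome(clickThenWait(new FakePage())).then(r => console.log("click, then wait:", r));
outcome(clickAndWait(new FakePage())).then(r => console.log("Promise.all:", r));
```

In this model the order inside the Promise.all array doesn't matter, because both calls are made before any timer fires.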
As a side point, it's probably better to use waitForXPath rather than $x, although this seems less likely to be the root problem. Don't forget to await all promises, such as browser.close().
const puppeteer = require("puppeteer");
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.goto('https://github.com/login');
await page.type('#login_field', testLogin);
await page.type('#password', testPassword);
// vvvvvvvvvvv
await Promise.all([
page.click('[name="commit"]'),
page.waitForNavigation(),
]);
const el = await page.waitForXPath('//h3[@class="text-normal"]');
// ^^^^^^^^^^^^
//const el = await page.waitForSelector("h3.text-normal"); // ..or
const text = await el.evaluate(el => el.textContent);
console.log(text);
//await browser.close();
//^^^^^ missing await, or use finally as below
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Additionally, if you're using Jest, once you get things working, you might want to move the browser and page management to beforeEach/afterEach or beforeAll/afterAll blocks. It's faster to use the same browser instance for all test cases, and pages can be opened and closed before/after each case.
I'm playing around with Puppeteer to learn a bit about browser automation. I want to open Chromium visibly, not in headless mode. I set the headless launch option to false, but Chromium still doesn't open.
I tried the no-sandbox args, and I even removed the --disable-extensions flag from the args, but nothing helped.
There are no errors in the terminal, it just doesn't launch.
Here is my code:
const puppeteer = require ("puppeteer");
async () => {
const browser = await puppeteer.launch({ headless: false });
const page = browser.newPage();
await page.goto("https://google.de");
await browser.close();
};
Any idea why Chromium is not opening? There are also no error logs.
Problem
You are not calling the function, you are just defining it via async () => { ... }. This is why you are not getting any errors, as the function is not executed. In addition, as the other answer already said, you are missing an await.
Solution
Your code should look like this:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage(); // missing await
await page.goto("https://google.de");
await browser.close();
})(); // Here, we actually call the function
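Here is the difference between defining the function and calling it, shown without Puppeteer:

```javascript
const calls = [];

// Only defines an async arrow function; nothing ever calls it, so its body never runs.
async () => {
  calls.push("defined only");
};

// Wrapping it in parentheses and adding () calls it immediately (an async IIFE).
(async () => {
  calls.push("invoked");
})();

console.log(calls); // [ 'invoked' ]
```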
newPage() returns a promise, so you should await it:
const puppeteer = require ("puppeteer");
async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://google.de");
await browser.close();
};
So, I'm using Puppeteer with Jest. After adding
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
My tests don't perform any actions, whether I use headless mode or, let's call it, "normal" mode. Can anybody help me?
homepage.test.js
const puppeteer = require('puppeteer');
const HomePage = require('./page_objects/HomePage');
const homePage = new HomePage();
describe('Homepage', () => {
beforeAll(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto(homePage.path);
await page.waitForSelector(homePage.loginPanel);
});
it('Log into your account', async () => {
await homePage.fillLoginForm();
await expect(page).toMatchElement(homePage.productList);
await page.screenshot({ path: 'example.png' });
});
});
HomePage.js
module.exports = class HomePage {
constructor() {
this.path = 'https://www.saucedemo.com/index.html';
this.loginPanel = '#login_button_container';
this.productList = 'div[class="inventory_container"]';
this.loginForm = {
fields: {
usernameInput: 'input[id="user-name"]',
passwordInput: 'input[id="password"]',
logInButton: 'input[class="btn_action"]',
},
};
}
async fillLoginForm() {
await page.type(this.loginForm.fields.usernameInput, 'standard_user');
await page.type(this.loginForm.fields.passwordInput, 'secret_sauce');
await page.click(this.loginForm.fields.logInButton);
}
};
The answer has two parts: one with plain Jest and one with jest-puppeteer. You can skip to the jest-puppeteer part if you want.
Problem (with jest):
The browser and page declared inside the beforeAll block are local to that callback, so the it blocks cannot see them. They are also unrelated to the page used inside the HomePage class.
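The scoping problem can be reproduced without Jest. In this sketch, beforeAll, it, and runQueue are toy stand-ins that just queue callbacks and run them in registration order (real Jest runs all beforeAll hooks first). Only the variable scoping matters here.

```javascript
// Toy stand-ins for Jest's hooks, NOT Jest itself: they queue callbacks,
// and runQueue() runs them in the order they were registered.
const queue = [];
const beforeAll = fn => queue.push(fn);
const it = (name, fn) => queue.push(fn);
const runQueue = async () => {
  for (const fn of queue) await fn();
};

const seen = [];

// Question style: `page` is declared inside beforeAll, so it is local to that callback.
beforeAll(async () => {
  const page = "page from beforeAll"; // invisible to every other callback
});
it("question style", async () => {
  seen.push(typeof page); // no `page` is in scope here, so this is "undefined"
});

// Answer style: declare the variable in the enclosing scope, assign it in beforeAll.
let sharedPage;
beforeAll(async () => {
  sharedPage = "page from beforeAll";
});
it("answer style", async () => {
  seen.push(sharedPage);
});

runQueue().then(() => console.log(seen)); // [ 'undefined', 'page from beforeAll' ]
```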
You did not mention if you were using jest-puppeteer or not.
Solution:
Create block-scoped variables in the describe block, and pass the page object to the page-object classes.
Refining the HomePage class
Consider the following HomePage class.
// HomePage.js
class HomePage {
constructor(page) {
this.page = page;
}
async getScreenshot() {
await this.page.screenshot({ path: "example.png" });
}
async getTitle(page) {
return page.title();
}
}
module.exports = HomePage;
As you can see, there are two ways to access the page inside the class: pass it to the constructor, or pass it to the method directly.
The method getScreenshot uses this.page, while getTitle uses the page argument it receives.
Refining the test
You cannot use this inside the Jest tests due to this issue, but you can declare variables at the top of a block and access them later.
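const puppeteer = require("puppeteer");
const HomePage = require("./HomePage"); // adjust the path to your HomePage.js
describe("Example", () => {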
// define them up here inside the parent block
let browser;
let page;
let homepage;
beforeAll(async () => {
// it has access to the browser, page and homepage
browser = await puppeteer.launch({ headless: true });
page = await browser.newPage();
homepage = new HomePage(page); // <-- pass the page to HomePage here
await page.goto("http://example.com");
await page.waitForSelector("h1");
return true;
});
});
Now all the other blocks can access the page. With the HomePage class above, we can do either of the following, depending on how each method is defined.
it("Gets the screenshot", async () => {
await homepage.getScreenshot(); // <-- will use this.page
});
it("Gets the title", async () => {
await homepage.getTitle(page); // <-- will use the page we are passing on
});
Finally, we clean up after the tests:
afterAll(async () => {
await page.close();
await browser.close();
return true;
});
We probably need to run the Jest tests with --detectOpenHandles in headful mode:
jest . --detectOpenHandles
Problem (with jest-puppeteer):
jest-puppeteer already gives you global browser and page objects. You do not need to define anything.
However, if you want to use jest-puppeteer together with expect-puppeteer, you have to use a custom config file.
Solution:
Create a jest-config.json file with the following contents:
{
"preset": "jest-puppeteer",
"setupFilesAfterEnv": ["expect-puppeteer"]
}
Now, get rid of the browser and page creation code, as well as any afterAll hooks that call page.close.
Here is a working test file:
class HomePage {
async getTitle() {
return page.$("h1");
}
}
describe("Example", () => {
const homepage = new HomePage();
beforeAll(async () => {
// it has access to a global browser, page and scoped homepage
await page.goto("http://example.com");
await page.waitForSelector("h1");
});
it("Gets the title", async () => {
const element = await homepage.getTitle();
await expect(element).toMatch("Example");
});
});
And let's run it:
jest . --detectOpenHandles --config jest-config.json