I am trying to do some scraping using a library and my code uses Node's
async/await pattern.
I have defined a variable 'page' in a function named 'sayhi' and pass it to the function 'ex', but I get an error when I run the code.
const puppeteer = require('puppeteer');
async function sayhi() {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto('https://www.example.com/'); //
ex(page); //FAILS
var frames2 = await page.frames(); // WORKS
}
function ex(newpage){
var frames = await newpage.frames(); // FAILING
}
sayhi();
You're using await in a function that isn't an async function. Try this instead:
async function ex(newpage) {
If you need frames2 to run only after ex is finished completely, you'll also want to await ex(page); in sayhi.
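As a minimal sketch of both fixes together (not the asker's actual scraper), the following uses a hypothetical fakePage object in place of a real Puppeteer page so that it can run without a browser. The order array is an assumption added only to make the sequencing visible.

```javascript
// Hypothetical stand-in for a Puppeteer page; its frames() returns a promise
// so that the effect of awaiting is observable.
const fakePage = {
  frames: async () => ['main-frame', 'child-frame'],
};

// Records the order in which the two functions make progress.
const order = [];

// Marking the function `async` is what makes `await` legal inside it.
async function ex(newpage) {
  const frames = await newpage.frames();
  order.push('ex finished');
  return frames;
}

async function sayhi(page) {
  // Awaiting ex() pauses sayhi until ex has completely finished.
  const frames = await ex(page);
  order.push('sayhi continues');
  return frames;
}
```

Calling sayhi(fakePage) resolves only after ex has finished, so order ends up as 'ex finished' followed by 'sayhi continues'. Without the await in sayhi, the two entries would be recorded the other way round.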
So I just started learning puppeteer.js. My code runs without errors, but it does not print anything to the console. (I use Node.js for debugging purposes.) The only time it does print something is when I put the async function inside another function, and even then it returns undefined. I was wondering why this is and how to fix it.
Here is my code:
(async () =>{
let movieUrl = 'https://www.imdb.com/title/tt0111161/?ref_=nav_sr_1'
let browser = await puppeteer.launch();
let page = await browser.newPage();
await page.goto(movieUrl, {waitUntil: 'networkidle2'})
let data = await page.evaluate(() => {
let title = document.querySelector('div[class="title_wrapper"] > h1').innerText;
let rating = document.querySelector('span[itemprop="ratingValue"]').innerText;
let ratingCount = document.querySelector('span[itemprop="ratingCount"]').innerText;
return{title, rating, ratingCount};
})
console.log(data);
debugger;
await browser.close();
})
This is an unnamed async function:
async () => {
// the function code
}
If you want to run it straight away, you need to call it. You can do it by enclosing it in parentheses:
(async () => {
// the function code
})()
This is called an IIFE (Immediately Invoked Function Expression): you declare a function and run it immediately.
See these articles to learn about it:
https://developer.mozilla.org/en-US/docs/Glossary/IIFE
https://flaviocopes.com/javascript-iife/
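To make the difference concrete, here is a small sketch in plain Node.js with no Puppeteer involved. The calls array and the done variable are names chosen for illustration only.

```javascript
// Records which function bodies actually ran.
const calls = [];

// Defined but never called: the body does not run.
(async () => {
  calls.push('never invoked');
});

// Defined and called right away: the body runs, and because the
// function is async, the call returns a promise.
const done = (async () => {
  calls.push('invoked');
  return 'finished';
})();
```

After this runs, calls contains only 'invoked', and done is a promise that resolves to 'finished'.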
I'm trying hard to understand what exactly this new feature (top-level async/await) from the V8 features list means.
When I try it in vanilla JS, the results seem the same to me. Here is what I tried:
(() => {
let test1 = async() =>
async() => {
return 'true';
};
(async() => {
let result = await test1();
result = await result();
console.log('r', result)
})();
})()
I want to know what exactly this feature means and how to use it.
Here is V8's document. To me it is pretty self-descriptive, and it is a very handy feature for me personally.
Previously, you couldn't just write await someAsyncFunction() out of nowhere, because await could only be used inside an async function.
Example:
main.js
const fs = require('fs');
const util = require('util');
const unlink = util.promisify(fs.unlink); // promisify unlink function
await unlink('file_path'); // delete file
The above code would not work. The last line would give you an error. So, what we did previously is something like this:
async function main() {
const fs = require('fs');
const util = require('util');
const unlink = util.promisify(fs.unlink); // promisify unlink function
await unlink('file_path'); // delete file
}
main();
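But now you don't have to do this. With top-level await, the first version works, provided the file runs as an ES module (in that case the require calls become import statements). In a CommonJS script the last line is still rejected.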
THIS ANSWER IS BASED ON MY UNDERSTANDING
Top level async await allows you to await Promises returned by async functions at the top level of a module, without having to declare a separate async function. Most importantly, you can now conveniently export values returned by async functions.
For example, without this feature, you need to create a separate async function (the usual "main" async function), or use Promise.then in order to do something with the returned value at the top-level, and cannot simply export the returned value.
let test = async () => 'true';
test().then(result => console.log('r', result));
// or even more verbose
(async () => {
console.log(await test());
})();
// This exports a Promise, not the returned value, "true".
export let result = test();
// This throws an Error because export should be at the top-level.
(async () => {
export let result = await test();
})();
But with this new feature, you can simply do:
let test = async () => 'true';
export let result = await test();
console.log(result);
This feature is especially useful when you want to export a value that has to be obtained asynchronously; for example, a value you get from network at run-time, or a module like a big encryption suite that is large and loads slowly and asynchronously.
How do I execute Node.js code (not just browser JavaScript) inside a page.evaluate() call?
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
await page.evaluate(() => {
document.querySelector('button[type=submit]').click();
});
console.log('yes')
await browser.close();
})();
The first parameter passed to page.evaluate() should be a function which will be evaluated in the page context in the browser.
Node.js is server-side code, and is meant to be executed on the server.
You can pass arguments from the Node.js environment to the page function using the following method:
// Node.js Environment
const hello_world = 'Hello, world! (from Node.js)';
await page.evaluate(hello_world => {
// Browser Page Environment
console.log(hello_world);
}, hello_world);
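The reason the value has to be passed as an argument can be illustrated without a browser. The sketch below is not Puppeteer's implementation: evaluateInContext is a hypothetical helper that mimics the general idea, using Node's vm module as a stand-in for the page and assuming that only the function's source text and JSON-serializable arguments cross the boundary.

```javascript
const vm = require('vm');

// Hypothetical helper mimicking the idea behind page.evaluate():
// ship the function's source text plus JSON-serialized arguments
// into another context and run it there.
function evaluateInContext(context, fn, ...args) {
  const source = `(${fn.toString()})(...${JSON.stringify(args)})`;
  return vm.runInContext(source, context);
}

// A fresh, empty context stands in for the browser page.
const pageContext = vm.createContext({});

// Node.js environment
const hello_world = 'Hello, world! (from Node.js)';

// Passed as an argument: the value arrives in the other context.
const passed = evaluateInContext(
  pageContext,
  (greeting) => `page got: ${greeting}`,
  hello_world
);

// Referenced through the closure: the name does not exist over there.
const captured = evaluateInContext(pageContext, () => typeof hello_world);
```

Here passed is the string 'page got: Hello, world! (from Node.js)', while captured is 'undefined', because the variable from the Node.js side is not visible in the other context.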
You can listen for the 'console' event to occur in the page context and print the result using page.on():
page.on('console', msg => {
for (let i = 0; i < msg.args().length; i++) {
console.log(`${i}: ${msg.args()[i]}`);
}
});
I am using Puppeteer for headless Chrome. I wish to evaluate a function inside the page that uses parts of other functions, defined dynamically elsewhere.
The code below is a minimal example / proof. In reality functionToInject() and otherFunctionToInject() are more complex and require the page's DOM.
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(someURL);
var functionToInject = function(){
return 1+1;
}
var otherFunctionToInject = function(input){
return 6
}
var data = await page.evaluate(function(functionToInject, otherFunctionToInject){
console.log('woo I run inside a browser')
return functionToInject() + otherFunctionToInject();
});
return data
When I run the code, I get:
Error: Evaluation failed: TypeError: functionToInject is not a function
Which I understand: functionToInject isn't being passed into the page's JS context. But how do I pass it into the page's JS context?
You can add the functions to the page context with addScriptTag:
const browser = await puppeteer.launch();
const page = await browser.newPage();
function functionToInject (){
return 1+1;
}
function otherFunctionToInject(input){
return 6
}
await page.addScriptTag({ content: `${functionToInject} ${otherFunctionToInject}`});
var data = await page.evaluate(function(){
console.log('woo I run inside a browser')
return functionToInject() + otherFunctionToInject();
});
console.log(data);
await browser.close();
This example is a quick-and-dirty solution that builds the script by turning the functions into strings. A cleaner approach is to pass a url or path to addScriptTag.
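What the template literal in the addScriptTag call produces can be checked in plain Node.js. In this sketch a vm context is an assumed stand-in for the page (it is not how Puppeteer injects scripts); scriptContent and pageContext are names chosen for illustration.

```javascript
const vm = require('vm');

function functionToInject() {
  return 1 + 1;
}
function otherFunctionToInject(input) {
  return 6;
}

// Interpolating a function into a template literal converts it to its
// source text, so this string holds two complete function declarations.
const scriptContent = `${functionToInject} ${otherFunctionToInject}`;

// A fresh context stands in for the page: it shares no variables with
// this script until the generated source is run inside it.
const pageContext = vm.createContext({});
vm.runInContext(scriptContent, pageContext);

// The declarations now exist inside the other context.
const data = vm.runInContext(
  'functionToInject() + otherFunctionToInject()',
  pageContext
);
```

Because only the source text travels, this works for self-contained functions like these two; a function that relies on variables from its surrounding Node.js scope would not find them on the other side.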
Or use exposeFunction (note that the exposed functions now return a Promise when called in the page):
const browser = await puppeteer.launch();
const page = await browser.newPage();
var functionToInject = function(){
return 1+1;
}
var otherFunctionToInject = function(input){
return 6
}
await page.exposeFunction('functionToInject', functionToInject);
await page.exposeFunction('otherFunctionToInject', otherFunctionToInject);
var data = await page.evaluate(async function(){
console.log('woo I run inside a browser')
return await functionToInject() + await otherFunctionToInject();
});
console.log(data);
await browser.close();
A working example is available via the link; the same repo contains the tested component.
it("click should return option value", async () => {
const optionToReturn = "ClickedOption";
const page = await newE2EPage();
const mockCallBack = jest.fn();
await page.setContent(
`<list-option option='${optionToReturn}'></list-option>`
);
await page.exposeFunction("functionToInject", mockCallBack); // Inject function
await page.$eval("list-option", (elm: any) => {
elm.onOptionSelected = this.functionToInject; // Assign function
});
await page.waitForChanges();
const element = await page.find("list-option");
await element.click();
expect(mockCallBack.mock.calls.length).toEqual(1); // Check calls
expect(mockCallBack.mock.calls[0][0]).toBe(optionToReturn); // Check argument
});
You can also use page.exposeFunction(), which makes your function return a Promise in the page (requiring the use of async and await). This happens because your function does not run inside the browser but inside your Node.js application, and its arguments and results are sent back and forth between Node.js and the browser code.
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(someURL);
var functionToInject = function(){
return 1+1;
}
var otherFunctionToInject = function(input){
return 6
}
await page.exposeFunction("functionToInject", functionToInject)
await page.exposeFunction("otherFunctionToInject", otherFunctionToInject)
var data = await page.evaluate(async function(){
console.log('woo I run inside a browser')
return await functionToInject() + await otherFunctionToInject();
});
return data
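Why the await matters can be seen in a browser-free sketch. The expose helper below is hypothetical and only imitates the round trip between page and Node.js by resolving on a later tick; it is not Puppeteer's exposeFunction.

```javascript
// Hypothetical stand-in for page.exposeFunction(): the wrapped function
// answers asynchronously, as if the call crossed a process boundary.
function expose(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      setImmediate(() => {
        try {
          resolve(fn(...args));
        } catch (err) {
          reject(err);
        }
      });
    });
}

const functionToInject = expose(() => 1 + 1);
const otherFunctionToInject = expose(() => 6);

// Plays the role of the function passed to page.evaluate().
async function pageCode() {
  // Without await, the two promises themselves are "added" together.
  const withoutAwait = functionToInject() + otherFunctionToInject();
  // With await, the resolved numbers are added.
  const withAwait = (await functionToInject()) + (await otherFunctionToInject());
  return { withoutAwait, withAwait };
}
```

In this sketch withAwait is 8, whereas withoutAwait is a string made of two stringified promises, which is the kind of surprise the async/await in the answer avoids.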
Recently, I used Puppeteer for a new project.
I have a few questions about a part of the API I don't understand. The documentation for these APIs is very brief:
page.exposeFunction
page.evaluateOnNewDocument
Can I have a detailed demo to gain a better understanding?
Summary:
The Puppeteer function page.exposeFunction() essentially allows you to access Node.js functionality within the Page DOM Environment.
On the other hand, page.evaluateOnNewDocument() evaluates a predefined function when a new document is created and before any of its scripts are executed.
The Puppeteer Documentation for page.exposeFunction() states:
page.exposeFunction(name, puppeteerFunction)
name <string> Name of the function on the window object
puppeteerFunction <function> Callback function which will be called in Puppeteer's context.
returns: <Promise>
The method adds a function called name on the page's window object. When called, the function executes puppeteerFunction in node.js and returns a Promise which resolves to the return value of puppeteerFunction.
If the puppeteerFunction returns a Promise, it will be awaited.
NOTE Functions installed via page.exposeFunction survive navigations.
An example of adding an md5 function into the page:
const puppeteer = require('puppeteer');
const crypto = require('crypto');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
page.on('console', msg => console.log(msg.text()));
await page.exposeFunction('md5', text =>
crypto.createHash('md5').update(text).digest('hex')
);
await page.evaluate(async () => {
// use window.md5 to compute hashes
const myString = 'PUPPETEER';
const myHash = await window.md5(myString);
console.log(`md5 of ${myString} is ${myHash}`);
});
await browser.close();
});
An example of adding a window.readfile function into the page:
const puppeteer = require('puppeteer');
const fs = require('fs');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
page.on('console', msg => console.log(msg.text()));
await page.exposeFunction('readfile', async filePath => {
return new Promise((resolve, reject) => {
fs.readFile(filePath, 'utf8', (err, text) => {
if (err)
reject(err);
else
resolve(text);
});
});
});
await page.evaluate(async () => {
// use window.readfile to read contents of a file
const content = await window.readfile('/etc/hosts');
console.log(content);
});
await browser.close();
});
Furthermore, the Puppeteer Documentation for page.evaluateOnNewDocument explains:
page.evaluateOnNewDocument(pageFunction, ...args)
pageFunction <function|string> Function to be evaluated in browser context
...args <...Serializable> Arguments to pass to pageFunction
returns: <Promise>
Adds a function which would be invoked in one of the following scenarios:
whenever the page is navigated
whenever the child frame is attached or navigated. In this case, the function is invoked in the context of the newly attached frame
The function is invoked after the document was created but before any of its scripts were run. This is useful to amend the JavaScript environment, e.g. to seed Math.random.
An example of overriding the navigator.languages property before the page loads:
// preload.js
// overwrite the `languages` property to use a custom getter
Object.defineProperty(navigator, "languages", {
get: function() {
return ["en-US", "en", "bn"];
}
});
// In your Puppeteer script, assuming preload.js is in the same folder as the script
const fs = require('fs');
const preloadFile = fs.readFileSync('./preload.js', 'utf8');
await page.evaluateOnNewDocument(preloadFile);
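The ordering described above (preload first, page scripts second, repeated for every new document) can be imitated without a browser. This is only a sketch under stated assumptions: loadDocument is a hypothetical helper, a vm context stands in for each new document, and the initial ['de-DE'] value is made up for the illustration.

```javascript
const vm = require('vm');

// Same content as preload.js above, kept inline so the sketch is self-contained.
const preloadSource = `
  Object.defineProperty(navigator, "languages", {
    get: function() {
      return ["en-US", "en", "bn"];
    }
  });
`;

// Hypothetical helper: every "new document" gets a fresh context, the
// preload source runs first, and only then does the page's own script run.
function loadDocument(pageScript) {
  const context = vm.createContext({ navigator: { languages: ['de-DE'] } });
  vm.runInContext(preloadSource, context);
  return vm.runInContext(pageScript, context);
}

// Two separate "navigations": each one sees the overridden property.
const first = loadDocument('navigator.languages.join(",")');
const second = loadDocument('navigator.languages[0]');
```

In both simulated documents the page script sees the overridden list rather than the initial value, which is the behaviour evaluateOnNewDocument is meant to provide for real pages.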