Passing functions inside $$eval or $eval functions, Puppeteer.js [duplicate] - javascript

Question
How do I expose an object with a bunch of methods to puppeteer? I am trying to retain the definition of the parent object and method (i.e. foo.one) within page.evaluate, if possible. In other words, I am looking for console.log(foo.one('world')), typed as such, to return world.
Background
foo is a library container which returns a whole bunch of (relatively) pure functions. These functions are required both in the main script context AND within the puppeteer browser. I would prefer to not have to redefine each of them within page.evaluate and instead pass this entire "package" to page.evaluate for repository readability/maintenance. Nonetheless, as one answer suggests below, iterating over the methods from foo and exposing them individually to puppeteer with a different name isn't a terrible option. It just would require redefinitions within page.evaluate which I am trying to avoid.
Expected vs Actual
Let's assume an immediately invoked function which returns an object with a series of function definitions as properties. When trying to pass this IIFE (or object) to puppeteer page, I receive the following error:
import puppeteer from 'puppeteer'
const foo = (()=>{
const one = (msg) => console.log('1) ' + msg)
const two = (msg) => console.log('2) ' + msg)
const three = (msg) => console.log('3) ' + msg)
return {one, two, three}
})()
const browser = await puppeteer.launch().catch(err => { throw new Error(`Browser not launched properly: ${err}`) })
const page = await browser.newPage()
page.on('console', (msg) => console.log('PUPPETEER:', msg.text())); // Pipe puppeteer console to local console
// foo is serialized before it is sent to the page; functions are not serializable, so foo.one is missing there
await page.evaluate((foo)=>{
console.log('hello')
console.log(foo.one('world'))
},foo)
await browser.close()
// Error: Evaluation failed: TypeError: foo.one is not a function
When I try to use page.exposeFunction I receive an error. This is to be expected because foo is an object.
await page.exposeFunction('foo', foo)
// Error: Failed to add page binding with name foo: [object Object] is not a function or a module with a default export.
The control case, defining the function within the browser page, works as expected:
import puppeteer from 'puppeteer'
const browser = await puppeteer.launch().catch(err => { throw new Error(`Browser not launched properly: ${err}`) })
const page = await browser.newPage()
page.on('console', (msg) => console.log('PUPPETEER:', msg.text())); // Pipe puppeteer console to local console
await page.evaluate(()=>{
const bar = (()=>{
const one = (msg) => console.log('1) ' + msg)
const two = (msg) => console.log('2) ' + msg)
const three = (msg) => console.log('3) ' + msg)
return {one, two, three}
})()
console.log('hello')
console.log(bar.one('world'))
})
await browser.close()
// PUPPETEER: hello
// PUPPETEER: 1) world
Update (5/19/2022)
Adding a quick update after testing the solutions below against my use case.
Reminder: I am trying to pass an externally defined utilities.js library to the browser so that it can conditionally interact with page data and navigate accordingly.
I'm open to any ideas or feedback!
addScriptTag()
Unfortunately, passing a node.js module of utility functions is very difficult in my situation. When the module contains export statements or objects, addScriptTag() fails.
I get Error: Evaluation failed: ReferenceError: {x} is not defined in this case. I created an intermediary function to remove the export statements. That is messy, but it seemed to work. However, some of my functions are IIFEs that return an object with methods, and objects are proving very hard to work with via addScriptTag(), to say the least.
redundant code
I think for smaller projects the simplest and best option is to just re-declare the objects/functions in the puppeteer context. I hate redefining things but it works as expected.
import()
As @ggorlen suggests, I was able to host the utilities library on another server, where both the Node.js and Puppeteer environments can load it. I still had to import the library twice: once in the Node.js environment and once in the browser context. But in my case that's probably better than redeclaring dozens of functions and objects.

It might be repetitive when calling, but you could iterate over the object and call page.exposeFunction for each.
await page.exposeFunction('fooOne', foo.one);
// ...
or
for (const [fnName, fn] of Object.entries(foo)) {
await page.exposeFunction(fnName, fn); // exposeFunction returns a Promise, so await each registration
}
If the functions can all be executed in the context of the page, simply defining them inside a page.evaluate would work too.
await page.evaluate(() => {
window.foo = (()=>{
const one = (msg) => console.log('1) ' + msg)
const two = (msg) => console.log('2) ' + msg)
const three = (msg) => console.log('3) ' + msg)
return {one, two, three}
})();
});
If you need a single object containing the functions in the page context, you could first put an object on the window with page.evaluate, then, in the main script, run a serial async loop over the keys and values of the object that:
calls page.exposeFunction('fnToMove', fn) with the function
calls page.evaluate, which assigns fnToMove to the desired property on the object created earlier
But that's somewhat convoluted. I wouldn't recommend it unless you really need it.
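For the record, here is a minimal sketch of that loop, assuming an already-open Puppeteer `page`. The `exposeObject` helper and its `__foo_one`-style binding names are hypothetical, not part of Puppeteer. Because each method still runs in Node, every call made from the page returns a Promise.

```javascript
// Hypothetical helper (not a Puppeteer API): rebuild a Node-side object of
// functions as window[objName] inside the page, one exposeFunction per method.
async function exposeObject(page, objName, obj) {
  // 1) Create the empty container object on the page's window.
  await page.evaluate((name) => {
    window[name] = {};
  }, objName);

  // 2) Serially expose each method under a unique binding name, then attach
  //    a thin wrapper to the container. Calls from the page return Promises.
  for (const [key, fn] of Object.entries(obj)) {
    const bindingName = `__${objName}_${key}`; // hypothetical naming scheme
    await page.exposeFunction(bindingName, fn);
    await page.evaluate(
      (name, prop, binding) => {
        window[name][prop] = (...args) => window[binding](...args);
      },
      objName,
      key,
      bindingName
    );
  }
}
```

Inside an async function you would call `await exposeObject(page, 'foo', foo)` and then, for example, `await page.evaluate(async () => foo.one('world'))`. The exposed bindings survive navigation, but the `window.foo` container does not, so after navigating you would need to recreate it and re-attach the wrappers rather than call the helper again (exposing the same binding name twice is likely to fail).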

This is a bit speculative, because the use case matters quite a bit here. For example, exposeFunction means the code runs in Node context, so that involves inter-process communication and data serialization and deserialization, which seems inappropriate for your use case of processing the data fully in the browser. Then again, if there are Node-specific tasks like reading files or making cross-origin requests, it's appropriate.
If, on the other hand, you want to add code for the browser to call in the console context, a scalable way is to put your library into a script, then use page.addScriptTag({path: "./your-lib.js"}) to attach it to the window. Either use a bundler to build the lib for browser compatibility or attach it by hand. Use module.exports if you also want to import it in Node.
For example:
foo.js
;(function () {
var foo = {
one: function () { return 1; },
two: function () { return 2; },
// ...
};
if (typeof module === "object" &&
typeof module.exports === "object") {
module.exports = foo;
}
if (typeof window === "object") {
window.foo = foo;
}
})();
foo-tester.js
const puppeteer = require("puppeteer"); // ^13.5.1
const foo = require("./foo"); // also use it in Node if you want...
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.addScriptTag({path: "./foo.js"});
console.log(foo.one()); // => 1
console.log(await page.evaluate(() => foo.two())); // => 2
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
addScriptTag also works for modules and raw JS strings. For example, this works too:
await page.addScriptTag({content: `
window.foo = {two() { return 2; }};
`});
console.log(await page.evaluate(() => foo.two())); // => 2
A hacky approach is to stringify the object of functions you may have in Node. I don't recommend this, but it's possible:
const foo = {
one() { return 1; },
two() { return 2; },
};
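// Only works if every property uses method shorthand, e.g. one() { ... };
// the mapped array is joined with commas when interpolated into the template
const fooToWindow = `window.foo = {
${Object.values(foo).map(fn => fn.toString())}
}`;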
await page.addScriptTag({content: fooToWindow});
console.log(await page.evaluate(() => foo.two())); // => 2
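A slightly sturdier sketch of the same hack emits `key: (source)` pairs, so it works for arrow functions and `function` expressions stored as properties (but not for method shorthand). The `serializeFunctions` helper is hypothetical. It is still limited to self-contained functions: anything a function captured from Node scope is lost in the string.

```javascript
// Hypothetical helper: turn an object whose values are arrow functions or
// function expressions into a script that recreates it as window[name].
// Assumes: no method shorthand (`one() {}`) and no closures over Node variables.
function serializeFunctions(name, obj) {
  const entries = Object.entries(obj).map(
    ([key, fn]) => `${JSON.stringify(key)}: (${fn.toString()})`
  );
  return `window[${JSON.stringify(name)}] = {\n  ${entries.join(",\n  ")}\n};`;
}

const foo = {
  one: () => 1,
  two: function () { return 2; },
  add: (a, b) => a + b,
};

const script = serializeFunctions("foo", foo);
console.log(script);
// In Puppeteer (inside an async function):
// await page.addScriptTag({ content: script });
// console.log(await page.evaluate(() => foo.add(2, 3)));
```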
See also Is there a way to use a class inside Puppeteer evaluate?

Related

Can I build a WebWorker that executes arbitrary Javascript code?

I'd like to build a layer of abstraction over the WebWorker API that would allow (1) executing an arbitrary function over a webworker, and (2) wrapping the interaction in a Promise. At a high level, this would look something like this:
function bake() {
... // expensive calculation
return 'mmmm, pizza'
}
async function handlePizzaButtonClick() {
const pizza = await workIt(bake)
eat(pizza)
}
(Obviously, methods with arguments could be added without much difficulty.)
My first cut at workIt looks like this:
async function workIt<T>(f: () => T): Promise<T> {
const worker: Worker = new Worker('./unicorn.js') // no such worker, yet
worker.postMessage(f)
return new Promise<T>((resolve, reject) => {
worker.onmessage = ({data}: MessageEvent) => resolve(data)
worker.onerror = ({error}: ErrorEvent) => reject(error)
})
}
This fails because functions are not structured-cloneable and thus can't be passed in worker messages. (The Promise wrapper part works fine.)
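That limitation is easy to reproduce outside a worker: `postMessage` copies data with the structured clone algorithm, which is also exposed as the global `structuredClone()` in current browsers and in Node.js 17+. A small check, assuming one of those runtimes (`isStructuredCloneable` is just an illustrative name):

```javascript
// Functions are not structured-cloneable, so postMessage(f) fails the same way.
function isStructuredCloneable(value) {
  try {
    structuredClone(value);
    return true;
  } catch (err) {
    // Browsers and Node.js throw a DOMException named "DataCloneError".
    console.log(`${err.name}: ${err.message}`);
    return false;
  }
}

console.log(isStructuredCloneable({ answer: 42 })); // true
console.log(isStructuredCloneable(() => "mmmm, pizza")); // false, after logging the error
console.log(isStructuredCloneable((() => "mmmm, pizza").toString())); // true: a string clones fine
```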
There are various options for serializing Javascript functions, some scarier than others. But before I go that route, am I missing something here? Is there another way to leverage a WebWorker (or anything that executes in a separate thread) to run arbitrary Javascript?
I thought an example would be useful in addition to my comment, so here's a basic (no error handling, etc.), self-contained example which loads the worker from an object URL:
Meta: I'm not posting it in a runnable code snippet view because the rendered iframe runs at a different origin (https://stacksnippets.net at the time I write this answer — see snippet output), which prevents success: in Chrome, I receive the error message Refused to cross-origin redirects of the top-level worker script..
Anyway, you can just copy the text contents, paste it into your dev tools JS console right on this page, and execute it to see that it works. And, of course, it will work in a normal module in a same-origin context.
console.log(new URL(window.location.href).origin);
// Example candidate function:
// - pure
// - uses only syntax which is legal in worker module scope
async function get100LesserRandoms () {
// If `getRandomAsync` were defined outside the function,
// then this function would no longer be pure (it would be a closure)
// and `getRandomAsync` would need to be a function accessible from
// the scope of the `message` event handler within the worker
// else a `ReferenceError` would be thrown upon invocation
const getRandomAsync = () => Promise.resolve(Math.random());
const result = [];
while (result.length < 100) {
const n = await getRandomAsync();
if (n < 0.5) result.push(n);
}
return result;
}
const workerModuleText =
`self.addEventListener('message', async ({data: {id, fn}}) => self.postMessage({id, value: await eval(\`(\${fn})\`)()}));`;
const workerModuleSpecifier = URL.createObjectURL(
new Blob([workerModuleText], {type: 'text/javascript'}),
);
const worker = new Worker(workerModuleSpecifier, {type: 'module'});
worker.addEventListener('message', ({data: {id, value}}) => {
worker.dispatchEvent(new CustomEvent(id, {detail: value}));
});
function notOnMyThread (fn) {
return new Promise(resolve => {
const id = window.crypto.randomUUID();
worker.addEventListener(id, ({detail}) => resolve(detail), {once: true});
worker.postMessage({id, fn: fn.toString()});
});
}
async function main () {
const lesserRandoms = await notOnMyThread(get100LesserRandoms);
console.log(lesserRandoms);
}
main();
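The worker above rebuilds the function from its source text with `eval`, which is why the candidate has to be pure. The same round trip can be checked in Node.js with `new Function`, used here as a stand-in for the worker's `eval`; `reviveFromSource` and `makeMultiplier` are hypothetical helpers. A pure function survives, while a closure loses the variable it captured.

```javascript
// Rebuild a function from its source text, as the worker does with eval().
// Only the text survives, not the scope the function was defined in.
function reviveFromSource(source) {
  return new Function(`return (${source});`)();
}

// A pure function: everything it needs is in its parameters.
const pure = (n) => n * 2;

// A closure: `factor` lives in makeMultiplier's scope, not in the source text.
function makeMultiplier(factor) {
  return (n) => n * factor;
}
const closure = makeMultiplier(3);

console.log(reviveFromSource(pure.toString())(21)); // 42

try {
  reviveFromSource(closure.toString())(2);
} catch (err) {
  console.log(err.name); // ReferenceError: `factor` is not defined there
}
```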

How to test calling to an async not-awaited function

I want to test the execution of both functionUT and an inner async, not-awaited function, externalCall, passed in by injection. The following code is a simple working example of my functions and their usage:
const sleep = async (ms) => new Promise( (accept) => setTimeout(() => accept(), ms) )
const callToExternaService = async () => sleep(1000)
const backgroundJob = async (externalCall) => {
await sleep(500) // Simulate in app work
await externalCall() // Simulate external call
console.log('bk job done')
return 'job done'
}
const appDeps = {
externalService: callToExternaService
}
const functionUT = async (deps) => {
await sleep(30) // Simulate func work
// await backgroundJob(deps.externalService) // This makes the test pass but slows down functionUT
backgroundJob(deps.externalService) // I don't want to wait, for performance reasons
.then( () => console.log('bk job ok') )
.catch( () => console.log('bk job error') )
return 'done'
}
functionUT( appDeps )
.then( (result) => console.log(result) )
.catch( err => console.log(err) )
module.exports = {
functionUT
}
Here is a simple Jest test case that fails, but only for timing reasons:
const { functionUT } = require('./index')
describe('test', () => {
it('should pass', async () => {
const externaServiceMock = jest.fn()
const fakeDeps = {
externalService: externaServiceMock
}
const result = await functionUT(fakeDeps)
expect(result).toBe('done')
expect(externaServiceMock).toBeCalledTimes(1) // Fails here, but only for timing reasons
})
})
What is the correct way to test that externaServiceMock is called (making the test pass) without slowing down functionUT?
I have already found similar questions, but they only address a simplified version of the problem.
how to test an embedded async call
Indeed, you can't test for callToExternaService to be called "some time later".
You can, however, mock backgroundJob and test that it was called with the expected arguments (before functionUT completes), as well as unit test backgroundJob on its own.
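One way to mock backgroundJob without a module-level mock is to inject it as another dependency. This is a sketch under that assumption: `deps.backgroundJob` and the `makeSpy` helper are hypothetical, not part of the question's code, and a hand-rolled spy stands in for `jest.fn()`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// functionUT with backgroundJob injected (hypothetical deps.backgroundJob)
const functionUT = async (deps) => {
  await sleep(30); // simulate function work
  deps
    .backgroundJob(deps.externalService) // still fire-and-forget
    .then(() => console.log("bk job ok"))
    .catch(() => console.log("bk job error"));
  return "done";
};

// Minimal hand-rolled spy: records the arguments of every call and resolves.
function makeSpy() {
  const spy = async (...args) => {
    spy.calls.push(args);
  };
  spy.calls = [];
  return spy;
}
```

Because the spy is called synchronously before functionUT returns, the assertion needs no waiting. With Jest, `jest.fn().mockResolvedValue()` can play the spy's role (a bare `jest.fn()` returns `undefined`, and calling `.then` on it would throw), and the check becomes `expect(backgroundJob).toHaveBeenCalledWith(fakeDeps.externalService)`.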
If a promise exists but cannot be reached in a place that relies on its settlement, this is a potential design problem. A module that performs asynchronous side effects on import is another. Both concerns affect testability, and they can also affect the application if the way it works changes.
Given that there's a promise, you can choose whether or not to chain it in a specific place. That doesn't mean it should be thrown away. In this specific case, it can be returned from the function that doesn't chain it.
A common way to do this is to preserve a promise at every point in case it's needed later, at least for testing purposes, but possibly also for clean shutdown, extension, etc.
const functionUT = async (deps) => {
await sleep(30) // Simulate func work
return {
status: 'done',
backgroundJob: backgroundJob(deps.externalService)...
};
}
const initialization = functionUT( appDeps )...
module.exports = {
functionUT,
initialization
}
In this form it's supposed to be tested like:
beforeAll(async () => {
let result = await initialization;
await result.backgroundJob;
});
...
let result = await functionUT(fakeDeps);
expect(result.status).toBe('done')
await result.backgroundJob;
expect(externaServiceMock).toBeCalledTimes(1);
Not waiting for initialization can leave an open handle if the test suite is short enough, which causes a justified warning from Jest.
The test can be made faster by using Jest fake timers in right places together with flush-promises.
The functionUT( appDeps ) call can be extracted from the module so that it causes a side effect only where it's needed, e.g. in the entry point. This way it won't interfere with the rest of the tests that use this module. Also, at least some functions can be extracted to their own modules to make them mockable and improve testability (backgroundJob, as another answer suggests), because they cannot be mocked separately when they are declared in the same module the way they are.
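A self-contained version of the "return the promise" pattern, runnable in plain Node.js with shortened timings (the stand-in `sleep` and the counter-based fake service are assumptions, not the question's exact code): functionUT still resolves without waiting for the job, but it hands back the job's promise so a caller or test can await it.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const backgroundJob = async (externalCall) => {
  await sleep(50); // simulate in-app work
  await externalCall(); // simulate the external call
  return "job done";
};

const functionUT = async (deps) => {
  await sleep(10); // simulate function work
  // Start the job but do not await it here; keep the promise instead.
  const job = backgroundJob(deps.externalService);
  job.catch(() => console.log("bk job error")); // avoid an unhandled rejection
  return { status: "done", backgroundJob: job };
};
```

The `.catch` attached inside functionUT keeps a failed job from becoming an unhandled rejection when nobody awaits it, while a test that does await `result.backgroundJob` still sees the failure.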

js only - run a function only once

I'm fetching some data from firebase and would like to run async/await function (to fetch data) only once upon the first page load. I'm used to React and lifecycle methods / hooks doing it but this little project is just too small to use React. I just need to run this function once, fetch the data, save it to a variable and do not make any further calls to firebase api in the same session.
async function getEntries() {
const snapshot = await firebase.firestore().collection('riders').get()
// Do my thing with the data, etc.
// console.log(snapshot.docs.map(doc => doc.data()));
}
Is there any js-only way of running this function only once when the page loads?
If you call a function just once, why do you need the function at all?
const snapshot = await firebase.firestore().collection('riders').get()
// Do my thing with the data, etc.
// console.log(snapshot.docs.map(doc => doc.data()));
This top-level await only works in modules, and it blocks all dependent modules from loading until it settles. If that is not necessary (they don't depend on the data), or if you don't want to write a module, you can wrap the code in an async IIFE and store the returned promise in a variable:
const dataPromise = (async function() {
//...
return data;
})();
While the data is loading, you might want to show a loading icon or similar. In React, that can easily be done with the following hook:
import { useState, useEffect } from 'react';
function usePromise(p) {
const [state, setState] = useState(null);
useEffect(() => { p.then(setState); }, []);
return state;
}
// Inside a component:
const data = usePromise(dataPromise);
if(data === null)
return <Loading />;
// show data
Yes. You can use a self-invoking (self-executing) function, more precisely an immediately invoked function expression (IIFE). The syntax is:
(function(){})();
The last pair of parentheses invokes the function, which is anonymous.
You can implement it this way:
(async function () {
const snapshot = await firebase.firestore().collection('riders').get()
})();
This way, the function can never be called again, so it runs only once.
Tutorial: https://blog.mgechev.com/2012/08/29/self-invoking-functions-in-javascript-or-immediately-invoked-function-expression/
The question you asked is essentially a duplicate and is answered here: Function in JavaScript that can be called only once
What you are looking for is memoization of the function result. Several libraries support this, including React.
There's also a handmade pattern, from JavaScript: The Good Parts, where the function replaces its own implementation after it's called once:
async function getEntries() {
const snapshot = await firebase.firestore().collection('riders').get()
// Do my thing with the data, etc.
// console.log(snapshot.docs.map(doc => doc.data()));
getEntries = async function(){
return snapshot
}
return snapshot
}
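The self-redefining version still fires two requests if getEntries is called again before the first call has finished. A sketch that caches the promise rather than the result avoids this, because every caller shares the same in-flight promise. The `once` helper and the `fetchRiders` stand-in for the Firestore query are hypothetical:

```javascript
// Wrap an async function so it runs at most once; later calls, including
// ones made while the first call is still pending, get the same promise.
function once(fn) {
  let promise = null;
  return function (...args) {
    if (promise === null) {
      promise = fn.apply(this, args);
    }
    return promise;
  };
}

// Stand-in for firebase.firestore().collection('riders').get()
let requests = 0;
async function fetchRiders() {
  requests += 1;
  return [{ name: "rider 1" }];
}

const getEntries = once(fetchRiders);
```

A rejected first call is cached too; if you want retries, reset `promise` to `null` in a `.catch` inside the wrapper.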
I think you can run it on the load event when the page is first loaded and then store a flag in a cookie or localStorage. You can check this flag on subsequent page loads. You can do this quickly using jQuery:
$(window).on('load', function() {
var item = localStorage.getItem('test');
if(item != null){
// already ran on an earlier load: skip the fetch (or reuse stored data)
}
else {
// first load: your code goes here, then remember that it ran
localStorage.setItem('test', 1);
}
});
The simplest way is to make a global variable like:
let isCalled = false;
and in the function body do:
if(isCalled) return;
isCalled = true; // set the flag before any other return or await, so overlapping calls also bail out
//the stuff the function would do

How to use top level Async await released in v8 typescript

I'm trying hard to understand what exactly this new feature (top level async await) means from v8 features list
When I try it in vanilla JS, the results seem the same to me. Here's what I tried:
(() => {
let test1 = async() =>
async() => {
return 'true';
};
(async() => {
let result = await test1();
result = await result();
console.log('r', result)
})();
})()
I want to know what exactly this feature means and how to use it.
Here is V8's documentation. To me it is pretty self-descriptive, and it's a very handy feature for me personally.
Previously, you couldn't just write await someAsyncFunction() out of nowhere, because await was only allowed inside an async function.
Example:
main.js
const fs = require('fs');
const util = require('util');
const unlink = util.promisify(fs.unlink); // promisify unlink function
await unlink('file_path'); // delete file
The above code would not work. The last line would give you an error. So, what we did previously is something like this:
async function main() {
const fs = require('fs');
const util = require('util');
const unlink = util.promisify(fs.unlink); // promisify unlink function
await unlink('file_path'); // delete file
}
main();
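But now you don't (!) have to do this: the first snippet works, provided the file is loaded as an ES module (for example a .mjs file, or a package with "type": "module"), since top-level await is not available in CommonJS. In an ES module you would also use import instead of require.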
THIS ANSWER IS BASED ON MY UNDERSTANDING
Top level async await allows you to await Promises returned by async functions at the top level of a module, without having to declare a separate async function. Most importantly, you can now conveniently export values returned by async functions.
For example, without this feature, you need to create a separate async function (the usual "main" async function), or use Promise.then in order to do something with the returned value at the top-level, and cannot simply export the returned value.
let test = async () => 'true';
test().then(result => console.log('r', result));
// or even more verbose
(async () => {
console.log(await test());
})();
// This exports a Promise, not the returned value, "true".
export let result = test();
// This throws an Error because export should be at the top-level.
(async () => {
export let result = await test();
})();
But with this new feature, you can simply do:
let test = async () => 'true';
export let result = await test();
console.log(result);
This feature is especially useful when you want to export a value that has to be obtained asynchronously; for example, a value you get from network at run-time, or a module like a big encryption suite that is large and loads slowly and asynchronously.
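To see the difference between exporting the promise and exporting the awaited value without creating files, the sketch below imports two tiny ES modules from `data:` URLs. It assumes Node.js 14.8 or later (top-level await in ES modules plus `data:` URL imports); the module sources are made up for illustration.

```javascript
// Turn ES module source text into an importable data: URL (Node.js feature).
const moduleUrl = (source) =>
  "data:text/javascript," + encodeURIComponent(source);

// Without top-level await: importers receive the Promise itself.
const withoutTla = moduleUrl(`
  const test = async () => "true";
  export const result = test();
`);

// With top-level await: the module finishes evaluating only after the await,
// so importers receive the resolved value.
const withTla = moduleUrl(`
  const test = async () => "true";
  export const result = await test();
`);

async function main() {
  const a = await import(withoutTla);
  const b = await import(withTla);
  console.log(a.result instanceof Promise); // true
  console.log(b.result); // true (the string "true")
}

main().catch((err) => console.error(err));
```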

How to use evaluateOnNewDocument and exposeFunction?

Recently, I used Puppeteer for a new project.
I have a few questions about a part of the API I don't understand. The documentation for these APIs is very brief:
page.exposeFunction
page.evaluateOnNewDocument
Can I have a detailed demo to gain a better understanding?
Summary:
The Puppeteer function page.exposeFunction() essentially allows you to access Node.js functionality within the Page DOM Environment.
On the other hand, page.evaluateOnNewDocument() evaluates a predefined function when a new document is created and before any of its scripts are executed.
The Puppeteer Documentation for page.exposeFunction() states:
page.exposeFunction(name, puppeteerFunction)
name <string> Name of the function on the window object
puppeteerFunction <function> Callback function which will be called in Puppeteer's context.
returns: <Promise>
The method adds a function called name on the page's window object. When called, the function executes puppeteerFunction in node.js and returns a Promise which resolves to the return value of puppeteerFunction.
If the puppeteerFunction returns a Promise, it will be awaited.
NOTE Functions installed via page.exposeFunction survive navigations.
An example of adding an md5 function into the page:
const puppeteer = require('puppeteer');
const crypto = require('crypto');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
page.on('console', msg => console.log(msg.text()));
await page.exposeFunction('md5', text =>
crypto.createHash('md5').update(text).digest('hex')
);
await page.evaluate(async () => {
// use window.md5 to compute hashes
const myString = 'PUPPETEER';
const myHash = await window.md5(myString);
console.log(`md5 of ${myString} is ${myHash}`);
});
await browser.close();
});
An example of adding a window.readfile function into the page:
const puppeteer = require('puppeteer');
const fs = require('fs');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
page.on('console', msg => console.log(msg.text()));
await page.exposeFunction('readfile', async filePath => {
return new Promise((resolve, reject) => {
fs.readFile(filePath, 'utf8', (err, text) => {
if (err)
reject(err);
else
resolve(text);
});
});
});
await page.evaluate(async () => {
// use window.readfile to read contents of a file
const content = await window.readfile('/etc/hosts');
console.log(content);
});
await browser.close();
});
Furthermore, the Puppeteer Documentation for page.evaluateOnNewDocument explains:
page.evaluateOnNewDocument(pageFunction, ...args)
pageFunction <function|string> Function to be evaluated in browser context
...args <...Serializable> Arguments to pass to pageFunction
returns: <Promise>
Adds a function which would be invoked in one of the following scenarios:
whenever the page is navigated
whenever the child frame is attached or navigated. In this case, the function is invoked in the context of the newly attached frame
The function is invoked after the document was created but before any of its scripts were run. This is useful to amend the JavaScript environment, e.g. to seed Math.random.
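The `Math.random` seeding mentioned there could look like the sketch below. The `seedMathRandom` helper and the choice of the mulberry32 generator are illustrative, not Puppeteer APIs. The function passed to evaluateOnNewDocument has to be self-contained, because Puppeteer sends it to the page as source text.

```javascript
// Hypothetical helper: make Math.random deterministic in every new document.
async function seedMathRandom(page, seed) {
  await page.evaluateOnNewDocument((initialSeed) => {
    // mulberry32: a small 32-bit PRNG, defined inline so it ships with the function
    let state = initialSeed >>> 0;
    Math.random = function () {
      state = (state + 0x6d2b79f5) >>> 0;
      let t = state;
      t = Math.imul(t ^ (t >>> 15), t | 1);
      t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
  }, seed);
}
```

Called once before `page.goto`, for example `await seedMathRandom(page, 42)`, every document loaded afterwards starts from the same random sequence.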
An example of overriding the navigator.languages property before the page loads:
// preload.js
// overwrite the `languages` property to use a custom getter
Object.defineProperty(navigator, "languages", {
get: function() {
return ["en-US", "en", "bn"];
}
});
// In your Puppeteer script, assuming preload.js is in the same folder as the script
const fs = require('fs');
const path = require('path');
const preloadFile = fs.readFileSync(path.join(__dirname, 'preload.js'), 'utf8');
await page.evaluateOnNewDocument(preloadFile);
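Instead of reading a separate file, the override can also be passed as a function, with the values supplied through the `...args` parameter from the signature quoted earlier. A sketch, where `spoofLanguages` is a hypothetical helper name:

```javascript
// Hypothetical helper: override navigator.languages before any page script runs.
async function spoofLanguages(page, languages) {
  await page.evaluateOnNewDocument((langs) => {
    Object.defineProperty(navigator, "languages", {
      get: () => langs,
    });
  }, languages);
}

// Usage (inside an async function, before page.goto):
//   await spoofLanguages(page, ["en-US", "en", "bn"]);
//   await page.goto("https://example.com");
//   console.log(await page.evaluate(() => navigator.languages));
```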
