<script setup>
import puppeteer from 'puppeteer';
const onChange = async () => {
// Launch the browser
const browser = await puppeteer.launch();
// Create a page
const page = await browser.newPage();
// Go to your site
await page.goto('https://stackoverflow.com');
// Evaluate JavaScript
const three = await page.evaluate(() => {
return 1 + 2;
});
console.log(three);
// Close browser.
await browser.close();
};
console.log(onChange);
</script>
Here I am trying to scrape a website using Puppeteer, but I get this error:
ReferenceError: process is not defined
From the official page:
Puppeteer is a Node.js library
The error appears because process is a Node.js global that does not exist in the browser, which is where Vue.js (a client-side framework) runs. You should probably not try to get Puppeteer working there; you will have a hard time for little benefit.
Use regular Node.js for the scraping.
Otherwise, you could give Nuxt 3 a try with its SSR capabilities, if you think that having Puppeteer and Vue side by side is crucial.
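A minimal sketch of the "regular Node.js" route, assuming a small HTTP server runs next to the Vue app: Puppeteer stays on the server, and the component would call the endpoint with fetch instead of importing Puppeteer. The names scrape and createScrapeServer, the /scrape path, and port 3000 are illustrative assumptions, not part of the original answer.

```javascript
// server.js: runs under Node.js, never inside the Vue bundle.
const http = require('node:http');

// Same steps as the question's onChange, moved to the server.
// `launch` is any function returning a Puppeteer browser,
// e.g. () => require('puppeteer').launch()
async function scrape(launch, url) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Runs inside the page; the result is serialized back to Node.
    return await page.evaluate(() => 1 + 2);
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}

// Hypothetical endpoint: GET /scrape responds with {"result": ...}.
function createScrapeServer(launch, url) {
  return http.createServer(async (req, res) => {
    if (req.method !== 'GET' || req.url !== '/scrape') {
      res.writeHead(404).end();
      return;
    }
    try {
      const result = await scrape(launch, url);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ result }));
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}

// To start it (requires `npm install puppeteer`):
// createScrapeServer(() => require('puppeteer').launch(), 'https://stackoverflow.com').listen(3000);
```

On the Vue side, onChange would then reduce to something like `const { result } = await (await fetch('/scrape')).json()`, assuming the dev server proxies /scrape to this process or the server sends suitable CORS headers.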
Related
I am creating a bot that can access websites, and I am building a GUI for it using Flutter. I want to write the bot's website-access code in JavaScript, but I am struggling to find anything that lets me execute a script at the click of a button. I have attempted to use flutter_js, but when I press the button to call the script, I get this error:
SyntaxError: expecting '('
at <eval>:1
JSError (SyntaxError: expecting '('
at <eval>:1
)
The Flutter code that calls into JavaScript is:
import 'package:flutter/services.dart';
import 'package:flutter_js/flutter_js.dart';
import 'package:flutter_js/quickjs/qjs_typedefs.dart';
void addFromJs(JavascriptRuntime _javascriptRuntime)
async{
String test_code = await rootBundle.loadString("assets/bot_code/testcode.js");
final eval = _javascriptRuntime.evaluate(test_code+ """openTrapstar()""");
final res = eval.stringResult;
}
And the code for the js script is:
import * as puppeteer from 'puppeteer';
function openTrapstar(){
(async () => {
const browser = await puppeteer.launch({headless: false});
const page = await browser.newPage();
await page.goto('https://uk.trapstarlondon.com/');
console.log("HERE");
await browser.close();
})();
};
openTrapstar();
addFromJs is then called in the onPressed callback of the button.
Based on this post, I want to know what I am doing wrong with flutter_js, or whether it is even possible to run a script like this in flutter_js.
If not, how can I go about achieving this goal?
I am using Puppeteer to get page data, but unfortunately there is no way for me to make all of the requests myself.
So the question is: after opening the site, how do I get the JSON contained in the responses of all Fetch/XHR requests named v2?
As I understand it, this requires waiting for the responses.
I cannot inspect a request and its body and then replay a similar request, because the body contains a code that is randomly generated each time. That is why I simply need to print all JSON responses from requests named v2.
I am attaching a screenshot and my code. Please point me in the right direction; I will be grateful for any help!
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
import puppeteer from "puppeteer-extra";
// add stealth plugin and use defaults (all evasion techniques)
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
export async function ProductAPI() {
try {
puppeteer.use(StealthPlugin())
const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
await page.goto('here goes link for website');
const pdata = await page.content() // this just prints HTML
console.log(pdata)
browser.close();
} catch (err) {
throw err
}
}(ProductAPI())
link for image: https://i.stack.imgur.com/ZR6T1.png
I know that the code I wrote just returns HTML. I am trying to figure out how to get the data I need; I googled for a very long time but could not find the answer.
It is very important that this runs on Node.js (JavaScript); it doesn't matter whether it uses Puppeteer or something else.
This works!
import puppeteer from "puppeteer-extra";
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
async function SomeFunction () {
puppeteer.use(StealthPlugin())
const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
page.on('response', async (response) => {
if(response.url().includes('write_link_here')){
console.log('XHR response received');
const HTMLdata = await response.text()
console.log(HTMLdata)
}
});
await page.goto('some_website_link');
}
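Because the question asks specifically for the JSON of Fetch/XHR requests named v2, the listener can be narrowed further. The following is only a sketch: collectV2Json is a hypothetical helper name, and it assumes that "v2" appears as a path segment of the request URL (the real URLs are only visible in the screenshot).

```javascript
// Collects parsed JSON bodies from Fetch/XHR responses whose URL path
// contains a "v2" segment. Attach before page.goto() so nothing is missed.
function collectV2Json(page, results = []) {
  page.on('response', async (response) => {
    // Ignore images, scripts, documents and other resource types.
    const type = response.request().resourceType();
    if (type !== 'xhr' && type !== 'fetch') return;

    // Assumption: the request "name" shows up as a path segment.
    const { pathname } = new URL(response.url());
    if (!pathname.split('/').includes('v2')) return;

    try {
      results.push(await response.json());
    } catch (err) {
      // Body was not valid JSON or is no longer available; skip it.
    }
  });
  return results;
}

// Usage inside an async function, with `page` created as above:
// const results = collectV2Json(page);
// await page.goto('some_website_link', { waitUntil: 'networkidle2' });
// console.log(results);
```

The handler is asynchronous, so a body may still be arriving when page.goto() resolves; waiting for network idle, as in the usage comment, is one way to give pending responses time to finish.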
I am trying to upload a file using puppeteer and browserWSEndpoint, the error message I am getting is
"Uncaught (in promise) Error: File chooser handling does not work with multiple connections to the same page".
Here is my code:
const puppeteer = require('puppeteer');
async function getTest() {
const browser = await puppeteer.connect({
browserWSEndpoint: 'wss://chrome.browserless.io'
});
const page = (await browser.pages())[0];
await page.goto('https://someWebSite');
//DO STUFF
console.log("before upload"); //code runs until here
const [fileChooser] = await Promise.all([page.waitForFileChooser(),page.click('#uploadTrigger'),]);
await fileChooser.accept(['C:\\myProgram\\pic.jpg']);
await page.click('#edit-submit');
}
getTest().then(console.log);
I must mention that if I don't use browserWSEndpoint, and use this code at the beginning instead, everything works fine.
const browser = await puppeteer.launch({headless: false, defaultViewport:null});
Honestly, I am pretty lost with browserWSEndpoint. I used information from the post "How to run Puppeteer code in any web browser?",
which led me to browserless.io; I copied the code and it works.
Now for my precise question: the error says that file chooser handling does not work with multiple connections to the same page. How exactly am I making multiple connections? If I can resolve this, I could then use const [fileChooser].
My main issue is that I need to upload a file using browserless.
Others seem to have the same problem according to https://github.com/GoogleChrome/puppeteer/issues/4783, but using a local Chromium is not an option if I want to use browserless.
If you are the only client connected to that browser, then the browser you are connected to must be one that doesn't support the file chooser. You should connect to Chromium 77.0.3844.0 (r674921) or higher.
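To see which browser the remote endpoint actually provides, the string returned by browser.version() can be compared against the Chromium 77 threshold. A small sketch: chromeMajorVersion and supportsFileChooser are hypothetical helper names, and the code assumes the version string has the usual Product/major.minor.build.patch shape, such as HeadlessChrome/77.0.3844.0.

```javascript
// Extracts the major version from a string such as
// "HeadlessChrome/77.0.3844.0"; returns null if the shape is unexpected.
function chromeMajorVersion(versionString) {
  const match = /\/(\d+)\./.exec(versionString);
  return match ? Number(match[1]) : null;
}

// File chooser interception needs Chromium 77 or newer (per the answer above).
function supportsFileChooser(versionString) {
  const major = chromeMajorVersion(versionString);
  return major !== null && major >= 77;
}

// Usage inside an async function, with a connected browser:
// const browser = await puppeteer.connect({ browserWSEndpoint });
// console.log(supportsFileChooser(await browser.version()));
```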
I want to start a Chromium browser instance headless, do some automated operations, and then make it visible before doing the rest.
Is this possible using Puppeteer, and if so, can you tell me how? If not, is there any other framework or library for browser automation that can do this?
So far I've tried the following but it didn't work.
const browser = await puppeteer.launch({'headless': false});
browser.headless = true;
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
await page.pdf({path: 'hn.pdf', format: 'A4'});
Short answer: It's not possible
Chrome can only be started in either headless or non-headless mode. You have to specify the mode when you launch the browser, and it is not possible to switch at runtime.
What is possible, is to launch a second browser and reuse cookies (and any other data) from the first browser.
Long answer
You would assume that you could just reuse the data directory when calling puppeteer.launch, but this is currently not possible due to multiple bugs (#1268, #1270 in the puppeteer repo).
So the best approach is to save any cookies or local storage data that you need to share between the browser instances and restore the data when you launch the second browser. You then visit the website a second time. Be aware that any state the website holds in JavaScript variables will be lost when you revisit the page.
Process
Summing up, the whole process should look like this (or vice versa for headless to headful):
1. Crawl in non-headless mode until you want to switch mode
2. Serialize cookies
3. Launch or reuse second browser (in headless mode)
4. Restore cookies
5. Revisit page
6. Continue crawling
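The steps above can be sketched as follows, assuming cookies are the only state that needs to carry over. reopenWithCookies is a hypothetical helper name, and closing the first browser (rather than keeping it around for reuse) is a choice made here for simplicity. page.cookies() and page.setCookie() are the page-level cookie calls; check the documentation for your Puppeteer version, since the cookie API has shifted over time.

```javascript
// Serializes cookies from `page`, closes its browser, then launches a new
// browser in the other mode, restores the cookies and revisits the URL.
// `launch` is e.g. (opts) => puppeteer.launch(opts).
async function reopenWithCookies(launch, page, url, headless) {
  // Serialize cookies from the current page.
  const cookies = await page.cookies();
  await page.browser().close();

  // Launch the second browser in the requested mode.
  const browser = await launch({ headless });
  const newPage = await browser.newPage();

  // Restore cookies before navigating so the first request carries them.
  if (cookies.length > 0) {
    await newPage.setCookie(...cookies);
  }

  // Revisit the page; in-page JavaScript state starts from scratch.
  await newPage.goto(url);
  return { browser, page: newPage };
}

// Usage inside an async function, switching from headful to headless:
// const { browser, page: headlessPage } = await reopenWithCookies(
//   (opts) => puppeteer.launch(opts), page, page.url(), true);
```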
As mentioned, this isn't currently possible since the headless switch occurs via Chromium launch flags.
I usually do this with userDataDir, which the Chromium docs describe as follows:
The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.
Here's a simple example. This launches a browser headlessly, sets a local storage value on an arbitrary page, closes the browser, re-opens it headfully, retrieves the local storage value and prints it.
const puppeteer = require("puppeteer"); // ^18.0.4
const url = "https://www.example.com";
const opts = {userDataDir: "./data"};
let browser;
(async () => {
{
browser = await puppeteer.launch({...opts, headless: true});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.evaluate(() => localStorage.setItem("hello", "world"));
await browser.close();
}
{
browser = await puppeteer.launch({...opts, headless: false});
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
const result = await page.evaluate(() => localStorage.getItem("hello"));
console.log(result); // => world
}
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Change const opts = {userDataDir: "./data"}; to const opts = {}; and you'll see null print instead of world; the user data doesn't persist.
The answer from a few years ago mentions issues with userDataDir and suggests a cookies solution. That's fine, but I haven't had any issues with userDataDir so either they've been resolved on the Puppeteer end or my use cases haven't triggered the issues.
There's a useful-looking answer from a reputable source in How to turn headless on after launch? but I haven't had a chance to try it yet.
What I would like to do is load a page, get the content of an element through an XPath, a selector, or a JS path, and then use that value in my program. How could I do that?
For instance, on this page, making a request using the URL of the page and following this path (while also targeting the type somehow; here it is the class):
//*[@id="question-header"]/h1/a
would give me 'Load any url content and follow XPATH in JS',
since I am getting the text inside this element:
Load any url content and follow XPATH in JS
If you need the most reliable way to get data from a web page, including data generated by JavaScript running on the client side, you can drive a headless browser. For example, the described task can be accomplished with Node.js and puppeteer in this script. Selectors and XPath are supported, as is the whole Web API, by evaluating code fragments in the browser context and exchanging data between the Node.js and browser contexts:
'use strict';
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto('https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js');
const data = await page.evaluate(() => {
return document.querySelector('#question-header > h1 > a').innerText;
});
console.log(data);
await browser.close();
} catch (err) {
console.error(err);
}
})();
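Since the question asks about XPath specifically, the same lookup can also be done with the browser's own document.evaluate inside page.evaluate. This is a sketch rather than the answer's method: textByXPath is a hypothetical helper name, and the XPath in the usage comment uses @id, the standard attribute syntax.

```javascript
// Returns the trimmed text content of the first node matching `xpath`,
// or null if nothing matches. The callback runs in the browser, so
// `document` and `XPathResult` are the page's globals, not Node's.
async function textByXPath(page, xpath) {
  return page.evaluate((expr) => {
    const node = document.evaluate(
      expr,
      document,
      null,
      XPathResult.FIRST_ORDERED_NODE_TYPE,
      null
    ).singleNodeValue;
    return node ? node.textContent.trim() : null;
  }, xpath);
}

// Usage, with `page` already navigated as in the script above:
// const title = await textByXPath(page, '//*[@id="question-header"]/h1/a');
// console.log(title);
```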
Well, you could use something like
document.getElementById('question-header').children[0].children[0].href;
It's not as dynamic as XPath (note the repeated children lookups), but it should do the trick if you're facing a static structure. For Node.js there are several libraries that could do this as well, such as libxmljs or parse5 - more on this here.