I am using Puppeteer to get page data, but unfortunately I cannot simply replicate all of the requests myself.
So the question is: after opening the site, how do I get the JSON contained in the responses of every Fetch/XHR request whose name is v2?
As I understand it, this requires some form of waiting.
Peeking at one of these requests and replaying it is not an option, because the request body contains a code that is generated randomly each time. That is why I just want to print all JSON responses from the requests named v2.
I am attaching a screenshot and my code. Please point me in the right direction; I will be grateful for any help!
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
import puppeteer from "puppeteer-extra";
// add stealth plugin and use defaults (all evasion techniques)
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

export async function ProductAPI() {
  try {
    puppeteer.use(StealthPlugin());
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.goto('here goes link for website');
    const pdata = await page.content(); // this just prints HTML
    console.log(pdata);
    await browser.close();
  } catch (err) {
    throw err;
  }
}

ProductAPI();
Link to the screenshot: https://i.stack.imgur.com/ZR6T1.png
I know that the code I wrote just returns the HTML. I am only trying to figure out how to get the data I need; I googled for a very long time but could not find the answer.
It is very important that this runs on Node.js (JavaScript); whether it is Puppeteer or something else does not matter.
This works!
import puppeteer from "puppeteer-extra";
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

async function SomeFunction() {
  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  page.on('response', async (response) => {
    if (response.url().includes('write_link_here')) {
      console.log('XHR response received');
      const HTMLdata = await response.text();
      console.log(HTMLdata);
    }
  });

  await page.goto('some_website_link');
}
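Building on that, here is a minimal sketch of how the same listener can collect parsed JSON from every response whose URL contains v2, which is what I was originally after. The 'v2' substring check, the resource-type filter, and the networkidle2 wait are assumptions; adjust them to the actual request names on your site.

import puppeteer from "puppeteer-extra";
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

async function collectV2Responses(url) {
  puppeteer.use(StealthPlugin());
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  const results = [];

  page.on('response', async (response) => {
    // only consider Fetch/XHR requests
    const type = response.request().resourceType();
    if (type !== 'xhr' && type !== 'fetch') return;
    // only consider responses whose URL contains "v2" (assumption)
    if (!response.url().includes('v2')) return;
    try {
      // response.json() parses the body; it throws if the body is not valid JSON
      results.push(await response.json());
    } catch (err) {
      // ignore non-JSON bodies
    }
  });

  // wait until the network is (almost) idle so late XHR calls are captured
  await page.goto(url, { waitUntil: 'networkidle2' });
  await browser.close();
  return results;
}

collectV2Responses('some_website_link').then((json) => console.log(json));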
<script setup>
import puppeteer from 'puppeteer';

const onChange = async () => {
  // Launch the browser
  const browser = await puppeteer.launch();
  // Create a page
  const page = await browser.newPage();
  // Go to your site
  await page.goto('https://stackoverflow.com');
  // Evaluate JavaScript
  const three = await page.evaluate(() => {
    return 1 + 2;
  });
  console.log(three);
  // Close browser.
  await browser.close();
};

console.log(onChange);
</script>
Here I am trying to scrape a website using Puppeteer, but I get this error:
ReferenceError: process is not defined
From the official page:
Puppeteer is a Node.js library
You should probably not try to get it working inside VueJS (a client-side framework); you will have a harder time for not a lot of benefit.
Use regular Node.js for the scraping (see the sketch below).
Otherwise, you could give Nuxt3 and its SSR capabilities a try if you think that having Puppeteer and Vue running alongside each other is crucial.
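To illustrate "use regular Node.js for the scraping", here is a minimal sketch of the same logic as a standalone script (a hypothetical scrape.js, run with node scrape.js, with no Vue involved):

// scrape.js - plain Node.js, no Vue involved
const puppeteer = require('puppeteer');

(async () => {
  // Launch the browser and open a page
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Go to the site and evaluate JavaScript in the page context
  await page.goto('https://stackoverflow.com');
  const three = await page.evaluate(() => 1 + 2);
  console.log(three); // 3

  await browser.close();
})();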
I am creating a bot that can access websites, and I am building a GUI for it using Flutter. I want to write the code for the bot in JavaScript, but I am struggling to find anything that lets me execute a script on the click of a button. I have attempted to use flutter_js, but when I press the button that calls the script, I get this error:
SyntaxError: expecting '('
at <eval>:1
JSError (SyntaxError: expecting '('
at <eval>:1
)
The Dart code that bridges Flutter to JS is:
import 'package:flutter/services.dart';
import 'package:flutter_js/flutter_js.dart';
import 'package:flutter_js/quickjs/qjs_typedefs.dart';

void addFromJs(JavascriptRuntime _javascriptRuntime) async {
  String test_code = await rootBundle.loadString("assets/bot_code/testcode.js");
  final eval = _javascriptRuntime.evaluate(test_code + """openTrapstar()""");
  final res = eval.stringResult;
}
And the code for the JS script is:
import * as puppeteer from 'puppeteer';

function openTrapstar() {
  (async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('https://uk.trapstarlondon.com/');
    console.log("HERE");
    await browser.close();
  })();
}

openTrapstar();
addFromJs is then called in the onPressed handler of the button.
I want to know what I am doing wrong with flutter_js, or whether it is even possible to run a script like this in flutter_js.
If not, how can I go about achieving this goal?
I'm using Puppeteer in my Node.js app to get the URLs in a redirect chain, i.e. going from one URL to the next. Up until this point I've been creating ngrok URLs which use simple PHP header() calls to redirect a user with 301 and 302 responses, and my starting URL is a page that redirects to one of the ngrok URLs after a few seconds.
However, it appears that Network.requestWillBeSent stops firing once it comes across a page that uses a JavaScript redirection, and I need it to somehow wait and pick those up as well.
Example journey of URLs:
1. START -> https://example.com/ <-- setTimeout and redirects to an ngrok URL
2. ngrok URL uses PHP to redirect with a 301
3. some other ngrok URL that uses a JS setTimeout to redirect to, for example, another https://example.com/
4. FINISH -> https://example.com/
In this situation, Network.requestWillBeSent picks up 1 and 2, but finishes on 3 and thus never reaches 4.
So rather than console logging all four URLs, I only get two.
It's difficult to create a reproduction since I can't set up all the ngrok URLs, etc., but here are a CodeSandbox link and a GitHub link; attached below is my code:
const dayjs = require('dayjs');
const AdvancedFormat = require('dayjs/plugin/advancedFormat');
dayjs.extend(AdvancedFormat);
const puppeteer = require('puppeteer');

async function runEmulation () {
  const goToUrl = 'https://example.com/';

  // vars
  const journey = [];
  let hopDataToReturn;

  // initiate a Puppeteer instance with options and launch
  const browser = await puppeteer.launch({
    headless: false
  });

  // launch a new page
  const page = await browser.newPage();

  // initiate a new CDP session
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');

  await client.on('Network.requestWillBeSent', async (e) => {
    // if not a document, skip
    if (e.type !== 'Document') return;

    console.log(`adding URL to journey: ${e.documentURL}`)

    // the journey
    journey.push({
      url: e.documentURL,
      type: e.redirectResponse ? e.redirectResponse.status : 'JS Redirection',
      duration_in_ms: 0,
      duration_in_sec: 0,
      loaded_at: dayjs().valueOf()
    });
  });

  await page.goto(goToUrl);
  await page.waitForNavigation();

  await browser.close();

  console.log('=== JOURNEY ===')
  console.log(journey)
}

// init
runEmulation()
What am I missing inside Network.requestWillBeSent, or what do I need to add, in order to pick up the sites in the middle of the chain that use JS to redirect to another site after a few seconds?
Since client.on('Network.requestWillBeSent', ...) takes a callback function, you cannot use await on it. await is only meaningful for expressions that return a Promise, and every async function returns a Promise.
As you need to wait for the callback function to finish executing, you can move your code inside the callback function:
client.on('Network.requestWillBeSent', async (e) => {
  // if not a document, skip
  if (e.type !== 'Document') return;

  console.log(`adding URL to journey: ${e.documentURL}`)

  // the journey
  journey.push({
    url: e.documentURL,
    type: e.redirectResponse ? e.redirectResponse.status : 'JS Redirection',
    duration_in_ms: 0,
    duration_in_sec: 0,
    loaded_at: dayjs().valueOf()
  });

  await page.goto(goToUrl);
  await page.waitForNavigation();
  await browser.close();

  console.log('=== JOURNEY ===')
  console.log(journey)
});
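For the delayed setTimeout redirects specifically, a hedged alternative (not part of the answer above) is to keep the original listener and simply wait until the journey array stops growing for a while before closing the browser. The 5-second quiet period and the overall timeout below are arbitrary assumptions:

// helper: resolve once no new journey entries have appeared for `quietMs`
function waitForJourneyToSettle(journey, quietMs = 5000, timeoutMs = 60000) {
  return new Promise((resolve) => {
    const started = Date.now();
    let lastLength = journey.length;
    let lastChange = Date.now();

    const timer = setInterval(() => {
      if (journey.length !== lastLength) {
        lastLength = journey.length;
        lastChange = Date.now();
      }
      // stop when the chain has been quiet long enough, or on overall timeout
      if (Date.now() - lastChange > quietMs || Date.now() - started > timeoutMs) {
        clearInterval(timer);
        resolve();
      }
    }, 500);
  });
}

// usage inside runEmulation(), instead of the single waitForNavigation():
// await page.goto(goToUrl);
// await waitForJourneyToSettle(journey);
// await browser.close();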
What I would like to do is load a page, get the content of an element through an XPath, a selector, or a JS path, and then use the value obtained that way in my program. How could I do that?
For instance on this page, making a request with the URL of the page and following this path (while also somehow targeting the element type, here it is the class):
//*[@id="question-header"]/h1/a
Would give me 'Load any url content and follow XPATH in JS'
As I am getting the text inside this:
Load any url content and follow XPATH in JS
If you need the most reliable way to get some data from a web page, including data that is generated by JavaScript running on the client side, you can use a headless browser. For example, the described task can be accomplished with Node.js and puppeteer in the script below (selectors and XPath are supported, as is the entire Web API, by evaluating code fragments in the browser context and exchanging data between the Node.js and browser contexts):
'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();
    await page.goto('https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js');
    const data = await page.evaluate(() => {
      return document.querySelector('#question-header > h1 > a').innerText;
    });
    console.log(data);
    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
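Since the question specifically asks about XPath, here is a hedged variant of the page.evaluate() call from the script above that uses the browser's built-in document.evaluate() with the XPath from the question (the XPath expression is taken from the question; the rest is a sketch):

const dataByXPath = await page.evaluate(() => {
  // XPathResult.FIRST_ORDERED_NODE_TYPE returns the first matching node
  const result = document.evaluate(
    '//*[@id="question-header"]/h1/a',
    document,
    null,
    XPathResult.FIRST_ORDERED_NODE_TYPE,
    null
  );
  return result.singleNodeValue ? result.singleNodeValue.innerText : null;
});
console.log(dataByXPath);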
Well, you could use something like
document.getElementById('question-header').children[0].children[0].href;
It's not as dynamic as XPath (note the redundant children calls), but it should do the trick if you're facing a static structure. For Node.js there are several libraries that could do this as well, such as libxmljs or parse5; more on this here.
I'm actually trying to use Puppeteer for scraping, and I need to use my current Chrome profile to keep all my credentials, instead of re-logging in and typing my password each time, which is a real waste of time!
Is there a way to connect it? How can I do that?
I'm currently using Node v11.1.0
and Puppeteer 1.10.0
let scrape = async () => {
  const browser = await log();
  const page = await browser.newPage();
  const delayScroll = 200;

  // Login
  await page.goto('somesite.com');
  await page.type('#login-email', '*******');
  await page.type('#login-password', '******');
  await page.click('#login-submit');

  // Wait to login
  await page.waitFor(1000);
}
It would be perfect if I did not need any of that and could just open the page (headless, I don't want to see the page opening, I'm only using the scraped info in Node) with my current Chrome, which does not need to log in to get the information I need (because in the end I want to use it as an extension of Chrome).
Thanks in advance if someone knows how to do that.
First, welcome to the community.
You can use Chrome instead of Chromium, but honestly, in my case it produced a lot of errors and made a mess of my personal tabs. So instead, you can create and save a separate profile, and then log in once with an existing or a new account.
In your code you have a function called "log"; I'm guessing that is where you launch Puppeteer:
const browser = await log()
Inside that function, pass launch arguments and create a relative directory for your profile data:
const browser = await puppeteer.launch({
  args: ["--user-data-dir=./Google/Chrome/User Data/"]
});
Run your application and log in with an account; the next time you start it, you should still be logged in with your credentials.
If you have any doubts, please add a comment.
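As a hedged addition, not part of the answer above: Puppeteer also accepts a top-level userDataDir launch option, and if you really want your installed Chrome instead of the bundled Chromium you can point executablePath at it. The profile directory name and the Chrome path below are assumptions (the path shown is a Windows example); adjust them to your system:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    // persistent profile directory so cookies/logins survive between runs
    userDataDir: './puppeteer-profile',
    // optional: use an installed Chrome instead of the bundled Chromium
    // (example path for Windows; change it for your OS)
    executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe'
  });

  const page = await browser.newPage();
  await page.goto('somesite.com');
  // ...already logged in if the profile holds the session cookies

  await browser.close();
})();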