Puppeteer - Click on ad - javascript

I'm doing some freelance work for a guy who wants information on the ads on his website. I need to click on the ad with Puppeteer and get the resulting page URL.
Here's what I tried.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('http://example.com/page/ad', {waitUntil: 'networkidle2'});
  await page.click('#aw0');
})();
It keeps returning Error: No node found for selector: #aw0

Clicking on ads definitely works; however, you will need to tweak the code for every ad slot differently, and beware of the consequences.
Disclaimer
Read and use the content of this answer at your own risk.
Beware that clicking on ads automatically might get you banned from the ad network, since there are many ways to tell whether a click came from an actual user or not.
This has been tried for many years and has ended badly. The code below shows how it works, but again, your/your client's account will almost certainly be banned, because if I were the ad network I would not leave such an easy way to cheat open.
Some ads will trigger popups, so beware of ghost Chrome processes too.
The cost of running Puppeteer to click ads might actually be higher than doing proper marketing for the website.
Overview
Consider a page with a simple ad. If you inspect it, you will see an iframe, but look closer: it's an iframe inside an iframe, and the structure varies greatly between ad services and target websites.
Clicking element within frame, within frame...?
As discussed in the related GitHub issue, so far we could do the following to click something within a frame:
await page.goto('https://example.com');
const frame = page.frames().find(f => f.name() === 'someIframe');
const button = await frame.$('button');
await button.click();
Now, if we want to click this particular element, what can be done? The name is not there and the id is random. Going to the actual ad page will reveal the iframe, but again, check the disclaimer above.
The main iframe's src is /ads/adprotect300.aspx, so we can open that page and click on the element there. We can also see the iframe has a name starting with mdns. Taking all this research into account, we can prepare code like this:
const page = await browser.newPage();
await page.goto('http://example.com/ads/adprotect300.aspx', {waitUntil: 'networkidle0'});
await page.waitFor('iframe');
await page.waitFor(4000); // artificial wait for randomness
const frame = page.frames().find(f => f.name().includes('mdns'));
const ad = await frame.$('div > a');
await ad.click();
On this website the click opened a new tab. As stated before, the click worked; all we have to do now is grab the URLs of all open tabs, so any popups or redirects into new tabs will be captured as well.
await page.waitFor(2000);
const pages = await browser.pages();
console.log(pages.map(page => page.url()));
There are better ways to wait for the navigation, but I am just showing what can be done. The result:
[ 'chrome-search://local-ntp/local-ntp.html',
'http://example.com/ads/adprotect300.aspx',
'https://adwebsite/activity/htb/candy/pc?ref=93454&i=704ea49d-7b0b-4c05-b4d0-f0225ecc7154&h=12700290a03e232a14fa0f1cf35e27a346d91f6e&c=878146837666' ]
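A more robust approach than the fixed waitFor(2000) above is to wait for the new tab itself. A minimal sketch, assuming the ad opens the tab from the current page (so its opener matches), and reusing the ad handle from the code above:
// Click the ad and wait for the tab it opens, instead of sleeping for a fixed time.
const [newTarget] = await Promise.all([
  browser.waitForTarget(t => t.opener() === page.target(), { timeout: 10000 }),
  ad.click(),
]);
const newPage = await newTarget.page();
// The tab may still redirect, so its URL can change after this point.
console.log(newPage.url());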
Let me remind you once again: this is clearly illegal and the accounts might be put at risk. Use your head, and proceed at your own risk.

You can use waitForSelector (or waitFor in older Puppeteer versions) to make sure that the specific selector is available in the DOM before clicking:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#framewaitforselectororfunctionortimeout-options-args
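For example, a minimal sketch based on the question's selector (the 10-second timeout is an arbitrary choice):
// Wait for the ad element to be attached to the DOM before clicking it.
await page.waitForSelector('#aw0', { timeout: 10000 });
await page.click('#aw0');
Note that if #aw0 lives inside an ad iframe, as described in the answer above, waiting on the top-level page will still time out; you would need to locate the frame first.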

Related

Beginner problem with crashing code to lookup class or text on website

I wrote this code to check whether a certain text is present on my website, and if it is not, it should send me a notification through Slack. When I run it in VS Code it crashes after some time, maybe 15 minutes or so.
I want to clean it up, put it on a server, and run it remotely, but I need to be sure it will not crash every so often. I want to use it to check some websites for changing information, and if something changes or disappears, send me a notification. The thing is, it works, but it crashes and I don't know why :(
Can someone help pinpoint what the problem could be? It would also be better if this tool could look for text instead of a class, but I don't know how to do that.
// Puppeteer library
const pt = require('puppeteer')
const axios = require('axios')
process.setMaxListeners(0);

async function getText() {
  // launch browser in headless mode
  const browser = await pt.launch()
  // browser new page
  const page = await browser.newPage()
  // disable the navigation timeout
  await page.setDefaultNavigationTimeout(0);
  // open the website
  await page.goto('https://mieciusio.pl/kontakt.html')
  // identify element
  if (await page.$("[class='p-style btn-resize-mode label-bloc-2-style label-1-style']"))
    console.log("found")
  else // console.log("not found")
    axios.post('https://hooks.slack.com/services/MYUniqeID', {text: 'Its changed'})
}

setInterval(getText, 12000)
I tried to find a solution on YouTube and went through a lot of tutorials, but I can't find the right one for matching text on a website or for stopping the crashes, because I don't know why it crashes.
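A likely cause (an assumption, not confirmed in the thread): each call to getText launches a new headless Chromium that is never closed, so processes pile up until the machine runs out of memory. A minimal sketch of a fix, reusing the original selector and webhook placeholder, closes the browser in a finally block and schedules the next run only after the current one finishes:
// Puppeteer and Slack webhook client, as in the original snippet
const pt = require('puppeteer')
const axios = require('axios')

async function getText() {
  const browser = await pt.launch()
  try {
    const page = await browser.newPage()
    await page.goto('https://mieciusio.pl/kontakt.html')
    if (await page.$("[class='p-style btn-resize-mode label-bloc-2-style label-1-style']"))
      console.log("found")
    else
      await axios.post('https://hooks.slack.com/services/MYUniqeID', { text: 'Its changed' })
  } finally {
    await browser.close() // always release the Chromium process, even on errors
  }
}

// Run one check, wait 12 s, then run the next, instead of overlapping setInterval calls.
const loop = async () => {
  while (true) {
    await getText().catch(console.error)
    await new Promise(resolve => setTimeout(resolve, 12000))
  }
}
loop()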

Target an iFrame using Node.js/Puppeteer

Is it possible to target an iFrame when using the GUI Workflow builder in AWS Cloudwatch Synthetics?
I've set up the canary to log in to a website and redirect the page which has run successfully, but one of the elements I need to check with Node.js is within an iFrame which isn't being recognised.
This is the iframe code. It is loaded from JavaScript, but all content is from the same domain:
<iframe id="paramsFrame" src="empty.htm" frameborder="0" ppTabId="-1"
onload="paramsDocumentLoaded('paramsFrame', true);"></iframe>
This is the code I'm using for this section, but it's just returning a timeout error:
await synthetics.executeStep('verifyText', async function() {
  const elementHandle = await page.waitForSelector('#paramsFrame');
  const frame = await elementHandle.contentFrame();
  await frame.waitForXPath("//div[#class='css7'][contains(text(),'Specificity')]", { timeout: 30000 });
})
This code is trying to target a div with class css7 found within an iframe with id paramsFrame
Edit: I did a null check on frame and it came back as not null, not sure if that is relevant.
I also tried to target an element directly:
const next = await frame.waitForSelector('.protocol-name-link');
but I got the error message:
TimeoutError: waiting for selector .protocol-name-link
If the iframe is on a different origin (e.g. different domain), you cannot access it through Puppeteer.
You can try to disable some of Chromium's security features, although this is not advised.
Specifically, you'd probably want to add these args to puppeteer.launch
--disable-web-security
--disable-features=IsolateOrigins,site-per-process
I tried running similar code on a website that had a YouTube iframe, and I didn't need the Puppeteer launch args, i.e. I did not have to pass:
args: [
  "--disable-web-security",
  "--disable-features=IsolateOrigins,site-per-process",
],
But first, I would suggest confirming that the iframe you grab is actually the one you need, for example by logging, debugging, or just checking in the dev console (see the logging sketch after the code below).
Second, use the full XPath of the element within the frame.
Here is the code I tried running:
const page = await browser.newPage();
console.log("open page");
await page.goto("https://captioncrusher.com/");
console.log("page opened");
// use this if you want to wait for all the requests to be done.
// await page.waitForNetworkIdle();
const elementHandle = await page.waitForSelector("iframe.yt");
const frame = await elementHandle.contentFrame();
//These both work for me
const aLink = await frame.waitForXPath("/html/body/div/div/a");
const classLink = await frame.waitForSelector(".ytp-impression-link");
await browser.close();
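To confirm you grabbed the right frame, as suggested above, you could first log every frame's name and URL; a small sketch, not part of the original answer:
// List all frames on the page to verify which one you are actually targeting.
for (const f of page.frames()) {
  console.log(f.name(), f.url());
}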

Get Puppeteer Page/Frame Handle for new page after `ElementHandle.click()`

Using Puppeteer, I have a specific page that I am web-scraping for data and screenshotting for proof that the data is correct. The web page itself includes a button for creating a printer-friendly version of the page. The button is implemented as an input of type button with no target attribute. Still, once clicked, it opens the printer-friendly version in a new page (tab) at about:blank that automatically opens Chrome's print dialog.
Whenever a new page opens up, I've typically used browser.waitForTarget() to try to capture the new target and work from there. The issue is that with any variation of the code, I'm never able to find a Page that matches the page that was opened. The closest I get is finding a Target of type other with a URL of chrome://print.
Is there any way to find this type of target easily and, even better, get its page (since target.page() only returns a page if target.type() === 'page')? As a bonus, I'd like a way to potentially dismiss or ignore the window's print dialog, possibly even cancel it.
You need to do the following to capture a new browser window:
const browser = await puppeteer.launch({
  headless: false,
});
const page = await browser.newPage();

let page1;
browser.on("targetcreated", async (target) => {
  if (target.type() === "page") {
    page1 = await target.page();
  }
});
Or you can find the desired page using the browser.pages() method. See the documentation for more information.
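If you specifically want to catch the chrome://print target the question describes, here is a sketch using browser.waitForTarget (printButton stands in for whatever ElementHandle triggers the print view, and the URL check is an assumption based on what the question reports):
const [printTarget] = await Promise.all([
  // The print view shows up as a target of type 'other' with a chrome://print URL.
  browser.waitForTarget(t => t.url().startsWith('chrome://print'), { timeout: 10000 }),
  printButton.click(), // hypothetical handle for the printer-friendly button
]);
console.log(printTarget.type(), printTarget.url());
Keep in mind that target.page() still returns null for a target of type other, so you may not get a Page handle for the print view itself.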

Multiple separate browser with one tab each - simultaneous interaction with elements on pages (puppeteer headless)

Using Node.js, Chrome and Puppeteer as headless on an Ubuntu server, I'm scraping a few different websites. One of the occasional tasks is to interact with the loaded page (click on a link to open another page and then possibly another click to accept the terms and such).
I can do all this just fine, but I'm trying to understand how it will work if I have multiple pages open simultaneously and am trying to interact with different loaded pages at the same time (overlapping times).
To visualize this, I'm thinking of how a user would do the same job. They'd have to open multiple browser windows, open the pages, and switch between them to look and then click on links.
But using Puppeteer, we have the browser object; we don't need to see the window or page to know where to click. We can traverse it through the browser object and then click the desired element without looking at it (headless).
I'm thinking I should be able to do multiple pages at the same time as long as I have CPU and memory available to handle them.
Does anyone have any experience with puppeteer interacting with multiple websites simultaneously? Anything I need to watch out for?
This is the problem the library puppeteer-cluster (I'm the author) addresses. It allows you to build a pool of pages (or browsers) to use and run tasks in.
You'll find several general code samples in the repository (and also on Stack Overflow). Let me address your specific use case of running different tasks with an example.
Code Sample
The following code creates two tasks:
crawl: Opens the page and extracts a URL, then queues the second task
screenshot: Takes a screenshot of the extracted URL
The process is started by queuing the crawl task with the URLs.
const { Cluster } = require('puppeteer-cluster');

(async () => {
  const cluster = await Cluster.launch({ // use four pages in parallel
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 4,
  });

  // We define two tasks
  const crawl = async ({ page, data: url }) => {
    await page.goto(url);
    const extractedURL = /* ... */; // extract an URL (or multiple) from the document somehow
    cluster.queue(extractedURL, screenshot);
  };
  const screenshot = async ({ page, data: url }) => {
    await page.goto(url);
    await page.screenshot();
  };

  // Crawl some pages
  cluster.queue('https://www.google.com/', crawl);
  cluster.queue('https://github.com/', crawl);

  // Wait until everything is done and close the cluster
  await cluster.idle();
  await cluster.close();
})();
This is a minimal example. I left out error handling, monitoring and the setup options.
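For the error handling left out above, puppeteer-cluster emits a taskerror event; a small sketch based on the library's README (check the repository for the current API):
// Log failed tasks instead of letting a single error go unnoticed.
cluster.on('taskerror', (err, data) => {
  console.log(`Error crawling ${data}: ${err.message}`);
});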
I can usually get 5 or so browsers going on a 4GB server. If you're just popping URLs off a queue, it's pretty straightforward:
const puppeteer = require('puppeteer');

let queue = [
  'http://www.amazon.com',
  'http://www.google.com',
  'http://www.facebook.com',
  'http://www.reddit.com',
];

const doQueue = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  let url;
  while (url = queue.shift()) {
    await page.goto(url);
    console.log(await page.title());
  }
  await browser.close();
};

[1, 2, 3].map(() => doQueue());
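One small addition, not part of the original answer: the bare .map() call never waits for the workers or surfaces their errors; wrapping it in Promise.all does both:
// Wait for all three workers to drain the queue and report any failure.
Promise.all([1, 2, 3].map(() => doQueue()))
  .then(() => console.log('queue drained'))
  .catch(err => console.error('worker failed:', err));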

How can I use Puppeteer with my current Chrome (keeping my credentials)?

I'm trying to use Puppeteer for scraping, and I need to use my current Chrome to keep all my credentials and use them, instead of re-logging in and typing my password each time, which is a real waste of time!
Is there a way to connect to it? How can I do that?
I'm using Node v11.1.0 and Puppeteer 1.10.0.
let scrape = async () => {
  const browser = await log()
  const page = await browser.newPage()
  const delayScroll = 200
  // Login
  await page.goto('somesite.com');
  await page.type('#login-email', '*******');
  await page.type('#login-password', "******");
  await page.click('#login-submit');
  // Wait to login
  await page.waitFor(1000);
}
Now it would be perfect if I did not need that login code and could just go to the page (headless; I don't want to see the page opening, I'm only using the scraped info in Node) with my current Chrome, which does not need to log in to get the information I need (because in the end I want to use it like an extension of Chrome).
Thanks in advance if someone knows how to do that.
First, welcome to the community.
You can use Chrome instead of Chromium, but honestly, in my case I got a lot of errors and it made a mess of my personal tabs. So instead, you can create and save a profile, and then log in with your current or a new account.
In your code you have a function called "log"; I'm guessing that is where you launch Puppeteer.
const browser = await log()
In that function, pass launch arguments and create a relative directory for your profile data:
const browser = await puppeteer.launch({
  args: ["--user-data-dir=./Google/Chrome/User Data/"]
});
Run your application, log in with an account, and the next time you run it you should still have your credentials.
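Alternatively (this is an assumption on my part, not something the original answer mentions), Puppeteer also accepts a userDataDir launch option directly, and executablePath lets you point at an installed Chrome build:
const browser = await puppeteer.launch({
  userDataDir: './Google/Chrome/User Data/',          // persistent profile: cookies and logins survive between runs
  // executablePath: '/usr/bin/google-chrome-stable', // hypothetical path to a locally installed Chrome
});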
If you have any doubts, please add a comment.
