Using Puppeteer, I am web-scraping a specific page for data and screenshotting it as proof that the data is correct. The web page itself includes a button for creating a printer-friendly version of the page. The button is implemented as an input of type button with no target attribute. Still, once clicked, it opens the printer-friendly version in a new page (tab) at about:blank that automatically opens Chrome's print dialog.
Whenever a new page opens, I've typically used browser.waitForTarget() to try to capture the new target and work from there. The issue is that no variation of the code ever finds a Page that matches the page that was opened. The closest I get is a Target of type other with a URL of chrome://print.
Is there any way to find this type of target easily, and better yet, get its page (since target.page() only returns a Page if target.type() === 'page')? As a bonus, I'd like a way to dismiss or ignore the window's print dialog, or possibly even cancel it.
You need to do the following to capture a new browser window:
const browser = await puppeteer.launch({
  headless: false,
});
const page = await browser.newPage();

let page1;
browser.on("targetcreated", async (target) => {
  if (target.type() === "page") {
    page1 = await target.page();
  }
});
Or you can find the desired page using the browser.pages() method. See the documentation for more information.
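Since the question reports the print target as type other, one workaround (a sketch based on the question's details, not part of the original answer) is to match the target by URL instead of by type when calling browser.waitForTarget():

```javascript
// Predicate matching the print target by URL rather than by type, since the
// question reports it as a Target of type 'other' with a chrome://print URL.
const isPrintTarget = (target) => target.url().startsWith('chrome://print');

// With a live browser (not run here):
// const printTarget = await browser.waitForTarget(isPrintTarget, { timeout: 5000 });

// For the bonus question, one possible way (an assumption, not tested against
// this particular page) to keep the print dialog from opening at all is to
// stub window.print before any page script runs:
// await page.evaluateOnNewDocument(() => { window.print = () => {}; });
```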
Related
My requirement is to let the user print a document and then update the database status. The user usually prints the document as a PDF using Microsoft Print to PDF. I need to identify whether the user clicked the Print button and saved the PDF, or closed the dialog without printing. I also need to redirect the user after a successful print.
I tried adding an element.onafterprint event, but it seems to return the same response in both cases.
Also, with the above PDF printer, the element.onafterprint event fires as soon as the user clicks the Print button, while the Save dialog box is still open, so the web page redirects to the other page before the user completes the full action.
Is there a way to fix this? My current (not properly working) approach is as follows.
PS: My project is written in React.
const printDoc = () => {
  const divContents = document.getElementById('printArea').innerHTML;
  const element = window.open('', '', 'height=700, width=1024');
  element.document.write(divContents);
  element.onafterprint = (e) => { // this is where things don't work
    console.log(e);
    element.close();
    updatePickListPrintStatus(PRINT_STATUS.SUCCESS);
    redirectAfterPrint();
  };
  getCssFromComponent(document, element.document);
  element.document.close();
  // Lazy-load images
  setTimeout(() => {
    element.print();
  }, 1000);
};
I am trying to inspect a page with Playwright that holds a frame document; when I click a button, a banner appears for a couple of minutes. When it's done, the page needs to be reloaded for the banner to disappear. I check every 5 minutes automatically until the banner is gone, but it only works for the first loop; after that the code breaks. What can I do to fix this?
A possible solution could be going to the iframe's URL itself, but the document breaks if I do that, so I want to avoid it. It's also not how I would do things if I were doing this manually.
UnhandledPromiseRejectionWarning: frame.evaluate: Execution Context is not available in detached frame (are you trying to evaluate?)
const browser = await chromium.launch({
  args: ["--start-maximized", "--disable-notifications", "--disable-extensions", "--mute-audio"],
  defaultViewport: null,
  devtools: true,
  slowMo: 50,
  downloadsPath: "D:\\Lambda\\projects\\puppeteer_test\\data",
});
// Create a new incognito browser context with user credentials
const context = await browser.newContext({
  acceptDownloads: true,
  viewport: null,
  storageState: JSON.parse(storageState),
});
// Create a new page in a pristine context.
const page = await context.newPage()
// go to download your information
await page.goto("");
//select child frame
const frameDocUrl = await (await page.waitForSelector("iframe")).getAttribute("src")
const doc = await page.frame({url: frameDocUrl})
await doc.waitForLoadState('domcontentloaded');
/* waitForFile */
// refresh every 5 minute until notice of gathering file is gone
// then Pending becomes download
const frameUrl = await doc.url()
const fiveMinutes = 300000
let isGatheringFile = Boolean(await doc.$("//div[text()='A copy of your information is being created.']"))
while (isGatheringFile) {
  // reload the frame
  console.log("going to reload")
  await doc.goto(frameUrl)
  // wait for 5 minutes
  console.log(`going to start waiting for 5 min starting at ${Date().split(" ")[4]}`)
  await doc.waitForTimeout(fiveMinutes)
  console.log("finished reloading")
  // check if the notice is gone
  isGatheringFile = Boolean(await doc.$("//div[text()='A copy of your information is being created.']"))
}
console.log("finish waiting for data")
console.log("finish reloading the page until the banner is gone")
Solution:
After the page refresh/new navigation, recapture the focus on the iframe.
const frameUrl = await doc.url()
await doc.goto(frameUrl)
Also, note that you can update the variable you pass to the other parts of your script with the newly refreshed iframe.
Old hacky fix:
Instead of reloading the page, reload the iframe.
At the moment there is no frame.reload(), but the same result can be achieved with frame.goto(frameUrl):
const frameUrl = await doc.url()
await doc.goto(frameUrl)
Note: the iframe can break. Reloading the page can fix that, but the frame handle will then be detached.
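The recapture step can be reduced to a small helper (a sketch assuming Playwright's Frame API; `frameUrl` is the frame URL saved before the reload):

```javascript
// Re-resolve the frame by URL from the fresh frame tree, since the old
// Frame handle is detached after a full page reload.
const findFrameByUrl = (frames, url) => frames.find((f) => f.url() === url);

// With a live Playwright page (not run here):
// await page.reload();
// const doc = findFrameByUrl(page.frames(), frameUrl);
```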
This post is a bit old, but I will respond anyway, as I had this problem this week and just resolved it.
I am in Python, not Node, but I believe the logic is still the same.
For me, just recapturing the focus didn't work after the page.reload().
I used the "old hacky fix" and, instead of reloading the whole page, reloaded just the frame concerned.
My solution looks like this:
iframe.goto(iframe.url)
is_detached = iframe.is_detached()
if is_detached:
    iframe = page.main_frame.child_frames[-1]
I'm using the Selenium WebDriver for Node.js and I'm also loading an extension. Loading the extension works fine, but when I start my project, it goes to the page I want and then the extension instantly opens a new tab ("Thank you for adding this extension", etc.). I'm wondering if there's a way to disable tabs that are not opened by me. I've tried this:
await driver.get('https://mywebsite.com') //open my initial site
await driver.sleep(1000) //give time for the extension site to open
driver.switchTo(await driver.getAllWindowHandles()[1]) //switch to extension site
await driver.close()
driver.switchTo(await driver.getAllWindowHandles()[0]) //switch back to the main site
//rest of my code
Unfortunately this just does not seem to work, any advice appreciated!
There's no way to disable tabs not opened by your script. As long as you don't change window handles, the driver will still be on the original tab. You can proceed with the script from there, ignoring the other opened tabs.
I think the main issue I see with your code is that you are passing parameters to .switchTo() instead of .window(). It should be driver.switchTo().window(handle);.
If you want to find the new window to close it, I wrote that code in this answer. All you need to do is to add the .close() line after that code and switch back to the original handle, which you already have in your current code (after fixing with my feedback above).
Another approach is heavily based on the selenium.dev docs:
// Open the initial site
await driver.get('https://mywebsite.com')
// Store the ID of the original window
const originalWindow = await driver.getWindowHandle();
// Wait for the new window or tab
await driver.wait(async () => (await driver.getAllWindowHandles()).length === 2, 10000);
// Loop through until we find the new window handle
const windows = await driver.getAllWindowHandles();
for (const handle of windows) {
  if (handle !== originalWindow) {
    await driver.switchTo().window(handle);
  }
}
await driver.close()
await driver.switchTo().window(originalWindow);
// Rest of the code
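The handle-picking loop above boils down to finding the one handle that isn't the original, which can be written as a small pure helper (a sketch, not part of the selenium.dev snippet):

```javascript
// Pick the newly opened window handle out of the full handle list.
const findNewHandle = (handles, originalHandle) =>
  handles.find((handle) => handle !== originalHandle);

// With a live driver (not run here):
// const newHandle = findNewHandle(await driver.getAllWindowHandles(), originalWindow);
// await driver.switchTo().window(newHandle);
```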
I'm trying to use Puppeteer for scraping, and I need to use my current Chrome to keep all my credentials, instead of re-logging in and typing the password each time, which is a real waste of time!
Is there a way to connect to it? How can I do that?
I'm using Node v11.1.0 and Puppeteer 1.10.0.
let scrape = async () => {
  const browser = await log();
  const page = await browser.newPage();
  const delayScroll = 200;
  // Login
  await page.goto('somesite.com');
  await page.type('#login-email', '*******');
  await page.type('#login-password', '******');
  await page.click('#login-submit');
  // Wait to log in
  await page.waitFor(1000);
};
Ideally, I wouldn't need any of that and could browse headless (I don't want to see the page opening; I'm only using the scraped info in Node) with my current Chrome, which doesn't need to log in to get the information I need (because in the end I want to use it as a Chrome extension).
Thanks in advance if someone knows how to do that.
First, welcome to the community.
You can use Chrome instead of Chromium, but honestly, in my case it produced a lot of errors and made a mess of my personal tabs. Instead, you can create and save a profile, and then log in with a current or a new account.
In your code you have a function called log; I'm guessing that is where you launch Puppeteer.
const browser = await log()
Inside that function, use arguments to create a relative directory for your profile data:
const browser = await puppeteer.launch({
  args: ["--user-data-dir=./Google/Chrome/User Data/"]
});
Run your application and log in with an account; the next time you run it, your credentials should still be there.
If you have any doubts, please add a comment.
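Puppeteer also exposes this as the userDataDir launch option, which is equivalent to passing the flag above; the option name comes from Puppeteer's launch API, while the path below is just an example:

```javascript
// Build launch options with a persistent profile directory so credentials
// survive between runs.
const launchOptionsWithProfile = (profileDir) => ({
  headless: false,
  userDataDir: profileDir,
});

// With Puppeteer installed (not run here):
// const browser = await puppeteer.launch(launchOptionsWithProfile('./Google/Chrome/User Data/'));
```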
I'm doing some freelance work for a guy who wants information on the ads on his website. I need to click on the ad with Puppeteer and get the resulting page url.
Here's what I tried.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('http://example.com/page/ad', { waitUntil: 'networkidle2' });
  await page.click('#aw0');
})();
It keeps returning Error: No node found for selector: #aw0
Clicking on ads definitely works; however, you will need to tweak every single ad section differently, and beware of the consequences.
Disclaimer
Read and use the content of this answer at your own risk.
Beware that clicking on ads automatically may get you banned from the ad network, since there are many ways to tell whether a click came from an actual user.
This has been attempted for many years and has ended badly. The code below shows how it works, but again, your/your client's account will surely be banned, because if I were the ad network I'd guard against such an easy way to cheat.
Some ads will trigger popups, so beware of ghost Chrome processes too.
The cost of running Puppeteer to click ads might actually be higher than doing marketing and similar work for the website.
Overview
Consider a page with a simple ad. If you try to inspect it, you will see an iframe, but look further: it's an iframe inside an iframe, and that varies greatly between ad services and target websites.
Clicking an element within a frame, within a frame...?
As discussed in the linked issue, so far we can do the following to click something within a frame.
await page.goto('https://example.com');
const frame = page.frames().find((f) => f.name() === 'someIframe');
const button = await frame.$('button');
await button.click();
Now, if we want to click this particular element, what can be done? There is no name, and the id is random. Going to the actual ad page will reveal the iframe, but again, check the disclaimer above.
The main iframe's src is /ads/adprotect300.aspx, so we can open that page and click the element there. We can also see the iframe has a name starting with mdns. Taking all this research into account, we can prepare code like this:
const page = await browser.newPage();
await page.goto('http://example.com/ads/adprotect300.aspx', { waitUntil: 'networkidle0' });
await page.waitFor('iframe');
await page.waitFor(4000); // artificial wait for randomness
const frame = page.frames().find((f) => f.name().includes('mdns'));
const ad = await frame.$('div > a');
await ad.click();
On this website it opened a new tab. As stated before, the click happened, and now all we have to do is grab the URLs of all open tabs, so any popups or redirects in new tabs will be captured as well.
await page.waitFor(2000);
const pages = await browser.pages();
console.log(pages.map((page) => page.url()));
There are better ways to wait for the navigation and so on, but I am just showing what can be done. The result:
[ 'chrome-search://local-ntp/local-ntp.html',
'http://example.com/ads/adprotect300.aspx',
'https://adwebsite/activity/htb/candy/pc?ref=93454&i=704ea49d-7b0b-4c05-b4d0-f0225ecc7154&h=12700290a03e232a14fa0f1cf35e27a346d91f6e&c=878146837666' ]
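One of those "better ways" (a sketch of my own, not the answer's original code) is to wait for the new tab explicitly instead of sleeping, using the fact that a tab spawned by a click reports a non-null opener:

```javascript
// Predicate for browser.waitForTarget: matches targets opened by another
// page (e.g. the tab the ad click spawned), rather than waiting a fixed time.
const openedByAnotherPage = (target) => target.opener() !== null;

// With a live browser (not run here):
// const newTarget = await browser.waitForTarget(openedByAnotherPage);
// const newPage = await newTarget.page();
```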
Let me remind you once again: this is clearly illegal, and the accounts involved might be put at risk. Use your head, at your own risk.
You can use page.waitForSelector (or page.waitFor in older Puppeteer versions) to make sure the specific selector is available in the DOM before clicking:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#framewaitforselectororfunctionortimeout-options-args