I am trying to inspect a page with Playwright that contains an iframe. When I click a button in the frame, a banner appears for a couple of minutes; once the job is done, the page has to be reloaded for the banner to disappear. I check automatically every 5 minutes until the banner is gone, but this only works for the first loop iteration; after that the code breaks. What can I do to fix this?
A possible solution would be to navigate to the iframe URL directly, but the document breaks if I do that, and I would like to avoid it: it's not what I would do if I were doing this manually.
UnhandledPromiseRejectionWarning: frame.evaluate: Execution Context is not available in detached frame (are you trying to evaluate?)
const browser = await chromium.launch({
args: ["--start-maximized", "--disable-notifications", '--disable-extensions', '--mute-audio'],
defaultViewport: null,
devtools: true,
slowMo: 50,
downloadsPath: "D:\\Lambda\\projects\\puppeteer_test\\data",
});
// Create a new incognito browser context with user credentials
const context = await browser.newContext({
acceptDownloads: true,
viewport: null,
storageState: JSON.parse(storageState),
})
// Create a new page in a pristine context.
const page = await context.newPage()
// go to download your information
await page.goto("");
//select child frame
const frameDocUrl = await (await page.waitForSelector("iframe")).getAttribute("src")
const doc = await page.frame({url: frameDocUrl})
await doc.waitForLoadState('domcontentloaded');
/* waitForFile */
// refresh every 5 minute until notice of gathering file is gone
// then Pending becomes download
const frameUrl = await doc.url()
const fiveMinutes = 300000
let IsGatheringFile = await doc.$("//div[text()='A copy of your information is being created.']") ? true: false
while(IsGatheringFile){
//reload page
console.log("going to reload")
await doc.goto(frameUrl)
// wait for 5 minutes
console.log(`going to start waiting for 5 min starting in ${Date().split(" ")[4]}`)
await doc.waitForTimeout(fiveMinutes)
console.log("finish reloading")
// check if notice is gone
IsGatheringFile = await doc.$("//div[text()='A copy of your information is being created.']") ? true: false
}
console.log("finish waiting for data")
console.log("finish reloading the page until the banner is gone")
Solution:
After a page refresh or a new navigation, recapture the iframe: the frame object you held before is detached, so look the frame up again.
const frameUrl = await doc.url()
await doc.goto(frameUrl)
Also note that you should update the variable you pass to the other parts of your script so that it points to the newly refreshed iframe.
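A minimal sketch of that idea, assuming the page contains a single iframe that can be found with the selector 'iframe'. The helper names reacquireFrame and waitForBannerToClear are made up for illustration; the Playwright calls used (page.waitForSelector, elementHandle.contentFrame, frame.waitForLoadState, frame.$, page.reload, page.waitForTimeout) are the same kind of calls the question already relies on.

```javascript
// Look the iframe up again and return a fresh Frame object.
// Any Frame obtained before a page reload is detached and must not be reused.
async function reacquireFrame(page, iframeSelector = 'iframe') {
  const handle = await page.waitForSelector(iframeSelector);
  const frame = await handle.contentFrame();
  if (!frame) throw new Error(`No frame found behind "${iframeSelector}"`);
  await frame.waitForLoadState('domcontentloaded');
  return frame;
}

// Reload the whole page every intervalMs until the banner is no longer present.
// bannerSelector is whatever selector matches the banner (assumption: it lives
// inside the iframe, as in the question).
async function waitForBannerToClear(page, bannerSelector, intervalMs = 300000) {
  let frame = await reacquireFrame(page);
  while (await frame.$(bannerSelector)) {
    await page.waitForTimeout(intervalMs);
    await page.reload();
    frame = await reacquireFrame(page); // fresh frame after every reload
  }
  return frame; // hand this frame to the rest of the script
}
```

The returned frame is the one to keep using afterwards, which is the "update the variable" point made above.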
Old hacky fix:
Instead of reloading the page, reload the iframe.
At the moment there is no frame.reload(), but the same effect can be achieved with frame.goto(frameUrl):
const frameUrl = await doc.url()
await doc.goto(frameUrl)
Note: the iframe can break. Reloading the page can fix it, but the frame will then be detached.
This post is a bit old, but I will respond anyway, as I had this problem this week and just resolved it.
I am using Python, not Node, but I believe the logic is the same.
For me, just recapturing the frame after page.reload() didn't work.
I used the "old hacky fix" and, instead of reloading the whole page, reloaded just the frame concerned.
My solution looks like this:
iframe.goto(iframe.url)
is_detached = iframe.is_detached()
if is_detached:
    iframe = page.main_frame.child_frames[-1]
Related
Is it possible to target an iframe when using the GUI Workflow builder in AWS CloudWatch Synthetics?
I've set up the canary to log in to a website and redirect the page, which runs successfully, but one of the elements I need to check with Node.js is inside an iframe that isn't being recognised.
This is the iframe code. It loads from JavaScript, but all content is from the same domain:
<iframe id="paramsFrame" src="empty.htm" frameborder="0" ppTabId="-1"
onload="paramsDocumentLoaded('paramsFrame', true);"></iframe>
This is the code I'm using for this section, but it's just returning a timeout error:
await synthetics.executeStep('verifyText', async function() {
const elementHandle = await page.waitForSelector('#paramsFrame');
const frame = await elementHandle.contentFrame();
await frame.waitForXPath("//div[@class='css7'][contains(text(),'Specificity')]", { timeout: 30000 });
})
This code is trying to target a div with class css7 found within an iframe with id paramsFrame
Edit: I did a null check on frame and it came back as not null, not sure if that is relevant.
I also tried to target an element directly:
const next = await frame.waitForSelector('.protocol-name-link');
but I got the error message:
TimeoutError: waiting for selector .protocol-name-link
If the iframe is on a different origin (e.g. different domain), you cannot access it through Puppeteer.
You can try disabling some of the browser's security features, although this is not advised.
Specifically, you'd probably want to add these args to puppeteer.launch
--disable-web-security
--disable-features=IsolateOrigins,site-per-process
I tried running similar code on a website that had a YouTube iframe, and I didn't need the Puppeteer launch args, i.e.:
args: [
"--disable-web-security",
"--disable-features=IsolateOrigins,site-per-process",
],
First, though, I would suggest confirming that the iframe you get is the one you need, for example by logging, debugging, or inspecting it in the dev console.
Second, use the full XPath of the element in the frame.
Here is the code I ran:
const page = await browser.newPage();
console.log("open page");
await page.goto("https://captioncrusher.com/");
console.log("page opened");
// use this if you want to wait for all the requests to be done.
// await page.waitForNetworkIdle();
const elementHandle = await page.waitForSelector("iframe.yt");
const frame = await elementHandle.contentFrame();
//These both work for me
const aLink = await frame.waitForXPath("/html/body/div/div/a");
const classLink = await frame.waitForSelector(".ytp-impression-link");
await browser.close();
I open a Puppeteer browser page and navigate to a specific site. If the proxy used for the browser gets banned, I want to be able to switch it without closing or restarting the browser. Refreshing the browser would be the best option, but I have not been able to figure out how to do it. I've tried putting '--proxy-server=xxxx' in a variable and changing that variable whenever the proxy is banned, but that didn't work. I've tried many other things too, without success. Any help would be much appreciated.
var proxies = get_proxy() // get proxy
var useragent = randomUseragent.getRandom()
let browser = await puppeteer.launch({
headless: false,
args: [
`--proxy-server=${proxies.address}:${proxies.port}`,
`--user-agent="${useragent}"`,
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-infobars',
'--window-position=0,0',
'--ignore-certificate-errors',
'--ignore-certificate-errors-spki-list'
]
})
let page = await browser.newPage()
Example of how I want it to work:
const proxy_banned = await check_proxy_ban()
if (proxy_banned) {
  // switch the Puppeteer browser proxy
  // refresh
  // return function
} else {
  // return function
}
I'm trying to bypass a captcha on a website, and for that I need to execute a command in an iframe of a popup, but I cannot find a way to do that. Here is my code:
const cookie = {
name: 'login_email',
value: 'example@domain.com',
domain: '.paypal.com',
url: 'https://www.paypal.com/',
path: '/',
httpOnly: true,
secure: true
}
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false, defaultViewport: null });
const page = await browser.newPage();
await page.setCookie(cookie)
await page.goto('https://www.paypal-dobijeni.cz/');
await page.waitForSelector('#login');
await page.click('#login');
const newPagePromise = new Promise(x => page.once('popup', x));
const popup = await newPagePromise;
await popup.waitForSelector('#password');
await popup.type('#password', 'examplepassword');
await popup.click('#btnLogin');
await popup.waitForSelector('form[name="challenge"]');
})();
The command that I need to execute is verifyCallback('<g-recaptcha-response>')
UPDATE: This is how I do it in the console:
First I select the iframe.
Then I execute the command with the g-recaptcha-response I get from my captcha-solving service.
This isn't really the solution you are looking for, but I'll post it in case you decide you want to use it.
First I use argv to parse the arguments passed to the script. One of the arguments the user can pass is headless.
When the script runs, I find some way to detect when a captcha pops up. If one is detected and the browser is headless, I log something like "Captcha appeared, run the script with headless set to false and solve the captcha".
When the script is run with headless set to false and a captcha is detected, I await a Promise that wraps a one-second interval, which checks whether the captcha has left the page. Since the browser is no longer headless, you can solve the captcha manually. When the captcha is gone, the interval is cleared, the Promise is resolved, and the rest of the script executes.
If you are lucky, the captcha won't need to be solved again for that IP address.
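The waiting step could be sketched roughly as below. The function name waitUntilGone and the '#captcha' selector are placeholders I made up, and treating "the selector no longer matches anything" as "the captcha has left the page" is an assumption; adapt the check to however you detect the captcha.

```javascript
// Resolve once `selector` no longer matches anything on the page.
// The page is polled every intervalMs (one second by default).
function waitUntilGone(page, selector, intervalMs = 1000) {
  return new Promise((resolve, reject) => {
    const timer = setInterval(async () => {
      try {
        const element = await page.$(selector); // null when nothing matches
        if (element === null) {
          clearInterval(timer);
          resolve();
        }
      } catch (err) {
        clearInterval(timer);
        reject(err);
      }
    }, intervalMs);
  });
}

// Hypothetical usage inside the script:
//   const headless = !process.argv.includes('--no-headless');
//   if (await page.$('#captcha')) {
//     if (headless) throw new Error('Captcha appeared, rerun with --no-headless');
//     await waitUntilGone(page, '#captcha');
//   }
```

The '--no-headless' flag in the usage comment is likewise only an example of how the argv switch might look.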
I'm trying to use Puppeteer for scraping, and I need to use my current Chrome so that I keep all my credentials, instead of logging in again and typing my password each time, which is a real waste of time!
Is there a way to connect to it? How do I do that?
I'm using Node v11.1.0
and Puppeteer 1.10.0.
let scrape = async () => {
const browser = await log()
const page = await browser.newPage()
const delayScroll = 200
// Login
await page.goto('somesite.com');
await page.type('#login-email', '*******');
await page.type('#login-password', "******");
await page.click('#login-submit');
// Wait to login
await page.waitFor(1000);
}
Ideally I would not need that login step at all and could go straight to the page (headless: I don't want to see the page opening, I'm just using the scraped info in Node) with my current Chrome, which doesn't need to log in to get the information I need (because in the end I want to use this as a Chrome extension).
Thanks in advance if someone knows how to do that.
First, welcome to the community.
You can use Chrome instead of Chromium, but honestly, in my case I got a lot of errors and it made a mess of my personal tabs. Instead, you can create and save a profile, and then log in with an existing or a new account.
In your code you have a function called "log"; I'm guessing that is where you launch Puppeteer.
const browser = await log()
Inside that function, pass launch arguments and create a relative directory for your profile data:
const browser = await puppeteer.launch({
args: ["--user-data-dir=./Google/Chrome/User Data/"]
});
Run your application and log in with an account; the next time you start it, your credentials should still be there.
If anything is unclear, please add a comment.
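As a variation on the same idea, Puppeteer's launch options also accept a userDataDir field, which avoids embedding a path with spaces in a command-line flag. A small sketch, assuming a profile folder named ./chrome-profile; both the folder name and the helper profileLaunchOptions are made up for illustration.

```javascript
const path = require('path');

// Build launch options that keep the browser profile (cookies, logins)
// in a dedicated directory so that it persists between runs.
function profileLaunchOptions(profileDir, headless = true) {
  return {
    headless,
    userDataDir: path.resolve(profileDir), // absolute path to the profile
  };
}

// Hypothetical usage:
//   first run, visible, so you can log in by hand:
//     const browser = await puppeteer.launch(profileLaunchOptions('./chrome-profile', false));
//   later runs, headless, reusing the saved session:
//     const browser = await puppeteer.launch(profileLaunchOptions('./chrome-profile'));
```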
I am trying to switch to a newly opened tab and scrape the title of that page with Puppeteer.
This is what I have:
// use puppeteer
const puppeteer = require('puppeteer');
//set wait length in ms: 1000ms = 1sec
const short_wait_ms = 1000
async function run() {
const browser = await puppeteer.launch({
headless: false, timeout: 0});
const page = await browser.newPage();
await page.goto('https://biologyforfun.wordpress.com/2017/04/03/interpreting-random-effects-in-linear-mixed-effect-models/');
// second page DOM elements
const CLICKHERE_SELECTOR = '#post-2068 > div > div.entry-content > p:nth-child(2) > a:nth-child(1)';
// main page
await page.waitFor(short_wait_ms);
await page.click(CLICKHERE_SELECTOR);
// new tab opens - move to new tab
let pages = await browser.pages();
//go to the newly opened page
//console.log title -- Generalized Linear Mixed Models in Ecology and in R
}
run();
I can't figure out how to use browser.pages() to start working on the new page.
According to the Puppeteer Documentation:
page.title()
returns: <Promise<string>> Returns page's title.
Shortcut for page.mainFrame().title().
Therefore, you should use page.title() for getting the title of the newly opened page.
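Alternatively, you can call the main frame directly, although this relies on private, undocumented properties that may change between versions, and any performance gain is slight at best: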
page._frameManager._mainFrame.evaluate(() => document.title)
Note: Make sure to use the await operator when calling page.title(), because it returns a Promise that resolves to the title rather than the title itself.
You shouldn't need to move to the new tab.
To get the title of any page you can use:
const pageTitle = await page.title();
Also, after you click something and are waiting for the new page to load, you should wait for the load event or for the network to be idle (newer Puppeteer versions spell the waitUntil values 'networkidle0' and 'networkidle2'):
// Wait for redirection
await page.waitForNavigation({waitUntil: 'networkidle', networkIdleTimeout: 1000});
Check the docs: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagewaitfornavigationoptions
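If the link opens in a new tab, one way to get hold of that tab is to listen for the browser's 'targetcreated' event before clicking. This is only a sketch: the helper names are invented, and it assumes the next target created after the click is the new tab (other targets, such as workers, would make target.page() resolve to null).

```javascript
// Resolve with the Page belonging to the next target the browser creates.
function waitForNewPage(browser) {
  return new Promise((resolve, reject) => {
    browser.once('targetcreated', (target) => {
      target.page().then(resolve, reject);
    });
  });
}

// Click a link that opens a new tab and return that tab's title.
async function titleOfTabOpenedBy(browser, page, linkSelector) {
  const newPagePromise = waitForNewPage(browser); // subscribe before clicking
  await page.click(linkSelector);
  const newPage = await newPagePromise;
  if (!newPage) throw new Error('The new target is not a page');
  // Wait until the <title> element exists, otherwise the title may be empty.
  await newPage.waitForSelector('title');
  return newPage.title();
}

// Hypothetical usage with the selector from the question:
//   console.log(await titleOfTabOpenedBy(browser, page, CLICKHERE_SELECTOR));
```

Subscribing before the click matters: if the listener is attached afterwards, the event may already have fired.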