I'm using the Selenium WebDriver for Node.js and I'm also loading an extension. Loading the extension works fine, but when I start my project it goes to the page I want and then the extension instantly opens a new tab ("Thank you for adding this extension", etc.). I'm wondering if there's a way to disable tabs that weren't opened by my own script. I've tried this:
await driver.get('https://mywebsite.com') //open my initial site
await driver.sleep(1000) //give time for the extension site to open
driver.switchTo(await driver.getAllWindowHandles()[1]) //switch to extension site
await driver.close()
driver.switchTo(await driver.getAllWindowHandles()[0]) //switch back to the main site
//rest of my code
Unfortunately this just does not seem to work. Any advice is appreciated!
There's no way to disable tabs not opened by your script. As long as you don't change window handles, the driver will still be on the original tab. You can proceed with the script from there, ignoring the other opened tabs.
I think the main issue I see with your code is that you are passing parameters to .switchTo() instead of .window(). It should be driver.switchTo().window(handle);.
If you want to find the new window to close it, I wrote that code in this answer. All you need to do is to add the .close() line after that code and switch back to the original handle, which you already have in your current code (after fixing with my feedback above).
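To make that concrete, here is a minimal sketch (not the asker's exact code) of the original snippet with that fix applied, assuming the standard selenium-webdriver API and that the extension's tab is the second handle:
await driver.get('https://mywebsite.com') // open my initial site
await driver.sleep(1000) // give the extension tab time to open
// getAllWindowHandles() returns a promise, so await it before indexing
const handles = await driver.getAllWindowHandles()
await driver.switchTo().window(handles[1]) // switch to the extension's tab
await driver.close() // close it
await driver.switchTo().window(handles[0]) // switch back to the main site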
Another approach is heavily based on the selenium.dev docs:
// Open the initial site
await driver.get('https://mywebsite.com')
// Store the ID of the original window
const originalWindow = await driver.getWindowHandle();
// Wait for the new window or tab
await driver.wait(async () => (await driver.getAllWindowHandles()).length === 2, 10000);
// Loop through until we find a new window handle
const windows = await driver.getAllWindowHandles();
for (const handle of windows) {
  if (handle !== originalWindow) {
    await driver.switchTo().window(handle);
  }
}
await driver.close()
await driver.switchTo().window(originalWindow);
// Rest of the code
Using Puppeteer, I have a specific page that I am web-scraping for data and screenshotting as proof that the data is correct. The web page itself includes a button for creating a printer-friendly version of the page. The button is implemented as an input of type button with no target attribute. Still, once clicked, it opens the printer-friendly version in a new page (tab) at about:blank that automatically opens Chrome's print dialog.
Whenever a new page opens up, I've typically used browser.waitForTarget() to try to capture the new target and work from there. The issue is that with any variation of the code, I'm never able to find a Page that matches the page that was opened. The closest I get is finding a Target of type other and a url of chrome://print.
Is there any way to find this type of target easily and, even better, get its page (since target.page() only returns a page if target.type() === 'page')? As a bonus, I'd like a way to dismiss or ignore the window's print dialog, possibly even cancel it.
You need to do the following to capture a new browser window:
const puppeteer = require("puppeteer");

const browser = await puppeteer.launch({
  headless: false,
});
const page = await browser.newPage();

let page1;
// Capture the page for any newly created target of type "page"
browser.on("targetcreated", async (target) => {
  if (target.type() === "page") {
    page1 = await target.page();
  }
});
Or you can find the desired page using browser.pages() method. See the documentation for more information.
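For illustration, here is a minimal sketch of the browser.pages() approach, using the hypothetical selector #print-button and assuming the new target is of type "page" (in the asker's case the chrome://print target is of type "other", so it may not show up here):
const pagesBefore = await browser.pages();
await page.click('#print-button'); // hypothetical selector for the button that opens the new tab
// give the browser a moment to register the new target
await new Promise((resolve) => setTimeout(resolve, 1000));
const pagesAfter = await browser.pages();
const newPage = pagesAfter.find((p) => !pagesBefore.includes(p));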
Using Node.js, Chrome, and Puppeteer headless on an Ubuntu server, I'm scraping a few different websites. One of the occasional tasks is to interact with the loaded page (click on a link to open another page and then possibly do another click to accept the terms and such).
I can do all this just fine, but I'm trying to understand how it will work if I have multiple pages open simultaneously and am trying to interact with different loaded pages at the same time (overlapping times).
To visualize this, I'm thinking of how a user would do the same job. They'd have to open multiple browser windows, open the pages, switch between them to look, and then click on links.
But using Puppeteer, we have a separate browser object, so we don't need to see the window or page to know where to click. We can traverse it through the browser object and then click on the desired element without looking (headless).
I'm thinking I should be able to do multiple pages at the same time as long as I have CPU and memory available to handle them.
Does anyone have any experience with puppeteer interacting with multiple websites simultaneously? Anything I need to watch out for?
This is the problem the library puppeteer-cluster (I'm the author) is addressing. It allows you to build a pool of pages (or browsers) to use and run tasks inside.
You'll find several general code samples in the repository (and also on Stack Overflow). Let me address your specific use case of running different tasks with an example.
Code Sample
The following code creates two tasks:
crawl: Opens the page and extracts a URL, then queues the second task for it
screenshot: Takes a screenshot of the extracted URL
The process is started by queuing the crawl task with the URLs.
const { Cluster } = require('puppeteer-cluster');

(async () => {
  // use four pages in parallel
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 4,
  });

  // We define two tasks
  const crawl = async ({ page, data: url }) => {
    await page.goto(url);
    const extractedURL = /* ... */; // extract an URL (or multiple) from the document somehow
    cluster.queue(extractedURL, screenshot);
  };

  const screenshot = async ({ page, data: url }) => {
    await page.goto(url);
    await page.screenshot();
  };

  // Crawl some pages
  cluster.queue('https://www.google.com/', crawl);
  cluster.queue('https://github.com/', crawl);

  // Wait until everything is done and close the cluster
  await cluster.idle();
  await cluster.close();
})();
This is a minimal example. I left out error handling, monitoring and the setup options.
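Not part of the original answer, but as a rough sketch of what the omitted error handling could look like, assuming puppeteer-cluster's documented taskerror event and monitor option:
const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_PAGE,
  maxConcurrency: 4,
  monitor: true, // prints a progress overview to the console
});
// log tasks that throw (e.g. navigation timeouts) instead of letting them fail silently
cluster.on('taskerror', (err, data) => {
  console.log(`Error crawling ${data}: ${err.message}`);
});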
I can usually get 5 or so browsers going on a 4GB server. If you're just popping URLs off a queue, it's pretty straightforward:
const puppeteer = require('puppeteer');

let queue = [
  'http://www.amazon.com',
  'http://www.google.com',
  'http://www.facebook.com',
  'http://www.reddit.com',
];

const doQueue = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  let url;
  // keep pulling URLs off the shared queue until it is empty
  while (url = queue.shift()) {
    await page.goto(url);
    console.log(await page.title());
  }
  await browser.close();
};

// start three workers that drain the same queue concurrently
[1, 2, 3].map(() => doQueue());
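One small note on the last line (my addition, not part of the answer): if you need to know when the shared queue has been fully drained, collect the returned promises, e.g.:
Promise.all([1, 2, 3].map(() => doQueue()))
  .then(() => console.log('queue drained'))
  .catch((err) => console.error(err));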
I'm trying to use Puppeteer for scraping, and I need to use my current Chrome so I can keep all my credentials and reuse them, instead of re-logging in and typing the password every time, which is a real waste of time!
Is there a way to connect to it? How would I do that?
I'm currently using Node v11.1.0
and Puppeteer 1.10.0.
let scrape = async () => {
  const browser = await log()
  const page = await browser.newPage()
  const delayScroll = 200

  // Login
  await page.goto('somesite.com');
  await page.type('#login-email', '*******');
  await page.type('#login-password', "******");
  await page.click('#login-submit');

  // Wait to login
  await page.waitFor(1000);
}
It would be perfect if I didn't need to do that and could just go to the page (headless, I don't want to see the page opening, I'm only using the scraped info in Node) but with my current Chrome, which doesn't need to log in to get the information I need (because in the end I want to use it as an extension of Chrome).
Thanks in advance if someone knows how to do that.
First, welcome to the community.
You can use Chrome instead of Chromium, but honestly, in my case it caused a lot of errors and made a mess with my personal tabs. Instead, you can create and save a profile, then log in with a current or a new account.
In your code you have a function called "log"; I'm guessing that's where you launch Puppeteer.
const browser = await log()
In that function, pass launch arguments and create a relative directory for your profile data:
const browser = await puppeteer.launch({
  args: ["--user-data-dir=./Google/Chrome/User Data/"]
});
Run your application, log in with an account, and the next time you start it you should still be signed in.
If you have any doubts, please add a comment.
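As a side note (not from the original answer): Puppeteer also exposes a userDataDir launch option that persists the profile without passing a raw Chrome flag. A minimal sketch, with a hypothetical directory name:
const browser = await puppeteer.launch({
  headless: false,
  userDataDir: './user-data', // any writable directory; cookies and sessions are stored here
});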
I have a Meteor application where I'm downloading files from S3 using pre-signed URLs (need to be generated with an API call).
I was having an issue with popup blockers preventing a new tab from opening with the URL generated by the AWS SDK, so I changed my code to the following:
downloadDocument(document, event) {
  // open tab immediately to prevent popup blocker
  const myNewTab = window.open();

  // call method to generate url
  Meteor.call('Events.Methods.Document.Download', { key: document.key, eventId: event._id }, (error, res) => {
    if (error) { ... } // removed handle error code

    // if url generated, set tab location to url
    if (res) myNewTab.location.href = res;

    // auto close the tab after 1 second
    myNewTab.setTimeout(() => { myNewTab.close(); }, 1000);
  });
}
This code is working for the most part, but it doesn't feel very clean. Also, if the API call ever takes more than 1 second (slow internet), the tab will close before the download begins.
How can I change this so that I can wait for the download to happen, before closing the tab? Or a similar solution that would result in me ensuring the downloads always go through without popup blockers being an issue?
Thanks
You are always going to run afoul of pop-up blockers if you open a new window.
What you should do is generate an <a href="my-custom-server-generated-url" download> link with the download attribute, which will force a download without needing a new window.
Then you also don't need to close the window on a timer (which wasn't a good approach in the first place).
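A rough sketch of that idea applied to the asker's handler (my adaptation, assuming the same Meteor method returns the pre-signed URL and that the response headers allow the download attribute to take effect):
downloadDocument(document, event) {
  Meteor.call('Events.Methods.Document.Download', { key: document.key, eventId: event._id }, (error, res) => {
    if (error) { return; } // handle error as before
    // note: the handler's `document` parameter shadows the global, so use window.document
    const link = window.document.createElement('a');
    link.href = res; // pre-signed S3 URL
    link.setAttribute('download', '');
    window.document.body.appendChild(link);
    link.click(); // no new tab, so no popup blocker and no timer needed
    link.remove();
  });
}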
This was happening only in Safari, so we switched to always downloading the file instead of opening in a new window in Safari/mobile.
I'm running a selenium-webdriver JavaScript scraper, which logs into a site and clicks a button that launches a new tab/window. I'm trying to switch the driver to focus on the newly generated window, but Selenium cannot find it. The code I have to look for it:
driver.sleep(10000).then(function() {
  driver.getAllWindowHandles().then(function(d) {
    console.log(d);
  })
})
Which prints
[ '287ab61a-b155-46de-a2a6-298e0e98e440' ]
Which is the original browser window. What could cause Selenium to not pick up on the new window?
This is what I use whenever I need to switch to a new tab:
const tabs = await driver.getAllWindowHandles();
await driver.switchTo().window(tabs[1]);
If you want to switch back to the main tab, just use tabs[0].
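If only one handle comes back even after the sleep (as in the question), a variant worth trying is to wait explicitly for the second handle before switching; a minimal sketch using the standard selenium-webdriver wait:
// wait up to 10 seconds for a second window handle to appear
await driver.wait(
  async () => (await driver.getAllWindowHandles()).length === 2,
  10000
);
const tabs = await driver.getAllWindowHandles();
await driver.switchTo().window(tabs[1]);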