python + selenium - focus to make infinite scroll load - javascript

I am scraping a limited number of items from the top of an infinite-scroll website.
links = driver.find_elements_by_xpath("//div[@class='fixed-recipe-card__info']//a")
while len(links) < 100:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    links = driver.find_elements_by_xpath("//div[@class='fixed-recipe-card__info']//a")
This works wonderfully when the window is active. However, if the test browser is minimized, the new content does not load and the loop runs forever. I'm rather new to Selenium, so I'm not sure why. I suspect there is a JavaScript onChange handler that isn't being triggered. Is there a JavaScript command I should add to my script, or another Selenium command that will cause the new content to load?
I am using Python 2.7 and Selenium with ChromeDriver. An example site is allrecipes.com.

Do you minimize it because you are busy with other things? You can use headless mode once your code is doing what you want visually and avoid this problem.
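For what it's worth, here is a minimal sketch of what starting Chrome in headless mode could look like. It assumes Selenium 3.8 or later (where webdriver.Chrome accepts an options argument) and a chromedriver on the PATH. The helper names and the window size are made up for illustration and are not part of Selenium.

```python
def headless_chrome_arguments(width=1920, height=1080):
    """Return Chrome switches for a session with no visible window.

    The window size is an arbitrary assumption; a fixed size keeps the
    page layout predictable when there is no real window.
    """
    return [
        "--headless",
        "--window-size={},{}".format(width, height),
    ]


def make_headless_driver():
    """Create a Chrome driver that runs without a visible window."""
    # Imported here so the helper above works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

With something like this, driver = make_headless_driver() would replace the usual webdriver.Chrome() call, and since no window exists there is nothing to minimize.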

By the way, you should try PhantomJS as the driver if minimizing the window is a big concern. It works basically the same way as the Chrome driver, but it has no browser window, so all your code runs in the background. It worked for me and may work for you. Happy coding! http://phantomjs.org

As you mentioned, the new content does not load when the test browser is minimized. That is pretty much expected, as Selenium needs focus on the browser window to interact with the DOM elements.
Reason
It is worth mentioning that a webpage can change its content when it loses focus. Also keep in mind that Selenium was mainly designed for testing.
Solution
Ideally, automated test execution or web scraping should be performed in an isolated test environment, preferably a test lab with the required hardware and software configuration and free from manual intervention.
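Whatever environment the script runs in, the loop from the question can at least be guarded so it gives up when scrolling stops producing new links instead of spinning forever. The following is only a sketch: collect_links, max_stalled_rounds and pause are hypothetical names and values, and the XPath is the one from the question.

```python
import time

CARD_LINKS_XPATH = "//div[@class='fixed-recipe-card__info']//a"


def collect_links(driver, wanted=100, max_stalled_rounds=5, pause=1.0):
    """Scroll until `wanted` links exist or the page stops growing."""
    # "xpath" is the locator strategy string that By.XPATH stands for.
    links = driver.find_elements("xpath", CARD_LINKS_XPATH)
    stalled = 0
    while len(links) < wanted and stalled < max_stalled_rounds:
        before = len(links)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch the next batch
        links = driver.find_elements("xpath", CARD_LINKS_XPATH)
        # Count consecutive scrolls that added nothing; reset on progress.
        stalled = stalled + 1 if len(links) == before else 0
    return links
```

If the returned list is shorter than wanted, the page most likely stopped loading, which is the minimized-window symptom described in the question.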

Related

Can a browser's dev console continue executing JavaScript after a new page loads?

I'm trying to automate some online work through JavaScript and the Firefox (or Chrome) dev console. The work is mostly inputting the same (or similar) data on the same exact pages for many many people.
Example:
unique id
date 1 and 2
some more numbers
I wrote a very simple script that runs in the console and enters the data just fine.
The Problem
My script stops execution whenever it requires the page to reload or it loads another page. I cannot find any information on how to continue executing a script after a page has loaded.
My Limitations
I'm basically limited to what's available in Firefox, Chrome, or Edge. Unfortunately, I cannot download any programs or tools that would make the automation any easier right now. Otherwise, I would just use Selenium and Python.
What I've Tried
First I tried to use the script that I describe above (simple DOM manipulation)
Then I tried to use the Selenium browser add-on, but I had to enter a starting URL for it to run. Selenium was not able to get past the login page of our system which is the only static URL that I can use as a starting point.
I then tried to use the Firefox Browser Console (different from the dev console) because the documentation seemed to suggest that I can use JavaScript on the entire browser (not just one tab). Unfortunately, I cannot find any helpful information on how to use the browser console for DOM manipulation. Everything that I search for points to how you create a browser extension, add-on, or how to use JavaScript on your own website.
What I Want To Do
I want to create a script that runs in a dev console. The script should take all of the data either from a separate page or an array then enter the data on each page for each person. I'll also have it prompt the user to verify the data before submission.
What I'm Looking For
What I'm hoping to get from this question is at least one of three things.
An answer to the question's title.
Being directed to documentation or some other solution that can solve any of the above problems.
Being told if this is impossible and why by those who have more experience than me (I don't understand if the problem is just a lack of knowledge or limitations on the tools themselves.)
I think you can create a Chrome extension and put your code in the background service worker, or use workers; read this link.

Is it possible to view a webpage that is being edited in vscode.dev?

If we are using vscode in the browser to edit an html file or project, is there a way to view the page?
I think I know what you mean.
On my Chromebook I use vscode.dev in one browser tab, and then just drag and drop the .html file into an empty browser tab, where it can run vanilla JS, CSS3, and HTML without anything else.
Refresh after each save, or install a live server. You could also have VScode running node, or whatever, in the background and just edit in vscode.dev, but that seems a bit excessive.
(Before I was using Crostini to run VScode on my chromebook and it would sometimes become very slow, buggy, or keep flickering (gpu style), and I'd have to restart/kill crostini to fix it, so finding vscode.dev has really helped when I can't find time to get to my workstation.)
CodeSwing is an extension that works in vscode.dev that does this.
Description: Interactive coding playground for building web applications (aka swings).
Id: codespaces-contrib.codeswing
VS Marketplace Link:
https://marketplace.visualstudio.com/items?itemName=codespaces-Contrib.codeswing

Headless Chrome: Run a webpage from command line without launching it?

I have a webpage that uses D3, canvg and gif.js to generate GIFs of time-lapse maps. The page generates 3,000 gifs, one at a time. The page is not meant for public consumption.
While it works pretty well to just open this page and download the GIFs, it tends to be asking a lot of the browser. So I'm curious if there's a way to run a page headlessly from the command-line without actually opening it, but running the full app to render the page.
Why not just use Phantom from Node, you might ask? For starters, Phantom is hard! But more importantly, I've never had complete success using Phantom or any other client-side browser engine, like jsdom, to completely render SVGs exactly right.
So my question is basically whether it's possible to use Chrome instead of Phantom and launch a page from the command line that executes the page as if it was merely opened in the browser but without actually opening the page.
Thanks!
You could use Electron. The advantage would be that you could very easily save your generated GIFs, something you can't do with Chrome unless you also run a server.
Otherwise there are some docs for headless Chrome here
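As a rough illustration of the headless Chrome route, a page can be loaded from the command line with the --headless and --dump-dom switches, here wrapped in Python. This is a sketch under assumptions: the binary name google-chrome varies by platform, and the helper names are hypothetical. It only loads the page and prints the resulting DOM; it does not save generated files, which is the limitation mentioned above.

```python
import subprocess


def headless_chrome_command(url, chrome_binary="google-chrome"):
    """Build a command line that loads `url` without opening a window."""
    return [
        chrome_binary,   # assumed name; may be "chrome" or "chromium" elsewhere
        "--headless",    # no visible browser window
        "--disable-gpu",
        "--dump-dom",    # print the serialized DOM to stdout after loading
        url,
    ]


def dump_rendered_dom(url):
    """Run the page headlessly and return the serialized DOM as text."""
    return subprocess.check_output(headless_chrome_command(url)).decode("utf-8")
```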

Injecting a JavaScript script using Selenium Driver + PhantomJS and handling the redirection correctly in Python

My problem is:
I have been developing a Python script that connects to a URL and uses the Selenium driver to inject a JavaScript file. After this file executes, the current page is redirected. This all works when Selenium drives Firefox:
driver = webdriver.Firefox()
But when I try to use PhantomJS as the browser, since it doesn't have a graphical interface:
driver = webdriver.PhantomJS()
I can't handle the response properly. I still haven't found out whether the driver is failing to inject the script correctly or whether it's a response-handling problem. If someone has any ideas, it would be great to hear them.
I posted this on another question, but I think this will help:
After dealing with this same dilemma myself, I can wholeheartedly recommend using your preferred Selenium browser driver (mine is Chrome) in conjunction with XVFB.
XVFB allows you to run a browser like Firefox or Chrome headlessly, which basically eradicates all of the bugginess that inherently comes with using PhantomJS. While PhantomJS is definitely an awesome piece of software, its inner workings tend to behave differently at times (for instance, I ran into issues with not being able to TAB from one element to another like one can in any browser). If you are using Jenkins, there is an incredibly awesome plugin that literally takes one click of a button. Otherwise, I'd definitely recommend checking this out.
Phantom is a real pain in the ass, so it's definitely worth circumventing it :)
Hope this helps!
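For anyone who wants to try the XVFB route without Jenkins, one common way is the xvfb-run wrapper that ships with the Xvfb packages on many Linux distributions. The sketch below just builds and runs that command from Python. The script name scrape_script.py and the screen geometry are placeholders, and xvfb-run is assumed to be installed.

```python
import subprocess
import sys


def xvfb_command(script_path, screen="1366x768x24"):
    """Build an xvfb-run command that runs a Python script on a virtual display."""
    return [
        "xvfb-run",
        "--auto-servernum",  # pick a free display number automatically
        "--server-args=-screen 0 {}".format(screen),
        sys.executable,      # the same Python interpreter that runs this code
        script_path,
    ]


def run_under_xvfb(script_path):
    """Run the script with a virtual display and return its exit code."""
    return subprocess.call(xvfb_command(script_path))
```

Calling run_under_xvfb("scrape_script.py") would then start the unchanged Selenium script with a regular Firefox or Chrome driver, drawing to a display that nobody sees.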

How would one monitor JavaScript execution in a web browser programmatically?

I would like to be able to know when arbitrary JavaScript successfully executes a command in a web browser. The medium doesn't matter, it could be a log, stack trace, event signal, it just has to be something that can be programmatically analyzed.
I've thought about this problem for some time now and I have not been able to come up with an adequate solution. I'm no expert with JavaScript though, so I'm wondering what ideas you have?
Since you'll probably be wondering why, it's just something I'm very interested in.
Any input is appreciated. Can you help me?
EDIT: I've investigated using something like Firebug to monitor JavaScript functions, however I wasn't able to determine if Firebug can be run programmatically on a simulated Web Browser (like a web-browser control in ASP.NET, which is what I'm currently using.) Does anyone know if it can?
You can use the profiler of Firebug.
Go to the console tab and click Profile. The profiler starts and all the javascript actions are "logged" till you click Profile again. Then you get the list of javascript functions that were executed in this interval.
A similar feature is available in most modern browsers' consoles.
Source: See/Log which javascript function is being executed by the browser
Firefox can be driven from ASP.NET using Selenium WebDriver, which also gives you access to all the details of a web page. See the documentation, download the API code, and integrate it into your project; the documentation makes this easy.
http://docs.seleniumhq.org/projects/webdriver/
