My goal is to scrape the src link within the video tag on this webpage. This is where I am seeing the video tag along with the link which I want.
I know how to grab the information within the tag using
driver.find_element(By.XPATH, '//video')
But when I tried to find the XPath of the tag by using the console, I was unable to find it.
I also tried driver.find_element(By.TAG_NAME, 'video') but I got <selenium.webdriver.remote.webelement.WebElement (session="6a5b945439665a2261e0bb7cf4a19c8e", element="127606c2-b043-4b55-b8ff-5456bb39a2c3")>, from which I don't know how to get the src link. I tried to use .text but it came back blank.
I tried parsing through the page_source and finding the link manually but I still couldn't find it.
There is a == $0 right after the end of the video tag, which means it is the last selected DOM node (see "Selenium Duplicate Elements marked with ==$0").
When I type $0 or console.log($0) into the console I get the video tag with the link.
What should I do to scrape this tag and its contents?
You can get the source attribute with:
[...]
source = driver.find_element(By.XPATH, '//video').get_attribute('src')
print(source)
[...]
Result in terminal:
blob:https://mplayer.me/d420cb30-ed6e-4772-b169-ed33a5d3ee9f
See Selenium documentation at https://www.selenium.dev/documentation/
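A note on the result above: the WebElement repr printed in the question is only a handle to the element, so attributes have to be read with get_attribute. A value that starts with blob: refers to data held by the page inside the browser, not to a file that can be fetched directly from outside that page. The following is a minimal sketch for telling the two cases apart in a script. The helper name describe_video_src is hypothetical and not part of Selenium.

```python
from urllib.parse import urlparse


def describe_video_src(src):
    """Classify the string returned by get_attribute('src') on a <video>.

    Hypothetical helper for illustration; not part of Selenium.
    """
    if not src:
        # The attribute may be unset, e.g. when the URL lives on a
        # child <source> element instead of on <video> itself.
        return {"kind": "empty", "origin": None}
    if src.startswith("blob:"):
        # A blob URL wraps the origin of the page that created it.
        inner = urlparse(src[len("blob:"):])
        return {"kind": "blob", "origin": f"{inner.scheme}://{inner.netloc}"}
    parsed = urlparse(src)
    origin = f"{parsed.scheme}://{parsed.netloc}" if parsed.netloc else None
    return {"kind": "url", "origin": origin}


info = describe_video_src(
    "blob:https://mplayer.me/d420cb30-ed6e-4772-b169-ed33a5d3ee9f"
)
print(info)
```

In a real script the argument would be the value of driver.find_element(By.XPATH, '//video').get_attribute('src').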
Related
Please read the problem and requirement carefully as I have searched and tried innumerable things before adding a new question.
This is the code to get all elements from inspect element. Even though the code is in Python, Javascript code to do it will also work as I am executing it in Python.
from selenium import webdriver
url="https://www.websiteWithLotsOfJavascript.com"
driver = webdriver.PhantomJS(executable_path=r'my_path')
driver.get(url)
#This will get the initial html - before javascript
html1 = driver.page_source
# This will get the html after on-load javascript
html2 = driver.execute_script("return document.documentElement.outerHTML;")
#copied.txt has the manually copied inspect element
with open("copied.txt", encoding="utf8") as f:
    copiedString = f.read()
print(len(copiedString))
print(len(html1))
print(len(html2))
OUTPUT:
3914543
588849
588740
The lengths of html1 and html2 are almost the same, but copiedString is more than 6 times as long. (I copied it manually by opening Inspect Element, right-clicking the outermost HTML tag, choosing Edit as HTML, then selecting all the text and copying it.)
I have tried both document.documentElement.outerHTML and document.documentElement.innerHTML.
I also tried pausing the program with time.sleep() before the script execution line (thinking network delay might keep everything from loading), but got the same result. I have read various articles and almost every Stack Overflow question on getting the Inspect Element HTML, but nothing seems to work.
What can be causing the difference OR is there a way to get the complete HTML by some other means?
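One way to narrow down where the extra content comes from is to compare which tags appear in the manually copied HTML but not in the scraped HTML. The sketch below uses only the standard library. It does not explain the cause by itself, and the function names tag_counts and missing_tags are hypothetical, introduced here for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts how many times each start tag occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_counts(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def missing_tags(copied_html, scraped_html):
    """Tags that occur more often in copied_html than in scraped_html."""
    # Counter subtraction keeps only positive differences.
    return dict(tag_counts(copied_html) - tag_counts(scraped_html))


copied = "<html><body><iframe></iframe><p>a</p><p>b</p></body></html>"
scraped = "<html><body><p>a</p></body></html>"
print(missing_tags(copied, scraped))
```

With the variables from the question, the call would be missing_tags(copiedString, html2). A large surplus of a particular tag can point at the part of the page that the script never saw.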
I am trying to build an automated Puppeteer script to download my monthly bank transactions from my bank website.
However, I am encountering a strange error (see attached Imgur for pictures of this behavior)
https://imgur.com/a/rSwCAxj
Problem: querySelector returns null on DOM element that is clearly visible:
Screenshot: https://imgur.com/d540E6p
(1) Input box for username is clearly visible on site (https://internet.ocbc.com/internet-banking/),
(2) However, when I run document.querySelector('#access-code'), console returns null.
I'm wondering why this happens, and under what circumstances a browser would return null for a querySelector('#id') query when the DOM node is clearly visible.
# EDIT: Weird workaround that works:
I was continuing to play around with the browser, and used DevTools to inspect the DOM element and use it to Copy the JS Path.
Weirdly, after using Chrome Devtools to copy the JS Path, document.querySelector('#access-code') returned the correct element.
Screenshot of it returning the correct element: https://imgur.com/a/rSwCAxj
In both cases, the exact same search string is used for document.querySelector.
I believe that you cannot get the proper value using document.querySelector('#access-code') because the website uses a frameset.
The website contains a frame whose src loads the content:
<frame src="/internet-banking/Login/Login">
DOMContentLoaded fires when the main document is loaded and does not wait for the frame content to be loaded.
First of all, you need a listener for the load event.
window.addEventListener("load",function() {
...
});
After that, you still cannot simply use document.querySelector('#access-code'),
because the input you want is inside a frame. You need to access the frame's content first and then run querySelector inside it.
So something like:
window.addEventListener("load",function() {
console.log(window.frames[0].document.querySelector('#access-code'));
});
BTW, take a look at view-source:https://internet.ocbc.com/internet-banking/; it looks like the website is mostly rendered client-side.
I'm trying to click on href javascript link with Selenium in Python.
The HTML looks like this:
[image: HTML example]
and I want to click on javascript:goType(1).
This is what I tried to do:
advance_search = browser.find_element_by_xpath("//a[@href='javascript:goType(1)']")
advance_search.click()
but it failed with: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a"}
Moreover, when I try to print all "a" tags, it prints an empty list (maybe this causes the error). Is there a chance that it isn't possible?
I searched for similar answers but they didn't help. Please help me :).
I think I realized something: when I ran browser.find_elements_by_tag_name("body") it didn't find anything, but when I tried with "head" it did. Then I discovered that there is a 'page source' and a 'frame source', and my code works only on the page source and not on the frame source.
It doesn't find anything because all the HTML I need is in the frame source.
How could I run selenium on the frame source?
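A common approach, not stated in this thread, is to switch the driver into the frame with driver.switch_to.frame(...) before searching, and back out with driver.switch_to.default_content(). The sketch below looks in the top-level document first and then in each frame or iframe one level deep. The helper name find_in_frames is hypothetical. The plain strings "css selector" and "xpath" are the values behind By.CSS_SELECTOR and By.XPATH, used here so the block does not need to import Selenium.

```python
def find_in_frames(driver, by, value):
    """Search the top-level document, then each frame one level deep.

    Hypothetical helper. `driver` is assumed to be a Selenium WebDriver.
    On success inside a frame, the driver is left switched into that
    frame so the returned elements can still be clicked or read.
    """
    driver.switch_to.default_content()
    found = driver.find_elements(by, value)
    if found:
        return found

    # <frame> (framesets) and <iframe> are both handled by switch_to.frame.
    frames = driver.find_elements("css selector", "frame, iframe")
    for frame in frames:
        driver.switch_to.frame(frame)
        found = driver.find_elements(by, value)
        if found:
            return found
        # Nothing here: go back to the top before trying the next frame.
        driver.switch_to.default_content()
    return []


# Usage with a real driver (assumes `browser` is already on the page):
# links = find_in_frames(browser, "xpath", "//a[@href='javascript:goType(1)']")
# if links:
#     links[0].click()
```

Nested frames would need the same search repeated inside each frame; that is left out to keep the sketch short.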
I want to find elements inside an iframe tag. However, in the HTML source there isn't any iframe tag. If I inspect element there is, though. How to solve this using Selenium library in Python 2.7?
[screenshot: HTML source]
[screenshot: Inspect element]
If it's dynamically generated, it could explain why it's not found in the Selenium version of the DOM. You can still get it by using JavaScript in your code.
driver.execute_script("return document.getElementsByClassName('card-fields-iframe')")
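Because the iframe is added dynamically, the lookup may need to be retried until it appears, and the driver then has to switch into the iframe before elements inside it can be found. The following is a minimal sketch under those assumptions. The helper name switch_to_iframe_by_class and the timeout and poll parameters are hypothetical, and it assumes Python 3 (time.monotonic is not available in Python 2.7).

```python
import time


def switch_to_iframe_by_class(driver, class_name, timeout=10.0, poll=0.5):
    """Wait for an iframe with the given class, then switch into it.

    Hypothetical helper. `driver` is assumed to be a Selenium WebDriver.
    Returns True if the switch happened, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Array.from turns the live HTMLCollection into a plain array,
        # which comes back to Python as a list of WebElements.
        frames = driver.execute_script(
            "return Array.from(document.getElementsByClassName(arguments[0]));",
            class_name,
        )
        if frames:
            driver.switch_to.frame(frames[0])
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Usage with a real driver (assumes `driver` is already on the page):
# if switch_to_iframe_by_class(driver, "card-fields-iframe"):
#     ...  # find elements inside the iframe here
#     driver.switch_to.default_content()
```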
I got the following warning in the Chrome's console:
"IntersectionObserver.observe(target): target element is not a descendant of root."
What is the meaning of this? How could I find the reason for it, in order to fix it?
This warning appeared for me too. Chrome Debugging tool did not like an attribute in an element. I found the offending attribute by cutting out chunks of html and reloading the page until I narrowed it down to a single attribute.
For me it was the muted attribute.
Hope this helps.
I got this warning when I was creating an HTMLVideoElement in JS and playing it to extract the first frame image without first adding it to the body of the document.
I worked around it by setting its display to none, appending the node as a child of the body, and in a later promise removing the element from the body.
So, I'd check if you're creating any DOM elements in JS, and not adding them to the body of the HTML document.