Shadow DOM not clear - javascript

I need to access the body element of an opened window that uses Shadow DOM. Run this code in your browser (you need to disable third-party security in your browser):
<script type="text/javascript">
var janela = window.open("http://www.google.com.br");
window.setTimeout(
  function () {
    console.log(janela.document.body.innerHTML);
  },
  5000
);
</script>
If you check the console, you will see an empty string. Now change the URL http://www.google.com.br to http://www.bing.com.br and it works fine: the body's innerHTML is printed to the console.
I see that Google now uses Shadow DOM, and that is probably what is causing my problem. Open google.com in your browser, press F12, and you will see a #shadow-root element; I think this is what creates my problem. How can I bypass that and get access to the DOM?

Shadow DOM has nothing to do with this. Your browser doesn't want to let any random website open up something like gmail.com and read whatever they see there. If it did, then any website you visit could read all your email any time you're signed in to your gmail account.
Please read the section on "Cross-origin script API access" in MDN's Same-origin policy article:
JavaScript APIs such as iframe.contentWindow, window.parent, window.open and window.opener allow documents to directly reference each other. When the two documents do not have the same origin, these references provide very limited access to Window and Location objects, as described in the next two sections.
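To make the rule concrete: two URLs share an origin only when scheme, host, and port all match, which is why google.com.br and bing.com.br are both off-limits to a page on another site. The helper below is a small sketch (the name `isSameOrigin` is ours, not a browser API) that predicts whether direct access like `janela.document` will be allowed; the browser, not this check, is what actually enforces the policy.

```javascript
// Sketch: two URLs share an origin only if scheme, host and port all match.
// `isSameOrigin` is a hypothetical helper name, not a browser API.
// Only meaningful for http(s) URLs: opaque origins such as about:blank
// report the string "null" from URL#origin.
function isSameOrigin(a, b) {
  // URL#origin drops default ports, e.g. "http://x.com:80" -> "http://x.com".
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("http://example.com/page1", "http://example.com/page2")); // true
console.log(isSameOrigin("http://www.google.com.br", "http://www.bing.com.br"));   // false
console.log(isSameOrigin("http://example.com", "https://example.com"));            // false
```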

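The whole idea of shadow DOM is encapsulation, so ordinary DOM queries such as document.querySelector() made from outside a component do not reach into its shadow tree. Script can still enter an open shadow root through the host element's shadowRoot property; a closed shadow root is not exposed that way.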
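On a page you are allowed to script (same origin), `document.querySelectorAll` stops at shadow boundaries, but an *open* shadow root is reachable through its host's `shadowRoot` property. A minimal recursive walker, assuming only open roots matter (the name `deepCollect` is hypothetical); it does not bypass encapsulation and does nothing for cross-origin windows:

```javascript
// Sketch: collect elements matching `predicate`, descending into *open*
// shadow roots. Closed shadow roots report `shadowRoot === null`, so they
// are skipped; this only follows the public `element.shadowRoot` property.
function deepCollect(root, predicate, out = []) {
  for (const el of root.children) {
    if (predicate(el)) out.push(el);
    // Walk the element's shadow tree first, then its light-DOM children.
    if (el.shadowRoot) deepCollect(el.shadowRoot, predicate, out);
    deepCollect(el, predicate, out);
  }
  return out;
}

// Example use in a same-origin page:
// deepCollect(document, (el) => el.tagName === "INPUT");
```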

Related

Javascript function document.querySelectorAll works weird on Chrome

I am trying to get elements from the Chrome developer console using document.querySelectorAll, but it does not return any elements, even though I can see them in the Elements tab.
I was wondering whether anyone has faced a similar issue. Should I change some option in the browser configuration?
By the way, I'm using Chrome 63 on a Mac. In addition, the page I am working on has an iframe tag; could this be the reason for the strange behavior?
This is what I get from the Developer Console:
And this is what I see in the Elements tab:
There aren't any browser settings that would affect document.querySelectorAll(). It's pretty core functionality.
You mentioned an iframe, so that's likely the source of the confusion. Code running in the outer page can't directly access or modify the contents of an iframe; to the outer level, it's essentially a black box. The browser enforces this through its same-origin policy.
The exception is when the iframe and the main page have the same origin, i.e. the same scheme, host, and port (e.g., http://example.com/page1 and http://example.com/page2).
If they share an origin, you can access the iframe's window with contentWindow:
const iframe = document.querySelector('iframe');
iframe.contentWindow // the window for the iframe
From there, you can access its document, and run querySelectorAll() against that:
iframe.contentWindow.document.querySelectorAll('div');
That will get all of the div elements in the iframe.
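Extending that idea, here is a sketch that runs a selector against the page and every nested iframe it is allowed to read. It relies on `iframe.contentDocument`, which is `null` for cross-origin frames, so those are skipped instead of throwing. The function name is hypothetical.

```javascript
// Sketch: run a selector against a document and every same-origin iframe
// nested inside it. `contentDocument` is null for cross-origin frames,
// so those are silently skipped rather than throwing.
function querySelectorAllFrames(doc, selector) {
  const results = Array.from(doc.querySelectorAll(selector));
  for (const frame of doc.querySelectorAll("iframe")) {
    const inner = frame.contentDocument;
    if (inner) {
      // Recurse so frames nested inside frames are searched too.
      results.push(...querySelectorAllFrames(inner, selector));
    }
  }
  return results;
}

// Usage in the console: querySelectorAllFrames(document, "div");
```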

Is it possible to search the DOM for a keyword occurring within an iframe and reload page until it's found?

I have an ad that displays within an iframe on a given publisher site roughly once every 1000 loads. I have no control over the host site, but I need a way to see the ad live as users will see it. I'm trying to figure out a JavaScript solution that will load the page, search for the name of my company to see if my ad loaded (the company name is the id of a div tag that loads the iframe), and then either stop there if it finds it, or reload the page if not.
I had it sort of working by running a script in the console that got the innerHTML of the document body, searched for a keyword and then reloaded the page if the keyword wasn't found.
Two problems, though. First, it could only find keywords outside of the iframe: it didn't search the content of the iframe (where the keyword that identifies my particular ad actually sits), even if I set a delay or used onload.
Second, on every page refresh the script was cleared from the console.
I know this is beginner stuff but I would love any pointers to the correct way to tackle this problem.
Thanks so much for the help thus far (also, I upvoted everyone but don't think I have the necessary cred for it to show up publicly)
Here's where I got. I created a Chrome extension with the following manifest.json:
{
  "manifest_version": 2,
  "name": "Ad Refresh",
  "version": "1.0",
  "permissions": [
    "activeTab",
    "tabs"
  ],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["jquery.min.js", "contentscript.js"],
      "all_frames": true
    }
  ]
}
I have the content scripts running on all URLs but will restrict them once I get things working properly.
For contentscript.js that gets injected and runs in each frame, I have:
setTimeout(function () {
  // "3rd party ad content" is the title of every iframe that could contain my ad,
  // and the only identifying attribute shared by all of them. I add an id so the
  // iframe is easier to grab with getElementById. jQuery sets this id on every
  // match, but getElementById only returns the first one, so I still need a
  // way to loop through all of them.
  $("[title='3rd party ad content']").attr("id", "dfp");
  var company = document.getElementById('dfp');
  if (company === null) {
    console.log("no hit");
  } else {
    console.log(company);
  }
}, 5000);
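Because the manifest sets `"all_frames": true`, a separate copy of `contentscript.js` already runs inside every frame, including a cross-origin ad frame, so each copy can inspect its *own* document instead of reaching into iframes from the parent. A sketch without jQuery, assuming a hypothetical keyword `MyCompany` that identifies the ad; it also counts every iframe with the shared title via `querySelectorAll` rather than tagging them with a duplicate id:

```javascript
// Hypothetical value: replace with the id or name that identifies your ad.
const AD_KEYWORD = "MyCompany";
const AD_IFRAME_SELECTOR = "iframe[title='3rd party ad content']";

// Inspect only the document this copy of the content script runs in.
function scanFrame(doc, keyword) {
  return {
    // All candidate ad iframes in this frame, not just the first one.
    adIframeCount: doc.querySelectorAll(AD_IFRAME_SELECTOR).length,
    // innerHTML includes ids and attributes as well as visible text.
    hasKeyword: doc.documentElement.innerHTML.includes(keyword),
  };
}

// Guarded so the function can also be loaded outside a browser.
if (typeof document !== "undefined") {
  setTimeout(() => {
    const result = scanFrame(document, AD_KEYWORD);
    if (result.hasKeyword) {
      console.log("hit in frame:", location.href);
    }
  }, 5000);
}
```

The copy running in the top page will report the iframe count, while the copy running inside the ad frame is the one that can actually see the keyword.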
I'm not worried about reloading the page, I'm just stuck on getting access within the iframe.
I am unable to directly grab any element within the actual content of the iframe with jQuery's $ or getElementById, etc. However, if I run getElementById on the iframe itself and console.log it, it includes all the HTML inside the iframe:
http://i.stack.imgur.com/dfuYt.png
I tried getting the innerHTML of the iframe element so that I'd have it as a string and could search it, but all it returns is the iframe element's own tags, none of the inner content. I don't know if any of that makes sense, but I appear to be in over my head at this point.
OK, last addition. My ad runs a script that I can see under "Sources" in the inspector. So I thought: "Why not run var scripts = document.getElementsByTagName('script'); to get an array of all the scripts that were loaded on the page? Then I could just search the array to see whether my script, and hence my ad, had loaded, and we'd be golden." Unfortunately, it doesn't include my script in the array, even when the script is loaded and visible in "Sources", yet it does include a random Stripe script that's also loading from within an iframe. Bummer...
Use jQuery's load event to know when the iframe has finished loading, and then read the innerHTML of the iframe's body (this only works if the iframe is same-origin with the page).
Try this:
// .load(handler) was deprecated in jQuery 1.8 and removed in 3.0; use .on('load').
$('#ID_OF_THE_IFRAME').on('load', function () {
  var iFrameContent = $('body', this.contentWindow.document).html();
  console.log(iFrameContent);
});
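The same idea without jQuery, as a sketch: wrap the load event in a Promise and resolve with `null` when the frame is cross-origin, because `contentDocument` is `null` in that case. The function name is hypothetical, and the listener has to be attached before the frame finishes loading or the event is missed.

```javascript
// Sketch: resolve with an iframe's body HTML once it loads, or null when the
// frame is cross-origin (contentDocument is then null).
// Attach this before the iframe finishes loading, or the event may be missed.
function readIframeBody(iframe) {
  return new Promise((resolve) => {
    iframe.addEventListener(
      "load",
      () => {
        const doc = iframe.contentDocument;
        resolve(doc && doc.body ? doc.body.innerHTML : null);
      },
      { once: true }
    );
  });
}

// Usage, with ID_OF_THE_IFRAME as in the jQuery snippet above:
// readIframeBody(document.getElementById("ID_OF_THE_IFRAME")).then(console.log);
```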
JS:
var company = document.getElementById('myframe1').contentWindow.document.getElementById('company');
if (company == null) {
  // reload
  console.log("reload");
} else {
  // continue
  console.log(company);
}
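For the `reload` branch: anything typed into the console is lost on reload, which was the second problem in the question. A content script or userscript is injected again on every load, and `sessionStorage` survives reloads within the same tab, so a counter there can cap the retries. A sketch with a hypothetical storage key and retry limit; `storage` and `reload` are passed in so the logic is easy to test, and the `found` flag must come from wherever the check can actually run (see the cross-origin caveat that follows):

```javascript
const ATTEMPTS_KEY = "adCheckAttempts"; // hypothetical key name
const MAX_ATTEMPTS = 50; // hypothetical cap so the page cannot reload forever

// Decide what to do after one check; returns a label describing the outcome.
function nextAction(found, storage, reload) {
  const attempts = Number(storage.getItem(ATTEMPTS_KEY) || 0);
  if (found) {
    storage.removeItem(ATTEMPTS_KEY);
    return "found";
  }
  if (attempts >= MAX_ATTEMPTS) {
    storage.removeItem(ATTEMPTS_KEY);
    return "gave-up";
  }
  // Persist the count before reloading, since page state is wiped.
  storage.setItem(ATTEMPTS_KEY, String(attempts + 1));
  reload();
  return "reloading";
}

// In the page: nextAction(company !== null, sessionStorage, () => location.reload());
```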
It sounds like the iframe containing the ad is loaded from a different domain than the main page, is that right? That would explain why your JavaScript code running in the main page (or in the console, same thing) can't access DOM elements inside the iframe. Browser cross-domain security prevents that kind of access: the iframe is treated just like a separate browser window or tab.
If the main page and the iframe were both loaded from the same domain, then you could use contentWindow as a couple of answers have described. But that won't work across domains.
So, what can you do?
You're building a tool for your own use or the use of your colleagues - not something you need to publish on a website for the rest of the world to use, right?
This gives you a couple of other options. First, you could simply disable cross-domain browser security. In Chrome, you can do that as described here:
Disable same origin policy in Chrome
Beware: Don't do any "normal" browsing in a Chrome session running in this mode, only your special testing. But if you do run Chrome in this mode, then you'll be able to access iframe DOM elements via contentWindow and contentWindow.document as described in the other answers.
If that doesn't do the trick, or if you don't want to have to start a special Chrome session for this, another approach would be to write a Chrome extension. This would allow you to write code to access DOM elements in both the iframe and the main window using techniques like these:
access iframe content from a chrome's extension content script
Chrome extension to remove DOM elements inside iframes
Or you could write a Firefox extension if you prefer - similar capabilities are available in both.
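As a sketch of the extension route, assuming Manifest V3 with the `scripting` permission and host permissions covering both the publisher page and the ad frame's origin (the manifest shown earlier in this thread is Manifest V2 and would need adjusting): a background service worker can inject a function into every frame of a tab with `chrome.scripting.executeScript` and get one result per frame. The keyword and helper names are hypothetical.

```javascript
// Runs inside each frame; it must be self-contained because it is serialized.
function frameHasKeyword(keyword) {
  return document.documentElement.innerHTML.includes(keyword);
}

// Keep only the frame ids whose injected check returned true.
function framesWithHits(injectionResults) {
  return injectionResults.filter((r) => r.result === true).map((r) => r.frameId);
}

// Background service worker (Manifest V3); `chrome` exists only there.
async function findKeywordInTab(tabId, keyword) {
  const results = await chrome.scripting.executeScript({
    target: { tabId, allFrames: true },
    func: frameHasKeyword,
    args: [keyword],
  });
  return framesWithHits(results);
}
```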

How to access DOM in Firefox addon script?

I tried to use something like document.forms in Firefox-addon script but it doesn't work.
So, I need to manipulate DOM objects in Firefox-addon script such as forms, inputs... etc. How can I do that without using SDK?
document.forms will not work, because document is not what you think it is: It is the top level browser (Firefox) window, and not the content in a tab.
A Firefox browser window can have multiple tabs, one of which is the active tab. The active tab <browser> element (which is the XUL element containing the actual content document) also has a shortcut named content, e.g. content.document.forms will be a collection of forms in the active tab.
So you'll have to adjust your mental model here from
window and document refer to a website
to
window and document refer to the top-level browser window that may contain a lot of different websites.
The top-level window is really more like a document containing multiple frames (the actual websites), but with different APIs to access them.
So, e.g. when executing some action after the user pressed some add-on toolbar button, it might be enough to just use content.document.forms to get the forms of the currently active tab.
But using content. is often not enough: Add-ons would listen for page loads in tabs as the user navigates by adding appropriate event listeners to the <tabbrowser> element (gBrowser), which is the element containing all tabs. MDN has some code snippets for that and lots of other stuff.
Other add-ons add item(s) to the content context menu (contentAreaContextMenu) and use the popupshowing event to know what DOM node (and by this what .ownerDocument and content window == .ownerDocument.defaultView) is currently focused.
An important thing to always keep in mind: Your add-on code runs with full privileges, while websites of course do not. So be careful not to write insecure code. E.g. all forms of unbound eval are evil.
Judging by your comments, your code is running in the context of the browser window. This means that document refers to the document of the browser window, not the document that is loaded into it. The easiest way to get to the latter is using the window.content property:
var contentDoc = content.document;
alert(contentDoc.forms.length);
This will give you only the current tab however. For the other tabs you can use the APIs provided by the <tabbrowser> element (accessible via the global gBrowser variable), e.g. to access the first tab:
var contentDoc = gBrowser.browsers[0].contentDocument;
alert(contentDoc.forms.length);
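The `content`/`gBrowser` approach above belongs to legacy XUL add-ons, which Firefox stopped supporting in Firefox 57. In a current WebExtension, the equivalent is a content script: there, `document` is the page loaded in the tab, so `document.forms` works directly. A minimal sketch (the function name is ours):

```javascript
// Content script: `document` here is the web page, not the browser window.
// Summarize each form by its action URL and number of controls.
function describeForms(doc) {
  return Array.from(doc.forms, (form) => ({
    action: form.action,
    controls: form.elements.length,
  }));
}

// Guarded so the function can also be loaded outside a browser.
if (typeof document !== "undefined") {
  console.log(describeForms(document));
}
```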

Can Chrome extension content scripts access window.opener?

In my extension, I'm trying to determine whether a new tab was created as a popup by another tab and if so, which tab.
I thought I would be able to use window.opener from a content script to help figure this out. But it looks like window.opener doesn't work correctly in content scripts.
When I create a tab manually, its window.opener is null, as expected.
When a tab is created as a popup by another tab, its window.opener is undefined. I can infer from this that the tab was created as a popup, but I can't use it to figure out which tab created the new one.
Is this a known issue, and does anybody know of any workarounds?
I didn't look closely into this problem, but I think I can point you in the right direction. A content script can't access variables from the page's window because it runs sandboxed, in an isolated world separate from the page's own scripts. A workaround is to run your code directly in the page; to do this, you need to inject your script inside a script tag.
Your content script would look like this:
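function injectJs(link) {
  var scr = document.createElement("script");
  scr.type = "text/javascript";
  scr.src = link;
  (document.head || document.body || document.documentElement).appendChild(scr);
}
// inject.js must be listed under "web_accessible_resources" in manifest.json.
// chrome.runtime.getURL replaces the deprecated chrome.extension.getURL.
injectJs(chrome.runtime.getURL("inject.js"));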
Now you can run your code without sandbox restrictions as if it was right on the page:
inject.js:
alert(window.opener);
I assume you would now like to pass this information back to a background page, which is another challenge, because the injected page script can't use the Chrome extension APIs. The good news is that content scripts share the page's DOM and can listen to DOM events, so the injected script can use them to pass information to the content script, which then sends it on to the background page. I am pretty sure you can register a custom DOM event and have your content script listen for it (I haven't tried this part myself).

How do I build an iframe with the same domain as the page in Safari/WebKit

The scene: I'm writing an embeddable widget. It takes the form of a <script> tag, which builds an iframe containing everything it needs to display. The iframe has no src, and the script writes to it with theIframe.contentWindow.document.write(). This keeps the widget contained, and keeps element ids and script from conflicting with the page on which the widget is embedded.
The trick: The widget has to be able to change its size. To do this, it sets its containing iframe's style.height. This requires access to the outer page's DOM. In Firefox and IE, this is allowed, because the iframe's document and the outer document are considered to share an origin.
The twist: In Safari, however, the two documents are considered not to share an origin. The inner document is considered to be at about:blank, while the outer document is clearly using a different protocol and "domain" (if blank can be considered the domain).
The question: How can I build an iframe programmatically whose document Safari/WebKit will consider to have the same origin as the document of the window creating it?
Edit: After further experimentation, I can't find a way to programmatically create an iframe whose location is not about:blank regardless of whether I change its contents.
If I create the frame with document.createElement(), give it a src which points to a real HTML resource on the same origin called "foo.html", and document.body.appendChild() it, Safari's console shows the element as expected in the DOM, but the contents of the page do not appear, and the document is listed in the sidebar as "about:blank".
If I include the HTML for the iframe directly in the page, the contents of foo.html appear, and "foo.html" appears in the sidebar.
If I insert the HTML using document.write(), I get the same result as with document.body.appendChild().
Both programmatic versions work in Firefox.
The best suggestion I can give is to point the iframe at a blank page on the same server (e.g. blank.html) and then edit the content. A pain in the rear, I know, but it's a workaround.
You could also try
iframe.contentDocument.open("replace");
iframe.contentDocument.write("<b>This is some content</b>");
iframe.contentDocument.close();
However, I'm not sure if that only works in IE. Sorry I couldn't be more helpful than that.
Aha. This seems to be a bug in WebKit. When an iframe is created programmatically, its src attribute is ignored. Instead, the frame defaults to about:blank and must be directed to a URL to point elsewhere. For example:
theIframe.contentWindow.location = theIframe.src
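Once the frame's document is treated as same-origin, the widget's resize step from the question can be done from inside the frame through `window.frameElement`, which returns the containing iframe element, or `null` when the embedding page is cross-origin. A sketch with a hypothetical function name; pass the frame's own `window`:

```javascript
// Run inside the widget's iframe. Returns false if the parent is not reachable.
function resizeToContent(win) {
  const frame = win.frameElement; // null when the parent is cross-origin
  if (!frame) return false;
  // Match the iframe's height to the height of its own content.
  const height = win.document.documentElement.scrollHeight;
  frame.style.height = height + "px";
  return true;
}

// e.g. after writing the widget's markup: resizeToContent(window);
```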
