Getting selected text with cross frame scripting


I've been struggling with a problem for a few hours now, and I would appreciate either some help in accomplishing my goal, or confirmation that what I'm trying to do is in fact impossible.
I have a webapp that takes the selected text (document.getSelection()) as input, from an arbitrary webpage. While it would be possible to use a bookmarklet to do such scripting fairly easily, it's best for the end-user if I can accomplish this with an iframe.
The parent frame is my site with this script:
$('#frame').load(function(){
    // this event won't be triggered
    $(window).mouseup(function(){
        doStuff(window.getSelection());
    });
    // this will throw a security error
    $(window.frames[0].document).mouseup(function(){
        doStuff(window.frames[0].document.getSelection());
    });
});
An arbitrary site is in the child frame. Unless the child document is from my domain, access is forbidden for XSS security reasons. I've tried several variations and attempted hacks, including setting the iframe src to my domain with the third party URL as an argument, and then redirecting to the third party URL. In a sense, I'm glad that it didn't work (because if it did, then XSS security would still have a long way to go...)
Another option would be downloading the third party page and serving it from my domain like a proxy server, but I've already run into a bunch of problems with relative paths to files, which are sometimes easy to make absolute, but sometimes a fool's errand (such as when the files are accessed via script).
I've concluded that I might just be out of luck. Perhaps an important distinction for my case is that I only want to access the .getSelection() method for the child. No need to be able to access cookies or keystrokes or interact with the DOM. Maybe it doesn't make a difference, but maybe it does.

You could try the proxy method but insert a base tag that points to the original domain. The paths should be taken care of then.
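A rough sketch of the base tag idea, assuming the proxy already has the fetched page as a string. The helper name injectBaseTag and the example URL are made up for illustration; it only rewrites where relative URLs resolve, and requests issued by the page's own scripts may still fail once they point at another origin.

```javascript
// Hypothetical helper: add a <base> tag to proxied HTML so that relative
// URLs resolve against the original site instead of the proxy.
function injectBaseTag(html, originalUrl) {
  // Escape characters that would break out of the attribute value.
  const safeHref = originalUrl.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const baseTag = '<base href="' + safeHref + '">';

  // Put the tag right after the opening <head ...> so it comes before any
  // element that carries a relative URL. (<header> does not match.)
  const headOpen = /<head(\s[^>]*)?>/i;
  if (headOpen.test(html)) {
    // A callback avoids "$" in the URL being read as a replacement pattern.
    return html.replace(headOpen, (match) => match + baseTag);
  }
  // No <head> found: prepend the tag and leave placement to the HTML parser.
  return baseTag + html;
}

const page = "<html><head><title>t</title></head><body></body></html>";
console.log(injectBaseTag(page, "http://example.com/dir/"));
```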
I wouldn't rely on any XSS hacks even if you could find them -- they'd likely be corrected and most likely not crossbrowser.

One possibility is to write your iframe's document with text from an XHR, or to use jQuery's load() function. This only works if there is no navigation in the iframe, though.

I didn't fully read your question :-) but maybe I have a solution.
It involves passing data between parent and child frames, both ways.
You can WRITE (but not READ) the hash of the parent from the child (the hash is the part after # in http://url#HASH_PART).
So, in the parent frame, just set an interval to check the value, say every 50ms.
function checkHash() {
    if (window.location.hash == "#something") {
        // my child frame set this value using:
        // parent.window.location.hash = "something";
        doSomething();
    }
}
For further details, and maybe doing parent to child communications, then
the full article explaining this (also has demo link) can be found here.
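For what it's worth, here is a minimal sketch of the parent-side polling generalized to any hash value. The name createHashPoller and the injected readHash function are my own assumptions (they are not from the linked article); it also assumes the child is able to set the parent's hash as described above, which depends on the browser.

```javascript
// Sketch: report every change of the hash to a callback.
// readHash is injected so the logic can run outside a browser; in a page
// it would be () => window.location.hash.
function createHashPoller(readHash, onMessage) {
  let lastSeen = readHash();
  return function poll() {
    const current = readHash();
    if (current !== lastSeen) {
      lastSeen = current;
      // Strip the leading "#" before handing the value over.
      onMessage(current.replace(/^#/, ""));
    }
  };
}

// In the parent page (browser only):
// const poll = createHashPoller(() => window.location.hash, doSomething);
// setInterval(poll, 50);
```

The callback only fires when the value changes, so the same message is not handled again on every 50ms tick.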

Related

how do I make javascript function run in same window; it's reloading to a new page [duplicate]

I know document.write is considered bad practice; and I'm hoping to compile a list of reasons to submit to a 3rd party vendor as to why they shouldn't use document.write in implementations of their analytics code.
Please include your reason for claiming document.write as a bad practice below.

A few of the more serious problems:
document.write (henceforth DW) does not work in XHTML
DW does not directly modify the DOM, preventing further manipulation (trying to find evidence of this, but it's at best situational)
DW executed after the page has finished loading will overwrite the page, or write a new page, or not work
DW executes where encountered: it cannot inject at a given node point
DW is effectively writing serialised text which is not the way the DOM works conceptually, and is an easy way to create bugs (.innerHTML has the same problem)
Far better to use the safe and DOM friendly DOM manipulation methods
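As an illustration of the "inject at a given node point" item, a small sketch using standard DOM calls (createElement, textContent, insertBefore). The function name insertNoticeBefore is made up, and the document is passed in as a parameter only so the snippet does not depend on a global.

```javascript
// Sketch: insert a new paragraph directly before an existing node,
// which document.write cannot do once the parser has moved past that point.
function insertNoticeBefore(doc, referenceNode, text) {
  const notice = doc.createElement("p");
  // textContent stores plain text; markup in `text` is not parsed.
  notice.textContent = text;
  referenceNode.parentNode.insertBefore(notice, referenceNode);
  return notice;
}

// In a page (browser only), with a hypothetical element id:
// insertNoticeBefore(document, document.getElementById("footer"), "Hello");
```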

There's actually nothing wrong with document.write, per se. The problem is that it's really easy to misuse it. Grossly, even.
In terms of vendors supplying analytics code (like Google Analytics) it's actually the easiest way for them to distribute such snippets
It keeps the scripts small
They don't have to worry about overriding already established onload events or including the necessary abstraction to add onload events safely
It's extremely compatible
As long as you don't try to use it after the document has loaded, document.write is not inherently evil, in my humble opinion.

Another legitimate use of document.write comes from the HTML5 Boilerplate index.html example.
<!-- Grab Google CDN's jQuery, with a protocol relative URL; fall back to local if offline -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.3/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.3.min.js"><\/script>')</script>
I've also seen the same technique for using the json2.js JSON parse/stringify polyfill (needed by IE7 and below).
<script>window.JSON || document.write('<script src="json2.js"><\/script>')</script>

It can block your page
document.write only works while the page is loading; if you call it after the page is done loading, it will overwrite the whole page.
This effectively means you have to call it from an inline script block, and that will prevent the browser from processing the parts of the page that follow. Scripts and images will not be downloaded until the writing block is finished.

Pro:
It's the easiest way to embed inline content from an external (to your host/domain) script.
You can overwrite the entire content in a frame/iframe. I used to use this technique a lot for menu/navigation pieces before more modern Ajax techniques were widely available (1998-2002).
Con:
It forces the rendering engine to pause until the external script is loaded, which could take much longer than an internal script.
It is usually used in such a way that the script is placed within the content, which is considered bad form.

Here's my twopence worth, in general you shouldn't use document.write for heavy lifting, but there is one instance where it is definitely useful:
http://www.quirksmode.org/blog/archives/2005/06/three_javascrip_1.html
I discovered this recently trying to create an AJAX slider gallery. I created two nested divs, and applied width/height and overflow: hidden to the outer <div> with JS. This was so that in the event that the browser had JS disabled, the div would float to accommodate the images in the gallery - some nice graceful degradation.
Thing is, as with the article above, this JS hijacking of the CSS didn't kick in until the page had loaded, causing a momentary flash as the div was loaded. So I needed to write a CSS rule, or include a sheet, as the page loaded.
Obviously, this won't work in XHTML, but since XHTML appears to be something of a dead duck (and renders as tag soup in IE) it might be worth re-evaluating your choice of DOCTYPE...

It overwrites content on the page which is the most obvious reason but I wouldn't call it "bad".
It just doesn't have much use unless you're creating an entire document using JavaScript in which case you may start with document.write.
Even so, you aren't really leveraging the DOM when you use document.write--you are just dumping a blob of text into the document so I'd say it's bad form.

It breaks pages using XML rendering (like XHTML pages).
Best: some browsers switch back to HTML rendering and everything works fine.
Probable: some browsers disable the document.write() function in XML rendering mode.
Worst: some browsers fire an XML error whenever document.write() is used.

Off the top of my head:
document.write needs to be used in the page load or body load. So if you want to use the script in any other time to update your page content document.write is pretty much useless.
Technically document.write will only update HTML pages not XHTML/XML. IE seems to be pretty forgiving of this fact but other browsers will not be.
http://www.w3.org/MarkUp/2004/xhtml-faq#docwrite

Chrome may block document.write that inserts a script in certain cases. When this happens, it will display this warning in the console:
A Parser-blocking, cross-origin script, ..., is invoked via
document.write. This may be blocked by the browser if the device has
poor network connectivity.
References:
This article on developers.google.com goes into more detail.
https://www.chromestatus.com/feature/5718547946799104

Browser Violation
.write is considered a browser violation as it halts the parser from rendering the page. The parser receives the message that the document is being modified; hence, it gets blocked until JS has completed its process. Only at this time will the parser resume.
Performance
The biggest consequence of employing such a method is lowered performance. The browser will take longer to load page content. The adverse reaction on load time depends on what is being written to the document. You won't see much of a difference if you are adding a <p> tag to the DOM as opposed to passing an array of 50-some references to JavaScript libraries (something which I have seen in working code and resulted in an 11 second delay - of course, this also depends on your hardware).
All in all, it's best to steer clear of this method if you can help it.
For more info see Intervening against document.write()

I don't think using document.write is a bad practice at all. In simple words it is like a high voltage for inexperienced people. If you use it the wrong way, you get cooked. There are many developers who have used this and other dangerous methods at least once, and they never really dig into their failures. Instead, when something goes wrong, they just bail out, and use something safer. Those are the ones who make such statements about what is considered a "Bad Practice".
It's like formatting a hard drive, when you need to delete only a few files and then saying "formatting drive is a bad practice".

Based on analysis done by Google-Chrome Dev Tools' Lighthouse Audit,
For users on slow connections, external scripts dynamically injected via document.write() can delay page load by tens of seconds.

One can think of document.write() (and .innerHTML) as evaluating a source code string. This can be very handy for many applications. For example if you get HTML code as a string from some source, it is handy to just "evaluate" it.
In the context of Lisp, DOM manipulation would be like manipulating a list structure, e.g. create the list (orange) by doing:
(cons 'orange '())
And document.write() would be like evaluating a string, e.g. create a list by evaluating a source code string like this:
(eval-string "(cons 'orange '())")
Lisp also has the very useful ability to create code using list manipulation (like using the "DOM style" to create a JS parse tree). This means you can build up a list structure using the "DOM style", rather than the "string style", and then run that code, e.g. like this:
(eval '(cons 'orange '()))
If you implement coding tools, like simple live editors, it is very handy to have the ability to quickly evaluate a string, for example using document.write() or .innerHTML. Lisp is ideal in this sense, but you can do very cool stuff also in JS, and many people are doing that, like http://jsbin.com/

A simple reason why document.write is a bad practice is that you cannot come up with a scenario where you cannot find a better alternative.
Another reason is that you are dealing with strings instead of objects (it is very primitive).
It does only append to documents.
It has nothing of the beauty of for instance the MVC (Model-View-Controller) pattern.
It is a lot more powerful to present dynamic content with ajax+jQuery or angularJS.

The disadvantages of document.write mainly depend on these 3 factors:
a) Implementation
The document.write() is mostly used to write content to the screen as soon as that content is needed. This means it happens anywhere, either in a JavaScript file or inside a script tag within an HTML file. With the script tag being placed anywhere within such an HTML file, it is a bad idea to have document.write() statements inside script blocks that are intertwined with HTML inside a web page.
b) Rendering
Well designed code in general will take any dynamically generated content, store it in memory, keep manipulating it as it passes through the code before it finally gets spit out to the screen. So to reiterate the last point in the preceding section, rendering content in-place may render faster than other content that may be relied upon, but it may not be available to the other code that in turn requires the content to be rendered for processing. To solve this dilemma we need to get rid of the document.write() and implement it the right way.
c) Impossible Manipulation
Once it's written it's done and over with. We cannot go back to manipulate it without tapping into the DOM.

I think the biggest problem is that any elements written via document.write are added to the end of the page's elements. That's rarely the desired effect with modern page layouts and AJAX. (you have to keep in mind that the elements in the DOM are temporal, and when the script runs may affect its behavior).
It's much better to set a placeholder element on the page, and then manipulate its innerHTML.

using document.write in remotely loaded javascript to write out content - why a bad idea?

I'm not a full-time Javascript developer. We have a web app and one piece is to write out a small informational widget onto another domain. This literally is just a html table with some values written out into it. I have had to do this a couple of times over the past 8 years and I always end up doing it via a script that just document.write's out the table.
For example:
document.write('<table border="1"><tr><td>here is some content</td></tr></table>');
on theirdomain.com
<body>
....
<script src='http://ourdomain.com/arc/v1/api/inventory/1' type='text/javascript'></script>
.....
</body>
I always think this is a bit ugly but it works fine and we always have control over the content (or a trusted representative has control, such as over your current inventory or something). So another project like this came up and I coded it up in like 5 minutes using document.write. Somebody else thinks this is just too ugly but I don't see what the problem is. Re the widget aspect, I have also done iframe and jsonp implementations, but an iframe tends not to play well with other sites' css and jsonp tends to just be too much. Is there some security element I'm missing? Or is what I'm doing ok? What would be the strongest argument against using this technique? Is there a best practice I don't get?

To be honest, I don't really see a problem. Yes, document.write is very old-school, but it is simple and universally supported; you can depend on it working the same in every browser.
For your application (writing out a HTML table with some data), I don't think a more complex solution is necessary if you're willing to assume a few small risks. Dealing with DOM mutation that works correctly across browsers is not an easy thing to get right if you're not using jQuery (et al).
The risks of document.write:
Your script must be loaded synchronously. This means a normal inline script tag (like you're already using). However, if someone gets clever and adds the async or defer attributes to your script tag (or does something fancy like appending a dynamically created script element to the head), your script will be loaded asynchronously.
This means that when your script eventually loads and calls write, the main document may have already finished loading and the document is "closed". Calling write on a closed document implicitly calls open, which completely clears the DOM – it's essentially the same as wiping the page clean and starting from scratch. You don't want that.
Because your script is loaded synchronously, you put third-party pages at the mercy of your server. If your server goes down or gets overloaded and responds slowly, every page that contains your script tag cannot finish loading until your server does respond or the browser times out the request.
The people who put your widget on their website will not be happy.
If you're confident in your uptime, then there's really no reason to change what you're doing.
The alternative is to load your script asynchronously and insert your table into the correct spot in the DOM. This means third parties would have to insert a script snippet (either <script async src="..."> or the dynamic script tag insertion trick). They would also need to carve out a special <div id="tablegoeshere"> for you to put your table into.
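A minimal sketch of what the asynchronously loaded script could do, assuming the host page provides the placeholder div described above. The function name renderInventoryTable and the sample rows are invented for illustration; only standard DOM calls are used, and the document is passed in as a parameter.

```javascript
// Sketch: build the table with DOM methods and attach it to the
// placeholder the host page carved out, instead of calling document.write.
function renderInventoryTable(doc, placeholderId, rows) {
  const target = doc.getElementById(placeholderId);
  if (!target) {
    return false; // host page did not provide the placeholder
  }
  const table = doc.createElement("table");
  for (const row of rows) {
    const tr = doc.createElement("tr");
    for (const cell of row) {
      const td = doc.createElement("td");
      td.textContent = String(cell); // plain text, never parsed as HTML
      tr.appendChild(td);
    }
    table.appendChild(tr);
  }
  target.appendChild(table);
  return true;
}

// In the widget script (browser only):
// renderInventoryTable(document, "tablegoeshere", [["here is some content"]]);
```

Because nothing is written through the parser, it does not matter whether the host page has finished loading when this runs, as long as the placeholder already exists.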

Calling document.write() after the entire DOM has loaded replaces the document, so you can no longer access the DOM that was there.
See Why do I need to use document.write instead of DOM manipulation methods?.
In that case you are throwing away a very powerful feature of the web page...

Is there some security element I'm missing?
The security risk is theirs: theirdomain.com is trusting your domain's script code not to do anything malicious. Your client script will run in the context of their domain and can do what it likes, such as stealing cookies or embedding a key logger (not that you would do that, of course). As long as they trust you, that is fine.

If source code is altered then redirect

Is it possible to use jQuery/Javascript to see if a webpage's source code is altered by a visitor and, if so, redirect them?
And by altered, I mean if they open firebug or something and edit anything on the page once it's finished loading?

This seems like a hack to prevent people from messing with your forms.
This is most definitely not the right way to make your site more secure; security must always come from the server-side or, if everything is done via the front-end, in a way that can only hurt the user who is currently signed in.
Even if you did succeed in implementing this using JavaScript, the first thing I would do is disable exactly that :) or just disable JavaScript, use wget, inspect the code first, then write a curl work-around, etc.

Even if there is a way to do that, the visitor can still edit this verification, so this is pointless.

Yes. When the page loads, store the innerHTML of the html element in a string.
Then set an interval to check every second whether the current HTML matches the stored value.
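Roughly like this, as a sketch only. The name createTamperCheck and the redirect target are made up, and as the other answers point out, a visitor can simply disable the check, so it is no security measure. Any script on the page that legitimately changes the DOM would trigger it too.

```javascript
// Sketch: remember the HTML once, then compare on every call.
// readHtml is injected; in a page it would be
// () => document.documentElement.innerHTML.
function createTamperCheck(readHtml, onChange) {
  const snapshot = readHtml();
  return function check() {
    if (readHtml() !== snapshot) {
      onChange();
      return true;
    }
    return false;
  };
}

// In the page (browser only), with a hypothetical redirect target:
// const check = createTamperCheck(
//   () => document.documentElement.innerHTML,
//   () => { window.location.href = "/"; }
// );
// setInterval(check, 1000);
```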

prevent parent DOM manipulation for a child iframe script

I have a page containing another page on the same domain inside a frame. Is it possible to prevent that a script in the framed page can manipulate the top page DOM (for example adding an element or a script)?

You could experiment with getting rid of "dangerous" functions while saving private references to them, something like:
(function(){
    var hiddenrefs = {};
    // keep a private reference, then remove the public one
    hiddenrefs.dGetElementById = document.getElementById;
    document.getElementById = null;
})();
and so on. However, this would be a very tedious job and bound to fail anyway. If this is an attempt to let users run Javascript in a controlled environment inside an iframe, this is a misguided form of security. The iframe could just issue top.location = "http://www.myevilpage.com" in which case it's game over for you anyway. (This is true even with a different domain. The iframe can still redirect the user and all sorts of nasty stuff, even if it strictly speaking can't access the parent's DOM.) Letting users run JS code is never ever safe without filtering the source code for malicious code, and even with filtering it's fairly unsafe because it's mostly easy to bypass the filtering. Many have tried and many have failed. I'd recommend not letting users run Javascript, ever.

The best solution is probably to use the HTML5 sandbox attribute on the iframe, which (by default) explicitly disables both scripting and same-origin access to the parent DOM.
See http://msdn.microsoft.com/en-us/hh563496.aspx
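A small sketch of setting the attribute from script. The function name createSandboxedFrame and the allowTokens parameter are my own; sandbox, allow-scripts and allow-same-origin are the standard attribute and tokens. The document is passed in as a parameter.

```javascript
// Sketch: create an iframe whose content is sandboxed.
// An empty sandbox value applies every restriction: no scripts, and the
// content is treated as coming from a unique origin.
function createSandboxedFrame(doc, src, allowTokens = []) {
  const frame = doc.createElement("iframe");
  // Set sandbox before src so the restrictions apply to the first load.
  frame.setAttribute("sandbox", allowTokens.join(" "));
  frame.src = src;
  return frame;
}

// In the top page (browser only), with a hypothetical URL:
// document.body.appendChild(createSandboxedFrame(document, "framed.html"));
// Scripts allowed, but still no same-origin access to the parent:
// createSandboxedFrame(document, "framed.html", ["allow-scripts"]);
```

Note that granting both allow-scripts and allow-same-origin to a same-domain page lets the framed script remove the sandbox again, so for this question only allow-scripts (or nothing) makes sense.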

How to neutralize injected remote Ajax content?

I'll be inserting content from remote sources into a web app. The sources should be limited/trusted, but there are still a couple of problems:
The remote sources could
1) be hacked and inject bad things
2) overwrite objects in my global namespace
3) I might eventually open it up for users to enter their own remote source. (It would be up to the user to not get in trouble, but I could still reduce the risk.)
So I want to neutralize any/all injected content just to be safe.
Here's my plan so far:
1) find and remove all inline event handlers
str.replace(/(<[^>]+\bon\w+\s*=\s*["']?)/gi,"$1return;"); // untested
Ex.
<a onclick="doSomethingBad()" ...
would become
<a onclick="return;doSomethingBad()" ...
2) remove all occurrences of these tags:
script, embed, object, form, iframe, or applet
3) find all occurrences of the word script within a tag
and replace the word script with html entities for it
str.replace(/(<[^>]+)(script)/gi,toHTMLEntitiesFunc); // untested
would take care of
<a href="javascript: ..."
4) lastly any src or href attribute that doesn't start with http, should have the domain name of the remote source prepended to it
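For step 4, a sketch using the standard URL constructor rather than string concatenation. The function name toAbsoluteUrl and the example addresses are made up; rejecting everything except http and https is my own assumption about what the plan wants, and it happens to catch javascript: URLs from step 3 as well.

```javascript
// Sketch: resolve a src/href value against the remote source's URL and
// keep only http(s) results.
function toAbsoluteUrl(value, sourceUrl) {
  let url;
  try {
    url = new URL(value, sourceUrl);
  } catch (e) {
    return null; // not parseable as a URL at all
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null; // e.g. javascript:, data:
  }
  return url.href;
}

const source = "http://example.com/dir/page.html";
console.log(toAbsoluteUrl("img/a.png", source));           // http://example.com/dir/img/a.png
console.log(toAbsoluteUrl("/a.png", source));              // http://example.com/a.png
console.log(toAbsoluteUrl("javascript:alert(1)", source)); // null
```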
My question: Am I missing anything else? Other things that I should definitely do or not do?
Edit: I have a feeling that responses are going to fall into a couple camps.
1) The "Don't do it!" response
Okay, if someone wants to be 100% safe, they need to disconnect the computer.
It's a balance between usability and safety.
There's nothing to stop a user from just going to a site directly and being exposed. If I open it up, it will be a user entering content at their own risk. They could just as easily enter a given URL into their address bar as in my form. So unless there's a particular risk to my server, I'm okay with those risks.
2) The "I'm aware of common exploits and you need to account for this ..." response ... or You can prevent another kind of attack by doing this ... or What about this attack ...?
I'm looking for the second type unless someone can provide specific reasons why my approach would be more dangerous than what the user can do on their own.

Instead of sanitizing (blacklisting), I'd suggest you set up a whitelist and ONLY allow those very specific things.
The reason for this is you will never, never, never catch all variations of malicious script. There are just too many of them.

don't forget to also include <frame> and <frameset> along with <iframe>

For the sanitization, are you looking for this?
If not, perhaps you could pick up a few tips from this code snippet.
But, it must go without saying that prevention is better than cure. You had better allow only trusted sources, than allow all and then sanitize.
On a related note, you may want to take a look at this article, and its slashdot discussion.

It sounds like you want to do the following:
Insert snippets of static HTML into your web page
These snippets are requested via AJAX from a remote site.
You want to sanitise the HTML before injecting into the site, as this could lead to security problems like XSS.
If this is the case, then there are no easy ways to strip out 'bad' content in JavaScript. A whitelist solution is the best, but this can get very complex. I would suggest proxying requests for the remote content through your own server and sanitizing the HTML server side. There are various libraries that can do this. I would recommend either AntiSamy or HTMLPurifier.
For a completely browser-based way of doing this, you can use IE8's toStaticHTML method. However no other browser currently implements this.
