I have a web page with some user content displayed on initial page load. There's a button that can trigger further page loads (somewhat like Twitter's infinite scrolling). The Django template that renders the original page is also used to render content for infinite scroll (partial view). The server sends formatted HTML via AJAX that can be easily inserted into my existing page like this:
$(newHtml).insertAfter($('#existing-content'));
With this option, I can reuse the existing template to render my content. Is it worth the convenience, and can I assume that escaping on Django's end covers me against XSS? It's this jQuery ticket that worries me: it can be dangerous stuffing all that HTML into the $(...) selector.
Or should I use JSON as the response type and carefully craft all user content as text nodes with jQuery? This is a lot more work and error prone, since I'd be duplicating the template rendering to a large extent. Even though this route appears a lot safer, it's difficult to maintain two redundant rendering methods: one via the template, the other via JS.
Update: I considered using innerHTML as suggested in one of the comments, but I'm not sure about that either: http://www.slideshare.net/x00mario/the-innerhtml-apocalypse (Mario Heiderich on mXSS)
The Django template that renders the original page is also used to render content for infinite scroll (partial view).
Therefore you are safe. If all you are doing is rendering content that you have control of and already trust for your initial page load, then there will be nothing malicious in the page source to be rendered. Since you are loading from a trusted source, if any malicious code were rendered that caused XSS, that would be an issue with the AJAX service that supplies the source, not with the jQuery that renders it.
The ticket you refer to is regarding XSS with $(location.hash) and $(#<tag>), which is different from what you are doing (and it has also been marked as fixed).
Update
Use these lines before passing the HTML into jQuery; they will remove any scripts (including onmouseover handlers etc.) included in the HTML:
// strip <script>...</script> blocks from the markup
var cleansed = str.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gmi, "");
// strip inline event-handler attributes (onclick, onmouseover, ...) up to the closing bracket
cleansed = cleansed.replace(/\bon\w+\s*=[\s\S]*?>/gmi, ">");
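You can then insert the cleansed markup exactly as before, for example (the selector is taken from the question, so it's illustrative):
$(cleansed).insertAfter($('#existing-content'));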
I still say that the jQuery/Django method is the ideal method. The above lines will clean the content of scripts before you add it to the DOM.
Previous
Django should escape anything it's passing into that line of jQuery. So long as you aren't using an eval() statement anywhere in there, cross-site scripting shouldn't be a concern. I'd say it's more practical to continue the way you're going.
Related
I'm not a full-time JavaScript developer. We have a web app, and one piece is to write out a small informational widget onto another domain. This is literally just an HTML table with some values written out into it. I have had to do this a couple of times over the past 8 years, and I always end up doing it via a script that just document.write's out the table.
For example:
document.write('<table border="1"><tr><td>here is some content</td></tr></table>');
on theirdomain.com
<body>
....
<script src='http://ourdomain.com/arc/v1/api/inventory/1' type='text/javascript'></script>
.....
</body>
I always think this is a bit ugly, but it works fine and we always have control over the content (or a trusted representative has control, such as your current inventory or something). So another project like this came up and I coded it up in like 5 minutes using document.write. Somebody else thinks this is just too ugly, but I don't see what the problem is. Regarding the widget aspect, I have also done iframe and JSONP implementations, but iframes tend not to play well with other sites' CSS and JSONP tends to just be too much. Is there some security element I'm missing? Or is what I'm doing OK? What would be the strongest argument against using this technique? Is there a best practice I don't get?
To be honest, I don't really see a problem. Yes, document.write is very old-school, but it is simple and universally supported; you can depend on it working the same in every browser.
For your application (writing out an HTML table with some data), I don't think a more complex solution is necessary if you're willing to accept a few small risks. Dealing with DOM mutation that works correctly across browsers is not an easy thing to get right if you're not using jQuery (et al.).
The risks of document.write:
Your script must be loaded synchronously. This means a normal inline script tag (like you're already using). However, if someone gets clever and adds the async or defer attributes to your script tag (or does something fancy like appending a dynamically created script element to the head), your script will be loaded asynchronously.
This means that when your script eventually loads and calls write, the main document may have already finished loading and the document is "closed". Calling write on a closed document implicitly calls open, which completely clears the DOM; it's essentially the same as wiping the page clean and starting from scratch. You don't want that.
Because your script is loaded synchronously, you put third-party pages at the mercy of your server. If your server goes down or gets overloaded and responds slowly, every page that contains your script tag cannot finish loading until your server responds or the browser times out the request.
The people who put your widget on their website will not be happy.
If you're confident in your uptime, then there's really no reason to change what you're doing.
The alternative is to load your script asynchronously and insert your table into the correct spot in the DOM. This means third parties would have to insert a script snippet (either <script async src="..."> or the dynamic script tag insertion trick) and also carve out a special <div id="tablegoeshere"> for you to put your table into.
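A minimal sketch of that asynchronous approach (the placeholder id is from this answer; everything else is illustrative). On theirdomain.com:

<div id="tablegoeshere"></div>
<script async src="http://ourdomain.com/arc/v1/api/inventory/1"></script>

And inside your script, instead of calling document.write:

// build the table and append it to the placeholder the host page provides
var container = document.getElementById('tablegoeshere');
var table = document.createElement('table');
table.border = '1';
var cell = table.insertRow().insertCell();
cell.textContent = 'here is some content';
container.appendChild(table);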
Calling document.write() after the entire DOM has loaded wipes the document, so you can no longer access the original DOM at all.
See Why do I need to use document.write instead of DOM manipulation methods?.
In that case you are throwing away a very powerful capability of the web page...
Is there some security element I'm missing?
The security risk is for them, in that theirdomain.com is trusting your domain's script code not to do anything malicious. Your client script will run in the context of their domain and can do what it likes, such as stealing cookies or embedding a key logger (not that you would do that, of course). As long as they trust you, that is fine.
Currently I am creating a website that is completely JS-driven. I don't use any HTML pages at all (except the index page). Every query returns JSON, and then I generate HTML inside JavaScript and insert it into the DOM. Are there any disadvantages to doing this instead of creating an HTML file with the layout structure, loading that file into the DOM, and changing its elements with new data from JSON?
EDIT:
All of my pages are loaded with AJAX calls. But I have a structure like this:
<nav></nav>
<div id="content"></div>
<footer></footer>
Basically, I never change the nav or footer elements; they are only loaded once, when the index.html file loads. Then on every page click I send an AJAX call to the server; it returns data in JSON and I generate HTML with jQuery and insert it like this: $('#content').html(content);
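In code, each navigation click does roughly this (the URL and field names are placeholders):

$.getJSON('/api/page/' + pageName, function (data) {
  // build the markup from the JSON fields and swap it into the container
  var content = '<h1>' + data.title + '</h1><p>' + data.body + '</p>';
  $('#content').html(content);
});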
Creating separate HTML files and then, for example, using $('#someID').html(newContent) to change every element with JSON data would take even more code, and I would need one more request to the server to load the file, so I thought I could just generate it in the browser.
EDIT2:
SEO is not very important, because my website requires logging in, so I will put all the meta tags in the index.html file.
In general, it's a nice way of doing things. I assume that you're updating the page with AJAX each time (although you didn't say that).
There are some things to look out for. If you always have the same URL, then your users can't come back to the same page. And they can't send links to their friends. To deal with this, you can use history.pushState() to update the URL without reloading the page.
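A minimal sketch of that (the path scheme and the loadPage() helper are assumptions):

// record the new location without reloading the page
history.pushState({ page: pageName }, '', '/page/' + pageName);

// re-render when the user presses back/forward
window.onpopstate = function (event) {
  if (event.state) {
    loadPage(event.state.page); // your existing AJAX loader (hypothetical name)
  }
};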
Also, if you're sending more than one request per page and you don't have an HTML structure waiting for them, you may get them back in a different order each time. It's not a problem, just something to be aware of.
Returning HTML from the AJAX is a bad idea. It means that when you want to change the layout of the page, you need to edit all of your files. If you're returning JSON, it's much easier to make changes in one place.
One thing that definitely matters:
How long will it take you to develop a new system that sends data as JSON, plus the JS required to inject it into the page as HTML?
How long will it take to just return HTML? And how long if you can reuse some of your existing server-side code?
Also check how much server-side interaction your pages really need...
Also, some advantages of returning pure HTML:
1) It's simple markup, and often just as compact as, or actually more compact than, JSON.
2) It's less error prone, because all you're getting is markup, and no code.
3) It will be faster to program in most cases, because you won't have to write code separately for the client end.
4) The HTML is the content, the JavaScript is the behavior. You're mixing both for absolutely no compelling reason.
In JavaScript, or any other scripting language, if you hit a problem partway through, the rest of the code will not run.
It is also easier to debug pure HTML pages.
My opinion: use scripting code wherever necessary; the rest you can do in HTML.
It will save the round trip of going to the server, fetching the data, and then displaying it again.
Keep point No. 4 above in mind while coding.
I think that you can consider 3 methods:
Sending only JSON to the client and rendering according to a template (e.g. Handlebars.js)
Creating the pages from the server side; this usually renders faster, and you can also cache the page.
Or a mixture of the two: generate partial views on the server and send them to the client. It's like having a Handlebars template on the client and applying the data from the JSON, except the same template lives on the server side and is rendered there, so the client receives the final markup and just replaces the partial views.
Also, some things to think about depend on the use case of the application: if you are targeting SEO, you should consider ColBeseder's advice; if you are targeting mobile users, you would probably do better with the JSON-only response, as it is more lightweight.
EDIT:
According to what you said, you are creating a single-page application. If this is correct, then you can probably go with either JSON or partial views like AngularJS has. But if your server-side logic is written to handle only JSON responses, then you would probably do better with a template engine on the client like Handlebars.js, Underscore, or jQuery templates; you can define reusable portions of your HTML and apply the data from the JSON to them.
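A minimal sketch of that client-side templating route (the template string, endpoint, and field names here are made up for illustration):

var source = '<h1>{{title}}</h1><p>{{body}}</p>';
var template = Handlebars.compile(source); // compile once, reuse for every response
$.getJSON('/api/page/' + pageName, function (data) { // hypothetical endpoint
  $('#content').html(template(data)); // expects e.g. { "title": "...", "body": "..." }
});

As a bonus, Handlebars HTML-escapes {{...}} values by default, which helps against XSS.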
If you cared about SEO you'd want the HTML there at page load, which is closer to your second strategy than your first.
Update May 2014: Google claims to be getting better at executing Javascript: http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html Still unclear what works and what does not.
Further updates probably belong here: Do Google or other search engines execute JavaScript?
I was reading this article
http://msdn.microsoft.com/en-us/magazine/hh708755.aspx
related to securing an ASP.NET application, but one thing I am not able to understand: if I am browsing the URL http://www.abc.com/XSS.aspx?test=ok and I replace it with http://www.abc.com/XSS.aspx?test=alert('hacked')... how is the site unsafe or hacked? The point I am trying to make is that it does not seem to impact or affect the site at all.
The example I mentioned above appears in many places wherever security is discussed, but I didn't understand it.
Just imagine: if you are going to output the value of "test" (without escaping it properly for HTML usage) on your HTML page, then someone could inject any JavaScript into your page! Possible exploits range from changing the background to something obscene to redirecting your page to a scam website, in effect making you an accessory to fraud of some kind!
ALWAYS USE PROPER ESCAPING FOR STORING OR USING USER-SUBMITTED INFORMATION!
Edit: The escaping I am talking about is useful so that people don't inject HTML or JS into your database. An injection there would eventually serve the injected HTML/JS to every user (if the injected variable is the same for everyone), not just the user who injected it!
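As an illustration, a minimal escaping helper on the client might look like this (the helper name is made up; most server frameworks ship an equivalent):

function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
// escapeHtml("<script>alert('hacked')</script>") renders as inert text, not as code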
I'm trying to create an infinite scrolling page, somewhat similar to Tumblr archive pages like this one. I understand the concept that I have to load the content with a server call, but I don't know how to achieve the "animated loading" design Tumblr has.
I don't want to know the exact code, only the overall concept of the solution. So what would be the best practice to do things like this?
What should I get from the server: a bunch of JSON data or a full HTML page?
I have tried to decode the Tumblr page above, and I saw in my network traffic that on every scroll event there is a POST request which returns a full HTML page with its own JavaScript and CSS content!
I guess that the animation logic is inside of this JavaScript content.
But I have 2 questions about this method:
When I get the full HTML page from the server (which contains the new page as well), how can I throw away the currently displayed HTML document and set the new one?
Isn't it bad from a performance point of view to return a full HTML document every time? The full document would contain the results of the previous "pages" of the archive as well. Or am I thinking about this wrong?
Wouldn't it be better to return a JSON-only result from the server? (It would have to be parsed on the client, but it would be friendlier to network traffic, I guess.)
If it would be better to return JSON, why does Tumblr work the other way?
It surely is beneficial to not send lots of data that will not be used.
However, if your server has a lot of resources, you can do some preprocessing on the server instead of the client. That is, instead of JSON, you can send an HTML snippet: the block that will be added. Moreover, if your HTML structure is very complex, you don't want to implement it twice, once in HTML and once in JavaScript.
The way Tumblr works might be because they don't want to add much more to the server code base, and instead offload the work to the client. Since only one page is sent at a time, the overhead is constant w.r.t. the number of pages. The client can just take the full HTML, find the corresponding element with DOM manipulation and place it somewhere.
In fact, that is what the AutoPager plugin does: It learns the "next" link and the page body from the user, then fetches additional full pages from the unsuspecting server and inserts their content into the page (and reads the next page url).
In short:
The benefit of JSON is low bandwidth usage.
The benefit of HTML snippets is low demands on client processing power, and little to no code duplication.
The benefit of full HTML is that the server need not care whether it's serving the first page or any other.
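For example, pulling just the new content out of a full-page response might look like this (the URL scheme and the '#posts' container are assumptions):

$.get('/page/' + nextPage, function (fullHtml) {
  // parse the returned document off-DOM and extract only the post container
  var $page = $('<div>').append($.parseHTML(fullHtml));
  $('#posts').append($page.find('#posts').children());
});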
I'm writing a web app that inserts and modifies HTML elements via AJAX using JQuery. It works very nicely, but I want to be sure everything is ok under the bonnet. When I inspect the source of the page in IE or Chrome it shows me the original document markup, not what has changed since my AJAX calls.
I love using the W3C validator to check my markup, as it occasionally reminds me that I've forgotten to close a tag, etc. How can I use this to check the markup of my page after the original source served from the server has been changed via JavaScript?
Thank you.
Use the developer tools in Chrome to explore the DOM: they will show you all the HTML you've added in JavaScript.
You can then copy it and paste it into any validator you want.
Or, instead of inserting the code with jQuery, log it to the console; that way the browser won't get a chance to close tags for you:
console.log(myHTML)
Both previous answers make good points about the fact that the browser will "fix" some of the HTML you insert into the DOM.
Back to your question: you could add the following as a bookmarklet in your browser. It will write out the contents of the DOM to a new window; copy and paste that into a validator.
javascript:window.open("").document.open("text/plain", "").write(document.documentElement.outerHTML);
If you're just concerned about well-formedness (missing closing tags and such), you probably just want to check the structure of the chunks AJAX is inserting. (Once it's part of the DOM, it's going to be well-formed... just not necessarily the structure you intended.) The simplest way to do that would probably be to attempt to parse it using an XML library. (one with an HTML mode that can be made strict, if you're not using XHTML)
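A rough sketch of such a well-formedness check using the browser's strict XML parser (this assumes your fragments are XHTML-compatible):

var doc = new DOMParser().parseFromString(fragment, 'application/xml');
if (doc.getElementsByTagName('parsererror').length > 0) {
  console.warn('Fragment is not well-formed:', fragment);
}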
Actual validation (Testing the "You can't put tag X inside tag Y" rules which browsers generally don't care too much about) is a lot trickier and, depending on how much effort you're willing to put into it, may not be worth the trouble. (Because, if you validate them in isolation, you'll get a lot of "This is just a fragment" false positives)
Whichever you decide to use, you need to grab the AJAX responses before the browser parses them if you want a reliable test result. (While they're still just a string of text rather than a DOM tree)
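In jQuery that just means asking for the raw text (the endpoint is a placeholder):

$.ajax({
  url: '/partial/next-page', // illustrative endpoint
  dataType: 'text' // keep the raw string; skip jQuery's HTML parsing
}).done(function (rawHtml) {
  console.log(rawHtml); // this is what you paste into the validator
});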