I'm trying to display an HTML email on an HTML page. Technically I can do it, but security is a concern: an attacker could craft malicious code and put it into an HTML email.
I've tried using the Microsoft XSS library to sanitize the HTML, but it strips out so much that it's basically not worth it.
I'm wondering if there's a better solution with iframes or something. E.g., is there a way to secure data within an iframe?
GMail seems to display HTML emails, so they must have a good HTML sanitizer.
Your basic options are:
sanitize the HTML (use a whitelist approach, for safety)
use an iframe with a src on a different domain, or with the html5 sandbox attribute
Both can be done effectively and there are lots of variations in the detail.
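As a rough sketch of the sandbox option, assuming the email body is available as a string and the browser supports the HTML5 sandbox and srcdoc attributes, you could generate the iframe markup like this. The helper names escapeAttribute and sandboxedEmailFrame are made up for illustration:

```javascript
// Escape a string so it can sit inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap untrusted email HTML in a sandboxed iframe.
// An empty sandbox attribute turns on every restriction: scripts do not run,
// forms cannot submit, and the content is treated as a unique origin.
function sandboxedEmailFrame(untrustedHtml) {
  return '<iframe sandbox="" srcdoc="' + escapeAttribute(untrustedHtml) + '"></iframe>';
}

console.log(sandboxedEmailFrame('<img src=x onerror="alert(1)">'));
```

Browsers that predate the sandbox attribute would ignore it, so this is probably best combined with sanitizing rather than used instead of it.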
Regarding sanitizing so much it wasn't worth it: good libraries like https://github.com/guardian/html-janitor/ (javascript) and https://github.com/jsocol/bleach (python) have the ability to customize the whitelist. It depends if you're just trying to present typical user-generated HTML emails with basic formatting or if you're trying to display fully "designed" newsletters with lots of images, tables, etc. If just the former, in some quick experimenting with bleach I was able to make most emails look good by simply adding br and div to the list of approved tags, so the whitespace didn't get wiped.
Related
I'm working on a CMS application where users can build and manage their own websites. There is a CRUD for HTML pages, and on create and update we sanitize the user's input and remove any JavaScript code.
Some users need to add widgets to their pages, which can come from any source. How can I allow them to do so without compromising the security of their page? Basically, I want to allow <script> tags from trusted sources, which might have some JavaScript content within them.
If you don't want JS on these pages, then you probably should not open Pandora's box by allowing some 'under certain circumstances'.
A 'trusted' source is hard enough to define and potentially even harder to control. Also, the original source may include 3rd party scripts and it would be near impossible to test and monitor every single one of them.
At the very least, I would recommend embedding widgets in iframes so that they can't interfere with the main page.
As a side note: you're probably already aware of that, but removing all JS code is not an easy task either as it may be included in many different places, be obfuscated, etc.
Just as a quick (and safe) example:
<img src="foo" onerror="console.log(atob('SSBoYXMgaGFja2VkIHRoZSBJbnRlcm5ldHMhIQ=='))">
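To make the point concrete, here is a small sketch. The filter below is a hypothetical naive check, not anything from the question: the snippet above contains no script tag at all, so a check that only looks for one would wave it through, and the actual message is hidden in base64.

```javascript
// The snippet from above, as a string.
const payload = `<img src="foo" onerror="console.log(atob('SSBoYXMgaGFja2VkIHRoZSBJbnRlcm5ldHMhIQ=='))">`;

// Hypothetical naive filter: treats input as safe when it has no <script> tag.
function looksSafeToNaiveFilter(html) {
  return !/<\s*script/i.test(html);
}

// Decode base64 in the browser (atob) or in Node (Buffer).
function decodeBase64(text) {
  return typeof atob === "function"
    ? atob(text)
    : Buffer.from(text, "base64").toString("binary");
}

const hiddenMessage = decodeBase64("SSBoYXMgaGFja2VkIHRoZSBJbnRlcm5ldHMhIQ==");

console.log(looksSafeToNaiveFilter(payload)); // true: the filter sees nothing wrong
console.log(hiddenMessage);                   // "I has hacked the Internets!!"
```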
That said, I doubt that a user would intentionally corrupt his own page. Which raises another question: do you really need to block JS in the first place?
You may have specific reasons that are not obvious in your post, though.
All of this is based on my own understanding of your problem, so don't hesitate to provide us with more details if I missed the plot.
I have a textarea with validation code that disallows <script> tags and JavaScript tags, but the user can still enter descriptions like <strong onmouseover=alert(2)>.
So when someone hovers over this strong tag, a JS alert box shows up.
How can I stop this kind of javascript injection?
You'll need to properly sanitize the HTML you allow. This is non-trivial, as you've discovered. (You probably need to disallow iframe and several other elements.)
Proper sanitizing requires a whitelist of elements, and within those a whitelist of attributes allowed on each. Obviously the various onXyz attributes would not be on the whitelist.
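A minimal sketch of that two-level whitelist, assuming the markup has already been parsed into tag names and attribute maps by a real HTML parser (the parsing itself is not shown). ALLOWED_ATTRIBUTES and filterElement are made-up names, and the list of tags is only an example:

```javascript
// Hypothetical whitelist: allowed tag -> attributes allowed on that tag.
const ALLOWED_ATTRIBUTES = {
  a: ["href", "title"],
  strong: [],
  em: [],
  p: [],
  br: [],
  div: [],
};

// Returns null for a tag that is not whitelisted; otherwise returns the tag
// with only its whitelisted attributes. Everything else is dropped,
// including every onXyz handler, because none of them are on the list.
function filterElement(tagName, attributes) {
  const tag = tagName.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ALLOWED_ATTRIBUTES, tag)) {
    return null;
  }
  const kept = {};
  for (const [name, value] of Object.entries(attributes)) {
    const lowerName = name.toLowerCase();
    if (ALLOWED_ATTRIBUTES[tag].includes(lowerName)) {
      kept[lowerName] = value;
    }
  }
  return { tag: tag, attributes: kept };
}

console.log(filterElement("strong", { onmouseover: "alert(2)" })); // handler removed
console.log(filterElement("script", {}));                          // null
```

Attribute values need their own checks as well (an href can carry a javascript: URL, for instance), which this sketch leaves out.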
Sanitizing must happen server-side, because anything client-side can be bypassed. So without knowing what server technology you're using, one can't recommend something. For instance, JSoup is a well-known one for Java, but of course, that's not useful to you if you aren't using Java. :-) For .Net, there's the HTML Agility Pack or the Microsoft Anti-XSS library, but this is a very incomplete list.
There are a lot of tools called html purifiers. You can try this for example.
The easy answer is replace(/</g,'&lt;');, but of course that prevents any HTML from being used. This is why BBCode, Markdown and other such languages exist: to provide formatting features without granting the user permission to post arbitrary code.
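A slightly fuller sketch of the escaping idea, covering the other characters that matter in HTML text and attribute values. The function name escapeHtml is just for illustration:

```javascript
// Turn user text into something safe to place in HTML by replacing the
// characters that have special meaning. The ampersand goes first so the
// entities produced by the later replacements are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml("<strong onmouseover=alert(2)>"));
// &lt;strong onmouseover=alert(2)&gt;
```

The browser then shows the tag as literal text instead of interpreting it.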
Alternatively, just search for things of the pattern /\bon[a-z]+=/i
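A quick experiment with that pattern, in case it helps you decide whether it is enough for your situation. It catches the example from the question, but HTML allows whitespace before the equals sign, and ordinary text can trip it too:

```javascript
const handlerPattern = /\bon[a-z]+=/i;

// The example from the question is caught.
const caughtOriginal = handlerPattern.test("<strong onmouseover=alert(2)>");

// A space before "=" is still a valid attribute, but the pattern misses it.
const caughtWithSpace = handlerPattern.test("<strong onmouseover =alert(2)>");

// Harmless text can match as well.
const flagsPlainText = handlerPattern.test("online=true");

console.log(caughtOriginal, caughtWithSpace, flagsPlainText); // true false true
```

So a pattern search like this is probably better treated as a quick first check than as the sanitizer itself.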
Is there any support for putting non-webfonts into emails now? Not just using @font-face, maybe another method?
I found this SO question from some time back, along with some other questions and articles from about the same time period.
Not consistently. There are popular email clients that still remove all CSS from HTML emails.
I was able to find a blog post on Campaign Monitor's site that has some test results from using this technique. Almost all email clients stripped out @font-face specifically, regardless of their general CSS support: http://www.campaignmonitor.com/blog/post/3044/does-font-face-work-in-email/.
Your best bet would be to use images, which isn't a great solution for a number of reasons, the main one being that images are commonly blocked by default and you want clients to be able to view the content of the email regardless.
This will not work. Web-based email systems will simply strip out your font statements. MS Outlook uses the MS-Word engine to display mails and is incapable of doing this.
As far as HTML email goes, it's still 1999 and will remain that way for a loooooong time.
A persistent follow-up of an admittedly similar question I had asked: What security restrictions should be implemented in allowing a user to upload a Javascript file that directs canvas animation?
I like to think I know JS decently enough, and I see common characters in all the XSS examples I've come across, which I am somewhat familiar with. What I am lacking are good XSS examples that could bypass a securely sound, rationally programmed system. I want people to upload HTML5 canvas creations onto my site. Are there any sites like this yet? People seem to get scared about this all the time, but what if you just wanted to do it for fun? If something happens to the server, then oh well: it's just an animation site, information spreads like wildfire anyway, and if anyone cares I'll tell them not to sign up.
If I allow a single textarea form field to act as an IDE using JS for my programming language written in JS, and do string replacing, filtering, and validation of the user's syntax before finally compiling it into JS to be echoed by PHP, how bad could it get for me to host that content? Please show me how you could bypass all of my combined considerations, with also taking into account the server-side as well:
If JavaScript is disabled, preventing any POST from getting through, keeping constant track of user session.
Namespacing the Class, so they can only prefix their functions and methods with EXAMPLE.
Making instance
Storing my JS Framework in an external (immutable in the browser?) JS file, which needs to be at the top of the page for the single textarea field in the form to be accepted, as well as a server-generated key which must follow it. On the page that hosts the compiled user-uploaded canvas game/animation (1 per page ONLY), the server will verify the correct JS filename string before echoing the rest out.
No external script calls! String replacing on client and server.
Allowing ONLY alphanumeric characters, dashes and asterisks.
Removing alert, eval, window, XMLHttpRequest, prototyping, cookie, obvious stuff. No native JS reserved words or syntax.
Obfuscating and minifying another external JS file that helps to serve the IDE and recognize the programming language's uniquely named Canvas API methods.
When the window unloads, store the external JS code in two dynamically generated form fields to be checked by the server in POST. All the original code will be cataloged in the DB thoroughly for filtering purposes.
Strict variable naming conventions ('example-square1-lengthPROPERTY', 'example-circle-spinMETHOD')
Copy/Paste Disabled, setInterval to constantly check if enabled by the user. If so, then trigger a block to the database, change window.location immediately and check the session ID through POST to confirm in case JS becomes disabled between that timeframe.
I mean, can I do it then? How can one do harm if they can't use HEX or ASCII and stuff like that?
I think there are a few other options.
Good places to go for real-life XSS tests, by the way, are the XSS Cheat Sheet and the HTML5 Security Cheatsheet (newer). The problem, however, is that you want to allow Javascript but disallow bad Javascript. This is a different, and more complex, goal than the usual way of preventing XSS, which is to prevent all scripts.
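To illustrate why "allow Javascript but remove the bad parts" is harder than it sounds, here is a small sketch of a word-stripping filter being defeated. The filter and the word list are hypothetical, not taken from the question:

```javascript
// Hypothetical filter: remove every occurrence of each blocked word, once.
const BLOCKED_WORDS = ["alert", "eval"];

function stripBlockedWords(source) {
  let result = source;
  for (const word of BLOCKED_WORDS) {
    result = result.split(word).join("");
  }
  return result;
}

// Removing the inner "alert" glues the outer pieces into a new "alert".
const filtered = stripBlockedWords("alalertert(1)");
console.log(filtered); // "alert(1)"
```

Repeating the pass until nothing changes would fix this particular trick, but it is only one of many, which is the general problem with blacklists.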
Hosting on a separate domain
I've seen this referred to as an "iframe jail".
The goal with XSS attacks is to be able to run code in the same context as your site - that is, on the same domain. This is because the code will be able to read and set cookies for that domain, initiate user actions or redress your design, redirect, and so forth.
If, however, you have two separate domains - one for your site, and another which only hosts the untrusted, user-uploaded content, then that content will be isolated from your main site. You could include it in an iframe, and yet it would have no access to the cookies from your site, no access to redress or alter the design or links outside its iframe, and no access to the scripting variables of your main window (since it is on a different domain).
It could, of course, set cookies as much as it likes, and even read back the ones that it set. But these would still be isolated from the cookies for your site. It would not be able to affect or read your main site's cookies. It could also include other code which could annoy or harass the user, such as pop-up windows, or could attempt to phish (you'd need to make it visually clear in your out-of-iframe UI that the content served is not part of your site). However, this is still sandboxed from your main site, where your own valuable payload - your session cookies and the integrity of your overarching page design and scripts - is preserved. It would carry no less but no more risk than any site on the internet that you could embed in an iframe.
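The isolation described here comes from the browser's same-origin policy, which compares scheme, host and port. A small sketch of that comparison, using made-up example.com and example.net URLs:

```javascript
// Two URLs share an origin only when scheme, host and port all match.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Main site and a page on the same host: same origin.
console.log(isSameOrigin("https://example.com/account", "https://example.com/games/1")); // true

// Main site and a separate domain for uploads: different origin, so a script
// served from the second URL cannot reach into a page from the first.
console.log(isSameOrigin("https://example.com/account", "https://usercontent.example.net/games/1")); // false
```

A subdomain of your main domain also counts as a different origin, but cookies can be scoped to a parent domain and shared with subdomains, which is one reason a completely separate domain is the more cautious choice.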
Using a subset of Javascript
Subsets of Javascript have been proposed, which provide compartmentalisation for scripts - the ability to load untrusted code and have it not able to alter or access other code if you don't give it the scope to do so.
Look into things like Google CAJA - whose aim is to enable exactly the type of service that you've described:
Caja allows websites to safely embed DHTML web applications from third parties, and enables rich interaction between the embedding page and the embedded applications. It uses an object-capability security model to allow for a wide range of flexible security policies, so that the containing page can effectively control the embedded applications' use of user data and to allow gadgets to prevent interference between gadgets' UI elements.
One issue here is that people submitting code would have to program it using the CAJA API. It's still valid Javascript, but it won't have access to the browser DOM, as CAJA's API mediates access. This would make it difficult for your users to port some existing code. There is also a compilation phase. Since Javascript is not a secure language, there is no way to ensure code cannot access your DOM or other global variables without running it through a parser, so that's what CAJA does - it compiles it from Javascript input to Javascript output, enforcing its security model.
HTML Purifier consists of thousands of regular expressions that attempt to "purify" HTML into a safe subset that is immune to XSS. This project is bypassed every few months, because it isn't nearly complex enough to address the problem of XSS.
Do you understand the complexity of XSS?
Do you know that javascript can exist without letters or numbers?
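For anyone who hasn't seen it, a tiny illustration of how that works: values are built from brackets, plus signs and exclamation marks through type coercion, and from there single characters can be picked out of strings like "false". The variable names are only there for readability; the expressions on the right contain no letters or digits.

```javascript
const zero = +[];                  // unary plus on an empty array gives 0
const one = +!+[];                 // !0 is true, and +true is 1
const falseText = ![] + [];        // false plus an empty array gives the string "false"
const letterA = (![] + [])[+!+[]]; // character at index 1 of "false"

console.log(zero, one, falseText, letterA); // 0 1 false a
```

Whole programs can be assembled this way, so a filter that only allows or blocks particular letters and digits does not by itself rule out script.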
Okay, the very first thing I would try is inserting a meta tag that changes the encoding to, I don't know, let's say UTF-7, which is rendered by IE. This UTF-7 encoded HTML will contain JavaScript. Did you think of that? Well, guess what: there are somewhere between a hundred thousand and a few million other vectors I didn't think of.
The XSS cheat sheet is so old my grandparents are immune to it. Here is a more up to date version.
(Oh, and by the way, you will be hacked, because what you are trying to do is fundamentally insecure.)
I am wondering what best practices are for providing dynamic content in lightweight, 'drop in' widget style that can be used by third party content editors.
To elaborate, we would like to give third parties the ability to show dynamic content from us on their website without a back-end system integration where they would have to call one of our APIs server side - ideally it would be possible for their content editors simply to include a provided snippet in their HTML. A concrete example would be a bestseller list that changes every few hours.
Using an IFRAME is one obvious way of accomplishing this, but I'm curious if there are others that allow tighter integration into their source and more flexible styling and are 'expected best practice' for such an offering as it isn't a field I know well - JavaScript/JSON perhaps?
I'd call an iframe best practice since it does not grant the framed content any excess rights, but having a JavaScript file that other sites can include seems pretty common as well, so you could probably get a lot of site owners to accept that. Still, the iframe is preferable; you shouldn't use JavaScript unless it really makes a difference.
You can easily make the to-be-iframed page configurable through parameters in the link, so site owners can set things like background and font to match their own site.
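A sketch of how such a configurable link could be put together. The widgets.example.com address and the background and font parameter names are assumptions for illustration, not an existing API:

```javascript
// Build the iframe src from a base URL plus styling options.
// URLSearchParams takes care of encoding characters such as "#".
function buildWidgetUrl(baseUrl, options) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

const widgetSrc = buildWidgetUrl("https://widgets.example.com/bestsellers", {
  background: "#ffffff",
  font: "Georgia",
});

console.log(widgetSrc);
// https://widgets.example.com/bestsellers?background=%23ffffff&font=Georgia
```

The page inside the iframe should still validate these values against a list of what it accepts before using them in its styles.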
Alternative to iFrames: JSONP
JSONP is used by Javascript widget libraries to pull in data from the widget library's server since JSONP gets around the same-origin issues.
This enables your JS widget library to provide data and UI services to the hosting page without any changes to the hosting page's server.
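Roughly, the server side of JSONP wraps the JSON data in a call to a function whose name the hosting page supplies, and the hosting page loads that response through a script tag. A minimal sketch of building the response body; jsonpResponse and showBestsellers are made-up names:

```javascript
// Build a JSONP response body: callbackName(<json>);
// The callback name comes from the request, so it is checked to be a plain
// identifier before being echoed back; otherwise it could inject script.
function jsonpResponse(callbackName, data) {
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}

console.log(jsonpResponse("showBestsellers", { titles: ["A", "B"] }));
// showBestsellers({"titles":["A","B"]});
```

The hosting page would define showBestsellers itself and then add a script element pointing at your endpoint with the callback name as a parameter.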
It's clean, neat, and avoids various iframe issues.
As mentioned in other answers, anyone including your JS in their pages is trusting that your JS is not a security/privacy issue. But that's not a problem depending on your relationship with the folks who'd include your library.
Be aware that you're opening a potential security Pandora's box. Take a look at the Caja project; it lets you safely embed untrusted JavaScript content.