I have two applications where users can submit HTML pages. I would like to make sure that no scripts are included in the HTML.
Normally you would escape content to get rid of scripts, but as this is HTML I can't do that. Anyone with good suggestions on how to do that? The applications are written in both C# and Java
OWASP has a project to scrub html and css
The first thing I'd do is see if there is a <script> tag in the HTML. That solves the first issue, then you have to make sure there are no inline onmouseover/onclick etc. events. You could maybe use a DOM Parser to go over all elements and remove all attributes that start with 'on'.
I have little to no experience in both C# as Java, so am unaware of any "easier" solutions that area already available. But maybe someone else here has a better idea for that.
Related
I'm using a Crawler4j and Jsoup to crawl a website and it works fine for the HTML text, but there are some important contents, which default values are hard coded in CSS and then dynamically set with JavaScript.
For example, I have the
and I need the width value, which in CSS is hard coded as 10px, but modified in JavaScript to, let's say, 5px.
Is there a way to get this value without using another crawler? Or a simple alternative?
I have already quite a lot of code, so I don't want to rewrite everything if there is a possibility to do that with the Crawler4j.
I hope my question is clear enough and thank you in advance for your help!
This is not possible with crawler4j nor with jsoup. They both handle only static HTML content.
There are several open issues related dynamic JavaScript execution on the official GitHub Repository: #49, #197 and #220.
To achieve your objectives, you would need to build a stack based on Selenium, CasperJS and/or PhantomJS, which could then be used for advanced crawling including JavaScript execution.
I am a beginner in HTML5, CSS3, and JavaScript. I reached my limit for the use of the trial version of Microsoft's OneNote. I like the program so much, I want to make an equivalent of it as an html version so I won't have to empty my pockets for the paid version.
The part I need help with is the part where you type in your notes. I don't know how to make a text edit field in html. Is it possible to do something like that? I would be satisfied if it could only do the same functions as note pad. Just so long as I am able to do the simple type and edit functions. Can someone show me how to code this or lead me to a site that teaches something like this?
Thanks! Tony.
There are plenty of solutions out there. Nicedit, CKEditor, etc. These all have a Rich text interface, and are javascript managed.
The simplest solution would be to just use a <textarea> which would allow for plain text input only.
The simplest way is to use the <textarea> tag in HTML. See this link too.
You can also use HTML5 Data caching to save your notes locally through your browser after implementing your textarea tags.
Here's a neat little plugin that should be relatively minor to install/use.
https://github.com/ekdevdes/storage.js
I've noticed that jQuery can create, and access non-existent/non-standard HTML tags. For example,
$('body').append('<fake></fake>').html('blah');
var foo = $('fake').html(); // foo === 'blah'
Will this break in some kind of validation? Is it a bad idea, or are there times this is useful? The main question is, although it can be done, should it be done?
Thanks in advance!
You can use non-standard HTML tags and most of the browsers should work fine, that's why you can use HTML5 tags in browsers that don't recognize them and all you need to do is tell them how to style them (particularly which tags are display: block). But I wouldn't recommend doing it for two reasons: first it breaks validation, and second you may use some tag that will later get added to HTML and suddenly your page stops working in newer browsers.
The biggest issue I see with this is that if you create a tag that's useful to you, who's to say it won't someday become standard? If that happens it may end up playing a role or get styles that you don't anticipate, breaking your code.
The rules of HTML do say that if manipulated through script the result should be valid both before and after the manipulation.
Validation is a means to an end, so if it works for you in some way, then I wouldn't worry too much about it. That said, I wouldn't do it to "sneak" past validation while using something like facebook's <fb:fan /> element - I'd just suck it up and admit the code wasn't valid.
HTML as such allows you to use any markup you like. Browsers may react differently to unknown tags (and don't they to known ones, too?), but the general bottom line is that they ignore unknown tags and try to render their contents instead.
So technically, nothing is stopping you from using <fake> elements (compare what IE7 would do with an HTML5 page and the new tags defined there). HTML standardization has always been an after-the-fact process. Browser vendors invented tags and at some point the line was drawn and it was called HTMLx.
The real question is, if you positively must do it. And if you care whether the W3C validator likes your document or not. Or if you care whether your fellow programmers like your document or not.
If you can do the same and stay within the standard, it's not worth the hassle.
There's really no reason to do something like this. The better way is to use classes like
<p class = "my_class">
And then do something like
$('p.my_class').html('bah');
Edit:
The main reason that it's bad to use fake tags is because it makes your HTML invalid and could screw up the rendering of your page on certain browsers since they don't know how to treat the tag you've created (though most would treat it as some kind of DIV).
That's the main reason this isn't good, it just breaks standards and leads to confusing code that is difficult to maintain because you have to explain what your custom tags are for.
If you were really determined to use custom tags, you could make your web page a valid XML file and then use XSLT to transform the XML into valid HTML. But in this case, I'd just stick with classes.
Is there a way to create your own HTML element? I want to make a specially designed check box.
I imagine such a thing would be done in JavaScript. Something akin to document.createHTMLElement but the ability to design your own element (and tag).
No, there isn't.
The HTML elements are limited to what the browser will handle. That is to say, if you created a custom firefox plugin, and then had it handle your special tag, then you "could" do it, for varying interpretations of "doing it". A list of all elements for a particular version of HTML may be found here: http://www.w3.org/TR/html4/index/elements.html
Probably, however, you don't actually want to. If you want to "combine" several existing elements in such a way as they operate together, then you can do that very JavaScript. For example, if you'd like a checkbox to, when clicked, show a dropdown list somewhere, populated with various things, you may do that.
Perhaps you may like to elaborate on what you actually want to achieve, and we can help further.
Yes, you can create your own tags. You have to create a Schema and import it on your page, and write a JavaScript layer to convert your new tags into existing HTML tags.
An example is fbml (Facebook Markup Language), which includes a schema and a JavaScript layer that Facebook wrote. See this: Open Graph protocol.
Using it you can make a like button really easily:
<fb:like href="http://developers.facebook.com/" width="450" height="80"/>
The easiest way would be probably to write a plugin say in Jquery (or Dojo, MooTools, pick one).
In case of jQuery you can find some plugins here http://plugins.jquery.com/ and use them as a sample.
You need to write own doctype or/and use own namespace to do this.
http://msdn.microsoft.com/en-us/magazine/cc301515.aspx
No, there is not. Moreover it is not allowed in HTML5.
Take a look at Ample SDK JavaScript GUI library that enables any custom elements or event namespaces client-side (this way XUL for example was implemented there) without interferring with the rules of HTML5.
Take a look into for example how XUL scale element implemented: http://github.com/clientside/amplesdk/blob/master/ample/languages/xul/elements/scale.js and its default stylesheet: http://github.com/clientside/amplesdk/blob/master/ample/languages/xul/themes/default/input.css
It's a valid question, but I think the name of the game from the UI side is progressive markup. Build out valid w3 compliant tags and then style them appropriately with javascript (in my case Jquery or Dojo) and CSS. A well-written block of CSS can be reused over and over (my favorite case is Jquery UI with themeroller) and style nearly any element on the page with just a one or two-word addition to the class declaration.
Here's some good Jquery/Javascript/CSS solutions that are relatively simple:
http://www.filamentgroup.com/examples/customInput/
http://aaronweyenberg.com/90/pretty-checkboxes-with-jquery
http://www.protofunc.com/scripts/jquery/checkbox-radiobutton/
Here's the spec for the upcoming (and promising) JqueryUI update for form elements:http://wiki.jqueryui.com/Checkbox
If you needed to validate input, this is an easy way to get inline validation with a single class or id tag: http://www.position-absolute.com/articles/jquery-form-validator-because-form-validation-is-a-mess/
Ok, so my solution isn't a 10 character, one line solution. However, Jquery Code aside, each individual tag wouldn't be much more than:
<input type="checkbox" id="theid">
So, while there would be a medium chunk of Jquery code, the individual elements would be very small, which is important if you're repeating it 250 times (programmatically) as my last project required. It's easy to code, degrades well, validates well, and because progressive markup would be on the user's end, have virtually no cost on the server end.
My current project is in Symfony--not my choice--which uses complex, bulky server-side tags to render form elements, validate, do javascript onclick, style, etc. This seems like what you were asking for at first....and let me tell you, it's CLUNKY. One tag to call a link can be 10 lines of code long! After being forced to do it, I'm not a fan.
Hm. The first thought is that you could create your own element and do a transformation with XSLT to the valid HTML then.
With the emergence of the emerging W3 Web Components standard, specifically the Custom Elements spec, you can now create your own custom HTML elements and register them with the parser with the document.register() DOM method.
X-Tag is a helpful sugar library, developed by Mozilla, that makes it even easier to work with Web Components, have a look: X-Tags.org
I am trying to figure out all the ways javascript can be written. I am making a white list of acceptable tags however the attributes are getting me.
In my rich html editor I allow stuff like links.
Hi
Now I am using html agility pack to get rid of attributes I won't support and html tags for that matter.
However I am still unclear if a person could do something like this
Bad
So I am not sure if I have to start looking at the inner text of all attributes that I support and html encode them? Or if what.
I am also not sure how to prevent a html link that goes to some page and launches some javascript on load.
I am not sure if a white list can stop that one.
Bad
or
Bad
If you're trying to write an XSS validator for user-entered HTML for production, I highly recommend you use an existing library. Even with the whitelist approach you are taking, there are many, many possible attribute values that can result in XSS. Search for "javascript:" in the XSS Cheat Sheet to see all sorts of places javascript: uris can turn up. Here is an incomplete list:
<IMG SRC="javascript:alert('XSS');">
<INPUT TYPE="IMAGE" SRC="javascript:alert('XSS');">
<BODY BACKGROUND="javascript:alert('XSS')">
<IMG LOWSRC="javascript:alert('XSS')">
There are also ways to inject external script urls, like this:
<XSS STYLE="behavior: url(xss.htc);">
If you're writing this for your own education, then the XSS Cheat Sheet has some really great fodder for unit tests.
you can do this:
<a href="javascript:(function(){
alert('hello');
})()">Hello</a>
if you want to get really crazy
Edit:
I like this even better
<a onclick="alert(eval({Crazy:function(){alert('Hello');return 'World';}}).Crazy());">
Crazy
</a>