XSS and other ways to terminate JavaScript strings - javascript

Is there a different way to terminate strings in JavaScript?
I'm testing a server for XSS vulnerabilities, and I'm seeing the following code in the HTTP response:
<script>
var myVar = "USER CONTROLLED STRING";
</script>
The user-controlled string comes from the URL and all double quotes are removed before the response is generated. Besides that, all other characters are allowed.
Is XSS possible?
(And yes, I know the proper thing to do here would be to hex-encode (\xHH) all non-alphanumeric characters, as recommended by the OWASP XSS Prevention Cheat Sheet, but from a tester's perspective I want to know whether I could exploit this.)

Yes, you can perform an attack: http://jsfiddle.net/vTmq6/1/
<script>
var myVar = "</script><script>alert('hacked');</script>";
</script>

A </script> would denote the script element’s end tag regardless of whether the resulting JavaScript code is valid or not. This is due to the restrictions on the contents of raw text elements, which the script element belongs to:
The text in raw text and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), U+0020 SPACE, ">" (U+003E), or "/" (U+002F).
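As a minimal sketch of the defensive side, the value can be serialized so that the sequence "</" never appears inside the script element. The helper name toInlineScriptLiteral is hypothetical (it is not part of the original question or of any library), and the sketch assumes the server builds the page with JavaScript:

```javascript
// Hypothetical helper: turn a value into a JS literal that is safe to place
// inside an inline <script> element. JSON.stringify handles quotes and
// backslashes; the extra replacements keep "<", ">" and "&" out of the raw
// markup, so "</script>" can never terminate the element early.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const payload = "</script><script>alert('hacked');</script>";
const literal = toInlineScriptLiteral(payload);
const page = '<script>\nvar myVar = ' + literal + ';\n</script>';
```

The escaped literal still evaluates to the original string in JavaScript, but the HTML parser only ever sees \u003c in place of the < character.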

Related

How to bypass the PHP strip_tags function [duplicate]

Is there a known XSS or other attack that makes it past a
$content = "some HTML code";
$content = strip_tags($content);
echo $content;
?
The manual has a warning:
This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
but that is related to using the allowable_tags parameter only.
With no allowed tags set, is strip_tags() vulnerable to any attack?
Chris Shiflett seems to say it's safe:
Use Mature Solutions
When possible, use mature, existing solutions instead of trying to create your own. Functions like strip_tags() and htmlentities() are good choices.
is this correct? Please if possible, quote sources.
I know about HTML purifier, htmlspecialchars() etc.- I am not looking for the best method to sanitize HTML. I just want to know about this specific issue. This is a theoretical question that came up here.
Reference: strip_tags() implementation in the PHP source code
As its name may suggest, strip_tags should remove all HTML tags. The only way we can prove it is by analyzing the source code. The following analysis applies to a strip_tags('...') call, without a second argument for whitelisted tags.
First of all, some theory about HTML tags: a tag starts with a < followed by a non-whitespace character. If the tag content starts with a ?, it should not be parsed. If it starts with !--, it is considered a comment and the following text should not be parsed either. A comment is terminated with -->; inside a comment, characters like < and > are allowed. Attributes can occur in tags, and their values may optionally be surrounded by a quote character (' or "). If such a quote exists, it must be closed; otherwise, a > encountered inside the open quote does not close the tag.
The code text is interpreted in Firefox as:
text
The PHP function strip_tags is referenced in line 4036 of ext/standard/string.c. That function calls the internal function php_strip_tags_ex.
Two buffers exist, one for the output, the other for "inside HTML tags". A counter named depth holds the number of open angle brackets (<).
The variable in_q contains the quote character (' or ") if any, and 0 otherwise. The last character is stored in the variable lc.
The function has five states; three are mentioned in the description above the function. Based on this information and the function body, the following states can be derived:
State 0 is the output state (not in any tag)
State 1 means we are inside a normal html tag (the tag buffer contains <)
State 2 means we are inside a php tag
State 3: we came from the output state and encountered the < and ! characters (the tag buffer contains <!)
State 4: inside HTML comment
We only need to make sure that no tag can be inserted, that is, a < followed by a non-whitespace character. Line 4326 checks a case with the < character, which is described below:
If inside quotes (e.g. <a href="inside quotes">), the < character is ignored (removed from the output).
If the next character is a whitespace character, < is added to the output buffer.
If outside an HTML tag, the state becomes 1 ("inside HTML tag") and the last character lc is set to <.
Otherwise, if inside an HTML tag, the counter named depth is incremented and the character is ignored.
If > is met while the tag is open (state == 1), in_q becomes 0 ("not in a quote") and state becomes 0 ("not in a tag"). The tag buffer is discarded.
Attribute checks (for characters like ' and ") are done on the tag buffer which is discarded. So the conclusion is:
strip_tags without a tag whitelist is safe for inclusion outside tags, no tag will be allowed.
By "outside tags", I mean in the text between tags, not inside a tag or one of its attributes. The text may still contain < and > though, as in >< a>>. The result is not valid HTML: <, > and & still need to be escaped, especially the &. That can be done with htmlspecialchars().
The description for strip_tags without a whitelist argument would be:
Makes sure that no HTML tag exists in the returned string.
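The walk-through above can be modelled with a small state machine. This is only a sketch of the description in this answer, written in JavaScript for illustration: it is not PHP's actual php_strip_tags_ex, the name stripTagsSketch is hypothetical, and the PHP-tag, "<!" and comment states (2, 3 and 4) as well as the allowed-tags list are deliberately left out.

```javascript
// Simplified model of the described behaviour: state 0 = output, state 1 = inside a tag.
function stripTagsSketch(input) {
  let out = '';
  let state = 0; // 0: not in any tag, 1: inside a normal HTML tag
  let depth = 0; // number of extra '<' seen while already inside a tag
  let inQ = '';  // quote character currently open inside a tag, or ''
  for (let i = 0; i < input.length; i++) {
    const c = input[i];
    const next = input[i + 1];
    if (c === '<') {
      if (inQ) continue; // '<' inside a quoted attribute is dropped
      if (next !== undefined && /\s/.test(next)) {
        if (state === 0) out += c; // '<' followed by whitespace is not a tag start
        continue;
      }
      if (state === 0) state = 1; // entering a tag
      else depth++;               // '<' while already inside a tag
    } else if (c === '>') {
      if (depth > 0) { depth--; continue; } // closes a nested '<' only
      if (inQ) continue;                    // '>' inside a quoted attribute is dropped
      if (state === 1) { state = 0; inQ = ''; } // tag closed, its content is discarded
      else out += c;                        // stray '>' outside any tag is kept
    } else if (c === '"' || c === "'") {
      if (state === 0) out += c;      // quotes in plain text are kept
      else if (inQ === c) inQ = '';   // closing quote inside a tag
      else if (!inQ) inQ = c;         // opening quote inside a tag
    } else if (state === 0) {
      out += c; // ordinary character outside any tag
    }
  }
  return out;
}

const plain = stripTagsSketch('<b>bold</b> text');
const spaced = stripTagsSketch('a < b');
const quoted = stripTagsSketch('<a href="x>y">t</a>');
const nested = stripTagsSketch("<<a>script>alert(1);<</a>/script>");
```

In this model the depth counter is what stops a nested input such as <<a>script> from leaving a tag behind. Whether a particular PHP version behaves the same way is best verified against that version directly.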
I cannot predict future exploits, especially since I haven't looked at the PHP source code for this. However, there have been exploits in the past due to browsers accepting seemingly invalid tags (like <s\0cript>). So it's possible that in the future someone might be able to exploit odd browser behavior.
That aside, sending the output directly to the browser as a full block of HTML should never be insecure:
echo '<div>'.strip_tags($foo).'</div>'
However, this is not safe:
echo '<input value="'.strip_tags($foo).'" />';
because one could easily end the quote via " and insert a script handler.
I think it's much safer to always convert stray < into &lt; (and the same with quotes).
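To illustrate the difference between the two contexts, here is a rough JavaScript counterpart of what htmlspecialchars() does in PHP with quote conversion enabled. The function name escapeHtml and the sample payload are assumptions made for this sketch:

```javascript
// Hypothetical helper: replace the five characters that matter in HTML text
// and in quoted attribute values. The ampersand must be replaced first so the
// entities produced by the later replacements are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A payload with no tags at all: stripping tags would leave it unchanged,
// yet without escaping it would close the attribute and add an event handler.
const foo = '" onmouseover="alert(1)';
const html = '<input value="' + escapeHtml(foo) + '" />';
```

Because the quote is emitted as &quot;, the value stays inside the attribute instead of terminating it.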
According to this online tool, this string will be "perfectly" escaped, but
the result is another malicious one!
<<a>script>alert('ciao');<</a>/script>
In the string the "real" tags are <a> and </a>, since < and script> alone aren't tags.
I hope I'm wrong or that it's just because of an old version of PHP, but it's better to check in your environment.
YES, strip_tags() is vulnerable to scripting attacks, right through to (at least) PHP 8. Do not use it to prevent XSS. Instead, you should use filter_input().
strip_tags() is vulnerable because it does not run recursively. That is to say, it does not check whether valid tags will remain after valid tags have been stripped. For example, the string
<<a>script>alert(XSS);<</a>/script> will strip the <a> tag successfully, yet fail to see this leaves
<script>alert(XSS);</script>.
This can be seen (in a safe environment) here.
Strip tags is perfectly safe - if all that you are doing is outputting the text to the html body.
It is not necessarily safe to put it into mysql or url attributes.

Sanitizing HTML input value

Do you have to convert anything besides the quotes (") to (&quot;) inside of:
<input type="text" value="$var">
I personally do not see how you can possibly break out of that without using " on*=....
Is this correct?
Edit: Apparently some people think my question is too vague;
<input type="text" value="<script>alert(0)</script>"> does not execute. Thus, it is impossible to break out of the attribute without using ".
Is this correct?
There really are two questions that you're asking (or at least can be interpreted):
Can the quoted value attribute of input[type="text"] be injected if quotes are disallowed?
Can an arbitrary quoted attribute of an element be injected if quotes are disallowed?
The second is trivially demonstrated by the following:
Foo
Or
<div onmousemove="alert(123);">...
The first is a bit more complicated.
HTML5
According to the HTML5 spec:
Attribute values are a mixture of text and character references, except with the additional restriction that the text cannot contain an ambiguous ampersand.
Which is further refined in quoted attributes to:
The attribute name, followed by zero or more space characters, followed by a single U+003D EQUALS SIGN character, followed by zero or more space characters, followed by a single """ (U+0022) character, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single """ (U+0022) character.
So in short, any character except an "ambiguous ampersand" (&[a-zA-Z0-9]+; when the result is not a valid character reference) and a quote character is valid inside of an attribute.
HTML 4.01
HTML 4.01 is less descriptive than HTML5 about the syntax (one of the reasons HTML5 was created in the first place). However, it does say this:
When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference.
Note, this is saying what an author should do, not what a parser should do. So a parser could technically accept or reject invalid input (or mangle it to be valid).
XML 1.0
The XML 1.0 Spec defines an attribute as:
Attribute ::= Name Eq AttValue
where AttValue is defined as:
AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
The & is similar to the concept of an "ambiguous ampersand" from HTML5, however it's basically saying "any unencoded ampersand".
Note though that it explicitly denies < from attribute values.
So while HTML5 allows it, XML1.0 explicitly denies it.
What Does It Mean
It means that for a compliant and bug free parser, HTML5 will ignore < characters in an attribute, and XML will error.
It also means that for a compliant and bug free parser, HTML 4.01 will behave in unspecified and potentially odd ways (since the specification doesn't detail the behavior).
And this gets down to the crux of the issue. In the past, HTML was such a loose spec, that every browser had slightly different rules for how it would deal with malformed html. Each would try to "fix" it, or "interpret" what you meant. So that means that while a HTML5 compliant browser wouldn't execute the JS in <input type="text" value="<script>alert(0)</script>">, there's nothing to say that a HTML 4.01 compliant browser wouldn't. And there's nothing to say that a bug may not exist in the XML or HTML5 parser that causes it to be executed (though that would be a pretty significant problem).
THAT is why OWASP (and most security experts) recommend you encode either all non-alpha-numeric characters or &<" inside of an attribute value. There's no cost in doing so, only the added security of knowing how the browser's parser will interpret the value.
Do you have to? No. But defense in depth suggests that, since there's no cost to doing so, the potential benefit is worth it.
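A minimal sketch of the stricter of the two options (encoding every non-alphanumeric character) could look like the following. The function name encodeForAttribute is hypothetical, and the sketch does not try to handle invalid input such as lone surrogates:

```javascript
// Hypothetical helper: encode every character that is not an ASCII letter or
// digit as a hexadecimal character reference (&#xHH;), so that nothing in the
// value can be read as markup, whatever the parser's quirks.
function encodeForAttribute(value) {
  let out = '';
  for (const ch of String(value)) {
    if (/^[A-Za-z0-9]$/.test(ch)) {
      out += ch; // letters and digits pass through unchanged
    } else {
      out += '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';';
    }
  }
  return out;
}

const encoded = encodeForAttribute('"><script>');
const field = '<input type="text" value="' + encoded + '">';
```

The browser decodes the references back to the original characters when it reads the attribute value, so the user still sees the text they entered.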
If your question is "what types of XSS attacks are possible", then you'd better google it. I'll just leave some examples of why you should sanitize your inputs:
If the input is generated by echo '<input type="text" value="$var">', then a simple ' breaks it.
If the input is plain HTML in a PHP page, then value=<?php deadly_php_script ?> breaks it.
If this is a plain HTML input in an HTML file, then converting double quotes should be enough.
Still, converting other special symbols (like <, > and so on) is good practice. Inputs are made to enter data that will be stored on the server or transferred to another page or script, so you need to check what could break those files. Let's say we have this setup:
index.html:
<form method=post action=getinput.php>
<input type="text" name="xss">
<input type="submit"></form>
getinput.php:
echo $_POST['xss'];
Input value ;your_deadly_php_script breaks it totally (you can also sanitize server-side in that case)
If that's not enough - provide more info on your question, add more examples of your code.
I believe the person is referring to cross site scripting attacks. They tagged this as php, security, and xss
take for example
<input type="text" value=""><script>alert(0)</script><"">
The above code will execute the alert box code;
<?php $var= "\"><script>alert(0)</script><\""; ?>
<input type="text" value="<?php echo $var ?>">
This will also execute the alert box.
To solve this you need to escape ", < >, and a few more to be safe. PHP has a couple of functions worth looking into and each have their ups and downs!
htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes URL-encoded string
What you have to be careful of is that you are passing in a variable, and there are ways to create errors that cause it to break out. Your best bet is to make sure that the data is not formatted in an executable manner in case of errors. You are right that without quotes you can't break out, but there may be ways that neither you nor I understand at this point that would allow that to happen.
$var = "><script>alert(0);</script> would work... If you can close the quotes you can then close the tag and open another one... But I think you are right, without closing the quotes no injection is possible...

Special characters not displaying correctly in a Javascript string

I have a function that assigns a string containing special characters to a variable, then passes that variable to a DOM element via the innerHTML property, but it prints strange characters. Let's say I code this...
someText = "äêíøù";
document.getElementById("someElement").innerHTML = someText;
It prints the following text...
äêíøù
I know how to use the entity names to prevent this, but when I use them to pass the value through a Javascript method, they print literally.
This means that you have a conflict of encodings. Your JavaScript and your HTML are being served to the browser with different encodings/character sets. Ensure that they're encoded in and served with the same encoding / character set (UTF8 is a good choice) to make sure that characters are correctly interpreted.
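The effect of such a mismatch can be reproduced outside the browser. The sketch below assumes Node.js (its Buffer API is used to get at the raw bytes) and assumes the specific mismatch is UTF-8 bytes being read as Latin-1, which is one common case and not necessarily the one in the question:

```javascript
// The string from the question, written with escapes so that this example
// does not itself depend on the encoding of the file it is saved in.
const someText = '\u00e4\u00ea\u00ed\u00f8\u00f9'; // "äêíøù"

// The bytes a server would send if the script file is saved as UTF-8.
const utf8Bytes = Buffer.from(someText, 'utf8');

// What a consumer sees if it wrongly reads those bytes as Latin-1:
// every two-byte UTF-8 sequence turns into two unrelated characters.
const misread = utf8Bytes.toString('latin1');

// Reading the same bytes with the matching encoding restores the text.
const roundTrip = Buffer.from(misread, 'latin1').toString('utf8');
```

Here misread has twice as many characters as someText, which is the typical signature of this kind of encoding conflict.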
Obligatory link: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

How to handle possibly HTML encoded values in javascript

I have a situation where I'm not sure if the input I get is HTML encoded or not. How do I handle this? I also have jQuery available.
function someFunction(userInput){
$someJqueryElement.text(userInput);
}
// userInput "<script>" renders as "<script>", which is fine
// userInput "&lt;script&gt;" renders as "&lt;script&gt;", which is bad
I could avoid escaping ampersands (&), but what are the risks in that? Any help is very much appreciated!
Important note: This user input is not in my control. It returns from a external service, and it is possible for someone to tamper with it and avoid the html escaping provided by that service itself.
You really need to make sure you avoid these situations as it introduces really difficult conditions to predict.
Try adding an additional variable input to the function.
function someFunction(userInput, isEncoded){
//Add some conditional logic based on isEncoded
$someJqueryElement.text(userInput);
}
If you look at products like fckEditor, you can choose to edit source or use the rich text editor. This prevents the need for automatic encoding detection.
If you still insist on automatically detecting HTML-encoded characters, I would recommend using indexOf to verify that certain key sequences exist.
str.indexOf('&lt;') !== -1
The example above will detect the &lt; sequence.
~~~New text added after edit below this line.~~~
Finally, I would suggest looking at this answer. They suggest using the decode function and detecting lengths.
var string = "Your encoded & decoded string here";
function decode(str){
  // Decode URL escapes first, then the &lt; and &gt; entities
  return decodeURIComponent(str).replace(/&lt;/g,'<').replace(/&gt;/g,'>');
}
if(string.length == decode(string).length){
  // The string does not contain any encoded html.
}else{
  // The string contains encoded html.
}
Again, this still has the problem of a user faking out the process by entering those specially encoded characters, but that is what html encoding is. So it would be proper to assume html encoding as soon as one of these character sequences comes up.
You must always correctly encode untrusted input before concatenating it into a structured language like HTML.
Otherwise, you'll enable injection attacks like XSS.
If the input is supposed to contain HTML formatting, you should use a sanitizer library to strip all potentially unsafe tags & attributes.
You can also use the regex /<|>|&(?![a-z]+;)/ to check whether a string has any non-encoded characters; however, you cannot distinguish a string that has been encoded from an unencoded string that talks about encoding.
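As a small sketch of how that regex behaves, here it is wrapped in a function. The name hasUnencodedChars is hypothetical:

```javascript
// Hypothetical wrapper around the regex from the answer: true if the string
// contains a raw "<", a raw ">", or an "&" that does not start a named
// entity made of lowercase letters followed by ";".
function hasUnencodedChars(str) {
  return /<|>|&(?![a-z]+;)/.test(str);
}

const raw = hasUnencodedChars('<script>');
const encoded = hasUnencodedChars('&lt;script&gt;');
const bareAmpersand = hasUnencodedChars('Tom & Jerry');
const numericRef = hasUnencodedChars('&#39;');
```

Note that numeric references such as &#39; are reported as unencoded, because the lookahead only accepts lowercase letters. The check also cannot tell whether a string like "&lt;" was produced by encoding or was typed literally.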

IE innerHTML chops sentence if the last word contains '&' (ampersand)

I am trying to populate a DOM element with ID 'myElement'. The content which I'm populating is a mix of text and HTML elements.
Assume following is the content I wish to populate in my DOM element.
var x = "<b>Success</b> is a matter of hard work &luck";
I tried using innerHTML as follows,
document.getElementById("myElement").innerHTML=x;
This resulted in chopping off of the last word in my sentence.
Apparently, the problem is due to the '&' character present in the last word. I played around with the '&' and innerHTML and following are my observations.
If the last word of the content is less than 10 characters and if it has a '&' character present in it, innerHTML chops off the sentence at '&'.
This problem does not happen in firefox.
If I use innerText the last word is intact, but then all the HTML tags which are part of the content become plain text.
I tried populating through jQuery's #html method,
$("#myElement").html(x);
This approach solves the problem in IE but not in chrome.
How can I insert HTML content whose last word contains '&' without it being chopped off in any browser?
Update: 1. I tried HTML-encoding the content which I am trying to insert into the DOM. When I encode the content, the HTML tags which are part of the content become plain strings.
For the above mentioned content, I expect the result to be rendered as,
Success is a matter of hard work &luck
but when I encode what I actually get in the rendered page is,
<b>Success</b> is a matter of hard work &luck
You should replace your & with &amp;.
The & (ampersand) character is used within HTML to represent various special characters. For example, &quot; = ", &lt; = <, etcetera. Now, &luck clearly is not a valid HTML entity (for one it is missing the semicolon). However, various browsers may, due to a combination of error correction (the missing semicolon) and the fact that it looks somewhat like an HTML entity (& followed by four characters), try to parse it as such.
Because &luck; is not a valid HTML entity, the original text is lost. Because of this, when using an ampersand in your HTML, always use &amp;.
Update: When this text is entered by a user, it is up to you to escape this character properly. In PHP for example, you would call htmlentities on the text before displaying it to the user. This has the added benefit of filtering out malicious user code such as <script> tags.
The ampersand is a special character in HTML that indicates the start of a character entity reference or numeric character reference. You need to escape it like so:
var x = "<b>Success</b> is a matter of hard work &amp;luck";
Try using this instead:
var x = "<b>Success</b> is a matter of hard work &amp;luck";
By HTML encoding the ampersand, you are ensuring that there is no ambiguity in what you mean when you write "&luck".
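When the content mixes trusted markup with text, one possible approach is to escape only the ampersands that do not already begin a character reference. The following is a sketch under that assumption, and the function name escapeBareAmpersands is hypothetical:

```javascript
// Hypothetical helper: replace "&" with "&amp;" unless it already starts
// something shaped like a named (&name;), decimal (&#38;) or hexadecimal
// (&#x26;) character reference. The markup itself is left untouched.
function escapeBareAmpersands(html) {
  return html.replace(
    /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g,
    '&amp;'
  );
}

const x = '<b>Success</b> is a matter of hard work &luck';
const safeX = escapeBareAmpersands(x);
// safeX can then be assigned to innerHTML in place of x.
```

The sketch only looks at the shape of a reference, so text such as "&luck;" (with a semicolon) would be left alone even though it is not a real entity. It also assumes the tags in the string are trusted and does no sanitizing.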
