JavaScript document.location.href seems to expect HTML - javascript

If I execute this in the IE 11 console:
document.location.href="a&amp;b";
I get an error saying:
Not found - The requested URL /a&b was not found on this server.
But should it not rather complain about a&amp;b not being found? The URL, which happens to contain an HTML entity, seems to be interpreted as HTML, despite nothing here actually being connected to HTML.
I came across this while having a GET parameter named copy_from in an onclick attribute: despite escaping the ampersand correctly (&amp;copy_from=), the final URL showed up with a copyright character in it. (It works in IE by escaping the ampersand twice, but then it fails in other browsers.)
Am I missing something or is this an IE bug?

I do think this is a bug. If, in JavaScript, you have specified '/a&amp;b', then the URL requested should have nothing to do with HTML entities. This is supported by the fact that most browsers try to redirect you to '/a&amp;b', as expected.
I've seen lots of inconsistencies between browsers like this before.
It's probably the kind of thing someone (with lots more time than I) could find a way to exploit.
If the document is XHTML, however - and is interpreted as XML - you are required to wrap inline JavaScript containing ampersands in a CDATA section; see e.g. "When is a CDATA section necessary within a script tag?".
In practice, this is rarely done, which is why this is probably a bug.
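For reference, the behaviour from the question can be reproduced in the console. The path and values below are made up; copy_from is the parameter name from the question:

// Run in the browser console on any page (hypothetical URL).
// Per the question, IE 11 decodes the "&copy" entity inside this plain
// JavaScript string and requests "/test?a=1©_from=2", while other
// browsers request the literal "/test?a=1&copy_from=2".
document.location.href = "/test?a=1&copy_from=2";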

Related

Browser strips encoded dot character from url

I have a web app which allows searching. When I go to somedomain.com/search/<QUERY>, it searches for entities according to <QUERY>. The problem is that when I try to search for . or .. it doesn't work as expected (which is pretty obvious). What surprised me, though, is that if I manually enter the URL somedomain.com/search/%2E, the browser (tested in Chrome and IE11) converts it to somedomain.com/search/ and issues a request without the necessary payload.
So far I haven't found anything that says it's not possible to make this work, so I came here. Right now I have only one option: replacing . and .. with something like __dotPlaceholder__, but this feels like a dirty hack to me.
Any solution (js or non-js) is welcome. Any information on why browsers strip URL-encoded dots would also be nice to have.
Unfortunately, RFC 3986 defines dot segments in a URI to be normalised and stripped out in that case (the remove_dot_segments algorithm), i.e. http://example.com/a/./ becomes http://example.com/a/
See https://www.rfc-editor.org/rfc/rfc3986#page-33 for more information.
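One possible workaround, as a sketch: double-encode the dots so the normaliser never sees %2E. This assumes your server-side route decodes the path segment one extra time (buildSearchUrl and that extra decode step are assumptions, not part of the question's app):

// Double-encode dots: per the behaviour described above, browsers
// normalise "." and "%2E", but "%252E" passes through untouched.
function buildSearchUrl(query) {
  // encodeURIComponent leaves "." alone (it is an unreserved character),
  // so substitute the double-encoded form explicitly.
  return '/search/' + encodeURIComponent(query).replace(/\./g, '%252E');
}

buildSearchUrl('..'); // "/search/%252E%252E"
// The server's routing layer decodes this once, handing the handler
// "%2E%2E"; the handler must decode once more to recover "..".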

HTML Opening-Comment is valid JavaScript?

An old idiom for getting very old browsers to ignore JavaScript blocks in HTML pages is to wrap the contents of the <script> element in HTML comments:
<script>
<!--
alert("Your browser supports JavaScript");
//-->
</script>
The rationale is that old JavaScript-less browsers will render the contents of the <script> element as text, so putting the JavaScript in an HTML comment leaves the browser nothing to render.
A modern browser, on the other hand, will see the <script> element and parse its contents as JavaScript. Consequently, the comments need to be valid JavaScript. The closing HTML comment (-->) is ignored by the JavaScript parser because it is preceded by a JavaScript line-comment (//).
My question is, how does the opening HTML comment (<!--) not cause the JavaScript parser to fail? I have heard from various people that the opening HTML comment is valid JavaScript. If it's true that the opening comment is evaluated as JavaScript, what does it do when it executes?
It seemed to be something exciting - an expression that might have a special meaning (<, ! and -- are all operators in JavaScript) - but without operands it does not make sense.
It turns out that <!-- is simply equivalent to // in JavaScript: it comments out the rest of the line.
It is a language feature that was poorly documented for a long time (it has since been standardised for web browsers in Annex B of the ECMAScript specification), and it might have been added for the simple reason of supporting this "hack". Now we have to live with it so as not to break backwards compatibility.
Needless to say, while this is a funny thing to know, this type of commenting should not be used in real code that other people might have to read and work with.
The "hack" is also obsolete, because now every browser understands the <script> tag and does not display its contents (even if Javascript is turned off). Personally, in most cases I try avoid writing Javascript directly into HTML anyways and always load my Javascript as an external resource to separate HTML and Javascript.
In another StackOverflow question, @MathiasBynens gave what I believe is the answer:
Why is the HTML comment open block valid JavaScript?
In short, apparently, this is a non-standard addition to browser-based JS engines that allows <!-- and --> as single-line comment markers, like // in standard JS.
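A minimal demonstration, which runs in engines that implement the Annex B "HTML-like comments" extension (browser engines and Node.js):

<!-- this whole line is ignored, just like a // comment
var answer = 42;
--> a line that starts with --> is ignored as well
console.log(answer); // logs 42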

Why use \x3C instead of < when generating HTML from JavaScript?

I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>
My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so, are there really any browsers out there that would fail on this?
As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times. Is there a reason why that version always seems to substitute all the < and > characters?
When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string, and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.
That's why you somehow have to prevent this sequence of characters from appearing. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).
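For instance, inside an inline script any of the following writes the same tag without the terminating sequence ever appearing literally in the source:

// Each call produces "</script>" in the output while the source never
// contains that exact character sequence:
document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>');
document.write('<script src="js/libs/jquery-1.6.1.min.js"><\/script>');
document.write('<script src="js/libs/jquery-1.6.1.min.js"><' + '/script>');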
While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and Tcl as examples. Another example is the jQuery template plugin, which uses those tags as well.
But even within JavaScript, where you could suggest that such content in strings could be recognized and thus not be treated as ending tags, the next ambiguity comes up when you consider comments:
<script type="text/javascript">foo(42); // call the function </script>
– what should the browser do in this case?
And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave different semantics to the character sequence </script> based on the browsers knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.
Lastly, regarding your question about substituting all angle brackets: I'd say at least in 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).
Some parsers treat the un-escaped </script> inside the string as the closing tag and interpret the code as
<script>
window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">
</script>
\x3C is the hexadecimal escape sequence for <. The two are interchangeable within a JavaScript string.

Is it possible to XSS exploit JSON responses with proper JavaScript string escaping?

JSON responses can be exploited by overriding Array constructors or if hostile values are not JavaScript string-escaped.
Let's assume both of those vectors are addressed in the normal way. Google famously blocks direct sourcing of JSON responses by prefixing all JSON with something like:
throw 1; < don't be evil' >
And then the rest of the JSON follows. So Dr. Evil cannot, using the sort of exploit discussed here, get your cookie (assuming you're logged in) by putting the following on his site:
<script src="http://yourbank.com/accountStatus.json">
As for string escaping rules: well, if we're using double quotes, we need to prefix each double quote with a backslash, each backslash with another backslash, etc.
But my question is, what if you're doing all of this?
Burp Suite (the automated security tool) detects embedded XSS attempts that are returned un-HTML-escaped in a JSON response, and reports them as XSS vulnerabilities. I have a report that my application contains vulnerabilities of this kind, but I am not convinced. I've tried it and I can't make an exploit work.
So I don't think this is correct.
There is one specific case, that of IE MIME-type sniffing, that I think could result in an exploit. After all, IE 7 still had the "feature" that script tags embedded in image comments were executed regardless of the Content-Type header. But let's leave such clearly stupid behaviour aside at first.
Surely the JSON would be parsed by either the native JSON parser (window.JSON in Firefox) or by an eval() as per the old default jQuery behaviour. In neither case would the following expression result in the alert being executed:
{"myJSON": "legit", "someParam": "12345<script>alert(1)</script>"}
Am I right or am I wrong?
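A quick standalone check (a plain .js file, not inline HTML, so the literal </script> in the string is harmless) shows that neither parsing route executes anything:

// The embedded markup is just an inert string once parsed as data.
var raw = '{"myJSON": "legit", "someParam": "12345<script>alert(1)</script>"}';
var viaParse = JSON.parse(raw);
var viaEval = eval('(' + raw + ')'); // old default jQuery behaviour
console.log(viaParse.someParam === viaEval.someParam); // true, and no alert fires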
This potential XSS vulnerability can be avoided by using the correct Content-Type. Based on RFC 4627, all JSON responses should use the application/json type. The following code is not vulnerable to XSS; go ahead and test it:
<?php
header('Content-type: application/json');
header("x-content-type-options: nosniff");
print $_GET['json'];
?>
The nosniff header is used to disable content-sniffing on old versions of Internet Explorer. Another variant is as follows:
<?php
header("Content-Type: application/json");
header("x-content-type-options: nosniff");
print('{"someKey":"<body onload=alert(\'alert(/ThisIsNotXSS/)\')>"}');
?>
When the above code is viewed by a browser, the user is prompted to download a JSON file; the JavaScript is not executed on modern versions of Chrome, Firefox and Internet Explorer, as treating an application/json response as HTML would be an RFC violation.
If you use JavaScript to eval() the JSON above or write the response to the page, then it becomes DOM-based XSS. DOM-based XSS must be patched on the client, by sanitizing the JSON before acting on this data.
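A sketch of that safe client-side handling (the endpoint name comes from the question; fetch and the element id are placeholders, and fetch needs a modern browser):

// Insert the parsed value as text, never as markup.
fetch('/accountStatus.json')
  .then(function (response) { return response.json(); })
  .then(function (data) {
    // textContent treats the value as plain text; assigning it to
    // innerHTML instead would reintroduce the DOM-based XSS above.
    document.getElementById('status').textContent = data.someParam;
  });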
"Burp Suite (the automated security tool) detects embedded XSS attempts that are returned un-HTML-escaped in a JSON response, and reports them as XSS vulnerabilities."
Maybe it tries to prevent the vulnerability described in rule 3.1 of the OWASP XSS Prevention Cheat Sheet.
They give the following example of vulnerable code:
<script>
var initData = <%= data.to_json %>;
</script>
Even if double quotes, slashes and newlines are properly escaped, you can break out of JSON if it's embedded in HTML:
<script>
var initData = {"foo":"</script><script>alert('XSS')</script>"};
</script>
The to_json() function can prevent this issue by prefixing each forward slash with a backslash, so that </script> becomes <\/script>. If the JSON is used in an HTML attribute, the whole JSON string must be HTML-escaped. If it's used in a href="javascript:" attribute, it must be URL-escaped.
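A minimal sketch of that escaping (toJsonForHtml is a made-up name; the stricter variant escapes < itself, which also covers <!-- sequences):

// Escape "/" so the serialized JSON can never contain "</script>".
function toJsonForHtml(value) {
  return JSON.stringify(value).replace(/\//g, '\\/');
}

// Stricter: escape every "<" as \u003c (also covers "<!--").
function toJsonForHtmlStrict(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

toJsonForHtml({foo: '</script>'});       // '{"foo":"<\/script>"}'
toJsonForHtmlStrict({foo: '</script>'}); // '{"foo":"\u003c/script>"}'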
If we limit our scope to IE (all versions), assume you are running a site based on PHP or ASP.NET, and ignore the IE anti-XSS filter, then you are wrong: your users are vulnerable. Setting 'Content-type: application/json' will not help, either.
This is due to (as you mention) IE's content detection behavior, which goes beyond sniffing for HTML tags in the response body to include URI analysis.
This blog posting explains this very well:
https://www.adico.me/post/json-based-xss-exploitation
For the record, although I accepted an answer, for the exact literal question I was asking I was right: there was no vulnerability due to the presence of non-HTML-escaped yet correctly JSON-escaped HTML inside JSON values. There could be a bug if that value were inserted into the DOM without client-side escaping, but Burp Suite has little chance of knowing whether that would happen just by looking at network traffic.
In the general case of deciding what counts as a security vulnerability in these circumstances, it's instructive to recognise that, while it may not feel like good design, the content of a JSON value could legitimately be known to contain no user input and be intended as already-rendered HTML, to be safely inserted into the DOM unescaped. Escaping it would then be a (non-security) bug, as I mentioned in another comment.

Javascript Methodname is replaced with !==

On the server there is an HTML file with JavaScript code included.
This JavaScript code includes a method called something like "CheckObject".
The file works for all users except one specific (but important) user.
He gets a JavaScript error, and in his browser's source code something unbelievable appears:
The method name "CheckObject" is replaced with "Check!==ect"; that is, the "Obj" in the method name is replaced with !==.
Why could that be?
Hope anybody can help me!
Best regards
If he's using a browser that supports extensions (like Firefox, Chrome, and some others), it's probably worth disabling all of the extensions and seeing if the problem goes away.
If you haven't already, I'd completely clear his cache in case there was a bad page transfer once and the browser is reusing it.
I can't imagine how it would be happening reliably otherwise.
