Why use \x3C instead of < when generating HTML from JavaScript?

I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>
My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so are there really any browsers out there that would fail on this?
As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times too. Is there a reason why that version always seems to substitute all the < and > characters?

When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string, and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.
That's why you somehow have to prevent this sequence of characters from appearing. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).
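As a quick check of the claim that the workarounds come down to the same thing, here is a minimal sketch comparing the three spellings at runtime. The variable names are made up for illustration.

```javascript
// Three spellings that avoid writing the closing script tag literally
// in the source, yet produce the same string at runtime.
const viaHexEscape = '\x3C/script>';  // \x3C is the hex escape for "<"
const viaConcat = '<' + '/script>';   // literal split into two pieces
const viaBackslash = '<\/script>';    // "\/" in a string is just "/"

const spellings = [viaHexEscape, viaConcat, viaBackslash];
const allSame = spellings.every(function (s) {
  return s === spellings[0];
});

console.log(allSame);
```

The JavaScript engine sees identical strings in all three cases; only the raw source text that the HTML parser scans is different.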
While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and TCL as examples. Another example is the jQuery template plugin, which uses those tags as well.
But even within JavaScript, where you could suggest that such content in strings could be recognized and thus not be treated as ending tags, the next ambiguity comes up when you consider comments:
<script type="text/javascript">foo(42); // call the function </script>
– what should the browser do in this case?
And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave different semantics to the character sequence </script> based on the browser's knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.
Lastly, regarding your question about substituting all angle brackets: I'd say at least in 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).

Some parsers handle the < version as the closing tag and interpret the code as
<script>
window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">
</script>
\x3C is the hexadecimal escape sequence for <. The two are interchangeable within the script.
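The truncation described here can be imitated with a toy scanner. This is only a sketch: extractInlineScript is a hypothetical helper, local.js is a placeholder file name, and the real HTML tokenizer has more states (for example around <!--) than this regular expression models. The point is just that the scan knows nothing about JavaScript string literals.

```javascript
// Toy model: return the text an HTML parser would hand to the script
// engine for the first inline script in `html`.
function extractInlineScript(html) {
  const open = /<script\b[^>]*>/i.exec(html);
  if (open === null) return null;
  const bodyStart = open.index + open[0].length;
  // The script ends at the first "</script" followed by whitespace,
  // "/" or ">", no matter whether it sits inside a JS string.
  const close = /<\/script[\t\n\f\r \/>]/i.exec(html.slice(bodyStart));
  const bodyEnd = close === null ? html.length : bodyStart + close.index;
  return html.slice(bodyStart, bodyEnd);
}

const unsafePage =
  "<script>document.write('<script src=\"local.js\"></script>')</script>";
const safePage =
  "<script>document.write('<script src=\"local.js\">\\x3C/script>')</script>";

console.log(extractInlineScript(unsafePage)); // cut off inside the string
console.log(extractInlineScript(safePage));   // the complete statement
```

With the literal closing tag, the extracted script stops in the middle of the string literal, which is a syntax error. With \x3C, the whole document.write() call survives.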

Related

JavaScript document.location.href seems to expect HTML

If I execute this in the IE 11 console:
document.location.href="a&amp;b";
I get an error saying:
Not found - The requested URL /a&b was not found on this server.
But should it not rather complain about a&amp;b not being found? The URL, which happens to contain an HTML entity, seems to be interpreted as HTML, despite nothing here actually being connected to HTML.
I came across this while having a GET parameter named copy_from in an onclick attribute, and despite escaping the ampersand correctly (&amp;copy_from=), the final URL showed up with a copyright character in it. (It works in IE by escaping the ampersand twice, but then it fails in other browsers.)
Am I missing something or is this an IE bug?

I do think this is a bug. If, in JavaScript, you have specified '/a&amp;b', then the URL requested should have nothing to do with HTML entities. This is supported by the fact that most browsers try to redirect you to '/a&amp;b', as expected.
I've seen lots of inconsistencies between browsers like this before.
It's probably the kind of thing someone (with lots more time than I) could find a way to exploit.
If the document is XHTML (and interpreted as XML), however, you are required to use a CDATA section around inline JavaScript that includes ampersands; see the question "When is a CDATA section necessary within a script tag?".
In practice, this is rarely done, which is why this is probably a bug.
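For reference, this is roughly what "escaping the ampersand correctly" for an attribute value looks like when the markup is generated from code. It is a minimal sketch: escapeForHtmlAttribute, the /page path and the id parameter are made up for illustration, and it assumes the URL contains no quote characters that would break the surrounding JavaScript string.

```javascript
// Escape a value once for use inside a double-quoted HTML attribute.
function escapeForHtmlAttribute(text) {
  return text
    .replace(/&/g, '&amp;') // must come first
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const url = '/page?id=7&copy_from=3';
const escapedUrl = escapeForHtmlAttribute(url);
const attribute = 'onclick="location.href=\'' + escapedUrl + '\'"';

console.log(attribute);
```

A conforming HTML parser turns &amp; back into a single & before the handler code runs, so the script should see the original URL.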

Why does <!-- Not Throw a Syntax Error?

I noticed in some legacy code the following pattern:
<script>
<!--
// code
// -->
</script>
After some research, this appears to be a very old technique for hiding the contents of script elements from the DOM when the browser did not support the <script> element. More information can be found here.
My concern is this: why does <!-- not throw a Syntax Error? I've found on whatwg.org's website that <!-- should be functionally equivalent to //, and it links off to a snippet from the ECMAScript grammar about comments. The problem is, <!-- isn't defined by that grammar at all.
So this seems like undefined behavior that happens to be implemented by all major browsers. Is there a specification that allows for this, or is this a backwards-compatibility hack that people are bringing forward?

Officially: Because there's specific handling for it in the HTML spec. I.e., it's a "by fiat" thing. It's not a JavaScript thing; you won't find it in the JavaScript grammar.
Unofficially, it would appear that at least some JavaScript engines handle it intrinsically, sometimes in ways that make what I believe is valid JavaScript invalid. For instance, on V8 in a browser, this fails:
eval("var a = 1; var n = 3; console.log(a<!--n);")
...with Unexpected end of input. I'm pretty sure it shouldn't, but I'm not a parsing lawyer. I'd expect it to log false to the console, like this does:
eval("var a = 1; var n = 3; console.log(a<! --n);")
// Note the space -------------------------^
Side note: Meteor's jsparser agrees with me, copy and paste just the bit inside the double quotes into it.
Note that the characters <! do not appear in the specification, nor does there appear to be anything near any of the 70 occurrences of the word "comment" in there, nor is it anywhere in the comment grammar, so it wouldn't seem to be an explicit in-spec exception. It's just something at least some JavaScript engines do to avoid getting messed up by people doing silly things. No great surprise. :-)
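The two eval calls above can be compared side by side with a small harness. This is a sketch: tryEval is a hypothetical helper, and the console.log call is dropped from the evaluated source so the result can be inspected directly. What the variant without the space produces depends on whether the engine treats <!-- as a comment, so no particular outcome is claimed for it here.

```javascript
// Evaluate `source` as global, non-strict code and report the outcome
// instead of throwing.
function tryEval(source) {
  try {
    return { ok: true, value: (0, eval)(source) };
  } catch (error) {
    return { ok: false, error: error.name };
  }
}

// With the space this is unambiguous: a < !(--n), i.e. 1 < false.
const withSpace = tryEval('var a = 1; var n = 3; a<! --n;');

// Without the space, an engine with HTML-comment handling ignores
// everything from "<!--" to the end of the line.
const withoutSpace = tryEval('var a = 1; var n = 3; a<!--n;');

console.log(withSpace, withoutSpace);
```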

It is defined by the W3C's docs for user agents:
The JavaScript engine allows the string "<!--" to occur at the start of a SCRIPT element, and ignores further characters until the end of the line.
So browsers follow this standard.

Could anyone explain these XSS test strings?

Recently I found this tutorial about XSS and web application security: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#XSS_Locator
At the start there are some strings to inject in order to test whether a site is vulnerable to XSS. These strings are:
';alert(String.fromCharCode(88,83,83))//';alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//";alert(String.fromCharCode(88,83,83))//--></SCRIPT>">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
and
'';!--"<XSS>=&{()}
I know the basic concepts of XSS, but here I can't understand why there's that repetition of 'alert(String.fromCharCode(88,83,83))' in the first string and what those //'; //"; //--> comments are needed for (do they mean something special when used in such a way while searching for XSS bugs?).
And in the second string, what is the purpose of the &{()} sequence?
Could anyone explain to me, with concrete examples, how these two strings should work in order to reveal an XSS bug inside a web app? The site I linked gives no explanation...

This looks like it's trying several different injections, so I'll try and break them down one at a time:
The First Injection
';alert(String.fromCharCode(88,83,83))//
This injection attempts to terminate a JavaScript string literal (using '), then terminate the statement (with ;) and makes a call to alert(String.fromCharCode(88,83,83)) which will cause a popup box containing "XSS". The following // is an attempt to "comment out" the rest of the statement, so that a syntax error will not occur and the script will execute.
The Second Injection
";alert(String.fromCharCode(88,83,83))//
Like the first injection, but it uses " in an attempt to terminate a JavaScript string literal.
The Third Injection
--></SCRIPT>">'><SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
This attempts to do the following things:
Terminate an HTML (or XML) comment (with -->)
Terminate an existing <SCRIPT> tag using </SCRIPT>
This is done to prevent the injected script causing a syntax error, which would prevent the injected script from executing.
Terminate an HTML attribute and tag, using ">
Terminate an HTML attribute and tag, using '>
Inject JavaScript using <SCRIPT>alert(String.fromCharCode(88,83,83))</SCRIPT>
The Fourth Injection
'';!--"<XSS>=&{()}
This is a common string used to test what, if any, filters and/or encoding are being used on user input. Typically, the source of the page after this injection will contain either &lt;XSS or <XSS. If the second is found, the application is most likely not filtering user input (as it allowed the addition of an arbitrary tag) and is likely vulnerable to XSS.
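The check described for the fourth string can be sketched as a small classifier over the returned page source. Both classifyProbeEcho and escapeHtml are hypothetical helpers, and the two sample pages stand in for what a vulnerable and an encoding application might send back.

```javascript
// Decide how the probe came back in the page source.
function classifyProbeEcho(pageSource) {
  if (pageSource.includes('<XSS')) return 'raw';        // tag passed through
  if (pageSource.includes('&lt;XSS')) return 'encoded'; // output was escaped
  return 'absent';                                      // stripped or not echoed
}

// Stand-in for an application that encodes its output.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

const probe = '\'\';!--"<XSS>=&{()}';
const vulnerablePage = '<p>' + probe + '</p>';
const encodingPage = '<p>' + escapeHtml(probe) + '</p>';

console.log(classifyProbeEcho(vulnerablePage)); // raw
console.log(classifyProbeEcho(encodingPage));   // encoded
```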
To answer your more direct questions:
why there's that repetition of 'alert(String.fromCharCode(88,83,83))'
This is a common "Proof of Concept" function, that will cause a popup box to appear containing "XSS". If this occurs, the injected JavaScript was executed.
why there's that repetition of 'alert(String.fromCharCode(88,83,83))' in the first string and why those //'; //"; //-->
These are used to prevent syntax errors, which can cause the injected JavaScript to fail to execute.
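To make the first injection concrete, here is a sketch of a deliberately vulnerable page template and what the payload does to it. renderInlineScript and the name variable are made up for illustration, and alert is replaced by a stub so the result can be inspected outside a browser.

```javascript
// Deliberately vulnerable: user input is pasted into a JS string
// literal without any escaping.
function renderInlineScript(userInput) {
  return "var name = '" + userInput + "';";
}

const payload = "';alert(String.fromCharCode(88,83,83))//";
const generated = renderInlineScript(payload);
// generated is now:
//   var name = '';alert(String.fromCharCode(88,83,83))//';
// The ' closes the string, ; ends the statement, and // comments out
// the leftover quote that would otherwise be a syntax error.

const alerts = [];
const stubAlert = function (message) {
  alerts.push(message);
};
new Function('alert', generated)(stubAlert);

console.log(generated, alerts);
```

The stub receives the string "XSS", which is what String.fromCharCode(88,83,83) builds without needing any quote characters in the payload.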

HTML Opening-Comment is valid JavaScript?

An old idiom for getting very old browsers to ignore JavaScript blocks in HTML pages is to wrap the contents of the <script> element in HTML comments:
<script>
<!--
alert("Your browser supports JavaScript");
//-->
</script>
The rationale is that old JavaScriptless browsers will render as text the contents of the <script> element, so putting the JavaScript in an HTML comment makes the browser have nothing to render.
A modern browser, on the other hand, will see the <script> element and parse its contents as JavaScript. Consequently, the comments need to be valid JavaScript. The closing HTML comment (-->) is ignored by the JavaScript parser because it is preceded by a JavaScript line-comment (//).
My question is, how does the opening HTML comment (<!--) not cause the JavaScript parser to fail? I have heard from various people that the opening HTML comment is valid JavaScript. If it's true that the opening comment is evaluated as JavaScript, what does it do when it executes?

It seemed to be something exciting, an expression that might have a special meaning (<, ! and -- are all operators in JavaScript), but without operands it does not make sense.
It turns out that <!-- is simply equivalent to // in JavaScript: it comments out one line.
It is a language feature that does not seem to be well-documented though, and might have been added for the simple reason to support this "hack". And now we have to live with it not to break backwards compatibility.
Needless to say that while this is a funny thing to know, this type of commenting should not be used in real code that other people might happen to read and work with.
The "hack" is also obsolete, because now every browser understands the <script> tag and does not display its contents (even if JavaScript is turned off). Personally, in most cases I try to avoid writing JavaScript directly into HTML anyway and always load my JavaScript as an external resource to separate HTML and JavaScript.

In another StackOverflow question, @MathiasBynens gave what I believe is the answer:
Why is the HTML comment open block valid JavaScript?
In short, apparently, this is a non-standard addition to browser-based JS engines that allows these <!-- and --> as single-line comment markers like the // in standard JS.

what does '</' mean in JavaScript?

I use Aptana Studio to code JavaScript.
When I write a string containing </, I get a warning saying
'<' + '/' + letter not allowed here
But it does not trigger an error in browsers.
What does </ mean in JavaScript?

For inline scripts (e.g., using <script>), some HTML parsers may interpret anything that looks like </this (especially </script>) as an HTML tag, rather than part of your source code. Your IDE is trying to keep you from typing this by mistake.
This means that, if you're using an inline script, you can't have a </tag> as a constant string in JavaScript:
var endTag = "</tag>"; // don't do this!
You'll need to break it up somehow to keep it from being interpreted as a tag:
var endTag = "<" + "/tag>";
Note that this only applies to inline scripts. Standalone scripts (e.g., a .js file) can have anything they want in them.
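When the string comes from data rather than from hand-written source, splitting it by hand is not an option. One common technique (not part of the answer above, so treat it as an assumption about what fits your setup) is to serialize the value as JSON and replace every < with its \u003C escape before placing it in an inline script. toInlineScriptLiteral is a hypothetical helper name.

```javascript
// Serialize a value so the resulting source text contains no "<" at
// all, and therefore nothing an HTML parser could take for a tag.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value).replace(/</g, '\\u003C');
}

const literal = toInlineScriptLiteral({ endTag: '</tag>' });

// The escape is decoded again when the literal is parsed, so the
// original value comes back unchanged.
const roundTripped = JSON.parse(literal);

console.log(literal, roundTripped.endTag);
```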

It doesn't mean anything in a string; outside of a string it would be a syntax error.
EDIT: Before someone nitpicks there are some exceptions, eg var i = 1 </* comment */ 2; is legal and there may be some other cases (like performing less-than operation on a regex) but generally speaking it signifies nothing by itself.

It sounds like your IDE is rejecting it. Aptana Studio may be assuming some sort of injection attack, and thus shows a warning.
You would probably get a more direct answer by asking them directly though; a general programming help site like StackOverflow is less likely to know the reasoning for specific cases such as this.
