Why does <!-- Not Throw a Syntax Error?

Why does <!-- Not Throw a Syntax Error? - javascript

I noticed in some legacy code the following pattern:
<script>
<!--
// code
// -->
</script>
After some research, this appears to be a very old technique for hiding the contents of script elements from the DOM when the browser did not support the <script> element. More information can be found here.
My concern is this: why does <!-- not throw a Syntax Error? I've found on whatwg.org's website that <!-- should be functionally equivalent to //, and it links off to a snippet from the ECMAScript grammar about comments. The problem is, <!-- isn't defined by that grammar at all.
So this seems like undefined behavior that happens to be implemented by all major browsers. Is there a specification that allows for this, or is this a backwards-compatibility hack that people are bringing forward?

Officially: Because there's specific handling for it in the HTML spec. E.g., it's a "by fait" thing. It's not a JavaScript thing, you won't find it in the JavaScript grammar.
Unofficially, it would appear that at least some JavaScript engines handle it intrinsically, sometimes in ways that make what I believe is valid JavaScript invalid. For instance, on V8 in a browser, this fails:
eval("var a = 1; var n = 3; console.log(a<!--n);")
...with Unexpected end of input. I'm pretty sure it shouldn't, but I'm not a parsing lawyer. I'd expect it to log false to the console, like this does:
eval("var a = 1; var n = 3; console.log(a<! --n);")
// Note the space -------------------------^
Side note: Meteor's jsparser agrees with me, copy and paste just the bit inside the double quotes into it.
Note that the characters <! do not appear in the specification, nor does there appear to be anything near any of the 70 occurrences of the word "comment" in there, nor is it anywhere in the comment grammar, so it wouldn't seem to be an explicit in-spec exception. It's just something at least some JavaScript engines do to avoid getting messed up by people doing silly things. No great surprise. :-)

It is defined by the W3's docs for the user agents:
The JavaScript engine allows the string "<!--" to occur at the start of a SCRIPT element, and ignores further characters until the end of the line.
So browsers follow these standards

Related

HTML Opening-Comment is valid JavaScript?

An old idiom for getting very old browsers to ignore JavaScript blocks in HTML pages is to wrap the contents of the <script> element in HTML comments:
<script>
<!--
alert("Your browser supports JavaScript");
//-->
</script>
The rationale is that old JavaScriptless browsers will render as text the contents of the <script> element, so putting the JavaScript in an HTML comment makes the browser have nothing to render.
A modern browser, on the other hand, will see the <script> element and parse its contents as JavaScript. Consequently, the comments need to be valid JavaScript. The closing HTML comment (-->) is ignored by the JavaScript parser because it is preceded by a JavaScript line-comment (//).
My question is, how does the opening HTML comment (<!--) not cause the JavaScript parser to fail? I have heard from various people that the opening HTML comment is valid JavaScript. If it's true that the opening comment is evaluated as JavaScript, what does it do when it executes?

It seemed to be something exciting, an expression that might have a special meaning (<, ! and -- are all operators in Javascript), but without operands it does not make sense.
Turns out that <!-- is simply equivalent to // in Javascript, it is used to comment out one line.
It is a language feature that does not seem to be well-documented though, and might have been added for the simple reason to support this "hack". And now we have to live with it not to break backwards compatibility.
Needless to say that while this is a funny thing to know, this type of commenting should not be used in real code that other people might happen to read and work with.
The "hack" is also obsolete, because now every browser understands the <script> tag and does not display its contents (even if Javascript is turned off). Personally, in most cases I try avoid writing Javascript directly into HTML anyways and always load my Javascript as an external resource to separate HTML and Javascript.

In another StackOverflow question, #MathiasBynens gave what I believe is the answer:
Why is the HTML comment open block valid JavaScript?
In short, apparently, this is a non-standard addition to browser-based JS engines that allows these <!-- and --> as single-line comment markers like the // in standard JS.

What does IE use //# in javascript for?

So, a bug in a piece of javascript revolved around code similar to :
<script>
(function() {
if (true) {
//#todo: do we need to set total or -- ?
alert('hello?');
}
})();
</script>
In the larger system IE complained "Expected ';' ". In the small scale example IE simply caused a warning about blocking ActiveX controls.
Obviously, "//#" has some context to activeX controls in IE. I was unable to find this as searching for the symbols was useless, and any search about special comments in IE result in the conditional html comments. I am just curious how the //# are supposed to be used in IE.

The IE JScript engine supports conditional comments which turn comments written in a particular way into code (partially). However, you are not using those.
In your case it seems to be a way to tell e.g. an IDE that there is a TODO item. The error you got is most likely unrelated.

Unless there's some quirk about IE that I don't know, the //#todo is just commenting fluff that some programmers use when they are too lazy/don't know how to implement something.

what does '</' mean in JavaScript?

I use Aptana Studio to code JavaScript.
When I write string with </, there will be warning saying
'<' + '/' + letter not allowed here
But it does not trigger error in browsers.
what does </ mean in JavaScript?

For inline scripts (e.g, using <script>), some HTML parsers may interpret anything that looks like </this (especially </script>) as an HTML tag, rather than part of your source code. Your IDE is trying to keep you from typing this by mistake.
This means that, if you're using an inline script, you can't have a </tag> as a constant string in JavaScript:
var endTag = "</tag>"; // don't do this!
You'll need to break it up somehow to keep it from being interpreted as a tag:
var endTag = "<" + "/tag>";
Note that this only applies to inline scripts. Standalone scripts (e.g, a .js file) can have anything they want in them.

It doesn't mean anything in a string, outside of a string it would be a syntax error.
EDIT: Before someone nitpicks there are some exceptions, eg var i = 1 </* comment */ 2; is legal and there may be some other cases (like performing less-than operation on a regex) but generally speaking it signifies nothing by itself.

It sounds like it's your IDE is denying it. Aptana Studio may be assuming some sort of injection attack, and thus throws an error.
You would probably get a more direct answer by asking them directly though; a general program help site like StackOverflow is less likely to know the reasoning for specific cases such as this.

Why use \x3C instead of < when generating HTML from JavaScript?

I see the following HTML code used a lot to load jQuery from a content delivery network, but fall back to a local copy if the CDN is unavailable (e.g. in the Modernizr docs):
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script>window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">\x3C/script>')</script>
My question is, why is the last < character in the document.write() statement replaced with the escape sequence \x3C? < is a safe character in JavaScript and is even used earlier in the same string, so why escape it there? Is it just to prevent bad browser implementations from thinking the </script> inside the string is the real script end tag? If so are there really any browsers out there that would fail on this?
As a follow-on question, I've also seen a variant using unescape() (as given in this answer) in the wild a couple of times too. Is there a reason why that version always seems to substitute all the < and > characters?

When the browser sees </script>, it considers this to be the end of the script block (since the HTML parser has no idea about JavaScript, it can't distinguish between something that just appears in a string, and something that's actually meant to end the script element). So </script> appearing literally in JavaScript that's inside an HTML page will (in the best case) cause errors, and (in the worst case) be a huge security hole.
That's why you somehow have to prevent this sequence of characters to appear. Other common workarounds for this issue are "<"+"/script>" and "<\/script>" (they all come down to the same thing).
While some consider this to be a "bug", it actually has to happen this way, since, as per the specification, the HTML part of the user agent is completely separate from the scripting engine. You can put all kinds of things into <script> tags, not just JavaScript. The W3C mentions VBScript and TCL as examples. Another example is the jQuery template plugin, which uses those tags as well.
But even within JavaScript, where you could suggest that such content in strings could be recognized and thus not be treated as ending tags, the next ambiguity comes up when you consider comments:
<script type="text/javascript">foo(42); // call the function </script>
– what should the browser do in this case?
And finally, what about browsers that don't even know JavaScript? They would just ignore the part between <script> and </script>, but if you gave different semantics to the character sequence </script> based on the browsers knowledge of JavaScript, you'd suddenly have two different results in the HTML parsing stage.
Lastly, regarding your question about substituting all angle brackets: I'd say at least in 99% of the cases, that's for obfuscation, i.e. to hide (from anti-virus software, censoring proxies (like in your example (nested parens are awesome)), etc.) the fact that your JavaScript is doing some HTML-y stuff. I can't think of good technical reasons to hide anything but </script>, at least not for reasonably modern browsers (and by that, I mean pretty much anything newer than Mosaic).

Some parsers handle the < version as the closing tag and interpret the code as
<script>
window.jQuery || document.write('<script src="js/libs/jquery-1.6.1.min.js">
</script>
\x3C is hexadecimal for <. Those are interchangable within the script.

Find IE-breaking ECMAScript/JavaScript errors

I work on a relatively huge ECMAScript codebase (> 60,000 LOC) and we tend to struggle a bit with detecting errors for our dreaded friend Internet Explorer (especially 6 and 7).
At the moment, I've been stuck with an issue for 3 days where my RIA renders fine in Google Chrome, Firefox 3.6, Opera and Internet Explorer 8, but fails when running Internet Explorer 8 in IE7-Mode (or with a real IE-7).
My question really is: how do you do to identify code that will produce an error in IE7?
Usually I rely on JSLint and I tend to catch the usual suspects (trailing commas, I loathe you), but in this particular case I have just re-run a linter on all my code, both source and minimized, and it doesn't yield my usual suspects. So I guess I have mistakenly introduced something that IE doesn't like (who knows, maybe I got crazy and used Array.map instead of dojo.map somewhere?) and it blows up in my face, producing nice error messages that don't help me at all ("[object object]" and " is null" where it shouldn't be, so I assume there was an error up-stream that failed silently and prevented this object from being created).
I've tried having a look at the Google Closure Linter but it doesn't yield anything special, and I don't think that the Google Closure Compiler is going to be my savior either. is there any tool (command-line, web-service or other) that can parse/run code as if it were emulating IE so we can get the appropriate errors?
Any tips are appreciated.
EDIT: Thank you for your help in trying to solve my issue so far, but what I am really just asking is if there are tools that do this sort of checks, meaning validating a feature set and syntax against a particular browser. This is the sort of thing severly lacking in the JS-world in my opinion (not so much for FF or Chrome, obviously as their debuggers are a bit more helpful).
EDIT2: I eventually found the root of my problem today (after 3 days) by going through all my code changes between two branches and realizing that the problem was actually already there but never detected before and going back through even older changes to narrow down the mess and eventually end up adding console logs everywhere until I reached a point of failure (God thank emacs for its regular expression support to add logs to every single line... heavy but works...). Fun fact though: IE swallowed an error message supposed to be displayed in the try catch block originally catching the source problem and then re-throwing it. Still don't why, but if it hadn't then that would have been a lot easier to find, as it was intended for that purpose in case it broke. Weird. Might not like deep-levels of re-throw.
I'll leave the question open as for now there are no answer to the actual question.

You might try the IE8 debugger, as #galambalazs suggests; the old debugger from the IE6 era was generally not useful.
The low-level technique I've always used is to add my own try/catch blocks around large portions of the my Javascript source to narrow down the source of error. By iteratively adjusting your try/catch blocks you can do a "binary search" through your source to locate the code that's causing an exception. What you'll probably find is that there's some code that's harmless in Firefox but which IE's interpreter considers an error. (To be fair, it's usually the case that IE's strictness is justified, and that the lax behavior of Firefox is really undesirable.)
So in other words, you'd start on a Javascript source that you suspect, or perhaps you'd do this to all of your included .js files:
// top of Javascript source file foo.js
try {
// ... all the code ...
} catch (x) { alert("Error in foo.js: " + x); }
Now, if you load the page and get that alert, then you know that the error is somewhere in foo.js. Thus, you split it in half:
// top of foo.js
try {
// ... half the code, roughly ...
}
catch (x) { alert("Error in first half of foo.js: " + x); }
try {
// ... the rest of the code ...
} catch (x) { alert("Error in second half of foo.js: " + x); }
Repeat that process and you'll eventually find the problem.

You may try Microsoft Script Debugger, or Internet Explorer Developer Tools.
For complete list of what may be different in IE 8 from the older version consult with:
Internet Explorer 8 Readiness Toolkit (offical list)
especially DOM Improvements
JavaScript in Internet Explorer 8 (John Resig's post)
Also see Web Bug Track for possible quirks.
As a final note, doing console.log on every line may help you to find a particular bug in a hard way, but considering the amount of code you should really write unit test for modules. That's the only way you can make sure that your application will actually run with different inputs and conditions.

We Keep Coding

JavaScript is the programming language of the Web.