Assume you want to show an alert with the string content <!-- Comment --> <script type="text/javascript"></script> in JavaScript on an HTML page. You can do that with the following code:
<!DOCTYPE html>
<html>
<head>
<title>Quoting</title>
</head>
<body>
<script type="text/javascript">
alert('<!-- Comment --> <script type="text/javascript"></script\u003E');
</script>
</body>
</html>
Note the escaped > character in the </script> part of the text. It uses a JavaScript Unicode escape to prevent the HTML parser from interpreting this part of the string literal as the end of the script element. The code above works perfectly in FF, Chrome and IE.
Now apply the same escaping to the > that ends the comment within the string literal. This should change nothing, because the HTML comment syntax should not be interpreted within script tags in HTML (and obviously is not interpreted, because the comment syntax was shown in the alert):
<!DOCTYPE html>
<html>
<head>
<title>Quoting</title>
</head>
<body>
<script type="text/javascript">
alert('<!-- Comment --\u003E <script type="text/javascript"></script\u003E');
</script>
</body>
</html>
Interestingly, this code breaks: the alert is not shown when the page loads. The problem can be reproduced at least in FF, Chrome and IE. Am I missing something in the specs, or is that a "browser-independent" bug in the HTML parser of all major browsers?
The DOM inspector shows the following:
It looks like the rest of the document is interpreted as part of the script in this case.
Any ideas?
This is not a bug.
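The first < puts the parser into the script data less-than sign state, then the ! puts it into the script data escape start state, and so on until the --> returns it to the normal script data state.
Replacing the > that ended the HTML comment with an escape sequence means the parser never leaves the "dealing with a comment" state. The <script that follows inside the string literal then pushes it into the script data double escaped state, in which the real </script> no longer closes the element, so the script runs on until the end of the HTML document.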
It is a historical artefact of the hack early HTML used to allow inline scripts without the JS source code showing up on the page for browsers which didn't support the <script> element.
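One way to sidestep these parser states altogether is to keep every < out of the inline script's string literals. The following is a minimal sketch, not a definitive implementation: the helper name toInlineScriptLiteral is hypothetical, and it assumes the literal is generated by code (for example on the server) rather than typed by hand.

```javascript
// Hypothetical helper: build a string literal that is safe to paste into an
// inline <script> block, whatever HTML-looking text the value contains.
function toInlineScriptLiteral(value) {
  // JSON.stringify yields a valid double-quoted JavaScript string literal.
  // Replacing each "<" with \u003C removes "<!--", "<script" and "</script"
  // from the source text without changing the value of the string.
  return JSON.stringify(value).replace(/</g, '\\u003C');
}

var text = '<!-- Comment --> <script type="text/javascript"></script>';
var literal = toInlineScriptLiteral(text);

// The generated literal contains no "<", so the HTML parser never enters
// the escaped or double escaped states while reading it.
console.log(literal);
```

Because only < is rewritten, the > characters can stay as they are; it is the opening sequences that switch the tokenizer's state.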
I've been struggling to understand how to render KaTeX without having to use $$ before and after the math expression. The KaTeX page on GitHub says I should use this:
<script>
renderMathInElement(document.body);
</script>
But I still need to use $$ around each expression. How can I render the whole page as KaTeX? Thank you!
Here's a simple example. If you paste this code in an HTML file and then open that file in a browser, it should render, and with no occurrences of $$ anywhere to be found, so hopefully you can then tweak it to whatever you need.
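Inside a JavaScript string literal the backslash is itself an escape character, so I needed to write \\ wherever the KaTeX function support page shows \. If your expression does not come from a string literal (for example, if you read it from an element's text), use a single \ instead.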
<!doctype html>
<html>
<head>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/katex.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/katex.min.js"></script>
</head>
<body>
<span id="formula">f(x)</span>
<script>
katex.render("\\int_0^1{f(x)}", document.getElementById("formula"));
</script>
</body>
</html>
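The doubled backslashes can be avoided with a raw template literal, which keeps backslashes exactly as typed. This is only a sketch and assumes a browser with ES2015 template literals; the katex.render call is left as a comment because it needs the page above.

```javascript
// The same TeX source written two ways.
var escaped = "\\int_0^1{f(x)}";        // ordinary string literal: "\\" means one backslash
var raw = String.raw`\int_0^1{f(x)}`;   // raw template literal: backslashes are kept as typed

// Both variables hold the identical TeX string.
console.log(raw === escaped);

// In the page from the example above, either one can be rendered:
// katex.render(raw, document.getElementById("formula"));
```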
I'm learning XSS prevention through this presentation: http://stash.github.io/empirejs-2014/#/2/23, and I have a question about this slide.
It says "JavaScript sanitization doesn't save you from innerHTML", and I tried a simple test like this:
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>test</title>
</head>
<body>
<div id="test"></div>
<script>
var userName = "Jeremy\x3Cscript\x3Ealert('boom')\x3C/script\x3E";
document.getElementById('test').innerHTML = "<span>"+userName+"</span>";
</script>
</body>
</html>
When I opened this HTML file in my browser (Chrome), I only saw the name "Jeremy". Using F12, I saw
<div id="test"><span>Jeremy<script>alert('boom')</script></span></div>
Although the script had been added to the HTML, the alert box didn't appear.
I think "JavaScript sanitization doesn't save you from innerHTML" means that the word "boom" should be alerted. Am I right?
According to MDN, innerHTML prevents <script> elements from executing directly [1], which means your test should not alert anything. However, it does not prevent event handlers from firing later on, which makes the following possible:
var name = "\x3Cimg src=x onerror=alert(1)\x3E";
document.getElementById('test').innerHTML = name; // shows the alert
<div id="test"></div>
(script adapted from the example in the article, with escape sequences although I'm not sure those are relevant outside of <script> elements)
Since <script> elements never execute when inserted via innerHTML, it's not clear to me what that slide is trying to convey with that example.
[1] This is actually specified in HTML5. MDN links to a 2008 draft; in the current W3C Recommendation, it's located near the end of section 4.11.1, just before section 4.11.1.1 begins:
Note: When inserted using the document.write() method, script elements execute (typically synchronously), but when inserted using innerHTML and outerHTML attributes, they do not execute at all.
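Since the onerror example shows that markup in user data is still dangerous with innerHTML, the data has to be HTML-escaped before it is concatenated into markup (or assigned through textContent instead). Here is a minimal sketch; escapeHtml is a hypothetical helper, not a built-in function.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// text and attribute values. "&" must be replaced first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var userName = "\x3Cimg src=x onerror=alert(1)\x3E";
var markup = '<span>' + escapeHtml(userName) + '</span>';

// The payload is now inert text rather than an <img> element.
console.log(markup); // <span>&lt;img src=x onerror=alert(1)&gt;</span>

// In the page: document.getElementById('test').innerHTML = markup;
// or, without building markup at all:
// document.getElementById('test').textContent = userName;
```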
I have the following code, which works properly in Chrome:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<script>
//<![CDATA[
!function (){
window.stop();
var html = '<!DOCTYPE html>\n<html>\n<head>\n <meta charset="utf-8">\n</head>\n<body>\n \<script>console.log("loaded");<\/script>\ntext\n</body>\n</html>';
document.documentElement.innerHTML = html;
}();
//]]>
</script>
</body>
</html>
It prints "loaded" in the console. The same code does not work in Firefox: it does not run the script, it just prints the text.
(If you are curious why I need this, you can find it here: https://stackoverflow.com/a/30933972/607033 )
I tried possible solutions like this: https://stackoverflow.com/a/20584396/607033 but they did not work. Any idea how to work around this?
Note: there are many scripts in the HTML, e.g. bootstrap, jquery, facebook, google, etc..., not just a single inline script.
I think there is no way in Firefox to replace the complete HTML document with JavaScript without leaving the current page. A workaround is to reuse the original document and replace only the contents of the head and body:
$('html').html(html);
jQuery does this automatically: it strips out the html tags, injects the head and the body, and loads the scripts.
ref: https://stackoverflow.com/a/1236372/607033
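Without jQuery, a similar effect can be approximated by replacing each inert script element with a freshly created one, because scripts created through document.createElement do run when they are inserted. This is a rough sketch under that assumption, not a description of what jQuery does internally; reviveScripts is a hypothetical name, and the document is passed in as a parameter only to keep the function self-contained.

```javascript
// Hypothetical helper: swap every <script> under `root` for a newly created
// copy so that the browser executes it. Returns the number of scripts replaced.
function reviveScripts(root, doc) {
  var dead = Array.prototype.slice.call(root.querySelectorAll('script'));
  dead.forEach(function (old) {
    var fresh = doc.createElement('script');
    // Copy attributes such as src and type.
    for (var i = 0; i < old.attributes.length; i++) {
      fresh.setAttribute(old.attributes[i].name, old.attributes[i].value);
    }
    // Copy inline code, if any.
    fresh.textContent = old.textContent;
    // Dynamically inserted external scripts load asynchronously by default;
    // async = false asks the browser to run them in insertion order.
    fresh.async = false;
    old.parentNode.replaceChild(fresh, old);
  });
  return dead.length;
}

// Intended use in the page, after the markup has been assigned:
// document.documentElement.innerHTML = html;
// reviveScripts(document.documentElement, document);
```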
I'm writing a small script that determines if the user is on IE8 or below. If they are, the script should completely empty the document (body and head) and stop any further script executing.
I've played around with document.write() but can only get this working with window.onload. But I want it to execute as soon as it knows the browser version (which is when the script executes).
Example page setup:
<html>
<head>
Some CSS
Some meta
...
</head>
<body>
Page content
<script>
if (IE < 8) { //in reality I have a function to determine this
document.write('Your browser is outdated. Please upgrade to view this site.');
}
</script>
<script src="more-scripts"></script>
</body>
</html>
This doesn't work but if I wrap the script in a window.onload it does. But then the page flashes up before the code executes. How can I get this to work?
Rather than using document.write() to print a message, you can use the .innerHTML property of the document.body element to entirely replace the body of the page. For this technique, your browser-check script should go in the head section, not the body (this is usually where scripts like this would go anyway).
<html>
<head>
Some CSS
Some meta
...
<script>
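if (IE < 8) { //in reality I have a function to determine this
// Caveat: document.body is still null while a script in the head runs, so this assignment has to execute after the body start tag has been parsed.
document.body.innerHTML = "Your browser is outdated. Please upgrade to view this site.";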
}
</script>
</head>
<body>
Page content
<script src="more-scripts"></script>
</body>
</html>
You could use conditional comments for that (lte IE 8 matches IE8 and below):
<!--[if lte IE 8]>
<script>
document.body.innerHTML = '';
document.write('Your browser is outdated. Please upgrade to view this site.');
</script>
<![endif]-->
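The question leaves the detection function out. For illustration only, a user-agent check along the following lines is one possibility; ieVersion is a hypothetical name, the sketch assumes the classic "MSIE n.n" token that old Internet Explorer versions send, and it sticks to ES3 syntax so that IE8 itself can parse it.

```javascript
// Hypothetical helper: return the major IE version from a user-agent string,
// or null when the "MSIE" token is absent.
function ieVersion(userAgent) {
  var match = /MSIE (\d+)/.exec(userAgent);
  return match ? parseInt(match[1], 10) : null;
}

var ie8 = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)';
var version = ieVersion(ie8);

// "IE8 or below" then becomes:
var outdated = version !== null && version <= 8;
console.log(outdated);

// In the page the argument would be navigator.userAgent.
```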
I'm using HTML Tidy in PHP and it's producing unexpected results because of a <script> tag in a JavaScript string literal. Here's a sample input:
<html>
<script>
var t='<script><'+'/script>';
</script>
</html>
HTML Tidy's output:
<html>
<script>
//<![CDATA[
var t='<script><'+'/script>';
<\/script>
<\/html>
//]]>
</script>
</html>
It's interpreting </script></html> as part of the script. Then, it adds another </script></html> to close the open tags. I tried this on an online version of HTML Tidy (http://www.dirtymarkup.com/) and it's producing the same error.
How do I prevent this error from occurring in PHP?
After playing around with it a bit, I discovered that a trailing comment //'<\/script>' confuses the algorithm enough to prevent this bug from occurring:
<html>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
</html>
After clean-up:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
<title></title>
</head>
<body>
</body>
</html>
My guess is that the clean-up algorithm scans the code, finds the string <script> twice, and then looks for a matching </script> for each. Separating < from /script> lets the second </script> go undetected, which is why it adds another </script> at the end of the code and, for some reason, another </html> as well. (Poor design indeed!)
So my second assumption was that the algorithm has no check for whether a </script> sits inside a comment, and I was right! Adding the string <\/script> in a JavaScript comment indeed makes the algorithm think that there are two </script> in total.
There's no need for string concatenation to avoid the closing </script>. Simply escaping the / character is enough to "fool" the parsers in browsers and, it seems, HTML Tidy's parser as well:
<html>
<script>
var t='<script><\/script>';
</script>
</html>
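The escape is harmless on the JavaScript side: a backslash in front of a character with no special escape meaning, such as / or s, simply yields that character. The short check below illustrates this; the variable names are only for the example.

```javascript
// Three spellings of the same string value.
var concatenated = '<script><' + '/script>';
var slashEscaped = '<script><\/script>';
var letterEscaped = '<\script></\script>';

// All three compare equal at runtime; only the source text seen by the
// HTML parser (or by HTML Tidy) differs.
console.log(concatenated === slashEscaped);
console.log(concatenated === letterEscaped);
```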
Try splitting the script tag itself across a string concatenation so that it never appears as a full word:
<html>
<script>
var t='<scr'+'ipt><'+'/script>';
</script>
</html>
Resulting cleaned code
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<script>
var t='<scr'+'ipt><'+'/script>';
</script>
<title></title>
</head>
<body>
</body>
</html>
It is probably better practice to create a script tag like this (which should also solve your Tidy issues):
<script>
script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'http://myserver.com/file.js';
document.getElementsByTagName('head')[0].appendChild(script);
</script>
One way is to make it so tidy doesn't detect the script tag. The "cleanest" way I could come up with is to escape a character in the tag.
<html>
<script>
var t='<\script><'+'/script>';
</script>
</html>
so you could even do this, without having to break the string up as above:
var t='<\script></\script>';
That just works as expected
<html>
<script>
var t='<'+'script><'+'/script>';
</script>
</html>
By the way, string concatenation is not the best way to dynamically create HTML to insert into a page; look into document.createElement or even template engines (handlebars.js is my favourite).