What is the purpose of this JavaScript?

I was playing around with a Python-based HTML parser and parsed Stack Overflow. The parser puked on a line with this error:
HTMLParser.HTMLParseError: bad end tag: "</'+'scr'+'ipt>", at line 649, column 29
The error points to the following lines of javascript in the site's source:
<script type="text/javascript">
document.write('<s'+'cript lang' + 'uage="jav' + 'ascript" src=" [...] ">');
document.write('</'+'scr'+'ipt>');
</script>
([...] replaces a long link, removed for simplicity.)
Out of curiosity, is there a specific reason for what looks to me like artificial 'obfuscation' of the code, i.e. why use the document.write method to concatenate all the chopped-up strings?

I think it's to fight ad blockers.
... + 'uage="jav' + 'ascript" src="http://ads.stackoverflow.com

It has been written that way to stop the browser from thinking it is the closing tag for <script>, which would end the script block too early.

When the HTML parser encounters document.write('</script>');, it thinks it has found the end of the enclosing <script> tag. Breaking the tag up stops the parser from recognising the closing tag.
The other way I've seen this achieved is by escaping the slash, i.e. document.write('<\/script>');.
The correct way to do this is one of the following:
Enclose the body of the script in a <![CDATA[ ... ]]> block (if serving XHTML), or
Put the script in an external file, or
Use the DOM API instead (i.e. create a script node and append that to the document head)
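To see why breaking the tag up works, here is a rough model of the rule the HTML parser applies. It is a simplification loosely based on the HTML5 tokenizer (the first "</script" followed by whitespace, "/" or ">" ends the element, case-insensitively) and ignores the spec's extra handling of "<!--"; scriptBodyAsParsed is a hypothetical helper, not a real API.

```javascript
// Rough model of how an HTML tokenizer finds the end of an inline <script>.
// It knows nothing about JavaScript: the first "</script" followed by
// whitespace, "/" or ">" ends the element, even inside a string literal.
// Simplification: the spec's special handling of "<!--" is ignored.
function scriptBodyAsParsed(html) {
  const open = html.search(/<script[^>]*>/i);
  if (open === -1) return null;
  const bodyStart = html.indexOf(">", open) + 1;
  const rest = html.slice(bodyStart);
  const end = rest.search(/<\/script[\s\/>]/i);
  return end === -1 ? rest : rest.slice(0, end);
}

const broken = "<script>document.write('</script>');</script>";
const split = "<script>document.write('</scr' + 'ipt>');</script>";
const escaped = "<script>document.write('<\\/script>');</script>";

console.log(scriptBodyAsParsed(broken));  // document.write('
console.log(scriptBodyAsParsed(split));   // document.write('</scr' + 'ipt>');
console.log(scriptBodyAsParsed(escaped)); // document.write('<\/script>');
```

In the broken case the JavaScript engine is handed only document.write(' which is an unterminated string literal, while both the split and the escaped forms keep the whole statement inside the script element.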

Perhaps it's there to stop programs that search specifically for script tags. Ad blockers, for example, look for script tags and object tags.

Related

Using script tags within script breaks the script

Essentially, I have a script tag within my script.
(generic HTML)
<script>
function asdf(){
document.getElementById('jkl').innerHTML = "<script>(another script goes here)</script>"
}
</script>
(generic HTML)
Unfortunately, the first </script> tag is listened to, not the second. Is there any way to "comment" it, like putting a backslash in front of quotes?
You need to break your inside script string into two pieces like this:
<script>
function asdf(){
document.getElementById('jkl').innerHTML = "<script>(another script goes here)</scr" + "ipt>"
}
</script>
Otherwise the HTML parser will think that the inner </script> closing tag is closing the opening tag, and this will cause problems.

"%3Cscript" vs "<script"

Every once in a while, I'll see an HTML code snippet with:
%3Cscript
where the %3C replaces the <. Is this because the code was auto-generated or needs to display properly in an editor, or was it coded that way explicitly for some reason and needs to keep that form on the HTML webpage? In case it is helpful, here is the full beginning of the line of code I was questioning:
document.write(unescape('%3Cscript
Wouldn't the line of code work just fine if you replaced the %3C with a <?
The unescape() Javascript function converts the %3C back to < before it gets written into the document. This is apparently an attempt to avoid triggering scanners that might see the literal <script tag in the source and misinterpret what it means.
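A small sketch of what this answer describes: the percent-encoded source text contains no literal <script, so a naive text search does not match it, while unescape restores the markup at runtime. looksLikeScriptTag is a hypothetical stand-in for such a scanner (real scanners vary), and example.js is a placeholder URL. unescape is deprecated but still defined in browsers and Node.js; decodeURIComponent gives the same result for this input.

```javascript
// Hypothetical stand-in for a scanner that searches source text for a literal tag.
function looksLikeScriptTag(sourceText) {
  return /<script/i.test(sourceText);
}

// The argument as it appears in the page source: no literal "<" anywhere.
const encoded = "%3Cscript src='example.js'%3E%3C/script%3E";

// What document.write actually receives once the string is decoded.
const decoded = unescape(encoded);

console.log(looksLikeScriptTag(encoded));             // false
console.log(decoded);                                 // <script src='example.js'></script>
console.log(decodeURIComponent(encoded) === decoded); // true
```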
When writing JavaScript in a script tag embedded in HTML, the sequence </script> cannot appear anywhere in the script, because it will end the script tag:
<script type="text/javascript">
var a = "<script>alert('hello world');</script>";
</script>
Is more or less treated as:
<script type="text/javascript">
var a = "<script>alert('hello world');
</script>
";
<script></script>
in the eyes of the HTML parser.
As mplungjan said, this is a convoluted way of doing it; one can simply write <\/script> in a JavaScript string literal to make it work:
<script type="text/javascript">
var a = "<script>alert('hello world');<\/script>";
</script>
Technically this has nothing to do with document.write; document.write is just a common place where you need "</script>" in a JavaScript string literal.
Also note that "<script>" is totally fine as is. Only "</script>" is a problem, and that is the part you have cut out of the code you quoted.
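One way to apply the <\/script> trick generally, sketched under the assumption that the inline script is generated by JavaScript on the server: serialize the value with JSON.stringify, then rewrite every </ as <\/. Since \/ is a valid escape for / in both JSON and JavaScript string literals, the value is unchanged while the source text no longer contains </. toInlineScriptLiteral is a hypothetical helper name.

```javascript
// Hypothetical helper: produce a JavaScript string literal that is safe to
// place inside an inline <script> element, because its source never contains "</".
function toInlineScriptLiteral(value) {
  // JSON.stringify yields a valid string literal; "\/" decodes back to "/".
  return JSON.stringify(String(value)).replace(/<\//g, "<\\/");
}

const literal = toInlineScriptLiteral("<script>alert('hello world');</script>");
console.log(literal); // "<script>alert('hello world');<\/script>"
console.log(`<script type="text/javascript">\nvar a = ${literal};\n</script>`);
```

The second log prints an inline script equivalent to the corrected example in this answer.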
As mentioned, this is possibly an attempt to fool scanners.
A more useful and important trick is the <\/script> or '...</scr'+'ipt>' split, needed to avoid ending the current script block when document.write-ing a script inline.
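The spellings mentioned in this thread differ only in their source text; at runtime they all evaluate to the same string, so document.write receives identical markup. A quick check (unescape is deprecated but still defined in browsers and Node.js):

```javascript
// Each entry is different source text, but all evaluate to "</script>".
const variants = {
  split: "</scr" + "ipt>",                   // concatenation
  escapedSlash: "<\/script>",                // "\/" is just "/"
  hexEscapes: "\x3C/script\x3E",             // \x3C is "<", \x3E is ">"
  percentEncoded: unescape("%3C/script%3E"), // decoded at runtime
};

for (const [name, value] of Object.entries(variants)) {
  console.log(name, value === "</script>"); // true for every entry
}
```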

Need to escape /> (forward slash and greater than) with jQuery or Javascript

I'm working on a web page that will display code inside pre tags, and need to render characters used to form HTML tags within those pre tags. I'm able to escape the greater-than and less-than symbols via jQuery/Javascript per my code below.
However, the combination of a forward slash and a greater-than symbol (/>) is problematic. Additionally, more gets rendered in the final output than I expect when the page runs.
The contents of the pre tag are simple.
<pre>
<abc />
<xyz />
</pre>
Here is my jQuery code.
$(function(){
$('pre').html(function() {
//this.innerHTML = this.innerHTML.replace(/\/>/g, "#");
//this.innerHTML = this.innerHTML.replace(/\//g, "*");
// escape the angle brackets so they display as text
this.innerHTML = this.innerHTML.replace(/</g, "&lt;");
this.innerHTML = this.innerHTML.replace(/>/g, "&gt;");
});
});
When this runs, what I expect to happen is the page will render the following:
<abc/><xyz/>
Pretty simple. Instead, here is what gets rendered in Chrome, Firefox, and IE.
<abc>
<xyz>
</xyz></abc>
The tags get duplicated, and the forward slashes get moved after the less-than symbols. Presently I'm learning jQuery, so there may be something more fundamental wrong with my function. Thanks for your help.
You have some invalid HTML. The browser then tries to turn the invalid HTML into a DOM. jQuery then asks the browser to turn the DOM back into HTML. What it gets is a serialisation of the DOM at that stage. The original source is lost.
You can't use jQuery to recover the original broken source of an HTML document (short of making a new HTTP request and forcing it to treat the response as plain text).
Fix the HTML on the server before you send it to the client.
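A minimal sketch of what fixing it on the server could look like, assuming the server can run JavaScript (for example Node.js); escapeHtml is a hypothetical helper, and most server frameworks ship an equivalent. Escaping & first matters, otherwise the & in each newly produced &lt; would be escaped a second time.

```javascript
// Hypothetical server-side helper: escape text so the browser shows it
// literally inside <pre> instead of parsing it as tags.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const snippet = "<abc />\n<xyz />";
console.log("<pre>\n" + escapeHtml(snippet) + "\n</pre>");
// <pre>
// &lt;abc /&gt;
// &lt;xyz /&gt;
// </pre>
```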

Javascript external script loading strangeness

I'm maintaining a legacy javascript application which has its components split into 4 JS files.
They are "Default.aspx", "set1.aspx", "set2.aspx" and "set3.aspx". Each ASPX page writes out compressed JS from multiple (all different) source files belonging to its respective set, and sets the Content-Type header to "text/javascript".
The application is invoked by adding a reference to the first set and creating the main entry object.
<script src="/app/default.aspx" type="text/javascript"></script>
<script type="text/javascript">
var ax;
// <body onload="OnLoad()">
function OnLoad() {
ax = new MyApp(document.getElementById("axTargetDiv"));
}
</script>
At the end of the first set of scripts (default.aspx) is the following exact code:
function Script(src) {
document.write('<script src="' + src + '" type="text/javascript"></script>');
}
Script("set1.aspx?v=" + Settings.Version);
Which loads the second set of scripts (set1.aspx). This works without any errors in all major browsers (IE6-8, Firefox, Safari, Opera, Chrome).
However, as I've been working on this script for quite some time, I wanted to simplify function calls in a lot of places and mistakenly inlined the above Script function, resulting in the following code:
document.write('<script src="set1.aspx?v=' + Settings.Version + '" type="text/javascript"></script>');
Which, when tested with a test page, now throws the following error in all browsers:
MyApp is not defined.
This happens at the line ax = new MyApp(..., as reported by both the Visual Studio JS debugger and Firebug.
I've tried the various methods in the first 4 answers posted to this question, to no avail. The only thing that lets MyApp load successfully is putting the actual "add script" code (i.e. the document.write('script') line) inside a function.
If I put the document.write line inside a function, it works; otherwise, it doesn't. What's happening?
Splitting and/or escaping the script text does not work.
To see the problem, look at that top line in its script element:
<script type="text/javascript">
document.write('<script src="set1.aspx?v=1234" type="text/javascript"></script>');
</script>
So an HTML parser comes along and sees the opening <script> tag. Inside <script>, normal <tag> parsing is disabled (in SGML terms, the element has CDATA content). To find where the script block ends, the HTML parser looks for the matching close-tag </script>.
The first one it finds is the one inside the string literal. An HTML parser can't know that it's inside a string literal, because HTML parsers don't know anything about JavaScript syntax; they only know about CDATA. So what you are actually saying is:
<script type="text/javascript">
document.write('<script src="set1.aspx?v=1234" type="text/javascript">
</script>
That is, an unclosed string literal and an unfinished function call. These result in JavaScript errors and the desired script tag is never written.
A common attempt to solve the problem is:
document.write('...</scr' + 'ipt>');
This is still technically wrong (and won't validate). This is because in SGML, the character sequence that ends a CDATA element is not actually ‘</tagname>’ but just ‘</’ — a sequence that is still present in the line above. Browsers generally are more forgiving and in practice will allow it.
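To make the distinction concrete, here is a hypothetical check (violatesSgmlCdataRule is an illustrative name, not a real validator API). It follows the HTML 4.01 notes, which phrase the rule slightly more precisely: script data ends at the first ‘</’ followed by a letter. The split form still trips it, while the escaped forms do not.

```javascript
// Hypothetical check modelled on the HTML 4.01 rule: CDATA script content ends at
// the first "</" followed by a letter, regardless of which tag name follows.
function violatesSgmlCdataRule(scriptSource) {
  return /<\/[A-Za-z]/.test(scriptSource);
}

// Script source text exactly as it would appear between <script> and </script>.
const splitForm = "document.write('...</scr' + 'ipt>');";
const slashEscaped = "document.write('...<\\/script>');";
const hexEscaped = "document.write('\\x3C/script\\x3E');";

console.log(violatesSgmlCdataRule(splitForm));    // true: "</s" is still present
console.log(violatesSgmlCdataRule(slashEscaped)); // false
console.log(violatesSgmlCdataRule(hexEscaped));   // false
```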
Probably the best solution is to escape the sequence. There are a few possibilities, but the simplest is to use JavaScript string literal escapes ('\xNN'):
document.write('\x3Cscript src="set1.aspx?v=1234\x26w=5678" type="text/javascript"\x3E\x3C/script\x3E');
The above escapes all ‘<’, ‘>’ and ‘&’ characters, which not only stops the ‘</’ sequence appearing in the string, but also allows it to be inserted into an XHTML script block without causing errors.
(In XHTML, there's no such thing as a CDATA element, so these characters would have the same meaning as if included in normal content, and a string '<script>' inside a script block would actually create a nested script element! It's possible to allow <>& in an XHTML script block by using a <![CDATA[ section, but it's a bit ugly and usually better to avoid using those characters in inline script.)
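The \xNN escaping can be automated. A sketch, assuming the literal is built in JavaScript (toHexEscapedLiteral is a hypothetical helper): it escapes ‘<’, ‘>’ and ‘&’ as in the example above, plus backslash, single quote and line breaks so the result is always a well-formed single-quoted literal.

```javascript
// Hypothetical helper: turn markup into a single-quoted JavaScript string literal
// whose source text contains no '<', '>' or '&'. Backslash, single quote and
// line breaks are also escaped so the literal stays well-formed.
function toHexEscapedLiteral(markup) {
  const escaped = markup.replace(/[<>&\\'\r\n]/g, (ch) =>
    "\\x" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  );
  return "'" + escaped + "'";
}

const markup = '<script src="set1.aspx?v=1234&w=5678" type="text/javascript"></script>';
const hexLiteral = toHexEscapedLiteral(markup);
console.log("document.write(" + hexLiteral + ");");
// document.write('\x3Cscript src="set1.aspx?v=1234\x26w=5678" type="text/javascript"\x3E\x3C/script\x3E');
```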
1) Make sure that you do not try to reference MyApp before the script is "actually" included in your page.
2) Try breaking the word "script" in your inline loader like this:
<script type="text/javascript">
document.write('<scr' + 'ipt src="set1.aspx?v=1234" type="text/javascript"></scr' + 'ipt>');
</script>
Alternatively, use this syntax, which I borrowed from the Google Analytics code and have been able to use successfully:
<script type="text/javascript">
document.write(unescape("%3Cscript src='set1.aspx?v=1234' type='text/javascript'%3E%3C/script%3E"));
</script>
You could also try:
var script = document.createElement("script");
script.src = "set1.aspx?v=1234";
script.type = "text/javascript";
document.getElementsByTagName("head")[0].appendChild(script);
Steve
If you can use jQuery, you could use the following:
$.getScript("set1.aspx?v=1234");
This loads the script into the global JavaScript context.
Make sure you set the Content-Type of the response to "text/javascript".
Hope this helps...

How do I add a pre tag inside a code tag with jQuery?

I'm trying to use jQuery to format code blocks, specifically to add a <pre> tag inside the <code> tag:
$(document).ready(function() {
$("code").wrapInner("<pre></pre>");
});
Firefox applies the formatting correctly, but IE puts the entire code block on one line. If I add an alert:
alert($("code").html());
I see that IE has inserted some additional text into the pre tag:
<PRE jQuery1218834632572="null">
If I reload the page, the number following jQuery changes.
If I use wrap() instead of wrapInner(), to wrap the <pre> outside the <code> tag, both IE and Firefox handle it correctly. But shouldn't <pre> work inside <code> as well?
I'd prefer to use wrapInner() because I can then add a CSS class to the <pre> tag to handle all formatting, but if I use wrap(), I have to put page formatting CSS in the <pre> tag and text/font formatting in the <code> tag, or Firefox and IE both choke. Not a huge deal, but I'd like to keep it as simple as possible.
Has anyone else encountered this? Am I missing something?
That's the difference between block and inline elements. pre is a block level element. It's not legal to put it inside a code tag, which can only contain inline content.
Because browsers have to support whatever godawful tag soup they might find on the real web, Firefox tries to do what you mean. IE happens to handle it differently, which is fine by the spec; behavior in that case is unspecified, because it should never happen.
Could you instead replace the code element with the pre? (Because of the block/inline issue, technically that should only work if the elements are inside an element with "flow" content, but the browsers might do what you want anyway.)
Why is it a code element in the first place, if you want pre's behavior?
You could also give the code element pre's whitespace preserving power with the CSS white-space: pre, but apparently IE 6 only honors that in Strict Mode.
Btw, I don't know if it is related, but pre tags inside code tags will not validate in strict mode.
Are you using the latest jQuery?
What if you try
$("code").wrapInner(document.createElement("pre"));
Is it any better, or do you get the same result?
As markpasc stated, a PRE element inside CODE element is not allowed in HTML. The best solution is to change your HTML code to use <pre><code> (which means a preformatted block that contains code) directly in your HTML for code blocks.
You could use html() to wrap it:
$('code').each(function(i,e)
{
var self = $(e);
self.html('<pre>' + self.html() + '</pre>');
});
As mentioned above, you'd be better off changing your html. But this solution should work.
