How to prevent & conversion to &amp; when using JavaScript? (browser specific) - javascript

I have a problem concerning string output on an HTML page when using JavaScript and ASP. The page generation logic goes like this:
We use an ASP page to generate HTML code using Response.Write(). If a string contains a numeric character reference (for example, the one for С), it shows on the user's side just fine as a character.
After that we add an OnLoad event, which calls a JavaScript function. All this happens inside the <body></body> tags. The source for the JavaScript is added inside <script></script> tags. The function only sets document.href, which contains a reference to the same ASP page.
The ASP logic loads again and adds some text to the page using Response.BinaryWrite() (Response.Write can be used all the same). All character references are now shown as codes rather than characters. Obviously all '&' symbols become &amp; (ASP automatic conversion), the browser decodes that as &, and we see only the code for С and not the symbol 'С'.
As far as I know such behaviour can be caused by <script> tags, as a precaution against XSS attacks. In the end I want to stop '&' being encoded as &amp;.
However here is the most important part:
If I add a header with "Content-Type" "text\html", IE (any version) starts handling the NCR symbols correctly. But Firefox, Chrome and Safari do not change behavior and keep encoding & as &amp;. I can see several questions on Stack Overflow that look like mine, yet the situation is not exactly the same (my strings are not inserted directly by JavaScript, so I cannot manipulate the output string and change &amp; to &; also, my strings have the correct symbols in the first place and get changed by ASP or by the browser). Is there any elegant way to force Firefox or Chrome to decode the page the way IE does? Maybe some settings or attributes in HTML tags? This problem looks browser-dependent to me, am I right?

Related

window.open UTF-8 Issue

I have this site:
http://a.b/x – y
where the dash is non-ASCII \u2013 or %E2%80%93 in UTF-8 speak.
The following link with UTF-8 works fine:
True Link
but scripting it with window.open() with the exact same URL gives a 404:
Raw JS Link
Viewing properties on the error page to see the resulting URL I note the extended dash is replaced with:
â??
If I replace the extended dash, and only the extended dash with "\u2013" the link works fine:
Modified JS Link
and the resulting URL seems to have re-encoded the extended dash back to UTF-8.
With this in mind I tried to decode the UTF-8 encoding and re-encode just the space but this failed with the same error as before:
Raw JS Link
I suspect that window.open() is mangling the URL for some reason.
I then went on to try a bunch of different ideas and combinations of decode / encode and even dragged escape()/unescape() back into use, but to no avail.
The reason for window.open is that I am limited to controlling just the content of the HREF attribute. In this case it's an SSRS expression in a "Go to URL" Action, where SSRS UTF-8 encodes certain characters, so that even with the split(' ') above I actually have to use split(String.fromCharCode(32)).
However, I've stripped everything out into a simple HTML page, which is where I am doing my analysis.
PS: IE8, though user base is IE8+
PPS: Added missing quote.
PPPS: It looks like this might be an IE8 specific issue.
<a href="javascript:void(window.open('http://a.b/...component...
So here you've got multiple nested escaping contexts. You're injecting text into:
a component of a URL (needs URL-escaping), inside
a JavaScript string literal (needs JS-escaping), inside
a javascript: pseudo-URL (needs URL-escaping), inside
an HTML attribute value (needs HTML-escaping)
So the value x – y has to be escaped four times:
URL-escape to x%20%E2%80%93%20y
JS-escape to x%20%E2%80%93%20y (no changes this time as there are no JS-special characters in this value)
URL-escape to x%2520%25E2%2580%2593%2520y
HTML-escape to x%2520%25E2%2580%2593%2520y (no changes this time as there are no HTML-special characters in this value).
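The four steps above can be traced in JavaScript. This is only a sketch under assumptions: `escapeJsString`, `escapeHtmlAttr` and `escapeStages` are hypothetical helper names that do not appear in the original answer, and both escapers are deliberately minimal (they cover only the characters that can matter after the first URL-escaping step).

```javascript
// Hypothetical helper: minimal escaping for the body of a JS string literal.
// Enough here because the input is already URL-escaped (no control characters).
function escapeJsString(s) {
  return s.replace(/\\/g, '\\\\').replace(/'/g, "\\'").replace(/"/g, '\\"');
}

// Hypothetical helper: minimal escaping for an HTML attribute value.
function escapeHtmlAttr(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Apply the four nested escapings, innermost context first.
function escapeStages(value) {
  const urlComponent = encodeURIComponent(value);   // 1. component of a URL
  const jsString = escapeJsString(urlComponent);    // 2. inside a JS string literal
  const jsUrl = encodeURIComponent(jsString);       // 3. inside a javascript: pseudo-URL
  const htmlAttr = escapeHtmlAttr(jsUrl);           // 4. inside an HTML attribute value
  return { urlComponent, jsString, jsUrl, htmlAttr };
}

const stages = escapeStages('x \u2013 y');
console.log(stages.urlComponent); // x%20%E2%80%93%20y
console.log(stages.htmlAttr);     // x%2520%25E2%2580%2593%2520y
```

Steps 2 and 4 leave this particular value unchanged, as the list says, but they would matter for a value containing quotes, backslashes or ampersands.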
Nested syntaxes needing escaping are very, very difficult to get right. And generally you should never use javascript: URLs: as well as being a nightmare of multiple-escaping, they're also pretty bad for usability and accessibility.
Avoid injecting into nested code. A better pattern for links that open in a new window (if you absolutely must) is to put the real URL in the href, so it responds correctly to middle-click and other link affordances, and then read that href from JS, eg.:
<a href="http://a.b/x%20%E2%80%93%20y" onclick="window.open(this.href, ...options...); return false;"
(The return-false prevents the link being followed after the window is opened.) Also consider breaking the JS code out into a separate script that binds to all appropriate links automatically (eg by class attribute) so you don't have to have inline JavaScript in your HTML.
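The "separate script" idea could look roughly like the sketch below. The class name `popup-link` and the function `bindPopupLinks` are assumptions made for illustration; the opener is passed in as a callback so the window options stay in one place.

```javascript
// Bind a click handler to every link marked with the (hypothetical) class
// "popup-link". The real URL stays in href, so middle-click, copy-link and
// no-JS fallbacks keep working.
function bindPopupLinks(root, openWindow) {
  const links = root.querySelectorAll('a.popup-link');
  for (let i = 0; i < links.length; i++) {
    const link = links[i];
    link.addEventListener('click', function (event) {
      event.preventDefault();   // same effect as "return false" in the inline handler
      openWindow(link.href);    // the href is already correctly escaped in the HTML
    });
  }
  return links.length;          // number of links that were bound
}

// In a browser, once the DOM is ready, this might be called as:
//   bindPopupLinks(document, function (url) { window.open(url, '_blank'); });
```

Note that `addEventListener` is not available in IE8, which uses `attachEvent`; a page that has to support IE8 would need a fallback or a library for the binding.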
The single quotes were misplaced in your last example. Also, there's no need for .split(' ').join('%20'), as it will create errors.
Raw JS Link
demo
http://jsfiddle.net/bf2703ah/1/

Linking to File With Space in JQuery

How would you link to a file that contains a space? Is it possible? I have a javascript document and already have dozens of images that contain spaces but I was hoping to be able to still link to them.
%20 is the escaped value for a blank space. Use that in a hyperlink, and you'll get the file you want :)
In case you test it in a browser: modern browsers (Chrome for sure) no longer visually change the space to %20 in the address bar, but they still escape all characters before making a web request.
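Rather than typing %20 by hand, the built-in `encodeURI` can do the escaping. A minimal sketch follows; the file path is made up for illustration and `toLinkableUrl` is a hypothetical wrapper name.

```javascript
// encodeURI escapes spaces (and other unsafe characters) but leaves URL
// structure such as "/", "?" and "#" intact.
function toLinkableUrl(path) {
  return encodeURI(path);
}

console.log(toLinkableUrl('images/my summer photo.jpg'));
// images/my%20summer%20photo.jpg
```

For a single file name that may itself contain characters like "/" or "?", `encodeURIComponent` on just that part is the stricter choice.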
Edit
Generally speaking, you'd like to html encode your strings via an accessible method, rather than manually escaping the needed characters.
The following SO question has a very elegant solution. If you use it with an element that is not visible to the user (or not even part of the DOM, as is the case with the linked answer), they won't even know.

Custom mark up language breaks html

I am using document.write to output HTML to the browser ( I plan to change to .innerHTML soon).
When using view source I can only see the markup, I can not see the HTML output. However I verified visually that rows 1 and 2 of 0 through 6 are completely missing and commented as such below.
When I inspect the mark up below I see that these two rows have many special characters which leads me to believe this might be the problem.
Note:
Each row is divided by a || and each field is divided by a |. The markup language is properly escaped, as you can see there are no superfluous | or ||.
Actually I just noticed the tag is being cropped for some reason:
https://www.google.com/#hl=en&sclient=psy-ab&q=new+york+city+venture+capitalists&pbx=1&oq=new+york+city+venture+capitalists&aq=f&aqi=&aql=&gs_sm=12&gs_upl=0l0l0l98460l0l0l0l0l0l0l0l0ll0l0&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=94def8e69f73d3d7&biw=1214&bih=852
becomes
<a class=\'bookmark_tweet\' target=\'_blank\' href=\'https://www.google.com/#hl=en&sclient=psy-ab&q=new+york+city+venture+cap
I'll post relevant code once I get it:
View source shows you what was received from the server. If you add to it using document.write() you won't see that unless you use a DOM inspector in your browser, such as Firebug (Firefox). I know there is one for IE, but I never use IE so I don't know what it's called.
Javascript strings don't span lines. You can't open a quote on one line, then close it on another.
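A long piece of markup can still be built across several source lines by concatenating shorter literals. The sketch below assumes a helper name (`buildBookmarkLink`) and link text (`bookmark`) that are not in the question, and it is not known whether an unterminated string is what actually crops the link there.

```javascript
// A quoted string literal must end on the line where it starts, so a long
// tag is split into several literals joined with "+".
function buildBookmarkLink(url) {
  // Escape the characters that are special inside a double-quoted attribute.
  const safeUrl = url.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return '<a class="bookmark_tweet" target="_blank" ' +
         'href="' + safeUrl + '">' +
         'bookmark</a>';
}

console.log(buildBookmarkLink('https://www.google.com/#hl=en&q=test'));
// <a class="bookmark_tweet" target="_blank" href="https://www.google.com/#hl=en&amp;q=test">bookmark</a>
```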

How can I replace newline characters using javascript in IE8?

I've searched Stack Overflow for hours and hours, and nobody's solution works in Internet Explorer 8.
I am provided with a plaintext document like this:
This is a legal agreement ("Agreement") between you and ...
License
Subject to you continued and ongoing compliance with the terms and conditions set ...
Restrictions
Except as otherwise explicitly provided in this Agreement, ...
Ownership
Except for this license granted to you, as between you and ...
Disclaimer of Warranties
Use at your own risk. ...
And I need to replace the newline characters (linebreaks, carriage returns, whatever you want to call them) with double linebreaks (<br/><br/>) to make the text look more normal.
The nl2br function from jQuery convert line breaks to br (nl2br equivalent) works fine in most browsers. However, a client of mine uses IE8.
Go ahead and try the nl2br function using IE8 (or a modern Internet Explorer set to IE8 mode); it doesn't work.
Does anyone know why it doesn't work in IE8? And how to accomplish the goal?
P.S. I put some code here http://jsfiddle.net/L2Ufj/2/ and oddly enough it works in IE8 via jsfiddle, but if you copy it to somewhere else and run it for real, it won't work in IE8.
One way to get round this in IE8 is to convert the line-breaks into a 'token' that IE8 will recognise before it is rendered on the page. Then once it's rendered, in a success handler for example you can search for that token and replace with <br> or whatever you wish:
e.g.
Pre render (I've used the escaped text &lt;br&gt; as my token but you can use anything)
textToEdit = textToEdit.replace(/\n/g, '&lt;br&gt;');
Post render (Search for your token and replace with <br> or whatever you wish)
renderedTextWrapper.innerHTML = renderedTextWrapper.innerHTML.replace(/&lt;br&gt;/g, '<br>');
When you retrieve an element's innerHTML, IE converts it to a "standard format" (collapsing multiple spaces into one, removing line breaks, etc.) before giving you the result.
Thus, you cannot find any line break characters in the innerHTML you get from IE. That is bad news.
I think the most feasible and easy approach is to store your text inside a <textarea> tag instead of a normal <div>. IE will leave a <textarea> alone when you get its value instead of innerHTML:
originalText=document.getElementById('EULA_content').value
Of course, when you get the newText, you should append it to another div element.
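Put together, the textarea approach might look like the sketch below. `doubleLineBreaks` is a hypothetical name, and the regular expression also covers `\r\n` and bare `\r`, which is an assumption about what the source text may contain rather than something stated above.

```javascript
// Replace every line break (Windows, old Mac or Unix style) with two <br/> tags.
// "\r\n" is listed first so a Windows line break counts as one break, not two.
function doubleLineBreaks(text) {
  return text.replace(/\r\n|\r|\n/g, '<br/><br/>');
}

console.log(doubleLineBreaks('License\r\nSubject to ...\nRestrictions'));
// License<br/><br/>Subject to ...<br/><br/>Restrictions

// In the page itself (the element id "EULA_output" is made up for illustration):
//   var originalText = document.getElementById('EULA_content').value;
//   document.getElementById('EULA_output').innerHTML = doubleLineBreaks(originalText);
```

If the text can contain characters such as < or &, it should be HTML-escaped before the line breaks are converted and the result is assigned to innerHTML.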

innerHTML alternative for retrieving contents of page?

I'm currently using innerHTML to retrieve the contents of an HTML element and I've discovered that in some browsers it doesn't return exactly what is in the source.
For example, using innerHTML in Firefox on the following line:
<div id="test"><strong>Bold text</strong></strong></div>
Will return:
<strong>Bold text</strong>
In IE, it returns the original string, with two closing strong tags. I'm assuming in most cases it's not a problem (and may be a benefit) that Firefox cleans up the incorrect code. However, for what I'm trying to accomplish, I need the exact code as it appears in the original HTML source.
Is this at all possible? Is there another JavaScript function I can use?
I don't think you can receive incorrect HTML code in modern browsers. And that is the right behaviour, because you don't have the source of dynamically generated HTML. For example, Firefox's innerHTML returns part of the DOM tree represented as a string, not the HTML source. And this is not a problem, because the second </strong> tag is ignored by the browser anyway.
innerHTML is generated not from the actual source of the document (i.e. the HTML file) but is derived from the DOM object that is rendered by the browser. So if IE somehow shows you incorrect HTML code then it's probably some kind of bug. There is no method to retrieve the invalid HTML code in every browser.
You can't in general get the original invalid HTML for the reasons Ivan and Andris said.
IE is also “fixing” your code just like Firefox does, albeit in a way you don't notice on serialisation, by creating an Element node with the tagName /strong to correspond to the bogus end-tag. There is no guarantee at all that IE will happen to preserve other invalid markup structures through a parse/serialise cycle.
In fact, even for valid code the output of innerHTML won't be exactly the same as the input. Attribute order isn't maintained, tagName case isn't maintained (IE gives you <STRONG>), whitespace in various places is lost, entity references aren't maintained, and so on. If you “need the exact code”, you will have to keep a copy of the exact code, for example in a JavaScript variable in a <script> block written after the content in question.
If you don't need the HTML to render (e.g., you're going to use it as a JS template or something) you can put it in a textarea and retrieve the contents from its value (reading innerHTML instead gives you the text with < and > escaped as entities).
<textarea id="myTemplate"><div id="test"><strong>Bold text</strong></strong></div></textarea>
And then:
$('#myTemplate').val() === '<div id="test"><strong>Bold text</strong></strong></div>'
Other than that, the browser gets to decide how to interpret the HTML and it will only return you its interpretation, not the original.
innerText? Or does that have the same effect?
You must use innerXML property. It does exactly what you want to achieve.
