JavaScript: output symbols and special characters


I am trying to include some symbols into a div using JavaScript.
It should look like this: x ∈ ℝ, but all I get is: x &isin; &reals;.
var div=document.getElementById("text");
var textnode = document.createTextNode("x &isin; &reals;");
div.appendChild(textnode);
<div id="text"></div>
I tried document.getElementById("something").innerHTML="x &isin; &reals;" and it worked, so I have no clue why the createTextNode method did not.
What should I do in order to output the right thing?

You are including HTML escapes ("entities") in what needs to be text. According to the docs for createTextNode:
data is a string containing the data to be put in the text node
That's it. It's the data to be put in the text node. The DOM spec is just as clear:
Creates a Text node given the specified string.
You want to include Unicode in this string. To include Unicode in a JavaScript string, use Unicode escapes, in the format \uXXXX.
var textnode = document.createTextNode("x \u2208 \u211D");
Or, you could simply include the actual Unicode character and avoid all the trouble:
var textnode = document.createTextNode("x ∈ ℝ");
In this case, just make sure that the JS file is served as UTF-8, you are saving the file as UTF-8, etc.
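A quick way to see the difference between escapes, literal characters, and entities is to compare the strings themselves in plain JavaScript (this runs in Node or a browser console; no DOM is involved). The variable names are only illustrative:

```javascript
// Three ways of writing the text from the question as a JavaScript string.
const withEscapes = "x \u2208 \u211D";   // Unicode escapes: ∈ is U+2208, ℝ is U+211D
const withLiterals = "x ∈ ℝ";            // characters typed directly (file must be saved and served as UTF-8)
const withEntities = "x &isin; &reals;"; // HTML entity syntax: to JavaScript these are just ASCII characters

console.log(withEscapes === withLiterals); // true: both are the same five-character string
console.log(withEntities.length);          // 16: "&isin;" and "&reals;" are not decoded by JavaScript
```

Only the first two forms put the actual symbols into the string that createTextNode receives.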
The reason that setting .innerHTML works with HTML entities is that it sets the content as HTML, meaning it interprets it as HTML, in all regards, including markup, special entities, etc. It may be easier to understand this if you consider the difference between the following:
document.createTextNode("<div>foo</div>");
document.createElement("div").textContent = "<div>foo</div>";
document.createElement("div").innerHTML = "<div>foo</div>";
The first creates a text node with the literal characters "<div>foo</div>". The second sets the content of the new element literally to "<div>foo</div>". The third, on the other hand, creates an actual div element inside the new element containing the text "foo".

Every character has a hexadecimal code (for example 0211D for ℝ). If you want to turn it into an HTML entity, put &#x in front of it and a semicolon after it => &#x211D;, or use the entity name &reals; or the decimal reference &#8477;, all of which can be found here: http://www.w3schools.com/charsets/ref_html_entities_4.asp
But when you use JavaScript, the engine does not interpret HTML entities inside a string, so to output the Unicode symbol you need a JavaScript escape sequence instead. To do that, add \u before the hexadecimal code => \u211D (without a trailing semicolon).
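To see these notations side by side, here is a small sketch that derives them from a code point. notations is a hypothetical helper name; String.fromCodePoint and padStart are standard string methods:

```javascript
// Hypothetical helper: show one code point in the HTML and JavaScript notations.
function notations(codePoint) {
  const hex = codePoint.toString(16).toUpperCase();
  return {
    character: String.fromCodePoint(codePoint),
    htmlHex: `&#x${hex};`,          // HTML hexadecimal reference (needs the semicolon)
    htmlDecimal: `&#${codePoint};`, // HTML decimal reference
    // JavaScript escape (no semicolon); code points above U+FFFF need the \u{...} form.
    jsEscape: codePoint <= 0xffff ? `\\u${hex.padStart(4, "0")}` : `\\u{${hex}}`,
  };
}

console.log(notations(0x211d));
// { character: 'ℝ', htmlHex: '&#x211D;', htmlDecimal: '&#8477;', jsEscape: '\\u211D' }
```

The htmlHex and htmlDecimal strings only mean something to an HTML parser (for example via innerHTML); jsEscape is what you write inside a JavaScript string literal.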

document.createTextNode will automatically html-escape the needed characters. You have to provide those texts as JavaScript strings, either escaped or not:
document.body.appendChild(document.createTextNode("x ∈ ℝ"));
document.body.appendChild(document.createElement("br"));
document.body.appendChild(document.createTextNode("x \u2208 \u211d"));
EDIT: It's not true that the createTextNode function does actual HTML escaping here, as it doesn't need to. @deceze gave a very good explanation of the connection between the DOM and HTML: HTML is a textual representation of the DOM, so you don't need any HTML-related escaping when directly manipulating the DOM.

Related

unescape in javascript not working when %26 ( & sign) is in value

I have the code below in my JSP. The UI displays every character correctly except "&".
<c:out value="<script>var escapedData=unescape('${column}');
$('div').html(escapedData);</script>" escapeXml="false" /> </div>
E.g. 1) working case
input = ni!er#
The value of my escapedData variable is ni%21er%40. When I put it in my div using
$('div').html(escapedData); the output in the HTML is as expected.
E.g. 2) Issue case
input = nice&
The value of my escapedData variable is nice%26. When I put it in my div using
$('div').html(escapedData); it still shows the following:
$('#test20').html('nice%26');
However, when output is displayed in JSP, it just prints "nice". It truncates everything after &.
Any suggestions?

It looks like you have some misunderstandings about what unescape(val)/escape(val) do and where you need them, and about what you need to pay attention to when you use .html().
HTML and URIs have certain characters that have special meanings. The most important ones are:
HTML: <, >, &
URI: /, ?, %, &
If you want to use one of those characters in HTML or in a URI, you need to escape them.
The escaping for URIs and for HTML is different.
The functions unescape/escape (deprecated) and decodeURI/encodeURI are for URIs. But what you want is to escape your data into the HTML format.
There is no built-in function in JS that does this, but you could, e.g., use the code from the answer to the question Can I escape html special chars in javascript?.
But as it seems that you use jQuery, you could just use .text instead of .html, as this will do the escaping for you.
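Because JavaScript has no built-in HTML escaper, here is a minimal sketch of one (escapeHtml is a hypothetical name, not the code from the linked answer), next to decodeURIComponent, the non-deprecated standard function for URI decoding, to show that the two are separate steps for separate formats:

```javascript
// Minimal HTML escaper (hypothetical helper, not a built-in). "&" is replaced first
// so the ampersands introduced by the later replacements are not escaped twice.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const fromUri = decodeURIComponent("nice%26"); // URI format -> plain text: "nice&"
const forHtml = escapeHtml(fromUri);           // plain text -> HTML format: "nice&amp;"
console.log(fromUri, forHtml);                 // nice& nice&amp;
```

With jQuery, $('div').text(fromUri) achieves the same effect without a hand-written escaper, because the text is never parsed as HTML.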
An additional note:
I'm pretty sure that the var escapedData=unescape('${column}'); does not do anything. I assume that ${column} already is ni!er#/nice&.
So please check your generated source code. If var escapedData=unescape('${column}'); ends up looking like var escapedData=unescape('ni!er#');, then you should remove the unescape; otherwise you would not get the expected result if ${column} contains something like %23.
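The risk described in this note can be reproduced directly. The value of column below is made up: plain text that already contains a literal "%23". unescape is deprecated and appears here only because the question uses it:

```javascript
// Hypothetical value of ${column}: already plain text, where "%23" is meant literally.
const column = "Ticket%231";

// Unescaping data that was never escaped silently changes it.
const once = unescape(column);

console.log(once);            // Ticket#1
console.log(column === once); // false: the literal "%23" was turned into "#"
```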

Special characters not displaying correctly in a Javascript string

I have a function that assigns a string containing special characters to a variable, then passes that variable to a DOM element via the innerHTML property, but it prints strange characters. Let's say I code this...
someText = "äêíøù";
document.getElementById("someElement").innerHTML = someText;
It prints the following text...
Ã¤ÃªÃÃ¸Ã¹
I know how to use the entity names to prevent this, but when I use them to pass the value through a Javascript method, they print literally.

This means that you have a conflict of encodings. Your JavaScript and your HTML are being served to the browser with different encodings/character sets. Ensure that they're encoded in and served with the same encoding/character set (UTF-8 is a good choice) to make sure that characters are correctly interpreted.
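The mismatch can be reproduced in plain JavaScript by encoding the text as UTF-8 and then reading the bytes back one byte per character, which is roughly what happens when UTF-8 content is interpreted as ISO-8859-1 or windows-1252 (the two agree for every byte involved here). This is a simulation for illustration, not what the browser literally runs:

```javascript
// "äêíøù", written with escapes so the encoding of this source file cannot interfere.
const original = "\u00E4\u00EA\u00ED\u00F8\u00F9";

// Encode as UTF-8: each of these characters becomes two bytes (0xC3 plus a second byte).
const utf8Bytes = new TextEncoder().encode(original);

// Simulate the wrong charset: read every byte as one ISO-8859-1 character.
const misread = String.fromCharCode(...utf8Bytes);

// Reading the same bytes as UTF-8 restores the original text.
const correct = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(utf8Bytes.length);     // 10
console.log(correct === original); // true
```

The misread string is the same mojibake as in the question: "Ã¤ÃªÃ", an invisible soft hyphen (U+00AD), then "Ã¸Ã¹".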
Obligatory link: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

How to decode html miscellaneous symbol from its glyph?

I have a div that contains a settings icon that is an HTML miscellaneous symbol:
<span class="settings-icon">&#9881;</span>
I have a Jasmine test that checks the div contents to make sure that it is not changed.
it("the settings div should contain &#9881;", function() {
var settingsIconDiv = $('.settings-icon');
expect(settingsIconDiv.text())
.toContain('&#9881;');
});
It will not pass, as the content is evaluated as its glyph, the gear icon ⚙.
How do I decode the glyph in order to pass the test?

To get the actual character from its Unicode code point, so that you can compare it with the character in the HTML, you can use String.fromCharCode(), e.g.
.toContain(String.fromCharCode(9881))

You should check against the string '⚙' or, if you do not know how to enter it in your code, the escape notation \u2699. There are other, clumsier ways to construct a string containing the character, but simplicity is best.
No matter how the character is written in HTML source code (e.g., as the reference &#9881;), it appears in the DOM as the character itself, U+2699. In JavaScript, a string like '&#9881;' is just a sequence of seven ASCII characters (though you can pass it to a function that parses it as an HTML character reference, or you can assign it, e.g., to the innerHTML property, causing HTML parsing, but this is rather pointless and confusing).
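The equivalences described in these answers can be checked directly in JavaScript, with no DOM involved:

```javascript
// The gear character (U+2699, decimal 9881) written three ways in JavaScript.
const viaEscape = "\u2699";
const viaCharCode = String.fromCharCode(9881);
const viaCodePoint = String.fromCodePoint(0x2699);

// An HTML reference is not decoded by JavaScript: it stays seven ASCII characters.
const reference = "&#9881;";

console.log(viaEscape === viaCharCode && viaCharCode === viaCodePoint); // true
console.log(reference.length);                                           // 7
console.log(viaEscape.length);                                           // 1
```

In the Jasmine test this suggests expect(settingsIconDiv.text()).toContain('\u2699'), since the DOM holds the decoded character rather than the reference.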

To match the browser behavior (because you don't know how it is encoded in the HTML or in the text), I would try the following:
.toContain($("<span>&#9881;</span>").text()) instead of .toContain('&#9881;').
That way it should match how it is stored in the DOM.
The String.fromCharCode(9881) approach mentioned by Yuriy Galanter will definitely also work reliably. But because the DOM engine and the JS engine are two different parts that could behave differently, I would test with both techniques.
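To reproduce outside the DOM what the HTML parser does with numeric references like the one in the question, a small decoder is enough. decodeNumericRefs is a hypothetical helper, not a standard API, and it intentionally ignores named references such as &reals;, which would need the full entity table:

```javascript
// Hypothetical helper: decode only numeric references such as "&#9881;" or "&#x2699;".
function decodeNumericRefs(text) {
  return text.replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, (match, ref) => {
    const isHex = ref[0] === "x" || ref[0] === "X";
    const codePoint = isHex ? parseInt(ref.slice(1), 16) : parseInt(ref, 10);
    // String.fromCodePoint throws a RangeError above U+10FFFF, so keep such input unchanged.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericRefs("&#9881;") === "\u2699"); // true
console.log(decodeNumericRefs("x &#x2208; &reals;"));   // x ∈ &reals;
```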

getElementById replace HTML

<script type="text/javascript">
var haystackText = document.getElementById("navigation").innerHTML;
var matchText = 'Subscribe to RSS';
var replacementText = '<ul><li>Some Other Thing Here</li></ul>';
var replaced = haystackText.replace(matchText, replacementText);
document.getElementById("navigation").innerHTML = replaced;
</script>
I'm trying to replace a string of HTML code with something else. I cannot edit the code directly, so I'm using JavaScript to alter it.
If matchText is a regular string, such as just 'Subscribe to RSS', I can replace it fine. However, once I try to replace a string of HTML, the code 'fails'.
Also, what if the HTML I wish to replace contains line breaks? How would I search for that?
<ul><li>\n</li></ul>
??
What should I be using or doing instead of this? Or am I just missing a small step? I did search around here, but maybe my keywords for the search weren't optimal to find a result that fit my situation...
Edit: I should mention that I'm writing this script in the footer of my page, well after the text I wish to replace, so it's not an issue of the script running before the content I want to overwrite appears. :)

Currently you are using String.replace(substring, replacement), which searches for an exact match of the substring and replaces its first occurrence with the replacement, e.g.
"Hello world".replace("world", "Kojichan") => "Hello Kojichan"
The problem with exact matches is that they don't allow anything but exact matches.
To solve the problem, you'll have to start using regular expressions. When using regular expressions you have to be aware of
special characters such as ?, /, and \ that need to be escaped: \?, \/, \\
multiline mode /regexp/m (this makes ^ and $ match at line breaks; . still does not match a newline, so use \s or [\s\S] to span lines)
global matching if you want to replace more than one instance of the expression /regexp/g
quantifiers for allowing multiple instances of white space: \s+ for [1..n] white-space characters and \s* for [0..n] white-space characters.
To use regular expression instead of substring matching you just need to change String.replace("substring", "replacement") to String.replace(/regexp/, "replacement") e.g.
"Hello world".replace(/world/, "Kojichan") => "Hello Kojichan"

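The regular-expression advice in the answer above (escape special characters, allow \s* around the text, use the g flag) can be sketched on a plain string, without the DOM. escapeRegExp and liPattern are hypothetical helper names, and the haystack and the trailing "?" in the match text are made-up data chosen to exercise the escaping:

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Hypothetical helper: match <li>TEXT</li> with any whitespace (including newlines)
// between the tags and the text. The g flag replaces every occurrence, not just the first.
function liPattern(text) {
  return new RegExp("<li>\\s*" + escapeRegExp(text) + "\\s*</li>", "g");
}

// Made-up HTML string with line breaks inside the <li>.
const haystack = "<ul>\n  <li>\n    Subscribe to RSS?\n  </li>\n</ul>";
const replaced = haystack.replace(liPattern("Subscribe to RSS?"), "<li>Some Other Thing Here</li>");

console.log(JSON.stringify(replaced)); // "<ul>\n  <li>Some Other Thing Here</li>\n</ul>"
```

As a later answer warns, this only works if the string you search has exactly the shape you expect; innerHTML may serialize markup differently from the original source.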
From MDN:
Note: If a <div>, <span>, or <noembed> node has a child text node that
includes the characters (&), (<), or (>), innerHTML returns these
characters as &amp;, &lt; and &gt; respectively. Use element.textContent
to get a correct copy of these text nodes' contents.
So since textContent (or innerText) won't get you the HTML, you'd have to modify your search string appropriately, e.g. by writing & as &amp; in the text you search for.

You can use Regular Expressions.

I recommend using a regular expression. Notice that ? and / are special characters in regular expressions. And for global, multi-line matching you need the g and m flags set in the regular expression (note that m only changes how ^ and $ match; it does not let . match a newline).

Regular expression matching of HTML (other than plain text) that comes out of a web page is a bad idea and is troublesome to make work cross-browser (particularly in IE). The HTML that comes out of a web page does not always look the same as what was put in, because some browsers reconstitute the HTML and don't actually store what went in. Attributes can change order, quote marks can change or disappear, entities can change, etc.
If you want to modify whole tags, then you should directly access the DOM and operate on the actual objects in the page.

IE innerHTML chops sentence if the last word contains '&' (ampersand)

I am trying to populate a DOM element with ID 'myElement'. The content which I'm populating is a mix of text and HTML elements.
Assume following is the content I wish to populate in my DOM element.
var x = "<b>Success</b> is a matter of hard work &luck";
I tried using innerHTML as follows,
document.getElementById("myElement").innerHTML=x;
This resulted in chopping off of the last word in my sentence.
Apparently, the problem is due to the '&' character present in the last word. I played around with the '&' and innerHTML and following are my observations.
If the last word of the content is less than 10 characters and if it has a '&' character present in it, innerHTML chops off the sentence at '&'.
This problem does not happen in firefox.
If I use innerText, the last word is intact, but then all the HTML tags that are part of the content become plain text.
I tried populating it through jQuery's .html() method,
$("#myElement").html(x);
This approach solves the problem in IE but not in Chrome.
How can I insert HTML content whose last word contains '&' without it being chopped off in all browsers?
Update: 1. I tried HTML-encoding the content that I am trying to insert into the DOM. When I encode the content, the HTML tags that are part of the content become plain text.
For the above mentioned content, I expect the result to be rendered as,
Success is a matter of hard work &luck
but when I encode what I actually get in the rendered page is,
<b>Success</b> is a matter of hard work &luck

You should replace your & with &amp;.
The & (ampersand) character is used within HTML to represent various special characters. For example, &quot; = ", &lt; = <, etcetera. Now, &luck clearly is not a valid HTML entity (for one, it is missing the semicolon). However, various browsers may, due to a combination of error correction (adding the missing semicolon) and the fact that it looks somewhat like an HTML entity (& followed by four characters), try to parse it as such.
Because &luck; is not a valid HTML entity, the original text is lost. Because of this, when using an ampersand in your HTML, always use &amp;.
Update: When this text is entered by a user, it is up to you to escape this character properly. In PHP for example, you would call htmlentities on the text before displaying it to the user. This has the added benefit of filtering out malicious user code such as <script> tags.

The ampersand is a special character in HTML that indicates the start of a character entity reference or numeric character reference, so you need to escape it like so:
var x = "<b>Success</b> is a matter of hard work &amp;luck";

Try using this instead:
var x = "<b>Success</b> is a matter of hard work &amp;luck";
By HTML encoding the ampersand, you are ensuring that there is no ambiguity in what you mean when you write "&luck".
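The update in the first answer (escape user-supplied text, but keep your own markup) can be sketched with a tagged template literal. html and escapeHtml are hypothetical helpers written for this illustration; the result is the string you would then assign to innerHTML:

```javascript
// Minimal HTML escaper (hypothetical helper, not a built-in); "&" is replaced first.
// Enough for element content and double-quoted attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical template tag: literal parts are trusted markup,
// every interpolated value is escaped as text.
function html(strings, ...values) {
  return strings.reduce((out, part, i) => out + escapeHtml(values[i - 1]) + part);
}

const userText = "hard work &luck"; // e.g. text typed by a user
const x = html`<b>Success</b> is a matter of ${userText}`;
console.log(x); // <b>Success</b> is a matter of hard work &amp;luck
```

The bold tag stays markup, while the user's ampersand arrives as &amp; and is rendered as a literal "&".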

