In an external javascript file I have a function that is used to append text to table cells (within the HTML doc that the javascript file is added to), text that can sometimes have Finnish characters (such as ä). That text is passed as an argument to my function:
content += addTableField(XML, 'Käyttötarkoitus', 'purpose', 255);
The problem is that diacritics such as "ä" get converted to some other bogus characters, such as "�". I see this when viewing the HTML doc in a browser. This is obviously not desirable, and is quite strange as well since the character encoding for the HTML doc is UTF-8.
How can I solve this problem?
Thanks in advance for helping out!
The file that contains content += addTableField(XML, 'Käyttötarkoitus', 'purpose', 255); is not saved in UTF-8 encoding.
I don't know what editor you are using but you can find it in settings or in the save dialog.
Example:
If you can't get this to work you could always write out the literal code points in javascript:
content += addTableField(XML, 'K\u00E4ytt\u00f6tarkoitus', 'purpose', 255);
credit: triplee
To check out the character encoding announced by a server, you can use Firebug (in the Info menu, there’s a command for viewing HTTP headers). Alternatively, you can use online services like Web-Sniffer.
If the headers for the external .js file specify a charset parameter, you need to use that encoding, unless you can change the relevant server settings (perhaps a .htaccess file).
If they lack a charset parameter, you can specify the encoding in the script element, e.g. <script src="foo.js" charset="utf-8">.
The declared encoding should of course match the actual encoding, which you can normally select when you save a file (using “Save As” command if needed).
The character encoding of the HTML file / doc does not matter any external ressource.
You will need to deliver the script file with UTF8 character encoding. If it was saved as such, your server config is bogus.
Related
I'm working on something that will read a user's text messages and export them to a csv file, which they can then download. The messages are being retrieved from a third-party web interface—I am essentially using js to grab the html of each message and compiling it as needed. The content of each message is added to a variable which, once all message are gathered, is given to a new Blob, which is then downloaded.
The problem I am having is that, in this web interface, emoji are represented as images, rather than characters. Thus, when writing a message containing an emoji to a file, the result is as so:
"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"
Now, from this image, we can get 2 workable values:
The UTF-8 hex value
F09F9892
and the Unicode codepoint (I may be referring to this wrong, I don't know much about encoding).
U+1f612
Now, what I want to do is take either of these values (whichever works better), and write it to the csv file as the character itself. So that, when viewing the csv file in a text editor or what have you, it would appear as
Though I have no idea where to even start with this. Maybe it's as simple as throwing some syntax around the character values, but I haven't been able to get anything from google, because I'm not familiar enough with encoding to know what to Google.
I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.
You can then use decodeURIComponent() to decode the percent-encoded string:
decodeURIComponent('%F0%9F%98%92')
Combine that with jQuery to access the data-textvalue-attribute:
decodeURIComponent($(element).data('textvalue'))
I created a simple example on JSFiddle.
For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.
CSV file format does not have character encoding information, so Excel usually assumes ASCII.
https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality
Microsoft Excel mangles Diacritics in .csv files?
I'm trying to use ☰ in external javascript file
$('<div />',{
text: '☰',
......
But I couldn't save the file and its saying:
The document's current encoding can not correctly save all of the characters within the document. You may want to change to UTF-8 or an encoding that supports the special characters in this document.
What should I do?
You should convert the file to UTF-8, and then try pasting the character in, again, after it's converted and saved.
Your file could be in one of many, many formats, depending on your editor, but if you're just using a text-editor like Notepad, it's going to cause you problems with things that don't fit happily into ASCII.
I heard a lot about Baase64 Encoding for Images in Webdesign.
And i saw a lot of developers they use it for thier headlines with: data:image/png;base64,iVBORw0...
Is there any automatism (with javascript) behind?
Or have they all converted & inserted ? (could not belive)
Example: http://obox-inkdrop.tumblr.com/ (- Headlines)
First of all, the encoding has to be done on the server-side, be it :
automated with a script, that reads the original image file, and returns the base64 encoded string to inject it into the HTML that's being generated
or by hand, and directly placed into the HTML.
The base64 encoding cannot be done on the client-side, as the goal is to avoid sending the image file from the server to the browser (to minimize the number of HTTP requests).
Depending of the language that's used on the server-side, you'll probably find some function to do base64 encoding.
In PHP, you might be interested by base64_encode()
I'm dealing with a JSON file that i cannot modify, i have to keep it AS IS.
it contains text, with all the apostrophes converted to ’, and other special chars here and there...
what is that? unicode? how can i convert to the regular apostrophe?
i placed already the META tag utf-8 on the header but it doesn't seem to change anything...
What mime type is your JSON response being sent with? (Look in the headers in FireBug or the Developer Console.) It seems that you one of these steps is using a different encoding:
The JSON string generated by the web server
The mime type encoding sent along with the response
The mime type of your HTML page
The mime type for your JavaScript code
If you supply the community with actual code, or better yet a working reproducible test case, then the community can better help you.
what is that?
It is an attempt to interpret data stored in one character encoding as data stored in a different character encoding.
To ensure everything displays correctly you need to:
Pick an encoding (UTF-8 is a good bet)
Store everything in that encoding
Configure your editor to use it!
Configure your database (if applicable) to use it!
Ensure any server side code you use expects UTF-8 input and gives UTF-8 output
Configure your webserver to include charset=utf-8 on the Content-Type HTTP header
The W3C has a good introductory article on the subject, which has links to lots of useful further reading at the end.
I would like to change string encoding from UTF-8 to ISO-8859-2 in Javascript. How can I do it?
I need it because I've designed a widget. User just copies < script > tag from my site and puts it on his. This script creates div and puts into div widget contents with text.
If target website is in UTF-8 encoding - it works fine. But when it is in ISO-8859-2 than text that is encoded in UTF-8 is displayed on site with ISO-8859-2 and as a result I see trash.
Instead of using e.g. "ĉ" in your JavaScript code, use Unicode escapes such as "\u0109".
If you're in control of the output, you can replace all special characters with unicode escapes (e.g. \u00e4 for ä). The browser can interpret it correctly regardless of document encoding.
The easiest way to do this would be to put the string into a JSON encoder. Both PHP's and Ruby's does that. Don't know about other implementations though.
Another solution that might work is to add charset="utf-8" to the <script> tag.
I suppose you just need to convert your wdiget from UTF-8 to ISO-8859-2 and provide 2 versions of script.