Converting Chinese UTF8 to readable string

I'm trying to convert Chinese characters in an XML file to a readable Chinese string using JavaScript, but I'm not sure how. I have checked other SO posts and tried the following:
unescape(encodeURIComponent('丘'))
but I still can't get it to work. Could someone help?
<utf8>丘</utf8>

Neither unescape nor encodeURIComponent (which deal with percent-encoding) will help you with an XML character reference. You just want to parse the XML file! Accessing the DOM will then yield the expected string.
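A minimal sketch of what parsing gives you, assuming the file stores the character as a numeric character reference such as &#19992; (the sample markup and both function names are illustrative, not taken from the question). In a browser, DOMParser resolves the reference. The regular-expression fallback is only there so the sketch also runs where DOMParser is unavailable, for example in Node.js.

```javascript
// Decode numeric character references such as &#19992; or &#x4E18;.
// Hypothetical helper, used only when no XML parser is available.
function decodeNumericReferences(text) {
  return text.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, (match, ref) => {
    const codePoint = ref[0] === 'x'
      ? parseInt(ref.slice(1), 16)
      : parseInt(ref, 10);
    return String.fromCodePoint(codePoint);
  });
}

// Return the text content of a small XML document.
function readUtf8Element(xml) {
  if (typeof DOMParser !== 'undefined') {
    // Browser path: the XML parser resolves character references itself.
    const doc = new DOMParser().parseFromString(xml, 'application/xml');
    return doc.documentElement.textContent;
  }
  // Fallback path: crude tag stripping, good enough for this one-element sample.
  return decodeNumericReferences(xml.replace(/<[^>]+>/g, ''));
}

console.log(readUtf8Element('<utf8>&#19992;</utf8>')); // prints 丘
```

No call to unescape or encodeURIComponent is involved: the reference is resolved by the parser, and the DOM already holds the decoded string.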

Related

Problem with displaying Japanese text strings using JavaScript code in Pavlovia

I'm trying to translate text in a JavaScript file, which was created in English, into Japanese. When I just change the string to Japanese characters using copy & paste, the text in Pavlovia is displayed as '?????'. From what I could gather from other posts, this has something to do with the fact that Japanese characters are encoded in UTF-8, which is not the standard encoding used in my JavaScript file. I have tried saving the whole file with UTF-8 encoding, but it seemingly cannot be read by Pavlovia. So I assume I need some local UTF-8 encoding of the individual Japanese text bits.
Unfortunately I have zero JavaScript experience, so any help with this would be greatly appreciated.

JavaScript - HTML string erring with unexpected EOF

I have an HTML string that is stored in a database. When I set the value of a JavaScript variable to this string on the front end via my templating engine (Leaf), it is stored escaped as:
var string = <p>It's a round about way.</p> <p><!-- pagebreak -->But Maybe this is the way?</p>;
I'm trying to set this value as the content value for TinyMCE, but JavaScript produces an Unexpected EOF error when reading this string and points to a & character, which I presume is the first character of the new line. I tried on the back-end to replace occurrences of string \r\n with a so it would play nicer with JavaScript but the changes didn't seem to take. I tried encoding/decoding the string but that didn't help. Perhaps someone can help shed some light on this seemingly trivial task?
Thanks in advance.

JavaScript was rendering the \r\n characters found in the string instead of escaping them. I parsed them out in the server-side code instead of handling it in JS.
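One way to do that kind of escaping, sketched here in JavaScript purely for illustration (the poster's server-side code is not shown, and the helper name toJsStringLiteral is hypothetical): serialize the value as a JSON string, which adds the surrounding quotes and turns carriage returns and line feeds into \r and \n escape sequences.

```javascript
// Turn an arbitrary string into a quoted literal that is safe to paste
// into JavaScript source. JSON.stringify escapes quotes, backslashes,
// and control characters such as \r and \n.
function toJsStringLiteral(value) {
  return JSON.stringify(String(value));
}

// Sample value with a raw CRLF in the middle, similar to the question's HTML.
const html = "<p>It's a round about way.</p>\r\n<p><!-- pagebreak -->But Maybe this is the way?</p>";

// The generated statement stays on a single line, so the parser
// no longer hits an unterminated string.
const statement = 'var string = ' + toJsStringLiteral(html) + ';';
console.log(statement);
```

This sketch does not guard against a literal </script> inside the value, which would still end an inline script block early.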

Working with characters based on their UTF-8 hex codes

I'm working on something that will read a user's text messages and export them to a csv file, which they can then download. The messages are being retrieved from a third-party web interface: I am essentially using JS to grab the HTML of each message and compiling it as needed. The content of each message is added to a variable which, once all messages are gathered, is given to a new Blob, which is then downloaded.
The problem I am having is that, in this web interface, emoji are represented as images, rather than characters. Thus, when writing a message containing an emoji to a file, the result is as so:
"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"
Now, from this image, we can get 2 workable values:
The UTF-8 hex value
F09F9892
and the Unicode codepoint (I may be referring to this wrong, I don't know much about encoding).
U+1f612
Now, what I want to do is take either of these values (whichever works better) and write it to the csv file as the character itself, so that, when viewing the csv file in a text editor or what have you, it would appear as 😒.
Though I have no idea where to even start with this. Maybe it's as simple as throwing some syntax around the character values, but I haven't been able to get anything from google, because I'm not familiar enough with encoding to know what to Google.

I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.
You can then use decodeURIComponent() to decode the percent-encoded string:
decodeURIComponent('%F0%9F%98%92')
Combine that with jQuery to access the data-textvalue attribute:
decodeURIComponent($(element).data('textvalue'))
I created a simple example on JSFiddle.
For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.
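Both values from the question lead to the same character. The following is a small sketch without jQuery, assuming the attribute value and the class string have already been read from the img element (the two function names are hypothetical):

```javascript
// Route 1: the data-textvalue attribute holds percent-encoded UTF-8 bytes.
function emojiFromTextValue(textValue) {
  return decodeURIComponent(textValue);
}

// Route 2: the class list contains "sprite-" followed by the code point in hex.
function emojiFromSpriteClass(className) {
  const match = /\bsprite-([0-9a-f]+)\b/i.exec(className);
  return match ? String.fromCodePoint(parseInt(match[1], 16)) : '';
}

const fromAttribute = emojiFromTextValue('%F0%9F%98%92');
const fromClass = emojiFromSpriteClass(
  'emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612'
);

console.log(fromAttribute === fromClass); // prints true
```

The attribute route is probably the simpler of the two, because decodeURIComponent does the whole conversion in one call.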

The CSV file format carries no character encoding information, so Excel usually assumes a legacy single-byte encoding rather than UTF-8 unless the file starts with a byte order mark.
https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality
Microsoft Excel mangles Diacritics in .csv files?
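If the file is meant to be opened in Excel, a commonly used workaround is to start it with a UTF-8 byte order mark so that the encoding can be detected. A minimal sketch, assuming the CSV text is assembled as a string before it is handed to Blob (withUtf8Bom is a hypothetical helper name):

```javascript
// Prefix the CSV text with U+FEFF. When the string is encoded as UTF-8,
// that character becomes the three bytes EF BB BF at the start of the file.
function withUtf8Bom(csvText) {
  return '\uFEFF' + csvText;
}

const csv = withUtf8Bom('id,message\n1,"Blah blah blah \u{1F612}"\n');

// Show the leading bytes the file would start with.
const bytes = new TextEncoder().encode(csv);
console.log(Array.from(bytes.slice(0, 3), (b) => b.toString(16))); // prints [ 'ef', 'bb', 'bf' ]

// In the browser the string would then go into the Blob, for example:
// new Blob([csv], { type: 'text/csv;charset=utf-8' })
```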

encoding in server and decode in javascript using escape

I have to encode a string in C# and decode it with the JavaScript unescape function.
The JavaScript unescape is the only option, since I am sending the string in a GET request to an API that uses unescape to decode it.
I tried almost everything:
server.urlencode
WebUtility.HtmlEncode
and a lot of other encodings! I even tried Uri.EscapeDataString using jscript.
Nothing encodes like the "escape" function.
Any idea how to make it work?
EDIT:
this is my code
string apiGetRequest = String.Format("http://212.00.00.00/Klita?name={0}&city={1}&CREATEBY=test ", Uri.EscapeDataString(name), Uri.EscapeDataString(city));
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(apiGetRequest);
req.GetResponse();

Can you give an example of the string you want to encode and the encoded result?
URL encoding is the correct encoding type you need. Make sure you don't double-encode your string somewhere in your code.
You might need to use decodeURIComponent instead of unescape, since unescape is not UTF-8 aware and thus might result in a broken string after decoding.
See http://xkr.us/articles/javascript/encode-compare/ for more information.
EDIT:
I don't know much about ASP, but it looks like you're trying to access the URL not with a browser but with your server-side ASP application. Well, your server does not run any JS code. You will just retrieve the HTML markup and maybe some JS code as a big string. This code would be parsed and executed within a browser, but not within ASP.
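The difference between the two encoding families can be seen with a short experiment. This is only an illustration with a made-up sample string, not the asker's data: escape writes code units above 0xFF as %uXXXX and Latin-1 characters as a single %XX, while encodeURIComponent writes UTF-8 bytes, so each output only round-trips through its own decoder.

```javascript
const sample = '丘 ä';

// Legacy format expected by unescape.
const legacy = escape(sample);
console.log(legacy); // prints %u4E18%20%E4

// UTF-8 percent-encoding, the format that Uri.EscapeDataString also targets.
const utf8 = encodeURIComponent(sample);
console.log(utf8); // prints %E4%B8%98%20%C3%A4

// Each decoder restores only its own format.
console.log(unescape(legacy) === sample);           // prints true
console.log(decodeURIComponent(utf8) === sample);   // prints true
console.log(unescape(utf8) === sample);             // prints false
```

So if the receiving API really calls unescape, the C# side would have to produce the %uXXXX style for non-Latin-1 characters rather than UTF-8 percent-encoding.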

Change encoding from UTF-8 to ISO-8859-2 in Javascript

I would like to change string encoding from UTF-8 to ISO-8859-2 in Javascript. How can I do it?
I need it because I've designed a widget. The user just copies a <script> tag from my site and puts it on theirs. This script creates a div and fills it with the widget contents, including text.
If the target website uses UTF-8 encoding, it works fine. But when it uses ISO-8859-2, the UTF-8 encoded text is displayed as ISO-8859-2, and as a result I see garbage.

Instead of using e.g. "ĉ" in your JavaScript code, use Unicode escapes such as "\u0109".

If you're in control of the output, you can replace all special characters with Unicode escapes (e.g. \u00e4 for ä). The browser can interpret them correctly regardless of document encoding.
The easiest way to do this would be to put the string through a JSON encoder. Both PHP's and Ruby's do that. I don't know about other implementations, though.
Another solution that might work is to add charset="utf-8" to the <script> tag.
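JavaScript's own JSON.stringify leaves non-ASCII characters as they are, so a sketch of the escaping step in JavaScript needs one extra replace. The helper name toAsciiJsLiteral is hypothetical, and the assumption is that the widget script is generated ahead of time rather than typed by hand:

```javascript
// Produce a quoted, ASCII-only string literal: JSON.stringify handles quotes,
// backslashes, and control characters, then every remaining non-ASCII
// UTF-16 code unit is rewritten as a \uXXXX escape.
function toAsciiJsLiteral(text) {
  return JSON.stringify(text).replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const literal = toAsciiJsLiteral('ĉä');
console.log(literal); // prints "\u0109\u00e4"

// The literal contains only ASCII bytes, so the charset of the page
// that serves or embeds the script no longer matters.
console.log(JSON.parse(literal) === 'ĉä'); // prints true
```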

I suppose you just need to convert your widget from UTF-8 to ISO-8859-2 and provide two versions of the script.
