French chars in HTML - javascript

I have an HTML page where I can fill in some text and send it (with JavaScript) to an SQL database.
On my PC everything works fine, but on another one (a French Windows) it doesn't save my characters correctly.
French characters like é, è, â, ... were saved as 'É', or something like that.
I googled a lot but still haven't found a solution, and I can't reproduce the problem on my own PC.

"É" occurs when a character encoded in utf-8 (2 bytes) is read as latin (1 byte). The problem can be on the client side (e.g. by the use of escape) or on the server side (wrong parsing of the form's POST data, database encoding).

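The misreading described above (UTF-8 bytes interpreted one byte per character) can be reproduced in a few lines. This is only a sketch: the function name misreadUtf8AsLatin1 is made up for illustration, and it assumes an environment that provides TextEncoder (current browsers and Node.js do).

```javascript
// Encode text as UTF-8, then deliberately misread every byte as one
// ISO-8859-1 (Latin-1) character, which is what the broken setup does.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = '';
  for (const b of bytes) {
    // ISO-8859-1 maps byte value N straight to code point U+00NN.
    out += String.fromCharCode(b);
  }
  return out;
}

console.log(misreadUtf8AsLatin1('é')); // Ã©
```

A single accented letter turning into two characters that start with 'Ã' is the usual sign that UTF-8 data was read with a single-byte encoding somewhere along the way.
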
Make sure that your HTML page's encoding is explicitly set to a Unicode encoding such as UTF-8. Also make sure that your strings are escaped properly in JavaScript.

You need to encode the file in ANSI. I do this myself. For example, in Notepad2 you would click File->Encoding->ANSI and then save.

Related

Working with characters based on their UTF-8 hex codes

I'm working on something that will read a user's text messages and export them to a CSV file, which they can then download. The messages are retrieved from a third-party web interface: I am essentially using JS to grab the HTML of each message and compile it as needed. The content of each message is added to a variable which, once all messages are gathered, is passed to a new Blob, which is then downloaded.
The problem I am having is that, in this web interface, emoji are represented as images, rather than characters. Thus, when writing a message containing an emoji to a file, the result is as so:
"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"
Now, from this image, we can get 2 workable values:
The UTF-8 hex value
F09F9892
and the Unicode codepoint (I may be referring to this wrong, I don't know much about encoding).
U+1f612
Now, what I want to do is take either of these values (whichever works better) and write it to the CSV file as the character itself, so that, when viewing the CSV file in a text editor or what have you, it would appear as the emoji itself (here 😒) rather than as an <img> tag.
Though I have no idea where to even start with this. Maybe it's as simple as throwing some syntax around the character values, but I haven't been able to get anything from google, because I'm not familiar enough with encoding to know what to Google.
I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.
You can then use decodeURIComponent() to decode the percent-encoded string:
decodeURIComponent('%F0%9F%98%92')
Combine that with jQuery to access the data-textvalue attribute:
decodeURIComponent($(element).data('textvalue'))
I created a simple example on JSFiddle.
For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.
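Both values from the question lead to the same character, with or without jQuery. The sketch below uses only built-in functions; the helper names hexToPercent and emojiFromSpriteClass are hypothetical, and the assumption that the code point always appears in a class of the form sprite-<hex> comes only from the single example in the question.

```javascript
// 1. From the percent-encoded UTF-8 bytes in data-textvalue.
const fromPercent = decodeURIComponent('%F0%9F%98%92');

// 2. From the raw UTF-8 hex string: put a '%' before each byte, then decode.
function hexToPercent(hex) {
  return hex.replace(/../g, '%$&');
}
const fromHex = decodeURIComponent(hexToPercent('F09F9892'));

// 3. From the Unicode code point embedded in the sprite class name.
function emojiFromSpriteClass(className) {
  const match = /\bsprite-([0-9a-f]+)\b/i.exec(className);
  return match ? String.fromCodePoint(parseInt(match[1], 16)) : null;
}
const fromClass = emojiFromSpriteClass(
  'emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612'
);

console.log(fromPercent === fromHex && fromHex === fromClass); // true
```

The data-textvalue route is probably the more robust of the two, since it would also cover emoji made of several code points, which a single hex number in a class name cannot express.
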
The CSV file format does not carry character encoding information, so Excel usually assumes a legacy single-byte encoding (the system's ANSI code page) rather than UTF-8.
https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality
Microsoft Excel mangles Diacritics in .csv files?
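A commonly suggested workaround for the Excel issue is to start the file with a UTF-8 byte order mark (BOM). The sketch below is an assumption-laden illustration, not a guaranteed fix: the helper names toCsvBytes and quoteField are hypothetical, and whether a given Excel version honours the BOM should be checked.

```javascript
const BOM = '\uFEFF'; // encodes to the bytes EF BB BF in UTF-8

// Quote every field and double any embedded quotes, as CSV expects.
function quoteField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Build the CSV text and return it as UTF-8 bytes with a leading BOM.
function toCsvBytes(rows) {
  const body = rows.map(row => row.map(quoteField).join(',')).join('\r\n');
  return new TextEncoder().encode(BOM + body);
}

const bytes = toCsvBytes([['message'], ['Blah blah blah \u{1F612}']]);
// In a browser the bytes can then be downloaded, for example via:
//   new Blob([bytes], { type: 'text/csv;charset=utf-8' })
```
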

Sending HTML from servlet to js application corrupts data in Firefox

I'm sending some HTML from a Java servlet back to an iframe in my JS application. I'm really just extracting some JSON from the HTML, which I wrap in a single <div> and read with jQuery, but the string that comes back sometimes has text added to it.
If the text that gets added has a word with enclosing angle brackets, Firefox will automatically close the brackets for me, which I don't want.
For example, if I send this:
<div>{"location":[],"columns":["<case expression>","headers"]}</div>
Firefox (and ONLY Firefox so far, not IE or chrome) will receive it as this:
<div>{"location":[],"columns":["<case expression>","headers"]}</case></div>
which screws up my parsing. I'm sending the text with the Content-Type of text/html, which I think might be causing the issue. I've tried Content-Type of application/json, but it won't write html to the iframe unless I'm using the text/html.
Can someone help me with a solution? I'm willing to try a different method of sending the data if it's not too extensive.
In order to keep the browser from interpreting HTML meta-characters as such, so that your "<" and ">" characters end up as part of the text, you can "escape" them as HTML entities. The "<" character is &lt; and the ">" is &gt;. People generally also quote the ampersand ("&") as &amp; but I think browsers are generally a little smarter about that.
Edit by OP for code solution:
I used StringEscapeUtils.escapeHTML(), which worked perfectly. Thanks!
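The OP did the escaping in Java on the servlet side. For illustration only, here is the same idea as a JavaScript sketch; escapeHtml is a hypothetical helper, not part of any library mentioned above, and it covers just the five characters that matter in HTML text and attribute values.

```javascript
// Replace HTML meta-characters with entities so a parser treats them as text.
function escapeHtml(text) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, ch => entities[ch]);
}

console.log(escapeHtml('<case expression>')); // &lt;case expression&gt;
```

Because the ampersand is handled in the same single pass as the other characters, already-produced entities are not escaped a second time within one call.
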

Sending iso-8859-1 via AJAX POST request?

First of all, this is a userscript, and I can't change the server-side encoding.
My problem is that when I use encodeURIComponent() to encode POST params (later sent via xhr.setRequestHeader), the characters are encoded in UTF-8, but the server needs to receive ISO-8859-1 data. Is there an alternative to encodeURIComponent() that would encode in ISO-8859-1?
To make sure you understand, here is an example:
A classic form on the website sends é like this: yournewmessage:%E9
Ajax via xhr.send('yournewmessage='+encodeURIComponent('é')) sends this: yournewmessage:%C3%A9
The server needs the former. Thanks to anyone who can help me.
So, I've since figured out this problem. I searched for a mapping between UTF-8 and ISO-8859-1; what I found was one between UTF-8 and cp1252 (Windows-1252), so there are two conversions: UTF-8 to cp1252, then cp1252 to ISO-8859-1 (the two being very similar).
http://pastebin.com/jTDqR2PQ
Ugly code, comments left in French, and an inelegant solution, but I feel bad seeing this question unanswered when I actually found a solution that works.
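For readers who cannot reach the pastebin link, a much smaller approach is possible when only true ISO-8859-1 characters are needed. This is a sketch and not the code from the link: encodeLatin1Component is a hypothetical helper that leaves the same unreserved characters alone as encodeURIComponent() does and percent-encodes everything else as a single byte. It deliberately rejects characters above U+00FF, which includes cp1252-only characters such as €.

```javascript
// Percent-encode a string as ISO-8859-1 instead of UTF-8.
function encodeLatin1Component(text) {
  let out = '';
  for (const ch of String(text)) {
    const code = ch.codePointAt(0);
    if (/[A-Za-z0-9\-_.!~*'()]/.test(ch)) {
      out += ch; // unreserved: sent as-is, like encodeURIComponent()
    } else if (code <= 0xFF) {
      // ISO-8859-1 byte value equals the code point for U+0000..U+00FF.
      out += '%' + code.toString(16).toUpperCase().padStart(2, '0');
    } else {
      throw new RangeError('Not representable in ISO-8859-1: ' + ch);
    }
  }
  return out;
}

console.log('yournewmessage=' + encodeLatin1Component('é')); // yournewmessage=%E9
```
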

Spanish special characters like á ó while displaying shows jumbled or garbage value

I have a Spanish validation message which I'm trying to display using JavaScript.
All the special characters like those above get changed into &#243;.
It only happens when I'm using JavaScript; there are a couple more validation messages in Spanish which I'm displaying from the server side, and they are fine.
errorString = "<%:Validation.xyz %>";
I'm trying to get the message from a resource file.
Can someone think of a quick workaround?
What you call garbage is actually the HTML-encoded value of the corresponding character, and it is there to protect you from XSS. The encoding happens because you are using <%:, which automatically HTML-encodes the string, but this shouldn't be a problem for your JavaScript. Example:
var text = 'hello &#243;';
document.getElementById('foo').innerHTML = text;
works just fine and displays hello ó in the corresponding DOM element.
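The entity only shows up literally when the string is used as plain text, for example in alert() or textContent, rather than as HTML. For that case, a small decoder can turn numeric entities back into characters. This is a sketch under the assumption that only numeric entities occur; decodeNumericEntities is a hypothetical helper and leaves named entities such as &amp; untouched.

```javascript
// Convert numeric character references (&#243; or &#xF3;) into characters.
function decodeNumericEntities(text) {
  return String(text)
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

console.log(decodeNumericEntities('hello &#243;')); // hello ó
```
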
Check that you saved your file with UTF-8 encoding (just in case). Files sometimes get checked into TFS without a UTF-8 BOM, which can then cause a mess on the client side.

Special characters in my javascript variables keep showing up as �, how do i prevent this?

I have a JavaScript script which calls a PHP page to supply an AJAX form with suggestions. The suggestions are returned fine by the PHP page, but for some reason, when I set the responseText of the JavaScript request object as the content of an element in my HTML page, all the special characters (e.g. á or ã) show up as this question mark. Is there a function I must run on the response text of the request to make sure these are read properly?
Thanks.
If you are not serving your HTML pages as UTF-8, the browser will guess an encoding, typically a single-byte Windows codepage depending on the user's locale.
But this doesn't happen for AJAX. With XMLHttpRequest, unless you specifically state an encoding in the Content-Type: ...; charset= parameter, the browser will treat it as UTF-8. That means if you are actually serving Windows code page 1252 (Western European) content, you will get an invalid UTF-8 sequence and consequent question mark.
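The effect described above can be reproduced without a server. The sketch assumes an environment with TextEncoder and TextDecoder (current browsers and Node.js); the byte 0xE1 is 'á' in Windows-1252 but an incomplete sequence in UTF-8.

```javascript
// 'á' as a server would send it in Windows-1252: one byte, 0xE1.
const cp1252Bytes = Uint8Array.of(0xE1);

// Decoded as UTF-8, the lone byte is invalid and becomes U+FFFD.
const misread = new TextDecoder('utf-8').decode(cp1252Bytes);

// The same character sent as real UTF-8 (bytes C3 A1) decodes correctly.
const utf8Bytes = new TextEncoder().encode('á');
const correct = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(misread === '\uFFFD', correct === 'á'); // true true
```
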
You don't want to be using a non-UTF-8 encoding! Make sure you are using UTF-8 throughout your application. Serve all your pages with Content-Type: text/html; charset=utf-8, store your data in UTF-8 tables, use mysql_set_charset() to choose UTF-8, etc.
In any case consider passing AJAX responses using JSON. The function json_encode() will create a JSON string that uses JavaScript escape sequences for non-ASCII characters, which avoids any problem of encoding mismatch. Also this is easier to extend to add functionality than returning raw HTML.
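To see why escaped JSON sidesteps the encoding question, here is a JavaScript imitation of that output style. It is only an illustration: toAsciiJson is a hypothetical helper, and in the setup above the escaping would be done by PHP's json_encode() on the server. Once every non-ASCII character is written as a \uXXXX escape, the response is pure ASCII and decodes the same way under UTF-8 or Windows-1252.

```javascript
// Serialize to JSON, then escape every non-ASCII UTF-16 code unit as \uXXXX.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, ch =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

const wire = toAsciiJson({ suggestion: 'São Paulo' });
console.log(wire); // {"suggestion":"S\u00e3o Paulo"}

// The client gets the original characters back regardless of charset.
console.log(JSON.parse(wire).suggestion); // São Paulo
```
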
I would try, in your php script, to encode everything as html entities.
This can be easily tested by doing something like this before returning the results to javascript:
$results = htmlentities($htmlstring);
There's also the htmlspecialchars function you might try.
More about this here:
http://php.net/manual/en/function.htmlentities.php
