Why are non-ASCII characters displayed as weird symbols? - javascript

I have two cases here:
My database contains a lot of information which I want to fetch into the page. Some of this information consists of names that contain non-ASCII characters, like Uwe Rülke.
- Old solution, which works well:
I fetch the data from the database and write it to the page directly from a VB while loop. In this case all the characters display correctly: Uwe Rülke.
- New solution, which doesn't work properly:
The VB while loop doesn't write the data directly to the page, but stores it in JavaScript strings (to improve performance by not querying the database repeatedly). But when I use the information stored in the JavaScript variables, I get something like this: Uwe R�lke.
In both cases, the page's encoding is:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Where did I go wrong?
This is the code used to fetch (from the database) and then save to JavaScript strings.
I'm using AJAX LOAD on a page called ISEquery to build a specific request and run it against the database. It is used to fetch data either as an Excel file or as plain HTML. At this point the characters are represented correctly.
Then the magic happens and the characters get misrepresented. I checked it in the exctractFields function:
$("<div></div>").load("ISEquery.asp?isExcel=0&" + info, function(){
    // Extracting the fields into an array
    var rows = "";
    var options = "";
    $(this).children().each(function(index){
        var fieldsArray = exctractFields($(this).html());
        rows += createISELine(fieldsArray);
        options += createISELine_ComboBox(fieldsArray);
    });
});

The � means that a character arrived which can't be represented properly.
Somewhere between the server and the client, you need to encode the string data properly. I don't know how you transfer the data from the server to the client (generated JavaScript? Ajax? GET requests?), so it's hard to say how to fix this.
But what you need to do: For every step, you must make sure that you know what the current encoding of the data is and what the recipient expects.
For example, if you generate inline JavaScript code in an HTML page, then the string value must be encoded with the same encoding as the page (iso-8859-1). If you use Ajax, then usually you have to use UTF-8.

I followed the string from the server to the page and found that it gets misrepresented after the AJAX LOAD, so I found this answer, which resolved my problem. Although I had to use charset="iso-8859-1" rather than charset="UTF-8" for it to work.
So the final answer is:
-Encoding in the HTML page:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
-Encoding the Ajax data:
$.ajaxSetup({
    beforeSend: function(xhr) {
        xhr.overrideMimeType('text/html; charset=iso-8859-1');
    }
});
And now characters are displayed correctly.
(The lead was from Aaron Digulla's answer.)

JavaScript strings are stored as UTF-16 (16-bit code units), while ISO 8859-1 is a single-byte (8-bit) encoding.
What is the default JavaScript character encoding?
I think you can use encodeURI() to convert your special characters to ASCII escape sequences, and afterwards you can decode them with decodeURI():
JavaScript encodeURI() Function (W3Schools)
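That round trip is easy to verify; note that encodeURI() escapes using UTF-8 bytes, so both sides must agree on that (a quick sketch, runnable in any browser console or Node.js):

```javascript
// encodeURI() percent-encodes non-ASCII characters as their UTF-8 bytes,
// producing a pure-ASCII string that survives any 8-bit transport.
const name = 'Uwe Rülke';

const encoded = encodeURI(name);
console.log(encoded); // "Uwe%20R%C3%BClke"

// decodeURI() reverses it, assuming UTF-8 escapes.
console.log(decodeURI(encoded) === name); // true
```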

Related

python django json.dumps() and javascript cookies [duplicate]

I'm trying to encode an object in a Python script and set it as a cookie so I can read it with client-side JavaScript.
I've run into problems every way I've tried to do this. Generally, the cookie is formatted in a way that makes JSON.parse() break.
My current script:
cookie = Cookie.SimpleCookie()
data = {"name": "Janet", "if_nasty": "Ms. Jackson"}
cookie['test'] = json.dumps(data)
self.response.headers.add_header("Set-Cookie", cookie.output(header=''))
... which results in
test="{\"name\": \"janet\"\054 \"if_nasty\": \"Ms. Jackson\"}"
on the client.
I don't really want to introduce a hack-y solution to replace instances of commas when they appear. Any ideas how I can pass complex data structures (both by setting and reading cookies) with Python?
I also wanted to read a cookie (that had been set on the server) on the client. I worked around the issue by base64 encoding the JSON String, however there are a few small gotchas involved with this approach as well.
1: Base64 strings can end with 0-2 equals signs, and these were being converted into the string \075. My approach is to convert those back into equals characters on the client.
2: The base64 string is being enclosed in double quote characters in the cookie. I remove these on the client.
Server:
nav_json = json.dumps(nav_data)
nav_b64=base64.b64encode(nav_json)
self.response.set_cookie('nav_data', nav_b64)
Client:
var user_data_base64 = $.cookie('nav_data');
// remove quotes from around the string
user_data_base64 = user_data_base64.replace(/"/g, "");
// replace \075 with =
user_data_base64 = user_data_base64.replace(/\\075/g, "=");
var user_data_encoded = $.base64.decode(user_data_base64);
var user_data = $.parseJSON(user_data_encoded);
I am using 2 jquery plugins here:
https://github.com/carlo/jquery-base64
and
https://github.com/carhartl/jquery-cookie
Note: I consider this a hack: it would be better to re-implement in JavaScript the Python code that encodes the cookie, but that has the downside that you would need to notice and port any changes to that code.
I have now moved to a solution where I use a small html file to set the cookie on the client side and then redirect to the actual page requested. Here is a snippet from the JINJA2 template that I am using:
<script type="text/javascript">
var nav_data='{% autoescape false %}{{nav_data}}{% endautoescape %}';
$.cookie('nav_data', nav_data, { path: '/' });
window.location.replace("{{next}}")
</script>
Note 2: Cookies are not ideal for my use case and I will probably move on to Session or Local Storage to reduce network overhead (although my nav_data is quite small - a dozen characters or so.)
On the Python side:
json.dumps the string
escape spaces - just call .replace(' ', '%20')
Call urllib.parse.quote_plus() then write the string to the cookie
On the JavaScript side:
read the cookie
pass it through decodeURIComponent()
JSON.parse it
This seems to be the cleanest way I've found.
I'm not sure a cookie is the best way of doing this; see the getting started guide for info on rendering data to the client.

How to decode characters utf-8 in iso 8859-1 with json in javascript?

In PHP I select the data and output it with this last line of code:
utf8_decode(json_encode($resultat, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
and get this result :
[ { "NUM_ASSU": "321-7777777-4", "NOM_ASSU": "MÀJIOTSOP TIAYA VALERIE" } ]
with correct data and accented characters.
But when I come back to JavaScript (written with the ExtJS framework) it displays nothing.
this is the code:
var jsonData = Ext.util.JSON.decode(result.responseText, true);
jsonData appears empty.
Your server application has to communicate with the client in the correct encoding. The page encoding is set usually in page header, e.g.
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250"/>
In this case you have to encode your server response in windows-1250. But I'd rather stick with UTF-8, as other people say.
You can check your current page encoding like this:
alert(document.characterSet);
Side note: can you inspect result.responseText?

encodeURIComponent() adds too many characters

Either encodeURIComponent() in JavaScript is adding too many characters, or I don't understand exactly how it works.
I am using this line of code:
var encoded = encodeURIComponent(searchTerm);
When I look in the chrome inspect element after passing Abt 12 it shows the encoded variable added to the URL as this:
Abt%252012
I would think it should be this:
Abt%2012
So when I pass it through PHP I get really odd results when actually conducting the search.
From the comments, it looks like you are sending the value to the server via a jQuery Ajax request; jQuery takes care of parameter encoding itself, so there is no need for you to encode it again:
$.get("website.php", { p: searchTerm })
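The "%2520" is the fingerprint of double encoding: the "%" of an already-encoded "%20" gets escaped again as "%25". A quick sketch:

```javascript
// Encoding "Abt 12" once: the space becomes %20.
const once = encodeURIComponent('Abt 12');
console.log(once); // "Abt%2012"

// Encoding the already-encoded value again: the "%" of "%20" is
// itself escaped as "%25", giving the "Abt%252012" seen in the URL.
const twice = encodeURIComponent(once);
console.log(twice); // "Abt%252012"
```

So pass the raw searchTerm and let the Ajax layer encode it exactly once.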

Submitting a form using ajax causes strange behavior with the Charset

I am having an issue here with a form being posted via Ajax.
Here is my jQuery code:
<script type="text/javascript" language="javascript" charset="utf-8">
$(document).ready(function(){
$("#newdata").submit(function(event) {
event.preventDefault();
$.post( "save.asp", $("#newdata").serialize() ,
function( data ) {
});
});
});
</script>
The problem is that when I submit data with special characters like ® or ©, it is saved with an Â in front of it. But if I submit without jQuery/Ajax, the data isn't prefixed with this Â character. Does anyone know why I'm having this problem?
http://jsfiddle.net/aTS67/2/
The problem is with the .serialize() method (it is not really a problem; it is supposed to do this). As you can see from my demo above, the method URL-encodes the special characters (as it should). You have two options:
Decode the URL-encoded string on the server side. You didn't mention what technology you are using, but there is likely a function that will do this for you; in PHP, for instance, you may use urldecode("YOUR ENCODED STRING"), and there will be something similar in every server-side language (best option)
Instead of using .serialize() you can build the string sent to the server-side manually. You can replace $("#newdata").serialize() with an object literal of key value pairs:
{"InputId1" : $("#InputId1").val(), "InputId2" : $("#InputId2").val()}
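For what it's worth, the encoding a form serializer applies can be reproduced with the standard URLSearchParams (a sketch with a made-up field name): form data is percent-encoded as UTF-8 bytes, which is why "®" arrives as two bytes and shows up as "Â®" on a page that reads them as a single-byte charset.

```javascript
// application/x-www-form-urlencoded encoding uses UTF-8 bytes:
// "®" (U+00AE) becomes the two bytes %C2%AE. A receiver that decodes
// them as ISO-8859-1 sees two characters: "Â" (0xC2) and "®" (0xAE).
const params = new URLSearchParams({ note: '®' });
console.log(params.toString()); // "note=%C2%AE"
```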
Edit
I just saw that the extension on your file is .asp, so you are using classic ASP. I am not sure of the syntax to decode there, but I am sure it is easy to find.
Just in case someone else has this problem: the resolution is to have <%@codepage=65001%> at the top of the receiving page. jQuery's .serialize() serializes using UTF-8, and this basically puts the receiving page in the correct codepage.
NOTE: This is for classic ASP.
Source: http://api.jquery.com/serialize/

How to encode a value that is rendered to page and finally used in URL?

I have a script that is rendered to an html page as a part of a tracking solution (etracker).
It is something like this:
<script>
var et_cart= 'nice shoes,10.0,100045;nice jacket,20.00,29887';
</script>
This will be transmitted to the server of the tracking solution by some javascript that I don't control. It will end up as 2 items. The items are separated by a semicolon in the source (after '100045').
I obviously need to HTML-encode and JavaScript-encode the values that will be rendered.
I first HTML-encode and after that remove single quotes.
This works, but I have an issue with special characters in French and German, e.g. umlauts (ü, ä...).
They render something like {. The output of the script when using lars ümlaut as the article is:
<script>
var et_cart= 'lars {mlaut,10.0,100045;nice jacket,20.00,29887';
</script>
The semicolon is evaluated as an item separator by the tracking solution.
The support team of the tracking solution told me to URL-encode the values. Can this work?
I guess URL-encoding doesn't stop any XSS attacks. Is it OK to first URL-encode and HTML-encode, then JavaScript-encode after that?
The values only need to be URL-encoded for transmission. If the information is displayed later, it is the displaying side's responsibility to protect against XSS attacks, not yours.
<script>
var et_cart= 'lars+%FCmlaut%2C10.0%2C100045%3Bnice+jacket%2C20.00%2C29887';
</script>
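For comparison, the "%FC" style escapes in the snippet above are what the long-deprecated escape() function produces (Latin-1 code points); encodeURIComponent() emits UTF-8 escapes instead. Either way, the separators inside a value get escaped, which is what keeps the tracking script's split intact (a sketch with the example article):

```javascript
// URL-encoding one cart item: the "," (and any ";") that would otherwise
// be taken for field/item separators are escaped as %2C (and %3B), and
// "ü" becomes its UTF-8 bytes %C3%BC (escape() would have produced %FC).
const item = 'lars ümlaut,10.0,100045';
console.log(encodeURIComponent(item)); // "lars%20%C3%BCmlaut%2C10.0%2C100045"
```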
