Javascript with UTF-8 encoding (in PHP form) - javascript

I have a web form that submits a comment to a predefined mail address when the user fills it in and clicks the submit button. This is done using my Send.php file, which includes all the code needed to submit the comment with the correct content, encoded as UTF-8. Everything works fine. After that procedure, I added to the PHP file the JavaScript code below, which pops up an alert window saying the mail has been sent and, after the user clicks OK, redirects to the homepage.
This is the code:
echo '<script type="text/javascript">
alert("კომენტარი გაიგზავნა წარმატებით, გმადლობთ");
</script>';
echo '<script type="text/javascript">
window.location = "http://g-projects.net78.net/index.html";
</script>';
However, because the alert text is in a foreign language, I get various unreadable symbols instead. I need to use UTF-8 encoding, but how can I integrate it with this code? Note that this code is emitted from a PHP file.

Tell your text editor to edit your source file in utf-8.
Note that "კომენტარი გაიგზავნა წარმატებით, გმადლობთ" is a literal, so it is embedded in your php source file.
For example, in Notepad++ set: Encoding | Encode UTF-8.
(It already seems your HTML declares that it is outputting UTF-8, I hope.)
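For example, a minimal sketch of how Send.php (the filename from the question) could both declare UTF-8 in the HTTP response header and emit the script; the header() call has to run before any output is echoed:

<?php
// Minimal sketch: declare the output encoding before anything is echoed.
header('Content-Type: text/html; charset=UTF-8');

echo '<script type="text/javascript">
alert("კომენტარი გაიგზავნა წარმატებით, გმადლობთ");
window.location = "http://g-projects.net78.net/index.html";
</script>';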

The system font for displaying message boxes is Segoe UI or Tahoma by default on Windows systems, and DejaVu Sans or simply "Sans" on Linux systems. If that font does not contain the glyphs needed for the script you are displaying (Georgian, in this case), the alert text cannot be rendered correctly; to display text in that language, your system's default UI font needs to include those characters.
Also, just an aside: UTF-8 documents can start with a preamble, i.e. a byte-order mark (BOM) that tells the text processor explicitly which encoding to expect. For UTF-8, the three bytes EF BB BF signal the encoding. Do NOT use the preamble when saving PHP files; if you read the Unicode spec carefully, UTF-8 is designed to work without one. If you really must emit it, add it as an echo from the PHP script before any other output, but do not start the PHP source file itself with it. (Just in case you run across this in your Unicode travels.)
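If some downstream consumer does insist on a BOM, a small sketch of emitting it as output rather than saving it into the source file (normally you do not want this at all):

<?php
// Emit the UTF-8 BOM as output, before any other echo.
// Do NOT save the PHP source file itself with a BOM.
echo "\xEF\xBB\xBF";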

Related

Javascript failing to show utf-8 characters

So I'm trying to show utf-8 characters coming from JavaScript.
I should have everything in place:
In the header: <meta charset="utf-8">
Including the JS: <script type="text/javascript" src="x.js" charset="utf-8"></script>
The file x.js is saved as UTF-8 (and so are the other files).
It works with all my PHP files, just not when it comes from a simple alert in JavaScript.
alert("Prénom doit être rempli");
Instead, the famous '?' characters are shown in the alert box.
Anything I've forgotten?
Here is what you need to do: open your file in Notepad and save it again (Save As), and this time select UTF-8 in the save dialog. Your issue will be solved.
From the spec:
The charset attribute gives the character encoding of the external script resource. The attribute must not be specified if the src attribute is not present. If the attribute is set, its value must be an ASCII case-insensitive match for one of the labels of an encoding, and must specify the same encoding as the charset parameter of the Content-Type metadata of the external file, if any.
(My emphasis.)
So you need to ensure that your server is sending the correct Content-Type header — either with no charset, or with charset=utf-8.
If your server is already sending the charset as part of the Content-Type, that's a good thing: Just remove the charset attribute from the script tag.
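For illustration, if the script happened to be served through PHP (a hypothetical serve-js.php, not something the question states), the header could be set explicitly; for a plain static .js file you would configure this in the web server instead:

<?php
// Hypothetical serve-js.php: send the script with an explicit charset,
// which makes the charset attribute on the <script> tag unnecessary.
header('Content-Type: text/javascript; charset=utf-8');
readfile('x.js');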

Handle non-ASCII filenames in XHR uploading

I have pretty standard javascript/XHR drag-and-drop file upload code, and just came across an unfortunate real-world snag. I have a file on my (Win7) desktop called "TEST-é-TEST.txt". In Chrome (30.0.1599.69), it arrives at the server with filename in UTF-8, which works out fine. In Firefox (24.0), the filename seems mangled when it arrives at the server.
I didn't trust what Firebug/Chrome might be telling me about the encoding, so I examined the hex of the request packet. Everything else is the same except the non-ASCII character is indeed being encoded differently in the two browsers:
Chrome: C3 A9 (this is the expected UTF-8 for that character)
Firefox: EF BF BD (UTF-8 "replacement character"?!)
Is this a Firefox bug? I tried renaming the file, replacing the é with ó, and the Firefox hex was the same... so such a mangle really seems like a browser bug. (If Firefox were confusedly sending along ISO-8859-1, for example, without touching it, I'd see an E9 byte, and I could handle that on the server side, but it shouldn't mangle it!)
Regardless of the reason, is there something I can do on either the client or server sides to correct for this? If a replacement character is indeed being sent to the server, then it would seem unrecoverable there, so I almost certainly need to do it on the client side.
And yes, the page on which this code exists has charset=utf-8, and Firefox confirms that it perceives the page as UTF-8 under View>Character Encoding.
Furthermore, if I dump the filename to console.log, it appears fine there; I guess it's just getting mangled in or after setRequestHeader("X-File-Name", file.name).
Finally, it would seem that the value passed to setRequestHeader() should be able to have code points up to U+00FF, so U+00E9 (é) and U+00F3 (ó) shouldn't cause a problem, though higher codes could trigger a SyntaxError: http://www.w3.org/TR/XMLHttpRequest2/#the-setrequestheader-method
Thanks so much for Boris's help. Here's a summary of what I discovered through our interactions in comments:
1) The core issue is that HTTP Request headers are supposed to be ISO-8859-1. Prior versions of Chrome and Firefox both passed along UTF-8 strings unchanged in setRequestHeader() calls. This changed in FF24.0 (and apparently will be changing in Chrome soon too), such that FF drops high bytes and passes along only the low byte for each character. In the example I gave in the question, this was recoverable, but characters with higher codes could be mangled irretrievably.
2) One workaround would be to encode on the client side, e.g.:
setRequestHeader('X-File-Name', encodeURIComponent(filename));
and then decode on the server side, e.g. in PHP:
$filename = rawurldecode($_SERVER['HTTP_X_FILE_NAME']);
3) Note that this is only problematic because my AJAX file upload approach sends the raw file data in the request body, so I need to send the filename via a custom request header (as shown in many tutorials online). If I used FormData instead, I wouldn't have to worry about this. I believe that if you want solid, standards-based Unicode filename support, you should use FormData rather than the request-header approach.
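For illustration, a minimal PHP sketch of the server side for the FormData approach (the field name "file" and the uploads/ directory are assumptions, not from the question); the filename travels inside the multipart body, so no header decoding is needed:

<?php
// Hypothetical upload.php for a FormData-based upload with a field named "file".
// The original filename arrives intact in the multipart body (UTF-8 on a UTF-8 page).
if (isset($_FILES['file'])) {
    $name = $_FILES['file']['name'];   // e.g. "TEST-é-TEST.txt"
    move_uploaded_file($_FILES['file']['tmp_name'], 'uploads/' . basename($name));
}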

Login with yahoo using OpenID for Multilingual site

I am integrating Yahoo OpenID for my site. My site runs in different languages (en, jp, cn, etc.). When I try to log in from the English site with Yahoo OpenID it works fine, but when I try the same thing from the Japanese or Chinese site, it does not redirect me to the Yahoo OpenID login.
Each time I get the error below from JavaScript:
Error: The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.
Source File: http://uatstorefrontjpcr.mobi-book.com/ReturnFromSocial/LogOnYahoo
Line: 0
Can anyone suggest what to do?
I am using SocialAuth-net.dll for this purpose. I have set all the required wrappers in web.config. The same code works fine with Google and Facebook OpenID.
Your web server is probably sending back a response with a Content-Type of text/plain. When a web browser receives a response with that content type, it doesn't know what encoding should be used to decode it; since you haven't told it how to decode it, different browsers might choose different ways.
The solution is to provide an explicit encoding. For example, if you know that the text is UTF-8 encoded, then you could provide it in a header like so:
Content-Type: text/plain; charset=UTF-8
According to an informational page from the W3C, you can get ASP.Net to include that bit in the header using Response.ContentEncoding. Again using UTF-8 as an example, you can set it like so:
Response.ContentEncoding = Encoding.UTF8;

Google Chrome script debug feature shows corrupted HTML

Occasionally, when I'm debugging with Google Chrome, the script and html page I'm trying to debug shows up with a corrupted string instead of the actual javascript and HTML...
If I change the page name (change the case for some of the characters in the page's name), the page will come up correctly. But if I refresh, the page will return corrupted like above.
Clearing the Chrome cache doesn't help.
I'm using ASP.NET as the backend. This string looks suspiciously similar to a View State hash.
That's not a corrupted string; it's a Base64-encoded version of the HTML. As to why it changes when you change the name of the file, I'm not sure; maybe some sort of server setting?
Typically, an encoded string may be presented in order to obfuscate the code.
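As a quick sanity check, the string can be decoded to confirm it is just the page's HTML; a small PHP sketch (the $blob value is a hypothetical placeholder for whatever Chrome displayed):

<?php
// If the "corrupted" text is really Base64, decoding it should yield readable HTML.
$blob = 'PGh0bWw+';            // hypothetical: paste the actual string here
echo base64_decode($blob);     // prints "<html>" for this placeholder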

Javascript Charset problem

I want to read a file from my server with JavaScript and display its content in an HTML page.
The file is in the ANSI charset and it has Romanian characters. I want to display those characters the way they are, not as various black symbols.
So I think my problem is the charset. I have a GET request that fetches the content of the file, like this:
function IO(U, V) { // LA MOD String Version. A tiny ajax library. by DanDavis
    var X = !window.XMLHttpRequest ? new ActiveXObject('Microsoft.XMLHTTP') : new XMLHttpRequest();
    X.open(V ? 'PUT' : 'GET', U, false);
    X.setRequestHeader('Content-Type', 'Charset=UTF-8');
    X.send(V ? V : '');
    return X.responseText;
}
As far as I know, the Romanian characters are included in the UTF-8 charset, so I set the charset of the request header to UTF-8. The file is in UTF-8 format, and I have the meta tag that tells the browser the page has UTF-8 content:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
So if I request the file from the server directly, the browser shows me the Romanian characters, but if I display the content of the page through this script, I see only symbols instead of characters.
So what am I doing wrong?
Thank you!
PS: I want this to work at least in Firefox, not necessarily in all browsers.
While my initial assumption was the same as T.J. Crowder's, a quick chat established that the OP uses some hosting service and cannot easily change the Content-Type headers.
The files were sent as text/plain or text/html without any charset parameter, hence the browser interprets them as UTF-8 (which is the default).
So saving the files in UTF-8 (instead of ANSI/Windows-1252) did the trick.
You need to ensure that the HTTP response returning the file data has the correct charset identified on it. You have to do that server-side, I don't think you can force it from the client. (When you set the content type in the request header, you're setting the content type of the request, not the response.) So for instance, the response header from the server would be along the lines of:
Content-Type: text/plain; charset=windows-1252
...if by "ANSI" you mean the Windows-1252 charset. That should tell the browser what it needs to do to decode the response text correctly before handing it to the JavaScript layer.
One problem, though: As far as I can tell, Windows-1252 doesn't have the full Romanian alphabet. So if you're seeing characters like Ș, ș, Ţ, ţ, etc., that suggests the source text is not in Windows-1252. Now, perhaps it's okay to drop the diacriticals on those in Romanian (I wouldn't know) and so if your source text just uses S and T instead of Ș and Ţ, etc., it could still be in Windows-1252. Or it may be ISO-8859 or ISO-8859-2 (both of which drop some diacriticals) or possibly ISO-8859-16 (which has full Romanian support). Details here.
So the first thing to do is determine what character set the source text is actually in.
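For example, if the file can be routed through a small PHP wrapper (a hypothetical getfile.php; the filename is also hypothetical), the response charset can be declared to match whatever encoding the file is actually saved in:

<?php
// Hypothetical getfile.php: declare the encoding the text file is actually saved in.
header('Content-Type: text/plain; charset=windows-1252');   // or charset=UTF-8 after re-saving the file
readfile('romanian.txt');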
