Textbox with UTF-8 input

I am in the process of internationalizing a website, and I need to allow a user to enter Chinese characters into a search textbox. This text will end up being analyzed on the backend, so I need to ensure that I can accept the text encoded as UTF-8 via JavaScript (everything is done through AJAX). For testing purposes, I pop up an alert box with the text I entered every time a search is done, and when I enter some Chinese text, I get 'undefined' back. With English, the word I entered is returned, as expected. How can I ensure that all text in the textbox is encoded as UTF-8?

Make sure of the following:
Your HTML and JS documents are UTF-8 encoded.
You are sending a Content-Type header with the appropriate (UTF-8) charset value for both your HTML and JS files.
The meta tag charset defined in your HTML is also, appropriately, UTF-8.
Avoid using the built-in escape method; it is not UTF-8 (multibyte character) aware. Use encodeURIComponent instead, which percent-encodes the UTF-8 bytes of each character.
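To illustrate the last point, here is a minimal sketch of building an AJAX query string from the textbox value. The function name buildSearchQuery and the q parameter are made up for this example; encodeURIComponent, decodeURIComponent and escape are the standard built-ins.

```javascript
// Hypothetical helper: builds the query string for an AJAX search request.
// encodeURIComponent percent-encodes the UTF-8 bytes of each character.
function buildSearchQuery(term) {
  return 'q=' + encodeURIComponent(term);
}

const term = '\u4E2D\u6587'; // the two Chinese characters 中文

console.log(buildSearchQuery(term)); // q=%E4%B8%AD%E6%96%87
console.log(escape(term));           // %u4E2D%u6587 (non-standard %u form, not UTF-8)

// Decoding the percent-encoded UTF-8 bytes gives the original text back.
console.log(decodeURIComponent('%E4%B8%AD%E6%96%87') === term); // true
```

A backend that decodes the query string as UTF-8 should then see the original characters.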

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
In HTML5:
<meta charset="UTF-8" />
This tells the browser to decode the entire page as UTF-8; the file itself must still be saved as UTF-8.

Related

Display UTF8 and ISO-8859-1 in select box HTML

Hello, Folks!
All my script files are UTF-8, the server responses are UTF-8, the DB collation too... pretty much everything.
I have JSON data that populates the options of a select box. When I fix the ISO-8859-1 values, the UTF-8 ones break, and vice versa.
The question is: how can a select option display both ISO-8859-1 and UTF-8 special characters?
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<select id="values" name="values">
<option>VALÊNCIA 18</option>
<option>BAHRAIN â€«Ø§Ù„Ø¨ØØ±ÙŠÙ†â€¬â€Ž 40</option>
</select>
</body>
</html>

I don't think this is possible. The charset attribute described at http://www.w3schools.com/tags/att_a_charset.asp would allow it in principle, but no popular browser supports it and it is obsolete in HTML5, so you should not use it.
As an alternative, you can convert the non-UTF-8 text to UTF-8 with a server-side script (PHP, ASP.NET, ...).
PHP ----> $utf8String = utf8_encode($isoString);
ASP.NET ----> byte[] utf8Bytes = Encoding.Convert(Encoding.GetEncoding(28591), Encoding.UTF8, isoBytes); // isoBytes holds the raw ISO-8859-1 bytes
https://msdn.microsoft.com/en-us/library/zs0350fy%28v=vs.90%29.aspx
Hopefully you will find this helpful
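The same conversion can be sketched in JavaScript (for example on a Node.js backend), assuming the incoming bytes really are ISO-8859-1. The helper name latin1BytesToString and the sample bytes are made up for illustration; String.fromCharCode and TextEncoder are standard.

```javascript
// ISO-8859-1 maps every byte value 0x00-0xFF to the Unicode code point
// with the same number, so decoding is a one-to-one lookup.
function latin1BytesToString(bytes) {
  let text = '';
  for (const byte of bytes) {
    text += String.fromCharCode(byte);
  }
  return text;
}

// Hypothetical input: "VALÊNCIA" as ISO-8859-1 bytes (Ê is the single byte 0xCA).
const isoBytes = Uint8Array.of(0x56, 0x41, 0x4C, 0xCA, 0x4E, 0x43, 0x49, 0x41);

const text = latin1BytesToString(isoBytes);
const utf8Bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8

console.log(text);             // VALÊNCIA
console.log(utf8Bytes.length); // 9, because Ê becomes the two bytes 0xC3 0x8A
```

This only works if you know which values are ISO-8859-1 in the first place; bytes that are already UTF-8 would be mangled by the same function.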

[SOLVED] If anyone has the same problem I had and the charset is already correct, simply do this:
Copy all the code inside the .html file.
Open Notepad (or any basic text editor) and paste the code.
Go to "File -> Save As".
Enter your file name, e.g. "example.html" (select "Save as type: All Files (*.*)").
Select UTF-8 as the Encoding.
Hit Save. You can now delete your old .html file, and the encoding should be fixed.

Remove ms word 2013 formatting from text

How do I escape the AutoFormat characters in content copied from MS Word 2013 before persistence?
For instance, on persisting ‘this should be ok’ becomes ���this should be ok��� when rendered on the screen.
On the server side it shows as âPADSOSthis should be okâPADSGCI.
I had to disable the AutoFormat feature in Word to resolve this issue. I tried both ISO-8859-1 and UTF-8 encoding without luck. It's a Java-based web application.
I am setting the charset to UTF-8 in the HTML file.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">

Dynamically load the charset UTF-8 and ISO-8859-1 from Javascript

I tried charset UTF-8 to display ä, and it displayed a square box.
I also tried charset ISO-8859-1 to display ä, and it displayed as ä (which is correct).
But when I combine both charsets in a JavaScript condition, it does not work properly. See the code below:
<html>
<head>
<script type="text/javascript">
var lang = 'German';
function f(){
if(lang != 'SomeOtherLanguage'){
// This branch executes, so the page should display a square box. Instead, ä is displayed, which is wrong. I can't find the reason.
metaTag = '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>';
}
else
metaTag = '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>';
document.getElementsByTagName('head')[0].innerHTML += metaTag;
}
</script>
</head>
<body onload="f()">
<h1>Latin letter : ä </h1> <br />
</body>
</html>

You can't. The character set is established by the parser, and the parser has to parse and run your JavaScript before that meta element exists in the DOM, so it arrives too late to affect how the page is decoded.
You can still use a single character set and convert the data to it.

What you are attempting to do will never work.
If the raw bytes of your HTML are not encoded as UTF-8 to begin with, you can't claim UTF-8 in a <meta> tag or an HTTP Content-Type header. You would be lying to the browser/client, and that is why you get bad results.
Your code will "work" only when your <meta> tag claims ISO-8859-1 (and there is no Content-Type header to override that) and your HTML is actually encoded in ISO-8859-1. In several (but not all) of the ISO-8859-X charsets, including ISO-8859-1, ä is encoded as byte 0xE4, so your code "works" when claiming ISO-8859-1 if byte 0xE4 is present in the HTML's raw data.
In UTF-8, ä is encoded as bytes 0xC3 0xA4 instead. If your HTML contains byte 0xE4 but you claim UTF-8, you get bad results (a lone 0xE4 is not valid UTF-8; it is a lead byte that must be followed by two continuation bytes).
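The byte-level difference can be checked directly with the standard TextEncoder and TextDecoder APIs. This is only a small sketch; the helper name isValidUtf8 is made up for the example.

```javascript
// TextEncoder always produces UTF-8.
const utf8Bytes = new TextEncoder().encode('\u00E4'); // the character ä
console.log(Array.from(utf8Bytes, (b) => b.toString(16))); // [ 'c3', 'a4' ]

// A lone 0xE4 is how ISO-8859-1 stores ä.
const latin1Bytes = Uint8Array.of(0xE4);

// Lenient UTF-8 decoding substitutes U+FFFD, the replacement character.
const lenient = new TextDecoder('utf-8').decode(latin1Bytes);
console.log(lenient === '\uFFFD'); // true

// Hypothetical helper: strict decoding throws on malformed input.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUtf8(latin1Bytes)); // false
console.log(isValidUtf8(utf8Bytes));   // true
```

The replacement character is typically what ends up rendered as a square box or question mark when ISO-8859-1 bytes are declared as UTF-8.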
So, your <meta> tag (and HTTP Content-Type header) needs to claim a charset that actually matches the real encoding of the HTML's raw bytes.
If your HTTP server is serving a static HTML file, the file is encoded in a specific charset when the HTML is saved to file. That same charset needs to be specified statically in the <meta> tag (and preferably also in the HTTP Content-Type header). If your HTTP server is generating the HTML dynamically, it needs to encode the HTML in a specific charset for transmission, so it needs to specify that same charset in the generated <meta> tag (and Content-Type header).
In other words, stop trying to lie to the browser/client. Tell the truth, then you won't run into this problem anymore.

how to inject chinese characters using javascript?

I have this code, but it only works with English characters:
$( "input[name*='Name']" ).attr("placeholder","姓名");
My web page displays other Chinese characters just fine, and if I change the Chinese characters to "Name", it works again. Is there something special I have to do here?
In the header, I do see this as the encoding...
<meta http-equiv="content-type" content="text/html; charset=utf-8">

If the script is inline (in the HTML file), then it's using the encoding of the HTML file and you won't have an issue.
If the script is loaded from another file:
Your text editor must save the file in an appropriate encoding such as UTF-8 (it's probably doing this already if you can save the file, close it, and reopen it with the characters still displaying correctly).
Your web server must serve the file with an HTTP header specifying that it's UTF-8 (or whatever the encoding happens to be, as determined by your text editor settings). Here's an example of how to do this with PHP: Set HTTP header to UTF-8 using PHP
If you can't have your web server do this, try setting the charset attribute on your script tag (e.g. <script type="text/javascript" charset="utf-8" src="..."></script>). I tried to see what the spec says should happen when the charset on the tag and the one in the HTTP headers don't match, but couldn't find anything concrete, so just test and see if it helps.
If that doesn't work, place your script inline.
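One more option, not mentioned in the steps above, is to keep the script file pure ASCII by writing the characters as \uXXXX escapes, so the file reads the same under any ASCII-compatible encoding. A minimal sketch, where the helper name toAsciiLiteral is made up for illustration:

```javascript
// Hypothetical helper: rewrites every non-ASCII UTF-16 code unit as a \uXXXX
// escape so the text can be pasted into a string literal in an ASCII-only file.
// It does not escape quotes, backslashes or control characters.
function toAsciiLiteral(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    out += code < 0x80
      ? text[i]
      : '\\u' + code.toString(16).padStart(4, '0');
  }
  return out;
}

// '\u59D3\u540D' is the placeholder text from the question.
console.log(toAsciiLiteral('\u59D3\u540D')); // \u59d3\u540d
```

The placeholder line from the question would then read $( "input[name*='Name']" ).attr("placeholder", "\u59d3\u540d"); and should no longer depend on how the file is served.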

Javascript does not display non-standard characters

So this is the code:
<script type="text/javascript" charset="UTF-8">
function loadScript(url) {
document.write('<script type="text/javascript" charset="UTF-8" src="', url, '">', '<', '/', 'script>');
}
loadScript("http://www.qppstudio.net/individualdays/noscroll/2012-08-15.js");
</script>
Included in page header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
The page displays Unicode characters without any problems, except for the part that the posted JavaScript writes to the page. Is the script changed somehow while being downloaded across domains?

Probably. This is the first time that I've seen the charset attribute, and I'm not sure what you are trying to achieve with it. If the server sends you a file in ISO-Latin-1 encoding but you tell the browser the file is UTF-8, what should happen? The browser won't convert the file from one encoding to the other; the best you could hope for is that the browser tries to interpret the byte stream as UTF-8, which will not work.
The correct solution is to configure the server to send the files with the correct encoding in the HTTP response header. The browser will look for this information and read the byte stream with the encoding specified there.
Don't forget to actually send the bytes in the correct encoding! This means: File reading, copying to the output stream + setting the encoding headers, everything must work perfectly or you will have odd bugs.
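As a rough sketch of keeping the header and the bytes in agreement, assuming a Node.js server (the question does not say what the server actually runs), the response for a script file could be built like this. The function name buildScriptResponse is hypothetical.

```javascript
// Hypothetical helper: returns headers and a body that agree on UTF-8.
function buildScriptResponse(source) {
  // Encode the JavaScript source text to UTF-8 bytes explicitly.
  const body = Buffer.from(source, 'utf8');
  return {
    headers: {
      // The declared charset must match the encoding used for the body.
      'Content-Type': 'application/javascript; charset=utf-8',
      // Length in bytes, not in characters.
      'Content-Length': body.length,
    },
    body,
  };
}

const response = buildScriptResponse('var s = "\u00E4";');
console.log(response.headers['Content-Type']);   // application/javascript; charset=utf-8
console.log(response.headers['Content-Length']); // 13 (ä takes two bytes)

// In a Node.js http handler this could be sent with:
//   res.writeHead(200, response.headers);
//   res.end(response.body);
```

If the file on disk is not UTF-8, it has to be decoded with its real encoding first; relabelling the bytes does not convert them.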
