JavaScript plain text character encoding

I am working on a JavaScript project that uses AngularJS. When I get data with an HTTP request, all characters appear correctly. For example, the string "räksmörgås" downloaded with Ajax displays fine, but the same string written to the console as a plain-text literal appears with garbled characters.
console.log("räksmörgås") results in this: r�ksm�rg�s
Is this a file encoding problem? Or are JavaScript strings always UTF-16, causing this problem?

I think the problem is that you are not using the correct charset. For Swedish, try changing the character encoding to iso-8859-1 or windows-1252. I suppose the server is sending the response without the correct headers, so the browser interprets it as UTF-8, the default charset.
So changing the charset in the header as below may resolve the issue:
Content-Type: text/plain; charset=windows-1252
or
Content-Type: text/plain; charset=iso-8859-1
Another solution would be to declare the charset on your script tag, which forces the browser to interpret the characters in the script with a specific encoding.
<script src="yourscript.js" charset="UTF-8"></script>
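To see where the � characters come from, the mismatch can be reproduced in a few lines. This is a minimal sketch, assuming a runtime with the standard TextDecoder API (modern browsers or Node.js); the variable names are illustrative and do not come from the question.

```javascript
// "räksmörgås", written with escapes so this file's own encoding does not matter.
const text = "r\u00e4ksm\u00f6rg\u00e5s";

// Simulate a file saved as ISO-8859-1: every character here is below U+0100,
// so each one becomes exactly one byte with the same numeric value.
const latin1Bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));

// Reading those bytes as UTF-8 fails for ä, ö and å: each invalid byte is
// replaced with U+FFFD, the replacement character that is displayed as �.
const misread = new TextDecoder("utf-8").decode(latin1Bytes);

// Reading the bytes with the encoding they were written in restores the text
// (for ISO-8859-1, byte value and code point are identical).
const correct = String.fromCharCode(...latin1Bytes);

console.log(misread);
console.log(correct);
```

The same reasoning applies in reverse: the fix is either to declare the encoding the file really uses, or to re-save the file in the encoding that is declared.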

Related

How to send image to backend to store in mysql in js? [duplicate]

What does enctype='multipart/form-data' mean in an HTML form and when should we use it?

When you make a POST request, you have to encode the data that forms the body of the request in some way.
HTML forms provide three methods of encoding.
application/x-www-form-urlencoded (the default)
multipart/form-data
text/plain
Work was being done on adding application/json, but that has been abandoned.
(Other encodings are possible with HTTP requests generated using other means than an HTML form submission. JSON is a common format for use with web services and some still use SOAP.)
The specifics of the formats don't matter to most developers. The important points are:
Never use text/plain.
When you are writing client-side code:
use multipart/form-data when your form includes any <input type="file"> elements
otherwise you can use multipart/form-data or application/x-www-form-urlencoded but application/x-www-form-urlencoded will be more efficient
When you are writing server-side code:
Use a prewritten form handling library
Most (such as Perl's CGI->param or the one exposed by PHP's $_POST superglobal) will take care of the differences for you. Don't bother trying to parse the raw input received by the server.
Sometimes you will find a library that can't handle both formats. Node.js's most popular library for handling form data is body-parser which cannot handle multipart requests (but has documentation that recommends some alternatives which can).
If you are writing (or debugging) a library for parsing or generating the raw data, then you need to start worrying about the format. You might also want to know about it for interest's sake.
application/x-www-form-urlencoded is more or less the same as a query string on the end of the URL.
multipart/form-data is significantly more complicated but it allows entire files to be included in the data. An example of the result can be found in the HTML 4 specification.
text/plain was introduced by HTML 5 and is useful only for debugging (from the spec: "They are not reliably interpretable by computer"), and I'd argue that the others combined with tools (like the Network Panel in the developer tools of most browsers) are better for that.

when should we use it?
Quentin's answer is right: use multipart/form-data if the form contains a file upload, and application/x-www-form-urlencoded otherwise, which is the default if you omit enctype.
I'm going to:
add some more HTML5 references
explain why he is right with a form submit example
HTML5 references
There are three possibilities for enctype:
application/x-www-form-urlencoded
multipart/form-data (spec points to RFC7578)
text/plain. This is "not reliably interpretable by computer", so it should never be used in production, and we will not look further into it.
How to generate the examples
Once you see an example of each method, it becomes obvious how they work, and when you should use each one.
You can produce examples using:
nc -l or an ECHO server: HTTP test server accepting GET/POST requests
a user agent like a browser or cURL
Save the form to a minimal .html file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>upload</title>
</head>
<body>
<form action="http://localhost:8000" method="post" enctype="multipart/form-data">
<p><input type="text" name="text1" value="text default">
<p><input type="text" name="text2" value="aωb">
<p><input type="file" name="file1">
<p><input type="file" name="file2">
<p><input type="file" name="file3">
<p><button type="submit">Submit</button>
</form>
</body>
</html>
We set the default text value to aωb. The character ω is U+03C9, which UTF-8 encodes as the two bytes CF 89, so the whole value is the bytes 61 CF 89 62 in UTF-8.
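The byte values can be checked without a hex editor. A small sketch, assuming a runtime that provides the standard TextEncoder API; the variable names are made up for illustration.

```javascript
// aωb, with ω written as an escape so the source file encoding is irrelevant.
const value = "a\u03c9b";

// TextEncoder always produces UTF-8.
const bytes = new TextEncoder().encode(value);

// Format each byte as two uppercase hex digits.
const hex = Array.from(bytes, (b) =>
  b.toString(16).toUpperCase().padStart(2, "0")
).join(" ");

console.log(hex); // 61 CF 89 62
```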
Create files to upload:
echo 'Content of a.txt.' > a.txt
echo '<!DOCTYPE html><title>Content of a.html.</title>' > a.html
# Binary file containing 4 bytes: 'a', 0xCF, 0x89 and 'b'.
printf 'a\xCF\x89b' > binary
Run our little echo server:
while true; do printf '' | nc -l localhost 8000; done
Open the HTML file in your browser, select the files, click submit, and check the terminal.
nc prints the request received.
Tested on: Ubuntu 14.04.3, nc BSD 1.105, Firefox 40.
multipart/form-data
Firefox sent:
POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150
Content-Length: 834
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text1"
text default
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="text2"
aωb
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file1"; filename="a.txt"
Content-Type: text/plain
Content of a.txt.
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file2"; filename="a.html"
Content-Type: text/html
<!DOCTYPE html><title>Content of a.html.</title>
-----------------------------735323031399963166993862150
Content-Disposition: form-data; name="file3"; filename="binary"
Content-Type: application/octet-stream
aωb
-----------------------------735323031399963166993862150--
For the binary file and text field, the bytes 61 CF 89 62 (aωb in UTF-8) are sent literally. You could verify that with nc -l localhost 8000 | hd, which says that the bytes:
61 CF 89 62
were sent (61 == 'a' and 62 == 'b').
Therefore it is clear that:
Content-Type: multipart/form-data; boundary=---------------------------735323031399963166993862150 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.
But note that the:
boundary=---------------------------735323031399963166993862150
has two fewer dashes -- than the actual delimiter
-----------------------------735323031399963166993862150
This is because the standard requires the boundary to start with two dashes --. The other dashes appear to be just how Firefox chose to implement the arbitrary boundary. RFC 7578 clearly mentions that those two leading dashes -- are required:
4.1. "Boundary" Parameter of multipart/form-data
As with other multipart types, the parts are delimited with a
boundary delimiter, constructed using CRLF, "--", and the value of
the "boundary" parameter.
Every field gets some sub headers before its data: Content-Disposition: form-data;, the field name, and the filename, followed by the data.
The server reads the data until the next boundary string. The browser must choose a boundary that will not appear in any of the fields, so this is why the boundary may vary between requests.
Because we have the unique boundary, no encoding of the data is necessary: binary data is sent as is.
TODO: what is the optimal boundary size (log(N) I bet), and name / running time of the algorithm that finds it? Asked at: https://cs.stackexchange.com/questions/39687/find-the-shortest-sequence-that-is-not-a-sub-sequence-of-a-set-of-sequences
Content-Type is automatically determined by the browser.
How it is determined exactly was asked at: How is mime type of an uploaded file determined by browser?
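The framing described above can be sketched by assembling a body by hand. This is only an illustration for text fields: buildMultipartBody is a hypothetical helper, not a browser or library API, and the boundary "XYZ" is an arbitrary choice. Real clients also emit filename and Content-Type sub headers for file parts and send raw bytes rather than a string.

```javascript
// Hypothetical helper: frames plain text fields as multipart/form-data.
function buildMultipartBody(fields, boundary) {
  const CRLF = "\r\n";
  let body = "";
  for (const [name, value] of Object.entries(fields)) {
    // The boundary must not occur inside any field, otherwise the
    // receiver would cut that field short.
    if (String(value).includes(boundary)) {
      throw new Error("boundary occurs in field " + name);
    }
    // Each part starts with "--" followed by the boundary parameter value.
    body += "--" + boundary + CRLF;
    body += 'Content-Disposition: form-data; name="' + name + '"' + CRLF;
    // An empty line separates the sub headers from the data.
    body += CRLF;
    body += value + CRLF;
  }
  // The closing delimiter has two extra dashes at the end.
  body += "--" + boundary + "--" + CRLF;
  return body;
}

const multipartBody = buildMultipartBody(
  { text1: "text default", text2: "a\u03c9b" },
  "XYZ"
);
console.log(multipartBody);
```

Note that the field values are copied into the body unchanged, which is the property that makes this format suitable for binary uploads.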
application/x-www-form-urlencoded
Now change the enctype to application/x-www-form-urlencoded, reload the browser, and resubmit.
Firefox sent:
POST / HTTP/1.1
[[ Less interesting headers ... ]]
Content-Type: application/x-www-form-urlencoded
Content-Length: 51
text1=text+default&text2=a%CF%89b&file1=a.txt&file2=a.html&file3=binary
Clearly the file data was not sent, only the basenames. So this cannot be used for files.
As for the text field, we see that usual printable characters like a and b were sent in one byte, while non-printable ones like 0xCF and 0x89 took up 3 bytes each: %CF%89!
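The same body can be reproduced outside a browser form. A minimal sketch, assuming the standard URLSearchParams API (available in browsers and Node.js), which serializes using the application/x-www-form-urlencoded rules; the field names and values mirror the form above.

```javascript
const params = new URLSearchParams();
params.append("text1", "text default");
params.append("text2", "a\u03c9b"); // aωb
// For file inputs only the file name is submitted with this enctype.
params.append("file1", "a.txt");
params.append("file2", "a.html");
params.append("file3", "binary");

// Spaces become "+", and each UTF-8 byte of ω becomes a %XX triplet.
const urlencodedBody = params.toString();
console.log(urlencodedBody);
// text1=text+default&text2=a%CF%89b&file1=a.txt&file2=a.html&file3=binary
```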
Comparison
File uploads often contain lots of non-printable characters (e.g. images), while text forms almost never do.
From the examples we have seen that:
multipart/form-data: adds a few bytes of boundary overhead to the message, and must spend some time calculating it, but sends each byte in one byte.
application/x-www-form-urlencoded: has a single byte boundary per field (&), but adds a linear overhead factor of 3x for every non-printable character.
Therefore, even if we could send files with application/x-www-form-urlencoded, we wouldn't want to, because it is so inefficient.
But for printable characters found in text fields, it does not matter and generates less overhead, so we just use it.
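The overhead argument can be made concrete with a rough size estimate. The function below is a sketch under a stated assumption: it treats ASCII letters, digits, the characters * - . _ and the space (sent as +) as one byte each, and every other byte as a three-byte %XX escape. urlencodedSize is a made-up name for illustration.

```javascript
// Estimate how many bytes a payload occupies once form-urlencoded.
function urlencodedSize(bytes) {
  let size = 0;
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    // Unescaped characters cost 1 byte, everything else costs 3 (%XX).
    size += /[A-Za-z0-9*\-._ ]/.test(ch) ? 1 : 3;
  }
  return size;
}

// The 4-byte "binary" file from the example: 61 CF 89 62.
const sample = Uint8Array.of(0x61, 0xcf, 0x89, 0x62);
console.log(sample.length, urlencodedSize(sample)); // 4 8

// A payload containing every possible byte value once, as a stand-in
// for image-like binary data.
const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
console.log(allBytes.length, urlencodedSize(allBytes));
```

With multipart/form-data the same payloads would be sent at their original size plus a fixed amount of boundary and sub header text per part.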

enctype='multipart/form-data' is an encoding type that allows files to be sent through a POST. Quite simply, without this encoding the files cannot be sent through POST.
If you want to allow a user to upload a file via a form, you must use this enctype.

When submitting a form, you tell your browser to send, via the HTTP protocol, a message on the network, properly enveloped in a TCP/IP protocol message structure. An HTML page has a way to send data to the server: by using <form>s.
When a form is submitted, an HTTP request is created and sent to the server. The message contains the field names in the form and the values filled in by the user. This transmission can happen with the POST or GET HTTP methods.
POST tells your browser to build an HTTP message and put all content in the body of the message (a very useful way of doing things, safer and also more flexible).
GET will submit the form data in the querystring. It has some constraints about data representation and length.
Stating how to send your form to the server
The enctype attribute only makes sense when using the POST method. When specified, it instructs the browser to send the form by encoding its content in a specific way. From MDN - Form enctype:
When the value of the method attribute is post, enctype is the MIME
type of content that is used to submit the form to the server.
application/x-www-form-urlencoded: This is the default. When the form is sent, all names and values are collected and URL Encoding is performed on the final string.
multipart/form-data: Characters are NOT encoded. This is important when the form has a file upload control. You want to send the file as binary data, and this ensures that the byte stream is not altered.
text/plain: Spaces get converted, but no more encoding is performed.
Security
When submitting forms, some security concerns can arise as stated in RFC 7578 Section 7: Multipart form data - Security considerations:
All form-processing software should treat user supplied form-data
with sensitivity, as it often contains confidential or personally
identifying information. There is widespread use of form "auto-fill"
features in web browsers; these might be used to trick users to
unknowingly send confidential information when completing otherwise
innocuous tasks. multipart/form-data does not supply any features
for checking integrity, ensuring confidentiality, avoiding user
confusion, or other security features; those concerns must be
addressed by the form-filling and form-data-interpreting applications.
Applications that receive forms and process them must be careful
not to supply data back to the requesting form-processing site that
was not intended to be sent.
It is important when interpreting the filename of the Content-
Disposition header field to not inadvertently overwrite files in the
recipient's file space.
This concerns you if you are a developer and your server will process forms submitted by users which might end up containing sensitive information.

enctype='multipart/form-data' means that no characters will be encoded. That is why this type is used when uploading files to the server.
So multipart/form-data is used when a form requires binary data, like the contents of a file, to be uploaded.

Set the method attribute to POST because file content can't be put inside a URL parameter using a form.
Set the value of enctype to multipart/form-data because the data will be split into multiple parts, one for each file plus one for the text of the form body that may be sent with them.

The enctype (ENCode TYPE) attribute specifies how the form data should be encoded when submitting it to the server.
multipart/form-data is one of the values of the enctype attribute, and it is used in form elements that have a file upload. Multipart means the form data is divided into multiple parts and sent to the server.

Usually this is when you have a POST form which needs to take a file upload as data. It tells the server how the transferred data is encoded. In this case the data is not encoded at all, because the files are simply transferred and uploaded to the server as they are, for example when uploading an image or a PDF.

The enctype attribute specifies how the form-data should be encoded when submitting it to the server.
The enctype attribute can be used only if method="post".
With multipart/form-data, no characters are encoded. This value is required when you are using forms that have a file upload control.
From W3Schools

JSON.parse: unexpected character at line 1 column 1 of the JSON data in Firefox only

I have created a JSON file through PowerShell and placed it on the server.
When I access that JSON file through $.getJSON it works fine in Chrome and IE, but when I access it in Firefox I get this error:
JSON.parse: unexpected character at line 1 column 1 of the JSON data
Header:
Response:
What could be the issue, and how do I fix it in Firefox?

You've said that the server sends that JSON back with Content-Type: text/plain. The data appears to be in UTF-16 (probably, that's based on the screenshot), but the default charset for text/plain is us-ascii (see §4.1.2 of RFC2046):
4.1.2. Charset Parameter
A critical parameter that may be specified in the Content-Type field
for "text/plain" data is the character set. This is specified with a
"charset" parameter, as in:
Content-type: text/plain; charset=iso-8859-1
Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.
So, you need to change the response from the server such that it correctly identifies the character set being used, e.g. Content-Type: text/plain; charset=UTF-16 (obviously ensuring first that that is, in fact, the charset of the resource).
I'll just note that, from what I can make out of the JSON, it looks like it's primarily in a western script. If so, UTF-16 is an unusual and inefficient choice; you'd probably be better off with UTF-8. But I only have a small fragment of the text to work from.
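The effect of a charset mismatch on JSON parsing can be reproduced in isolation. This is a sketch under assumptions: the sample document, the presence of a UTF-16LE byte order mark, and the helper name parseAs are all illustrative, and the real file on the server was not available to check. It relies on the standard TextDecoder API.

```javascript
const json = '{"ok":true}';

// Encode the text as UTF-16LE with a byte order mark (FF FE), which is
// assumed here to resemble what the PowerShell-generated file contains.
const utf16 = new Uint8Array(2 + json.length * 2);
utf16[0] = 0xff;
utf16[1] = 0xfe;
for (let i = 0; i < json.length; i++) {
  const code = json.charCodeAt(i);
  utf16[2 + i * 2] = code & 0xff; // low byte first
  utf16[3 + i * 2] = code >> 8;   // then high byte
}

// Decode the bytes with a given charset label, then parse the result.
function parseAs(label, bytes) {
  return JSON.parse(new TextDecoder(label).decode(bytes));
}

// Wrong charset: the BOM bytes and the interleaved zero bytes are not
// valid JSON, so parsing fails at the very first character.
let wrongCharsetError = null;
try {
  parseAs("utf-8", utf16);
} catch (err) {
  wrongCharsetError = err;
}

// Matching charset: the BOM is stripped and the document parses.
const parsed = parseAs("utf-16le", utf16);

console.log(wrongCharsetError instanceof SyntaxError, parsed.ok);
```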

Ajax response including special characters

I am trying to get a response XML which has special characters in it.
This is failing in IE, but in Mozilla it works fine.
Please let me know how to fix this.
Here's the code:
request.setCharacterEncoding("UTF-8");
response.setContentType("text/xml; charset=UTF-8");
response.setHeader("Cache-Control", "no-cache");
response.getWriter().write("<xml><valid><![CDATA[2189971_Bright Starts bath time foam ©®!#& toys each]]></valid><productid>123</productid></xml>");

Try adding the encoding in the XML itself:
response.getWriter().write("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><valid><![CDATA[2189971_Bright Starts bath time foam ©®!#& toys each]]></valid><productid>123</productid></root>");

Most likely your XML is invalid: you are specifying UTF-8 encoding in the XML, but the code writing it probably does not output UTF-8. Check what the browser receives with an HTTP watcher (like Fiddler) to make sure the response is properly UTF-8 encoded (the characters you are having problems with must be encoded, as they are above the ASCII range).
Not sure what language/framework you are using, but setting encoding on request and writing to response looks suspicious.

JQuery AJAX is not sending UTF-8 to my server, only in IE

I am sending UTF-8, japanese text, to my server.
It works in Firefox. My access.log and headers are:
/ajax/?q=%E6%BC%A2%E5%AD%97
Accept-Charset ISO-8859-1,utf-8;q=0.7,*;q=0.7
Content-Type application/x-www-form-urlencoded; charset=UTF-8
However, in IE8, my access.log says:
/ajax/?q=??
For some reason, IE8 is turning my AJAX call into question marks. Why!? I added the scriptCharset and ContentType according to some tutorials, but still no luck.
And this is my code:
$.ajax({
method:"get",
url:"/ajax/",
scriptCharset: "utf-8" ,
contentType: "application/x-www-form-urlencoded; charset=UTF-8",
data:"q="+query ...,
...
})

Try encoding the query parameter with encodeURIComponent():
data:"q="+encodeURIComponent( query )
As bobince very correctly noted in his comment, if you use the object notation to pass parameters to the ajax method, it will handle the encoding itself.
So
data:{ q : query }
will make jQuery handle the encoding.
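The effect of the encoding step can be checked without jQuery. In this sketch the query value is the text whose percent-encoded form appears in the Firefox access log above, and URLSearchParams is used only as a standard stand-in for a library that encodes object parameters for you; it is not what jQuery calls internally.

```javascript
const query = "\u6f22\u5b57"; // 漢字

// Manual approach: percent-encode the value as UTF-8 before concatenating.
const manual = "q=" + encodeURIComponent(query);

// Letting an API build the pair from an object, as with data:{ q : query }.
const viaParams = new URLSearchParams({ q: query }).toString();

console.log(manual);    // q=%E6%BC%A2%E5%AD%97
console.log(viaParams); // q=%E6%BC%A2%E5%AD%97

// The encoding is reversible on the receiving side.
const roundTrip = decodeURIComponent(manual.slice(2));
```

Because the encoded string contains only ASCII characters, it no longer depends on how the browser chooses to serialize non-ASCII characters in a URL.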

I read this post hoping it would solve a problem I had come across that had to do with UTF-8 conversions.
In my case it turned out that the server engine (node.js) was calculating the Content-Length of the data as if it were raw characters rather than UTF-8 bytes. Extended characters that take two bytes in UTF-8 were therefore counted as one, so for each such character the server sent one byte too few.
See what I did to solve it here: Not well formed Json when sending to CouchDB
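The difference between the two counts is easy to demonstrate. A minimal sketch, assuming the standard TextEncoder API; the sample payload is made up for illustration and is not the data from the linked question.

```javascript
// A JSON body containing three characters outside ASCII (ä, ö, å).
const payload = JSON.stringify({ name: "r\u00e4ksm\u00f6rg\u00e5s" });

// String length counts UTF-16 code units, not bytes on the wire.
const charCount = payload.length;

// Content-Length must be the number of bytes after UTF-8 encoding.
const byteCount = new TextEncoder().encode(payload).length;

console.log(charCount, byteCount); // 21 24
```

Using charCount as the Content-Length here would truncate the body by three bytes, which is enough to cut off the closing quote and brace and make the JSON malformed.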

I know this is an old post but I had this problem recently and I'd like to contribute just in case someone else has the same problem.
I'm using PHP but I'm sure there's an option on every serverside language. It was just a couple of things:
Make sure you're sending the right headers on your Ajax response by adding header('Content-Type: text/html; charset=utf-8'); This must be your first line. If you get errors saying that headers have already been sent, it is because somewhere in your code you are outputting an extra space or something else before sending the header, so check your code.
When you build your response on the server, make sure you convert all your chars to the corresponding HTML entities using echo htmlentities($your_string, null, 'utf-8'); Even after telling IE that you are sending utf-8 data, IE seems to forget that, or simply doesn't assume anything, so adding this to your code will ensure the right output.
Thanks all for your help.

Use encodeURIComponent() in JavaScript. Here is a sample:
function doPost()
{
    // Percent-encode the field value as UTF-8 before adding it to the URL.
    var value = document.getElementById("formSearch").childNodes[1].value;
    var URL = "http://localhost/check.php?yab=" + encodeURIComponent(value);
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET", URL);
    xmlHttp.send();
}

Why does my UTF8 data from my mod_perl application still get garbled in the web browser?

Before I begin, I would like to highlight the structure of what I am working with.
There is a text file from which a specific text is taken. The file is encoded in utf-8
Perl takes the file and prints it into a page. Everything is displayed as it should be. Perl is set to use utf-8
The web page Perl generates has the following header <meta content="text/html;charset=utf-8" http-equiv="content-type"/>. Hence it is utf-8
After the first load, everything is loaded dynamically via jQuery/AJAX. By flipping through pages, it is possible to load the exact same text, only this time it is loaded by JavaScript. The Request has following header Content-Type: application/x-www-form-urlencoded; charset=UTF-8
The Perl handler which processes the AJAX Request on the Backend delivers contents in utf-8
The AJAX Handler calls a function in our custom Framework. Before the Framework prints out the text, it is displayed correctly as "öäü". After being sent to the AJAX Handler, it reads "\x{c3}\x{b6}\x{c3}\x{a4}\x{c3}\x{bc}", which is the utf-8 representation of "öäü".
After the AJAX Handler delivers its package to the client as JSON, the webpage prints the following: "Ã¶Ã¤Ã¼".
The JS and Perl files themselves are saved in utf-8 (default setting in Eclipse)
These are the symptoms. I tried everything Google told me and I still have the problem. Does anyone have a clue what it could be? If you need any specific code snippet, tell me so and I'll try to paste it.
Edit 1
The Response Header from the AJAX Handler
Date: Mon, 09 Nov 2009 11:40:27 GMT
Server: Apache/2.2.10 (Linux/SUSE)
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset="utf-8"
200 OK
Answer
With the help of you folks and this page, I was able to track down the problem. Seems like the problem was not the encoding by itself, but rather Perl encoding my variable $text twice as utf-8 (according to the site). The solution was as simple as adding Encode::decode_utf8().
I was searching in the completely wrong place to begin with. I thank you all who helped me search in the right place :)
#spreads some upvote love#
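The double-encoding symptom can be reproduced outside Perl. The sketch below uses JavaScript purely as an illustration of the byte-level effect and is not the framework's code; it assumes the standard TextEncoder and TextDecoder APIs, and all names are made up.

```javascript
const original = "\u00f6\u00e4\u00fc"; // öäü

// First encoding: text to UTF-8 bytes (c3 b6 c3 a4 c3 bc).
const utf8Bytes = new TextEncoder().encode(original);

// The mistake: treating each byte as if it were a character of its own
// (an ISO-8859-1 reading of the bytes). This is the garbled text seen
// on the page.
const mojibake = String.fromCharCode(...utf8Bytes);

// Encoding that string again doubles the damage on the wire.
const doubleEncoded = new TextEncoder().encode(mojibake);

// The repair mirrors what decoding once does: turn the characters back
// into bytes and decode them as UTF-8 a single time.
const repaired = new TextDecoder("utf-8").decode(
  Uint8Array.from(mojibake, (ch) => ch.charCodeAt(0))
);

console.log(mojibake, doubleEncoded.length, repaired);
```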

returns the following: &#38;&#65;&#116;&#105;&#108;&#100;&#101;&#59;&#38;&#112;&#97;&#114;&#97;&#59;...
That's &Atilde;&para;&Atilde;&curren;&Atilde;&frac14;, which displays as:
Ã¶Ã¤Ã¼
Which says your AJAX handler is using an HTML-entity-encoding function for its output, that is assuming input from the ISO-8859-1 character set. You could use a character-reference encoder that knew about UTF-8 instead, but probably it will be easier just to encode the potentially-special characters <>&"' and no others.
The Request has following header Content-Type: application/x-www-form-urlencoded; charset=UTF-8
There is no such parameter as charset for the MIME type application/x-www-form-urlencoded. This will be ignored. Form-encoded strings are inherently byte-based; it is up to the application to decide what character set they are treated as (if any; maybe the application does just want bytes).
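The suggestion to escape only the potentially-special characters can be sketched as follows. It is shown in JavaScript for illustration, although the handler in question is written in Perl; escapeHtml and HTML_ESCAPES are hypothetical names.

```javascript
// Only the five characters with special meaning in HTML are replaced.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Non-ASCII characters pass through untouched, so the output does not
// depend on any assumption about the input character set.
const escaped = escapeHtml("\u00f6\u00e4\u00fc <b> & \"'");
console.log(escaped);
```

Since characters such as ö are never rewritten, the output stays correct as long as the response is sent with the charset that matches its bytes.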

This isn't an answer so much as a suggestion for debugging. The first thing that springs to mind is to try sending HTML entities like &#1234; instead of utf-8 codes. To make Perl send these there is surely a module, or you can just do
$text =~ s/(.)/"&#" . ord($1) . ";"/ge;
It seems to me that the most likely cause of this problem is that the JavaScript receiving end is not able to understand the encoded UTF-8 from Perl.
