In HTML, the script element has an optional charset attribute.
What is the purpose of it?
When is it useful?
If your JavaScript files are encoded differently from the page that includes them, you can use the charset attribute to tell the browser how to interpret them. For example, the page might use Latin-1 while the JS file was saved as UTF-8.
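In that case the include could look like this (the file name is just an illustration):
<script src="messages.js" charset="utf-8"></script>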
The purpose of the charset attribute is to specify the encoding of the external script in cases where the encoding is not specified at the HTTP protocol level. It is not meant to override encoding information in HTTP headers, and it does not do that.
This is useful when the author cannot control HTTP headers and the headers do not specify character encoding. It is also useful for offline files, such as in a local copy of a web page accessed directly, not via an HTTP server, so that no HTTP headers exist.
In practice, it is not very useful. If you need to use non-ASCII characters in a JavaScript file, you can use UTF-8 encoding. If you use UTF-8 with a leading BOM, the BOM acts as a useful indicator that lets browsers infer the encoding. But it does not hurt to additionally use charset=utf-8.
Each JavaScript file is a separate resource from the page; after all, you can even load JS from some remote author's server that otherwise has no relation to your page at all. Just as with any other external resource, you can manually specify "charset" if the remote server returns the wrong charset for some reason, or just to be sure.
Also, if you have write access to this JS file yourself, you may want to replace all non-ASCII characters with Unicode escape sequences (\uXXXX); this guarantees that the symbols will always be interpreted correctly, no matter what encoding is specified in the headers. Some JS minifiers, like Google Closure Compiler, can do this for you automatically.
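For example, a string with accented characters can be written using only ASCII like this:
alert("Pr\u00E9nom doit \u00EAtre rempli"); // identical to alert("Prénom doit être rempli")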
I am allowing users to download files from my application. For that I am explicitly setting "Content-Disposition" as "inline" or "attachment" based on the type of file. This is rather manual right now: for PDF files I set it to "inline", but for HTML files I set it to "attachment".
Is there a way to automatically decide the value of "Content-Disposition" in Express based on file type?
If I do not send a "Content-Disposition" header, it currently seems to me that the response is treated as if it had "Content-Disposition: inline". Is this observation correct, or is there something more to it?
If by default the browser tries to execute/preview the files (based on point 2), what does it mean for security when you allow downloading HTML files which can execute JavaScript?
Is there a way to automatically decide the value of "Content-Disposition" in Express based on file type?
You could write middleware that inspects the response and modifies it.
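A minimal sketch of that idea (the /files mount point, the extension check, and the directory name are just assumptions for illustration):

const express = require('express');
const path = require('path');
const app = express();

// Decide the disposition from the file extension before the static handler runs.
app.use('/files', function (req, res, next) {
  const ext = path.extname(req.path).toLowerCase();
  const disposition = (ext === '.html' || ext === '.htm') ? 'attachment' : 'inline';
  res.set('Content-Disposition', disposition);
  next();
});

app.use('/files', express.static(path.join(__dirname, 'files')));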
If I do not send a "Content-Disposition" header, it currently seems to me that the response is treated as if it had "Content-Disposition: inline". Is this observation correct, or is there something more to it?
See MDN which says: "The first parameter in the HTTP context is either inline (default value, indicating it can be displayed inside the Web page, or as the Web page)…"
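In other words, omitting the header behaves like the first line below, while the second forces a download (the filename is just an example):
Content-Disposition: inline
Content-Disposition: attachment; filename="report.pdf"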
If by default the browser tries to execute/preview the files (based on point 2), what does it mean for security when you allow downloading HTML files which can execute JavaScript?
Not a lot unless you are serving up JavaScript that you (the website author) do not trust.
If you need to serve HTML documents which might contain JavaScript you don't trust, then serve them from a different origin (to use the Same Origin Policy to sandbox them) and/or implement a Content Security Policy to ban them from executing JavaScript.
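For example, serving those documents with a policy like this would block any script they contain from running:
Content-Security-Policy: script-src 'none'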
So I'm trying to show UTF-8 characters coming from JavaScript.
I should have it all:
header
<meta charset="utf-8">
include js
<script type="text/javascript" src="x.js" charset="utf-8"></script>
File x.js is saved as UTF-8 (and also the other files)
It works with all my PHP files, just not when it comes from a simple alert in JavaScript.
alert("Prénom doit être rempli");
Instead the famous '?' characters are shown in the alert box.
Anything I've forgotten?
Here's what you need to do: open your file in Notepad and save it again (Save As), this time selecting UTF-8 in the save dialog. Your issue will be solved.
From the spec:
The charset attribute gives the character encoding of the external script resource. The attribute must not be specified if the src attribute is not present. If the attribute is set, its value must be an ASCII case-insensitive match for one of the labels of an encoding, and must specify the same encoding as the charset parameter of the Content-Type metadata of the external file, if any.
(My emphasis on the final clause.)
So you need to ensure that your server is sending the correct Content-Type header — either with no charset, or with charset=utf-8.
If your server is already sending the charset as part of the Content-Type, that's a good thing: Just remove the charset attribute from the script tag.
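For a JavaScript file served as UTF-8, the response header would look something like this:
Content-Type: text/javascript; charset=utf-8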
[UPDATE] I found these two links useful:
http://mrcoles.com/blog/canvas-composite-operations-demo-animation/
http://tutorials.jenkov.com/html5-canvas/composition.html
Recently, while learning canvas, I found there is more than one way to specify the image src:
1. You can give an image URI, like: www.XYZ.com/abc.png
2. You can give a data URI, like: data:image/svg+xml;base64,'+ hexcode;
3. You can give a data URI from a canvas, like: canvas.toDataURL("image/png");
I am a little confused about the difference among them and am wondering how the browser processes them.
Thanks
It's simply a matter of what the browser supports. The browser will look at the protocol part of the string (the scheme) and, if it recognizes it, try to interpret it. For example:
If it starts with "http://" or "https://", it will parse the rest of the string with that as a basis, then try to connect to the server using the HTTP(S) protocol and communicate over it. The protocol itself is specified in RFCs. If all is OK, data is transferred from the server to the browser, which then goes on to interpret the received data itself (this can happen while the data is loading or after it has loaded completely).
If the string starts with "data:", the browser will assume a data URI (if it supports this scheme). If not, it will consider the source invalid. As it does not need to connect to any external resource in this case, it will validate the content and use it if valid. The data URI will be converted to binary data (the base-64 representation will be decoded back to binary form).
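The general shape of a data URI is:
data:[<mediatype>][;base64],<data>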
(2. and 3. are the same, BTW. It's not hexcode that is appended, but a base-64 encoded string, i.e. an ASCII representation. Other representations are possible but not common.)
Then there are other schemes the browser may support, such as blob: URLs; some browsers will allow FTP (ftp://), and some allow file:// given certain restrictions.
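As a small sketch (variable names are illustrative), all three forms from the question end up assigned to the same property, and the browser decides how to fetch or decode them:

const img = new Image();

// 1. A regular URL: the browser opens an HTTP(S) connection and downloads the bytes.
img.src = 'http://www.XYZ.com/abc.png';

// 2. and 3. A data URI: the base-64 encoded bytes are embedded in the string itself,
// whether written by hand or produced from a canvas.
const canvas = document.createElement('canvas');
img.src = canvas.toDataURL('image/png');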
I am integrating Yahoo OpenID for my site. My site runs in different languages (en, jp, cn, etc.). When I log in from the English site with Yahoo OpenID it works fine, but when I try the same thing from the Japanese or Chinese site, it does not redirect me to the Yahoo OpenID login.
Each time I get the error below from JavaScript:
Error: The character encoding of the plain text document was not
declared. The document will render with garbled text in some browser
configurations if the document contains characters from outside the
US-ASCII range. The character encoding of the file needs to be
declared in the transfer protocol or file needs to use a byte order
mark as an encoding signature. Source File:
http://uatstorefrontjpcr.mobi-book.com/ReturnFromSocial/LogOnYahoo
Line: 0
Can anyone suggest what to do?
I have used SocialAuth-net.dll for this purpose. I have set all the required wrappers in web.config. The same code works fine with Google and Facebook OpenID.
Your web server is probably sending back a response with a Content-Type of text/plain. When a web browser receives a response with that content type, it doesn't know what encoding should be used to decode it; since you haven't told it how to decode it, different browsers might choose different ways.
The solution is to provide an explicit encoding. For example, if you know that the text is UTF-8 encoded, then you could provide it in a header like so:
Content-Type: text/plain; charset=UTF-8
According to an informational page from the W3C, you can get ASP.Net to include that bit in the header using Response.ContentEncoding. Again using UTF-8 as an example, you can set it like so:
Response.ContentEncoding = Encoding.UTF8;
I want to read a file from my server with JavaScript and display its contents in an HTML page.
The file is in the ANSI charset, and it has Romanian characters. I want to display those characters the way they are :D not as those black symbols.
So I think my problem is the charset. I have a GET request that takes the content of the file, like this:
function IO(U, V) { // LA MOD String Version. A tiny ajax library. by, DanDavis
    var X = !window.XMLHttpRequest ? new ActiveXObject('Microsoft.XMLHTTP') : new XMLHttpRequest();
    X.open(V ? 'PUT' : 'GET', U, false);
    X.setRequestHeader('Content-Type', 'Charset=UTF-8');
    X.send(V ? V : '');
    return X.responseText;
}
As far as I know the Romanian characters are included in the UTF-8 charset, so I set the charset of the request header to UTF-8. The file is in UTF-8 format, and I have the meta tag that tells the browser that the page has UTF-8 content:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
So if I request the file directly from the server, the browser shows me the Romanian characters, but if I display the content of the file through this script, I see only symbols instead of characters.
So what am I doing wrong?
Thank you!
PS: I want this to work in Firefox at least, not necessarily in all browsers.
While my initial assumption was the same as T.J. Crowder's, a quick chat established that the OP uses some hosting service and cannot easily change the Content-Type headers.
The files were sent as text/plain or text/html without any charset parameter, hence the browser interprets them as UTF-8 (which is the default).
So saving the files in UTF-8 (instead of ANSI/Windows-1252) did the trick.
You need to ensure that the HTTP response returning the file data has the correct charset identified on it. You have to do that server-side, I don't think you can force it from the client. (When you set the content type in the request header, you're setting the content type of the request, not the response.) So for instance, the response header from the server would be along the lines of:
Content-Type: text/plain; charset=windows-1252
...if by "ANSI" you mean the Windows-1252 charset. That should tell the browser what it needs to do to decode the response text correctly before handing it to the JavaScript layer.
One problem, though: As far as I can tell, Windows-1252 doesn't have the full Romanian alphabet. So if you're seeing characters like Ș, ș, Ţ, ţ, etc., that suggests the source text is not in Windows-1252. Now, perhaps it's okay to drop the diacriticals on those in Romanian (I wouldn't know), and so if your source text just uses S and T instead of Ș and Ţ, etc., it could still be in Windows-1252. Or it may be ISO-8859-1 or ISO-8859-2 (both of which drop some diacriticals) or possibly ISO-8859-16 (which has full Romanian support). Details here.
So the first thing to do is determine what character set the source text is actually in.