javascript - reading local text file - charset issue - javascript

I am reading a local text file using an input of type file and FileReader.readAsText(). The problem arises when the local text file contains characters like Ü. In that case they are converted to ï¿. Of course I can set the encoding manually to ISO-8859-1 via the second parameter of FileReader.readAsText(File, encoding), but I have no clue which encoding the user has set on their side.
My question is whether there is a way to determine the encoding on the client machine?
Best regards
kkris1983

You'd need to analyze the raw bytes of the text file to make a best guess at what the encoding is. There aren't any libraries for this in JavaScript AFAIK, but you could port one from another language.
Since that isn't very robust, you should also provide a manual override, with a prompt like "Characters not showing correctly? Change encoding:"
You can also use smart defaults, for example ISO-8859-1 if you detect a Western Windows machine.
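One way to make that best guess in the browser, as a minimal sketch: read the file as raw bytes, try a strict UTF-8 decode, and fall back to a single-byte encoding when the bytes are not valid UTF-8. The names decodeWithFallback and readFileGuessingEncoding are hypothetical, and windows-1252 as the fallback is an assumption. This cannot tell one single-byte encoding from another, so the manual override is still needed.

```javascript
// Sketch: strict UTF-8 first, then a fallback single-byte encoding.
// Assumption: 'windows-1252' is a reasonable fallback for the audience.
function decodeWithFallback(bytes, fallbackEncoding = 'windows-1252') {
  try {
    // fatal: true makes decode() throw a TypeError on invalid UTF-8
    // instead of silently inserting U+FFFD replacement characters.
    const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return { encoding: 'utf-8', text };
  } catch (err) {
    const text = new TextDecoder(fallbackEncoding).decode(bytes);
    return { encoding: fallbackEncoding, text };
  }
}

// Browser-only helper (hypothetical name): takes a File object obtained
// from an input of type file and resolves to { encoding, text }.
function readFileGuessingEncoding(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () =>
      resolve(decodeWithFallback(new Uint8Array(reader.result)));
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}
```

The returned encoding name can be used to preselect the entry in the manual override, so the user sees which guess was made.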

Related

Eclipse Luna : can't remove default encoding for javascript and json

I'm trying to remove the default UTF-8 encoding of javascript / json files because I want to use the workspace default text encoding, but it doesn't seem to work with eclipse Luna.
1 - Default encoding set to UTF-8 for javascript files
2 - I remove the default encoding and click "update"
3 - If I leave and go back to check the javascript file encoding, it's back to UTF-8 (as in step 1).
Am I missing something here ?
The problem here is that the plugin which defines the JavaScript content type specifies a default encoding of 'UTF-8', when you remove the Default encoding in the dialog the encoding just reverts to this default.
This means that you can't get this to default to the workspace settings.
The class org.eclipse.core.internal.content.ContentType defines this behaviour.
I ran into this problem too in an old project. UTF-8 is the most commonly used and recommended encoding for the web, and Eclipse (or rather the plugin in it) pushes this idea on users in a rather intrusive way. If you only need to create or edit a few js files in another encoding, you can change the encoding in the properties of each file from "Default (determined from content type: UTF-8)" to another value. But if there are a lot of these files (more than 20-50), then yes, it hurts :-)
In my old project there are more than 100 js files in cp1251 encoding, but I changed the encoding (from UTF-8) in the properties only for the few files I edit. This does not affect the build of the project.

Making JettyRunner serve up static content (like CSS and JavaScript) with UTF-8 encoding

I'm running a Java project via Jetty Runner (7.6.15). I've been trying to play with D3.js lately, and I needed to serve it up unminified in order to debug some mystery problem. Well, D3 has some non-Latin Unicode characters in some variable names (like var π = Math.PI).
When I try to use that unminified file, I get errors because my browser thinks the character encoding is ISO-8859-1 instead of UTF-8. Sure enough, the "Content-Type" header in the server response has no character set.
I'm launching Jetty Runner with LANG and LC_ALL both set to "en_US.UTF-8", and I'm passing a system property file.encoding set to "UTF-8" as well on the Java command line. That apparently is not enough. I can look at the source file on my host and it's definitely intact; in fact if I load the JavaScript file directly from the browser address bar and manually tell Firefox that it's Unicode, then it looks fine.
I'm not launching Jetty Runner with a configuration file because I have no idea how to do that. It seems to add an explicit ISO-8859-1 marker to the content type header on the main HTML page (it's a single-page application), and that of course overrides the <meta charset> tag in the document head.
So is there a way to do this? Sometimes I feel like I'm one of the only 12 people on earth who use Jetty Runner :)
This turned out to be as simple as a clause in my web.xml:
<mime-mapping>
    <extension>js</extension>
    <mime-type>application/javascript; charset=UTF-8</mime-type>
</mime-mapping>
I don't know how something this obvious escaped my notice. It doesn't even have much to do with Jetty per se.

Javascript character encoding

In an external javascript file I have a function that is used to append text to table cells (within the HTML doc that the javascript file is added to), text that can sometimes have Finnish characters (such as ä). That text is passed as an argument to my function:
content += addTableField(XML, 'Käyttötarkoitus', 'purpose', 255);
The problem is that diacritics such as "ä" get converted to some other bogus characters, such as "�". I see this when viewing the HTML doc in a browser. This is obviously not desirable, and is quite strange as well since the character encoding for the HTML doc is UTF-8.
How can I solve this problem?
Thanks in advance for helping out!
The file that contains content += addTableField(XML, 'Käyttötarkoitus', 'purpose', 255); is not saved in UTF-8 encoding.
I don't know what editor you are using but you can find it in settings or in the save dialog.
If you can't get this to work you could always write out the literal code points in javascript:
content += addTableField(XML, 'K\u00E4ytt\u00f6tarkoitus', 'purpose', 255);
credit: triplee
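If there are many such strings, the escapes can be generated rather than typed by hand. A minimal sketch, where escapeNonAscii is a hypothetical helper name and not part of any library:

```javascript
// Replace every non-ASCII UTF-16 code unit with a \uXXXX escape so the
// source file stays pure ASCII and no longer depends on its saved encoding.
function escapeNonAscii(str) {
  return str.replace(/[^\x00-\x7f]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}
```

Run it once over the literal and paste the result into the script. Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which JavaScript string literals accept.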
To check out the character encoding announced by a server, you can use Firebug (in the Info menu, there’s a command for viewing HTTP headers). Alternatively, you can use online services like Web-Sniffer.
If the headers for the external .js file specify a charset parameter, you need to use that encoding, unless you can change the relevant server settings (perhaps a .htaccess file).
If they lack a charset parameter, you can specify the encoding in the script element, e.g. <script src="foo.js" charset="utf-8">.
The declared encoding should of course match the actual encoding, which you can normally select when you save a file (using “Save As” command if needed).
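The header check can also be done from script. The following is a rough sketch, assuming the script is served from the same origin as the page; charsetFromContentType and declaredCharset are hypothetical names:

```javascript
// Extract the charset parameter from a Content-Type header value,
// or return null when the header is missing or has no charset.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  const match = /;\s*charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Fetch a resource and report the charset its server announces.
// Assumption: the URL is same-origin (or the server allows CORS).
async function declaredCharset(url) {
  const response = await fetch(url);
  return charsetFromContentType(response.headers.get('Content-Type'));
}
```

A null result corresponds to the case where the headers lack a charset parameter.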
The character encoding of the HTML file / doc does not matter for any external resource.
You will need to deliver the script file with UTF-8 character encoding. If it was saved as such, your server config is bogus.

Images with Base64 Automatism - Javascript (with PHP?)

I've heard a lot about Base64 encoding for images in web design.
And I've seen a lot of developers use it for their headlines with: ...
Is there any automation (with JavaScript) behind it?
Or have they all converted and inserted the images by hand? (I can't believe that.)
Example: http://obox-inkdrop.tumblr.com/ (- Headlines)
First of all, the encoding has to be done on the server side, either:
automated with a script, that reads the original image file, and returns the base64 encoded string to inject it into the HTML that's being generated
or by hand, and directly placed into the HTML.
The base64 encoding cannot be done on the client-side, as the goal is to avoid sending the image file from the server to the browser (to minimize the number of HTTP requests).
Depending on the language that's used on the server side, you'll probably find some function to do base64 encoding.
In PHP, you might be interested in base64_encode()
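For comparison, here is the same idea as a minimal sketch for a Node.js server. That is an assumption made for illustration (the answer itself only mentions PHP), and toDataUri is a hypothetical helper name:

```javascript
// Node.js sketch: build a data URI that can be injected into generated HTML,
// for example as the src of an img element or inside a CSS url().
function toDataUri(bytes, mimeType) {
  // Buffer is a Node.js global; toString('base64') does the encoding.
  return 'data:' + mimeType + ';base64,' + Buffer.from(bytes).toString('base64');
}
```

The bytes would typically come from reading the original image file on the server, for instance with fs.readFileSync, and the MIME type has to match the actual image format.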

JS Problem with encoding decoding UTF?

I'm dealing with a JSON file that I cannot modify; I have to keep it as is.
It contains text with all the apostrophes converted to ’, and other special chars here and there...
What is that? Unicode? How can I convert it to the regular apostrophe?
I already placed the UTF-8 META tag in the header, but it doesn't seem to change anything...
What MIME type is your JSON response being sent with? (Look in the headers in Firebug or the Developer Console.) It seems that one of these steps is using a different encoding:
The JSON string generated by the web server
The mime type encoding sent along with the response
The mime type of your HTML page
The mime type for your JavaScript code
If you supply the community with actual code, or better yet a working reproducible test case, then the community can better help you.
what is that?
It is an attempt to interpret data stored in one character encoding as data stored in a different character encoding.
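The effect is easy to reproduce. The sketch below encodes a string as UTF-8 and then decodes those bytes with the wrong single-byte encoding; misdecode is a hypothetical name, and windows-1252 is assumed as the wrong encoding because it is a common culprit:

```javascript
// Simulate the mistake: text is stored as UTF-8 bytes, but the reader
// interprets those bytes as a different (single-byte) encoding.
function misdecode(text, wrongEncoding = 'windows-1252') {
  const utf8Bytes = new TextEncoder().encode(text); // always UTF-8
  return new TextDecoder(wrongEncoding).decode(utf8Bytes);
}
```

With this, "ä" (two bytes in UTF-8) comes back as the two characters "Ã¤", and a typographic apostrophe (U+2019, three bytes in UTF-8) comes back as three characters. The data itself is intact; only the interpretation is wrong, which is why fixing the declared encoding fixes the display.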
To ensure everything displays correctly you need to:
Pick an encoding (UTF-8 is a good bet)
Store everything in that encoding
Configure your editor to use it!
Configure your database (if applicable) to use it!
Ensure any server side code you use expects UTF-8 input and gives UTF-8 output
Configure your webserver to include charset=utf-8 on the Content-Type HTTP header
The W3C has a good introductory article on the subject, which has links to lots of useful further reading at the end.
