Javascript FileReader readAsText function not understanding UTF-8 encoded characters like ä and ö - javascript

I have searched for this a lot and nothing helped me. I have an import-from-CSV feature, and JavaScript code reads the CSV content line by line. Characters like ä and ö are simply not recognized. FileReader readAsText defaults to UTF-8 encoding, but for some reason it is not working in this case. Here is my code.
reader = new FileReader()
reader.onload = (e) =>
  result = e.target.result
  console.log result
  # file content
  fileContent = result.split("\r")
reader.readAsText(e.target.files.item(0))
I have tried specifying the encoding explicitly, as below, but whatever I put there did not help.
encoding = "UTF-8"
reader.readAsText(e.target.files.item(0), encoding)

I got this to work by using the ISO Latin 4 (ISO-8859-4) encoding:
reader.readAsText(e.target.files.item(0), 'ISO-8859-4');
That should work for you, but remember that this particular encoding is only appropriate when the file really was saved in it; it covers just certain Scandinavian and Baltic characters such as ä and ö.
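If you cannot be sure which encoding the exported CSV uses, one option is to read it as UTF-8 first and fall back to ISO-8859-4 when the result contains the U+FFFD replacement character, which marks bytes that could not be decoded. A minimal sketch (the encoding list and the handleText callback are illustrative, not from the original post):
function readCsvText(file, handleText) {
  var encodings = ['UTF-8', 'ISO-8859-4'];
  function tryEncoding(i) {
    var reader = new FileReader();
    reader.onload = function (e) {
      var text = e.target.result;
      // \uFFFD marks bytes that could not be decoded with this encoding
      if (text.indexOf('\uFFFD') !== -1 && i + 1 < encodings.length) {
        tryEncoding(i + 1);
      } else {
        handleText(text);
      }
    };
    reader.readAsText(file, encodings[i]);
  }
  tryEncoding(0);
}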

Related

Reading a Windows-1252 file in Node JS

I'm using Node to read a text document using readFile, and within that document is the character
�
This is a Windows-1252 character, but the file is being decoded in JavaScript as UTF-8 automatically, so it comes out wrong. The correct character should actually display as Å.
Is there a way I can convert this character from UTF-8 to Windows-1252 to render the correct character?
The file is being read using Node's readFile method as UTF-8, due to the apparent lack of support for the necessary encoding.
fs.readFile(`${logDirectory}myText.txt`, "utf-8", (err, text) => { ... });
I've tried a few options such as iconv-lite and legacy-decode but neither seem to return the correct result.
Any guidance appreciated.
You can try reading the file with the latin1 encoding, since Windows-1252 is based on it:
fs.readFile(`${logDirectory}myText.txt`, 'latin1', (err, text) => { ... });
Also note that in Node.js the UTF-8 encoding is canonically named utf8 rather than utf-8 (though both spellings are accepted), as described here.
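Latin1 and Windows-1252 differ only in the 0x80–0x9F range, so reading as latin1 can still mangle characters such as € or curly quotes. A more precise sketch, assuming the iconv-lite package is installed, reads the raw bytes and decodes them explicitly; the earlier iconv-lite attempt likely failed because the text had already been decoded as UTF-8 before iconv-lite saw it:
const fs = require('fs');
const iconv = require('iconv-lite');

// Omit the encoding argument so readFile yields a raw Buffer
fs.readFile(`${logDirectory}myText.txt`, (err, buf) => {
  if (err) throw err;
  const text = iconv.decode(buf, 'windows-1252');
  console.log(text);
});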

Parse UTF-8 XML in javascript

I'm trying to load and parse a simple UTF-8-encoded XML file in JavaScript using Node and the xpath and xmldom packages. There are no XML namespaces used, and the same XML parses fine when converted to ASCII. I can see in the VS Code debugger that the string has embedded spaces between each character (surely due to loading the UTF-8 file incorrectly), but I can't find a way to properly load and parse the UTF-8 file.
Code:
var xpath = require('xpath')
, dom = require('xmldom').DOMParser;
const fs = require('fs');
var myXml = "path_to_my_file.xml";
var xmlContents = fs.readFileSync(myXml, 'utf8').toString();
// this line causes errors parsing every single tag as the tag names have spaces in them from improper utf-8 decoding
var doc = new dom().parseFromString(xmlContents, 'application/xml');
var cvNode = xpath.select1("//MyTag", doc);
console.log(cvNode.textContent);
The code works fine if the file is ASCII (textContent has the proper data), but if it is UTF-8 then there are a number of parsing errors and cvNode is undefined.
Is there a proper way to parse UTF-8 XML in node/javascript? I can't for the life of me find a decent example.
When you see additional white space between each letter, this suggests that the file isn't actually encoded as UTF-8 but uses a 16-bit Unicode encoding, where every ASCII character is stored as two bytes: the character itself followed by a zero byte, which shows up as a spurious extra character when misdecoded.
Try 'utf16le'.
For a list of supported encodings see Buffers and Character Encodings.
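One way to handle both encodings is to sniff the byte-order mark before decoding. A minimal sketch, covering only the common case of UTF-16LE with a BOM (myXml is the path variable from the question):
const fs = require('fs');

const buf = fs.readFileSync(myXml);
// UTF-16LE files typically begin with the BOM bytes 0xFF 0xFE
const isUtf16le = buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe;
// Strip the BOM character itself so it doesn't confuse the XML parser
const xmlContents = buf.toString(isUtf16le ? 'utf16le' : 'utf8').replace(/^\uFEFF/, '');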

JavaScript atob is different than Notepad++ Base64 Decode

I am receiving the content of a zip file (from an API) as a Base64-encoded string.
If I paste that string into Notepad++ and go
Plugins > MIME Tools > Base64 Decode
and save it as test.zip, it becomes a valid zip file, I can open it.
Now, I am trying to achieve the same thing in JavaScript.
I have tried atob(), and probably everything mentioned in the answers here, as well as the code from the Mozilla docs.
atob produces similar content, but some characters are decoded differently (hence it becomes an invalid zip file). The other methods throw an invalid URI error.
How can I reproduce Notepad++ behaviour in JavaScript?
window.atob is only good for decoding data that fits in an ordinary string: it turns Base64 into a JavaScript string in which each character holds one byte value (0–255). That string is not a real byte buffer, and as soon as it is treated as text, for example when it is written out as UTF-8, every byte above 0x7F gets re-encoded as a multi-byte sequence and no longer matches the original binary. This is the reason why your zip archive ends up invalid.
The moment you do the following:
var data = window.atob(encoded_data)
...you have a textual representation of your binary data in the string referenced by data, not the raw bytes themselves.
You should decode your binary data directly into an ArrayBuffer; window.atob alone is not a good fit for this. Here is a function, from https://stackoverflow.com/a/21797381/3508516, which converts Base64-encoded data directly into an ArrayBuffer:
function _base64ToArrayBuffer(base64) {
  var binary_string = window.atob(base64);
  var len = binary_string.length;
  var bytes = new Uint8Array(len);
  for (var i = 0; i < len; i++) {
    bytes[i] = binary_string.charCodeAt(i);
  }
  return bytes.buffer;
}
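To reproduce the Notepad++ "save as test.zip" step in the browser, one option is to wrap the resulting buffer in a Blob and hand it to a download link; a minimal sketch (the link element and file name are illustrative):
var buffer = _base64ToArrayBuffer(encoded_data);
var blob = new Blob([buffer], { type: 'application/zip' });
var link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'test.zip';
link.click();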

Converting HTML to base64 is not encoding certain characters correctly

I am trying to convert an HTML string to base64, which I will then display in an iframe, but I am running into some issues. When I encode the string as base64 and view it as a data URI, it displays non-breaking spaces as 'Â', and other characters such as apostrophes and dashes are displayed as seemingly random ASCII characters.
This is how I am converting the string to base64:
var blob = new Blob([message], {
  type: contentType
});
var reader = new FileReader();
reader.onload = function (e) {
  let result = e.target.result;
};
reader.readAsDataURL(blob);
Any ideas on how to prevent this? Thanks!
There is no reason to do what you're doing. Data URIs are almost always the wrong choice: Base64 encoding adds 33% size overhead and wastes CPU, and data URIs are subject to strict size limits.
Use a Blob/Object URL instead:
iframe.src = URL.createObjectURL(blob);
https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL
As to your UTF-8 encoding problem, the issue is likely in the file itself: when no server sets the character set in the response headers, as is the case here, it needs to be declared in the document:
<meta charset="UTF-8">
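Putting the two pieces together, a minimal sketch (message and iframe are the variables from the question; prepending the meta tag assumes the HTML string does not already declare a charset):
var html = '<meta charset="UTF-8">' + message;
var blob = new Blob([html], { type: 'text/html;charset=utf-8' });
iframe.src = URL.createObjectURL(blob);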

Javascript export CSV encoding utf-8 issue

I need to export a JavaScript array to a CSV file and download it. I did it, but characters like 'ı,ü,ö,ğ,ş' come out as 'ı ü ö ÄŸ ÅŸ' in the CSV file. I have tried many solutions recommended on this site, but none worked for me.
I added my code snippet below. Can anyone solve this problem?
var csvString = 'ı,ü,ö,ğ,ş';
var a = window.document.createElement('a');
a.setAttribute('href', 'data:text/csv; charset=utf-8,' + encodeURIComponent(csvString));
a.setAttribute('download', 'example.csv');
a.click();
This depends on which program opens the example.csv file. In a text editor the encoding will be UTF-8 and the characters will not be malformed. But Excel's default encoding for CSV is ANSI, not UTF-8, so unless Excel is forced to use UTF-8 instead of ANSI, the characters will be malformed.
Excel can be forced to use UTF-8 for CSV by putting a BOM (Byte Order Mark) as the first characters in the file. The UTF-8 BOM is the byte sequence 0xEF,0xBB,0xBF. So one could think that simply putting "\xEF\xBB\xBF" as the first bytes of the string would be the solution. But surely that would be too simple, wouldn't it? ;-) The problem with this is how to keep JavaScript from treating those bytes as three separate characters. The "solution" is to use the "universal BOM" "\uFEFF", as mentioned in Special Characters (JavaScript).
Example:
var csvString = 'ı,ü,ü,ğ,ş,#Hashtag,ä,ö';
var universalBOM = "\uFEFF";
var a = window.document.createElement('a');
a.setAttribute('href', 'data:text/csv; charset=utf-8,' + encodeURIComponent(universalBOM+csvString));
a.setAttribute('download', 'example.csv');
window.document.body.appendChild(a);
a.click();
See also Adding UTF-8 BOM to string/Blob.
Using this, the encoding will be correct. Nevertheless, this only works properly if comma is the default list separator in your Windows locale settings. If it is not, if for example semicolon is the default list separator, then all content ends up in the first column without being split at the commas, and you would have to use semicolon as the delimiter in the CSV as well. But that is another problem, and it leads to the conclusion of not using CSV at all but instead using libraries which can directly create Excel files (*.xls or *.xlsx).
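The same BOM trick also works with a Blob and an object URL instead of a data URI, which sidesteps data-URI length limits; a minimal sketch building on the snippet above:
var blob = new Blob(["\uFEFF" + csvString], { type: 'text/csv;charset=utf-8' });
var a = window.document.createElement('a');
a.href = URL.createObjectURL(blob);
a.download = 'example.csv';
window.document.body.appendChild(a);
a.click();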
