How can I get the file encoding format with JS?

When I use readAsText(file, 'utf-8') I ran into a problem: because I parse the file as UTF-8, the result is likely to be wrong when the user's file is not actually UTF-8 encoded. How can I use JS to detect the encoding of the user's file, so I can remind them to upload a UTF-8 file? On Windows I check whether the first 3 bytes of the file are a BOM, but macOS does not add a BOM to UTF-8 files, so can anyone help? Thank you very much.
const encodeReader = new FileReader();
encodeReader.onload = function (e) {
  const view = new Uint8Array(e.target.result); // no TypeScript cast needed in plain JS
  let hex = "";
  for (const num of view) {
    hex += num.toString(16).padStart(2, "0"); // zero-pad so every byte is two hex digits
  }
  if (hex.toUpperCase() === "EFBBBF") {
    console.log("it is utf-8");
  }
};
encodeReader.readAsArrayBuffer(file.slice(0, 3));
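There is no reliable way to detect an arbitrary encoding from JavaScript, but you can at least verify whether the bytes are valid UTF-8 even without a BOM: TextDecoder in fatal mode throws on any invalid UTF-8 sequence. A minimal sketch (the isLikelyUtf8 helper name is my own):

// Sketch: validate UTF-8 without relying on a BOM.
// TextDecoder with { fatal: true } throws a TypeError on invalid byte sequences.
function isLikelyUtf8(arrayBuffer) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(arrayBuffer);
    return true;  // decoded cleanly, so the bytes are valid UTF-8
  } catch (e) {
    return false; // invalid UTF-8; likely some other encoding
  }
}

const reader = new FileReader();
reader.onload = (e) => {
  if (!isLikelyUtf8(e.target.result)) {
    console.log("Please upload a UTF-8 encoded file.");
  }
};
reader.readAsArrayBuffer(file);

Note that plain ASCII also passes this check (ASCII is a subset of UTF-8), and a file in another encoding can still happen to be valid UTF-8, so treat this as a heuristic rather than a guarantee.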

Related

I want to decompress a GZIP string with JavaScript

I have this GZIPed string: H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==
I created that with this website: http://www.txtwizard.net/compression
I have tried using pako to ungzip it.
import { ungzip } from 'pako';
const textEncoder = new TextEncoder();
const gzipedData = textEncoder.encode("H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==");
console.log('gzipeddata', gzipedData);
const ungzipedData = ungzip(gzipedData);
console.log('ungziped data', ungzipedData);
The issue is that Pako throws the error: incorrect header check
What am I missing here?
The "H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==" is a base64 encoded string, you first need to decode that into a buffer.
textEncoder.encode just encodes that base64 encoded string into a byte stream.
How to do that depend on whether you are in a browser or on nodejs.
node.js version
To convert the unzipped data to a string you further have use new TextDecoder().decode()
For node you will use Buffer.from(string, 'base64') to decode the base64 encoded string:
import { ungzip } from 'pako';

// decode the base64 encoded data
const gzippedData = Buffer.from("H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==", "base64");
console.log('gzippeddata', gzippedData);
const ungzippedData = ungzip(gzippedData);
console.log('ungzipped data', new TextDecoder().decode(ungzippedData));
browser version
In the browser, you have to use atob, and you need to convert the decoded data to a Uint8Array using e.g. Uint8Array.from.
The conversion I used was taken from Convert base64 string to ArrayBuffer, you might need to verify if that really works in all cases.
// decode the base64 encoded data
const gzippedData = atob("H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==")
const gzippedDataArray = Uint8Array.from(gzippedData, c => c.charCodeAt(0))
console.log('gzippeddata', gzippedDataArray);
const ungzippedData = pako.ungzip(gzippedDataArray);
console.log('ungzipped data', new TextDecoder().decode(ungzippedData));
<script src="https://cdnjs.cloudflare.com/ajax/libs/pako/2.0.4/pako.min.js"></script>
This string is base64-encoded.
You first need to decode it to a buffer:
const gzippedString = 'H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==';
const gzippedBuffer = Buffer.from(gzippedString, 'base64'); // new Buffer() is deprecated
Then you can un(g)zip it:
const unzippedBuffer = ungzip(gzippedBuffer);
The result of ungzip is a Uint8Array. If you want to convert it back to a string you'll need to decode it again:
const unzippedString = new TextDecoder('utf8').decode(unzippedBuffer);
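As a side note (my addition, not part of the answers above): in Node.js you don't strictly need pako, since the built-in zlib module can gunzip the buffer directly. A minimal sketch:

import { gunzipSync } from 'zlib';

// Decode the base64 string to a Buffer, then gunzip with Node's built-in zlib.
const gzippedBuffer = Buffer.from('H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==', 'base64');
console.log(gunzipSync(gzippedBuffer).toString('utf8'));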

JavaScript: how could we read a local text file with accented letters in it?

I need to read a local file, and I have been studying some threads that show various ways to handle it; in most cases there is an input file element.
I would need to load the file directly through code.
I have studied this thread:
How to read a local text file?
And I could read it.
The surprising part came when I tried to split the lines and words: � appeared in place of the accented letters.
The code I have right now is:
myFileReader.js
function readTextFile(file) {
  var rawFile = new XMLHttpRequest();
  rawFile.open("GET", file, false);
  rawFile.onreadystatechange = function () {
    if (rawFile.readyState === 4) {
      if (rawFile.status === 200 || rawFile.status == 0) {
        var allText = rawFile.responseText; // declared, to avoid an implicit global
        console.log('The complete text is', allText);
        let lineArr = intoLines(allText);
        let firstLineWords = intoWords(lineArr[0]);
        let secondLineWords = intoWords(lineArr[1]);
        console.log('Our first line is: ', lineArr[0]);
        let atlas = {};
        for (let i = 0; i < firstLineWords.length; i++) {
          console.log(`Our ${i} word in the first line is : ${firstLineWords[i]}`);
          console.log(`Our ${i} word in the SECOND line is : ${secondLineWords[i]}`);
          atlas[firstLineWords[i]] = secondLineWords[i];
        }
        console.log('The atlas is: ', atlas);
        let atlasJson = JSON.stringify(atlas);
        console.log('Atlas as json is: ', atlasJson);
        download(atlasJson, 'atlasJson.txt', 'text/plain');
      }
    }
  };
  rawFile.send(null);
}

function download(text, name, type) {
  var a = document.getElementById("a");
  var file = new Blob([text], { type: type });
  a.href = URL.createObjectURL(file);
  a.download = name;
}

function intoLines(text) {
  // split the text on "\n", saving each line as an array element
  var lineArr = text.split('\n');
  return lineArr;
}

function intoWords(lines) {
  var wordsArr = lines.split('" "');
  return wordsArr;
}
The question is: how can we handle those special characters, i.e. the accented vowels?
I ask this because even in the IDE the question marks appeared when the .txt was loaded as UTF-8; when I changed to ISO-8859-1 it loaded fine.
Also I have studied:
Read UTF-8 special chars from external file using Javascript
Convert special characters to HTML in Javascript
Reading a local text file from a local javascript file?
In addition, could you explain whether there is a shorter way to load files in client-side JavaScript? For example, Java has FileReader / FileWriter / BufferedWriter. Is there something similar in JavaScript?
Thank you for your help!
It sounds like the file is encoded with ISO-8859-1 (or possibly the very similar Windows-1252).
There's no BOM or equivalent for those encodings.
The only solutions I can see are:
Use a (local) server and have it return the HTTP Content-Type header with the encoding identified as a charset, e.g. Content-Type: text/plain; charset=ISO-8859-1
Use UTF-8 instead (e.g., open the file in an editor as ISO-8859-1, then save it as UTF-8), as that's the default encoding for XHR response bodies.
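A third, client-side option worth sketching (my addition, not from the original answer): XMLHttpRequest's overrideMimeType() forces the charset used to decode the response body; "words.txt" below is a placeholder path:

// Sketch: force the XHR response to be decoded as ISO-8859-1
// instead of the UTF-8 default.
var rawFile = new XMLHttpRequest();
rawFile.open("GET", "words.txt"); // placeholder path
rawFile.overrideMimeType("text/plain; charset=ISO-8859-1");
rawFile.onload = function () {
  console.log(rawFile.responseText); // accented letters now intact
};
rawFile.send();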
Put your text in an .html file with the corresponding content type, for example:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Enclose the text between two markers ("####" in my example), or put it in a div.
Read the HTML page, extract the content, and select the text:
var newWindow = window.open(url); // ..
var content = newWindow.document.body.innerHTML;
var strSep = "####";
var x = content.indexOf(strSep);
x = x + strSep.length;
var y = content.lastIndexOf(strSep);
var points = content.slice(x, y);
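Regarding the request for a shorter way to load files in client-side JavaScript: a sketch (my addition) using fetch plus TextDecoder, which lets you name the charset explicitly; "words.txt" is again a placeholder:

// Sketch: fetch the raw bytes, then decode with an explicit charset.
// TextDecoder understands labels like 'utf-8', 'iso-8859-1', 'windows-1252'.
async function readTextFileWithCharset(url, charset = "iso-8859-1") {
  const response = await fetch(url);
  const buffer = await response.arrayBuffer();
  return new TextDecoder(charset).decode(buffer);
}

readTextFileWithCharset("words.txt").then(text => console.log(text));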

JavaScript: decode a base64 string (which is an encoded zipfile) to a zipfile and get the zipfile's content by name

The question says it all: I'm receiving a base64-encoded ZIP file from the server, which I first want to decode to a ZIP file in memory, and then read the ZIP file's content, which is a JSON file.
I tried to use JSZip but I'm totally lost in this case ... the base64 string is received in JavaScript via a promise.
So my question in short is: how can I convert a base64-encoded ZIP file to a ZIP file in memory to get its contents?
BASE64 -> ZIPFILE -> CONTENT
I use this roundabout process to save a lot of space in my database, and I don't want to handle it on the server side, but on the client side with JS.
Thanks in advance!
If anyone is interested in my solution to this problem, read my answer right here:
I received the data as a base64 string, then converted the string to a blob. Then I used the blob handle to load the zipfile with the JSZip library. After that I could just grab the contents of the zipfile. Code is below:
function base64ToBlob(base64) {
  let binaryString = window.atob(base64);
  let binaryLen = binaryString.length;
  let ab = new ArrayBuffer(binaryLen);
  let ia = new Uint8Array(ab);
  for (let i = 0; i < binaryLen; i++) {
    ia[i] = binaryString.charCodeAt(i);
  }
  // Pass the type in the constructor; Blob properties are read-only,
  // so assigning name/type/lastModifiedDate afterwards has no effect.
  return new Blob([ab], { type: "application/zip" });
}
To get the contents of the zipfile:
let blob = base64ToBlob(resolved);
let zip = new JSZip();
zip.loadAsync(blob).then(function (zip) {
  zip.file("archived.json").async("string").then(function (content) {
    console.log(content);
    // content is the file as a string
  });
}).catch((e) => {
  console.error(e); // report load failures instead of silently swallowing them
});
As you can see, first the blob is created from the base64 string, then the handle is given to JSZip's loadAsync method. After that you name the file you want to retrieve from the zipfile, in this case "archived.json". Because of the async("string") call, the file contents are returned as a string; to use the extracted string further, just work with the content variable.
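Worth noting (my addition): JSZip's loadAsync also accepts the base64 string directly via its base64 option, which makes the manual blob conversion unnecessary. A minimal sketch:

// Sketch: let JSZip decode the base64 itself.
let zip = new JSZip();
zip.loadAsync(resolved, { base64: true })
  .then(zip => zip.file("archived.json").async("string"))
  .then(content => console.log(content));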

Decompress gzip and zlib strings in JavaScript

I want to get the compressed layer data from a tmx file. Which libraries can decompress gzip and zlib strings in JavaScript? I tried zlib but it doesn't work for me. For example, the layer data in the tmx file is:
<data encoding="base64" compression="zlib">
eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==
</data>
My JavaScript code is:
var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
var inflate = new Zlib.Inflate(compressData);
var output = inflate.decompress();
It fails with the error message "unsupported compression method". But when I try decompressing with an online tool such as http://i-tools.org/gzip, it returns the correct string.
Pako is a full and modern Zlib port.
Here is a very simple example and you can work from there.
Get pako.js and you can decompress byteArray like so:
<html>
<head>
  <title>Gunzipping binary gzipped string</title>
  <script type="text/javascript" src="pako.js"></script>
  <script type="text/javascript">
    // Get datastream as Array, for example:
    var charData = [31,139,8,0,0,0,0,0,0,3,5,193,219,13,0,16,16,4,192,86,214,151,102,52,33,110,35,66,108,226,60,218,55,147,164,238,24,173,19,143,241,18,85,27,58,203,57,46,29,25,198,34,163,193,247,106,179,134,15,50,167,173,148,48,0,0,0];
    // Turn number array into byte-array
    var binData = new Uint8Array(charData);
    // Pako magic
    var data = pako.inflate(binData);
    // Convert gunzipped byteArray back to ascii string:
    var strData = String.fromCharCode.apply(null, new Uint16Array(data));
    // Output to console
    console.log(strData);
  </script>
</head>
<body>
  Open up the developer console.
</body>
</html>
Running example: http://jsfiddle.net/9yH7M/
Alternatively, you can base64-encode the array before you send it over, as a raw array adds a lot of overhead when sent as JSON or XML. Decode likewise:
// Get some base64 encoded binary data from the server. Imagine we got this:
var b64Data = 'H4sIAAAAAAAAAwXB2w0AEBAEwFbWl2Y0IW4jQmziPNo3k6TuGK0Tj/ESVRs6yzkuHRnGIqPB92qzhg8yp62UMAAAAA==';
// Decode base64 (convert ascii to binary)
var strData = atob(b64Data);
// Convert binary string to character-number array
var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako magic
var data = pako.inflate(binData);
// Convert gunzipped byteArray back to ascii string:
var strData = String.fromCharCode.apply(null, new Uint16Array(data));
// Output to console
console.log(strData);
Running example: http://jsfiddle.net/9yH7M/1/
To go more advanced, here is the pako API documentation.
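A modern aside (my addition, not part of the original answer): current browsers and Node.js 18+ ship a built-in DecompressionStream, so small cases no longer need a library. A sketch; pass 'deflate' instead of 'gzip' for zlib-wrapped data like the tmx example above:

// Sketch: gunzip a base64 string with the built-in DecompressionStream.
async function gunzipBase64(b64) {
  const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0));
  const stream = new Blob([bytes]).stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return new TextDecoder().decode(await new Response(stream).arrayBuffer());
}

gunzipBase64("H4sIAAAAAAAAAwXB2w0AEBAEwFbWl2Y0IW4jQmziPNo3k6TuGK0Tj/ESVRs6yzkuHRnGIqPB92qzhg8yp62UMAAAAA==").then(console.log);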
I was able to solve my problem with zlib. I fixed my code as below:
var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
compressData = compressData.split('').map(function (e) { // reassign, don't redeclare
  return e.charCodeAt(0);
});
var inflate = new Zlib.Inflate(compressData);
var output = inflate.decompress();
For anyone using Ruby on Rails who wants to send compressed, encoded data to the browser and then uncompress it via JavaScript in the browser, I've combined both excellent answers above into the following solution. Here's the Rails server code in my application controller, which compresses and encodes a string before sending it to the browser via an @variable to a .html.erb file:
require 'zlib'
require 'base64'
def compressor(some_string)
  Base64.encode64(Zlib::Deflate.deflate(some_string))
end
Here's the Javascript function, which uses pako.min.js:
function uncompress(input_field) {
  var base64data = document.getElementById(input_field).innerText;
  var compressData = atob(base64data);
  compressData = compressData.split('').map(function (e) {
    return e.charCodeAt(0);
  });
  var binData = new Uint8Array(compressData);
  var data = pako.inflate(binData);
  return String.fromCharCode.apply(null, new Uint16Array(data));
}
Here's a JavaScript call to that uncompress function, which decodes and uncompresses data stored inside a hidden HTML field:
my_answer = uncompress('my_hidden_field');
Here's the entry in the Rails application.js file to include pako.min.js, which is in the /vendor/assets/javascripts directory:
//= require pako.min
And I got the pako.min.js file from here:
https://github.com/nodeca/pako/tree/master/dist
All works at my end, anyway! :-)
I was sending data from a Python script and trying to decode it in JS. Here's what I had to do:
Python
import base64
import json
import urllib.parse
import zlib
...
data_object = {
    '_id': '_id',
    ...
}
compressed_details = base64.b64encode(zlib.compress(bytes(json.dumps(data_object), 'utf-8'))).decode("ascii")
urlsafe_object = urllib.parse.quote(str(compressed_details))#.replace('%', '\%') # you likely don't need this last part
final_URL = f'https://my.domain.com?data_object={urlsafe_object}'
...
JS
// npm install this
import pako from 'pako';
...
const urlParams = new URLSearchParams(window.location.search);
const data_object = urlParams.get('data_object');
if (data_object) {
  const compressedData = Uint8Array.from(window.atob(data_object), (c) => c.charCodeAt(0));
  const originalObject = JSON.parse(pako.inflate(compressedData, { to: 'string' }));
}
...

Downloaded generated binary content contains UTF-8 encoded chars in the disk file

I am trying to save a generated zip file to disk from within a Chrome extension with the following code:
function sendFile(nm, file) {
  var a = document.createElement('a');
  a.href = window.URL.createObjectURL(file);
  a.download = nm; // file name
  a.style.display = 'none';
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
}

function downloadZip(nm) {
  window.URL = window.webkitURL || window.URL;
  var content = zip.generate();
  var file = new Blob([content], { type: 'application/base64' });
  sendFile("x.b64", file);
  content = zip.generate({ base64: false });
  file = new Blob([content], { type: 'application/binary' }); // reassign, don't redeclare
  sendFile("x.zip", file);
}
Currently this saves the contents of my zip in two versions, the first one is base64 encoded, and when I decode it with base64 -d the resulting zip is ok.
The second version should just save the raw data (the zip file), but this raw data arrives UTF-8 encoded on my disk (each value >= 0x80 is prepended with 0xC2). So how do I get rid of this UTF-8 encoding? I tried various type strings like application/zip, or omitting the type info completely; it always arrives UTF-8 encoded. I am also curious how to make the browser store/convert the base64 data (the first case) by itself, so that it arrives as decoded binary data on my disk. I'm using Chrome Version 23.0.1271.95 m.
PS: I analysed the second content with a hexdump utility inside the browser: it does not contain UTF-8 encodings (or my hexdump calls something that does an implicit conversion). For completeness (sorry, it's just transposed from C, so it might not be that cool JS code), I append it here:
function hex(bytes, val) {
  var ret = "";
  var tmp = "";
  for (var i = 0; i < bytes; i++) {
    tmp = val.toString(16);
    if (tmp.length < 2)
      tmp = "0" + tmp;
    ret = tmp + ret;
    val >>= 8;
  }
  return ret;
}

function hexdump(buf, len) {
  var p = 0;
  while (p < len) {
    var line = hex(2, p); // declared, to avoid an implicit global
    var i;
    for (i = 0; i < 16; i++) {
      if (i == 8)
        line += " ";
      if (p + i < len)
        line += " " + hex(1, buf.charCodeAt(p + i));
      else
        line += "   "; // pad to keep the ASCII column aligned
    }
    line += " |";
    for (i = 0; i < 16; i++) {
      if (p + i < len) {
        var cc = buf.charCodeAt(p + i);
        line += ((cc >= 32) && (cc <= 127) && (cc !== 124) ? String.fromCharCode(cc) : '.'); // 124 is '|'
      }
    }
    p += 16;
    console.log(line);
  }
}
From working draft:
If element is a DOMString, run the following substeps:
Let s be the result of converting element to a sequence of Unicode characters [Unicode] using the algorithm for doing so in WebIDL [WebIDL].
Encode s as UTF-8 and append the resulting bytes to bytes.
So strings are always converted to UTF-8, and there is no parameter to affect this. This doesn't affect base64 strings because they only contain characters that match single byte per codepoint, with the codepoint and byte having the same value. Luckily Blob exposes lower level interface (direct bytes), so that limitation doesn't really matter.
You could do this:
var binaryString = zip.generate({base64: false}), // by glancing over the source I trust the string is in "binary" form,
    len = binaryString.length,                    // i.e. having only code points 0-255 that represent bytes
    bytes = new Uint8Array(len);
for (var i = 0; i < len; ++i) {
  bytes[i] = binaryString.charCodeAt(i);
}
var file = new Blob([bytes], {type: 'application/zip'});
sendFile("myzip.zip", file);
