JavaScript FileReader is not reading file contents properly - javascript

I am using the readAsText method of the FileReader class (JavaScript) with the encoding set to "UTF-8" to read a file from the client. It works well for all kinds of characters with code points ranging from 1 to 65000. The only problem I have is that when I read the file chunk by chunk, a character with a code point above 3000 is sometimes not read properly. After investigating, I found that it happens only when I read big files and that particular character happens to sit as the first letter of a chunk. I tested with multiple chunks of a file: the problem does not happen for all the chunks, only for one or two chunks out of ten. This is weird and strange. Am I missing something here? And do we have any other options to read a local file in JavaScript? Any help will be much appreciated.

This might be one solution:
new Blob(['hi']).text().then(console.log)
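For example, applied to a file picked via an <input type="file"> element (a sketch; the fileInput id is an assumption):
// File inherits from Blob, so File.prototype.text() decodes the whole file as UTF-8
document.querySelector('#fileInput').onchange = async (e) => {
  const [file] = e.target.files;
  console.log(await file.text());
};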
Here is another that is not as cross-browser friendly, but it could work:
new Blob(['foo']) // or new File(['foo'], 'test.txt')
  .stream()
  .pipeThrough(new TextDecoderStream('utf-8'))
  .pipeTo(new WritableStream({
    write(part) {
      console.log(part)
    }
  }))
Another, lower-level solution that doesn't depend on WritableStream or TextDecoderStream is to use a regular TextDecoder with the stream option:
var res = ''
const decoder = new TextDecoder()
res += decoder.decode(chunk1, { stream: true })
res += decoder.decode(chunk2, { stream: true })
res += decoder.decode(chunk3, { stream: true })
res += decoder.decode() // flush the end
How you get each chunk could be by using new Response(blob).body, by blob.stream(), or simply by slicing the blob with blob.slice(start, end) and using FileReader.prototype.readAsArrayBuffer(), as in the sketch below.
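A minimal sketch of the slice + readAsArrayBuffer route; decoding with { stream: true } keeps multi-byte characters intact across chunk boundaries (CHUNK_SIZE and the helper names are my own):
// Read a File/Blob in fixed-size slices and decode across chunk boundaries
const CHUNK_SIZE = 64 * 1024;

function readChunk(blob) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);   // an ArrayBuffer
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(blob);
  });
}

async function readTextInChunks(file) {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (let start = 0; start < file.size; start += CHUNK_SIZE) {
    const buf = await readChunk(file.slice(start, start + CHUNK_SIZE));
    // stream: true holds back incomplete multi-byte sequences for the next chunk
    text += decoder.decode(buf, { stream: true });
  }
  text += decoder.decode(); // flush anything left over
  return text;
}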

Related

Mechanisms for hashing a file in JavaScript

I am relatively new to JavaScript and I want to get the hash of a file, and would like to better understand the mechanism and code behind the process.
So, what I need: An MD5 or SHA-256 hash of an uploaded file to my website.
My understanding of how this works: A file is uploaded via an HTML input tag of type 'file', after which it is converted to a binary string, which is consequently hashed.
What I have so far: I have managed to get the hash of an input of type 'text', and also, somehow, the hash of an uploaded file, although that hash did not match the one produced by websites I checked online, so I'm guessing it hashed some other details of the file instead of the binary string.
Question 1: Am I correct in my understanding of how a file is hashed? Meaning, is it the binary string that gets hashed?
Question 2: What should my code look like to upload a file, hash it, and display the output?
Thank you in advance.
Basically yes, that's how it works.
But to generate such a hash, you don't need to do the conversion to a string yourself. Instead, let the SubtleCrypto API handle it, and just pass it an ArrayBuffer of your file.
async function getHash(blob, algo = "SHA-256") {
  // convert your Blob to an ArrayBuffer
  // could also use a FileReader for this in browsers that don't support the Response API
  const buf = await new Response(blob).arrayBuffer();
  const hash = await crypto.subtle.digest(algo, buf);
  let result = '';
  const view = new DataView(hash);
  for (let i = 0; i < hash.byteLength; i += 4) {
    // each 32-bit word becomes 8 hex characters
    result += view.getUint32(i).toString(16).padStart(8, '0');
  }
  return result;
}
inp.onchange = e => {
  getHash(inp.files[0]).then(console.log);
};
<input id="inp" type="file">

Unable to read accented characters from csv file stream in node

To start off, I am currently using npm fast-csv, which is a nice CSV reader/writer that is pretty straightforward and simple. What I'm attempting to do is use it in conjunction with iconv to process "accented" and other non-ASCII characters and either convert them to an ASCII equivalent or remove them, depending on the character.
My current process with fast-csv is to bring in a chunk for processing (it comes in as one row) via a read stream, pause the read stream, process the data, pipe the data to a write stream, and then resume the read stream using a callback. fast-csv currently knows where to separate the chunks (rows) based on the format of the data coming in from the read stream.
The entire process looks like this:
var stream = fs.createReadStream(inputFileName);

function csvPull(source) {
    csvWrite = csv.createWriteStream({ headers: true });
    writableStream = fs.createWriteStream(outputFileName);

    csvStream = csv()
        .on("data", function (data) {
            csvStream.pause();
            processRow(data, function () {
                csvStream.resume();
            });
        })
        .on("end", function () {
            console.log('END OF CSV FILE');
        });

    csvWrite.pipe(writableStream);
    source.pipe(csvStream);
}

csvPull(stream);
The problem I am currently running into is that, for some reason, my JavaScript does not inherently recognise the non-ASCII characters, so I am resorting to npm iconv-lite to encode the data stream as it comes in into something usable. However, this presents a bigger issue: fast-csv will no longer know where to split the chunks (rows) because of the now-encoded data. This is a problem because of the size of the CSVs I will be working with; it is not an option to load the entire CSV into the buffer and then decode it.
Are there any suggestions on how I might get around this without writing my own CSV parser into my code?
Try reading your file with 'binary' for the encoding option. I had to read a few CSVs with accented characters and it worked fine with that.
var stream = fs.createReadStream(inputFileName, { encoding: 'binary' });
Unless I misunderstand, you should be able to fix this by setting the encoding on the stream to utf-8 (docs).
for the first line:
var stream = fs.createReadStream(inputFileName, {encoding: 'utf8'});
and if needed:
writableStream = fs.createWriteStream(outputFileName, {defaultEncoding: 'utf8'});
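To keep fast-csv splitting rows on proper text rather than on mangled bytes, another option is to decode the byte stream before it reaches fast-csv, for example with iconv-lite's streaming decoder. A sketch that replaces the first line of the snippet above (the 'win1252' encoding is an assumption; use whatever the file is actually encoded in):
var iconv = require('iconv-lite');

// Decode the raw bytes up front so fast-csv splits rows on decoded text
var stream = fs.createReadStream(inputFileName)
    .pipe(iconv.decodeStream('win1252'));

csvPull(stream);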

Saving text from website using Firefox extension, wrong characters saved

Sorry about the vague title, but I'm a bit lost, so it's hard to be specific. I've started playing around with Firefox extensions using the Add-on SDK. What I'm trying to do is watch a page for changes, a Twitch.tv chat window in this case, and save those changes to a file.
I've gotten this to work: every time something changes on the page it gets saved. But "unusual" characters, for example something in Korean, don't get saved properly. I think this has to do with the encoding of the file/string. I tried saving the same characters by copy-pasting them into Notepad; it asked me to save in Unicode, and when I did, everything worked fine. So I figured, OK, I'll change the encoding of the log file to Unicode as well before writing to it. That didn't exactly work... now all the characters were in some kind of foreign language.
The code I'm using to write to the file is this:
var {Cc, Ci, Cu} = require("chrome");
var {FileUtils} = Cu.import("resource://gre/modules/FileUtils.jsm");
var file = FileUtils.getFile("Desk", ["mylogfile.txt"]);
var stream = FileUtils.openFileOutputStream(file, FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_APPEND);
stream.write(data, data.length);
stream.close();
I looked at the description of FileUtils.jsm over at MDN and as far as I can tell there's no way to tell it which encoding I want to use?
If you don't know a fix could you give me some good search terms because I seem to be coming up short on that front. Since I know basically nothing on the subject I'm flailing around in the dark a bit at the moment.
edit:
This is what I ended up with (for now) to get this thing working:
var {Cc, Ci, Cu} = require("chrome");
var {FileUtils} = Cu.import("resource://gre/modules/FileUtils.jsm");
var file = Cc['@mozilla.org/file/local;1']
    .createInstance(Ci.nsILocalFile);
file.initWithPath('C:\\temp\\temp.txt');
if (!file.exists()) {
    file.create(file.NORMAL_FILE_TYPE, 0666);
}

var charset = 'UTF-8';

var fileStream = Cc['@mozilla.org/network/file-output-stream;1']
    .createInstance(Ci.nsIFileOutputStream);
fileStream.init(file, FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_APPEND, 0x200, false);

var converterStream = Cc['@mozilla.org/intl/converter-output-stream;1']
    .createInstance(Ci.nsIConverterOutputStream);
converterStream.init(fileStream, charset, data.length,
    Ci.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);

converterStream.writeString(data);
converterStream.close();
fileStream.close();
Dumping just the raw bytes (well, raw jschars actually) won't work. You need to first convert the data into some sensible encoding.
See e.g. the File I/O Snippets. Here are the crucial bits of creating a converter output stream wrapper:
var converter = Components.classes["@mozilla.org/intl/converter-output-stream;1"]
    .createInstance(Components.interfaces.nsIConverterOutputStream);
converter.init(foStream, "UTF-8", 0, 0);
converter.writeString(data);
converter.close(); // this closes foStream
Another way is to use OS.File + TextEncoder:
let encoder = new TextEncoder();                     // This encoder can be reused for several writes
let array = encoder.encode("This is some text");     // Convert the text to an array
let promise = OS.File.writeAtomic("file.txt", array, // Write the array atomically to "file.txt",
    {tmpPath: "file.txt.tmp"});                       // using "file.txt.tmp" as a temporary buffer
It might be even possible to mix both. OS.File has the benefit that it will write data and access files off the main thread (so it won't block the UI while the file is being written).

Saving byteArray to pdf file in Titanium

I have a SOAP API that returns a file divided into chunks, encoded as several base64 strings, and I am not able to save it to the file system without breaking it.
This is the pastebin of a whole encoded file, as is, once I download and chain the responses.
What is the way to save it correctly?
I tried many ways:
var f = Ti.Filesystem.getFile(Ti.Filesystem.tempDirectory, 'test.pdf');
...
var blobStream = Ti.Stream.createStream({ source: fileString, mode: Ti.Stream.MODE_READ });
var newBuffer = Ti.createBuffer({ length: fileString.length });
f.write(fileString);
or
var data = Ti.Utils.base64decode(fileString);
var blobStream = Ti.Stream.createStream({ source: data, mode: Ti.Stream.MODE_READ });
var newBuffer = Ti.createBuffer({ length: data.length });
var bytes = blobStream.read(newBuffer);
f.write(fileString);
or
var data = Ti.Utils.base64decode(fileString);
var blobStream = Ti.Stream.createStream({ source: data, mode: Ti.Stream.MODE_READ });
var newBuffer = Ti.createBuffer({ length: data.length });
var bytes = blobStream.read(newBuffer);
f.write(bytes);
but I don't understand which one is the right path.
Do I have to convert the string back to a byte array on my own?
What is the right way to save it?
Do I have to create a buffer from the string, or ...?
I think that the base64 encoding of the file is not valid or is incomplete; I've tested it using bash and the base64 utility. You can perform these steps to check.
Copy and paste the base64 string into a file called pdf.base64, then run this command:
cat pdf.base64 | base64 --decode >> out.pdf
The output file is not a valid PDF.
You can try to encode and decode a valid pdf file to take a look at the generated binary:
cat validfile.pdf | base64 | base64 --decode >> anothervalidfile.pdf
Check that you are chaining the chunks correctly, or simply get on a call with whoever built the SOAP API.
Before you start downloading your file, you need to create the file stream to write to; writing to a blob is not the way to go:
// Step 1
var outFileStream = Ti.Filesystem.getFile('outfile.bin').open(Ti.Filesystem.MODE_WRITE);
After creating your HTTPClient or socket stream, when you receive a chunk of Base64 data from the server you need to put the decoded data into a Titanium.Buffer. This would probably go into your onload or onstream callback in an HTTPClient:
// Step 2
var rawDecodedFileChunk = Ti.Utils.base64decode(fileString);
var outBuffer = Ti.createBuffer({
    // byteOrder and type may need to be set to match the incoming data
    value: rawDecodedFileChunk
});
Finally you can write the data out to the file stream:
// Step 3
var bytesWritten = outFileStream.write(outBuffer); // writes entire buffer to stream
Ti.API.info("Bytes written:" + bytesWritten); // should match data length
if(outBuffer.length !== bytesWritten) {
Ti.API.error("Not all bytes written!");
}
Generally errors come from having the wrong byte order or type of data, or writing in the wrong order. Obviously, this all depends on the server sending the data in the correct order and it being valid!
You may also want to consider the pump command version of this, which allows you to transfer from input stream to output file stream, minimizing your load.
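Putting the three steps together in one place might look roughly like this (a sketch; onChunkReceived and the wiring to the SOAP responses are assumptions):
// Open the output file stream once, before downloading starts
var outFileStream = Ti.Filesystem.getFile(Ti.Filesystem.tempDirectory, 'test.pdf')
    .open(Ti.Filesystem.MODE_WRITE);

// Call this once per base64 chunk, in order (e.g. from an HTTPClient onload)
function onChunkReceived(fileString) {
    var rawDecodedFileChunk = Ti.Utils.base64decode(fileString);
    var outBuffer = Ti.createBuffer({ value: rawDecodedFileChunk });
    var bytesWritten = outFileStream.write(outBuffer);
    Ti.API.info('Bytes written: ' + bytesWritten);
}

// When every chunk has been written:
// outFileStream.close();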

How to convert character encoding from CP932 to UTF-8 in nodejs javascript, using the nodejs-iconv module (or other solution)

I'm attempting to convert a string from CP932 (aka Windows-31J) to UTF-8 in JavaScript. Basically, I'm crawling a site that ignores the UTF-8 charset in the request header and returns CP932-encoded text (even though the HTML meta tag indicates that the page is shift_jis).
Anyway, I have the entire page stored in a string variable called "html". From there I'm attempting to convert it to utf8 using this code:
var Iconv = require('iconv').Iconv;
var conv = new Iconv('CP932', 'UTF-8//TRANSLIT//IGNORE');
var myBuffer = new Buffer(html.length * 3);
myBuffer.write(html, 0, 'utf8')
var utf8html = (conv.convert(myBuffer)).toString('utf8');
The result is not what it's supposed to be. For example, the string: "投稿者さんの 稚内全日空ホテル のクチコミ (感想・情報)" comes out as "ソスソスソスeソスメゑソスソスソスソスソス ソスtソスソスソスSソスソスソスソスソスzソスeソスソス ソスフクソス`ソスRソス~ (ソスソスソスzソスEソスソスソスソス)"
If I remove //TRANSLIT//IGNORE (which should cause it to return similar characters for missing ones and, failing that, omit characters that can't be transcoded), I get this error:
Error: EILSEQ, Illegal character sequence.
I'm open to using any solution that can be implemented in nodejs, but my search results haven't yielded many options outside of the nodejs-iconv module.
nodejs-iconv ref: https://github.com/bnoordhuis/node-iconv
Thanks!
Edit 24.06.2011:
I've gone ahead and implemented a solution in Java. However I'd still be interested in a javascript solution to this problem if somebody can solve it.
I got the same trouble today :)
It depends on libiconv. You need libiconv-1.13-ja-1.patch.
Please check the following:
http://d.hatena.ne.jp/ushiboy/20110422/1303481470
http://code.xenophy.com/?p=1529
Or you can avoid the problem by using iconv-jp:
npm install iconv-jp
I had the same problem, but with CP1250. I was looking for the problem everywhere, and everything was OK except the call to request – I had to add encoding: 'binary'.
request = require('request')
Iconv = require('iconv').Iconv

request({uri: url, encoding: 'binary'}, function(err, response, body) {
  body = new Buffer(body, 'binary')
  iconv = new Iconv('CP1250', 'UTF8')
  body = iconv.convert(body).toString()
  // ...
})
https://github.com/bnoordhuis/node-iconv/issues/19
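For the original CP932 case, the same pattern can also be done with iconv-lite, a pure-JavaScript converter that doesn't need libiconv patching. A sketch, assuming iconv-lite's cp932 alias and using encoding: null so request hands back a raw Buffer:
var request = require('request');
var iconv = require('iconv-lite');

request({ uri: url, encoding: null }, function (err, response, body) {
  // body is a Buffer of CP932 bytes; decode it into a normal JS string
  var utf8html = iconv.decode(body, 'cp932');
  // ...
});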
I tried /Users/Me/node_modules/iconv/test.js:
node test.js
It returns an error.
On Mac OS X Lion, this problem seems to depend on gcc.
