Online/streaming MD5 algorithm? - javascript

Is it possible to access the data as it's coming in with HTML5 using the FileReader API and the onprogress event?
If so, is there an "online" version of MD5 or other fast hashing algorithm so that I can begin computing the hash before the file is fully read?
I would like to compute the hash client-side and send just the hash to the server to check for duplicates before initiating a full file upload.
I am not concerned with support for older browsers at this time.
Edit: I recognize that a hash collision does not guarantee a duplicate file, and the only way to be sure is to check byte-by-byte which would mean I would have to upload the file anyway. The probability is low enough that I'm willing to take this risk; worst case I prompt the user and say "This file already appears to be on the server; are you sure you want to upload it?"

is there an "online" version of MD5 or other fast hashing algorithm so that I can begin computing the hash before the file is fully read?
Yes, you can use sjcl if you want to use SHA. sjcl has no native support for MD5, so you'll have to write it yourself (though I'm sure someone else has done it already). CryptoJS has native MD5 support but is significantly slower.
I recognize that a hash collision does not guarantee a duplicate file [...] The probability is low enough that I'm willing to take this risk;
The probability is low enough that there's a better chance of a meteor hitting the Earth and ending human life (thus removing the need for hashing altogether) than of a collision occurring naturally. Unless the user crafted a collision on purpose, of course, since MD5's collision resistance is broken.
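For a rough sense of scale, the birthday bound puts the chance of an accidental collision among n files under a 128-bit hash at roughly n^2 / 2^129. A quick back-of-envelope check (plain JavaScript, numbers purely illustrative):
// Approximate probability of at least one accidental MD5 collision
// among n distinct files, via the birthday bound: p ~ n^2 / 2^129.
function collisionProbability(n) {
    return (n * n) / Math.pow(2, 129);
}
console.log(collisionProbability(1e9)); // ~1.5e-21, even for a billion files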
Here's a live demo of what I believe you're trying to accomplish, minus the "access data as it comes" part. I am not sure if that's possible. I wrote this a long time ago and it uses CryptoJS, so performance isn't that great but it gets the job done. The important chunks are:
function handleFileSelect(evt)
{
    evt.stopPropagation();
    evt.preventDefault();
    var files = evt.target.files || evt.dataTransfer.files; // FileList object
    for (var i = 0, file; file = files[i]; ++i)
    {
        // create a FileReader and read the file as an ArrayBuffer
        var fr = new FileReader();
        fr.onload = (function (theFile) {
            return function (e) {
                var hashes = parsePseudoBuffer(e.target.result);
                document.getElementById('output').innerHTML += '<br />' + theFile.name + '<br />'
                    + 'MD5: ' + hashes.md5 + '<br />' + 'SHA1: ' + hashes.sha1 + '<br />';
            };
        })(file);
        fr.readAsArrayBuffer(file);
    }
}
function parsePseudoBuffer(result)
{
    var bytes = new Uint8Array(result); // byte view over the ArrayBuffer
    var md5 = CryptoJS.algo.MD5.create();
    var sha1 = CryptoJS.algo.SHA1.create();
    var bufsize = 8 * 1024; // hash in 8 KB chunks
    for (var bstart = 0, bend = bufsize; bstart < bytes.length; bstart += bufsize, bend += bufsize)
    {
        var data = CryptoJS.lib.WordArray.create(bytes.subarray(bstart, bend));
        md5.update(data);
        sha1.update(data);
    }
    md5 = md5.finalize();
    sha1 = sha1.finalize();
    return { 'md5': md5, 'sha1': sha1 };
}
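One caveat worth noting: the core CryptoJS WordArray.create expects an array of 32-bit words, so passing a Uint8Array as above only works once the typed-array extension is loaded alongside the core and hash modules. Something like the following, with the paths being illustrative (if your CryptoJS bundle already includes lib-typedarrays, no extra include is needed):
<script src="crypto-js/core.js"></script>
<script src="crypto-js/md5.js"></script>
<script src="crypto-js/sha1.js"></script>
<script src="crypto-js/lib-typedarrays.js"></script>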

I did some experimenting. It looks like we can get the last chunk read inside the onprogress event by utilizing the incomplete result on the reader object. It only appears to be accessible if we use reader.readAsArrayBuffer (Chrome only?) or reader.readAsBinaryString. The problem with strings is that if you want to take a chunk of one, you have to slice it, which makes a copy (very slow).
Typed arrays have a .subarray method which creates a view into the buffer without copying any data. This is exactly what we want. Note that .subarray lives on the typed-array views (e.g. Uint8Array), not on ArrayBuffer itself, and constructing a view over an existing buffer does not copy it; the underlying buffer stays accessible via the view's readonly .buffer property.
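To convince yourself that constructing a typed array over an existing ArrayBuffer creates a view rather than a copy, a quick check in plain JavaScript is enough; writes through one view are visible through any other view over the same buffer:
var buf = new ArrayBuffer(4);
var a = new Uint8Array(buf);
var b = new Uint8Array(buf); // second view over the same memory
a[0] = 42;
console.log(b[0]); // 42 -- both views share the buffer, nothing was copied
console.log(a.subarray(0, 2).buffer === buf); // true -- subarray is also just a view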
Both sjcl and CryptoJS conveniently have .update methods which will take in this ArrayBufferView so that you can update your hash on the fly. Thus, I have come up with the following solution (using jQuery, underscore and sjcl):
$(document).on('drop', function (dropEvent) {
    dropEvent.preventDefault();
    _.each(dropEvent.originalEvent.dataTransfer.files, function (file) {
        var reader = new FileReader();
        var pos = 0;
        var hash = new sjcl.hash.sha256();
        reader.onprogress = function (progress) {
            // hash only the bytes that arrived since the last progress event
            var chunk = new Uint8Array(reader.result, pos, progress.loaded - pos);
            pos = progress.loaded;
            hash.update(chunk);
            if (progress.lengthComputable) {
                console.log((progress.loaded / progress.total * 100).toFixed(1) + '%');
            }
        };
        reader.onload = function () {
            // pick up whatever arrived after the final progress event
            var chunk = new Uint8Array(reader.result, pos);
            if (chunk.length > 0) hash.update(chunk);
            console.log(sjcl.codec.hex.fromBits(hash.finalize()));
        };
        reader.readAsArrayBuffer(file);
    });
});
Note that this solution presently only works in Chrome and is fairly slow. I suspect sjcl isn't just hashing the file but key-strengthening it as well, which really isn't what I want. Will investigate more later.
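If relying on the partial reader.result turns out to be too Chrome-specific, a more portable pattern is to slice the Blob yourself and read one fixed-size chunk at a time. This is a sketch, assuming an incremental hasher with update/finalize like the sjcl object above:
function hashFileInChunks(file, hasher, done) {
    var chunkSize = 1024 * 1024; // 1 MB per read; tune to taste
    var offset = 0;
    var reader = new FileReader();
    reader.onload = function () {
        hasher.update(new Uint8Array(reader.result));
        offset += chunkSize;
        if (offset < file.size) {
            readNext(); // keep going until the whole file has been fed in
        } else {
            done(hasher.finalize());
        }
    };
    function readNext() {
        // Blob.slice hands back a lightweight sub-Blob; no data is copied up front
        reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
    }
    readNext();
}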


Mechanisms for hashing a file in JavaScript

I am relatively new to JavaScript and I want to get the hash of a file, and would like to better understand the mechanism and code behind the process.
So, what I need: An MD5 or SHA-256 hash of an uploaded file to my website.
My understanding of how this works: A file is uploaded via an HTML input tag of type 'file', after which it is converted to a binary string, which is consequently hashed.
What I have so far: I have managed to get the hash of an input of type 'text', and also, somehow, the hash of an uploaded file, although the hash did not match what online tools gave me, so I'm guessing it hashed some other detail of the file instead of the binary content.
Question 1: Am I correct in my understanding of how a file is hashed? Meaning, is it the binary string that gets hashed?
Question 2: What should my code look like to upload a file, hash it, and display the output?
Thank you in advance.
Basically yes, that's how it works.
But, to generate such hash, you don't need to do the conversion to string yourself. Instead, let the SubtleCrypto API handle it itself, and just pass an ArrayBuffer of your file.
async function getHash(blob, algo = "SHA-256") {
    // convert your Blob to an ArrayBuffer
    // (could also use a FileReader for browsers that don't support the Response API)
    const buf = await new Response(blob).arrayBuffer();
    const hash = await crypto.subtle.digest(algo, buf);
    let result = '';
    const view = new DataView(hash);
    for (let i = 0; i < hash.byteLength; i += 4) {
        // each uint32 is 8 hex digits; pad so leading zeros aren't dropped
        result += view.getUint32(i).toString(16).padStart(8, '0');
    }
    return result;
}
inp.onchange = e => {
    getHash(inp.files[0]).then(console.log);
};
<input id="inp" type="file">
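Two small notes on the snippet above: crypto.subtle is only exposed in secure contexts (https or localhost), and the hex conversion can equally be written over individual bytes instead of uint32 words, which some find easier to read. A minimal alternative:
// Equivalent hex encoding, one byte at a time
function toHex(arrayBuffer) {
    return Array.from(new Uint8Array(arrayBuffer))
        .map(b => b.toString(16).padStart(2, '0'))
        .join('');
}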

Saving text from website using Firefox extension, wrong characters saved

Sorry about the vague title but I'm a bit lost, so it's hard to be specific. I've started playing around with Firefox extensions using the Add-on SDK. What I'm trying to do is watch a page for changes, a Twitch.tv chat window in this case, and save those changes to a file.
I've gotten this to work, every time something changes on the page it gets saved. But, "unusual" characters like for example something in Korean doesn't get saved properly. I think this has to do with encoding of the file/string? I tried saving the same characters by copy-pasting them into notepad, it asked me to save in Unicode and when I did everything worked fine. So I figured, ok, I'll change the encoding of the log file to unicode as well before writing to it. Didn't exactly work... Now all the characters were in some kind of foreign language.
The code I'm using to write to the file is this:
var {Cc, Ci, Cu} = require("chrome");
var {FileUtils} = Cu.import("resource://gre/modules/FileUtils.jsm");
var file = FileUtils.getFile("Desk", ["mylogfile.txt"]);
var stream = FileUtils.openFileOutputStream(file, FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_APPEND);
stream.write(data, data.length);
stream.close();
I looked at the description of FileUtils.jsm over at MDN and as far as I can tell there's no way to tell it which encoding I want to use?
If you don't know a fix could you give me some good search terms because I seem to be coming up short on that front. Since I know basically nothing on the subject I'm flailing around in the dark a bit at the moment.
edit:
This is what I ended up with (for now) to get this thing working:
var {Cc, Ci, Cu} = require("chrome");
var {FileUtils} = Cu.import("resource://gre/modules/FileUtils.jsm");
var file = Cc['@mozilla.org/file/local;1']
    .createInstance(Ci.nsILocalFile);
file.initWithPath('C:\\temp\\temp.txt');
if (!file.exists()) {
    file.create(file.NORMAL_FILE_TYPE, 0666);
}
var charset = 'UTF-8';
var fileStream = Cc['@mozilla.org/network/file-output-stream;1']
    .createInstance(Ci.nsIFileOutputStream);
fileStream.init(file, FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_APPEND, 0x200, false);
var converterStream = Cc['@mozilla.org/intl/converter-output-stream;1']
    .createInstance(Ci.nsIConverterOutputStream);
converterStream.init(fileStream, charset, data.length,
    Ci.nsIConverterInputStream.DEFAULT_REPLACEMENT_CHARACTER);
converterStream.writeString(data);
converterStream.close();
fileStream.close();
Dumping just the raw bytes (well, raw jschars actually) won't work. You need to first convert the data into some sensible encoding.
See e.g. the File I/O Snippets. Here are the crucial bits of creating a converter output stream wrapper:
var converter = Components.classes["@mozilla.org/intl/converter-output-stream;1"]
    .createInstance(Components.interfaces.nsIConverterOutputStream);
converter.init(foStream, "UTF-8", 0, 0);
converter.writeString(data);
converter.close(); // this also closes foStream
Another way is to use OS.File + TextEncoder:
let encoder = new TextEncoder();                  // the encoder can be reused for several writes
let array = encoder.encode("This is some text");  // convert the text to a byte array
// write the array atomically to "file.txt", using "file.txt.tmp" as a temporary buffer
let promise = OS.File.writeAtomic("file.txt", array, {tmpPath: "file.txt.tmp"});
It might be even possible to mix both. OS.File has the benefit that it will write data and access files off the main thread (so it won't block the UI while the file is being written).
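For completeness, the OS.File snippet above assumes the module has been imported first; in an SDK add-on that would look roughly like this (the resource URL is the standard one, but treat the exact setup as a sketch):
var {Cu} = require("chrome");
Cu.import("resource://gre/modules/osfile.jsm"); // provides OS.File and OS.Path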

Rapidly updating image with Data URI causes caching, memory leak

I have a webpage that rapidly streams JSON from the server and displays bits of it, about 10 times/second. One part is a base64-encoded PNG image. I've found a few different ways to display the image, but all of them cause unbounded memory usage. It rises from 50mb to 2gb within minutes. Happens with Chrome, Safari, and Firefox. Haven't tried IE.
I discovered the memory usage first by looking at Activity Monitor.app -- the Google Chrome Renderer process continuously eats memory. Then, I looked at Chrome's Resource inspector (View > Developer > Developer Tools, Resources), and I saw that it was caching the images. Every time I changed the img src, or created a new Image() and set its src, Chrome cached it. I can only imagine the other browsers are doing the same.
Is there any way to control this caching? Can I turn it off, or do something sneaky so it never happens?
Edit: I'd like to be able to use the technique in Safari/Mobile Safari. Also, I'm open to other methods of rapidly refreshing an image if anyone has any ideas.
Here are the methods I've tried. Each one resides in a function that gets called on AJAX completion.
Method 1 - Directly set the src attribute on an img tag
Fast. Displays nicely. Leaks like crazy.
$('#placeholder_img').attr('src', 'data:image/png;base64,' + imgString);
Method 2 - Replace img with a canvas, and use drawImage
Displays fine, but still leaks.
var canvas = document.getElementById("placeholder_canvas");
var ctx = canvas.getContext("2d");
var img = new Image();
img.onload = function() {
    ctx.drawImage(img, 0, 0);
};
img.src = "data:image/png;base64," + imgString;
Method 3 - Convert to binary and replace canvas contents
I'm doing something wrong here -- the images display small and look like random noise. This method uses a controlled amount of memory (grows to 100mb and stops), but it is slow, especially in Safari (~50% CPU usage there, 17% in Chrome). The idea came from this similar SO question: Data URI leak in Safari (was: Memory Leak with HTML5 canvas)
var img = atob(imgString);
var binimg = [];
for (var i = 0; i < img.length; i++) {
    binimg.push(img.charCodeAt(i));
}
var bytearray = new Uint8Array(binimg);
// Grab the existing image from canvas
var ctx = document.getElementById("placeholder_canvas").getContext("2d");
var width = ctx.canvas.width,
    height = ctx.canvas.height;
var imgdata = ctx.getImageData(0, 0, width, height);
// Overwrite it with new data
for (var i = 8, len = imgdata.data.length; i < len; i++) {
    imgdata.data[i - 8] = bytearray[i];
}
// Write it back
ctx.putImageData(imgdata, 0, 0);
I know it's been years since this issue was posted, but the problem still exists in recent versions of Safari. So here is a definitive solution that works in all browsers, and I think this could save jobs or lives!
Copy the following code somewhere in your html page:
// Methods to address the memory leak problem in Safari
var BASE64_MARKER = ';base64,';
var temporaryImage;
var objectURL = window.URL || window.webkitURL;

function convertDataURIToBlob(dataURI) {
    // Validate input data
    if (!dataURI) return;
    // Convert the base64 payload to binary data
    var base64Index = dataURI.indexOf(BASE64_MARKER) + BASE64_MARKER.length;
    var base64 = dataURI.substring(base64Index);
    var raw = window.atob(base64);
    var rawLength = raw.length;
    var array = new Uint8Array(new ArrayBuffer(rawLength));
    for (var i = 0; i < rawLength; i++) {
        array[i] = raw.charCodeAt(i);
    }
    // Create and return a new Blob object using the binary data
    return new Blob([array], {type: "image/jpeg"});
}
Then, when you receive a new frame/image base64Image in base64 format (e.g. data:image/jpeg;base64, LzlqLzRBQ...) and you want to update an HTML <img /> element imageElement, use this code:
// Destroy old image
if(temporaryImage) objectURL.revokeObjectURL(temporaryImage);
// Create a new image from binary data
var imageDataBlob = convertDataURIToBlob(base64Image);
// Create a new object URL object
temporaryImage = objectURL.createObjectURL(imageDataBlob);
// Set the new image
imageElement.src = temporaryImage;
Repeat this last code as much as needed and no memory leaks will appear. This solution doesn't require the use of the canvas element, but you can adapt the code to make it work.
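One possible refinement, sketched here rather than taken from the original answer: if frames arrive very quickly, you can wait for the image's load event before revoking the previous URL, so you never revoke a blob the browser is still decoding. previousURL is a hypothetical tracking variable you would declare alongside temporaryImage:
var previousURL; // hypothetical: remembers the last frame's blob URL
imageElement.onload = function () {
    // the new frame has been decoded, so the prior blob URL is safe to release
    if (previousURL) objectURL.revokeObjectURL(previousURL);
    previousURL = imageElement.src;
};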
Try setting img.src = "" after drawing.
var canvas = document.getElementById("placeholder_canvas");
var ctx = canvas.getContext("2d");
var img = new Image();
img.onload = function() {
    ctx.drawImage(img, 0, 0);
    // after drawing, clear the src so the data URL can be collected
    img.src = "";
};
img.src = "data:image/png;base64," + imgString;
This might help.
I don't think there are any guarantees given about the memory usage of data URLs. If you can figure out a way to get them to behave in one browser, it guarantees little if not nothing about other browsers or versions.
If you put your image data into a blob and then create a blob URL, you can then deallocate that data.
Here's an example which turns a data URI into a blob URL; you may need to change / drop the webkit- & WebKit- prefixes on browsers other than Chrome and possibly future versions of Chrome.
var parts = dataURL.match(/data:([^;]*)(;base64)?,([0-9A-Za-z+/]+)/);
// assume base64 encoding
var binStr = atob(parts[3]);
// might be able to replace the following lines with just
//   var view = new Uint8Array(binStr);
// haven't tested.
// convert to binary in an ArrayBuffer
var buf = new ArrayBuffer(binStr.length);
var view = new Uint8Array(buf);
for (var i = 0; i < view.length; i++)
    view[i] = binStr.charCodeAt(i);
// end of the possibly unnecessary lines
var builder = new WebKitBlobBuilder();
builder.append(buf);
// create blob with mime type, create URL for it
var URL = webkitURL.createObjectURL(builder.getBlob(parts[1]));
return URL;
Deallocating is as easy as:
webkitURL.revokeObjectURL(URL);
And you can use your blob URL as your img's src.
Unfortunately, blob URLs do not appear to be supported in IE prior to v10.
API reference:
http://www.w3.org/TR/FileAPI/#dfn-createObjectURL
http://www.w3.org/TR/FileAPI/#dfn-revokeObjectURL
Compatibility reference:
http://caniuse.com/#search=blob%20url
I had a very similar issue.
Setting img.src to dataUrl Leaks Memory
Long story short, I simply worked around the Image element. I use a javascript decoder to decode and display the image data onto a canvas. Unless the user tries to download the image, they'll never know the difference either. The other downside is that you're going to be limited to modern browsers. The up side is that this method doesn't leak like a sieve :)
Patching up ellisbben's answer: BlobBuilder is now obsolete, and https://developer.mozilla.org/en-US/Add-ons/Code_snippets/StringView provides what appears to be a nice, quick conversion from base64 to Uint8Array:
in html:
<script src='js/stringview.js'></script>
in js:
window.URL = window.URL || window.webkitURL;

function blobify_dataurl(dataURL) {
    var parts = dataURL.match(/data:([^;]*)(;base64)?,([0-9A-Za-z+/]+)/);
    // assume base64 encoding; convert straight to bytes via StringView
    var view = StringView.base64ToBytes(parts[3]);
    var blob = new Blob([view], {type: parts[1]}); // pass a useful mime type here
    // create a URL for the blob
    var outURL = URL.createObjectURL(blob);
    return outURL;
}
I still don't see it actually updating the image in Safari mobile, but chrome can receive dataurls rapid-fire over websocket and keep up with them far better than having to manually iterate over the string. And if you know you'll always have the same type of dataurl, you could even swap the regex out for a substring (likely quicker...?)
Running some quick memory profiles, it looks like Chrome is even able to keep up with deallocations (if you remember to do them...):
URL.revokeObjectURL(outURL);
I have used different methods to solve this problem and none of them works. It seems that memory leaks when img.src = base64string, and that memory can never be released. Here is my solution.
fs.writeFile('img0.jpg', img_data, function (err) {
    // console.log("saved img!");
});
document.getElementById("my-img").src = 'img0.jpg?' + img_step;
img_step += 1;
Note that you should convert the base64 to a JPEG buffer first.
My Electron app updates the img every 50 ms, and memory doesn't leak.
Forget about disk usage; Chrome's memory management pisses me off.
Unless Safari or Mobile Safari don't leak data urls, server-side might be the only way to do this on all browsers.
Probably most straightforward would be to make a URL for your image stream, GETting it gives a 302 or 303 response redirecting to a single-use URL that will give the desired image. You will probably have to destroy and re-create the image tags to force a reload of the URL.
You will also be at the mercy of the browser regarding its img caching behavior. And the mercy of my understanding (or lack of understanding) of the HTTP spec. Still, unless server-side operation doesn't fit your requirements, try this first. It adds to the complexity of the server, but this approach uses the browser much more naturally.
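As a sketch of the redirect idea (this is an assumption about your stack: Express is used purely for illustration, and makeSingleUseImageURL is a hypothetical helper that mints a route serving the current frame exactly once):
const express = require('express');
const app = express();

app.get('/image-stream', (req, res) => {
    // makeSingleUseImageURL is hypothetical: it should register a URL that
    // serves the current frame exactly once and then 404s
    const oneShot = makeSingleUseImageURL();
    res.redirect(303, oneShot); // "See Other": the browser fetches the fresh frame
});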
But what about using the browser un-naturally? Depending on how browsers implement iframes and handle their associated content, you might be able to get data urls working without leaking the memory. This is kinda Frankenstein shit and is exactly the sort of nonsense that no one should have to do. Upside: it could work. Downside: there are a bazillion ways to try it and uneven, undocumented behavior is exactly what I'd expect.
One idea: embed an iframe containing a page; this page and the page that it is embedded in use cross document messaging (note the GREEN in the compatibility matrix!); embeddee gets the PNG string and passes it along to the embedded page, which then makes an appropriate img tag. When the embeddee needs to display a new message, it destroys the embedded iframe (hopefully releasing the memory of the data url) then creates a new one and passes it the new PNG string.
If you want to be marginally more clever, you could actually embed the source for the embedded frame in the embeddee page as a data url; however, this might leak that data url, which I guess would be poetic justice for trying such a reacharound.
"Something that works in Safari would be better." Browser technology keeps on moving forward, unevenly. When they don't hand the functionality to you on a plate, you gotta get devious.
var inc = 1;
var Bulk = 540;
var tot = 540;
var audtot = 35.90;
var canvas = document.getElementById("myCanvas");
//var imggg = document.getElementById("myimg");
canvas.width = 550;
canvas.height = 400;
var context = canvas.getContext("2d");
var variation = 0.2;
var interval = 65;

function JLoop() {
    if (inc < tot) {
        if (vid.currentTime < ((audtot * inc) / tot) - variation || (vid.currentTime > ((audtot * inc) / tot) + variation)) {
            contflag = 1;
            vid.currentTime = ((audtot * inc) / tot);
        }
        // Draw the animation
        try {
            context.clearRect(0, 0, canvas.width, canvas.height);
            if (arr[inc - 1] != undefined) {
                context.drawImage(arr[inc - 1], 0, 0, canvas.width, canvas.height);
                // clear the src after drawing so the frame's data URL can be released
                arr[inc - 1].src = "";
                //document.getElementById("myimg" + inc).style.display = "block";
                //document.getElementById("myimg" + (inc - 1)).style.display = "none";
                //imggg.src = arr[inc - 1].src;
            }
            $("#audiofile").val(inc);
            // clearInterval(ref);
        } catch (e) {
        }
        inc++;
        // interval = 60;
        //setTimeout(JLoop, interval);
    }
}

var ref = setInterval(JLoop, interval);
Worked for me on the memory leak, thanks dude.

Why am I running out of memory when downloading multiple files in appcelerator mobile on Android

I have asked this question on the Appcelerator forum as well, but as I often get better answers from you lovely people here on Stack Overflow, I am also asking it here just in case anyone can shed some light.
I have created a downloadQueue of urls and am using it to download files with the httpclient. Each file in the downloadQueue is sent to the httpclient one at a time, with the next download being initiated only after the previous one has completed.
When I start the download it seems to be working correctly and manages to download several files before it simply freezes and I get an "out of memory" error in the DDMS error log.
I tried implementing suggestions found in other posts, a sample of which follows:
http://developer.appcelerator.com/question/28911/httpclient-leaks-easily-or-can-we-have-a-close-method#answer-104241
http://developer.appcelerator.com/question/35041/large-file-download-on-mobile
http://developer.appcelerator.com/question/120129/httpclient-and-setfile
http://developer.appcelerator.com/question/95521/httpclient---save-response-directly-to-file
I tried all of the following:
- moving larger file downloads directly from the nativePath rather than simply saving to file, in order to ensure that tmp files are not kept longer than necessary
- using the undocumented setFile method of the httpclient (this stopped my code dead without any error message, and as it is undocumented I have no idea if it was ever implemented on Android anyway)
- using a setTimeout in httpclient.onload after the file has been downloaded, to pause for 1 second before requesting the next file (I have no idea how this would help, but I am clutching at straws now)
Below are the relevant parts of my code (which is complete except for the GetFileUrls function, which I excluded for simplicity's sake, as all it does is return an array of URLs).
Can anyone spot anything that might be causing my memory issue? Does anyone have any ideas, as I have tried everything I can think of? (HELP!)
var count = 0;
var downloadQueue = [];
var rootDir = Ti.Filesystem.getExternalStorageDirectory();

var downloader = Ti.Network.createHTTPClient({timeout: 10000});
downloader.onerror = function () {
    Ti.API.info(this.responseData);
};
downloader.onload = function () {
    SaveFile(this.folderName, this.fileName, this.responseData);
    count += 1;
    setTimeout(function () { DownloadFile(); }, 1000);
};

downloadQueue = GetFileUrls(); /* not included to keep the post short, but it returns an array of urls */
DownloadFile(); // kick off the queue; DownloadFile reads downloadQueue[count] itself

function DownloadFile() {
    if (count < downloadQueue.length) {
        var fileUrl = downloadQueue[count];
        var fileName = fileUrl.substring(fileUrl.lastIndexOf('/') + 1);
        downloader.fileName = fileName;
        downloader.folderName = rootDir;
        downloader.open('GET', fileUrl);
        downloader.send();
    }
}

function SaveFile(foldername, filename, response) {
    if (response.type == 1) {
        var f = Ti.Filesystem.getFile(response.nativePath);
        var dest = Ti.Filesystem.getFile(foldername, filename);
        if (dest.exists()) {
            dest.deleteFile();
        }
        f.move(dest.nativePath);
    } else {
        var dest = Ti.Filesystem.getFile(foldername, filename);
        dest.write(response);
    }
}
Try using events instead of the nested recursion you are using; Android does not seem to like that too much.
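A minimal sketch of what that could look like, assuming the Ti.App event bus and reusing count, downloadQueue, rootDir, downloader and SaveFile from the question (the event name download:next is made up for illustration):
// Fire an app-level event instead of chaining setTimeout inside onload
Ti.App.addEventListener('download:next', function () {
    if (count < downloadQueue.length) {
        var fileUrl = downloadQueue[count];
        downloader.fileName = fileUrl.substring(fileUrl.lastIndexOf('/') + 1);
        downloader.folderName = rootDir;
        downloader.open('GET', fileUrl);
        downloader.send();
    }
});

downloader.onload = function () {
    SaveFile(this.folderName, this.fileName, this.responseData);
    count += 1;
    Ti.App.fireEvent('download:next'); // each download starts from a fresh stack frame
};

Ti.App.fireEvent('download:next'); // start the queue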

get remote image into string

The question says it all: how do I get a remotely hosted image into a string? I will later use an XMLHttpRequest POST to upload the content. This is a JavaScript question, for those who don't read the tag line.
@Madmartigan: the script itself is executed in a rather odd manner: the user uses javascript: to append the script from the remote host (this gives access to the user's cookie session, which we need in order to proceed). This generates a form, giving the user the ability to set up some texts (this is the easy bit). When the user clicks upload, the script must get an image hosted on a remote host. I am trying to get the image from the remote host as a string and then use something like the function below to convert it to binary. So, how do I do that?
function toBin(str) {
    var d, i, j;
    var arr = [];
    var len = str.length;
    for (i = 1; i <= len; i++) {
        // walk the string backwards so the bits build up like a stack
        d = str.charCodeAt(len - i);
        for (j = 0; j < 8; j++) {
            arr.push(d % 2);
            d = Math.floor(d / 2);
        }
    }
    // reverse all bits again
    return arr.reverse().join("");
}
I should mention that I managed to find things like:
var reader = new FileReader();
reader.onload = function() {
    previewImage.src = reader.result;
};
reader.readAsDataURL(myFile);
However, they are very browser dependent and therefore not very useful.
I am trying to avoid using base64 because of the redundant size increase.
EDIT: take a look here, it should help you: http://www.nihilogic.dk/labs/exif/ or maybe here: http://jsfromhell.com/classes/binary-parser The only way to store binary data in a string in a JavaScript context is to use base64/base128 encoding, but I have never tried it myself in the case of an image. There are many JavaScript base encoders/decoders out there. Hope this helps.
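One classic trick worth mentioning here (a sketch, not something from the answer above): you can coax XMLHttpRequest into handing back raw bytes as a string by overriding the MIME type with a user-defined charset, then masking each char code down to a byte. This only works same-origin or with CORS permission:
function fetchImageAsBinaryString(url, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, true);
    // tell the browser not to decode the body as text in any real charset
    xhr.overrideMimeType('text/plain; charset=x-user-defined');
    xhr.onload = function () {
        var raw = xhr.responseText;
        var bytes = '';
        for (var i = 0; i < raw.length; i++) {
            // each char code carries one byte in its low 8 bits
            bytes += String.fromCharCode(raw.charCodeAt(i) & 0xFF);
        }
        callback(bytes);
    };
    xhr.send();
}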
