So I ran into an issue when writing contents to a file via the HTML5 FileSystem API. When the new content is shorter than the previous content, the old content is overwritten as expected, but the tail of the old content remains at the end of the file. The data I am writing is metadata for a web app. It changes periodically, but not very often, and it generally grows in size, though occasionally it shrinks.
For example, if the file originally contains 0000000000 and the new content is 11123, after writing to the file its contents become 1112300000.
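To see why the tail survives, here is a toy model in plain JavaScript of writing new content at position 0. It is not the real FileWriter, and `simulateWriteAt` is a hypothetical name. Characters are replaced in place, and nothing beyond the end of the new content is removed.

```javascript
// Toy model of an in-place write: the new content replaces characters
// starting at `position`, and anything past its end is left untouched,
// so the "file" never shrinks.
function simulateWriteAt(oldContent, newContent, position) {
  const head = oldContent.slice(0, position);
  const tail = oldContent.slice(position + newContent.length);
  return head + newContent + tail;
}

console.log(simulateWriteAt("0000000000", "11123", 0)); // prints 1112300000
```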
To get around this, I have been removing the file on every call and passing a callback that writes the new data. (cnDAO.fileSystem is the filesystem object obtained when requesting persistent storage, and it has been initialized appropriately.)
function writeToFile(fPath, data, callback) {
  // Delete the old file first so no stale tail can survive.
  rmFile(fPath, function() {
    cnDAO.fileSystem.root.getFile(fPath, {
      create: true
    }, function(fileEntry) {
      fileEntry.createWriter(function(writer) {
        writer.onwriteend = function(e) {
          callback();
        };
        writer.onerror = errorHandler; // previously an empty handler that swallowed errors
        var blob = new Blob([data]);
        writer.write(blob);
      }, errorHandler);
    }, errorHandler);
  });
}

function rmFile(fPath, callback) {
  // create: true so getFile succeeds even if the file does not exist yet.
  cnDAO.fileSystem.root.getFile(fPath, {
    create: true
  }, function(fileEntry) {
    fileEntry.remove(callback, errorHandler);
  }, errorHandler);
}
So I was wondering whether there is a better way to do this. While searching for a solution I came across truncate (this post). As pointed out in that post, truncate can only be called immediately after opening a file. Is truncate a better approach? Is what I'm doing better practice? Is there a quicker and easier way that I don't know about?
I would like to just start fresh on every write to the file, if that is feasible and/or good practice.
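A minimal sketch of the truncate route, assuming the same non-standard FileWriter interface used above. The hypothetical `overwriteFile` helper truncates the file to zero length first and only writes once that operation's `onwriteend` fires, because a FileWriter rejects a new operation while one is still in progress. The file stays in place, so there is no remove/recreate round trip.

```javascript
// Sketch: replace a file's contents without deleting it.
// `fileEntry` is whatever getFile() handed you; `onError` is your handler.
function overwriteFile(fileEntry, data, onDone, onError) {
  fileEntry.createWriter(function (writer) {
    let truncated = false;
    writer.onerror = onError;
    writer.onwriteend = function () {
      if (!truncated) {
        // First writeend comes from truncate(0): the file is now empty.
        truncated = true;
        writer.write(new Blob([data], { type: "text/plain" }));
      } else {
        // Second writeend comes from write(): the new data is in place.
        onDone();
      }
    };
    writer.truncate(0); // called right after the writer is created
  }, onError);
}
```

In writeToFile, the rmFile step could then be replaced by a getFile(fPath, {create: true}, ...) call whose success callback runs overwriteFile(fileEntry, data, callback, errorHandler).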
Related
I need to extract just the header from a remote csv file.
My current method is as follows:
Papa Parse has a method to stream data and look at each row individually, which is great, and I can terminate the stream with parser.abort() to stop it going any further after the first row. It looks as follows:
Papa.parse(csv_file_and_path, {
  header: true,
  worker: true,
  download: true,
  step: function(row, parser) {
    //DO MY STUFF HERE
    parser.abort();
  }
});
This works fine, but because I am using a remote file, it has to download the data in order to read it. Even though the code releases control back to the browser after the first line has been parsed, the download continues, and for large files it can run for a long time after I've got what I need.
Is there a more efficient way of doing this? Can I prevent Papa Parse from downloading the whole file?
I have also tried:
Papa.parse(csv_file, {
  header: true,
  download: true,
  preview: 1,
  complete: function(results) {
    //DO MY STUFF HERE
  }
});
But this does the same thing: it downloads the entire file, although, as with the first approach, it gives control back to the browser after the first line is parsed.
The solution I came up with is very similar to my original question; the difference is that I abort, let the parse complete, and clear the memory.
With the following method, only a single chunk of the file is downloaded, which massively reduces bandwidth for a large file, since nothing keeps downloading after the first line is parsed.
Papa.parse(csv_file, {
  header: true,
  download: true,
  step: function(results, parser) {
    //DO MY THING HERE
    parser.abort();
    // Drop the local reference so the row can be garbage-collected.
    // (`delete results;` does nothing to a variable and is a SyntaxError in strict mode.)
    results = null;
  },
  complete: function(results) {
    results = null; // only clears this function's local reference
  }
});
You can use the preview option of PapaParse:
Papa.parse(..., {
  preview: 5, ...
});
Also read this: https://github.com/mholt/PapaParse/issues/47
Related topic: Javascript using File.Reader() to read line by line
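If only the header row is needed, one option that bypasses Papa Parse's own downloader is to stream the response with fetch() and stop reading at the first line break. This is a swapped-in technique, not part of Papa Parse's API, and `readFirstLine` is a hypothetical helper. It assumes the header contains no quoted newlines and that the browser supports streaming response bodies; in such browsers, cancelling the reader should abort the rest of the transfer.

```javascript
// Sketch: read a byte stream only until the first line break, then
// cancel it so no further data is requested.
async function readFirstLine(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
    const nl = text.indexOf("\n");
    if (nl !== -1) {
      await reader.cancel(); // stop pulling the rest of the body
      return text.slice(0, nl).replace(/\r$/, "");
    }
  }
  text += decoder.decode(); // flush any buffered partial character
  return text.replace(/\r$/, "");
}

// Browser usage (csvUrl is a placeholder):
// fetch(csvUrl)
//   .then(function (res) { return readFirstLine(res.body); })
//   .then(function (line) { var header = Papa.parse(line).data[0]; });
```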
Here is the function:
this.saveObj = function(o, finished)
{
  root.getDirectory("object", {create: true}, function(directoryEntry)
  {
    directoryEntry.getFile("object.json", {create: true}, function(fileEntry)
    {
      fileEntry.createWriter(function(fileWriter)
      {
        fileWriter.onwriteend = function(e)
        {
          finished(fileEntry);
        };
        fileWriter.onerror = errorHandler;
        // "application/json" is the registered MIME type; plain "json" is not.
        var blob = new Blob([JSON.stringify(o)], {type: "application/json"});
        fileWriter.write(blob);
      }, errorHandler);
    }, errorHandler);
  }, errorHandler);
};
Now when I save an object, everything works fine. Let's say I save {"id":1}; my file content is {"id":1}. Now I edit the object with o = {}; and save it again, and my file content suddenly is {}id":1}.
It just overwrites the old content but doesn't clear it. Do I have to delete the file before writing to it, or is there something I'm missing?
As far as I understand, the write method writes the supplied content at a position. To me this implies that the existing content is untouched except where you overwrite parts of it. So I'm going to say yes: delete the file and save a new one.
source
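Deleting works; an alternative that keeps the file is to trim it after writing. This sketch assumes the same FileWriter interface as saveObj, and `writeJson` is a hypothetical name: write the JSON, then, if the file is longer than what was just written, truncate it at the writer's current position.

```javascript
// Sketch: write first, then cut off whatever stale bytes lie past the
// end of the new data.
function writeJson(fileEntry, obj, onDone, onError) {
  fileEntry.createWriter(function (writer) {
    let trimmed = false;
    writer.onerror = onError;
    writer.onwriteend = function () {
      if (!trimmed && writer.length > writer.position) {
        trimmed = true;
        writer.truncate(writer.position); // drop the leftover tail
      } else {
        onDone(fileEntry);
      }
    };
    writer.write(new Blob([JSON.stringify(obj)], { type: "application/json" }));
  }, onError);
}
```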
According to the Mozilla documentation, when using only { create: true }:
The existing file or directory is removed and replaced with a new one,
then the successCallback is called with a FileSystemFileEntry or a
FileSystemDirectoryEntry, as appropriate.
Tested in Chrome 72, this seems to be the case.
This does not work, as the file seems to persist: its first bytes are overwritten, but its size stays the same. So this is a bug in at least Chrome 72.
Source
The JavaScript process generates a lot of data (200-300 MB). I would like to save this data for further analysis, but the best I have found so far is saving with this example http://jsfiddle.net/c2U2T/, which is not an option for me because it seems to require all the data to be available before the download starts. What I need is something like:
var saver = new Saver();
saver.save(); // The Save As ... dialog appears
saver.onaccepted = function () { // user accepted saving
  for (var i = 0; i < 1000000; i++) {
    saver.write(Math.random());
  }
};
Of course, instead of Math.random() there will be some meaningful data.
@dader - I would build upon dader's example.
Use the HTML5 FileSystem API, but instead of writing every line to the file individually (more I/O than it is worth), batch lines in memory in a JavaScript object/array/string and only write them to the file when they reach a certain threshold. You are thus appending to a local file as the process chugs along, which makes it easy to pause/restart/stop etc.
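A minimal sketch of the batching idea. `BatchedAppender` and its `flush` callback are hypothetical; with the FileSystem API, `flush` would seek() the writer to its current length and write() a Blob of the batch.

```javascript
// Sketch: collect generated text in memory and only hand it to `flush`
// once at least `threshold` characters have accumulated.
class BatchedAppender {
  constructor(flush, threshold) {
    this.flush = flush;
    this.threshold = threshold;
    this.parts = [];
    this.size = 0;
  }

  write(text) {
    this.parts.push(text);
    this.size += text.length;
    if (this.size >= this.threshold) this.drain();
  }

  // Push out whatever is buffered; call once more when generation ends.
  drain() {
    if (this.parts.length === 0) return;
    this.flush(this.parts.join(""));
    this.parts = [];
    this.size = 0;
  }
}
```

Because FileWriter operations are asynchronous and a writer refuses a new write while one is in progress, a real `flush` would need to queue batches and start the next write from `onwriteend`.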
Of note is the following, which is an example of how you can spawn the dialog requesting the amount of storage you need (it sounds large). Tested in Chrome:
navigator.webkitPersistentStorage.queryUsageAndQuota(
  function (usage, quota) {
    var availableSpace = quota - usage;
    var requestingQuota = args.size + usage; // args.size: bytes you need
    if (availableSpace >= args.size) {
      window.webkitRequestFileSystem(window.PERSISTENT, availableSpace, persistentStorageGranted, persistentStorageDenied);
    } else {
      navigator.webkitPersistentStorage.requestQuota(
        requestingQuota, function (grantedQuota) {
          window.webkitRequestFileSystem(window.PERSISTENT, grantedQuota - usage, persistentStorageGranted, persistentStorageDenied);
        }, errorCb
      );
    }
  }, errorCb);
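The branch in that snippet boils down to a little arithmetic. Restated as a pure function (`planQuota` is a hypothetical helper, not a browser API), it decides whether the existing quota already covers the bytes needed or a larger quota must be requested first.

```javascript
// Given current usage and quota (bytes) and the bytes needed, return
// what the snippet above would do next.
function planQuota(usage, quota, needed) {
  const available = quota - usage;
  if (available >= needed) {
    // Enough room: open the file system with the space that is left.
    return { requestQuota: false, fileSystemSize: available };
  }
  // Not enough: ask for a total quota covering existing usage plus `needed`.
  return { requestQuota: true, newQuota: usage + needed };
}
```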
When you are done, you can use JavaScript to open a new window with the URL of the file you saved, which you can retrieve via fileEntry.toURL().
Or, when it is done crunching, you can just display that URL in an HTML link; the user can then right-click it and use Save Link As.
This is something new and cool that you can do entirely in the browser, without involving a server at all. Side note: 200-300 MB of data generated by a JavaScript process sounds absolutely huge, which would make me wonder whether you are storing the "right" data.
What you are actually trying to do is a kind of streaming, and the File API is not suited to that task. Instead, I can suggest two options:
The first uses XHR (i.e. Ajax): split your data into several chunks that are sent to the server sequentially, each chunk in its own request, along with an id (to identify the stream) and a position index (to identify the chunk's position). I wouldn't recommend it, since it adds the work of breaking up and reassembling the data, and there's a better solution.
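To make the break-up-and-reassemble overhead concrete, here is a sketch of just the chunk bookkeeping for the XHR option. `toChunks` and `reassemble` are hypothetical helpers, and the actual sending (one request per chunk) is left out.

```javascript
// Client side: tag each chunk with the stream id and its position index.
function toChunks(streamId, text, chunkSize) {
  const chunks = [];
  for (let i = 0, index = 0; i < text.length; i += chunkSize, index++) {
    chunks.push({ id: streamId, index: index, data: text.slice(i, i + chunkSize) });
  }
  return chunks;
}

// Server side: requests may arrive out of order, so sort by index first.
function reassemble(chunks) {
  return chunks
    .slice()
    .sort(function (a, b) { return a.index - b.index; })
    .map(function (c) { return c.data; })
    .join("");
}
```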
The second way is to use the WebSocket API. It lets you send data to the server sequentially as it is generated, following a typical stream API. I think this is definitely what you need.
This page may be a good place to start: http://binaryjs.com/
That's all, folks!
EDIT, considering your comment:
I'm not sure I fully get your point, but what about HTML5's FileSystem API?
There are a couple of examples here: http://www.html5rocks.com/en/tutorials/file/filesystem/, among them this sample that lets you append data to an existing file. You can also create a new file, etc.:
function onInitFs(fs) {
  fs.root.getFile('log.txt', {create: false}, function(fileEntry) {
    // Create a FileWriter object for our FileEntry (log.txt).
    fileEntry.createWriter(function(fileWriter) {
      fileWriter.seek(fileWriter.length); // Start write position at EOF.
      // Create a new Blob and write it to log.txt.
      var blob = new Blob(['Hello World'], {type: 'text/plain'});
      fileWriter.write(blob);
    }, errorHandler);
  }, errorHandler);
}
EDIT 2:
What you're trying to do is not possible using JavaScript, as said on SO here. The author nonetheless suggests using a Java applet to achieve the needed behaviour.
In a nutshell, the HTML5 FileSystem API only provides a sandboxed filesystem, i.e. one located in some hidden directory of the browser. So if you want to access the real filesystem, using Java would be just fine for your use case. I guess there is an interface between Java and JavaScript here.
But if you want your data to be available only from the browser (constrained by the same-origin policy), use the FileSystem API.
Chrome implements the file interface as described here http://www.html5rocks.com/en/tutorials/file/filesystem/, just adding the webkit prefix. The documentation covers several aspects of the interface, but what are the simplest steps, for example, to prompt the user with a file saving dialog, or to tell him that the file has been saved somewhere? For example, let's say we want to save some text data for the user.
I'm mainly referring to lines of code as a metric of simplicity, but within the 80 characters per line (and common sense). I'm also referring to Chrome 26.
This is what I found. Naturally, its use is quite limited, and it is better to refer to the main article linked above.
function error(e) { console.log(e); }

webkitRequestFileSystem(TEMPORARY, Math.pow(2, 10), function(fs) {
  fs.root.getFile('exported.txt', {create: true}, function(fileEntry) {
    fileEntry.createWriter(function(fileWriter) {
      fileWriter.onwriteend = function() {
        alert('content saved to ' + fileEntry.fullPath);
      };
      var blob = new Blob(['Lorem Ipsum'], {type: 'text/plain'});
      fileWriter.write(blob);
    }, error); // createWriter also needs an error callback
  }, error);
}, error);
I'm trying to write a fail-safe program that uses the canvas to draw very large images (60 MB is probably the upper range, while 10 MB is the lower range). I discovered long ago that calling the canvas's synchronous toDataURL method usually crashes the page, so I have adapted the program to use the toBlob method, with a polyfill for cross-browser compatibility. My question is this: how long do Blob URLs created with URL.createObjectURL(blob) last?
I would like to know if there's a way to cache the Blob URL so that it lasts beyond the browser session, in case somebody wants to render part of the image, close the browser, and come back later to finish it by reading the Blob URL into the canvas again and resuming from where it left off. I noticed that the optional autoRevoke argument may be what I'm looking for, but I'd like confirmation that what I'm trying to do is actually possible. No code example is needed in your answer unless it involves a different solution; all I need is a yes or no on whether it's possible to make a Blob URL last beyond sessions using this method or otherwise. (This would also be handy as a "restore session" option if the page crashes for some reason.)
I was thinking of something like this:
function saveCache() {
  var canvas = $("#canvas")[0];
  canvas.toBlob(function (blob) {
    /*if I understand correctly, this prevents it from unloading
    automatically after one asynchronous callback*/
    var blobURL = URL.createObjectURL(blob, {autoRevoke: false});
    localStorage.setItem("cache", blobURL);
  });
}
//assume that this might be a new browser session
function loadCache() {
  var url = localStorage.getItem("cache");
  if (typeof url == "string") {
    var img = new Image();
    img.onload = function () {
      $("#canvas")[0].getContext("2d").drawImage(img, 0, 0);
      //clear cache since it takes up a LOT of unused memory
      URL.revokeObjectURL(url);
      //remove reference to deleted cache
      localStorage.removeItem("cache");
      init(true); //cache successfully loaded, resume where it left off
    };
    img.onprogress = function (e) {
      //update progress bar
    };
    img.onerror = loadFailed; //notify user of failure
    img.src = url;
  } else {
    init(false); //nothing was cached, so start normally
  }
}
Note that I am not certain this will work the way I intend, so any confirmation would be awesome.
EDIT just realized that sessionStorage is not the same thing as localStorage :P
Blob URL can last across sessions? Not the way you want it to.
The URL is a reference represented as a string, which you can save in localStorage just like any string. The location that URL points to is what you really want, and that won't persist across sessions.
When using URL.createObjectURL() in conjunction with the autoRevoke argument, the URL will persist until you call URL.revokeObjectURL() or "till the unloading document cleanup steps are executed" (steps outlined here: http://www.w3.org/TR/html51/browsers.html#unloading-document-cleanup-steps).
My guess is that those steps are being executed when the browser session expires, which is why the target of your blobURL can't be accessed in subsequent sessions.
Some other discourse on this: How to save the window.URL.createObjectURL() result for future use?
The above leads to a recommendation to use the FileSystem API to save the blob representation of your canvas element. When requesting the file system the first time, you'll need to request PERSISTENT storage, and the user will have to agree to let you store data on their machine permanently.
http://www.html5rocks.com/en/tutorials/file/filesystem/ has a good primer on everything you'll need.
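A minimal sketch of that recommendation, assuming `root` is `fs.root` from a PERSISTENT webkitRequestFileSystem call (Chrome's non-standard FileSystem API); `CACHE_NAME`, `saveCanvasBlob` and `loadCanvasBlobUrl` are hypothetical names. The idea is to persist the blob itself, not its URL, and mint a fresh blob URL from the stored file in each new session.

```javascript
const CACHE_NAME = "canvas-cache.png"; // hypothetical file name

// Store the canvas blob as a file in the sandboxed file system.
function saveCanvasBlob(root, blob, onDone, onError) {
  root.getFile(CACHE_NAME, { create: true }, function (fileEntry) {
    fileEntry.createWriter(function (writer) {
      let truncated = false;
      writer.onerror = onError;
      writer.onwriteend = function () {
        if (!truncated) {
          truncated = true;
          writer.write(blob); // file is empty now, so no stale tail remains
        } else {
          onDone();
        }
      };
      writer.truncate(0); // a smaller image must not inherit old bytes
    }, onError);
  }, onError);
}

// In a later session: read the file back and create a new blob URL for it.
// getFile without create fails (calls onMissing) if nothing was cached.
function loadCanvasBlobUrl(root, onUrl, onMissing) {
  root.getFile(CACHE_NAME, {}, function (fileEntry) {
    fileEntry.file(function (file) {
      onUrl(URL.createObjectURL(file));
    }, onMissing);
  }, onMissing);
}
```

The URL passed to `onUrl` could then be assigned to `img.src` in place of the one read from localStorage in the question's loadCache(), and revoked after the image loads as before.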