Get just header from remote csv file using papa parse - javascript

I need to extract just the header from a remote csv file.
My current method is as follows:
Papa parse has a method to stream data and look at each row individually which is great, and I can terminate the stream using parser.abort() to prevent it going any further after the first row, this looks as follows:
Papa.parse(csv_file_and_path, {
    header: true,
    worker: true,
    download: true,
    step: function(row, parser) {
        // DO MY STUFF HERE
        parser.abort();
    }
});
This works fine, but because I am using a remote file, the data has to be downloaded before it can be read. Although control returns to the browser once the first line has been parsed, the download keeps going long after the parser has found the first row and given me the information I need. For large files the download can continue for a long time after I have what I need.
Is there a more efficient way of doing this? Can I prevent papa parse from downloading the whole file?
I have tried using
Papa.parse(csv_file, {
    header: true,
    download: true,
    preview: 1,
    complete: function(results) {
        // DO MY STUFF HERE
    }
});
But this behaves the same way: it downloads the entire file, although, as with the first approach, it returns control to the browser after the first line is parsed.

The solution I came up with is very similar to the one in my original question; the difference is that I abort, complete, and try to clear the results from memory.
With the following method only a single chunk of the file is downloaded, which massively reduces the bandwidth overhead for a large file, as no download continues after the first line is parsed.
Papa.parse(csv_file, {
    header: true,
    download: true,
    step: function(results, parser) {
        // DO MY THING HERE
        parser.abort();
        results = null; // Drop the reference so the results can be garbage-collected
        // (note: `delete` has no effect on variables, so nulling the reference is all we can do)
    },
    complete: function(results) {
        results = null; // Drop the reference so the results can be garbage-collected
    }
});
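If only the header row is needed, it may also help to shrink the size of that single downloaded chunk. A minimal sketch, assuming your Papa Parse build exposes the global Papa.RemoteChunkSize setting (around 5 MB by default in the versions I have used):
// Hedged: shrink the remote chunk so the single aborted request is cheap.
// 16 KB is an arbitrary size that should comfortably hold most header rows.
Papa.RemoteChunkSize = 1024 * 16;
Papa.parse(csv_file, {
    header: true,
    download: true,
    step: function(results, parser) {
        // DO MY THING HERE
        parser.abort();
    }
});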

You can use the preview option of PapaParse:
Papa.parse(..., {
    preview: 5,
    ...
});
Also read this: https://github.com/mholt/PapaParse/issues/47
Related topic: Javascript using File.Reader() to read line by line
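If the server supports HTTP range requests, another option is to bypass Papa Parse's downloader entirely: fetch only the first few kilobytes yourself and hand that text to Papa Parse. A minimal sketch, assuming the server honors the Range header and the header row fits in the first 4 KB:
fetch(csv_file_and_path, { headers: { 'Range': 'bytes=0-4095' } })
    .then(function(response) { return response.text(); })
    .then(function(text) {
        // Parsing a string is synchronous; preview: 1 stops after the first row.
        var results = Papa.parse(text, { header: true, preview: 1 });
        console.log(results.meta.fields); // the header columns
    });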

How to enforce file size limit in jquery.fileupload

I'm using jquery.fileupload() to upload files from a browser to a Node.js server, which parses the files with the "multiparty" npm package. I need to enforce a size limit on each file, and also on the total size of all the files being uploaded in one request.
The "multiparty" package allows me to do the latter, but not the former. And even for the latter, the limit isn't enforced until the browser has uploaded enough data to hit it. So the user can wait a long time only to get an error message.
I'd like to enforce the limit on the client-side. I've searched the Internet for solutions, but none of them seem to work. They may have worked in the past, but not with the newest version of Chrome.
I've found that I can determine that the files are too big by watching for a "change" event on the file-input element, like this:
$('#file-input-id').on('change', function() {
    console.log(this.files);
});
When this event triggers, this.files contains a FileList of the selected files, including the name and size of each. So I can determine that the caps have been exceeded, and I can alert the user. But I don't know how to stop the files from uploading anyway. Various sources on the Internet suggest that I can do this by returning false or by manipulating this.files, but none of that seems to work.
I'm testing this against the latest version of Chrome (66.0.3359.139), but I'd like a solution that works with any modern browser.
Each File object on the input element has a size property which you can use to compare against your limit and validate on the client. I wrote an example in plain JavaScript; I know you want it in jQuery, but that was kind of already answered here.
Anyway, this is what I came up with:
var inputElement = document.getElementById("file");
inputElement.addEventListener('change', function() {
    var fileLimit = 100; // limit in kilobytes; could be whatever you want
    var files = inputElement.files; // inputElement.files is a FileList
    var fileSize = files[0].size; // size in bytes
    var fileSizeInKB = fileSize / 1024;
    if (fileSizeInKB < fileLimit) {
        console.log("file go for launch");
        // add file to server here
    } else {
        console.log("file too big");
        // do not pass go, do not add to server. Pass error to user
        document.getElementById("error").innerHTML = "your file is over 100 KB";
    }
});
(CodePen https://codepen.io/HappinessFactory/pen/yjggbq)
Hope that answers your question. Good luck!
Thanks! I'm sure your answer would work if I weren't using jquery.fileupload(), but jquery.fileupload() starts the upload automatically. So there's no "add file to server" logic to perform/skip.
But your answer sent me off in the right direction. For anyone else stuck on this: The trick is to use the "start" or "submit" properties of the "options" object passed into jquery.fileupload(). Both of these are functions, and if either one returns false, the upload is cancelled.
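For anyone who wants to see it spelled out, here is a minimal sketch of that cancellation pattern, assuming blueimp's jquery.fileupload() and a 100 KB per-file cap (the selector, URL, and limit are placeholders):
$('#file-input-id').fileupload({
    url: '/upload', // hypothetical endpoint
    submit: function (e, data) {
        var maxKB = 100; // placeholder cap
        var tooBig = data.files.some(function (f) {
            return f.size / 1024 > maxKB;
        });
        if (tooBig) {
            alert('One of the files exceeds ' + maxKB + ' KB');
            return false; // returning false cancels the upload
        }
    }
});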

Access fileEntry after dropping a file in a Chrome App

Is it possible to get a fileEntry object in Chrome Apps by opening a file via Drag'n'Drop? When I drop a file into my app I only get a file object which seems to be unrelated to the file system. I can't use that object to save the file after changing it.
I get the file like this:
document.body.addEventListener('drop', function (event) {
    file = event.dataTransfer.files[0];
});
What I want to do
I'm developing a text editor and I want to add a feature to open a file by dragging it into my app.
As I said: I already get the content of the file, but I can't write changes back to the file since I need a fileEntry object in order to do so.
Okay, I just found it while inspecting the event object. Each item in event.dataTransfer.items has a method called webkitGetAsEntry() that returns the fileEntry object. Here's the correct code:
document.body.addEventListener('drop', function (event) {
    fileEntry = event.dataTransfer.items[0].webkitGetAsEntry();
});
This is the object you can use to write changes back to the file system.
Code for reference:
// Of course this needs the "fileSystem" permission.
// Dropped files from the file system aren't writable by default,
// so we need to make the entry writable first.
chrome.fileSystem.getWritableEntry(fileEntry, function (writableEntry) {
    writableEntry.createWriter(function (writer) {
        // Here use `writer.write(blob)` to write changes to the file system.
        // Very important detail when you write files:
        // https://developer.chrome.com/apps/app_codelab_filesystem
        // Look for the part that reads `if (!truncated)`.
        // It is very hard to find and causes an annoying error
        // when you don't know how to correctly truncate files
        // while writing content to them.
    });
});
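To spell out the truncation detail those comments point at, here is a minimal sketch of the `if (!truncated)` pattern from the Chrome codelab; the blob content is a placeholder:
writableEntry.createWriter(function (writer) {
    var truncated = false;
    writer.onwriteend = function () {
        if (!truncated) {
            truncated = true;
            // If the new content is shorter than the old, cut off the stale tail.
            this.truncate(this.position);
        }
    };
    writer.write(new Blob(['new file contents'], {type: 'text/plain'}));
});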

How NOT to stop reading file when meeting EOF?

I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very time, and then return chunks of data immediately as they are appended to the file. It can be thought of as similar to the tail -f UNIX command, but acting immediately as chunks become available instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you do with a socket: expecting on('data') to trigger from time to time until the file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with a similar designation), and then read chunks as the file descriptor is marked "readable". So, when there is nothing to be read, it won't be readable, and when something is appended to the file, it becomes readable again.
I somewhat expected this kind of behavior from the following code sample in JavaScript:
function readThatFile(filename) {
    const stream = fs.createReadStream(filename, {
        flags: 'r',
        encoding: 'utf8',
        autoClose: false // I thought this would prevent file closing on EOF too
    });
    stream.on('error', function(err) {
        // handle error
    });
    stream.on('open', function(fd) {
        // save fd, so I can close it later
    });
    stream.on('data', function(chunk) {
        // process chunk
        // fs.close() if I no longer need this file
    });
}
However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but that won't work over a network, and I don't like the idea of reopening the file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But I had no luck: net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');

function tail(path, callback) {
    // Buffer.alloc replaces the deprecated `new Buffer(...)` constructor.
    var descriptor, bytes = 0, buffer = Buffer.alloc(256), line = '';

    function parse(err, bytesRead, buffer) {
        if (err) {
            callback(err, null);
            return;
        }
        // Keep track of the bytes we have consumed already.
        bytes += bytesRead;
        // Combine the buffered line with the new string data.
        line += buffer.toString('utf-8', 0, bytesRead);
        var i = 0, j;
        while ((j = line.indexOf('\n', i)) != -1) {
            // Callback with a single line at a time.
            callback(null, line.substring(i, j));
            // Skip the newline character.
            i = j + 1;
        }
        // Only keep the unparsed string contents for the next iteration.
        line = line.substr(i);
        // Keep reading in the next tick (avoids CPU hogging).
        process.nextTick(read);
    }

    function read() {
        var stat = fs.fstatSync(descriptor);
        if (stat.size <= bytes) {
            // We're currently at the end of the file. Check again in 500 ms.
            setTimeout(read, 500);
            return;
        }
        fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
    }

    fs.open(path, 'r', function (err, fd) {
        if (err) {
            callback(err, null);
        } else {
            descriptor = fd;
            read();
        }
    });

    return {
        close: function close(callback) {
            fs.close(descriptor, callback);
        }
    };
}

// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
    console.log(err, line);
});
// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
Another, perhaps hackier, approach might be to just use tail by spawning a child process with it. This of course comes with the limitation that tail must exist on the target platform, but one of Node's strengths is doing asynchronous systems development via spawn, and even on Windows you can execute Node in an alternate shell like msysgit or Cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'my.log']);

child.stdout.on('data', function (data) {
    console.log('tail output: ' + data);
});

child.stderr.on('data', function (data) {
    console.log('err data: ' + data);
});
So, it seems people have been looking for an answer to this question for five years now, and there is still no answer on topic.
In short: you can't. Not just in Node.js; you can't at all.
Long answer: there are a few reasons for this.
First, the POSIX standard clarifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
I can't tell for sure with epoll(), since it's not standardized and you would have to read a quite lengthy implementation, but I would assume it's similar.
Since libuv, which is at the core of the Node.js implementation, uses read(), pread(), and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, it always returns an empty buffer when EOF is encountered. So, no luck here either.
So, to summarize: if such functionality is desired, something must be wrong with your design, and you should revise it.
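That said, if polling is acceptable, a stat-based watcher gets close to the desired behavior. A minimal sketch using fs.watchFile, which polls stat() on an interval and therefore, unlike fs.watch, also works on network filesystems; the filename is a placeholder:
const fs = require('fs');

const filename = 'app.log'; // hypothetical file to follow
let position = 0;

fs.watchFile(filename, { interval: 500 }, (curr, prev) => {
    if (curr.size > position) {
        // Read only the newly appended bytes.
        fs.createReadStream(filename, { start: position, end: curr.size - 1 })
            .on('data', (chunk) => process.stdout.write(chunk));
        position = curr.size;
    }
});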
What you're describing is a FIFO file (First In, First Out; also known as a named pipe), which, as you said, works like a socket.
There's a Node.js module that allows you to work with FIFO files.
I don't know what you want it for, but there are better ways to work with sockets in Node.js. Try socket.io instead.
You could also have a look at this previous question:
Reading a file in real-time using Node.js
Update 1
I'm not familiar with any module that would do what you want with a regular file, instead of with a socket type one. But as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);

tail.stdout.on('data', function (data) {
    data = data.toString().trim(); // strip leading/trailing whitespace
    console.log(data);
});
Then from the command line try echo someline > somefile.txt and watch the console.
You might also would like to have a look at this: https://github.com/layerssss/node-tailer

Best way to write-over contents of file html5 filesystem api

So I ran into an issue when writing contents to a file via the HTML5 filesystem API. The issue occurs when the new content is shorter than the previous content: the old content is written over as expected, but the tail of the old content remains at the end of the file. The data I am writing is metadata for a given web app and tends to change periodically, but not very often; it generally increases in size, but occasionally the metadata is smaller.
For example: the original content of the file is 0000000000, the new content is 11123, and after writing, the file contents become 1112300000.
To get around this, I have been removing the file and passing a callback to write the new information in on every call. (cnDAO.fileSystem is the filesystem object obtained when requesting persistent storage and has been initialized appropriately.)
function writeToFile(fPath, data, callback) {
    rmFile(fPath, function() {
        cnDAO.fileSystem.root.getFile(fPath, {
            create: true
        }, function(fileEntry) {
            fileEntry.createWriter(function(writer) {
                writer.onwriteend = function(e) {
                    callback();
                };
                writer.onerror = function(e3) { };
                var blob = new Blob([data]);
                writer.write(blob);
            }, errorHandler);
        }, errorHandler);
    });
}

function rmFile(fPath, callback) {
    cnDAO.fileSystem.root.getFile(fPath, {
        create: true
    }, function(fileEntry) {
        fileEntry.remove(callback);
    }, errorHandler);
}
So, I was wondering whether there is a better way to do what I am doing. truncate came up while I was searching for a solution (this post); as pointed out there, truncate can only be called immediately after opening a file. Is truncate a better approach? Is what I'm doing good practice? Is there a quicker and easier way that I don't know about?
I would like to just start fresh on every write to the file, if that is plausible and/or good practice.
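For reference, a minimal sketch of the truncate-based alternative, reusing cnDAO.fileSystem and errorHandler from the question: truncate to zero first, and only write the new data once the truncate's writeend event has fired (a FileWriter raises writeend for truncates as well as for writes), so there is no remove/re-create round trip:
function writeToFileFresh(fPath, data, callback) {
    cnDAO.fileSystem.root.getFile(fPath, { create: true }, function(fileEntry) {
        fileEntry.createWriter(function(writer) {
            var truncated = false;
            writer.onwriteend = function() {
                if (!truncated) {
                    truncated = true;
                    writer.write(new Blob([data])); // phase 2: write the new content
                } else {
                    callback();
                }
            };
            writer.onerror = errorHandler;
            writer.truncate(0); // phase 1: drop the old contents
        }, errorHandler);
    }, errorHandler);
}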

Use FileAPI to download big generated data file

The JavaScript process generates a lot of data (200-300 MB). I would like to save this data for further analysis, but the best I have found so far is this example: http://jsfiddle.net/c2U2T/, which is not an option for me because it appears to require all the data to be available before the download starts. What I need is something like this:
var saver = new Saver();
saver.save(); // The Save As ... dialog appears
saver.onaccepted = function () { // user accepted saving
    for (var i = 0; i < 1000000; i++) {
        saver.write(Math.random());
    }
};
Of course, instead of Math.random() there will be some meaningful construction.
#dader - I would build upon dader's example.
Use the HTML5 FileSystem API, but instead of writing each and every line to the file (more IO than it is worth), batch some of the lines in memory in a JavaScript object/array/string and only write them to the file when they reach a certain threshold. You are thus appending to a local file as the process chugs along (which makes it easy to pause/restart/stop, etc.).
Of note is the following example of how you can spawn the dialog to request the amount of storage you need (it sounds large). Tested in Chrome:
navigator.persistentStorage.queryUsageAndQuota(
    function (usage, quota) {
        var availableSpace = quota - usage;
        var requestingQuota = args.size + usage; // args.size: the number of bytes you intend to store
        if (availableSpace >= args.size) {
            window.requestFileSystem(PERSISTENT, availableSpace,
                persistentStorageGranted, persistentStorageDenied);
        } else {
            navigator.persistentStorage.requestQuota(
                requestingQuota,
                function (grantedQuota) {
                    window.requestFileSystem(PERSISTENT, grantedQuota - usage,
                        persistentStorageGranted, persistentStorageDenied);
                },
                errorCb
            );
        }
    },
    errorCb
);
When you are done, you can use JavaScript to open a new window with the URL of the file you saved, which you can retrieve via fileEntry.toURL().
OR, when it is done crunching, you can just display that URL in an HTML link, and the user can right-click it and do whatever Save Link As they want.
But this is something new and cool that you can do entirely in the browser, without needing to involve a server in any way at all. Side note: 200-300 MB of data generated by a JavaScript process sounds absolutely huge; that would be a concern for whether you are storing the "right" data...
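To illustrate the batching idea, here is a minimal sketch, assuming fileWriter was obtained via fileEntry.createWriter() on a file in the granted filesystem (not shown here); the threshold is arbitrary, and a real version would also need to queue flushes until the writer's previous writeend fires, since a FileWriter rejects overlapping writes:
var batch = [];
var batchBytes = 0;
var THRESHOLD = 1024 * 1024; // flush roughly every 1 MB (arbitrary)

function append(line) {
    batch.push(line);
    batchBytes += line.length;
    if (batchBytes >= THRESHOLD) {
        flush();
    }
}

function flush() {
    if (batch.length === 0) return;
    fileWriter.seek(fileWriter.length); // append at the current EOF
    fileWriter.write(new Blob([batch.join('\n') + '\n']));
    batch = [];
    batchBytes = 0;
}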
What you are actually trying to do is a kind of streaming; the FileAPI is not suited for the task. Instead, I can suggest two options:
The first uses the XHR facility, i.e. Ajax: split your data into several chunks which are sent to the server sequentially, each chunk in its own request along with an id (to identify the stream) and a position index (to identify the chunk position). I wouldn't recommend that, since it adds work to break up and reassemble the data, and since there's a better solution.
The second way of achieving this is to use the WebSocket API. It allows you to send data sequentially to the server as it is generated, following a usual stream API. I think you definitely need this.
This page may be a good place to start at : http://binaryjs.com/
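For concreteness, a minimal sketch of the WebSocket option, assuming a server listening at the hypothetical URL below that accepts plain text frames:
var socket = new WebSocket('ws://example.com/stream'); // hypothetical endpoint

socket.onopen = function () {
    for (var i = 0; i < 1000000; i++) {
        socket.send(String(Math.random())); // frames are queued and sent in order
    }
    socket.close();
};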
That's all, folks!
EDIT, considering your comment:
I'm not sure I perfectly get your point, but what about HTML5's FileSystem API?
There are a couple of examples here: http://www.html5rocks.com/en/tutorials/file/filesystem/ among which is this sample that allows you to append data to an existing file. You can also create a new file, etc.:
function onInitFs(fs) {
    fs.root.getFile('log.txt', {create: false}, function(fileEntry) {
        // Create a FileWriter object for our FileEntry (log.txt).
        fileEntry.createWriter(function(fileWriter) {
            fileWriter.seek(fileWriter.length); // Start write position at EOF.
            // Create a new Blob and write it to log.txt.
            var blob = new Blob(['Hello World'], {type: 'text/plain'});
            fileWriter.write(blob);
        }, errorHandler);
    }, errorHandler);
}
EDIT 2:
What you're trying to do is not possible using JavaScript, as said on SO here. The author nonetheless suggests using a Java applet to achieve the needed behaviour.
To put it in a nutshell, the HTML5 FileSystem API only provides a sandboxed filesystem, i.e. one located in some hidden directory of the browser. So if you want to access the true filesystem, using Java would be just fine considering your use case. I guess there is an interface between Java and JavaScript here.
But if you want to make your data available only from the browser (constrained by the same-origin policy), use the FileSystem API.
