Non-blocking loop through binary buffer and push to socket - javascript

I am uploading a file to FTP via chrome.sockets, but the socket buffer size is limited, so I need to loop through the blob and send out smaller chunks of data. I have tried several methods with closures and callbacks, but the only way that works for me is a do/while loop, which is of course blocking. Part of the problem is the multiple variables that need to be kept in the closure. Can you please suggest a better way of looping through the blob?
do {
  chunk = blob.slice(start, end);
  start = end;
  end = end + 8192;
  chrome.socket.write(this.info.socketId, Socket.string2buffer(chunk), function(writeInfo) {});
} while (chunk.length > 0);
Complete code of the extension (a single-purpose FTP manager): https://github.com/vanous/minime-content-manager/tree/master/chromium-ext-broadcast

I believe something along the lines of the following should work:
var self = this;
var writeChunk = function(start, end) {
  var chunk = blob.slice(start, end);
  chrome.socket.write(self.info.socketId, Socket.string2buffer(chunk), function(writeInfo) {
    // The next chunk is only sent once the previous write has completed,
    // so the loop never blocks and never floods the socket buffer.
    if (chunk.length > 0) writeChunk(end, end + 8192);
  });
};
writeChunk(0, 8192);
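If you prefer promises over nested callbacks, the same pattern can be expressed as a chain. This is only a sketch, assuming the same Socket.string2buffer helper and self.info.socketId from the question:
var self = this;
function writeChunkAsync(start, end) {
  var chunk = blob.slice(start, end);
  if (chunk.length === 0) return Promise.resolve(); // nothing left to send
  return new Promise(function(resolve) {
    chrome.socket.write(self.info.socketId, Socket.string2buffer(chunk), resolve);
  }).then(function() {
    // Schedule the next chunk only after the previous write has finished.
    return writeChunkAsync(end, end + 8192);
  });
}
writeChunkAsync(0, 8192).then(function() { /* all chunks sent */ });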

Related

Parsing huge binary files in Node.js

I want to create a Node.js module which should be able to parse huge binary files (some larger than 200GB). Each file is divided into chunks, and each chunk can be larger than 10GB. I tried using the flowing and non-flowing methods to read the file, but the problem is that the end of the buffer I have read is reached while a chunk is being parsed, so parsing of that chunk must be terminated before the next onData event occurs. This is what I've tried:
var s = getStream();
s.on('data', function(a){
  parseChunk(a);
});
function parseChunk(a){
  /*
    There is a lot of code and many functions here.
    One chunk is larger than the buffer passed to this function,
    so when the end of this buffer is reached, the parseChunk
    function must be terminated before the parsing process is finished.
    Also, when the next buffer is passed, it is not the start of
    a new chunk because the previous chunk was not parsed to the end.
  */
}
Loading a whole chunk into process memory isn't possible because I have only 8GB of RAM. How can I read data from the stream synchronously, or how can I pause the parseChunk function when the end of the buffer is reached and wait until new data is available?
Maybe I'm missing something, but as far as I can tell, I don't see a reason why this couldn't be implemented using streams with a different syntax. I'd use
let chunk;
let Nbytes; // number of bytes to read into a chunk
stream.on('readable', () => {
  while ((chunk = stream.read(Nbytes)) !== null) {
    // call whatever you like on the chunk of data of size Nbytes
  }
});
Note that if you specify the chunk size yourself, as done here, null will be returned if the requested number of bytes is not available at the end of the stream. That doesn't mean there is no data left to stream, so be aware that you should expect back a 'trimmed' buffer object of size < Nbytes at the end of the file.
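As an illustration, one way to deal with that trimmed final buffer is to check each chunk's length before handing it off. This is only a sketch: the 10 MB Nbytes, the data.bin path and the parseChunk signature are assumptions, not part of the question:
const fs = require('fs');
const Nbytes = 10 * 1024 * 1024;                 // hypothetical fixed chunk size
const stream = fs.createReadStream('data.bin');  // hypothetical input file
stream.on('readable', () => {
  let chunk;
  while ((chunk = stream.read(Nbytes)) !== null) {
    // Once the stream has ended, read(Nbytes) returns whatever is left in the
    // internal buffer, so a chunk shorter than Nbytes marks the final piece.
    parseChunk(chunk, /* isLast */ chunk.length < Nbytes);
  }
});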

node.js fs - stream file "backwards" - from bottom to top

Using Node.js, what is the best way to stream a file from a filesystem into Node.js, but reading it backwards, from bottom to top? I have a large file, and there doesn't seem to be much sense in reading from the top if I only want the last 10 lines. Is this possible?
Right now I have this horrible code, where we do a GET request with a browser to view the server logs, and pass a query string parameter to tell the server how many lines at the end of the log file we want to read:
function get(req, res, next) {
  var numOfLinesToRespondWith = req.query.num_lines || 10;
  var fileStream = fs.createReadStream(stderr_path, {encoding: 'utf8'});
  var jsonData = []; // where jsonData gets populated
  var ret = [];
  fileStream.on('data', function processLineOfFileData(chunk) {
    jsonData.push(String(chunk));
  })
  .on('end', function handleEndOfFileData(err) {
    if (err) {
      log.error(colors.bgRed(err));
      res.status(500).send({"error reading from smartconnect_stdout_log": err.toString()});
    }
    else {
      for (var i = 0; i < numOfLinesToRespondWith; i++) {
        ret.push(jsonData.pop());
      }
      res.status(200).send({"smartconnect_stdout_log": ret});
    }
  });
}
The code above reads the whole file and only then adds the requested number of lines to the response. This is bad; is there a better way to do this? Any recommendations will be gladly received.
(one problem with the code above is that it's writing out the last lines of the log but the lines are in reverse order...)
One potential way to do this is:
process.exec('tail -r ' + file_path).pipe(process.stdout);
but that syntax is incorrect - so my question there would be - how do I pipe the result of that command into an array in Node.js and eventually into a JSON HTTP response?
I created a module called fs-backwards-stream that may meet your needs: https://www.npmjs.com/package/fs-backwards-stream
If you need the result parsed by lines rather than byte chunks, you should use the module fs-reverse: https://www.npmjs.com/package/fs-reverse
Both of these modules stream; otherwise, you could simply read the last n bytes of a file.
Here is an example using plain node fs APIs and no dependencies:
https://gist.github.com/soldair/f250fb497ce592c3694a
Hope that helps.
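For reference, a rough sketch (not the code from the gist) of reading the last n bytes with plain fs APIs; stderr_path is the log path from the question, and the 64 KB window is an assumption:
const fs = require('fs');
function readLastBytes(path, n, callback) {
  fs.open(path, 'r', (err, fd) => {
    if (err) return callback(err);
    fs.fstat(fd, (err, stats) => {
      if (err) return callback(err);
      const length = Math.min(n, stats.size);
      const position = stats.size - length;
      const buffer = Buffer.alloc(length);
      fs.read(fd, buffer, 0, length, position, (err, bytesRead, buf) => {
        fs.close(fd, () => {});
        if (err) return callback(err);
        callback(null, buf.slice(0, bytesRead));
      });
    });
  });
}
// Usage: last 64 KB of the log, split into lines, newest first.
readLastBytes(stderr_path, 64 * 1024, (err, buf) => {
  if (err) throw err;
  const lines = buf.toString('utf8').split('\n').reverse();
  console.log(lines.slice(0, 10));
});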
One easy way, if you're on a Linux computer, would be to execute the tac command from node via child_process.exec('tac yourfile.dat') and pipe it to your write stream.
You could also use slice-file and then reverse the order yourself.
Also, look at what #alexmills said in the comments
This is the best answer I have got, for now.
The tail command on Mac/UNIX reads the end of a file and pipes it to stdout (correct me if this is loose language):
var cp = require('child_process');
module.exports = function get(req, res, next) {
  var numOfLinesToRespondWith = req.query.num_lines || 100;
  cp.exec('tail -n ' + numOfLinesToRespondWith + ' ' + stderr_path, function (err, stdout, stderr) {
    if (err) {
      log.error(colors.bgRed(err));
      res.status(500).send({"error reading from smartconnect_stderr_log": err.toString()});
    }
    else {
      var data = String(stdout).split('\n');
      res.status(200).send({"stderr_log": data});
    }
  });
}
This seems to work really well. It does, however, run in a separate process, which is expensive in its own way, but probably better than reading an entire 10,000-line log file.

How NOT to stop reading file when meeting EOF?

I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very time, and then return chunks of data immediately as they are appended to the file. It can be thought of as similar to the tail -f UNIX command, but acting immediately as chunks become available instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you work with a socket: expecting on('data') to trigger from time to time until the file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with a similar purpose), and then read chunks as the file descriptor is marked "readable". So, when there is nothing to be read, it won't be readable, and when something is appended to the file, it becomes readable again.
I somewhat expected this kind of behavior from the following code sample in JavaScript:
function readThatFile(filename) {
  const stream = fs.createReadStream(filename, {
    flags: 'r',
    encoding: 'utf8',
    autoClose: false // I thought this would prevent file closing on EOF too
  });
  stream.on('error', function(err) {
    // handle error
  });
  stream.on('open', function(fd) {
    // save fd, so I can close it later
  });
  stream.on('data', function(chunk) {
    // process chunk
    // fs.close() if I no longer need this file
  });
}
However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but that won't work over a network, and I don't like the idea of reopening the file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');
function tail(path, callback) {
  var descriptor, bytes = 0, buffer = Buffer.alloc(256), line = '';
  function parse(err, bytesRead, buffer) {
    if (err) {
      callback(err, null);
      return;
    }
    // Keep track of the bytes we have consumed already.
    bytes += bytesRead;
    // Combine the buffered line with the new string data.
    line += buffer.toString('utf-8', 0, bytesRead);
    var i = 0, j;
    while ((j = line.indexOf('\n', i)) != -1) {
      // Callback with a single line at a time.
      callback(null, line.substring(i, j));
      // Skip the newline character.
      i = j + 1;
    }
    // Only keep the unparsed string contents for the next iteration.
    line = line.substr(i);
    // Keep reading in the next tick (avoids CPU hogging).
    process.nextTick(read);
  }
  function read() {
    var stat = fs.fstatSync(descriptor);
    if (stat.size <= bytes) {
      // We're currently at the end of the file. Check again in 500 ms.
      setTimeout(read, 500);
      return;
    }
    fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
  }
  fs.open(path, 'r', function (err, fd) {
    if (err) {
      callback(err, null);
    } else {
      descriptor = fd;
      read();
    }
  });
  return {close: function close(callback) {
    fs.close(descriptor, callback);
  }};
}
// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
  console.log(err, line);
});
// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
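As a rough illustration (this is not tail-stream's actual code), a watch-driven read loop along those lines might look like the following; the my.log path and the offset bookkeeping are assumptions:
const fs = require('fs');
const path = 'my.log';                      // hypothetical file to follow
const fd = fs.openSync(path, 'r');
let offset = fs.fstatSync(fd).size;         // start at the current end of the file
fs.watch(path, () => {
  // Real code should serialize these reads; overlapping watch events can race.
  const size = fs.fstatSync(fd).size;
  if (size <= offset) return;               // nothing new (or the file was truncated)
  const length = size - offset;
  const buffer = Buffer.alloc(length);
  fs.read(fd, buffer, 0, length, offset, (err, bytesRead, buf) => {
    if (err) throw err;
    offset += bytesRead;
    process.stdout.write(buf.slice(0, bytesRead)); // emit the newly appended data
  });
});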
Another, perhaps hackier, approach might be to just use tail by spawning a child process with it. This of course comes with the limitation that tail must exist on the target platform, but one of node's strengths is doing asynchronous systems development via spawn, and even on Windows you can run node in an alternate shell like msysgit or cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;
var child = spawn('tail', ['-f', 'my.log']);
child.stdout.on('data', function (data) {
  console.log('tail output: ' + data);
});
child.stderr.on('data', function (data) {
  console.log('err data: ' + data);
});
So, it seems people have been looking for an answer to this question for five years already, and there is still no answer on topic.
In short: you can't. Not just in Node.js; you can't at all.
Long answer: there are a few reasons for this.
First, POSIX standard clarifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
I can't tell for sure with epoll(), since it's not standardized and you would have to read a quite lengthy implementation, but I would assume it's similar.
Since libuv, which is at the core of the Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, it will always return an empty buffer when EOF is encountered. So, no luck here either.
So, summarizing, if such functionality is desired, something must be wrong with your design, and you should revise it.
What you're describing is a FIFO file (an acronym for First In, First Out), which, as you said, works like a socket.
There's a node.js module that allows you to work with FIFO files.
I don't know what you want it for, but there are better ways to work with sockets in node.js; try socket.io instead. (A minimal sketch of reading from a FIFO is shown after the links below.)
You could also have a look at this previous question:
Reading a file in real-time using Node.js
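For illustration, here is a minimal sketch of reading from a named pipe (FIFO) with the plain fs API; the my.fifo path is an assumption, and the FIFO is created with the external mkfifo utility (POSIX only) for brevity:
const fs = require('fs');
const { execSync } = require('child_process');
execSync('mkfifo my.fifo');                       // create the named pipe
const stream = fs.createReadStream('my.fifo', { encoding: 'utf8' });
stream.on('data', (chunk) => {
  // Fires as a writer pushes data into the pipe,
  // e.g. from another shell: echo "hello" > my.fifo
  console.log('got:', chunk);
});
stream.on('end', () => {
  // Unlike a socket, the stream still ends once the writing side closes the FIFO.
  console.log('writer closed the pipe');
});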
Update 1
I'm not familiar with any module that would do what you want with a regular file rather than a socket-type one. But, as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);
tail.stdout.on('data', function (data) {
  data = data.toString().replace(/^[\s]+/i, '').replace(/[\s]+$/i, '');
  console.log(data);
});
Then, from the command line, try echo someline > somefile.txt and watch the console.
You might also like to have a look at this: https://github.com/layerssss/node-tailer

Use FileAPI to download big generated data file

The JavaScript process generates a lot of data (200-300MB). I would like to save this data for further analysis, but the best I have found so far is saving using this example http://jsfiddle.net/c2U2T/, which is not an option for me because it looks like it requires all the data to be available before the download starts. What I need is something like:
var saver = new Saver();
saver.save(); // The Save As ... dialog appears
saver.onaccepted = function () { // user accepted saving
  for (var i = 0; i < 1000000; i++) {
    saver.write(Math.random());
  }
};
Of course, instead of Math.random() there will be some meaningful construction.
#dader - I would build upon dader's example.
Use the HTML5 FileSystem API, but instead of writing each and every line to the file (more IO than it is worth), you can batch some of the lines in memory in a JavaScript object/array/string, and only write them to the file when they reach a certain threshold. You are thus appending to a local file as the process chugs along (which makes it easy to pause/restart/stop, etc.). A sketch of this batching idea is shown right below.
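This is only a sketch of that batching idea; grantedFs (a FileSystem assumed to have been stored by persistentStorageGranted in the quota example that follows), the output.txt name and the 4 MB threshold are illustrative assumptions:
var pending = [];
var pendingBytes = 0;
var FLUSH_THRESHOLD = 4 * 1024 * 1024; // flush roughly every 4 MB
function queueLine(line) {
  pending.push(line);
  pendingBytes += line.length;
  if (pendingBytes >= FLUSH_THRESHOLD) flush();
}
function flush() {
  var text = pending.join('\n') + '\n';
  pending = [];
  pendingBytes = 0;
  // Real code should serialize flushes; this sketch ignores overlapping writes.
  grantedFs.root.getFile('output.txt', {create: true}, function (fileEntry) {
    fileEntry.createWriter(function (writer) {
      writer.seek(writer.length);                       // append at EOF
      writer.write(new Blob([text], {type: 'text/plain'}));
    }, errorCb);
  }, errorCb);
}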
Of note is the following, which is an example of how you can spawn the dialog to request the amount of storage that you would need (it sounds large). Tested in Chrome:
navigator.persistentStorage.queryUsageAndQuota(
  function (usage, quota) {
    var availableSpace = quota - usage;
    var requestingQuota = args.size + usage;
    if (availableSpace >= args.size) {
      window.requestFileSystem(PERSISTENT, availableSpace, persistentStorageGranted, persistentStorageDenied);
    } else {
      navigator.persistentStorage.requestQuota(
        requestingQuota, function (grantedQuota) {
          window.requestFileSystem(PERSISTENT, grantedQuota - usage, persistentStorageGranted, persistentStorageDenied);
        }, errorCb
      );
    }
  }, errorCb);
When you are done, you can use JavaScript to open a new window with the URL of the file that you saved, which you can retrieve via fileEntry.toURL().
Or, when it is done crunching, you can just display that URL in an HTML link, and the user can right-click on it and do whatever Save Link As they want.
But this is something new and cool that you can do entirely in the browser, without needing to involve a server in any way at all. Side note: 200-300MB of data generated by a JavaScript process sounds absolutely huge... that would be a concern about whether you are storing the "right" data...
What you are actually trying to do is a kind of streaming, and the FileAPI is not suited for the task. Instead, I can suggest two options:
The first uses the XHR facility, i.e. Ajax, splitting your data into several chunks which are sent sequentially to the server, each chunk in its own request along with an id (identifying the stream) and a position index (identifying the chunk position). I wouldn't recommend that, since it adds work to break up and reassemble the data, and since there's a better solution.
The second way of achieving this is to use the WebSocket API. It allows you to send data sequentially to the server as it is generated, following a usual stream API. I think this is definitely what you need; a minimal sketch follows the link below.
This page may be a good place to start: http://binaryjs.com/
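Here is a minimal sketch of that second option, sending chunks over a WebSocket as they are generated; the ws://example.com/ingest endpoint, the chunk size and the end-of-stream marker are assumptions:
const ws = new WebSocket('ws://example.com/ingest');
ws.onopen = () => {
  const CHUNK_VALUES = 100000;        // values to buffer before each send
  let buffer = [];
  for (let i = 0; i < 1000000; i++) {
    buffer.push(Math.random());       // stand-in for the real generated data
    if (buffer.length === CHUNK_VALUES) {
      ws.send(JSON.stringify(buffer)); // push one chunk as soon as it is ready
      buffer = [];
    }
  }
  if (buffer.length) ws.send(JSON.stringify(buffer));
  ws.send('__done__');                // hypothetical end-of-stream marker
};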
That's all, folks!
EDIT, considering your comment:
I'm not sure I perfectly get your point, but what about HTML5's FileSystem API?
There are a couple of examples here: http://www.html5rocks.com/en/tutorials/file/filesystem/ among which is this sample that allows you to append data to an existing file. You can also create a new file, etc.:
function onInitFs(fs) {
  fs.root.getFile('log.txt', {create: false}, function(fileEntry) {
    // Create a FileWriter object for our FileEntry (log.txt).
    fileEntry.createWriter(function(fileWriter) {
      fileWriter.seek(fileWriter.length); // Start write position at EOF.
      // Create a new Blob and write it to log.txt.
      var blob = new Blob(['Hello World'], {type: 'text/plain'});
      fileWriter.write(blob);
    }, errorHandler);
  }, errorHandler);
}
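For completeness, a sketch of how onInitFs might be obtained in the first place; the 5 MB quota and the errorHandler callback are assumptions:
window.requestFileSystem = window.requestFileSystem || window.webkitRequestFileSystem;
window.requestFileSystem(window.TEMPORARY, 5 * 1024 * 1024 /* 5 MB */, onInitFs, errorHandler);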
EDIT 2:
What you're trying to do is not possible using JavaScript, as said on SO here. The author nonetheless suggests using a Java applet to achieve the needed behaviour.
To put it in a nutshell, the HTML5 FileSystem API only provides a sandboxed filesystem, i.e. one located in some hidden directory of the browser. So if you want to access the true filesystem, using Java would be just fine considering your use case. I guess there is an interface between Java and JavaScript here.
But if you want to make your data available only from the browser (constrained by the same-origin policy), use the FileSystem API.

Large file upload with WebSocket

I'm trying to upload large files (at least 500MB, preferably up to a few GB) using the WebSocket API. The problem is that I can't figure out how to write "send this slice of the file, release the resources used, then repeat". I was hoping I could avoid using something like Flash/Silverlight for this.
Currently, I'm working with something along the lines of:
function FileSlicer(file) {
  // randomly picked 1MB slices,
  // I don't think this size is important for this experiment
  this.sliceSize = 1024 * 1024;
  this.slices = Math.ceil(file.size / this.sliceSize);
  this.currentSlice = 0;
  this.getNextSlice = function() {
    var start = this.currentSlice * this.sliceSize;
    var end = Math.min((this.currentSlice + 1) * this.sliceSize, file.size);
    ++this.currentSlice;
    return file.slice(start, end);
  }
}
Then, I would upload using:
function Uploader(url, file) {
  var fs = new FileSlicer(file);
  var socket = new WebSocket(url);
  socket.onopen = function() {
    for (var i = 0; i < fs.slices; ++i) {
      socket.send(fs.getNextSlice()); // see below
    }
  }
}
Basically, send() returns immediately, bufferedAmount is unchanged (0), and the loop keeps iterating, adding all the slices to the queue before attempting to send them; there's no socket.afterSend to allow me to queue them properly, which is where I'm stuck.
Use web workers for processing large files instead of doing it in the main thread, and upload chunks of file data using file.slice().
This article helps you handle large files in workers; change the XHR send to a WebSocket in the main thread. A sketch of that split is shown after the snippet below.
// Messages from worker
function onmessage(blobOrFile) {
  ws.send(blobOrFile);
}
// construct file on server side based on blob or chunk information.
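Here is a sketch of that worker/main-thread split; the file names, the 1 MB chunk size, the ws://example.com/upload endpoint and the message shape are assumptions, not code from the linked article:
// --- slicer.worker.js ---
self.onmessage = function (e) {
  var file = e.data.file;
  var chunkSize = 1024 * 1024; // 1 MB slices
  for (var offset = 0; offset < file.size; offset += chunkSize) {
    // Post each slice back to the main thread as it is produced.
    self.postMessage({ offset: offset, blob: file.slice(offset, offset + chunkSize) });
  }
  self.postMessage({ done: true });
};
// --- main thread ---
var ws = new WebSocket('ws://example.com/upload');
var worker = new Worker('slicer.worker.js');
worker.onmessage = function (e) {
  if (e.data.done) { ws.send('__done__'); return; }   // hypothetical end marker
  ws.send(e.data.blob); // the browser queues this; check ws.bufferedAmount to throttle
};
ws.onopen = function () {
  worker.postMessage({ file: fileInput.files[0] });   // fileInput is assumed
};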
I believe the send() method is asynchronous, which is why it returns immediately. To make it queue, you'd need the server to send a message back to the client after each slice is uploaded; the client can then decide whether it needs to send the next slice or an "upload complete" message back to the server.
This sort of thing would probably be easier using XMLHttpRequest(2); it has callback support built-in and is also more widely supported than the WebSocket API.
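A sketch of that XMLHttpRequest(2) alternative: upload one slice per request, and only start the next slice from the previous request's load callback. The /upload endpoint and the offset header are assumptions:
function uploadSlices(url, file, sliceSize) {
  var offset = 0;
  function sendNext() {
    if (offset >= file.size) return;                   // done
    var slice = file.slice(offset, offset + sliceSize);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', url, true);
    xhr.setRequestHeader('X-Chunk-Offset', String(offset)); // hypothetical header
    xhr.onload = function () { offset += sliceSize; sendNext(); };
    xhr.onerror = function () { console.error('chunk upload failed at', offset); };
    xhr.send(slice);
  }
  sendNext();
}
uploadSlices('/upload', fileInput.files[0], 1024 * 1024); // fileInput is assumed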
In order to serialize this operation, you need the server to send you a signal every time a slice is received and written (or an error occurs); this way you can send the next slice in response to the onmessage event, pretty much like this:
function Uploader(url, file) {
  var fs = new FileSlicer(file);
  var socket = new WebSocket(url);
  socket.onopen = function() {
    socket.send(fs.getNextSlice());
  }
  socket.onmessage = function(ms) {
    if (ms.data == "ok") {
      fs.slices--;
      if (fs.slices > 0) socket.send(fs.getNextSlice());
    } else {
      // handle the error code here.
    }
  }
}
You could use https://github.com/binaryjs/binaryjs or https://github.com/liamks/Delivery.js if you can run node.js on the server.
EDIT: The web world, browsers, firewalls and proxies have changed a lot since this answer was written. Right now, sending files using websockets can be done efficiently, especially on local area networks.
Websockets are very efficient for bidirectional communication, especially when you're interested in pushing information (preferably small) from the server. They act as bidirectional sockets (hence their name).
Websockets don't look like the right technology to use in this situation. Especially given that using them adds incompatibilities with some proxies, browsers (IE) or even firewalls.
On the other hand, uploading a file is simply sending a POST request to a server with the file in the body. Browsers are very good at that, and the overhead for a big file is really close to nothing. Don't use websockets for that task.
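For comparison, here is a minimal sketch of the plain POST approach this answer recommends; the /upload endpoint and the form field name are assumptions:
function uploadViaPost(file) {
  var form = new FormData();
  form.append('file', file);                 // the browser streams the body for you
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/upload', true);
  xhr.upload.onprogress = function (e) {
    if (e.lengthComputable) console.log('sent', e.loaded, 'of', e.total, 'bytes');
  };
  xhr.onload = function () { console.log('upload finished with status', xhr.status); };
  xhr.send(form);
}
uploadViaPost(fileInput.files[0]);           // fileInput is assumed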
I think this socket.io project has a lot of potential:
https://github.com/sffc/socketio-file-upload
It supports chunked upload, progress tracking and seems fairly easy to use.
