I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very moment, and then return chunks of data immediately as they are appended to the file. It can be thought of as similar to the tail -f UNIX command, but acting immediately as chunks become available, instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you do with a socket — expecting on('data') to trigger from time to time until the file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with a similar purpose), and then just read chunks as the file descriptor is marked "readable". So, when there is nothing to be read, it won't be readable, and when something is appended to the file, it becomes readable again.
I somewhat expected this kind of behavior from the following code sample in JavaScript:
const fs = require('fs');

function readThatFile(filename) {
  const stream = fs.createReadStream(filename, {
    flags: 'r',
    encoding: 'utf8',
    autoClose: false // I thought this would prevent file closing on EOF too
  });
  stream.on('error', function(err) {
    // handle error
  });
  stream.on('open', function(fd) {
    // save fd, so I can close it later
  });
  stream.on('data', function(chunk) {
    // process chunk
    // fs.close() if I no longer need this file
  });
}
However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but that won't work over the network, and I don't like the idea of reopening the file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');

function tail(path, callback) {
  var descriptor,
      bytes = 0,
      buffer = Buffer.alloc(256),
      line = '';

  function parse(err, bytesRead, buffer) {
    if (err) {
      callback(err, null);
      return;
    }
    // Keep track of the bytes we have consumed already.
    bytes += bytesRead;
    // Combine the buffered line with the new string data.
    line += buffer.toString('utf-8', 0, bytesRead);
    var i = 0, j;
    while ((j = line.indexOf('\n', i)) != -1) {
      // Callback with a single line at a time.
      callback(null, line.substring(i, j));
      // Skip the newline character.
      i = j + 1;
    }
    // Only keep the unparsed string contents for next iteration.
    line = line.substr(i);
    // Keep reading in the next tick (avoids CPU hogging).
    process.nextTick(read);
  }

  function read() {
    var stat = fs.fstatSync(descriptor);
    if (stat.size <= bytes) {
      // We're currently at the end of the file. Check again in 500 ms.
      setTimeout(read, 500);
      return;
    }
    fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
  }

  fs.open(path, 'r', function (err, fd) {
    if (err) {
      callback(err, null);
    } else {
      descriptor = fd;
      read();
    }
  });

  return {
    close: function close(callback) {
      fs.close(descriptor, callback);
    }
  };
}

// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
  console.log(err, line);
});

// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
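If you'd rather see the shape of that approach than pull in the package, here is a rough sketch of the general idea (not tail-stream's actual code; the function name is my own, and overlapping reads aren't guarded against):

var fs = require('fs');

function follow(path, onChunk) {
  var fd = fs.openSync(path, 'r');
  var position = fs.fstatSync(fd).size; // start at the current end of the file

  var watcher = fs.watch(path, function () {
    fs.fstat(fd, function (err, stat) {
      if (err || stat.size <= position) return; // nothing new (or file truncated)
      var buffer = Buffer.alloc(stat.size - position);
      fs.read(fd, buffer, 0, buffer.length, position, function (err, bytesRead) {
        if (err || bytesRead === 0) return;
        position += bytesRead;
        onChunk(buffer.slice(0, bytesRead));
      });
    });
  });

  return {
    close: function () {
      watcher.close();
      fs.closeSync(fd);
    }
  };
}

// follow('my.log', function (chunk) { process.stdout.write(chunk); });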
Another, perhaps hackier, approach might be to just use tail itself by spawning a child process with it. This of course comes with the limitation that tail must exist on the target platform, but one of Node's strengths is doing asynchronous systems development via spawn, and even on Windows you can run Node in an alternate shell like msysgit or Cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'my.log']);

child.stdout.on('data', function (data) {
  console.log('tail output: ' + data);
});

child.stderr.on('data', function (data) {
  console.log('err data: ' + data);
});
So, it seems people have been looking for an answer to this question for five years now, and there is still no on-topic answer.
In short: you can't. Not just in Node.js; you can't do this at all.
Long answer: there are a few reasons for this.
First, the POSIX standard specifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
I can't tell for sure with epoll(), since it isn't standardized and you'd have to read a rather lengthy implementation, but I would assume it's similar.
Since libuv, which is at the core of the Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, it will always return an empty buffer when EOF is encountered. So, no luck here either.
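You can observe this from Node itself: a read issued at the current end of a regular file completes immediately with zero bytes instead of blocking until more data arrives. A quick check (the file path here is just an example):

const fs = require('fs');

const fd = fs.openSync('/etc/hostname', 'r'); // any regular file will do
const size = fs.fstatSync(fd).size;
const buf = Buffer.alloc(16);

// Request bytes starting exactly at EOF: the callback fires right away
// with bytesRead === 0 rather than waiting for the file to grow.
fs.read(fd, buf, 0, buf.length, size, (err, bytesRead) => {
  console.log(err, bytesRead); // null 0
  fs.closeSync(fd);
});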
So, to summarize: if such functionality is desired, something must be wrong with your design, and you should revise it.
What you're describing is a FIFO file (an acronym for First In, First Out), which, as you said, works like a socket.
There's a node.js module that allows you to work with fifo files.
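Even without a module, here is a rough sketch of the idea on Linux/macOS (the FIFO path is made up, and note that opening a FIFO for reading waits until a writer shows up):

const { execSync } = require('child_process');
const fs = require('fs');

const fifoPath = '/tmp/example.fifo'; // hypothetical path
if (!fs.existsSync(fifoPath)) {
  execSync('mkfifo ' + fifoPath); // create the FIFO special file
}

// Unlike a regular file, a FIFO gives you socket-like semantics:
// data events fire as another process writes into it.
const stream = fs.createReadStream(fifoPath, { encoding: 'utf8' });
stream.on('data', chunk => console.log('got:', chunk));
stream.on('end', () => console.log('writer closed the FIFO'));

// From another shell: echo hello > /tmp/example.fifo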
I don't know what you want it for, but there are better ways to work with sockets in Node.js. Try socket.io instead.
You could also have a look at this previous question:
Reading a file in real-time using Node.js
Update 1
I'm not familiar with any module that would do what you want with a regular file rather than a socket-type one. But as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);

tail.stdout.on('data', function (data) {
  data = data.toString().replace(/^[\s]+/i, '').replace(/[\s]+$/i, '');
  console.log(data);
});
Then from the command line try echo someline >> somefile.txt and watch the console.
You might also would like to have a look at this: https://github.com/layerssss/node-tailer
Related
I have an application (KafkaConnect) that is generating Avro files into S3 for me.
These files are compressed with the Avro codec "snappy".
I'm trying to read them with JavaScript (I'm not a very strong JavaScript developer, as you will be able to guess).
I tried to use avro-js or avsc as libraries to help me with this, since they are referenced in most of the online examples I found for doing this.
The most complete and useful example I found was here.
Anyway, it seems most examples I found are using snappy version 6, which seems to be a bit different from version 7 (the latest).
One of the main things I noticed is that it now provides two ways to uncompress: a synchronous one and one that returns a promise, but none that accepts a callback function.
Anyway, I think this is not an issue because I could make it work regardless, but my best attempt to read these files would be something like this (with avsc):
const avsc = require('avsc');
const snappy = require('snappy');

const codecs = {
  snappy: function (buf, cb) {
    // Avro appends checksums to compressed blocks, which we skip here.
    const buffer = snappy.uncompressSync(buf.slice(0, buf.length - 4));
    return cb(buffer);
  }
};

avsc.createFileDecoder('person-10.snappy.avro', {codecs})
  .on('metadata', function (writerType) {
    console.log(writerType.name);
  })
  .on('data', function (obj) {
    console.log('on data');
    console.log(obj);
  })
  .on('end', function () {
    console.log('end');
  });
Anyway, the processing of metadata works without issues (I can access the full schema information), but the data always fails with:
Uncaught Error: snappy codec decompression error
I'm looking for someone who has for some reason worked with Avro and snappy in their latest versions and managed to make this work.
Because I'm really struggling to understand this, I created a fork of the official avsc repo and tried to add my examples there to see how this works, but if it's more useful I could try to create a simpler reproducible scenario.
The documentation of the package I was using has been updated, and now the problem is fixed.
https://github.com/mtth/avsc/wiki/API#class-blockdecoderopts
Mainly, I was just wrong about how to call the callback function and how to hand the buffer to snappy.
This is the correct way (as documented):
const avro = require('avsc');
const crc32 = require('buffer-crc32');
const snappy = require('snappy');

const blockDecoder = new avro.streams.BlockDecoder({
  codecs: {
    snappy: (buf, cb) => {
      // Avro appends checksums to compressed blocks.
      const len = buf.length;
      const checksum = buf.slice(len - 4, len);
      snappy.uncompress(buf.slice(0, len - 4))
        .then((inflated) => {
          if (!checksum.equals(crc32(inflated))) {
            // We make sure that the checksum matches.
            throw new Error('invalid checksum');
          }
          cb(null, inflated);
        })
        .catch(cb);
    }
  }
});
I have a local JSON file which I intend to read/write from a Node.js Electron app. I am not sure, but I believe that instead of using readFile() and writeFile(), I should get a FileHandle to avoid multiple open and close actions.
So I've tried to grab a FileHandle from fs.promises.open(), but the problem seems to be that I am unable to get a FileHandle for an existing file without truncating it and clearing it to 0 bytes.
const { resolve } = require('path');
const fsPromises = require('fs').promises;

function init() {
  // Save table name
  this.path = resolve(__dirname, '..', 'data', `test.json`);

  // Create/Open the json file
  fsPromises
    .open(this.path, 'wx+')
    .then(fileHandle => {
      // Grab the file handle if the file doesn't exist,
      // because of the flag 'wx+'
      this.fh = fileHandle;
    })
    .catch(err => {
      if (err.code === 'EEXIST') {
        // File exists
      }
    });
}
Am I doing something wrong? Are there better ways to do it?
Links:
https://nodejs.org/api/fs.html#fs_fspromises_open_path_flags_mode
https://nodejs.org/api/fs.html#fs_file_system_flags
Because JSON is a text format that has to be read or written all at once and can't be easily modified or added onto in place, you're going to have to read the whole file or write the whole file at once.
So, your simplest option is to just use fs.promises.readFile() and fs.promises.writeFile() and let the library open the file, read/write it and close it. Opening and closing a file in a modern OS takes advantage of disk caching, so if you're reopening a file you previously opened not long ago, it's not a slow operation. Further, since Node.js performs these operations in secondary threads in libuv, it doesn't block the main thread of Node.js either, so it's generally not a performance issue for your server.
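For example, a minimal read-modify-write helper with the promise API could look like this (the helper name and file path are just placeholders):

const fsPromises = require('fs').promises;

async function updateJson(path, mutate) {
  const text = await fsPromises.readFile(path, 'utf8');
  const data = JSON.parse(text);
  mutate(data); // apply your changes in memory
  await fsPromises.writeFile(path, JSON.stringify(data, null, 2));
}

// updateJson('./data/test.json', data => { data.counter = (data.counter || 0) + 1; });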
If you really wanted to open the file once and hold it open, you would open it for reading and writing using the r+ flag as in:
const fileHandle = await fsPromises.open(this.path, 'r+');
Reading the whole file would be simple as the new fileHandle object has a .readFile() method.
const text = await fileHandle.readFile({ encoding: 'utf8' });
For writing the whole file from an open filehandle, you would have to truncate the file, then write your bytes, then flush the write buffer to ensure the last bit of the data got to the disk and isn't sitting in a buffer.
await fileHandle.truncate(0); // clear previous contents
let {bytesWritten} = await fileHandle.write(mybuffer, 0, someLength, 0); // write new data
assert(bytesWritten === someLength);
await fileHandle.sync(); // flush buffering to disk
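Put together, a sketch of the whole cycle on one open FileHandle might look like this (the helper name is mine, not part of the fs API):

const fsPromises = require('fs').promises;

async function rewriteJson(fileHandle, mutate) {
  // Read and parse the current contents.
  const text = await fileHandle.readFile({ encoding: 'utf8' });
  const data = text ? JSON.parse(text) : {};
  mutate(data);

  // Replace the contents in place.
  const out = Buffer.from(JSON.stringify(data, null, 2));
  await fileHandle.truncate(0);                  // clear previous contents
  await fileHandle.write(out, 0, out.length, 0); // write new data at offset 0
  await fileHandle.sync();                       // flush buffering to disk
}

// const fileHandle = await fsPromises.open('./data/test.json', 'r+');
// await rewriteJson(fileHandle, data => { data.updatedAt = Date.now(); });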
Using Node.js, what is the best way to stream a file from a filesystem into Node.js, but reading it backwards, from bottom to top? I have a large file, and there doesn't seem to be much sense in reading from the top if I only want the last 10 lines. Is this possible?
Right now I have this horrible code, where we do a GET request with a browser to view the server logs, and pass a query string parameter to tell the server how many lines at the end of the log file we want to read:
function get(req, res, next) {
  var numOfLinesToRespondWith = req.query.num_lines || 10;
  var fileStream = fs.createReadStream(stderr_path, {encoding: 'utf8'});
  var jsonData = []; // where jsonData gets populated
  var ret = [];

  fileStream.on('data', function processLineOfFileData(chunk) {
    jsonData.push(String(chunk));
  })
  .on('end', function handleEndOfFileData(err) {
    if (err) {
      log.error(colors.bgRed(err));
      res.status(500).send({"error reading from smartconnect_stdout_log": err.toString()});
    }
    else {
      for (var i = 0; i < numOfLinesToRespondWith; i++) {
        ret.push(jsonData.pop());
      }
      res.status(200).send({"smartconnect_stdout_log": ret});
    }
  });
}
The code above reads the whole file and only then adds the requested number of lines to the response. This is bad; is there a better way to do this? Any recommendations will be gladly received.
(one problem with the code above is that it's writing out the last lines of the log but the lines are in reverse order...)
One potential way to do this is:
process.exec('tail -r ' + file_path).pipe(process.stdout);
but that syntax is incorrect - so my question there would be - how do I pipe the result of that command into an array in Node.js and eventually into a JSON HTTP response?
I created a module called fs-backwards-stream that may meet your needs: https://www.npmjs.com/package/fs-backwards-stream
If you need the result parsed by lines rather than byte chunks, you should use the module fs-reverse: https://www.npmjs.com/package/fs-reverse
Both of these modules stream; alternatively, you could simply read the last n bytes of a file.
Here is an example using plain Node fs APIs and no dependencies:
https://gist.github.com/soldair/f250fb497ce592c3694a
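The gist has the real thing; roughly, the idea looks something like this (the function name and callback shape are my own):

var fs = require('fs');

// Read roughly the last `n` bytes of a file with plain fs calls.
function readLastBytes(path, n, callback) {
  fs.open(path, 'r', function (err, fd) {
    if (err) return callback(err);
    fs.fstat(fd, function (err, stat) {
      if (err) return callback(err);
      var start = Math.max(0, stat.size - n);
      var buffer = Buffer.alloc(stat.size - start);
      fs.read(fd, buffer, 0, buffer.length, start, function (err, bytesRead) {
        fs.close(fd, function () {});
        if (err) return callback(err);
        callback(null, buffer.slice(0, bytesRead));
      });
    });
  });
}

// readLastBytes('/var/log/system.log', 4096, function (err, buf) {
//   if (!err) console.log(buf.toString('utf8'));
// });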
Hope that helps.
One easy way, if you're on a Linux machine, would be to run the tac command from Node with child_process.exec("tac yourfile.dat") and pipe it to your write stream.
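A rough sketch of that idea, collecting the reversed lines into an array instead of piping straight to a stream (tac must exist on the machine; the file name is just the one from above):

const { spawn } = require('child_process');

const tac = spawn('tac', ['yourfile.dat']);
let output = '';

tac.stdout.on('data', chunk => { output += chunk; });
tac.on('close', code => {
  if (code !== 0) return console.error('tac exited with code ' + code);
  const lines = output.split('\n').filter(Boolean); // last line of the file comes first
  console.log(lines.slice(0, 10)); // e.g. the last 10 lines of the file
});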
You could also use slice-file and then reverse the order yourself.
Also, look at what @alexmills said in the comments.
This is the best answer I've got, for now.
The tail command on Mac/UNIX reads files from the end and pipes to stdout (correct me if this is loose language):
var cp = require('child_process');

module.exports = function get(req, res, next) {
  var numOfLinesToRespondWith = req.query.num_lines || 100;

  cp.exec('tail -n ' + numOfLinesToRespondWith + ' ' + stderr_path, function (err, stdout, stderr) {
    if (err) {
      log.error(colors.bgRed(err));
      res.status(500).send({"error reading from smartconnect_stderr_log": err.toString()});
    }
    else {
      var data = String(stdout).split('\n');
      res.status(200).send({"stderr_log": data});
    }
  });
};
This seems to work really well. It does, however, run in a separate process, which is expensive in its own way, but probably better than reading an entire 10,000-line log file.
I am trying to determine what is the best way to read a live file line by line.
Each line will be sent for consumption and then discarded.
The file is live, meaning it is being written to by another application (it's a log file).
The file could be large, and therefore I don't want to read the whole thing into memory and then process it.
Read line
Process it
Keep required data
Read next line
etc..
It seems there are many plugins aka modules. Not sure what the best (fast and efficient) way is.
I am using node.js version 0.10.33
Thanks
Use tail. It's just like the unix tail command, but in node.
npm install tail
A usage example from the npm page:
var Tail = require('tail').Tail;

var tail = new Tail("fileToTail", "\n", {}, true);

tail.on("line", function (data) {
  console.log(data);
});

tail.on("error", function (error) {
  console.log('ERROR: ', error);
});
You can create a read stream (http://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options)
and read up to the next line. Something like this:
var lines = [],
    line = '',
    rs = require('fs').createReadStream('/etc/passwd', { encoding: 'utf8' });

rs.on('data', function (chunk) {
  line = line + chunk;
  var indx;
  while ((indx = line.indexOf('\n')) !== -1) {
    lines.push(line.slice(0, indx)); // we add the line (without the "\n") to the array of lines
    line = line.slice(indx + 1);     // we keep the remainder in the buffer
  }
});

rs.on('end', function () {
  if (line) lines.push(line); // flush any trailing partial line
  console.log(lines);
});
A Linux-only solution would be to spawn a child process running tail -f /path/to/your/log and do something with its stdout. Not very elegant, but it would work.
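For what it's worth, a small sketch of that (the log path is a placeholder), using the built-in readline module to split stdout into lines:

const { spawn } = require('child_process');
const readline = require('readline');

const tail = spawn('tail', ['-f', '/path/to/your/log']);
const rl = readline.createInterface({ input: tail.stdout });

rl.on('line', line => {
  // Process one line at a time; nothing is kept in memory afterwards.
  console.log('line:', line);
});

tail.on('close', code => console.log('tail exited with code ' + code));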
I am developing a Firefox addon. I need to save a bunch of data URI images to the disk. How do I approach this?
I have browsed through the file I/O snippets on MDN, but the snippets don't help me much.
There are async and sync methods. I would like to use the async method, but how can I write a binary file using the async method?
Components.utils.import("resource://gre/modules/NetUtil.jsm");
Components.utils.import("resource://gre/modules/FileUtils.jsm");
// file is nsIFile
var file = FileUtils.getFile("Desk", ["test.png"]);
// You can also optionally pass a flags parameter here. It defaults to
// FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_TRUNCATE;
var ostream = FileUtils.openSafeFileOutputStream(file);
//base64 image that needs to be saved
image ="iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==";
// How can I create an inputstream from the image data URI?
var inputstream = createInputstream(image);
// The last argument (the callback) is optional.
NetUtil.asyncCopy(inputstream , ostream, function(status) {
if (!Components.isSuccessCode(status)) {
// Handle error!
return;
}
// Data has been written to the file.
});
It sounds like you'd like to write not the data URI but the binary data it "contains", so I'll answer that.
First, let's assume we have an actual data URI (if not, adding data:application/octet-stream;base64, isn't too hard ;)
// btoa("helloworld") as a placeholder ;)
var imageDataURI = "data:application/octet-stream;base64,aGVsbG93b3JsZA==";
Option 1 - Using OS.File
OS.File has the benefit that it is truly async. On the other hand, NetUtil is only mostly async, in that there will be stat calls on the main thread and the file will be opened and potentially closed on the main thread as well (which can lead to buffer flushes and hence block the main thread while the flush is happening).
After constructing a path (with some help from the constants), OS.File.writeAtomic is suited for the job.
Components.utils.import("resource://gre/modules/osfile.jsm");
var file = OS.Path.join(OS.Constants.Path.desktopDir, "test.png");
var str = imageDataURI.replace(/^.*?;base64,/, "");
// Decode to a byte string
str = atob(str);
// Decode to an Uint8Array, because OS.File.writeAtomic expects an ArrayBuffer(View).
var data = new Uint8Array(str.length);
for (var i = 0, e = str.length; i < e; ++i) {
data[i] = str.charCodeAt(i);
}
// To support Firefox 24 and earlier, you'll need to provide a tmpPath. See MDN.
// There is in my opinion no need to support these, as they are end-of-life and
// contain known security issues. Let's not encourage users. ;)
var promised = OS.File.writeAtomic(file, data);
promised.then(
function() {
// Success!
},
function(ex) {
// Failed. Error information in ex
}
);
Option 2 - Using NetUtil
NetUtil has some drawbacks in that it is not fully async, as already stated above.
We can take a shortcut in that we can use NetUtil.asyncFetch to directly fetch the URL, which gives us a stream we can pass along to .asyncCopy.
Components.utils.import("resource://gre/modules/NetUtil.jsm");
Components.utils.import("resource://gre/modules/FileUtils.jsm");
// file is nsIFile
var file = FileUtils.getFile("Desk", ["test.png"]);
NetUtil.asyncFetch(imageDataURI, function(inputstream, status) {
if (!inputstream || !Components.isSuccessCode(status)) {
// Failed to read data URI.
// Handle error!
return;
}
// You can also optionally pass a flags parameter here. It defaults to
// FileUtils.MODE_WRONLY | FileUtils.MODE_CREATE | FileUtils.MODE_TRUNCATE;
var ostream = FileUtils.openSafeFileOutputStream(file);
// The last argument (the callback) is optional.
NetUtil.asyncCopy(inputstream , ostream, function(status) {
if (!Components.isSuccessCode(status)) {
// Handle error!
return;
}
// Data has been written to the file.
});
});