I have an application (KafkaConnect) that is generating me avro files into S3.
This files are compressed with avro code "snappy".
I'm trying to read them with javascript (I'm not a very strong javascript developer as you will be able to guess).
I tried to use avro-js or avsc as libraries to help me with this since they are referenced in most of the online examples I found for doing this.
The most complete example and very useful I found was here.
Anyway it seems most examples I found are using snappy version 6 which seems to be a bit different than version 7 (the latest).
One of the main things I noticed is that it now provides two methods of uncompress. One with sync and another which returns a promise, but none that can receive a call back function.
Anyway I think this is not an issue because I could make it work regardless, but my best example to read this files would be something like this (with avsc).
const avsc = require('avsc');
const avsc = require('avsc');
const snappy = require('snappy');
const codecs = {
snappy: function (buf, cb) {
// Avro appends checksums to compressed blocks, which we skip here.
const buffer = snappy.uncompressSync(buf.slice(0, buf.length - 4));
return cb(buffer);
}
};
avsc.createFileDecoder('person-10.snappy.avro', {codecs})
.on('metadata', function (writerType) {
console.log(writerType.name);
})
.on('data', function (obj) {
console.log('on data ');
console.log('obj');
})
.on('end', function () {
console.log('end');
});
Anyway the processing of metadata works without issues (I can access the full schema information) but the data always fails with
Uncaught Error: snappy codec decompression error
I'm looking for someone that has by some reason worked with avro and snappy in the latest versions and managed to make this work.
Because I'm really struggling with understanding this I created a fork of the official avsc repo and tried to introduce my examples there to see how this work but if more useful I could try and create a simpler
reproducible scenario.
the documentation of the package I was using was updated and now the problem is fixed.
https://github.com/mtth/avsc/wiki/API#class-blockdecoderopts
mainly I was just wrong on how to call the call back function and how to handle the buffer to snappy.
this is the correct way (as documented)
const crc32 = require('buffer-crc32');
const snappy = require('snappy');
const blockDecoder = new avro.streams.BlockDecoder({
codecs: {
snappy: (buf, cb) => {
// Avro appends checksums to compressed blocks.
const len = buf.length;
const checksum = buf.slice(len - 4, len);
snappy.uncompress(buf.slice(0, len - 4))
.then((inflated) => {
if (!checksum.equals(crc32(inflated))) {
// We make sure that the checksum matches.
throw new Error('invalid checksum');
}
cb(null, inflated);
})
.catch(cb);
}
}
});
My current setup:
Convert canvas to blob.
Ask the user for a file path.
Save the blob at the location given by the user.
However, I can't get step 3 to work. I'm currently trying to use fs to do the job, but it doesn't really seem to save the file.
Current code:
canvas.toBlob(blob => {
remote.dialog.showSaveDialog({ defaultPath: "file.png" }).then((canceled, filepath) => {
if (filepath) { // Using filepath because canceled is always true for some reason
blob.arrayBuffer().then(arrayBuffer => {
console.log(arrayBuffer);
fs.writeFile(filepath, Buffer.from(arrayBuffer), err => {
if (err) throw err;
});
});
}
});
}, "image/png");
Are there any flaws in my code? I tried changing the Buffer to Uint8Array and Int8Array, but they didn't work either.
I solved the problem by changing canceled/filepath argument into one result argument which holds all of them, turns out I was just using showSaveDialog in a wrong way.
I am trying the following code (from sample of parquetjs-lite and stackoverflow) to read a parquet file in nodejs :
const readParquetFile = async () => {
try {
// create new ParquetReader that reads from test.parquet
let reader = await parquet.ParquetReader.openFile('test.parquet');
}
catch (e){
console.log(e);
throw e;
}
// create a new cursor
let cursor = reader.getCursor();
// read all records from the file and print them
let record = null;
while (record = await cursor.next()) {
console.log(record);
}
await reader.close();
};
When I run this code nothing happens . There is nothing written to the console, for testing purpose I have only used a small csv file which I converted using python to parquet.
Is it because I have converted from csv to parquet using python (I couldn't find any JS equivalent for large files on which I have to ultimately be able to use).
I want my application to be able to take in any parquet file and read it. Is there any limitation for parquetjs-lite in this regard.
There are NaN values in my CSV could that be a problem ?
Any pointers would be helpful.
Thanks
Possible failure cases are
you are calling this function in some file without a webserver running.
In this case the file will run in async mode and as async function goes in callback stack and your main stack is empty the program will end and even is you have code in your call stack it will never run or log anything.
To solve this try running a webserver or better use sync calls
//app.js (without webserver)
const readParquetFile = async () => {
console.log("running")
}
readParquetFile()
console.log("exit")
when you run the above code the output will be
exit
//syncApp.js
const readParquetFile = () => {
console.log("running")
// all function should be sync
}
readParquetFile()
console.log("exit")
here the console log will be
running
exit
I have a local JSON file which I intent to read/write from a NodeJS electron app. I am not sure, but I believe that instead of using readFile() and writeFile(), I should get a FileHandle to avoid multiple open and close actions.
So I've tried to grab a FileHandle from fs.promises.open(), but the problem seems to be that I am unable to get a FileHandle from an existing file without truncate it and clear it to 0.
const { resolve } = require('path');
const fsPromises = require('fs').promises;
function init() {
// Save table name
this.path = resolve(__dirname, '..', 'data', `test.json`);
// Create/Open the json file
fsPromises
.open(this.path, 'wx+')
.then(fileHandle => {
// Grab file handle if the file don't exists
// because of the flag 'wx+'
this.fh = fileHandle;
})
.catch(err => {
if (err.code === 'EEXIST') {
// File exists
}
});
}
Am I doing something wrong? Are there better ways to do it?
Links:
https://nodejs.org/api/fs.html#fs_fspromises_open_path_flags_mode
https://nodejs.org/api/fs.html#fs_file_system_flags
Because JSON is a text format that has to be read or written all at once and can't be easily modified or added onto in place, you're going to have to read the whole file or write the whole file at once.
So, your simplest option will be to just use fs.promises.readFile() and fs.promises.writeFile() and let the library open the file, read/write it and close the file. Opening and closing a file in a modern OS takes advantage of disk caching so if you're reopening a file you just previously opened not long ago, it's not going to be a slow operation. Further, since nodejs performs these operations in secondary threads in libuv, it doesn't block the main thread of nodejs either so its generally not a performance issue for your server.
If you really wanted to open the file once and hold it open, you would open it for reading and writing using the r+ flag as in:
const fileHandle = await fsPromises.open(this.path, 'r+');
Reading the whole file would be simple as the new fileHandle object has a .readFile() method.
const text = await fileHandle.readFile({encoding 'utf8'});
For writing the whole file from an open filehandle, you would have to truncate the file, then write your bytes, then flush the write buffer to ensure the last bit of the data got to the disk and isn't sitting in a buffer.
await fileHandle.truncate(0); // clear previous contents
let {bytesWritten} = await fileHandle.write(mybuffer, 0, someLength, 0); // write new data
assert(bytesWritten === someLength);
await fileHandle.sync(); // flush buffering to disk
I'm trying to implement a routine for Node.js that would allow one to open a file, that is being appended to by some other process at this very time, and then return chunks of data immediately as they are appended to file. It can be thought as similar to tail -f UNIX command, however acting immediately as chunks are available, instead of polling for changes over time. Alternatively, one can think of it as of working with a file as you do with socket — expecting on('data') to trigger from time to time until a file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with similar designation), and then just read chunks as file descriptor is marked "readable". So, when there is nothing to be read, it won't be readable, and when something is appended to file, it's readable again.
I somewhat expected this kind of behavior for following code sample in Javascript:
function readThatFile(filename) {
const stream = fs.createReadStream(filename, {
flags: 'r',
encoding: 'utf8',
autoClose: false // I thought this would prevent file closing on EOF too
});
stream.on('error', function(err) {
// handle error
});
stream.on('open', function(fd) {
// save fd, so I can close it later
});
stream.on('data', function(chunk) {
// process chunk
// fs.close() if I no longer need this file
});
}
However, this code sample just bails out when EOF is encountered, so I can't wait for new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats Node.js purpose. Alternatively, I could fs.watch() file for changes, but it won't work over network, and I don't like an idea of reopening file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability' sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');
function tail(path, callback) {
var descriptor, bytes = 0, buffer = new Buffer(256), line = '';
function parse(err, bytesRead, buffer) {
if (err) {
callback(err, null);
return;
}
// Keep track of the bytes we have consumed already.
bytes += bytesRead;
// Combine the buffered line with the new string data.
line += buffer.toString('utf-8', 0, bytesRead);
var i = 0, j;
while ((j = line.indexOf('\n', i)) != -1) {
// Callback with a single line at a time.
callback(null, line.substring(i, j));
// Skip the newline character.
i = j + 1;
}
// Only keep the unparsed string contents for next iteration.
line = line.substr(i);
// Keep reading in the next tick (avoids CPU hogging).
process.nextTick(read);
}
function read() {
var stat = fs.fstatSync(descriptor);
if (stat.size <= bytes) {
// We're currently at the end of the file. Check again in 500 ms.
setTimeout(read, 500);
return;
}
fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
}
fs.open(path, 'r', function (err, fd) {
if (err) {
callback(err, null);
} else {
descriptor = fd;
read();
}
});
return {close: function close(callback) {
fs.close(descriptor, callback);
}};
}
// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
console.log(err, line);
});
// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
Another, perhaps hackier, approach might be to just use tail by spawning a child process with it. This of course comes with the limitation that tail must exist on the target platform, but one of node's strengths is using it to do asynchronous systems development via spawn and even on windows, you can execute node in an alternate shell like msysgit or cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;
var child = spawn('tail',
['-f', 'my.log']);
child.stdout.on('data',
function (data) {
console.log('tail output: ' + data);
}
);
child.stderr.on('data',
function (data) {
console.log('err data: ' + data);
}
);
So, it seems people are still looking for an answer to this question for five years already, and there is yet no answer on topic.
In short: you can't. Not in Node.js particularly, you can't at all.
Long answer: there are few reasons for this.
First, POSIX standard clarifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
I can't tell for sure with epoll(), since it's not standartized and you have to read quite lengthy implementation, but I would assume it's similar.
Since libuv, which is in core of Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), neither of which would block when invoked at the end of file, it would always return empty buffer when EOF is encountered. So, no luck here too.
So, summarizing, if such functionality is desired, something must be wrong with your design, and you should revise it.
What you're trying to do is a FIFO file (acronym for First In First Out), which as you said works like a socket.
There's a node.js module that allows you to work with fifo files.
I don't know what do you want that for, but there are better ways to work with sockets on node.js. Try socket.io instead.
You could also have a look at this previous question:
Reading a file in real-time using Node.js
Update 1
I'm not familiar with any module that would do what you want with a regular file, instead of with a socket type one. But as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);
tail.stdout.on('data', function (data) {
data = data.toString().replace(/^[\s]+/i,'').replace(/[\s]+$/i,'');
console.log(data);
});
Then from the command line try echo someline > somefile.txt and watch at the console.
You might also would like to have a look at this: https://github.com/layerssss/node-tailer