I am using ssh2-sftp-client to get files from a remote server, and I'm running into an issue reading and downloading these files once I get() them.
At first, I was able to use the get() method to download the file when the API was hit, and I could also print the whole file contents in a console.log statement. Then it started returning Buffer content instead, so I updated with this:
npm install ssh2-sftp-client#3.1.0
And now I get a ReadableStream.
function getFile(req, res) {
  sftp.connect(config).then(() => {
    return sftp.get(process.env.SFTP_PATH + '/../...xml', true);
  }).then((stream) => {
    const outFile = fs.createWriteStream('...xml');
    stream.on('data', (c) => {
      console.log(`Received ${c.length} bytes of data.`);
      outFile.write(c);
      res.send('ok');
    });
    stream.on('close', function() {
    });
  }).catch((err) => {
    console.log(err, 'catch error');
  });
};
The above code returns a stream, but I'm not sure how to get the file from it; the write() method doesn't seem to work here.
Any advice or suggestions on how I can use this library to read and download files would be greatly appreciated.
First, don't use version 3.x; it has been deprecated. The most recent version is v4.1.0, which has had significant cleanup work to fix a number of small bugs.
If all you want to do is download the files, then use the fastGet() method. It takes two args, a source path and a destination path, and it is a lot faster than plain get() because it does the download in parallel.
If you don't want to do that, then the get() method has a number of options. If you only pass in one arg (the source), it will return a buffer. If you pass in two args, the second arg must be either a string (the path to a local file) or a writable stream; if it's a writable stream, the data will be piped into that stream.
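For example, here is a minimal sketch of both approaches, assuming the v4.x API described above; the remote and local paths are placeholders, and config is your existing connection config:

const Client = require('ssh2-sftp-client');
const sftp = new Client();

sftp.connect(config)
  // fastGet(remotePath, localPath) downloads straight to disk.
  .then(() => sftp.fastGet('/remote/path/file.xml', './file.xml'))
  // Or pipe get() into a writable stream instead:
  // .then(() => sftp.get('/remote/path/file.xml', fs.createWriteStream('./file.xml')))
  .then(() => sftp.end())
  .then(() => console.log('download complete'))
  .catch(err => console.log(err, 'catch error'));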
I'm using exec from child_process.
The function runs fine, but after 4-5 minutes it just stops, without any errors reported, even though the script should run for at least 24 hours...
Here is the code:
import { exec } from 'child_process';

function searchDirectory(dirPath) {
  let lineBuffer = '';
  const cmd = `find ${dirPath} -type f -name "*.txt" | pv -l -L 10 -q`;
  const findData = exec(cmd);
  findData.on('error', err => log.error(err));
  findData.stdout.on('data', data => {
    lineBuffer += data;
    let lines = lineBuffer.split('\n');
    for (var i = 0; i < lines.length - 1; i++) {
      let filepath = lines[i];
      processfile(filepath);
    }
    lineBuffer = lines[lines.length - 1];
  });
  findData.stdout.on('end', () => console.log('finished finding...'));
}
The pv command slows down the output; I need this since the path I'm searching is over the network and pretty slow (60mb/s).
When I run the command directly in the terminal it works fine (I didn't wait 24 hours, but I let it run for half an hour and it was still going).
The processfile function makes an async call with axios to send some data to a server:
let data = readFileSync(file);
...
axios.post(API_URL, { obj: data }, { proxy: false })
  .then(res => {
    log.info('Successfully saved object : ' + res.data._id);
  })
  .catch(err => {
    log.error(err.response ? err.response.data : err);
  });
What could cause the script to stop? Any ideas?
Thanks
I found the issue: exec is not recommended for huge outputs, since it uses a limited-size buffer. Use spawn instead:
The most significant difference between child_process.spawn and child_process.exec is in what they return - spawn returns a stream and exec returns a buffer.

child_process.spawn returns an object with stdout and stderr streams. You can tap on the stdout stream to read data that the child process sends back to Node. stdout, being a stream, has the "data", "end", and other events that streams have. spawn is best used when you want the child process to return a large amount of data to Node - image processing, reading binary data, etc.

child_process.exec returns the whole buffer output from the child process. By default the buffer size is set at 200k. If the child process returns anything more than that, your program will crash with the error message "Error: maxBuffer exceeded". You can fix that problem by setting a bigger buffer size in the exec options. But you should not do it, because exec is not meant for processes that return HUGE buffers to Node. You should use spawn for that. So what do you use exec for? Use it to run programs that return result statuses, instead of data.
From: https://www.hacksparrow.com/difference-between-spawn-and-exec-of-node-js-child_process.html
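For completeness, here is a rough sketch of the question's find pipeline rewritten with spawn; shell: true is assumed so the pipe to pv still works, and log and processfile() are the question's own helpers:

import { spawn } from 'child_process';

function searchDirectory(dirPath) {
  let lineBuffer = '';
  // shell: true lets us keep the `| pv` pipeline inside one command string
  const findData = spawn(`find ${dirPath} -type f -name "*.txt" | pv -l -L 10 -q`, { shell: true });
  findData.on('error', err => log.error(err));
  findData.stdout.on('data', data => {
    lineBuffer += data;
    const lines = lineBuffer.split('\n');
    for (let i = 0; i < lines.length - 1; i++) {
      processfile(lines[i]); // same processing as before
    }
    lineBuffer = lines[lines.length - 1];
  });
  findData.stdout.on('end', () => console.log('finished finding...'));
}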
I'm writing a little commander script that proxies to other scripts in my project, and I'm trying to use node to pipe the stdout from the spawned process to the current process's stdout:
function runCommand(command, arguments) {
  var commandProcess = childProcess.spawn(command, arguments);
  commandProcess.stdout.pipe(process.stdout);
  commandProcess.on("exit", process.exit);
}
This works fine until I start getting large output from my subprocesses (for example, one of them is a maven command). What I'm seeing is that it only prints out the first 8192 bytes of the stdout and then stores the rest until the next "data" event, when it prints out the next 8192 bytes, and so on. That means there's a lag in the output, and when we're running a server process it sometimes stops printing things until you trigger something on the server that fires another "data" event.
Is there a way to increase the size of this buffer or avoid this behavior? Ideally this commander script just proxies to our other scripts and should print out everything exactly as is.
You are using spawn, which is asynchronous, so it will give output as and when the child process writes to stdout.
Ref: https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
I recommend using child_process.exec to run your process instead; there you can control the size of the output buffer, and output is delivered after the child process has finished. This is how you pass the buffer size:
var exec = require('child_process').exec;

var execute = function (command, callback) {
  exec(command, { maxBuffer: 1024 * 500 }, function (error, stdout, stderr) {
    callback(error, stdout);
  });
};
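For example, a hypothetical call through this helper (the maven command is just an illustration from the question):

execute('mvn clean install', function (error, stdout) {
  if (error) return console.error(error);
  process.stdout.write(stdout); // arrives once, after the child has exited
});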
I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very time, and then return chunks of data immediately as they are appended to the file. It can be thought of as similar to the tail -f UNIX command, but acting immediately as chunks become available instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you do with a socket — expecting on('data') to trigger from time to time until the file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with a similar purpose), and then read chunks as the file descriptor is marked "readable": when there is nothing to read, it won't be readable, and when something is appended to the file, it becomes readable again.
I somewhat expected this kind of behavior from the following code sample in JavaScript:
function readThatFile(filename) {
  const stream = fs.createReadStream(filename, {
    flags: 'r',
    encoding: 'utf8',
    autoClose: false // I thought this would prevent file closing on EOF too
  });
  stream.on('error', function(err) {
    // handle error
  });
  stream.on('open', function(fd) {
    // save fd, so I can close it later
  });
  stream.on('data', function(chunk) {
    // process chunk
    // fs.close() if I no longer need this file
  });
}
However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but that won't work over a network, and I don't like the idea of reopening the file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function that reads an entire file, invokes a callback for every line (separated by \n only), and then waits for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed-size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');

function tail(path, callback) {
  var descriptor, bytes = 0, buffer = Buffer.alloc(256), line = '';

  function parse(err, bytesRead, buffer) {
    if (err) {
      callback(err, null);
      return;
    }
    // Keep track of the bytes we have consumed already.
    bytes += bytesRead;
    // Combine the buffered line with the new string data.
    line += buffer.toString('utf-8', 0, bytesRead);
    var i = 0, j;
    while ((j = line.indexOf('\n', i)) != -1) {
      // Callback with a single line at a time.
      callback(null, line.substring(i, j));
      // Skip the newline character.
      i = j + 1;
    }
    // Only keep the unparsed string contents for the next iteration.
    line = line.substr(i);
    // Keep reading in the next tick (avoids CPU hogging).
    process.nextTick(read);
  }

  function read() {
    var stat = fs.fstatSync(descriptor);
    if (stat.size <= bytes) {
      // We're currently at the end of the file. Check again in 500 ms.
      setTimeout(read, 500);
      return;
    }
    fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
  }

  fs.open(path, 'r', function (err, fd) {
    if (err) {
      callback(err, null);
    } else {
      descriptor = fd;
      read();
    }
  });

  return {
    close: function close(callback) {
      fs.close(descriptor, callback);
    }
  };
}

// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
  console.log(err, line);
});

// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach, which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
Another, perhaps hackier, approach might be to just use tail itself by spawning it as a child process. This of course comes with the limitation that tail must exist on the target platform, but one of node's strengths is doing asynchronous systems development via spawn, and even on Windows you can run node in an alternate shell like msysgit or cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;
var child = spawn('tail', ['-f', 'my.log']);

child.stdout.on('data', function (data) {
  console.log('tail output: ' + data);
});

child.stderr.on('data', function (data) {
  console.log('err data: ' + data);
});
So, it seems people have been looking for an answer to this question for five years now, and there is still no on-topic answer.
In short: you can't. Not just in Node.js; you can't at all.
Long answer: there are a few reasons for this.
First, the POSIX standard specifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
I can't tell for sure with epoll(), since it's not standardized and you would have to read a quite lengthy implementation, but I would assume it's similar.
Since libuv, which is at the core of the Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, it will always return an empty buffer when EOF is encountered. So, no luck here either.
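A quick sketch to illustrate the point: a read positioned at the current end of a regular file returns immediately with zero bytes instead of blocking (the file name here is just a placeholder):

const fs = require('fs');

const fd = fs.openSync('some.log', 'r');
const { size } = fs.fstatSync(fd);
const buf = Buffer.alloc(16);

// Read starting exactly at the current end of the file.
const bytesRead = fs.readSync(fd, buf, 0, buf.length, size);
console.log(bytesRead); // 0 -- EOF is reported immediately, no waiting
fs.closeSync(fd);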
So, summarizing, if such functionality is desired, something must be wrong with your design, and you should revise it.
What you're describing is a FIFO file (First In, First Out), which, as you said, works like a socket.
There's a node.js module that allows you to work with fifo files.
I don't know what you want it for, but there are better ways to work with sockets in node.js. Try socket.io instead.
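As a rough sketch of the FIFO idea, assuming a pipe created beforehand with mkfifo /tmp/myfifo on a UNIX-like system: unlike a regular file, reads on a FIFO wait for a writer, so 'data' fires as data arrives:

const fs = require('fs');

// Opening a FIFO for reading waits until some writer opens the other end.
const fifo = fs.createReadStream('/tmp/myfifo');
fifo.on('data', chunk => console.log('got:', chunk.toString()));
fifo.on('end', () => console.log('writer closed the pipe'));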
You could also have a look at this previous question:
Reading a file in real-time using Node.js
Update 1
I'm not familiar with any module that would do what you want with a regular file rather than a socket-type one. But as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);

tail.stdout.on('data', function (data) {
  data = data.toString().replace(/^[\s]+/i, '').replace(/[\s]+$/i, '');
  console.log(data);
});
Then from the command line try echo someline > somefile.txt and watch the console.
You might also like to have a look at this: https://github.com/layerssss/node-tailer
I'm running a node application as a daemon. When debugging the daemon, I need to see the output, so I'd like to redirect stdout and stderr to a file.
I'd expect to be able to just reassign stdout and stderr, as in Python or C:
fs = require('fs');
process.stdout = fs.openSync('/var/log/foo', 'w');
process.stderr = process.stdout;
console.log('hello');
When I run the script directly, "hello" is printed to the console! And when I run it in the background, I see output neither on the console (of course) nor in /var/log/foo.
I don't want or need sophisticated logging. I just need to see the built-in messages that node already provides.
The console object grabs a reference to process.stdout and process.stderr when it is first created (you can see this in the source). Re-assigning them later does not affect console.
The usual way to redirect these streams is to launch the process with the streams redirected.
Alternatively, you can overwrite the console methods and make them write to your file instead.
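For the first option, a rough sketch of a small wrapper that launches the daemon with its output redirected at spawn time (the file and script names are placeholders):

const fs = require('fs');
const { spawn } = require('child_process');

const out = fs.openSync('/var/log/foo', 'a');
const child = spawn('node', ['daemon.js'], {
  detached: true,
  stdio: ['ignore', out, out], // send both stdout and stderr to the file
});
child.unref(); // let the parent exit while the daemon keeps running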
Similar to this question, you can overwrite process.stdout.write, which is called by console.log.
var fs = require('fs');
var oldWrite = process.stdout.write;

process.stdout.write = function (d) {
  fs.appendFileSync('./foo', d);
  oldWrite.apply(this, arguments);
};

console.log('hello');