I'm a JS developer just learning python. This is my first time trying to use node (v6.7.0) and python (v2.7.1) together. I'm using restify with python-runner as a bridge to my python virtualenv. My python script uses a RAKE NLP keyword-extraction package.
I can't figure out for the life of me why the return data in server.js contains a random comma at character 8192 and at roughly every multiple of it. There's no pattern except the location: sometimes it's in the middle of an object key string, other times in the value, other times after the comma separating the object pairs. This completely breaks the JSON.parse() on the return data. Example outputs below. When I run the script from a python shell, this doesn't happen.
I seriously can't figure out why this is happening, any experienced devs have any ideas?
Sample output in browser
[..., {...ate': 1.0, 'intended recipient': 4.,0, 'correc...}, ...]
Sample output in python shell
[..., {...ate': 1.0, 'intended recipient': 4.0, 'correc...}, ...]
DISREGARD ANY DISCREPANCIES REGARDING OBJECT CONVERSION AND HANDLING IN THE FILES BELOW. THE CODE HAS BEEN SIMPLIFIED TO SHOWCASE THE ISSUE
server.js
var restify = require('restify');
var py = require('python-runner');

var server = restify.createServer({...});

server.get('/keyword-extraction', function( req, res, next ) {
  py.execScript(__dirname + '/keyword-extraction.py', {
    bin: '.py/bin/python'
  })
  .then( function( data ) {
    var fData = JSON.parse(data); // <---- ERROR
    res.json(fData);
  })
  .catch( function( err ) {...});
  return next();
});

server.listen(8001, 'localhost', function() {...});
keyword-extraction.py
import csv
import json
import RAKE
f = open( 'emails.csv', 'rb' )
f.readline() # skip line containing col names
outputData = []
try:
    reader = csv.reader(f)
    for row in reader:
        email = {}
        emailBody = row[7]
        Rake = RAKE.Rake('SmartStoplist.txt')
        rakeOutput = Rake.run(emailBody)
        for tuple in rakeOutput:
            email[tuple[0]] = tuple[1]
        outputData.append(email)
finally:
    f.close()  # close the handle opened above (was file.close())
print( json.dumps(outputData))
This looks suspiciously like a bug related to size of some buffer, since 8192 is a power of two.
The main thing here is to isolate exactly where the failure is occurring. If I were debugging this, I would:
Take a closer look at the output from json.dumps, by printing several characters on either side of position 8191, ideally the integer character code (unicode, ASCII, or whatever).
If that looks OK, I would try capturing the output from the python script as a file and read that directly in the node server (i.e. don't run a python script).
If that works, then create a python script that takes that file and outputs it without manipulation and have your node server execute that python script instead of the one it is using now.
That should help you figure out where the problem is occurring. From comments, I suspect that this is essentially a bug that you cannot control, unless you can increase the python buffer size enough to guarantee your data will never blow the buffer. 8K is pretty small, so that might be a realistic solution.
If that is inadequate, then you might consider processing the data on the node server to remove the extra character at every n * 8192, if you can consistently rely on that. Good luck.
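The first debugging step above (looking at the characters on either side of the suspect position) could be sketched roughly as follows. The helper name contextAround and the sample string are made up for illustration; in practice you would pass in the string returned by the Python script.

```javascript
// Hypothetical helper: list each character and its UTF-16 code unit
// in a window of `radius` characters on either side of `pos`.
function contextAround(text, pos, radius) {
  const start = Math.max(0, pos - radius);
  const end = Math.min(text.length, pos + radius + 1);
  const out = [];
  for (let i = start; i < end; i++) {
    out.push({ index: i, char: text[i], code: text.charCodeAt(i) });
  }
  return out;
}

// Made-up data: a stray comma sitting at index 8192.
const sample = 'a'.repeat(8192) + ',' + 'b'.repeat(10);
const around = contextAround(sample, 8192, 2);
console.log(around);
```

Running the same helper on the output captured from the python shell and on the data received in server.js should show whether the extra character is already present before Node receives it.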
I'm trying to write a little commander script that proxies to other scripts in my project and I'm trying to use node to pipe the stdout from the spawned process to the current process's stdout:
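var childProcess = require('child_process');

// "args" instead of "arguments", which would shadow the built-in arguments object.
function runCommand(command, args) {
  var commandProcess = childProcess.spawn(command, args);
  commandProcess.stdout.pipe(process.stdout);
  commandProcess.on("exit", process.exit);
}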
This works fine until I start getting large output from my sub processes (for example, one of them is a maven command). What I'm seeing is that it only prints out the first 8192 bytes of the stdout and then stores the rest until the next "data" event. Then it prints out the next 8192, etc. That means there's a lag in the output, and when we're running a server process it sometimes stops printing until you trigger something on the server that causes another "data" event.
Is there a way to increase the size of this buffer or avoid this behavior? Ideally this commander script just proxies to our other scripts and should print out everything exactly as is.
You are using Node's child_process.spawn, which is asynchronous, so it delivers output as and when the child process writes to stdout.
Ref: https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
I recommend using child_process.exec to run your process instead. It lets you control the size of the output buffer and hands you the output after the child process has finished. This is how you pass the buffer size:
var exec = require('child_process').exec;

var execute = function(command, callback){
  exec(command, {maxBuffer: 1024 * 500}, function(error, stdout, stderr){ callback(error, stdout); });
};
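Since exec only delivers output once the child has finished, it may not suit a long-running server process. Another option that might be worth trying is spawn's stdio: 'inherit' setting, which lets the child write directly to the parent's stdout and stderr so that no "data" events are involved. The sketch below is an untested assumption for this particular setup, and the function name and onExit callback are hypothetical.

```javascript
const { spawn } = require('child_process');

// Hypothetical variant of runCommand: the child shares the parent's
// stdin/stdout/stderr, so its output is not relayed through 'data' events.
function runCommandInherit(command, args, onExit) {
  const child = spawn(command, args, { stdio: 'inherit' });
  child.on('exit', function (code) {
    // Hand the exit code to the caller instead of exiting unconditionally.
    if (onExit) onExit(code);
  });
  return child;
}
```

Passing process.exit as onExit reproduces the original behavior of ending the commander script when the child exits. With 'inherit', child.stdout is null, so the output can no longer be inspected or transformed in the parent.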
I need to find a way to get ffmpeg to start recording again after the duration expires, so I can break the files up into more manageable chunks. Unfortunately, the only function I've found is the ability to limit the amount of time it records before terminating, with nothing about how to get it to keep recording into the next file afterwards. Is there a function to do this?
for (var i = 0; i < streamsRepository.streams.length; i++) {
  var obj = streamsRepository.streams[i];
  findTheRightSave();
  console.log('Channel '+obj.key+' is Recording');
  fmpeg(obj.url)
    .duration('1:00')
    .format('mp3')
    .on('error', function(err, stdout, stderr) {
      console.error(err);
      console.error(stdout);
      console.error(stderr);
    })
    .save('/home/spencer/recorder/audio/'+obj.key+'/Recording-'+numRecordings+'.mp3');
  numRecordings=1;
}
Use the following parameters to ffmpeg (not sure how you can do it with fluent-ffmpeg):
-f segment -reset_timestamps 1 -segment_time 60 outputfile_%d.mp3
I don't know what ANY of those parameters mean, but that's a start
Just tested here:
ffmpeg -i "Sovereign Quarter.mp3" -f segment -reset_timestamps 1 -segment_time 60 sq%d.mp3
and got 4 x 1 minute files and 1 x 51 second file. The original file length was 4:51, so it looks like there was a way ffmpeg could do it after all.
Hope you have better luck figuring out how to add those options in fluent-ffmpeg than I did. That's why I stopped using fluent-ffmpeg, I could never figure out how to add arbitrary (valid) options to the process
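If fluent-ffmpeg gets in the way, one possible workaround is to start ffmpeg directly with child_process.spawn and pass the same flags as in the command above. This is only a sketch: the function names segmentArgs and recordInSegments are made up, and it assumes an ffmpeg binary is on the PATH.

```javascript
const { spawn } = require('child_process');

// Build the argument list using the flags from the command line above.
function segmentArgs(input, outputPattern, seconds) {
  return ['-i', input, '-f', 'segment', '-reset_timestamps', '1',
          '-segment_time', String(seconds), outputPattern];
}

// Hypothetical wrapper: start ffmpeg and forward its log output,
// which ffmpeg writes to stderr.
function recordInSegments(input, outputPattern, seconds) {
  const child = spawn('ffmpeg', segmentArgs(input, outputPattern, seconds));
  child.stderr.on('data', function (chunk) { process.stderr.write(chunk); });
  child.on('error', function (err) {
    console.error('could not start ffmpeg:', err.message);
  });
  return child;
}

// Example call (not executed here), reusing the names from the question:
// recordInSegments(obj.url, '/home/spencer/recorder/audio/' + obj.key + '/Recording-%d.mp3', 60);
```

The %d in the output pattern is filled in by ffmpeg's segment muxer, so the numRecordings counter from the question would not be needed with this approach.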
I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very time, and then return chunks of data immediately as they are appended to the file. It can be thought of as similar to the tail -f UNIX command, but acting immediately as chunks become available instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you do with a socket, expecting on('data') to trigger from time to time until the file is closed explicitly.
In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with similar designation), and then just read chunks as file descriptor is marked "readable". So, when there is nothing to be read, it won't be readable, and when something is appended to file, it's readable again.
I somewhat expected this kind of behavior from the following code sample in JavaScript:
function readThatFile(filename) {
  const stream = fs.createReadStream(filename, {
    flags: 'r',
    encoding: 'utf8',
    autoClose: false // I thought this would prevent file closing on EOF too
  });
  stream.on('error', function(err) {
    // handle error
  });
  stream.on('open', function(fd) {
    // save fd, so I can close it later
  });
  stream.on('data', function(chunk) {
    // process chunk
    // fs.close() if I no longer need this file
  });
}
However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but it won't work over a network, and I don't like the idea of reopening the file all the time instead of just keeping it open.
I've tried to do this:
const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = net.Socket({ fd: fd, readable: true, writable: false });
But had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.
So, any solutions?
UPD: this isn't possible, my answer explains why.
I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.
To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.
var fs = require('fs');

function tail(path, callback) {
  // Buffer.alloc replaces the deprecated new Buffer(size) constructor.
  var descriptor, bytes = 0, buffer = Buffer.alloc(256), line = '';
  var closed = false;

  function parse(err, bytesRead, buffer) {
    if (closed) {
      // The handle was closed while a read was in flight; stop here.
      return;
    }
    if (err) {
      callback(err, null);
      return;
    }
    // Keep track of the bytes we have consumed already.
    bytes += bytesRead;
    // Combine the buffered line with the new string data.
    line += buffer.toString('utf-8', 0, bytesRead);
    var i = 0, j;
    while ((j = line.indexOf('\n', i)) != -1) {
      // Callback with a single line at a time.
      callback(null, line.substring(i, j));
      // Skip the newline character.
      i = j + 1;
    }
    // Only keep the unparsed string contents for next iteration.
    line = line.substr(i);
    // Keep reading in the next tick (avoids CPU hogging).
    process.nextTick(read);
  }

  function read() {
    if (closed) {
      // Do not touch the descriptor after close() has been called.
      return;
    }
    var stat = fs.fstatSync(descriptor);
    if (stat.size <= bytes) {
      // We're currently at the end of the file. Check again in 500 ms.
      setTimeout(read, 500);
      return;
    }
    fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
  }

  fs.open(path, 'r', function (err, fd) {
    if (err) {
      callback(err, null);
    } else {
      descriptor = fd;
      read();
    }
  });

  return {close: function close(callback) {
    // Stop the polling loop before releasing the descriptor.
    closed = true;
    // fs.close needs a callback, so fall back to a no-op.
    fs.close(descriptor, callback || function () {});
  }};
}

// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
  console.log(err, line);
});

// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);
All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.
Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.
Another, perhaps hackier, approach might be to just use tail by spawning a child process with it. This of course comes with the limitation that tail must exist on the target platform, but one of Node's strengths is asynchronous systems development via spawn, and even on Windows you can run node in an alternate shell like msysgit or cygwin to get access to the tail utility.
The code for this:
var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'my.log']);

child.stdout.on('data', function (data) {
  console.log('tail output: ' + data);
});

child.stderr.on('data', function (data) {
  console.log('err data: ' + data);
});
So, it seems people have been looking for an answer to this question for five years now, and there is still no on-topic answer.
In short: you can't. This isn't specific to Node.js; you can't do it at all.
Long answer: there are a few reasons for this.
First, the POSIX standard clarifies select() behavior in this regard as follows:
File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.
So, select() can't help with detecting a write beyond the file end.
With poll() it's similar:
Regular files shall always poll TRUE for reading and writing.
epoll() is not standardized, but it doesn't help either: on Linux, epoll_ctl() fails with EPERM when asked to watch a regular file.
Since libuv, which is at the core of the Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, it always returns an empty buffer when EOF is encountered. So, no luck here either.
So, summarizing, if such functionality is desired, something must be wrong with your design, and you should revise it.
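The end-of-file behavior described above can be observed from Node.js with a small experiment. This is a minimal sketch using synchronous fs calls on a temporary file; the helper name readAt and the file name are made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read up to 64 bytes starting at `position`
// and return how many bytes were actually read.
function readAt(fd, position) {
  const buf = Buffer.alloc(64);
  return fs.readSync(fd, buf, 0, buf.length, position);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'eof-demo-'));
const file = path.join(dir, 'log.txt');
fs.writeFileSync(file, 'abc');

const fd = fs.openSync(file, 'r');
const first = readAt(fd, 0);           // reads the three existing bytes
const atEof = readAt(fd, first);       // returns immediately with zero bytes
fs.appendFileSync(file, 'de');         // stand-in for another process appending
const afterAppend = readAt(fd, first); // the same descriptor now sees new data
fs.closeSync(fd);

console.log(first, atEof, afterAppend);
```

The read at end of file does not wait for data, which is why any tail-like behavior on a regular file has to come from re-reading on a timer or on a change notification.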
What you're trying to do is a FIFO file (acronym for First In First Out), which as you said works like a socket.
There's a node.js module that allows you to work with fifo files.
I don't know what you want that for, but there are better ways to work with sockets on node.js. Try socket.io instead.
You could also have a look at this previous question:
Reading a file in real-time using Node.js
Update 1
I'm not familiar with any module that would do what you want with a regular file, instead of with a socket type one. But as you said, you could use tail -f to do the trick:
// filename must exist at the time of running the script
var filename = 'somefile.txt';
var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);
tail.stdout.on('data', function (data) {
  data = data.toString().replace(/^[\s]+/i,'').replace(/[\s]+$/i,'');
  console.log(data);
});
Then from the command line try echo someline > somefile.txt and watch the console.
You might also like to have a look at this: https://github.com/layerssss/node-tailer
I'm trying to capture video using FFmpeg with Node.js, and send it to a browser via websockets for playing using the MediaSource API. What I have so far works in Firefox but doesn't decode properly in Chrome. Apparently, from reading this question I need to use the sample_muxer program to ensure each 'cluster' starts with a keyframe.
Here's the code I'm using:
var child_process = require('child_process');

var ffmpeg = child_process.spawn("ffmpeg",[
  "-y",
  "-r", "30",
  "-f","dshow",
  "-i","video=FFsource:audio=Stereo Mix (Realtek High Definition Audio)",
  "-vcodec", "libvpx",
  "-acodec", "libvorbis",
  "-threads", "0",
  "-b:v", "3300k",
  "-keyint_min", "150",
  "-g", "150",
  "-f", "webm",
  "-" // Output to STDOUT
]);

ffmpeg.stdout.on('data', function(data) {
  //socket.send(data); // Just sending the FFmpeg clusters works with Firefox's
                       // implementation of the MediaSource API. No joy with Chrome.
  // - - - This is the part that doesn't work - - -
  var muxer = child_process.spawn("sample_muxer",[
    "-i", data, // This isn't correct...
    "-o", "-" // Output to STDOUT
  ]);
  muxer.stdout.on('data', function(muxdata) {
    socket.send(muxdata); // Send the cluster
  });
});

ffmpeg.stderr.on('data', function (data) {
  console.log("" + data); // Output to console
});
Obviously I'm not piping it correctly and I'm unsure how I would while also including the arguments. Appreciate any help getting this working. Thanks!
The sample_muxer program takes the -i argument as the name of a file. It cannot read video data from standard input. To see the error, you should send the error stream from sample_muxer to an error log file.
var muxer = child_process.spawn("sample_muxer",[
  "-i", data, // This isn't correct...
  "-o", "-" // Output to STDOUT
]);
This code will result in an error at https://code.google.com/p/webm/source/browse/sample_muxer.cpp?repo=libwebm#240
You can try writing to a file from ffmpeg and then reading that file from sample_muxer. Once that works, try with a FIFO file to pipe data from ffmpeg to sample_muxer.
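The file-based first step suggested above might look roughly like this. It is a sketch under several assumptions: ffmpeg and sample_muxer are on the PATH, the capture arguments make ffmpeg stop by itself (for example via a -t duration), and sample_muxer accepts a file name after -o. The function names run and captureThenMux are hypothetical.

```javascript
const { spawn } = require('child_process');

// Run a command to completion and resolve with its exit code.
function run(command, args) {
  return new Promise(function (resolve, reject) {
    // stdin and stdout are ignored; stderr is shared so tool errors stay visible.
    const child = spawn(command, args, { stdio: ['ignore', 'ignore', 'inherit'] });
    child.on('error', reject);
    child.on('exit', function (code) { resolve(code); });
  });
}

// Hypothetical two-step run: ffmpeg writes capturedFile, then sample_muxer reads it.
function captureThenMux(captureArgs, capturedFile, muxedFile) {
  return run('ffmpeg', captureArgs.concat([capturedFile])).then(function (code) {
    if (code !== 0) throw new Error('ffmpeg exited with code ' + code);
    return run('sample_muxer', ['-i', capturedFile, '-o', muxedFile]);
  });
}
```

Because this waits for ffmpeg to finish before muxing, it only verifies that sample_muxer accepts the captured file; it is not a live stream, which is where the FIFO idea comes in.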