Pipe with error events - javascript

When building complex stream pipelines, it often makes sense to pass errors from one stage to the next. (In the situation at hand, I'm considering Gulp pipelines.) But apparently the Node.js stream pipe method doesn't do that.
One can manually bind the error event listener of one stream to the error event emitter of the next. But that's pretty tedious.
Is there some easy way to build pipelines with error chaining? Or is there some reason why one shouldn't do this?
Example:
var through = require("through2");

var s1 = through(function(chunk, enc, cb) {
    console.log("In 1: " + chunk.toString());
    cb(null, chunk);
});

var s2 = through(function(chunk, enc, cb) {
    console.log("In 2: " + chunk.toString());
    if (chunk.toString() === "error")
        cb(Error("Something broke"), null);
    else
        cb(null, chunk);
});

var s3 = through(function(chunk, enc, cb) {
    console.log("In 3: " + chunk.toString());
    cb(null, chunk);
});

s1.pipe(s2).pipe(s3)
    .on("data", function(chunk, enc) {
        console.log("Out: " + chunk.toString());
    })
    .on("error", function(err) {
        console.log("Final error: " + err);
    });

////////////////////////////////////////////////////
// The following two lines are the relevant part. //
s1.on("error", s2.emit.bind(s2, "error"));        //
s2.on("error", s3.emit.bind(s3, "error"));        //
////////////////////////////////////////////////////

s1.write("first");
s1.write("error");
s1.write("last");

Is there some easy way to build pipelines with error chaining?
bubble-stream-error provides a mechanism to do this.
Or is there some reason why one shouldn't do this?
Not that I'm aware of. This seems like something you would reasonably want to do in many cases.
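As an aside, reasonably recent Node.js versions also ship a built-in stream.pipeline helper (added in Node 10) that forwards an error from any stage to a single callback and tears the streams down, which avoids the manual emit-binding entirely. A minimal sketch, reusing s1, s2 and s3 from the question:
var { pipeline } = require("stream"); // built in since Node.js 10

// s1, s2, s3 defined as in the question
pipeline(s1, s2, s3, function(err) {
    if (err)
        console.log("Final error: " + err);
    else
        console.log("Pipeline finished");
});

s3.on("data", function(chunk) {
    console.log("Out: " + chunk.toString());
});

s1.write("first");
s1.write("error");
s1.end("last");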

Related

ffmpeg running in cloudfunction silently fails/never finishes

I am trying to implement a Cloud Function which would run ffmpeg on a Google bucket upload. I have been playing with a script based on https://kpetrovi.ch/2017/11/02/transcoding-videos-with-ffmpeg-in-google-cloud-functions.html
The original script needs a little tuning as the library has evolved a bit. My current version is here:
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
const ffmpeg = require('fluent-ffmpeg');
const ffmpeg_static = require('ffmpeg-static');

console.log("Linking ffmpeg path to:", ffmpeg_static);
ffmpeg.setFfmpegPath(ffmpeg_static);

exports.transcodeVideo = (event, callback) => {
    const bucket = storage.bucket(event.bucket);
    console.log(event);
    if (event.name.indexOf('uploads/') === -1) {
        console.log("File " + event.name + " is not to be processed.");
        return;
    }

    // ensure that you only proceed if the file is newly created
    if (event.metageneration !== '1') {
        callback();
        return;
    }

    // Open write stream to new bucket, modify the filename as needed.
    const targetName = event.name.replace("uploads/", "").replace(/[.][a-z0-9]+$/, "");
    console.log("Target name will be: " + targetName);
    const remoteWriteStream = bucket.file("processed/" + targetName + ".mp4")
        .createWriteStream({
            metadata: {
                //metadata: event.metadata, // You may not need this, my uploads have associated metadata
                contentType: 'video/mp4', // This could be whatever else you are transcoding to
            },
        });

    // Open read stream to our uploaded file
    const remoteReadStream = bucket.file(event.name).createReadStream();

    // Transcode
    ffmpeg()
        .input(remoteReadStream)
        .outputOptions('-c:v copy') // Change these options to whatever suits your needs
        .outputOptions('-c:a aac')
        .outputOptions('-b:a 160k')
        .outputOptions('-f mp4')
        .outputOptions('-preset fast')
        .outputOptions('-movflags frag_keyframe+empty_moov')
        // https://github.com/fluent-ffmpeg/node-fluent-ffmpeg/issues/346#issuecomment-67299526
        .on('start', (cmdLine) => {
            console.log('Started ffmpeg with command:', cmdLine);
        })
        .on('end', () => {
            console.log('Successfully re-encoded video.');
            callback();
        })
        .on('error', (err, stdout, stderr) => {
            console.error('An error occurred during encoding', err.message);
            console.error('stdout:', stdout);
            console.error('stderr:', stderr);
            callback(err);
        })
        .pipe(remoteWriteStream, { end: true }); // end: true, emit end event when readable stream ends
};
This version correctly runs and I can see this in logs:
2020-06-16 21:24:22.606 Function execution took 912 ms, finished with status: 'ok'
2020-06-16 21:24:52.902 Started ffmpeg with command: ffmpeg -i pipe:0 -c:v copy -c:a aac -b:a 160k -f mp4 -preset fast -movflags frag_keyframe+empty_moov pipe:1
It seems the function execution ends before the actual ffmpeg command, which then never finishes.
Is there a way to make the ffmpeg "synchronous" or "blocking" so that it finishes before the function execution?
From the Google Cloud documentation, it seems the function should accept three arguments: (data, context, callback). Have you tried this, or do you know that context is optional? From the docs it seems that if the function accepts three arguments it is treated as a background function; if it accepts only two arguments, it is treated as a background function only if it returns a Promise.
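For illustration only (this is a hypothetical shape based on that reading of the docs, not code from the question), the three-argument background-function form would look like:
exports.transcodeVideo = (data, context, callback) => {
    // ... do the asynchronous work ...
    // signal completion explicitly:
    callback();        // or callback(err) on failure
};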
Beyond that, a few other points:
1: Here no callback function is called; if in your tests your function exited after that log line, that is another hint that calling the second argument as a callback is a required step to make the process finish:
if (event.name.indexOf('uploads/') === -1) {
    console.log("File " + event.name + " is not to be processed.");
    return;
}
2: I would suggest adding a few more console.log calls (or many more, if you prefer) to clarify the flow: in your question you pasted only one application log line, which is not very helpful, especially as it appears after the system log line.
3: The link you used as a tutorial is almost three years old; it could be that Google Cloud has changed its interface in the meantime.
That said, if accepting three arguments rather than only two doesn't solve your problem, you can try turning your function into a Promise:
exports.transcodeVideo = (event, callback) => new Promise((resolve, reject) => {
    const bucket = storage.bucket(event.bucket);
    console.log(event);
    if (event.name.indexOf('uploads/') === -1) {
        console.log("File " + event.name + " is not to be processed.");
        return resolve(); // or reject if this is an error case
    }

    // ensure that you only proceed if the file is newly created
    if (event.metageneration !== '1') {
        return resolve(); // or reject if this is an error case
    }

    // Open write stream to new bucket, modify the filename as needed.
    const targetName = event.name.replace("uploads/", "").replace(/[.][a-z0-9]+$/, "");
    console.log("Target name will be: " + targetName);
    const remoteWriteStream = bucket.file("processed/" + targetName + ".mp4")
        .createWriteStream({
            metadata: {
                //metadata: event.metadata, // You may not need this, my uploads have associated metadata
                contentType: 'video/mp4', // This could be whatever else you are transcoding to
            },
        });

    // Open read stream to our uploaded file
    const remoteReadStream = bucket.file(event.name).createReadStream();

    // Transcode
    ffmpeg()
        .input(remoteReadStream)
        .outputOptions('-c:v copy') // Change these options to whatever suits your needs
        .outputOptions('-c:a aac')
        .outputOptions('-b:a 160k')
        .outputOptions('-f mp4')
        .outputOptions('-preset fast')
        .outputOptions('-movflags frag_keyframe+empty_moov')
        // https://github.com/fluent-ffmpeg/node-fluent-ffmpeg/issues/346#issuecomment-67299526
        .on('start', (cmdLine) => {
            console.log('Started ffmpeg with command:', cmdLine);
        })
        .on('end', () => {
            console.log('Successfully re-encoded video.');
            resolve();
        })
        .on('error', (err, stdout, stderr) => {
            console.error('An error occurred during encoding', err.message);
            console.error('stdout:', stdout);
            console.error('stderr:', stderr);
            reject(err);
        })
        .pipe(remoteWriteStream, { end: true }); // end: true, emit end event when readable stream ends
});
Hope this helps.

How do you write to the file system of an aws lambda instance?

I am unsuccessfully trying to write to the file system of an aws lambda instance. The docs say that a standard lambda instance has 512mb of space available at /tmp/. However the following code that runs on my local machine isn't working at all on the lambda instance:
var fs = require('fs');
fs.writeFile("/tmp/test.txt", "testing", function(err) {
    if (err) {
        return console.log(err);
    }
    console.log("The file was saved!");
});
The code in the anonymous callback function is never getting called on the lambda instance. Anyone had any success doing this? Thanks so much for your help.
It's possible that this is a related question. Is it possible that there is some kind of conflict going on between the s3 code and what I'm trying to do with the fs callback function? The code below is what's currently being run.
console.log('Loading function');

var aws = require('aws-sdk');
var s3 = new aws.S3({ apiVersion: '2006-03-01' });
var fs = require('fs');

exports.handler = function(event, context) {
    //console.log('Received event:', JSON.stringify(event, null, 2));

    // Get the object from the event and show its content type
    var bucket = event.Records[0].s3.bucket.name;
    var key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    var params = {
        Bucket: bucket,
        Key: key
    };
    s3.getObject(params, function(err, data) {
        if (err) {
            console.log(err);
            var message = "Error getting object " + key + " from bucket " + bucket +
                ". Make sure they exist and your bucket is in the same region as this function.";
            console.log(message);
            context.fail(message);
        } else {
            //console.log("DATA: " + data.Body.toString());
            fs.writeFile("/tmp/test.csv", "testing", function (err) {
                if (err) {
                    context.failed("writeToTmp Failed " + err);
                } else {
                    context.succeed("writeFile succeeded");
                }
            });
        }
    });
};
Modifying your code into the Lambda template worked for me. I think you need to assign a function to exports.handler and call the appropriate context.succeed() or context.fail() method. Otherwise, you just get generic errors.
var fs = require("fs");
exports.handler = function(event, context) {
fs.writeFile("/tmp/test.txt", "testing", function (err) {
if (err) {
context.fail("writeFile failed: " + err);
} else {
context.succeed("writeFile succeeded");
}
});
};
So the answer lies in the context.fail() and context.succeed() functions. Being completely new to the world of AWS and Lambda, I was ignorant of the fact that calling either of these methods stops execution of the Lambda instance.
According to the docs:
The context.succeed() method signals successful execution and returns a string.
By eliminating these calls and only making them after I had run all the code that I wanted, everything worked well.
I ran into this, and it seems like AWS Lambda may be using an older (or modified) version of fs. I figured this out by logging the response from fs.writeFile and noticed it wasn't a promise.
To get around this, I wrapped the call in a promise:
var promise = new Promise(function(resolve, reject) {
    fs.writeFile('/tmp/test.txt', 'testing', function (err) {
        if (err) {
            reject(err);
        } else {
            resolve();
        }
    });
});
Hopefully this helps someone else.
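A minimal sketch of how that promise might drive the handler's result, assuming the same classic context API used elsewhere in this question:
var fs = require('fs');

exports.handler = function (event, context) {
    var promise = new Promise(function (resolve, reject) {
        fs.writeFile('/tmp/test.txt', 'testing', function (err) {
            if (err) reject(err);
            else resolve();
        });
    });

    // Only signal completion once the write has actually finished.
    promise
        .then(function () { context.succeed('writeFile succeeded'); })
        .catch(function (err) { context.fail('writeFile failed: ' + err); });
};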

Node.js file write in loop fails randomly

Here is my code :
function aCallbackInLoop(dataArray) {
    dataArray.forEach(function (item, index) {
        fs.appendFile(fileName, JSON.stringify(item) + "\r\n", function (err) {
            if (err) {
                console.log('Error writing data ' + err);
            } else {
                console.log('Data written');
            }
        });
    });
}
I get random errors:
Data written
Data written
.
.
Error writing data Error: UNKNOWN, open 'output/mydata.json'
Error writing data Error: UNKNOWN, open 'output/mydata.json'
.
.
Data written
Error writing data Error: UNKNOWN, open 'output/mydata.json'
The function (aCallbackInLoop) is a callback for a web-service request, which returns chunks of data in dataArray. Multiple web-service requests are being made in a loop, so this callback is perhaps being called in parallel. I doubt it's some file lock issue, but I am not sure how to resolve it.
PS: I have made sure it's not a data issue (I am logging all items in dataArray)
Edit: Code after trying a write stream:
function writeDataToFile(fileName, data) {
    try {
        var wStream = fs.createWriteStream(fileName);
        wStream.write(JSON.stringify(data) + "\r\n");
        wStream.end();
    } catch (err) {
        console.log(err.message);
    }
}

function aCallbackInLoop(dataArray) {
    dataArray.forEach(function(item, index) {
        writeDataToFile(filename, item); //filename is global var
    });
}
As you have observed, the later appendFile calls fail because earlier appendFile calls on the same file are still in progress. In this particular case, it would be better to create a write stream.
var wstream = fs.createWriteStream(fileName);
dataArray.forEach(function (item) {
    wstream.write(JSON.stringify(item) + "\r\n");
});
wstream.end();
If you want to know when all the data has been written, you can register a function with the finish event, like this:
var wstream = fs.createWriteStream(fileName);

wstream.on("finish", function() {
    // Writing to the file is actually complete.
});

dataArray.forEach(function (item) {
    wstream.write(JSON.stringify(item) + "\r\n");
});

wstream.end();
Try using the synchronous version of appendFile - https://nodejs.org/api/fs.html#fs_fs_appendfilesync_filename_data_options
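For example, a minimal sketch of the same loop with fs.appendFileSync (it blocks the event loop, so it's only reasonable for small batches; fileName is the same variable as in the question):
var fs = require('fs');

function aCallbackInLoop(dataArray) {
    dataArray.forEach(function (item) {
        // Each append finishes before the next one starts, so there are
        // no overlapping opens of the same file.
        fs.appendFileSync(fileName, JSON.stringify(item) + "\r\n");
    });
}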

Crypto module - Node.js

What is the simplest way to compare a hash of a file without storing it in a database?
For example:
var filename = __dirname + '/../public/index.html';
var shasum = crypto.createHash('sha1');

var s = fs.ReadStream(filename);
s.on('data', function(d) {
    shasum.update(d);
});
s.on('end', function() {
    var d = shasum.digest('hex');
    console.log(d + ' ' + filename);

    fs.writeFile(__dirname + "/../public/log.txt", d.toString() + '\n', function(err) {
        if (err) {
            console.log(err);
        } else {
            console.log("The file was saved!");
        }
    });
});
The above code returns the hash of the HTML file. If I edit the file, how can I know whether it has changed? In other words, how can I know whether the hash has changed?
Any suggestions?
Edited
Now the hash is being saved in the log file. How can I retrieve the hash from the file and match it against the newly generated one? A code example would be awesome to give me a better understanding.
There is no difference from this question, but it still isn't clear to me how to implement it.
If you're looking for changes on a file, then you can use one of Node's filesystem functions, fs.watch. This is how it's used:
fs.watch(filename, function (event, filename) {
    //event is either 'rename' or 'change'
    //filename is the name of the file which triggered the event
});
The watch function is however not very consistent, so you can use fs.watchFile as an alternative. fs.watchFile uses stat polling, so it's quite a bit slower than fs.watch, which detects file changes instantly.
Watching a file will return an instance of fs.FSWatcher, which has the events change and error. Calling .close will stop watching for changes on the file.
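A minimal sketch of both variants (the file name and polling interval here are just placeholders):
var fs = require('fs');

// fs.watch: event-based, returns an fs.FSWatcher
var watcher = fs.watch('log.txt', function (event, name) {
    console.log(event + ' detected on ' + name);
});
// later, stop watching:
watcher.close();

// fs.watchFile: stat polling, slower but more consistent across platforms
fs.watchFile('log.txt', { interval: 1000 }, function (curr, prev) {
    if (curr.mtime.getTime() !== prev.mtime.getTime()) {
        console.log('log.txt changed');
    }
});
// later, stop polling:
fs.unwatchFile('log.txt');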
Here's an example relating to your code:
var filename = __dirname + '/../public/index.html';
var shasum = crypto.createHash('sha1');
var oldhash = null;

var s = fs.ReadStream(filename);
s.on('data', function(d) {
    shasum.update(d);
});
s.on('end', function() {
    var d = shasum.digest('hex');
    console.log(d + ' ' + filename);

    oldhash = d.toString();
    fs.writeFile(__dirname + "/../public/log.txt", d.toString() + '\n', function(err) {
        if (err) {
            console.log(err);
        } else {
            console.log("The file was saved!");
        }
    });
});

//watch the log for changes
fs.watch(__dirname + "/../public/log.txt", function (event, filename) {
    //read the log contents
    fs.readFile(__dirname + "/../public/log.txt", 'utf8', function (err, data) {
        //compare the file contents with the old hash (trim the trailing newline)
        if (data.trim() === oldhash) {
            //do something
        }
    });
});
What's the difference between this question and the previous one you asked? If you don't want to store it in a database, then store it as a file. If you want to save the hashes for multiple files, you could put them in a JSON object and write them out as a .json file so they're easy to read/write; see the sketch below.
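A minimal sketch of that JSON approach (the store path and helper names here are just placeholders):
var fs = require('fs');
var HASH_STORE = __dirname + '/hashes.json';

function loadHashes() {
    try {
        return JSON.parse(fs.readFileSync(HASH_STORE, 'utf8'));
    } catch (e) {
        return {}; // first run: no store yet
    }
}

function saveHash(name, hash) {
    var hashes = loadHashes();
    hashes[name] = hash;
    fs.writeFileSync(HASH_STORE, JSON.stringify(hashes, null, 2));
}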
EDIT
Given what you added to your question, it should be pretty simple. You might write a function to do the check and re-write:
function updateHash (name, html, callback) {
    var sha = crypto.createHash('sha1');
    sha.update(html);
    var newHash = sha.digest('hex');

    var hashFileName = name + '.sha';
    fs.readFile(hashFileName, 'utf8', function (err, oldHash) {
        var changed = true;
        if (err)
            console.log(err); // probably indicates the file doesn't exist, but you should consider doing better error handling
        if (oldHash === newHash)
            changed = false;

        fs.writeFile(hashFileName, newHash, { encoding: 'utf8' }, function (err) {
            callback(err, changed);
        });
    });
}

updateHash('index.html', "<html><head><title>...", function (err, isChanged) {
    // do something with this information ?
    console.log(isChanged);
});

Memory Leak with socket.io + node.js

I appear to have a memory leak with my Node.js application. I built it quickly, and my JavaScript isn't too strong, so this might be easy.
I've done some heap dumps on it, and it's the String object? leaking memory, at the rate of about 1MB every 5 minutes. I expanded String, and it's actually String.Array?
Heap stack:
#!/usr/local/bin/node
var port = 8081;

var io = require('socket.io').listen(port),
    sys = require('sys'),
    daemon = require('daemon'),
    mysql = require('mysql-libmysqlclient');

var updateq = "SELECT 1=1";
var countq = "SELECT 2=2";

io.set('log level', 2);

process.on('uncaughtException', function(err) {
    console.log(err);
});

var connections = 0;
var conn = mysql.createConnectionSync();
dbconnect();

io.sockets.on('connection', function(client) {
    connections++;
    client.on('disconnect', function() { connections--; })
});

process.on('exit', function () {
    console.log('Exiting');
    dbdisconnect();
});

function dbdisconnect() {
    conn.closeSync();
}

function dbconnect() {
    conn.connectSync('leet.hacker.org','user','password');
}

function update() {
    if (connections == 0)
        return;
    conn.query(updateq, function (err, res) {
        if (err) {
            dbdisconnect();
            dbconnect();
            return;
        }
        res.fetchAll(function (err, rows) {
            if (err) {
                throw err;
            }
            io.sockets.json.send(rows);
        });
    });
}

function totals() {
    if (connections == 0)
        return;
    conn.query(countq, function (err, res) {
        if (err) {
            // Chances are that the server has just disconnected, lets try reconnecting
            dbdisconnect();
            dbconnect();
            throw err;
        }
        res.fetchAll(function (err, rows) {
            if (err) {
                throw err;
            }
            io.sockets.json.send(rows);
        });
    });
}

setInterval(update, 250);
setInterval(totals, 1000);

setInterval(function() {
    console.log("Number of connections: " + connections);
}, 1800000);

daemon.daemonize('/var/log/epiclog.log', '/var/run/mything.pid', function (err, pid) {
    // We are now in the daemon process
    if (err) return sys.puts('Error starting daemon: ' + err);
    sys.puts('Daemon started successfully with pid: ' + pid);
});
Current version
function totals() {
    if (connections > 0)
    {
        var q = "SELECT query FROM table";
        db.query(q, function (err, results, fields) {
            if (err) {
                console.error(err);
                return false;
            }

            for (var row in results)
            {
                io.sockets.send("{ ID: '" + results[row].ID + "', event: '" + results[row].event + "', free: '" + results[row].free + "', total: '" + results[row].total + "', state: '" + results[row]$
                row = null;
            }

            results = null;
            fields = null;
            err = null;
            q = null;
        });
    }
}
Still leaking memory, but seemingly only under these conditions:
From startup, with no clients -> Fine
1st client connection -> Fine
2nd client (even with the 1st client disconnecting and reconnecting) -> Leaking memory
Stop all connections -> Fine
1 new connection (connections = 1) -> Leaking memory
Do yourself a favour and use node-mysql; it's a pure JavaScript MySQL client and it's fast. Other than that, you should be using asynchronous code to stop I/O from blocking while you're working. The async library will help you here; it provides waterfall callback passing, among other things.
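A minimal sketch of the node-mysql (npm package "mysql") flow, with placeholder connection details:
var mysql = require('mysql');

var db = mysql.createConnection({
    host: 'localhost',      // placeholder credentials
    user: 'user',
    password: 'password',
    database: 'mydb'
});
db.connect();

db.query('SELECT 1 + 1 AS two', function (err, results, fields) {
    if (err) {
        console.error(err);
        return;
    }
    console.log(results[0].two); // 2
});

db.end(); // queued behind the query above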
As for your memory leak, it probably isn't socket.io. Although I haven't used it in a few months, I have had many thousands of concurrent connections without leaking memory, and my code wasn't the best either.
Two things, however. Firstly, your code is fairly unreadable. I suggest looking into properly formatting your code (I use two spaces for each indentation level, but some people use four). Secondly, printing the number of connections every half an hour seems a little silly, when you could do something like:
setInterval(function() {
process.stdout.write('Current connections: ' + connections + ' \r');
}, 1000);
The \r causes the cursor to return to the start of the line and overwrite the characters there, which replaces the line instead of creating a huge amount of scrollback. This will help with debugging if you choose to put debugging details in your logging.
You can also use process.memoryUsage() for quickly checking the memory usage (or how much node thinks you're using).
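For example, something like this alongside the connection counter (the interval is arbitrary):
setInterval(function () {
    var mem = process.memoryUsage();
    console.log('heapUsed: ' + Math.round(mem.heapUsed / 1048576) + ' MB, ' +
                'rss: ' + Math.round(mem.rss / 1048576) + ' MB');
}, 60000);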
Could this be related to the connected clients array not clearing properly when a client disconnects? The array value gets set to NULL rather than being dropped from the array.
