Scenario:
I receive audio bytes as an array and need to play them in order. The data arrives continuously, so I use an observable to receive it. Since receiving the data takes less time than playing it, I want to use a buffer to store the bytes while audio is being played, and only play them afterwards.
My code:
Observable subscription:
audioObservable.pipe(
  buffer(this.audioDoneSubject), // problematic line
  map(matrix => { // since buffer creates an array that I want to flatten
    let a = [];
    matrix.forEach(ar => {
      a = a.concat(ar);
    });
    return a;
  })
)
.subscribe(arr => this.playAsync(arr, 1, 44100));
and the playAsync method is:
async playAsync(arr: number[], channelCount: number, sampleRate) {
  const context = new AudioContext();
  const buffer = context.createBuffer(channelCount, arr.length, sampleRate);
  const floatArray = new Float32Array(arr);
  buffer.copyToChannel(floatArray, 0);
  const source = context.createBufferSource();
  source.buffer = buffer;
  source.connect(context.destination);
  source.start();
  source.onended = (event) => this.audioDoneSubject.next(event);
}
My problem is twofold:
1. The flow stops at the buffer, since audioDoneSubject has never emitted, because we have not yet played anything.
I can work around this by using a ReplaySubject and calling this.audioDoneSubject.next() once after the first emission to my audioObservable and the subscription. But this brings me to the second issue.
2. When audio is finished, this.audioDoneSubject emits, so the buffer releases even if no new data has come in. If more data has come in, it works fine: it plays the data and the flow continues. However, if no new data has come in, then playAsync gets called with an empty array and throws an exception because it cannot properly create the audio buffer.
How can I tell the pipe buffer to only buffer if I am currently playing audio, and if I'm not, if the audio resource is free, to just skip to the subscription?
This Post Here shows how to write a bufferIf pipe that can be used to accomplish this
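For illustration, here is a minimal sketch of what such an operator could look like (my own sketch, not the code from that post; bufferIf and condition$ are assumed names, and condition$ would have to be derived from your play-start/play-done events). It collects source values while condition$ last emitted true (audio playing) and emits immediately while it is false (resource free):

import { Observable } from 'rxjs';

// Sketch of a bufferIf operator: while `condition$` last emitted true, source
// values are collected; when it flips to false, everything collected is flushed
// as a single array, and later values pass straight through (also as arrays).
function bufferIf(condition$) {
  return (source$) =>
    new Observable((subscriber) => {
      let buffering = false;
      let buffered = [];

      const condSub = condition$.subscribe((shouldBuffer) => {
        buffering = shouldBuffer;
        if (!shouldBuffer && buffered.length) {
          subscriber.next(buffered); // flush what was collected while playing
          buffered = [];
        }
      });

      const srcSub = source$.subscribe({
        next: (value) => {
          if (buffering) {
            buffered.push(value); // audio still playing: hold on to the chunk
          } else {
            subscriber.next([value]); // resource free: emit right away
          }
        },
        error: (err) => subscriber.error(err),
        complete: () => subscriber.complete(),
      });

      return () => {
        condSub.unsubscribe();
        srcSub.unsubscribe();
      };
    });
}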
Short backstory: I am trying to create a Readable stream based on data chunks that are emitted back to my server from the client side with WebSockets. Here's a class I've created to "simulate" that behavior:
class DataEmitter extends EventEmitter {
  constructor() {
    super();

    const data = ['foo', 'bar', 'baz', 'hello', 'world', 'abc', '123'];

    // Every second, emit an event with a chunk of data
    const interval = setInterval(() => {
      this.emit('chunk', data.splice(0, 1)[0]);

      // Once there are no more items, emit an event
      // notifying that that is the case
      if (!data.length) {
        this.emit('done');
        clearInterval(interval);
      }
    }, 1e3);
  }
}
In this post, the dataEmitter in question will have been created like this.
// Our data is being emitted through events in chunks from some place.
// This is just to simulate that. We cannot change the flow - only listen
// for the events and do something with the chunks.
const dataEmitter = new DataEmitter();
Right, so I initially tried this:
const readable = new Readable();
dataEmitter.on('chunk', (data) => {
  readable.push(data);
});

dataEmitter.once('done', () => {
  readable.push(null);
});
But that results in this error:
Error [ERR_METHOD_NOT_IMPLEMENTED]: The _read() method is not implemented
So I did this, implementing read() as an empty function:
const readable = new Readable({
  read() {},
});

dataEmitter.on('chunk', (data) => {
  readable.push(data);
});

dataEmitter.once('done', () => {
  readable.push(null);
});
And it works when piping into a write stream, or sending the stream to my test API server. The resulting .txt file looks exactly as it should:
foobarbazhelloworldabc123
However, I feel like there's something quite wrong and hacky with my solution. I attempted to put the listener registration logic (.on('chunk', ...) and .once('done', ...)) within the read() implementation; however, read() seems to get called multiple times, and that results in the listeners being registered multiple times.
The Node.js documentation says this about the _read() method:
When readable._read() is called, if data is available from the resource, the implementation should begin pushing that data into the read queue using the this.push(dataChunk) method. _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. _read() may continue reading from the resource and pushing data until readable.push() returns false. Only when _read() is called again after it has stopped should it resume pushing additional data into the queue.
After dissecting this, it seems that the consumer of the stream calls upon .read() when it's ready to read more data. And when it is called, data should be pushed into the stream. But, if it is not called, the stream should not have data pushed into it until the method is called again (???). So wait, does the consumer call .read() when it is ready for more data, or does it call it after each time .push() is called? Or both?? The docs seem to contradict themselves.
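To make that contract concrete, here is a minimal pull-based Readable (an illustrative counter, unrelated to my WebSocket case): _read() pushes until push() returns false, then simply returns and waits to be called again.

const { Readable } = require('stream');

// Generates the numbers 0..max on demand: _read() pushes until push() returns
// false (internal buffer full), then returns and waits for the next _read() call.
class Counter extends Readable {
  constructor(max) {
    super();
    this.current = 0;
    this.max = max;
  }

  _read() {
    while (this.current <= this.max) {
      if (!this.push(`${this.current++}\n`)) {
        return; // back off; the consumer will trigger _read() again when ready
      }
    }
    this.push(null); // no more data
  }
}

new Counter(5).pipe(process.stdout);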
Implementing .read() on Readable is straightforward when you've got a basic resource to stream, but what would be the proper way of implementing it in this case?
And also, would someone be able to explain in better terms what the .read() method is on a deeper level, and how it should be implemented?
Thanks!
Response to the answer:
I did try registering the listeners within the read() implementation, but because it is called multiple times by the consumer, it registers the listeners multiple times.
Observing this code:
const readable = new Readable({
  read() {
    console.log('called');

    dataEmitter.on('chunk', (data) => {
      readable.push(data);
    });

    dataEmitter.once('done', () => {
      readable.push(null);
    });
  },
});
readable.pipe(createWriteStream('./data.txt'));
The resulting file looks like this:
foobarbarbazbazbazhellohellohellohelloworldworldworldworldworldabcabcabcabcabcabc123123123123123123123
Which makes sense, because the listeners are being registered multiple times.
It seems like the only purpose of actually implementing the read() method is to delay receiving the chunks and pushing them into the stream until the consumer is ready for them.
Based on these conclusions, I've come up with this solution.
class MyReadable extends Readable {
  // Keep track of whether or not the listeners have already
  // been added to the data emitter.
  #registered = false;

  _read() {
    // If the listeners have already been registered, do
    // absolutely nothing.
    if (this.#registered) return;

    // "Notify" the client via websockets that we're ready
    // to start streaming the data chunks.
    const emitter = new DataEmitter();

    const handler = (chunk: string) => {
      this.push(chunk);
    };
    emitter.on('chunk', handler);

    emitter.once('done', () => {
      this.push(null);
      // Clean up the listener once it's done (this is
      // assuming the #emitter object will still be used
      // in the future).
      emitter.off('chunk', handler);
    });

    // Mark the listeners as registered.
    this.#registered = true;
  }
}
const readable = new MyReadable();
readable.pipe(createWriteStream('./data.txt'));
But this implementation doesn't allow for the consumer to control when things are pushed. I guess, however, in order to achieve that sort of control, you'd need to communicate with the resource emitting the chunks to tell it to stop until the read() method is called again.
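As a rough sketch of that idea (the 'pause' and 'resume' signals here are hypothetical stand-ins for whatever message you can actually send back over the WebSocket):

const { Readable } = require('stream');

class ControlledReadable extends Readable {
  constructor(emitter) {
    super();
    this.emitter = emitter;
    this.emitter.on('chunk', (chunk) => {
      // If the internal buffer is full, ask the source to stop sending.
      if (!this.push(chunk)) {
        this.emitter.emit('pause'); // hypothetical "stop sending" signal
      }
    });
    this.emitter.once('done', () => this.push(null));
  }

  _read() {
    // The consumer is ready for more data: ask the source to resume.
    this.emitter.emit('resume'); // hypothetical "keep sending" signal
  }
}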
I'm trying to code a writable stream which takes a stream of objects and inserts them into a MongoDB database. Before consuming the stream of objects, I first need to wait for the DB connection to be established, but I seem to be doing something wrong, because the program never gets to the insertion part.
// ./mongowriter.js
let mongo = mongodb.MongoClient,
    connectToDb = _.wrapCallback(mongo.connect);

export default url => _.pipeline(s => {
  return connectToDb(url).flatMap(db => {
    console.log('Connection established!');
    return s.flatMap(x => /* insert x into db */);
  });
});
....
// Usage in other file
import mongowriter from './mongowriter.js';
let objStream = _([/* json objects */]);
objStream.pipe(mongoWriter);
The program just quits without "Connection established!" ever being written to the console.
What am I missing? Is there some kind of idiom I should be following?
By reading the source and some general experimentation, I figured out how to do a single asynchronous thing and then continue processing through the stream. Basically, you use flatMap to replace the event from the asynchronous task with the stream you actually want to process.
Another quirk I didn't expect, and which was throwing me off, was that _.pipeline won't work unless the original stream is fully consumed in the callback. That's why it won't work to simply put in a _.map and log stuff (which was how I tried to debug it). Instead, one needs to make sure there is an each or done at the end. Below is a minimal example:
export default _ => _.pipeline(stream => {
  return _(promiseReturningFunction())
    .tap(_ => process.stdout.write('.'))
    .flatMap(_ => stream)
    .each(_ => process.stdout.write('-'));
});
// Will produce something like the following when called with a non-empty stream.
// Note the lone '.' in the beginning.
// => .-------------------
Basically, a '.' is output when the async function is done, and a '-' for every object of the stream.
Hopefully, this saves someone some time. Took embarrassingly long for me to figure this out. ^^
## Intro
These are my first adventures in writing the node.js server side. It's been
fun so far but I'm having some difficulty understanding the proper way
to implement something as it relates to node.js streams.
### Problem
For test and learning purposes I'm working with large files whose
content is zlib compressed. The compressed content is binary data, each
packet being 38 bytes in length. I'm trying to create a resulting file
that looks almost identical to the original file except that there is an
uncompressed 31-byte header for every 1024 38-byte packets.
### Original file content (decompressed)
+----------+----------+----------+----------+
| packet 1 | packet 2 | ...... | packet N |
| 38 bytes | 38 bytes | ...... | 38 bytes |
+----------+----------+----------+----------+
### Resulting file content
+----------+--------------------------------+----------+--------------------------------+
| header 1 | 1024 38 byte packets | header 2 | 1024 38 byte packets |
| 31 bytes | zlib compressed | 31 bytes | zlib compressed |
+----------+--------------------------------+----------+--------------------------------+
As you can see, it's somewhat of a translation problem. This means, I'm
taking some source stream as input and then slightly transforming it
into some output stream. Therefore, it felt natural to implement a
Transform stream.
The class simply attempts to accomplish the following:
1. Takes a stream as input.
2. zlib inflates the chunks of data to count the number of packets, puts together 1024 of them, zlib deflates them, and prepends a header.
3. Passes the new resulting chunk on through the pipeline via this.push(chunk).
A use case would be something like:
var fs = require('fs');
var me = require('./me'); // Where my Transform stream code sits
var inp = fs.createReadStream('depth_1000000');
var out = fs.createWriteStream('depth_1000000.out');
inp.pipe(me.createMyTranslate()).pipe(out);
### Question(s)
Assuming Transform is a good choice for this use case, I seem to be
running into a possible back-pressure issue. My call to this.push(chunk)
within _transform keeps returning false. Why would this be, and how should I
handle such things?
This question from 2013 is all I was able to find on how to deal with "back pressure"
when creating node Transform streams.
From the node 7.10.0 Transform stream and Readable stream documentation what I gathered
was that once push returned false, nothing else should be pushed until _read was
called.
The Transform documentation doesn't mention _read except to mention that the base Transform
class implements it (and _write). I found the information about push returning false
and _read being called in the Readable stream documentation.
The only other authoritative comment I found on Transform back pressure only mentioned
it as an issue, and that was in a comment at the top of the node file _stream_transform.js.
Here's the section about back pressure from that comment:
// This way, back-pressure is actually determined by the reading side,
// since _read has to be called to start processing a new chunk. However,
// a pathological inflate type of transform can cause excessive buffering
// here. For example, imagine a stream where every byte of input is
// interpreted as an integer from 0-255, and then results in that many
// bytes of output. Writing the 4 bytes {ff,ff,ff,ff} would result in
// 1kb of data being output. In this case, you could write a very small
// amount of input, and end up with a very large amount of output. In
// such a pathological inflating mechanism, there'd be no way to tell
// the system to stop doing the transform. A single 4MB write could
// cause the system to run out of memory.
//
// However, even in such a pathological case, only a single written chunk
// would be consumed, and then the rest would wait (un-transformed) until
// the results of the previous transformed chunk were consumed.
Solution example
Here's the solution I pieced together to handle the back pressure in a Transform stream
which I'm pretty sure works. (I haven't written any real tests, which would require
writing a Writable stream to control the back pressure.)
This is a rudimentary line transform which still needs work as a line transform, but it does
demonstrate handling the "back pressure".
const stream = require('stream');

class LineTransform extends stream.Transform
{
    constructor(options)
    {
        super(options);

        this._lastLine = "";
        this._continueTransform = null;
        this._transforming = false;
        this._debugTransformCallCount = 0;
    }
    _transform(chunk, encoding, callback)
    {
        if (encoding === "buffer")
            return callback(new Error("Buffer chunks not supported"));

        if (this._continueTransform !== null)
            return callback(new Error("_transform called before previous transform has completed."));

        // DEBUG: Uncomment for debugging help to see what's going on
        //console.error(`${++this._debugTransformCallCount} _transform called:`);

        // Guard (so we don't call _continueTransform from _read while it is being
        // invoked from _transform)
        this._transforming = true;

        // Do our transforming (in this case splitting the big chunk into lines)
        let lines = (this._lastLine + chunk).split(/\r\n|\n/);
        this._lastLine = lines.pop();

        // In order to respond to "back pressure" create a function
        // that will push all of the lines stopping when push returns false,
        // and then resume where it left off when called again, only calling
        // the "callback" once all lines from this transform have been pushed.
        // Resuming (until done) will be done by _read().
        let nextLine = 0;
        this._continueTransform = () =>
        {
            while (nextLine < lines.length)
            {
                if (!this.push(lines[nextLine++] + "\n"))
                {
                    // we've got more to push, but we got backpressure so it has to
                    // wait until _read() calls this function again.
                    return;
                }
            }

            // DEBUG: Uncomment for debugging help to see what's going on
            //console.error(`_continueTransform ${this._debugTransformCallCount} finished\n`);

            // All lines are pushed, remove this function from the LineTransform instance
            this._continueTransform = null;
            return callback();
        };
        // Start pushing the lines
        this._continueTransform();

        // Turn off guard allowing _read to continue the transform pushes if needed.
        this._transforming = false;
    }

    _flush(callback)
    {
        if (this._lastLine.length > 0)
        {
            this.push(this._lastLine);
            this._lastLine = "";
        }

        return callback();
    }

    _read(size)
    {
        // DEBUG: Uncomment for debugging help to see what's going on
        //if (this._transforming)
        //    console.error(`_read called during _transform ${this._debugTransformCallCount}`);

        // If a transform has not pushed every line yet, continue that transform
        // otherwise just let the base class implementation do its thing.
        if (!this._transforming && this._continueTransform !== null)
            this._continueTransform();
        else
            super._read(size);
    }
}
I tested the above by running it with the DEBUG lines uncommented on a ~10000 line
~200KB file. Redirect stdout or stderr to a file (or both) to separate the debugging
statements from the expected output. (node test.js > out.log 2> err.log)
const fs = require('fs');
let inStrm = fs.createReadStream("testdata/largefile.txt", { encoding: "utf8" });
let lineStrm = new LineTransform({ encoding: "utf8", decodeStrings: false });
inStrm.pipe(lineStrm).pipe(process.stdout);
Helpful debugging hint
While writing this initially I didn't realize that _read could be called before
_transform returned, so I hadn't implemented the this._transforming guard and I was
getting the following error:
Error: no writecb in Transform class
at afterTransform (_stream_transform.js:71:33)
at TransformState.afterTransform (_stream_transform.js:54:12)
at LineTransform._continueTransform (/userdata/mjl/Projects/personal/srt-shift/dist/textfilelines.js:44:13)
at LineTransform._transform (/userdata/mjl/Projects/personal/srt-shift/dist/textfilelines.js:46:21)
at LineTransform.Transform._read (_stream_transform.js:167:10)
at LineTransform._read (/userdata/mjl/Projects/personal/srt-shift/dist/textfilelines.js:56:15)
at LineTransform.Transform._write (_stream_transform.js:155:12)
at doWrite (_stream_writable.js:331:12)
at writeOrBuffer (_stream_writable.js:317:5)
at LineTransform.Writable.write (_stream_writable.js:243:11)
Looking at the node implementation I realized that this error meant that the callback
given to _transform was being called more than once. There wasn't much information
to be found about this error either so I thought I'd include what I figured out here.
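To illustrate, here is a minimal repro of my own (not from the project above): calling the _transform callback twice for the same chunk is what trips this check. On newer Node versions the message is different (typically an ERR_MULTIPLE_CALLBACK error), but the cause is the same.

const { Transform } = require('stream');

class DoubleCallback extends Transform {
  _transform(chunk, encoding, callback) {
    this.push(chunk);
    callback();
    callback(); // second call for the same chunk triggers the error
  }
}

const t = new DoubleCallback();
t.on('error', (err) => console.error(err.message));
t.end('hello');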
I think Transform is suitable for this, but I would perform the inflate as a separate step in the pipeline.
Here's a quick and largely untested example:
var zlib = require('zlib');
var stream = require('stream');

var transformer = new stream.Transform();

// Properties used to keep internal state of transformer.
transformer._buffers = [];
transformer._inputSize = 0;
transformer._targetSize = 1024 * 38;

// Dump one 'output packet'
transformer._dump = function(done) {
  // concatenate buffers and convert to binary string
  var buffer = Buffer.concat(this._buffers).toString('binary');

  // Take the first 1024 packets.
  var packetBuffer = buffer.substring(0, this._targetSize);

  // Keep the rest and reset counter (re-encode as 'binary' to preserve the bytes).
  this._buffers = [ new Buffer(buffer.substring(this._targetSize), 'binary') ];
  this._inputSize = this._buffers[0].length;

  // output header
  this.push('HELLO WORLD');

  // output compressed packet buffer
  zlib.deflate(packetBuffer, function(err, compressed) {
    // TODO: handle `err`
    this.push(compressed);
    if (done) {
      done();
    }
  }.bind(this));
};
// Main transformer logic: buffer chunks and dump them once the
// target size has been met.
transformer._transform = function(chunk, encoding, done) {
  this._buffers.push(chunk);
  this._inputSize += chunk.length;

  if (this._inputSize >= this._targetSize) {
    this._dump(done);
  } else {
    done();
  }
};

// Flush any remaining buffers (pass the callback through so the stream can finish).
transformer._flush = function(done) {
  this._dump(done);
};

// Example:
var fs = require('fs');

fs.createReadStream('depth_1000000')
  .pipe(zlib.createInflate())
  .pipe(transformer)
  .pipe(fs.createWriteStream('depth_1000000.out'));
push will return false if the stream you are writing to (in this case, a file output stream) has too much data buffered. Since you're writing to disk, this makes sense: you are processing data faster than you can write it out.
When out's buffer is full, your transform stream will fail to push, and start buffering data itself. If that buffer fills as well, then inp's buffer will start to fill. This is how things should be working. The piped streams are only going to process data as fast as the slowest link in the chain can handle it (once your buffers are full).
I ran into a similar problem lately, needing to handle backpressure in an inflating transform stream. The secret to handling push() returning false is to register and handle the 'drain' event on the stream:
_transform(data, enc, callback) {
  const continueTransforming = () => {
    // ... do some work / parse the data, keep state of where we're at etc
    if (!this.push(event))
      this._readableState.pipes.once('drain', continueTransforming); // will get called again when the reader can consume more data
    if (allDone)
      callback();
  }
  continueTransforming()
}
NOTE: this is a bit hacky, as we're reaching into the internals, and pipes can even be an array of Readables, but it does work in the common case of ....pipe(transform).pipe(...
Would be great if someone from the Node community can suggest a "correct" method for handling .push() returning false
I ended up following Ledion's example and created a utility Transform class which assists with backpressure. The utility adds an async method named addData, which the implementing Transform can await.
'use strict';

const { Transform } = require('stream');

/**
 * The BackPressureTransform class adds a utility method addData which
 * allows for pushing data to the Readable, while honoring back-pressure.
 */
class BackPressureTransform extends Transform {
  constructor(...args) {
    super(...args);
  }

  /**
   * Asynchronously add a chunk of data to the output, honoring back-pressure.
   *
   * @param {String} data
   *   The chunk of data to add to the output.
   *
   * @returns {Promise<void>}
   *   A Promise resolving after the data has been added.
   */
  async addData(data) {
    // if .push() returns false, it means that the readable buffer is full
    // when this occurs, we must wait for the internal readable to emit
    // the 'drain' event, signalling the readable is ready for more data
    if (!this.push(data)) {
      await new Promise((resolve, reject) => {
        const errorHandler = error => {
          this.emit('error', error);
          reject();
        };
        const boundErrorHandler = errorHandler.bind(this);

        this._readableState.pipes.on('error', boundErrorHandler);
        this._readableState.pipes.once('drain', () => {
          this._readableState.pipes.removeListener('error', boundErrorHandler);
          resolve();
        });
      });
    }
  }
}

module.exports = {
  BackPressureTransform
};
Using this utility class, my Transforms look like this now:
'use strict';

const { BackPressureTransform } = require('./back-pressure-transform');

/**
 * The Formatter class accepts the transformed row to be added to the output file.
 * The class provides generic support for formatting the result file.
 */
class Formatter extends BackPressureTransform {
  constructor() {
    super({
      encoding: 'utf8',
      readableObjectMode: false,
      writableObjectMode: true
    });

    this.anyObjectsWritten = false;
  }

  /**
   * Called when the data pipeline is complete.
   *
   * @param {Function} callback
   *   The function which is called when final processing is complete.
   *
   * @returns {Promise<void>}
   *   A Promise resolving after the flush completes.
   */
  async _flush(callback) {
    // if any object is added, close the surrounding array
    if (this.anyObjectsWritten) {
      await this.addData('\n]');
    }

    callback(null);
  }

  /**
   * Given the transformed row from the ETL, format it to the desired layout.
   *
   * @param {Object} sourceRow
   *   The transformed row from the ETL.
   *
   * @param {String} encoding
   *   Ignored in object mode.
   *
   * @param {Function} callback
   *   The callback function which is called when the formatting is complete.
   *
   * @returns {Promise<void>}
   *   A Promise resolving after the row is transformed.
   */
  async _transform(sourceRow, encoding, callback) {
    // before the first object is added, surround the data as an array
    // between each object, add a comma separator
    await this.addData(this.anyObjectsWritten ? ',\n' : '[\n');

    // update state
    this.anyObjectsWritten = true;

    // add the object to the output
    const parsed = JSON.stringify(sourceRow, null, 2).split('\n');
    for (const [index, row] of parsed.entries()) {
      // prepend the row with 2 additional spaces since we're inside a larger array
      await this.addData(`  ${row}`);

      // add line breaks except for the last row
      if (index < parsed.length - 1) {
        await this.addData('\n');
      }
    }

    callback(null);
  }
}

module.exports = {
  Formatter
};
Mike Lippert's answer is the closest to the truth, I think. It appears that waiting for a new _read() call to begin again from the reading stream is the only way that the Transform is actively notified that the reader is ready. I wanted to share a simple example of how I override _read() temporarily.
_transform(buf, enc, callback) {
  // prepend any unused data from the prior chunk.
  if (this.prev) {
    buf = Buffer.concat([ this.prev, buf ]);
    this.prev = null;
  }

  // will keep transforming until buf runs low on data.
  if (buf.length < this.requiredData) {
    this.prev = buf;
    return callback();
  }

  var result = // do something with data...
  var nextbuf = buf.slice(this.requiredData);

  if (this.push(result)) {
    // Continue transforming this chunk
    this._transform(nextbuf, enc, callback);
  }
  else {
    // Node is warning us to slow down (applying "backpressure")
    // Temporarily override _read request to continue the transform
    this._read = function() {
      delete this._read;
      this._transform(nextbuf, enc, callback);
    };
  }
}
I was trying to find the comment mentioned above in the source code for Transform, and since the reference link keeps changing, I will leave it here for reference:
// a transform stream is a readable/writable stream where you do
// something with the data. Sometimes it's called a "filter",
// but that's not a great name for it, since that implies a thing where
// some bits pass through, and others are simply ignored. (That would
// be a valid example of a transform, of course.)
//
// While the output is causally related to the input, it's not a
// necessarily symmetric or synchronous transformation. For example,
// a zlib stream might take multiple plain-text writes(), and then
// emit a single compressed chunk some time in the future.
//
// Here's how this works:
//
// The Transform stream has all the aspects of the readable and writable
// stream classes. When you write(chunk), that calls _write(chunk,cb)
// internally, and returns false if there's a lot of pending writes
// buffered up. When you call read(), that calls _read(n) until
// there's enough pending readable data buffered up.
//
// In a transform stream, the written data is placed in a buffer. When
// _read(n) is called, it transforms the queued up data, calling the
// buffered _write cb's as it consumes chunks. If consuming a single
// written chunk would result in multiple output chunks, then the first
// outputted bit calls the readcb, and subsequent chunks just go into
// the read buffer, and will cause it to emit 'readable' if necessary.
//
// This way, back-pressure is actually determined by the reading side,
// since _read has to be called to start processing a new chunk. However,
// a pathological inflate type of transform can cause excessive buffering
// here. For example, imagine a stream where every byte of input is
// interpreted as an integer from 0-255, and then results in that many
// bytes of output. Writing the 4 bytes {ff,ff,ff,ff} would result in
// 1kb of data being output. In this case, you could write a very small
// amount of input, and end up with a very large amount of output. In
// such a pathological inflating mechanism, there'd be no way to tell
// the system to stop doing the transform. A single 4MB write could
// cause the system to run out of memory.
//
// However, even in such a pathological case, only a single written chunk
// would be consumed, and then the rest would wait (un-transformed) until
// the results of the previous transformed chunk were consumed.
I discovered a solution similar to Ledion's without needing to dive into the internals of the current stream pipeline. You can achieve this via:
_transform(data, enc, callback) {
  const continueTransforming = () => {
    // ... do some work / parse the data, keep state of where we're at etc
    if (!this.push(event))
      this.once('data', continueTransforming); // will get called again when the reader can consume more data
    if (allDone)
      callback();
  }
  continueTransforming()
}
This works because data is only emitted when someone downstream is consuming the Transform's readable buffer that you're this.push()-ing to. So whenever the downstream has capacity to pull off of this buffer, you should be able to start writing back to the buffer.
The flaw with listening for drain on the downstream (other than reaching into the internals of node) is that you are also relying on your Transform's own buffer having been drained as well, and there's no guarantee that it has been when the downstream emits drain.
I am trying to implement a stream with the new Node.js streams API that will buffer a certain amount of data. When this stream is piped to another stream, or if something consumes readable events, this stream should flush its buffer and then simply become pass-through. The catch is, this stream will be piped to many other streams, and when each destination stream is attached, the buffer must be flushed even if it is already flushed to another stream.
For example:
1. BufferStream implements stream.Transform, and keeps a 512KB internal ring buffer.
2. ReadableStreamA is piped to an instance of BufferStream.
3. BufferStream writes to its ring buffer, reading data from ReadableStreamA as it comes in. (It doesn't matter if data is lost, as the buffer overwrites old data.)
4. BufferStream is piped to WritableStreamB.
5. WritableStreamB receives the entire 512KB buffer, and continues to get data as it is written from ReadableStreamA through BufferStream.
6. BufferStream is piped to WritableStreamC.
7. WritableStreamC also receives the entire 512KB buffer, but this buffer is now different than what WritableStreamB received, because more data has since been written to BufferStream.
Is this possible with the streams API? The only method I can think of would be to create an object with a method that spins up a new PassThrough stream for each destination, meaning I couldn't simply pipe to and from it.
For what it's worth, I've done this with the old "flowing" API by simply listening for new handlers on data events. When a new function was attached with .on('data'), I would call it directly with a copy of the ring buffer.
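That old approach looks roughly like the following sketch (illustrative only; 'newListener' is the standard EventEmitter meta-event for detecting freshly attached handlers):

const { EventEmitter } = require('events');

class OldStyleBuffer extends EventEmitter {
  constructor() {
    super();
    this.ring = Buffer.alloc(0); // stand-in for the real 512KB ring buffer
    this.on('newListener', (event, listener) => {
      if (event === 'data') {
        // Hand the new consumer a copy of everything buffered so far;
        // after that it receives live chunks via the normal 'data' emits.
        process.nextTick(() => listener(Buffer.from(this.ring)));
      }
    });
  }

  write(chunk) {
    this.ring = Buffer.concat([this.ring, chunk]); // (no 512KB cap in this sketch)
    this.emit('data', chunk);
  }
}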
Here's my take on your issue.
The basic idea is to create a Transform stream, which will allow us to execute your custom buffering logic before sending the data on the output of the stream:
var util = require('util')
var stream = require('stream')

var BufferStream = function (streamOptions) {
  stream.Transform.call(this, streamOptions)
  this.buffer = new Buffer('')
}

util.inherits(BufferStream, stream.Transform)

BufferStream.prototype._transform = function (chunk, encoding, done) {
  // custom buffering logic
  // ie. add chunk to this.buffer, check buffer size, etc.
  this.buffer = new Buffer(chunk)
  this.push(chunk)
  done()
}
Then, we need to override the .pipe() method so that we are notified when the BufferStream is piped into a stream, which allows us to automatically write data to it:
BufferStream.prototype.pipe = function (destination, options) {
  var res = BufferStream.super_.prototype.pipe.call(this, destination, options)
  res.write(this.buffer)
  return res
}
In this way, when we write buffer.pipe(someStream), we perform the pipe as intended and write the internal buffer to the output stream. After that, the Transform class takes care of everything, while keeping track of the backpressure and whatnot.
Here is a working gist. Please note that I didn't bother writing a correct buffering logic (ie. I don't care about the size of the internal buffer), but this should be easy to fix.
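A hedged usage sketch of the idea (file names are made up); the point is that a destination attached later still receives the buffered data first:

var fs = require('fs');

var buffered = new BufferStream();
fs.createReadStream('input.log').pipe(buffered);

// First consumer: gets the internal buffer, then live data.
buffered.pipe(fs.createWriteStream('copy-a.log'));

// A consumer attached later also gets the (by now larger) buffer first.
setTimeout(function () {
  buffered.pipe(fs.createWriteStream('copy-b.log'));
}, 1000);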
Paul's answer is good, but I don't think it meets the exact requirements. It sounds like what needs to happen is that every time pipe() is called on this transform stream, it first needs to flush the buffer that represents all of the data accumulated between the time the transform stream was created (connected to the source stream) and the time it was connected to the current writable/destination stream.
Something like this might be more correct:
var BufferStream = function () {
  stream.Transform.apply(this, arguments);
  this.buffer = []; // I guess an array will do
};
util.inherits(BufferStream, stream.Transform);

BufferStream.prototype._transform = function (chunk, encoding, done) {
  this.push(chunk ? String(chunk) : null);
  this.buffer.push(chunk ? String(chunk) : null);
  done();
};

BufferStream.prototype.pipe = function (destination, options) {
  var res = BufferStream.super_.prototype.pipe.apply(this, arguments);
  this.buffer.forEach(function (b) {
    res.write(String(b));
  });
  return res;
};
return new BufferStream();
I suppose this:
BufferStream.super_.prototype.pipe.apply(this, arguments);
is equivalent to this:
stream.Transform.prototype.pipe.apply(this, arguments);
You could probably optimize this and use some flags when pipe/unpipe are called.
In RingoJS there's a function called read which allows you to read an entire stream until the end is reached. This is useful when you're making a command line application. For example you may write a tac program as follows:
#!/usr/bin/env ringo
var string = system.stdin.read(); // read the entire input stream
var lines = string.split("\n"); // split the lines
lines.reverse(); // reverse the lines
var reversed = lines.join("\n"); // join the reversed lines
system.stdout.write(reversed); // write the reversed lines
This allows you to fire up a shell and run the tac command. Then you type in as many lines as you wish to and after you're done you can press Ctrl+D (or Ctrl+Z on Windows) to signal the end of transmission.
I want to do the same thing in node.js but I can't find any function which would do so. I thought of using the readSync function from the fs library to simulate as follows, but to no avail:
fs.readSync(0, buffer, 0, buffer.length, null);
The file descriptor for stdin (the first argument) is 0. So it should read the data from the keyboard. Instead it gives me the following error:
Error: ESPIPE, invalid seek
at Object.fs.readSync (fs.js:381:19)
at repl:1:4
at REPLServer.self.eval (repl.js:109:21)
at rli.on.self.bufferedCmd (repl.js:258:20)
at REPLServer.self.eval (repl.js:116:5)
at Interface.<anonymous> (repl.js:248:12)
at Interface.EventEmitter.emit (events.js:96:17)
at Interface._onLine (readline.js:200:10)
at Interface._line (readline.js:518:8)
at Interface._ttyWrite (readline.js:736:14)
How would you synchronously collect all the data in an input text stream and return it as a string in node.js? A code example would be very helpful.
As node.js is event- and stream-oriented, there is no API to wait until the end of stdin and buffer the result, but it's easy to do manually:
var content = '';
process.stdin.resume();
process.stdin.on('data', function(buf) { content += buf.toString(); });
process.stdin.on('end', function() {
  // your code here
  console.log(content.split('').reverse().join(''));
});
In most cases it's better not to buffer the data but to process incoming chunks as they arrive (using a chain of already available stream parsers like xml or zlib, or your own FSM parser).
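For example, a minimal sketch of that chunk-by-chunk style (zlib.createGunzip() is a real core API; the uppercasing line splitter is purely illustrative):

var zlib = require('zlib');

var leftover = '';
process.stdin
  .pipe(zlib.createGunzip()) // decompress as the data arrives
  .on('data', function (chunk) {
    var lines = (leftover + chunk.toString()).split('\n');
    leftover = lines.pop(); // keep the trailing partial line for the next chunk
    lines.forEach(function (line) {
      process.stdout.write(line.toUpperCase() + '\n');
    });
  })
  .on('end', function () {
    if (leftover) process.stdout.write(leftover.toUpperCase() + '\n');
  });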
The key is to use these two Stream events:
Event: 'data'
Event: 'end'
For stream.on('data', ...) you should collect your data into either a Buffer (if it is binary) or a string.
For on('end', ...) you should call a callback with your completed buffer, or, if you can, inline it and use return if you are using a Promises library.
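For example, a small sketch of that pattern wrapped in a Promise (readStreamToString is a name I made up):

function readStreamToString(stream) {
  return new Promise(function (resolve, reject) {
    var chunks = [];
    stream.on('data', function (chunk) { chunks.push(chunk); });
    stream.on('end', function () {
      resolve(Buffer.concat(chunks).toString('utf8'));
    });
    stream.on('error', reject);
  });
}

// e.g. reverse the lines typed on stdin, like the tac example above
readStreamToString(process.stdin).then(function (text) {
  process.stdout.write(text.split('\n').reverse().join('\n'));
});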
Let me illustrate StreetStrider's answer.
Here is how to do it with concat-stream
var concat = require('concat-stream');

yourStream.pipe(concat(function(buf){
  // buf is a Node Buffer instance which contains the entire data in stream
  // if your stream sends textual data, use buf.toString() to get entire stream as string
  var streamContent = buf.toString();
  doSomething(streamContent);
}));

// error handling is still on stream
yourStream.on('error', function(err){
  console.error(err);
});
Please note that process.stdin is a stream.
There is a module for that particular task, called concat-stream.
If you are in async context and have a recent version of Node.js, here is a quick suggestion:
const chunks = []
for await (let chunk of readable) {
  chunks.push(chunk)
}
console.log(Buffer.concat(chunks))
On Windows, I had some problems with the other solutions posted here - the program would run indefinitely when there's no input.
Here is a TypeScript implementation for modern NodeJS, using async generators and for await - quite a bit simpler and more robust than using the old callback based APIs, and this worked on Windows:
import process from "process";
/**
* Read everything from standard input and return a string.
*
* (If there is no data available, the Promise is rejected.)
*/
export async function readInput(): Promise<string> {
const { stdin } = process;
const chunks: Uint8Array[] = [];
if (stdin.isTTY) {
throw new Error("No input available");
}
for await (const chunk of stdin) {
chunks.push(chunk);
}
return Buffer.concat(chunks).toString('utf8');
}
Example:
(async () => {
  const input = await readInput();
  console.log(input);
})();
(consider adding a try/catch, if you want to handle the Promise rejection and display a more user-friendly error-message when there's no input.)