How do I write to a Node PassThrough stream, then later read that data? When I try, the code hangs as though no data is sent. Here's a minimal example (in TypeScript):
const stream = new PassThrough();
stream.write('Test chunk.');
stream.end();
// Later
const chunks: Buffer[] = [];
const output = await new Promise<Buffer>((resolve, reject) => {
stream.on('data', (chunk) => {
  chunks.push(Buffer.from(chunk));
});
stream.on('error', (err) => reject(err));
stream.on('end', () => {
resolve(Buffer.concat(chunks));
});
});
Please note that I can't attach the event listeners before writing to the stream: I don't know at the time of writing how I'm going to be reading from it. My understanding of a Transform stream like PassThrough was that it "decoupled" the Readable from the Writable, so that you could access them asynchronously.
Your code works for me: the promise resolves to a buffer containing "Test chunk.".
It will fail, however, if the readable side of the stream has already started emitting data by the time stream.on('data', (chunk) => {...}) is executed. I could force such behavior by enclosing the // Later part of your code in a setTimeout and inserting an additional
stream.on("data", () => {});
before that. This listener causes the stream to start emitting. Could that have happened in your case?
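Spelled out, a sketch of that reproduction looks like this (nothing is logged, because 'end' has already fired by the time the late listeners are attached):

import { PassThrough } from 'stream';

const stream = new PassThrough();
stream.write('Test chunk.');
stream.end();

// Attaching any 'data' listener switches the stream into flowing mode,
// so the chunk and the 'end' event are emitted right away...
stream.on('data', () => {});

// ...and by the time the "later" listeners are attached, it is all over:
// 'end' never fires again, so the promise below never settles.
setTimeout(async () => {
  const chunks: Buffer[] = [];
  const output = await new Promise<Buffer>((resolve, reject) => {
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('error', (err) => reject(err));
    stream.on('end', () => resolve(Buffer.concat(chunks)));
  });
  console.log(output); // never reached
}, 0);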
To be on the safe side, end the "early" part of your code with stream.pause() and begin the "later" part with stream.resume(), for example:
const output = await new Promise<Buffer>((resolve, reject) => {
stream.resume();
stream.on('data', (chunk) => {
...
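For completeness, here's a minimal end-to-end sketch of that pause()/resume() pattern, using the same PassThrough setup as in the question (it assumes an async/module context so await is available):

import { PassThrough } from 'stream';

// "Early" part: write and end, then pause so nothing is emitted yet,
// even if something else switches the stream into flowing mode.
const stream = new PassThrough();
stream.write('Test chunk.');
stream.end();
stream.pause();

// "Later" part: attach the listeners first, then resume.
const chunks: Buffer[] = [];
const output = await new Promise<Buffer>((resolve, reject) => {
  stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
  stream.on('error', (err) => reject(err));
  stream.on('end', () => resolve(Buffer.concat(chunks)));
  stream.resume();
});
console.log(output.toString()); // "Test chunk."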
I'd like to build a layer of abstraction over the WebWorker API that would allow (1) executing an arbitrary function over a webworker, and (2) wrapping the interaction in a Promise. At a high level, this would look something like this:
function bake() {
... // expensive calculation
return 'mmmm, pizza'
}
async function handlePizzaButtonClick() {
const pizza = await workIt(bake)
eat(pizza)
}
(Obviously, methods with arguments could be added without much difficulty.)
My first cut at workIt looks like this:
async function workIt<T>(f: () => T): Promise<T> {
const worker: Worker = new Worker('./unicorn.js') // no such worker, yet
worker.postMessage(f)
return new Promise<T>((resolve, reject) => {
worker.onmessage = ({data}: MessageEvent) => resolve(data)
worker.onerror = ({error}: ErrorEvent) => reject(error)
})
}
This fails because functions are not structured-cloneable and thus can't be passed in worker messages. (The Promise wrapper part works fine.)
There are various options for serializing JavaScript functions, some scarier than others. But before I go that route, am I missing something here? Is there another way to leverage a WebWorker (or anything that executes in a separate thread) to run arbitrary JavaScript?
I thought an example would be useful in addition to my comment, so here's a basic (no error handling, etc.), self-contained example which loads the worker from an object URL:
Meta: I'm not posting it in a runnable code snippet view because the rendered iframe runs at a different origin (https://stacksnippets.net at the time I write this answer; see the snippet output), which prevents success: in Chrome, I receive the error message Refused to cross-origin redirects of the top-level worker script.
Anyway, you can just copy the text contents, paste it into your dev tools JS console right on this page, and execute it to see that it works. And, of course, it will work in a normal module in a same-origin context.
console.log(new URL(window.location.href).origin);
// Example candidate function:
// - pure
// - uses only syntax which is legal in worker module scope
async function get100LesserRandoms () {
// If `getRandomAsync` were defined outside the function,
// then this function would no longer be pure (it would be a closure)
// and `getRandomAsync` would need to be a function accessible from
// the scope of the `message` event handler within the worker
// else a `ReferenceError` would be thrown upon invocation
const getRandomAsync = () => Promise.resolve(Math.random());
const result = [];
while (result.length < 100) {
const n = await getRandomAsync();
if (n < 0.5) result.push(n);
}
return result;
}
const workerModuleText =
`self.addEventListener('message', async ({data: {id, fn}}) => self.postMessage({id, value: await eval(\`(\${fn})\`)()}));`;
const workerModuleSpecifier = URL.createObjectURL(
new Blob([workerModuleText], {type: 'text/javascript'}),
);
const worker = new Worker(workerModuleSpecifier, {type: 'module'});
worker.addEventListener('message', ({data: {id, value}}) => {
worker.dispatchEvent(new CustomEvent(id, {detail: value}));
});
function notOnMyThread (fn) {
return new Promise(resolve => {
const id = window.crypto.randomUUID();
worker.addEventListener(id, ({detail}) => resolve(detail), {once: true});
worker.postMessage({id, fn: fn.toString()});
});
}
async function main () {
const lesserRandoms = await notOnMyThread(get100LesserRandoms);
console.log(lesserRandoms);
}
main();
I use the Fetch API to retrieve my data from the backend as a stream.
I decrypt the data chunk by chunk and then concatenate the content back together into the original file.
I have found that the stream seems to provide the data in a different order each time, making the concatenated chunks different. How can I force the stream to deliver the chunks in the original sequence?
fetch(myRequest, myInit).then(response => {
var tmpResult = new Uint8Array();
const reader = response.body.getReader();
return new ReadableStream({
start(controller) {
return pump();
function pump() {
return reader.read().then(({ done, value }) => {
// When no more data needs to be consumed, close the stream
if (value) {
//values here are different in order every time
//making my concatenated values different every time
controller.enqueue(value);
var decrypted = cryptor.decrypt(value);
var arrayResponse = decrypted.toArrayBuffer();
if (arrayResponse) {
tmpResult = arrayBufferConcat(tmpResult, arrayResponse);
}
}
// Enqueue the next data chunk into our target stream
if (done) {
if (counter == length) {
callback(obj);
}
controller.close();
return;
}
return pump();
});
}
}
})
})
The documentation tells us that:
Each chunk is read sequentially and output to the UI, until the stream
has finished being read, at which point we return out of the recursive
function and print the entire stream to another part of the UI.
I made a test program with node, using node-fetch:
import fetch from 'node-fetch';
const testStreamChunkOrder = async () => {
return new Promise(async (resolve) => {
let response = await fetch('https://jsonplaceholder.typicode.com/todos/');
let stream = response.body;
let data = '';
stream.on('readable', () => {
let chunk;
while (null !== (chunk = stream.read())) {
data += chunk;
}
})
stream.on('end', () => {
resolve(JSON.parse(data).splice(0, 5).map((x) => x.title));
})
});
}
(async () => {
// call testStreamChunkOrder separately for each slot so we get 10 independent fetches
let results = await Promise.all(new Array(10).fill(null).map(() => testStreamChunkOrder()))
let joined = results.map((r) => r.join(''));
console.log(`Is every result same: ${joined.every((j) => j.localeCompare(joined[0]) === 0)}`)
})()
This one fetches a todo-list JSON and streams it chunk by chunk, accumulating the chunks into data. When the stream is done, we parse the full JSON, take the first 5 elements of the todo list, and keep only the titles, which we then return asynchronously.
This whole process is done 10 times. When we have 10 streamed title lists, we go through each one and join the titles together into a string. Finally, we use .every to check whether all 10 strings are the same, which would mean that each JSON was fetched and streamed in the same order.
So I believe the problem lies somewhere else: the streaming itself works correctly. While I did use node-fetch instead of the actual Fetch API, I think it is safe to say that the actual Fetch API works as it should.
Also I noticed that you are directly calling response.body.getReader(), but when I looked at the documentation, the body.getReader call is done inside another then statement:
fetch('./tortoise.png')
.then(response => response.body)
.then(body => {
const reader = body.getReader();
This might not matter, but considering everything else in your code, such as the excessive wrapping and returning of functions, I think your problems could go away just by reading a couple of tutorials on streams and cleaning up the code a bit. And if not, you will still be in a better position to figure out if the problem is in one of your many functions you are unwilling to expose. Asynchronous code's behavior is inherently difficult to debug and lacking information around such code makes it even harder.
I'm assuming you're using the cipher/decipher family of methods in Node's crypto library. We can simplify this using streams by piping the readable response body into a decipher Transform stream (a stream that is both readable and writable) via .pipe().
const { createDecipheriv } = require('crypto');
const { createWriteStream } = require('fs');
const { pipeline } = require('stream');
// change these to match your encryption scheme and key retrieval
const algo = 'aes-256-cbc';
const key = 'my5up3r53cr3t';
// put your initialization vector you've determined here
// leave null if you are not using an iv (or the algo doesn't support one)
const iv = null;
// creates the decipher TransformStream
const decipher = createDecipheriv(algo, key, iv);
// write plaintext file here
const destFile = createWriteStream('/path/to/destination.ext');
fetch(myRequest, myInit)
.then(response => response.body)
.then(body => body.pipe(decipher).pipe(destFile))
.then(stream => stream.on('finish', () => console.log('done writing file')));
You may also pipe this to be read out in a buffer, pipe to the browser, etc, just be sure to match your algorithm, key, and iv wherever you're defining your cipher/decipher functions.
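For example, here's a rough sketch of the "read out in a buffer" variant. decryptToBuffer is a made-up helper name, and the sketch assumes the response body is a Node.js Readable (e.g. from node-fetch) and the same algo/key/iv definitions as above:

import { createDecipheriv } from 'crypto';
import type { Readable } from 'stream';

// Collect the decrypted plaintext into a single Buffer instead of writing a file.
function decryptToBuffer(body: Readable, algo: string, key: string | Buffer, iv: Buffer | null): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const decipher = createDecipheriv(algo, key, iv);
    const chunks: Buffer[] = [];
    decipher.on('data', (chunk) => chunks.push(chunk));
    decipher.on('end', () => resolve(Buffer.concat(chunks)));
    decipher.on('error', reject);
    body.on('error', reject);
    body.pipe(decipher);
  });
}

// usage, mirroring the fetch chain above:
// fetch(myRequest, myInit)
//   .then(response => decryptToBuffer(response.body, algo, key, iv))
//   .then(plaintext => console.log('decrypted bytes:', plaintext.length));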
If we take the pattern in that MDN example seriously, we should use the controller to enqueue the decrypted data (not the still encrypted value), and aggregate the results with the stream returned by the first promise. In other words...
return fetch(myRequest, myInit)
// Retrieve its body as ReadableStream
.then(response => {
const reader = response.body.getReader();
return new ReadableStream({
start(controller) {
return pump();
function pump() {
return reader.read().then(({ done, value }) => {
// When no more data needs to be consumed, close the stream
if (done) {
controller.close();
return;
}
// do the computational work on each chunk here and enqueue
// *the result of that work* on the controller stream...
const decrypted = cryptor.decrypt(value);
controller.enqueue(decrypted);
return pump();
});
}
}
})
})
// Create a new response out of the stream
.then(stream => new Response(stream))
// Read the response body into a Blob, then into an ArrayBuffer
.then(response => response.blob())
.then(blob => blob.arrayBuffer())
.then(arrayResponse => {
// arrayResponse is the properly sequenced result
// if the caller wants a promise to resolve to this, just return it
return arrayResponse;
// OR... the OP code makes reference to a callback. if that's real,
// call the callback with this result
// callback(arrayResponse);
})
.catch(err => console.error(err));
I'm new to JavaScript and Node.js. I'm trying to read multiple CSV files before doing some processing on them. My current issue is that when I run the code, it tries to execute the processing before the reading of the files is complete. I would like to load both CSVs before I start doing any processing on them.
Could someone explain why this happens and how I can solve the problem in JavaScript/Node.js?
function readCSV(path){
var events = []
fs.createReadStream(path).pipe(csv()).on('data', (row) => {
events.push(row);
}).on('end', () => {
console.log('CSV file successfully processed. Length: ' + events.length);
});
return events
}
function app(){
var file_list = listFiles(folder_path);
for (let i = 0; i < file_list.length; i++) {
const file = file_list[i];
var events = readCSV(file)
}
processCSV(events) // Some processing
}
app();
Any help would be great and any explanation on how I can control when code is executed would be much appreciated.
Sorry, your code cannot be compiled, so I can answer only with untested code.
My current issue is when I run the code it tries to execute the processing before the reading of the file is complete.
The main problem is that fs.createReadStream doesn't read the file by itself; it asks the file system to start reading and invokes your callbacks as chunks arrive, so the 'end' event fires much later, after readCSV has already completed and returned an empty result.
Your code is written as if you expect a synchronous answer; you could make it work that way by using sync methods like fs.readFileSync.
How do you fix it in an asynchronous way? Either do the CSV processing inside the 'end' callback, or use promises.
Promises keep the code simpler and more linear.
First, make readCSV return a Promise.
function readCSV(path: string){ //return Promise<any[]>
return new Promise((resolve) => {
var events = [];
fs.createReadStream(path)
.pipe(csv())
.on('data', (row) => {
// this code is called later, once a row has been parsed
events.push(row);
}).on('end', () => {
// this code is also called later, once parsing completes
console.log('CSV file successfully processed. Length: ' + events.length);
resolve(events); // return the parsed CSV rows
});
})
}
Then, in the main app, use Promise.all to wait for all the file-reading promises.
function app(){
// I don't know what listFiles is;
// I assume it returns its result synchronously
var file_list = listFiles(folder_path);
const dataPromises: Promise<any[]>[] = []
for (let i = 0; i < file_list.length; i++) {
const file = file_list[i];
//launch reading
dataPromises.push(readCSV(file))
}
Promise.all(dataPromises).then(result => {
//this code will be called in future after all readCSV Promises call resolve(..)
for(const events of result){
processCSV(events);
}
})
}
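If you prefer async/await over .then, the same flow can be sketched like this (still assuming listFiles returns its array synchronously and that readCSV/processCSV are defined as above):

async function app() {
  const file_list = listFiles(folder_path);
  // launch all reads in parallel, then wait until every CSV has finished parsing
  const results = await Promise.all(file_list.map((file) => readCSV(file)));
  for (const events of results) {
    processCSV(events); // Some processing
  }
}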
To send a PDF file from a Node.js server to a client I use the following code:
const pdf = printer.createPdfKitDocument(docDefinition);
const chunks = [];
pdf.on("data", (chunk) => {
chunks.push(chunk);
});
pdf.on("end", () => {
const pdfBuffered = `data:application/pdf;base64, ${Buffer.concat(chunks).toString("base64")}`;
res.setHeader("Content-Type", "application/pdf");
res.setHeader("Content-Length", pdfBuffered.length);
res.send(pdfBuffered);
});
pdf.end();
Everything is working correctly; the only issue is that the stream here uses a callback approach rather than async/await.
I've found a possible solution:
const { pipeline } = require("stream/promises");
async function run() {
await pipeline(
fs.createReadStream('archive.tar'),
zlib.createGzip(),
fs.createWriteStream('archive.tar.gz')
);
console.log('Pipeline succeeded.');
}
run().catch(console.error);
But I can't figure out how to adapt the initial code to the stream/promises approach.
You can manually wrap your PDF code in a promise like this and then use it as a function that returns a promise:
function sendPDF(docDefinition) {
return new Promise((resolve, reject) => {
const pdf = printer.createPdfKitDocument(docDefinition);
const chunks = [];
pdf.on("data", (chunk) => {
chunks.push(chunk);
});
pdf.on("end", () => {
const pdfBuffered =
`data:application/pdf;base64, ${Buffer.concat(chunks).toString("base64")}`;
resolve(pdfBuffered);
});
pdf.on("error", reject);
pdf.end();
});
}
sendPDF(docDefinition).then(pdfBuffer => {
res.setHeader("Content-Type", "application/pdf");
res.setHeader("Content-Length", pdfBuffer.length);
res.send(pdfBuffer);
}).catch(err => {
console.log(err);
res.sendStatus(500);
});
Because there are many data events, you can't promisify just the data portion. You will still have to listen for each data event and collect the data.
You can only convert a callback API to async/await if the callback is intended to execute exactly once.
The example you found works because you're just waiting for the whole stream to finish before a single callback runs. What you've got is callbacks that execute multiple times, on every incoming chunk of data.
There are other resources you can look at to make streams nicer to consume, like RxJS, or the upcoming ECMAScript proposal to add observables to the language. Both of these are designed to handle callbacks that execute multiple times, something async/await cannot do.
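That said, if the goal is simply to avoid wiring up listeners by hand, Node readable streams are also async iterable, so the chunk collection can be consumed with for await...of instead. This doesn't promisify the individual data events; it just drains them through the stream's built-in async iterator. A sketch, assuming the pdfkit document behaves as a standard Readable (createPDFDataUri is a made-up name):

async function createPDFDataUri(docDefinition: unknown): Promise<string> {
  const pdf = printer.createPdfKitDocument(docDefinition);
  pdf.end();
  const chunks: Buffer[] = [];
  // each chunk arrives here as it is emitted by the readable stream
  for await (const chunk of pdf) {
    chunks.push(chunk as Buffer);
  }
  return `data:application/pdf;base64, ${Buffer.concat(chunks).toString("base64")}`;
}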
Well, I have some functions which connect to a database (Redis) and return some data; those functions are based on promises, are asynchronous, and contain streams. I've read a bit about testing and chose to go with tape, sinon, and proxyquire, but if I mock this function, how will I know that it works?
The following function (listKeys) returns (through a promise) all the keys that exist in the Redis DB once the scanning completes.
let methods = {
client: client,
// Cache for listKeys
cacheKeys: [],
// Increment and return through promise all keys
// store to cacheKeys;
listKeys: blob => {
return new Promise((resolve, reject) => {
blob = blob ? blob : '*';
let stream = methods.client.scanStream({
match: blob,
count: 10,
})
stream.on('data', keys => {
for (var i = 0; i < keys.length; i++) {
if (methods.cacheKeys.indexOf(keys[i]) === -1) {
methods.cacheKeys.push(keys[i])
}
}
})
stream.on('end', () => {
resolve(methods.cacheKeys)
})
stream.on('error', reject)
})
}
}
So how do you test a function like that?
I think there are a couple of ways to exercise this function through a test, and they all revolve around configuring a test stream for your function to use.
I like to write the test cases that I think are important first, then figure out a way to implement them. To me, the most important one is something like
it('should resolve cacheKeys on end')
Then a stream needs to be created to provide to your function
var Stream = require('stream');
var stream = new Stream();
Then the scan stream needs to be controlled by your test
You could do this by creating a fake client
client = {
scanStream: (config) => { return stream }
}
Then a test can be configured with your assertion
var testKeys = ['t'];
methods.listKeys().then((cacheKeys) => {
assert(cacheKeys).toEqual(testKeys);
done()
})
Now that your promise is waiting on your stream with an assertion in place, send data to the stream:
stream.emit('data', testKeys)
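Putting those pieces together, a minimal tape-style sketch could look like this (the './methods' import path is hypothetical; methods is the object from the question):

import test from 'tape';
import { Readable } from 'stream';
import methods from './methods'; // hypothetical path to the module under test

test('should resolve cacheKeys on end', (t) => {
  // A bare Readable the test controls; it is never read, only used to emit events.
  const stream = new Readable({ read() {} });

  // Fake client whose scanStream hands back the controlled stream.
  methods.client = { scanStream: () => stream };

  const testKeys = ['t'];
  methods.listKeys().then((cacheKeys) => {
    t.deepEqual(cacheKeys, testKeys);
    t.end();
  });

  // Drive the stream: one batch of keys, then 'end', which resolves the promise.
  stream.emit('data', testKeys);
  stream.emit('end');
});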
A simple way to test whether the keys get saved to cacheKeys properly is to mock the DB stream, send data over it, and check whether the data got saved correctly. E.g.:
// Create a mock stream to substitute database
var mockStream = new (require('stream').Readable)();
// Create a mock client.scanStream that returns the mocked stream
var client = {
scanStream: function () {
return mockStream;
}
};
// Assign the mocks to methods
methods.client = client;
// Call listKeys(), so the streams get prepared and the promise awaits resolution
methods.listKeys()
.then(function (r) {
// Setup asserts for correct results here
console.log('Promise resolved with: ', r);
});
// Send test data over the mocked stream
mockStream.emit('data', 'hello');
// End the stream to resolve the promise and execute the asserts
mockStream.emit('end');