Why does flatMap have no output when one stream has an error? - javascript

I tried to write a program with highland.js that downloads several files, unzips them, parses them into objects, then merges the object streams into one stream with flatMap and prints it out.
const _ = require('highland');
const request = require('request');
const zlib = require('zlib');
// toObjParser is an osm2obj transform stream (see the PS below)

function download(url) {
  return _(request(url))
    .through(zlib.createGunzip())
    .errors((err) => console.log('Error in gunzip', err))
    .through(toObjParser)
    .errors((err) => console.log('Error in OsmToObj', err));
}
const urlList = ['url_1', 'url_2', 'url_3'];
_(urlList)
  .flatMap(download)
  .each(console.log);
When all URLs are valid, it works fine. If a URL is invalid, no file is downloaded and gunzip reports an error. I suspect that the stream closes when an error occurs. I expected flatMap to continue with the other streams; however, the program doesn't download the other files and nothing is printed out.
What's the correct way to handle errors in a stream, and how can I make flatMap not stop after one stream has an error?
In imperative programming I can add debug logs to trace where an error happens. How do I debug streaming code?
PS. toObjParser is a Node Transform Stream. It takes a readable stream of OSM XML and outputs a stream of objects compatible with Overpass OSM JSON. See https://www.npmjs.com/package/osm2obj
2017-12-19 update:
I tried to call push in errors as @amsross suggested. To verify that push really works, I pushed an XML document; it was parsed by the downstream parser and I saw it in the output. However, the stream still stopped and url_3 was not downloaded.
function download(url) {
  console.log('download', url);
  return _(request(url))
    .through(zlib.createGunzip())
    .errors((err, push) => {
      console.log('Error in gunzip', err);
      push(null, Buffer.from(`<?xml version='1.0' encoding='UTF-8'?>
<osmChange version="0.6">
<delete>
<node id="1" version="2" timestamp="2008-10-15T10:06:55Z" uid="5553" user="foo" changeset="1" lat="30.2719406" lon="120.1663723"/>
</delete>
</osmChange>`));
    })
    .through(new OsmToObj())
    .errors((err) => console.log('Error in OsmToObj', err));
}
const urlList = ['url_1_correct', 'url_2_wrong', 'url_3_correct'];
_(urlList)
  .flatMap(download)
  .each(console.log);

Update 12/19/2017:
Ok, so I can't give you a good why on this, but I can tell you that switching from consuming the streams resulting from download in sequence to merge'ing them together will probably give you the result you're after. Unfortunately (or not?), you will no longer be getting the results back in any prescribed order.
const request = require('request')
const zlib = require('zlib')
const h = require('highland')

// just so you can see there isn't some sort of race
const rnd = (min, max) => Math.floor((Math.random() * (max - min))) + min
const delay = ms => x => h(push => setTimeout(() => {
  push(null, x)
  push(null, h.nil)
}, ms))

const download = url => h(request(url))
  .flatMap(delay(rnd(0, 2000)))
  .through(zlib.createGunzip())

h(['url_1_correct', 'url_2_wrong', 'url_3_correct'])
  .map(download).merge()
  // vs .flatMap(download) or .map(download).sequence()
  .errors(err => h.log(err))
  .each(h.log)
Update 12/03/2017:
When an error is encountered on the stream, it ends that stream. To avoid this, you need to handle the error. You are currently using errors to report the error, but not handle it. You can do something like this to move on to the next value in the stream:
.errors((err, push) => {
  console.log(err)
  push(null) // push no error forward
})
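Putting the two updates together, here is a sketch of the question's pipeline with the error handled inside download and the per-URL streams merged. It is hedged: it keeps only the gunzip step and assumes the same request/zlib/highland setup as the question above.
const request = require('request')
const zlib = require('zlib')
const h = require('highland')

const download = url =>
  h(request(url))
    .through(zlib.createGunzip())
    .errors((err, push) => {
      console.log('Error in gunzip', err)
      push(null) // push no error forward, so the rest of the pipeline keeps flowing
    })

h(['url_1', 'url_2', 'url_3'])
  .map(download)
  .merge() // vs .flatMap(download), which consumes one stream at a time
  .each(console.log)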
Original:
It's difficult to answer without knowing what the input and output types of toObjParser are.
Because through passes a stream of values to the provided function and expects a stream of values in return, your issue may reside in toObjParser having a signature like Stream -> Object, or Stream -> Stream Object, where the errors are occurring on the inner stream, which will not emit any errors until it is consumed.
What is the output of .each(console.log)? If it is logging a stream, that is most likely your problem.

Related

Nodejs - How to detect when a readable stream has finished transferring data to a writable stream

Background:
I'm using node-fpdf to generate PDF files on my server (MEAN stack). PDF files are stored temporarily in a Readable stream object like this:
// Code obtained directly from the library, 'buffer' holds all the pdf content.
const { Readable } = require('stream');
this.buffer = new Readable({ read() { } });
When I write some data to the PDF file with the library functions (say, to write a string), essentially what the library does is push data onto the stream like this:
this.buffer.push(someData) // Remember that this.buffer is actually a readable stream.
Once I'm done with the file, I write it to disk with the library's own function, which essentially pipes the readable stream into a file, writing it to disk as a PDF:
const fs = require('fs')
/**
 * This is not the original method, I'm summarizing it for readability purposes.
 * @param {string} path Path to which the PDF file will be written.
 */
Output(path) {
  this.Close(); // Finish file with some details, never mind
  this.buffer.pipe(fs.createWriteStream(path))
}
So the whole process goes like this:
1. Client (Angular) provides data and makes an HTTP request to the server (Express).
2. Express handles the request and calls the PDF generation process.
3. Write the desired data on the PDF object.
4. Call pdf.Output('/myOuputDir/mypdf.pdf') (pipe the library's internal readable stream to a writable fs stream).
5. Create an fs readable stream from '/myOuputDir/mypdf.pdf' (this one is handled by me).
6. Pipe my own readable stream to the response object (sometimes PDF files can be heavy, so streaming the data through a readable stream seems to be the best approach).
The problem:
After a couple of successful tests I realized that sometimes step 5 breaks (creating the readable stream from '/myOuputDir/mypdf.pdf') because the actual file hasn't finished being written to disk (or hasn't even been started, so it doesn't exist), because step 4 takes some time.
What I have tried:
I've already tried to manually call the library functions (on my own instance) and wrap them in a promise, so that I should be able to detect when the 'pipe' process has finished:
return new Promise((resolve, reject) => {
  const writable = fs.createWriteStream(filePath)
  // Handle all possible events (or at least the ones that VS Code suggests)
  writable.on('close', () => console.log('writable close')) // maybe resolve here
  writable.on('finish', () => console.log('writable finish'))
  writable.on('open', () => console.log('writable open'))
  writable.on('pipe', () => console.log('writable pipe'))
  writable.on('ready', () => console.log('writable ready'))
  writable.on('unpipe', () => console.log('writable unpipe'))
  writable.on('drain', () => console.log('writable drain'))
  writable.on('error', (err) => reject(err))
  // Remember that pdf.buffer is the object that holds the file content, `pdf` is the library instance
  pdf.buffer.on('end', () => console.log('readable end')) // maybe resolve here
  pdf.buffer.on('error', () => console.log('readable error'))
  pdf.buffer.on('pause', () => console.log('readable pause'))
  pdf.buffer.on('readable', () => console.log('readable readable'))
  pdf.buffer.on('resume', () => console.log('readable resume'))
  pdf.buffer.on('close', () => console.log('readable close'))
  pdf.Close() // Library function which finishes the pdf file.
  pdf.buffer.pipe(writable) // Pipe readable pdf object to writable stream (fs)
})
I've put in all these console.log calls in an attempt to check every possible event of both streams (this way I could resolve the promise on the readable's end or the writable's close event, but for some reason they are never triggered), yet the only logs I receive are:
writable pipe
readable pause
readable resume
Summarizing:
What I need is a way to detect when a readable stream (created with the stream class, not fs) finishes its pipe process to a writable stream. I was thinking there must be a function/property to force the readable stream (pdf) to say 'hey, I've got no more data to provide you', so that I could handle my issue, but I couldn't find any way of doing this.
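For reference, a Readable created with read() {} only emits 'end' (and the piped writable only emits 'finish'/'close') after something calls push(null) on it, so if the library never ends its internal buffer those events can't fire. Here is a minimal sketch of the detection itself using Node's stream.pipeline (available in Node 12); writePdfToDisk is a made-up helper name, and it assumes the buffer does get ended at some point:
const fs = require('fs')
const { pipeline } = require('stream')

// hypothetical helper, not part of node-fpdf
function writePdfToDisk(pdf, filePath) {
  return new Promise((resolve, reject) => {
    pdf.Close() // library call from the snippet above; assumed to end the internal stream
    // pipeline's callback fires once the writable has flushed everything to disk,
    // or with an error if either side fails
    pipeline(pdf.buffer, fs.createWriteStream(filePath), err => {
      if (err) reject(err)
      else resolve(filePath)
    })
  })
}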
An alternative:
I've also thought that I could try to pipe pdf.buffer (remember, it is a readable stream) directly to the Express response object and handle it on the client side, but after lots of reading I couldn't find how to specify an observable with this type, nor how to handle it from an Angular service.
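That alternative would look roughly like this. It is only a hedged sketch: buildPdf is a made-up placeholder for whatever builds the node-fpdf instance, the route and filename are placeholders, and it assumes pdf.buffer is ended by Close():
const express = require('express')
const app = express()

app.get('/pdf', (req, res) => {
  const pdf = buildPdf(req) // placeholder: create and fill the node-fpdf document
  res.setHeader('Content-Type', 'application/pdf')
  res.setHeader('Content-Disposition', 'attachment; filename="mypdf.pdf"')
  pdf.Close() // finish the document (library call from above)
  pdf.buffer.pipe(res) // the Express response is a writable stream, so the PDF streams to the client
})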
My nodejs version is: v12.22.9

How can I read and download a file from a ReadableStream on an API call (Node.js)?

I am using ssh2-sftp-client to get files from a remote server, and I'm running into an issue reading and downloading these files once I get() them.
At first, I was able to use the get() method to download the file when the API was hit; I could also return the whole file contents in a console.log statement. Then it started returning Buffer content. I updated with this:
npm install ssh2-sftp-client@3.1.0
And now I get a ReadableStream.
function getFile(req, res) {
  sftp.connect(config).then(() => {
    return sftp.get(process.env.SFTP_PATH + '/../...xml', true);
  }).then((stream) => {
    const outFile = fs.createWriteStream('...xml')
    stream.on('data', (c) => {
      console.log(`Received ${c.length} bytes of data.`);
      outFile.write(c);
      res.send('ok')
    });
    stream.on('close', function() {
    });
  }).catch((err) => {
    console.log(err, 'catch error');
  });
};
I have the above code that returns a stream, but I'm not sure how to get the file; the write() method doesn't seem to work here.
Any advice or suggestions on how I can use this library to read and download files would be greatly appreciated.
First, don't use version 3.x. That version has been deprecated. The most recent version is v4.1.0 and has had significant cleanup work to fix a number of small bugs.
If all you want to do is download the files, then use the fastGet() method. It takes 2 args, source path and destination path. It is a lot faster than plain get as it does the download in parallel.
If you don't want to do that, then the get() method has a number of options. If you only pass in one arg (source) it will return a buffer. If you pass in 2 args, the second arg must be either a string (path to local file) or a writeable stream. If a writeable stream, the data will be piped into that stream.
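A short sketch of both suggestions, hedged: the remote and local paths are placeholders, config is the connection config from the question, and it assumes v4.x of ssh2-sftp-client as described above:
const fs = require('fs');
const Client = require('ssh2-sftp-client');

const sftp = new Client();

sftp.connect(config)
  // fastest option: download straight to a local path
  .then(() => sftp.fastGet('/remote/path/file.xml', '/local/path/file.xml'))
  // alternative: sftp.get('/remote/path/file.xml', fs.createWriteStream('/local/path/file.xml'))
  // pipes the data into a writable stream you control
  .then(() => sftp.end())
  .catch(err => console.log(err, 'catch error'));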

NodeJS exec stop after some time without error

I'm using exec from child_process.
The function runs fine, but after 4-5 minutes it just stops without any errors reported, even though the script should run for at least 24 hours...
Here is the code :
import { exec } from 'child_process';

function searchDirectory(dirPath) {
  let lineBuffer = '';
  const cmd = `find ${dirPath} -type f -name "*.txt" | pv -l -L 10 -q`;
  const findData = exec(cmd);
  findData.on('error', err => log.error(err));
  findData.stdout.on('data', data => {
    lineBuffer += data;
    let lines = lineBuffer.split('\n');
    for (var i = 0; i < lines.length - 1; i++) {
      let filepath = lines[i];
      processfile(filepath);
    }
    lineBuffer = lines[lines.length - 1];
  });
  findData.stdout.on('end', () => console.log('finished finding...'));
}
The pv command slows down the output; I need this since the path where I'm running find is over the network and pretty slow (60mb/s).
When I run the command directly in the terminal it works fine (I didn't wait 24 hours, but I left it running for half an hour and it was still going).
The processfile function actually makes an async call with axios to send some data to a server:
let data = readFileSync(file);
...
axios.post(API_URL, { obj: data }, { proxy: false })
  .then(res => {
    log.info('Successfully saved object : ' + res.data._id);
  })
  .catch(err => {
    log.error(err.response ? err.response.data : err);
  });
What could cause the script to stop? Any ideas?
Thanks
I found the issue: using exec is not recommended for huge outputs, since it uses a limited-size buffer. Use spawn instead:
The most significant difference between child_process.spawn and child_process.exec is in what they return: spawn returns a stream and exec returns a buffer.
child_process.spawn returns an object with stdout and stderr streams. You can tap the stdout stream to read data that the child process sends back to Node. stdout, being a stream, has the "data", "end", and other events that streams have. spawn is best used when you want the child process to return a large amount of data to Node - image processing, reading binary data, etc.
child_process.exec returns the whole buffer output from the child process. By default the buffer size is set at 200k. If the child process returns anything more than that, your program will crash with the error message "Error: maxBuffer exceeded". You can fix that problem by setting a bigger buffer size in the exec options. But you should not do it, because exec is not meant for processes that return HUGE buffers to Node. You should use spawn for that. So what do you use exec for? Use it to run programs that return result statuses, instead of data.
from: https://www.hacksparrow.com/difference-between-spawn-and-exec-of-node-js-child_process.html
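As a hedged sketch, the question's searchDirectory rewritten with spawn could look like this; shell: true is assumed because the command relies on a shell pipe to pv, and log/processfile are the same helpers as in the question:
import { spawn } from 'child_process';

function searchDirectory(dirPath) {
  let lineBuffer = '';
  const cmd = `find ${dirPath} -type f -name "*.txt" | pv -l -L 10 -q`;
  // shell: true keeps the pipe to pv; spawn streams stdout instead of buffering it all
  const findData = spawn(cmd, { shell: true });
  findData.on('error', err => log.error(err));
  findData.stdout.on('data', data => {
    lineBuffer += data;
    const lines = lineBuffer.split('\n');
    for (let i = 0; i < lines.length - 1; i++) {
      processfile(lines[i]);
    }
    lineBuffer = lines[lines.length - 1];
  });
  findData.stdout.on('end', () => console.log('finished finding...'));
}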

Node read stream hangs on specific file, base64 encoding

I have the following code working for every file except one, which keeps hanging without emitting end or error events (I tried other stream events too).
const fs = require('fs');

const rs = fs.createReadStream(filePath, {
  encoding: 'base64',
});

rs.on('data', () => {
  console.log('data');
});
rs.on('end', () => {
  console.log('end');
});
rs.on('error', e => {
  console.log('error', e);
});
If I move the read point with the start option to 1 instead of 0, it works properly. The same happens if highWaterMark is set to a value other than the default. That doesn't really help, as it seems it could fail with some other "corrupted" file.
It seems like a Node bug, but maybe there's something I'm missing here.
I'll post the file here too, but first I need to strip it down to only the corrupting part, as it's somewhat private.
Update
Here's a file to recreate the issue:
http://s3.eu-west-1.amazonaws.com/jjapitest/file
Update
Here's an interactive demo of the issue:
https://repl.it/repls/AnimatedDisguisedNumerator

JavaScript: Writing to download stream

I want to download an encrypted file from my server, decrypt it and save it locally. I want to decrypt the file and write it locally as it is being downloaded rather than waiting for the download to finish, decrypting it and then putting the decrypted file in an anchor tag. The main reason I want to do this is so that with large files the browser does not have to store hundreds of megabytes or several gigabytes in memory.
This is only going to be possible with a combination of service worker + fetch + streams.
A few browsers have service workers and fetch, but even fewer support fetch with streaming (Blink):
new Response(new ReadableStream({...}))
I have built a streaming file saver lib that communicates with a service worker in order to intercept network requests: StreamSaver.js
It's a little bit different from Node's streams; here is an example:
function unencrypt(chunk) {
  // should return Uint8Array
  return new Uint8Array()
}

// We use fetch instead of xhr since it has streaming support
fetch(url).then(res => {
  // create a writable stream + intercept a network response
  const fileStream = streamSaver.createWriteStream('filename.txt')
  const writer = fileStream.getWriter()
  // stream the response
  const reader = res.body.getReader()
  const pump = () => reader.read()
    .then(({ value, done }) => {
      // close the writer once the response has been fully read
      if (done) return writer.close()
      let chunk = unencrypt(value)
      // Write one chunk, then get the next one
      writer.write(chunk) // returns a promise
      // While the write stream can handle the watermark,
      // read more data
      return writer.ready.then(pump)
    })
  // Start the reader
  pump().then(() =>
    console.log('Closed the stream, Done writing')
  )
})
There are also two other ways you can get a streaming response with xhr, but they're not standard and it doesn't matter if you use them (responseType = ms-stream || moz-chunked-arrayBuffer), because StreamSaver depends on fetch + ReadableStream anyway and can't be used in any other way.
Later you will be able to do something like this when WritableStream + Transform streams get implemented as well:
fetch(url).then(res => {
  const fileStream = streamSaver.createWriteStream('filename.txt')
  res.body
    .pipeThrough(unencrypt)
    .pipeTo(fileStream)
    .then(done)
})
It's also worth mentioning that the default download manager is commonly associated with background downloads, so people sometimes close the tab when they see the download. But this is all happening in the main thread, so you need to warn the user when they leave:
window.onbeforeunload = function(e) {
  if (download_is_done()) return

  var dialogText = 'Download is not finished, leaving the page will abort the download'
  e.returnValue = dialogText
  return dialogText
}
A new solution has arrived: showSaveFilePicker / FileSystemWritableFileStream, supported in Chrome, Edge, and Opera since October 2020 (and with a ServiceWorker-based shim for Firefox, from the author of the other major answer!), allows you to do this directly:
async function streamDownloadDecryptToDisk(url, DECRYPT) {
  // create readable stream for ciphertext
  let rs_src = fetch(url).then(response => response.body);

  // create writable stream for file
  let ws_dest = window.showSaveFilePicker().then(handle => handle.createWritable());

  // create transform stream for decryption
  let ts_dec = new TransformStream({
    async transform(chunk, controller) {
      controller.enqueue(await DECRYPT(chunk));
    }
  });

  // stream cleartext to file
  let rs_clear = rs_src.then(s => s.pipeThrough(ts_dec));
  return (await rs_clear).pipeTo(await ws_dest);
}
Depending on performance—if you're trying to compete with MEGA, for instance—you might also consider modifying DECRYPT(chunk) to allow you to use ReadableStreamBYOBReader with it:
…zero-copy reading from an underlying byte source. It is used for efficient copying from underlying sources where the data is delivered as an "anonymous" sequence of bytes, such as files.
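To give an idea of the shape of that, here is a rough sketch of BYOB reading. It is hedged: readWithByob and handleCleartext are made-up names, and it assumes the response body is a byte stream that supports a 'byob' reader:
async function readWithByob(url, DECRYPT, handleCleartext) { // made-up helper
  const reader = (await fetch(url)).body.getReader({ mode: 'byob' });
  let buffer = new ArrayBuffer(64 * 1024); // reused for every read, so no per-chunk allocation
  while (true) {
    const { value, done } = await reader.read(new Uint8Array(buffer));
    if (done) break;
    await handleCleartext(await DECRYPT(value)); // value is a Uint8Array view over the transferred buffer
    buffer = value.buffer; // hand the same memory back for the next read
  }
}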
For security reasons, browsers do not allow piping an incoming readable stream directly to the local file system, so you have two ways to solve it (a quick sketch of the second follows below):
window.open(Resource_URL): download the resource in a new window with Content-Disposition set to "attachment";
<a download href="path/to/resource"></a>: use the "download" attribute of an anchor element to download the stream to disk.
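The second option, as a small sketch (the URL and filename are placeholders; the resource must already be decrypted server-side for this to make sense):
const a = document.createElement('a');
a.href = 'path/to/resource';   // server should still send the right Content-Type
a.download = 'filename.txt';   // hints the browser to save rather than navigate
document.body.appendChild(a);
a.click();
a.remove();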
Hope this helps :)
