JavaScript: Writing to download stream - javascript

I want to download an encrypted file from my server, decrypt it and save it locally. I want to decrypt the file and write it locally as it is being downloaded rather than waiting for the download to finish, decrypting it and then putting the decrypted file in an anchor tag. The main reason I want to do this is so that with large files the browser does not have to store hundreds of megabytes or several gigabytes in memory.

This is only going to be possible with a combination of service worker + fetch + stream
A few browser has worker and fetch but even fewer support fetch with streaming (Blink)
new Response(new ReadableStream({...}))
I have built a streaming file saver lib to communicate with a service worker in other to intercept network request: StreamSaver.js
It's a little bit different from node's stream here is an example
function unencrypt(){
// should return Uint8Array
return new Uint8Array()
}
// We use fetch instead of xhr that has streaming support
fetch(url).then(res => {
// create a writable stream + intercept a network response
const fileStream = streamSaver.createWriteStream('filename.txt')
const writer = fileStream.getWriter()
// stream the response
const reader = res.body.getReader()
const pump = () => reader.read()
.then(({ value, done }) => {
let chunk = unencrypt(value)
// Write one chunk, then get the next one
writer.write(chunk) // returns a promise
// While the write stream can handle the watermark,
// read more data
return writer.ready.then(pump)
)
// Start the reader
pump().then(() =>
console.log('Closed the stream, Done writing')
)
})
There are also two other way you can get streaming response with xhr, but it's not standard and doesn't mather if you use them (responseType = ms-stream || moz-chunked-arrayBuffer) cuz StreamSaver depends on fetch + ReadableStream any ways and can't be used in any other way
Later you will be able to do something like this when WritableStream + Transform streams gets implemented as well
fetch(url).then(res => {
const fileStream = streamSaver.createWriteStream('filename.txt')
res.body
.pipeThrogh(unencrypt)
.pipeTo(fileStream)
.then(done)
})
It's also worth mentioning that the default download manager is commonly associated with background download so ppl sometimes close the tab when they see the download. But this is all happening in the main thread so you need to warn the user when they leave
window.onbeforeunload = function(e) {
if( download_is_done() ) return
var dialogText = 'Download is not finish, leaving the page will abort the download'
e.returnValue = dialogText
return dialogText
}

New solution has arrived: showSaveFilePicker/FileSystemWritableFileStream, supported in Chrome, Edge, and Opera since October 2020 (and with a ServiceWorker-based shim for Firefox—from the author of the other major answer!), will allow you to do this directly:
async function streamDownloadDecryptToDisk(url, DECRYPT) {
// create readable stream for ciphertext
let rs_src = fetch(url).then(response => response.body);
// create writable stream for file
let ws_dest = window.showSaveFilePicker().then(handle => handle.createWritable());
// create transform stream for decryption
let ts_dec = new TransformStream({
async transform(chunk, controller) {
controller.enqueue(await DECRYPT(chunk));
}
});
// stream cleartext to file
let rs_clear = rs_src.then(s => s.pipeThrough(ts_dec));
return (await rs_clear).pipeTo(await ws_dest);
}
Depending on performance—if you're trying to compete with MEGA, for instance—you might also consider modifying DECRYPT(chunk) to allow you to use ReadableStreamBYOBReader with it:
…zero-copy reading from an underlying byte source. It is used for efficient copying from underlying sources where the data is delivered as an "anonymous" sequence of bytes, such as files.

For security reasons, browsers do not allow piping an incoming readable stream directly to the local file system, so you have two ways to solve it:
window.open(Resource_URL): download the resource in a new window with
Content_Disposition set to "attachment";
<a download href="path/to/resource"></a>: using the "download" attribute of
AnchorElement to download stream into the hard disk;
hope these helps :)

Related

Nodejs - How to detect when a readable stream has finished transfering data to writable stream

Background:
I'm using node-fpdf to generate pdf files at my server (MEAN stack), pdf files are stored temporarily on a Readable stream object like this:
// Code obtained directly from the library, 'buffer' holds all the pdf content.
const { Readable } = require('stream');
this.buffer = new Readable({ read() { } });
When I write some data over the PDF file with the library functions (let's say, to write a string) essentially what the library does is to push data on the stream like this:
this.buffer.push(someData) // Remember that this.buffer is actually a readable stream.
Once I'm done with the file I write it on disk with the library's own function which essentially pipes the readable stream into a file which of course writes it into the disk as a pdf file:
const fs = require('fs')
/**
* This is not the original method, I'm resuming it for readability purposes
* #param {string} Path to which PDF file will be written.
*/
Output(path) {
this.Close(); // Finish file with some details, nevermind
this.buffer.pipe(fs.createWriteStream(path))
}
So the whole process goes like this:
Client (Angular) provides data and makes http request to server (express).
Express handles request and calls pdf generation process.
Write desired data on pdf object.
Call pdf.Output('/myOuputDir/mypdf.pdf') (pipe library's internal readable stream to writable (fs)).
Create fs readable stream from '/myOuputDir/mypdf.pdf' (this one is handled by me).
Pipe my own readable stream to response object (sometimes pdf files can be heavy so streaming the data through readable stream seems to be the best approach).
The problem:
After a couple of successfull tests I realized that sometimes step 5 breaks (Create readable stream from '/myOuputDir/mypdf.pdf') because the actual file hasn't finished being written to disk (or it hasn't even been started so it doesn't exists) because step 4 takes some time.
What have I tried
I've already tried to manually call the the library functions (on my own instance) and wrap them into a promise, so at this way I should be able to detect when the 'pipe' process has finished:
return new Promise((resolve, reject) => {
const writable = fs.createWriteStream(filePath)
// Handle all possible events (or at least the ones that VS code suggest)
writable.on('close', () => console.log('writable close')) // maybe resolve here
writable.on('finish', () => console.log('writable finish'))
writable.on('open', () => console.log('writable open'))
writable.on('pipe', () => console.log('writable pipe'))
writable.on('ready', () => console.log('writable ready'))
writable.on('unpipe', () => console.log('writable unpipe'))
writable.on('drain', () => console.log('writable drain'))
writable.on('error', (err) => reject(err))
// Remember that pdf.buffer is the object that handles the file content, `pdf` is the library instance
pdf.buffer.on('end', () => console.log('readable end')) // maybe resolve here
pdf.buffer.on('error', () => console.log('readable error'))
pdf.buffer.on('pause', () => console.log('readable pause'))
pdf.buffer.on('readable', () => console.log('readable readable'))
pdf.buffer.on('resume', () => console.log('readable resume'))
pdf.buffer.on('close', () => console.log('readable close'))
pdf.Close() // Library function which finishes the pdf file.
pdf.buffer.pipe(writable) // Pipe readable pdf object to writable stream (fs)
})
I've put all this console log functions in an attempt to check all the possible events of both streams (at this way I could resolve the promise at readable's end or writable close event but for some reason they are never triggered) but the only logs I receive are:
writable pipe
readable pause
readable resume
Summarizing:
What I need is a way to detect when a readable stream (initialized by stream class, not fs) finishes it's pipe process to a writable stream, I was thinking that there must be a function/property to force the readable stream (pdf) to freeze or to make it say 'hey I've got no more data to provide you' so at this way I could handle my issue but I couldn't find any way of doing this.
An alternative:
I've also thought that I could try to pipe the pdf.buffer (remember it is a readable stream) directly to express response object and handle it at client side but after lot's of reading I couldn't find how to specify an observable with this type and also how to handle it from an Angular service.
My nodejs version is: v12.22.9

Recording and uploading audio from javascript

I am trying to record and upload audio from javascript. I can successfullly record audio Blobs from a MediaRecorder. My understanding is that after recording several chunks into blobs, I would concatenate them as a new Blob(audioBlobs) and upload that. Unfortunately, the result on the server-side keeps being more or less gibberish. I'm currently running a localhost connection, so converting to uncompressed WAV isn't a problem (might be come one later, but that's a separate issue). Here is what I have so far
navigator.mediaDevices.getUserMedia({audio: true, video: false})
.then(stream => {
const mediaRecorder = new MediaRecorder(stream);
mediaRecorder.start(1000);
const audioChunks = [];
mediaRecorder.addEventListener("dataavailable", event => {
audioChunks.push(event.data);
});
function sendData () {
const audioBlob = new Blob(audioChunks);
session.call('my.app.method', [XXXXXX see below XXXXXX])
}
})
The session object here is an autobahn.js websockets connection to a python server (using soundfile. I tried a number of arguments in the place that was labelled by XXXXX in the code.
Just pass the audioBlob. In that case, the python side just receives an empty dictionary.
Pass audioBlob.text() in that case, I get something that looks somewhat binary (starts with OggS), but it can't be decoded.
Pass audioBlob.arrayBuffer(). In that case the python side receives an empty dictionary.
A possible solution could be to convert the data to WAV on the serverside (just changing the mime-type on the blob doesn't work) or to find a way to interpret the .text() output on the server side.
The solution was to use recorder.js and then use the getBuffer method in there to get the wave data as a Float32Array.

Dealing with IOS web browsers not caching audio

I have a language site that I am working on to teach language. Users can click on objects and hear the audio for what they click on. Many of the people that will be using this are in more remote areas with slower Internet connections. Because of this, I am needing to cache audio before each of the activities is loaded otherwise there is too much of a delay.
Previously, I was having an issue where preloading would not work because iOS devices do not allow audio to load without a click event. I have gotten around this, however, I now have another issue. iOS/Safari only allows the most recent audio file to be loaded. Therefore, whenever the user clicks on another audio file (even if it was clicked on previously), it is not cached and the browser has to download it again.
So far I have not found an adequate solution to this. There are many posts from around 2011~2012 that try to deal with this but I have not found a good solution. One solution was to combine all audio clips for activity into a single audio file. That way only one audio file would be loaded into memory for each activity and then you just pick a particular part of the audio file to play. While this may work, it also becomes a nuisance whenever an audio clip needs to be changed, added, or removed.
I need something that works well in a ReactJS/Redux environment and caches properly on iOS devices.
Is there a 2020 solution that works well?
You can use IndexedDB. It's a low-level API for client-side storage of significant amounts of structured data, including files/blobs. IndexedDB API is powerful, but may seem too complicated for simple cases. If you'd prefer a simple API, try libraries such as localForage, dexie.js.
localForage is A Polyfill providing a simple name:value syntax for client-side data storage, which uses IndexedDB in the background, but falls back to WebSQL and then localStorage in browsers that don't support IndexedDB.
You can check the browser support for IndexedDB here: https://caniuse.com/#search=IndexedDB. It's well supported. Here is a simple example I made to show the concept:
index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Audio</title>
</head>
<body>
<h1>Audio</h1>
<div id="container"></div>
<script src="localForage.js"></script>
<script src="main.js"></script>
</body>
</html>
main.js
"use strict";
(function() {
localforage.setItem("test", "working");
// create HTML5 audio player
function createAudioPlayer(audio) {
const audioEl = document.createElement("audio");
const audioSrc = document.createElement("source");
const container = document.getElementById("container");
audioEl.controls = true;
audioSrc.type = audio.type;
audioSrc.src = URL.createObjectURL(audio);
container.append(audioEl);
audioEl.append(audioSrc);
}
window.addEventListener("load", e => {
console.log("page loaded");
// get the audio from indexedDB
localforage.getItem("audio").then(audio => {
// it may be null if it doesn't exist
if (audio) {
console.log("audio exist");
createAudioPlayer(audio);
} else {
console.log("audio doesn't exist");
// fetch local audio file from my disk
fetch("panumoon_-_sidebyside_2.mp3")
// convert it to blob
.then(res => res.blob())
.then(audio => {
// save the blob to indexedDB
localforage
.setItem("audio", audio)
// create HTML5 audio player
.then(audio => createAudioPlayer(audio));
});
}
});
});
})();
localForage.js just includes the code from here: https://github.com/localForage/localForage/blob/master/dist/localforage.js
You can check IndexedDB in chrome dev tools and you will find our items there:
and if you refresh the page you will still see it there and you will see the audio player created as well. I hope this answered your question.
BTW, older versions of safari IOS didn't support storing blob in IndexedDB if it's still the case you can store the audio files as ArrayBuffer which is very well supported. Here is an example using ArrayBuffer:
main.js
"use strict";
(function() {
localforage.setItem("test", "working");
// convert arrayBuffer to Blob
function arrayBufferToBlob(buffer, type) {
return new Blob([buffer], { type: type });
}
// convert Blob to arrayBuffer
function blobToArrayBuffer(blob) {
return new Promise((resolve, reject) => {
const reader = new FileReader();
reader.addEventListener("loadend", e => {
resolve(reader.result);
});
reader.addEventListener("error", reject);
reader.readAsArrayBuffer(blob);
});
}
// create HTML5 audio player
function createAudioPlayer(audio) {
// if it's a buffer
if (audio.buffer) {
// convert it to blob
audio = arrayBufferToBlob(audio.buffer, audio.type);
}
const audioEl = document.createElement("audio");
const audioSrc = document.createElement("source");
const container = document.getElementById("container");
audioEl.controls = true;
audioSrc.type = audio.type;
audioSrc.src = URL.createObjectURL(audio);
container.append(audioEl);
audioEl.append(audioSrc);
}
window.addEventListener("load", e => {
console.log("page loaded");
// get the audio from indexedDB
localforage.getItem("audio").then(audio => {
// it may be null if it doesn't exist
if (audio) {
console.log("audio exist");
createAudioPlayer(audio);
} else {
console.log("audio doesn't exist");
// fetch local audio file from my disk
fetch("panumoon_-_sidebyside_2.mp3")
// convert it to blob
.then(res => res.blob())
.then(blob => {
const type = blob.type;
blobToArrayBuffer(blob).then(buffer => {
// save the buffer and type to indexedDB
// the type is needed to convet the buffer back to blob
localforage
.setItem("audio", { buffer, type })
// create HTML5 audio player
.then(audio => createAudioPlayer(audio));
});
});
}
});
});
})();
Moving my answer here from the comment.
You can use HTML5 localstorage API to store/cache the audio content. See this article from Apple https://developer.apple.com/library/archive/documentation/iPhone/Conceptual/SafariJSDatabaseGuide/Introduction/Introduction.html.
As per the article,
Make your website more responsive by caching resources—including audio
and video media—so they aren't reloaded from the web server each time
a user visits your site.
There is an example to show how to use the storage.
Apple also allows you to use a database if you need so. See this example: https://developer.apple.com/library/archive/documentation/iPhone/Conceptual/SafariJSDatabaseGuide/ASimpleExample/ASimpleExample.html#//apple_ref/doc/uid/TP40007256-CH4-SW4
Lets explore some browser storage options
localStorage is only good for storing short key/val string
IndexedDB is not ergonomic for it design
websql is deprecated/removed
Native file system is a good canditate but still experimental behind a flag in chrome
localForge is a just booiler lib for a key/value storage wrapped around IndexedDB and promises (good but unnecessary)
That leaves us with: Cache storage
/**
* Returns the cached url if it exist or fetches it,
* stores it and returns a blob
*
* #param {string|Request} url
* #returns {Promise<Blob>}
*/
async function cacheFirst (url) {
const cache = await caches.open('cache')
const res = await cache.match(file) || await fetch(url).then(res => {
cache.put(url, res.clone())
return res
})
return res.blob()
}
cacheFirst(url).then(blob => {
audioElm.src = URL.createObjectURL(blob)
})
Cache storage goes well hand in hand with service worker but can function without it. doe your site needs to be secure, as it's a "power function" and only exist in secure contexts.
Service worker is a grate addition if you want to build PWA (Progressive web app) with offline support, maybe you should consider it. something that can help you on the way is: workbox it can cache stuff on the fly as you need them - like some man in the middle. it also have a cache first strategy.
Then it can be as simple as just writing <audio src="url"> and let workbox do it thing

Download large data stream (> 1Gb) using javascript

I was wondering if it was possible to stream data from javascript to the browser's downloads manager.
Using webrtc, I stream data (from files > 1Gb) from a browser to the other. On the receiver side, I store into memory all this data (as arraybuffer ... so the data is essentially still chunks), and I would like the user to be able to download it.
Problem : Blob objects have a maximum size of about 600 Mb (depending on the browser) so I can't re-create the file from the chunks. Is there a way to stream these chunks so that the browser downloads them directly ?
if you want to fetch a large file blob from an api or url, you can use streamsaver.
npm install streamsaver
then you can do something like this
import { createWriteStream } from 'streamsaver';
export const downloadFile = (url, fileName) => {
return fetch(url).then(res => {
const fileStream = createWriteStream(fileName);
const writer = fileStream.getWriter();
if (res.body.pipeTo) {
writer.releaseLock();
return res.body.pipeTo(fileStream);
}
const reader = res.body.getReader();
const pump = () =>
reader
.read()
.then(({ value, done }) => (done ? writer.close() : writer.write(value).then(pump)));
return pump();
});
};
and you can use it like this:
const url = "http://urltobigfile";
const fileName = "bigfile.zip";
downloadFile(url, fileName).then(() => { alert('done'); });
Following #guest271314's advice, I added StreamSaver.js to my project, and I successfully received files bigger than 1GB on Chrome. According to the documentation, it should work for files up to 15GB but my browser crashed before that (maximum file size was about 4GB for me).
Note I: to avoid the Blob max size limitation, I also tried to manually append data to the href field of a <a></a> but it failed with files of about 600MB ...
Note II: as amazing as it might seem, the basic technique using createObjectURL works perfectly fine on Firefox for files up to 4GB !!

Prompt to download a stream of multiple files as one

Assuming there's a server storing multiple files (not necessarily text documents):
http://<server>/<path>/file0001.txt ... http://<server>/<path>/file9999.txt
If user was to download all of those files as one, how would I do it in javascript?
Normally user would have to download 9999 files and join them on his drive.
How can I prompt a download of a file and stream the data of multiple files while javascript gets them, just like it's a stream of one, big file.
I imagine it would be something like this (excuse me for lack of javascript, just trying to explain):
With (download prompt of 'onefile.txt') as connection:
While connection is open:
For file in file_list:
get file
return file.contents
connection close
Downloading each file and storing it in memory until the last one is retrieved is not a good idea, since overall size of that file can be quite big.
I'm wondering if that's even possible. I can write it in python, but that's another story. I wanted to make it a javascript function on a website.
I'm surprised javascript can't just create a "virtual localhost connection" where it uses some generator to "yield" the contents of each file...
Well, if you use a service worker then you can manipulate the response and give it a readableStream which you can "yield" the content of each file...
This is what the streamSaver dose internally but takes away all hassle...
I will show you an example using es6 and StreamSaver.js
It's not tested it's just a ruffly idea.
This will consume very little memory, but it's limited to only Blink ATM if you wanna use StreamSaver
let download = Promise.coroutine(function* (files) {
const fileStream = streamSaver.createWriteStream('onefile.txt')
const writeStream = fileStream.getWriter()
// Later you will be able to just simply do
// yield res.body.pipeTo(fileStream) instead of pumping
for (let file of files) {
let res = yield fetch(file)
let reader = res.body.getReader()
let pump = () => reader.read()
.then(({ value, done }) => !done &&
// Write one chunk, then get the next one
writeStream.write(value).then(pump)
)
yield pump()
}
// Close the stream when you are done writing
writeStream.close()
}
download([
'http://<server>/<path>/file0001.txt',
'http://<server>/<path>/file9999.txt'
]).then(() => {
alert('all done')
})

Categories