Assuming there's a server storing multiple files (not necessarily text documents):
http://<server>/<path>/file0001.txt ... http://<server>/<path>/file9999.txt
If the user were to download all of those files as one, how would I do it in JavaScript?
Normally the user would have to download all 9999 files and join them on their drive.
How can I prompt the download of a single file and stream the data of multiple files into it as JavaScript fetches them, as if it were one big file?
I imagine it would be something like this (excuse me for lack of javascript, just trying to explain):
with (download prompt of 'onefile.txt') as connection:
    while connection is open:
        for file in file_list:
            get file
            return file.contents
    connection close
Downloading each file and storing it in memory until the last one is retrieved is not a good idea, since the overall size of the combined file can be quite big.
I'm wondering if that's even possible. I can write it in python, but that's another story. I wanted to make it a javascript function on a website.
I'm surprised javascript can't just create a "virtual localhost connection" where it uses some generator to "yield" the contents of each file...
Well, if you use a service worker then you can intercept the response and give it a ReadableStream into which you can "yield" the contents of each file.
This is what StreamSaver does internally, but it takes away all the hassle.
I will show you an example using ES6 generators and StreamSaver.js.
It's not tested; it's just a rough idea.
This will consume very little memory, but if you want to use StreamSaver it's limited to Blink at the moment.
// Promise.coroutine comes from Bluebird; it runs the generator and resolves yielded promises
let download = Promise.coroutine(function* (files) {
const fileStream = streamSaver.createWriteStream('onefile.txt')
const writeStream = fileStream.getWriter()
// Later you will be able to just simply do
// yield res.body.pipeTo(fileStream) instead of pumping
for (let file of files) {
let res = yield fetch(file)
let reader = res.body.getReader()
let pump = () => reader.read()
.then(({ value, done }) => !done &&
// Write one chunk, then get the next one
writeStream.write(value).then(pump)
)
yield pump()
}
// Close the stream when you are done writing
yield writeStream.close()
})
download([
'http://<server>/<path>/file0001.txt',
'http://<server>/<path>/file9999.txt'
]).then(() => {
alert('all done')
})
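For reference, once res.body.pipeTo is available (as hinted in the comment above) the same loop can be written with async/await instead of a Bluebird coroutine. A minimal, untested sketch, assuming StreamSaver.js is loaded and the files are served with CORS enabled:
async function downloadAsOne (files) {
  const fileStream = streamSaver.createWriteStream('onefile.txt')
  for (let i = 0; i < files.length; i++) {
    const res = await fetch(files[i])
    // preventClose keeps the destination stream open between files;
    // only the last pipeTo is allowed to close it
    await res.body.pipeTo(fileStream, { preventClose: i < files.length - 1 })
  }
}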
Background:
I'm using node-fpdf to generate PDF files on my server (MEAN stack). The PDF content is held temporarily in a Readable stream object like this:
// Code obtained directly from the library, 'buffer' holds all the pdf content.
const { Readable } = require('stream');
this.buffer = new Readable({ read() { } });
When I write some data to the PDF with the library functions (say, a string), what the library essentially does is push data onto that stream, like this:
this.buffer.push(someData) // Remember that this.buffer is actually a readable stream.
Once I'm done with the file I write it to disk with the library's own function, which essentially pipes the readable stream into a file write stream, producing the PDF on disk:
const fs = require('fs')
/**
 * This is not the original method; I'm summarizing it for readability.
 * @param {string} path Path to which the PDF file will be written.
 */
Output(path) {
this.Close(); // Finish file with some details, nevermind
this.buffer.pipe(fs.createWriteStream(path))
}
So the whole process goes like this:
1. The client (Angular) provides data and makes an HTTP request to the server (Express).
2. Express handles the request and calls the PDF generation process.
3. Write the desired data to the PDF object.
4. Call pdf.Output('/myOuputDir/mypdf.pdf') (pipe the library's internal readable stream to an fs writable stream).
5. Create an fs readable stream from '/myOuputDir/mypdf.pdf' (this one is handled by me).
6. Pipe my own readable stream to the response object (PDF files can be heavy, so streaming the data seems to be the best approach).
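(For illustration only, a condensed sketch of steps 4-6 in the Express handler; buildPdf and the route path are placeholders, not the real code.)
// assumes `app` (Express) and `fs` are already set up
app.post('/generate-pdf', (req, res) => {
  const pdf = buildPdf(req.body)                         // steps 2-3 (placeholder helper)
  pdf.Output('/myOuputDir/mypdf.pdf')                    // step 4: pipe the pdf stream to disk
  // steps 5-6: read the file back and stream it to the client
  // (this is where it breaks: Output() may not have finished writing yet)
  fs.createReadStream('/myOuputDir/mypdf.pdf').pipe(res)
})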
The problem:
After a couple of successful tests I realized that step 5 sometimes breaks (creating the readable stream from '/myOuputDir/mypdf.pdf') because the actual file hasn't finished being written to disk (or hasn't even been started, so it doesn't exist yet), since step 4 takes some time.
What I have tried:
I've already tried to manually call the library functions (on my own instance) and wrap them in a promise, so that I can detect when the 'pipe' process has finished:
return new Promise((resolve, reject) => {
const writable = fs.createWriteStream(filePath)
// Handle all possible events (or at least the ones VS Code suggests)
writable.on('close', () => console.log('writable close')) // maybe resolve here
writable.on('finish', () => console.log('writable finish'))
writable.on('open', () => console.log('writable open'))
writable.on('pipe', () => console.log('writable pipe'))
writable.on('ready', () => console.log('writable ready'))
writable.on('unpipe', () => console.log('writable unpipe'))
writable.on('drain', () => console.log('writable drain'))
writable.on('error', (err) => reject(err))
// Remember that pdf.buffer is the object that handles the file content, `pdf` is the library instance
pdf.buffer.on('end', () => console.log('readable end')) // maybe resolve here
pdf.buffer.on('error', () => console.log('readable error'))
pdf.buffer.on('pause', () => console.log('readable pause'))
pdf.buffer.on('readable', () => console.log('readable readable'))
pdf.buffer.on('resume', () => console.log('readable resume'))
pdf.buffer.on('close', () => console.log('readable close'))
pdf.Close() // Library function which finishes the pdf file.
pdf.buffer.pipe(writable) // Pipe readable pdf object to writable stream (fs)
})
I've added all these console.log calls to check every possible event on both streams (that way I could resolve the promise on the readable's 'end' or the writable's 'close' event, but for some reason they are never triggered), yet the only logs I get are:
writable pipe
readable pause
readable resume
Summarizing:
What I need is a way to detect when a readable stream (created with the stream class, not fs) finishes piping to a writable stream. I was thinking there must be a function/property to force the readable stream (the pdf) to end, to make it say 'hey, I've got no more data to give you', so that I could handle my issue, but I couldn't find any way of doing this.
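In generic stream terms, what I'm after is something like the sketch below, where pushing null ends the readable and the writable's 'finish' event marks the moment all the data is on disk; I just can't tell where (or whether) the library does the equivalent of push(null):
const fs = require('fs')
const { Readable } = require('stream')

const readable = new Readable({ read() { } })
const writable = fs.createWriteStream('/tmp/out.bin')

readable.pipe(writable)
readable.push(Buffer.from('some data'))
readable.push(null) // signals end-of-stream; without it, 'end'/'finish' never fire
writable.on('finish', () => console.log('all data flushed to the file'))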
An alternative:
I've also thought that I could try to pipe pdf.buffer (remember, it is a readable stream) directly to the Express response object and handle it on the client side, but after lots of reading I couldn't figure out how to type the observable for this, nor how to handle it from an Angular service.
My nodejs version is: v12.22.9
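(For completeness, the alternative above would look roughly like this on the server side; the route and the buildPdf helper are illustrative, and the Angular client would request the response with responseType: 'blob'.)
// Server side: skip the temp file and pipe the library's stream into the response
app.post('/generate-pdf', (req, res) => {
  const pdf = buildPdf(req.body)          // illustrative helper
  res.setHeader('Content-Type', 'application/pdf')
  pdf.Close()                             // finish the document
  pdf.buffer.pipe(res)                    // readable stream -> HTTP response
})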
Over the years on Snapchat I have saved lots of photos that I would now like to retrieve. The problem is they do not make it easy to export, but luckily you can go online and request all your data (that's great).
I can see the download link for each of my photos, and using the local HTML file, if I click download it starts downloading.
Here's the tricky part: I have around 15,000 downloads to do, and manually clicking each individual one will take ages. I've tried extracting all of the links from the download buttons, which gives me lots of URLs (great), but if you paste a URL into the browser you get "Error: HTTP method GET is not supported by this URL".
I've tried a multitude of different Chrome extensions and none of them capture the actual download, just the HTML shown on the left-hand side.
The download button is a clickable link (an <a> element with an href) that just starts the download in the tab.
I'm trying to figure out the best way to bulk-download each of these individual files.
So, I just looked at their code by downloading my own memories. They use a custom JavaScript function to download your data (a POST request with IDs in the body).
You can replicate this request, but you can also just use their method.
Open your console and use downloadMemories(<url>)
Or, if you don't have the URLs, you can retrieve them yourself:
var links = document.getElementsByTagName("table")[0].getElementsByTagName("a");
eval(links[0].href);
UPDATE
I made a script for this:
https://github.com/ToTheMax/Snapchat-All-Memories-Downloader
Using the .json file you can download them one by one with Python (here url is the Download Link of a single memory taken from the JSON):
import requests

# POST to the download link; the response body is the actual file URL
req = requests.post(url, allow_redirects=True)
response = req.text
file = requests.get(response)
Then get the correct extension and the date:
# 'date' and 'type' come from the memory's entry in the JSON file
day = date.split(" ")[0]
time = date.split(" ")[1].replace(':', '-')
filename = f'memories/{day}_{time}.mp4' if type == 'VIDEO' else f'memories/{day}_{time}.jpg'
And then write it to file:
with open(filename, 'wb') as f:
f.write(file.content)
I've made a bot to download all memories.
You can download it here
It doesn't require any additional installation; just place the memories_history.json file in the same directory and run it. It skips files that have already been downloaded.
Short answer
Download a desktop application that automates this process.
Visit downloadmysnapchatmemories.com to download the app. You can watch this tutorial guiding you through the entire process.
In short, the app reads the memories_history.json file provided by Snapchat and downloads each of the memories to your computer.
App source code
Long answer (How the app described above works)
We can iterate over each of the memories within the memories_history.json file found in your data download from Snapchat.
For each memory, we make a POST request to the URL stored as the memory's Download Link. The response will be a URL to the file itself.
Then, we can make a GET request to the returned URL to retrieve the file.
Example
Here is a simplified example of fetching and downloading a single memory using NodeJS:
Let's say we have the following memory stored in fakeMemory.json:
{
"Date": "2022-01-26 12:00:00 UTC",
"Media Type": "Image",
"Download Link": "https://app.snapchat.com/..."
}
We can do the following:
// import required libraries
const fetch = require('node-fetch'); // Needed for making fetch requests
const fs = require('fs'); // Needed for writing to the filesystem

(async () => {
  const memory = JSON.parse(fs.readFileSync('fakeMemory.json'));
  // POST to the download link; the response body is a URL to the file
  const response = await fetch(memory['Download Link'], { method: 'POST' });
  const url = await response.text();
  // We can now use the `url` to download the file.
  const download = await fetch(url, { method: 'GET' });
  const fileName = 'memory.jpg'; // file name we want this saved as
  const fileData = download.body; // contents of the file (a readable stream)
  // Write the contents of the file to disk using Node's file system
  const fileStream = fs.createWriteStream(fileName);
  fileData.pipe(fileStream);
  fileStream.on('finish', () => {
    console.log('memory successfully downloaded as memory.jpg');
  });
})();
I have a local JSON file which I intend to read/write from a NodeJS Electron app. I am not sure, but I believe that instead of using readFile() and writeFile(), I should get a FileHandle to avoid repeated open and close actions.
So I've tried to grab a FileHandle from fs.promises.open(), but the problem seems to be that I am unable to get a FileHandle for an existing file without truncating it to length 0.
const { resolve } = require('path');
const fsPromises = require('fs').promises;
function init() {
// Save table name
this.path = resolve(__dirname, '..', 'data', `test.json`);
// Create/Open the json file
fsPromises
.open(this.path, 'wx+')
.then(fileHandle => {
// Grab the file handle; this only succeeds when the file doesn't
// already exist, because of the 'wx+' flag
this.fh = fileHandle;
})
.catch(err => {
if (err.code === 'EEXIST') {
// File exists
}
});
}
Am I doing something wrong? Are there better ways to do it?
Links:
https://nodejs.org/api/fs.html#fs_fspromises_open_path_flags_mode
https://nodejs.org/api/fs.html#fs_file_system_flags
Because JSON is a text format that has to be read or written all at once and can't be easily modified or added onto in place, you're going to have to read the whole file or write the whole file at once.
So, your simplest option will be to just use fs.promises.readFile() and fs.promises.writeFile() and let the library open the file, read/write it, and close it. Opening and closing a file in a modern OS takes advantage of disk caching, so if you're reopening a file you opened not long ago, it won't be a slow operation. Further, since Node.js performs these operations on libuv's secondary threads, it doesn't block the main thread either, so it's generally not a performance issue for your server.
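For example, a minimal read-modify-write round trip with those two calls might look like this (a sketch, with an illustrative updateJson helper):
const fsPromises = require('fs').promises

async function updateJson(path, mutate) {
  // Read and parse the whole file, apply the change, then write it back in full
  const data = JSON.parse(await fsPromises.readFile(path, 'utf8'))
  mutate(data)
  await fsPromises.writeFile(path, JSON.stringify(data, null, 2))
}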
If you really wanted to open the file once and hold it open, you would open it for reading and writing using the r+ flag as in:
const fileHandle = await fsPromises.open(this.path, 'r+');
Reading the whole file would be simple as the new fileHandle object has a .readFile() method.
const text = await fileHandle.readFile({ encoding: 'utf8' });
For writing the whole file from an open filehandle, you would have to truncate the file, then write your bytes, then flush the write buffer to ensure the last bit of the data got to the disk and isn't sitting in a buffer.
await fileHandle.truncate(0); // clear previous contents
let {bytesWritten} = await fileHandle.write(mybuffer, 0, someLength, 0); // write new data
assert(bytesWritten === someLength);
await fileHandle.sync(); // flush buffering to disk
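Putting those pieces together, a sketch of rewriting the whole file through one open handle (assuming the data is a JS object serialized as UTF-8 JSON; writeWholeFile is an illustrative name):
async function writeWholeFile(fileHandle, obj) {
  const buf = Buffer.from(JSON.stringify(obj), 'utf8')
  await fileHandle.truncate(0)                    // clear previous contents
  const { bytesWritten } = await fileHandle.write(buf, 0, buf.length, 0)
  if (bytesWritten !== buf.length) throw new Error('short write')
  await fileHandle.sync()                         // flush buffering to disk
}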
I was wondering if it was possible to stream data from javascript to the browser's downloads manager.
Using WebRTC, I stream data (from files > 1 GB) from one browser to another. On the receiving side, I store all this data in memory (as ArrayBuffers, so the data is essentially still in chunks), and I would like the user to be able to download it.
Problem: Blob objects have a maximum size of about 600 MB (depending on the browser), so I can't re-create the file from the chunks. Is there a way to stream these chunks so that the browser downloads them directly?
If you want to fetch a large file (blob) from an API or URL, you can use StreamSaver.
npm install streamsaver
Then you can do something like this:
import { createWriteStream } from 'streamsaver';
export const downloadFile = (url, fileName) => {
return fetch(url).then(res => {
const fileStream = createWriteStream(fileName);
const writer = fileStream.getWriter();
if (res.body.pipeTo) {
writer.releaseLock();
return res.body.pipeTo(fileStream);
}
const reader = res.body.getReader();
const pump = () =>
reader
.read()
.then(({ value, done }) => (done ? writer.close() : writer.write(value).then(pump)));
return pump();
});
};
and you can use it like this:
const url = "http://urltobigfile";
const fileName = "bigfile.zip";
downloadFile(url, fileName).then(() => { alert('done'); });
Following @guest271314's advice, I added StreamSaver.js to my project, and I successfully received files bigger than 1 GB on Chrome. According to the documentation, it should work for files up to 15 GB, but my browser crashed before that (the maximum file size was about 4 GB for me).
Note I: to avoid the Blob max-size limitation, I also tried manually appending data to the href of an <a></a>, but it failed with files of about 600 MB ...
Note II: as amazing as it might seem, the basic technique using createObjectURL works perfectly fine on Firefox for files up to 4 GB!
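For reference, the basic createObjectURL technique from Note II is roughly this (a sketch; chunks is the array of ArrayBuffers received over the data channel):
function saveBlob(chunks, fileName) {
  const blob = new Blob(chunks)                 // assembles the whole file in memory
  const a = document.createElement('a')
  a.href = URL.createObjectURL(blob)
  a.download = fileName
  a.click()
  URL.revokeObjectURL(a.href)
}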
I want to download an encrypted file from my server, decrypt it and save it locally. I want to decrypt the file and write it locally as it is being downloaded rather than waiting for the download to finish, decrypting it and then putting the decrypted file in an anchor tag. The main reason I want to do this is so that with large files the browser does not have to store hundreds of megabytes or several gigabytes in memory.
This is only going to be possible with a combination of service worker + fetch + streams.
A few browsers have service workers and fetch, but even fewer support fetch with streaming (Blink):
new Response(new ReadableStream({...}))
I have built a streaming file-saver lib that communicates with a service worker in order to intercept network requests: StreamSaver.js
It's a little different from Node's streams; here is an example.
function unencrypt(){
// should return Uint8Array
return new Uint8Array()
}
// We use fetch instead of xhr because fetch has streaming support
fetch(url).then(res => {
// create a writable stream + intercept a network response
const fileStream = streamSaver.createWriteStream('filename.txt')
const writer = fileStream.getWriter()
// stream the response
const reader = res.body.getReader()
const pump = () => reader.read()
.then(({ value, done }) => {
// Close the writer when the response has been fully read
if (done) return writer.close()
let chunk = unencrypt(value)
// Write one chunk, then get the next one
writer.write(chunk) // returns a promise
// While the write stream can handle the backpressure,
// read more data
return writer.ready.then(pump)
})
// Start the reader
pump().then(() =>
console.log('Closed the stream, Done writing')
)
})
There are also two other ways you can get a streaming response with xhr (responseType = ms-stream || moz-chunked-arrayBuffer), but they're not standard and it doesn't matter whether you use them, because StreamSaver depends on fetch + ReadableStream anyway and can't be used any other way.
Later you will be able to do something like this when WritableStream + transform streams get implemented as well:
fetch(url).then(res => {
const fileStream = streamSaver.createWriteStream('filename.txt')
res.body
.pipeThrough(unencrypt)
.pipeTo(fileStream)
.then(done)
})
It's also worth mentioning that the default download manager is commonly associated with background downloads, so people sometimes close the tab when they see the download start. But all of this is happening in the main thread, so you need to warn the user before they leave:
window.onbeforeunload = function(e) {
if( download_is_done() ) return
var dialogText = 'Download is not finished, leaving the page will abort the download'
e.returnValue = dialogText
return dialogText
}
A new solution has arrived: showSaveFilePicker/FileSystemWritableFileStream, supported in Chrome, Edge, and Opera since October 2020 (with a ServiceWorker-based shim for Firefox, from the author of the other major answer!), will allow you to do this directly:
async function streamDownloadDecryptToDisk(url, DECRYPT) {
// create readable stream for ciphertext
let rs_src = fetch(url).then(response => response.body);
// create writable stream for file
let ws_dest = window.showSaveFilePicker().then(handle => handle.createWritable());
// create transform stream for decryption
let ts_dec = new TransformStream({
async transform(chunk, controller) {
controller.enqueue(await DECRYPT(chunk));
}
});
// stream cleartext to file
let rs_clear = rs_src.then(s => s.pipeThrough(ts_dec));
return (await rs_clear).pipeTo(await ws_dest);
}
Depending on performance—if you're trying to compete with MEGA, for instance—you might also consider modifying DECRYPT(chunk) to allow you to use ReadableStreamBYOBReader with it:
…zero-copy reading from an underlying byte source. It is used for efficient copying from underlying sources where the data is delivered as an "anonymous" sequence of bytes, such as files.
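A rough sketch of what that could look like, assuming the response body is a readable byte stream (as it is for fetch in Blink), reusing one buffer between reads, and with DECRYPT and the destination writable taken from the function above (decryptWithByob is an illustrative name):
async function decryptWithByob(response, DECRYPT, writable) {
  const reader = response.body.getReader({ mode: 'byob' })
  const writer = writable.getWriter()
  let buffer = new ArrayBuffer(64 * 1024)                  // reused between reads
  while (true) {
    const { value, done } = await reader.read(new Uint8Array(buffer))
    if (done) break
    await writer.write(await DECRYPT(value))               // value is a view over the transferred buffer
    buffer = value.buffer                                   // take the buffer back for the next read
  }
  await writer.close()
}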
For security reasons, browsers do not allow piping an incoming readable stream directly to the local file system, so you have two ways to solve it:
1. window.open(Resource_URL): download the resource in a new window, with Content-Disposition set to "attachment";
2. <a download href="path/to/resource"></a>: use the download attribute of an anchor element to save the stream to disk.
Hope these help :)