Handling large web uploads for photos and videos - JavaScript

My use case involves uploading thousands of full-quality photo and video files from the browser to S3 and Wasabi storage accounts. Currently we compress them in the client's browser using Dropzone.js, which also handles the upload, so right now the files are compressed before being uploaded to the server.
However, that is what we need to change. We need to upload original-quality photos, and that's where it gets stuck: we cannot upload more than 3-4 GB using Dropzone.js. We're not sure what prevents it, and we are struggling to find a solution. We also randomly hit a memory limit in Chrome, which crashes, and the process has to be restarted. With original-quality photos we assume this would not work, as we will be talking about at least 10 to 15 GB of data.
What would you recommend for this kind of use case, where we need to upload videos and photos in original quality? Sometimes a single photo can take 40 MB or more, and a video several GB.
How does Google Photos manage this? We need something like that.

Chunking. Someone already has a demo:
https://github.com/dropzone/dropzone/blob/main/test/test-sites/2-integrations/aws-s3-multipart.html
But I think 4 GB is the maximum file size that Chrome will accept (and I think Chrome has the highest limit compared to other browsers), which means you need another upload method such as FTP, streaming, or SCP, or you can ask your clients to split the files themselves before uploading through their browser.
Or create a custom executable that bundles an S3 client, and let your clients use that.
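A sketch of the client half of an S3 multipart upload like the linked demo, under stated assumptions: the backend (not shown) calls CreateMultipartUpload, hands out presigned UploadPart URLs through the hypothetical `getPartUrl` hook, and finishes with CompleteMultipartUpload; `putPart` is likewise a hypothetical hook that PUTs one part and returns its ETag. The part-size arithmetic follows S3's documented limits (5 MiB minimum for every part but the last, at most 10,000 parts). Reading the ETag in the browser also requires the bucket's CORS configuration to expose the `ETag` header.

```javascript
// Sketch: the client half of an S3 multipart upload for one large File/Blob.
// S3 limits: every part except the last must be at least 5 MiB, and an
// upload may have at most 10,000 parts.
const MIB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MIB;
const MAX_PARTS = 10000;

// Start from a preferred part size (8 MiB is an arbitrary choice) and grow it
// if the file would otherwise need more than MAX_PARTS parts.
function choosePartSize(fileSize, preferred = 8 * MIB) {
  return Math.max(preferred, MIN_PART_SIZE, Math.ceil(fileSize / MAX_PARTS));
}

// Byte ranges [start, end) with 1-based part numbers, as S3 expects.
function planParts(fileSize, partSize = choosePartSize(fileSize)) {
  const parts = [];
  for (let start = 0, n = 1; start < fileSize; start += partSize, n++) {
    parts.push({ partNumber: n, start, end: Math.min(start + partSize, fileSize) });
  }
  return parts;
}

// getPartUrl(partNumber) and putPart(url, blob) are HYPOTHETICAL hooks:
// the first asks your backend for a presigned UploadPart URL, the second
// PUTs the bytes and resolves to the ETag S3 returns for that part.
// file.slice() does not read any data itself; bytes are read as each part is sent.
async function uploadParts(file, getPartUrl, putPart) {
  const completed = [];
  for (const { partNumber, start, end } of planParts(file.size)) {
    const url = await getPartUrl(partNumber);
    const etag = await putPart(url, file.slice(start, end));
    completed.push({ PartNumber: partNumber, ETag: etag });
  }
  // Hand this list to the backend so it can call CompleteMultipartUpload.
  return completed;
}
```

Because the part size only grows when needed, small files get a few 8 MiB parts while a 100 GiB file still fits within the 10,000-part limit.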

Do not compress on the client side. It actually increases memory usage in the browser session. In my experience, uploading the original image from the browser uses the least memory, since the browser only needs to read enough of the file to send the data, as long as you're not loading the picture locally to show thumbnails.
I was able to upload GBs of images to S3 with client-side compression turned off. Using S3 chunked upload, I was able to upload a single 20 GB video file, and 200 videos totaling over 13 GB, to S3. Chunked upload should, if anything, increase rather than decrease browser memory usage; it was implemented to handle transmission failures for large files.
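The failure handling that answer credits chunked upload with could look roughly like this sketch: only the failed chunk is retried, with a growing delay. `sendChunk` is a hypothetical function standing in for whatever uploads one chunk, and the retry count and delays are arbitrary example values.

```javascript
// Sketch: retry one chunk upload with exponential backoff, so a dropped
// connection costs one chunk instead of the whole file.
// `sendChunk` is HYPOTHETICAL: any function returning a Promise that rejects on failure.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry(sendChunk, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await sendChunk();
    } catch (err) {
      // Give up once the initial attempt plus `retries` extra attempts have failed.
      if (attempt >= retries) throw err;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```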

Dropzone.js supports chunking and parallel uploads.
Did you use them?
Do you compress files with Dropzone like this:
https://stackoverflow.com/a/51699311/18399373
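A minimal sketch of the Dropzone options these questions point at, assuming Dropzone 5.x or later. The `/upload` URL, the 20 GB `maxFilesize`, and the chunk size are placeholder values, not recommendations. With chunking on, the server endpoint has to reassemble the pieces (Dropzone sends fields such as `dzuuid` and `dzchunkindex` with each chunk), and with `parallelChunkUploads` it must accept them out of order.

```javascript
// Sketch of Dropzone options for original-quality uploads (assumes Dropzone
// 5.x or later; "/upload" is a placeholder endpoint). In the page you would
// pass it as: new Dropzone("#uploader", dropzoneOptions)
const dropzoneOptions = {
  url: "/upload",
  maxFilesize: 20 * 1024,      // in MB; the default (256) would reject large videos
  chunking: true,              // split each file into chunks before sending
  chunkSize: 10 * 1024 * 1024, // bytes per chunk (the default is 2,000,000)
  parallelChunkUploads: true,  // send a file's chunks concurrently
  retryChunks: true,           // retry a failed chunk instead of failing the file
  retryChunksLimit: 3,
  parallelUploads: 2,          // number of files processed at the same time
  // resizeWidth / resizeHeight / resizeQuality are deliberately left unset:
  // without them Dropzone does not resize, so the original file is sent.
};
```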

You can get a huge performance improvement by having your client upload directly to S3 using signed URL(s), skipping the server as the middleman:
S3:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html
GCP:
https://cloud.google.com/blog/products/storage-data-transfer/uploading-images-directly-to-cloud-storage-by-using-signed-url
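A minimal browser-side sketch of the S3 variant, assuming your backend has already produced a presigned PUT URL for the object (how it is generated is not shown). `uploadToPresignedUrl` and the injectable `fetchImpl` parameter are illustrative names. A single S3 PUT is capped at 5 GB, so multi-GB videos need multipart upload instead.

```javascript
// Sketch: upload one File/Blob straight to S3 (or an S3-compatible store)
// with a presigned PUT URL, so the bytes never pass through your server.
// `presignedUrl` is ASSUMED to come from your backend; `fetchImpl` defaults
// to the global fetch and is a parameter only so the sketch can be tested.
async function uploadToPresignedUrl(file, presignedUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(presignedUrl, {
    method: "PUT",
    // Passing the File itself avoids reading it into an ArrayBuffer first.
    body: file,
    // If the URL was signed with a Content-Type, this must match it.
    headers: { "Content-Type": file.type || "application/octet-stream" },
  });
  if (!response.ok) {
    throw new Error(`Upload failed: ${response.status} ${response.statusText}`);
  }
  // S3 returns the object's ETag; the bucket's CORS rules must expose it.
  return response.headers.get("ETag");
}
```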

I would recommend using presigned URLs from S3. In our project we generate a presigned URL from the bucket name, object path, upload permission, and expiry time. The user can then upload the file directly to S3. AWS takes care of the networking issues; the only requirement is a good internet connection.

Related

How to stream a dynamically created tar archive to a browser client as a single file download?

I am creating a server application to transfer many files (could be 100+ GB) from discrete servers to a browser client as a single tar archive, via a simple single-file download.
Currently I have settled on streaming each file to the main server that handles the web app's HTTP request, adding it to a dynamically created tar archive, and then streaming the archive out as it is created using chunked transfer encoding.
This works nicely but has two main downsides:
No download progress indicator for the browser client/user
Difficult to resume a failed/interrupted download
I'm looking for advice on either different techniques to implement this app or ways to address the shortcomings above. One constraint is that, since the archive is generated on the fly, the server doesn't know exactly how large the final result will be. (It does have a rough idea, since the archive is not compressed currently: it's just total(file_size + file_padding). Maybe this could be determined?) There is also limited disk space on the server, and limited memory on both server and client, so the download needs to happen in a streaming fashion, i.e. without storing it all in memory or on disk on the server or client before writing it out to the client's disk.
I've thought about implementing this by having the browser download chunks of known size and write them out to a file as it goes, but it's not clear to me that this can be done with today's filesystem access limitations, without holding the entire archive in memory on the client side, and while presenting it to the user as a single download instead of n separate downloads. I kept running into hiccups like this, so I wanted to see what y'all think.
Thanks for your help :)
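On the question's own idea that the size could be determined: for an uncompressed tar the arithmetic is deterministic, so a sketch like this could compute a Content-Length up front, which is what a browser needs to show download progress. It assumes plain ustar entries (every name fits the header, every file is under the 8 GiB ustar size limit) and a tar library that writes no extra metadata entries; long names or larger files need PAX/GNU extended headers, which add blocks. `tarSize` is an illustrative name.

```javascript
// Sketch: exact size of an uncompressed tar stream, so the server can send a
// Content-Length header. ASSUMES plain ustar entries: names fit the header and
// files are under 8 GiB, so no PAX/GNU extended headers are written.
const BLOCK = 512;

// Round a byte count up to a whole number of 512-byte blocks.
function paddedSize(bytes) {
  return Math.ceil(bytes / BLOCK) * BLOCK;
}

// files: [{ name, size }] with sizes in bytes. recordSize: some writers
// (GNU tar defaults to 10240-byte records) pad the whole archive further.
function tarSize(files, recordSize = BLOCK) {
  let total = 0;
  for (const f of files) {
    total += BLOCK;              // one header block per entry
    total += paddedSize(f.size); // file data, zero-padded to a block boundary
  }
  total += 2 * BLOCK;            // end-of-archive marker: two zero blocks
  return Math.ceil(total / recordSize) * recordSize;
}
```

Because the layout is fixed, the same arithmetic could also map a byte offset from an HTTP Range request back to a file and a position within it, which is what resuming an interrupted download would need.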

How browsers handle uploading multiple file parts asynchronously

I'm designing and implementing a way to split one big file into small chunks for upload.
I've tried chunk sizes of 1 MB, 10 MB, and 100 MB for uploading 1 GB and 10 GB files.
I tested it in Chrome and found no significant performance difference between these chunk sizes.
Quick question: if I choose a chunk size of 1 MB to upload a 10 GB file,
there will be 10K chunks to upload. Are there any limitations in IE, Chrome, or Safari for this kind of intensive task?
Usually, how many workers/threads are used for uploading at a time?
Thanks a lot!
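const chunkCount = Math.ceil(file.size / CHUNK_SIZE);
for (let chunkIndex = 0; chunkIndex < chunkCount; chunkIndex++) {
  const chunk = file.slice(chunkIndex * CHUNK_SIZE, (chunkIndex + 1) * CHUNK_SIZE);
  sendChunk(chunkIndex, chunk); // Using axios or xhr for uploading files.
}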
I haven't found any official documentation about browser limits on the number of chunks. But there can be limits when using a particular tool to upload to a specific destination, like this.
As for how it works, each chunk is sent as a separate request.
I found a sample of uploading a file in chunks, and I found the comment by rizsi very useful:
The main reason for chunked upload is that the server does not need to store the whole file in memory - which is also possible to work around on the server side when data is streamed to file directly, the second reason is to make big file upload resumable in case the TCP stream breaks.
So we should send each chunk after the previous one has finished. If we send all chunks at once, it would flood the server with all the pieces at once, making the whole thing pointless.
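A sketch of the sending loop with a configurable number of requests in flight. With `limit = 1` it sends one chunk after another, as the answer recommends; a small limit above 1 is a common middle ground, but that is an assumption here, not something the answer says. `sendChunk` is a hypothetical function returning a Promise. On the "how many threads" part of the question: browsers cap concurrent HTTP/1.1 connections per host (Chrome allows six), so requests beyond that simply queue.

```javascript
// Sketch: upload chunks with at most `limit` requests in flight.
// limit = 1 sends chunks strictly one after another.
// `sendChunk(index)` is HYPOTHETICAL: e.g. an axios or fetch call returning a Promise.
async function uploadChunks(chunkCount, sendChunk, limit = 1) {
  let next = 0;
  const results = new Array(chunkCount);
  // Each worker repeatedly claims the next unsent chunk index. JavaScript is
  // single-threaded, so `next++` cannot be claimed twice.
  async function worker() {
    while (next < chunkCount) {
      const index = next++;
      results[index] = await sendChunk(index);
    }
  }
  const workers = Array.from({ length: Math.min(limit, chunkCount) }, () => worker());
  await Promise.all(workers);
  return results;
}
```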

How to use Cloud Storage to multipart upload Music/Video (for Laravel)?

I'm thinking of using Amazon S3 for storage (if you have a better option, don't hesitate to recommend it). I could do a normal file upload, but let's say I'm uploading a very large video/music file (say > 20 MB): I will hit PHP's upload_max_filesize.
Yes, I could just raise the upload_max_filesize value, but I am thinking of something else.
How can I break those files into smaller chunks and upload each chunk (like torrent downloads, which are split into chunks)? And stop and resume them anytime the user wants?
This (http://www.hjsplit.org/php/) does not work, because it needs to read the file ON THE SERVER first, before uploading it, so it's useless here. Are there any Laravel/PHP/JS libraries for this?
Or, I wonder how YouTube does it?
Chunked uploads work by splitting the file into pieces in JavaScript and sending the numbered pieces to the backend. The backend typically copies the content of each chunk it receives into a file it creates, and keeps adding up the chunk sizes until they make up the full size of the original file you wanted to upload.
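A Node.js sketch of the backend behavior that answer describes (the question is about Laravel, so treat this as an illustration of the idea, not PHP code). `createAssembler` is a hypothetical helper; it assumes every chunk except the last is exactly `chunkSize` bytes and that each chunk arrives once. Real code would also persist this state per upload ID so an upload can resume.

```javascript
// Sketch (Node.js): write each numbered chunk at its own offset in the target
// file and count received bytes until they reach the full size.
// ASSUMES: chunks other than the last are exactly `chunkSize` bytes, and no
// chunk is sent twice. Chunks may arrive in any order.
async function createAssembler(path, totalSize, chunkSize) {
  const { open } = await import("node:fs/promises");
  const handle = await open(path, "w");
  let received = 0;
  return {
    // index: 0-based chunk number; chunk: a Buffer or Uint8Array.
    // Resolves to true once the whole file has been received.
    async write(index, chunk) {
      await handle.write(chunk, 0, chunk.length, index * chunkSize);
      received += chunk.length;
      if (received >= totalSize) {
        await handle.close();
        return true;
      }
      return false;
    },
  };
}
```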
As for packages, look at the official AWS SDK for PHP: http://docs.aws.amazon.com/AmazonS3/latest/dev/usingHLmpuPHP.html
Look at the second example, which uses its multipart upload feature.

Upload image by URL in single-page application with Canvas and File API

We have a single-page application (Rails back end, Ember.js front end) where we're currently moving away from a server-side image uploader to a client-side image uploader. We previously used the Carrierwave gem to do the resizing and sending to S3 on the server. Now we want to do the resizing (using HTML5 Canvas and the File API) and the sending to S3 directly on the client.
This works well when the user selects an image from his computer. It's definitely way faster for the user and puts less load on the server.
However, our users have gotten used to the "Upload by URL" functionality that we offered. It works like the "Search by image" functionality on Google Image Search. Instead of selecting a file from his computer, the user pastes a URL to an image.
Because of the same-origin policy, we cannot use Canvas API on an external image (it becomes tainted, see https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Using_images#Using_images_from_other_domains).
I see two possible solutions:
1) In order to upload to S3 directly from the client, we need to generate a pre-signed key on the server. We could pass the image URL in that same request, download the image on the server while we generate the pre-signed key, and return the image as a base64 payload in the response.
2) Use a proxy on our domain to bypass the SOP, so the client accesses the image as https://www.mydomain.com/?link=http://www.link.to/image/selected/by/user.jpg.
My questions are:
Do you know any other way to bypass the same-origin policy to provide an "Upload by URL" feature?
Which solution do you think is best?
How hard is it to set up 2)? I have no experience setting up proxies. FWIW, we host our application on Heroku.
I hope the situation I described is clear enough.
Thank you!
Yoran
Yes, you could force your clients to download the other-domain image to their local drives and then upload that copy from their local drives.
"Best" is subjective and relative to your configuration. The traditional workaround is your option #2: bounce the image off your server. Really, all you're doing is having your server fetch the image and re-serve it to the client. If you're expecting a huge volume of images, forcing clients to download their own images might be better than gumming up your server by "cleaning" their images.
How hard is it to set up? It's fairly easy; after all, you're just having some server code pull a remote image and save it to a specified server directory. The only modestly hard parts are:
Make sure the server never interprets one of those client-supplied URLs as executable (viruses!)
Clear the new directory often so the server is not over-weighted with images loaded for the client
Set limits on the size and quantity of images the client can upload to your server (denial-of-service attack!).
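A sketch of two of those precautions for the proxy approach, assuming Node 18+ (global `fetch` and `Response`). The helper names and the 10 MB cap are illustrative. It keeps the image in memory instead of a directory, which sidesteps the cleanup step; a production proxy would also need to refuse URLs that resolve to private or internal addresses, which this sketch does not do.

```javascript
// Sketch of safety checks for an "upload by URL" proxy (Node 18+ assumed).
// The 10 MB cap is an example value, not a recommendation.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;

// Accept only http(s) URLs; new URL() throws on malformed input.
function checkImageUrl(raw) {
  const url = new URL(raw);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("Only http(s) URLs are allowed");
  }
  return url;
}

// Read a fetch Response body, refusing non-images and aborting once the body
// exceeds maxBytes, so a huge or endless response cannot exhaust memory.
async function readLimited(response, maxBytes = MAX_IMAGE_BYTES) {
  const type = response.headers.get("content-type") || "";
  if (!type.startsWith("image/")) throw new Error(`Not an image: ${type}`);
  const reader = response.body.getReader();
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel();
      throw new Error("Image too large");
    }
    chunks.push(value);
  }
  return Buffer.concat(chunks);
}
```

A handler would call `checkImageUrl`, then `readLimited(await fetch(url))`, and send the bytes back with the image content type, never writing them anywhere they could be executed.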

jQueryFileUpload upload a file given its URL

I'd like to use jQueryFileUpload to upload a file that is not on my computer but on an external website, so all I have is its URL, e.g., https://dl.dropboxusercontent.com/s/wkfr8d04dhgbd86/onarborLogo64.png.
I'm at a total loss on how to do this, but I think it involves adding the file data programmatically rather than using the traditional <label for='myComputerFiles'>-based file selection.
If this is correct, what next? FileReader()?
Any thoughts would be appreciated.
You should do this on the server. Doing it on the client requires user intervention to download the file locally, and that just seems hacky and unfriendly.
Reason? You would have to first download the file (an indeterminate process) and then upload it to the server. If you pass the URL to the server, it can perform the whole process in one action: a download (which is effectively the same as you uploading it). Also, the ability to read local files, which is what FileReader is for, does not mean you should download files just to upload them again. That's bad logic, and your users will not appreciate it.
Also, Dropbox Chooser is not meant to be a way to download files. It's meant to be a replacement for downloading files, or uploading them to other servers... "...without having to worry about the complexities of implementing a file browser, authentication, or managing uploads and storage."
Since you're using S3, if S3 has an API call that lets you specify a URL, that would be the most obvious thing to use. If not, you either need to download the file for the user (onto your server) and then upload it to S3, or you're back to the original idea of downloading on the client and uploading from there. Either way, S3 adds another layer of complication, but I'd start by getting a URL from the client and fetching that file onto my server, so I could do anything I wanted after that.
This previous question may be of some help in this area...
How to upload files directly to Amazon S3 from a remote server?
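S3's API has no "fetch this URL" operation (CopyObject only copies objects that are already in S3), so the server-side route that answer suggests could look like this sketch. It assumes Node 18+ and a presigned PUT URL generated elsewhere; `copyUrlToS3`, its options, and the 50 MB cap are illustrative. The body is buffered so fetch can send an exact Content-Length; the cap is checked against the declared length first and the actual length after download, so a streaming byte counter would enforce it more strictly.

```javascript
// Sketch (Node 18+): fetch a user-supplied URL on the server and re-upload it
// to S3 through a presigned PUT URL. `presignedPutUrl` is ASSUMED to be
// generated elsewhere; `fetchImpl` is injectable only for testing.
async function copyUrlToS3(sourceUrl, presignedPutUrl, {
  maxBytes = 50 * 1024 * 1024,
  fetchImpl = globalThis.fetch,
} = {}) {
  const source = await fetchImpl(sourceUrl);
  if (!source.ok) throw new Error(`Download failed: ${source.status}`);
  // Reject early if the remote server declares an oversized body.
  if (Number(source.headers.get("content-length")) > maxBytes) throw new Error("File too large");
  // Buffer the body: fetch then sends an exact Content-Length with the PUT.
  const body = new Uint8Array(await source.arrayBuffer());
  if (body.byteLength > maxBytes) throw new Error("File too large");
  const put = await fetchImpl(presignedPutUrl, {
    method: "PUT",
    body,
    headers: { "Content-Type": source.headers.get("content-type") || "application/octet-stream" },
  });
  if (!put.ok) throw new Error(`S3 upload failed: ${put.status}`);
  return body.byteLength;
}
```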
