There is a need to reliably(*) upload big files (100 MB+) from an ASP.NET MVC frontend to the server. Any suggestions (techniques, JS libraries, server components, etc.)?
(*) "Reliably" here means that in case the connection breaks, client and server should be able to pick up the upload from the point where it was interrupted so that no "upload restart" is needed.
Have you looked at http://aquantum-demo.appspot.com/file-upload? Its feature list includes:
Resumable uploads: Aborted uploads can be resumed with browsers supporting the Blob API.
Chunked uploads: Large files can be uploaded in smaller chunks with browsers supporting the Blob API.
So it looks like what you need.
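If it helps, here's a bare-bones sketch of how resumable, chunked uploads work with the Blob API. The /upload and /upload/offset endpoints and the headers below are only illustrative, not the plugin's actual protocol:

// Resumable chunked upload: ask the server how much it already has,
// then send the rest of the file in Blob slices.
async function resumableUpload(file, chunkSize = 5 * 1024 * 1024) {
  // Assumed endpoint that returns how many bytes the server already received
  const res = await fetch(`/upload/offset?name=${encodeURIComponent(file.name)}`);
  let offset = (await res.json()).offset || 0;

  while (offset < file.size) {
    const chunk = file.slice(offset, offset + chunkSize); // Blob API
    await fetch("/upload", {
      method: "POST",
      headers: {
        "Content-Range": `bytes ${offset}-${offset + chunk.size - 1}/${file.size}`,
        "X-File-Name": file.name,
      },
      body: chunk,
    });
    offset += chunk.size;
  }
}

If the connection breaks, calling resumableUpload again picks up from whatever offset the server reports, which is exactly the "no upload restart" behaviour asked for.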
I raised a similar question with Microsoft recently.
They have a story for this in WCF Web API on CodePlex. It's currently at Preview 5, but very functional.
It's the stuff that will be in the next version of WCF.
Related
I am creating a server application to transfer many (could be 100+GB) files from discrete servers as a single tar archive to a browser client via a simple single file download.
Currently I've settled on streaming in each file to the main server handling the web app's HTTP request, adding it to a dynamically created tar archive, and then streaming out the archive as it's created using chunked transfer encoding.
This works nicely but has 2 main downsides:
No download progress indicator for the browser client/user
Difficult to resume a failed/interrupted download
I'm looking for advice on either different techniques to implement this app or ways to address the shortcomings above. Some of the constraints: since the archive is generated on the fly, the server doesn't know exactly how large the final result will be (it does have a rough idea, since there is no compression currently and the size is just total(file_size + file_padding); maybe this could be determined?). And there is limited disk space on the server, along with limited memory on both server and client, so the download needs to occur in a streaming fashion, i.e. not storing it all in memory or on disk on the server or client before writing it out to the client's disk.
I've thought about implementing this by having the browser download chunks of known size and writing them out to a file as it goes, but it's not clear to me that this can be done given today's filesystem access limitations, without holding the entire archive in memory on the client side, and while presenting it as a single download to the user instead of n separate downloads. I kept running into hiccups like this, so I wanted to see what y'all think.
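To make it concrete, here's roughly what I was picturing on the client, assuming the Chromium-only File System Access API; the /download/archive endpoint and updateProgressBar are placeholders of mine:

// Stream the archive straight to disk, counting bytes for a progress indicator.
const handle = await window.showSaveFilePicker({ suggestedName: "archive.tar" });
const writable = await handle.createWritable();

const response = await fetch("/download/archive");
let received = 0;

const progress = new TransformStream({
  transform(chunk, controller) {
    received += chunk.byteLength;
    updateProgressBar(received); // only a rough total is known, so this is an estimate
    controller.enqueue(chunk);
  },
});

await response.body.pipeThrough(progress).pipeTo(writable); // pipeTo closes the file when the response ends

But that only solves progress, not resuming, and it limits me to Chromium browsers.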
Thanks for your help :)
My use case involves uploading thousands of full-quality photo and video files from the browser to S3 and Wasabi storage accounts. Currently we compress them in the client's browser, using dropzonejs to handle the uploading, so everything is compressed before being uploaded to the server.
However, that's what we need to change. We need to upload original-quality photos, and that's where it gets stuck: we cannot upload files larger than 3-4 GB using Dropzonejs. Not sure what prevents it, but we are struggling to find a solution. We also randomly hit a memory limit in Chrome that crashes the tab and forces us to restart the process. With original-quality photos we assume this would not work at all, since we will be talking about at least 10 to 15 GB of data.
What would you recommend for this kind of use case, where we need to upload video & photos in original quality? Sometimes a single photo can be 40 MB+, and a video several GB.
How does Google Photos manage this? We need something like that.
Chunking...
Someone already has a demo:
https://github.com/dropzone/dropzone/blob/main/test/test-sites/2-integrations/aws-s3-multipart.html
But I think 4 GB is the max file size Chrome will accept (and I think Chrome has the highest limit compared to other browsers), which means you need to use another upload method such as FTP, streaming, SCP, etc., or ask your clients to slice the files themselves before uploading through their browser.
Or create a custom executable that bundles an S3 client, and let your clients use that.
Do not compress on the client side. It actually increases memory usage in the browser session. In my experience, uploading the original image from the browser uses the least amount of memory, as the browser only needs to read enough of the file to send the data, as long as you're not loading the picture locally to show thumbnails.
I was able to upload GBs of images to S3 with client-side compression turned off, including a single 20 GB video file and 200 videos totaling over 13 GB using S3 chunk upload. Chunk upload should increase, not decrease, browser memory usage; it was implemented to handle transmission failures for large files.
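For reference, a rough sketch of what chunked (multipart) upload to S3 looks like with the AWS SDK for JavaScript v3; the bucket, region and credentials setup are assumptions you'd replace with your own:

// Multipart upload: the file is sent in parts, a few in parallel, with progress events.
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

async function uploadLargeFile(file) {
  const upload = new Upload({
    client: new S3Client({ region: "us-east-1" /* plus your credentials */ }),
    params: { Bucket: "my-bucket", Key: file.name, Body: file },
    partSize: 10 * 1024 * 1024, // 10 MB parts
    queueSize: 4,               // up to 4 parts in flight at once
  });
  upload.on("httpUploadProgress", (p) => console.log(`${p.loaded}/${p.total}`));
  await upload.done();
}

Because the browser only holds a few parts in memory at a time, the practical file size is bounded by S3's multipart limits rather than by what fits into a single request.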
dropzonejs supports chunking and parallel uploads.
Did you use them?
Do you compress files with Dropzone like this:
https://stackoverflow.com/a/51699311/18399373
You can get a huge performance improvement by writing your client to upload directly to S3 with presigned URL(s), so the upload skips your server as the middleman:
S3:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html
GCP:
https://cloud.google.com/blog/products/storage-data-transfer/uploading-images-directly-to-cloud-storage-by-using-signed-url
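As a rough sketch of the client side, assuming your server exposes a /sign endpoint that returns the presigned URL (that endpoint name is made up):

// PUT the file straight to the presigned URL so it never passes through your server.
async function uploadViaPresignedUrl(file) {
  const signResponse = await fetch(`/sign?key=${encodeURIComponent(file.name)}`);
  const { url } = await signResponse.json();
  await fetch(url, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file, // the browser streams the file from disk
  });
}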
I would recommend using a presigned URL from S3. In our project we generate the presigned URL by supplying the bucket name, object path, upload access to the bucket, and an expiry time. The user can then upload the file directly to S3. AWS takes care of all the networking issues; the only condition is that you have a good internet connection.
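For what it's worth, generating such a URL on the server with the AWS SDK for JavaScript v3 looks roughly like this; the bucket name, key and expiry are examples:

// Server-side sketch: presign a PUT for a given object key.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const client = new S3Client({ region: "us-east-1" });

async function presignUpload(key) {
  const command = new PutObjectCommand({ Bucket: "my-bucket", Key: key });
  return getSignedUrl(client, command, { expiresIn: 900 }); // URL valid for 15 minutes
}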
I'm designing and implementing splitting one big file into small chunks for upload.
I've tried chunk sizes of 1 MB, 10 MB, and 100 MB for uploading 1 GB and 10 GB files.
I tested it in the Chrome browser and found no noticeable performance difference among the chunk sizes above.
Quick questions here. If I choose a chunk size of 1 MB to upload a 10 GB file,
there will be 10K chunks to upload. Is there any limitation in IE, Chrome, or Safari for such an intensive task?
Usually, how many workers/threads will be used for uploading at a time?
Thanks a lot!
for (let chunkIndex = 0; chunkIndex < LAST_CHUNK_INDEX; chunkIndex++) {
  sendChunk(chunkIndex); // using axios or XHR for uploading the chunks
}
I haven't found any official documentation about browser limits on the number of chunks. But there can sometimes be limits when using a particular tool to upload to a specific destination, like this.
As for the working principle, we send each chunk as a separate request.
I found a sample of uploading a file in chunks, and the comment made by rizsi on it is very useful:
The main reason for chunked upload is that the server does not need to store the whole file in memory - which is also possible to work around on the server side when data is streamed to file directly, the second reason is to make big file upload resumable in case the TCP stream breaks.
So we should send each chunk after the previous one has finished. If we sent all chunks at once, it would flood the server with all the pieces simultaneously, making the whole thing pointless.
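A minimal sketch of that sequential approach, assuming axios is available; the /upload endpoint and header names are placeholders:

// Upload chunks one after another; await makes each request wait for the previous one.
async function uploadInChunks(file, chunkSize = 10 * 1024 * 1024) {
  const lastChunkIndex = Math.ceil(file.size / chunkSize);
  for (let chunkIndex = 0; chunkIndex < lastChunkIndex; chunkIndex++) {
    const start = chunkIndex * chunkSize;
    const chunk = file.slice(start, start + chunkSize);
    await axios.post("/upload", chunk, {
      headers: { "X-Chunk-Index": chunkIndex, "X-Total-Chunks": lastChunkIndex },
    });
  }
}

This way the browser also keeps only one chunk in memory at a time.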
We have a single-page application (Rails back-end, Ember.js front-end) where we're currently moving away from a server-side image uploader to a client-side image uploader. We previously used the Carrierwave gem to do resizing and sending to S3 on the server. Now we want to do the resizing (using HTML5 Canvas and the File API) and the sending to S3 directly on the client.
This works well when the user selects an image from his computer. It's definitely way faster for the user and puts less load on the server.
However, our users have gotten used to the "Upload by URL" functionality that we offered. It works like the "Search by image" functionality on Google Image Search. Instead of selecting a file from his computer, the user pastes a URL to an image.
Because of the same-origin policy, we cannot use Canvas API on an external image (it becomes tainted, see https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API/Tutorial/Using_images#Using_images_from_other_domains).
I see two possible solutions:
1) In order to be able to upload to S3 directly from the client, we need to generate a pre-signed key on the server. We could pass the URL of the image in that same request, download it on the server while we generate the pre-signed key, and put the image as a base64 payload in the response.
2) Use a proxy on our domain to bypass the SOP, i.e. access the image on the client as https://www.mydomain.com/?link=http://www.link.to/image/selected/by/user.jpg.
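For context, this is roughly what the client would do under option 2); imageUrl and uploadToS3 below are placeholders:

// Because the proxied URL is same-origin, the canvas is not tainted and toBlob() works.
const img = new Image();
img.onload = () => {
  const canvas = document.createElement("canvas");
  canvas.width = img.width;
  canvas.height = img.height;
  canvas.getContext("2d").drawImage(img, 0, 0);
  canvas.toBlob((blob) => uploadToS3(blob), "image/jpeg", 0.9); // then the usual direct-to-S3 upload
};
img.src = "https://www.mydomain.com/?link=" + encodeURIComponent(imageUrl);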
My questions are:
Do you know any other way to bypass the same-origin policy to provide an "Upload by URL" functionality?
Which solution do you think is best?
How hard is it to set up 2)? I have no experience in setting up proxies. FWIW, we host our application on Heroku.
I hope the situation I described is clear enough.
Thank you!
Yoran
Yes, you could force your clients to download the other-domain image to their local drives and then upload that copy from there.
"Best" is subjective & relative to your configuration. The traditional workaround is your option#2--bounce the image off your server. Really all you're doing is having your server upload the image and re-serve it to the client. If you're expecting a huge volume of images, then forcing the client to download their own images might be better rather than gumming up your server by "cleaning" their images.
How hard to set up? It's fairly easy...after all you're just having some server code pull a remote image and save it to a specified server directory. The only modestly hard part is:
Make sure the server never interprets one of those client-supplied URLs as executable content (viruses!)
Clear the new directory often so the server is not over-weighted with images loaded for the client
Set limits on the size and quantity of images the client can upload to your server (denial-of-service attack!).
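A minimal sketch of such a proxy, assuming a Node/Express server; the route name, size limit and checks are illustrative, and you'd still want allow-listing and cleanup on top:

// Fetch the remote image on the server and stream it back to the client.
import express from "express";
import { Readable } from "node:stream";

const app = express();
const MAX_BYTES = 10 * 1024 * 1024; // reject anything over 10 MB

app.get("/proxy", async (req, res) => {
  const remote = await fetch(req.query.link);
  const type = remote.headers.get("content-type") || "";
  const length = Number(remote.headers.get("content-length") || 0);

  // Only pass through reasonably sized images
  if (!remote.ok || !type.startsWith("image/") || length > MAX_BYTES) {
    return res.status(400).send("Not an acceptable image");
  }

  res.set("Content-Type", type);
  Readable.fromWeb(remote.body).pipe(res); // stream, don't buffer the whole image in memory
});

app.listen(3000);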
I am trying to create a utility website that will parse CSV files uploaded by clients.
I want the processing to happen completely on the client side, rather than having the file uploaded to some server and a server program parsing out its contents. Is this possible? I'm a backend guy, so any frontend advice would be helpful.
If you are willing to restrict yourself to supported browsers, you can use the HTML5 FileReader API:
The main problem is IE < 10.
http://caniuse.com/filereader
More info on File API:
browser load local file without upload
http://www.w3.org/TR/FileAPI/
Once you have access to the file, parsing the CSV with JavaScript is easy with existing libraries such as:
https://github.com/gkindel/CSV-JS
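A minimal sketch of reading a user-selected CSV entirely in the browser; the #csvInput element is an assumption, and the naive split below ignores quoted commas, which is where a library like the one above helps:

// Read the selected file with FileReader and split it into rows and cells.
document.querySelector("#csvInput").addEventListener("change", (event) => {
  const file = event.target.files[0];
  const reader = new FileReader();
  reader.onload = () => {
    const rows = reader.result
      .split(/\r?\n/)
      .filter((line) => line.length > 0)
      .map((line) => line.split(","));
    console.log(rows); // array of arrays, one per CSV row; nothing ever leaves the browser
  };
  reader.readAsText(file);
});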