How to efficiently handle large file uploads with django+react? - javascript

I have a Django application with a React frontend, and I need to upload multi-gigabyte files (around 8-12 GB), selected in the React frontend, to a remote machine.
Currently I'm using a simple form and uploading the files via fetch with FormData in the body, then reading them out of request.FILES; however, I'm not very happy with this solution, since it takes over a minute to upload the files even from the same computer to itself (and I'm not entirely sure why).
Is there any way to speed up this process? The production environment for this is a gigabit local network without any external internet access so no cloud storage please.
The data seems to be fairly compressible, easily reducing its size by 30%+ when zipped/tarred.
Is there any way to compress during file uploads, or any parameters I can change on either end to speed up the process?
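(A minimal sketch of what in-browser compression could look like, assuming a browser new enough to have CompressionStream, which postdates the stack listed below; the /upload/ endpoint name is a placeholder:)

```javascript
// Illustrative only: gzip-compress a File in the browser before uploading.
// CompressionStream is a much newer API than the stack in this question,
// and the /upload/ endpoint is made up. The Django view would need to
// gunzip the payload before saving it.
async function uploadCompressed(file) {
  const compressedStream = file.stream().pipeThrough(new CompressionStream('gzip'));
  const compressedBlob = await new Response(compressedStream).blob();

  const body = new FormData();
  body.append('file', compressedBlob, file.name + '.gz');

  return fetch('/upload/', { method: 'POST', body });
}
```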
It doesn't seem like an 8 GB file should take over a minute to upload from the same machine to itself, and in fact should theoretically be faster than that over a gigabit network. How can I streamline this?
It'd be nice if I could show a progress bar on the frontend, but fetch doesn't currently allow that; if another method that helps with the file uploads also happens to have a way of monitoring upload progress, please mention it.
React 15.4, Django 1.10, Python 3.5. Lots of memory, many CPU cores, and even a GPU on the server (in case someone has some bizarre idea to use CUDA for decompression or something, though I've never heard of such a thing) are available, in case there's some sort of parallelization that could help saturate the connection.
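(A minimal sketch of upload-progress reporting with XMLHttpRequest, which, unlike fetch at the time, exposes upload progress events; the endpoint and callback names are placeholders:)

```javascript
// Minimal sketch: upload a FormData payload with XMLHttpRequest so the
// browser reports upload progress. Endpoint and callback are placeholders.
function uploadWithProgress(file, setProgress) {
  return new Promise((resolve, reject) => {
    const body = new FormData();
    body.append('file', file);

    const xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload/');
    xhr.upload.onprogress = (e) => {
      if (e.lengthComputable) setProgress(e.loaded / e.total);
    };
    xhr.onload = () => resolve(xhr.response);
    xhr.onerror = reject;
    xhr.send(body);
  });
}
```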

Related

Save file changes on Heroku

I have a node.js app on Heroku that checks for emails, and if an email matches certain criteria it replies.
Because Heroku restarts the dyno every so often, I made a file that saves the IDs of the emails (a small array) I've already checked (so it doesn't reply twice to the same email), but silly me, Heroku resets that file too, so none of the changes I make are saved.
Is there a way to save the file changes the app makes?
Or, if you know of a better way to do what I want, please share it.
Heroku enforces this behavior (the local file deletion stuff) because it is a best practice. Writing files locally doesn't scale well, and can lead to some odd edge-case behaviors when you have multiple processes on the same VM all performing file I/O operations.
What you should use instead is either a database, a cache (like Redis), or even just write your file directly to a file storage service like Amazon S3.
I realize it sounds annoying that you have to do these extra things even for a simple use case like what you're doing here -- but Heroku's platform is geared around enforcing best practices to help people build scalable, reliable software.
If you're looking for a way to do stuff like this without the extra hassle, you might want to consider just purchasing a small VPS server from another company where you can have direct control over processes, disk, etc.
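For illustration, a rough sketch of the Redis option using the node-redis client; the key name and the REDIS_URL variable are placeholders, not something taken from your app:

```javascript
// Sketch: track already-handled email ids in a Redis set instead of a local
// file, so the data survives dyno restarts. Key name and env var are illustrative.
const { createClient } = require('redis');

const client = createClient({ url: process.env.REDIS_URL });

async function alreadyHandled(emailId) {
  return client.sIsMember('handled-email-ids', emailId);
}

async function markHandled(emailId) {
  await client.sAdd('handled-email-ids', emailId);
}

async function main() {
  await client.connect();
  // ... poll the mailbox, and for each email:
  // if (!(await alreadyHandled(email.id))) { reply(email); await markHandled(email.id); }
}
```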

Downloading large datasets in JS

I have a web application that has an OWIN SelfHost WebAPI backend and an AngularJS frontend.
In one particular view I want to be able to collect a ridiculously large amount of data from a server API and then (on demand, not immediately) write it to a client-side CSV file.
The unformatted binary data is about 80 kB per second or 4.6 MB per minute; I'm expecting the CSV to be a lot bigger than this, but I don't have an exact measurement yet. Most uses are expected to be up to about 30 minutes (140 MB) but I'd like to support capturing as long as feasible.
At the moment I have API code to start the data capture and store this in server memory, but it quickly runs out. So I'm looking for alternate solutions. (I haven't written the client-side part yet, so that's more malleable.)
My first thought was to have the server write a temporary file/db to hold the data during the capture, then reload and stream it in chunks to the JS when the download was requested. But isn't this just going to make the browser run out of memory too?
An added complication is that at the moment the server just has the "raw" data, which is sent as JSON to the JS to perform some tweaks to it before the JS formats it to CSV. Would it be better to move this to the server side, so that the client side just downloads pre-baked CSV? (This would be mildly annoying since it will require duplicating a small amount of code, but it's fairly straightforward.)
The server doesn't need to handle many concurrent requests (especially not of this type, though it's not impossible) but given a choice between the two I'd prefer making the client do most of the work.
I'm targeting recent browsers (mostly Chrome) so using new APIs is ok (research suggests Blob is the most appropriate, though I'm not sure if that allows client-side reformatting), though cross-compatibility is always nice.
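(For reference, a minimal sketch of the Blob approach as I understand it: build the CSV text in memory and hand it to the browser as a download. The row fields here are made up, and whether all the rows fit in memory is exactly the question above.)

```javascript
// Sketch: build a CSV string from rows already held in the page and trigger
// a client-side download via a Blob object URL. Field names are placeholders.
function downloadCsv(rows, filename) {
  const header = 'timestamp,value\n';
  const body = rows.map((r) => `${r.timestamp},${r.value}`).join('\n');
  const blob = new Blob([header, body], { type: 'text/csv' });

  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = filename;
  a.click();
  URL.revokeObjectURL(a.href);
}
```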

Processing file on front-end vs back-end

I am developing a web application with AngularJS as the front-end and a CRUD service at the backend. One of the requirements is to allow the user to upload a CSV file containing a list of items to be created. This can be implemented on the front-end by parsing the file in JavaScript and making a create API call to the server for each item. However, I am not sure if this approach is better than passing the file to the server and doing all the processing there. What are the advantages/disadvantages of both approaches? What is the common practice in such a scenario?
There are 4 things that I would use to make this decision:
Do you have very high load? If you parse it on the client, you are using the client's CPU. Parsing it on the server could cost you by needing more CPUs.
Access to developer talent: is your team more productive programming it on the client or the server side?
If the answer to the above does not give a clear answer, then I would put it on the server side as it would be easier to test.
Will the "upload TSV" functionality be used by other parties/apps, who consume your API -- or is only the frontend using this functionality ?
Since I have implemented this scenario, I couldn't resist responding. I believe the following things should be considered (in addition to the points mentioned above):
The size of the file (huge files freeze the UI, no-brainer); it can even crash some not-so-modern browsers.
Does the file need parsing/sanitizing of its contents? (You would not want garbage to make its way to your server.)
Does the user need feedback with load summary details after the upload? (Async vs. sync) - This ties back to #1.
Regardless, you'll end up using some variation of the bulk copy at the backend.
Well, I think it's advisable to parse files at the backend. You get so many options, like:
saving the file for reference
reducing the load on your users' resources (RAM and CPU, depending on the size of the file and the operation being done on it before pushing to the backend)
re-initiating activity on the file if there is an error during the batch (if the error is in code, you can reproduce it and help out the client because you've got the file šŸ˜‰)
Unless files are always, say, <1 MB CSV or TXT, just do the processing on the backend.
I hope this helps šŸ˜.
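For concreteness, a rough sketch of the front-end variant being debated here (FileReader plus one create call per parsed row; the endpoint and column names are made up for illustration):

```javascript
// Sketch: read a CSV file in the browser, parse it naively (no quoted fields),
// and POST each row to a create endpoint. Endpoint and columns are placeholders.
function importCsv(file) {
  const reader = new FileReader();
  reader.onload = async () => {
    const lines = reader.result.split('\n').filter((line) => line.trim());
    for (const line of lines.slice(1)) {          // skip header row
      const [name, quantity] = line.split(',');
      await fetch('/api/items', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ name, quantity: Number(quantity) }),
      });
    }
  };
  reader.readAsText(file);
}
```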

Resizing and compressing AJAX images in Node/AngularJS application

I'm building an app in AngularJS, built on Node/ExpressJS. I have a list of images which are hosted externally (and I have no access to them to compress them at the source).
The issue is, that often these images are quite large - ~200kb for a 600x600 image. I don't want to serve such large files to my users, especially those on mobile with data caps and whatnot.
Is there any service (or Node module) which would allow a middleman-style way of compressing the images that AngularJS serves up to the user? Something like Google PageSpeed Service (a surprising number of people haven't heard of this, check it out, it's awesome) would be absolutely perfect, except it doesn't work with AJAX images/AngularJS.
You have services like http://kraken.io/ - It is just a matter of hooking a url pattern to an API call for the optimized image. The problem with such services is that they aren't scalable (at least cheaply), since you are using third party bandwidth and processing power.
I would strongly advise caching the files somehow on your side of things, though. Or even do it the other way around: hook the optimizing to changes in the image list, and serve the optimized files from your end.
Doing this from angular is doing this from each user's computer: with a limit of 50 files/day lasting (apparently) 1 hour on their server, you'll quickly run out of API calls.
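A rough sketch of the "optimize and serve from your own end" option, using Express plus the sharp module; the route, target size, and in-memory cache are illustrative choices, not a recommendation of a specific setup:

```javascript
// Sketch: an Express route that fetches a remote image, downscales and
// recompresses it with sharp, and caches the result in memory.
// All names and numbers here are illustrative.
const express = require('express');
const sharp = require('sharp');

const app = express();
const cache = new Map();

app.get('/optimized', async (req, res) => {
  const url = req.query.src;
  if (cache.has(url)) return res.type('image/jpeg').send(cache.get(url));

  const upstream = await fetch(url);                 // global fetch (Node 18+)
  const original = Buffer.from(await upstream.arrayBuffer());
  const optimized = await sharp(original)
    .resize({ width: 300 })                          // serve a smaller rendition
    .jpeg({ quality: 70 })
    .toBuffer();

  cache.set(url, optimized);
  res.type('image/jpeg').send(optimized);
});

app.listen(3000);
```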

Splitting a file before upload?

On a webpage, is it possible to split large files into chunks before the file is uploaded to the server? For example, split a 10MB file into 1MB chunks, and upload one chunk at a time while showing a progress bar?
It sounds like JavaScript doesn't have any file manipulation abilities, but what about Flash and Java applets?
This would need to work in IE6+, Firefox and Chrome. Update: forgot to mention that (a) we are using Grails and (b) this needs to run over https.
You can try Plupload. It can be configured to check whatever runtime is available on the user's side, be it Flash, Silverlight, HTML5, Gears, etc., and use whichever satisfies the required features first. Among other things it supports image resizing (on the user's side, preserving EXIF data(!)), stream and multipart upload, and chunking. Files can be chunked on the user's side, and sent to a server-side handler chunk-by-chunk (this requires some additional care on the server), so that, for example, big files can be uploaded to a server whose max filesize limit is set to a value much lower than their size. And more.
Some runtimes support https, I believe; some need testing. Anyway, the developers over there are quite responsive these days, so you might at least try ;)
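For browsers that do have the File API (so not the IE6 part of the requirement), the chunking idea looks roughly like this; the chunk size, endpoint, and header names are illustrative, and the server still has to reassemble the pieces:

```javascript
// Sketch of client-side chunking with the File API: slice the file and upload
// one piece at a time, updating a progress bar between chunks.
// Endpoint and header names are placeholders; the server must reassemble.
function uploadInChunks(file, onProgress) {
  const CHUNK_SIZE = 1024 * 1024; // 1 MB
  const total = Math.ceil(file.size / CHUNK_SIZE);
  let index = 0;

  function sendNext() {
    if (index >= total) return;
    const start = index * CHUNK_SIZE;
    const chunk = file.slice(start, start + CHUNK_SIZE);

    const xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload-chunk');
    xhr.setRequestHeader('X-Chunk-Index', String(index));
    xhr.setRequestHeader('X-Chunk-Total', String(total));
    xhr.onload = () => {
      index += 1;
      onProgress(index / total);
      sendNext();
    };
    xhr.send(chunk);
  }

  sendNext();
}
```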
The only option I know of that would allow this would be a signed Java applet.
Unsigned applets and Flash movies have no filesystem access, so they wouldn't be able to read the file data. Flash is able to upload files, but most of that is handled by the built-in Flash implementation and from what I remember the file contents would never be exposed to your code.
There is no JavaScript solution for that selection of browsers. There is the File API but whilst it works in newer Firefox and Chrome versions it's not going to happen in IE (no sign of it in IE9 betas yet either).
In any case, reading the file locally and uploading it via XMLHttpRequest is inefficient because XMLHttpRequest does not have the ability to send pure binary, only Unicode text. You can encode binary into text using base-64 (or, if you are really dedicated, a custom 7-bit encoding of your own) but this will be less efficient than a normal file upload.
You can certainly do uploads with Flash (see SWFUpload et al), or even Java if you must (Jumploader... I wouldn't bother, these days, though, as Flash prevalence is very high and the Java plugin continues to decline). You won't necessarily get the low-level control to split into chunks, but do you really need that? What for?
Another possible approach is to use a standard HTML file upload field, and when submit occurs set an interval call to poll the server with XMLHttpRequest, asking it how far the file upload is coming along. This requires a bit of work on the server end to store the current upload progress in the session or database, so another request can read it. It also means using a form parsing library that gives you progress callback, which most standard language built-in ones like PHP's don't.
Whatever you do, take a ā€˜progressive enhancementā€™ approach, allowing browsers with no support to fall back to a plain HTML upload. Browsers do typically have an upload progress bar for HTML file uploads, it just tends to be small and easily missed.
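The polling approach described above might look something like this on the client; the /upload-progress endpoint and its JSON response shape are assumptions about the server side:

```javascript
// Sketch: while a plain HTML form upload is in flight, poll the server for
// how many bytes it has received so far. Endpoint and JSON shape are made up.
function pollUploadProgress(uploadId, onProgress) {
  const timer = setInterval(() => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', '/upload-progress?id=' + encodeURIComponent(uploadId));
    xhr.onload = () => {
      const { received, total } = JSON.parse(xhr.responseText);
      onProgress(received / total);
      if (received >= total) clearInterval(timer);
    };
    xhr.send();
  }, 1000);
  return timer;
}
```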
Do you specifically need it to be in X chunks? Or are you trying to solve the problems caused by uploading large files? (e.g. you can't restart an upload on the client side, or the server side crashes when the entire file is uploaded and held in memory all at once)
Search for streaming upload components. Which component you will prefer depends on what technologies you are working with: JSP, ASP.NET, etc.
http://krystalware.com/Products/SlickUpload/ - this one is a server-side product.
Here are some more pointers to various uploaders: http://weblogs.asp.net/jgalloway/archive/2008/01/08/large-file-uploads-in-asp-net.aspx
Some try to manage memory on the server (e.g. so the entire huge file isn't in memory at one time); some try to manage the client-side experience.
