I have a web application that has an OWIN SelfHost WebAPI backend and an AngularJS frontend.
In one particular view I want to be able to collect a ridiculously large amount of data from a server API and then (on demand, not immediately) write it to a client-side CSV file.
The unformatted binary data is about 80 kB per second, or roughly 4.6 MB per minute; I'm expecting the CSV to be a lot bigger than this, but I don't have an exact measurement yet. Most uses are expected to be up to about 30 minutes (~140 MB), but I'd like to support capturing for as long as feasible.
At the moment I have API code to start the data capture and store it in server memory, but that memory quickly runs out. So I'm looking for alternative solutions. (I haven't written the client-side part yet, so that's more malleable.)
My first thought was to have the server write a temporary file/db to hold the data during the capture, then reload and stream it in chunks to the JS when the download is requested. But isn't this just going to make the browser run out of memory too?
An added complication is that at the moment the server just has the "raw" data, which is sent as JSON to the client, where the JS tweaks it and then formats it to CSV. Would it be better to move this to the server side, so that the client just downloads pre-baked CSV? (This would be mildly annoying since it would require duplicating a small amount of code, but it's fairly straightforward.)
The server doesn't need to handle many concurrent requests (especially not of this type, though it's not impossible) but given a choice between the two I'd prefer making the client do most of the work.
I'm targeting recent browsers (mostly Chrome) so using new APIs is ok (research suggests Blob is the most appropriate, though I'm not sure if that allows client-side reformatting), though cross-compatibility is always nice.
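For what it's worth, the Blob route does allow client-side reformatting: the JS can build the CSV from whatever the server sends and then hand it to the user as a named download. A minimal sketch (function and field names are mine, not from the question), assuming the rows have already been tweaked on the client:

```javascript
// Minimal sketch: format already-tweaked rows to CSV on the client and
// trigger a download via a Blob, avoiding a giant data URI.
function downloadCsv(rows, filename) {
  var parts = rows.map(function (row) {
    return row.map(function (cell) {
      var s = String(cell);
      // Quote cells containing delimiters, quotes or newlines.
      return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
    }).join(',') + '\n';
  });

  // A Blob can be built from many small parts, so chunks streamed from the
  // server could be appended here without ever holding one huge string.
  var blob = new Blob(parts, { type: 'text/csv' });
  var url = URL.createObjectURL(blob);

  var a = document.createElement('a');
  a.href = url;
  a.download = filename;            // gives the file a sensible name in Chrome
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  URL.revokeObjectURL(url);         // release the memory backing the Blob
}

// e.g. downloadCsv([['t', 'value'], [0, 1.23]], 'capture.csv');
```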
Related
I have static master data for country, city, and xyz. This data doesn't change frequently (perhaps once a year).
This data will be used by multiple microservices; I want to avoid network latency, so I've decided not to create a separate master-data microservice.
What is the best way to implement this, considering the multiple languages used in the microservices (Java, JavaScript)?
There are these things called files :-). In days gone by, we would put data into files and use programs to read them off of disk.
Later, when we decided we needed to distribute them across networks, we would use a web server to send those files to remote programs (browsers). When we did so, the programs that used the files became slow (networks were lousy then). And so, we developed caches. We even put directives in the files to say how long the caches were good for. We could tune the TTL so that some files would last for seconds, others for days or weeks.
They even added the ability for servers to tell the programs whether the files had changed on the server, without sending the whole file again.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
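To make that concrete, here is a rough sketch (Node, with made-up file paths and an arbitrary TTL) of the idea in this answer: serve the master data as a plain file with a Cache-Control lifetime and an ETag, so each consuming service revalidates cheaply instead of re-downloading data that changes once a year.

```javascript
// Rough sketch: serve a static master-data file with caching headers.
const http = require('http');
const fs = require('fs');
const crypto = require('crypto');

const body = fs.readFileSync('master-data/countries.json');   // assumed path
const etag = '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';

http.createServer((req, res) => {
  if (req.headers['if-none-match'] === etag) {
    res.writeHead(304);                  // unchanged: no body sent
    return res.end();
  }
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Cache-Control': 'max-age=86400',    // tune the TTL to your change rate
    'ETag': etag,
  });
  res.end(body);
}).listen(8080);
```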
I have a Django application with a React frontend, and need to upload multi-gigabyte files (around 8-12 GB) selected in the React frontend to a remote machine.
Currently I'm using a simple form and uploading the files via fetch with formData in the body and then reading them out of request.FILES, however I'm not very happy with this solution, since it takes over a minute to upload the files even from the same computer to itself (and I'm not entirely sure why).
Is there any way to speed up this process? The production environment for this is a gigabit local network without any external internet access so no cloud storage please.
The data seems to be fairly compressible, easily reducing its size by 30%+ when zipped/tarred.
Is there any way to compress during file uploads, or any parameters I can change on either end to speed up the process?
It doesn't seem like an 8 GB file should take over a minute to upload from the same machine to itself, and in fact should theoretically be faster than that over a gigabit network. How can I streamline this?
It'd be nice if I could show a progress bar on the frontend, but fetch doesn't currently allow that; if another method that helps with the file uploads also happens to provide a way to monitor upload progress, please mention it.
React 15.4, Django 1.10, Python 3.5. Lots of memory, many CPU cores, and even a GPU on the server (in case someone has some bizarre idea to use CUDA for decompression or something, though I've never heard of such a thing) are available, in case there's some sort of parallelization that could help saturate the connection.
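On the progress bar specifically: fetch doesn't expose upload progress, but XMLHttpRequest does. A sketch only (the '/upload/' URL and the 'file' field are assumptions, not your actual Django endpoint):

```javascript
// Sketch: same FormData upload, but via XMLHttpRequest so the browser
// reports upload progress. Django may also require a CSRF header.
function uploadWithProgress(file, onProgress) {
  return new Promise(function (resolve, reject) {
    var form = new FormData();
    form.append('file', file);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload/');
    xhr.upload.onprogress = function (e) {
      if (e.lengthComputable) onProgress(e.loaded / e.total);   // 0..1
    };
    xhr.onload = function () {
      xhr.status < 300 ? resolve(xhr.response) : reject(xhr);
    };
    xhr.onerror = reject;
    xhr.send(form);
  });
}

// e.g. uploadWithProgress(input.files[0], function (p) { /* update bar */ });
```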
I need to get about 300,000 final lines (roughly 30 MB) from thousands of online JSON files.
Being a beginner in coding, I'd prefer to stick to JS: $.getJSON the data, cut it down, append the interesting parts to my <body>, and loop over the thousands of online JSON files. But I wonder:
can my web browser handle 300,000 $.getJSON requests and the resulting 30-50 MB webpage without crashing?
is it possible to use JS to write the results to a file, so the script's work is constantly saved?
I expect my script to run for about 24 hours. The numbers are estimates.
Edit: I don't have any server-side knowledge, just JS.
A few things aren't right about your approach for this:
If what you are doing is fetching (and processing) data from another source and then displaying it to a visitor, processing at this scale should be done separately and beforehand in a background process. Web browsers should not be used as data processors on the scale you're talking about.
If you try to display a 30-50MB webpage, your user is going to experience lots of frustrating issues - browser crashes, lack of responsiveness, timeouts, long load times, and so on. If you expect any users on older IE browsers, they might as well give up without even trying.
My recommendation is to pull this task out and do it using your backend infrastructure, saving the results in a database which can then be searched, filtered, and accessed by your user. Some options worth looking into:
Cron
Cron will allow you to run a task on a repeated and regular basis, such as daily or hourly. Use this if you want to continually update your dataset.
Worker (Heroku)
If running Heroku, take it out of the dyno and use a separate worker so as not to clog up any existing traffic on your app.
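As a minimal sketch of what that background job could look like (Node 18+ for the built-in fetch; urls.txt, results.ndjson and data.interestingPart are placeholders), scheduled from cron:

```javascript
// Minimal sketch of a cron-driven collector. Schedule it with e.g.:
//   0 3 * * *  node /srv/jobs/collect.js >> /var/log/collect.log 2>&1
const fs = require('fs');

const urls = fs.readFileSync('urls.txt', 'utf8').trim().split('\n');
const out = fs.createWriteStream('results.ndjson', { flags: 'a' });

(async () => {
  for (const url of urls) {
    try {
      const data = await (await fetch(url)).json();
      // Keep only the interesting part and append one line per source.
      out.write(JSON.stringify({ url, value: data.interestingPart }) + '\n');
    } catch (err) {
      console.error('failed:', url, err.message);   // keep going on errors
    }
  }
  out.end();
})();
```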
I'm looking for a robust way of creating a zip archive on the fly from information on a given page and making this available for download. Client-side zipping is a must since my script runs from a bookmarklet.
My first approach while I was more concerned with writing the rest of the script was just to post the information to a few lines of PHP running on my local server which zipped it and sent it back. This is obviously not suitable for a bookmarklet worth sharing.
I found JSZip earlier today, and I thought that'd be the end of it. This library works great when it works; unfortunately, the archives I'm creating frequently exceed a couple of MBs, and this breaks JSZip. (Note: I've only tested this on Chrome.)
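For reference, the basic JSZip usage looks roughly like this (this uses the current v3 API with generateAsync; older releases had a synchronous generate(); the file names and contents are placeholders):

```javascript
// Roughly how the JSZip approach looks.
var zip = new JSZip();
zip.file('page.html', document.documentElement.outerHTML);
zip.file('notes.txt', 'collected by the bookmarklet');

zip.generateAsync({ type: 'blob' }).then(function (blob) {
  // Hand the Blob to an <a download> link or a FileSaver-style helper here.
  console.log('zip is', blob.size, 'bytes');
});
```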
Pure JS downloads also have the limitation of funky names due to the data URI, which I intended to solve using JSZip's recommended method, Downloadify, which uses Flash. This made me wonder whether the size limitations on JS zip generation could be / have been overcome by using a similar interplay of Flash and JS.
I Googled this, but having no experience with ActionScript I couldn't quickly figure out whether what I'm asking is possible. Is it possible to use a Flash object from JS to create relatively large (into the tens of MBs) zip files on the client side?
Thanks!
First of all, some numbers:
Flash promises that uploads will work if the file is smaller than 100 MB (I don't know whether that means base 10 or base 2).
There are two popular libraries in Flash for creating ZIP archives, but read on first.
A ZIP archiver is a program that both compresses and archives the data, and it does it in exactly this order: it compresses each file separately and then appends it to the archive. This yields a worse compression ratio but allows for iterative creation of the archive, with the benefit that you can even start sending the archive before it is entirely compressed.
An alternative to ZIP is to first use a dedicated archiver and then compress the entire archive at once. This can sometimes achieve a few times better compression, but the cost is that you have to process all the data at once.
But Flash's ByteArray.compress() method offers you a native implementation of the deflate algorithm, which is mostly the same thing you would use in a ZIP archiver. So, if you implemented something like tar, you could significantly reduce the size of the files being sent.
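To illustrate that trade-off outside Flash, here is a small Node sketch using zlib's deflate (the same algorithm as ByteArray.compress()); the file names are placeholders:

```javascript
// Compare per-file deflate (ZIP-style) with deflating one concatenated
// archive (tar+deflate-style).
const fs = require('fs');
const zlib = require('zlib');

const files = ['a.txt', 'b.txt', 'c.txt'].map(f => fs.readFileSync(f));

// ZIP-style: each entry deflated on its own, so entries can be emitted as you go.
const perFile = files.reduce((sum, buf) => sum + zlib.deflateSync(buf).length, 0);

// tar.gz-style: one deflate over the whole concatenation; usually smaller
// (shared dictionary across files), but nothing can be sent until all of
// the data has been processed.
const whole = zlib.deflateSync(Buffer.concat(files)).length;

console.log('per-file total:', perFile, 'bytes; whole archive:', whole, 'bytes');
```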
But Flash is a single-threaded environment, so you would have to be careful about the size of the data you compress, and you will probably have to find the limit empirically. Or just use ZIP - more redundancy, but easier to implement.
I've used this library before: nochump. Didn't have any problems. Although it is somewhat old, it might make sense to try to port it to use Alchemy opcodes (which are used for fast memory access, significantly reducing the cost of low-level binary arithmetic operations such as binary OR, binary AND, etc.). The second library implements the CRC32 algorithm, which is an essential part of a ZIP archive, and it uses Alchemy - so it should be considerably faster, but you would have to implement the rest on your own.
Yet another option you might consider is Google's NaCl - there you would be able to choose from archiver and compression implementations because it essentially runs native code, so you could even use bz2 and other modern stuff - unfortunately, only in Chrome (and users must enable it) or Firefox (needs a plugin).
On a webpage, is it possible to split large files into chunks before the file is uploaded to the server? For example, split a 10MB file into 1MB chunks, and upload one chunk at a time while showing a progress bar?
It sounds like JavaScript doesn't have any file manipulation abilities, but what about Flash and Java applets?
This would need to work in IE6+, Firefox and Chrome. Update: forgot to mention that (a) we are using Grails and (b) this needs to run over https.
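For reference, in browsers that do have the File API (so not the IE6 part of this requirement), the chunking itself looks roughly like the sketch below; the upload URL, query parameter and chunk size are arbitrary:

```javascript
// Sketch: slice a File into fixed-size chunks and upload them one at a time,
// driving a progress bar from the number of chunks sent.
function uploadInChunks(file, chunkSize, onProgress) {
  var offset = 0;

  function sendNext() {
    if (offset >= file.size) return;
    var chunk = file.slice(offset, offset + chunkSize);

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload-chunk?offset=' + offset);
    xhr.onload = function () {
      offset += chunkSize;
      onProgress(Math.min(offset / file.size, 1));   // 0..1 for the bar
      sendNext();                                     // one chunk at a time
    };
    xhr.send(chunk);
  }

  sendNext();
}

// e.g. uploadInChunks(input.files[0], 1024 * 1024, function (p) { /* update bar */ });
```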
You can try Plupload. It can be configured to check whatever runtime is available on the user's side - Flash, Silverlight, HTML5, Gears, etc. - and use whichever satisfies the required features first. Among other things it supports image resizing (on the user's side, preserving EXIF data(!)), stream and multipart upload, and chunking. Files can be chunked on the user's side and sent to a server-side handler chunk by chunk (this requires some additional care on the server), so that, for example, big files can be uploaded to a server whose max filesize limit is set to a value much lower than their size. And more.
Some runtimes support https, I believe; some need testing. Anyway, the developers over there are quite responsive these days. So you might at least try ;)
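Roughly what a Plupload setup with chunking looks like (the element IDs, URLs and swf/xap paths are placeholders; check the docs for the exact option names in your version):

```javascript
// Sketch of a chunked Plupload uploader with runtime fallback.
var uploader = new plupload.Uploader({
  runtimes: 'html5,flash,silverlight,gears',    // tried in this order
  browse_button: 'pickfiles',                   // id of the "browse" element
  url: '/upload',                               // server-side chunk handler
  chunk_size: '1mb',                            // client-side chunking
  max_file_size: '2gb',
  flash_swf_url: '/js/plupload.flash.swf',
  silverlight_xap_url: '/js/plupload.silverlight.xap'
});

uploader.bind('UploadProgress', function (up, file) {
  document.getElementById('status').textContent = file.percent + '%';
});

uploader.init();
// Call uploader.start() once the user has picked files.
```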
The only option I know of that would allow this would be a signed Java applet.
Unsigned applets and Flash movies have no filesystem access, so they wouldn't be able to read the file data. Flash is able to upload files, but most of that is handled by the built-in Flash implementation and from what I remember the file contents would never be exposed to your code.
There is no JavaScript solution for that selection of browsers. There is the File API but whilst it works in newer Firefox and Chrome versions it's not going to happen in IE (no sign of it in IE9 betas yet either).
In any case, reading the file locally and uploading it via XMLHttpRequest is inefficient because XMLHttpRequest does not have the ability to send pure binary, only Unicode text. You can encode binary into text using base-64 (or, if you are really dedicated, a custom 7-bit encoding of your own) but this will be less efficient than a normal file upload.
You can certainly do uploads with Flash (see SWFUpload et al), or even Java if you must (Jumploader... I wouldn't bother, these days, though, as Flash prevalence is very high and the Java plugin continues to decline). You won't necessarily get the low-level control to split into chunks, but do you really need that? What for?
Another possible approach is to use a standard HTML file upload field, and when the submit occurs, set an interval call to poll the server with XMLHttpRequest, asking it how far along the file upload is. This requires a bit of work on the server end to store the current upload progress in the session or database, so another request can read it. It also means using a form-parsing library that gives you a progress callback, which most languages' standard built-in ones (like PHP's) don't.
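A sketch of that polling idea; the /upload-progress endpoint, its JSON shape, uploadId and updateBar are all assumptions about your own server and page code:

```javascript
// Poll the server once a second for how much of the upload has arrived.
var poll = setInterval(function () {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/upload-progress?id=' + uploadId, true);
  xhr.onload = function () {
    var p = JSON.parse(xhr.responseText);          // e.g. { received, total }
    updateBar(p.received / p.total);
    if (p.received >= p.total) clearInterval(poll);
  };
  xhr.send();
}, 1000);
```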
Whatever you do, take a ‘progressive enhancement’ approach, allowing browsers with no support to fall back to a plain HTML upload. Browsers do typically have an upload progress bar for HTML file uploads, it just tends to be small and easily missed.
Do you specifically need it to be in X chunks? Or are you trying to solve the problems caused by uploading large files (e.g. you can't restart an upload on the client side, or the server crashes when the entire file is uploaded and held in memory all at once)?
Search for streaming upload components. Which component you'll prefer depends on the technologies you are working with (JSP, ASP.NET, etc.).
http://krystalware.com/Products/SlickUpload/ - this one is a server-side product.
Here are some more pointers to various uploaders: http://weblogs.asp.net/jgalloway/archive/2008/01/08/large-file-uploads-in-asp-net.aspx
Some try to manage memory on the server (e.g. so the entire huge file isn't in memory at one time); some try to manage the client-side experience.