I'm looking for a robust way of creating a zip archive on the fly from information on a given page and making this available for download. Client-side zipping is a must since my script runs from a bookmarklet.
My first approach while I was more concerned with writing the rest of the script was just to post the information to a few lines of PHP running on my local server which zipped it and sent it back. This is obviously not suitable for a bookmarklet worth sharing.
I found JSZip earlier today, and I thought that'd be the end of it. This library works great when it works; unfortunately, the archives I'm creating frequently exceed a couple of MBs, and this breaks JSZip. (Note: I've only tested this on Chrome.)
Pure JS downloads also have the limitation of funky names due to the data URI, which I intended to solve using JSZip's recommended method, Downloadify, which uses Flash. This made me wonder whether the size limitations on JS zip generation could be / have been overcome by using a similar interplay of Flash & JS.
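For reference, the data-URI approach I'm describing looks roughly like this (a simplified sketch using JSZip's older synchronous API; file names and contents are placeholders):

```js
var zip = new JSZip();
zip.file('page.html', document.documentElement.outerHTML); // whatever the bookmarklet scrapes
zip.file('notes.txt', 'extracted data goes here');
// The browser chooses its own name for a data-URI navigation ("download" or similar),
// which is the "funky names" problem mentioned above.
location.href = 'data:application/zip;base64,' + zip.generate();
```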
I Googled this, but having no experience with ActionScript I couldn't quickly figure out whether what I'm asking is possible. Is it possible to use a Flash object from JS to create relatively large (into the tens of MBs) zip files on the client side?
Thanks!
First of all, some numbers:
Flash promises that uploads will work if the file is smaller than 100 MB (I don't know whether that means decimal or binary megabytes).
There are two popular libraries in Flash for creating ZIP archives, but read on first.
A ZIP archiver is a program that both compresses and archives data, and it does so in exactly that order: it compresses each file separately and then appends it to the archive. This yields a worse compression ratio but allows the archive to be created iteratively, with the added benefit that you can start sending the archive before it is entirely compressed.
An alternative to ZIP is to use a dedicated archiver first and then compress the entire archive at once. This can sometimes achieve several times better compression, but the cost is that you have to process all the data in one go.
But Flash's ByteArray.compress() method gives you a native implementation of the deflate algorithm, which is essentially the same thing a ZIP archiver uses. So if you implemented something like tar and compressed the whole archive, you could significantly reduce the size of the files being sent.
But Flash is a single-threaded environment, so you would have to be careful about how much data you compress at once, and you will probably have to find the limit empirically. Or just use ZIP: more redundancy, but easier to implement.
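Not Flash, but the trade-off is easy to see from JavaScript, here using the pako deflate library purely as a stand-in for ByteArray.compress() (pako isn't part of this thread, and the sample data is made up):

```js
var enc = new TextEncoder();
var files = [
  enc.encode('{"page":"a","items":[1,2,3]}'),
  enc.encode('{"page":"b","items":[4,5,6]}'),
  enc.encode('{"page":"c","items":[7,8,9]}')
];

// ZIP-style: deflate each file on its own, then append it to the archive.
var perFile = files.reduce(function (sum, buf) {
  return sum + pako.deflate(buf).length;
}, 0);

// tar-then-compress style: concatenate first, deflate once, so repeated
// structure across files can share a single dictionary.
var total = files.reduce(function (n, buf) { return n + buf.length; }, 0);
var whole = new Uint8Array(total);
files.reduce(function (offset, buf) { whole.set(buf, offset); return offset + buf.length; }, 0);

console.log('per-file total:', perFile, 'whole archive:', pako.deflate(whole).length);
```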
I've used the nochump library before and didn't have any problems. It is somewhat old, though, and it might make sense to port it to use Alchemy opcodes (which provide fast memory access and significantly reduce the cost of low-level binary arithmetic such as bitwise OR, bitwise AND, etc.). There is also a library that implements the CRC32 algorithm (an essential part of a ZIP archive) using Alchemy - it should be considerably faster, but you would have to implement the rest yourself.
Yet another option you might consider is Google's NaCl - there you would be able to choose from many archiver and compression implementations because it essentially runs native code, so you could even use bzip2 and other modern formats - unfortunately, only in Chrome (and users must enable it) or Firefox (with a plugin).
Background:
I am building a node.js-based Web app that needs to make use of various fonts. But it only needs to do so in the backend since the results will be delivered as an image. Consequently, the client/browser does not need access to the fonts at all in my case.
Question:
I will try to formulate the question as objectively as possible:
What are the typical options to provide a node.js backend with a large collection of fonts?
The options I came up with so far are:
1. Does one install these hundreds or thousands of fonts in the operating system of the (in my case: Ubuntu) server?
2. Does one somehow serve the fonts from cloud storage such as S3, or from an (online) database such as a MongoDB server?
3. Does one use a local file system to store the fonts and retrieve them from there?
4. ...other options?
I am currently leaning towards Option 1 because this is the way a layman like me does it on a local machine.
Without starting a discussion here, where could I find resources discussing the (dis-)advantages of the different options?
EDIT:
Thank you for all the responses.
Thanks to these, I noticed that I need to clarify something: I need the fonts for use in SVG-processing libraries such as p5.js, paper.js, and raphael.js, so I need to make the fonts available to these libraries running on node.js.
The key to your question is
hundreds or thousands of fonts
Until I took that in, there was no real difference between your methods. But if that number is correct (kind of mind-boggling, though), I would:
not install them in the OS. What happens if you move servers without an image, or switch OS?
The local file system would be a sane way of doing it, though you would need to manually keep track of all the file names and paths in your code.
MongoDB - store the file names and paths in a collection, and store the actual font files on your file system.
In the event of moving servers, you would only have to pick up the directory where the actual files are stored and the DB where you hold the file names and paths.
If you want, you can place everything in MongoDB, font files included, but then the database would also be huge, I assume - that is up to you.
Choice #3 is probably what I would do in such a case.
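A rough sketch of that combination in node - font files on disk, names and paths in a Mongo collection - could look like this (collection, field and directory names are invented for the example; the driver is the standard mongodb package):

```js
const fs = require('fs');
const path = require('path');
const { MongoClient } = require('mongodb');

const FONT_DIR = '/srv/fonts'; // where the actual .ttf/.otf files live

async function loadFont(family) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    // The collection only stores names and relative paths, not the font bytes.
    const doc = await client.db('assets').collection('fonts').findOne({ family });
    if (!doc) throw new Error(`No font registered for "${family}"`);
    return fs.readFileSync(path.join(FONT_DIR, doc.file)); // Buffer for your SVG library
  } finally {
    await client.close();
  }
}
```

Moving servers then means copying FONT_DIR plus that one collection, nothing more.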
If you have a decent enough server setup (e.g. a VPS or some other VM solution where you control what's installed) then another option you might want to consider is to do this job "out of node". For instance, in one of my projects where I need to build 175+ as-perfect-as-can-be maths statements, I offload that work to XeLaTeX instead:
1. I run a node script that takes the input text and builds a small but complete .tex file
2. I then tell node to call "xelatex theFileIJustMade.tex", which yields a pdf
3. I then tell node to call "pdfcrop" on that pdf, to remove the margins
4. I then tell node to call "pdf2svg", which is a free and amazingly effective utility
5. Then, as a final step (mostly to conserve space and bandwidth), I use "svgo", a Node.js-based SVG optimizer that can run either as normal script code or as a CLI utility.
(some more details on that here, with concrete code here)
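Not my actual code, but a condensed sketch of how those calls might look with child_process (file names are invented; it assumes xelatex, pdfcrop, pdf2svg and svgo are all on the PATH):

```js
const fs = require('fs');
const { execFileSync } = require('child_process');

function typesetToSvg(texSource, jobname) {
  fs.writeFileSync(`${jobname}.tex`, texSource);                          // step 1: build the .tex file
  execFileSync('xelatex', ['-interaction=batchmode', `${jobname}.tex`]);  // step 2: .tex -> .pdf
  execFileSync('pdfcrop', [`${jobname}.pdf`, `${jobname}-crop.pdf`]);     // step 3: trim the margins
  execFileSync('pdf2svg', [`${jobname}-crop.pdf`, `${jobname}.svg`]);     // step 4: .pdf -> .svg
  execFileSync('svgo', [`${jobname}.svg`]);                               // step 5: optimize in place
  return fs.readFileSync(`${jobname}.svg`, 'utf8');
}
```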
Of course, depending on how responsive a system you need, you can do entirely without steps 3 and 5. There is a limit to how fast this chain can run, but as a server-side task there should never be the expectation of real-time responsiveness.
This is a good example of remembering that your server runs inside a larger operating system that might also offer tools that can do the job. While you're using Node, and the obvious choice is a Node solution, Node is also a general purpose programming language and can call anything else through spawn and exec, much like python, php, java, C#, etc. As such, it's sometimes worth thinking about whether there might be another tool that is even better suited for your needs, especially when you're thinking about doing a highly specialized job like typesetting a string to SVG.
In this case, LaTeX was specifically created to typeset text from the command line, and XeLaTeX was created to do that with full Unicode awareness and clean, easy access to fonts both from file and from the system, with full OpenType feature control, so would certainly qualify as just as worthwhile a candidate as any node-specific solution might be.
As for the tools used: XeLaTeX and pdfcrop come with TeX Live (installed using whatever package manager your OS uses, or through MiKTeX on Windows, though I suspect your server doesn't run on Windows), pdf2svg is freely available on GitHub, and svgo is available from npm.
I always hear that in production you want to combine multiple .js files into one to make pages load faster.
But since the browser makes multiple requests concurrently, isn't there a chance that multiple files can be loaded faster than a single file, which has to be downloaded from beginning to end?
Is this reasoning correct?
It's a complex area.
The browser making multiple concurrent connections to the same server (which are usually quite limited in number) doesn't make the connection between the client and server faster. The pipes between them are only so big, and the server only has so much delivery capacity. So there's little if any reason to believe 4 parallel downloads, each of 10k, from the same server are likely to be faster than 1 download of 40k from that server. Add to that the fact that browsers limit the number of concurrent connections to the same server, and the expense of setting up those individual connections (which is non-trivial), and you're still better off with one large file for your own scripts.
For now. This is an area being actively developed by Google and others.
If you can load scripts from multiple servers (for instance, perhaps load common libraries from any of the several CDNs that make them accessible, and your own single combined script from your own server [or CDN]), it can make sense to separate those. It doesn't make the client's connection faster, but if the client's connection isn't the limiting factor, you can get a benefit. And of course, for a site that doesn't justify having its own CDN, loading common libraries from the free CDNs and just your own scripts from your own server lets you get the advantage of edge-casting and such on the scripts you load from the free CDNs.
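As an illustration only (URLs and file names are placeholders, not a recommendation of specific versions), the split might look like this: the common library comes from a public CDN with a same-server fallback, while your own code stays combined in one file:

```js
// Load jQuery from a public CDN; fall back to our own copy if the CDN is unreachable.
var cdn = document.createElement('script');
cdn.src = 'https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js';
cdn.onload = loadApp;
cdn.onerror = function () {
  var local = document.createElement('script');
  local.src = '/js/jquery.min.js';
  local.onload = loadApp;
  document.head.appendChild(local);
};
document.head.appendChild(cdn);

function loadApp() {
  // Our own scripts, combined into a single file served from our server (or our CDN).
  var app = document.createElement('script');
  app.src = '/js/app.combined.min.js';
  document.head.appendChild(app);
}
```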
For large JS files:
Not a good idea. If your JS files add up to more than about 500 KB, the merged file ends up several MB in size, and that single HTTP request takes a long time to download.
For small JS files:
A good idea. For small files merging works well, but it's better to use a third-party tool that also compresses the final single file so the HTTP request takes less time. I would suggest PHP Minify (though you can find others that suit you better), which lets you create a single HTTP request for a group of JS or CSS files. Minify also handles gzipping, compression, and HTTP headers for client-side caching.
(The PHP Minify demo shows a before/after comparison of the requests.)
It depends on whether your server uses HTTP/2 or HTTP/1.1.
HTTP/2
HTTP/2 (H2) allows a server to respond quickly to multiple requests over a single connection, letting the client send all of its requests without waiting for the first one to return and be parsed. This helps mitigate the need for concatenation, but doesn't entirely remove it. See this post for an in-depth answer on when you should or shouldn't concatenate.
Another thing to keep in mind is that if your server gzips your assets, it can actually be better to concatenate some of them together since gzipping can perform better on larger files with lots of repeating text. By separating all your files out, you could actually hurt your overall performance. Finding the most optimal solution will require some trial and error (a lot of this is still new and so best practices are still being discovered).
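A quick node sketch of how you might measure that for your own bundle (file names are placeholders; zlib ships with node):

```js
const fs = require('fs');
const zlib = require('zlib');

const files = ['a.js', 'b.js', 'c.js'].map(f => fs.readFileSync(f));

const separate = files.reduce((sum, buf) => sum + zlib.gzipSync(buf).length, 0);
const combined = zlib.gzipSync(Buffer.concat(files)).length;

console.log(`gzipped separately: ${separate} bytes`);
console.log(`gzipped combined:   ${combined} bytes`); // often smaller, since repeated text is shared
```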
HTTP/1.1
With HTTP/1.1, as the other answers have pointed out, for the majority of cases combining all your files into one is better. This reduces the number of HTTP requests, which can be slow with HTTP/1.1. There are ways to mitigate this by requesting assets from different subdomains to allow multiple concurrent requests.
I recommend reading High Performance Browser Networking for a complete understanding on strategies for HTTP/1.1.
For context I am trying to get a webserver up and running with low specs (think Raspberry Pi level) and want it to serve text content to as many users as possible.
I have tried looking up general tips, and this was helpful... but I can't believe that it's completely pointless to use JavaScript to decode a website that's been compressed. If I used a simple cipher to replace each instance of '<div' with a special character I know I would never serve (some of these, maybe), wouldn't I already be saving some bandwidth? A JS search-and-replace on the client side can't take more than a fraction of a second.
So my question is are any of these novel methods for speeding up simple websites worth it in the end? Are there other methods I am not considering?
The HTTP protocol supports gzip compression natively, so there is no further benefit in replacing "<div " with "x". To improve performance, you can instead tell your web server to serve files that are already gzip-compressed. I did that once and it cut page loading time by about 10 milliseconds on a shared host running Apache.
I'm using the excellent requirejs optimizer to compress the code of a web application.
The application uses a lot of third-party libs. I have several options :
Let the user download all the third party libs separately from my server
Let the user download all the third party libs from a CDN, if available
Use requirejs to produce a 'compressed' version of all those libs, in a single file
Now, I know that caching and/or a CDN would help with how long it takes to fetch each individual library; however, if I have 15 libs, I'm still going to end up with 15 HTTP requests, which is all the more annoying if the actual code for my application ends up being served in one or two relatively small files.
So what are the pros and cons of each method? Also, I suppose I would actually be 'redistributing' (in the sense of common FOSS licenses) the libraries if I were to bundle them inside my app, rather than pointing to a CDN?
Any experience / ideas welcome.
Thanks.
You could take a look at the question Why should I use Google's CDN for jQuery?, which explains why a CDN is the better solution:
It increases the parallelism available. (Most browsers will only download 3 or 4 files at a time from any given site.)
It increases the chance that there will be a cache hit. (As more sites follow this practice, more users already have the file ready.)
It ensures that the payload will be as small as possible. (Google can pre-compress the file in a wide array of formats, like GZIP or DEFLATE. This makes the time-to-download very small, because it is super compressed and it isn't compressed on the fly.)
It reduces the amount of bandwidth used by your server. (Google is basically offering free bandwidth.)
It ensures that the user will get a geographically close response. (Google has servers all over the world, further decreasing the latency.)
(Optional) They will automatically keep your scripts up to date. (If you like to "fly by the seat of your pants," you can always use the latest version of any script that they offer. These could fix security holes, but generally just break your stuff.)
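Since the question is about requirejs: its paths config accepts an array of fallbacks, so you can point common libraries at a CDN and still keep your own code bundled (URLs and module names below are just examples):

```js
require.config({
  paths: {
    // Try the Google CDN first; fall back to a local copy if it is unreachable.
    jquery: [
      'https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min',
      'lib/jquery.min'
    ]
  }
});

require(['jquery'], function ($) {
  // application code; r.js can still combine your own modules into one file
});
```

If I remember the optimizer docs correctly, the r.js build profile can then use paths: { jquery: 'empty:' } so the CDN-hosted library is left out of the combined file.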
On a webpage, is it possible to split large files into chunks before the file is uploaded to the server? For example, split a 10MB file into 1MB chunks, and upload one chunk at a time while showing a progress bar?
It sounds like JavaScript doesn't have any file manipulation abilities, but what about Flash and Java applets?
This would need to work in IE6+, Firefox and Chrome. Update: forgot to mention that (a) we are using Grails and (b) this needs to run over https.
You can try Plupload. It can be configured to check whatever runtime is available on the user's side - Flash, Silverlight, HTML5, Gears, etc. - and use whichever satisfies the required features first. Among other things it supports image resizing (on the user's side, preserving EXIF data(!)), stream and multipart upload, and chunking. Files can be chunked on the user's side and sent to a server-side handler chunk by chunk (this requires some additional care on the server), so that big files can be uploaded to a server whose max filesize limit is set to a value much lower than their size, for example. And more.
Some runtimes support https, I believe; some need testing. Anyway, the developers there are quite responsive these days, so you might at least give it a try ;)
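A rough configuration along those lines (option names as per Plupload's docs at the time; the element ids, URLs and limits are made up, so double-check against the version you install):

```js
var uploader = new plupload.Uploader({
  runtimes: 'html5,flash,silverlight,gears,html4', // tried in this order
  browse_button: 'pick-files',                     // id of your "browse" element
  url: '/upload',                                  // your Grails action receiving the chunks
  chunk_size: '1mb',                               // each chunk arrives as its own request
  max_file_size: '100mb',
  flash_swf_url: '/js/plupload.flash.swf'          // needed for the Flash fallback (e.g. IE6/7)
});

uploader.init();

uploader.bind('FilesAdded', function (up) {
  up.start(); // begin uploading as soon as files are picked
});

uploader.bind('UploadProgress', function (up, file) {
  document.getElementById('progress').textContent = file.percent + '%';
});
```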
The only option I know of that would allow this would be a signed Java applet.
Unsigned applets and Flash movies have no filesystem access, so they wouldn't be able to read the file data. Flash is able to upload files, but most of that is handled by the built-in Flash implementation and from what I remember the file contents would never be exposed to your code.
There is no JavaScript solution for that selection of browsers. There is the File API but whilst it works in newer Firefox and Chrome versions it's not going to happen in IE (no sign of it in IE9 betas yet either).
In any case, reading the file locally and uploading it via XMLHttpRequest is inefficient because XMLHttpRequest does not have the ability to send pure binary, only Unicode text. You can encode binary into text using base-64 (or, if you are really dedicated, a custom 7-bit encoding of your own) but this will be less efficient than a normal file upload.
You can certainly do uploads with Flash (see SWFUpload et al), or even Java if you must (Jumploader... I wouldn't bother, these days, though, as Flash prevalence is very high and the Java plugin continues to decline). You won't necessarily get the low-level control to split into chunks, but do you really need that? What for?
Another possible approach is to use a standard HTML file upload field and, when the submit occurs, set up an interval call to poll the server with XMLHttpRequest, asking it how far the file upload has got. This requires a bit of work on the server end to store the current upload progress in the session or database so another request can read it. It also means using a form-parsing library that gives you a progress callback, which most languages' standard built-in parsers (like PHP's) don't.
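A sketch of that polling approach (the /upload-progress endpoint, the uploadId field and the progress-bar element are all hypothetical - the server side has to provide its own equivalents):

```js
var form = document.getElementById('upload-form');
form.onsubmit = function () {
  var uploadId = form.elements['uploadId'].value;    // hidden field identifying this upload
  var bar = document.getElementById('progress-bar');
  var timer = setInterval(function () {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/upload-progress?id=' + encodeURIComponent(uploadId), true);
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        var info = JSON.parse(xhr.responseText);     // e.g. { received: 123456, total: 10485760 }
        bar.style.width = Math.round(100 * info.received / info.total) + '%';
        if (info.received >= info.total) clearInterval(timer);
      }
    };
    xhr.send(null);
  }, 1000);
  // returning nothing lets the normal form submission (e.g. into a hidden iframe) proceed
};
```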
Whatever you do, take a ‘progressive enhancement’ approach, allowing browsers with no support to fall back to a plain HTML upload. Browsers do typically have an upload progress bar for HTML file uploads, it just tends to be small and easily missed.
Do you specifically need it to be in X chunks? Or are you trying to solve the problems caused by uploading large files (e.g. the client can't restart an upload, or the server crashes when the entire file is uploaded and held in memory all at once)?
Search for streaming upload components. Which component you prefer depends on what technologies you are working with - JSP, ASP.NET, etc.
http://krystalware.com/Products/SlickUpload/ - this one is a server-side product.
Here are some more pointers to various uploaders: http://weblogs.asp.net/jgalloway/archive/2008/01/08/large-file-uploads-in-asp-net.aspx
Some try to manage memory on the server (e.g. so the entire huge file isn't in memory at one time), and some try to manage the client-side experience.