javascript - locally generating and downloading a huge file

I have the following JavaScript code which downloads a file, which I can't help but think I got from here: Create a file in memory for user to download, not through server
However, this function crashes in Chrome when I try to download too much data (it might be a couple of MB; downloads under 1 MB seem to work fine, though I haven't gathered many metrics on it).
I really like this function because it lets me create a string from an existing huge JavaScript variable and immediately download it.
So my two questions then are:
A) What's causing the crash? Is it the size of the text string, or is it something in this function? I read that 60 MB strings are possible in JavaScript, and I don't think I'm anywhere near that.
B) If it is this function, is there another simple way to download some huge-ish file that allows me to generate the content locally via JavaScript?
function download(filename, text) {
    var element = document.createElement('a');
    element.setAttribute('href', 'data:text/plain;charset=utf-8,' + encodeURIComponent(text));
    element.setAttribute('download', filename);
    element.style.display = 'none';
    document.body.appendChild(element);
    element.click();
    document.body.removeChild(element);
}

Does it work in other browsers? Try using the debugger: set a breakpoint just inside the function and step through.
Try breaking up the element.setAttribute call and the data content by creating a var that holds the string you are going to assign to href; that way you can see more failure points.
See if the encodeURIComponent function is failing with large strings.
Strings are immutable in JavaScript: once created they can't be modified or appended to, so every change creates a new string. encodeURIComponent, which URL-encodes a string, may be making thousands of such changes while escaping a > 1 MB string, depending on its contents. And even if none of your characters need escaping, when you call that function and then append the result to the 'data:text/plain;charset=utf-8,' string, a new string is created from the two, effectively doubling the memory needed for that step.
Depending on how the particular browser handles this function, it may not be optimized for long strings at all; since most browsers have a URL length limit of roughly 2000 characters (typically 2048), it's likely the browser's implementation isn't doing a low-level escape. If this function is indeed the culprit, you will have to find another way to URI-escape your string, possibly a library or a custom low-level escape.
If the debugger shows that this function is not the issue, the other obvious bottleneck is appending this enormous link to the DOM; the browser could be freezing there while trying to process it, and that may require a completely different solution to your download problem.
Though this is just speculation, hopefully it leads you in the right direction.
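As a rough illustration of the debugging approach above, here is a hypothetical instrumented version of the function, split into separate steps so a breakpoint (or console.time) can show which step chokes on large input:

function downloadDebug(filename, text) {
    console.time('encode');
    var encoded = encodeURIComponent(text);                 // suspect #1: escaping a huge string
    console.timeEnd('encode');
    var href = 'data:text/plain;charset=utf-8,' + encoded;  // suspect #2: concatenation doubles memory
    var element = document.createElement('a');
    element.setAttribute('href', href);                     // suspect #3: an enormous attribute value
    element.setAttribute('download', filename);
    element.style.display = 'none';
    document.body.appendChild(element);                     // suspect #4: inserting the huge link into the DOM
    element.click();
    document.body.removeChild(element);
}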

While I marked Rickey's answer as the accepted one because it got me to the right answer, the workaround I found for this was here:
JavaScript blob filename without link
The accepted answer at that link was capable of handling more than 8 MB, while the data URI approach topped out around 2 MB because of Chrome's limit on URI length.
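For reference, a minimal sketch of the Blob-based workaround from that link (the object URL sidesteps building a giant data: URI, so the URI length limit no longer applies):

function downloadBlob(filename, text) {
    var blob = new Blob([text], { type: 'text/plain;charset=utf-8' });
    var url = URL.createObjectURL(blob);
    var element = document.createElement('a');
    element.setAttribute('href', url);
    element.setAttribute('download', filename);
    element.style.display = 'none';
    document.body.appendChild(element);
    element.click();
    document.body.removeChild(element);
    URL.revokeObjectURL(url); // release the memory backing the object URL
}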

Related

javascript encodeURIComponent and escape?

I use JS to send an encodeURIComponent-encoded string to a PHP file-write script, and it has been working fine for years; until recently, when I ran into a strange effect where the text needed to be further encoded with escape in order to get it to work! The symptom only started to show when I used an open source WYSIWYG editor!
What could the offending characters in the URI be that need escape to fix this? I used to think a URI only reserves ?, & and = for its syntax to work.
The situation you describe could possibly be explained (although there's no way of knowing without you telling us what the string is and how it's being used) by a URL which involves two levels of nested URL-like values.
Consider a URL taking a query parameter which is another URL:
http://me.com?url=http://you.com?qp=1
That URL is subject to misinterpretation, so we would normally URL-encode the you.com URL, giving us:
http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3D1
Whoever is working with this URL can now extract the query parameter named url with the value http%3A%2F%2Fyou.com%3Fqp%3D1, decode it (often a framework or library will decode it for you), and then use it to jump to or call that URL.
Consider, however, the case where the you.com URL itself has a query parameter, not ?qp=1 as given in the first example, but rather something that itself needs to be URL-encoded. To keep things simple, we'll just use "cat?pictures". We'd need to encode that, making the query parameter
?qp=cat%3Fpictures
In other words, the URL in question is going to be
http://you.com?qp=cat%3Fpictures
If we just use that as is, then our entire URL becomes
http://me.com?url=http%3A%2F%2Fyou.com%3Fqp=cat%3Fpictures
Unfortunately, if we now decode that in a naive way, we get
http://me.com?url=http://you.com?qp=cat?pictures
In other words, the nested URL has been decoded as well, meaning that it will appear as though the URL has two query parameters, namely url and qp. To successfully deal with this problem, we need to encode the second query parameter a second time, yielding
http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3Dcat%253Fpictures
Please note, however, that if you use your language or environment's built-in tools and libraries for handling query parameters, most of this will happen automatically and prevent you from having to worry about it.
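To make the two levels of encoding concrete, here is a small sketch (using the illustrative URLs from above):

var inner = 'http://you.com?qp=' + encodeURIComponent('cat?pictures');
// inner is "http://you.com?qp=cat%3Fpictures"
var outer = 'http://me.com?url=' + encodeURIComponent(inner);
// outer is "http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3Dcat%253Fpictures"
var extracted = decodeURIComponent(outer.split('url=')[1]);
// one decode gives back "http://you.com?qp=cat%3Fpictures", with the inner value still safely encoded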
The symptom only started to show when I used an open source WYSIWYG editor
An editor merely places characters in a file. It's very hard to imagine that an editor is causing the problem you refer to, unless perhaps one editor is configured to use smart quotes, for example, which would pretty much break everything that involved quotes.

Building large strings in JavaScript ; Is join method most efficient?

In writing a database to disk as a text file of JSON strings, I've been experimenting with how to most efficiently build the string of text that is ultimately converted to a blob for download to disk.
There are a number of questions that say not to concatenate a string with the + operator in a loop, but instead to write the component strings to an array and then use the join method to build one large string.
The best explanation I came across of why can be found here, by Joel Mueller:
In JavaScript (and C# for that matter) strings are immutable. They can never be changed, only replaced with other strings. You're probably aware that combined + "hello " doesn't directly modify the combined variable - the operation creates a new string that is the result of concatenating the two strings together, but you must then assign that new string to the combined variable if you want it to be changed.
So what this loop is doing is creating a million different string objects, and throwing away 999,999 of them. Creating that many strings that are continually growing in size is not fast, and now the garbage collector has a lot of work to do to clean up after this."
The thread here was also helpful.
However, using the join method didn't allow me to build the string I was aiming for without getting the error:
allocation size overflow
I was trying to write 50,000 JSON strings from a database into one text file, which simply may have been too large no matter what. I think it was reaching over 350MB. I was just testing the limit of my application and picked something far larger than a user of the application will likely ever create. So, this test case was likely unreasonable.
Nonetheless, this leaves me with three questions about working with large strings.
For the same amount of data overall, does altering the number of array elements joined in a single join operation affect the efficiency in terms of not hitting an allocation size overflow?
For example, I tried writing the JSON strings to a pseudo 3-D array of 100 (and then 50) elements per dimension; and then looped through the outer two dimensions joining them together. 100^3 = 1,000,000 or 50^3 = 125,000 both provide more than enough entries to hold the 50,000 JSON strings. I know I'm not including the 0 index, here.
So, the 50,000 strings were held in an array from a[1][1][1] to a[5][100][100] in the first attempt, and from a[1][1][1] to a[20][50][50] in the second attempt. If the dimensions are i, j, k from outer to inner, I joined all the k elements in each a[i][j], then joined all of those i x j joins, and lastly joined all of these i joins into the final text string.
All attempts still hit the allocation size overflow before completing.
So, is there any difference between joining 50,000 smaller strings in one join versus 50 larger strings, if the total data is the same?
Is there a better, more efficient way to build large strings than the join method?
Does the same principle described by Joel Mueller regarding string concatenation apply to reducing a string through substring, such as string = string.substring(position)?
The context of this third question is that when I read a text file in as a string and break it down into its component JSON strings before writing them to the database, I use an array that is a map of the file layout; so I know the length of each JSON string in advance and repeat three statements inside a loop:
l = map[i].l;                    // length of the next JSON string, from the map
str = text.substring(0, l);      // take it off the front
text = text.substring(l);        // drop it from the remaining text
It would appear that since strings are immutable, this sort of reverse of concatenation step is as inefficient as using the + operator to concatenate.
Would it be more efficient not to delete str from text each iteration, and instead just keep track of the increasing start and end positions for the substrings as I step through the loop reading the entire text string?
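If it helps, here is a rough sketch of that index-tracking alternative: keep the source string intact and advance an offset, instead of rebuilding text on every iteration (map and text are the variables from the snippet above):

var pos = 0;
for (var i = 0; i < map.length; i++) {
    var l = map[i].l;                       // length of the next JSON string, per the map
    var str = text.substring(pos, pos + l); // extract without shrinking text
    pos += l;                               // advance; no new copy of the remainder is created
    // ... write str to the database here ...
}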
Response to message about duplicate question
I got a message, I guess from the stackoverflow system itself, asking me to edit my question explaining why it is different from the proposed duplicate.
Reasons are:
The proposed duplicate asks specifically and exclusively about the maximum size of a single string. None of the three bolded questions, here, asks about the maximum size of a single string, although that is useful to know.
This question asks about the most efficient way of building large strings, and that isn't addressed in the answers found in the proposed duplicate, apart from an efficient way of building a large test string. They don't address how to build a realistic string composed of actual application data.
This question provides a couple links to some information concerning the efficiency of building large strings that may be helpful to those interested in more than the maximum size alone.
This question also has a specific context of why the large string was being built, which led to some suggestions about how to handle that situation in a more efficient manner. Although, in the strictest sense, they don't specifically address the question by title, they do address the broader context of the question as presented, which is how to deal with the large strings, even if that means ways to work around them. Someone searching on this same topic might find real help in these suggestions that is not provided in the proposed duplicate.
So, although the proposed duplicate is somewhat helpful, it doesn't appear to be anywhere near a genuine duplicate of this question in its full context.
Additional Information
This doesn't answer the question concerning the most efficient way to build a large string, but it refers to the comments about how to get around the string size limit.
Converting each component string to a blob, holding the blobs in an array, and then converting the array of blobs into a single blob accomplished this. I don't know what the size limit of a single blob is, but I did see 800 MB mentioned in another question.
A process (or starting point) for creating the blob to write the database to disk and then to read it back in again can be found here.
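As a rough sketch of that blob-of-blobs idea (records stands in for whatever application data is being serialized):

var parts = [];
for (var i = 0; i < records.length; i++) {
    // each part is a small blob, so no single giant string is ever built
    parts.push(new Blob([JSON.stringify(records[i]) + '\n']));
}
var file = new Blob(parts, { type: 'text/plain' }); // a Blob can be constructed from other Blobs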
Regarding the idea of writing the blobs or strings to disk as they are generated on the client, as opposed to generating one giant string or blob for download: although that is the most logical and efficient method, it may not be possible in the scenario presented here of an offline application.
According to this question, web extensions no longer have access to the privileged javascript code necessary to accomplish this through the File API.
I asked this question related to the Streams API write stream method and something called StreamSaver.
In writing a database to disk as a text file of JSON strings.
I see no reason to store the data in a string or array of strings in this case. Instead you can write the data directly to the file.
In the simplest case you can write each string to the file separately.
To get better performance, you could first write some data to a smaller buffer, and then write that buffer to disk when it's full.
For best performance you could create a file of a certain size and create a memory mapping over that file. Then write/copy the data directly to the mapped memory (which is your file). The trick would be to know or guess the size up front, or you could resize the file when needed and then remap the file.
Joining or growing strings will trigger a lot of memory (re)allocations, which is unnecessary overhead in this case.
I don't want the user to have to download more than one file
If the goal is to let a user download that generated file, you could even do better by streaming those strings directly to the user without even creating a file. This also has the advantage that the user starts receiving data immediately instead of first having to wait till the whole file is generated.
Because the file size is not known up front, you could use chunked transfer encoding.

Share Array Reference between JavaScript and ActionScript

I have been working with the WebcamJS library to stream video from the camera in the browser, but I have run into a major performance bottleneck. Since I am using Internet Explorer 11 (and cannot switch to a different browser), this library reverts to a Flash fallback for accessing the camera.
The ActionScript callback that returns the image is prohibitively slow due to its many steps. When it returns the image, it first encodes its byte array as a PNG or JPG, and then as a base64 string. This string is then passed using ExternalInterface to JavaScript, which decodes the image through a data URI. Given that all I need is the byte array in JavaScript, these extra steps seem wasteful.
I have had to tackle a similar problem before, in C++/Python. Rather than repeatedly pass the array data back and forth between the two languages, I used Python to pass a NumPy array reference at the start of the program. Then, they could both access the same data from then on without any extra communication.
Now that you understand my situation, here is the question: is it possible to pass a JavaScript Array or ArrayBuffer by reference to ActionScript? In that case, I could have ActionScript modify the JavaScript array directly, rather than waste time converting, encoding, and decoding the image for each frame.
(WebcamJS: https://github.com/jhuckaby/webcamjs)
Just for completeness: SharedObjects in Flash store data, serialised with the AMF protocol, on the file system (in a very specific, sandboxed and locked place), where JavaScript has no way to read the data.
Have you tried simply calling the ExternalInterface method and passing an array of bytes as an argument? It would be passed by value, automatically converted from the ActionScript data structure to the JavaScript one, but you'd skip all the encoding steps and it should be fast enough.

How to optimize string-to-DOM conversion?

I'm faced with a slightly inconvenient 'lag' when I attempt to populate a div created in JavaScript:
var el = document.createElement("div");
el.innerHTML = '<insert string-HTML code here>';
However, this is natural given the extent of the HTML code; sometimes it's more than 300,000 characters long, and it is retrieved with GM_xmlHttpRequest, which sometimes takes 1000ms (give or take) to complete, plus the additional 500ms caused by the DOM-ification.
I have attempted to get rid of the massive amount of text using substr (granted, not the best idea that could have occurred to me), and it surprisingly worked for the most part, but at certain times the element would fail to accept the HTML code (probably an unmatched <*.?>).
I only need to access an extremely small amount of the text stored inside; regexp is, per bobince, out of the question, and I figured this would be the best approach.
EDIT: I'm inclined to mention that my description of parsing the DOM was understated; I meant to say that this 'text' is the textContent of quite a few elements which I modify. Therefore, regexp isn't an option.
While other answers focus on guessing whether your desire (parsing the DOM without string manipulation) makes sense, I will dedicate this answer to a comparison of reasonable DOM parsing methods.
For a fair comparison, I assume that we need the <body> element (as root container) for the parsed DOM. I have created a benchmark at http://jsperf.com/domparser-vs-innerhtml-vs-createhtmldocument.
var testString = '<body>' + Array(100001).join('<div>x</div>') + '</body>';
function test_innerHTML() {
    var b = document.createElement('body');
    b.innerHTML = testString;
    return b;
}
function test_createHTMLDocument() {
    var d = document.implementation.createHTMLDocument('');
    d.body.innerHTML = testString;
    return d.body;
}
function test_DOMParser() {
    return (new DOMParser).parseFromString(testString, 'text/html').body;
}
The first method is your current one. It is well-supported across all browsers.
Even though the second method has the overhead of creating a full document, it has a big benefit over the first one: resources (images) are not loaded. The overhead of the document is marginal compared to the potential network traffic of the first one.
The last method is, as of writing, only supported in Firefox 12+ (no problem, since you're writing a Greasemonkey script), and it is the specific tool for this job (with the same advantages as the previous method). As its name implies, it is a DOM parser.
The benchmark shows that the original method is the fastest (4.64 ops/s), followed by the DOMParser method (4.22 ops/s). The slowest is the createHTMLDocument method (3.72 ops/s). The differences are minimal though, so I definitely recommend DOMParser for the reasons stated earlier.
I know that you're using GM_xmlhttprequest to fetch data. However, if you're able to use XMLHttpRequest instead, I suggest giving the following method a try: instead of getting plain text as the response, you can get a document as the response:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://www.example.com/');
xhr.responseType = 'document';
xhr.onload = function() {
    var bodyElement = xhr.response.body; // xhr.response is a document object
};
xhr.send();
If the Greasemonkey script is active on a single page for a long time, you can still use this feature for other domains which do not support CORS: insert an iframe in the document whose domain is equal to the other domain (e.g. http://example.com/favicon.ico), and use it as a proxy (activate the GM script for this page as well). The overhead of inserting an iframe is significant, so this option is not viable for one-time requests.
For same-origin requests, this option may be the best one (although not benchmarked, one can argue that returning a document directly instead of intermediate string manipulation offers performance benefits). Unlike the DOMParser+text/html method, the responseType="document" is supported by more browsers: Chrome 18+, Firefox 11+ and IE 10+.
We'd need to know a bit more about your application, but when you're working with that much HTML content, you might just want to use an iframe. It's asynchronous, it won't stall JS code, and it won't introduce a plethora of potential debugging problems.
It can be dangerous to populate an element with raw HTML from an xmlhttprequest, mainly due to potential XSS vulnerabilities and next-to-impossible-to-fix HTML glitches. If at all possible, consider using a template (I believe JQuery offers some sort of templating solution) and loading a small amount of XML/JSON/etc. Only do that if using an iframe is out of the question though.
If you have a giant amount of HTML that is taking a long time to put into the DOM, and you only want a small piece of it, the ways to make that faster are:
Get your server to serve up only the parts of the HTML you actually want. This would save on both the networking transfer time and the DOM parsing time.
If you can't modify the server, then you need to manually parse some of the HTML to eliminate the parts you don't want so not as much will put in the DOM. A regex is one of the slower ways to search a giant string so it's better to use something like .indexOf() if possible to identify the general area you are targeting. If there is a unique id or class and you know the general form of the HTML, you can use a faster algorithm like that to identify the target area. But, without you disclosing the actual HTML to be parsed, we can't offer more specifics than that.
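As a sketch of that second idea (assuming the fragment you want carries a unique marker such as an id; the window size is a guess and depends on the actual page structure):

function extractFragment(html, marker) {
    var start = html.indexOf(marker);
    if (start === -1) return null;
    var from = html.lastIndexOf('<', start);         // back up to the start of the enclosing tag
    var slice = html.substring(from, start + 5000);  // take a generous window after the marker
    return (new DOMParser()).parseFromString(slice, 'text/html').body;
}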

load and display PNG image (Not base64) requiring basic authentication in JavaScript

I'm so stuck on this. I need to retrieve a picture e.g. http://ip:port/icon_contact.png using JavaScript from another server requiring basic authentication. The server can't give base64. Don't worry about x-domain restriction.
thanks in advance,
louenas
If I'm reading your question correctly — which is by no means certain — you want to retrieve the binary data of an image file providing basic authentication information directly (not via the user).
You should be able to do this with the XMLHttpRequest object (you can supply auth information in the open call), but to read binary data from the response I'm fairly sure you'll have to stray into brand-new and/or implementation-specific stuff. Here are links to the MSDN, MDC, and (fairly new) W3C docs. Microsoft's XMLHttpRequest has responseBody, Mozilla's (Firefox's) has mozResponseArrayBuffer, and I believe the W3C docs discuss binary data here.
To display the image having loaded it via the above, you could transform the binary data into a data URL (more correctly "data URI", but no one says that) string and assign the result to an img tag's src. You'd have to convert from whatever the browser-specific binary stuff was into the base64 encoding (for the data URL). (You probably don't have to write the conversion yourself, a quick search indicates that people have been tackling this problem and you can reuse [and possibly contribute back to] their efforts...)
The bad news is that IE only supports data URIs as of IE8, and it limits them to 32k, so you'd have to use nifty slicing techniques like Google does for search previews.
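Putting the pieces above together, here is a rough sketch, assuming a browser new enough to support responseType = 'arraybuffer' (which postdates the vendor-specific properties mentioned); the URL and credentials are placeholders:

var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://ip:port/icon_contact.png', true, 'username', 'password'); // basic auth via open()
xhr.responseType = 'arraybuffer';
xhr.onload = function () {
    var bytes = new Uint8Array(xhr.response);
    var binary = '';
    for (var i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    var dataUri = 'data:image/png;base64,' + btoa(binary); // base64-encode for the data URI
    // assign dataUri to an img element's src, as shown below
};
xhr.send();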
Once you have the data:// string, the img tag part is easy. If you're not using a library:
var img, element;
img = document.createElement('img');
img.src = /* ... the data URI ... */
element = /* ... find the element you want to put the image in, via
document.getElementById or document.getElementsByTagName
or other DOM traversal ... */;
element.appendChild(img);
