What is the maximum file size JavaScript can process? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Javascript Memory Limit
I'm working on an HTML page that uses client-side JavaScript to load a roughly 150 MB XML data file on page load. When the file was around 5 MB, it took 30 seconds to load all the data into an array. But with a 140 MB file, the page stops responding in Firefox and crashes abruptly in Chrome. My snippet processes every individual tag in the XML. My question is: is there a limited heap size for JavaScript? An academic article or other citable resource would be preferable, to support my research.
$(document).ready(function () {
    // Open the XML file
    $.get("xyz.xml", {}, function (xml) {
        // Run the function for each <abc> in the XML file
        $('abc', xml).each(function (i) {
            var a = $(this).find("a").text();
            var b = $(this).find("b").text();
            var c = $(this).find("c").text();
            var ab = $(this).find("ab").text();
            var bc = $(this).find("bc").text();
            var cd = $(this).find("cd").text();
            var de = $(this).find("de").text();
            // process data
            dosomething(a, b, c, ab, bc, cd, de);
        });
    });
});

I don't know of any hard limit. I've been able to load even a 1 GB file. Yes, it was slow to load initially, and everything ran slowly afterwards because most of the memory ends up paged.
However, there are problems with trying to load a single JavaScript object that is that big, mostly because the parsers can't parse an object that is too big. See Have I reached the limits of the size of objects JavaScript in my browser can handle?
For that case, the solution was to break up the creation of the JavaScript object into multiple stages rather than using a single literal statement.
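As a rough sketch of what "multiple stages" means (hypothetical names; in practice the data would be split across several script files or JSON chunks):

```javascript
// Hypothetical sketch: build one big table from several smaller
// literals instead of a single giant object literal.
var table = {};

function addEntries(part) {
    for (var key in part) {
        if (Object.prototype.hasOwnProperty.call(part, key)) {
            table[key] = part[key];
        }
    }
}

// Each stage is small enough for the parser to handle comfortably.
addEntries({ a: 1, b: 2 });
addEntries({ c: 3, d: 4 });
```

Each `addEntries` call is an ordinary statement, so no single literal ever has to be parsed in one go.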

There's a lot here to improve. First of all, I'd recommend the post 76 bytes for faster jQuery. Relying on that, replace your $(this) with $_(this); it will save you a lot of memory and time. If you don't want to use that single-jQuery-object trick, at least cache your variable like this:
$('abc', xml).each(function (i) {
    var $this = $(this);
    a = $this.find("a").text();
    // ...
});
You could also post your dosomething function so we can try to improve it.

Related

PDF.js returns text contents of the whole Document as each Page's textContent

I'm building a client-side app that uses PDF.js to parse the contents of a selected PDF file, and I'm running into a strange issue.
Everything seems to be working great. The code successfully loads the PDF.js PDF object, which then loops through the Pages of the document, and then gets the textContent for each Page.
After I let the code below run, and inspect the data in browser tools, I'm noticing that each Page's textContent object contains the text of the entire document, not ONLY the text from the related Page.
Has anybody experienced this before?
I pulled (and modified) most of the code I'm using from PDF.js posts here, and it's pretty straightforward and seems to perform exactly as expected, aside from this issue:
testLoop: function (event) {
    var file = event.target.files[0];
    var fileReader = new FileReader();
    fileReader.readAsArrayBuffer(file);
    fileReader.onload = function () {
        var typedArray = new Uint8Array(this.result);
        PDFJS.getDocument(typedArray).then(function (pdf) {
            for (var i = 1; i <= pdf.numPages; i++) {
                pdf.getPage(i).then(function (page) {
                    page.getTextContent().then(function (textContent) {
                        console.log(textContent);
                    });
                });
            }
        });
    };
},
Additionally, the sizes of the returned textContent objects are slightly different for each Page, even though all of the objects share a common last item: the last bit of text for the whole document.
Here is an image of my inspector to illustrate that the objects are all very similarly sized.
Through manual inspection of the objects in the inspector shown, I can see that the data from, Page #1, for example, should really only consist of about ~140 array items, so why does the object for that page contain ~700 or so? And why the variation?
It looks like the issue here is the formatting of the PDF document I'm trying to parse. The PDF contains government records in a tabular format, which apparently was not composed according to modern PDF standards.
I've tested the script with different PDF files (which I know are properly composed), and the Page textContent objects returned are correctly split based on the content of the Pages.
In case anyone else runs into this issue in the future, there are at least two possible ways to handle the problem that I can think of so far:
Somehow reformat the malformed PDF to use updated standards, then process it. I don't know how to do this, nor am I sure it's realistic.
Select the largest of the returned Page textContent objects (since they all contain more or less the full text of the document) and do your operations on that textContent object.
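For the second workaround, picking the largest result can be a small helper over the collected objects (pickLargest is a hypothetical name; the items array is the shape getTextContent() resolves with):

```javascript
// Hypothetical helper: given the per-page textContent results (each
// with an items array), keep the one with the most items.
function pickLargest(textContents) {
    return textContents.reduce(function (largest, tc) {
        return tc.items.length > largest.items.length ? tc : largest;
    });
}
```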

Page does not update on setTimeout? [duplicate]

This question already has answers here:
Calling functions with setTimeout()
(6 answers)
Closed 6 years ago.
I wrote a script that will read in a text file and attempt to find duplicate entries in the dataset with the user-entered parameters. As the dataset is quite large, the page tends to freeze. I referenced this post here:
Prevent long running javascript from locking up browser
to perform my calculations in chunks using setTimeout(). I am trying to get a percentage-complete figure to display on the page, but it won't display until the script finishes running (in both Chrome and Firefox).
This is my pump function:
function pump(initDataset, dataset, output, chunk) {
    chunk = calculateChunk(initDataset, dataset, output, chunk);
    document.getElementById("d_notif_container").style.display = "block";
    document.getElementById("d_percent").innerHTML = NOTIF_SEARCH_START + percentComplete + "%"; // This is a div
    document.getElementById("i_pc").value = percentComplete; // This is an input text field
    if (percentComplete < 100) {
        console.log("Running on chunk " + chunk);
        setTimeout(pump(initDataset, dataset, output, chunk), TIMEOUT_DELAY);
    } else {
        comparisonComplete(output);
    }
}
I attempted to display in two different ways, using innerHTML and value (two different elements). The percentComplete is a global variable that is updated in calculateChunk().
The strangest thing is, if I inspect one of those elements, I can watch the HTML change as the percentage counts up (in Chrome). It just doesn't show on the actual page until the script is done running. I've also tried raising the timeout to 1000, with no luck.
Any idea why this is happening?
This code
setTimeout(pump(initDataset, dataset, output, chunk), TIMEOUT_DELAY);
calls pump and passes its return value into setTimeout, exactly the way foo(bar()) calls bar and passes its return value into foo.
If you want to call pump after a timeout with those values, you can use Function#bind to do it:
setTimeout(pump.bind(null, initDataset, dataset, output, chunk), TIMEOUT_DELAY);
// ------------^^^^^^^^^^
Function#bind returns a new function that, when called, will use the given this value (the first argument) and arguments you give bind.
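A plain anonymous wrapper is an equivalent fix, if you prefer it over bind (greet below is a stand-in for pump, used purely for illustration):

```javascript
// Stand-in function for illustration.
function greet(name) { return "hi " + name; }

// Wrong: greet("bob") runs immediately; its *return value* is what
// setTimeout receives.
// setTimeout(greet("bob"), 100);

// Right: a function is passed; greet runs only when the timer fires.
setTimeout(function () { greet("bob"); }, 100);

// Equivalent, using bind as described above:
setTimeout(greet.bind(null, "bob"), 100);
```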

Javascript using File.Reader() to read line by line

This question is close but not quite close enough.
My HTML5 application reads a CSV file (although it applies to text as well) and displays some of the data on screen.
The problem I have is that the CSV files can be huge (with a 1GB file size limit). The good news is, I only need to display some of the data from the CSV file at any point.
The idea is something like (pseudocode):
var content;
var reader = OpenReader(myCsvFile);
var line = 0;
while (reader.hasLinesRemaining)
    if (line % 10 == 1)
        content = currentLine;
    Loop to next line
There are enough articles about how to read the CSV file, I'm using
function openCSVFile(csvFileName) {
    var r = new FileReader();
    r.onload = function (e) {
        var contents = e.target.result;
        var s = "";
    };
    r.readAsText(csvFileName);
}
but I can't see how to read a line at a time in JavaScript, or even whether it's possible.
My CSV data looks like
Some detail: date, ,
More detail: time, ,
val1, val2
val11, val12
#val11, val12
val21, val22
I need to strip out the first two lines, and also consider what to do with the line starting with a # (hence why I need to read through a line at a time).
So, other than loading the whole lot into memory, do I have any options to read a line at a time?
There is no readLine() method to do this as of now. However, some ideas to explore:
Reading from a blob does fire progress events. While it is not required by the specification, the engine might prematurely populate the .result property similar to an XMLHttpRequest.
The Streams API drafts a streaming .read(size) method for file readers. I don't think it is already implemented anywhere, though.
Blobs do have a slice method which returns a new Blob containing a part of the original data. The spec and the synchronous nature of the operation suggest that this is done via references, not copying, and should be quite performant. This would allow you to read the huge file chunk-by-chunk.
Admittedly, none of these methods do automatically stop at line endings. You will need to buffer the chunks manually, break them into lines and shift them out once they are complete. Also, these operations are working on bytes, not on characters, so there might be encoding problems with multi-byte characters that need to be handled.
See also: Reading line-by-line file in JavaScript on client side
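None of those APIs stop at line endings on their own, so the chunk-to-line buffering has to be done by hand. A minimal sketch of that buffering logic (makeLineBuffer and its callback are hypothetical names; the chunks would come from, e.g., FileReader.readAsText on successive blob.slice() ranges):

```javascript
// Hypothetical sketch: accumulate chunks, emit complete lines,
// keep the unfinished tail for the next chunk.
function makeLineBuffer(onLine) {
    var tail = "";
    return function push(chunk) {
        var parts = (tail + chunk).split("\n");
        tail = parts.pop();      // last piece may be an incomplete line
        parts.forEach(onLine);   // emit each complete line
    };
}

// Usage: chunk boundaries can fall anywhere, even mid-word.
var lines = [];
var push = makeLineBuffer(function (line) { lines.push(line); });
push("Some detail: date\nMore de");
push("tail: time\nval1, val2\n");
// lines is now ["Some detail: date", "More detail: time", "val1, val2"]
```

Multi-byte encodings need extra care when slicing raw bytes, as noted above; reading each slice as text with a known encoding sidesteps most of that.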

Find all string variables in javascript files

I am facing the problem that I have to translate a larger HTML and JavaScript project into several languages. The HTML content was no problem, but the numerous JavaScript files are problematic, since I was a bit lazy during the development process. For instance, if I needed a message text, I just added it at the relevant position.
My approach now is to use the built-in file search (Eclipse) for every occurrence of " and ', which I get line by line. This would be extremely time-consuming, and errors are unavoidable.
Here are some examples that occur in the files:
var d = "Datum: " + d.getDate()+"."+(d.getMonth()+1)+"."+d.getFullYear();
showYesNoDialog("heading text", "Are you sure?", function(){});
Sometimes I mix " and ', and sometimes a string spans several lines:
var list="";
list+='<li data-role="list-divider">';
list+='Text To Translate';
list+='</li>';
Things I don't want to match are jQuery selectors, e.g.:
$("input[name^=checkbox]").each(function () {});
Do you see any time-saving method to get all of the strings that I would like to translate?
Regex? A Java interpreter? Grep?
I know this is a bit of an unusual question, so any suggestion would be great.
Thanks in advance!
It is better to use some kind of lexical scanner that converts the code into tokens, and then walk over the list of tokens (or the syntax tree). There are a number of such tools (I even created one of them myself; you can find some examples at https://github.com/AlexAtNet/spelljs/blob/master/test/scan.js).
With it you can scan the JS file and just iterate over the tokens:
var scan = require('./..../scan.js');
scan(text).forEach(function (item) {
    if (item.str) {
        console.log(item);
    }
});

proper ways to reduce loading big hashtable

I have a JavaScript app that contains a big hashtable (1 megabyte). What's the proper way to load it to reduce loading time?
The hash table looks like this:
function unicodeTable(num) {
    // returns the Unicode name of a given codepoint num, in decimal
    var unicodedata = {
        0x0020: "SPACE",
        // tens of thousands of entries here
    };
    ...
}
The app is a Unicode browser: when the user hovers over a character, it displays its name and codepoint.
PS: I'd prefer a solution that doesn't involve a JS library. Thanks.
Addendum:
the page is here
http://xahlee.info/comp/unicode_6_emoticons_list.html
this is the JavaScript
http://xahlee.info/comp/unicode_popup.js
On desktop/laptop it's fine, but I noticed that on a BlackBerry tablet it froze the browser for five or so minutes. I'm not sure how to use Ajax; or perhaps a worker is the answer?
Load it asynchronously, e.g. with AJAX.
Use a CDN if possible.
Define it in a scope that does not terminate. This stops JS from creating and destroying it every time.
function foo() {
    // creating bar
    var bar = 'someValue';
    // using bar here
    // bar will be destroyed as soon as foo is done
}
But in this one, bar lives since it's outside foo:
// creating bar
var bar = 'someValue';
function foo() {
    // using bar here
    // bar still lives after foo is done
}
Offload processing (search, access, traversal) to a Web Worker, or to a simulated worker using timers.
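A minimal sketch of the "simulated worker using timers" idea (processInChunks is a hypothetical name; the point is to yield back to the browser between chunks so the UI stays responsive):

```javascript
// Hypothetical sketch: walk an array of keys in small batches,
// rescheduling itself with setTimeout between batches.
function processInChunks(keys, handleKey, done, chunkSize) {
    var i = 0;
    (function step() {
        var end = Math.min(i + chunkSize, keys.length);
        for (; i < end; i++) {
            handleKey(keys[i]);
        }
        if (i < keys.length) {
            setTimeout(step, 0); // yield to the browser, then continue
        } else {
            done();
        }
    })();
}
```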
First, make sure you compress the responses with gzip; that will save a lot of space, since text compresses really well. Second, there is no need for a hashtable: you can just have an array with the names at their correct locations, and when someone hovers over the space element on the page, you index into this array at [0x20] and retrieve the name. As I see it, that saves about 7 bytes per character; a bit from here and a bit from there adds up. But that's about all I can think of.
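A tiny sketch of that array idea (nameOf is a hypothetical name; slots with no entry simply stay undefined in a sparse array):

```javascript
// Hypothetical sketch: names stored at their codepoint index,
// so lookup is a plain array access instead of a hashtable probe.
var unicodeNames = [];
unicodeNames[0x0020] = "SPACE";
unicodeNames[0x0021] = "EXCLAMATION MARK";

function nameOf(codepoint) {
    return unicodeNames[codepoint] || "(unknown)";
}
```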
