JavaScript: using FileReader() to read line by line

This question is close but not quite close enough.
My HTML5 application reads a CSV file (although it applies to text as well) and displays some of the data on screen.
The problem I have is that the CSV files can be huge (with a 1GB file size limit). The good news is, I only need to display some of the data from the CSV file at any point.
The idea is something like this (pseudocode):
var content;
var reader = OpenReader(myCsvFile);
var line = 0;
while (reader.hasLinesRemaining) {
    if (line % 10 == 1)
        content = currentLine;
    // loop to next line
}
There are enough articles about how to read the CSV file, I'm using
function openCSVFile(csvFileName) {
    var r = new FileReader();
    r.onload = function(e) {
        var contents = e.target.result;
        var s = "";
    };
    r.readAsText(csvFileName);
}
but I can't see how to read one line at a time in JavaScript, or even whether it's possible.
My CSV data looks like
Some detail: date, ,
More detail: time, ,
val1, val2
val11, val12
#val11, val12
val21, val22
I need to strip out the first 2 lines, and also decide what to do with the line starting with a # (hence why I need to read through the file a line at a time).
So, other than loading the lot into memory, do I have any options to read a line at a time?

There is no readLine() method to do this as of now. However, some ideas to explore:
Reading from a blob does fire progress events. While it is not required by the specification, the engine might prematurely populate the .result property similar to an XMLHttpRequest.
The Streams API drafts a streaming .read(size) method for file readers. I don't think it is already implemented anywhere, though.
Blobs do have a slice method which returns a new Blob containing a part of the original data. The spec and the synchronous nature of the operation suggest that this is done via references, not copying, and should be quite performant. This would allow you to read the huge file chunk-by-chunk.
Admittedly, none of these methods do automatically stop at line endings. You will need to buffer the chunks manually, break them into lines and shift them out once they are complete. Also, these operations are working on bytes, not on characters, so there might be encoding problems with multi-byte characters that need to be handled.
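For illustration, a minimal sketch of the slice-based approach, buffering partial lines across chunk boundaries. The helper name, the 1 MB chunk size, and the "\n" line ending are choices made for this sketch, not anything the API prescribes; per the caveat above, multi-byte characters that straddle a chunk boundary are not handled here.
// Sketch only: read a File/Blob in chunks and emit complete lines.
function readLines(file, onLine, onDone, chunkSize) {
    chunkSize = chunkSize || 1024 * 1024; // 1 MB per slice (arbitrary)
    var offset = 0;
    var buffer = "";
    var reader = new FileReader();

    reader.onload = function () {
        buffer += reader.result;
        var lines = buffer.split("\n");
        buffer = lines.pop();                    // keep the trailing partial line
        lines.forEach(function (l) { onLine(l); });
        offset += chunkSize;
        if (offset < file.size) {
            readNext();
        } else {
            if (buffer.length) onLine(buffer);   // last line without a newline
            onDone();
        }
    };

    function readNext() {
        reader.readAsText(file.slice(offset, offset + chunkSize));
    }

    readNext();
}
For the CSV in the question, the onLine callback would be the place to skip the first two lines and the ones starting with #.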
See also: Reading line-by-line file in JavaScript on client side

Related

Decompressing bzip2 data in Javascript

I ultimately have to consume some data from a Javascript file that looks as follows:
Note: The base64 is illustrative only.
function GetTripsDataCompressed() { return 'QlpoOTFBWSZTWdXoWuEDCAgfgBAHf/.....=='; }
GetTripsDataCompressed() returns a Base64 string: an array of objects is converted to JSON using JSON.NET, the resulting string is compressed to bzip2 using SharpCompress, and the resulting memory stream is then Base64 encoded.
This is what I have and cannot change it.
I am struggling to find a bzip2 JavaScript implementation that will take the result of:
var rawBzip2Data = atob(GetTripsDataCompressed());
and convert rawBzip2Data back into the string that is the JSON array. I cannot use something like compressjs as I need to support IE 10 and as it uses typed arrays that means IE10 support is out.
So it appears that my best option is https://github.com/antimatter15/bzip2.js however because I have not created an archive and only bzip2 a string it raises an error of Uncaught No magic number found after doing:
var c = GetTripsDataCompressed();
c = atob(c);
var arr = new Uint8Array(c);
var bitstream = bzip2.array(arr);
bzip2.simple(bitstream);
So can anyone help me here to decompress a BZip2, Base64 encoded string from JavaScript using script that is IE 10 compliant? Ultimately I don't care whether it uses https://github.com/antimatter15/bzip2.js or some other native JavaScript implementation.
It seems to me the answer is in the readme:
decompress(bitstream, size[, len]) does the main decompression of a single block. It'll return -1 if it detects that it's the final block, otherwise it returns a string with the decompressed data. If you want to cap the output to a certain number of bytes, set the len argument.
Also, keep in mind the repository doesn't have a license attached. You'll need to reach out to the author if you want to use the code. That might be tricky given that the repository is eight years old.
On the other hand, the Bzip2 algorithm itself is open-source (BSD-like license), so you can just reimplement it yourself in JavaScript. It's just a few hundred lines of relatively straightforward code.
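As an aside, a minimal sketch of wiring this together, assuming the call sequence from the question (bzip2.array and bzip2.simple) is otherwise what the library expects; none of those calls are verified here. The only step spelled out is converting the decoded Base64 string into an actual byte array, since handing the string itself to the Uint8Array constructor does not yield the underlying byte values.
// Hedged sketch: bzip2.array/bzip2.simple are taken from the question, not
// checked against the library; only the Base64-to-bytes step is made explicit.
var raw = atob(GetTripsDataCompressed());   // Base64 -> binary string
var bytes = new Uint8Array(raw.length);
for (var i = 0; i < raw.length; i++) {
    bytes[i] = raw.charCodeAt(i);           // copy each byte value explicitly
}

var bitstream = bzip2.array(bytes);
var json = bzip2.simple(bitstream);         // or decompress() per the readme excerpt
var trips = JSON.parse(json);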

Insert PNG comment block (iTXt) using javascript

I want to insert a UTF-8 comment into a PNG. The context is a modern browser: export a canvas to PNG and add some metadata before the user downloads it, then later import the file and read the metadata back.
The PNG specification covers this kind of metadata in the iTXt chunk.
There is a good answer here on SO about this, with all the steps to build a tEXt chunk, but without code.
I found a simple nodejs library node-png-metadata to manage PNG metadata.
With these resources I managed some tricks, like inserting a chunk and reading it back, but it seems it's not a valid iTXt chunk (same with a tEXt chunk), because tools like pngchunks or pnginfo can't understand it.
See this working fiddle to play with: import a PNG, and it will add metadata and display it. Test with a tEXt or iTXt chunk.
Near line 21 there are some tests around creation of the chunk:
var txt = {
    sample: '#à$'
};
var newchunk = metadata.createChunk("tEXt", "Comment" + String.fromCharCode(0x00) + "heremycommentl"); // works but not conform
var newchunk = metadata.createChunk("TEXt", "Comment" + String.fromCharCode(0x00) + "heremycommentl"); // result invalid png
var newchunk = metadata.createChunk("iTXt", "Source" + String.fromCharCode(0x00) + "00fr" + String.fromCharCode(0x00) + "Source" + String.fromCharCode(0x00) + "" + JSON.stringify(txt)); // works but not conform
Besides, why is the resulting PNG corrupted if the first character of the chunk type name is uppercase (TEXt)?
If any of you have insight to share, you're welcome to.
Chunk names are case-sensitive: tEXt is the name of the chunk, TEXt is not. And since an uppercase first letter marks the chunk as critical, no PNG tool can understand the image, because it now contains an unknown critical chunk.
The iTXt one is broken because the compression flag and method have to be stored directly as byte values, not as the ASCII representation of the numbers. Changing it to:
metadata.createChunk("iTXt", "Source"+String.fromCharCode(0x00)+String.fromCharCode(0x00)+String.fromCharCode(0x00)+"fr"+String.fromCharCode(0x00)+"Source"+String.fromCharCode(0x00)+""+JSON.stringify(txt));
makes it work.
metadata.createChunk("tEXt", "Comment"+String.fromCharCode(0x00)+"heremycommentl") doesn't cause any issues with pnginfo, perhaps you confused the error there with the iTXt one?
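For reference, a hedged sketch of how the iTXt chunk data is laid out per the PNG specification, using the createChunk call from the question (its exact signature is assumed from the snippets above, not checked against node-png-metadata); the helper name buildITXtData is made up for this sketch.
// iTXt chunk data layout (PNG spec):
//   keyword (1-79 bytes, Latin-1), null separator,
//   compression flag (1 byte: 0 = uncompressed), compression method (1 byte: 0),
//   language tag, null separator,
//   translated keyword (UTF-8), null separator,
//   text (UTF-8)
var NUL = String.fromCharCode(0x00);

function buildITXtData(keyword, languageTag, translatedKeyword, text) {
    return keyword + NUL +
           String.fromCharCode(0) +   // compression flag: uncompressed
           String.fromCharCode(0) +   // compression method: 0
           languageTag + NUL +
           translatedKeyword + NUL +
           text;
}

// Usage, assuming createChunk(type, data) as in the question:
var newchunk = metadata.createChunk("iTXt",
    buildITXtData("Source", "fr", "Source", JSON.stringify(txt)));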

PDF.js returns text contents of the whole Document as each Page's textContent

I'm building a client-side app that uses PDF.js to parse the contents of a selected PDF file, and I'm running into a strange issue.
Everything seems to be working great. The code successfully loads the PDF.js PDF object, which then loops through the Pages of the document, and then gets the textContent for each Page.
After I let the code below run, and inspect the data in browser tools, I'm noticing that each Page's textContent object contains the text of the entire document, not ONLY the text from the related Page.
Has anybody experienced this before?
I pulled (and modified) most of the code I'm using from PDF.js posts here, and it's pretty straightforward and seems to perform exactly as expected, aside from this issue:
testLoop: function (event) {
    var file = event.target.files[0];
    var fileReader = new FileReader();
    fileReader.readAsArrayBuffer(file);
    fileReader.onload = function () {
        var typedArray = new Uint8Array(this.result);
        PDFJS.getDocument(typedArray).then(function (pdf) {
            for (var i = 1; i <= pdf.numPages; i++) {
                pdf.getPage(i).then(function (page) {
                    page.getTextContent().then(function (textContent) {
                        console.log(textContent);
                    });
                });
            }
        });
    };
},
Additionally, the sizes of the returned textContent objects are slightly different for each Page, even though all of the objects share a common last element - the last bit of text for the whole document.
Here is an image of my inspector to illustrate that the objects are all very similarly sized.
Through manual inspection of the objects in the inspector shown, I can see that the data from Page #1, for example, should really only consist of about ~140 array items, so why does the object for that page contain ~700 or so? And why the variation?
It looks like the issue here is the formatting of the PDF document I'm trying to parse. The PDF contains government records in a tabular format, which apparently was not composed according to modern PDF standards.
I've tested the script with different PDF files (which I know are properly composed), and the Page textContent objects returned are correctly split based on the content of the Pages.
In case anyone else runs into this issue in the future, there are at least two possible ways to handle the problem, as far as I have imagined so far:
Somehow reformat the malformed PDF to use updated standards, then process it. I don't know how to do this, nor am I sure it's realistic.
Select the largest of the returned Page textContent objects (since they all contain more or less the full text of the document) and do your operations on that textContent object, as in the sketch below.
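A minimal sketch of that second option, using the same PDFJS calls as the question (typedArray comes from the snippet above); treating the textContent with the most items as the "largest" is a heuristic chosen for this sketch, not something PDF.js defines.
// Sketch: collect every page's textContent, then keep the one with the most
// items (for this malformed PDF they all hold roughly the whole text).
PDFJS.getDocument(typedArray).then(function (pdf) {
    var pagePromises = [];
    for (var i = 1; i <= pdf.numPages; i++) {
        pagePromises.push(pdf.getPage(i).then(function (page) {
            return page.getTextContent();
        }));
    }
    return Promise.all(pagePromises);
}).then(function (contents) {
    var largest = contents.reduce(function (best, current) {
        return current.items.length > best.items.length ? current : best;
    });
    // Operate on largest.items (each item has a `str` property).
    console.log(largest.items.map(function (item) { return item.str; }).join(" "));
});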

What is the maximum file size javascript can process? [duplicate]

Possible Duplicate:
Javascript Memory Limit
I'm working on an HTML page that uses client-side JavaScript to load around 150 MB of XML data on page load. When the file size was around 5 MB, it took 30 seconds to load all the data into an array. But when I switched to a 140 MB file, the page stopped responding in Firefox and crashed abruptly in Chrome. My snippet processes every individual tag in the XML. My question is: is there a heap size limit for JavaScript? An academic article or similar resource would be preferable, to support my research.
$(document).ready(function () {
    // Open the xml file
    $.get("xyz.xml", {}, function (xml) {
        // Run the function for each <abc> in the XML file
        $('abc', xml).each(function (i) {
            a = $(this).find("a").text();
            b = $(this).find("b").text();
            c = $(this).find("c").text();
            ab = $(this).find("ab").text();
            bc = $(this).find("bc").text();
            cd = $(this).find("cd").text();
            de = $(this).find("de").text();
            // process data
            dosomething(a, b, c, ab, bc, cd, de);
        });
    });
});
I don't know of any hard limits. I've been able to load even a 1 GB file. Yes, it was slow to load initially and everything ran slowly, because most of the memory was paged.
However, there are problems with trying to load a single JavaScript object that is that big, mostly because the parsers can't parse an object that is too big. See Have I reached the limits of the size of objects JavaScript in my browser can handle?
For that case, the solution was to break up the creation of the JavaScript object into multiple stages rather than using a single literal statement.
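As a rough illustration of what "multiple stages" means (the data and the addChunk helper here are made up for this sketch): instead of one enormous literal, build the structure incrementally so no single statement has to be parsed in one go.
// Problematic: one gigantic literal the parser must handle at once.
// var data = [ {...}, {...}, /* ...hundreds of thousands of entries... */ ];

// Staged alternative: build the same array in smaller pieces.
var data = [];

function addChunk(chunk) {
    for (var i = 0; i < chunk.length; i++) {
        data.push(chunk[i]);
    }
}

// Each of these could live in its own script or be fetched separately.
addChunk([{ id: 1, value: "a" }, { id: 2, value: "b" }]);
addChunk([{ id: 3, value: "c" }, { id: 4, value: "d" }]);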
There is a lot to improve here. First of all, I'd recommend the post "76 bytes for faster jQuery"; based on it, replace your $(this) with $_(this).
It will save you a lot of memory and time.
If you don't want to use a single shared jQuery object, then at least cache your variable like this:
$('abc', xml).each(function (i) {
    var $this = $(this);
    a = $this.find("a").text();
    // ...
});
And you could post your dosomething function so we can try to improve it as well.

How to read a huge text file through javascript or jquery?

How can I read a huge text file line by line through JavaScript or jQuery?
I can't read it all and split it into an array because that would require a lot of memory. I just want to stream it...
EDIT
As a note, I am working on a Google Chrome extension, so solutions using the FSO ActiveX object do not work in this browser. Any other ideas?
HTML5 finally provides a standard way to interact with local files, via the File API specification. As an example of its capabilities, the File API could be used to create a thumbnail preview of images as they're being sent to the server, or allow an app to save a file reference while the user is offline. Additionally, you could use client-side logic to verify an upload's mimetype matches its file extension or restrict the size of an upload.
The spec provides several interfaces for accessing files from a 'local' filesystem:
1. File - an individual file; provides readonly information such as name, file size, mimetype, and a reference to the file handle.
2. FileList - an array-like sequence of File objects. (Think <input type="file" multiple> or dragging a directory of files from the desktop.)
3. Blob - Allows for slicing a file into byte ranges.
When used in conjunction with the above data structures, the FileReader interface can be used to asynchronously read a file through familiar JavaScript event handling. Thus, it is possible to monitor the progress of a read, catch errors, and determine when a load is complete. In many ways the APIs resemble XMLHttpRequest's event model.
Note: At the time of writing this tutorial, the necessary APIs for working with local files are supported in Chrome 6.0 and Firefox 3.6. As of Firefox 3.6.3, the File.slice() method is not supported.
http://www.html5rocks.com/en/tutorials/file/dndfiles/
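A minimal example of the FileReader event model described above, assuming a file chosen through an <input type="file"> element with id "fileInput" (the element id and the logging are made up for this sketch):
document.getElementById('fileInput').addEventListener('change', function (event) {
    var file = event.target.files[0];          // a File object from the FileList
    var reader = new FileReader();

    reader.onprogress = function (e) {
        if (e.lengthComputable) {
            console.log('Read ' + Math.round(100 * e.loaded / e.total) + '%');
        }
    };
    reader.onerror = function () {
        console.error('Read failed', reader.error);
    };
    reader.onload = function () {
        console.log('Loaded ' + reader.result.length + ' characters');
    };

    reader.readAsText(file);
});
For truly huge files you would combine this with Blob slicing (interface 3 above) so that only one chunk is in memory at a time.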
The Lazy Text View widget is intended to display text on a web page. The key feature is that it does not load the whole text into browser memory; it displays only a fragment (frame) of the file. This makes it possible to display large, very large, even huge texts.
The widget provides the user interface for text display and requires a server-side data source. You have to implement the server-side component yourself; its logic is quite simple. When the widget needs the next chunk of text, it queries the server (using a POST request) for it.
http://polyakoff.ucoz.net/
fs.read(fd, buffer, offset, length, position, [callback])
Read data from the file specified by fd.
buffer is the buffer that the data will be written to.
offset is offset within the buffer where writing will start.
length is an integer specifying the number of bytes to read.
position is an integer specifying where to begin reading from in the file. If position is null, data will be read from the current file position.
http://nodejs.org/docs/v0.4.8/api/fs.html#file_System
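For completeness, a small sketch of reading one fixed-size chunk with the fs.read signature quoted above. This is Node.js, so it only helps server-side; the file path and chunk size are placeholders for this sketch.
var fs = require('fs');

var CHUNK_SIZE = 64 * 1024;                      // 64 KB per read (arbitrary)
var buffer = Buffer.alloc(CHUNK_SIZE);           // older Node used new Buffer(CHUNK_SIZE)

fs.open('huge.txt', 'r', function (err, fd) {
    if (err) throw err;
    // Read CHUNK_SIZE bytes starting at position 0 of the file.
    fs.read(fd, buffer, 0, CHUNK_SIZE, 0, function (err, bytesRead) {
        if (err) throw err;
        var text = buffer.toString('utf8', 0, bytesRead);
        // Split into lines here, keeping any trailing partial line for the next read.
        console.log(text.split('\n')[0]);        // e.g. the first complete line
        fs.close(fd, function () {});
    });
});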
TextStream and Scripting.FileSystemObject
; object = ObjectOpen("Scripting.FileSystemObject") ; WIL syntax
; ObjectClose(object) ; WIL syntax
;
; TextStream = object.CreateTextFile(filename[, overwrite[, unicode]]) ; Creates a file as a TextStream
; TextStream = object.OpenTextFile(filename[, iomode[, create[, format]]]) ; Opens a file as a TextStream
;
; TextStream.Close ; Close a text stream.
;
; TextStream.ReadAll ; Read the entire stream into a string.
; TextStream.ReadLine ; Read an entire line into a string.
; TextStream.Read (n) ; Read a specific number of characters into a string.
;
; TextStream.Write (string) ; Write a string to the stream.
; TextStream.WriteLine ; Write an end of line to the stream.
; TextStream.WriteLine (string) ; Write a string and an end of line to the stream.
; TextStream.WriteBlankLines (n) ; Write a number of blank lines to the stream.
;
; TextStream.SkipLine ; Skip a line.
; TextStream.Skip (n) ; Skip a specific number of characters.
;
; TextStream.Line ; Current line number.
; TextStream.Column ; Current column number.
;
; TextStream.AtEndOfLine ; Boolean Value. Is the current position at the end of a line?
; TextStream.AtEndOfStream ; Boolean Value. Is the current position at the end of the stream?
; -------------------------------------------------------------------------------------------------------------------------------
Sample Code:
function ReadFiles()
{
    var fso, f1, ts, s;
    var ForReading = 1;
    fso = new ActiveXObject("Scripting.FileSystemObject");
    f1 = fso.CreateTextFile("c:\\testfile.txt", true);
    // Write a line.
    Response.Write("Writing file <br>");
    f1.WriteLine("Hello World");
    f1.WriteBlankLines(1);
    f1.Close();
    // Read the contents of the file.
    Response.Write("Reading file <br>");
    ts = fso.OpenTextFile("c:\\testfile.txt", ForReading);
    s = ts.ReadLine();
    Response.Write("File contents = '" + s + "'");
    ts.Close();
}
