How to read a huge text file through javascript or jquery?


How can I read a huge text file line by line through javascript or jquery?
I can't read the whole file and split it into an array because that would require a lot of memory. I just want to stream it...
EDIT
As a note, I am working on a Google Chrome extension, so solutions based on the fso ActiveX object do not work in this browser. Any other ideas?

HTML5 finally provides a standard way to interact with local files, via the File API specification. As an example of its capabilities, the File API could be used to create a thumbnail preview of images as they're being sent to the server, or allow an app to save a file reference while the user is offline. Additionally, you could use client-side logic to verify that an upload's mimetype matches its file extension or to restrict the size of an upload.
The spec provides several interfaces for accessing files from a 'local' filesystem:
1. File - an individual file; provides read-only information such as name, file size, mimetype, and a reference to the file handle.
2. FileList - an array-like sequence of File objects. (Think of dragging a directory of files from the desktop.)
3. Blob - allows for slicing a file into byte ranges.
When used in conjunction with the above data structures, the FileReader interface can be used to asynchronously read a file through familiar JavaScript event handling. Thus, it is possible to monitor the progress of a read, catch errors, and determine when a load is complete. In many ways the APIs resemble XMLHttpRequest's event model.
Note: At the time of writing this tutorial, the necessary APIs for working with local files are supported in Chrome 6.0 and Firefox 3.6. As of Firefox 3.6.3, the File.slice() method is not supported.
http://www.html5rocks.com/en/tutorials/file/dndfiles/

The Lazy Text View widget is intended to display text on a web page. The key feature is that it does not load the whole text into browser memory, but displays only a fragment (frame) of the file. This makes it possible to display large, very large, even huge texts.
The widget provides a user interface for text display and requires a server-side data source. You have to implement the server-side component yourself; its logic is quite simple. When the widget needs the next chunk of text, it queries the server for it (using the POST method).
http://polyakoff.ucoz.net/

fs.read(fd, buffer, offset, length, position, [callback])
Read data from the file specified by fd.
buffer is the buffer that the data will be written to.
offset is offset within the buffer where writing will start.
length is an integer specifying the number of bytes to read.
position is an integer specifying where to begin reading from in the file. If position is null, data will be read from the current file position.
http://nodejs.org/docs/v0.4.8/api/fs.html#file_System
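As a rough illustration of positional reads, here is a minimal sketch that counts the lines of a file while holding only one fixed-size buffer in memory. It only applies where Node.js is available (not inside a Chrome extension page), it uses the synchronous variant fs.readSync for brevity rather than the callback form quoted above, and the function name countLines and the chunk size are assumptions made for this example.

```javascript
const fs = require("fs");

// Count lines by reading the file in fixed-size chunks at explicit positions.
// countLines and chunkSize are illustrative names, not part of any API.
function countLines(path, chunkSize = 64 * 1024) {
  const fd = fs.openSync(path, "r");
  const buffer = Buffer.alloc(chunkSize);
  let position = 0; // where in the file the next read starts
  let newlines = 0;
  let lastByte = -1; // last byte seen, to detect a missing final newline
  try {
    for (;;) {
      // fs.readSync(fd, buffer, offset, length, position) returns the bytes read.
      const bytesRead = fs.readSync(fd, buffer, 0, chunkSize, position);
      if (bytesRead === 0) break; // end of file
      for (let i = 0; i < bytesRead; i++) {
        if (buffer[i] === 0x0a) newlines++;
      }
      lastByte = buffer[bytesRead - 1];
      position += bytesRead;
    }
  } finally {
    fs.closeSync(fd);
  }
  // A non-empty file whose last line lacks a trailing "\n" still has that line.
  return position > 0 && lastByte !== 0x0a ? newlines + 1 : newlines;
}
```

The same loop could hand each chunk to a line splitter instead of just counting newline bytes.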

TextStream and Scripting.FileSystemObject
; object = ObjectOpen("Scripting.FileSystemObject") ; WIL syntax
; ObjectClose(object) ; WIL syntax
;
; TextStream = object.CreateTextFile(filename[, overwrite[, unicode]]) ; Creates a file as a TextStream
; TextStream = object.OpenTextFile(filename[, iomode[, create[, format]]]) ; Opens a file as a TextStream
;
; TextStream.Close ; Close a text stream.
;
; TextStream.ReadAll ; Read the entire stream into a string.
; TextStream.ReadLine ; Read an entire line into a string.
; TextStream.Read (n) ; Read a specific number of characters into a string.
;
; TextStream.Write (string) ; Write a string to the stream.
; TextStream.WriteLine ; Write an end of line to the stream.
; TextStream.WriteLine (string) ; Write a string and an end of line to the stream.
; TextStream.WriteBlankLines (n) ; Write a number of blank lines to the stream.
;
; TextStream.SkipLine ; Skip a line.
; TextStream.Skip (n) ; Skip a specific number of characters.
;
; TextStream.Line ; Current line number.
; TextStream.Column ; Current column number.
;
; TextStream.AtEndOfLine ; Boolean Value. Is the current position at the end of a line?
; TextStream.AtEndOfStream ; Boolean Value. Is the current position at the end of the stream?
; -------------------------------------------------------------------------------------------------------------------------------
Sample Code:
function ReadFiles()
{
    var fso, f1, ts, s;
    var ForReading = 1;
    fso = new ActiveXObject("Scripting.FileSystemObject");
    f1 = fso.CreateTextFile("c:\\testfile.txt", true);
    // Write a line.
    Response.Write("Writing file <br>");
    f1.WriteLine("Hello World");
    f1.WriteBlankLines(1);
    f1.Close();
    // Read the contents of the file.
    Response.Write("Reading file <br>");
    ts = fso.OpenTextFile("c:\\testfile.txt", ForReading);
    s = ts.ReadLine();
    Response.Write("File contents = '" + s + "'");
    ts.Close();
}

Related

Insert PNG comment block (iTXt) using javascript

I want to insert a UTF-8 comment in a PNG. The context is a modern browser: export a canvas and add some metadata to the PNG before the user downloads it, then later import it and read the metadata back.
The PNG spec describes iTXt as the chunk for this kind of metadata.
I found a good answer here on SO about this, with all the steps to build a tEXt chunk but without code.
I also found a simple nodejs library, node-png-metadata, to manage PNG metadata.
With these resources I managed to insert a chunk and read it back, but it seems it is not a valid iTXt chunk (same with the tEXt chunk), because tools like pngchunks or pnginfo can't understand it.
See this working fiddle to play with it: import a PNG and it will add metadata and display it. You can test with a tEXt or iTXt chunk.
Near line 21 there are some tests around the creation of the chunk:
var txt = {
sample: '#à$'
};
var newchunk = metadata.createChunk("tEXt", "Comment"+String.fromCharCode(0x00)+"heremycommentl"); // works but not conform
var newchunk = metadata.createChunk("TEXt", "Comment"+String.fromCharCode(0x00)+"heremycommentl"); // result invalid png
var newchunk = metadata.createChunk("iTXt", "Source"+String.fromCharCode(0x00)+"00fr"+String.fromCharCode(0x00)+"Source"+String.fromCharCode(0x00)+""+JSON.stringify(txt));// works but not conform
Besides, why is the resulting PNG corrupted if the first character of the chunk type name is upper case (TEXt)?
If any of you have some understanding to share, you're welcome.

Chunk names are case-sensitive, tEXt is the name of the chunk, TEXt is not. And since the first letter is uppercase, making the chunk critical, no PNG tools can understand the image since there is now an unknown critical chunk.
The iTXt one is broken because the compression flag and method are stored directly, not as the ASCII representation of the numbers. Changing it to:
metadata.createChunk("iTXt", "Source"+String.fromCharCode(0x00)+String.fromCharCode(0x00)+String.fromCharCode(0x00)+"fr"+String.fromCharCode(0x00)+"Source"+String.fromCharCode(0x00)+""+JSON.stringify(txt));
makes it work.
metadata.createChunk("tEXt", "Comment"+String.fromCharCode(0x00)+"heremycommentl") doesn't cause any issues with pnginfo, perhaps you confused the error there with the iTXt one?
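To make the iTXt field order easier to see, here is a minimal sketch that lays out the data part of an uncompressed iTXt chunk as bytes. It is independent of node-png-metadata (whose createChunk takes a string); the function name buildITXtData is an assumption for this example, and the keyword is assumed to be plain ASCII.

```javascript
// Build the data part (without length, type and CRC) of an uncompressed iTXt chunk.
// buildITXtData is an illustrative name, not part of any library.
function buildITXtData(keyword, languageTag, translatedKeyword, text) {
  const enc = new TextEncoder(); // TextEncoder always produces UTF-8
  const nul = Uint8Array.of(0);
  const parts = [
    enc.encode(keyword),           // keyword (assumed ASCII here)
    nul,                           // NUL separator
    Uint8Array.of(0),              // compression flag: the byte 0, not the character "0"
    Uint8Array.of(0),              // compression method: the byte 0
    enc.encode(languageTag),       // e.g. "fr"
    nul,
    enc.encode(translatedKeyword), // UTF-8
    nul,
    enc.encode(text),              // UTF-8 text, not NUL-terminated
  ];
  const total = parts.reduce((sum, p) => sum + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```

For example, buildITXtData("Source", "fr", "Source", JSON.stringify(txt)) yields the same field sequence as the corrected createChunk call above, with the text encoded as UTF-8 bytes.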

How do I insert a PNG comment block when saving an HTML5 canvas using javascript toDataURL?

I have a compact canvas-to-png download saver function (see code below).
This code works very well and I am satisfied with its output... mostly.
Would a second replace suffice? What would that replace look like?
My only other option is to post-process the file with imagemagick.
Any ideas?
More completely: I want to add metadata from javascript.
I found this link http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_PNG_files
which details the structures, and I may be able to figure it out with sufficient time.
If anyone has experience and can shorten this for me, I would appreciate it.
//------------------------------------------------------------------
function save () // has to be function not var for onclick to work.
//------------------------------------------------------------------
{
    var element = document.getElementById("saver");
    element.download = savename;
    element.href = document.
        getElementById(id.figure1a.canvas).
        toDataURL("image/png").
        // [^;]+ matches the whole subtype ("png"); a bare [^;] would replace only its first letter
        replace(/^data:image\/[^;]+/,'data:application/octet-stream');
}

The Base-64 representation has little to do with the internal chunks. It's just [any] binary data encoded as string so it can be transferred over string-only protocols (or displayed in a textual context).
It's perhaps a bit broad to create an example, but hopefully showing the main steps will help to achieve what you're looking for:
To add a chunk to a PNG you would first have to convert the data for it into an ArrayBuffer using XHR/fetch in the case of Data-URIs, or FileReader in case you have the PNG as Blob (which I recommend. See toBlob()).
Add a DataView to the ArrayBuffer
Go to position 0x08 in the array, which is the start of the IHDR chunk, and read the length of the chunk (Uint32). It very likely has the same static size for almost any PNG, but since that could change, and since you then don't need to remember the chunk size, we'll just read it from here. Add the length to the position (+4 for the chunk type, +4 for the CRC-32 at the end of the chunk, and +4 if you didn't move the pointer while reading the length); typically this should land you at position 0x21.
You now have the position for the next chunk which we can use to insert our own text chunks
Split that first part into a part-array (a regular array) using a sub-array with the original ArrayBuffer, e.g. new Uint8Array(arraybuffer, 0, position); - you can also use the subarray method.
Produce the new chunk* as typed array and add to part-array
Add the remaining part of the original PNG array without the first part to the part-array, e.g. new Uint8Array(arraybuffer, position, length - position);
Convert the part-array to a Blob using the part-array directly as argument (var newPng = new Blob(partArray, {type: "image/png"});). This will now contain the custom chunk. From there you can use an Object-URL with it to read it back as an image (or make it available for download).
*) Chunk:
For tEXt, be aware that it is limited to the Latin-1 charset, which means you'll have to sanitize the string you want to use. Use iTXt for Unicode (UTF-8) content; we'll use tEXt here for simplicity.
The keyword and value are separated by a NUL byte (0x00) in a tEXt chunk, and the keyword must be typed exactly as defined in the spec.
Build the chunk this way:
get byte-size from string
add 12 bytes (for length, four-cc and crc-32)
format the array this way (you can use a DataView here as well):
Uint32 - length of chunk (data only in number of bytes)
Uint32 - "tEXt" as four-cc
[...] - The data itself (copy byte-wise)
Uint32 - CRC32* which includes the FourCC but not length and itself.
All data in a PNG is big-endian.
To calculate CRC-32 feel free to use this part of my pngtoy solution (the LUT is built this way). Here is one way to format a four-cc:
function makeFourCC(n) { // n = "tEXt" etc., big-endian
var c = n.charCodeAt.bind(n);
return (c(0) & 0x7f) << 24 | (c(1) & 0x7f) << 16 | (c(2) & 0x7f) << 8 | c(3) & 0x7f
}
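The steps above can be sketched roughly as follows. This is a minimal illustration, not the pngtoy code: the names crc32, buildTextChunk and insertAfterIHDR are made up for this example, the CRC table is the standard CRC-32 one (polynomial 0xEDB88320), and the keyword and text are assumed to contain Latin-1 characters only.

```javascript
// Standard CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c;
}

function crc32(bytes) {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// Build a complete tEXt chunk: length, "tEXt", keyword NUL text, CRC-32.
function buildTextChunk(keyword, text) {
  const dataLength = keyword.length + 1 + text.length;
  const chunk = new Uint8Array(12 + dataLength);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, dataLength);           // DataView writes big-endian by default
  chunk.set([0x74, 0x45, 0x58, 0x74], 4);  // "tEXt"
  let p = 8;
  for (let i = 0; i < keyword.length; i++) chunk[p++] = keyword.charCodeAt(i) & 0xff;
  p++;                                     // NUL separator (array is zero-filled)
  for (let i = 0; i < text.length; i++) chunk[p++] = text.charCodeAt(i) & 0xff;
  // The CRC covers the chunk type and the data, but not the length field.
  view.setUint32(8 + dataLength, crc32(chunk.subarray(4, 8 + dataLength)));
  return chunk;
}

// Insert a chunk right after IHDR in a PNG given as a Uint8Array.
function insertAfterIHDR(png, chunk) {
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const ihdrLength = view.getUint32(8);
  // 8 signature bytes + length field + type + data + CRC
  const position = 8 + 4 + 4 + ihdrLength + 4;
  const out = new Uint8Array(png.length + chunk.length);
  out.set(png.subarray(0, position), 0);
  out.set(chunk, position);
  out.set(png.subarray(position), position + chunk.length);
  return out;
}
```

In a browser, the result could then be wrapped as new Blob([out], {type: "image/png"}) and offered for download through an Object-URL, as described above.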

Javascript using File.Reader() to read line by line

This question is close but not quite close enough.
My HTML5 application reads a CSV file (although it applies to text as well) and displays some of the data on screen.
The problem I have is that the CSV files can be huge (with a 1GB file size limit). The good news is, I only need to display some of the data from the CSV file at any point.
The idea is something like this (pseudocode):
var content;
var reader = OpenReader(myCsvFile)
var line = 0;
while (reader.hasLinesRemaining)
    if (line % 10 == 1)
        content = currentLine;
    Loop to next line
There are enough articles about how to read the CSV file, I'm using
function openCSVFile(csvFileName){
    var r = new FileReader();
    r.onload = function(e) {
        var contents = e.target.result;
        var s = "";
    };
    r.readAsText(csvFileName);
}
but, I can't see how to read line at a time in Javascript OR even if it's possible.
My CSV data looks like
Some detail: date, ,
More detail: time, ,
val1, val2
val11, val12
#val11, val12
val21, val22
I need to strip out the first 2 lines, and also decide what to do with the line starting with a # (which is why I need to read through the file a line at a time).
So, other than loading the lot into memory, do I have any options to read a line at a time?

There is no readLine() method to do this as of now. However, some ideas to explore:
Reading from a blob does fire progress events. While it is not required by the specification, the engine might prematurely populate the .result property similar to an XMLHttpRequest.
The Streams API drafts a streaming .read(size) method for file readers. I don't think it is already implemented anywhere, though.
Blobs do have a slice method which returns a new Blob containing a part of the original data. The spec and the synchronous nature of the operation suggest that this is done via references, not copying, and should be quite performant. This would allow you to read the huge file chunk-by-chunk.
Admittedly, none of these methods do automatically stop at line endings. You will need to buffer the chunks manually, break them into lines and shift them out once they are complete. Also, these operations are working on bytes, not on characters, so there might be encoding problems with multi-byte characters that need to be handled.
See also: Reading line-by-line file in JavaScript on client side
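A minimal sketch of the slice-and-buffer idea, under a few assumptions: the file is UTF-8, lines end in "\n" or "\r\n", and the environment provides Blob.prototype.arrayBuffer() and TextDecoder. The names LineSplitter and forEachLine are made up for this example. The decoder's stream option holds back an incomplete multi-byte character until the next chunk arrives, which addresses the encoding concern mentioned above.

```javascript
// Turns a sequence of byte chunks into complete lines.
class LineSplitter {
  constructor() {
    this.decoder = new TextDecoder("utf-8");
    this.partial = ""; // text after the last line break seen so far
  }

  // Feed one chunk of bytes; returns the lines completed by this chunk.
  push(bytes) {
    // stream: true keeps a trailing incomplete character for the next call.
    const text = this.partial + this.decoder.decode(bytes, { stream: true });
    const lines = text.split(/\r?\n/);
    this.partial = lines.pop(); // the last piece is not terminated yet
    return lines;
  }

  // Call once at the end; returns the final unterminated line, if any.
  flush() {
    const rest = this.partial + this.decoder.decode();
    this.partial = "";
    return rest === "" ? [] : [rest];
  }
}

// Read a Blob or File chunk by chunk and call onLine for every line.
async function forEachLine(blob, onLine, chunkSize = 1024 * 1024) {
  const splitter = new LineSplitter();
  for (let start = 0; start < blob.size; start += chunkSize) {
    const buf = await blob.slice(start, start + chunkSize).arrayBuffer();
    for (const line of splitter.push(new Uint8Array(buf))) onLine(line);
  }
  for (const line of splitter.flush()) onLine(line);
}
```

With the CSV from the question, the onLine callback could skip the first two lines and any line starting with "#" by keeping its own line counter.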

Displaying UTF-8 characters in PDF

I am trying to display a PDF by converting it into a binary string from the backend.
This is the ajax call I am making
$.ajax({
    type : 'GET',
    url : '<url>',
    data : oParameters,
    contentType : 'application/pdf;charset=UTF-8',
    success : function(odata) {
        window.open("data:application/pdf;charset=utf-8," + escape(odata));
    }
});
When I try to open the PDF in a new window, the url looks like
data:application/pdf;charset=utf-8,%25PDF-1.3%0D%0A%25%uFFFD%uFFFD%uFFFD%uFFFD%0D%0A2%200%20obj%0D%0A/WinAnsiEncoding%0D........
As you can see, it uses "WinAnsiEncoding" to display the PDF. Because of this, some of the characters are not being displayed properly. How do I change this to UTF-8?
EDIT : The backend is in ABAP. I am converting a smartform to OTF and then to a string using the function module "CONVERT_OTF".
CALL FUNCTION fname
EXPORTING
user_settings = space
control_parameters = ls_ctropt
output_options = ls_output
gv_lang = lv_lang
IMPORTING
job_output_info = ls_body_text
EXCEPTIONS
formatting_error = 1
internal_error = 2
send_error = 3
user_canceled = 4
OTHERS = 5.
CALL FUNCTION 'CONVERT_OTF'
EXPORTING
format = 'PDF'
IMPORTING
bin_filesize = ls_pdf_len
bin_file = ls_pdf_xstring
TABLES
otf = ls_body_text-otfdata
lines = lt_lines
EXCEPTIONS
err_max_linewidth = 1
err_format = 2
err_conv_not_possible = 3
err_bad_otf = 4
OTHERS = 5.
CALL METHOD server->response->set_header_field( name = 'Content-Type'
value = 'application/pdf;charset=UTF-8' ).
CALL METHOD server->response->append_data( data = lv_pdf_string
length = lv_len ).

Concerning your remark that it uses "WinAnsiEncoding" to display the PDF:
After the comma in
data:application/pdf;charset=utf-8,%25PDF-1.3%0D%0A%25%uFFFD%uFFFD%uFFFD%uFFFD%0D%0A2%200%20obj%0D%0A/WinAnsiEncoding%0D........
everything is pure data. Thus, "WinAnsiEncoding" is merely part of the content of the PDF, and if it is the cause of your troubles, the PDF generator must be configured to change its PDF generation process.
In the case at hand, your data is:
%PDF-1.3
%...
2 0 obj
/WinAnsiEncoding
........
which is completely normal PDF structure. It merely means that the PDF object 2 is defined as /WinAnsiEncoding which may or may not be used for some font definition, and even if it is used, it may still be adapted by some /Differences to include the characters you require. Furthermore it does not make sense to change this to UTF-8 (as you request) because UTF-8 is not a standard encoding for PDF page content. If you somehow put UTF-8 there, you'll break the PDF even more.
I'm afraid, though, that there are other problems, too.
You add a charset parameter to the type application/pdf --- this does not make sense, PDF is a binary format, i.e. a sequence of bytes is expected and, therefore, no charset is involved.
Your method call escape(odata) creates %uFFFD%uFFFD%uFFFD%uFFFD --- this is invalid according to the RFCs which only define
A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value.
(RFC 3986, section 2.1)
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.
(ibidem, section 2.4)
Thus, %uFFFD%uFFFD%uFFFD%uFFFD is invalid.
PDF, being a binary format, is better suited to Base64 encoding, i.e.
data:application/pdf;base64,BASE_64_ENCODED_PDF
Thus, I propose you change your client side process accordingly.
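A minimal sketch of that client-side change, assuming the PDF has been obtained as raw bytes (for example an ArrayBuffer from an XMLHttpRequest with responseType "arraybuffer") rather than as decoded text. The function name pdfDataUri is an assumption for this example.

```javascript
// Build a Base64 data URI from the raw bytes of a PDF.
// pdfDataUri is an illustrative name; bytes is expected to be a Uint8Array.
function pdfDataUri(bytes) {
  let binary = "";
  const step = 0x8000; // convert in slices to keep the argument lists small
  for (let i = 0; i < bytes.length; i += step) {
    // Each byte becomes one character with the same code (0-255),
    // which is the "binary string" form that btoa expects.
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + step));
  }
  return "data:application/pdf;base64," + btoa(binary);
}
```

The resulting string could be passed to window.open in place of the escape(odata) URL; no charset parameter is needed because the payload is treated as bytes.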

Returning a byte string to ExternalInterface.call throws an error

I am working on my open source project Downloadify, and up until now it simply handles returning Strings in response to ExternalInterface.call commands.
I am trying to put together a test case using JSZip and Downloadify together, the end result being that a Zip file is created dynamically in the browser, then saved to the disk using FileReference.save. However, this is my problem:
The JSZip library can return either a base64 encoded string of the Zip, or the raw byte string. The problem is, if I return that byte string in response to the ExternalInterface.call command, I get this error:
Error #1085: The element type "string" must be terminated by the matching end-tag "</string>"
ActionScript 3:
var theData:* = ExternalInterface.call('Downloadify.getTextForSave',queue_name);
Where queue_name is just a string used to identify the correct instance in JS.
JavaScript:
var zip = new JSZip();
zip.add("test.txt", "Hello world!\n");
var content = zip.generate(true);
return content;
If I return a normal string instead of the byte string, the call works correctly. I would like to avoid using base64, as I would have to include a base64 decoder in my swf, which would increase its size.
Finally: I am not looking for an AS3 Zip generator. It is imperative to my project to have that part run in JavaScript.
I am admittedly not an AS3 programmer by trade, so if you need any more detail please let me know.

When data is returned from JavaScript calls, it is serialized into an XML string. So if the "raw string" returned by JSZip includes characters which make the XML invalid, which is what I think is happening here, you'll get errors like that.
What you get as a return is actually:
<string>[your JSZip generated string]</string>
Imagine your return string includes a "<" char: this will make the XML invalid, and it's hard to tell what character codes a raw byte stream will translate to.
You can read more about the external API's XML format on LiveDocs
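To check whether a given string is likely to trip this up, here is a minimal sketch that looks for the first character that cannot appear unescaped inside an XML 1.0 text node. Whether the Flash bridge escapes any of these itself is not confirmed here; the sketch simply assumes the string is embedded as-is, as described above. The function name findXmlUnsafeChar is made up for this example.

```javascript
// Return the index of the first character that would make
// "<string>" + str + "</string>" invalid XML 1.0, or -1 if there is none.
// Only covers markup characters and control characters, which is what a
// raw byte string (char codes 0-255) can contain.
function findXmlUnsafeChar(str) {
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    // XML 1.0 allows only tab, line feed and carriage return below 0x20.
    const isControl = code < 0x20 && code !== 0x09 && code !== 0x0a && code !== 0x0d;
    const isMarkup = code === 0x3c || code === 0x26; // "<" or "&"
    if (isControl || isMarkup) return i;
  }
  return -1;
}
```

A Zip file starts with the bytes "PK" followed by 0x03 and 0x04, so a raw Zip byte string already fails this check at index 2, while "Hello world!\n" passes.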

I think the problem is caused by the fact that Flash expects a UTF-8 String and you throw some binary stuff at it. I think, for example, 0x00FF will not turn out to be valid UTF-8 ...
You can try fiddling around with flash.system::System.setCodePage, but I wouldn't be too optimistic ...
I guess a base64 decoder is probably really the easiest ... I'd rather worry about speed than about file size though ... this rudimentary decoder method uses less than half a K:
public function decodeBase64(source:String):ByteArray {
    var ret:ByteArray = new ByteArray();
    var map:Object = new Object();
    var i:int = 0;
    for each (var char:String in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".split("")) map[char] = i++;
    map["="] = 0;
    source = source.split("\n").join("").split("\r").join("");//remove linebreaks
    for (i = 0; i < source.length/4; i++) {
        var buf:int = 0;
        for each (char in source.substr(i * 4, 4).split("")) buf = (buf << 6) + map[char];
        ret.writeByte(buf >>> 16);
        ret.writeShort(buf);
    }
    return ret;
}
you could simply shorten function names and take a smaller image ... or use ColorTransform or ConvolutionFilter on one image instead of four ... or compile the image into the SWF for smaller overall size ... or reduce function name length ...
so unless you're planning on working with MBs of data, this is the way to go ...
