How to read any local file by chunks using JavaScript? - javascript

How can I read a large local file (greater than 1 GB) in chunks (2 KB or more), convert each chunk to a string, process that string, and then move on to the next chunk until the end of the file?
I'm only able to read small files and convert them to a string. As you can see from the code below, I don't know how to read the file in chunks, and the browser freezes if I try it with a file larger than 10 MB.
<html>
<head>
<title>Read File</title>
</head>
<body>
<input type="file" id="myFile">
<hr>
<textarea style="width:500px;height: 400px" id="output"></textarea>
<script>
var input = document.getElementById("myFile");
var output = document.getElementById("output");
input.addEventListener("change", function () {
if (this.files && this.files[0]) {
var myFile = this.files[0];
var reader = new FileReader();
reader.addEventListener('load', function (e) {
output.textContent = e.target.result;
});
reader.readAsBinaryString(myFile);
}
});
</script>
</body>
</html>
Below are the questions and answers I found on Stack Overflow while researching this, but none of them solved my problem.
1: This question asks how to do it using UniversalXPConnect, and only in Firefox. I found the answer irrelevant because I use Chrome and don't know what UniversalXPConnect is.
How to read a local file by chunks in JavaScript
2: This question is about reading text files only, but I want to read any file, not just text, and in chunks, so the answers there don't apply. I did like how short the code in the answer was. Reading local text file into a JavaScript array [duplicate]
3: This one is also about text files and doesn't show how to read a file in chunks. How to read a local text file.
I know a little Java, where this is easy to do:
char[] myBuffer = new char[512];
int bytesRead = 0;
BufferedReader in = new BufferedReader(new FileReader("foo.mp4"));
while ((bytesRead = in.read(myBuffer,0,512)) != -1){
...
}
but I'm new to JavaScript.

I was able to solve it by slicing the file: slice() takes the position where the chunk begins and the position where it ends. I put the slicing in a while loop so that on each iteration the chunk position shifts by the desired chunk size until the end of the file is reached.
At first I only got the last chunk in the text area, so to display the whole binary string I concatenate the output on each iteration.
<html>
<head>
<title>Read File</title>
</head>
<body>
<input type="file" id="myFile">
<hr>
<textarea style="width:500px;height: 400px" id="output"></textarea>
<script>
var input = document.getElementById("myFile");
var output = document.getElementById("output");
var chunk_size = 2048; // chunk size in bytes
input.addEventListener("change", function () {
if (this.files && this.files[0]) {
var myFile = this.files[0];
var size = myFile.size; // total file size, used as the loop bound
var offset = 0; // start from the beginning for every newly selected file
output.textContent = ""; // clear the output of any previous file
while (offset < size) {
var blob = myFile.slice(offset, offset + chunk_size); // bytes from offset up to (not including) offset + chunk_size
var reader = new FileReader();
reader.addEventListener('load', function (e) {
output.textContent += e.target.result; // append this chunk to the output
});
reader.readAsBinaryString(blob);
offset += chunk_size; // move on to the next chunk
}
}
});
</script>
</body>
</html>
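The loop above starts a FileReader for every chunk before any of them has finished, which for a 1 GB file means hundreds of thousands of pending reads at once. A minimal sketch of a sequential variant follows, assuming a browser (or runtime) that supports Blob.prototype.arrayBuffer() and TextDecoder. The helper names readInChunks and readTextInChunks are hypothetical, not part of any API. Each chunk is read only after the previous one has been processed:

```javascript
// Hypothetical helper: reads a File/Blob sequentially, one chunk at a time.
// onChunk receives the raw bytes (Uint8Array) and the byte offset of the chunk.
async function readInChunks(file, chunkSize, onChunk) {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    // slice() only describes a byte range; the bytes are read by arrayBuffer()
    const buffer = await file.slice(offset, offset + chunkSize).arrayBuffer();
    await onChunk(new Uint8Array(buffer), offset);
  }
}

// Hypothetical helper for text files: decodes each chunk as UTF-8.
// { stream: true } lets a multi-byte character that is split across
// two chunks be decoded correctly instead of producing garbage.
async function readTextInChunks(file, chunkSize, onText) {
  const decoder = new TextDecoder("utf-8");
  await readInChunks(file, chunkSize, (bytes) => {
    onText(decoder.decode(bytes, { stream: true }));
  });
  const rest = decoder.decode(); // flush any bytes still buffered
  if (rest) onText(rest);
}
```

In the page above this could be called from the change handler, for example readInChunks(this.files[0], 2048, function (bytes, offset) { ... }). For files that are not text, working on the Uint8Array directly avoids the string conversion altogether.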

So the issue isn't with FileReader; it's with this line:
output.textContent = e.target.result;
You are trying to dump 10 MB or more of string data into that textarea all at once. I'm not even sure there is a "right" way to do what you want: even if you had the data in chunks, each pass through the loop would still have to concatenate the chunk onto the previous value of output.textContent, so it would slow down in the same way as it neared the end (worse, really, because it would repeat the slow, memory-hungry work on every iteration). So I think part of the looping process will have to be adding a new element (such as a new textarea) to push the current chunk to, so that no concatenation is needed to preserve what has already been output. I haven't worked that part out yet, but here's what I've got so far:
var input = document.getElementById("myFile");
var output = document.getElementById("output");
var chunk_length = 2048; //2KB as you mentioned
var chunker = new RegExp('[^]{1,' + chunk_length + '}', 'g');
var chunked_results;
input.addEventListener("change", function () {
if (this.files && this.files[0]) {
var myFile = this.files[0];
var reader = new FileReader();
reader.addEventListener('load', function (e) {
chunked_results = e.target.result.match(chunker);
output.textContent = chunked_results[0];
});
reader.readAsBinaryString(myFile);
}
});
This only outputs the first string in the array of 2 KB chunks. To output the remaining chunks, you would add a new element/node to the DOM for each one. Note that the whole file is still read into memory before it is split.
Using RegExp and match for the actual chunking was lifted from a clever gist I found.
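The RegExp-based split can also be written as a plain loop over String.prototype.slice, which some readers may find easier to follow. This is only a sketch (chunkString is a hypothetical name) and, like the code above, it assumes the whole file has already been read into one string, so it does not reduce memory use:

```javascript
// Hypothetical helper: splits a string into pieces of at most
// chunkLength characters each.
function chunkString(text, chunkLength) {
  if (!(chunkLength > 0)) {
    throw new RangeError("chunkLength must be a positive number");
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkLength) {
    chunks.push(text.slice(start, start + chunkLength));
  }
  return chunks;
}
```

Each element of the returned array could then be written to its own DOM node, for example with document.createTextNode(chunk), instead of being concatenated onto output.textContent.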

In Node.js (not in the browser) you can do this with fs.createReadStream(). The amount of data potentially buffered depends on the highWaterMark option passed to the stream's constructor.
So you would do it like this:
var read = fs.createReadStream('/something/something', { highWaterMark: 64 });
Here's an example:
var fs = require('fs')
var read = fs.createReadStream('readfile.txt',{highWaterMark:64})
var write = fs.createWriteStream('written.txt')
read.on('open', function () {
read.pipe(write);
});
Watch how it reads 64 bytes at a time (very slow). You can observe it in Windows Explorer in a fun way, but make sure you test with a large text file: not a gigabyte, but at least 17 megabytes like I did (fill it with any dummy text).
Set the folder view to "Details" and keep refreshing the destination in Windows Explorer; you will see the size increase on every refresh.
I assumed you know about the pipe method. If you don't, no problem, it's very simple. Here is a link:
https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options
Or a quick explanation:
readable.pipe(writable)
The pipe() function reads data from a readable stream as it becomes available and writes it to a destination writable stream.
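pipe() copies the data without letting you look at it. To process each chunk yourself, as the question asks, the stream can be consumed chunk by chunk instead. A minimal sketch, assuming a Node.js version in which readable streams are async iterable; processFileInChunks and its parameters are hypothetical names:

```javascript
const fs = require("fs");

// Hypothetical helper: calls onChunk for every chunk read from the file.
// Each chunk is a Buffer of at most chunkSize bytes.
async function processFileInChunks(path, chunkSize, onChunk) {
  const stream = fs.createReadStream(path, { highWaterMark: chunkSize });
  for await (const chunk of stream) {
    await onChunk(chunk); // the next chunk is not delivered until this returns
  }
}
```

For example, processFileInChunks('readfile.txt', 2048, (chunk) => console.log(chunk.toString('latin1'))) would print the file 2 KB at a time; toString('latin1') maps every byte to one character, similar to a binary string.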

Related

How to create a PDF file from any Base64 string?

I want to pass any Base64 string to a function and get a PDF from it. I tried it this way; it downloads the PDF, but there is an error:
"Failed to load PDF document."
This is what I tried:
let data = "SGVsbG8gd29ybGQ=" //hello world
var bufferArray = this.base64ToArrayBuffer(data);
var binary_string = window.atob(data)
var len = bufferArray.length;
var bytes = new Uint8Array(len);
for (var i = 0; i < len; i++) {
bytes[i] = binary_string.charCodeAt(i);
}
let blob = new Blob([bytes.buffer], { type: 'application/pdf' })
var url = URL.createObjectURL(blob);
window.open(url);
//convert base64 string to arraybuffer
base64ToArrayBuffer(data) {
var bString = window.atob(data);
var bLength = bString.length;
var bytes = new Uint8Array(bLength);
for (var i = 0; i < bLength; i++) {
var ascii = bString.charCodeAt(i);
bytes[i] = ascii;
}
return bytes;
};
Base64 is only an encoding, not a PDF, so hello.b64 will never morph into hello.pdf.
A PDF needs a header, a page and a trailer as bytes, and those cannot easily be added by wrapping Base64 fragments around the text because too many values vary.
The PDF has to be carefully scripted as text wrapped around the hello text; see the hello example at https://stackoverflow.com/a/70748286/10802527
So, as Base64, for example:
JVBERi0xLjIgDQo5IDAgb2JqDQo8PA0KPj4NCnN0cmVhbQ0KQlQvIDMyIFRmKCAgSGVsbG8gV29ybGQgICApJyBFVA0KZW5kc3RyZWFtDQplbmRvYmoNCjQgMCBvYmoNCjw8DQovVHlwZSAvUGFnZQ0KL1BhcmVudCA1IDAgUg0KL0NvbnRlbnRzIDkgMCBSDQo+Pg0KZW5kb2JqDQo1IDAgb2JqDQo8PA0KL0tpZHMgWzQgMCBSIF0NCi9Db3VudCAxDQovVHlwZSAvUGFnZXMNCi9NZWRpYUJveCBbIDAgMCAyNTAgNTAgXQ0KPj4NCmVuZG9iag0KMyAwIG9iag0KPDwNCi9QYWdlcyA1IDAgUg0KL1R5cGUgL0NhdGFsb2cNCj4+DQplbmRvYmoNCnRyYWlsZXINCjw8DQovUm9vdCAzIDAgUg0KPj4NCiUlRU9G
<iframe type="application/pdf" width="95%" height=150 src="data:application/pdf;base64,JVBERi0xLjIgDQo5IDAgb2JqDQo8PA0KPj4NCnN0cmVhbQ0KQlQvIDMyIFRmKCAgSGVsbG8gV29ybGQgICApJyBFVA0KZW5kc3RyZWFtDQplbmRvYmoNCjQgMCBvYmoNCjw8DQovVHlwZSAvUGFnZQ0KL1BhcmVudCA1IDAgUg0KL0NvbnRlbnRzIDkgMCBSDQo+Pg0KZW5kb2JqDQo1IDAgb2JqDQo8PA0KL0tpZHMgWzQgMCBSIF0NCi9Db3VudCAxDQovVHlwZSAvUGFnZXMNCi9NZWRpYUJveCBbIDAgMCAyNTAgNTAgXQ0KPj4NCmVuZG9iag0KMyAwIG9iag0KPDwNCi9QYWdlcyA1IDAgUg0KL1R5cGUgL0NhdGFsb2cNCj4+DQplbmRvYmoNCnRyYWlsZXINCjw8DQovUm9vdCAzIDAgUg0KPj4NCiUlRU9G">frame</iframe>
Try the above, but it may be blocked by security settings: it will display for some users but not all!
In the comments you asked how the text could be manipulated in JavaScript. My stock answer is that JavaScript generally cannot easily be used to build a PDF or edit Base64 content. However, if you have prepared placeholders, the content can be changed by find and replace. This must be done with care, as the total file length should never change.
As an example, take the above as a template and switch the content to:
JVBERi0xLjIgDQo5IDAgb2JqDQo8PA0KPj4NCnN0cmVhbQ0KQlQvIDMyIFRmKCAgRmFyZS10aGVlLXdlbGwpJyBFVA0KZW5kc3RyZWFtDQplbmRvYmoNCjQgMCBvYmoNCjw8DQovVHlwZSAvUGFnZQ0KL1BhcmVudCA1IDAgUg0KL0NvbnRlbnRzIDkgMCBSDQo+Pg0KZW5kb2JqDQo1IDAgb2JqDQo8PA0KL0tpZHMgWzQgMCBSIF0NCi9Db3VudCAxDQovVHlwZSAvUGFnZXMNCi9NZWRpYUJveCBbIDAgMCAyNTAgNTAgXQ0KPj4NCmVuZG9iag0KMyAwIG9iag0KPDwNCi9QYWdlcyA1IDAgUg0KL1R5cGUgL0NhdGFsb2cNCj4+DQplbmRvYmoNCnRyYWlsZXINCjw8DQovUm9vdCAzIDAgUg0KPj4NCiUlRU9G
So by finding SGVsbG8gV29ybGQgICAp and replacing it with RmFyZS10aGVlLXdlbGwp we get a text change (it is important that the string length is a multiple of 4 and that the length stays the same):
<iframe type="application/pdf" width="95%" height=150 src="data:application/pdf;base64,JVBERi0xLjIgDQo5IDAgb2JqDQo8PA0KPj4NCnN0cmVhbQ0KQlQvIDMyIFRmKCAgRmFyZS10aGVlLXdlbGwpJyBFVA0KZW5kc3RyZWFtDQplbmRvYmoNCjQgMCBvYmoNCjw8DQovVHlwZSAvUGFnZQ0KL1BhcmVudCA1IDAgUg0KL0NvbnRlbnRzIDkgMCBSDQo+Pg0KZW5kb2JqDQo1IDAgb2JqDQo8PA0KL0tpZHMgWzQgMCBSIF0NCi9Db3VudCAxDQovVHlwZSAvUGFnZXMNCi9NZWRpYUJveCBbIDAgMCAyNTAgNTAgXQ0KPj4NCmVuZG9iag0KMyAwIG9iag0KPDwNCi9QYWdlcyA1IDAgUg0KL1R5cGUgL0NhdGFsb2cNCj4+DQplbmRvYmoNCnRyYWlsZXINCjw8DQovUm9vdCAzIDAgUg0KPj4NCiUlRU9G">frame</iframe>
and the frame then shows the replaced text.
There are strict rules to follow when using this method:
"Hello World   )" is the template; note the white space included before the closing ")".
"Fare-thee-well)" is as long as a substitution can be in this case.
So the source field must be pre-planned to be big enough for the largest replacement, and its plain-text length should be a multiple of 3 (which matches Base64 blocks of 4).
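The two points of this answer can be sketched in code. The first helper checks that decoded Base64 data at least starts with the PDF signature %PDF- before it is handed to a viewer. The second performs the fixed-length replacement on the decoded text rather than on the Base64 string, so alignment to Base64 blocks of 4 does not have to be worked out by hand. Both function names are hypothetical, and the sketch assumes the template contains only single-byte (Latin-1) characters, which is what atob and btoa can handle:

```javascript
// Hypothetical helper: true if the Base64 data decodes to something that
// starts with the PDF signature. It does not validate the rest of the file.
function looksLikePdf(base64) {
  return atob(base64).startsWith("%PDF-");
}

// Hypothetical helper: replaces a placeholder inside a Base64-encoded
// template without changing the total length. The replacement is padded
// with spaces up to the placeholder's length.
function replacePlaceholder(base64, placeholder, replacement) {
  if (replacement.length > placeholder.length) {
    throw new RangeError("replacement is longer than the placeholder");
  }
  const text = atob(base64);
  const index = text.indexOf(placeholder);
  if (index === -1) {
    throw new Error("placeholder not found in template");
  }
  const padded = replacement.padEnd(placeholder.length, " ");
  return btoa(
    text.slice(0, index) + padded + text.slice(index + placeholder.length)
  );
}
```

With the template above, replacePlaceholder(template, "Hello World   ", "Fare-thee-well") should give the second Base64 string, while looksLikePdf("SGVsbG8gd29ybGQ=") returns false because that string only decodes to "Hello world".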

javascript, how could we read a local text file with accent letters into it?

I have a question: I need to read a local file, and I have been studying some threads and have seen various ways to handle it. In most cases there is a file input.
I need to load it directly through code.
I have studied this thread:
How to read a local text file?
And I could read the file.
The surprising part was that when I tried to split the lines and words, it showed � in place of the accented letters.
The code I have right now is:
myFileReader.js
function readTextFile(file) {
var rawFile = new XMLHttpRequest();
rawFile.open("GET", file, false);
rawFile.onreadystatechange = function () {
if (rawFile.readyState === 4) {
if (rawFile.status === 200 || rawFile.status == 0) {
allText = rawFile.responseText;
console.log('The complete text is', allText);
let lineArr = intoLines(allText);
let firstLineWords = intoWords(lineArr[0]);
let secondLineWords = intoWords(lineArr[1]);
console.log('Our first line is: ', lineArr[0]);
let atlas = {};
for (let i = 0; i < firstLineWords.length; i++) {
console.log(`Our ${i} word in the first line is : ${firstLineWords[i]}`);
console.log(`Our ${i} word in the SECOND line is : ${secondLineWords[i]}`);
atlas[firstLineWords[i]] = secondLineWords[i];
}
console.log('The atlas is: ', atlas);
let atlasJson = JSON.stringify(atlas);
console.log('Atlas as json is: ', atlasJson);
download(atlasJson, 'atlasJson.txt', 'text/plain');
}
}
};
rawFile.send(null);
}
function download(text, name, type) {
var a = document.getElementById("a");
var file = new Blob([text], {type: type});
a.href = URL.createObjectURL(file);
a.download = name;
}
function intoLines(text) {
// splitting all text data into array "\n" is splitting data from each new line
//and saving each new line as each element*
var lineArr = text.split('\n');
//just to check if it works output lineArr[index] as below
return lineArr;
}
function intoWords(lines) {
var wordsArr = lines.split('" "');
return wordsArr;
}
The question is: how can we handle those special characters, the vowels with accents?
I ask because even in the IDE the question marks appeared when the txt file was loaded as UTF-8; when I changed to ISO-8859-1, it loaded correctly.
I have also studied:
Read UTF-8 special chars from external file using Javascript
Convert special characters to HTML in Javascript
Reading a local text file from a local javascript file?
In addition, could you explain whether there is a shorter way to load files in client-side JavaScript? For example, in Java there are FileReader / FileWriter / BufferedWriter. Is there something similar in JavaScript?
Thank you for your help!
It sounds like the file is encoded with ISO-8859-1 (or possibly the very-similar Windows-1252).
There's no BOM or equivalent for those encodings.
The only solutions I can see are:
Use a (local) server and have it return the HTTP Content-Type header with the encoding identified as a charset, e.g. Content-Type: text/plain; charset=ISO-8859-1
Use UTF-8 instead (e.g., open the file in an editor as ISO-8859-1, then save it as UTF-8), as that's the default encoding for XHR response bodies.
Put your text in an .html file with the corresponding content type,
for example:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Enclose the text between two markers ("####" in my example), or put it in a div.
Read the HTML page, extract the content and select the text:
var newWindow = window.open(url); //..
var content = newWindow.document.body.innerHTML;
var strSep="####";
var x = content.indexOf(strSep);
x=x+strSep.length;
var y = content.lastIndexOf(strSep);
var points=content.slice(x, y);

JAVASCRIPT decode a base64string (which is an encoded zipfile) to a zipfile and get the zipfiles content by name

The question says it all: I'm receiving a Base64-encoded zip file from the server, which I first want to decode to a zip file in memory and then read its content, which is a JSON file.
I tried to use JSZip but I'm totally lost in this case. The Base64 string is received in JavaScript via a promise.
So my question in short is: how can I convert a Base64-encoded zip file to a zip file in memory to get its contents?
BASE64 -> ZIPFILE -> CONTENT
I use this complicated process to save a lot of space in my database, and I don't want to handle it on the server side but on the client side with JS.
Thanks in advance!
If anyone is interested in my solution to this problem, here is my answer:
I received the data as a Base64 string and converted the string to a blob. Then I passed the blob to the JSZip library to load the zip file. After that I could just grab the contents of the zip file. The code is below:
function base64ToBlob(base64) {
let binaryString = window.atob(base64);
let binaryLen = binaryString.length;
let ab = new ArrayBuffer(binaryLen);
let ia = new Uint8Array(ab);
for (let i = 0; i < binaryLen; i++) {
ia[i] = binaryString.charCodeAt(i);
}
let bb = new Blob([ab], { type: "application/zip" }); // Blob.type is read-only, so set it in the constructor
bb.lastModifiedDate = new Date();
bb.name = "archive.zip";
return bb;
}
To get the contents of the zipfile:
let blob = base64ToBlob(resolved);
let zip = new JSZip();
zip.loadAsync(blob).then(function(zip) {
zip.file("archived.json").async("string").then(function (content) {
console.log(content);
// content is the file as a string
});
}).catch((e) => {
console.error("Could not read the zip file:", e); // don't swallow errors silently
});
As you can see, the blob is first created from the Base64 string and then handed to JSZip's loadAsync method. After that you give the name of the file you want to retrieve from the zip file, in this case "archived.json". Because of the async("string") call, the file's contents are returned as a string. To use the extracted string further, just work with the content variable.

Search for tab character in a specific line of a tab delimited text file

I'm loading a tab delimited text file using the FileReader API. Once loaded I need to find the tab location in the first line, parse out the characters preceding the tab, do some stuff with the parsed characters, then proceed to the second line, parse out the characters before the first tab in the second line, do some stuff with the parsed characters, then proceed to the third line, and so on until the end of the file.
I'm not a coder. I could use some help on the script to perform these operations.
Update/Edit (as requested): Specifically, taking it step by step:
I'm able to load the tab delimited file.
I'm able to step through the lines of the file (row 15+).
I'm making progress on stepping through the lines in the file (row 15+).
But I'm failing in the ability to perform a set of tasks as each line is read.
As each line is read, I want to parse out the characters in the line that are prior to the first tab character. In the example file contents below, I want to parse out 5, then I wish to take action on the 5. After that I want to parse out 10, then take action on the 10. Then I want to parse out 200 and take action on the 200. Then the script will end.
I'm assuming as each line is read that I want to call another function and send the contents of the first line to the new function. The new function will then parse out the characters before the first tab. Is this correct? If not, then what should I be doing? After that I'm assuming I should call another function, which will take the action on the parsed characters. Is this correct (and if not, what should I be doing instead)?
If I'm correct that I should be calling another function with each line read, then how do I do so (including sending the contents of the line)? In the code shown, I've been unsuccessful in figuring out how to do this.
Thank you,
Andrew
Example of tab delimited file:
5 15:00:05 2 1
10 15:00:10 2 2
200 15:03:20 2 3
var fileInput = document.getElementById('fileInput');
fileInput.addEventListener('change', function (e) {
var file = fileInput.files[0];
var textType = /text.*/;
if (file.type.match(textType)) {
var reader = new FileReader();
reader.onload = function (e) {
// Entire file
fileDisplayArea.innerText = reader.result;
// Count of lines in file
var lines2 = reader.result.split("\n").length;
fileDisplayArea2.innerText = "The number of lines in the text file is: " + Number(lines2-1);
// Attempt at an action per line
var lines = reader.result.split('\n');
for (var line = 0; line < lines.length; line++) {
//console.log(lines[line])
//with each line, how can I call another function and send along with the call the contents of the line?
fileDisplayArea3.innerText = lines;
}
}
reader.readAsText(file);
} else {
fileDisplayArea.innerText = "File not supported!"
}
});
Select a text file:
<input type="file" id="fileInput">
<hr />
<pre id="fileDisplayArea"></pre>
<hr />
<pre id="fileDisplayArea2"></pre>
<hr />
<pre id="fileDisplayArea3"></pre>
Here is an example of what you want to do. When looping each line you can get the text of that line with lines[line] from the lines array. You can then pass that text (and in my example the line number) to a function.
In my example the function is doStuff and it then splits the line text by tab character getting an array of "cells" (the values on the line that are delimited by tabs). I had the function output the values so that you could see them. You can have it do whatever you need.
var fileInput = document.getElementById('fileInput');
fileInput.addEventListener('change', function (e) {
var file = fileInput.files[0];
var textType = /text.*/;
if (file.type.match(textType)) {
var reader = new FileReader();
reader.onload = function (e) {
// Entire file
fileDisplayArea.innerText = reader.result;
// Count of lines in file
var lines2 = reader.result.split("\n").length;
fileDisplayArea2.innerText = "The number of lines in the text file is: " + Number(lines2);
// Attempt at an action per line
var lines = reader.result.split('\n');
for (var line = 0; line < lines.length; line++) {
doStuff(line, lines[line]);
fileDisplayArea3.innerText = lines;
}
}
reader.readAsText(file);
} else {
fileDisplayArea.innerText = "File not supported!"
}
});
function doStuff(lineNumber, lineText) {
// do something with the line text
var cells = lineText.split('\t'); // '\t' is a tab character
cellValues.innerText += "Line: " + (lineNumber + 1) + "\n";
cells.forEach(function(value) {
// do something with each "value" that was delimited by the "tab" characters
// in this example add the value to cellValues
// you can do whatever you want with the "value" here
cellValues.innerText += '\t' + value + '\n';
});
}
Select a text file:
<input type="file" id="fileInput">
<hr />
<pre id="fileDisplayArea"></pre>
<hr />
<pre id="fileDisplayArea2"></pre>
<hr />
<pre id="fileDisplayArea3"></pre>
<hr />
<pre id="cellValues"></pre>
Update: Explanation of doStuff
The first line of the function is var cells = lineText.split('\t'); This does not "replace" the tab characters with commas. What it does is create an array and store it into the cells variable.
In your original code, the reason the line fileDisplayArea3.innerText = lines; is displayed with commas is that the lines array is converted to a string in order to put it into innerText. Internally, JavaScript calls the toString() method on the array, which outputs its elements separated by commas.
Continuing on: cells is now an array of the values on the line that were separated (delimited) by tab characters. We could use a for loop, as you did to iterate the lines, but I chose forEach. As the name suggests, forEach loops through each element of the array, passing its value to the function. The value is then available to do whatever we want with it: make decisions, do math on it, etc., or (in my case) write it out to be seen.
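Since the question only needs the characters before the first tab on each line, a shorter variant is possible. This is a sketch under the assumption that lines end with \n or \r\n and that empty lines should be skipped; firstFields is a hypothetical name:

```javascript
// Hypothetical helper: returns the text before the first tab of every
// non-empty line. Lines without a tab are returned whole.
function firstFields(text) {
  return text
    .split(/\r?\n/)
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      const tabIndex = line.indexOf("\t");
      return tabIndex === -1 ? line : line.slice(0, tabIndex);
    });
}
```

Inside reader.onload this could be used as firstFields(reader.result).forEach(function (value) { /* take action on value */ }); for the example file it yields 5, 10 and 200.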

From .txt data give values to inputs

I'm trying to fill some inputs when the page loads, using data from a .txt file. The file has a list of numbers:
1
2
3
Something like this. I want to read these lines and put each one in its corresponding input. Any suggestions on how to do this?
I tried this code, but maybe it has a mistake I don't know about; I'm just starting with JavaScript.
function loadvalues()
{
var fso = new ActiveXObject("Scripting.FileSystemObject");
var s = fso.OpenTextFile("E://Steelplanner/Demand_Routing/Pruebas/OrderBalancing_Masivos/ModificaFechaTope/DueDate/Datosactuales.txt", true);
var Ia5 = document.getElementById("Ia5sem");
var text = s.ReadLine();
Ia5.value = text;
Use a while loop that calls file.ReadLine() until the file's AtEndOfStream property becomes true.
Here is an example you can refer to: ReadLine Method
Don't forget to replace the text file path with the path to your own file.
My text file contains the same data as in your example.
<script type="text/javascript">
var fso = new ActiveXObject("Scripting.FileSystemObject");
// specify the local path to open; always escape the backslashes, otherwise a bad file name error is thrown
var file = fso.OpenTextFile("C:\\Users\\MY_USER\\Desktop\\txtfile.txt", 1);
var Ia5 = document.getElementById("Ia5sem");
while (!file.AtEndOfStream) {
var r = file.ReadLine();
Ia5.innerHTML = Ia5.innerHTML + ("<br />" + r);
}
file.Close();
</script>
<p id="Ia5sem">HI</p>
So, I don't know why, but I just changed the name of the variables and made a slight change in the .OpenTextFile line and it worked.
function loadvalues()
{
var file = new ActiveXObject("Scripting.FileSystemObject");
var text = file.OpenTextFile("E:\\Steelplanner\\Demand_Routing\\Pruebas\\OrderBalancing_Masivos\\ModificaFechaTope\\DueDate\\Datosactuales.txt", 1,false);
var Ia5s = document.getElementById("Ia5sem");
Ia5s.value = text.ReadLine();
var Ia4s = document.getElementById("Ia4sem");
Ia4s.value = text.ReadLine();
text.close();
}
Anyway, I'm going to check out FileReader() for future reference, along with the script @Sarjan gave. Maybe I can improve it, but I have other things to finish. Thanks for everything.
