I'm working on a CSV parsing web application, which collects data and then uses it to draw a plot graph. So far it works nicely, but unfortunately it takes some time to parse the CSV files with papaparse, even though they are only about 3MB.
So it would be nice to have some kind of progress shown while "papa" is working. I could go for the cheap hidden div showing "I'm working", but would prefer to use <progress>.
Unfortunately the bar only gets updated AFTER Papa has finished its work. So I tried to get into web workers and use a worker file to calculate progress, and also set worker: true in Papa Parse's configuration. Still to no avail.
The configuration used (with a step function) is as follows:
var papaConfig =
{
    header: true,
    dynamicTyping: true,
    worker: true,
    step: function (row) {
        if (gotHeaders == false) {
            for (k in row.data[0]) {
                if (k != "Time" && k != "Date" && k != " Time" && k != " ") {
                    header.push(k);
                    var obj = {};
                    obj.label = k;
                    obj.data = [];
                    flotData.push(obj);
                    gotHeaders = true;
                }
            }
        }
        tempDate = row.data[0]["Date"];
        tempTime = row.data[0][" Time"];
        var tD = tempDate.split(".");
        var tT = tempTime.split(":");
        tT[0] = tT[0].replace(" ", "");
        dateTime = new Date(tD[2], tD[1] - 1, tD[0], tT[0], tT[1], tT[2]);
        var encoded = $.toJSON(row.data[0]);
        for (j = 0; j < header.length; j++) {
            var value = $.evalJSON(encoded)[header[j]];
            flotData[j].data.push([dateTime, value]);
        }
        w.postMessage({ state: row.meta.cursor, size: size });
    },
    complete: Done,
}
Worker configuration on the main site:
var w = new Worker("js/workers.js");
w.onmessage = function (event) {
    $("#progBar").val(event.data);
};
and the called worker is:
onmessage = function(e) {
    var progress = e.data.state;
    var size = e.data.size;
    var newPercent = Math.round(progress / size * 100);
    postMessage(newPercent);
}
The progress bar is updated, but only after the CSV file is parsed and the site is set up with data, so the worker is called, but its reply is only handled after parsing. Papa Parse seems to be called in a worker too, or so it seems when checking the calls in the browser's debugging tools, but the site still stays unresponsive until all the data shows up.
Can anyone point me to what I have done wrong, or where to adjust the code, to get a working progress bar? I guess this would also deepen my understanding of web workers.
You could use the FileReader API to read the file as text, split the string by "\n" and then count the length of the returned array. This is then your size variable for the calculation of percentage.
You can then pass the file string to Papa (you do not need to reread directly from the file) and pass the number of rows (the size variable) to your worker. (I am unfamiliar with workers and so am unsure how you do this.)
Obviously this only works accurately if there are no embedded line breaks inside the CSV file (e.g. where a string is spread over several lines with line breaks), as these will count as extra rows, so you will not make it to 100%. Not a fatal error, but it may look strange to the user if it always seems to finish before 100%.
Here is some sample code to give you ideas.
var size = 0;

function loadFile(){
    var files = document.getElementById("file").files; //load file from file input
    var file = files[0];
    var reader = new FileReader();
    reader.readAsText(file);
    reader.onload = function(event){
        var csv = event.target.result; //the string version of your csv.
        var csvArray = csv.split("\n");
        size = csvArray.length;
        console.log(size); //returns the number of rows in your file.
        Papa.parse(csv, papaConfig); //Send the csv string to Papa for parsing.
    };
}
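To tie this into the worker-based progress reporting from the question, a rough, untested sketch could keep a row counter in the step callback and post it together with the row count computed above (this assumes the same w worker and papaConfig from the question):

var rowsProcessed = 0; // reset this before each parse

papaConfig.step = function (row) {
    // ... existing per-row processing from the question ...
    rowsProcessed++;
    // "size" is the row count computed in loadFile() above
    w.postMessage({ state: rowsProcessed, size: size });
};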
I haven't used Papa Parse with workers before, but a few things pop up after playing with it for a bit:
It does not seem to expect you to interact directly with the worker
It expects you to either want the entire final result, or the individual items
Using a web worker makes providing a JS Fiddle infeasible, but here's some HTML that demonstrates the second point:
<html>
<head>
    <script src="papaparse.js"></script>
</head>
<body>
    <div id="step">
    </div>
    <div id="result">
    </div>
    <script type="application/javascript">
        var papaConfig = {
            header: true,
            worker: true,
            step: function (row) {
                var stepDiv = document.getElementById('step');
                stepDiv.appendChild(document.createTextNode('Step received: ' + JSON.stringify(row)));
                stepDiv.appendChild(document.createElement('hr'));
            },
            complete: function (result) {
                var resultDiv = document.getElementById('result');
                resultDiv.appendChild(document.createElement('hr'));
                resultDiv.appendChild(document.createTextNode('Complete received: ' + JSON.stringify(result)));
                resultDiv.appendChild(document.createElement('hr'));
            }
        };

        var data = 'Column 1,Column 2,Column 3,Column 4 \n\
1-1,1-2,1-3,1-4 \n\
2-1,2-2,2-3,2-4 \n\
3-1,3-2,3-3,3-4 \n\
4,5,6,7';

        Papa.parse(data, papaConfig);
    </script>
</body>
</html>
If you run this locally, you'll see you get a line for each of the four rows of the CSV data, but the call to the complete callback gets undefined. Something like:
Step received: {"data":[{"Column 1":"1-1",...
Step received: {"data":[{"Column 1":"2-1",...
Step received: {"data":[{"Column 1":"3-1",...
Step received: {"data":[{"Column 1":"4","...
Complete received: undefined
However if you remove or comment out the step function, you will get a single line for all four results:
Complete received: {"data":[{"Column 1":"1-1",...
Note also that Papa Parse uses a streaming concept to support the step callback regardless of whether a worker is used. This means you won't directly know how many items you are parsing, so calculating the percent complete is not possible unless you can find the number of items separately.
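If you do read the file into a string first (as suggested in the previous answer), one option worth trying is to skip the extra worker entirely and update the progress element straight from the step callback; with worker: true the parsing itself is offloaded, while step is still invoked on the page for each row. A rough, untested sketch, assuming a <progress id="progBar"> element and that csv holds the file's text:

var progressConfig = {
    header: true,
    worker: true, // parsing runs off the main thread...
    step: function (row) {
        // ...but step still fires on the page, so the DOM can be updated here
        var percent = Math.round(row.meta.cursor / csv.length * 100);
        document.getElementById('progBar').value = percent;
    },
    complete: function () {
        document.getElementById('progBar').value = 100;
    }
};

Papa.parse(csv, progressConfig); // csv is the text read via FileReader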
Related
I am getting this error while uploading CSV files in my website.
Screenshot attached below.
I can see the data of the CSV file while debugging, but this error is stopping me from proceeding further, and I am not able to understand it. I have searched for it on Google too, but the results are not relevant to this.
I am using the library
https://d3js.org/d3-dsv.v1.min.js
The code I am using to upload the file is below.
function file(event){
    var uploadFileEl = document.getElementById('upload');

    if(uploadFileEl.files.length > 0){
        var reader = new FileReader();
        reader.onload = function(e) {
            fileProcess(reader.result.split(/[\r\n]+/));
        }
        reader.readAsText(uploadFileEl.files[0]);
    }
}

function fileProcess(data) {
    var lines = data;

    //Set up the data arrays
    var time = [];
    var data1 = [];
    var data2 = [];
    var data3 = [];

    var headings = lines[0].split(','); // Split up the first row to get the headings

    var headerCheckbox = document.getElementById('includeHeader');

    if(headerCheckbox.checked == true){
        for (var j=1; j<lines.length; j++) {
            var values = lines[j].split(','); // Split up the comma separated values
            // We read the key, 1st, 2nd and 3rd rows
            time.push(values[0]); // Read in as string
            // Recommended to read in as float, since we'll be doing some operations on this later.
            if (values[0] == "" || values[0] == null )
            {
                delete values[0];
                delete values[1];
                delete values[2];
                delete values[3];
            }
            else {
                data1.push(parseFloat(values[1]));
                data2.push(parseFloat(values[2]));
                data3.push(parseFloat(values[3]));
            }
        }
The error I am getting is on this line
fileProcess(reader.result.split(/[\r\n]+/));
What could be the reason for this?
That error means that you are filling the call stack.
If you call a function that calls itself recursively and doesn't have a stop condition, or never fulfils its stop condition, that error will be thrown.
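For example (a made-up illustration, not taken from your code), this fills the call stack because nothing ever stops the recursion:

function processLines(lines) {
    // no base case: even an empty array keeps recursing
    processLines(lines.slice(1)); // eventually: Maximum call stack size exceeded
}

processLines(["a", "b", "c"]);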
It appears that fileProcess(reader.result.split(/[\r\n]+/)) is being called recursively and is filling the call stack.
I don't know exactly where, as I don't see any recursive calls in the code you posted, so I can't help you further, but I hope this can shed some light on your problem.
P.S.: If you think there's some relevant extra code that you didn't post, edit your question and leave me a comment, and I'll update my answer as well.
On iOS I'm trying to upload videos to my own website and to do so I use the FileReader.readAsArrayBuffer method with a chunkSize of 1048576.
I upload chunks when the onprogress event fires, and all of this actually works perfectly except for bigger files. When trying to upload a file of 1.33 GB I'm getting an out of memory exception when calling the readAsArrayBuffer method.
I guess this is because it's trying to reserve memory for the complete file, but this is not necessary. Is there a way to read a binary chunk from a file without it reserving memory for the complete file? Or are there other solutions?
Thanks!
I fixed it myself today by changing the plugin code. This is the original code:
FileReader.prototype.readAsArrayBuffer = function (file) {
    if (initRead(this, file)) {
        return this._realReader.readAsArrayBuffer(file);
    }

    var totalSize = file.end - file.start;
    readSuccessCallback.bind(this)('readAsArrayBuffer', null, file.start, totalSize, function (r) {
        var resultArray = (this._progress === 0 ? new Uint8Array(totalSize) : new Uint8Array(this._result));
        resultArray.set(new Uint8Array(r), this._progress);
        this._result = resultArray.buffer;
    }.bind(this));
};
Since progress is always 0 at the start, it always reserves the entire file size. I added a property READ_CHUNKED (because I still have other existing code that also uses this method and expects it to work as it did, so I have to check that everything else keeps working) and changed the above to this:
FileReader.prototype.readAsArrayBuffer = function(file) {
    if (initRead(this, file)) {
        return this._realReader.readAsArrayBuffer(file);
    }

    var totalSize = file.end - file.start;
    readSuccessCallback.bind(this)('readAsArrayBuffer', null, file.start, totalSize, function(r) {
        var resultArray;
        if (!this.READ_CHUNKED) {
            resultArray = new Uint8Array(totalSize);
            resultArray.set(new Uint8Array(r), this._progress);
        } else {
            var newSize = FileReader.READ_CHUNK_SIZE;
            if ((totalSize - this._progress) < FileReader.READ_CHUNK_SIZE) {
                newSize = (totalSize - this._progress);
            }
            resultArray = new Uint8Array(newSize);
            resultArray.set(new Uint8Array(r), 0);
        }
        this._result = resultArray.buffer;
    }.bind(this));
};
When the READ_CHUNKED property is true, it only returns the individual chunks and doesn't reserve memory for the entire file; when it is false, it works like it used to.
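For anyone following along, a rough sketch of how the modified reader might be used (untested; it assumes the plugin fires onprogress once per chunk, as described in the question, and uploadChunk is a hypothetical upload helper):

var reader = new FileReader();
reader.READ_CHUNKED = true; // opt in to the per-chunk behaviour added above

reader.onprogress = function () {
    // with READ_CHUNKED, result now holds only the latest chunk
    uploadChunk(reader.result);
};

reader.onload = function () {
    // the final chunk has already been delivered via onprogress
    console.log("done reading");
};

reader.readAsArrayBuffer(file);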
I've never used GitHub (except for pulling code), so I'm not uploading this for now; I might look into it in the near future.
I'm using Chrome v67 and FileSaver.js.
This code works in FF & Edge but not in Chrome.
var ajaxSettings = {
    url: "my/api/method",
    data: JSON.stringify(myItems),
    success: function (data) {
        if (data) {
            var dateToAppend = moment().format("YYYYMMDD-HHmmss");
            createPdf("pdfDoc_" + dateToAppend + ".pdf", data[0]);
            saveFile("txtDoc_" + dateToAppend + ".txt", data[1]);
            //sleep(2000).then(() => {
            //    saveFile("txtDoc_" + dateToAppend + ".txt", data[1]);
            //});
        }
    }
}

//function sleep(time) {
//    return new Promise((resolve) => setTimeout(resolve, time));
//}
function createPdf(fileName, data) {
    if (navigator.appVersion.indexOf("MSIE 9") > -1) {
        window.open("data:application/pdf;base64," + encodeURIComponent(data));
    }
    else {
        saveFile(fileName, data);
    }
}
function saveFile(fileName, data) {
    var decodedByte64 = atob(data);
    var byteVals = new Array(decodedByte64.length);
    for (var i = 0; i < decodedByte64.length; i++) {
        byteVals[i] = decodedByte64.charCodeAt(i);
    }
    var byte8bitArray = new Uint8Array(byteVals);
    var blob = new Blob([byte8bitArray]);
    saveAs(blob, fileName); //FileSaver.js
}
The result of calling the API is an array with 2 byte arrays in it.
The byte arrays are the documents.
If I run this code as a user would, what happens is that the first document gets "downloaded" but the second does not. Attempting to do this a second time without refreshing the page results in no documents being "downloaded". The word "download" is in quotes because the data has already been downloaded; what I'm really trying to do is generate the documents from the byte arrays.
This is the strange bit: if I open the console, place a breakpoint on the "saveFile" call and immediately hit continue when the debugger lands on the breakpoint, then all is well with the world and the 2 documents get downloaded.
I initially thought it was a timing issue, so I put a 2 second delay on this to see if that was it, but it wasn't. The only thing I've managed to get working is the breakpoint, which I'm obviously not going to be able to convince the users to start doing, no matter how much I want them to.
Any help or pointers are much appreciated.
You probably faced a message at the top left corner of your browser stating
https://yoursite.com wants to
Download multiple files
[Block] [Allow]
If you don't have it anymore, it's probably because you clicked on "Block".
You can manage this restriction in chrome://settings/content/automaticDownloads.
Forgive the trivial question, but I am more used to C++ and Python code than JavaScript.
I have the following code from the three.js PLY loader:
var geometry;
var scope = this;

if (data instanceof ArrayBuffer) {
    geometry = isASCII(data) ? parseASCII(bin2str(data)) : parseBinary(data);
} else {
    geometry = parseASCII(data);
}

parse: function (data) {

    function isASCII(data) {
        var header = parseHeader(bin2str(data));
        return header.format === 'ascii';
    }

    function bin2str(buf) {
        var array_buffer = new Uint8Array(buf);
        var str = '';
        for (var i = 0; i < buf.byteLength; i++) {
            str += String.fromCharCode(array_buffer[i]); // implicitly assumes little-endian
        }
        return str;
    }
It works fine if I load a small PLY file, but the browser crashes on very large ones. I believe there are two "possible" issues:
1) on a large file, the string str returned by the function bin2str(buf) might be too big for the parsing process to handle
2) in the function isASCII(data), the line
parseHeader(bin2str(data));
crashes the browser, as bin2str(data) cannot return a proper value in time because the process is very memory-consuming
I am using conditional phrasing as I am not totally sure what the problem is. Any suggestions and/or possible solutions?
Thank you,
Dino
In the end, the solution I adopted was to decimate my file using the free software MeshLab.
Hope this helps.
Dino
I don't even know how to properly ask this question, but I have concerns about the performance (mostly memory consumption) of the following code. I am anticipating that this code will consume a lot of memory because of the map over a large set and the many 'hanging' functions that wait for an external service. Are my concerns justified here? What would be a better approach?
var list = fs.readFileSync('./mailinglist.txt') // say 1.000.000 records
    .split("\n")
    .map( processEntry );

var processEntry = function _processEntry(i){
    i = i.split('\t');
    getEmailBody( function(emailBody, name){
        var msg = {
            "message" : emailBody,
            "name" : i[0]
        }
        request(msg, function reqCb(err, result){
            ...
        });
    }); // getEmailBody
}

var getEmailBody = function _getEmailBody(obj, cb){
    // read email template from file;
    // v() returns the correct form for person's name with web-based service
    v(obj.name, function(v){
        cb(obj, v)
    });
}
If you're worried about submitting a million http requests in a very short time span (which you probably should be), you'll have to set up a buffer of some kind.
One simple way to do it:
var lines = fs.readFileSync('./mailinglist.txt', 'utf8').split("\n");
var entryIdx = 0;
var done = false;

var processNextEntry = function () {
    if (entryIdx < lines.length) {
        processEntry(lines[entryIdx++]);
    } else {
        done = true;
    }
};

var processEntry = function _processEntry(i){
    i = i.split('\t');
    getEmailBody( function(emailBody, name){
        var msg = {
            "message" : emailBody,
            "name" : name
        }
        request(msg, function reqCb(err, result){
            // ...
            !done && processNextEntry();
        });
    }); // getEmailBody
}

// getEmailBody didn't change

// you set the ball rolling by calling processNextEntry n times,
// where n is a sensible number of http requests to have pending at once.
for (var i=0; i<10; i++) processNextEntry();
Edit: according to this blog post, Node has an internal queue system and will only allow 5 simultaneous requests. But you can still use this method to avoid filling up that internal queue with a million items if you're worried about memory consumption.
Firstly I would advise against using readFileSync, and instead favour the async equivalent. Blocking on IO operations should be avoided as reading from a disk is very expensive, and whilst that's the sole purpose of your code now, I would consider how that might change in the future - and arbitrarily wasting clock cycles is never a good idea.
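For reference, a minimal sketch of the async equivalent (the callback form of fs.readFile, with error handling kept deliberately simple):

var fs = require("fs");

fs.readFile('./mailinglist.txt', 'utf8', function (err, contents) {
    if (err) throw err;
    var lines = contents.split("\n");
    // kick off processing here instead of blocking at startup
});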
For large data files, I would read them in defined chunks and process them. If you can come up with some scheme, either sentinels to distinguish data blocks within the file, or padding to boundaries, then process the file piece by piece.
This is just rough, untested off the top of my head, but something like:
var fs = require("fs");

function doMyCoolWork(startByteIndex, endByteIndex){

    fs.open("path to your text file", 'r', function(err, fd) {

        var chunkSize = endByteIndex - startByteIndex;
        var buffer = new Buffer(chunkSize);

        // read chunkSize bytes starting at startByteIndex
        fs.read(fd, buffer, 0, chunkSize, startByteIndex, function(err, byteCount) {

            var data = buffer.toString('utf-8', 0, byteCount);

            // process your data here

            if(stillWorkToDo){
                //recurse
                doMyCoolWork(endByteIndex, endByteIndex + 100);
            }
        });
    });
}
Or look into one of the stream library functions for similar functionality.
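For example, here is a rough sketch using Node's built-in readline module over a read stream, which hands you one line at a time without holding the whole file in memory (untested; processEntry is the function from the question):

var fs = require("fs");
var readline = require("readline");

var rl = readline.createInterface({
    input: fs.createReadStream('./mailinglist.txt')
});

rl.on('line', function (line) {
    processEntry(line); // handle each record as it arrives
});

rl.on('close', function () {
    console.log('finished reading the mailing list');
});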
HTH
P.S. JavaScript and Node work extremely well with async and eventing; using sync is an antipattern in my opinion, and likely to make the code a headache in future.