Count number of lines in CSV with JavaScript

I'm trying to think of a way to count the number of lines in a .csv file using JavaScript. Are there any useful tips or resources someone can direct me to?

Depends what you mean by a line. For simple number of newlines, Robusto's answer is fine.
If you want to know how many rows of CSV data that represents, things may be a little more difficult, as a CSV field may itself contain a newline:
field1,"field
two",field3
...is one row, at least in CSV as defined by RFC4180. (It's one of the aggravating features of CSV that there are so many non-standard variants; the RFC itself was very late to the game.)
So if you need to cope with that case you'll have to essentially parse each field.
A field can be raw, or (necessarily, if it contains \n or ,) quoted, with a literal " inside the field represented as a doubled "". So a regex for one field would be:
"([^"]|"")*"|[^,\n]*
and so for a whole row (assuming it is not empty):
("([^"]|"")*"|[^,\n]*)(,("([^"]|"")*"|[^,\n]*))*\n
and to get the number of those:
var rowsn= csv.match(/(?:"(?:[^"]|"")*"|[^,\n]*)(?:,(?:"(?:[^"]|"")*"|[^,\n]*))*\n/g).length;
If you are lucky enough to be dealing with a variant of CSV that complies with RFC4180's recommendation that there are no " characters in unquoted fields, you can make this a bit more readable. Split on newlines as before and count the number of " characters in each line. If it's an even number, you have a complete line; if it's an odd number you've got a split.
var lines = csv.split('\n');
for (var i = lines.length; i-- > 0;)
    // an odd number of " characters means this physical line continues the previous one
    if (((lines[i].match(/"/g) || []).length % 2) === 1)
        lines.splice(i - 1, 2, lines[i - 1] + lines[i]);
var rowsn = lines.length;

To count the number of lines in a document (once you have it as a string in Javascript), simply do:
var lines = csvString.split("\n").length;
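Note that if the string ends with a trailing newline, that count includes one extra empty "line". A small sketch that drops the empty trailing segment (csvString here is just an illustrative name):
// Count newline-separated lines, ignoring a single trailing "\n" if present
function countLines(csvString) {
    var lines = csvString.split("\n");
    if (lines[lines.length - 1] === "") lines.pop(); // drop the empty segment left by a trailing newline
    return lines.length;
}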

You can use '.' to match everything on a line except the newline at the end;
it won't count quoted newlines. Use the 'm' (multiline) flag, as well as 'g' for global.
function getLines(s){
    return s.match(/^(.*)$/mg);
}
alert(getLines(string).length);
If you don't mind skipping empty lines it is simpler,
but sometimes you need to keep them for spacing.
function getLines(s){
    return s.match(/(.+)/g);
}

If you are asking how to count the number of rows in a CSV, then you can use this example:
http://purbayubudi.wordpress.com/2008/11/09/csv-parser-using-javascript/
It takes the CSV file and displays the number of rows in a popup window.
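In case that link goes stale, here is a minimal sketch of a quote-aware row counter (an assumption of RFC4180-style quoting, where "" escapes a quote inside a quoted field; this is not the code from the linked article):
// Count CSV rows, treating newlines inside quoted fields as part of the same row
function countCsvRows(csv) {
    var rows = 0, inQuotes = false;
    for (var i = 0; i < csv.length; i++) {
        var c = csv.charAt(i);
        if (c === '"') inQuotes = !inQuotes;             // a "" escape toggles twice, so it cancels out
        else if (c === '\n' && !inQuotes) rows++;
    }
    if (csv.length && csv.charAt(csv.length - 1) !== '\n') rows++; // last row without a trailing newline
    return rows;
}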

Here is some sample code in TypeScript.
FileReader is needed to obtain the contents.
Returning a promise makes it easy to wait for the results of the asynchronous readAsText and onload function.
const countRowsInCSV = async (csvFile: File): Promise<number> => {
  return new Promise((resolve, reject) => {
    try {
      const reader = new FileReader();
      reader.onload = (event: any) => {
        const cvsData = event.target.result;
        const rowData = cvsData.split('\n');
        resolve(rowData.length);
      };
      reader.readAsText(csvFile);
    } catch (error: any) {
      reject(error);
    }
  });
};
const onChangeFile = async (selectedFile: File) => {
  const totalRows = await countRowsInCSV(selectedFile);
};

Related

create one array after using map() twice

I may or may not receive 2 differently formatted bits of data.
They both need to be stripped of characters in different ways. Please excuse the variable names; I will make them better once I have this working.
const cut = flatten.map(obj => {
  return obj.file.replace("0:/", "");
});
const removeDots = flatten.map(obj => {
  return obj.file.replace("../../uploads/", "");
});
I then need to push the arrays into my mongo database.
let data;
for (const loop of cut) {
  data = { name: loop };
  product.images.push(data);
}
let moreData;
for (const looptwo of removeDots) {
  moreData = { name: looptwo };
  product.images.push(moreData);
}
I wanted to know if there is a way to either join them or do an if/else, because the result of this is that if I have 2 records, it ends up duplicating and I get 4 records instead of 2. Also, 2 of the records are incorrectly formatted, i.e. the "0:/" is still present instead of being stripped away.
Ideally I would like to have a check: if 0:/ is present, remove it; if ../../uploads/ is present, remove it; and if both are present, remove both. Then create an array from that to push.
You can do both replace calls in the same map:
const processed = flatten.map(obj => {
  return obj.file.replace("0:/", "").replace("../../uploads/", "");
});
Since you know the possible patterns, you can create a regex and use it to replace any occurrences.
const regex = /(0:\/|(\.\.\/)+uploads\/)/g
const processed = flatten.map(obj => obj.file.replace(regex, ''));
Note that regex is a pattern-based approach, so it has pros and cons.
Pros:
You can have any number of nested folders; the literal string ../../uploads/ restricts you to a two-folder structure only.
You can achieve the transformation in one operation, and the code looks clean.
Cons:
Regex can be hard to understand and can reduce the readability of the code a bit. (Opinionated)
If you have a path like .../../uploads/bla, it will be transformed to .bla.
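To illustrate, here is the regex in action with some hypothetical paths (the file values below are just made-up examples):
// Hypothetical input, purely to demonstrate the replacement
const flatten = [
  { file: "0:/image-1.png" },
  { file: "../../uploads/image-2.png" }
];
const regex = /(0:\/|(\.\.\/)+uploads\/)/g;
const processed = flatten.map(obj => obj.file.replace(regex, ''));
console.log(processed); // [ 'image-1.png', 'image-2.png' ]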
Since you also ask about a possible way of joining two arrays, I'll give you a couple of solutions (with and without joining).
You can either chain .replace on the elements of the array, or you can concat the two arrays in your solution. So, either:
const filtered = flatten.map(obj => {
  return obj.file.replace('0:/', '').replace('../../uploads/', '');
});
Or (joining the arrays):
// your two .map calls go here
const joinedArray = cut.concat(removeDots);

Stream JSON-parsable array to file

So you're reading data from a file, cleaning out the data, and writing it back to another file, but the new file isn't accepted JSON format.
You need to fill an object in the new file. You get a chunk from the file, alter it, and save it to the new file.
For this you stream the data out, edit the chunks, and stream it back into the other file. Great.
You make sure to add a , after each item to keep the array readable later on,
but now the last item has a trailing comma...
You don't know the count of items in the original file, and you also don't know when the reader is at the last item.
You use something like JSONStream on that array but JSONStream also does not provide the index.
The only end events are for your writers and readers.
How do you remove the trailing comma before/after writing?
const read_file = 'animals.json';     // very large file
const write_file = 'brown_dogs.json'; // moderately large file

let read_stream = fs.createReadStream(read_file);
let write_stream = fs.createWriteStream(write_file);
let dog_stream = require('JSONStream').parse('array_of_animals.dogs.*');

write_stream
  .on('finish', () => {
    // the writer is done writing my list of dogs, but my array has a
    // trailing comma, so now my brown_dogs.json isn't parsable
  })
  .write('{"brown_dogs": ['); // let's start

read_stream
  .pipe(dog_stream)
  .on('data', dog => {
    // basic logic before we save the item
    if (dog._fur_colour === 'brown') {
      let _dog = {
        type: dog._type,
        colour: dog._fur_colour,
        size: dog._height
      };
      // we write our accepted dog
      write_stream.write(JSON.stringify(_dog) + ',');
    }
  })
  .on('end', () => {
    // done reading animals.json
    write_stream.write(']}');
  });
--
If your resulting JSON file is small, you may simply add all the dogs to an array and only save all the contents to the file in one go. This means the file is not only JSON friendly, but also small enough to simply open with JSON.parse()
If your resulting JSON file is large, you may need to stream the items out in any case. Luckily JSONStream allows us to not only extract each dog individually but also ignore the trailing comma.
This is what I understand to be the solution...but I don't think it's perfect. Why can't the file be valid JSON, regardless of the size?
This is actually very simple.
Add a variable holding an empty string before the first insert, and set it to the separator after the first insert.
// update this string after the first insert
let separator = '';

read_stream
  .pipe(dog_stream)
  .on('data', dog => {
    // basic logic before we save the item
    if (dog._fur_colour === 'brown') {
      let _dog = {
        type: dog._type,
        colour: dog._fur_colour,
        size: dog._height
      };
      // we write our accepted dog, preceded by the separator
      write_stream.write(separator + JSON.stringify(_dog));
      // update this after the first insert
      separator = ',';
    }
  });
I added a toJSONArray method to scramjet exactly for this; see the docs here. It puts a comma only between the chunks.
The code would look like this:
const { DataStream } = require('scramjet');

fs.createReadStream(read_file)
  .pipe(require('JSONStream').parse('array_of_animals.dogs.*'))
  .pipe(new DataStream())
  .filter(dog => dog._fur_colour === 'brown') // this will filter out the non-brown dogs.
  .map(dog => {                               // remap the data
    return {
      type: dog._type,
      colour: dog._fur_colour,
      size: dog._height
    };
  })
  .toJSONArray(['{"brown_dogs": [', ']}'])    // add your enclosure
  .pipe(fs.createWriteStream(write_file));
This code should produce valid JSON.

Using JavaScript to capitalize, break it up with a new line every 45 characters, and add a string to the beginning of each new line

Using JavaScript to (1) capitalize all characters from a user input string, (2) break it up with a new line every 45 characters, and (3) add a certain string ("///////" for example) to the beginning of each new line.
I want a simple application where I can copy and paste a string of text, and have a function do the above.
For example:
Copy and paste "I am new to JavaScript, so even this simple code is very difficult to write" and get the following:
"
//////I AM NEW TO JAVASCRIPT, SO EVEN THIS SIMPLE C
//////ODE IS VERY DIFFICULT TO WRITE
"
I would like to, in the future, make it so that it doesn't cut off a word like that in the middle, and can use the SPACES to find where the new line should be, but that seems like a little much right now.
All I have is the Capitalization function working:
var txt = prompt("Enter string of text");
var cap = txt.toUpperCase();
alert(cap);
but I want it to run all three functions at once and alert() the final product.
In a "functional programming style", you can do it like this:
var txt = prompt("Enter string of text");
var cap = txt.toUpperCase().split('').reduce(function(agg, item, i) {
    if (i % 45 === 0) {
        if (i > 0) {
            agg.push('\r\n');
        }
        agg.push('//////');
    }
    agg.push(item);
    return agg;
}, []).join('');
alert(cap);
Essentially what happens here is that the string is:
Converted to upper case. Then...
Split into an array of single characters. Then...
The array is "reduced"1 to a new array with interwoven new lines and "separator" string (//////). Then...
The new array is joined to form a new string.
1 Reducing an array is an operation that iterates over the array items sequentially, incrementally generating a single "reduced" result. Typically this is used in scenarios such as summing multiple values. In this code it is not a "logically correct" usage of the function, as it doesn't reduce anything, but it does enable a functional-style solution.
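As for the follow-up idea in the question (breaking at spaces so words aren't cut in half), here is a rough sketch, not part of the answer above; the 45-character width and '//////' prefix are just the values from the example:
// Wrap on word boundaries instead of cutting every 45 characters mid-word
function formatText(txt, width, prefix) {
    var words = txt.toUpperCase().split(' ');
    var lines = [''];
    for (var i = 0; i < words.length; i++) {
        var current = lines[lines.length - 1];
        // start a new line if adding the next word would exceed the width
        if (current.length > 0 && current.length + 1 + words[i].length > width) {
            lines.push(words[i]);
        } else {
            lines[lines.length - 1] = current.length ? current + ' ' + words[i] : words[i];
        }
    }
    return lines.map(function(line) { return prefix + line; }).join('\r\n');
}
alert(formatText(prompt("Enter string of text"), 45, '//////'));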

What is the fastest way to read and parse a file of numerical ASCII pairs in Node.js?

I'm using Node.js to read and parse a file of pairs encoding numbers. I have a file like this:
1561 0506
1204 900
6060 44
And I want to read it as an array, like this:
[[1561,0506],[1204,900],[6060,44]]
For that, I am using a readStream, reading the file as chunks and using native string functions to do the parsing:
fileStream.on("data", function(chunk){
    var newLineIndex;
    file = file + chunk;
    while ((newLineIndex = file.indexOf("\n")) !== -1){
        var spaceIndex = file.indexOf(" ");
        edges.push([
            Number(file.slice(0, spaceIndex)),
            Number(file.slice(spaceIndex + 1, newLineIndex))
        ]);
        file = file.slice(newLineIndex + 1);
    }
});
That took way too much time, though (4s for the file I need on my machine). I see some reasons:
Use of strings;
use of "Number";
Dynamic array of arrays.
I've rewritten the algorithm without using the built-in string functions, using loops instead, and to my surprise it became much slower! Is there any way to make it faster?
Caveat: I have not tested the performance of this solution, but it's complete so should be easy to try.
How about using this liner implementation, based on the notes in this question?
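The ./liner module referenced here is not shown in the answer; a minimal sketch of such a line-splitting transform (my assumption of its shape, built on Node's stream.Transform) could look like this:
// liner.js - rough sketch of a line-splitting Transform stream
var stream = require('stream');
var liner = new stream.Transform({ objectMode: true });

liner._transform = function (chunk, encoding, done) {
    var data = chunk.toString();
    if (this._lastLineData) data = this._lastLineData + data;
    var lines = data.split('\n');
    this._lastLineData = lines.pop(); // keep the partial last line for the next chunk
    lines.forEach(this.push.bind(this));
    done();
};

liner._flush = function (done) {
    if (this._lastLineData) this.push(this._lastLineData);
    this._lastLineData = null;
    done();
};

module.exports = liner;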
Using the liner:
var fs = require('fs');
var liner = require('./liner');

var edges = []; // collect the parsed pairs here
var source = fs.createReadStream('mypathhere');
source.pipe(liner);
liner.on('readable', function () {
    var line;
    while (line = liner.read()) {
        var parts = line.split(" ");
        edges.push([Number(parts[0]), Number(parts[1])]);
    }
});
As you can see I also moved the edge array to be an inline constant-sized array separate from the split parts, which I'm guessing would speed up allocation. You could even try swapping out split(" ") for indexOf(" ").
Beyond this you could instrument the code to identify any further bottlenecks.
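For example, a rough way to instrument it is to wrap the parsing in console.time and report when the liner stream ends (a sketch using the edges array from the snippet above):
// Rough instrumentation sketch: measure how long reading and parsing takes
console.time('parse');
liner.on('end', function () {
    console.timeEnd('parse');                  // prints something like "parse: 4000ms"
    console.log(edges.length + ' pairs parsed');
});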

Line-oriented streams in Node.js

I'm developing a multi-process application using Node.js. In this application, a parent process will spawn a child process and communicate with it using a JSON-based messaging protocol over a pipe. I've found that large JSON messages may get "cut off", such that a single "chunk" emitted to the data listener on the pipe does not contain the full JSON message. Furthermore, small JSON messages may be grouped in the same chunk. Each JSON message will be delimited by a newline character, and so I'm wondering if there is already a utility that will buffer the pipe read stream such that it emits one line at a time (and hence, for my application, one JSON document at a time). This seems like it would be a pretty common use case, so I'm wondering if it has already been done.
I'd appreciate any guidance anyone can offer. Thanks.
Maybe Pedro's carrier can help you?
Carrier helps you implement new-line terminated protocols over node.js.
The client can send you chunks of lines and carrier will only notify you on each completed line.
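A minimal usage sketch with carrier (assuming a spawned child process whose stdout carries one JSON message per line; child and handleMessage are hypothetical names, not from the question):
// Get notified once per completed line, then parse it as JSON
var carrier = require('carrier');
carrier.carry(child.stdout, function (line) {
    var message = JSON.parse(line); // each completed line is one JSON document
    handleMessage(message);         // hypothetical handler for the parsed message
});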
My solution to this problem is to send JSON messages each terminated with some special unicode character. A character that you would never normally get in the JSON string. Call it TERM.
So the sender just does "JSON.stringify(message) + TERM;" and writes it.
The receiver then splits the incoming data on the TERM and parses the parts with JSON.parse(), which is pretty quick.
The trick is that the last message may not parse, so we simply save that fragment and add it to the beginning of the next message when it comes. Receiving code goes like this:
s.on("data", function (data) {
var info = data.toString().split(TERM);
info[0] = fragment + info[0];
fragment = '';
for ( var index = 0; index < info.length; index++) {
if (info[index]) {
try {
var message = JSON.parse(info[index]);
self.emit('message', message);
} catch (error) {
fragment = info[index];
continue;
}
}
}
});
Where "fragment" is defined somwhere where it will persist between data chunks.
But what is TERM? I have used the unicode replacement character '\uFFFD'. One could also use the technique used by twitter where messages are separated by '\r\n' and tweets use '\n' for new lines and never contain '\r\n'
I find this to be a lot simpler than messing with length prefixes and the like.
The simplest solution is to send the length of the JSON data before each message as a fixed-length prefix (4 bytes?) and have a simple un-framing parser which buffers small chunks or splits bigger ones.
You can try node-binary to avoid writing the parser manually. Look at the scan(key, buffer) documentation example; it does exactly line-by-line reading.
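A rough sketch of that length-prefix framing (hypothetical helper names; a 4-byte big-endian length before each JSON payload):
// Sender: write a 4-byte big-endian length, then the JSON payload
function sendMessage(stream, message) {
    var payload = Buffer.from(JSON.stringify(message));
    var header = Buffer.alloc(4);
    header.writeUInt32BE(payload.length, 0);
    stream.write(Buffer.concat([header, payload]));
}
// Receiver: buffer chunks and emit one parsed message per complete frame
function receiveMessages(stream, onMessage) {
    var buf = Buffer.alloc(0);
    stream.on('data', function (chunk) {
        buf = Buffer.concat([buf, chunk]);
        while (buf.length >= 4) {
            var len = buf.readUInt32BE(0);
            if (buf.length < 4 + len) break; // wait for the rest of the frame
            onMessage(JSON.parse(buf.slice(4, 4 + len).toString()));
            buf = buf.slice(4 + len);
        }
    });
}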
As long as newlines (or whatever delimiter you use) only delimit the JSON messages and are not embedded in them, you can use the following pattern:
let buf = ''
s.on('data', data => {
  buf += data.toString()
  const idx = buf.indexOf('\n')
  if (idx < 0) { return } // No '\n', no full message
  let lines = buf.split('\n')
  buf = lines.pop() // if ends in '\n' then buf will be empty
  for (let line of lines) {
    // Handle the line
  }
})
