JavaScript: Parse each CSV column to separate JSON

I am using the csvtojson npm library and have read its options, but I cannot figure out how to make separate JSON files from CSV columns.
The CSV format is:
invoiceNumber;taxNumber
test-1;00000000
test-2
For options I am using: {columns: false, delimiter: ';'}
and the output is:
{ invoiceNumber: 'test-1', taxNumber: '00000000' }
{ invoiceNumber: 'test-2' }
However, the expected output is:
{ invoiceNumber: 'test-1'}
{ taxNumber: '00000000' }
{ invoiceNumber: 'test-2' }
Is this possible with any CSV parser npm library?

For this task, I recommend using lodash. After running csvtojson, try this script:
const _ = require('lodash');

const arr = [
  { invoiceNumber: 'test-1', taxNumber: '00000000' },
  { invoiceNumber: 'test-2' }
];
// Turn each row into [key, value] pairs, then wrap every pair in its own object.
console.log(_.flatMap(arr, item => _.map(_.toPairs(item), d => _.fromPairs([d]))));
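The same split can also be done without lodash. The sketch below is a minimal alternative that uses only built-in array methods; `splitColumns` is a hypothetical helper name, and it assumes a Node.js version that supports `Array.prototype.flatMap` (11 or later).

```javascript
// Hypothetical helper: produce one single-key object per column of every row.
function splitColumns(rows) {
  return rows.flatMap(row =>
    Object.entries(row).map(([key, value]) => ({ [key]: value }))
  );
}

// Rows in the shape the CSV parser produced in the question.
const parsedRows = [
  { invoiceNumber: 'test-1', taxNumber: '00000000' },
  { invoiceNumber: 'test-2' }
];

const separated = splitColumns(parsedRows);
console.log(separated);
```

Columns that are missing from a row (such as taxNumber in the second row) produce no object, which matches the expected output in the question.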

Related

Date in XLSX file not parsing correctly in SheetJs

I am trying to read an XLSX file using the sheetjs node module, with a column containing dates. After parsing, I get the data in an incorrect format.
The file data is: 2/17/2020
But after the xlsx read it gives me 2/17/20, so the year format is changed. My requirement is to get the data as it is.
Sample Code:
var workbook = XLSX.readFile('a.xlsx', {
sheetRows: 10
});
var data = XLSX.utils.sheet_to_json(workbook.Sheets['Sheet1'], {
header: 1,
defval: '',
blankrows: true,
raw: false
});
There's a solution presented here using streams, but it works equally well with readFile. The problem at play in that issue is that the value for the dateNF option in sheet_to_json needs some escape characters.
E.g.:
const XLSX = require('xlsx');
const filename = './Book4.xlsx';
const readOpts = { // <--- need these settings in readFile options
cellText:false,
cellDates:true
};
const jsonOpts = {
header: 1,
defval: '',
blankrows: true,
raw: false,
dateNF: 'd"/"m"/"yyyy' // <--- need dateNF in sheet_to_json options (note the escape chars)
}
const workbook = XLSX.readFile(filename, readOpts);
const worksheet = workbook.Sheets['Sheet1'];
const json = XLSX.utils.sheet_to_json(worksheet, jsonOpts);
console.log(json);
For the input workbook (shown as an image in the original post), this will output:
[
[ 'v1', 'v2', 'today', 'formatted' ],
[ '1', 'a', '14/8/2021', '14/8/2021' ]
]
Without the dateNF: 'd"/"m"/"yyyy' you get the same problem as you describe in the question:
[
[ 'v1', 'v2', 'today', 'formatted' ],
[ '1', 'a', '8/14/21', '8/14/21' ]
]
The two potential unwanted side effects are:
use of the cellText and cellDates options in readFile
note my custom format of yyyy^mmm^dd in the input - the dateNF setting overrides any custom setting

How to split a big ODS file without causing memory leaks?

I'm working with a MySQL database and have two types of files to import:
The first is a CSV file, which I can load with
LOAD DATA INFILE 'path-to-csv_file'
The second type of file is ODS (OpenDocument Spreadsheet), which MySQL doesn't support for LOAD DATA INFILE.
My solution was to convert ODS to CSV using the xlsx package, which has an XLSX.readFile command, and then csv-writer. But when working with large ODS files, my program crashed because it was using too much memory. I searched for solutions and found streams, but the xlsx package doesn't have read streams. After this, I tried fs because it has an fs.createReadStream command, but that module doesn't understand ODS files. As an example, compare what fs.readFile and xlsx.readFile return.
fs.readFile:
PK♥♦m�IQ�l9�.mimetypeapplication/vnd.oasis.opendocument.spreadsheetPK♥♦m�IQM◄ˋ%�%↑Thumbnails/thumbnail.png�PNG
→
IHDR�♥A�-=♥PLTE►►☼§¶►∟↓*.!/<22/8768:G6AN>AM>BP>MaC:;A?GOE?EFJGJRJQ[TJEQOQ\QJYWYKVeX\dX]p\bkXetaNJgTEe[Wp^Wa_aja\ue\hfgektjqztkeqnpyqlwwvco�jw�j}�v{�q⌂�~�⌂{��t��t��u��z��y��|��{��{��}���o]�od�vj�|v�⌂n�⌂r��{��n��x��~��~������
XLSX.readFile:
J323: { t: 's', v: '79770000', w: '79770000' },
K323: { t: 's', v: '20200115', w: '20200115' },
Working with the xlsx module is easy because I can pick out only the data that I want from the ODS file. Using JavaScript, I extract three columns and put them in an array:
const xlsx = require('xlsx');

let posts = [];
let post = {};
for (let i = 0; i < 1; i++) {
  let filePath = `C:\\Users\\me\\Downloads\\file_users.ODS`;
  let workbook = xlsx.readFile(filePath);
  let worksheet = workbook.Sheets[workbook.SheetNames[0]];
  for (let cell in worksheet) {
    const cellAsString = cell.toString();
    // Column A holds the ID, column C the user name, column J the email.
    if (cellAsString[0] === 'A') {
      post['ID'] = worksheet[cell].v;
    } else if (cellAsString[0] === 'C') {
      post['USER NAME'] = worksheet[cell].v;
    }
    if (cellAsString[0] === 'J') {
      post['USER EMAIL'] = worksheet[cell].v;
      // Keep the row only when all three fields were found.
      if (Object.keys(post).length == 3) posts.push(post);
      post = {};
    }
  }
}
...returns:
{
ID: '1',
'USER NAME': 'John Paul',
'USER EMAIL': 'Paul.John12#hotmail.com'
},
{
ID: '2',
'USER NAME': 'Julia',
'USER EMAIL': 'lejulie31312#outlook.com'
},
{
ID: '3',
'USER NAME': 'Greg Norton',
'USER EMAIL': 'thenorton31031#hotmail.com'
},
... 44660 more items
So my problem is with large ODS files. The output above comes from running this script on a 78 MB file, which uses 1,600 MB of RAM. When I try it with 900 MB files, memory reaches the limit (4,000 MB+) and I get the error 'ERR_STRING_TOO_LONG'.
I tried to use the readline package to parse the data, but it needs a stream.
If I have to slice the ODS files into small pieces, how can I read the file to do that without crashing VS Code?

How to make complex Json fit a Javascript object

The backend of my web app, written in Node.js, interacts with a JSON file that has a specific format, which I thought was not so complex but apparently is.
The structure of my JSON file is as follows:
{
  "data": [
    {
      "somefield": "ioremipsum",
      "somedate": "2018-08-23T11:48:00Z",
      "someotherdate": "2018-08-23T13:43:00Z",
      "somethingelse": "ioremipsum",
      "files": [
        {
          "specificfieldinarray": "ioremipsum",
          "specificotherfieldinarray": "ioremipsum"
        },
        {
          "specificfieldinarray": "ioremipsum",
          "specificotherfieldinarray": "ioremipsum"
        },
        {
          "specificfieldinarray": "ioremipsum",
          "specificotherfieldinarray": "ioremipsum"
        }
      ]
    }
  ]
}
I try to load this into a JS object like this:
const file = require('specificJsonFile.json');
let fileList = file;
I need to loop through my 'files' array for further processing, but unfortunately my JS object looks like this:
{ data:
[ { somefield: "ioremipsum",
somedate : "2018-08-23T11:48:00Z",
someotherdate : "2018-08-23T13:43:00Z",
somethingelse:"ioremipsum",
files: [Array] } ] }
Please forgive me if this is obvious, for I am still a beginner with JS.
That's only how console.log prints deep objects. To get deeper output, you can use util.inspect:
const util = require('util');
console.log(util.inspect(yourObject, {showHidden: false, depth: null}));
To loop over each data entry's files, simply loop over data, then over its files:
yourObject.data.forEach(d => {
d.files.forEach(file => console.log(file));
});
It looks like there is nothing wrong there and the console is abbreviating the log.
Try accessing the files list with the following code:
const filesList = file.data[0].files
and then
console.log(filesList) to check that it's eventually working.
Hope it helps!
let fileList = file.data[0].files;
This gives you an array containing only your files.
You can console.log(fileList)
or do whatever you like with the data.
Based on your comment, try the of keyword instead of the in keyword to get the behaviour you expected.
for (let file of fileList){
console.log(file);
}
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of
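You can use for...in (here fileList is the whole parsed object, as in the question):
// Declare the loop variables so they do not leak as implicit globals.
for (const item in fileList.data) {
  for (const file in fileList.data[item].files) {
    let data = fileList.data[item].files[file];
    // process the data
  }
}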
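As a quick check that the nested array really is there, the sketch below builds a small object shaped like the one in the question (the sample values are placeholders, not the real data) and reads the files with JSON.stringify and with for...of.

```javascript
// Sample object shaped like the JSON file in the question (values are placeholders).
const sample = {
  data: [
    {
      somefield: 'a',
      files: [
        { specificfieldinarray: 'x1', specificotherfieldinarray: 'y1' },
        { specificfieldinarray: 'x2', specificotherfieldinarray: 'y2' }
      ]
    }
  ]
};

// JSON.stringify prints every nesting level, unlike the default console.log depth.
console.log(JSON.stringify(sample, null, 2));

// Collect every file entry across all data items.
const allFiles = [];
for (const entry of sample.data) {
  for (const f of entry.files) {
    allFiles.push(f);
  }
}
console.log(allFiles.length);
```

The `[Array]` placeholder only affects how the object is displayed; the loop sees the full contents either way.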

nodejs unable to parse date with momentjs

I am trying to parse some CSV data, but I am not able to parse a date with the momentjs library.
var csv = require('csv-parser');
var fs = require('fs');
var moment = require('moment');
const bittrexDateFormat = "MM/DD/YYYY hh:mm:ss a";
var count = 0;
fs.createReadStream('orders.csv')
.pipe(csv({
headers: ['OrderUuid', 'Exchange', 'Type', 'Quantity', 'Limit', 'CommissionPaid', 'Price', 'Opened', 'Closed']
}))
.on('data', function(data) {
var createDate = moment(data.Opened, bittrexDateFormat);
console.log(createDate.toDate());
});
The CSV data looks like this:
OrderUuid,Exchange,Type,Quantity,Limit,CommissionPaid,Price,Opened,Closed
24245deb-134c-4da7-990e-8d22d8fd728c,BTC-STRAT,LIMIT_SELL,77.12739479,0.00087503,0.00016874,0.06749952,12/24/2017 12:09:20 AM,12/24/2017 12:09:21 AM
And this is the output:
0002-01-02T09:00:02.000Z
On the other hand, if I hardcode the date string directly, I am able to get a Date object.
var createDate = moment("12/24/2017 12:09:20 AM", bittrexDateFormat);
console.log(createDate.toDate());
Another thing I figured out is that if I print data in the .on('data') event, it prints an encoded version of the strings:
Row {
OrderUuid: 'O\u0000r\u0000d\u0000e\u0000r\u0000U\u0000u\u0000i\u0000d\u0000',
Exchange: '\u0000E\u0000x\u0000c\u0000h\u0000a\u0000n\u0000g\u0000e\u0000',
Type: '\u0000T\u0000y\u0000p\u0000e\u0000',
Quantity: '\u0000Q\u0000u\u0000a\u0000n\u0000t\u0000i\u0000t\u0000y\u0000',
Limit: '\u0000L\u0000i\u0000m\u0000i\u0000t\u0000',
CommissionPaid: '\u0000C\u0000o\u0000m\u0000m\u0000i\u0000s\u0000s\u0000i\u0000o\u0000n\u0000P\u0000a\u0000i\u0000d\u0000',
Price: '\u0000P\u0000r\u0000i\u0000c\u0000e\u0000',
Opened: '\u0000O\u0000p\u0000e\u0000n\u0000e\u0000d\u0000',
Closed: '\u0000C\u0000l\u0000o\u0000s\u0000e\u0000d\u0000\r\u0000' }
Row {
OrderUuid: '\u00002\u00004\u00002\u00004\u00005\u0000d\u0000e\u0000b\u0000-\u00001\u00003\u00004\u0000c\u0000-\u00004\u0000d\u0000a\u00007\u0000-\u00009\u00009\u00000\u0000e\u0000-\u00008\u0000d\u00002\u00002\u0000d\u00008\u0000f\u0000d\u00007\u00002\u00008\u0000c\u0000',
Exchange: '\u0000B\u0000T\u0000C\u0000-\u0000S\u0000T\u0000R\u0000A\u0000T\u0000',
Type: '\u0000L\u0000I\u0000M\u0000I\u0000T\u0000_\u0000S\u0000E\u0000L\u0000L\u0000',
Quantity: '\u00007\u00007\u0000.\u00001\u00002\u00007\u00003\u00009\u00004\u00007\u00009\u0000',
Limit: '\u00000\u0000.\u00000\u00000\u00000\u00008\u00007\u00005\u00000\u00003\u0000',
CommissionPaid: '\u00000\u0000.\u00000\u00000\u00000\u00001\u00006\u00008\u00007\u00004\u0000',
Price: '\u00000\u0000.\u00000\u00006\u00007\u00004\u00009\u00009\u00005\u00002\u0000',
Opened: '\u00001\u00002\u0000/\u00002\u00004\u0000/\u00002\u00000\u00001\u00007\u0000 \u00001\u00002\u0000:\u00000\u00009\u0000:\u00002\u00000\u0000 \u0000A\u0000M\u0000',
Closed: '\u00001\u00002\u0000/\u00002\u00004\u0000/\u00002\u00000\u00001\u00007\u0000 \u00001\u00002\u0000:\u00000\u00009\u0000:\u00002\u00001\u0000 \u0000A\u0000M\u0000' }
I am pretty new to Node.js, but I don't think the problem comes from either the momentjs or the csv-parser library. Instead, it seems to come from the string encoding used by the Node.js stream API. Thanks a lot.
I just ran your code, and it worked just fine.
Try updating your node version or try invoking the function with
fs.createReadStream('orders.csv', 'utf16le')
the encoding should make the difference...
https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
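The \u0000 characters in the printed rows are what UTF-16LE text looks like when it is decoded as UTF-8. The sketch below reproduces this with an in-memory Buffer instead of orders.csv, so it assumes nothing about the real file beyond its encoding.

```javascript
// The date string from the CSV, encoded the way the file appears to be stored.
const original = '12/24/2017 12:09:20 AM';
const utf16Bytes = Buffer.from(original, 'utf16le');

// Decoding with the wrong encoding interleaves NUL characters.
const wrong = utf16Bytes.toString('utf8');
// Decoding with the matching encoding restores the text.
const right = utf16Bytes.toString('utf16le');

console.log(JSON.stringify(wrong));
console.log(right);
```

A string full of NUL characters does not match the moment format, which is why the hardcoded string parses while the streamed one does not.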

Converting a file utf8 with fast-csv module

I have a file named "file.csv" that contains the data below:
ID Full name
1 Steve
2 John
3 nam
4 Hạnh
5 Thủy
I use the code segment below to parse this file to JSON, but my results are not UTF-8.
Code:
var fastCsv = require("fast-csv");
var fs = require("fs");
var iconv = require('iconv-lite');
var fileStream = fs.createReadStream("file.csv");
fastCsv
.fromStream(fileStream, {headers : ["id", "full_name"]})
.on("data", function(data){
console.log("------------------------");
console.log("data: ", data);
})
.on("end", function(){
console.log("done");
});
Results:
data: { id: '��I\u0000D\u0000', full_name: '\u0000F\u0000u\u0000l\u0000l\u0000 \u0000n\u0000a\u0000m\u0000e\u0000' }
data: { id: '\u00001\u0000',full_name: '\u0000S\u0000t\u0000e\u0000v\u0000e\u0000' }
data: { id: '\u00002\u0000',full_name: '\u0000J\u0000o\u0000h\u0000n\u0000' }
data: { id: '\u00003\u0000',full_name: '\u0000n\u0000a\u0000m\u0000' }
data: { id: '\u00004\u0000', full_name: '\u0000H\u0000�\u001en\u0000h\u0000' }
data: { id: '\u00005\u0000',full_name: '\u0000T\u0000h\u0000�\u001ey\u0000' }
data: { id: '\u0000', full_name: '' }
How can I convert my results to UTF-8?
Your input file is encoded in UTF-16LE, but it has been read as if it were UTF-8.
Try opening the file with fs.createReadStream('file.csv', {encoding: 'utf-16le'}).
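The two unreadable characters at the start of the first id value are consistent with a UTF-16LE byte order mark read as UTF-8. The following sketch uses an in-memory Buffer and a hypothetical helper, `decodeUtf16le`, to decode such bytes and strip the mark; that the real file starts with one is an assumption based on the output above.

```javascript
// Hypothetical helper: decode UTF-16LE bytes and drop a leading byte order mark.
function decodeUtf16le(bytes) {
  return bytes.toString('utf16le').replace(/^\uFEFF/, '');
}

// Simulated file content: BOM (FF FE) followed by the name "Hạnh" from the question.
const fileBytes = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from('H\u1EA1nh', 'utf16le')
]);

const decodedName = decodeUtf16le(fileBytes);
console.log(decodedName);
```

The replace call is harmless when no mark is present, so the helper can be applied to the first chunk of any UTF-16LE input.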
Take a look at Javascript Has a Unicode Problem
In your case you need to decode the escaped unicode chars. A library included with node called punycode can handle this.
Import punycode via:
var punycode = require("punycode");
Change:
console.log("firstName: ", data);
To:
console.log("firstName: ", punycode.ucs2.decode(data));
You might have to break down the data object further to decode its properties, but I can't tell from your question what their structure is.
