Parsing Excel sheet in Hebrew (.xlsx) to JSON produces question marks

I'm trying to parse an Excel file (*.xlsx) into a JSON object in Node.js, but every column containing Hebrew characters comes out as question marks.
Here's the code:
"use strict";
const excelToJson = require("convert-excel-to-json");
// Read the Excel file into JSON data
const excelData = excelToJson({
  sourceFile: "customers.xlsx",
  sheets: [
    {
      // Excel sheet name
      name: "Customers",
      header: {
        rows: 1
      }
    }
  ]
});
Any idea how to fix it?

I believe it's only your console that is showing invalid characters. Try dumping the Excel file contents to a file like so:
"use strict";
const excelToJson = require("convert-excel-to-json");
// Read the Excel file into JSON data
const excelData = excelToJson({
  sourceFile: "customers.xlsx",
  sheets: [
    {
      // Excel sheet name
      name: "Customers",
      header: {
        rows: 1
      }
    }
  ]
});
// Write the parsed data to disk so it can be inspected outside the console
const fs = require("fs");
fs.writeFileSync("customers.json", JSON.stringify(excelData));
Then open it in, say, Notepad++. You should see the Hebrew characters displayed correctly. I get exactly this behaviour: I see invalid characters in the command window, but everything is fine when I open the customers.json file.
E.g.:
{"Customers":[{"A":"לקוח 1"},{"A":"לקוח 2"}]}
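To confirm that the parsed strings themselves are intact regardless of how the console renders them, you can inspect their code points instead of their glyphs. This is only a minimal sketch: containsHebrew and toCodePoints are hypothetical helper names, and the sample object simply mirrors the output shown above.

```javascript
// Sample data shaped like the parsed output shown above.
const sample = { Customers: [{ A: "לקוח 1" }, { A: "לקוח 2" }] };

// Hypothetical helper: true if the text has any character in the Hebrew Unicode block.
function containsHebrew(text) {
  return /[\u0590-\u05FF]/.test(text);
}

// Hypothetical helper: lists code points, which print as plain ASCII on any console.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

for (const row of sample.Customers) {
  console.log(containsHebrew(row.A), toCodePoints(row.A).join(" "));
}
```

If the code points fall in the U+0590 to U+05FF range, the data was read correctly and only the terminal's display is at fault.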

Related

Date in XLSX file not parsing correctly in SheetJs

I am trying to read an XLSX file that has a date column using the sheetjs node module. After parsing, the dates come back in the wrong format.
The file data is: 2/17/2020
But after reading it with xlsx I get 2/17/20, so the year format is changed. I need the data exactly as it appears in the file.
Sample code:
var workbook = XLSX.readFile('a.xlsx', {
  sheetRows: 10
});
var data = XLSX.utils.sheet_to_json(workbook.Sheets['Sheet1'], {
  header: 1,
  defval: '',
  blankrows: true,
  raw: false
});
A solution has been presented using streams, but it works equally well with readFile. The problem in that issue was that the value of the dateNF option in sheet_to_json needs some escape characters.
E.g.:
const XLSX = require('xlsx');
const filename = './Book4.xlsx';
const readOpts = { // <--- need these settings in readFile options
  cellText: false,
  cellDates: true
};
const jsonOpts = {
  header: 1,
  defval: '',
  blankrows: true,
  raw: false,
  dateNF: 'd"/"m"/"yyyy' // <--- need dateNF in sheet_to_json options (note the escape chars)
};
const workbook = XLSX.readFile(filename, readOpts);
const worksheet = workbook.Sheets['Sheet1'];
const json = XLSX.utils.sheet_to_json(worksheet, jsonOpts);
console.log(json);
For my test input, this will output:
[
[ 'v1', 'v2', 'today', 'formatted' ],
[ '1', 'a', '14/8/2021', '14/8/2021' ]
]
Without the dateNF: 'd"/"m"/"yyyy' option you get the same problem you describe in the question:
[
[ 'v1', 'v2', 'today', 'formatted' ],
[ '1', 'a', '8/14/21', '8/14/21' ]
]
The two potentially unwanted side effects are:
the cellText and cellDates options must be set in readFile
the dateNF setting overrides any custom cell format (note that my input used a custom format of yyyy^mmm^dd)
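If the dateNF override is a problem, one alternative is to format the value yourself once you hold a real Date object. The sketch below is plain JavaScript, not part of SheetJS. formatDayMonthYear is a hypothetical helper, and it assumes the cell value reaches you as a Date, which you should verify for your library version and options. It also assumes the date was constructed from UTC components, so check how your dates are shifted by the local time zone.

```javascript
// Hypothetical helper: formats a Date as d/m/yyyy without leading zeros.
// Assumption: the Date represents a UTC calendar day, so UTC getters are used.
function formatDayMonthYear(date) {
  const day = date.getUTCDate();
  const month = date.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const year = date.getUTCFullYear();
  return `${day}/${month}/${year}`;
}

// Example value standing in for a parsed date cell (assumed, not read from a file).
const exampleCell = new Date(Date.UTC(2021, 7, 14));
console.log(formatDayMonthYear(exampleCell)); // 14/8/2021
```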

How to split a big ODS file without causing memory leaks?

I'm working with a MySQL database and have two types of files to import:
The first is a CSV file, which I can load with
LOAD DATA INFILE 'path-to-csv_file'
The second type is ODS (OpenDocument Spreadsheet), which MySQL doesn't support in LOAD DATA INFILE.
My solution was to convert ODS to CSV using the xlsx package, which has an XLSX.readFile command, and then write the result with csv-writer. But with large ODS files my program crashed because it used too much memory. I searched for solutions and found streams, but the xlsx package doesn't have read streams. After that I tried fs, because it has an fs.createReadStream command, but that module doesn't understand ODS files. As an example, compare what fs.readFile and xlsx.readFile return.
fs.readFile:
PK♥♦m�IQ�l9�.mimetypeapplication/vnd.oasis.opendocument.spreadsheetPK♥♦m�IQM◄ˋ%�%↑Thumbnails/thumbnail.png�PNG
→
IHDR�♥A�-=♥PLTE►►☼§¶►∟↓*.!/<22/8768:G6AN>AM>BP>MaC:;A?GOE?EFJGJRJQ[TJEQOQ\QJYWYKVeX\dX]p\bkXetaNJgTEe[Wp^Wa_aja\ue\hfgektjqztkeqnpyqlwwvco�jw�j}�v{�q⌂�~�⌂{��t��t��u��z��y��|��{��{��}���o]�od�vj�|v�⌂n�⌂r��{��n��x��~��~������
XLSX.readFile:
J323: { t: 's', v: '79770000', w: '79770000' },
K323: { t: 's', v: '20200115', w: '20200115' },
Working with the XLSX module is easy because I can pick out only the data I want from the ODS file. With the following JavaScript code, I extract three columns and put them in an array:
const xlsx = require('xlsx');
let posts = [];
let post = {};
for (let i = 0; i < 1; i++) {
  let filePath = `C:\\Users\\me\\Downloads\\file_users.ODS`;
  let workbook = xlsx.readFile(filePath);
  let worksheet = workbook.Sheets[workbook.SheetNames[0]];
  for (let cell in worksheet) {
    const cellAsString = cell.toString();
    // The first character of the cell address is the column letter
    if (cellAsString[0] === 'A') {
      post['ID'] = worksheet[cell].v;
    } else if (cellAsString[0] === 'C') {
      post['USER NAME'] = worksheet[cell].v;
    }
    if (cellAsString[0] === 'J') {
      post['USER EMAIL'] = worksheet[cell].v;
      // Keep the row only once all three fields are present
      if (Object.keys(post).length === 3) posts.push(post);
      post = {};
    }
  }
}
...returns:
{
ID: '1',
'USER NAME': 'John Paul',
'USER EMAIL': 'Paul.John12#hotmail.com'
},
{
ID: '2',
'USER NAME': 'Julia',
'USER EMAIL': 'lejulie31312#outlook.com'
},
{
ID: '3',
'USER NAME': 'Greg Norton',
'USER EMAIL': 'thenorton31031#hotmail.com'
},
... 44660 more items
So my problem is with large ODS files. The result above comes from running this script on a 78 MB file, which uses 1,600 MB of RAM. When I try it with 900 MB files, memory reaches the limit (4,000 MB+) and I get the error 'ERR_STRING_TOO_LONG'.
I tried to use the readline package to parse the data, but it needs a stream.
If I have to slice the ODS files into small pieces, how can I read the file to do that without crashing VS Code?

Convert JSON to CSV using js

I'm trying to convert flattened JSON data to CSV with JS.
JS code:
import { saveAs } from "file-saver";
let data = [{}, {}]; // data exported from Firebase as an array filled with objects
let dataJson = JSON.stringify(data);
let fileToSave = new Blob([dataJson], {
  type: "csv",
  name: 'data.csv'
});
saveAs(fileToSave, 'data.csv');
My example JSON:
[
{"kaab":{"ka11":6,"ka12":6,"ka10":6},"ae":{"a6":2,"a5":2,"a4":6},"kg3":"fdsf","kg2":4,"solz":"2","kg1":5,"ges":1,"kaak":{"ka4":5,"ka1":4,"ka5":3,"ka6":5,"ka3":5,"ka2":4},"eink":"","kawe":{"ka9":4,"ka7":5,"ka8":5},"soz2":"","alt":3,"zul":{"infl":1,"spi":1,"int":1,"les":1,"mer":1,"aut":1,"inf2":1},"kg4":2,"am":{"a1":5,"a3":2,"a2":2}},
{"kaab":{"ka11":6,"ka12":6,"ka10":6},"ae":{"a6":2,"a5":2,"a4":6},"kg3":"fdsf","kg2":4,"solz":"2","kg1":5,"ges":1,"kaak":{"ka4":5,"ka1":4,"ka5":3,"ka6":5,"ka3":5,"ka2":4},"eink":"","kawe":{"ka9":4,"ka7":5,"ka8":5},"soz2":"","alt":3,"zul":{"infl":1,"spi":1,"int":1,"les":1,"mer":1,"aut":1,"inf2":1},"kg4":2,"am":{"a1":5,"a3":2,"a2":2}}
]
I have used file-saver for this, but without much success.
I get a CSV file that opens in Excel, but something is still wrong:
excel image
What I need looks like this instead:
expected result
If any of you can help me, I'd really appreciate it. It doesn't need to be done with file-saver.
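One possible approach, sketched under assumptions: flatten each nested object into dot-separated column names, then join the values into CSV text by hand. The names flatten, escapeCsv and toCsv are hypothetical helpers, not part of file-saver, and the column naming scheme (for example kaab.ka11) is a choice made for this sketch, since the expected layout is only shown in an image.

```javascript
// Hypothetical helper: flattens nested objects into dot-separated keys,
// e.g. { kaab: { ka11: 6 } } becomes { "kaab.ka11": 6 }.
function flatten(obj, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      flatten(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Quote a field only when it contains a comma, a double quote, or a line break.
function escapeCsv(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text: one header line, then one line per record.
function toCsv(records) {
  const rows = records.map((record) => flatten(record));
  // Collect every column name in first-seen order.
  const headers = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  const lines = [headers.map(escapeCsv).join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCsv(row[h])).join(","));
  }
  return lines.join("\r\n");
}

const csvText = toCsv([{ kaab: { ka11: 6, ka12: 6 }, kg3: "fdsf", eink: "" }]);
console.log(csvText);
```

The resulting string, rather than the JSON.stringify output, would then be what goes into the Blob, with a MIME type such as "text/csv;charset=utf-8" instead of "csv".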

Problem generating buffer for nodejs csv file creation

I am able to generate a CSV file with the data below, using the Node.js library "csv-writer", which generates the file quite well. My problem is that I need to get a buffer back instead of the file itself, because I need to upload the file to a remote server via SFTP.
How do I go about modifying this piece of code to return a buffer? Thanks.
...
const csvWriter = createCsvWriter({
  path: 'AuthHistoryReport.csv',
  header: [
    {id: 'NAME', title: 'msg_datetime_date'},
    {id: 'AGE', title: 'msg_datetime'}
  ]
});
var rows = [
  { NAME: "Paul", AGE: 21 },
  { NAME: "Charles", AGE: 28 },
  { NAME: "Teresa", AGE: 27 },
];
csvWriter
  .writeRecords(rows)
  .then(() => {
    console.log('The CSV file was written successfully');
  });
...
Read your own file back with fs.readFile('AuthHistoryReport.csv', (err, data) => ... );. If you don't specify an encoding, the returned data is a buffer, not a string.
fs.readFile('AuthHistoryReport.csv', 'utf8', (err, data) => ... ); Returns a string
fs.readFile('AuthHistoryReport.csv', (err, data) => ... ); Returns a buffer
See the Node.js File System documentation for fs.readFile.
You need to load the created file into a buffer using the native fs module:
const fs = require('fs');
const buffer = fs.readFileSync('AuthHistoryReport.csv');
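If the temporary file is not needed at all, another option is to build the CSV text in memory and wrap it in a Buffer. The sketch below deliberately avoids csv-writer and uses only Node.js built-ins; rowsToCsvBuffer is a hypothetical helper, and it assumes values contain no commas, quotes, or line breaks, since it does no escaping. The csv-writer README also describes a stringifier API that may suit this better, so check it for your version.

```javascript
// Same header and rows as in the question.
const header = [
  { id: "NAME", title: "msg_datetime_date" },
  { id: "AGE", title: "msg_datetime" },
];
const rows = [
  { NAME: "Paul", AGE: 21 },
  { NAME: "Charles", AGE: 28 },
  { NAME: "Teresa", AGE: 27 },
];

// Hypothetical helper: builds CSV text in memory and returns it as a Buffer.
// Assumption: no value needs quoting or escaping.
function rowsToCsvBuffer(columns, records) {
  const lines = [columns.map((col) => col.title).join(",")];
  for (const record of records) {
    lines.push(columns.map((col) => String(record[col.id])).join(","));
  }
  return Buffer.from(lines.join("\n") + "\n", "utf8");
}

const csvBuffer = rowsToCsvBuffer(header, rows);
console.log(csvBuffer.length, "bytes ready to upload");
```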

CSV as a single line string with newlines

I have an existing service that creates files from strings and zips them up. It currently takes JSON as a string, like this:
data: ${ JSON.stringify(data) }
which works. However, I want to express that data as a CSV in the same way.
Assume the data is:
json
[
  {
    "test1": "1",
    "test2": "2",
    "test3": "3"
  },
  {
    "test1": "1",
    "test2": "2",
    "test3": "3"
  }
]
I have found many good Node JSON-to-CSV libraries and ended up using json2csv. I can successfully create a CSV file from one line of data, but not from two, as follows.
// works (header row)
const test = "\"test1\",\"test2\",\"test3\"";
const csvStr = `data:text/csv;charset=UTF-8, ${ test }`;
// fails (header row + 1 data row)
const test = "\"test1\",\"test2\",\"test3\"\n\"test1\",\"test2\",\"test3\"";
const csvStr = `data:text/csv;charset=UTF-8, ${ test }`;
Based on these tests, I believe the issue is with how the newline / carriage return is being handled, if this is even possible. Does anyone know what I might be doing wrong? Is it possible to express a CSV file on a single line with line breaks?
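One thing worth checking, assuming the service interprets the string as a data URI: raw line breaks are not valid inside a URL, so the CSV payload generally needs to be percent-encoded. The sketch below uses the standard encodeURIComponent function; buildCsvDataUri is a hypothetical helper name, and whether the service decodes the payload this way is an assumption.

```javascript
// Header row plus one data row, separated by a real newline character.
const csvRows = '"test1","test2","test3"\n"test1","test2","test3"';

// Hypothetical helper: percent-encodes the payload so the newline becomes %0A.
// Note there is no space after the comma, since it would become part of the data.
function buildCsvDataUri(csvText) {
  return `data:text/csv;charset=UTF-8,${encodeURIComponent(csvText)}`;
}

const csvDataUri = buildCsvDataUri(csvRows);
console.log(csvDataUri);
```

Decoding the part after the first comma with decodeURIComponent should give back the original two-line CSV text.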
