How to parse Xlsx file and search columns by NLP JSON dictionary - javascript

I'm working on a kind of magical xlsx parsing where I need to parse a complex xlsx file (it contains some presentational and useless information as a header, plus a data table).
What I'm trying to do is parse the file (that part works) and then figure out where the table starts. For that, I want to have a kind of dictionary as JSON.
Since the xlsx files are dynamic and are all supposed to be about product-ordering topics, I want to find the table's header row by "blindly" looking for column names according to the JSON dictionary below.
{
  "productName": [
    "description",
    "product",
    "name",
    "product name"
  ],
  "unit": [
    "unit",
    "bundle",
    "package",
    "packaging",
    "pack"
  ]
}
Here, for example, we suppose one column header is going to be "bundle" and another "description". Thanks to my JSON, I'm supposed to be able to find the keys "productName" and "unit". It's a kind of synonym search, actually.
My questions are:
1/ Is there a clean data structure or an efficient way of doing this search? The JSON could be huge and could cause memory issues, and I want the lookup to be as fast as possible.
2/ I know this is not precise, but it could work. Do you have any suggestions on how to do it, or how would you do it?
3/ The operation is going to be costly because I have to parse the xlsx file and, at the same time, parse the JSON to do my dictionary lookup. Do you have any advice?
I suppose I should do both in a streaming way, so that I only parse one row of the xlsx at a time and run my dictionary search on that row before jumping to the next one?
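To make that concrete, here is a minimal sketch of the lookup structure I have in mind (the './dictionary.json' path and the matchHeaderRow helper are just placeholders, not existing code):

// Invert the dictionary once: synonym -> canonical key, so each header cell
// becomes a single Map lookup instead of a scan over every synonym list.
const dictionary = require('./dictionary.json'); // the JSON shown above

const synonymToKey = new Map();
for (const [key, synonyms] of Object.entries(dictionary)) {
  for (const synonym of synonyms) {
    synonymToKey.set(synonym.toLowerCase().trim(), key);
  }
}

// Given one row's cell values, return the canonical keys it matches;
// a row matching "enough" keys would be treated as the table's header row.
const matchHeaderRow = (cells) =>
  cells
    .filter((cell) => cell != null)
    .map((cell) => synonymToKey.get(String(cell).toLowerCase().trim()))
    .filter(Boolean);

Building the Map once keeps the per-row work proportional to the number of cells, no matter how large the dictionary gets.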
Here is the start of the parsing code.
const ExcelJS = require('exceljs');
const { FILE_EXTESTIONS } = require('../constants');
const { allowExtention } = require('../helpers');

const executeXlsxParsing = () => {
  const filePath = process.argv.slice(2)[1];
  const workbook = new ExcelJS.Workbook();
  const isAllowed = allowExtention(FILE_EXTESTIONS.xlsx, filePath);

  if (!filePath || !isAllowed) return;

  return workbook.xlsx.readFile(filePath).then(() => {
    const inboundWorksheet = workbook.getWorksheet(1);
    inboundWorksheet.eachRow({ includeEmpty: true }, (row, rowNumber) => {
      console.log('Row ' + rowNumber + ' = ' + JSON.stringify(row.values));
    });
  });
};

// Export the function itself rather than the result of calling it.
module.exports.executeXlsxParsing = executeXlsxParsing;
Thank you all :)

Related

Parse the data read from csv file with Nodejs ExcelJS package

With Node.js I need to fill an Excel file with data fetched from a CSV file.
I am using the ExcelJS npm package.
I successfully read the data from the CSV file and write it to console.log(), but the problem is that it comes out in a very strange format.
Code:
var Excel = require("exceljs");

exports.generateExcel = async () => {
  let workbookNew = new Excel.Workbook();
  let data = await workbookNew.csv.readFile("./utilities/file.csv");
  const worksheet = workbookNew.worksheets[0];
  worksheet.eachRow(function (row: any, rowNumber: number) {
    console.log(JSON.stringify(row.values));
  });
};
Data looks like this:
[null,"Users;1;"]
[null,"name1;2;"]
[null,"name2;3;"]
[null,"name3;4;"]
[null,"Classes;5;"]
[null,"class1;6;"]
[null,"class2;7;"]
[null,"class3;8;"]
[null,"Teachers;9;"]
[null,"teacher1;10;"]
[null,"teacher2;11;"]
[null,"Grades;12;"]
[null,"grade1;13;"]
[null,"grade2;14;"]
[null,"grade3;15;"]
So the Excel file which I need to fill with this data is very complex: in specific cells I need to insert the users, in another sheet I need some images with grades, etc.
The main question for me is:
How can I parse the data displayed in my console.log() and store it in separate variables, e.g. Users in one variable, Grades in another, and Teachers in another?
Example for users:
users = {
  title: "Users",
  names: ["name1", "name2", "name3"],
};
It doesn't need to be exactly like this example, just something that can be reused when I read different CSV files with the same structure, so I can easily access the specific data and put it in a specific cell of the Excel file.
Thank you very much.
I prepared an example of how you could parse your file. As proposed in one answer above, we use fast-csv. The parsing is quite simple: you split each row by the separator and then take line[0], which is the first element.
const fs = require('fs');
const csv = require('@fast-csv/parse');

fs.createReadStream('Test_12345.csv')
  .pipe(csv.parse())
  .on('error', error => console.error(error))
  .on('data', function (row) {
    var line = String(row);
    line = line.split(';');
    console.log(`${line[0]}`);
  })
  .on('end', rowCount => console.log(`Parsed ${rowCount} rows`));
If we put for input like this:
Value1;1101;
Value2;2202;
Value3;3303;
Value4;4404;
your output is in this case like this:
Value1
Value2
Value3
Value4
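Building on this, a rough sketch of how the split lines could be grouped into the structure asked for in the question (the section names and the handleLine helper are assumptions, not tested against the real file):

// Group rows under the most recent section header ("Users", "Classes", ...).
// Call handleLine(line) from inside the 'data' handler above, after the split.
const sections = {};
let current = null;

function handleLine(line) {
  const value = line[0];
  if (['Users', 'Classes', 'Teachers', 'Grades'].includes(value)) {
    current = { title: value, names: [] };
    sections[value.toLowerCase()] = current;
  } else if (current) {
    current.names.push(value);
  }
}

// After 'end', sections.users would look like:
// { title: "Users", names: ["name1", "name2", "name3"] }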

Postman Pre-Request Script to Read Variables from CSV file and Append to JSON Body

I have a JSON body with template to pass to an API endpoint in Postman/Newman as follows:
{
  "total": [
    {
      "var1a": "ABCDE",
      "var1b": {
        "var2a": "http://url/site/{{yyyy}}/{{mmdd}}/{{yyyymmdd}}.zip",
        "var2b": {
          "var3a": "{{yyyymmdd}}",
          "var3b": "GKG"
        },
        "var2c": "ABCDE",
        "var2d": "{{p}}"
      }
    },
    {
      "var1a": "ABCDE",
      "var1b": {
        "var2a": "http://url/site/{{yyyy}}/{{mmdd}}/{{yyyymmdd}}.zip",
        "var2b": {
          "var3a": "{{yyyymmdd}}",
          "var3b": "GKG"
        },
        "var2c": "ABCDE",
        "var2d": "{{p}}"
      }
    }
  ]
}
Everything enclosed in double braces is a variable that needs to be read from an external CSV file, which looks like:
p,yyyymmdd,yyyy,mmdd
1,19991231,1999,1231
2,20001230,2000,1230
Note that the CSV isn't necessarily 2 entries long - I'd ideally like it to take in a dynamic number of entries that append after the last.
Ideally, I'd get rid of the last two columns in the CSV file (yyyy and mmdd) and have the code just do a slice as needed.
How do I even begin coding the pre-request script?
Even if reading from an external CSV (or doing this in a pre-request script at all) is not possible, what is a quick one-liner in JavaScript or Python that can give me the output I need, so that I can just manually place it in the request body?
I even tried using the Postman runner to do a "fake" run so that I could quickly extract the different JSON bodies, but even with this there is no clean way to export them...
Reading the data from a CSV data file is not a good fit here, because each record would be used for its own request, whereas you want the whole data set in ONE request.
I think writing the code in the pre-request script will work.
In the Body tab, use the {{total}} variable where the array should go; in the Pre-request Script tab:
let data_set = [
  [1, 1999, 1231],
  [2, 2000, 1230]
];

let total = [];

for (let i = 0; i < data_set.length; i++) {
  let p = data_set[i][0];
  let yyyy = data_set[i][1];
  let mmdd = data_set[i][2];

  const item = {
    var1a: "ABCDE",
    var1b: {
      var2a: `http://url/site/${yyyy}/${mmdd}/${yyyy}${mmdd}.zip`,
      var2b: {
        var3a: `${yyyy}${mmdd}`,
        var3b: "GKG"
      },
      var2c: "ABCDE",
      var2d: `${p}`
    }
  };

  total.push(item);
}

pm.environment.set("total", JSON.stringify(total));
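If you want to drop the yyyy and mmdd columns from the CSV, as mentioned in the question, a possible variant (purely illustrative, assuming each entry is just [p, yyyymmdd]) derives them by slicing:

// Same idea as the loop above, but yyyy and mmdd come from slicing yyyymmdd.
let data_set = [
  [1, "19991231"],
  [2, "20001230"]
];

let total = data_set.map(([p, yyyymmdd]) => {
  const yyyy = yyyymmdd.slice(0, 4);
  const mmdd = yyyymmdd.slice(4);
  return {
    var1a: "ABCDE",
    var1b: {
      var2a: `http://url/site/${yyyy}/${mmdd}/${yyyymmdd}.zip`,
      var2b: { var3a: yyyymmdd, var3b: "GKG" },
      var2c: "ABCDE",
      var2d: `${p}`
    }
  };
});

pm.environment.set("total", JSON.stringify(total));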

How to get specific key and value from a long json object while iterating it in node.js

I am trying to parse a CSV file in Node.js. I am able to parse the file and print its contents, which come out as a JSON object. Now my target is to iterate over that JSON object, take specific keys and values from each block, and use them in a query that performs some DB operations. But the problem is that, while iterating the JSON, only the first key and value of the first block is printed. Let me post the code I have:
fs.createReadStream(path)
  .pipe(csv.parse({ headers: true, ignoreEmpty: true }))
  .on("error", (error) => {
    throw error.message;
  })
  .on("data", function (data) {
    if (data && data !== {}) {
      Object.keys(data).forEach(function (k) {
        // Runs once per matching key, so the query fires for 'name' and again for 'Office'
        if (k === 'name' || k === 'Office') {
          let selectQury = `select name,Office from myTable where name = ${data['name']} and Office = ${data['Office']}`;
          db.query(selectQury, (err, res) => {
            if (err) {
              console.log('error', null);
            }
          });
        }
      });
    }
  });
This is what my JSON, parsed from the CSV, looks like:
{
  id: 1,
  name: "AS",
  Office: "NJ",
  ........
  ACTIVE: 1
},
{
  id: 2,
  name: "AKJS",
  Office: "NK",
  ........
  ACTIVE: 2
}
So now what I want is for the parameters in the select query to be passed like
let selectQury = `select name,Office from myTable where name = "AS" and Office = "NJ"`;
in the first iteration,
let selectQury = `select name,Office from myTable where name = "AKJS" and Office = "NK"`;
in the second iteration, and so on as the CSV grows.
I am not able to do it, please help. Thanks in advance. I am new to Node.js and to tricky JavaScript operations.
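Something like this is the direction I am aiming for, one query per row instead of per key (a rough, untested sketch; db and csv are the same objects as above):

fs.createReadStream(path)
  .pipe(csv.parse({ headers: true, ignoreEmpty: true }))
  .on("error", (error) => {
    throw error.message;
  })
  .on("data", (data) => {
    // Build one query per parsed row, using the two fields directly.
    if (data && data.name && data.Office) {
      const selectQury = `select name,Office from myTable where name = "${data.name}" and Office = "${data.Office}"`;
      db.query(selectQury, (err, res) => {
        if (err) {
          console.log('error', err);
          return;
        }
        console.log(res);
      });
    }
  });

I guess a parameterized query would be safer than string interpolation here, but the shape above is what I am after.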

(NodeJS, large JSON file) | stream-json npm package error: Javascript heap out of memory

stream-json noob here. I'm wondering why the below code is running out of memory.
context
I have a large JSON file. The JSON file's structure is something like this:
[
  { "id": 1, "avg_rating": 2 },
  { "id": 1, "name": "Apple" }
]
I want to modify it to be:
[
  { "id": 1, "avg_rating": 2, "name": "Apple" }
]
In other words, I want to run a reducer function on each element of the values array of the JSON (Object.values(data)) to check if the same id is entered into different keys in the json, and if so "merge" that into one key.
The code I wrote to do this is:
var chunk = [{ 'id': 1, 'name': 'a' }, { 'id': 1, 'avg_rating': 2 }];

const result = Object.values(chunk.reduce((j, c) => {
  if (j[c.id]) {
    j[c.id]['avg_rating'] = c.avg_rating;
  } else {
    j[c.id] = { ...c };
  }
  return j;
}, {}));

console.log(result);
The thing is, you cannot try to run this on a large JSON file without running out of memory. So, I need to use JSON streaming.
the streaming code
Looking at the stream-json documentation, I think I need to use a Parser to take in text as a Readable stream of objects and output a stream of data items as Writeable buffer/text "things".
The code I can write to do that is:
const {chain} = require('stream-chain');
const {parser} = require('stream-json/Parser');
const {streamValues} = require('stream-json/streamers/StreamValues');
const fs = require('fs');

const pipeline = chain([
  fs.createReadStream('test.json'),
  parser(),
  streamValues(),
  data => {
    var chunk = data.value;
    const result = Object.values(chunk.reduce((j, c) => {
      if (j[c.id]) {
        j[c.id]['avg_rating'] = c.avg_rating;
      } else {
        j[c.id] = { ...c };
      }
      return j;
    }, {}));
    //console.log(result)
    return JSON.stringify(result);
  },
  fs.createWriteStream(fpath)
]);
To create a write stream (since I do want an output JSON file), I just added fs.createWriteStream(filepath) to the pipeline above, but it looks like, while this works on a small sample, it doesn't work for a large JSON file: I get the error "heap out of memory".
attempts to fix
I think the main issue with the code is that the "chunk" philosophy is wrong. If this works by "streaming" the JSON line by line (?), then "chunk" might be accumulating all the data the program has seen so far, whereas I really only want it to run a reducer function in batches. But then I am kind of back at square one... how would I merge the key-value pairs of a JSON if the id is the same?
If the custom data code isn't the problem, then I get the feeling I need to use a Stringer, since I want to edit a stream with custom code and save it back to a file.
However, I can't work out how Stringer reads data, as the code below throws an error where data is undefined:
const pipeline = chain([
  fs.createReadStream('test.json'),
  parser(),
  data => {
    var chunk = data.value;
    const result = Object.values(chunk.reduce((j, c) => {
      if (j[c.id]) {
        j[c.id]['avg_rating'] = c.avg_rating;
      } else {
        j[c.id] = { ...c };
      }
      return j;
    }, {}));
    console.log(result);
    return JSON.stringify(result);
  },
  stringer(),
  zlib.Gzip(),
  fs.createWriteStream('edited.json.gz')
]);
I would greatly appreciate any advice on this situation or any help diagnosing the problems in my approach.
Thank you!!
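For reference, a rough sketch of one possible streaming shape using stream-json's StreamArray (it assumes the file's top level is an array and that the merged map of unique ids still fits in memory, which streaming alone cannot avoid):

const {chain} = require('stream-chain');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');
const fs = require('fs');

// Stream the top-level array one element at a time and merge records that
// share an id; only the merged map is kept in memory, not the raw JSON text.
const byId = new Map();

const pipeline = chain([
  fs.createReadStream('test.json'),
  parser(),
  streamArray() // emits { key, value } for each array element
]);

pipeline.on('data', ({ value }) => {
  byId.set(value.id, { ...(byId.get(value.id) || {}), ...value });
});

pipeline.on('end', () => {
  fs.writeFileSync('merged.json', JSON.stringify([...byId.values()]));
});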
While this is certainly an interesting question - I have the liberty to just restructure how the data's scraped, and as such can bypass having to do this at all.
Thanks all!

Convert Facebook json file sequences like \u00f0\u009f\u0098\u008a to emoji characters

I have downloaded my Facebook data as json files. The json files for my posts contain emojis, which appear something like this in the json file: \u00f0\u009f\u0098\u008a. I want to parse this json file and extract the posts with the correct emojis.
I can't find a way to load this json file into a json object (using JavaScript) then read (and output) the post with the correct emojis.
(Eventually I will upload these posts to WordPress using its REST API, which I've worked out how to do.)
My program is written in JavaScript and run using nodejs from the command line. I've parsed the file using:
const fs = require('fs')
let filetext = fs.readFileSync(filename, 'utf8')
let jsonObj = JSON.parse(filetext)
However, when I output the data (using something like jsonObj.status_updates.data[0].post), I get strange characters for the emoji, like Happy birthday ├░┬ƒ┬ÿ┬è instead of Happy birthday 😊. This is not a Windows 10 console display issue because I've piped the output to a file also.
I've used the answer Decode or unescape \u00f0\u009f\u0091\u008d to 👍 to change the \uXXXX sequences in the json file to actual emojis before parsing the file. However, then JSON.parse does not work. It gives this message:
SyntaxError: Unexpected token o in JSON at position 1
at JSON.parse (<anonymous>)
So I'm in a bind: if I convert the \uXXXX sequences before trying to parse the json file, the JavaScript json parser has an error. If I don't convert the \uXXXX sequences then the parsed file in the form of a json object does not provide the correct emojis!
How can I correctly extract data, including emojis, from the json file?
I believe you should be able to do all this in Node.js; here's an example.
I've tested this using Visual Studio Code.
You can try it here: https://repl.it/repls/BrownAromaticGnudebugger
Note: I've updated processMessage as per @JakubASuplicki's very helpful comments to only look at string properties.
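The core idea: each \u00XX escape is one byte of the emoji's UTF-8 encoding stored as a separate character, so rebuilding those characters as raw bytes and decoding them as UTF-8 recovers the emoji. A tiny standalone illustration (separate from the full example below):

// "\u00f0\u009f\u0098\u008a" holds the four UTF-8 bytes of 😊 (F0 9F 98 8A),
// each stored as its own character; treat them as latin1 bytes to rebuild it.
const mangled = "\u00f0\u009f\u0098\u008a";
console.log(Buffer.from(mangled, "latin1").toString("utf8")); // 😊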
index.js
const fs = require('fs');

let filename = "test.json";
let filetext = fs.readFileSync(filename, "utf8");
let jsonObj = JSON.parse(filetext);
console.log(jsonObj);

// Each character of the mangled string is really one byte of the original
// UTF-8 sequence; rebuild the bytes and decode them as UTF-8.
function decodeFBString(str) {
  let arr = [];
  for (var i = 0; i < str.length; i++) {
    arr.push(str.charCodeAt(i));
  }
  return Buffer.from(arr).toString("utf8");
}

function processMessages(messageArray) {
  return messageArray.map(processMessage);
}

// Decode every string property of a message, leave everything else as-is.
function processMessage(message) {
  return Object.keys(message).reduce((obj, key) => {
    obj[key] = (typeof message[key] === "string") ? decodeFBString(message[key]) : message[key];
    return obj;
  }, {});
}

let messages = processMessages(jsonObj.messages);

console.log("Input: ", jsonObj.messages);
console.log("Output: ", messages);
test.json
{
  "participants": [
    { "name": "Philip Marlowe" },
    { "name": "Terry Lennox" }
  ],
  "messages": [
    {
      "sender_name": "Philip Marlowe",
      "timestamp_ms": 1546857175,
      "content": "Meet later? \u00F0\u009F\u0098\u008A",
      "type": "Generic"
    },
    {
      "sender_name": "Terry Lennox",
      "timestamp_ms": 1546857177,
      "content": "Excellent!! \u00f0\u009f\u0092\u009a",
      "type": "Generic"
    }
  ]
}
