I am parsing a CSV file with the following contents using csv-parse:
userID,sysID
20,50
30,71
However, on the returned objects it isn't possible to access the property created from the first column, userID.
Here is my code:
const FS = require('fs');
const csvParse = require('csv-parse'); // v4-style import, matching the csvParse(options) call below

async function main() {
  let systemIDs = await getSystemIds('./systems.csv');
  console.log(`Scanning data for ${systemIDs.length} systems..`);
  console.log(systemIDs[0]);
  console.log(systemIDs[0].userID); // This prints undefined
  console.log(systemIDs[0].sysID);  // This prints the correct value
}
async function getSystemIds(path) {
  let ids = [];
  await new Promise((resolve, reject) => {
    const csvParser = csvParse({ columns: true, skip_empty_lines: true });
    FS.createReadStream(path)
      .pipe(csvParser)
      .on('readable', () => {
        let record;
        while (record = csvParser.read()) {
          ids.push(record);
        }
      })
      .on('finish', () => {
        resolve();
      });
  });
  return ids;
}
Output:
Scanning data for 2 systems..
{ 'userID': '20', sysID: '50' }
undefined // <== The Problem
50
I notice the first column key userID has single quotes around it in the console output, whereas sysID doesn't, but I don't know what is causing them.
Figured it out myself in the end...
I needed the bom option. The documentation states it should be set to true for UTF-8 files, but it defaults to false.
Excel by default writes a BOM (byte order mark) as the first character of a CSV file. The parser picks it up as part of the header, so it becomes part of the first column's key name.
With the bom option set to true, the parser handles CSV files generated by Excel and other programs.
const csvParser = csvParse({
  columns: true,
  skip_empty_lines: true,
  bom: true
});
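To confirm the diagnosis (a quick sketch of my own, not from the original code): without bom: true, the BOM survives as an invisible \ufeff prefix on the first key. That also explains the single quotes in the console output, since Node quotes object keys that aren't plain identifiers.

// The first key looks like "userID" but is actually "\ufeffuserID".
const firstKey = Object.keys(systemIDs[0])[0];
console.log(firstKey.length);                      // 7, not 6: one invisible extra character
console.log(firstKey.codePointAt(0).toString(16)); // "feff", the byte order mark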
Related
I can't believe I'm asking such an obvious question, but I still get the wrong output in the console log.
I want to crawl a site, but the console only shows "[]", even though I've checked at least 10 times for typos. Anyway, here's the JavaScript code.
This is the kangnam.js file:
const axios = require('axios');
const cheerio = require('cheerio');
const log = console.log;
const getHTML = async () => {
  try {
    return await axios.get('https://web.kangnam.ac.kr', {
      headers: {
        Accept: 'text/html'
      }
    });
  } catch (error) {
    console.log(error);
  }
};
getHTML()
  .then(html => {
    let ulList = [];
    const $ = cheerio.load(html.data);
    const $allNotices = $("ul.tab_listl div.list_txt");

    $allNotices.each(function(idx, element) {
      ulList[idx] = {
        title: $(this).find("list_txt title").text(),
        url: $(this).find("list_txt a").attr('href')
      };
    });

    const data = ulList.filter(n => n.title);
    return data;
  })
  .then(res => log(res));
I've checked and revised it at least 10 times, yet JS still prints this result:
root#goorm:/workspace/web_platform_test/myapp/kangnamCrawling(master)# node kangnam.js
[]
Mate, I think the issue is you're parsing it incorrectly.
$allNotices.each(function(idx, element) {
  ulList[idx] = {
    title: $(this).find("list_txt title").text(),
    url: $(this).find("list_txt a").attr('href')
  };
});
The data you're trying to extract lives inside the DOM node that $(this) wraps, but your selectors never match it. In a cheerio selector (as in CSS), "list_txt title" matches elements with the tag names <list_txt> and <title>; selecting by class needs a leading dot, like .list_txt. On top of that, $(this) already is the div.list_txt element, so searching for .list_txt descendants inside it would still find nothing. An empty selection makes .text() return an empty string and .attr('href') return undefined, so every entry is dropped by ulList.filter(n => n.title) and you end up with [].
One way around this is to read the node's properties directly. You also don't need $(this) at all, since the element parameter already hands you the exact same node; using element directly also saves creating a wrapper object on every iteration.
$allNotices.each(function(idx, element) {
  ulList[idx] = {
    title: element.children[0].attribs.title,
    url: element.children[0].attribs.href
  };
});
This should now populate your data array correctly. Always analyze the structure of the data you're parsing; that's the only way to parse it correctly.
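For completeness, a cheerio-idiomatic alternative would be to fix the selectors instead of walking the raw node tree. This is a sketch, untested against the live site, and it assumes the anchor inside each div.list_txt carries its title in a title attribute:

$allNotices.each(function(idx, element) {
  const $link = $(element).find('a'); // the anchor inside each div.list_txt
  ulList[idx] = {
    title: $link.attr('title'), // assumption: the <a> has a title attribute
    url: $link.attr('href')
  };
});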
Anyways, I hope I solved your problem!
I am trying to parse a PDF with more than 300 pages using the pdf-parse npm package, but my application crashes while parsing it.
My question is: is there a way I can parse one page at a time?
Below is the code I have tried.
function render_page(pageData) {
  // check the docs at https://mozilla.github.io/pdf.js/
  let render_options = {
    // replaces all occurrences of whitespace with standard spaces (0x20); the default is `false`
    normalizeWhitespace: false,
    // do not attempt to combine same-line TextItems; the default is `false`
    disableCombineTextItems: false
  };
  return pageData.getTextContent(render_options)
    .then(function (textContent) {
      return textContent.items.map(function (s) {
        return s.str;
      }).join(''); // the page's text
    });
}
const fs = require('fs');
const pdf = require('pdf-parse');

let dataBuffer = fs.readFileSync('male.pdf');

const options = {
  // internal page parser callback; set this if you need a format other than raw text
  pagerender: render_page,
  // max page number to parse
  max: 4,
  // check https://mozilla.github.io/pdf.js/getting_started/
  version: 'v1.10.100'
};

pdf(dataBuffer, options).then(function (data) {
  res.send(data); // `res` comes from the surrounding Express route handler
});
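One way to process a single page at a time (a sketch under assumptions, not a tested solution: it drops down to pdfjs-dist, the library pdf-parse wraps, and the import path varies by pdfjs-dist version) is to load each page, extract its text, handle it, and release it before moving on, so the whole document's text never sits in memory at once:

const fs = require('fs');
const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js'); // path differs in older versions

async function parsePageByPage(path) {
  const data = new Uint8Array(fs.readFileSync(path));
  const doc = await pdfjsLib.getDocument({ data }).promise;
  for (let i = 1; i <= doc.numPages; i++) {
    const page = await doc.getPage(i);
    const textContent = await page.getTextContent();
    const text = textContent.items.map(item => item.str).join('');
    // handle each page here (write it to a file, a database, etc.)
    console.log(`Page ${i}: ${text.length} characters`);
    page.cleanup(); // release the page's resources before loading the next one
  }
}

parsePageByPage('male.pdf').catch(console.error);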
I'm currently searching for a way to create a JSON file (versions.json) with keys and values taken from an array within JavaScript. To create the JSON file, I have this array:
["V1_config-interfaces.json","V2_config-interfaces.json","V3_config-interfaces.json","versions.json"]
I now need to loop over this array and skip the entry that is versions.json itself, because that is the file being created.
The JSON file must look like this:
{
"V1": "V1_config-interfaces.json",
"V2": "V2_config-interfaces.json",
"V3": "V3_config-interfaces.json"
}
So the key is always the version number before the underscore. Here is what I've tried:
const fs = require('fs');
const interfaces = fs.readdirSync('./src/interfaces/');

fs.writeFile('./src/interfaces/versions.json', JSON.stringify(interfaces), (err) => {
  if (err) throw err;
  console.log('versions.json successfully created');
});
But this just writes the array out unchanged. How can I get the format above?
Use Array#reduce and a regex. This strips the version prefix from each file name and uses it as a key on the result object, ignoring anything without a version number. It also checks that the version is immediately followed by an _ character.
const data = ["V1_config-interfaces.json","V2_config-interfaces.json","V3_config-interfaces.json","versions.json", "V4shouldntwork.json", "shouldntwork_V5_.json", "V123_shouldwork.json"];
const res = data.reduce((a,v)=>{
const version = v.match(/^V[1-9]+(?=_)/);
if(version === null) return a;
a[version.shift()] = v;
return a;
}, {});
console.log(res);
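Putting it together with the fs code from the question (a sketch, assuming the same directory layout):

const fs = require('fs');

const interfaces = fs.readdirSync('./src/interfaces/');
const versions = interfaces.reduce((a, v) => {
  const version = v.match(/^V[1-9]\d*(?=_)/);
  if (version === null) return a; // skips versions.json and anything unversioned
  a[version[0]] = v;
  return a;
}, {});

fs.writeFile('./src/interfaces/versions.json', JSON.stringify(versions, null, 2), (err) => {
  if (err) throw err;
  console.log('versions.json successfully created');
});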
I'm working on a CSV uploader that uses PapaParse as its CSV parser. I would like the first column of my CSV, rather than the first row, to act as the header for the parsed data. To get the expected outcome so far, I've had to manually transpose the CSV in an editor before uploading.
The reason is that my users find the CSV much easier to edit when the headers are in the first column instead of the first row. Is there a way to do this in PapaParse (or even in JavaScript outside of PapaParse)?
if (file != null) {
  Papa.parse(file, {
    header: true,
    complete: function (results, file) {
      console.log("Parsing complete: ", results, file);
    }
  });
}
I would suggest parsing the CSV with PapaParse and then transposing the result in JS. Note that you'd parse with header: false here, so that result.data comes back as an array of arrays rather than an array of objects.
Using the transpose method from this answer: https://stackoverflow.com/a/4492703/1625793
It would then look like this: transpose(result.data)
-- Update --
const transposed = transpose(result.data);
const headers = transposed.shift();
const res = transposed.map(row =>
  row.reduce((acc, col, ind) => { acc[headers[ind]] = col; return acc; }, {})
);
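Putting it together (a sketch; the transpose helper is a common one-liner adapted from the idea in the linked answer, and it assumes a rectangular CSV):

// rows become columns and vice versa
const transpose = matrix => matrix[0].map((_, i) => matrix.map(row => row[i]));

Papa.parse(file, {
  header: false, // headers live in the first column, so let rows come back as plain arrays
  complete: function (results) {
    const transposed = transpose(results.data);
    const headers = transposed.shift(); // first transposed row = original first column
    const res = transposed.map(row =>
      row.reduce((acc, col, ind) => { acc[headers[ind]] = col; return acc; }, {})
    );
    console.log(res); // array of objects keyed by the first-column headers
  }
});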
I'm trying to create a little Node.js app that reads Markdown text and converts it to HTML. To achieve this, I wanted to create a transform stream that gets one character at a time, figures out its meaning, and returns the HTML version of it.
So, for example, if I pass the transform stream a *, it should return <b> (or </b>).
However, the Transform stream does not actually transform the data: whatever I push to it comes back exactly as I pushed it, and when I put a console.log statement into the transform method of the stream, I see no output, as if the method isn't even called.
Here's the file with the Stream:
module.exports = function returnStream() {
  const Transform = require('stream').Transform;

  let trckr = {
    bold: false
  };

  const compiler = new Transform({
    transform(chunk, encoding, done) {
      const input = chunk.toString();
      console.log(input); // No output
      let output;

      switch (input) {
        case '*':
          if (trckr.bold === true) {
            output = '</b>';
            trckr.bold = false;
          } else {
            output = '<b>';
            trckr.bold = true;
          }
          break;
        default:
          output = input;
      }

      done(null, output);
    }
  });

  return compiler;
};
Example file that uses the Stream:
const transformS = require('./index.js')();

transformS.on('data', data => {
  console.log(data.toString());
});

transformS.push('*');
Thanks!
done(null, output) and transformS.push() do exactly the same thing: they push data to the readable (output) side of the Transform stream. Instead of calling transformS.push(), you need to write to the writable (input) side of the stream with transformS.write('*').
I should also point out that you should not make assumptions about the contents of chunk in your transform function: it could be a single character or a whole run of characters (in which case input would never equal '*').
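With that change, the example file becomes (a minimal sketch of the corrected usage):

const transformS = require('./index.js')();

transformS.on('data', data => {
  console.log(data.toString()); // logs "<b>" for the first "*"
});

transformS.write('*'); // feed the writable (input) side
transformS.end();      // flush and finish the stream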