Accessing parsed CSV data (csv-parser, Node.js) - javascript

I'm using the csv-parser npm package and doing a sample CSV parse. My only confusion is accessing the parsed array after running these functions. I understand I'm pushing the data in .on('data'), then doing a console.log(results); statement in .on('end') to show what's being stored. Why do I get undefined when I try to access results after running those functions? Doesn't results get the information stored?
const csv = require('csv-parser');
const fs = require('fs');

const results = [];

fs.createReadStream('demo.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
  });

I came here looking for a solution to the same issue.
Since this is an async operation, what works is to call the function that acts on your parsed data once the end handler fires. Something like this should work in this situation:
const csv = require('csv-parser');
const fs = require('fs');

const results = [];

fs.createReadStream('demo.csv')
  .pipe(csv())
  .on('data', (data) => results.push(data))
  .on('end', () => {
    console.log(results);
    csvData(results);
  });

const csvData = (csvInfo) => {
  console.log(csvInfo);
  console.log(csvInfo.length);
};

I can get results in .on('end', () => { console.log(results); }), but if I put a console.log() after the createReadStream, results is undefined. Does that make sense? – Videoaddict101
Your stream acts asynchronously: your data and end handlers will be called later, while the rest of your JavaScript continues to execute. So accessing your array right after the fs.createReadStream instruction will give you an empty array.
Understanding async is very important in JavaScript, even more so in Node.js.
Please have a look at the different ways of handling async code, such as Promises and async/await.

You should use neat-csv, the endorsed wrapper for csv-parser, which gives you a promise interface.
That said, you can also create a promise yourself and resolve it in the on("end", callback):
import fs from "fs";
import csv from "csv-parser";

function getCsv(filename) {
  return new Promise((resolve, reject) => {
    const data = [];
    fs.createReadStream(filename)
      .pipe(csv())
      .on("error", (error) => reject(error))
      .on("data", (row) => data.push(row))
      .on("end", () => resolve(data));
  });
}
console.log(await getCsv("../assets/logfile0.csv"));
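For reference, a minimal neat-csv sketch (assuming a recent, ESM-only release of the package, which accepts a string or Buffer of CSV data):

import { promises as fs } from "fs";
import neatCsv from "neat-csv";

// read the whole file, then hand the buffer to neat-csv, which resolves to an array of row objects
const rows = await neatCsv(await fs.readFile("demo.csv"));
console.log(rows);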

Related

nodejs download image from url synchronously

I am trying to build a small Node app that calls an API which returns an array of URLs pointing to image blob PNG files.
I am then trying to loop over the array and download the files using a utility function. I need this to work synchronously. Once the downloads are complete I then want to fire an additional function.
I started off using some asynchronous code which I took from here: https://sabe.io/blog/node-download-image
The async code in my utils file looked like this:
import { promises as fs } from "fs";
import fetch from "node-fetch";

const downloadFile = async (url, path, cb) => {
  const response = await fetch(url);
  const blob = await response.blob();
  const arrayBuffer = await blob.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);
  fs.writeFile(path, buffer);
  cb();
}

export { downloadFile };
I have tried to convert it to be purely synchronous using this code:
import fs from "fs";
import fetch from "node-fetch";

const downloadFile = (url, path, cb) => {
  const response = fetch(url);
  const blob = response.blob();
  const arrayBuffer = await blob.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);
  fs.writeFileSync(path, buffer);
  cb();
}

export { downloadFile };
Then in my index.js file I am using it like so:
import { downloadFile } from './utils/downloadFiles.js';

let imagesArray = [];
let newImageNames = [];

imagesArray.forEach((item, index) => {
  const fileName = `${prompt}__${index}_${uuid.v4()}.png`;
  const filePath = path.join('src', 'images');
  newImageNames.push(fileName);
  downloadFile(item, filePath, fileDownloadCallback);
});

processDataCallback(); // This is the function which is being fired before the previous downloadFile functions have finished processing.

const fileDownloadCallback = () => {
  console.log(`File download callback`);
}
My images array is being populated and looks like this as an example:
data: [
  {
    url: 'https://someurl.com/HrwNAzC8YW/A%3D'
  },
  {
    url: 'https://someurl.com/rGL7UeTeWTfhAuLWPg%3D'
  },
  {
    url: 'https://someurl.com/xSKR36gCdOI3/tofbQrR8YTlN6W89DI%3D'
  },
  {
    url: 'https://someurl.com/2y9cgRWkH9Ff0%3D'
  }
]
When I try the synchronous method I get this error: TypeError: response.blob is not a function. The function does work when used asynchronously, but then my next function fires before the image downloads have finished.
I have tried several iterations, first using createWriteStream and createWriteStreamSync (which I believe are deprecated), then switching to writeFile. I also tried using a synchronous writeFileSync inside the async function, but still no dice. The other issue is that fetch works asynchronously, so I still don't know how to wire this up to work synchronously. I was also wondering if I could chain a then onto the end of my downloadFile util function.
All of my code is on GitHub, so I can share a URL if required. Or please ask for more explanation if needed.
Is there something equivalent to JSFiddle for Node? If so, I am more than happy to try and make a demo.
Any help greatly appreciated.
We can leave the original async downloadFile util alone (though there's a little room for improvement there).
In the index file...
import { downloadFile } from './utils/downloadFiles.js';

let imagesArray = [];
let newImageNames = [];

// I'm a little confused about how we get anything out of iterating an empty array,
// but presuming it gets filled with URLs somehow...
const promises = imagesArray.map((item, index) => {
  const fileName = `${prompt}__${index}_${uuid.v4()}.png`;
  const filePath = path.join('src', 'images');
  newImageNames.push(fileName);
  // we can use the callback for progress, but we care about finishing all
  return downloadFile(item, filePath, () => {
    console.log('just wrote', filePath);
  });
});

Promise.all(promises).then(() => {
  console.log('all done');
});
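For what it's worth, the "little room for improvement" in downloadFile presumably means awaiting the write, so that the promise it returns only resolves once the file actually exists on disk. A sketch of that assumption:

import { promises as fs } from "fs";
import fetch from "node-fetch";

const downloadFile = async (url, path, cb) => {
  const response = await fetch(url);
  const blob = await response.blob();
  const arrayBuffer = await blob.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);
  await fs.writeFile(path, buffer); // awaiting here means callers can rely on the returned promise
  if (cb) cb();
};

export { downloadFile };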
Array.prototype.forEach doesn't work with async code.
Convert your loop into a for...of (or plain for) loop and it will work.
Also, you don't use callbacks with async/await code.
Your code would look like this:
let index = 0;
for (const item of imagesArray) {
  const fileName = `${prompt}__${index}_${uuid.v4()}.png`;
  const filePath = path.join('src', 'images');
  newImageNames.push(fileName);
  await downloadFile(item, filePath);
  fileDownloadCallback();
  index++;
}
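Note that await is only valid inside an async function (or at the top level of an ES module on recent Node versions), so in practice the loop would be wrapped in something like this (a sketch, reusing the same imagesArray and helpers from the question):

const run = async () => {
  let index = 0;
  for (const item of imagesArray) {
    const fileName = `${prompt}__${index}_${uuid.v4()}.png`;
    const filePath = path.join('src', 'images');
    newImageNames.push(fileName);
    await downloadFile(item, filePath); // each download finishes before the next starts
    index++;
  }
  processDataCallback(); // safe to call here: every download has completed
};

run().catch(console.error);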

Saving data from a CSV in a global variable (fast-csv/papaparse)

I am new to JS and trying to parse a CSV in the backend using Node.js.
I have an array of states in which I want to store the data of one column of the CSV. This is a very simple piece of code that I wrote using fast-csv to do so. But whenever I run it, I get an empty array, []. I tried doing the same using papaparse and got the same result.
const fs = require('fs')
const csv = require('fast-csv')

const file = fs.createReadStream('main.csv');
var states = []

file
  .pipe(csv.parse({ headers: false }))
  .on('data', row => states.push(row[2]))

console.log(states)
But whenever I console.log it in the .on('end') block, the values are logged.
const fs = require('fs')
const csv = require('fast-csv')

const file = fs.createReadStream('main.csv');
var states = []

file
  .pipe(csv.parse({ headers: false }))
  .on('data', row => states.push(row[2]))
  .on('end', () => console.log(states)) // This works

console.log(states) // This doesn't
I think this is due to the parser working asynchronously. I have tried resolving promises and using async/await, but I can't use the parsed content in the global scope.
Would love some help on this one.
You are right, the parser works asynchronously. It starts parsing and invokes your callbacks on events ('data' and 'end' in your case), but the code after the parser is executed immediately, as soon as the parser has started. So all your actions with the parsed data should be done in the 'end' event callback.
const fs = require('fs');
const csv = require('fast-csv');

// function to start the parser
const startParsing = (res) => {
  // you may use const since states is never reassigned
  const states = [];

  fs.createReadStream('main.csv')
    .pipe(csv.parse({ headers: false }))
    .on('data', row => states.push(row[2]))
    // execute the next function once parsing has finished
    .on('end', () => outputData(states, res));
};

const outputData = (states, res) => {
  // your next actions here
  console.log(states);
  // for example, complete the server response if one is accessible
  res.send(states);
};

// then call startParsing wherever you need it, for example:
server.get('/parser', (req, res) => {
  startParsing(res);
});
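If you would rather consume the parsed rows with async/await than with callbacks, one option (a sketch reusing the same fast-csv API as above; getStates is just an illustrative name) is to wrap the stream in a Promise:

const fs = require('fs');
const csv = require('fast-csv');

function getStates(path) {
  return new Promise((resolve, reject) => {
    const states = [];
    fs.createReadStream(path)
      .pipe(csv.parse({ headers: false }))
      .on('error', reject)
      .on('data', row => states.push(row[2]))
      .on('end', () => resolve(states));
  });
}

// the parsed data is only usable after the promise settles
getStates('main.csv').then(states => console.log(states));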

Refactoring Complicated Nested Node.js Function

I have the following snippet of code below. It currently works, but I'm hoping to optimize/refactor it a bit.
Basically, it fetches JSON data, extracts the urls for a number of PDFs from the response, and then downloads those PDFs into a folder.
I'm hoping to refactor this code in order to process the PDFs once they are all downloaded. Currently, I'm not sure how to do that. There are a lot of nested asynchronous functions going on.
How might I refactor this to allow me to tack on another .then call before my error handler, so that I can then process the PDFs that are downloaded?
const axios = require("axios");
const moment = require("moment");
const fs = require("fs");
const download = require("download");
const mkdirp = require("mkdirp"); // Creates nested directories...
const getDirName = require("path").dirname; // Gets a path's directory name...

const today = moment().format("YYYY-MM-DD");

function writeFile(path, contents, cb) {
  mkdirp(getDirName(path), function (err) {
    if (err) return cb(err);
    fs.writeFile(path, contents, cb);
  });
}

axios.get(`http://federalregister.gov/api/v1/public-inspection-documents.json?conditions%5Bavailable_on%5D=${today}`)
  .then((res) => {
    res.data.results.forEach((item) => {
      download(item.pdf_url).then((data) => {
        writeFile(`${__dirname}/${today}/${item.pdf_file_name}`, data, (err) => {
          if (err) {
            console.log(err);
          } else {
            console.log("FILE WRITTEN: ", item.pdf_file_name);
          }
        });
      });
    });
  })
  .catch((err) => {
    console.log("COULD NOT DOWNLOAD FILES: \n", err);
  });
Thanks for any help you all can provide.
P.S. When I simply tack on the .then call right now, it fires immediately. Does this mean that my forEach loop is non-blocking? I thought forEach loops were blocking.
The current forEach will run synchronously, and will not wait for the asynchronous operations to complete. You should use .map instead of forEach so you can map each item to its Promise from download. Then, you can use Promise.all on the resulting array, which will resolve once all downloads are complete:
axios.get(`http://federalregister.gov/api/v1/public-inspection-documents.json?conditions%5Bavailable_on%5D=${today}`)
  .then(processResults)
  .catch((err) => {
    console.log("COULD NOT DOWNLOAD FILES: \n", err);
  });

function processResults(res) {
  const downloadPromises = res.data.results.map((item) => (
    download(item.pdf_url).then(data => new Promise((resolve, reject) => {
      writeFile(`${__dirname}/${today}/${item.pdf_file_name}`, data, (err) => {
        if (err) reject(err);
        else resolve(console.log("FILE WRITTEN: ", item.pdf_file_name));
      });
    }))
  ));

  return Promise.all(downloadPromises)
    .then(() => {
      console.log('all done');
    });
}
If you wanted to essentially block the function on each iteration, you would want to use an async function in combination with await instead.
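For example, a sequential version might look roughly like this (a sketch against the same download and writeFile helpers used above):

async function processResultsSequentially(res) {
  for (const item of res.data.results) {
    const data = await download(item.pdf_url); // wait for this PDF to download
    await new Promise((resolve, reject) => {   // wait for it to be written to disk
      writeFile(`${__dirname}/${today}/${item.pdf_file_name}`, data, (err) => {
        if (err) reject(err);
        else resolve();
      });
    });
    console.log("FILE WRITTEN: ", item.pdf_file_name);
  }
  console.log('all done');
}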

How to use Promise.all in the following readFile code?

In the following code, I'm reading some files and getting their filename and text. After that, I'm storing data in an option variable to generate an epub file:
const Epub = require("epub-gen")
const folder = './files/'
const fs = require('fs')

let content = []

fs.readdir(folder, (err, files) => {
  files.forEach(filename => {
    const title = filename.split('.').slice(0, -1).join('.')
    const data = fs.readFileSync(`${folder}${filename}`).toString('utf-8')
    content.push({ title, data })
  })
})

const option = {
  title: "Alice's Adventures in Wonderland", // *Required, title of the book.
  content
}

new Epub(option, "./text.epub")
The problem is, new Epub runs before the files are read, before content is ready. I think Promise.all is the right candidate here. I checked the Mozilla docs, but they show various existing promises as examples, and I have none. So I'm not very sure how to use Promise.all here.
Any advice?
Your problem is with readdir, which is asynchronous, so new Epub, as you already figured out, is called before its callback.
Switch to using readdirSync, or move const option ... new Epub ... inside the callback passed to readdir, after files.forEach.
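For instance, keeping the readFileSync calls from the question, that rearrangement would look roughly like this (a sketch):

fs.readdir(folder, (err, files) => {
  files.forEach(filename => {
    const title = filename.split('.').slice(0, -1).join('.')
    const data = fs.readFileSync(`${folder}${filename}`).toString('utf-8')
    content.push({ title, data })
  })

  // content is now fully populated, so it is safe to build the book here
  const option = {
    title: "Alice's Adventures in Wonderland",
    content
  }
  new Epub(option, "./text.epub")
})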
At the moment you can do everything synchronously, since you use readFileSync.
So you can place the Epub creation right after the forEach loop.
If you want to go async, my first question would be:
Does your Node.js version support util.promisify (Node 8.x or higher, iirc)?
If so, it can be used to turn callback-style functions like readFile into promise-returning ones. If not, you can use the same logic, but with nested callbacks, as the other answers show.
const FS = require( 'fs' );
const { promisify } = require( 'util' );

const readFile = promisify( FS.readFile );
const readFolder = promisify( FS.readdir );

readFolder( folder )
  // build the file paths. Maybe prefixing with `${folder}` suffices here.
  .then( files => files.map( filename => `${folder}${filename}` ))
  // map the paths with readFile so we get an array of promises.
  .then( file_paths => file_paths.map( path => readFile( path )))
  // wait for all the promises using Promise.all().
  .then( file_promises => Promise.all( file_promises ))
  .then( data => {
    // do something with the data array that is returned, like extracting the titles,
    // then create the Epub objects by mapping the data values with their titles
  })
  // error handling.
  .catch( err => console.error( err ));
Add promises to an array. Each promise should resolve with the value you were pushing into content.
When all the promises resolve, the returned value will be the array previously known as content.
Also, you can, and should, use async fs calls throughout. So readFileSync can be replaced with readFile (async). I did not make that change below, though, so you can clearly see what was required to answer your original question.
Not sure if I got the nesting right in the snippet.
const Epub = require("epub-gen")
const folder = './files/'
const fs = require('fs')

let promises = []

fs.readdir(folder, (err, files) => {
  files.forEach(filename => {
    promises.push(new Promise((resolve, reject) => {
      const title = filename.split('.').slice(0, -1).join('.')
      const data = fs.readFileSync(`${folder}${filename}`).toString('utf-8')
      resolve({
        title,
        data
      })
    }))
  })

  Promise.all(promises).then((content) => {
    const option = {
      title: "Alice's Adventures in Wonderland", // *Required, title of the book.
      content
    }
    new Epub(option, "./text.epub")
  })
})
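For completeness, the fully async variant hinted at above (readFileSync replaced with an async call) could be sketched with fs.promises, assuming Node 10+:

const { promises: fsp } = require('fs')

async function buildEpub() {
  const files = await fsp.readdir(folder)
  // read every file in parallel; each entry resolves to { title, data }
  const content = await Promise.all(files.map(async filename => {
    const title = filename.split('.').slice(0, -1).join('.')
    const data = await fsp.readFile(`${folder}${filename}`, 'utf-8')
    return { title, data }
  }))
  const option = {
    title: "Alice's Adventures in Wonderland",
    content
  }
  return new Epub(option, "./text.epub")
}

buildEpub().catch(err => console.error(err))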

async read and output file using node.js

I want to collect all of the file names in a folder and output them to a JSON file. I have two problems: first, I don't know how to do the callback, so I'll skip that one. But I tried the setTimeout example and I still don't see any .json file. I wonder what's wrong.
const imagesFolder = './assets/images';
const fs = require('fs');
const jsonfile = require('jsonfile');

let json = [];

fs.readdir(imagesFolder, (err, files) => {
  files.forEach(file => {
    json.push(file.split('.')[0])
  });
})

setTimeout(function () {
  var obj = { "foo": "bar" }
  jsonfile.writeFile(imagesFolder, obj);
}, 1000)
You can do this synchronously inside the readdir callback. Also, I believe you need to pass a file name (not a folder) to jsonfile.writeFile:
fs.readdir(imagesFolder, (err, files) => {
  files.forEach(file => {
    json.push(file.split('.')[0]);
  });
  jsonfile.writeFile(`${imagesFolder}/files.json`, json);
});
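If you also want to know whether the write succeeded, jsonfile.writeFile accepts a callback; a small sketch (the files.json name is just an example):

fs.readdir(imagesFolder, (err, files) => {
  if (err) return console.error(err);
  const json = files.map(file => file.split('.')[0]);
  jsonfile.writeFile(`${imagesFolder}/files.json`, json, (err) => {
    if (err) console.error(err);
    else console.log('files.json written');
  });
});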
