JavaScript await not happening while streaming to a file

Requirement: I have a file containing around 100 links, and I need to hit each one and download the response sequentially. The response content type is an octet stream (a .ts file). I am using the "got" library to stream the downloads. However, I am not able to do this sequentially even though I am using await.
I do not want to hit all the links concurrently and then do Promise.allSettled(); I want this to happen sequentially.
However, in the output I can see the for loop advancing without waiting for the previous file to be fully downloaded and written. All the console.log statements for each file are printed first, and the writes to disk only complete later, shortly before the process exits. (Each file is around 2 MB.)
How do I execute this sequentially?
const allFileLines = allFileContents.split(/\r?\n/);

async function makeApiCall() {
  for await (const line of allFileLines) {
    if (line.startsWith("https")) {
      const fileName = new URL(line).pathname.split('/')[1];
      const downloadPathAndFile = `${download_path}${dirName}/${fileName}`;
      console.log(`Downloading file : ${fileName}`);
      await downloadFileWithGot(line, downloadPathAndFile, getOptions(line));
    }
  }
}

await makeApiCall();
// in a different utility file
export async function downloadFileWithGot(url, outputLocationPath, options) {
  console.log('entered inside fn');
  const pipeResp = await got.stream(url, undefined, options);
  console.log('got pipe resp');
  const writeResp = await pipeResp.pipe(createWriteStream(outputLocationPath));
  console.log('successfully wrote to file');
  return writeResp;
}
/*
File in which links are present (its a simple txt file) :
https://example.com/__segment:12345/stream_1/file_0.ts
https://example.com/__segment:12345/stream_1/file_1.ts
.
.
------------ output ------------
Downloading file : file_0.ts
entered inside fn
got pipe resp
successfully wrote to file
Downloading file : file_1.ts
entered inside fn
got pipe resp
successfully wrote to file
.
.
.
*/
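The root cause is that stream.pipe() returns the destination stream, not a promise, so `await pipeResp.pipe(...)` resolves almost immediately while the actual disk writes continue in the background. One way to make the download genuinely awaitable is pipeline from node:stream/promises, which settles only once the write stream has finished or either stream has errored. A minimal sketch, not from the original thread; it assumes Node 15+ and passes options as got.stream's second argument:

import { createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import got from 'got';

export async function downloadFileWithGot(url, outputLocationPath, options) {
  // pipeline() resolves only after the write stream emits 'finish',
  // so the caller's `await` really does wait for the file to hit disk.
  await pipeline(
    got.stream(url, options),
    createWriteStream(outputLocationPath)
  );
}

With this version, the for loop in makeApiCall() will not move on to the next link until the previous file has been fully written.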

Related

NodeJs csv-parse await inside on('data')

I have code which reads a .csv file line by line and saves each parsed row to the database:
const csv = require('csv-parse');

const errors = [];
csv.parse(content, {})
  .on('data', async function (row: any) {
    const error = await tryToSaveToDatabase(row);
    if (error) {
      errors.push(error);
    }
  })
  .on('end', function () {
    // somehow process all errors
  });
Unfortunately, the .on('end', ...) block is called before all the awaited saves have completed.
I have read NodeJs Csv parser async operations; it seems we cannot use await inside the .on('data', ...) callback.
What is the correct way to do this if I want to read a .csv line by line (files might be very large, so it must be done in a streaming manner) and collect errors while saving to the database? (These errors are then displayed on the frontend.)
https://csv.js.org/parse/api/async_iterator/
This solution reads the .csv line by line.
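For illustration, a minimal sketch of the async-iterator approach from that link (tryToSaveToDatabase and content are from the question; parse comes from csv-parse):

const { parse } = require('csv-parse');

async function importCsv(content) {
  const errors = [];
  const parser = parse(content, {});
  // Each iteration waits for the previous row to finish saving, and the
  // parser is consumed as a stream, so very large inputs stay cheap.
  for await (const row of parser) {
    const error = await tryToSaveToDatabase(row);
    if (error) {
      errors.push(error);
    }
  }
  return errors; // available only after every row has been processed
}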

Reading a parquet file in nodejs

I am trying the following code (from the parquetjs-lite sample and Stack Overflow) to read a parquet file in nodejs:
const readParquetFile = async () => {
  let reader;
  try {
    // create new ParquetReader that reads from test.parquet
    reader = await parquet.ParquetReader.openFile('test.parquet');
  } catch (e) {
    console.log(e);
    throw e;
  }

  // create a new cursor
  let cursor = reader.getCursor();

  // read all records from the file and print them
  let record = null;
  while (record = await cursor.next()) {
    console.log(record);
  }
  await reader.close();
};
When I run this code nothing happens: nothing is written to the console. For testing purposes I used only a small CSV file, which I converted to parquet using Python.
Is it because I converted from CSV to parquet using Python? (I couldn't find any JS equivalent that works for the large files I ultimately have to handle.)
I want my application to be able to take in any parquet file and read it. Is there any limitation in parquetjs-lite in this regard?
There are NaN values in my CSV; could that be a problem?
Any pointers would be helpful. Thanks.
A possible failure case is that you are calling this function in a script without a webserver running.
In that case the async work is scheduled on the event loop; once the main synchronous stack is empty and nothing else keeps the event loop alive, the process exits, so the pending async code never runs or logs anything.
To avoid this, keep the process alive (for example with a webserver), or better, use synchronous calls.
// app.js (without webserver)
const readParquetFile = async () => {
  // awaiting a promise that never settles: nothing keeps the event
  // loop alive, so the process exits before "running" is ever logged
  await new Promise(() => {});
  console.log("running");
};
readParquetFile();
console.log("exit");

When you run the above code the output will be
exit

// syncApp.js
const readParquetFile = () => {
  console.log("running");
  // all functions should be sync
};
readParquetFile();
console.log("exit");

Here the console log will be
running
exit
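If the function should stay async, another option (not in the original answer) is simply to consume the promise it returns, so completion and errors are observable:

readParquetFile()
  .then(() => console.log('done reading'))
  .catch((err) => console.error('read failed:', err));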

How would you use readdir to create a list of files to require in a Node.js app

I have a directory named data that contains 10 files, and I'll be adding more files to it. I'd like to automatically bring these files in when the Node server boots up, using something like
let Test = require("./data/Test");
I wrote the following; when I loop over it, I see the filenames.
fs.readdir(__dirname + "/data", (err, files) => {
  files.forEach((file) => {
    console.log("file", file);
    let fileName = file.split(".")[0];
    global[fileName] = require("./data/" + fileName);
    console.log(global[fileName]);
  });
});
If I put a console.log on the page and wait 5 seconds, the data is indeed there. How can I get the contents of these files BEFORE the server finishes starting?
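One common approach, offered here as a sketch rather than the thread's accepted answer, is the synchronous fs.readdirSync, which blocks until the listing completes and therefore runs before any server-startup code that follows it:

const fs = require("fs");

// Runs to completion during startup, before the server begins listening.
fs.readdirSync(__dirname + "/data").forEach((file) => {
  const fileName = file.split(".")[0];
  // require() is itself synchronous, so each module is fully loaded here.
  global[fileName] = require("./data/" + fileName);
});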

What is the best way to keep a file open to read/write?

I have a local JSON file which I intend to read and write from a NodeJS Electron app. I am not sure, but I believe that instead of using readFile() and writeFile(), I should get a FileHandle to avoid repeatedly opening and closing the file.
So I've tried to grab a FileHandle from fs.promises.open(), but the problem seems to be that I am unable to get a FileHandle for an existing file without truncating it to 0 bytes.
const { resolve } = require('path');
const fsPromises = require('fs').promises;

function init() {
  // Save table name
  this.path = resolve(__dirname, '..', 'data', `test.json`);
  // Create/Open the json file
  fsPromises
    .open(this.path, 'wx+')
    .then(fileHandle => {
      // Grab the file handle if the file doesn't exist,
      // because of the flag 'wx+'
      this.fh = fileHandle;
    })
    .catch(err => {
      if (err.code === 'EEXIST') {
        // File exists
      }
    });
}
Am I doing something wrong? Are there better ways to do it?
Links:
https://nodejs.org/api/fs.html#fs_fspromises_open_path_flags_mode
https://nodejs.org/api/fs.html#fs_file_system_flags
Because JSON is a text format that has to be read or written all at once and can't be easily modified or added onto in place, you're going to have to read the whole file or write the whole file at once.
So your simplest option is to just use fs.promises.readFile() and fs.promises.writeFile() and let the library open the file, read or write it, and close it. Opening and closing a file on a modern OS takes advantage of disk caching, so reopening a file you used a moment ago is not a slow operation. Further, since nodejs performs these operations in secondary threads in libuv, they don't block the main thread of nodejs either, so this is generally not a performance issue for your server.
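For illustration, a minimal sketch of that whole-file approach (updateJson and the mutate callback are hypothetical names, not part of the original answer):

const fsPromises = require('fs').promises;

// Read the whole JSON file, apply a change, write the whole file back.
async function updateJson(path, mutate) {
  const data = JSON.parse(await fsPromises.readFile(path, 'utf8'));
  mutate(data); // caller modifies the parsed object in place
  await fsPromises.writeFile(path, JSON.stringify(data, null, 2));
}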
If you really wanted to open the file once and hold it open, you would open it for reading and writing using the r+ flag as in:
const fileHandle = await fsPromises.open(this.path, 'r+');
Reading the whole file would be simple as the new fileHandle object has a .readFile() method.
const text = await fileHandle.readFile({ encoding: 'utf8' });
For writing the whole file from an open filehandle, you would have to truncate the file, then write your bytes, then flush the write buffer to ensure the last bit of the data got to the disk and isn't sitting in a buffer.
await fileHandle.truncate(0); // clear previous contents
let {bytesWritten} = await fileHandle.write(mybuffer, 0, someLength, 0); // write new data
assert(bytesWritten === someLength);
await fileHandle.sync(); // flush buffering to disk
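Putting those pieces together, a sketch of the open-once pattern this answer describes (the demo function and its use of JSON are illustrative assumptions):

const fsPromises = require('fs').promises;

async function demo(path) {
  // Open once for reading and writing without truncating.
  const fileHandle = await fsPromises.open(path, 'r+');
  try {
    const data = JSON.parse(await fileHandle.readFile({ encoding: 'utf8' }));
    data.updatedAt = Date.now();
    await fileHandle.truncate(0);                    // clear previous contents
    await fileHandle.write(JSON.stringify(data), 0); // write from position 0
    await fileHandle.sync();                         // flush buffering to disk
  } finally {
    await fileHandle.close();
  }
}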

Where does readstream store the file in nodejs

I read that createReadStream doesn't put the whole file into memory; instead it works with chunks. However, I have a situation where I am simultaneously writing to and reading from a file. The write finishes first, and then I delete the file from disk. Somehow, the readstream was able to finish reading the whole file without any error.
Does anyone have an explanation for this? Am I wrong to think that streams don't load the whole file into memory?
Here's the code for writing to a file
const fs = require('fs');
const file = fs.createWriteStream('./bigFile4.txt');

function write(stream, data) {
  if (!stream.write(data))
    return new Promise(resolve => stream.once('drain', resolve));
  return true;
}

(async () => {
  for (let i = 0; i < 1e6; i++) {
    const res = write(file, 'a');
    if (res instanceof Promise)
      await res;
  }
  write(file, 'success');
})();
For reading I used this:
const file = fs.createReadStream('bigFile4.txt');
file.on('data', (chunk) => {
  console.log(chunk.toString());
});
file.on('end', () => {
  console.log('done');
});
At least on UNIX-type OSes, if you open a file and then remove it, the file data will still be available to read until you close the file.
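A small demonstration of that behavior on POSIX systems, written as a sketch with synchronous calls and a hypothetical /tmp path:

const fs = require('fs');

fs.writeFileSync('/tmp/demo.txt', 'hello');
const fd = fs.openSync('/tmp/demo.txt', 'r'); // keep a descriptor open
fs.unlinkSync('/tmp/demo.txt');               // directory entry is removed
const buf = Buffer.alloc(5);
fs.readSync(fd, buf, 0, 5, 0);                // the data is still readable
console.log(buf.toString());                  // "hello"
fs.closeSync(fd);                             // only now is the data reclaimed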
