Reading a parquet file in nodejs - javascript

I am trying the following code (based on the parquetjs-lite sample and Stack Overflow) to read a parquet file in Node.js:
const readParquetFile = async () => {
  try {
    // create new ParquetReader that reads from test.parquet
    let reader = await parquet.ParquetReader.openFile('test.parquet');
  }
  catch (e) {
    console.log(e);
    throw e;
  }
  // create a new cursor
  let cursor = reader.getCursor();
  // read all records from the file and print them
  let record = null;
  while (record = await cursor.next()) {
    console.log(record);
  }
  await reader.close();
};
When I run this code nothing happens. Nothing is written to the console. For testing purposes I only used a small CSV file, which I converted to parquet using Python.
Is it because I converted from CSV to parquet using Python? (I couldn't find a JS equivalent that handles the large files I ultimately have to work with.)
I want my application to be able to take in any parquet file and read it. Does parquetjs-lite have any limitation in this regard?
There are NaN values in my CSV; could that be a problem?
Any pointers would be helpful.
Thanks

A possible failure case:
You are calling this function in a script without a webserver running.
In that case the function runs asynchronously: its continuation goes onto the callback queue, and once the main call stack is empty the program exits, so even if there is code waiting in the callback queue it will never run or log anything.
To work around this, run it inside a webserver, or better, use synchronous calls.
//app.js (without webserver)
const readParquetFile = async () => {
  // suspend on a promise that never resolves, standing in for async work that never completes
  await new Promise(() => {});
  console.log("running")
}
readParquetFile()
console.log("exit")
When you run the above code, the output will be:
exit
//syncApp.js
const readParquetFile = () => {
  console.log("running")
  // all functions should be synchronous
}
readParquetFile()
console.log("exit")
Here the console log will be:
running
exit
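
For reference, here is a minimal sketch of how the reader code from the question could be wired up so that its logs actually appear, assuming the parquetjs-lite API used above (openFile, getCursor, next, close) and keeping the reader variable in scope for the whole function:

const parquet = require('parquetjs-lite');

const readParquetFile = async () => {
  // open the reader outside try/catch so it stays in scope below
  const reader = await parquet.ParquetReader.openFile('test.parquet');
  try {
    const cursor = reader.getCursor();
    let record = null;
    while ((record = await cursor.next())) {
      console.log(record);
    }
  } finally {
    await reader.close();
  }
};

// chain on the returned promise so errors surface instead of being swallowed
readParquetFile().catch((e) => console.error(e));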


Javascript await not happening while streaming to a file

Requirement: I am supposed to hit around 100 links given in a file and download them sequentially. The response content type is an octet-stream file (.ts file). I am using the "got" library to stream. However, I am not able to do the task sequentially even though I am using await.
I do not want to hit all the links asynchronously and then do Promise.allSettled().
I want it to run in a sequential manner.
However, in the output I see the for loop executing without waiting for the previous file to be fully downloaded and written.
First all the console.log statements for each file are printed, and then the writing to files happens up until the process exits.
(each file is around 2 MB)
How do I execute this sequentially?
const allFileLines = allFileContents.split(/\r?\n/);

async function makeApiCall() {
  for await (const line of allFileLines) {
    if (line.startsWith("https")) {
      const fileName = new URL(line).pathname.split('/')[1];
      const downloadPathAndFile = `${download_path}${dirName}/${fileName}`;
      console.log(`Downloading file : ${fileName}`);
      await downloadFileWithGot(line, downloadPathAndFile, getOptions(line));
    }
  }
}
await makeApiCall();

// in a different utility file
export async function downloadFileWithGot(url, outputLocationPath, options) {
  console.log('entered inside fn');
  const pipeResp = await got.stream(url, undefined, options);
  console.log('got pipe resp');
  const writeResp = await pipeResp.pipe(createWriteStream(outputLocationPath));
  console.log('successfully wrote to file');
  return writeResp;
}
/*
File in which links are present (its a simple txt file) :
https://example.com/__segment:12345/stream_1/file_0.ts
https://example.com/__segment:12345/stream_1/file_1.ts
.
.
------------ output ------------
Downloading file : file_0.ts
entered inside fn
got pipe resp
successfully wrote to file
Downloading file : file_1.ts
entered inside fn
got pipe resp
successfully wrote to file
.
.
.
*/
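
One detail worth noting about the helper above: stream.pipe() returns the destination stream, not a promise, so await pipeResp.pipe(...) resolves immediately rather than waiting for the write to finish. A minimal sketch of the same helper using pipeline from node:stream/promises (everything else assumed to be as in the question) would be:

import { createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import got from 'got';

export async function downloadFileWithGot(url, outputLocationPath, options) {
  // pipeline() resolves only after the response has been fully written to disk,
  // so the caller's await really does make the downloads sequential
  await pipeline(
    got.stream(url, options),
    createWriteStream(outputLocationPath)
  );
}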

What is the best way to keep a file open to read/write?

I have a local JSON file which I intend to read/write from a NodeJS Electron app. I am not sure, but I believe that instead of using readFile() and writeFile(), I should get a FileHandle to avoid repeated open and close actions.
So I've tried to grab a FileHandle from fs.promises.open(), but the problem seems to be that I am unable to get a FileHandle for an existing file without truncating it and clearing it to 0 bytes.
const { resolve } = require('path');
const fsPromises = require('fs').promises;

function init() {
  // Save table name
  this.path = resolve(__dirname, '..', 'data', `test.json`);
  // Create/Open the json file
  fsPromises
    .open(this.path, 'wx+')
    .then(fileHandle => {
      // Grab the file handle if the file doesn't exist,
      // because of the 'wx+' flag
      this.fh = fileHandle;
    })
    .catch(err => {
      if (err.code === 'EEXIST') {
        // File exists
      }
    });
}
Am I doing something wrong? Are there better ways to do it?
Links:
https://nodejs.org/api/fs.html#fs_fspromises_open_path_flags_mode
https://nodejs.org/api/fs.html#fs_file_system_flags
Because JSON is a text format that has to be read or written all at once and can't be easily modified or added onto in place, you're going to have to read the whole file or write the whole file at once.
So, your simplest option will be to just use fs.promises.readFile() and fs.promises.writeFile() and let the library open the file, read/write it, and close the file. Opening and closing a file in a modern OS takes advantage of disk caching, so if you're reopening a file you opened not long ago, it's not going to be a slow operation. Further, since nodejs performs these operations in secondary threads in libuv, it doesn't block the main thread of nodejs either, so it's generally not a performance issue for your server.
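As a rough sketch of that simpler approach (the loadTable/saveTable names and the pretty-printed JSON are illustrative choices, not part of the question):

const fsPromises = require('fs').promises;

// read the whole JSON file into an object
async function loadTable(path) {
  const text = await fsPromises.readFile(path, 'utf8');
  return JSON.parse(text);
}

// overwrite the whole JSON file with the new contents
async function saveTable(path, data) {
  await fsPromises.writeFile(path, JSON.stringify(data, null, 2));
}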
If you really wanted to open the file once and hold it open, you would open it for reading and writing using the r+ flag as in:
const fileHandle = await fsPromises.open(this.path, 'r+');
Reading the whole file would be simple as the new fileHandle object has a .readFile() method.
const text = await fileHandle.readFile({encoding: 'utf8'});
For writing the whole file from an open filehandle, you would have to truncate the file, then write your bytes, then flush the write buffer to ensure the last bit of the data got to the disk and isn't sitting in a buffer.
await fileHandle.truncate(0); // clear previous contents
let {bytesWritten} = await fileHandle.write(mybuffer, 0, someLength, 0); // write new data
assert(bytesWritten === someLength);
await fileHandle.sync(); // flush buffering to disk

Where does readstream store the file in nodejs

I read that createReadStream doesn't put the whole file into memory; instead it works with chunks. However I have a situation where I am simultaneously writing to and reading from a file: the write finishes first, then I delete the file from disk. Somehow, the read stream was able to finish reading the whole file without any error.
Does anyone have an explanation for this? Am I wrong to think that streams don't load the file into memory?
Here's the code for writing to a file
const fs = require('fs');
const file = fs.createWriteStream('./bigFile4.txt');

function write(stream, data) {
  if (!stream.write(data))
    return new Promise(resolve => stream.once('drain', resolve));
  return true;
}

(async () => {
  for (let i = 0; i < 1e6; i++) {
    const res = write(file, 'a');
    if (res instanceof Promise)
      await res;
  }
  write(file, 'success');
})();
For reading I used this:
const file = fs.createReadStream('bigFile4.txt');

file.on('data', (chunk) => {
  console.log(chunk.toString());
});
file.on('end', () => {
  console.log('done');
});
At least on UNIX-type OSes, if you open a file and then remove it, the file data will still be available to read until you close the file.
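A small sketch that illustrates the point on a POSIX system (the file name and contents are arbitrary):

const fs = require('fs');

fs.writeFileSync('demo.txt', 'hello');
const fd = fs.openSync('demo.txt', 'r'); // open a descriptor first
fs.unlinkSync('demo.txt');               // then remove the directory entry

// the data is still readable through the open descriptor
const buf = Buffer.alloc(5);
fs.readSync(fd, buf, 0, 5, 0);
console.log(buf.toString()); // "hello"
fs.closeSync(fd);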

Automating Priority-Web-SDK file upload

I would like to create a command-line (or other automated) method for uploading files to Priority using the Web-SDK. The best solution I have right now seems to be a simple webform activated by a Python script.
Are there tools/examples for using JavaScript and a file picker without opening the browser? Are there Priority-Web-SDK ports to other environments? C#, Python, etc.?
Any other suggestions also welcome.
UPDATE June 14, 2020:
I was able to complete the task for this client using a combination of JavaScript, Python and C#. A tangled mess indeed, but files were uploaded. I am now revisiting the task and looking for cleaner solutions.
I found a working and usable Node module to compact the program into an executable, which makes it a viable option for deployment.
So the question becomes more focused => creating the input for uploadDataUrl() or uploadFile() without a browser form.
You can run Node locally and use the Priority SDK,
*as long as you work in an environment that is capable of running JS.
You can send files through the function uploadFile.
The data inside the file object needs to be encoded as a base64 string.
This Node.js script will upload a file to Priority. Make sure that fetch-base64 is npm installed:
"use strict";
const priority = require('priority-web-sdk');
const fetch = require('fetch-base64');
const configuration = {...};
async function uploadFile(formName, zoomVal, filepath, filename) {
try {
await priority.login(configuration);
let form = await priority.formStartEx(formName, null, null, null, 1, {zoomValue: zoomVal});
await form.startSubForm("EXTFILES", null ,null);
let data = await fetch.local(filepath + '/' + filename);
let f = await form.uploadDataUrl(data[0], filename.match(/\..+$/i)[0], () => {});
await form.fieldUpdate("EXTFILENAME", f.file); // Path
await form.fieldUpdate("EXTFILEDES", filename); // Name
await form.saveRow(0);
} catch(err) {
console.log('Something bad happened:');
console.dir(err);
}
}
uploadFile('AINVOICES', 'T9679', 'C:/my/path', 'a.pdf');

How to handle errors while extracting a zip file in node.js

I am trying to upload a zip file, and then I have to extract it on the server side and also handle any error that occurs while extracting it. To extract it I am trying this:
var zip = new AdmZip(x);
zip.extractAllTo('target path');
extractAllTo does not take a callback function; if it did, I could handle the error easily. So let me know how to handle errors while extracting the zip file.
I am also creating a tmp folder: after the upload I keep the uploaded file in the tmp folder, then I store it in the original folder and save that path to the database (MongoDB). After the data is stored I get the result in a callback function, and within that callback I have tried to remove the tmp folder, but I could not remove it. If I try to remove it outside of that callback it works. What mistake did I make, and how do I resolve it? I have tried this:
db.save({'filepath': 'xxxxx'}, function(err, data) {
  if (data) {
    fs.rmdir('xxxx/xxxxx', function(err) {
      if (err) {
        console.log('err');
      } else {
        console.log('removed');
      }
    });
  }
});
I always see the err branch logged in the console.
After looking at the code of adm-zip, the only way is to wrap the extraction in a try/catch statement:
var zip = new AdmZip(x);
try {
  zip.extractAllTo('target path');
} catch (e) {
  console.log('Caught exception: ', e);
}
It looks like your library is synchronous, which is why it doesn't use callbacks. If you are uploading a zip file to a server, synchronous calls will halt your entire server for all clients and thus you should switch to an asynchronous library to do this work. FYI for the synchronous version, to handle errors, you would use a try/catch structure as the exceptions thrown are in a single execution stack.
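If you do switch to asynchronous extraction, one possible sketch (using the extract-zip package, which is my assumption here and not something the question mentions) looks like this:

// extract-zip is one asynchronous option among several
const extract = require('extract-zip');

async function extractUpload(zipPath, targetPath) {
  try {
    // extract-zip returns a promise and requires an absolute target directory
    await extract(zipPath, { dir: targetPath });
    console.log('extraction complete');
  } catch (err) {
    // extraction errors surface here instead of crashing the server
    console.error('extraction failed:', err);
  }
}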
