I am trying to create a service that gets a zip file, unpacks it, and uploads its contents to a Google Cloud Storage bucket.
The unzipping part seems to work well, but in my GCS bucket all the files seem to be empty.
I'm using the following code:
// Accept an uploaded zip, unpack it, and stream each contained file to GCS.
app.post('/fileupload', function(req, res) {
  var form = new formidable.IncomingForm();
  form.parse(req, function (err, fields, files) {
    const uuid = uuidv4();
    console.log(files.filetoupload.path); // temporary path to zip
    fs.createReadStream(files.filetoupload.path)
      .pipe(unzip.Parse())
      .on('entry', function (entry) {
        var fileName = entry.path;
        var type = entry.type; // 'Directory' or 'File'
        // Directories carry no content; drain them so the parser keeps going.
        if (type === 'Directory') {
          entry.autodrain();
          return;
        }
        const gcsname = uuid + '/' + fileName;
        const blob = bucket.file(gcsname);
        // GCS createWriteStream() takes an options object, not a path.
        const blobStream = blob.createWriteStream({ resumable: false });
        blobStream.on('error', (err) => {
          console.log(err);
        });
        blobStream.on('finish', () => {
          const publicUrl = format(`https://storage.googleapis.com/${bucket.name}/${blob.name}`);
          console.log(publicUrl); // file on GCS
        });
        // `entry` is itself a readable stream of the file's bytes;
        // `entry.buffer` is undefined, which is why the uploaded
        // objects were empty. Pipe the stream into the blob instead.
        entry.pipe(blobStream);
      });
  });
});
I'm quite new to Node.js so I'm probably overlooking something - I've spent some time on documentation but I don't quite know what to do.
Could someone advise on what might be the problem?
The fs.createWriteStream() method takes a file path as its argument, but the GCS createWriteStream() method takes an options object instead.
As per the example in this documentation the recommended way would be:
// GCS File#createWriteStream() expects an options object, not a path:
const stream = file.createWriteStream({
metadata: {
contentType: req.file.mimetype // forward the uploaded file's MIME type
},
resumable: false // skip the resumable-upload handshake for small files
});
instead of:
const blobStream = blob.createWriteStream(entry.path).
Check whether your buffer is undefined. If disk or memory storage is not configured, the buffer may remain undefined.
Related
I am having an issue where I cannot seem to find a solution.
I have written a Discord bot with Discord.JS that needs to send a list of file names from a directory as one message. So far I have tried using fs.readdir with path.join and fs.readFileSync(). Below is one example.
const server = message.guild.id;
const serverpath = `./sounds/${server}`;
const fs = require('fs');
const path = require('path');
const directoryPath = path.join('/home/pi/Pablo', serverpath);
fs.readdir(directoryPath, function(err, files) {
if (err) {
return console.log('Unable to scan directory: ' + err);
}
files.forEach(function(file) {
message.channel.send(file);
});
});
Whilst this does send a message of every file within the directory, it sends each file as a separate message. This causes it to take a while due to Discord API rate limits. I want them to all be within the same message, separated with a line break, with a max of 2000 characters (max limit for Discord messages).
Can someone assist with this?
Thanks in advance.
Jake
I recommend using fs.readdirSync(); it will return an array of the file names in the given directory. Use Array#filter() to filter the files down to the ones that are JavaScript files (extensions ending in ".js"). To remove ".js" from the file names, use Array#map() to replace each ".js" with "" (effectively removing it entirely), then use Array#join() to join them into a string and send.
const server = message.guild.id;
const serverpath = `./sounds/${server}`;
const { readdirSync } = require('fs');
const path = require('path');
const directoryPath = path.join('/home/pi/Pablo', serverpath);
const files = readdirSync(directoryPath)
.filter(fileName => fileName.endsWith('.js'))
.map(fileName => fileName.replace('.js', ''));
.join('\n');
message.channel.send(files);
Regarding handling the sending of a message greater than 2000 characters:
You can use the Util.splitMessage() method from Discord.JS and provide a maxLength option of 2000. As long as the number of chunks needed to send is not more than a few you should be fine from API ratelimits
const { Util } = require('discord.js');
// Defining "files"
// Split the text into chunks that fit Discord's 2000-character limit.
const textChunks = Util.splitMessage(files, {
  maxLength: 2000
});
// forEach ignores the promise returned by an async callback, so all
// sends fire at once. A for...of loop awaits each send in turn
// (this must run inside an async function).
for (const chunk of textChunks) {
  await message.channel.send(chunk);
}
Built an array of strings (names of files) then join with "\n".
let names = [];
fs.readdir(directoryPath, function(err, files) {
  if (err) {
    return console.log('Unable to scan directory: ' + err);
  }
  files.forEach(function(file) {
    // `<<` is a bitwise shift in JavaScript, not an append operator —
    // use Array#push() to collect the names.
    names.push(file);
  });
  // Send INSIDE the callback: fs.readdir is asynchronous, so sending
  // after it (as the original did) runs before `names` is populated.
  message.channel.send(names.join("\n"));
});
UPDATE
I have continued to work through this and have the following code:
async generatejobs() {
const fs = require("fs");
const path = require("path");
const directoryPath = path.join(__dirname, "../data/pjo/in");
fs.readdir(directoryPath, function (err, files) {
if (err) {
console.log("Error getting directory information." + err);
} else {
files.forEach(function (file) {
console.log(file);
fs.readFile(file, (err, data) => {
console.log(file); // this works, if I stop here
// if (err) throw err;
// let newJson = fs.readFileSync(data);
// console.log(newJson);
})
// let data = fs.readFileSync(file);
// let obj = JSON.parse(data);
// let isoUtc = new Date();
// let isoLocal = toISOLocal(isoUtc);
// obj.printingStart = isoLocal;
// obj.printingEnd = isoLocal;
// let updatedFile = JSON.stringify(obj);
// let write = fs.createWriteStream(
// path.join(__dirname, "../data/pjo/out", updatedFile)
// );
// read.pipe(write);
});
}
});
As soon as I try uncomment the line shown below, it fails.
let newJson = fs.readFileSync(data);
The error I am getting is this.
Uncaught ENOENT: no such file or directory, open 'C:\projects\codeceptJs\ipt\80-012345.json'
This is a true statement as the path should be as follows.
'C:\projects\codeceptJs\ipt\src\data\pjo\in\80-012345.json'
I do not understand why it is looking for the file here given that earlier in the code the path is set and seems to work correctly for finding the file via this.
const directoryPath = path.join(__dirname, "../data/pjo/in");
The remainder of the code which is currently commented out is where I am attempting to do the following.
Grab each file from source dir
put into json object
Update the json object to change two date entries
Save to a new json file / new location in my project
Original Post
I have a codeceptjs test project and would like to include a set of existing json files in my project (src/data/jsondata/in) and then update the date attribute within each and write them to an output location in my project (src/data/jsondata/out). I need to change the date and then get it back into a very specific string format, which I have done and then insert this back into the new json being created. I got this about 80% of the way there and then ran into issues when trying to get the files from one folder within my project to another.
I broke this up in to two parts.
function to take a date and convert it to the date string I need
function to grab the source json, update the date, and make a new json at a new folder location
Number 1 is working as it should. Number 2 is not.
If there is a better way to accomplish this, I am very much open to that.
Here is the code where I'm trying to update the json. The main issue here is I'm not understanding and / or handling correctly the join path stuff.
generatePressJobs() {
//requiring path and fs modules
const path = require('path');
const fs = require('fs');
//joining path of directory
const directoryPath = path.join(__dirname, '../', 'data/pjo/in/');
//passsing directoryPath and callback function
fs.readdir(directoryPath, function (err, files) {
//handling error
if (err) {
I.say('unable to scan directory: ' + err);
return console.log('Unable to scan directory: ' + err);
}
//listing all files using forEach
files.forEach(function (file) {
// Update each file with new print dates
let data = fs.readFileSync(file);
let obj = JSON.parse(data);
let isoUtc = new Date();
let isoLocal = toISOLocal(isoUtc);
obj.printingStart = isoLocal;
obj.printingEnd = isoLocal;
let updatedFile = JSON.stringify(obj);
fs.writeFile(`C:\\projects\\csPptr\\ipt\\src\\data\\pjo\\out\\${file}`, updatedFile, (err) => {
if (err) {
throw err;
}
});
});
});
},
Error received
Uncaught ENOENT: no such file or directory, open '80-003599.json'
at Object.openSync (fs.js:462:3)
at Object.readFileSync (fs.js:364:35)
at C:\projects\codeceptJs\ipt\src\pages\Base.js:86:23
at Array.forEach (<anonymous>)
at C:\projects\codeceptJs\ipt\src\pages\Base.js:84:13
at FSReqCallback.oncomplete (fs.js:156:23)
The function to generate the json is located in src/pages/basePage.js
The folder structure I've built for the json file is located in
src/data/jsondata/in --> for original source files
src/data/jsondata/out --> for resulting json after change
Any insight or suggestions would be hugely appreciated.
Thank you,
Bob
My approach / resolution
Passing along the final approach I took in the event this is helpful to anyone else. The data in the middle was specific to my requirements, but left in to show the process I took to do what I needed to do.
async generatePressjobs(count) {
const fs = require("fs");
const path = require("path");
const sourceDirectoryPath = path.join(__dirname, "../data/pjo/in/");
const destDirectoryPath = path.join(__dirname, "../data/pjo/out/");
for (i = 0; i < count; i++) {
// read file and make object
let content = JSON.parse(
fs.readFileSync(sourceDirectoryPath + "source.json")
);
// Get current date and convert to required format for json file
let isoUtc = new Date();
let isoLocal = await this.toISOLocal(isoUtc);
let fileNameTimeStamp = await this.getFileNameDate(isoUtc);
// Get current hour and minute for DPI time stamp
let dpiDate = new Date;
let hour = dpiDate.getHours();
let minute = dpiDate.getMinutes();
dpiStamp = hour + '' + minute;
// update attributes in the json obj
content.batchid = `80-0000${i}`;
content.id = `80-0000${i}-10035-tcard-${dpiStamp}-0101010000_.pdf`
content.name = `80-0000${i}-8.5x11CALJEF-CalBody-${dpiStamp}-01010100${i}_.pdf`;
content.printingStart = isoLocal;
content.printingEnd = isoLocal;
// write the file
fs.writeFileSync(
destDirectoryPath + `80-0000${i}-SOME-JOB-NAME-${dpiStamp}.pdf_Press Job printing end_${fileNameTimeStamp}.json`,
JSON.stringify(content)
);
}
},
In Node.js, I am trying to read a parquet file (compression='snappy') but have not been successful.
I used https://github.com/ironSource/parquetjs npm module to open local file and read it but reader.cursor() throws cryptic error 'not yet implemented'. It does not matter which compression (plain, rle, or snappy) was used to create input file, it throws same error.
Here is my code:
// Open a local parquet file, iterate every record via a cursor, and
// log each one before closing the reader.
const readParquet = async (fileKey) => {
  const filePath = 'parquet-test-file.plain'; // 'snappy';
  console.log('----- reading file : ', filePath);
  const reader = await parquet.ParquetReader.openFile(filePath);
  console.log('---- ParquetReader initialized....');
  // create a new cursor
  const cursor = reader.getCursor();
  if (cursor) {
    console.log('---- cursor initialized....');
    // walk the cursor until it yields a falsy record
    // (the first next() call is where the exception surfaces)
    for (let row = await cursor.next(); row; row = await cursor.next()) {
      console.log(row);
    }
  }
  await reader.close();
  console.log('----- done with reading parquet file....');
  return;
};
Call to read:
// Kick off the read and log whichever way the promise settles.
const dt = readParquet(fileKeys.dataFileKey);
dt.then((value) => {
  console.log('--------SUCCESS', value);
}).catch((error) => {
  console.log('-------FAILURE ', error); // Random error
  console.log(error.stack);
});
More info:
1. I have generated my parquet files in python using pyarrow.parquet
2. I used 'SNAPPY' compression while writing file
3. I can read these files in python without any issue
4. My schema is not fixed (unknown) each time I write parquet file. I do not create schema while writing.
5. error.stack prints undefined in console
6. console.log('-------FAILURE ', error); prints "not yet implemented"
I would like to know if someone has encountered similar problem and has ideas/solution to share. BTW my parquet files are stored on AWS S3 location (unlike in this test code). I still have to find solution to read parquet file from S3 bucket.
Any help, suggestions, code example will be highly appreciated.
Use var AWS = require('aws-sdk'); to get data from S3.
Then use node-parquet to read parquet file into variable.
// `import np = require('node-parquet')` is TypeScript-only syntax;
// in plain Node.js use require() (or a standard ES-module import).
const np = require('node-parquet');

// Read from a file:
const reader = new np.ParquetReader(`file.parquet`);
const parquet_info = reader.info();
let parquet_rows = reader.rows();
reader.close();
parquet_rows = parquet_rows + "\n";
There is a fork of https://github.com/ironSource/parquetjs here: https://github.com/ZJONSSON/parquetjs which is a "lite" version of the ironSource project. You can install it using npm install parquetjs-lite.
The ZJONSSON project comes with a function ParquetReader.openS3, which accepts an s3 client (from version 2 of the AWS SDK) and params ({Bucket: 'x', Key: 'y'}). You might want to try and see if that works for you.
If you are using version 3 of the AWS SDK / S3 client, I have a compatible fork here: https://github.com/entitycs/parquetjs (see tag feature/openS3v3).
Example usage from the project's README.md:
const parquet = require("parquetjs-lite");
const params = {
  Bucket: 'xxxxxxxxxxx',
  Key: 'xxxxxxxxxxx'
};

// NOTE: openS3 and cursor.next() are awaited, so this example must run
// inside an async function.

// v2 example
const AWS = require('aws-sdk');
const client = new AWS.S3({
  accessKeyId: 'xxxxxxxxxxx',
  secretAccessKey: 'xxxxxxxxxxx'
});
let reader = await parquet.ParquetReader.openS3(client, params);

// v3 example — use INSTEAD of the v2 block above (declaring `client`
// and `reader` twice in one scope is an error). The module name is
// "@aws-sdk/client-s3"; the original "#aws-sdk/client-s3" was a typo
// and will not resolve.
// const { S3Client, HeadObjectCommand, GetObjectCommand } = require('@aws-sdk/client-s3');
// const client = new S3Client({ region: "us-east-1" });
// let reader = await parquet.ParquetReader.openS3(
//   { S3Client: client, HeadObjectCommand, GetObjectCommand },
//   params
// );

// create a new cursor
let cursor = reader.getCursor();

// read all records from the file and print them
let record = null;
while (record = await cursor.next()) {
  console.log(record);
}
Hi, I'm trying to merge a total of n PDFs, but I cannot get it to work.
I'm using the Buffer module to concatenate the PDFs, but the final PDF only contains the last one.
Is this even possible thing to do in node?
// NOTE(review): Buffer.concat() just appends raw bytes. A PDF is a
// structured format, so the result is NOT a valid merged document —
// viewers will typically render only one of the originals. Use a PDF
// library to combine pages properly.
var pdf1 = fs.readFileSync('./test1.pdf');
var pdf2 = fs.readFileSync('./test2.pdf');
fs.writeFile("./final_pdf.pdf", Buffer.concat([pdf1, pdf2]), function(err) {
if(err) {
return console.log(err);
}
console.log("The file was saved!");
});
There are currently some libs out there but they do all depend on either other software or programming languages.
What do you expect to get when you do Buffer.concat([pdf1, pdf2])? Just by concatenating two PDFs files you won’t get one containing all pages. PDF is a complex format (basically one for vector graphics). If you just added two JPEG files you wouldn’t expect to get a big image containing both pictures, would you?
You’ll need to use an external library. https://github.com/wubzz/pdf-merge might work for instance.
HummusJS is another PDF manipulation library, but without a dependency on PDFtk. See this answer for an example of combining PDFs in Buffers.
Aspose.PDF Cloud SDK for Node.js can merge/combine pdf documents without depending on any third-party API/Tool. However, currently, it works with cloud storage(Aspose Internal Storage, Amazon S3, DropBox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage, FTP Storage). In near future, we will provide support to merge files from the request body(stream).
const { PdfApi } = require("asposepdfcloud");
const { MergeDocuments }= require("asposepdfcloud/src/models/mergeDocuments");
var fs = require('fs');
pdfApi = new PdfApi("xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx", "xxxxxxxxxxxxxxxxxxxxx");
const file1 = "dummy.pdf";
const file2 = "02_pages.pdf";
const localTestDataFolder = "C:\\Temp";
const names = [file1, file2];
const resultName = "MergingResult.pdf";
const mergeDocuments = new MergeDocuments();
mergeDocuments.list = [];
names.forEach( (file) => {
mergeDocuments.list.push(file);
});
// Upload File
pdfApi.uploadFile(file1, fs.readFileSync(localTestDataFolder + "\\" + file1)).then((result) => {
console.log("Uploaded File");
}).catch(function(err) {
// Deal with an error
console.log(err);
});
// Upload File
pdfApi.uploadFile(file2, fs.readFileSync(localTestDataFolder + "\\" + file2)).then((result) => {
console.log("Uploaded File");
}).catch(function(err) {
// Deal with an error
console.log(err);
});
// Merge PDF documents
pdfApi.putMergeDocuments(resultName, mergeDocuments, null, null).then((result) => {
console.log(result.body.code);
}).catch(function(err) {
// Deal with an error
console.log(err);
});
//Download file
const outputPath = "C:/Temp/" + resultName;
pdfApi.downloadFile(outputPath).then((result) => {
fs.writeFileSync(localPath, result.body);
console.log("File Downloaded");
}).catch(function(err) {
// Deal with an error
console.log(err);
});
I am writing an Express app that takes in a base64 encoded string that represents an image. Right now, i'm not really sure how I can take that string and upload the image to AWS S3, so i'm reading in the encoded image string data, decoding it, writing a file using fs, and then trying to upload. I have this working for an endpoint that just takes in a raw file, and all of its content is correctly uploaded to AWS s3.
Now when I try to do what I described above, i'm able to upload to S3, but the file has 0kb and is empty, and i'm not sure why. I tested just taking the stringData and writing a file to a test file, and it works. However, when I try uploading to s3, the file shows but it's empty. Here is my code:
// Decode a base64-encoded image from the request body and upload the
// raw bytes to S3.
router.post('/images/tags/nutritionalInformation/image/base64encoded', function (req, res) {
  console.log(req.body.imageString);
  var base64Stream = req.body.imageString;
  var imgDecodedBuffer = decodeBase64Image(base64Stream);
  console.log(imgDecodedBuffer);
  var prefix = guid().toString() + ".jpg";
  // Upload the decoded buffer directly. The original wrote a temp file
  // with the ASYNC fs.writeFile() and opened a read stream on it
  // immediately — before the write had completed — which is why the
  // object that reached S3 was 0 bytes. Passing the buffer avoids the
  // temp file (and its unlink cleanup) entirely.
  return s3fsImpl.writeFile(prefix, imgDecodedBuffer.data);
});
Here are the relevant import statements:
// Module dependencies and the s3fs client used by the upload route.
var fs = require('fs');
var s3fs = require('s3fs');
var multiparty = require('connect-multiparty'),
multipartyMidleware = multiparty();
// NOTE(review): the credentials below are placeholders — load real keys
// from environment/config; never commit them to source control.
var s3fsImpl = new s3fs('blahblah', {
accessKeyId: 'ACCESS_KEY_ID',
secretAccessKey: 'SECRET'
});
Any help would be greatly appreciated!
If you simply just pass in the buffer, which I presume is in your imgDecodedBuffer.data value, it should work.