I have an Express endpoint where I currently handle file uploads. Large files take a lot of memory because I was using bodyParser, which buffers the entire file in memory before calling my handler function.
I removed the bodyParser middleware from this endpoint and I'm struggling to properly use streams to stream the file upload -> Express -> S3.
These are the docs for the S3 upload method; it accepts a buffer or a stream.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property
route
router.put('/files/:filename', putHandler({ s3Client: s3Client }))
I tried this, which streams the file to my handler method, but it doesn't seem to stream it to the s3.upload method (no surprise, really):
function put ({ s3Client }) {
  return (req, res) => {
    ...
    let whenFileUploaded = new Promise((resolve, reject) => {
      // const { Readable } = require('stream')
      // const inStream = new Readable({
      //   read() {}
      // })
      let data = ''
      req.on('data', function (chunk) {
        req.log.debug('in chunk')
        data += chunk
        // inStream.push(chunk)
      })
      req.on('end', function () {
        req.log.debug('in end')
      })
      s3Client.upload(
        {
          Key: filepath,
          Body: data,
          SSECustomerAlgorithm: 'AES256',
          SSECustomerKey: sseKey.id.split('-').join('')
        },
        {
          partSize: 16 * 1024 * 1024, // 16mb
          queueSize: 1
        },
        (err, data) => err ? reject(err) : resolve(data)
      )
    })
My guess is that I need to create a stream, pipe the req.on('data', ...) chunks into it, and then set Body: inStream, which you can see I attempted with the commented-out code, but that didn't seem to work either.
Help?
It turns out the answer is actually very simple: all I had to do was pass the req object itself.
function put ({ s3Client }) {
  return (req, res) => {
    ...
    let whenFileUploaded = new Promise((resolve, reject) => {
      s3Client.upload(
        {
          Key: filepath,
          Body: req, // <-- NOTE THIS LINE
          SSECustomerAlgorithm: 'AES256',
          SSECustomerKey: sseKey.id.split('-').join('')
        },
        {
          partSize: 16 * 1024 * 1024, // 16mb
          queueSize: 1
        },
        (err, data) => err ? reject(err) : resolve(data)
      )
    })
The way I found this out is because I looked at the Express source code to see what a req object is, and it is an http.IncomingMessage object: https://github.com/expressjs/express/blob/master/lib/request.js#L31
Then I looked at the Node docs and saw that http.IncomingMessage implements the Readable Stream interface:
"It implements the Readable Stream interface, as well as the following additional events, methods, and properties."
https://nodejs.org/docs/latest-v9.x/api/http.html#http_class_http_incomingmessage
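For completeness, a rough sketch of how whenFileUploaded might be consumed in the rest of the handler (the response shape and status codes here are my own assumptions, not part of the original code):

whenFileUploaded
  .then(result => {
    // s3.upload's callback data includes Location, Bucket, Key and ETag
    res.status(200).json({ location: result.Location })
  })
  .catch(err => {
    req.log.error(err)
    res.status(500).json({ error: 'upload failed' })
  })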
Related
I have been trying to upload a file to Firebase Storage using a callable Firebase Cloud Function.
All I am doing is fetching an image from a URL using axios and trying to upload it to Storage.
The problem I am facing is that I don't know how to save the response from axios and upload it to Storage.
First, how do I save the received file in the temp directory that os.tmpdir() points to?
Then, how do I upload it into Storage?
Here I am receiving the data as an arraybuffer, then converting it to a Blob and trying to upload it.
Here is my code. I think I am missing a major part.
If there is a better way, please recommend it. I've been looking through a lot of documentation and ended up with no clear solution. Please guide me. Thanks in advance.
const bucket = admin.storage().bucket();
const path = require('path');
const os = require('os');
const fs = require('fs');

module.exports = functions.https.onCall((data, context) => {
  try {
    return new Promise((resolve, reject) => {
      const {
        imageFiles,
        companyPIN,
        projectId
      } = data;
      const filename = imageFiles[0].replace(/^.*[\\\/]/, '');
      const filePath = `ProjectPlans/${companyPIN}/${projectId}/images/${filename}`; // Path I am trying to upload to in Firebase Storage
      const tempFilePath = path.join(os.tmpdir(), filename);
      const metadata = {
        contentType: 'application/image'
      };
      axios
        .get(imageFiles[0], { // URL for the image
          responseType: 'arraybuffer',
          headers: {
            accept: 'application/image'
          }
        })
        .then(response => {
          console.log(response);
          const blobObj = new Blob([response.data], {
            type: 'application/image'
          });
          return blobObj;
        })
        .then(async blobObj => {
          return bucket.upload(blobObj, {
            destination: tempFilePath // Here I am wrong.. How to set the path of the downloaded blob file?
          });
        }).then(buffer => {
          resolve({ result: 'success' });
        })
        .catch(ex => {
          console.error(ex);
        });
    });
  } catch (error) {
    // unknown: 500 Internal Server Error
    throw new functions.https.HttpsError('unknown', 'Unknown error occurred. Contact the administrator.');
  }
});
I'd take a slightly different approach and avoid using the local filesystem at all, since it's just tmpfs and will cost you memory that your function is already using to hold the buffer/blob. It's simpler to avoid it and write directly from that buffer to GCS using the save method on the GCS File object.
Here's an example. I've simplified out a lot of your setup, and I am using an HTTP function instead of a callable one. Likewise, I'm using a public Stack Overflow image rather than your original URLs. In any case, you should be able to use this template and modify it back to what you need (e.g. change the prototype, remove the HTTP response, and replace it with the return value you need):
const functions = require('firebase-functions');
const axios = require('axios');
const admin = require('firebase-admin');
admin.initializeApp();

exports.doIt = functions.https.onRequest((request, response) => {
  const bucket = admin.storage().bucket();
  const IMAGE_URL = 'https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/so/so-logo.svg';
  const MIME_TYPE = 'image/svg+xml';
  return axios.get(IMAGE_URL, { // URL for the image
    responseType: 'arraybuffer',
    headers: {
      accept: MIME_TYPE
    }
  }).then(response => {
    console.log(response); // only to show we got the data for debugging
    const destinationFile = bucket.file('my-stackoverflow-logo.svg');
    return destinationFile.save(response.data).then(() => { // note: defaults to resumable upload
      return destinationFile.setMetadata({ contentType: MIME_TYPE });
    });
  }).then(() => { response.send('ok'); })
    .catch((err) => { console.log(err); });
});
As a commenter noted, in the above example the axios request itself makes an external network call, and you will need to be on the Blaze or Flame plan for that. However, that alone doesn't appear to be your current problem.
Likewise, this defaults to using a resumable upload, which the documentation does not recommend when you are uploading large numbers of small (<10MB) files, as there is some overhead.
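If that overhead matters for your workload, file.save() accepts an options object where you can turn resumable uploads off; a minimal sketch, reusing destinationFile and MIME_TYPE from the example above:

// For many small files, a simple (non-resumable) upload avoids the extra round trips.
return destinationFile.save(response.data, {
  resumable: false,
  metadata: { contentType: MIME_TYPE }
});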
You asked how this might be used to download multiple files. Here is one approach. First, let's assume you have a function that returns a promise that downloads a single file given its filename. I've abridged this from the above, but it's basically identical except that IMAGE_URL becomes the filename parameter; note that it does not return a final result such as response.send(), and there's an implicit assumption that all the files share the same MIME_TYPE:
function downloadOneFile(filename) {
  const bucket = admin.storage().bucket();
  const MIME_TYPE = 'image/svg+xml';
  return axios.get(filename, ...)
    .then(response => {
      const destinationFile = ...
    });
}
Then you just need to iteratively build a promise chain from the list of files. Let's say they are in imageUrls. Once built, return the entire chain:
let finalPromise = Promise.resolve();
imageUrls.forEach((item) => { finalPromise = finalPromise.then(() => downloadOneFile(item)); });
// if needed, add a final .then() section for the actual function result
return finalPromise.catch((err) => { console.log(err) });
Note that you could also build an array of the promises and pass them to Promise.all() -- that would likely be faster as you would get some parallelism, but I wouldn't recommend it unless you are very sure all of the data will fit inside your function's memory at once. Even with this approach, you need to make sure the downloads can all complete within your function's timeout.
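For reference, the Promise.all() variant would look roughly like this (same downloadOneFile as above, and the same memory and timeout caveats apply):

// Kick off all downloads in parallel and wait for every one to settle successfully.
return Promise.all(imageUrls.map((item) => downloadOneFile(item)))
  .catch((err) => { console.log(err); });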
I want to pass the original file stream down to another layer of code, which will handle persisting it later (uploading it to cloud storage). As the file size might be large, I can't fully buffer the incoming file, so I assume that a PassThrough stream should pass the needed data along. Although file.resume() is already being called, the finish event never gets called.
How can I collect all the required form fields along with the single file stream and make the proper service call, without explicitly storing the whole file in memory or on local disk, as I have a few of both of them?
private collectMultipartRequest (req: Request, fileFieldName: string): Promise<{ file: IFile, fields: { [k: string]: string }}> {
  const obj = {
    file: null,
    fields: {}
  };
  return new Promise((resolve, reject) => {
    const busboy = new Busboy({ headers: req.headers, limits: { files: 1 }});
    busboy.on("file", (fieldname, file, filename, mimetype) => {
      if (fieldname === fileFieldName) {
        const passThrough = new PassThrough();
        file.pipe(passThrough);
        obj.file = <IFile>{
          mimeType: mimetype,
          name: filename,
          readStream: passThrough
        };
      }
      file.resume();
    });
    busboy.on("field", (fieldName, val) => {
      obj.fields[fieldName] = val;
    });
    busboy.on("filesLimit", () => {
      reject(obj);
    });
    busboy.on("finish", async () => {
      resolve(obj);
    });
    req.pipe(busboy);
  });
}
Is there any way to use HTTP/2 directly in NestJS (methods like pushStream etc.), or should I use Express for this kind of feature?
No, there is none so far. But as long as you have a recent Node.js, you can use the built-in http2 module.
You have 2 options:
write a NestJS custom transport (I wonder why no one has done that so far)
mix it into your code yourself; e.g. here is a JS http2 client consuming a streaming endpoint:
const http2 = require('http2')

const session = http2.connect('http://localhost:8088')
session.on('error', (err) => console.error(err))

const body = {
  sql: "SELECT * FROM SIGNALS EMIT CHANGES;",
  properties: {"ksql.streams.auto.offset.reset": "latest"}
}

const req = session.request({
  ':method': 'POST',
  ':path': '/query-stream',
  'Content-Type': 'application/json'
})
req.write(JSON.stringify(body), 'utf8')
req.end()

req.on('response', (headers) => {
  // we can log each response header here
  for (const name in headers) {
    console.log("a header: " + `${name}: ${headers[name]}`)
  }
})

req.setEncoding('utf8')
let data = ''
req.on('data', (chunk) => {
  data += chunk
  console.log(`\n${data}`)
})
req.on('end', () => {
  console.log(`\n${data}`)
  session.close()
})
I'm trying to fetch a file from an S3 bucket and store it locally; once it's written locally, I read the file, convert the data to JSON format, and send it.
I need to check whether the file has been downloaded and written locally; only once the file exists should I read it and convert it to JSON, otherwise I should send an error message.
Once the file is open I write it and call end, so after end I can't send a return value. How can I solve this and use try/catch to send a proper error message?
const fetchFileDownloadAndWriteIt = () => {
  let Bucket = "DataBucket";
  let filename = "sample_data.csv";
  let s3 = new AWS.S3();
  const params = {
    Bucket: Bucket,
    Key: filename
  };
  return s3.getObject(params)
    .promise()
    .then(data => {
      const file = fs.createWriteStream('./localdata/' + filename);
      file.on("open", () => {
        file.write(data.Body);
        file.end();
      })
      .on("error", err => {
        console.log("Error Occured while writing", err.message)
      })
    })
    .catch(err => {
      console.log("unable to fetch file from s3 Bucket", err.message)
    })
}
exports.fetchData = async (req, res) => {
  let fileDownloadAndWrite = await fetchFileDownloadAndWriteIt();
  // need to check file is downloaded and written properly
  const path = "./localdata/sample_data.csv";
  const json = await csv().fromFile(path);
  res.send({ data: json })
}
You can return a new Promise instead of the one you get by calling the SDK's API.
return new Promise((res, rej) => {
  s3.getObject(params)
    .promise()
    .then(data => {
      const file = fs.createWriteStream('./localdata/' + filename);
      file
        .on("open", () => {
          file.write(data.Body);
          file.end();
        })
        .on("finish", () => {
          // success: all data has been flushed to disk
          res();
        })
        .on("error", err => {
          rej(err);
        })
    })
    .catch(err => {
      rej(err);
    })
});
This will resolve to undefined, and reject with the proper error that occurred, e.g. while writing the file.
How do you call it in your handler?
Something like this would be fine:
exports.fetchData = async (req, res, next) => {
  try {
    await fetchFileDownloadAndWriteIt();
    // need to check file is downloaded and written properly - here the file is actually downloaded and written properly.
    const path = "./localdata/sample_data.csv";
    const json = await csv().fromFile(path);
    res.send({ data: json })
  }
  catch (err) {
    return next(err);
  }
}
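As a side note, if the CSV comfortably fits in memory anyway, you could skip the temp file entirely and feed the S3 response body straight to csvtojson; a sketch, assuming the same s3/params setup and the csv() import already used above:

exports.fetchData = async (req, res, next) => {
  try {
    const data = await s3.getObject(params).promise();
    // data.Body is a Buffer, so it can be parsed directly instead of being written to disk first
    const json = await csv().fromString(data.Body.toString());
    res.send({ data: json });
  } catch (err) {
    return next(err);
  }
};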
I am new to Node.js and am trying to set up a server where I get the EXIF information from an image. My images are on S3, so I want to be able to just pass in the S3 URL as a parameter and grab the image from it.
I am using the ExifImage project below to get the EXIF info, and according to their documentation:
"Instead of providing a filename of an image in your filesystem you can also pass a Buffer to ExifImage."
How can I load an image from a URL into a buffer in Node, so I can pass it to the ExifImage function?
ExifImage Project:
https://github.com/gomfunkel/node-exif
Thanks for your help!
Try setting up request like this:
var request = require('request').defaults({ encoding: null });
request.get(s3Url, function (err, res, body) {
  //process exif here
});
Setting encoding to null will cause request to output a buffer instead of a string.
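From there the buffer can be handed straight to node-exif; roughly like this (the error handling is just a placeholder):

var ExifImage = require('exif').ExifImage;

request.get(s3Url, function (err, res, body) {
  if (err) return console.error(err);
  // body is a Buffer here because the defaults above set encoding: null
  new ExifImage({ image: body }, function (error, exifData) {
    if (error) return console.error(error.message);
    console.log(exifData);
  });
});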
Use axios:
const response = await axios.get(url, { responseType: 'arraybuffer' })
const buffer = Buffer.from(response.data, "utf-8")
import fetch from "node-fetch";
let fimg = await fetch(image.src)
let fimgb = Buffer.from(await fimg.arrayBuffer())
I was able to solve this only after reading that encoding: null is required and providing it as a parameter to request.
This will download the image from url and produce a buffer with the image data.
Using the request library -
const request = require('request');
let url = 'http://website.com/image.png';
request({ url, encoding: null }, (err, resp, buffer) => {
  // Use the buffer
  // buffer contains the image data
  // typeof buffer === 'object'
});
Note: omitting encoding: null will result in an unusable string and not a buffer. Buffer.from won't work correctly either.
This was tested with Node 8
Use the request library.
request('<s3imageurl>', function(err, response, buffer) {
  // Do something
});
Also, node-image-headers might be of interest to you. It sounds like it takes a stream, so it might not even have to download the full image from S3 in order to process the headers.
Updated with correct callback signature.
Here's a solution that uses the native https library.
import { get } from "https";
function urlToBuffer(url: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const data: Uint8Array[] = [];
    get(url, (res) => {
      res
        .on("data", (chunk: Uint8Array) => {
          data.push(chunk);
        })
        .on("end", () => {
          resolve(Buffer.concat(data));
        })
        .on("error", (err) => {
          reject(err);
        });
    });
  });
}
const imageUrl = "https://i.imgur.com/8k7e1Hm.png";
const imageBuffer = await urlToBuffer(imageUrl);
Feel free to delete the types if you're looking for javascript.
I prefer this approach because it doesn't rely on 3rd party libraries or the deprecated request library.
request is deprecated and should be avoided if possible.
Good alternatives include got (only for node.js) and axios (which also support browsers).
Example of got:
npm install got
Using the async/await syntax:
const got = require('got');
const url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png';
(async () => {
  try {
    const response = await got(url, { responseType: 'buffer' });
    const buffer = response.body;
  } catch (error) {
    console.log(error.body);
  }
})();
You can do it this way:
import axios from "axios";

async function getFileContentById(
  download_url: string
): Promise<Buffer> {
  const response = await axios.get(download_url, {
    responseType: "arraybuffer",
  });
  // with responseType "arraybuffer" the data is already binary, so no encoding argument is needed
  return Buffer.from(response.data);
}