I have a cloud function receiving a json string in a pubsub topic.
The goal is to extracts some data into a new json string.
Next parse it as JSONL.
And finally stream it to Google Cloud Storage.
I notice that sometimes the files seem to contain data and sometimes they do not.
The pubsub is working fine and data is coming into this cloud function just fine.
I tried adding some async awaits where I seem it might fit but I am afraid it has do to with the bufferstream. Both topics on where I have trouble getting my head around.
What could be the issue?
const stream = require('stream');
const { Storage } = require('#google-cloud/storage');
// Initiate the source
const bufferStream = new stream.PassThrough();
// Creates a client
const storage = new Storage();
// save stream to bucket
const toBucket = (message, filename) => {
// Write your buffer
bufferStream.end(Buffer.from(message));
const myBucket = storage.bucket(process.env.BUCKET);
const file = myBucket.file(filename);
// Pipe the 'bufferStream' into a 'file.createWriteStream' method.
bufferStream.pipe(file.createWriteStream({
validation: 'md5',
}))
.on('error', (err) => { console.error(err); })
.on('finish', () => {
// The file upload is complete.
console.log(`${filename} is uploaded`);
});
};
// extract correct fields
const extract = (entry) => ({
id: entry.id,
status: entry.status,
date_created: entry.date_created,
discount_total: entry.discount_total,
discount_tax: entry.discount_tax,
shipping_total: entry.shipping_total,
shipping_tax: entry.shipping_tax,
total: entry.total,
total_tax: entry.total_tax,
customer_id: entry.customer_id,
payment_method: entry.payment_method,
payment_method_title: entry.payment_method_title,
transaction_id: entry.transaction_id,
date_completed: entry.date_completed,
billing_city: entry.billing.city,
billing_state: entry.billing.state,
billing_postcode: entry.billing.postcode,
coupon_lines_id: entry.coupon_lines.id,
coupon_lines_code: entry.coupon_lines.code,
coupon_lines_discount: entry.coupon_lines.discount,
coupon_lines_discount_tax: entry.coupon_lines.discount_tax,
});
// format json to jsonl
const format = async (message) => {
let jsonl;
try {
// extract only the necessary
const jsonMessage = await JSON.parse(message);
const rows = await jsonMessage.map((row) => {
const extractedRow = extract(row);
return `${JSON.stringify(extractedRow)}\n`;
});
// join all lines as one string with no join symbol
jsonl = rows.join('');
console.log(jsonl);
} catch (e) {
console.error('jsonl conversion failed');
}
return jsonl;
};
exports.jsonToBq = async (event, context) => {
const message = Buffer.from(event.data, 'base64').toString();
const { filename } = event.attributes;
console.log(filename);
const jsonl = await format(message, filename);
toBucket(jsonl, filename);
};
it's fixed by moving the bufferstream const into the tobucket function.
Related
I am trying to send audio over web-sockets to node server where I want to store it in local file system. I am able to append the data received in a file, but it is not playable.
Most of the solution I found are for client side which use AudioContext.decodeAudiodata().
client
localAudioStreamRecorder = getMediaStreamRecording(
localAudioStream,
'audio/webm;codecs=opus',
(data: ArrayBuffer) => {
handleSendRecordingChunk(socket, {
...getIdentityPayload({ sessionId, userId, role }),
data,
streamType: 'audio',
})
}
)
server
const audioStreamPass = fs.createWriteStream(audioFilePath, { flags: 'a' });
const newData = async (socket, eventData, cb) => {
const { sessionId } = eventData.body;
if (eventData.body.streamType === 'audio') {
// Need help here
audioStreamPass.write(Buffer.from(new Uint8Array(eventData.body.data)));
}
};
I just want to know, how can I decode this data to something which is playable.
Thanks.
Try this on server:
const { Readable } = require('stream');
const audioStreamPass = fs.createWriteStream(audioFilePath, { flags: 'a' });
const newData = async (socket, eventData, cb) => {
const { sessionId } = eventData.body;
if (eventData.body.streamType === 'audio') {
// Create the readable stream from buffer
const readableStream = Readable.from(buffer);
//Pipe the stream to the writeable
readableStream.pipe(audioStreamPass)
};
I have a database listener in my code and I am trying to get every user's new posts and then (when I have all of them in an array) update the posts state.
My code looks like this but it is not working good, because setPosts is async and sometimes it might be called again before ending the state update. I think that I need to wrap the listener in a Promise but I have no idea how to do it detaching the listener when the component unmounts.
useEffect(() => {
const { firebase } = props;
// Realtime database listener
const unsuscribe = firebase
.getDatabase()
.collection("posts")
.doc(firebase.getCurrentUser().uid)
.collection("userPosts")
.onSnapshot((snapshot) => {
let changes = snapshot.docChanges();
changes.forEach(async (change) => {
if (change.type === "added") {
// Get the new post
const newPost = change.doc.data();
// TODO - Move to flatlist On end reached
const uri = await firebase
.getStorage()
.ref(`photos/${newPost.id}`)
.getDownloadURL();
// TODO - Add the new post *(sorted by time)* to the posts list
setPosts([{ ...newPost, uri }, ...posts]);
}
});
});
/* Pd: At the first time, this function will get all the user's posts */
return () => {
// Detach the listening agent
unsuscribe();
};
}, []);
Any ideas?
Also, I have think to do:
useEffect(() => {
const { firebase } = props;
let postsArray = [];
// Realtime database listener
const unsuscribe = firebase
.getDatabase()
.collection("posts")
.doc(firebase.getCurrentUser().uid)
.collection("userPosts")
.orderBy("time") // Sorted by date
.onSnapshot((snapshot) => {
let changes = snapshot.docChanges();
changes.forEach(async (change) => {
if (change.type === "added") {
// Get the new post
const newPost = change.doc.data();
// Add the new post to the posts list
postsArray.push(newPost);
}
});
setPosts(postsArray.reverse());
});
But in this case, the post uri is saved too in the firestore document (something I can do because I write on the firestore with a cloud function that gets the post from storage), and I don't know if it is a good practice.
Thanks.
Update
Cloud Function code:
exports.validateImageDimensions = functions
.region("us-central1")
.runWith({ memory: "2GB", timeoutSeconds: 120 })
.https.onCall(async (data, context) => {
// Libraries
const admin = require("firebase-admin");
const sizeOf = require("image-size");
const url = require("url");
const https = require("https");
const sharp = require("sharp");
const path = require("path");
const os = require("os");
const fs = require("fs");
// Lazy initialization of the Admin SDK
if (!is_validateImageDimensions_initialized) {
const serviceAccount = require("./serviceAccountKey.json");
admin.initializeApp({
// ...
});
is_validateImageDimensions_initialized = true;
}
// Create Storage
const storage = admin.storage();
// Create Firestore
const firestore = admin.firestore();
// Get the image's owner
const owner = context.auth.token.uid;
// Get the image's info
const { id, description, location, tags } = data;
// Photos's bucket
const bucket = storage.bucket("bucket-name");
// File Path
const filePath = `photos/${id}`;
// Get the file
const file = getFile(filePath);
// Check if the file is a jpeg image
const metadata = await file.getMetadata();
const isJpgImage = metadata[0].contentType === "image/jpeg";
// Get the file's url
const fileUrl = await getUrl(file);
// Get the photo dimensions using the `image-size` library
getImageFromUrl(fileUrl)
.then(async (image) => {
// Check if the image has valid dimensions
let dimensions = sizeOf(image);
// Create the associated Firestore's document to the valid images
if (isJpgImage && hasValidDimensions(dimensions)) {
// Create a thumbnail for the uploaded image
const thumbnailPath = await generateThumbnail(filePath);
// Get the thumbnail
const thumbnail = getFile(thumbnailPath);
// Get the thumbnail's url
const thumbnailUrl = await getUrl(thumbnail);
try {
await firestore
.collection("posts")
.doc(owner)
.collection("userPosts")
.add({
id,
uri: fileUrl,
thumbnailUri: thumbnailUrl, // Useful for progress images
description,
location,
tags,
date: admin.firestore.FieldValue.serverTimestamp(),
likes: [], // At the first time, when a post is created, zero users has liked it
comments: [], // Also, there aren't any comments
width: dimensions.width,
height: dimensions.height,
});
// TODO: Analytics posts counter
} catch (err) {
console.error(
`Error creating the document in 'posts/{owner}/userPosts/' where 'id === ${id}': ${err}`
);
}
} else {
// Remove the files that are not jpeg images, or whose dimensions are not valid
try {
await file.delete();
console.log(
`The image '${id}' has been deleted because it has invalid dimensions.
This may be an attempt to break the security of the app made by the user '${owner}'`
);
} catch (err) {
console.error(`Error deleting invalid file '${id}': ${err}`);
}
}
})
.catch((e) => {
console.log(e);
});
/* ---------------- AUXILIAR FUNCTIONS ---------------- */
function getFile(filePath) {
/* Get a file from the storage bucket */
return bucket.file(filePath);
}
async function getUrl(file) {
/* Get the public url of a file */
const signedUrls = await file.getSignedUrl({
action: "read",
expires: "01-01-2100",
});
// signedUrls[0] contains the file's public URL
return signedUrls[0];
}
function getImageFromUrl(uri) {
return new Promise((resolve, reject) => {
const options = url.parse(uri); // Automatically converted to an ordinary options object.
const request = https.request(options, (response) => {
if (response.statusCode < 200 || response.statusCode >= 300) {
return reject(new Error("statusCode=" + response.statusCode));
}
let chunks = [];
response.on("data", (chunk) => {
chunks.push(chunk);
});
response.on("end", () => {
try {
chunks = Buffer.concat(chunks);
} catch (e) {
reject(e);
}
resolve(chunks);
});
});
request.on("error", (e) => {
reject(e.message);
});
// Send the request
request.end();
});
}
function hasValidDimensions(dimensions) {
// Posts' valid dimensions
const validDimensions = [
{
width: 1080,
height: 1080,
},
{
width: 1080,
height: 1350,
},
{
width: 1080,
height: 750,
},
];
return (
validDimensions.find(
({ width, height }) =>
width === dimensions.width && height === dimensions.height
) !== undefined
);
}
async function generateThumbnail(filePath) {
/* Generate thumbnail for the progressive images */
// Download file from bucket
const fileName = filePath.split("/").pop();
const tempFilePath = path.join(os.tmpdir(), fileName);
const thumbnailPath = await bucket
.file(filePath)
.download({
destination: tempFilePath,
})
.then(() => {
// Generate a thumbnail using Sharp
const size = 50;
const newFileName = `${fileName}_${size}_thumb.jpg`;
const newFilePath = `thumbnails/${newFileName}`;
const newFileTemp = path.join(os.tmpdir(), newFileName);
sharp(tempFilePath)
.resize(size, null)
.toFile(newFileTemp, async (_err, info) => {
// Uploading the thumbnail.
await bucket.upload(newFileTemp, {
destination: newFilePath,
});
// Once the thumbnail has been uploaded delete the temporal file to free up disk space.
fs.unlinkSync(tempFilePath);
});
// Return the thumbnail's path
return newFilePath;
});
return thumbnailPath;
}
});
I'm writing a Lambda function which is given a list of text files on S3, and concatenates them together, and then zips that resulting file. For some reason, the function is bombing out in the middle of the process, with no errors.
The payload sent to the Lambda func looks like this:
{
"sourceFiles": [
"s3://bucket/largefile1.txt",
"s3://bucket/largefile2.txt"
],
"destinationFile": "s3://bucket/concat.zip",
"compress": true,
"omitHeader": false,
"preserveSourceFiles": true
}
The scenarios in which this function works totally fine:
The two files are small, and compress === false
The two files are small, and compress === true
The two files are large, and compress === false
If I try to have it compress two large files, it quits in the middle. The concatenation process itself works fine, but when it tries to use zip-stream to add the stream to an archive, it fails.
The two large files together are 483,833 bytes. When the Lambda function fails, it reads either 290,229 or 306,589 bytes (it's random) then quits.
This is the main entry point of the function:
const packer = require('zip-stream');
const S3 = require('aws-sdk/clients/s3');
const s3 = new S3({ apiVersion: '2006-03-01' });
const { concatCsvFiles } = require('./csv');
const { s3UrlToParts } = require('./utils');
function addToZip(archive, stream, options) {
return new Promise((resolve, reject) => {
archive.entry(stream, options, (err, entry) => {
console.log('entry done', entry);
if (err) reject(err);
resolve(entry);
});
});
}
export const handler = async event => {
/**
* concatCsvFiles returns a readable stream to pass to either the archiver or
* s3.upload.
*/
let bytesRead = 0;
try {
const stream = await concatCsvFiles(event.sourceFiles, {
omitHeader: event.omitHeader,
});
stream.on('data', chunk => {
bytesRead += chunk.length;
console.log('read', bytesRead, 'bytes so far');
});
stream.on('end', () => {
console.log('this is never called :(');
});
const dest = s3UrlToParts(event.destinationFile);
let archive;
if (event.compress) {
archive = new packer();
await addToZip(archive, stream, { name: 'concat.csv' });
archive.finalize();
}
console.log('uploading');
await s3
.upload({
Body: event.compress ? archive : stream,
Bucket: dest.bucket,
Key: dest.key,
})
.promise();
console.log('done uploading');
if (!event.preserveSourceFiles) {
const s3Objects = event.sourceFiles.map(s3Url => {
const { bucket, key } = s3UrlToParts(s3Url);
return {
bucket,
key,
};
});
await s3
.deleteObjects({
Bucket: s3Objects[0].bucket,
Delete: {
Objects: s3Objects.map(s3Obj => ({ Key: s3Obj.key })),
},
})
.promise();
}
console.log('## Never gets here');
// return {
// newFile: event.destinationFile,
// };
} catch (e) {
if (e.code) {
throw new Error(e.code);
}
throw e;
}
};
And this is the concatenation code:
import MultiStream from 'multistream';
import { Readable } from 'stream';
import S3 from 'aws-sdk/clients/s3';
import { s3UrlToParts } from './utils';
const s3 = new S3({ apiVersion: '2006-03-01' });
/**
* Takes an array of S3 URLs and returns a readable stream of the concatenated results
* #param {string[]} s3Urls Array of S3 URLs
* #param {object} options Options
* #param {boolean} options.omitHeader Omit the header from the final output
*/
export async function concatCsvFiles(s3Urls, options = {}) {
// Get the header so we can use the length to set an offset in grabbing files
const firstFile = s3Urls[0];
const file = s3UrlToParts(firstFile);
const data = await s3
.getObject({
Bucket: file.bucket,
Key: file.key,
Range: 'bytes 0-512', // first 512 bytes is pretty safe for header size
})
.promise();
const streams = [];
const [header] = data.Body.toString().split('\n');
for (const s3Url of s3Urls) {
const { bucket, key } = s3UrlToParts(s3Url);
const stream = s3
.getObject({
Bucket: bucket,
Key: key,
Range: `bytes=${header.length + 1}-`, // +1 for newline char
})
.createReadStream();
streams.push(stream);
}
if (!options.omitHeader) {
const headerStream = new Readable();
headerStream.push(header + '\n');
headerStream.push(null);
streams.unshift(headerStream);
}
const combinedStream = new MultiStream(streams);
return combinedStream;
}
Got it. The problem was actually with the zip-stream library. Apparently it doesn't work well with S3 + streaming. I tried yazl and it works perfectly.
I manually upload the JSON file to google cloud storage by creating a new project. I am able to read the metadata for a file but I don't know how to read the JSON content.
The code I used to read the metadata is:
var Storage = require('#google-cloud/storage');
const storage = Storage({
keyFilename: 'service-account-file-path',
projectId: 'project-id'
});
storage
.bucket('project-name')
.file('file-name')
.getMetadata()
.then(results => {
console.log("results is", results[0])
})
.catch(err => {
console.error('ERROR:', err);
});
Can someone guide me to the way to read the JSON file content?
I've used the following code to read a json file from Cloud Storage:
'use strict';
const Storage = require('#google-cloud/storage');
const storage = Storage();
exports.readFile = (req, res) => {
console.log('Reading File');
var archivo = storage.bucket('your-bucket').file('your-JSON-file').createReadStream();
console.log('Concat Data');
var buf = '';
archivo.on('data', function(d) {
buf += d;
}).on('end', function() {
console.log(buf);
console.log("End");
res.send(buf);
});
};
I'm reading from a stream and concat all the data within the file to the buf variable.
Hope it helps.
UPDATE
To read multiple files:
'use strict';
const {Storage} = require('#google-cloud/storage');
const storage = new Storage();
listFiles();
async function listFiles() {
const bucketName = 'your-bucket'
console.log('Listing objects in a Bucket');
const [files] = await storage.bucket(bucketName).getFiles();
files.forEach(file => {
console.log('Reading: '+file.name);
var archivo = file.createReadStream();
console.log('Concat Data');
var buf = '';
archivo.on('data', function(d) {
buf += d;
}).on('end', function() {
console.log(buf);
console.log("End");
});
});
};
I was using the createWriteStream method like the other answers but I had a problem with the output in that it randomly output invalid characters (�) for some characters in a string. I thought it could be some encoding problems.
I came up with my workaround that uses the download method. The download method returns a DownloadResponse that contains an array of Buffer. We then use Buffer.toString() method and give it an encoding of utf8 and parse the result with JSON.parse().
const downloadAsJson = async (bucket, path) => {
const file = await new Storage()
.bucket(bucket)
.file(path)
.download();
return JSON.parse(file[0].toString('utf8'));
}
There exists a convenient method:'download' to download a file into memory or to a local destination. You may use download method as follows:
const bucketName='bucket name here';
const fileName='file name here';
const storage = new Storage.Storage();
const file = storage.bucket(bucketName).file(fileName);
file.download(function(err, contents) {
console.log("file err: "+err);
console.log("file data: "+contents);
});
A modern version of this:
const { Storage } = require('#google-cloud/storage')
const storage = new Storage()
const bucket = storage.bucket('my-bucket')
// The function that returns a JSON string
const readJsonFromFile = async remoteFilePath => new Promise((resolve, reject) => {
let buf = ''
bucket.file(remoteFilePath)
.createReadStream()
.on('data', d => (buf += d))
.on('end', () => resolve(buf))
.on('error', e => reject(e))
})
// Example usage
(async () => {
try {
const json = await readJsonFromFile('path/to/json-file.json')
console.log(json)
} catch (e) {
console.error(e)
}
})()
I’m a bit confused with how to proceed. I am using Archive ( node js module) as a means to write data to a zip file. Currently, I have my code working when I write to a file (local storage).
var fs = require('fs');
var archiver = require('archiver');
var output = fs.createWriteStream(__dirname + '/example.zip');
var archive = archiver('zip', {
zlib: { level: 9 }
});
archive.pipe(output);
archive.append(mybuffer, {name: ‘msg001.txt’});
I’d like to modify the code so that the archive target file is an AWS S3 bucket. Looking at the code examples, I can specify the bucket name and key (and body) when I create the bucket object as in:
var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myMsgArchive.zip' Body: myStream};
s3.upload( params, function(err,data){
…
});
Or
s3 = new AWS.S3({ parms: {Bucket: ‘myBucket’ Key: ‘myMsgArchive.zip’}});
s3.upload( {Body: myStream})
.send(function(err,data) {
…
});
With regards to my S3 example(s), myStream appears to be a readable stream and I am confused as how to make this work as archive.pipe requires a writeable stream. Is this something where we need to use a pass-through stream? I’ve found an example where someone created a pass-through stream but the example is too terse to gain proper understanding. The specific example I am referring to is:
Pipe a stream to s3.upload()
Any help someone can give me would greatly be appreciated. Thanks.
This could be useful for anyone else wondering how to use pipe.
Since you correctly referenced the example using the pass-through stream, here's my working code:
1 - The routine itself, zipping files with node-archiver
exports.downloadFromS3AndZipToS3 = () => {
// These are my input files I'm willing to read from S3 to ZIP them
const files = [
`${s3Folder}/myFile.pdf`,
`${s3Folder}/anotherFile.xml`
]
// Just in case you like to rename them as they have a different name in the final ZIP
const fileNames = [
'finalPDFName.pdf',
'finalXMLName.xml'
]
// Use promises to get them all
const promises = []
files.map((file) => {
promises.push(s3client.getObject({
Bucket: yourBubucket,
Key: file
}).promise())
})
// Define the ZIP target archive
let archive = archiver('zip', {
zlib: { level: 9 } // Sets the compression level.
})
// Pipe!
archive.pipe(uploadFromStream(s3client, 'someDestinationFolderPathOnS3', 'zipFileName.zip'))
archive.on('warning', function(err) {
if (err.code === 'ENOENT') {
// log warning
} else {
// throw error
throw err;
}
})
// Good practice to catch this error explicitly
archive.on('error', function(err) {
throw err;
})
// The actual archive is populated here
return Promise
.all(promises)
.then((data) => {
data.map((thisFile, index) => {
archive.append(thisFile.Body, { name: fileNames[index] })
})
archive.finalize()
})
}
2 - The helper method
const uploadFromStream = (s3client) => {
const pass = new stream.PassThrough()
const s3params = {
Bucket: yourBucket,
Key: `${someFolder}/${aFilename}`,
Body: pass,
ContentType: 'application/zip'
}
s3client.upload(s3params, (err, data) => {
if (err)
console.log(err)
if (data)
console.log('Success')
})
return pass
}
The following example takes the accepted answer and makes it work with local files as requested.
const archiver = require("archiver")
const fs = require("fs")
const AWS = require("aws-sdk")
const s3 = new AWS.S3()
const stream = require("stream")
const zipAndUpload = async () => {
const files = [`test1.txt`, `test2.txt`]
const fileNames = [`test1target.txt`, `test2target.txt`]
const archive = archiver("zip", {
zlib: { level: 9 } // Sets the compression level.
})
files.map((thisFile, index) => {
archive.append(fs.createReadStream(thisFile), { name: fileNames[index] })
})
const uploadStream = new stream.PassThrough()
archive.pipe(uploadStream)
archive.finalize()
archive.on("warning", function (err) {
if (err.code === "ENOENT") {
console.log(err)
} else {
throw err
}
})
archive.on("error", function (err) {
throw err
})
archive.on("end", function () {
console.log("archive end")
})
await uploadFromStream(uploadStream)
console.log("all done")
}
const uploadFromStream = async pass => {
const s3params = {
Bucket: "bucket-name",
Key: `streamtest.zip`,
Body: pass,
ContentType: "application/zip"
}
return s3.upload(s3params).promise()
}
zipAndUpload()