I'm still trying to grok my way through streams in general. I have been able to stream a large file using multiparty from within form.on('part'). But I need to defer the invocation and resolve the stream before it's read. I have tried PassThrough, through, and through2, but have gotten different results: mainly it hangs, and I can't figure out what to do or how to debug it. I'm open to all alternatives. Thanks for all insights.
import multiparty from 'multiparty'
import qs from 'qs' // needed for qs.parse() in the close handler
import { PassThrough } from 'stream'
import through from 'through'
import through2 from 'through2'

export function promisedMultiparty(req) {
  return new Promise((resolve, reject) => {
    const form = new multiparty.Form()
    const form_files = []
    let q_str = ''

    form.on('field', (fieldname, value) => {
      if (value) q_str = appendQStr(fieldname, value, q_str) // appendQStr is defined elsewhere
    })

    form.on('part', async (part) => {
      if (part.filename) {
        const pass1 = new PassThrough() // this hangs at 10%

        const pass2 = through(function write(data) { // this hangs from the beginning
          this.queue(data)
        },
        function end() {
          this.queue(null)
        })

        const pass3 = through2() // this hangs at 10%

        /*
        // This way works for large files, but I want to defer
        // invocation
        const form_data = new FormData()
        form_data.append(savepath, part, {
          filename,
        })
        const r = request.post(url, {
          headers: {
            'transfer-encoding': 'chunked'
          }
        }, responseCallback(resolve))
        r._form = form
        */

        form_files.push({
          part: part.pipe(pass1),
          // part: part.pipe(pass2),
          // part: part.pipe(pass3),
        })
      } else {
        part.resume()
      }
    })

    form.on('close', () => {
      resolve({
        fields: qs.parse(q_str),
        forms: form_files,
      })
    })

    form.parse(req)
  })
}
P.S. I'm sure the title could be better; if someone could suggest the proper terms, please do. Thanks.
I believe this is because you are not using through2 correctly, i.e. not actually emptying the buffer once it's full (that's why it hangs at 10% on bigger files but works on smaller ones).
I believe an implementation like this should do it:
const pass2 = through2(function (chunk, encoding, next) {
  // do something with the data

  // Use this only if you want to send the data further to another stream reader
  // Note - From your implementation you don't seem to need it
  // this.push(chunk)

  // This is what tells through2 it's ready to empty the
  // buffer and read more data
  next();
})
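If the data should actually reach a downstream reader rather than being dropped, here is a minimal sketch of the combined form (my assumption, using only the through2 API already shown above):

const pass2 = through2(function (chunk, encoding, next) {
  this.push(chunk) // forward the chunk to whatever later reads from pass2
  next() // release the internal buffer slot so more data can be read
})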
I have a simple script to handle a CSV file of about 10GB. The idea is pretty simple:
Open the file as a stream.
Parse CSV objects from it.
Modify the objects.
Write an output stream to a new file.
I wrote the following code, but it causes a memory leak. I have tried a lot of different things, but nothing helps. The leak disappears if I remove the transformer from the pipe chain, so maybe that is what causes it.
I run the code under Node.js.
Can you help me find where I am wrong?
'use strict';

import fs from 'node:fs';
import {parse, transform, stringify} from 'csv';
import lineByLine from 'n-readlines';

// big input file
const inputFile = './input-data.csv';

// read headers first
const linesReader = new lineByLine(inputFile);
const firstLine = linesReader.next();
linesReader.close();
const headers = firstLine.toString()
  .split(',')
  .map(header => {
    return header
      .replace(/^"/, '')
      .replace(/"$/, '')
      .replace(/\s+/g, '_')
      .replace('(', '_')
      .replace(')', '_')
      .replace('.', '_')
      .replace(/_+$/, '');
  });

// file stream
const fileStream1 = fs.createReadStream(inputFile);
// parser stream
const parserStream1 = parse({delimiter: ',', cast: true, columns: headers, from_line: 1});
// transformer
const transformer = transform(function(record) {
  return Object.assign({}, record, {
    SomeField: 'BlaBlaBla',
  });
});
// stringifier stream
const stringifier = stringify({delimiter: ','});

console.log('Loading data...');

// chain of pipes
fileStream1.on('error', err => { console.log(err); })
  .pipe(parserStream1).on('error', err => { console.log(err); })
  .pipe(transformer).on('error', err => { console.log(err); })
  .pipe(stringifier).on('error', err => { console.log(err); })
  .pipe(fs.createWriteStream('./_data/new-data.csv')).on('error', err => { console.log(err); })
  .on('finish', () => {
    console.log('Loading data finished!');
  });
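As an aside, here is a minimal sketch of the same chain composed with pipeline() from node:stream, which centralizes error handling in a single callback; this is just an alternative composition, not a claimed fix for the leak:

import { pipeline } from 'node:stream';

// Same stages as above; the callback receives the first error from any
// stage, and all streams in the chain are destroyed on failure.
pipeline(
  fileStream1,
  parserStream1,
  transformer,
  stringifier,
  fs.createWriteStream('./_data/new-data.csv'),
  (err) => {
    if (err) console.log(err);
    else console.log('Loading data finished!');
  }
);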
I have been trying to upload a file to Firebase Storage using a callable Firebase Cloud Function.
All I am doing is fetching an image from a URL using axios and trying to upload it to Storage.
The problem I am facing is that I don't know how to save the response from axios and upload it to Storage.
First, how do I save the received file in the temp directory that os.tmpdir() creates?
Then how do I upload it into Storage?
Here I am receiving the data as an arraybuffer, converting it to a Blob, and trying to upload that.
Here is my code. I think I am missing a major part.
If there is a better way, please recommend it. I've been looking through a lot of documentation and ended up with no clear solution. Please guide. Thanks in advance.
const functions = require('firebase-functions'); // added: needed for functions.https
const admin = require('firebase-admin'); // added: needed for admin.storage()
const axios = require('axios'); // added: needed for the axios.get call below
const path = require('path');
const os = require('os');
const fs = require('fs');

const bucket = admin.storage().bucket();

module.exports = functions.https.onCall((data, context) => {
  try {
    return new Promise((resolve, reject) => {
      const {
        imageFiles,
        companyPIN,
        projectId
      } = data;
      const filename = imageFiles[0].replace(/^.*[\\\/]/, '');
      const filePath = `ProjectPlans/${companyPIN}/${projectId}/images/${filename}`; // Path I am trying to upload to in Firebase Storage
      const tempFilePath = path.join(os.tmpdir(), filename);
      const metadata = {
        contentType: 'application/image'
      };
      axios
        .get(imageFiles[0], { // URL for the image
          responseType: 'arraybuffer',
          headers: {
            accept: 'application/image'
          }
        })
        .then(response => {
          console.log(response);
          const blobObj = new Blob([response.data], {
            type: 'application/image'
          });
          return blobObj;
        })
        .then(async blobObj => {
          return bucket.upload(blobObj, {
            destination: tempFilePath // Here I am wrong.. How to set the path of the downloaded blob file?
          });
        }).then(buffer => {
          resolve({ result: 'success' });
        })
        .catch(ex => {
          console.error(ex);
        });
    });
  } catch (error) {
    // unknown: 500 Internal Server Error
    throw new functions.https.HttpsError('unknown', 'Unknown error occurred. Contact the administrator.');
  }
});
I'd take a slightly different approach and avoid using the local filesystem at all, since it's just tmpfs and will cost you memory that your function is using anyway to hold the buffer/blob; it's simpler to avoid it and write directly from that buffer to GCS using the save method on the GCS File object.
Here's an example. I've simplified out a lot of your setup, and I am using an HTTP function instead of a callable. Likewise, I'm using a public Stack Overflow image and not your original URLs. In any case, you should be able to use this template and modify it back to what you need (e.g. change the prototype, remove the HTTP response, and replace it with the return value you need):
const functions = require('firebase-functions');
const axios = require('axios');
const admin = require('firebase-admin');
admin.initializeApp();

exports.doIt = functions.https.onRequest((request, response) => {
  const bucket = admin.storage().bucket();
  const IMAGE_URL = 'https://cdn.sstatic.net/Sites/stackoverflow/company/img/logos/so/so-logo.svg';
  const MIME_TYPE = 'image/svg+xml';
  return axios.get(IMAGE_URL, { // URL for the image
    responseType: 'arraybuffer',
    headers: {
      accept: MIME_TYPE
    }
  }).then(response => {
    console.log(response); // only to show we got the data, for debugging
    const destinationFile = bucket.file('my-stackoverflow-logo.svg');
    return destinationFile.save(response.data).then(() => { // note: defaults to resumable upload
      return destinationFile.setMetadata({ contentType: MIME_TYPE });
    });
  }).then(() => { response.send('ok'); })
    .catch((err) => { console.log(err); });
});
As a commenter noted, in the above example the axios request itself makes an external network access, and you will need to be on the Blaze or Flame plan for that. However, that alone doesn't appear to be your current problem.
Likewise, this also defaults to using a resumable upload, which the documentation does not recommend when you are doing large numbers of small (<10MB) files, as there is some overhead.
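If you do hit that case, a minimal sketch of the non-resumable variant, passing options to save() (the option shape mirrors createWriteStream's, per my reading of the GCS Node.js docs):

// Same upload as above, with the resumable mechanism disabled and the
// content type set in one call instead of a follow-up setMetadata().
return destinationFile.save(response.data, {
  resumable: false,
  metadata: { contentType: MIME_TYPE },
});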
You asked how this might be used to download multiple files. Here is one approach. First, let's assume you have a function that returns a promise that downloads a single file given its filename (I've abridged this from the above, but it's basically identical except for the change of IMAGE_URL to filename -- note that it does not return a final result such as response.send(), and there's sort of an implicit assumption that all the files are the same MIME_TYPE):
function downloadOneFile(filename) {
  const bucket = admin.storage().bucket();
  const MIME_TYPE = 'image/svg+xml';
  return axios.get(filename, ...)
    .then(response => {
      const destinationFile = ...
    });
}
Then you just need to iteratively build a promise chain from the list of files. Let's say they are in imageUrls. Once built, return the entire chain:
let finalPromise = Promise.resolve();
imageUrls.forEach((item) => { finalPromise = finalPromise.then(() => downloadOneFile(item)); });
// if needed, add a final .then() section for the actual function result
return finalPromise.catch((err) => { console.log(err) });
Note that you could also build an array of the promises and pass them to Promise.all() -- that would likely be faster as you would get some parallelism, but I wouldn't recommend that unless you are very sure all of the data will fit inside the memory of your function at once. Even with this approach, you need to make sure the downloads can all complete within your function's timeout.
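For illustration, the Promise.all() variant mentioned above might look like this (same downloadOneFile and imageUrls as before):

// All downloads start at once; Promise.all rejects on the first failure.
return Promise.all(imageUrls.map((item) => downloadOneFile(item)))
  .catch((err) => { console.log(err); });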
I have an application that persists its state on disk. When any state change occurs, it reads the old state from the file, changes the state in memory, and persists it to disk again. But the problem is that the store function writes to disk only after the program closes, and I don't know why.
const load = (filePath) => {
  const fileBuffer = fs.readFileSync(filePath, "utf8");
  return JSON.parse(fileBuffer);
}

const store = (filePath, data) => {
  const contentString = JSON.stringify(data);
  fs.writeFileSync(filePath, contentString);
}
To create a complete example, let's use the load-dataset command, in the file "src/interpreter/index.js".
while (this.isRunning) {
  readLineSync.promptCL({
    "load-dataset": async (type, name, from) => {
      await loadDataset({type, name, from});
    },
    ...
  }, {
    limit: null,
  });
}
In general, this calls loadDataset(), which reads JSON or CSV files.
export const loadDataset = async (options) => {
  switch (options.type) {
    case "csv":
      await readCSVFile(options.from)
        .then(data => {
          app.createDataset(options.name, data);
        });
      break;
    case "json":
      const data = readJSONFile(options.from);
      app.createDataset(options.name, data);
      break;
  }
}
The method createDataset() reads the file on disk, updates it, and writes it again.
createDataset(name, data) {
  const state = loadState();
  state.datasets = [
    ...state.datasets,
    {name, size: data.length}
  ];
  storeState(state);

  const file = loadDataset();
  file.datasets = [
    ...file.datasets,
    {name, data}
  ];
  storeDataset(file);
}
The methods loadState(), storeState(), loadDataset(), and storeDataset() use the initial load/store functions shown above.
const loadState = () =>
  load(stateFilePath);

const storeState = state =>
  store(stateFilePath, state);

...

const loadDataset = () =>
  load(datasetFilePath);

const storeDataset = dataset =>
  store(datasetFilePath, dataset);
I'm using a package from npm called readline-sync to create a simple "terminal"; I don't know if it causes any conflicts.
The source code is on GitHub: Git repo. In the file "index.js", the method createDataset() calls loadState() and storeState(), both of which use the methods shown above.
The package readline-sync is used in the interpreter (the Interpreter file), which basically loops until the exit command.
Just as a note, I'm using Ubuntu 18.04.2 and Node.js 10.15.0. I wrote this code following an example from a YouTube video. That guy is using Mac OS X, and I really hope the operating system won't be the problem.
I'm trying to stream a lot of data from a NodeJS server that fetches the data from Mongo and sends it to React. Since it's quite a lot of data, I've decided to stream it from the server and display it in React as soon as it comes in. Here's a slightly simplified version of what I've got on the server:
const getQuery = async (req, res) => {
  const { body } = req;
  const query = mongoQueries.buildFindQuery(body);
  res.set({ 'Content-Type': 'application/octet-stream' });
  Log.find(query).cursor()
    .on('data', (doc) => {
      console.log(doc);
      const data = JSON.stringify(doc); // was JSON.stringify(result), an undefined variable
      res.write(`${data}\r\n`);
    })
    .on('end', () => {
      console.log('Data retrieved.');
      res.end();
    });
};
Here's the React part:
fetch(url, { // this fetch fires the getQuery function on the backend
  method: "POST",
  body: JSON.stringify(object),
  headers: {
    "Content-Type": "application/json",
  }
})
  .then(response => {
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    const pump = () =>
      reader.read().then(({ done, value }) => {
        if (done) return this.postEndHandler();
        console.log(value.length); // !!!
        const decoded = decoder.decode(value);
        this.display(decoded);
        return pump();
      });
    return pump();
  })
  .catch(err => {
    console.error(err);
    toast.error(err.message);
  });
}
display(chunk) {
  const { data } = this.state;
  try {
    const parsedChunk = chunk.split('\r\n').slice(0, -1);
    parsedChunk.forEach(e => data.push(JSON.parse(e)));
    return this.setState({data});
  } catch (err) {
    throw err;
  }
}
It's 50/50 whether it completes with no issues or fails on React's side of things. When it fails, it's always because of an incomplete JSON object in parsedChunk.forEach. I did some digging, and it turns out that every time it fails, the console.log I marked with three exclamation marks shows 65536. I'm 100% certain it's got something to do with my streams implementation and that I'm not queuing the chunks correctly, but I'm not sure whether I should be fixing it client or server side. Any help would be greatly appreciated.
Instead of implementing your own NDJSON-like streaming JSON protocol, which is basically what you are doing here (with all the pitfalls of chunk and packet boundaries that are not always under your control), you can take a look at some of the existing tools created to do what you need, e.g.:
http://oboejs.com/
http://ndjson.org/
https://www.npmjs.com/package/stream-json
https://www.npmjs.com/package/JSONStream
https://www.npmjs.com/package/clarinet
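For illustration, a minimal sketch (mine, not taken from those libraries) of the carry-buffer technique they implement internally: keep any incomplete trailing line from one chunk and prepend it to the next before parsing.

// Hypothetical client-side helper: JSON.parse only ever sees complete
// '\r\n'-terminated records, regardless of where chunk boundaries fall.
let carry = '';
function handleChunk(decodedChunk, onRecord) {
  const lines = (carry + decodedChunk).split('\r\n');
  carry = lines.pop(); // '' after a complete line, or an incomplete tail
  lines.forEach((line) => {
    if (line) onRecord(JSON.parse(line));
  });
}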
I am new to Node.js and am trying to set up a server where I get the EXIF information from an image. My images are on S3, so I want to be able to just pass in the S3 URL as a parameter and grab the image from it.
I am using the ExifImage project below to get the EXIF info, and according to their documentation:
"Instead of providing a filename of an image in your filesystem you can also pass a Buffer to ExifImage."
How can I load an image into a buffer in Node from a URL so I can pass it to the ExifImage function?
ExifImage project:
https://github.com/gomfunkel/node-exif
Thanks for your help!
Try setting up request like this:
var request = require('request').defaults({ encoding: null });

request.get(s3Url, function (err, res, body) {
  // process exif here
});
Setting encoding to null will cause request to output a buffer instead of a string.
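Connecting that back to the question, a hedged sketch that feeds the buffer to node-exif's documented ExifImage constructor (which accepts a Buffer per the quote in the question):

const ExifImage = require('exif').ExifImage;

request.get(s3Url, function (err, res, body) {
  if (err) return console.error(err);
  // body is a Buffer here because of encoding: null
  new ExifImage({ image: body }, function (error, exifData) {
    if (error) console.error(error.message);
    else console.log(exifData);
  });
});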
Using axios:
const response = await axios.get(url, { responseType: 'arraybuffer' })
const buffer = Buffer.from(response.data) // no encoding argument needed: response.data is already binary
import fetch from "node-fetch";
let fimg = await fetch(image.src)
let fimgb = Buffer.from(await fimg.arrayBuffer())
I was able to solve this only after reading that encoding: null is required and providing it as a parameter to request.
This will download the image from url and produce a buffer with the image data.
Using the request library -
const request = require('request');

let url = 'http://website.com/image.png';

request({ url, encoding: null }, (err, resp, buffer) => {
  // Use the buffer
  // buffer contains the image data
  // typeof buffer === 'object'
});
Note: omitting encoding: null will result in an unusable string, not a buffer. Buffer.from won't work correctly either.
This was tested with Node 8
Use the request library.
request('<s3imageurl>', function(err, response, buffer) {
  // Do something
});
Also, node-image-headers might be of interest to you. It sounds like it takes a stream, so it might not even have to download the full image from S3 in order to process the headers.
Updated with correct callback signature.
Here's a solution that uses the native https library.
import { get } from "https";

function urlToBuffer(url: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const data: Uint8Array[] = [];
    get(url, (res) => {
      res
        .on("data", (chunk: Uint8Array) => {
          data.push(chunk);
        })
        .on("end", () => {
          resolve(Buffer.concat(data));
        })
        .on("error", (err) => {
          reject(err);
        });
    });
  });
}

const imageUrl = "https://i.imgur.com/8k7e1Hm.png";
const imageBuffer = await urlToBuffer(imageUrl);
Feel free to delete the types if you're looking for javascript.
I prefer this approach because it doesn't rely on 3rd party libraries or the deprecated request library.
request is deprecated and should be avoided if possible.
Good alternatives include got (only for node.js) and axios (which also support browsers).
Example of got:
npm install got
Using the async/await syntax:
const got = require('got');

const url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png';

(async () => {
  try {
    const response = await got(url, { responseType: 'buffer' });
    const buffer = response.body;
  } catch (error) {
    console.log(error.body);
  }
})();
You can do it this way:
import axios from "axios";

// note: the original was missing the async keyword, which await requires
async function getFileContentById(
  download_url: string
): Promise<Buffer> {
  const response = await axios.get(download_url, {
    responseType: "arraybuffer",
  });
  return Buffer.from(response.data); // no encoding argument needed for binary data
}
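A hypothetical call site for the helper above (the URL is a placeholder):

const buffer = await getFileContentById("https://example.com/image.png");
console.log(buffer.byteLength); // size of the downloaded file in bytes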