I wrote this function to generate PNG files from SVG files across a few directories. I originally did this synchronously and it worked as expected (same code as below but with readFileSync), but I was told to redo it using only promisified fs functions.
The current code skips a couple of files in both groupA and groupB, and it's swapping widths. For example, I've noticed the conversion function won't generate a PNG for svg1 of dirB, but will generate one for svg1 of dirA, though with an incorrect width that matches svg1 of dirB.
Most files convert correctly, but a handful don't. My guess is it's a timing issue, so how do I fix that while keeping the fs functionality fully promisified?
const { createConverter } = require('convert-svg-to-png');
const fs = require('fs');
const path = require('path');
const util = require('util');
const readdir = util.promisify(fs.readdir);
const readFile = util.promisify(fs.readFile);
async function convertSvgFiles(dirPath) {
const converter = createConverter();
try {
const files = await readdir(dirPath);
for (let file of files) {
const currentFile = path.join(dirPath, file);
const fileContents = await readFile(currentFile);
const fileWidth = fileContents.toString('utf8').match(/* regex to capture the viewBox width */);
await converter.convertFile(currentFile, { width: fileWidth });
}
} catch (err) {
console.warn('Error while converting a file to png', '\n', err);
} finally {
await converter.destroy();
}
}
['dirA', 'dirB', 'dirC'].map(dir => convertSvgFiles(`src/${dir}`));
Your code looks pretty good. I do not see anything obvious that would cause the behavior you describe: you await each promise within the function, and you are not using global variables.
I'll say that this line:
['dirA', 'dirB', 'dirC'].map(dir => convertSvgFiles(`src/${dir}`));
ends up running your function on all three directories in parallel, with three converter instances. Assuming the converter has no parallelism-related bugs, this should not cause any issues.
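Note, though, that nothing awaits the three promises map produces, so the caller never learns when the conversions finish or whether one of them failed. A minimal sketch that keeps the parallel behavior but waits for all of them:
// Wait for all three directories, preserving the parallel execution.
Promise.all(['dirA', 'dirB', 'dirC'].map(dir => convertSvgFiles(`src/${dir}`)))
  .then(() => console.log('all directories processed'))
  .catch(err => console.error('unexpected failure', err));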
But just for grins, try changing that to:
async function run() {
for (let d of ['dirA', 'dirB', 'dirC']) {
await convertSvgFiles(`src/${d}`)
}
}
run()
to force the folders to be processed sequentially. If this resolves the issue, then there's likely a concurrency bug within convert-svg-to-png.
I am working on a small project; I'll discuss it step by step.
At first I am uploading zip files through multer
then extracting those files (how can I call the extract function after the upload completes, using multer?)
after extracting, I am trying to filter those files
after filtering those files, I want to move some files to another directory
In my main index.js I have:
A simple route to upload files, which is working:
// MAIN API ENDPOINT
app.post("/api/zip-upload", upload, async (req, res, next) => {
console.log("FIles - ", req.files);
});
Continuous checking for whether there is any zip file that needs to be unzipped, but the problem is that after uploading it's not showing any files or directories:
// UNZIP FILES
const dir = `${__dirname}/uploads`;
const files = fs.readdirSync("./uploads");
const filesUnzip = async () => {
try {
if (fs.existsSync(dir)) {
console.log("files - ", files);
for (const file of files) {
console.log("file - ", file);
try {
const extracted = await extract("./uploads/" + file, { dir: __dirname + "/uploads/" });
console.log("Extracted - ",extracted);
// const directories = await fs.statSync(dir + '/' + file).isDirectory();
} catch (bufErr) {
// console.log("------------");
console.log(bufErr.syscall);
}
};
// const directories = await files.filter(function (file) { return fs.statSync(dir + '/' + file).isDirectory(); });
// console.log(directories);
}
} catch (err) {
console.log(err);
}
return;
}
setInterval(() => {
filesUnzip();
}, 2000);
Moving files to the static directory, but here is the same problem: no directory found.
const getAllDirs = async () => {
// console.log(fs.existsSync(dir));
// FIND ALL DIRECTORIES
if (fs.existsSync(dir)) {
const directories = await files.filter(function (file) { return fs.statSync(dir + '/' + file).isDirectory(); });
console.log("Directories - ",directories);
if (directories.length > 0) {
for (let d of directories) {
const subdirFiles = fs.readdirSync("./uploads/" + d);
for (let s of subdirFiles) {
if (s.toString().match(/\.xml$/gm) || s.toString().match(/\.xml$/gm) !== null) {
console.log("-- ", d + "/" + s);
const move = await fs.rename("uploads/" + d + "/" + s, __dirname + "/static/" + s, (err) => { console.log(err) });
console.log("Move - ", move);
}
}
}
}
}
}
setInterval(getAllDirs, 3000);
There are so many issues with your code, I don't know where to begin:
Why are you using fs.xxxSync() methods if all your functions are async? Using xxxSync() methods is highly discouraged because they block the server (i.e. parallel requests can't/won't be accepted while a sync read is in progress). The fs module supports a promise API ...
Your "continuous checking" for new files is always checking the same (probably empty) files array, because it seems you execute files = fs.readdirSync("./uploads"); only once (probably at server start, but I can't tell for sure because there isn't any context for that snippet)
You shouldn't be polling that uploads directory. Because writing a file (if done properly) is an asynchronous process, you may end up reading incomplete files. Instead, you should trigger the unzipping from your endpoint handler. Once it is hit, req.files contains the files that have been uploaded, so you can simply use that array to start any further processing instead of frequently polling a directory.
At some points you are using the callback version of the fs API (for instance fs.rename()). You cannot await a function that expects a callback. Again, use the promise API of fs (see the sketch below).
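For illustration, a minimal sketch of the difference (move, src and dest are hypothetical names):
const fsp = require('fs').promises;
// Callback style: awaiting fs.rename(src, dest, cb) does NOT wait for the rename.
// Promise style: this can genuinely be awaited.
async function move(src, dest) {
  await fsp.rename(src, dest); // resolves once the rename has completed
}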
EDIT
So I'm trying to address your issues. Maybe I can't solve all of them because of missing information, but you should get the general idea.
First of all, you should use the promise API of the fs module. And for path manipulation, you should use the built-in path module, which will take care of some OS-specific issues.
const fs = require('fs').promises;
const path = require('path');
Your API endpoint isn't currently returning anything. I suppose you stripped away some code, but still. Furthermore, you should trigger your file handling from here, so you don't have to do directory polling, which is
error prone,
a waste of resources, and
blocking the server if done synchronously, as you do.
app.post("/api/zip-upload", upload, async (req, res, next) => {
console.log("FIles - ", req.files);
//if you want to return the result only after the files have been
//processed use await
await handleFiles(req.files);
//if you want to return to the client immediately and process files
//skip the await
//handleFiles(req.files);
res.sendStatus(200);
});
Handling the files seems to consist of two different steps:
unzipping the uploaded zip files
copying some of the extracted files into another directory
const source = path.join(".", "uploads");
const target = path.join(__dirname, "uploads");
const statics = path.join(__dirname, "statics");
const handleFiles = async (files) => {
//a random folder, which will be unique for this upload request
const tmpfolder = path.join(target, `tmp_${new Date().getTime()}`);
//create this folder
await fs.mkdir(tmpfolder, {recursive: true});
//extract all uploaded files to the folder
//(note: multer's req.files holds file objects, so you may need f.filename instead of f)
//this will wait for a list of promises and resolve once all of them have resolved
await Promise.all(files.map(f => extract(path.join(source, f), { dir: tmpfolder })));
await copyFiles(tmpfolder);
//you probably should delete the uploaded zipfiles and the tempfolder
//after they have been handled
await Promise.all(files.map(f => fs.unlink(path.join(source, f))));
await fs.rmdir(tmpfolder, { recursive: true});
}
const copyFiles = async (tmpfolder) => {
//get all files/directory names in the tmpfolder
const allfiles = await fs.readdir(tmpfolder);
//get their stats
const stats = await Promise.all(allfiles.map(f => fs.stat(path.join(tmpfolder, f))));
//filter directories only
const dirs = allfiles.filter((_, i) => stats[i].isDirectory());
for (let d of dirs) {
//read all filenames in the subdirectory
const files = await fs.readdir(path.join(tmpfolder, d));
//filter by extension .xml
const xml = files.filter(x => path.extname(x) === ".xml");
//move all xml files
await Promise.all(xml.map(f => fs.rename(path.join(tmpfolder, d, f), path.join(statics, f))));
}
}
That should do the trick. Of course, you may notice there is no error handling in this code; you should add that.
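For instance, a minimal sketch of error handling at the endpoint, reusing the route from above and delegating failures to Express's error middleware:
app.post("/api/zip-upload", upload, async (req, res, next) => {
  try {
    await handleFiles(req.files);
    res.sendStatus(200);
  } catch (err) {
    next(err); // let the Express error handler respond
  }
});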
And I'm not 100% sure about your paths. You should consider the following
./uploads refers to a directory uploads in the current working directory (whereever that may be)
${__dirname}/uploads refers to a directory uploads which is in the same directory as the currently executing script file. Not sure if that is the directory you want ...
./uploads and ${__dirname}/uploads may point to the same folder or to completely different folders. No way knowing that without additional context.
Furthermore, in your code you extract the ZIP files from ./uploads to ${__dirname}/uploads and then later try to copy XML files from ./uploads/xxx to ${__dirname}/statics, but there won't be any directory xxx in ./uploads, because you extracted the ZIP files to a (probably) completely different folder.
I am struggling to run an async function, taken from a Google example, alongside the environment variables on Windows 10. I have created a bucket on GCS and uploaded my .raw file.
I then created a .env file which contains the following:
HOST=localhost
PORT=3000
GOOGLE_APPLICATION_CREDENTIALS=GDeveloperKey.json
Doing this in AWS Lambda is just a case of wrapping the code within exports.handler = async (event, context, callback) => {
How can I emulate the same locally on Windows 10?
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');
// Creates a client
const client = new speech.SpeechClient();
/**
* TODO(developer): Uncomment the following lines before running the sample.
*/
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';
const config = {
encoding: encoding,
sampleRateHertz: sampleRateHertz,
languageCode: languageCode,
};
const audio = {
uri: gcsUri,
};
const request = {
config: config,
audio: audio,
};
// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
const [operation] = await client.longRunningRecognize(request);
// Get a Promise representation of the final result of the job
const [response] = await operation.promise();
const transcription = response.results
.map(result => result.alternatives[0].transcript)
Wrap your await statements in an immediately-invoked async function.
Ex:
(async () => {
// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
const [operation] = await client.longRunningRecognize(request);
// Get a Promise representation of the final result of the job
const [response] = await operation.promise();
const transcription = response.results
.map(result => result.alternatives[0].transcript)
})();
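As for the .env file mentioned in the question: assuming the dotenv package is what loads it, requiring it once at the top of the script makes GOOGLE_APPLICATION_CREDENTIALS visible to the client library:
// Load variables from .env into process.env (assumes the dotenv package is installed)
require('dotenv').config();
const speech = require('@google-cloud/speech');
// The client reads GOOGLE_APPLICATION_CREDENTIALS from process.env
const client = new speech.SpeechClient();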
I've got a script that synchronously installs non-built-in modules at startup, which looks like this:
const cp = require('child_process')
function requireOrInstall (module) {
try {
require.resolve(module)
} catch (e) {
console.log(`Could not resolve "${module}"\nInstalling`)
cp.execSync(`npm install ${module}`)
console.log(`"${module}" has been installed`)
}
console.log(`Requiring "${module}"`)
try {
return require(module)
} catch (e) {
console.log(require.cache)
console.log(e)
}
}
const http = require('http')
const path = require('path')
const fs = require('fs')
const ffp = requireOrInstall('find-free-port')
const express = requireOrInstall('express')
const socket = requireOrInstall('socket.io')
// List goes on...
When I uninstall modules, they get installed successfully when I start the server again, which is what I want. However, the script starts throwing Cannot find module errors when I uninstall the first or first two modules of the list that use the function requireOrInstall. That's right, the errors only occur when the script has to install either the first or the first two modules, not when only the second module needs installing.
In this example, the error will be thrown when I uninstall find-free-port, unless I move its require at least one spot down ¯\_(• _ •)_/¯
I've also tried adding a delay directly after the synchronous install to give it a little more breathing time with the following two lines:
var until = new Date().getTime() + 1000
while (new Date().getTime() < until) {}
The pause was there. It didn't fix anything.
@velocityzen came up with the idea to check the cache, which I've now added to the script. It doesn't show anything out of the ordinary.
@vaughan's comment on another question noted that this exact error occurs when requiring a module twice. I've changed the script to use require.resolve(), but the error still remains.
Does anybody know what could be causing this?
Edit
Since the question has been answered, I'm posting the one-liner (139 characters!). It doesn't globally define child_process, has no final try-catch and doesn't log anything to the console:
const req=async m=>{let r=require;try{r.resolve(m)}catch(e){r('child_process').execSync('npm i '+m);await setImmediate(()=>{})}return r(m)}
The name of the function is req() and it can be used as in @alex-rokabilis' answer.
It seems that a require call right after an npm install needs a certain delay.
The problem is also worse on Windows: there it will always fail if the module needs to be npm installed.
It's as if, at a specific moment, a snapshot is taken of which modules can be required and which cannot; that's probably why require.cache was mentioned in the comments. Nevertheless, I suggest you check the two following solutions.
1) Use a delay
const cp = require("child_process");
const requireOrInstall = async module => {
try {
require.resolve(module);
} catch (e) {
console.log(`Could not resolve "${module}"\nInstalling`);
cp.execSync(`npm install ${module}`);
// Use one of the two awaits below
// The first one waits 1000 milliseconds
// The other waits until the next event cycle
// Both work
await new Promise(resolve => setTimeout(() => resolve(), 1000));
await new Promise(resolve => setImmediate(() => resolve()));
console.log(`"${module}" has been installed`);
}
console.log(`Requiring "${module}"`);
try {
return require(module);
} catch (e) {
console.log(require.cache);
console.log(e);
}
}
const main = async() => {
const http = require("http");
const path = require("path");
const fs = require("fs");
const ffp = await requireOrInstall("find-free-port");
const express = await requireOrInstall("express");
const socket = await requireOrInstall("socket.io");
}
main();
await always needs a promise to work with, but it's not necessary to create one explicitly: await wraps whatever it is waiting for in a promise if it isn't handed one.
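For example, inside an async function both of these resolve to the same value, because the plain number is wrapped in an already-resolved promise:
async function demo() {
  const a = await Promise.resolve(42); // explicitly a promise
  const b = await 42;                  // wrapped automatically
  console.log(a === b);                // true
}
demo();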
2) Use a cluster
const cp = require("child_process");
function requireOrInstall(module) {
try {
require.resolve(module);
} catch (e) {
console.log(`Could not resolve "${module}"\nInstalling`);
cp.execSync(`npm install ${module}`);
console.log(`"${module}" has been installed`);
}
console.log(`Requiring "${module}"`);
try {
return require(module);
} catch (e) {
console.log(require.cache);
console.log(e);
process.exit(1007);
}
}
const cluster = require("cluster");
if (cluster.isMaster) {
cluster.fork();
cluster.on("exit", (worker, code, signal) => {
if (code === 1007) {
cluster.fork();
}
});
} else if (cluster.isWorker) {
// The real work here for the worker
const http = require("http");
const path = require("path");
const fs = require("fs");
const ffp = requireOrInstall("find-free-port");
const express = requireOrInstall("express");
const socket = requireOrInstall("socket.io");
process.exit(0);
}
The idea here is to re-run the process in case of a missing module. This way we fully reproduce a manual npm install, so, as you'd guess, it works! It also feels more synchronous than the first option, but is a bit more complex.
I think your best option is either:
(ugly) to install the package globally, instead of locally
(probably best) to define your own package repository location, used both when installing and when requiring
First, you may consider using the npm-programmatic package.
Then, you may define your repository path with something like:
const PATH='/tmp/myNodeModuleRepository';
Then, replace your installation instruction with something like:
const npm = require('npm-programmatic');
npm.install(`${module}`, {
cwd: PATH,
save: true
});
Eventually, replace your fallback require instruction with something like the following (note that require itself takes no options; the paths option belongs to require.resolve):
return require(require.resolve(module, { paths: [ PATH ] }));
If it is still not working, you may update the require.cache variable, for instance to invalide a module, you can do something like:
delete require.cache[process.cwd() + '/node_modules/bluebird/js/release/bluebird.js'];
You may need to update it manually, to add information about your new module, before loading it.
cp.exec is an async call (unlike cp.execSync, which is synchronous and ignores callbacks), so try checking whether the module is installed in its callback function. I have tried it; installation is clean now:
const cp = require('child_process')
function requireOrInstall (module) {
try {
require.resolve(module)
} catch (e) {
console.log(`Could not resolve "${module}"\nInstalling`)
cp.exec(`npm install ${module}`, () => {
console.log(`"${module}" has been installed`)
try {
return require(module)
} catch (e) {
console.log(require.cache)
console.log(e)
}
})
}
console.log(`Requiring "${module}"`)
}
const http = require('http')
const path = require('path')
const fs = require('fs')
const ffp = requireOrInstall('find-free-port')
const express = requireOrInstall('express')
const socket = requireOrInstall('socket.io')
(Screenshots of the console output, with and without node_modules already present, were attached here.)
I'm trying to make an app that searches all files under the current directory and its subdirectories for ones containing a specified string.
As I understand it, that means I need to create a read stream, loop over it, load the read data into an array, and if the word is found print __filename and __dirname; if not, print a message.
Unfortunately, I could not make it work...
Any clue?
var path = require('path'),
fs=require('fs');
function fromDir(startPath,filter,ext){
if (!fs.existsSync(startPath)){
console.log("no dir ",startPath);
return;
};
var files=fs.readdirSync(startPath);
let found = files.find((file) => {
let thisFilename = path.join(startPath, file);
let stat = fs.lstatSync(thisFilename);
var readStream = fs.createReadStream(fs);
var readline = require('readline');
if (stat.isDirectory()) {
fromDir(thisFilename, filename,readline, ext);
} else {
if (path.extname(createReadStream) === ext && path.basename(thisFilename, ext) === filename) {
return true;
}
}
});
console.log('-- your word has found on : ',filename,__dirname);
}
if (!found) {
console.log("Sorry, we didn't find your term");
}
}
fromDir('./', process.argv[3], process.argv[2]);
Because not everything was included in the question, I made an assumption:
We are looking for full words (if that's not the case, replace the regex with a simple indexOf()).
Now, I've split the code into two functions - to make it both more readable and easier to recursively find the files.
Synchronous version:
const path = require('path');
const fs = require('fs');
function searchFilesInDirectory(dir, filter, ext) {
if (!fs.existsSync(dir)) {
console.log(`Specified directory: ${dir} does not exist`);
return;
}
const files = getFilesInDirectory(dir, ext);
files.forEach(file => {
const fileContent = fs.readFileSync(file);
// We want full words, so we use full word boundary in regex.
const regex = new RegExp('\\b' + filter + '\\b');
if (regex.test(fileContent)) {
console.log(`Your word was found in file: ${file}`);
}
});
}
// Using recursion, we find every file with the desired extension, even if it's deeply nested in subfolders.
function getFilesInDirectory(dir, ext) {
if (!fs.existsSync(dir)) {
console.log(`Specified directory: ${dir} does not exist`);
return;
}
let files = [];
fs.readdirSync(dir).forEach(file => {
const filePath = path.join(dir, file);
const stat = fs.lstatSync(filePath);
// If we hit a directory, apply our function to that dir. If we hit a file, add it to the array of files.
if (stat.isDirectory()) {
const nestedFiles = getFilesInDirectory(filePath, ext);
files = files.concat(nestedFiles);
} else {
if (path.extname(file) === ext) {
files.push(filePath);
}
}
});
return files;
}
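For example, to search all .txt files under the current directory for the word james (the same example arguments used further below):
searchFilesInDirectory('./', 'james', '.txt');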
Async version - because async is cool:
const path = require('path');
const fs = require('fs');
const util = require('util');
const fsReaddir = util.promisify(fs.readdir);
const fsReadFile = util.promisify(fs.readFile);
const fsLstat = util.promisify(fs.lstat);
async function searchFilesInDirectoryAsync(dir, filter, ext) {
const found = await getFilesInDirectoryAsync(dir, ext);
for (const file of found) {
const fileContent = await fsReadFile(file);
// We want full words, so we use full word boundary in regex.
const regex = new RegExp('\\b' + filter + '\\b');
if (regex.test(fileContent)) {
console.log(`Your word was found in file: ${file}`);
}
};
}
// Using recursion, we find every file with the desired extension, even if it's deeply nested in subfolders.
async function getFilesInDirectoryAsync(dir, ext) {
let files = [];
const filesFromDirectory = await fsReaddir(dir).catch(err => {
throw new Error(err.message);
});
for (let file of filesFromDirectory) {
const filePath = path.join(dir, file);
const stat = await fsLstat(filePath);
// If we hit a directory, apply our function to that dir. If we hit a file, add it to the array of files.
if (stat.isDirectory()) {
const nestedFiles = await getFilesInDirectoryAsync(filePath, ext);
files = files.concat(nestedFiles);
} else {
if (path.extname(file) === ext) {
files.push(filePath);
}
}
};
return files;
}
If you have not worked with or don't yet understand async/await, it is a great step to take and learn as soon as possible. Trust me, you will love not seeing those ugly callbacks again!
UPDATE:
As you pointed out in the comments, you want to execute the function after running the node process on the file. You also want to pass the function's parameters as node's arguments.
To do that, at the end of your file, you need to add:
searchFilesInDirectory(process.argv[2], process.argv[3], process.argv[4]);
This extracts our arguments and passes them to the function.
With that, you can call our process like so (example arguments):
node yourscriptname.js ./ james .txt
Personally, if I were to write this, I would leverage the beauty of asynchronous code, and Node.js's async / await.
As a side note:
You can easily improve readability of your code, if you add proper formatting to it. Don't get me wrong, it's not terrible - but it can be improved:
Use spaces OR newlines after commas.
Use spaces around equality operators and arithmetic operators.
As long as you are consistent with formatting, everything looks much better.
Imagine you have many long text files, and you need to extract data only from the first line of each one (without reading any further content). What is the best way to do this in Node.js?
Thanks!
I ended up adopting this solution, which seems the most performant I've seen so far:
var fs = require('fs');
var Q = require('q');
function readFirstLine (path) {
return Q.promise(function (resolve, reject) {
var rs = fs.createReadStream(path, {encoding: 'utf8'});
var acc = '';
var pos = 0;
var index;
rs
.on('data', function (chunk) {
index = chunk.indexOf('\n');
acc += chunk;
// Stop reading as soon as a newline appears; until then, track how many
// characters precede the current chunk so the line can be sliced out on 'close'.
index !== -1 ? rs.close() : pos += chunk.length;
})
.on('close', function () {
resolve(acc.slice(0, pos + index));
})
.on('error', function (err) {
reject(err);
})
});
}
I created an npm module for convenience, named "firstline".
Thanks to @dandavis for the suggestion to use String.prototype.slice()!
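Usage of the module is then a one-liner (a sketch, assuming it exports a single function that returns a promise for the first line):
const firstline = require('firstline');
firstline('./my-file.txt').then(line => console.log(line));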
There's a built-in module almost made for this case: readline. It avoids messing with chunks and so forth. The code would look like the following:
const fs = require('fs');
const readline = require('readline');
async function getFirstLine(pathToFile) {
const readable = fs.createReadStream(pathToFile);
const reader = readline.createInterface({ input: readable });
const line = await new Promise((resolve) => {
reader.on('line', (line) => {
reader.close();
resolve(line);
});
});
readable.close();
return line;
}
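Usage would then look like this (the path is just an example):
getFirstLine('./package.json').then(line => console.log(line));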
I know this doesn't exactly answer the question, but for those who are looking for a READABLE and simple way to do so:
const fs = require('fs').promises;
async function getFirstLine(filePath) {
const fileContent = await fs.readFile(filePath, 'utf-8');
return (fileContent.match(/(^.*)/) || [])[1] || '';
}
NOTE:
naturally, this will only work with text files, which I assumed you were using based on your description
this will work with empty files and will return an empty string
this regexp is very performant since it is simple (no OR conditions or complex matches) and only reads the first line
Please try this:
https://github.com/yinrong/node-line-stream-util#get-head-lines
It unpipes the upstream once it has got the head lines.
Node.js >= 16
In all current versions of Node.js, readline.createInterface can be used as an async iterable, to read a file line by line - or just for the first line. This is also safe to use with empty files.
Unfortunately, the error handling logic is broken in versions of Node.js before 16, where certain file system errors may go uncaught even if the code is wrapped in a try-catch block because of the way asynchronous errors are propagated in streams. So I recommend using this method only in Node.js >= 16.
import { createReadStream } from "fs";
import { createInterface } from "readline";
async function readFirstLine(path) {
const inputStream = createReadStream(path);
try {
for await (const line of createInterface(inputStream)) return line;
return ''; // If the file is empty.
}
finally {
inputStream.destroy(); // Destroy file stream.
}
}
const firstLine = await readFirstLine("path/to/file");
//Here you go;
var lineReader = require('line-reader');
var async = require('async');
exports.readManyFiles = function(files) {
  async.map(files,
    function(file, callback) {
      lineReader.open(file, function(reader) {
        if (reader.hasNextLine()) {
          reader.nextLine(function(line) {
            callback(null, line);
          });
        }
      });
    },
    function(err, allLines) {
      //do whatever you want to with the lines
    });
}