Unzip .gz with zlib & async/await (w/o using streams) - javascript

Since zlib has been added to Node.js, I'd like to ask about unzipping .gz files in async/await style, without using streams, one by one.
In the code below I am using fs-extra instead of the standard fs, and TypeScript instead of JS, but for the answer it doesn't matter whether it uses JS or TS code.
import fs from 'fs-extra';
import path from 'path';
import zlib from 'zlib';

(async () => {
  try {
    // folder which is full of .gz files
    const dir = path.join(__dirname, '..', '..', 'folder');
    const files: string[] = await fs.readdir(dir);
    for (const file of files) {
      // read files one by one
      const
        file_content = fs.createReadStream(`${dir}/${file}`),
        write_stream = fs.createWriteStream(`${dir}/${file.slice(0, -3)}`),
        unzip = zlib.createGunzip();
      file_content.pipe(unzip).pipe(write_stream);
    }
  } catch (e) {
    console.error(e);
  }
})();
For now I have this stream-based code, which works, but in various Stack Overflow answers I haven't found any example with async/await, only this one, and it also uses streams, I guess.
So is it even possible?
//inside async function
const read_file = await fs.readFile(`${dir}/${file}`)
const unzip = await zlib.unzip(read_file);
//write output of unzip to file or console
I understand that this task will block the main thread. That's OK for me, since I'm writing a simple daily schedule script.

Seems I have figured it out, but I'm still not a hundred percent sure about it. Here is an example of the full IIFE:
(async () => {
  try {
    // folder which is full of .gz files
    const dir = path.join(__dirname, '..', '..', 'folder');
    const files: string[] = await fs.readdir(dir);
    // parallel run
    await Promise.all(files.map(async (file: string, i: number) => {
      // let's make sure that we have only .gz files in our scope
      if (file.match(/gz$/g)) {
        const
          buffer = await fs.readFile(`${dir}/${file}`),
          // using .toString() is a must if you want readable data instead of a Buffer;
          // unzipSync is synchronous, so no await is needed here
          data = zlib.unzipSync(buffer, { finishFlush: zlib.constants.Z_SYNC_FLUSH }).toString(),
          // from here, you can write data to a new file, or parse it
          json = JSON.parse(data);
        console.log(json);
      }
    }));
  } catch (e) {
    console.error(e);
  } finally {
    process.exit(0);
  }
})();
If you have many files in one directory, you can use await Promise.all(files.map(file => fn(file))) to run the task in parallel, as shown above. Also, in my case I needed to parse JSON, so keep the nuances of JSON.parse in mind.
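If blocking the main thread is still a concern, the callback-based zlib.unzip can be wrapped with util.promisify, which keeps the async/await style while running decompression off the main thread. A minimal sketch, assuming the same directory layout as above:
import fs from 'fs-extra';
import path from 'path';
import zlib from 'zlib';
import { promisify } from 'util';

// promisified, non-blocking variant of zlib.unzip
const unzipAsync = promisify(zlib.unzip);

(async () => {
  const dir = path.join(__dirname, '..', '..', 'folder');
  const files: string[] = await fs.readdir(dir);
  await Promise.all(files
    .filter(file => file.endsWith('.gz'))
    .map(async file => {
      const buffer = await fs.readFile(`${dir}/${file}`);
      // zlib.unzip does its work in libuv's thread pool, so the event loop stays free
      const data = (await unzipAsync(buffer)).toString();
      console.log(JSON.parse(data));
    }));
})();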

Related

How to handle situation with node.js streams when one stream is dependant on another one?

I am developing bulk upload functionality and ran into this issue.
I want to archive files, and that archive will be uploaded to my server. The archive will also contain a manifest file, which describes each file with various properties, metadata, etc.
The issue occurs when I want to send back the response. The stream which is reading the manifest file is closed, which leads to immediate callback execution. Below are the examples.
const csv = require("fast-csv");
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

async function proccesUpload() {
  const manifestReadStream = fs.createReadStream(
    path.join(__dirname, "manifest.txt")
  );
  manifestReadStream
    .pipe(
      csv.parse({
        delimiter: ";",
      })
    )
    .on("data", async (row) => {
      // do processing for each file described in manifest file
      const hash = crypto.createHash("sha1");
      const rs = fs.createReadStream(targetFile, {
        flags: "r",
        autoClose: true,
      });
      rs.on("data", (data) => hash.update(data, "utf-8"));
      rs.on("close", function onReadStreamClose() {
        // do processing for file
      });
    })
    .on("end", async () => {
      // return response when all formatting was performed
    });
}
By using a nested read stream, the "end" handler is executed before all the files are processed.
How can I solve this?
I recommend using async iterators; they will make the code easier and callback-free:
async function proccesUpload() {
  const manifestReadStream = fs.createReadStream(
    path.join(__dirname, "manifest.txt")
  );
  const parserStream = manifestReadStream.pipe(
    csv.parse({
      delimiter: ";",
    })
  );
  for await (const row of parserStream) {
    // do processing for each file described in manifest file
    const hash = crypto.createHash("sha1");
    const rs = fs.createReadStream(targetFile, {
      flags: "r",
      autoClose: true,
    });
    for await (const data of rs) {
      hash.update(data, "utf-8");
    }
    // DONE PROCESSING THE ROW
  }
  // DONE PROCESSING ALL FILES
  // return response when all formatting was performed
}
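Because this version only returns once every row and every nested file read has finished, the caller can simply await it before sending the response. A rough sketch, assuming an Express-style route (app, req, and res are hypothetical and not part of the original question):
// hypothetical Express route built on the async-iterator version above
app.post("/bulk-upload", async (req, res) => {
  await proccesUpload();          // resolves only after every manifest row is processed
  res.json({ status: "done" });   // now it is safe to return the response
});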

How can I write files in Deno?

I was trying to write to a file using Deno.writeFile
await Deno.writeFile('./file.txt', 'some content')
But got the following cryptic error:
error: Uncaught TypeError: arr.subarray is not a function
at Object.writeAll ($deno$/buffer.ts:212:35)
at Object.writeFile ($deno$/write_file.ts:70:9)
What's the right way to write files in Deno?
There are multiple ways to write a file in Deno; all of them require the --allow-write flag and will throw if an error occurs, so you should handle errors appropriately.
Using Deno.writeFile
This API takes a Uint8Array, not a string, which is the reason you got that error. It also takes an optional WriteFileOptions object.
const res = await fetch('http://example.com/image.png');
const imageBytes = new Uint8Array(await res.arrayBuffer());
await Deno.writeFile('./image.png', imageBytes);
There's also the synchronous API (it blocks the event loop as it does in Node.js).
Deno.writeFileSync('./image.png', imageBytes);
Writing strings
The easiest way is to use Deno.writeTextFile
await Deno.writeTextFile('./file.txt', 'some content');
You can also use Deno.writeFile with TextEncoder.
const encoder = new TextEncoder(); // to convert a string to Uint8Array
await Deno.writeFile('./file.txt', encoder.encode('some content'));
Streaming
Deno.open returns a FsFile, which exposes a WritableStream via its .writable property, so you can pipe a stream directly to it.
const res = await fetch('https://example.com/csv');
const file = await Deno.open('./some.csv', { create: true, write: true })
await res.body.pipeTo(file.writable);
file.close();
If you have a Reader instead of a ReadableStream, you can convert it to a ReadableStream using readableStreamFromReader from std/streams:
import { readableStreamFromReader } from "https://deno.land/std@0.156.0/streams/mod.ts";
// ...
const readable = readableStreamFromReader(someReader);
await readable.pipeTo(file.writable);
Low-level APIs
Using Deno.open and Deno.writeAll (or Deno.writeAllSync)
const file = await Deno.open('./image.png', { write: true, create: true });
/* ... */
await Deno.writeAll(file, imageBytes);
file.close(); // You need to close it!
See OpenOptions here. If you want to append you would do:
{ append: true }
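For example, a minimal sketch of appending (the file name is illustrative, not from the original answer):
const file = await Deno.open('./file.txt', { append: true, create: true });
await file.write(new TextEncoder().encode('one more line\n'));
file.close();
The same works with the higher-level text API, since WriteFileOptions also accepts append:
await Deno.writeTextFile('./file.txt', 'one more line\n', { append: true });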
And you can also use even lower-level APIs such as Deno.write or Writer.write
You can use ensureDir to safely write files to possibly non-existent directories:
import { ensureDir } from "https://deno.land/std@0.54.0/fs/ensure_dir.ts";
ensureDir("./my/dir")
.then(() => Deno.writeTextFile("./my/dir/file.txt", "some content"));
The containing file directory can be derived via dirname:
import { dirname } from "https://deno.land/std@0.54.0/path/mod.ts";
const file = "./my/dir/file.txt";
ensureDir(dirname(file)).then(() => Deno.writeTextFile(file, "some content"));
An alternative is ensureFile to assert file existence:
import { ensureFile } from "https://deno.land/std/fs/ensure_file.ts";
ensureFile(file).then(/* your file write method */)
This variant is slightly less verbose, with the cost of one additional write operation (file creation, if not exists).
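Filling in the placeholder above, a minimal sketch combining ensureFile with Deno.writeTextFile (the path is illustrative):
import { ensureFile } from "https://deno.land/std/fs/ensure_file.ts";

const file = "./my/dir/file.txt";
await ensureFile(file); // creates the file and any missing parent directories
await Deno.writeTextFile(file, "some content");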

Outputting file details using ffprobe in ffmpeg AWS Lambda layer

I am trying to output the details of an audio file with ffmpeg using the ffprobe option, but it is just returning null at the moment. I have added the ffmpeg layer in Lambda. Can anyone spot why this is not working?
const { spawnSync } = require("child_process");
const { readFileSync, writeFileSync, unlinkSync } = require("fs");
const util = require('util');
var fs = require('fs');
let path = require("path");

exports.handler = (event, context, callback) => {
  spawnSync(
    "/opt/bin/ffprobe",
    [
      `var/task/myaudio.flac`
    ],
    { stdio: "inherit" }
  );
};
This is the official AWS Lambda layer I am using; it is a great project but a little lacking in documentation.
https://github.com/serverlesspub/ffmpeg-aws-lambda-layer
First of all, I would recommend using Node.js 8.10 over Node.js 6.10 (it will soon be EOL, although AWS is unclear on how long it will be supported).
Also, I would not use the old style handler with a callback.
A working example is below. Since it downloads a file from the internet (I couldn't be bothered to create a deployment package with the file included), give it a bit more time to run.
const { spawnSync } = require('child_process');
const util = require('util');
var fs = require('fs');
let path = require('path');
const https = require('https');

exports.handler = async (event) => {
  const source_url = 'https://upload.wikimedia.org/wikipedia/commons/b/b2/Bell-ring.flac';
  const target_path = '/tmp/test.flac';

  async function downloadFile() {
    return new Promise((resolve, reject) => {
      const file = fs.createWriteStream(target_path);
      const request = https.get(source_url, function (response) {
        const stream = response.pipe(file);
        stream.on('finish', () => { resolve(); });
      });
    });
  }

  await downloadFile();

  const test = spawnSync('/opt/bin/ffprobe', [
    target_path
  ]);
  console.log(test.output.toString('utf8'));

  const response = {
    statusCode: 200,
    body: JSON.stringify([test.output.toString('utf8')]),
  };
  return response;
};
NB! In production, be sure to generate a unique temporary file name, as the instances a Lambda function runs on are often shared from invocation to invocation; you don't want multiple invocations stepping on each other's files. When done, delete the temporary file, otherwise you might run out of free space on the instance executing your functions. The /tmp folder can hold 512 MB, so it can fill up fast if you work with many large FLAC files.
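A minimal sketch of that advice (the naming scheme is illustrative, not part of the original answer):
const os = require('os');
const path = require('path');
const fs = require('fs');
const crypto = require('crypto');

// unique path per invocation, e.g. /tmp/9f86d081deadbeef-test.flac
const target_path = path.join(os.tmpdir(), `${crypto.randomBytes(8).toString('hex')}-test.flac`);

try {
  // ... download to target_path and run ffprobe on it, as in the handler above ...
} finally {
  // clean up so repeated invocations don't exhaust the 512 MB /tmp space
  if (fs.existsSync(target_path)) fs.unlinkSync(target_path);
}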
I'm not fully familiar with this layer; however, from looking at the git repo of the thumbnail builder, it looks like child_process is a promise, so you should be waiting for its result using .then(), otherwise it returns null because it doesn't wait for the result.
So try something like:
return spawnSync(
  "/opt/bin/ffprobe",
  [
    `var/task/myaudio.flac`
  ],
  { stdio: "inherit" }
).then(result => {
  return result;
})
.catch(error => {
  // handle error
});

Alternative ways to parse local JSON file with functional approach?

Below is the way I usually use to safely parse local JSON data in a Node environment, mostly config files and some other relevant data:
const fs = require('fs')

let localDb
let parsedData

try {
  localDb = fs.readFileSync('./file.json', 'utf8')
  parsedData = JSON.parse(localDb)
} catch (err) {
  throw err
}

exports.data = parsedData
In the end, I export the parsed data from the JavaScript file for usage. While this works perfectly fine, I'm curious to know if there are better ways to do the same thing with a functional approach.
Just wrap your code inside a function and export the return value of that function:
const fs = require('fs')

function parseDBData(name, coding) {
  let localDb;
  let parsedData;
  try {
    localDb = fs.readFileSync(name, coding);
    parsedData = JSON.parse(localDb);
  } catch (err) {
    throw err;
  }
  return parsedData; // without this return, the export below would be undefined
}
exports.data = parseDBData('./file.json', 'utf8');
P.S. With Node you can directly get a JSON file's content through require:
exports.data = require('./file.json');
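Keep in mind that require caches the parsed result per process, so every module importing the file shares one object; a quick illustration:
const a = require('./file.json');
const b = require('./file.json');
console.log(a === b); // true: both requires return the same cached object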

Using Node.JS, how do I read a JSON file into (server) memory?

Background
I am doing some experimentation with Node.js and would like to read a JSON object, either from a text file or a .js file (which is better?), into memory so that I can access that object quickly from code. I realize that there are things like Mongo, Alfred, etc. out there, but that is not what I need right now.
Question
How do I read a JSON object out of a text or js file and into server memory using JavaScript/Node?
Sync:
var fs = require('fs');
var obj = JSON.parse(fs.readFileSync('file', 'utf8'));
Async:
var fs = require('fs');
var obj;
fs.readFile('file', 'utf8', function (err, data) {
  if (err) throw err;
  obj = JSON.parse(data);
});
The easiest way I have found to do this is to just use require and the path to your JSON file.
For example, suppose you have the following JSON file.
test.json
{
  "firstName": "Joe",
  "lastName": "Smith"
}
You can then easily load this in your node.js application using require
var config = require('./test.json');
console.log(config.firstName + ' ' + config.lastName);
Asynchronous is there for a reason! Throws stone at #mihai
Otherwise, here is the code he used with the asynchronous version:
// Declare variables
var fs = require('fs'),
  obj

// Read the file and send to the callback
fs.readFile('path/to/file', handleFile)

// Write the callback function
function handleFile(err, data) {
  if (err) throw err
  obj = JSON.parse(data)
  // You can now play with your data
}
At least in Node v8.9.1, you can just do
var json_data = require('/path/to/local/file.json');
and access all the elements of the JSON object.
Answer for 2022, using ES6 module syntax and async/await
In modern JavaScript, this can be done as a one-liner, without the need to install additional packages:
import { readFile } from 'fs/promises';
let data = JSON.parse(await readFile("filename.json", "utf8"));
Add a try/catch block to handle exceptions as needed.
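For instance, a minimal sketch of the same one-liner with error handling (the file name is illustrative):
import { readFile } from 'fs/promises';

let data;
try {
  data = JSON.parse(await readFile('filename.json', 'utf8'));
} catch (err) {
  // readFile rejects on I/O errors, JSON.parse throws on malformed JSON
  console.error('Could not read or parse filename.json:', err);
}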
In Node 8 you can use the built-in util.promisify() to asynchronously read a file like this
const {promisify} = require('util')
const fs = require('fs')
const readFileAsync = promisify(fs.readFile)

readFileAsync(`${__dirname}/my.json`, {encoding: 'utf8'})
  .then(contents => {
    const obj = JSON.parse(contents)
    console.log(obj)
  })
  .catch(error => {
    throw error
  })
Using the fs-extra package is quite simple:
Sync:
const fs = require('fs-extra')
const packageObj = fs.readJsonSync('./package.json')
console.log(packageObj.version)
Async:
const fs = require('fs-extra')

// inside an async function:
const packageObj = await fs.readJson('./package.json')
console.log(packageObj.version)
Using node-fs-extra (async/await):
const fs = require('fs-extra');

const readJsonFile = async () => {
  const myJsonObject = await fs.readJson('./my_json_file.json');
  console.log(myJsonObject);
}

readJsonFile() // prints your json object
https://nodejs.org/dist/latest-v6.x/docs/api/fs.html#fs_fs_readfile_file_options_callback
var fs = require('fs');

fs.readFile('/etc/passwd', (err, data) => {
  if (err) throw err;
  console.log(data);
});

// options
fs.readFile('/etc/passwd', 'utf8', callback);
https://nodejs.org/dist/latest-v6.x/docs/api/fs.html#fs_fs_readfilesync_file_options
You can find all the usage details in the Node.js File System docs.
Hope this helps!
function parseIt() {
  return new Promise(function (res) {
    try {
      var fs = require('fs');
      const dirPath = 'K:\\merge-xml-junit\\xml-results\\master.json';
      fs.readFile(dirPath, 'utf8', function (err, data) {
        if (err) throw err;
        res(data);
      });
    } catch (err) {
      res(err);
    }
  });
}

async function test() {
  const jsonData = await parseIt();
  var parsedJSON = JSON.parse(jsonData);
  var testSuite = parsedJSON['testsuites']['testsuite'];
  console.log(testSuite);
}

test();
Answer for 2022, using v8 Import assertions
import jsObject from "./test.json" assert { type: "json" };
console.log(jsObject)
Dynamic import
const jsObject = await import("./test.json", { assert: { type: "json" } });
console.log(jsObject.default); // with dynamic import the parsed JSON sits on the default export
Read more at:
v8 Import assertions
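Note that newer V8 and Node.js releases have since renamed import assertions to import attributes, replacing assert with the with keyword; support varies by runtime version, so treat this as a sketch:
import jsObject from "./test.json" with { type: "json" };
console.log(jsObject);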
So many answers, and no one has made a benchmark comparing sync vs. async vs. require. I described the difference between the use cases of reading JSON into memory via require, readFileSync and readFile here.
If you are looking for a complete solution for asynchronously loading a JSON file from a relative path with error handling:
// Global variables
// Require the path module for relative paths
const path = require('path')
// Require the File System module
var fs = require('fs');

// GET request for the /listUsers page
// (router is assumed to be an Express Router defined elsewhere)
router.get('/listUsers', function (req, res) {
  console.log("Got a GET request for list of users");
  // Create a relative path URL
  let reqPath = path.join(__dirname, '../mock/users.json');
  // Read JSON from a path relative to this file
  fs.readFile(reqPath, 'utf8', function (err, data) {
    if (!err) {
      // Handle success
      console.log("Success" + data);
      // Parse data to JSON if needed
      var jsonObj = JSON.parse(data)
      // Send back as response
      res.end(data);
    } else {
      // Handle error
      res.end("Error: " + err)
    }
  });
})
Directory Structure: (screenshot omitted)
