Background
I am trying to read a several-GB file line by line. I want to process each line and then write it to a file. I don't want to (nor can I) put everything into memory.
It is important that the order in which I read the lines is the order in which I write them to the file.
Code
To achieve this I tried using the Node.js readline interface:
const fs = require( "fs" ),
readline = require( "readline" );
const readStream = fs.createReadStream( "./logs/report.csv" );
const writeStream = fs.createWriteStream( "./logs/out.csv", { encoding: "utf8"} );
const rl = readline.createInterface({
    input: readStream,
    output: writeStream,
    terminal: false,
    historySize: 0
});
rl.on( "line", function(line) {
//Do your stuff ...
const transformedLine = line.toUpperCase();
console.log(transformedLine);
//Then write to outstream
rl.write(transformedLine );
});
Problem
As you can see, I am trying to read a line, parse it, and write it into a file called out.csv.
The problem is that the output file is always empty. Nothing is ever written into it.
I have read all the methods, events and options, but clearly I am missing something.
Question
Why is this code not writing into the file?
Answer
With the current code, rl.write() feeds transformedLine back into the Readline interface as input.
This is not what I want. What I should be doing is writing directly to writeStream:
rl.on( "line", function(line) {
console.log(line);
//Do your stuff ...
const transformedLine = line.toUpperCase();
console.log(transformedLine);
//Then write to outstream
writeStream.write( transformedLine );
});
This will produce an output file respecting the order of input.
For a more detailed discussion on the stream mechanics and internal buffers see:
https://github.com/nodejs/help/issues/1292
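One thing to watch out for: writeStream.write() returns false once its internal buffer fills up. A minimal sketch of honoring that backpressure with the same rl and writeStream as above:
rl.on("line", function (line) {
    const transformedLine = line.toUpperCase();

    // readline strips the line break, so add it back; write() returns false once the buffer is full
    if (!writeStream.write(transformedLine + "\n")) {
        rl.pause();                                     // stop emitting 'line' events for a moment
        writeStream.once("drain", () => rl.resume());   // continue once the buffer has flushed
    }
});

rl.on("close", () => writeStream.end());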
I'm quite late to the question, but for anyone who reads this:
If you write on every read and your write speed is slower than your read speed, you will still bloat memory, though not as much as by reading the entire file into memory.
You should use pipe with a stream.Transform instead of readline. The reason is that pipe processes data at the pace of the slowest participant in the flow (backpressure), so it won't bloat memory.
const stream = require('stream');
const fs = require('fs');

const readStream = fs.createReadStream("./logs/report.csv");
const writeStream = fs.createWriteStream("./logs/out.csv");

const transformer = new stream.Transform({
    // buffer is a chunk of the stream, enc is its encoding, done is the callback to signal the transform is finished
    transform(buffer, enc, done) {
        // note: a chunk is not guaranteed to end on a line boundary, so the last "line" of one
        // chunk and the first "line" of the next may really be halves of the same line
        const lines = buffer.toString().split('\n');
        const transformedChunkAsString = lines.map(workYourMagicAndReturnFormattedLine).join('\n');
        const transformedBuffer = Buffer.from(transformedChunkAsString);
        this.push(transformedBuffer);
        done();
    }
});

readStream.pipe(transformer).pipe(writeStream);
Can you try this?
const fs = require( "fs" ),
readline = require( "readline" );
const readStream = fs.createReadStream("./logs/report.csv");
const writeStream = fs.createWriteStream("./logs/out.csv");
readStream.pipe(writeStream);
Related
Passing large amounts of data to stdin fails. If you run this script under Unix, you will get only a portion of the website's output in the terminal:
const cat = Deno.run({
    cmd: ["cat"],
    stdin: "piped"
});

await cat.stdin.write(new Uint8Array(
    await (
        await fetch("https://languagelog.ldc.upenn.edu/nll/?feed=atom")
    ).arrayBuffer()
));

cat.stdin.close();
await cat.status();
The sample feed ends with </feed>, but the piped output gets cut off somewhere in the middle.
Is there a way to circumvent this issue or did I spot a bug?
None other than Ryan Dahl himself answered me:
stdin.write is just one syscall, it returns the number of bytes written. If you use writeAll, I think it would work.
That said, ideally you'd stream large data, rather than buffer it up.
import { readerFromStreamReader } from "https://deno.land/std@0.100.0/io/streams.ts";
const cat = Deno.run({
    cmd: ["cat"],
    stdin: "piped",
});

const res = await fetch("https://languagelog.ldc.upenn.edu/nll/?feed=atom");
let r = readerFromStreamReader(res.body.getReader());
await Deno.copy(r, cat.stdin);
cat.stdin.close();
await cat.status();
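For reference, the writeAll variant from the quote might look like this (a sketch; Deno.writeAll was still a built-in at the time and has since moved to the standard library):
const cat = Deno.run({
    cmd: ["cat"],
    stdin: "piped",
});

const body = new Uint8Array(
    await (
        await fetch("https://languagelog.ldc.upenn.edu/nll/?feed=atom")
    ).arrayBuffer()
);

// unlike a single write() syscall, writeAll keeps writing until every byte has been accepted
await Deno.writeAll(cat.stdin, body);
cat.stdin.close();
await cat.status();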
I am new to JavaScript and need to be able to create, edit, and export an XML document on the server side. I have seen different options on the Internet, but they do not suit me.
I did find one option that seems suitable: converting my XML file to JSON, editing it, converting it back, and then exporting it through another plugin. But maybe there is a simpler way?
Thanks!
I recently came across a similar problem. The solution turned out to be very simple: use xml-writer.
In your project folder, first install it via the console
npm install xml-writer
Next, import it and create a new document; here is what is going on:
var XMLWriter = require('xml-writer');
const xw = new XMLWriter();

xw.startDocument();
xw.startElement('root');
xw.writeAttribute('foo', 'value');
xw.text('Some content');
xw.endDocument();

console.log(xw.toString());
You can find more information here, and at the bottom of that page you can see sample code for each method. In this way you can create, edit, and export XML files. Good luck, and if something is not clear, just ask!
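If you also need to export the result to a file, writing the generated string with fs is enough (a minimal follow-up to the snippet above):
const fs = require('fs');

// xw is the XMLWriter instance built above
fs.writeFileSync('./output.xml', xw.toString());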
Additional
You will also need the fs module:
const fs = require("fs")
const xmlParser = require("xml2json")
const formatXml = require("xml-formatter")
Completed code:
const fs = require("fs")
const xmlParser = require("xml2json")
const formatXml = require("xml-formatter")
var XMLWriter = require('xml-writer');
xw = new XMLWriter;
xw.startDocument();
xw.startElement('root');
xw.startElement('man');
xw.writeElement('name', 'Sergio');
xw.writeElement('adult', 'no');
xw.endElement();
xw.startElement('item');
xw.writeElement('name', 'phone');
xw.writeElement('price', '305.77');
xw.endElement();
xw.endDocument();
const stringifiedXmlObj = JSON.stringify(xmlObj)
const finalXml = xmlParser.toXml(stringifiedXmlObj)
fs.writeFile("./datax.xml", formatXml(finalXml, { collapseContent: true }), function (err, result) {
if (err) {
console.log("Error")
} else {
console.log("Xml file successfully updated.")
}
})
})
I wrote this function to generate PNG files from some SVG files across a few directories. I was doing the below functionality synchronously and it was working as expected (same code as below but with readFileSync), but I was told to redo it using only promisified fs functions.
The current code skips a couple of files in both groupA and groupB, and it is also swapping widths. For example, I've noticed the conversion function won't generate output for svg1 of dirB, but will generate for svg1 of dirA, though with an incorrect width that matches svg1 of dirB.
Most files convert correctly, but a handful don't. My guess is that it's a timing issue, so how do I fix that while keeping the fs functionality fully promisified?
const { createConverter } = require('convert-svg-to-png');
const fs = require('fs');
const path = require('path');
const util = require('util');

const readdir = util.promisify(fs.readdir);
const readFile = util.promisify(fs.readFile);

async function convertSvgFiles(dirPath) {
    const converter = createConverter();
    try {
        const files = await readdir(dirPath);
        for (let file of files) {
            const currentFile = path.join(dirPath, file);
            const fileContents = await readFile(currentFile);
            const fileWidth = fileContents.toString('utf8').match(/* regex capturing the viewBox width */);
            await converter.convertFile(currentFile, { width: fileWidth });
        }
    } catch (err) {
        console.warn('Error while converting a file to png', '\n', err);
    } finally {
        await converter.destroy();
    }
}

['dirA', 'dirB', 'dirC'].map(dir => convertSvgFiles(`src/${dir}`));
Your code looks pretty good. I do not see anything obvious that would cause the behavior you describe: you await each promise within the function, and you are not using global variables.
I'll say that this line:
['dirA', 'dirB', 'dirC'].map(dir => convertSvgFiles(`src/${dir}`));
ends up running your function on all 3 directories in parallel, with 3 converter instances. Assuming there are no parallelism-related bugs in the converter, this should not cause any issues.
But just for grins, try changing that to:
async function run() {
    for (let d of ['dirA', 'dirB', 'dirC']) {
        await convertSvgFiles(`src/${d}`)
    }
}

run()
to force the folders to be scanned sequentially. If this resolves the issue, then there's a bug within convert-svg-to-png.
I am developing a Node.js project. I have a zip file that I want to extract, and then I read one of the files inside the extracted archive.
The problem is that even though I run the extraction before the readFile call (which I invoke in the callback), I always get a "no such file or directory" error, as if the read runs before the extraction has finished. Help!!
This is my code:
var unzip = require('unzip');
const fs = require('fs');
var stream = fs.createReadStream(zipFilePath).pipe(unzip.Extract({ path: 'files/em' }));
stream.on('finish', function () {
    fs.readFileSync('files/em/data.json'); // read the extracted file, but this always runs before the extraction is finished
});
You are not listening to the right event. The 'finish' event fires as soon as the zip data has been written into the unzip stream, before the extraction to disk has completed. You should instead listen for the 'close' event of the Extract stream to be sure the extraction is done.
const unzip = require('unzip');
const fs = require('fs');

const zipFilePath = 'files/data.zip';

let extract = unzip.Extract({ path: 'files/em' });
let stream = fs.createReadStream(zipFilePath).pipe(extract);

extract.on('close', () => {
    let data = fs.readFileSync('files/em/data.json');
    console.log(data.toString()); // print your unzipped json file
});
Note that path should be a directory: unzip.Extract({ path: 'files/em' }).
I have a large json file that looks like that:
[
{"name": "item1"},
{"name": "item2"},
{"name": "item3"}
]
I want to stream this file (pretty easy so far), run an asynchronous function (that returns a promise) for each line, and edit that line when the promise resolves or rejects.
The resulting file could then look like this:
[
{"name": "item1", "response": 200},
{"name": "item2", "response": 404},
{"name": "item3"} // not processed yet
]
I do not wish to create another file, I want to edit on the fly the SAME FILE (if possible!).
Thanks :)
I don't really answer the question here, and I don't think it can be answered in a satisfactory way anyway, so here are my 2 cents.
I assume that you know how to stream line by line, and run the function, and that the only problem you have is editing the file that you are reading from.
Consequences of inserting
It is not possible to natively insert data into the middle of any file (which is what you would be doing by changing the JSON live). A file can only grow at its end.
So inserting 10 bytes of data at the beginning of a 1GB file means that you need to write 1GB to the disk (to move all the data 10 bytes further).
Your filesystem does not understand JSON, and just sees that you are inserting bytes in the middle of a big file so this is going to be very slow.
So, yes, it is possible to do:
Write a wrapper over the file API in Node.js with an insert() method.
Then write some more code to figure out where to insert bytes into a JSON file without loading the whole file, while not producing invalid JSON at the end.
Now, I would not recommend it :) The naive sketch below should make it clear why.
=> Read this question: Is it possible to prepend data to a file without rewriting?
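Purely to illustrate the cost, a naive (hypothetical, not recommended) insert() could look like this:
const fs = require('fs');

// Naive, memory-hungry illustration: a careful implementation would copy the tail
// backwards in chunks, but the amount of data rewritten would stay the same.
function insertSync(filePath, position, text) {
    const fd = fs.openSync(filePath, 'r+');
    const size = fs.fstatSync(fd).size;
    const tail = Buffer.alloc(size - position);
    fs.readSync(fd, tail, 0, tail.length, position);                      // everything after the insertion point...
    const inserted = Buffer.from(text);
    fs.writeSync(fd, inserted, 0, inserted.length, position);             // ...gets pushed back...
    fs.writeSync(fd, tail, 0, tail.length, position + inserted.length);   // ...and rewritten in full
    fs.closeSync(fd);
}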
Why do it then?
I assume that you want to either:
Be able to kill your process at any time, and easily resume work by reading the file again.
Retry partially treated files to fill only the missing bits.
First solution: Use a database
Abstracting the work that needs to be done to live edit files at random places is the sole purpose of existence of databases.
They all exist only to abstract the magic that is behind UPDATE mytable SET name = 'a_longer_name_that_the_name_that_was_there_before' where name = 'short_name'.
Have a look at LevelUP/Down, sqlite, etc...
They will abstract all the magic that needs to be done in your JSON file!
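As a rough sketch of the kind of bookkeeping they give you (assuming the better-sqlite3 package, which is not part of the question):
const Database = require('better-sqlite3');
const db = new Database('./progress.db');

db.prepare('CREATE TABLE IF NOT EXISTS items (name TEXT PRIMARY KEY, response INTEGER)').run();

// register a line as "seen"; response stays NULL until it has been processed
const insert = db.prepare('INSERT OR IGNORE INTO items (name) VALUES (?)');
// record the result once the async call resolves
const update = db.prepare('UPDATE items SET response = ? WHERE name = ?');

insert.run('item1');
update.run(200, 'item1');

// on restart, pick up only the rows that were never finished
const pending = db.prepare('SELECT name FROM items WHERE response IS NULL').all();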
Second solution: Use multiple files
When you stream your file, write two new files!
One that contains the current position in the input file and the lines that need to be retried
The other one contains the expected result
You will also be able to kill your process at any time and restart it.
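A minimal sketch of that idea (processLine and the file names are hypothetical placeholders):
const fs = require('fs');
const readline = require('readline');

const input = fs.createReadStream('./data.json');
const results = fs.createWriteStream('./data.results.jsonl', { flags: 'a' });
const retries = fs.createWriteStream('./data.retries.jsonl', { flags: 'a' });
const rl = readline.createInterface({ input, crlfDelay: Infinity });

let lineNo = 0;
rl.on('line', (line) => {
    lineNo++;
    processLine(line)                                                              // your async function returning a promise
        .then((response) => results.write(JSON.stringify({ line, response }) + '\n'))
        .catch(() => retries.write(line + '\n'));
    // checkpoint how far we have read, so a restart can skip already-seen lines
    fs.writeFile('./data.checkpoint', String(lineNo), () => {});
});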
According to this answer, writing to the same file while reading from it is not reliable. As a commenter there says, it is better to write to a temporary file, then delete the original and rename the temporary file over it.
To create a stream of lines you can use byline. Then for each line, apply some operation and pipe it out to the output file.
Something like this:
var fs = require('fs');
var stream = require('stream');
var util = require('util');
var LineStream = require('byline').LineStream;

function Modify(options) {
    stream.Transform.call(this, options);
}
util.inherits(Modify, stream.Transform);

Modify.prototype._transform = function(chunk, encoding, done) {
    var self = this;
    setTimeout(function() {
        // your modifications here, note that the exact regex depends on
        // your json format and is probably the most brittle part of this
        var modifiedChunk = chunk.toString();
        if (modifiedChunk.search('response:[^,}]+') === -1) {
            modifiedChunk = modifiedChunk
                .replace('}', ', response: ' + new Date().getTime() + '}') + '\n';
        }

        self.push(modifiedChunk);
        done();
    }, Math.random() * 2000 + 1000); // to simulate an async modification
};

var inPath = './data.json';
var outPath = './out.txt';

fs.createReadStream(inPath)
    .pipe(new LineStream())
    .pipe(new Modify())
    .pipe(fs.createWriteStream(outPath))
    .on('close', function() {
        // replace input with output
        fs.unlink(inPath, function() {
            fs.rename(outPath, inPath, function(err) {
                if (err) console.error(err);
            });
        });
    });
Note that the above results in only one async operation happening at a time. You could also save the modifications to an array and once all of them are done write the lines from the array to a file, like this:
var fs = require('fs');
var stream = require('stream');
var LineStream = require('byline').LineStream;

var modifiedLines = [];
var modifiedCount = 0;
var inPath = './data.json';

var allModified = new Promise(function(resolve, reject) {
    fs.createReadStream(inPath).pipe(new LineStream()).on('data', function(chunk) {
        modifiedLines.length++;
        var index = modifiedLines.length - 1;
        setTimeout(function() {
            // your modifications here
            var modifiedChunk = chunk.toString();
            if (modifiedChunk.search('response:[^,}]+') === -1) {
                modifiedChunk = modifiedChunk
                    .replace('}', ', response: ' + new Date().getTime() + '}');
            }
            modifiedLines[index] = modifiedChunk;
            modifiedCount++;
            if (modifiedCount === modifiedLines.length) {
                resolve();
            }
        }, Math.random() * 2000 + 1000);
    });
}).then(function() {
    fs.writeFile(inPath, modifiedLines.join('\n'), function(err) {
        if (err) console.error(err);
    });
}).catch(function(reason) {
    console.error(reason);
});
If, instead of lines, you wish to stream chunks of valid JSON (which would be a more robust approach), take a look at JSONStream.
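For example, a minimal sketch with JSONStream (the added response value here is just illustrative):
const fs = require('fs');
const { Transform } = require('stream');
const JSONStream = require('JSONStream');

const addResponse = new Transform({
    objectMode: true,
    transform(obj, enc, done) {
        obj.response = 200;            // your async work would go here instead
        done(null, obj);
    }
});

fs.createReadStream('./data.json')
    .pipe(JSONStream.parse('*'))       // emits each element of the top-level array as an object
    .pipe(addResponse)
    .pipe(JSONStream.stringify())      // re-assembles a valid JSON array
    .pipe(fs.createWriteStream('./out.json'));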
As mentioned in the comment, the file you have is not proper JSON, although it is valid in JavaScript. In order to generate proper JSON, JSON.stringify() could be used. I also think nonstandard JSON makes life difficult for anyone else who has to parse it, so I would recommend producing a new output file rather than keeping the original one.
However, it is still possible to parse the original file. This can be done via eval('(' + procline + ')'), although it is not secure to take external data into Node.js like this.
const fs = require('fs');
const readline = require('readline');

const fr = fs.createReadStream('file1');
const rl = readline.createInterface({
    input: fr
});

rl.on('line', function (line) {
    if (line.match(/\{\s*"?name/)) { // match lines that open an object, with quoted or unquoted keys
        var procline = "";
        if (line.trim().split('').pop() === ',') {
            procline = line.trim().substring(0, line.trim().length - 1);
        }
        else {
            procline = line.trim();
        }
        var lineObj = eval('(' + procline + ')');
        lineObj.response = 200;
        console.log(JSON.stringify(lineObj));
    }
});
The output would be like this:
{"name":"item1","response":200}
{"name":"item2","response":200}
{"name":"item3","response":200}
This is line-delimited JSON (LDJSON), which can be useful for streaming, without the need for a leading [, a trailing ], or commas between records. There is an ldjson-stream package for it as well.