Where is the memory leak in my JavaScript (Node.js) code?

I have a simple script to process a 10 GB CSV file. The idea is pretty simple:
Open the file as a stream.
Parse CSV objects from it.
Modify the objects.
Pipe the output stream to a new file.
I wrote the following code, but it causes a memory leak. I have tried a lot of different things, but nothing helps. The memory leak disappears if I remove the transformer from the pipes, so maybe that is what causes it.
I run the code under Node.js.
Can you help me find where I am wrong?
'use strict';

import fs from 'node:fs';
import {parse, transform, stringify} from 'csv';
import lineByLine from 'n-readlines';

// big input file
const inputFile = './input-data.csv';

// read headers first
const linesReader = new lineByLine(inputFile);
const firstLine = linesReader.next();
linesReader.close();

const headers = firstLine.toString()
  .split(',')
  .map(header => {
    return header
      .replace(/^"/, '')
      .replace(/"$/, '')
      .replace(/\s+/g, '_')
      .replace('(', '_')
      .replace(')', '_')
      .replace('.', '_')
      .replace(/_+$/, '');
  });

// file stream
const fileStream1 = fs.createReadStream(inputFile);
// parser stream
const parserStream1 = parse({delimiter: ',', cast: true, columns: headers, from_line: 1});
// transformer
const transformer = transform(function(record) {
  return Object.assign({}, record, {
    SomeField: 'BlaBlaBla',
  });
});
// stringifier stream
const stringifier = stringify({delimiter: ','});

console.log('Loading data...');

// chain of pipes
fileStream1.on('error', err => { console.log(err); })
  .pipe(parserStream1).on('error', err => { console.log(err); })
  .pipe(transformer).on('error', err => { console.log(err); })
  .pipe(stringifier).on('error', err => { console.log(err); })
  .pipe(fs.createWriteStream('./_data/new-data.csv')).on('error', err => { console.log(err); })
  .on('finish', () => {
    console.log('Loading data finished!');
  });

Related

Electron nedb-promises storage file gets replaced on every app start

I've been trying for a few hours now to understand why this happens. In my Electron app I would like to use the nedb-promises package ("nedb-promises": "^6.2.1"). Installation and configuration work so far, but on every app start (dev & prod) the db file gets replaced by a new, empty one. Shouldn't the package handle that?
I took the code from this example:
https://shivekkhurana.medium.com/persist-data-in-electron-apps-using-nedb-5fa35500149a
// db.js
const {app} = require('electron');
const Datastore = require('nedb-promises');

const dbFactory = (fileName) => Datastore.create({
  filename: `${process.env.NODE_ENV === 'development' ? '.' : app.getPath('userData')}/data/${fileName}`,
  timestampData: true,
  autoload: true
});

const db = {
  customers: dbFactory('customers.db'),
  tasks: dbFactory('tasks.db')
};

module.exports = db;
import db from './db'
....
// load task should not be important because file is already replaced when arriving here
ipcMain.handle('Elements:Get', async (event, args) => {
  // 'Select * from Customers
  let data = await db.customers.find({});
  console.log(data);
  return data;
})

...

// Set an item
ipcMain.handle('Element:Save', async (event, data) => {
  console.log(data)
  const result = db.customers.insertOne(data.item).then((newDoc) => {
    console.log(newDoc)
    return newDoc
  }).catch((err) => {
    console.log("Error while Adding")
    console.log(err)
  });
  console.log(result);
  return result;
})
Note: after "adding", newDoc contains the new element, and when I check the file manually in the filesystem, the item is there. When I then close the app and open it again, the file gets replaced.
I've checked the docs up and down and have no clue what I'm doing wrong. Thanks for your help.

Is it possible to read/decode .avro uInt8Array Container File with Javascript in Browser?

I'm trying to decode an .avro file loaded from a web server.
Since the string version of the Uint8Array starts with
"buffer from S3 Objavro.schema�{"type":"record","name":"Destination",..."
I assume it's an Avro container file.
I found 'avro.js' and 'avsc' as tools for working with the .avro format in JavaScript, but reading the documentation it sounds like decoding a container file is only possible in Node.js, not in the browser.
(The FileDecoder/Encoder methods take a path to a file as a string, not a Uint8Array.)
Am I getting this wrong, or is there an alternative way to decode an .avro container file in the browser with JavaScript?
Luckily I found a way using avsc with browserify:
avro.createBlobDecoder(blob, [{options}])
[avro.js - before browserifying]
var avro = require('avsc');

const AVRO = {
  decodeBlob(blob) {
    let schema, columnTitles, columnTypes, records = []
    return new Promise((resolve) => {
      avro.createBlobDecoder(blob, {
        // noDecode: true
      })
        .on('metadata', (s) => {
          schema = s
          columnTitles = schema.fields.map(f => f.name)
          columnTypes = schema.fields.map(f => f.type)
        })
        .on('data', (data) => {
          records.push(data)
        })
        .on('finish', () => {
          resolve({
            columnTitles: columnTitles,
            columnTypes: columnTypes,
            records: records
          })
        })
    })
  }
}

module.exports = AVRO
[package.json]
"scripts": {
"avro": "browserify public/avro.js --s AVRO > public/build/avro.js"
}
[someOtherFile.js]
//s3.getObject => uInt8array
const blob = new Blob([uInt8array]) //arr in brackets !important
const avroDataObj = await AVRO.decodeBlob(blob)
Thanks for posting!
Below is my integration of avro/binary with axios, in case it helps anyone else trying to implement browser side decoding:
[before browserifying]
const axios = require('axios')
const avro = require('avsc')

const config = {
  responseType: 'blob'
};

const url = 'https://some-url.com'

axios.get(url, config)
  .then(res => {
    avro.createBlobDecoder(res.data)
      .on('metadata', (type) => console.log(type))
      .on('data', (record) => console.log(record))
  })
  .catch(e => console.error(e))

npm package csvtojson CSV Parse Error: Error: unclosed_quote

Node version: v10.19.0
Npm version: 6.13.4
Npm package csvtojson Package Link
csvtojson({
  "delimiter": ";",
  "fork": true
})
  .fromStream(fileReadStream)
  .subscribe((dataObj) => {
    console.log(dataObj);
  }, (err) => {
    console.error(err);
  }, (success) => {
    console.log(success);
  });
While trying to handle a large CSV file (about 1.3 million records), I face the error "CSV Parse Error: Error: unclosed_quote." after a certain number of records (e.g. after 400+ records) have been processed successfully. Looking at the CSV file, I don't see any problems with the data formatting, but the parser might be raising this error because a "\n" character is found inside a column/field value.
Is there a solution already available with this package? Or
is there a workaround to handle this error? Or
is there a way to skip CSV rows that have any sort of error, not only this one, so that the entire CSV-to-JSON parsing can finish without getting stuck?
Any help will be much appreciated.
I've played around with this, and it's possible to hook into this using a CSV file line hook (csv-file-line-hook); you can check for invalid lines and either repair them or simply invalidate them.
The example below will simply skip the invalid lines (missing end quotes)
example.js
const fs = require("fs");
let fileReadStream = fs.createReadStream("test.csv");
let invalidLineCount = 0;
const csvtojson = require("csvtojson");

csvtojson({ "delimiter": ";", "fork": true })
  .preFileLine((fileLineString, lineIdx) => {
    let invalidLinePattern = /^['"].*[^"'];/;
    if (invalidLinePattern.test(fileLineString)) {
      console.log(`Line #${lineIdx + 1} is invalid, skipping:`, fileLineString);
      fileLineString = "";
      invalidLineCount++;
    }
    return fileLineString
  })
  .fromStream(fileReadStream)
  .subscribe((dataObj) => {
    console.log(dataObj);
  },
  (err) => {
    console.error("Error:", err);
  },
  (success) => {
    console.log("Skipped lines:", invalidLineCount);
    console.log("Success");
  });
test.csv
Name;Age;Profession
Bob;34;"Sales,Marketing"
Sarah;31;"Software Engineer"
James;45;Driver
"Billy, ;35;Manager
"Timothy;23;"QA
This regex works better:
/^(?:[^"\\]|\\.|"(?:\\.|[^"\\])*")*$/g
Here is a more complex working script for big files that reads the file line by line:
import csv from 'csvtojson'
import fs from 'fs-extra'
import lineReader from 'line-reader'
import { __dirname } from '../../../utils.js'

const CSV2JSON = async(dumb, editDumb, headers, {
  options = {
    trim: true,
    delimiter: '|',
    quote: '"',
    escape: '"',
    fork: true,
    headers: headers
  }
} = {}) => {
  try {
    log(`\n\nStarting CSV2JSON - Current directory: ${__dirname()} - Please wait..`)
    await new Promise((resolve, reject) => {
      let firstLine, counter = 0
      lineReader.eachLine(dumb, async(line, last) => {
        counter++
        // log(`line before convert: ${line}`)
        let json = (
          await csv(options).fromString(headers + '\n\r' + line)
            .preFileLine((fileLineString, lineIdx) => {
              // if it is not the first line
              // eslint-disable-next-line max-len
              if (counter !== 1 && !fileLineString.match(/^(?:[^"\\]|\\.|"(?:\\.|[^"\\])*")*$/g)) {
                // eslint-disable-next-line max-len
                console.log(`Line #${lineIdx + 1} is invalid. It has unescaped quotes. We will skip this line.. Invalid Line: ${fileLineString}`)
                fileLineString = ''
              }
              return fileLineString
            })
            .on('error', e => {
              e = `Error while converting CSV to JSON.
Line before convert: ${line}
Error: ${e}`
              throw new BaseError(e)
            })
        )[0]
        // log(`line after convert: ${json}`)
        if (json) {
          json = JSON.stringify(json).replace(/\\"/g, '')
          if (json.match(/^(?:[^"\\]|\\.|"(?:\\.|[^"\\])*")*$/g)) {
            await fs.appendFile(editDumb, json)
          }
        }
        if (last) {
          resolve()
        }
      })
    })
  } catch (e) {
    throw new BaseError(`Error while converting CSV to JSON - Error: ${e}`)
  }
}

export { CSV2JSON }

How to defer stream read invocation

I'm still trying to grok my way through streams in general. I have been able to stream a large file using multiparty from within form.on('part'), but I need to defer the invocation and resolve the stream before it's read. I have tried PassThrough, through, and through2, but have gotten different results; mainly it hangs, and I can't figure out what to do or how to debug it. I'm open to all alternatives. Thanks for all insights.
import multiparty from 'multiparty'
import {
  PassThrough
} from 'stream';
import through from 'through'
import through2 from 'through2'

export function promisedMultiparty(req) {
  return new Promise((resolve, reject) => {
    const form = new multiparty.Form()
    const form_files = []
    let q_str = ''

    form.on('field', (fieldname, value) => {
      if (value) q_str = appendQStr(fieldname, value, q_str)
    })

    form.on('part', async (part) => {
      if (part.filename) {
        const pass1 = new PassThrough() // this hangs at 10%

        const pass2 = through(function write(data) { // this hangs from the beginning
          this.queue(data)
        },
        function end() {
          this.queue(null)
        })

        const pass3 = through2() // this hangs at 10%

        /*
        // This way works for large files, but I want to defer
        // invocation
        const form_data = new FormData()
        form_data.append(savepath, part, {
          filename,
        })
        const r = request.post(url, {
          headers: {
            'transfer-encoding': 'chunked'
          }
        }, responseCallback(resolve))
        r._form = form
        */

        form_files.push({
          part: part.pipe(pass1),
          // part: part.pipe(pass2),
          // part: part.pipe(pass3),
        })
      } else {
        part.resume()
      }
    })

    form.on('close', () => {
      resolve({
        fields: qs.parse(q_str),
        forms: form_files,
      })
    })

    form.parse(req)
  })
}
P.S. I'm sure the title could be better; if someone could suggest the proper terms, please do. Thanks.
I believe this is because you are not using through2 correctly - i.e. not actually emptying the buffer once it's full (that's why it hangs at 10% on bigger files, but works on smaller ones).
I believe an implementation like this should do it:
const pass2 = through2(function(chunk, encoding, next) {
  // do something with the data

  // Use this only if you want to send the data further to another stream reader
  // Note - From your implementation you don't seem to need it
  // this.push(data)

  // This is what tells through2 it's ready to empty the
  // buffer and read more data
  next();
})
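If the goal is to defer where the part eventually goes (as in the question), a rough sketch along these lines should work; part and someWritable are placeholders for the question's part stream and whatever destination gets attached later, so treat this as an assumption rather than a drop-in fix:
const pass2 = through2(function (chunk, encoding, next) {
  // Forward the chunk so a later consumer can read it.
  this.push(chunk)
  // Tell through2 the chunk is handled so the buffer can drain.
  next()
})

// Pipe now; data buffers up to the high-water mark and back-pressure
// pauses the source until a reader is attached.
part.pipe(pass2)

// ...later, once you know the destination...
pass2.pipe(someWritable)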

Replace a string in a file with nodejs

I use the md5 grunt task to generate MD5 filenames. Now I want to rename the sources in the HTML file with the new filename in the callback of the task. I wonder what's the easiest way to do this.
You could use simple regex:
var result = fileAsString.replace(/string to be replaced/g, 'replacement');
So...
var fs = require('fs')

fs.readFile(someFile, 'utf8', function (err, data) {
  if (err) {
    return console.log(err);
  }
  var result = data.replace(/string to be replaced/g, 'replacement');

  fs.writeFile(someFile, result, 'utf8', function (err) {
    if (err) return console.log(err);
  });
});
Since replace wasn't working for me, I've created a simple npm package replace-in-file to quickly replace text in one or more files. It's partially based on #asgoth's answer.
Edit (3 October 2016): The package now supports promises and globs, and the usage instructions have been updated to reflect this.
Edit (16 March 2018): The package has amassed over 100k monthly downloads now and has been extended with additional features as well as a CLI tool.
Install:
npm install replace-in-file
Require module
const replace = require('replace-in-file');
Specify replacement options
const options = {
  //Single file
  files: 'path/to/file',

  //Multiple files
  files: [
    'path/to/file',
    'path/to/other/file',
  ],

  //Glob(s)
  files: [
    'path/to/files/*.html',
    'another/**/*.path',
  ],

  //Replacement to make (string or regex)
  from: /Find me/g,
  to: 'Replacement',
};
Asynchronous replacement with promises:
replace(options)
  .then(changedFiles => {
    console.log('Modified files:', changedFiles.join(', '));
  })
  .catch(error => {
    console.error('Error occurred:', error);
  });
Asynchronous replacement with callback:
replace(options, (error, changedFiles) => {
  if (error) {
    return console.error('Error occurred:', error);
  }
  console.log('Modified files:', changedFiles.join(', '));
});
Synchronous replacement:
try {
  let changedFiles = replace.sync(options);
  console.log('Modified files:', changedFiles.join(', '));
}
catch (error) {
  console.error('Error occurred:', error);
}
Perhaps the "replace" module (www.npmjs.org/package/replace) also would work for you. It would not require you to read and then write the file.
Adapted from the documentation:
// install:
npm install replace

// require:
var replace = require("replace");

// use:
replace({
  regex: "string to be replaced",
  replacement: "replacement string",
  paths: ['path/to/your/file'],
  recursive: true,
  silent: true,
});
You can also use the 'sed' function that's part of ShellJS ...
$ npm install [-g] shelljs
require('shelljs/global');
sed('-i', 'search_pattern', 'replace_pattern', file);
Full documentation ...
ShellJS - sed()
ShellJS
If someone wants to use the promise-based 'fs' module for the task:
const fs = require('fs').promises;
// Below statements must be wrapped inside the 'async' function:
const data = await fs.readFile(someFile, 'utf8');
const result = data.replace(/string to be replaced/g, 'replacement');
await fs.writeFile(someFile, result,'utf8');
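As a minimal self-contained sketch of that wrapping (file path and strings are placeholders):
const fs = require('fs').promises;

(async () => {
  try {
    const data = await fs.readFile('./some-file.txt', 'utf8');
    const result = data.replace(/string to be replaced/g, 'replacement');
    await fs.writeFile('./some-file.txt', result, 'utf8');
  } catch (err) {
    console.error(err);
  }
})();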
You could process the file while being read by using streams. It's just like using buffers but with a more convenient API.
var fs = require('fs');

function searchReplaceFile(regexpFind, replace, cssFileName) {
  var file = fs.createReadStream(cssFileName, 'utf8');
  var newCss = '';

  file.on('data', function (chunk) {
    newCss += chunk.toString().replace(regexpFind, replace);
  });

  file.on('end', function () {
    fs.writeFile(cssFileName, newCss, function(err) {
      if (err) {
        return console.log(err);
      } else {
        console.log('Updated!');
      }
    });
  });
}

searchReplaceFile(/foo/g, 'bar', 'file.txt');
On Linux or Mac, keep it simple and just use sed with the shell. No external libraries required. The following code works on Linux.
const shell = require('child_process').execSync
shell(`sed -i "s!oldString!newString!g" ./yourFile.js`)
The sed syntax is a little different on Mac. I can't test it right now, but I believe you just need to add an empty string after the "-i":
const shell = require('child_process').execSync
shell(`sed -i "" "s!oldString!newString!g" ./yourFile.js`)
The "g" after the final "!" makes sed replace all instances on a line. Remove it, and only the first occurrence per line will be replaced.
Expanding on #Sanbor's answer, the most efficient way to do this is to read the original file as a stream, and then also stream each chunk into a new file, and then lastly replace the original file with the new file.
const fs = require('fs');

async function findAndReplaceFile(regexFindPattern, replaceValue, originalFile) {
  const updatedFile = `${originalFile}.updated`;

  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(originalFile, { encoding: 'utf8', autoClose: true });
    const writeStream = fs.createWriteStream(updatedFile, { encoding: 'utf8', autoClose: true });

    // For each chunk, do the find & replace, and write it to the new file stream
    readStream.on('data', (chunk) => {
      chunk = chunk.toString().replace(regexFindPattern, replaceValue);
      writeStream.write(chunk);
    });

    // Once we've finished reading the original file...
    readStream.on('end', () => {
      writeStream.end(); // emits 'finish' event, executes below statement
    });

    // Replace the original file with the updated file
    writeStream.on('finish', async () => {
      try {
        await _renameFile(updatedFile, originalFile);
        resolve();
      } catch (error) {
        reject(`Error: Error renaming ${updatedFile} to ${originalFile} => ${error.message}`);
      }
    });

    readStream.on('error', (error) => reject(`Error: Error reading ${originalFile} => ${error.message}`));
    writeStream.on('error', (error) => reject(`Error: Error writing to ${updatedFile} => ${error.message}`));
  });
}

async function _renameFile(oldPath, newPath) {
  return new Promise((resolve, reject) => {
    fs.rename(oldPath, newPath, (error) => {
      if (error) {
        reject(error);
      } else {
        resolve();
      }
    });
  });
}

// Testing it...
(async () => {
  try {
    await findAndReplaceFile(/"some regex"/g, "someReplaceValue", "someFilePath");
  } catch (error) {
    console.log(error);
  }
})()
I ran into issues when replacing a small placeholder with a large string of code.
I was doing:
var replaced = original.replace('PLACEHOLDER', largeStringVar);
I figured out the problem was JavaScript's special replacement patterns, described here. Since the code I was using as the replacing string had some $ in it, it was messing up the output.
My solution was to use the function replacement option, which DOES NOT do any special replacement:
var replaced = original.replace('PLACEHOLDER', function() {
  return largeStringVar;
});
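For illustration (made-up values), the difference looks like this:
const largeStringVar = 'price is $& per unit'; // replacement text that happens to contain "$&"

// String replacement: "$&" is treated as a special pattern meaning "the matched substring".
'cost: PLACEHOLDER'.replace('PLACEHOLDER', largeStringVar);
// -> 'cost: price is PLACEHOLDER per unit'

// Function replacement: the return value is inserted literally.
'cost: PLACEHOLDER'.replace('PLACEHOLDER', () => largeStringVar);
// -> 'cost: price is $& per unit'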
ES2017/8 for Node 7.6+ with a temporary write file for atomic replacement.
const Promise = require('bluebird')
const fs = Promise.promisifyAll(require('fs'))

async function replaceRegexInFile(file, search, replace) {
  let contents = await fs.readFileAsync(file, 'utf8')
  let replaced_contents = contents.replace(search, replace)
  let tmpfile = `${file}.jstmpreplace`
  await fs.writeFileAsync(tmpfile, replaced_contents, 'utf8')
  await fs.renameAsync(tmpfile, file)
  return true
}
Note, only for smallish files as they will be read into memory.
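A hypothetical call, just to show the shape of the helper above (file path and pattern are placeholders):
replaceRegexInFile('./config.js', /localhost/g, 'example.com')
  .then(() => console.log('done'))
  .catch(err => console.error(err))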
This may help someone:
This is a little different from just a global replace.
From the terminal, we run:
node replace.js
replace.js:
function processFile(inputFile, repString = "../") {
  var fs = require('fs'),
    readline = require('readline'),
    instream = fs.createReadStream(inputFile),
    outstream = new (require('stream'))(),
    rl = readline.createInterface(instream, outstream);
  var formatted = '';

  const regex = /<xsl:include href="([^"]*)" \/>$/gm;

  rl.on('line', function (line) {
    let url = '';
    let m;
    while ((m = regex.exec(line)) !== null) {
      // This is necessary to avoid infinite loops with zero-width matches
      if (m.index === regex.lastIndex) {
        regex.lastIndex++;
      }
      url = m[1];
    }

    let re = new RegExp('^.* <xsl:include href="(.*?)" \/>.*$', 'gm');
    formatted += line.replace(re, `\t<xsl:include href="${repString}${url}" />`);
    formatted += "\n";
  });

  rl.on('close', function (line) {
    fs.writeFile(inputFile, formatted, 'utf8', function (err) {
      if (err) return console.log(err);
    });
  });
}

// path is relative to where you're running the command from
processFile('build/some.xslt');
This is what it does:
We have several files that contain xsl:include elements.
However, in development we need the path to move down a level.
From this
<xsl:include href="common/some.xslt" />
to this
<xsl:include href="../common/some.xslt" />
So we end up running two regex patterns: one to get the href and the other to write it.
There is probably a better way to do this, but it works for now.
Thanks
Normally, I use tiny-replace-files to replace text in one or more files. This package is smaller and lighter...
https://github.com/Rabbitzzc/tiny-replace-files
import { replaceStringInFilesSync } from 'tiny-replace-files'

const options = {
  files: 'src/targets/index.js',
  from: 'test-plugin',
  to: 'self-name',
}

// await
const result = replaceStringInFilesSync(options)
console.info(result)
I would use a duplex stream instead, as documented in the Node.js docs on duplex streams:
A Transform stream is a Duplex stream where the output is computed in
some way from the input.
<p>Please click in the following {{link}} to verify the account</p>
function renderHTML(templatePath: string, object) {
  const template = fileSystem.readFileSync(path.join(Application.staticDirectory, templatePath + '.html'), 'utf8');
  return template.match(/\{{(.*?)\}}/ig).reduce((acc, binding) => {
    const property = binding.substring(2, binding.length - 2);
    return `${acc}${template.replace(/\{{(.*?)\}}/, object[property])}`;
  }, '');
}

renderHTML(templateName, { link: 'SomeLink' })
For sure you can improve the template-reading function to read as a stream and compose the bytes line by line to make it more efficient.
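As a rough sketch of that idea (not the code above; names and paths are placeholders, and a naive chunk-by-chunk replace can miss a {{binding}} split across chunk boundaries, which is assumed acceptable for small templates):
const fs = require('fs');
const { Transform } = require('stream');

function renderTemplate(templatePath, object, outPath) {
  // Transform stream: the output is computed from the input chunks.
  const substitute = new Transform({
    transform(chunk, encoding, callback) {
      const rendered = chunk
        .toString()
        .replace(/\{\{(.*?)\}\}/g, (match, property) => object[property.trim()] ?? match);
      callback(null, rendered);
    },
  });

  fs.createReadStream(templatePath, 'utf8')
    .pipe(substitute)
    .pipe(fs.createWriteStream(outPath));
}

// Usage (placeholders):
renderTemplate('./template.html', { link: 'SomeLink' }, './rendered.html');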
