Can someone help me with this syntax? - javascript

This maybe silly but I'm unfamiliar with the syntax here:
var stdin = '';
process.stdin.on('data', function (chunk) {
stdin += chunk;
}).on('end', function() {
var lines = stdin.split('\n');
for(var i=0; i<lines.length; i++) {
process.stdout.write(lines[i]);
}
});
I'm supposed to write a program that squares a number, which I know how to do, but I've never encountered this type of structure. I understand the loop and that process.stdout.write is essentially console.log The test cases inputs are 5 and 25. Outputs should be 25 and 625.
Where am I supposed to write the code to do this?

You can put it in a file sample.js and run it:
node sample.js
This process.stdin refers to the stdin stream (incoming data from other applications, for example, a shell input , so basically this:
process.stdin.on('data', function (chunk) {
stdin += chunk;
})
says, whenever there new data (user typed in something into a console, hosting application send some data), read it and store it in stdin variable. Then, when the stdin stream is over (for example, a user finished entering data):
.on('end', function() {
var lines = stdin.split('\n');
for(var i=0; i<lines.length; i++) {
process.stdout.write(lines[i]);
}
})
the code outputs back what a user typed in.

It appears all of the infrastructure is there. All that's left is actually squaring your numbers.
process.stdin and process.stdout are node streams which are asynchronous and so use events to tell you what's going on with them. data is the event for when there is data ready to process and end is for when there's no more data. The code just snarfs process.stdin and then, once the data is all in memory, processes it.
The end anonymous function would probably be best implemented like this:
function() {
stdin.split('\n').foreach(function(line){
var value = line.trim()|0;
process.stdout.write(value * value);
});
}
Off topic: The memory footprint might be improved by processing the stream as it comes in rather than collecting it and then processing it all at once. This depends on the sizes of the input and the input buffer:
var buffer = '';
var outputSquare = function(line) {
var value = line.trim()|0;
process.stdout.write(value * value);
};
process.stdin.on('data', function (chunk) {
var lines = (buffer + chunk).split('\n');
buffer = lines.pop();
lines.foreach(outputSquare);
}).on('end', function() {
outputSquare(buffer);
});

process.stdin.resume();
process.stdin.setEncoding('utf8');
const squareNum = (num) => (num ** 2);
let stdin = '';
process.stdin.on('data', (chunk) => {
stdin = `${stdin}${chunk}`;
console.log(squareNum(parseInt(chunk)));
}).on('end', () => {
//const lines = stdin.trim().split('\n');
//for (const line of lines) {
// process.stdout.write(`${line}\n`);
//}
});

Related

Javascript Await Changes Local Variables? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
Improve this question
Anyone able to explain what I'm doing wrong with my use of asynchronous functions in Javascript?
Basically, I must use an asynchronous in my Node.js code to grab an open port for me to use. There is a local variable that is being set outside of the asynchronous call that I can access/use just fine until I await for the asynchronous function to return. After that, the local variable is undefined.
(async () => {
console.log("CHECK AFTER ASYNC1: " + csvFilePath);
// First, grab a valid open port
var port;
while (!port || portsInProcess.indexOf(port) >= 0) {
console.log("CHECK AFTER ASYNC2: " + csvFilePath);
port = await getPort();
console.log(port);
}
console.log("CHECK AFTER ASYNC3: " + csvFilePath);
portsInProcess.push(port);
// ... more code below...
Checks #1 and 2 are fine for the csvFilePath variable, but check #3 shows that it's undefined. The port number, however, is fine. This leads me to believe that there's some weirdness with asynchronous function calls in Javascript that ONLY affects local variables; the global variables I use further down are just fine. Unfortunately here, I cannot make the csvFilePath variable global since that will introduce race conditions on that variable too (which I'm preventing elsewhere; the while loop is to help prevent race conditions on the port number, which is basically unused in my simple tests on localhost).
Just in case it's helpful, here's the output I'm getting:
CHECK AFTER ASYNC1: data/text/crescent_topics.csv
CHECK AFTER ASYNC2: data/text/crescent_topics.csv
58562
CHECK AFTER ASYNC3: null
It might also be worth mentioning it's really only those first few lines of code to dynamically grab an open port that are the lines of code I added. The code that I had before which used a fixed port number worked just fine (including this csvFilePath variable remaining stable).
My understanding of the await functionality was that it makes the asynchronous function act more or less synchronously, which is what seems to be happening here; the code I have farther down that uses the port number is not running until after the port number is set. (But even if that wasn't the case, why is the csvFilePath variable being unset since I'm not altering it or using it in any way here?)
EDIT: Here's some more code to provide additional context
var spawn = require('child_process').spawn;
var fs = require("fs");
var async = require('async');
var zmq = require('zmq');
var readline = require('readline');
const getPort = require('get-port');
/* Export the Nebula class */
module.exports = Nebula;
/* Location of the data for the Crescent dataset */
var textDataPath = "data/text/";
var crescentRawDataPath = textDataPath + "crescent_raw";
var crescentTFIDF = textDataPath + "crescent tfidf.csv";
var crescentTopicModel = textDataPath + "crescent_topics.csv";
/* Location of the data for the UK Health dataset */
var ukHealthRawDataPath = textDataPath + "uk_health_raw";
var ukHealthTFIDF = textDataPath + "uk_health.csv";
/* Map CSV files for text data to raw text location */
var textRawDataMappings = {};
textRawDataMappings[crescentTFIDF] = crescentRawDataPath;
textRawDataMappings[crescentTopicModel] = crescentRawDataPath;
textRawDataMappings[ukHealthTFIDF] = ukHealthRawDataPath;
textRawDataMappings[textDataPath + "uk_health_sm.csv"] = ukHealthRawDataPath;
/* The pipelines available to use */
var flatTextUIs = ["cosmos", "composite", "sirius", "centaurus"];
var pipelines = {
andromeda: {
file: "pipelines/andromeda.py",
defaultData: "data/highD/Animal_Data_study.csv"
},
cosmos: {
file: "pipelines/cosmos.py",
defaultData: textDataPath + "crescent tfidf.csv"
},
sirius: {
file: "pipelines/sirius.py",
defaultData: "data/highD/Animal_Data_paper.csv"
},
centaurus: {
file: "pipelines/centaurus.py",
defaultData: "data/highD/Animal_Data_paper.csv"
},
twitter: {
file: "pipelines/twitter.py",
},
composite: {
file: "pipelines/composite.py",
defaultData: textDataPath + "crescent tfidf.csv"
},
elasticsearch: {
file: "pipelines/espipeline.py",
args: []
}
};
/* The locations of the different types of datasets on the server */
var textDataFolder = "data/text/";
var highDDataFolder = "data/highD/";
var customCSVFolder = "data/customCSV/";
var sirius_prototype = 2;
// An array to track the ports being processed to eliminate race conditions
// as much as possible
var portsInProcess = [];
var nextSessionNumber = 0;
var usedSessionNumbers = [];
/* Nebula class constructor */
function Nebula(io, pipelineAddr) {
/* This allows you to use "Nebula(obj)" as well as "new Nebula(obj)" */
if (!(this instanceof Nebula)) {
return new Nebula(io);
}
/* The group of rooms currently active, each with a string identifier
* Each room represents an instance of a visualization that can be shared
* among clients.
*/
this.rooms = {};
this.io = io;
/* For proper use in callback functions */
var self = this;
/* Accept new WebSocket clients */
io.on('connection', function(socket) {
// Skipped some irrelevant Socket.io callbacks
**// Use the csvFilePath to store the name of a user-defined CSV file
var csvFilePath = null;**
/* Helper function to tell the client that the CSV file is now ready for them
* to use. They are also sent a copy of the data
*/
var csvFileReady = function(csvFilePath) {
// Let the client know that the CSV file is now ready to be used on
// the server
socket.emit("csvDataReady");
// Prepare to parse the CSV file
var csvData = [];
const rl = readline.createInterface({
input: fs.createReadStream(csvFilePath),
crlfDelay: Infinity
});
// Print any error messages we encounter
rl.on('error', function (err) {
console.log("Error while parsing CSV file: " + csvFilePath);
console.log(err);
});
// Read each line of the CSV file one at a time and parse it
var columnHeaders = [];
var firstColumnName;
rl.on('line', function (data) {
var dataColumns = data.split(",");
// If we haven't saved any column names yet, do so first
if (columnHeaders.length == 0) {
columnHeaders = dataColumns;
firstColumnName = columnHeaders[0];
}
// Process each individual line of data in the CSV file
else {
var dataObj = {};
var i;
for (i = 0; i < dataColumns.length; i++) {
var key = columnHeaders[i];
var value = dataColumns[i];
dataObj[key] = value
}
csvData.push(dataObj);
}
});
// All lines are read, file is closed now.
rl.on('close', function () {
// On certain OSs, like Windows, an extra, blank line may be read
// Check for this and remove it if it exists
var lastObservation = csvData[csvData.length-1];
var lastObservationKeys = Object.keys(lastObservation);
if (lastObservationKeys.length = 1 && lastObservation[lastObservationKeys[0]] == "") {
csvData.pop();
}
// Provide the CSV data to the client
socket.emit("csvDataReadComplete", csvData, firstColumnName);
});
};
**/* Allows the client to specify a CSV file already on the server to use */
socket.on("setCSV", function(csvName) {
console.log("setCSV CALLED");
csvFilePath = "data/" + csvName;
csvFileReady(csvFilePath);
console.log("CSV FILE SET: " + csvFilePath);
});**
// Skipped some more irrelevant callbacks
/* a client/ a room. If the room doesn't next exist yet,
* initiate it and send the new room to the client. Otherwise, send
* the client the current state of the room.
*/
socket.on('join', function(roomName, user, pipeline, args) {
console.log("Join called for " + pipeline + " pipeline; room " + roomName);
socket.roomName = roomName;
socket.user = user;
socket.join(roomName);
console.log("CSV FILE PATH: " + csvFilePath);
var pipelineArgsCopy = [];
if (!self.rooms[roomName]) {
var room = {};
room.name = roomName;
room.count = 1;
room.points = new Map();
room.similarity_weights = new Map();
if (pipeline == "sirius" || pipeline == "centaurus") {
room.attribute_points = new Map();
room.attribute_similarity_weights = new Map();
room.observation_data = [];
room.attribute_data = [];
}
/* Create a pipeline client for this room */
console.log("CHECK BEFORE ASYNC: " + csvFilePath);
**// Here's the code snippet I provided above**
**(async () => {
console.log("CHECK AFTER ASYNC1: " + csvFilePath);
// First, grab a valid open port
var port;
while (!port || portsInProcess.indexOf(port) >= 0) {
console.log("CHECK AFTER ASYNC2: " + csvFilePath);
port = await getPort();
console.log(port);
}
console.log("CHECK AFTER ASYNC3: " + csvFilePath);**
portsInProcess.push(port);
console.log("CHECK AFTER ASYNC4: " + csvFilePath);
if (!pipelineAddr) {
var pythonArgs = ["-u"];
if (pipeline in pipelines) {
// A CSV file path should have already been set. This
// file path should be used to indicate where to find
// the desired file
console.log("LAST CHECK: " + csvFilePath);
if (!csvFilePath) {
csvFilePath = pipelines[pipeline].defaultData;
}
console.log("FINAL CSV FILE: " + csvFilePath);
pipelineArgsCopy.push(csvFilePath);
// If the UI supports reading flat text files, tell the
// pipeline where to find the files
if (flatTextUIs.indexOf(pipeline) >= 0) {
pipelineArgsCopy.push(textRawDataMappings[csvFilePath]);
}
// Set the remaining pipeline args
pythonArgs.push(pipelines[pipeline].file);
pythonArgs.push(port.toString());
if (pipeline != "twitter" && pipeline != "elasticsearch") {
pythonArgs = pythonArgs.concat(pipelineArgsCopy);
}
}
else {
pythonArgs.push(pipelines.cosmos.file);
pythonArgs.push(port.toString());
pythonArgs.push(pipelines.cosmos.defaultData);
pythonArgs.push(crescentRawDataPath);
}
// used in case of CosmosRadar
for (var key in args) {
if (args.hasOwnProperty(key)) {
pythonArgs.push("--" + key);
pythonArgs.push(args[key]);
}
}
// Dynamically determine which distance function should be
// used
if (pythonArgs.indexOf("--dist_func") < 0) {
if (pipeline === "twitter" || pipeline === "elasticsearch" ||
csvFilePath.startsWith(textDataPath)) {
pythonArgs.push("--dist_func", "cosine");
}
else {
pythonArgs.push("--dist_func", "euclidean");
}
}
console.log(pythonArgs);
console.log("");
var pipelineInstance = spawn("python2.7", pythonArgs, {stdout: "inherit"});
pipelineInstance.on("error", function(err) {
console.log("python2.7.exe not found. Trying python.exe");
pipelineInstance = spawn("python", pythonArgs,{stdout: "inherit"});
pipelineInstance.stdout.on("data", function(data) {
console.log("Pipeline: " + data.toString());
});
pipelineInstance.stderr.on("data", function(data) {
console.log("Pipeline error: " + data.toString());
});
});
/* Data received by node app from python process,
* ouptut this data to output stream(on 'data'),
* we want to convert that received data into a string and
* append it to the overall data String
*/
pipelineInstance.stdout.on("data", function(data) {
console.log("Pipeline STDOUT: " + data.toString());
});
pipelineInstance.stderr.on("data", function(data) {
console.log("Pipeline error: " + data.toString());
});
room.pipelineInstance = pipelineInstance;
}
/* Connect to the pipeline */
pipelineAddr = pipelineAddr || "tcp://127.0.0.1:" + port.toString();
room.pipelineSocket = zmq.socket('pair');
room.pipelineSocket.connect(pipelineAddr);
pipelineAddr = null;
portsInProcess.splice(portsInProcess.indexOf(port), 1);
/* Listens for messages from the pipeline */
room.pipelineSocket.on('message', function (msg) {
self.handleMessage(room, msg);
});
self.rooms[roomName] = socket.room = room;
invoke(room.pipelineSocket, "reset");
})();
}
else {
socket.room = self.rooms[roomName];
socket.room.count += 1;
if (pipeline == "sirius" || pipeline == "centaurus") {
socket.emit('update', sendRoom(socket.room, true), true);
socket.emit('update', sendRoom(socket.room, false), false);
}
else {
socket.emit('update', sendRoom(socket.room));
}
}
// Reset the csvFilePath to null for future UIs...
// I don't think this is actually necessary since
// csvFilePath is local to the "connections" message,
// which is called for every individual room
csvFilePath = null;
});
// Skipped the rest of the code; it's irrelevant
});
}
Full printouts:
setCSV CALLED
CSV FILE SET: data/text/crescent_topics.csv
Join called for sirius pipeline; room sirius0
CSV FILE PATH: data/text/crescent_topics.csv
CHECK BEFORE ASYNC: data/text/crescent_topics.csv
CHECK AFTER ASYNC1: data/text/crescent_topics.csv
CHECK AFTER ASYNC2: data/text/crescent_topics.csv
58562
CHECK AFTER ASYNC3: null
CHECK AFTER ASYNC4: null
LAST CHECK: null
FINAL CSV FILE: data/highD/Animal_Data_paper.csv
[ '-u',
'pipelines/sirius.py',
'58562',
'data/highD/Animal_Data_paper.csv',
undefined,
'--dist_func',
'euclidean' ]
Since bolding of code doesn't work, just search for the "**" to find the relevant pieces I've marked.
TL;DR There's a lot of communication happening between the client and server to establish an individualized communication that is directly linked to a specific dataset. The user has the ability to upload a custom CSV file to the system, but the code I'm working with right now is just trying to select an existing CSV file on the server, so I omitted the callbacks for the custom CSV file. Once the file has been selected, the client asks to "join" a room/session. The case I'm working with right now assumes that this is a new room/session as opposed to trying to do some shared room/session with another client. (Yes, I know, the code is messy for sharing rooms/sessions, but it works for the most part for now and is not my main concern.) Again, all this code worked just fine before the asynchronous code was added (and using a static port variable), so I don't know what changed so much by adding it.
Since you now included the whole code context, we can see that the issue is that the code after your async IIFE is what is causing the problem.
An async function returns a promise as soon as it hits an await. And, while that await is waiting for its asynchronous operation, the code following the call to the async function runs. In your case, you're essentially doing this:
var csvFilePath = someGoodValue;
(async () => {
port = await getPort();
console.log(csvFilePath); // this will be null
})();
csvFilePath = null; // this runs as soon as the above code hits the await
So, as soon as you hit your first await, the async function returns a promise and the code following it continues to run, hitting the line of code that resets your csvFilePath.
There are probably cleaner ways to restructure your code, but a simple thing you could do is this:
var csvFilePath = someGoodValue;
(async () => {
port = await getPort();
console.log(csvFilePath); // this will be null
})().finally(() => {
csvFilePath = null;
});
Note: .finally() is supported in node v10+. If you're using an older version, you can reset the path in both .then() and .catch().
Or, as your comment says, maybe you can just remove the resetting of the csvFilePath entirely.
I realized after some silly tests I tried that I'm resetting csvFilePath to null outside the asynchronous call, which is what is causing the error... Oops!

Efficiently read file line-by-line in node

I have already learned that readline can be used to read the file line by line, e.g.
readline
.createInterface({input: fs.createReadStream('xxx')})
.on('line', (line) => { apply_regexp_on_line })
.on('close', () => { report_all_regexps });
However, this is pretty slow, since I compared the performance of grep and JavaScript regexp, and the latter has better performance on the regexps I tested. (see benchmark) So I think I have to blame the node async readline.
In my situation, I do not care async at all, I just need to exploit the fast regexp from JavaScript to process very large log files (typically 1-2GB, sometimes up to 10GB). What is the best way of doing this? My only concern is speed.
Bonus points: some of the log files are gzipped, so I need to uncompress them. If someone can recommend me a fast line-by-line reader for both plain text and gzipped text exists, I would be really appreciated.
How does this hold up against your data?
// module linegrep.js
'use strict';
var through2 = require('through2');
var StringDecoder = require('string_decoder').StringDecoder
function grep(regex) {
var decoder = new StringDecoder('utf8'),
last = "",
lineEnd = /\r?\n/;
var stream = through2({}, function transform(chunk, enc, cb) {
var lines = decoder.write(last + chunk).split(lineEnd), i;
last = lines.pop();
for (i = 0; i < lines.length; i++) {
if (regex.test(lines[i])) this.push(lines[i]);
}
cb();
}, function flush(cb) {
if (regex.test(last)) this.push(last);
cb();
});
stream._readableState.objectMode = true;
return stream;
}
module.exports = grep;
and
// index.js
'use strict';
var fs = require('fs');
var zlib = require('zlib');
var grep = require('./linegrep');
function grepFile(filename, regex) {
var rstream = fs.createReadStream(filename, {highWaterMark: 172 * 1024});
if (/\.gz$/.test(filename)) rstream = rstream.pipe(zlib.createGunzip());
return rstream
.pipe(grep(regex));
}
// -------------------------------------------------------------------------
var t = Date.now(), mc = 0;
grepFile('input.txt', /boot\.([a-z]+)_head\./).on('data', function (line) {
mc++;
console.log(line);
}).on('end', function () {
console.log( mc + " matches, " + (Date.now() - t) + " ms" );
});
This turns a file stream into an object stream of lines, maps them through your regex and returns only the matching lines.

Haskell call Node.js file not working

I am trying to write a program in Haskell that takes in the input as a string of sentences, calls a javascript file with that input, and return the output of that javascript file as the output of the Haskell file. Right now, the output of the javascript file is not printed. It is not clear whether javascript file is called or not.
Here is the script in Haskell:
main :: IO ()
main =
do
putStrLn "Give me the paragraphs \n"
paragraphs <- getLine
output <- readCreateProcess (shell "node try2.js") paragraphs
putStrLn output
Script in Node.js. The desired output is toplines:
var lexrank = require('./lexrank');
const readline = require('readline');
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
rl.question('Hi', (answer) => {
var originalText = answer;
var topLines = lexrank.summarize(originalText, 5, function (err, toplines, text) {
if (err) {
console.log(err);
}
rl.write(toplines);
// console.log(toplines);
});
rl.close();
});
I am guessing there is some problem with my way of doing stdin. I am new to Node.js
It took me a really long time but following code works:
Haskell File:
import System.Process
main :: IO ()
main =
do
putStrLn "Give me the paragraphs \n"
paragraphs <- getLine
output <- readCreateProcess (shell "node lexrankReceiver.js") (paragraphs ++ "\n")
putStrLn output
NodeJs File:
// Getting this to work took almost a full day. Javascript gets really freaky
// when using it on terminal.
/* Import necessary modules. */
var lexrank = require('./Lexrank/lexrank.js');
const readline = require('readline');
// var Type = require('type-of-is');
// var utf8 = require('utf8');
// Create readline interface, which needs to be closed in the end.
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
// Set stdin and stdout to be encoded in utf8. Haskell passes string as basic
// 8-bit unsigned integer array. The output also needs to be encoded so that
// Haskell can read them
process.stdin.setEncoding('utf8');
process.stdout.setEncoding('utf8');
// If a string is readable, start reading.
process.stdin.on('readable', () => {
var chunk = process.stdin.read();
if (chunk !== null) {
var originalText = chunk;
var topLines = lexrank.summarize(originalText, 5, function (err, toplines, text) {
if (err) {
console.log(err);
}
// Loop through the result to form a new paragraph consisted of most
// important sentences in ascending order. Had to split the 0 index and
// the rest indices otherwise the first thing in newParagraphs will be
// undefined.
var newParagraphs = (toplines[0])['text'];
for (var i = 1; i < toplines.length; i++) {
newParagraphs += (toplines[i])['text'];
}
console.log(newParagraphs);
});
}
});
// After the output is finished, set end of file.
// TODO: write a handler for end of writing output.
process.stdin.on('end', () => {
process.stdout.write('\n');
});
// It is incredibly important to close readline. Otherwise, input doesn't
// get sent out.
rl.close();
The problem with the Haskell program you have is that paragraphs is not a line of input, just a string, so to fix the issue you can append a newline, something like:
output <- readCreateProcess (shell "node try2.js") $ paragraphs ++ "\n"
To find this issue I tried replacing question with a shim:
rl.question = function(prompt, cb) {
rl.on('line', function(thing) {
console.log(prompt);
cb(thing);
})
}
And that worked, so I knew it was something to do with how question handles stdin. So after this I tried appending a newline and that worked. Which means question requires a 'line' of input, not just any string, unlike on('line'), oddly enough.

How to write a file from an ArrayBuffer in JS

I am trying to write a file uploader for Meteor framework.
The principle is to split the fileon the client from an ArrayBuffer in small packets of 4096 bits that are sent to the server through a Meteor.method.
The simplified code below is the part of the client that sends a chunk to the server, it is repeated until offset reaches data.byteLength :
// data is an ArrayBuffer
var total = data.byteLength;
var offset = 0;
var upload = function() {
var length = 4096; // chunk size
// adjust the last chunk size
if (offset + length > total) {
length = total - offset;
}
// I am using Uint8Array to create the chunk
// because it can be passed to the Meteor.method natively
var chunk = new Uint8Array(data, offset, length);
if (offset < total) {
// Send the chunk to the server and tell it what file to append to
Meteor.call('uploadFileData', fileId, chunk, function (err, length) {
if (!err) {
offset += length;
upload();
}
}
}
};
upload(); // start uploading
The simplified code below is the part on the server that receives the chunk and writes it to the file system :
var fs = Npm.require('fs');
var Future = Npm.require('fibers/future');
Meteor.methods({
uploadFileData: function(fileId, chunk) {
var fut = new Future();
var path = '/uploads/' + fileId;
// I tried that with no success
chunk = String.fromCharCode.apply(null, chunk);
// how to write the chunk that is an Uint8Array to the disk ?
fs.appendFile(path, chunk, 'binary', function (err) {
if (err) {
fut.throw(err);
} else {
fut.return(chunk.length);
}
});
return fut.wait();
}
});
I failed to write a valid file to the disk, actually the file is saved but I cannot open it, when I see the content in a text editor, it is similar to the original file (a jpg for example) but some characters are different, I think that could be an encoding problem as the file size is not the same, but I don't know how to fix that...
Saving the file was as easy as creating a new Buffer with the Uint8Array object :
// chunk is the Uint8Array object
fs.appendFile(path, Buffer.from(chunk), function (err) {
if (err) {
fut.throw(err);
} else {
fut.return(chunk.length);
}
});
Building on Karl.S answer, this worked for me, outside of any framework:
fs.appendFileSync(outfile, Buffer.from(arrayBuffer));
Just wanted to add that in newer Meteor you could avoid some callback hell with async/await. Await will also throw and push the error up to client
Meteor.methods({
uploadFileData: async function(file_id, chunk) {
var path = 'somepath/' + file_id; // be careful with this, make sure to sanitize file_id
await fs.appendFile(path, new Buffer(chunk));
return chunk.length;
}
});

Read a file one line at a time in node.js?

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject but I'm missing some connections to make the whole thing fit together.
var Lazy=require("lazy");
new Lazy(process.stdin)
.lines
.forEach(
function(line) {
console.log(line.toString());
}
);
process.stdin.resume();
The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN as in this sample.
I tried:
fs.open('./VeryBigFile.csv', 'r', '0666', Process);
function Process(err, fd) {
if (err) throw err;
// DO lazy read
}
but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.
I don't think the other answer would work as the file is much larger than the server I'm running it on has memory for.
Since Node.js v0.12 and as of Node.js v4.0.0, there is a stable readline core module. Here's the easiest way to read lines from a file, without any external modules:
const fs = require('fs');
const readline = require('readline');
async function processLineByLine() {
const fileStream = fs.createReadStream('input.txt');
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity
});
// Note: we use the crlfDelay option to recognize all instances of CR LF
// ('\r\n') in input.txt as a single line break.
for await (const line of rl) {
// Each line in input.txt will be successively available here as `line`.
console.log(`Line from file: ${line}`);
}
}
processLineByLine();
Or alternatively:
var lineReader = require('readline').createInterface({
input: require('fs').createReadStream('file.in')
});
lineReader.on('line', function (line) {
console.log('Line from file:', line);
});
The last line is read correctly (as of Node v0.12 or later), even if there is no final \n.
UPDATE: this example has been added to Node's API official documentation.
For such a simple operation there shouldn't be any dependency on third-party modules. Go easy.
var fs = require('fs'),
readline = require('readline');
var rd = readline.createInterface({
input: fs.createReadStream('/path/to/file'),
output: process.stdout,
console: false
});
rd.on('line', function(line) {
console.log(line);
});
Update in 2019
An awesome example is already posted on official Nodejs documentation. here
This requires the latest Nodejs is installed on your machine. >11.4
const fs = require('fs');
const readline = require('readline');
async function processLineByLine() {
const fileStream = fs.createReadStream('input.txt');
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity
});
// Note: we use the crlfDelay option to recognize all instances of CR LF
// ('\r\n') in input.txt as a single line break.
for await (const line of rl) {
// Each line in input.txt will be successively available here as `line`.
console.log(`Line from file: ${line}`);
}
}
processLineByLine();
You don't have to open the file, but instead, you have to create a ReadStream.
fs.createReadStream
Then pass that stream to Lazy
require('fs').readFileSync('file.txt', 'utf-8').split(/\r?\n/).forEach(function(line){
console.log(line);
})
there is a very nice module for reading a file line by line, it's called line-reader
with it you simply just write:
var lineReader = require('line-reader');
lineReader.eachLine('file.txt', function(line, last) {
console.log(line);
// do whatever you want with line...
if(last){
// or check if it's the last one
}
});
you can even iterate the file with a "java-style" interface, if you need more control:
lineReader.open('file.txt', function(reader) {
if (reader.hasNextLine()) {
reader.nextLine(function(line) {
console.log(line);
});
}
});
Old topic, but this works:
var rl = readline.createInterface({
input : fs.createReadStream('/path/file.txt'),
output: process.stdout,
terminal: false
})
rl.on('line',function(line){
console.log(line) //or parse line
})
Simple. No need for an external module.
You can always roll your own line reader. I have'nt benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n'
var last = "";
process.stdin.on('data', function(chunk) {
var lines, i;
lines = (last+chunk).split("\n");
for(i = 0; i < lines.length - 1; i++) {
console.log("line: " + lines[i]);
}
last = lines[i];
});
process.stdin.on('end', function() {
console.log("line: " + last);
});
process.stdin.resume();
I did come up with this when working on a quick log parsing script that needed to accumulate data during the log parsing and I felt that it would nice to try doing this using js and node instead of using perl or bash.
Anyway, I do feel that small nodejs scripts should be self contained and not rely on third party modules so after reading all the answers to this question, each using various modules to handle line parsing, a 13 SLOC native nodejs solution might be of interest .
With the carrier module:
var carrier = require('carrier');
process.stdin.resume();
carrier.carry(process.stdin, function(line) {
console.log('got one line: ' + line);
});
I ended up with a massive, massive memory leak using Lazy to read line by line when trying to then process those lines and write them to another stream due to the way drain/pause/resume in node works (see: http://elegantcode.com/2011/04/06/taking-baby-steps-with-node-js-pumping-data-between-streams/ (i love this guy btw)). I haven't looked closely enough at Lazy to understand exactly why, but I couldn't pause my read stream to allow for a drain without Lazy exiting.
I wrote the code to process massive csv files into xml docs, you can see the code here: https://github.com/j03m/node-csv2xml
If you run the previous revisions with Lazy line it leaks. The latest revision doesn't leak at all and you can probably use it as the basis for a reader/processor. Though I have some custom stuff in there.
Edit: I guess I should also note that my code with Lazy worked fine until I found myself writing large enough xml fragments that drain/pause/resume because a necessity. For smaller chunks it was fine.
In most cases this should be enough:
const fs = require("fs")
fs.readFile('./file', 'utf-8', (err, file) => {
const lines = file.split('\n')
for (let line of lines)
console.log(line)
});
Edit:
Use a transform stream.
With a BufferedReader you can read lines.
new BufferedReader ("lorem ipsum", { encoding: "utf8" })
.on ("error", function (error){
console.log ("error: " + error);
})
.on ("line", function (line){
console.log ("line: " + line);
})
.on ("end", function (){
console.log ("EOF");
})
.read ();
I was frustrated by the lack of a comprehensive solution for this, so I put together my own attempt (git / npm). Copy-pasted list of features:
Interactive line processing (callback-based, no loading the entire file into RAM)
Optionally, return all lines in an array (detailed or raw mode)
Interactively interrupt streaming, or perform map/filter like processing
Detect any newline convention (PC/Mac/Linux)
Correct eof / last line treatment
Correct handling of multi-byte UTF-8 characters
Retrieve byte offset and byte length information on per-line basis
Random access, using line-based or byte-based offsets
Automatically map line-offset information, to speed up random access
Zero dependencies
Tests
NIH? You decide :-)
Since posting my original answer, I found that split is a very easy to use node module for line reading in a file; Which also accepts optional parameters.
var split = require('split');
fs.createReadStream(file)
.pipe(split())
.on('data', function (line) {
//each chunk now is a seperate line!
});
Haven't tested on very large files. Let us know if you do.
function createLineReader(fileName){
var EM = require("events").EventEmitter
var ev = new EM()
var stream = require("fs").createReadStream(fileName)
var remainder = null;
stream.on("data",function(data){
if(remainder != null){//append newly received data chunk
var tmp = new Buffer(remainder.length+data.length)
remainder.copy(tmp)
data.copy(tmp,remainder.length)
data = tmp;
}
var start = 0;
for(var i=0; i<data.length; i++){
if(data[i] == 10){ //\n new line
var line = data.slice(start,i)
ev.emit("line", line)
start = i+1;
}
}
if(start<data.length){
remainder = data.slice(start);
}else{
remainder = null;
}
})
stream.on("end",function(){
if(null!=remainder) ev.emit("line",remainder)
})
return ev
}
//---------main---------------
fileName = process.argv[2]
lineReader = createLineReader(fileName)
lineReader.on("line",function(line){
console.log(line.toString())
//console.log("++++++++++++++++++++")
})
I wanted to tackle this same problem, basically what in Perl would be:
while (<>) {
process_line($_);
}
My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:
The minimal synchronous code that could reuse in many projects.
No limits on file size or number of lines.
No limits on length of lines.
Able to handle full Unicode in UTF-8, including characters beyond the BMP.
Able to handle *nix and Windows line endings (old-style Mac not needed for me).
Line endings character(s) to be included in lines.
Able to handle last line with or without end-of-line characters.
Not use any external libraries not included in the node.js distribution.
This is a project for me to get a feel for low-level scripting type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.
After a surprising amount of effort and a couple of false starts this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)
var fs = require('fs'),
StringDecoder = require('string_decoder').StringDecoder,
util = require('util');
function lineByLine(fd) {
var blob = '';
var blobStart = 0;
var blobEnd = 0;
var decoder = new StringDecoder('utf8');
var CHUNK_SIZE = 16384;
var chunk = new Buffer(CHUNK_SIZE);
var eolPos = -1;
var lastChunk = false;
var moreLines = true;
var readMore = true;
// each line
while (moreLines) {
readMore = true;
// append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
while (readMore) {
// do we have a whole line? (with LF)
eolPos = blob.indexOf('\n', blobStart);
if (eolPos !== -1) {
blobEnd = eolPos;
readMore = false;
// do we have the last line? (no LF)
} else if (lastChunk) {
blobEnd = blob.length;
readMore = false;
// otherwise read more
} else {
var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);
lastChunk = bytesRead !== CHUNK_SIZE;
blob += decoder.write(chunk.slice(0, bytesRead));
}
}
if (blobStart < blob.length) {
processLine(blob.substring(blobStart, blobEnd + 1));
blobStart = blobEnd + 1;
if (blobStart >= CHUNK_SIZE) {
// blobStart is in characters, CHUNK_SIZE is in octets
var freeable = blobStart / CHUNK_SIZE;
// keep blob from growing indefinitely, not as deterministic as I'd like
blob = blob.substring(CHUNK_SIZE);
blobStart -= CHUNK_SIZE;
blobEnd -= CHUNK_SIZE;
}
} else {
moreLines = false;
}
}
}
It could probably be cleaned up further, it was the result of trial and error.
Generator based line reader: https://github.com/neurosnap/gen-readlines
var fs = require('fs');
var readlines = require('gen-readlines');
fs.open('./file.txt', 'r', function(err, fd) {
if (err) throw err;
fs.fstat(fd, function(err, stats) {
if (err) throw err;
for (var line of readlines(fd, stats.size)) {
console.log(line.toString());
}
});
});
A new function was added in Node.js v18.11.0 to read files line by line
filehandle.readLines([options])
This is how you use this with a text file you want to read
import { open } from 'node:fs/promises';
myFileReader();
async function myFileReader() {
const file = await open('./TextFileName.txt');
for await (const line of file.readLines()) {
console.log(line)
}
}
To understand more read Node.js documentation here is the link for file system readlines():
https://nodejs.org/api/fs.html#filehandlereadlinesoptions
If you want to read a file line by line and writing this in another:
var fs = require('fs');
var readline = require('readline');
var Stream = require('stream');
function readFileLineByLine(inputFile, outputFile) {
var instream = fs.createReadStream(inputFile);
var outstream = new Stream();
outstream.readable = true;
outstream.writable = true;
var rl = readline.createInterface({
input: instream,
output: outstream,
terminal: false
});
rl.on('line', function (line) {
fs.appendFileSync(outputFile, line + '\n');
});
};
var fs = require('fs');
function readfile(name,online,onend,encoding) {
var bufsize = 1024;
var buffer = new Buffer(bufsize);
var bufread = 0;
var fd = fs.openSync(name,'r');
var position = 0;
var eof = false;
var data = "";
var lines = 0;
encoding = encoding || "utf8";
function readbuf() {
bufread = fs.readSync(fd,buffer,0,bufsize,position);
position += bufread;
eof = bufread ? false : true;
data += buffer.toString(encoding,0,bufread);
}
function getLine() {
var nl = data.indexOf("\r"), hasnl = nl !== -1;
if (!hasnl && eof) return fs.closeSync(fd), online(data,++lines), onend(lines);
if (!hasnl && !eof) readbuf(), nl = data.indexOf("\r"), hasnl = nl !== -1;
if (!hasnl) return process.nextTick(getLine);
var line = data.substr(0,nl);
data = data.substr(nl+1);
if (data[0] === "\n") data = data.substr(1);
online(line,++lines);
process.nextTick(getLine);
}
getLine();
}
I had the same problem and came up with above solution
looks simular to others but is aSync and can read large files very quickly
Hopes this helps
Two questions we must ask ourselves while doing such operations are:
What's the amount of memory used to perform it?
Is the memory consumption increasing drastically with the file size?
Solutions like require('fs').readFileSync() loads the whole file into memory. That means that the amount of memory required to perform operations will be almost equivalent to the file size. We should avoid these for anything larger than 50mbs
We can easily track the amount of memory used by a function by placing these lines of code after the function invocation :
const used = process.memoryUsage().heapUsed / 1024 / 1024;
console.log(
`The script uses approximately ${Math.round(used * 100) / 100} MB`
);
Right now the best way to read particular lines from a large file is using node's readline. The documentation has amazing examples.
This is my favorite way of going through a file, a simple native solution for a progressive (as in not a "slurp" or all-in-memory way) file read with modern async/await. It's a solution that I find "natural" when processing large text files without having to resort to the readline package or any non-core dependency.
let buf = '';
for await ( const chunk of fs.createReadStream('myfile') ) {
const lines = buf.concat(chunk).split(/\r?\n/);
buf = lines.pop() ?? '';
for( const line of lines ) {
console.log(line);
}
}
if(buf.length) console.log(buf); // last line, if file does not end with newline
You can adjust encoding in the fs.createReadStream or use chunk.toString(<arg>). Also this let's you better fine-tune the line splitting to your taste, ie. use .split(/\n+/) to skip empty lines and control the chunk size with fs.createReadStream('myfile', { highWaterMark: <chunkSize> }).
Don't forget to create a function like processLine(line) to avoid repeating the line processing code twice due to the ending buf leftover. Unfortunately, the ReadStream instance does not update its end-of-file flags in this setup, so there's no way, afaik, to detect within the loop that we're in the last iteration without some more verbose tricks like comparing the file size from a fs.Stats() with .bytesRead. Hence the final buf processing solution, unless you're absolutely sure your file ends with a newline \n, in which case the for await loop should suffice.
Performance Considerations
Chunk sizes are important for performance, the default is 64k for text files and, for multi MB files, larger chunks can improve speed by an order of magnitude.
The above snippet runs at least the same speed (or even 5% faster sometimes) as code based on NodeJS v18's fs.readLine() or based on the readline module (the accepted answer), once you tune highWaterMark to something that your machine can handle, ie. setting it to the same size as the file, if your available memory allows it, is the fastest.
In any case, any of NodeJS line-reading answers here are an order of magnitude slower than the Perl or native *Nix solutions.
Similar alternatives
★ If you prefer the evented asynchronous version, this would be it:
let buf = '';
fs.createReadStream('myfile')
.on('data', chunk => {
const lines = buf.concat(chunk).split(/\r?\n/);
buf = lines.pop();
for( const line of lines ) {
console.log(line);
}
})
.on('end', () => buf.length && console.log(buf) );
★ Now if you don't mind importing the stream core package, then this is the equivalent piped stream version, which allows for chaining transforms like gzip decompression:
const { Writable } = require('stream');
let buf = '';
fs.createReadStream('myfile').pipe(
new Writable({
write: (chunk, enc, next) => {
const lines = buf.concat(chunk).split(/\r?\n/);
buf = lines.pop();
for (const line of lines) {
console.log(line);
}
next();
}
})
).on('finish', () => buf.length && console.log(buf) );
I have a little module which does this well and is used by quite a few other projects npm readline Note thay in node v10 there is a native readline module so I republished my module as linebyline https://www.npmjs.com/package/linebyline
if you dont want to use the module the function is very simple:
var fs = require('fs'),
EventEmitter = require('events').EventEmitter,
util = require('util'),
newlines = [
13, // \r
10 // \n
];
var readLine = module.exports = function(file, opts) {
if (!(this instanceof readLine)) return new readLine(file);
EventEmitter.call(this);
opts = opts || {};
var self = this,
line = [],
lineCount = 0,
emit = function(line, count) {
self.emit('line', new Buffer(line).toString(), count);
};
this.input = fs.createReadStream(file);
this.input.on('open', function(fd) {
self.emit('open', fd);
})
.on('data', function(data) {
for (var i = 0; i < data.length; i++) {
if (0 <= newlines.indexOf(data[i])) { // Newline char was found.
lineCount++;
if (line.length) emit(line, lineCount);
line = []; // Empty buffer.
} else {
line.push(data[i]); // Buffer new line data.
}
}
}).on('error', function(err) {
self.emit('error', err);
}).on('end', function() {
// Emit last line if anything left over since EOF won't trigger it.
if (line.length){
lineCount++;
emit(line, lineCount);
}
self.emit('end');
}).on('close', function() {
self.emit('close');
});
};
util.inherits(readLine, EventEmitter);
Another solution is to run logic via sequential executor nsynjs. It reads file line-by-line using node readline module, and it doesn't use promises or recursion, therefore not going to fail on large files. Here is how the code will looks like:
var nsynjs = require('nsynjs');
var textFile = require('./wrappers/nodeReadline').textFile; // this file is part of nsynjs
function process(textFile) {
var fh = new textFile();
fh.open('path/to/file');
var s;
while (typeof(s = fh.readLine(nsynjsCtx).data) != 'undefined')
console.log(s);
fh.close();
}
var ctx = nsynjs.run(process,{},textFile,function () {
console.log('done');
});
Code above is based on this exampe: https://github.com/amaksr/nsynjs/blob/master/examples/node-readline/index.js
i use this:
function emitLines(stream, re){
re = re && /\n/;
var buffer = '';
stream.on('data', stream_data);
stream.on('end', stream_end);
function stream_data(data){
buffer += data;
flush();
}//stream_data
function stream_end(){
if(buffer) stream.emmit('line', buffer);
}//stream_end
function flush(){
var re = /\n/;
var match;
while(match = re.exec(buffer)){
var index = match.index + match[0].length;
stream.emit('line', buffer.substring(0, index));
buffer = buffer.substring(index);
re.lastIndex = 0;
}
}//flush
}//emitLines
use this function on a stream and listen to the line events that is will emit.
gr-
While you should probably use the readline module as the top answer suggests, readline appears to be oriented toward command line interfaces rather than line reading. It's also a little bit more opaque regarding buffering. (Anyone who needs a streaming line oriented reader probably will want to tweak buffer sizes). The readline module is ~1000 lines while this, with stats and tests, is 34.
const EventEmitter = require('events').EventEmitter;
class LineReader extends EventEmitter{
constructor(f, delim='\n'){
super();
this.totalChars = 0;
this.totalLines = 0;
this.leftover = '';
f.on('data', (chunk)=>{
this.totalChars += chunk.length;
let lines = chunk.split(delim);
if (lines.length === 1){
this.leftover += chunk;
return;
}
lines[0] = this.leftover + lines[0];
this.leftover = lines[lines.length-1];
if (this.leftover) lines.pop();
this.totalLines += lines.length;
for (let l of lines) this.onLine(l);
});
// f.on('error', ()=>{});
f.on('end', ()=>{console.log('chars', this.totalChars, 'lines', this.totalLines)});
}
onLine(l){
this.emit('line', l);
}
}
//Command line test
const f = require('fs').createReadStream(process.argv[2], 'utf8');
const delim = process.argv[3];
const lineReader = new LineReader(f, delim);
lineReader.on('line', (line)=> console.log(line));
Here's an even shorter version, without the stats, at 19 lines:
class LineReader extends require('events').EventEmitter{
constructor(f, delim='\n'){
super();
this.leftover = '';
f.on('data', (chunk)=>{
let lines = chunk.split(delim);
if (lines.length === 1){
this.leftover += chunk;
return;
}
lines[0] = this.leftover + lines[0];
this.leftover = lines[lines.length-1];
if (this.leftover)
lines.pop();
for (let l of lines)
this.emit('line', l);
});
}
}
const fs = require("fs")
fs.readFile('./file', 'utf-8', (err, data) => {
var innerContent;
console.log("Asynchronous read: " + data.toString());
const lines = data.toString().split('\n')
for (let line of lines)
innerContent += line + '<br>';
});
I wrap the whole logic of daily line processing as a npm module: line-kit
https://www.npmjs.com/package/line-kit
// example
var count = 0
require('line-kit')(require('fs').createReadStream('/etc/issue'),
(line) => { count++; },
() => {console.log(`seen ${count} lines`)})
I use below code the read lines after verify that its not a directory and its not included in the list of files need not to be check.
(function () {
var fs = require('fs');
var glob = require('glob-fs')();
var path = require('path');
var result = 0;
var exclude = ['LICENSE',
path.join('e2e', 'util', 'db-ca', 'someother-file'),
path.join('src', 'favicon.ico')];
var files = [];
files = glob.readdirSync('**');
var allFiles = [];
var patternString = [
'trade',
'order',
'market',
'securities'
];
files.map((file) => {
try {
if (!fs.lstatSync(file).isDirectory() && exclude.indexOf(file) === -1) {
fs.readFileSync(file).toString().split(/\r?\n/).forEach(function(line){
patternString.map((pattern) => {
if (line.indexOf(pattern) !== -1) {
console.log(file + ' contain `' + pattern + '` in in line "' + line +'";');
result = 1;
}
});
});
}
} catch (e) {
console.log('Error:', e.stack);
}
});
process.exit(result);
})();
I have looked through all above answers, all of them use third-party library to solve it. It's have a simple solution in Node's API. e.g
const fs= require('fs')
let stream = fs.createReadStream('<filename>', { autoClose: true })
stream.on('data', chunk => {
let row = chunk.toString('ascii')
}))

Categories