I have a file which stores many JavaScript objects in JSON form and I need to read the file, create each of the objects, and do something with them (insert them into a db in my case). The JavaScript objects can be represented in one of two formats:
Format A:
[{name: 'thing1'},
....
{name: 'thing999999999'}]
or Format B:
{name: 'thing1'} // <== My choice.
...
{name: 'thing999999999'}
Note that the ... indicates a lot of JSON objects. I am aware I could read the entire file into memory and then use JSON.parse() like this:
fs.readFile(filePath, 'utf-8', function (err, fileContents) {
if (err) throw err;
console.log(JSON.parse(fileContents));
});
However, the file could be really large, so I would prefer to use a stream to accomplish this. The problem I see with a stream is that the file contents could be broken into data chunks at any point, so how can I use JSON.parse() on such objects?
Ideally, each object would be read as a separate data chunk, but I am not sure how to do that.
var importStream = fs.createReadStream(filePath, {flags: 'r', encoding: 'utf-8'});
importStream.on('data', function(chunk) {
var pleaseBeAJSObject = JSON.parse(chunk);
// insert pleaseBeAJSObject in a database
});
importStream.on('end', function(item) {
console.log("Woot, imported objects into the database!");
});
Note, I wish to prevent reading the entire file into memory. Time efficiency does not matter to me. Yes, I could try to read a number of objects at once and insert them all at once, but that's a performance tweak - I need a way that is guaranteed not to cause a memory overload, no matter how many objects are contained in the file.
I can choose to use FormatA or FormatB or maybe something else, just please specify in your answer. Thanks!
To process a file line-by-line, you simply need to decouple the reading of the file and the code that acts upon that input. You can accomplish this by buffering your input until you hit a newline. Assuming we have one JSON object per line (basically, format B):
var stream = fs.createReadStream(filePath, {flags: 'r', encoding: 'utf-8'});
var buf = '';
stream.on('data', function(d) {
buf += d.toString(); // when data is read, stash it in a string buffer
pump(); // then process the buffer
});
function pump() {
var pos;
while ((pos = buf.indexOf('\n')) >= 0) { // keep going while there's a newline somewhere in the buffer
if (pos == 0) { // if there's more than one newline in a row, the buffer will now start with a newline
buf = buf.slice(1); // discard it
continue; // so that the next iteration will start with data
}
processLine(buf.slice(0,pos)); // hand off the line
buf = buf.slice(pos+1); // and slice the processed data off the buffer
}
}
function processLine(line) { // here's where we do something with a line
if (line[line.length-1] == '\r') line=line.substr(0,line.length-1); // discard CR (0x0D)
if (line.length > 0) { // ignore empty lines
var obj = JSON.parse(line); // parse the JSON
console.log(obj); // do something with the data here!
}
}
Each time the file stream receives data from the file system, it's stashed in a buffer, and then pump is called.
If there's no newline in the buffer, pump simply returns without doing anything. More data (and potentially a newline) will be added to the buffer the next time the stream gets data, and then we'll have a complete object.
If there is a newline, pump slices off the buffer from the beginning to the newline and hands it off to processLine. It then checks again if there's another newline in the buffer (the while loop). In this way, we can process all of the lines that were read in the current chunk.
Finally, processLine is called once per input line. If present, it strips off the carriage return character (to avoid issues with line endings – LF vs CRLF), and then calls JSON.parse on the line. At this point, you can do whatever you need to with your object.
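One gap in the buffering approach above is a final line that has no trailing newline: pump never sees a newline for it, so it stays in the buffer. Below is a minimal sketch of the same idea with an explicit flush step. The helper name createLineBuffer and its push/flush methods are made up for this illustration; they are not part of any library.

```javascript
// Hypothetical helper (not from the answer): buffers text chunks and calls
// onLine once per complete line, wherever the chunk boundaries happen to fall.
function createLineBuffer(onLine) {
  let buf = '';

  function emit(line) {
    if (line.endsWith('\r')) line = line.slice(0, -1); // tolerate CRLF
    if (line.length > 0) onLine(line);                 // skip blank lines
  }

  return {
    push(chunk) {
      buf += chunk;
      let pos;
      while ((pos = buf.indexOf('\n')) >= 0) {
        emit(buf.slice(0, pos));
        buf = buf.slice(pos + 1);
      }
    },
    flush() {
      // Call once at end of input: handles a last line with no newline after it.
      emit(buf);
      buf = '';
    },
  };
}

const objects = [];
const lines = createLineBuffer((line) => objects.push(JSON.parse(line)));
lines.push('{"name":"thing1"}\n{"na');                // chunk ends mid-object
lines.push('me":"thing2"}\r\n\n{"name":"thing3"}');   // no final newline
lines.flush();
console.log(objects.length); // 3
```

With a real file stream this would be wired up as stream.on('data', d => lines.push(d)) and stream.on('end', () => lines.flush()).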
Note that JSON.parse is strict about what it accepts as input; you must quote your identifiers and string values with double quotes. In other words, {name:'thing1'} will throw an error; you must use {"name":"thing1"}.
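To see the strictness for yourself, here is a small sketch. The tryParse wrapper is a made-up helper for this illustration, and it also shows that JSON.stringify is an easy way to produce lines in the strict form.

```javascript
// Hypothetical wrapper so the failure can be inspected instead of thrown.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const loose = tryParse("{name:'thing1'}");     // unquoted key, single quotes
const strict = tryParse('{"name":"thing1"}');  // valid JSON

console.log(loose.ok, loose.error);        // false SyntaxError
console.log(strict.ok, strict.value.name); // true thing1

// JSON.stringify always produces the strict form, one object per call.
const line = JSON.stringify({ name: 'thing1' });
console.log(line); // {"name":"thing1"}
```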
Because no more than a chunk of data will ever be in memory at a time, this will be extremely memory efficient. It will also be extremely fast. A quick test showed I processed 10,000 rows in under 15ms.
Just as I was thinking that it would be fun to write a streaming JSON parser, I also thought that maybe I should do a quick search to see if there's one already available.
Turns out there is.
JSONStream "streaming JSON.parse and stringify"
Since I just found it, I've obviously not used it, so I can't comment on its quality, but I'll be interested to hear if it works.
It does work. Consider the following JavaScript, which uses _.isString:
stream.pipe(JSONStream.parse('*'))
.on('data', (d) => {
console.log(typeof d);
console.log("isString: " + _.isString(d))
});
This will log objects as they come in if the stream is an array of objects. Therefore the only thing being buffered is one object at a time.
As of October 2014, you can just do something like the following (using JSONStream) - https://www.npmjs.org/package/JSONStream
var fs = require('fs'),
JSONStream = require('JSONStream');
var getStream = function () {
var jsonData = 'myData.json',
stream = fs.createReadStream(jsonData, { encoding: 'utf8' }),
parser = JSONStream.parse('*');
return stream.pipe(parser);
}
getStream().pipe(MyTransformToDoWhateverProcessingAsNeeded).on('error', function (err) {
// handle any errors
});
To demonstrate with a working example:
npm install JSONStream event-stream
data.json:
{
"greeting": "hello world"
}
hello.js:
var fs = require('fs'),
JSONStream = require('JSONStream'),
es = require('event-stream');
var getStream = function () {
var jsonData = 'data.json',
stream = fs.createReadStream(jsonData, { encoding: 'utf8' }),
parser = JSONStream.parse('*');
return stream.pipe(parser);
};
getStream()
.pipe(es.mapSync(function (data) {
console.log(data);
}));
$ node hello.js
// hello world
I had a similar requirement: I needed to read a large JSON file in Node.js, process the data in chunks, call an API, and save the results in MongoDB.
inputFile.json is like:
{
"customers":[
{ /*customer data*/},
{ /*customer data*/},
{ /*customer data*/}....
]
}
I used JSONStream and event-stream to process the customers one at a time:
var fs = require("fs");
var JSONStream = require("JSONStream");
var es = require("event-stream");
var fileStream = fs.createReadStream(filePath, { encoding: "utf8" });
fileStream.pipe(JSONStream.parse("customers.*")).pipe(
es.through(
function write(data) {
console.log("printing one customer object read from file ::");
console.log(data);
this.pause(); // stop reading until this customer has been saved
processOneCustomer(data, this);
return data;
},
function end() {
console.log("stream reading ended");
this.emit("end");
}
)
);
function processOneCustomer(data, es) {
DataModel.save(function(err, dataModel) {
es.resume();
});
}
I realize that you want to avoid reading the whole JSON file into memory if possible, however if you have the memory available it may not be a bad idea performance-wise. Using node.js's require() on a json file loads the data into memory really fast.
I ran two tests to see what the performance looked like on printing out an attribute from each feature from an 81MB geojson file.
In the 1st test, I read the entire geojson file into memory using var data = require('./geo.json'). That took 3330 milliseconds and then printing out an attribute from each feature took 804 milliseconds for a grand total of 4134 milliseconds. However, it appeared that node.js was using 411MB of memory.
In the second test, I used @arcseldon's answer with JSONStream + event-stream. I modified the JSONPath query to select only what I needed. This time the memory never went higher than 82MB, however, the whole thing now took 70 seconds to complete!
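If you want to make this kind of comparison on your own data, a rough sketch of a timing helper is below. The name measure and the stand-in sample data are invented for this illustration; it only handles synchronous work and reports the process-wide resident set size, so treat the numbers as a ballpark rather than a benchmark.

```javascript
// Hypothetical helper for rough comparisons; not a rigorous benchmark.
function measure(label, fn) {
  const start = process.hrtime.bigint();
  const result = fn(); // synchronous work only
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const rssMb = process.memoryUsage().rss / (1024 * 1024);
  return { label, elapsedMs, rssMb, result };
}

// Stand-in data; a real test would load the actual file.
const sample = JSON.stringify({ features: [{ id: 1 }, { id: 2 }] });
const run = measure('JSON.parse', () => JSON.parse(sample));
console.log(`${run.label}: ${run.elapsedMs.toFixed(3)} ms, rss ${run.rssMb.toFixed(1)} MB`);
```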
I wrote a module that can do this, called BFJ. Specifically, the method bfj.match can be used to break up a large stream into discrete chunks of JSON:
const bfj = require('bfj');
const fs = require('fs');
const stream = fs.createReadStream(filePath);
bfj.match(stream, (key, value, depth) => depth === 0, { ndjson: true })
.on('data', object => {
// do whatever you need to do with object
})
.on('dataError', error => {
// a syntax error was found in the JSON
})
.on('error', error => {
// some kind of operational error occurred
})
.on('end', () => {
// finished processing the stream
});
Here, bfj.match returns a readable, object-mode stream that will receive the parsed data items, and is passed 3 arguments:
A readable stream containing the input JSON.
A predicate that indicates which items from the parsed JSON will be pushed to the result stream.
An options object indicating that the input is newline-delimited JSON (this is to process format B from the question, it's not required for format A).
Upon being called, bfj.match will parse JSON from the input stream depth-first, calling the predicate with each value to determine whether or not to push that item to the result stream. The predicate is passed three arguments:
The property key or array index (this will be undefined for top-level items).
The value itself.
The depth of the item in the JSON structure (zero for top-level items).
Of course a more complex predicate can also be used as necessary according to requirements. You can also pass a string or a regular expression instead of a predicate function, if you want to perform simple matches against property keys.
If you have control over the input file, and it's an array of objects, you can solve this more easily. Arrange to output the file with each record on one line, like this:
[
{"key": value},
{"key": value},
...
]
This is still valid JSON.
Then, use the node.js readline module to process them one line at a time.
var fs = require("fs");
var lineReader = require('readline').createInterface({
input: fs.createReadStream("input.txt")
});
lineReader.on('line', function (line) {
line = line.trim();
if (line.charAt(line.length-1) === ',') {
line = line.substr(0, line.length-1);
}
if (line.charAt(0) === '{') {
processRecord(JSON.parse(line));
}
});
function processRecord(record) {
// Process the records one at a time here!
}
I solved this problem using the split npm module. Pipe your stream into split, and it will "Break up a stream and reassemble it so that each line is a chunk".
Sample code:
var fs = require('fs')
, split = require('split')
;
var stream = fs.createReadStream(filePath, {flags: 'r', encoding: 'utf-8'});
var lineStream = stream.pipe(split());
lineStream.on('data', function(chunk) {
if (!chunk) return; // skip empty chunks (e.g. after a trailing newline)
var json = JSON.parse(chunk);
// ...
});
Using @josh3736's answer, but for ES2021 and Node.js 16+ with async/await + AirBnb rules:
import fs from 'node:fs';
const file = 'file.json';
/**
* @callback itemProcessorCb
* @param {object} item The current item
*/
/**
* Process each data chunk in a stream.
*
* @param {import('fs').ReadStream} readable The readable stream
* @param {itemProcessorCb} itemProcessor A function to process each item
*/
async function processChunk(readable, itemProcessor) {
let data = '';
let total = 0;
// eslint-disable-next-line no-restricted-syntax
for await (const chunk of readable) {
// join with last result, remove CR and get lines
const lines = (data + chunk).replace(/\r/g, '').split('\n'); // strip every CR, not just the first
// clear last result
data = '';
// process lines
let line = lines.shift();
const items = [];
while (line !== undefined) { // an empty line must not end the loop early
// check if isn't a empty line or an array definition
if (line !== '' && !/[\[\]]+/.test(line)) {
try {
// remove the last comma and parse json
const json = JSON.parse(line.replace(/\s?(,)+\s?$/, ''));
items.push(json);
} catch (error) {
// last line gets only a partial line from chunk
// so we add this to join at next loop
data += line;
}
}
// continue
line = lines.shift();
}
total += items.length;
// Process items in parallel
await Promise.all(items.map(itemProcessor));
}
console.log(`${total} items processed.`);
}
// Process each item
async function processItem(item) {
console.log(item);
}
// Init
try {
const readable = fs.createReadStream(file, {
flags: 'r',
encoding: 'utf-8',
});
await processChunk(readable, processItem); // await so the catch below sees async errors
} catch (error) {
console.error(error.message);
}
For a JSON like:
[
{ "name": "A", "active": true },
{ "name": "B", "active": false },
...
]
https.get(url1 , function(response) {
var data = "";
response.on('data', function(chunk) {
data += chunk.toString();
})
.on('end', function() {
console.log(data)
});
});
I think you need to use a database. MongoDB is a good choice in this case because it is JSON compatible.
UPDATE:
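You can use the mongoimport tool to import JSON data into MongoDB. By default it expects a sequence of separate JSON documents (format B); for a file containing a single JSON array (format A), add the --jsonArray option.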
mongoimport --collection collection --file collection.json
I am trying to write a JXA script in Apple Script Editor, that compresses a string using the LZ algorithm and writes it to a text (JSON) file:
var story = "Once upon a time in Silicon Valley..."
var storyC = LZString.compress(story)
var data_to_write = "{\x22test\x22\x20:\x20\x22"+storyC+"\x22}"
app.displayAlert(data_to_write)
var desktopString = app.pathTo("desktop").toString()
var file = `${desktopString}/test.json`
writeTextToFile(data_to_write, file, true)
Everything works, except that the LZ compressed string is just transformed to a set of "?" by the time it reaches the output file, test.json.
It should look like:
{"test" : "㲃냆Њޱᐈ攀렒삶퓲ٔ쀛䳂䨀푖㢈Ӱນꀀ"}
Instead it looks like:
{"test" : "????????????????????"}
I have a feeling the conversion is happening in the app.write command used by the writeTextToFile() function (which I pulled from an example in Apple's Mac Automation Scripting Guide):
var app = Application.currentApplication()
app.includeStandardAdditions = true
function writeTextToFile(text, file, overwriteExistingContent) {
try {
// Convert the file to a string
var fileString = file.toString()
// Open the file for writing
var openedFile = app.openForAccess(Path(fileString), { writePermission: true })
// Clear the file if content should be overwritten
if (overwriteExistingContent) {
app.setEof(openedFile, { to: 0 })
}
// Write the new content to the file
app.write(text, { to: openedFile, startingAt: app.getEof(openedFile) })
// Close the file
app.closeAccess(openedFile)
// Return a boolean indicating that writing was successful
return true
}
catch(error) {
try {
// Close the file
app.closeAccess(file)
}
catch(error) {
// Report the error if closing failed
console.log(`Couldn't close file: ${error}`)
}
// Return a boolean indicating that writing failed
return false
}
}
Is there a substitute command for app.write that maintains the LZ compressed string / a better way to accomplish what I am trying to do?
In addition, I am using the readFile() function (also from the Scripting Guide) to load the LZ string back into the script:
function readFile(file) {
// Convert the file to a string
var fileString = file.toString()
// Read the file and return its contents
return app.read(Path(fileString))
}
But rather than returning:
{"test" : "㲃냆Њޱᐈ攀렒삶퓲ٔ쀛䳂䨀푖㢈Ӱນꀀ"}
It is returning:
"{\"test\" : \"㲃냆੠Њޱᐈ攀렒삶퓲ٔ쀛䳂䨀푖㢈Ӱນꀀ\"}"
Does anybody know a fix for this too?
I know that it is possible to use Cocoa in JXA scripts, so maybe the solution lies therein?
I am just getting to grips with JavaScript so I'll admit trying to grasp Objective-C or Swift is way beyond me right now.
I look forward to any solutions and/or pointers that you might be able to provide me. Thanks in advance!
After some further Googling, I came across these two posts:
How can I write UTF-8 files using JavaScript for Mac Automation?
read file as class utf8
I have thus altered my script accordingly.
writeTextToFile() now looks like:
function writeTextToFile(text, file) {
// source: https://stackoverflow.com/a/44293869/11616368
var nsStr = $.NSString.alloc.initWithUTF8String(text)
var nsPath = $(file).stringByStandardizingPath
var successBool = nsStr.writeToFileAtomicallyEncodingError(nsPath, false, $.NSUTF8StringEncoding, null)
if (!successBool) {
throw new Error("function writeFile ERROR:\nWrite to File FAILED for:\n" + file)
}
return successBool
};
While readFile() looks like:
ObjC.import('Foundation')
const readFile = function (path, encoding) {
// source: https://github.com/JXA-Cookbook/JXA-Cookbook/issues/25#issuecomment-271204038
const pathString = path.toString()
!encoding && (encoding = $.NSUTF8StringEncoding)
const fm = $.NSFileManager.defaultManager
const data = fm.contentsAtPath(pathString)
const str = $.NSString.alloc.initWithDataEncoding(data, encoding)
return ObjC.unwrap(str)
};
Both use Objective-C to overcome app.write and app.read's inability to handle UTF-8.
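As for the escaped quotes in the second output shown in the question: that may simply be how a string value is displayed, not a change to the file contents. This is an assumption about the display, not something confirmed above. The sketch below uses plain JavaScript with a stand-in string to show the difference between a string's quoted representation and the object you get after JSON.parse.

```javascript
// Stand-in for the text returned by readFile(); the real content would differ.
const fileText = '{"test" : "abc"}';

// The quoted, escaped representation of the string itself:
console.log(JSON.stringify(fileText)); // "{\"test\" : \"abc\"}"

// Parsing the text gives back the object, with no escaping involved:
const parsed = JSON.parse(fileText);
console.log(parsed.test); // abc
```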
I want to save a file in the client's storage using Store.js.
I can change the data using store.set, and I can log it to the console to see the change, but the file that is supposed to be saved in the app data folder is never created.
I tried to get the path where it's being saved, and it is:
C:\Users\USER\AppData\Roaming\stoma2/Categories.json
I noticed that there is a "/", so I tried:
C:\Users\USER\AppData\Roaming\stoma2\Categories.json
and :
C:/Users/USER/AppData/Roaming/stoma2/Categories.json
But none of the three worked.
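For what it's worth, the mixed separators in the first path can be reproduced with Node's own path module. The sketch below assumes that the bundled path module in the renderer behaves like path.posix (an assumption, not something confirmed in this thread) and compares it with path.win32. Windows file APIs generally accept either separator, so the mixed form is probably not what stops the file from being created.

```javascript
const path = require('path');

const userData = 'C:\\Users\\USER\\AppData\\Roaming\\stoma2';

// POSIX-style join: backslashes are ordinary characters, "/" is the separator.
const posixJoined = path.posix.join(userData, 'Categories.json');
// Windows-style join: everything is normalized to backslashes.
const win32Joined = path.win32.join(userData, 'Categories.json');

console.log(posixJoined); // C:\Users\USER\AppData\Roaming\stoma2/Categories.json
console.log(win32Joined); // C:\Users\USER\AppData\Roaming\stoma2\Categories.json
```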
This is my Store.js :
const fs = require('browserify-fs');
var fs2 = require('filereader'),Fs2 = new fs2();
const electron = window.require('electron');
const path = require('path');
class Store {
constructor(opts) {
// Renderer process has to get `app` module via `remote`, whereas the main process can get it directly
// app.getPath('userData') will return a string of the user's app data directory path.
//const userDataPath = (electron.app || electron.remote.app).getPath('userData');
var userDataPath = (electron.app || electron.remote.app).getPath('userData');
for(var i=0;i<userDataPath.length;i++){
if(userDataPath.charAt(i)=="\\"){
userDataPath = userDataPath.replace("\\","/");
}
}
// We'll use the `configName` property to set the file name and path.join to bring it all together as a string
this.path = path.join(userDataPath, opts.configName + '.json');
this.data = parseDataFile(this.path, opts.defaults);
console.log(this.path);
}
// This will just return the property on the `data` object
get(key) {
return this.data[key];
}
// ...and this will set it
set(key, val) {
this.data[key] = val;
// Wait, I thought using the node.js' synchronous APIs was bad form?
// We're not writing a server so there's not nearly the same IO demand on the process
// Also if we used an async API and our app was quit before the asynchronous write had a chance to complete,
// we might lose that data. Note that in a real app, we would try/catch this.
fs.writeFile(this.path, JSON.stringify(this.data));
}
}
function parseDataFile(filePath, data) {
// We'll try/catch it in case the file doesn't exist yet, which will be the case on the first application run.
// `fs.readFileSync` will return a JSON string which we then parse into a Javascript object
try {
return JSON.parse(Fs2.readAsDataURL(new File(filePath)));
} catch(error) {
// if there was some kind of error, return the passed in defaults instead.
return data;
}
}
// expose the class
export default Store;
There might be a problem with fs.writeFile() (that seems to be the source of the problem).
and this is my call :
//creation
const storeDefCat = new Store({
configName: "Categories",
defaults: require("../data/DefaultCategorie.json")
})
//call for the save
storeDefCat.set('Pizza',{id:0,path:storeDefCat.get('Pizza').path});
For now, if possible, I might need to find another way to save the file.
I also tried fs, but it doesn't work for me for some reason (I get strange errors that I can't seem to fix).
If anyone has an idea, I would be grateful.
So I managed to fix the problem. Why was fs giving me errors about undefined functions? Why wasn't the file getting created? It has nothing to do with the code itself, but with the imports.
To clarify, I was using:
const fs = require('fs');
And the solution is to change it to:
const fs = window.require('fs');
Just adding window. fixed all the problems. Since it's my first time using Electron, I wasn't used to importing from window, but it seems to be necessary. Moreover, there were no posts saying this is the fix.
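For reference, here is a minimal sketch of the store once Node's real fs is available. It uses a plain require('fs') and a temporary directory so that it runs outside Electron; in the renderer you would obtain fs through window.require as described above. The dir option and the readJsonOrDefault helper are invented for this illustration and are not part of the original Store.js.

```javascript
const fs = require('fs'); // in an Electron renderer: window.require('fs')
const os = require('os');
const path = require('path');

// Hypothetical helper: falls back to the defaults when the file is missing or invalid.
function readJsonOrDefault(filePath, defaults) {
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    return { ...defaults };
  }
}

class Store {
  // `dir` stands in for app.getPath('userData') so the sketch runs anywhere.
  constructor({ dir, configName, defaults }) {
    this.path = path.join(dir, configName + '.json');
    this.data = readJsonOrDefault(this.path, defaults);
  }

  get(key) {
    return this.data[key];
  }

  set(key, val) {
    this.data[key] = val;
    fs.writeFileSync(this.path, JSON.stringify(this.data)); // synchronous write
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'store-demo-'));
const store = new Store({ dir, configName: 'Categories', defaults: { Pizza: { id: 0 } } });
store.set('Pizza', { id: 1 });

// A second instance reads back what the first one wrote.
const reopened = new Store({ dir, configName: 'Categories', defaults: {} });
console.log(reopened.get('Pizza')); // { id: 1 }
```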
I need some help to understand how streams work in Node.js.
To explain: I need to write a module which calls a UNIX process (with spawn), and I want to redirect the stdout of this process to a Readable Stream.
I want this so that the module can export the Readable Stream and allow another module to read from it.
To do this, I have written a little piece of code:
var spawn = require('child_process').spawn
var Duplex = require('stream').Duplex;
var stream = new Duplex;
var start = function() {
ps = spawn('mycmd', [/*... args ...*/]);
ps.stdout.pipe(stream);
};
exports.stream = stream;
exports.start = start;
But if I use this module, an exception is thrown which says that the stream doesn't implement the _read method.
Can you help me with this problem?
Thanks in advance.
[EDIT] I have tried the solution of creating a Stream object, but that doesn't work. Here is the code:
var spawn = require('child_process').spawn;
var Stream = require('stream');
var ps = null;
var stream = new Stream;
stream.readable = stream.writable = true;
var start = function() {
if(ps == null) {
ps = spawn('mycmd', []);
ps.stdout.pipe(stream);
}
};
var stop = function() {
if(ps) {
ps.kill();
ps = null;
}
};
exports.stream = stream;
exports.start = start;
exports.stop = stop;
But when I try to listen to the stream, I encounter a new error:
_stream_readable.js:583
var written = dest.write(chunk);
^
TypeError: Object #<Stream> has no method 'write'
Most of Node's Stream classes aren't meant to be used directly, but as the base of a custom type:
Note that stream.Duplex is an abstract class designed to be extended with an underlying implementation of the _read(size) and _write(chunk, encoding, callback) methods as you would with a Readable or Writable stream class.
One notable exception is stream.PassThrough, which is a simple echo stream implementation.
var PassThrough = require('stream').PassThrough;
var stream = new PassThrough;
Also note that ps will be a global, making it directly accessible in all other modules.
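Putting the PassThrough suggestion together with the module from the question gives something like the sketch below. The start function taking the command and its arguments as parameters is a choice made for this illustration (the question hard-codes 'mycmd'), and ps is declared locally so it does not leak as a global.

```javascript
const { spawn } = require('child_process');
const { PassThrough } = require('stream');

const stream = new PassThrough(); // readable by other modules, writable by pipe()
let ps = null;

function start(cmd, args) {
  if (ps) return; // already running
  ps = spawn(cmd, args);
  // By default pipe() ends `stream` when the child's stdout ends.
  ps.stdout.pipe(stream);
  ps.on('exit', () => {
    ps = null;
  });
}

function stop() {
  if (ps) {
    ps.kill();
    ps = null;
  }
}

module.exports = { stream, start, stop };
```

A consumer would attach a 'data' listener to stream (or pipe it somewhere) and then call start with the command to run. If the process needs to be restarted and the same stream reused, passing { end: false } to pipe keeps the PassThrough open.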
If you simply want to use a stream, then you should do:
var stream = new Stream;
stream.readable = stream.writable = true;
Duplex is meant for developers. Some methods like _read and _write need to be implemented for it.
[Update]
OK, you have a data source: the stdout. You will need a write function; use this:
stream.write = function(data){this.emit('data', data);};
I have a library that takes as input a ReadableStream, but my input is just a base64 format image. I could convert the data I have into a Buffer like so:
var img = new Buffer(img_string, 'base64');
But I have no idea how to convert it to a ReadableStream or convert the Buffer I obtained to a ReadableStream.
Is there a way to do this?
For nodejs 10.17.0 and up:
const { Readable } = require('stream');
const stream = Readable.from(myBuffer);
something like this...
import { Readable } from 'stream'
const buffer = Buffer.from(img_string, 'base64') // new Buffer() is deprecated
const readable = new Readable()
readable._read = () => {} // _read is required but you can noop it
readable.push(buffer)
readable.push(null)
readable.pipe(consumer) // consume the stream
In the general course, a readable stream's _read function should collect data from the underlying source and push it incrementally ensuring you don't harvest a huge source into memory before it's needed.
In this case though you already have the source in memory, so _read is not required.
Pushing the whole buffer just wraps it in the readable stream api.
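If you do want the incremental behaviour described above even for an in-memory buffer (for example to hand a consumer smaller pieces), a sketch along these lines should work. The helper name bufferToChunkedStream, the chunkSize parameter and the stand-in image string are all invented for this illustration.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: exposes an in-memory Buffer as a Readable that
// hands out at most chunkSize bytes per _read call.
function bufferToChunkedStream(buffer, chunkSize = 16 * 1024) {
  let offset = 0;
  return new Readable({
    read() {
      if (offset >= buffer.length) {
        this.push(null); // signal end of stream
        return;
      }
      const end = Math.min(offset + chunkSize, buffer.length);
      this.push(buffer.subarray(offset, end)); // a view on the buffer, not a copy
      offset = end;
    },
  });
}

// Stand-in for the base64 image string from the question.
const imgString = Buffer.from('not really an image').toString('base64');
const source = Buffer.from(imgString, 'base64');
const readable = bufferToChunkedStream(source, 4);
```

The resulting readable can be piped to the consumer just like the stream in the answer above.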
Node Stream Buffer is obviously designed for use in testing; the inability to avoid a delay makes it a poor choice for production use.
Gabriel Llamas suggests streamifier in this answer: How to wrap a buffer as a stream2 Readable stream?
You can create a ReadableStream using Node Stream Buffers like so:
// Initialize stream
var myReadableStreamBuffer = new streamBuffers.ReadableStreamBuffer({
frequency: 10, // in milliseconds.
chunkSize: 2048 // in bytes.
});
// With a buffer
myReadableStreamBuffer.put(aBuffer);
// Or with a string
myReadableStreamBuffer.put("A String", "utf8");
The frequency cannot be 0 so this will introduce a certain delay.
You can use the standard NodeJS stream API for this - stream.Readable.from
const { Readable } = require('stream');
const stream = Readable.from(buffer);
Note: Don't convert a buffer to string (buffer.toString()) if the buffer contains binary data. It will lead to corrupted binary files.
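A quick way to convince yourself of this is the round trip below. The four bytes are just an example (the usual start of a JPEG file); decoding them as UTF-8 and encoding them again does not give the original bytes back, whereas a base64 round trip does.

```javascript
const binary = Buffer.from([0xff, 0xd8, 0xff, 0xe0]); // typical first bytes of a JPEG

// UTF-8 round trip: invalid byte sequences are replaced, so the data changes.
const viaUtf8 = Buffer.from(binary.toString('utf8'), 'utf8');
console.log(viaUtf8.equals(binary)); // false

// base64 round trip: lossless for arbitrary bytes.
const viaBase64 = Buffer.from(binary.toString('base64'), 'base64');
console.log(viaBase64.equals(binary)); // true
```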
You don't need to add a whole npm lib for a single file. I refactored it to TypeScript:
import { Readable, ReadableOptions } from "stream";
export class MultiStream extends Readable {
_object: any;
constructor(object: any, options: ReadableOptions) {
super(object instanceof Buffer || typeof object === "string" ? options : { objectMode: true });
this._object = object;
}
_read = () => {
this.push(this._object);
this._object = null;
};
}
Based on node-streamifier (the best option, as said above).
Here is a simple solution using streamifier module.
const streamifier = require('streamifier');
streamifier.createReadStream(Buffer.from([97, 98, 99])).pipe(process.stdout);
You can use Strings, Buffer and Object as its arguments.
This is my simple code for this.
import { Readable } from 'stream';
const newStream = new Readable({
read() {
this.push(someBuffer);
this.push(null); // signal end of stream; otherwise someBuffer is pushed forever
},
})
Try this:
const Duplex = require('stream').Duplex; // core NodeJS API
function bufferToStream(buffer) {
let stream = new Duplex();
stream.push(buffer);
stream.push(null);
return stream;
}
Source:
Brian Mancini -> http://derpturkey.com/buffer-to-stream-in-node/