createReadStream in Node.JS - javascript

So I used fs.readFile() and it gives me
"FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of
memory"
Since fs.readFile() loads the whole file into memory before calling the callback, should I use fs.createReadStream() instead?
That's what I was doing previously with readFile:
fs.readFile('myfile.json', function (err1, data) {
    if (err1) {
        console.error(err1);
    } else {
        var myData = JSON.parse(data);
        // Do some operation on myData here
    }
});
Sorry, I'm kind of new to streaming; is the following the right way to do the same thing but with streaming?
var readStream = fs.createReadStream('myfile.json');
readStream.on('end', function () {
    readStream.close();
    var myData = JSON.parse(readStream);
    // Do some operation on myData here
});
Thanks

If the file is enormous then yes, streaming will be how you want to deal with it. However, what you're doing in your second example is letting the stream buffer all the file data into memory and then handling it on end. It's essentially no different than readFile that way.
You'll want to check out JSONStream. What streaming means is that you want to deal with the data as it flows by. In your case you obviously have to do this because you cannot buffer the entire file into memory all at once. With that in mind, hopefully code like this makes sense:
JSONStream.parse('rows.*.doc')
Notice that it has a kind of query pattern. That's because you will not have the entire JSON object/array from the file to work with all at once, so you have to think more in terms of how you want JSONStream to deal with the data as it finds it.
You can use JSONStream to essentially query for the JSON data that you are interested in. This way you're never buffering the whole file into memory. It does have the downside that if you do need all the data, then you'll have to stream the file multiple times, using JSONStream to pull out only the data you need right at that moment, but in your case you don't have much choice.
You could also use JSONStream to parse out data in order and do something like dump it into a database.
JSONStream.parse is similar to JSON.parse but instead of returning a whole object it returns a stream. When the parse stream gets enough data to form a whole object matching your query, it will emit a data event with the data being the document that matches your query. Once you've configured your data handler you can pipe your read stream into the parse stream and watch the magic happen.
Example:
var fs = require('fs');
var JSONStream = require('JSONStream');

var readStream = fs.createReadStream('myfile.json');
var parseStream = JSONStream.parse('rows.*.doc');

parseStream.on('data', function (doc) {
    db.insert(doc); // pseudo-code for inserting doc into a pretend database.
});

readStream.pipe(parseStream);
That's the verbose way to help you understand what's happening. Here is a more succinct way:
var JSONStream = require('JSONStream');

fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.doc'))
    .on('data', function (doc) {
        db.insert(doc);
    });
Edit:
For further clarity about what's going on, try to think about it like this. Let's say you have a giant lake and you want to treat the water to purify it and move the water to a new reservoir. If you had a giant magical helicopter with a huge bucket then you could fly over the lake, put the lake in the bucket, add treatment chemicals to it, then fly it to its destination.
The problem of course being that there is no such helicopter that can deal with that much weight or volume. It's simply impossible, but that doesn't mean we can't accomplish our goal a different way. So instead you build a series of rivers (streams) between the lake and the new reservoir. You then set up cleansing stations in these rivers that purify any water that passes through them. These stations could operate in a variety of ways. Maybe the treatment can be done so fast that you can let the river flow freely and the purification will just happen as the water travels down the stream at maximum speed.
It's also possible that it takes some time for the water to be treated, or that the station needs a certain amount of water before it can effectively treat it. So you design your rivers to have gates and you control the flow of the water from the lake into your rivers, letting the stations buffer just the water they need until they've performed their job and released the purified water downstream and on to its final destination.
That's almost exactly what you want to do with your data. The parse stream is your cleansing station and it buffers data until it has enough to form a whole document that matches your query, then it pushes just that data downstream (and emits the data event).
Node streams are nice because most of the time you don't have to deal with opening and closing the gates. Node streams are smart enough to apply backpressure when a stream buffers a certain amount of data. It's as if the cleansing station and the gates on the lake are talking to each other to work out the perfect flow rate.
If you had a streaming database driver then you'd theoretically be able to create some kind of insert stream and then do parseStream.pipe(insertStream) instead of handling the data event manually :D. Here's an example of creating a filtered version of your JSON file, in another file.
fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.doc'))
    .pipe(JSONStream.stringify())
    .pipe(fs.createWriteStream('filtered-myfile.json'));
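For the "insert stream" idea mentioned above, here is a minimal sketch using a Writable in object mode; db.insert is still pseudo-code, as in the earlier examples:
var fs = require('fs');
var JSONStream = require('JSONStream');
var Writable = require('stream').Writable;

var insertStream = new Writable({
    objectMode: true, // each chunk is a parsed doc, not a Buffer
    write: function (doc, encoding, callback) {
        db.insert(doc, callback); // pseudo-code; signal completion through the callback
    }
});

fs.createReadStream('myfile.json')
    .pipe(JSONStream.parse('rows.*.doc'))
    .pipe(insertStream);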

Related

Get data from Stream.Writable into a string variable

I am using the #kubernetes/client-node library.
My end goal is to execute commands (say "ls") and get the output for further processing.
The .exec() method requires providing two Writable streams (for the WebSocket to write the output to) and one Readable stream (for pushing our commands to).
The code I have looks something like this:
const outputStream = new Stream.Writable();
const commandStream = new Stream.Readable();

const podExec = await exec.exec(
    "myNamespace",
    "myPod",
    "myContainer",
    ["/bin/sh", "-c"],
    outputStream,
    outputStream,
    commandStream,
    true
);

commandStream.push("ls -l\n");

// get the data from the Writable stream here

outputStream.destroy();
commandStream.destroy();
podExec.close();
I am pretty new to JS and am having trouble getting the output from the Writable stream since it doesn't allow direct reading.
Writing the output to a file with a Writable stream and then reading that file back seems unnecessarily complicated.
I would like to write the output as a string to a variable.
Has anyone encountered the same task before, and if so, what can you suggest to get the command output?
I would appreciate any help on this matter!
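For what it's worth, a Writable stream can collect its chunks into an in-memory string if you supply a write implementation; here is a minimal sketch, independent of the kubernetes client (the exec.exec() call itself is assumed to stay as in the question):
const Stream = require("stream");

let output = "";
const outputStream = new Stream.Writable({
    write(chunk, encoding, callback) {
        output += chunk.toString(); // accumulate everything the exec WebSocket writes
        callback();
    },
});

// A Readable for stdin whose read() is a no-op; commands are pushed into it manually.
const commandStream = new Stream.Readable({ read() {} });

// ...pass outputStream (stdout/stderr) and commandStream (stdin) to exec.exec() as above...
// Once the command has finished, `output` holds the command's output as a string.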

How can I create an array as a database with JSON files and use JavaScript to update / save it

I am making a Discord bot in Node.js, mostly for fun and to get better at coding, and I want the bot to push a string into an array and update the array file permanently.
I have been using separate .js files for my arrays, such as this:
module.exports = [
    "Map: Battlefield",
    "Map: Final Destination",
    "Map: Pokemon Stadium II",
];
and then requiring them in my main file. I tried using .push(), and it adds the desired string, but only for that one run.
What is the best way to have an array I can update and save permanently? Apparently JSON files are good for this.
Thanks, Carl
Congratulations on the idea of writing a bot to get some coding practice. I bet you will succeed with it!
I suggest you try to split your problem into small chunks, so it is going to be easier to reason about it.
Step 1 - storing
I agree with you in using JSON files as data storage. For an app that is intended to be a "training gym" it is more than enough, and you have all the time in the world to start looking into databases like Postgres, MySQL or Mongo later on.
A JSON file to store a list of values may look like this:
{
    "values": [
        "Map: Battlefield",
        "Map: Final Destination",
        "Map: Pokemon Stadium II"
    ]
}
When you save this content into list1.json, you have your first data file.
Step 2 - reading
Reading a JSON file in NodeJS is easy:
const list1 = require('./path-to/list1.json');
console.log(list1.values);
This will load the entire content of the file in memory when your app starts. You can also look into more sophisticated ways to read files using the file system API.
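For example, a sketch of the same read using fs.readFile (the path is the same illustrative one as above):
const fs = require('fs');

fs.readFile('./path-to/list1.json', 'utf8', function (err, content) {
    if (err) {
        return console.log(err);
    }
    const list1 = JSON.parse(content); // same shape as the require() result above
    console.log(list1.values);
});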
Step 3 - writing
Looks like you know your way around in-memory array modifications using APIs like push() or maybe splice().
Once you have updated the in-memory representation, you need to persist the change to your file. You basically have to write it down in JSON format.
Option n.1: you can use the Node's file system API:
// https://stackoverflow.com/questions/2496710/writing-files-in-node-js
const fs = require('fs');

const filePath = './path-to/list1.json';
const fileContent = JSON.stringify(list1);

fs.writeFile(filePath, fileContent, function (err) {
    if (err) {
        return console.log(err);
    }
    console.log("The file was saved!");
});
Option n.2: you can use fs-extra which is an extension over the basic API:
const fs = require('fs-extra');

const filePath = './path-to/list1.json';

fs.writeJson(filePath, list1, function (err) {
    if (err) {
        return console.log(err);
    }
    console.log("The file was saved!");
});
In both cases list1 comes from the previous steps, and it is where you modified the array in memory.
Be careful of asynchronous code:
Both writing examples use non-blocking asynchronous API calls - the link points to a decent article.
For simplicity's sake, you can start by using the synchronous APIs, which are basically:
fs.writeFileSync
fs.writeJsonSync
You can find all the details in the links above.
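As a sketch, the synchronous variant of Option n.1 could look like this (same illustrative path and the same list1 as above):
const fs = require('fs');

const filePath = './path-to/list1.json';

try {
    // Pretty-print with a 2-space indent so the file stays easy to read and edit.
    fs.writeFileSync(filePath, JSON.stringify(list1, null, 2));
    console.log("The file was saved!");
} catch (err) {
    console.log(err);
}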
Have fun with bot coding!

Discord User History - Saving Multiple Things In A JSON File

I'm trying to set up a history command for when a player gets warned by a moderation bot. I have it saving to a JSON file, however I don't know how to save multiple cases for one player. Right now it just replaces the last warn.
//--Adds To Logs
let warnhist = JSON.parse(fs.readFileSync("./historywarn.json", "utf8"));

warnhist[wUser.id] = {
    Case: `${wUser} was warned by ${message.author} for ${reason}`
};

fs.writeFile("./historywarn.json", JSON.stringify(warnhist), (err) => {
    if (err) console.log(err);
});
It saves like this without adding onto it every time:
{"407104392647409664":{"Case":"<#407104392647409664> was warned by <#212770377833775104> for 2nd Warn"}}
I need it to save like this:
{
    "407104392647409664": {
        "Case": "<#407104392647409664> was warned by <#212770377833775104> for 1st Warn",
        "Case 2": "<#407104392647409664> was warned by <#212770377833775104> for 2nd Warn"
    }
}
You want an array instead of an object. Try structuring your data a bit more too, so your JSON looks like this:
{ "<user id>": { "warnings": [...] }
Then, instead of overwriting the history every time with an assignment warnhist[wUser.id] = ..., you can use Array.prototype.push to add a new warning.
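A minimal sketch of that approach, reusing the variables from your snippet:
let warnhist = JSON.parse(fs.readFileSync("./historywarn.json", "utf8"));

// Create the user's entry on their first warn, then append to it.
if (!warnhist[wUser.id]) {
    warnhist[wUser.id] = { warnings: [] };
}
warnhist[wUser.id].warnings.push(`${wUser} was warned by ${message.author} for ${reason}`);

fs.writeFile("./historywarn.json", JSON.stringify(warnhist), (err) => {
    if (err) console.log(err);
});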
Taking a step back though, I don't think a JSON file is a good way to store this data, especially if it contains all warnings for all users. You need to read and parse the entire file, then write it all back out, every single time you add a warning, and there could be conflicts if you have multiple instances of your service doing the same for a large number of users.
For a super similar API and storage format but better performance and consistency, try MongoDB instead of a plain JSON file.
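If you go that route, a rough sketch with the official mongodb driver might look like this (the connection string and the database/collection names are illustrative, not part of the original answer):
const { MongoClient } = require("mongodb");

async function addWarning(wUser, message, reason) {
    const client = new MongoClient("mongodb://localhost:27017"); // illustrative connection string
    await client.connect();
    try {
        const warnings = client.db("modbot").collection("warnings");

        // One document per warning, so nothing is ever overwritten.
        await warnings.insertOne({
            userId: wUser.id,
            case: `${wUser} was warned by ${message.author} for ${reason}`,
            at: new Date(),
        });

        // A user's full history is then a simple query:
        return warnings.find({ userId: wUser.id }).toArray();
    } finally {
        await client.close();
    }
}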

Are there any tutorials or examples for cubism.js + WebSocket?

Are there any tutorials specifically about connecting WebSockets (or other non-polling data source) and cubism.js?
In particular, I'd like to be able to create a real-time graph of data streaming from a server, visually similar to the example on the cubism page.
References:
- https://github.com/square/cubism/issues/5
- http://xaranke.github.io/articles/cubism-intro/
- Using Other Data Sources for cubism.js
Here's something I'm toying with. It's not authoritative but it seems to work: https://gist.github.com/cuadue/6427101
When data comes in from the websocket, put it in a buffer. Pump the callbacks (I'll explain those below), sending the buffer as the argument. Check the return code for "success" or "wait for more data". Success means data was sent to cubism and we can remove this callback.
When cubism requests a frame of data, set up a callback which checks if the last point in the buffer is after the last point cubism requested. Otherwise, wait for more data.
If there's data to cover the stop of the requested frame, we'll fulfill this request. Without an API to request history, we have to drop data going into the past.
Then, just interpolate the buffer onto the cubism step size.
It seems like cubism requests data from the same point in time multiple times, so it's up to you how to prune your buffer. I don't think it's safe to just drop all data earlier than the requested start time.
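A rough sketch of that callback-pumping idea (ws and interpolate are illustrative names; the actual code lives in the gist above):
var buffer = [];   // incoming points from the websocket: { time: <epoch ms>, value: <number> }
var pending = [];  // cubism callbacks still waiting for data

ws.onmessage = function (msg) {
    buffer.push(JSON.parse(msg.data));
    // Pump the waiting callbacks; keep only the ones that still need more data.
    pending = pending.filter(function (tryFulfill) {
        return !tryFulfill(buffer);
    });
};

context.metric(function (start, stop, step, callback) {
    pending.push(function (buf) {
        // Wait until the buffer covers the end of the requested frame.
        if (!buf.length || buf[buf.length - 1].time < +stop) return false;
        // Resample the buffered points onto cubism's step grid (interpolate is left out here).
        callback(null, interpolate(buf, +start, +stop, step));
        return true; // fulfilled, so this callback can be removed
    });
}, "websocket metric");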
I did a quick and dirty hack:
- The WebSocket fills a realTimeData array.
- Cubism does the initial fetch from some web services, then pulls from the realTimeData array.
var firstTime = true;

context.metric(function (start, stop, step, callback) {
    if (firstTime) {
        firstTime = false;
        // Initial fetch of historical data from a web service (URL elided in the original).
        d3.json("...", function (data) {
            var historicalData = []; // build this from the web service response
            callback(null, historicalData);
        });
    } else {
        callback(null, realTimeData);
    }
});
Note that cubism.js expects 6 points per fetch (cubism_metricOverlap), so make sure to keep 6 points in realTimeData.

Check/Log how much bandwidth PhantomJS/CasperJS used

Is it possible to check/log how much data has been transferred during each run of PhantomJs/CasperJS?
Each instance of Phantom/Casper has an instance_id assigned to it (by the PHP function that spun up the instance). After the run has finished, the amount of data transferred and the instance_id will have to make their way into a MySQL database, possibly via the PHP function that spawned the instance. This way the bandwidth utilization of individual phantomjs runs can be logged.
There can be many phantom/casper instances running, each lasting a minute or two.
The easiest and most accurate approach when trying to capture data is to get the collector and emitter as close as possible. In this case it would be ideal if phantomjs could capture that data that you need and send it back to your PHP function to associate it to the instance_id and do the database interaction. Turns out it can (at least partially).
Here is one approach:
var page = require('webpage').create();
var bytesReceived = 0;

page.onResourceReceived = function (res) {
    if (res.bodySize) {
        bytesReceived += res.bodySize;
    }
};

page.open("http://www.google.com", function (status) {
    console.log(bytesReceived);
    phantom.exit();
});
This captures the size of all resources retrieved, adds them up, and spits out the result to standard output where your PHP code is able to work with it. This does not include the size of headers or any POST activity. Depending upon your application, this might be enough. If not, then hopefully this gives you a good jumping off point.
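If you also want to approximate header overhead, onResourceReceived exposes the parsed response headers, so you could extend the handler along these lines (a rough estimate, not the exact on-the-wire size):
page.onResourceReceived = function (res) {
    // onResourceReceived fires at both the 'start' and 'end' stages; count each response once.
    if (res.stage !== 'end') {
        return;
    }
    if (res.bodySize) {
        bytesReceived += res.bodySize;
    }
    // Approximate the response header size (roughly "Name: value\r\n" per header).
    (res.headers || []).forEach(function (h) {
        bytesReceived += h.name.length + String(h.value).length + 4;
    });
};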
