How to read a stream of JSON objects, object by object - JavaScript

I have a binary application which generates a continuous stream of JSON objects (not an array of JSON objects). A JSON object can sometimes span multiple lines (still a valid JSON object, just prettified).
I can connect to this stream and read it without problems, like this:
var child = require('child_process').spawn('binary', ['arg', 'arg']);
child.stdout.on('data', data => {
  console.log(data);
});
The stream gives me Buffers and emits data events whenever it pleases, so I played with the readline module to parse the buffers into lines. That works (I'm able to JSON.parse() the line) for JSON objects which don't span multiple lines.
The optimal solution would be to listen for events which return a single JSON object, something like:
child.on('json', object => {
});
I have noticed the objectMode option in the Node streams documentation, however I'm getting the stream in Buffer format so I believe I'm unable to use it.
I had a look on npm at pixl-json-stream and json-stream, but in my opinion neither fits the purpose. There is clarinet-object-stream, but it would require building the JSON object from the ground up based on the events.
I'm not in control of the JSON object stream. Most of the time one object is on one line, however 10-20% of the time a JSON object spans multiple lines (\n as EOL) with no separator between objects. Each new object always starts on a new line.
Sample stream:
{ "a": "a", "b":"b" }
{ "a": "x",
"b": "y", "c": "z"
}
{ "a": "a", "b":"b" }
There must be a solution already; I'm just missing something obvious. I would rather find an appropriate module than hack together a regexp-based stream parser to handle this scenario.

I'd recommend trying to parse every line:
const readline = require('readline');
const rl = readline.createInterface({
  input: child.stdout
});

var tmp = '';
rl.on('line', function(line) {
  tmp += line;
  try {
    var obj = JSON.parse(tmp);
    child.emit('json', obj);
    tmp = '';
  } catch (_) {
    // JSON.parse may fail if the JSON is not complete yet
  }
});

child.on('json', function(obj) {
  console.log(obj);
});
As the child is an EventEmitter, one can just call child.emit('json', obj).
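The question also mentions objectMode: if you'd rather consume parsed objects from a stream than from the child emitter, here is a minimal sketch of the same try-parse idea wrapped in a Transform stream whose readable side is in object mode (an illustration, not a published module):
const { Transform } = require('stream');

// Sketch only: accumulate lines and try to parse them, pushing each
// complete JSON object downstream as an object (readableObjectMode).
class JsonObjectStream extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.pending = ''; // text carried over between chunks
  }
  _transform(chunk, encoding, callback) {
    this.pending += chunk.toString();
    const lines = this.pending.split('\n');
    this.pending = lines.pop(); // the last element may be an incomplete line
    let candidate = '';
    for (const line of lines) {
      candidate += line;
      try {
        this.push(JSON.parse(candidate)); // complete object: emit it
        candidate = '';
      } catch (_) {
        candidate += '\n'; // not complete yet, keep accumulating
      }
    }
    // carry any unfinished multi-line object over to the next chunk
    this.pending = candidate + this.pending;
    callback();
  }
}

// Usage:
// child.stdout.pipe(new JsonObjectStream()).on('data', obj => console.log(obj));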

Having the same requirement, I was uncomfortable enforcing a requirement for newlines to support readline, needed to be able to handle starting the read in the middle of a stream (possibly the middle of a JSON document), and didn't like constantly parsing and checking for errors (seemed inefficient).
As such I preferred using the clarinet SAX parser, collecting the documents as I went and emitting doc events once whole JSON documents have been parsed.
I just published this class to npm: https://www.npmjs.com/package/json-doc-stream

Related

JavaScript: passing a function object to a web worker - DataCloneError: could not be cloned

I need to use a web worker to open a separate thread and do some heavy CPU work.
I want to send the web worker a function call with arguments and then get the return value, so I went for:
funcs.js
export default function add(args) {
  return args[0] + args[1];
}
main.js
import add from './funcs.js';
// [...]
this.worker.postMessage({func: add, args: [7, 3]});
Then I get a runtime error:
DataCloneError: Failed to execute postMessage on Worker: function add(args) {
return args[0] + args[1];
}
could not be cloned.
It seems the worker.postMessage method only allows strings to be passed.
Any idea how I can work around this simply and elegantly?
About postMessage
The postMessage documentation gives a clear definition of what can and cannot be sent to a worker:
postMessage accepts only values or JavaScript objects handled by the structured clone algorithm, which includes cyclical references.
Looking at the structured clone algorithm, it accepts:
All primitive types (However, not symbols), Boolean object, String object, Date, RegExp (The lastIndex field is not preserved.), Blob, File, FileList, ArrayBuffer, ArrayBufferView (This basically means all typed arrays like Int32Array etc.), ImageBitmap, ImageData, Array, Object (This just includes plain objects (e.g. from object literals)), Map, Set
But unfortunately:
Error and Function objects cannot be duplicated by the structured clone algorithm; attempting to do so will throw a DATA_CLONE_ERR exception.
So a function is definitely not an option. A simple solution would be to import add directly in your worker.js file, and replace func with a string.
main.js
this.worker.postMessage( {func: 'ADD', args:[7, 3]} );
worker.js
import add from './funcs.js';
onmessage = function(event) {
  const action = event.data;
  switch (action.func) {
    case 'ADD': {
      postMessage({
        result: add(action.args)
      });
    }
    break;
    // ...
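To complete the picture, the main-thread side could look something like this (a sketch; it assumes worker.js is loaded as a module worker so the import statement works, and the file path is hypothetical):
// main.js (sketch) - create a module worker so worker.js can use `import`
const worker = new Worker('./worker.js', { type: 'module' });

worker.onmessage = (event) => {
  console.log(event.data.result); // 10
};

worker.postMessage({ func: 'ADD', args: [7, 3] });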

Apply transform stream to write stream without controlling read stream?

I have a function that expects a write stream to which I am providing the following stream:
const logStream = fs.createWriteStream('./log.txt')
fn(logStream)
fn is provided by a third-party module, so I do not control its implementation. Internally, I know that fn eventually does this:
// super simplified
fn (logStream) {
// ...
stream.pipe(logStream, { end: true })
// ...
}
My issue is that I know the read stream stream contains ANSI escape codes, which I don't want written to my log.txt. After a quick Google search, I found chalk/strip-ansi-stream, which is a transform stream designed to do just that.
So, being the Node streams newbie that I am, I decided to try to modify my code to this:
const stripAnsiStream = require('strip-ansi-stream')
const logStream = fs.createWriteStream('./log.txt')
fn(stripAnsiStream().pipe(logStream))
... which does not work: my log file still contains content with the ANSI escape codes. I think this is because instead of creating a chain like
a.pipe(b).pipe(c)
I've actually done
a.pipe(b.pipe(c))
How can I apply this transform stream to my write stream without controlling the beginning of the pipe chain where the read stream is provided?
For the purpose of chaining, stream.pipe() returns the destination stream it was given. The return value of b.pipe(c) is c.
When you call fn(b.pipe(c)), you're actually bypassing transform stream b and inputting the write stream c directly.
Case #1: a.pipe(b.pipe(c)) is effectively:
b.pipe(c)
a.pipe(c)
Case #2: a.pipe(b).pipe(c) is effectively:
a.pipe(b)
b.pipe(c)
The transform stream can be piped into the log stream first, and then passed into the module separately. This gives you case #2, just setting up the pipes in reverse order.
const stripAnsiStream = require('strip-ansi-stream')
const fn = require('my-third-party-module')
const transformStream = stripAnsiStream()
const logStream = fs.createWriteStream('./log.txt')
transformStream.pipe(logStream)
fn(transformStream)

Stringify an object array to JSON selectively

I have an Object array named users.
The object format in this array looks like this:
var userExample = {pub:{name:'John', id:'100'}, priv:{location:'NYC', phone:'000000'}};
As a RESTful service, clients may request information about all users, and obviously I just want to send them the public information.
So I want to serialize my data selectively by key (the priv key should be ignored).
Here is my code snippet:
var users = [];

function censor(key, value) {
  if (key == priv) {
    return undefined;
  }
  return value;
}

app.get('/listUsers', function(req, res) {
  res.end(JSON.stringify(users, censor));
});
When I run this code, an error occurs:
ReferenceError: priv is not defined
I'm a Javascript beginner, please help.
Change priv to "priv".
But your approach is dangerous. In similar situations I usually create a new object to export and explicitly copy the properties which should be exported; this way there's no risk of leaks when the data structure changes later. A whitelist is always more future-proof than a blacklist.
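A minimal sketch of that whitelist approach, using the userExample shape from the question (the toPublic name is just for illustration):
// Copy only the fields that are meant to be public.
function toPublic(user) {
  return { name: user.pub.name, id: user.pub.id };
}

app.get('/listUsers', function(req, res) {
  res.end(JSON.stringify(users.map(toPublic)));
});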
JSON.stringify() also accepts an array of property names as the replacer argument, e.g.:
```
JSON.stringify(foo, ['week', 'month']);
// '{"week":45,"month":7}', only keep "week" and "month" properties
```
Try with:
if (key == "priv")
This should work.
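Putting it together, the corrected replacer from the question would be:
function censor(key, value) {
  if (key == "priv") {   // compare against the string "priv", not an undefined variable
    return undefined;    // returning undefined omits the priv subtree
  }
  return value;
}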

Node.js: Capture STDOUT of `child_process.spawn`

I need to capture the output of a spawned child process in a custom stream.
child_process.spawn(command[, args][, options])
For example,
var s = fs.createWriteStream('/tmp/test.txt');
child_process.spawn('ifconfig', [], {stdio: [null, s, null]})
Now how do I read from the /tmp/test.txt in real time?
It looks like child_process.spawn is not using stream.Writable.prototype.write nor stream.Writable.prototype._write for its execution.
For example,
s.write = function() { console.log("this will never get printed"); };
As well as,
s.__proto__._write = function() { console.log("this will never get printed"); };
It looks like it uses file descriptors under-the-hood to write from child_process.spawn to a file.
Doing this does not work:
var s2 = fs.createReadStream('/tmp/test.txt');
s2.on("data", function() { console.log("this will never get printed either"); });
So, how can I get the STDOUT contents of a child process?
What I want to achieve is to stream the STDOUT of a child process to a socket. If I provide the socket directly to child_process.spawn as a stdio parameter, it closes the socket when it finishes, but I want to keep it open.
Update:
The solution is to use default {stdio: ['pipe', 'pipe', 'pipe']} options and listen to the created .stdout of the child process.
var cmd = child_process.spawn('ifconfig');
cmd.stdout.on("data", (data) => { ... });
Now, to up the ante, a more challenging question:
-- How do you read the STDOUT of the child process and still preserve the colors?
For example, if you send STDOUT to process.stdout like so:
child_process.spawn('ifconfig', [], {stdio: [null, process.stdout, null]});
it will keep the colors and print colored output to the console, because the .isTTY property is set to true on process.stdout.
process.stdout.isTTY // true
Now if you use the default {stdio: ['pipe', 'pipe', 'pipe']}, the data you will read will be stripped of console colors. How do you get the colors?
One way to do that would be creating your own custom stream with fs.createWriteStream, because child_process.spawn requires your streams to have a file descriptor.
Then setting .isTTY of that stream to true, to preserve colors.
And finally you would need to capture the data what child_process.spawn writes to that stream, but since child_process.spawn does not use .prototype.write nor .prototype._write of the stream, you would need to capture its contents in some other hacky way.
That's probably why child_process.spawn requires your stream to have a file descriptor because it bypasses the .prototype.write call and writes directly to the file under-the-hood.
Any ideas how to implement this?
You can do it without using a temporary file:
var child = child_process.spawn(command, args, options);
child.stdout.on('data', function (chunk) {
  console.log(chunk);
});
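If the goal from the question is to forward that output to a socket without the socket being closed when the child exits, one option (a sketch, assuming socket is an existing net.Socket) is to pipe with end: false:
var child = child_process.spawn('ifconfig');
child.stdout.pipe(socket, { end: false }); // the socket stays open after the child's stdout ends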
Hi, I'm on my phone but I will try to guide you as best I can. I will clarify when I'm near a computer if needed.
What I think you want is to read the stdout from a spawn and do something with the data?
You can assign the spawned process to a variable instead of just running the function, e.g.:
var child = spawn();
Then listen to the output like:
child.stdout.on('data', function(data) {
  console.log(data.toString());
});
You could use that to write the data to a file, or do whatever else you want with it.
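For example, to send the output straight to a file you could pipe it (a sketch):
var fs = require('fs');
child.stdout.pipe(fs.createWriteStream('/tmp/test.txt'));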
The stdio option requires file descriptors, not stream objects, so one way to do it is to use fs.openSync() to create an output file descriptor and use that.
Taking your first example, but using fs.openSync():
var s = fs.openSync('/tmp/test.txt', 'w');
var p = child_process.spawn('ifconfig', [], {stdio: [process.stdin, s, process.stderr]});
You could also set both stdout and stderr to the same file descriptor (for the same effect as bash's 2>&1).
You'll need to close the file when you are done, so:
p.on('close', function(code) {
  fs.closeSync(s);
  // do something useful with the exit code ...
});
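For the 2>&1 case mentioned above, pointing both stdout and stderr at the same descriptor would look something like this (sketch):
var s = fs.openSync('/tmp/test.txt', 'w');
var p = child_process.spawn('ifconfig', [], {stdio: [process.stdin, s, s]}); // stdout and stderr share one fd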

How to reorganize/refactor large sets of static javascript data

I have some inline-javascript containing large datasets which are hard-coded into my PHP site:
var statsData = {
"times" : [1369008000,1369094400,1369180800,],
"counts" : [49,479,516,]
};
I'd like to refactor my code so that my variables are served with this structure:
[
[1369008000, 49],
[1369094400, 479],
[1369180800, 516],
]
However I have many files to update - are there any tools that would help automate this process?
Just create a new array then loop through the original one, and place the values according to the indexes:
var statsData = {"times":[1369008000,1369094400,1369180800,],"counts":[49,479,516,]};
var result = [];//Create a new array for results.
for (var i = 0; i < statsData.times.length; ++i){//Loop the original object's times property from 0 to it's length.
result.push([statsData.times[i], statsData.counts[i]]);//Push a new array to the result array in the new order that will contain the original values you acces throught the index in the loop variable.
}
console.log(result);
Also, in your original code you have two opening [ brackets in your object's counts attribute but only one ] closing them.
Carrying on from the comments: trying to parse JS from a mix of PHP/HTML is horrible, so if you are prepared to do some copying and pasting then - if it were me - I'd opt for a simple command-line tool. As your JavaScript won't validate as JSON, it doesn't make much sense to try to parse it in any other language.
I've knocked up a quick script to work with your current example (I'll leave it up to you to extend it further as needed). To run it you will need to install Node.js.
Next, save the following wherever you like to organise files - let's call it statsData.js:
process.stdin.resume();
process.stdin.setEncoding('utf8');

process.stdin.on('data', function(data) {
  try {
    eval(data + ';global.data=statsData');
    processData();
  } catch (e) {
    process.stdout.write('Error: Invalid Javascript\n');
  }
});

function processData() {
  try {
    var i, out = [];
    while (i = data.times.shift())
      out.push([i, data.counts.shift() || 0]);
    process.stdout.write('var statsData=' + JSON.stringify(out) + ';\n');
  } catch (e) {
    process.stdout.write('Error: Unexpected Javascript\n');
  }
}
Now you have a CLI tool that works with standard I/O, to use it open a terminal window and run:
$ node path/to/statsData.js
It will then sit and wait for you to copy and paste valid JavaScript statements into the terminal; alternatively, you can pipe the input from a file into which you have copied your JS:
$ cat inputFile.js | node path/to/statsData.js > outputFile.js
cat is a Unix command - if you are working on a Windows machine I think the equivalent is type, but I'm unable to test that right now.
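If type does behave the same way, the Windows equivalent would presumably be (untested):
type inputFile.js | node path\to\statsData.js > outputFile.js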
