I'm trying to read a query from BigQuery and stream it to the front-end. In Node.js-land with Express, this would be:
app.get('/endpoint', (req, res) => {
bigQuery.createQueryStream(query).pipe(res);
});
However, createQueryStream() does not create a Node.js stream; instead, it's a custom stream object that returns table rows, and as such it fails:
(node:21236) UnhandledPromiseRejectionWarning: TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string or Buffer. Received type object
This is confirmed in the official documentation:
bigquery.createQueryStream(query)
.on('data', function(row) {
// row is a result from your query.
})
So, is there a way to stream BigQuery data to the front-end? I've thought of two potential solutions but wanted to know if anyone knows a better way:
JSON.stringify() the row and return JSONL instead of plain JSON. This adds a front-end burden to decode it, but makes it fairly easy on both sides.
Move to the REST API and do actual streaming with request like: request(url, { body: { query, params } }).pipe(res) (or whatever is the specific API, haven't dug there yet).
I was confused that a Node.js library that says that it does streaming doesn't work with Node.js native streams, but this seems to be the case.
BigQuery is intended to be used with a wide array of client libraries written for different programming languages. It therefore does not return Node.js-specific data structures but more general ones that are common to almost any structured programming language, such as objects. To answer your question: yes, there is a way to stream BigQuery data to the front-end, but how you do it is largely a matter of preference, because all it entails is converting from one data type to another. I would say the most straightforward way is to call JSON.stringify() on each row, which you have already mentioned.
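As a minimal sketch of the JSON.stringify() route (the JSONL option from the question), a Transform stream can turn each row object into one line of text so that the result can be piped to a response. The helper name createJsonLinesTransform is made up for this example, the rows are fake, and the Express usage in the comment assumes the bigQuery client and query from the question.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical helper: accepts row objects and emits one JSON document per line.
function createJsonLinesTransform() {
  return new Transform({
    writableObjectMode: true, // objects in, text out
    transform(row, _encoding, callback) {
      callback(null, JSON.stringify(row) + '\n');
    },
  });
}

// Assumed usage in an Express handler (not run here):
// app.get('/endpoint', (req, res) => {
//   res.setHeader('Content-Type', 'application/x-ndjson');
//   bigQuery.createQueryStream(query).pipe(createJsonLinesTransform()).pipe(res);
// });

// Stand-alone demo with fake rows instead of a BigQuery stream.
Readable.from([{ id: 1 }, { id: 2 }])
  .pipe(createJsonLinesTransform())
  .pipe(process.stdout);
```

The front-end then splits the body on line breaks and calls JSON.parse() on each non-empty line.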
I hope that helps.
We ended up making an implementation that stitched together the reply from BigQuery into a big JSON array:
exports.stream = (query, params, res) => {
// Light testing for descriptive errors in the parameters
for (let key in params) {
if (typeof params[key] === "number" && isNaN(params[key])) {
throw new TypeError(`The parameter "${key}" should be a number`);
}
}
return new Promise((resolve, reject) => {
let prev = false;
const onData = row => {
try {
// Only handle it when there's a row
if (!row) return;
// There was a previous row written before, so add a comma
if (prev) {
res.write(",");
}
res.write(stringify(row));
prev = true;
} catch (error) {
console.error("Cannot parse row:", error);
// Just ignore it, don't write this frame
}
};
const onEnd = () => {
res.write("]");
res.end();
resolve();
};
res.writeHead(200, { "Content-Type": "application/json" });
res.write("[");
bigQuery
.createQueryStream({ query, params })
.on("error", reject)
.on("data", onData)
.on("end", onEnd);
});
};
It will build a large JSON array by stitching together:
[ // <- First character sent
stringify(row1) // <- First row
, // <- add comma on second row iteration
stringify(row2) // <- Second row
...
stringify(rowN) // <- Last row
] // <- Send the "]" character to close the array
This has the advantages:
The data is sent as soon as available, so the bandwidth needs are lower.
(Depends on the BigQuery implementation) Lower memory needs on the server side, since not all the data is held in memory at once, only small chunks.
I have a lambda (nodeJs) that reads a file (.ref) in a S3 bucket and publishes its content in a topic inside the AWS IoT-Core broker.
The file contains something like this (50 lines):
model:type
aa;tp1
bb;tpz
cc;tpf
dd;tp1
The code must remove the first line and retrieve the remaining 50 lines. This is the code:
async function gerRef (BUCKET_NAME) {
  const refFile = version + '/file.ref'
  const ref = await getObject(FIRMWARE_BUCKET_NAME, refFile)
  // Get the file content
  const refString = ref.toString('utf8')
  // Split into lines
  let arrRef = refString.split('\n')
  // Remove the file header
  arrRef.shift()
  // Join the remaining lines (refString is already declared above, so return directly)
  return arrRef.join('\n')
}
Then I am getting this result and publishing in the AWS IoT-Core Broker like this:
const publishMqtt = (params) =>
new Promise((resolve, reject) =>
iotdata.publish(params, (err, res) => (err ? reject(err) : resolve(res))))
...
let refData = await gerRef (bucket1)
let JsonPayload = {
"attr1":"val1",
"machineConfig":`${refData}` //Maybe here is the issue
}
let params = {
topic: 'test/1',
payload: JSON.stringify(JsonPayload), //Maybe here is the issue
qos: '0'
};
await publishMqtt(params)
...
Then it publishes in the broker.
The issue is that the content is being published without a real new line. When I look in the broker, I get the following JSON:
{
"attr1":"val1",
"machineConfig":"aa;tp1\nbb;tpz\ncc;tpf\ndd;tp1"
}
The machine that receives this message is expecting a real new line, something like this:
{
"attr1":"val1",
"machineConfig":"aa;tp1
bb;tpz
cc;tpf
dd;tp1"
}
If I just copy and paste this entire JSON into the AWS IoT-Core interface, it will complain about the JSON parse but will publish it as a string, and the machine will accept the data because the new line is there.
In short, the main point here is that:
We can use JSON.stringify(JsonPayload), and the broker will accept it
I don't know how to stringify and keep the actual new line
I have tried these solutions but none of them worked: s1, s2, s3
Any guess in how to achieve this?
What that machine is expecting is wrong. In JSON, any newline inside a string value must be escaped, and \n in the string is the correct way to do it. The fault lies in the receiver's expectations.
A "real" newline would result in an invalid JSON document, and most parsers will flat-out reject it.
On the receiving end, a JSON deserializer can deal with \n-encoded strings. If your receiver requires literal newlines, it's broken and needs repairing. If you can't repair it, then you're committed to sending busted-up, malformed JSON-ish data that's not actually JSON, and your broker is fully justified in trashing it.
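To illustrate the point, here is a small sketch using only the built-in JSON object. The payload values are shortened from the question. It shows that the serialized text carries an escaped \n, that parsing restores the real line breaks, and that a raw line break inside a JSON string is rejected.

```javascript
const payload = { attr1: 'val1', machineConfig: 'aa;tp1\nbb;tpz\ncc;tpf' };

// Serializing escapes the line breaks, so the wire format has no raw newline.
const wire = JSON.stringify(payload);
const hasRawNewline = wire.includes('\n');

// Any compliant parser restores the real line breaks on the receiving side.
const decoded = JSON.parse(wire);
const lines = decoded.machineConfig.split('\n');

// A raw line break inside a JSON string makes the document invalid.
let rawNewlineRejected = false;
try {
  JSON.parse('{"machineConfig":"aa;tp1\nbb;tpz"}');
} catch (error) {
  rawNewlineRejected = error instanceof SyntaxError;
}

console.log({ wire, hasRawNewline, lines, rawNewlineRejected });
```

So a receiver that runs the payload through a real JSON parser gets the multi-line string back without any change on the publishing side.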
I've pulled json from a File using NodeJS's fs.createReadStream() and I'm now finding difficulty writing data back into the File (already parsing and then stringifying as appropriate).
The Discord-Bot I'm developing deletes text-channels then 'recreates' them (with the same title) to clear chat - it grabs the channel IDs dynamically and puts them in a file, until the channels are deleted.
However, the file-writing procedure ends up in errors.
This was my first attempt:
let channels_json = fs.createReadStream()
//let channels_json = fs.readFileSync(`${__dirname}\\..\\json\\channels.json`);
let obj = (JSON.parse(channels_json)).channelsToClear;
let i = 0;
obj.forEach(id => {
i++;
if(id === originalId){
obj[i] = channela.id;
}
});
obj += (JSON.parse(channels_json)).infoChannel;
obj += "abc";
json = JSON.stringify(obj);
channels_json.write(json);
This was my second:
let id_to_replace = message.guild.channels.get(channels[channel]).id;
//let channels_json = fs.readFileSync(`${__dirname}\\..\\json\\channels.json`);
let obj;
let channels_json = fs.createReadStream(`${__dirname}\\..\\json\\channels.json`,function(err,data){
if (err) throw err;
obj = JSON.parse(data);
if (obj["channelsToClear"].indexOf(id_to_replace) > -1) {
obj["channelsToClear"][obj["channelsToClear"].indexOf(id_to_replace)] = channela.id;
//then replace the json file with new parsed one
channels_json.writeFile(`${__dirname}\\..\\json\\channels.json`, JSON.stringify(obj), function(){
console.log("Successfully replaced channels.json contents");
});
//channels_json.end();
}
});
The intended outcome was to update the 'channelsToClear' array within the json file with the new channel IDs. Console/Node output varied, all of it to do with "create channels with an options object" or "Buffer.write" (all irrelevant), and the json file remained unchanged.
You're using Streams incorrectly. You can't write back out through a read stream.
For a simple script like this, streaming is probably overkill. Streams are quite a bit more complicated; they are worth it for high-efficiency applications, but not for something that looks like it is going to be relatively quick.
Use fs.readFileSync and fs.writeFileSync to read and write the data instead.
As far as your actual searching for and replacing the channel, I think either approach would work, but assuming there is only ever going to be one replacement, the second approach is probably better.
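A minimal sketch of the synchronous approach could look like the following. The function name replaceChannelId and the demo file contents are assumptions made for this example; only the 'channelsToClear' and 'infoChannel' keys come from the question. The demo writes to a temporary directory so it can run on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: swaps oldId for newId inside the channelsToClear array.
// Returns true when a replacement was written, false when oldId was not found.
function replaceChannelId(filePath, oldId, newId) {
  const config = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  const index = config.channelsToClear.indexOf(oldId);
  if (index === -1) return false;
  config.channelsToClear[index] = newId;
  fs.writeFileSync(filePath, JSON.stringify(config, null, 2));
  return true;
}

// Stand-alone demo with made-up IDs in a temporary file.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'channels-'));
const demoFile = path.join(demoDir, 'channels.json');
fs.writeFileSync(
  demoFile,
  JSON.stringify({ channelsToClear: ['111', '222'], infoChannel: '999' })
);
const replaced = replaceChannelId(demoFile, '222', '333');
console.log(replaced, fs.readFileSync(demoFile, 'utf8'));
```

In the bot, the call would take the path to channels.json, the old channel's ID, and channela.id.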
I'm not a super experienced coder, so forgive me if the question is rather too simple.
I have a csv with many rows, and one of its columns is 'id'. How can I remove just one row based on the id (i.e. code should search for id and delete that row)?
I got the following so far (not too helpful since on one day I may need to remove id 5 and on another I may need to remove id 2...) Thank you so much!
var fs = require('fs')
fs.readFile(filename, 'utf8', function(err, data)
{
if (err)
{
// check and handle err
}
var linesExceptFirst = data.split('\n').slice(1).join('\n');
fs.writeFile(filename, linesExceptFirst);
});
PS: it must be in javascript as the code is running on a nodejs server
You'll need to parse the CSV, which is simple with Array.prototype.map().
Then you'll need to use Array.prototype.filter() to find the column value you are after.
It is just a couple lines of code and you are all set:
var fs = require('fs')
// Set this up someplace
var idToSearchFor = 2;
// read the file
fs.readFile('csv.csv', 'utf8', function(err, data)
{
if (err)
{
// check and handle err
}
// Get an array of comma separated lines
let linesExceptFirst = data.split('\n').slice(1);
// Turn that into a data structure we can parse (array of arrays)
let linesArr = linesExceptFirst.map(line=>line.split(','));
// Use filter to keep only the rows that don't match the ID,
// which deletes the matching row,
// then join them into a string with new lines
let output = linesArr.filter(line=>parseInt(line[0], 10) !== idToSearchFor).join("\n");
// Write out new file
fs.writeFileSync('new.csv', output);
});
Note that I removed the call to .join() so we can operate on the array created from the call to .split(). The rest is commented.
And finally, a working example can be found here: https://repl.it/#randycasburn/Parse-CSV-and-Find-row-by-column-value-ID
EDIT: The code will now return all rows except the found id. Hence, in essence, deleting the row. (Per OPs comment request).
EDIT2: Now outputting to new CSV file per request.
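For reference, the same idea can be sketched as a small pure function that keeps the header row in the output. It assumes the id is in the first column and that no field contains quoted commas or line breaks; the name removeRowById and the sample data are made up for this example.

```javascript
// Hypothetical helper: returns the CSV text without the row whose first column equals id.
function removeRowById(csvText, id) {
  const [header, ...rows] = csvText.split('\n');
  const kept = rows.filter((line) => line.split(',')[0].trim() !== String(id));
  return [header, ...kept].join('\n');
}

const sample = 'id,name\n1,alice\n2,bob\n3,carol';
console.log(removeRowById(sample, 2));
```

Reading the file, calling the function with whichever id needs to go that day, and writing the result back covers the "id 5 one day, id 2 another" case.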
I'd like to crawl data over SSH in a server cluster with NodeJS.
The remote scripts output JSON that is then parsed and split into an object stream.
My problem now is that the callback-oriented libraries I use (SSH2, MySQL) lead to a callback pattern that I find hard to match with the Readable API spec. How do I implement _read(size) when the data to push is behind a bunch of callbacks?
My current implementation takes advantage of the fact that Streams are also EventEmitters. I start to populate my data upon constructing the Stream instance. When all my callbacks are done, I emit an event. I then listen for the custom event, and only then do I start to push data down the pipe chain.
// Calling code
var stream = new CrawlerStream(argsForTheStream);
stream.on('queue_completed', function() {
stream
.pipe(logger)
.pipe(dbWriter)
.on('end', function() {
// Close db connection etc...
});
});
A mock of the CrawlerStream would be
// Mock of the Readable stream implementation
function CrawlerStream(args) {
// boilerplate
// array holding the data to push
this.data = [];
// semi-colon separated string of commands
var cmdQueue = getCommandQueue();
var self = this;
db.query(sql, function(err, sitesToCrawl, fields) {
var servers = groupSitesByServer(sitesToCrawl);
for (var s in servers) {
sshConnect(getRemoteServer(s), function(err, conn) {
sshExec({
ssh: conn,
cmd: cmdQueue
}, function(err, stdout, stderr) {
// Stdout is parsed as JSON
// Finally I can populate self.data!
// Check if all servers are done
// If I'm the last callback to execute
self.data.push(null);
self.emit('queue_completed');
})
});
}
});
}
util.inherits(CrawlerStream, Readable);
CrawlerStream.prototype._read = function(size) {
while (this.data.length) {
this.push(this.data.shift());
}
}
I'm unsure if this is the idiomatic way to accomplish this and would like to get your advice.
Please note in your answers that I'd like to retain the vanilla NodeJS style of using callbacks (no promises) and that I'm stuck with ES5.
Thanks for your time!
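One possible shape for this, offered only as a sketch in ES5 callback style: start the callback chain on the first _read() call and let each callback push its results directly, ending the stream with push(null) once the last callback has fired. The CallbackStream name and the tasks array (functions taking a node-style callback) are stand-ins invented for this example in place of the db.query and SSH calls. Backpressure from push() is ignored here for brevity.

```javascript
var Readable = require('stream').Readable;
var util = require('util');

// tasks: array of functions of the form function (callback) { callback(err, items); }
function CallbackStream(tasks) {
  Readable.call(this, { objectMode: true });
  this._tasks = tasks;
  this._started = false;
}
util.inherits(CallbackStream, Readable);

CallbackStream.prototype._read = function () {
  // Kick off the callbacks once; later _read() calls need no extra work
  // because the callbacks push data on their own.
  if (this._started) return;
  this._started = true;

  var self = this;
  var pending = this._tasks.length;
  if (pending === 0) {
    self.push(null);
    return;
  }

  this._tasks.forEach(function (task) {
    task(function (err, items) {
      if (self.destroyed) return; // an earlier task already failed
      if (err) {
        self.destroy(err);
        return;
      }
      items.forEach(function (item) {
        self.push(item);
      });
      pending -= 1;
      if (pending === 0) self.push(null); // last callback: signal end of data
    });
  });
};
```

With this shape the calling code can pipe the stream right after constructing it, without waiting for a custom event.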
This is actually exercise No. 8 from the learnyounode Node.js tutorial ([https://github.com/workshopper/learnyounode][1])
The goal:
Write a program that performs an HTTP GET request to a URL provided to you as the first command-line argument. Collect all data from the server (not just the first "data" event) and then write two lines to the console (stdout).
The first line you write should just be an integer representing the number of characters received from the server. The second line should contain the complete String of characters sent by the server.
So here's my solution (it passes, but looks uglier compared to the official solution).
var http = require('http'),
bl = require('bl');
var myBL = new bl(function(err, myBL){
console.log(myBL.length);
console.log(myBL.toString());
});
var url = process.argv[2];
http.get(url, function(res){
res.pipe(myBL);
res.on('end', function(){
myBL.end();
});
});
The official solution:
var http = require('http')
var bl = require('bl')
http.get(process.argv[2], function (response) {
response.pipe(bl(function (err, data) {
if (err)
return console.error(err)
data = data.toString()
console.log(data.length)
console.log(data)
}))
})
I have difficulty understanding how the official solution works. I have two main questions:
The bl constructor expects the 2nd argument to be an instance of bl (according to the bl module's documentation, [https://github.com/rvagg/bl#new-bufferlist-callback--buffer--buffer-array-][2]), but what is data? It came out of nowhere. It should be undefined when it is passed to construct the bl instance.
When is bl.end() called? I can see nowhere that bl.end() is called...
Hope someone can shed some light on these questions. (I know I should've read the source code, but you know...)
[1]: https://github.com/workshopper/learnyounode
[2]: https://github.com/rvagg/bl#new-bufferlist-callback--buffer--buffer-array-
This portion of the bl github page more or less answers your question:
Give it a callback in the constructor and use it just like concat-stream:
const bl = require('bl')
, fs = require('fs')
fs.createReadStream('README.md')
.pipe(bl(function (err, data) { // note 'new' isn't strictly required
// `data` is a complete Buffer object containing the full data
console.log(data.toString())
}))
Note that when you use the callback method like this, the resulting data parameter is a concatenation of all Buffer objects in the list. If you want to avoid the overhead of this concatenation (in cases of extreme performance consciousness), then avoid the callback method and just listen to 'end' instead, like a standard Stream.
You're passing a callback to bl, which is basically a function that it will call when it has a stream of data to do something with. Thus, data is undefined for now... it's just a parameter name that will later be used to pass the text from the GET call for printing.
I believe that bl.end() doesn't have to be called because there's no real performance overhead to letting it run, but I could be wrong.
I have read the source code of bl library and node stream API.
BufferList is a custom duplex stream, that is, both Readable and Writable. When you run readableStream.pipe(BufferList), by default end() is called on the BufferList as the destination when the source stream emits 'end', which fires when there will be no more data to read.
See the implementation of BufferList.prototype.end:
BufferList.prototype.end = function (chunk) {
DuplexStream.prototype.end.call(this, chunk)
if (this._callback) {
this._callback(null, this.slice())
this._callback = null
}
}
So the callback passed to BufferList will be called after BufferList has received all data from the source stream. The call to this.slice() returns the result of concatenating all the Buffers in the BufferList, which is where the data parameter comes from.
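The same mechanism can be sketched with only the built-in stream module. This is not bl's actual code; collect is a made-up stand-in that gathers chunks and calls the callback once pipe() has ended it.

```javascript
const { Readable, Writable } = require('stream');

// Hypothetical bl-like collector: buffers every chunk, then hands the
// concatenated Buffer to the callback when the stream is ended.
function collect(callback) {
  const chunks = [];
  return new Writable({
    write(chunk, _encoding, done) {
      chunks.push(chunk);
      done();
    },
    // final() runs after end() has been called, which pipe() does
    // automatically once the source has no more data.
    final(done) {
      callback(null, Buffer.concat(chunks));
      done();
    },
  });
}

// Nobody calls end() explicitly here; pipe() does it.
Readable.from([Buffer.from('hello '), Buffer.from('world')]).pipe(
  collect(function (err, data) {
    console.log(data.length, data.toString());
  })
);
```

Passing { end: false } as the second argument to pipe() would disable the automatic end() call, and the callback would then never fire unless end() were called by hand.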
var request=require('request')
request(process.argv[2],function(err,response,body){
console.log(body.length);
console.log(body);
})
You can have a look at this approach to solve the above exercise.
P.S. request is a third-party module, though.