JSON string directly from node mongodb - javascript

I've been doing a series of load tests on a simple server to try to determine what is negatively impacting the load on my much more complicated node/express/mongodb app. One of the things that consistently comes up is the string manipulation required for converting an in-memory object to JSON in the express response.
The amount of data that I'm pulling from mongodb via node and sending over the wire is ~200/300 KB uncompressed. (Gzip will turn this into 28k which is much better.)
Is there a way to have the native nodejs mongodb driver stringify the results for me? Right now for each request with the standard .toArray() we're doing the following:
Query the database, finding the results and transferring them to the node native driver
Native driver then turns them into an in-memory javascript object
My code then passes that in-memory object to express
Express then converts it to a string for node's http response.send using JSON.stringify() (I read the source Luke.)
I'm looking to get the stringify work done at a C++/native layer so that it doesn't add processing time to my event loop. Any suggestions?
Edit 1:
It IS a proven bottleneck.
There may easily be other things that can be optimized, but here's what the load tests are showing.
We're hitting the same web server with 500 requests over a few seconds. With this code:
app.get("/api/blocks", function(req, res, next){
db.collection('items').find().limit(20).toArray(function(err, items){
if(err){
return next(err);
}
return res.send(200, items);
});
});
overall mean: 323ms, 820ms for 95th%
If instead I swap out the json data:
var cached = "[{... "; // giant json blob that is a copy+paste of the response in the above code

app.get("/api/blocks", function (req, res, next) {
  db.collection('items').find().limit(20).toArray(function (err, items) {
    if (err) {
      return next(err);
    }
    return res.send(200, cached);
  });
});
overall mean: 164ms, 580ms for 95th%
Now you might say, "Gosh Will, a mean of 323ms is great, what's your problem?" My problem is that this is an example in which stringify causes a doubling of the response time.
From my testing I can also tell you these useful things:
Gzip was a 2x or better gain on response time. The above is with gzip.
Express adds a nearly imperceptible amount of overhead compared to generic nodejs.
Batching the data by doing cursor.each and then sending each individual item to the response is way worse
Update 2:
Using a profiling tool: https://github.com/baryshev/look
This is while hitting my production code with the same database-intensive request over and over. The request includes a mongodb aggregate and sends back ~380KB of data (uncompressed).
The function the profiler flags is very small and includes the var body = JSON.stringify(obj, replacer, spaces); line.

It sounds like you should just stream directly from Mongo to Express.
Per this question that asks exactly this:
cursor.stream().pipe(JSONStream.stringify()).pipe(res);
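Expanded a little, a route using that approach might look like the sketch below. This assumes the JSONStream npm package and a driver version whose cursors expose .stream(); the route and collection names mirror the question.

var JSONStream = require('JSONStream');

app.get('/api/blocks', function (req, res, next) {
  res.type('json');
  db.collection('items')
    .find()
    .limit(20)
    .stream()                        // emit documents one at a time instead of buffering with toArray()
    .on('error', next)               // surface driver errors to Express
    .pipe(JSONStream.stringify())    // serialize incrementally into a JSON array
    .pipe(res);                      // write chunks straight to the HTTP response
});

This doesn't move serialization into C++, but it breaks the stringify work into many small chunks, so no single call blocks the event loop for long and the response starts going out immediately.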

Related

How to upload massive number of books automatically to database

I have a client who has 130k books (~4 terabytes) and he wants a site he can upload them to in order to make an online library. So, how can I make it possible for him to upload them automatically, or at least upload multiple books at a time? I'll be using Node.js + MySQL.
I might suggest using object storage on top of MySQL to speed up book indexing and retrieval, but that is entirely up to you.
HTTP is a streaming protocol, and node, interestingly, has streams.
This means that, when you send an HTTP request to a node server, node internally handles it as a stream. This means, theoretically, you can upload massive books to your web server while only having a fraction of each one in memory at a time.
The first thing to note is that books can be very large. In order to process them efficiently, we must handle the metadata (name, author, etc.) and the content separately.
One example, using an express-like framework, could be (pseudo-code):
// Pseudo-code: ParseJSON, MySQL.addColumn, and StorageUploader are placeholders
// for whatever body parser, database layer, and storage client you choose.
app.post('/begin/:bookid', (Req, Res) => {
  // Store the book's metadata (a small JSON body) in MySQL.
  ParseJSON(Req)
  MySQL.addColumn(Req.params.bookid, Req.body.name, Req.body.author)
})

app.put('/upload/:bookid', (Req, Res) => {
  // If we use MySQL to store the books:
  MySQL.addColumn(Req.params.bookid, Req.body)
  // Or, if we use object storage, stream the request body straight to storage:
  let Uploader = new StorageUploader(Req.params.bookid)
  Req.pipe(Uploader)
})
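As a more concrete illustration of the streaming idea, here is a minimal sketch assuming Express and plain local-disk storage; the upload directory and status codes are illustrative.

const fs = require('fs');
const path = require('path');
const express = require('express');

const app = express();
const UPLOAD_DIR = '/var/books'; // illustrative location

app.put('/upload/:bookid', (req, res) => {
  const dest = fs.createWriteStream(path.join(UPLOAD_DIR, req.params.bookid));
  req.pipe(dest);                                // only a small buffer lives in memory at any time
  dest.on('finish', () => res.sendStatus(201));  // the upload has been fully written to disk
  dest.on('error', () => res.sendStatus(500));   // disk error: report failure
});

app.listen(3000);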
If you need inspiration, look at how WeTransfer has built their API. They deal with lots of data daily; their solution might be helpful to you.
Remember: your client likely won't want to use Postman to upload their books. Build a simple website for them in Svelte or React.

How to send and receive large JSON data

I'm relatively new to full-stack development, and currently trying to figure out an effective way to send and fetch large data between my front-end (React) and back-end (Express) while minimizing memory usage. Specifically, I'm building a mapping app which requires me to play around with large JSON files (10-100 MB).
My current setup works for smaller JSON files:
Backend:
const data = require('../data/data.json');

router.get('/', function (req, res, next) {
  res.json(data);
});
Frontend:
componentDidMount() {
  fetch('/')
    .then(res => res.json())
    .then(data => this.setState({ data: data }));
}
However, if the data is bigger than ~40 MB, the backend crashes when I test locally because it runs out of memory. Holding onto the data with require() takes quite a bit of memory as well.
I've done some research and have a general understanding of JSON parsing, stringifying, and streaming, and I think the answer lies somewhere in sending the data bit by bit as a chunked JSON stream, but I'm pretty much at a loss on the implementation, especially using a single fetch() to do so (is this even possible?).
Definitely appreciate any suggestions on how to approach this.
First off, 40 MB is huge and can be inconsiderate to your users, especially if there's a high probability of mobile use.
If possible, it would be best to collect this data on the backend, probably put it onto disk, and then provide only the necessary data to the frontend as it's needed. As the map needs more data, you would make further calls to the backend.
If this isn't possible, you could load this data with the client-side bundle. If the data doesn't update too frequently, you can even cache it on the frontend. This would at least prevent the user from needing to fetch it repeatedly.
Alternatively, you can read the JSON via a stream on the server and stream the data to the client and use something like JSONStream to parse the data on the client.
Here's an example of how to stream JSON from your server via sockets.
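For the simplest case, where the large JSON already lives in a file, the server side could look like this sketch (assuming Express; the file path mirrors the question and error handling is minimal):

const fs = require('fs');
const path = require('path');

router.get('/', function (req, res, next) {
  res.type('json');
  fs.createReadStream(path.join(__dirname, '../data/data.json'))
    .on('error', next)   // forward file errors instead of crashing the process
    .pipe(res);          // send the file in chunks; it is never fully loaded into memory
});

On the client you can then consume the response incrementally (for example via res.body.getReader() or a streaming JSON parser) instead of res.json(), which buffers the whole payload.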

Node.js - passing sql data to a module on client request

I am connecting to a SQL database and returning data on the view using res.json. The client sends a request; my server uses an mssql driver and a connection string to connect to that database and retrieve some data. So I've got a connection working using GET, POST, etc.
However, I am encountering a logical problem, as I want to pass some data from the SQL database to a module which will then use that data to prepare a JSON response. When I hard-code an array with a couple of parameters it works, but I don't know how to send a request from node.js to the database and propagate that array for a module to consume when a client sends a request. (When a client sends a request, the module sends a request to the database, the database returns some parameters, and the module can then use them to prepare a response.)
Any ideas? Could you point me in the right direction? How from a logical point of view such solution can work?
I am using node.js, express, mssql module to connect to db. I am not looking for specific code just to point me in the right direction, and if you've got any examples of course I'm happy to see those.
You will probably need to have a chain of callbacks and pass data through them, something like this:
app.get('/users', function (req, res) {
  database.find('find criteria', function (err, data) {
    mymodule.formatData(data, function (err, json) {
      res.json(json);
    });
  });
});
So you just nest the callbacks until you have everything you need to send the response.
You need to get used to this style of programming in node.js.
There are also some ways to avoid nesting callbacks too deeply: split your callbacks into individual named functions, or use the async library, promises, or ES6 generators.
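For example, if the database layer and your module expose promise-returning variants (an assumption here; the names mirror the snippet above), the same flow flattens out:

app.get('/users', function (req, res, next) {
  database.find('find criteria')                                 // assumed to return a promise
    .then(function (data) { return mymodule.formatData(data); }) // format the rows into the response shape
    .then(function (json) { res.json(json); })                   // send the JSON response
    .catch(next);                                                // hand any error to Express's error handler
});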

Override res.write in NodeJS

I've been playing with NodeJS for a while now, and I've really been enjoying it, but I've run myself into a wall though...
I'm trying to create a connect module to intercept http.ServerResponse methods. My end goal is to allow the user to apply some kind of filter to outgoing data. For example, they would have the option to apply compression or something before the data goes out.
I am having this weird bug though... this method is getting called twice as often as it should be.
Here's my code:
var http = require('http'), orig;
orig = http.ServerResponse.prototype.write;

function newWrite(chunk) {
  console.log("Called");
  orig.call(this, chunk);
}

http.ServerResponse.prototype.write = newWrite;

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.write("Hello");
  res.write(" World");
  res.end();
  console.log("Done");
}).listen(12345);
This works and I get 'Hello World' as output when I access it from a browser, but I see four 'Called' lines in the console, and 'Done' gets output twice. This leads me to believe that for some reason my server code is getting called twice. I did a console.log in my newWrite method on this.constructor, and the constructor in both instances is ServerResponse, so that doesn't help much.
I'm really confused about what is going on here. What could possibly be going on?? This doesn't directly affect the output, but I could potentially be serving gigabytes of compressed data to many clients simultaneously, and doing everything twice will put undue strain on my server.
This is going to be part of a larger fileserver, hence the emphasis on not doing everything twice.
Edit:
I've already read this question in case you're wondering:
Can I use an http.ServerResponse as a prototype in node.js?
There is no problem with your code: if you add a console.log(req.url) in the createServer callback, you'll probably see that it is actually called twice by your browser. Most browsers make a request for the requested URL and an additional call for /favicon.ico if no favicon is specified in the HTML markup.
You can use connect's favicon middleware to avoid that problem:
http://senchalabs.github.com/connect/middleware-favicon.html
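A quick way to confirm this (a sketch of the check, not the fix) is to log each incoming request's URL; you should see a GET / followed by a GET /favicon.ico:

var http = require('http');

http.createServer(function (req, res) {
  console.log(req.method, req.url);   // expect "GET /" and then "GET /favicon.ico"
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World');
}).listen(12345);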

Monitoring Mongo for changes with Node.js

I'm using Node.js for some project work and I would like to monitor my Mongo database (collection) for changes, basically fire an event if something gets added.
Anyone know if this is possible? I'm using the node-mongodb-native drivers.
If it's not, I'd also like any available pointers on pushing data from the server (run with node) to the client browser.
The question is whether all data is added to your database through your node.js app. If so, you can use the EventEmitter class of node.js to trigger an event (http://nodejs.org/api.html#eventemitter-14).
If the database is populated by some other app, things get more difficult. In that case you would need something like a database trigger, which is AFAIK not yet available in MongoDB.
Pushing Events to the Client (aka Comet) will be possible once the HTML 5 websockets API makes its way into all major browsers.
In the meantime, you can only try to emulate this behaviour using techniques like AJAX (long) polling, forever frames, etc., but each of them has its weaknesses.
I would turn on replication in your MongoDB. With replication enabled there is an oplog, a capped collection in the local database that contains a list of changes, similar to the MySQL replication log. You can monitor (tail) that.
-daniel
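For illustration, here is a sketch of tailing the oplog with the native driver. It assumes a replica set is running so local.oplog.rs exists; the option names follow the 1.x-era node-mongodb-native API and may differ in newer driver versions.

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/local', function (err, db) {
  if (err) throw err;
  var oplog = db.collection('oplog.rs');

  // Tailable cursor: behaves like `tail -f` on the capped oplog collection.
  var stream = oplog.find({}, { tailable: true, awaitdata: true, numberOfRetries: -1 }).stream();

  stream.on('data', function (entry) {
    if (entry.op === 'i') {                 // 'i' marks an insert operation
      console.log('document added:', entry.o);
    }
  });
});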
Almost 3 years since the last answer. I would suggest looking at:
Pub/Sub for nodejs and MongoDB: https://github.com/scttnlsn/mubsub
npm install mubsub should get you there.
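Basic usage is roughly as follows (a sketch based on the mubsub README; the connection string and channel/event names are illustrative):

var mubsub = require('mubsub');

var client = mubsub('mongodb://localhost:27017/example'); // backed by a capped collection
var channel = client.channel('test');

channel.subscribe('documentAdded', function (message) {
  console.log('got change notification:', message);
});

channel.publish('documentAdded', { key1: 'val1' });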
// val1 and val2 are placeholder values; fireandforgetfunction is whatever
// notification hook you want to run after the insert.
collection.insert({ "key1": val1, "key2": "val2" }, function (err, info) {
  if (err) {
    // handle this
  } else if (info) {
    // Call a fire-and-forget function here: fireandforgetfunction(info);
    // It can write to logs, send to SQS, spawn a child process, or do some other
    // in-process thing. It could even be a callback, but I think fire-and-forget
    // will do in most circumstances, because I presume you don't need to hold up
    // the response; you can return whatever you need to the client right away.
    //
    // And, in part-answer to your other question, you can return JSON like this:
    db.close();
    var myJSON = [];
    sys.puts("Cool info stored and did a non-blocking fire and forget for some other mongo monitoring stuff/process and sending control back to the browser");
    sys.puts(sys.inspect(info)); // remove later
    myJSON.push({ "status": "success" });
    myJSON.push({ "key1": val1, "key2": val2 }); // or whatever you want to send
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.write(JSON.stringify(myJSON));
    res.end();
  }
});
