How to send and receive large JSON data - javascript

I'm relatively new to full-stack development, and currently trying to figure out an effective way to send and fetch large data between my front-end (React) and back-end (Express) while minimizing memory usage. Specifically, I'm building a mapping app which requires me to play around with large JSON files (10-100mb).
My current setup works for smaller JSON files:
Backend:
const data = require('../data/data.json');

router.get('/', function(req, res, next) {
  res.json(data);
});
Frontend:
componentDidMount() {
  fetch('/')
    .then(res => res.json())
    .then(data => this.setState({ data: data }));
}
However, if the data is bigger than ~40 MB, the backend crashes when I test locally because it runs out of memory. Also, holding onto the data with require() takes quite a bit of memory as well.
I've done some research and have a general understanding of JSON parsing, stringifying, and streaming, and I think the answer lies in using a chunked JSON stream to send the data bit by bit, but I'm pretty much at a loss on its implementation, especially using a single fetch() to do so (is this even possible?).
Definitely appreciate any suggestions on how to approach this.

First off, 40 MB is huge and can be inconsiderate to your users, especially if there's a high probability of mobile use.
If possible, it would be best to collect this data on the backend, probably put it onto disk, and then provide only the necessary data to the frontend as it's needed. As the map needs more data, you would make further calls to the backend.
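As a hedged sketch of that first option for the mapping case (the /features route, the bounding-box query parameters, and the shape of the data are all assumptions for illustration, not the asker's actual setup):

// Hypothetical Express route: return only the features inside the requested bounding box.
// In practice you'd keep the data in a database or spatial index rather than require()-ing
// the whole file, but the shape of the API is the point here.
const features = require('../data/features.json'); // assumed array of point features

router.get('/features', (req, res) => {
  const { minLng, minLat, maxLng, maxLat } = req.query;
  const visible = features.filter(f => {
    const [lng, lat] = f.geometry.coordinates;
    return lng >= +minLng && lng <= +maxLng && lat >= +minLat && lat <= +maxLat;
  });
  res.json(visible);
});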
If this isn't possible, you could load this data with the client-side bundle. If the data doesn't update too frequently, you can even cache it on the frontend. This would at least prevent the user from needing to fetch it repeatedly.
Alternatively, you can read the JSON via a stream on the server and stream the data to the client and use something like JSONStream to parse the data on the client.
Here's an example of how to stream JSON from your server via sockets.
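If you'd rather stay with plain HTTP and a single fetch(), a rough sketch of the streaming route and a client that reads it chunk by chunk might look like this (the /data route name, the file path, and the chunk handling are assumptions; note the client below still ends with one big JSON.parse, which is where a JSONStream-style incremental parser would come in):

// Server: stream the file instead of require()-ing it into memory.
const fs = require('fs');

router.get('/data', (req, res) => {
  res.set('Content-Type', 'application/json');
  fs.createReadStream('./data/data.json').pipe(res);
});

// Client: fetch() exposes the response body as a ReadableStream,
// so chunks arrive progressively instead of in one blocking read.
async function loadData() {
  const response = await fetch('/data');
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return JSON.parse(text); // still one big parse at the end
}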

Related

best way to load 100mb+ json file nodejs

I have a system that generates data every day and saves it to a JSON file. The file is about 120 MB.
I'm trying to send the data with Node.js to the client:
router.get('/getData', (req, res) => {
  const newData = require(`./newDataJson.json`);
  res.json(newData);
});
and then from the client I use an axios GET request:
const fetchNewData = async () => {
  const { data } = await axios.get(`/api/getData/`);
};
The data reaches the client within about 3 minutes in production, and a few seconds on localhost.
My question is whether it's possible to shorten the load time in production.
Thanks!!
I would suggest using streams in Node.js; they are well suited to sending large data from the server to the client, and are useful when you send big chunks of data. Give it a try and check whether there are any improvements after adding this.
const fs = require('fs');

const readStream = fs.createReadStream('./newDataJson.json');
res.set({
  'Content-Type': 'application/json',
  'Content-Disposition': 'attachment; filename="newDataJson.json"',
});
readStream.pipe(res);
Also, as Andy suggested, another option is to divide your file's data into smaller partitions, for example based on hours.
120 MB is way over the limit for initial page loading times. The best thing to do would be to split the file into smaller chunks:
1. Right after the page loads, the user will (I assume) see only a portion of that data. So initially send only a small chunk, enough to make the data visible. Keep it small so it doesn't block loading and first paint.
2. Keep sending the rest of the data in smaller parts, or load it on demand in chunks, e.g. with pagination or scroll events.
You can start splitting the data as you receive/save it.
Loading data from api on scroll (infinite scroll)
https://kennethscoggins.medium.com/using-the-infinite-scrolling-method-to-fetch-api-data-in-reactjs-c008b2b3a8b9
Info on loading time vs response time
https://www.pingdom.com/blog/page-load-time-vs-response-time-what-is-the-difference/
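As a minimal sketch of the split-and-load-on-demand idea (the /getData route is from the question; the page query parameter and the pre-split chunk files are assumptions):

const fs = require('fs');
const path = require('path');

// Server: assume the 120 MB file has been split ahead of time into numbered chunk files.
router.get('/getData', (req, res) => {
  const page = parseInt(req.query.page, 10) || 0;
  const chunkPath = path.join(__dirname, 'chunks', `newDataJson.${page}.json`); // hypothetical files
  if (!fs.existsSync(chunkPath)) {
    return res.json({ done: true, items: [] });
  }
  res.json({ done: false, items: JSON.parse(fs.readFileSync(chunkPath, 'utf8')) });
});

// Client: pull the next chunk on demand, e.g. from a scroll or pagination handler.
const fetchPage = async (page) => {
  const { data } = await axios.get(`/api/getData/?page=${page}`);
  return data;
};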

How to upload massive number of books automatically to database

I have a client who has 130k books (~4 terabytes), and he wants a site he can upload them to so he can make an online library. So, how can I make it possible for him to upload them automatically, or at least upload multiple books at a time? I'll be using Node.js + MySQL.
I might suggest using object storage on top of MySQL to speed up book indexing and retrieval, but that is entirely up to you.
HTTP is a streaming protocol, and node, interestingly, has streams.
This means that, when you send an HTTP request to a Node server, Node internally handles it as a stream. Theoretically, you can therefore upload massive books to your web server while only holding a fraction of each one in memory at a time.
The first thing to note is that books can be very large. In order to process them efficiently, we must handle the metadata (name, author, etc.) and the content separately.
One example, using an Express-like framework, could be (pseudo-code):
app.post('/begin/:bookid', (Req, Res) => {
  ParseJSON(Req)
  // Store the metadata row for this book (placeholder helper).
  MySQL.insertRow(Req.params.bookid, Req.body.name, Req.body.author)
})

app.put('/upload/:bookid', (Req, Res) => {
  // If we use MySQL to store the books:
  MySQL.insertRow(Req.params.bookid, Req.body)
  // Or if we use object storage:
  let Uploader = new StorageUploader(Req.params.bookid)
  Req.pipe(Uploader)
})
If you need inspiration, look at how WeTransfer has created their API. They deal with lots of data daily; their solution might be helpful to you.
Remember: your client likely won't want to use Postman to upload their books. Build a simple website for them in Svelte or React.

More generic question about receiving a stream with JavaScript

I have a more generic question about receiving a stream with JavaScript
Given the generic use case: I want to get data from backend (csv file, ...) to frontend.
Given the following restrictions:
Backend just accepts 'POST' (non negotiable)
Frontend uses 'Axios' (could be adjusted)
How would I do that?
All the examples on the net use streams to get data like CSV, etc. I'm convinced that that's the way to go, but I couldn't explain why.
My situation
So far I've tried a lot to make it work, and I actually did, but I don't think it was really "streaming".
Backend: I pushed my stream into the OutputStream of the response and returned the stream. That means the data is visible in the devtools, which it shouldn't be. Without that, just returning the stream and not writing it to the OutputStream, I always got a "Network Error"; the backend returned successfully (I guess), but I still got the error.
Frontend: After some time I realized why I probably got the "Network Error": Axios does not support streams.
Furthermore, I use a fileDownloader that simulates a download click to download the response (a POST fetch with Axios), which I format into a Blob.
What is necessary to make streaming work? Is there a standard procedure?
I don't want to fabricate stupid stuff and go somewhere I shouldn't go. That's why I would like to get some information from people who already have some experience with this.
Do I need 'GET' to make streaming work? Or does it work with 'POST' too?
On the backend, is returning the stream, without writing it to the response body's output, enough?
Is Streaming the right way?
What do I use? Frontend: React, TypeScript, Axios | Backend: C#
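For what it's worth, here is a hedged sketch of the frontend half described above (POST with Axios, response turned into a Blob, simulated download click). The /api/export URL and file name are made up, and note that Axios in the browser buffers the whole response, so this is download-on-completion rather than true streaming:

import axios from 'axios';

async function downloadCsv(filters) {
  // Ask the backend for the file; 'blob' keeps the body as binary instead of decoding it.
  const response = await axios.post('/api/export', filters, { responseType: 'blob' });

  // Simulate a download click on the received Blob.
  const url = URL.createObjectURL(response.data);
  const link = document.createElement('a');
  link.href = url;
  link.download = 'export.csv';
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url);
}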

Understanding express and how to use it

Many times I have created apps with express where I just spin up a server at a port and then on the client side do all the stuff. whether that be fetching with fetch/axios, rendering data, and even changing routes (react-router). I have never hugely explored node or the server part, until now....hopefully.
I partially get what it's doing, in terms of:
app.get('/', (req, res) => res.send('Hello World!'))
This just sends the response to the browser window. And I have even managed to do this:
app.listen(port, () => {
  console.log("Listening");
  fetch(url)
    .then(res => res.json())
    .then(json => console.log(json))
    .catch(() => {
      console.log("bbb");
    });
});
and this logs all the data in the server window. However, I have a couple of questions:
Should I be doing this on the server or the client? What's the advantage?
Secondly, once I have this data, how can I send it to the client, i.e. a React component?
Also, I can't seem to copy this code and get it to work inside app.get(). Am I doing it wrong? Maybe I have misunderstood there.
I know that's more than a couple of questions; an answer to all of them would be great, but I'd mainly just like some more knowledge of what goes on inside Express and the server.
Should I be doing this on the server or the client? What's the advantage?
You have to consider the following things when requesting another server:
Server side:
you can share the data with multiple clients
you can keep algorithms / secrets private
you probably have better bandwidth than your clients, so you can load big chunks of data and then send only the necessary data to the client
Client:
does not consume your server's resources
Secondly, once I have this data, how can I send it to the client, i.e. a React component?
You can use AJAX, websockets (http://socket.io), or redirects.
Also, I can't seem to copy this code and get it to work inside app.get().
If you expect to see the data on the client, you have to send it with res.json(json).
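Putting those pieces together, a hedged sketch of doing the fetch inside a route handler and forwarding the result to the client (the url variable is the same third-party endpoint as above; global fetch assumes Node 18+, otherwise a package like node-fetch is needed):

app.get('/api/data', async (req, res, next) => {
  try {
    // Fetch from the other server on the backend...
    const upstream = await fetch(url);
    const json = await upstream.json();
    // ...and send it on to the client, e.g. a React component calling fetch('/api/data').
    res.json(json);
  } catch (err) {
    next(err);
  }
});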

JSON string directly from node mongodb

I've been doing a series of load tests on a simple server to try and determine what is negatively impacting the load on my much more complicated Node/Express/MongoDB app. One of the things that consistently comes up is the string manipulation required for converting an in-memory object to JSON in the Express response.
The amount of data that I'm pulling from MongoDB via Node and sending over the wire is ~200-300 KB uncompressed. (Gzip will turn this into 28 KB, which is much better.)
Is there a way to have the native nodejs mongodb driver stringify the results for me? Right now for each request with the standard .toArray() we're doing the following:
Query the database, finding the results and transferring them to the node native driver
Native driver then turns them into an in-memory javascript object
My code then passes that in-memory object to express
Express then converts it to a string for node's http response.send using JSON.stringify() (I read the source Luke.)
I'm looking to get the stringify work done at a c++/native layer so that it doesn't add processing time to my event loop. Any suggestions?
Edit 1:
It IS a proven bottleneck.
There may easily be other things that can be optimized, but here's what the load tests are showing.
We're hitting the same web server with 500 requests over a few seconds. With this code:
app.get("/api/blocks", function(req, res, next){
db.collection('items').find().limit(20).toArray(function(err, items){
if(err){
return next(err);
}
return res.send(200, items);
});
});
overall mean: 323 ms, 820 ms for the 95th percentile
If instead I swap out the json data:
var cached = "[{... "; // giant JSON blob that is a copy+paste of the response in the above code.

app.get("/api/blocks", function(req, res, next){
  db.collection('items').find().limit(20).toArray(function(err, items){
    if(err){
      return next(err);
    }
    return res.send(200, cached);
  });
});
mean is 164 ms, 580 ms for the 95th percentile
Now you might say, "Gosh Will, a mean of 323 ms is great, what's your problem?" My problem is that this is an example in which stringify causes a doubling of the response time.
From my testing I can also tell you these useful things:
Gzip was a 2x or better gain on response time. The above is with gzip.
Express adds a nearly imperceptible amount of overhead compared to generic Node.js.
Batching the data by doing cursor.each and then sending each individual item to the response is way worse.
Update 2:
Using a profiling tool: https://github.com/baryshev/look
This is while hitting my production code on the same database-intensive process over and over. The request includes a MongoDB aggregate and sends back ~380 KB of data (uncompressed).
That function is very small and includes the var body = JSON.stringify(obj, replacer, spaces); line.
It sounds like you should just stream directly from Mongo to Express.
Per this question that asks exactly this:
cursor.stream().pipe(JSONStream.stringify()).pipe(res);
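A slightly fuller, hedged sketch of that route (the collection name and limit are from the question; JSONStream would need to be installed, and .stream() is the cursor-to-stream method of the native driver of that era):

var JSONStream = require('JSONStream');

app.get("/api/blocks", function(req, res, next){
  res.set('Content-Type', 'application/json');
  var cursor = db.collection('items').find().limit(20).stream(); // documents as a readable stream
  cursor.on('error', next);
  cursor
    .pipe(JSONStream.stringify()) // serialize incrementally instead of one big JSON.stringify
    .pipe(res);
});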
