nodeJS, mongoDB, express, eJS - opinions on memory caching - javascript

I'm building my first site using this framework, i'm remaking a website i had done in PHP+mySQL and wish to know something about performance... In my website, i have two kinds of content:
Blog Posts (for 2 sections of the site) - these have the tendency to one day sum to thousands of records and, are more often updated
Static (sort of) data: this is information i keep in the database, like site section's data (title, metatags, header image url, fixed html content, javascript and css filenames to include in that section), that is rarely updated and it's very small in size.
While i was learning the basics on nodeJS, i started thinking of a way to improve the performance of the website, in a way i couldn't do with PHP. So, what i'm doing is:
When i run the app, the static content is all loaded into memory, i have a "model" Object for each content that stores the data in an array, has a method to refresh that data, ie, when the administrator updates something, i call refresh() to go get the new data from the database to that array. In this way, for every page load, instead of querying the database, the app queries the object in memory directly.
What i would like to know is if there should be any increase of performance, working with objects directly in memory or if constant queries to the database would work just as good or even better.
Any documentation supporting your answer will be much appreciated.
Thanks

In terms of the general database performance, MongoDB will keep your working set in memory - that's its basic method of operation.
So, as long as there is no memory contention to cause the data to get swapped out, and it is not too large to fit into your physical RAM, then the queries to the database should be extremely fast (in the sub millisecond range once you have your data set paged in initially).
Of course, if the database is on a different host then you have network latency to think about and such, but theoretically you can treat them as the same until you have a reason to question it.

I don't think there will be any performance difference. First thing is that this static data is probably not so big (up to 100 records?) and querying DB for it is not a big deal. Second thing (more important) is that most DB engines (including mongoDB) have caching systems built-in (although I'm not sure how they work in details). Third thing is that holding query results in memory does not scale well (for big websites) unless you use storage engine like Redis. And that's my opinion, although I'm not the expert.

Related

how to create pagination if api has not provided any option (data items per page)?

All I get from an api is an array of objects, there is no pages or anything that I can use to request only part of the data. I wonder if there is still a good way to create pagination on my client?
(Im using react)
Thank you
Since you can't save on network data-transfers or parsing (although some lazy parsing algorithm might help), paging would improve the memory footprint of all the DOM nodes created, and the time it takes to layout the page.
If you are really conscious about those aspects I would consider using something like react-window to lazy-render data.
The primary considerations of paging is saving on network transfers. The trick of lazy-rendering nodes makes sense on really large datasets.

How much data should we cache in memory in single page applications?

I was curious to know if there is any limit for data caching in
Single page applications using shared service or ngrx.
Does caching too much data on front end impacts the overall
performance of web Application (DOM).
Lets say I have a very big complex nested object which I am caching in memory
Now assume that I want to use different subsets of object in different modules/components of our
application and for that I may need to do lot of mapping operations(using loops by matching the id's etc) on UI.
I was thinking in other way around that instead of doing so much operations on UI to extract the
relevant data why don't I use a simple API with having id parameter to fetch the relevant information if its not taking much time to get the data from backend.
url = some/url/{id}
So is it worth to cache more complex nested objects if we cant use its subset simply by its properties
obj[prop] and need to do lot of calculations on UI (looping etc) which actually is more time consuming than getting the data from rest API ?
Any help/explanation will be appreciated !!!
Thanks
Caching too much data in memory is not a good idea. It will affect your application performance. Causes performance degradation in a system having less memory.
theoretically cache memory is for keeping less amount of data. The maximum support size is 2GB. I think chrome is also supported up to that limit.
For keeping data of big size in client-side never use memory cache instead you should use client-side database/datastore. It uses disk space instead of memory.
There are number of web technologies that store data on the client-side like
Indexed Database
Web SQL
LocalStorage
Cookies
Depending upon the client application framework it can be decided.
By default browser uses 10% of disk space for these data stores. We also have option to increase that size.

Append 300.000 lines or print them into a file?

I need to get from thousands of online JSON about 300.000 final lines, equal to 30MB.
Being beginner in coding, I prefer to stick to JS to $getJSON data, cut it, append interesting parts to my <body>, and loop on the thousands online JSON. But I wonder :
can my web-browser handles 300.000 $getJSON queries and the resulting 30~50MB webpage without crashing ?
is it possible to use JS to write down a file with this results, so the script's works is constantly saved ?
I expect my script to run about 24 hours. Numbers are estimations.
Edit: I don't have server side knowledge, just JS.
A few things aren't right about your approach for this:
If what you are doing is fetching (and processing) data from another source then displaying it for a visitor, processing of this scale should be done separately and beforehand in a background process. Web browsers should not be used as data processors on the scale you're talking about.
If you try to display a 30-50MB webpage, your user is going to experience lots of frustrating issues - browser crashes, lack of responsiveness, timeouts, long load times, and so on. If you expect any users on older IE browsers, they might as well give up without even trying.
My recommendation is to pull this task out and do it using your backend infrastructure, saving the results in a database which can then be searched, filtered, and accessed by your user. Some options worth looking into:
Cron
Cron will allow you to run a task on a repeated and regular basis, such as daily or hourly. Use this if you want to continually update your dataset.
Worker (Heroku)
If running Heroku, take it out of the dyno and use a separate worker so as not to clog up any existing traffic on your app.

key/value store with good performance for multiple tenants

im running a multi tenant GAE app where each tenant could have from a few 1000 to 100k documents.
at this moment im trying to make a MVC javascript client app (the admin part of my app with spine.js) and i need CRUD endpoints and the ability to get a big amount of serialized objects at once. for this specific job appengine is way to slow. i tried to store serialized objects in the blobstore but between reading/writing and updating stuff to the blobstore it takes too much time and the app gets really slow.
i thought of using a nosql db on an external machine to do these operations over appengine.
a few options would be mongodb, couchdb or redis. but i am not sure about how good they perform with that much data and concurrent requests/inserts from different tenants.
lets say i have 20 tenants and each tenant has 50k docs. are these dbs capable to handle this load?
is this even the right way to go?
Why not use the much faster regular appengine datastore instead of blobstore? Simply store your documents in regular entities as Blob property. Just make sure the entity size doesn't exceed 1 MB in which case you have to split up your data into more then one entity. I run an application whith millions of large Blobs that way.
To further speed up things use memcache or even in-memory cache. Consider fetching your entites with eventual consistency which is MUCH faster. Run as many database ops in parallel as possible using either bulk operations or the async API.
The overhead of making calls from appengine to these external machines is going to be worse than the performance you're seeing now (I would expect). why not just move everything to a non-appengine machine?
I can't speak for couch, but mongo or redis are definitely capable of handling serious load as long as they are set up correctly and with enough horsepower for your needs.

Ajax requests/responses: how to make them lightning fast?

I came across a site that does something very similar to Google Suggest. When you type in 2 characters in the search box (e.g. "ca" if you are searching for "canon" products), it makes 4 Ajax requests. Each request seems to get done in less than 125ms. I've casually observed Google Suggest taking 500ms or longer.
In either case, both sites are fast. What are the general concepts/strategies that should be followed in order to get super-fast requests/responses? Thanks.
EDIT 1: by the way, I plan to implement an autocomplete feature for an e-commerce site search where it 1.) provides search suggestion based on what is being typed and 2.) a list of potential products matches based on what has been typed so far. I'm trying for something similar to SLI Systems search (see http://www.bedbathstore.com/ for example).
This is a bit of a "how long is a piece of string" question and so I'm making this a community wiki answer — everyone feel free to jump in on it.
I'd say it's a matter of ensuring that:
The server / server farm / cloud you're querying is sized correctly according to the load you're throwing at it and/or can resize itself according to that load
The server /server farm / cloud is attached to a good quick network backbone
The data structures you're querying server-side (database tables or what-have-you) are tuned to respond to those precise requests as quickly as possible
You're not making unnecessary requests (HTTP requests can be expensive to set up; you want to avoid firing off four of them when one will do); you probably also want to throw in a bit of hysteresis management (delaying the request while people are typing, only sending it a couple of seconds after they stop, and resetting that timeout if they start again)
You're sending as little information across the wire as can reasonably be used to do the job
Your servers are configured to re-use connections (HTTP 1.1) rather than re-establishing them (this will be the default in most cases)
You're using the right kind of server; if a server has a large number of keep-alive requests, it needs to be designed to handle that gracefully (NodeJS is designed for this, as an example; Apache isn't, particularly, although it is of course an extremely capable server)
You can cache results for common queries so as to avoid going to the underlying data store unnecessarily
You will need a web server that is able to respond quickly, but that is usually not the problem. You will also need a database server that is fast, and can query very fast which popular search results start with 'ca'. Google doesn't use conventional database for this at all, but use large clusters of servers, a Cassandra-like database, and a most of that data is kept in memory as well for quicker access.
I'm not sure if you will need this, because you can probably get pretty good results using only a single server running PHP and MySQL, but you'll have to make some good choices about the way you store and retrieve the information. You won't get these fast results if you run a query like this:
select
q.search
from
previousqueries q
where
q.search LIKE 'ca%'
group by
q.search
order by
count(*) DESC
limit 1
This will probably work as long as fewer than 20 people have used your search, but will likely fail on you before you reach a 100.000.
This link explains how they made instant previews fast. The whole site highscalability.com is very informative.
Furthermore, you should store everything in memory and should avoid retrieving data from the disc (slow!). Redis for example is lightning fast!
You could start by doing a fast search engine for your products. Check out Lucene for full text searching. It is available for PHP, Java and .NET amongst other.

Categories