I'm developing a Node.js application that reads a JSON list from a centralised DB.
The list object is around 1.2 MB (if kept in a txt file).
The data needs to be refreshed every 24 hours, so I set up a cron job for it.
After fetching the data, I store it in a DB (Couchbase) running locally on my server.
Data access is very frequent: I get around 1 or 2 requests per second, and nearly every request needs that object.
Is it better to keep that object as an in-memory object in Node.js, or to keep it in the local DB?
What are the advantages and disadvantages of each?
The object is only read by requests; it is written once, by the cron job.
It's a high-end system: i7 quad core, 16 GB RAM.
It depends on your hardware.
If this object is immutable across requests, it's better to keep it in memory. If not, it depends.
In any case, the workflow of open connection to DB, fetch data, return result, free data will consume more resources than caching in memory.
For example, in our project we process high-definition images and keep all objects in memory, 3-7 MB in raw format. Tests show that this is much more efficient than using any caching system, such as Redis or Couchbase.
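The in-memory option from the question could be sketched roughly like this; `fetchListFromCentralDb` is a placeholder for your own fetch logic, and the stub payload is only there to make the sketch self-contained:

```javascript
// Sketch of the in-memory approach (assumed names): the object lives in a
// module-level variable and is replaced wholesale by the refresh job.
let cachedList = null;

// Stand-in for the real fetch from the centralised DB.
async function fetchListFromCentralDb() {
  return [{ id: 1 }, { id: 2 }]; // placeholder payload
}

async function refreshList() {
  // In the real app you might also write the result to the local Couchbase
  // here, as a crash-recovery backup.
  cachedList = await fetchListFromCentralDb();
}

// Refresh every 24 hours; .unref() keeps the timer from holding the
// process open. A cron library would work equally well.
setInterval(refreshList, 24 * 60 * 60 * 1000).unref();

// Request handlers read the cached object synchronously.
const getList = () => cachedList;
```

At 1-2 requests per second, reading a module-level variable like this avoids a DB round trip on every request.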
I would keep the most recent version as an in-memory object, and store it in the DB as well. That way you have a backup if anything crashes. If you edit the data, however, I would only keep it as a database object.
Accessing the DB for that object once or twice per second would probably work fine, but 1.2 MB of memory is not that much, and if you can keep that contained, your server won't likely run into problems.
The DB is a little slower than memory, but has the advantage of (most likely) being thread-safe. If you edited the document, you could run into concurrency problems with a memory object.
You know the application and the requirements, so you should be able to tell whether you need a thread-safe database or whether you need to save memory on the server. If you don't know, we'd need to see the actual code and use cases to tell you what would work best.
I'm in the design stage for an app that collects large amounts of data...
Ideally, I want it to be an offline-first app, and I was looking at PouchDB/CouchDB. However, the data needs to be kept for years for legal reasons, and my concern is that this will consume too much local storage over time.
My thoughts were:
Handle sync between PouchDB and CouchDB myself, allowing me to purge inactive documents from the local store without impacting CouchDB. This feels messy and probably a lot of work.
Build a local store using Dexie.js and write the sync function entirely myself. This also looks like hard work, but perhaps less, as I'm not trying to mess with an existing sync function.
Search harder :)
Conceptually, I guess I'm looking for a 'DB cache': holding active JSON document versions and removing documents that have not been touched for X period. It might be that 'offline' mode is handled separately from the DB cache.
Not sure yet if this is the correct answer...
Set up a filter on CouchDB to screen out old documents (let's say we have a 'date_modified' field in the doc and we filter out any docs with date_modified older than one month).
Have a local routine on the client that deletes documents from the local PouchDB that are older than one month (actually using the remove() method against the local PouchDB, not updating them with _deleted: true). From https://pouchdb.com/2015/04/05/filtered-replication.html it appears removed documents don't sync.
Docs updated in the PouchDB will replicate normally.
There might be a race condition here for replication; we'll see.
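The client-side purge step above could be sketched like this. The PouchDB calls themselves are left as comments (they need a live database); the `date_modified` field and one-month cutoff follow the assumptions in the plan:

```javascript
const ONE_MONTH_MS = 30 * 24 * 60 * 60 * 1000;

// Decide whether a doc is old enough to purge from the local PouchDB.
function isStale(doc, now) {
  return new Date(doc.date_modified).getTime() < now - ONE_MONTH_MS;
}

// Given the rows of a local allDocs({ include_docs: true }) call, pick the
// docs to purge. The actual deletion would then be, per doc:
//   await localDb.remove(doc);  // remove(), not _deleted: true, so the
//                               // deletion does not replicate upstream
function docsToPurge(rows, now = Date.now()) {
  return rows.map(r => r.doc).filter(doc => isStale(doc, now));
}
```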
I want to preload data from the server so that it will be immediately available when I call for it. The data is stored in a file called "data.json".
At first, I thought to save data.json to an object and reference it when I need it.
However, depending on the user's actions, it may take a while before I need the data that this object has stored. So memory management becomes a bit of a concern, as the object is quite large (~5 MB).
My question is: when I call for data.json via Ajax, does the browser internally "cache" this file for the duration of the website session?
Meaning, if I called for the file via Ajax again after already calling for it, would the browser instantly get the file from its own internal memory instead of going back to the server?
If so, it seems it would be wasteful to save an extra copy of this file in JavaScript. However, I can't find any information/standards about this online.
So, in short, do I need to save the downloaded file to an object? Or is it safe to let the browser handle this internally?
There are several different types of "cache" in play here. It sounds like you're asking "How long does the browser's JavaScript engine keep an object in memory" and the answer is "As long as there's a reference to it."
The browser's (HTTP) cache, on the other hand, lives longer; an entry may live for days or weeks or years, depending on the available space, the freshness headers on the response, etc.
For a scenario like you've described, you probably want to prefetch the JSON to a local cache file, then load that cache file into JavaScript only as needed.
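A minimal sketch of the "load only as needed" idea: the first call triggers the fetch, and later calls reuse the same in-flight or resolved promise, so the parsed JSON is held in memory only once it is actually wanted. `fetchJson` stands in for your real Ajax call:

```javascript
// Lazy, load-once access to the data. The shared `promise` variable is the
// single reference that keeps the parsed JSON alive in the JS engine.
function makeLazyLoader(fetchJson) {
  let promise = null;
  return function getData() {
    if (!promise) promise = fetchJson('data.json');
    return promise;
  };
}
```

A caller would do `const getData = makeLazyLoader(myAjaxFn);` up front, then `getData().then(...)` at the point of use; dropping the loader later releases the only reference and lets the data be garbage-collected.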
I have a JSON file, and I need to look for a specific value in it (the file is pretty big), so I want to turn this JSON into an array to make that easier and faster.
But what is the best way to save this array, so I won't need to run over the JSON every time, and it will be kept until a recycle or service restart?
(Node.js project)
1) Use Redis (recommended)
Pros:
Access objects super fast.
Isolated from node process.
Will not affect heap memory.
Can be deployed at separated server.
Data persistence if your application crashed.
Support compression.
Low memory consumption.
Cons:
You may face some limitations if you have nested objects, but there is a workaround, which requires extra work to handle.
2) Use a database, preferably MongoDB
Pros:
Save/Load objects easily since MongoDB supports JSON.
The same as point 2 of the Redis pros (isolated from the Node process).
Cons:
You have to query every time to access objects.
3) Use files (not recommended): when your application starts/restarts, load your objects from the file into a global array, and when your application closes/shuts down, dump the objects from the global array back into the file.
Pros:
Access objects fast.
Cons:
High heap memory consumption if your objects are huge.
Data loss if your application crashes.
In the end it's your choice: if speed matters, choose Redis; if you want the easy way, choose MongoDB. If losing some of your data is not a problem, go for files. You can also mix options 2 and 3.
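For the original question (finding a specific value in a big JSON array), the in-process variant of option 3 can be as simple as building a Map index once at startup; it then lives until the process restarts. A sketch, where the `id` key field is an assumption:

```javascript
// Build a Map index over the parsed JSON array once, so later lookups are
// O(1) instead of a full scan. Keep the returned Map in module scope so it
// survives until the service restarts.
function buildIndex(records, keyField = 'id') {
  const index = new Map();
  for (const rec of records) index.set(rec[keyField], rec);
  return index;
}
```

Usage would be roughly `const index = buildIndex(JSON.parse(raw));` at startup, then `index.get(someId)` per lookup.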
I know memcached and Redis are used when caching needs to span more than one server.
I'm creating a Node application which will run on a single server only and uses MySQL as its DB, and I need to hash around 100,000 keys, each containing a JSON string of length 200, so that I don't have to call MySQL for reads.
If I use memcached or Redis, I will use a callback to get my data, but if I use a JavaScript hash I can get the data synchronously. Will that affect the application somehow, like high memory usage? Which one should I be using for an application like this?
I know memcached and Redis are used when caching needs to span more than one server.
Not necessarily; for instance, Facebook puts a memcached instance in front of each of their MySQL servers. You can use Redis/memcached for fast computation (e.g. real-time analytics) without having a whole cluster.
and I need to hash around 100,000 keys, each containing a JSON string of length 200, so that I don't have to call MySQL for reads.
It seems like premature optimization to me: if MySQL has enough RAM (so the dataset lives in memory), you don't have to worry about performance; that's just 100k keys.
If I use memcached or Redis, I will use a callback to get my data
It really depends on what language you use (Ruby and Python offer synchronous Redis clients) and what paradigm is used (event loop, thread pool, ...).
but if I use a JavaScript hash I can get the data synchronously
To be more precise, the callback is there because you are using node_redis, not because of anything inherent to a JavaScript "hash" (really just an object).
but will it affect the application somehow, like high memory usage
It depends on whether you load all the keys into your process or not. If you use a Redis hash, you can query only the field you want rather than fetching the whole object each time.
Which one should I be using for an application like this?
The best thing to keep in mind is to lower the number of applications you have to maintain in your stack while still using the right tool for the right job. Here MySQL could be enough, but if you really want to use Redis or memcached, I would go for Redis. It offers nearly the same features as memcached with the same performance, while allowing you to use its other data structures in the future without adding another application to your stack.
Moreover, if you put all your data in a Redis hash, you will be able to retrieve a single field (HGET), a group of fields (HMGET), or all fields (HGETALL) with just one call.
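One way to shape the 100,000 JSON strings as fields of a single Redis hash could look like the sketch below. The actual HSET/HGET calls are left as comments, since they need a running Redis server; only the (de)serialization helpers run here:

```javascript
// Turn a map of objects into Redis hash fields: each key becomes a hash
// field whose value is the JSON string. With node_redis this would then be
// roughly client.hset('cache', field, value) to write, and
// client.hget / hmget / hgetall to read one, several, or all fields back.
function toHashFields(objects) {
  const fields = {};
  for (const [key, obj] of Object.entries(objects)) {
    fields[key] = JSON.stringify(obj);
  }
  return fields;
}

// Parse one field back into an object; a missing field yields null.
function fromHashField(value) {
  return value == null ? null : JSON.parse(value);
}
```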
Finally, judging by recent statistics and the Redis ecosystem (GUIs, hosting, libraries, ...), Redis seems to be far more future-proof than memcached if you really want to go that way.
Disclaimer: I am the founder of Redsmin, an online developer oriented service for administrating and monitoring Redis.
It depends: you could even opt for memcached over MySQL :). For a simple read-only operation like this, just storing it within your JavaScript code (as a dictionary object, I believe) is enough. But be sure that you have enough RAM :).
So I don't actually mean browser caching of an Ajax request using the GET method, but rather storing large query results (any number, likely double digits, of 40-300 kB responses) in the browser's memory.
What are the unseen benefits and risks associated with this?
var response = JSON.parse(xhr.responseText);
Cache[query] = response; // Store the parsed JSON object in the global `Cache` map, keyed by query
// Time passes, stuff is done ...
if (Cache[query])
  load(Cache[query]);
else
  Ajax(query, cache_results);
Is there an actual need, or is it just optimization for its own sake? I'd suggest doing some profiling first to see where the bottlenecks lie. Remember that a web page session typically doesn't last that long, so unless you're using some kind of offline storage, the cache won't last long either.
Without the full view of your system it is hard to tell, but I would think that working with potentially stale data will be a concern.
Of course, if you have a protocol in place to resolve "cache freshness" you are on the right track.... but then, why not rely on the HTTP protocol to do this? (HTTP GET with ETag/Last-Modified headers)
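The ETag approach boils down to: remember the last ETag and body, send If-None-Match on the next request, and reuse the cached copy on a 304. A sketch with the transport reduced to a pluggable `send` function (an assumption, standing in for your XHR/fetch call):

```javascript
// Sketch of ETag-based revalidation. `send(url, headers)` is assumed to
// return { status, etag, body }; in a real app it would be an XHR or fetch.
function makeRevalidatingFetch(send) {
  let etag = null;
  let cached = null;
  return function fetchFresh(url) {
    const res = send(url, etag ? { 'If-None-Match': etag } : {});
    if (res.status === 304) return cached; // server says: cached copy is fresh
    etag = res.etag;                       // remember validator for next time
    cached = JSON.parse(res.body);
    return cached;
  };
}
```

This pushes the freshness decision onto the server, so the client never has to guess whether its copy is stale.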
You'll probably want to stress-test the memory usage and general performance of various browsers when storing many 300 kB strings. You can monitor them in the task manager, and also use performance tools like Speed Tracer and dynaTrace AJAX Edition.
If it turns out that caching is a performance win but starts to bog down when you have too many strings in memory, you might try HTML5 storage or Flash storage for the strings; that way you can cache things across sessions as well. Dojo storage is a good library for this.