I'm trying to get my head around the use of IndexedDB. I have an SQL database which I access via REST, and I'm planning to provide some local caching using IndexedDB.
My SQL structure uses a large (and variable) number of tables, each table storing an array of data (time sequence and value) for a specific sensor. Ideally, I would have assumed I'd create a new object store for each of my MySQL tables. However, it seems that you can only create an object store when the database is opened (in the upgrade handler), which is a bit of a pain.
So, I see a number of options:
1. I could use a single object store and add two indexes: one for the time and one for the sensor. I'm a little worried that this might have performance issues, but I'm not sure how data is stored under the hood (see the sketch after this list).
2. I could probably detect a new sensor somehow, and reopen the database with a new version number. This just feels a little wrong to me.
3. I could alternatively use a different database for each sensor, but I've read somewhere that using multiple databases is not recommended (although it's unclear why, since this is possibly the easiest solution).
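For concreteness, here's a rough sketch of what I think option 1 would look like; the store, index and field names are just placeholders, not my real schema:

```js
// Sketch of option 1: one object store for all sensors, with two indexes.
// 'sensorCache', 'readings', 'sensorId' and 'time' are placeholder names.
const request = indexedDB.open('sensorCache', 1);

request.onupgradeneeded = (event) => {
  const db = event.target.result;
  const store = db.createObjectStore('readings', { autoIncrement: true });
  store.createIndex('bySensor', 'sensorId');
  store.createIndex('byTime', 'time');
  // A compound index would allow time-range queries scoped to one sensor:
  store.createIndex('bySensorAndTime', ['sensorId', 'time']);
};
```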
I'd welcome any thoughts people have regarding the best structure for this sort of data that will provide good performance.
If your data sets are independent, i.e. you don't need to combine results from multiple sensors, I suggest splitting them into different tables (object stores) and/or different databases. The separate-database option is more convenient for deleting data.
Performance in a single IndexedDB database starts to degrade at roughly 50K records, depending on browser and hardware. I have a couple of tests which can measure the speed; just tweak the size of the inserted objects and you can test your use case.
If you have fewer than 10K records per sensor (object store/database) you won't hit big performance issues. One common mistake when inserting a batch of data is using a separate transaction for each insert; this is completely unnecessary, since you can store 10K records with one transaction. If you are working with an even larger data set, you can split the inserting across a couple of transactions, so you won't block reads of that database.
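As a rough sketch, assuming a placeholder store called 'readings', a batch insert in one transaction looks like this:

```js
// Insert a whole batch of readings in a single readwrite transaction.
// 'readings' and the record shape are placeholder names.
function saveBatch(db, readings) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('readings', 'readwrite');
    const store = tx.objectStore('readings');
    for (const reading of readings) {
      store.put(reading); // one request per record, but still only one transaction
    }
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```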
Also, for every transaction that you do in IndexedDB you need an open connection. Some people keep one connection alive and reuse it; I prefer opening and closing a separate connection for each transaction.
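A sketch of that open-use-close pattern (names are placeholders again):

```js
// Open a fresh connection, run one unit of work, then close the connection.
function withDatabase(name, version, work) {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(name, version);
    request.onsuccess = () => {
      const db = request.result;
      Promise.resolve(work(db))
        .then((result) => { db.close(); resolve(result); })
        .catch((err) => { db.close(); reject(err); });
    };
    request.onerror = () => reject(request.error);
  });
}

// Usage: withDatabase('sensor-42', 1, (db) => saveBatch(db, readings));
```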
Also, for faster access, you can store all the database metadata in Local Storage; that way you can track how many databases you have and keep a description for each of them.
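For example, a tiny registry along these lines (the key name is arbitrary):

```js
// Keep a small registry of per-sensor databases in localStorage.
function registerSensorDatabase(sensorId, description) {
  const registry = JSON.parse(localStorage.getItem('sensorDbRegistry') || '{}');
  registry[sensorId] = { description, createdAt: Date.now() };
  localStorage.setItem('sensorDbRegistry', JSON.stringify(registry));
}

function listSensorDatabases() {
  return JSON.parse(localStorage.getItem('sensorDbRegistry') || '{}');
}
```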
Additionally, you can take a look at this similar question.
In the design stage for an app that collects large amounts of data...
Ideally, I want it to be an offline-first app and was looking at PouchDB/CouchDB. However, the data needs to be kept for years for legal reasons, and my concern is that this is going to consume too much local storage over time.
My thoughts were:
Handle sync between PouchDB and CouchDB myself, allowing me to purge inactive documents from the local store without impacting the CouchDB. This feels messy and probably a lot of work.
Build a local store using Dexie.js and write the sync logic entirely myself. It also looks like hard work, but maybe less, as I'm not trying to mess with an existing sync function.
Search harder :)
Conceptually, I guess I'm looking for a 'DB cache': holding active JSON document versions and removing documents that have not been touched for X period. It might be that 'offline' mode is handled separately from the DB cache.
Not sure yet if this is the correct answer, but here's what I'm trying:
Set up a filter on CouchDB to screen out old documents (let's say we have a 'date_modified' field in the doc, and we filter out any docs with date_modified older than one month).
Have a local routine on the client that deletes documents from the local PouchDB that are older than one month (actually using the remove() method against the local PouchDB, not updating with _deleted: true); from https://pouchdb.com/2015/04/05/filtered-replication.html it appears removed documents don't sync.
Docs updated in the PouchDB will replicate normally.
There might be a race condition here for replication; we'll see.
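Roughly, the client side could look like the sketch below; the filter name 'app/recent', the database names and the one-month cutoff are assumptions for illustration, and the filter itself would live in a design document on the CouchDB side:

```js
// Pull replication through a server-side filter, so old docs never arrive locally.
// 'app/recent' is a hypothetical filter defined in a CouchDB design document.
const local = new PouchDB('mydb');
const remote = new PouchDB('https://example.com/mydb');

local.replicate.from(remote, { live: true, retry: true, filter: 'app/recent' });
local.replicate.to(remote, { live: true, retry: true });

// Local purge routine: remove() docs older than one month from the local copy only.
async function purgeOldDocs() {
  const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
  const result = await local.allDocs({ include_docs: true });
  for (const row of result.rows) {
    const modified = row.doc.date_modified && new Date(row.doc.date_modified).getTime();
    if (modified && modified < cutoff) {
      await local.remove(row.doc);
    }
  }
}
```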
Which is the best approach for an app that aims to filter data, like 5000+ records, while keeping response speed in focus?
Filter local in-memory arrays
Query the db through HTTP API calls
For my app I use AngularJS, PHP and SQLite3. Right now I load all records from the SQLite db into my table and then filter them by search. All works great, but when I exceed 3000 records I notice a certain slowing down. By limiting the search to two fields, I get better performance.
My doubt is whether changing the model and querying the db would give me better performance or not.
Local array advantages
I can use the JavaScript Array map() method
Low data-bandwidth consumption
I can see all records in the table before filtering
I can work offline after loading the data
Local array disadvantages
Performance slows down above 2000 records
So can you help me evaluate the advantages and disadvantages of making an HTTP API call for every filter request, keeping performance in focus?
I can't speak to caching in PHP, but on the AngularJS end, there's an approach you can follow:
1. When the user searches for the first time, fetch the result(s) from the db.
2. Make two copies of the data: one presented to the user directly, another stored in a local JSON object with a key-value approach.
3. Next time the user searches for anything, look into the local JSON first for the result. If the data is present locally, there's no need for the db query; otherwise make the db query and repeat step 2.
The idea is not to make the user wait for every search. You cannot simply fetch all 5000+ records at once and store them locally, and you definitely cannot make db queries every time, since an RDBMS with that many records will give you performance issues.
So this seems best to me.
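A rough sketch of that cache-first lookup (the module, service and endpoint names are invented for illustration):

```js
// Hypothetical AngularJS service implementing the cache-first search above.
angular.module('app').factory('searchCache', function ($http, $q) {
  var cache = {}; // key-value store: search term -> results

  return {
    search: function (term) {
      if (cache[term]) {
        // Step 3: result already known locally, no db round trip.
        return $q.when(cache[term]);
      }
      // Step 1: first time this term is seen, ask the API (endpoint is made up).
      return $http.get('/api/records', { params: { q: term } }).then(function (response) {
        cache[term] = response.data; // Step 2: keep a local copy for next time
        return response.data;
      });
    }
  };
});
```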
I am building a project using Node.js that is integrated with 4 other systems that keep sending data from sensors every second. I am trying to have something like a timeline, so I need to save that data, but I don't feel it's right to run a couple of insert statements every second.
What is the best way to save data that arrives this frequently? I was thinking about having some log files and then inserting the data in bulk. Any suggestions?
Thank you.
That would be a premature optimization. I've benchmarked PostgreSQL under Node.js many times, and at any given moment inserting several records per second will take under 10 ms, i.e. less than 1% of your app's load, if you do it every second.
The only worthwhile optimization you should do from the start is to use multi-row inserts, even if you only insert 2 rows at a time. The reasons for this are as follows:
Node.js IO is a valuable resource, so the fewer round trips you do the better
Multi-row inserts are tremendously faster than separate insert queries
Separate inserts typically require a transaction, and a single multi-row insert doesn't.
You can find a good example here: Multi-row insert with pg-promise.
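For example, something along these lines (the table, columns and connection string are made up; see the linked answer for the canonical pattern):

```js
// Multi-row insert with pg-promise: one round trip, no explicit transaction needed.
const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/sensors'); // placeholder connection

// Describe the target table and columns once, then reuse the ColumnSet.
const cs = new pgp.helpers.ColumnSet(['sensor_id', 'recorded_at', 'value'], { table: 'readings' });

function saveReadings(readings) {
  // readings = [{ sensor_id, recorded_at, value }, ...] collected over the last second
  const insert = pgp.helpers.insert(readings, cs);
  return db.none(insert);
}
```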
Are there any patterns for cases when something (in my case, filters) is stored on the client (e.g. localStorage) and you need to run a script once per user/version to migrate the stored data? For example, initially there is a filter saved in localStorage with the key myFilter. After some time you decide that you need to separate filters per environment, so you need separate dev-myFilter, train-myFilter, etc. You update your code to work with environment-dependent filters, but there are users who still have the old myFilter, and you want the next deployed version to run a script which updates the key of the saved filter if there is one.
The question is: what are the patterns/best practices for that?
I don't know about "best practices", but the obvious technical solution, just like with any API or storage format, is to store a version number alongside the data. If you didn't do so from the start, assume version == 1 when absent.
You may be able to avoid this if the data structure is so unique between versions that the version can be determined simply by examining it.
Either way, you simply perform the translation whenever you spot that the user's data is in the old format.
The downside of this is that you have to keep checking; for a web application this is unlikely to be a bottleneck, but if you can make your data forward-compatible from the outset then you may save a bit of processing time on each request. But for the data to be useful you've got to read it anyway, so a little branching for as long as you wish to maintain backward-compatibility is, again, unlikely to be a big problem.
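A minimal sketch of that versioning idea, reusing the myFilter / dev-myFilter keys from the question and an invented version key:

```js
// Run once on startup: migrate old localStorage data based on a stored version number.
const SCHEMA_VERSION_KEY = 'filterSchemaVersion'; // invented key name
const CURRENT_VERSION = 2;

function migrateFilters(environment) {
  // Assume version 1 when the key is absent, as suggested above.
  const stored = parseInt(localStorage.getItem(SCHEMA_VERSION_KEY) || '1', 10);

  if (stored < 2) {
    // v1 -> v2: filters became environment-specific.
    const oldFilter = localStorage.getItem('myFilter');
    if (oldFilter !== null) {
      localStorage.setItem(environment + '-myFilter', oldFilter);
      localStorage.removeItem('myFilter');
    }
  }

  localStorage.setItem(SCHEMA_VERSION_KEY, String(CURRENT_VERSION));
}

// Usage: migrateFilters('dev');
```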
I'm using Meteor JS for a project, so inherently I'm using MongoDB. I'm storing a user's check-in and check-out actions. I'm currently storing them as individual docs in the collection; each action contains 3 fields: in or out, time of the action, and userId. Is that the best way to go, though? Should I have one doc per member and store each action in an array? Is there another way? I anticipate several hundred members now, and hopefully several thousand in the future. Thanks.
From experience, I can say that storing records instead of arrays is a better choice in the long run.
As far as Meteor is concerned, its reactivity handles collection records, but not individual fields in arrays. In other words, if one element gets added to the checkins array of a user object, the entire user object needs to be synchronized with the clients. If you store records instead, only the newly added record will be sent by the publication.
As far as MongoDB is concerned, there is a document size limit of 16MB. Not sure how frequent your checkins and checkouts are, but if you store them in an array, you might run into that limitation at some point.
Records are also easier to access than arrays.
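For example, the per-record layout could look something like this (collection, publication and field names are just for illustration):

```js
// One document per check-in/check-out action, instead of an array on the user.
import { Mongo } from 'meteor/mongo';
import { Meteor } from 'meteor/meteor';

export const CheckIns = new Mongo.Collection('checkins');

// Recording an action: a small, flat document.
function recordAction(userId, type) {
  CheckIns.insert({ userId, type, at: new Date() }); // type: 'in' or 'out'
}

// The publication only sends newly added records, not a whole user object.
if (Meteor.isServer) {
  Meteor.publish('checkins.forUser', function (userId) {
    return CheckIns.find({ userId });
  });
}
```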
For more details, see MongoDB data modeling and Database modeling in Bulletproof Meteor.