IndexedDB records take up more space than shown


I am having a strange problem with IndexedDB in Google Chrome. I am saving large amounts of data to IndexedDB. However, the Application tab's dashboard in DevTools shows that I am using more space than my data holds. I'll explain with two screenshots:
In the first image, you can see that my data holds only 1.7 megabytes. There is nothing else stored in IndexedDB other than these two entries. However, when I switch to the "Clear Storage" section to see the overall storage usage for this domain, I see something quite strange.
Here, it shows 59.3 megabytes of data stored in IndexedDB. To be honest, I don't understand what the issue is. I cleared the site data and saved the same data in the cache again, and the result was the same. What is the problem here?

Chrome's implementation of IndexedDB compacts space lazily, so it's unsurprising that it often shows more data use than expected. That said, this is roughly 35x what you have stored, which seems unusual.
You should create a minimal standalone repro and report it at https://new.crbug.com.
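If you put together the repro suggested above, it helps to have a rough number for how much data you actually wrote. The sketch below is only an approximation, assuming your records are JSON-serializable: it measures their size as UTF-8 JSON, whereas IndexedDB stores structured-clone data with its own overhead. In the page itself, navigator.storage.estimate() resolves to an object with usage and quota for the origin, which you can compare against this figure. approxJsonBytes is a hypothetical helper name.

```javascript
// Rough estimate of how many bytes a set of records takes as UTF-8 JSON.
// IndexedDB actually stores structured-clone data, so this is only a ballpark
// figure to compare against what DevTools (or navigator.storage.estimate())
// reports for the origin.
function approxJsonBytes(records) {
  const encoder = new TextEncoder();
  let total = 0;
  for (const record of records) {
    // Count encoded bytes, not string length: non-ASCII characters
    // take more than one byte in UTF-8.
    total += encoder.encode(JSON.stringify(record)).length;
  }
  return total;
}
```

A large, persistent gap between this number and the reported usage, even after the lazy compaction has had a chance to run, is the kind of detail worth including in the bug report.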

Related

How can I retrieve lots of data in chrome.storage.local?

I'm creating a Chrome extension that needs to store a lot of data. I've set the unlimitedStorage permission, and I'm successfully storing information using calls to chrome.storage.local.set. However, I'm unable to make calls to chrome.storage.local.get. When I make a call as simple as chrome.storage.local.get(null, () => {}), Chrome crashes.
My issue seems similar to chrome.storage.local.get limit, except that that issue was about a much smaller amount of data, and turned out to be a fluke. When I call chrome.storage.local.getBytesInUse(null, i => console.log(i)), I get a result of 176031461. (Admittedly, 180 MB is a lot more than Chrome extensions should typically use, but this extension will be running on my own machine.)
I'd like to be able to save all of this data into a JSON file, but to do that, it seems I need to bring it into memory first, and the only way to do that is through chrome.storage.local.get. I'm trying to use the method described at Chrome Extension: Local Storage, how to export to download the data, but it doesn't even get to the callback function. I'm not sure what's causing it to crash, and I don't think it's a memory limit, given that I've followed the instructions at Max memory usage of a chrome process (tab) & how do I increase it? to increase Chrome's max memory to 4 GB.
One potential solution would be to set the first parameter to something other than null and download the data in chunks, but I'd rather not do that if I don't have to.
My questions are:
What is the limit to the amount of storage that can be retrieved this way?
Is there any way for me to get all of that data at once without crashing?
Is there a better way for me to be saving files from a Chrome extension?
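The chunked approach mentioned in the question can be sketched as follows, assuming the extension already knows which keys it stored (for example, because it names them in a predictable way). forEachChunk is a hypothetical helper: it wraps the callback form get(keys, callback) in a Promise and hands each chunk to onChunk, so nothing ever asks for null (everything) in a single call. Whether this avoids the crash depends on what is actually failing, so treat it as something to try rather than a guaranteed fix.

```javascript
// Hypothetical helper: read the given keys from a chrome.storage-style area
// a few keys at a time instead of calling get(null) for everything at once.
async function forEachChunk(area, keys, chunkSize, onChunk) {
  if (!(chunkSize > 0)) throw new RangeError("chunkSize must be positive");
  for (let i = 0; i < keys.length; i += chunkSize) {
    const chunk = keys.slice(i, i + chunkSize);
    // Wrap the callback-style get() in a Promise for this chunk.
    // Real extension code should also check chrome.runtime.lastError here.
    const items = await new Promise((resolve) => area.get(chunk, resolve));
    onChunk(items);
  }
}
```

In the extension you would pass chrome.storage.local as the area. Pushing JSON.stringify(items) for each chunk into an array and joining it as "[" + parts.join(",") + "]" produces a valid JSON array of chunk objects that can then be written to a file.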

Should I use a cache for this?

I wrote some code over the summer holidays, and today I looked at it again for the first time. I'm struggling with one thing I did.
My system has multiple types (pages, newsletters, etc.) and multiple subtypes (items, archive, concepts, etc.). The idea is that I have an object like this:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } } } }
Another example:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } }, archive: {} }, 2: { normal: {} } }
The data originally comes from the database. I'm building a system to edit pages on the website and other things like newsletters, which is why I have multiple types and subtypes.
I made a cache because I don't want to fetch all items from the database every time. But now the problem is that whenever I add, edit, or remove an item, I also have to add, update, or delete it in the cache.
My question: is this a good way? I thought it is because you don't have to call an AJAX file to get the data from the database.
I'm sorry if I'm not allowed to ask this here.
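One way to keep the add/edit/remove bookkeeping manageable is to route every change through one small wrapper around the nested object, and to update it only after the server confirms the change. The sketch below assumes the type → subtype → id shape shown in the question; ItemCache and its method names are hypothetical.

```javascript
// Hypothetical client-side cache shaped like the object in the question:
// data[type][subtype][id] = { content, title }.
class ItemCache {
  constructor() {
    this.data = {};
  }

  // Return the cached item, or undefined if it is not cached.
  get(type, subtype, id) {
    const bucket = this.data[type] && this.data[type][subtype];
    return bucket ? bucket[id] : undefined;
  }

  // Used for both "add" and "edit": store the item the server just saved.
  set(type, subtype, id, item) {
    if (!this.data[type]) this.data[type] = {};
    if (!this.data[type][subtype]) this.data[type][subtype] = {};
    this.data[type][subtype][id] = item;
  }

  // Used for "remove": drop the item so it is no longer served from the cache.
  remove(type, subtype, id) {
    const bucket = this.data[type] && this.data[type][subtype];
    if (bucket) delete bucket[id];
  }
}
```

With this in place, each save or delete handler makes one extra call (set or remove) after the AJAX request succeeds, instead of editing the nested object by hand in several places.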

My question: is this a good way? I thought it is because you don't have to call an AJAX file to get the data from the database.
The answer is that "it depends". There is no always-right or always-wrong answer for caching, because caching is a tradeoff between efficiency and timeliness of data.
If you want maximum efficiency, you cache like crazy, but your data may not be perfectly up to date because you're using old data from the cache.
If you want the most up-to-date data, you don't cache anything so you always get the latest data, but obviously efficiency may suffer if you are regularly requesting the same data over and over.
So, it's a tradeoff and the tradeoff depends entirely upon the application, its needs, how often the data is modified and what the consequences are for having stale data or for not caching. There is no single right or wrong answer for that tradeoff. It depends entirely upon the particular situation for your application and the tradeoff may even be different for some types of data vs. others within the same application.
For example, let's suppose you were writing an online bidding site that offered some functionality like eBay. You would probably be fine caching the item description for at least several hours because that almost never changes, and even if it does, the consequences of being a bit tardy in seeing a new item description are fairly low. But you could never cache the data on the current bid, because the timeliness of that information is critical. The user needs to always see the latest info on the current bid, even if you have to make some sacrifices in efficiency.
Also, remember that caching isn't completely all or none. You can set a lifetime for a cached value such that it can only be used for a certain period of time that is appropriate for the type of data. For example, you might cache an item description in the above auction for up to 2 hours. This allows you to achieve some efficiency gains, but also to eventually see the new data if it happens to change.
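Applied to the auction example, a lifetime-limited cache can be sketched like this. TtlCache is a hypothetical name, not a library API; the clock is passed in so the expiry logic can be checked without waiting two hours. The current bid would simply never go through such a cache.

```javascript
// Minimal time-to-live cache: each entry is only used for ttlMs milliseconds.
// The clock is injectable so the expiry logic can be tested without waiting.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: force a fresh fetch next time
      return undefined;
    }
    return entry.value;
  }
}
```

For the item descriptions, something like new TtlCache(2 * 60 * 60 * 1000) would give the two-hour lifetime mentioned above; a miss (undefined) means "go to the server".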
In general, you have to review the consequences of showing stale data. If the consequences for having data that is even minutes out of date are high (like the latest price in a live auction), then you can't cache that data at all.
If the consequences of having data that is even hours out of date are low, then you can likely cache that value for at least several hours - maybe even longer.
And, when considering what to cache, you obviously want to first look at the items that are most requested and are the most expensive on your server to retrieve. Some analysis of the usage pattern on your server would give you a prioritized list of candidates to consider for caching.
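A rough way to turn that usage analysis into a prioritized list is to total the server time spent per endpoint (request count times average cost) and sort by it. The log entry shape { endpoint, ms } and the function name are hypothetical; staleness rules, like the auction's current bid, still override whatever this ranking says.

```javascript
// Rank endpoints as caching candidates by total server time spent on them,
// i.e. request count multiplied by average cost. The log format is hypothetical.
function rankCacheCandidates(log) {
  const totals = new Map();
  for (const { endpoint, ms } of log) {
    const t = totals.get(endpoint) || { endpoint, count: 0, totalMs: 0 };
    t.count += 1;
    t.totalMs += ms;
    totals.set(endpoint, t);
  }
  // Most expensive overall first.
  return [...totals.values()].sort((a, b) => b.totalMs - a.totalMs);
}
```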

My question: is this a good way? I thought it is because you don't have to call an AJAX file to get the data from the database.
This is fine if
1) You want to provide offline reading continuity to the user. User doesn't have to wait for internet connection to be available so that they can read at any time.
2) Your data-service is quite heavy and you want to avoid multiple/frequent visits to the server to get the same data over and over again.
3) You want your app to be bundled with a native package (like phonegap) to become a hybrid app and give a complete offline experience to the user.
This is not a comprehensive list, but it should get you started on deciding when to go offline and when to stay online.
So, on the other hand, this is a bad idea if:
1) Your local storage structure is going to change often enough to require users to reinstall (unless you can figure out an automatic upgrade of local storage).
2) All your features are transactional and require syncing with other users as well.

There's nothing wrong with your approach; just make sure you keep these points in mind while managing a client-side cache:
Maintain a 'version' variable that is incremented whenever the structure changes. Send this version to the client every time; the client is responsible for comparing versions and emptying its cache if the server version is greater than the client version.
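The version check can be sketched as below, assuming the cache lives in localStorage (or anything with the same getItem/setItem/clear methods) and that the server's version number arrives with each response. The key name "cacheVersion" and the function name are hypothetical. Note that clear() empties all of the origin's localStorage, so use prefixed keys and remove only those if other data lives there too.

```javascript
// Sketch of the version check described above. "storage" is anything with
// getItem/setItem/clear (e.g. window.localStorage); the key is hypothetical.
const VERSION_KEY = "cacheVersion";

// Returns true if the cache was emptied and the caller should refetch data.
function syncCacheVersion(storage, serverVersion) {
  // getItem returns null when nothing is stored yet; treat that as version 0.
  const clientVersion = Number(storage.getItem(VERSION_KEY) || 0);
  if (Number(serverVersion) > clientVersion) {
    storage.clear(); // structure changed: throw away the old cache
    storage.setItem(VERSION_KEY, String(serverVersion));
    return true;
  }
  return false;
}
```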
You can implement your own, or use an open-source library to handle caching your AJAX responses; this one might be useful: https://github.com/SaneMethod/jquery-ajax-localstorage-cache.
You can set proper expiry headers from the server, which helps the browser cache the response for you if it is a GET request.
You can also implement a server-side cache that avoids calling the database by caching the response against the request URL. Note: if different users are supposed to receive different responses, this approach won't work. You can delete the cached entry whenever the related data set changes (delete/update).
In your case, you can also maintain a flag on the server that records when an article was last updated; if the stored version is older, make a server request, otherwise just use the local version.
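On the server side, expiry headers for a GET response might be built like this. cacheHeaders is a hypothetical helper; Cache-Control: max-age and Expires are the standard HTTP headers for this purpose. Use private instead of public if the response differs per user, so shared caches don't store it.

```javascript
// Build response headers that let the browser reuse a GET response for
// maxAgeSeconds. Cache-Control: max-age is the modern header; Expires is the
// older absolute-date equivalent that some clients still read.
function cacheHeaders(maxAgeSeconds, now = new Date()) {
  const expires = new Date(now.getTime() + maxAgeSeconds * 1000);
  return {
    "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    Expires: expires.toUTCString(),
  };
}
```

With Node's http module, for instance, the result could be passed straight to res.writeHead(200, cacheHeaders(600)).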
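The server-side cache described above can be sketched as a map from request URL to response body, invalidated whenever the underlying data changes. ResponseCache and load are hypothetical names; load stands for whatever actually queries the database. This simple version does not deduplicate concurrent misses, so two simultaneous requests for an uncached URL may both hit the database.

```javascript
// Hypothetical server-side response cache keyed by request URL. It only makes
// sense when every user gets the same response for a given URL.
class ResponseCache {
  constructor() {
    this.byUrl = new Map();
  }

  // Return the cached body, or call load() (e.g. the database query) and cache it.
  async get(url, load) {
    if (this.byUrl.has(url)) return this.byUrl.get(url);
    const body = await load();
    this.byUrl.set(url, body);
    return body;
  }

  // Call after an update/delete so the next request goes back to the database.
  invalidate(url) {
    this.byUrl.delete(url);
  }
}
```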
I hope it helps.

Breeze.js cache limitations? Or Browser?

We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. On Chrome it's generally somewhere between 8 and 13MB worth of entities (as measured by the HTTPResponse headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move it by more than 10%. The error I get is that the Chrome tab crashes and tells me to reload. The error is reproducible: I download the data in 100K chunks, and it fails on the same read every time, but works fine if I stop it after the previous read. When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or Windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250M memory usage to 1.7G of memory usage during the caching process and caches a total of about 14MB before throwing an out-of-memory error.
Chrome goes from 206M memory usage to about 850M while caching a total of around 9MB
Firefox goes from around 400M to about 750M and manages to cache about 5MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this? I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED At Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page runs quickly and downloads minimal data:
var query = breeze.EntityQuery
.from("Countries")
.orderBy("Name")
.expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has selected the Sites s/he wishes to cache, the following two queries are kicked off (this used to be one query, but I broke it into two hoping it would be less of a burden on resources -- it didn't help). The first query (usually 2-3K entities and about 2MB) runs as expected. Some combination of the listed predicates is used to filter the data.
var qry = breeze.EntityQuery
.from("SeasonClients")
.expand("Client,Group.Site,Season,VSeasonClientCredit")
.orderBy("DistrictId,SeasonId,GroupId,ClientId");
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
// Some combination of p..p3 is then applied, e.g. qry = qry.where(breeze.Predicate.and([p, p3]));
After the first query runs, the second query (below) runs, also using some combination of the listed predicates to filter the data. At about 9MB, it has about 50K rows to download. When the total download between the two queries is between 10MB and 13MB, browsers crash.
var qry = breeze.EntityQuery
.from("Repayments")
.orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
// As above, some combination of p1..p4 is applied with qry.where(...).
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.

Here are some suggestions based on my experience building an offline-capable web application using Breeze. Some or all of these might not make sense for your use cases...
Identify which entity types need to be editable vs which are used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache them in localStorage yourself using JSON.stringify. This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
In conjunction with suggestion #2, avoid loading all the editable data at once and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
Cache the breeze metadata once. When calling exportEntities the includeMetadata argument should be false. More info on this here.
To create new EntityManager instances make use of the createEmptyCopy method.
EDIT:
I want to respond to this comment:
Say I have a client who has bills and payments. That client is in a
group, in a site, in a region, in a country. Are you saying that the
client, payment, and bill information might each have their own EM,
while the location hierarchy might be in a 4th EM with no-tracking?
Then when I refer to them, I wire up the relationships as needed using
LINQs on the different EMs (give me all the bills for customer A, give
me all the payments for customer A)?
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill, it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
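The stringify-and-store step for lookup data can be sketched as below. In the application, results would be the plain array a noTracking query resolves with, e.g. data.results from manager.executeQuery(breeze.EntityQuery.from("Groups").noTracking()), assuming a Breeze version that supports noTracking(). The key prefix "lookup:" and both function names are hypothetical, and the sketch assumes the query does not expand navigation properties that would create circular references (JSON.stringify throws on cycles). Also keep in mind that localStorage.setItem throws once the origin's quota is exhausted.

```javascript
// Store and reload a list of plain (non-tracked) lookup objects. "storage" is
// anything with getItem/setItem, such as window.localStorage; the key names
// ("lookup:Groups", "lookup:Sites", ...) are hypothetical.
function saveLookup(storage, name, results) {
  storage.setItem("lookup:" + name, JSON.stringify(results));
}

// Returns the cached list, or an empty array if it was never saved.
function loadLookup(storage, name) {
  const json = storage.getItem("lookup:" + name);
  return json === null ? [] : JSON.parse(json);
}
```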
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information but not expanding their group, site, region, country) and cache each client+payments+bills set using entityManager.exportEntities. Reasoning here is it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.
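A possible storage layout for this is sketched below: metadata is written once, and each client's exported bundle gets its own key, so one client can be reloaded without parsing all the others. In the application, metadataJson might come from manager.metadataStore.exportMetadata() and exported from manager.exportEntities(entities, { includeMetadata: false }) (check the signature for your Breeze version; older releases take a boolean as the second argument). To work on a client, you would create a fresh manager with createEmptyCopy() and call importEntities on it with the stored string. The key names and helper functions are hypothetical.

```javascript
// Hypothetical layout for the per-client offline cache: metadata is written
// once, and each client's exported bundle (client + payments + bills) gets its
// own key, so one client can be loaded without parsing all the others.
const METADATA_KEY = "offline:metadata";

function saveMetadataOnce(storage, metadataJson) {
  if (storage.getItem(METADATA_KEY) === null) {
    storage.setItem(METADATA_KEY, metadataJson);
  }
}

function saveClientBundle(storage, clientId, exported) {
  storage.setItem("offline:client:" + clientId, exported);
}

// Returns the exported string, or null if this client was not taken offline.
function loadClientBundle(storage, clientId) {
  return storage.getItem("offline:client:" + clientId);
}
```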

Jeremy's answer was excellent and very helpful, but it didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However, Steve, in the comments, gave me the most appropriate information for this question.
It is neither Breeze nor the browser, but rather Knockout. Apparently the Knockout wrapper around the Breeze entities uses all that memory (at least while loading the entities, and in my environment). As described above, Knockout/Breeze would crap out after reading around 5MB of data, causing Chrome to crash with over 1.7GB of memory usage (from a pre-download memory usage of around 300MB). Rewriting the app in AngularJS eliminated the problem. So far I have been able to download over 50MB from the exact same EF6 model into Breeze/Angular, and total Chrome memory usage never went above 625MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.

Chrome usedJSHeapSize property

First of all, I've looked around the internet and found it quite badly documented.
Somewhere in my code I have a big memory leak that I'm trying to track and after using:
window.performance.memory.usedJSHeapSize
it looks like the value stays at the same level of 10MB, which is not true, as we can see by comparing it with the values shown either here:
chrome://memory-internals/
or in the Timeline in DevTools: there is a big difference. Has anyone encountered a similar issue? Do I need to update these values manually (by running an "update" or "measure" command, etc.)?
Following this topic:
Information heap size
it looks like this value increases in fixed steps. Can we somehow see what the step is, or change it? In my case, the page currently uses about 10MB; 30 minutes later it will be about 400MB, and half an hour after that the page will crash.
Any ideas guys?
(Why the code is leaking is a different issue; please treat this question as if I were just trying to use this variable to build some kind of test.)

There's a section of the WebPlatform.org docs that explains this:
The values are quantized as to not expose private information to attackers. If Chrome is run with the flag --enable-precise-memory-info the values are not quantized.
http://docs.webplatform.org/wiki/apis/timing/properties/memory
So, by default, the number is not precise, and it only updates every 20 minutes! This should explain why your number doesn't change. If you use the flag, the number will be precise and current.
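With the flag enabled, one way to use the readings as a leak test is to sample periodically and fit a straight line, since the heap goes up and down with garbage collection and only a persistent upward trend is meaningful. The sketch below is an assumption about how such a test could look, not a Chrome API: heapTrend is a hypothetical helper that computes the least-squares slope, and the sampling loop is shown only as a comment because performance.memory exists only in Chrome.

```javascript
// Least-squares slope of heap size over time, in bytes per millisecond.
// Each sample is { t: milliseconds, bytes: usedJSHeapSize }.
// In Chrome (started with --enable-precise-memory-info) samples could be
// collected with something like:
//   setInterval(() => samples.push({ t: performance.now(),
//     bytes: performance.memory.usedJSHeapSize }), 60000);
function heapTrend(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  let sumT = 0;
  let sumB = 0;
  for (const s of samples) {
    sumT += s.t;
    sumB += s.bytes;
  }
  const meanT = sumT / n;
  const meanB = sumB / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.t - meanT) * (s.bytes - meanB);
    den += (s.t - meanT) ** 2;
  }
  return den === 0 ? 0 : num / den;
}
```

Without the flag, the quantized and rate-limited values described above make the samples mostly flat, so a slope near zero would tell you little.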
The WebKit commit message explains:
This patch adds an option to expose quantized and rate-limited memory information to web pages. Web pages can only learn new data every 20 minutes, which helps mitigate attacks where the attacker compares two readings to extract side-channel information. The patch also only reports 100 distinct memory values, which (combined with the rate limits) makes it difficult for attackers to learn about small changes in memory use.

localStorage not storing persistently between two pages

I'm developing an application and, at a certain point, I need to store information that must persist across multiple pages (most probably, only 2 pages).
The amount of information varies between just a few bytes and about 15KB (It will never be more than 20KB, ever). I can't really properly predict beforehand how much it will be.
For that I decided to use localStorage.
For now I'm only working on localhost:8080.
The pages, for now have only generic names: pageA.php and pageB.php.
The pages reside on the root of the domain. I.e.
http://localhost:8080/pageA.php
http://localhost:8080/pageB.php
...
At certain times, I store data on localStorage, on pageA.php (I do use the setItem() method).
When the user moves to pageB.php, pageB.php's script then tries to get the data that was stored in pageA.php.
The problem is that getItem() always returns null on pageB.php
I did check the keys I'm using and they are the same, so there should be no problems there.
I've checked: stored data persists between page loads as long as the URL does not change.
What am I doing wrong here?
Note: tested only on Firefox 19 and Chrome 24.

The problem here was that the editor I was using had been changed such that it was searching with case sensitivity.
When I changed the string I was using for the key, the find-and-replace didn't match all the occurrences because of case sensitivity.
I solved it by searching for and adapting each key so that all keys had the same characters with the same case, not just the same characters regardless of case.
In the end, it was just a lack of attention. As expected, strings in JavaScript are case sensitive, and that also applies to the keys for localStorage and sessionStorage.
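One way to avoid this class of bug is to define each key exactly once, in a script that both pageA.php and pageB.php include, and always refer to the constant instead of retyping the string. STORAGE_KEYS and the key name formDraft are hypothetical; the demo function just shows that lookups are exact, case-sensitive matches.

```javascript
// Define each storage key once and reuse the constant on every page, so a
// search-and-replace or a typo cannot leave two spellings of the same key.
const STORAGE_KEYS = Object.freeze({
  formDraft: "formDraft",
});

// Keys are compared exactly, so "formDraft" and "FormDraft" are different entries.
function demoCaseSensitivity(storage) {
  storage.setItem(STORAGE_KEYS.formDraft, "saved on pageA");
  return {
    sameKey: storage.getItem(STORAGE_KEYS.formDraft),
    differentCase: storage.getItem("FormDraft"),
  };
}
```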
