I'm just making up a scenario, but let's say I have a 500MB file whose data I want to present to the client as an HTML table. Let's say there are two scenarios:
They are viewing it via a Desktop where they have 1.2GB available memory. They can download the whole file.
Later, they try and view this same table on their phone. We detect that they only have 27MB available memory, and so give them a warning that says "We have detected that your device does not have enough memory to view the entire table. Would you like to download a sample instead?"
Ignoring things like pagination or virtual tables, I'm only concerned with whether the full dataset can fit in the user's available memory. Is this possible to detect in a browser (even with the user's confirmation)? If so, how could it be done?
Update
This question was answered about 6 years ago, and that answer points to one from 10 years ago. I'm wondering what the current state is, as browsers have changed quite a bit since then and there's also WebAssembly and such.
Use performance.memory.usedJSHeapSize. Though it is non-standard and still in development, it will be enough for testing the memory used. You can try it out in Edge/Chrome/Opera, but unfortunately not in Firefox or Safari (as of writing).
Attributes (performance.memory)
jsHeapSizeLimit: The maximum size of the heap, in bytes, that is available to the context.
totalJSHeapSize: The total allocated heap size, in bytes.
usedJSHeapSize: The currently active segment of JS heap, in bytes.
Read more about performance.memory: https://developer.mozilla.org/en-US/docs/Web/API/Performance/memory.
CanIUse.com: https://caniuse.com/mdn-api_performance_memory
(CanIUse.com support table screenshot, 2020/01/22)
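For a quick check, a minimal feature-detection sketch along these lines could work (the helper name and the 500MB threshold are just illustrative, not from any standard):

// Returns true/false when performance.memory is available (Chromium-based
// browsers), or null when it is not (e.g. Firefox, Safari).
function canProbablyHold(bytesNeeded) {
    if (typeof performance === 'undefined' || !performance.memory) {
        return null; // API not supported
    }
    var mem = performance.memory;
    var headroom = mem.jsHeapSizeLimit - mem.usedJSHeapSize;
    return headroom > bytesNeeded;
}

// Example: warn before loading a ~500MB table.
if (canProbablyHold(500 * 1024 * 1024) === false) {
    console.warn('This device may not have enough memory for the full table.');
}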
I ran into exactly this problem some time ago (a non-paged render of a JSON table, because we couldn't use paging, because :-( ), but the problem was even worse than what you describe:
the client having 8 GB of memory does not mean that the memory is available to the browser.
any report of "free memory" on a generic device will be, ultimately, bogus (how much is used as cache and buffers?).
even knowing exactly "how much memory is available to Javascript" leads to a maintenance nightmare because the translation formula from available memory to displayable rows involves a "memory size for a single row" that is unknown and variable between platforms, browsers, and versions.
After some heated discussions, my colleagues and I agreed that this was an XY problem. We did not want to know how much memory the client had; we wanted to know how many table rows it could reasonably and safely display.
Some tests we ran (this was a couple of months or three before the pandemic, so around September 2019, and things might have changed) showed an interesting effect: if we rendered off-screen, client-side, a table made of the same row repeated with random data, and timed how long it took to append each row, that time correlated roughly with the device's performance and limits, and allowed a reasonable estimate of the permissible number of actual rows we could display.
I have tried to re-create a very crude version of that test from memory; it ran along these lines, and it transmitted the results through an AJAX call upon logon:
var tr = $('<tr><td>...several columns...</td></tr>');
$('body').empty();
$('body').html('<table id="foo"></table>');
var foo = $('#foo');
var s = Date.now();
var i, j;
for (i = 0; i < 500; i++) {
    var t = Date.now();
    // Limit total runtime to, say, 3 seconds
    if ((t - s) > 3000) {
        break;
    }
    for (j = 0; j < 100; j++) {
        foo.append(tr.clone());
    }
    var dt = Date.now() - t;
    // We are interested in when dt exceeds a given guard time
    if (0 == (i % 50)) { console.debug(i, dt); }
}
// We are also interested in the maximum attained value
console.debug(i * j);
The above is a re-creation of what became a more complex testing rig (it was assigned to a friend of mine, I don't know the details past the initial discussions). On my Firefox on Windows 10, I notice a linear growth of dt that markedly increases around i=450 (I had to increase the runtime to arrive at that value; the laptop I'm using is a fat Precision M6800). About a second after that, Firefox warns me that a script is slowing down the machine (that was, indeed, one of the problems we encountered when sending the JSON data to the client). I do remember that the "elbow" of the curve was the parameter we ended up using.
In practice, if the overall i*j was high enough (the test terminated with all the rows), we knew we need not worry; if it was lower (the test terminated by timeout), but there was no "elbow", we showed a warning with the option to continue; below a certain threshold or if "dt" exceeded a guard limit, the diagnostic stopped even before the timeout, and we just told the client that it couldn't be done, and to download the synthetic report in PDF format.
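For illustration, the decision logic described above could be sketched roughly like this (the thresholds are placeholders, not the values we actually used):

// rowsRendered is the i*j value from the test above; elbowFound and
// guardExceeded would be set inside the loop when dt jumps past a guard time.
function classifyDevice(rowsRendered, elbowFound, guardExceeded, totalTestRows) {
    if (guardExceeded || rowsRendered < 5000) {
        return 'refuse';   // tell the client to download the PDF report instead
    }
    if (rowsRendered >= totalTestRows) {
        return 'ok';       // all test rows rendered within the time budget
    }
    return elbowFound ? 'refuse' : 'warn'; // timeout: warn if no elbow was seen
}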
You may want to use the IndexedDB API together with the Storage API:
Using navigator.storage.estimate().then((storage) => console.log(storage)) you can estimate the storage the browser allows the site to use. Then you can decide whether to store the data in IndexedDB or to prompt the user that there is not enough storage and offer to download a sample instead.
void async function() {
try {
let storage = await navigator.storage.estimate();
print(`Available: ${storage.quota/(1024*1024)}MiB`);
} catch(e) {
print(`Error: ${e}`);
}
}();
function print(t) {
document.body.appendChild(document.createTextNode(
t
));
}
(This snippet might not work in this embedded snippet context; you may need to run it on a local test server.)
Wide Browser Support
IndexedDB is, or will soon be, available in all browsers except Opera.
The Storage API is, or will soon be, available in all browsers except Apple's and IE.
Sort of.
As of this writing, there is a Device Memory specification under development. It specifies the navigator.deviceMemory property to contain a rough order-of-magnitude estimate of total device memory in GiB; this API is only available to sites served over HTTPS. Both constraints are meant to mitigate the possibility of fingerprinting the client, especially by third parties. (The specification also defines a ‘client hint’ HTTP header, which allows checking available memory directly on the server side.)
However, the W3C Working Draft is dated September 2018, and while the Editor’s Draft is dated November 2020, the changes in that time span are limited to housekeeping and editorial fixups. So development on that front seems lukewarm at best. Also, it is currently only implemented in Chromium derivatives.
And remember: just because a client does have a certain amount of memory, it doesn't mean it is available to you. Perhaps there are other purposes for which they want to use it. Knowing that a large amount of memory is present is not a permission to use it all up to the exclusion of everyone else. The best uses for this API are probably the ones described in the question: detecting whether the data we want to send might be too large for the client to handle.
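A minimal sketch of how the hint might be used, assuming you only need a coarse decision (the 2 GiB cutoff and the two handler functions are made up for the example):

// navigator.deviceMemory is only exposed over HTTPS and, as of this writing,
// only in Chromium-based browsers; treat undefined as "unknown".
var deviceMemoryGiB = navigator.deviceMemory; // e.g. 0.25, 0.5, 1, 2, 4, 8

if (deviceMemoryGiB !== undefined && deviceMemoryGiB < 2) {
    offerSampleDownload(); // hypothetical: offer the sample instead of the full table
} else {
    loadFullTable();       // hypothetical: proceed with the full dataset
}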
I wrote some code this summer holidays, and today I'm looking at my code again for the first time, and I'm struggling with one thing I did.
My system has multiple types (pages, newsletters, etc.) and multiple subtypes (items, archive, concepts, etc.). The idea is that I have an object like this:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } } } }
Another example:
object { 1: { normal: { 1: { content: 'somecontent', title: 'sometitle' } }, archive: {} }, 2: { normal: {} } }
The data originally comes from the database. I'm making a system to edit pages on the website and other things like newsletters, which is why I have multiple types and subtypes.
I made a cache because I don't want to fetch all items from the database every time. But the problem is that whenever I add, edit, or remove an item, I also have to add it to, edit it in, or remove it from the cache.
My question: is this a good way? I thought it is because you don't have to call an AJAX file to get the data from the database.
I'm sorry if I'm not allowed to ask this here.
My question: is this a good way? I thought it is because you don't
have to call an AJAX file to get the data from the database.
The answer is that "it depends". There is no always right and always wrong answer for caching because caching is a tradeoff between efficiency and timeliness of data.
If you want maximum efficiency, you cache like crazy, but your data may not be perfectly up to date because you're using old data from the cache.
If you want the most up-to-date data, you don't cache anything so you always get the latest data, but obviously efficiency may suffer if you are regularly requesting the same data over and over.
So, it's a tradeoff and the tradeoff depends entirely upon the application, its needs, how often the data is modified and what the consequences are for having stale data or for not caching. There is no single right or wrong answer for that tradeoff. It depends entirely upon the particular situation for your application and the tradeoff may even be different for some types of data vs. others within the same application.
For example, let's suppose you were writing an online bidding site that offered some functionality like eBay. You would probably be fine caching the item description for at least several hours, because that almost never changes and even if it does, the consequences of being a bit tardy in seeing a new item description are fairly low. But you could never cache the data on the current bid, because the timeliness of that information is critical. The user needs to always see the latest info on the current bid, even if you have to make some sacrifices in efficiency.
Also, remember that caching isn't completely all or none. You can set a lifetime for a cached value such that it can only be used for a certain period of time that is appropriate for the type of data. For example, you might cache an item description in the above auction for up to 2 hours. This allows you to achieve some efficiency gains, but also to eventually see the new data if it happens to change.
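To illustrate that kind of lifetime-based caching, here is a minimal in-memory sketch in plain JavaScript (the names and the two-hour TTL are only examples):

// Tiny cache with a per-entry expiry time. Not production code.
var cache = {};

function cacheSet(key, value, ttlMs) {
    cache[key] = { value: value, expires: Date.now() + ttlMs };
}

function cacheGet(key) {
    var entry = cache[key];
    if (!entry) return undefined;
    if (Date.now() > entry.expires) { // stale: drop it and report a miss
        delete cache[key];
        return undefined;
    }
    return entry.value;
}

// Item descriptions tolerate staleness: cache them for up to 2 hours.
cacheSet('item:123:description', itemDescriptionHtml, 2 * 60 * 60 * 1000);
// The current bid is never cached: always fetch it fresh.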
In general, you have to review the consequences of showing stale data. If the consequences for having data that is even minutes out of date are high (like the latest price in a live auction), then you can't cache that data at all.
If the consequences of having data that is even hours out of date are low, then you can likely cache that value for at least several hours - maybe even longer.
And, when considering what to cache, you obviously want to first look at the items that are most requested and are the most expensive on your server to retrieve. Some analysis of the usage pattern on your server would give you a prioritized list of candidates to consider for caching.
My question: is this a good way? I thought it is because you don't
have to call an AJAX file to get the data from the database.
This is fine if
1) You want to provide offline reading continuity to the user. The user doesn't have to wait for an internet connection to be available and can read at any time.
2) Your data-service is quite heavy and you want to avoid multiple/frequent visits to the server to get the same data over and over again.
3) You want your app to be bundled with a native package (like phonegap) to become a hybrid app and give a complete offline experience to the user.
This is not a comprehensive list, but just something to get you started on when to go for offline caching and when not to.
So, on the other hand, this is a bad idea if
1) Your local storage structure is going to change so frequently that users would effectively need a re-install (unless you can figure out auto-upgrade of the local storage).
2) All your features are transactional and also require synchronization with other users.
Nothing wrong with your approach; just make sure you keep these points in mind while managing your client-side cache:
Maintain one 'version' variable that is increased whenever there is any change in structure. This version is sent to the client every time; the client is responsible for comparing versions and emptying the client cache if the server version is greater than the client version (a small sketch follows this list).
You can implement this yourself or find open-source libraries to handle caching of your AJAX responses; this one might be useful: https://github.com/SaneMethod/jquery-ajax-localstorage-cache.
You can set proper expiry headers from the server, which also helps the browser cache the response for you if it is a GET request.
You can also implement a server-side cache, which avoids calls to the database by caching the response against the request URL. Note: if different users are supposed to receive different responses, this approach won't work. You can invalidate the cache whenever anything related to that particular data set changes (delete/update).
In your case you can also maintain flags on the server that simply tell whether the data has been updated since a given point (e.g. the time of the last article update); if the stored version is older you can make a server request, otherwise just use the local version.
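As a rough sketch of the version check from the first point above (the localStorage keys and the shape of the server response are assumptions):

// Empty the client cache whenever the server reports a newer structure version.
function syncCacheVersion(serverVersion) {
    var localVersion = parseInt(localStorage.getItem('cacheVersion') || '0', 10);
    if (serverVersion > localVersion) {
        localStorage.removeItem('pagesCache'); // drop the stale cached data
        localStorage.setItem('cacheVersion', String(serverVersion));
    }
}

// Call it with the version the server sends along with every response, e.g.:
// syncCacheVersion(response.cacheVersion);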
I hope it helps.
Relatively new to databases here (and to DBA work).
I've been recently looking into Riot Games' APIs, but now that I've realised you're limited to 10 calls per 10 seconds, I need to change my front-end code, which was originally just loading all the information with lots and lots of API calls, into something that uses a MySQL database.
I would like to collect ranked data about each player and list them (30+ players) in an ordered ranking. I was thinking, as mentioned on their Rate Limiting page, of "caching" data when GET-ting it, and then, when that information is needed again, checking whether it is still relevant: if so, use it; if not, re-GET it.
My idea is to store, in a table column, a time 30 minutes (the rough length of a game) in the future, and when making a call, to check whether the server time is past the saved time. Is this the right approach/idea for caching? If not, what is the best practice?
Either way, this doesn't solve the problem of loading 30+ values for the first time, when no previous calls have been made to cache.
Any advice would be welcome, even advice telling me I'm doing completely the wrong thing!
If there is more information needed I can edit it in, let me know.
tl;dr What's best practice to get around Rate-Limiting?
Generally yes; most large applications simply use guesstimated rate limits, or a manual cache (check the DB for a recent call, then go to the API if it's an old call).
When you use large sites like op.gg or lolKing for Summoner lookups, they all give you a "must wait X minutes before doing another DB check/call" message; I do this as well. So yes, using an estimated number (like a game length) to handle your rate limit is definitely a common practice that I have observed within the Riot developer community. Some people do go all out and implement actual caching layers/frameworks, but you don't need to do that for smaller applications.
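A rough sketch of that "check the DB first, fall back to the API" pattern, written as Node-style pseudocode (db, riotApi, and the 30-minute window are assumptions standing in for your own MySQL queries and Riot API calls):

var CACHE_WINDOW_MS = 30 * 60 * 1000; // roughly one game length

async function getPlayer(summonerName) {
    var cached = await db.getPlayer(summonerName);          // hypothetical DB helper
    if (cached && Date.now() - cached.fetchedAt < CACHE_WINDOW_MS) {
        return cached;                                       // recent enough: no API call
    }
    var fresh = await riotApi.fetchPlayer(summonerName);     // counts against the rate limit
    await db.savePlayer(summonerName, fresh, Date.now());    // refresh the cached row
    return fresh;
}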
I recommend building up your app's main functionality first, submit it, and get it approved for a higher rate limit as well. :)
Also, you mentioned adjusting your front-end code for the calls; make sure your API calls are made from server-side code, for security reasons.
Can anyone share best practices for troubleshooting Google Analytics code?
Has anyone built a debugging tool? Does google have a linter hidden somewhere? Does anybody have a good triage logic diagram?
I'll periodically set up different parts of GA and it seems like every time I do it takes 4 or 5 days to get it working.
The workflow looks like this:
Read the docs on the feature (e.g. events, custom variables).
Implement what appears to be the correct code based on the docs.
Wait a day.
See no data.
Google every version of the problem I can imagine. Find what may be a solution.
Change my code.
Wait a day.
See no data.
Loop:
Randomly move elements of the tracking code around.
Wait a day.
If other parts break, tell the CEO, get yelled at, revert changes.
If data appears, break.
Pray it continues to work/I never have to change the tracking code again.
For obvious reasons, I'm not satisfied with this workflow and hoping someone has figured out something I haven't.
Everything I do, debugging GA code, stops and starts with the Google Analytics Debugger Chrome Extension. It prints out to the console a summary of the data it has sent to Google Analytics which, for all purposes except testing profile filters, is all you need. It'll eliminate the "wait a day" step.
If you're not a fan of Google Chrome, you can inspect the HTTP requests yourself to see how the data is being sent. You can use this guide to figure out what each parameter in the URL represents.
In terms of ensuring the features I've installed, or the code itself, are working, I'll open a fresh browser (cleared of cookies) and navigate to the site I'm testing via a Google search. I'll proceed to navigate to all of the pertinent pages and trigger all the pertinent events, all the while ensuring that the requests are being sent to Google and that the session isn't broken at any point (by either keeping an eye on the session count, or ensuring that the traffic source doesn't change from organic/google to direct or a self-referral).
To begin with, this answer isn't at odds with any portion of either of the two answers before mine--i.e. you could certainly implement them all without conflict.
My answer just reflects my own priority, which is the latency issue. Latency makes debugging far more difficult than it should be. Ten minutes of latency while waiting for the compiler to finish is irritating; four hours (the minimum GA latency) is painful.
So for me, the first step in building a GA debugging framework was to somehow get the GA results in real time--in other words, if I changed a regular expression filter, I needed to catch the traffic processed by that filter. So removing the 4-24 hour latency in getting results from the GA server was critical.
The easiest way I have found so far to do this is to modify the GA tracking code on each page of your site so that it sends a copy of each GIF request to your own server.
To do this, immediately before the call to trackPageview(), add this line:
pageTracker._setLocalRemoteServerMode();
This will send the entire request header to your server access log, which you can parse in real time. (Specifically, your server writes to the access log one line at a time--one line corresponds to one request. All of the GA data is packaged and sent as a request header, so there's a perfect correspondence between the two.)
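For context, in the classic ga.js setup that call sits right before _trackPageview(), roughly like this (the account ID is a placeholder; check the legacy ga.js documentation for your exact snippet):

// Legacy ga.js tracker: _setLocalRemoteServerMode() sends the __utm.gif request
// both to Google and to your own server, so it appears in your access log immediately.
var pageTracker = _gat._getTracker("UA-XXXXXX-X");
pageTracker._setLocalRemoteServerMode();
pageTracker._trackPageview();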
yahelc's answer is great, but I'd like to add my 2c here.
Get yourself a nice sniffer to see the hits flowing.
Nice options:
Wasp
Charles
HTTPFox
Fiddler
Then implement your changes on QA.
Test this new setup on QA. Things you should keep an eye on.
Always make sure that the basic pageview fires. It should have at least a utmp value and no utmt set.
Make sure the visitor ID doesn't get overwritten. This is the second number in the __utma cookie; it acts as the user ID, and if it changes, things are broken.
Make sure your pageviews contain the page and session variables you set (if you set any). They are encoded in the utme parameter.
Make sure that any visitor-level custom var is fired before your basic pageview (utmt=custom variable).
Make sure the source data is not overwritten (campaign/medium/source/content/keyword). These are set in the __utmz cookie; if it gets overwritten by direct or a referral from your own site, there's something wrong.
If any event goes missing, it may be due to a required field missing or to the last value being a float or a string. The value of an event must be an integer.
If you're using ecommerce tracking, double-check all your parameters. Make sure that you're sending everything as strings here and that unused parameters are empty strings.
Triple-check your account number: UA-XXXXX-X.
If you're doing something with custom JS, make sure to test in all browsers, and try to keep at least the basic tracking in a safe zone where you're sure things won't break.
Send debug info to GA about JavaScript code that might break GA, for example as sketched below. Check this.
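One way to approach that last point, as a rough sketch using the async _gaq queue (the category/action labels and the wrapped function are made-up examples):

// Wrap fragile custom tracking code so a JS error doesn't silently kill all
// tracking, and report the failure to GA itself as a (non-interaction) event.
try {
    riskyCustomTracking(); // hypothetical: your custom vars, ecommerce calls, etc.
} catch (e) {
    _gaq.push(['_trackEvent', 'GA Debug', 'tracking error', String(e.message), 0, true]);
}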
I'm embedding a large array in <script> tags in my HTML, like this (nothing surprising):
<script>
var largeArray = [/* lots of stuff in here */];
</script>
In this particular example, the array has 210,000 elements. That's well below the theoretical maximum of 2^31 - by 4 orders of magnitude. Here's the fun part: if I save JS source for the array to a file, that file is >44 megabytes (46,573,399 bytes, to be exact).
If you want to see for yourself, you can download it from GitHub. (All the data in there is canned, so much of it is repeated. This will not be the case in production.)
Now, I'm really not concerned about serving that much data. My server gzips its responses, so it really doesn't take all that long to get the data over the wire. However, there is a really nasty tendency for the page, once loaded, to crash the browser. I'm not testing at all in IE (this is an internal tool). My primary targets are Chrome 8 and Firefox 3.6.
In Firefox, I can see a reasonably useful error in the console:
Error: script stack space quota is exhausted
In Chrome, I simply get the sad-tab page:
Cut to the chase, already
Is this really too much data for our modern, "high-performance" browsers to handle?
Is there anything I can do* to gracefully handle this much data?
Incidentally, I was able to get this to work (read: not crash the tab) on-and-off in Chrome. I really thought that Chrome, at least, was made of tougher stuff, but apparently I was wrong...
Edit 1
@Crayon: I wasn't looking to justify why I'd like to dump this much data into the browser at once. Short version: either I solve this one (admittedly not-that-easy) problem, or I have to solve a whole slew of other problems. I'm opting for the simpler approach for now.
@various: right now, I'm not especially looking for ways to actually reduce the number of elements in the array. I know I could implement Ajax paging or what-have-you, but that introduces its own set of problems for me in other regards.
@Phrogz: each element looks something like this:
{dateTime:new Date(1296176400000),
terminalId:'terminal999',
'General___BuildVersion':'10.05a_V110119_Beta',
'SSM___ExtId':26680,
'MD_CDMA_NETLOADER_NO_BCAST___Valid':'false',
'MD_CDMA_NETLOADER_NO_BCAST___PngAttempt':0}
@Will: but I have a computer with a 4-core processor, 6 gigabytes of RAM, over half a terabyte of disk space ...and I'm not even asking for the browser to do this quickly - I'm just asking for it to work at all! ☹
Edit 2
Mission accomplished!
With the spot-on suggestions from Juan as well as Guffa, I was able to get this to work! It would appear that the problem was just in parsing the source code, not actually working with it in memory.
To summarize the comment quagmire on Juan's answer: I had to split up my big array into a series of smaller ones, and then Array#concat() them, but that wasn't enough. I also had to put them into separate var statements. Like this:
var arr0 = [...];
var arr1 = [...];
var arr2 = [...];
/* ... */
var bigArray = arr0.concat(arr1, arr2, ...);
To everyone who contributed to solving this: thank you. The first round is on me!
*other than the obvious: sending less data to the browser
Here's what I would try: you said it's a 44MB file. That surely takes more than 44MB of memory once parsed - I'm guessing much more, maybe half a gig. Could you just cut down the data until the browser doesn't crash, and see how much memory the browser uses?
Even apps that run only on the server would be well served to not read a 44MB file and keep it in memory. Having said all that, I believe the browser should be able to handle it, so let me run some tests.
(Using Windows 7, 4GB of memory)
First Test
I cut the array in half and there were no problems, uses 80MB, no crash
Second Test
I split the array into two separate arrays that together still contain all the data; it uses 160MB, no crash.
Third Test
Since Firefox said it ran out of stack, the problem is probably that it can't parse the array all at once. I created two separate arrays, arr1 and arr2, then did arr3 = arr1.concat(arr2); it ran fine and uses only slightly more memory, around 165MB.
Fourth Test
I am creating 7 of those arrays (22MB each) and concatenating them to test browser limits. It takes about 10 seconds for the page to finish loading. Memory goes up to 1.3GB, then it goes back down to 500MB. So yes, Chrome can handle it. It just can't parse it all at once, because it uses some kind of recursion, as can be noticed from the console's error message.
Answer
Create separate arrays (less than 20MB each) and then concat them. Each array should be on its own var statement, instead of doing multiple declarations with a single var.
I would still consider fetching only the necessary part; this much data may make the browser sluggish. However, if it's an internal task, this should be fine.
Last point: You're not at maximum memory levels, just max parsing levels.
Yes, it's too much to ask of a browser.
That amount of data would be manageable if it already was data, but it isn't data yet. Consider that the browser has to parse that huge block of source code while checking that the syntax adds up for all of it. Once parsed into valid code, the code has to run to produce the actual array.
So, all of the data will exist in (at least) two or three versions at once, each with a certain amount of overhead. As the array literal is a single statement, each step will have to include all of the data.
Dividing the data into several smaller arrays would possibly make it easier on the browser.
Do you really need all the data? Can't you stream just the data currently needed using AJAX? Similar to Google Maps: you can't fit all the map data into the browser's memory, so they display just the part you are currently seeing.
Remember that 40 megs of hard data can be inflated to much more in the browser's internal representation. For example, the JS interpreter may use a hashtable to implement the array, which would add additional memory overhead. Also, I expect that the browser stores both the source code and the in-memory JS objects; that alone doubles the amount of data.
JS is designed to provide client-side UI interaction, not handle loads of data.
EDIT:
Btw, do you really think users will like downloading 40 megabytes worth of code? There are still many users with less than broadband internet access. And execution of the script will be suspended until all the data is downloaded.
EDIT2:
I had a look at the data. That array will definitely be represented as hashtable. Also many of the items are objects, which will require reference tracking...that is additional memory.
I guess the performance would be better if it were a simple vector of primitive data.
EDIT3: The data could certainly be simplified. The bulk of it is repeating strings, which could be encoded in some way as integers or something. Also, my Opera is having trouble just displaying the text, let alone interpreting it.
EDIT4: Forget the DateTime objects! Use Unix epoch timestamps or strings, but not objects!
EDIT5: Your processor doesn't matter because JS is single-threaded. And your RAM doesn't matter either: most browsers are 32-bit, so they can't use much of that memory.
EDIT6: Try changing the array indices to sequential integers (0, 1, 2, 3...). That might make the browser use a more efficient array data structure. You can use constants to access the array items efficiently. This is going to cut down the array size by a huge chunk.
Try retrieving the data with Ajax as a JSON page. I don't know the exact size, but I've been able to pull large amounts of data into Google Chrome that way.
Use lazy loading. Have pointers to the data and get it when the user asks.
This technique is used in various places to manage millions of records of data.
[Edit]
I found what I was looking for. Virtual scrolling in the jqgrid. That's 500k records being lazy loaded.
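A bare-bones sketch of the lazy-loading idea above, fetching one page of rows at a time when the user scrolls near the bottom (the /data endpoint, page size, and rendering helper are assumptions):

// Load rows in pages instead of embedding them all in the document.
var nextPage = 0;
var loading = false;

$(window).scroll(function () {
    var nearBottom = $(window).scrollTop() + $(window).height() >= $(document).height() - 200;
    if (nearBottom && !loading) {
        loading = true;
        $.getJSON('/data', { page: nextPage, size: 200 }, function (rows) {
            appendRowsToTable(rows); // hypothetical rendering helper
            nextPage++;
            loading = false;
        });
    }
});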
I would try having it as one big string with a separator between each "item", then use split, something like:
var largeString = "item1,item2,.......";
var largeArray = largeString.split(",");
Hopefully the string won't exhaust the stack as fast.
Edit: in order to test it, I created a dummy array with 200,000 simple items (each item a single number) and Chrome loaded it in an instant. 2,000,000 items? A couple of seconds, but no crash. A 6,000,000-item array (a 50MB file) made Chrome load for about 10 seconds, but still no crash either way.
So this leads me to believe the problem is not with the array itself but rather with its contents. Optimize the contents to simple items, then parse them "on the fly", and it should work.
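A small sketch of that "parse on the fly" idea: ship the rows as one delimited string and only split a row into fields when it's actually needed (the delimiters and field names are arbitrary):

// Rows separated by '\n', fields by '\t'. Only the requested row becomes an object.
var largeString = "terminal001\t10.05a\t0\nterminal002\t10.05a\t1"; // placeholder data
var rows = largeString.split("\n"); // cheap: an array of strings, no objects yet

function getRow(i) {
    var fields = rows[i].split("\t");
    return {
        terminalId: fields[0],
        buildVersion: fields[1],
        pngAttempt: parseInt(fields[2], 10)
    };
}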