I'm building a custom admin dashboard for users on our site who create posts. I want to show them the number of active visitors on their posts only (not on the entire site).
I want it to act exactly like GA does.
I was originally thinking of building this from scratch, but in retrospect it might be easier to use the GA API?
I've stared at the docs forever and I'm just not grokking it, so I'm coming here for help.
We have ~5,000 posts total, and some people on our site have authored over 1,000 posts, so the 'input' to GA will be anywhere from 1 to 1,000+ slugs (for only their posts).
I want a combined amount of on-site traffic for their posts only.
Optionally, maybe it would have to be reversed... I'm not sure if GA can show it, but even better would probably be to get a content breakdown of the realtime visitors from the API, with 5,000 max results. From there I can filter through the result-set slugs (along with the number of users on each), compare them against the slugs belonging to that user, and just sum the totals on my end.
Is this something the Google API could help me with? Which API endpoint would I need to use? Is it possible to get 5,000+ max results for URLs with traffic on them from the API?
Thanks!
Yes, it is possible.
It seems that you should use the Real Time Data: get endpoint.
Additionally, to limit results to specific pages (posts) only, you should use dimension filters (filters which select only specific page views before calculating the aggregated result), and 'ga:pagePath' looks like the one you need:
ga:pagePath
UI Name: Page
A page on your website specified by path and/or query parameters. Use in conjunction with hostname to get the full URL of the page.
Source
You might prefer using ga:pageTitle instead if posts by a single author share a similar title and there are no common path elements in that author's post URLs.
So you do something like:
GET https://www.googleapis.com/analytics/v3/data/realtime
ids=ga:<your_analytics_id>
metrics=rt:activeUsers
dimensions=rt:pagePath
filters=rt:pagePath=~^/authors/123/
Please notice that there may be slight differences between the realtime and non-realtime APIs (e.g. the use of 'rt' instead of 'ga' above), and the realtime API is generally still in beta.
Generally speaking, you should go here: Real Time Reporting API - Developer Guide and look through the links in the table of contents (left part of the page).
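For example, here is a rough sketch of issuing that request from JavaScript and summing the active users on your side. This assumes you already have an OAuth 2.0 access token in an accessToken variable; the view ID and the author filter path are placeholders:

// Placeholder view ID and filter; adjust to your property and URL structure.
var params = {
  "ids": "ga:XXXXXXXX",
  "metrics": "rt:activeUsers",
  "dimensions": "rt:pagePath",
  "filters": "rt:pagePath=~^/authors/123/",
  "max-results": 10000 // check the docs for the current realtime limit
};
var query = Object.keys(params).map(function (k) {
  return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
}).join("&");

fetch("https://www.googleapis.com/analytics/v3/data/realtime?" + query, {
  headers: { "Authorization": "Bearer " + accessToken }
})
  .then(function (res) { return res.json(); })
  .then(function (data) {
    // data.rows is an array of [pagePath, activeUsers] string pairs.
    var total = (data.rows || []).reduce(function (sum, row) {
      return sum + parseInt(row[1], 10);
    }, 0);
    console.log("Active users on this author's posts:", total);
  });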
As for the 'building from scratch' idea: it's rather simple from the developer's perspective, but it could be complex from the dev-ops perspective. That is, it's not a problem to write code which aggregates such metrics, but it could be a problem to build a system that can sustain the volume of requests per second required for that task.
I think you will want to apply your second approach: pull down realtime visitors for all slugs and then aggregate by author on your own server.
There is a quota of 10,000 queries per profile per day. Using your first approach, it sounds like you would be performing a query for each author. Say you have 50 authors. This would leave you only 200 queries/day/author (10,000/50). Factoring in the time dimension, this would allow only 8.33 (200/24) queries per hour for each author. Not very "realtime-like".
If you have problems getting it going, check out http://www.embeddedanalytics.com - we have done many implementations such as this. In fact, we even have a "Right Now" realtime widget.
Is there a way to determine the author based on the slug title?
Related
Relatively new to databases (and DBA work) here.
I've recently been looking into Riot Games' APIs; however, now realising that you're limited to 10 calls per 10 seconds, I need to change my front-end code, which originally just loaded all the information with lots and lots of API calls, into something that uses a MySQL database.
I would like to collect ranked data about each player and list them (30+ players) in an ordered ranking. I was thinking, as mentioned on their Rate Limiting page, of "caching" data when GET-ing it, and then, when needing that information again, checking if it is still relevant: if so, use it; if not, re-GET it.
My idea is to add a time 30 minutes in the future (the rough length of a game) to a column in the table, and when calling, check whether the server time is past the saved time. Is this the right approach/idea of caching? If not, what is the best practice?
Either way, this doesn't solve the problem of loading 30+ values for the first time, when no previous calls have been made to cache.
Any advice would be welcome, even advice telling me I'm doing completely the wrong thing!
If there is more information needed I can edit it in, let me know.
tl;dr What's best practice to get around Rate-Limiting?
Generally yes: most large applications simply use guesstimated rate limits or a manual cache (check the DB for a recent call, then go to the API if it's an old call).
When you use large sites like op.gg or LolKing for summoner lookups, they all give you a "must wait X minutes before doing another DB check/call" message; I do this too. So yes, using an estimated window (like a game length) to handle your rate limit is definitely a common practice that I have observed within the Riot developer community. Some people do go all out and implement actual caching layers/frameworks, but you don't need to do that for smaller applications.
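As a rough sketch of that manual cache check (server-side JavaScript; the column names and the fetchFromRiotApi/saveToDatabase helpers are just placeholders for whatever you build):

// cachedRow is what you read from your MySQL table for this summoner,
// e.g. { data: {...parsed ranked stats...}, fetched_at: <ms timestamp> }
function getRankedStats(summonerId, cachedRow, callback) {
  var THIRTY_MINUTES = 30 * 60 * 1000; // roughly one game length

  if (cachedRow && (Date.now() - cachedRow.fetched_at) < THIRTY_MINUTES) {
    // Cache is still fresh enough: serve it without touching the API.
    return callback(null, cachedRow.data);
  }

  // Stale or missing: hit the Riot API (placeholder helper), then store
  // the result with the current timestamp so the next call can reuse it.
  fetchFromRiotApi(summonerId, function (err, stats) {
    if (err) { return callback(err); }
    saveToDatabase(summonerId, stats, Date.now()); // placeholder helper
    callback(null, stats);
  });
}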
I recommend building up your app's main functionality first, submit it, and get it approved for a higher rate limit as well. :)
Also, you mentioned adjusting your front-end code for calls; make sure your API calls are made in server-side code, for security reasons.
This question has been driving me crazy and I can't get my head around it. I come from a MySQL relational background and have been using Meteorjs and Mongo. For the purposes of this question take the example of posts and authors. One Author to Many Posts. I have come up with two ways in which to do this:
Have a single collection of posts - each post has the author information embedded in the document. This of course leads to denormalization and issues such as: if the author's name changes, how do you keep the data correct?
Have two collections: posts and authors - Each post has an author ID which references the authors collection. I then attempt to do a "join" on a non relational database while trying to maintain reactivity.
It seems to me that with MongoDB a degree of denormalization is acceptable, and I am tempted to embed, as implementing joins really does feel like going against the ideals of Mongo.
Can anyone shed any light on what is the right approach especially in terms of wanting my app data to scale well and be manageable?
Thanks
Denormalisation is useful when you're scaling your application and you notice that some queries are taking too much time to complete. I've also noticed that most MongoDB developers tend to forget about data normalisation, but that's another topic.
Some developers say things like: "Don't use observe and observeChanges because it's slow." But we're building real-time applications, so that's a normal thing to happen; it's a CPU-intensive app design.
In my opinion, you should always aim for a normalised database design first, and then decide, try and test which fields, if duplicated/denormalised, could improve your app's performance. Example: you remove one query per user, or the UI needs an extra field and it's faster to duplicate it, etc.
With denormalisation you have an extra price to pay: you have to keep the denormalised fields up to date with the main collection.
Example:
Let's say that you have Authors and Articles collections, and each article stores the author's name. The author might change his name. In the normalised scenario, this works fine. In the denormalised scenario you have to update the Author document with the new name AND every single article owned by this author.
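In code, that fan-out looks roughly like this (a minimal sketch assuming Meteor/Mongo collections named Authors and Articles, with authorId and authorName fields duplicated on each article):

// Normalised: only the author document changes.
Authors.update(authorId, { $set: { name: newName } });

// Denormalised: the author document AND every article that duplicates the name.
Authors.update(authorId, { $set: { name: newName } });
Articles.update(
  { authorId: authorId },             // every article owned by this author
  { $set: { authorName: newName } },  // keep the duplicated field in sync
  { multi: true }                     // update all matching documents
);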
Keeping a normalised design makes your life easier, but denormalisation eventually becomes necessary.
From a MeteorJS perspective: with the normalised scenario you're sending data from two collections to the client; with the denormalised scenario, you only send one collection. You can also reactively join on the server and send one collection to the client, although that increases RAM usage because of MergeBox on the server.
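For the normalised scenario, a minimal Meteor sketch (the collection, publication and template names are just placeholders) could look like:

// Server: publish both collections so the client can join them.
Meteor.publish("postsWithAuthors", function () {
  return [Posts.find(), Authors.find()];
});

// Client: "join" reactively in a template helper.
Template.post.helpers({
  author: function () {
    // 'this' is the post data context; look up its author by id.
    return Authors.findOne(this.authorId);
  }
});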
Denormalisation is something that is very specific to your application's needs. You can use Kadira to find ways of making your application faster; the database design is only one of many factors you can play with when trying to improve performance.
We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. Generally on Chrome it's somewhere between 8 and 13MB worth of entities (as measured by the HTTP response headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move that by more than 10%. The error I get is that the Chrome tab crashes and tells me to reload. The error is reproducible: I download the data in 100K chunks, it fails on the same read every time, and it works fine if I stop after the previous read. When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or Windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250M memory usage to 1.7G of memory usage during the caching process and caches a total of about 14MB before throwing an out-of-memory error.
Chrome goes from 206M memory usage to about 850M while caching a total of around 9MB.
Firefox goes from around 400M to about 750M and manages to cache about 5MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this? I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED At Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page, runs quickly and downloads minimal data:
var query = breeze.EntityQuery
.from("Countries")
.orderBy("Name")
.expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has selected the Sites s/he wishes to cache, the following two queries are kicked off (this used to be one query, but I broke it into two hoping it would be less of a burden on resources -- it didn't help). The first query (usually 2-3K entities and about 2MB) runs as expected. Some combination of the predicates listed is used to filter the data.
var qry = breeze.EntityQuery
.from("SeasonClients")
.expand("Client,Group.Site,Season,VSeasonClientCredit")
.orderBy("DistrictId,SeasonId,GroupId,ClientId")
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
After the first query runs, the second query (below) runs, also using some combination of the listed predicates to filter the data. At about 9MB, it will have about 50K rows to download. When the total download burden between the two queries is between 10MB and 13MB, browsers will crash.
var qry = breeze.EntityQuery
.from("Repayments")
.orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.
Here are some suggestions based on my experience building an offline-capable web application using Breeze. Some or all of these might not make sense for your use cases...
1. Identify which entity types need to be editable vs. which are only used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache it in localStorage yourself using JSON.stringify (see the sketch after this list). This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
2. If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
3. In conjunction with suggestion #2, avoid loading all the editable data at once and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
4. Cache the Breeze metadata once. When calling exportEntities, the includeMetadata argument should be false. More info on this here.
5. To create new EntityManager instances, make use of the createEmptyCopy method.
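A rough sketch of suggestions 1, 4 and 5 together, assuming manager is your existing EntityManager (the entity names come from your model; exact exportMetadata/exportEntities arguments may vary by Breeze version):

// Suggestion 1: load reference data without change tracking and stash it yourself.
var lookupQuery = breeze.EntityQuery
  .from("Countries")
  .expand("Regions.Districts.Sites")
  .noTracking(true);

manager.executeQuery(lookupQuery).then(function (data) {
  // noTracking results are plain objects, so they are cheap to stringify.
  localStorage.setItem("lookups", JSON.stringify(data.results));
});

// Suggestions 4 and 5: cache the metadata once and create lightweight
// managers that share it instead of re-exporting it with every bundle.
localStorage.setItem("metadata", manager.metadataStore.exportMetadata());
var offlineManager = manager.createEmptyCopy(); // same metadata, no entities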
EDIT:
I want to respond to this comment:
"Say I have a client who has bills and payments. That client is in a group, in a site, in a region, in a country. Are you saying that the client, payment, and bill information might each have their own EM, while the location hierarchy might be in a 4th EM with no-tracking? Then when I refer to them, I wire up the relationships as needed using LINQs on the different EMs (give me all the bills for customer A, give me all the payments for customer A)?"
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill; it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information, but not expanding their group, site, region or country) and cache each client+payments+bills set using entityManager.exportEntities. The reasoning here is that it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.
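For example (a sketch only; the entity and navigation property names here are guesses based on your description, not your actual model, and clientId is the id of the client being taken offline):

// One small EntityManager per client the user takes offline.
var clientManager = manager.createEmptyCopy();

var clientQuery = breeze.EntityQuery
  .from("Clients")
  .where("ClientId", "==", clientId)
  .expand("Payments, Bills"); // only the children that must be editable offline

clientManager.executeQuery(clientQuery).then(function () {
  // Export just this client's graph; false => don't repeat the metadata.
  var bundle = clientManager.exportEntities(null, false);
  localStorage.setItem("client-" + clientId, bundle);
});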
@Jeremy's answer was excellent and very helpful, but it didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However, @Steve in the comments gave me the most appropriate information for this question.
It is neither Breeze nor the browser, but rather Knockout. Apparently the Knockout wrapper around the Breeze entities uses all that memory (at least while loading the entities, and in my environment). As described above, Knockout/Breeze would crap out after reading around 5MB of data, causing Chrome to crash with over 1.7GB of memory usage (from a pre-download memory usage of around 300MB). Rewriting the app in AngularJS eliminated the problem. So far I have been able to download over 50MB from the exact same EF6 model into Breeze/Angular, and total Chrome memory usage never went above 625MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.
I'm interested in how Google Docs stores documents on the server side, because I need to create a similar application.
Does it use pure RTF/ODF files or its own database?
How do they implement versioning and the undo/redo feature?
If anybody has knowledge about this, please share it with me.
To answer your question specifically as to how Google Docs works: they use a technology called Operational Transformation.
You may be able to use one of operational transformation engines listed on: https://en.wikipedia.org/wiki/Operational_transform#OT_software
The basic idea is that every operation has a context, e.g. "delete the fourth word in the fifth paragraph" or "add an input box after the button". The clients all send each other operations through the server. The clients and the server each keep their own version of the document and apply operations as they come.
When operations have overlapping contexts, there are a bunch of rules that kick in to resolve conflicts. Like you can't modify something that's been deleted, so the delete must come last in a sequence of concurrent operations on that context.
It's possible that the various clients and server will get out of sync, so you need a secondary algorithm to maintain consistency. One way would be to reload the data from the server whenever a conflict is detected.
--This is an answer I got from a professor when I asked the same thing a couple of years ago.
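To make the transformation idea concrete, here is a toy transform for two concurrent text-insert operations (just an illustration; a real OT engine handles many more cases, such as deletes and overlapping ranges):

// Each op: { pos: index in the document, text: string, site: client id }
function transformInsert(a, b) {
  // Shift 'a' right if the concurrent insert 'b' lands before it, using the
  // site id as a tie-breaker so both sides resolve equal positions the same way.
  if (b.pos < a.pos || (b.pos === a.pos && b.site < a.site)) {
    return { pos: a.pos + b.text.length, text: a.text, site: a.site };
  }
  return a;
}

// Client 1 applies its own op A, then applies transformInsert(B, A) for the
// remote op B; Client 2 applies B, then transformInsert(A, B). Both converge.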
You should use a database. Perhaps a table storing each document revision. First, find a way to determine whether an update is significant or not. You can store minor changes client side for redo/undo, and then, either periodically or per some condition (e.g., user hits save), create a database entry per revision (you can store things like bytes changed, bytes added, bytes deleted, etc.).
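For example, a revision record could look something like this (the field names are purely illustrative):

// One record per significant save; persist it with whatever store you choose.
var revision = {
  documentId: 42,           // which document this revision belongs to
  revisionNumber: 17,       // monotonically increasing per document
  authorId: 7,              // who made the change
  createdAt: new Date(),    // when the revision was saved
  bytesAdded: 120,
  bytesDeleted: 35,
  content: "full text, or a diff against the previous revision"
};
// Rebuilding any past version is then a matter of fetching (or replaying)
// the revisions for that document in order.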
Take a look at MediaWiki, which is open source, and essentially does what you're asking (i.e., take a look at their tables and code).
RTF/ODF would typically be generated, and served, when a user requests exporting the document.
Possibly, you should consider utilizing Google Drive's public API. See link for details.
I have tried every possible way to understand how to do this.
I have a site which accepts only registered users. When a user registers, we get his/her Department, Role, Name and Region. What I am hoping is that when this user visits the site, Omniture shows all of his/her attributes (name, role, department, region) along with generic variables like page views, country, etc.
The report I mentioned in the previous paragraph is needed for the whole site + individual sections of the site. I have three sections in the site (Newsletter, Events and About). I am hoping to get individual reporting for all these three sections + global site reports.
Update 1
I am extremely indebted to Chris de Groot and Crayon Violent for their help. I have summarized my problem and thought maybe I should post an update. The problem set is actually smaller now:
The things I have to achieve are:
Get Normal Traffic Variables for the Newsletter Section of the site (Page Views, Visits and Unique Visitors in a meaningful way)
See all registered users of my site who visited the Newsletter and categorize them based on their Role, Region
I have all the attributes of the user (Role, Department, Region, etc.) stored in the DB. Here is what I have done so far.
On the Newsletter Page, I used the following code:
/* Traffic Variables */
s.pageName = document.title.split(" | ")[0];
s.server = "";
s.channel = "Newsletter";
s.pageType = "";
s.prop1 = "Newsletter Issue of December";

/* Conversion Variables */
s.eVar1 = "Newsletter Issue of December";
s.eVar2 = "Name of the User";
s.eVar3 = "Role of the User";
s.eVar4 = "Region of the User";
s.eVar5 = "";
s.events = "event2";
My questions are:
Is this the right approach for the things I need?
How do I make sure that the reports in SiteCatalyst are formatted/laid out the way I want them?
The way I think of this is to start with the basics. You can then build many solid reports on top of that.
You need to think about what you want to track and what kind of data each item is.
If it is a number, it is 95% of the time going to be a metric, and thus you need to use events. For example, "number of registrations".
If you then want to break that number down, you need dimensions. A dimension is an eVar.
So say you wanted to break down registrations by department: you define both variables in your Report Suite Manager (Success Events and Conversion Variables). Then, in your web site tagging, when a visitor arrives you immediately set the department eVar to the department name (it stays set across pages/visits etc., depending on how you configured it in the Report Suite Manager), and after however many pages, when the user completes registration, you set the event for registration.
e.g
s.eVar6 = "HR"; // set HR in the department Dimension
/** later **/
s.events = "event3"; // trigger the event for registration.
// This increments the Metric and it is allocated to
// eVar6 and any other currently active eVars.
When Adobe Analytics has done its processing you can cut this data many ways. You can, for example, view a "Custom Conversion" report by department (rows) with the counts of registrations (columns).
There is a lot to Analytics; it is an awesome tool and takes years to fully learn, but if you nail the relationship between metrics and dimensions you will be able to do a lot very quickly.
Data captured this way is in the right format to support many of the reports you will want.
PS: The other kind of number in Analytics is an instance, which is really a count of how often an eVar was set. If you also want to run pathing on that data, you would use a prop variable (also known as a traffic variable). Props are good for simple counts and path analysis.