Can anyone share best practices for troubleshooting Google Analytics code?
Has anyone built a debugging tool? Does google have a linter hidden somewhere? Does anybody have a good triage logic diagram?
I'll periodically set up different parts of GA and it seems like every time I do it takes 4 or 5 days to get it working.
The workflow looks like this:
Read the docs on the feature (e.g. events, custom variables).
Implement what appears to be the correct code based on the docs.
Wait a day.
See no data.
Google every version of the problem I can imagine. Find what may be a solution.
Change my code.
Wait a day.
See no data.
Loop:
Randomly move elements of the tracking code around.
Wait a day.
If other parts break, tell the CEO, get yelled at, revert changes.
If data appears, break.
Pray it continues to work/I never have to change the tracking code again.
For obvious reasons, I'm not satisfied with this workflow and hoping someone has figured out something I haven't.
Everything I do when debugging GA code starts and ends with the Google Analytics Debugger Chrome Extension. It prints to the console a summary of the data it has sent to Google Analytics which, for all purposes except testing profile filters, is all you need. It'll eliminate the "wait a day" step.
If you're not a fan of Google Chrome, you can inspect the HTTP requests yourself to see what data is being sent. You can use this guide to figure out what each parameter in the URL represents.
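If you go that route, a small helper like the sketch below can make a captured __utm.gif request easier to read. The sample URL is made up, and the parameter names (utmac, utmp, utmt, utme, utmcc) are the legacy __utm.gif fields as I understand them, so treat this as a rough aid rather than a reference.

// Rough sketch: pull the interesting parameters out of a captured __utm.gif URL.
// The URL below is a made-up example; paste in one captured from your own traffic.
const sample = 'http://www.google-analytics.com/__utm.gif?utmwv=5.4.3&utmac=UA-XXXXX-X&utmp=%2Fpricing';

const params = new URL(sample).searchParams;
const interesting = ['utmac', 'utmp', 'utmt', 'utme', 'utmcc'];

for (const key of interesting) {
  // account, page path, hit type, custom data, cookie values
  console.log(key, '=', params.get(key) || '(not set)');
}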
In terms of ensuring the features I've installed, or the code itself, are working, I'll open a fresh browser (cleared of cookies) and navigate to the site I'm testing via a Google search. I'll then navigate to all of the pertinent pages and trigger all the pertinent events, all the while ensuring that the requests are being sent to Google and that the session isn't broken at any point (by either keeping an eye on the session count, or ensuring that the traffic source doesn't change from organic/google to direct or a self-referral).
To begin with, this answer isn't at odds with any portion of either of the two answers before mine--i.e. you could certainly implement them all without conflict.
My answer just reflects my own priority, which is the latency issue. Latency makes debugging far more difficult than it should be. Ten minutes of latency while waiting for the compiler to finish is irritating; four hours (minimum GA latency) is painful.
So for me, the first step in building a GA debugging framework was to somehow get the GA results in real time--in other words, if I changed a regular expression filter, I needed to catch the traffic processed by that filter. So removing the 4-24 hour latency in getting results from the GA server was critical.
The easiest way I have found so far to do this is to modify the GA tracking code on each page of your site so that it sends a copy of each GIF request to your own server.
To do this, immediately before the call to trackPageview(), add this line:
pageTracker._setLocalRemoteServerMode();
This will send the entire request header to your server access log, which you can parse in real time. (Specifically, your server writes to the access log one line at a time--one line corresponds to one request. All of the GA data is packaged and sent as a request header, so there's a perfect correspondence between the two.)
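In context, assuming the traditional (non-async) ga.js snippet, the tracking block on each page would look roughly like this:

// Traditional ga.js tracker setup (rough sketch; assumes ga.js itself is already loaded on the page).
var pageTracker = _gat._getTracker("UA-XXXXX-X");  // your account number
pageTracker._setLocalRemoteServerMode();           // send each __utm.gif hit to your own server as well as Google's
pageTracker._trackPageview();                      // the mode must be set before this call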
yahelc's answer is great, but I'd like to add my 2c here.
Get yourself a nice sniffer to see the hits flowing.
Nice options:
Wasp
Charles
HTTPFox
Fiddler
Then implement your changes on QA.
Test this new setup on QA. Things you should keep an eye on:
Always make sure that the basic pageview fires. It should have at least an utmp value and no utmt set.
Make sure the visitor ID doesn't get overwritten. This is the second number in the __utma cookie and effectively serves as the user ID; if it changes, then things are broken.
Make sure your pageviews contain the page- and session-level custom variables you set, if you set any. They are encoded in the utme parameter.
Make sure that any Visitor custom var is fired before your basic pageview. utmt=custom variable
Make sure the source data is not overwritten (campaign/medium/source/content/keyword). These are set in the __utmz cookie. If it gets overwritten by direct or a referral from your own site, there's something wrong.
If you miss an event, it may be because a required field is missing or the last value is a float or a string. The value of an event must be an integer (see the sketch after this list).
If you're using ecommerce tracking, double-check all your parameters. Make sure that you're passing everything as strings here and that unused parameters are empty strings.
Triple-check your account number (UA-XXXXX-X).
If you're doing something with custom JS, make sure to test on all browsers, and try to keep at least the basic tracking in a safe zone where you are sure things won't break.
Send debug info about javascript code that might break GA to GA. Check this.
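To illustrate those last few points, here's a rough sketch in the traditional ga.js syntax; the category/action/label names are made-up examples, and pageTracker is assumed to be the tracker created earlier on the page.

// Rough sketch (traditional ga.js syntax): keep custom tracking code wrapped so a bug in it
// can't break the rest of the page, and report failures back to GA.
try {
  var secondsWatched = 41.7;                                   // pretend this came from your own logic
  pageTracker._trackEvent('Video', 'Play', 'Homepage teaser',
                          Math.round(secondsWatched));          // the event value must be an integer
} catch (e) {
  // Surface the error in GA itself so broken tracking code shows up in your reports.
  pageTracker._trackEvent('Debug', 'Tracking error', String(e.message || e));
}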
Relatively new to databases (and DBA work) here.
I've recently been looking into Riot Games' APIs; however, now realising that you're limited to 10 calls per 10 seconds, I need to change my front-end code, which originally just loaded all the information with lots and lots of API calls, into something that uses a MySQL database.
I would like to collect ranked data about each player and list them (30+ players) in an ordered list by ranking. I was thinking, as mentioned in their Rate Limiting page, of "caching" data when GET-ing it, and then, when needing that information again, checking if it is still relevant - if so use it, if not re-GET it.
My idea is to add a time 30 minutes in the future (the rough length of a game) to a column in the table, and when calling, check whether the server time is ahead of the saved time. Is this the right approach/idea of caching? If not, what is the best practice?
Either way, this doesn't solve the problem of loading 30+ values for the first time, when no previous calls have been made to cache.
Any advice would be welcome, even advice telling me I'm doing completely the wrong thing!
If there is more information needed I can edit it in, let me know.
tl;dr What's best practice to get around Rate-Limiting?
Generally yes; most of the large applications simply use guesstimated rate limits or a manual cache (check the DB for a recent call, then go to the API if it's an old call).
When you use large sites like op.gg or lolKing for summoner lookups, they all give you a "Must wait X minutes before doing another DB check/call" message; I do this too. So yes, using an estimated number (like a game length) to handle your rate limit is definitely a common practice that I have observed within the Riot Developer community. Some people do go all out and implement real caching layers/frameworks, but you don't need to do that with smaller applications.
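To make that concrete, here's a rough sketch of the "check the DB first, fall back to the API" idea. It's written as Node-style pseudocode; db.query and fetchRankedFromRiot are stand-ins for whatever database layer and API wrapper you actually use, and the table/column names are just illustrative.

// Rough sketch: serve cached ranked data if it's newer than ~30 minutes, otherwise re-fetch.
const CACHE_TTL_MS = 30 * 60 * 1000; // roughly one game length

async function getRankedData(summonerId) {
  const rows = await db.query(
    'SELECT data, fetched_at FROM ranked_cache WHERE summoner_id = ?', [summonerId]);

  if (rows.length && Date.now() - new Date(rows[0].fetched_at).getTime() < CACHE_TTL_MS) {
    return JSON.parse(rows[0].data);                    // cache is still fresh, no API call needed
  }

  const fresh = await fetchRankedFromRiot(summonerId);  // this call counts against the rate limit
  await db.query(
    'REPLACE INTO ranked_cache (summoner_id, data, fetched_at) VALUES (?, ?, NOW())',
    [summonerId, JSON.stringify(fresh)]);
  return fresh;
}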
I recommend building up your app's main functionality first, submit it, and get it approved for a higher rate limit as well. :)
Also, you mentioned adjusting your front-end code for the calls; make sure your API calls are in server-side code, for security reasons.
I don't know if this is possible or even if it exists, but I'm very curious to find out. I don't want to give the wrong ideas, and describing what I'm trying to achieve before giving out examples definitely will - so I'll just dive right into it.
As far as I know, code on websites is only ever executed when those websites are accessed by someone. If no one accesses those websites, the code just sits there. The code has no reason to run if no one's using it, right?
Now what I'm going to propose may sound ridiculous, but please hear me out. I don't know if there is a way to do this, so I'm just going to ask. Is there a way to run that code without someone accessing the website itself?
Now I know some of you are like, "Huh? What is he talking about? Why would you even want to run the code if no one is on the website? That literally makes no sense," so I'm going to try and justify why I want something like that to be possible.
For example, if you want to create a script for automatically logging out a user if they've stayed inactive for a certain amount of time, you would need to check whether they've been active in the last (amount of time to wait before logging them out). You can use AJAX to check if they've been active in that window. If they navigate, or refresh the page, it'll reset the counter and let you know that they've been active. However, if they do nothing in (amount of time to wait before logging them out), they will be automatically logged out.
If they close their browser, or close the tabs that monitor their activity using AJAX, it will no longer be monitored, their counter will not be updated, and thus you will have no idea whether they've been active or not. You can't just log them out if they close a tab or a browser, because what if they have multiple tabs or browsers of your website open? Then you would only want to log them out when they've closed all of them.
I have other examples, but this is the gist of it. Is there a way to execute code on a website without the website being accessed by a user? Thank you.
You are looking for cron jobs. They are basically scheduled jobs that run at set times. A cron job can run all kinds of scripts including PHP scripts.
Whether such a script can easily clear expired sessions, I don't know. It will probably depend on the way you store the sessions.
It may be just as easy to implement it in the website. If you store the last activity timestamp of a user, you can just check on a new request whether that timestamp is too old, and if so, delete the session and redirect to the login page. That way, the user officially remains logged in until their next request.
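As a rough sketch of that "check on each request" idea (shown here as Node/Express-style middleware with express-session, purely for illustration; the same check works against $_SESSION in PHP):

// Rough sketch: enforce an idle timeout on every request (assumes express-session or a
// similar session store; the PHP equivalent checks the same timestamp in $_SESSION).
const MAX_IDLE_MS = 30 * 60 * 1000;  // how long a user may stay inactive

function enforceIdleTimeout(req, res, next) {
  const last = req.session.lastActivity || 0;

  if (last && Date.now() - last > MAX_IDLE_MS) {
    // Too long since the last request: destroy the session and send them to the login page.
    req.session.destroy(() => res.redirect('/login'));
    return;
  }

  req.session.lastActivity = Date.now();  // refresh the timestamp on every request
  next();
}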
Optionally you may delete old sessions that are remembered by PHP. See related question: Cleanup PHP session files.
One approach is that you could run your PHP scripts on a timer using CRON jobs.
These jobs typically repeat every x hours, minutes, or days.
I'm not sure about the example you provided, though.
I've created a subscription-based system that deals with a large data-set. In its first iteration, it had semi-complicated joins that would execute, based on user-set filters, on every 'data view' page. Each query would fetch anywhere from a few kilobytes to several megabytes depending on the filter range. I decided this was unacceptable and so learned about APC (I had heard about its data-store features).
I moved all of the strings out of the queries into an APC preload routine that fires upon first login. In the same routine, I am running the "full set" join query to get all of the possible IDs for the data set into a $_SESSION variable. The entire set is anywhere from 100-800Kb, depending on what data the customer is subscribed to.
I convert this set into a JSON array and shuffle the data around dynamically when the user changes the filters. In creating the system I wanted it to seem as if the user was moving around lots of data very quickly, with minimal page loading (AJAX + APC when string representations are needed), as they played with the filters.
My multipart question is, is it possible for the user to effectively "cancel" the initial cache/query routine by surfing to another page after the first login? If so, can I move this process to an AJAX page for preloading, or does this carry the same problem? Or, am I just going about all of this in the wrong way? I came up with the idea on my own and I'm worried that I've created an unusable monster.
Also, I've been warned that my questions suck and I'm in danger of being banned. Every question I've asked has come from a position of intelligent wonder, written as well as I knew how at the time, and so it's really aggravating when an outsider votes me down without intelligent criticism. Just tell me what I did wrong and I will quickly fix the problem. Bichis.
I'm developing a Facemash-like rating system. Recently I decided to switch from simple voting links (href=vote.php?v=left or right, while the IDs were stored in $_SESSION) to AJAX. Then I realised that it's extremely vulnerable. In a browser console a cheater can run an infinite loop that checks the picture URL and then, if it matches some specified URL, votes for it, and if it doesn't match, votes for just left or right. Is there any way to prevent this besides obviously not implementing AJAX voting at all? Maybe there is some command to break the loop in the console or something?
This is just a specific example of a general principle: You can't trust anything sent to the server from the client. You must implement checks at the server end to rate-limit (or just limit), validate, etc., because people can send you any information they like (they don't even have to be using a web browser to do it).
In this case, if you're letting people vote on things, you have to limit how many times an individual can vote on a specific item (and probably rate-limit how often they can vote, period).
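As a rough illustration of what that server-side check might look like (Express-style pseudocode here; hasVoted and recordVote are stand-ins for whatever storage you use, and the same logic applies to a vote.php endpoint):

// Rough sketch of the server-side check. The client is never trusted to enforce any of this.
app.post('/vote', async (req, res) => {
  const userId = req.session.userId;            // identify the voter from the session, not the request body
  const photoId = parseInt(req.body.photoId, 10);

  if (!userId || !Number.isInteger(photoId)) {
    return res.status(400).end();               // reject malformed or anonymous requests
  }
  if (await hasVoted(userId, photoId)) {
    return res.status(409).end();               // this user already voted on this matchup
  }

  await recordVote(userId, photoId);            // persist the vote server-side
  res.json({ ok: true });
});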
It's a simple case of a JavaScript that continuously asks "are we there yet?", like a four-year-old on a car ride. But, much like parents, if you do this too often, or with too many kids at once, the server will buckle under the pressure.
How do you solve the issue of having a webpage that looks for new content in the order of every 5 seconds and that allows for a larger number of visitors?
Stack Overflow does it somehow; I don't know how, though.
The more standard way would indeed be the javascript that looks for new content every few seconds.
A more advanced way would use a push-like technique, by using Comet techniques (long-polling and such). There's a lot of interesting stuff under that link.
I'm still waiting for a good opportunity to use it myself...
Oh, and here's a link from stackoverflow about it:
Is there some way to PUSH data from web server to browser?
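For what it's worth, the client side of a long-poll loop looks roughly like the sketch below. The '/updates' endpoint, the since parameter, and render are made-up names, and it uses modern fetch/async syntax purely for brevity.

// Rough sketch of a long-poll loop: the server holds each request open until it has something
// new (or hits its own timeout), so the client isn't hammering it every few seconds.
async function pollForUpdates(since) {
  while (true) {
    try {
      const res = await fetch('/updates?since=' + encodeURIComponent(since));
      const data = await res.json();   // the server answers only when there is news, or on timeout
      if (data.items.length) {
        render(data.items);            // hypothetical rendering hook
        since = data.latest;           // remember where we got up to
      }
    } catch (e) {
      await new Promise(r => setTimeout(r, 5000));  // back off briefly on network errors
    }
  }
}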
In Java I used an Ajax library (DWR) that uses Comet technology - I think you should search for a PHP library that uses it.
The idea is that the server sends one very long HTTP response, and when it has something to send to the client it ends it and sends a new response with the updated data.
Using it, the client doesn't have to ping the server every x seconds to get new data - I think it could help you.
You could make the poll time variable depending on the number of clients. Using your metaphor, the kid asks "Are we there yet?" and the driver responds "No, but maybe in an hour". Thankfully, Javascript isn't a stubborn kid so you can be sure he won't bug you until then.
You could consider polling every 5 seconds to start with, but after a while start to increase the poll interval time - perhaps up to some upper limit (1 minute, 5 minute - whatever seems optimal for your usage). The increase doesn't have to be linear.
A more sophisticated spin (which could incorporate monzee's suggestion to vary by the number of clients) would be to allow the server to dictate the interval before the next poll. The server could then increase the interval over time, and you can even change the algorithm on the fly, or in response to network load.
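A rough sketch of both ideas together is below; refreshContent is a made-up UI hook, and nextPollMs is a made-up field the server could return to dictate the next interval.

// Rough sketch: poll at 5-second intervals while things are changing, back off gradually when
// they aren't, and let the server override the next delay if it wants to.
let delay = 5000;                  // start by asking every 5 seconds
const MAX_DELAY = 5 * 60 * 1000;   // never wait longer than 5 minutes between polls

function poll() {
  fetch('/check-updates')
    .then(res => res.json())
    .then(data => {
      if (data.updated) {
        delay = 5000;              // activity: go back to fast polling
        refreshContent();          // made-up hook that re-renders the new content
      } else {
        delay = Math.min(delay * 1.5, MAX_DELAY);  // quiet: back off gradually
      }
      setTimeout(poll, data.nextPollMs || delay);  // the server may dictate the next interval
    })
    .catch(() => setTimeout(poll, delay));
}

poll();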
You could take a look at the Twisted framework in Python. It's an event-driven network programming framework that might satisfy what you are looking for. It can be used to push messages from the server.
Perhaps you can send a query to a real simple script, that doesn't need to make a real db-query, but only uses a simple timestamp to tell if there is anything new.
And then, if the answer is true, you can do a real query, where the server has to do real work !-)
I would have a single instance calling the DB, and if a newer timestamp exists, put that new timestamp in an application variable. Then let all sessions check against that application variable. Or something like that. That way only one instance is calling the SQL server and the number of clients doesn't matter.
I haven't tried this and it's just the first idea off the top of my head, but I think that caching the timestamp and letting the clients check the cache is a way to do it; how best to implement the cache (SQL Server cache, application variable, and so on) I don't know.
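Combining the two answers above, a rough sketch of the idea might look like this (Node-style pseudocode; db.queryLatestTimestamp and the '/changed' endpoint are made-up names, and the shared value here is just an in-process variable standing in for an application-level cache):

// Rough sketch: keep the latest-change timestamp in a shared in-process cache, refresh it from
// the DB at most once per interval, and let every client request compare against the cached value.
let cached = { ts: 0, checkedAt: 0 };
const REFRESH_MS = 5000;   // at most one DB query every 5 seconds, regardless of client count

async function latestChangeTimestamp() {
  if (Date.now() - cached.checkedAt > REFRESH_MS) {
    // db.queryLatestTimestamp stands in for a cheap "SELECT MAX(updated_at) ..." query.
    cached = { ts: await db.queryLatestTimestamp(), checkedAt: Date.now() };
  }
  return cached.ts;
}

app.get('/changed', async (req, res) => {
  const since = Number(req.query.since) || 0;
  const ts = await latestChangeTimestamp();
  res.json({ changed: ts > since, ts });   // the client does the real (expensive) fetch only if true
});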
Regarding how SO does it, note that it doesn't check for new answers continuously, only when you're typing into the "Your Answer" box.
The key then, is to first do a computationally cheap operation to weed out common "no update needed" cases (e.g., entering a new answer or checking a timestamp) before initiating a more expensive process to actually retrieve any changes.
Alternately, depending on your application, you may be able to resolve this by optimizing your change-publishing mechanism. For example, perhaps it might be feasible for changes (or summaries of them) to be put onto an RSS feed and have clients watch the feed instead of the real application. We can assume that this would be fairly efficient, as it's exactly the sort of thing RSS is designed and optimized for, plus it would have the additional benefit of making your application much more interoperable with the rest of the world at little or no cost to you.
I believe the approach should be based on a combination of server-side sockets and client-side AJAX/Comet. Like:
Assume a chat application with several logged on users, and that each of them is listening via a slow-load AJAX call to the server-side listener script.
Whatever browser gets the just-entered data submits it to the server with an AJAX call to a writer script. That server updates the database (or storage system) and posts a socket write to the noted listener script. The latter then gets the fresh data and posts it back to the client browser.
Now I haven't yet written this, and right now I dunno whether/how the browser limit of two concurrent connections screws up the above logic.
Will appreciate hearing from anyone with thoughts here.
AS