How does Google Analytics In-Page Analytics work? - javascript

Google Analytics features an 'In-Page Analytics' view to show click-through rates and other information directly on your own website. I'm looking to build something similar that logs all clicks.
The problem is I'm not really sure how Google implement their In-Page Analytics views - they seem to use an iframe, or two, and have injected their own HTML and JavaScript onto other pages.
How would one go about doing such a thing - are iframes the best way to go? How would you avoid the same-origin security policies of Javascript if domainX is trying to manipulate the rendering of domainY?

This is a very interesting question. You're right, the same origin policy forbids injecting JS. But Google Analytics has an advantage: it already is in your site (the tracker code).
So here is how it works (as far as I can see):
When you open in-page analytics, you are first taken to https://www.google.com/analytics/reporting/iyp_launch
This page redirects to your site and adds a Google session to the url (like http://example.com/#gaso=THESESSION
The tracker now checks if the referrer is iyp_launch and gaso is set. If yes, it does not only load the tracker, but also injects the JS needed for requesting further data and rendering the overlays. This way, the JS is executed inside the frame (or window) and bypasses the same-origin policy.
Since Google Anaytics already tracks your visit (i.e. identifies you as the same user that viewed the previous page), it can from then on inject the additional JS along with the tracker until your visit is over (i.e. you close the page). This way, the overlays can be rendered again after you click a link.
So I guess the bottom line is this: Things like in-page analytics can be done if the site's owner has already trusted you by adding a script you control to his website (this is a good example why one should be very careful before doing such a thing). If you don't have that kind of access to the site, it might be impossible to bypass the same-origin policy - at least, I can't think of another way to do it (except maybe proxying all the requests through your sever, but that leads to other major problems).

I think this can be done quite simply. You inject javascript that makes sure that whenever a user clicks a link it requests/posts to a special page in the iframe. Something like this (jquery):
$('a').live('click', function() {
$('#' + iframeid).attr('http://somedomain.com/my_magic_page.php?linkClicked=' + $(this).attr('id') + '&page=' + window.location.toString());
}
Don't know if it'll work though.

Related

Iframes security tips and scripts

I am trying to launch a website for myself which people might be using in future. Currently I am allowing users to post iframes for YouTube and Google Maps etc. Copy entire 'iframe' from Google Maps or YouTube and paste it in post box just to keep it simple.
Later I am storing it in MySQL database. I am displaying this post on some page. I am little worried since though I have asked user to paste only YouTube or Maps iframes, a devil mind might put src of malicious code.
What are all the possible ways to prevent this?
I think there are multiple risks, some that come to mind are:
Cross-site scripting. There are too many ways to achieve this if you allow the full <iframe> tag to be displayed as entered. This is probably the main risk, and the showstopper. It would be really hard to prevent XSS if you just write the full iframe tag (as entered by an attacker) into subsequent pages. If you really want to do this, you should look into HTML sanitization like Google Caja or HTMLPurifier or similar, but it is a can of worms that you better avoid if possible.
Information leak to malicious website. This very much depends on the browser (and the exact version of such browser), but some information (like for example teh window size, etc.) does leak to the website in an iframe, even if it's from a different origin.
Information / control leak from malicious website. Even worse than the previous, the embedded website would have some control over the window, for example it can redirect it (again, I think it depends on the browser though, I'm not quite sure), or can change the url hash fragment. Also if postMessage is used, the iframe can send messages to your application, which can be exploited if your application is not properly secured (not necessarily right now, but at any time in the future, like 5 years from now, after much development).
Arbitrary text injection, possibly leading to social engineering. Say an adversary includes a frame that says something like "You are the winner of this month's super-prize! Call 1-800-ATTACKER to provide your details and get your reward!"... You get the idea. The message would look like a legitimate one from your website, when it's not.
So you'd better not allow people to enter full tags as copied from Google Maps or anywhere else. There appears to be a finite set of things you want to allow (like for example Youtube videos and Google Maps links are only two), for which you should have customized controls. The user would only enter the video id/slug (the part after ?v=...), or would paste the full link, from which you would take the id, and you would make the actual tag for your page on the server side. The same for Google Maps, if the user navigates to wherever he wants in a Maps window and pastes the url, you can make your own iframe I think, because everything is in the url in Google Maps.
So in short, you should not allow people entering tags. XSS can be mitigated by sanitizers, but other risks listed above cannot.

Is there a way to prevent page redirection caused by ads?

My site is using Javascript ads code, and sometimes one of the ads redirects the page, and this is not a good practice according to Google (I got banned temporarily by Google until I solved the issue on my site).
Is there a way to prevent external Javascript redirect on the site (beside remove the ads)? Can you do this on the Apache configuration side to keep the domain in the address bar unchanged?
Working on the assumption that the adverts load external code:
Can you do this on the Apache configuration side to keep the domain in the address bar unchanged?
No. The advert isn't coming from your server. Your server can't influence it.
Is there a way to prevent external Javascript redirect on the site (beside remove the ads)?
No. The script will be loaded into the global scope and you have no opportunity to block access to things it might use to redirect.
Removing the ads is the only real option. Don't use advertising platforms that do a bad job of filtering out adverts that use such shady practises.
You can place the advertiser tag inside an iframe and add the sandbox attribute on the iframe.
In the sandbox attribute you specify which capabilities you grant the iframe.
Enter any capabilities you want but omit the allow-top-navigation navigation option and it will not allow it to navigate your site.
You can see all available attribute options here:https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe
Scroll down to the sandbox explanation.
Know that if you put sandbox="" nothing will be allowed because only what you specify will be available for the iframe - and it is a problem for ads. Some advertiser might even serve ads for 100% crippled iframes.

URL tracking functionality

I want my webpage to have two parts. The top part has a textbox. When the user types a URL into the textbox, the bottom part browses to the content of that URL. When the user clicks a link within the bottom part, the bottom part navigates to the new URL, and the textbox in the top part changes to the new URL. How can I do it?
NOTE: This behavior is the same as in Google Translate (e.g. here), but without any translation.
first problem..
Same origin issue
The only way to achieve what you are asking is exactly the way google translate does what it does - which is to use a server-side powered script as a proxy request:
http://translate.google.com/translate_un?depth=1&hl=en&ie=UTF8&prev=_t&rurl=translate.google.com&sl=auto&tl=en&twu=1&u=http://de.wikipedia.org/wiki/USA&lang=de&usg=ALkJrhgoLkbUGvOPUCHoNZIkVcMQpXhxZg
The above is the URL taken from the iframe that Google translate uses to display the translated page. The main thing to note is that the domain part of the URL is the same as the parent page's URL http://translate.google.com -- if both your frame and your parent window do not share the same domain, then your parent window's JavaScript wont be able to access anything within the iframe. It will be blocked by your browser's in-built security.
Obviously the above wont be a problem if in your project you are only ever going to be navigating your own pages (on the same domain), but considering you are proffering Google Translate as an example I'm assuming not.
What would Google do?
What the above URL does is to ask the server-side to fetch the wikipedia page and return it so that the iframe can display it - but to the iframe this page appears to be hosted on translate.google.com rather than wikipedia. This means that the iframe stays within the same origin as the parent window, and means that JavaScript can be used to edit or modify the page within the iframe.
next problem....
Rewrite the proxied content
Basically what I'm saying is that this can't be achieved with just HTML and client-side JavaScript - you need to have something to help from the server-side i.e. PHP, Python, Ruby, Lisp, Node.. and so on. This script will be responsible for making sure the proxied page appears/renders correctly e.g. you will have to make sure relative links to content/images/css on the original server are not broken (you can use the base tag or physically rewrite relative links). There are also many sites that would see this as an illegal use of their site, as per their site's terms of use and so should be black listed from your service.
final problem..?
Prevent the user from breaking away from your proxy
Once you have your proxy script, you can then use an iframe (please avoid using old framesets), and a bit of JavaScript magic that onload or ondomready of the iframe rewrites all of the links, forms and buttons in the page. This is so that when clicked or submitted, they post to your proxy script rather than the original destination. This rewrite code would also have to send the original destination to your proxy script some how - like u in the Google translate URL. Once you've sorted this, it will mean your iframe will reload with the new destination content, but - all importantly - your iframe will stay on the same domain.
too many problems!
If it were me, personally, I'd rethink your strategy
Overall this is not a simple task, and it isn't 100% fullproof either because there are many things that will cause problems:
Certain sites are designed to break out of frames.
There are ways a user can navigate from a page that can not be easily rewritten i.e. any navigation powered by JavaScript.
Certain pages are designed to break when served up from the wrong host.
Sites that do this kind of 'proxying' of other websites can get into hot water with regards to copyright and usage.
The reason why Google can do it is because they have a lot of time, money and resources... oh and a great deal of what Google translate does is actually handled on the server-side - not in JavaScript.
suggestions
If you are looking for tracking users navigating through your own site:
Use Google Analytics.
Or implement a simple server-side tracking system using cookies.
If you are looking to track users coming to your site and then travelling on to the rest of the world wide web:
Give up, web technologies are designed to prevent things like this.
Or join an online marketing company, they do their best to get around the prevention of things like this.
add a javascript function to your second frame -
<frame id="dataframe" src="frame_a.htm" onload="load()">
let the text box have an id - say "test"
function load()
{
document.getElementById('test').value=document.getElementById('dataframe').src
}

Is it possible to read cookies of a visitor?

AdSense use javascript to catch the keywords of current page to display relevant ads. But I was recently noticed that it will show ads related to my browsing history too. It seems that the javascript code can read my cookie (in general I mean, e.g. list of domains visited) to display relevant ads.
Is it practically possible to read cookies of a visitor?
No, it isn't possible to do so directly due to the same origin policy. (If it were possible, then you could make a page that stole the session of anyone on any domain!)
However, many websites include code from ad providers (read Google) on their page -- so that Google knows that you have visited website X because website X allows Google code to run when someone visits it. When you later visit website Y and it asks Google to display some ads on its behalf, what happens is no mystery.
Google knows all.

Avoid Google Analytics from categorizing pop-up as entrance

in my google analytics account there is a page tracked usually opened as a javascript window.open() pop-up (same domain as referring page).
unfortunately, g.a. categorizes the pop-up page as entrance, although it is just a step in the whole navigation flow.
how can i avoid this?
thanks for your help!
You will need to use GA events.
On your popup page: (untested)
<script>
window.opener.pageTracker._trackPageview(window.location.href);
</script>
and remove the existing tracking code from the popup entirely.
note: I'm not sure if this will be displayed as an "entrance." but it may be worth a shot.
http://aktagon.com/projects/jquery/google-analytics
This Jquery GA plugin has a lot of options, a big one being that you can set certain pages to be ignore/categorized in certain ways.
Also, If you setup a funnel or goal with regular expressions, you could set it to only match sites from your homepage and off (and not the popup.)
I don't know the solution here, but I do know the problem: window.open does not carry the HTTP referrer through (on some browsers at least), and without a same-site referrer Google Analytics considers a pageview to start a new (direct-access) session.
This blog post from 2006 has some possible solutions - I can't vouch for any of them though.

Categories