images.google appearing as referral traffic - javascript

I see a decent amount of traffic, around 100 visits a day, that comes from an images.google domain but shows as referral traffic rather than organic in Google Analytics. I have some custom code to pull keywords out and set an organic source for a few variations of what Google Image Search referrers look like, and it works for every referrer I can run it against from the server log.
var ref = document.referrer;
if (ref.search(/www.google/) != -1 && ref.search(/imgres/) != -1) {
var regex = new RegExp("www.google.([^\/]+).*");
var match = regex.exec(ref);
ref = 'http://images.google.' + match[1] + '?' + ref.split('?')[1];
_gaq.push(['_setReferrerOverride', ref]);
} else if (ref.search(/maps.google/) != -1 && ref.search(/q=/) == -1) {
var regex = new RegExp("maps.google.([^\/]+).*");
var match = regex.exec(ref);
ref = 'http://maps.google.' + match[1] + '?q=' + encodeURIComponent('(not provided)');
_gaq.push(['_setReferrerOverride', ref]);
}
function splitUrl(url) {
var vals = {};
var split = url.split('?');
vals.base = split[0];
if(split.length > 1) {
var vars = split[1].split('&');
vals.params = {};
for(var i = 0, len = vars.length; i < len; i++) {
var valSplit = vars[i].split('=', 2);
vals.params[valSplit[0]] = valSplit[1];
}
}
return vals;
}
function joinUrl(urlObj) {
var vars = [];
for(key in urlObj.params)
if(urlObj.params.hasOwnProperty(key))
vars.push(key + '=' + urlObj.params[key]);
return urlObj.base + '?' + vars.join('&');
}
//fix keyword for old google image search
if(ref.match(/^http:\/\/images\.google\./) || ref.match(/^http:\/\/images\.google$/)) {
var refUrl = splitUrl(ref);
if(refUrl.params.prev && !refUrl.params.q) {
var prev = decodeURIComponent(refUrl.params.prev);
if(prev.indexOf('?q=') !== -1 || prev.indexOf('&q=') !== -1) {
var prevUrl = splitUrl(prev);
refUrl.params.q = prevUrl.params.q;
if(!refUrl.params.q)
refUrl.params.q = encodeURIComponent('(not provided)');
delete prevUrl.params.q;
refUrl.params.prev = encodeURIComponent(joinUrl(prevUrl));
}
_gaq.push(['_setReferrerOverride', joinUrl(refUrl)]);
} else if(!refUrl.params.q) {
refUrl.params.q = encodeURIComponent('(not provided)');
_gaq.push(['_setReferrerOverride', joinUrl(refUrl)]);
}
}
_gaq.push(['_addOrganic', 'images.google', 'q']);
_gaq.push(['_addOrganic', 'maps.google', 'q', true]);
This handles all of the referrers that look like:
http://images.google.com/?q=
and
http://www.google.com/?imgres=
I don't know where the referral traffic is coming from. Has anyone else seen this?

It is natural for Google Analytics to treat this domain as a referral, because by default GA recognizes only a fixed list of domains as search engines.
To solve this problem, you can register the domain as a search engine using the _addOrganic method.
To use this method, you must specify not only the domain of the search engine, but also the query string parameter used for searches. In the case of images.google.com it's "q".
In your GA tracking code, add the line:
_gaq.push(['_addOrganic', 'images.google.com', 'q', true]);
You can get more info on the GA Help site.
Hope this info helps,
Augusto Roselli
Web Analytics - digitalcube
#_digitalcube
www.dp6.com.br

If someone clicks on an image that shows up in standard Google search, not images.google, the URL might be different. You should try some URLs from there. Besides that, the Google Images links that appear in normal Google results will not include the query string if the user is logged in to a Google Account. This change happened in October 2011; here are a couple of links on the subject:
Official Google Statement
Avinash's, always worth reading, opinion.
For normal organic Google links, Google Analytics shows these visits as coming from a (not provided) keyword with an organic medium. But if you click on an image on the SERP, the visit won't be identified as organic. It will be identified as a referral, and those are probably the visits you are seeing.
So you need to check whether the Google Images link has the q parameter. If it doesn't, the visit is coming from a logged-in user and should be reported as (not provided) to be consistent with Google organic keywords. Just append &q=(not provided) to the _setReferrerOverride URL you built, and remember to URL-encode the value before appending it.
I'm also posting the code I use. It's from Google Forums and doesn't handle the (not provided) keyword issue yet.
Note that it's very similar to yours, with a few notable differences:
You strip the whole path from the images URL, while mine keeps the path.
You don't pass true as the last argument to _addOrganic for images.google, which may cause Google Images to be reported as the google source instead of images.google in your reports.
Here's the code I currently use:
//handle google images referrer
var ref = document.referrer;
if ((ref.search(/google.*imgres/) != -1)) {
// Regex literal: inside a string, "\." collapses to "." and matches any character
var regex = /[.\/]google\.([^\/]+)(.*)/;
var match = regex.exec(ref);
_gaq.push(
['_setReferrerOverride', 'http://images.google.' + match[1] +
unescape(match[2])],
['_addOrganic', 'images.google', 'q', true]
);
}
I'll be updating my code to handle (not provided) google images links and will post here as soon as I have it.
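A minimal sketch of the (not provided) handling described above, assuming the referrer looks like the imgres URLs discussed in this thread. The helper name buildImagesReferrer is made up for illustration and is not part of ga.js; only _setReferrerOverride and _addOrganic come from the tracking code already shown.

```javascript
// Hypothetical helper: builds the referrer override for a Google Images click.
// Returns null when the referrer is not a Google imgres URL.
function buildImagesReferrer(ref) {
  // Capture the TLD, the path, and the query string of a *.google.* referrer
  var match = /\.google\.([^\/?]+)(\/[^?]*)?(?:\?(.*))?/.exec(ref);
  if (!match || ref.indexOf('imgres') === -1) {
    return null;
  }
  var tld = match[1];
  var path = match[2] || '/';
  var query = match[3] || '';
  // No q parameter means the keyword was withheld, so report it as (not provided)
  if (!/(^|&)q=/.test(query)) {
    query += (query ? '&' : '') + 'q=' + encodeURIComponent('(not provided)');
  }
  return 'http://images.google.' + tld + path + '?' + query;
}

// Only runs in a page where the ga.js queue (_gaq) already exists
if (typeof document !== 'undefined' && typeof _gaq !== 'undefined') {
  var override = buildImagesReferrer(document.referrer);
  if (override) {
    _gaq.push(
      ['_setReferrerOverride', override],
      ['_addOrganic', 'images.google', 'q', true]
    );
  }
}
```

Keeping the URL building in a plain function makes it easy to run against referrers pulled from the server log before wiring it into the tracking code.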

Related

slicing only the domain from the domain+TLD combination

I tried to write a Greasemonkey userscript that checks whether the user is on one of a list of websites.
If the user is indeed on one of them, the script will alert:
Enough with this domain already!
The purpose of the script is to remind the user to stop visiting the site (addiction-like behavior).
The output should include only the domain, without the TLD.
I have tried the following code, which failed (it uses a collection of TLDs to strip them away):
let sites = ['walla.com', 'mako.co.il'];
let tlds = new RegExp('\.+(com|co.il)');
for (let i = 0; i < sites.length; i++) {
if (window.location.href.indexOf(sites[i]) != -1 ) {
sites.forEach((e)=>{
e.replace(tlds, '').split('.').pop(),
});
alert(` Enough with this ${sites[i]} already! `);
}
}
No console errors.
To reproduce, install the script in Greasemonkey/Tampermonkey and try it on the listed sites.
You should iterate over the sites and, if the href contains the site (sites[i]), remove everything after the domain (from the first dot onward), alert, and break the loop:
const sites = ['walla.com', 'mako.co.il'];
const regex = /\..+/;
const href = 'http://www.mako.co.il/news?partner=NavBar'; // stands in for window.location.href for demo purposes
for (let i = 0; i < sites.length; i++) {
if (href.includes(sites[i])) { // if the href includes the sites[i]
const domain = sites[i].replace(regex, ''); // remove the 1st dot and everything after it to get the domain name
alert(` Enough with this ${domain} already! `);
break; // or return if in a function
}
}

Have a line of javascript being added to the <head> tag of all my domains. Not sure what it is or where it is coming from

I have several domains hosted at one hosting service. All of them have a line of JavaScript added to the <head> tag. It appears to be changing Math.random to something else, but I can't figure out what it's trying to accomplish.
I also can't find out where it is coming from. I have mostly WordPress domains, but 2 of my sites are static and the files there haven't been touched in years. They still show this. Also, a Drupal site has this added too.
So, I'm hoping for help with two things: 1) what is it doing? and 2) where is it coming from?
I see it in any browser I use other than the android browser on my phone. I don't see it on other sites except for 1, so I don't believe it to be related to my PC or browser.
This is the script itself. The large number in the arguments to the function call is different every time. I don't know enough about anonymous functions to really sort out what is happening here.
<script type="text/javascript">/* <![CDATA[ */Math.random=function(a,c,d,b){return function(){return 300>d++?(a=(1103515245*a+12345)%b,a/b):c()}}(237429089,Math.random,0,1<<21);(function(){function b(){try{if(top.window.location.href==c&&!0!=b.a){var p=document.createElement('a');p.href=c;var len=p.hostname.length;var sep='';var path=p.pathname;if(p.hostname.charAt(len-1)!='/'){sep=(p.pathname.charAt(0)=='/')?'':'/';}else{if(p.pathname.charAt(0)=='/'){path=p.pathname.slice(1);}}c='http%3A%2F%2F'+p.hostname+sep+path+'%2F';var a=-1!=navigator.userAgent.indexOf('MSIE')?new XDomainRequest:new XMLHttpRequest;a.open('GET','http://1.2.3.4/cserver/clientresptime?cid=CID5460105.AID1492092648.TID387&url='+c+'&resptime='+(new Date-d)+'&starttime='+d.valueOf(),!0);a.send(null);b.a=!0}}catch(e){}}var d=new Date,a=window,c=document.location.href,f='undefined';f!=typeof a.attachEvent?a.attachEvent('onload',b):f!=typeof a.addEventListener&&a.addEventListener('load',b,!1)})();/* ]]> */</script>
UPDATE - April 18/17
I have checked this from other PCs and I still get the inserted javascript. Also, I have found several other sites which have this code in their header. They appear to be WP blogs but not all WP blogs that I check have this.
After some questioning in other forums, I was finally able to track down an answer. This is inserted into web pages by the ISP I use (Xplornet here in Canada, Hughesnet in the US) to track response times for their satellite internet acceleration. It only shows up in sites using the http protocol rather than https (they can't inject into those encrypted packets).
I'm not too sure that I'm happy with this, since it will use more of my bandwidth for their purposes and they happily charge me when I go over my limit every month. Another great reason to find a new ISP as soon as I can!
But at least it doesn't seem malicious and it's not a problem on my hosting or my PC which is nice.
Thanks to all for helping out with this!
It doesn't appear to be malicious. Contact your hosting provider to find out if they're inserting it and/or try to find out if their other clients have it too.
It's hard to say but it could be a logging, cache or IE detection tool (MSIE stands for Microsoft Internet Explorer). You'll have to ask them though.
Unminified code:
Math.random = function(a, c, d, b) {
return function() {
return 300 > d++ ? (a = (1103515245 * a + 12345) % b, a / b) : c()
}
}(237429089, Math.random, 0, 1 << 21);
(function() {
function b() {
try {
if (top.window.location.href == c && !0 != b.a) {
var p = document.createElement('a');
p.href = c;
var len = p.hostname.length;
var sep = '';
var path = p.pathname;
if (p.hostname.charAt(len - 1) != '/') {
sep = (p.pathname.charAt(0) == '/') ? '' : '/';
} else {
if (p.pathname.charAt(0) == '/') {
path = p.pathname.slice(1);
}
}
c = 'http%3A%2F%2F' + p.hostname + sep + path + '%2F';
var a = -1 != navigator.userAgent.indexOf('MSIE') ? new XDomainRequest : new XMLHttpRequest;
a.open('GET', 'http://1.2.3.4/cserver/clientresptime?cid=CID5460105.AID1492092648.TID387&url=' + c + '&resptime=' + (new Date - d) + '&starttime=' + d.valueOf(), !0);
a.send(null);
b.a = !0
}
} catch (e) {}
}
var d = new Date,
a = window,
c = document.location.href,
f = 'undefined';
f != typeof a.attachEvent ? a.attachEvent('onload', b) : f != typeof a.addEventListener && a.addEventListener('load', b, !1)
})();
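To make the first statement easier to follow, here is the Math.random replacement written out with named variables. This is a sketch of the same logic, not the ISP's own source: makeSeededRandom and its parameter names are invented for illustration. For the first 300 calls it returns values from a linear congruential generator seeded with the large number, and after that it hands every call back to the original Math.random.

```javascript
// Hypothetical readable version of the injected Math.random wrapper.
// seed     -> the large number that changes on every page load
// fallback -> the original Math.random
// limit    -> how many calls are served by the seeded generator (300 in the script)
// modulus  -> 1 << 21 in the script
function makeSeededRandom(seed, fallback, limit, modulus) {
  var state = seed;
  var calls = 0;
  return function () {
    if (calls++ < limit) {
      // Linear congruential step: next = (a * state + c) mod m
      state = (1103515245 * state + 12345) % modulus;
      return state / modulus; // scaled into [0, 1)
    }
    return fallback();
  };
}

// Equivalent to what the injected script does, without overwriting Math.random
var seededRandom = makeSeededRandom(237429089, Math.random, 300, 1 << 21);
```

Because the seed is chosen by whoever injects the script, the first 300 values are predictable to them, which is the only behavioral difference from the built-in generator.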

Why won't my extension for Google Chrome work on this one specific page?

I have been working on an extension for Google Chrome for some time now that allows users to share pages to Facebook using a simple browser action icon. It provides some additional functionality, such as allowing users to search through past links, etc. It has worked almost flawlessly up until I attempted to share one particular page (http://imgur.com/gallery/N0s079c) to my personal Facebook account today. This is very concerning to me for a number of reasons, as it may mean that a similar problem will happen on additional pages, and I want to patch the extension before my users run into it. Here's a (somewhat brief) rundown of how my extension shares links:
The user clicks the browser action and clicks "share" from a small menu in the popup. The popup page then sends a message using chrome.runtime.sendMessage() to the event page.
The event page processes the incoming message stream and acts appropriately, calling a function that uses chrome.tabs.query() to get the current tab. It then passes this information on to a function that queries a simple Web SQL database for an exact match of the URL to see if the user has shared it before. If they have, it gives them a basic confirm() dialog before continuing. If they haven't, the link is added to the database before continuing. I've included the code for this section below.
The extension processes the URL and generates a Facebook Feed dialog.
The Facebook Feed dialog redirects the user to a server page that either takes the user back to the link they shared or to the new Facebook post, depending on their settings.
When I attempt to share the link mentioned above, however, the extension doesn't do anything. There are no errors in the console for either the event or popup pages. I'm at a loss as to what may be causing it to fail. The only thing I can think of is that it is caused by some edge case bug in the Web SQL query. The way that it is currently set up, an error in the query would cause the code to simply stop executing. It was a basic SELECT column FROM table WHERE expression query that looks for exact matches, so I didn't feel the need to write any error handling in.
I tested several other links on Imgur to see if it was perhaps an issue specific to that site, but that didn't seem to be the case.
Code for Checking Shared Link History/Adding to History
simpleshare.shareLink.checkHistory = function(result) {
simpleshare.backend.database.transaction(function (tx) {
tx.executeSql('SELECT url FROM history WHERE url=\'' + result[0].url + '\'', [], function(tx, results) {
if(results.rows.length != 0) {
reshare = confirm('It appears that you\'ve already shared (or started to share) this link in the past.');
if (reshare == true) {
simpleshare.shareLink.share(result);
};
};
if(results.rows.length == 0) {
var today = new Date();
var month = today.getMonth();
var day = today.getDate();
var year = today.getFullYear();
if (month == 0) {
var monthAsWord = 'January';
};
if (month == 1) {
var monthAsWord = 'February';
};
if (month == 2) {
var monthAsWord = 'March';
};
if (month == 3) {
var monthAsWord = 'April';
};
if (month == 4) {
var monthAsWord = 'May';
};
if (month == 5) {
var monthAsWord = 'June';
};
if (month == 6) {
var monthAsWord = 'July';
};
if (month == 7) {
var monthAsWord = 'August';
};
if (month == 8) {
var monthAsWord = 'September';
};
if (month == 9) {
var monthAsWord = 'October';
};
if (month == 10) {
var monthAsWord = 'November';
};
if (month == 11) {
var monthAsWord = 'December';
};
var fullDate = monthAsWord + ' ' + day + ', ' + year;
tx.executeSql('INSERT INTO history VALUES (\'' + fullDate + '\', \'' + result[0].title + '\', \'' + result[0].url + '\')', [], function(tx, results) {
simpleshare.shareLink.share(result);
});
};
});
});
};
Heh, good question. This is a bit of a guess based on what you've said, but I think I can tell you why it's that one page (and I know this because I've hit something similar in the past).
Your insert query:
INSERT INTO history VALUES (\'' + fullDate + '\', \'' + result[0].title + '\', \'' + result[0].url + '\')
Is going to resolve to (for that page):
INSERT INTO history VALUES ('April 5, 2014', 'It's a graduated cylinder', 'http://imgur.com/gallery/N0s079c');
This isn't valid, and you can see the problem in the syntax highlighting: the single quote in It's ends that string early and turns the rest of the query into nonsense.
So, yes, this will happen on other pages. In fact, an attacker could guess what was happening and attempt to compromise the database with a cleverly crafted page title.
The lesson here is to sanitize anything you're using as an input to a SQL query. Actually, never trust any input, and validate/sanitize it on principle anyway.
Second lesson, and one I've failed to learn many times: if something can return an error, catch it and do something with it.
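One way to apply both lessons at once is to pass the values as bound parameters and supply an error callback. The sketch below assumes Web SQL's executeSql(sql, arguments, successCallback, errorCallback) signature and the three-column history table from the question; insertHistory, onDone and onError are hypothetical names, not part of the extension.

```javascript
// Hypothetical helper: inserts a history row without building SQL from strings.
// tx is the transaction object that database.transaction() passes to its callback.
function insertHistory(tx, fullDate, title, url, onDone, onError) {
  tx.executeSql(
    // The ? placeholders are filled from the array, so quotes in the
    // title can no longer terminate the SQL string early
    'INSERT INTO history VALUES (?, ?, ?)',
    [fullDate, title, url],
    function (tx, results) {
      onDone(results);
    },
    function (tx, error) {
      // Surface the failure instead of silently stopping
      onError(error);
      return true; // anything other than false rolls the transaction back
    }
  );
}
```

The same change applies to the SELECT: 'SELECT url FROM history WHERE url=?' with [result[0].url] as the argument array.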
Hope that helps.

Mixpanel Data Sampling/Event Sampling

I found this code for Google Analytics which lets you analyze just a subset of data for your analytics.
_gaq.push(['_setSampleRate', '80']);
I want to do the same thing with Mixpanel, but from what I understand _setSampleRate is specific to Google Analytics.
How might I do something like this in Mixpanel?
I have browsed their KB & Help articles but haven't found anything that talks about this.
All you have to do is generate a random integer from 0 to 99 and check whether it's lower than your sample target. If it's lower you track the user, otherwise you don't.
The way _setSampleRate works in Google Analytics is that it samples by user, not by hit. So when you generate the random number you also have to store it in a cookie, so that you can check it on later interactions and decide whether to track them.
In the example below I created a helper function that checks whether the user is in the sample and handles the cookie logic.
function inSample(target) {
  var domain_name = 'mysite.com'; // CUSTOMIZE WITH YOUR DOMAIN
  var sampleCookie = 'mixpanel_sample='; // COOKIE NAME
  var current = document.cookie;
  if (current.indexOf(sampleCookie) > -1) {
    // Cookie already exists, so reuse its value
    current = current.substring(current.indexOf(sampleCookie) + sampleCookie.length);
    if (current.indexOf(';') > -1)
      current = current.substring(0, current.indexOf(';'));
    current = parseInt(current, 10);
  } else {
    // Cookie not found, so pick a random integer from 0 to 99
    current = Math.floor(Math.random() * 100);
  }
  // Reset the cookie to expire in 2 years
  var two_years = new Date();
  two_years.setTime(two_years.getTime() + 2 * 365 * 24 * 60 * 60 * 1000);
  two_years = two_years.toUTCString(); // toGMTString is deprecated
  document.cookie = sampleCookie + current +
    '; domain=' + domain_name + '; path=/' +
    '; expires=' + two_years;
  // In the sample when the stored number is strictly below the target,
  // so inSample(80) keeps 80 of the 100 possible values
  return current < target;
}
Now all you have to do is use this function to decide whether to fire the Mixpanel tracking code.
if (inSample(80)) {
// MIXPANEL TRACKING CODE GOES HERE
}
What you have in the end is a report in Mixpanel that only includes 80% of your users.

Improve this search engine detector with javascript

I have the following code which detects which search engine and what search term has been used:
if (document.referrer.search(/google\.*/i) != -1) {
var start = document.referrer.search(/q=/);
var searchTerms = document.referrer.substring(start + 2);
var end = searchTerms.search(/&/);
end = (end == -1) ? searchTerms.length : end;
searchTerms = searchTerms.substring(0, end);
if (searchTerms.length != 0) {
searchTerms = searchTerms.replace(/\+/g, " ");
searchTerms = unescape(searchTerms);
alert('You have searched: '+searchTerms+' on google');
}
}
This mostly works, but sometimes it doesn't behave as expected.
Sometimes, even when the referrer is not Google, I get an alert with the search term shown as ttp://www.domain.com (without the h at the start). I think that may point to the bug.
Appreciate any help!
Have you tried leveraging existing JS URL parsing schemes? It might save you a bunch of time. For example:
http://blog.stevenlevithan.com/archives/parseuri
It's cutting the "h" off because q= was not in the referrer string, so your start variable is -1. You then add 2 to it, which gives 1, and substring(1) returns the whole referrer minus its first character. You need to check whether start is -1 and return early.
I also think your "google" string detection is not bulletproof. I would rather do something like this:
var ref = document.referrer;
var pcol = ref.indexOf("://") + 3;
if(ref.indexOf("google.com") == pcol || ref.indexOf("www.google.com") == pcol) {
// It is google
}
One last thing, you should use decodeURIComponent instead of unescape.
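Putting these suggestions together, the detector could look roughly like the sketch below. It assumes a browser (or runtime) that provides the standard URL and URLSearchParams APIs instead of the parseUri library linked above, and the function name getGoogleSearchTerms is made up for illustration. The hostname pattern is deliberately loose and only meant to show the idea.

```javascript
// Hypothetical helper: returns the decoded search terms from a Google referrer,
// or null when the referrer is not Google or carries no q parameter.
function getGoogleSearchTerms(referrer) {
  var url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // empty or malformed referrer
  }
  // Test the hostname only, so "google" appearing in a path or query is ignored.
  // Loose pattern: google.com, www.google.co.uk, google.de and similar.
  if (!/(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$/i.test(url.hostname)) {
    return null;
  }
  // searchParams.get percent-decodes the value and turns "+" into a space
  var terms = url.searchParams.get('q');
  return terms ? terms : null;
}

// Only runs in a page, where document.referrer is available
if (typeof document !== 'undefined') {
  var terms = getGoogleSearchTerms(document.referrer);
  if (terms !== null) {
    alert('You have searched: ' + terms + ' on google');
  }
}
```

Returning null for every "not a Google search" case means the caller has a single condition to check, which avoids the start === -1 slip from the original code.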
