I recently added a new feature to a site, which uses AJAX to load the log-in and registration panels.
After uploading I got tons of Google Crawl Errors, nearly 700!
The error URL doesn't appear anywhere in the source of the page, except as the URL used by a jQuery .load() function.
Could it be that Google is trying to crawl the URL being used by my JavaScript code?
Check out this page from the Google documentation: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=174993 It gives you some ideas about AJAX and how to keep the bot away from your AJAX endpoints. See specifically the question "What if my site has some hash fragment URLs that should not be crawled?"
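One concrete way to do that: if the panels are loaded from a dedicated path, disallow it in robots.txt. A minimal sketch, assuming a hypothetical /ajax/ directory for the .load() endpoints (the path is illustrative, not taken from the question):
User-agent: *
Disallow: /ajax/
Googlebot then stops requesting those URLs, and the crawl errors should clear out of the report over time.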
Related
I made a webpage which provides some direct downloads, and I only want real humans, not crawlers, to download my files. I tried to use Google reCAPTCHA, but it is just part of the webpage: visitors can still use the download links without ever worrying about the reCAPTCHA. Is there a way to make visitors pass the verification first? For example, is it possible to pop up the reCAPTCHA before the whole page is loaded? If that's doable, how can I do it? Thanks!
What I'd recommend here is to keep the captcha form on your current page and create a new, non-indexable page with the download links.
Once the captcha response has been verified, use header('Location: download.php'); or something similar to redirect the user.
A captcha shown before a webpage loads is possible, but it always relies on client-side code such as JavaScript, which bots can easily bypass; the verification has to happen on the server.
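A minimal sketch of that flow in Node.js/Express rather than PHP (the route names, session setup and RECAPTCHA_SECRET are illustrative assumptions; only the siteverify endpoint is Google's actual API):
const express = require('express');
const session = require('express-session');

const app = express();
app.use(express.urlencoded({ extended: false }));
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));

app.post('/verify', async (req, res) => {
  // Server-side check of the token the visitor's captcha widget produced.
  // Requires Node 18+ for the global fetch.
  const body = new URLSearchParams({
    secret: process.env.RECAPTCHA_SECRET,
    response: req.body['g-recaptcha-response'],
  });
  const result = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body,
  }).then((r) => r.json());

  if (result.success) {
    req.session.human = true;
    res.redirect('/download'); // the Node equivalent of header('Location: download.php');
  } else {
    res.status(403).send('Captcha failed, please try again.');
  }
});

// Guard the download page too, so bots can't just request it directly.
app.get('/download', (req, res) => {
  if (!req.session.human) return res.redirect('/');
  res.sendFile(__dirname + '/download.html');
});

app.listen(3000);
The important part is that the link-bearing page is only reachable after the server has seen a successful siteverify response; anything done purely client-side can be bypassed.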
For reasons that aren't important here, I'm trying to combine Google Shopping with another page via an iframe.
I've tried the approach proposed here, consisting of embedding a Google Custom Search query in an iframe, but Google Custom Search does not give access to the Shopping tab.
I figured that if you can't embed Google, you can embed yourself in it. So I proceeded to inject some jQuery into the page:
var jq = document.createElement('script');
jq.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(jq);
// ... give time for script to load, then type.
jQuery.noConflict();
Then I cleaned the Google Shopping search results page down to what I needed, namely the HTML inside div#search:
jQuery(function($) { $('#search').show().parentsUntil('body').addBack().siblings().hide(); }); // .addBack() is the current name for the deprecated .andSelf()
Finally, I created an iframe and injected it:
var iframe = document.createElement('iframe');
iframe.src = "http://example.com";
iframe.width = "100%";
iframe.height = "500";
iframe.style.visibility = "visible";
document.body.appendChild(iframe);
The only problem is that the iframe doesn't load the page's contents and is consequently blank. If you try the above snippet on any other page, it works. It seems like Google is blocking the iframe from loading. How can I get around that?
Google doesn't seem to work inside an iframe, even if you are not using JS.
What you should use instead is the Google Custom Search API, which allows you to create a custom search engine.
You just have to enter an example website, change the option to "Search the entire web", and then remove the website you entered.
To create a custom search engine you'll need a Google account.
Start here.
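Once the engine exists, you can query it from your own page with the Custom Search JSON API and render the results however you like. A minimal sketch (the key, cx ID and query are placeholders):
// Query the Custom Search JSON API; key and cx come from your own
// engine's control panel, so these values are placeholders.
const key = 'YOUR_API_KEY';
const cx = 'YOUR_ENGINE_ID';
const q = encodeURIComponent('red running shoes');

fetch(`https://www.googleapis.com/customsearch/v1?key=${key}&cx=${cx}&q=${q}`)
  .then((res) => res.json())
  .then((data) => {
    // Each result item carries a title, link and snippet you can render yourself.
    (data.items || []).forEach((item) => console.log(item.title, item.link));
  });
Note, though, that the JSON API covers web (and image) search; it won't give you the Shopping tab either.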
When I run that code, the following error is reported in my console:
VM259:7 Mixed Content: The page at 'https://www.google.co.uk/?gws_rd=ssl' was loaded over HTTPS, but requested an insecure resource 'http://example.com/'. This request has been blocked; the content must be served over HTTPS.
Changing it to an HTTPS URL:
var iframe = document.createElement('iframe');
iframe.src = "https://example.com";
iframe.width = "100%";
iframe.height = "500";
iframe.style.visibility = "visible";
document.body.appendChild(iframe);
… makes it work fine (albeit tucked behind the logo).
Thanks to @Quentin's comment.
Update:
Embedding code into the Google website:
In general you can't embed code into a page that you don't own.
If a user opens your website and then opens another tab with Google, or your website opens another tab with Google, your site has no access to the Google page's source code or context and cannot affect it, because the two pages are completely isolated from each other.
It seems your cleaning of the results and embedding of your iframe into the Google page were done in your browser console. Those changes apply only locally in your browser and don't affect any other user who opens the Google website.
Possible solutions:
Actually, you can embed some code into other pages, but you would need one of the following (a sketch of the extension route follows this list):
Browser extensions (too complicated, because every user needs to install your extension)
XSS or other vulnerabilities (practically impossible to find on Google's search site)
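For completeness, a minimal sketch of the extension route (Manifest V3; the file names are illustrative). A content script runs in the context of the Google results page for every user who installed the extension:
// manifest.json
{
  "manifest_version": 3,
  "name": "Shopping results trimmer",
  "version": "1.0",
  "content_scripts": [
    { "matches": ["https://www.google.com/*"], "js": ["content.js"] }
  ]
}

// content.js: a vanilla-JS version of the jQuery one-liner above,
// hiding the siblings of #search and of each of its ancestors.
const search = document.querySelector('#search');
if (search) {
  for (let el = search; el && el !== document.body; el = el.parentElement) {
    for (const sib of el.parentElement.children) {
      if (sib !== el) sib.style.display = 'none';
    }
  }
}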
Embedding Google into your page:
You can't embed an iframe of Google because of the X-Frame-Options header in the HTTP response from google.com. There is no good workaround, sorry. To quote MDN:
The X-Frame-Options HTTP response header can be used to indicate whether or not a browser should be allowed to render a page in a <frame>, <iframe> or <object>. Sites can use this to avoid clickjacking attacks, by ensuring that their content is not embedded into other sites.
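You can verify this yourself with a quick sketch (Node 18+ for the global fetch; browsers won't expose this cross-origin header to page scripts, and Google may serve a consent page in some regions):
// Request a Google results page server-side and print the anti-framing header
// (typically "SAMEORIGIN").
fetch('https://www.google.com/search?q=test')
  .then((res) => console.log(res.status, res.headers.get('x-frame-options')));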
I have a website on which Google Analytics code fires (through Google Tag Manager). The site has a lot of pages, and I want to check whether the Google Analytics code fires on all of them. One way would be to open each URL, open the GA debugger, and watch the pageview firing in the console.
Since there are a lot of URLs to check, is there a way to automate this process (preferably in Python)?
Edit:
What I've tried so far:
I've managed to fetch the source code of the pages and then regex my way to the specific code snippets (of GA and GTM). You can find the code here. But the problem is that this only captures the static code: any pixels/tags that fire after the page actually loads will not be caught.
The issue with using Selenium is that I would be performing this test on possibly thousands of URLs, and Selenium would slow the process down considerably.
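One way to keep a real browser in the loop without a full Selenium run is headless Chrome watching the network for actual GA hits. A sketch in Node.js with Puppeteer rather than Python (the URL list is a placeholder; the google-analytics.com paths are the real endpoints GA sends hits to):
const puppeteer = require('puppeteer');

const urls = ['https://example.com/', 'https://example.com/about']; // placeholder list

(async () => {
  const browser = await puppeteer.launch();
  for (const url of urls) {
    const page = await browser.newPage();
    const hits = [];
    // GA hits go to /collect (Universal Analytics) or /g/collect (GA4).
    page.on('request', (req) => {
      if (/google-analytics\.com\/(r\/|j\/|g\/)?collect/.test(req.url())) {
        hits.push(req.url());
      }
    });
    await page.goto(url, { waitUntil: 'networkidle0' });
    console.log(url, hits.length ? 'GA fired' : 'NO GA HIT');
    await page.close();
  }
  await browser.close();
})();
Because several pages can be processed in parallel tabs within one browser instance, this should scale better than one full Selenium session per URL.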
I am trying to implement login with Google Plus on my website using JavaScript. My origin URL is http://www.locallylahore.com and my redirect URL is http://www.locallylahore.com/oauth2callback/. I tried removing www from the URLs, but it didn't work.
My website URL is http://locallylahore.com/map.php.
You have to make sure to add http://locallylahore.com to the "JavaScript origins" in the Google Developers Console.
Since your website is reachable both with and without www, I would keep both variants in there. If you try to sign in at http://www.locallylahore.com/map.php you will see that you don't get the error, since you apparently already have the www variant correctly listed in the JavaScript origins.
Please be aware that a change to the origins can take several minutes to become active, so trying to sign in right after changing the setting won't work.
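For reference, one common client-side setup of the sign-in at the time used the (since-deprecated) gapi.auth2 library. A minimal sketch (the client ID is a placeholder; the page hosting this must be served from one of the registered JavaScript origins):
// Load the auth2 module and start the sign-in flow.
gapi.load('auth2', function () {
  var auth = gapi.auth2.init({
    client_id: 'YOUR_CLIENT_ID.apps.googleusercontent.com', // placeholder
  });
  auth.then(function () {
    auth.signIn().then(function (user) {
      console.log('Signed in as', user.getBasicProfile().getName());
    });
  });
});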
Per "Google crawling, AJAX and HTML5", Google can crawl dynamic pages that use the History API, yet it says that Google won't execute any JavaScript on the page. To me that means no AJAX request will be made and no DOM will be built, so Google won't be able to index the contents of the page that gets loaded in. Can anyone please elaborate?
As written in that answer, you'll need to provide hard links for bots.
Just treat the crawler like a user without JavaScript; you should be supporting users with no JavaScript anyway. Feel free to make use of the <noscript> tag. A sketch of the idea follows.
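A minimal progressive-enhancement sketch (the .ajax-nav class and #content container are illustrative): real <a href> links work for bots and no-JS users, while JS users get the AJAX plus History API behaviour on top.
// Intercept clicks on ordinary links; bots and no-JS users just follow the href.
document.querySelectorAll('a.ajax-nav').forEach(function (link) {
  link.addEventListener('click', function (e) {
    e.preventDefault();
    fetch(link.href)                 // the same URL a crawler would fetch
      .then(function (res) { return res.text(); })
      .then(function (html) {
        document.querySelector('#content').innerHTML = html;
        history.pushState(null, '', link.href); // keep the address bar in sync
      });
  });
});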
Also linked on that page is a guide by Google on how to make your AJAX site crawlable. You can do it by following the mentioned scheme:
www.example.com/ajax.html#!key=value
The #! tells Google's crawlers that your site is AJAX-crawlable; the crawler will then request www.example.com/ajax.html?_escaped_fragment_=key=value, and your server is expected to answer with an HTML snapshot of the AJAX-built page.
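A minimal server-side sketch of that (now-retired) scheme in Node.js/Express; the route and the snapshot function are illustrative:
const express = require('express');
const app = express();

// Placeholder: build a static HTML snapshot for a given hash fragment.
function renderSnapshotFor(fragment) {
  return '<html><body>Snapshot for ' + fragment + '</body></html>';
}

app.get('/ajax.html', (req, res) => {
  const fragment = req.query._escaped_fragment_;
  if (fragment !== undefined) {
    // Crawler request: serve the pre-rendered snapshot.
    res.send(renderSnapshotFor(fragment));
  } else {
    // Normal visitors get the JS shell that builds the page client-side.
    res.sendFile(__dirname + '/ajax.html');
  }
});

app.listen(3000);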