Per "Google crawling, AJAX and HTML5", Google can crawl dynamic pages that use the History API, yet it also says that Google won't execute any JavaScript on the page. To me that means no AJAX request or DOM building will happen, so Google won't be able to index the contents of the page that gets loaded in. Can anyone please elaborate?
As written in the answer, you'll need to provide hard links for bots.
Just treat it like a user without JavaScript. You should support users with no JavaScript. Feel free to implement the <noscript> tag.
Linked on that page is a guide by Google on how to make your AJAX site crawlable by Google. Following the scheme it describes, your URLs look like this:
www.example.com/ajax.html#!key=value
This way you tell Google's crawlers that your site is AJAX-crawlable, and they will do the rest.
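For illustration, here is a minimal sketch (not from the guide) of how a server might answer Google's rewritten request, assuming a Node.js/Express app and a hypothetical renderSnapshot() helper that produces the pre-rendered HTML:

const path = require('path');
const express = require('express'); // assumes Express is installed
const app = express();

app.get('/ajax.html', async (req, res) => {
  const fragment = req.query._escaped_fragment_;
  if (fragment !== undefined) {
    // Googlebot rewrites /ajax.html#!key=value into /ajax.html?_escaped_fragment_=key=value,
    // so serve a static HTML snapshot of that state here.
    const html = await renderSnapshot(fragment); // renderSnapshot() is hypothetical
    res.send(html);
  } else {
    // Normal browsers get the regular JavaScript-driven page.
    res.sendFile(path.join(__dirname, 'ajax.html'));
  }
});

app.listen(3000);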
Related
Some websites say that, after using html5mode and removing hashtags from the URL, "you do not need to serve different or pre-rendered content to Google".
Google's own documentation says the AJAX crawling scheme is deprecated. Other websites say Google can crawl an AngularJS app just fine. Older Stack Overflow questions offer yet another solution.
If you don't use hashtags, you can append an _escaped_fragment_ request to the end of a URL to test how Google sees your website.
My AngularJS application uses html5mode and doesn't need hashtags (e.g. www.domain.com/app/page-1). What should I do to be sure Google can crawl my AngularJS application properly? Could you tell me more detail about crawling? (I am not a senior developer.)
Some information is given without links because I could not post more than two links.
Thank you.
I'm glad to see from your question that you have already done quite good research on AngularJS and the Google crawler. Since you already know most of the relevant material, there is very little left to do beyond making sure the bot is working as expected.
Hashbang URLs are an ugly stopgap that requires the developer to provide a pre-rendered version of the site at a special location. They still work, but you don't need to use them.
Hashbang URLs look like this:
domain.com/#!path/to/resource
This would be paired with a metatag like this:
<meta name="fragment" content="!">
Google will not index them in this form, but will instead pull a static version of the site from the _escaped_fragment_ URL and index that.
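For the hashbang URL above, Google would instead request a URL of this form:
domain.com/?_escaped_fragment_=path/to/resource
and index the HTML snapshot your server returns for that request.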
Pushstate URLs look like any ordinary URL:
domain.com/path/to/resource
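For reference, a minimal sketch of pushstate-based navigation in plain browser JavaScript; loadResource() is a hypothetical helper that fetches and renders the content for a path via AJAX, and the data-internal attribute is just an assumed marker for app-internal links:

// loadResource() is hypothetical: it fetches and renders content for a path via AJAX.
document.addEventListener('click', function (event) {
  const link = event.target.closest('a[data-internal]');
  if (!link) return;
  event.preventDefault();
  const path = link.getAttribute('href');          // e.g. /path/to/resource
  history.pushState({ path: path }, '', path);     // real URL, no hashbang, no reload
  loadResource(path);
});

// Re-render when the user navigates with the back/forward buttons.
window.addEventListener('popstate', function (event) {
  if (event.state && event.state.path) {
    loadResource(event.state.path);
  }
});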
Making Sure it Works:
Google Webmaster Tools now contains a tool which allows you to fetch a URL as Google and render the JavaScript as Google renders it: Link to Googlebot-fetch
AngularJs and Google Crawler Stuff:
1: This is a wonderful article explaining everything in detail about AngularJS SEO
2: Also, this question has already been answered in detail by #superluminary; please take a look: Use PushState and Precomposition
3: Some more answers from a post I wrote earlier: "Link"
This question may not be related to an exact software stack, framework, or language.
For my current project, we are using AngularJS to build the front end, which has a constant entrance page that loads the real data and renders it. This is easy to serve from a CDN and good for fast loading on the browser side. But for some social features, such an architecture may cause problems. For example, when you paste a link you are interested in into Facebook to share it, Facebook will fetch your page and show a preview. If the landing page is empty, such a preview won't work.
(I heard that Google+ recently started rendering JavaScript logic server-side before sending back a preview, but obviously that isn't commonly supported by other, similar services. Google.com also supports indexing JS-based one-page applications.)
Is there a better solution to solve this problem gracefully, rather than falling back to dynamic pages that include the real data? Have I missed something in understanding this problem?
========
... I was even thinking that, for requests identified as coming from Facebook (e.g. by user agent), I could redirect them to a special gateway wrapping something like PhantomJS, fetch the page, render it server-side, and send back a DOM-tree snapshot as the content for Facebook to generate the preview from. But I also doubt that this is a good direction. :(
We are in the same situation. The simple solution is to use Open Graph meta tags in the pages your server will serve to Facebook scrapers.
Basically you need to do server-side what your web app is doing client-side. The amount of work highly depends on your hosting technology (MVC makes it super easy), your URI format, and the APIs you use.
You will find some explanations here:
https://developers.facebook.com/docs/plugins/share-button/
Open Graph introduction:
http://ogp.me/
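As a rough illustration of that server-side piece (assuming a Node.js/Express server; loadArticle() is a hypothetical lookup returning title, description, and image for the page), you could detect Facebook's scraper by user agent and serve a plain HTML page carrying the Open Graph tags:

const express = require('express');
const app = express();

app.get('/articles/:id', async (req, res, next) => {
  // facebookexternalhit / Facebot are the user agents Facebook's scraper sends.
  const isFacebookBot = /facebookexternalhit|Facebot/i.test(req.get('User-Agent') || '');
  if (!isFacebookBot) return next(); // normal visitors fall through to the AngularJS app

  const article = await loadArticle(req.params.id); // loadArticle() is hypothetical
  res.send('<!DOCTYPE html><html><head>' +
    '<meta property="og:title" content="' + article.title + '">' +
    '<meta property="og:description" content="' + article.description + '">' +
    '<meta property="og:image" content="' + article.imageUrl + '">' +
    '</head><body></body></html>');
});

app.listen(3000);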
I added a new feature to a site recently, which uses AJAX to load Log-in and Registration panels.
After uploading I got tons of Google Crawl Errors, nearly 700!
The error URL doesn't appear anywhere in the source of the page, except as the URL used by a jQuery .load() function.
Could it be that Google is trying to crawl the URL being used by my JavaScript code?
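For context, the kind of call in question looks roughly like this (the panel ID and URL are made up for illustration):

// A crawler can lift '/ajax/login-panel' straight out of the script source,
// even though no <a> tag on the page points to it.
$('#login-panel').load('/ajax/login-panel');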
Check out this page from the Google docs: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=174993 It gives you some ideas about AJAX and how to prevent the bot from messing with your AJAX stuff. See specifically the question "What if my site has some hash fragment URLs that should not be crawled?".
I want to create a web crawler/spider to iteratively fetch all the links on a web page, including JavaScript-based (AJAX) links, catalog all of the objects on the page, and build and maintain a site hierarchy. My questions are:
Which language/technology would be better (for fetching JavaScript-based links)?
Are there any open-source tools for this?
Thanks
Brajesh
You can automate the browser. For example, have a look at http://watir.com/
Fetching AJAX links is something that even the search giants haven't fully accomplished yet. That is because AJAX links are dynamic, and the command and response both vary greatly depending on the user's actions. That's probably why SEF-AJAX (Search Engine Friendly AJAX) is now being developed. It is a technique that makes a website completely indexable by search engines and, when visited by a web browser, acts as a web application. For reference, you may check this link: http://nixova.com
No offence, but I don't see any other way of tracking AJAX links. That's where my knowledge ends. :)
You can do it with PHP, simple_html_dom, and Java. Let the PHP crawler copy the pages onto your local machine or web server, open them with a Java application (JPane or something), mark all the text as focused, and grab it. Send it to your database or wherever you want to store it. Track all <a> tags, or tags with an onclick or mouseover attribute, and check what happens when you call them again: if the source HTML (the document returned from the server) differs in size or MD5 hash, you know it's an effective link and can grab it. I hope you can understand my bad English. :D
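A rough sketch of that fetch-and-compare idea in Node.js (version 18 or later, for the built-in fetch) rather than the PHP/Java combination described above; the URL is a placeholder:

const crypto = require('crypto');

const md5 = (html) => crypto.createHash('md5').update(html).digest('hex');

// Fetch the same URL twice and compare size/MD5 of the returned HTML:
// a difference suggests the link triggers dynamic ("effective") content.
async function hasEffect(url) {
  const first = await (await fetch(url)).text();
  const second = await (await fetch(url)).text();
  return first.length !== second.length || md5(first) !== md5(second);
}

hasEffect('https://example.com/some-page').then(console.log);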
I'm developing a Firefox extension and would like to track its use with google analytics, but I can't get it working.
I've tried manually calling a function from ga.js, but that didn't work for some reason. No error was produced, but neither was any data collected.
My last attempt was to have a website that just holds the tracking JavaScript and then load it within the extension in an iframe, with the URL configured so it contains meaningful data. This way the analytics data gets collected when I visit said webpage with a browser, but not from within the extension. I've tried putting some visible JavaScript on the site and have confirmed that the site's JavaScript is executing. This method also works with other trackers, but I don't like their output and would prefer Google Analytics.
Any ideas what else I could try to accomplish this?
The solution is to use Remy Sharp's mini library for tracking bookmarklets and extensions with Google Analytics. Works like a charm.
Usage is as simple as:
gaTrack('UA-123456', 'yoursite.com', '/js/script.js');
Note that, since it doesn't use cookies, there's no differentiation between pageviews and visits, or for that matter, between visits and visitors. But, the rest of the functionality is fairly reliable.
Depending on what you want to track, you may not need Google Analytics. Mozilla's addon.mozilla.org portal already provides comprehensive tracking and usage statistics for add-ons.
To check whether Mozilla provides what you need, go to the Statistics Dashboard and look at the statistics for one of the publicly available add-ons.
Here is a small library to proxy the requests through an iframe hosted on another server: https://github.com/yelloroadie/google_analytics_proxy
This gets around the bug in the add-on sdk that causes ga.js to die (https://bugzilla.mozilla.org/show_bug.cgi?id=785914).
This method allows full use of Google Analytics, unlike the limited use offered by Remy Sharp's library.
I don't think this is possible. Firefox extensions don't allow you to load pages from other servers. So the only way I can think of is to have an invisible iframe load up the code. The pings to Google's servers need to be from a domain belonging to you. So I guess your own servers have to serve up pages every time a user loads the extension, which just kills your server and defeats the purpose of Google doing all the work!! Please post if you have found a way around it. Chrome extensions can be tracked easily!
For using analytics in the main/background script you might want to use this solution:
https://stackoverflow.com/a/17430194/193017
Citing part of the answer:
I would suggest you take a look at the new Measurement Protocol in Universal Analytics:
https://developers.google.com/analytics/devguides/collection/protocol/v1/
This allows you to use XHR POST to simply send GA events directly.
This will coexist much better with Firefox extensions.
The code would look something like this:
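The snippet itself is not reproduced in the quote above; a minimal sketch of such a Measurement Protocol event hit might look like this (the tracking ID and client ID are placeholders):

// Send an event hit straight to Universal Analytics' Measurement Protocol endpoint.
function trackEvent(category, action) {
  var payload = [
    'v=1',                                        // protocol version
    'tid=UA-XXXXXXX-Y',                           // placeholder tracking ID
    'cid=35009a79-1a05-49d7-b876-2b884d0f825b',   // placeholder anonymous client ID
    't=event',                                    // hit type
    'ec=' + encodeURIComponent(category),
    'ea=' + encodeURIComponent(action)
  ].join('&');

  var xhr = new XMLHttpRequest();
  xhr.open('POST', 'https://www.google-analytics.com/collect', true);
  xhr.send(payload);
}

trackEvent('extension', 'startup');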