We have a web app whose content is generated by JavaScript. Can Google index those pages?
Whenever we investigate this issue, we only find solutions on old pages about using "#!" in links.
In our app the links are like this:
domain.com/paris
domain.com/london
When we use these kinds of links, JavaScript populates the content.
Is it wise to use HTML snapshots, or do you have any other suggestions?
Short answer
Yes, they can crawl JavaScript-generated content, as long as you are using pushState.
Detailed answer
It depends on your setup. Google and Bing CAN crawl JavaScript- and AJAX-based content if you are using pushState. If you do, they will handle content coming from AJAX calls, updates to the page title or meta tags made with JavaScript, and in general anything of that kind.
Most frontend frameworks like Angular, Ember, or Backbone already work with pushState, so in those cases you don't need to do anything. Check whatever system you are using to see how it handles things. If you are not using pushState, you will need to implement it yourself or use the whole escaped_fragment HTML snapshot deal.
So if you use pushState, then yes, search engines can crawl your page just fine. If you don't, then no; you will need to implement pushState or serve HTML snapshots.
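As a rough sketch of what pushState-based routing can look like (the selectors and the "#content fragment" convention are my own assumptions, not something from your app):

// progressively enhance normal links like <a class="city-link" href="/paris">Paris</a>
$('a.city-link').click(function () {
    var url = $(this).attr('href');
    // load the #content fragment of the target page without a full reload
    $('#content').load(url + ' #content');
    // update the address bar so the clean URL is what gets shared and indexed
    history.pushState({ url: url }, '', url);
    return false;
});

// handle the browser back/forward buttons
$(window).on('popstate', function (event) {
    var state = event.originalEvent.state;
    if (state && state.url) {
        $('#content').load(state.url + ' #content');
    }
});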
Bonus info - unfortunately Facebook does not handle pushState, so the Facebook crawler needs either non-dynamic og tags or HTML snapshots.
"Generated by JavaScript" is ambiguous. It could mean that you are running a JS script on the server, or it could mean that you are making an AJAX call with a JS API. The difference appears to matter as far as Googlebot is concerned. But you don't have to take my word for it: there is empirical proof of what Googlebot will and won't currently cache as far as JavaScript content goes, in the form of live experiments using both the XMLHttpRequest API and the Fetch API. So, as you can see, server-side rendering is still going to be the best way to go for SEO.
Related
I want to rebuild an old website made in plain HTML and add some extra functionality with AngularJS. But since I plan to use ng-view to render templates into my main layout, is it still possible for search engines to find the content of these subpages?
In a general sense, this is not an Angular problem - it's the same problem as with any single-page site that uses JavaScript to generate its HTML.
The general solution is to detect when a crawler, rather than a person, is accessing your page (usually via the user agent string), and then use server-side logic to render pages that are suitable for the crawler to process.
Here is one article that discusses this problem:
http://www.webdesignerdepot.com/2013/10/how-to-optimize-single-page-sites-for-search-engines/
but googling (or searching this site) for "google seo single page app" will give you lots of other ideas.
What is the best thing to do when a user doesn't have JavaScript enabled? What is the best way to deliver content to that kind of user? What is the best way to keep a site readable by search engines?
I can think of two ways to achieve this, but do not know what is better (or if a 3rd option is better):
Rely on the meta refresh tag to redirect users to a non-JavaScript version of the site. Wrap the meta refresh tag in a noscript tag so it will be ignored by those with JavaScript (a rough sketch of this follows below).
Rely on an iframe tag located within the body tag to deliver a non-JavaScript version of the site. Wrap the iframe tag in a noscript tag so it will be ignored by those with JavaScript.
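For illustration, a minimal sketch of the first approach, assuming a hypothetical no-script-version.html page:

<head>
    <noscript>
        <!-- only browsers without JavaScript act on this redirect -->
        <meta http-equiv="refresh" content="0; url=/no-script-version.html">
    </noscript>
</head>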
I would also appreciate high-profile examples of the correct or incorrect way to do this.
--------- ADDITION TO QUESTION -----------
Here is an example of what I have done in the past to address this: http://photocontest.highpoint.edu/
I want to make sure there aren't better ways to do this.
You are talking about graceful degradation: designing and building the site to work with JavaScript, then making the site still work with JavaScript turned off. The easiest thing to do is to include an HTML noscript tag somewhere near the top of your page with a message saying that the site REQUIRES JavaScript or things won't work right. SO is a perfect example of this: most of the buttons at the top of the screen run via JavaScript; turn it off and you get a nice red banner and the drop-down JS effects are gone.
I prefer progressive enhancement development. Get the site working in its entirety without JavaScript / Flash / CSS3 / whatever, THEN enhance it bit by bit (still include the noscript tag) to improve the user experience. This ensures you have a fully working, readable website regardless of whether the visitor is a disabled user with a screen reader or a search engine, whilst providing a good user experience for users with newer browsers.
Bottom line: for any dynamically generated content (for example, page elements generated via AJAX) there has to be a static alternative where that content is available via a standard link. If you are using JavaScript for tabbed content, make sure all the content is shown in a way that is consistent with the rest of the webpage (a rough sketch of this follows below).
An example is http://www.bbc.co.uk/news/ - turn off JavaScript and you still have a full page of written content, pictures, links, etc.; turn it on and you get scrolling news stories, tabbed content, scrolling pictures, and so on.
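For the tabbed-content point, a rough sketch (the markup and class names are made up for illustration): all panels are plain HTML in the page, and a bit of jQuery layers the tab behaviour on top.

<div class="tabs">
    <h2>Reviews</h2>
    <div class="tab-panel">...full reviews content, crawlable as-is...</div>
    <h2>Specifications</h2>
    <div class="tab-panel">...full specifications content, crawlable as-is...</div>
</div>

<script>
// without JavaScript the headings and panels simply read top to bottom;
// with JavaScript we show one panel at a time and let the headings act as tabs
$('.tabs .tab-panel').hide().first().show();
$('.tabs h2').click(function () {
    $('.tabs .tab-panel').hide();
    $(this).next('.tab-panel').show();
});
</script>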
I'm going to be naughty and post links to wikipedia:
Progressive Enhancement
Graceful Degradation
You have another option: just load the same page but make it work for noscript users (progressive enhancement / graceful degradation).
A simple example:
Say you want to load content into a div with AJAX: make an <a> tag linking to the full page with the new content (noscript behaviour), then bind the <a> tag with jQuery to intercept clicks and load the content with AJAX instead (script behaviour).
$('a.ajax').click(function () {
    var anchor = $(this);
    // load only the #content fragment of the linked page into our own #content div
    $('#content').load(anchor.attr('href') + ' #content');
    // prevent the default full-page navigation (the noscript fallback)
    return false;
});
I'm not entirely sure if progressive enhancement is considered best practice these days, but it's the approach I personally favour. In this case you write your server-side code so that it functions like a standard web 1.0 app (no JavaScript), providing at least enough functionality for the system to work without JavaScript. You then start layering JavaScript functionality on top of this to make the system more user-friendly. If done properly, you should end up with a web app that at least provides enough functionality to be useful for non-JavaScript users.
A related process is known as graceful degradation, which works in a similar way but starts with the assumption that a user has JavaScript enabled and builds in workarounds for cases where they don't. This has a drawback, however: if you overlook something, you can leave a non-JavaScript user without a fallback.
Progressive enhancement example for a search page: build your search page so that it normally returns an HTML page of search results, but also add a flag that can be set via GET so that, when set, the page returns XML or JSON instead. On the search page, include a script that makes an AJAX request to the search page with the flag appended to the query string and then replaces the main content of the page with the result of the AJAX call. JavaScript users get the benefit of AJAX, but those without JavaScript still get a workable search page.
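A rough sketch of that idea, assuming a hypothetical /search endpoint that returns just an HTML fragment when format=fragment is appended (the flag and element ids are made up):

$('#search-form').submit(function () {
    var query = $(this).serialize();
    // ask the same search page for only the results fragment
    $.get('/search?' + query + '&format=fragment', function (html) {
        $('#results').html(html);
    });
    // stop the normal (non-JavaScript) form submission
    return false;
});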
http://en.wikipedia.org/wiki/Progressive_enhancement
If your application must have JavaScript to function, then there's nothing you can do except show those users a polite message in a noscript tag.
Otherwise, you should be thinking the other way around.
Build your site without JS
Give an awesome user experience and make it fully functional
Add JS and make the UX even more functional. Layer the JS on top.
So if the user doesn't have JS, your site still works as it did at step two.
As for crawling: if your site depends on AJAX and a lot of JS to work, you can make Google aware of it: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
One quick tip that may help you: install Lynx, a command-line web browser, and you'll immediately see how Google and other search engines see your site (and how blind users do, too). This is very useful. Of course, in a command-line window there are no graphics and JavaScript is disabled.
If you're doing "serious" Ajax (e.g. client-side routing), the following technique could be useful:
Use URLs without GET/"?" parameters (it makes your life easier later on)
Use http://baseurl.com/#!/path/to/resource for client-side routing
Implement rendering of a non-script HTML version of your site (an "HTML snapshot", as Google calls it) at http://baseurl.com/path/to/resource
Wrap the whole content of your HTML snapshot in noscript tags and redirect via top.location.href to the full version of the site (see the sketch after this list)
Handle http://baseurl.com/?_escaped_fragment_=/path/to/resource - it should redirect with a 301 response to http://baseurl.com/path/to/resource
Use a tags only for GET links; use forms for POST/PUT/DELETE links - unstyle the hell out of them if necessary
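A rough sketch of the snapshot page served at http://baseurl.com/path/to/resource (purely illustrative markup):

<html>
<head>
    <script>
        // JavaScript-capable visitors get bounced to the client-side-routed version
        top.location.href = 'http://baseurl.com/#!/path/to/resource';
    </script>
</head>
<body>
    <noscript>
        <!-- crawlers and non-JS users see the full, pre-rendered content here -->
        <h1>Resource title</h1>
        <p>Full static content of the resource...</p>
    </noscript>
</body>
</html>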
A nice code example for links that I found while researching "how to write proper Ajax code":
Resource
This is of course a pretty complex solution, but it should give you both SEO (including non-search-engine crawlers) and accessibility. The catch is that you have to be able to render your page both server- AND client-side.
One solution could be to use a templating framework like Mustache, for which implementations exist on many platforms.
Use something like {{#pagelet}}/path/to/partial{{/pagelet}} for dynamic parts of your page - example: {{#pagelet}}/image/{{image_id}}/preview{{/pagelet}}
In your client-side rendering, pagelet would be implemented so that it is dynamically replaced with content loaded via Ajax (see the sketch below).
In your server-side rendering, pagelet would just be rendered directly (if in doubt, just curl the pagelet and render it right away - or, if you can write the code asynchronously, do it just as you would client-side: write a temporary span into a buffer, start fetching all the pagelets, replace the temporary spans as the pagelets arrive, and flush the buffer once all pagelets have been rendered).
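A rough client-side sketch using mustache.js section lambdas (the data-url placeholder scheme and the jQuery load are my own assumptions, not part of the original answer):

var view = {
    image_id: 42,
    pagelet: function () {
        // mustache.js section lambda: receives the raw section text and a render() helper
        return function (text, render) {
            var url = render(text);   // e.g. "/image/42/preview"
            // emit a temporary placeholder; the Ajax pass below fills it in
            return '<span class="pagelet" data-url="' + url + '"></span>';
        };
    }
};

// render the template, put it in the page, then resolve all pagelets via Ajax
$('#content').html(Mustache.render(
    '<p>Preview: {{#pagelet}}/image/{{image_id}}/preview{{/pagelet}}</p>', view));
$('#content .pagelet').each(function () {
    $(this).load($(this).data('url'));
});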
That's the best general design I found so far. You can deep link into your app, it's search engine friendly and it should force you to build a page that gracefully degrades.
P.S.: One advantage of the techniques described above is that both the Ajax and the "Web 1.0" rendering of a page can profit from memcached caching of whole pagelets.
I would prefer to code the page without JavaScript and then, if JavaScript is enabled, redirect users to a similar page that uses it (the same concept as progressive enhancement).
redirecting with javascript
I noticed that, like Gmail, Facebook's source code shows nothing but JavaScript. Why do they use JS to write the page?
This allows them to render pages extremely fast: they load some JavaScript to render everything on the screen and then load the rest.
They call it BigPipe. You can read more here: http://www.facebook.com/note.php?note_id=389414033919
Pretty interesting reading.
Because their pages are extremely dynamic; most of the content has to be constructed dynamically.
All their content is populated using AJAX, giving it a dynamic, desktop-like look and feel (e.g. the instant messaging features).
Because AJAX (Asynchronous JavaScript and XML) adds dynamic features to web pages: multiple parts of a single page can work or load simultaneously, which gives great flexibility and speed to how pages load and behave.
I've created a pretty basic system here at work that does what Google Analytics does (extremely simplistic in comparison) and it works quite well, but like Google Analytics it requires each page to reference a JavaScript file. Is there any way to make all of our pages that are served from IIS reference this JavaScript file? I would like to capture these stats for every page.
Any ideas?
Thanks
Hmm, it looks like you are looking for this.
If you're dealing with static HTML files your best bet seems to be this previous question.
If you have an ASP site going, and you already have a header or layout file, I'd recommend putting it in there.
This depends on how you build your web site, but most people do this by adding the reference to their templates, layouts, master pages, or whatever term is used in your development platform.
You don't want every page tracked - e.g. pages returning data such as JSON or XML should not be meddled with. This is why it is better to have explicit control over which pages get the analytics JavaScript added to them.
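For instance, in a classic ASP.NET Web Forms site the reference might live in the master page, so only pages that use that layout (i.e. regular HTML pages, not JSON/XML endpoints) pick it up. A sketch with hypothetical file names:

<%@ Master Language="C#" %>
<html>
<head runat="server">
    <!-- every page using this master page automatically references the tracking script -->
    <script type="text/javascript" src="/scripts/analytics.js"></script>
    <asp:ContentPlaceHolder ID="HeadContent" runat="server" />
</head>
<body>
    <asp:ContentPlaceHolder ID="MainContent" runat="server" />
</body>
</html>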
I had read that SEO works for static websites, which hold their information in the initial page itself. Is it possible to get search engines to index dynamically added information?
I use AJAX to load information. In this situation, how can I optimize the site for search engines?
You have to make all your content accessible without JavaScript (i.e. without AJAX). Otherwise the search engine spiders cannot index your content.
The proper way to use JavaScript and AJAX is to first code your pages and deliver content without JavaScript. All content should show in a logically organized manner. Once this is done, you can use JS/AJAX to provide superior usability to the visitors who have JS enabled.
That will benefit all your users, with JavaScript enabled or disabled, and the search engines.
As long as each page has a unique URL (either via URL rewriting or query string parameters) and uses that URL to drive the content being displayed, SEO will work.
I've done this a number of times in the past.
Ensure that your content is accessible to clients without JavaScript. You may have JavaScript on your pages that changes the content based on the URL.
I don't really know much about this, but IMHO using semantic markup and submitting a sitemap to Google helps a lot.
You can create a website that uses AJAX and is search-engine compatible, but it must be built so that the same information can be accessed without AJAX through the same URL. Search engines cannot execute JavaScript, and as such any content only available through JavaScript will be inaccessible to them.
You need to either provide this content within a <noscript> tag, or include it in the page by default and have the JavaScript hide it for your AJAX version.
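A rough sketch of the second option (the element ids and the endpoint are made up):

<div id="full-content">
    <!-- complete, crawlable content rendered by the server -->
    <h2>Product details</h2>
    <p>All the details a crawler (or a non-JS visitor) should see...</p>
</div>
<div id="ajax-content"></div>

<script>
// browsers with JavaScript hide the static block and swap in the AJAX-driven version
$('#full-content').hide();
$('#ajax-content').load('/product/123/details');
</script>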
You cannot deliver a different page to a search engine such as Google, because it will generally crawl a page both as its bot and while masquerading as a user, sending a user-agent string purporting to be, say, Internet Explorer. This is their way of ensuring you're not trying to game the search engines and that they're seeing the same content as a regular user.
To solve this problem I have created a sitemap of the site.
For example, in my sitemap I have
www.example.com/level_one/level_two/page1.html,
www.example.com/level_one/level_two/page2.html,
...
So the crawlers (Google, Yahoo, Bing, etc.) know what to look for, but when a user goes to www.example.com they always get the pure AJAX site.
So you need to make the pages in the sitemap accessible as if they were a static site.
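For reference, a minimal sitemap file for those URLs could look like this (a sketch using the standard sitemap format):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.example.com/level_one/level_two/page1.html</loc>
    </url>
    <url>
        <loc>http://www.example.com/level_one/level_two/page2.html</loc>
    </url>
</urlset>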
Another way to solve this (more work) is to make the pages work without JavaScript, and then, if the user can execute JavaScript, rewrite all the hrefs to "#" (for example).
Please check: http://www.mattcutts.com/blog/give-each-store-a-url/
SEO is ultimately based on having a good site.
Things that will help you are links from other "good sites", and having descriptive, friendly URLs, page titles, and H1 headings.
Submitting sitemaps to Google and using their webmaster tools is a great starting place.