Use AngularJS on website and still get indexed by search engines

Use AngularJS on website and still get indexed by search engines - javascript

I want to rebuild an old website made on plain HTML and add some extra functionality with AngulaJS. But since I plan to use ng-views to render templates on my main layout, is it possible to make search engines still find the templates of these subpages?

In a general sense, this is not an angular problem - its the same problem with any single page site that uses javascript to generate your html.
The general solution would be to detect when it is a crawler accessing your page instead of a person (usually by using the query agent string), and then use server side logic to render pages that are suitable for the crawler to process.
Here is one article that discusses this problem:
http://www.webdesignerdepot.com/2013/10/how-to-optimize-single-page-sites-for-search-engines/
but google (or searching this site) for "google seo single page app" will give you lots of other ideas.

Related

Multi language on static website with i18next client side

I made a website with HTML, CSS and Javascript without using a CMS, so it's all kind of static. I've been asked to make this project into a multi language website (the current language and English). Now I am looking for a good way to implement all the translations.
I've seen a simple solution with i18next using client-side Javascript, but I'm wondering if this isn't harmful for SEO (the url doesn't change when selecting another language) and if there is a better solution for this.
https://github.com/dwkns/i18next-translation-tutorial

Hidden content may not be read by the search engines. It's better to have two versions of entire site in subfolders.
If you really want to do it with javascript, try to put both content visible in html (users with no javascript should see both) and then with javascript hide one of them after page loading. But i suggest the first approach, it's more realiable.
source about google do this: https://www.freshegg.co.uk/blog/technical-seo/google/how-does-google-treat-hidden-content

No issue with that. Google and other search engines run javascript as user browsers. Just add some links so google could discover the pages in other language (not only have autodetection)
The only drawback you have currently are routes are not translated but that could be solved, eg. https://github.com/i18next/i18next-express-middleware#add-localized-routes

Can Googlebot crawl javascript generated content?

We have a web app that its content generated by javascript. Can google index those pages?
When we investigate this issue we always found solutions from old pages about using "#!" in links.
In our app the links are like this:
domain.com/paris
domain.com/london
When we use these kind of links, javascript populates content.
Is it wise to use HTML snapshot or do you have any other suggestions?

Short answer
Yes they can crawl JavaScript generated content, as long as you are using pushstates.
Detailed answer
It depends on your setup. Google and Bing CAN crawl javascript and AJAX based content if your are using pushstates. If you do they will handle content coming from AJAX calls, updates to page title or meta tags using javascript, and in general any such things.
Most frontend frameworks like Angular, Ember or Backbone already works with pushstates so in these cases you don't need to do anything. Check whatever system you are using to see how they do things. If you are not using pushstates you will need to implement it on your own or use the whole escapted_fragment html snapshot deal.
So if you use pushstate then yes, search engines can crawl your page just fine. If you don't then no, you will need to implement pushstates or do HTML snapshots.
Bonus info - Unfortunately Facebook does not handle pushstates, so the facebook crawler needs either non-dynamic og-tags or HTML snapshots.

"Generated by JavaScript" is ambiguous. That could mean that you are running a JS script on the server or it could mean that you are making an AJAX call with a JS API. The difference appears to matter as far as Googlebot is concerned. But you don't have to take my word for it, as there is empirical proof of what Googlebot will and won't currently cache as far as JavaScript content in the form of live experiments using both the XMLHTTPRequest API and the Fetch API. So, as you can see, server-side rendering is still going to be the best way to go for SEO.

Grabbing HTML from another page

I have two HTML files: One acts as a template, supplying the navigation, sidebars, etc., and the other has the main content. I'm trying to figure out how best to insert the content into the template. I need persistent URLs, so my plan was to have the content page essentially replace itself with the template, plugging the text back into the resulting page. I'm really new to front-end programming and I'm suspicious that this may be an anti-pattern, so my first question is about whether I'm barking up the right tree. The problem seems universal, and I'm sure there must be a best practice, though I haven't yet seen it discussed. If this is an acceptable way to proceed, then what JavaScript function would allow me to access the HTML of two different pages at the same time?
[EDIT: It's a small page on GitHub]

Do not do this. At current implementation HTML is not designed to be template engine. You can use HTML import but it has not full support in browsers. (compatibility table).
Usually this problem can be solved with:
Use frontend framework. Libraries like angular.js or polymer.js (and so on) usually has support of importing HTML documents in different forms.
Build your application HTML. Task runners like grunt.js usually has plugin that includes HTML.
Use server side technologies to extend your HTML from base layouts
If your application have to be consisted from different HTMLs I recommend you to try polymer. It is polyfill for web components and designed to work in such way by default.
UPD:
About edit to your question. It seems like you just need template engine for HTML. You can google for it. I use nunjucks - javascript port of python's template engine jinja2. It is lightweight, simple and can be compiled right in browser.
Another way is to use special tools for building static web pages. You have mentioned that your page is blog build from simple HTML pages. Try to use pelican. It is the static websites (blogs) generator. It is simple and fast. You can build your HTML even on your local machine and just push your HTML pages to github.

Web Development Best Practices - How to Support Javascript Disabled

What is the best thing to do when a user doesn't have JavaScript enabled? What is the best way to deliver content to that kind of user? What is the best way to keep a site readable by search engines?
I can think of two ways to achieve this, but do not know what is better (or if a 3rd option is better):
Rely on the meta-refresh tag to redirect users to a non-javascript version of site. Wrap the meta-refresh tag in a noscript tag so it will be ignored by those with javascript.
Rely on an iframe tag located within the body tag to deliver a non-javascript version of site. Wrap the iframe tag in a a noscript tag so it will be ignored by those with javascript.
I would also appreciate high-profile examples of the correct or incorrect way to do this.
--------- ADDITION TO QUESTION -----------
Here is an example of what I have done in the past to address this: http://photocontest.highpoint.edu/
I want to make sure there aren't better ways to do this.

You are talking about graceful degradation: Designing and making the site to work with javascript, then making the site still work with javascript turned off. The easiest thing to do is include the html "noscript" tag somewhere near the top of your page that gives a message saying that the site REQUIRES javascript or things won't work right. SO is a perfect example of this. Most of the buttons at the top of the screen run via javascript. Turn it off and you get a nice red banner and the drop down js effects are gone.
I prefer progressive enhancement development. Get the site working in it's entirety without javascript / flash / css3 / whatever, THEN enhance it bit by bit (still include the noscript tag) to improve the user experience. This ensures you have a fully working, readable website regardless if you're a disabled user with a screen reader or search engine, whilst providing a good user experience for users with newer browsers.
Bottom line: for any dynamically generated content (for example page elements generated via AJAX) there has to be a static page alternative where this content must be available via a standard link. If you are using javascript for tabbed content, then show all the content in a way that is consistent with the rest of the webpage.
An example is http://www.bbc.co.uk/news/ Turn off javascript and you have a full page of written content, pictures, links etc. Turn on javascript and you get scrolling news stories, tabbed content, scrolling pictures and so on.
I'm going to be naughty and post links to wikipedia:
Progressive Enhancement
Graceful Degredation

You have another option, just load the same page but make it work for noscript users (progressive enhancement/gracefull degradation).
A simple example:
You want to load content into a div with ajax, make an <a> tag linking to the full page with the new content (noscript behavior) and bind the <a> tag with jQuery to intercept clicks and load with ajax (script behavior).
$('a.ajax').click(function(){
var anchor = $(this);
$('#content').load(anchor.attr('href') + ' #content');
return false;
});

I'm not entirely sure if Progressive Enhancement is considered to be best practice these days but it's the approach I personally favour. In this case you write your server side code so that it functions like a standard web 1.0 web app (no JavaScript) to provide at least enough functionality for the system to work without JavaScript. You then start layering JavaScript functionality on top of this to make the system more user friendly. If done properly you should end up with a web app that at least provides enough functionality to be useful for non-JavaScript users.
A related process is known as Graceful Degradation, which works in a similar way but starts with the assumption that a user has JavaScript enabled and build in workarounds for cases where they don/t. This has a drawback, however, in that if you overlook something you can leave a non-JavaScript user without a fallback.
Progressive Enhancement example for a search page: Build your search page so that it normally just returns a HTML page of search results, but also add a flag that can be set via GET that when set, it returns XML or JSON instead. On the search page, include a script that does an AJAX request to the search page with the flag appended onto the query string and then replaces the main content of the page with the result of the AJAX call. JavaScript users get the benefit of AJAX but those without JavaScript still get a workable search page.
http://en.wikipedia.org/wiki/Progressive_enhancement

If your application must have javascript to function then there's nothing you can do except show them a polite message in a noscript tag.
Otherwise, you should be thinking the other way around.
Build your site without JS
Give awesome user experience and make it full functional
Add JS and make the UX even more functional. Layer the JS on top.
So if the user doesn't have JS, your site will still revert to step two of your site state.
As for crawling. If your site depends on AJAX and a lot of JS to work, you can make gogole aware of it : http://code.google.com/web/ajaxcrawling/docs/getting-started.html

One quick tip that may help you: just install lynx, a command-line web browser, and you'll immediately see how google and other seo see your site (and blind people too). This is very useful. Of course, in a command line windows, there's no graphics and javascript is disabled.

If you're doing "serious" Ajax (e.g. client side-routing) the following technique could be useful:
Use Urls without GET/"?"-parameters (it makes your life easier later on)
Use http://baseurl.com/#!/path/to/resource for client side-routing
Implement rendering of non-script HTML-version of your site (HTML snapshot is what Google calls it) at http://baseurl.com/path/to/resource
Wrap the whole content of your HTML snapshot in noscript-tags and redirect via top.location.href to the full version of the site
Handle http://baseurl.com/?_escaped_fragment=/path/to/resource - it should redirect via 301-response to http://baseurl.com/path/to/resource
Use a-tags only for GET-links, use forms for POST/PUT/DELETE-links - unstyle the hell out of them if necessary
A nice example code for links I found while researching "How to write proper Ajax-code":
Resource
This is of course a pretty complex solution but it should enable both SEO (including non-search engine crawlers) and accessibility. The problem is that you have to be able to render your page server- AND client side.
One solution could be to use a templating framework like mustache where implementations for different platforms exist.
Use something like {{#pagelet}}/path/to/partial{{/pagelet}} for dynamic parts of your page - example: {{#pagelet}}/image/{{image_id}}/preview{{/pagelet}}
In your client-side rendering, pagelet would be implemented to be dynamically replaced with something loaded via Ajax (for example: render )
In your server-side rendering, pagelet would just be rendered directly (in doubt just curl the pagelet and render it right away - or if you can write the code asynchronously do it just as you would do it client side: write some temporary span into a buffer, start fetching all the pagelets, replace the temporary spans as the pagelets arrive and flush the buffer once all pagelets have been rendered.
That's the best general design I found so far. You can deep link into your app, it's search engine friendly and it should force you to build a page that gracefully degrades.
P.S.: One advantage of the techniques described above is that both the Ajax- and the "Web 1.0"-rendering of a page could profit from memcached-caching of whole pagelets.

I would prefer to code the page without javascript and then if javascript is enabled, we redirect users to a similar page with javascript. (same concept as progressive enhancement)
redirecting with javascript

SEO for dynamic website powered by JavaScript and AJAX

I had read that SEO is applicable for static website, which holding the information in the initial page itself. Is it possible to get search engines to index the dynamically added information?
I used AJAX for loading information. In this situation how can I optimize a site for search engines?

You have to make all your content accessible without javascript (ie. ajax). Otherwise the search engine spiders cannot index your content.

The proper way to use javascript and Ajax is to first code your pages and delivery content without javascript. All content should show in a logically organized manner. Once this is done you can use JS/Ajax to provide superior usability to the visitors who have JS enabled.
That will benefit all your users, javascript enabled and disabled, and the search engines.

As long as each page has a unique URL (either by url rewriting or by query string parameters) and uses that to drive the content being displayed SEO will work.
I've done this a number of times in the past.

Ensure that your content is accessible to clients without JavaScript. You may have JavaScript on your pages that changes the content based on the URL.

I don't really know about this, but IMHO, using semantic markups and submitting sitemap to Google helps a lot.

You can create a website that has AJAX and is search engine compatible but it must be created such that the same information can be accessed without AJAX through the same URL. Search engine cannot execute Javascript and as such any content only available through Javascript will be inaccessible to the search engine.
You need to either provide this content within the <noscript> tag or within the page by default and have the Javascript hide it for your AJAX version.
You cannot deliver a different page to a search engine such as Google as they will generally crawl a page both as their bot but also masking as a user by sending a user-agent string purporting to be say, Internet Explorer. This is their method for ensuring you're not trying to game the search engines and they're seeing the same content as a regular user.

To solve this problem I have create a sitemap of the site.
For example, in my sitemap I have
www.example.com/level_one/level_two/page1.html,
www.example.com/level_one/level_two/page2.html,
...
So the crawlers (Google, yahoo, Bing, etc) knows what to look for. But when an user goes to www.example.com always use pure AJAX site.
So you need to acces the pages in the sitemap like a static site.
Other way to solve this (more work) is to make page compatible without JavaScript, so if the user can execute JavaScript you rewrite all href to "#" (for example)
Please check: http://www.mattcutts.com/blog/give-each-store-a-url/

SEO ultimately is based on have a good site.
Things that will help you are links from other "good sites", Having descriptive friendly, URLS, page titles and H1 headings
submitting sitemaps to google and using there webmaster tools is a great starting place

We Keep Coding

JavaScript is the programming language of the Web.