SEO and AngularJS - javascript

I am building a web application with AngularJS, and I need it to be fully SEO-optimized.
It seems that, according to a few sources, Google is able to crawl Angular apps, but it is not clear whether it can read everything (e.g., I have read that directives cannot be interpreted).
So, as of today, is an Angular app fully Google-compatible even if we use the full set of JS features? What about other engines? Do we still need PhantomJS static generation for Yahoo or Microsoft?

As of today, AngularJS is still not fully SEO-optimized. Thus, a lot of developers still find it necessary to use services like Prerender.io or to cook up their own implementation using PhantomJS.
Great strides have been made towards SEO-friendliness, especially with Angular 2.0, since it allows you to render HTML on the server side.
However, even with the new additions it's better to retain control over what the search engines are seeing by creating pre-rendered versions of your site.
Here is a great article about this very topic: https://builtvisible.com/javascript-framework-seo/
To quote some parts of it:
Google can render and crawl JS sites. This, however, is no guarantee that the outcome will be an SEO-friendly, perfectly optimised site! Technical SEO expertise is needed, especially during testing. This will ensure that you have an SEO-friendly site architecture, targeting the right keywords. A site that continues to rank despite the changes made in Google's crawl.
In my opinion, it’s better to retain tighter control of what’s being rendered by serving Google a pre-rendered, flat version of your site. This way, all the classic rules of SEO apply and it should be easier to detect and diagnose potential SEO issues during testing.
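To illustrate what serving a pre-rendered, flat version can look like in practice, here is a minimal sketch assuming a Node/Express backend and the prerender-node middleware; the token, port and directory names are placeholders, not part of the original answer:

```js
// server.js - hedged sketch: crawlers get flat, pre-rendered HTML; users get the SPA
var express = require('express');
var prerender = require('prerender-node');

var app = express();

// prerender-node checks the user agent (and ?_escaped_fragment_=) and, for
// crawlers, proxies the request to a Prerender service that returns rendered HTML.
app.use(prerender.set('prerenderToken', 'YOUR_PRERENDER_TOKEN'));
// For a self-hosted PhantomJS-based Prerender server instead of Prerender.io:
// app.use(prerender.set('prerenderServiceUrl', 'http://localhost:3000/'));

// Everyone else gets the normal AngularJS single-page app.
app.use(express.static(__dirname + '/public'));

app.listen(8080);
```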
As far as other search engines, it seems like they currently don't support javascript rendering yet: https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a
The technology used on your website can sometimes prevent Bingbot from being able to find your content. Rich media (Flash, JavaScript, etc.) can lead to Bing not being able to crawl through navigation, or not see content embedded in a webpage. To avoid any issue, you should consider implementing a down-level experience which includes the same content elements and links as your rich version does. This will allow anyone (Bingbot) without rich media enabled to see and interact with your website.

Related

Is search engine optimization on a mean.js full-stack JavaScript application still a major issue, and how should it be dealt with?

I am working on my first full-stack JavaScript application, using mean.js specifically as my starting point, and I have started to become nervous and somewhat confused about the issue of search engine optimization (SEO).
Have Google's recent efforts (within the last year or so) to improve JavaScript crawling rendered this a non-issue, or is this something I need to take into account in the planning and structuring of my project?
If Google can crawl AngularJS/Ajax heavy applications now, why are we getting blog posts about solutions to the SEO issue:
http://blog.meanjs.org/post/78474995741/mean-seo
Is this type of solution necessary?
Is it going to be as effective as server-side rendering in terms of SEO?
Are hash bang (#!) URLs a necessary evil or just plain evil?
I know questions about SEO and AngularJS have been asked before, but there seem to be so many differing opinions on this issue that I am lost, and it would be nice to have some thoughts that are more mean.js-specific. My major worries are:
If building an AngularJS-heavy implementation will make it an SEO black hole.
If I will end up building practically the whole project again in static files just for SEO.
If I need to be looking at a server side rendering solution.
If you are rendering the majority of your content using JavaScript, then yes, it becomes a search engine black hole. That's one of the big downsides of a thick-client application. If you need high visibility by search engines, it's a challenge. There is a middle ground.
You'll need a combination of server-side rendering and client-side rendering. When the page first loads, it should have all the visible content the user needs, or at least the content that appears "above the fold" (at the top of the page). Links should be descriptive and allow search engines to dive deeper into the site. Your site's main menu should be delivered with the web page as well, giving search engines something to bite into.
Content below the fold, or paginated content can be pulled in dynamically and rendered on the client using any JavaScript framework. This gives you a good mix of server side rendering to feed search engines, and the performance boost that pulling content in dynamically can offer.
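As a rough sketch of that middle ground (the /api/articles endpoint, the element IDs and the JSON shape are all assumptions): the server has already delivered the menu and the first page of articles as plain HTML, and this script only appends further pages on demand.

```js
// below-fold.js - hedged sketch: the initial HTML already contains the menu and
// the first page of articles; this only appends later pages when the user asks.
document.getElementById('load-more').addEventListener('click', function () {
  var list = document.getElementById('article-list');
  var nextPage = Number(list.getAttribute('data-next-page') || 2);

  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/api/articles?page=' + nextPage); // assumed endpoint
  xhr.onload = function () {
    JSON.parse(xhr.responseText).forEach(function (article) {
      var li = document.createElement('li');
      var a = document.createElement('a');
      a.href = article.url;        // real, crawlable URLs stay in the markup
      a.textContent = article.title;
      li.appendChild(a);
      list.appendChild(li);
    });
    list.setAttribute('data-next-page', nextPage + 1);
  };
  xhr.send();
});
```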
Well, you only need to worry about the public face of your site; you shouldn't be concerned about anything behind a log-in screen. To me, the snapshot-with-a-headless-browser approach using _escaped_fragment_ seems to be the way to go; it's the one that will consume the least time, and, as you saw with mean-seo, it isn't that hard to implement.
Take a look at this question; there are some answers about how to create links on your pages that are SEO-friendly. Almost all of the recent posts fit together with one another.
https://support.google.com/webmasters/answer/174992?hl=en
Also try registering at https://webmasters.stackexchange.com/ where you'll find more about SEO.
Just wanted to mention this npm package https://www.npmjs.com/package/mean-seo which uses PhantomJS to render a preview of your app and caches it on disk/Redis for whatever period you set.
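For a sense of what these packages do under the hood, here is a hedged sketch of the _escaped_fragment_ handshake in Express; the snapshots directory and the file naming are assumptions, and mean-seo and Prerender additionally render the snapshot with PhantomJS and cache it for you.

```js
// seo-snapshot.js - hedged sketch of serving pre-rendered snapshots to crawlers
var express = require('express');
var path = require('path');

var app = express();

app.use(function (req, res, next) {
  // Crawlers that support the AJAX-crawling scheme rewrite #! URLs to
  // ?_escaped_fragment_=..., so its presence marks a crawler request.
  var fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next();

  // Map the fragment to a snapshot previously rendered with PhantomJS,
  // e.g. /#!/products/42 -> snapshots/products-42.html (naming is assumed).
  var file = (fragment.replace(/^\//, '').replace(/\//g, '-') || 'index') + '.html';
  res.sendFile(path.join(__dirname, 'snapshots', file));
});

app.use(express.static(path.join(__dirname, 'public')));
app.listen(8080);
```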

Providing no-JS fallbacks for client-side MV* frameworks

I just recently got introduced to MV* frameworks and have taken a chance to try out Ember.js with the TodoMVC app tutorial they have on their site.
I was considering using Ember for one of my upcoming projects (a Ruby on Rails CRUD app, similar to Twitter in some of its functionality), but I'm still a bit confused, and before I make a final decision I would love it if somebody could clear up the following concerns:
Is it a good idea to use such an advanced framework as Ember for a medium-sized multi-page CRUD app? Will it improve development time and maintenance compared to an interactivity layer built with jQuery's DOM manipulation and AJAX capabilities? Or is using Ember (and the like) only good when developing complex single-page apps (e.g.: Grooveshark)?
Considering the app will be developed using Rails, and assuming Ember will be used, is it going to be possible to offer a fallback with basic functionality for browsers with JavaScript disabled and/or for search engine crawlers? Will it require code duplication or other dirty tricks? Do you know of any technique that can be used to achieve it?
Will it be possible to adapt the website for mobile browsing (using only CSS) with valid results, or will the overhead imposed by running Ember on the phone make it hard for the device to render the website in a way that keeps it responsive?
We're in the middle of a pretty big Ember project right now, so here are my thoughts on your questions.
We've found Ember to be really productive for creating rich UIs for our single page app, but I don't know that it's going to be that much more helpful if you're creating an app designed for traditional multi-page (viewing pages, submitting forms, etc) layout.
I think this is the clincher - Ember is completely JS-based, so if you need to support browsers without JS, you'd basically have to write a parallel application. If this is a hard requirement for your app, I think Ember (or any MV* JS framework) would be out of the question.
We've had very few performance issues on mobile - our site is fully responsive and renders on everything from Blackberries to the latest Chrome on desktop with good performance.
@Scott Rankin has addressed most of the concerns with going with the pure Ember approach. I'll add one quick way to make this decision.
Go with Ember/MVVM if the application is behind a login. Then you don't have to consider search engines, as the content is generally private and not supposed to be indexed.
For SEO you have to build at least part of your content such that it is indexable. A good example of this is the Discourse application. They use Ember but also send down some generated HTML slugs along with the app HTML, so that search engines can index them. You can read about their approach here.
We have a different approach, which can be seen as a fallback: we pre-render a static version of each page in the application (a daily scheduled task). This static version is stored on the server as an HTML file. Whenever we sniff a spider/robot user agent, we deliver that version.
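In code, that fallback can be a small piece of middleware in front of the application (a sketch only; the bot pattern and the snapshot layout are assumptions, and the HTML files themselves come from the daily pre-render job):

```js
// bot-fallback.js - hedged sketch of the "sniff the user agent" fallback (Express middleware)
var path = require('path');
var BOTS = /googlebot|bingbot|yandex|baiduspider|slurp/i; // assumed list, not exhaustive

module.exports = function serveSnapshotToBots(req, res, next) {
  if (!BOTS.test(req.headers['user-agent'] || '')) return next(); // humans get the SPA

  // Map the requested path to the HTML written by the daily pre-render task,
  // e.g. /products/42 -> snapshots/products/42.html (layout is an assumption).
  var file = req.path === '/' ? 'index.html' : req.path.slice(1) + '.html';
  res.sendFile(path.join(__dirname, 'snapshots', file));
};
```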

Creating a web application only using Javascript

One of the designs proposed for a web application I am working on suggests using JavaScript to generate all HTML content. Basically, to create a page in the application you would use a homemade JavaScript framework to build the dynamic HTML on the page. The pages of this application are very complex, and all of the HTML markup would be generated via JavaScript using our custom-built framework. The framework would essentially need a method to create each element on the page you wanted to produce.
What are some of the pros and cons of this approach?
The biggest con is that you must rely on the end user's client browser to correctly render all of your content. An out-of-date browser or an untested browser is likely to result in broken content or no content. This is distinct from, and more severe than, the problems those same browsers encounter with HTML & CSS they cannot correctly render. If the markup is supplied to the browser, it may incorrectly render the CSS, but at least the content will be accessible. Using a script to generate all the markup can easily result in no markup being generated.
Then there are the users who run without JavaScript, or with something like NoScript blocking most scripts. They'll not see any of your content either. Thirdly, your content will not be indexed by most search engines.
Addendum
Relating to developer skill sets, working strictly from a JavaScript framework could tromp on the web development division of labor somewhat, if you have such a division. Unless the framework is able to maintain a good separation between the generation of markup, CSS, & application script, your programmers may find themselves more deeply in the role of designer and content editor than they are accustomed to (if you have a division of labor between those aspects of development).
From the comments below, we learn that this is intended for an intranet application in a controlled browser environment. This moots the end-user testing issues mentioned above to some degree, but there is always a danger of a browser upgrade breaking application code in JavaScript.
I cannot think of any positive outcomes that would outweigh the potential negative outcomes (by my own judgement, anyway).
The strategy adopted by serious web sites is to start with basic HTML, then add CSS to tidy the layout, then add script to make the users' life easier by enhancing functionality (and not to add time wasting and annoying animations). That way you always have a fallback if something doesn't work or scripting fails for some reason. I deliberately left out the "add annoying advertisements" since they aren't part of the functional design.
To design the site, you should start by determining what it is that your website is supposed to do (i.e. the vision). Then set out some goals that achieve that vision. Then choose the most efficient design to deliver on the goals; at this point you should not yet have decided on the implementation technology.
Then choose the most appropriate technology and design based on reliability, maintainability, longevity and supportability. That will lead you to a detailed design and implementation.
If, after all that, the best option is a 100% scripted client, so be it. But the fact that very, very few web sites have chosen that architecture makes me very much doubt that it'll be the winner.

Pure JavaScript-based clients

I would like to know how powerful/viable JavaScript-only clients are, based on, say, GWT/GXT/Vaadin, when compared to DHTML clients such as those made with Wicket, Tapestry, Click, etc.
My boss has insisted on using GXT (due to its nice colors and theme) on a project that will most likely become very big, with lots of screens. I am against the idea of a JavaScript-only client, especially when the JavaScript is generated from Java code. I have tried to advise him that we use something like Wicket, whereby we construct the screens with HTML but put in AJAX where and when necessary.
How viable is such a JavaScript client? I understand that JavaScript was intended for minor web page enhancements, and not all browsers, especially on mobile devices, have complete support for JavaScript.
Yes, it is viable for certain applications. Consider Gmail, Google Docs and Google Maps as typical applications where this works, and is probably the most feasible approach.
Some rich UI JavaScript frameworks, such as Ext JS also rely on this technique.
I've built JavaScript-only web apps for ages.
First in SAP projects for big multinationals, and now on a new project: https://beebole-apps.com?demo
So yes, it is powerful and viable.
A JavaScript-only web app can be extremely powerful, and it's viable for certain applications, say, an instant-messenger web app.
You mentioned that there are lots of screens in your web app. One of the advantages of GWT/GXT is the fact that you can unit test your UI layer with JUnit. This is extra testing you can do on top of, say, Selenium. This is essential if you'd like to make UI testing a part of the continuous integration process, and, as the team grows, you'll definitely want to have tests around to make sure everything works (at least in theory).
However, if what your boss meant to do is to build an in-house, custom JavaScript engine using GWT's JavaScript Native Interface, then I'm not sure...
Another advantage of a GWT-like engine over Wicket is that you can rely on HTML code generation to produce standards-compliant (in theory) HTML code. With a framework like Wicket, it is hard to ensure that every single developer on the team authors good HTML code, especially when the team gets bigger.
Disclaimer: I'm a member of the Vaadin team.
Our Timeline demo is a good example of what can be achieved with Vaadin and GWT on the client side, but I think all of the options presented in this discussion are viable given enough time.
Since you are going to start a big project, you should build a simple proof-of-concept app with each of the relevant frameworks. If your PoC includes at least some of the more complex use cases, you'll probably be able to make a pretty informed choice based on the experience you gain while building them.
I urge you to at least evaluate Vaadin. With it you write only server-side Java code, and Vaadin will create a slick and professional browser UI for you. The client side can be easily extended using standard GWT (also pure Java), and there are no HTML templates, tag libraries or XML configuration involved at all. A Vaadin UI is fully Ajax'ed and lazy-loading out of the box, and it easily integrates with any server-side technology, e.g. Spring.
In addition to the development-model advantages, you get top-notch documentation, a bi-weekly update schedule, a very lively community filled with helpful experts, 100+ useful open-source add-ons, and a 10-year-old backing company with help on hand should you need it.

How do web crawlers handle JavaScript

Today a lot of content on the Internet is generated using JavaScript (specifically by background AJAX calls). I was wondering how web crawlers like Google handle it. Are they aware of JavaScript? Do they have a built-in JavaScript engine? Or do they simply ignore all JavaScript-generated content on the page (I guess quite unlikely)? Do people use specific techniques for getting content indexed which would otherwise only be available to a normal Internet user through background AJAX requests?
JavaScript is handled by both Bing and Google crawlers. Yahoo uses the Bing crawler data, so it should be handled as well. I didn't look into other search engines, so if you care about them, you should look them up.
Bing published guidance in March 2014 as to how to create JavaScript-based websites that work with their crawler (mostly related to pushState); their suggestions are good practices in general:
Avoid creating broken links with pushState
Avoid creating two different links that link to the same content with pushState
Avoid cloaking. (Here's an article Bing published about their cloaking detection in 2007)
Support browsers (and crawlers) that can't handle pushState (see the sketch after these lists).
Google later published guidance in May 2014 as to how to create JavaScript-based websites that work with their crawler, and their recommendations are also worth following:
Don't block the JavaScript (and CSS) in the robots.txt file.
Make sure you can handle the load of the crawlers.
It's a good idea to support browsers and crawlers that can't handle (or users and organizations that won't allow) JavaScript.
Tricky JavaScript that relies on arcane or specific features of the language might not work with the crawlers.
If your JavaScript removes content from the page, it might not get indexed.
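A hedged sketch of what "support browsers and crawlers that can't handle pushState" can look like in practice: every link keeps a real, crawlable href as its fallback, and script only upgrades the experience. The data-ajax attribute, the #content container and the partial-response convention are assumptions.

```js
// navigation.js - hedged sketch of pushState progressive enhancement
document.addEventListener('click', function (event) {
  var link = event.target.closest ? event.target.closest('a[data-ajax]') : null;
  if (!link || !window.history || !window.history.pushState) return; // fall back to full page loads

  event.preventDefault();
  history.pushState({}, '', link.href);    // keep a real URL in the address bar
  loadInto('#content', link.href);         // fetch the same URL a crawler would
});

window.addEventListener('popstate', function () {
  loadInto('#content', location.href);     // back/forward buttons keep working
});

function loadInto(selector, url) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest'); // server may return a partial
  xhr.onload = function () {
    document.querySelector(selector).innerHTML = xhr.responseText;
  };
  xhr.send();
}
```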
Most of them don't handle Javascript in any way. (At least, all the major search engines' crawlers don't.)
This is why it's still important to have your site gracefully handle navigation without Javascript.
I have tested this by putting pages on my site only reachable by Javascript and then observing their presence in search indexes.
Pages on my site which were reachable only by Javascript were subsequently indexed by Google.
The content was reached through JavaScript with a 'classic' technique of constructing a URL and setting window.location accordingly.
Precisely what Ben S said. And anyone accessing your site with Lynx won't execute JavaScript either. If your site is intended for general public use, it should generally be usable without JavaScript.
Also, related: if there are pages that you would want a search engine to find, and which would normally arise only from JavaScript, you might consider generating static versions of them, reachable by a crawlable site map, where these static pages use JavaScript to load the current version when hit by a JavaScript-enabled browser (in case a human with a browser follows your site map). The search engine will see the static form of the page, and can index it.
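A minimal sketch of that static escape hatch (the data-live-url attribute is an assumption): the pre-generated static page carries the indexable content, and this snippet, included only on those static pages, forwards JavaScript-enabled browsers to the current, dynamic version.

```js
// static-page.js - hedged sketch, included only on the pre-generated static pages.
// Crawlers index the static markup; browsers with JavaScript are sent to the live version.
(function () {
  var livePath = document.documentElement.getAttribute('data-live-url'); // assumed attribute
  if (livePath && window.location.pathname !== livePath) {
    window.location.replace(livePath);
  }
})();
```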
Crawlers don't parse JavaScript to find out what it does.
They may be built to recognise some classic snippets like onchange="window.location.href=this.options[this.selectedIndex].value;" or onclick="window.location.href='blah.html';", but they don't bother with things like content fetched using AJAX. At least not yet, and content fetched like that will always be secondary anyway.
So, JavaScript should be used only for additional functionality. The main content that you want the crawlers to find should still be plain text in the page, with regular links that the crawlers can easily follow.
Crawlers can handle JavaScript or AJAX calls if they use some kind of framework like HtmlUnit or Selenium.
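For example, a crawler built on Selenium can ask a real browser for the DOM after scripts and AJAX calls have run, rather than reading the raw source (a sketch assuming the selenium-webdriver npm package and a locally installed Chrome driver; the URL is a placeholder):

```js
// crawl-rendered.js - hedged sketch of fetching the post-JavaScript DOM with Selenium
var webdriver = require('selenium-webdriver');

var driver = new webdriver.Builder().forBrowser('chrome').build();

driver.get('https://example.com/#!/products')
  .then(function () {
    return driver.getPageSource();   // HTML after scripts and AJAX calls have run
  })
  .then(function (html) {
    console.log(html);               // index or parse this instead of the raw source
    return driver.quit();
  });
```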
