I've been looking into ways to improve SEO for AngularJS apps that are hosted on a CDN like Amazon S3 (i.e. simple storage with no backend). Most of the solutions out there (PhantomJS, prerender.io, seo.js, etc.) rely on a backend to recognise the ?_escaped_fragment_ url that the crawler generates and then fetch the relevant page from elsewhere. Even grunt-html-snapshot ultimately needs you to do this, even though you generate the snapshot pages ahead of time.
This solution basically relies on using Cloudflare as a reverse proxy, which seems a bit of a waste given that most of the security apparatus their service provides is totally redundant for a static site. Setting up a reverse proxy myself, as suggested here, also seems problematic, given that it would require either i) routing all the AngularJS apps I need static html for through one proxy server, which would potentially hamper performance, or ii) setting up a separate proxy server for each app, at which point I may as well set up a backend, which isn't affordable at the scale I am working at.
Is there any way of doing this, or are statically hosted AngularJS apps with great SEO basically impossible until Google updates its crawlers?
Reposted on webmasters following John Conde's comments.
Actually this is indeed a very troublesome task, but I have managed to get SEO working nicely on my AngularJS SPA site (hosted on AWS S3) at http://www.jobbies.co/. The main idea is to pre-generate the content and populate it into the HTML. The templates will still be loaded when the page loads, and the pre-rendered content will then be replaced.
You can read more about my solution at http://www.ericluwj.com/2015/11/17/seo-for-angularjs-on-s3.html, but do note that there are a lot of conditions.
Here is a full overview of how to make your app SEO-friendly on a storage service such as S3, with nice urls (no #) and everything handled by grunt, with a simple command to run after the build:
grunt seo
It's still a puzzle of workarounds, but it works and it's the best you can do. Thanks to #ericluwj and his blog post, which inspired me.
Overview
The goal & url structure
The goal is to create one html file per state in your angular app. The only major assumptions are that you remove the '#' from your urls by using the HTML5 history mode (which you should do!) and that all your paths are absolute or use angular states. There are plenty of posts explaining how to do so.
Urls end with a trailing slash, like this:
http://yourdomain.com/page1/
Personally I made sure that http://yourdomain.com/page1 (no trailing slash) also reaches its destination, but that's off topic here. I also made sure that every language has a different state and a different url.
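If you still need to enable the HTML5 history mode, a minimal sketch in Angular 1.x looks like this ('myApp' is a placeholder; html5Mode also requires a <base href="/"> tag in your index.html):
//app config: drop the '#' from urls (sketch; 'myApp' is a placeholder)
angular.module('myApp').config(function ($locationProvider) {
  $locationProvider.html5Mode(true); //requires <base href="/"> in index.html
});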
The SEO logic
Our goal is that when someone reaches your website through an http request:
If it's a search engine crawler: keep it on the page, which contains the required html. The page also contains the angular logic (e.g. to start your app), but the crawler cannot read that, so it is intentionally stuck with the html you served it and will index that.
For normal humans and intelligent machines: make sure angular gets activated, erase the generated html, and start your app normally.
The grunt tasks
Here we go with the grunt tasks:
//grunt plugins you will need:
grunt.loadNpmTasks('grunt-prerender');
grunt.loadNpmTasks('grunt-replace');
grunt.loadNpmTasks('grunt-wait');
grunt.loadNpmTasks('grunt-aws-s3');

//The grunt tasks in the right order
grunt.registerTask('seo', 'First launch server, then prerender and replace', function (target) {
  grunt.task.run([
    'concurrent:seo' //Step 1: in parallel, launch the server, then run the so-called seotasks
  ]);
});

grunt.registerTask('seotasks', [
  'http', //This is an API call to get all pages on my website. Skipping this step in this tutorial.
  'wait', //wait 1.5 sec to make sure the server is launched
  'prerender', //Step 2: create a snapshot of your website
  'replace', //Step 3: clean the mess
  'sitemap', //create a sitemap of your production environment
  'aws_s3:dev' //Step 4: upload
]);
Step 1: Launch local server with concurrent:seo
We first need to launch a local server (like grunt serve) so that we can take snapshots of our website.
//grunt config
concurrent: {
  seo: [
    'connect:dist:keepalive', //launching a server and keeping it alive
    'seotasks' //now that we have a running server we can launch the SEO tasks
  ]
}
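For completeness, the connect:dist:keepalive target above assumes a grunt-contrib-connect config along these lines (a sketch; the port and base folder are assumptions chosen to match the prerender config in step 2):
//grunt config (sketch; 'dist' is assumed to be your build folder)
connect: {
  dist: {
    options: {
      port: 9001, //must match the sitePath used by prerender below
      base: 'dist' //serve the built app
    }
  }
}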
Step 2: Create a snapshot of your website with grunt prerender
The grunt-prerender plugin allows you to take a snapshot of any website using PhantomJS. In our case we want to take a snapshot of all the pages of the localhost website we just launched.
//grunt config
prerender: {
  options: {
    sitePath: 'http://localhost:9001', //points to the url of the server you just launched. You can also make it point to your production website.
    //As you can see, the source urls allow for multiple languages, provided you have different states for different languages (see note below)
    urls: ['/', '/projects/', '/portal/', '/en/', '/projects/en/', '/portal/en/', '/fr/', '/projects/fr/', '/portal/fr/'], //this var can be dynamically updated, which is done in my case in the callback of the http task
    hashed: true,
    dest: 'dist/SEO/', //where your static html files will be stored
    timeout: 5000,
    interval: 5000, //take the snapshot of how the page looks after 5 seconds
    phantomScript: 'basic',
    limit: 7 //number of pages processed simultaneously
  }
}
Step 3: Clean the mess with grunt replace
If you open the pre-rendered files, they will work for crawlers, but not for humans. For humans using Chrome, your directives will load twice. Therefore you need to redirect intelligent browsers to your home page before angular gets activated (i.e., right after <head>).
//Add the script tag to redirect if we're not a search bot
replace: {
  dist: {
    options: {
      patterns: [
        {
          match: '<head>',
          //redirect to a clean page if not a bot (to your index.html at the root, basically)
          replacement: '<head><script>if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.location = "/#" + window.location.pathname; }</script>'
          //note: your hashbang (#) will still work
        }
      ],
      usePrefix: false
    },
    files: [
      {expand: true, flatten: false, src: ['dist/SEO/*/**/*.html'], dest: ''}
    ]
  }
}
Also make sure you have this code in your index.html on your ui-view element; it clears all the generated html BEFORE angular starts.
<div ui-view autoscroll="true" id="ui-view"></div>
<!-- this script is needed to clear ui-view BEFORE angular starts, to remove the static html that has been generated for search engines that cannot read angular -->
<script>
if(!/bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent)) { document.getElementById('ui-view').innerHTML = ""; }
</script>
Step 4: Upload to AWS
You first upload your dist folder, which contains your build. Then you overwrite it with the files you prerendered and updated.
aws_s3: {
  options: {
    accessKeyId: "<%= aws.accessKeyId %>", //use the variables
    secretAccessKey: "<%= aws.secret %>", //you can also use env variables
    region: 'eu-west-1',
    uploadConcurrency: 5 //5 simultaneous uploads
  },
  dev: {
    options: {
      bucket: 'xxxxxxxx'
    },
    files: [
      {expand: true, cwd: 'dist/', src: ['**'], exclude: 'SEO/**', dest: '', differential: true},
      {expand: true, cwd: 'dist/SEO/', src: ['**'], dest: '', differential: true}
    ]
  }
}
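Note that the <%= aws.accessKeyId %> templates above assume the credentials were loaded into grunt config earlier, typically from a git-ignored JSON file (a sketch; the file name is hypothetical):
//at the top of your Gruntfile (sketch; 'aws-keys.json' is a hypothetical file)
grunt.initConfig({
  aws: grunt.file.readJSON('aws-keys.json'), //e.g. {"accessKeyId": "...", "secret": "..."}
  //...the task configs shown above go here
});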
That's it, you have your solution! Both humans and bots will be able to read your web app.
If you use ng-cloak in interesting ways, there could be a good solution here.
I haven't tried this myself, but it should work in theory.
The solution is highly dependent on CSS, but it should work perfectly well.
For example you have three states in your angular app:
- index (pathname : #/)
- about (pathname : #/about)
- contact (pathname : #/contact)
The base case for index can be added in too, but that is tricky, so I'll leave it out for now.
Make your HTML look like this:
<body>
  <div ng-app="myApp" ng-cloak>
    <!-- Your whole angular app goes here... -->
  </div>
  <div class="static">
    <div id="about" class="static-other">
      <!-- Your whole about content here... -->
    </div>
    <div id="contact" class="static-other">
      <!-- Your whole contact content here... -->
    </div>
    <div id="index" class="static-main">
      <!-- Your whole index content here... -->
    </div>
  </div>
</body>
(It's important that you put your index case last, if you want to make it more awesome.)
Next, make your CSS look something like this:
[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
Just that will probably work well enough for you anyway.
The ng-cloak directive will keep your angular app hidden when angular is not loaded and will show your static content instead. Google will get your static content in the HTML.
As a bonus, end-users can also see well-styled static content while angular loads.
You can then get more creative if you start using :target pseudo-selectors in your CSS. You can use actual links in your static content, but just make them links to the various ids. So in the #index div, make sure you have links to #about and #contact. Note the missing '/' in the links. HTML ids can't start with a slash.
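For example, the links inside the static index content might look like this (a sketch):
<!-- inside <div id="index" class="static-main">: fragment links, no leading slash -->
<a href="#about">About</a>
<a href="#contact">Contact</a>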
Then make your CSS look like this:
[ng-cloak], .static { display: none; }
[ng-cloak] ~ .static { display: block; }
.static-other {display: none;}
.static-other:target {display: block;}
.static-other:target ~ .static-main {display: none;}
You now have a fully functioning static app WITH ROUTING that works before angular starts up.
As an additional bonus, when angular starts up it is smart enough to convert #about to #/about automatically, and the experience shouldn't even break at all.
And, not to forget, the SEO problem is totally solved, of course. I've not used this technique yet, as I've always had a server to configure, but I'm very interested in how this works out for you.
Hope this helps.
As AWS is offering Lambda@Edge as a service, we can handle this issue without grunt or anything else (at least for basic stuff).
I tried Lambda@Edge and it worked as expected. In my case I just had all the routes set to "/" in Lambda@Edge (except for files that are present in S3, like css, images, etc.).
The event I set the Lambda to is "viewer request", and the following is the code.
'use strict';
exports.handler = (event, context, callback) => {
    console.log("Event received is", JSON.stringify(event));
    console.log("Context received is", context);
    const request = event.Records[0].cf.request;
    if (request.uri.endsWith(".rt")) {
        console.log("URI is matching with .rt, the URI is ", request.uri);
        request.uri = "/";
    } else {
        console.log("URI is not ending with rt so letting it go URI is", request.uri);
    }
    console.log("Final request URI is", request.uri);
    callback(null, request);
};
Logs in CloudWatch are a little difficult to check, as they are populated in the CloudWatch region nearest to the edge location handling the request. For example, though this Lambda is deployed/written for us-east, I see the logs in the ap-south region, as I am accessing CloudFront from Singapore.
I checked it with the 'Fetch as Google' option in Google Webmaster Tools, and the page is rendered and viewed as expected.
I've been looking for days to find a solution for this. As far as I know there isn't a nice solution to the problem. I hope Firebase will eventually enable user-agent redirects. If you have the money you could use MaxCDN Enterprise. They offer Edge Rules, which include redirects by user agent.
https://www.maxcdn.com/features/rules/
I am looking for a solution on how to create an offline-compatible web app using html, JavaScript, and maybe jQuery. I looked into service workers, but they aren't compatible with all mobile devices yet. I also looked at the manifest file approach; it worked, but it didn't update the files. So now I'm here asking for a solution. I intend this application to be a music website that can work as a web app. I like music and I take it everywhere, so I'm trying to find out how I can save the website files for offline use, so that even if I don't have WiFi, I can listen to my saved music. By the way, the files I'd like to save are:
main.js
main.css
index.html
EDIT 1
Also, if you know how to properly use service workers, can you show an example?
For future reference:
1/ Create a service worker file in the app root folder.
Example sw.js:
let cacheName = "core" // Whatever name
// Pass all assets here
// This example uses a folder named «/core» in the root folder
// It is mandatory to add an icon (important for mobile users)
let filesToCache = [
    "/",
    "/index.html",
    "/core/app.css",
    "/core/main.js",
    "/core/otherlib.js",
    "/core/favicon.ico"
]
self.addEventListener("install", function(e) {
e.waitUntil(
caches.open(cacheName).then(function(cache) {
return cache.addAll(filesToCache)
})
)
})
self.addEventListener("fetch", function(e) {
e.respondWith(
caches.match(e.request).then(function(response) {
return response || fetch(e.request)
})
)
})
2/ Add an onload event anywhere in the app:
window.onload = () => {
    "use strict";
    if ("serviceWorker" in navigator && document.URL.split(":")[0] !== "file") {
        navigator.serviceWorker.register("./sw.js");
    }
}
3/ Create a manifest.json file in the app root folder.
{
    "name": "APP",
    "short_name": "App",
    "lang": "en-US",
    "start_url": "/index.html",
    "display": "standalone"
}
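For the manifest to take effect, it also needs to be referenced from index.html:
<link rel="manifest" href="/manifest.json">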
4/ Test
Start a web server from the root folder:
php -S localhost:8090
Visit http://localhost:8090 one time.
Stop the web server with Ctrl + c.
Refresh http://localhost:8090; the page should still respond.
To switch off while developing, remove the onload event, and in Firefox visit about:debugging#workers to unregister the service worker.
Newer versions of Firefox show an Application tab directly in the debugger instead; about:debugging#workers is no longer valid.
https://developer.mozilla.org/en-US/docs/Tools/Application/Service_workers
Source for more details
Manifest.json reference
If you need to save settings after the user has left, you need to use cookies.
If you need some server data (from ajax requests, for example), I'm afraid you can't do that offline.
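As a minimal sketch, a setting can be persisted across visits like this (localStorage is often the simpler option when you don't need the value server-side):
// save a setting for the next visit (sketch)
document.cookie = "volume=80; max-age=31536000; path=/";
// or with localStorage:
localStorage.setItem("volume", "80");
console.log(localStorage.getItem("volume")); // "80"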
For everything else (as far as I know), if you want it to work offline, you have to make the user's browser download all the code it's going to use, including jQuery, Bootstrap, or any plugin code you want. You have to add them to your website sources and link them internally:
<script src="http://code.jquery.com/jquery-3-3-0-min.js"></script> <!-- Won't work offline.-->
<script src="./js/jquery-3-3-0-min.js"></script> <!-- Will work offline -->
Be careful about plugin dependencies! For example, the Bootstrap 3.3.6 JS plugin needs jQuery 1.12.4.
Hope it helps you!
In online mode, the service worker fallback takes over from the server-side rendered page. When I first load the /about page, for example, looking at the source code I see the home page rendered (because I defined / as navigateFallback in the sw-precache options).
I want this behavior only when I'm running on offline mode.
Here are the options I use:
{
  cacheId: pkg.name,
  dontCacheBustUrlsMatching: /./,
  dynamicUrlToDependencies: {
    '/': [ resolve(__dirname, '../server/views/index.ejs') ]
  },
  navigateFallback: '/',
  staticFileGlobs: [
    `${publicDir}/{bundle,vendor}.*.{js,css,gz}`,
    `${publicDir}/manifest.json`
  ],
  stripPrefix: publicDir,
  runtimeCaching: [{
    urlPattern: /api/,
    handler: 'networkFirst'
  }]
}
The expected use case of navigateFallback in sw-precache is to provide a static HTML document that could be served using a cache-first strategy immediately, without having to go against the network. This ensures that requests for any URL on your site can be fulfilled with something almost instantly, instead of having to wait an indeterminate amount of time for a response from the network. This presupposes that you have a generic HTML page (referred to as an App Shell) that could be used to fulfill any navigation requests, and which knows how to implement client-side logic to examine the current URL and dynamically insert the appropriate content. If you already have code that supports Single Page App-style navigations, then that's usually sufficient.
You can read more about this architectural pattern, and there's also a full example of a React-based PWA that uses SSR for the initial navigation (or for clients that don't support service workers) and then upgrades to the navigateFallback behavior once the service worker is installed.
If you'd prefer not to use this model, and instead want to always go against the network for your SSR version of each page, and only use a cached page when the network request fails, then the navigateFallback option is not the right choice. You can find some examples of offline fallbacks around the web.
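For comparison, a hand-rolled fetch handler implementing that network-first-with-offline-fallback strategy for navigations might look like this (a sketch, not part of sw-precache's API; '/offline.html' is a hypothetical page assumed to be pre-cached during install):
// sketch: network first for navigations, cached page only when the network fails
self.addEventListener('fetch', function (event) {
  if (event.request.mode === 'navigate') {
    event.respondWith(
      fetch(event.request).catch(function () {
        return caches.match('/offline.html'); // hypothetical pre-cached fallback
      })
    );
  }
});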
I'm trying to familiarize myself with the concept of using script tags. I'm making a Ruby on Rails app that does something as simple as alert "Hi" when a customer visits a page. I am testing this public app on a local server and I have the shopify_app gem installed. The app has been authenticated and I have access to the store's data. I've viewed the Shopify API documentation on using script tags and I've looked at the Shopify Embedded App example on GitHub. The documentation details the properties of a script tag and gives examples of script tags with their properties defined, but doesn't say anything about where to place the script tag in an application, or how to configure an environment so that the js file in the script tag will go through.
I've discovered that a js file added with a script tag will only work if the js file is hosted online, so I've uploaded the js file to Google Drive. I have the code for the script tag in the index action of my HomeController (the default page for the app). This is the code I'm using:
def index
  if response = request.env['omniauth.auth']
    sess = ShopifyAPI::Session.new(params[:shop], response[:credentials][:token])
    session[:shopify] = sess
    ShopifyAPI::Base.activate_session(sess)
    ShopifyAPI::ScriptTag.create(
      :event => "onload",
      :src => "https://drive.google.com/..."
    )
  end
end
I think the problem may be tied to request.env. The response is not being read as request.env['omniauth.auth'], and I believe that the response coming back as valid may be required for the script tag to go through.
The method that I tried above is from the 2nd answer given in this topic: How to develop rails app for shopify with ScriptTags.
The first answer suggested using this code:
ShopifyAPI::Base.site = token
s = ShopifyAPI::ScriptTag.create(:events => "onload",:src => "your javascript url")
However, it doesn't say where to place both lines of code in a rails application. I tried putting the second line in a js file in my rails application, but it did not work.
I don't know if I'm encountering problems because I'm running the app on a local server or if there is something missing from the configuration of my application.
I'd appreciate it if anyone could point me in the right direction.
Try putting something like this in config/initializers/shopify_app.rb:
ShopifyApp.configure do |config|
  config.api_key = "xxx-xxxx-xxx-xxx"
  config.secret = "xxx-xxxx-xxx-xxx"
  config.scope = "read_orders, read_products"
  config.embedded_app = true
  config.scripttags = [
    {event: 'onload', src: 'https://yourdomain.herokuapp.com/javascripts/yourjs.js'}
  ]
end
Yes, you are correct that the js file you want to include for your script tag needs to be publicly available; if you are using localhost for development, look into ngrok.
Do yourself the favor of ensuring your callbacks use SSL when interacting with the Shopify API (i.e. configure your app with https://localhost/ as a callback setting in the Shopify app settings). I went through the trouble of configuring thin as the local web server with a self-signed SSL certificate.
With a proper setup you should be able to debug why the response is failing the omniauth check.
I'm new to the Shopify API(s), but not to Rails. Their documentation leaves a lot to be desired.
Good luck to you sir,
I just finished coding a new web project using AngularJS and Bootstrap. The development took place in Brackets, an editing tool that launches Chrome for testing while also functioning as the web server.
So far, everything works as required, both when Brackets is used as the server and when the whole project is deployed within a Tomcat installation, as long as the browser being used is Chrome and the machine is a Windows 10 computer.
Now, I started testing the project using different browsers and devices (e.g. tablets, mobiles, etc.) and, oops! I am getting crashes all the time.
It would appear that the first (and perhaps central) issue comes from the way I implemented and use Angular's routing services (or at least this is what several posts I found suggest). Two things are happening (depending on the browser and the action triggered) that point in that direction:
I received the error infdig many times, meaning that there is an infinite reload loop somewhere.
When the user successfully logs into the system, an object containing the user's details is stored as a $rootScope object, and when a $window.location.href command is used to move to another page, all the user information previously stored disappears (strangely, this is not happening with Chrome, but it is with IE and Edge!).
Unfortunately, I was unable to fully understand what is the proper way of using the routing services.
The structure of the project is:
[MyApp] -- This is the folder containing the whole project under Tomcat's "webapps" folder
  index.html
  index.js -- Contains the controller related to the index.html page
  [pages] -- Sub-folder hosting all the pages of the project except for the `index.html`
    page1.html
    page2.html
    :
    :
  [js] -- Sub-folder hosting the controllers of each and every sub-page
    page1.js -- Sub-page controller
    page2.js -- Sub-page controller
    :
    :
Transition to the sub-pages (e.g. page1.html, etc.) takes place using the command $window.location.href = "#page1.html";, and the routing service is defined:
$routeProvider
  :
  :
  .when('page1.html', {
    templateUrl: '#/pages/page1.html',
    controller: 'Page1Controller'
  })
  .when('page2.html', {
    templateUrl: '#/pages/page2.html',
    controller: 'Page2Controller'
  })
  :
  :
Based on some posts related to how to define routing, I also tried:
.when('page1.html', {
  templateUrl: 'pages/page1.html',
  controller: 'Page1Controller'
})
and
.when('page1.html', {
  templateUrl: '/pages/page1.html',
  controller: 'Page1Controller'
})
getting errors in both cases (e.g. page not found).
Additionally, it is not clear to me what the impact is of including the statement $locationProvider.html5Mode(true); (when including it, I got an injection error).
How can I properly use this Angular routing service, and how can I set HTML5 mode?
Routing params: the way I've done it, which works for me and is simple, is using the same route function I showed before.
Then if you look at 'searchresult/:searchCriteria', :searchCriteria is already a parameter that I am putting in a JavaScript variable called sys (i.e. at the beginning of my JavaScript I declare the variable var sys = null;).
Then in the SearchResult controller I put the value of sys inside a $scope variable, let's say $scope.sys = sys;. This gives you the variable both in the scope and in plain JavaScript, if you want to check the values in the developer tools console and/or play with them.
To call the page http://url/#searchresult/myvalue,
just call $location.path("/searchresult/myvalue") like before.
Like this you can create a path with many arguments (i.e. "/searchresult/myvalue1/myvalue2/myvalue3") and they will all be stored in the variable sys.
PS: if you want to change your whole url, use window.location.replace('new url') without any $. The difference between this and Angular's routing is that this will refresh the page, while angular will only refresh your ng-view.
Regarding the page not found issue, make sure that templateUrl: 'pages/page2.html' has the same path as the actual folders:
- capital letters match
- the s in "pages" is also present in the pages folder name
Also make sure that the permissions are OK, such that your application is not being denied access to the file. I don't know what OS you are using, but make sure your application can access it.
Regarding the loop error: to help, I would need to see a bit more code, but if you open the application in Chrome and look at the error in the developer tools, it may give you a hint as to where your application is crashing. The other approach is to start commenting out parts of the application until you don't get the error, to find the problematic line, then analyze it.
To illustrate, a route and controller following this pattern might look like the sketch below (names are placeholders):
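// sketch: a parameterized route whose value is kept both in a plain variable and on $scope
var sys = null; // declared at the top of the script, as described above

app.config(function ($routeProvider) {
  $routeProvider.when('/searchresult/:searchCriteria', {
    templateUrl: 'pages/searchresult.html', // placeholder path
    controller: 'SearchResultController'
  });
});

app.controller('SearchResultController', function ($scope, $routeParams) {
  sys = $routeParams.searchCriteria; // the value from the url
  $scope.sys = sys; // now visible in the view and inspectable in the console
});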
I'm a newbie at using SpineJS and having a happy time with it.
When I finished the contact example and looked at some other components in SpineJS, I realized there's no example of a web site (one which has many html pages).
It seems like SpineJS is not the proper framework for web site design. (I think this kind of framework is proper for a Single Page Application.)
I thought like that because I have to create a 'websocket' object in the first view of my web site. I cannot keep the 'websocket' object when I leave the first view (the html page changes), and I need to keep this 'websocket' the whole time, until the user logs out.
Is that right? Or are there ways that I can create a multi-view web site? (The AngularJS framework supports this with its $route service: it can load an html page without reloading the whole framework.)
You can certainly implement multi-page websites within a single-page RIA. OK, that sounds paradoxical: from the server side it's rendering a single page, serving the source code, but within the client code the Router object may render the page completely differently based on the route.
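The idea, framework aside, is just a client-side router keyed on the url. A minimal hash-based sketch (Spine's own Route module plays this role; the <div id="view"> container is an assumption, and long-lived objects like your websocket survive because the page never reloads):
// minimal sketch: one served page, several rendered "pages"
function render() {
  var route = window.location.hash || "#/";
  var view = document.getElementById("view"); // assumes a <div id="view"> container
  if (route === "#/about") {
    view.innerHTML = "<h1>About</h1>";
  } else if (route === "#/contact") {
    view.innerHTML = "<h1>Contact</h1>";
  } else {
    view.innerHTML = "<h1>Home</h1>";
  }
}
window.addEventListener("hashchange", render);
render();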
Edit / Addition:
Not sure if this is best, but here's how my app loads templates stored in separate html files within the app source code, e.g. myview.template = app.TemplateManager.fetch('grids/item');
templateManager: {
  JST: {}, // hash table so as not to load the same template twice
  fetch: function(path) {
    var self = this; // keep a reference: 'this' inside the ajax callback is not templateManager
    var url = "/app/templates/" + path + ".html";
    if (!self.JST[path]) {
      // async: false, so the template is compiled before fetch() returns
      $.ajax({ url: url, async: false }).then(function(contents) {
        self.JST[path] = _.template(contents);
      });
    }
    return self.JST[path];
  }
}