Single-page application, web crawlers and SEO - javascript

I have created my blog as a single-page application using the Mithril framework on the front end, with a REST API and Django on the backend. Since everything is rendered by JavaScript, when crawlers hit my blog all they see is an empty page. On top of that, whenever I share a post on social media, Facebook for instance sees only an empty page rather than the post title and content.
I was thinking of inspecting the User-Agent header and, whenever it belongs to a crawler, feeding it a pre-rendered version of the page, but I'm having problems implementing this approach.
What is the best practice for making a single-page app that uses a REST API and Django on the backend SEO-friendly for web crawlers?
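For reference, a minimal sketch of the User-Agent check being described, written as JavaScript for a Node front server (the bot pattern is an assumption to extend; Django middleware would inspect request.META['HTTP_USER_AGENT'] the same way):

// Crude crawler detection by User-Agent substring; the pattern list is an
// assumption and should be extended for the bots you care about.
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot|linkedinbot|slackbot/i;

function isCrawler(userAgent) {
  return BOT_PATTERN.test(userAgent || '');
}

// e.g. in an Express handler:
//   if (isCrawler(req.headers['user-agent'])) { /* serve pre-rendered HTML */ }
//   else                                      { /* serve the normal SPA shell */ }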

I'm doing this on a project right now, and I would really recommend doing it with Node instead of Python, like this:
https://isomorphic-mithril.mvlabs.it/en/

You might want to look into server-side rendering of the pages that crawlers visit.
Here is a good article on Client Side vs Server Side rendering.
I haven't heard of Mithril before, but you might find some plugins that do this for you.
https://github.com/MithrilJS/mithril-node-render
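For instance, a rough sketch of rendering a component to HTML with mithril-node-render (the component here is made up, and the exact require paths and promise-based API may differ between versions, so check the package README):

const m = require('mithril/hyperscript');        // hyperscript only, no browser APIs needed
const render = require('mithril-node-render');

// A made-up page component standing in for one of the blog's views.
const PostPage = {
  view: vnode => m('article', [
    m('h1', vnode.attrs.title),
    m('p', vnode.attrs.body),
  ]),
};

// Render the component to an HTML string on the server, then embed that
// string in the index template you send to crawlers.
render(m(PostPage, { title: 'Hello', body: 'Rendered on the server.' }))
  .then(html => console.log(html));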

This might help you: https://github.com/sharjeel619/SPA-SEO
The above example is made with Node/Express but you can use the same logic with your Django server.
Logic
1. A browser requests your single-page application from the server, which is loaded from a single index.html file.
2. You program some intermediary server code which intercepts the client request and determines whether it came from a browser or from a social/crawler bot.
3. If the request came from a crawler bot, make an API call to your back-end server, gather the data you need, fill that data into HTML meta tags, and return those tags as a string to the client (see the sketch after this list).
4. If the request didn't come from a crawler bot, simply return the index.html file from the build or dist folder of your single-page application.
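A condensed sketch of that logic in Node/Express, in the spirit of the linked example (the bot pattern, back-end API URL, and route are illustrative assumptions):

const express = require('express');
const path = require('path');
const fetch = require('node-fetch');   // assumed HTTP client for calling the back-end API

const app = express();
const BOTS = /facebookexternalhit|twitterbot|linkedinbot|googlebot|bingbot|slackbot/i;
const INDEX = path.join(__dirname, 'dist', 'index.html');

app.get('/posts/:slug', async (req, res) => {
  // Browsers get the normal SPA shell.
  if (!BOTS.test(req.headers['user-agent'] || '')) return res.sendFile(INDEX);

  // Crawler bot: gather the post data from the (assumed) back-end API...
  const post = await fetch(`http://localhost:8000/api/posts/${req.params.slug}/`)
    .then(r => r.json());

  // ...and return a small HTML string whose meta tags the bot can read.
  res.send(`<!doctype html>
<html><head>
  <title>${post.title}</title>
  <meta property="og:title" content="${post.title}">
  <meta property="og:description" content="${post.summary}">
</head><body></body></html>`);
});

app.use(express.static(path.join(__dirname, 'dist')));
app.listen(3000);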

Related

Suggestion: Single Page application architecture issue

I have written a web app (a single-page application) that involves only front-end technologies (Vue.js); when I compile it, it ultimately generates web pages (only HTML and JS). I can run this app anywhere by opening the index page. The SPA consumes a REST API secured with OAuth, making Ajax calls directly from the browser to the REST API endpoints.
But the problem is that my lead developer says the SPA must be served by a back-end service (a server such as Node.js or Apache), and that the backend, not front-end JavaScript in the browser, should make the calls to the REST APIs. My SPA runs anywhere and works perfectly in browsers even without any server.
My question is: do I really need to render and run my SPA through a web server? What are the reasons for making a plain HTML/JS SPA server-powered?
Also, is it reasonable for people to simply write an app in JS and HTML (pure front end), upload it to a server, and point a domain name at that HTML/JS web app, which then consumes remote REST APIs?
I have a remote REST API provider; please suggest the best way to write an SPA that consumes those remote APIs.
Thank you in advance for clearing up my doubts.
There may be some reasons to set up a back-end service, for example:
Hide REST API endpoints
Set up your own caching / throttling / failovers etc. for REST API endpoints
Override / control REST API responses / requests
You can still use a pure HTML+JS SPA, but adding a back-end service gives you additional options that are not possible to achieve on the front end alone (see the sketch below).
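As an illustration, a minimal sketch of such a back-end layer written as a thin Node/Express proxy (the endpoint names, upstream URL, and caching policy are assumptions):

const express = require('express');
const fetch = require('node-fetch');             // assumed HTTP client

const app = express();
const API_BASE = 'https://api.example.com';      // upstream REST API, hidden from the browser
const cache = new Map();                         // naive in-memory cache, no TTL

app.get('/api/products', async (req, res) => {
  if (cache.has(req.originalUrl)) return res.json(cache.get(req.originalUrl)); // caching

  const upstream = await fetch(`${API_BASE}/v1/products`, {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },  // secret never reaches the client
  });
  const data = await upstream.json();

  cache.set(req.originalUrl, data);   // throttling / failover logic could also live here
  res.json(data);                     // responses can be reshaped before returning them
});

app.listen(3000);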

Creating HTML Snapshot for AngularJS App for Search Engine Crawling

First of all, let me explain what I want to do.
I want my website, which is built with AngularJS, to be indexed; for this I have read the documents and articles Google provides for this purpose.
What I found is that:
I need to convert my AngularJS URLs to crawler-friendly URLs with html5Mode or hashbang mode,
e.g. from http://www.example.com/#/about to http://www.example.com/#!/about or just http://www.example.com/about,
using <meta name="fragment" content="!">
So when a crawler such as Googlebot comes to my website, it will request my URL as http://www.example.com/?_escaped_fragment_=/about, and I then need to serve that request with a static HTML page. Right?
Now my question is: how can I generate this static HTML page with a PHP framework? The HTML should be created automatically, and only after its content has fully loaded; that is, if AngularJS is loading data with an $http request, the HTML should only be generated after all that data has been loaded into the template.
I can serve this HTML to the crawler only if it can be generated automatically.
I have tested locally by creating some HTML pages manually and checking that, if a request comes in with the _escaped_fragment_ parameter, the server serves that particular static HTML page.
But I cannot find a way to create the HTML page for a particular AngularJS request with a PHP framework.
I don't want to use any npm services. I want to build it entirely in PHP and/or with a jQuery plugin.
Apart from needing a server to redirect your escaped-fragment requests to, one that has taken HTML snapshots after the various views have rendered so they can be fed to Googlebot and friends, don't forget you'll also need a setting in your existing web server (such as Apache or IIS) to redirect to the new server handling your escaped fragments.
I appreciate you're looking for a PHP solution, and don't want to use an NPM one like the prerender or seoserver packages, but I want to throw an alternative and easier solution your way: Use an existing hosted solution such as https://www.seo4ajax.com/ or https://prerender.io/ instead. They'll host an SEO server for you that can crawl your site, and typically this is free unless you've got a huge data-driven site with more than a few hundred pages. Less load for you, no need to run yet another server, and you get some nice admin panels to look at what crawlers have hit the SEO redirect, etc.
Broadly speaking, you'll need the meta tag you mention in your HTML, and these settings in your AngularJS config:
$locationProvider.html5Mode(true);
$locationProvider.hashPrefix('!');
Then you need some custom redirect settings, for example in an .htaccess file if your existing server is Apache; the services above will give you these to copy and paste. The redirect logic itself is sketched below.
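Just to illustrate that redirect logic in JavaScript terms (an Express-style middleware rather than the actual .htaccess rule; the snapshot host is an assumption):

const express = require('express');
const app = express();

app.use((req, res, next) => {
  const fragment = req.query._escaped_fragment_;
  if (fragment === undefined) return next();   // normal browsers fall through to the SPA

  // Crawler request: hand it off to the server holding pre-rendered HTML snapshots (assumed URL).
  res.redirect(`https://snapshots.example.com${req.path}?_escaped_fragment_=${encodeURIComponent(fragment)}`);
});

app.use(express.static('dist'));               // the regular AngularJS app
app.listen(3000);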

Scraping a website which has javascript

I'm looking for a method to scrape a website (which uses JavaScript) from the server side and save the output, after analyzing the data, into a MySQL database. I need to navigate from page to page by clicking links and submitting data from the database, without the session expiring. Is this possible using the phpQuery web browser plugin? I've started doing this using CasperJS. I would like to know the pros and cons of both methods. I'm a beginner in the coding space. Please help.
I would recommend that you use PhantomJS or CasperJS and parse the DOM with JavaScript selectors to get the parts of the pages you want back. Don't use phpQuery as it's based on PHP and would require a separate step in your processing versus using just JavaScript DOM parsing. Also, you won't be able to perform click events using PHP. Anything client side would need to be run in PhantomJS or CasperJS.
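A short CasperJS sketch of the kind of flow being described (the URL, selectors, and credentials are placeholders):

// Run with: casperjs scrape.js
var casper = require('casper').create();

casper.start('http://example.com/login', function () {
  // Fill and submit the login form so the session lives inside the headless browser.
  this.fill('form#login', { username: 'myuser', password: 'mypass' }, true);
});

casper.thenClick('a.next-page', function () {
  // After the page's client-side JavaScript has run, pull out the rendered parts you need.
  var title = this.fetchText('h1');
  var body  = this.getHTML('.article-body');
  // Print as JSON; another script (e.g. PHP or Node) can read this and insert it into MySQL.
  this.echo(JSON.stringify({ title: title, body: body }));
});

casper.run();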
It might even be possible to write a full scraping engine using just PHP if that's your server-side language of choice. You would need to reverse engineer the login process and maintain a cookie jar with your cURL requests to keep your login valid with each request. Once you've established a session with the website, you can then set up your navigation path with an array of links that you would like to crawl. The idea behind web crawling is that you load a page from some link, process the page, and then move to the next link. You continue this process until all pages have been processed, and then your crawl is complete.
I would check out Google's guide, Making AJAX Applications Crawlable; the website you're trying to scrape might have adopted the scheme (making its content crawlable).
Look for #! in the URL's hash fragment; this indicates to the crawler that the site supports the AJAX crawling scheme.
To put it simply, when you come across a URL like www.example.com/ajax.html#!key=value, you would modify it to www.example.com/ajax.html?_escaped_fragment_=key=value. The server should respond with an HTML snapshot of that page.
Here is the Full Specification
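As a tiny illustration, converting a hashbang URL to its _escaped_fragment_ form in JavaScript (the spec asks for certain characters in the fragment to be percent-escaped; encodeURIComponent is used here as an approximation):

// www.example.com/ajax.html#!key=value  ->  www.example.com/ajax.html?_escaped_fragment_=key%3Dvalue
function toEscapedFragmentUrl(url) {
  var i = url.indexOf('#!');
  if (i === -1) return url;                        // no hashbang, nothing to do
  var base = url.slice(0, i);
  var fragment = url.slice(i + 2);
  var sep = base.indexOf('?') === -1 ? '?' : '&';  // keep any existing query string
  return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
}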

Secure way of persisting an auth token in a single page js application

Our scenario:
Our solution consists of an MVC app which serves a single-page JavaScript application, and an ASP.NET Web API app which is intended both as a standalone API and as a source of data for the SPA.
We have set everything up so that the two apps share auth tokens and membership, so if we are logged in to the SPA then the same forms authentication cookie also allows us access to the API.
This works fine if you make API requests in the browser address bar, but not through AJAX. We have followed examples of setting up basic authentication using Thinktecture, and if we hardcode the username/password as an authentication header for our AJAX calls then this works fine also.
My question, however, is what is the correct way of persisting these details on the client side? Our only real solution so far would be to send down the Base64-encoded username/password as part of the initial load of the SPA and then pull it out when needed. This seems insecure, however.
So basically, just wondering what the 'correct' approach is in this situation... are we close or is there another approach that we have overlooked?
Thanks!
We're using the session token support from Thinktecture.IdentityModel and then making the token available to the client via a dynamically generated script.
Full details at http://ben.onfabrik.com/posts/dog-fooding-our-api-authentication
I also published a sample application demonstrating these concepts at https://github.com/benfoster/ApiDogFood.
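On the client side, once the dynamically generated script has exposed the session token, the SPA can attach it to every AJAX call rather than ever holding the raw username/password; a rough jQuery-style sketch (jQuery is assumed to be loaded, and the global variable name and Authorization scheme are assumptions that depend on how Thinktecture is configured):

// Assume the dynamically generated script has set something like:
//   window.apiToken = '...session token issued by the server...';

$.ajaxSetup({
  beforeSend: function (xhr) {
    // Attach the session token to every API request.
    xhr.setRequestHeader('Authorization', 'Bearer ' + window.apiToken);
  }
});

// Example call against the Web API:
$.getJSON('/api/orders', function (orders) {
  console.log(orders);
});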

OAuth, javascript and many URI's

I'm trying to make a program that can be hosted by many people, like an app.
The app uses a REST API, so I must authenticate with OAuth,
and because anyone should be able to host the program, the redirect URI cannot be static.
Further, I don't want to use any server-side processing, which means JavaScript only for me.
Is it even possible to make a secure and working solution with a non-static redirect URI,
using only JavaScript, that works in a normal web browser?
So you use the information provided in the request to your app to determine the URL for your app. For instance, if the request came to http://example.com/path/to/app and you knew in your app that /to/app was part of your routing infrastructure, then the path to your app is http://example.com/path/.
That is how I would determine it using a server-side language.
Using a JavaScript library, which would be loaded from the server, I would either determine it as above, or I would just hard-code it when generating the JavaScript file (when you tell people where to download the JavaScript, it can use a form that asks for their web address first).
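A small client-side sketch of that idea, deriving the redirect URI from window.location and a known routing suffix (the '/to/app' suffix and the callback page name are assumptions for illustration):

// Derive the host-specific base path at runtime, so the OAuth redirect URI
// does not have to be hard-coded for each installation.
function getRedirectUri(routeSuffix) {
  var path = window.location.pathname;             // e.g. "/path/to/app"
  var i = path.lastIndexOf(routeSuffix);
  var base = i >= 0 ? path.slice(0, i) : path;     // e.g. "/path"
  if (base.charAt(base.length - 1) !== '/') base += '/';
  return window.location.origin + base + 'oauth-callback.html';
}

// Served at http://example.com/path/to/app with "/to/app" as the routing part:
//   getRedirectUri('/to/app') -> "http://example.com/path/oauth-callback.html"
var redirectUri = getRedirectUri('/to/app');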
