First of all, let me tell you what I want to do.
I want to get my website, which is built with AngularJS, indexed. For this I have read all the documents and articles I could find on Google about the topic.
What I found is this:
I need to convert my AngularJS URLs to crawler-friendly URLs with html5Mode or hashbang mode,
for example from http://www.example.com/#/about to http://www.example.com/#!/about, or just http://www.example.com/about,
using <meta name="fragment" content="!">.
So when a crawler such as Googlebot visits my website, it will request my URL as http://www.example.com/?_escaped_fragment_=/about, and I then need to serve that request with a static HTML page. Right?
Now my question is: how can I generate this static HTML page with a PHP framework? The HTML should be created automatically, and only after its content has fully loaded; that is, if AngularJS loads data with an $http request, the HTML should only be generated after all of that data has been rendered into the template.
I can only serve this HTML to the crawler if it can be generated automatically.
I have tested this locally by creating some HTML pages manually and checking that, when a request comes in with the _escaped_fragment_ parameter, the server returns the corresponding static HTML page.
But I am not able to find a way to create the HTML page for a particular AngularJS route with a PHP framework.
I don't want to use any npm services. I want to build this entirely in PHP and/or with a jQuery plugin.
Apart from needing a server that you redirect your escaped-fragment requests to (one that has taken HTML snapshots after the various views have been rendered, which can then be fed to Googlebot and the like), don't forget you'll also need a setting in your existing web server (such as Apache or IIS) to redirect those requests to the new server handling the escaped fragment.
I appreciate that you're looking for a PHP solution and don't want to use an npm one like the prerender or seoserver packages, but I want to throw an alternative and easier solution your way: use an existing hosted service such as https://www.seo4ajax.com/ or https://prerender.io/ instead. They'll host an SEO server for you that can crawl your site, and typically this is free unless you've got a huge data-driven site with more than a few hundred pages. Less load for you, no need to run yet another server, and you get some nice admin panels to see which crawlers have hit the SEO redirect, and so on.
Broadly speaking, you'll need the meta tag you mention in your HTML, and these settings in your AngularJS config:
$locationProvider.html5Mode(true);
$locationProvider.hashPrefix('!');
Then you'll need some custom redirect settings, such as in an .htaccess file if your existing server is Apache; these services will give you the rules, and you can copy and paste them.
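Purely to illustrate what those redirect rules do, here is a rough sketch of the check in Node/Express; the file names are made up, and the same conditional can be written in PHP or as an .htaccess rewrite condition:

const express = require('express');        // used here only to illustrate the logic
const path = require('path');
const app = express();

app.use((req, res, next) => {
  const fragment = req.query._escaped_fragment_;   // e.g. "/about" for /#!/about
  if (fragment !== undefined) {
    // Serve a pre-rendered snapshot instead of the AngularJS shell.
    // "snapshots/about.html" is a hypothetical file produced ahead of time.
    const page = fragment.replace(/\//g, '') || 'index';
    return res.sendFile(path.join(__dirname, 'snapshots', page + '.html'));
  }
  next();
});

app.use(express.static(path.join(__dirname, 'public')));  // normal SPA assets
app.listen(8080);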
Is there any mechanism (at least theoretical) that would allow me to control which scripts are provided to a client? I have split my code into dynamically loadable parts using import('dynamically_loadable_file'), but whenever it is called on a client, the file is served. I'd like to perform a security check on whether the user has permission to load the file. I thought of middleware, but that only covers HTTP, while executable scripts are served through WebSockets.
Also, if possible, I would like to control the content of the provided scripts. For example, I'd like to add or "hide" some functions or variables in the script based on the user loading it. I guess something like dynamic compilation using an AST would be required, or maybe something else is (or could be made) available. I realise that's another level, but if there is any material on such ideas I'd be thankful.
Maybe this is not possible with Meteor at all, so if it is possible anywhere else in the JavaScript (Node.js) world, that would help too.
Thanks for any ideas and explanations.
Most client-side protection mechanisms can be circumvented with enough knowledge and the right tools.
The most viable solution to your problem would be to use a server-side rendering (SSR) library for your current front-end engine.
With SSR you would address the points you raised:
control which scripts are provided to a client
perform a security check on whether a user has permission to load the file
scripts being served through WebSockets
control the content of the provided scripts
add or "hide" some functions or variables in the script based on the user loading them
This is because all your templates are rendered on the server and only the resulting output is returned to the client.
Some SSR packages for Meteor:
Generic: https://docs.meteor.com/packages/server-render.html
React: https://www.chrisvisser.io/meteor/how-to-set-up-meteor-react-with-ssr (guide with link to a boilerplate repo)
Vue: https://github.com/meteor-vue/vue-meteor/tree/master/packages/vue-ssr
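With the generic server-render package, the core pattern is an onPageLoad callback on the server that writes markup into the HTML before it is sent. A minimal sketch (the renderAppToString helper is a stand-in for whatever renderer your front end uses):

import { onPageLoad } from 'meteor/server-render';

// Stand-in for your engine's server renderer, e.g. ReactDOMServer.renderToString(<App />).
const renderAppToString = () => '<h1>Hello from the server</h1>';

onPageLoad(sink => {
  sink.renderIntoElementById('app', renderAppToString());   // fill the SPA mount point
  sink.appendToHead('<meta name="robots" content="index,follow">');
});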
The native Meteor Way
Besides this, I would like to emphasize that you can handle most data access through publications and methods.
Showing or hiding HTML elements on the client does not add any security if your data and logic are not secured on the server.
If you only publish the right data to the right user (for example using alanning:roles), then it does not matter which scripts you load.
The same goes for methods: if you are very strict about who can call a method (again using alanning:roles), then it does not matter if a user disables the router and sees all of the "hidden" areas on the client, because every invalid action is rejected server-side.
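As a rough illustration of that server-side gating (the collection, publication, and role names are made up for the example):

import { Meteor } from 'meteor/meteor';
import { Roles } from 'meteor/alanning:roles';
import { Posts } from '/imports/api/posts';      // hypothetical collection

// Only publish restricted documents to users who actually have the role.
Meteor.publish('posts.adminArea', function () {
  if (!this.userId || !Roles.userIsInRole(this.userId, 'admin')) {
    return this.ready();                         // publish nothing, don't throw
  }
  return Posts.find({ restricted: true });
});

// Same rule for methods: check the role on the server, not in the UI.
Meteor.methods({
  'posts.remove'(postId) {
    if (!Roles.userIsInRole(this.userId, 'admin')) {
      throw new Meteor.Error('not-authorized');
    }
    Posts.remove(postId);
  }
});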
I would like to create a server in Node where users request pages from a static folder and the server injects a custom tag into them before serving.
Is there any recommended way to do this? I've been trying with http-proxy with no luck, and I'm not sure whether I really need a proxy or whether there's a way to intercept the response for static pages using just the plain http module.
You will have to make an AJAX request to the server, which will then send the files back to you. If everything goes right, you can then use .innerHTML = ... to inject the tag.
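A minimal sketch of that client-side approach (the file path, element id, and tag here are made-up examples):

// Fetch the static page, then inject a custom tag on the client.
fetch('/static/page.html')                       // hypothetical static file
  .then(response => response.text())
  .then(html => {
    const container = document.getElementById('content');  // assumed placeholder element
    container.innerHTML = html;
    const tag = document.createElement('meta');
    tag.name = 'custom-tag';                     // the "custom tag" to inject
    tag.content = 'injected-on-the-client';
    document.head.appendChild(tag);
  });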
I have created my blog as a single-page application using the Mithril framework on the front end. For queries I use a REST API with Django on the back end. Since everything is rendered with JavaScript, when crawlers hit my blog all they see is an empty page. On top of that, whenever I share a post on social media, Facebook for instance sees just an empty page and not the post content and title.
I was thinking of looking at user agents, and whenever the User-Agent is from a crawler I would feed it the rendered version of the pages, but I'm having trouble implementing the method described above.
What is the best practice to make a single-page app that uses a REST API and Django on the back end SEO-friendly for web crawlers?
I'm doing this on a project right now, and I would really recommend doing it with Node instead of Python, like this:
https://isomorphic-mithril.mvlabs.it/en/
You might want to look into server-side rendering of the pages that crawlers visit.
Here is a good article on Client Side vs Server Side
I haven't heard of Mithril before, but you might find some plugins that do this for you.
https://github.com/MithrilJS/mithril-node-render
This might help you: https://github.com/sharjeel619/SPA-SEO
The example above is built with Node/Express, but you can use the same logic with your Django server.
Logic
A browser requests your single-page application from the server, which is loaded from a single index.html file.
You program some intermediary server code which intercepts the client request and determines whether it came from a browser or from a social/crawler bot.
If the request came from a crawler bot, make an API call to your back-end server, gather the data you need, fill that data into HTML meta tags, and return those tags as a string to the client.
If the request did not come from a crawler bot, simply return the index.html file from the build or dist folder of your single-page application.
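A minimal sketch of that intermediary layer in Express (the bot list, API URL, and file paths are assumptions; the same branching can be reproduced in a Django view or middleware):

const express = require('express');
const path = require('path');
const fetch = require('node-fetch');             // assumed HTTP client
const app = express();

const BOT_RE = /facebookexternalhit|twitterbot|googlebot|linkedinbot|slackbot/i;

app.get('/posts/:slug', async (req, res) => {
  if (BOT_RE.test(req.headers['user-agent'] || '')) {
    // Crawler: fetch the post from the REST API and return filled-in meta tags.
    const post = await fetch(`http://localhost:8000/api/posts/${req.params.slug}/`)
      .then(r => r.json());                      // hypothetical Django endpoint
    return res.send(`<!doctype html><html><head>
      <title>${post.title}</title>
      <meta property="og:title" content="${post.title}">
      <meta property="og:description" content="${post.summary}">
    </head><body></body></html>`);
  }
  // Regular browser: hand back the SPA shell and let Mithril render it.
  res.sendFile(path.join(__dirname, 'dist', 'index.html'));
});

app.listen(3000);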
I'm looking for a method to scrape a website (which uses JavaScript) from the server side and, after analyzing the data, save the output into a MySQL database. I need to navigate from page to page by clicking links and submitting data from the database, without the session expiring. Is this possible using the phpQuery web browser plugin? I've started doing this using CasperJS. I would like to know the pros and cons of both methods. I'm a beginner in the coding space. Please help.
I would recommend that you use PhantomJS or CasperJS and parse the DOM with JavaScript selectors to get the parts of the pages you want back. Don't use phpQuery as it's based on PHP and would require a separate step in your processing versus using just JavaScript DOM parsing. Also, you won't be able to perform click events using PHP. Anything client side would need to be run in PhantomJS or CasperJS.
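For example, a small CasperJS script that logs in, clicks through to a page and pulls a value out of the DOM might look like this (the URLs, selectors, and credentials are placeholders):

var casper = require('casper').create();

casper.start('http://example.com/login', function () {
  // Fill and submit the login form; the field names are placeholders.
  this.fill('form#login', { username: 'user', password: 'secret' }, true);
});

casper.thenClick('a.next-page');                  // navigate by clicking a link

casper.then(function () {
  // Run a selector inside the page and bring the result back to the script.
  var title = this.evaluate(function () {
    return document.querySelector('h1').textContent;
  });
  this.echo('Scraped title: ' + title);
});

casper.run();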
It might even be possible to write a full scraping engine using just PHP if that's your server-side language of choice. You would need to reverse engineer the login process and maintain a cookie jar with your cURL requests to keep your login valid with each request. Once you've established a session with the website, you can then set up your navigation path with an array of links that you would like to crawl. The idea behind web crawling is that you load a page from some link, process the page, and then move on to the next link. You continue this process until all pages have been processed, and then your crawl is complete.
I would check out Google's guide Making AJAX Applications Crawlable; the website you're trying to scrape might have adopted that scheme to make its content crawlable.
Look for #! in the URL's hash fragment; this indicates to the crawler that the site supports the AJAX crawling scheme.
To put it simply, when you come across a URL like www.example.com/ajax.html#!key=value, you would modify it to www.example.com/ajax.html?_escaped_fragment_=key=value. The server should respond with an HTML snapshot of that page.
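Expressed as a small helper (ignoring the extra percent-encoding of reserved characters described in the full spec), the rewrite a crawler applies looks roughly like this:

// Map a hashbang URL to the form the crawler requests.
// e.g. www.example.com/ajax.html#!key=value
//   -> www.example.com/ajax.html?_escaped_fragment_=key=value
function toEscapedFragmentUrl(url) {
  var parts = url.split('#!');
  if (parts.length < 2) return url;                    // nothing to rewrite
  var sep = parts[0].indexOf('?') === -1 ? '?' : '&';  // keep any existing query string
  return parts[0] + sep + '_escaped_fragment_=' + parts[1];
}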
Here is the Full Specification
In a system that I'm building, I want to serve:
Static files (static HTML pages and a lot of images), and
Dynamic XML generated by my servlet.
The dynamic XML is generated from my database (through Hibernate) and I use Restlets to serve it in response to API calls. I want to create a static file server (e.g. Apache) so that this does not interfere with the dynamic server traffic. Currently both servers need to run on the same machine.
I've never done something like this before and this is where I'm stuck:
The static HTML pages contain JavaScript that makes API calls to the dynamic server. However, since the two servers operate on different ports, I get stuck with the same origin problem. How can this be solved?
As a bonus, if you can point me to any resources that explain how to create such a static/dynamic content serving system, I'll be happy.
Thanks!
You should set up mod_proxy in Apache to forward dynamic requests to whatever backend server you are using. Your existing setup (i.e. two separate ports) is perfect; you just need to tell Apache to 'proxy dynamic requests to my backend server without letting the browser know'.
This page should get you started - http://httpd.apache.org/docs/1.3/mod/mod_proxy.html
You need to load a script tag from the Restlet server. Have a look at JSONP and this SO post.
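A bare-bones JSONP sketch, assuming your Restlet resource can wrap its JSON in a callback name passed as a query parameter (the port and path are made up):

// 1. Define the callback the dynamic server will wrap its JSON in.
window.handleApiData = function (data) {
  console.log('Got data from the Restlet server:', data);
};

// 2. Load the API response as a script; script tags are exempt from the
//    same-origin policy, so the different port no longer matters.
var script = document.createElement('script');
script.src = 'http://localhost:8182/api/items?callback=handleApiData';
document.head.appendChild(script);

// The server must respond with: handleApiData({...json...});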