Recently Google announced that it is ending support for _escaped_fragment_. That was the mechanism Google used to fetch a "static" version of a website's content when the URL contained a hashbang (#!).
So Google now advises serving, alongside the JS version of the website, a static non-JS version, both for users without JavaScript and for Google's bots, in the same way.
So when a person visits, for example, test.com/#!/item/2,
I should serve the JS version of the website and, in a noscript tag, a non-JS version. OK.
But since the hashbang part of the URL is never sent to the server, how am I supposed to know that I need to generate a static page for item 2?
So my question is: how do I provide static content for no-JS users on a website that uses the hashbang URL scheme?
You can't, but that isn't what Google is saying.
Instead of using hashbangs, you should use pushState and the rest of the History API.
That will let you have URLs like http://test.com/item/2.
If someone visits http://test.com/item/2 then your server should generate the page in the state it would be in if they had visited http://test.com/item/1 and then triggered the JavaScript event that would convert it into http://test.com/item/2.
No need to use noscript at all.
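A rough sketch of that idea, assuming a Node/Express server and a hypothetical renderItemPage() helper (neither is prescribed here, it's just an illustration):

// server.js: serve fully rendered HTML for deep links such as /item/2
const express = require('express');
const app = express();

// Hypothetical placeholder: build the same markup the client-side script
// would build after navigating to this item.
function renderItemPage(id) {
  return '<!doctype html><title>Item ' + id + '</title><h1>Item ' + id + '</h1>';
}

app.get('/item/:id', (req, res) => {
  res.send(renderItemPage(req.params.id));
});

app.listen(3000);

The client-side script then takes over from that server-rendered page, calling history.pushState() as the user navigates further.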
Related
I have a website with two versions: an all-singing, all-dancing JavaScript-powered application, which is served when you request the root URL
/
As you navigate around the site the content updates, as does the URL, thanks to HTML5 pushState or good old correctly formatted #! URLs. However, if you don't have JavaScript enabled you can still use all of the site's functionality, as each piece of content also exists under its own URL. This is great for three reasons:
non javascript users can still use the site
SEO - web crawlers can index the site easily
everything is shareable on social networks
The third reason is very important to me, as every piece of content must be individually shareable. Because each piece of content has its own URL it is easy to deep link to it, and each piece can have its own specific Open Graph data.
However, here is the issue I hit. You are a normal person with JavaScript enabled, you are browsing an image gallery on the site, and you decide to share a picture of a lovely cat you have found. Using JavaScript, the URL has been updated to
/gallery/lovely-cat
You share this URL and your friend clicks on it. When they do, the server sends them the non-JavaScript / web crawler version of the site, and the experience is nowhere near as nice as the JavaScript version they would have got if they had gone to the root of the site and navigated from there.
Does anyone have a nice solution / alternative setup to solve this problem? I have several hacks which work, but I am not that happy with them. They include:
a JavaScript redirect to the root of the site on every page, storing a cookie / adding a #! to the URL so that on page render the JavaScript router shows the correct content (does Google punish automatic JavaScript redirects?)
rendering the no-JavaScript page and adding some JavaScript which, similar to the above, redirects the user to the root whenever they click on a link
I don't particularly like either of these, but I can't think of anything better. Rendering the entire JavaScript app for each page doesn't appear to be a solution to me, as you would end up with bad-looking URLs such as /gallery/lovely-cat/gallery/another-lovely-cat as you start navigating through the site.
My solution must support old browsers which do not implement pushState.
Make the "non-JavaScript / web crawler version of the site" the same as the JavaScript version. Just build HTML on the server instead of DOM on the client.
Rendering the entire JavaScript app for each page doesn't appear to be a solution to me,
That is the robust approach.
as you would end up with bad-looking URLs such as /gallery/lovely-cat/gallery/another-lovely-cat
Only if you linked (and pushState'd) to gallery/another-lovely-cat instead of /gallery/another-lovely-cat. (Note the / at the front.)
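A quick illustration (relative paths resolve against the current location, so the exact result depends on the URL you start from):

// With the browser at http://test.com/gallery/lovely-cat/ (note the trailing slash):
history.pushState(null, '', 'gallery/another-lovely-cat');
// address bar becomes http://test.com/gallery/lovely-cat/gallery/another-lovely-cat

history.pushState(null, '', '/gallery/another-lovely-cat');
// address bar becomes http://test.com/gallery/another-lovely-cat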
Try out this plugin; it might address your third reason, along with the other two.
http://www.asual.com/jquery/address/
There are numerous resources out there for implementing SEO-friendly versions of AngularJS applications, of course. Despite reading all of them numerous times, I'm still a bit unclear on a few things, particularly regarding the distinction between the hashbang and HTML5 mode models:
For hashbang (#!, or "HTML4") apps, the following setting is given on the location provider:
$location.hashPrefix('!');
Is this setting required for HTML5 mode as well? Why or why not?
For HTML5 mode apps, the following meta tag is included in the index.html page:
<meta name="fragment" content="!">
Is this meta tag required for hashbang apps as well? Why or why not?
Using HTML5 mode, my URLs look similar to:
http://sample.com/landing/home
Even with the meta tag from #2 specified in my index.html, I'm still unable to navigate to my URLs as a crawler would, such as to:
http://sample.com/#!/landing/home
Is this normal? Should I expect to be able to navigate to my app hashbang-style, if it's an HTML5 mode app, after adding the location provider settings and/or meta tag?
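For reference, a simplified sketch of the kind of AngularJS 1.x config I'm talking about (not my exact code):

angular.module('app', []).config(function ($locationProvider) {
  $locationProvider.html5Mode(true);   // HTML5 mode: real URLs via the History API
  $locationProvider.hashPrefix('!');   // hashbang prefix, used for the fallback/HTML4 style
});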
More than anything, I guess my actual question would be: what's specifically required for HTML5 mode crawling, and what's specifically required for hashbang-style crawling? How do they overlap? Additionally, how does the HTML5 mode configuration actually work, behind the scenes, if no hashbang-style route is ever produced/usable?
Note that these questions are separate from the issue of generating/serving snapshots, which I generally understand.
AngularJS SEO-friendly configuration generally makes sense when it comes to classical hashbang-style apps, but for HTML5 mode, I'm a bit confused. Would love some clarity.
Answers
The hashbang isn't required for HTML4 mode either. But if you want SEO it is good to use it, because search bots will see those links and request a different URL:
Original:
http://somesite.com/#!/crazy/101
Bot:
http://somesite.com/?_escaped_fragment_=crazy/101
The meta tag is included so that the search bot will automatically append ?_escaped_fragment_= to its requests. Since it can't know which part of the URL is actually handled by the SPA, the value will be empty.
Original, with meta tag:
http://somesite/crazy/101
Bot:
http://somesite/crazy/101?_escaped_fragment_=
See #2
How does HTML5 mode work behind the scenes?
It uses the History API introduced with HTML5, which allows changing the browser's URL and manipulating history entries. Basically, it lets developers change the browser's address without the browser actually making a request.
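For example (plain History API calls, nothing framework-specific; renderRoute() is a hypothetical app function):

// Hypothetical app function: re-render the view for the given path/state.
function renderRoute(path, state) { console.log('render', path, state); }

// Changes the address bar and pushes a history entry; no HTTP request is made.
history.pushState({ id: 101 }, '', '/crazy/101');

// Back/forward fire popstate, so the app has to re-render the previous state itself.
window.addEventListener('popstate', function (event) {
  renderRoute(location.pathname, event.state);
});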
Additional HTML5 mode explanation
Suppose your SPA runs at the domain root http://somesite.com. Whenever the URL in the browser changes, it has been manipulated on the client, which means there is no actual content on the server at that sub-content URL.
That's why the bot appends _escaped_fragment_ at the end: so you can serve static content instead of a 404 or a 301 to the root (since the content doesn't exist on the server). This static response does nothing but return content. No processing, no SPA scripts. Pure content.
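A minimal sketch of serving that static content, assuming a Node/Express server and a hypothetical renderSnapshot() helper that returns pre-rendered HTML for a route:

const express = require('express');
const app = express();

// Hypothetical helper: return plain, pre-rendered HTML for the given path.
function renderSnapshot(path) {
  return '<!doctype html><title>Snapshot</title><p>Static content for ' + path + '</p>';
}

app.use(function (req, res, next) {
  // The bot requests e.g. /crazy/101?_escaped_fragment_=
  if (req.query._escaped_fragment_ !== undefined) {
    return res.send(renderSnapshot(req.path)); // pure content, no SPA scripts
  }
  next(); // normal visitors get the regular SPA shell
});

app.listen(3000);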
This question may not be related to an exact software stack, framework or language.
For my current project we are using AngularJS to build the front end, which has a single constant entry page that loads the real data and renders it. That is easy to put on a CDN and good for fast loading on the browser side. But for some social features, such an architecture can cause problems. For example, when you paste a link you want to share into Facebook, Facebook will fetch your page and show a preview. If the landing page is empty, that preview won't work.
(I heard that Google+ recently started rendering JavaScript logic on the server side before sending back a preview, but obviously that isn't commonly supported by other similar services. Google.com also supports indexing JS-based single-page applications.)
Is there a better way to solve this problem gracefully, rather than falling back to a dynamic page that includes the real data? Have I missed something in understanding this problem?
========
... I was even thinking that, for requests identified as coming from Facebook (e.g. by user agent), I could redirect them to a special gateway wrapping something like PhantomJS, fetch the page, render it server-side, and send back a DOM tree snapshot as the content for Facebook to generate its preview from. But I also doubt that this is a good direction. :(
We are in the same situation. The simple solution is to use Open Graph meta tags in the pages your server will serve to Facebook scrapers.
Basically you need to do server-side what your web app is doing client-side. The amount of work depends heavily on your hosting technology (MVC makes it super easy), your URI format, and the APIs you use.
You will find some explanations here:
https://developers.facebook.com/docs/plugins/share-button/
Open graph introduction:
http://ogp.me/
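A rough sketch of the idea (the property names come from the Open Graph docs above; the page object is a hypothetical stand-in for whatever data your server has for that URL):

// Build the Open Graph tags server-side for the URL Facebook is scraping,
// so the scraper sees real metadata even though the page body is rendered by JS.
// (Values should be HTML-escaped in real code.)
function openGraphTags(page) {
  return [
    '<meta property="og:title" content="' + page.title + '">',
    '<meta property="og:description" content="' + page.description + '">',
    '<meta property="og:image" content="' + page.imageUrl + '">',
    '<meta property="og:url" content="' + page.url + '">'
  ].join('\n');
}

Inject the result into the head of whatever HTML your server returns for that URL; the rest of the page can stay as minimal as you like.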
I've got this setup:
A single-page app that generates HTML content using JavaScript. There is no visible HTML for non-JS users.
History.js (pushState) for handling URLs without hashbangs. So the app on "domain.com" loads the dynamic content of "page-id" and updates the URL to "domain.com/page-id". Also, direct URLs work nicely via JavaScript this way.
The problem is that Google cannot execute JavaScript this way. So essentially, as far as Google knows, there is no content whatsoever.
I was thinking of serving cached content to search bots only. So, when a search bot hits "domain.com/page-id", it loads cached content, but if a user loads the same page, they see the normal (JavaScript-injected) content.
A proposed solution for this is using hashbangs, so Google can automatically convert those URLs to alternative URLs with an "_escaped_fragment_" string. On the server side, I could then redirect those alternative URLs to cached content. As I won't use hashbangs, this doesn't work.
Theoretically I have everything in place. I can generate a sitemap.xml and I can generate cached HTML content, but one piece of the puzzle is missing.
My question, I guess, is this: how can I filter out search bot access, so I can serve those bots the cached pages, while serving my users the normal JS enabled app?
One idea was parsing the "HTTP_USER_AGENT" string in .htaccess for any bots, but is this even possible and not considered cloaking? Are there other, smarter ways?
updates the URL to "domain.com/page-id". Also, direct URLs work nicely via JavaScript this way.
That's your problem. The direct URLs aren't supposed to work via JavaScript. The server is supposed to generate the content.
Once whatever page the client has requested is loaded, JavaScript can take over. If JavaScript isn't available (e.g. because it is a search engine bot) then you should have regular links / forms that will continue to work (if JS is available, then you would bind to click/submit events and override the default behaviour).
A proposed solution for this is using hashbangs
Hashbangs are an awful solution. pushState is the fix for hashbangs, and you are using it already; you just need to use it properly.
how can I filter out search bot access
You don't need to. Use progressive enhancement / unobtrusive JavaScript instead.
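A sketch of that pattern on the client, assuming the server already renders real HTML at every URL (loadContent() and the #content container are assumptions for the example):

// Plain links like <a href="/page-id"> keep working without JavaScript.
// When JavaScript is available, intercept them and update the page in place.
document.addEventListener('click', function (event) {
  var link = event.target.closest('a');
  if (!link || link.origin !== location.origin) return;

  event.preventDefault();
  history.pushState(null, '', link.href);
  loadContent(link.pathname);
});

window.addEventListener('popstate', function () {
  loadContent(location.pathname);
});

// Hypothetical: fetch the server-rendered page/fragment and swap it into the DOM.
function loadContent(path) {
  fetch(path).then(function (res) { return res.text(); }).then(function (html) {
    document.querySelector('#content').innerHTML = html;
  });
}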
I'm making a Web-App (still in "Beta") which uses the Flickr API to get information for the photos of a particular Flickr user and generates IPB code to post any of his/her images.
While Flickr now gives you, directly on its site, the IPB code to show the image and link back to the photo page, my app also has the option of embedding the title, description, select EXIF data, location information, etc. into the post for the IPB forum.
I've most recently added the option to integrate a Google Maps image of the photo's geolocation data into the post by using the Google Static Maps API.
The problem is that the image URL I have is in the following form (including IPB [IMG] tags):
[IMG]http://maps.google.com/maps/api/staticmap?zoom=16&size=600x600&maptype=hybrid&markers=19.387687,-99.251732&sensor=false[/IMG]
Which shows this example image (in practice, the image size is user-selectable).
However, some IPB forums don't seem to support dynamic image URLs, which gives me a broken image. I'd like to replace the
[IMG]http://maps.google.com/maps/api/staticmap?zoom=16&size=600x600&maptype=hybrid&markers=19.387687,-99.251732&sensor=false[/IMG]
with something like
[IMG]http://maps.google.com/maps/api/staticmap/map0000001.png[/IMG]
which should be supported by all IPB forums. Thanks in advance for your help.
In case you're interested, the most recent "released" version of my Web-App can be found here: http://flickr.argote.mx/ (The changes I mention here are still on local development server).
There are two types of solution as far as I can see:
You create a proxy server to download the images from Google and serve them on nice URLs to the clients. The disadvantage is that you will have to handle high traffic through your own servers (I don't know much about your project; you have to decide whether that's acceptable performance-wise).
You create a special BBCode to handle your URLs, which you can then use on any IPB forum.
+1: You could create a server-side script with nice URLs that redirects to the Google URLs, but the problem is that you never know how different browsers will handle it. I suppose they normally don't follow redirects for images inside pages.
+2: Ask Google to support nice URLs ;)
Hope that helps.
You should be able to use a URL shortener service, as long as the service supports simple 301 redirects to image resources. You'd have to try out which ones do.
For example, bit.ly has a REST API. It allows you to make calls like this from within PHP:
http://api.bitly.com/v3/shorten?login=abc&apiKey=123&longUrl={myurl}&format=json
returning a bit.ly URL that you can use in BBCode.
Edit: According to this JSFiddle, this method works, at least in Chrome and IE8. It would still need scrupulous testing across browsers.
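For example, called from client-side JavaScript instead of PHP (a sketch; the login, apiKey and JSON response shape follow the bit.ly v3 example above, so check them against the current bit.ly documentation before relying on this):

var longUrl = 'http://maps.google.com/maps/api/staticmap?zoom=16&size=600x600' +
              '&maptype=hybrid&markers=19.387687,-99.251732&sensor=false';
var api = 'http://api.bitly.com/v3/shorten?login=abc&apiKey=123&format=json' +
          '&longUrl=' + encodeURIComponent(longUrl);

fetch(api)
  .then(function (res) { return res.json(); })
  .then(function (json) {
    var shortUrl = json.data.url;                   // assumed v3 response field
    console.log('[IMG]' + shortUrl + '[/IMG]');     // BBCode ready to paste
  });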
Since both of Aston's suggestions are out of the question, maybe you can set up a simple script that redirects the request to the Google Maps image (instead of a proxy)?
So you could have something like http://my-simple-script.tld/lat,lng and have that script redirect to the correct Google Static Maps image URL.
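A sketch of such a redirect script, assuming Node/Express and the /lat,lng URL shape from the example above (zoom and size are hard-coded here but could be parameters too):

const express = require('express');
const app = express();

// e.g. GET /19.387687,-99.251732 sends a 301 to the real Static Maps URL
app.get('/:coords', function (req, res) {
  const target = 'http://maps.google.com/maps/api/staticmap' +
    '?zoom=16&size=600x600&maptype=hybrid' +
    '&markers=' + req.params.coords + '&sensor=false';
  res.redirect(301, target);
});

app.listen(3000);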