I have a website which has two versions: an all-singing, all-dancing JavaScript-powered application, which is served when you request the root URL
/
As you navigate around the lovely website the content updates, as does the URL, thanks to HTML5 pushState or good old correctly formatted #! URLs. However, if you don't have JavaScript enabled you can still use all the functionality of the site, as each piece of content also exists under its own URL. This is great for three reasons:
Non-JavaScript users can still use the site.
SEO: web crawlers can index the site easily.
Everything is shareable on social networks.
The third reason is very important to me, as every piece of content must be individually shareable on the site. And because each piece of content has its own URL, it is easy to deep-link to it, and each piece of content can have its own specific Open Graph data.
However, the issue I hit is the following. You are a normal person with JavaScript enabled, you are browsing an image gallery on the site, and you decide to share the picture of a lovely cat you have found. Using JavaScript, the URL has been updated to
/gallery/lovely-cat
You share this URL and your friend clicks on it. When they click on the link, the server sends them the non-JavaScript / web-crawler version of the site, and the experience is nowhere near as nice as the JavaScript version they would have been served if they had gone directly to the root of the site and navigated there.
Does anyone have a nice solution / alternative setup to solve this problem? I have several hacks which work, but I am not that happy with them. They include:
A JavaScript redirect to the root of the site on every page, storing a cookie / adding a #! to the URL so that on page render the JavaScript router will show the correct content. (Does Google punish automatic JavaScript redirects?)
Rendering the non-JavaScript page and adding some JavaScript which redirects the user to the root, similar to the above, whenever the user clicks on a link.
I don't particularly like either of these solutions, but I can't think of a better one. Rendering the entire JavaScript app for each page doesn't appear to be a solution to me, as you would end up with bad-looking URLs such as /gallery/lovely-cat/gallery/another-lovely-cat as you start navigating through the site.
My solution must support old browsers which do not implement pushState.
Make the "non javascript / web crawler version of the site" the same as the JavaScript version. Just build HTML on the server instead of DOM on the client.
Rendering the entire javascript app for each page doesn't appear to be a solution to me,
That is the robust approach
as you would end up with bad looking urls such as /gallery/lovely-cat/gallery/another-lovely-cat
Only if you linked (and pushStateed) to gallery/another-lovely-cat instead of /gallery/another-lovely-cat. (Note the / at the front).
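A quick illustration of the difference that leading slash makes (the paths are the hypothetical ones from the question):

// Starting from /gallery/lovely-cat:

// Absolute path (note the leading slash): the whole path is replaced.
history.pushState(null, '', '/gallery/another-lovely-cat');
// The URL is now /gallery/another-lovely-cat

// Relative path: it is resolved against the current location instead of
// replacing it, which is how the nested, bad-looking URLs appear.
history.pushState(null, '', 'gallery/another-lovely-cat');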
Try out this plugin; it might solve your third reason, along with the other two:
http://www.asual.com/jquery/address/
I am creating an Angular app that is hosted on a web server that doesn't allow me to edit .htaccess files or web.config. There is no server-side language option available, which means no middleware for creating HTML snapshots. This is a high-dollar CRM with a web store and no option of switching hosts.
So I have come up with my own "solution" to the issue. Would it be considered OK to create hyperlinks that link to URLs which generate the same view that is updated by an onClick event? This way the user will see the content loaded immediately, but bots will have to reload the page at the new URL to see the page content.
Example:
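Something roughly like this, where the href is a real, crawlable URL and the click handler swaps the view in place for JavaScript users (the route and handler names here are only placeholders):

<!-- Placeholder route and handler: the href is a real URL a bot can follow,
     while the onclick swaps the view in place for JS users. -->
<a href="/store/view2" onclick="return loadView('/store/view2')">View 2</a>

<script>
  function loadView(url) {
    // Load the partial for `url` into the existing view container here
    // (e.g. via the Angular router or $http), then cancel the full navigation.
    return false; // JS users stay on the page; bots reload at the new URL
  }
</script>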
I'm struggling to find a good solution to this issue, and I know others have to be in the same situation as me when it comes to development. The code above is just a visual reference to what I am referring to.
Have you looked at grunt-html-snapshot?
After implementing and testing this, it does work well. Google sees them as new pages, and the user never has to worry about loading new content.
This question may not be related to an exact software stack, framework, or language.
For my current project, we are using AngularJS to build the front end, which has a constant entrance page that loads the real data and renders it. This is easy to serve from a CDN and good for fast loading on the browser side. But for some social features, such an architecture can cause problems. For example, when you paste a link you are interested in into Facebook to share it, Facebook will fetch your page and show a preview. If the landing page is empty, the preview won't work.
(I heard that Google+ recently started rendering JavaScript logic on the server side before sending back a preview, but obviously this is not commonly supported by other similar services. Google.com also supports indexing JS-based one-page applications.)
Is there a better way to solve this problem gracefully, rather than falling back to a dynamically rendered page that includes the real data? Have I missed something in my understanding of the problem?
========
... I was even thinking that, for requests identified as coming from Facebook (e.g. by user agent), I could redirect them to a special gateway wrapping something like PhantomJS, fetch the page, render it server-side, and send back a DOM-tree snapshot as the content for Facebook to generate the preview from. But I also doubt that this is a good direction. :(
We are in the same situation. The simple solution is to use Open Graph meta tags in the pages your server will serve to Facebook scrapers.
Basically, you need to do on the server side what your web app is doing on the client side. The amount of work highly depends on your hosting technology (MVC makes it super easy), your URI format and the APIs you use.
You will find some explanations here:
https://developers.facebook.com/docs/plugins/share-button/
Open Graph introduction:
http://ogp.me/
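A minimal sketch of that idea in Node/Express (the route, the data lookup and the tag values are assumptions, not a prescription for your stack): when Facebook's scraper requests a content URL, respond with a small server-rendered page whose Open Graph tags are already filled in, and let everyone else fall through to the normal JavaScript app.

const express = require('express');
const app = express();

app.get('/gallery/:slug', (req, res, next) => {
  const ua = req.get('User-Agent') || '';
  // Facebook's scraper identifies itself as "facebookexternalhit" (or "Facebot").
  if (!/facebookexternalhit|Facebot/i.test(ua)) return next(); // normal users get the JS app

  // Look up the real data for this piece of content here.
  const item = { title: 'Lovely cat', image: 'https://example.com/img/lovely-cat.jpg' };

  res.send(`<!DOCTYPE html>
<html><head>
  <meta property="og:title" content="${item.title}">
  <meta property="og:type" content="article">
  <meta property="og:url" content="https://example.com/gallery/${req.params.slug}">
  <meta property="og:image" content="${item.image}">
</head><body>${item.title}</body></html>`);
});

app.listen(3000);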
I've got this setup:
Single-page app that generates HTML content using JavaScript. There is no visible HTML for non-JS users.
History.js (pushState) for handling URLs without hashbangs. So the app on "domain.com" can load the dynamic content for "page-id" and update the URL to "domain.com/page-id". Also, direct URLs work nicely via JavaScript this way.
The problem is that Google cannot execute JavaScript this way. So essentially, as far as Google knows, there is no content whatsoever.
I was thinking of serving cached content to search bots only. So, when a search bot hits "domain.com/page-id", it loads the cached content, but when a user loads the same page, they see the normal (JavaScript-injected) content.
A proposed solution for this is using hashbangs, so Google can automatically convert those URLs to alternative URLs containing an "_escaped_fragment_" string. On the server side, I could then redirect those alternative URLs to the cached content. As I won't use hashbangs, this doesn't work.
Theoretically I have everything in place. I can generate a sitemap.xml and I can generate cached HTML content, but one piece of the puzzle is missing.
My question, I guess, is this: how can I filter out search bot access, so I can serve those bots the cached pages, while serving my users the normal JS-enabled app?
One idea was parsing the "HTTP_USER_AGENT" string in .htaccess for any bots, but is this even possible, and would it not be considered cloaking? Are there other, smarter ways?
update the URL to "domain.com/page-id". Also, direct URLs work nicely via JavaScript this way.
That's your problem. The direct URLs aren't supposed to work via JavaScript. The server is supposed to generate the content.
Once whatever page the client has requested is loaded, JavaScript can take over. If JavaScript isn't available (e.g. because the client is a search engine bot), then you should have regular links / forms that will continue to work (if JS is available, then you would bind to click/submit events and override the default behaviour).
A proposed solution for this is using hashbangs
Hashbangs are an awful solution. pushState is the fix for hashbangs, and you are using it already; you just need to use it properly.
how can I filter out search bot access
You don't need to. Use progressive enhancement / unobtrusive JavaScript instead.
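A rough sketch of that pattern (the data attribute, container id and request header are assumptions): the server renders full HTML at every URL such as "domain.com/page-id", and the script below only enhances the experience when it is available.

// Every <a> already works without JavaScript, because the server renders
// full pages at these URLs; this script only enhances the experience.
if (window.history && window.history.pushState) {
  document.addEventListener('click', function (event) {
    var link = event.target.closest('a[data-enhance]');
    if (!link) return;

    event.preventDefault();
    loadPage(link.href);
    window.history.pushState({ url: link.href }, '', link.href);
  });

  // Keep the back/forward buttons working.
  window.addEventListener('popstate', function (event) {
    if (event.state && event.state.url) loadPage(event.state.url);
  });
}

function loadPage(url) {
  // Ask the server for the same page, swapping only the content area.
  fetch(url, { headers: { 'X-Requested-With': 'XMLHttpRequest' } })
    .then(function (response) { return response.text(); })
    .then(function (html) {
      document.getElementById('content').innerHTML = html;
    });
}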
I want my webpage to have two parts. The top part has a textbox. When the user types a URL into the textbox, the bottom part browses to the content of that URL. When the user clicks a link within the bottom part, the bottom part navigates to the new URL, and the textbox in the top part changes to the new URL. How can I do it?
NOTE: This behavior is the same as in Google Translate (e.g. here), but without any translation.
first problem..
Same origin issue
The only way to achieve what you are asking is exactly the way Google Translate does it, which is to use a server-side script as a proxy for the request:
http://translate.google.com/translate_un?depth=1&hl=en&ie=UTF8&prev=_t&rurl=translate.google.com&sl=auto&tl=en&twu=1&u=http://de.wikipedia.org/wiki/USA&lang=de&usg=ALkJrhgoLkbUGvOPUCHoNZIkVcMQpXhxZg
The above is the URL taken from the iframe that Google Translate uses to display the translated page. The main thing to note is that the domain part of the URL is the same as the parent page's URL, http://translate.google.com -- if your frame and your parent window do not share the same domain, then your parent window's JavaScript won't be able to access anything within the iframe. It will be blocked by your browser's built-in security.
Obviously the above won't be a problem if, in your project, you are only ever going to be navigating your own pages (on the same domain), but considering you are proffering Google Translate as an example, I'm assuming not.
What would Google do?
What the above URL does is ask the server side to fetch the Wikipedia page and return it so that the iframe can display it -- but to the iframe this page appears to be hosted on translate.google.com rather than wikipedia.org. This means that the iframe stays within the same origin as the parent window, which means that JavaScript can be used to edit or modify the page within the iframe.
next problem....
Rewrite the proxied content
Basically, what I'm saying is that this can't be achieved with just HTML and client-side JavaScript -- you need something to help from the server side, i.e. PHP, Python, Ruby, Lisp, Node and so on. This script will be responsible for making sure the proxied page appears and renders correctly, e.g. you will have to make sure relative links to content/images/CSS on the original server are not broken (you can use the base tag or physically rewrite relative links). There are also many sites that would see this as an illegal use of their site, as per their terms of use, and so they should be blacklisted from your service.
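For instance, a bare-bones Node sketch of such a proxy endpoint (the u parameter mirrors the Google Translate URL above; everything else here is an assumption) could fetch the target page and inject a base tag so relative links still resolve against the original host:

// Bare-bones proxy sketch (HTTP only, no error handling, no blacklist).
const http = require('http');

http.createServer((req, res) => {
  // The original destination is passed as ?u=..., like in the Google Translate URL.
  const target = new URL(req.url, 'http://localhost').searchParams.get('u');
  if (!target) {
    res.writeHead(400);
    return res.end('missing u parameter');
  }

  http.get(target, (upstream) => {
    let body = '';
    upstream.on('data', (chunk) => { body += chunk; });
    upstream.on('end', () => {
      // Inject a <base> tag so relative links, images and CSS still resolve
      // against the original host instead of breaking.
      const patched = body.replace(/<head([^>]*)>/i, `<head$1><base href="${target}">`);
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end(patched);
    });
  });
}).listen(8080);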
final problem..?
Prevent the user from breaking away from your proxy
Once you have your proxy script, you can then use an iframe (please avoid using old framesets) and a bit of JavaScript magic that, on load or on DOM-ready of the iframe, rewrites all of the links, forms and buttons in the page. This is so that when they are clicked or submitted, they post to your proxy script rather than the original destination. This rewrite code also has to send the original destination to your proxy script somehow -- like the u parameter in the Google Translate URL. Once you've sorted this, your iframe will reload with the new destination's content, but -- all-importantly -- it will stay on the same domain.
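A rough sketch of that rewrite step (the iframe id and proxy path are assumptions), which only works because the proxied page now lives on the same origin:

// Rewrite every link inside the (now same-origin) proxied page so that clicks
// go back through the proxy instead of the original destination.
var frame = document.getElementById('proxied-page');

frame.onload = function () {
  var doc = frame.contentDocument || frame.contentWindow.document;
  var links = doc.getElementsByTagName('a');

  for (var i = 0; i < links.length; i++) {
    var original = links[i].href; // absolute URL of the original destination
    links[i].href = '/proxy?u=' + encodeURIComponent(original);
  }

  // Forms need the same treatment: point form.action at the proxy and pass
  // the original destination along, e.g. in a hidden "u" field.
};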
too many problems!
If it were me, personally, I'd rethink your strategy
Overall this is not a simple task, and it isn't 100% foolproof either, because there are many things that will cause problems:
Certain sites are designed to break out of frames.
There are ways a user can navigate away from a page that cannot be easily rewritten, i.e. any navigation powered by JavaScript.
Certain pages are designed to break when served up from the wrong host.
Sites that do this kind of 'proxying' of other websites can get into hot water with regards to copyright and usage.
The reason Google can do it is that they have a lot of time, money and resources... oh, and a great deal of what Google Translate does is actually handled on the server side -- not in JavaScript.
suggestions
If you are looking to track users navigating through your own site:
Use Google Analytics.
Or implement a simple server-side tracking system using cookies.
If you are looking to track users coming to your site and then travelling on to the rest of the world wide web:
Give up; web technologies are designed to prevent things like this.
Or join an online marketing company; they do their best to get around the prevention of things like this.
Add a JavaScript function to your second frame:
<iframe id="dataframe" src="frame_a.htm" onload="load()"></iframe>
Let the text box have an id -- say "test":
function load()
{
    // Read the frame's current location rather than the original src attribute,
    // so the textbox updates every time the user navigates inside the frame
    // (this only works while the framed page is on the same origin).
    document.getElementById('test').value =
        document.getElementById('dataframe').contentWindow.location.href;
}
I'm sorry if this is a newbie question, but I don't really know what to search for either. How do you keep content from a previous page when navigating through a website? For example, the right-side Activity/Chat bar on Facebook: it doesn't appear to refresh when going to different profiles; it's not an iframe and doesn't appear to be AJAX (I could be wrong).
Thanks,
I believe what you're seeing in Facebook is not actual "page loads", but clever use of AJAX or AHAH.
So ... imagine you've got a web page. It contains links. Each of those links has a "hook" -- a chunk of JavaScript that gets executed when the link gets clicked.
If your browser doesn't support JavaScript, the link works as it normally would on an old-fashioned page, and loads another page.
But if JavaScript is turned on, then instead of navigating to the href, the code run by the hook makes a request to a different URL, which spits out just the HTML needed to replace a DIV that's already showing somewhere on the page.
There's still a real link in the HTML just in case JS doesn't work, so the HTML you're seeing looks as it should. Try disabling JavaScript in your browser and see how Facebook works.
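A stripped-down version of such a hook (the element ids and fragment URL are assumptions): the link still works without JavaScript, but with JavaScript only the main column is replaced, so the side panel never reloads.

var link = document.getElementById('profile-link');

link.addEventListener('click', function (event) {
  event.preventDefault(); // skip the full page load

  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/fragments/profile'); // a URL that returns just the HTML chunk
  xhr.onload = function () {
    document.getElementById('main-column').innerHTML = xhr.responseText;
  };
  xhr.send();
});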
Live updates like this are all over the place in Web 2.0 applications, from Facebook to Google Docs to Workflowy to Basecamp, etc. The "better" tools provide the underlying HTML links where possible so that users without JavaScript can still get full use of the applications. (This is called Progressive Enhancement or Graceful degradation, depending on your perspective.) Of course, nobody would expect Google Docs to work without JavaScript.
In the case of a chat like Facebook's, you must save the entire conversation on the server side (for example, in a database). Then, when the user changes the page, you can restore the state of the conversation on the server side (with PHP) or by querying your server as you do for the chat itself (JavaScript + AJAX).
This isn't done in JavaScript. It needs to be done using your back-end scripting language.
In PHP, for example, you use sessions. The variables set by server-side scripts can be maintained on the server and tied together (across multiple requests/hits) using a cookie.
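The same idea in server-side JavaScript, as a minimal sketch using Node and express-session (all names here are assumptions): the state lives on the server, and a cookie ties the requests together.

const express = require('express');
const session = require('express-session');

const app = express();
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));

app.post('/chat/message', express.json(), (req, res) => {
  // The conversation is kept on the server, tied to the browser by a session
  // cookie, so it survives full page loads.
  req.session.messages = req.session.messages || [];
  req.session.messages.push(req.body.text);
  res.json({ ok: true });
});

app.get('/chat/messages', (req, res) => {
  res.json(req.session.messages || []);
});

app.listen(3000);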
One really helpful trick is to run HTTPFox in Firefox so you can actually monitor what's happening as you browse from one page to the next. You can check out the POST/Cookies/Response tabs and watch for which web methods are being called by the AJAX-like behaviors on the page. In doing this you can generally deduce how data is flowing to and from the pages, even though you don't have access to the server side code per se.
As for the answer to your specific question, there are too many approaches to list (cookies, server-side persistence such as sessions or database writes, a simple form POST, VIEWSTATE in .NET, etc.).
You can reopen your last closed web page by pressing Ctrl+Shift+T, and then save its content as you like. For example, if I closed a web page about document sharing and am now on a travel web page, pressing Ctrl+Shift+T automatically reopens my last closed page. This works in Firefox, Internet Explorer, Opera and more. Hope this answer is helpful to you.