This article makes a pretty convincing argument that because URLs are long-lived (they get bookmarked and passed around), they should be meaningful, and that using the hash for real routing (determining what data is shown on the page and/or the state of the application) is thus improper. When I try to actually do that in my single-page application, though, I run up against a problem: how do I render my links so that all browsers can use the application? As I see it, there are three options:
all hrefs have a #/ prefix. This works great in HTML4 browsers. In HTML5 browsers, I can add a Sammy route that redirects to the hash-less version, which also works great. There might be a problem with browsers marking links as visited when they're not or not marking them visited when they are. The other problem is that it's... wrong. Anyone who shares a link by right-clicking it and selecting "Copy Link URL" will be sending a working but kludgy URL around.
no hrefs have a #/ prefix. As far as I can tell, HTML4 browsers will have no way of intercepting these link clicks, which means that every one will cause a page refresh. Though the application would probably still function, since I could use Sammy routes to rewrite the hashless versions to hashy ones on page load, the page loads would kill the performance of the single-page application.
I dynamically determine whether to prefix with #/ or not. This means that all of my links have to have dynamic markup and dramatically complicates the application.
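For what it's worth, here is a rough sketch of the redirect in option 1, written in plain JavaScript rather than as a Sammy route (the URLs are only illustrative):

// Option 1: every href is written with a "#/" prefix. Browsers that support
// the History API rewrite the hash form to the clean path on page load.
var supportsPushState = !!(window.history && window.history.pushState);

if (supportsPushState && window.location.hash.indexOf('#/') === 0) {
  // e.g. http://example.com/#/users/42  ->  http://example.com/users/42
  var path = window.location.hash.slice(1); // "/users/42"
  window.history.replaceState(null, '', path);
  // ...then hand "/users/42" to the client-side router as usual.
}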
The hash value of a URL has never caused an entire reload of the page, not in HTML4 and not before that. A hash value has always been an internal link, and therefore it can be used perfectly well (take a look at Twitter, for example). Of course, when you refresh the page, you will reload the page. But that is obvious.
With JavaScript you can actually read this hash value (see also this question/answer: How can you check for a #hash in a URL using JavaScript?) using window.location.hash
Using a more recent browser, you can also detect a hash change, which is useful if users actually change the URL: On - window.location.hash - Change?
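For example (a minimal sketch; the fragment value is made up):

// Read the current fragment, including the leading "#".
var hash = window.location.hash; // e.g. "#/users/42"

// React when the fragment changes, e.g. when the user edits the URL
// or presses back/forward between hash states.
window.addEventListener('hashchange', function () {
  console.log('hash is now', window.location.hash);
});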
But when you, as the website, change the URL, you don't need to read it; you already know it, because you just changed it.
This way, using hashes, people can exchange the URLs, and you can actually read which URL they are requesting, and therefore it should work perfectly.
As RIAs and SPAs (or web apps with heavy JavaScript usage) have become more and more popular, I've been running into systems that, instead of using good old <a href> hyperlinks, use onclick handlers with JavaScript code that manipulates navigation. This is particularly true with images.
For example, instead of seeing something like this:
<a href="..."><img src="...."/></a>
I see something like this:
<div ... onclick='SomeJsFunctionThatNavsToAnotherPage()'><img src="..."/></div>
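The handler behind such a construct usually does nothing more than set the location; a hypothetical body (the target page is invented here) might be:

// Hypothetical implementation of the onclick handler above:
function SomeJsFunctionThatNavsToAnotherPage() {
  window.location.href = '/some/other/page.html';
}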
What is the advantage of this? It makes it incredibly hard to trace where pages transition to when debugging or trying to root-cause a bug. I can see the point when the navigation target can change dynamically (so yes, in that case you could use a function that computes which page to navigate to).
But I see this pattern even when the pages to navigate to are constant. I find it extremely convoluted and hard to test. Not to mention the browser-specific bugs that inevitably come from (in my sad experience) over-complicating the front end.
But I am not a RIA/SPA developer (just backend and traditional web development). Am I missing the rationale behind this?
TO CLARIFY
My question is not about the case when we want to redraw the page or change the current content without changing the current location. My question is about plain old transitions, from page A to page B.
In such a case, why use onclick=funcToChangeLocation() over <a href="some location">?
This has been a pain for me when troubleshooting systems that are already written (for I wouldn't write them like that), but there could be reasons I am not aware of.
Again, my question is not for pages that redraw themselves without changing the browser location, but for navigation from one page to the next.
ALSO
If you are going to vote to close this question, at least leave a message explaining why.
If you are making a web application, sometimes you don't want to redirect the user to another page but instead dynamically change the content of the current page without refreshing it. This has some advantages: it can be faster, you can easily keep the state of the page/application, you are not obliged to communicate with the server, and you can update only a part of the page.
You can also dynamically request just the data needed to render the page. If you are displaying a user profile page, you can request only a JSON object that represents the user. This JSON object is smaller than the whole page and is rendered dynamically. That helps reduce the data transferred between users and the server when bandwidth is limited.
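As a sketch of that idea (the endpoint, element ids and field names are invented, and the modern fetch API is used for brevity):

// Fetch just the user as JSON and re-render one region of the page,
// instead of reloading the whole document.
fetch('/api/users/42')
  .then(function (response) { return response.json(); })
  .then(function (user) {
    document.querySelector('#profile-name').textContent = user.name;
    document.querySelector('#profile-bio').textContent = user.bio;
  });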
EDIT: In the case of a simple page redirection, I think it's bad practice and I cannot see an advantage. I think it obfuscates the website when the Google crawler tries to parse it.
I once had a pretty successful web directory website. One day Google decided that "directories" are competing businesses and started penalizing sites that had links on directories. I used the method you describe to cloak outgoing links to try and trick Google.
Until now I believed that it was not possible for the HTML source of a page to be modified without reloading it entirely. I suspect what I've just seen is somehow related to pushState (but not only that): on the Quartz website (qz.com), you seamlessly slide to the next article once you've reached the bottom of the one you're reading, with a clean update of the URL and, amazingly, an updated HTML source.
For an example, scroll down any article, e.g.: http://qz.com/643497/we-are-witnessing-the-rise-of-global-authoritarianism-on-a-chilling-scale/
Maybe I've missed something, but could someone explain how this is done? Which HTML5 APIs are being used here?
NOTE: My question specifically focuses on updating the HTML content which you can see when viewing the page source (not the DOM inspector, obviously).
This looks a lot like your typical asynchronous content loading, but yes, they are using pushState to change the URL. What is tripping you up, I think, is that the routes are well designed, so navigating directly to that URL gives you that same article first. Behind the scenes, however, it is all just a series of JSONP requests combined with pushState() to produce a nicely flowing, well-formed document. You can verify the requests using the network tab of any modern debugging console.
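A loose sketch of that flow (the selectors and URLs are invented, and plain fetch is used here instead of JSONP to keep it short):

var loading = false;
var nextArticleUrl = '/the-next-article-slug/';

window.addEventListener('scroll', function () {
  var nearBottom = window.innerHeight + window.scrollY >=
                   document.body.offsetHeight - 500;
  if (!nearBottom || loading) return;
  loading = true;

  // Load the next article asynchronously, append it, and update the
  // address bar so the URL always matches the article in view.
  fetch(nextArticleUrl)
    .then(function (res) { return res.text(); })
    .then(function (html) {
      document.querySelector('#articles').insertAdjacentHTML('beforeend', html);
      window.history.pushState({}, '', nextArticleUrl);
      loading = false;
    });
});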
See this answer for info on using pushState for URL updating: Modify the URL without reloading the page
EDIT
Since you are getting stuck on how the source is changing, think about it this way: when I say "the routes are well designed, so going to that URL gives you that same article first", I mean that requesting that URL directly will serve that article. All "View Source" does is take the active URL (modified by pushState, so you are really fetching a different page to retrieve the source) and show its text without DOM parsing/rendering. I hope that clarifies what is going on on this site in particular.
You can now do this in most "modern" browsers!
Here is the original article I read (posted July 10, 2010): (HTML5: Changing the browser-URL without refreshing page).
For a more in-depth look into pushState/replaceState/popstate (aka the HTML5 History API) see the MDN docs.
You can do this:
window.history.pushState("object or string", "Title", "/new-url");
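To make the back/forward buttons behave, you also listen for popstate; a minimal sketch (the state object and URL are arbitrary):

// Push a new history entry without reloading the page.
window.history.pushState({ page: 2 }, '', '/page-2');

// React when the user navigates back/forward through those entries.
window.addEventListener('popstate', function (event) {
  // event.state is whatever object was passed to pushState/replaceState.
  console.log('now at history entry with state:', event.state);
});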
I am creating an internal dashboard on my site which is only accessible to logged in users and therefore is not indexable / crawlable by search engines. This dashboard is mainly a single-page app. Also, I don't care (at least I don't think I care) about having pretty urls: there is no progressive enhancement - if javascript is disabled, then the dashboard is not functional.
What I need is the ability to navigate using the back / forward button between different states - for instance, various modals that are opened. And very importantly, I need to be able to link externally to the specific state of the dashboard (e.g. modal A was open in this configuration) - e.g. via emails to users containing links to the dashboard.
Given all this, is there any reason to prefer "old school" hashbangs (#!) over HTML5 pushState? pushState will require me to use something like history.js for older browser support anyway. And architecturally speaking, if my dashboard is at the following URL:
http://example.com/dashboard
won't I have to perform nearly identical operations to resolve to a particular modal state regardless of whether I'm using pushState or onhashchange? In other words:
http://example.com/dashboard#!modalA/state1
or
http://example.com/dashboard/modalA/state1
both of which will require parsing client side (done by a framework) to figure out how to display the current dashboard state. My backend controller would still be mapping to /dashboard/* for any dashboard url since all of the state concern is handled on the client.
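To make that concrete, the client-side parsing really is the same either way; a rough sketch (the route layout is taken from the example URLs above):

// Works for both #!modalA/state1 and /dashboard/modalA/state1.
var raw = window.location.hash
  ? window.location.hash.replace(/^#!?/, '')              // "modalA/state1"
  : window.location.pathname.replace('/dashboard/', '');  // "modalA/state1"

var parts = raw.split('/');
var modal = parts[0];  // "modalA"
var state = parts[1];  // "state1"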
Am I missing something? Not that it should matter, but I am using CanJS which supports both hash events and pushState.
Note: my question is not specific to Google's hashbang proposal, but to the general use of the # (onhashchange).
This article does a pretty good job of summing up the potential issues with using hash/hashbang URLs, though it's pretty biased against them, often with good reason.
You are in a pretty fortunate position given that:
You don't have to worry about SEO concerns
Your app is internal to your organization
To me this makes your choice pretty clear cut: it depends on whether or not you can require people within your organization to upgrade their browsers to an HTML5-compatible version, if they haven't already. If you have that ability, use the HTML5 History API. If you don't, use the hash. If you want HTML5 pushState with an HTML4 onhashchange fallback, you can do that as well, though it will require some extra work to ensure all your HTML5 links still work for HTML4 users.
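A rough sketch of what that fallback can look like (the function names and route format are invented, the wiring to your framework is left out, and very old browsers would also need an attachEvent or polling shim for hashchange):

function navigate(path) {                        // e.g. navigate('/modalA/state1')
  if (window.history && window.history.pushState) {
    window.history.pushState(null, '', '/dashboard' + path);
    render(path);
  } else {
    // Older browsers fall back to the hash; hashchange fires afterwards.
    window.location.hash = '#!' + path.slice(1);
  }
}

window.addEventListener('popstate', function () {
  render(window.location.pathname.replace('/dashboard', ''));
});
window.addEventListener('hashchange', function () {
  render('/' + window.location.hash.replace(/^#!?/, ''));
});

function render(path) {
  // Map the path to a dashboard/modal state; left to your framework (CanJS here).
}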
I am creating a single-page-app (SPA) on my local machine, but a few levels down from the document root. So, for example, my index page would be at http://localhost/projects/foo/index.html.
I'm using Davis.js to do client-side routing with the history API, and it recommends using absolute URLs for its routes. E.g. <a href="/hello/world"> triggers the /hello/world route.
This is a problem, because when you click that link residing on http://localhost/projects/foo/index.html, it changes the URL to http://localhost/hello/world, which is obviously not right, even though the app continues on like normal (because you never actually left the page). Refresh the page, though, and you'll get a 404 error, because the file http://localhost/hello/world doesn't exist.
Using relative links, like <a href="hello/world"> gets closer to the mark. Clicking that link changes the URL to http://localhost/projects/foo/hello/world, but does not trigger the /hello/world route. Click that same link again, and you'll find yourself at http://localhost/projects/foo/hello/hello/world (double hello). Again, not desired behavior.
Now, what's going on is that Davis is matching routes from the root of the domain, so /hello/world will only trigger when the url is http://somewhere.tld/hello/world. But even if I was serving directly out of the document root, there's still the problem that /hello/world doesn't actually exist.
At the moment, my current solution is forcing Davis to use hash-based routing instead of path-based: http://localhost/projects/foo/index.html#/hello/world. This works 100% as expected, because the browser will always load index.html and Davis will always see the /hello/world. Additionally, links containing that hash fragment will always work, provided the user has Javascript turned on. (I'm not worried about that case)
One solution I can see is to have a base URL of http://localhost/projects/foo/, have a server rewrite all requests in that directory to index.html, and have all links and routes point to and match the base url + fragment (like http://localhost/projects/foo/hello/world). So technically, all those URLs do exist, they just all point to the same file. This, however, requires that (a) a server capable of URL-rewriting serves the SPA (the url-hash solution doesn't even require a server, just a browser) and (b) the SPA keep track of "where" it is relative to the document root (which to me is a very bad thing).
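For illustration, that rewrite approach could look like this with Node/Express (the question doesn't name a server; the paths follow the example above):

var express = require('express');
var path = require('path');
var app = express();

// Serve real static assets (js, css, images) when they exist...
app.use('/projects/foo', express.static(path.join(__dirname, 'public')));

// ...and fall back to index.html for every other URL under the app, so
// refreshing http://localhost/projects/foo/hello/world still loads the SPA.
app.get('/projects/foo/*', function (req, res) {
  res.sendFile(path.join(__dirname, 'public', 'index.html'));
});

app.listen(8080);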
So my question is, what is the right way to do client-side routing, agnostic to the app's location on the server, and preferably without requiring server-side technology other than static hosting.
I have had a similar experience with single page apps and client-side routing.
After considering this problem a little, I eventually realized that in the interest of SEO, you will want to actually have your server render the content at the absolute urls that Davis is suggesting. That way, the Google crawler can actually continue to crawl through your website as if it were not a single page app.
If you say you can't do any server-side technology, then the problem will be much more difficult. The solutions you have presented all seem reasonable.
You may also wish to read this link about Google's specification for its crawler.
I think the best way is to have whatever client side routes you are defining with Davis also be available on the server. So if you have a client side route of /foo/bar then the server should ideally be able to respond sensibly to the same route.
This is often simpler than it sounds and doesn't have to involve a lot of duplication if you are using a language agnostic templating language such as mustache.
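For example, with mustache the very same template string can be rendered by mustache.js on the client or by any server-side mustache implementation (the template and data here are invented):

// Mustache is the global exposed by mustache.js.
var template = '<h1>{{title}}</h1><p>{{body}}</p>';
var html = Mustache.render(template, { title: 'Hello', body: 'World' });
document.querySelector('#content').innerHTML = html;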
If this is not possible then there are workarounds so that the server returns something other than a 404 for your client side only routes. However these always feel like workarounds rather than solutions to me. Obviously the answer depends on what kind of application you are building.
As for using Davis with relative routes, I admit that is something that I've never used and so can't say how well supported it would be. There isn't anything in particular in the design of Davis that would prevent it from working.
However, I've just been playing around with the pushState API in the browser here, and it does seem to have some weirdness with relative paths.
We recently moved to jQuery 1.6 and ran into the attr() versus prop() back-compat issue. During the first few hours after the change was deployed everything was fine, then it started breaking for people. We identified the problem pretty quickly and updated the offending JS, which was inline.
Now we have a situation where some folks are still having issues. In every case thus far, I could get the user up and running again by telling them to load the page in question and then manually refresh it in the browser. So something must still be cached somewhere.
But there are basically only two potential culprits: First, the jQuery library itself, but this is loaded with the version number in the query string so I think browsers will be refreshing it in their cache. Second, the inline javascript. Is it possible that this is being cached in the browser?
We are using APC, apc.stat=1 so it should be detecting that the PHP files have changed. Just to be on the safe side I nuked the opcode cache anyway.
To summarize, I have two questions:
Could some browsers be ignoring the query string when jQuery is loaded?
Could some browsers be caching an older version of the inline javascript?
Any other ideas very welcome too.
UPDATE: In the course of checking that there wasn't any unexpected caching going on using Firebug, I discovered a case where the old jQuery library would load. That doesn't explain why we had trouble after deploying the site and before we updated the inline code, but if it solves the problem I'll take it.
The answer to both your questions is no, unless the whole page is being cached. A browser can't cache part of a file, since it would have to download the file to know which parts it had cached, and by that time it has downloaded it all anyway. It makes no sense :)
You could try sending some headers along with your page that force the browser not to use its cached copy.
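For example, the usual set of headers (shown here with Node/Express purely for illustration; the equivalent PHP header() calls or web-server config do the same thing):

var express = require('express');
var app = express();

app.get('/some-page', function (req, res) {
  // Tell the browser (and any intermediate caches) not to reuse a stored copy.
  res.set('Cache-Control', 'no-cache, no-store, must-revalidate');
  res.set('Pragma', 'no-cache');  // for old HTTP/1.0 caches
  res.set('Expires', '0');
  res.send('<html>...your page with the inline javascript...</html>');
});

app.listen(8080);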
I'd say the answers to your questions are 1) No and 2) Yes.
jQuery versions are different URLs so there's no caching problems there unless you somehow edit a jQuery file directly without changing the version string.
Browser pages (including inline JavaScript) will get cached according to both the page settings and the browser settings (that's what browsers do). The inline JavaScript isn't cached separately, but if the web page is cached, then the inline JavaScript is cached with it. What caching have you permitted for your web pages (either in meta tags or via HTTP headers)?
Lots to read on the subject of web page cache control here and here if needed.
It is very important to plan/design an upgrade strategy when you want to roll out upgrades, so that it works properly with files already cached in the browser. Getting this wrong can result in your users either staying on old content/code until caches expire or, even worse, ending up with a mix of old and new content/code that doesn't work.
The safest thing to do is to change source URLs when you have new content. Then there is zero possibility that an old cached page will ever get the new content so you avoid the mixing possibility. For example, on the Smugmug photo sharing web site, whenever any site owner updates an image to a new version of the image, a version number in the image URL is changed. Then, when the source page that shows that image is served from the web server, it includes the new image URL so, no matter whether the old version of the image is in the browser cache or not, the new version image is shown to the user.
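A tiny sketch of the idea (the names and version string are invented; whether the versioned URL is written out by the server or by client code, the principle is the same):

// Bump ASSET_VERSION whenever the asset changes; pages that embed the old
// URL keep showing the old file, while freshly served pages pull the new one.
var ASSET_VERSION = '2014-06-03.2';

function assetUrl(file) {
  return '/static/' + file + '?v=' + ASSET_VERSION;
}

document.getElementById('avatar').src = assetUrl('avatar.jpg'); // "/static/avatar.jpg?v=2014-06-03.2"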
It is obviously not always practical to change the URL of all pages (especially top level pages) so those pages often have to be set with short cache settings so the browsers won't cache them for long and they will regularly pull fresh content.