Not sure if this is the best place for this question but it's something I've been really curious about. I'd like to use data only available on the client side for loading resources/assets for a website, such as device-pixel-ratio, touch support, etc.
The content on the page will not be changing, just resources like JS files, CSS files, and image files.
There are a few scripts already out there that work like this: they run client-side tests, store the results in a cookie, reload the page, and then load resources based on the data stored in the cookie.
The process works as follows:
User comes to the site
JS sets cookie with device features
JS reloads current page
Server can now access the cookie with all the feature data
Can conditionally load resources and assets based on this data
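As an illustration, a minimal client-side sketch of steps 2 and 3 might look like this (the cookie name "features" and the detected properties are my own assumptions, not taken from any particular script):
(function () {
  // only set the cookie and reload if we haven't already done so
  if (document.cookie.indexOf("features=") === -1) {
    var features = {
      dpr: window.devicePixelRatio || 1,
      touch: "ontouchstart" in window
    };
    document.cookie = "features=" + encodeURIComponent(JSON.stringify(features)) + "; path=/";
    window.location.reload(); // the server can read the cookie on the next request
  }
})();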
Is it bad practice to immediately reload the page as the user comes to it? Are there any SEO drawbacks to this method? It seems like a great technique for conditionally loading resources based on device capabilities. I'm just not sure if there are any reasons not to do this.
Many web crawlers do not implement full JavaScript or cookie functionality. For instance, GoogleBot does not interpret all JavaScript by default. Thus, any content you load dynamically as a result of your cookie may not be seen by the crawler and will not be indexed. This kills SEO.
As a quote from Matt Cutts (Google's webspam guy):
"For a while, we were scanning within JavaScript, and we were looking
for links. Google has gotten smarter about JavaScript and can execute
some JavaScript. I wouldn't say that we execute all JavaScript, so
there are some conditions in which we don't execute JavaScript.
Certainly there are some common, well-known JavaScript things like
Google Analytics, which you wouldn't even want to execute because you
wouldn't want to try to generate phantom visits from Googlebot into
your Google Analytics".
Reference: http://www.searchnewz.com/topstory/news/sn-2-20100315SEOInterviewwithMattCutts.html
Well, search engines usually support neither cookies nor JavaScript, so they will get the default version only.
And some search engines may test for that and might see this as a "doorway page" (and thus penalize the site). I wonder if one of the reasons they started their own web browser was as a side product of developing a robot that checks for such things. Obviously, the robot needs to be fast at JavaScript...
I certainly would not like the page to reload when I've just started using it.
You should probably be using media queries (for the CSS) and feature detection (for JS).
@media all and (min-width:420px) {
/*styles...*/
}
And:
if( typeof window.localStorage !== "undefined") {
// you can now do stuff with localStorage.
}
Related
As RIAs and SPAs (or web apps with heavy JavaScript usage) have become more and more popular, I've been running into systems that, instead of using good old <a href> hyperlinks, use constructs with onclick and JavaScript code that handles navigation. This is particularly true with images.
For example, instead of seeing something like this:
<a href="..."><img src="..."/></a>
I see something like this:
<div onclick='SomeJsFunctionThatNavsToAnotherPage()'><img src="..."/></div>
What is the advantage of this? It makes it incredibly hard to trace where pages transition to when debugging or trying to root-cause a bug. I can see the point when the navigation target can change (so yes, there you could use a function that computes which page to navigate to).
But I see this pattern even when the pages to navigate to are constant. I find this extremely convoluted and hard to test. Not to mention the browser-specific bugs that come with it (sadly, in my experience, from over-complicating the front end).
But I am not a RIA/SPA developer (just backend and traditional web development). Am I missing the rationale behind this?
TO CLARIFY
My question is not for the case when we want to redraw the page or change the current content without changing the current location. My question is for plain old transitions, from page A to page B.
In such a case, why use onclick='funcToChangeLocation()' over <a href="some location">?
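For reference, the pattern I keep running into looks roughly like this (the function name is the one I made up above; the target page is hypothetical):
function funcToChangeLocation() {
  // hard-coded destination, exactly what a plain <a href> would have done
  window.location.href = "/page-b.html";
}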
This has been a pain for me when troubleshooting systems that are already written (for I wouldn't write them like that), but there could be reasons I am not aware of.
Again, my question is not for pages that redraw themselves without changing the browser location, but for navigation from one page to the next.
ALSO
If you are going to vote to close this question, at least leave a message explaining why.
If you are making a web application, sometimes you don't want to redirect the user to another page; instead you want to dynamically change the content of the page without refreshing it. This has some advantages: it can be faster, you can easily keep the state of the page/application, you are not obliged to communicate with the server, and you can update only a part of the page.
You can also dynamically request data to render the page. If you are displaying a user profile page, you can request just a JSON object that represents the user. This JSON object is smaller than the whole page and will be rendered dynamically. It helps reduce the data transfer between users and the server when bandwidth is limited.
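For instance, a rough sketch of that approach (the endpoint URL and element ids are hypothetical):
fetch("/api/users/42")
  .then(function (response) { return response.json(); })
  .then(function (user) {
    // update only the parts of the page that change
    document.getElementById("profile-name").textContent = user.name;
    document.getElementById("profile-bio").textContent = user.bio;
  });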
EDIT: In the case of a simple page redirection, I think it's bad practice and I cannot see an advantage. I think it obfuscates the website when the Google crawler tries to parse it.
I once had a pretty successful web directory website. One day Google decided that "directories" are competing businesses and started penalizing sites that had links on directories. I used the method you describe to cloak outgoing links to try and trick Google.
OK, so I'm learning JavaScript and I just wanted to know if it's possible to make it perform actions on external pages. For example, suppose I wanted to say 'onload, redirect to somesite.com/page1, then once on somesite.com/page1, fill in the register form with these details'.
Is that possible?
You cannot do this.
This would represent, for lack of a better word, an enormous security hole.
The only way to make an external page "do stuff" is to write code that is on or explicitly included in that page itself. Period.
I have, however, seen external pages get loaded INTO the current page as strings, and then have the JavaScript that loaded those pages modify that markup directly. But that is ugly.
On the first page you could modify some variables/values in a database. Then, in the second page you could check the values in your database, and do different "stuff" depending on those values.
You would need to set up a database and use some server-side scripting along with Javascript (server-side scripting is used to interact with your server/database). In your first page, the server-side script, like PHP, would fetch info from your Javascript. In your second page, your Javascript would fetch info from your server side script and then do stuff to that page.
This is a much safer way. If you are taking user input from things like HTML fields, you need to look into cleaning the input to prevent something called "cross-site scripting (XSS)".
You could do this IF you were rendering the other page in a frame of some sort.
There are multiple ways in which you can render an entire external page as a piece of your page. Many pages take precautions to block being rendered in a frame for just this very reason though (Not to mention copyright issues).
Once you're rendering the external page inside your page you should be able to reference components nested in your frame and do the sort of thing that you're describing.
There's no way to do this with JavaScript. The developers of all the major browsers work very, very hard to prevent this sort of thing. If this were possible, it would open up pretty massive security holes.
If you really want to use something like this for testing, you can look at browser automation software like Selenium. This allows you to automate various testing scenarios in your browser, but it does not affect other clients using your site.
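A rough sketch with the selenium-webdriver package for Node.js (the URL and form field names are assumptions taken from the question):
const { Builder, By } = require("selenium-webdriver");

(async function fillRegisterForm() {
  // launches a browser that the script controls, separate from real visitors
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get("https://somesite.com/page1");
    await driver.findElement(By.name("username")).sendKeys("example-user");
    await driver.findElement(By.name("email")).sendKeys("user@example.com");
  } finally {
    await driver.quit();
  }
})();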
I am interested in making a website that flashes through a visitor's entire web history when they visit. I plan on using JavaScript to grab the history on each visitor's computer and animate through it, at varying speeds depending on how much history they have. My thought was to use history.length to determine the length of the visitor's history, and then use history.go() to navigate -1, -2, -3, etc. through the entire web history. I recognize that load times would be HUGE, but right now I am just trying to think through the concept. This related question seems like what I would use as the basis of my code; however, I don't understand why they say this method would not work. I am a student who is very new to JavaScript.
Do you guys have any knowledge of whether or not this will work, or any ideas on ways to achieve my idea?
You can call history.go() once. That's about as far as you'll get. The reason is simple: once you're on the previous page, your JavaScript is gone. Iframes won't work either, because you can't execute your own JS in an iframe that contains a page from another domain. Read about the same-origin policy for more info on that.
The only real solution I can think of is a browser extension. That works because your JS can persist across multiple sites. You'd probably just need a userscript on each page that does the following:
check a variable to see if the functionality is enabled
if it is, call history.go(-1) after a timeout (to control the speed)
I'm most familiar with Chrome, so I'm imagining a browserAction to enable/disable the script and a content script that does the redirect. Other potential options include Greasemonkey (Firefox), Tampermonkey (Chrome), and Personalized Web (Chrome) scripts.
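A minimal content-script sketch of those two steps (the "historyFlash" flag and the 1500 ms delay are assumptions; the extension would need the "storage" permission in its manifest):
chrome.storage.local.get("historyFlash", function (result) {
  if (result.historyFlash) {
    // wait a bit so the visitor can see the page, then step back one entry;
    // the content script runs again on the previous page and repeats
    setTimeout(function () {
      history.go(-1);
    }, 1500);
  }
});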
As stated in the question you linked to, JavaScript and / or the DOM does not give you access to the entire browser history since that would be a severe privacy violation. Imagine going to a site and having them be able to know every site you ever visited in that browser.
This would potentially give the site access to:
Sessions you are still logged into on other sites (if they store the session key in the URL, as some sites do)
Insight into what kind of activities you perform (are you a moderator on site X?)
Enormous amounts of data on what you are interested in.
This is not something that standards bodies or browser manufacturers thought users would be interested in sharing with everybody. That's why there isn't an API to walk through the browser's entire history.
@sachleen has already provided a very nice in-depth answer on how you can get around this limitation for individual browsers if you want to build this application. For the sake of completeness I'll simply mention the key term: "browser extension". :-)
Today a lot of content on the Internet is generated using JavaScript (specifically by background AJAX calls). I was wondering how web crawlers like Google handle it. Are they aware of JavaScript? Do they have a built-in JavaScript engine? Or do they simply ignore all JavaScript-generated content on the page (I guess that's quite unlikely)? Do people use specific techniques to get content indexed that would otherwise only be available to a normal Internet user through background AJAX requests?
JavaScript is handled by both Bing and Google crawlers. Yahoo uses the Bing crawler data, so it should be handled as well. I didn't look into other search engines, so if you care about them, you should look them up.
Bing published guidance in March 2014 on how to create JavaScript-based websites that work with their crawler (mostly related to pushState); the recommendations are good practices in general:
Avoid creating broken links with pushState
Avoid creating two different links that link to the same content with pushState
Avoid cloaking. (Here's an article Bing published about their cloaking detection in 2007)
Support browsers (and crawlers) that can't handle pushState.
Google later published guidance in May 2014 on how to create JavaScript-based websites that work with their crawler; their recommendations are worth following as well:
Don't block the JavaScript (and CSS) in the robots.txt file.
Make sure you can handle the load of the crawlers.
It's a good idea to support browsers and crawlers that can't handle (or users and organizations that won't allow) JavaScript.
Tricky JavaScript that relies on arcane or specific features of the language might not work with the crawlers.
If your JavaScript removes content from the page, it might not get indexed.
Most of them don't handle Javascript in any way. (At least, all the major search engines' crawlers don't.)
This is why it's still important to have your site gracefully handle navigation without Javascript.
I have tested this by putting pages on my site only reachable by Javascript and then observing their presence in search indexes.
Pages on my site which were reachable only by Javascript were subsequently indexed by Google.
The content was reached through JavaScript with a 'classic' technique of constructing a URL and setting window.location accordingly.
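That 'classic' technique is essentially this (the id and path construction are just illustrative):
var articleId = 42; // hypothetical value taken from page state
// navigate by building a URL in script rather than following an <a href>
window.location = "/articles/" + articleId + ".html";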
Precisely what Ben S said. And anyone accessing your site with Lynx won't execute JavaScript either. If your site is intended for general public use, it should generally be usable without JavaScript.
Also, related: if there are pages that you would want a search engine to find, and which would normally arise only from JavaScript, you might consider generating static versions of them, reachable by a crawlable site map, where these static pages use JavaScript to load the current version when hit by a JavaScript-enabled browser (in case a human with a browser follows your site map). The search engine will see the static form of the page, and can index it.
Crawlers don't parse JavaScript to find out what it does.
They may be built to recognise some classic snippets like onchange="window.location.href=this.options[this.selectedIndex].value;" or onclick="window.location.href='blah.html';", but they don't bother with things like content fetched using AJAX. At least not yet, and content fetched like that will always be secondary anyway.
So, JavaScript should be used only for additional functionality. The main content that you want the crawlers to find should still be plain text in the page, with regular links that the crawlers can easily follow.
Crawlers can handle JavaScript or AJAX calls if they use some kind of framework like HtmlUnit or Selenium.
Google hosts some popular JavaScript libraries at:
http://code.google.com/apis/ajaxlibs/
According to google:
The most powerful way to load the libraries is by using google.load() ...
What are the real advantages of using
google.load("jquery", "1.2.6")
vs.
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js"></script>
?
Aside from the benefit of Google being able to bundle multiple files together on the request, there is no perk to using google.load. In fact, if you know all libraries that you want to use (say just jQuery 1.2.6), you're possibly making the user's browser perform one unneeded HTTP connection. Since the whole point of using Google's hosting is to reduce bandwidth consumption and response time, the best decision - if you're just using 1 library - is to call that library directly.
Also, if your site will be using any SSL certificates, you want to plan for this by calling the script via Google's HTTPS connection. There's no downside to calling an https script from an http page, but calling an http script from an https page will cause more obscure debugging problems than you would want to think about.
It allows you to dynamically load the libraries in your code, wherever you want.
Because it lets you switch directly to a new version of the library in the javascript, without forcing you to rebuild/change templates all across your site.
It lets Google change the URL (but they can't since the URL method is already established)
In theory, if you do several google.load()s, Google can bundle them into one file, but I don't think that is implemented.
I find it very useful for testing different libraries and different methods, particularly if you're not used to them and want to see their differences side by side, without having to download them. It appears that one of the primary reasons to do it is that the call is asynchronous, versus the synchronous script tag. You also get some neat stuff that is directly included in the Google loader, like client location: you can get the user's latitude and longitude from it. Not necessarily useful, but it may be helpful if you're planning to have targeted advertising or something of the like.
Not to mention that dynamic loading is always useful, particularly to smooth out the initial site load. Keeping the initial site load time as low as possible is an uphill battle every web designer fights.
You might want to load a library only under special conditions.
Additionally, the google.load method can speed up the initial page display; otherwise, if you include script tags in your HTML code, page rendering will block until the file has been loaded.
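Roughly how the (now retired) Google Loader was used, for illustration; it assumes the jsapi script include is already on the page, and the selector is made up:
// <script src="https://www.google.com/jsapi"></script> must be loaded first
google.load("jquery", "1.2.6");
google.setOnLoadCallback(function () {
  // jQuery is guaranteed to be available inside this callback
  jQuery(".gallery").fadeIn();
});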
Personally, I'm interested in whether there's a caching benefit for browsers that will already have loaded that library as well. Seems like if someone browses to google and loads the right jQuery lib and then browses to my site and loads the right jQuery lib... ...both might well use the same cached jQuery. That's just a speculative possibility, though.
Edit: Yep, at the very least when using the direct script tag to that location, the JavaScript library will be cached if someone has already requested the library from Google (e.g. if it was included by another site somewhere).
If you were to write a boatload of JavaScript that only used the library when a particular event happens, you could wait until the event happens to download the library, which avoids unnecessary HTTP requests for those who never trigger the event. However, in the case of libraries like Prototype + Scriptaculous, which download over 300 KB of JavaScript code, this isn't practical.
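A minimal sketch of that kind of on-demand loading with a plain script element (the button id and the library URL are just illustrative):
document.getElementById("open-editor").addEventListener("click", function () {
  var s = document.createElement("script");
  s.src = "https://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js";
  s.onload = function () {
    // the library is only downloaded once the user actually triggers the event
  };
  document.head.appendChild(s);
});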