AFAIK, the rel="nofollow" attribute on links instructs search engines not to follow the link when they crawl your site, severing any assumed relationship between your site and the linked site, and therefore not sharing any of your SEO goodness. For the most part, that's a Good Thing™ on a comment system.
Now, after integrating an IntenseDebate system on my site, I noticed that the commenter names link through to their respective websites without nofollow. This raised an alarm in my head, until I realized that these links are generated dynamically via AJAX, which means they aren't there when a search spider crawls my site.
Problem averted. Good. A good sigh of relief.
But then, there are these sites that suggest implementing a script-based solution to add nofollow.
Now that just doesn't jibe with my current understanding of nofollow, for two reasons:
1. As mentioned, the links aren't there when a spider crawls your page. So it doesn't make sense to nofollow them, because as far as the spider is concerned, there isn't anything to follow in the first place.
2. Regarding static links, a spider wouldn't be able to run the script that adds nofollow to your markup, so links that a spider can follow will be unmodified and therefore remain follow links.
Am I missing something here? Is it actually useful to dynamically add nofollow to links using JavaScript?
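For reference, the script-based approach those sites suggest usually boils down to something like the sketch below. The helper name addNofollow and the ".comment-author a" selector are hypothetical (they are not taken from IntenseDebate), and the sketch only changes the DOM after the script has run, so it can only matter to a crawler that executes JavaScript.

```javascript
// Return a rel attribute value that includes the "nofollow" token,
// keeping any tokens that were already present (e.g. "external").
function addNofollow(rel) {
  const tokens = (rel || "").split(/\s+/).filter(Boolean);
  if (!tokens.some((t) => t.toLowerCase() === "nofollow")) {
    tokens.push("nofollow");
  }
  return tokens.join(" ");
}

// Browser-only part: apply it to every commenter link once the DOM is ready.
// ".comment-author a" is a hypothetical selector; adjust it to your markup.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    document.querySelectorAll(".comment-author a").forEach((link) => {
      link.setAttribute("rel", addNofollow(link.getAttribute("rel")));
    });
  });
}
```

Links injected later via AJAX would not exist yet when DOMContentLoaded fires, so they would need the same treatment after they are inserted.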
From an interview with Matt Cutts from Google (emphasis mine):
For a while, we were scanning within JavaScript, and we were looking
for links. Google has gotten smarter about JavaScript and can execute
some JavaScript. I wouldn't say that we execute all JavaScript, so
there are some conditions in which we don't execute JavaScript.
Certainly there are some common, well-known JavaScript things like
Google Analytics, which you wouldn't even want to execute because you
wouldn't want to try to generate phantom visits from Googlebot into
your Google Analytics.
We do have the ability to execute a large fraction of JavaScript when
we need or want to. One thing to bear in mind if you are advertising
via JavaScript is that you can use NoFollow on JavaScript links.
Additional debate on the topic: https://webmasters.stackexchange.com/questions/5653/does-the-google-spider-render-javascript.
As RIAs and SPAs (or web apps with heavy JavaScript usage) have become more and more popular, I've been running into systems that, instead of using good old a href hyperlinks, use onclick constructs with JavaScript code that handles navigation. This is particularly true with images.
For example, instead of seeing something like this:
<a href="..."><img src="...."/></a>
I see something like this:
<div ... onclick='SomeJsFunctionThatNavsToAnotherPage()'><img src="..."/></div>
What is the advantage of this? It makes it incredibly hard to trace where pages transition to when debugging or trying to root-cause a bug. I can see the point when the navigation target can change (in that case, yes, you could use a function that computes which page to navigate to).
But I see this pattern even when the pages to navigate to are constant. I find this extremely convoluted and hard to test. Not to mention that there are always browser-specific bugs that come from this kind of thing (sadly, in my experience, from over-complicating the front end).
But I am not a RIA/SPA developer (just backend and traditional web development). Am I missing the rationale behind this?
TO CLARIFY
My question is not about the case where we want to redraw the page or change the current content without changing the current location. My question is about plain old transitions, from page A to page B.
In such a case, why use onclick=funcToChangeLocation() over <a href="some location"/>?
This has been a pain for me when troubleshooting systems that are already written (for I wouldn't write them like that), but there could be reasons I am not aware of.
Again, my question is not for pages that redraw themselves without changing the browser location, but for navigation from one page to the next.
ALSO
If you are going to vote to close this question, at least leave a message explaining why.
If you are making a web application, sometimes you don't want to redirect the user to another page; instead, you want to change the content of the page dynamically without refreshing it. This has some advantages. It can be faster. You can easily keep the state of the page/application. You are not obliged to communicate with the server. You can update only a part of the page.
You can also request data dynamically to render the page. If you are displaying a user profile page, you can request only a JSON object that represents the user. This JSON object is smaller than the whole page and is rendered dynamically. It can help reduce the data transferred between users and the server when your bandwidth is limited.
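As a rough sketch of the user-profile idea above: the client asks for a small JSON object and turns it into markup itself. The field names (name, bio) and the function names are hypothetical, and showProfile assumes a browser with fetch available; only renderUserProfile runs anywhere.

```javascript
// Escape text before placing it into HTML, to avoid markup injection.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn a small JSON object into the markup for the profile area.
// The fields "name" and "bio" are hypothetical.
function renderUserProfile(user) {
  return "<h2>" + escapeHtml(user.name) + "</h2>" +
         "<p>" + escapeHtml(user.bio) + "</p>";
}

// Browser-only usage: fetch the JSON and update one part of the page.
// The URL and the container element are supplied by the caller.
async function showProfile(url, container) {
  const response = await fetch(url);
  const user = await response.json();
  container.innerHTML = renderUserProfile(user);
}
```

Only the profile container is redrawn; the rest of the page and its state stay as they are.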
EDIT: In the case of a simple page redirection, I think it's bad practice and I cannot see an advantage. I think it obfuscates the website when the Google crawler tries to parse it.
I once had a pretty successful web directory website. One day Google decided that "directories" are competing businesses and started penalizing sites that had links on directories. I used the method you describe to cloak outgoing links to try and trick Google.
I want to keep bots from following my external links through rel=nofollow.
I have 2 questions about it:
1) Does this really help my page ranking? (I heard an SEO guy say this: the page ranking should go up because the probability that the user leaves the page is lower.)
2) Does it work when the rel=nofollow is set through javascript in the $(document).ready() function?
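To make question 2 concrete, this is roughly the kind of script meant. It is only a sketch: isExternalLink is a hypothetical helper, jQuery is assumed to be loaded on the page, and whether any bot honors the result is exactly what is being asked.

```javascript
// Decide whether a link points away from the current site by comparing hosts.
// Relative URLs are resolved against pageUrl, so they count as internal.
// Non-web schemes such as mailto: are left alone.
function isExternalLink(href, pageUrl) {
  try {
    const target = new URL(href, pageUrl);
    const isWeb = target.protocol === "http:" || target.protocol === "https:";
    return isWeb && target.host !== new URL(pageUrl).host;
  } catch (e) {
    return false; // unparsable URL: leave the link alone
  }
}

// Browser-only part, assuming jQuery is available.
if (typeof jQuery !== "undefined") {
  jQuery(document).ready(function ($) {
    $("a[href]").each(function () {
      if (isExternalLink($(this).attr("href"), window.location.href)) {
        // Overwrites any existing rel value on the link.
        $(this).attr("rel", "nofollow");
      }
    });
  });
}
```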
EDIT: thanks for the suggestions so far - to go more into detail to 1:
how can the robot know(...)?
The robot knows this because it knows the page ranking of the page that you link to, and if that is high, the probability is high that the user follows this link and so leaves my page. That's why it is supposed to be good if you have more incoming than outgoing links, where of course incoming links from high-ranked pages count more than incoming links from low-ranked websites. On the other hand, outgoing links to high-ranked pages are supposed to increase the probability that the user leaves. But I am no expert in this; that's just what this SEO guy was telling me.
EDIT 2
The question is whether it improves my Google page ranking if I put rel="nofollow" on external links and, in case it does, whether this still works when it is set with JavaScript.
Thanks in advance
1.
It's possible. Your pages will flow pagerank internally, so having more outbound links will decrease the pagerank you flow to your own pages.
2.
Google is capable of reading JavaScript and will honor a nofollow on dynamically created links. However, I am not sure whether it works when nofollow is added dynamically to 'static' links.
Of course, there's much speculation when it comes to SEO.
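To illustrate point 1, here is a sketch under the textbook PageRank model, in which a page passes on a damped fraction of its rank, split evenly across its outgoing links. That model, the function name internalShare and the 0.85 damping factor are assumptions for illustration only; the sketch makes no claim about how any search engine treats nofollow links in this split today.

```javascript
// Textbook model: a page passes damping * rank, split evenly over its links.
// Returns the total amount that flows to the site's own (internal) pages.
function internalShare(rank, internalLinks, externalLinks, damping = 0.85) {
  const totalLinks = internalLinks + externalLinks;
  if (totalLinks === 0) return 0;
  const perLink = (damping * rank) / totalLinks;
  return perLink * internalLinks;
}

// Compare a page with 8 internal links and no outbound links
// against the same page with 2 outbound links added.
console.log(internalShare(1, 8, 0), internalShare(1, 8, 2));
```

In this model, with 8 internal and 2 external links, 80% of the rank the page passes on stays internal; with no external links, all of it does.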
I doubt
No, it doesn't work. Bots generally don't execute JavaScript code.
What?
the page ranking should go up as the probability is lower that the user leaves the page
How should a robot know this?
Robots don't process JavaScript, rel="nofollow" has to be present in the source markup as it is sent to the client.
And to add: rel="nofollow" does not guarantee that a link is not followed or added as link to the other page to build up page rank (the real process is much more complex); that depends on the robot/search engine.
Adding rel="nofollow" will not stop the bot from following the link, but it will stop the bot from giving any of your page rank to that link.
Oh, and as said before, most bots do not execute JavaScript. I believe Google has been playing around with one that does, but this is the exception, not the norm.
1) The more pages that you link out to, the more it affects your authority ratio; you essentially want more links coming in than you link out. CTR is tracked by Google Analytics, and this is factored into their essentially black-box search ranking magic.
2) While it's commonly thought that robots don't process JavaScript, this is wrong: Google's current generation of robots is AJAX-aware.
I came here looking for an answer to this question myself. (Thanks Andre!)
I can attest to Google following links with href="javascript:..." URLs, and going to the correct pages, so that is no defense against unwanted link-crawling. I have also seen search result snippets include text inserted by javascript, so there is ample evidence of Google processing javascript.
If the links are internal, proper use of robots.txt would be the preferred, easier, and more bandwidth-efficient answer, of course, if you have access to that. (We don't on the server in question, thus my own search for answers.)
I shall be adding nofollow via javascript.
I'm creating a blog, but I need box-shadows for my boxes, so I'm asking the following.
Is it good to add shadows via a) images/CSS or b) JavaScript?
I've heard that a lot of people don't have JavaScript enabled while browsing, so is this a problem? It would be easier and simpler to create these shadows with JavaScript than adding a million divs and positioning them.
EDIT: I found this page: http://www.w3schools.com/browsers/browsers_stats.asp and it says that almost every user has JS enabled.
You could use JavaScript for your layout, but the general principle that you should keep in mind is that your HTML should be semantic: the elements on the page should have a meaning; it should project a structure that goes beyond the design of the page (although that structure can certainly be used as an indicator for the design aspects as well).
When this principle is applied, using JavaScript can help with providing the style you wish to project given the semantic meaning of the page.
Also, you should check your server logs (your hosting provider should have some sort of analytics tool/report available) which should tell you what browsers and versions are being used to visit your site. With that information, you can get a good feel for the people that you are currently reaching.
If you are using some sort of analytics package (e.g. Google Analytics) then you can possibly see the delta between two periods of time for the new visitors to your site as well, and try to gauge the capability of the browsers that new users will be using when they visit your site.
A few things to consider when using JavaScript to manipulate the DOM on the front end:
If you are using JavaScript to manipulate a good deal of the content, it's going to be a client-side process, and that can slow down the rendering of your page. You might want to consider a theme/template for your blog/CMS which gives you the styling that you want through CSS, delivered together with the markup rendered on the server side.
Search engines do not execute your JavaScript. Because of this, you want to avoid manipulating the indexable content at all costs. You want your content to be embedded in the HTML as it is sent from the server. Using AJAX or other JavaScript to manipulate certain things is fine, but when it comes to your content, unless you are stylizing it, do not use JavaScript to manipulate it.
Use CSS box-shadow for nice, up-to-date browsers: http://css-tricks.com/snippets/css/css-box-shadow/ (requires no extra markup)
And for most everyone else, serve up your js solution.
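One way to decide who gets which is a small feature check, sketched below. The function name supportsBoxShadow and the no-boxshadow class are hypothetical, and the prefixed property names are there on the assumption that older WebKit and Gecko builds need them.

```javascript
// Check for box-shadow support on a style object (element.style in a browser).
// Older WebKit and Gecko builds exposed prefixed property names.
function supportsBoxShadow(style) {
  return ["boxShadow", "WebkitBoxShadow", "MozBoxShadow"].some(
    (name) => style[name] !== undefined
  );
}

// Browser-only part: flag browsers that need the JavaScript fallback.
if (typeof document !== "undefined") {
  if (!supportsBoxShadow(document.documentElement.style)) {
    // Hypothetical hook: mark the page so the JS shadow fallback can kick in.
    document.documentElement.className += " no-boxshadow";
  }
}
```

Browsers that pass the check get the shadow from plain CSS with no extra markup; only the rest pay the cost of the script.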
You should do it the way that is easiest for you and allow the page to degrade gracefully for those without JS (if you think you need to consider them; as of today, I don't see any point in building non-JS sites or building sites for no-JS users).
According to this page, it would seem that search engine bots don't process JavaScript, in the sense that they don't actually run it, but that page is 2 years old (judging from the copyright info).
The reason I'm asking this question is because we use Javascript to replace text on our site with other more typographically sound content. We're worried that this may affect the crawlability/seo of our sites, since generally what we're replacing is headers; ie. <h1>, <h2>, etc.
Will search engine bots see our original code, or will they run the Javascript and see the replaced text?
Google now officially processes JavaScript.
In order to solve this problem, we decided to try to understand pages by executing JavaScript. It’s hard to do that at the scale of the current web, but we decided that it’s worth it. We have been gradually improving how we do this for some time. In the past few months, our indexing system has been rendering a substantial number of web pages more like an average user’s browser with JavaScript turned on.
Sometimes things don't go perfectly during rendering, which may negatively impact search results for your site. Here are a few
potential issues, and – where possible, – how you can help prevent
them from occurring:
If resources like JavaScript or CSS in separate files are blocked (say, with robots.txt) so that Googlebot can’t retrieve them, our
indexing systems won’t be able to see your site like an average user.
We recommend allowing Googlebot to retrieve JavaScript and CSS so that
your content can be indexed better. This is especially important for
mobile websites, where external resources like CSS and JavaScript help
our algorithms understand that the pages are optimized for mobile. If
your web server is unable to handle the volume of crawl requests for
resources, it may have a negative impact on our capability to render
your pages. If you’d like to ensure that your pages can be rendered by
Google, make sure your servers are able to handle crawl requests for
resources.
It's always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn't have
compatible JavaScript implementations. It will also help visitors with
JavaScript disabled or off, as well as search engines that can't
execute JavaScript yet.
Sometimes the JavaScript may be too complex or arcane for us to execute, in which case we can’t render the page fully and accurately.
Some JavaScript removes content from the page rather than adding, which prevents us from indexing the content.
Search engines don't process JavaScript as such.
There is some evidence that Google may have started processing inline script content in some cases, in order to catch content that is entered into the page parse queue using document.write. However certainly DOM methods such as you might use for font-replacement are not affected and no onload code is invoked.
Generally no. Google has mentioned that they are working on a system of indexing ajax content, but I don't think any of the major search engines index dynamic content as a rule. See this page for Google's take on it: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=81766
The bots will certainly not run the Javascript code, but they might recognise some commonly used scripts.
You shouldn't count on it though. Clear markup, proper content and real links is still what counts.
Also, if the bots happen to recognise your script, it might not be in your favor. If the code is recognised as something that is commonly used to try to fool bots, it could even hurt your page ranking.
I'd use metadata to ensure bots pick up the content on your pages.
I know the general consensus is that Google does not process JavaScript or index anything within a <script> tag; however, the general consensus appears to be incorrect.
Try searching for the following, with the surrounding quotes:
"Samsung Public Interest Statement by Thomas Fusco, Fish & Richardson P.C., for Samsung."
You should only get one result. Now click on that result and view the source.
Do a CTRL-F for the text you searched for in Google. Notice that the text is in a JavaScript variable, and not in the HTML. Google must be processing some JavaScript to pull those words into its index.
Today a lot of content on the Internet is generated using JavaScript (specifically by background AJAX calls). I was wondering how web crawlers like Google's handle it. Are they aware of JavaScript? Do they have a built-in JavaScript engine? Or do they simply ignore all JavaScript-generated content in the page (quite unlikely, I guess)? Do people use specific techniques for getting content indexed that would otherwise only be available to a normal Internet user through background AJAX requests?
JavaScript is handled by both Bing and Google crawlers. Yahoo uses the Bing crawler data, so it should be handled as well. I didn't look into other search engines, so if you care about them, you should look them up.
Bing published guidance in March 2014 as to how to create JavaScript-based websites that work with their crawler (mostly related to pushState) that are good practices in general:
Avoid creating broken links with pushState
Avoid creating two different links that link to the same content with pushState
Avoid cloaking. (Here's an article Bing published about their cloaking detection in 2007)
Support browsers (and crawlers) that can't handle pushState.
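The last point can be sketched as a simple feature check with a fallback to a normal page load. The function name navigate, the window-like win argument and the render callback are all hypothetical; the sketch also assumes the links themselves remain ordinary a href links, so that a client without JavaScript can still follow them.

```javascript
// Navigate to url, using pushState when the environment supports it and
// falling back to a normal page load otherwise.
// "win" is a window-like object; "render" is a hypothetical function that
// redraws the page for the new URL.
function navigate(win, url, render) {
  if (win.history && typeof win.history.pushState === "function") {
    win.history.pushState({}, "", url);
    render(url);
    return "pushState";
  }
  win.location.href = url; // full page load
  return "full-load";
}
```

In a browser this would be called as navigate(window, link.href, redraw) from a click handler, with redraw standing in for whatever updates the page.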
Google later published guidance in May 2014 as to how to create JavaScript-based websites that work with their crawler, and their recommendations are also good practices in general:
Don't block the JavaScript (and CSS) in the robots.txt file.
Make sure you can handle the load of the crawlers.
It's a good idea to support browsers and crawlers that can't handle (or users and organizations that won't allow) JavaScript.
Tricky JavaScript that relies on arcane or specific features of the language might not work with the crawlers.
If your JavaScript removes content from the page, it might not get indexed.
Most of them don't handle Javascript in any way. (At least, all the major search engines' crawlers don't.)
This is why it's still important to have your site gracefully handle navigation without Javascript.
I have tested this by putting pages on my site only reachable by Javascript and then observing their presence in search indexes.
Pages on my site which were reachable only by Javascript were subsequently indexed by Google.
The content was reached through JavaScript with the 'classic' technique of constructing a URL and setting window.location accordingly.
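For readers unfamiliar with that technique, it looks roughly like the sketch below. The function name buildPageUrl and the page query parameter are hypothetical and are not the code used in the test described above.

```javascript
// Construct the target URL in script instead of writing it into an href.
function buildPageUrl(base, page) {
  const url = new URL(base);
  url.searchParams.set("page", String(page));
  return url.toString();
}

// Browser-only usage: navigate by assigning the constructed URL.
function goToPage(page) {
  window.location = buildPageUrl(window.location.href, page);
}
```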
Precisely what Ben S said. And anyone accessing your site with Lynx won't execute JavaScript either. If your site is intended for general public use, it should generally be usable without JavaScript.
Also, related: if there are pages that you would want a search engine to find, and which would normally arise only from JavaScript, you might consider generating static versions of them, reachable by a crawlable site map, where these static pages use JavaScript to load the current version when hit by a JavaScript-enabled browser (in case a human with a browser follows your site map). The search engine will see the static form of the page, and can index it.
Crawlers don't parse JavaScript to find out what it does.
They may be built to recognise some classic snippets like onchange="window.location.href=this.options[this.selectedIndex].value;" or onclick="window.location.href='blah.html';", but they don't bother with things like content fetched using AJAX. At least not yet, and content fetched like that will always be secondary anyway.
So, JavaScript should be used only for additional functionality. The main content that you want the crawlers to find should still be plain text in the page, with regular links that the crawlers can easily follow.
Crawlers can handle JavaScript or AJAX calls if they are built on a framework such as HtmlUnit or Selenium.