Search Engine Index Javascript Tabs - javascript

I have a website that is a single HTML file and uses JavaScript to hide and show tabbed pages.
The URL gets rewritten with a # for the different pages to make them bookmarkable.
Is there a way to make the different pages show up in search engine results? It would be good to have them appear as separate pages there.
I have read the doc below, but I think that is just for dynamically generated Ajax content, right?
http://code.google.com/web/ajaxcrawling/docs/getting-started.html

I read the page you mentioned. It is for Ajax sites; in your case it is not Ajax.
Another point, as Jeff B has mentioned, is that there is a high chance Google will index all of the content whichever trick you use. That would be bad, as Google would see duplicate content, though not terribly bad since all the content comes from your own site.
Search engine questions like this are tricky and difficult to answer, as no one knows exactly how the search engines work.
In my view, you can either recreate your pages as Ajax and follow the points mentioned in the article you found, or:
Use a link for each tab with a parameter, like page1.php?cat1, page1.php?cat2, etc.,
and only load the content related to that specific tab at a time.
The second solution is no different from implementing a separate page for each tab, but it can be easier to update in your case, and all the content is still accessible to both people and search engines in one place. Over time the search engine will index each of your pages with its parameter. Remember, it is often said that Google does not index pages with parameters, but that is not true. Google only skips pages with session-variable or id-style parameters; it does index pages with popular parameters when the page content changes with them.
Your question is still tricky, and this suggestion is what came to me after thinking about it for a while.

The problem seems to be that even if the different pages were indexed, they would all be indexed with the same content. This is because, according to your explanation, all of the content (including the hidden tabs) exists at load time.
If your tabs are links, you simply need to put the href in the link. Google should follow the href, while JavaScript-enabled browsers will execute your tab-switching code and not follow the link (if you coded it right).
However, the problem of all content being indexed for all pages still remains.

Modify your system like this:
Every link that changes the content of the current tab should have, as its href attribute, a subpage that contains the content of the tab intended to appear -> this is what search engines will cache.
Those links should also have bound JS actions that change the content of the current tab and prevent the navigation that the href would otherwise trigger -> this is what the user sees.
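A minimal sketch of that setup; the subpage names and the showTab() helper are hypothetical, not taken from the question:

<!-- Real, crawlable links: each href points to a subpage holding only that tab's content -->
<a class="tab-link" href="tab-products.html">Products</a>
<a class="tab-link" href="tab-contact.html">Contact</a>

<script>
// Bound JS action: swap the tab content in place and cancel the navigation the href would cause
document.querySelectorAll('.tab-link').forEach(function (link) {
  link.addEventListener('click', function (event) {
    event.preventDefault();              // "prevents the navigation" the href would do
    showTab(link.getAttribute('href'));  // hypothetical function that loads/reveals that tab's content
  });
});
</script>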


Include code from jQuery load() onto page source code [duplicate]

Many aspects of my site are dynamic. I am using jquery.
I have a div which once the DOM is ready is populated using load().
Then if a button is clicked, using load() once again, this value is replaced by another value.
This kind of setup is common across my site. My homepage is essentially lots of dynamically loaded, refreshed, and changeable content.
What are the repercussions of this for SEO?
I've seen sites where each page is loaded using load() and then displayed using the animation functions... It looks awesome!
People have posed this question before, but no one has answered it properly.
So, any ideas? jQuery and SEO?
Thanks
EDIT
Very interesting points. I don't want to overdo my site with JavaScript, just use it where necessary to make it look good. My homepage, however, is one place of concern.
So when the DOM is ready, it loads content into a div. On clicking a tab, this content is changed. I.e. no JS, no content.
The beauty here for me is that there is no duplicated code. Is the suggestion that I should simply 'print' some default content, then have the tabs link to pages (with the same content) in case JS is disabled? I.e. accept a little duplicate code for the sake of SEO?
As far as degrading goes, my only other place of concern is tabs on the same page. I have 3 divs, all containing content. On this page two divs are hidden until a tab is clicked. I used this method first, before I started playing with JS. Would it perhaps be best to load() these tabs, then have the tab buttons link to the pages the content is pulled from?
Thanks
None of the content loaded via JavaScript will be crawled.
The common and correct approach is to use Progressive Enhancement: all links should be normal <a href="..."> to actual pages so that your site "makes sense" to a search spider; and the click() event overrides the normal functionality with load() so normal users with JavaScript enabled will see the "enhanced" version of your site.
If your content is navigable when JavaScript is turned off, you'll be a good ways toward being visible to search engines.
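A minimal sketch of that progressive-enhancement pattern with jQuery; the URLs, ids and class names here are placeholders, not taken from your site:

<!-- A normal link a spider can follow -->
<a class="ajax-nav" href="/news.html">News</a>
<div id="content">Default, crawlable content goes here.</div>

<script>
// With JavaScript enabled, intercept the click and load just the target page's #content div
$('a.ajax-nav').click(function (e) {
  e.preventDefault();                            // don't follow the href
  $('#content').load(this.href + ' #content');  // the "url selector" form of load()
});
</script>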
Note that search engine crawlers won't be submitting any forms on your site, so if you rely on form elements to navigate between your site's content pages, that content is not navigable by search engines.
Here are guidelines on how to make Google crawl content loaded with Ajax: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
I use jQuery load() for asynchronous page loads. It greatly improves the user experience, but it is not SEO-friendly. Here's the only solution I have found so far:
On the first load I do not use jQuery load(), and I try to write a cookie with JavaScript: document.cookie = 'checkjs=on';
On the next page load, if the PHP script finds this cookie it means JavaScript is enabled and jQuery load() can be used. If there is no such cookie, JavaScript is off (probably a spider came), so jQuery load() is not used.
if (!isset($_COOKIE['checkjs']) || $_COOKIE['checkjs'] != 'on') { echo 'js is off, hello Google!'; } else { echo 'js is on, can use jquery load'; }
This way I can be sure that most users benefit from asynchronous loading of page blocks, except on the very first load. And spiders get all the content too.
In your case you could just load the same page with a new parameter that makes another tab active. The spider is going to be happy.
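For what it's worth, the client-side half of that cookie check is just one line; this sketch only assumes the same cookie name used in the PHP check above:

<script>
// Set on the first, fully rendered page load; every later request carries the cookie,
// so the PHP check above knows JavaScript is available and can emit load()-friendly pages.
document.cookie = 'checkjs=on';
// add "; path=/" to the value if the check must work across the whole site
</script>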

Is it possible to convert a dynamic HTML page with a lot of javascript to a page without javascript?

I have a page with a lot of JavaScript. However, once rendered the page remains static; there are no moving things or special effects, etc. It should be possible to render the same HTML without any JavaScript at all, using only plain HTML and CSS. This is exactly what I want: I would like to get a no-JavaScript version of this particular page. Of course, I do not expect any dynamic behavior, so I am OK if buttons are dead, for example. I just want them rendered.
Now, I do not want an image. It needs to be HTML with CSS; the CSS may be embedded in the HTML, which is fine too.
How can I do it?
EDIT
I am sorry, but I must not have been clear. My website works with JavaScript and will not work without it. I do not want to check whether it works without it; I know it will not, and I really do not care about that. That is not what I am asking. I am asking about a specific page, which I want to grab as pure HTML + CSS. The fact that its dynamic nature is lost is of no importance.
EDIT2
There was a suggestion to grab the HTML from the DOM inspector. That is the first thing I did: in the Chrome developer tools I copied the root html element as HTML and saved it to a file. Of course, this does not work, because it still references the CSS files on the web. I guess I should have mentioned that I want it to work from the file system.
Next I tried saving the page as "complete" with all its resources using the browser's Save menu (browser dependent). That saves the page and all the related files as a self-contained set that can be opened from the file system. But the HTML has to be manually cleaned of all the JavaScript, which is tedious and error-prone.
EDIT3
I seem to keep forgetting things. Images should be preserved, of course.
I have to do a similar task on a semi-regular basis. As yet I haven't found an automated method, but here's my workflow:
1. Open the page in Google Chrome (I imagine Firefox also has the relevant tools).
2. "Save Page As" (complete page), rename the HTML page to something nicer, delete any .js scripts which got downloaded, and move everything into a single folder.
3. On the original page, open the Elements tab (DOM inspector), find and delete any tags which I know cause problems (Facebook "like" buttons, for example; I also try to delete script tags at this stage because it's easier), then copy as HTML (right-click the <html> tag). Paste this into the downloaded HTML file, replacing its contents, but remember to keep the DOCTYPE, which doesn't get copied.
4. Search all HTML files for any remaining script sections and delete them (also delete any noscript content), and search for " on" (with a leading space) to remove inline handlers (onload, onclick, etc.).
5. Search for images (src=, url()), find common patterns in the image filenames and use regular expressions to replace them globally; for example, strip the path so that src="/images/myimage.png" points at the locally downloaded file. This needs to be applied to all HTML and CSS files. Also make sure the CSS files are referenced with the correct path (href). While doing this I usually replace all link hrefs with #.
6. Open the converted page in a browser (actually I tend to do this early on so that I can see if any change I make breaks it), use the Console tab to check for 404 errors (images that didn't get downloaded or had a different name) and the Network tab to check whether anything is still being loaded from the online version.
7. For any files which didn't get downloaded, I go back to the original page and use the Resources tab to find and download them manually.
8. (Optional) Cull any content which isn't needed (tracker images/iframes, unused CSS, etc.).
It's a big job. I'd love a tool which automated all that, but so far I haven't found one. The pages I download are quite badly made (shops) and have a lot of unusual code, which is why there are so many steps. You might not need to follow every one.
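Step 4 can be partly scripted. This is my own rough sketch (not part of the original workflow): a small Node.js script that strips script blocks, noscript tags and inline on* handlers from a saved HTML file. Regexes on HTML are fragile, so check the result by eye.

// strip-js.js - usage: node strip-js.js saved-page.html
var fs = require('fs');

var file = process.argv[2];
var html = fs.readFileSync(file, 'utf8');

// Remove whole <script>...</script> blocks
html = html.replace(/<script[\s\S]*?<\/script>/gi, '');
// Remove <noscript> wrapper tags (keep or drop their contents as you prefer)
html = html.replace(/<\/?noscript[^>]*>/gi, '');
// Remove inline handlers such as onclick="..." or onload='...'
html = html.replace(/\son\w+\s*=\s*("[^"]*"|'[^']*')/gi, '');

// Write the cleaned copy next to the original
fs.writeFileSync(file.replace(/\.html?$/, '') + '.nojs.html', html);
console.log('Wrote cleaned copy of ' + file);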

Update window hash (at url)

So I have this js code for an image gallery:
if (this.settings.update_window_hash) {
    var thumb_link = this.images[this.current_index].thumb_link;
    if (thumb_link.attr("id")) {
        window.location.hash = "#image-" + thumb_link.attr("id"); // use the thumbnail's id in the URL
    } else {
        window.location.hash = "#image-" + this.current_index;    // fall back to the image's index
    }
}
So as you've probably assumed, this appends #image-(int) to the URL. So if I have a gallery with multiple images and the third image is selected, the URL will look like this:
mysite.com/gallery.html#image-3
All good. But I don't really like this being appended to the end of the URL. So is there any problem if I remove this part of the script entirely? Then, regardless of which image is currently selected, the URL will look like this:
mysite.com/gallery.html
I've tested it and it works okay. But I'm not very experienced with JavaScript and I want to make sure I'm not making a mistake. So: is it okay if I remove this script entirely? Will it cause any problems?
Huge thanks.
Hashes at the end of the URL are optional and not required so YES, you can remove that script if you want (I'm not sure what problem you're trying to solve by removing it). In general, you get more useful answers if you tell us what problem you're trying to solve rather than what solution you're trying to use.
Hashes are used when you want the URL of the page to direct the viewer to some sub-content on that page. If you remove them, your page will still work just fine, but the URL will not reflect which image is displaying. So if the viewer saves that URL and comes back to it, links to it, or does anything else that keeps a reference to the URL, it will go to the generic version of the page, not the one that shows a specific image. Whether that is OK is totally up to you and how your page works.
Just use:
location.replace(location.href + "#myhash");
The location.replace method overwrites the current step in browser history. For an example of this in action see http://prettydiff.com/slideshow/
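To make the difference concrete (plain JavaScript; the hash value is just an example):

// Assigning to location.hash adds a new history entry - Back returns to the previous hash
window.location.hash = "image-3";

// location.replace() swaps the current entry instead - Back skips straight past this state
window.location.replace(window.location.href.split("#")[0] + "#image-3");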
The stuff after the octothorpe normally represents a "name" or "id" from the web page. You can have an anchor tag (<a name='thevalue'>) and the browser will interpret the text after the octothorpe (http://example.com#thevalue) by scrolling to the associated section on the page.
Unless the page has special JavaScript to behave differently. In your case, it depends upon the full functionality of the web page you're writing. If you have smoke tests/unit test/use case tests/other QE tests, you should execute those to ensure that your changes don't break anything.
See http://www.w3schools.com/html/html_links.asp for more description of the standard usage.

General questions about #! hashbang urls and am I using them correctly

I'm in the process of writing a website that includes a reasonably large gallery. On the first page of the gallery the user is shown a bunch of thumbnail images, with a URL of: website.com/gallery.php
When they click a thumbnail image, if JavaScript is turned off the browser will follow the URL in the href and go to a page called gallery.php?img=67. If JavaScript is turned on, the href click will not execute; instead an Ajax request displays the larger image and some text about it, and the URL changes to gallery.php#!img=67. The back button takes you back to the thumbnails, and pressing F5 keeps the big image displayed with the text. If someone copies the address with the #! and sends it to someone else, they will get the same image displayed (assuming the receiver has JavaScript turned on).
My question is: have I set this up correctly for Google to index the individual gallery pages? Will Google index them twice, once with the ?img=67 and once with the #!, and if so, is that a bad thing? I'm using JavaScript/Ajax to preload the larger images once the thumbnail page is loaded, for speed. I've read a lot of backlash against using hashbang Ajax-y things recently, and I'm wondering if you think I can justify using it here.
Google will follow your links and index the ?img=67 pages, and will not index your #! pages, because it can't see those links. You can tell Google about those links by doing the following:
Add <meta name="fragment" content="!"> to the <head> of your document, and
Handle requests for /?_escaped_fragment_= by returning an "HTML Snapshot" of your page that has all your #! links in the <A> tags.
Also, to make the most of this feature, you should also handle requests for /?_escaped_fragment_=img=67 by returning an HTML snapshot page with the big image displayed. Remember, GoogleBot doesn't execute Javascript. Using the #! URL tells Google to retrieve an alternate version of the page (A version where #! has been replaced with ?_escaped_fragment_=) that should render without Javascript.
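Putting those pieces together, gallery.php would carry the meta tag in its head, and the URL mapping works like this (a sketch of the scheme described above):

<head>
  <meta name="fragment" content="!">  <!-- tells Google the page has #! (Ajax) states -->
</head>

<!-- What the user shares:      gallery.php#!img=67
     What GoogleBot requests:   gallery.php?_escaped_fragment_=img=67
     Your server answers that request with a plain HTML snapshot
     showing image 67, no JavaScript required. -->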
The use of #! in URLs has been in the news recently, with updates to a well-known blog.
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs has a good description of what they are best used for - and when they can be bad. I think your use in a gallery is quite valid.
In short, a URL like http://lifehacker.com/#!5753509/hello-world... is rewritten by Google, and other compatible web-spiders as http://lifehacker.com/?_escaped_fragment_=5753509/hello-world...
Google may index them twice, but you can also use the canonical meta-tag to ensure it knows what the 'official' copy is.
A possible solution (as suggested in http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs) is to use regular links and translate them to #! in the onclick event. This ensures that the site serves regular links rather than the ugly #! ones.
It does mean extra work for the server, though, since the server needs to support both versions (the Ajax version and the regular version), but I think it is worth it. These #! URLs are so ugly.
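A sketch of that onclick translation with jQuery; the selector and URL pattern are made up for the gallery example above:

// Serve plain links such as <a class="thumb" href="gallery.php?img=67">...</a>,
// then upgrade them to #! navigation when JavaScript is available
$('a.thumb').click(function (e) {
  e.preventDefault();
  var img = this.href.split('img=')[1];   // pull the id out of the real URL
  window.location.hash = '!img=' + img;   // address bar becomes gallery.php#!img=67
  // ...then run the Ajax code that fetches and displays the large image
});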

Ajax entire page - display only one div and retain CSS and other header material?

A client wants a merch shop on their site, and has set one up. I could iframe the whole merch page in, but frankly the merch site is an eyesore, and the client's site has a very particular feel to it. So I'm considering using an Ajax GET to grab the whole page, then JavaScript to display only the div with the merchandise in it. However, there are a lot of JavaScript includes (etc.) on the merch site that I'd need to make sure are still present for the div to work correctly.
Any feeling on if this would work or not? Would the displayed div take its stylesheet and scripts from the AJAX'd page? Can I put the div in an iframe instead?
Opinions?
It sounds like an ugly solution. Isn't it better to do this server-side instead? For example, let a PHP script read in the page and do whatever magic it takes to display it.
Using AJAX to load entire pages is ugly for a couple of reasons, including:
It breaks the URLs (can be worked around but requires extra work)
It's hard for search engines to crawl your site
It breaks some GUI elements in the browser, such as loading visualisations
Looks like you can use the jQuery load() function: http://docs.jquery.com/Ajax/load
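Specifically, load() can fetch a whole page and keep only one element from it by appending a selector to the URL, which is close to what you describe (the URL and ids below are placeholders):

// Fetch the merch page and inject only its #merchandise div into #shop on your page
$('#shop').load('http://merch.example.com/store.html #merchandise');

// Caveats: this is subject to the same-origin policy, and the merch site's own
// stylesheets and scripts are not pulled in - only the selected element's HTML.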
