I have an ASPX application.
On every GET, the server responds with a "basic" HTML page containing everything except the table grids.
This "grid information" is contained in a hidden input (JSON format) in the page.
This is by design and cannot be changed.
A normal visitor will see this page HTML:
head, body, scripts, meta tags
text, labels, inputs...
<div id='gridcontainer'></div>
more html
more html
Then on page load I dynamically render, using JavaScript, a table inside the div (gridcontainer).
So after the onload event has executed, the user also sees the table grid inside the div.
In this situation Google does not index the information in the tabular grids, because it is rendered by JavaScript after page load.
The application has the ability to render the exact same content in HTML without using JavaScript (losing some functionality). When I say the exact same content I really mean the same page (same content, same headers, same meta tags, same title), just not rendered by JavaScript.
The content length may be different if we compare the two responses, because the full HTML response might be bigger than HTML + JSON + JavaScript.
This is what I want the spider to see:
head, body, scripts, meta tags
text, labels, inputs...
<div id='gridcontainer'>
<table>table row 1, table row 2 ...</table>
</div>
more html
more html
To sum up, I want to deliver the "HTML" version to spiders and the other (JavaScript-rendered) version to visitors.
Is this cloaking?
Is this dangerous with search engines, or is it a totally legitimate method if the content I am displaying is exactly the same (no tricks)?
Thanks in advance!
If the content is basically the same and a human viewer would say it's the same content, then it's legal. I know of a fairly major site that does this with Google's blessing. Any site with pages that are largely generated by client-side JS has to do something like this for Google to see anything useful. Since Google doesn't currently evaluate JavaScript, there is no other choice for pages that use JS-generated HTML.
I don't know if there's a way to get Google's blessing to avoid any accidental penalty.
The important point is that the actual content of the page needs to be the same. The details of the formatting do not have to be identical.
Note: For legal advice, contact a lawyer.
Yes, this is 'cloaking'.
Yes, it's morally questionable.
But no, it isn't illegal (subject to the disclaimer at the top of this answer).
But either way, don't do it, because yes, Google will kill your rankings if they catch you serving content to them that the user doesn't get to see.
If you use progressive enhancement you won't have any issues at all. What you would do is serve the HTML version so users who don't have JavaScript enabled can still see the content. Then add JavaScript that, when the page loads, removes the current HTML and adds the enhanced version of that same content. The key is that the content is the same; just the experience is different due to the lack of JavaScript capabilities. This will never get you in trouble with the search engines and is great for accessibility. Accessibility is one of the main tenets of SEO.
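A minimal sketch of that pattern in plain JavaScript, assuming the server always renders the plain table inside #gridcontainer (the id from the question above); the sorting behavior and the sortByColumn helper are hypothetical stand-ins for whatever enhancement the grid script adds:

```javascript
// Sketch only: #gridcontainer comes from the question; the rest is a
// hypothetical enhancement layered on top of server-rendered HTML.
document.addEventListener('DOMContentLoaded', function () {
  var container = document.getElementById('gridcontainer');
  var plainTable = container && container.querySelector('table');
  if (!plainTable) return; // no server-rendered grid on this page

  // The server-rendered rows stay in place for spiders and no-JS visitors;
  // we only add behavior: click a header to sort that column.
  plainTable.querySelectorAll('th').forEach(function (th, index) {
    th.style.cursor = 'pointer';
    th.addEventListener('click', function () {
      sortByColumn(plainTable, index);
    });
  });
});

// Naive text sort of one column, included only to keep the sketch self-contained.
function sortByColumn(table, index) {
  var tbody = table.tBodies[0];
  Array.from(tbody.rows)
    .sort(function (a, b) {
      return a.cells[index].textContent.localeCompare(b.cells[index].textContent);
    })
    .forEach(function (row) { tbody.appendChild(row); });
}
```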
Related
My company uses the TinyMCE editor for its content-editing feature.
Problem: when saving content (as a bulk HTML string) from one browser, say Chrome, and then viewing it in Firefox,
the attribute order changes, as you can see here => differences in HTML between Chrome & Firefox
Our problem is based on content. If the content changes, the business changes as well.
But in this case the user doesn't change the content; the browser does.
Scenario
- TinyMCE is loaded inside a popup
- the user edits content & closes the editor popup
- we render the edited HTML in a div element (part of a form)
- part of the server-side form validation is checking for content (HTML) changes
- we use C# to compare the saved vs. edited HTML content as two strings
Do you have any ideas on how to find the actual changes, or could you give us a hint about how to solve this?
This occurs because of how the HTML content is parsed by the browser. At this point in time, TinyMCE cannot guarantee the order of the attributes.
To resolve the issue for your use case, I would suggest parsing the HTML on the server (preferable) or client side before storing the data.
Depending on the technology stack you're on, there is a range of HTML parsers written in languages from PHP and Java to Ruby. Prettier is one of the "go-to" formatters these days; unfortunately, there are no ASP.NET options as far as I can see.
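A hedged sketch of the client-side variant, using the browser's DOMParser to rewrite each element's attributes in a fixed order before the content is stored; normalizeHtml is a hypothetical helper, not a TinyMCE API, and it only addresses attribute order (browsers may still rewrite other things, such as style values, differently):

```javascript
// Hypothetical helper: re-serialize markup with attributes in a fixed
// (alphabetical) order, so attribute order no longer breaks a later
// string comparison on the server.
function normalizeHtml(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');

  doc.body.querySelectorAll('*').forEach(function (el) {
    // Snapshot the attributes, then re-add them in sorted order
    // (in practice browsers serialize attributes in insertion order).
    var attrs = Array.from(el.attributes, function (a) {
      return { name: a.name, value: a.value };
    }).sort(function (a, b) { return a.name.localeCompare(b.name); });

    attrs.forEach(function (a) { el.removeAttribute(a.name); });
    attrs.forEach(function (a) { el.setAttribute(a.name, a.value); });
  });
  return doc.body.innerHTML;
}

// Usage sketch: normalize once before saving and again before comparing,
// so both strings were produced by the same rules, e.g.:
// var saved = normalizeHtml(tinymce.activeEditor.getContent());
```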
I want to display HTML provided by a user in a page. My page is almost entirely dynamic (JS code), and I was wondering if there's an easy way to sanitize it?
Like, maybe I could remove all the <script> and <iframe> tags and unbind all the events contained in the string (or remove any HTML attribute starting with 'on') so that no JavaScript code from the string can be executed?
Can users possibly insert JavaScript with a CSS 'content' property in a style attribute?
The jQuery $(...).text(...) function doesn't help me, since I want to preserve any HTML markup and CSS styling.
If there's no easy solution, I'm ready to live with a whitelist of HTML tags (table, span, div, img, a, b, u, i, strong...), but I'd rather not have to whitelist the attributes too.
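For illustration, here is roughly what that stripping approach might look like as a sketch (blacklist-based, so incomplete by nature; the answer below explains why this is hard to get right):

```javascript
// Naive blacklist sanitizer matching the idea described above.
// WARNING: a sketch, not a complete defense; new vectors (CSS tricks,
// encoded URLs, SVG, etc.) can slip past a blacklist.
function naiveSanitize(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');

  // Drop <script> and <iframe> elements outright.
  doc.body.querySelectorAll('script, iframe').forEach(function (el) {
    el.remove();
  });

  doc.body.querySelectorAll('*').forEach(function (el) {
    Array.from(el.attributes).forEach(function (attr) {
      // Strip inline event handlers: onclick, onload, onerror, ...
      if (/^on/i.test(attr.name)) {
        el.removeAttribute(attr.name);
      // Strip javascript: URLs in href/src.
      } else if (/^(href|src)$/i.test(attr.name) &&
                 /^\s*javascript:/i.test(attr.value)) {
        el.removeAttribute(attr.name);
      }
    });
  });
  return doc.body.innerHTML;
}
```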
The more foolproof way to show user content safely is to embed it in an iframe whose origin is a different domain than your host web page. This is what jsFiddle does. The main page is served from jsfiddle.net, but the user scripts are served from fiddle.jshell.net. This lets the user content do what it would normally do, but the browser's cross-origin protection keeps the user content from messing with the host page or domain or cookies, etc....
Trying to strip all possible places that scripts could hide in the content is a risky proposition in which you will probably be forever chasing new attack vectors. I'd personally much rather let the browser be in that business and put the user content on a different domain. Plus, allowing the user content to keep its normal JS will also let it work as desired.
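A sketch of that setup from the host page's side; the domain, endpoint, and slot element are hypothetical, and the sandbox attribute is an extra belt-and-braces measure on top of the separate origin:

```javascript
// Embed the user content from a different origin; the same-origin policy
// keeps it away from the host page's DOM and cookies.
var frame = document.createElement('iframe');
frame.setAttribute('src', 'https://usercontent.example/render?id=123'); // hypothetical host/endpoint
frame.setAttribute('sandbox', 'allow-scripts'); // scripts may run, but with no same-origin access
document.getElementById('user-content-slot').appendChild(frame); // hypothetical slot element
```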
SharePoint is a beast and seems to stomp on everything. Customizing the front-end with javascript has gone well, but now I would like to provide my content owners with more back-end controls. However, any changes made to objects in the WYSIWYG editable area at $(document).ready are immediately reverted by SharePoint.
I imagine this has to do with that "content" not really existing there, but being a copy of hidden input fields. Does anyone know how to get some control of this area? I would love to be able to insert or modify "page content" under the control of scripts, but SharePoint documentation is so terrifyingly sparse.
*EDIT: It appears as though content which is inserted "late" (as in HTML inserted by a click event well after page load) will stick. Anything done at doc.ready or window.load, however, is wiped before the area is relinquished to user control.
SharePoint sadly does a lot of "sanitizing" of content entered into some HTML fields or Content Editor Web Parts. Can you edit the master page through SharePoint Designer and stick your JavaScript in there?
Also look at ExecuteOrDelayUntilScriptLoaded or _spBodyOnLoadFunctionNames.push()
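A sketch of how those hooks are typically used; the field id and patchPageContent are hypothetical, but _spBodyOnLoadFunctionNames and ExecuteOrDelayUntilScriptLoaded are the real SharePoint hooks named above:

```javascript
// Hypothetical late-running patch for the editable area.
function patchPageContent() {
  var area = document.getElementById('PageContentField'); // hypothetical id
  if (area) {
    area.innerHTML = '<p>Injected after SharePoint finished its own setup.</p>';
  }
}

// Option 1: SharePoint calls this after its own body-onload work
// (note it takes the function *name* as a string).
_spBodyOnLoadFunctionNames.push('patchPageContent');

// Option 2: wait until a specific SharePoint script has loaded.
ExecuteOrDelayUntilScriptLoaded(patchPageContent, 'sp.js');
```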
I have a website that is one HTML file and uses JavaScript to hide tabbed pages.
The URL gets rewritten with a # for the different pages to make them bookmarkable.
Is there a way to make the different pages show up in search engine results? It would be good to have them appear as different pages there.
I have read the doc below, but I think that is just for dynamically generated AJAX content, right?
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
I read the page you mentioned. That is for AJAX sites; your case is not AJAX.
Another point, as Jeff B has mentioned, is that there is a high chance Google will index all the content for each trick you use. That would be bad, as Google would see duplicate content, though not terribly bad since all the content comes from your site only.
Search engine questions like this are very tricky and difficult to answer, as no one knows exactly how search engines work.
In my thinking, you either recreate your pages as AJAX and follow the points mentioned in the article you found, or
use a link for each tab with a parameter, like page1.php?cat1, page1.php?cat2, etc.,
where each loads only the content related to that specific tab.
The second solution is no different from implementing a separate page for each tab, but it can be easier to update in your case, and all the content is still accessible to both people and search engines in one place. Slowly, the search engine will index each of your parameterized pages. Remember, it is generally said that Google does not index pages with parameters, but that is not true. Google only skips pages whose parameters look like variables or session IDs; it does index pages with meaningful parameters when the page content changes with them.
Still, your question is tricky, and my suggestion is what came to me after thinking about it at length.
The problem seems to be that even if the different pages were indexed, they would all index the same content. This is because, according to your explanation, all of the content (including the hidden content) exists at load time.
If your tabs are links, you simply need to put the href in the link. Google should follow this link, while javascript-enabled browsers will execute your tab-switching code and not follow the link (if you coded it right).
However, the problem of all content being indexed for all pages still remains.
Modify your system like this (see the sketch after this list):
- Every link that changes the content of the current tab should have, as its href attribute, a subpage that contains the content of the tab intended to appear -> this is what search engines will cache.
- Those links should have JS actions bound to them that change the content of the current tab and also cancel the navigation that the href attribute would otherwise trigger -> this is what the user will see.
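A minimal sketch of that pattern; the .tab-link class, #tab-content id, and loadTab helper are all hypothetical names:

```javascript
// Each tab link carries a real, crawlable href (the subpage); JS intercepts
// the click so JS-enabled browsers swap the content in place instead.
document.querySelectorAll('a.tab-link').forEach(function (link) {
  link.addEventListener('click', function (event) {
    event.preventDefault(); // cancel the navigation the href would trigger
    loadTab(link.getAttribute('href'));
  });
});

// Hypothetical helper: fetch the subpage and show only its tab content.
function loadTab(url) {
  fetch(url)
    .then(function (res) { return res.text(); })
    .then(function (html) {
      var doc = new DOMParser().parseFromString(html, 'text/html');
      document.getElementById('tab-content').innerHTML =
        doc.getElementById('tab-content').innerHTML;
    });
}
```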
A client wants a merch shop on their site, and has set one up. I could iframe the whole merch page in, but frankly the merch site is an eyesore, and the client's site has a very particular feel to it. So I'm considering using an AJAX GET to grab the whole page, then JavaScript to display only the div with the merchandise in it. However, there are a lot of JavaScript includes (etc.) on the merch site that I'd need to make sure are still present for the div to work correctly.
Any feeling on if this would work or not? Would the displayed div take its stylesheet and scripts from the AJAX'd page? Can I put the div in an iframe instead?
Opinions?
It sounds like an ugly solution. Isn't it better to do this server-side instead? For example, let a PHP script read in the page and do whatever magic it takes to display it.
Using AJAX to load entire pages is ugly for a couple of reasons, including:
It breaks the URLs (can be worked around but requires extra work)
It's hard for search engines to crawl your site
It breaks some GUI elements in the browser, such as loading indicators
It looks like you can use jQuery's load function: http://docs.jquery.com/Ajax/load
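For example (the #merch target, URL, and #products fragment are hypothetical stand-ins for the merch page): passing a selector after the URL makes load() insert only that fragment, which also answers the question above, since jQuery strips scripts from the fetched page in that mode, so the merch site's own JS and CSS will not come along:

```javascript
// Fetch the merch page and insert only the #products fragment into #merch.
$('#merch').load('https://shop.example/store.html #products', function () {
  // Runs after insertion; any styling/behavior the fragment needs must be
  // provided by the host page, since the fetched page's scripts are stripped.
});
```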