Do search engines process Javascript? - javascript

According to this page it would seem like they don't, in the sense that they don't actually run it, but that page is 2 years old (judging from the copyright info).
The reason I'm asking this question is because we use Javascript to replace text on our site with other more typographically sound content. We're worried that this may affect the crawlability/SEO of our sites, since generally what we're replacing is headers; i.e. <h1>, <h2>, etc.
Will search engine bots see our original code, or will they run the Javascript and see the replaced text?

Google now officially processes JavaScript.
In order to solve this problem, we decided to try to understand pages by executing JavaScript. It’s hard to do that at the scale of the current web, but we decided that it’s worth it. We have been gradually improving how we do this for some time. In the past few months, our indexing system has been rendering a substantial number of web pages more like an average user’s browser with JavaScript turned on.
Sometimes things don't go perfectly during rendering, which may negatively impact search results for your site. Here are a few potential issues, and – where possible, – how you can help prevent them from occurring:
If resources like JavaScript or CSS in separate files are blocked (say, with robots.txt) so that Googlebot can’t retrieve them, our indexing systems won’t be able to see your site like an average user. We recommend allowing Googlebot to retrieve JavaScript and CSS so that your content can be indexed better. This is especially important for mobile websites, where external resources like CSS and JavaScript help our algorithms understand that the pages are optimized for mobile. If your web server is unable to handle the volume of crawl requests for resources, it may have a negative impact on our capability to render your pages. If you’d like to ensure that your pages can be rendered by Google, make sure your servers are able to handle crawl requests for resources.
It's always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn't have compatible JavaScript implementations. It will also help visitors with JavaScript disabled or off, as well as search engines that can't execute JavaScript yet.
Sometimes the JavaScript may be too complex or arcane for us to execute, in which case we can’t render the page fully and accurately.
Some JavaScript removes content from the page rather than adding, which prevents us from indexing the content.

Search engines don't process JavaScript as such.
There is some evidence that Google may have started processing inline script content in some cases, in order to catch content that is entered into the page parse queue using document.write. However certainly DOM methods such as you might use for font-replacement are not affected and no onload code is invoked.

Generally no. Google has mentioned that they are working on a system of indexing ajax content, but I don't think any of the major search engines index dynamic content as a rule. See this page for Google's take on it: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=81766

The bots will certainly not run the Javascript code, but they might recognise some commonly used scripts.
You shouldn't count on it though. Clear markup, proper content and real links are still what count.
Also, if the bots happen to recognise your script, it might not be in your favor. If the code is recognised as something that is commonly used to try to fool bots, it could even hurt your page ranking.

I'd use metadata to ensure bots pick up the content on your pages.

I know the general consensus is that Google does not process JavaScript or index anything inside a <script> tag; however, that consensus appears to be incorrect.
Try searching for the following, with the surrounding quotes (or click here):
"Samsung Public Interest Statement by Thomas Fusco, Fish & Richardson P.C., for Samsung."
You should only get one result. Now click on that result (or just click here) and view the source.
Do a CTRL-F for the text you searched for in Google. Notice that the text is in a javascript variable, and not html. Google must be processing some javascript to pull those words into its index.

Related

JavaScript get list of external scripts (like extensions)

Is it possible to get a list of all scripts injected by the browser? Or at least detect them somehow? I mean, sometimes on Windows there are various viruses which inject scripts on the fly, modifying e.g. click actions to display ads. I'm writing a fairly advanced website, so I'd like to warn the user about other scripts which most likely:
will crash as my webapp is modifying basic native browser APIs like document.getElement* or even listeners
may make the webapp unstable and in the worst case make it crash.
could be performance overkill
I'm also talking about scripts that modify site content, e.g. Ponify or XKCD numbers.
I know about navigator.plugins but it doesn't seem to be what I am looking for.
Not really. I mean you could do a:
document.getElementsByTagName('script');
And fetch all script tags, and check their src attribute, but it's not that simple. It's possible to make an ajax request for a javascript file, and then eval(ajaxResult) to execute that code. Your browser has no way of knowing where that code came from as it's just a string.
There are a lot of ways to execute javascript and clean up any trace, so there is no way to cover them all.
EDIT: I missed the key phrase "scripts injected by browser" :)
At least in Chrome, some extensions do seem to inject script tags, though they don't seem to be marked up in any special way. Filtering them may be tricky. Perhaps if you add a class to the script tags you know should be there, and you find a script tag that does not have that class, then you know it could be an extension script.
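A rough sketch of that class-marker idea, for what it's worth. The class name `site-script` and the function name are made up for illustration, and this only sees script tags that are still in the document; eval'd code and extension code that never inserts a tag stay invisible to it.

```javascript
// Hypothetical marker class: the page author adds it to every <script> tag they ship.
const KNOWN_CLASS = "site-script";

// Returns the src (or "(inline)") of every script element lacking the marker class.
// `scripts` is any array-like of elements, e.g. document.getElementsByTagName("script").
function findUnmarkedScripts(scripts, knownClass = KNOWN_CLASS) {
  return Array.from(scripts)
    .filter((s) => !String(s.className || "").split(/\s+/).includes(knownClass))
    .map((s) => s.src || "(inline)");
}

// In a browser:
// const suspects = findUnmarkedScripts(document.getElementsByTagName("script"));
// if (suspects.length > 0) console.warn("Unrecognised scripts:", suspects);
```

A hit here means "a script tag I did not mark", not "malware"; a third-party loader you use yourself can also add unmarked tags.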
I'm also not sure that an extension must insert a script tag to do stuff on the page. I think it has ways of interacting with the page directly from the extension code as well. Not sure though. More research required. And this is probably different for different browsers.
Personally, I think defensively protecting your site from people's own browser extensions is a fool's errand. If someone wants to hamstring their own browsing experience in bizarre ways, it's not your responsibility to hold their hand. And you will have a very hard time detecting all the ways an extension can blow up your everything.

Website doesn't work with Javascript turned off

So head to www.jabsy.com, with Javascript turned off.
Basically, I use some JQuery UI Dialogs, I use Javascript for all the bindings on the page...I pretty much use it for everything. Is that really a bad thing though?
Nothing really works without Javascript. Not even the Google Maps API.
Should I go out of my way to try and make the entire page work without Javascript? Is that even possible with my site? I wouldn't even know where to begin as I use Javascript for everything, so could I get some pointers? How many users actually turn off their Javascript these days?
Would it help to let the user know if they have Javascript turned off and make them turn it on before accessing it and provide them with directions how?
Yes, if your site requires JavaScript you need to let the user know that it is required.
For example:
<noscript>
<div>
You need to have JavaScript enabled to use this site.
</div>
</noscript>
You can provide more description as appropriate. A savvy user that sees this text is going to be able to then go in and turn on JavaScript for your site. A non-technical user might have trouble, but I would think most of them would be running with JavaScript enabled anyway (?).
According to data collected in 2007, about 3% of users in the US have JavaScript off. I'm sure that number is lower today.
It really depends on how critical the sections of your page that require JavaScript are. If there is a form that is mission critical, but controlled completely by JavaScript, you probably want to engineer a way for that form to do the same thing with JS on and off.
However, if you have animated snowflakes on your background (for the love of God, don't really do this), losing them is not going to negatively affect someone visiting your site with JavaScript off.
Really, it all comes down to how important the information or actions are to your site. Turn off JavaScript and note all the things you can't do that are absolutely vital, then make them work.
Keep in mind there are several audiences that will not render your JavaScript:
Screen readers/accessible browsers
Console-based browsers (Text based browsers)
Search Engines (Google)
Your specific service (location-based messages) will be way too cumbersome to use without JavaScript (and its content is dynamic). Therefore, I see no problem requiring it. You should, however, point out that JavaScript is necessary to use your site (Preferably at the top, in really large letters). You can do that by including the alternative no-JavaScript content in noscript tags, i.e.
<noscript>
<div style="font-size: 200%;">You need JavaScript!</div>
</noscript>
However, most websites are content-based, like a company's homepage, stackoverflow or Wikipedia. These websites should be usable without JavaScript. Nowadays, even smartphones have excellent JavaScript support, but Kindle and regular phones are still too slow for JavaScript.
There is a line of argument that says sites should work without JS. Personally, I think that is tosh, unless you have a clientele for whom this is liable to be an issue. JS is a reasonable thing to expect for many sites.
However, it is polite to let people know that this is a requirement, and inform them rather than just letting it not work. If your site is heavily JS dependent, then you may have made some mistaken design decisions, but it is probably not worth re-working it. If you monitor the number of people who get the "you need js" message, you will identify if it is proving a turn-off. I suspect it will not be an issue.
So build based on what you need, BUT tell people if they need to have things set.
You can use the <noscript><!-- html here if no Javascript --></noscript> tags and place content to be rendered in between if javascript is turned off.
I don't think there are many sites that will work without it these days. It's more or less mandatory.

javascript find replace

I have a question about optimization, but more on the browser/client side.
I am catering to a few societies that need about 3 different languages. So I'm just putting my user's language type in the php session, and swapping out the text for selected areas on each page they navigate to. So, really nothing complicated.
However, I'm toying with the idea of letting javascript do the find/replace of the selected texts on each page.
There are a few ways to skin this cat, and I've done them all, and they work. However, I do have a few hundred pages, and many words to replace with the correct language text.
If I were to go the Javascript route, does anyone have an opposing view to this? And if so, why? I'm interested in letting the user's browser do the work, rather than my servers constantly finding and replacing, or creating new CONSTANTS for each language specific situation.
I'm worried about their browsers getting slower. But that could be a very small problem.
For those individuals who love to get specifics, here's what I would do with javascript.
I would load a languages.js file with all appropriate word translations for any language I implement. Instead of running a huge find/replace on each page load, I'd localize the find/replace to the specific page, or possibly narrow it down to elements carrying an attribute that my script would look up in the DOM, and perform the find/replace on those alone.
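To make the attribute idea concrete, here is one way it might look. Everything named here is an assumption for illustration: the `TRANSLATIONS` table stands in for the contents of languages.js, and `data-i18n` is an invented attribute name.

```javascript
// Hypothetical contents of languages.js: one table per language, keyed by phrase id.
const TRANSLATIONS = {
  en: { greeting: "Welcome", logout: "Log out" },
  fr: { greeting: "Bienvenue", logout: "Déconnexion" },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Look up a phrase, falling back to the default language, then to the key itself.
function translate(key, lang, fallbackLang = "en") {
  const table = TRANSLATIONS[lang] || {};
  if (has(table, key)) return table[key];
  const fallback = TRANSLATIONS[fallbackLang] || {};
  return has(fallback, key) ? fallback[key] : key;
}

// Replace the text of each element that carries the (hypothetical) data-i18n attribute.
// `elements` is any array-like of DOM elements.
function localize(elements, lang) {
  for (const el of Array.from(elements)) {
    const key = el.getAttribute("data-i18n");
    if (key) el.textContent = translate(key, lang);
  }
}

// In a browser: localize(document.querySelectorAll("[data-i18n]"), "fr");
```

Keying by phrase id rather than searching for literal words means only the tagged elements are touched, which keeps the work per page small.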
I'm open to better ideas.
Also, for those people who find "over-optimization" useless or "over-doing-it", please don't mention anything. This is for fun and not a critical decision item.
thanks guys!
Well, on the pro side, yes, you are offloading some of the work to the client, but I don't think that's going to make any real difference. You're probably talking about a tiny percentage of the overall performance of the site. The only way to know of course is to test it.
On the con side, you'll be increasing the bandwidth it takes to load your site, since the user will need to load the page plus the language file. It will be cached, if you set it up right, so that's probably not a huge concern either.
Another con is that this will make your site depend on javascript. A non-scripting visitor won't get the translation, and that includes search engines. Whether that matters to you depends on the nature of your site, but in general, that's a pretty big negative.
You'd also have to watch out for "flashing" of the non-localized language. It'd look horrible if the page loaded and then a split second later the language changed to something else. If you are doing the swapping from the DOM ready event ($(function() {}) in jquery, for example), it's probably too late. You could do it from a script you put at the bottom of the page, and that'd probably be ok, but even then, it may depend on the browser and the structure of the markup, not to mention the user's bandwidth and whether the server sends the content in chunks.
I think it comes down to what fits your needs best. Sorry that's not much of an answer, but it's an accurate one I think :)
I agree it is important to keep your server from overload. I would solve the problem one of two ways:
Use your suggested JavaScript find and replace. While the JavaScript is working, show a spinning loading.gif nearby with a message to the effect of 'translating', to explain to users why they must wait. If you are doing a word-by-word translation, you have to be careful not to make a browser like IE complain that 'The page has become unresponsive'; I would suggest running setInterval(translate, 1) (passing the function itself, not the result of calling it), where translate handles a set number of words per call and clears the interval once it is done, so the browser doesn't think your script is stuck in an endless loop.
Provided the same sections are translated for all foreign visitors, you could make a PHP script that makes new, translated pages next to the originals. The translated pages could include the translator.php script to do a quick check to see if the original page has changed to decide whether or not to make a new translated page. This would not mean translating a page every time it needs to be viewed in a foreign language, but only a little check to see if the original had changed - putting less load on your server and none on the client side browser.
Personally I would implement 2 if possible to be more low-power-browser friendly (such as mobile devices) but in practice either would do and it's an interesting problem.
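For the first option, the batching could be sketched roughly like this. It uses a chained setTimeout rather than setInterval, which is a swapped-in variation on the suggestion above; the function and parameter names are invented for the example. Note that the scheduler is handed the function itself, not the result of calling it.

```javascript
// Translate `words` in place, `chunkSize` entries per tick, yielding between chunks
// so the browser can repaint and handle input.
// `dictionary` maps source words to translations; unknown words are left as they are.
// `schedule` defaults to setTimeout(fn, 0); it is a parameter only to make testing easy.
function translateInChunks(words, dictionary, chunkSize, onDone,
                           schedule = (fn) => setTimeout(fn, 0)) {
  const size = Math.max(1, chunkSize); // guard against a zero or negative chunk size
  let index = 0;

  function step() {
    const end = Math.min(index + size, words.length);
    for (; index < end; index++) {
      const word = words[index];
      if (Object.prototype.hasOwnProperty.call(dictionary, word)) {
        words[index] = dictionary[word];
      }
    }
    if (index < words.length) schedule(step); // more work left: queue the next chunk
    else onDone(words);                       // finished: hand the result back
  }

  step();
}

// Usage:
// translateInChunks(allWords, frenchDictionary, 200, (done) => hideSpinner());
```

Here `allWords`, `frenchDictionary` and `hideSpinner` are placeholders; the chunk size would need tuning against real pages.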

Just In General: JS Only Vs Page-Based Web Apps

When developing a web app, versus a web site, what reasons are there to use multiple HTML pages, rather than using one HTML page and doing everything through Javascript?
I would expect that it depends on the application -- maybe -- but would appreciate any thoughts on the subject.
Thanks in advance.
EDIT:
Based on the responses here, and some of my own research, if you wanted to do a single-page, fully JS-Powered site, some useful tools would seem to include:
JQuery Plug Ins:
JQuery History:
http://balupton.com/projects/jquery-history
JQuery Address:
http://plugins.jquery.com/project/jquery-address
JQuery Pagination:
http://plugins.jquery.com/project/pagination
Frameworks:
Sproutcore
http://www.sproutcore.com/
Cappuccino
http://cappuccino.org/
Possibly, JMVC:
http://www.javascriptmvc.com/
page based applications provide:
ability to work on any browser or device
simpler programming model
they also provide the following (although these are solvable by many js frameworks):
bookmarkability
browser history
refresh or F5 to repeat action
indexability (in case the application is public and open)
One of the bigger reasons is going to be how searchable your website is.
Doing everything in javascript is going to make it complicated for search engines to crawl all the content of your website, so it may not be fully indexed. There are ways around this (with Google's recent AJAX SEO guidelines) but I'm not sure if all search engines support this yet. On top of that, it's a little bit more complex than just making separate pages.
The bigger issue, whether you decide to build multiple HTML pages, or you decide to use some sort of framework or CMS to generate them for you, is that the different sections of your website have URLs that are unique to them. E.g., an about section would have a URL like mywebsite.com/about, and that URL is used on the actual "about" link within the website.
One of the biggest downfalls of single-page, Ajax-ified websites is complexity. What might otherwise be spread across several pages suddenly finds its way into one huge, master page. Also, it can be difficult to coordinate the state of the page (for example, tracking if you are in Edit mode, or Preview mode, etc.) and adjusting the interface to match.
Also, one master page that is heavy on JS can be a performance drag if it has to load multiple, big JS files.
At the OP's request, I'm going to discuss my experience with JS-only sites. I've written four relevant sites: two JS-heavy (Slide and SpeedDate) and two JS-only (Yazooli and GameCrush). Keep in mind that I'm a JS-only-site bigot, so you're basically reading John Hinckley on the subject of Jodie Foster.
The idea really works. It produces graceful, responsive sites at very low operational costs. My estimate is that the cost for bandwidth, CPU, and such goes to 10% of the cost of running a similar page-based site.
You need fewer but better (or at least, better-trained) programmers. JavaScript is a powerful and elegant language, but it has huge problems that a more rigid and unimaginative language like Java doesn't have. If you have a whole bunch of basically mediocre guys working for you, consider JSP or Ruby instead of JS-only. If you are required to use PHP, just shoot yourself.
You have to keep basic session state in the URL's anchor (the fragment after the #). Users simply expect that the URL represents the state of the site: reload, bookmark, back, forward. jQuery's Address plug-in will do a lot of the work for you.
If SEO is an issue for you, investigate Google Ajax Crawling. Basically, you make a very simple parallel site, just for search engines.
When would I not use JS-only? If I were producing a site that was almost entirely content, where the user did nothing but navigate from one place to another, never interacting with the site in a complicated manner. So, Wikipedia and ... well, that's about it. A big reference site, with a lot of data for the user to read.
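On the point about keeping session state in the URL fragment: without any plug-in, the bare mechanism might look something like the sketch below. The state keys (`mode`, `page`) and the `render` function are invented for illustration, and this is not how the jQuery Address plug-in works internally.

```javascript
// Serialize a flat state object into a fragment such as "#mode=edit&page=3".
// Keys are sorted so the same state always yields the same URL.
function encodeState(state) {
  const params = new URLSearchParams();
  for (const key of Object.keys(state).sort()) {
    params.set(key, String(state[key]));
  }
  return "#" + params.toString();
}

// Parse a fragment back into an object; all values come back as strings.
function decodeState(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const state = {};
  for (const [key, value] of params) state[key] = value;
  return state;
}

// In a browser (render is a hypothetical function that redraws the UI from state):
// location.hash = encodeState({ mode: "edit", page: 3 });
// window.addEventListener("hashchange", () => render(decodeState(location.hash)));
```

Because every state change goes through the hash, reload, bookmark, back and forward all end up in the same `hashchange` handler.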
modularization.
multiple files allow you to more cleanly break out different workflow paths and process parts.
chances are your Business Rules are something that do not usually directly impact your layout rules and multiple files would better help in editing on what needs to be edited without the risk of breaking something unrelated.
I actually just developed my first application using only one page.
..it got messy
My idea was to create an application that mimicked the desktop environment as much as possible. In particular I wanted a detailed view of some app data to be in a popup window that would maintain its state regardless of the section of the application the user was in.
Thus my frankenstein was born.
What ended up happening due to budget/time constraints was the code got out of hand. The various sections of my JavaScript source got muddled together. Maintaining the proper state of various views I had proved to be... difficult.
With proper planning and technique I think the 'one-page' approach is a very easy way to open up some very interesting possibilities (ex: widgets that maintain state across application sections). But it also opens up many... many potential problem areas. including...
Flooding the global namespace (if you don't already have your own... make one)
Code organization can easily get... out of hand
Context - It's very easy to
I'm sure there are more...
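On the namespace point, a minimal version of "make your own" could look like this. `MyApp` and its methods are invented names; the pattern is the usual immediately-invoked function that exposes one object and keeps the rest private.

```javascript
// One global object (hypothetical name MyApp) holds everything the app exposes.
const MyApp = (function () {
  // Private state: not reachable from outside this function.
  let openViews = 0;

  function openView() {
    openViews += 1;
    return openViews;
  }

  function closeView() {
    openViews = Math.max(0, openViews - 1);
    return openViews;
  }

  function viewCount() {
    return openViews;
  }

  // Only these names become part of the public surface.
  return { openView, closeView, viewCount };
})();

// Usage: MyApp.openView(); MyApp.viewCount();
```

Each section of a one-page app can hang its own sub-object off the same namespace, which also helps with the code-organization problem in the list above.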
In short, I would urge you to stay away from relying on JavaScript, for the compatibility issues alone. What I've come to realize is that there is simply no need to rely on JavaScript for everything.
I'm actually in the process of removing JavaScript dependencies in favor of Progressive Enhancement. It just makes more sense. You can achieve the same or similar effects with properly coded JavaScript.
The idea is to...
Develop out well-formatted, fully functional application w/o any JavaScript
Style it
Wrap the whole thing with JavaScript
Using Progressive Enhancement, one can develop an application that delivers the best experience possible for each user.
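The last of those three steps, wrapping a working page with JavaScript, might be sketched as follows. The `data-enhance` attribute and the `loadFragment` function are assumptions made up for the example; the point is only that each link keeps its real href, so it still works when the script never runs.

```javascript
// Enhance ordinary links. Without JavaScript they navigate normally;
// with it, the same URL is loaded in place instead.
// `loadFragment(href)` is a hypothetical function that fetches and swaps in the content.
function enhanceLinks(links, loadFragment) {
  for (const link of Array.from(links)) {
    link.addEventListener("click", (event) => {
      event.preventDefault();   // skip the full page load...
      loadFragment(link.href);  // ...and load the same URL in place
    });
  }
}

// In a browser, only when the needed features exist:
// if (typeof fetch === "function") {
//   enhanceLinks(document.querySelectorAll("a[data-enhance]"), loadFragment);
// }
```

If the feature check fails, or the script is blocked, nothing is attached and the plain links carry on working.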
For some additional arguments, check out The Single Page Interface Manifesto and some (mostly) negative reaction to it on Hacker News (link at the bottom of the SPI page):
The Single Page Interface Manifesto: http://itsnat.sourceforge.net/php/spim/spi_manifesto_en.php
stofac, first of all, thanks for the link to the Single Page Interface (SPI) Manifesto (I'm the author of this boring text).
That said, SPI != doing everything through Javascript.
Take a look at this example (server-centric):
http://www.innowhere.com/insites/
The same in GAE:
http://itsnatsites.appspot.com/
More info about the GAE approach:
http://www.theserverside.com/news/thread.tss?thread_id=60270
In my opinion coding a complex SPI application/web site fully on JavaScript is very very complex and problematic, the best approach in my opinion is "hybrid programming" for SPI, a mix of server-centric for big state management and client-centric (a.k.a JavaScript by hand) for special effects.
Doing everything on a single page using ajax everywhere would break the browser's history/back button functionality and be annoying to the user.
I utterly despise JS-only sites where it is not needed. That extra condition makes all the difference. By way of example consider the oft quoted Google Docs, in this case it not only helps improve experiences it is essential. But some parts of Google Help have been JS-only and yet it adds nothing to the experience, it is only showing static content.
Here are reasons for my upset:
Like many, I am a user of NoScript and love it. Pages load faster, I feel safer and the more distracting adverts are avoided. The last point may seem like a bad thing for webmasters but I don't want anyone to get rewarded for pushing annoying flashy things in my face, if tactless advertisers go out of business I consider it natural selection.
Obviously this means some visitors to your site are either going to be turned away or feel hassled by the need to provide a temporary exclusion. This reduces your audience.
You are duplicating effort. The browser already has a perfectly good history function and you shouldn't need to reinvent the wheel by redrawing the previous page when a back button is clicked. To make matters worse going back a page shouldn't require re-rendering. I guess I am a student of If-it-ain't-broke-don't-fix-it School (from Don't-Repeat-Yourself U.).
There are no HTTP headers when traversing "pages" in JS. This means no cache controls, no expiries, content cannot be adjusted for requested language nor location, no meaningful "page not found" nor "unavailable" responses. You could write error handling routines within your uber-page that respond to failed AJAX fetches but that is more complexity and reinvention, it is redundant.
No caching is a big deal for me, without it proxies cannot work efficiently and caching has the greatest of all load reducing effects. Again, you could mimic some caching in your JS app but that is yet more complexity and redundancy, higher memory usage and poorer user experience overall.
Initial load times are greater. By loading so much Javascript on the first visit you are causing a longer delay.
More JavaScript complexity means more debugging in various browsers. Server-side processing means debugging only once.
Unfuddle (a bug-tracker) left a bad taste. One of my most unpleasant web experiences was being forced to use this service by an employer. On the surface it seems well suited; the JS-heavy section is private so doesn't need to worry about search engines, only repeat visitors will be using it so have time to turn off protections and shouldn't mind the initial JS library load.
But its use of JS is pointless; most content is static. "Pages" were still being fetched (via AJAX) so the delay is the same. With the benefit of AJAX it should be polling in the background to check for changes, but I wouldn't get notified when the visible page had been modified. Sections had different styles so there was an awkward re-rendering when traversing those; loading external stylesheets by Javascript is Bad Practice™. Ease of use was sacrificed for whizz-bang "look at our Web 2.0" features. Such a business-orientated application should concentrate on speed of retrieval, but it ended up slower.
Eventually I had to refuse to use it as it was disrupting the team's work flow. This is not good for client-vendor relationships.
Dynamic pages are harder to save for offline use. Some mobile users like to download in advance and turn off their connection to save power and data usage.
Dynamic pages are harder for screen readers to parse. While the number of blind users is probably lower than the number with NoScript or a mobile connection, it is inexcusable to ignore accessibility, and in some countries even illegal; see the "Disability Discrimination Act" (1995) and "Equality Act" (2010).
As mentioned in other answers the "Progressive Enhancement", née "Unobtrusive Javascript", is the better approach. When I am required to make a JS-only site (remember, I don't object to it on principle and there are times when it is valid) I look forward to implementing the aforementioned AJAX crawling and hope it becomes more standardised in future.

How do you add a JavaScript widget to a Wordpress.com hosted blog?

I've got a site that provides blog-friendly widgets via JavaScript. These work fine in most circumstances, including self-hosted Wordpress blogs. With blogs hosted at Wordpress.com, however, JavaScript isn't allowed in sidebar text modules. Has anyone seen a workaround for this limitation?
you could always petition wp to add your widget to their 'approved' list, but who knows how long that would take. you're talking about a way to circumvent the rules they have in place about posting arbitrary script. myspace javascript exploits in particular have increased awareness of the possibility of such workarounds, so you might have a tough time getting around the restrictions - however, here are some classic ones to try:
put the javascript in a weird place, like anywhere that executes a URL. for instance:
<div style="background:url('javascript:alert(this);');" />
sometimes the word 'javascript' gets cut out, but occasionally you can sneak it through as java\nscript, or something similar.
sometimes quotes get stripped out - try String.fromCharCode(34) to get around that. Also, in general, using eval("codepart1" + "codepart2") to get around restricted words or characters.
sneaking in javascript is a tricky business, mostly utilizing unorthodox (possibly un-documented) browser behavior in order to execute arbitrary javascript on a page. Welcome to hacking.
From the official WordPress.com FAQ:
Javascript can be used for malicious purposes and while what you want to do is okay it does not mean all javascript will be okay.
It goes on to remind the reader that both MySpace and LiveJournal had been affected by malicious Javascript, and that Javascript is therefore not permitted (as it may be exploited by users with poor intentions). They can't risk it with amazingly large sites (think I Can Has Cheezburger, Anderson Cooper 360, Fox, etc.).
If you think you have Javascript that would benefit WordPress.com you can contact them directly.
There is no workaround for it. WordPress.com does not currently support Javascript. Sorry.
Just find a good site about XSS if you really need that JS to work. But if it works for you, it works for anybody, and you would in effect be posting a tutorial on how to do an XSS attack on your page through posts or comments.
reference:
http://ha.ckers.org/xss.html
