Detect Search Crawlers via JavaScript


How would I go about detecting search crawlers? I ask because I want to suppress certain JavaScript calls if the user agent is a bot.
I have found an example of how to detect a certain browser, but I am unable to find examples of how to detect a search crawler:
/MSIE (\d+\.\d+);/.test(navigator.userAgent); //test for MSIE x.x
Example of search crawlers I want to block:
Google
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Googlebot/2.1 (+http://www.google.com/bot.html)
Baidu
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
BaiDuSpider

This is the regex that the Ruby user-agent library agent_orange uses to test whether a userAgent looks like a bot. You can narrow it down to specific bots by consulting a list of bot user agents:
/bot|crawler|spider|crawling/i
For example, if you have some object, util.browser, you can store what type of device a user is on:
util.browser = {
bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent),
mobile: ...,
desktop: ...
}
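As a minimal sketch of how such a flag could be used to suppress calls, the check can be wrapped in a function that takes the user-agent string as a parameter, which also makes it easy to try against the sample strings from the question. The names isBotUserAgent and runUnlessBot are illustrative and not part of any library.

```javascript
// Generic bot test, using the agent_orange-style pattern quoted above.
const BOT_PATTERN = /bot|crawler|spider|crawling/i;

function isBotUserAgent(userAgent) {
  // Treat a missing user agent as "not a bot" (an assumption; adjust as needed).
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Runs the callback only for user agents that do not look like bots.
function runUnlessBot(userAgent, callback) {
  if (isBotUserAgent(userAgent)) {
    return false;
  }
  callback();
  return true;
}

// Browser usage (navigator only exists in a browser-like environment):
// runUnlessBot(navigator.userAgent, function () { /* analytics call, etc. */ });
```

Passing the string in, rather than reading navigator.userAgent inside the function, lets the same helper run on a server with the User-Agent request header.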

Try this. It's based on the crawler list available at https://github.com/monperrus/crawler-user-agents
var botPattern = "(googlebot\/|bot|Googlebot-Mobile|Googlebot-Image|Google favicon|Mediapartners-Google|bingbot|slurp|java|wget|curl|Commons-HttpClient|Python-urllib|libwww|httpunit|nutch|phpcrawl|msnbot|jyxobot|FAST-WebCrawler|FAST Enterprise Crawler|biglotron|teoma|convera|seekbot|gigablast|exabot|ngbot|ia_archiver|GingerCrawler|webmon |httrack|webcrawler|grub.org|UsineNouvelleCrawler|antibot|netresearchserver|speedy|fluffy|bibnum.bnf|findlink|msrbot|panscient|yacybot|AISearchBot|IOI|ips-agent|tagoobot|MJ12bot|dotbot|woriobot|yanga|buzzbot|mlbot|yandexbot|purebot|Linguee Bot|Voyager|CyberPatrol|voilabot|baiduspider|citeseerxbot|spbot|twengabot|postrank|turnitinbot|scribdbot|page2rss|sitebot|linkdex|Adidxbot|blekkobot|ezooms|dotbot|Mail.RU_Bot|discobot|heritrix|findthatfile|europarchive.org|NerdByNature.Bot|sistrix crawler|ahrefsbot|Aboundex|domaincrawler|wbsearchbot|summify|ccbot|edisterbot|seznambot|ec2linkfinder|gslfbot|aihitbot|intelium_bot|facebookexternalhit|yeti|RetrevoPageAnalyzer|lb-spider|sogou|lssbot|careerbot|wotbox|wocbot|ichiro|DuckDuckBot|lssrocketcrawler|drupact|webcompanycrawler|acoonbot|openindexspider|gnam gnam spider|web-archive-net.com.bot|backlinkcrawler|coccoc|integromedb|content crawler spider|toplistbot|seokicks-robot|it2media-domain-crawler|ip-web-crawler.com|siteexplorer.info|elisabot|proximic|changedetection|blexbot|arabot|WeSEE:Search|niki-bot|CrystalSemanticsBot|rogerbot|360Spider|psbot|InterfaxScanBot|Lipperhey SEO Service|CC Metadata Scaper|g00g1e.net|GrapeshotCrawler|urlappendbot|brainobot|fr-crawler|binlar|SimpleCrawler|Livelapbot|Twitterbot|cXensebot|smtbot|bnf.fr_bot|A6-Indexer|ADmantX|Facebot|Twitterbot|OrangeBot|memorybot|AdvBot|MegaIndex|SemanticScholarBot|ltx71|nerdybot|xovibot|BUbiNG|Qwantify|archive.org_bot|Applebot|TweetmemeBot|crawler4j|findxbot|SemrushBot|yoozBot|lipperhey|y!j-asr|Domain Re-Animator Bot|AddThis)";
var re = new RegExp(botPattern, 'i');
var userAgent = navigator.userAgent;
if (re.test(userAgent)) {
console.log('the user agent is a crawler!');
}
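A long alternation like this is easier to maintain as an array of names. Some entries contain characters that are special in a regular expression (the dot in grub.org, for example), so one option is to escape each name before joining. The following is only a sketch: escapeRegExp and buildBotRegExp are hypothetical helper names, and the list is a short excerpt of the pattern above.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive pattern from a list of literal bot names.
function buildBotRegExp(names) {
  return new RegExp('(' + names.map(escapeRegExp).join('|') + ')', 'i');
}

// A short excerpt of names taken from the pattern above.
const botRegExp = buildBotRegExp(['googlebot/', 'bingbot', 'grub.org', 'y!j-asr', 'baiduspider']);

// Browser usage:
// if (botRegExp.test(navigator.userAgent)) { /* skip optional scripts */ }
```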

The following regex will match the biggest search engines according to this post.
/bot|google|baidu|bing|msn|teoma|slurp|yandex/i
.test(navigator.userAgent)
The matched search engines are:
Baidu
Bingbot/MSN
DuckDuckGo (duckduckbot)
Google
Teoma
Yahoo!
Yandex
Additionally, I've added bot as a catchall for smaller crawlers/bots.

This might help to detect the robots user agents while also keeping things more organized:
Javascript
const detectRobot = (userAgent) => {
const robots = new RegExp([
/bot/,/spider/,/crawl/, // GENERAL TERMS
/APIs-Google/,/AdsBot/,/Googlebot/, // GOOGLE ROBOTS
/mediapartners/,/Google Favicon/,
/FeedFetcher/,/Google-Read-Aloud/,
/DuplexWeb-Google/,/googleweblight/,
/bing/,/yandex/,/baidu/,/duckduck/,/yahoo/, // OTHER ENGINES
/ecosia/,/ia_archiver/,
/facebook/,/instagram/,/pinterest/,/reddit/, // SOCIAL MEDIA
/slack/,/twitter/,/whatsapp/,/youtube/,
/semrush/, // OTHER
].map((r) => r.source).join("|"),"i"); // BUILD REGEXP + "i" FLAG
return robots.test(userAgent);
};
Typescript
const detectRobot = (userAgent: string): boolean => {
const robots = new RegExp(([
/bot/,/spider/,/crawl/, // GENERAL TERMS
/APIs-Google/,/AdsBot/,/Googlebot/, // GOOGLE ROBOTS
/mediapartners/,/Google Favicon/,
/FeedFetcher/,/Google-Read-Aloud/,
/DuplexWeb-Google/,/googleweblight/,
/bing/,/yandex/,/baidu/,/duckduck/,/yahoo/, // OTHER ENGINES
/ecosia/,/ia_archiver/,
/facebook/,/instagram/,/pinterest/,/reddit/, // SOCIAL MEDIA
/slack/,/twitter/,/whatsapp/,/youtube/,
/semrush/, // OTHER
] as RegExp[]).map((r) => r.source).join("|"),"i"); // BUILD REGEXP + "i" FLAG
return robots.test(userAgent);
};
Use on server:
const userAgent = req.get('user-agent');
const isRobot = detectRobot(userAgent);
Use on "client" / some phantom browser a bot might be using:
const userAgent = navigator.userAgent;
const isRobot = detectRobot(userAgent);
Overview of Google crawlers:
https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers

isTrusted property could help you.
The isTrusted read-only property of the Event interface is a Boolean
that is true when the event was generated by a user action, and false
when the event was created or modified by a script or dispatched via
EventTarget.dispatchEvent().
eg:
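function isCrawler(event) {
  // isTrusted is true for real user actions, so an untrusted event points to a script or bot
  return !event.isTrusted;
}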
⚠ Note that IE isn't compatible.
Read more from doc: https://developer.mozilla.org/en-US/docs/Web/API/Event/isTrusted
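One way to apply this, sketched below under the assumption that you only want certain handlers to react to real user input, is to wrap event handlers so that script-generated events are ignored. The helper names isSyntheticEvent and trustedOnly are illustrative. Keep in mind that this only classifies events that actually fire; a crawler that never dispatches any event is not detected this way.

```javascript
// True when the event was created or dispatched by script rather than by a user action.
function isSyntheticEvent(event) {
  return event.isTrusted === false;
}

// Wrap a handler so that it only runs for trusted (user-generated) events.
function trustedOnly(handler) {
  return function (event) {
    if (isSyntheticEvent(event)) {
      return; // ignore events dispatched via dispatchEvent() or element.click()
    }
    handler(event);
  };
}

// Browser usage:
// button.addEventListener('click', trustedOnly(function () { /* ... */ }));
```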

People might like to check out the navigator.webdriver property, which allows bots to inform you that they are bots:
https://developer.mozilla.org/en-US/docs/Web/API/Navigator/webdriver
The webdriver read-only property of the navigator interface indicates whether the user agent is controlled by automation.
It defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example, so that alternate code paths can be triggered during automation.
It is supported by all major browsers and respected by major browser automation software like Puppeteer. Users of automation software can of course disable it, and so it should only be used to detect "good" bots.
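A minimal sketch of such a check follows. The function name isControlledByAutomation is made up for illustration, and the navigator object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Returns true only when the browser explicitly reports WebDriver control.
function isControlledByAutomation(nav) {
  return Boolean(nav) && nav.webdriver === true;
}

// Browser usage, guarded so the snippet also loads outside a browser:
if (typeof navigator !== 'undefined' && isControlledByAutomation(navigator)) {
  console.log('automation detected, skipping optional scripts');
}
```

Comparing against true means a missing property (older browsers) is treated as "not automated", which matches the cooperative nature of the flag.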

I combined some of the above and removed some redundancy. I use this in .htaccess on a semi-private site:
(google|bot|crawl|spider|slurp|baidu|bing|msn|teoma|yandex|java|wget|curl|Commons-HttpClient|Python-urllib|libwww|httpunit|nutch|biglotron|convera|gigablast|archive|webmon|httrack|grub|netresearchserver|speedy|fluffy|bibnum|findlink|panscient|IOI|ips-agent|yanga|Voyager|CyberPatrol|postrank|page2rss|linkdex|ezooms|heritrix|findthatfile|Aboundex|summify|ec2linkfinder|facebook|slack|instagram|pinterest|reddit|twitter|whatsapp|yeti|RetrevoPageAnalyzer|sogou|wotbox|ichiro|drupact|coccoc|integromedb|siteexplorer|proximic|changedetection|WeSEE|scrape|scaper|g00g1e|binlar|indexer|MegaIndex|ltx71|BUbiNG|Qwantify|lipperhey|y!j-asr|AddThis)

The "test for MSIE x.x" example is just code that tests the userAgent against a regular expression. In your example the regular expression is the
/MSIE (\d+\.\d+);/
part. Just replace it with the regular expression you want to test the user agent against. It would be something like
/Googlebot|Baiduspider/i.test(navigator.userAgent)
where the vertical bar is the "or" operator, so the user agent is matched against each of the robots you mentioned, and the i flag makes the match case-insensitive (one of your examples is spelled BaiDuSpider). For more information about regular expressions you can refer to this site; JavaScript uses Perl-style regular expressions.

I found the isbot package, which has a built-in isbot() function. It seems to me that the package is properly maintained and kept up to date.
USAGE:
const isBot = require('isbot');
...
isBot(req.get('user-agent'));
Package: https://www.npmjs.com/package/isbot

Related

How to detect if webkitSpeechRecognition actually works in a browser

Edge claims to support webkitSpeechRecognition, but it doesn't work (there is a discussion here; it also fails on websites meant for testing, like this Mozilla one, with the error "Error occurred in recognition: language-not-supported" despite my US English UI).
How can I detect if webkitSpeechRecognition is actually supported? I tried to filter out Edge by looking at the user agent, but it shows up as Chrome, and I'd prefer to just use feature detection rather than looking at the user agent anyway. I'd like to check this without requesting microphone permission (if I did request microphone permission, I'd have to wait for them to accept, and then see the language-not-supported error). Is there a simple way to check this, similar to just checking the value of window["webkitSpeechRecognition"] (which is defined in Edge, despite not working)?

If you want to check the support for webkitSpeechRecognition then you can refer to the JS code example below.
if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window)
{
console.log("speech recognition API supported");
}
else
{
console.log("speech recognition API not supported")
}
In MS Edge 88.0.705.56 this check reports the API as supported.
However, if you then try to actually use webkitSpeechRecognition, it does not work.
It looks like this feature is currently under development, and to use it you need to enable it with a command-line argument.
I suggest you follow the steps below.
1. Create a shortcut to the Chromium-based Edge browser.
2. Right-click the shortcut file and go to Properties.
3. Under the Shortcut tab, in the Target textbox, add --enable-features=msSpeechRecognition after the msedge.exe path. Make sure there is one space between the path and the command-line argument.
4. Click the OK button to close the Properties window.
5. Launch Edge via the shortcut and visit any sample page for SpeechRecognition. I tested with this example in MS Edge 88.0.705.56.

In addition, it's useful to check for errors, as the source code of the example above (https://www.google.com/intl/en/chrome/demos/speech.html) does; for instance, audio might be disabled if the page is not served over HTTPS.
recognition.onerror = function(event) {
if (event.error == 'no-speech') {
start_img.src = '/intl/en/chrome/assets/common/images/content/mic.gif';
showInfo('info_no_speech');
ignore_onend = true;
}
if (event.error == 'audio-capture') {
start_img.src = '/intl/en/chrome/assets/common/images/content/mic.gif';
showInfo('info_no_microphone');
ignore_onend = true;
}
if (event.error == 'not-allowed') {
if (event.timeStamp - start_timestamp < 100) {
showInfo('info_blocked');
} else {
showInfo('info_denied');
}
ignore_onend = true;
}
};

How to create speech recognition object in JavaScript

I want to create a speech recognition object in JavaScript, but when I write this code:
const btn =document.querySelector(".talk");
const containt=document.querySelector(".containt");
const SpeechRec=window.SpeechRecognition||window.webkitSpeechRecognition;
const recognition= new SpeechRecognition(); //Error in this line
it gives the following error:
Uncaught ReferenceError: SpeechRecognition is not defined at script.js:6

To use speech recognition in an app, you need to specify the following permissions in your manifest:
"permissions": {
"audio-capture" : {
"description" : "Audio capture"
},
"speech-recognition" : {
"description" : "Speech recognition"
}
}
You also need a privileged app, so you need to include this as well:
"type": "privileged"
Are you doing all this?
NB: Taken from...
https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/SpeechRecognition
In other words, you are attempting to call a special function that is part of a Chrome/Mozilla collaboration library for modern browsers (not standard javascript) so you must have a manifest.json file in your project’s directory in which you have to declare all these optional libraries you will be utilizing.

If you're interested in having speech recognition work in more browsers, including Chrome, Firefox, Safari, and Edge, you can use Microsoft's Speech SDK. There are some good samples here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/quickstart/javascript/browser

This has a very good example of speech recognition in JS.
The link above is a guide. The error in your code is that the SpeechRecognition variable is referenced incorrectly: the name you actually declared is "SpeechRec". Please see the corrected code below:
const btn =document.querySelector(".talk");
const containt=document.querySelector(".containt");
const SpeechRec=window.SpeechRecognition||window.webkitSpeechRecognition;
const recognition= new SpeechRec();
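Since browsers without either constructor would still throw on the last line, a slightly more defensive variant is sketched here. createRecognition is a hypothetical helper name, and the window object is passed in as a parameter so the function can be tried with a stand-in object.

```javascript
// Returns a recognition instance, or null when the browser exposes no constructor.
function createRecognition(win) {
  const SpeechRec = win.SpeechRecognition || win.webkitSpeechRecognition;
  if (typeof SpeechRec !== 'function') {
    return null;
  }
  return new SpeechRec();
}

// Browser usage:
// const recognition = createRecognition(window);
// if (recognition === null) { /* hide the ".talk" button or show a message */ }
```

A non-null result only shows that the constructor exists; as the Edge thread above describes, recognition can still fail at runtime, so an onerror handler remains worthwhile.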

crypto.subtle doesn't exist even with webcrypto-shim

I'm making a Cordova app with Backbone, and my goal is to achieve socket authentication with a JSON Web Token (JWT).
To sign my JWT, I used webcrypto-jwt and it worked well when using the app in the browser.
Then I tried the app on my mobile and BBAAMM...
webcrypto-jwt uses the browser's window.crypto.subtle module.
var cryptoSubtle = (window.crypto && crypto.subtle) ||
(window.crypto && crypto.webkitSubtle) ||
(window.msCrypto && window.msCrypto.Subtle);
But there is no subtle on the Android WebView!
So I used webcrypto-shim to add crypto.subtle, but it doesn't work.
Inspecting the window object in my Cordova app shows that it does have a crypto key, but with no subtle in it!
So I can't sign my JWT.

https://github.com/PeculiarVentures/webcrypto-liner will provide you a working webcrypto on most platforms.
I have used it with https://github.com/square/js-jose just fine.
You can test your browsers support for WebCrypto with this - https://peculiarventures.github.io/pv-webcrypto-tests/
Ryan

The Web Cryptography API is not supported on the Android WebView, and webcrypto-shim does not target that component.
The library is intended to fix browsers that have prefixed and buggy WebCrypto API implementations:
Internet Explorer 11, Mobile Internet Explorer 11,
Safari 8+, iOS Safari 8+.
So you are getting the expected behaviour. I think the window.crypto that Cordova is showing is the old implementation.
If you need key storage, I suggest using the Android native keystore (or the iOS equivalent if you build for it). If you are looking for cryptographic functions, include a pure JavaScript library.
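To decide at runtime whether a pure JavaScript fallback is needed, a small detection helper along these lines may be enough. It is a sketch only: getSubtle is a made-up name, and the window object is passed in so the function can be tested with plain objects.

```javascript
// Returns the SubtleCrypto object if the environment exposes one, otherwise null.
function getSubtle(win) {
  const cryptoObj = win.crypto || win.msCrypto; // msCrypto: prefixed IE 11 implementation
  if (!cryptoObj) {
    return null;
  }
  return cryptoObj.subtle || cryptoObj.webkitSubtle || null;
}

// Browser / WebView usage:
// if (getSubtle(window) === null) { /* load a pure JavaScript crypto library instead */ }
```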

After more research and tests I have found a pure JS library that works on Cordova:
jsrsasign
I used it to sign my JWT for authentication. It doesn't use the crypto.subtle module.
// Header
var oHeader = { alg: 'HS256', typ: 'JWT' };
// Payload
var oPayload = {};
var tNow = KJUR.jws.IntDate.get('now');
var tEnd = KJUR.jws.IntDate.get('now + 1day');
oPayload.iss = "http://foobar.com";
oPayload.sub = "mailto:someone@hello.com";
oPayload.iat = tNow;
oPayload.exp = tEnd;
oPayload.jti = "id123";
oPayload.aud = "http://someUrl";
oPayload.email = "userEmail";
oPayload.pwd = "userPassword";
oPayload.deviceId = "deviceId";
// Sign JWT.
var sHeader = JSON.stringify(oHeader);
var sPayload = JSON.stringify(oPayload);
//secret -> your secret that the server gave you.
var sJWT = KJUR.jws.JWS.sign("HS256", sHeader, sPayload, secret);
console.log(sJWT);
So that's it. It solved my problem.
I know that the undefined crypto.subtle error still exists; I did not find a solution to that problem.
I suppose that one day the developers in charge of Cordova will make the effort to support the crypto module that we can find in all other browsers, but for now the only solution is to use third-party libraries.
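To make clear what an HS256 sign call produces, here is a sketch written against Node's built-in crypto module rather than jsrsasign. It only illustrates the token format (base64url-encoded header and payload plus an HMAC-SHA256 signature, joined by dots) and assumes a Node environment, so it is not meant to run inside the Cordova WebView. signHs256, base64url and the example secret are hypothetical names and values.

```javascript
const nodeCrypto = require('crypto');

// base64url: standard base64 with "+" -> "-", "/" -> "_" and padding removed.
function base64url(input) {
  return Buffer.from(input)
    .toString('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// Builds "<header>.<payload>.<signature>" for the HS256 algorithm.
function signHs256(payload, secret) {
  const header = { alg: 'HS256', typ: 'JWT' };
  const signingInput = base64url(JSON.stringify(header)) + '.' + base64url(JSON.stringify(payload));
  const signature = nodeCrypto.createHmac('sha256', secret).update(signingInput).digest();
  return signingInput + '.' + base64url(signature);
}

const token = signHs256({ iss: 'http://foobar.com', jti: 'id123' }, 'example-secret');
console.log(token.split('.').length); // 3
```

The server verifies the token by recomputing the signature over the first two parts with the shared secret and comparing the result.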

Automatic Language Selection without click on flag [duplicate]

I have been trying to detect the browser language preference using JavaScript.
If I set the browser language in IE in Tools>Internet Options>General>Languages, how do I read this value using JavaScript?
Same problem for Firefox. I'm not able to detect the setting in Tools>Options>Content>Languages using navigator.language.
navigator.userLanguage detects the setting made through
Start>ControlPanel>RegionalandLanguageOptions>Regional Options tab.
I have tested navigator.browserLanguage and navigator.systemLanguage, but neither returns the value of the first setting (Tools>InternetOptions>General>Languages).
I found a link which discusses this in detail, but the question remains unanswered :(

I think the main problem here is that the browser settings don't actually affect the navigator.language property that is obtained via javascript.
What they do affect is the HTTP 'Accept-Language' header, but it appears this value is not available through JavaScript at all. (Probably why @anddoutoi states he can't find a reference for it that doesn't involve the server side.)
I have coded a workaround: I've knocked up a google app engine script at http://ajaxhttpheaders.appspot.com that will return you the HTTP request headers via JSONP.
(Note: this is a hack only to be used if you do not have a back end available that can do this for you. In general you should not be making calls to third party hosted javascript files in your pages unless you have a very high level of trust in the host.)
I intend to leave it there in perpetuity so feel free to use it in your code.
Here's some example code (in jQuery) for how you might use it
$.ajax({
url: "http://ajaxhttpheaders.appspot.com",
dataType: 'jsonp',
success: function(headers) {
var language = headers['Accept-Language'];
nowDoSomethingWithIt(language);
}
});
Hope someone finds this useful.
Edit: I have written a small jQuery plugin on github that wraps this functionality: https://github.com/dansingerman/jQuery-Browser-Language
Edit 2: As requested here is the code that is running on AppEngine (super trivial really):
# Imports for the legacy App Engine (Python 2) webapp framework
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app


class MainPage(webapp.RequestHandler):
    def get(self):
        headers = self.request.headers
        callback = self.request.get('callback')
        if callback:
            # JSONP: wrap the headers in the requested callback
            self.response.headers['Content-Type'] = 'application/javascript'
            self.response.out.write(callback + "(")
            self.response.out.write(headers)
            self.response.out.write(")")
        else:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write("I need a callback=")


application = webapp.WSGIApplication(
    [('/', MainPage)],
    debug=False)


def main():
    run_wsgi_app(application)


if __name__ == "__main__":
    main()
Edit3: Have open sourced the app engine code here: https://github.com/dansingerman/app-engine-headers

var language = window.navigator.userLanguage || window.navigator.language;
alert(language); //works IE/SAFARI/CHROME/FF
window.navigator.userLanguage is IE only, and it is the language set in Windows Control Panel - Regional Options, NOT the browser language. Still, you could assume that a user on a machine whose Windows regional settings are set to France is probably a French user.
navigator.language is for Firefox and all other browsers.
Some language codes: 'it' = Italian, 'en-US' = English (US), etc.
As pointed out by rcoup and The WebMacheter in the comments below, this workaround won't let you distinguish between English dialects when users view the website in browsers other than IE.
window.navigator.language (Chrome/FF/Safari) always returns the browser language and not the browser's preferred language, but: "it's pretty common for English speakers (gb, au, nz, etc) to have an en-us version of Firefox/Chrome/Safari." Hence window.navigator.language will still return en-US even if the user's preferred language is en-GB.

Update (2014):
There is now a way to get the Accept-Language list in Firefox and Chrome using navigator.languages (works in Chrome >= 32 and Firefox >= 32).
Also, navigator.language in recent versions of Firefox reflects the most preferred content language, not the UI language. But since other browsers do not yet support this notion, it is not very useful.
So, to get the most preferred content language when possible, and use the UI language as a fallback:
navigator.languages
? navigator.languages[0]
: (navigator.language || navigator.userLanguage)
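Once the preference list is available, a site typically still has to map it onto the languages it actually offers. One possible approach is sketched below: try each preferred tag for an exact match, then fall back to its primary subtag. The function names and the example language lists are assumptions made for illustration.

```javascript
// Returns the first entry of `supported` that matches the user's preferences,
// trying an exact match first and then the primary subtag ("en-GB" -> "en").
function pickSupportedLanguage(preferred, supported, fallback) {
  const available = supported.map(function (tag) { return tag.toLowerCase(); });
  for (const tag of preferred) {
    const normalized = String(tag).toLowerCase();
    const exact = available.indexOf(normalized);
    if (exact !== -1) {
      return supported[exact];
    }
    const primary = available.indexOf(normalized.split('-')[0]);
    if (primary !== -1) {
      return supported[primary];
    }
  }
  return fallback;
}

// Collects the preference list from a navigator-like object.
function preferredLanguages(nav) {
  if (Array.isArray(nav.languages) && nav.languages.length > 0) {
    return nav.languages.slice();
  }
  const single = nav.language || nav.userLanguage;
  return single ? [single] : [];
}

// Browser usage:
// const lang = pickSupportedLanguage(preferredLanguages(navigator), ['en', 'fr'], 'en');
```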

I came across this piece of code for detecting the browser's language in the Angular Translate module; you can find the source here. I modified it slightly, replacing angular.isArray with Array.isArray to make it independent of the Angular library.
var getFirstBrowserLanguage = function () {
var nav = window.navigator,
browserLanguagePropertyKeys = ['language', 'browserLanguage', 'systemLanguage', 'userLanguage'],
i,
language;
// support for HTML 5.1 "navigator.languages"
if (Array.isArray(nav.languages)) {
for (i = 0; i < nav.languages.length; i++) {
language = nav.languages[i];
if (language && language.length) {
return language;
}
}
}
// support for other well known properties in browsers
for (i = 0; i < browserLanguagePropertyKeys.length; i++) {
language = nav[browserLanguagePropertyKeys[i]];
if (language && language.length) {
return language;
}
}
return null;
};
console.log(getFirstBrowserLanguage());

let lang = window.navigator.languages ? window.navigator.languages[0] : null;
lang = lang || window.navigator.language || window.navigator.browserLanguage || window.navigator.userLanguage;
let shortLang = lang;
if (shortLang.indexOf('-') !== -1)
shortLang = shortLang.split('-')[0];
if (shortLang.indexOf('_') !== -1)
shortLang = shortLang.split('_')[0];
console.log(lang, shortLang);
I only needed the primary component for my needs, but you can easily just use the full string. Works with latest Chrome, Firefox, Safari and IE10+.

var language = navigator.languages && navigator.languages[0] || // Chrome / Firefox
navigator.language || // All browsers
navigator.userLanguage; // IE <= 10
console.log(language);
https://developer.mozilla.org/en-US/docs/Web/API/NavigatorLanguage/languages
https://developer.mozilla.org/en-US/docs/Web/API/NavigatorLanguage/language
Try PWA Template https://github.com/StartPolymer/progressive-web-app-template

navigator.userLanguage for IE
window.navigator.language for firefox/opera/safari

I've been using Hamid's answer for a while, but in cases where the languages array is like ["en", "en-GB", "en-US", "fr-FR", "fr", "en-ZA"] it will return "en", when "en-GB" would be a better match.
My update (below) will return the first long format code e.g. "en-GB", otherwise it will return the first short code e.g. "en", otherwise it will return null.
function getFirstBrowserLanguage() {
var nav = window.navigator,
browserLanguagePropertyKeys = ['language', 'browserLanguage', 'systemLanguage', 'userLanguage'],
i,
language,
len,
shortLanguage = null;
// support for HTML 5.1 "navigator.languages"
if (Array.isArray(nav.languages)) {
for (i = 0; i < nav.languages.length; i++) {
language = nav.languages[i];
len = language.length;
if (!shortLanguage && len) {
shortLanguage = language;
}
if (language && len>2) {
return language;
}
}
}
// support for other well known properties in browsers
for (i = 0; i < browserLanguagePropertyKeys.length; i++) {
language = nav[browserLanguagePropertyKeys[i]];
//skip this loop iteration if property is null/undefined. IE11 fix.
if (language == null) { continue; }
len = language.length;
if (!shortLanguage && len) {
shortLanguage = language;
}
if (language && len > 2) {
return language;
}
}
return shortLanguage;
}
console.log(getFirstBrowserLanguage());
Update: IE11 was erroring when some properties were undefined. Added a check to skip those properties.

I've just come up with this. It combines newer JS destructuring syntax with a few standard operations to retrieve the language and locale.
var [lang, locale] = (
(
(
navigator.userLanguage || navigator.language
).replace(
'-', '_'
)
).toLowerCase()
).split('_');
Hope it helps someone

There is no decent way to get that setting, at least not something browser independent.
But the server has that info, because it is part of the HTTP request header (the Accept-Language field, see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4)
So the only reliable way is to get an answer back from the server. You will need something that runs on the server (like .asp, .jsp, .php, CGI) and that "thing" can return that info.
Good examples here: http://www.developershome.com/wap/detection/detection.asp?page=readHeader
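Whatever the server technology, the header value itself is a comma-separated list of language tags with optional q weights. As a sketch of what the server-side code has to do with it, the following function orders the tags by weight; parseAcceptLanguage is an illustrative name, and the input is assumed to be the raw header string.

```javascript
// Parses an Accept-Language header into language tags ordered by preference.
function parseAcceptLanguage(header) {
  if (!header) {
    return [];
  }
  return header
    .split(',')
    .map(function (part) {
      const pieces = part.trim().split(';');
      const qParam = pieces.slice(1)
        .map(function (p) { return p.trim(); })
        .find(function (p) { return p.toLowerCase().indexOf('q=') === 0; });
      const q = qParam ? parseFloat(qParam.slice(2)) : 1; // q defaults to 1
      return { tag: pieces[0].trim(), q: isNaN(q) ? 0 : q };
    })
    .filter(function (entry) { return entry.tag !== '' && entry.q > 0; })
    .sort(function (a, b) { return b.q - a.q; }) // stable sort: ties keep header order
    .map(function (entry) { return entry.tag; });
}

// Node usage: parseAcceptLanguage(req.headers['accept-language'])
```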

I can't find a single reference that states that it's possible without involving the server side.
MSDN on:
navigator.browserLanguage
navigator.systemLanguage
navigator.userLanguage
From browserLanguage:
In Microsoft Internet Explorer 4.0 and earlier, the browserLanguage property reflects the language of the installed browser's user interface. For example, if you install a Japanese version of Windows Internet Explorer on an English operating system, browserLanguage would be ja.
In Internet Explorer 5 and later, however, the browserLanguage property reflects the language of the operating system regardless of the installed language version of Internet Explorer. However, if Microsoft Windows 2000 MultiLanguage version is installed, the browserLanguage property indicates the language set in the operating system's current menus and dialogs, as found in the Regional Options of the Control Panel. For example, if you install a Japanese version of Internet Explorer 5 on an English (United Kingdom) operating system, browserLanguage would be en-gb. If you install Windows 2000 MultiLanguage version and set the language of the menus and dialogs to French, browserLanguage would be fr, even though you have a Japanese version of Internet Explorer.
Note This property does not indicate the language or languages set by the user in Language Preferences, located in the Internet Options dialog box.
Furthermore, it looks like browserLanguage is deprecated, because IE8 doesn't list it.

I had the same problem, and I wrote the following front-end only library that wraps up the code for multiple browsers. It's not much code, but nice to not have to copy and paste the same code across multiple websites.
Get it: acceptedlanguages.js
Use it:
<script src="acceptedlanguages.js"></script>
<script type="text/javascript">
console.log('Accepted Languages: ' + acceptedlanguages.accepted);
</script>
It always returns an array, ordered by the user's preference. In Safari and IE the array always has a single element. In FF and Chrome it may contain more than one language.

I would like to share my code, because it works and it is different from the other answers given.
In this example, if your browser is configured for French (France, Belgium or another French locale) you are redirected to the French page, otherwise to the English page:
<script type="text/javascript">
$(document).ready(function () {
var userLang = navigator.language || navigator.userLanguage;
if (userLang.startsWith("fr")) {
window.location.href = '../fr/index.html';
}
else {
window.location.href = '../en/index.html';
}
});
</script>

If you only need to support certain modern browsers then you can now use:
navigator.languages
which returns an array of the user's language preferences in the order specified by the user.
As of now (Sep 2014) this works on:
Chrome (v37),
Firefox (v32) and
Opera (v24)
But not on:
IE (v11)

Javascript way:
var language = window.navigator.userLanguage || window.navigator.language;//returns value like 'en-us'
If you are using jQuery.i18n plugin, you can use:
jQuery.i18n.browserLang();//returns value like '"en-US"'

If you are developing a Chrome App / Extension use the chrome.i18n API.
chrome.i18n.getAcceptLanguages(function(languages) {
console.log(languages);
// ["en-AU", "en", "en-US"]
});

DanSingerman has a very good solution for this question.
The only reliable source for the language is the HTTP request header.
So you need a server-side script that sends the request headers, or at least the Accept-Language field, back to you.
Here is a very simple Node.js server which should be compatible with DanSingerman's jQuery plugin.
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end(JSON.stringify(req.headers));
}).listen(80,'0.0.0.0');
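A JSONP client expects the response to be wrapped in the function named by its callback query parameter, which the server above does not do. A possible variant is sketched here; handleHeadersRequest is a hypothetical name, and restricting the callback to a plain identifier is a precaution added in this sketch, not something taken from the original service.

```javascript
// Request handler that returns the request headers, wrapped in a JSONP callback
// when a valid ?callback= name is supplied.
function handleHeadersRequest(req, res) {
  const url = new URL(req.url, 'http://localhost'); // base is only needed for parsing
  const callback = url.searchParams.get('callback');
  const body = JSON.stringify(req.headers);

  // Only accept simple identifiers so the callback cannot inject script.
  if (callback && /^[A-Za-z_$][\w$]*$/.test(callback)) {
    res.writeHead(200, { 'Content-Type': 'application/javascript' });
    res.end(callback + '(' + body + ');');
  } else {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(body);
  }
}

// Usage: require('http').createServer(handleHeadersRequest).listen(8080);
```

Node reports header names in lower case, so with this server the client would read headers['accept-language'] rather than headers['Accept-Language'].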

For what it's worth, Wikimedia's Universal Language Selector library has hooks for doing this:
https://www.mediawiki.org/wiki/Extension:UniversalLanguageSelector
See the function getFrequentLanguageList in resources/js/ext.uls.init.js . Direct link:
https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/UniversalLanguageSelector.git;a=blob;f=resources/js/ext.uls.init.js;hb=HEAD
It still depends on the server, or more specifically, the MediaWiki API. The reason I'm showing it is that it may provide a good example of getting all the useful information about the user's language: browser language, Accept-Language, geolocation (with getting country/language info from the CLDR), and of course, user's own site preferences.

Dan Singerman's answer has the issue that the fetched header has to be used right away, due to the asynchronous nature of jQuery's ajax. However, using his Google App Engine server, I wrote the following, so that the header is set as part of the initial setup and can be used at a later time.
<html>
<head>
<script>
var bLocale='raw'; // can be used at any other place
function processHeaders(headers){
bLocale=headers['Accept-Language'];
comma=bLocale.indexOf(',');
if(comma>0) bLocale=bLocale.substring(0, comma);
}
</script>
<script src="jquery-1.11.0.js"></script>
<script type="application/javascript" src="http://ajaxhttpheaders.appspot.com?callback=processHeaders"></script>
</head>
<body>
<h1 id="bLocale">Should be the browser locale here</h1>
</body>
<script>
$("#bLocale").text(bLocale);
</script>
</html>

If you don't want to rely on an external server and you have one of your own, you can use a simple PHP script to achieve the same behavior as @DanSingerman's answer.
languageDetector.php:
<?php
$lang = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2);
echo json_encode($lang);
?>
And just change these lines in the jQuery script:
url: "languageDetector.php",
dataType: 'json',
success: function(language) {
nowDoSomethingWithIt(language);
}

If you have control of a backend and are using django, a 4 line implementation of Dan's idea is:
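from django.conf import settings
from django.http import JsonResponse


def get_browser_lang(request):
    # request.META is a dict; dict.has_key() no longer exists in Python 3
    if 'HTTP_ACCEPT_LANGUAGE' in request.META:
        return JsonResponse({'response': request.META['HTTP_ACCEPT_LANGUAGE']})
    else:
        return JsonResponse({'response': settings.DEFAULT_LANG})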
then in urls.py:
url(r'^browserlang/$', views.get_browser_lang, name='get_browser_lang'),
and on the front end:
$.get(lg('SERVER') + 'browserlang/', function(data){
var lang_code = data.response.split(',')[0].split(';')[0].split('-')[0];
});
(you have to set DEFAULT_LANG in settings.py of course)

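Based on the answer to "Accessing the web page's HTTP Headers in JavaScript", I built the following script. Note that it reads the Content-Language header of the server's response, so it only reflects the browser language if the server sets that header based on the request:
function getContentLanguage() {
  // Synchronous request for the current page, made only to read its response headers
  var req = new XMLHttpRequest();
  req.open('GET', document.location, false);
  req.send(null);
  var headers = req.getAllResponseHeaders().toLowerCase();
  var contentLanguage = headers.match(/^content-language:(.*)$/gm);
  // match() returns null when the header is absent
  if (contentLanguage && contentLanguage[0]) {
    return contentLanguage[0].split(":")[1].trim().toUpperCase();
  }
  return null;
}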

If you are using ASP.NET MVC and you want to get the Accept-Language header from JavaScript, here is a workaround that does not involve any asynchronous requests.
In your .cshtml file, store the header securely in a div's data- attribute:
<div data-languages="@Json.Encode(HttpContext.Current.Request.UserLanguages)"></div>
Then your JavaScript code can access the info, e.g. using JQuery:
<script type="text/javascript">
$('[data-languages]').each(function () {
var languages = $(this).data("languages");
for (var i = 0; i < languages.length; i++) {
var regex = /[-;]/;
console.log(languages[i].split(regex)[0]);
}
});
</script>
Of course you can use a similar approach with other server technologies as others have mentioned.

For those looking for a Java server solution, here is a RESTEasy example:
@GET
@Path("/preference-language")
@Consumes({"application/json", "application/xml"})
@Produces({"application/json", "application/xml"})
public Response getUserLanguagePreference(@Context HttpHeaders headers) {
    return Response.status(200)
        .entity(headers.getAcceptableLanguages().get(0))
        .build();
}

I had a different approach that might help someone in the future.
The customer wanted a page where you can swap languages.
I needed to format numbers according to that setting (not the browser setting and not any predefined setting),
so I set an initial value from the config settings (i18n):
$clang = $this->Session->read('Config.language');
echo "<script type='text/javascript'>var clang = '$clang'</script>";
Later in the script I used a function to determine which number formatting I need:
function getLangsettings(){
if(typeof clang === 'undefined') clang = navigator.language;
//console.log(clang);
switch(clang){
case 'de':
case 'de-de':
return {precision : 2, thousand : ".", decimal : ","}
case 'en':
case 'en-gb':
default:
return {precision : 2, thousand : ",", decimal : "."}
}
}
So I used the language set for the page and fell back to the browser setting,
which should be helpful for testing purposes as well.
Depending on your customers, you might not need these settings.
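The settings object returned by getLangsettings() still has to be applied to a number somewhere. A minimal sketch of such a formatter follows. The name formatNumber is hypothetical, and the code assumes settings objects of the shape shown above (precision, thousand, decimal).

```javascript
// Hypothetical formatter driven by a {precision, thousand, decimal} object.
function formatNumber(value, settings) {
  // Round to the requested precision; toFixed always uses "." as separator.
  const fixed = Math.abs(value).toFixed(settings.precision);
  const parts = fixed.split('.');
  // Insert the thousands separator between every group of three digits.
  const grouped = parts[0].replace(/\B(?=(\d{3})+(?!\d))/g, settings.thousand);
  const sign = value < 0 ? '-' : '';
  return sign + grouped + (parts.length > 1 ? settings.decimal + parts[1] : '');
}

// Example settings, copied from the two branches of getLangsettings().
const germanSettings = { precision: 2, thousand: '.', decimal: ',' };
const englishSettings = { precision: 2, thousand: ',', decimal: '.' };
```

In browsers that support it, Intl.NumberFormat with the page language as locale would be an alternative to maintaining these separators by hand.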

I have a hack that I think uses very little code and is quite reliable.
Put your site's files in a subdirectory. SSH into your server and create symlinks to that subdirectory, one per language, named after the language codes.
Something like this:
ln -s /var/www/yourhtml /var/www/en
ln -s /var/www/yourhtml /var/www/sp
ln -s /var/www/yourhtml /var/www/it
Use your web server to read HTTP_ACCEPT_LANGUAGE and redirect to these "different subdirectories" according to the language value it provides.
Now you can use JavaScript's window.location.href to get your URL and use it in conditionals to reliably identify the preferred language.
var url_string = window.location.href;
// Compare with ===; a single = would assign and always evaluate to true.
if (url_string === "http://yoursite.com/it/index.html") {
    document.getElementById("page-wrapper").className = "italian";
}
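Comparing against the full URL only matches one page per language. A sketch that reads the language from the first path segment instead is shown below. The function name languageFromUrl and the supported list are assumptions for illustration; the codes mirror the symlink names used above.

```javascript
// Hypothetical helper: take the first path segment of a URL and return it
// if it is one of the supported language directories, otherwise null.
function languageFromUrl(href, supported) {
  const firstSegment = new URL(href).pathname.split('/')[1] || '';
  const code = firstSegment.toLowerCase();
  return supported.indexOf(code) !== -1 ? code : null;
}
```

In the page itself this could be called as languageFromUrl(window.location.href, ['en', 'sp', 'it']) and the result used to pick the class name for page-wrapper.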

Identify tab that made request in Firefox Addon SDK

I'm using the Firefox Addon SDK to build something that monitors and displays the HTTP traffic in the browser, similar to HTTPFox or Live HTTP Headers. I am interested in identifying which tab in the browser (if any) generated the request.
Using the observer-service I am monitoring for "http-on-examine-response" events. I have code like the following to identify the nsIDOMWindow that generated the request:
const observer = require("observer-service"),
{Ci} = require("chrome");
function getTabFromChannel(channel) {
try {
var noteCB= channel.notificationCallbacks ? channel.notificationCallbacks : channel.loadGroup.notificationCallbacks;
if (!noteCB) { return null; }
var domWin = noteCB.getInterface(Ci.nsIDOMWindow);
return domWin.top;
} catch (e) {
dump(e + "\n");
return null;
}
}
function logHTTPTraffic(sub, data) {
    sub.QueryInterface(Ci.nsIHttpChannel);
    var tab = getTabFromChannel(sub);
    console.log(tab);
}
observer.add("http-on-examine-response", logHTTPTraffic);
Mostly cribbed from the documentation for how to identify the browser that generated the request. Some is also taken from the Google PageSpeed Firefox addon.
Is there a recommended or preferred way to go from the nsIDOMWindow object domWin to a tab element in the SDK tabs module?
I've considered something hacky like scanning the tabs list for one with a URL that matches the URL for domWin, but then I have to worry about multiple tabs having the same URL.

You have to keep using the internal packages. From what I can tell, the getTabForWindow() function in the api-utils/lib/tabs/tab.js module does exactly what you want. Untested code:
var tabsLib = require("sdk/tabs/tab.js");
return tabsLib.getTabForWindow(domWin.top);

The API has changed since this was originally asked/answered...
It should now (as of 1.15) be:
return require("sdk/tabs/utils").getTabForWindow(domWin.top);

As of Addon SDK version 1.13 change:
var tabsLib = require("tabs/tab.js");
to
var tabsLib = require("sdk/tabs/helpers.js");

If anyone still cares about this:
Although the Addon SDK is being deprecated in favor of the newer WebExtensions API, I want to point out that
var a_tab = require("sdk/tabs/utils").getTabForContentWindow(window)
returns a different 'tab' object than the one you would typically get by using
worker.tab in a PageMod.
For example, a_tab will not have the 'id' attribute, but will have a linkedPanel property that is similar to the 'id' attribute.

Detect Search Crawlers via JavaScript - javascript

I found the isbot package, which has a built-in isbot() function. It seems to me that the package is properly maintained and kept up to date.
Usage:
const isBot = require('isbot');
...
isBot(req.get('user-agent'));
Package: https://www.npmjs.com/package/isbot
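If adding a dependency is not an option, a much cruder check can be built from the generic pattern quoted earlier in this thread. The sketch below does not use the isbot package and will miss crawlers whose user agent contains none of these words; the name looksLikeBot is hypothetical.

```javascript
// Generic pattern from the earlier answer; matches most self-identifying crawlers.
const GENERIC_BOT_PATTERN = /bot|crawler|spider|crawling/i;

// Hypothetical helper: true when the user agent string looks like a crawler.
function looksLikeBot(userAgent) {
  return typeof userAgent === 'string' && GENERIC_BOT_PATTERN.test(userAgent);
}
```

In the browser it would be called with navigator.userAgent, on a Node server with the request's user-agent header.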
