Parse host from URL without subdomains etc - JavaScript

I'm working on a Chrome extension that reads the domain from window.location.hostname. For this extension to work properly, I need to be able to resolve subdomains and other URL variations to the same host. Example:
I need all of the following URLs
www.google.com
accounts.google.com
photos.google.se
example.google.co.uk
https://google.com
all of these need to be resolved to, in this case, "google", in a way that is reliable and will work for any website, including ones with quirky subdomain configurations.
This is my current approach, somewhat simplified:
var url = window.location.hostname.split("."); // returns an array of strings
for (var i = 0; i < url.length; i++) {
  if (url[i].match(domainregex)) { // regex for identifying TLDs: ".com", ".se", ".co.uk" etc.
    return url[i - 1]; // what I'm after is usually directly before the TLD, thus i-1
  }
}
This approach is a lot of hassle, and has proven unreliable at times. Is there any more straightforward way of doing this?

A more reliable way to strip the public-suffix part and get the main domain is to use the Public Suffix List, which is what Firefox, Chrome, and other browsers use.
Several JavaScript parsers of the list data are available if you don't want to write your own.
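To illustrate the idea, here is a minimal sketch of suffix-aware parsing. The tiny hardcoded suffix set is purely for demonstration; real code should load the full Public Suffix List (or use a library that does):

```javascript
// Illustration only: a real implementation must use the full
// Public Suffix List instead of this tiny hardcoded set.
const SUFFIXES = new Set(['com', 'se', 'io', 'co.uk']);

function mainLabel(hostname) {
  const parts = hostname.split('.');
  // Check the longest candidate suffix first, so "co.uk" wins over "uk".
  for (let take = 1; take < parts.length; take++) {
    const suffix = parts.slice(take).join('.');
    if (SUFFIXES.has(suffix)) {
      return parts[take - 1]; // the label directly left of the suffix
    }
  }
  return parts[0];
}
```

With this, `mainLabel('example.google.co.uk')` and `mainLabel('www.google.com')` both yield `'google'`.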

I had to do this for my fork of edit-my-cookies, so it would be able to switch cookie profiles per site. (https://github.com/AminaG/swap-my-cookies-multisite/blob/master/js/tools.js)
This is what I did, and it is working for me. I am not sure it is a complete solution, but I am sure it can help.
var remove_sub_domain = function (v) {
  var is_co = v.match(/\.co\./); // two-part suffixes like ".co.uk" need one extra label
  v = v.split('.');
  v = v.slice(is_co ? -3 : -2);
  v = v.join('.');
  console.log(v);
  return v;
};
It works for:
www.google.com
accounts.google.com
photos.google.se
example.google.co.uk
google.com
If you want it to also work for:
http://google.com
you first need to remove the protocol:
var parser = document.createElement('a');
parser.href = url;
var host = parser.host;
var newurl = remove_sub_domain(host);
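As a side note, a sketch of a modern alternative to the `<a>`-element trick is the WHATWG URL API, available in current browsers and in Node:

```javascript
// Parse the URL and extract the host without creating a DOM element.
var host = new URL('https://www.google.com/some/path?q=1').hostname;
console.log(host); // "www.google.com"
```

The resulting host string can then be fed to remove_sub_domain as before.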

Related

How to get exactly url from document.referrer?

I am using React and I am getting the document referrer using
document.referrer
However, when I am directed to my URL from google.com, I am just getting the domain in document.referrer instead of something like this: https://www.google.com/search?q=fb&rlz=1C1CHBF_enIN850IN850&oq=fb&aqs=chrome.0.69i59j46i199i291i433j0i131i433l2j0i395j69i60l3.1007j1j9&sourceid=chrome&ie=UTF-8.
Is there some way I can get the entire URL, including pathname and query string? Thanks in advance.
Websites can set a Referrer-Policy. If they don't, then, as of version 85, Chrome defaults to strict-origin-when-cross-origin. (Other browsers have their own defaults, but Chrome's market share makes it worth highlighting.)
This means that the referrer information will include only the origin and not the full URL.
This is a feature designed to protect the user's privacy.
There is no way around it unless you control the site that controls the link (in your example: www.google.com).
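For completeness, this is what the linking site would have to do: the referrer policy is set on the *linking* page, not the receiving one. A sketch, assuming you controlled that page (which you don't for google.com):

```html
<!-- On the linking page: send the full referrer URL to destinations
     that are not a security downgrade (HTTPS -> HTTP) -->
<meta name="referrer" content="no-referrer-when-downgrade">
```

A `Referrer-Policy` HTTP response header or a `referrerpolicy` attribute on individual links achieves the same thing.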

Make Google reCAPTCHA work with special characters in the domain name

I was setting up an API key for a site with the Swedish character ä in the domain name (http://sälja.io), but it did not initialize the reCAPTCHA.
I then tried an API key for the equivalent URL http://xn--slja-loa.io, which worked when reaching the site from http://xn--slja-loa.io but not from http://sälja.io.
Then I found the secure token, which should work on all domains. It initialized the reCAPTCHAs on all domains and was working on all tested domains, except the one with ä in it.
https://developers.google.com/recaptcha/docs/secure_token
Is there any way to get it working with ä in the domain name as well?
Edit
Since an API key for http://xn--slja-loa.io works from Android when accessing the site from http://sälja.io, it might come down to how the browser interprets the domain. E.g. Firefox interprets http://sälja.io as the domain http://sälja.io and cannot get a response from Google's servers, which will not allow ä in domain names. Android interprets http://sälja.io as http://xn--slja-loa.io and gets a response, since it no longer contains ä. Any thoughts about this? Is there any way to force the browser to interpret http://sälja.io as http://xn--slja-loa.io?
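As an aside, the WHATWG URL parser performs the IDNA (punycode) conversion itself, so script can at least obtain the ASCII form of a Unicode host. A sketch:

```javascript
// The URL parser normalizes an internationalized hostname to its
// punycode (ASCII) form:
var asciiHost = new URL('http://sälja.io/test').hostname;
console.log(asciiHost); // "xn--slja-loa.io"
```

This does not change how the address bar displays the domain, but it can be used when you need the canonical ASCII hostname in code.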
Edit2
Code examples can be reached on sälja.io/test, 178.62.187.163/test and xn--slja-loa.io/test
Edit3
As of today (25.11.2015) it seems not to be possible to use reCAPTCHA with a special character like ä in the domain name. Since aishwat singh has helped the most in coming to this conclusion within the time for the bounty, he will be rewarded; however, an answer will be accepted when a solution to this problem can be provided.
I tried it just now and I was able to generate a key for example-ä.se: 6Ld8VRETAAAAALRXFNxmjEeVzbg2y5vdWv7THwJz
I will post a complete working example shortly.
EDIT 1
Here's the git repo for the code, and the same running on Heroku.
OK, it's not a complete fix, because I used herokuapp.com as the domain, so example-ä.herokuapp.com becomes a subdomain and works.
(However, when I tried earlier it created a key for example-ä.se too, but on loading the page the captcha gives "invalid domain". I figured out that was the original issue; I had thought you were not able to generate a key for this domain.)
I will try for a complete fix.
EDIT 2
By the way, you can specify your domain's IP too; check this discussion.
For me, a Heroku free account doesn't provide the IP of the domain, so it's difficult for me to test.
I was also reading this thread.
I figured out that if I specify the domains list as just com, it is accepted. In your case, just specify se as the domain and it should work; Google doesn't check the exact URL, it just checks the domain.
Hope it helps. I will look into the stoken approach too.
You can read this post:
http://webdesign.tutsplus.com/tutorials/how-to-integrate-no-captcha-recaptcha-in-your-website--cms-23024

webkitgtk custom scheme and XMLHttpRequest

I have a custom scheme ("embed") which serves data from an SQLite db.
While I can do (inside an embed-scheme page)
<img src="embed://any/old/uri/image.gif" />
if I try to do XMLHttpRequests I can only use relative paths, and only ones without double dots, which seem to be stripped.
Is there any way to get resources from anywhere within the same scheme without issues? I noticed the issue (as well as in simple JavaScript tests) with three.js and howl.js. It would be nice to be able to pull resources from anywhere within the same scheme.
I have tried the following
WebKitSettings *wks = webkit_web_view_get_settings (webView);
webkit_settings_set_enable_webgl (wks, TRUE);
webkit_settings_set_enable_webaudio (wks, TRUE);
webkit_settings_set_enable_xss_auditor(wks,FALSE);
webkit_settings_set_enable_hyperlink_auditing(wks, FALSE);
webkit_settings_set_enable_write_console_messages_to_stdout(wks, TRUE);
I also tried
webkit_security_manager_register_uri_scheme_as_local(wksm ,"embed");
but that seems to make things worse! I tried a bunch of other stuff but just seem to be chasing my own tail at the moment!
To be clear I'd be quite happy to turn off all "security" and let any embed:// resource have access to any other embed:// resource regardless of access method.
Interesting one, this!
A URI must contain an authority (domain), and it is the first component after the scheme. So
embed://demo/content/main.js
embed://demo/image/texture.jpg
will work (here main.js is accessing texture.jpg), while
embed://content/main.js
embed://image/texture.jpg
won't work! Accessing texture.jpg from main.js looks to WebKit as if a script on the "content" domain is accessing a resource on the "image" domain.

Block a specific request by its URL in Firefox

How can I prevent Firefox from making a specific request
to a URL, e.g. site.com/ajax/something.php?
I have found a lot of addons, but they couldn't really do the job:
they can block requests to another domain, but not to an absolute URI.
Is there any way to accomplish this?
The solution with addons is to use Adblock Plus → Filter preferences → Add filter, then simply put the URL after ||, e.g.: ||facebook.com/ajax/mercury/change_read_status.php which will prevent any requests to that URL. Programmatically, #Noitidart's link is a perfect solution; I was looking for that too.
Here you go, man: firefox extension: intercepting url it is requesting and blocking conditionally
That uses the observer service. Ideally you would use nsIContentPolicy, which I think is more performant, but I don't have a solution with that to share. The Adblock Plus author is on this forum; he may be able to give us a solution I can spam. :P

Is it the filename or the whole URL used as a key in browser caches?

It's common to want browsers to cache resources - JavaScript, CSS, images, etc. - until there is a new version available, and then to ensure that the browser fetches and caches the new version instead.
One solution is to embed a version number in the resource's filename, but will placing the resources to be managed in this way in a directory with a revision number in it do the same thing? Is the whole URL to the file used as a key in the browser's cache, or is it just the filename itself and some meta-data?
If my code changes from fetching /r20/example.js to /r21/example.js, can I be sure that revision 20 of example.js was cached, but now revision 21 has been fetched instead and it is now cached?
Yes, any change in any part of the URL (excluding HTTP and HTTPS protocols changes) is interpreted as a different resource by the browser (and any intermediary proxies), and will thus result in a separate entity in the browser-cache.
Update:
The claim in this ThinkVitamin article that Opera and Safari/Webkit browsers don't cache URLs with ?query=strings is false.
Adding a version number parameter to a URL is a perfectly acceptable way to do cache-busting.
What may have confused the author of the ThinkVitamin article is the fact that hitting Enter in the address/location bar in Safari and Opera results in different behavior for URLs with query string in them.
However, (and this is the important part!) Opera and Safari behave just like IE and Firefox when it comes to caching embedded/linked images and stylesheets and scripts in web pages - regardless of whether they have "?" characters in their URLs. (This can be verified with a simple test on a normal Apache server.)
(I would have commented on the currently accepted answer if I had the reputation to do it. :-)
The browser cache key is a combination of the request method and the resource URI. The URI consists of scheme, authority, path, query, and fragment.
Relevant excerpt from HTTP 1.1 specification:
The primary cache key consists of the request method and target URI.
However, since HTTP caches in common use today are typically limited
to caching responses to GET, many caches simply decline other methods
and use only the URI as the primary cache key.
Relevant excerpt from URI specification:
The generic URI syntax consists of a hierarchical sequence of
components referred to as the scheme, authority, path, query, and
fragment.
URI       = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
          / path-absolute
          / path-rootless
          / path-empty
I am 99.99999% sure that it is the entire URL that is used to cache resources in a browser, so your URL scheme should work out fine.
The MINIMUM you need to identify an HTTP object is the full path, including any query-string parameters. Some browsers may not cache objects with a query string, but that has nothing to do with the key to the cache.
It is also important to remember that the path is no longer sufficient. The Vary: header in the HTTP response alerts the browser (or proxy server, etc.) to anything OTHER than the URL that should be used to determine the cache key, such as cookies, encoding values, etc.
To your basic question: yes, changing the URL of the .js file is sufficient. To the larger question of what determines the cache key, it's the URL plus the Vary: header restrictions.
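The revision-directory scheme from the question can be sketched as a tiny helper (the function name is hypothetical, just for illustration):

```javascript
// Because the full URL is the cache key, bumping the revision segment
// guarantees the browser treats the result as a brand-new resource.
function versionedUrl(revision, filename) {
  return '/r' + revision + '/' + filename;
}

var oldUrl = versionedUrl(20, 'example.js'); // "/r20/example.js"
var newUrl = versionedUrl(21, 'example.js'); // "/r21/example.js"
```

Requests for newUrl miss the cache entry stored under oldUrl, while the old entry simply ages out.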
Yes. A different path is a different resource from the cache's perspective.
Of course it has to use the whole path: '/r20/example.js' vs '/r21/example.js' could be completely different files to begin with. What you suggest is a viable way to handle version control.
It depends. It is supposed to be the full URL, but some browsers (Opera, Safari 2) apply a different cache strategy for URLs with different parameters.
Your best bet is to change the name of the file.
There is a very clever solution here (uses PHP, Apache)
http://verens.com/archives/2008/04/09/javascript-cache-problem-solved/
Strategy notes:
“According the letter of the HTTP caching specification, user agents should never cache URLs with query strings. While Internet Explorer and Firefox ignore this, Opera and Safari don’t - to make sure all user agents can cache your resources, we need to keep query strings out of their URLs.”
http://www.thinkvitamin.com/features/webapps/serving-javascript-fast
The entire URL. I've seen strange behavior in a few older browsers where case sensitivity came into play.
In addition to the existing answers, I just want to add that this might not apply if you use ServiceWorkers or e.g. offline-plugin. Then you could experience different cache rules depending on how the ServiceWorkers are set up.
In most browsers the full URL is used.
In some browsers, if you have a query in the URL, the document will never be cached.
