Regular expression to determine website root

I have the following URLs, all of which are considered the root of the website. How can I test location.pathname in JavaScript with a regex that matches the pattern below? As you'll notice, the word "site" repeats in every path.
http://www.somehost.tv/sitedev/
http://www.somehost.tv/sitetest/
http://www.somehost.tv/site/
http://www.somehost.tv/sitedev/index.html
http://www.somehost.tv/sitetest/index.html
http://www.somehost.tv/site/index.html
I want to display a jQuery dialog if and only if the user is at the root of the website.

Simply use the DOM to parse this. No need to invoke a regex parser.
var url = 'http://www.somesite.tv/foobar/host/site';
var urlLocation = document.createElement('a');
urlLocation.href = url;
alert(urlLocation.hostname); // alerts 'www.somesite.tv'

A complete pattern, including protocol and domain, could be like this:
/^http:\/\/www\.somehost\.tv\/site(test|dev)?\/(index\.html)?$/
but, if you're matching against location.pathname just try
/^\/site(test|dev)?\/(index\.html)?$/.test(location.pathname)
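As a minimal sketch of how the pathname test above might be wrapped for the dialog check, the helper below takes the pathname as an argument so it can be tried outside a browser. The name isSiteRoot is hypothetical; the pattern is the one from this answer.

```javascript
// Hypothetical helper: true when the pathname is one of the site roots
// (/site/, /sitedev/, /sitetest/, each optionally followed by index.html).
function isSiteRoot(pathname) {
  return /^\/site(test|dev)?\/(index\.html)?$/.test(pathname);
}

console.log(isSiteRoot('/sitedev/'));         // true
console.log(isSiteRoot('/site/index.html'));  // true
console.log(isSiteRoot('/site/about.html'));  // false
```

In the browser the call would be isSiteRoot(location.pathname), and the jQuery dialog would be opened only when it returns true.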

If you do not explicitly need a regular expression for this, you could also fill an array with your URLs, sort a copy of it, and then loop over a decreasing prefix of the first element, checking it against the last element until the last element contains it.
var urls = ["http://www.somehost.tv/sitedev/",
"http://www.somehost.tv/sitetest/",
"http://www.somehost.tv/site/",
"http://www.somehost.tv/sitedev/index.html",
"http://www.somehost.tv/sitetest/index.html",
"http://www.somehost.tv/site/index.html"]
function getRepeatedSub(arr) {
var srt = arr.concat().sort();
var a = srt[0];
var b = srt.pop();
var s = a.length;
while (!~b.indexOf(a.substr(0, s))) {
s--
};
return a.substr(0, s);
}
console.log(getRepeatedSub(urls)); //http://www.somehost.tv/site
Here's an example on JSBin.

Related

How to redirect url based on only one part of the path?

Let me explain what I mean:
I want to redirect from https://example.net/category/83745/my-first-post to https://myredirect.net/my-first-post but without considering /category/numbers/
For the moment I work with this:
if(window.location.pathname == '/category/83745/my-first-post')
{
window.location.href="https://myredirect.net/my-first-post";
}
And it is working fine but as I described I need to remove /category/numbers/ because they could be different and only consider this part /my-first-post for the redirection.
Thanks in advance.

If you just want to ignore the first two parts dynamically and only care about the last part of the URL, do the following:
var stringContains = function (str, partial){
return (str.indexOf(partial) > -1);
};
var url = '/category/83745/my-first-post';
if(stringContains(url, "/category")){
// split the path itself (the original referenced an undefined variable)
var parts = url.split("/");
// redirect to the last segment on the target host from the question
window.location.href = "https://myredirect.net/" + parts[parts.length-1];
}

You can use String's methods lastIndexOf and slice:
var path = window.location.pathname;
// slice from the last slash so the result keeps its leading "/"
window.location.href = "https://myredirect.net" + path.slice(path.lastIndexOf('/'));

Use a regex, something like:
if(window.location.pathname.match(/\/category\/\d+\/my-first-post$/))
{
window.location.href="https://myredirect.net/my-first-post";
}

You can run a regular expression match on the pathname
if(window.location.pathname.match(/my-first-post$/)) {
window.location.href='https://myredirect.net/my-first-post';
}
More on regexes: https://www.regular-expressions.info/
Another good tool for building and testing regexes: https://regex101.com/
Edit:
Here is an example of a regex that follows the more detailed requirements from Chris G:
let pathmatch = window.location.pathname.match(/([^\/]+)$/);
if (pathmatch) {
window.location.href = 'https://myredirect.net/' + pathmatch[1];
}
In this way a regex can capture any part of the path for later use.
If you need to make sure the pathname contains category and/or numbers, that is easily added to the pattern. This one simply disregards everything before the last forward slash (/).
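The stricter variant mentioned above, which insists on the category and number parts, could be sketched as follows. The function name redirectTarget and its targetOrigin parameter are hypothetical; the function only computes the destination, so it can be checked without navigating anywhere.

```javascript
// Hypothetical helper: maps "/category/<digits>/<slug>" to a URL on the
// target host, or returns null when the path has a different shape.
function redirectTarget(pathname, targetOrigin) {
  var match = /^\/category\/\d+\/([^\/]+)\/?$/.exec(pathname);
  return match ? targetOrigin + '/' + match[1] : null;
}

console.log(redirectTarget('/category/83745/my-first-post', 'https://myredirect.net'));
// https://myredirect.net/my-first-post
console.log(redirectTarget('/about', 'https://myredirect.net')); // null
```

In the page itself you would pass window.location.pathname and assign the result to window.location.href only when it is not null.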

parse url to get id using javascript regex

I am writing a router that parses the URL and dispatches to the right component in my code. When the URL changes and carries an object id, I want to parse it with a regular expression and route to the handler that fetches that object by id.
mysite.com/blah#path=folder/?folderId=klafjlka
How do I parse this URL with a JavaScript regex and route it to that folder?
I want to write code that does what the following Backbone route does, but I'm not using Backbone:
routes : { "folder/:id" : "handler" },

I tend to find that using .split normally creates much more readable code in these situations.
If you use window.location.hash to get your data originally, you'll be left with
#path=folder/?folderId=klafjlka
That already drops everything before the hash. The rest can be done with one split, followed by a second split inside a loop.
//Remove the initial hash from the window.location.hash
var hash = window.location.hash.substr(1),
//Split it down so we have ["path=folder","folderId=klafjlka"]
paramSplit = hash.split("/?");
var params = {};
for (var x=0; x<paramSplit.length; x++){
//Split it at the equals
var split = paramSplit[x].split("=");
params[split[0]]=split[1];
}
console.log(params);
params should then contain
{
path: "folder",
folderId: "klafjlka"
}
Which is easy to use for whatever your purposes are.

If your URL is in a string and always has the same structure:
var url = 'mysite.com/blah#path=folder/?folderId=klafjlka';
var re = /#path=(.+?)\?folderId=(.*)/i;
var args = url.match(re);
var path = args[1];
var id = args[2];
This searches for #path= and captures the following characters up to the ?, then searches for ?folderId= and captures everything after it.
Now path will contain folder/ and id will contain klafjlka.
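To connect the parsed pieces back to the "folder/:id" style of routing from the question, here is a minimal sketch. The names parseFolderHash and routes are hypothetical, and the pattern assumes the hash always has exactly the shape shown in the question.

```javascript
// Hypothetical helper: extracts the route name and id from a hash such as
// "#path=folder/?folderId=klafjlka". Returns null if the shape differs.
function parseFolderHash(hash) {
  var match = /^#path=([^\/?]+)\/?\?folderId=([^&#]*)$/.exec(hash);
  if (!match) return null;
  return { path: match[1], folderId: match[2] };
}

// Hypothetical route table, in the spirit of "folder/:id" -> handler.
var routes = {
  folder: function (id) { return 'showing folder ' + id; }
};

var parsed = parseFolderHash('#path=folder/?folderId=klafjlka');
console.log(parsed.path);                          // folder
console.log(routes[parsed.path](parsed.folderId)); // showing folder klafjlka
```

In the browser you would pass window.location.hash and re-run the lookup on the hashchange event.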

Given a URL as a string, how to extract just the domain and extension?

Given a string with URLs in the following formats:
https://www.cnn.com/
http://www.cnn.com/
http://www.cnn.com/2012/02/16/world/american-nicaragua-prison/index.html
http://edition.cnn.com/?hpt=ed_Intl
With JS/jQuery, how can I extract just cnn.com from each of these strings, i.e. the domain name plus its extension?
Thanks

var loc = document.createElement('a');
loc.href = 'http://www.cnn.com/2012/02/16/world/index.html';
window.alert(loc.hostname); // alerts "www.cnn.com"
Credits for the previous method:
Creating a new Location object in javascript

function domain(input){
var matches,
output = "",
// capture the host: word characters, dots and hyphens after "scheme://"
urls = /\w+:\/\/([\w.-]+)/;
matches = urls.exec(input);
if(matches !== null){
output = matches[1];
}
return output;
}

Given that there are top-level domains with dots in them, for example "co.uk", there's no way to do this programmatically unless you include a list of all of the TLDs with dots in them.

var domain = location.host.split('.').slice(-2);
If you want it reassembled:
var domain = location.host.split('.').slice(-2).join('.');
But this won't work with co.uk or similar suffixes. There is no hard-and-fast rule for this, and not even a regex can determine it.
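One way to handle suffixes such as co.uk is to combine the slice(-2) idea with an explicit list, roughly as sketched here. The list MULTI_PART_SUFFIXES and the name baseDomain are assumptions made for illustration, and the list is deliberately tiny and incomplete; a real version would more likely consult the full Public Suffix List.

```javascript
// Assumption: a tiny, incomplete list of multi-part suffixes for illustration.
var MULTI_PART_SUFFIXES = ['co.uk', 'org.uk', 'com.au', 'co.jp'];

// Hypothetical helper: keeps the last two labels of a hostname, or the last
// three when the final two labels form a known multi-part suffix.
function baseDomain(hostname) {
  var labels = hostname.toLowerCase().split('.');
  var lastTwo = labels.slice(-2).join('.');
  var keep = MULTI_PART_SUFFIXES.indexOf(lastTwo) !== -1 ? 3 : 2;
  return labels.slice(-keep).join('.');
}

console.log(baseDomain('www.cnn.com'));     // cnn.com
console.log(baseDomain('edition.cnn.com')); // cnn.com
console.log(baseDomain('www.bbc.co.uk'));   // bbc.co.uk
```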

// something.domain.com -> domain.com
function getDomain() {
// strip only the first label and its dot (the dot must be escaped)
return window.location.hostname.replace(/^[a-z0-9-]+\./i, "");
}

What is the best way to parse a URL with JavaScript? [duplicate]

If there is one thing I just can't get my head around, it's regex.
So after a lot of searching I finally found this one that suits my needs:
function get_domain_name()
{
aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
//aaaa="http://somesite.se/blah/sese";
domain_name_parts = aaaa.match(/:\/\/(.[^/]+)/)[1].split('.');
if(domain_name_parts.length >= 3){
domain_name_parts[0] = '';
}
var domain = domain_name_parts.join('.');
if(domain.indexOf('.') == 0)
alert("1"+ domain.substr(1));
else
alert("2"+ domain);
}
It basically gives me back the domain name. Is there any way I can also get everything after the domain name? In this case that would be /blah/sdgsdgsdgs from the aaaa variable.

EDIT (2020): In modern browsers, you can use the built-in URL Web API.
https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
var url = new URL("http://www.somesite.se/blah/sdgsdgsdgs");
var pathname = url.pathname; // returns /blah/sdgsdgsdgs
Instead of relying on a potentially unreliable* regex, you should instead use the built-in URL parser that the JavaScript DOM API provides:
var url = document.createElement('a');
url.href = "http://www.example.com/some/path?name=value#anchor";
That's all you need to do to parse the URL. Everything else is just accessing the parsed values:
url.protocol; //(http:)
url.hostname; //(www.example.com)
url.pathname; //(/some/path)
url.search; // (?name=value)
url.hash; //(#anchor)
In this case, if you're looking for /blah/sdgsdgsdgs, you'd access it with url.pathname
Basically, you're just creating a link (technically, anchor element) in JavaScript, and then you can make calls to the parsed pieces directly. (Since you're not adding it to the DOM, it doesn't add any invisible links anywhere.) It's accessed in the same way that values on the location object are.
(Inspired by this wonderful answer.)
EDIT: An important note: it appears that Internet Explorer has a bug where it omits the leading slash on the pathname attribute on objects like this. You could normalize it by doing something like:
url.pathname = url.pathname.replace(/(^\/?)/,"/");
Note:
*: I say "potentially unreliable", since it can be tempting to try to build or find an all-encompassing URL parser, but there are many, many conditions, edge cases and forgiving parsing techniques that might not be considered or properly supported; browsers are probably best at implementing (since parsing URLs is critical to their proper operation) this logic, so we should keep it simple and leave it to them.
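For the original question (the host plus everything after it), a minimal sketch using the built-in URL parser might look like this. The name splitHostAndRest is hypothetical, and the sketch assumes an environment where the global URL constructor is available.

```javascript
// Hypothetical helper: splits a URL into its host and everything after it.
// Returns null when the input is not an absolute URL the parser accepts.
function splitHostAndRest(input) {
  var url;
  try {
    url = new URL(input);
  } catch (e) {
    return null;
  }
  return { host: url.hostname, rest: url.pathname + url.search + url.hash };
}

var parts = splitHostAndRest('http://www.somesite.se/blah/sdgsdgsdgs');
console.log(parts.host); // www.somesite.se
console.log(parts.rest); // /blah/sdgsdgsdgs
console.log(splitHostAndRest('not a url')); // null
```

Returning null for unparseable input keeps the error handling in one place, which is one of the things a hand-written regex tends to get wrong.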

RFC 3986 (see Appendix B) provides a regular expression to parse the URI parts:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
where
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
Example:
function parse_url(url) {
var pattern = RegExp("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
var matches = url.match(pattern);
return {
scheme: matches[2],
authority: matches[4],
path: matches[5],
query: matches[7],
fragment: matches[9]
};
}
console.log(parse_url("http://www.somesite.se/blah/sdgsdgsdgs"));
gives
Object
authority: "www.somesite.se"
fragment: undefined
path: "/blah/sdgsdgsdgs"
query: undefined
scheme: "http"
DEMO

Please note that this solution is not the best. I made this just to match the requirements of the OP. I personally would suggest looking into the other answers.
The following regexp gives you back the domain and the rest: :\/\/(.[^\/]+)(.*)
The two captured groups then look like this:
www.google.com
/goosomething
I suggest studying the RegExp documentation here: http://www.regular-expressions.info/reference.html
Using your function:
function get_domain_name()
{
aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
//aaaa="http://somesite.se/blah/sese";
var matches = aaaa.match(/:\/\/(?:www\.)?(.[^/]+)(.*)/);
alert(matches[1]);
alert(matches[2]);
}

You just need to modify your regex a bit. For example:
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
var m = aaaa.match(/^[^:]*:\/\/([^\/]+)(\/.*)$/);
m will then contain the following parts:
["http://www.somesite.se/blah/sdgsdgsdgs", "www.somesite.se", "/blah/sdgsdgsdgs"]
Here is the same example, modified so that it splits out the "www." part. I think the regular expression should be written so that the match works whether or not you have the "www." part. So check this out:
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
var m = aaaa.match(/^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/);
m will then contain the following parts:
["http://www.somesite.se/blah/sdgsdgsdgs", "www.", "somesite.se", "/blah/sdgsdgsdgs"]
Now check out the same regular expression but with a url that does not start with "www.":
var bbbb="http://somesite.se/blah/sdgsdgsdgs";
var m = bbbb.match(/^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/);
Now your match looks like this:
["http://somesite.se/blah/sdgsdgsdgs", undefined, "somesite.se", "/blah/sdgsdgsdgs"]
So as you can see it will do the right thing in both cases.

There is a nice jQuery plugin for parsing URLs: Purl.
All the regex stuff is hidden inside, and you get something like:
> url = $.url("http://markdown.com/awesome/language/markdown.html?show=all#top");
> url.attr('source');
"http://markdown.com/awesome/language/markdown.html?show=all#top"
> url.attr('protocol');
"http"
> url.attr('host');
"markdown.com"
> url.attr('relative');
"/awesome/language/markdown.html?show=all#top"
> url.attr('path');
"/awesome/language/markdown.html"
> url.attr('directory');
"/awesome/language/"
> url.attr('file');
"markdown.html"
> url.attr('query');
"show=all"
> url.attr('fragment');
"top"

Browsers have come a long way since this question was first asked. You can now use the native URL interface to accomplish this:
const url = new URL('http://www.somesite.se/blah/sdgsdgsdgs')
console.log(url.host) // "www.somesite.se"
console.log(url.href) // "http://www.somesite.se/blah/sdgsdgsdgs"
console.log(url.origin) // "http://www.somesite.se"
console.log(url.pathname) // "/blah/sdgsdgsdgs"
console.log(url.protocol) // "http:"
// etc.
Be aware that IE does not support this API. But, you can easily polyfill it with polyfill.io:
<script crossorigin="anonymous" src="https://polyfill.io/v3/polyfill.min.js?flags=gated&features=URL"></script>
