Domain/host name inside regex Javascript - javascript

I want to create a regex that parses only links that belong to the current domain. Is there something like {hostname} that I can use inside a JavaScript regex object, so that the search is restricted to links on the current page's domain?
For example: if the domain is www.domain.com, it should match only links that start with that domain. If the domain is anotherdomain.com, it should match only links that start with that domain. My regex is more complex, but I would like to be able to put in some kind of global variable that gets replaced with the current domain.

You can retrieve the hostname of the current page via the Location object:
var hostname = window.location.hostname;
You can then compose your regex by concatenating that variable into the pattern string:
var re = new RegExp("<start of your regex>" + hostname + "<end of your regex>");

You can get the hostname from window.location.hostname.
Then, you will want to escape potential special characters in a hostname.
A hostname will usually include . which is a special character in regular expressions, and it can certainly also contain - which can be a special character.
Probably best to engage in defensive programming and escape everything that might be special characters in regular expressions even though a lot of them shouldn't actually ever appear. From Escape string for use in Javascript regex you can use this function:
function escapeRegExp(str) {
return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
And then you can do this:
var regexpFragment = escapeRegExp(window.location.hostname);
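Putting the two steps together, here is a minimal sketch of how the escaped hostname might be spliced into a larger pattern. The function name buildSameHostPattern and the surrounding pattern (scheme prefix plus a lookahead that ends the host) are assumptions for illustration, not part of the answers above; the escapeRegExp function is the one quoted above.

```javascript
// Escape function quoted in the answer above.
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}

// Hypothetical helper: builds a regex that matches absolute http(s) links
// whose host is exactly `hostname`.
function buildSameHostPattern(hostname) {
  // The lookahead requires the host to end at "/", ":", "?", "#" or end of
  // input, so "www.domain.com.evil.org" is not accepted for "www.domain.com".
  return new RegExp(
    "^https?://" + escapeRegExp(hostname) + "(?=[/:?#]|$)",
    "i"
  );
}

var sameHost = buildSameHostPattern("www.domain.com");
console.log(sameHost.test("http://www.domain.com/page"));      // true
console.log(sameHost.test("http://wwwXdomain.com/page"));      // false
console.log(sameHost.test("http://www.domain.com.evil.org/")); // false
```

In a browser you would presumably call buildSameHostPattern(window.location.hostname); the hostname is passed in as a parameter here so the sketch also runs outside a page.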

Related

Regex to get SLD + TLD from domain string

I have a function that takes a domain name as an argument but the domain must be in the format of xxx.com. e.g. http://subdomain.example.com must be passed into the function as example.com.
I have written the below regex but it's only returning the TLD (.com). I'm a bit of a newb with regex so can't really see where I've gone wrong... The first statement is to extract http:// from domain and the second statement should extract any subdomain.
var domain = req.query.domain.replace(/.*?:\/\//g, '').replace(/^[^.]+\./g, '');
Using the above regex, http://example.com becomes com.
I think it's easier to match the pattern directly than to match and remove its complement. I would use the pattern /[^./]+\.[^./]+$/. This matches two runs of non-special characters separated by a period at the end of the string.
alert('http://subdomain.example.com'.match(/[^./]+\.[^./]+$/)[0]);
alert('http://example.com'.match(/[^./]+\.[^./]+$/)[0]);
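The pattern above expects the string to end with the domain, so a trailing path or query string would defeat it. As a rough sketch under that observation, the hostname can be isolated first with the standard URL constructor and the same "last two labels" idea applied to it. The name getBaseDomain is hypothetical, and the sketch assumes a single-label TLD: it would return "co.uk" for "www.example.co.uk".

```javascript
// Hypothetical helper: returns the last two dot-separated labels of the host.
function getBaseDomain(input) {
  // The URL constructor needs a scheme, so add one if the input lacks it.
  var hasScheme = /^[a-z][a-z\d+.-]*:\/\//i.test(input);
  var hostname = new URL(hasScheme ? input : "http://" + input).hostname;
  // Two runs of non-dot characters separated by a dot, at the end of the host.
  var match = hostname.match(/[^.]+\.[^.]+$/);
  return match ? match[0] : hostname;
}

console.log(getBaseDomain("http://subdomain.example.com"));        // example.com
console.log(getBaseDomain("http://example.com/some/path?x=1"));    // example.com
```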
A much easier answer is to use the document.createElement trick shown here
To get the hostname and TLD, you'd simply write
var link = document.createElement('a');
link.href = req.query.domain;
var formattedDomain = link.hostname; // full hostname, including any subdomain (browser only)

How do I match URLs with regular expressions?

We want to check if a URL matches mail.google.com or mail.yahoo.com (also a subdomain of them is accepted) but not a URL which contains this string after a question mark. We also want the strings "mail.google.com" and "mail.yahoo.com" to come before the third slash of the URL, for example https://mail.google.com/ is accepted, https://www.facebook.com/mail.google.com/ is not accepted, and https://www.facebook.com/?mail=https://mail.google.com/ is also not accepted. https://mail.google.com.au/ is also not accepted. Is it possible to do it with regular expressions?
var possibleURLs = /^[^\?]*(mail\.google\.com|mail\.yahoo\.com)\//gi;
var url;
// assign a value to var url.
if (url.match(possibleURLs) !== null) {
// Do something...
}
Currently this will match both https://mail.google.com/ and https://www.facebook.com/mail.google.com/ , but we don't want to match https://www.facebook.com/mail.google.com/.
Edit: I want to match any protocol (any string which doesn't contain "?" and "/") followed by a slash "/" twice (the string and the slash can both be twice), then any string which doesn't contain "?" and "/" (if it's not empty, it must end with a dot "."), and then (mail\.google\.com|mail\.yahoo\.com)\/. Case insensitive.
Not being funny - but why must it be a regular expression?
Is there a reason why you couldn't simplify the process using URL (or webkitURL in Chrome and Safari)? The URL constructor simply takes a string, and the resulting object has a property for each part of the URL. Whether it supports all the host types that you want to support, I don't know.
Granted, you might still need a regex after that (although really you'd just be checking that the hostname ends with either yahoo.com or google.com), but you would just be running it against the hostname of the URL object rather than the whole URI.
The API is not ubiquitous, but seems reasonably well supported and, anyway, if this is client-side validation then I hope you're checking it on the server, too, because sidestepping javascript validation is easy.
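A minimal sketch of that suggestion, assuming an environment where the URL constructor is available. The names ALLOWED_HOSTS and isMailUrl are made up for illustration. Instead of a regex, it compares the parsed hostname against the two hosts and accepts subdomains via a suffix check:

```javascript
// Hosts taken from the question; subdomains of them are accepted too.
var ALLOWED_HOSTS = ["mail.google.com", "mail.yahoo.com"];

// Hypothetical helper: true when the URL's host is an allowed host
// or a subdomain of one.
function isMailUrl(urlString) {
  var hostname;
  try {
    hostname = new URL(urlString).hostname.toLowerCase();
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return ALLOWED_HOSTS.some(function (allowed) {
    return hostname === allowed || hostname.endsWith("." + allowed);
  });
}

console.log(isMailUrl("https://mail.google.com/"));                  // true
console.log(isMailUrl("https://www.facebook.com/mail.google.com/")); // false
console.log(isMailUrl("https://mail.google.com.au/"));               // false
```

Because only the hostname is inspected, anything in the path or after the question mark cannot cause a false positive.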
How about
^[a-z]+:\/\/([^.\/]+\.)*mail\.(google|yahoo)\.com\/
^ Anchors the regex at the start of the string
[a-z]+ Matches the protocol. If you want a specific set of protocols, then (https?|ftp) may do the work
([^.\/]+\.)* matches the subdomain part
^([-a-z]+://|^cid:|^//)([^/\?]+\.)?mail\.(google|yahoo)\.com/
Should do the trick
The leading ^ means "match at the beginning of the input"; the ^ inside [^/\?] negates the character class, so a slash / (or a question mark) is not allowed there.
N.B. In a regex literal you still have to escape the slashes. Alternatively, pass the pattern as a string to new RegExp(string), in which case each backslash must be doubled:
new RegExp('^([-a-z]+://|^cid:|^//)([^/?]+\\.)?mail\\.(google|yahoo)\\.com/')
OK, I found that it works with:
var possibleURLs = /^([^\/\?]*\/){2}([^\.\/\?]+\.)*(mail\.google\.com|mail\.yahoo\.com)\//gi;
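As a quick check, the pattern can be run against the URLs from the question. This sketch makes one deliberate change: it drops the g flag. A global regex keeps its position in lastIndex between calls to test(), so reusing it can give alternating results. The wrapper name matchesMailHost is hypothetical.

```javascript
// Same pattern as above, without the g flag so test() is stateless.
var possibleURLs = /^([^\/\?]*\/){2}([^\.\/\?]+\.)*(mail\.google\.com|mail\.yahoo\.com)\//i;

// Hypothetical wrapper around the pattern.
function matchesMailHost(url) {
  return possibleURLs.test(url);
}

[
  "https://mail.google.com/",
  "https://sub.mail.yahoo.com/",
  "https://www.facebook.com/mail.google.com/",
  "https://www.facebook.com/?mail=https://mail.google.com/",
  "https://mail.google.com.au/"
].forEach(function (url) {
  // Only the first two URLs are expected to match.
  console.log(url, matchesMailHost(url));
});
```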

Allowing for wildcard at end of url for javascript conditional css style

I have a script working that switches stylesheet depending on the url.
It has to be done this way because the site uses smarty templates and I don't want to alter the core files or the core css.
Right now I have to add the URL pathname of each individual page. The more pages there are, the more impractical this is.
So for example, instead of /ojs/index.php/index/user/register and /ojs/index.php/index/user/profile I would like to call /ojs/index.php/index/user/* so then all pages under /user/ would have the stylesheet applied to them.
What is the best way to do this? I have seen a couple of similar posts but not exactly what I need.
var loc = window.location;
var currentURL = loc.pathname;
if (currentURL=='/ojs/index.php/index' || currentURL=='/ojs/' || currentURL=='/ojs/index.php/index/about' || currentURL=='/ojs/index.php/index/user/register' || currentURL=='/ojs/index.php/index/user/profile' || currentURL=='/ojs/index.php/index/admin/' || currentURL=='/ojs/index.php/index/admin/auth')
loadjscssfile("/ojs/plugins/themes/main-theme/main-theme.css", "css")
You could use a Regular Expression.
// unless you were using loc for something else, there is no need to store it,
// just chain to get the pathname
var currentURL = window.location.pathname,
// create a regular expression that will match all pages under user
usersPattern = new RegExp('^/ojs/index\\.php/index/user/.*');
if (usersPattern.test(currentURL)) {
loadjscssfile("/ojs/plugins/themes/main-theme/main-theme.css", "css")
}
As always is the case with regex, you need to be careful what you are doing to get it right. Here's a short explanation of how this expression works:
^ tells it to only match if the string starts with /ojs/, etc
The \. in the middle escapes the ., telling it that the . in index.php is a literal dot (inside a string passed to the RegExp constructor, the backslash itself has to be doubled: \\.)
The . at the end will match any character
The * following the . will match 0 or more instances of the previous character (any character in this case)
I created this regex with the RegExp constructor, but it could also have been written as a regular expression literal. A literal is generally the better choice, but here I used the constructor: because it takes a string, I didn't have to escape the / characters in the pattern (although the backslash before the . has to be doubled inside the string). With a literal, instead of this:
usersPattern = new RegExp('^/ojs/index\\.php/index/user/.*');
It would look like this:
usersPattern = /^\/ojs\/index\.php\/index\/user\/.*/;
Not needing to escape those / makes it a little more readable.
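If several sections need the same treatment, one option is to translate wildcard paths such as /ojs/index.php/index/user/* into regular expressions automatically. The following is only a sketch: wildcardToRegExp and escapeRegExp are hypothetical helpers, and * is assumed to mean "any run of characters".

```javascript
// Escape characters that are special in a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turns "/a/b/*" into a regex anchored at both ends,
// where each "*" becomes ".*" and everything else is matched literally.
function wildcardToRegExp(pattern) {
  var parts = pattern.split("*").map(escapeRegExp);
  return new RegExp("^" + parts.join(".*") + "$");
}

var userPages = wildcardToRegExp("/ojs/index.php/index/user/*");
console.log(userPages.test("/ojs/index.php/index/user/register")); // true
console.log(userPages.test("/ojs/index.php/index/admin/"));        // false
```

In the page script the test would presumably be run against window.location.pathname before calling loadjscssfile.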

How to turn urls padded by space into links

I have the following code that is used to turn http URLs in text into anchor tags. It's looking for anything starting with http, surrounded by white space (or the beginning/end of input)
function linkify (str) {
var regex = /(^|\s)(https?:\/\/\S+)($|\s)/ig;
return str.replace(regex,'$1<a href="$2">$2</a>$3')
}
// This works
linkify("Go to http://www.google.com and http://yahoo.com");
// This doesn't, yahoo.com doesn't become a link
linkify("Go to http://www.google.com http://yahoo.com");
The case where it doesn't work is when there is only a single space between two links. I assume that's because the space between the two links can't be used to match both URLs: after the first match, the space after the URL has already been consumed.
To play with: http://jsfiddle.net/NgMw8/
Can somebody suggest a regex way of doing this? I could scan the string myself, but I'm looking for a regex way of doing it (or some other way that doesn't require scanning the string myself and building a new string on my own).
Don't capture the final \s. This way, the second url will match the preceding \s, as required:
function linkify (str) {
var regex = /(^|\s)(https?:\/\/\S+)/ig;
return str.replace(regex,'$1<a href="$2">$2</a>')
}
http://jsfiddle.net/NgMw8/3/
Just use a positive lookahead when matching your final $|\s, like so:
var regex = /(^|\s)(https?:\/\/\S+)(?=($|\s))/ig;
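For completeness, here is a sketch of the whole function using the lookahead variant. It assumes the replacement is meant to wrap each URL in an anchor tag (the exact markup is an assumption), and that the input is trusted text, since nothing is HTML-escaped.

```javascript
function linkify(str) {
  // The lookahead checks for whitespace or end of input without consuming it,
  // so the same space can serve as the leading (^|\s) of the next URL.
  var regex = /(^|\s)(https?:\/\/\S+)(?=\s|$)/ig;
  return str.replace(regex, '$1<a href="$2">$2</a>');
}

// Two URLs separated by a single space are both turned into links.
console.log(linkify("Go to http://www.google.com http://yahoo.com"));
```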
None of these will work if an HTML element is stuck directly to the URL.
A similar question and its answers cover this case.
Some of the solutions there can handle URLs like "test.com/anothertest?get=letsgo" and prepend http://.
A workaround can be added to handle https and miscellaneous TLDs.

How to identify all URLs that contain a (domain) substring?

If I am correct, the following code will only match a URL that is exactly as presented.
However, what would it look like if you wanted to identify subdomains as well as urls that contain various different query strings - in other words, any address that contains this domain:
var url = /test.com/
if (window.location.href.match(url)){
alert("match!");
}
If you want this regex to match the literal text "test.com", you need to escape the ".", which means "any character" in regex syntax. To also match the surrounding "/" characters literally, escape those as well.
Escaped: \/test\.com\/
Take a look here for more info.
No, your pattern will actually match on all strings containing test.com.
The regular expression /test.com/ says to match test[ANY CHARACTER]com anywhere in the string.
It is better to use example.com for example links, so I replaced test with example.
Some example matches could be
http://example.com
http://examplexcom.xyz
http://example!com.xyz
http://example.com?q=123
http://sub.example.com
http://fooexample.com
http://example.com/asdf/123
http://stackoverflow.com/?site=example.com
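To accept only the domain itself and its subdomains, with any path or query string, one possible approach is to test the parsed hostname rather than the whole href. This is a sketch under that assumption; isOnDomain and escapeRegExp are hypothetical names, and the URL constructor is assumed to be available.

```javascript
// Escape characters that are special in a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when href's host is `domain` or a subdomain of it.
function isOnDomain(href, domain) {
  var hostname;
  try {
    hostname = new URL(href).hostname;
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  // The domain must be the whole host or be preceded by a dot.
  var pattern = new RegExp("(^|\\.)" + escapeRegExp(domain) + "$", "i");
  return pattern.test(hostname);
}

console.log(isOnDomain("http://sub.example.com", "example.com"));                     // true
console.log(isOnDomain("http://fooexample.com", "example.com"));                      // false
console.log(isOnDomain("http://stackoverflow.com/?site=example.com", "example.com")); // false
```

In the browser, the first argument would presumably be window.location.href.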
I think you need to use /g. /g enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one:
var url = /test.com/g;
If you want to test whether a URL is valid, this is the one I use. It is fairly complex because it also handles numeric domains and a few other peculiarities:
var urlMatcher = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?#)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/;
It also takes care of parameters, anchors, etc. Please don't ask me to explain the details.
