Regex to get SLD + TLD from domain string - javascript

I have a function that takes a domain name as an argument but the domain must be in the format of xxx.com. e.g. http://subdomain.example.com must be passed into the function as example.com.
I have written the below regex but it's only returning the TLD (.com). I'm a bit of a newb with regex so can't really see where I've gone wrong... The first statement is to extract http:// from domain and the second statement should extract any subdomain.
var domain = req.query.domain.replace(/.*?:\/\//g, '').replace(/^[^.]+\./g, '');
Using the above regex, http://example.com becomes com.

I think it's easier to match the pattern directly than to match and remove its complement. I would use the pattern /[^./]+\.[^./]+$/. This matches two runs of non-special characters separated by a period at the end of the string.
alert('http://subdomain.example.com'.match(/[^./]+\.[^./]+$/)[0]);
alert('http://example.com'.match(/[^./]+\.[^./]+$/)[0]);

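A much easier answer is to use the document.createElement trick: assign the URL to an anchor element and let the browser parse it. This only works in a browser, where document exists.
To get the hostname (subdomain included, protocol and path stripped), you'd simply write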
var link = document.createElement('a');
link.href = req.query.domain;
var formattedDomain = link.hostname; //yay
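Since req.query suggests the code runs on a server, where document is not available, a similar result can be sketched with the standard URL class. The helper name lastTwoLabels is made up for illustration, and it assumes the TLD is a single label such as .com, so it would return co.uk for example.co.uk.

```javascript
// Hypothetical helper: returns the last two dot-separated labels of the host.
// Assumes a single-label TLD; multi-label suffixes such as co.uk would need
// a public-suffix list and are not handled here.
function lastTwoLabels(input) {
  // Add a scheme when it is missing so that URL can parse bare hosts.
  var hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  var hostname = new URL(hasScheme ? input : 'http://' + input).hostname;
  return hostname.split('.').slice(-2).join('.');
}

console.log(lastTwoLabels('http://subdomain.example.com'));      // example.com
console.log(lastTwoLabels('http://example.com/path/file.html')); // example.com
```

Unlike a pattern anchored at the end of the string, this keeps working when the URL carries a path or query string, because only the hostname is split.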

Related

Regex to replace number in url

Similar to many questions such as Javascript Regex url replace
But I'm trying to replace a number in a URL string i.e.
filename.replace('org\/*\/','org/23/')
The URL is much longer, but I just need to replace the number that comes after org/
ie. assets/org/1/course/154/805597a6-9c35-4f13-af83-ebfdcb12f769/upload_87bf778b-44ee-4a39-8765-ee9c4b9f3126.jpg
The pattern you're passing is a string literal, so replace() searches for that exact text instead of treating it as a regex. You need to write it as a regex literal delimited by forward slashes, or build it with the RegExp class:
let filename = "assets/org/1/course/154/805597a6-9c35-4f13-af83-ebfdcb12f769/upload_87bf778b-44ee-4a39-8765-ee9c4b9f3126.jpg"
console.log(filename.replace(/org\/([0-9]+)\//,'org/23/'))
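If the new number is not fixed, the same replacement can be wrapped in a small function. This is only a sketch: the name replaceOrgId and the (^|\/) prefix, which keeps a segment such as myorg/ from matching, are assumptions added for illustration.

```javascript
// Hypothetical helper: swaps the numeric id that follows "org/" for newId.
function replaceOrgId(path, newId) {
  // \d+ requires at least one digit; without the g flag only the first
  // occurrence is replaced. $1 puts back the captured leading slash (if any).
  return path.replace(/(^|\/)org\/\d+\//, '$1org/' + newId + '/');
}

var filename = 'assets/org/1/course/154/upload.jpg';
console.log(replaceOrgId(filename, 23)); // assets/org/23/course/154/upload.jpg
```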

How do I match URLs with regular expressions?

We want to check if a URL matches mail.google.com or mail.yahoo.com (also a subdomain of them is accepted) but not a URL which contains this string after a question mark. We also want the strings "mail.google.com" and "mail.yahoo.com" to come before the third slash of the URL, for example https://mail.google.com/ is accepted, https://www.facebook.com/mail.google.com/ is not accepted, and https://www.facebook.com/?mail=https://mail.google.com/ is also not accepted. https://mail.google.com.au/ is also not accepted. Is it possible to do it with regular expressions?
var possibleURLs = /^[^\?]*(mail\.google\.com|mail\.yahoo\.com)\//gi;
var url;
// assign a value to var url.
if (url.match(possibleURLs) !== null) {
// Do something...
}
Currently this will match both https://mail.google.com/ and https://www.facebook.com/mail.google.com/ , but we don't want to match https://www.facebook.com/mail.google.com/.
Edit: I want to match any protocol (any string which doesn't contain "?" and "/") followed by a slash "/" twice (the string and the slash can both be twice), then any string which doesn't contain "?" and "/" (if it's not empty, it must end with a dot "."), and then (mail\.google\.com|mail\.yahoo\.com)\/. Case insensitive.
Not being funny - but why must it be a regular expression?
Is there a reason why you couldn't simplify the process using URL (or webkitURL in Chrome and Safari)? The URL constructor simply takes a string and returns an object with properties for each part of the URL. Whether it supports all the host types that you want to support, I don't know.
Granted, you might still need a regex after that (although really you'd just be checking that the hostname ends with either yahoo.com or google.com), but you would just be running it against the hostname of the URL object rather than the whole URI.
The API is not ubiquitous, but seems reasonably well supported and, anyway, if this is client-side validation then I hope you're checking it on the server, too, because sidestepping javascript validation is easy.
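A minimal sketch of that approach, assuming the standard URL constructor is available. The names ALLOWED_HOSTS and isAllowedMailUrl are made up for illustration; the check compares whole host labels, so no regex is needed.

```javascript
// Hosts taken from the question; subdomains of them are accepted as well.
var ALLOWED_HOSTS = ['mail.google.com', 'mail.yahoo.com'];

// Hypothetical helper: true when the URL's host is an allowed host or a
// subdomain of one. Anything in the path or query string is ignored.
function isAllowedMailUrl(url) {
  var hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return ALLOWED_HOSTS.some(function (host) {
    return hostname === host || hostname.endsWith('.' + host);
  });
}

console.log(isAllowedMailUrl('https://mail.google.com/'));                   // true
console.log(isAllowedMailUrl('https://www.facebook.com/mail.google.com/')); // false
console.log(isAllowedMailUrl('https://mail.google.com.au/'));               // false
```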
How about
^[a-z]+:\/\/([^.\/]+\.)*mail\.(google|yahoo)\.com\/
^ Anchors the regex at the start of the string
[a-z]+ Matches the protocol. If you want a specific set of protocols, then (https?|ftp) may do the work
([^.\/]+\.)* Matches the subdomain part
^([-a-z]+://|^cid:|^//)([^/\?]+\.)?mail\.(google|yahoo)\.com/
Should do the trick
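The leading ^ means "match beginning of line"; the ^ inside [^/\?] negates the allowed characters, thus making a slash / (or a question mark) not allowed there.
N.B. In a regex literal you still have to escape the slashes. Alternatively, pass the pattern as a string to new RegExp(string), doubling each backslash so that it survives string parsing:
new RegExp('^([-a-z]+://|^cid:|^//)([^/\\?]+\\.)?mail\\.(google|yahoo)\\.com/')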
OK, I found that it works with:
var possibleURLs = /^([^\/\?]*\/){2}([^\.\/\?]+\.)*(mail\.google\.com|mail\.yahoo\.com)\//gi;
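To see how that last pattern behaves, it can be run against the URLs from the question. This is only a quick check, not part of the original answer. The g flag is dropped here on purpose: a global regex keeps its lastIndex between test() calls, which gives surprising results when the same object is reused.

```javascript
// Same pattern as above, without the g flag so test() is stateless.
var possibleURLs = /^([^\/\?]*\/){2}([^\.\/\?]+\.)*(mail\.google\.com|mail\.yahoo\.com)\//i;

var samples = [
  'https://mail.google.com/',                                // expected: match
  'https://sub.mail.yahoo.com/',                             // expected: match
  'https://www.facebook.com/mail.google.com/',               // expected: no match
  'https://www.facebook.com/?mail=https://mail.google.com/', // expected: no match
  'https://mail.google.com.au/'                              // expected: no match
];

samples.forEach(function (url) {
  console.log(possibleURLs.test(url), url);
});
```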

How to turn urls padded by space into links

I have the following code that is used to turn http URLs in text into anchor tags. It's looking for anything starting with http, surrounded by white space (or the beginning/end of input)
function linkify (str) {
var regex = /(^|\s)(https?:\/\/\S+)($|\s)/ig;
return str.replace(regex,'$1<a href="$2">$2</a>$3')
}
// This works
linkify("Go to http://www.google.com and http://yahoo.com");
// This doesn't, yahoo.com doesn't become a link
linkify("Go to http://www.google.com http://yahoo.com");
The case where it doesn't work is if I only have a single space between two links. I'm assuming it's because the space in between the two links can't be used to match both URLs, after the first match, the space after the URL has already been consumed.
To play with: http://jsfiddle.net/NgMw8/
Can somebody suggest a regex way of doing this? I could scan the string myself, but I'm looking for a regex way of doing it (or some way that doesn't require scanning the string myself and building a new string on my own).
Don't match the final \s at all. That way it is not consumed by the first match, and the second URL can match it as its preceding \s, as required:
function linkify (str) {
var regex = /(^|\s)(https?:\/\/\S+)/ig;
return str.replace(regex,'$1<a href="$2">$2</a>')
}
http://jsfiddle.net/NgMw8/3/
Just use a positive lookahead when matching your final $|\s, like so:
var regex = /(^|\s)(https?:\/\/\S+)(?=($|\s))/ig;
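Put together, the lookahead version might look like the sketch below. It assumes the intended replacement wraps each URL in a plain anchor tag; the exact markup is an assumption, and the URL is inserted without HTML escaping.

```javascript
function linkify(str) {
  // (?=\s|$) checks for trailing whitespace or end of input without
  // consuming it, so the same space can start the next match.
  var regex = /(^|\s)(https?:\/\/\S+)(?=\s|$)/ig;
  // Assumed markup: a bare <a> element with the URL as both href and text.
  return str.replace(regex, '$1<a href="$2">$2</a>');
}

console.log(linkify('Go to http://www.google.com http://yahoo.com'));
// Go to <a href="http://www.google.com">http://www.google.com</a> <a href="http://yahoo.com">http://yahoo.com</a>
```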
None of these will work if an HTML element is attached directly to the URL.
See the similar question and its answers.
Some solutions there can handle URLs like "test.com/anothertest?get=letsgo" and prepend http://
A workaround can be added to handle https and miscellaneous TLDs.

Domain/host name inside regex Javascript

I want to create a regex that parses only links that belong to the current domain. I want to know if there is something like {hostname} that I can use inside a JavaScript regex object so that the search is limited to links for the current page's domain.
For example: if the domain is www.domain.com, it will search only for links that start with that specific domain. If the domain is anotherdomain.com, it will search only for links that start with that one. My regex is more complex, but I would like to be able to put in some kind of global variable that will be replaced with the current domain.
You can retrieve the hostname of the current page via the Location object:
var hostname = window.location.hostname;
You can then compose your regex by concatenating that variable into the pattern string:
var re = new RegExp("<start of your regex>" + hostname + "<end of your regex>");
You can get the hostname from window.location.hostname.
Then, you will want to escape potential special characters in a hostname.
A hostname will usually include . which is a special character in regular expressions, and it can certainly also contain - which can be a special character.
Probably best to engage in defensive programming and escape everything that might be special characters in regular expressions even though a lot of them shouldn't actually ever appear. From Escape string for use in Javascript regex you can use this function:
function escapeRegExp(str) {
return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
And then you can do this:
var regexpFragment = escapeRegExp(window.location.hostname);
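Combining both answers, a sketch of the whole thing could look like this. The function buildHostLinkRegExp, the ^https?:// prefix, and the trailing lookahead are assumptions for illustration; the real pattern around the hostname would be whatever your own regex needs. The hostname is a parameter, so in a browser you would pass window.location.hostname.

```javascript
// Escape function from the answer above.
function escapeRegExp(str) {
  return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&');
}

// Hypothetical helper: matches http(s) links whose host is exactly hostname.
function buildHostLinkRegExp(hostname) {
  // The lookahead requires the host to end here (path, port, query,
  // fragment or end of string), so www.domain.com.evil.org is rejected.
  return new RegExp('^https?://' + escapeRegExp(hostname) + '(?=[/:?#]|$)', 'i');
}

var re = buildHostLinkRegExp('www.domain.com');
console.log(re.test('http://www.domain.com/page'));      // true
console.log(re.test('http://wwwXdomain.com/page'));      // false
console.log(re.test('http://www.domain.com.evil.org/')); // false
```

Without the escaping step, the second URL would match, because an unescaped . accepts any character.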

How to identify all URLs that contain a (domain) substring?

If I am correct, the following code will only match a URL that is exactly as presented.
However, what would it look like if you wanted to identify subdomains as well as urls that contain various different query strings - in other words, any address that contains this domain:
var url = /test.com/
if (window.location.href.match(url)){
alert("match!");
}
If you want this regex to match the literal text "test.com", you need to escape the ".", which otherwise means "any character" in regex syntax. The two "/" are only the delimiters of the regex literal and stay unescaped.
Escaped: /test\.com/
See here for more info.
No, your pattern will actually match on all strings containing test.com.
The regular expression /test.com/ says to match test[ANY CHARACTER]com anywhere in the string.
It's better to use example.com for example links, so I replaced test with example.
Some example matches could be
http://example.com
http://examplexcom.xyz
http://example!com.xyz
http://example.com?q=123
http://sub.example.com
http://fooexample.com
http://example.com/asdf/123
http://stackoverflow.com/?site=example.com
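The difference can be checked directly. The sketch below compares the unescaped and escaped patterns and adds a stricter, hostname-based check. isOnDomain is a made-up helper that relies on the standard URL class; it is not something from the question.

```javascript
var loose = /example.com/;   // "." matches any character
var strict = /example\.com/; // "." matches only a literal dot

console.log(loose.test('http://examplexcom.xyz'));  // true
console.log(strict.test('http://examplexcom.xyz')); // false
console.log(strict.test('http://fooexample.com'));  // true, still a substring match

// Hypothetical helper: true only when the host is domain or a subdomain of it.
function isOnDomain(href, domain) {
  var hostname = new URL(href).hostname;
  return hostname === domain || hostname.endsWith('.' + domain);
}

console.log(isOnDomain('http://sub.example.com', 'example.com'));                      // true
console.log(isOnDomain('http://fooexample.com', 'example.com'));                       // false
console.log(isOnDomain('http://stackoverflow.com/?site=example.com', 'example.com')); // false
```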
I think you need to use /g. The g flag enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one:
var url = /test.com/g;
If you want to test whether a URL is valid, this is the one I use. It is fairly complex because it also takes care of numeric domains and a few other peculiarities:
var urlMatcher = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?#)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/;
It takes care of parameters, anchors, etc. Don't ask me to explain the details, please.
