I'm trying to construct a URL RegExp. The base expression looks like:
/^(((http(?:s)?\:\/\/)|www\.)[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$/
It looks good to me, because it matches these:
http://gmail.com
http://www.gmail.com
www.gmail.com
But I would like to modify it to also match this:
gmail.com
I will appreciate any help.
Just add a ? after the prefix group so that the scheme/www part becomes optional; then it will match gmail.com as well.
Use this:
^(((http(?:s)?\:\/\/)|www\.)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$
Or, if you want to match only gmail.com and not http://gmail.com, use this:
^([a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$
Please note that this will match almost any string made of letters and dots (for example, filename.txt).
IMO you would be better off using a regex like this:
^(http:\/\/|www\.)?[\w\.]+\.(com|net|co\.cc|co\.in)$
You can modify it according to your needs.
Check out a demo here and play around with the regex:
http://regex101.com/r/tS4aB3
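To check the first suggestion quickly, here is a small sketch that runs the optional-prefix pattern against the inputs from the question. The variable names and the extra negative sample ("gmail") are assumptions added for illustration.

```javascript
// Pattern from the answer above: the scheme/www prefix group is followed by "?".
const urlPattern = /^(((http(?:s)?\:\/\/)|www\.)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$/;

// The three inputs that already matched, the new one, and one string without a TLD.
const samples = ["http://gmail.com", "http://www.gmail.com", "www.gmail.com", "gmail.com", "gmail"];

const results = samples.map((s) => urlPattern.test(s));
samples.forEach((s, i) => console.log(s, "->", results[i]));
// true for the four URLs, false for "gmail"
```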
The easiest way is to treat 'www' as just another subdomain (because that's all it is).
So:
/^(((http(?:s)?\:\/\/))?([a-zA-Z0-9\-]+\.?)+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,6}(?:\/?|(?:\/[\w\-]+)*)(?:\/?|\/\w+((\.[a-zA-Z]{2,4})?)(?:\?[\w]+\=[\w\-]+)?)?(?:\&[\w]+\=[\w\-]+)*)$/
Edit: as a side note, the tld (i.e. the ".com" part) is... quite complicated these days. There are a lot of them, and they may not fit easily in 2-6 chars.
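As a rough illustration of the TLD remark, the sketch below compares the {2,6} limit with a wider one. The host-only patterns and their names are assumptions made for this example (they are not the full patterns above), and 63 is simply the maximum length of a DNS label. Punycode TLDs (xn--...) contain digits and hyphens, so neither pattern handles them.

```javascript
// Host-only patterns (no scheme or path) to isolate the effect of the TLD length limit.
const narrowTld = /^[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,6}$/i;  // limit used in the patterns above
const wideTld = /^[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,63}$/i;   // assumption: allow up to 63 letters

const hosts = ["gmail.com", "www.gmail.com", "example.photography"];
for (const host of hosts) {
  console.log(host, "narrow:", narrowTld.test(host), "wide:", wideTld.test(host));
}
// "example.photography" passes only the wide pattern; the other two pass both.
```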
Using Javascript RegExp, I'm trying to match URL like the following:
http://sub.domain.com/
http://sub1.domain.com/
http://sub100.domain.com/
I tried the following regex, which isn't working. I'm not sure what I'm doing wrong:
^http\:\/\/sub\d*\.domain\.com\/$
EDIT: fixed copy & paste typo
Update: For some reason, document.location.href doesn't match the regex, even though the examples below (also on regex101.com) work as expected. My workaround for now is to match any subdomain.
Any help is much appreciated!
I don't understand how you can compare a grape with a mango.
Here is the corrected regex:
/^http\:\/\/subs\d*\.app\.clicktale\.com\/$/.test("http://subs14.app.clicktale.com/");
Run this in your console right now.
You will get true. If you use search() instead, you will get 0, because the match starts at index 0.
Well the sub-domains you showed have the token "sub", but your REGEX is looking for "subs".
Also, no need to escape colons.
You don't say whether you wish to test for a match or actually capture the sub-domain. I'll assume the latter:
var match = "http://foo.bar.com".match(/https?:\/\/(([^.]+)\.)?/);
alert(match[2]); //"foo"
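Regarding the update in the question: one possible cause (an assumption, since the page's real address is not shown) is that the anchored pattern has to match the entire href, so an https scheme, a longer path, or a query string makes it fail. A minimal sketch that tests only the host name instead; hostPattern and isSubHost are hypothetical names:

```javascript
// Anchored pattern for the host name only: "sub" followed by optional digits.
const hostPattern = /^sub\d*\.domain\.com$/;

function isSubHost(href) {
  try {
    // The URL class splits the address, so scheme, path and query no longer matter.
    return hostPattern.test(new URL(href).hostname);
  } catch {
    return false; // href was not an absolute URL
  }
}

console.log(isSubHost("http://sub14.domain.com/"));               // true
console.log(isSubHost("https://sub100.domain.com/page?x=1#top")); // true
console.log(isSubHost("http://subs14.domain.com/"));              // false
```

In a browser the same check can be applied to location.hostname directly.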
I'm trying to find a simple regexp for URL validation, but I'm not very good at regexes.
Currently I have this regexp: (/^https?:\/\/\w/).test(url)
So it accepts URLs such as http://localhost:8080 etc.
What I want is to reject URLs that end in a run of special characters, like: http://dodo....... or http://dododo&&&&&
Could you help me?
How about this?
/^http:\/\/\w+(\.\w+)*(:[0-9]+)?\/?(\/[.\w]*)*$/
Will match: http://domain.com:port/path or just http://domain or http://domain:port
/^http:\/\/\w+(\.\w+)*(:[0-9]+)?\/?$/
This one matches URLs without a path.
Some explanations of regex blocks:
Domain: \w+(\.\w+)* matches text with dots, such as localhost or www.yahoo.com (it runs until the port or path section begins)
Port: (:[0-9]+)? optionally matches a number preceded by a colon, such as :8000 (there can be only one)
Path: \/?(\/[.\w]*)* matches any alphanumerics with slashes and dots, such as /user/images/0001.jpg (until the end of the line)
(The path is the interesting part. As written it allows lone or adjacent dots, so expressions such as /. or /./ or /.../ are accepted. If you'd like dots in the path to behave as in the domain section, with no leading, trailing, or adjacent dots, use \/?(\/\w+(\.\w+)*)* instead, which mirrors the domain part.)
* UPDATED *
Also, if you would like to allow - characters in your URL (they are valid), simply expand the character class used for matching URL text, i.e. \w+ should become [\-\w]+ and so on.
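The blocks described above can be exercised with a short sketch. loosePath is the pattern from this answer; strictPath uses the stricter path variant with the dot escaped. Both names and the sample URLs are assumptions for illustration.

```javascript
// Domain, optional port, and a path that tolerates lone or adjacent dots.
const loosePath = /^http:\/\/\w+(\.\w+)*(:[0-9]+)?\/?(\/[.\w]*)*$/;
// Same, but path segments follow the domain rule: no leading, trailing or adjacent dots.
const strictPath = /^http:\/\/\w+(\.\w+)*(:[0-9]+)?\/?(\/\w+(\.\w+)*)*$/;

const urls = [
  "http://localhost:8080",
  "http://domain.com:8000/user/images/0001.jpg",
  "http://dodo.......",
  "http://dododo&&&&&",
  "http://domain.com/./",
];

for (const url of urls) {
  console.log(url, "loose:", loosePath.test(url), "strict:", strictPath.test(url));
}
// The first two pass both patterns, the two junk-suffixed URLs fail both,
// and "http://domain.com/./" passes only the loose pattern.
```

Both patterns accept only http; replacing http with https? would cover both schemes.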
If you want to match ABCD, you can leave out the start of the pattern.
For example, to match http://localhost:8080 just write
/(localhost)/
If you want to match a specific thing, focus on the term that you want to search for, not on the start and end of the sentence.
Regular expressions are for searching for terms, unless we have a rigid rule for the whole string. :)
I hope this will do.
It depends on how complex you need the regex to be. A simple way would be to just accept word characters (plus an optional port and domain suffix):
^https?:\/\/\w+(:[0-9]*)?(\.\w+)?$
Remember you need to use the + character to match one or more characters.
Of course, there are far better & more complicated solutions out there.
^https?:\/\/localhost:[0-9]{1,5}\/([-a-zA-Z0-9()#:%_\+.~#?&\/=]*)
match:
https://localhost:65535/file-upload-svc/files/app?query=abc#next
not match:
https://localhost:775535/file-upload-svc/files/app?query=abc#next
Explanation:
It can only be used for localhost.
It limits the port to at most five digits, but it does not check that the port is no greater than 65535, so you probably need additional logic for that.
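One way the additional logic could look, as a sketch: capture the port and compare it numerically. The function name, the trailing $ anchor, and the decision to treat port 0 as invalid are assumptions, not part of the pattern above.

```javascript
// Same idea as the pattern above, with the port captured and the end anchored (assumption).
const localhostUrl = /^https?:\/\/localhost:([0-9]{1,5})\/([-a-zA-Z0-9()#:%_+.~?&\/=]*)$/;

function isValidLocalhostUrl(url) {
  const m = localhostUrl.exec(url);
  if (!m) return false;            // wrong shape, or more than five port digits
  const port = Number(m[1]);
  return port >= 1 && port <= 65535; // numeric range check the regex cannot express simply
}

console.log(isValidLocalhostUrl("https://localhost:65535/file-upload-svc/files/app?query=abc#next")); // true
console.log(isValidLocalhostUrl("https://localhost:775535/file-upload-svc/files/app?query=abc#next")); // false
console.log(isValidLocalhostUrl("https://localhost:99999/app")); // false: five digits, but out of range
```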
You can use this. It will allow localhost as well as live domains.
^https?:\/\/\w+(\.\w+)*(:[0-9]+)?(\/.*)?$
I'm pretty late to the party, but nowadays you should consider validating your URL with the URL class. Avoid the headache of regex and rely on the standard:
let isValid;
try {
    new URL(endpoint); // Will throw if the URL is invalid
    isValid = true;
} catch (err) {
    isValid = false;
}
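A hedged variant of the same idea, wrapped in a function. The URL constructor accepts any scheme (mailto:, ftp: and so on), so the sketch also checks the protocol; that extra check and the name isValidHttpUrl are assumptions about what "valid" should mean here.

```javascript
function isValidHttpUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate); // throws a TypeError for strings that are not absolute URLs
  } catch {
    return false;
  }
  // Assumption: only http and https count as valid for this use case.
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}

console.log(isValidHttpUrl("http://localhost:8080"));       // true
console.log(isValidHttpUrl("gmail.com"));                    // false: no scheme, so parsing fails
console.log(isValidHttpUrl("mailto:someone@example.com"));   // false: parses, but wrong scheme
```

Note that the URL parser is fairly permissive about host names, so it is not a drop-in replacement for every stricter regex in this thread.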
^https?:\/\/(localhost:([0-9]+.)+[a-zA-Z0-9]{1,6})?$
Will match the following cases :
http://localhost:3100/api
http://localhost:3100/1
http://localhost:3100/AP
http://localhost:310
Will NOT match the following cases :
http://localhost:3100/
http://localhost:
http://localhost
http://localhost:31
I 'borrowed' a regex from this website: http://daringfireball.net/2010/07/improved_regex_for_matching_urls that is almost complete, but I also want it to match example.com
I know that Stack Overflow is not doyourhomework.com, but I spent a long time on this without results. Here is a fiddle to test: http://jsfiddle.net/BGnMm/25/ and you can see at the end that example.com is not turned into a link.
var reg=/\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
var allurl="http:foo.com/blah_blah http://foo.com/blah_blah/ (Something like http://foo.com/blah_blah) http://foo.com/blah_blah_(wikipedia) http://foo.com/more_(than)_one_(parens) (Something like http://foo.com/blah_blah_(wikipedia)) http://foo.com/blah_(wikipedia)#cite-1 http://foo.com/blah_(wikipedia)_blah#cite-1 http://foo.com/unicode_(✪)_in_parens http://foo.com/(something)?after=parens http://foo.com/blah_blah. http://foo.com/blah_blah/. <http://foo.com/blah_blah> <http://foo.com/blah_blah/> http://foo.com/blah_blah, http://www.extinguishedscholar.com/wpglob/?p=364. http://✪df.ws/1234 rdar://1234 rdar:/1234 x-yojimbo-item://6303E4C1-6A6E-45A6-AB9D-3A908F59AE0E message://%3c330e7f840905021726r6a4ba78dkf1fd71420c1bf6ff#mail.gmail.com%3e http://➡.ws/䨹 www.c.ws/䨹 <tag>http://example.com</tag> Just a www.example.com link. http://example.com/something?with,commas,in,url, but not at end What about <mailto:gruber#daringfireball.net?subject=TEST> (including brokets). mailto:name#example.com bit.ly/foo “is.gd/foo/” WWW.EXAMPLE.COM http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55))/Web_ENG/View_DetailPhoto.aspx?PicId=752 http://www.asianewsphoto.com/(S(neugxif4twuizg551ywh3f55)) http://lcweb2.loc.gov/cgi-bin/query/h?pp/horyd:#field(NUMBER+#band(thc+5a46634)) 6:00p filename.txt http://example.com/quotes-are-“part” ✪df.ws/1234 example.com example.com/";
document.write(allurl.replace(reg,"<a href='$1' >$1</a><br />"));
Add an alternation operator (|) after the {2,4}\/, i.e.
var reg=/\b((?:[a-z][\w-]+:(?:\/*)|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/|)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
There's something you should understand about this. The first non-captured group, (?: … ), looks for "indicators" of URLs. One indicator, for example, is the www (followed by up to 3 digits of numbers). You however are asking for a way to identify URLs without any indicator at all. So, what we've done above is we've added a clause, "or an empty match," as a "valid" indicator. The consequence of this is that your regular expression is less selective now: all sorts of strings, not only example.com but also filename.txt, 3.141593, and omg...really are identified as URLs! Your only other (readily available) option is to be more selective about suffixes, e.g. require specific suffixes (com|org|net), but then this takes away from the generality of the original regex, which doesn't specify any suffixes at all.
In other words, you are probably faced with a limitation of logic, not a limitation of regex-writing skills or the regex language itself.
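To make the trade-off concrete, here is a minimal sketch of the "be more selective about suffixes" option. The pattern and the suffix list (com, org, net) are assumptions chosen for illustration; it handles bare domains only, not the paths and parentheses the original regex deals with.

```javascript
// Assumption: only these suffixes count as URL indicators for bare domains.
const bareDomain = /\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org|net)\b/gi;

const text = "example.com filename.txt 3.141593 omg...really";
const found = text.match(bareDomain);

console.log(found); // only "example.com" is reported
```

The price is the one described above: any domain whose suffix is not in the list is silently skipped.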
Please check if
var reg=/\b((?:[a-z][\w-]+:(?:\/*)|(?:www\d{0,3}[.])|[a-z0-9.\-]+[.][a-z]{2,4}\/{0,1})(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))*(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;
suits your needs. The slash after the TLD is now optional (\/{0,1}) and the group that follows may occur zero or more times (*), so a bare domain such as example.com can match. Sorry for the first answer, I did not notice the text.
If I am correct, the following code will only match a URL that is exactly as presented.
However, what would it look like if you wanted to identify subdomains as well as urls that contain various different query strings - in other words, any address that contains this domain:
var url = /test.com/;
if (window.location.href.match(url)) {
    alert("match!");
}
If you want this regex to match "test.com" literally, you need to escape the ".", which means "any character" in regex syntax, and both of the "/".
Escaped: \/test\.com\/
Take a look here for more info.
No, your pattern will actually match on all strings containing test.com.
The regular expression /test.com/ says to match test[ANY CHARACTER]com anywhere in the string.
It is better to use example.com for example links, so I replaced test with example.
Some example matches could be
http://example.com
http://examplexcom.xyz
http://example!com.xyz
http://example.com?q=123
http://sub.example.com
http://fooexample.com
http://example.com/asdf/123
http://stackoverflow.com/?site=example.com
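If the goal is "the domain itself or any of its subdomains, with any path or query string", one possible sketch is to test the host name rather than the whole address. The names domainPattern and isOnDomain are hypothetical.

```javascript
// example.com itself, or anything ending in ".example.com"; the dots are escaped.
const domainPattern = /(^|\.)example\.com$/;

function isOnDomain(href) {
  try {
    return domainPattern.test(new URL(href).hostname);
  } catch {
    return false; // not an absolute URL
  }
}

console.log(isOnDomain("http://example.com?q=123"));                   // true
console.log(isOnDomain("http://sub.example.com"));                     // true
console.log(isOnDomain("http://fooexample.com"));                      // false
console.log(isOnDomain("http://stackoverflow.com/?site=example.com")); // false
```

In a browser, window.location.hostname can be tested against the same pattern directly.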
I think you need to use /g. /g enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one:
var url = /test.com/g;
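A short sketch of what the g flag changes. With replace() it replaces every match, but a global regex also remembers its position (lastIndex) between test() calls, which can surprise when the same regex object is reused. The variable names are assumptions for illustration.

```javascript
// With replace(), g means "all matches" instead of only the first.
const replacedAll = "test.com and test.com".replace(/test\.com/g, "X");   // "X and X"
const replacedFirst = "test.com and test.com".replace(/test\.com/, "X");  // "X and test.com"

// With test(), a global regex is stateful.
const globalPattern = /test\.com/g;
const first = globalPattern.test("test.com");  // true, and lastIndex moves to 8
const second = globalPattern.test("test.com"); // false: the search resumes at index 8

// Without g, repeated calls give the same answer.
const plainPattern = /test\.com/;
const repeated = [plainPattern.test("test.com"), plainPattern.test("test.com")]; // [true, true]

console.log(replacedAll, "|", replacedFirst, "|", first, second, repeated);
```

For a simple "does the address contain this domain" check, the flag is not needed.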
If you want to test whether a URL is valid, this is the one I use. It is fairly complex, because it also takes care of numeric domains and a few other peculiarities:
var urlMatcher = /(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?/;
It takes care of parameters, anchors, etc. Please don't ask me to explain the details.
I need to match an image URL like this:
http://site.com/site.com/files/images/img (5).jpg
Something like this works fine:
.replace(/(http:\/\/([ \S]+\.(jpg|png|gif)))/ig, "<div style=\"background: url($1)\"></div>")
Except if I have something like this:
http://site.com/site.com/files/audio/audiofile.mp3 http://site.com/site.com/files/images/img (5).jpg
How do I match only the image?
Thanks!
Edit: I'm using JavaScript.
Assuming images will always be in the 'images' directory, try:
http://.*/images/(.*?)\.(jpe?g|gif|png)
If you can't assume an images directory:
http://.*/(.*?)\.(jpe?g|gif|png)
Groups 1 and 2 should have what you want (file name and extension).
I tested the regular expression here and here and it appears to do what you want.
Proper URLs should not have spaces in them; they should have %20 or a plus '+' instead. If yours were written with those alternatives, matching would be much easier.
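A small sketch of that point, assuming you control how the URLs are written: the built-in encodeURI turns the space into %20 (and leaves the parentheses alone), after which a plain \S+ pattern separates the two addresses. The variable names are assumptions for illustration.

```javascript
const audio = "http://site.com/site.com/files/audio/audiofile.mp3";
const image = "http://site.com/site.com/files/images/img (5).jpg";

// encodeURI escapes the space but keeps "/", ":", "(" and ")" as they are.
const encodedImage = encodeURI(image);
console.log(encodedImage); // http://site.com/site.com/files/images/img%20(5).jpg

// With no spaces inside a URL, whitespace is a safe delimiter.
const text = encodeURI(audio) + " " + encodedImage;
const images = text.match(/http:\/\/\S+\.(?:jpg|png|gif)/gi);
console.log(images); // only the encoded image URL
```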
Why not:
/([^/]+\.(jpg|png|gif))$
Using
http:\/\/.*\/(.*)\.(jpg|png|gif)
should do the trick if all you want is the name of the image. The first group is the file name and the second group is the file extension.
Can you assume that the urls will be space delimited, or return delimited?
As in, can you assume this input?
site.com/images/images/lol (5).jpg
site.com/images/other/radio.mp3
site.com/images/images/copter (3).jpg
If you are going to have your delimiter as part of your string to return, things get tricky. What kind of volume are you talking about here? Could you do it semi-manually at all, or does the process have to be automated?
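If the input really is a list of URLs separated by whitespace, one possible sketch is to split only where whitespace is followed by a new scheme, so spaces inside a file name stay intact, and then keep the entries with an image extension. The function name and the assumption that every URL starts with http:// or https:// are hypothetical.

```javascript
function extractImageUrls(text) {
  return text
    .trim()
    // Split on whitespace only when the next thing is the start of another URL.
    .split(/\s+(?=https?:\/\/)/)
    // Keep entries that end in an image extension.
    .filter((url) => /\.(jpe?g|png|gif)$/i.test(url));
}

const input =
  "http://site.com/site.com/files/audio/audiofile.mp3 http://site.com/site.com/files/images/img (5).jpg";
const imagesOnly = extractImageUrls(input);
console.log(imagesOnly); // only the "img (5).jpg" URL remains
```

This does not help when URLs appear inside ordinary prose, where the end of the file name is ambiguous.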
This would be an approach:
^((\w+):)?\/\/((\w|\.)+(:\d+)?)[^:]+\.(jpe?g|gif|png)$
It matches on the colon (:).
In this pattern a colon is only accepted after the protocol and before the (optional) port.
This will not match:
http://site.com/site.com/files/audio/audiofile.mp3 http://site.com/site.com/files/images/img (5).jpg
This will match (colon in the second http:// removed):
http://site.com/site.com/files/audio/audiofile.mp3 http//site.com/site.com/files/images/img (5).jpg
Here "/audiofile.mp3 http/" counts as a folder inside "/audio/".
It's not foolproof. There are other characters that are not allowed in filenames ( * | " < > )