Regex to validate URL/URI without special characters


I need a regex to validate that a given value is a URL or URI without non-ASCII or special characters.
Valid scenarios:
https://stackoverflow.com/questions/ask
/questions/ask
Invalid scenarios:
https://stackoverflow.com/questions/ask/##$ds
questions/ask#*
https://stackoverflow.com/questions/ask/##$ds ### (without spaces)
How could this be achieved using a regex?

I'd say this one is a good place to start:
^(https?:\/\/)?(www\.)?[-a-zA-Z0-9.\/]+$
Your exact constraints aren't evident from the question, but this gives you:
Optional protocol
Optional www. clause
A limited set of accepted characters
Validation from start to finish (^ to $)
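A minimal sketch of how this could be checked in JavaScript, assuming the optional protocol is written as a group, `(https?:\/\/)?` (a bracketed character class would match only a single character). The function name `isPlainUrl` is hypothetical; the sample values are the ones from the question.

```javascript
// Optional protocol, optional "www.", then only letters, digits, "-", "." and "/".
const urlPattern = /^(https?:\/\/)?(www\.)?[-a-zA-Z0-9.\/]+$/;

// Hypothetical helper: true when the whole value fits the pattern.
function isPlainUrl(value) {
  return urlPattern.test(value);
}

console.log(isPlainUrl("https://stackoverflow.com/questions/ask"));       // true
console.log(isPlainUrl("/questions/ask"));                                // true
console.log(isPlainUrl("https://stackoverflow.com/questions/ask/##$ds")); // false
console.log(isPlainUrl("questions/ask#*"));                               // false
```

Note that this sketch also accepts a relative path without a leading slash, so tighten the last character class if that matters for your case.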

Related

Simple regex pattern for email

I've been trying to work this out for almost an hour now, and I can't see myself getting much further with it without any help or explanation. I've used regex before, but only ones that are very simple or had already been made.
This time, I'm trying to work out how to write a regex that achieves the following:
Email address must contain one @ character and at least one dot (.) at least one position after the @ character.
So far, this is all I've been able to work out, and it still matches email addresses that, for example, have more than one @ symbol.
.*?@?[^@]*\.+.*
It would be helpful if you could show me how to construct a regular expression that checks for a single @ and at least one full stop one or more characters after the @. If you could break down the regex and explain what each bit does, that would be really helpful.
I want to keep it simple for now, so it doesn't have to be a full-on super-accurate email validation expression.

With the help of ClasG's comment, I now have a fairly straightforward and suitable regex for my problem. For the sake of anyone learning regex who might come across this question in the future, I'll break the expression down below.
Expression: ^[^@]+@[^@]+\.[^@]+$
^ Matches the beginning of the string (or line if multiline)
[^@] Match any character that is not in this set (i.e. not "@")
+ Match one or more of this
@ Match "@" character
[^@] Match any character that is not in this set
+ Match one or more
\. Match "." (full stop) character (backslash escapes the full stop)
[^@] Match any character that is not in this set
+ Match one or more
$ Matches the end of the string (or line if multiline)
And in plain language:
Start at beginning of string or line
Include all characters except @ until the @ sign
Include the @ sign
Include all characters except @ after the @ sign until the full stop
Include all characters except @ after the full stop
Stop at the end of the string or line
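To see the breakdown in action, here is a small sketch that runs the expression against a few sample strings. It is written with a literal `@` as the separator; the constant name `simpleEmail` and the sample addresses are made up for illustration.

```javascript
// One "@", with at least one "." somewhere after it, and no second "@" anywhere.
const simpleEmail = /^[^@]+@[^@]+\.[^@]+$/;

console.log(simpleEmail.test("user@example.com"));       // true
console.log(simpleEmail.test("first@last@example.com")); // false: second "@"
console.log(simpleEmail.test("user@example"));           // false: no "." after "@"
console.log(simpleEmail.test("userexample.com"));        // false: no "@"
```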

Email address must contain one @ character
No they don't. An email address with no '@' character is perfectly valid. An email address with multiple '@' characters before an IP address is perfectly valid (as long as all but 1 are outside the ADDR_SPEC or are quoted/escaped within the mailbox name).
I suspect you're not trying to validate an email address but rather an ADDR_SPEC. The answer linked by Máté Safranka describes how to validate an ADDR_SPEC (not an email address). Unless you expect to be validating records which don't have a valid internet MX record, and more than one '@' is more likely to be a typo than a valid address...
/[a-z0-9\._%+!$&*=^|~#%'`?{}/\-]+@([a-z0-9\-]+\.){1,}([a-z]{2,16})/

^[^\W_]+\w*(?:[.-]\w*)*[^\W_]+@[^\W_]+(?:[.-]?\w*[^\W_]+)*(?:\.[^\W_]{2,})$

Email validation doesn't work

I have this simple regular expression for Emails.
/^[a-z]+([\.-_]?[a-z0-9]+)*@([a-z]{3,})+(\.[a-z]{2,3})+$/i;
But when I test it with first@last@example.com, it still matches. And when I remove the @ character from the expression:
/^[a-z]+([\.-_]?[a-z0-9]+)*([a-z]{3,})+(\.[a-z]{2,3})+$/i
it gives the same result.
This expression allows any number of at signs (i.e. @) between at least 2 characters in the email!
Where is the problem with this expression?

Your pattern is rather restrictive, you might think of other options of validating an email address, like type="email" if it is an input field validation.
As to why the regex matches @ even if you take it out, or matches a string with two @ symbols: that is caused by [\.-_], which matches a lot of characters because the hyphen creates a range from . to _ that includes @. You need to use [._-] instead.
You may "fix" the regex as
/^[a-z]+([._-]?[a-z0-9]+)*@[a-z]{3,}(\.[a-z]{2,3})+$/i
However, this regex is not good to use in real life scenarios.
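The range effect can be checked directly. The sketch below assumes the separator in question is a literal `@`, which falls between `.` and `_` in character-code order; the constant names `loose` and `strict` are made up for illustration.

```javascript
// "[.-_]" is a range from "." (0x2E) to "_" (0x5F), which happens to include "@" (0x40).
console.log(/[.-_]/.test("@")); // true
// With the hyphen last, the class holds only ".", "_" and "-".
console.log(/[._-]/.test("@")); // false

// The pattern with the accidental range, and the same idea with the range removed.
const loose = /^[a-z]+([\.-_]?[a-z0-9]+)*@([a-z]{3,})+(\.[a-z]{2,3})+$/i;
const strict = /^[a-z]+([._-]?[a-z0-9]+)*@[a-z]{3,}(\.[a-z]{2,3})+$/i;

console.log(loose.test("first@last@example.com"));  // true: the extra "@" slips through the range
console.log(strict.test("first@last@example.com")); // false
console.log(strict.test("first.last@example.com")); // true
```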

You want something like that?
/^[a-z\.\-_]+@([a-z]{3,})+(\.[a-z]{2,3})+$/
With \.-_ you probably wanted to allow ".", "-" or "_" inside the regex, but you forgot to escape the hyphen.
Or you can keep your own pattern with the hyphen escaped:
^[a-z]+([\.\-_]?[a-z0-9]+)*@([a-z]{3,})+(\.[a-z]{2,3})+$
PS: Remember that a truly valid email address can look completely different and needs a huge regex; moreover, each web server defines what is and is not allowed in an email address.

Regex - Invalid target for quantifier when converting C# Regex to JavaScript Regex

I am trying to convert a C# email regular expression, which I took from an MSDN sample:
@"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@)) (?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$"
which is like this:
^(?(")(".+?"@)|(([0-9a-zA-Z]((\.(?!\.))|[^!#\$%&\s'\*/=\?\^`\{\}\|~])*)(?<=[-+0-9a-zA-Z_])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-zA-Z][-\w]*[0-9a-zA-Z]*\.)+[a-zA-Z]{2,6}))$
but I am getting these errors:
? : Invalid target for quantifier.
?<= : Lookbehind is not supported in JavaScript
I need help converting the above regex.

In .NET, this regex must have been used with IgnorePatternWhitespace and IgnoreCase flags since there is a space that prevents matching. Here is a demo.
The problems you encounter when porting the regex to JS are caused by the fact that JS regex does not support lookbehinds and conditionals.
There is a conditional workaround for JS: .NET (?(")"[^"]*"|\w+) can be translated as (?:(?=")"[^"]*"|(?!")\w+).
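As a rough illustration of that translation, the sketch below anchors the JS form with `^` and `$` (an addition for the demo, not part of the original snippet) and checks a few made-up inputs. The constant name `quotedOrWord` is hypothetical.

```javascript
// Emulates the .NET conditional (?(")"[^"]*"|\w+):
// if the next character is a quote, require a quoted string; otherwise require word characters.
const quotedOrWord = /^(?:(?=")"[^"]*"|(?!")\w+)$/;

console.log(quotedOrWord.test('"hello world"')); // true: quoted branch
console.log(quotedOrWord.test("hello"));         // true: word branch
console.log(quotedOrWord.test('"hello'));        // false: quote opened but never closed
console.log(quotedOrWord.test("hello world"));   // false: space is not a word character
```

The paired lookaheads `(?=")` and `(?!")` make the two branches mutually exclusive, which is what the conditional did in .NET.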
The lookbehinds are difficult to convert, but here, the first lookbehind does not seem appropriate. You are looking to find the closest set of unescaped double quotes. You can do it with "[^"\\]*(?:\\.[^"\\]*)*".
The second lookbehind just checks whether @ is preceded by a letter or digit. The easiest way to handle this is to add an [a-z0-9] character class to the left of the @ symbol and apply a ? quantifier to the first group of this alternative; that requires a digit or a letter before @, while a 1-character user part still matches.
So, you can use
/^(?:(?=")("[^"\\]*(?:\\.[^"\\]*)*"@)|(?!")(([0-9a-z]((\.(?!\.))|[-!#$%&'*+\/=?^`{}|~\w])*)?[a-z0-9]@))(?:(?=\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(?!\[)(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][-a-z0-9]{0,22}[a-z0-9]))$/i
See demo (note I also removed some unnecessary escape symbols).
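A short usage sketch of the ported pattern follows, written with a literal `@` where the address separator goes. The constant name `emailPattern` and the sample addresses are assumptions for illustration only.

```javascript
// JS port: lookahead pairs replace the .NET conditionals, and a trailing [a-z0-9]
// before "@" replaces the lookbehind.
const emailPattern = /^(?:(?=")("[^"\\]*(?:\\.[^"\\]*)*"@)|(?!")(([0-9a-z]((\.(?!\.))|[-!#$%&'*+\/=?^`{}|~\w])*)?[a-z0-9]@))(?:(?=\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(?!\[)(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][-a-z0-9]{0,22}[a-z0-9]))$/i;

console.log(emailPattern.test("david.jones@proseware.com")); // true
console.log(emailPattern.test("j@proseware.com"));           // true: 1-character user part
console.log(emailPattern.test("j..s@proseware.com"));        // false: consecutive dots
console.log(emailPattern.test("js@[192.168.0.1]"));          // true: bracketed IP branch
```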

The inverse of [^\/:] | Regular Expression Improvement

This character set
[^\/:] // all characters except / or :
is considered weak by JSLint because I should be specifying the characters that can be used, not the characters that cannot be used, per this SO post.
This is for a simple, non-production-level domain tester that looks like this:
domain: /:\/\/(www\.)?([^\/:]+)/,
I'm just looking for some direction on how to think about this. The post mentions that allowing the myriad of Unicode characters is not a good thing...How do I formulate a plan to write this a tad better?
I am not concerned with the completeness of my domain checker ( it is just a prototype )...I am concerned with how to write reg-exes differently.

According to http://en.wikipedia.org/wiki/Domain_name#Internationalized_domain_names
the character set allowed in the Domain Name System is based on ASCII
and as per http://www.netregister.biz/faqit.htm#1
to name your domain you can use any letter, numbers between 0 and 9, and the symbol "-" [as long as the first character is not "-"]
and considering that your domain must end with .something, you are looking for
([a-zA-Z0-9][a-zA-Z0-9-]*\.)+[a-zA-Z0-9][a-zA-Z0-9-]*
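One way this could slot into the domain tester from the question, as a sketch: the whitelist expression replaces the negated set inside the second capture group. The names `domainPattern` and `extractDomain` and the sample URLs are hypothetical.

```javascript
// "://", an optional "www.", then dot-separated labels that each start with a letter or digit.
const domainPattern = /:\/\/(www\.)?((?:[a-zA-Z0-9][a-zA-Z0-9-]*\.)+[a-zA-Z0-9][a-zA-Z0-9-]*)/;

// Hypothetical helper: returns the captured domain, or null when nothing matches.
function extractDomain(url) {
  const match = domainPattern.exec(url);
  return match ? match[2] : null;
}

console.log(extractDomain("http://www.example.com/path"));      // "example.com"
console.log(extractDomain("https://sub.example.co.uk:8080/x")); // "sub.example.co.uk"
console.log(extractDomain("not a url"));                        // null
```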

"I should be specifying the characters that can be used, not the characters that cannot be used"
No, that's nonsense, just JSLint being JSLint.
When you see [^\/:] in a regex it's immediately obvious what it is doing. If you tried to list all possible allowed characters the resulting regex would be horrendously difficult to read and it would be easy to accidentally forget to include some characters.
If you have a specific set of allowed characters then fine, list them. That's easier and more reliable than trying to list all possible invalid characters.
But if you have a specific set of invalid characters the [^] syntax is the appropriate way to do it.

Here's a regex for characters you can have:
mycharactersarecool[^shouldnothavethesechars](oneoftwooptions|anotheroption)
Is this what you're talking about ?

This is a great question for Google, you know... but just to whet your appetite: Matthew O'Riordan has written a regular expression that matches a link with or without a protocol.
Here's a link to his blog post
But for future reference, let me provide the regular expression from the post here as well:
/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)/
And as nicely broken down by blog writer Matthew himself:
(
  ( # brackets covering match for protocol (optional) and domain
    ([A-Za-z]{3,9}:(?:\/\/)?) # match protocol, allow in format http:// or mailto:
    (?:[\-;:&=\+\$,\w]+@)? # allow something@ for email addresses
    [A-Za-z0-9\.\-]+ # anything looking at all like a domain, non-unicode domains
    | # or instead of above
    (?:www\.|[\-;:&=\+\$,\w]+@) # starting with something@ or www.
    [A-Za-z0-9\.\-]+ # anything looking at all like a domain
  )
  ( # brackets covering match for path, query string and anchor
    (?:\/[\+~%\/\.\w\-]*)? # allow optional /path
    \??(?:[\-\+=&;%@\.\w]*) # allow optional query string starting with ?
    #?(?:[\.\!\/\\\w]*) # allow optional anchor #anchor
  )? # make URL suffix optional
)
What about your particular example?
In your case of matching URL domains, the inverse of [^\/:] could simply be:
[-0-9a-zA-Z_.]
And that should match everything after // and before the first /. But what happens when your URLs don't end with a slash? What will you do in that case?
The regular expression above (a simplification) matches only one character, just like your negated character set does. So it simply replaces your negated set in the complete regex you're using.
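As a sketch of that substitution, the whitelist class is dropped into the domain tester from the question in place of the negated set, keeping the `+` quantifier. The constant name `hostPattern` and the sample URLs are made up for illustration.

```javascript
// Same shape as the original tester, with [-0-9a-zA-Z_.] standing in for [^\/:].
const hostPattern = /:\/\/(www\.)?([-0-9a-zA-Z_.]+)/;

console.log(hostPattern.exec("http://www.example.com/path")[2]); // "example.com"
console.log(hostPattern.exec("http://example.com:8080/x")[2]);   // "example.com"
// A URL without a trailing slash still works, because the class stops at the end of input.
console.log(hostPattern.exec("https://example.org")[2]);         // "example.org"
```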

Email verification regex failing on hyphens

I'm attempting to verify email addresses using this regex: ^.*(?=.{8,})[\w.]+@[\w.]+[.][a-zA-Z0-9]+$
It's accepting emails like a-bc@def.com but rejecting emails like abc@de-f.com (I'm using the tool at http://tools.netshiftmedia.com/regexlibrary/ for testing).
Can anybody explain why?

Here is the explanation:
In your regular expression, the part that has to match the text after the @ in a-bc@def.com and abc@de-f.com is [\w.]+[.][a-zA-Z0-9]+$
It means:
There should be one or more word characters (letters, digits, and underscores) or '.'. See the reference for '\w'.
It is followed by a '.',
Then it is followed by one or more characters from the set a-zA-Z0-9.
So the - in de-f.com doesn't match the [\w.]+ in rule 1.
The modified solution
You could adjust this part to [\w.-]+[.][a-zA-Z0-9]+$ to allow - in the part after the @.

Because after the @ you're looking for letters, numbers, _, or ., then a period, then alphanumeric. You don't allow for a - anywhere after the @.
You'd need to add the - to one of the character classes (except for the single literal period one, which I would have written \.) to allow hyphens.
\w is letters, numbers, and underscores.
A . inside a character class, indicated by [], is just a period, not any character.
In your first expression, you don't limit to \w, you use .*, which is 0+ occurrences of any character (which may not actually be what you want).
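The difference can be reproduced with a small sketch, written with a literal `@` as the separator. The constant names `original` and `withHyphen` are hypothetical; the sample addresses are the ones from the question.

```javascript
// The pattern from the question: no "-" allowed after the "@".
const original = /^.*(?=.{8,})[\w.]+@[\w.]+[.][a-zA-Z0-9]+$/;
// Same pattern with "-" added to the character class that follows the "@".
const withHyphen = /^.*(?=.{8,})[\w.]+@[\w.-]+[.][a-zA-Z0-9]+$/;

console.log(original.test("a-bc@def.com"));   // true: the leading .* absorbs "a-"
console.log(original.test("abc@de-f.com"));   // false: "-" is not in [\w.]
console.log(withHyphen.test("abc@de-f.com")); // true
```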

Use this Regex:
var emailRegex = /^[^@]+@[^@]+\.[^@\.]{2,}$/;
It will accept a-bc@def.com as well as emails like abc@de-f.com.
You may also refer to a similar question on SO:
Why won't this accept email addresses with a hyphen after the @?
Hope this helps.

Instead you can use a regex like this to allow any email address.
^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$

Following regex works:
([A-Za-z0-9]+[-.-_])*[A-Za-z0-9]+@[-A-Za-z0-9-]+(\.[-A-Z|a-z]{2,})+
