How to convert a .NET URL regex to a Javascript URL regex? - javascript

In .NET I am using the [Url(UrlOptions.DisallowProtocol)] data annotation attribute, which validates a URL against a regex (the http/https prefix and www are not mandatory).
The code of this attribute looks like this:
string const regex = new RegExp('^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$');
I want to convert it to a JS validation, but I am facing a lot of difficulties because the expression is so long.
Is there any tool or any easy way to convert this regex to work in JS?

The regular expression seems to work in JavaScript to some extent without any modification - see regex101 demo. Just remember to escape all backslashes (\\ in place of \) and single quotes (\' instead of ') if defining it in a single-quoted JavaScript string:
var jsRegex = new RegExp('^((https?|ftp):\\/\\/)?(((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:)*@)?(((\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5]))|([a-zA-Z][\\-a-zA-Z0-9]*)|((([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.)+(([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.?)(:\\d*)?)(\\/((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|@)+(\\/(([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|@)*)*)?)?(\\?((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|@)|[\\uE000-\\uF8FF]|\\/|\\?)*)?(\\#((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|@)|\\/|\\?)*)?$', 'i');
// ...or...
var jsRegex = /^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i;
Note: the i flag assumes that case-insensitive matching is wanted.
(If the above isn't sufficient, please be more specific about what isn't working.)
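If doubling every backslash by hand feels error-prone, one alternative is String.raw (ES2015), which keeps backslashes as typed. The sketch below uses a shortened, hypothetical pattern (optional scheme, simple host name, optional port) rather than the full expression, and the names dotNetPattern and urlRegex are made up for illustration. It assumes the pattern contains no backtick and no ${ sequence, since those would still be interpreted by the template literal.

```javascript
// Hypothetical, shortened pattern standing in for the full .NET expression:
// optional scheme, a simple host name, optional port.
const dotNetPattern = String.raw`^((https?|ftp):\/\/)?([a-zA-Z][\-a-zA-Z0-9]*)(:\d*)?$`;

// String.raw keeps every backslash, so the pattern text can be pasted unchanged.
const urlRegex = new RegExp(dotNetPattern, 'i');

console.log(urlRegex.test('http://localhost:8080')); // true
console.log(urlRegex.test('HTTP://LOCALHOST'));      // true
console.log(urlRegex.test('http://'));               // false
```

The same approach should carry over to the long pattern, but it is worth testing against a few known-good and known-bad URLs before relying on it.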
Is there any tool or any easy way to convert this regex to work in JS?
The only generic tool I'm aware of that supposedly converts between regex flavors is RegexBuddy. It is paid software (€29.95), although if for any reason it didn't work you could get a refund.

var YourRegEx = @"^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$";
var Ismatch = Regex.Match(input, YourRegEx, RegexOptions.IgnoreCase);
if (Ismatch.Success)
{
    // does match
}
You can try this: it puts your regex in a verbatim string literal (@"").

Related

Get only one character case insensitive in a globally case sensitive RegExp in JavaScript [duplicate]

I am using Node.js to build an application in which I need to process certain strings, and I have used the JS "RegExp" object for this purpose.
I want only a part of my string in the regex to be case-insensitive:
var key = '(?i)c(?-i)ustomParam';
var find = '\{(\\b' + key +'\\b:?.*?)\}';
var regex = new RegExp(find,"g");
But it breaks with the following error:
SyntaxError: Invalid regular expression: /{(\b(?i)c(?-i)ustomParam\b:?.*?)}/
I will get the key from an external source such as Redis, and the string to be matched from another external source. I want the first letter to be case-insensitive and the rest of the letters to be case-sensitive.
When I get the key from the external source, I will insert (?i) before the first letter and (?-i) after it.
I even tried this, just for starters, but that also didn't work:
var key ='customParam';
var find = '(?i)\{(\\b' + key +'\\b:?.*?)\}(?-i)';
var regex = new RegExp(find,"g");
I know I can use the "i" flag instead of the above, but that's not my use case. I did it just to check.
JavaScript built-in RegExp does not support inline modifiers, like (?im), let alone inline modifier groups that can be placed anywhere inside the pattern (like (?i:....)).
Even XRegExp cannot offer this functionality; you can only apply (?i) to the whole pattern by declaring it at the beginning of the pattern.
In XRegExp, you can define the regex ONLY as
var regex = XRegExp('(?i)\\{(\\b' + key +'\\b:?.*?)\\}', 'g');
As of May 27, 2020, neither native JavaScript RegExp nor XRegExp patterns support inline modifier groups (i.e. (?i:...)) or modifiers placed partway through the pattern (as far as XRegExp is concerned).
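One possible workaround that needs no inline modifiers is to expand the first letter into a character class such as [cC] and leave the rest of the key case-sensitive. The sketch below is only an illustration: escapeRegExp and firstCharInsensitive are hypothetical helper names, and it assumes the key is plain text in which any regex metacharacters should be matched literally.

```javascript
// Hypothetical helper: escape regex metacharacters so the key is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: make only the first character case-insensitive
// by turning it into a two-member character class, e.g. "c" -> "[cC]".
function firstCharInsensitive(key) {
  if (key.length === 0) return '';
  const lower = key[0].toLowerCase();
  const upper = key[0].toUpperCase();
  const head = lower === upper ? escapeRegExp(key[0]) : '[' + lower + upper + ']';
  return head + escapeRegExp(key.slice(1));
}

const key = 'customParam';
const regex = new RegExp('\\{(\\b' + firstCharInsensitive(key) + '\\b:?.*?)\\}', 'g');

const sample = '{CustomParam:1} {customParam:2} {CUSTOMPARAM:3}';
console.log(sample.match(regex)); // [ '{CustomParam:1}', '{customParam:2}' ]
```

Only the first two placeholders match, because everything after the first letter is still compared case-sensitively.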

RFC email regex pattern is shown as error in visual studio?

I'm using the exact email regex pattern from the RFC:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
However, when I paste it into Visual Studio:
var emailPattern = /[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;
it is flagged with an error:
http://i.stack.imgur.com/UkJJE.jpg
Unterminated string constant
How can I remove this error?
I think you need to escape each / inside your regexp if you use / as the delimiter, so replace every / inside the pattern with \/.
The first and last / indicate where the regexp starts and ends, so any / inside the pattern has to be escaped for the parser to know where to stop.
This should work:
var emailPattern = /[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;
And as you pointed out in your comment, the new RegExp("...") constructor lets you build the regexp without escaping the / characters (backslashes inside the string have to be doubled instead).
Both constructs are equivalent.
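To illustrate the two forms side by side, here is a small sketch with a deliberately simplified, hypothetical pattern (not the full RFC expression); the names fromLiteral and fromString and the example.com addresses are made up for the demonstration.

```javascript
// Simplified, hypothetical pattern: a local part that may contain "/",
// followed by a fixed domain.
const fromLiteral = /^[a-z0-9+\/=-]+@example\.com$/i;                // "/" escaped as "\/"
const fromString = new RegExp('^[a-z0-9+/=-]+@example\\.com$', 'i'); // "\" doubled instead

// Both objects should accept and reject the same inputs.
for (const address of ['a/b@example.com', 'A+B@EXAMPLE.COM', 'a b@example.com']) {
  console.log(address, fromLiteral.test(address), fromString.test(address));
}
```

The first two addresses pass both checks and the third (which contains a space) fails both.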

XRegExp to replace Unicode characters in IE

I developed a javascript function to clean a range of Unicode characters. For example, "ñeóñú a1.txt" => "neonu a1.txt". For this, I used a regular expression:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
name = name.replace (patternA,'');
But it does not work properly in IE. If my research is correct, IE does not detect Unicode in the same way. I'm trying to make an equivalent function using the library XRegExp (http://xregexp.com/), which is compatible with all browsers, but I don't know how to write the Unicode pattern so XRegExp works in IE.
One of the failed attempts:
XRegExp.replace(name,'\\u0300-\\u036F','');
How can I build this pattern?
The value provided as the XRegExp.replace method's second argument should be a regular expression object, not a string. The regex can be built by the XRegExp or the native RegExp constructor. Thus, the following two lines are equivalent:
name = name.replace(/[\u0300-\u036F]/g, '');
// Is equivalent to:
name = XRegExp.replace(name, /[\u0300-\u036F]/g, '');
The following line you wrote, however, is not valid:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
Instead, it should be:
var patternA = new RegExp ("[\\u0300-\\u036F]", "g");
I don't know if that is the source of your problem, but it may be. For the record, IE's Unicode support is as good as or better than that of other browsers.
XRegExp can let you identify your block by name, rather than using magic numbers. XRegExp('[\\u0300-\\u036F]') and XRegExp('\\p{InCombiningDiacriticalMarks}') are exactly equivalent. However, the marks in that block are a small subset of all combining marks. You might actually want to match something like XRegExp('\\p{M}'). However, note that simply removing marks like you're doing is not a safe way to remove diacritics. Generally, what you're trying to do is a bad idea and should be avoided, since it will often lead to wrong or unintelligible results.
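For reference, the range [\u0300-\u036F] only removes anything if the accents are present as separate combining characters. A minimal sketch, assuming an ES2015 environment where String.prototype.normalize is available (old IE versions do not have it), decomposes the text first; stripCombiningMarks is a hypothetical name, and the caveat above about this being unsafe for general text still applies.

```javascript
// Hypothetical helper: decompose accented letters (NFD), then drop the
// combining marks in the U+0300..U+036F block.
function stripCombiningMarks(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036F]/g, '');
}

console.log(stripCombiningMarks('ñeóñú a1.txt')); // "neonu a1.txt"
```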

Matching regular expression string in Javascript

Does anyone know how to find regular expression string from javascript code?
e.g.
var pattern = /some regular expression/;
Is it possible to do it with a regular expression :) ?
If I got your question right, and you need a regular expression which would find all the regular expressions in a JavaScript program, then I don't think it is possible. A regular expression in JavaScript does not have to use the // syntax, it can be defined as a string. Even a full-blown JavaScript parser would not be smart enough to detect a regular expression here, for instance:
var re = "abcde";
var regexClass = function() { return RegExp; }
var regex = new regexClass()(re);
So I would give up this idea unless you want to cover only a few very basic cases.
You want a regex to match a regex? Crazy. This might cover the simplest cases.
new RegExp("\/.+\/")
However, I peeked into the JavaScript TextMate bundle, and it has two regexes for finding the start and end of a regex.
begin = '(?<=[=(:]|^|return)\s*(/)(?![/*+{}?])'
end = '(/)[igm]*';
You could probably use these as inspiration toward your goal.
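As a rough illustration of the "basic cases only" idea, the sketch below looks for a / that follows =, (, :, a comma, or the start of a line, and treats it as the start of a regex literal. The pattern and the names regexLiteralFinder and source are my own assumptions, not the TextMate rules, and it will miss or misread plenty of real code (comments, strings, and character classes containing / among others). It uses String.prototype.matchAll, so it assumes a reasonably recent engine.

```javascript
// Heuristic: "/" after "=", "(", ":", "," or at the start of a line opens a
// regex literal; the body may contain escaped characters such as "\/".
const regexLiteralFinder = /(?:^|[=(:,])\s*(\/(?![*\/])(?:\\.|[^\/\\\n])+\/[gimsuy]*)/gm;

const source = [
  'var pattern = /some regular expression/;',
  'var flagged = /a\\/b/gi;',
  'var ratio = total / count / 2;',
  'var re = "abcde";',
].join('\n');

// Collect the captured literal text from every match.
const found = Array.from(source.matchAll(regexLiteralFinder), (m) => m[1]);
console.log(found);
```

On this sample it reports the two literals and skips both the division and the string-defined pattern, which is exactly the limitation described in the first answer.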
Thanks for the answers. I have also found that it is a nearly impossible task, but here is my regex, which parses source code just fine:
this.mainPattern = new RegExp(
    // single-line comment
    "(?://.*$)|" +
    // multi-line comment
    "(/\\*.*?($|\\*/))" +
    // single- or double-quoted strings
    "|(?:(?:\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(?:'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'))" +
    // regular expression literal in JavaScript code
    "|(?:(?:[/].+[/])[img]?[\\s]?(?=[;]|[,]|[)]))" +
    // brackets
    "|([{]|[(]|[\\[])|([}]|[)]|[\\]])", 'g');

Regex to match part of a string

Regex fun again...
Take for example http://something.com/en/page
I want to test for an exact match on /en/ including the forward slashes, otherwise it could match 'en' from other parts of the string.
I'm sure this is easy, for someone other than me!
EDIT:
I'm using it for a string.match() in javascript
Well it really depends on what programming language will be executing the regex, but the actual regex is simply
/en/
For .Net the following code works properly:
string url = "http://something.com/en/page";
bool MatchFound = Regex.Match(url, "/en/").Success;
Here is the JavaScript version:
var url = 'http://something.com/en/page';
if (url.match(/\/en\//)) {
alert('match found');
}
else {
alert('no match');
}
DUH
Thank you to Welbog and Chris Ballance for making what should have been the most obvious point. This does not require regular expressions to solve. It is simply a contains check. Regex should only be used where it is needed, and that should have been my first consideration, not my last.
If you're trying to match /en/ specifically, you don't need a regular expression at all. Just use your language's equivalent of contains to test for that substring.
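In JavaScript, that substring test could look like the sketch below. hasEnSegment is a hypothetical helper name; String.prototype.includes needs ES2015, and indexOf is the older equivalent.

```javascript
// Hypothetical helper: true when the literal substring "/en/" occurs in the URL.
function hasEnSegment(url) {
  return url.includes('/en/');
}

console.log(hasEnSegment('http://something.com/en/page')); // true
console.log(hasEnSegment('http://something.com/green/'));  // false

// Equivalent check for engines without includes():
console.log('http://something.com/en/page'.indexOf('/en/') !== -1); // true
```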
If you're trying to match any two-letter part of the URL between two slashes, you need an expression like this:
/../
If you want to capture the two-letter code, enclose the periods in parentheses:
/(..)/
Depending on your language, you may need to escape the slashes:
\/..\/
\/(..)\/
And if you want to make sure you match letters instead of any character (including numbers and symbols), you might want to use an expression like this instead:
/[a-z]{2}/
Which will be recognized by most regex variations.
Again, you can escape the slashes and add a capturing group this way:
\/([a-z]{2})\/
And if you don't need to escape them:
/([a-z]{2})/
This expression will match any string in the form /xy/ where x and y are letters. So it will match /en/, /fr/, /de/, etc.
In JavaScript, you'll need the escaped version: \/([a-z]{2})\/.
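A short sketch of the capturing version in JavaScript follows; languageCode is a hypothetical helper name, and the URLs are just examples.

```javascript
// Hypothetical helper: return the first two-letter segment between slashes,
// or null when the URL has none.
function languageCode(url) {
  const match = url.match(/\/([a-z]{2})\//);
  return match ? match[1] : null;
}

console.log(languageCode('http://something.com/en/page'));   // "en"
console.log(languageCode('http://something.com/about/fr/')); // "fr"
console.log(languageCode('http://something.com/page'));      // null
```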
You may need to escape the forward-slashes...
/\/en\//
Any reason /en/ would not work?
/\/en\// or perhaps /http\w*:\/\/[^\/]*\/en\//
You don't need a regex for this:
location.pathname.substr(0, 4) === "/en/"
Of course, if you insist on using a regex, use this:
/^\/en\//.test(location.pathname)
