I have only minimal experience with regex. I am trying to implement some email masking in Node.js; everything worked when running it locally, but once it was pushed to the server I started getting invalid regex errors.
The regex example can be found here:
https://regexr.com/42uid
var email = 'foo@bar.com';
const regex = /(.)[^@\n](?=[^@\n]*[^@\n]@)|(?:(@.)|(?!^)\G(?=[^@]*$)).(?!$)/g;
const maskedEmail = email.replace(regex, '*');
maskedEmail should return
f*o@b*r.com
I have narrowed the issue down to the lookbehind/lookahead, which as I understand it is not available in JS. However, I'm not sure how best to rewrite it.
You can capture it in multiple groups and then retrieve that data in the replace with $1, $2, etc.
By using this regex: ^(.).*(.@.).*(.\.[^\.]+)$
and using the following replace string: $1*$2*$3
it will result in: f*o@b*r.com
Link to my Fiddle: https://regexr.com/42um8
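The capture-group approach above can be wrapped in a small helper. This is a sketch only: maskEmail and maskEmailKeepLength are hypothetical names, and the second variant (one * per hidden character, built with a replacer callback) is an extension that is not part of the answer. Neither pattern uses lookbehind or \G (JavaScript regexes have no \G anchor at all).

```javascript
// Sketch: mask an email with capture groups instead of lookbehind or \G.
const EMAIL_PARTS = /^(.).*(.@.).*(.\.[^.]+)$/;

// Collapses each hidden run into a single '*', as in the answer above.
function maskEmail(email) {
  return email.replace(EMAIL_PARTS, '$1*$2*$3');
}

// Variant: also capture the hidden runs and emit one '*' per hidden character.
const EMAIL_PARTS_FULL = /^(.)(.*)(.@.)(.*)(.\.[^.]+)$/;

function maskEmailKeepLength(email) {
  return email.replace(EMAIL_PARTS_FULL, (whole, first, hidden1, around, hidden2, tail) =>
    first + '*'.repeat(hidden1.length) + around + '*'.repeat(hidden2.length) + tail);
}

console.log(maskEmail('foo@bar.com'));                     // f*o@b*r.com
console.log(maskEmail('johnsmith@example.com'));           // j*h@e*e.com
console.log(maskEmailKeepLength('johnsmith@example.com')); // j*******h@e*****e.com
```

If an address is too short for the pattern (for example a@b.com), replace() finds no match and returns the input unchanged, so that case may need separate handling.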
In .NET I am using the [Url(UrlOptions.DisallowProtocol)] data annotation attribute, which validates URLs with a regex (the http/https scheme and www are not mandatory).
The code of this attribute looks like this:
string const regex = new RegExp('^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$');
I want to convert it to JS validation but am having a lot of difficulty because the pattern is so long.
Is there any tool or any easy way to convert this regex to work in JS?
The regular expression seems to work in JavaScript, to some extent, without any modification (see the regex101 demo). Just remember to escape all backslashes (\\ in place of \) and single quotes (\' instead of ') if you define it in a single-quoted JavaScript string:
var jsRegex = new RegExp('^((https?|ftp):\\/\\/)?(((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:)*#)?(((\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5]))|([a-zA-Z][\\-a-zA-Z0-9]*)|((([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-zA-Z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.)+(([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-zA-Z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.?)(:\\d*)?)(\\/((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)+(\\/(([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)*)*)?)?(\\?((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)|[\\uE000-\\uF8FF]|\\/|\\?)*)?(\\#((([a-zA-Z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-fA-F]{2})|[!\\$&\'\\(\\)\\*\\+,;=]|:|#)|\\/|\\?)*)?$', 'i');
// ...or...
var jsRegex = /^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i;
Note: the i flag assumes that you need case-insensitive matching.
(If the above isn't sufficient, please be more specific about what isn't working.)
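The escaping rules described in this answer can be checked on a much shorter pattern. This small sketch (the URL-like pattern is made up for illustration) builds the same regex once from a literal and once from a string, and shows a single quote written as \' inside a single-quoted string:

```javascript
// Regex literal: backslashes are written once.
const fromLiteral = /^https?:\/\/\d+$/i;

// String passed to new RegExp(): every backslash is doubled so it survives into the pattern.
const fromString = new RegExp('^https?:\\/\\/\\d+$', 'i');

// A single quote inside a single-quoted string is written \'.
const withQuote = new RegExp('[\'!]');

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test('HTTP://123'));             // true
console.log(withQuote.test("it's"));                    // true
```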
Is there any tool or any easy way to convert this regex to work in JS?
The only generic tool I'm aware of that supposedly converts between regex flavors is RegexBuddy, but it is paid software (€29.95), although if for any reason it didn't work for you, you could get a refund.
// requires: using System.Text.RegularExpressions;
var YourRegEx = @"^((https?|ftp):\/\/)?(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|([a-zA-Z][\-a-zA-Z0-9]*)|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-fA-F]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$";
var match = Regex.Match(input, YourRegEx, RegexOptions.IgnoreCase);
if (match.Success)
{
    // does match
}
You can try this: put your regex in a C# verbatim string, @"...", so the backslashes don't need to be escaped.
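On the JavaScript side, a rough counterpart to a C# verbatim string is a String.raw template literal, which keeps backslashes exactly as written, so a pattern copied from .NET does not need its backslashes doubled. A minimal sketch, using a deliberately short, hypothetical URL pattern rather than the full one from the question, and assuming the pattern contains no backticks or ${ sequences:

```javascript
// String.raw keeps "\d", "\/" and "\." as two-character sequences, like @"..." in C#.
const pattern = String.raw`^((https?|ftp):\/\/)?([a-zA-Z\d-]+\.)+[a-zA-Z]{2,}(:\d+)?(\/\S*)?$`;
const urlRegex = new RegExp(pattern, 'i');

console.log(urlRegex.test('https://www.example.com/path')); // true
console.log(urlRegex.test('example.com:8080'));             // true
console.log(urlRegex.test('not a url'));                    // false
```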
I'm trying to write a regular expression (JavaScript/Node.js) that will extract the subdomain and domain part from any given URL. This is what I ended up with:
[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)
Right now, I'm only considering http and https as protocols, and I exclude the "www." portion from the subdomain + domain part of a URL. I checked the expression and it almost works, but here is the issue:
Success
'http://mplay.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)
'http://lplay.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)
Failure
'http://play.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)
'http://tplay.google.co.in/sadfask/asdkfals?dk=10'.match(/[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i)
I only use the first element of the result array. I can't understand why "play." and "tplay." don't work. Could anyone please help me with this?
Do "/p" and "/t" have any special meaning for the regular expression engine?
Is there any other way of extracting the subdomain and domain from a given URL using a regular expression?
Edit -
Example:
https://play.google.com/store/apps/details?id=com.skgames.trafficracer => play.google.com
https://mail.google.com/mail/u/0/#inbox => mail.google.com
Your regex doesn't seem correct. Try this regex:
/^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)/img
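A small usage sketch for the pattern in this answer, run against the question's examples. The g and m flags are dropped so that match() returns the capture group (with g, match() returns only whole matches); extractHost is a hypothetical helper name.

```javascript
// Same pattern, without the g and m flags, so match() exposes group 1.
const hostRegex = /^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)/i;

function extractHost(url) {
  const m = url.match(hostRegex);
  return m ? m[1] : null;
}

console.log(extractHost('https://play.google.com/store/apps/details?id=com.skgames.trafficracer')); // play.google.com
console.log(extractHost('https://mail.google.com/mail/u/0/#inbox'));  // mail.google.com
console.log(extractHost('http://user:pass@www.example.com:8080/x'));  // example.com
```

One caveat: [^@\n]+@ can also match an @ that appears later in the path, so for https://example.com/user@home this returns "home" instead of the real host.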
You are about the one millionth person to try to parse URLs in JavaScript. I'm a little bit surprised you didn't see any of the existing questions on SO dating back years. The last thing you want to do is write yet another broken regexp, with all due respect to those that provided answers to your question.
There are many well-documented libraries and approaches for handling this. Google it. The simplest way is to create an <a> element in memory, assign it an href, and then access its hostname and other properties. See http://tutorialzine.com/2013/07/quick-tip-parse-urls/. If that does not float your boat, then use a library like uri.js.
If you really don't want to use a library, and insist on reinventing the wheel, then at least do something like the following:
function get_domain_from_url(url) {
  // Let the browser's own URL parser do the work via an in-memory <a> element.
  var a = document.createElement('a');
  a.setAttribute('href', url);
  return a.hostname;
}
Essentially, you are delegating the extraction of the subdomain/domain part of the URL to the browser's URL parsing logic, which is MUCH better than anything you will ever write.
Also see Parse URL with jquery/ javascript?, Parse URL with Javascript, How do I parse a URL into hostname and path in javascript?, or parse URL with JavaScript or jQuery. How did you miss those? Sorry, I have to vote to close this as a duplicate.
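Outside the browser (for example in Node.js, which the question mentions) there is no document, so the <a> element trick is not available. A swapped-in alternative that this answer does not mention is the WHATWG URL API, a global in modern browsers and in Node.js 10 and later; getHostname is a hypothetical helper name.

```javascript
// Delegate parsing to the built-in URL class instead of a regex.
function getHostname(url) {
  try {
    return new URL(url).hostname;
  } catch (err) {
    // new URL() throws a TypeError for input it cannot parse as an absolute URL.
    return null;
  }
}

console.log(getHostname('https://play.google.com/store/apps/details?id=com.skgames.trafficracer')); // play.google.com
console.log(getHostname('https://mail.google.com/mail/u/0/#inbox')); // mail.google.com
console.log(getHostname('not a url'));                                // null
```

Unlike the regex answers, hostname keeps a leading www., and scheme-less input such as "google.com" throws, so this helper returns null for it.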
The same RegExp as in anubhava's answer, with added support for protocol-relative URLs like //google.com:
/^(?:https?:)?(?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n]+)/im
Here's a solution that ignores everything before "://":
.*\://?([^\/]+)
In case you want to ignore www. as well:
.*\://(?:www\.)?([^\/]+)
Your regex works pretty well; you only need to remove the square brackets. The final expression is:
^(?:http:\/\/|www\.|https:\/\/)([^\/]+)
Hope it's useful!
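To see why the square brackets were the problem: inside [...] the alternation is just a list of single characters, so [^(?:http:\/\/|www\.|https:\/\/)] means "one character that is none of ( ? : h t p / | w . s )". The engine skips forward to the first character outside that set, which is why play. and tplay. lose their leading letters while mplay. survives. A minimal demonstration:

```javascript
// Square brackets: a negated character class, not a group of alternatives.
const bracketed = /[^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)/i;
// No brackets: a non-capturing group that matches the prefixes as whole strings.
const grouped = /^(?:http:\/\/|www\.|https:\/\/)([^\/]+)/i;

const url = 'http://play.google.co.in/sadfask/asdkfals?dk=10';
console.log(url.match(bracketed)[0]); // lay.google.co.in  ('p' is in the class, so it is skipped)
console.log(url.match(grouped)[1]);   // play.google.co.in
```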
I know I am late to the party but I want to answer the question with some extra useful info.
Get the domain name from a link using regex.
^(https?:\/\/)?(www\.)?([^\/]+)
If you want to get the subdomain, split the captured host (the third group of the regex above) at the first occurrence of ".".
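A minimal sketch of that split (splitHost is a hypothetical helper name). It drops the g and m flags: with g, exec() remembers lastIndex between calls, so reusing one regex object on different URLs can fail. As the second call shows, splitting at the first dot is naive: once www. is stripped, "google" is reported as the subdomain of "co.in". Separating subdomains from registrable domains reliably needs a public-suffix list, which a single regex does not encode.

```javascript
// No g flag, so the same regex object can be reused across calls.
const hostPattern = /^(https?:\/\/)?(www\.)?([^\/]+)/;

function splitHost(url) {
  const match = hostPattern.exec(url);
  if (!match) return null;
  const host = match[3];            // e.g. "play.google.com"
  const dot = host.indexOf('.');
  if (dot === -1) return { subdomain: '', rest: host };
  return { subdomain: host.slice(0, dot), rest: host.slice(dot + 1) };
}

console.log(splitHost('https://play.google.com/store/apps')); // { subdomain: 'play', rest: 'google.com' }
console.log(splitHost('https://www.google.co.in/imghp'));     // { subdomain: 'google', rest: 'co.in' }
```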
Note: in the single-run timings below, the regex comes out about 15x faster than Node's built-in url module (0.055 ms vs. 0.840 ms).
JavaScript example with regex:
console.time('time2');
const pttrn = /^(https?:\/\/)?(www\.)?([^\/]+)/gm
const urlInfo = pttrn.exec("https://www.google.co.in/imghp");
console.timeEnd('time2');
//time2: 0.055ms
console.log(urlInfo[0]) // https://www.google.co.in
console.log(urlInfo[1]) // https://
console.log(urlInfo[2]) // www.
console.log(urlInfo[3]) // google.co.in
Node.js with the built-in url module:
console.time('time');
const url = require('url');
const urlInfo = url.parse("https://www.google.co.in/imghp");
console.timeEnd('time');
//time: 0.840ms;
console.log(urlInfo.hostname) //www.google.co.in
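A single console.time() call is a noisy measurement, and the timed region in the second example also includes the require('url') call. A rough sketch of a fairer comparison repeats each approach many times; the iteration count is arbitrary, and the sketch swaps in the WHATWG URL class instead of the legacy url.parse(). No timings are shown because they depend on the machine and Node version. Note that the two functions do not return the same thing: the regex drops www., while URL's hostname keeps it and also validates and normalizes the input.

```javascript
// Rough micro-benchmark sketch: many iterations so one-off costs matter less.
const { performance } = require('perf_hooks');

// No g flag, so the regex object can be reused safely across exec() calls.
const pttrn = /^(https?:\/\/)?(www\.)?([^\/]+)/;
const viaRegex = (u) => pttrn.exec(u)[3];  // host without a leading "www."
const viaUrl = (u) => new URL(u).hostname; // full hostname, parsed and validated

function timeIt(fn, input, iterations) {
  let last;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    last = fn(input);
  }
  return { ms: performance.now() - start, last };
}

const input = 'https://www.google.co.in/imghp';
const a = timeIt(viaRegex, input, 100000);
const b = timeIt(viaUrl, input, 100000);
console.log(`regex: ${a.ms.toFixed(1)} ms -> ${a.last}`);
console.log(`URL:   ${b.ms.toFixed(1)} ms -> ${b.last}`);
```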
DEMO: http://jsfiddle.net/RH8f6/52/
$(document).ready(function(){
  $('#content').each(function(){
    var str = $(this).html();
    var find_url = /(https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?)/ig;
    var find_email = /([\.\w]+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})/gim;
    // Wrap every URL match in a link to itself.
    var replaced_text = str.replace(find_url, '<a href="$1">$1</a>');
    $(this).html(replaced_text);
  });
});
My regex works to find URLs in the text (and convert them to hyperlinks) except for two cases:
1) When the URL begins with 'www'
2) When the URL ends with-a-file-path-that-looks-like-this.
I have no idea how to start solving this. Any tips on what resources I should look at?
Using a more complex regular expression to find URLs, such as the one in this answer, will solve both of your problems.
To explain why you were having those problems: problem 1 happens because your regular expression requires the URL to start with http:// or https://; the https?:\/\/ portion is not grouped and followed by a question mark to make it optional. Problem 2 happens because the allowed path characters don't include the minus sign.
Using a pre-built complex regular expression meant to handle many types of URLs is a good bet, since it builds on someone else's work to handle all the possible cases.
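The two causes explained above can also be fixed directly in the question's pattern. A minimal sketch (linkify is a hypothetical name, and prefixing http:// to bare www. links is an assumption about the desired href): it accepts either http(s):// or a bare www. prefix, adds - to the path characters, and drops the redundant nested + from ([-\w\.]+)+. It does not HTML-escape anything, and a sentence-ending period right after a URL is captured as part of it.

```javascript
// 1) "http(s)://" or a bare "www." prefix; 2) "-" allowed in the path.
const findUrl = /((?:https?:\/\/|www\.)[-\w.]+(?::\d+)?(?:\/[-\w\/.]*(?:\?\S+)?)?)/gi;

function linkify(text) {
  return text.replace(findUrl, (url) => {
    // A bare "www." link needs a scheme in the href, or it is treated as a relative path.
    const href = /^https?:\/\//i.test(url) ? url : 'http://' + url;
    return '<a href="' + href + '">' + url + '</a>';
  });
}

console.log(linkify('Visit www.example.com/some-file-path today'));
// Visit <a href="http://www.example.com/some-file-path">www.example.com/some-file-path</a> today
```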
I've got some JavaScript that looks for Amazon ASINs within an Amazon link, for example
http://www.amazon.com/dp/B00137QS28
For this I use the following regex: /([A-Z0-9]{10})
However, I don't want it to match artist links which look like:
http://www.amazon.com/Artist-Name/e/B000AQ1JZO
So I need to exclude any links where there's a '/e' before the slash and the 10-character alphanumeric code. I thought the following would do that: (?<!/e)([A-Z0-9]{10}), but it turns out negative lookbehinds don't work in JavaScript. Is that right? Is there another way to do this instead?
Any help would be much appreciated!
As a side note, be aware that there are plenty of Amazon link formats, which is why I want to blacklist rather than whitelist; e.g., these are all the same page:
http://www.amazon.com/gp/product/B00137QS28/
http://www.amazon.com/dp/B00137QS28
http://www.amazon.com/exec/obidos/ASIN/B00137QS28/
http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/
In your case an expression like this would work:
/(?!\/e)..\/([A-Z0-9]{10})/
([A-Z0-9]{10}) will work equally well on the reverse of its input, so you can:
1) reverse the string,
2) use a negative lookahead (the mirror image of the negative lookbehind),
3) reverse the match back.
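A minimal sketch of the reversal trick (findAsins is a hypothetical helper name). Like the question's own pattern, it assumes the ASIN directly follows a /. Reversed, "/e/B000AQ1JZO" becomes "OZJ1QA000B/e/", so "the slash is not preceded by /e" turns into "the slash is not followed by e/", which a negative lookahead can express:

```javascript
// Reverse a string (fine for the ASCII characters used in Amazon URLs).
function reverse(s) {
  return s.split('').reverse().join('');
}

// Emulates /(?<!\/e)\/([A-Z0-9]{10})/ without lookbehind.
function findAsins(url) {
  // In the reversed string the code comes first, then "/", which must not be followed by "e/".
  const reversedPattern = /([A-Z0-9]{10})\/(?!e\/)/g;
  const reversed = reverse(url);
  const asins = [];
  let match;
  while ((match = reversedPattern.exec(reversed)) !== null) {
    asins.push(reverse(match[1]));
  }
  return asins.reverse(); // restore left-to-right order
}

console.log(findAsins('http://www.amazon.com/dp/B00137QS28'));            // [ 'B00137QS28' ]
console.log(findAsins('http://www.amazon.com/Artist-Name/e/B000AQ1JZO')); // []
```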
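You need to use a lookahead to filter out the /e/ ones, then trim the leading three characters (the two characters checked by the lookahead plus the slash) from each match.
// Example input: one product link and one artist link.
var source = 'http://www.amazon.com/dp/B00137QS28 http://www.amazon.com/Artist-Name/e/B000AQ1JZO';
var matches = source.match(/(?!\/e)..\/[A-Z0-9]{10}/g) || [];
var ids = matches.map(function (match) {
  // Drop the "xx/" prefix, e.g. "dp/B00137QS28" -> "B00137QS28".
  return match.slice(3);
});
// ids => ["B00137QS28"]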
I'm trying to retrieve the category part of the string "property_id=516&category=featured-properties", so the result should be "featured-properties". I came up with a regular expression and tested it on http://gskinner.com/RegExr/, where it worked as expected, but when I added it to my JavaScript code I got an "Invalid regular expression" error. Can anyone tell me what is breaking this code?
Thanks!
var url = "property_id=516&category=featured-properties"
var urlRE = url.match('(?<=(category=))[a-z-]+');
alert(urlRE[0]);
Positive lookbehinds (your ?<=) are not supported in JavaScript environments that predate the ECMAScript 2018 standard, which is what is causing your RegEx to fail.
You can mimic them in a whole bunch of different ways, but this might be a simpler RegEx to get the job done for you:
var url = "property_id=516&category=featured-properties"
var urlRE = url.match(/category=([^&]+)/);
// urlRE => ["category=featured-properties","featured-properties"]
// urlRE[1] => "featured-properties"
That's a super-simple example, but searching StackOverflow for a RegEx pattern to parse URL parameters will turn up more robust examples if you need them.
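If you want to use lookbehind where it is available, note that a regex literal containing it is a SyntaxError when the whole script is parsed by an engine without ES2018 lookbehind support. One option is to feature-detect it by compiling the pattern from a string inside try/catch. A sketch, with supportsLookbehind and getCategory as hypothetical names; the fallback is the capture-group pattern from this answer:

```javascript
// True if this engine understands ES2018 lookbehind. Building the pattern from a
// string turns an unsupported-syntax failure into a catchable runtime error.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (err) {
    return false;
  }
}

const LOOKBEHIND = supportsLookbehind();

function getCategory(query) {
  if (LOOKBEHIND) {
    const found = query.match(new RegExp('(?<=category=)[^&]+'));
    return found ? found[0] : null;
  }
  // Fallback: a capture group instead of lookbehind.
  const captured = query.match(/category=([^&]+)/);
  return captured ? captured[1] : null;
}

console.log(getCategory('property_id=516&category=featured-properties')); // featured-properties
```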
The lookbehind syntax is what breaks your code; a capture group does the same job:
var urlRE = url.match(/category=([a-z-]+)/);
alert(urlRE[1]);
If you want to parse URL parameters, you can use the getParameterByName() function from this site:
http://james.padolsey.com/javascript/bujs-1-getparameterbyname/
In any case, as already mentioned, regular expressions in JavaScript are usually written as regex literals (/.../) rather than plain strings:
https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
var url = "property_id=516&category=featured-properties",
urlRE = url.match(/(category=)([a-z-]+)/i); //don't forget i if you want to match also uppercase letters in category "property_id=516&category=Featured-Properties"
//urlRE = url.match(/(?<=(category=))[a-z-]+/i); //this is a mess
alert(urlRE[2]);
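Since the goal is to read one parameter from a query string, a regex-free alternative that none of these answers mention is URLSearchParams, a global in modern browsers and in Node.js 10 and later. It also decodes percent-escapes and + signs, which the regexes above leave as-is:

```javascript
const params = new URLSearchParams('property_id=516&category=featured-properties');

console.log(params.get('category'));    // featured-properties
console.log(params.get('property_id')); // 516 (values are always strings)
console.log(params.get('missing'));     // null

// Percent-escapes and "+" are decoded to spaces.
console.log(new URLSearchParams('category=featured%20properties+2').get('category')); // featured properties 2
```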