Regex for extensive phone number validation - javascript

I have a number of rules that I need to apply to a phone number input field. Here is my attempt:
function isValidPhone(str) {
  var positive_checks = new Array(
    /^[0-9]{8}$/g // 1. Must have 8 digits exactly
  );
  var negative_checks = new Array(
    /^[0147]/g, // 2. Must not start with 0,1,4 or 7
    /^[9]{3}/g, // 3. Must not start with 999
    /(.)\\1*$/g // 4. Must not be all the same number
  );
  for (var i = 0; i < positive_checks.length; i++) {
    if (str.search(positive_checks[i]) < 0) {
      return false;
    }
  }
  for (var i = 0; i < negative_checks.length; i++) {
    if (str.search(negative_checks[i]) >= 0) {
      return false;
    }
  }
  return true;
}
All rules work except rule 4, which I don't fully understand beyond the fact that it uses back-references somehow. I think I saw a mention that the environment needs to support back-references; is JavaScript such an environment?
Secondarily, I'd be interested in reworking all the rules so that I only need a single rule array and loop, without separate negative checks. Is that possible for each of these rules? Ultimately I'm looking for a JavaScript solution, but being able to use a regex for all 4 makes for nicer-looking code in my opinion, and since this is form-validation logic, performance is not really an issue here.

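Your rule number four probably doesn't work because of the double backslash in your back-reference: in a regex literal, \\1 is an escaped backslash followed by the digit 1, so the pattern only matches input that contains a literal backslash. (JavaScript does support back-references; \1 is the correct form.) I would also anchor the pattern and change the * quantifier to + (meaning "one or more times"):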
/^(.)\1+$/g
Explanation:
^ # the beginning of the string
( # group and capture to \1:
. # any character except a line terminator (\n, \r, \u2028, \u2029)
) # end of \1
\1+ # what was matched by capture \1 (1 or more times)
$ # the end of the string
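JavaScript does support back-references. A minimal sketch comparing the question's rule 4 with the corrected pattern (the variable names broken and allSame are just for illustration):

```javascript
// Rule 4 as written in the question: in a regex literal, \\ is an escaped
// backslash, so this needs a literal "\" character and has no back-reference.
const broken = /(.)\\1*$/;

// Corrected rule 4: anchored, with \1 referring back to capture group 1.
const allSame = /^(.)\1+$/;

console.log(broken.test("22222222"));  // false: there is no "\" in the input
console.log(allSame.test("22222222")); // true: every digit repeats the first one
console.log(allSame.test("22345678")); // false: the anchors force the whole string to repeat
```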
A one-liner that will validate all of your requirements:
var re = /^(?=.{8}$)(?!999|[0147]|(.)\1+$)[0-9]+$/;
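A quick sketch for trying the one-liner against a few sample numbers. It assumes the field value contains only the digits (no spaces or separators), and it keeps $ after (.)\1+ inside the negative lookahead: without that anchor, the lookahead would reject any number that merely starts with two identical digits, such as 22345678.

```javascript
// Exactly 8 characters, not starting with 999, 0, 1, 4 or 7,
// not all the same digit, and made up only of the digits 0-9.
const phoneRe = /^(?=.{8}$)(?!999|[0147]|(.)\1+$)[0-9]+$/;

const samples = ["23456789", "22345678", "12345678", "99912345", "55555555", "2345678"];
for (const s of samples) {
  console.log(s, phoneRe.test(s));
}
```

With a single pattern like this, the two rule arrays and both loops collapse into one phoneRe.test(str) call.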

Use regexr.com/39khr and hover over the different parts of the expression to see what they do.
As you don't say what doesn't work (i.e. examples of a number that is rejected but should be accepted, or the other way around), it's very hard to give you a precise answer.

Related

Regex taking long time to evaluate

At login, I need to allow only a username (alphanumeric plus some special characters), an email address, or the username\domain format. For this I used a regex with an or (|) condition. I also need to allow characters from other languages such as Japanese and Chinese, so I included those in the same regex. The issue is that when I enter 30 or more characters followed by # or some other special character, evaluating this regex takes several seconds and the browser hangs.
export const usernameRegex = /(^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4})+|^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$/gu;
When I remove the other-language character sets, such as [\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF], it works fine.
I understand that regex generally looks simple but does a lot under the hood. Is there a modification I can make to this regex so that it doesn't take so long to evaluate? Any help is much appreciated!
Valid texts:
stackoverflow,
stackoverflow1~,
stackoverflow!#~^-,
stackoverflow#g.co,
stackoverflow!#~^-#g.co,
こんにちは,
你好,
tree\guava
EDIT:
Example input causing the issue:
stackoverflowstackoverflowstackoverflow#
Entering the above text takes a long time.
https://imgur.com/T2Vg4lg
Your regex seems to consist of three regular expressions joined with |:
(^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4})+
^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+
^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$
First regex, (^...)+: how many times do you think a pattern that starts at the beginning of the string can occur? Either it's a second occurrence OR it starts at the beginning of the string; it can't be both.
So drop the outer group and its +: ^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4}
Parts 2 and 3 are mostly identical; part 2 just puts the block [a-zA-Z0-9._~^#!\-]+\\ in front of what is otherwise the 3rd part.
So let's combine them: ^(?:[a-zA-Z0-9._~^#!\-]+\\)? ... and make sure to use non-capturing groups where possible.
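([abc]|[def])+ can be simplified to [abcdef]+. This, by the way, is the part that's killing your performance. Your two classes overlap: inside a character class, _-~ is a range from _ to ~, which includes a-z, and a-z is also in the second class. So each lowercase letter can be matched by either alternative, and when the overall match fails, the engine backtracks through every combination of left/right choices, a number that grows exponentially with the length of the input.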
Your regex ends with a $. That only applied to the last part, but I assume you always want to match the entire string? So let's wrap all 3 (now 2) parts in ^ ... $.
Summary:
/^[a-zA-Z0-9._~^#!%+-]+#[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._\-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u
(The hyphen after _ is escaped so that it is a literal hyphen rather than the start of the range _-~.)
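A sketch for checking the combined pattern against the valid texts from the question. Compared with the summary as originally posted, the hyphen after _ is escaped (an assumption about the intent, since unescaped _-~ is read as a range). There is also no g flag, because test() with g keeps state in lastIndex between calls.

```javascript
const usernameRe = /^[a-zA-Z0-9._~^#!%+-]+#[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._\-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u;

// The "valid texts" listed in the question ("tree\\guava" is tree\guava).
const valid = [
  "stackoverflow", "stackoverflow1~", "stackoverflow!#~^-", "stackoverflow#g.co",
  "stackoverflow!#~^-#g.co", "こんにちは", "你好", "tree\\guava"
];
valid.forEach((s) => console.log(s, usernameRe.test(s)));

// A long string that cannot match: with no overlapping alternation
// left inside the repeated part, this should fail quickly.
console.log(usernameRe.test("stackoverflowstackoverflowstackoverflow%"));
```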
A JS example of how a simple regex tries to match a string: how it fails, backtracks, retries with the other side of the |, and so on, and so on.
// let's implement what `/([a-z]|[\p{Ll}])+/u` would do,
// how it would try to match something.
const a = /[a-z]/; // left part
const b = /[\p{Ll}]/u; // right part
const string = "abc,";
const testNextCharacter = (index) => {
  if (index === string.length) {
    return true;
  }
  const pattern = index + " ".repeat(index + 1) + "%o.test(%o)";
  const character = string.charAt(index);
  console.log(pattern, a, character);
  // checking the left part && if successful checking the next character
  if (a.test(character) && testNextCharacter(index + 1)) {
    return true;
  }
  // checking the right part && if successful checking the next character
  console.log(pattern, b, character);
  if (b.test(character) && testNextCharacter(index + 1)) {
    return true;
  }
  return false;
};
console.log("result", testNextCharacter(0));
And these are only 4 characters. Try it with 5 or 6 characters to get an impression of how much work this will be at 20 characters.
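To put numbers on that, here is a sketch that uses the same left/right strategy as the demo but counts the single-character tests instead of logging them (countAttempts is a hypothetical helper, not part of any library):

```javascript
// Counts the single-character tests made by the naive left/right
// backtracking strategy for a string that cannot fully match.
function countAttempts(string) {
  const left = /[a-z]/;    // left side of the alternation
  const right = /\p{Ll}/u; // right side, overlaps with the left for a-z
  let attempts = 0;
  const tryFrom = (index) => {
    if (index === string.length) return true;
    const ch = string.charAt(index);
    attempts++;
    if (left.test(ch) && tryFrom(index + 1)) return true;
    attempts++;
    if (right.test(ch) && tryFrom(index + 1)) return true;
    return false;
  };
  return { matched: tryFrom(0), attempts };
}

for (const n of [3, 5, 10, 16]) {
  // n lowercase letters followed by a comma, like "abc," in the demo
  const { matched, attempts } = countAttempts("a".repeat(n) + ",");
  console.log(`${n} letters: matched=${matched}, attempts=${attempts}`);
}
```

For n lowercase letters followed by a comma the count works out to 2^(n+2) - 2, so each extra character roughly doubles the work.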

Applying currency format using replace and a regular expression

I am trying to understand some code that converts a number to a currency format. For example, 16.9 is converted to $16.90. The problem is that an amount over $1,000 just returns $1, an amount over $2,000 returns $2, and so on. Amounts in the hundreds show up fine.
Here is the function:
var _formatCurrency = function(amount) {
return "$" + parseFloat(amount).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,')
};
(The semicolon comes after the closing brace because this function expression is itself a statement inside another function. That function is not relevant to this discussion.)
I found out that the person who originally put the code in there found it somewhere but didn't fully understand it and didn't test this particular scenario. I myself have not dealt much with regular expressions. I am not only trying to fix it, but to understand how it is working as it is now.
Here's what I've found out. The code between the backslash after the open parenthesis and the backslash before the g is the pattern. The g means global search. The \d means digit, and the (?=\d{3})+\. appears to mean find 3 digits plus a decimal point. I'm not sure I have that right, though, because if that was correct shouldn't it ignore numbers like 5.4? That works fine. Also, I'm not sure what the '$1,' is for. It looks to me like it is supposed to be placed where the digits are, but wouldn't that change all the numbers to $1? Also, why is there a comma after the 1?
Regarding your comment
I was hoping to just edit the regex so it would work properly.
The regex you are currently using is obviously not working for you, so I think you should consider alternatives even if they are not very similar, and
Trying to keep the code change as small as possible
Understandable, but sometimes it is better to use code that is a little bigger and MORE READABLE than code that is compact and hieroglyphic.
Back to business:
I'm assuming you are getting a string as an argument, and this string is composed only of digits and may or may not have a dot before the last 1 or 2 digits. Something like:
//input //intended output
1 $1.00
20 $20.00
34.2 $34.20
23.1 $23.10
62516.16 $62,516.16
15.26 $15.26
4654656 $4,654,656.00
0.3 $0.30
I'll leave it to you to pre-check (presumably) invalid inputs such as 1. | 2.2. | .6 | 4.8.1 | 4.856 | etc.
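One possible pre-check, assuming exactly the format described above (digits, optionally followed by a dot and one or two digits); isValidAmount is a hypothetical name:

```javascript
// Hypothetical pre-check for the assumed format: one or more digits,
// optionally followed by a dot and exactly one or two digits.
var isValidAmount = function(str) {
  return /^\d+(\.\d{1,2})?$/.test(str);
};

["1", "34.2", "62516.16", "0.3", "1.", "2.2.", ".6", "4.8.1", "4.856"].forEach(function(s) {
  console.log(s, isValidAmount(s)); // only the first four print true
});
```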
Proposed solution:
var _formatCurrency = function(amount) {
  amount = "$" + amount.replace(/(\d)(?=(\d{3})+(\.(\d){0,2})*$)/g, '$1,');
  if(amount.indexOf('.') === -1)
    return amount + '.00';
  var decimals = amount.split('.')[1];
  return decimals.length < 2 ? amount + '0' : amount;
};
Regex break down:
(\d): Matches one digit. Parentheses group things for referencing when needed.
(?=(\d{3})+(\.(\d){0,2})*$). Now this guy. From end to beginning:
$: Matches the end of the string. This is what allows you to match from the end instead of the beginning which is very handy for adding the commas.
(\.(\d){0,2})*: This part processes the dot and decimals. The \. matches the dot. (\d){0,2} matches 0, 1 or 2 digits (the decimals). The * means the whole group may repeat zero or more times, so it can also be absent.
?=(\d{3})+: \d{3} matches exactly 3 digits. + means at least one occurrence. Finally, ?= makes this a lookahead: it checks what follows the main expression without including it in the match. In this case it takes three digits at a time (from the end, remember?) and leaves them out of the match, so the replacement does not touch them.
g: Match and replace globally, the whole string.
Replacing with $1,: This is how captured groups are referenced in the replacement; in this case the wanted group is number 1. Since the pattern matches every digit that is followed by a multiple of three digits (counting back from the end or the dot) and captures it in group 1 ((\d)), replacing that capture with $1, effectively adds a comma after each such digit.
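A sketch that runs the proposed function against the input/output table above to check each row (the function is copied unchanged so the snippet is self-contained):

```javascript
// The proposed function, reproduced unchanged.
var _formatCurrency = function(amount) {
  amount = "$" + amount.replace(/(\d)(?=(\d{3})+(\.(\d){0,2})*$)/g, '$1,');
  if (amount.indexOf('.') === -1)
    return amount + '.00';
  var decimals = amount.split('.')[1];
  return decimals.length < 2 ? amount + '0' : amount;
};

// Inputs and intended outputs taken from the table in this answer.
var cases = [
  ["1", "$1.00"], ["20", "$20.00"], ["34.2", "$34.20"], ["23.1", "$23.10"],
  ["62516.16", "$62,516.16"], ["15.26", "$15.26"],
  ["4654656", "$4,654,656.00"], ["0.3", "$0.30"]
];
cases.forEach(function(c) {
  var result = _formatCurrency(c[0]);
  console.log(c[0], result, result === c[1]);
});
```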
Try it and please give feedback.
Also, if you haven't already, you should really, really look into this site as suggested by Taplar (and SO has not provided me with a format to stress this enough).
Your understanding of the function is incorrect, and the pattern as quoted in your description, (?=\d{3})+\., is not the one your code uses. This function formats a number as standard US currency, and here is how it works:
The parseFloat() function converts a string value to a decimal number.
The toFixed(2) function rounds the decimal number to 2 digits after the decimal point.
The replace() function is used here to add the thousands separators (i.e. a comma after every 3 digits). The correct pattern is /(\d)(?=(\d{3})+\.)/g (which is what your code already contains), and this is how it works:
The (\d) captures a digit.
The (?=(\d{3})+\.) is called a lookahead, and it ensures that the captured digit is followed by one or more (+) sets of 3 digits (\d{3}) and then the decimal point (\.).
The g flag/modifier is to apply the pattern globally, that is on the entire amount.
The replacement $1, replaces the pattern with the first captured group $1, which is in our case the digit (\d) (so technically replacing the digit with itself to make sure we don't lose the digit in the replacement) followed by a comma ,. So like I said, this is just to add the thousands separator.
Here are some tests with the suggested fix. Note that it works fine with numbers and strings:
var _formatCurrency = function(amount) {
return "$" + parseFloat(amount).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,');
};
console.log(_formatCurrency('1'));
console.log(_formatCurrency('100'));
console.log(_formatCurrency('1000'));
console.log(_formatCurrency('1000000.559'));
console.log(_formatCurrency('10000000000.559'));
console.log(_formatCurrency(1));
console.log(_formatCurrency(100));
console.log(_formatCurrency(1000));
console.log(_formatCurrency(1000000.559));
console.log(_formatCurrency(10000000000.559));
Okay, I want to apologize to everyone who answered. I did some further tracing and found out the JSON call which was bringing in the amount did in fact have a comma in it, so it is just parsing that first digit. I was looking in the wrong place in the code when I thought there was no comma in there already. I do appreciate everyone's input and hope you won't think too bad of me for not catching that before this whole exercise. If nothing else, at least I now know how that regex operates so I can make use of it in the future. Now I just have to go about removing that comma.
Have a great day!
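For that last step, a minimal sketch, assuming commas are the only thousands separators that can appear in the JSON value: strip them before parseFloat, which otherwise stops at the first comma.

```javascript
// Variant of _formatCurrency that first removes thousands separators,
// assuming "," is only ever used for that purpose in the incoming data.
var _formatCurrency = function(amount) {
  // parseFloat stops at the first character it can't parse, such as ",".
  var numeric = parseFloat(String(amount).replace(/,/g, ''));
  return "$" + numeric.toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,');
};

console.log(parseFloat("2,000.5"));        // 2 (the original symptom)
console.log(_formatCurrency("2,000.5"));   // $2,000.50
console.log(_formatCurrency("1,234,567")); // $1,234,567.00
console.log(_formatCurrency(16.9));        // $16.90
```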
Assuming you are working with USD only, this should work as an alternative to regular expressions. I have also included a few tests to verify that it works properly.
var test1 = '16.9';
var test2 = '2000.5';
var test3 = '300000.23';
var test4 = '3000000.23';
function stringToUSD(inputString) {
  // Round to cents first. Calling parseFloat after the commas are inserted
  // would stop at the first comma ("2,000.50" -> 2), so split instead.
  const [whole, cents] = parseFloat(inputString).toFixed(2).split('.');
  const wholeNumber = whole.split('')
    .reverse()
    .map((val, idx, arr) => idx !== 0 && (idx + 1) % 3 === 0 && arr[idx + 1] !== undefined ? `,${val}` : val)
    .reverse()
    .join('');
  return `${wholeNumber}.${cents}`;
}
console.log(stringToUSD(test1));
console.log(stringToUSD(test2));
console.log(stringToUSD(test3));
console.log(stringToUSD(test4));

Regex- match 3 or 6 of type

I'm writing an application that requires color manipulation, and I want to know when the user has entered a valid hex value. This includes both '#ffffff' and '#fff', but not the ones in between, like 4 or 5 Fs. My question is, can I write a regex that determines if a character is present a set amount of times or another exact amount of times?
What I tried was changing the
/#(\d|\w){3}{6}/
regular expression into this:
/#(\d|\w){3|6}/
Obviously this didn't work. I realize I could write:
/(#(\d|\w){3})|(#(\d|\w){6})/
However I'm hoping for something that looks better.
The shortest I could come up with:
/^#([\da-f]{3}){1,2}$/i
I.e. # followed by one or two groups of three hexadecimal digits, anchored with ^ and $ so that inputs like #ffff are rejected.
You can use this regex:
/#[a-f\d]{3}(?:[a-f\d]{3})?\b/i
This will allow #<3 hex digits> or #<6 hex digits> inputs. The \b at the end is a word boundary, so a longer run of hex digits (such as #ffff) does not match.
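A quick comparison of both answers' patterns, assuming the goal is to validate the whole input field; for that, both are anchored here with ^ and $ (the second answer uses \b instead):

```javascript
// Both answers' patterns, anchored so the whole field must match.
const threeOrSix = /^#([\da-f]{3}){1,2}$/i;
const optionalGroup = /^#[a-f\d]{3}(?:[a-f\d]{3})?$/i;

// Without anchors, "#ffff" passes because "#fff" is found inside it.
console.log(/#([\da-f]{3}){1,2}/i.test("#ffff")); // true

for (const input of ["#fff", "#FFFFFF", "#ffff", "#fffff", "#ggg"]) {
  console.log(input, threeOrSix.test(input), optionalGroup.test(input));
}
```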
I had to find a pattern for this myself today, but I also needed to allow the extra digits for transparency (i.e. #FFF5 / #FFFFFF55), which made things a little more complicated, as the number of valid combinations goes up.
In case it's of any use, here's what I came up with:
var inputs = [
"#12", // Invalid
"#123", // Valid
"#1234", // Valid
"#12345", // Invalid
"#123456", // Valid
"#1234567", // Invalid
"#12345678", // Valid
"#123456789" // Invalid
];
var regex = /(^\#(([\da-f]){3}){1,2}$)|(^\#(([\da-f]){4}){1,2}$)/i;
inputs.forEach((itm, ind, arr) => console.log(itm, (regex.test(itm) ? "valid" : "-")));
Which should print:
#12 -
#123 valid
#1234 valid
#12345 -
#123456 valid
#1234567 -
#12345678 valid
#123456789 -

Regex pattern does not work properly.. but only in Extjs / Javascript

I've built a regex pattern to check the strength of passwords:
(?=^.{8,15}$)((?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))^.*
It forces the user to choose a password whose characters come from at least 3 of the following 4 categories:
at least 1 upper case character
at least 1 lower case character
at least 1 numerical character
at least 1 special character / symbol
Note: It also enforces a min and max length {8,15}
The pattern works fine in a server-side PHP script, and I've also tested it with multiple JavaScript regex tester tools (e.g. http://www.regular-expressions.info/javascriptexample.html). Everything looks perfect so far...
BUT if I use it inside a simple Extjs textfield validator, the validator only returns TRUE if I use all 4 categories.
validator: function (value) {
    var pattern =
        '(?=^.{8,15}$)'+
        '((?=.*\d)(?=.*[A-Z])(?=.*[a-z])|' +
        '(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|' +
        '(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|' +
        '(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))^.*';
    if (value.match(pattern)) {
        return true;
    } else {
        return this.i18n.invalidPassword;
    }
}
And now, I'm running out of ideas...
You're setting up the pattern incorrectly:
var pattern = new RegExp(
'(?=^.{8,15}$)'+
'((?=.*\\d)(?=.*[A-Z])(?=.*[a-z])|' +
'(?=.*\\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|' +
'(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|' +
'(?=.*\\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))^.*'
);
Note the \\ instead of \. If you don't double it, the \ will be gone by the time the regular expression code gets to it. You could alternatively use a regex literal (/.../), but there's no way to break that up across multiple lines.
Edit: specifically, the \\ before the \d occurrences in your regex. If you don't double the backslash, the regular expression will just see a lowercase "d".
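An alternative the answer doesn't mention: String.raw template literals (ES2015) keep backslashes exactly as typed, so the tested pattern can be split across lines without doubling them. A sketch, with isStrong as a hypothetical helper:

```javascript
// In an ordinary string literal "\d" is not a recognised escape,
// so the backslash is dropped and the RegExp constructor only sees "d".
console.log("\d" === "d"); // true

// String.raw keeps every backslash, so these pieces match the tested pattern.
const passwordPattern = new RegExp(
  String.raw`(?=^.{8,15}$)` +
  String.raw`((?=.*\d)(?=.*[A-Z])(?=.*[a-z])|` +
  String.raw`(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|` +
  String.raw`(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|` +
  String.raw`(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))^.*`
);

const isStrong = (value) => passwordPattern.test(value);

console.log(isStrong("Abcdefg1")); // true: upper, lower, digit
console.log(isStrong("abcdef1!")); // true: lower, digit, special
console.log(isStrong("abcdefg1")); // false: only lower and digit
console.log(isStrong("Ab1!"));     // false: shorter than 8
```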

Sort lines on webpage using javascript/ regex

I'd like to write a Greasemonkey script that finds lines ending with a string ("copies.") and sorts those lines based on the number preceding that string.
Unfortunately, the page I'm looking to modify does not use tables, just <br/> tags, so I assume this will involve regex:
http://www.publishersweekly.com/article/CA6591208.html
(Lines without the matching string will just be ignored.)
Would be grateful for any tips to get me started.
Most times, HTML and RegEx do not go together, and when parsing HTML your first thought should not be RegEx.
However, in this situation, the markup looks simple enough that it should be okay, at least until Publishers Weekly changes how they do that page.
Here's a function that will extract the data, grab the appropriate lines, sort them, and put them back again:
($j is jQuery)
function reorderPwList()
{
    var Container = $j('#article span.table');
    var TargetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi;
    var Lines = Container.html().match( TargetLines );
    Lines.sort( sortPwCopies );
    Container.html( Lines.join('\n') );
    function sortPwCopies()
    {
        function getCopyNum()
        { return arguments[0].replace(TargetLines,'$1').replace(/\D/g,'') }
        return getCopyNum(arguments[0]) - getCopyNum(arguments[1]);
    }
}
And an explanation of the regex used there:
^ # start of line
.+? # lazy match one or more non-newline characters
( # start capture group $1
  \d+ # match one or more digits (0-9)
  (?: # non-capture group
    ,\d{3} # comma, then three digits
  )* # end group, repeat zero or more times
) # end group $1
copies\. # literal text (preceded by a space in the pattern), with . escaped
<br ?\/?> # match a br tag, with optional space or slash just in case
$ # end of line
(For readability, I've indented the groups; the only spaces that are part of the actual pattern are the one before 'copies' and the optional one after 'br'.)
The regex flags gmi are used, for global, multi-line mode, case-insensitive matching.
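The same extract-and-sort logic can be tried without jQuery on a plain string. A sketch using made-up sample lines in the shape described above (the real page's markup may differ):

```javascript
// Made-up sample markup: one entry per line, each ending in a <br> tag.
const html = [
  "Intro text<br>",
  "Book A 120,000 copies.<br>",
  "Book B 95,500 copies.<br />",
  "Book C 1,250,000 copies.<br/>",
  "Footer<br>"
].join("\n");

// Same pattern as in the answer: global, multi-line, case-insensitive.
const targetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi;

// Reduce a matching line to its copy count, e.g. "1,250,000" -> 1250000.
const copyCount = (line) => Number(line.replace(targetLines, "$1").replace(/,/g, ""));

// match() with the g flag returns every matching line; sort them numerically.
const lines = html.match(targetLines).sort((a, b) => copyCount(a) - copyCount(b));
console.log(lines.join("\n"));
```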
<OLD ANSWER>
Once you've extracted just the text you want to look at (using DOM/jQuery), you can then pass it to the following function, which will put the relevant information into a format that can then be sorted:
function makeSortable(Text)
{
    // Mark sortable lines and put number before main content.
    Text = Text.replace
        ( /^(.*?)([\d,]+) copies\.<br \/>/gm
        , "SORT ME$2 $1"
        );
    // Remove anything not marked for sorting.
    Text = Text.replace( /^(?!SORT ME).*$/gm , '' );
    // Remove blank lines.
    Text = Text.replace( /\n{2,}/g , '\n' );
    // Remove sort token.
    Text = Text.replace( /SORT ME/g , '' );
    return Text;
}
You'll then need a sort function to ensure that the numbers are sorted correctly (the standard JS array.sort method sorts as text, and would put 100,000 before 20,000).
Oh, and here's a quick explanation of the regexes used here:
/^(.*?)([\d,]+) copies\.<br \/>/gm
/.../gm a regex with global-match and multi-line modes
^ matches start of line
(.*?) capture to $1, any char (except newline), zero or more times, as few as possible (lazy, so that $2 gets the whole number rather than just its last digit)
([\d,]+) capture to $2, any digit or comma, one or more times
copies literal text
\.<br \/> literal text, with . and / escaped (they would be special otherwise)
/^(?!SORT ME).*$/gm
/.../gm again, enable global and multi-line
^ match start of line
(?!SORT ME) a negative lookahead, fails the match if text 'SORT ME' is after it
.* any char (except newline), zero or more times
$ end of line
/\n{2,}/g
\n{2,} a newline character, two or more times
</OLD ANSWER>
You can start with something like this (just copy and paste it into the Firebug console):
// where are the things
var elem = document.getElementById("article").
    getElementsByTagName("span")[1].
    getElementsByTagName("span")[0];
// extract lines into array
var lines = [];
elem.innerHTML.replace(/.+?\d+\s+copies\.\s*<br>/g,
    function($0) { lines.push($0); });
// sort an array
// lines.sort(function(a, b) {
//     var ma = a.match(/(\d+),(\d+)\s+copies/);
//     var mb = b.match(/(\d+),(\d+)\s+copies/);
//
//     return parseInt(ma[1] + ma[2]) -
//            parseInt(mb[1] + mb[2]);
lines.sort(function(a, b) {
    function getNum(p) {
        return parseInt(
            p.match(/([\d,]+)\s+copies/)[1].replace(/,/g, ""), 10);
    }
    return getNum(a) - getNum(b);
});
// put it back
elem.innerHTML = lines.join("");
It's not clear to me what it is you're trying to do. When posting questions here, I encourage you to post (a part of) your actual data and clearly indicate what exactly you're trying to match.
But I'm guessing you know very little regex, in which case, why use regex at all? If you study the topic a bit, you will soon see that regex is not some magical tool that produces whatever you're thinking of. Regex cannot sort anything; it simply matches text, that's all.
Have a look at this excellent on-line resource: http://www.regular-expressions.info/
And if, after reading, you think a regex solution to your problem is appropriate, feel free to elaborate on your question, and I'm sure I or someone else will be able to give you a hand.
Best of luck.
