several regex queries - javascript

I need some help writing regex.
This is my first regex (match either English or Hebrew characters):
/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i
This should match: abc, אבג
This should not match: a, b, aא
It works OK; I just need to add a restriction requiring more than 1 character.
The next one should be exactly like the one above (including the more-than-1-character
restriction) but should also allow spaces.
This should match: abcx, abcx ascx, דגהק , שגד דשגב
This should not match: a , b, asaceדגעההת, ascasv אקיכרעקכ
The last regex should match only digits: exactly 10 digits,
starting with 05.
This should match: 0528547114
This should not match: digits, special chars, fewer or more than
10 digits.
I'm using JS and C# Regex.
Any help would be much appreciated.

To match more than 1 character use the quantifier {2,} instead of +:
/^(?:[\u0590-\u05FF\uFB1D-\uFB40]{2,}|[\w]{2,})$/i
To match space, add it in the character class:
/^(?:[\u0590-\u05FF\uFB1D-\uFB40 ]{2,}|[\w ]{2,})$/i
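To match 10 digits starting with 05, add an alternative (as a standalone check this would be /^05\d{8}$/; in the combined pattern below, [\w ]{2,} already accepts any run of digits, because \w includes digits):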
/^(?:[\u0590-\u05FF\uFB1D-\uFB40 ]{2,}|[\w ]{2,}|05\d{8})$/i
To match several words separated by one space:
/^(?:[\u0590-\u05FF\uFB1D-\uFB40]{2,}(?: [\u0590-\u05FF\uFB1D-\uFB40]{2,})*|\w{2,}(?: \w{2,})*|05\d{8})$/i
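For anyone who wants to check the three rules separately, here is a minimal sketch in Python rather than JS or C#. The names HEB, WORD, WORDS and PHONE are made up for illustration, and re.ASCII is an assumption added so that \w behaves like JavaScript's \w (in Python 3, \w would otherwise match Hebrew letters too).

```python
import re

# Hebrew ranges copied from the question's pattern.
HEB = r"\u0590-\u05FF\uFB1D-\uFB40"

# re.ASCII restricts \w and \d to ASCII, mirroring JavaScript's behaviour.
# One word of at least 2 characters, all Hebrew or all \w.
WORD = re.compile(rf"(?:[{HEB}]{{2,}}|\w{{2,}})", re.ASCII)

# Several such words separated by single spaces, without mixing scripts.
WORDS = re.compile(
    rf"(?:[{HEB}]{{2,}}(?: [{HEB}]{{2,}})*|\w{{2,}}(?: \w{{2,}})*)", re.ASCII
)

# Exactly 10 digits starting with 05.
PHONE = re.compile(r"05\d{8}", re.ASCII)


def matches(pattern, text):
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


print(matches(WORD, "abc"))          # single Latin word
print(matches(WORDS, "abcx ascx"))   # two Latin words
print(matches(PHONE, "0528547114"))  # phone-style number
```

Keeping the phone check as its own pattern avoids the overlap with \w noted above.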

Related

Regex to validate an alphanumeric string with optional space

I have been working on a regex for validating an alphanumeric string with the rules as below:
The first FOUR starting characters must be numbers and last
TWO characters must be alphabets.
The space is OPTIONAL but must be placed between two characters,
meaning trailing space is not allowed.
The length of postal code must be 6 characters if SPACE is
not included and 7 characters if space is included.
Eg.
1111 ZZ
111 1ZZ
1 111ZZ
1111ZZ
I tried using ^[0-9]{4}[A-Za-z]{2}$|^(?=[\d|\D]+ [\d|\D]+).{7}$ but this also validates 9999 1A as TRUE which should actually be FALSE.
Any leads or help will be appreciated :)
(?=^.{6,7}$)^(([0-9] ?){4}( ?[a-zA-Z]){2})$
will match
1111 ZZ
111 1ZZ
1 111ZZ
1111ZZ
1111 ZZ
but not
9999 1A
11111 Z
1111111
11 11 ZZ
https://regex101.com/r/lByOx6/1
EDIT: explanation
The "Positive Lookahead" part:
(?=^.{6,7}$) this only matches if the string meets the requirements, BUT it does not 'consume' the characters.
. is any character
{6,7} is about repetitions
so (?=^.{6,7}$) is matched if the string has 6 or 7 characters, no matter what
Then the following part already 'consumes' the string to say that I want at the start 4 repetitions of numbers and optionally space, and at the end 2 repetitions of letters and optionally space. The second part would accept strings such as 1 1 1 1 Z Z but as those are more than 7 characters, the first part wouldn't let the string match.
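A quick way to sanity-check the lookahead version is to run it against the examples from this thread. The sketch below uses Python's re module; the pattern is copied unchanged, and the helper name is_valid is just for illustration.

```python
import re

# Pattern copied from the answer; the lookahead limits the total length to 6-7.
POSTAL = re.compile(r"(?=^.{6,7}$)^(([0-9] ?){4}( ?[a-zA-Z]){2})$")


def is_valid(code):
    """Return True when the code satisfies the lookahead-based pattern."""
    return POSTAL.match(code) is not None


for good in ["1111 ZZ", "111 1ZZ", "1 111ZZ", "1111ZZ"]:
    assert is_valid(good), good

# "1 1 1 1 Z Z" passes the second part but is rejected by the length lookahead.
for bad in ["9999 1A", "11111 Z", "1111111", "11 11 ZZ", "1 1 1 1 Z Z"]:
    assert not is_valid(bad), bad
```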
I suggest simplifying the problem ahead of time, by reducing all white spaces, which you seem to be uninterested in anyway:
var candidate = input.replaceAll(/\s/mg, '');
Then the regex is simply: /^\d{4}[A-Za-z]{2}$/
If, however, you need to validate, that there actually are no leading or trailing white spaces, you can validate that ahead of time, and return a negative result right away.
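The strip-then-validate idea could look roughly like this in Python (the function name is_valid_postal is hypothetical). Note that, as written, this approach is more lenient than the original rules: it accepts any number of inner spaces, not just one.

```python
import re


def is_valid_postal(raw):
    """Validate after removing inner whitespace; reject leading/trailing whitespace."""
    # Check for leading or trailing whitespace ahead of time.
    if raw != raw.strip():
        return False
    # Drop all remaining whitespace, then apply the simple pattern.
    candidate = re.sub(r"\s", "", raw)
    return re.fullmatch(r"[0-9]{4}[A-Za-z]{2}", candidate) is not None


print(is_valid_postal("1111 ZZ"))  # one inner space
print(is_valid_postal(" 1111ZZ"))  # leading space
```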
Another option is to check if the string contains an optional space between the first and the last non whitespace character.
Then match the first digit followed by 3 digits separated by an optional space and 2 or 3 times a char a-zA-Z or a space.
Using a case insensitive match:
^(?=\S+ ?\S+$)\d(?: ?\d){3}[A-Z ]{2,3}$
Explanation
^ Start of string
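(?= Positive lookahead, assert that what is on the right is
\S+ ?\S+$ at most one space, placed between the first and the last non-whitespace char
) Close lookahead
\d(?: ?\d){3} Match a digit and repeat 3 times an optional space and a digit
[A-Z ]{2,3} Match 2-3 times either a char A-Z (a-z too, since the match is case insensitive) or a space. Note that this also accepts 1111ZZZ and 1111 Z; using  ?[A-Z] ?[A-Z] in its place requires exactly two letters.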
$ End of string

Regex-How can I filter out any grouping of 3 numbers? [duplicate]

I'm attempting to string match 5-digit coupon codes spread throughout a HTML web page. For example, 53232, 21032, 40021 etc... I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches 6, 7, 8... n digit numbers. Can someone please suggest how I would modify this regular expression to match only 5 digit numbers?
>>> import re
>>> s="four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']
If the codes can occur at the very beginning or the very end of the string, it's easier to pad the string than to handle those as special cases:
>>> re.findall(r"\D(\d{5})\D", " "+s+" ")
Without padding the string for the start-of-string and end-of-string special cases (as in John La Rooy's answer), you can use a negative lookbehind and a negative lookahead to handle both cases with a single regular expression:
>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
full string: ^[0-9]{5}$
within a string: [^0-9][0-9]{5}[^0-9]
Note: There is a problem with using \D: it has to match (and consume) an actual non-digit character, so it fails at the start or end of the string and when two numbers share a single separator. Use \b instead.
\b matches the empty string, but only at the beginning or end of a word, so it consumes nothing.
import re
input = "four digits 1234 five digits 56789 six digits 01234,56789,01234"
re.findall(r"\b\d{5}\b", input)
result : ['56789', '01234', '56789', '01234']
but if one uses
re.findall(r"\D(\d{5})\D", input)
output : ['56789', '01234']
\D cannot handle comma-separated or otherwise adjacent numbers: the comma consumed as the trailing \D of one match is no longer available as the leading \D of the next, and the last number has no character after it at all.
More documentation: https://docs.python.org/2/library/re.html
More Clarification on usage of \D vs \b:
This example uses \D but it doesn't capture all the five digits number.
This example uses \b while capturing all five digits number.
Cheers
A very simple way would be to match all groups of digits, like with r'\d+', and then skip every match that isn't five characters long when you process the results.
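A minimal sketch of that match-then-filter approach (the function name five_digit_codes is made up for illustration):

```python
import re


def five_digit_codes(text):
    """Collect every run of digits, then keep only runs of exactly five."""
    return [run for run in re.findall(r"\d+", text) if len(run) == 5]


s = "four digits 1234 five digits 56789 six digits 012345"
print(five_digit_codes(s))
```

Because \d+ always grabs the whole run of digits, a 6-digit number can never contribute a 5-digit match, and no special handling is needed at the start or end of the string.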
You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).
You could try
\D\d{5}\D
or maybe
\b\d{5}\b
I'm not sure how python treats line-endings and whitespace there though.
I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
I use a simpler expression:
re.findall(r"\d{5}", mystring)
It searches for 5 consecutive digits. But you have to be sure the string contains no longer runs of digits, because any 5 digits inside them would match too.

How to create Regex to filter out results with few complex conditions regarding length, case and classes of characters

I have the following filtered:
2 digits (?=..*\d)
2 lowercase characters (?=..*[a-z])
2 uppercase characters (?=..*[A-Z])
10 to 63 characters .{10,63}$
Which translates to:
(?=.{2,}\d)(?=..*[a-z])(?=..*[A-Z]).{10,63}
Then I want to exclude a word starting with the letter u, and ending with three to six digits:
([uU][0-9]{3,6})
However, how can I merge these two patterns to do the following:
It should not allow the following because it respectively:
# does not have the required combination of characters
aaaaaaaaaaaaaaa
# is too long
asadsfdfs12BDFsdfsdfdsfsdfsdfdsfdsfdfsdfsdfsdfsdsfdfsdfsdfssdfdfsdfssdfdfsdfssdfdfsdfsdfsdfsdfsfdsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfs
# contains the pattern that shouldn't be allowed
U0000ABcd567890
ABcd56U00007890
D4gF3U432234
D4gF3u432234
ABcd567890U123456
should allow the following:
# it has the required combination of characters
ABcd5678990
ABcd567890
# does contain a part of the disallowed pattern (`([uU][0-9]{3,6})`), but does not fit that pattern entirely
ABcd567890U12
ABcd5U12abcdf
s3dU00sDdfgdg
ABcd56U007890
I created an example here: https://regex101.com/r/4b2Hu9/3
In your pattern you use the lookahead (?=..*\d), which means something different from what you assume.
It asserts that what is directly to the right is one or more characters (any except a newline) followed by a digit, so it requires only a single digit. The same applies to the uppercase and lowercase variants.
You could update your pattern to:
^(?!.*[uU]\d{3,6})(?=(?:\D*\d){2})(?=(?:[^a-z]*[a-z]){2})(?=(?:[^A-Z]*[A-Z]){2}).{10,63}$
In parts
^ Start of string
(?!.*[uU]\d{3,6}) Negative lookahead, assert not u or U followed by 3-6 digits
(?=(?:\D*\d){2}) Assert 2 digits
(?=(?:[^a-z]*[a-z]){2}) Assert 2 lowercase chars
(?=(?:[^A-Z]*[A-Z]){2}) Assert 2 uppercase chars
.{10,63} Match any char except a newline 10-63 times
$ End of string
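To see how this pattern behaves on the examples from the question, here is a small check in Python. The pattern is copied as-is, and the helper name is_allowed is hypothetical.

```python
import re

PATTERN = re.compile(
    r"^(?!.*[uU]\d{3,6})"        # no u/U followed by 3 or more digits anywhere
    r"(?=(?:\D*\d){2})"          # at least 2 digits
    r"(?=(?:[^a-z]*[a-z]){2})"   # at least 2 lowercase chars
    r"(?=(?:[^A-Z]*[A-Z]){2})"   # at least 2 uppercase chars
    r".{10,63}$"                 # total length 10-63
)


def is_allowed(value):
    """Return True when the value passes all of the assertions."""
    return PATTERN.match(value) is not None


rejected = ["aaaaaaaaaaaaaaa", "aB1" * 30, "U0000ABcd567890",
            "ABcd56U00007890", "D4gF3U432234", "D4gF3u432234",
            "ABcd567890U123456"]
accepted = ["ABcd5678990", "ABcd567890", "ABcd567890U12",
            "ABcd5U12abcdf", "s3dU00sDdfgdg"]

assert not any(is_allowed(v) for v in rejected)
assert all(is_allowed(v) for v in accepted)
```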
First, the way to ensure that the string contains, for example, two digits would be to use a positive lookahead:
(?=.*\d.*\d)
You can generalize this to your other filters.
To make sure the string contains 10 - 63 characters:
.{10,63}
You say you do not want the string to begin with u or U followed by 3 to 6 digits (presumably 7 digits is okay). For that, use a negative lookahead:
(?![uU]\d{3,6}\D)
The \D is required to make sure that if there is a 7th digit, then the string will be accepted.
Putting it all together:
r'^(?=.*\d.*\d)(?=.*[a-z].*[a-z])(?=.*[A-Z].*[A-Z])(?![uU]\d{3,6}\D).{10,63}$'

How to match digit in middle of a string efficiently in javascript?

I have strings like
XXX-1234
XXXX-1234
XX - 4321
ABCDE - 4321
AB -5677
So there will be letters at the beginning, then a hyphen, and then 4 digits. The number of letters may vary, but the number of digits is always 4.
Now I need to match the first 2 positions from the digits. So I tried a long process.
temp_digit=mystring;
temp_digit=temp_digit.replace(/ /g,'');
temp_digit=temp_digit.split("-");
if(temp_digit[1].substring(0,2)=='12') {}
Is there a way to do this more efficiently with regex / pattern matching, something like string.match(regexp)? I'm not good with regex patterns. How can I find the first two of the 4 digits in the strings above? It would also be great if the solution could match digits without hyphens, like XXX 1234, but this is optional.
Try a regular expression that finds at least one letter [a-zA-Z]+, followed by some space if necessary \s*, followed by a hyphen -, followed by some more space if necessary \s*. It then captures the first two digits (\d{2}) after that prefix:
[a-zA-Z]+\s*-\s*(\d{2})
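Here is a sketch of the same idea in Python rather than JavaScript. Making the hyphen optional (-?) and requiring the two trailing digits are additions of mine to cover the optional "XXX 1234" case; the function name first_two_digits is hypothetical.

```python
import re

# Letters, optional spaces, optional hyphen, optional spaces, then 4 digits;
# only the first two digits are captured.
PREFIX = re.compile(r"[a-zA-Z]+\s*-?\s*(\d{2})\d{2}$")


def first_two_digits(s):
    """Return the first two of the four trailing digits, or None if no match."""
    m = PREFIX.search(s.strip())
    return m.group(1) if m else None


for sample in ["XXX-1234", "XXXX-1234", "XX - 4321", "ABCDE - 4321", "AB -5677"]:
    print(sample, "->", first_two_digits(sample))
```

The check from the question then becomes first_two_digits(mystring) == "12".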
"Number of letters may vary but number of digits are same = 4"
"Now I need to match the first 2 positions from the digits."
"Also it would be great if the solution can match digits without hyphens like XXX 1234 But this is optional."
Do you really need to check it starts with letters? How about matching ANY 4 digit number, and capturing only the first 2 digits?
Regex
/\b(\d{2})\d{2}\b/
Matches:
\b a word boundary
(\d{2}) 2 digits, captured in group 1, and assigned to match[1].
\d{2} 2 more digits (not captured).
\b a word boundary
Code
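var regex = /\b(\d{2})\d{2}\b/;
var str = 'ABCDE 4321';
// match() returns null when nothing matches, so guard before reading group 1
var match = str.match(regex);
var result = match ? match[1] : null;
document.body.innerText += result;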
If there are always 4 digits at the end, you can simply slice it:
str.trim().slice(-4,-2);
here's a jsfiddle with the example strings:
https://jsfiddle.net/mckinleymedia/6suffmmm/

Regex match second dot in number

I'm trying to match the second dot in a number to replace it later with a white space in my 'find and replace' function in Aptana.
I tried a lot of expressions, none of them worked for me.
For example I take the number:
48.454.714 (I want to replace the dot between 454 and 714)
Try this regex:
(\d{3})\.(\d{3})
and replace the match with the first and second capturing groups separated by a space: \1 \2
As mentioned by FiveO, you might want to match other numbers of digits too, e.g. one to 3 digits: \d{1,3}, or any number of digits: \d+
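Try the following regex:
(\d+\.\d+)\.(\d+)
And replace the match with \1 \2, so that only the second dot becomes a white space. (Replacing the whole match of \d+\.\d+(\.)\d+ with a white space would remove the digits as well.)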
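Outside of Aptana, the capture-group replacement can be tried out with a few lines of Python. This is only a sketch: the function name replace_second_dot is made up, and the ^ anchor is my addition so that only the second dot of the number is touched.

```python
import re


def replace_second_dot(number):
    """Replace the second dot in a dotted number with a space."""
    # Group 1 is everything up to (not including) the second dot, group 2 the
    # digits right after it; the dot between them is swapped for a space.
    return re.sub(r"^(\d+\.\d+)\.(\d+)", r"\1 \2", number)


print(replace_second_dot("48.454.714"))
```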
