I'm using this /[-\+,\.0-9]+/ to match numbers in strings like +4400,00 % or -3500,00 % or 0.00 %.
The match I want is +4400,00, and I correctly get it.
What if I wanted the same result for a string like +4.400,00 % (dot for thousands)?
EDIT
How do I have to modify my RegEx for matching numbers in strings like <font color="red">+44.500 %</font>?
/[\-\+]?\s*[0-9]{1,3}(\.[0-9]{3})*,[0-9]+/
That should cover strings that
may start with a + or -, and then perhaps some whitespace
then have between one and three digits
then have zero or more groups of three digits, each prefixed with a period
then have a comma and at least one digit after the comma
Regarding your additional question (matching numbers inside strings), you should look into the manual of whatever regex API you're using. Most APIs have separate search and match methods; match wants the whole string to be part of your regular expression's language, while search will also match substrings.
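As one concrete case of the search/match distinction, here is a minimal sketch using Python's re module. The NUMBER name and the sample text are only illustrative, and the pattern is the one above with a non-capturing group; other regex APIs name these methods differently.

```python
import re

# Pattern from the answer above; (?:...) is used so the group is not captured.
NUMBER = re.compile(r"[-+]?\s*[0-9]{1,3}(?:\.[0-9]{3})*,[0-9]+")

text = '<font color="red">+4.400,00 %</font>'

# search() scans the string and returns the first substring that matches.
found = NUMBER.search(text)
print(found.group())              # +4.400,00

# fullmatch() succeeds only if the whole string is a number.
print(NUMBER.fullmatch(text))     # None
print(NUMBER.fullmatch("+4.400,00") is not None)  # True
```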
[\+-]? - plus or minus
\d{1,3} - one to three digits
(\.\d{3})* - zero or more groups of 3 digits, each preceded by a point
,\d{2} - comma and 2 more digits
And so we get:
/[+-]?\d{1,3}(\.\d{3})*,\d{2}/
Your regex will already match ".". But it sounds like you also want to strip "." out? If that's the case, you need a substitution. In Perl,
if ($input =~ /(-|\+)[0-9][,\.0-9]+/) {
    $input =~ s/\.//g;   # /g removes every ".", not just the first
} else {
    die;
}
I've also changed the regex so it will only match - and + at the start of the number, and so that it requires an initial digit.
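The same idea can be sketched in Python. The function name strip_thousands_dots is hypothetical, and the sketch assumes the number sits at the very start of the input, because re.match only looks there (the Perl test above is unanchored).

```python
import re

def strip_thousands_dots(value):
    """Remove every '.' from a signed number such as '+4.400,00 %'."""
    # Same shape as the Perl pattern: a sign, a digit, then digits/commas/dots.
    # Assumption: the number starts the string (re.match anchors at position 0).
    if re.match(r"[-+][0-9][,.0-9]+", value) is None:
        raise ValueError(f"not a signed number: {value!r}")
    # str.replace removes all occurrences, like a global substitution.
    return value.replace(".", "")

print(strip_thousands_dots("+4.400,00 %"))      # +4400,00 %
print(strip_thousands_dots("-1.234.567,00 %"))  # -1234567,00 %
```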
I have a couple of regexes which I am planning to combine.
So the first regex is as below (allows amounts with particular thousand and decimal separators)
"^-?(\\d+|\\d{1,3}(,\\d{3})*)?(\\.(\\d+)?)?$"
I have similar other regexes (based on different locales e.g. other one would have comma as the decimal separator)
So with the above regex, following are Valid/Invalid values
123.11 (Valid)
1'23 (Invalid)
With the second regex, I want that the string can contain a max of 13 digits (including before or after the decimal)
^[^\\d]*?(\\d|\\d[^\\d]+){0,13}$
With the above regex, following are Valid/Invalid values
1234567890123 (Valid - 13 digits)
12345678901234 (Invalid - 14 digits)
1234567890.123 (Valid as 13 digits...10.3)
1234567890.1234 (Invalid as 14 digits...10.4)
Is it possible to somehow consolidate the 2 regex?
However, I do not want to touch the first regex (have different combinations based on different locales). But it would be nice to somehow dynamically append the 2nd regex into the first one ?
So, I am flexible with the 2nd regex as that is not based on any locale, but is going to be the same always and mainly validates for max of 13 digits in the string.
I'll then validate my string using the consolidated regex.
You may keep the first pattern as is, and just prepend it with
(?=^\D*(?:\d\D*){0,13}$)
The (?=^\D*(?:\d\D*){0,13}$) pattern represents a positive lookahead that matches a location that is immediately followed with
^ - start of string
\D* - 0+ non-digits
(?:\d\D*){0,13} - 0 to 13 occurrences of a digit followed by 0+ non-digit chars
$ - end of string.
Full JavaScript regex definition:
var regex1 = "^-?(\\d+|\\d{1,3}(,\\d{3})*)?(\\.(\\d+)?)?$"; // Not to be touched
var consolidated_regex = "(?=^\\D*(?:\\d\\D*){0,13}$)" + regex1;
See full regex demo.
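For readers who want to try the idea outside JavaScript, here is a minimal Python sketch of the same prepend-a-lookahead approach. The names DIGIT_LIMIT and is_valid are illustrative only; regex1 is the question's pattern, left untouched.

```python
import re

# Locale-specific pattern from the question, left untouched.
regex1 = r"^-?(\d+|\d{1,3}(,\d{3})*)?(\.(\d+)?)?$"

# Locale-independent check: at most 13 digits anywhere in the string.
DIGIT_LIMIT = r"(?=^\D*(?:\d\D*){0,13}$)"

consolidated = re.compile(DIGIT_LIMIT + regex1)

def is_valid(value):
    # The lookahead consumes nothing, so regex1 still sees the whole string.
    return consolidated.match(value) is not None

for sample in ["123.11", "1'23", "1234567890123", "12345678901234",
               "1234567890.123", "1234567890.1234"]:
    print(sample, is_valid(sample))
```

Because the lookahead is zero-width, the same prefix can be put in front of each locale-specific pattern without editing it.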
I'm attempting to string match 5-digit coupon codes spread throughout a HTML web page. For example, 53232, 21032, 40021 etc... I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches 6, 7, 8... n digit numbers. Can someone please suggest how I would modify this regular expression to match only 5 digit numbers?
>>> import re
>>> s="four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']
If they can occur at the very beginning or the very end, it's easier to pad the string than to mess with special cases:
>>> re.findall(r"\D(\d{5})\D", " "+s+" ")
Without padding the string to handle the special cases of start and end of string, as in John La Rooy's answer, one can use negative lookahead and lookbehind to handle both cases with a single regular expression:
>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
full string: ^[0-9]{5}$
within a string: [^0-9][0-9]{5}[^0-9]
Note: There is a problem with using \D: it matches any character that is not a digit, and it consumes that character, so two numbers separated by a single non-digit cannot both match. Use \b instead.
\b is important here because it matches the empty string, but only at the beginning or end of a word.
import re
text = "four digits 1234 five digits 56789 six digits 01234,56789,01234"
re.findall(r"\b\d{5}\b", text)
result: ['56789', '01234', '56789', '01234']
But if one uses
re.findall(r"\D(\d{5})\D", text)
output: ['56789', '01234']
\D is unable to handle numbers separated by a single comma, or a number at the very end of the string.
More documentation: https://docs.python.org/2/library/re.html
More clarification on the usage of \D vs \b:
This example uses \D, but it doesn't capture all the five-digit numbers.
This example uses \b and captures all the five-digit numbers.
Cheers
A very simple way would be to match all groups of digits, like with r'\d+', and then skip every match that isn't five characters long when you process the results.
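A minimal Python sketch of that filter-afterwards approach; the function name five_digit_codes is made up for illustration.

```python
import re

def five_digit_codes(text):
    """Return every maximal run of digits that is exactly five digits long."""
    # \d+ is greedy, so each match is a complete run of digits;
    # longer or shorter runs are simply dropped by the length check.
    return [run for run in re.findall(r"\d+", text) if len(run) == 5]

s = "four digits 1234 five digits 56789 six digits 012345"
print(five_digit_codes(s))  # ['56789']
```

This needs no special handling for codes at the very start or end of the string, at the cost of a second pass over the matches.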
You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).
You could try
\D\d{5}\D
or maybe
\b\d{5}\b
I'm not sure how Python treats line endings and whitespace there, though.
I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
I use a simpler expression:
re.findall(r"\d{5}", mystring)
It will find 5 consecutive digits. But you have to be sure the string contains no longer runs of digits, because the first five digits of a longer number will also match.
I have the following specifications for a regex:
-> The string starts with a string of three numbers
-> It is followed by a '-'
-> That is followed by three uppercase vowels
-> That is followed by a '-'
-> That is followed by three numbers
-> That is followed by a final '-'
-> That is followed by the last three uppercase vowels.
-> Second set of numbers can not equal the first.
-> The second group of letters can not equal the first.
-> The groups of numbers may not contain zero.
A passable string is:
368-IOU-789-AIO.
Invalid strings are:
368-AEO-368-AEI
354-AOU-431-AOU
Currently, I have something like this:
([0-9]+[0-9]+[0-9]+[/AEIOU/]+[0-9]+[0-9]+[0-9])
What you have won't work since + means "one or more of". For example, the sequence [0-9]+[0-9]+[0-9]+ will match anywhere between three and an infinite number of digits.
In addition, your current attempt:
allows for one to an infinite number of vowels (and possibly / character);
doesn't require a vowel set at the end;
doesn't require the - separators; and
may allow for arbitrary content before and after the match.
You should be able to use the {count} specifier to get an exact quantity. All but one of those limitations can be done with any basic regex engine, with something like:
^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$
The ^ and $ anchors mean start and end of string, [1-9]{3} gives you exactly three non-zero digits, [AEIOU]{3} gives you exactly three vowels, and - gives you the literal separator character.
The "groups cannot be identical" rule is a little more problematic. I would just post process for that to ensure it's not violated. The following pseudo-code is what I mean:
def isValid(str):
    if not str.regex_match("^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$"):
        return false
    return str[0..2] != str[8..10]
       and str[4..6] != str[12..14]
The alternative will be a rather complex regex that future developers will probably curse you for inflicting on them :-)
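As a runnable counterpart to the pseudo-code, here is a minimal Python sketch of the regex-plus-post-processing approach. The names PATTERN and is_valid are illustrative, and it assumes the "no zero digits at all" reading of the requirement.

```python
import re

# Shape check only: three non-zero digits, three uppercase vowels, twice.
PATTERN = re.compile(r"[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}")

def is_valid(code):
    # fullmatch() plays the role of the ^ and $ anchors.
    if PATTERN.fullmatch(code) is None:
        return False
    # The shape is known to be correct, so the split yields exactly four groups.
    first_num, first_vowels, second_num, second_vowels = code.split("-")
    return first_num != second_num and first_vowels != second_vowels

print(is_valid("368-IOU-789-AIO"))  # True
print(is_valid("368-AEO-368-AEI"))  # False: number groups are equal
print(is_valid("354-AOU-431-AOU"))  # False: vowel groups are equal
```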
Note that your "The groups of numbers may not contain zero" is a little ambiguous in that it may mean no zeros are allowed or just that 000 is not allowed. I've assumed the former, but it's easily adjustable to cater for the latter:
def isValid(str):
    if not str.regex_match("^[0-9]{3}-[AEIOU]{3}-[0-9]{3}-[AEIOU]{3}$"):
        return false
    return str[0..2] != str[8..10]
       and str[4..6] != str[12..14]
       and str[0..2] != "000"
       and str[8..10] != "000"
You can use capture group and backreferences
^([0-9]{3})-([AEIOU]{3})-(?:(?!\1)[0-9]){3}-(?:(?!\2)[AEIOU]){3}$
Regex Demo
You wrote that the groups of numbers may not contain zero. If you meant that only the digits 1 to 9 are allowed, you can replace [0-9] with [1-9].
If you don't want to have 000 then you can add a negative lookahead ^(?!.*000) to avoid matching 000
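A quick way to check the backreference version is a short Python sketch. It uses the pattern above with [1-9] substituted for [0-9] (the "no zero digits" reading); the name CODE is illustrative.

```python
import re

# (?!\1) rejects a second number group equal to the first captured group;
# (?!\2) does the same for the vowel groups.
CODE = re.compile(
    r"^([1-9]{3})-([AEIOU]{3})-(?:(?!\1)[1-9]){3}-(?:(?!\2)[AEIOU]){3}$"
)

for sample in ["368-IOU-789-AIO", "368-AEO-368-AEI", "354-AOU-431-AOU"]:
    print(sample, CODE.match(sample) is not None)
```

The first sample matches; the other two fail on the repeated number group and the repeated vowel group respectively.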
This is from an exercise on FCC beta, and I cannot understand how the following code means two consecutive numbers. \D* means zero or more non-digits and \d means a digit, so how does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't really match anything except an empty string, as there is nothing preceding the lookahead.
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to do a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match any digit which is followed by zero or more non-digits and then another digit. A few examples (matched digits marked with ^):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put brackets around both numbers. Ie:
(\d)(?=\D*(\d))
Here it is in action.
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d character. This means that your lookaheads will actually match something which isn't just an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
This matches a string with 5 or more word characters, including at least 1 digit. It can match two digits, but it can also match 1 digit, 3 digits, 12 digits, etc. In order to require two consecutive digits in a string of 5-or-more characters, you should specify the number of digits you want in the second lookahead (note that \d{2} here means at least two digits in a row, since nothing stops more digits from following):
let regex = /(?=\w{5,})(?=\D*\d{2})/;
let string1 = "abcd2";
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1));
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1));
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2));
My original answer was about Regex in a vacuum, and I glossed over the fact that you were using Regex in conjunction with JavaScript, which works a little differently when testing a Regex against a string. I still don't know why your original regex was supposed to match two numbers, but I hope this is a bit more helpful.
?= Positive lookahead
\w{5,} matches any word character (equal to [a-zA-Z0-9_])
{5,} matches between 5 and unlimited times
\D* matches any character that's not a digit (equal to [^0-9])
* matches between zero and unlimited times
\d matches a digit (equal to [0-9])
This expression is global - so tries to match all
You can always check your expression using regex101
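The same behaviour can be reproduced in Python as a rough cross-check. The names below are illustrative, and re.search is used as the closest counterpart to JavaScript's unanchored test().

```python
import re

# Both lookaheads are zero-width conditions checked at the same position.
at_least_one_digit = re.compile(r"(?=\w{5,})(?=\D*\d)")
two_digits_in_a_row = re.compile(r"(?=\w{5,})(?=\D*\d{2})")

def passes(pattern, text):
    # search() tries every starting position, like JavaScript's test().
    return pattern.search(text) is not None

print(passes(at_least_one_digit, "abcd2"))    # True
print(passes(two_digits_in_a_row, "abcd2"))   # False
print(passes(two_digits_in_a_row, "abcd23"))  # True

# Lookaheads consume nothing, so the matched text itself is empty.
print(repr(at_least_one_digit.search("abcd2").group()))  # ''
```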
Finding a specific string is relatively easy, but I am not sure where to begin on this one. I would need to extract a string that would be different every time, but with similar characteristics.
Here are some example strings I need to find in a paragraph, either at the beginning, end or somewhere in the middle.
7b.9t.7iv.4x
4ir.4i.5i.6t
7ix.7t.4t.0z
As you can see the string will always begin with a number, and would have up to 2 characters after it and will always contain 4 octets separated by dots.
Let me know if you may need more details.
EDIT:
Thanks to the answer below I came up with this; while not pretty, it does what I need.
$body="test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test ";
$var=preg_match_all("([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})",$body,$matches);
$count=count($matches[0]);
$stack = array();
while($count > 0){
    $count--;
    array_push($stack, "<span id='ip_".$matches[0][$count]."'>".$matches[0][$count]."</span>");
}
$stack=array_reverse($stack);
$body=str_replace($matches[0],$stack,$body);
You can use a regular expression.
Something like this to get you started. There may be a better way to match since it's repeated, but....
([0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2}\.[0-9][a-z]{1,2})
( Start a capture group
[0-9] match any character 0 through 9
[a-z] match any character a through z
{1,2} but only match the previous 1 or 2 times
\. match a literal . the \ is needed as an escape because . is a special character
) End capture group
Both php and javascript allow for regular expression use.
For an even better visual representation you can check out this tool: http://www.debuggex.com/
If you need each octet by itself (as a match) you can add more parentheses () around each [0-9][a-z]{1,2}, which will then store those octets individually.
Also note that \d is the same as [0-9], but I prefer the latter as I find it a little more readable.
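On the "there may be a better way since it's repeated" point, one option is to write the octet once and repeat the dot-plus-octet part three times. A minimal Python sketch, with illustrative names OCTET and ADDRESS and the sample body taken from the question:

```python
import re

# One octet: a digit followed by one or two lowercase letters.
OCTET = r"[0-9][a-z]{1,2}"

# First octet, then exactly three more, each preceded by a literal dot.
ADDRESS = re.compile(OCTET + r"(?:\." + OCTET + r"){3}")

body = "test 1f.9t.7iv.4x test 1a.9a.7ab.4xa test "
print(ADDRESS.findall(body))  # ['1f.9t.7iv.4x', '1a.9a.7ab.4xa']
```

The group is non-capturing, so findall returns the whole matched string rather than only the last octet.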