Match all number occurrences without any letter following or preceding - javascript

What I'm trying to accomplish:
Assuming an example string:
1 this is a1 my test 1a 12 string 12.123 whatever 1
I would like to have a Regex, that would give me all the occurrences of numbers (floats included), but I want it to skip the number if a letter (or more generally: non-number) precedes or follows it. So a1 and 1a would not match.
I've been struggling with this for a while, I got to this point (not ideal, because it also catches the preceding space):
/(^|\s)\d*\.*\d+/g
But this will also catch the 1a instance... I could also set up something similar, that would skip 1a, but would catch a1...
Can I accomplish this using regex matching?

You can use word boundaries with this regex:
/(?:\.\d+|\b\d+(?:\.\d+)?)\b/g
RegEx Demo
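As a quick check, the word-boundary pattern can be run against the example string from the question. This is only a sketch; the variable names are made up for illustration.

```javascript
// Example string from the question.
const sample = '1 this is a1 my test 1a 12 string 12.123 whatever 1';

// Word-boundary pattern from the answer above.
const numberPattern = /(?:\.\d+|\b\d+(?:\.\d+)?)\b/g;

// match() with the g flag returns every full match (or null if there are none).
const found = sample.match(numberPattern) || [];
console.log(found); // [ '1', '12', '12.123', '1' ]
```

Keep in mind that \b also sees a boundary between a digit and punctuation, so an input such as 1,2 would be reported as two separate numbers.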

This is not a regex-only answer but maybe that's a good thing, we'll see.
The regex in use here is /^[-+]?(?:\d+(?:\.\d*)?|\d*\.\d+)(?:e\d+)?$/:
var testStr = '.1 this is a1 my test +5 1a 12 string -2.4 12.123 whatever . .02e1 5e5.4 1 1.4e5 1.2.3';
// Each whitespace-separated word must be a number from start to end.
var numberRe = /^[-+]?(?:\d+(?:\.\d*)?|\d*\.\d+)(?:e\d+)?$/;
var words = testStr.trim().split(/\s+/);
console.log('matches');
console.log(...words.filter(word => numberRe.test(word)));
console.log('mismatches');
console.log(...words.filter(word => !numberRe.test(word)));

For a simple but not comprehensive solution (assuming only the numeric formats used in the given example string: no negative numbers, scientific notation, etc.), try this:
\d*\.*\d+
It removes the (^|\s) group from the regex you developed, which is what matched the preceding space.
\d* matches the leading digits [0-9], if any.
\.*\d+ then matches the rest of the number, including the fractional part of a float (an optional decimal point followed by digits [0-9]).
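One caveat worth checking: since this pattern has no boundary conditions, it appears to also pick up the digits inside a1 and 1a, which the question wanted to skip. A minimal sketch (variable names are illustrative):

```javascript
// Example string from the question.
const sample = '1 this is a1 my test 1a 12 string 12.123 whatever 1';

// The simplified pattern, applied globally.
const loose = sample.match(/\d*\.*\d+/g) || [];

// The second and third '1' come from "a1" and "1a".
console.log(loose); // [ '1', '1', '1', '12', '12.123', '1' ]
```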

Try this expression: (?<=^|\s)[-+]?(?:[0-9]*\.[0-9]+|[0-9]+)(?=$|\s) - Regex demo
JavaScript supported: (?:^|\s)([-+]?(?:[0-9]*\.[0-9]+|[0-9]+))(?=$|\s) - Regex demo
This expression supports floating-point numbers with an optional sign.
JavaScript did not support lookbehind at the time (it was added in ES2018), so it was replaced by a non-capturing group. The numbers are captured by the 1st group.
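Because the number ends up in group 1 rather than in the full match, the groups have to be collected explicitly. The following is a sketch that assumes an environment with String.prototype.matchAll (ES2020); the variable names are illustrative.

```javascript
// Example string from the question.
const sample = '1 this is a1 my test 1a 12 string 12.123 whatever 1';

// JavaScript-compatible pattern from the answer above, with the g flag added.
const pattern = /(?:^|\s)([-+]?(?:[0-9]*\.[0-9]+|[0-9]+))(?=$|\s)/g;

// matchAll() yields one match object per hit; index 1 is the first capturing group.
const numbers = Array.from(sample.matchAll(pattern), (m) => m[1]);
console.log(numbers); // [ '1', '12', '12.123', '1' ]
```

The trailing condition is a lookahead, so the space after one number is still available as the leading (?:^|\s) of the next.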

Related

Regex-How can I filter out any grouping of 3 numbers? [duplicate]

I'm attempting to string match 5-digit coupon codes spread throughout an HTML web page. For example, 53232, 21032, 40021 etc... I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches 6, 7, 8... n digit numbers. Can someone please suggest how I would modify this regular expression to match only 5 digit numbers?
>>> import re
>>> s="four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']
If they can occur at the very beginning or the very end, it's easier to pad the string than to mess with special cases:
>>> re.findall(r"\D(\d{5})\D", " "+s+" ")
Without padding the string for the special cases of start and end of string, as in John La Rooy's answer, one can use negative lookahead and lookbehind to handle both cases with a single regular expression:
>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
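For readers following the JavaScript threads on this page, the same idea can be sketched in JavaScript. This assumes an engine with lookbehind support (ES2018 or later); the variable names are illustrative.

```javascript
// Same sample text as the Python session above.
const text = '88888 999999 3333 aaa 12345 hfsjkq 98765';

// Exactly five digits, with no digit directly before or after.
const fiveDigits = text.match(/(?<!\d)\d{5}(?!\d)/g) || [];
console.log(fiveDigits); // [ '88888', '12345', '98765' ]
```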
full string: ^[0-9]{5}$
within a string: [^0-9][0-9]{5}[^0-9]
Note: There is a problem with using \D: it matches (and consumes) a character that is not a digit. Use \b instead.
\b is important here because it matches the empty string, but only at the beginning or end of a word.
import re
input = "four digits 1234 five digits 56789 six digits 01234,56789,01234"
re.findall(r"\b\d{5}\b", input)
result : ['56789', '01234', '56789', '01234']
but if one uses
re.findall(r"\D(\d{5})\D", input)
output : ['56789', '01234']
\D cannot handle numbers separated by a single comma, because the comma consumed at the end of one match is no longer available to start the next one. It also misses a number at the very end of the string.
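The difference between \D and \b can also be sketched in JavaScript, assuming String.prototype.matchAll is available (ES2020). The variable names are illustrative.

```javascript
// Same input as the Python example above.
const input = 'four digits 1234 five digits 56789 six digits 01234,56789,01234';

// \D consumes the neighbouring characters, so adjacent codes are skipped.
const withNonDigit = Array.from(input.matchAll(/\D(\d{5})\D/g), (m) => m[1]);

// \b is zero-width, so every five-digit code is found.
const withBoundary = input.match(/\b\d{5}\b/g) || [];

console.log(withNonDigit); // [ '56789', '01234' ]
console.log(withBoundary); // [ '56789', '01234', '56789', '01234' ]
```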
More documentation: https://docs.python.org/2/library/re.html
More clarification on the usage of \D vs \b:
This example uses \D, but it does not capture all of the five-digit numbers.
This example uses \b and captures all of the five-digit numbers.
Cheers
A very simple way would be to match all groups of digits, like with r'\d+', and then skip every match that isn't five characters long when you process the results.
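A minimal JavaScript sketch of this match-then-filter approach (the sample text and variable names are illustrative):

```javascript
const text = 'four digits 1234 five digits 56789 six digits 012345';

// Step 1: collect every run of digits, whatever its length.
const runs = text.match(/\d+/g) || [];

// Step 2: keep only the runs that are exactly five characters long.
const codes = runs.filter((run) => run.length === 5);
console.log(codes); // [ '56789' ]
```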
You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).
You could try
\D\d{5}\D
or maybe
\b\d{5}\b
I'm not sure how Python treats line endings and whitespace there, though.
I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
I use a simpler regex:
re.findall(r"\d{5}", mystring)
It will search for 5 consecutive digits. But you have to be sure the string contains no longer runs of digits, since any 5 digits inside them would match too.

String replace with regexp overwrites non matching character

The idea is replacing in a string all decimal numbers without a digit before the decimal point with the zero so .03 sqrt(.02) would become 0.03 sqrt(0.02).
See the code below for a sample, the problem is that the replacement overwrites the opening parenthesis when there's one preceding the decimal point. I think that the parenthesis does not pertain to the matching string, does it?
let s='.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?:^|\D)\.(\d+)/g , "0.$1");
console.log(s)
Make your initial group capturing, not non-capturing, and use it in the replacement:
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
// ^---- capturing, not non-capturing
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
console.log(s)
I think that the parenthesis does not pertain to the matching string, does it?
It does, because it matches [^\d].
Side note: As Wiktor points out, you can use \D instead of [^\d].
Side note 2: JavaScript regexes are finally getting lookbehind (in the living specification, and will be in the ES2018 spec snapshot), so an alternate way to do this with modern JavaScript environments would be a negative lookbehind:
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
// ^^^^^^^--- negative lookbehind for a digit
That means basically "If there's a digit here, don't match." (There's also positive lookbehind, (?<=...).)
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
console.log(s)
A parenthesis is a non-digit, thus it is matched by [^\d] and removed.
The solution is to match and capture the part before a dot and then insert back using a replacement backreference.
Use
.replace(/(^|\D)\.(\d+)/g , "$10.$2")
See the regex demo.
Pattern details
(^|\D) - Capturing group 1 (later referred to with $1 from the replacement pattern): a start of string or any non-digit ([^\d] = \D)
\. - a dot
(\d+) - Capturing group 2 (later referred to with $2 from the replacement pattern): 1+ digits.
See the JS demo:
let s='.05 sqrt(.005) another(.33) thisShouldnt(a.b) neither(3.4)'
s=s.replace(/(^|\D)\.(\d+)/g , "$10.$2");
console.log(s)
Note that $10.$2 is parsed as the $1 backreference, then the literal text 0., then the $2 backreference. Since the pattern has only 2 capturing groups, there is no group 10, so $10 is not treated as a valid token in the replacement pattern.
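If relying on how $10 is parsed feels fragile, a replacement function is one alternative, since the groups then arrive as separate arguments. The sketch below compares both forms; the parameter names prefix and digits are just illustrative.

```javascript
const source = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)';
const pattern = /(^|\D)\.(\d+)/g;

// Replacement string: "$10.$2" is read as $1, then "0.", then $2.
const viaString = source.replace(pattern, '$10.$2');

// Replacement function: group 1 and group 2 are passed in after the full match.
const viaFunction = source.replace(pattern, (match, prefix, digits) => `${prefix}0.${digits}`);

console.log(viaString);
console.log(viaString === viaFunction); // true
```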

How does the following code mean two consecutive numbers?

This is from an exercise on FCC beta, and I cannot understand how the following code means two consecutive numbers, seeing how \D* means zero or more non-digits and \d means a digit. How does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't really match anything except an empty string, as the pattern consists only of lookaheads and nothing precedes them.
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to use a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match any digit that is followed by zero or more non-digits and then another digit. A few examples (matched digits marked with ^):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put parentheses around both digits, i.e.:
(\d)(?=\D*(\d))
Here it is in action.
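A small sketch of the two-group pattern in JavaScript, assuming String.prototype.matchAll is available (ES2020); the variable names are illustrative:

```javascript
// Group 1 is the matched digit; group 2 is the next digit found by the lookahead.
const pattern = /(\d)(?=\D*(\d))/g;

const pairs = Array.from('2v jugn45'.matchAll(pattern), (m) => [m[1], m[2]]);
console.log(pairs); // [ [ '2', '4' ], [ '4', '5' ] ]
```

The second digit is captured inside a lookahead, so it is not consumed and can itself become the first digit of the next match.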
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d. This means that your lookaheads are attached to an actual character, so the match is no longer just an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
Your original regex matches a string that has a run of 5 or more word characters and contains at least 1 digit. It can match two digits, but it can also match 1 digit, 3 digits, 12 digits, etc. To require at least two consecutive digits in a string of 5 or more characters, specify the number of digits you want in the second lookahead:
let regex = /(?=\w{5,})(?=\D*\d{2})/;
let string1 = "abcd2";
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1));
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1));
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2));
My original answer was about Regex in a vacuum and I glossed over the fact that you were using Regex in conjunction with JavaScript, which works a little differently when comparing Regex to a string. I still don't know why your original regex was supposed to match two numbers, but I hope this is a bit more helpful.
(?=...) - positive lookahead
\w{5,} - matches any word character (equal to [a-zA-Z0-9_])
{5,} - matches between 5 and unlimited times
\D* - matches any character that's not a digit (equal to [^0-9])
* - matches between zero and unlimited times
\d - matches a digit (equal to [0-9])
This expression is global - so tries to match all
You can always check your expression using regex101
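To see what a lookahead-only pattern actually returns, here is a short sketch using exec(); the test strings are made up for illustration.

```javascript
const checkPass = /(?=\w{5,})(?=\D*\d)/;

// Both lookaheads succeed at index 0, but neither consumes any characters.
const hit = checkPass.exec('abcd2');
console.log(hit[0] === '', hit.index); // true 0

// No digit anywhere, so the second lookahead fails.
const noDigit = checkPass.exec('abcde');
console.log(noDigit); // null

// Fewer than 5 word characters, so the first lookahead fails.
const tooShort = checkPass.exec('ab1');
console.log(tooShort); // null
```

This is why test() is the natural way to use such a pattern: the match itself is always an empty string.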

Regex not matching 6 repeated numbers

I am trying to get a regular expression to work but am stumped. What I want is to do the inverse of this:
/(\w)\1{5,}/
This regex does the exact opposite of what I'm trying to do. I would like to get everything but a string that has 6 repeating numbers i.e. 111111 or 999999.
Is there a way to use a negative look-around or something with this regex?
You can use this regex:
/^(?!.*?(\w)\1{5}).*$/gm
RegEx Demo
(?!.*?(\w)\1{5}) is a negative lookahead that fails the match if the input contains 6 consecutive identical word characters.
I'd rather go with the \d shorthand class for digits since \w also allows letters and an underscore.
^(?!.*(\d)\1{5}).*$
Regex explanation:
^ - Start of string/line anchor
(?!.*(\d)\1{5}) - The negative lookahead checking if after an optional number of characters (.*) we have a digit ((\d)) that is immediately followed with 5 identical digits (\1{5}).
.* - Match 0 or more characters up to the
$ - End of string/line.
See demo. This regex will allow
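The \d variant of the negative lookahead can be tried out with a few sample inputs. This is a sketch only; the test strings are invented for illustration.

```javascript
// Rejects any string that contains the same digit six times in a row.
const noSixRepeats = /^(?!.*(\d)\1{5}).*$/;

const results = ['order 111111', '111112', '12345'].map((s) => noSixRepeats.test(s));

// '111112' passes because it only has five identical digits in a row.
console.log(results); // [ false, true, true ]
```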

Javascript positive lookbehind alternative

So, JS apparently doesn't support lookbehind.
What I want is a regex valid in javascript that could mimic that behavior.
Specifically, I have a string that consists of numbers and hyphens to denote a range. As in,
12 - 23
12 - -23
-12 - 23
-12 - -23
Please ignore the spaces. These are the only cases possible, with different numbers, of course.
What I want is to match the first hyphen that separates the numbers and is not a minus sign. In other words, the first hyphen preceded by a digit. But the digit shouldn't be part of the match.
So my strings are:
12-23
12--23
-12-23
-12--23
And the match should be the 3rd character in the 1st 2 cases and the 4th character in the last two.
The single regex I need is expected to match the character in brackets.
12(-)23
12(-)-23
-12(-)23
-12(-)-23
This can be achieved using positive lookbehind :
(?<=[0-9])\-
But javascript doesn't support that. I want a regex that essentially does the same thing and is valid in js.
Can anyone help?
I don't know why you want to match the delimiting hyphen, instead of just matching the whole string and capture the numbers:
input.match(/(-?\d+) *- *(-?\d+)/)
The 2 numbers will be in capturing group 1 and 2.
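A short sketch of this capture-the-numbers approach, assuming every input is a valid range as described in the question (the variable names are illustrative):

```javascript
const inputs = ['12 - 23', '12 - -23', '-12 - 23', '-12 - -23'];

const ranges = inputs.map((input) => {
  // Assumes the input is valid, so match() never returns null here.
  const m = input.match(/(-?\d+) *- *(-?\d+)/);
  return [Number(m[1]), Number(m[2])];
});

console.log(ranges); // [ [ 12, 23 ], [ 12, -23 ], [ -12, 23 ], [ -12, -23 ] ]
```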
It is possible to write a regex which works for sanitized input (no space, and guaranteed to be valid as shown in the question) by using \b to check that - is preceded by a word character:
\b-
Since the only word characters in the sanitized string is 0-9, we are effectively checking that - is preceded by a digit.
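To compare the \b workaround with the lookbehind from the question, the sketch below finds the index of the separating hyphen both ways. The lookbehind line assumes an engine with ES2018 lookbehind support; the variable names are illustrative.

```javascript
const inputs = ['12-23', '12--23', '-12-23', '-12--23'];

// search() returns the index of the first match.
const viaBoundary = inputs.map((s) => s.search(/\b-/));
const viaLookbehind = inputs.map((s) => s.search(/(?<=[0-9])-/));

console.log(viaBoundary);   // [ 2, 2, 3, 3 ]
console.log(viaLookbehind); // [ 2, 2, 3, 3 ]
```

Indexes are zero-based, so 2 and 3 correspond to the 3rd and 4th characters mentioned in the question.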
(\d+.*?)(?:\s+(-)\s+)(.*?\d+)
You probably want this, though I don't know why there is a difference between the expected output of the 2nd and 4th cases. Probably it's a typo. You can try replacing with $1$2$3. See demo.
http://regex101.com/r/yR3mM3/26
var re = /(\d+.*?)(?:\s+(-)\s+)(.*?\d+)/gmi;
var str = '12 - 23\n12 - -23\n-12 - 23\n-12 - -23';
var subst = '$1$2$3';
var result = str.replace(re, subst);
