Extract Number from a String explanation - javascript

I was looking for a way to extract Number/Numbers from a string.
Example:
const string = "Monthly Weather Review, Volume 129, Issues 9-12"
and I found a solution on this site. The problem is that I don't understand it very well.
Can someone explain what actually happens in the line below?
let res = string.match(/[+-]?\d+(?:\.\d+)?/g).map(Number); //return [129, 9, -12]

The regular expression:
[+-]?\d+(?:\.\d+)?
will match substrings which:
[+-]? - May start with a + or a - (character set, made optional by the following ?)
\d+ - Then, contains one or more digit characters (numbers from 0 to 9)
(?:\.\d+)? - Non-capturing group (that is what ?: marks), made optional by the trailing ?
\.\d+ - May match a literal ., followed by more digit characters
The only part of the pattern that isn't used by your input is the decimals part at the end. For example, in the string
foo +12.34
it will match +12.34.
Regular expression matches extract substrings from a larger string, as an array. The .map(Number) uses Array.prototype.map to transform all elements of one array into another array - it turns the array of strings into an array of numbers.
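As a quick check of the explanation above, here is a minimal sketch that runs the pattern on the string from the question and on the decimal example. The variable names are only illustrative.

```javascript
const numberPattern = /[+-]?\d+(?:\.\d+)?/g;

// The string from the question: "9-12" is read as "9" followed by "-12".
const title = "Monthly Weather Review, Volume 129, Issues 9-12";
const strings = title.match(numberPattern); // array of matched substrings
const numbers = strings.map(Number);        // same items converted to numbers

console.log(strings); // [ '129', '9', '-12' ]
console.log(numbers); // [ 129, 9, -12 ]

// The optional decimal group only matters when the input has a fractional part.
const decimals = "foo +12.34".match(numberPattern).map(Number);
console.log(decimals); // [ 12.34 ]
```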

For starters .match(regex) is a method on the String prototype that given a regular expression, will execute it and return to you the results or null if there aren't any matches. See this for more info:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
For the regular expression, /[+-]?\d+(?:\.\d+)?/g:
[+-]? match optionally a + or -
\d+ match one or more numerical digits
(?:\.\d+)? non-capturing group of . followed by digits (this picks up decimal values). The trailing ? makes it optional. Your example string only contains integers, so this part is not used there.
/g - g for global. all matches returned.
More info on regular expressions: https://www.rexegg.com/regex-quickstart.html
.map(Number) will iterate over each item in the matches array (all matches are strings, because match extracts substrings of the original string) and coerce it to a number using the Number() function.
More info on Number: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number
More info on .map: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
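One thing to watch for: because match returns null when nothing matches, chaining .map directly onto it throws for a string without numbers. A minimal sketch of a guard, where extractNumbers is a hypothetical helper name and not something from the question:

```javascript
// Hypothetical helper: returns every number found in `text`, or [] if none.
function extractNumbers(text) {
  const matches = text.match(/[+-]?\d+(?:\.\d+)?/g);
  // match() yields null (not an empty array) when there are no matches.
  return matches === null ? [] : matches.map(Number);
}

console.log(extractNumbers("Volume 129, Issues 9-12")); // [ 129, 9, -12 ]
console.log(extractNumbers("no digits here"));          // []
```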

Related

Split a String by a Regex in JavaScript

I have the string:
"Vl.55.16b25.3d.42b50.59b30.90.24b35.3d.56.67b70.Tv.54b30.Vl.41b35.Tv.Bd.71b50.3d.99b20.03b50.Tv.73b50.Vl.05b25.12b40.Bd.Tv.82b25."
How can I split it to get results like:
["Vl.55.16b25", "3d.42b50.59b30.90.24b35", "3d.56.67b70", ...]
The logic:
Condition 1: A block ends with b followed by 2 digits. Example: b20, b25.
If condition 1 passes, I need to check condition 2.
Condition 2: The next part is "3d" or 2 letters. If condition 2 doesn't match, the next part is appended to the current block.
Many thanks.
If I understand your question correctly, the following code should work:
var string = "Vl.55.16b25.3d.42b50.59b30.90.24b35.3d.56.67b70.Tv.54b30.Vl.41b35.Tv.Bd.71b50.3d.99b20.03b50.Tv.73b50.Vl.05b25.12b40.Bd.Tv.82b25.";
console.log(string.split(/(?<=b\d\d)\.(?=3d)/g))
Explanation:
(?<=) is look-behind.
b matches the literal character "b".
\d matches any digit so \d\d will match two digits in a row.
\. matches a literal ".", it needs the \ before it because otherwise it would match any character.
(?=) is look-ahead.
The g flag stands for global. Note that split already splits at every occurrence of the regular expression, so the flag is not strictly needed here.
This means that the string will be split at every occurrence of "." that is preceded by the letter "b" and two digits, and followed by "3d".
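To see what this split actually produces, here is a small sketch on a shortened version of the input (the shortened string is my own, chosen to keep the output readable). Note that it only splits in front of "3d", so blocks starting with "Tv" or "Vl" stay attached, and the trailing dot is kept.

```javascript
// Shortened sample input (an assumption for illustration, not the full string).
const shortened = "Vl.55.16b25.3d.42b50.59b30.90.24b35.3d.56.67b70.Tv.54b30.";

// Split only at a "." that follows "b" + two digits and precedes "3d".
const parts = shortened.split(/(?<=b\d\d)\.(?=3d)/);

console.log(parts);
// [ 'Vl.55.16b25', '3d.42b50.59b30.90.24b35', '3d.56.67b70.Tv.54b30.' ]
```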
Assuming each block ends with 'b' and two digits, and that this ending must be followed by a dot and then 3d, two non-digit characters, or the end of the string (this is necessary), and that the leading dot should be omitted, you could use the following regular expression.
const
    string = "Vl.55.16b25.3d.42b50.59b30.90.24b35.3d.56.67b70.Tv.54b30.Vl.41b35.Tv.Bd.71b50.3d.99b20.03b50.Tv.73b50.Vl.05b25.12b40.Bd.Tv.82b25.",
    result = string.match(/[^.].*?b\d\d(?=\.(3d|\D\D|$))/g);
console.log(result);
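For comparison, a minimal sketch of the match-based pattern on a shortened sample (again my own shortening of the input). Here blocks beginning with two letters are separated as well, and the dots between blocks are dropped.

```javascript
// Shortened sample input (an assumption for illustration, not the full string).
const sample = "Vl.55.16b25.3d.42b50.59b30.90.24b35.3d.56.67b70.Tv.54b30.";

// Each match starts at a non-dot and lazily extends to a "b" + two digits
// that is followed by ".3d", by "." plus two non-digits, or by a final ".".
const blocks = sample.match(/[^.].*?b\d\d(?=\.(3d|\D\D|$))/g);

console.log(blocks);
// [ 'Vl.55.16b25', '3d.42b50.59b30.90.24b35', '3d.56.67b70', 'Tv.54b30' ]
```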

Regex expression for serial numbers

I have the following specifications for a regex:
-> The string starts with a string of three numbers
-> It is followed by a '-'
-> That is followed by three uppercase vowels
-> That is followed by a '-'
-> That is followed by three numbers
-> That is followed by a final '-'
-> That is followed by the last three uppercase vowels.
-> Second set of numbers can not equal the first.
-> The second group of letters can not equal the first.
-> The groups of numbers may not contain zero.
A passable string is:
368-IOU-789-AIO.
An invalid string is:
368-AEO-368-AEI
354-AOU-431-AOU
Currently, I have something like this:
([0-9]+[0-9]+[0-9]+[/AEIOU/]+[0-9]+[0-9]+[0-9])
What you have won't work since + means "one or more of". For example, the sequence [0-9]+[0-9]+[0-9]+ will match anywhere between three and an infinite number of digits.
In addition, your current attempt:
allows for one to an infinite number of vowels (and possibly / character);
doesn't require a vowel set at the end;
doesn't require the - separators; and
may allow for arbitrary content before and after the match.
You should be able to use the {count} specifier to get an exact quantity. All but one of those limitations can be done with any basic regex engine, with something like:
^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$
The ^ and $ anchors mean start and end of string, [1-9]{3} gives you exactly three non-zero digits, [AEIOU]{3} gives you exactly three uppercase vowels, and - gives you the literal separator character.
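A short JavaScript sketch of that pattern in use. The test strings other than the ones in the question are made up for illustration. The last line shows the one rule the pattern alone cannot enforce.

```javascript
const serialPattern = /^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$/;

console.log(serialPattern.test("368-IOU-789-AIO")); // true
console.log(serialPattern.test("368-IOU-709-AIO")); // false (contains a zero)
console.log(serialPattern.test("368-iou-789-AIO")); // false (lowercase vowels)

// The pattern cannot tell that the first digit group is repeated,
// so this string from the question's "invalid" list still passes:
console.log(serialPattern.test("368-AEO-368-AEI")); // true
```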
The "groups cannot be identical" rule is a little more problematic. I would just post process for that to ensure it's not violated. The following pseudo-code is what I mean:
def isValid(str):
    if not str.regex_match("^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$"):
        return false
    return str[0..2] != str[8..10]
       and str[4..6] != str[12..14]
The alternative will be a rather complex regex that future developers will probably curse you for inflicting on them :-)
Note that your "The groups of numbers may not contain zero" is a little ambiguous in that it may mean no zeros are allowed or just 000 is not allowed. I've assumed the former, but it's easily adjustable to cater for the latter:
def isValid(str):
    if not str.regex_match("^[0-9]{3}-[AEIOU]{3}-[0-9]{3}-[AEIOU]{3}$"):
        return false
    return str[0..2] != str[8..10]
       and str[4..6] != str[12..14]
       and str[0..2] != "000"
       and str[8..10] != "000"
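In JavaScript, the first variant of the pseudo-code (no zeros allowed at all) could look roughly like the sketch below. isValidSerial is a hypothetical name, and I have used capture groups instead of index slicing so the comparison does not depend on character positions.

```javascript
// Hypothetical helper mirroring the pseudo-code: the regex checks the shape,
// plain string comparison checks the "groups must differ" rules.
function isValidSerial(serial) {
  const shape = /^([1-9]{3})-([AEIOU]{3})-([1-9]{3})-([AEIOU]{3})$/;
  const match = shape.exec(serial);
  if (match === null) {
    return false; // wrong shape, a zero digit, or a non-vowel letter
  }
  const [, digits1, vowels1, digits2, vowels2] = match;
  return digits1 !== digits2 && vowels1 !== vowels2;
}

console.log(isValidSerial("368-IOU-789-AIO")); // true
console.log(isValidSerial("368-AEO-368-AEI")); // false (digit groups equal)
console.log(isValidSerial("354-AOU-431-AOU")); // false (vowel groups equal)
```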
You can use capture group and backreferences
^([0-9]{3})-([AEIOU]{3})-(?:(?!\1)[0-9]){3}-(?:(?!\2)[AEIOU]){3}$
On "The groups of numbers may not contain zero": if you meant that only the digits 1 to 9 are allowed, you can replace [0-9] with [1-9].
If you only want to rule out 000, you can add a negative lookahead, ^(?!.*000), at the start of the pattern.
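A small sketch checking the backreference pattern against the strings from the question. The negative lookaheads (?!\1) and (?!\2) are what reject a repeated digit group or vowel group.

```javascript
// \1 and \2 refer back to the first digit group and the first vowel group.
const strictPattern =
  /^([0-9]{3})-([AEIOU]{3})-(?:(?!\1)[0-9]){3}-(?:(?!\2)[AEIOU]){3}$/;

console.log(strictPattern.test("368-IOU-789-AIO")); // true
console.log(strictPattern.test("368-AEO-368-AEI")); // false (digits repeat)
console.log(strictPattern.test("354-AOU-431-AOU")); // false (vowels repeat)
```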

String replace with regexp overwrites non matching character

The idea is replacing in a string all decimal numbers without a digit before the decimal point with the zero so .03 sqrt(.02) would become 0.03 sqrt(0.02).
See the code below for a sample, the problem is that the replacement overwrites the opening parenthesis when there's one preceding the decimal point. I think that the parenthesis does not pertain to the matching string, does it?
let s='.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?:^|\D)\.(\d+)/g , "0.$1");
console.log(s)
Make your initial group capturing, not non-capturing, and use it in the replacement:
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
//           ^---- capturing, not non-capturing
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
console.log(s)
I think that the parenthesis does not pertain to the matching string, does it?
It does, because it matches [^\d].
Side note: As Wiktor points out, you can use \D instead of [^\d].
Side note 2: JavaScript regexes support lookbehind as of ES2018, so an alternate way to do this in modern JavaScript environments is a negative lookbehind:
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
//           ^^^^^^^--- negative lookbehind for a digit
That means basically "If there's a digit immediately before this position, don't match." (There's also positive lookbehind, (?<=...).)
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
console.log(s)
A parenthesis is a non-digit, thus it is matched with [^\d] and removed.
The solution is to match and capture the part before a dot and then insert back using a replacement backreference.
Use
.replace(/(^|\D)\.(\d+)/g , "$10.$2")
See the regex demo.
Pattern details
(^|\D) - Capturing group 1 (later referred to with $1 from the replacement pattern): a start of string or any non-digit ([^\d] = \D)
\. - a dot
(\d+) - Capturing group 2 (later referred to with $2 from the replacement pattern): 1+ digits.
See the JS demo:
let s='.05 sqrt(.005) another(.33) thisShouldnt(a.b) neither(3.4)'
s=s.replace(/(^|\D)\.(\d+)/g , "$10.$2");
console.log(s)
Note that $10.$2 is parsed as the $1 backreference, then the literal text 0., and then the $2 backreference. Since the pattern has only 2 capturing groups, there is no group 10, so $10 is not treated as a single token in the replacement pattern.
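If the $10 reading feels fragile, a replacer function sidesteps the ambiguity altogether. This is a sketch of an alternative, not part of the answers above, and the parameter names are only illustrative.

```javascript
const source = ".05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)";

// The callback receives the whole match followed by each captured group.
const fixed = source.replace(/(^|\D)\.(\d+)/g, (whole, before, digits) => {
  // Put back whatever preceded the dot, then add the missing leading zero.
  return `${before}0.${digits}`;
});

console.log(fixed);
// 0.05 sqrt(0.005) another(0.33) thisShouldntChange(a.b) neither(3.4)
```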

How does the following code mean two consecutive numbers?

This is from an exercise on FCC beta, and I cannot understand how the following code means two consecutive numbers, seeing that \D* means zero or more non-digits and \d means a digit. How does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't really match anything except an empty string, as there is nothing preceding the lookaheads to consume any characters.
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to use a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match any digit that is followed by zero or more non-digits and then another digit. A few examples (matched digits marked with ^):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put parentheses around both digits, i.e.:
(\d)(?=\D*(\d))
Here it is in action.
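A minimal sketch of that pattern in JavaScript, using one of the example strings above. It shows that the group inside the lookahead is captured even though the lookahead itself consumes nothing.

```javascript
const m = "2v jugn45".match(/(\d)(?=\D*(\d))/);

console.log(m[0]);    // "2"  (only the first digit is part of the match)
console.log(m[1]);    // "2"  (capture group 1)
console.log(m[2]);    // "4"  (captured inside the lookahead, not consumed)
console.log(m.index); // 0
```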
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d character. This means that your lookaheads will be attached to something that actually matches, rather than to an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
This matches a string that has a run of 5 or more word characters and, from the same position, a digit after any number of non-digits. It succeeds with two digits, but also with 1 digit, 3 digits, 12 digits, etc. To require two consecutive digits, you should specify the number of digits you want in the second lookahead:
let regex = /(?=\w{5,})(?=\D*\d{2})/;
let string1 = "abcd2";
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1));
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1));
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2));
My original answer was about regex in a vacuum, and I glossed over the fact that you were using it in JavaScript, where test() succeeds as soon as the pattern matches anywhere in the string, even if the match itself is empty. I still don't know why your original regex was supposed to match two numbers, but I hope this is a bit more helpful.
(?=...) is a positive lookahead
\w matches any word character (equal to [a-zA-Z0-9_])
{5,} matches the previous token between 5 and unlimited times
\D matches any character that's not a digit (equal to [^0-9])
* matches the previous token between zero and unlimited times
\d matches a digit (equal to [0-9])
Note that the expression in the question has no g flag, so it stops at the first match instead of trying to match all.
You can always check your expression using regex101
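To make the "lookaheads consume nothing" point concrete, here is a small sketch with the regex from the question. The test strings are made up for illustration.

```javascript
const checkPass = /(?=\w{5,})(?=\D*\d)/;

// Both lookaheads succeed at position 0, but the match itself is empty.
const hit = "abcd2".match(checkPass);
console.log(hit[0] === ""); // true
console.log(hit.index);     // 0

console.log(checkPass.test("abcde")); // false: no digit anywhere
console.log(checkPass.test("ab2"));   // false: fewer than 5 word characters
```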

Is it possible to match half of a string using regex or use half of a matched group?

I have a string of the following form "some_text_AAAABB_some_other_text". There is an arbitrary even number of 'A's in the string and "BB" is a fixed string that follows the 'A's. Assuming that there are 2n 'A's, I would like to use a regex to replace the 'A's with a string of 'A's of length n.
For the following string
"some_text_AAAABB_some_other_text"
the result would be
"some_text_AABB_some_other_text"
Is it even possible to achieve this with regex?
I'm using V8 javascript to perform the transformation.
There are two scenarios: 1) number of As is even, 2) number of As is odd.
If you do not care if there is an even or odd number of As, just use
replace(/(A+)\1BB/g, "$1BB")
where (A+) matches and captures into Group 1 one or more As as many as possible and \1 matches the same substring (the same number as is captured into Group 1). Since BB is a fixed string, we just put it into the pattern as a literal.
See this regex demo
If you do not want to modify a string with odd number of As, you need
replace(/(^|[^A-Z])(A+)\2BB/g, "$1$2BB")
See this regex demo
Here, the first capture group captures the start of string ^ or any character other than [A-Z], the second capture group captures 1 or more As, and the backreference now has the ID = 2 - hence, \2 is used.
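A short sketch comparing the two patterns. The function names and the odd-length test string are my own, added for illustration. Note how the first pattern still shortens an odd run of As, because it can start matching at the second A.

```javascript
// Hypothetical wrappers around the two replace calls shown above.
const halveAll = (text) => text.replace(/(A+)\1BB/g, "$1BB");
const halveEvenOnly = (text) => text.replace(/(^|[^A-Z])(A+)\2BB/g, "$1$2BB");

const evenRun = "some_text_AAAABB_some_other_text"; // 4 As
const oddRun = "some_text_AAABB_some_other_text";   // 3 As

console.log(halveAll(evenRun));      // some_text_AABB_some_other_text
console.log(halveAll(oddRun));       // some_text_AABB_some_other_text
console.log(halveEvenOnly(evenRun)); // some_text_AABB_some_other_text
console.log(halveEvenOnly(oddRun));  // some_text_AAABB_some_other_text
```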
