regex replace with groups plus optional group in javascript - javascript

What's working: I'm using a regex similar to the one below for phone numbers (the actual regex is here) that can grab extensions
as an optional 4th group. The regex itself is working fine for the
first three groups as shown below
"555-555-5555 x214".replace(/([\d]{3})-([\d]{3})-([\d]{4})(?: +x([\d]+))?/, "$1-$2-$3");
// returns 555-555-5555
What I'm looking for: I can't find the syntax for the replace string to insert the phone extension preceded by a x ONLY if the 4th group is
captured. In my real regex, the phone extension could be marked by
few different character designations which I would like to simply replace
with an "x".
If I use:
"555-555-5555 x214".replace(/([\d]{3})-([\d]{3})-([\d]{4})(?: +x([\d]+))?/, "$1-$2-$3 x$4");
// returns 555-555-5555 x214
"555-555-5555".replace(/([\d]{3})-([\d]{3})-([\d]{4})(?: +x([\d]+))?/, "$1-$2-$3 x$4");
// will return 555-555-5555 x
In the last example, I get the "x" (which I'm not surprised by), so I'm looking for the syntax to only add the "x" plus group 4 if something was captured.
Is this possible with String.replace? If not, is there a more efficient way to replace these groups with a formatted number type once I've identified their parts?

The String#replace function can also take a function as its 2nd argument, which can be more expressive than a string.
For example:
"555-555-5555".replace(/(\d{3})-(\d{3})-(\d{4})(?: +x(\d+))?/, replacer);
function replacer(match, p1, p2, p3, p4, offset, string) {
return `${p1}-${p2}-${p3}` + (p4 ? ` x${p4}` : '')
}
See MDN documentation for more details :)
Note that your regex needed some modifications. Play with it here: https://regex101.com/r/rtJYTw/1
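As a rough sketch of how the same idea could cover several extension markers: the markers below ("x", "ext", "ext." and "#") and the names PHONE and formatPhone are assumptions for illustration, not taken from the asker's real regex.

```javascript
// Hypothetical pattern: the accepted extension markers are assumptions for illustration.
const PHONE = /(\d{3})-(\d{3})-(\d{4})(?: +(?:x|ext\.?|#) *(\d+))?/i;

// formatPhone is a made-up helper name.
function formatPhone(input) {
  return input.replace(PHONE, (match, area, prefix, line, ext) =>
    // ext is undefined when the optional group did not participate in the match
    `${area}-${prefix}-${line}` + (ext ? ` x${ext}` : ''));
}

console.log(formatPhone('555-555-5555 ext. 214')); // 555-555-5555 x214
console.log(formatPhone('555-555-5555 #7'));       // 555-555-5555 x7
console.log(formatPhone('555-555-5555'));          // 555-555-5555
```

Because the callback receives undefined for a group that did not match, the " x" prefix is only emitted when an extension was actually captured.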

Related

Applying currency format using replace and a regular expression

I am trying to understand some code where a number is converted to a currency format. Thus, if you have 16.9 it converts to $16.90. The problem with the code is if you have an amount over $1,000, it just returns $1, an amount over $2,000 returns $2, etc. Amounts in the hundreds show up fine.
Here is the function:
var _formatCurrency = function(amount) {
return "$" + parseFloat(amount).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,')
};
(The reason the semicolon is after the bracket is because this function is in itself a statement in another function. That function is not relevant to this discussion.)
I found out that the person who originally put the code in there found it somewhere but didn't fully understand it and didn't test this particular scenario. I myself have not dealt much with regular expressions. I am not only trying to fix it, but to understand how it is working as it is now.
Here's what I've found out. The code between the slash after the open parenthesis and the slash before the g is the pattern. The g means global search. The \d means digit, and the (?=(\d{3})+\.) appears to mean find 3 digits plus a decimal point. I'm not sure I have that right, though, because if that was correct shouldn't it ignore numbers like 5.4? That works fine. Also, I'm not sure what the '$1,' is for. It looks to me like it is supposed to be placed where the digits are, but wouldn't that change all the numbers to $1? Also, why is there a comma after the 1?
Regarding your comment
I was hoping to just edit the regex so it would work properly.
The regex you are currently using is obviously not working for you so I think you should consider alternatives even if they are not too similar, and
Trying to keep the code change as small as possible
Understandable but sometimes it is better to use a code that is a little bit bigger and MORE READABLE than to go with compact and hieroglyphical.
Back to business:
I'm assuming you are getting a string as an argument, and that this string is composed only of digits and may or may not have a dot before the last 1 or 2 digits. Something like
//input //intended output
1 $1.00
20 $20.00
34.2 $34.20
23.1 $23.10
62516.16 $62,516.16
15.26 $15.26
4654656 $4,654,656.00
0.3 $0.30
I will let you do a pre-check of (assumed) non-valids like 1. | 2.2. | .6 | 4.8.1 | 4.856 | etc.
Proposed solution:
var _formatCurrency = function(amount) {
amount = "$" + amount.replace(/(\d)(?=(\d{3})+(\.(\d){0,2})*$)/g, '$1,');
if(amount.indexOf('.') === -1)
return amount + '.00';
var decimals = amount.split('.')[1];
return decimals.length < 2 ? amount + '0' : amount;
};
Regex break down:
(\d): Matches one digit. Parentheses group things for referencing when needed.
(?=(\d{3})+(\.(\d){0,2})*$). Now this guy. From end to beginning:
$: Matches the end of the string. This is what allows you to match from the end instead of the beginning which is very handy for adding the commas.
(\.(\d){0,2})*: This part processes the dot and decimals. The \. matches the dot. (\d){0,2} matches 0, 1 or 2 digits (the decimals). The * implies that this whole group can be empty.
?=(\d{3})+: \d{3} matches 3 digits exactly. + means at least one occurrence. Finally ?= matches a group after the main expression without including it in the result. In this case it takes three digits at a time (from the end remember?) and leaves them out of the result for when replacing.
g: Match and replace globally, the whole string.
Replacing with $1,: This is how captured groups are referenced for replacing, in this case the wanted group is number 1. Since the pattern will match every digit in the position 3n+1 (starting from the end or the dot) and catch it in the group number 1 ((\d)), then replacing that catch with $1, will effectively add a comma after each capture.
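To see the lookahead at work in isolation, here is a small sketch that applies only the comma-insertion step from the proposed solution. The helper name addThousands is made up for illustration; the regex is the one broken down above.

```javascript
// addThousands is a hypothetical helper name; the regex comes from the proposed solution.
function addThousands(digits) {
  // (\d) captures one digit. The lookahead then requires one or more complete
  // groups of three digits, an optional ".d" or ".dd" tail, and the end of the string.
  return digits.replace(/(\d)(?=(\d{3})+(\.(\d){0,2})*$)/g, '$1,');
}

console.log(addThousands('62516.16')); // 62,516.16
console.log(addThousands('4654656'));  // 4,654,656
console.log(addThousands('34.2'));     // 34.2 (no digit has three digits after it)
```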
Try it and please give feedback.
Also if you haven't already you should (and SO has not provided me with a format to stress this enough) really really look into this site as suggested by Taplar
The pattern is invalid, and your understanding of the function is incorrect. This function formats a number in a standard US currency, and here is how it works:
The parseFloat() function converts a string value to a decimal number.
The toFixed(2) function rounds the decimal number to 2 digits after the decimal point.
The replace() function is used here to add the thousands separators (i.e. a comma after every 3 digits). The pattern is incorrect, so here is a suggested fix /(\d)(?=(\d{3})+\.)/g and this is how it works:
The (\d) captures a digit.
The (?=(\d{3})+\.) is called a look-ahead, and it ensures that the captured digit above is followed by one or more (+) sets of 3 digits (\d{3}) and then the decimal point (\.).
The g flag/modifier is to apply the pattern globally, that is on the entire amount.
The replacement $1, replaces the pattern with the first captured group $1, which is in our case the digit (\d) (so technically replacing the digit with itself to make sure we don't lose the digit in the replacement) followed by a comma ,. So like I said, this is just to add the thousands separator.
Here are some tests with the suggested fix. Note that it works fine with numbers and strings:
var _formatCurrency = function(amount) {
return "$" + parseFloat(amount).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,');
};
console.log(_formatCurrency('1'));
console.log(_formatCurrency('100'));
console.log(_formatCurrency('1000'));
console.log(_formatCurrency('1000000.559'));
console.log(_formatCurrency('10000000000.559'));
console.log(_formatCurrency(1));
console.log(_formatCurrency(100));
console.log(_formatCurrency(1000));
console.log(_formatCurrency(1000000.559));
console.log(_formatCurrency(10000000000.559));
Okay, I want to apologize to everyone who answered. I did some further tracing and found out the JSON call which was bringing in the amount did in fact have a comma in it, so it is just parsing that first digit. I was looking in the wrong place in the code when I thought there was no comma in there already. I do appreciate everyone's input and hope you won't think too bad of me for not catching that before this whole exercise. If nothing else, at least I now know how that regex operates so I can make use of it in the future. Now I just have to go about removing that comma.
Have a great day!
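A minimal sketch of that last step, assuming the incoming value may already contain thousands separators. The name formatCurrencySafe is hypothetical, and stripping every comma is an assumption about what the JSON field can hold.

```javascript
// formatCurrencySafe is a hypothetical name; removing commas first is an assumption about the input.
function formatCurrencySafe(amount) {
  // Drop any existing commas so parseFloat sees the whole number
  const cleaned = String(amount).replace(/,/g, '');
  return '$' + parseFloat(cleaned).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,');
}

console.log(parseFloat('1,234.5'));         // 1 (parsing stops at the comma)
console.log(formatCurrencySafe('1,234.5')); // $1,234.50
console.log(formatCurrencySafe(2000));      // $2,000.00
```

This shows why amounts over $1,000 collapsed to $1: parseFloat reads digits only up to the first character it cannot parse, which here is the comma.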
Assuming that you are working with USD only, then this should work for you as an alternative to Regular Expressions. I have also included a few tests to verify that it is working properly.
var test1 = '16.9';
var test2 = '2000.5';
var test3 = '300000.23';
var test4 = '3000000.23';
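function stringToUSD(inputString) {
  const splitValues = inputString.split('.');
  const wholeNumber = splitValues[0].split('')
    .reverse()
    // prefix every third digit (counting from the right) with a comma when more digits follow
    .map((val, idx, arr) => (idx + 1) % 3 === 0 && arr[idx + 1] !== undefined ? `,${val}` : val)
    .reverse()
    .join('');
  // pad or truncate the fraction to two digits; calling parseFloat on the
  // comma-separated string would stop at the first comma
  const cents = (splitValues[1] || '').padEnd(2, '0').slice(0, 2);
  return `${wholeNumber}.${cents}`;
}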
console.log(stringToUSD(test1));
console.log(stringToUSD(test2));
console.log(stringToUSD(test3));
console.log(stringToUSD(test4));
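Another option that avoids both the regex and the manual digit handling is the built-in Intl.NumberFormat, which is a different technique from the one in this answer. A minimal sketch, assuming USD with en-US conventions; the names usd and toUSD are made up for illustration.

```javascript
// usd and toUSD are hypothetical names; 'en-US' and 'USD' are assumptions about the desired format.
const usd = new Intl.NumberFormat('en-US', { style: 'currency', currency: 'USD' });

function toUSD(input) {
  // Number() accepts both numeric strings and numbers
  return usd.format(Number(input));
}

console.log(toUSD('16.9'));       // $16.90
console.log(toUSD('2000.5'));     // $2,000.50
console.log(toUSD('3000000.23')); // $3,000,000.23
```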

Regex to extract two digits from phone number

I am trying to take only 2 characters from my phone no.
I have used regex match ^\+55 and this will return the following example.
Phone No : +5546342543
Result : 46342543
Expected Result was only 46.
I don't want to use substring for the answer instead I want to extract that from the phone no with regex.
Can anybody help me on this.
Thank you.
The pattern you used - ^\+55 - matches a literal + in the beginning of the string and two 5s right after.
46 is the substring that appears right after the initial +55. In some languages, you can use a look-behind (see example) to match some text preceded with another.
JavaScript had no look-behind support before ES2018, so to stay compatible with older engines you need to resort to capturing groups.
You can use string#match or RegExp#exec to obtain that captured text marked with round brackets:
var s = '+5546342543';
var m = /^\+55(\d{2})/.exec(s);
if (m !== null) {
  document.write(m[1]); // writes "46"
}
This example handles the case when you get no match.
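In engines that implement ES2018, a look-behind can do the same job without a capturing group. The following is only a sketch under that assumption; the function name digitsAfterPrefix is hypothetical.

```javascript
// digitsAfterPrefix is a hypothetical name; look-behind requires an ES2018-capable engine.
function digitsAfterPrefix(phone) {
  // (?<=^\+55) asserts that "+55" at the start of the string sits right before the match
  const m = /(?<=^\+55)\d{2}/.exec(phone);
  return m === null ? null : m[0];
}

console.log(digitsAfterPrefix('+5546342543')); // 46
console.log(digitsAfterPrefix('+4946342543')); // null
```

Here the wanted digits are the whole match (m[0]) rather than a group, and a non-matching number yields null instead of throwing.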
Just try with:
'+5546342543'.match(/^\+55(\d{2})/)[1];
This will get what you want:
"+5546342543".match(/^\+55(\d{2})/)[1]
Does this solve your problem?
phoneNumber = "+5546342543"
phone = phoneNumber.substr(3) // returns "46342543"
twoDigits = phoneNumber.substr(3,2) // returns "46"
Using the substr() method as quoted :
The substr() method returns the characters in a string beginning at the specified location through the specified number of characters.
Syntax: str.substr(start[, length])
Source : Mozilla MDN

Javascript regex match returning a string with comma at the end

Just as the title says... I'm trying to parse a string, for example
2x + 3y
and I'm trying to get only the coefficients (i.e. 2 and 3).
I first tokenized it with the space character as delimiter, giving me "2x" "+" "3y",
then I parsed each token with this statement to get only the coefficients:
var number = eqTokens[i].match(/(\-)?\d+/);
I tried printing the output but it gave me "2,"
Why is it printing like this and how do I fix it? I tried using:
number = number.replace(/[,]/, "");
but this just gives me an error that number.replace is not a function.
What's wrong with this?
> "2x + 3y".match(/-?\d+(?=[A-Za-z]+)/g)
[ '2', '3' ]
The above regex matches the numbers only if they are followed by one or more letters.
Without the g flag, match returns an array holding the full match followed by one entry per capture group. Since you put the optional negative sign in parentheses, it is a capture group of its own. That group is optional, so when there is no minus sign it contributes an undefined entry in addition to your actual match.
Input 2x -> Your output: [2,undefined] which prints out as "2,"
Input -2x -> Your output: [-2,-]
Remove the parentheses around the negative.
This is just for the sake of explaining why your case is breaking but personally I'd use Avinash's answer.
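The behaviour described above can be reproduced with a short sketch (the variable names are made up for illustration). It also shows why number.replace fails: match returns an array, not a string.

```javascript
const tokens = '2x + 3y'.split(' ');          // ['2x', '+', '3y']

// With the parenthesized minus sign the result carries an extra capture-group slot
const grouped = tokens[0].match(/(\-)?\d+/);
console.log(Array.isArray(grouped));          // true, so grouped.replace does not exist
console.log(String(grouped));                 // "2," (the unmatched group is undefined)

// Without the capture group only the full match is present
const plain = tokens[0].match(/-?\d+/);
console.log(plain[0]);                        // "2"

// With the g flag, match returns every full match and no groups
const all = '2x + 3y'.match(/-?\d+/g).map(Number);
console.log(all);                             // [ 2, 3 ]
```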

split line via regex in javascript?

I have this structure of text :
1.6.1 Members................................................................ 12
1.6.2 Accessibility.......................................................... 13
1.6.3 Type parameters........................................................ 13
1.6.4 The T generic type aka <T>............................................. 13
I need to create JS objects :
{
num:"1.6.1",
txt:"Members"
},
{
num:"1.6.2",
txt:"Accessibility"
} ...
That's not a problem.
The problem is that I want to extract values via Regex split via positive lookahead :
Split via the first time you see that next character is a letter
What have i tried :
'1.6.1 Members........... 12'.split(/\s(?=(?:[\w\. ])+$)/i)
This is working fine :
["1.6.1", "Members...........", "12"] // I don't care about the 12.
But If I have 2 words or more :
'1.6.3 Type parameters................ 13'.split(/\s(?=(?:[\w\. ])+$)/i)
The result is :
["1.6.3", "Type", "parameters................", "13"] //again I don't care about 13.
Of course I can join them , but I want the words to be together.
Question :
How can I enhance my regex NOT to split words ?
Desired result :
["1.6.3", "Type parameters"]
or
["1.6.3", "Type parameters........"] // I will remove extras later
or
["1.6.3", "Type parameters........13"]// I will remove extras later
NB
I know I can do split via " " or by other simpler solution but I'm seeking ( for pure knowledge) for an enhancement for my solution which uses positive lookahead split.
nb2 :
The text can contain capital letter in the middle also.
You can use this regex:
/^(\d+(?:\.\d+)*) (\w+(?: \w+)*)/gm
And get your desired matches using matched group #1 and matched group #2.
Online Regex Demo
Update: For String#split you can use this regex:
/ +(?=[A-Z\d])/g
Regex Demo
Update 2: With the possibility of having capital letters also in chapter names following more complex regex is needed:
var re = /(\D +(?=[a-z]))| +(?=[a-z\d])/gmi;
var str = '1.6.3 Type Foo Bar........................................................ 13';
var m = str.split( re );
console.log(m[0], ',', m.slice(1, -1).join(''), ',', m.pop() );
//=> 1.6.3 , Type Foo Bar........................................................ , 13
EDIT: Since you added 1.6.1 The .net 4.5 framework.... to the requirements, we can tweak the answer to this:
^([\d.]+) ((?:[^.]|\.(?!\.))+)
And if you want to allow sequences of up to three dots in the title, as in 1.6.1 She said... Boo!..........., it's an easy tweak from there ({3} quantifier):
^([\d.]+) ((?:[^.]|\.(?!\.{3}))+)
Original:
^([\d.]+) ([^.]+)
In the regex demo, see the Groups in the right pane.
To retrieve Groups 1 and 2, something like:
var myregex = /^([\d.]+) ((?:[^.]|\.(?!\.))+)/mg;
var theMatchObject = myregex.exec(yourString);
while (theMatchObject != null) {
// the numbers: theMatchObject[1]
// the title: theMatchObject[2]
theMatchObject = myregex.exec(yourString);
}
OUTPUT
Group 1 Group 2
1.6.1 Members
1.6.2 Accessibility
1.6.3 Type parameters
1.6.4 The T generic type aka <T>
1.6.1 The .net 4.5 framework
Explanation
^ asserts that we are at the beginning of the line
The parentheses in ([\d.]+) capture digits and dots to Group 1
The parentheses in ((?:[^.]|\.(?!\.))+) capture to Group 2...
[^.] one char that is not a dot, | OR...
\.(?!\.) a dot that is not followed by a dot...
+ one or more times
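Putting the two groups to use, here is a sketch that builds the objects the question asks for. It assumes an engine with String#matchAll (ES2020); the sample text is shortened and the names toc and entries are made up for illustration.

```javascript
// Shortened sample input; toc and entries are hypothetical names.
const toc = [
  '1.6.1 Members................ 12',
  '1.6.3 Type parameters........ 13',
  '1.6.4 The T generic type aka <T>...... 13',
].join('\n');

// Group 1 is the section number, Group 2 the title up to the first run of dots
const entries = Array.from(
  toc.matchAll(/^([\d.]+) ((?:[^.]|\.(?!\.))+)/gm),
  ([, num, txt]) => ({ num, txt })
);

console.log(entries[0]); // { num: '1.6.1', txt: 'Members' }
console.log(entries[1]); // { num: '1.6.3', txt: 'Type parameters' }
console.log(entries[2]); // { num: '1.6.4', txt: 'The T generic type aka <T>' }
```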
You can use this pattern too:
var myStr = "1.6.1 Members................................................................ 12\n1.6.2 Accessibility.......................................................... 13\n1.6.3 Type parameters........................................................ 13\n1.6.4 The T generic type aka <T>............................................. 13";
console.log(myStr.split(/ (.+?)\.{2,} ?\d+$\n?/m));
About a way with a lookahead :
I don't think it is possible, because the only way to skip a character (here a space between two words) is to match it as part of the previous match (the space between the number and the first word). In other words, you rely on the fact that characters cannot be matched more than once.
But if, apart from the space where you want to split, the whole pattern is enclosed in a lookahead, the substring matched inside the lookahead is not part of the match result (it is only a check, and the corresponding characters are not consumed by the regex engine). You therefore can't skip the following spaces, and the regex engine will carry on to the next space character.
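For completeness: a look-behind, which is a different technique from the lookahead the question asks about, can express "split only at the space right after the leading number". This is just a sketch and assumes an engine with ES2018 look-behind support; the sample line is shortened.

```javascript
const line = '1.6.3 Type parameters.... 13';

// Split only on a space that is preceded, from the start of the string,
// by nothing but digits and dots.
const parts = line.split(/(?<=^[\d.]+) /);

console.log(parts); // [ '1.6.3', 'Type parameters.... 13' ]
```

The spaces after "Type" and before "13" are left alone because the text in front of them is not made up solely of digits and dots.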

Javascript Regular Expression: alternation and nesting

Here is what i've got so far:
/(netscape)|(navigator)\/(\d+)(\.(\d+))?/.test(UserAgentString.toLowerCase()) ? ' netscape'+RegExp.$3+RegExp.$4 : ''
I'm trying to do several different things here.
(1). I want to match either netscape or navigator, and it must be followed by a single slash and one or more digits.
(2). It can optionally follow those digits with up to one of: one period and one or more digits.
The expression should evaluate to an empty string if (1) is not true.
The expression should return ' netscape8' if UserAgentString is Netscape/8 or Navigator/8.
The expression should return ' netscape8.4' if UserAgentString is Navigator/8.4.2.
The regex is not working. In particular (this is an edited down version for my testing, and it still doesn't work):
// in Chrome this produces ["netscape", "netscape", undefined, undefined]
(/(netscape)|(navigator)\/(\d+)/.exec("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.7.5) Gecko/20060912 Netscape/8.1.2".toLowerCase()))
Why does the 8 not get matched? Is it supposed to show up in the third entry or the fourth?
There are a couple things that I want to figure out if they are supported. Notice how I have 5 sets of capture paren groups. group #5 \d+ is contained within group #4: \.(\d+). Is it possible to retrieve the matched groups?
Also, what happens if I specify a group like this? /(\.\d+)*/ This matches any number of "dot-number" strings concatenated together (like in a version number). What's RegExp.$1 supposed to match here?
Your "or" expression is not doing what you think.
Simplified, you're doing this:
(a)|(b)cde
Which matches either a or bcde.
Put parentheses around your "or" expression: ((a)|(b))cde and that will match either acde or bcde.
I find http://regexpal.com/ to be a very useful tool for quickly checking my regex syntax.
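The difference can be checked against the user agent string from the question. This is a sketch only: the variable names are made up, and a non-capturing group (?:...) is used so that the version digits become groups 1 and 2.

```javascript
const ua = 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.7.5) Gecko/20060912 Netscape/8.1.2'
  .toLowerCase();

// Without grouping, the alternation is "netscape" OR "navigator/<digits>"
const loose = /(netscape)|(navigator)\/(\d+)/.exec(ua);
console.log(loose[0]); // netscape
console.log(loose[3]); // undefined

// With the alternation grouped, the slash and digits apply to both names
const grouped = /(?:netscape|navigator)\/(\d+)(\.\d+)?/.exec(ua);
const label = grouped ? ' netscape' + grouped[1] + (grouped[2] || '') : '';
console.log(label);    // " netscape8.1"
```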
Regex (netscape|navigator)\/(\d+(?:\.\d+)?) will return 2 groups (if match found):
netscape or navigator
number behind the name
var m = /(netscape|navigator)\/(\d+(?:\.\d+)?)/.exec(text);
if (m != null) {
var r = m[1] + m[2];
}
(....) Creates a group. Everything inside that group is returned with that group's variable.
The following will match netscape or navigator and the first two numbers of the version separated by a period.
 $1                          $2
 |------------------|        |------------|
/(netscape|navigator)[^\/]*\/((\d+)\.(\d+))/
The final code looks like this:
/(netscape|navigator)[^\/]*\/((\d+)\.(\d+))/.test(
navigator.userAgent.toLowerCase()
) ? 'netscape'+RegExp.$2 : ''
Which will give you
netscape5.0
Check out these great tuts (there are many more):
http://perldoc.perl.org/perlrequick.html
http://perldoc.perl.org/perlre.html
http://perldoc.perl.org/perlretut.html
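On the question's side points about nested and repeated groups, a short sketch (the input strings are made up for illustration): groups are numbered by the position of their opening parenthesis, so nested groups are retrievable, and a group under a quantifier keeps only its last iteration.

```javascript
// Nested groups: numbering follows the order of the opening parentheses
const m = /(netscape|navigator)[^\/]*\/((\d+)\.(\d+))/.exec('netscape/8.4.2');
console.log(m[1]); // netscape
console.log(m[2]); // 8.4 (outer group)
console.log(m[3]); // 8   (first nested group)
console.log(m[4]); // 4   (second nested group)

// Repeated group: only the last repetition is remembered
const repeated = /(\.\d+)*/.exec('.1.2.3');
console.log(repeated[0]); // .1.2.3
console.log(repeated[1]); // .3
```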
