capture with regex in javascript - javascript

I have a string like "ListUI_col_order[01234567][5]". I'd like to capture the two numeric sequences from the string. The last part between the square brackets may contain up to 2 digits, while the first numeric sequence always contains 8 digits (and the numbers change dynamically, of course). I'm doing this in JavaScript, and the code for the first part is simple. I get the only 8-digit sequence from the string:
var str = $(this).attr('id');
var unique = str.match(/([0-9]){8}/g);
Getting the second part is a bit complicated to me. I cannot simply use:
var column = str.match(/[0-9]{1,2}/g)
Because this will match '01', '23', '45', '67', '5' in our example, that's clear. I can get the information I need as column[4], because the first part always contains 8 digits, but I'd like a nicer way to retrieve the last number.
So I define the context and tell the regex that I'm looking for a 1- or 2-digit number which has square brackets directly before and after it:
var column = str.match(/\[[0-9]{1,2}\]/g)
// this will return [5]. which is nearly what I want
So to get Only the numeric data I use parenthesis to capture only the numbers like:
var column = str.match(/\[([0-9]){1,2}\]/g)
// this will result in:
// column[0] = '[5]'
// column[1] = [5]
So my question is how to match the '[5]' but only capture the '5'? I have only the [0-9] between the parenthesis, but this will still capture the square brackets as well

You can get both numbers in one go :
var m = str.match(/\[(\d{8})\]\[(\d{1,2})\]$/)
For your example, this makes ["[01234567][5]", "01234567", "5"]
To get both matches as numbers, you can then do
if (m) return m.slice(1).map(Number)
which builds [1234567, 5]

Unfortunately, when this was written JavaScript did not support the lookbehind necessary to do this (lookbehind assertions were only added later, in ES2018). In other languages such as PHP, it'd be as simple as /(?<=\[)\d{1,2}(?=\])/, but in older JavaScript engines I am not aware of any way to do this other than using a capturing subpattern as you are here, and getting that index from the result array.
Side-note, it's usually better to put the quantifier inside the capturing group - otherwise you're repeating the group itself, not its contents!
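The side-note about quantifier placement can be checked directly. This is a minimal sketch using the sample id from the question; the two-digit id `wide` and the variable names are made up for illustration, and the last line assumes an engine with ES2018 lookbehind support.

```javascript
// Sample id from the question, plus a hypothetical two-digit column.
const str = 'ListUI_col_order[01234567][5]';
const wide = 'ListUI_col_order[01234567][42]';

// Quantifier outside the group: the group repeats and keeps only its last repetition.
const outside = wide.match(/\[([0-9]){1,2}\]/);
// Quantifier inside the group: the group captures the whole run of digits.
const inside = wide.match(/\[([0-9]{1,2})\]/);

console.log(outside[1]); // "2"
console.log(inside[1]);  // "42"

// With lookbehind (ES2018+), the brackets are asserted but never part of the match.
const lookbehind = str.match(/(?<=\[)\d{1,2}(?=\])/);
console.log(lookbehind[0]); // "5"
```

Note that none of these patterns use the g flag: with g, match() returns only the full matches and drops the capture groups.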

Related

Applying currency format using replace and a regular expression

I am trying to understand some code where a number is converted to a currency format. Thus, if you have 16.9 it converts to $16.90. The problem with the code is if you have an amount over $1,000, it just returns $1, an amount over $2,000 returns $2, etc. Amounts in the hundreds show up fine.
Here is the function:
var _formatCurrency = function(amount) {
return "$" + parseFloat(amount).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,')
};
(The reason the semicolon is after the bracket is because this function is in itself a statement in another function. That function is not relevant to this discussion.)
I found out that the person who originally put the code in there found it somewhere but didn't fully understand it and didn't test this particular scenario. I myself have not dealt much with regular expressions. I am not only trying to fix it, but to understand how it is working as it is now.
Here's what I've found out. The code between the slash after the open parenthesis and the slash before the g is the pattern. The g means global search. The \d means digit, and the (?=(\d{3})+\.) appears to mean find 3 digits plus a decimal point. I'm not sure I have that right, though, because if that was correct shouldn't it ignore numbers like 5.4? That works fine. Also, I'm not sure what the '$1,' is for. It looks to me like it is supposed to be placed where the digits are, but wouldn't that change all the numbers to $1? Also, why is there a comma after the 1?
Regarding your comment "I was hoping to just edit the regex so it would work properly.":
The regex you are currently using is obviously not working for you, so I think you should consider alternatives even if they are not too similar.
And regarding "Trying to keep the code change as small as possible":
Understandable, but sometimes it is better to use code that is a little bit bigger and MORE READABLE than to go with compact and hieroglyphic.
Back to business:
I'm assuming you are getting a string as an argument, and that this string is composed only of digits and may or may not have a dot before the last 1 or 2 digits. Something like
//input //intended output
1 $1.00
20 $20.00
34.2 $34.20
23.1 $23.10
62516.16 $62,516.16
15.26 $15.26
4654656 $4,654,656.00
0.3 $0.30
I will let you do a pre-check of (assumed) non-valids like 1. | 2.2. | .6 | 4.8.1 | 4.856 | etc.
Proposed solution:
var _formatCurrency = function(amount) {
amount = "$" + amount.replace(/(\d)(?=(\d{3})+(\.(\d){0,2})*$)/g, '$1,');
if(amount.indexOf('.') === -1)
return amount + '.00';
var decimals = amount.split('.')[1];
return decimals.length < 2 ? amount + '0' : amount;
};
Regex break down:
(\d): Matches one digit. Parentheses group things for referencing when needed.
(?=(\d{3})+(\.(\d){0,2})*$). Now this guy. From end to beginning:
$: Matches the end of the string. This is what allows you to match from the end instead of the beginning which is very handy for adding the commas.
(\.(\d){0,2})*: This part processes the dot and decimals. The \. matches the dot. (\d){0,2} matches 0, 1 or 2 digits (the decimals). The * implies that this whole group can be empty.
?=(\d{3})+: \d{3} matches 3 digits exactly. + means at least one occurrence. Finally ?= matches a group after the main expression without including it in the result. In this case it takes three digits at a time (from the end remember?) and leaves them out of the result for when replacing.
g: Match and replace globally, the whole string.
Replacing with $1,: This is how captured groups are referenced for replacing, in this case the wanted group is number 1. Since the pattern will match every digit in the position 3n+1 (starting from the end or the dot) and catch it in the group number 1 ((\d)), then replacing that catch with $1, will effectively add a comma after each capture.
Try it and please give feedback.
Also if you haven't already you should (and SO has not provided me with a format to stress this enough) really really look into this site as suggested by Taplar
The pattern is invalid, and your understanding of the function is incorrect. This function formats a number in a standard US currency, and here is how it works:
The parseFloat() function converts a string value to a decimal number.
The toFixed(2) function rounds the decimal number to 2 digits after the decimal point.
The replace() function is used here to add the thousands separators (i.e. a comma after every 3 digits). The pattern is incorrect, so here is a suggested fix /(\d)(?=(\d{3})+\.)/g and this is how it works:
The (\d) captures a digit.
The (?=(\d{3})+\.) is called a look-ahead and it ensures that the captured digit above is followed by one or more (+) sets of 3 digits (\d{3}) and then by the decimal point \.
The g flag/modifier is to apply the pattern globally, that is on the entire amount.
The replacement $1, replaces the pattern with the first captured group $1, which is in our case the digit (\d) (so technically replacing the digit with itself to make sure we don't lose the digit in the replacement) followed by a comma ,. So like I said, this is just to add the thousands separator.
Here are some tests with the suggested fix. Note that it works fine with numbers and strings:
var _formatCurrency = function(amount) {
return "$" + parseFloat(amount).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,');
};
console.log(_formatCurrency('1'));
console.log(_formatCurrency('100'));
console.log(_formatCurrency('1000'));
console.log(_formatCurrency('1000000.559'));
console.log(_formatCurrency('10000000000.559'));
console.log(_formatCurrency(1));
console.log(_formatCurrency(100));
console.log(_formatCurrency(1000));
console.log(_formatCurrency(1000000.559));
console.log(_formatCurrency(10000000000.559));
Okay, I want to apologize to everyone who answered. I did some further tracing and found out the JSON call which was bringing in the amount did in fact have a comma in it, so it is just parsing that first digit. I was looking in the wrong place in the code when I thought there was no comma in there already. I do appreciate everyone's input and hope you won't think too bad of me for not catching that before this whole exercise. If nothing else, at least I now know how that regex operates so I can make use of it in the future. Now I just have to go about removing that comma.
Have a great day!
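The root cause found above (a comma already present in the incoming amount) is easy to reproduce, because parseFloat stops at the first character it cannot parse. The following is only a sketch of one possible fix, assuming the comma is always a thousands separator; `formatCurrencySafe` is a hypothetical name, not part of the original code.

```javascript
// Hypothetical helper: strips existing thousands separators before formatting.
function formatCurrencySafe(amount) {
  // parseFloat('1,234.5') yields 1, so remove commas first.
  const cleaned = String(amount).replace(/,/g, '');
  return '$' + parseFloat(cleaned).toFixed(2).replace(/(\d)(?=(\d{3})+\.)/g, '$1,');
}

console.log(parseFloat('1,234.5'));         // 1
console.log(formatCurrencySafe('1,234.5')); // $1,234.50
console.log(formatCurrencySafe(16.9));      // $16.90
console.log(formatCurrencySafe('2000'));    // $2,000.00
```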
Assuming that you are working with USD only, then this should work for you as an alternative to Regular Expressions. I have also included a few tests to verify that it is working properly.
var test1 = '16.9';
var test2 = '2000.5';
var test3 = '300000.23';
var test4 = '3000000.23';
function stringToUSD(inputString) {
const splitValues = inputString.split('.');
const wholeNumber = splitValues[0].split('')
.map(val => parseInt(val))
.reverse()
.map((val, idx, arr) => idx !== 0 && (idx + 1) % 3 === 0 && arr[idx + 1] !== undefined ? `,${val}` : val)
.reverse()
.join('');
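// Build the result as a string: parseFloat would stop at the first comma and drop the rest.
const decimals = (splitValues[1] || '').padEnd(2, '0').slice(0, 2); // pads or truncates, does not round
return `$${wholeNumber}.${decimals}`;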
}
console.log(stringToUSD(test1));
console.log(stringToUSD(test2));
console.log(stringToUSD(test3));
console.log(stringToUSD(test4));

split on words except when phrase contains that word

I am trying to split where clauses, I want to split text on AND|OR|NOT except when NOT is in the 'phrase' NOT IN or NOT LIKE or IS NOT NULL.
1st example:
DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL
I expect 3 segments, splitting on the AND's, but not on the NOT in this instance.
2nd ex:
(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0
I want to split on this NOT & AND, also expecting 3 segments in this instance
I'm trying this look ahead from another almost similar example but it is not splitting at all. I have next to zero regex experience. Not sure what I'm missing or if it's even possible.
Thanks in advance.
var ignoreRegex = /(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)/g
var filterArray = filterBy.split(new RegExp(ignoreRegex));
Try with:
\b(AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)))\b
About your regex:
(?!.*\b([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL])\b)(?=.*\b(AND|OR|NOT)\b)
[NOT IN] - this is a character class [...]. It matches a single character from the ones you put in it, so it can match N, T, etc., not the whole word or phrase.
([NOT IN]|[NOT LIKE]|[NOT BETWEEN]|[IS NOT NULL]) - this whole part can only ever match one character, because it doesn't use any quantifiers or intervals, so it doesn't work as you expect at all.
So the whole regex matches a position that is followed somewhere by AND, OR or NOT, but only if no standalone character from those classes appears later on the line. Because the classes include the space, a single space between two words already counts, so it will probably not match anything.
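To see the look-ahead approach working on the two examples from the question, here is a small sketch. It assumes the keywords are upper-case and that surrounding whitespace can be trimmed; `splitWhere` and `splitter` are illustrative names. The inner alternatives are grouped so that \s+ applies to all of them, and the outer group is non-capturing so that split() does not return the separators.

```javascript
// Split on AND, OR, or NOT, unless NOT is followed by NULL, IN or LIKE.
const splitter = /\b(?:AND|OR|NOT(?!\s+(?:NULL|IN|LIKE)\b))\b/;

function splitWhere(clause) {
  return clause.split(splitter).map(function (part) {
    return part.trim();
  });
}

console.log(splitWhere('DEVLDATE IS NOT NULL AND STATUS = D AND PICKUPDATE IS NULL'));
// [ 'DEVLDATE IS NOT NULL', 'STATUS = D', 'PICKUPDATE IS NULL' ]

console.log(splitWhere("(NOT (STATUS IN ('A','X') )) AND LINEHAUL = 0"));
// [ '(', "(STATUS IN ('A','X') ))", 'LINEHAUL = 0' ]
```

In the second example the first segment is just the opening parenthesis that precedes NOT, which still gives the three segments the question expects.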

Javascript regex match returning a string with comma at the end

Just as the title says...i'm trying to parse a string for example
2x + 3y
and i'm trying to get only the coefficients (i.e. 2 and 3)
I first tokenized it with space character as delimiter giving me "2x" "+" "3y"
then i parsed it again to this statement to get only the coefficients
var number = eqTokens[i].match(/(\-)?\d+/);
I tried printing the output but it gave me "2,"
why is it printing like this and how do i fix it? i tried using:
number = number.replace(/[,]/, "");
but this just gives me an error that number.replace is not a function
What's wrong with this?
> "2x + 3y".match(/-?\d+(?=[A-Za-z]+)/g)
[ '2', '3' ]
The above regex matches the numbers only if they are followed by one or more letters.
Without the g flag, match returns an array containing the full match followed by each capture group. Since you put the optional negative in parentheses, it's a capture group. That capture group has one term and it's optional, so when there is no minus sign it adds an undefined entry after your actual match.
Input 2x -> Your output: ["2", undefined] which prints out as "2,"
Input -2x -> Your output: ["-2", "-"]
Remove the parentheses around the negative.
This is just for the sake of explaining why your case is breaking but personally I'd use Avinash's answer.
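The behaviour described above can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative.

```javascript
const token = '2x';

// Without the g flag, match() returns the full match followed by each capture group.
const withGroup = token.match(/(\-)?\d+/);
console.log(String(withGroup)); // "2," because the unmatched group is undefined

// Dropping the parentheses leaves only the full match.
const withoutGroup = token.match(/-?\d+/);
console.log(String(withoutGroup)); // "2"

// Reading index 0 explicitly works with either pattern.
const negative = Number('-3y'.match(/(\-)?\d+/)[0]);
console.log(negative); // -3
```

This also explains the "number.replace is not a function" error: the result of match() is an array (or null), not a string.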

Get id from url

I have the following example url: #/reports/12/expense/11.
I need to get the id just after the reports -> 12. What I am asking here is the most suitable way to do this. I can search for reports in the url and get the content just after that ... but what if at some point I decide to change the url? I will have to change my algorithm.
What do you think is the best way here? Some code examples would also be very helpful.
It's hard to write code that is future-proof since it's hard to predict the crazy things we might do in the future!
However, if we assume that the id will always be the first run of consecutive digits in the URL then you could simply look for that:
function getReportId(url) {
var match = url.match(/\d+/);
return (match) ? Number(match[0]) : null;
}
getReportId('#/reports/12/expense/11'); // => 12
getReportId('/some/new/url/report/12'); // => 12
You should use a regular expression to find the number inside the string. Passing the regular expression to the string's .match() method will return an array containing the matches based on the regular expression. In this case, the item of the returned array that you're interested in will be at the index of 1, assuming that the number will always be after reports/:
var text = "#/reports/12/expense/11";
var id = text.match(/reports\/(\d+)/);
alert(id[1]);
\d+ here means that you're looking for one or more consecutive digits.
var text = "#/reports/12/expense/11";
var id = text.match("#/[a-zA-Z]*/([0-9]*)/[a-zA-Z]*/")
console.log(id[1])
Regex explanation:
#/ matches the characters #/ literally
[a-zA-Z]* - matches a word
/ matches the character / literally
1st Capturing group - ([0-9]*) - this matches a number.
[a-zA-Z]* - matches a word
/ matches the character / literally
Regular expressions can be tricky (and expensive). So usually if you can efficiently do the same thing without them you should. Looking at your URL format you would probably want to put at least a few constraints on it, otherwise the problem will be very complex. For instance, you probably want to assume the value will always appear directly after the key, so in your sample reports=12 and expense=11, but reports and expense could be switched (e.g. expense/11/reports/12) and you would get the same result.
I would just use string split:
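var parts = url.split("/");
for(var i = 0; i < parts.length; i++) {
// the sample URL uses "reports" as the key
if(parts[i] === "reports"){
this.reportValue = parts[i+1];
i++; // skip the value; the loop's own i++ then moves on to the next key
} else if(parts[i] === "expense"){
this.expenseValue = parts[i+1];
i++;
}
}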
So this way your key/value parts can appear anywhere in the array
Note: you will also want to check that i+1 is in the range of the parts array. But that would just make this sample code ugly and it is pretty easy to add in. Depending on what values you are expecting (or not expecting) you might also want to check that values are numbers using isNaN
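The range and isNaN checks mentioned in the note could look roughly like this. It is only a sketch: `parsePath` and its `keys` parameter are hypothetical names, and it assumes every value of interest is numeric and directly follows its key.

```javascript
// Hypothetical helper: collects numeric "key/value" pairs from a slash-separated path.
function parsePath(url, keys) {
  const parts = url.split('/');
  const result = {};
  // Stop one short of the end so parts[i + 1] always exists.
  for (let i = 0; i < parts.length - 1; i++) {
    const value = parts[i + 1];
    // isNaN('') is false, so empty segments are rejected explicitly.
    if (keys.includes(parts[i]) && value !== '' && !isNaN(value)) {
      result[parts[i]] = Number(value);
      i++; // skip the value that was just consumed
    }
  }
  return result;
}

console.log(parsePath('#/reports/12/expense/11', ['reports', 'expense']));
// { reports: 12, expense: 11 }
console.log(parsePath('expense/11/reports/12', ['reports', 'expense']));
// { expense: 11, reports: 12 }
```

Returning a plain object instead of assigning to `this` keeps the helper independent of where it is called from.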

Javascript regex, determining what group was matched on

I have the following regex in javascript for matching similar to book[n], book[1,2,3,4,5,...,n], book[author="Kristian"] and book[id=n] (n is an arbitrary number):
var opRegex = /\[[0-9]+\]|\[[0-9]+,.*\]|\[[a-zA-Z]+="*.+"*\]/gi;
I can use this in the following way:
// If there is no match in any of the groups hasOp will be null
hasOp = opRegex.exec('books[0]');
/*
Result: ["[0]", index: 5, input: "books[0]"]
*/
As shown above I not only get the value but also the [ and ]. I can avoid this by using groups. So I changed the regex to:
var opRegex = /\[([0-9]+)\]|\[([0-9]+,.*)\]|\[([a-zA-Z]+=".+")\]/gi;
Running the same as above the results will instead be:
["[0]", "0", undefined, undefined, index: 5, input: "books[0]"]
Above I get the groups as index 1, 2 and 3 in the array. For this example the match is in the first group, but if the match is in the second regex group it will be at index 2 of the array.
Can I change my first regex to get the value without the brackets or do I go with the grouped approach and a while loop to get the first defined value?
Anything else I'm missing? Is it greedy?
Let me know if you need more information and I'll be happy to provide it.
I have a few suggestions. First, especially since you are looking for literal brackets, avoid the regex brackets when you can (replace [0-9] with \d, for example). Also, you were allowing multiple quotes with the *, so I changed it to "?. But most importantly, I moved the match for the brackets outside the alternation, since they should be in every alternate match. That way, you have the same group no matter which part matches.
/\[(\d+(,\d+)*|[a-zA-Z]+="?[^\]]+"?)\]/gi
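A quick way to check that the value always lands in the same group is to run the suggested pattern over the inputs from the question. This is a sketch only: `getOp` is a hypothetical wrapper, and the g flag is dropped here because a global regex keeps its lastIndex between exec() calls, which would make repeated calls on different strings unreliable.

```javascript
// Pattern from the answer, without the g flag so each exec() call starts at index 0.
const opRegex = /\[(\d+(,\d+)*|[a-zA-Z]+="?[^\]]+"?)\]/i;

// Returns the bracket contents, or null when there is no operator.
function getOp(selector) {
  const m = opRegex.exec(selector);
  return m ? m[1] : null;
}

console.log(getOp('books[0]'));                // "0"
console.log(getOp('book[1,2,3]'));             // "1,2,3"
console.log(getOp('book[author="Kristian"]')); // 'author="Kristian"'
console.log(getOp('book[id=7]'));              // "id=7"
console.log(getOp('book'));                    // null
```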
