Javascript text manipulation - javascript

Just starting with js, decided to convert Friendfeed to a fluid app, and as part of that I need to be able to parse some numbers out of a string.
How do I complete this function?
function numMessages(text) {
MAGIC HAPPENS (POSSIBLY THE DARK ART OF THE REGEX)
return number;
}
input would be "Direct Messages (15)"
output would be 15.
My first instinct is to find the first bracket, then the last bracket, and get the text in between, but I don't know how to do that. My second instinct is to use a regex for [0-9], but I don't know how to run regexes in JS. jQuery is already available if needed.
Thanks!

This should do it:
>>> 'Direct Messages (15)'.match(/[0-9]+/g);
["15"]
Just be careful if you expect more than 1 number to be in the string:
>>> 'Direct 19 Messages (15)'.match(/[0-9]+/g);
["19", "15"]
If you only wanted the first match, you could remove the g flag:
>>> 'Direct 19 Messages (15)'.match(/[0-9]+/);
["19"]
If you only wanted to match what's between the parentheses
>>> 'Direct 19 Messages (15)'.match(/\((.*?)\)/);
["(15)","15"]
// first index will always be entire match, 2nd index will be captured match
As pointed out in the comments, to get the last match:
>>> var matches = 'Direct 19 Messages (15)'.match(/[0-9]+/g);
>>> matches[matches.length-1];
"15"
Though some boundary checking would also be appropriate. :)

var reg = new RegExp('[0-9]+');
reg.exec('Direct Messages (15)');

function numMessages(text) {
return text.match(/\d+/g);
}
This will return all numbers (\d is a special character class equivalent to [0-9]) from the string. The g flag makes the regex engine do a global search, so match returns an array of all matches; if you just want one, remove the g. Whether or not your expression is global, match returns an array (or null if nothing matches), so you will need to use array notation to get at the element you want.
Note that the results of a regular expression match are strings; if you want numbers, you can use parseInt (ideally with a radix, as in parseInt("15", 10)) to convert "15" to 15.
Putting that all together, if you just want one number, as it seems to appear from your initial question text:
function numMessages(text) {
return parseInt(text.match(/\d+/)[0], 10);
}
var str = "Direct Messages (15)";
numMessages(str); // 15
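If the text might not contain a number at all, text.match(/\d+/) returns null and indexing it with [0] throws. A slightly more defensive sketch, assuming the count always appears as digits inside parentheses (as in "Direct Messages (15)") and that returning null is acceptable when it is missing:

```javascript
// Extract the number inside the first "(digits)" pair, e.g. "Direct Messages (15)" -> 15.
// Assumption: the count is written as digits in parentheses; returns null otherwise.
function numMessages(text) {
  const match = /\((\d+)\)/.exec(text); // match[1] holds the captured digits
  if (!match) {
    return null; // no "(digits)" anywhere in the text
  }
  return parseInt(match[1], 10); // pass a radix explicitly
}

console.log(numMessages("Direct Messages (15)"));    // 15
console.log(numMessages("Direct 19 Messages (15)")); // 15 (the 19 outside the parentheses is ignored)
console.log(numMessages("Direct Messages"));         // null
```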

Related

How to parse and capture any measurement unit

In my application, users can customize measurement units, so if they want to work in decimeters instead of inches or in full-turns instead of degrees, they can. However, I need a way to parse a string containing multiple values and units, such as 1' 2" 3/8. I've seen a few regular expressions on SO and didn't find any which matched all cases of the imperial system, let alone allowing any kind of unit. My objective is to have the most permissive input box possible.
So my question is: how can I extract multiple value-unit pairs from a string in the most user-friendly way?
I came up with the following algorithm:
Check for illegal characters and throw an error if needed.
Trim leading and trailing spaces.
Split the string into parts every time there's a non-digit character followed by a digit character, except for .,/ which are used to identify decimals and fractions.
Remove all spaces from parts, check for character misuse (multiple decimal points or fraction bars) and replace '' with ".
Split value and unit-string for each part. If a part has no unit:
If it is the first part, use the default unit.
Else if it is a fraction, consider it as the same unit as the previous part.
Else if it isn't, consider it as in, cm or mm based on the previous part's unit.
If it isn't the first part and there's no way to guess the unit, throw an error.
Check if units mean something, are all of the same system (metric/imperial) and follow a descending order (ft > in > fraction or m > cm > mm > fraction), throw an error if not.
Convert and sum all parts, performing division in the process.
I guess I could use string manipulation functions to do most of this, but I feel like there must be a simpler way through regex.
I came up with a regex:
((\d+('|''|"|m|cm|mm|\s|$) *)+(\d+(\/\d+)?('|''|"|m|cm|mm|\s|$) *)?)|((\d+('|''|"|m|cm|mm|\s) *)*(\d+(\/\d+)?('|''|"|m|cm|mm|\s|$) *))
It only allows fractions at the end and allows spaces between values. I've never used regex capturing, though, so I'm not sure how I'll manage to extract the values out of this mess. I'll work on this again tomorrow.
My objective is to have the most permissive input box possible.
Careful, more permissive doesn't always mean more intuitive. An ambiguous input should warn the user, not pass silently, as that might lead them to make multiple mistakes before they realize their input wasn't interpreted like they hoped.
How can I extract multiple value-unit pairs from a string? I guess I could use string manipulation functions to do most of this, but I feel like there must be a simpler way through regex.
Regular expressions are a powerful tool, especially since they work in many programming languages, but be warned. When you're holding a hammer everything starts to look like a nail. Don't try to use a regular expression to solve every problem just because you recently learned how they work.
Looking at the pseudocode you wrote, you are trying to solve two problems at once: splitting up a string (which we call tokenization) and interpreting input according to a grammar (which we call parsing). You should first split the input into a list of tokens, or maybe unit-value pairs. You can start making sense of these pairs once you're done with string manipulation. Separation of concerns will spare you a headache, and your code will be much easier to maintain as a result.
I've never used regex capturing though, so I'm not so sure how I'll manage to extract the values out of this mess.
If a regular expression has the global (g) flag, it can be used to find multiple matches in the same string. That would be useful if you had a regular expression that finds a single unit-value pair. In JavaScript, you can retrieve a list of matches using string.match(regex). However, that function ignores capture groups on global regular expressions.
If you want to use capture groups, you need to call regex.exec(string) inside a loop. For each successful match, the exec function will return an array where item 0 is the entire match and items 1 and onwards are the captured groups.
For example, /(\d+) ([a-z]+)/g will look for an integer followed by a space and a word. If you made successive calls to regex.exec("1 hour 30 minutes") you would get:
["1 hour", "1", "hour"]
["30 minutes", "30", "minutes"]
null
Successive calls work like this because the regex object keeps an internal cursor you can get or set with regex.lastIndex. You should set it back to 0 before using the regex again with a different input.
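A minimal sketch of that exec loop, using the /(\d+) ([a-z]+)/g example above (the function name collectPairs is just for illustration). It resets lastIndex before scanning, since the regex object is shared between calls:

```javascript
// Iterate over every integer-word pair with exec() and read the capture groups.
const pairRx = /(\d+) ([a-z]+)/g;

function collectPairs(input) {
  pairRx.lastIndex = 0; // reset the shared cursor before scanning a new string
  const pairs = [];
  let match;
  while ((match = pairRx.exec(input)) !== null) {
    // match[0] is the whole match; match[1] and match[2] are the captured groups
    pairs.push({ value: Number(match[1]), unit: match[2] });
  }
  return pairs; // a failed exec() has already set lastIndex back to 0
}

console.log(JSON.stringify(collectPairs("1 hour 30 minutes")));
// [{"value":1,"unit":"hour"},{"value":30,"unit":"minutes"}]
```

In engines that support ES2020, input.matchAll(pairRx) yields the same match arrays without any manual lastIndex handling.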
You've been using parentheses to isolate OR clauses such as a|b and to apply quantifiers to a character sequence such as (abc)+. If you want to do that without creating capture groups, you can use (?: ) instead. This is called a non-capturing group. It does the same thing as regular parentheses in a regex, but what's inside it won't create an entry in the returned array.
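A small illustration of the difference on a toy pattern (not taken from the question): a quantified capturing group adds an entry to the result and keeps only its last repetition, while a non-capturing group adds nothing.

```javascript
// The same pattern with a capturing group and with a non-capturing group.
const withGroup = "ababc".match(/(ab)+c/);
const withoutGroup = "ababc".match(/(?:ab)+c/);

// A quantified capturing group only keeps its last repetition.
console.log(withGroup.length, withGroup[1]);       // logs: 2 ab
// A non-capturing group adds no entry: only the whole match remains.
console.log(withoutGroup.length, withoutGroup[0]); // logs: 1 ababc
```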
Is there a better way to approach this?
A previous version of this answer concluded with a regular expression even more incomprehensible than the one posted in the question because I didn't know better at the time, but today this would be my recommendation. It's a regular expression that only extracts one token at a time from the input string.
/ (\s+) // 1 whitespace
| (\d+)\/(\d+) // 2,3 fraction
| (\d*)([.,])(\d+) // 4,5,6 decimal
| (\d+) // 7 integer
| (km|cm|mm|m|ft|in|pi|po|'|") // 8 unit
/gi
Sorry about the weird syntax highlighting. I used whitespace and comments to make this more readable, but JavaScript regex literals don't allow those, so as a valid literal it becomes:
/(\s+)|(\d+)\/(\d+)|(\d*)([.,])(\d+)|(\d+)|(km|cm|mm|m|ft|in|pi|po|'|")/gi
This regular expression makes clever use of capture groups separated by OR clauses. Only the capture groups of one type of token will contain anything; the groups belonging to the other alternatives are undefined. For example, on the string "10 ft", successive calls to exec would return:
["10", undefined, undefined, undefined, undefined, undefined, undefined, "10", undefined] (because "10" is an integer)
[" ", " ", undefined, undefined, undefined, undefined, undefined, undefined, undefined] (because " " is whitespace)
["ft", undefined, undefined, undefined, undefined, undefined, undefined, undefined, "ft"] (because "ft" is a unit)
null
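To see which alternative fired for each token, a small helper can report the index of the first capture group that is not undefined (debugTokens is a hypothetical name used only for this demonstration):

```javascript
const tokenRx = /(\s+)|(\d+)\/(\d+)|(\d*)([.,])(\d+)|(\d+)|(km|cm|mm|m|ft|in|pi|po|'|")/gi;

// Returns [matchedText, indexOfFirstParticipatingGroup] for every token.
function debugTokens(input) {
  tokenRx.lastIndex = 0; // start from the beginning of the new input
  const rows = [];
  let match;
  while ((match = tokenRx.exec(input)) !== null) {
    // Groups that did not take part in this match are undefined.
    const group = match.findIndex((value, i) => i > 0 && value !== undefined);
    rows.push([match[0], group]);
  }
  return rows;
}

console.log(JSON.stringify(debugTokens("10 ft"))); // [["10",7],[" ",1],["ft",8]]
```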
A tokenizer function can then do something like this to treat each individual token:
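// tokenRx is the token regular expression shown above
const tokenRx = /(\s+)|(\d+)\/(\d+)|(\d*)([.,])(\d+)|(\d+)|(km|cm|mm|m|ft|in|pi|po|'|")/gi;
function tokenize (input) {
// private copy, so every tokenizer keeps its own lastIndex cursor
const localTokenRx = new RegExp(tokenRx);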
return function next () {
const startIndex = localTokenRx.lastIndex;
if (startIndex >= input.length) {
// end of input reached
return undefined;
}
const match = localTokenRx.exec(input);
if (!match) {
localTokenRx.lastIndex = input.length;
// there is leftover garbage at the end of the input
return ["garbage", input.slice(startIndex)];
}
if (match.index !== startIndex) {
localTokenRx.lastIndex = match.index;
// the regex skipped over some garbage
return ["garbage", input.slice(startIndex, match.index)];
}
const [
text,
whitespace,
numerator, denominator,
integralPart, decimalSeparator, fractionalPart,
integer,
unit
] = match;
if (whitespace) {
return ["whitespace", undefined];
// or return next(); if we want to ignore it
}
if (denominator) {
return ["fraction", Number(numerator) / Number(denominator)];
}
if (decimalSeparator) {
return ["decimal", Number(integralPart + "." + fractionalPart)];
}
if (integer) {
return ["integer", Number(integer)];
}
if (unit) {
return ["unit", unit];
}
};
}
This function does all the necessary string manipulation and type conversion in one place, letting another piece of code do proper analysis of the sequence of tokens. But that would be out of scope for this Stack Overflow answer, especially since the question doesn't specify the rules of the grammar we are willing to accept.
But this is most likely too generic and complex a solution if all you're trying to do is accept imperial lengths and metric lengths. For that, I'd probably write a different regular expression for each acceptable format, then test the user's input to see which one matches. If two different expressions match, then the input is ambiguous and we should warn the user.
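A minimal sketch of that "one regex per format" idea. The format names and patterns below are hypothetical and only illustrative; the point is that "1,500" matches both the decimal-comma and the thousands-separator format, so it is reported as ambiguous instead of being silently interpreted:

```javascript
// Hypothetical set of accepted formats; names and patterns are illustrative only.
const formats = {
  feetAndInches: /^(\d+)\s*(?:'|ft)\s*(?:(\d+)\s*(?:"|in))?$/i, // e.g. 5' 11"
  decimalPoint: /^\d+\.\d+$/,                                 // e.g. 1.5
  decimalComma: /^\d+,\d+$/,                                  // e.g. 1,5 (comma as decimal separator)
  thousandsComma: /^\d{1,3}(?:,\d{3})+$/,                     // e.g. 1,500 (comma as thousands separator)
};

// Test the input against every format; more than one hit means it is ambiguous.
function classify(input) {
  const text = input.trim();
  const hits = Object.keys(formats).filter((name) => formats[name].test(text));
  if (hits.length === 0) {
    return { error: "unrecognized input" };
  }
  if (hits.length > 1) {
    return { error: "ambiguous input", candidates: hits };
  }
  return { format: hits[0] };
}

console.log(JSON.stringify(classify("5' 11\""))); // {"format":"feetAndInches"}
console.log(JSON.stringify(classify("1,5")));     // {"format":"decimalComma"}
console.log(JSON.stringify(classify("1,500")));
// {"error":"ambiguous input","candidates":["decimalComma","thousandsComma"]}
```

None of the patterns use the g flag, so calling test() repeatedly does not depend on a leftover lastIndex.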

Javascript regex match returning a string with comma at the end

Just as the title says... I'm trying to parse a string, for example
2x + 3y
and I'm trying to get only the coefficients (i.e. 2 and 3).
I first tokenized it with a space character as the delimiter, giving me "2x" "+" "3y",
then I parsed each token again with this statement to get only the coefficients:
var number = eqTokens[i].match(/(\-)?\d+/);
I tried printing the output, but it gave me "2,".
Why is it printing like this, and how do I fix it? I tried using:
number = number.replace(/[,]/, "");
but this just gives me an error that number.replace is not a function.
What's wrong with this?
> "2x + 3y".match(/-?\d+(?=[A-Za-z]+)/g)
[ '2', '3' ]
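The above regex matches a number only if it's followed by one or more letters.
match returns an array. Without the g flag, that array holds the whole match followed by every capture group. Since you put the optional negative sign in parentheses, it's a capture group; when there is no sign, that group doesn't participate, so its slot is undefined. Converting the array to a string joins the elements with commas, which is where the trailing comma comes from. It's also why number.replace is not a function: number is an array, not a string.
Input 2x -> your output: ["2", undefined], which prints as "2,"
Input -2x -> your output: ["-2", "-"]
Remove the parentheses around the negative sign (/-?\d+/) and read the string out of the array with number[0].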
This is just for the sake of explaining why your case is breaking but personally I'd use Avinash's answer.
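A minimal sketch of the fixed version of the question's approach, assuming the same space-separated tokenization (coefficientsOf is a hypothetical helper name):

```javascript
// Extract the numeric coefficient of each space-separated term, e.g. "2x + 3y" -> [2, 3].
function coefficientsOf(expression) {
  const coefficients = [];
  for (const token of expression.split(" ")) {
    const match = token.match(/-?\d+/); // no capture group: the result is [wholeMatch] or null
    if (match) {
      coefficients.push(Number(match[0])); // take the string out of the array, then convert it
    }
  }
  return coefficients;
}

console.log(JSON.stringify(coefficientsOf("2x + 3y")));  // [2,3]
console.log(JSON.stringify(coefficientsOf("-2x + 3y"))); // [-2,3]
```

With space-based tokenization, the sign in "2x - 3y" ends up in its own token, so this sketch would report 3 rather than -3 for the second term.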

JavaScript Regex to capture repeating part of decimal

Looking for the best way to take an arbitrary number with potentially repeating decimal part, and discover the repeating part (if present).
Ultimately, I need to decorate the number with overline notation (either with css text-decoration or MathML mline), so I need to know the index of where the repetition begins also.
So I need regex that will get me (or can be used in an algorithm to get) the following results:
1.333 // result: {"pattern": 3, index: 0}
1.5444 // result: {"pattern": 4, index: 1}
1.123123 // result: {"pattern": 123, index: 0}
1.5432121212 // result: {"pattern": 12, index: 4}
1.321 // result: null
1.44212 // result: null
Additional Example (from comments):
1.3333 // result: { "pattern": 3, index: 0}
function getRepetend(num) {
var m = (num+'').match(/\.(\d*?)(\d+?)\2+$/);
return m && {pattern: +m[2], index: m[1].length};
}
It works like this:
First, convert the number to string in order to be able to use regular expressions.
Then, match this regex: /\.(\d*?)(\d+?)\2+$/:
\. matches the decimal dot.
(\d*?) matches the digits between the decimal dot and the repetend, and captures the result into backreference number 1.
(\d+?) matches the repetend, and captures it into backreference number 2.
\2+ matches repetitions of the repetend.
$ matches end of string.
Finally, if the match is null (i.e. there is no match), return null.
Otherwise, return an object that contains the repetend (backreference 2) converted to number, and the number of digits between the dot and the repetend (backreference 1).
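Because getRepetend converts the repetend with unary +, a repetend with a leading zero, such as the "09" in 0.090909, would come back as the number 9. If that matters, a minimal variant (getRepetendString is a hypothetical name) can take the digits as a string and keep the pattern as a string, using the same regex:

```javascript
// Same regex as getRepetend, but the input is already a string and the
// repetend is returned as a string, so leading zeros survive.
function getRepetendString(str) {
  const m = str.match(/\.(\d*?)(\d+?)\2+$/);
  return m && { pattern: m[2], index: m[1].length };
}

console.log(JSON.stringify(getRepetendString("0.090909")));     // {"pattern":"09","index":0}
console.log(JSON.stringify(getRepetendString("1.5432121212"))); // {"pattern":"12","index":4}
console.log(JSON.stringify(getRepetendString("1.321")));        // null
```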
You could try something like this:
(\d+?)\1+$
http://regex101.com/r/eX8eC3/3
It matches some number of digits and then uses a backreference to try to match the same digits immediately afterwards one or more times. It's anchored at the end of the string because otherwise it would be tripped up by, for example:
1.5432121212
It would see the 21 repeating instead of the 12.
Adding ? to the first group to make it non-greedy should fix the problem with 1.3333 as raised by Louis.
You can use this regex with RegExp#exec and read result.index from the resulting object:
var re = /(\d+)\1$/;
var s = '.5439876543212211211';
var result = re.exec( s );
console.log ( result.index );
//=> 14
console.log ( result[1] );
//=> 211
(.+)(?:\1)+$
Try this. See demo:
http://regex101.com/r/uH3tP3/10
The accepted answer is OK-ish as far as the examples given in the question are concerned. However, if you find yourself here one day, what you probably need is exactly what the title says:
JavaScript Regex to capture repeating part of decimal
So you have the fractional part of a number as a string and you want to know whether it repeats. The accepted answer fails in practice: you can never guarantee that the string ends exactly where the repeating part ends. Most of the time the string ends with only a portion of the repeating part, or, *thanks* to double-precision rounding errors, jumps to irrelevant digits towards the end. So my suggestion is
/(\d+)\1+(?=\d*$)/g
Now, this is not a silver bullet. It's helpful, but it won't protect you from vampires like 3.1941070707811985, which happens to have no repetend at all. To catch cases like that you need deeper checks. However, in most cases it does a fine job of spotting the repetend, as in
3.1941070707811985 // 07 which is wrong so prove it later
7.16666666810468 // 666 but reduce it to 6 later
3.00000000000001 // 000000 but reduce it to "" later
0.008928571428571428 // 285714 just fine, do nothing
It is not an easy task to determine whether the fractional part of a decimal has a repetend in this environment. Most likely you will need to do further processing on the given string and on the result of the regex to reduce it further or make a decision.
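To see what that further processing has to sift through, a small sketch (repetendCandidates is a hypothetical helper) can collect every match of the /(\d+)\1+(?=\d*$)/g regex from the fractional digits. Note that with the g flag the regex can report more than one candidate, for example a leading run of zeros before the real repetend:

```javascript
// List every candidate repetend found in the digits after the decimal point.
function repetendCandidates(numberString) {
  const fraction = numberString.split(".")[1] || "";
  // Group 1 of each match is the repeated unit.
  return Array.from(fraction.matchAll(/(\d+)\1+(?=\d*$)/g), (m) => m[1]);
}

console.log(JSON.stringify(repetendCandidates("3.1941070707811985")));   // ["07","1"]
console.log(JSON.stringify(repetendCandidates("7.16666666810468")));     // ["666"]
console.log(JSON.stringify(repetendCandidates("0.008928571428571428"))); // ["0","285714"]
```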

capture with regex in javascript

I have a string like "ListUI_col_order[01234567][5]". I'd like to capture the two numeric sequences from the string. The last part between the square brackets may contain 2 digits, while the first numeric sequence always contains 8 digits (and the numbers change dynamically, of course). I'm doing this in JavaScript, and the code for the first part is simple: I get the only 8-digit sequence from the string:
var str = $(this).attr('id');
var unique = str.match(/([0-9]){8}/g);
Getting the second part is a bit complicated to me. I cannot simply use:
var column = str.match(/[0-9]{1,2}/g)
Because this will match '01', '23', '45', '67', '5' in our example. I can get the information I need as column[4], because the first part always contains 8 digits, but I'd like a nicer way to retrieve the last number.
So I define the context and tell the regex that I'm looking for a 1 or 2 digit number which has square brackets directly before and after it:
var column = str.match(/\[[0-9]{1,2}\]/g)
// this will return ['[5]'], which is nearly what I want
So to get Only the numeric data I use parenthesis to capture only the numbers like:
var column = str.match(/\[([0-9]){1,2}\]/g)
// this will result in:
// column[0] = '[5]'
// column[1] = [5]
So my question is: how do I match '[5]' but only capture the '5'? I have only [0-9] inside the parentheses, but this still captures the square brackets as well.
You can get both numbers in one go:
var m = str.match(/\[(\d{8})\]\[(\d{1,2})\]$/)
For your example, this makes ["[01234567][5]", "01234567", "5"]
To get both matches as numbers, you can then do
if (m) return m.slice(1).map(Number)
which builds [1234567, 5]
When this answer was written, JavaScript did not support the lookbehind necessary to do this. In other languages such as PHP, it's as simple as /(?<=\[)\d{1,2}(?=\])/. ES2018 added lookbehind assertions to JavaScript, so modern engines accept that same pattern; in older engines, the only way I know of is to use a capturing subpattern as you are doing here and read that index from the result array.
Side note: it's usually better to put the quantifier inside the capturing group. Otherwise you're repeating the group itself, not its contents, and the group captures only the last repetition.
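A quick sketch comparing both approaches on the question's string; the lookbehind version needs an engine with ES2018 lookbehind support (any current browser or Node.js):

```javascript
const str = "ListUI_col_order[01234567][5]";

// ES2018+ lookbehind: the brackets are checked but not included in the match.
const viaLookbehind = str.match(/(?<=\[)\d{1,2}(?=\])/);
console.log(viaLookbehind[0]); // 5

// Portable alternative: capture the digits and read group 1.
const viaGroup = str.match(/\[(\d{1,2})\]/);
console.log(viaGroup[1]); // 5
```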
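A short demonstration of the quantifier placement on the question's own 8-digit pattern: with the quantifier outside, group 1 only holds the final digit.

```javascript
const id = "ListUI_col_order[01234567][5]";

const repeatedGroup = id.match(/([0-9]){8}/);    // quantifier outside: repeats the group
const quantifiedInside = id.match(/([0-9]{8})/); // quantifier inside: one group, 8 digits

console.log(repeatedGroup[1]);    // 7 (only the last repetition is kept)
console.log(quantifiedInside[1]); // 01234567
```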

regular expression for finding decimal/float numbers?

I need a regular expression for decimal/float numbers like 12, 12.2, 1236.32, 123.333 and +12.00 or -12.00 or ...123.123... for use in JavaScript and jQuery.
Thank you.
Optionally match a + or - at the beginning, followed by one or more decimal digits, optionally followed by a decimal point and one or more decimal digits, until the end of the string:
/^[+-]?\d+(\.\d+)?$/
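A quick check of this pattern against the question's sample values (the rejected list is my own illustration). Because digits are required both before and after the point, forms like .5 and 12. are rejected:

```javascript
const decimalRx = /^[+-]?\d+(\.\d+)?$/;

const accepted = ["12", "12.2", "1236.32", "123.333", "+12.00", "-12.00"];
const rejected = [".5", "12.", "1.2.3", "abc", ""];

console.log(accepted.every((s) => decimalRx.test(s))); // true
console.log(rejected.some((s) => decimalRx.test(s)));  // false
```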
The right expression should be as follows:
[+-]?([0-9]*[.])?[0-9]+
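This matches the following (the pattern is not anchored, so in "+1." and "1." it matches only the "+1" or "1" part):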
+1
+1.
+.1
+0.1
1
1.
.1
0.1
Here is a Python example:
import re
#print if found
print(bool(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0')))
#print result
print(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0').group(0))
Output:
True
1.0
If you are using a Mac, you can test on the command line:
python -c "import re; print(bool(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0')))"
python -c "import re; print(re.search(r'[+-]?([0-9]*[.])?[0-9]+', '1.0').group(0))"
You can validate the text, and also make sure it contains only one decimal point, by combining the regex with isNaN:
var val = $('#textbox').val();
var floatValues = /[+-]?([0-9]*[.])?[0-9]+/;
if (val.match(floatValues) && !isNaN(val)) {
// your function
}
This is an old post but it was the top search result for "regular expression for floating point" or something like that and doesn't quite answer _my_ question. Since I worked it out I will share my result so the next person who comes across this thread doesn't have to work it out for themselves.
All of the answers thus far accept a leading 0 on numbers with two (or more) digits to the left of the decimal point (e.g. 0123 instead of just 123). This isn't really valid, and in some contexts a leading zero indicates that the number is in octal (base-8) rather than the regular decimal (base-10) format.
Also, these expressions accept a decimal with no leading zero (.14 instead of 0.14) or without a trailing fractional part (3. instead of 3.0). That is valid in some programming contexts (including JavaScript), but I want to disallow them (because for my purposes those are more likely to be an error than intentional).
Ignoring "scientific notation" like 1.234E7, here is an expression that meets my criteria:
/^((-)?(0|([1-9][0-9]*))(\.[0-9]+)?)$/
or if you really want to accept a leading +, then:
/^((\+|-)?(0|([1-9][0-9]*))(\.[0-9]+)?)$/
I believe that regular expression will perform a strict test for the typical integer or decimal-style floating point number.
When matched:
$1 contains the full number that matched
$2 contains the (possibly empty) leading sign (+/-)
$3 contains the value to the left of the decimal point
$5 contains the value to the right of the decimal point, including the leading .
By "strict" I mean that the number must be the only thing in the string you are testing.
If you want to extract just the float value out of a string that contains other content use this expression:
/((\b|\+|-)(0|([1-9][0-9]*))(\.[0-9]+)?)\b/
Which will find -3.14 in "negative pi is approximately -3.14." or in "(-3.14)" etc.
The numbered groups have the same meaning as above (except that $2 is now an empty string ("") when there is no leading sign, rather than undefined, which is how JavaScript reports a group that did not participate).
But be aware that it will also try to extract whatever numbers it can find. E.g., it will extract 127.0 from 127.0.0.1.
If you want something more sophisticated than that then I think you might want to look at lexical analysis instead of regular expressions. I'm guessing one could create a look-ahead-based expression that would recognize that "Pi is 3.14." contains a floating point number but Home is 127.0.0.1. does not, but it would be complex at best. If your pattern depends on the characters that come after it in non-trivial ways you're starting to venture outside of regular expressions' sweet-spot.
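A sketch of the extraction pattern used with matchAll. The g flag is my addition so that every number is found, and group 1 holds each full number. The last example shows the caveat mentioned above: on an IP address it picks up "127.0" and then "0.1".

```javascript
// The extraction pattern from above, with the g flag added so matchAll finds every number.
const floatInText = /((\b|\+|-)(0|([1-9][0-9]*))(\.[0-9]+)?)\b/g;

function extractFloats(text) {
  // Group 1 holds the full number, including any leading sign.
  return Array.from(text.matchAll(floatInText), (m) => m[1]);
}

console.log(JSON.stringify(extractFloats("negative pi is approximately -3.14."))); // ["-3.14"]
console.log(JSON.stringify(extractFloats("(-3.14)")));                             // ["-3.14"]
console.log(JSON.stringify(extractFloats("Home is 127.0.0.1.")));                  // ["127.0","0.1"]
```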
Paulpro's and lbsweek's answers led me to this:
re=/^[+-]?(?:\d*\.)?\d+$/;
>> /^[+-]?(?:\d*\.)?\d+$/
re.exec("1")
>> Array [ "1" ]
re.exec("1.5")
>> Array [ "1.5" ]
re.exec("-1")
>> Array [ "-1" ]
re.exec("-1.5")
>> Array [ "-1.5" ]
re.exec(".5")
>> Array [ ".5" ]
re.exec("")
>> null
re.exec("qsdq")
>> null
For anyone new:
I made a RegExp for the E scientific notation (without spaces).
const floatR = /^([+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)$/;
let str = "-2.3E23";
let m = floatR.exec(str);
parseFloat(m[1]); //=> -2.3e+23
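Note that in JavaScript \d is always equivalent to [0-9], even with the Unicode flag u, so swapping [0-9] for \d changes nothing.
To match other Unicode decimal digits you would need \p{Nd} together with the u flag at the end of the RegExp (and parseFloat would not convert those digits anyway).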
For a better understanding of the pattern see https://regexper.com/.
And for making RegExp, I can suggest https://regex101.com/.
EDIT: found another site for viewing RegExp in color: https://jex.im/regulex/.
EDIT 2: although the OP asks for a RegExp specifically, you can check a string in JS directly:
const isNum = (num)=>!Number.isNaN(Number(num));
isNum("123.12345678E+3");//=> true
isNum("80F");//=> false
converting the string to a number (or NaN) with Number()
then checking if it is NOT NaN with !Number.isNaN()
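One caveat worth guarding against: Number("") and Number("   ") both return 0, so the isNum above reports empty or blank strings as numbers. A minimal sketch that adds a blank check, assuming the argument is always a string:

```javascript
// Number("") and Number("   ") are 0, so blank strings must be rejected explicitly.
// Assumes the argument is a string.
const isNum = (value) => value.trim() !== "" && !Number.isNaN(Number(value));

console.log(isNum("123.12345678E+3")); // true
console.log(isNum("80F"));             // false
console.log(isNum(""));                // false
console.log(isNum("0x1A"));            // true: Number() also accepts hex literals
```

Number() also accepts forms such as "0x1A" and "Infinity", so if those should be rejected, combine this check with one of the regular expressions in this thread.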
If you want it to work with e, use this expression:
[+-]?[0-9]+([.][0-9]+)?([eE][+-]?[0-9]+)?
Here is a JavaScript example:
var re = /^[+-]?[0-9]+([.][0-9]+)?([eE][+-]?[0-9]+)?$/;
console.log(re.test('1'));
console.log(re.test('1.5'));
console.log(re.test('-1'));
console.log(re.test('-1.5'));
console.log(re.test('1E-100'));
console.log(re.test('1E+100'));
console.log(re.test('.5'));
console.log(re.test('foo'));
Here is my JS method, handling 0s at the start of the string:
1- ^0[0-9]+\.?[0-9]*$ : finds numbers that start with 0 followed by one or more digits before the decimal separator, mainly ".". I put this in to distinguish strings such as "0.111" from "01.111".
2- ([1-9]{1}[0-9]*\.?[0-9]*) : if the string starts with 0s, only the part from the first non-zero digit onwards is taken into account. Parentheses are used here because I wanted to capture only the parts conforming to the regex.
3- ([0-9]*\.?[0-9]*) : used when there are no leading 0s, to capture only the decimal-number part of the string.
In JavaScript, st.match(regex) returns an array whose first element contains the conforming part. I used this method in the input element's onChange event: if the user enters something that violates the regex, the violating part is not shown in the element's value at all, but any part that conforms to the regex stays in the element's value.
const floatRegexCheck = (st) => {
const regx1 = new RegExp("^0[0-9]+\\.?[0-9]*$"); // for finding numbers starting with 0
let regx2 = new RegExp("([1-9]{1}[0-9]*\\.?[0-9]*)"); //if regx1 matches then this will remove 0s at the head.
if (!st.match(regx1)) {
regx2 = new RegExp("([0-9]*\\.?[0-9]*)"); //if number does not contain 0 at the head of string then standard decimal formatting takes place
}
st = st.match(regx2);
if (st?.length > 0) {
st = st[0];
}
return st;
}
Here is a more rigorous answer
^[+-]?0(?![0-9])\.[0-9]*(?![.])$|^[+-]?[1-9]{1}[0-9]*\.[0-9]*$|^[+-]?\.[0-9]+$
The following values will match (an optional + or - sign also works):
.11234
0.1143424
11.21
1.
The following values will not match:
00.1
1.0.00
12.2350.0.0.0.0.
.
....
How it works
(?!regex) is a negative lookahead: it succeeds only if regex does not match at that position.
Let's break down the regex at the | operator, which works like a logical OR.
^[+-]?0(?![0-9])\.[0-9]*(?![.])$
This branch checks values that start with 0.
First, ^[+-]? matches an optional + or - sign.
Then 0 matches the leading zero.
The character after it must not be another digit, because we don't want to accept 00.123: (?![0-9])
Then \. matches exactly one dot, and [0-9]* matches any number of fraction digits.
Finally, (?![.])$ makes sure no further dot follows before the end of the string.
Now the second part:
^[+-]?[1-9]{1}[0-9]*\.[0-9]*$
^[+-]? is the same as above.
If the value starts with a non-zero digit, [1-9]{1}[0-9]* matches that first digit exactly once, followed by any number of digits, e.g. 12.3, 1.2, 105.6
Then \.[0-9]*$ matches the dot once, followed by any number of digits up to the end.
Now the third part:
^[+-]?\.[0-9]+$
This checks values that start with a dot, e.g. .12, .34565
^[+-]? is the same as above.
\.[0-9]+$ matches the dot once, followed by one or more digits.
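A quick check of this pattern against the listed values, assuming the dots are meant as literal dots (escaped as \.), which is what the explanation describes. A couple of extra cases ("-0.5", "+.5", "1a5") are my own additions:

```javascript
// Assumes the dots in the pattern are literal dots, i.e. escaped as \.
const strictDecimal =
  /^[+-]?0(?![0-9])\.[0-9]*(?![.])$|^[+-]?[1-9]{1}[0-9]*\.[0-9]*$|^[+-]?\.[0-9]+$/;

const shouldMatch = [".11234", "0.1143424", "11.21", "1.", "-0.5", "+.5"];
const shouldNotMatch = ["00.1", "1.0.00", "12.2350.0.0.0.0.", ".", "....", "1a5"];

console.log(shouldMatch.every((s) => strictDecimal.test(s)));   // true
console.log(shouldNotMatch.some((s) => strictDecimal.test(s))); // false
```

Every branch requires a dot, so plain integers such as "12" are rejected as well; add an integer branch if those should pass.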
