Enhancing regex of thousands separator? - javascript

I saw this beautiful script to add thousands separator to js numbers:
function thousandSeparator(n, sep)
{
    var sRegExp = new RegExp('(-?[0-9]+)([0-9]{3})'),
        sValue = n + '';
    if(sep === undefined)
    {
        sep = ',';
    }
    while(sRegExp.test(sValue))
    {
        sValue = sValue.replace(sRegExp, '$1' + sep + '$2');
    }
    return sValue;
}
usage :
thousandSeparator(5000000.125, '\,') //"5,000,000.125"
However, I have trouble accepting the while loop.
I was thinking of changing the regex to '(-?[0-9]+)([0-9]{3})*' (note the added asterisk),
but then how do I write the replace statement?
I would now have $1 and $2..$n.
How can I enhance the replace call?
p.s. the code is taken from here http://www.grumelo.com/2009/04/06/thousand-separator-in-javascript/

There is no need to use replace; you can just use toLocaleString instead:
console.log((5000000.125).toLocaleString('en'));
More information: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString
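As a hedged sketch of how the locale-based approach behaves: with default options toLocaleString keeps at most three fraction digits, so longer fractions need the maximumFractionDigits option. The sample value is made up for illustration, and the output assumes a runtime that ships 'en-US' locale data (current browsers and standard Node.js builds).

```javascript
// Default options: digits are grouped and the fraction is rounded to 3 digits.
var rounded = (1234.5678).toLocaleString('en-US');

// Ask for up to 4 fraction digits explicitly.
var precise = (1234.5678).toLocaleString('en-US', { maximumFractionDigits: 4 });

console.log(rounded); // "1,234.568"
console.log(precise); // "1,234.5678"
```

The rounding is easy to miss with the question's example, because 5000000.125 happens to have exactly three fraction digits.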

Your assumption
now i will have $1 and $2..$n
is wrong. You have two groups, because you have two sets of brackets.
(-?[0-9]+)([0-9]{3})*
^^^^^^^^^^            1.
          ^^^^^^^^^^  2.
And then you repeat the second group. If it matches the second time, it overwrites the result of the first match, when it matches the third time, it overwrites ...
That means when matching is complete, $2 contains the value of the last match of that group.
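A small sketch of this behaviour, using exec so the groups can be inspected directly. The second pattern is an illustrative one of my own, not from the question; it is anchored so that the repeated group really has to run more than once.

```javascript
// The question's pattern: the greedy first group swallows every digit,
// so the starred second group matches zero times and stays undefined.
var m1 = /(-?[0-9]+)([0-9]{3})*/.exec('1234567');
console.log(m1[1]); // "1234567"
console.log(m1[2]); // undefined

// Illustrative anchored pattern: group 2 matches "234" and then "567",
// but only the last iteration ("567") is kept.
var m2 = /^(\d{1,3})(\d{3})*$/.exec('1234567');
console.log(m2[1]); // "1"
console.log(m2[2]); // "567"
```

Either way there are never more than two numbered groups, which is why a single replace with $1..$n cannot work.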
First approach
(\d)(?=(?:[0-9]{3})+\b)
and replace with
$1,
See it on Regexr
It has the flaw that it also inserts commas to the right of the decimal point. (I am working on it.)
Second approach
(\d)(?:(?=\d+(?=[^\d.]))(?=(?:[0-9]{3})+\b)|(?=\d+(?=\.))(?=(?:[0-9]{3})+(?=\.)))
and replace with
$1,
See it on Regexr
So now it's getting a bit more complicated.
(\d)                      # Match a digit (will be reinserted)
(?:
    (?=\d+(?=[^\d.]))     # Use this alternative if there is no fractional part in the digit
    (?=(?:\d{3})+         # Check that there are always multiples of 3 digits ahead
    \b)                   # Till a word boundary
|                         # OR
    (?=\d+(?=\.))         # There is a fractional part
    (?=(?:\d{3})+         # Check that there are always multiples of 3 digits ahead
    (?=\.))               # Till a dot
)
Problem:
It also matches in the fractional part if the number is not followed by the end of the string.
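One way to sidestep the fractional-part problem altogether, sketched here as an alternative rather than a fix to the pattern above, is to split the number at the decimal point and run a simple grouping regex on the integer part only. The function name groupThousands is hypothetical, and the sketch assumes the input prints as a plain decimal (no exponent notation).

```javascript
function groupThousands(n, sep) {
  if (sep === undefined) {
    sep = ',';
  }
  // "1234567.891" -> ["1234567", "891"]
  var parts = String(n).split('.');
  // Insert the separator at every position followed by a multiple of 3 digits.
  parts[0] = parts[0].replace(/\B(?=(\d{3})+(?!\d))/g, sep);
  return parts.join('.');
}

console.log(groupThousands(10000000.0001)); // "10,000,000.0001"
console.log(groupThousands(-1234567.891));  // "-1,234,567.891"
```

Because the fraction never reaches the regex, no lookahead for the dot is needed.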

Here is an ugly script to contrast your beautiful script.
10000000.0001 .toString().split('').reverse().join('')
.replace(/(\d{3}(?!.*\.|$))/g, '$1,').split('').reverse().join('')
Since we don't have lookbehinds, we can cheat by reversing the string and using lookaheads instead.
Here it is again in a more palatable form.
function thousandSeparator(n, sep) {
    function reverse(text) {
        return text.split('').reverse().join('');
    }
    var rx = /(\d{3}(?!.*\.|$))/g;
    if (!sep) {
        sep = ',';
    }
    return reverse(reverse(n.toString()).replace(rx, '$1' + sep));
}
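For engines that do support lookbehind (it was added to the language in ES2018), the reversal can be dropped. The following is a sketch under that assumption; the function name thousandSeparatorLookbehind is hypothetical, and the pattern will be a syntax error in older engines.

```javascript
function thousandSeparatorLookbehind(n, sep) {
  if (sep === undefined) {
    sep = ',';
  }
  // (?<!\.\d*) rejects any position that sits after a decimal point,
  // so digits of the fractional part are never grouped.
  var rx = /\B(?<!\.\d*)(?=(\d{3})+(?!\d))/g;
  return String(n).replace(rx, sep);
}

console.log(thousandSeparatorLookbehind(10000000.0001)); // "10,000,000.0001"
console.log(thousandSeparatorLookbehind(5000000.125));   // "5,000,000.125"
```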

How about this one:
result = "1235423.125".replace(/\B(?=(\d{3})+(?!\d))/g, ',') //1,235,423.125

Try this one:
result = subject.replace(/([0-9]+?)([0-9]{3})(?=.*?\.|$)/mg, "$1,$2");
Test here

Related

Replacing character after certain character length

I am trying to write a function that replaces every character after a certain length with an asterisk. So far I have this:
var text = 'ABCDEFG';
var newText = text.substring(0,3) + text.substring(3).replace(/\S/g,'*');
It gives me what I need, but as I understand it, it is fairly inefficient, so I am trying to make it more efficient.
text.replace(/.{4}$/,'*');
Unfortunately the result is not what I expected, and the length of 4 (counted from the end) has to be hardcoded, so it won't work for words of a different length.
Is there a regex that replaces every character after a certain length (3 in this case) with an asterisk?
Any help will be appreciated. Thanks.
Edited:
To summarize the suggestions and discussion, here are alternative ways to solve the problem, which give almost the same result as my solution.
text.replace(/(\w{3}).*/g, "$1"+(new Array(text.length -3 + 1).join( '*' )));
by @Keerthana Prabhakaran
text.replace(new RegExp(".(?=.{0," + (text.length-4) + "}$)", "g"), '*')
by @Wiktor Stribiżew
var longerThanNeeded = "***************************";
var newText = text.substring(0,3) + longerThanNeeded.substring(0,text.length-3);
by @matthewninja
(^.{3}).|. and replace w/ \1*
by @alpha bravo
As discussed with some of the answerers, the efficiency of all of these is about the same, including my original code, so efficiency ended up as a side discussion.
Thanks again for the help.
I hope I'm not overthinking this.
text.substring(3).replace(/\S/g,'*'); has linear time complexity O(n) and isn't terribly inefficient.
I initially thought of using Array.prototype.join() like so:
var newText = text.substring(0,3) + Array(text.length-2).join("*");
I then realized that .join() needs to run for every element of the array, which results in linear time complexity, just like your original solution. This wouldn't improve the solution at all; all I've done is inflate the space complexity.
I then went on to think of building the string by copying and doubling the size of the prior piece, which would get us down to O(log n) complexity.
Finally, I saw the most obvious solution.
var longerThanNeeded = "***************************";
var newText = text.substring(0,3) + longerThanNeeded.substring(0,text.length-3);
which will run in constant time.
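A hedged variant of the same idea that does not depend on a pre-built buffer being long enough: String.prototype.repeat (ES2015) produces the run of asterisks directly. The function name maskAfter and its keep parameter are hypothetical. Note that, unlike the /\S/ version in the question, this also masks whitespace.

```javascript
function maskAfter(text, keep) {
  // Number of characters to hide; never negative for short inputs.
  var hidden = Math.max(0, text.length - keep);
  return text.substring(0, keep) + '*'.repeat(hidden);
}

console.log(maskAfter('ABCDEFG', 3)); // "ABC****"
console.log(maskAfter('AB', 3));      // "AB"
```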
You could use this pattern (^.{3}).|. and replace w/ \1* Demo
(Please note limitation for strings less than 3 characters in length)
( # Capturing Group (1)
^ # Start of string/line
. # Any character except line break
{3} # (repeated {3} times)
) # End of Capturing Group (1)
. # Any character except line break
| # OR
. # Any character except line break
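In JavaScript the backreference in the replacement string is written $1 rather than \1. A short sketch of the pattern in use, including the short-string limitation mentioned above (the sample strings are only illustrative):

```javascript
// First match is "ABCD" -> "ABC*"; every later character matches the
// second alternative, where $1 is empty, so it becomes "*".
var masked = 'ABCDEFG'.replace(/(^.{3}).|./g, '$1*');
console.log(masked); // "ABC****"

// With fewer than 4 characters the first alternative never matches,
// so every character is masked.
var tooShort = 'AB'.replace(/(^.{3}).|./g, '$1*');
console.log(tooShort); // "**"
```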
You may use
s.replace(new RegExp(".(?=.{0," + (s.length-4) + "}$)", "g"), '*')
See a JS demo:
var text = 'ABCDEFG';
var threshold = 3; // Start replacing with * after this value
if (text.length > threshold) {
text = text.replace(new RegExp(".(?=.{0," + (text.length-threshold-1) + "}$)", "g"), '*');
}
console.log(text);
Here, if threshold is 3, the pattern will look like .(?=.{0,3}$): it matches any char but a line break char with . that is followed with 0 to 3 chars other than line break chars (.{0,3}) and the end of string position ($). The (?=...) is a positive lookahead that only checks for the pattern match, but does not move the regex index and does not add the matched text to the match value (allowing subsequent consecutive symbol check).
To enable matching line breaks, replace . with [^] or [\s\S].

Javascript regex for hashtag with number only

I want to match these kind of hashtag pattern #1, #4321, #1000 and not:
#01 (with leading zero)
#1aa (has alphabetical char)
But a special character such as a comma, period, or colon after the number is fine, e.g. #1. at the end of a sentence or phrase. Basically, treat these characters like whitespace.
Basically just # and a number.
My code below doesn't meet the requirement because it accepts a leading zero and leaves an ugly space at the end. I could always trim the result, but that doesn't feel like the right way to do it.
reg = new RegExp(/#[0-9]+ /g);
var result;
while((result = reg.exec("hyha #12 gfdg #01 aa #2e #1. #101")) !== null) {
alert("\"" + result + "\"");
}
http://jsfiddle.net/qhoc/d3TpJ/
That string there should just match #12, #1 and #101
Please help to suggest better RegEx string than I had. Thanks.
You could use a regex like:
#[1-9]\d*\b
Code example:
var re = /#[1-9]\d*\b/g;
var str = "#1 hyha #12 #0123 #5 gfdg #2e ";
var matches = str.match(re); // = ["#1", "#12", "#5"]
This should work
reg = /(#[1-9]\d*)(?: |$)/g;
Notice the capturing group (...) for the hash and number, and the non-capturing group (?: ...) that matches the number only if it is followed by a space or the end of the string ($). Use this if you don't want to catch strings like the #1 in #1.; otherwise the other answer is better.
Then you have to get the captured group from the match iterating over something like this:
myString = 'hyha #12 gfdg #01 aa #2e #1. #101';
match = reg.exec(myString);
alert(match[1]);
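The snippet above only reads the first match. A sketch of collecting every match with a loop follows; it assumes the end of the string is written as $, since JavaScript regexes have no \z anchor, and the found array is a name introduced here for illustration.

```javascript
var reg = /(#[1-9]\d*)(?: |$)/g;
var myString = 'hyha #12 gfdg #01 aa #2e #1. #101';
var found = [];
var match;

// With the g flag, each exec() call resumes after the previous match.
while ((match = reg.exec(myString)) !== null) {
  found.push(match[1]); // group 1 holds the hashtag without the trailing space
}

console.log(found); // ["#12", "#101"]
```

As described, #1. is skipped here because the number is followed by a period rather than a space.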
EDIT
Whenever you are working with regexps, you should use some kind of tool. For desktop for instance you can use The regex coach and online you can try this regex101
For instance: http://regex101.com/r/zY0bQ8

Replace letters with integer and place "-" (dash) with it

I am currently using the following JavaScript code:
concatedSubstring.replace(/\//g, '-').replace(/[A-Za-z]/g, function(c){
return c.toUpperCase().charCodeAt(0)-64;
});
...to take input in the format "1234/A", "22/B", etc. and output "1234-1" , "22-2", etc.
That is, / becomes -, and the letters become integers with A = 1, B = 2, etc.
I would like to change this so that if the input doesn't contain a "/" the output will still insert a "-" in the spot where the "/" should've been. That is, the input "1234A" should output "1234-1", or "22B" should output "22-2", etc.
The following should work even for inputs containing more than one of your number/letter pattern:
var input = "1234/B 123a 535d";
var replaced = input.replace(/(\d+)(\/?)([A-Za-z])/g, function(m,p1,p2,p3) {
return p1 + "-" + (p3.toUpperCase().charCodeAt(0)-64);
});
alert(replaced); // "1234-2 123-1 535-4"
The regex:
/(\d+)(\/?)([A-Za-z])/g
...will match one or more digits followed by an optional forward slash followed by a single letter, capturing each of those parts for later use.
If you pass a callback to .replace() then it will be called with arguments for the full match (which I'm ignoring for your requirement) and also for any sub-matches (which I use).
str = "1234/B"; or str = "1234B";
str.replace(/(\/[A-Z])|([A-Z])/g,"-"+parseInt(str.charCodeAt(str.indexOf(str.match(/[A-Z]/g)))-64))
You can also .replace(/([0-9])([a-zA-Z])/g,"$1-$2"): this turns a number adjacent to a letter into numberDASHletter, using backreferences (the $1 refers to whatever was in the first set of brackets, $2 to whatever was in the second set of brackets).
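Putting the backreference idea together with the letter-to-number step from the question gives something like the following sketch. The function name normalize is hypothetical, and the optional \/? in the first pattern is my addition so that both "1234/A" and "1234A" are handled by the same call.

```javascript
function normalize(input) {
  return input
    // digit, optional slash, letter -> digit-dash-letter
    .replace(/([0-9])\/?([A-Za-z])/g, '$1-$2')
    // letter -> its position in the alphabet (A = 1, B = 2, ...)
    .replace(/[A-Za-z]/g, function (c) {
      return c.toUpperCase().charCodeAt(0) - 64;
    });
}

console.log(normalize('1234/A')); // "1234-1"
console.log(normalize('22B'));    // "22-2"
```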

remove unwanted commas in JavaScript

I want to remove all unnecessary commas from the start/end of the string.
E.g. "google, yahoo,, ," should become "google, yahoo".
If possible, ",google,, , yahoo,, ," should become "google,yahoo".
I've tried the code below as a starting point, but it doesn't seem to work as desired.
trimCommas = function(s) {
s = s.replace(/,*$/, "");
s = s.replace(/^\,*/, "");
return s;
}
In your example you also want to trim the commas if there are spaces between them at the start or at the end, so use something like this:
str.replace(/^[,\s]+|[,\s]+$/g, '').replace(/,[,\s]*,/g, ',');
Note the use of the 'g' modifier for global replace.
You need this:
s = s.replace(/[,\s]{2,}/,""); //Removes double or more commas / spaces
s = s.replace(/^,*/,""); //Removes all commas from the beginning
s = s.replace(/,*$/,""); //Removes all commas from the end
EDIT: Made all the changes - should work now.
My take:
var cleanStr = str.replace(/^[\s,]+/,"")
.replace(/[\s,]+$/,"")
.replace(/\s*,+\s*(,+\s*)*/g,",")
This one will work with opera, internet explorer, whatever
Actually tested this last one, and it works!
What you need to do is replace all groups of "space and comma" with a single comma and then remove commas from the start and end:
trimCommas = function(str) {
    str = str.replace(/[,\s]*,[,\s]*/g, ",");
    str = str.replace(/^,/, "");
    str = str.replace(/,$/, "");
    return str;
}
The first one replaces every sequence of white space and commas with a single comma, provided there's at least one comma in there. This handles the edge case left in the comments for "Internet Explorer".
The second and third get rid of the comma at the start and end of string where necessary.
You can also add (to the end):
str = str.replace(/[\s]+/g, " ");
to collapse multi-spaces down to one space and
str = str.replace(/,/g, ", ");
if you want them to be formatted nicely (space after each comma).
A more generalized solution would be to pass parameters to indicate behaviour:
Passing true for collapse will collapse the spaces within a section (a section being defined as the characters between commas).
Passing true for addSpace will use ", " to separate sections rather than just "," on its own.
That code follows. It may not be necessary for your particular case but it might be better for others in terms of code re-use.
trimCommas = function(str,collapse,addspace) {
    str = str.replace(/[,\s]*,[,\s]*/g, ",").replace(/^,/, "").replace(/,$/, "");
    if (collapse) {
        // g flag so every run of whitespace is collapsed, not just the first
        str = str.replace(/[\s]+/g, " ");
    }
    if (addspace) {
        str = str.replace(/,/g, ", ");
    }
    return str;
}
First ping on Google for "Javascript Trim": http://www.somacon.com/p355.php. You seem to have implemented this using commas, and I don't see why it would be a problem (though you escaped in the second one and not in the first).
Not quite as sophisticated, but simple with:
',google,, , yahoo,, ,'.replace(/\s/g, '').replace(/,+/g, ',');
You should be able to use only one replace call:
/^( *, *)+|(, *(?=,|$))+/g
Test:
'google, yahoo,, ,'.replace(/^( *, *)+|(, *(?=,|$))+/g, '');
"google, yahoo"
',google,, , yahoo,, ,'.replace(/^( *, *)+|(, *(?=,|$))+/g, '');
"google, yahoo"
Breakdown:
/
^( *, *)+ # Match start of string followed by zero or more spaces
# followed by , followed by zero or more spaces.
# Repeat one or more times
| # regex or
(, *(?=,|$))+ # Match , followed by zero or more spaces which have a comma
# after it or EOL. Repeat one or more times
/g # `g` modifier will run on until there is no more matches
(?=...) is a lookahead, which will not move the position of the match but only verifies that the given characters come after the match. In our case we look for , or EOL.
match() is much better tool for this than replace()
str = " aa, bb,, cc , dd,,,";
newStr = str.match(/[^\s,]+/g).join(",")
alert("[" + newStr + "]")
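Two caveats with the match() one-liner: match() returns null when the string holds no usable token, so .join() would throw, and splitting on whitespace also breaks up items that contain spaces. A hedged alternative that avoids both is sketched below; the function name cleanList and the sample inputs are made up for illustration.

```javascript
function cleanList(str) {
  return str.split(',')
    .map(function (part) { return part.trim(); })          // drop spaces around each item
    .filter(function (part) { return part.length > 0; })   // drop empty items
    .join(',');
}

console.log(cleanList(' aa, bb,, cc , dd,,,'));      // "aa,bb,cc,dd"
console.log(cleanList('New York,, ,Los Angeles,'));  // "New York,Los Angeles"
console.log(cleanList(', ,,'));                      // ""
```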
If you only need to collapse runs such as ",,", ",,," and ",,,," into a single ",", the code below does that; running the same replace twice handles runs of up to four commas.
var abc = new String("46590,26.91667,75.81667,,,45346,27.18333,78.01667,,,45630,12.97194,77.59369,,,47413,19.07283,72.88261,,,45981,13.08784,80.27847,,");
var pqr= abc.replace(/,,/g,',').replace(/,,/g, ',');
alert(pqr);
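For runs of arbitrary length, a quantifier does the same job in one pass. This is a small sketch with a made-up input string, comparing the two-pass version with /,{2,}/g:

```javascript
var input = 'a,,b,,,c,,,,d,,,,,e';

// Each pass of /,,/g roughly halves a run, so five commas survive two passes as two.
var twoPasses = input.replace(/,,/g, ',').replace(/,,/g, ',');

// {2,} matches the whole run at once, whatever its length.
var onePass = input.replace(/,{2,}/g, ',');

console.log(twoPasses); // "a,b,c,d,,e"
console.log(onePass);   // "a,b,c,d,e"
```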

Sort lines on webpage using javascript/ regex

I'd like to write a Greasemonkey script that requires finding lines ending with a string ("copies.") & sorting those lines based on the number preceding that string.
The page I'm looking to modify does not use tables unfortunately, just the br/ tag, so I assume that this will involve Regex:
http://www.publishersweekly.com/article/CA6591208.html
(Lines without the matching string will just be ignored.)
Would be grateful for any tips to get me started.
Most times, HTML and RegEx do not go together, and when parsing HTML your first thought should not be RegEx.
However, in this situation, the markup looks simple enough that it should be okay - at least until Publishers Weekly changes how they do that page.
Here's a function that will extract the data, grab the appropriate lines, sort them, and put them back again:
($j is jQuery)
function reorderPwList()
{
    var Container = $j('#article span.table');
    var TargetLines = /^.+?(\d+(?:,\d{3})*) copies\.<br ?\/?>$/gmi;
    var Lines = Container.html().match( TargetLines );
    Lines.sort( sortPwCopies );
    Container.html( Lines.join('\n') );

    function sortPwCopies()
    {
        function getCopyNum()
            { return arguments[0].replace(TargetLines,'$1').replace(/\D/g,'') }
        return getCopyNum(arguments[0]) - getCopyNum(arguments[1]);
    }
}
And an explanation of the regex used there:
^                # start of line
.+?              # lazy match one or more non-newline characters
(                # start capture group $1
  \d+            # match one or more digits (0-9)
  (?:            # non-capture group
    ,\d{3}       # comma, then three digits
  )*             # end group, repeat zero or more times
)                # end group $1
 copies\.        # literal text, with . escaped
<br ?\/?>        # match a br tag, with optional space or slash just in case
$                # end of line
(For readability, I've indented the groups - only the spaces before 'copies' and after 'br' are valid ones.)
The regex flags gmi are used, for global, multi-line mode, case-insensitive matching.
<OLD ANSWER>
Once you've extracted just the text you want to look at (using DOM/jQuery), you can then pass it to the following function, which will put the relevant information into a format that can then be sorted:
function makeSortable(Text)
{
// Mark sortable lines and put number before main content.
Text = Text.replace
( /^(.*)([\d,]+) copies\.<br \/>/gm
, "SORT ME$2 $1"
);
// Remove anything not marked for sorting.
Text = Text.replace( /^(?!SORT ME).*$/gm , '' );
// Remove blank lines.
Text = Text.replace( /\n{2,}/g , '\n' );
// Remove sort token.
Text = Text.replace( /SORT ME/g , '' );
return Text;
}
You'll then need a sort function to ensure that the numbers are sorted correctly (the standard JS array.sort method will sort on text, and put 100,000 before 20,000).
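A minimal sketch of such a sort function, using made-up copy counts rather than data from the page. The helper name toNumber is hypothetical.

```javascript
// Strip the thousands separators and parse the rest as a base-10 integer.
function toNumber(s) {
  return parseInt(s.replace(/,/g, ''), 10);
}

var counts = ['20,000', '100,000', '5,500'];

// Default sort compares as text, so "100,000" comes before "20,000".
var asText = counts.slice().sort();

// A numeric comparator gives the expected order.
var asNumbers = counts.slice().sort(function (a, b) {
  return toNumber(a) - toNumber(b);
});

console.log(asText);    // ["100,000", "20,000", "5,500"]
console.log(asNumbers); // ["5,500", "20,000", "100,000"]
```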
Oh, and here's a quick explanation of the regexes used here:
/^(.*)([\d,]+) copies\.<br \/>/gm
/.../gm a regex with global-match and multi-line modes
^ matches start of line
(.*) capture to $1, any char (except newline), zero or more times
([\d,]+) capture to $2, any digit or comma, one or more times
copies literal text
\.<br \/> literal text, with . and / escaped (they would be special otherwise)
/^(?!SORT ME).*$/gm
/.../gm again, enable global and multi-line
^ match start of line
(?!SORT ME) a negative lookahead, fails the match if text 'SORT ME' is after it
.* any char (except newline), zero or more times
$ end of line
/\n{2,}/g
\n{2,} a newline character, two or more times
</OLD ANSWER>
You can start with something like this (just copy-paste it into the Firebug console):
// where are the things
var elem = document.getElementById("article").
    getElementsByTagName("span")[1].
    getElementsByTagName("span")[0];
// extract lines into array
var lines = [];
elem.innerHTML.replace(/.+?\d+\s+copies\.\s*<br>/g,
    function($0) { lines.push($0) });
// sort an array
// lines.sort(function(a, b) {
//     var ma = a.match(/(\d+),(\d+)\s+copies/);
//     var mb = b.match(/(\d+),(\d+)\s+copies/);
//
//     return parseInt(ma[1] + ma[2]) -
//         parseInt(mb[1] + mb[2]);
// });
lines.sort(function(a, b) {
    function getNum(p) {
        return parseInt(
            p.match(/([\d,]+)\s+copies/)[1].replace(/,/g, ""), 10);
    }
    return getNum(a) - getNum(b);
});
// put it back
elem.innerHTML = lines.join("");
It's not clear to me what it is you're trying to do. When posting questions here, I encourage you to post (a part of) your actual data and clearly indicate what exactly you're trying to match.
But, I am guessing you know very little regex, in which case, why use regex at all? If you study the topic a bit, you will soon know that regex is not some magical tool that produces whatever it is you're thinking of. Regex cannot sort in whatever way. It simply matches text, that's all.
Have a look at this excellent on-line resource: http://www.regular-expressions.info/
And if after reading you think a regex solution to your problem is appropriate, feel free to elaborate on your question and I'm sure I, or someone else is able to give you a hand.
Best of luck.
