What is the safest way to divide two IEEE 754 floating point numbers?
In my case the language is JavaScript, but I guess this isn't important. The goal is to avoid the normal floating point pitfalls.
I've read that one could use a "correction factor" (cf) (e.g. 10 uplifted to some number, for instance 10^10) like so:
(a * cf) / (b * cf)
But I'm not sure this makes a difference in division?
Incidentally, I've already looked at the other floating point posts on Stack Overflow and I've still not found a single post on how to divide two floating point numbers. If the answer is that there is no difference between the solutions for working around floating point issues when adding and when dividing, then just answer that please.
Edit:
I've been asked in the comments which pitfalls I'm referring to, so I thought I'd just add a quick note here as well for the people who don't read the comments:
When adding 0.1 and 0.2, you would expect to get 0.3, but with floating point arithmetic you get 0.30000000000000004 (at least in JavaScript). This is just one example of a common pitfall.
The above issue is discussed many times here on Stack Overflow, but I don't know what can happen when dividing and whether it differs from the pitfalls found when adding or multiplying. It might be that there are no risks, in which case that would be a perfectly good answer.
The safest way is to simply divide them. Any prescaling will either do nothing, or increase rounding error, or cause overflow or underflow.
If you prescale by a power of two you may cause overflow or underflow, but will otherwise make no difference in the result.
If you prescale by any other number, you will introduce additional rounding steps on the multiplications, which may lead to increased rounding error on the division result.
If you simply divide, the result will be the closest representable number to the ratio of the two inputs.
IEEE 754 64-bit floating point numbers are incredibly precise. A difference in one part in almost 10^16 can be represented.
There are a few operations, such as floor and exact comparison, that make even extremely low significance bits matter. If you have been reading about floating point pitfalls you should have already seen examples. Avoid those. Round your output to an appropriate number of decimal places. Be careful adding numbers of very different magnitude.
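For instance, in JavaScript (using the same values as the Java demonstration below), plain division already gives the correctly rounded result, while prescaling can perturb it:

```javascript
const a = 2;
const b = 1 / 3; // nearest double to one third

console.log(a / b);               // 6, the correctly rounded quotient
console.log((a * 10) / (b * 10)); // 6.000000000000001, prescaling added error
```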
The following program demonstrates the effects of using each power of 10 from 10 through 1e20 as scale factor. Most get the same result as not multiplying, 6.0, which is also the rational number arithmetic result. Some get a slightly larger result.
You can experiment with different division problems by changing the initializers for a and b. The program prints their exact values, after rounding to double.
import java.math.BigDecimal;

public class Test {
    public static void main(String[] args) {
        double mult = 10;
        double a = 2;
        double b = 1.0 / 3.0;
        System.out.println("a=" + new BigDecimal(a));
        System.out.println("b=" + new BigDecimal(b));
        System.out.println("No multiplier result=" + (a / b));
        for (int i = 0; i < 20; i++) {
            System.out.println("mult=" + mult + " result=" + ((a * mult) / (b * mult)));
            mult *= 10;
        }
    }
}
Output:
a=2
b=0.333333333333333314829616256247390992939472198486328125
No multiplier result=6.0
mult=10.0 result=6.000000000000001
mult=100.0 result=6.000000000000001
mult=1000.0 result=6.0
mult=10000.0 result=6.000000000000001
mult=100000.0 result=6.000000000000001
mult=1000000.0 result=6.0
mult=1.0E7 result=6.000000000000001
mult=1.0E8 result=6.0
Floating point division exhibits exactly the same "pitfalls" as addition or multiplication, and no amount of pre-scaling will fix it: the end result is the end result, and it is the IEEE 754 internal representation of that result that causes the "problem".
The solution is to forget about these precision issues during the calculations themselves and to round as late as possible, i.e. only when displaying the results, at the point where the number is converted to a string using the .toFixed() function provided precisely for that purpose.
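For example, keeping full precision internally and rounding only at display time might look like this (the two-decimal precision here is just an assumption for illustration):

```javascript
const price = 0.1;
const tax = 0.2;

// Keep full precision during the calculation...
const total = price + tax;     // 0.30000000000000004 internally

// ...and round only when converting to a string for display.
console.log(total.toFixed(2)); // "0.30"
```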
.toFixed() is not a good solution for dividing float numbers.
Using JavaScript, try 4.11 / 100 and you will be surprised:
4.11 / 100 = 0.041100000000000005
Not all browsers get the same results.
The right solution is to convert the floats to integers:
parseInt(4.11 * Math.pow(10, 10)) / (100 * Math.pow(10, 10)) = 0.0411
Related
I am working with js numbers and have lack of experience in that. So, I would like to ask few questions:
2.2932600144518896e+160
Is this a float or an integer number? If it's a float, how can I round it to two decimals (to get 2.29)? And if it's an integer, I suppose it's a very large number, and I have another problem then.
Thanks
Technically, as said in comments, this is a Number.
What you can do if you want the number (not its string representation):
var x = 2.2932600144518896e+160;
var magnitude = Math.floor(Math.log10(x)) + 1;
console.log(Math.round(x / Math.pow(10, magnitude - 3)) * Math.pow(10, magnitude - 3));
What's the problem with that? Floating point operations may not be precise, so some digits different from 0 may still appear after rounding.
To have this number really "rounded", you can only achieve it through its string representation (but then you can't perform any further arithmetic on it).
JavaScript only has one Number type, so it is technically neither a float nor an integer.
However, this isn't really relevant, as the value (or rather the representation of it) is not specific to JavaScript and uses E-notation, which is a standard way to write very large/small numbers.
Taking this into account, 2.2932600144518896e+160 is equivalent to 2.2932600144518896 * Math.pow(10, 160), approximately 229 followed by 158 zeroes, i.e. very flippin' big.
I did some investigation and found that there is a whole website explaining the correct way to use floats: http://floating-point-gui.de/
In Java, for example, I always used BigDecimal for floats just to make sure that everything would work correctly without confusing me. For example:
BigDecimal a = new BigDecimal("0.1");
BigDecimal b = new BigDecimal("0.2");
BigDecimal c = a.add(b); // returns a BigDecimal representing exactly 0.3
// instead of this number: 0.30000000000000004 that can
// easily confuse me
However, in JavaScript I realized that there is no such thing as a built-in library for this (at least not on the Math object that I've looked at).
So the best way I have found so far is to use a JavaScript library that does exactly that! In my projects I am using this one: https://github.com/dtrebbien/BigDecimal.js
Although I think this is the best library I could find, the particular library doesn't really matter so much. My main questions are:
Is using a library like BigDecimal the best possible way to work with floats in JavaScript, or am I missing something? I want to do basic calculations like adding, multiplying, etc.
Is there any other suggested way for example to add two floats in JavaScript?
For example, let's say that I want to have: 0.1 + 0.2 . With the BigDecimal library, I will have:
var a = new BigDecimal("0.1");
var b = new BigDecimal("0.2");
console.log(a.add(b).toString()); //returns exactly 0.3
So is there any other way to add 0.1 + 0.2 and have exactly 0.3, in JavaScript without having to actually round the number ?
For the reference the below example in JavaScript will not work:
var a = 0.1;
var b = 0.2;
console.log(a + b); //This will have as an output: 0.30000000000000004
As all numbers in JavaScript are 64-bit, in general the best way to do floating point arithmetic in JavaScript is to simply use numbers directly.
However, if you have a problem where you specifically need higher precision than 64 bits provide, then you need to do something like that.
I urge you, however, to strongly consider whether you actually have such a use case or not.
If your problem is that some far-down decimals affect your comparisons, there are functions to deal with that sort of thing specifically. Look up the Number.prototype.toFixed(n) function, and also see this discussion on almostEquals, which proposes that you incorporate an epsilon for float comparisons.
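A minimal sketch of the epsilon idea (the function name and the choice of Number.EPSILON as the default tolerance are assumptions; pick a tolerance appropriate to the magnitudes you compare):

```javascript
// Compare two floats for "near equality" within a scaled tolerance.
function almostEquals(a, b, epsilon = Number.EPSILON) {
  const scale = Math.max(Math.abs(a), Math.abs(b), 1);
  return Math.abs(a - b) <= epsilon * scale;
}

console.log(0.1 + 0.2 === 0.3);            // false
console.log(almostEquals(0.1 + 0.2, 0.3)); // true
```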
You could use the toFixed(n) method if you are not relying on high precision:
var a = 0.1;
var b = 0.2;
var sum = a + b;
console.log(sum.toFixed(1));
Your calculation shows a precision loss at the 17th significant digit, which is no big issue in most cases.
I would advise you to go with toFixed() if you want to get the output right.
There are a few things to consider here.
A lot of the time, people use the term 'float' when they really mean fixed decimal.
Fixed Decimal
US currency, for example, is a fixed decimal.
$12.40
$0.90
In this case, there will always be two decimal points.
If your values fit into the range of JavaScript's safe integers (2^53 - 1, i.e. 9007199254740991), then you can simply work in cents and store all your values that way:
1240
90
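A sketch of the cents approach (the helper names here are made up for illustration):

```javascript
// Store currency as integer cents; convert only at the boundaries.
function toCents(dollars) { return Math.round(dollars * 100); }
function toDollars(cents) { return (cents / 100).toFixed(2); }

const subtotal = toCents(12.40) + toCents(0.90); // 1240 + 90 = 1330
console.log(toDollars(subtotal));                // "13.30"
```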
Floating point decimal
Now, floating point is where you deal with extreme ranges of numbers and the decimal point actually moves, or floats.
12.394
1294856.9458566
.0000000998984
49586747435893
In cases of floating point, you get accuracy to 53 significant bits (which means around 15 decimal digits of accuracy). For a lot of things, that is good enough.
Big Decimal Classes
You should only look at big decimal classes if you need something beyond the range of JavaScript native numbers. BigDecimal classes are much slower than native math and you have to use a functional style of programming rather than use the math operators.
JavaScript does not support operator overloading, so there is no built-in way to do natural calculations like '0.1 + 0.2' with BigNumbers.
What you can do is use math.js, which has an expression parser and support for BigNumbers:
math.config({
  number: 'bignumber', // Default type of number: 'number' or 'bignumber'
  precision: 64        // Number of significant digits for BigNumbers
});

math.eval('0.1 + 0.2'); // returns a BigNumber, 0.3
See docs: http://mathjs.org/docs/datatypes/bignumbers.html
You could convert the operands to integers beforehand:
(0.1 * 10 + 0.2 * 10) / 10
This gives the exact answer, but people probably don't want to deal with that floating point rounding stuff themselves.
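Wrapped up as a reusable helper, that trick might look like this (the function name and its rounding of the scaled operands are my own additions, to guard against products like 0.1 * 10 not being exact):

```javascript
// Scale both operands to integers, add, then scale back down.
function addScaled(a, b, decimals) {
  const f = Math.pow(10, decimals);
  return (Math.round(a * f) + Math.round(b * f)) / f;
}

console.log(addScaled(0.1, 0.2, 1)); // 0.3
```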
I started having problems with decimals which made me learn about the whole floating point math. My question is, what's a viable solution for it?
x = 0.1;
y = 0.2;
num = x + y;
num = Math.round(num * 100) / 100;
or
x = 0.1;
y = 0.2;
num = x + y;
num = num.toFixed(2);
num = Number(num);
Are these both 100% viable options? As in, never have to worry about having the same problem anymore? Which one would you recommend? Or would you recommend a different solution? Any reason to use one solution over the other? Thanks in advance for any help.
EDIT:
Sorry I wasn't more specific. I'm fine with it always being 2 decimals, since that won't be a problem for my project. Obviously if you want more decimals you would use 1000 instead of 100 and toFixed(3), and so on. My main concern is, are the above 2 solutions 100% viable, as in, I won't have to worry about any of the same problems? And also, would you recommend the first solution or the second? Or another one altogether? Since I will be using a method quite a lot for many calculations. Thanks again for your help.
This is not a problem with JavaScript's floating point implementation, or something that will go away if you use a string formatting function like toFixed (the MDN docs for it here make clear that it is a string representation returned, not some other format of number). Rather, this is an inherent property of floating point arithmetic as a concept - it has a variable accuracy designed to closely approximate values within a certain range.
If you want your values to always be entirely accurate, the only solution is not to use floating point numbers. Generally, this is done by using integers representing some fraction of the "whole" numbers you're dealing with (e.g. pence/cents instead of pounds/euros/dollars, or milliseconds instead of seconds). Alternatively, you may be able to find a precision maths library which performs fixed-point arithmetic, so avoids the inaccuracies but will have worse performance.
If you don't mind the risk of the inaccuracies slowly building up, you can simply use formatting functions to only display to a certain precision when you output the result of a calculation. There is little point in converting to a string with a fixed precision and then back to a number, as the floating point implementation may still be unable to represent that number with complete precision.
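To illustrate that last point: converting through a string does not make the value exact, it only makes it the nearest representable double to the rounded decimal, so errors can resurface in later arithmetic. A quick sketch:

```javascript
const sum = 0.1 + 0.2;                       // 0.30000000000000004
const roundTripped = Number(sum.toFixed(2)); // displays as 0.3...

// ...but 0.3 itself is not exactly representable, so later
// arithmetic still exposes the error.
console.log(roundTripped * 3);               // 0.8999999999999999, not 0.9
```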
Please note, I am surprised by the problem below. I have two values to be divided, from text boxes T1 and T2. I am dividing one by the other to get an amount, and I am not getting the exact amount; rather, I get an amount with a fractional tail ending in 0000000001.
Example:
var t1=5623.52;
var t2=56.2352;
var t3=5623.52/56.2352; //100.0000000001
Note: I can't round the values up, since the values are exchange rates and so vary according to currency.
This is caused by the limited precision of floating point values. See The Floating Point Guide for full details.
The short version is that the 0.52 fractional part of your numbers cannot be represented exactly in binary, just like 1/3 cannot be represented exactly in decimal. Because of the limited number of digits of accuracy, the larger number is slightly more precise than the smaller one, and so is not exactly 100 times as large.
If that doesn't make sense, imagine you are dealing with thirds, and pretend that numbers are represented as decimals, to ten decimal places. If you declare:
var t1 = 1000.0 / 3.0;
var t2 = 10.0 / 3.0;
Then t2 is represented as 3.3333333333, which is as close as can be represented with the given precision. Something that is 100 times as large as t2 would be 333.3333333300, but t1 is actually represented as 333.3333333333. It is not exactly 100 times t2, due to rounding/truncation being applied at different points for the different numbers.
The fix, as always with floating-point rounding issues, is to use decimal types instead. Have a look at the Javascript cheat-sheet on the aforementioned guide for ways to go about this.
As Felix Kling said, don't use floating point values.
Or use parseInt if you want to keep an integer.
var t1=5623.52;
var t2=56.2352;
var t3=parseInt(t1/t2);
I am adding client-side sub-total calculations to my order page, so that the volume discount will show as the user makes selections.
I am finding that some of the calculations are off by one cent here or there. This wouldn't be a very big deal except for the fact that the total doesn't match the final total calculated server-side (in PHP).
I know that the rounding errors are an expected result when dealing with floating point numbers. For example, 149.95 * 0.15 = 22.492499999999996 and 149.95 * 0.30 = 44.98499999999999. The former rounds as desired, the latter does not.
I've searched on this topic and found a variety of discussions, but nothing that satisfactorily addresses the problem.
My current calculation is as follows:
discount = Math.round(price * factor * 100) / 100;
A common suggestion is to work in cents rather than fractions of dollars. However, this would require me to convert my starting numbers, round them, multiply them, round the result, and then convert it back.
Essentially:
discount = Math.round(Math.round(price * 100) * Math.round(factor * 100) / 100) / 100;
I was thinking of adding 0.0001 to the number before rounding. For example:
discount = Math.round(price * factor * 100 + 0.0001) / 100;
This works for the scenarios I've tried, but I am wondering about my logic. Will adding 0.0001 always be enough, and never too much, to force the desired rounding result?
Note: For my purposes here, I am only concerned with a single calculation per price (so not compounding the errors) and will never be displaying more than two decimal places.
EDIT: For example, I want to round the result of 149.95 * 0.30 to two decimal places and get 44.99. However, I get 44.98 because the actual result is 44.98499999999999 not 44.985. The error is not being introduced by the / 100. It is happening before that.
Test:
alert(149.95 * 0.30); // yields 44.98499999999999
Thus:
alert(Math.round(149.95 * 0.30 * 100) / 100); // yields 44.98
The 44.98 is expected considering the actual result of the multiplication, but not desired since it is not what a user would expect (and differs from the PHP result).
Solution: I'm going to convert everything to integers to do my calculations. As the accepted answer points out, I can simplify my original conversion calculation somewhat. My idea of adding the 0.0001 is just a dirty hack. Best to use the right tool for the job.
I don't think adding a small amount will work in your favor; I suspect there are cases where it is too much. It also needs to be properly documented, otherwise one could see it as incorrect.
working in cents […] would require me to convert my starting numbers, round them, multiply them, round the result, and then convert it back:
discount = Math.round(Math.round(price * 100) * Math.round(factor * 100) / 100) / 100;
I think it should work as well to round only afterwards. However, you should first multiply the result so that the significant decimals are the sum of the two operands' decimals, i.e. 2 + 2 = 4 decimal places in your example:
discount = Math.round(Math.round( (price * factor) * 10000) / 100) / 100;
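A quick check of this formula against the problematic case from the question (the function name is just for illustration):

```javascript
// Round price * factor to four decimals first, then to cents.
function discount(price, factor) {
  return Math.round(Math.round(price * factor * 10000) / 100) / 100;
}

console.log(149.95 * 0.30);          // 44.98499999999999
console.log(discount(149.95, 0.30)); // 44.99, as a user would expect
```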
Adding a small amount to your numbers will not be very accurate. You can try using a library to get better results: https://github.com/jtobey/javascript-bignum.
Bergi’s answer shows a solution. This answer shows a mathematical demonstration that it is correct. In the process, it also establishes some bound on how much error in the input is tolerable.
Your problem is this:
You have a floating-point number, x, which already contains rounding errors. E.g., it is intended to represent 149.95 but actually contains 149.94999999999998863131622783839702606201171875.
You want to multiply this floating-point number x by a discount value d.
You want to know the result of the multiplication to the nearest penny, performed as if ideal mathematics were used with no errors.
Suppose we add two more assumptions:
x always represents some exact number of cents. That is, it represents a number that has an exact number of hundredths, such as 149.95.
The error in x is small, less than, say, .00004.
The discount value d represents an integer percentage (that is, also an exact number of hundredths, such as .25 for 25%) and is in the interval [0%, 100%].
The error in d is tiny, always the result of correct conversion of a decimal numeral with two digits after the decimal point to double-precision (64-bit) binary floating point.
Consider the value x*d*10000. Ideally, this would be an integer, since x and d are each ideally multiples of .01, so multiplying the ideal product of x and d by 10,000 produces an integer. Since the errors in x and d are small, then rounding x*d*10000 to an integer will produce that ideal integer. E.g., instead of the ideal x and d, we have x and d plus small errors, x+e0 and d+e1, and we are computing (x+e0)•(d+e1)•10000 = (x•d+x•e1+d•e0+e0•e1)•10000. We have assumed that e1 is tiny, so the dominant error is d•e0•10000. We assumed e0, the error in x, is less than .00004, and d is at most 1 (100%), so d•e0•10000 is less than .4. This error, plus the tiny errors from e1, are not enough to change the rounding of x*d*10000 from the ideal integer to some other integer. (This is because the error must be at least .5 to change how a result that should be an integer rounds. E.g., 3 plus an error of .5 would round to 4, but 3 plus .49999 would not.)
Thus, Math.round(x*d*10000) produces the integer desired. Then Math.round(x*d*10000)/100 is an approximation of x*d*100 that is accurate to much less than one cent, so rounding it, with Math.round(Math.round(x*d*10000)/100) produces exactly the number of cents desired. Finally, dividing that by 100 (to produce a number of dollars, with hundredths, instead of a number of cents, as an integer) produces a new rounding error, but the error is so small that, when the resulting value is correctly converted to decimal with two decimal digits, the correct value is displayed. (If further arithmetic is performed with this value, that might not remain true.)
We can see from the above that, if the error in x grows to .00005, this calculation can fail. Suppose the value of an order might grow to $100,000. The floating-point error in representing a value around 100,000 is at most 100,000•2^-53. If somebody ordered one hundred thousand items with this error (they could not, since the items would have smaller individual prices than $100,000, so their errors would be smaller), and the prices were individually added up, performing one hundred thousand (minus one) additions adding one hundred thousand new errors, then we have almost two hundred thousand errors of at most 100,000•2^-53, so the total error is at most 2•10^5•10^5•2^-53, which is about .00000222. Therefore, this solution should work for normal orders.
Note that the solution requires reconsideration if the discount is not an integer percentage. For example, if the discount is stated as “one third” instead of 33%, then x*d*10000 is not expected to be an integer.