I have noticed some inconsistencies between Python and JavaScript when converting a string to base36.
Python Method:
>>> print(int('abcdefghijr', 36))
Result: 37713647386641447
Javascript Method:
<script>
document.write(parseInt("abcdefghijr", 36));
</script>
Result: 37713647386641450
What causes the different results between the two languages? What would be the best approach to produce the same results regardless of the language?
Thank you.
That number takes 56 bits to represent. JavaScript's numbers are actually double-precision binary floating-point numbers ("doubles" for short). These are 64 bits in total and can represent a far wider range of values than a 64-bit integer, but due to how they achieve that (they represent a number as mantissa * 2^exponent), they cannot represent all numbers in that range, just the ones that are a multiple of 2^exponent where the multiple fits into the mantissa (which includes 2^0 = 1, so you get all integers the mantissa can handle directly). The mantissa is 53 bits, which is insufficient for this number, so it gets rounded to a number which can be represented.
What you can do is use an arbitrary-precision number type provided by a third-party library like gwt-math or Big.js. These numbers aren't hard to implement if you remember your school arithmetic. Doing it efficiently is another matter, but also an area of extensive research, and not your problem if you use an existing library.
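A quick way to see the rounding, sketched in Python (whose integers are arbitrary precision, so the base-36 parse itself is exact):

```python
# Exact parse: Python ints have arbitrary precision.
n = int('abcdefghijr', 36)
print(n)              # 37713647386641447

# Squeezing the value through a 64-bit double (which is what JavaScript's
# Number is) rounds it to the nearest representable value; at this
# magnitude consecutive doubles are 8 apart.
print(int(float(n)))  # 37713647386641448
```

JavaScript stores this same double but displays it as 37713647386641450, the shortest decimal string that rounds back to the same bit pattern.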
Related
While working with large numbers, I found a strange issue:
(99999999999999.9).toFixed(2) returns "99999999999999.91"
whereas
(99999999999999.8).toFixed(2) returns "99999999999999.80"
What I need is
(99999999999999.9).toFixed(2) should return "99999999999999.90"
How can I resolve this?
You basically cannot avoid this; it is a consequence of how numbers are represented in floating point and how JavaScript handles them behind the scenes:
JavaScript uses IEEE 754 double-precision floating point to represent numbers in memory. 99999999999999.9 is stored as
0 10000101101 0110101111001100010000011110100011111111111111111010
Which is a number consisting of 3 (integer) parts:
First the sign: "0" means it's positive.
Then the exponent, the "multiplier": 10000101101 in binary is 1069. We subtract the bias 1023 from it (to allow for negative exponents), so the multiplier is 2^(1069-1023) = 2^46.
And then the mantissa, the "data": the 52 stored bits, with the implicit leading 1 restored, give roughly 1.4210854715202 (the exact reconstruction is shown on the Wikipedia page for the double-precision format).
So the total value is +1.4210854715202... * 2^46, which is exactly 99999999999999.90625.
As you can see, the decimal value 99999999999999.9 cannot be hit exactly. This is due to the limited mantissa: at this magnitude consecutive doubles are 2^(46-52) = 0.015625 apart. Raising the last bit of the mantissa by one gives 99999999999999.921875, and lowering it by one gives 99999999999999.890625.
So everything between the halfway points 99999999999999.8984375 and 99999999999999.9140625 is represented by the same double, 99999999999999.90625; you won't notice a difference in that range. [*]
Now as to why the display is rounded "up" in this case: the stored value is exactly ...90625, and rounding that to two decimal places correctly yields ...91.
Now, could we have predicted this? And why doesn't it happen with every decimal? This is easy to explain. A numeric system has a base: normally we use base 10, while computers use base 2 internally.
The choice of base has far-reaching consequences: when working in a certain base, you can only write down exactly those fractions whose denominator (in lowest terms) is built from the prime factors of that base. E.g. base 10 has prime factors 2 and 5, so we can write down any fraction built from those:
1/2 => 0.5
1/5 => 0.2
2/5 => 0.4
1/10 = 1/2*1/5 => 0.1
However, 1/3 cannot be built from those prime factors, nor can 1/6 or 1/7, so in base 10 we cannot write those down exactly as decimals.
Similarly, base 2 has only the prime factor 2. A decimal number ending in ".9" always has a factor of 1/5 in its denominator, which is not available in binary, and thus cannot be written down exactly in binary.
Now, there are solutions: many languages have "decimal" packages, which keep numbers in a decimal representation and replace the hardware FPU with software arithmetic.
[*]: Conversion from a decimal literal to a double rounds to the nearest representable value (with ties going to the even mantissa), which is the IEEE 754 default rounding mode and what JavaScript engines implement.
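For the curious, the decomposition above can be checked in Python: struct reinterprets the double's bits, and Decimal prints the exact value actually stored.

```python
import struct
from decimal import Decimal

x = 99999999999999.9

# Reinterpret the 64 bits of the double as an unsigned integer.
bits = struct.unpack('>Q', struct.pack('>d', x))[0]
sign = bits >> 63                  # 0: positive
exponent = (bits >> 52) & 0x7FF    # biased exponent field
print(sign, exponent, exponent - 1023)  # 0 1069 46

# The exact value the double actually holds:
print(Decimal(x))                  # 99999999999999.90625
```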
You are definitely running into float precision error here.
If your specific use case doesn't need to be extra precise, you can use
(Math.round(yourValue*100)/100).toFixed(2)
(note that the multiplication by 100 can itself round for values this large, so this does not rescue every case)
There's also a newer way of handling this using the Intl.NumberFormat API, playing especially with the fraction settings:
minimumFractionDigits: 2,
maximumFractionDigits: 2
Of course, if you need something much more robust that plays well with other edge cases, there's also decimal.js to help out:
var a = new Decimal('99999999999999.9'); // pass a string so the value isn't rounded before Decimal sees it
console.log(a.toFixed(2)); // "99999999999999.90"
Why do some numbers lose accuracy when stored as floating point numbers?
For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:
32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875
How can such an apparently simple number be "too big" to express in 64 bits of memory?
In most programming languages, floating point numbers are represented a lot like scientific notation: with an exponent and a mantissa (also called the significand). A very simple number, say 9.2, is actually this fraction:
5179139571476070 × 2^-49
Where the exponent is -49 and the mantissa is 5179139571476070. The reason it is impossible to represent some decimal numbers this way is that both the exponent and the mantissa must be integers. In other words, all floats must be an integer multiplied by an integer power of 2.
9.2 may be simply 92/10, but 10 cannot be expressed as 2^n if n is limited to integer values.
Seeing the Data
First, a few functions to see the components that make up a 32- and 64-bit float. Gloss over these if you only care about the output (example in Python):
import struct
from itertools import islice

def float_to_bin_parts(number, bits=64):
    if bits == 32:  # single precision
        int_pack = 'I'
        float_pack = 'f'
        exponent_bits = 8
        mantissa_bits = 23
        exponent_bias = 127
    elif bits == 64:  # double precision. all python floats are this
        int_pack = 'Q'
        float_pack = 'd'
        exponent_bits = 11
        mantissa_bits = 52
        exponent_bias = 1023
    else:
        raise ValueError('bits argument must be 32 or 64')
    bin_iter = iter(bin(struct.unpack(int_pack, struct.pack(float_pack, number))[0])[2:].rjust(bits, '0'))
    return [''.join(islice(bin_iter, x)) for x in (1, exponent_bits, mantissa_bits)]
There's a lot of complexity behind that function, and it'd be quite the tangent to explain, but if you're interested, the important resource for our purposes is the struct module.
Python's float is a 64-bit, double-precision number. In other languages such as C, C++, Java and C#, double-precision has a separate type double, which is often implemented as 64 bits.
When we call that function with our example, 9.2, here's what we get:
>>> float_to_bin_parts(9.2)
['0', '10000000010', '0010011001100110011001100110011001100110011001100110']
Interpreting the Data
You'll see I've split the return value into three components. These components are:
Sign
Exponent
Mantissa (also called Significand, or Fraction)
Sign
The sign is stored in the first component as a single bit. It's easy to explain: 0 means the float is a positive number; 1 means it's negative. Because 9.2 is positive, our sign value is 0.
Exponent
The exponent is stored in the middle component as 11 bits. In our case, 0b10000000010. In decimal, that represents the value 1026. A quirk of this component is that you must subtract a number equal to 2^(# of bits - 1) - 1 to get the true exponent; in our case, that means subtracting 0b1111111111 (decimal number 1023) to get the true exponent, 0b00000000011 (decimal number 3).
Mantissa
The mantissa is stored in the third component as 52 bits. However, there's a quirk to this component as well. To understand this quirk, consider a number in scientific notation, like this:
6.0221413 × 10^23
The mantissa would be the 6.0221413. Recall that the mantissa in scientific notation always begins with a single non-zero digit. The same holds true for binary, except that binary only has two digits: 0 and 1. So the binary mantissa always starts with 1! When a float is stored, the 1 at the front of the binary mantissa is omitted to save space; we have to place it back at the front of our third element to get the true mantissa:
1.0010011001100110011001100110011001100110011001100110
This involves more than just a simple addition, because the bits stored in our third component actually represent the fractional part of the mantissa, to the right of the radix point.
When dealing with decimal numbers, we "move the decimal point" by multiplying or dividing by powers of 10. In binary, we can do the same thing by multiplying or dividing by powers of 2. Since our third element has 52 bits, we divide it by 2^52 to move it 52 places to the right:
0.0010011001100110011001100110011001100110011001100110
In decimal notation, that's the same as dividing 675539944105574 by 4503599627370496 to get 0.1499999999999999. (This is one example of a ratio that can be expressed exactly in binary, but only approximately in decimal.)
Now that we've transformed the third component into a fractional number, adding 1 gives the true mantissa.
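As a sanity check, a small Python sketch: reading those 52 bits as an integer and dividing by 2^52 reproduces the fraction, and restoring the implicit leading 1 plus the exponent of 3 lands exactly on the stored value of 9.2.

```python
stored = '0010011001100110011001100110011001100110011001100110'

frac = int(stored, 2) / 2**52   # exact: dividing by a power of two
print(int(stored, 2))           # 675539944105574
print(1 + frac)                 # the true mantissa, ~1.15

# mantissa * 2^exponent recovers the double exactly:
assert (1 + frac) * 2**3 == 9.2
```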
Recapping the Components
Sign (first component): 0 for positive, 1 for negative
Exponent (middle component): Subtract 2^(# of bits - 1) - 1 to get the true exponent
Mantissa (last component): Divide by 2^(# of bits) and add 1 to get the true mantissa
Calculating the Number
Putting all three parts together, we're given this binary number:
1.0010011001100110011001100110011001100110011001100110 × 10^11 (everything in binary: 10^11 means 2^3)
Which we can then convert from binary to decimal:
1.1499999999999999 × 2^3 (inexact!)
And multiply to reveal the final representation of the number we started with (9.2) after being stored as a floating point value:
9.1999999999999993
Representing as a Fraction
9.2
Now that we've built the number, it's possible to reconstruct it into a simple fraction:
1.0010011001100110011001100110011001100110011001100110 × 10^11 (binary)
Shift mantissa to a whole number:
10010011001100110011001100110011001100110011001100110 × 10^(11-110100) (binary exponents)
Convert to decimal:
5179139571476070 × 2^(3-52)
Subtract the exponent:
5179139571476070 × 2^-49
Turn negative exponent into division:
5179139571476070 / 2^49
Multiply exponent:
5179139571476070 / 562949953421312
Which equals:
9.1999999999999993
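This reconstruction can be checked with Python's fractions module, which recovers the exact stored ratio from a float:

```python
from fractions import Fraction

f = Fraction(9.2)   # the exact value stored for the literal 9.2
print(f)            # 5179139571476070/562949953421312
assert f == Fraction(5179139571476070, 2**49)
```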
9.5
>>> float_to_bin_parts(9.5)
['0', '10000000010', '0011000000000000000000000000000000000000000000000000']
Already you can see the mantissa is only 4 digits followed by a whole lot of zeroes. But let's go through the paces.
Assemble the binary scientific notation:
1.0011 × 10^11 (binary)
Shift the decimal point:
10011 × 10^(11-100) (binary exponents)
Subtract the exponent:
10011 × 10^-1 (binary)
Binary to decimal:
19 × 2^-1
Negative exponent to division:
19 / 2^1
Multiply exponent:
19 / 2
Equals:
9.5
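The same check in Python: 9.5's stored value is exactly the fraction 19/2, so arithmetic on it is exact.

```python
from fractions import Fraction

assert Fraction(9.5) == Fraction(19, 2)  # stored exactly as 19/2
assert 9.5 == 19 / 2                     # so this comparison is safe
```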
Further reading
The Floating-Point Guide: What Every Programmer Should Know About Floating-Point Arithmetic, or, Why don’t my numbers add up? (floating-point-gui.de)
What Every Computer Scientist Should Know About Floating-Point Arithmetic (Goldberg 1991)
IEEE Double-precision floating-point format (Wikipedia)
Floating Point Arithmetic: Issues and Limitations (docs.python.org)
Floating Point Binary
This isn't a full answer (mhlester already covered a lot of good ground I won't duplicate), but I would like to stress how much the representation of a number depends on the base you are working in.
Consider the fraction 2/3
In good-ol' base 10, we typically write it out as something like
0.666...
0.666
0.667
When we look at those representations, we tend to associate each of them with the fraction 2/3, even though only the first representation is mathematically equal to the fraction. The second and third representations/approximations have an error on the order of 0.001, which is actually much worse than the error between 9.2 and 9.1999999999999993. In fact, the second representation isn't even rounded correctly! Nevertheless, we don't have a problem with 0.666 as an approximation of the number 2/3, so we shouldn't really have a problem with how 9.2 is approximated in most programs. (Yes, in some programs it matters.)
Number bases
So here's where number bases are crucial. If we were trying to represent 2/3 in base 3, then
(2/3)₁₀ = 0.2₃
In other words, we have an exact, finite representation for the same number by switching bases! The take-away is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
To drive this point home, let's look at 1/2. It might surprise you that even though this perfectly simple number has an exact representation in base 10 and 2, it requires a repeating representation in base 3.
(1/2)₁₀ = 0.5₁₀ = 0.1₂ = 0.111...₃
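The long-division view of this is easy to sketch in Python. The helper below (a made-up name, not from any library) emits the digits of a fraction after the radix point in a given base:

```python
def expand(num, den, base, max_digits=12):
    """Digits of num/den (assumed < 1) after the radix point, in `base`.
    Stops early if the expansion terminates."""
    digits = []
    for _ in range(max_digits):
        num *= base
        digits.append(num // den)
        num %= den
        if num == 0:
            break
    return digits

print(expand(2, 3, 10))  # [6, 6, 6, ...]  repeats forever in base 10
print(expand(2, 3, 3))   # [2]             terminates in base 3
print(expand(1, 2, 3))   # [1, 1, 1, ...]  1/2 repeats in base 3
print(expand(1, 2, 2))   # [1]             but terminates in base 2
```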
Why are floating point numbers inaccurate?
Because often-times, they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base.
While all of the other answers are good, there is still one thing missing:
It is impossible to represent irrational numbers (e.g. π, sqrt(2), log(3), etc.) precisely!
And that actually is why they are called irrational. No amount of bit storage in the world would be enough to hold even one of them. Only symbolic arithmetic is able to preserve their precision.
Although if you limit your math needs to rational numbers only, the problem of precision becomes manageable. You would need to store a pair of (possibly very big) integers a and b to hold the number represented by the fraction a/b. All your arithmetic would have to be done on fractions just like in high school math (e.g. a/b * c/d = ac/bd).
But of course you would still run into the same kind of trouble when pi, sqrt, log, sin, etc. are involved.
TL;DR
For hardware-accelerated arithmetic, only a limited amount of rational numbers can be represented. Every non-representable number is approximated. Some numbers (e.g. the irrationals) can never be represented, no matter the system.
There are infinitely many real numbers (so many that you can't enumerate them), and there are infinitely many rational numbers (it is possible to enumerate them).
The floating-point representation is a finite one (like anything in a computer), so unavoidably many, many numbers are impossible to represent. In particular, 64 bits can only distinguish among 18,446,744,073,709,551,616 different values (which is nothing compared to infinity). With the standard convention, 9.2 is not one of them. Those that can be represented are of the form m·2^e for some integers m and e.
You might come up with a different numeration system, 10 based for instance, where 9.2 would have an exact representation. But other numbers, say 1/3, would still be impossible to represent.
Also note that double-precision floating-point numbers are extremely accurate. They can represent any number in a very wide range with as many as 15 exact digits. For daily-life computations, 4 or 5 digits are more than enough. You will never really need those 15, unless you want to count every millisecond of your lifetime.
Why can we not represent 9.2 in binary floating point?
Floating point numbers are (simplifying slightly) a positional numbering system with a restricted number of digits and a movable radix point.
A fraction can only be expressed exactly using a finite number of digits in a positional numbering system if the prime factors of the denominator (when the fraction is expressed in its lowest terms) are factors of the base.
The prime factors of 10 are 5 and 2, so in base 10 we can represent any fraction of the form a/(2^b·5^c).
On the other hand, the only prime factor of 2 is 2, so in base 2 we can only represent fractions of the form a/(2^b).
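That condition is easy to test mechanically. A Python sketch (the function name is made up for illustration):

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(frac, base):
    """True iff `frac` has a finite expansion in `base`, i.e. iff the
    lowest-terms denominator has no prime factors outside the base."""
    d = Fraction(frac).denominator   # Fraction reduces to lowest terms
    g = gcd(d, base)
    while g > 1:                     # strip out factors shared with the base
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    return d == 1

print(terminates_in_base(Fraction(92, 10), 10))  # True:  9.2 is fine in base 10
print(terminates_in_base(Fraction(92, 10), 2))   # False: but not in base 2
print(terminates_in_base(Fraction(19, 2), 2))    # True:  9.5 is fine in base 2
```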
Why do computers use this representation?
Because it's a simple format to work with and it is sufficiently accurate for most purposes. Basically the same reason scientists use "scientific notation" and round their results to a reasonable number of digits at each step.
It would certainly be possible to define a fraction format, with (for example) a 32-bit numerator and a 32-bit denominator. It would be able to represent numbers that IEEE double precision floating point could not, but equally there would be many numbers that can be represented in double precision floating point that could not be represented in such a fixed-size fraction format.
However, the big problem is that such a format is a pain to do calculations on, for two reasons:
If you want to have exactly one representation of each number, then after each calculation you need to reduce the fraction to its lowest terms. That means that for every operation you basically need to do a greatest-common-divisor calculation.
If after your calculation you end up with an unrepresentable result because the numerator or denominator is too large, you need to find the closest representable result. This is non-trivial.
Some languages do offer fraction types, but usually they do it in combination with arbitrary precision. This avoids needing to worry about approximating fractions, but it creates its own problem: when a number passes through a large number of calculation steps, the size of the denominator, and hence the storage needed for the fraction, can explode.
Some languages also offer decimal floating-point types. These are mainly used in scenarios where it is important that the results the computer gets match pre-existing rounding rules that were written with humans in mind (chiefly financial calculations). They are slightly more difficult to work with than binary floating point, but the biggest problem is that most computers don't offer hardware support for them.
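Python's decimal module is one such software decimal floating-point type, and it shows the difference directly:

```python
from decimal import Decimal

# Binary doubles miss, because 0.1 and 0.2 repeat in base 2:
print(0.1 + 0.2 == 0.3)                                   # False

# Decimal floating point matches human base-10 expectations:
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True
```

Note the string arguments: passing the float literal 0.1 would hand Decimal an already-rounded binary value.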
According to the documentation, one can convert a binary representation of a number to the number itself using parseInt(string, base).
For instance,
var number = parseInt("10100", 2);
// number is 20
However, take a look at the next example:
var number = parseInt("1000110011000011101000010100000110011110010111111100011010000", 2);
// number is 1267891011121314000
var number = parseInt("1000110011000011101000010100000110011110010111111100100000000", 2);
// number is 1267891011121314000
How is this possible?
Note that the binary numbers are almost the same, except for the last 9 bits.
1267891011121314000 is way over Number.MAX_SAFE_INTEGER (9007199254740991), so JavaScript can't represent it exactly in memory.
Take a look at this classic example:
1267891011121314000 == 1267891011121314001 // true
Because that's a 61-bit number, and the highest-precision numeric type that JavaScript will store is a 64-bit floating-point value (an IEEE 754 double).
A 64-bit float does not have 61 bits of integral precision, since the 64 bits are split between the fraction and the exponent. In the IEEE 754 bit layout, only 52 bits are available to the mantissa (or fraction), which is what holds an integer's significant bits.
Many bignum libraries exist to solve this sort of thing, as it's a very common problem in scientific applications. They tend not to be as fast as math with hardware support, but allow much more precision. Pulling from Github results, bignumber.js may be what you need to support these values.
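Python, whose integers are arbitrary precision, makes the collision visible (using the two bit strings from the question):

```python
s1 = '1000110011000011101000010100000110011110010111111100011010000'
s2 = '1000110011000011101000010100000110011110010111111100100000000'

a, b = int(s1, 2), int(s2, 2)   # exact parses
print(b - a)                    # 48: the exact values really differ
print(float(a) == float(b))     # True: both round to the same 64-bit double
```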
There are upper and lower limits on numbers in JavaScript. JavaScript represents numbers as 64-bit floating point (IEEE 754).
Look into a BigDecimal implementation in JavaScript if you need larger support.
So in JavaScript, 111111111111111111111 == 111111111111111110000. Just type any long number – at least about 17 digits – to see it in action ;-)
That is because JavaScript uses double-precision floating-point numbers, and certain very long numeric literals can not be expressed accurately. Instead, those numbers get rounded to the nearest representable number possible. See e.g. What is JavaScript's highest integer value that a Number can go to without losing precision?
However, doing the math according to IEEE 754, I found out that every number inside [111111111111111106560, 111111111111111122944] will be replaced by 111111111111111114752 internally. But instead of showing this ...4752 number, it gets displayed as 111111111111111110000. So JavaScript is showing those trailing zeros, which obfuscates the real behavior. This is especially annoying because even numbers like 2^63, which are accurately representable, get "rounded" as described.
So my question is: Why does JavaScript behave like this when displaying the number?
JavaScript integers can only go up to ±2^53, which is:
9007199254740992
One of your numbers is
111111111111111106560
which is considerably outside of the range of numbers that can be accurately represented as an integer.
This follows the IEEE 754:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
EDIT
The display of numbers is sometimes rounded by the JavaScript engine, yes. However, that can be overridden using the toFixed method. (Warning: toFixed is known to be broken under some versions of IE.)
In your console, type:
111111111111111122944..toFixed(0)
"111111111111111114752"
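Python agrees on the stored value, and its default display avoids the padded-zeros form JavaScript uses:

```python
x = 111111111111111110000   # exact as a Python int
print(int(float(x)))        # 111111111111111114752, the double actually stored
print(float(x))             # shortest round-trip form, 1.1111111111111111e+20
```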
I've been playing around with floating point numbers a little bit, and based on what I've learned about them in the past, the fact that 0.1 + 0.2 ends up being something like 0.30000000000000004 doesn't surprise me.
What does surprise me, however, is that integer arithmetic always seems to work just fine and not have any of these artifacts.
I first noticed this in JavaScript (Chrome V8 in node.js):
0.1 + 0.2 == 0.3 // false, NOT surprising
123456789012 + 18 == 123456789030 // true
22334455667788 + 998877665544 == 23333333333332 // true
1048576 / 1024 == 1024 // true
C++ (gcc on Mac OS X) seems to have the same properties.
The net result seems to be that integer numbers just — for lack of a better word — work. It's only when I start using decimal numbers that things get wonky.
Is this a feature of the design, a mathematical artifact, or some optimisation done by compilers and runtime environments?
Is this a feature of the design, a mathematical artifact, or some optimisation done by compilers and runtime environments?
It's a feature of the real numbers. A theorem from modern algebra (modern algebra, not high school algebra; math majors take a class in modern algebra after their basic calculus and linear algebra classes) says that for some positive integer b, any positive real number r can be expressed as r = a·b^p, where a is in [1,b) and p is some integer. For example, 1024₁₀ = 1.024₁₀ × 10^3. It is this theorem that justifies our use of scientific notation.
That number a can be classified as terminating (e.g. 1.0), repeating (1/3 = 0.333...), or non-repeating (the representation of pi). There's a minor issue here with terminating numbers: any terminating number can also be represented as a repeating number. For example, 0.999... and 1 are the same number. This ambiguity in representation can be resolved by specifying that numbers that can be represented as terminating numbers are represented as such.
What you have discovered is a consequence of the fact that all integers have a terminating representation in any base.
There is an issue here with how the reals are represented in a computer. Just as int and long long int don't represent all of the integers, float and double don't represent all of the reals. The scheme used on most computers to represent a real number r is of the form r = a·2^p, but with the mantissa (or significand) a truncated to a certain number of bits and the exponent p limited to some finite range. What this means is that some integers cannot be represented exactly. For example, even though a googol (10^100) is an integer, its floating-point representation is not exact. The base-2 representation of a googol is a 333-bit number, and that 333-bit value is truncated to the 52+1 bits of the mantissa.
One consequence of this is that double-precision arithmetic is no longer exact, even for integers, if the integers in question are greater than 2^53. Try your experiment using the type unsigned long long int on values between 2^53 and 2^64. You'll find that double-precision arithmetic is no longer exact for these large integers.
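The suggested experiment, sketched in Python with its exact integers standing in for unsigned long long:

```python
# Every integer up to 2**53 is exactly representable as a double...
assert float(2**53 - 1) == 2**53 - 1

# ...but beyond that, doubles skip integers: 2**53 + 1 rounds back down.
assert int(float(2**53 + 1)) == 2**53

# A googol is an integer, yet its 333-bit value cannot fit a 53-bit mantissa:
googol = 10**100
print(int(float(googol)) != googol)   # True: the stored value is inexact
```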
I'm writing this under the assumption that JavaScript uses double-precision floating-point representation for all numbers.
Some numbers have an exact representation in the floating-point format, in particular, all integers such that |x| < 2^53. Some numbers don't, in particular, fractions such as 0.1 or 0.2 which become infinite fractions in binary representation.
If all operands and the result of an operation have an exact representation, then it would be safe to compare the result using ==.
Related questions:
What number in binary can only be represented as an approximation?
Why can't decimal numbers be represented exactly in binary?
Integers within the representable range are exactly representable by the machine; floats are not (well, most of them).
If by "basic integer math" you mean a "feature", then yes: you can assume that correctly implementing arithmetic is a feature.
The reason is that you can represent every whole number (1, 2, 3, ...) exactly in binary format (0001, 0010, 0011, ...).
That is why integers are always correct: 0011 - 0001 is always 0010.
The problem with floating-point numbers is that the part after the point often cannot be converted to binary exactly.
All of the cases that you say "work" are ones where the numbers you have given can be represented exactly in the floating-point format. You'll find that adding 0.25, 0.5 and 0.125 works exactly too, because they can also be represented exactly in a binary floating-point number.
It's only values that can't be, such as 0.1, where you'll get what appear to be inexact results.
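A Python sketch of the same observation:

```python
# Sums of exactly representable binary fractions stay exact:
assert 0.25 + 0.5 + 0.125 == 0.875

# 0.1 has no finite binary expansion, so arithmetic on it drifts:
print(0.1 + 0.2)   # 0.30000000000000004
assert 0.1 + 0.2 != 0.3
```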
Integers are exact because the imprecision results mainly from the way we write decimal fractions, and secondarily because many rational numbers simply don't have non-repeating representations in any given base.
See: https://stackoverflow.com/a/9650037/140740 for the full explanation.
That method only works when you are adding a small enough integer to a very large integer, and even in that case you are not representing both of the integers in the floating-point format.
Not all floating-point numbers can be represented exactly; it's due to the way they are encoded. The wiki page explains it better than I can: http://en.wikipedia.org/wiki/IEEE_754-1985.
So when you are comparing floating-point numbers, you should use a delta:
Math.abs(myFloat - expectedFloat) < delta
You can use a delta suited to the magnitude of your values; Number.EPSILON (the gap between 1 and the next representable float) scaled by that magnitude is a common choice.
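In Python the same comparison is built in as math.isclose, which uses a relative tolerance by default:

```python
import math

print(0.1 + 0.2 == 0.3)               # False: exact equality fails
print(math.isclose(0.1 + 0.2, 0.3))   # True: tolerance-based comparison
```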