Keep trailing or leading zeroes on a number - JavaScript

Is it possible to keep trailing or leading zeroes on a number in javascript, without using e.g. a string instead?
const leading = 003; // literal, leading
const trailing = 0.10; // literal, trailing
const parsed = parseFloat('0.100'); // parsed or somehow converted
console.log(leading, trailing, parsed); // desired: 003 0.10 0.100
This question has been regularly asked (and still is), yet I don't have a place I'd feel comfortable linking to (did I miss it?).
The fully analogous problem is keeping any other aspect of the representation a number literal was entered in, although that is asked nowhere near as often:
console.log(0x10); // 16 instead of potentially desired 0x10
console.log(1e1); // 10 instead of potentially desired 1e1
For disambiguation, this is not about the following topics, for some of which I'll add links, as they might be of interest as well:
Padding to a set amount of digits, formatting to some specific string representation, e.g. How can I pad a value with leading zeroes?, How to output numbers with leading zeros in JavaScript?, How to add a trailing zero to a price
Why a certain string representation will be produced for some number by default, e.g. How does JavaScript determine the number of digits to produce when formatting floating-point values?
Floating point precision/accuracy problems, e.g. console.log(0.1 + 0.2) producing 0.30000000000000004, see Is floating point math broken?, and How to deal with floating point number precision in JavaScript?

No. A number stores no information about the representation it was entered as, or parsed from. It only relates to its mathematical value. Perhaps reconsider using a string after all.
If I had to guess, much of the confusion comes from the idea that numbers and their textual representations are either the same thing, or at least tightly coupled, with some kind of bidirectional binding between them. This is not the case.
Representations like 0.1 and 0.10, which you enter in code, are only used to generate a number. They are convenient names for what you intend to produce, not the resulting value. In this case, they are names for the same number, which has a lot of other aliases, like 0.100, 1e-1, or 10e-2. The resulting value contains no information about what it came from or how it was written; the conversion is a one-way street.
When displaying a number as text, JavaScript by default (Number.prototype.toString) uses an algorithm to construct one of the possible representations from the number value. Since that value is all it has to work with, it will produce the same result for two equal numbers, which implies that 0.1 and 0.10 produce the same output.
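A minimal sketch of that one-way street:
// Different literals, same number, same default string:
console.log(0.1 === 0.10);            // true: one value, two spellings
console.log((0.10).toString());       // "0.1": toString only sees the value
console.log(0x10 === 16, 1e1 === 10); // true true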
Concerning the number[1] value, JavaScript uses IEEE 754-2019 float64[2]. When source code is evaluated[3] and a number literal is encountered, the engine converts the mathematical value the literal denotes to a 64-bit value, according to IEEE 754-2019. This means any information about the original representation in the code is lost[4].
There is another problem, which is somewhat unrelated to the main topic. JavaScript used to have an octal notation with a prefix of "0". This means that 003 is parsed as an octal literal, and would throw in strict mode. Similarly, 010 === 8 (or an error in strict mode); see Why JavaScript treats a number as octal if it has a leading zero
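A quick sketch of that pitfall (sloppy mode shown; the legacy form is an error in strict mode):
// Legacy: a leading 0 means octal in sloppy mode
console.log(010 === 8);  // true (SyntaxError in strict mode)
// Modern code uses the explicit 0o prefix instead
console.log(0o10 === 8); // true in both modes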
In conclusion, when trying to keep information about some representation of a number (including leading or trailing zeroes, whether it was written as decimal, hexadecimal, and so on), a number is not a good choice. For how to achieve some specific representation other than the default, which doesn't need access to the originally entered text (e.g. padding to some amount of digits), there are many other questions/articles, some of which were already linked; a minimal sketch follows.
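For instance, using the standard toFixed and padStart:
const trailing = (0.1).toFixed(2);          // "0.10": fixed count of decimals
const leading = String(3).padStart(3, '0'); // "003": pad to three digits
console.log(trailing, leading);             // 0.10 003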
[1]: JavaScript also has BigInt, but while it uses a different format, the reasoning is completely analogous.
[2]: This is a simplification. Engines are allowed to use other formats internally (and do, e.g. to save space/time), as long as they are guaranteed to behave like an IEEE 754-2019 float64 in every regard observable from JavaScript.
[3]: E.g. V8 converts to bytecode before evaluation, already exchanging the literal. The only relevant point is that the information is lost before we could do anything with it.
[4]: JavaScript gives you the ability to operate on code itself (e.g. Function.prototype.toString), which I will not discuss here much. Parsing the code yourself and storing the representation is an option, but that has nothing to do with how number works (you would be operating on code, a string). Also, I don't immediately see any sane reason to do so over the alternatives.

Related

Assigning BigInt stores wrong number (number+1)

I want to define a BigInt number in JavaScript. But when I assign it, the wrong number is stored. In fact 1 is added to the number when storing.
let num = BigInt(0b0000111111111111111111111111111111111111111111111111111111111111)
console.log(num) // Output: 1152921504606846976n
console.log(num.toString(2)) // Output: 1000000000000000000000000000000000000000000000000000000000000
So the number stored is 1152921504606846976, but it should be 1152921504606846975. Why is that?
Converting a Number to a BigInt can't create bits that weren't there before.
0b1 (just like 1) is a Number literal, so it creates a Number.
0b1n (just like 1n) is a BigInt literal, so it creates a BigInt.
By writing BigInt(0b1), you're first creating a Number and then converting that to a BigInt. As long as the value is 1, that works just fine; once the value exceeds what you can losslessly store in a Number [1], you'll see that the value of the final BigInt won't match the literal you wrote down. Whether you use binary (0b...), decimal, or hex (0x...) literals doesn't change any of that.
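Applied to the question's example, the n suffix keeps every bit, because no intermediate Number is involved:
// Number literal first: precision is lost before BigInt() even runs
console.log(BigInt(0b0000111111111111111111111111111111111111111111111111111111111111));
// 1152921504606846976n
// BigInt literal: all 60 one-bits survive
console.log(0b0000111111111111111111111111111111111111111111111111111111111111n);
// 1152921504606846975n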
(And just to be extra clear: there's no reason to write BigInt(123n), just like you wouldn't write Number(123). 123n already is a BigInt, so there's nothing to convert.)
A simple non-BigInt way to illustrate what's happening is to enter 12345678901234567890 into your favorite browser's DevTools console: you can specify Number literals of any length you want, but they'll be parsed into an IEEE754 64-bit "double", which has limited precision. Any extra digits in the literal simply can't be stored, though of course each digit's presence affects the magnitude of the number.
[1] Side note: this condition is more subtle than just saying that Number.MAX_SAFE_INTEGER is the threshold, though that constant is related to the situation: any integral number below MAX_SAFE_INTEGER can be stored losslessly, but there are plenty of numbers above MAX_SAFE_INTEGER that can also be represented exactly. Random example: 1e20.
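One way to check exactness of a given literal is to compare it against exact BigInt arithmetic, a quick sketch:
console.log(BigInt(1e20) === 10n ** 20n); // true: 1e20 is exactly representable
console.log(BigInt(1e23) === 10n ** 23n); // false: 1e23 is not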

JSON.stringify() converts integers to exponential

I have a record
[
"5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY",
1000000000000000000000
],
JSON.stringify() converts it to the form
[
"5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY",
1e+21
],
JSON.stringify() then writes it out the same way; can this somehow be solved?
JSON.parse doesn't convert it to 1e+21, it converts it to a number that, when converted to string in the usual way, is output as the string "1e+21". But the number is the same number whether you write it as 1000000000000000000000 or 1e+21.
JSON.stringify may output it in either form; both are valid JSON numbers, and both define exactly the same number.
I should note that you need to beware of numbers of that magnitude in JavaScript (or any other language that uses IEEE-754 double-precision floating point numbers [or single-precision ones, actually]). That number is well into the range where even integers may be imprecisely represented. Any number greater than 9,007,199,254,740,992 (Number.MAX_SAFE_INTEGER + 1) may or may not have a precise representation. It happens that 1,000,000,000,000,000,000,000 (your number) does, but for instance, 9,007,199,254,740,993 doesn't, nor do any odd numbers from that point upward. At some point, you get to where only multiples of 4 can be represented; and then later it's only multiples of 8, etc. See this question's answers for more.
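A quick sketch of both points:
// Same double, two spellings:
console.log(1000000000000000000000 === 1e21); // true
// Above 2^53, not every integer is representable:
console.log(9007199254740993); // 9007199254740992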
If you still need to get 1e+21 displayed as 1000000000000000000000, you can use (1e+21).toLocaleString().split(',').join('') (note that this relies on the locale using "," as the group separator),
but actually, you don't need to convert it if you want to use it as a number, because they are absolutely the same.
Instead, you can keep the number as a string and use +'1000000000000000000000' or parseInt('1000000000000000000000') when you need to use it as a number.

Why do code points between U+D800 and U+DBFF generate one-length strings in ECMAScript 6?

I'm getting too confused. Why do code points from U+D800 to U+DBFF encode as a single (2-byte) String element when using the ECMAScript 6 native Unicode helpers?
I'm not asking how JavaScript/ECMAScript encodes Strings natively; I'm asking about the extra functionality to encode UTF-16 that makes use of UCS-2.
var str1 = '\u{D800}';
var str2 = String.fromCodePoint(0xD800);
console.log(
str1.length, str1.charCodeAt(0), str1.charCodeAt(1)
);
console.log(
str2.length, str2.charCodeAt(0), str2.charCodeAt(1)
);
Re-TL;DR: I want to know why the above approaches return a string of length 1. Shouldn't U+D800 generate a string of length 2, since my browser's ES6 implementation incorporates UCS-2 encoding in strings, which uses 2 bytes for each character code?
Both of these approaches return a one-element String for the U+D800 code point (char code: 55296, same as 0xD800), but for code points bigger than U+FFFF each one returns a two-element String: the lead and the trail. The lead would be a number between U+D800 and U+DBFF; the trail I'm not sure about, I only know it helps change the resulting code point. For me the return value doesn't make sense: it represents a lead without a trail. Am I understanding something wrong?
I think your confusion is about how Unicode encodings work in general, so let me try to explain.
Unicode itself just specifies a list of characters, called "code points", in a particular order. It doesn't tell you how to convert those to bits; it just gives each of them a number between 0 and 1114111 (in hexadecimal, 0x10FFFF). There are several different ways these numbers from U+0000 to U+10FFFF can be represented as bits.
In an earlier version, it was expected that a range of 0 to 65535 (0xFFFF) would be enough. This can be naturally represented in 16 bits, using the same convention as an unsigned integer. This was the original way of storing Unicode, and is now known as UCS-2. To store a single code point, you reserve 16 bits of memory.
Later, it was decided that this range was not large enough; this meant that there were code points higher than 65535, which you can't represent in a 16-bit piece of memory. UTF-16 was invented as a clever way of storing these higher code points. It works by saying "if you look at a 16-bit piece of memory, and it's a number between 0xD800 and 0xDBFF (a "lead surrogate"), then you need to look at the next 16 bits of memory as well". Any piece of code which is performing this extra check is processing its data as UTF-16, and not UCS-2.
It's important to understand that the memory itself doesn't "know" which encoding it's in, the difference between UCS-2 and UTF-16 is how you interpret that memory. When you write a piece of software, you have to choose which interpretation you're going to use.
Now, onto Javascript...
Javascript handles input and output of strings by interpreting its internal representation as UTF-16. That's great, it means that you can type in and display the famous 💩 character, which can't be stored in one 16-bit piece of memory.
The problem is that most of the built-in string functions actually handle the data as UCS-2 - that is, they look at 16 bits at a time, and don't care if what they see is a special "surrogate". The function you used, charCodeAt(), is an example of this: it reads 16 bits out of memory, and gives them to you as a number between 0 and 65535. If you feed it 💩, it will just give you back the first 16 bits; ask it for the next "character" after, and it will give you the second 16 bits (which will be a "trail surrogate", between 0xDC00 and 0xDFFF).
In ECMAScript 6 (2015), a new function was added: codePointAt(). Instead of just looking at 16 bits and giving them to you, this function checks if they represent one of the UTF-16 surrogate code units, and if so, looks for the "other half" - so it gives you a number between 0 and 1114111. If you feed it 💩, it will correctly give you 128169.
var poop = '💩';
console.log('Treat it as UCS-2, two 16-bit numbers: ' + poop.charCodeAt(0) + ' and ' + poop.charCodeAt(1));
console.log('Treat it as UTF-16, one value cleverly encoded in 32 bits: ' + poop.codePointAt(0));
// The surrogates are 55357 and 56489, which encode 128169 as follows:
// 0x010000 + ((55357 - 0xD800) << 10) + (56489 - 0xDC00) = 128169
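As a sketch, the same surrogate arithmetic done by hand, mirroring the formula in the comment above (decodeAt is a hypothetical helper, not a built-in):
// Decode the code point at index i, treating the string as UTF-16;
// returns the unit itself for BMP characters and unpaired surrogates.
function decodeAt(str, i) {
  const unit = str.charCodeAt(i);
  if (unit >= 0xD800 && unit <= 0xDBFF) {   // lead surrogate?
    const next = str.charCodeAt(i + 1);     // NaN if out of range
    if (next >= 0xDC00 && next <= 0xDFFF) { // followed by a trail surrogate?
      return 0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00);
    }
  }
  return unit;
}
console.log(decodeAt('💩', 0)); // 128169, same as '💩'.codePointAt(0)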
Your edited question now asks this:
I want to know why the above approaches return a string of length 1. Shouldn't U+D800 generate a 2 length string?
The hexadecimal value D800 is 55296 in decimal, which is less than 65536, so given everything I've said above, this fits fine in 16 bits of memory. So if we ask charCodeAt to read 16 bits of memory, and it finds that number there, it's not going to have a problem.
Similarly, the .length property measures how many sets of 16 bits there are in the string. Since this string is stored in 16 bits of memory, there is no reason to expect any length other than 1.
The only unusual thing about this number is that in Unicode, that value is reserved - there isn't, and never will be, a character U+D800. That's because it's one of the magic numbers that tells a UTF-16 algorithm "this is only half a character". So a possible behaviour would be for any attempt to create this string to simply be an error - like opening a pair of brackets that you never close, it's unbalanced, incomplete.
The only way you could end up with a string of length 2 is if the engine somehow guessed what the second half should be; but how would it know? There are 1024 possibilities, from 0xDC00 to 0xDFFF, which could be plugged into the formula I show above. So it doesn't guess, and since it doesn't error, the string you get is 16 bits long.
Of course, you can supply the matching halves, and codePointAt will interpret them for you.
// Set up two 16-bit pieces of memory
var high=String.fromCharCode(55357), low=String.fromCharCode(56489);
// Note: String.fromCodePoint will give the same answer
// Glue them together (this + is string concatenation, not number addition)
var poop = high + low;
// Read out the memory as UTF-16
console.log(poop);
console.log(poop.codePointAt(0));
Well, it does this because the specification says it has to:
http://www.ecma-international.org/ecma-262/6.0/#sec-string.fromcodepoint
http://www.ecma-international.org/ecma-262/6.0/#sec-utf16encoding
Together these two say that if an argument is < 0 or > 0x10FFFF, a RangeError is thrown, but otherwise any code point <= 65535 is incorporated into the result string as-is.
As for why things are specified this way, I don't know. It seems like JavaScript doesn't really support Unicode, only UCS-2.
Unicode.org has the following to say on the matter:
http://www.unicode.org/faq/utf_bom.html#utf16-2
Q: What are surrogates?
A: Surrogates are code points from two special ranges of Unicode values, reserved for use as the leading, and trailing values of paired code units in UTF-16. Leading, also called high, surrogates are from D800₁₆ to DBFF₁₆, and trailing, or low, surrogates are from DC00₁₆ to DFFF₁₆. They are called surrogates, since they do not represent characters directly, but only as a pair.
http://www.unicode.org/faq/utf_bom.html#utf16-7
Q: Are there any 16-bit values that are invalid?
A: Unpaired surrogates are invalid in UTFs. These include any value in the range D800₁₆ to DBFF₁₆ not followed by a value in the range DC00₁₆ to DFFF₁₆, or any value in the range DC00₁₆ to DFFF₁₆ not preceded by a value in the range D800₁₆ to DBFF₁₆.
Therefore the result of String.fromCodePoint is not always valid UTF-16 because it can emit unpaired surrogates.
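Newer engines offer standard helpers for detecting exactly this (ES2024's String.prototype.isWellFormed and toWellFormed, where supported):
const lone = String.fromCodePoint(0xD800); // unpaired lead surrogate
console.log(lone.length);         // 1
console.log(lone.isWellFormed()); // false: not valid UTF-16 on its own
console.log(lone.toWellFormed()); // "\uFFFD" (the replacement character)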

Odd behavior with parseFloat when my string is too long

I was making a calculator (something like Excel in JavaScript) and I have found a strange behavior with parseFloat.
parseFloat('999999999999999')  // 999999999999999
parseFloat('9999999999999999') // 10000000000000000
parseFloat('9999999999999899') // 9999999999999900
Is there a limit with the parseFloat function in JavaScript? Following the ECMA standard, there should be no issue with this.
A float is not an endless container. Consider this example:
console.log(0.1 + 0.2 == 0.3) // Prints... FALSE!
Or, another case:
console.log(99999999999999999999999999999999999) // Prints 1e+35
...while 1e+35 is just a 1 with 35 zeroes. The original number (9999...) has more significant digits than a float can hold, so JS keeps the most significant part and cuts the lower digits - the source is too big to store in a float exactly.
This actually happens because of internal float conversions made by the JavaScript engine, and the philosophy of the float type is that higher digits are more important than lower ones.
Your case is similar: floating-point accuracy depends on how many significant digits the value has. So, if your value carries too many significant digits, you will lose precision in the lower ones.
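The rounding happens in the Number itself, not in parseFloat; a minimal demonstration:
console.log(9999999999999999 === 10000000000000000); // true: same double
console.log(parseFloat('9999999999999999'));         // 10000000000000000
console.log(Number.MAX_SAFE_INTEGER);                // 9007199254740991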
Thus you should never trust a float to hold more significant digits than it can store, and be careful comparing computed floats with '==' or '===' - the result may not be what you expect.

JSON.parse parses / converts big numbers incorrectly

My problem is really simple but I'm not sure if there's a "native" solution using JSON.parse.
I receive this string from an API :
{ "key" : -922271061845347495 }
When I'm using JSON.parse on this string, it turns into this object:
{ "key" : -922271061845347500 }
As you can see, the parsing stops when the number is too long (you can check this behavior here). It has only 15 exact digits; the last one is rounded and those after are set to 0. Is there a "native" solution to keep the exact value? (it's an ID so I can't round it)
I know I can use regex to solve this problem but I'd prefer to use a "native" method if it exists.
Your assumption that the parsing stops after a certain number of digits is incorrect.
It says here:
In JavaScript all numbers are floating-point numbers. JavaScript uses the standard 8-byte IEEE floating-point numeric format, which means the range is from ±1.7976931348623157 × 10^308 (very large) to ±5 × 10^-324 (very small).
As JavaScript uses floating-point numbers, the accuracy is only assured for integers between -9007199254740992 (-2^53) and 9007199254740992 (2^53).
Your number lies outside the "accurate" range, hence it is converted to the nearest representation of a JavaScript number. Any attempt to evaluate this number (using JSON.parse, eval, parseInt) will cause data loss. I therefore recommend that you pass the key as a string. If you do not control the API, file a feature request.
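To see that no parsing option can help here, note that both literals denote the very same double:
console.log(JSON.parse('{ "key" : -922271061845347495 }').key); // -922271061845347500
console.log(-922271061845347495 === -922271061845347500);       // true: same Number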
The number is too big to be parsed correctly.
One solution is:
Preprocess your string from the API to convert the number into a string before parsing.
Perform normal parsing.
Optionally, convert it back into a number for your own purposes.
Here is a RegExp to convert all numbers in your string (preceded by a :) into strings:
// convert all number fields into strings to maintain precision
// : -922271061845347495 } => : "-922271061845347495" }
stringFromApi = stringFromApi.replace(/:\s*(-?\d+)\s*([,}])/g, ': "$1"$2');
(Compared to a plain /:\s*(-?\d+),/g, the ([,}]) group also catches the last property before a closing brace, as in the question's payload; a usage sketch follows.)
Regex explanation:
\s* any number of spaces
-? one or zero '-' symbols (negative number support)
\d+ one or more digits
(...) will be put in the $1 (the digits) and $2 (the closing , or }) variables
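The usage sketch, with the question's payload:
let stringFromApi = '{ "key" : -922271061845347495 }';
stringFromApi = stringFromApi.replace(/:\s*(-?\d+)\s*([,}])/g, ': "$1"$2');
const record = JSON.parse(stringFromApi);
console.log(record.key);         // "-922271061845347495", exact, as a string
console.log(BigInt(record.key)); // -922271061845347495n, if arithmetic is needed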
