Is there an upper limit to the possible character length of strings in JavaScript, and ES6+ in particular?
Could you do this?
const wowThisIsALongString = `${collectedWorksOfWilliamShakespeare}`
[I'd write the collected works out by hand but am feeling lazy.]
If I understand correctly (and odds are that I don't), a JavaScript string is just a special kind of JavaScript Object, so there's technically no limit?
But maybe things are different in practice?
EDIT / UPDATE: As people have noted, a string primitive isn't an Object. I'd never thought of it as such until I checked the ECMAScript 2015 specs.
4.3.17 String value
primitive value that is a finite ordered sequence of zero or more
16-bit unsigned integer
NOTE A String value is a member of the String type. Each integer value
in the sequence usually represents a single 16-bit unit of UTF-16
text. However, ECMAScript does not place any restrictions or
requirements on the values except that they must be 16-bit unsigned
integers.
4.3.18 String type
set of all possible String values
4.3.19 String object
member of the Object type that is an instance of the standard built-in
String constructor
NOTE A String object is created by using the String constructor in a
new expression, supplying a String value as an argument. The resulting
object has an internal slot whose value is the String value. A String
object can be coerced to a String value by calling the String
constructor as a function (21.1.1.1).
So, when they write that, is the meaning that String objects are objects which contain strings, or ... something else?
Another Update: I think that Ryan has answered this below.
There is a specified length of 2^53 − 1 in Section 6.1.4:
The String type is the set of all ordered sequences of zero or more 16-bit unsigned integer values (“elements”) up to a maximum length of 2^53 − 1 elements.
This is the highest integer with unambiguous representation as a JavaScript number:
> 2**53 === 2**53 - 1
false
> 2**53 === 2**53 + 1
true
Individual engines can have smaller limits. V8, for example, limits its strings to 2^28 − 14 characters.
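If you want to see the practical limit rather than the specified one, here is a rough sketch. It assumes the engine rejects an oversized String.prototype.repeat call with a RangeError (current engines do, as far as I know); canBuildString is a made-up helper name, not a built-in.

```javascript
// Hypothetical helper: returns false when the engine refuses to build
// a string of the requested length (assumed to surface as a RangeError).
function canBuildString(length) {
  try {
    "a".repeat(length);
    return true;
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err; // anything else is unexpected, so let it propagate
  }
}

console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
console.log(canBuildString(10));                      // true
// The spec maximum is far beyond what engines actually allow:
console.log(canBuildString(2 ** 53 - 1));
```

The exact cut-off depends on the engine and its version, so treat any number you find this way as an observation, not a guarantee.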
Side note: primitive strings aren’t objects, but that doesn’t have much to do with length limits. JavaScript has a “primitive wrapper” misfeature allowing strings, numbers, and booleans to be wrapped by objects, and that’s what the section you linked refers to, but there’s no reason to ever use it.
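To make the primitive vs. wrapper distinction concrete, a small sketch (the variable names are just for illustration):

```javascript
const primitive = "abc";           // String value (primitive)
const wrapped = new String("abc"); // String object wrapping that value

console.log(typeof primitive); // "string"
console.log(typeof wrapped);   // "object"

// The object is not strictly equal to the primitive it wraps...
console.log(primitive === wrapped);           // false
// ...but it can be coerced back to the String value.
console.log(primitive === wrapped.valueOf()); // true
console.log(primitive === String(wrapped));   // true

// Property access on a primitive still works; the engine wraps it temporarily.
console.log(primitive.length); // 3
```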
From MDN:
JavaScript's String type is used to represent textual data. It is a
set of "elements" of 16-bit unsigned integer values. Each element in
the String occupies a position in the String. The first element is at
index 0, the next at index 1, and so on. The length of a String is the
number of elements in it. You can create strings using string literals
or string objects.
What does it mean when you say the JavaScript String type is a set of "elements" of 16-bit unsigned integer values?
Please explain why it is a series of integer values.
The 16-bit unsigned integer values represent specific characters, and since a string is an ordered sequence of elements, you can grab specific characters within it with [] notation, as you would with a list. Example:
const string = 'john doe';
console.log(string[3]) // Prints 'n', the character at index 3 (indexes start at 0)
It just means that a string is "array-like": it has a length, and each character is available by index in a similar manner to an array element. Each of those characters is stored as one or more UTF-16 code units.
// The following is one string literal:
let s = "ABCDEFG";
console.log(s);
// But it's also an array-like object in that it has a length and can be indexed
console.log("The length of the string is: ", s.length);
console.log("The 3rd character is: ", s[2]);
// And we can see that the characters are stored as separate UTF-16 values:
console.log(s.charCodeAt(2));
As I understood:
unsigned means the value carries no sign, so it is never negative.
16 bit means each element can take one of 2^16 possible values.
set of integers means a String is represented by a sequence of such integers (one or more, or none for the empty string).
Therefore, to represent a string, JavaScript uses a sequence of numbers, each of which is one of 2^16 whole, non-negative values (no fractions and no sign).
Note: to understand more, read about UTF-16.
Reference: UTF-16 (IBM)
In Unicode, each symbol has an associated number. For example, "A" is 65, "a" is 97, etc. These numbers are called code points. Depending on the encoding we’re using (UTF-32, UTF-16, UTF-8, ASCII, etc.), we represent/encode these code points in different ways. The things we use to encode these code point numbers are called "code units", or as MDN calls them, "elements".
As we're using JavaScript, we're interested in the UTF-16 encoding of characters. This means that to represent a single code unit/"element", we use 16 bits (two bytes). For "A", the "element" representation is:
0000000001000001 // (16 bits, hence 0 padding)
There are a lot of characters that we need to represent (think emojis, Chinese, Japanese, Korean scripts, etc. that each have their own code points), so 16 bits to represent and encode all of these characters alone isn't enough. That's why sometimes some code points are encoded using two code units/elements. For example, 😂 has a code point of 128514 and in UTF-16 is encoded by two elements/code units:
1101100000111101 1101111000000010
So these two code units/elements 1101100000111101 (decimal 55357) and 1101111000000010 (decimal 56834) encode the code point/"character" of 128514, which represents 😂. Notice how both code units are positive (unsigned) and are whole numbers (integers). UTF-16 outlines the algorithm to take these elements from the element form to their code point form and vice versa.
What are the implications of all this? Well it means that strings like "😂" will have a length of 2:
console.log("😂".length); // 2
And that when you access the indexes of the string, you will access the code units/"elements" of that string:
// "😂" in UTF-16 is "1101100000111101 1101111000000010"
// So "😂"[0] is the code unit 1101100000111101 (in decimal 55357)
// So "😂"[1] is the code unit 1101111000000010 (in decimal 56834)
console.log("😂"[0], "😂".charCodeAt(0)); // a lone high surrogate, then 55357
console.log("😂"[1], "😂".charCodeAt(1)); // a lone low surrogate, then 56834
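For anyone curious how a code point turns into those two code units, here is a minimal sketch of the UTF-16 surrogate-pair arithmetic. toSurrogatePair is a name I made up for illustration; in real code String.fromCodePoint and codePointAt do this for you.

```javascript
// Hypothetical helper: splits a code point above 0xFFFF into its
// UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point does not need a surrogate pair");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(128514);
console.log(high, low);                               // 55357 56834
console.log(String.fromCharCode(high, low) === "😂"); // true

// Working with code points instead of code units:
console.log("😂".codePointAt(0)); // 128514
console.log([..."😂"].length);    // 1 (string iteration goes by code point)
```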
The basic question here is: how do I know when to use each of the following,
and what is the critical difference between them:
The Number.parseInt method (or just parseInt),
Number.parseFloat method (or just parseFloat),
Number() function (or class?),
and the + operator
for converting JavaScript values (mostly strings) to numbers.
Especially since all of them give similar values and can convert String to its Number representation:
Number.parseInt("2") // returns 2
Number.parseFloat("2") // returns 2
Number("2") // returns 2
+"2" // returns 2
/* Plus a few more methods... */
eval("2") // returns 2
JSON.parse("2") // returns 2
Number.parseInt method (or just parseInt)
Ignores leading and trailing whitespace
Parses a leading number to an integer (not a floating point number)
Ignores invalid trailing data
Lets you set the base to use when interpreting the number
Will interpret text starting with 0x as hexadecimal, if another base was not provided
Returns NaN if the value could not be successfully parsed to an integer
Number.parseFloat method (or just parseFloat)
Similar to parseInt, except that it allows for a decimal part to be interpreted
Only parses to base-10
Number() function (or class?)
Similar to parseFloat, but does not allow trailing text
Will return 0 for an empty string or a string that only contains whitespace
It's not a class; when called without new, it returns a primitive number
the + operator
Basically the same as Number(), but in operator form.
eval()
Interprets and executes the given input as a JavaScript program.
Given the string "2", it will be interpreted as a numeric literal, and return that value since it's the result of the last expression in the program
Throws an error if the input was not a valid program.
JSON.parse()
Parses the textual data as JSON-serialized data.
If the data is valid, it creates the JavaScript objects/primitives that are represented by the data, and returns them.
If the data is invalid, it throws an error.
Given the string "2", it will be interpreted as a numeric literal, and return the value that was successfully parsed out of it according to the parsing requirements of JSON.
So you decide which is appropriate to use based on their capabilities.
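The differences listed above are easiest to see side by side. This is only a sketch; compareConversions is a hypothetical helper, and parseInt is deliberately called without a radix so that the 0x behavior shows up.

```javascript
// Hypothetical helper: runs one input through each conversion.
function compareConversions(text) {
  return {
    parseInt: Number.parseInt(text), // no radix on purpose, see "0x1f" below
    parseFloat: Number.parseFloat(text),
    Number: Number(text),
    unaryPlus: +text,
  };
}

for (const input of ["42", "42px", "3.14abc", "  7  ", "", "0x1f", "abc"]) {
  console.log(JSON.stringify(input), compareConversions(input));
}

// A few of the notable results:
// "42px"    -> parseInt 42, parseFloat 42,   Number NaN, + NaN
// "3.14abc" -> parseInt 3,  parseFloat 3.14, Number NaN, + NaN
// ""        -> parseInt NaN, parseFloat NaN, Number 0,   + 0
// "0x1f"    -> parseInt 31, parseFloat 0,    Number 31,  + 31
```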
Number.parseInt() is the same function as the global parseInt(), and likewise Number.parseFloat() and parseFloat(); see: Number.parseInt ECMA and Number.parseFloat ECMA
For string input, the calls Number("2") and +"2" are identical in the background; they both convert with ToNumber. See: Number and Unary + Operator
When you know what types you are working with, or want a guaranteed type back, use parseFloat and parseInt. Otherwise it tends to be easier to use only Number(), as it will work within all your calculations. Many people choose the unary + operator because they find it more pleasing to read and type, but that is purely a preference, as it behaves the same as Number() here.
Also, when you use parseInt(), you can specify a radix, which is useful in applications where you want to work in different number systems. You cannot do that with Number().
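A short sketch of the radix argument in practice (the object name parsed is just for illustration):

```javascript
// parseInt with an explicit radix between 2 and 36.
const parsed = {
  hex: parseInt("ff", 16),       // 255
  binary: parseInt("101", 2),    // 5
  base36: parseInt("z", 36),     // 35
  decimal: parseInt("08", 10),   // 8
  // Number() takes no radix argument; it only understands literal prefixes.
  prefixed: Number("0b101"),     // 5
  unprefixed: Number("ff"),      // NaN
};

console.log(parsed);
```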
If the ECMA standard references do not explain the details well enough for you, I will add a summary.
I'm taking a course on JavaScript but have no guide on the toString method. What explains these two outputs in JavaScript:
(35).toString(36) >>> "z"!!!
(35).toString(37) >>> throws a RangeError!!!
I am utterly confused as to why I am getting these results. I would really appreciate it if someone could shed some light on this.
tldr: Number.prototype.toString can take an argument called radix that specifies the numeric base to use for the string representation of the number in question.
Object.prototype.toString()
An object's toString method is supposed to return a string representation of that object (I say "supposed" because you can modify it so it doesn't return a string). 35 is clearly not an object, it is a primitive, but you are using it like an object, which causes JavaScript to create a temporary Number object for that toString call (see this StackOverflow answer on autoboxing).
Number.prototype.toString()
About the confusing behavior you are getting by passing 36 to (35).toString: Number.prototype.toString can take an argument that specifies the numeric base to use for the string representation of that number. The argument must be an integer (or any other value that can be coerced to an integer, e.g. 35..toString([20])) between 2 and 36, so 2 <= [radix] <= 36. This is why your second example, with a radix of 37, throws a RangeError.
So, when you execute (35).toString(36), 2 things happen (not necessarily in my order, and most likely it is done in a single step, 35 ====> [string representation of 35 in numeric format specified by "radix"]):
Generate a string representation of the number 35.
Convert the string generated in step #1 to the number base specified by radix.
For example, if you wanted a string representation of 35, in binary form:
console.log(35..toString(2)); // "100011"
Fun fact: the syntax [integer]..[method] is totally valid in JavaScript, the first . is interpreted as a decimal point, the latter as the . that precedes the name of an object's method/property.
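Putting the radix rules together, a minimal sketch. toBase is a hypothetical wrapper I wrote only to show the RangeError without crashing the script.

```javascript
// Hypothetical helper: returns the representation of `value` in `radix`,
// or null when the radix is outside the allowed 2..36 range.
function toBase(value, radix) {
  try {
    return value.toString(radix);
  } catch (err) {
    if (err instanceof RangeError) return null;
    throw err;
  }
}

console.log(toBase(35, 2));  // "100011"
console.log(toBase(35, 16)); // "23"
console.log(toBase(35, 36)); // "z" (digits 0-9, then a-z for 10-35)
console.log(toBase(35, 37)); // null, because toString threw a RangeError

// parseInt with the same radix reverses the conversion.
console.log(parseInt(toBase(35, 36), 36)); // 35
```

Base 36 is the upper limit because the digits 0-9 plus the letters a-z give exactly 36 symbols.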
Radix
If you don't know what a "radix" is (because I didn't prior to this question, and no, I am no caveman, English is not my native language), here is the definition I got by a Google search for "radix meaning":
the base of a system of numeration.
toString converts a number to a string. A number is used for math, while a string is used as text, and math should not be done on it.
Maybe a little more of the code you are looking at would shed more light on what is going on in the script.
In JavaScript, variables are loosely typed, so the number 5 and the string "5" may both be treated as a number by several operators. However, is there a generic way to find out JavaScript's conversion abilities at runtime, or is it just the overloading of operators for several types that makes the loose typing possible?
For example, given a variable a and a string type_candidate containing a type name, is there any way to ask JavaScript whether a may convert to type_candidate in a feasible manner, in contrast to the hard typing operators like instanceof? For example, "5" instanceof Number evaluates to false, while Math.sin("5") is perfectly feasible. For numbers, one can obviously check whether parseFloat(some_number) evaluates to NaN, but this is a special case for numbers.
So, is there any generic way of naming types, and check if some variable may convert to a given type in a useful manner?
The primitive data types that matter most for conversion in JavaScript are string, number and boolean (the language also has undefined, null, symbol and bigint).
Almost anything can be converted to a string or boolean:
All objects convert to true (except null, which becomes false - I only mention it here because typeof null gives object)
All objects have a built-in toString method which is called when converting to a string.
Converting a number to a string is done by giving the string representation of the number (ie. 5 becomes "5")
Numbers convert to boolean true, unless the number is 0 or NaN, which become false.
Converting to a number is a little trickier, but technically possible. If it can find a valid number, then it becomes that number. Otherwise, it becomes NaN.
So basically... any type can become any other type through casting in this way. The only time you have anything resembling an "error condition" is NaN.
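Here is a rough sketch of those rules, plus one possible way to ask "does this convert to a useful number?". convertsToNumber is a hypothetical helper of mine, not a built-in, and it only covers the number case.

```javascript
// To boolean
console.log(Boolean(0), Boolean(NaN), Boolean(""), Boolean(null)); // false false false false
console.log(Boolean("0"), Boolean(5), Boolean({}), Boolean([]));   // true true true true

// To string
console.log(String(5));      // "5"
console.log(String([1, 2])); // "1,2"
console.log(String({}));     // "[object Object]"

// To number
console.log(Number("5"), Number(true), Number(null)); // 5 1 0
console.log(Number("five"), Number(undefined));       // NaN NaN

// Hypothetical helper: true when Number(value) yields something other than NaN.
// Some values (a Symbol, for instance) throw instead of producing NaN.
function convertsToNumber(value) {
  try {
    return !Number.isNaN(Number(value));
  } catch {
    return false;
  }
}

console.log(convertsToNumber("5"));         // true
console.log(convertsToNumber("five"));      // false
console.log(convertsToNumber(Symbol("x"))); // false
```

Note that this treats null and "" as convertible (both become 0), which may or may not count as "useful" for your case.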
I was wondering why do people have to convert numbers to string. What are the practical uses for that kind of conversion?
Similarly why do developers use parseInt or parseFloat to convert a string to a number.
thanks
The variable’s data type is the JavaScript scripting engine’s interpretation of the type of data that variable is currently holding. A string variable holds a string; a number variable holds a number value, and so on. However, unlike many other languages, in JavaScript, the same variable can hold different types of data, all within the same application. This is a concept known by the terms loose typing and dynamic typing, both of which mean that a JavaScript variable can hold different data types at different times depending on context.
With a loosely typed language, you don’t have to declare ahead of time that a variable will be a string or a number or a boolean, as the data type is actually determined while the application is being processed. If you start out with a string variable and then want to use it as a number, that’s perfectly fine, as long as the string actually contains something that resembles a number and not something such as an email address. If you later want to treat it as a string again, that’s fine, too.
The forgiving nature of loose typing can end up generating problems. If you try to add two numbers together, but the JavaScript engine interprets the variable holding one of them as a string data type, you end up with an odd string, rather than the sum you were expecting. Context is everything when it comes to variables and data types with JavaScript.
Using parseInt and parseFloat is important if you want to do arithmetic operations on a number which is in string form. For example
"42" + 1 === "421"
parseInt("42") + 1 === 43;
The reverse is true when you want to do string operations on values which are currently a number.
42 + 1 === 43
(42 + "") + 1 === "421"
Why one would want to do the former or latter though is very scenario specific. I'd wager the case of converting strings to numbers for arithmetic operations is the more prominent case though.
An example of when converting numbers to strings is useful is when you want to format the number a certain way, perhaps like a currency (1234.56 -> $1,234.56).
The converse is useful when you want to do arithmetic on strings that represent numbers. Say you have a text box where you allow the user to input a number. The value of that text box will be a string, but you need it as a number to do some arithmetic with it, so you would use parseInt or parseFloat.
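A small sketch of both directions. The en-US locale, the USD currency, and the parseQuantity helper are assumptions for illustration; the string "41.5" stands in for whatever a text box's value would be.

```javascript
// number -> string: format for display with the built-in Intl API.
const formatted = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
}).format(1234.56);
console.log(formatted); // "$1,234.56"

// string -> number: hypothetical helper for text-box style input.
// Returns null when no number can be read from the text.
function parseQuantity(text) {
  const value = Number.parseFloat(text);
  return Number.isNaN(value) ? null : value;
}

const userInput = "41.5"; // stand-in for a text box's value
console.log(userInput + 0.5);                // "41.50.5" (string concatenation)
console.log(parseQuantity(userInput) + 0.5); // 42
console.log(parseQuantity("not a number"));  // null
```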
string -> number:
Think about simple number validation using JS. If you can convert a string into a number, then you can validate that number before submitting it, or before using it in an arithmetic operation.
number -> string:
String concatenation mainly and display purposes. The language will most often use implicit conversion to convert the number into a string anyway, such as:
1 + " new answer has been posted"
Do remember, JavaScript is a loosely typed language. This can hide a lot of implicit type-casting that is occurring.