ECMAScript 2017: Difference between string literal, StringValue, String value, and SV

The below excerpts refer to ECMAScript 2017.
11.8.4.2 Static Semantics: StringValue
StringLiteral ::
" DoubleStringCharacters_opt "
' SingleStringCharacters_opt '
1. Return the String value whose elements are the SV of this StringLiteral.
11.8.4.3 Static Semantics: SV
A string literal stands for a value of the String type. The String
value (SV) of the literal is described in terms of code unit values
contributed by the various parts of the string literal.
Questions
In the excerpts above, the following terms appear:
string literal
Nonterminal symbol StringLiteral
String value
SV
Could someone help explain the difference between these terms?
Also, what does the last sentence in 11.8.4.2 mean?

A string literal is the thing that you, a human writing or reading code, can recognize as the sequence "..." or '...'
The token StringLiteral is a nonterminal in the formal grammar of ECMAScript that can be replaced by a terminal that is an actual string literal.
A string value is the semantic content of a string literal. The spec says
The String value (SV) of the literal is ...
Therefore, we may be sure that a string literal has a string value: the string value of some string literal is a collection of code unit values.
The identifier SV appears to be shorthand for (and used interchangeably with) "string value".
Also, what does the last sentence in 11.8.4.2 mean?
Every nonterminal "returns" some value when it is evaluated. The line
Return the String value whose elements are the SV of this StringLiteral.
simply means that when the parser finds a StringLiteral in the text of a program, the result of parsing that nonterminal is the string value (i.e., collection of code unit values) associated with the just-parsed StringLiteral.
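To make the distinction concrete, here is a minimal sketch that compares the characters written in the source with the String value they produce. The variable names are illustrative only, and String.raw is used merely as a stand-in for "the characters as typed between the quotes"; it is not how the spec itself defines SV.

```javascript
// The characters as typed between the quotes: a \ u 0 0 4 1 \ n
const sourceText = String.raw`a\u0041\n`;

// The same characters written as a string literal; the engine computes its SV.
const value = "a\u0041\n";

// Collect the code unit values that make up the String value.
const codeUnits = [];
for (let i = 0; i < value.length; i++) {
  codeUnits.push(value.charCodeAt(i));
}

console.log(sourceText.length); // 9 characters of source text
console.log(value.length);      // 3 code units in the String value
console.log(codeUnits);         // [ 97, 65, 10 ]
```

The escape sequences contribute one code unit each, which is the "code unit values contributed by the various parts of the string literal" that the SV section describes.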

A lot of the terminology you're looking at is really of value to JavaScript platform maintainers; in practical terms, you almost certainly already know what a "string" is. The other terms are useful for reading the spec.
The term StringLiteral refers to a piece of JavaScript source code that a JavaScript programmer would look at and call "a string"; in other words, in
let a = "hello world";
the StringLiteral is that run of characters on the right side of the = from the opening double-quote to the closing double-quote. It's a "nonterminal" because it's not a "terminal" symbol in the definition of the grammar. Language grammars are built from terminal symbols at the lowest level and non-terminals to describe higher-level subsections of a program. The bold-faced double-quote characters you see in the description of a double-quoted string are examples of terminal symbols.
The term StringValue refers to an internal operation that applies to several components of the grammar; for StringLiteral it has the fairly obvious definition you posted. Semantic rules are written in terms of non-terminals that make up some grammar concept.
The term String value or SV is used for describing the piece-by-piece portions of a string.
The JavaScript spec is particularly wacky with terminology, because the language committee is stuck with describing semantics that evolved willy-nilly in the early years of language adoption. Inventing layers of terminology with much apparent redundancy is a way of coping with the difficulty of creating unambiguous descriptions of what bits of code are supposed to do, down to the last detail and weird special case. It's further complicated by the fact that (for reasons unknown to me) the lexical grammar is broken down in as much excruciating detail as are higher-level constructs, so that really compounds the nit-picky feel of the spec.
An example of when knowing that expanse of terminology would be useful might be an explanation of why it's necessary to "double-up" on backslashes when building a regular expression from a string literal instead of a regular expression literal. It's clear that a call to the RegExp constructor:
var r = new RegExp("foo\\.bar");
has an expression consisting of just one StringLiteral. To make the call to the constructor, then, the semantic rules for that operation will at some point call for getting the StringValue (and thus SV) of that literal, and those rules contain the details for every piece of the literal. That's where you come across the fact that the SV semantics have rules for backslashes, and in particular one that says two backslashes collapse to one.
Now I'm not saying that that explanation would be better than a simple explanation, but it's explicitly clear about every detail of the question.
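The backslash behaviour described above can be observed directly. This is a small sketch with made-up variable names; it only uses the standard RegExp constructor and its source property.

```javascript
// The literal "foo\\.bar" has the String value foo\.bar (8 code units),
// because the escape sequence \\ contributes a single backslash.
const pattern = "foo\\.bar";
const fromString = new RegExp(pattern);
const fromLiteral = /foo\.bar/;

console.log(pattern.length);                            // 8
console.log(fromString.source === fromLiteral.source);  // true
console.log(fromString.test("foo.bar"));                // true
console.log(fromString.test("fooxbar"));                // false

// With a single backslash, the SV of "\." is just ".", so the pattern
// becomes foo.bar and the dot matches any character.
const single = new RegExp("foo\.bar");
console.log(single.test("fooxbar"));                    // true
```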

Related

Do reserved words need to be quoted when set as property names of JavaScript objects?

Given an object literal, or jQuery(html, attributes) object, does any specification state that reserved words, or future reserved words MUST be quoted?
Or, can, for example, class be set as a property name of an object without using quotes to surround the property name, without the practice being contrary to a specification concerning identifiers, property names, or use of reserved words?
Seeking a conclusive answer as to this question to avoid confusion.
let objLit = {
class: 123,
var: "abc",
let: 456,
const: "def",
import: 789
}
console.dir(objLit);
jQuery("<div>012</div>", {
class: "ghi"
})
.appendTo("body");
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js">
</script>
Related:
What is the difference between object keys with quotes and without quotes?
Comments on this answer
Specification
7.6 Identifier Names and Identifiers:
Identifier Names are tokens that are interpreted according to the
grammar given in the “Identifiers” section of chapter 5 of the Unicode
standard, with some small modifications.
An Identifier is an IdentifierName that is not a ReservedWord
ECMAScript 5+
No, quotes were not needed since ECMAScript 5. Here's why:
As mentioned in your post, from the ECMAScript® 5.1 Language Specification:
7.6 Identifier Names and Identifiers
Identifier Names are tokens that are interpreted according to the grammar given in the “Identifiers” section of chapter 5 of the Unicode standard, with some small modifications. An Identifier is an IdentifierName that is not a ReservedWord (see 7.6.1).
[...]
Syntax
Identifier ::
IdentifierName but not ReservedWord
By specification, a ReservedWord is:
7.6.1 Reserved Words
A reserved word is an IdentifierName that cannot be used as an Identifier.
Syntax
ReservedWord ::
Keyword
FutureReservedWord
NullLiteral
BooleanLiteral
This includes keywords, future keywords, null, and boolean literals. The full list is as follows:
7.6.1.1 Keywords
break do instanceof typeof
case else new var
catch finally return void
continue for switch while
debugger function this with
default if throw
delete in try
7.6.1.2 Future Reserved Words
class enum extends super
const export import
7.8.1 Null Literals
null
7.8.2 Boolean Literals
true
false
The above (Section 7.6) implies that IdentifierNames can be ReservedWords, and from the specification for object initializers:
11.1.5 Object Initialiser
[...]
Syntax
ObjectLiteral :
{ }
{ PropertyNameAndValueList }
{ PropertyNameAndValueList , }
Where PropertyName is, by specification:
PropertyName :
IdentifierName
StringLiteral
NumericLiteral
As you can see, a PropertyName may be an IdentifierName, thus allowing ReservedWords to be PropertyNames. That conclusively tells us that, by specification, it is allowed to have ReservedWords such as class and var as PropertyNames unquoted just like string literals or numeric literals.
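As a quick check of that conclusion, the sketch below (variable names are illustrative) builds the same object with unquoted and quoted reserved words and compares the results. It assumes an ES5 or later engine.

```javascript
// Reserved words used as unquoted property names (allowed since ES5).
const unquoted = { class: 123, var: "abc", import: 789 };

// The same keys written as string literals.
const quoted = { "class": 123, "var": "abc", "import": 789 };

console.log(Object.keys(unquoted));                                 // [ 'class', 'var', 'import' ]
console.log(JSON.stringify(unquoted) === JSON.stringify(quoted));   // true

// Dot notation also accepts an IdentifierName, so reserved words work there too.
console.log(unquoted.class);   // 123
console.log(unquoted["var"]);  // abc
```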
ECMAScript <5
To go more in depth as to why this wasn't allowed in previous versions before ES5, you have to look at how PropertyName was defined. Per the ECMAScript® 3 Language Specification:
PropertyName :
Identifier
StringLiteral
NumericLiteral
As you can see, PropertyName was an Identifier - not an IdentifierName, thus leading to the inability for ReservedWords as PropertyNames.
Given an object literal, or jQuery (html, attributes) object, does any specification state that reserved words, or future reserved words MUST be quoted?
No (starting with ES5).
The definition of property in the spec is that it is any identifier name. class is a perfectly good identifier name.
As others have pointed out in the comments, according to the spec, the property name in an object literal may be an (unquoted) IdentifierName (in addition to being a string etc.). IdentifierName is, for all practical purposes, any sequence of Unicode "letters", as given in section 7.6.
Note that the syntax error generated by
const {class} = obj;
is not an exception. That's not an object literal, which is what the question is about; it's a declaration of the destructuring kind, which attempts to declare a variable named class. Of course you can't, never have been able to, and never will be able to have variables named with reserved words.
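A short sketch of the difference, with illustrative names: reading the class property through destructuring works as long as the value is bound to a non-reserved variable name. The new Function call is used here only to show the parse error without breaking the rest of the script.

```javascript
const obj = { class: "ghi" };

// Renaming during destructuring: the property is "class", the variable is className.
const { class: className } = obj;
console.log(className); // ghi

// Trying to declare a variable literally named class is rejected by the parser.
let rejected = false;
try {
  new Function("const {class} = {};");
} catch (err) {
  rejected = err.name === "SyntaxError";
}
console.log(rejected); // true
```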
See also this blog post, which although not authoritative is a reliable, high-quality source of information about all things ES5/6/7.
Note that in ES3, the definition of PropertyName was Identifier, not IdentifierName as in ES5. That prevented using properties such as class, since class is not an identifier. It was this change that permitted the use of unquoted reserved words as properties in object literals (as well as in dot notation).
With regard to "jQuery objects", a "jQuery object" is just a regular old JS object. Do you mean the DOM elements held by jQuery objects? They are a kind of hybrid of native objects and JS objects. As JS objects, they can have properties. However, they cannot be written in object literal form, so the question does not really apply to them. (As native (DOM) objects, they can have attributes, the latter case not being covered by the JS spec.)
This answer cannot compete with those already given but I'd love to chime in nonetheless.
In my code I prefer to ALWAYS quote keys, for example:
var o;
o = {
"label": "Hello",
"index": 3
};
This way, the problem of strange names or reserved keywords doesn't even arise. Furthermore, all object literals are written in a style that is very near to valid JSON, as an added bonus copy+paste into a separate JSON file (and vice-versa) can be done very quickly.
Today, I consider this a must-have style for clean code.

Dust JS Dot Into string Key Name

I'm not sure if I'm doing something wrong, but I have "dates" as a key for the object and Dust seems to just output exactly what I put in rather than evaluate properly.
{#.weeks pos=items}
{pos['2016-02-15'].id}
{/.weeks}
Output:
{pos.'2016-02-15'.id}
How can I output the ID rather than output the string?
Dust does not allow the character - as part of an array key.
As you mentioned in your comment, - is allowed in Dust references, but the rules are slightly different.
Dust references must not start with a number, and may contain only the characters 0-9a-zA-Z_$-. This mirrors the rules for real JavaScript variables, except for the hyphen.
Array keys are allowed to start with numbers, but cannot contain hyphens. So when you use a date as part of the key, Dust uses the array key evaluation path since the date starts with a number.
This would work, for example, using the array-key evaluation path:
{#.weeks pos=items}
{pos[20160215].id}
{/.weeks}
And so would this, because it uses the reference evaluation path:
{#.weeks pos=items}
{pos[date-2016-02-15].id}
{/.weeks}
You'll have to munge your data slightly.
I think you've uncovered an inconsistency in the way Dust handles reference naming. In early Dust, references were only allowed to be valid JS variable names. This restriction was relaxed later on but there are clearly some rough bits around it.
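The data munging suggested above could be done in plain JavaScript before the context is handed to Dust. This is only a sketch: the prefixKeys helper, the date- prefix, and the shape of items (an object keyed by date strings) are assumptions based on the question, not part of Dust's API.

```javascript
// Hypothetical helper: returns a copy of `source` with `prefix` added to every key,
// so that keys such as "2016-02-15" no longer start with a digit.
function prefixKeys(source, prefix) {
  const result = {};
  for (const key of Object.keys(source)) {
    result[prefix + key] = source[key];
  }
  return result;
}

// Assumed shape of the data from the question.
const items = { "2016-02-15": { id: 42 } };
const munged = prefixKeys(items, "date-");

console.log(Object.keys(munged));          // [ 'date-2016-02-15' ]
console.log(munged["date-2016-02-15"].id); // 42
```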

JavaScript Objects: How do Regular Expression objects get passed?

In the pursuit of understanding JavaScript/OOP better, I'm curious how regular expression argument parameters are handled in JavaScript. I already understand a lot about regular expressions, so this isn't about interpreting patterns. This is about identifying how JavaScript handles it.
Example:
newStr = str.replace(/(^\W*|\W*$)/gi,'');
This basically trims any special characters and white-space from a string. However, /(^\W*|\W*$)/gi is not an encapsulated string, therefore, it baffles me to understand this concept since the JS object is not a string, nor a number. Is this object-type alone (i.e., regex-only), or does it serve other purposes?
It's just a special syntax that JavaScript has for regular expressions. It evaluates to an object, and is no different than:
var rex = /(^\W*|\W*$)/gi;
decision = str.replace(rex, '');
Or:
var rex = new RegExp('^\\W*|\\W*$', 'gi');
The RegExp MDN documentation has plenty of detailed info.
Regexes are first-class citizens in JavaScript, i.e. they are a separate object type.
You can construct a new RegExp object using its standard constructor:
var regex = new RegExp("(^\\W*|\\W*$)", "gi");
or using the special "regex literal" notation that allows you to cut down on backslashes:
var regex = /(^\W*|\W*$)/gi;
/(^\W*|\W*$)/gi is a regular expression literal, which evaluates to a RegExp object in JavaScript. Such an object can be passed as the first parameter to the replace method, which accepts either a regex or a substring.
Is this object-type alone (i.e., regex-only)
This is correct. RegExp objects are a special type of value that's built-in to the language. They are one of only a handful of types that have "literal" representations in JavaScript.
This does make them fairly unique; there aren't any other special-purpose literals in the language. The other literals are generic types like:
null
boolean values (true/false)
numbers (1.0, 2e3, -5)
strings ('hello', "goodbye")
Arrays ([1, 2, 3])
Objects ({ name: "Bob", age: 18 })
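A small sketch confirming that the literal from the question is an ordinary object of type RegExp. The sample string is made up for illustration.

```javascript
const rex = /(^\W*|\W*$)/gi;

console.log(typeof rex);            // object
console.log(rex instanceof RegExp); // true
console.log(rex.source);            // (^\W*|\W*$)
console.log(rex.flags);             // gi

// Passing the object to replace trims leading and trailing non-word characters.
const str = "  --hello world!! ";
const newStr = str.replace(rex, "");
console.log(newStr);                // hello world
```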
To add to the people saying largely the same thing:
On top of the fact that it's a literal with its own syntax, you can actually access its methods in literal form:
/bob/gi.exec(" My name is Bob ");
...so long as the browser you're using is young enough to indeed support RegEx literals (it's pretty hard to find one that doesn't, these days, and if you do, does the browser support CSS?).

Why do two regex literals in my Javascript vary on a property?

I read in Javascript: The Good Parts by Douglas Crockford that javascript regular expression literals share the same object. If so, then how come these two regex literals vary in the lastIndex property?
var a = /a/g;
var b = /a/g;
a.lastIndex = 3;
document.write(b.lastIndex);
JS Fiddle
The output is 0, not 3.
Section 7.8.5 of the ECMAScript specification makes it quite clear they are two different objects:
7.8.5 Regular Expression Literals
A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (see 15.10.4) or calling the RegExp constructor as a function (15.10.3).
Because they are different objects.
document.write(a === b);
Even this outputs false.
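The "each time the literal is evaluated" part of the spec text can be checked the same way. In this sketch (makeRegex is a hypothetical helper, and an ES5 or later engine is assumed) a single literal is evaluated twice and still yields two separate objects.

```javascript
// One literal in the source, evaluated once per call.
function makeRegex() {
  return /a/g;
}

const first = makeRegex();
const second = makeRegex();

first.lastIndex = 3;

console.log(first === second);  // false
console.log(second.lastIndex);  // 0
```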
Either Crockford was wrong, or he was right at the time but times have changed.
I realize this isn't a particularly helpful or informative answer; I'm just pushing back on what I perceive as your disbelief that something Crockford wrote could be (now) false.
Do you have a reference to that claim, by the way? Would be interesting to read it in context (I don't have the book).

In JSON, why is each name quoted?

The JSON spec says that JSON is an object or an array. In the case of an object,
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. ...
And later, the spec says that a string is surrounded in quotes.
Why?
Thus,
{"Property1":"Value1","Property2":18}
and not
{Property1:"Value1",Property2:18}
Question 1: why not allow the name in the name/value pairs to be unquoted identifiers?
Question 2: Is there a semantic difference between the two representations above, when evaluated in Javascript?
I leave a quote from a presentation that Douglas Crockford (the creator of the JSON standard) gave to Yahoo.
He talks about how he discovered JSON, and amongst other things why he decided to use quoted keys:
....
That was when we discovered the
unquoted name problem. It turns out
ECMA Script 3 has a whack reserved
word policy. Reserved words must be
quoted in the key position, which is
really a nuisance. When I got around
to formulizing this into a standard, I
didn't want to have to put all of the
reserved words in the standard,
because it would look really stupid.
At the time, I was trying to convince
people: yeah, you can write
applications in JavaScript, it's
actually going to work and it's a good
language. I didn't want to say, then,
at the same time: and look at this
really stupid thing they did! So I
decided, instead, let's just quote the
keys.
That way, we don't have to tell
anybody about how whack it is.
That's why, to this day, keys are quoted in
JSON.
You can find the complete video and transcript here.
Question 1: why not allow the name in the name/value pairs to be unquoted identifiers?
The design philosophy of JSON is "Keep it simple".
"Quote names with double quotes" is a lot simpler than "You may quote names with " or ' but you don't have to, unless they contain certain characters (or combinations of characters that would make it a keyword), and ' or " may need to be escaped depending on what delimiter you selected".
Question 2: Is there a semantic difference between the two representations above, when evaluated in Javascript?
No. In JavaScript they are identical.
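A minimal sketch of both halves of that answer: as JavaScript object literals the two forms produce the same object, while JSON.parse, which follows the JSON grammar, accepts only the quoted form. Variable names are illustrative.

```javascript
// Evaluated as JavaScript source, both literals describe the same object.
const quoted = { "Property1": "Value1", "Property2": 18 };
const unquoted = { Property1: "Value1", Property2: 18 };
console.log(JSON.stringify(quoted) === JSON.stringify(unquoted)); // true

// Parsed as JSON text, only the quoted form is valid.
const parsed = JSON.parse('{"Property1":"Value1","Property2":18}');
console.log(parsed.Property2); // 18

let parseError = null;
try {
  JSON.parse('{Property1:"Value1",Property2:18}');
} catch (err) {
  parseError = err;
}
console.log(parseError !== null && parseError.name); // SyntaxError
```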
Both : and whitespace are permitted in JSON names. Without the quotes, this would cause ambiguity when trying to determine what exactly constitutes the name.
In JavaScript, objects can be used like a hash table of key/value pairs.
However, if your key has characters that JavaScript cannot tokenize as a name, accessing it as a property with dot notation fails; it has to be accessed as a key with bracket notation.
var test = {};
test["key"] = 1;
test["#my-div"] = "<div> stuff </div>";
// test = { "key": 1, "#my-div": "<div> stuff </div>" };
console.log(test.key); // should be 1
console.log(test["key"]); // should be 1
console.log(test["#my-div"]); // should be "<div> stuff </div>";
console.log(test.#my-div); // would not work.
Keys can sometimes contain characters that cannot be tokenized as an identifier in JavaScript, so it's best to put all keys in strings for consistency.
If JSON describes objects, then in practice you get the following
var foo = {};
var bar = 1;
foo["bar"] = "hello";
foo[bar] = "goodbye";
so then,
foo.bar == "hello";
foo[1] == "goodbye" // in setting it used the value of var bar
so even if your examples do produce the same result, their equivalents in "raw code" wouldn't. Maybe that's why?? dunno, just an idea.
I think the right answer to Cheeso's question is that the implementation surpassed the documentation. It no longer requires a string as the key, but rather something else, which can either be a string (ie quoted) or (probably) anything that can be used as a variable name, which I will guess means start with a letter, _, or $, and include only letters, numbers, and the $ and _.
I wanted to simplify the rest for the next person who visits this question with the same idea I did. Here's the meat:
Variable names are not interpolated in JSON when used as an object key (Thanks Friedo!)
Breton, using "identifier" instead of "key", wrote that "if an identifier happens to be a reserved word, it is interpreted as that word rather than as an identifier." This may be true, but I tried it without any trouble:
var a = {do:1,long:2,super:3,abstract:4,var:5,break:6,boolean:7};
a.break
=> 6
About using quotes, Quentin wrote "...but you don't have to, unless [the key] contains certain characters (or combinations of characters that would make it a keyword)"
I found the former part (certain characters) is true, using the # sign (in fact, I think $ and _ are the only characters that don't cause the error):
var a = {a#b:1};
=> Syntax error
var a = {"a#b":1};
a['a#b']
=> 1
but the parenthetical about keywords, as I showed above, isn't true.
What I wanted works because the text between the opening { and the colon, or between the comma and the colon for subsequent properties is used as an unquoted string to make an object key, or, as Friedo put it, a variable name there doesn't get interpolated:
var uid = getUID();
var token = getToken(); // Returns ABC123
var data = {uid:uid,token:token};
data.token
=> ABC123
It might reduce data size if quotes on names were required only when necessary.
