I read in JavaScript: The Good Parts by Douglas Crockford that JavaScript regular expression literals share the same object. If so, then how come these two regex literals differ in their lastIndex property?
var a = /a/g;
var b = /a/g;
a.lastIndex = 3;
document.write(b.lastIndex);
This outputs 0 rather than 3.
Section 7.8.5 of the ECMAScript Documentation makes it quite clear they are two different objects:
7.8.5 Regular Expression Literals
A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (see 15.10.4) or calling the RegExp constructor as a function (15.10.3).
Because they are different objects.
document.write(a === b);
Even this outputs false.
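Here is a minimal sketch of the same point, assuming an engine that follows the ES5-and-later wording quoted above (a new object each time the literal is evaluated). The `makeRegex` helper is hypothetical and only exists to evaluate one literal twice:

```javascript
// Hypothetical helper: the literal inside is re-evaluated on every call.
function makeRegex() {
  return /a/g;
}

var first = makeRegex();
var second = makeRegex();

first.lastIndex = 3;

// Under ES5+ semantics each evaluation yields a distinct object.
var sameObject = first === second;       // false
var secondLastIndex = second.lastIndex;  // 0, unaffected by first.lastIndex
```

Under the older ES3 wording, both calls could have returned the same object, which is presumably what the book was describing.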
Either Crockford was wrong, or he was right at the time but times have changed.
I realize this isn't a particularly helpful or informative answer; I'm just pushing back on what I perceive as your disbelief that something Crockford wrote could be (now) false.
Do you have a reference to that claim, by the way? Would be interesting to read it in context (I don't have the book).
The below excerpts refer to ECMAScript 2017.
11.8.4.2 Static Semantics: StringValue
StringLiteral ::
" DoubleStringCharacters_opt "
' SingleStringCharacters_opt '
1. Return the String value whose elements are the SV of this StringLiteral.
11.8.4.3 Static Semantics: SV
A string literal stands for a value of the String type. The String
value (SV) of the literal is described in terms of code unit values
contributed by the various parts of the string literal.
Questions
In the excerpts above, the following terms appear:
string literal
Nonterminal symbol StringLiteral
String value
SV
Could someone help explain the difference between these terms?
Also, what does the last sentence in 11.8.4.2 mean?
A string literal is the thing that you, a human writing or reading code, can recognize as the sequence "..." or '...'
The token StringLiteral is a nonterminal in the formal grammar of ECMAScript that can be replaced by a terminal that is an actual string literal.
A string value is the semantic content of a string literal. The spec says
The String value (SV) of the literal is ...
Therefore, we may be sure that a string literal has a string value: the string value of some string literal is a collection of code unit values.
The identifier SV appears to be shorthand for (and used interchangeably with) "string value".
Also, what does the last sentence in 11.8.4.2 mean?
Every nonterminal "returns" some value when it is evaluated. The line
Return the String value whose elements are the SV of this StringLiteral.
simply means that when the parser finds a StringLiteral in the text of a program, the result of parsing that nonterminal is the string value (i.e., collection of code unit values) associated with the just-parsed StringLiteral.
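A small sketch may make the literal/value distinction concrete. It only illustrates the idea and does not show how an engine implements SV; the variable names are made up for the example:

```javascript
// Three different string literals (three different spellings in source)...
var plain = "A";
var unicodeEscape = "\u0041";
var hexEscape = "\x41";

// ...all have the same string value: a single code unit, 0x41.
var allEqual = plain === unicodeEscape && plain === hexEscape; // true
var valueLength = unicodeEscape.length;                        // 1
var codeUnit = unicodeEscape.charCodeAt(0);                    // 65
```

The literal is what you typed; the string value is what the program actually sees.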
A lot of the terminology you're looking at is really of value to JavaScript platform maintainers; in practical terms, you almost certainly already know what a "string" is. The other terms are useful for reading the spec.
The term StringLiteral refers to a piece of JavaScript source code that a JavaScript programmer would look at and call "a string"; in other words, in
let a = "hello world";
the StringLiteral is that run of characters on the right side of the = from the opening double-quote to the closing double-quote. It's a "nonterminal" because it's not a "terminal" symbol in the definition of the grammar. Language grammars are built from terminal symbols at the lowest level and non-terminals to describe higher-level subsections of a program. The bold-faced double-quote characters you see in the description of a double-quoted string are examples of terminal symbols.
The term StringValue refers to an internal operation that applies to several components of the grammar; for StringLiteral it has the fairly obvious definition you posted. Semantic rules are written in terms of non-terminals that make up some grammar concept.
The term String value or SV is used for describing the piece-by-piece portions of a string.
The JavaScript spec is particularly wacky with terminology, because the language committee is stuck with describing semantics that evolved willy-nilly in the early years of language adoption. Inventing layers of terminology with much apparent redundancy is a way of coping with the difficulty of creating unambiguous descriptions of what bits of code are supposed to do, down to the last detail and weird special case. It's further complicated by the fact that (for reasons unknown to me) the lexical grammar is broken down in as much excruciating detail as are higher-level constructs, so that really compounds the nit-picky feel of the spec.
An example of when knowing that expanse of terminology would be useful might be an explanation of why it's necessary to "double-up" on backslashes when building a regular expression from a string literal instead of a regular expression literal. It's clear that a call to the RegExp constructor:
var r = new RegExp("foo\\.bar");
has an expression consisting of just one StringLiteral. To make the call to the constructor, then, the semantic rules for that operation will at some point call for getting the StringValue (and thus SV) of that literal, and those rules contain the details for every piece of the literal. That's where you come across the fact that the SV semantics have rules for backslashes, and in particular one that says two backslashes collapse to one.
Now I'm not saying that that explanation would be better than a simple explanation, but it's explicitly clear about every detail of the question.
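For what it's worth, the effect of those SV rules can be observed directly. This is just a sketch; the variable names and test strings are made up for illustration:

```javascript
// Source text has two backslashes; the string value has one: foo\.bar
var pattern = "foo\\.bar";
var patternLength = pattern.length;      // 8

var r = new RegExp(pattern);
var matchesDot = r.test("foo.bar");      // true
var matchesOther = r.test("fooXbar");    // false, \. only matches a literal dot

// With a single backslash, the string value is foo.bar (the backslash is
// dropped), so the dot becomes "any character" in the resulting regex.
var loose = new RegExp("foo\.bar");
var looseMatchesOther = loose.test("fooXbar"); // true
```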
In "Up & Going", the first book in Kyle Simpson's "You Don't Know JS" series, he presents the following statement, and explains that it contains four expressions:
a = b * 2;
He says that:
2 is a literal value expression
b is a variable expression
b * 2 is an arithmetic expression
a = b * 2 is an assignment expression
All of which I agree with.
The Mozilla Developer Network defines an expression as any valid unit of code that resolves to a value.
Is a not a variable expression because it is a LHS expression and being assigned to rather than evaluated?
Informally, an expression is something which conforms to the syntax for an expression, and is treated as an expression because it appears in an expression context. It is evaluated and results in a value.
For instance, {} is an expression indicating an empty object literal only in expression contexts; elsewhere it might be a block. The same holds for a syntactic construct starting with the function keyword.
In contrast, we normally don't think of the left hand side of assignments as being an "expression". Certainly, we could not write a + 2 = 3. Even when the left side could be considered an expression, such as a or a.b, we normally don't refer to it as an expression, because we think of expressions as being things that are evaluated, whereas such left hand sides are "references".
It is this common usage that the author of your book is following in not identifying a as an expression in a = b * 2.
However, the terminological waters are muddied somewhat by the names of the tokens used in the spec. The spec defines something called an AssignmentExpression, which is basically any expression like 2 or a or a + 2 or a = b (which evaluates to b). So far so good. But it then proceeds to define something called LeftHandSideExpression, which obviously is what occurs on the left-hand side of an assignment (or to which ++ etc. is applied), but defines it as an AssignmentExpression. What? We can't put a 2 on the left-hand side, can we? The spec addresses this by limiting what kinds of AssignmentExpressions can serve as LeftHandSideExpressions, excluding 2, for example (since it is not a "IsValidSimpleAssignmentTarget").
So it is correct to not call the a in a = b * 2 an expression--in the common usage of the term "expression". But it is also technically correct to say that it is an expression, in the sense that that word is used in the spec, where it is a LeftHandSideExpression.
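A quick sketch of both halves of that argument. It uses `new Function` purely so the invalid source can be compiled inside a try/catch; the exact error type has varied between engines, so the sketch only records that an error is thrown:

```javascript
var a;
var b = 21;

// The whole assignment is itself an expression and yields the assigned value.
var result = (a = b * 2);   // result is 42, and so is a

// Something that is not a reference cannot appear on the left-hand side.
var rejected = false;
try {
  new Function("a + 2 = 3;");
} catch (e) {
  rejected = true;          // the source is refused before it ever runs
}
```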
That makes no sense. a is a variable expression just as b is, and they both evaluate to variable references. Which are then used differently of course - the value of the b reference is being accessed, and a value is put into the a reference. Nonetheless, both identifiers form an expression.
This is a technicality however (and there isn't even an official term "variable expression"), and I guess it was omitted here for simplification purposes. Treating left-hand-sides as non-expressions breaks apart in more complex code, though.
Technically, they are called Identifiers (or IdentifierReference if you use the non-terminal symbol name), not variable expressions and they evaluate to references. LHS are also considered expressions. The grammar includes NewExpressions, CallExpressions, MemberExpressions and PrimaryExpressions (which Identifiers belong to). Assignment operator semantics won't allow for many of these expressions though.
So technically, a there is also an expression.
I am working with the karma.js library right now. I was walking through their example project and came across some code that I don't really understand. I am sure it is easy enough, but an explanation would be very helpful in understanding what the lib is doing. From what I can understand it is looping through the files in the __karma__ object and doing some kind of regex matching in the if statement with /Spec\.js$/.
for (var file in window.__karma__.files) {
if (/Spec\.js$/.test(file)) {
tests.push(file);
}
}
If that is regex matching, then you can go directly from a literal to calling a method on an object in JavaScript. That is really interesting.
Thanks for the help.
That's a for-in loop. It looks through the enumerable properties of an object. So for instance, if you have:
var obj = {
a: 42,
b: 27
};
...then within the loop, file will be "a" on one pass and "b" on another (but the order is not defined).
The var in it is just declaring a variable. Note that unlike some other languages, the variable is not limited in scope to just the loop, the declaration is function-wide.
The regex, /Spec\.js$/, is checking to see if the string ends with "Spec.js". In a regex, $ matches "end of line/input". A backslash is needed before the . because an unescaped . matches any character.
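Putting those pieces together, here is a sketch you can run outside Karma. The `files` object is a made-up stand-in for window.__karma__.files, and the paths and values in it are assumptions for illustration only:

```javascript
// Hypothetical stand-in for window.__karma__.files.
var files = {
  "/base/src/app.js": "abc123",
  "/base/test/appSpec.js": "def456",
  "/base/test/SpecXjs": "ghi789"
};

var tests = [];
for (var file in files) {
  // file is the property name (a string), not the property value.
  if (/Spec\.js$/.test(file)) {
    tests.push(file);
  }
}
// tests holds only "/base/test/appSpec.js"; "SpecXjs" is skipped because
// the escaped \. matches a literal dot and nothing else.

// var is function-scoped, so file is still visible after the loop.
var fileStillVisible = typeof file === "string";
```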
More about for-in:
...in the specification.
...on my blog.
More about var:
...on my blog.
/Spec\.js$/ is not a string but a regular expression literal. What it's essentially doing is:
var re = new RegExp('Spec\\.js$');
re.test(file)
See MDN article on Regular Expressions for more details: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
You indeed can go directly from a literal value into accessing its properties in JavaScript:
/^regex$/.test(...)
"a string".split(...)
etc. it all works.
With numbers however you need special treatment:
(1).toString()
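To spell out the number case, here is a short sketch. The bare form is compiled through `new Function` only so that the parse failure can be caught instead of breaking the whole script:

```javascript
var viaParens = (1).toString();   // "1"
var viaTwoDots = 1..toString();   // "1": the first dot belongs to the number "1."
var viaSpace = 1 .toString();     // "1": the space ends the numeric literal

// 1.toString() does not parse: "1." is read as the number, and an identifier
// may not follow a numeric literal directly.
var bareFormFails = false;
try {
  new Function("return 1.toString();");
} catch (e) {
  bareFormFails = true;
}
```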
In the pursuit of understanding JavaScript/OOP better, I'm curious how regular expression argument parameters are handled in JavaScript. I already understand a lot about regular expressions, so this isn't about interpreting patterns. This is about identifying how JavaScript handles it.
Example:
newStr = str.replace(/(^\W*|\W*$)/gi,'');
This basically trims any special characters and white-space from a string. However, /(^\W*|\W*$)/gi is not an encapsulated string, therefore, it baffles me to understand this concept since the JS object is not a string, nor a number. Is this object-type alone (i.e., regex-only), or does it serve other purposes?
It's just a special syntax that JavaScript has for regular expressions. It evaluates to an object, and is no different than:
var rex = /(^\W*|\W*$)/gi;
newStr = str.replace(rex, '');
Or:
var rex = new RegExp('^\\W*|\\W*$', 'gi');
The RegExp MDN documentation has plenty of detailed info.
Regexes are first-class citizens in JavaScript, i.e. they are a separate object type.
You can construct a new RegExp object using its standard constructor:
var regex = new RegExp("(^\\W*|\\W*$)", "gi");
or using the special "regex literal" notation that allows you to cut down on backslashes:
var regex = /(^\W*|\W*$)/gi;
/(^\W*|\W*$)/gi is a regular expression literal, which is an object type in JavaScript. This type can be passed as the first parameter to the replace method, which accepts either a regex or a substring.
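A short sketch of the difference between the two kinds of first argument. The sample strings are made up for illustration:

```javascript
// Regex argument: the pattern from the question trims non-word characters
// from both ends.
var str = "  --hello--  ";
var trimmed = str.replace(/(^\W*|\W*$)/gi, "");   // "hello"

// String argument: treated as a plain substring, and only the first
// occurrence is replaced.
var firstOnly = "a-b-c".replace("-", "");         // "ab-c"

// Regex argument with the g flag: every match is replaced.
var every = "a-b-c".replace(/-/g, "");            // "abc"
```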
Is this object-type alone (i.e., regex-only)
This is correct. RegExp objects are a special type of value that's built-in to the language. They are one of only a handful of types that have "literal" representations in JavaScript.
This does make them fairly unique; there aren't any other special-purpose literals in the language. The other literals are generic types like:
null
boolean values (true/false)
numbers (1.0, 2e3, -5)
strings ('hello', "goodbye")
Arrays ([1, 2, 3])
Objects ({ name: "Bob", age: 18 })
To add to the people saying largely the same thing:
On top of the fact that it's a literal with its own syntax, you can actually access its methods in literal form:
/bob/gi.exec(" My name is Bob ");
...so long as the browser you're using is young enough to indeed support RegEx literals (it's pretty hard to find one that doesn't, these days, and if you do, does the browser support CSS?).
Basically, my question is about how Javascript handles regex literals.
With number, string and boolean, the literals are primitive data types, and corresponding Number, String and Boolean objects exist with seamless type conversion. By contrast, are regex literals anonymous instances of the RegExp object, or is this a case of regex being treated like primitive data with seamless type conversion to RegExp?
"The Complete Reference JavaScript, 2nd edition, Powell and Schneider (MH)" contradicts itself: in one place the authors say that /regex/ is automatically typecast into RegExp when needed, and in another place they say that /regex/ is nothing but an instance of RegExp!
EDIT: Please provide a reference to a reliable source
Here's what the spec has to say:
A regular expression literal is an input element that is converted to a RegExp object when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.
There is no primitive regex type that autoboxes to an object in the same way as string or number.
Note, however, that not all browsers implement the "instantiate-once-per-literal" behavior, including Safari and IE6 (and possibly later), so portable code shouldn't depend on it. The abortive ECMAScript 4 draft would have changed the behavior to match those browsers:
In ES3 a regular expression literal like /a*b/mg denotes a single unique RegExp object that is created the first time the literal is encountered during evaluation. In ES4 a new RegExp object is created every time the literal is encountered during evaluation.
Also, some browsers (Firefox <3, Safari) report typeof /regex/ as "function", so portable code should avoid typeof on RegExp instances—stick with instanceof.
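A minimal sketch of that advice. `isRegExp` is a hypothetical helper name, not a built-in:

```javascript
// Hypothetical helper: detect RegExp instances without relying on typeof.
function isRegExp(value) {
  return value instanceof RegExp;
}

var literalIsRegExp = isRegExp(/regex/);               // true
var constructedIsRegExp = isRegExp(new RegExp("x"));   // true
var stringIsRegExp = isRegExp("/regex/");              // false, just a string
```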
Yes, the following two expressions are equivalent:
var r1 = /ab+c/i,
r2 = new RegExp("ab+c", "i");
The constructor property of both points to the RegExp constructor function:
(/ab+c/i).constructor === RegExp // true
r2.constructor === RegExp // true
And a regexp literal is an instance of RegExp:
/ab+c/i instanceof RegExp // true
The basic difference is that defining regular expressions using the constructor function allows you to build and compile an expression from a string. This can be very useful for constructing complex expressions that will be re-used.
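As a sketch of that use case, here is one way to build a pattern from a string that is only known at runtime. Both `escapeRegExp` and `wordMatcher` are hypothetical helpers written for this example, not built-ins:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: case-insensitive whole-word matcher built at runtime.
function wordMatcher(word) {
  return new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
}

var matcher = wordMatcher("a.b");
var hit = matcher.test("see A.B here");    // true
var miss = matcher.test("see axb here");   // false, the dot was escaped
```

A literal cannot do this, since its pattern is fixed in the source text.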
Yes, new RegExp("something", "g") is the same as /something/g