According to Crockford's json.org, a JSON object is made up of members, which is made up of pairs.
Every pair is made of a string and a value, with a string being defined as:
A string is a sequence of zero or more
Unicode characters, wrapped in double
quotes, using backslash escapes. A
character is represented as a single
character string. A string is very
much like a C or Java string.
But in practice most programmers don't even know that a JSON key should be surrounded by double quotes, because most browsers don't require the use of double quotes.
Does it make any sense to bother surrounding your JSON in double quotes?
Valid Example:
{
"keyName" : 34
}
As opposed to the invalid:
{
keyName : 34
}
The real reason about why JSON keys should be in quotes, relies in the semantics of Identifiers of ECMAScript 3.
Reserved words cannot be used as property names in Object Literals without quotes, for example:
({function: 0}) // SyntaxError
({if: 0}) // SyntaxError
({true: 0}) // SyntaxError
// etc...
While if you use quotes the property names are valid:
({"function": 0}) // Ok
({"if": 0}) // Ok
({"true": 0}) // Ok
The own Crockford explains it in this talk, they wanted to keep the JSON standard simple, and they wouldn't like to have all those semantic restrictions on it:
....
That was when we discovered the
unquoted name problem. It turns out
ECMA Script 3 has a whack reserved
word policy. Reserved words must be
quoted in the key position, which is
really a nuisance. When I got around
to formulizing this into a standard, I
didn't want to have to put all of the
reserved words in the standard,
because it would look really stupid.
At the time, I was trying to convince
people: yeah, you can write
applications in JavaScript, it's
actually going to work and it's a good
language. I didn't want to say, then,
at the same time: and look at this
really stupid thing they did! So I
decided, instead, let's just quote the
keys.
That way, we don't have to tell
anybody about how whack it is.
That's why, to this day, keys are quoted in
JSON.
...
The ECMAScript 5th Edition Standard fixes this, now in an ES5 implementation, even reserved words can be used without quotes, in both, Object literals and member access (obj.function Ok in ES5).
Just for the record, this standard is being implemented these days by software vendors, you can see what browsers include this feature on this compatibility table (see Reserved words as property names)
Yes, it's invalid JSON and will be rejected otherwise in many cases, for example jQuery 1.4+ has a check that makes unquoted JSON silently fail. Why not be compliant?
Let's take another example:
{ myKey: "value" }
{ my-Key: "value" }
{ my-Key[]: "value" }
...all of these would be valid with quotes, why not be consistent and use them in all cases, eliminating the possibility of a problem?
One more common example in the web developer world: There are thousands of examples of invalid HTML that renders in most browsers...does that make it any less painful to debug or maintain? Not at all, quite the opposite.
Also #Matthew makes the best point of all in comments below, this already fails, unquoted keys will throw a syntax error with JSON.parse() in all major browsers (and any others that implement it correctly), you can test it here.
If I understand the standard correctly, what JSON calls "objects" are actually much closer to maps ("dictionaries") than to actual objects in the usual sense. The current standard easily accommodates an extension allowing keys of any type, making
{
"1" : 31.0,
1 : 17,
1n : "valueForBigInt1"
}
a valid "object/map" of 3 different elements.
If not for this reason, I believe the designers would have made quotes around keys optional for all cases (maybe except keywords).
YAML, which is in fact a superset of JSON, supports what you want to do. ALthough its a superset, it lets you keep it as simple as you want.
YAML is a breath of fresh air and it may be worth your time to have a look at it. Best place to start is here: http://en.wikipedia.org/wiki/YAML
There are libs for every language under the sun, including JS, eg https://github.com/nodeca/js-yaml
Related
I'm currently looking at regexs and emojis, and I'd like to use unicode property escapes to simplify the task
In https://unicode.org/reports/tr18/#Full_Properties, it lists a number of emoji properties such as Emoji and Emoji_Presentation etc.
Creating a regex using these patterns works:
const re = /\p{Emoji}/gu
The same page also lists RGI_Emoji, which is
The set of all emoji (characters and sequences) covered by ED-20, ED-21, ED-22, ED-23, ED-24, and ED-25.
or basic emojis, modifiers, etc, which seems to cover all use cases that I'm looking at.
However, creating a regex using this:
const re = /\p{RGI_Emoji}/gu
Gives a SyntaxError:
Uncaught SyntaxError: invalid property name in regular expression
The unicode page does mention that
Properties marked with * are properties of strings, not just single code points.
which RGI_Emoji is marked as. My knowledge of unicode isn't amazing, so I'm not sure of the proper way to use this.
Is it possible to use RGI_Emoji in a regex, and if so, what's the correct way to use it?
RGI_Emoji is not available in JavaScript yet.
It is mentioned on top of the Full Properties table that,
Properties marked with * are properties of strings, not just single code points.
Support for following sequence properties is being proposed in proposal-regexp-unicode-sequence-properties. The proposal is at stage 2 i.e. not part of the ECMAScript specification yet and hence not available.
RGI_Emoji
Basic_Emoji
Emoji_Keycap_Sequence
RGI_Emoji_Modifier_Sequence
RGI_Emoji_Flag_Sequence
RGI_Emoji_Tag_Sequence
RGI_Emoji_ZWJ_Sequence
To further confirm, check available \p{UnicodeBinaryPropertyName}'s in the latest ECMAScript specification. Only following properties of characters related to emoji's are available:
Emoji
Emoji_Component
EComp
Emoji_Modifier
EMod
Emoji_Modifier_Base
EBase
Emoji_Presentation
You'll have to form a regular expression with unicode ranges covering ED-20, ED-21, ED-22, ED-23, ED-24, and ED-25 unicode sets. Like suggested by #JosefZ in a comment.
This discussion may help JavaScript regular expression for Unicode emoji
The emoji properties were only added to UTS #18 relatively recently (mid 2020), and this involved a significant change in Unicode's properties model in that it involved formally defining for the first time properties of strings. RGI_Emoji is a binary-valued property of strings of characters. A potential issue for use of string properties in regex is that the set corresponding to a string property is potentially a vast number of strings. To avoid potential problems in existing implementations, UTS #18 allows for use of the syntax \m{Property_Name} for string properties. See https://www.unicode.org/reports/tr18/#Resolving_Character_Ranges_with_Strings for more information.
It's possible that the implementation you're using has not been fully updated for Rev. 21 of UTS #18, with support for all new properties, or that it requires you to use the \m syntax for string properties.
The online Unicode UnicodeSet utility does support enumerating string results of a regex using the RGI_Emoji property:
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BRGI_Emoji%7D&g=&i=
We are trying to get rid of all of our eval() calls in our JavaScript. Unfortunately, I am not much of a JavaScript programmer, and I need some help.
Many of our eval() calls operate on strings, outputs from a web service, that are very JSON-like, for example, we might eval the following string:
ClassMetaData['Asset$Flex'] = {
fields: {
}
,label: 'Flex Fields'
};
I've seen various suggestions on the Internet suggesting Ext.decode(). The documentation for it says - "Decodes (parses) a JSON string to an object. If the JSON is invalid, this function throws a SyntaxError unless the safe option is set." The string that I am supplying as an argument isn't legitimate JSON as I understand it (the field names aren't quoted), but Ext.decode() nearly works for me anyway. If I decode the above string, I get an error (why?) - "Uncaught SyntaxError: Unexpected token ;". However, if I remove the trailing semi-colon, and decode, everything seems to be fine.
I am using the following code to determine whether the decode call and the eval call do the same thing:
var evaled = eval(inputString);
var decoded = Ext.decode(inputString.replace(";", "")); // remove trailing ";", if any
console.log("Equal? - " + (JSON.stringify(decoded) == JSON.stringify(evaled)));
Unfortunately, this is not a very good solution. For example, some of the input strings to eval are fairly complex. They may have all sorts of embedded characters - semicolons, HTML character encodings, etc. Decode may complain about some other syntax problem, besides semicolons at the end, and I haven't found a good way to determine where the problem is that decode objects to. (It doesn't say "illegal character in position 67", for example.)
My questions:
Could we, with a small amount of work, create a generic solution
using decode?
Is there an easy way to convert our JSON-like input
into true JSON?
Is there a better way of comparing the results of
eval and decode than JSON.stringify(decoded) == JSON.stringify(evaled)?
I'm not sure if I'm doing something wrong, but I have "dates" as a key for the object and Dust seems to just output exactly what I put in rather than evaluate properly.
{#.weeks pos=items}
{pos['2016-02-15].id}
{/.weeks}
Output:
{pos.'2016-02-15'.id}
How can I output the ID rather than output the string?
Dust does not allow the character - as part of an array key.
As you mentioned in your comment, - is allowed in Dust references, but the rules are slightly different.
Dust references must not start with a number, and contain the characters 0-9a-zA-Z_$-. This mirrors the rules for real Javascript variables, except for the hyphen.
Array keys are allowed to start with numbers, but cannot contain hyphens. So when you use a date as part of the key, Dust uses the array key evaluation path since the date starts with a number.
This would work, for example, using the array-key evaluation path:
{#.weeks pos=items}
{pos[20160215].id}
{/.weeks}
And so would this, because it uses the reference evaluation path:
{#.weeks pos=items}
{pos[date-2016-02-15].id}
{/.weeks}
You'll have to munge your data slightly.
I think you've uncovered an inconsistency in the way Dust handles reference naming. In early Dust, references were only allowed to be valid JS variable names. This restriction was relaxed later on but there are clearly some rough bits around it.
I was learning about javascript string methods here.
Under section Extracting String Characters, it said:
There are 2 safe methods for extracting string characters:
charAt(position)
charCodeAt(position)
The questions here are:
Why these methods are called safe?
What are these methods protecting from?
There are two ways to access a character from a string.
// Bracket Notation
"Test String1"[6]
// Real Implementation
"Test String1".charAt(6)
It is a bad idea to use brackets, for these reasons (Source):
This notation does not work in IE7.
The first code snippet will return
undefined in IE7. If you happen to use
the bracket notation for strings all
over your code and you want to migrate
to .charAt(pos), this is a real pain:
Brackets are used all over your code
and there's no easy way to detect if
that's for a string or an
array/object.
You can't set the character using this notation. As there is no warning of
any kind, this is really confusing and
frustrating. If you were using the
.charAt(pos) function, you would not
have been tempted to do it.
Also, it can produce unexpected results in edge cases
console.log('hello' [NaN]) // undefined
console.log('hello'.charAt(NaN)) // 'h'
console.log('hello' [true]) //undefined
console.log('hello'.charAt(true)) // 'e'
Basically, it's a short-cut notation that is not fully implemented across all browsers.
Note, you are not able to write characters using either method. However, that functionality is a bit easier to understand with the .charAt() function which, in most languages, is a read-only function.
So for the compatibility purpose .charAt is considered to be safe.
Source
Speed Test: http://jsperf.com/string-charat-vs-bracket-notation
Testing in Chrome 47.0.2526.80 on Mac OS X 10.10.4
Test Ops/sec
String charAt
testCharAt("cat", 1);
117,553,733
±1.25%
fastest
String bracket notation
testBracketNotation("cat", 1);
118,251,955
±1.56%
fastest
The JSON spec says that JSON is an object or an array. In the case of an object,
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. ...
And later, the spec says that a string is surrounded in quotes.
Why?
Thus,
{"Property1":"Value1","Property2":18}
and not
{Property1:"Value1",Property2:18}
Question 1: why not allow the name in the name/value pairs to be unquoted identifiers?
Question 2: Is there a semantic difference between the two representations above, when evaluated in Javascript?
I leave a quote from a presentation that Douglas Crockford (the creator of the JSON standard) gave to Yahoo.
He talks about how he discovered JSON, and amongst other things why he decided to use quoted keys:
....
That was when we discovered the
unquoted name problem. It turns out
ECMA Script 3 has a whack reserved
word policy. Reserved words must be
quoted in the key position, which is
really a nuisance. When I got around
to formulizing this into a standard, I
didn't want to have to put all of the
reserved words in the standard,
because it would look really stupid.
At the time, I was trying to convince
people: yeah, you can write
applications in JavaScript, it's
actually going to work and it's a good
language. I didn't want to say, then,
at the same time: and look at this
really stupid thing they did! So I
decided, instead, let's just quote the
keys.
That way, we don't have to tell
anybody about how whack it is.
That's why, to this day, keys are quoted in
JSON.
You can find the complete video and transcript here.
Question 1: why not allow the name in the name/value pairs to be unquoted identifiers?
The design philosophy of JSON is "Keep it simple"
"Quote names with "" is a lot simpler than "You may quote names with " or ' but you don't have to, unless they contain certain characters (or combinations of characters that would make it a keyword) and ' or " may need to be quoted depending on what delimiter you selected".
Question 2: Is there a semantic difference between the two representations above, when evaluated in Javascript?
No. In JavaScript they are identical.
Both : and whitespace are permitted in identifiers. Without the quotes, this would cause ambiguity when trying to determine what exactly constitutes the identifier.
In javascript objects can be used like a hash/hashtable with key pairs.
However if your key has characters that javascript could not tokenize as a name, it would fail when trying it access like a property on an object rather than a key.
var test = {};
test["key"] = 1;
test["#my-div"] = "<div> stuff </div>";
// test = { "key": 1, "#my-div": "<div> stuff </div>" };
console.log(test.key); // should be 1
console.log(test["key"]); // should be 1
console.log(test["#my-div"]); // should be "<div> stuff </div>";
console.log(test.#my-div); // would not work.
identifiers can sometimes have characters that can not be evaluated as a token/identifier in javascript, thus its best to put all identifiers in strings for consistency.
If json describes objects, then in practise you get the following
var foo = {};
var bar = 1;
foo["bar"] = "hello";
foo[bar] = "goodbye";
so then,
foo.bar == "hello";
foo[1] == "goodbye" // in setting it used the value of var bar
so even if your examples do produce the same result, their equivalents in "raw code" wouldn't. Maybe that's why?? dunno, just an idea.
I think the right answer to Cheeso's question is that the implementation surpassed the documentation. It no longer requires a string as the key, but rather something else, which can either be a string (ie quoted) or (probably) anything that can be used as a variable name, which I will guess means start with a letter, _, or $, and include only letters, numbers, and the $ and _.
I wanted to simplify the rest for the next person who visits this question with the same idea I did. Here's the meat:
Variable names are not interpolated in JSON when used as an object key (Thanks Friedo!)
Breton, using "identifier" instead of "key", wrote that "if an identifier happens to be a reserved word, it is interpreted as that word rather than as an identifier." This may be true, but I tried it without any trouble:
var a = {do:1,long:2,super:3,abstract:4,var:5,break:6,boolean:7};
a.break
=> 6
About using quotes, Quentin wrote "...but you don't have to, unless [the key] contains certain characters (or combinations of characters that would make it a keyword)"
I found the former part (certain characters) is true, using the # sign (in fact, I think $ and _ are the only characters that don't cause the error):
var a = {a#b:1};
=> Syntax error
var a = {"a#b":1};
a['a#b']
=> 1
but the parenthetical about keywords, as I showed above, isn't true.
What I wanted works because the text between the opening { and the colon, or between the comma and the colon for subsequent properties is used as an unquoted string to make an object key, or, as Friedo put it, a variable name there doesn't get interpolated:
var uid = getUID();
var token = getToken(); // Returns ABC123
var data = {uid:uid,token:token};
data.token
=> ABC123
It may reduce data size if quotes on name are only allowed when necessary