Are there any forbidden characters in key names, for JavaScript objects or JSON strings? Or characters that need to be escaped?
To be more specific, I'd like to use "$", "-" and space in key names.
No. Any valid string is a valid key. It can even have " as long as you escape it:
{"The \"meaning\" of life":42}
There is perhaps a chance you'll encounter difficulties loading such values into some languages, which try to associate keys with object field names. I don't know of any such cases, however.
The following characters must be escaped in JSON data to avoid any problems:
" (double quote)
\ (backslash)
all control characters like \n, \t
JSON Parser can help you to deal with JSON.
It is worth mentioning that while starting the keys with numbers is valid, it could cause some unintended issues.
Example:
var testObject = {
"1tile": "test value"
};
console.log(testObject.1tile); // fails, invalid syntax
console.log(testObject["1tile"]; // workaround
Unicode codepoints U+D800 to U+DFFF must be avoided: they are invalid in Unicode because they are reserved for UTF-16 surrogate pairs. Some JSON encoders/decoders will replace them with U+FFFD. See for example how the Go language and its JSON library deals with them.
So avoid "\uD800" to "\uDFFF" alone (not in surrogate pairs).
Related
Are there any forbidden characters in key names, for JavaScript objects or JSON strings? Or characters that need to be escaped?
To be more specific, I'd like to use "$", "-" and space in key names.
No. Any valid string is a valid key. It can even have " as long as you escape it:
{"The \"meaning\" of life":42}
There is perhaps a chance you'll encounter difficulties loading such values into some languages, which try to associate keys with object field names. I don't know of any such cases, however.
The following characters must be escaped in JSON data to avoid any problems:
" (double quote)
\ (backslash)
all control characters like \n, \t
JSON Parser can help you to deal with JSON.
It is worth mentioning that while starting the keys with numbers is valid, it could cause some unintended issues.
Example:
var testObject = {
"1tile": "test value"
};
console.log(testObject.1tile); // fails, invalid syntax
console.log(testObject["1tile"]; // workaround
Unicode codepoints U+D800 to U+DFFF must be avoided: they are invalid in Unicode because they are reserved for UTF-16 surrogate pairs. Some JSON encoders/decoders will replace them with U+FFFD. See for example how the Go language and its JSON library deals with them.
So avoid "\uD800" to "\uDFFF" alone (not in surrogate pairs).
What I am trying to do is simple. Parse this array holding json objects into a Javascript array.
var merchantsJson = JSON.parse('[{"id":61693,"name":"Más"},{"id":61690,"name":"\u0027\u0022\u003C/div\u003E"}]');
But the unicode character \u003C seems to be breaking the parser. In the chrome console I see "Uncaught SyntaxError: Unexpected token <"
A little more info. The above is what the code is evaluated to. In reality the code contains a jsp expression.
var merchantsJson = JSON.parse('${jsonArr}');
If I remove the single quotes, there is no issue, but eclipse give me an "missing semicolon" error message. Is it possible to parse the array with the quotes as I am trying to do?
The interpolation of ${jsonArr} is already a JavaScript object. When you wrap it in '${jsonArr}' this turns it into a string and you have to use JSON.parse.
There's no need to make it a string. You can just do var merchantsArray = ${jsonArr}. JSON constructs are already interoperable with JavaScript code.
Because there's an extra " in your string literal that is encoded by \u0022:
> '[{"id":61693,"name":"Más"},{"id":61690,"name":"\u0027\u0022\u003C/div\u003E"}]'
[{"id":61693,"name":"Más"},{"id":61690,"name":"'"</div>"}]
In short, your JSON in the string is invalid. You would need to escape the unicode escape sequences for the quotes in the string literal ("'\u0022</div>"), by using
JSON.parse('[{"id":61693,"name":"Más"},{"id":61690,"name":"\u0027\\u0022\u003C/div\u003E"}]'
// ^
or escape the quote character ("'\"</div>"):
JSON.parse('[{"id":61693,"name":"Más"},{"id":61690,"name":"\u0027\\\u0022\u003C/div\u003E"}]');
// ^^
However, there actually is no need to use JSON at all. Just output a JS array literal into your code:
var merchantsJson = ${jsonArr};
Try to replace \u with \\u. If you don't, JSON parser receives already decoded Unicode, which created polluted JSON.
It's not because of \u003C, rather the \u0022 character is causing the issue, since it's a quotation mark and JavaScript treats it literally ending the string.
You need to escape that character: \\u0022 .
you have to use special character in your JSON string, you can escape it using \ character.
you need to replace \ with \\.
[{\"id\":61693,\"name\":\"Más\"},{\"id\":61690,\"name\":\"\\u0027\\u0022\\u003C/div\\u003E\"}]
var jsn=getAttr(ref,"json-data").toString();
console.log(jsn); //{test: true,stringtest:"hallo"}. it's OK.
JSON.parse(jsn); //Uncaught SyntaxError: Unexpected token s, line: line with JSON.parse;
I think JSON.parse does something not right with this data.. I tried to remove stringtest:"hallo" - no result... PS: also I think that I do something wrong then I have asked this question
At the first time I tried JSON.parse("{"+jsn+"}");.
Your JSON is not properly formatted, as your object keys must be surrounded by quotation marks. The following will work:
var jsn = '{"test": true, "stringtest": "hallo"}';
JSON.parse(jsn);
Edit: The RFC4627, which specifies JSON format, states:
2.2. Objects
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique.
object = begin-object [ member *( value-separator member ) ]
end-object
member = string name-separator value
As you can see, JSON objects are composed of name/value pairs, where a name is a string. Again, the RFC says:
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
string = quotation-mark *char quotation-mark
quotation-mark = %x22 ; "
So, according to the RFC, keys must be surrounded by double quotes, not single one. Still, I guess some parsers may be more tolerant and accept both of them, but I'd stick to the standard.
I wan a regex to alidate all types of possible DN's
I create one but its not so good.
/([A-z0-9=]{1}[A-z0-9]{1})*[,??]/ and some others by changing it, but in vain.
Posible DN's can be
CN=abcd,CN=abcd,O=abcd,C=us
CN=abcd0520,CN=users,O=abcd,C=us
C=us
etc
I recently had a need for this, so I created one that perfectly follows the LDAPv3 distinguished name syntax at RFC-2253.
Attribute Type
An attributeType can be expressed 2 ways. An alphanumeric string that starts with an alpha, validated using:
[A-Za-z][\w-]*
Or it can be an OID, validated using:
\d+(?:\.\d+)*
So attributeType validates using:
[A-Za-z][\w-]*|\d+(?:\.\d+)*
Attribute Value
An attributeValue can be expressed 3 ways. A hex string, which is a sequence of hex-pairs with a leading #. A hex string validates using:
#(?:[\dA-Fa-f]{2})+
Or an escaped string; each non-special character is expressed "as-is" (validates using [^,=\+<>#;\\"]). Special characters can be expressed with a leading \ (validates using \\[,=\+<>#;\\"]). Finally any character can be expressed as a hex-pair with a leading \ (validates using \\[\dA-Fa-f]{2}). An escaped string validates using:
(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*
Or a quoted-string; the value starts and ends with ", and can contain any character un-escaped except \ and ". Additionally, any of the methods from the escaped string above can be used. A quoted-string validates using:
"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"
All combined, an attributeValue validates using:
#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"
Name component
A name-component in BNF is:
name-component = attributeTypeAndValue *("+" attributeTypeAndValue)
attributeTypeAndValue = attributeType "=" attributeValue
In RegEx is:
(?#attributeType)=(?#attributeValue)(?:\+(?#attributeType)=(?#attributeValue))*
Replacing the (?#attributeType) and (?#attributeValue) placeholders with the values above gives us:
(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*
Which validates a single name-component.
Distinguished name
Finally, the BNF for a distinguished name is:
name-component *("," name-component)
In RegEx is:
(?#name-component)(?:,(?#name-component))*
Replacing the (?#name-component) placeholder with the value above gives us:
^(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*(?:,(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*)*$
Test it here
This is not only not possible, it will never work, and should not even be attempted. LDAP data (distinguished name in this case) are not strings. A distinguished name has distinguishedName syntax, which is not a string, and comparisons must be made with using matching rules defined in the directory server schema. For this reason, regular expressions and native-language comparison, relative value, and equality operations like perl's ~~, eq and == and Java's == cannot be used with LDAP data - if a programmer attempts this, unexpected results can occur and the code is brittle, fragile, unpredictable, and does not have repeatable characteristics. Language LDAP APIs that do not support matching rules cannot be used with LDAP where comparison, equality checks, and relative value ordering comparisons are required.
By way of example, the distinguished names "dc=example,dc=com" and "DC=example, DC=COM" are equivalent in every way from an LDAP perspective, but native language equality operators would return false.
This worked for me:
Expression:
^(?<RDN>(?<Key>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)\=(?<Value>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+))(?:\s*\,\s*(?<RDN>(?<Key>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)\=(?<Value>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)))*$
Test:
CN=Test User Delete\0ADEL:c1104f63-0389-4d25-8e03-822a5c3616bc,CN=Deleted Objects,DC=test,DC=domain,DC=local
The expression is already Regex escaped so to avoid having to repeat all the backslashes in C# make sure you prefix the string with the non-escaped literal # sign, i.e.
var dnExpression = #"...";
This will yield four groups, first a copy of the whole string, second a copy of the last RDN, third and fourth the key/value pairs. You can index into each key/value using the Captures collection of each group.
You can also use this to validate a RDN by cutting the expression to the "(?...)" group surrounded by the usual "^...$" to required a whole value (start-end of string).
I've allowed a hex special character escape "\", simple character escape "\" or anything other than ",=\" inside the key/value DN text. I'd guess this expression could be perfected by taking extra time to go through the MSDN AD standard and restrict the allowed characters to match exactly what is or is not allowed. But I believe this is a good start.
I created one. Working great.
^(\w+[=]{1}\w+)([,{1}]\w+[=]{1}\w+)*$
I'm trying to construct a javascript array from a db query. Each item in the array is a string that can contain many different characters that mess up the array. Single quotes, double quotes, parens, etc...
Here's an example of my current output:
var titleList = new Array('Fallout: New Vegas Teaser-Trailer HD','Saints NFC Champions','Best action scene of all time','NJ Lady ep 5: Our Grandma watches Jersey Shore','Australian Banker Caught Looking At Racy Images Of Model Miranda Kerr On Live Television','LEAKED FOOTAGE: New Griswold's "Vacation" Movie?','"A.D." teaser (ZOMBIE ANIMATION)'
...and so on
Is there a special way i can encapsulate each array item so that the title's characters dont interfere with the JS?
Thanks for your help.
Yes, you can escape these characters. Read about JSON.
Example (Hebrew text in json serialized object):
{"updated_at":"2010/02/01 09:55:15 +0000",
"title":"\u05d5\u05d9\u05ea\u05d5\u05e8"}
In general you need to escape the quoting characters you’re using for the string declaration. So in case of your string, you need to escape the ' inside your string declaration:
'LEAKED FOOTAGE: New Griswold\'s "Vacation" Movie?'
You either have to do the replacement manually (with some string replace function). Or maybe your language does support JSON functions. That would definitely be the better choice.
You can escape some special characters (double quote, single quote, newline, tab,..) with \ e.g.: var foo = "\" is just a double quote"