First of all: I know that there are many questions related to escaping, but I have not found a generally working answer so far.
Say I have this simple toy function for demonstration:
function f(somePOJO) {
var s = eval("'" + JSON.stringify(somePOJO) + "';"); // for demonstration only
return JSON.parse(s);
}
const clone = f({a: 1, b: "c"});
Given an object literal such as {a: 1, b: "c"} (a POJO), f should return a "clone" of it. (Note that I do not really use this approach for cloning or similar, and I am aware that eval is evil and also that it is not even needed here, it's just for demonstration of the escaping problem!)
This works fine, but only as long as the POJO values do not contain a '. Now of course I could escape the JSON by using something like JSON.stringify(somePOJO).replace(/'/g, "\\'"). This works if the POJO values contain ', but not if they contain \\'. And this creates a spiral of escaping...
Is there a solution to this at all?
The escape function that preserves a JSON string through evaluation by eval, by the JavaScript compiler (under some circumstances), or by JSON.parse is actually JSON.stringify. This JSON method will happily stringify string values, not just objects.
function f(somePOJO) {
var s = eval( JSON.stringify(JSON.stringify(somePOJO)) );
return JSON.parse(s);
}
const obj = {a: 1, b: "c", d: "back\\, forward/"}
const clone = f(obj);
console.log(obj);
console.log(clone);
The reason it's not one of the escape/encodeURI/encodeURIComponent family of functions is that these are for escaping characters for inclusion in URLs, whereas this case is about escaping characters to be parsed by a JavaScript parser.
In most cases, particularly to parse JSON text using JSON.parse, stringifying JSON text a second time and parsing it twice is simply unnecessary.
Of somewhat academic interest now but before the introduction of JSON into Javascript, one could stringify a string by serially inspecting its characters and backslash escaping backslashes, at least one kind of quote marks, and unicode escaping control codes - the posted question may be missing the part about needing to escape backslash characters as well as quote marks.
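A minimal sketch of the double-stringify idea, using a value that contains the usual troublemakers. The object `tricky` and the helper name `toJsStringLiteral` are made up for illustration; only JSON.stringify, JSON.parse and eval are real.

```javascript
// Hypothetical helper: JSON.stringify applied to a string yields a quoted,
// fully escaped literal that a JavaScript parser reads back unchanged.
function toJsStringLiteral(text) {
  return JSON.stringify(text);
}

// Illustrative data with single quotes, double quotes, a backslash and a newline.
const tricky = { sq: "it's", dq: 'say "hi"', bs: "back\\slash", nl: "line1\nline2" };

const jsonText = JSON.stringify(tricky);      // first pass: object -> JSON text
const literal = toJsStringLiteral(jsonText);  // second pass: JSON text -> JS string literal
const evaluated = eval(literal);              // the parser undoes the second pass
const clone = JSON.parse(evaluated);          // JSON.parse undoes the first pass

console.log(evaluated === jsonText);
console.log(clone);
```

The point is that no hand-written replace is involved: the second JSON.stringify escapes backslashes and quotes in one consistent step, so there is no spiral.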
Related
I need to concatenate untrusted* data into a javascript string, but I need it to work for all types of strings (single quoted, double quoted, or backtick quoted)
And ideally, I need it to work for multiple string types at once
I could use string replace, but this is usually a bad idea.
I was using JSON.stringify, but this only escapes double quotes, not single or backtick.
Other answers that I've found deal with escaping only a single type of quote at a time (and never backticks).
An example of what I need:
const untrustedData = 'a String with \'single quotes\', \"double quotes\" and \`backticks\` in it';
const someJS = `console.log(\`the thing is "${escapingFunctionHere(untrustedData)}"\`)`
someJS will be passed to new Function
* N.B. In my context "untrusted" here doesn't mean potentially malicious, but it does need to cope with quotes, escapes and the like.
I am building JavaScript code dynamically; the constructed code will not be in any way web-facing. In fact, it's likely that I am the only one who will use this tool directly or indirectly.
I am happy to accept the minimal associated risks
NOTE TO OTHERS: Be sure you understand the risks before doing this kind of thing.
For those interested, I am writing a parser creator. Given an input ebnf grammar file, it will output a JS class that can be used to parse things.
I really do need to output code here.
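If all you need to do is escape single quotes ', double quotes " and backticks `, then using replace to prepend a backslash \ should be enough, provided the data contains no backslashes, line breaks or ${ sequences (those would need escaping too):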
untrustedData.replace(/['"`]/g, '\\$&')
const untrustedData = 'I \'am\' "a `string`"';
console.log(untrustedData);
const escapedData = untrustedData.replace(/['"`]/g, '\\$&');
console.log(escapedData);
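If the data can also contain backslashes, line breaks or `${`, the replace above needs a few more cases. The following is only a sketch under that assumption; `escapeForAnyQuote` is a made-up name, not a standard function, and the sample string is illustrative.

```javascript
// Hypothetical helper, not a standard API: escape text so it can sit inside
// a '...', "..." or `...` literal in generated code.
function escapeForAnyQuote(text) {
  return text
    .replace(/[\\'"`]/g, "\\$&")       // backslash and all three quote characters
    .replace(/\$\{/g, () => "\\${")    // keep ${ from starting a template interpolation
    .replace(/\r/g, "\\r")             // raw line breaks are illegal in '...' and "..."
    .replace(/\n/g, "\\n")
    .replace(/\u2028/g, "\\u2028")     // line/paragraph separators, for older engines
    .replace(/\u2029/g, "\\u2029");
}

const sample = "I 'am' \"a `string`\" with \\ and ${x}\nand a second line";
const escaped = escapeForAnyQuote(sample);

// Wrap the escaped text in each quote style and evaluate it with new Function.
const roundTrips = ["'", '"', "`"].map(
  (quote) => new Function("return " + quote + escaped + quote + ";")() === sample
);
console.log(roundTrips);
```

Backslashes are handled in the same pass as the quotes so that an existing backslash is not mistaken for one that was just added.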
Basically, I am trying to create an object like this by providing a string to JSON.parse():
a = {x:1}
// -> Object {x: 1}
Intuitively I tried:
a = JSON.parse('{x:1}')
// -> Uncaught SyntaxError: Unexpected token x
After some fiddling I figured out:
a = JSON.parse('{"x":1}')
// -> Object {x: 1}
But then I accidentally changed the syntax and bonus confusion kicked in:
a = JSON.parse("{'x':1}")
//-> Uncaught SyntaxError: Unexpected token '
So now I am looking for an explanation why
one must quote the property name
the implementation accepts double quotes around the property name, but fails on single quotes
The main reason for confusion seems to be the difference between JSON and JavaScript objects.
JSON (JavaScript Object Notation) is a data format meant to allow data exchange in a simple format. That is the reason why there is one valid syntax only. It makes parsing much easier. You can find more information on the JSON website.
Some notes about JSON:
Keys must be quoted with "
Values might be strings, numbers, objects, arrays, booleans or "null"
String values must be quoted with "
JavaScript objects, on the other hand, are related to JSON (obviously), but not identical. Valid JSON is also a valid JavaScript object literal. The other way around, however, is not true.
For example:
keys and values can be quoted with " or '
keys do not always have to be quoted
values might be functions or JavaScript objects
As pointed out in the comments, because that's what the JSON spec specifies. The reason AFAIK is that JSON is meant to be a data interchange format (language agnostic). Many languages, even those with hash literals, do not allow unquoted strings as hash table keys.
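The three cases from the question can be checked side by side. This is just a small sketch; `tryParse` is a made-up wrapper around JSON.parse so the failures can be recorded instead of thrown.

```javascript
// Hypothetical wrapper: report success or the error type instead of throwing.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const results = {
  doubleQuotedKey: tryParse('{"x":1}'),  // valid JSON
  unquotedKey: tryParse('{x:1}'),        // valid JavaScript, not valid JSON
  singleQuotedKey: tryParse("{'x':1}"),  // valid JavaScript, not valid JSON
};
console.log(results);
```

Which quote character delimits the surrounding JavaScript string does not matter; what matters is the text that JSON.parse receives.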
I have a problem parsing a json string.
Here's the string (the problematic part of it):
{
"type":"meaning",
"terms":[
{
"type":"text",
"text":"some value.",
"language":"ru"
},
{
"type":"url",
"text":"\x3ca href\x3d\x22http://readmas.ru/arts/bodyart/znachenie-tatuirovok.-chast-i.html\x22\x3…ttp://readmas.ru/arts/bodyart/znachenie-tatuirovok.-chast-i.html\x3c/a\x3e",
"language":"ru"
}]
},
Note:
These functions don't work for me:
string replace.
JSON.parse.
$.parseJSON.
Unlike JavaScript, the JSON notation only supports the two-byte \uNNNN escape sequences, not the \xNN sequences. Try this:
var cleaned = input.replace(/\\x([0-9a-f]{2})/gi, '\\u00$1'); // "i" flag: also match uppercase hex digits
var output = $.parseJSON(cleaned);
console.log(output);
Demonstration
Also, in order to make this demonstration work, I had to make a couple of other modifications to your string, which I think are just a result of how you formatted the question here:
Completed the \xNN escape sequence that was broken when in the middle of the string (\x3…ttp).
Removed the comma at the end of the object literal.
In any case, it would probably be better if you could make your service (or whatever is giving you this file) provide you valid JSON instead of this.
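For reference, here is a self-contained version of the same conversion that uses plain JSON.parse instead of jQuery. The input string is a shortened, made-up stand-in for the real response, not the actual data.

```javascript
// Illustrative input: JSON-like text that uses \xNN escapes, which JSON does not allow.
const input = '{"text":"\\x3ca href\\x3d\\x22#\\x22\\x3elink\\x3c/a\\x3e"}';

// Rewrite each \xNN as the equivalent \u00NN, which JSON does allow.
const cleaned = input.replace(/\\x([0-9a-fA-F]{2})/g, "\\u00$1");
const output = JSON.parse(cleaned);

console.log(output.text);
```

One limitation of this sketch: an escaped backslash followed by x and two hex digits inside a string value would also be rewritten, so it is only safe when the input cannot contain that combination.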
I am using jQuery's getJSON function to make a request and handle the JSON response. The problem is the response I get back is malformed and I can't change it. The response looks something like this:
{
aNumber: 200,
someText: '\'hello\' world',
anObject: {
'foo': 'fooValue',
'bar': '10.0'
}
}
To be valid JSON, it should look like this:
{
"aNumber": 200,
"someText": "'hello' world",
"anObject": {
"foo": "fooValue",
"bar": "10.0"
}
}
I would like to change the text returned to a valid JSON object. I've used the JavaScript replace function to turn the single quotes into double quotes and the escaped single quotes into single quotes, but now I am stuck on figuring out the best way to add quotes around the key values.
For example, how would I change foo: "fooValue" to "foo":"fooValue"? Is there a Regular Expression that can make this easy?
Thanks in advance!
This regex will do the trick
$json = preg_replace('/([{,])(\s*)([A-Za-z0-9_\-]+?)\s*:/','$1"$3":',$json);
It's PHP, though! I assume converting it to JS is not a problem.
I was trying to solve the same problem using a regEx in Javascript. I have an app written for Node.js to parse incoming JSON, but wanted a "relaxed" version of the parser (see following comments), since it is inconvenient to put quotes around every key (name). Here is my solution:
var objKeysRegex = /({|,)(?:\s*)(?:')?([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*)(?:')?(?:\s*):/g;// look for object names
var newQuotedKeysString = originalString.replace(objKeysRegex, "$1\"$2\":");// all object names should be double quoted
var newObject = JSON.parse(newQuotedKeysString);
Here's a breakdown of the regEx:
({|,) looks for the beginning of the object, a { for flat objects or , for embedded objects.
(?:\s*) finds but does not remember white space
(?:')? finds but does not remember a single quote (to be replaced by a double quote later). There will be either zero or one of these.
([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*) is the name (or key). Starts with any letter, underscore, $, or dot, followed by zero or more alpha-numeric characters or underscores or dashes or dots or $.
the last character : is what delimits the name of the object from the value.
Now we can use replace() with some dressing to get our newly quoted keys:
originalString.replace(objKeysRegex, "$1\"$2\":")
where the $1 is either { or , depending on whether the object was embedded in another object. \" adds a double quote. $2 is the name. \" another double quote. and finally : finishes it off.
Test it out with
{keyOne: "value1", $keyTwo: "value 2", key-3:{key4:18.34}}
output:
{"keyOne": "value1","$keyTwo": "value 2","key-3":{"key4":18.34}}
Some comments:
I have not tested this method for speed, but from what I gather from reading some of these entries, using a regex is faster than eval()
For my application, I'm limiting the characters that names are allowed to have with ([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*) for my 'relaxed' version JSON parser. If you wanted to allow more characters in names (you can do that and still have valid JSON), you could instead use ([^'":]+) to mean anything other than double or single quotes or a colon. This would still limit you further than the JSON standard (which allows single quotes in the name) but then you wouldn't be able to parse using this method. You can have all sorts of stuff in here with this expression ([^'":]+), so be careful.
Hope this helps.
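One caveat worth knowing about with this key-quoting regex: it cannot tell a key from text inside a string value. A small sketch with a made-up input whose value happens to contain a comma followed by a colon:

```javascript
// Same pattern as in the answer above.
const objKeysRegex = /({|,)(?:\s*)(?:')?([A-Za-z_$\.][A-Za-z0-9_ \-\.$]*)(?:')?(?:\s*):/g;

// Illustrative input: the string value contains ", b:" which looks like a key.
const relaxedText = '{note: "a, b: c"}';
const quoted = relaxedText.replace(objKeysRegex, '$1"$2":');

// Record the outcome instead of letting the exception escape.
let parsed = null;
let failure = null;
try {
  parsed = JSON.parse(quoted);
} catch (err) {
  failure = err.name;
}
console.log(quoted);
console.log(failure);
```

The regex rewrites the `, b:` inside the value as if it were a key, so the result is no longer parseable. For inputs where values never contain such sequences the approach still works.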
edit — came back to point out, first and foremost, that this is not a problem that can be solved with a regular expression.
It's important to distinguish between JSON notation as a serialized form, and JavaScript object constant notation.
This:
{ x: "hello" }
is a perfectly valid JavaScript value (an expression fragment), so that this:
var y = { x: "hello" };
gives you exactly the same result as:
var y = { "x": "hello" };
In other words, the value of "y" in either of those cases will be exactly the same. Completely, exactly the same, such that it would not be possible to ever tell which of those two constants was used to initialize "y".
Now, if what you want to do is translate a string containing JavaScript style "JSON shorthand" without quotes into valid JSON, the only thing to do is parse it and reconstruct the string with quotes around the property names. That is, you will have to either write your own "relaxed" JSON parser that can cope with unquoted identifiers as property names, or else find an off-the-shelf parser that can handle such relaxed syntax.
In your case, it looks like once you have the "relaxed" parser available, you're done; there shouldn't be any need for you to translate back. Thankfully, your "invalid" JSON response is completely interpretable by JavaScript itself, so if you trust the data source (and that's a big "if") you should be able to evaluate it with "eval()".
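A rough sketch of that evaluate-then-reserialize route, assuming the text comes from a source you fully trust (it is executed as code). `relaxedToJson` is a made-up name, and new Function is used here in place of a bare eval() only to keep the evaluated text out of the local scope.

```javascript
// Hypothetical helper. ASSUMPTION: `text` is trusted; it is run as JavaScript.
function relaxedToJson(text) {
  // Let the JavaScript parser read the relaxed object literal...
  const value = new Function('"use strict"; return (' + text + ");")();
  // ...then let JSON.stringify write it back out as strict JSON.
  return JSON.stringify(value);
}

// The malformed response from the question, as a string.
const relaxed =
  "{ aNumber: 200, someText: '\\'hello\\' world', " +
  "anObject: { 'foo': 'fooValue', 'bar': '10.0' } }";

const json = relaxedToJson(relaxed);
console.log(json);
console.log(JSON.parse(json).someText);
```

If the source is not trusted, this is not an option and a real relaxed parser is needed instead.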
UPD 2020: the object you have is a valid javascript object, but not 100% valid JSON.
An easy way to convert it to valid JSON is to utilize the features JavaScript provides you with, JSON.stringify:
JSON.stringify(object)
You can run this in your browser's JS console.
To get it formatted (aka "pretty-printed"), you can pass two more arguments to this function: the replacer (a function which allows you to filter out some of the properties of your object; just pass null if you don't care) and space (either the number of spaces or a string which will be placed before each key-value pair of your object's string representation):
JSON.stringify(object, null, 4)
In your case, this call
JSON.stringify({
aNumber: 200,
someText: '\'hello\' world',
anObject: {
'foo': 'fooValue',
'bar': '10.0'
}
}, null, 4)
will give you
{
"aNumber": 200,
"someText": "'hello' world",
"anObject": {
"foo": "fooValue",
"bar": "10.0"
}
}
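The replacer argument mentioned above can be sketched like this. The `secret` property and the two-space indent are made up for the example.

```javascript
// Illustrative object with one property we want to leave out of the output.
const object = { aNumber: 200, someText: "'hello' world", secret: "hidden" };

// Returning undefined from the replacer drops that property.
const replacer = (key, value) => (key === "secret" ? undefined : value);

const pretty = JSON.stringify(object, replacer, 2);
console.log(pretty);
```

The replacer is first called with an empty key for the top-level object, which is why the function has to pass unknown keys through unchanged.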
You **do not** need to do this - you've already got a valid **JSON object**. Read about JSON [here][1].
If you need to get value, you just write `data.whatever` and it just works. E.g.: if you have JSON **object** `data`:
{
moo: "foo",
foo: "bar"
}
All possible fields are `moo` and `foo` and their use are `data.moo` and `data.foo` respectively. And if you want to use `data` as a jQuery argument, you just pass it as-is: `$.load("http://my.site.com/moo", data, function(response){ /* ... */ })`.
**Note:** in the last example I've mentioned, response will be a string. To make it a valid JSON object, use the `$.parseJSON(response);` method.
Since it's a malformed "JSON", you will not be able to use jQuery.getJSON.
You can use
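jQuery.ajax({
url : myUrl,
data : myParams,
type : "GET",
dataType : "text", // keep jQuery from trying to parse the malformed response
success : function(jsontext)
{
// jsontext is in text format
// a string pattern would replace only the first quote, so use a global regex
jsontext = jsontext.replace(/'/g, "\"");
// now convert text to an object (eval is only acceptable for a trusted source)
var jsonData = eval('(' + jsontext+ ')');
// rest of the code
}
});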
The JSON spec says that JSON is an object or an array. In the case of an object,
An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. ...
And later, the spec says that a string is surrounded in quotes.
Why?
Thus,
{"Property1":"Value1","Property2":18}
and not
{Property1:"Value1",Property2:18}
Question 1: why not allow the name in the name/value pairs to be unquoted identifiers?
Question 2: Is there a semantic difference between the two representations above, when evaluated in Javascript?
I leave a quote from a presentation that Douglas Crockford (the creator of the JSON standard) gave to Yahoo.
He talks about how he discovered JSON, and amongst other things why he decided to use quoted keys:
....
That was when we discovered the unquoted name problem. It turns out ECMA Script 3 has a whack reserved word policy. Reserved words must be quoted in the key position, which is really a nuisance. When I got around to formulizing this into a standard, I didn't want to have to put all of the reserved words in the standard, because it would look really stupid.
At the time, I was trying to convince people: yeah, you can write applications in JavaScript, it's actually going to work and it's a good language. I didn't want to say, then, at the same time: and look at this really stupid thing they did! So I decided, instead, let's just quote the keys.
That way, we don't have to tell anybody about how whack it is.
That's why, to this day, keys are quoted in JSON.
You can find the complete video and transcript here.
Question 1: why not allow the name in the name/value pairs to be unquoted identifiers?
The design philosophy of JSON is "Keep it simple"
"Quote names with "" is a lot simpler than "You may quote names with " or ' but you don't have to, unless they contain certain characters (or combinations of characters that would make it a keyword) and ' or " may need to be quoted depending on what delimiter you selected".
Question 2: Is there a semantic difference between the two representations above, when evaluated in Javascript?
No. In JavaScript they are identical.
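A quick way to convince yourself, using the two literals from the question (the comparison variables are just for the demonstration):

```javascript
const quoted = {"Property1":"Value1","Property2":18};
const unquoted = {Property1:"Value1",Property2:18};

// Same keys, in the same order, and the same serialized form.
const sameKeys = JSON.stringify(Object.keys(quoted)) === JSON.stringify(Object.keys(unquoted));
const sameJson = JSON.stringify(quoted) === JSON.stringify(unquoted);

console.log(sameKeys, sameJson);
```

Once the literal has been evaluated, nothing about the resulting object records whether the keys were written with quotes.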
Both : and whitespace are permitted in names (keys). Without the quotes, this would cause ambiguity when trying to determine what exactly constitutes the name.
In JavaScript, objects can be used like a hash/hashtable of key/value pairs.
However, if your key has characters that JavaScript cannot tokenize as a name, accessing it with property (dot) syntax fails; it can only be accessed as a key with bracket syntax.
var test = {};
test["key"] = 1;
test["#my-div"] = "<div> stuff </div>";
// test = { "key": 1, "#my-div": "<div> stuff </div>" };
console.log(test.key); // should be 1
console.log(test["key"]); // should be 1
console.log(test["#my-div"]); // should be "<div> stuff </div>";
// console.log(test.#my-div); // would not work: this line is a syntax error
Key names can sometimes contain characters that cannot be evaluated as a token/identifier in JavaScript, thus it's best to put all key names in strings for consistency.
If JSON describes objects, then in practice you get the following
var foo = {};
var bar = 1;
foo["bar"] = "hello";
foo[bar] = "goodbye";
so then,
foo.bar == "hello";
foo[1] == "goodbye" // in setting it used the value of var bar
so even if your examples do produce the same result, their equivalents in "raw code" wouldn't. Maybe that's why?? dunno, just an idea.
I think the right answer to Cheeso's question is that the implementation surpassed the documentation. It no longer requires a string as the key, but rather something else, which can either be a string (ie quoted) or (probably) anything that can be used as a variable name, which I will guess means start with a letter, _, or $, and include only letters, numbers, and the $ and _.
I wanted to simplify the rest for the next person who visits this question with the same idea I did. Here's the meat:
Variable names are not interpolated in JSON when used as an object key (Thanks Friedo!)
Breton, using "identifier" instead of "key", wrote that "if an identifier happens to be a reserved word, it is interpreted as that word rather than as an identifier." This may be true, but I tried it without any trouble:
var a = {do:1,long:2,super:3,abstract:4,var:5,break:6,boolean:7};
a.break
=> 6
About using quotes, Quentin wrote "...but you don't have to, unless [the key] contains certain characters (or combinations of characters that would make it a keyword)"
I found the former part (certain characters) is true, using the # sign (in fact, I think $ and _ are the only characters that don't cause the error):
var a = {a#b:1};
=> Syntax error
var a = {"a#b":1};
a['a#b']
=> 1
but the parenthetical about keywords, as I showed above, isn't true.
What I wanted works because the text between the opening { and the colon, or between the comma and the colon for subsequent properties is used as an unquoted string to make an object key, or, as Friedo put it, a variable name there doesn't get interpolated:
var uid = getUID();
var token = getToken(); // Returns ABC123
var data = {uid:uid,token:token};
data.token
=> ABC123
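To make the "not interpolated" point concrete, here is a small sketch contrasting a plain key with a computed key (square brackets, ES2015 and later). The value "ABC123" stands in for whatever getToken() returns.

```javascript
const token = "ABC123"; // stand-in for getToken()

// Plain key: the text "token" itself becomes the property name.
const literalKey = { token: token };

// Computed key: the value of the variable becomes the property name.
const computedKey = { [token]: true };

console.log(Object.keys(literalKey));
console.log(Object.keys(computedKey));
```

So an unquoted key behaves like a quoted string, and a variable is only consulted when the key is written in brackets.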
It might reduce data size if quotes around names were required only when necessary.