How to reliably hash JavaScript objects? - javascript

Is there a reliable way to JSON.stringify a JavaScript object that guarantees that the created JSON string is the same across all browsers, Node.js and so on, given that the JavaScript object is the same?
I want to hash JavaScript objects like
{
signed_data: object_to_sign,
signature: md5(JSON.stringify(object_to_sign) + secret_code)
}
and pass them around across web applications (e.g. Python and Node.js) and the user so that the user can authenticate against one service and show the next service "signed data" for that one to check if the data is authentic.
However, I came across the problem that JSON.stringify is not really unique across the implementations:
In Node.js / V8, JSON.stringify returns a JSON string without unnecessary whitespace, such as '{"user_id":3}'.
Python's simplejson.dumps leaves some whitespace, e.g. '{"user_id": 3}'
Probably other stringify implementations might deal differently with whitespace, the order of attributes, or whatever.
Is there a reliable cross-platform stringify method? Is there a "normalised JSON"?
Would you recommend other ways to hash objects like this?
UPDATE:
This is what I use as a workaround:
normalised_json_data = JSON.stringify(object_to_sign)
{
signed_data: normalised_json_data,
signature: md5(normalised_json_data + secret_code)
}
So in this approach, it is not the object itself that is signed, but its JSON representation (which is specific to the signing platform). This works well because what I sign now is an unambiguous string, and I can easily JSON.parse the data after I have checked the signature hash.
The drawback here is that if I send the whole {signed_data, signature} object as JSON around as well, I have to call JSON.parse twice and it does not look as nice because the inner one gets escaped:
{"signature": "1c3763890298f5711c8b2ea4eb4c8833", "signed_data": "{\"user_id\":5}"}
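The workaround above (sign the exact string, parse it only after the signature checks out) can be sketched in Node.js roughly as follows. This is a sketch under assumptions, not the asker's code: it swaps md5(data + secret) for an HMAC-SHA256 from Node's built-in crypto module, and the names signEnvelope and verifyEnvelope are hypothetical.

```javascript
var crypto = require('crypto');

// Hypothetical helper: serialize once, then sign that exact string.
function signEnvelope(objectToSign, secretCode) {
  var signedData = JSON.stringify(objectToSign);
  var signature = crypto.createHmac('sha256', secretCode)
    .update(signedData)
    .digest('hex');
  return { signed_data: signedData, signature: signature };
}

// Hypothetical helper: verify the signature over the received string,
// and only then parse it. Returns null if the signature does not match.
function verifyEnvelope(envelope, secretCode) {
  var expected = crypto.createHmac('sha256', secretCode)
    .update(envelope.signed_data)
    .digest();
  var given = Buffer.from(envelope.signature, 'hex');
  if (given.length !== expected.length ||
      !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  return JSON.parse(envelope.signed_data);
}

var envelope = signEnvelope({ user_id: 5 }, 'secret_code');
console.log(verifyEnvelope(envelope, 'secret_code')); // { user_id: 5 }
```

Because the receiver checks the signature against the bytes it received, it never has to reproduce the sender's JSON formatting.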

You might be interested in npm package object-hash, which seems to have a rather good activity & reliability level.
var hash = require('object-hash');
var testobj1 = {a: 1, b: 2};
var testobj2 = {b: 2, a: 1};
var testobj3 = {b: 2, a: "1"};
console.log(hash(testobj1)); // 214e9967a58b9eb94f4348d001233ab1b8b67a17
console.log(hash(testobj2)); // 214e9967a58b9eb94f4348d001233ab1b8b67a17
console.log(hash(testobj3)); // 4a575d3a96675c37ddcebabd8a1fea40bc19e862

This is an old question, but I thought I'd add a current solution for anyone arriving here from Google.
The best way to sign and hash JSON objects now is to use JSON Web Tokens. This allows for an object to be signed, hashed and then verified by others based on the signature. It's offered for a bunch of different technologies and has an active development group.

You're asking for an implementation of something across multiple languages to be the same... you're almost certainly out of luck. You have two options:
check www.json.org implementations to see if they might be more standardized
roll your own in each language (use json.org implementations as a base and there should be VERY little work to do)

You could normalise the result of stringify() by applying rules such as:
remove unnecessary whitespace
sort attribute names in hashes
well-defined consistent quoting style
normalise string contents (so "\u0041" and "A" become the same)
This would leave you with a canonical JSON representation of your object, which you can then reliably hash.
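The rules above can be sketched as a small serializer. This is a minimal illustration that assumes plain JSON data (no Dates, functions or cyclic references); canonicalStringify is a hypothetical name, and keys are sorted with the default Array.prototype.sort, which compares UTF-16 code units and may differ from another language's sort order for unusual keys.

```javascript
// Hypothetical helper: emit JSON with sorted keys and no extra whitespace.
function canonicalStringify(value) {
  // Primitives and null: JSON.stringify already normalises quoting and
  // string escapes, so "\u0041" and "A" produce the same output.
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is significant and is preserved.
    return '[' + value.map(function (item) {
      var s = canonicalStringify(item);
      return s === undefined ? 'null' : s; // mirrors JSON.stringify for arrays
    }).join(',') + ']';
  }
  var parts = [];
  Object.keys(value).sort().forEach(function (key) {
    var s = canonicalStringify(value[key]);
    if (s !== undefined) { // skip undefined values, as JSON.stringify does
      parts.push(JSON.stringify(key) + ':' + s);
    }
  });
  return '{' + parts.join(',') + '}';
}

console.log(canonicalStringify({ b: 2, a: 1 })); // {"a":1,"b":2}
console.log(canonicalStringify({ a: 1, b: 2 })); // {"a":1,"b":2}
```

Building the string directly, rather than re-inserting sorted keys into a new object, avoids relying on the engine's property enumeration order.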

After trying some hash algorithms and JSON-to-string methods, I found this to work the best (Sorry, it is typescript, can of course be rewritten to javascript):
// Requires the npm package "xxhash" (see the links below).
const xxhash = require('xxhash');

// From: https://stackoverflow.com/questions/5467129/sort-javascript-object-by-key
// Recursively returns a copy whose object keys are sorted; array order is preserved.
function sortObjectKeys(obj: any): any {
    if (obj === null || obj === undefined) {
        return obj;
    }
    if (typeof obj !== 'object') { // it is a primitive: number/string (in an array)
        return obj;
    }
    if (Array.isArray(obj)) { // keep the element order, sort keys inside each element
        return obj.map(sortObjectKeys);
    }
    return Object.keys(obj).sort().reduce((acc: any, key) => {
        acc[key] = sortObjectKeys(obj[key]);
        return acc;
    }, {});
}

let xxhash64_ObjectToUniqueStringNoWhiteSpace = function (Obj: any) {
    let SortedObject: any = sortObjectKeys(Obj);
    let jsonstring = JSON.stringify(SortedObject, function (k, v) { return v === undefined ? "undef" : v; });
    // Remove all whitespace (note: this also strips whitespace inside string values)
    let jsonstringNoWhitespace: string = jsonstring.replace(/\s+/g, '');
    // 'binary' is Node's alias for 'latin1'; the default encoding would be 'utf8'
    let JSONBuffer: Buffer = Buffer.from(jsonstringNoWhitespace, 'binary');
    return xxhash.hash64(JSONBuffer, 0xCAFEBABE, "hex");
}
It uses the xxhash npm module: https://cyan4973.github.io/xxHash/ , https://www.npmjs.com/package/xxhash
The benefits:
This is deterministic
Ignores key order (preserves array order)
Cross platform (if you can find equivalents for JSON.stringify)
JSON.stringify will hopefully not get a different implementation, and the whitespace removal will hopefully make it independent of JSON formatting.
64-bit
Hexadecimal string as a result
Fastest (0.021 ms for 2177 B JSON, 2.64 ms for 150 kB JSON)

You may find bencode suitable for your needs. It's cross-platform, and the encoding is guaranteed to be the same from every implementation.
The downside is it doesn't support nulls or booleans. But that may be okay for you if you do something like transforming e.g., bools -> 0|1 and nulls -> "null" before encoding.
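To make the idea concrete, here is a minimal bencode encoder sketch, not a replacement for a maintained library. The function name bencode is hypothetical, the output is returned as a JavaScript string for readability, and booleans and null are mapped to 0|1 and "null" as suggested above. Dictionary keys are sorted with the default sort, which is assumed to be adequate for ASCII keys.

```javascript
// Hypothetical minimal encoder: integers, strings, lists and dictionaries.
function bencode(value) {
  if (value === null) {
    return bencode('null');            // null -> "null", per the suggestion above
  }
  if (typeof value === 'boolean') {
    return bencode(value ? 1 : 0);     // bools -> 0|1
  }
  if (typeof value === 'number') {
    if (!Number.isSafeInteger(value)) {
      throw new TypeError('bencode only supports integers');
    }
    return 'i' + value + 'e';
  }
  if (typeof value === 'string') {
    // Strings are prefixed with their length in bytes, not characters.
    return Buffer.byteLength(value, 'utf8') + ':' + value;
  }
  if (Array.isArray(value)) {
    return 'l' + value.map(bencode).join('') + 'e';
  }
  if (typeof value === 'object') {
    // Dictionary keys must appear in sorted order, which is what makes
    // the encoding independent of insertion order.
    return 'd' + Object.keys(value).sort().map(function (key) {
      return bencode(key) + bencode(value[key]);
    }).join('') + 'e';
  }
  throw new TypeError('unsupported type: ' + typeof value);
}

console.log(bencode({ user_id: 3 }));  // d7:user_idi3ee
console.log(bencode({ b: 2, a: 1 }));  // d1:ai1e1:bi2ee
```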

Related

Getting length (size? count?) of an array in Parse.com cloud code

I believe in javascript, arrays have a ".count" property. However, I believe that when writing Parse cloud code, effectively you cannot use this since .count is in a word, used by Parse (for queries).
(1) Is that correct, and is the reason I gave correctly stated or a shambles?
I believe (it seems to work) you can go ahead and use .length in Parse cloud code for the length of an array; but I'm confused "why" since javascript doco says .length
(2) Is that correct - if so why can it be done?
You inevitably use "underscore" library in Parse projects; in fact does that library offer a way to get the size/length/count of an array?
(3) Is there yet another way, using _ ?
I swear I have seen Parse cloud code using "size" (something or other like that) in relation to arrays;
(4) Is there an idiom using something like 'size' ?
Finally, indeed, considering this typical example of using _,
Parse.Cloud.afterSave("Employee", function(request)
{
    var company = request.object.get("company");
    var query = new Parse.Query("Employee");
    query.equalTo("company", company);
    query.include("company");
    var justEmails = new Array();
    query.each(function(employee)
    {
        var thatEmail = employee.get("email");
        justEmails.push(thatEmail);
    }
    ).then(function()
    {
        var kount = justEmails.length;
        console.log(">>> count is " + kount );
        justEmails = _.uniq(justEmails);
        kount = justEmails.length;
        console.log(">>> but count is now " + kount );
    });
});
(5) is there a way to do that in "one line", saying something like _.uniq(justEmails).sizeOrWhateverTheHell();
Finally in summary,
(6) what then is the best and most sensible and most idiomatic way to get simply the length of an array, in javascript in the Parse cloud code milieu -- is it indeed .length?
(1) There is no such thing as count. Arrays (and strings) have a .length property. Use it.
(2) I have no idea what this is asking.
(3) No, use .length.
(4) See 3
(5) _.uniq(whatever).length
(6) See 1
It's just JavaScript.
You are correct and the best way to get the number of elements of an array in javascript (and in Parse cloud code) is to use array.length
Length is the property of the array, whereas size is a function that's defined in some javascript frameworks. Always use the length property to get the number of elements in an array.
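A quick illustration in plain JavaScript, outside of Parse. The sample addresses are made up, and the filter/indexOf expression is only a stand-in for _.uniq so that the snippet runs without underscore.

```javascript
var sampleEmails = ['a@example.com', 'b@example.com', 'a@example.com'];

// .length is a property, not a method, so there are no parentheses.
var totalCount = sampleEmails.length;

// Stand-in for _.uniq(sampleEmails).length: keep the first occurrence
// of each value, then read .length off the resulting array in one line.
var uniqueCount = sampleEmails.filter(function (email, index, all) {
  return all.indexOf(email) === index;
}).length;

console.log(totalCount);  // 3
console.log(uniqueCount); // 2
```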

JSON Date without eval?

Short description:
Is there a javascript JSON-converter out there that is able to preserve Dates and does not use eval?
Example:
var obj1 = { someInt: 1, someDate: new Date(1388361600000) };
var obj2 = parseJSON(toJSON(obj1));
//obj2.someDate should now be of type Date and not String
//(like in ordinary json-parsers).
Long description:
I think most people working with JSON already had the problem of how to transmit a Date:
var obj = { someInt: 1, someDate: new Date(1388361600000) }
When converting this to JSON and back, the date suddenly became a String:
JSON.parse(JSON.stringify(obj))
== { someInt: 1, someDate: "2013-12-30T00:00:00.000Z" }
This is a huge disadvantage since you cannot easily submit a Date using JSON. There is always some post-processing necessary (and you need to know where to look for the dates).
Then Microsoft found a loophole in the specification of JSON and - by convention - encodes a date as follows:
{"someInt":1,"someDate":"\/Date(1388361600000)\/"}
The brilliance in this is that there is now a definitive way to tell a String from a Date inside a valid JSON string: an encoded String will never contain the substring "\/" (a backslash followed by a slash, not to be confused with an escaped slash). Thus a parser that knows this convention can safely create the Date object.
If a parser does not know this convention, the date will just be parsed to the harmless and readable String "/Date(1388361600000)/".
The huge drawback is that there seems to be no parser that can read this without using eval. Microsoft proposes the following way to read this:
var obj = eval("(" + s.replace(/\"\\\/Date\((\d+)\)\\\/\"/g, function (match, time) { return "new Date(" + time + ")"; }) + ")");
This works like a charm: You never have to care about Dates in JSON anymore. But it uses the very unsafe eval-method.
Do you know any ready-to-use-parser that achieves the same result without using eval?
EDIT
There was some confusion in the comments about the advantages of the tweaked encoding.
I set up a jsFiddle that should make the intentions clear: http://jsfiddle.net/AJheH/
I disagree with adeno's comment that JSON is a notation for strings and cannot represent objects. JSON is a notation for compound data types, which must be in the form of serialized objects, although the primitive types can only be integer, float, string or bool. (update: if you've ever had to deal with spaghetti-coded XML, then you'll appreciate that maybe this is a good thing too!)
Presumably Hungarian notation has lost favour with Microsoft if they now think that creating a non-standard notation incorporating the data type to describe a type is a better idea.
Of itself 'eval' is not evil - it makes solving some problems a lot easier - but it's very difficult to implement good security while using it. Indeed it's disabled by default with a Content Security Policy.
IMHO it boils down to storing the date as 1388361600000 or "2013-12-30T00:00:00.000Z". IMHO the latter has significantly more semantic value: taken out of context it is clearly a date+time, while the former could be just about anything. Both can be parsed by the ECMAScript Date object without resorting to eval. Yes, this does require code to process the data, but what can you do with any sort of data without parsing it? The only time I can see this being an advantage is with a schemaless database, but in fairness this is a BIG problem.
The issue is the following line of code. The page linked below has an example function; take a look at its parseWithDate function. Add the script to the page and change the line as shown, and it will work.
http://www.asp.net/ajaxlibrary/jquery_webforms_serialize_dates_to_json.ashx
var parsed1 = JSON.parse(s1); // changed to below
var parsed1 = JSON.parseWithDate(s1);
Updated jsFiddle that works http://jsfiddle.net/GLb67/1/
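For reference, the same effect can be sketched without eval by using the standard reviver argument of JSON.parse. This is a sketch, not the linked library's implementation: parseJsonWithDates is a hypothetical name, and it assumes the plain /Date(milliseconds)/ form with no timezone suffix. Note that a reviver only sees the already-decoded string, so it cannot tell "\/Date(...)\/" apart from an unescaped "/Date(...)/".

```javascript
// Hypothetical helper: revive Microsoft-style date strings into Date objects.
function parseJsonWithDates(text) {
  var msDatePattern = /^\/Date\((-?\d+)\)\/$/;
  return JSON.parse(text, function (key, value) {
    if (typeof value === 'string') {
      var match = msDatePattern.exec(value);
      if (match) {
        return new Date(Number(match[1]));
      }
    }
    return value; // everything else is passed through unchanged
  });
}

var revived = parseJsonWithDates('{"someInt":1,"someDate":"\\/Date(1388361600000)\\/"}');
console.log(revived.someDate instanceof Date); // true
console.log(revived.someDate.getTime());       // 1388361600000
```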

Dojo JSON serialization, using various serialization strategies

I have an issue with serializing the inputs of DateTextBox and TimeTextBox to JSON: during serialization a timezone conversion is made, which forces me to send the timezone to the server and do the appropriate adaptations there.
To prevent this, I'd like to change the date format for serialization purposes. I can alter Date's prototype, as described here (I've done that in JSFiddle), but I'd prefer to alter the behaviour only for the scope of a single request rather than globally. Something like this:
someDojoUtil.jsonSerialize(obj, {option1: 'value1'...})
Does Dojo provide a way to achieve this, or is the only way to globally alter Date's prototype?
Based on comment elaboration in OP, I would use the second argument to Json.stringify, the so-called "replacer". Something like this:
require(['dojo/json'], function(Json) {
    function replacer(key, value) {
        if ('string' === typeof (value)) {
            var d = new Date(value);
            if (isNaN(d.getTime())) {
                return value; // string, but not a date
            }
            // do whatever you want to do, this is just an example
            d.setSeconds(0);
            return d.toJSON();
        }
        return value;
    }
    var data = {'a':new Date(), 'b':123, 'c':'foo', 'd':[new Date()]};
    var str = Json.stringify(data, replacer);
    console.log(str);
});
I suggest writing this as a mixin for dojo/request, then creating yourself a custom request class that has this behavior, then using that custom request object as needed.
This feels hackish, but I think it will meet your need (as I'm understanding it!).

How to decode hex in XMLHttpRequest?

I have a site that uses AJAX, and I ran into a problem.
The server returns a JSON-like string such as {a:"\x48\x65\x6C\x6C\x6F"}.
Then xx.responseText contains the string '{a:"\x48\x65\x6C\x6C\x6F"}'.
But if I create the JavaScript string "\x48\x65\x6C\x6C\x6F", I get "Hello" and not the hex!
Is it possible to get the "real" text from the hex in xx.responseText (automatically, without .replace())?
If the output is at all regular (predictable), .replace() is probably the simplest.
var escapeSequences = xx.responseText.replace(/^\{a:/, '').replace(/\}$/, '');
console.log(escapeSequences === "\"\\x48\\x65\\x6C\\x6C\\x6F\""); // true
Or, if a string literal that's equivalent in value but may not otherwise be the same is sufficient, you could parse (see below) and then stringify() an individual property.
console.log(JSON.stringify(data.a) === "\"Hello\""); // true
Otherwise, you'll likely need to run responseText through a lexer to tokenize it and retrieve the literal from that. JavaScript doesn't include an option for this separate from parsing/evaluating, so you'll need to find a library for this.
"Lexer written in JavaScript?" may be a good place to start for that.
To parse it:
Since it appears to be a string of code, you'll likely have to use eval().
var data = eval('(' + xx.responseText + ')');
console.log(data.a); // Hello
Note: The parenthesis make sure {...} is evaluated as an Object literal rather than as a block.
Also, I'd suggest looking into alternatives to code for communicating data like this.
A common option is JSON, which takes its syntax from JavaScript, but uses a rather strict subset. It doesn't allow functions or other potentially problematic code to be included.
var data = JSON.parse(xx.responseText);
console.log(data.a); // Hello
Visiting JSON.org, you should be able to find a reference or library for the choice of server-side language to output JSON.
{ "a": "Hello" }
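Why not just let the parser do its job and handle the \x escape sequences, and then convert the string back to hex again afterwards? (Note that \x escapes are not valid JSON, so JSON.parse rejects them; this relies on a JavaScript evaluator such as the eval approach.) For example: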
function charToHex(c) {
var hex = c.charCodeAt(0).toString(16);
return (hex.length === 2) ? hex : '0' + hex;
}
"Hello".replace(/./g, charToHex); // gives "48656c6c6f"

JSON.stringify and "\u2028\u2029" check?

Sometimes I see in a view source page ( html view source) this code:
if (JSON.stringify(["\u2028\u2029"]) === '["\u2028\u2029"]') JSON.stringify = function (a) {
    var b = /\u2028/g,
        c = /\u2029/g;
    return function (d, e, f) {
        var g = a.call(this, d, e, f);
        if (g) {
            if (-1 < g.indexOf('\u2028')) g = g.replace(b, '\\u2028');
            if (-1 < g.indexOf('\u2029')) g = g.replace(c, '\\u2029');
        }
        return g;
    };
}(JSON.stringify);
What is the problem with JSON.stringify(["\u2028\u2029"]) that it needs to be checked ?
Additional info :
JSON.stringify(["\u2028\u2029"]) value is "[", followed by the two raw separator characters (displayed as line breaks), followed by "]".
'["\u2028\u2029"]' value is the same.
I thought it might be a security feature. FileFormat.info for 2028 and 2029 have a banner stating
Do not use this character in domain names. Browsers are blacklisting it because of the potential for phishing.
But it turns out that the line and paragraph separators \u2028 and \u2029 respectively are treated as a new line in ES5 JavaScript.
From http://www.thespanner.co.uk/2011/07/25/the-json-specification-is-now-wrong/
\u2028 and \u2029 characters that can break entire JSON feeds since the string will contain a new line and the JavaScript parser will bail out
So you are seeing a patch for JSON.stringify. Also see Node.js JavaScript-stringify
Edit: Yes, modern browsers' built-in JSON object should take care of this correctly. I can't find any links to the actual source to back this up though. The Chromium code search doesn't mention any bugs that would warrant adding this workaround manually. It looks like Firefox 3.5 was the first version to have native JSON support, not entirely bug-free though. IE8 supports it too. So it is likely a now unnecessary patch, assuming browsers have implemented the specification correctly.
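The same escaping can also be done in a standalone wrapper instead of overwriting JSON.stringify. This is a sketch of that idea only; stringifyForScript is a hypothetical name, not part of any library.

```javascript
// Hypothetical wrapper: stringify, then escape the two separators so the
// result is safe to embed in JavaScript source as well as valid JSON.
function stringifyForScript(value, replacer, space) {
  var json = JSON.stringify(value, replacer, space);
  if (json === undefined) {
    return json; // e.g. when value is undefined or a function
  }
  return json
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

var escaped = stringifyForScript(["\u2028\u2029"]);
console.log(escaped === '["\\u2028\\u2029"]');         // true
console.log(JSON.parse(escaped)[0] === "\u2028\u2029"); // true
```

The escaped output still parses back to the original characters, so the wrapper does not change the data, only its textual form.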
After reading both answers, here is a simple visual explanation.
Doing this:
alert(JSON.stringify({"a":"sddd\u2028sssss"})) // can cause problems
will alert: [screenshot not preserved]
Changing the troublemaker to something else (for example, from \u to \1u) will alert: [screenshot not preserved]
Now let's invoke the function from my original question and try alert(JSON.stringify({"a":"sddd\u2028sssss"})) again.
Result: [screenshot not preserved]
And now everybody's happy.
\u2028 and \u2029 are the invisible Unicode line separator and paragraph separator characters. Natively, JSON.stringify emits these characters literally (just as JavaScript does inside strings), resulting in "[", the two raw separator characters (displayed as line breaks), and "]". The code you have provided stops JSON.stringify from emitting the raw characters and preserves their \uXXXX representation in the output string, i.e. it returns "["\u2028\u2029"]".
