Encrypting a nested map of small strings with the same symmetric key


Let's say I am creating a webapp where users can create a nested tree of strings (with sensitive information). These strings are presumably quite short. I want to encrypt both keys and values in this tree before saving it. All values in the tree will be encrypted client-side using a symmetric key supplied by the user. Likewise, they will be decrypted client-side when reading.
The tree is persisted in a Mongo database.
I can't decide whether I should serialize the tree and encrypt it as a whole string or whether to encrypt values individually, considering that all data in the tree will be encrypted using the same key.
What are the pros and cons of either?
From what I can tell, AES uses a block size of 128 bits, meaning that with padding any string can grow by up to 16 bytes when encrypted, which speaks in favor of encrypting a serialized string (if you want to avoid overhead).
Note: Although the webapp will use both HTTPS, IP whitelisting and multifactor authentication, I want to make an effort to prevent data breach in the event the Mongo database is stolen. That's what I'm going for here. Advice or thoughts on how to accomplish this is appreciated.
Update
Furthermore, I also want my service to inspire trust. Sending data in the clear (although over HTTPS) means the user must trust me to encrypt it before persisting it. Encrypting client-side allows me to emphasize that I don't know (or need to know) what I'm saving.

I can't think of a reason why these approaches would be different in terms of security of the actual strings (assuming they are both implemented correctly). Encrypting the strings individually obviously means that the structure of the tree will not be secret, but I'm not sure if you are concerned with that or not. For example, if you encrypt each string individually, someone seeing the ciphertexts could find out how many keys there are in the tree, and he could also learn something about the length of each key and value. If you encrypt the tree as a whole serialized blob, then someone seeing the ciphertext can tell roughly how much data is in the tree but nothing about the lengths or number of individual keys/values.
In terms of overhead, the padding would be a consideration, as you mentioned. A bigger source of storage overhead is IVs: if you are using a block cipher mode such as CTR, you need to use a distinct IV for each ciphertext. This means if you are encrypting each string individually, you need to store an IV for each string. If you encrypt the whole serialized tree, then you just need to store the one IV for that one ciphertext.
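The overhead difference can be made concrete with a rough sketch. It assumes Node's built-in crypto module and swaps in AES-256-GCM (12-byte IV plus 16-byte authentication tag per ciphertext, no padding) in place of the CTR mode mentioned above. The helper names, the sample tree, and the randomly generated key are all hypothetical, and the tree is assumed to contain only nested objects and strings.

```javascript
const { createCipheriv, randomBytes } = require("node:crypto");

// Encrypt one UTF-8 string with AES-256-GCM; returns IV + auth tag + ciphertext.
function encryptString(plaintext, key) {
  const iv = randomBytes(12); // fresh 96-bit IV for every ciphertext
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Option A: serialize the whole tree and encrypt it once (one IV, one tag).
function encryptWholeTree(tree, key) {
  return encryptString(JSON.stringify(tree), key);
}

// Option B: encrypt every key and value separately; the tree shape stays visible.
function encryptEachString(tree, key) {
  const out = {};
  for (const [k, v] of Object.entries(tree)) {
    const encKey = encryptString(k, key).toString("base64");
    out[encKey] = typeof v === "string"
      ? encryptString(v, key).toString("base64")
      : encryptEachString(v, key);
  }
  return out;
}

const key = randomBytes(32); // placeholder; a real key would come from the user
const tree = { bank: { pin: "1234", iban: "DE00" }, note: "hello" };
const whole = encryptWholeTree(tree, key);
const perString = encryptEachString(tree, key);
console.log("whole tree bytes:", whole.length);
console.log("per-string JSON bytes:", JSON.stringify(perString).length);
```

With option B every one of the seven strings carries its own IV and tag, so the stored size grows much faster than with option A.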
Before you implement this in JavaScript, though, you should make sure that you're actually getting a real improvement in security from doing client-side encryption. This article is a classic: http://www.matasano.com/articles/javascript-cryptography/ One important point to remember is that the server is providing the JavaScript encryption code, so encrypting data on the client doesn't protect it from the server. If your main concern is a stolen database, you could achieve the same security by just encrypting the data on the server before inserting it in the database.

First of all, I am not a security expert ;-)
I can't decide whether I should serialize the tree and encrypt it as a whole string or whether to encrypt values individually, considering that all data in the tree will be encrypted using the same key.
I would say serializing the tree first and encrypting the result of that has the biggest con.
What plays a huge role in successfully cracking encryption is often the knowledge about certain characters that appear quite often in the original text – for example the letters e and n in English language – and doing statistical analysis based on that on the encrypted text.
Now let's say you use, for example, JSON to serialize your tree client-side before encrypting it. As the attacker, I would easily know that, since I can analyze your client-side script at my leisure. So I also know already that the “letters” {, }, [, ], : and " will have a high percentage of occurrence in every “text” that you encrypt … and that the first letter of every text will have been either a { or a [ (based upon whether your tree is an object or an array) – that’s already quite a bit of potentially very useful knowledge about the texts that get encrypted by your app.

Related

Is it possible to encrypt (not hash!) and use a salt?

I am encrypting objects using Node.js native crypto methods like createCipheriv.
import { createCipheriv, randomBytes } from "crypto";

const algorithm = "aes256";
const inputEncoding = "utf8";
const outputEncoding = "hex";

export async function encryptObject(dataToEncrypt: object, key: Buffer) {
  // Generate a fresh IV on every call; a module-level IV would be reused for every message.
  const iv = randomBytes(16);
  const clear = JSON.stringify(dataToEncrypt);
  const cipher = createCipheriv(algorithm, key, iv);
  let ciphered = cipher.update(clear, inputEncoding, outputEncoding);
  ciphered += cipher.final(outputEncoding);
  // Prepend the IV so the receiver can decrypt.
  return iv.toString(outputEncoding) + ":" + ciphered;
}
Sometimes I am encrypting the same object multiple times and send it over http(s). That makes me think a man in the middle could observe that communication and maybe gain information about my user by using something like a Rainbow table to map the encrypted Data to real data over time.
Now I'm not sure if my worries make sense, but I'm thinking that my encryption could be more secure if I add a salt to it. So far I've only come across salt when hashing, not encrypting. Hashing is not an option for me, because I cannot rely on hashes to be equivalent. I actually have to do something with the data, so I have to be able to decrypt it again.
So my questions are:
Do my thoughts add up, and would I be better off adding salt?
Is it possible to use Node.js native crypto functions for symmetric encryption while adding salt to the mechanism in order to have different encrypted results on every run?

Basically the IV is your salt. That's its purpose (apart from initializing the chaining algorithm). So you are OK with the code you posted here, as long as a new IV is generated for every message. The initialization vector is random, so the encrypted bytes will be different every time.
Just check it with the simple console.log you will see that resulting bytes are totally different every time.
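The check suggested above could look roughly like this. It is a minimal sketch that assumes Node's built-in crypto module and AES-256-CBC; the `encrypt` and `decrypt` helpers and the throwaway `demoKey` are hypothetical names, not part of the question's code.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require("node:crypto");

// Encrypt with a new random IV each time and prepend it as "ivHex:bodyHex".
function encrypt(text, key) {
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const body = cipher.update(text, "utf8", "hex") + cipher.final("hex");
  return iv.toString("hex") + ":" + body;
}

// Split off the IV again and decrypt the body with it.
function decrypt(message, key) {
  const [ivHex, bodyHex] = message.split(":");
  const decipher = createDecipheriv("aes-256-cbc", key, Buffer.from(ivHex, "hex"));
  return decipher.update(bodyHex, "hex", "utf8") + decipher.final("utf8");
}

const demoKey = randomBytes(32); // throwaway key for the demonstration
const first = encrypt('{"user":"alice"}', demoKey);
const second = encrypt('{"user":"alice"}', demoKey);
console.log(first);
console.log(second);
console.log("same ciphertext?", first === second);
console.log("same plaintext?", decrypt(first, demoKey) === decrypt(second, demoKey));
```

The two printed ciphertexts differ because each call draws its own IV, while both decrypt back to the same JSON string.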
On the other hand I don't think that this (identical encrypted bytes) is much of a concern here. I would make rather sure that the chaining method is at least CBC. Here you can read more about it:
https://en.m.wikipedia.org/wiki/Block_cipher_mode_of_operation
Also, if you want to be extra safe against man-in-the-middle attacks, you can add an HMAC to your message. This ensures that no one can flip a bit in your message without being detected. In other words, it provides data integrity and authenticity of a message.
But if you send the data over HTTPS, all of those safety measures are already in place. Hence the name of an exemplary HTTPS cipher suite:
tls_dhe_rsa_with_aes_256_gcm_sha384. Picking out the parts mentioned here: it uses AES-256 in GCM mode, and SHA-384 is the hash function. (In GCM suites, message integrity comes from GCM's own authentication tag, and the hash is used for TLS key derivation.)
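For the HMAC idea, a minimal sketch using Node's built-in crypto module might look like the following. The `sign` and `verify` helpers, the sample message, and the separate `macKey` are assumptions made for illustration; the tag is computed over the already encrypted "iv:ciphertext" string.

```javascript
const { createHmac, timingSafeEqual, randomBytes } = require("node:crypto");

// Compute an HMAC-SHA256 tag over the message (e.g. the "iv:ciphertext" string).
function sign(message, macKey) {
  return createHmac("sha256", macKey).update(message).digest("hex");
}

// Recompute the tag and compare it to the received one in constant time.
function verify(message, tagHex, macKey) {
  const expected = Buffer.from(sign(message, macKey), "hex");
  const actual = Buffer.from(tagHex, "hex");
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

const macKey = randomBytes(32); // hypothetical key, kept separate from the encryption key
const message = "00112233445566778899aabbccddeeff:deadbeef";
const tag = sign(message, macKey);
const untouchedOk = verify(message, tag, macKey);
const tamperedOk = verify(message.replace("dead", "daed"), tag, macKey);
console.log(untouchedOk, tamperedOk);
```

The receiver would run `verify` before attempting decryption and reject the message if it returns false.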

understanding different encrypt mode in cryptojs

I am learning about various hashing techniques and found an interesting library to start with, CryptoJS.
In the documentation, there are multiple options defined as below
hashing
HMAC
PBKDF2
Ciphers
Encoders
I understand hashing is about generating the ciphertext. HMAC is about generating a message authentication code. But I am struggling to differentiate between PBKDF2, Ciphers, and Encoders. Which one should I choose, and when?
Any pointers are helpful.

Password-Based Key Derivation Function 2 (PBKDF2) is a function used to create cryptographic keys that are harder to brute-force, using key stretching. It exists because humans are lazy and create passwords that are way too easy to brute-force.
For example: our favorite password is "password"
Given a salt of "5C52FBAE9A4D97A49D14C8AF338DA55C"
The cryptographic key becomes
(Hex)A2EB261802FFD1965D034AC252E880A44955078D6D4F12EDCDF6D03549F0
(B64)ousmGAL/0ZZdA0rCUuiApElVB41tTxLtzfbQNUnw
try it here
It becomes apparent that the hash is not as easy to break as "password" on its own.
Nevertheless still possible with pre-computed hashes. You can see more here.
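As a rough illustration of the derivation step, here is a sketch that uses Node's built-in `pbkdf2Sync` instead of CryptoJS. The iteration count, the SHA-256 digest, the 32-byte key length, and the `deriveKey` helper name are illustrative assumptions, so the output will not match the hex value quoted above.

```javascript
const { pbkdf2Sync, randomBytes } = require("node:crypto");

// Derive a 32-byte key from a password; iterations and digest are illustrative choices.
function deriveKey(password, salt, iterations = 100000) {
  return pbkdf2Sync(password, salt, iterations, 32, "sha256");
}

const salt = randomBytes(16);
const keyA = deriveKey("password", salt);
const keyB = deriveKey("password", salt);
const keyC = deriveKey("password", randomBytes(16));
console.log(keyA.toString("hex"));
console.log("same password, same salt:", keyA.equals(keyB));
console.log("same password, new salt:", keyA.equals(keyC));
```

The salt is not secret and is stored next to the derived value; its job is to make precomputed tables for common passwords useless across users.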
Ciphers, on the other hand, consist of methods for performing encryption as well as decryption. Some ciphers you see in CryptoJS are your basic AES, DES, triple DES, etc.
Encoders are simply used for Encoding where encoding is very general. It is largely used to transform data so that another system can understand it. In the technology field, this is largely because every system architecture and technology has their own interpretations. Different applications will understand different encoding as per their need.
In Summary,
Encryption and encoding are designed to be two-way (reversible), whereas PBKDF2 is a method of generating cryptographic keys (hashes), which are designed to be one-way. Encoders are used to encode data into a form that can be transmitted or interpreted by another system.
Putting it in context:
If we want to store the password in a database, we hash it because we do not need to know what the password is (no reversal required). However, when we send an encrypted mail to a friend, we want to be able to reverse that encryption (decryption). Otherwise the content is lost. When the mail is sent, we add an attachment. The attachment is encoded in a way that other email clients can decode; otherwise the other system cannot open the attachment or will wrongly interpret the data sent.
So Encoding and Encrypting are similar in that encoded text and encrypted text can both be reversed. However, encoded text are meant to be reversed by anyone or any system that gets its hand on the encoded text since the encoding schemes are publicly available but encrypted text such as ciphertext are meant to be reversed only by certain specified individuals i.e. people who possess the key or decryption algorithms. In our example above, we want our attachment to be interpreted by any system but we do not want the content of the email including the attachment to be opened by everyone.
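The point that anyone can reverse an encoding can be seen in a few lines. This sketch assumes Node's built-in `Buffer` rather than the CryptoJS encoders; note that no key appears anywhere.

```javascript
const text = "hello";

// Encode the same bytes in two public, keyless formats.
const asBase64 = Buffer.from(text, "utf8").toString("base64");
const asHex = Buffer.from(text, "utf8").toString("hex");

// Anyone who knows the format can turn them back into the original text.
const backFromBase64 = Buffer.from(asBase64, "base64").toString("utf8");
const backFromHex = Buffer.from(asHex, "hex").toString("utf8");

console.log(asBase64, asHex, backFromBase64, backFromHex);
```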

PBKDF2 is used when you want to hash a password but with the usual hashing functions, your password is vulnerable to dictionary attacks. So here comes PBKDF2 and salt.
Ciphers: Those are your normal encrypting functions. If you want to send some encrypted message where only the one with the right key can decrypt it.
Encoders: Are for text encoding formats.

Is there any size limit of the protocol buffer?

I am passing data from my client to my server and vice versa. I want to know whether there is any size limit of the protocol buffer.

Citing the official source:
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.
As far as I understand the protobuf encoding the following applies:
varints above 64-bit are not specified, but given how their encoding works varint bit-length is not limited by the wire-format (varint consisting of several 1xxxxxxx groups and terminated by a single 0xxxxxxx is perfectly valid -- I suppose there is no actual implementation supporting varints larger than 64-bit though)
given the above varint encoding property, it should be possible to encode any message length (as varints are used internally to encode length of length-delimited fields and other field types are varints or have a fixed length)
you can construct arbitrarily long valid protobuf messages just by repeating a single repeated field ad-absurdum -- parser should be perfectly happy as long as it has enough memory to store the values (there are even parsers which provide callbacks for field values thus relaxing memory consumption, e.g. nanopb)
(Please do validate my thoughts)
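The varint layout described above (7 payload bits per byte, high bit set on every byte except the last) can be sketched in plain JavaScript with BigInt. This is an illustration of the wire format only, not the API of any protobuf library; `encodeVarint` and `decodeVarint` are hypothetical helper names.

```javascript
// Encode a non-negative integer as a base-128 varint, least significant group first.
function encodeVarint(value) {
  const bytes = [];
  let v = BigInt(value);
  do {
    let group = Number(v & 0x7fn); // next 7 payload bits
    v >>= 7n;
    if (v > 0n) group |= 0x80; // continuation bit: more groups follow
    bytes.push(group);
  } while (v > 0n);
  return Uint8Array.from(bytes);
}

// Decode a varint from the start of a byte sequence.
function decodeVarint(bytes) {
  let result = 0n;
  let shift = 0n;
  for (const byte of bytes) {
    result |= BigInt(byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return result; // last group reached
    shift += 7n;
  }
  throw new Error("truncated varint");
}

const small = encodeVarint(300);
const huge = encodeVarint(2n ** 70n); // wider than 64 bits, still a well-formed varint
console.log(Array.from(small), huge.length, decodeVarint(huge) === 2n ** 70n);
```

The format itself happily carries a 71-bit value in 11 bytes; whether a given parser accepts it is a separate question, as noted above.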

Why CryptoJS produced different value every time the browser loads

Following code is run on a web page via script tag. Every time I load the page or run the code in the browser console - I am getting different value...
var key = 'key-123:456';
var id = 1234567890;
var encrypted = CryptoJS.AES.encrypt(id, key);
encrypted.toString();
How can I have a single "encrypted value" for a "single id" regardless of how many times I load the page or run the code in the console?

AES is a "block" cipher, which means it operates deterministically on fixed-length blocks from plaintext to ciphertext (and vice versa). However, it's typical (and generally preferred) to use a "mode of operation" that adds non-determinism to the encryption process. For example, CBC mode (which CryptoJS uses by default) XORs a random initialization vector with the first plaintext block before encrypting it (and, correspondingly, after decrypting it).
This is vastly preferred because otherwise an eavesdropper can detect duplicate blocks, which might allow an attacker to eventually understand what is being communicated -- undoing the entire point of your encryption.
However, it sounds like you want your encryption to have this specific weakness, which suggests to me that maybe you don't really want encryption at all. Instead, you might want a hash, which is a deterministic one-way transformation. (CryptoJS supports several hashes.) With a hash, a given input A will always hash to the same hash value H, so you can compare Hash(A) == Hash(B) to see if A == B. (This isn't a perfect comparison, since hashes have an infinite input space and finite output space, but hashes are deliberately designed so that it's very, very difficult to find two inputs that produce the same output.) This is how websites securely store your password: the service stores Hash(password) instead of the password itself, then when a user submits a password entry, the site compares Hash(entry) and Hash(password) to see if the entry is correct.
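The Hash(A) == Hash(B) comparison can be sketched as follows. This assumes Node's built-in crypto module and SHA-256 instead of CryptoJS, and `sha256Hex` is a hypothetical helper name.

```javascript
const { createHash } = require("node:crypto");

// Deterministic one-way transformation: the same input always gives the same digest.
function sha256Hex(input) {
  return createHash("sha256").update(String(input), "utf8").digest("hex");
}

const storedToken = sha256Hex(1234567890);
const sameId = sha256Hex(1234567890) === storedToken;
const otherId = sha256Hex(1234567891) === storedToken;
console.log(storedToken, sameId, otherId);
```

Reloading the page and hashing the same id again yields the same string, which is the stable value the question asks for, at the cost of not being reversible.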
var hash = CryptoJS.SHA3(message);
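However, if you really do need to reverse the transformed value back into plaintext and not just compare it to another hashed value, then you do need encryption. In that case, you can use the cryptographically inferior ECB mode, which has the weaknesses described above. In CryptoJS, you can do this by supplying an options object with a mode property. Note that when the key is passed as a plain string, CryptoJS treats it as a passphrase and derives the actual key using a random salt, so the output still changes between runs; for repeatable output the key has to be a parsed WordArray (for example from CryptoJS.enc.Hex.parse):
CryptoJS.AES.encrypt(msg, key, { mode: CryptoJS.mode.ECB });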
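Both the determinism and the duplicate-block weakness of ECB can be observed directly. The sketch below assumes Node's built-in crypto module with AES-128-ECB and a raw random key rather than CryptoJS; `encryptEcb` and `ecbKey` are hypothetical names.

```javascript
const { createCipheriv, randomBytes } = require("node:crypto");

const ecbKey = randomBytes(16); // raw 128-bit key, no passphrase derivation involved

function encryptEcb(text) {
  const cipher = createCipheriv("aes-128-ecb", ecbKey, null); // ECB takes no IV
  return Buffer.concat([cipher.update(text, "utf8"), cipher.final()]).toString("hex");
}

// Same key and same plaintext give the same ciphertext on every call.
const once = encryptEcb("1234567890");
const again = encryptEcb("1234567890");

// Two identical 16-byte plaintext blocks produce two identical ciphertext blocks.
const repeated = encryptEcb("AAAAAAAAAAAAAAAA" + "AAAAAAAAAAAAAAAA");
const firstBlock = repeated.slice(0, 32);
const secondBlock = repeated.slice(32, 64);
console.log(once === again, firstBlock === secondBlock);
```

The second comparison is exactly the leak an eavesdropper can exploit: repeated plaintext blocks are visible in the ciphertext.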

JavaScript Games and Security

Let's say I'm making an HTML5 game using JavaScript and the <canvas> element. The variables, such as level, exp, current_map, and the like, are stored in the DOM.
Obviously, they can be edited client-side using Firebug. What would I have to do to maximize security, so it would be really hard to edit (and cheat)?

Don't store the variables in the DOM if you wish a reasonable level of security. JavaScript, even if obfuscated, can easily be reverse engineered. That defeats any local encryption mechanisms.
Store key variables server-side and use https to maximize security. Even so, the client code (JavaScript) is quite vulnerable to hacking.

You can use Object.freeze or a polyfill or a framework which does the hiding for you.
Check out http://netjs.codeplex.com/
You could also optionally implement some type of signing system but nothing is really impenetrable. For instance objects locked with Object.freeze or Object.watch can still be manually modified in memory.
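A small sketch of what Object.freeze does and does not give you. The `state` object and the `tryToCheat` helper are made up for illustration; the freeze is shallow, and nothing here stops a player from pausing the script in a debugger and swapping the whole object.

```javascript
// Returns "modified" if the write went through, or the error name if it was rejected.
function tryToCheat(target, prop, value) {
  "use strict"; // writes to frozen properties throw in strict mode
  try {
    target[prop] = value;
    return "modified";
  } catch (e) {
    return e.name;
  }
}

const state = Object.freeze({ level: 3, exp: 1200, inventory: { gold: 10 } });

const topLevel = tryToCheat(state, "level", 99);        // rejected: state is frozen
const nested = tryToCheat(state.inventory, "gold", 1e6); // allowed: freeze is shallow
console.log(topLevel, state.level, nested, state.inventory.gold);
```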
What are you really trying to accomplish in the end?
What you could do is send a representation of the matrix of the game or the game itself or a special hash or a combination of both and tally the score at the server... causing the user to not only have to modify the score but to correctly modify the state of the game.

Server-side game logic
You need to keep the sensitive data on the server and a local copy on the browser for display purposes only. Then for every action that changes these values the server should be the one responsible for verifying them. For example if the player needs to solve a puzzle you should never verify the solution client side, but take for example the hash value of the ordered pieces represented as a string and send it to the server to verify that the hash value is correct. Then increase the xp/level of the player and send the information back to the client.
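The puzzle flow described above might be sketched like this on the server side. It assumes Node's built-in crypto module; the piece names, the XP and level rules, and the `handleSolutionSubmit` handler are hypothetical, and a real server would keep one record per player in a database rather than a single in-memory object.

```javascript
const { createHash } = require("node:crypto");

// Hash of the ordered pieces represented as a string.
const hashPieces = (pieces) =>
  createHash("sha256").update(pieces.join(","), "utf8").digest("hex");

// Server-only state: the expected hash and the authoritative player record.
const expectedHash = hashPieces(["A", "B", "C", "D"]);
const player = { xp: 0, level: 1 };

// Hypothetical request handler: the client sends only the hash of its ordering.
function handleSolutionSubmit(submittedHash) {
  if (submittedHash !== expectedHash) return { ok: false, ...player };
  player.xp += 100; // the server, not the client, awards progress
  if (player.xp >= 100 * player.level) player.level += 1;
  return { ok: true, ...player }; // copy sent back for display only
}

const wrong = handleSolutionSubmit(hashPieces(["B", "A", "C", "D"]));
const right = handleSolutionSubmit(hashPieces(["A", "B", "C", "D"]));
console.log(wrong, right);
```

Editing the local copy of xp or level in the browser changes nothing, because the next response from the server overwrites it.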

Anything that lives in the client can be modified. That is why in an MMORPG the character's data lives on the server, so players can't hack their characters using memory tools, hex editors, etc. (they actually "can", but because the server keeps the correct version of the character's data, it is useless).
A good example was Diablo 2: you actually had two different characters: one for single player (and network play with other players where one of them was the server), and one for Battle.net. In the first case, people could "hack" the character's level and points just by editing the memory on the fly or the character file with a hex editor. But that wasn't possible with the character you were using on Battle.net.
Another simple example could be a quiz where you have a limited time to answer. If you handle everything on client side, players could hack it and modify the elapsed time and always get the best score: so you need to store the timestamp on the server as well, and use that value as comparison when you get the answer.
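For the quiz example, the server-side timestamp check could look roughly like this. The 30-second limit, the in-memory `startedAt` map, and the function names are illustrative assumptions; the `now` parameters exist only so the logic can be exercised without waiting.

```javascript
const TIME_LIMIT_MS = 30000; // illustrative limit
const startedAt = new Map(); // server-side: "player:question" -> start time

// Called when the server hands the question to the player.
function startQuestion(playerId, questionId, now = Date.now()) {
  startedAt.set(`${playerId}:${questionId}`, now);
}

// Called when the answer arrives; elapsed time is measured by the server only.
function submitAnswer(playerId, questionId, answer, correctAnswer, now = Date.now()) {
  const key = `${playerId}:${questionId}`;
  const start = startedAt.get(key);
  if (start === undefined) return { accepted: false, reason: "not started" };
  startedAt.delete(key); // one submission per question
  const elapsed = now - start;
  if (elapsed > TIME_LIMIT_MS) return { accepted: false, reason: "too late" };
  return { accepted: true, correct: answer === correctAnswer, elapsed };
}

startQuestion("p1", "q1", 1000);
const inTime = submitAnswer("p1", "q1", "42", "42", 6000);
startQuestion("p1", "q2", 1000);
const tooLate = submitAnswer("p1", "q2", "42", "42", 40000);
console.log(inTime, tooLate);
```

Any elapsed time reported by the client is simply ignored, so tampering with the local timer has no effect on the result.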
To sum up, it doesn't matter if it's JavaScript, C++ or Assembly: the rule is always "Don't rely on the client". If you need security for your game data, you have to use something the clients have no access to: the server.
