I want to generate a secure, ASCII-encoded random nonce for my JavaScript program (it has to work in WebKit). Preferably, I want to use an off-the-shelf tool. So far, the only answer I found was this:
Secure random numbers in javascript?
The only problem with this is that the result is raw random bytes, and I'm not sure how I can ASCII-encode them without sacrificing security (i.e., making them less random).
Edit: Math.random() is not cryptographically secure, and window.crypto only generates values in whole bytes (ASCII values are 7 bits, so a naively implemented encoding mechanism would reduce the randomness).
You can encode arbitrary byte arrays as Base64 to get pure ASCII.
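For example, combining the two (a sketch, assuming 16 bytes / 128 bits of entropy is enough for your nonce):

    // Generate 16 cryptographically secure random bytes (Web Crypto API,
    // supported by WebKit), then Base64-encode them into pure ASCII.
    var bytes = new Uint8Array(16);
    window.crypto.getRandomValues(bytes);
    var nonce = btoa(String.fromCharCode.apply(null, bytes));
    // nonce is 24 ASCII characters carrying the full 128 bits of entropy

Base64 maps every 3 bytes to 4 ASCII characters and is fully reversible, so no entropy is lost in the encoding.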
Related
I have a string that I know for sure has only ASCII letters. JS treats strings as UTF-8 by default, so it means that every character takes up to 4 bytes, which is 4 times ASCII.
I'm trying to compress / save space / get the shortest string possible by writing encode and decode functions. I thought about representing 4 ASCII characters in a single character of a UTF-8 string and thereby achieving my goal; is there anything like that? If not, what is the best way to compress ASCII strings so that encoding and then decoding returns the same string?
Actually JavaScript encodes program strings in UTF-16, which uses 2 octets (16 bits) for Unicode characters in the BMP (Basic Multilingual Plane) and 4 octets (32 bits) for characters outside it. So internally at least, ASCII characters use 2 bytes.
There is room to pack two ASCII characters into 16 bits, since they only use 7 bits each. Furthermore, two 7-bit characters need only 2**14 = 16384 of the 2**16 = 65536 available code points (a difference of 49152), while surrogates in UTF-16 occupy just 2048 of them, so you can easily devise an encoding scheme that avoids the range of code points used by surrogates.
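A sketch of that packing idea (helper names are illustrative; note that two 7-bit codes top out at 2**14 - 1 = 16383, well below the surrogate range starting at 0xD800, so the packed code units are always valid):

    // Pack two 7-bit ASCII codes into each 16-bit code unit.
    function pack(str) {
      if (str.length % 2) str += '\0';                 // pad to even length
      var out = '';
      for (var i = 0; i < str.length; i += 2) {
        out += String.fromCharCode((str.charCodeAt(i) << 7) | str.charCodeAt(i + 1));
      }
      return out;
    }

    // Reverse the packing: split each code unit back into two 7-bit codes.
    function unpack(packed) {
      var out = '';
      for (var i = 0; i < packed.length; i++) {
        var code = packed.charCodeAt(i);
        out += String.fromCharCode(code >> 7, code & 0x7F);
      }
      return out.replace(/\0$/, '');                   // drop the padding
    }

With this, unpack(pack('ABCD')) === 'ABCD', and pack('ABCD').length === 2.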
You could also use 8-bit typed arrays to hold ASCII characters while avoiding the complexity of a custom compression algorithm.
The purpose of compressing 7 bit ASCII for use within JavaScript is largely (entirely?) academic these days and not something there is a demand for. Note that encoding 7 bit ASCII content into UTF-8 (for transmission or file encoding) only uses one byte for ASCII characters due to the design of UTF-8.
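A sketch of the typed-array approach mentioned above:

    // One byte per ASCII character, no custom compression needed.
    var str = 'hello';
    var bytes = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) bytes[i] = str.charCodeAt(i);
    // ...and back:
    var restored = String.fromCharCode.apply(null, bytes); // 'hello'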
If you want to use 1 byte per character, you can simply use a byte array. There is already a function, String.fromCharCode (used in the sketch above), to build a string back from bytes.
I am trying to develop a hybrid mobile app with QR code functionality. A QR code can store only a limited number of characters, so I am wondering: is it possible to compress the string to make it shorter, so that I can store more info in the QR code?
At lengths that short, most compression algorithms will actually make data longer, not shorter. There are some algorithms which may work well, though… smaz comes to mind. However, it is going to depend heavily on what you are trying to compress, and you haven't really provided any information about that.
Instead of thinking about compression, your best bet may be to find an encoding scheme which makes more sense for your data. For example, if you're encoding a date and time, store it as a single number instead of text. Think about whether you really need seconds. If you are storing numbers, consider using variable-length quantities. If your data is JSON, consider using protobuf instead.
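For instance, a variable-length quantity stores small numbers in fewer bytes; a sketch of the encoder (names are illustrative; 7 data bits per byte, high bit meaning "more bytes follow"):

    function encodeVarint(n) {
      var bytes = [];
      do {
        var b = n % 128;                  // low 7 bits
        n = Math.floor(n / 128);          // shift right without 32-bit limits
        bytes.push(n > 0 ? b | 0x80 : b); // set the high bit if more follow
      } while (n > 0);
      return bytes;
    }
    // encodeVarint(300) -> [0xAC, 0x02]: two bytes instead of three digits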
If what you have really is text, it may be worth coming up with your own character set. Instead of ASCII, where each character is 8 bits, can you limit yourself to 64 characters? a-z, A-Z, 0-9, and two punctuation characters is only 64 possible symbols; if that is all you need, you could use a 6-bit encoding. If the strings aren't case-sensitive, you have tons of room for punctuation.
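A sketch of such a 6-bit packing (the 64-symbol alphabet here is an assumption; adapt it to your data):

    var ALPHABET =
      'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-';

    // Pack text over the 64-symbol alphabet into bytes, 6 bits per symbol.
    function packSixBit(text) {
      var acc = 0, nbits = 0, bytes = [];
      for (var i = 0; i < text.length; i++) {
        var sym = ALPHABET.indexOf(text.charAt(i));
        if (sym < 0) throw new Error('symbol not in alphabet: ' + text.charAt(i));
        acc = (acc << 6) | sym;
        nbits += 6;
        while (nbits >= 8) {               // flush whole bytes as they fill up
          nbits -= 8;
          bytes.push((acc >> nbits) & 0xFF);
        }
      }
      if (nbits > 0) bytes.push((acc << (8 - nbits)) & 0xFF); // pad last byte
      return new Uint8Array(bytes);        // 3/4 the size of the 8-bit original
    }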
I'm struggling to find any resources on this online, which is concerning.
I've been reading about UCS-2 and UTF-16 woes, but I can't find a solution.
I need to get a value from an input:
    var val = $('input').val();
and encode it to Base64, treating the text as UTF-16, so:

    this is a test

becomes:

    dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==

and not the below, which you get treating it as UTF-8:

    dGhpcyBpcyBhIHRlc3Q=
Your data, once read into JavaScript, sits in an encoding-less numerical format: each element of a string is just an identifying number (a 16-bit code unit), with no byte encoding attached. So: if you specifically need the data encoded as a UTF-16 byte sequence, serialise it to those bytes yourself, then Base64-encode that.
But here's the fun part: which UTF-16 do you need? Little- or big-endian? With or without a BOM? UTF-16 is a really inconvenient encoding format (we're not even going to touch UCS-2; it's been obsolete for a long time).
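The expected output in the question corresponds to little-endian UTF-16 without a BOM; a sketch of that serialisation (the function name is illustrative):

    // Serialise each 16-bit code unit low byte first (UTF-16LE, no BOM),
    // then Base64-encode the resulting byte sequence.
    function toBase64Utf16LE(str) {
      var bin = '';
      for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        bin += String.fromCharCode(code & 0xFF, code >> 8); // low, then high byte
      }
      return btoa(bin);
    }
    // toBase64Utf16LE('this is a test')
    //   === 'dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA=='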
What you most likely want is to get the text value from your HTML element, Base64-encode it, and have whatever receives that data unpack it as UTF-8; don't make JavaScript do more work than it has to. Presumably you're sending this data to a server, in which case your server language is far more capable than JavaScript and can unpack text in about a million different encodings thanks to built-in functions. So just use that; don't solve for Y when your real problem is X.
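For contrast, the UTF-8 route is a one-liner with a well-known (if ugly) idiom of the era:

    // Percent-encode to UTF-8, collapse the escapes back to raw bytes,
    // then Base64-encode — yielding the UTF-8 output from the question.
    var b64 = btoa(unescape(encodeURIComponent('this is a test')));
    // 'dGhpcyBpcyBhIHRlc3Q='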
The question is pretty simple: how much RAM in bytes does each character in an ECMAScript/JavaScript string consume?
I am going to guess two bytes, since the standard says they are stored as 16-bit unsigned integers?
Does this mean each character is always two bytes?
Yes, I believe that is the case. The characters are probably stored as wide strings or UCS-2 strings.
They may be UTF-16, in which case characters outside the BMP (Basic Multilingual Plane) take up two words (16-bit integers) each, but I believe those characters are not fully supported. Read this blog post about problems in the UTF-16 implementation of ECMAScript.
Most modern languages store their strings with two-byte characters. This way you have full support for all spoken languages. It costs a little extra memory, but that's peanuts for any modern computer with multi-gigabyte RAM. Storing the string in the more compact UTF-8 would make processing more complex and slower. UTF-8 is therefore mostly used for transport only. ASCII supports only the Latin alphabet without diacritics. ANSI is still limited and needs a specified code page to make sense.
Section 4.3.16 of ECMA-262 explicitly defines a "String value" as a "primitive value that is a finite ordered sequence of zero or more 16-bit unsigned integers". It suggests that programs use these 16-bit values as UTF-16 text, but it is legal simply to use a string to store any immutable array of unsigned shorts.
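Per that definition, a string element need not even be valid text (a quick illustration):

    // Any 16-bit values round-trip, even a lone surrogate (0xD800),
    // which would be illegal in well-formed UTF-16 text.
    var s = String.fromCharCode(0xFFFF, 0x0000, 0xD800);
    s.length;        // 3
    s.charCodeAt(2); // 55296 (0xD800)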
Note that character size isn't the only thing that makes up the string size. I don't know about the exact implementation (and it might differ), but strings tend to have a 0x00 terminator to make them compatible with PChars. And they probably have some header that contains the string size and maybe some refcounting and even encoding information. A string with one character can easily consume 10 bytes or more (yes, that's 80 bits).
I want to encode a number that will be at most 10 digits long, using a key made of ASCII characters. The encoded string should be decodable with the same key, which should return the original number.
10 => encoding_with("secret_pass") => hash => decoding_with("secret_pass") => 10
Both operations should work the same way in Javascript and Ruby.
What algorithm should I use for this purpose?
All data should be in ASCII, no multibyte data for input, hash, and the key.
A simple XOR should be sufficient.
JavaScript
Ruby
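A minimal sketch of the idea in JavaScript (the Ruby version mirrors it; function names are illustrative):

    // XOR each character of the number-as-string with the key, then
    // hex-encode so the result stays plain ASCII. Decoding reverses it.
    function xorEncode(numStr, key) {
      var out = '';
      for (var i = 0; i < numStr.length; i++) {
        var c = numStr.charCodeAt(i) ^ key.charCodeAt(i % key.length);
        out += ('0' + c.toString(16)).slice(-2);   // two hex digits per byte
      }
      return out;
    }

    function xorDecode(hex, key) {
      var out = '';
      for (var i = 0; i < hex.length / 2; i++) {
        var c = parseInt(hex.slice(2 * i, 2 * i + 2), 16) ^
                key.charCodeAt(i % key.length);
        out += String.fromCharCode(c);
      }
      return out;
    }
    // xorDecode(xorEncode('10', 'secret_pass'), 'secret_pass') === '10'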
Yes, you can go overboard and break out the full crypto suite, but since one part of the algorithm will run in-browser, in JS (a completely untrusted environment), any attempt at Serious Cryptography™ will give you a false sense of security (in other words, it will actually make your system less secure).
If you're trying to protect the data in transit, use the tool that's made for the job; in this case, HTTPS.
I would look into some form of symmetric-key encryption; the most prominent one is AES. AES is pretty much the standard and is implemented in both languages. Just make sure to use the same key and salt for encryption and decryption.
Javascript AES
Ruby AES
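A sketch assuming the CryptoJS library on the JavaScript side (its passphrase mode derives the key and a random salt for you, and the output is Base64, hence pure ASCII):

    // Encrypt and decrypt with AES via CryptoJS (library assumed loaded).
    var ciphertext = CryptoJS.AES.encrypt('10', 'secret_pass').toString();
    var plaintext  = CryptoJS.AES.decrypt(ciphertext, 'secret_pass')
                             .toString(CryptoJS.enc.Utf8); // '10'

Matching this from Ruby means reproducing the same key derivation (CryptoJS's passphrase mode follows OpenSSL's EVP_BytesToKey), so make sure both sides agree on it.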
If you want it to be simple, and security isn't a great concern (given the javascript... observation in the comments), the easiest way would be to simply generate a random value with more digits than the number and simply XOR it with the number. This assumes the key (= the random number) has previously been shared with the other program.
1. generate a random number with 10 hex digits -> KEY
2. take the number and do (number XOR key) -> result
3. send the result
4. get the result and do (result XOR key) -> number
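A sketch of those steps (note that JS bitwise operators truncate to 32 bits, so a 10-hex-digit / 40-bit key needs BigInt; the values below are illustrative):

    const key    = 0x3fa94c217en;        // step 1: random 10-hex-digit key
    const result = 1234567890n ^ key;    // step 2: number XOR key
    //                                      step 3: send result
    const number = result ^ key;         // step 4: XOR again -> 1234567890n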
Something better (stronger) would be to use a public/private-key system: exchange keys, encrypt with the public key on one side, decrypt with the private key on the other.