correct way to exclude certain characters from crypto.randomBytes - javascript

I have the following code, based on http://nodejs.org/docs/v0.6.9/api/crypto.html#randomBytes
crypto.randomBytes 32, (ex, buf) ->
  user.tokenString = buf.toString("hex")
  user.tokenExpires = Date.now() + TOKEN_TIME
  next()
I am using this to generate a tokenString for Node.js/Express user validation.
In some cases the generated tokenString includes a '/' (forward slash) character, and this breaks my routes. For example, if the tokenString is '$2a$10$OYJn2r/Ts.guyWqx7iJTwO8cij80m.uIQV9nJgTt18nqu8lT8OqPe', the router looks for /user/activate/$2a$10$OYJn2r, cannot find it, and I get a 404 error.
Is there a more direct way to exclude certain characters when generating the crypto.randomBytes output?

crypto.randomBytes generates random bytes. That has nothing to do with characters; characters are determined by the way we look at the bytes.
For example:
user.tokenString = buf.toString("hex")
This converts the buffer to a string in which two characters represent each byte, using only the characters 0-9 and a-f.
Another approach, which might suit you better, is to use a more compact encoding. Base64Url is a string encoding that is URL- and filename-safe:
user.tokenString = base64url(buf)
Here is an NPM package you can use for it.
Other than that, your code seems fine. If you were to call .toString() without specifying "hex" or specifying something like "ascii" for example, it would break just like in your question description.
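A minimal sketch of both encodings in plain JavaScript, assuming only Node's built-in crypto module. The helper name makeUrlSafeToken is made up for illustration, and the manual replace() calls stand in for whichever base64url package you pick:

```javascript
const crypto = require('crypto');

// Hypothetical helper: encode random bytes as unpadded, URL-safe base64.
function makeUrlSafeToken(byteCount) {
  return crypto.randomBytes(byteCount)
    .toString('base64')
    .replace(/\+/g, '-')  // '+' becomes '-'
    .replace(/\//g, '_')  // '/' becomes '_'
    .replace(/=+$/, '');  // drop trailing '=' padding
}

// 32 random bytes as hex: 64 characters drawn from 0-9 and a-f.
const hexToken = crypto.randomBytes(32).toString('hex');

// 32 random bytes as URL-safe base64: 43 characters from A-Z, a-z, 0-9, '-' and '_'.
const urlToken = makeUrlSafeToken(32);

console.log(hexToken);
console.log(urlToken);
```

Neither form can contain a '/', so either one can be used as a single path segment in a route such as /user/activate/:token.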

Related

Why is split and join doing this?

I'm getting some very weird behaviour that I don't understand using JavaScript split and join. I'm sending names with spaces in an API call but using %3 as the delimiter for spaces; for example, if I'm sending "Ammar Ahmed", the API call looks like: api/v1?q=Ammar%3Ahmed. In the server code, I split it up again with q.split("%3").join(" ") because the database contains names with spaces. For one name in particular, "Ashwini Bettahalsoor", I'm getting "Ashwini;ettahalsoor". I'm very confused about why it's doing this: it's splitting it including the B and joining it with a ;, but it works perfectly normally for all names where the last name does not start with B. I'm sure it has something to do with the letter B, but first of all I'm curious as to why this is happening, and secondly, I'm wondering what I should use instead of %3 for spaces in the API call.
%3 is not the correct encoding for a space. You're getting ; because %3B is the encoding for that character. URI encoding always uses 2 hex digits.
You should use encodeURIComponent() to generate the correct encoding.
let url = 'api/v1?q=' + encodeURIComponent('Ashwini Bettahalsoor');
And on the server you should use middleware that decodes the query parameters for you, rather than using split() and join() explicitly.
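A small sketch of the round trip, using only built-ins. URLSearchParams stands in here for whatever query-parsing middleware the server uses, which is an assumption since the question does not name a framework:

```javascript
const name = 'Ashwini Bettahalsoor';

// Client side: a space is encoded as the two-hex-digit escape %20.
const url = 'api/v1?q=' + encodeURIComponent(name);

// Why the hand-rolled "%3" delimiter fails: "%3B" is already the escape for ';',
// so standard decoding consumes the B along with the "%3".
const broken = decodeURIComponent('Ashwini%3Bettahalsoor');

// Server side: let a standard parser decode the query string.
const query = url.split('?')[1];
const decoded = new URLSearchParams(query).get('q');

console.log(url);     // api/v1?q=Ashwini%20Bettahalsoor
console.log(broken);  // Ashwini;ettahalsoor
console.log(decoded); // Ashwini Bettahalsoor
```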

How to fix an invalid random string to make it JSON valid

In JavaScript, I need to "fix" a string that is supposed to be valid JSON but may not be. The string has the following format (the unknown part is marked with "<INVALID_CHARS>"):
[
{ "key_1": "ok_data", "key_2": "something_valid <INVALID_CHARS>"},
{ "key_1": "ok_data", "key_2": "some_valid_value"}
]
"INVALID_CHARS" are chars which make the JSON.parse() function fail.
The errors are always localized in the "key_2" property of the array's elements.
Note that these chars come from random binary data, and can thus be anything.
I would like to find the simplest solution, or at least one which is the least prone to errors.
I thought of replacing the invalid characters, but there is also a problem with quote characters and with single backslashes followed by a non-special character, which throw an error too.
And I probably did not think of all the possible errors.
Thank you.
JSON is not allowed to contain arbitrary binary data; it must be a sequence of valid Unicode codepoints. (Usually these are transmitted in UTF-8 encoding, but regardless, arbitrary binary data is not possible.) So if you want to include arbitrary binary data you'll need to figure out how to unambiguously encode it for transmission. If you don't encode it in some way, then you won't be able to reliably distinguish a byte which happens to have the same code as " from the " which terminates the string.
There are a number of possible encodings you might use for which standard libraries exist in most languages. One of the most commonly used is base-64.
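A minimal sketch of that approach in Node.js, assuming the producer can be changed to encode the binary part before building the JSON. The field name key_2_raw and the sample bytes are invented for illustration:

```javascript
// Sample "random binary data": a quote, a backslash, NUL, 0xFF and a newline.
const raw = Buffer.from([0x22, 0x5c, 0x00, 0xff, 0x0a]);

// Producer: keep the valid text as-is and carry the binary part as base64.
const payload = JSON.stringify([
  { key_1: 'ok_data', key_2: 'something_valid', key_2_raw: raw.toString('base64') },
  { key_1: 'ok_data', key_2: 'some_valid_value' },
]);

// Consumer: JSON.parse now always succeeds, and the bytes are recovered exactly.
const parsed = JSON.parse(payload);
const restored = Buffer.from(parsed[0].key_2_raw, 'base64');

console.log(restored.equals(raw)); // true
```

Because base64 output only uses letters, digits, '+', '/' and '=', it can never contain a quote or backslash that would terminate or corrupt the JSON string.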
It would be better to clarify the problem, as you seem to describe a wide range of issues here. If the problem is with parsing the structure above, you just need to check the syntactic integrity of the structure. For example, this structure parses fine:
// A template literal (backticks) is needed because the string spans several lines.
let var1 = JSON.parse(`[
  {
    "key_1":"ok_data",
    "key_2":"something_valid <INVALID_CHARS>"
  },
  {
    "key_1":"ok_data",
    "key_2":"some_valid_value"
  }
]`);
If you need to carry <INVALID_CHARS>, which is binary data, inside JSON, you can encode it as base64, which is the most reliable way. But I suspect the problem is not only about packing <INVALID_CHARS> into base64; it is also architectural, and you need to build the value of key_2 from a valid part and an invalid part. For that, I would suggest splitting key_2 into two substrings separated by " ": "key_2": "something_valid <INVALID_CHARS>(can be omitted)".
Moreover, you can use separate fields, one for the string without errors and a second for the invalid part, like this: "key_2_1": "something_valid", "key_2_2": <INVALID_CHARS>
Another option, if it is possible, is to use multipart form data to transfer the binary data.

Javascript encodeURI() vs. PHP rawurldecode() and special characters

Encoding a string with German umlauts like ä,ü,ö,ß with Javascript encodeURI() causes a weird bug after decoding it in PHP with rawurldecode(). Although the string seems to be correctly decoded it isn't. See below example screenshots from my IDE
Also the strlen() of the - with rawurldecode() - decoded string gives more characters than it really has!
Problems occur when I need to process the decoded string, for example if I want to replace the German characters ä,ü,ö with ae, ue and oe. This can be seen in the example provided here.
I have also made a PHP fiddle where this whole weirdness can be seen.
What I've tried so far:
- utf8_decode
- iconv
- and also first two suggestions from here
This is a Unicode equivalence issue, and it looks like your IDE doesn't handle multibyte strings very well.
In Unicode you can represent Ü with either:
the single Unicode codepoint (U+00DC), which is %C3%9C in UTF-8
or a capital U (U+0055) followed by a combining diaeresis (U+0308), which is %55%CC%88 in UTF-8
Your GWT string uses the latter form, called NFD, while your string from PHP uses the first form, called NFC. That's why your GWT string is 3 characters longer, even though both are valid encodings of logically identical Unicode strings. Your problem is that they are not identical byte for byte in PHP.
More details about utf-8 normalisation.
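The two forms can be reproduced on the JavaScript side with the built-in String.prototype.normalize(). This is only an illustrative sketch; normalizing to NFC on the client before calling encodeURI() is one possible option, not something the question's code is known to do:

```javascript
const composed = '\u00DC';    // NFC: single codepoint U+00DC
const decomposed = 'U\u0308'; // NFD: 'U' followed by combining diaeresis U+0308

// Both render as the same letter but are different strings.
console.log(composed === decomposed);         // false
console.log(composed.length, decomposed.length); // 1 2

// Their percent-encoded UTF-8 bytes differ as well.
// encodeURIComponent leaves the plain 'U' unescaped, so %55 appears as 'U'.
console.log(encodeURIComponent(composed));    // %C3%9C
console.log(encodeURIComponent(decomposed));  // U%CC%88

// Normalizing to one form makes them comparable again.
console.log(decomposed.normalize('NFC') === composed); // true
console.log(composed.normalize('NFD') === decomposed); // true
```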
If you want to do preg replacements on the strings you need to normalise them to the same form first. From your example I can see your IDE is using NFC since it's the PHP string that works. So I suggest normalising to NFC form in PHP (the default), then doing the preg_replace.
http://php.net/manual/en/normalizer.normalize.php
function cleanImageName($name)
{
$name = Normalizer::normalize( $name, Normalizer::FORM_C );
$clean = preg_replace(
Otherwise you have to do something like this which is based on this article.

Websafe encoding of hashed string in nodejs

I am creating a re-director of sorts in nodejs. I have a few values like
userid // superid
These I would like to hash, to prevent users from retrieving the URL and faking someone else's URL, and also base64-encode, to minimize the length of the URL created.
http://myurl.com/~hashedtoken
where un-hashed hashtoken could be something like this
55q322q23
55 = userid
I thought about using crypto library like so:
crypto.createHash('md5').update("55q322q23").digest("base64");
which returns: u/mxNJQaSs2HYJ5wirEZOQ==
The problem here is that I have the / which is not considered websafe so I would like to strip the un-safe letters from the base64 list of letters, somehow. Any ideas about this or perhaps a better solution to the problem at hand?
You could use a so-called URL-safe variant of Base64. The most common variant, described in RFC 4648, uses - and _ instead of + and / respectively, and omits padding characters (=).
Most implementations of Base64 support this URL safe variant too, though if yours doesn't, it's easy enough to do manually.
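Doing it manually could look like the sketch below. The function names toUrlSafeBase64 and fromUrlSafeBase64 are invented for this example, and the input is the digest string quoted in the question:

```javascript
// Convert standard base64 to the URL-safe alphabet and drop the padding.
function toUrlSafeBase64(b64) {
  return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Reverse the mapping and restore padding to a multiple of 4 characters.
function fromUrlSafeBase64(safe) {
  const b64 = safe.replace(/-/g, '+').replace(/_/g, '/');
  const padding = (4 - (b64.length % 4)) % 4;
  return b64 + '='.repeat(padding);
}

const digest = 'u/mxNJQaSs2HYJ5wirEZOQ=='; // value taken from the question
const safe = toUrlSafeBase64(digest);
const back = fromUrlSafeBase64(safe);

console.log(safe); // u_mxNJQaSs2HYJ5wirEZOQ
console.log(back); // u/mxNJQaSs2HYJ5wirEZOQ==
```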
Here's what I used. Comments welcome :-)
The important bit is buffer.toString('base64'), then URL-safeing that base64 string with a couple of replace()s.
function newId() {
  // Get random string with 20 bytes secure randomness
  var crypto = require('crypto');
  var id = crypto.randomBytes(20).toString('base64');
  // Make URL safe
  return id.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
Based on the implementation here.
Makes a string safe for URLs and for the local part of email addresses (before the @).

Using Crockford's base 32 for IDs in URLs?

I'd like to write some IDs for use in URLs in Crockford's base32. I'm using the base32 npm module.
So, for example, if the user types in http://domain/page/4A2A I'd like it to map to the same underlying ID as http://domain/page/4a2a
This is because I want human-friendly URLs, where the user doesn't have to worry about the difference between upper- and lower-case letters, or between "l" and "1" - they just get the page they expect.
But I'm struggling to implement this, basically because I'm too dim to understand how encoding works. First I tried:
var encoded1 = base32.encode('4a2a');
var encoded2 = base32.encode('4A2A');
console.log(encoded1, encoded2);
But they map to different underlying IDs:
6hgk4r8 6h0k4g8
OK, so maybe I need to use decode?
var encoded1 = base32.decode('4a2a');
var encoded2 = base32.decode('4A2A');
console.log(encoded1, encoded2);
No, that just gives me empty strings:
" "
What am I doing wrong, and how can I get 4a2a and 4A2A to map to the same thing?
For an incoming request, you'll want to decode the URL fragment. When you create URLs, you will take your identifier and encode it. So, given a URL http://domain/page/dnwnyub46m50, you will take that fragment and decode it. Example:
#> echo 'dnwnyub46m50'| base32 -d
my_id5
The library you linked to is case-insensitive, so you get the same result this way:
echo 'DNWNYUB46M50'| base32 -d
my_id5
When dealing with any encoding scheme (Base-16/32/64), you have two basic operations: encode, which works on a raw stream of bits/bytes, and decode which takes an encoded set of bytes and returns the original bit/byte stream. The Wikipedia page on Base32 encoding is a great resource.
When you decode a string, you get raw bytes: it may be that those bytes are not compatible with ASCII, UTF-8, or some other encoding which you are trying to work with. This is why your decoded examples look like spaces: the tools you are using do not recognize the resulting bytes as valid characters.
How you go about encoding identifiers depends on how your identifiers are generated. You didn't say how you were generating the underlying identifiers, so I can't make any assumptions about how you should handle the raw bytes that come out of the decoder, nor about the content of the raw bytes being passed into the encoder.
It's also important to mention that the library you linked to is not compatible with Crockford's Base32 encoding. The library excludes I, L, O, S, while Crockford's encoding excludes I, L, O, U. This would be a problem if you were trying to interoperate with another system that used a different library. If no one besides you will ever need to decode your URL fragments, then interoperability doesn't matter.
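If the underlying IDs happen to be non-negative integers (an assumption, since the question does not say), Crockford's alphabet can also be handled by hand. The sketch below is not the API of the base32 npm module; encodeCrockford and decodeCrockford are hypothetical names, and the code follows Crockford's rules that decoding ignores case and hyphens and reads O as 0 and I or L as 1:

```javascript
// Crockford's 32 symbols: digits plus letters, without I, L, O and U.
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Decode a Crockford base32 string to an integer (safe up to 2^53 - 1).
function decodeCrockford(text) {
  const normalized = text
    .toUpperCase()
    .replace(/-/g, '')     // hyphens are only for readability
    .replace(/O/g, '0')    // letter O reads as zero
    .replace(/[IL]/g, '1'); // letters I and L read as one
  let value = 0;
  for (const symbol of normalized) {
    const digit = CROCKFORD.indexOf(symbol);
    if (digit === -1) throw new Error('Invalid symbol: ' + symbol);
    value = value * 32 + digit;
  }
  return value;
}

// Encode a non-negative integer as a Crockford base32 string.
function encodeCrockford(value) {
  if (value === 0) return '0';
  let out = '';
  while (value > 0) {
    out = CROCKFORD[value % 32] + out;
    value = Math.floor(value / 32);
  }
  return out;
}

console.log(decodeCrockford('4a2a'), decodeCrockford('4A2A')); // 141386 141386
console.log(encodeCrockford(141386));                          // 4A2A
```

With this approach the route handler decodes the URL fragment to a number and looks the page up by that number, so 4a2a, 4A2A and 4A-2A all reach the same record.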
The source of your confusion is that base64 and base32 are methods of representing numbers, whereas in your examples you are attempting to encode or decode text strings.
Encoding and decoding text strings as base32 is done by first converting the string into a large number. In your first examples, where you are encoding "4a2a" and "4A2A", those are strings with two different numeric values, which consequently translate to encoded base32 numbers with two different values, 6hgk4r8 and 6h0k4g8.
When you "decode" 4a2a and 4A2A, you say you get empty strings. However, this is not true: the strings are not empty, they contain what the decoded number looks like when interpreted as a string. Which is to say, it looks like nothing because 4a2a produces an unprintable character. It's invisible. What you want is to feed the encoder numbers, not strings.
JavaScript has
parseInt(num, 32)
and
num.toString(32)
built in in a way that's compatible with Java and across JavaScript versions.
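A short sketch of those two built-ins applied to the IDs from the question. Note that radix 32 in JavaScript uses the digits 0-9 and the letters a-v, which is not Crockford's alphabet, so this handles upper and lower case but does not treat "l" and "1" as the same symbol:

```javascript
// parseInt ignores letter case, so both spellings give the same number.
const lower = parseInt('4a2a', 32);
const upper = parseInt('4A2A', 32);

console.log(lower, upper);      // 141386 141386
console.log(lower === upper);   // true

// toString(32) turns the number back into its canonical lowercase form.
console.log(lower.toString(32)); // 4a2a
```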
