Why use encodeURIComponent() when writing json to a cookie - javascript

In particular, when saving JSON to a cookie, is it safe to just save the raw value?
The reason I don't want to encode is that the JSON has small values and keys but a complex structure, so encoding, which replaces all the ", : and {} characters, greatly increases the string length.

If your values contain "JSON characters" (e.g. commas, quotes, [] etc.) then you should probably use encodeURIComponent so these get escaped and don't break your code when reading the values back.
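As a rough illustration of what encodeURIComponent does to those "JSON characters" (the sample object is mine; the output is the standard behaviour):

encodeURIComponent('{"a":[1,2],"b":"x;y"}');
// => "%7B%22a%22%3A%5B1%2C2%5D%2C%22b%22%3A%22x%3By%22%7D"
// { } " : [ ] , and ; are all percent-encoded, so none of them can collide with the
// cookie syntax; decodeURIComponent on read restores the original JSON exactly.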

You can convert your JSON object to a string using the JSON.stringify() method then save it in a cookie.
Note that cookies are limited to roughly 4 KB (4096 bytes per cookie, including the name and attributes).
If your JSON string is valid there should be no need to encode it.
e.g.
JSON.stringify({a:'foo"bar"',bar:69});
=> '{"a":"foo\"bar\"","bar":69}' (quotes inside valid JSON strings are escaped)

This is documented very well on MDN
To avoid unexpected requests to the server, you should call encodeURIComponent on any user-entered parameters that will be passed as part of a URI. For example, a user could type "Thyme &time=again" for a variable comment. Not using encodeURIComponent on this variable will give comment=Thyme%20&time=again. Note that the ampersand and the equal sign mark a new key and value pair. So instead of having a POST comment key equal to "Thyme &time=again", you have two POST keys, one equal to "Thyme " and another (time) equal to again.

If you can't be certain that your JSON will not include reserved characters such as ; then you will want to perform escaping on any strings being stored as a cookie. RFC 6265 covers special characters that are not allowed in the cookie-name or cookie-value.
If you are encoding static content you control, then this escaping may be unnecessary. If you are encoding dynamic content such as user-generated content, you probably need escaping.
MDN recommends using encodeURIComponent to escape any disallowed characters.
You can pull in a library such as cookie to handle this for you, but if your server is written in another language you will need to ensure it uses a library or language utilities to encodeURIComponent when setting cookies and to decodeURIComponent when reading cookies.
JSON.stringify is not sufficient as illustrated by this trivial example:
const stringified = JSON.stringify({ "description": "foo; bar; baz" });
document.cookie = `bio=${stringified}`;
// Notice that everything after the first `;` in the value is dropped.
// Attempting to JSON.parse the stored value later will fail.
console.log(document.cookie) // bio={"description":"foo
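For comparison, a minimal sketch of the same example with the encoding applied (same hypothetical bio cookie as above):

const stringified = JSON.stringify({ "description": "foo; bar; baz" });
document.cookie = `bio=${encodeURIComponent(stringified)}`;  // ";" becomes %3B, so nothing is truncated

const stored = document.cookie.split("; ").find(c => c.startsWith("bio="));
console.log(JSON.parse(decodeURIComponent(stored.slice("bio=".length))));
// { description: "foo; bar; baz" }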

Cookie: name=value; name2=value2
Spaces are part of the cookie separation in the HTTP Cookie header. Raw spaces in cookie values could thus confuse the server.

Related

What does colon ':' in API request mean?

I am using Firebase for my project. The documentation gives me the endpoint for signing in users as:
https://identitytoolkit.googleapis.com/v1/accounts:signInWithPassword?key=[API_KEY]
I want to know what the colon : means. For example, the word key after the question mark shows it's a parameter; likewise, what does the notation accounts:signInWithPassword mean? The reason: I have an axios instance with this config:
axios.create(
{
  baseURL: "https://identitytoolkit.googleapis.com/v1",
  params: {
    apiKey: "somekey"
  }
})
Now, since the baseURL shown above remains the same for Firebase sign-in with email and password and sign-up with email and password, I want to dynamically embed accounts:signInWithPassword and accounts:signUp for the respective requests, and I am not sure if specifying accounts:respectiveUsecase in the params object would work.
A colon doesn't have any special meaning in a URL path. It's just a convention those APIs tend to use in their paths.
There are a handful of metacharacters that do:
question marks (?) and hashes (#) delimit the query or search parts
% is used for escaping characters (e.g. %0A)
+ is sometimes an encoding for a space instead of %20.
& generally separates query parameters (e.g. foo=bar&baz), though this is not a part of the standard. Some server software could expect e.g. semicolon-separated parameters.
As @deceze pointed out, colons do have a special meaning in the host part, e.g. https://user:pass@host/path:where:colons:do:not:matter.
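A hedged sketch of how that could look with the axios instance from the question (the request body fields and placeholder credentials are assumptions based on the usual Firebase REST flow; note the documented query parameter is named key):

const api = axios.create({
  baseURL: "https://identitytoolkit.googleapis.com/v1",
  params: { key: "somekey" }  // the endpoint in the docs uses ?key=[API_KEY]
});

const email = "user@example.com";  // placeholder credentials
const password = "secret";

// The colon is just an ordinary character in the path segment,
// so the two use cases are simply different relative URLs:
api.post("/accounts:signInWithPassword", { email, password, returnSecureToken: true });
api.post("/accounts:signUp", { email, password, returnSecureToken: true });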
It is a dynamic value (like a parameter where you pass in a value directly).
:nounId: The colon (:) before the word indicates that we don't mean the literal string "nounId" as part of the endpoint, but rather that we are expecting some dynamic data to be inside there. From the above example of /ski/:skiId, one actual endpoint might be something like /ski/1234 (where 1234 is the unique ID number of one of the skis in our database).
source: https://coursework.vschool.io/rest-api-design/#:~:text=%3AnounId%20%3A%20The%20colon%20(%3A)%20before,data%20to%20be%20inside%20there.

NodeJS escaping back slash

I am facing some issues with escaping of backslashes; below is the code snippet I have tried. The issue is how to assign a variable with an escaped backslash to another variable.
var s = 'domain\\username';
var options = {
  user: ''
};
options.user = s;
console.log(s); // Output : domain\username - CORRECT
console.log(options); // Output : { user: 'domain\\username' } - WRONG
Why are both slashes showing when I print the options object?
I have a feeling that I am doing something really basic wrong here.
Update:
When I use this options object, the value is passed as it is (with double slashes). I am using it with my SOAP services and getting a 401 error due to an invalid user property value.
But when I tried the same with PHP code using the same user value, it gives a proper response; in PHP we also escape the value with two slashes.
When you console.log() an object, it is first converted to string using util.inspect(). util.inspect() formats string property values as literals (much like if you were to JSON.stringify(s)) to more easily/accurately display strings (that may contain control characters such as \n). In doing so, it has to escape certain characters in strings so that they are valid Javascript strings, which is why you see the backslash escaped as it is in your code.
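A quick sketch you can run in Node to convince yourself that only one backslash is actually stored (the second one only exists in the inspected output):

const s = 'domain\\username';
console.log(s);           // domain\username (printed as-is)
console.log(s.length);    // 15 -- "domain" + one backslash + "username"
console.log({ user: s }); // { user: 'domain\\username' } -- util.inspect re-escapes it for display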
The output is correct.
When you set the variable, the escaped backslash is interpreted into a single codepoint.
However, options is an object which, when logged, appears as a JSON blob. The backslash is re-escaped at this point, as this is the only way the backslash can appear validly as a string value within the JSON output.
If you re-read the JSON output from console.log(options) into javascript (using JSON.parse() or similar) and then output the user key, only one backslash will show.
(Following question edit:)
It is possible that for your data to be accepted by the SOAP consuming service, the data needs to be explicitly escaped in-band. In this case, you will need to double-escape it when assigning the value:
var s = 'domain\\\\user'
To definitively determine whether you need to do this or not, I'd suggest you put a proxy between your working PHP app and the SOAP app, and inspect the traffic.

javascript fetch GET request changing cookie value [duplicate]

What are the allowed characters in both cookie name and value? Are they same as URL or some common subset?
Reason I'm asking is that I've recently hit some strange behavior with cookies that have - in their name and I'm just wondering if it's something browser specific or if my code is faulty.
According to the ancient Netscape cookie_spec the entire NAME=VALUE string is:
a sequence of characters excluding semi-colon, comma and white space.
So - should work, and it does seem to be OK in browsers I've got here; where are you having trouble with it?
By implication of the above:
= is legal to include, but potentially ambiguous. Browsers always split the name and value on the first = symbol in the string, so in practice you can put an = symbol in the VALUE but not the NAME.
What isn't mentioned, because Netscape were terrible at writing specs, but seems to be consistently supported by browsers:
either the NAME or the VALUE may be empty strings
if there is no = symbol in the string at all, browsers treat it as the cookie with the empty-string name, ie Set-Cookie: foo is the same as Set-Cookie: =foo.
when browsers output a cookie with an empty name, they omit the equals sign. So Set-Cookie: =bar begets Cookie: bar.
commas and spaces in names and values do actually seem to work, though spaces around the equals sign are trimmed
control characters (\x00 to \x1F plus \x7F) aren't allowed
What isn't mentioned and browsers are totally inconsistent about, is non-ASCII (Unicode) characters:
in Opera and Google Chrome, they are encoded to Cookie headers with UTF-8;
in IE, the machine's default code page is used (locale-specific and never UTF-8);
Firefox (and other Mozilla-based browsers) use the low byte of each UTF-16 code point on its own (so ISO-8859-1 is OK but anything else is mangled);
Safari simply refuses to send any cookie containing non-ASCII characters.
so in practice you cannot use non-ASCII characters in cookies at all. If you want to use Unicode, control codes or other arbitrary byte sequences, the cookie_spec demands you use an ad-hoc encoding scheme of your own choosing, and suggests URL-encoding (as produced by JavaScript's encodeURIComponent) as a reasonable choice.
In terms of actual standards, there have been a few attempts to codify cookie behaviour but none thus far actually reflect the real world.
RFC 2109 was an attempt to codify and fix the original Netscape cookie_spec. In this standard many more special characters are disallowed, as it uses RFC 2616 tokens (a - is still allowed there), and only the value may be specified in a quoted-string with other characters. No browser ever implemented the limitations, the special handling of quoted strings and escaping, or the new features in this spec.
RFC 2965 was another go at it, tidying up 2109 and adding more features under a ‘version 2 cookies’ scheme. Nobody ever implemented any of that either. This spec has the same token-and-quoted-string limitations as the earlier version and it's just as much a load of nonsense.
RFC 6265 is an HTML5-era attempt to clear up the historical mess. It still doesn't match reality exactly but it's much better than the earlier attempts: it is at least a proper subset of what browsers support, not introducing any syntax that is supposed to work but doesn't (like the previous quoted-string).
In 6265 the cookie name is still specified as an RFC 2616 token, which means you can pick from the alphanums plus:
!#$%&'*+-.^_`|~
In the cookie value it formally bans the (filtered by browsers) control characters and (inconsistently-implemented) non-ASCII characters. It retains cookie_spec's prohibition on space, comma and semicolon, plus for compatibility with any poor idiots who actually implemented the earlier RFCs it also banned backslash and quotes, other than quotes wrapping the whole value (but in that case the quotes are still considered part of the value, not an encoding scheme). So that leaves you with the alphanums plus:
!#$%&'()*+-./:<=>?@[]^_`{|}~
In the real world we are still using the original-and-worst Netscape cookie_spec, so code that consumes cookies should be prepared to encounter pretty much anything, but for code that produces cookies it is advisable to stick with the subset in RFC 6265.
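As a practical takeaway, a minimal sketch of producing values inside that subset (encodeURIComponent only ever emits characters that RFC 6265 allows in a cookie-value; the helper names are mine):

// Writing: percent-encoding keeps the value inside the RFC 6265 cookie-octet set.
// (Assumes `name` itself is a plain token: letters, digits, - and _.)
const setCookie = (name, value) => {
  document.cookie = `${name}=${encodeURIComponent(value)}`;
};

// Reading: be liberal about what you accept, per the advice above, and decode whatever comes back.
const getCookie = (name) => {
  const hit = document.cookie.split("; ").find(c => c.startsWith(name + "="));
  return hit ? decodeURIComponent(hit.slice(name.length + 1)) : undefined;
};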
In ASP.Net you can use System.Web.HttpUtility to safely encode the cookie value before writing to the cookie and convert it back to its original form on reading it out.
// Encode
HttpUtility.UrlEncode(cookieData);
// Decode
HttpUtility.UrlDecode(encodedCookieData);
This will stop ampersands and equals signs splitting a value into a bunch of name/value pairs as it is written to a cookie.
I think it's generally browser specific. To be on the safe side, base64 encode a JSON object, and store everything in that. That way you just have to decode it and parse the JSON. All the characters used in base64 should play fine with most, if not all browsers.
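A sketch of that approach; note that btoa/atob only handle Latin-1, so non-ASCII data needs an extra step (I use encodeURIComponent here, which is my own assumption, not part of the original suggestion):

// Encode: JSON -> percent-encoding (so btoa never sees non-Latin-1 characters) -> base64.
const toCookieValue = (obj) => btoa(encodeURIComponent(JSON.stringify(obj)));
// Decode: base64 -> percent-decoding -> JSON.
const fromCookieValue = (str) => JSON.parse(decodeURIComponent(atob(str)));

document.cookie = "data=" + toCookieValue({ user: "Zoë", roles: ["admin"] });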
Here it is, in as few words as possible. Focus on characters that need no escaping:
For cookies:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'()*+-./:<>?@[]^_`{|}~
For urls
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_~!$&'()*+,;=:@
For cookies and urls ( intersection )
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!$&'()*+-.:@_~
That's how you answer.
Note that for cookies, the = has been removed because it is
usually used to set the cookie value.
For URLs the = was kept. The intersection is obviously without it.
var chars = "abcdefghijklmnopqrstuvwxyz"; chars += chars.toUpperCase() + "0123456789" + "!$&'()*+-.:@_~";
It turns out escaping still occurs and unexpected things happen, especially in a Java cookie environment, where the cookie gets wrapped in double quotes if the value contains some of the latter characters.
So to be safe, just use A-Za-z0-9. That's what I am going to do.
The newer RFC 6265, published in April 2011:
cookie-header = "Cookie:" OWS cookie-string OWS
cookie-string = cookie-pair *( ";" SP cookie-pair )
cookie-pair = cookie-name "=" cookie-value
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
               ; US-ASCII characters excluding CTLs,
               ; whitespace, DQUOTE, comma, semicolon,
               ; and backslash
If you look at @bobince's answer you will see that the newer restrictions are more strict.
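For reference, the cookie-octet rule above translates directly into a character-class test (a sketch; the function name is mine):

// True if every character of `value` is a legal RFC 6265 cookie-octet.
const isValidCookieValue = (value) =>
  /^[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*$/.test(value);

isValidCookieValue("abc123!#$");    // true
isValidCookieValue('he said "hi"'); // false (space and DQUOTE are excluded)
isValidCookieValue("a;b");          // false (semicolon is excluded)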
You cannot put ";" in the value field of a cookie; in most browsers the value that actually gets set is only the string up to the ";"...
that's simple:
A <cookie-name> can be any US-ASCII character except control
characters (CTLs), spaces, or tabs. It also must not contain a
separator character like the following: ( ) < > @ , ; : \ " / [ ] ? =
{ }.
A <cookie-value> can optionally be set in double quotes, and any
US-ASCII characters excluding CTLs, whitespace, double quotes, comma,
semicolon, and backslash are allowed. Encoding: Many implementations
perform URL encoding on cookie values, however it is not required per
the RFC specification. It does help satisfying the requirements about
which characters are allowed for <cookie-value> though.
Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives
There are two versions of the cookie specification:
1. Version 0 cookies, aka Netscape cookies
2. Version 1, aka RFC 2965 cookies
In version 0, the name and value parts of cookies are sequences of characters excluding the semicolon, comma, equals sign, and whitespace, if not used with double quotes.
Version 1 is a lot more complicated; you can check it here.
In this version the specs for the name-value part are almost the same, except that the name cannot start with a $ sign.
There is another interesting issue with IE and Edge. Cookies that have names with more than 1 period seem to be silently dropped.
So
This works:
cookie_name_a=valuea
while this will get dropped
cookie.name.a=valuea
One more consideration. I recently implemented a scheme in which some sensitive data posted to a PHP script needed to be converted and returned as an encrypted cookie, using all base64 values I thought were guaranteed "safe". So I dutifully encrypted the data items using RC4, ran the output through base64_encode, and happily returned the cookie to the site. Testing seemed to go well until a base64-encoded string contained a "+" symbol. The string was written to the page cookie with no trouble. Using the browser diagnostics I could also verify the cookie was written unchanged. Then when a subsequent page called my PHP and obtained the cookie via the $_COOKIE array, I was stunned to find the string was now missing the "+" sign. Every occurrence of that character was replaced with an ASCII space.
Considering how many similar unresolved complaints I've read describing this scenario since then, often citing numerous references to using base64 to "safely" store arbitrary data in cookies, I thought I'd point out the problem and offer my admittedly kludgy solution.
After you've done whatever encryption you want to do on a piece of data, and then used base64_encode to make it "cookie-safe", run the output string through this...
// from browser to PHP: substitute troublesome chars with
// other cookie-safe chars, or vice versa.
function fix64($inp) {
    $out = $inp;
    for ($i = 0; $i < strlen($inp); $i++) {
        $c = $inp[$i];
        switch ($c) {
            case '+': $c = '*'; break; // '+' definitely won't transfer!
            case '*': $c = '+'; break;
            case '=': $c = ':'; break; // the = symbol seems like a bad idea too
            case ':': $c = '='; break;
            default: break;            // leave every other character alone
        }
        $out[$i] = $c;
    }
    return $out;
}
Here I'm simply substituting "+" (and I decided "=" as well) with other "cookie safe" characters, before returning the encoded value to the page, for use as a cookie. Note that the length of the string being processed doesn't change. When the same (or another page on the site) runs my PHP script again, I'll be able to recover this cookie without missing characters. I just have to remember to pass the cookie back through the same fix64() call I created, and from there I can decode it with the usual base64_decode(), followed by whatever other decryption in your scheme.
There may be some setting I could make in PHP that allows base64 strings used in cookies to be transferred back to PHP without corruption. In the meantime this works. The "+" may be a "legal" cookie value, but if you have any desire to be able to transmit such a string back to PHP (in my case via the $_COOKIE array), I'm suggesting re-processing to remove offending characters, and restoring them after recovery. There are plenty of other "cookie safe" characters to choose from.
If you are using the variables later, you'll find that attributes like path will actually let accented characters through, but they won't actually match the browser path. For that you need to URI-encode them, e.g. like this:
const encodedPath = encodeURI(myPath);
document.cookie = `use_pwa=true; domain=${location.host}; path=${encodedPath};`
So the "allowed" chars, might be more than what's in the spec. But you should stay within the spec, and use URI-encoded strings to be safe.
Years ago MSIE 5 or 5.5 (and probably both) had a serious issue with a "-" in the HTML block, if you can believe it. Although it's not directly related, ever since then we've stored an MD5 hash (containing letters and numbers only) in the cookie to look up everything else in a server-side database.
I ended up using
cookie_value = encodeURIComponent(my_string);
and
my_string = decodeURIComponent(cookie_value);
That seems to work for all kinds of characters. I had weird issues otherwise, even with characters that weren't semicolons or commas.

Converting textarea value into a valid JSON string

I am trying to make sure input from the user is converted into a valid JSON string before it is submitted to the server.
What I mean by 'Converting' is escaping characters such as '\n' and '"'.
Btw, I am taking user input from HTML textarea.
Converting user input to a valid JSON string is very important for me, as it will be posted to the server and sent back to the client in JSON format. (An invalid JSON string will make the whole response invalid.)
If User entered
Hello New World,
My Name is "Wonderful".
in HTML <textarea>,
var content = $("textarea").val();
content will contain new-line characters and double-quote characters.
It's not a problem for server and database to handle and store data.
My problem occurs when the server sends the data posted by clients back to them in JSON format, as it was posted.
Let me clarify it further by giving some example of my server's response.
It's a JSON response and looks like this
{ "code": 0, "id": 1, "content": "USER_POSTED_CONTENT" }
If USER_POSTED_CONTENT contains a new-line character '\n', double quotes, or any characters that must be escaped but are not, then it is no longer a valid JSON string and the client's JavaScript engine cannot parse the data.
So I am trying to make sure the client is submitting a valid JSON string.
This is what I came up with after doing some researches.
String.prototype.escapeForJson = function() {
  return this
    .replace(/\b/g, "")
    .replace(/\f/g, "")
    .replace(/\\/g, "\\\\")
    .replace(/\"/g, "\\\"")
    .replace(/\t/g, "\\t")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
};
I use this function to escape all the characters that need to be escaped in order to create a valid JSON string.
var content = txt.val().escapeForJson();
$.ajax(
...
data:{ "content": content }
...
);
But then... it seems like str = JSON.stringify(str); does the same job!
However, after reading what JSON.stringify is really for, I am just confused. It says JSON.stringify is for converting a JSON object into a string.
I am not really converting a JSON object to a string.
So my question is...
Is it totally ok to use JSON.stringify to convert user input to a valid JSON string??
UPDATES:
JSON.stringify(content) worked well, but it added double quotes at the beginning and the end, and I had to manually remove them for my needs.
Yep, it is totally ok.
You do not need to re-invent what already exists, and your code will be more usable for another developer.
EDIT:
You might want to use an object instead of a simple string because you may want to send some other information.
For example, you might want to send the content of another input which will be developed later.
You should not use stringify if the target browser is IE7 or lower without adding json2.js.
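Regarding the asker's update about the surrounding double quotes, a quick sketch of where they come from (my example strings, not from the original post):

// JSON.stringify produces a complete JSON string literal, so the quotes are part of the value:
JSON.stringify("line 1\nline 2");  // => '"line 1\\nline 2"' (newline escaped, quotes included)

// The counterpart on the receiving side is JSON.parse, which removes them again,
// so there is normally no need to strip the quotes by hand:
JSON.parse(JSON.stringify("line 1\nline 2")) === "line 1\nline 2";  // true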
I don't think JSON.stringify does what you need. Check out its behavior when handling some of your cases (shown as the character content of the result):
JSON.stringify('\n\rhello\n')
*desired : \n\rhello\n (escapes only)
*actual : "\n\rhello\n" (escapes, plus surrounding quotes)
JSON.stringify('\b\rhello\n')
*desired : \rhello\n (the \b stripped out)
*actual : "\b\rhello\n" (the \b kept as an escape)
JSON.stringify('\b\f\b\f\b\f')
*desired : (empty string)
*actual : "\b\f\b\f\b\f"
The stringify function returns a valid JSON string. A valid JSON string does not require these characters to be escaped.
The question is... Do you just need valid JSON strings? Or do you need valid JSON strings AND escaped characters? If the former: use stringify, if the latter: use stringify, and then use your function on top of it.
Highly relevant: How to escape a JSON string containing newline characters using javascript?
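For the situation described in the question (posting textarea content and echoing it back), a minimal sketch under the assumption that the server accepts a JSON body at a hypothetical /submit endpoint:

var content = $("textarea").val();  // may contain newlines and double quotes
$.ajax({
  url: "/submit",                   // hypothetical endpoint
  type: "POST",
  contentType: "application/json",
  data: JSON.stringify({ content: content }),  // stringify the whole payload once
  success: function (response) {
    // If the server builds its reply with a proper JSON encoder, the echoed content
    // comes back correctly escaped and JSON.parse on the client just works.
  }
});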
Complexity. I don't know what to say.
Take the urlencode function from your function list and kick it around a bit.
<?php
$textdata = $_POST['textdata'];

// Try without this one line and the JSON encoding tanks
$textdata = urlencode($textdata);

// The textarea data slides into the JSON string because a urlencoded string
// contains nothing that needs JSON escaping
$json_string = json_encode($textdata);

// Decode just for kicks and use the decoded value for the form
$mydata = json_decode($json_string, true);

// url decode
$mydata = urldecode($mydata);
?>
<html>
<form action="" method="post">
<textarea name="textdata"><?php echo $mydata; ?></textarea>
<input type="submit">
</form>
</html>
The same thing can be done in JavaScript to store textarea data in local storage. Again, the textarea will fail unless all the Unix formatting is dealt with. The answer is to take urldecode/urlencode and kick it around.
I believe that urlencode on the server side will be a C-wrapped function that iterates the char array once, versus running a snippet of interpreted code.
The textarea returned will be exactly what was entered, with zero chance of upsetting a WYSIWYG editor or a basic HTML5 textarea, which could receive a combination of HTML/CSS, DOS, Apple and Unix line endings depending on what text is cut/pasted.
The downvotes are hilarious and show an obvious lack of knowledge. You only need to ask yourself: if this data were file contents or some other array of lines, how would you pass this data in a URL? JSON.stringify is okay, but url encoding works best in client/server ajax.

How to handle possibly HTML encoded values in javascript

I have a situation where I'm not sure if the input I get is HTML encoded or not. How do I handle this? I also have jQuery available.
function someFunction(userInput){
$someJqueryElement.text(userInput);
}
// userInput "<script>" renders as "<script>", which is fine
// userInput "&lt;script&gt;" renders as "&lt;script&gt;", which is bad
I could avoid escaping ampersands (&), but what are the risks in that? Any help is very much appreciated!
Important note: This user input is not in my control. It comes from an external service, and it is possible for someone to tamper with it and avoid the HTML escaping provided by that service itself.
You really need to make sure you avoid these situations, as they introduce conditions that are really difficult to predict.
Try adding an additional variable input to the function.
function someFunction(userInput, isEncoded){
//Add some conditional logic based on isEncoded
$someJqueryElement.text(userInput);
}
If you look at products like fckEditor, you can choose to edit source or use the rich text editor. This prevents the need for automatic encoding detection.
If you are still insistent on automatically detecting HTML-encoded characters, I would recommend using indexOf to verify that certain key phrases exist.
str.indexOf('<') !== -1
This example above will detect the < character.
~~~New text added after edit below this line.~~~
Finally, I would suggest looking at this answer. They suggest using the decode function and detecting lengths.
var string = "Your encoded & decoded string here"
function decode(str){
  return decodeURIComponent(str).replace(/&lt;/g,'<').replace(/&gt;/g,'>');
}
if(string.length == decode(string).length){
  // The string does not contain any encoded html.
}else{
  // The string contains encoded html.
}
Again, this still has the problem of a user faking out the process by entering those specially encoded characters, but that is what html encoding is. So it would be proper to assume html encoding as soon as one of these character sequences comes up.
You must always correctly encode untrusted input before concatenating it into a structured language like HTML.
Otherwise, you'll enable injection attacks like XSS.
If the input is supposed to contain HTML formatting, you should use a sanitizer library to strip all potentially unsafe tags & attributes.
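As an illustration (my choice of library, not one named in this answer), DOMPurify is a commonly used sanitizer:

// Strips dangerous tags and attributes but keeps harmless formatting.
// Assumes DOMPurify has been loaded, e.g. from a script tag or the npm package.
const clean = DOMPurify.sanitize('<img src=x onerror=alert(1)> <b>bold</b>');
// => '<img src="x"> <b>bold</b>' (the onerror handler is removed)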
You can also use the regex /<|>|&(?![a-z]+;)/ to check whether a string has any non-encoded characters; however, you cannot distinguish a string that has been encoded from an unencoded string that talks about encoding.
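A sketch of that heuristic (the function name is mine):

// True if the string still contains raw markup characters or a bare ampersand.
const looksUnencoded = (s) => /<|>|&(?![a-z]+;)/.test(s);

looksUnencoded('<script>');          // true  -- raw markup present
looksUnencoded('&lt;script&gt;');    // false -- already entity-encoded
looksUnencoded('Thyme &time=again'); // true  -- bare ampersand, "&time=" is not an entity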
