javascript fetch GET request changing cookie value [duplicate]

What are the allowed characters in both a cookie name and value? Are they the same as in a URL, or some common subset?
The reason I'm asking is that I've recently hit some strange behavior with cookies that have - in their name, and I'm just wondering if it's something browser-specific or if my code is faulty.

According to the ancient Netscape cookie_spec the entire NAME=VALUE string is:
a sequence of characters excluding semi-colon, comma and white space.
So - should work, and it does seem to be OK in browsers I've got here; where are you having trouble with it?
By implication of the above:
= is legal to include, but potentially ambiguous. Browsers always split the name and value on the first = symbol in the string, so in practice you can put an = symbol in the VALUE but not the NAME.
What isn't mentioned, because Netscape were terrible at writing specs, but seems to be consistently supported by browsers:
either the NAME or the VALUE may be empty strings
if there is no = symbol in the string at all, browsers treat it as the cookie with the empty-string name, ie Set-Cookie: foo is the same as Set-Cookie: =foo.
when browsers output a cookie with an empty name, they omit the equals sign. So Set-Cookie: =bar begets Cookie: bar.
commas and spaces in names and values do actually seem to work, though spaces around the equals sign are trimmed
control characters (\x00 to \x1F plus \x7F) aren't allowed
What isn't mentioned and browsers are totally inconsistent about, is non-ASCII (Unicode) characters:
in Opera and Google Chrome, they are encoded to Cookie headers with UTF-8;
in IE, the machine's default code page is used (locale-specific and never UTF-8);
Firefox (and other Mozilla-based browsers) use the low byte of each UTF-16 code point on its own (so ISO-8859-1 is OK but anything else is mangled);
Safari simply refuses to send any cookie containing non-ASCII characters.
So in practice you cannot use non-ASCII characters in cookies at all. If you want to use Unicode, control codes or other arbitrary byte sequences, the cookie_spec demands you use an ad-hoc encoding scheme of your own choosing, and suggests URL-encoding (as produced by JavaScript's encodeURIComponent) as a reasonable choice.
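For instance, a minimal sketch of that approach in a browser (the cookie name greeting and its value are just examples):
// Write: URL-encode the value so Unicode and reserved characters are safe
document.cookie = "greeting=" + encodeURIComponent("héllo wörld;=");
// Read: find our pair and reverse the encoding
var pair = document.cookie.split("; ").find(function (p) {
  return p.indexOf("greeting=") === 0;
});
var value = pair && decodeURIComponent(pair.slice("greeting=".length));
// value === "héllo wörld;="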
In terms of actual standards, there have been a few attempts to codify cookie behaviour but none thus far actually reflect the real world.
RFC 2109 was an attempt to codify and fix the original Netscape cookie_spec. In this standard many more special characters are disallowed, as it uses RFC 2616 tokens (a - is still allowed there), and only the value may be specified in a quoted-string with other characters. No browser ever implemented the limitations, the special handling of quoted strings and escaping, or the new features in this spec.
RFC 2965 was another go at it, tidying up 2109 and adding more features under a ‘version 2 cookies’ scheme. Nobody ever implemented any of that either. This spec has the same token-and-quoted-string limitations as the earlier version and it's just as much a load of nonsense.
RFC 6265 is an HTML5-era attempt to clear up the historical mess. It still doesn't match reality exactly, but it's much better than the earlier attempts; it is at least a proper subset of what browsers support, and it doesn't introduce any syntax that is supposed to work but doesn't (like the previous quoted-string).
In 6265 the cookie name is still specified as an RFC 2616 token, which means you can pick from the alphanums plus:
!#$%&'*+-.^_`|~
In the cookie value it formally bans the (filtered by browsers) control characters and (inconsistently-implemented) non-ASCII characters. It retains cookie_spec's prohibition on space, comma and semicolon, plus for compatibility with any poor idiots who actually implemented the earlier RFCs it also banned backslash and quotes, other than quotes wrapping the whole value (but in that case the quotes are still considered part of the value, not an encoding scheme). So that leaves you with the alphanums plus:
!#$%&'()*+-./:<=>?@[]^_`{|}~
In the real world we are still using the original-and-worst Netscape cookie_spec, so code that consumes cookies should be prepared to encounter pretty much anything, but for code that produces cookies it is advisable to stick with the subset in RFC 6265.
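If you want to enforce that subset when producing cookies, something like this sketch works; the regular expressions simply transcribe the two character lists above:
// Name: RFC 2616 token characters
var validName = /^[A-Za-z0-9!#$%&'*+\-.^_`|~]+$/;
// Value: RFC 6265 cookie-octets
var validValue = /^[A-Za-z0-9!#$%&'()*+\-./:<=>?@[\]^_`{|}~]*$/;

function isSafeCookiePair(name, value) {
  return validName.test(name) && validValue.test(value);
}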

In ASP.Net you can use System.Web.HttpUtility to safely encode the cookie value before writing to the cookie and convert it back to its original form on reading it out.
// Encode
HttpUtility.UrlEncode(cookieData);
// Decode
HttpUtility.UrlDecode(encodedCookieData);
This will stop ampersands and equals signs splitting a value into a bunch of name/value pairs as it is written to a cookie.

I think it's generally browser-specific. To be on the safe side, base64-encode a JSON object and store everything in that. That way you just have to decode it and parse the JSON. All the characters used in base64 should play fine with most, if not all, browsers.
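A rough sketch of that idea (note that btoa only handles code points up to U+00FF, so URL-encode first if the JSON may contain other non-ASCII text; the cookie name state is arbitrary):
// Write: JSON, then base64
var data = { user: "alice", visits: 3 };
document.cookie = "state=" + btoa(JSON.stringify(data));
// Read: reverse both steps
var pair = document.cookie.split("; ").find(function (p) {
  return p.indexOf("state=") === 0;
});
var state = pair && JSON.parse(atob(pair.slice("state=".length)));
Keep in mind that base64 output can still contain +, / and =, which are legal cookie-value characters per RFC 6265 but can be mangled by server-side decoding (see the PHP answer further down).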

Here it is, in as few words as possible. Focus on characters that need no escaping:
For cookies:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'()*+-./:<>?@[]^_`{|}~
For URLs:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_~!$&'()*+,;=:@
For cookies and URLs (intersection):
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!$&'()*+-.:@_~
That's how you answer.
Note that for cookies, the = has been removed because it is usually used to separate the cookie name from its value.
For URLs the = was kept. The intersection is obviously without it.
var chars = "abcdefghijklmnopqrstuvwxyz"; chars += chars.toUpperCase() + "0123456789" + "!$&'()*+-.:@_~";
It turns out escaping still occurs and unexpected things still happen, especially in a Java cookie environment, where the cookie value gets wrapped in double quotes if it contains some of these characters.
So to be safe, just use A-Za-z0-9. That's what I am going to do.
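If you want to verify a particular browser yourself, a quick probe along these lines shows which characters survive a round trip (the cookie name probe is arbitrary):
function survivesCookieRoundTrip(ch) {
  document.cookie = "probe=" + ch;
  var ok = document.cookie.indexOf("probe=" + ch) !== -1;
  // expire the test cookie again
  document.cookie = "probe=; expires=Thu, 01 Jan 1970 00:00:00 GMT";
  return ok;
}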

The newer RFC 6265, published in April 2011, defines:
cookie-header = "Cookie:" OWS cookie-string OWS
cookie-string = cookie-pair *( ";" SP cookie-pair )
cookie-pair = cookie-name "=" cookie-value
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace, DQUOTE, comma, semicolon,
; and backslash
If you look at bobince's answer above, you'll see that these newer restrictions are stricter.
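Transcribing that grammar: a cookie-string is just name=value pairs joined with a semicolon and a space. A sketch:
// cookie-string = cookie-pair *( ";" SP cookie-pair )
function toCookieString(pairs) {
  return Object.keys(pairs)
    .map(function (name) { return name + "=" + pairs[name]; })
    .join("; ");
}
toCookieString({ a: "1", b: "2" }); // "a=1; b=2"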

You cannot put ";" in the value field of a cookie; in most browsers, the value that gets set is only the string up to the ";"...

That's simple:
A <cookie-name> can be any US-ASCII character except control
characters (CTLs), spaces, or tabs. It also must not contain a
separator character like the following: ( ) < > @ , ; : \ " / [ ] ? =
{ }.
A <cookie-value> can optionally be set in double quotes, and any
US-ASCII characters excluding CTLs, whitespace, double quotes, comma,
semicolon, and backslash are allowed. Encoding: Many implementations
perform URL encoding on cookie values; however, it is not required per
the RFC specification. It does help satisfy the requirements about
which characters are allowed, though.
Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#Directives

There are two versions of cookie specifications:
1. Version 0 cookies, aka Netscape cookies
2. Version 1 cookies, aka RFC 2965 cookies
In version 0, the name and value parts of cookies are sequences of characters excluding the semicolon, comma, equals sign, and whitespace, unless enclosed in double quotes.
Version 1 is a lot more complicated; you can check RFC 2965 for the details.
In this version the spec for the name and value parts is almost the same, except that a name cannot start with a $ sign.

There is another interesting issue with IE and Edge. Cookies that have names with more than 1 period seem to be silently dropped.
So
This works:
cookie_name_a=valuea
while this will get dropped
cookie.name.a=valuea

One more consideration. I recently implemented a scheme in which some sensitive data posted to a PHP script needed to be converted and returned as an encrypted cookie, using only base64 characters I thought were guaranteed 'safe'. So I dutifully encrypted the data items using RC4, ran the output through base64_encode, and happily returned the cookie to the site. Testing seemed to go well until a base64-encoded string contained a "+" symbol. The string was written to the page cookie with no trouble. Using the browser diagnostics I could also verify the cookie was written unchanged. Then, when a subsequent page called my PHP and obtained the cookie via the $_COOKIE array, I was stunned to find the string was now missing its "+" signs. Every occurrence of that character was replaced with an ASCII space. (PHP URL-decodes incoming cookie values when populating $_COOKIE, and in URL decoding a "+" means a space.)
Considering how many similar unresolved complaints I've read describing this scenario since then, often citing numerous references to using base64 to "safely" store arbitrary data in cookies, I thought I'd point out the problem and offer my admittedly kludgy solution.
After you've done whatever encryption you want to do on a piece of data, and then used base64_encode to make it "cookie-safe", run the output string through this...
// From browser to PHP: substitute troublesome chars with
// other cookie-safe chars, or vice-versa.
// The mapping is symmetric, so the same call both encodes and decodes.
function fix64($inp) {
    $out = $inp;
    for ($i = 0; $i < strlen($inp); $i++) {
        $c = $inp[$i];
        switch ($c) {
            case '+': $c = '*'; break; // '+' definitely won't transfer!
            case '*': $c = '+'; break;
            case '=': $c = ':'; break; // '=' seems like a bad idea too
            case ':': $c = '='; break;
            default: break;            // leave every other character alone
        }
        $out[$i] = $c;
    }
    return $out;
}
Here I'm simply substituting "+" (and I decided "=" as well) with other "cookie-safe" characters before returning the encoded value to the page, for use as a cookie. Note that the length of the string being processed doesn't change. When the same (or another) page on the site runs my PHP script again, I'll be able to recover this cookie without missing characters. I just have to remember to pass the cookie back through the same fix64() call I created, and from there I can decode it with the usual base64_decode(), followed by whatever other decryption your scheme uses.
There may be some setting I could make in PHP that allows base64 strings used in cookies to be transferred back to PHP without corruption; encoding the value with rawurlencode() before setting it should also work, since PHP's automatic URL-decoding of $_COOKIE would then simply restore it. In the meantime, this works. The "+" may be a "legal" cookie value, but if you have any desire to be able to transmit such a string back to PHP (in my case via the $_COOKIE array), I'm suggesting re-processing to remove offending characters and restore them after recovery. There are plenty of other "cookie-safe" characters to choose from.
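If JavaScript on the page also needs to read such a cookie, the same symmetric swap is easy to mirror client-side (a sketch; fix64 is the author's own helper above, not a library function):
function fix64(s) {
  var map = { "+": "*", "*": "+", "=": ":", ":": "=" };
  return s.replace(/[+*=:]/g, function (c) { return map[c]; });
}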

If you are using the variables later, you'll find that attributes like path will actually let accented characters through, but they won't correctly match the browser path. For that you need to URI-encode them, e.g. like this:
const encodedPath = encodeURI(myPath);
document.cookie = `use_pwa=true; domain=${location.host}; path=${encodedPath};`
So the "allowed" chars, might be more than what's in the spec. But you should stay within the spec, and use URI-encoded strings to be safe.

Years ago MSIE 5 or 5.5 (and probably both) had some serious issue with a "-" in the HTML block, if you can believe it. Although it's not directly related, ever since then we've stored an MD5 hash (containing only letters and numbers) in the cookie and used it to look up everything else in a server-side database.

I ended up using
cookie_value = encodeURIComponent(my_string);
and
my_string = decodeURIComponent(cookie_value);
That seems to work for all kinds of characters. I had weird issues otherwise, even with characters that weren't semicolons or commas.

Related

"&lang" misinterpreted in URL

I am developing for JavaScript-disabled phones. My code looks like this:
<a href="someurl?var=a&lang=english">Link 1</a>
<a href="someurl?lang=english&var=a">Link 2</a>
But the browser interprets the URLs as:
someurl?var=a%e2%8c%a9=english (Link 1, incorrect)
someurl?lang=english&var=a (Link 2 works just fine !)
It seems like &lang=english is being converted to a%e2%8c%a9=english
Could someone explain why this is happening?
In HTML, the & character represents the start of a character reference.
If you try to specify an invalid character reference, then browsers will perform error recovery and treat it as an ampersand instead.
From the HTML DTD:
<!ENTITY lang CDATA "&#9001;" -- left-pointing angle bracket = bra,
U+2329 ISOtech -->
… so &lang is not an invalid character reference.
To include an ampersand character as data, use the character reference for an ampersand: &amp;
By HTML 4.01 rules, the &lang entity reference denotes the character U+2329 LEFT-POINTING ANGLE BRACKET “〈”. In UTF-8 encoding, that character is represented as 0xE2 0x8C 0xA9, and therefore in a URL, it gets %-encoded as a%e2%8c%a9.
Nowadays, most browsers don’t work that way. Specifically, in a URL, the reference &lang is not recognized when followed by an equals sign = (even though it is valid HTML 4.01 in that context).
To deal with browsers that may follow the old rules, as well as to comply with syntax rules independently of HTML version, escape each occurrence of the ampersand "&" as &amp;. It is safest to do this for all occurrences of "&" as a data character, in attribute values and elsewhere.
Depending on the server-side software that processes the URL when they have been followed, you might be able to use an unproblematic character like “;” instead of “&” as a separator.
http://www.htmlhelp.com/tools/validator/problems.html#amp (linked by w3 from http://validator.w3.org/docs/help.html) explains it.
& marks the start of a so-called entity. Entities are, for example, &euro; (€), &lt; (<), ...
If you now put &lang in the URL, this throws an error in any validator, because it's not a valid entity reference there. The browser then escapes this sequence.
Solution:
You have to escape the & with its own entity, &amp;, so the URL will look like:
<a href="someurl?var=a&amp;lang=english">Link 1</a>
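If the links are generated from script rather than hand-written, building the query string programmatically sidesteps the manual escaping (a sketch; someurl is the placeholder from the question):
var params = new URLSearchParams({ var: "a", lang: "english" });
var href = "someurl?" + params.toString(); // "someurl?var=a&lang=english"
// When inserting href into HTML markup, still escape & as &amp; in the attribute value.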

Special characters not displaying correctly in a Javascript string

I have a function that assigns a string containing specials characters into a variable, then passes that variable to a DOM element via innerHTML property, but it prints strange characters. Let's say I code this...
someText = "äêíøù";
document.getElementById("someElement").innerHTML = someText;
It prints the following text...
Ã¤ÃªÃ­Ã¸Ã¹
I know how to use the entity names to prevent this, but when I use them to pass the value through a Javascript method, they print literally.
This means that you have a conflict of encodings. Your JavaScript and your HTML are being served to the browser with different encodings/character sets. Ensure that they're encoded in and served with the same encoding / character set (UTF8 is a good choice) to make sure that characters are correctly interpreted.
Obligatory link: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
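As a quick diagnostic, you can check which encoding the browser actually applied to the page (and therefore to your inline script):
// if this isn't "UTF-8", declare it, e.g. with <meta charset="utf-8"> in the <head>
console.log(document.characterSet); // e.g. "UTF-8" vs "windows-1252"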

Handling unicode in the http response xml

I'm writing a Google Chrome extension that builds upon myanimelist.net REST api. Sometimes the XMLHttpRequest response text contains unicode.
For example:
<title>Onegai My Melody Sukkiri&acirc;�&ordf;</title>
If I create a HTML node from the text it looks like this:
Onegai My Melody Sukkiriâ�ª
The actual title, however, is this:
Onegai My Melody Sukkiri♪
Why is my text not correctly rendered and how can I fix it?
Update
Code: background.html
I think these are the crucial parts:
function htmlDecode(input){
  var e = document.createElement('div');
  e.innerHTML = input;
  // read the decoded text back out of the first child node
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
function xmlDecode(input){
  var result = input;
  result = result.replace(/&lt;/g, "<");
  result = result.replace(/&gt;/g, ">");
  result = result.replace(/\n/g, "<br>");
  return htmlDecode(result);
}
Further:
var parser = new DOMParser();
var xmlText = response.value;
var doc = parser.parseFromString(xmlDecode(xmlText), "text/xml");
<title>Onegai My Melody Sukkiri&acirc;�&ordf;</title>
Oh dear! Not only is that the wrong text, it's not even well-formed XML. acirc and ordf are HTML entities which are not predefined in XML, and then there's an invalid UTF-8 sequence (one high byte, presumably originally 0x99) between them.
The problem is that myanimelist are generating their output ‘XML’ (but “if it ain't well-formed, it ain't XML”) using the PHP function htmlentities(). This tries to HTML-escape not only the potentially-sensitive-in-HTML characters <&"', but also all non-ASCII characters.
This generates the wrong characters because PHP defaults to treating the input to htmlentities() as ISO-8859-1 instead of UTF-8 which is the encoding they're actually using. But it was the wrong thing to begin with because the HTML entity set doesn't exist in XML. What they really wanted to use was htmlspecialchars(), which leaves the non-ASCII characters alone, only escaping the really sensitive ones. Because those are the same ones that are sensitive in XML, htmlspecialchars() works just as well for XML as HTML.
htmlentities() is almost always the Wrong Thing; htmlspecialchars() should typically be used instead. The one place you might want to encode non-ASCII bytes to entity references would be when you're targeting pure ASCII output. But even then htmlentities() fails, because it doesn't make character references (&#...;) for the characters that don't have predefined entity names. Pretty useless.
Anyway, you can't really recover the mangled data from this. The � represents a byte sequence that was UTF-8-undecodable to the XMLHttpRequest, so that information is irretrievably lost. You will have to persuade myanimelist to fix their broken XML output as per the above couple of paragraphs before you can go any further.
Also they should be returning it as Content-Type: text/xml not text/html as at the moment. Then you could pick up the responseXML directly from the XMLHttpRequest object instead of messing about with DOMParsers.
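For illustration, if the server did send Content-Type: text/xml, the parsed document would be available directly (a sketch; the URL is hypothetical):
var xhr = new XMLHttpRequest();
xhr.open("GET", "https://example.com/api/anime.xml", true);
xhr.onload = function () {
  var doc = xhr.responseXML; // already a parsed XML Document, no DOMParser needed
  var title = doc.getElementsByTagName("title")[0].textContent;
};
xhr.send();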
So, I've come across something similar to what's going on here at work, and I did a bit more research to confirm my hypothesis.
If you take a look at the returned value you posted above, you'll notice the tell-tale entity "&acirc;". 99% of the time when you see this, it means you have a character encoding issue (typically UTF-8 characters being encoded as ISO-8859-1).
The first thing I would test is forcing a character encoding on the API return. (It's a long shot, but you could look.)
Second, I'd try to force a character encoding onto the data returned (I know there's a .htaccess override, but I don't know what's allowed in Chrome extensions, so you'll have to research that).
What I believe is going on is that when you create the node with the data, you don't have a character encoding set on the document, and browsers (typically, in my experience) default to ISO-8859-1. So check to make sure it's not your document that's the problem.
Finally, if you can't find (or can't prevent) the source of the encoding problem, you'll have to write a conversion table to replace the malformed values you're getting with the ones you want; JS's replace should be fine (http://www.w3schools.com/jsref/jsref_replace.asp).
You can't just use a simple search-and-replace to fix encoding issues, since they involve Unicode, not characters typed on a keyboard.
Your data must be stored on the server in UTF-8 format if you are planning on retrieving it via AJAX. This problem is probably due to someone pasting in characters from MS-Word which use a completely different encoding scheme (ISO-8859).
If you can't fix the data, you're kinda screwed.
For more details, see: UTF-8 vs. Unicode

HTML5 Data URI fails when a base64 null is followed by an =?

I'm doing some pretty unholy things with JavaScript, and I've run into a weird problem.
I am creating binary data that fills a buffer of a static size. If the content doesn't fill the buffer, the remainder is filled with null characters.
The next step is to convert to base64.
The size (bytes) isn't always a multiple of 3, so I may need to add padding to the end. The last bytes in the buffer are always null (actually, it's about a kb of nulls).
When I convert this to base64 on Firefox and Chrome, I get an ERR_INVALID_URL when I have a trailing '=', but it downloads fine when I don't.
For example:
var url = "data:application/octet-stream;base64,";
window.open(url + "AAAA"); // works
window.open(url + "AAAA="); // doesn't work
window.open(url + "icw="); // works
My files work, but they're not up to spec.
Is there a reason why this is invalid base64? More importantly, is this a bug or part of the specification?
Edit:
I've posted an answer that gives some of the oddities between Firefox and Chrome. Does anyone know what the standard specifies? Or is it one of those loose specifications that causes fragmentation? I'd like something definitive if possible.
The padding character = is used to fill up to a multiple of four code characters. As every three bytes of input are mapped onto four bytes of output, a number of input bytes that is not a multiple of three requires padding (a remainder of one byte requires == and a remainder of two bytes requires =).
In your case, AAAA already is a valid code word and doesn't require padding.
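To make the padding rule concrete, here is how the input length maps to padding (easily checked with btoa in a browser console):
btoa("\x00");         // "AA==" (1 leftover byte -> two padding chars)
btoa("\x00\x00");     // "AAA=" (2 leftover bytes -> one padding char)
btoa("\x00\x00\x00"); // "AAAA" (a full 3-byte group -> no padding)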
Why would you imagine that adding an "=" character to the end of the string would work? That's not a valid character in base64.
The character set is upper- and lower-case letters; the digits; and "+" and "/". Anything else is therefore not valid in a base64 string.
edit — well for URLs it seems that instead of "+" and "/" you use "-" and "_" (for positions 62 and 63 in the character set).
edit some more — this is a very confusing topic due to the existence of different, apparently authoritative but contradictory, sources of information. For example, the Mozilla description of the data URL scheme makes no mention of using the "filename-friendly" alternate encoding. Same goes for the IETF data url RFC. However, other IETF documents clearly discuss and specify the variation with "-" and "_" replacing the problematic (for file names) "+" and "/".
Therefore, I declare myself ignorant :-) Gumbo is probably right, that the complaints you're getting are about incorrect padding (that is, padding when no padding is actually necessary).
Notes about different browsers:
Chrome
datalength % 4 === 1 - a single equals sign is necessary
datalength % 4 === 2 - two equals signs are necessary
Firefox - equals signs are optional, but follow the same conventions as in Chrome
This is the line I used to test it (I replaced AAAAAA== with a different string each time):
var url = "data:application/octet-stream;base64,AAAAAA=="; window.open(url);
Also, both Firefox and Chrome use + and /, not - and _.
Source:
My tests on Ubuntu 11.04 with Chrome 11 and Firefox 4.
Edit:
The code I need this for is a tar utility for Javascript. My code works as is, but I'd like to be as standards-compliant as possible, and I'm missing a byte I think. No biggie because tar in Linux recognizes it.

Why use encodeURIComponent() when writing json to a cookie

In particular, when saving a JSON to the cookie is it safe to just save the raw value?
The reason I don't want to encode is that the JSON has small values and keys but a complex structure, so encoding, which replaces all the ", : and {} characters, greatly increases the string length.
If your values contain "JSON characters" (e.g. comma, quotes, [], etc.), then you should probably use encodeURIComponent so these get escaped and don't break your code when reading the values back.
You can convert your JSON object to a string using the JSON.stringify() method then save it in a cookie.
Note that cookies are limited to roughly 4 KB (RFC 6265 requires support for at least 4096 bytes per cookie).
If your JSON string is valid, there should be no need to encode it.
e.g.
JSON.stringify({a:'foo"bar"',bar:69});
=> '{"a":"foo\"bar\"","bar":69}' valid json stings are escaped.
This is documented very well on MDN
To avoid unexpected requests to the server, you should call encodeURIComponent on any user-entered parameters that will be passed as part of a URI. For example, a user could type "Thyme &time=again" for a variable comment. Not using encodeURIComponent on this variable will give comment=Thyme%20&time=again. Note that the ampersand and the equal sign mark a new key and value pair. So instead of having a POST comment key equal to "Thyme &time=again", you have two POST keys, one equal to "Thyme " and another (time) equal to again.
If you can't be certain that your JSON will not include reserved characters such as ; then you will want to perform escaping on any strings being stored as a cookie. RFC 6265 covers special characters that are not allowed in the cookie-name or cookie-value.
If you are encoding static content you control, then this escaping may be unnecessary. If you are encoding dynamic content, such as user-generated content, you probably need escaping.
MDN recommends using encodeURIComponent to escape any disallowed characters.
You can pull in a library such as cookie to handle this for you, but if your server is written in another language you will need to ensure it URL-encodes (the equivalent of encodeURIComponent) when setting cookies and URL-decodes when reading them.
JSON.stringify is not sufficient as illustrated by this trivial example:
const bio = JSON.stringify({ "description": "foo; bar; baz" });
document.cookie = `bio=${bio}`;
// Notice that the content after the first `;` is dropped.
// Attempting to JSON.parse this later will fail.
console.log(document.cookie) // bio={\"description\":\"foo;
Cookie: name=value; name2=value2
Spaces are part of the cookie separation in the HTTP Cookie header. Raw spaces in cookie values could thus confuse the server.
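For completeness, a working version of that example, encoding on write and decoding on read (this sketch assumes the bio cookie exists when read back):
const bio = JSON.stringify({ description: "foo; bar; baz" });
document.cookie = "bio=" + encodeURIComponent(bio);

const raw = document.cookie
  .split("; ")
  .find(function (p) { return p.indexOf("bio=") === 0; })
  .slice("bio=".length);
const restored = JSON.parse(decodeURIComponent(raw));
// restored.description === "foo; bar; baz"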
