Different behaviour of encodeURIComponent() in Firefox only - javascript

I encode a filename and send it as a part of URL like /rest/get?name=Filename.txt. In JS link construction is as simple as
url = '/rest/get?name=' + window.encodeURIComponent(file.name);
It works good for simple cases but for hardcore testing I use a file named
你好abcABCæøåÆØÅäöüïëêîâéíáóúýñ½§!#¤%&()=`#£$€{[]}+´¨^~'-_,;.txt
After URI encoding I expect to get a link
/rest/get?name=%E4%BD%A0%E5%A5%BDabcABC%C3%A6%C3%B8%C3%A5%C3%86%C3%98%C3%85%C3%A4%C3%B6%C3%BC%C3%AF%C3%AB%C3%AA%C3%AE%C3%A2%C3%A9%C3%AD%C3%A1%C3%B3%C3%BA%C3%BD%C3%B1%C2%BD%C2%A7%3F%3FabcABC%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD!%23%C2%A4%25%26()%3D%60%40%C2%A3%24%E2%82%AC%7B%5B%5D%7D%2B%C2%B4%C2%A8%5E~%27-_%2C%3B.txt
And I get it. The constructed link works ok in the latest versions of IE and Chrome but fails in Firefox. After some investigation I've found that in Firefox encodeURIcomponent works differently. Here's actual result in Firefox:
/rest/get?name=%3F%3FabcABC%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD!%23%EF%BF%BD%25%26%28%29%3D%60%40%EF%BF%BD%24%3F{[]}%2B%EF%BF%BD%EF%BF%BD^~%27-_%2C%3B.txt
Visual comparison (Chrome link is on the left and Firefox link is on the right):
I've also tried to copy and paste the valid link (constructed in Chrome) to Firefox and it works ok.
Why do I get different results?
I̶s̶ ̶i̶t̶ ̶a̶ ̶b̶u̶g̶ ̶w̶i̶t̶h̶ ̶̶e̶n̶c̶o̶d̶e̶U̶R̶I̶c̶o̶m̶p̶o̶n̶e̶n̶t̶(̶)̶̶ ̶i̶n̶ ̶F̶i̶r̶e̶f̶o̶x̶?̶
Does Firefox use a different encoding in encodeURIComponent()?
UPD. I've found similar questions (encodeURIComponent behaves differently in browsers for China as location [搜索] and
encodeURIComponent difference with browsers and ä-ö-å characters [äöå]), both without an answer.
UPD.2 Further investigation has shown that the following characters are encoded differently and causing 'File not found' exception on server:
你好
æøåÆØÅäöüïëêîâéíáóúýñ
½§¤
£€

I suppose your problem is not the encodeURIComponent() method. It's the encoding of whatever constructs file.name. Extend your question. How is file.name initialized? Where do the chars come from?

encodeURIComponent() is a native function so Firefox is obviously using some different implementation under the covers.
If you are stuck, then just deliver your own javascript implementation of encodeURIComponent(), then you'll get the same results across browsers. Here's a link how to get an open source copy of that:
encodeURIComponent algorithm source code

Related

Browser strips encoded dot character from url

I have a web app, which allows searching. So when I go to somedomain.com/search/<QUERY> it searches for entities according to <QUERY>. The problem is, when I try to search for . or .. it doesn't work as expected (which is pretty obvious). What surprised me though, is that if I manually enter the url of somedomain.com/search/%2E, the browser (tested Chrome and IE11) converts it somedomain.com/search/ and issues a request without necessary payload.
So far I haven't found anything that would say it's not possible to make this work, so I came here. Right now I have only one option: replacing . and .. to something like __dotPlaceholder__, but this feels like a dirty hack to me.
Any solution (js or non-js) will be welcomed. Any information on why do browsers strip url-encoded dots is also a nice-to-have.
Unfortunately part of RFC3986 defines the URI dot segments to be normalised and stripped out in that case, ie http://example.com/a/./ to become http://example.com/a
see https://www.rfc-editor.org/rfc/rfc3986#page-33 for more information

Data URL images unreadable for IE8

I have a 2D barcode generator making some barcodes on the (Java) backend. It gives me data URLs and I set them on the client side using Javascript. All works fine in Chrome, Firefox. But not in IE8 (of course!) although half of the images work and half don't work.
My images are a couple hundred bytes (so much less then 32kb)
Here is an example not working in IE8:
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAAAAADFHGIkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAEnRFWHRTb2Z0d2FyZQBCYXJjb2RlNEryjnYuAAAAgklEQVR42nVRixKAMAjy/3+a2hBwdVtdD1RELFxOXS6+9v1+F/+ICFs5jpGqsQWSosy3MQbVGSEDC7q4FaQrRiJDepJ1iG2sATggaqkeCc3VqicDDu6omgk1VdmS4W3Uq4sr4hE8llSYKe7GXsTxTPdZTdlyLQA4xrKQOit+Ryv7nwfFATbY5mERHQAAAABJRU5ErkJg
Here is an example working in IE8:
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFMAAABTCAAAAADG2WTcAAAACXBIWXMAACZzAAAmcwHzbHUKAAAAEnRFWHRTb2Z0d2FyZQBCYXJjb2RlNEryjnYuAAABAklEQVR42u2ZwQ7AIAhD+f+f3nbwMGwBl3hYg1xM5ngmUBE3u/abCTHtMTZcS4N3O0z3dNiYg+eeickZ02LMVzRcPJ0DD77zPsw5CQv6BGaYIwkmxo6+/V7S2CKHGShvYaNDmngN+T0TfGk9Y/E0DL4YkxVsWCQsSGGOOjPhHaNVfM5W2YMpMCFu2A3QQyHb722ZkQAT+dLzVJEJvqz4hYqc1H2Y4XHgU0WPCmzFxJhMjrTO00K3uDd7MXm8SSdKc1fej/7LLGTK+8yvPW0jZrGZ15sQXWbUGCTXOXIoiDGLLx3RUN1lOjOzCp5MkzZUmlkIM9JnGs9ezF2mxuz8r2e33b2PHKlEJ4PKAAAAAElFTkSuQmCC
Here is a fiddle. JS fiddle doesn't work so good in IE8 so use this link to view the result directly.
If you got an idea about the cause of this problem please share :)
For some reason the failing base64 decoded .png is invalid. If you download the image and re-save it the filesize is different.
I see 237 bytes for the original verses 409 for a re-saved copy.
This can be confirmed with the pngcheck tool.
Z:\pngcheck-2.3.0>pngcheck.exe original.png
original.png EOF while reading CRC value
ERROR: original.png
Z:\pngcheck-2.3.0>pngcheck.exe re-saved.png
OK: re-saved.png (24x24, 32-bit RGB+alpha, non-interlaced, 82.2%).
Using the following valid re-saved.png base64 encoded data fixes the problem in IE8.
background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAgY0hSTQAAeiYAAICEAAD6AAAAgOgAAHUwAADqYAAAOpgAABdwnLpRPAAAABJ0RVh0U29mdHdhcmUAQmFyY29kZTRK8o52LgAAAPlJREFUSEu1VdESwyAM6v7/o7vpqYcEova6vnRmNSRA9HP/nuufTwMorwL06rvm5MQI0n9X1FZAialCOD6+wY09kYqdgE15dpJx9QjGHYfukCJFBbbOHeLa/Wc1QDDFv6OW4xPAqZCuCKkBU6XcEqprjurfSoe5TRlFmVVDoRkAWzOztLP40IDtpShjlznX2TlALhFATbeb5kcuYhGzM4vnJsxBRoPiWek0dc9UZMcAT26oFmw7ClVcq7F3+rCdrQacwFGVHRvsvLpebXD+lsngwrL3wVI0uHB2DBE6UMezGkI11dtn0Yo2dcko90kN1FA9jQ0A9veb6y86oobBWhxQUgAAAABJRU5ErkJggg==);
I assume Chrome and Firefox are more lenient with the CRC.

regex not working in IE

IE version 8
The regex is working fine in firefox:
filename variable contains: testfile.arv (see invalid extension)
if (/\.(doc|xls|ppt|eml|txt|pdf|rtf).?\b/i.test(filename)) {
...
}
in IE it simply passes out as valid name.
EDIT:
After I changed the expression as suggested below. it continued to fail - in IE only. made me realize that it is not about this expression but something about IE and the javascript module I am using to do this.
I am using http://malsup.com/jquery/form/ this form plugin to upload multiple files. this is working properly on firefox, but is not working on internet explorer. it just simply uploads everything and does not show upload progress etc.
since this problem is turning out to be different I will close this thread and submit new question.
Thx for everyone's time and sorry for trouble. I am finding this porting thing a little difficult (from firefox to IE)
try ending it with an "$", like so
if (/\.(doc|xls|ppt|eml|txt|pdf|rtf)$/i.test(filename)) {
// super awesome code
}

encodeURIComponent keeps on returning URI error

Ok here is the thing. Our site has a bookmarklet and it works fine in all major browsers except Safari. I investigated and found out that it was because of two reasons:
Safari 5.5 has a 2347 char limit on any URL.
It encodes the URI.
I've solved problem 1 by renaming the variables & functions to very short names and also by minimizing the js.
For problem 2, I decided to store the whole function as a string, decodeURIComponent it and then use eval to evaluate back to a function and then execute it (I know I shouldn't be using eval but I can think of no other solution). The problem is that decodeURIComponent returns "URIError: URI error". But if I execute the same code in the developer console for Safari, it executes without any problem.
I am at my wits end. Any help would be greatly appreciated.
Thanks in advance.

HTML5 History.pushState mangles URL's containing percent encoded non-Ascii (Unicode) chars

In an OSS web app, we have JS code that performs some Ajax update (uses jQuery, not relevant). After the page update, a call is made to the html5 history interface History.pushState, in the following code:
var updateHistory = function(url) {
var context = { state:1, rand:Math.random() };
/* -----> bedfore the problem call <------- */
History.pushState( context, "Questions", url );
/* -----> after the problem call <------- */
setTimeout(function (){
/* HACK: For some weird reson, sometimes something overrides the above pushState so we re-aplly it
This might be caused by some other JS plugin.
The delay of 10msec allows the other plugin to override the URL.
*/
History.replaceState( context, "Questions", url );
}, 10);
};
[Please note: the full code segment is provided for context, the HACK part is not the issue of this question]
The app is i18n'ed and is using URL encoded Unicode segments in the URL's, so just before the marked problem call in the above code, the URL argument contains (as inspected in Firebug):
"/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/"
The encoded segment is utf-8 in percent encoding. The URL in the browser window is: (just for completeness, doesn't really matter)
http://<base-url>/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/
Just after the call, the URL displayed in the browser window changes to:
http://<base-url>/%C3%98%C2%A7%C3%99%C2%84%C3%98%C2%A3%C3%98%C2%B3%C3%98%C2%A6%C3%99%C2%84%C3%98%C2%A9/scope:all/sort:activity-desc/page:1/
The URL encoded segment is just mojibake, the result of using the wrong encoding at some level. The correct URL would've been:
http://<base-url>/%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9/scope:all/sort:activity-desc/page:1/
This behavior has been tested on both FF and Chrome.
The history interface specs don't mention anything about encoded URL's, but I assume the default standard for URL formation (utf-8 and percent encoding etc) would apply when using URL's in function calls for the interface.
Any idea on what's going on here.
Edit:
I wasn't paying attention to the uppercase H in History - this code is actually using the History.js wrapper for the history interface. I replaced with a direct call to history.pushState (notice the lowercase h) without going through the wrapper, and the code is working as expected as far as I can tell. The issue with the original code still stands - so an issue with the History.js library it seems.
Update
As Doug S explains in the comments below, the latest version of History.js includes a fix for this behaviour. He also found that my solution caused double-encoding when used in browsers (such as IE 9 and below) which require the hash fallback, so I recommend that instead of using the fix detailed below, just download the latest version.
I've kept my original answer below, since it does explain what's going on in much more detail.
Basel found a resolution of sorts, but there's still some confusion about what's happening under the hood. This answer goes into detail about the problem and suggests a better fix. (You can skip straight to the fix if you want.)
The problem
First, open your browser's JS console and run this:
window.encodeURI(window.unescape('%D8%A7%D9%84%D8%A3%D8%B3%D8%A6%D9%84%D8%A9'))
Does that look familiar? It should—that's what your URL is being mangled to. The problem lies in the implementation of History.unescapeString, specifically this line:
tmp = window.unescape(result);
window.unescape is a DOM Level 0 function—which is to say, an unstandardised relic from the hoary days of Netscape 2. It uses the escaping rules defined in RFC 2396, according to which characters outside of the unreserved range (alphanumerics and a small set of punctuation symbols) are encoded as octets.
This works fine for the US-ASCII range, but not all (indeed, the vast majority) of the characters in UTF-8 can be represented in a single byte. Since URIs do not have a built-in way of representing the character set being used, window.unescape just assumes each character maps to a single octet and blithely mangles any that don't.
In this example, the first letter in your URL is the Arabic letter alef (ا), represented by two bytes: 0xD8 0xA7. window.unescape interprets these as two separate characters: 0x00 0xD8 (Ø—capital O with stroke) and 0x00 0xA7 (§—section sign).
This is a known issue with History.js.
The fix
As noted above by the asker, the issue can be sidestepped by using the native implementation of the History API instead of the History.js wrapper, i.e. history.pushState instead of History.pushState.
This works for browsers that support the History API, but loses the benefit of having a polyfill for those that don't. Fortunately, there's a better fix. Open up the History.js source you're referencing and find this line (~1059 in my copy):
tmp = window.unescape(result);
Replace it with:
tmp = window.unescape(encodeURIComponent(result));
Or, if you're using the compressed source, replace a.unescape(c) with a.unescape(encodeURIComponent(c)).
To test this change, I ran the History.js HTML5 jQuery test suite on a local web server inside an Arabic-named directory. Before making the change, test 14 fails; after the change, all tests passed.
Credit
Though I found the problem and solution independently, Damien Antipa deserves credit for finding it first and making a pull request with the fix.
I'm still able to reproduce this in the following case:
History.pushState(null, null, "?" + some_Unicode_String_Or_A_String_With_Whitespace);
document.location.hash += "&someStuff";
In this case the _suid parameter gets removed and &someStuff as well. If the string is not unicode or doesn't have whitespaces (so no % chars) - this does not happen.
This workaround worked for me:
History.pushState(null, null, "?" + some_Unicode_String_Or_A_String_With_Whitespace + "&someStuff");

Categories