Handle non-ASCII filenames in XHR uploading

I have pretty standard javascript/XHR drag-and-drop file upload code, and just came across an unfortunate real-world snag. I have a file on my (Win7) desktop called "TEST-é-TEST.txt". In Chrome (30.0.1599.69), it arrives at the server with filename in UTF-8, which works out fine. In Firefox (24.0), the filename seems mangled when it arrives at the server.
I didn't trust what Firebug/Chrome might be telling me about the encoding, so I examined the hex of the request packet. Everything else is the same except the non-ASCII character is indeed being encoded differently in the two browsers:
Chrome: C3 A9 (this is the expected UTF-8 for that character)
Firefox: EF BF BD (UTF-8 "replacement character"?!)
Is this a Firefox bug? I tried renaming the file, replacing the é with ó, and the Firefox hex was the same... so such a mangle really seems like a browser bug. (If Firefox were confusedly sending along ISO-8859-1, for example, without touching it, I'd see an E9 byte, and I could handle that on the server side, but it shouldn't mangle it!)
Regardless of the reason, is there something I can do on either the client or server sides to correct for this? If a replacement character is indeed being sent to the server, then it would seem unrecoverable there, so I almost certainly need to do it on the client side.
And yes, the page on which this code exists has charset=utf-8, and Firefox confirms that it perceives the page as UTF-8 under View>Character Encoding.
Furthermore, if I dump the filename to console.log, it appears fine there--I guess it's just getting mangled in/after setRequestHeader("X-File-Name",file.name).
Finally, it would seem that the value passed to setRequestHeader() should be able to have code points up to U+00FF, so U+00E9 (é) and U+00F3 (ó) shouldn't cause a problem, though higher codes could trigger a SyntaxError: http://www.w3.org/TR/XMLHttpRequest2/#the-setrequestheader-method

Thanks so much for Boris's help. Here's a summary of what I discovered through our interactions in comments:
1) The core issue is that HTTP Request headers are supposed to be ISO-8859-1. Prior versions of Chrome and Firefox both passed along UTF-8 strings unchanged in setRequestHeader() calls. This changed in FF24.0 (and apparently will be changing in Chrome soon too), such that FF drops high bytes and passes along only the low byte for each character. In the example I gave in the question, this was recoverable, but characters with higher codes could be mangled irretrievably.
2) One workaround would be to encode on the client side, e.g.:
setRequestHeader('X-File-Name',encodeURIComponent(filename))
and then decode on the server side, e.g. in PHP:
$filename = rawurldecode($_SERVER['HTTP_X_FILE_NAME']);
3) Note that this is only problematic because my ajax file upload approach is to send the raw file data in the request body, so I need to send the filename via a custom request header (as shown in many tutorials online). If I used FormData instead, I wouldn't have to worry about this. I believe if you want solid, standards-based unicode filename support, you should use FormData and not the request header approach.
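The two approaches from points 2) and 3) can be sketched roughly as follows. This is only an illustration, not the code from the question: the helper names encodeFilenameHeader and buildUploadForm, the 'file' field name, and the '/upload' endpoint are all assumptions.

```javascript
// Hypothetical helper: percent-encode the filename so the header value is
// plain ASCII and survives any ISO-8859-1 handling of request headers.
function encodeFilenameHeader(filename) {
  return encodeURIComponent(filename);
}

// Hypothetical helper: put the file into a multipart form instead. The
// filename then travels inside the multipart body, not in a custom header.
function buildUploadForm(file, filename) {
  const form = new FormData();
  form.append('file', file, filename); // 'file' is an assumed field name
  return form;
}

// Browser usage sketch (the '/upload' endpoint is an assumption):
//
//   const xhr = new XMLHttpRequest();
//   xhr.open('POST', '/upload');
//   // Header approach, raw file data in the body:
//   xhr.setRequestHeader('X-File-Name', encodeFilenameHeader(file.name));
//   xhr.send(file);
//   // Or the FormData approach, no custom header needed:
//   // xhr.send(buildUploadForm(file, file.name));
```

With the header approach the server has to reverse the percent-encoding (as with rawurldecode above); with FormData the server reads the filename from its normal upload handling.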

Related

Node.JS binary data URL Decoding to UTF-8 issue

When I send UTF-8 encoded binary data from a PHP client to a Node.js server, Node.js seems to percent-encode (URL-encode) the data internally. I suspect Node.js doesn't support UTF-8 here, because the same data sent as base64 worked fine. I googled a lot and found that many people are facing the same issue. When I try to decode the string manually, I get a "URIError: URI malformed" error.
I also use the deprecated functions unescape, escape, encodeURI and decodeURI.
A little background on my work:
console.log(decodeURIComponent('%B1'));
As you can see in any URL-encoding table, %B1 stands for the "±" sign, but I get the error mentioned above. The function fails to decode many other special characters too. I don't know why Node.js doesn't support a standard decoding style like UTF-8.
Please help.
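For what it's worth, %B1 is the single-byte ISO-8859-1 (Latin-1) encoding of "±", whereas decodeURIComponent expects percent-encoded UTF-8, in which "±" is the two bytes C2 B1. A minimal sketch of the difference, assuming the incoming data really is Latin-1 percent-encoded (the helper names tryDecodeUtf8 and decodeLatin1Percent are made up for illustration):

```javascript
// Returns the decoded string, or null if the input is not valid
// percent-encoded UTF-8 (decodeURIComponent throws URIError in that case).
function tryDecodeUtf8(s) {
  try {
    return decodeURIComponent(s);
  } catch (e) {
    return null;
  }
}

// Hypothetical fallback: treat every %XX as one Latin-1 byte, i.e. map it
// directly to the code point with the same numeric value.
function decodeLatin1Percent(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16)));
}

console.log(tryDecodeUtf8('%B1'));       // not valid UTF-8 on its own
console.log(tryDecodeUtf8('%C2%B1'));    // UTF-8 encoding of the same sign
console.log(decodeLatin1Percent('%B1')); // Latin-1 interpretation
```

If the sender can be changed, having it emit UTF-8 percent-encoding is probably the simpler fix than decoding Latin-1 on the receiving side.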

How does the browser process image sources?

[UPDATE] I found two links useful to me:
http://mrcoles.com/blog/canvas-composite-operations-demo-animation/
http://tutorials.jenkov.com/html5-canvas/composition.html
Recently, while learning canvas, I found that there is more than one way to specify the image src:
1) You can give an image URI, like: www.XYZ.com/abc.png
2) You can give a data URI, like: 'data:image/svg+xml;base64,' + hexcode;
3) You can give a data URI from a canvas, like: canvas.toDataURL("image/png");
I am a little confused about the differences among them and am wondering how the browser processes them.
Thanks

It's simply a matter of what the browser supports. The browser looks at the scheme at the start of the string (the part before the first colon) and, if it recognizes it, tries to interpret the rest accordingly. For example:
If it starts with "http://" or "https://", it parses the rest of the string on that basis, then tries to connect to the server and communicate over the HTTP(S) protocol. The protocol itself is specified in RFCs. If all is OK, data is transferred from the server to the browser, which then moves on to interpreting the received data itself (this can happen while the data is loading or after it has loaded completely).
If the string starts with "data:", the browser treats it as a data URI (if it supports them); if it doesn't, it considers the source invalid. As it does not need to connect to any external resource in this case, it validates the content and uses it if valid. The data URI is converted to binary data (the base-64 representation is converted back to binary form).
(2. and 3. are the same, BTW. It's not hex code that is appended, but a base-64 encoded string, i.e. an ASCII representation. Other representations are possible but not common.)
Then there are other schemes which the browser may support, such as Blob URLs; some browsers may allow FTP (ftp://), and some allow file:// given certain restrictions.
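The dispatch described above can be sketched as below. This is not what a browser actually runs, just an illustration of branching on the scheme and of turning a base-64 data URI back into bytes; classifyImageSource and decodeBase64DataUri are hypothetical names.

```javascript
// Classify a src string by its scheme, roughly the first decision a
// browser has to make. Strings without a scheme cannot be parsed as
// absolute URLs; a browser would resolve them against the page URL.
function classifyImageSource(src) {
  let url;
  try {
    url = new URL(src);
  } catch (e) {
    return 'relative-or-invalid';
  }
  switch (url.protocol) {
    case 'http:':
    case 'https:':
      return 'network'; // needs a request to a server
    case 'data:':
      return 'inline';  // content is embedded in the string itself
    case 'blob:':
      return 'blob';    // refers to an in-memory object
    default:
      return 'other';   // ftp:, file:, and so on
  }
}

// Convert the base-64 payload of a data URI back into raw bytes.
// Assumes the URI uses ";base64" and contains a comma.
function decodeBase64DataUri(src) {
  const payload = src.slice(src.indexOf(',') + 1);
  return Uint8Array.from(atob(payload), (ch) => ch.charCodeAt(0));
}
```

Note that a string like www.XYZ.com/abc.png has no scheme, so it falls into the first branch; in a page it would be treated as a relative URL.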

Intermittent failure to load images - ERR_CONTENT_LENGTH_MISMATCH

The problem
My website fails to load random images at random times.
Intermittent failure to load image with the following error in console:
"GET example.com/image.jpg net::ERR_CONTENT_LENGTH_MISMATCH"
Image either doesn't load at all and gives the broken image icon with alt tag, or it loads halfway and the rest is corrupted (e.g. colors all screwed up or half the image will be greyed out).
Setup
LiteSpeed server, PHP/MySQL website, with HTML, CSS, JavaScript, and jQuery.
Important Notes
Problem occurs on all major web browsers - intermittently and with various images.
I am forcing UTF-8 encoding and HTTPS on all pages via htaccess.
Hosting provider states that all permissions are set correctly.
In my access log, when an image fails to load, it gives a '200 OK' response for the image and lists the bytes transferred as '0' (zero).
It is almost always images that fail to load but maybe 5% of the time it will be a CSS file or Javascript file.
Problem occurred immediately after moving servers from Apache to Litespeed and has been persistent over several weeks.
Gzip and caching enabled.

This error means there is a definite mismatch between the length advertised in the HTTP headers (Content-Length) and the amount of data actually transferred over the wire.
It could come from the following:
Server: a bug in a server module that changes the content but doesn't update the Content-Length header, or that simply doesn't work properly.
Proxy: any proxy between you and your server could be modifying the response without updating the Content-Length header.
This can also happen if the wrong Content-Type is set.
As far as I know, I haven't seen this problem with IIS/Apache/Tomcat themselves, but mostly with custom-written code (writing the image to the response stream yourself).
It could even be caused by your ad blocker.
Try disabling it or adding an exception for the domain the images come from.
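The check the browser is effectively making can be sketched like this. It is only an illustration: checkContentLength is a hypothetical helper, headers is assumed to be a plain object with lower-cased names, and body is assumed to hold the raw bytes exactly as transferred (with gzip enabled, Content-Length counts the compressed bytes, not the decoded ones).

```javascript
// Compare the advertised Content-Length with the number of bytes that
// actually arrived. A missing header is not treated as a mismatch.
function checkContentLength(headers, body) {
  const raw = headers['content-length'];
  const declared = raw === undefined ? null : Number(raw);
  const actual = body.length;
  return {
    declared,
    actual,
    ok: declared === null || declared === actual,
  };
}

// Example: a response that advertises 10 bytes but delivers none, which
// matches the "0 bytes transferred" symptom described in the question.
console.log(checkContentLength({ 'content-length': '10' }, new Uint8Array(0)));
```

Running a comparison like this against the raw response from the origin server and again through any proxy or CDN may help narrow down which hop truncates the data.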

I suggest requesting the image as a discrete URL using cURL, e.g.
php testCurlimg >image.log 2>&1 to see exactly what is being returned by the server. Then you can move up one level and test the web page, e.g.
php testCurlpg >page.log 2>&1 to see the image in the context of the mixed page data.

I just ran into this same ERR_CONTENT_LENGTH_MISMATCH error. I optimized the image and that fixed it. I did the image optimization using ImageOptim but I'm guessing that any image optimization tool would work.

I had this problem today retrieving images from Apache 2.4 through a proxy I wrote in PHP to provide a JWT auth gateway for accessing a CouchDB backend. The proxy uses PHP's fsockopen, and the fread() buffer was set relatively low (30 bytes) because I had seen this value used in other people's work and never thought to change it. In all my failing JPG (JFIF) images, I found that the discrepancy between the original and the served image was a series of CRLFs that matched the size of the fread buffer. I increased the byte length of the buffer and the problem no longer exists.
In short, if your fread buffer streaming the image is completely full of carriage returns and line feeds, the data gets truncated. This possibly also relates to the post from Collin Krawll as to why image optimization resolved that problem.

Chrome, Firefox converting ":" to "-" and "_" respectively in their file save dialog

I am trying to save a file using FileSaver library which will save the file using Chrome's and Firefox's Save As dialog.
Ex: I have certain filename like testing:testing1:testing2.csv.
Now when the Save As dialog pops up, I am seeing filename converted to
testing-testing1-testing2.csv for Chrome
and
testing_testing1_testing2.csv for Firefox.
Is there any way we can suppress this conversion of characters?
Thanks

No.
File names can't contain, among other characters, the colon : (On Windows machines). If you want to make sure your application is compatible with Windows, keep that in mind.
These are the disallowed characters:
\/:*?"<>|
Firefox & Chrome probably replace all of those by the dash / underscore.
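If you want the saved name to be predictable across browsers, one option is to do the replacement yourself before handing the name to the save dialog. A minimal sketch, assuming the Windows character list above; sanitizeFilename is a hypothetical helper and the default replacement character is an arbitrary choice:

```javascript
// Replace every character that Windows disallows in file names
// ( \ / : * ? " < > | ) with a replacement of your choosing.
function sanitizeFilename(name, replacement = '_') {
  return name.replace(/[\\\/:*?"<>|]/g, replacement);
}

console.log(sanitizeFilename('testing:testing1:testing2.csv', '-'));
```

The browser may still adjust the name further, but at least the characters listed here will already be gone.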

The filename is merely a suggestion. RFC 2616 states that:
19.5.1 Content-Disposition
The Content-Disposition response-header field has been proposed as a
means for the origin server to suggest a default filename if the user
requests that the content is saved to a file. This usage is derived
from the definition of Content-Disposition in RFC 1806 [35].
RFC 1806 further states that:
It is important that the receiving MUA not blindly use the suggested
filename. The suggested filename SHOULD be checked (and possibly
changed) to see that it conforms to local filesystem conventions, does
not overwrite an existing file, and does not present a security
problem (see Security Considerations below).
Long story short, different file systems have different restrictions on filenames. The browser is free to fix the filename if it cannot be used as-is.

Assign image to object gives a interpreted warning

I have created an image in my object so I can draw it to my canvas. I did it like this:
item[id].img = new Image();
item[id].img.src = './image_folder/'+data[i][j].image;
Then my canvas draws on this line:
canvas[2].ctx.drawImage(item[theID].img, px, py);
It works fine but in Chrome console it says:
Resource interpreted as Image but transferred with MIME type text/html
I'm curious what this actually means and how to correct it.

When you request things from servers, they will send "headers" with whatever is being sent.
It's how browsers can figure out how to use video or music, or know what to do with JS or CSS.
Modern browsers are pretty intelligent about dealing with these things, but if you tried to send an .mp3 to a browser that doesn't know how to use .mp3s, it might try loading the file as text, and you'd get a lot of funny characters.
MIME types can avoid that, mostly. If you ask to download an .mp3, the server might send a header that looks like "Content-Type: audio/mpeg".
A regular web-page, in comparison, would be sent as "Content-Type: text/html", while a .png image would be sent as "Content-Type: image/png".
If you're playing around on a test-server that you installed using a WAMP installer, or EasyPHP or whatever, your server probably doesn't know about serving .png files with the "image/png" MIME-type.
Intelligent browsers will read the contents of the file and try to figure out what they're supposed to be, if they're given the wrong MIME-type for the file (which is why your images work in the first place).
This particular error probably isn't going to hurt anyone (because browsers that can't figure out you've got a .png file are probably browsers that don't have <canvas>).
But to fix it in other cases (like .ogg files for <audio> and <video> support, which IS important), you should figure out what kind of server you're running (my money's on Apache), and figure out how to add mime-type and file-type declarations.
You could find that through a Google search like "add mime-types to apache".
If this is a server that's live on the internet, and you're paying for hosting, then you'll need to set it through your hosted site.
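To make the idea concrete, here is a rough sketch of the kind of extension-to-MIME-type lookup a server performs before sending the Content-Type header. It is not how Apache is implemented; the table is deliberately tiny, and contentTypeFor is a hypothetical helper.

```javascript
// A small, incomplete table mapping file extensions to MIME types.
const MIME_TYPES = {
  '.html': 'text/html',
  '.css': 'text/css',
  '.js': 'text/javascript',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.mp3': 'audio/mpeg',
  '.ogg': 'audio/ogg',
};

// Pick the Content-Type for a file name; unknown extensions fall back to
// a generic binary type instead of text/html.
function contentTypeFor(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot).toLowerCase();
  return MIME_TYPES[ext] || 'application/octet-stream';
}

console.log(contentTypeFor('image_folder/sprite.png'));
```

A server missing the '.png' entry in its equivalent of this table would be one way to end up with an image delivered under the wrong type.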
