I'm trying to figure out the lowest data-overhead way to upload/download binary data to Google AppEngine's Blobstore from a JavaScript initiated HTTP request. Ideally, I would like to submit the binary data directly, i.e. as unencoded 8-bit values; maybe in a POST request that looks something like this:
...
Content-Type: multipart/form-data; boundary=boundary
--boundary
Content-Disposition: form-data; name="a"; filename="b"
Content-Type: application/octet-stream
##^%(^Qtr...
--boundary--
Here, ##^%(^Qtr... ideally represents arbitrary 8-bit binary data.
Specifically, I am trying to understand the following:
Is it possible to directly upload 8-bit binary data, or would I need to encode the data somehow, like a base-64 MIME encoding?
If I use a different encoding, would Blobstore save the data as 8-bit binary internally or in the encoded format? I.e. would a base-64 encoding increase my storage cost by 33%?
Along the same lines: Does encoding overhead increase outgoing bandwidth cost?
Is there a better way to format the POST request so I don't need to come up with a boundary that doesn't appear in my binary data? E.g. is there a way to specify a Content-Length rather than a boundary?
In the GET request to retrieve the data, can I simply expect to have binary data end up in the return string, or is the server going to automatically encode the data somehow?
If I need to use some encoding, which one would be the best choice among the supported options for essentially random 8-bit data? (base-64, UTF-8, something else?)
Even though I received the Tumbleweed Badge for this question, let me report on my progress anyway in case somebody out there does care:
This question turned out to pose 3 independent problems:
Uploading data to BlobStore efficiently
Making sure BlobStore saves it in the smallest possible format
Finding a way to reliably download the data
Let's start with (3), because this ends up posing the biggest issue:
So far I have not been able to find a way to download true 8-bit data to the browser via XHR. Using mime-types like application/octet-stream leads to only 7 bits reaching the client reliably, unless the data is downloaded to a file. The best solution I found is to use the following mime-type for the data:
text/plain; charset=ISO-8859-1
This seems to be supported in all browsers that I've tested: IE 8, Chrome 21, FF 12.0, Opera 11.61, Safari 5.1.2 under Windows, and Android 2.3.3.
With this, it is possible to transfer almost any 8-bit value, with the following restrictions/caveats:
Character 0x00 is interpreted as the end of the input string in IE8 and must therefore be avoided.
Most browsers interpret charset ISO-8859-1 as Windows-1252 instead, leading to characters 0x80 through 0x9F being changed accordingly. This can be fixed, though, as the changes are unambiguous (see http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout and the sketch after this list).
Characters 0x81, 0x8D, 0x8F, 0x90, 0x9D are reserved in the Windows-1252 charset and Opera returns an error code for these, therefore these need to be avoided as well.
Overall, this leaves us with 250 out of the 256 characters which we can use. With the required base conversion of the data (mapping arbitrary bytes onto the 250 usable values), this means an outgoing-data overhead of under 0.5%, which I guess I'm ok with.
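For illustration, here is a minimal sketch (function and variable names are my own) of applying the reverse mapping to an XHR responseText on the client; the table follows the Windows-1252 codepage layout linked above:

// Unicode code points that browsers substitute when treating ISO-8859-1
// input as Windows-1252, mapped back to the original byte values.
var CP1252_REVERSE = {
  0x20AC: 0x80, 0x201A: 0x82, 0x0192: 0x83, 0x201E: 0x84, 0x2026: 0x85,
  0x2020: 0x86, 0x2021: 0x87, 0x02C6: 0x88, 0x2030: 0x89, 0x0160: 0x8A,
  0x2039: 0x8B, 0x0152: 0x8C, 0x017D: 0x8E, 0x2018: 0x91, 0x2019: 0x92,
  0x201C: 0x93, 0x201D: 0x94, 0x2022: 0x95, 0x2013: 0x96, 0x2014: 0x97,
  0x02DC: 0x98, 0x2122: 0x99, 0x0161: 0x9A, 0x203A: 0x9B, 0x0153: 0x9C,
  0x017E: 0x9E, 0x0178: 0x9F
};

function responseTextToBytes(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    // Undo the Windows-1252 substitution; all other characters map 1:1.
    bytes.push(CP1252_REVERSE[code] !== undefined ? CP1252_REVERSE[code] : code);
  }
  return bytes; // array of values 0x00-0xFF
}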
So, now to problem (1) and (2):
As incoming bandwidth is free, I've decided to reduce the priority of solving problem (1) in favor of problems (2) and (3). It turns out that using the following POST request then does the trick:
...
Content-Type: multipart/form-data; boundary=-
---
Content-Disposition: form-data; name="a"; filename="b"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: base64
abcd==
-----
Here, abcd== stands for the base64-MIME encoding of the data, which itself consists of the 250 allowed characters described above (see http://en.wikipedia.org/wiki/Base64#Examples; GAE uses + and / as the last 2 characters). The encoding is necessary (correct me if I'm wrong) because calling the XHR send() function with String data results in UTF-8 encoding of the string, which corrupts the data received by the server. Unfortunately, passing ArrayBuffers and Blobs to the send() function, which would circumvent this issue more elegantly, isn't available in all browsers yet.
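For illustration, a minimal sketch (the URL parameter and the form-field names are placeholders) of assembling and sending such a request with btoa(), so that send() only ever sees characters below 0x80 and the UTF-8 conversion is harmless:

function uploadBytes(uploadUrl, bytes) {
  // Build a "binary string" with one character per byte (0x00-0xFF).
  // Note: apply() may need chunking for very large arrays.
  var raw = String.fromCharCode.apply(null, bytes);
  var body =
    '---\r\n' +
    'Content-Disposition: form-data; name="a"; filename="b"\r\n' +
    'Content-Type: text/plain; charset=ISO-8859-1\r\n' +
    'Content-Transfer-Encoding: base64\r\n' +
    '\r\n' +
    btoa(raw) + '\r\n' +
    '-----\r\n';
  var xhr = new XMLHttpRequest();
  xhr.open('POST', uploadUrl, true);
  xhr.setRequestHeader('Content-Type', 'multipart/form-data; boundary=-');
  xhr.send(body); // every character is < 0x80, so UTF-8 encoding is a no-op
}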
Now the good news: The AppEngine BlobStore decodes this data automatically and correctly and stores it without overhead! Therefore, using the base64 encoding only leads to slower data uploads from the client, but does not result in additional hosting cost (apart from perhaps a few CPU cycles for the decoding).
Aside: The AppEngine development-server will report the encoded size (i.e. 33% larger) for the stored blob, both in the admin console and in a retrieved BlobInfo record. The production servers do not have this issue, though, and report the correct blob size.
Conclusion:
Using Content-Transfer-Encoding base64 for uploading binary data of Content-Type text/plain; charset=ISO-8859-1, which may not contain characters 0x00, 0x81, 0x8D, 0x8F, 0x90, and 0x9D, leads to reliable data transfer for many tested browsers with a storage/outgoing-bandwidth overhead of less than half a percent. The upload-overhead of the base64-encoded data is 33%, which is better than the expected 50% for UTF-8 (for random 8-bit data), but still far from desirable.
What I don't know is: Is this the optimal solution, or could one do better? Anyone up for the challenge?
Related
I have a GraphQL server hosted on Express. I want to return images to the client by sending back Node.js buffer objects. How can I configure the GraphQL server to return bytes instead of JSON? I don't wish to do this through base64, as the images are large.
You have to return JSON, but there's still a way. We're using GraphQL to return images stored in Blob fields in a legacy SQL database. We're using sequelize, graphql-sequelize, and graphql-js.
We've defined the Blob fields to be String types in our GraphQL schema, and so they come through in the JSON reply just fine.
Then we convert to a buffer before delivering, like
const imgBuffer = Buffer.from(imgObj.data, 'ascii'); // Buffer.from is a static factory; no `new` needed
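For context, a minimal sketch (type and field names are illustrative, not our actual schema) of exposing such a Blob column as a String field in graphql-js:

const { GraphQLObjectType, GraphQLString } = require('graphql');

const ImageType = new GraphQLObjectType({
  name: 'Image',
  fields: {
    data: {
      type: GraphQLString,
      // imgObj.data is the raw Blob buffer delivered by sequelize
      resolve: (imgObj) => Buffer.from(imgObj.data).toString('ascii')
    }
  }
});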
The only problem is we're now having trouble saving image data back to the database via the graphql interface. Our mutation function gives us a syntax error when it finds certain bad unicode characters in the strings, like \U0000 and whatnot (so I found your question looking for a solution to that).
There's a way, but it's complicated and very manual. I'm only going to give you an overview of what I've done in ApolloServer, but I think it should be enough.
First, you need to use the "Accept" header in your request to ask for a binary mime type, and send a matching "Content-Type" in your response. This is necessary to be efficient, but not necessary for it to work, as you'll see (with EJSON).
To serialize and deserialize respecting the headers, you may need to write an Express middleware, and you'll need to handle base64 encoding with a {$data: "..."} encapsulating object (as EJSON does), or just (strangely) return null if someone requests binary data using "application/json" as their "Accept" header. You'll also want to choose which binary formats you'll support. I only use one, "application/x-msgpack", but I hear that "application/cbor" is becoming more popular. You can use a library for EJSON, MessagePack, or CBOR to do the serialization, so this isn't as hard as it sounds.
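As a rough sketch of the middleware idea (assuming the msgpack-lite package; the Accept-header check is simplified and the middleware name is made up):

const msgpack = require('msgpack-lite');

function binaryReplyMiddleware(req, res, next) {
  if (req.headers.accept === 'application/x-msgpack') {
    // Re-route res.json() through the MessagePack encoder.
    res.json = function (obj) {
      res.set('Content-Type', 'application/x-msgpack');
      return res.send(msgpack.encode(obj));
    };
  }
  next();
}

// app.use(binaryReplyMiddleware); // mount before the GraphQL endpoint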
I would then strongly recommend using @defer on any images. See this post for more information on @defer: https://www.apollographql.com/blog/introducing-defer-in-apollo-server-f6797c4e9d6e/
I've done it. It wasn't easy, and it would be better if ApolloServer worked this way "out of the box".
It's better to send a hashed, temporary link to download it:
The URL can be hashed so that it is not guessable by other users.
The backend can expire the link or remove the binary file from the static server.
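A minimal sketch of that idea (Node, with an HMAC signature; names and routes are illustrative):

const crypto = require('crypto');
const SECRET = process.env.LINK_SECRET; // shared only with the file server

function makeTempLink(fileId, ttlMs) {
  const expires = Date.now() + ttlMs;
  const sig = crypto.createHmac('sha256', SECRET)
    .update(fileId + ':' + expires)
    .digest('hex');
  return '/files/' + fileId + '?expires=' + expires + '&sig=' + sig;
}

// The download route recomputes the HMAC and rejects expired or
// tampered links before streaming the file.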
There might be an answer to your question by using the node module found here.
My requirement is to allow users to use (type) ANSI characters instead of UTF-8 when they are typing into the text fields of my webpages.
I looked at setting the character set in the HTML meta tag:
<meta charset="ISO-8859-1">
That was helpful for displaying the content in ANSI instead of UTF-8, but it does not stop users from typing in UTF-8. Any help is appreciated.
Let's distinguish between two things here: characters the user can type and the encoding used to send this data to the server. These are two separate issues.
A user can type anything they want into a form in their browser. For all intents and purposes, these characters have no encoding at this point; they're pure "text". Encodings do not play a role just yet, and you cannot restrict the set of available characters with encodings.
Once the user submits the form, the browser has to encode this data into binary somehow, which is where an encoding comes in. Ultimately the browser decides how to encode the data, but it will choose the encoding specified in the HTTP headers, meta elements and/or the accept-charset attribute of the form. The latter should always be the deciding factor, but you'll find buggy behaviour in the real world (*cough*cough*IE*cough*). In practice, all three character set definitions should be identical so as not to cause any confusion.
Now, if your user typed in some "exotic" characters and the browser has decided to encode the data in "ANSI" and the chosen encoding cannot represent those exotic characters, then the browser will typically replace those characters with HTML entities. So, even in this case it doesn't restrict the allowed characters, it simply finds a different way to encode them.
How can I know what encoding is used by the user?
You cannot. You can only specify which character set you would like to receive and then double check that that's actually what you did receive. If the expectation doesn't match, reject the input (an HTTP 400 Bad Request response may be in order).
If you want to limit the acceptable set of characters a user may input, you need to do this by checking and rejecting characters independently of their encoding. You can do this in JavaScript at input time, and you will ultimately need to do it on the server again (since browser-side JavaScript has no influence on what can actually get submitted to the server).
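A minimal sketch of such an input-time check (the field selector is illustrative; the same test must be repeated server-side):

function isLatin1(text) {
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xFF) return false; // outside ISO-8859-1
  }
  return true;
}

var field = document.querySelector('input[name="comment"]');
field.addEventListener('input', function () {
  field.setCustomValidity(isLatin1(field.value) ? '' : 'Latin-1 characters only, please.');
});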
If you set the encoding of the page to UTF-8 in a meta tag and/or HTTP header, it will be interpreted as UTF-8, unless the user deliberately goes to the View->Encoding menu and selects a different encoding, overriding the one you specified.
In that case, accept-charset would have the effect of setting the submission encoding back to UTF-8 in the face of the user messing about with the page encoding. However, this still won't work in IE, due to the previously discussed problems with accept-charset in that browser.
So it's IMO doubtful whether it's worth including accept-charset just to fix the case where a non-IE user has deliberately sabotaged the page encoding.
All I have is <input type="file" name="myFileUpload" />.
After the user chooses a file (most likely an image), how do I obtain the actual contents of the file as a string? (If possible, please tell me anything about base-64 and URL encoding/decoding.)
I was asked to obtain such a string and set it as a value of a JSON object, then such a JSON object would be posted to the server "as is", that is, application/json; charset=utf-8.
I'm not sure if the above is a common practice since I'm accustomed to just posting such data as multipart/form-data which I was told not to use.
The receiver of this gigantic JSON object is an ASP.NET Web API controller. I suppose there would be a problem with deserialization if such an object is potentially multiple megabytes large.
So again: how do I obtain the image bytes as a string, and what problems might I encounter when posting such a large JSON object, especially when it's received on the server side?
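One common way to obtain that string, sketched below under the assumption that the server accepts a base64 string inside the JSON (the endpoint and property names are made up): FileReader.readAsDataURL() yields a base64 data URL, and stripping its prefix leaves a plain base64 string (roughly 33% larger than the raw bytes):

var input = document.querySelector('input[name="myFileUpload"]');
input.addEventListener('change', function () {
  var reader = new FileReader();
  reader.onload = function () {
    // reader.result looks like "data:image/png;base64,iVBORw0..."
    var base64 = reader.result.split(',')[1];
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/api/images', true); // endpoint is illustrative
    xhr.setRequestHeader('Content-Type', 'application/json; charset=utf-8');
    xhr.send(JSON.stringify({ fileName: input.files[0].name, data: base64 }));
  };
  reader.readAsDataURL(input.files[0]);
});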
I'm stuck at the following situation.
I want to send bytes via HTTP POST, using JavaScript and jQuery, to a server. I figured out that I can send bytes via String.fromCharCode(...) with a mime-type of application/octet-stream or text/plain; charset=x-user-defined.
But now here's the problem. I have to send a specific number of bytes with values greater than 127. (The packet, if sniffed in Wireshark, has to consist of e.g. 5 bytes.)
Is this possible with jQuery? Or is this possible with JavaScript at all?
(e.g. I need to send 1 byte --> 0xAF)
Is it possible to send this one byte, 0xAF ? Or will it be always 2 bytes because the value is bigger than 127?
To clarify this for whomever it may concern:
The request is embedded in an HTTP POST frame. This frame is encoded depending on its mime type.
For any textual encoding, including application/octet-stream, byte values are converted to their textual representation.
For byte values > 127, this means that 2 bytes are needed for the textual representation.
In my case, I needed to ensure that a value of e.g. 128 is transmitted as exactly 1 byte (the server pipes the requests through to another protocol - Modbus).
The solution was to use an ArrayBuffer, although this meant requiring IE >= 10.
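A minimal sketch of that ArrayBuffer approach (the URL is a placeholder):

// Exactly these 5 bytes go on the wire, with no textual re-encoding.
var bytes = new Uint8Array([0xAF, 0x60, 0x00, 0x42, 0x80]);
var xhr = new XMLHttpRequest();
xhr.open('POST', '/endpoint', true);
xhr.setRequestHeader('Content-Type', 'application/octet-stream');
xhr.send(bytes.buffer); // requires XHR2 support, i.e. IE >= 10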
I want to send data from Javascript to a WebSocket server and also from a WebSocket server to Javascript.
I want to send this:
Headers
-------
Field 1: 2 byte hex
Field 2: 2 byte hex
Field 3: 4 byte hex
Data
----
Field 1: 2 byte hex
Field 2: 8 byte hex
From Javascript, I can send a two-byte value via
socket = new WebSocket(host);
...
socket.send(0xEF);
But I want to send multiple fields, together...let's say 0xEF, 0x60, and 0x0042.
How do I do this?
And how do I interpret, via JavaScript, data containing multiple fields coming from the WebSocket server?
You can send data as a string. For example:
socket.send("hello world");
I recommend using JSON as the data format. You can convert JSON strings directly into objects and vice versa. It's so simple and useful!
You can send data as JSON objects.
socket.send(JSON.stringify({field1:'0xEF', field2:'0x60',field3: '0x0042'}));
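And on the receiving side, a minimal sketch of turning such a message back into fields:

socket.onmessage = function (event) {
  var msg = JSON.parse(event.data);
  // msg.field1 === '0xEF', msg.field2 === '0x60', msg.field3 === '0x0042'
  var field1 = parseInt(msg.field1, 16); // back to a number: 239
};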
Sounds like what you are asking is how to send binary data over a WebSocket connection.
This is largely answered here:
Send and receive binary data over web sockets in Javascript?
A bit of extra info not covered in that answer:
The current WebSocket protocol and API only permit strings to be sent (or anything that can be coerced/type-cast to a string), and received messages are strings. The next iteration of the protocol (HyBi-07) supports binary data and is currently being implemented in browsers.
JavaScript strings are UTF-16, i.e. 2 bytes for every character internally. The current WebSocket payload is limited to UTF-8. In UTF-8, character values below 128 take 1 byte to encode; values 128 and above take 2 or more bytes. When you send a JavaScript string, it gets converted from UTF-16 to UTF-8. To send and receive binary data (until the protocol and API support it natively), you need to encode your data into something compatible with UTF-8, for example base64. This is covered in more detail in the answer linked above.
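To connect this back to the original question, a minimal sketch (in browsers with typed-array support; the function name is made up) of packing the header fields into bytes and base64-encoding them so the payload survives the UTF-8-only protocol:

// Pack a 2-byte, 2-byte and 4-byte field into 8 bytes, then base64-encode.
function packHeader(f1, f2, f3) {
  var view = new DataView(new ArrayBuffer(8));
  view.setUint16(0, f1); // e.g. 0xEF
  view.setUint16(2, f2); // e.g. 0x60
  view.setUint32(4, f3); // e.g. 0x0042
  var raw = '';
  for (var i = 0; i < 8; i++) raw += String.fromCharCode(view.getUint8(i));
  return btoa(raw); // base64 text is safe to pass to socket.send()
}

socket.send(packHeader(0xEF, 0x60, 0x0042));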