How to Obtain Image Bytes as String in JavaScript?

All I have is <input type="file" name="myFileUpload" />.
After the user chooses a file (most likely an image), how do I obtain the actual contents of the file as a string? (If possible, please also explain Base64 and URL encoding/decoding.)
I was asked to obtain such a string and set it as a property value of a JSON object. That JSON object would then be posted to the server "as is", that is, as application/json; charset=utf-8.
I'm not sure whether this is common practice, since I'm used to posting such data as multipart/form-data, which I was told not to use.
The receiver of this gigantic JSON object is an ASP.NET Web API controller. I suspect deserialization could become a problem if the object is several megabytes in size.
So again: how do I obtain the image bytes as a string, and what problems might I encounter when posting such a large JSON object, especially when it is received on the server side?
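A minimal browser sketch of one way to do this, assuming a browser that supports Blob.prototype.arrayBuffer(), btoa and fetch. It reads the selected File (a File is a Blob) into bytes, Base64-encodes them, and wraps the result in a JSON body. The property names fileName, contentType and data, the helper names, and the upload URL are hypothetical placeholders, not anything the server is known to expect.

```javascript
// Encode raw bytes as a standard Base64 string (A-Z, a-z, 0-9, "+", "/", "=" padding).
function bytesToBase64(bytes) {
  let binary = "";
  const chunkSize = 0x8000; // build the string in chunks to avoid argument-count limits
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Each byte becomes one character in U+0000..U+00FF, which is what btoa expects.
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Read a Blob/File (e.g. input.files[0]) and return its contents as Base64.
async function blobToBase64(blob) {
  return bytesToBase64(new Uint8Array(await blob.arrayBuffer()));
}

// Hypothetical payload shape; adjust the property names to what the server expects.
async function buildJsonPayload(file) {
  return JSON.stringify({
    fileName: file.name,
    contentType: file.type,
    data: await blobToBase64(file),
  });
}

// Hypothetical wiring for <input type="file" name="myFileUpload" />; `url` is a placeholder.
async function uploadSelectedFile(input, url) {
  const file = input.files[0];
  if (!file) return;
  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: await buildJsonPayload(file),
  });
}
```

Base64 output is 4 * ceil(n / 3) characters for n bytes, so the JSON body is roughly a third larger than the image itself, and the server has to buffer and parse all of it before your controller sees the bytes. Because Base64 is plain ASCII it can sit inside a JSON string without further escaping; URL encoding only matters if the value goes into a URL or a form-urlencoded body, where "+", "/" and "=" have special meaning. On the ASP.NET side, System.Convert.FromBase64String turns the string back into a byte[].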

Related

Loading buffer data from database as PDF

I have been developing a web app where the user can upload a PDF file and later retrieve and view it. To achieve this, I upload the PDF to a PostgreSQL database as the bytea data type (the column is called "attachment"), and then have Node.js serve this data so it can be fetched and turned back into a PDF for viewing.
However, I have been struggling to convert the data back into a valid PDF. This is the method I am using so far:
var file = new Blob([res[i].attachment], { type: 'application/pdf' });
var fileURL = URL.createObjectURL(file);
document.getElementById("pdf_box").data = fileURL;
The #pdf_box identifier refers to an object element in an HTML file that is used to display the PDF (this works when I set the element's data attribute to the location of a dummy PDF file).
res[i].attachment also appears to contain valid buffer data in JSON format; an example is shown below:
"attachment":{"type":"Buffer","data":[91,111,98,106,101,99,116,32,70,105,108,101,93]}
When I load the created fileURL into the data attribute of #pdf_box, however, I get an error along the lines of an invalid PDF file. My research so far suggests this may be because the data arrives as a Buffer whereas it needs to be raw bytes, but I haven't found much showing how to achieve this, or whether I can convert between the two forms with the data I have access to. I have also seen occasional references to a method called pdf.create(), but I cannot find documentation for it and assume it belongs to a third-party JS library.
If you know of anything that can help me understand what my problem is and what to search for to find a solution, it would be greatly appreciated. Thank you.
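A minimal sketch of the conversion, assuming res[i].attachment has the shape shown above. When a Node.js Buffer goes through JSON.stringify it becomes {"type":"Buffer","data":[...]}; passing that plain object to the Blob constructor does not copy the bytes, because Blob parts that are not binary are converted to strings, so the Blob ends up containing the text "[object Object]". Turning the data array into a Uint8Array first gives the Blob the real bytes. Separately, the sample bytes in the question decode to the ASCII text "[object File]", which suggests (if the sample is real data) that the upload step stored the string form of a File object rather than its contents, so the upload side may need fixing too. The function names are hypothetical.

```javascript
// Convert a JSON-serialized Node.js Buffer ({ type: "Buffer", data: [...] })
// back into raw bytes.
function attachmentToBytes(attachment) {
  if (attachment && attachment.type === "Buffer" && Array.isArray(attachment.data)) {
    return new Uint8Array(attachment.data);
  }
  throw new TypeError("Expected a JSON-serialized Buffer");
}

// Browser usage: build a PDF Blob from the bytes and point the <object> element at it.
function showPdf(attachment, objectElement) {
  const file = new Blob([attachmentToBytes(attachment)], { type: "application/pdf" });
  objectElement.data = URL.createObjectURL(file);
}
```

In the page this would be called as showPdf(res[i].attachment, document.getElementById("pdf_box")).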
To anyone reading this question: this approach did not seem possible and would likely be very inefficient to implement. My understanding is that if you send a string to Node.js that is larger than the MTU of your link to the server (anywhere between 68 bytes and more than 1500 bytes, depending on every component of your network), the packet will be silently dropped, with no attempt to resolve it and no error thrown. What you should do instead is use "multipart/form-data" encoding to send files to the server (more on this here).
It should also be mentioned that uploading files to the database is not recommended at all (databases are quite inefficient storage for large amounts of data). Instead, store a reference to the file: its file path or the URL where it can be found. Once the client has retrieved the file location, the file can be rendered like this:
<object id="pdf" data="../files/test.pdf"></object>
To change the data attribute later, you can do the following:
document.getElementById("pdf").data = "../files/test2.pdf";

Why use Blob at all if I can save file paths in the database and the actual files in storage?

I know that blob is a data type for binary data, just as integer is a data type for integers. As they say, it's used to store files directly in the database (we put our audio file into a blob and save that blob in the database).
Question 1) Why store a blob for audio if I can just put the audio file in storage, for example at /var/www/audio.mp3, and store the path /var/www/audio.mp3 in the database?
Question 2) Which is better? How does Netflix store movies? Just blobs, or something else?
Question 3) I'm curious whether there are any pros or cons; could you give me some ideas so that I know when to use each?
Putting the blob in the database, rather than in a file, allows you to grow to multiple servers with load balancing. If you put the data in files, you would have to replicate the files between the servers. Most databases have built-in replication features; this isn't as easy with regular files.
It's better to use external storage or a CDN for serving large content like this.
How do Netflix and we handle it? Content is uploaded to an external bucket, e.g. S3, and the file name is written to the database for identification. Depending on how often users access a file, it is cached at a CDN edge location, so users get a good experience because the content is served from the edge location nearest to them.
With blob you can store all kinds of stuff.
Do you communicate with an API via SOAP or JSON and want to store it in the database? Use a blob. Want to log what a user filled into a form when it threw an exception? Store the entire post as a blob. You can save everything as is. It's handy for logging if you have different data formats. I know an API which expects some data via SOAP and some as JSON. To log the communication I use blob because the response may be in XML, JSON, a number (http code 203 for empty but accepted) or an exception as array.

How to send binary data back to client using GraphQL

I have a GraphQL server hosted on Express. I want to return images to the client by sending back Node.js Buffer objects. How can I configure the GraphQL server to return bytes instead of JSON? I don't want to do this through Base64, as the images are large.
You have to return JSON, but there's still a way. We're using GraphQL to return images stored in Blob fields in a legacy sql database. We're using sequelize, graphql-sequelize, and graphql-js.
We've defined the Blob fields to be String types in our graphql schema and so they come through in the json reply just fine.
Then we convert it to a Buffer before delivering it, like this (Buffer.from is a static factory, so it is called without new):
const imgBuffer = Buffer.from(imgObj.data, 'ascii');
The only problem is that we're now having trouble saving image data back to the database through the GraphQL interface. Our mutation function gives us a syntax error when it finds certain bad Unicode characters in the strings, like \u0000 and whatnot (which is how I found your question, while looking for a solution to that).
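One way around the NUL-character problem, sketched under the assumption that you can change how the Blob column is exposed in the schema: transmit the field as Base64 text instead of raw bytes squeezed into a String, and decode it with Buffer.from(text, "base64") on the way back in. This accepts the roughly 33% size overhead the question wanted to avoid, in exchange for strings that contain only A-Z, a-z, 0-9, "+", "/" and "=", so nothing in them can trip up the mutation parser. The helper names are hypothetical; in graphql-js they could back the serialize and parseValue functions of a custom scalar.

```javascript
// Outgoing: raw bytes from the database -> Base64 string for the JSON response.
function encodeBlobField(buffer) {
  return Buffer.from(buffer).toString("base64");
}

// Incoming: Base64 string from a mutation argument -> raw bytes for the database.
function decodeBlobField(text) {
  // Reject anything that is not plain Base64 text instead of silently decoding garbage.
  if (typeof text !== "string" || !/^[A-Za-z0-9+/]*={0,2}$/.test(text)) {
    throw new TypeError("Expected a Base64 string");
  }
  return Buffer.from(text, "base64");
}
```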
There's a way, but it's complicated and very manual. I'm only going to give an overview of what I've done in ApolloServer, but I think it should be enough.
First, you need to use the "Accept" header in your request to ask for a binary MIME type, and send a matching "Content-Type" in your response. This is necessary for efficiency, but not for it to work at all, as you'll see (with EJSON).
To serialize and deserialize according to the headers, you may need to write an Express middleware, and you'll need to handle Base64 encoding with a {$data: "..."} wrapper object (as EJSON does), or (strangely) just return null, if someone requests binary data with "application/json" in their "Accept" header. You'll also want to choose which binary formats you'll support. I only use one, "application/x-msgpack", but I hear that "application/cbor" is becoming more popular. You can use existing libraries for EJSON, MessagePack, and CBOR to do the serialization, so this isn't as hard as it sounds.
I would then strongly recommend using @defer on any images. See this post for more information on @defer: https://www.apollographql.com/blog/introducing-defer-in-apollo-server-f6797c4e9d6e/
I've done it. It wasn't easy, and it would be better if ApolloServer worked this way "out of the box".
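A small, framework-independent sketch of the two pieces described in this answer: choosing a response type from the Accept header, and the JSON fallback that wraps binary values in a {$data: "..."} object holding Base64 text. It deliberately ignores q-values and simply prefers the server's own order of supported binary types; the actual MessagePack or CBOR encoding would come from a library and is not shown. The constant and function names are hypothetical and not part of ApolloServer's API.

```javascript
// Binary formats this server is willing to produce, in order of preference.
// (Hypothetical list; the answer above uses only application/x-msgpack.)
const BINARY_TYPES = ["application/x-msgpack", "application/cbor"];

// Choose the response Content-Type from the request's Accept header.
// Simplification: q-values are ignored and the server's preference order wins.
function pickResponseType(acceptHeader) {
  const accepted = (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  return BINARY_TYPES.find((type) => accepted.includes(type)) || "application/json";
}

// JSON fallback: replace every Buffer/Uint8Array with { $data: "<base64>" },
// recursing into arrays and plain objects.
function toJsonSafe(value) {
  if (value instanceof Uint8Array) {
    return { $data: Buffer.from(value).toString("base64") };
  }
  if (Array.isArray(value)) {
    return value.map(toJsonSafe);
  }
  if (value !== null && typeof value === "object" && Object.getPrototypeOf(value) === Object.prototype) {
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, toJsonSafe(v)]));
  }
  return value;
}
```

In Express, a middleware could call res.type(pickResponseType(req.get("Accept"))) and then serialize the result with the matching library, falling back to JSON.stringify(toJsonSafe(result)).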
It's better to send a hashed, temporary link for downloading the image.
The URL can be hashed so that other users cannot access it.
The backend can expire the link or remove the binary file from the static server.
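A sketch of one common way to build such a link in Node.js: an HMAC-signed URL carrying an expiry timestamp. This is a swapped-in technique, not necessarily what the answer's author uses, and the secret, path format, function names and query parameter names (expires, sig) are hypothetical. Anyone holding the link can use it until it expires; the signature only stops people from forging links or altering the path or expiry.

```javascript
const crypto = require("crypto");

// Hypothetical secret; in practice load it from server-side configuration.
const SECRET = "replace-with-a-server-side-secret";

// HMAC over the path and expiry, so neither can be changed without invalidating the link.
function sign(path, expiresAt) {
  return crypto.createHmac("sha256", SECRET).update(`${path}:${expiresAt}`).digest("hex");
}

// Build a link such as /files/img-42.png?expires=...&sig=... valid for ttlSeconds.
function makeTemporaryLink(path, ttlSeconds, now = Date.now()) {
  const expiresAt = Math.floor(now / 1000) + ttlSeconds;
  return `${path}?expires=${expiresAt}&sig=${sign(path, expiresAt)}`;
}

// Check signature and expiry; the placeholder base only lets URL parse a relative link.
function verifyTemporaryLink(link, now = Date.now()) {
  const url = new URL(link, "http://placeholder.invalid");
  const expiresAt = Number(url.searchParams.get("expires"));
  const sig = url.searchParams.get("sig") || "";
  if (!Number.isFinite(expiresAt) || Math.floor(now / 1000) > expiresAt) return false;
  const expected = Buffer.from(sign(url.pathname, expiresAt), "hex");
  const given = Buffer.from(sig, "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

The sketch assumes paths that need no percent-encoding, since verification re-reads the path from the parsed URL.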
The Node module found here might answer your question.

Posting a file using Json

I want to upload a binary file using JSON.
I chose JSON because, along with the file, I would also like to send additional information.
I am going to do this by:
Selecting a file in the file input tag.
Using the HTML5 FileReader API to read the file.
Converting the file content into Base64.
Adding the Base64 content to a JS object in data URI format.
Converting the JS object to JSON and posting it to the server.
Is this the only legitimate way to achieve my goal? Also, is there a plugin already available somewhere that gives me this ability?
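The steps above can be sketched as follows, assuming a browser with FileReader, Promise and fetch. FileReader.readAsDataURL performs the reading, Base64 conversion and data URI formatting in one call, so only the JSON step remains. The "file" property name, the helper names and the URL are hypothetical.

```javascript
// Steps 2-4: read the selected File as a data URI ("data:<mime>;base64,<data>").
// FileReader is a browser API, so this helper only works in a browser.
function readAsDataUrl(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Step 5 (part 1): put the extra information and the data URI into one object and serialize it.
function buildJsonBody(extraFields, dataUrl) {
  return JSON.stringify({ ...extraFields, file: dataUrl });
}

// Step 5 (part 2): hypothetical end-to-end use; `url` is a placeholder.
async function postFileAsJson(url, file, extraFields) {
  const body = buildJsonBody(extraFields, await readAsDataUrl(file));
  return fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body });
}
```

The server then has to strip the "data:<mime>;base64," prefix before decoding the Base64 part.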
No, this is not the only way; another is simply to submit a form with a file in it. Such a form uses the multipart/form-data content type.
See the W3C documentation on the subject:
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters.
The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
So there is no need to reinvent the wheel: browsers already support sending files along with additional information in a simple way. You just create a form where the user can enter data and select files; all of it is then sent to the server with the multipart/form-data content type, and your web framework should be able to understand that it is dealing with both files and textual data.
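When the request has to be sent from script rather than by submitting the form, the same multipart/form-data body can be built with FormData. A minimal sketch, assuming a browser (or a runtime) with FormData and fetch; the field names "description" and "upload", the helper names and the URL are hypothetical.

```javascript
// Build a multipart/form-data body containing a file plus ordinary text fields.
function buildFormData(file, fileName, description) {
  const form = new FormData();
  form.append("description", description); // plain text field
  form.append("upload", file, fileName);   // file field, sent as raw bytes
  return form;
}

// Don't set Content-Type yourself: fetch adds "multipart/form-data; boundary=..." for FormData bodies.
function postForm(url, form) {
  return fetch(url, { method: "POST", body: form });
}
```

Unlike the JSON approach, the file bytes go over the wire unencoded, so there is no Base64 size overhead.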

Upload binary data to AppEngine Blobstore via HTTP request

I'm trying to figure out the lowest data-overhead way to upload/download binary data to Google AppEngine's Blobstore from a JavaScript initiated HTTP request. Ideally, I would like to submit the binary data directly, i.e. as unencoded 8-bit values; maybe in a POST request that looks something like this:
...
Content-Type: multipart/form-data; boundary=boundary;
--boundary
Content-Disposition: form-data; name="a"; filename="b"
Content-Type: application/octet-stream
##^%(^Qtr...
--boundary--
Here, ##^%(^Qtr... ideally represents arbitrary 8-bit binary data.
Specifically, I am trying to understand the following:
Is it possible to directly upload 8-bit binary data, or would I need to encode the data somehow, like a base-64 MIME encoding?
If I use a different encoding, would Blobstore save the data as 8-bit binary internally or in the encoded format? I.e. would a base-64 encoding increase my storage cost by 33%?
Along the same lines: Does encoding overhead increase outgoing bandwidth cost?
Is there a better way to format the POST request so I don't need to come up with a boundary that doesn't appear in my binary data? E.g. is there a way to specify a Content-Length rather than a boundary?
In the GET request to retrieve the data, can I simply expect to have binary data end up in the return string, or is the server going to automatically encode the data somehow?
If I need to use some encoding, which one would be the best choice among the supported options for essentially random 8-bit data? (base-64, UTF-8, something else?)
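For the boundary question, a sketch of building a multipart body with the raw 8-bit bytes placed directly in it, assuming a modern browser where fetch or XHR send() accept a Uint8Array body (the answer below notes this was not available in every browser at the time). The boundary is generated randomly and regenerated until it does not occur anywhere in the data, which is the usual way to satisfy the rule that the boundary must not appear in the content. The field name "a" and filename "b" mirror the example request above; the function names are hypothetical, and whether Blobstore's upload handler accepts this exact body has not been verified here.

```javascript
// Returns true if `needle` occurs anywhere in `haystack` (both Uint8Array).
function containsBytes(haystack, needle) {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return true;
  }
  return false;
}

// Build { contentType, body } for a single file part carrying unencoded bytes.
function buildMultipartBody(fieldName, fileName, bytes) {
  const enc = new TextEncoder();
  let boundary;
  do {
    boundary = "----boundary" + Math.random().toString(36).slice(2);
  } while (containsBytes(bytes, enc.encode(boundary))); // retry on the (unlikely) collision
  const head = enc.encode(
    `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${fieldName}"; filename="${fileName}"\r\n` +
      `Content-Type: application/octet-stream\r\n\r\n`
  );
  const tail = enc.encode(`\r\n--${boundary}--\r\n`);
  const body = new Uint8Array(head.length + bytes.length + tail.length);
  body.set(head, 0);
  body.set(bytes, head.length); // raw 8-bit data, no encoding
  body.set(tail, head.length + bytes.length);
  return { contentType: `multipart/form-data; boundary=${boundary}`, body };
}
```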
Even though I received the Tumbleweed badge for this question, let me report on my progress anyway, in case somebody out there does care:
This question turned out to pose 3 independent problems:
Uploading data to BlobStore efficiently
Making sure BlobStore saves it in the smallest possible format
Finding a way to reliably download the data
Let's start with (3), because this ends up posing the biggest issue:
So far I have not been able to find a way to download true 8-bit data to the browser via XHR. With MIME types like application/octet-stream, only 7 bits reach the client reliably, unless the data is downloaded to a file. The best solution I found is to use the following MIME type for the data:
text/plain; charset=ISO-8859-1
This seems to be supported in all browsers that I've tested: IE 8, Chrome 21, FF 12.0, Opera 11.61, Safari 5.1.2 under Windows, and Android 2.3.3.
With this, it is possible to transfer almost any 8-bit value, with the following restrictions/caveats:
Character 0x00 is interpreted as the end of the input string in IE8 and must therefore be avoided.
Most browsers interpret charset ISO-8859-1 as Windows-1252 instead, leading to characters 0x80 through 0x9F being changed accordingly. This can be fixed, though, as the changes are unambiguous. (see http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout)
Characters 0x81, 0x8D, 0x8F, 0x90, 0x9D are reserved in the Windows-1252 charset and Opera returns an error code for these, therefore these need to be avoided as well.
Overall, this leaves us with 250 of the 256 byte values to use. With the required change of base for the data, this means an outgoing-data overhead of under 0.5%, which I guess I'm OK with.
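The remapping mentioned above can be sketched as follows, assuming the response text was decoded by the browser as Windows-1252: characters below U+0100 map straight back to the byte with the same value, and the 27 characters that Windows-1252 assigns to 0x80-0x9F are looked up in a reverse table. The table and function names are hypothetical.

```javascript
// Windows-1252 code points for bytes 0x80-0x9F (0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned).
const CP1252_HIGH = {
  0x20ac: 0x80, 0x201a: 0x82, 0x0192: 0x83, 0x201e: 0x84, 0x2026: 0x85,
  0x2020: 0x86, 0x2021: 0x87, 0x02c6: 0x88, 0x2030: 0x89, 0x0160: 0x8a,
  0x2039: 0x8b, 0x0152: 0x8c, 0x017d: 0x8e, 0x2018: 0x91, 0x2019: 0x92,
  0x201c: 0x93, 0x201d: 0x94, 0x2022: 0x95, 0x2013: 0x96, 0x2014: 0x97,
  0x02dc: 0x98, 0x2122: 0x99, 0x0161: 0x9a, 0x203a: 0x9b, 0x0153: 0x9c,
  0x017e: 0x9e, 0x0178: 0x9f,
};

// Convert text received as "text/plain; charset=ISO-8859-1" (decoded as Windows-1252)
// back into the original bytes.
function windows1252TextToBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x100) {
      bytes[i] = code; // ASCII and Latin-1 range pass through unchanged
    } else if (code in CP1252_HIGH) {
      bytes[i] = CP1252_HIGH[code]; // undo the 0x80-0x9F substitution
    } else {
      throw new RangeError(`Unexpected character U+${code.toString(16)} at index ${i}`);
    }
  }
  return bytes;
}
```

In browsers that support XMLHttpRequest's responseType = "arraybuffer" (or fetch with response.arrayBuffer()), the bytes arrive unmodified and this remapping is not needed.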
So, now to problem (1) and (2):
As incoming bandwidth is free, I've decided to lower the priority of problem (1) in favor of problems (2) and (3). It turns out that the following POST request then does the trick:
...
Content-Type: multipart/form-data; boundary=-
---
Content-Disposition: form-data; name="a"; filename="b"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: base64
abcd==
-----
Here, abcd== is the base64-MIME-encoded data consisting of the above described 250 allowed characters (see http://en.wikipedia.org/wiki/Base64#Examples, GAE uses + and / as the last 2 characters). The encoding is necessary (correct me if I'm wrong) as calling the XHR send() function with String data will result in UTF-8 encoding of the string, which screws up the data received by the server. Unfortunately passing ArrayBuffers and Blobs to the send() function isn't available in all browsers yet to circumvent this issue more elegantly.
Now the good news: The AppEngine BlobStore decodes this data automatically and correctly and stores it without overhead! Therefore, using the base64-encoding only leads to slower data-uploads from the client, but does not result in additional hosting cost (unless maybe a couple CPU cycles for the decoding).
Aside: The AppEngine development-server will report the encoded size (i.e. 33% larger) for the stored blob, both in the admin console and in a retrieved BlobInfo record. The production servers do not have this issue, though, and report the correct blob size.
Conclusion:
Using Content-Transfer-Encoding base64 for uploading binary data of Content-Type text/plain; charset=ISO-8859-1, which may not contain characters 0x00, 0x81, 0x8D, 0x8F, 0x90, and 0x9D, leads to reliable data transfer for many tested browsers with a storage/outgoing-bandwidth overhead of less than half a percent. The upload-overhead of the base64-encoded data is 33%, which is better than the expected 50% for UTF-8 (for random 8-bit data), but still far from desirable.
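The overhead figures in this conclusion can be checked with a few lines of arithmetic. Rewriting 8-bit data in an alphabet of k symbols costs log(256)/log(k) symbols per byte, so k = 64 (Base64) gives 8/6, about 33%, and k = 250 gives about 0.43%. Sending random bytes as UTF-8 costs 1 byte for values below 0x80 and 2 bytes from 0x80 up, i.e. 1.5 bytes on average. The function names are hypothetical.

```javascript
// Relative size increase when 8-bit data is rewritten in an alphabet of `k` symbols.
function radixOverhead(k) {
  return Math.log(256) / Math.log(k) - 1;
}

// Exact Base64 length for n bytes (with "=" padding).
function base64Length(n) {
  return Math.ceil(n / 3) * 4;
}

// UTF-8 size when each byte is sent as the code point with the same value.
function utf8LengthOfByteString(bytes) {
  let total = 0;
  for (const b of bytes) total += b < 0x80 ? 1 : 2;
  return total;
}

const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
console.log(radixOverhead(64).toFixed(4)); // 0.3333
console.log(radixOverhead(250).toFixed(4)); // 0.0043
console.log(utf8LengthOfByteString(allBytes) / allBytes.length); // 1.5
```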
What I don't know is: Is this the optimal solution, or could one do better? Anyone up for the challenge?
