What is the logic for converting a Base64 string into an image? - javascript

I have generated a Base64 string, which I have shared as an image using Capacitor FileSharer. For this I have used two approaches:
img.split(',')[1]
This one I understand: it splits off the "data:image..." prefix and gives me the part of the string that represents the image file.
img.replace(/^data:image\/[a-z]+;base64,/, "")
This one I haven't properly understood: what exactly is it doing to the string so that I end up with an image file? If possible, please provide an explanation.
I have used both of them and both work fine. I am only asking because, if I use something in my project, I should know how it actually works.
(PS: I am new to JavaScript.)

Introduction to Base64 encoding
In computer science, Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding. Each Base64 digit represents exactly 6 bits of data. Three 8-bit bytes (i.e., a total of 24 bits) can therefore be represented by four 6-bit Base64 digits.
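As a quick illustration, here is a minimal sketch in browser JavaScript using the built-in btoa: three 8-bit bytes map onto exactly four Base64 digits.
// Sketch: "Man" is three ASCII bytes (0x4D 0x61 0x6E), i.e. 24 bits in total
const bytes = 'Man';
const encoded = btoa(bytes);                          // "TWFu" — four Base64 digits, 6 bits each
console.log(encoded, encoded.length / bytes.length);  // "TWFu" 1.33… (the usual ~33% size growth)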
Where can we use Base64 encoding on images specifically?
Basically, there are multiple advantages to using Base64-encoded images, or even files such as PDF, CSV, etc., in web interactions:
Storing them easily in databases as strings and retrieving them accordingly.
In JSON- or XML-based web architectures (such as REST or SOAP) it is usually hard to send images alongside form data, for example sending a profile picture together with user form data such as username, password, first name, last name, etc., in JSON format.
Security! Anyone who does not know about Base64 encoding cannot open the files as easily as they otherwise could.
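To connect this back to the original question, here is a minimal sketch (the data URL is a made-up, truncated placeholder) showing that both expressions simply strip the data:image/...;base64, prefix and leave the bare Base64 payload the file-sharing plugin needs:
// Hypothetical data URL, truncated for brevity
const img = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg";
// Approach 1: cut the string at the first comma and keep what follows
const viaSplit = img.split(',')[1];
// Approach 2: the regex is anchored at the start (^), matches "data:image/",
// then one or more lowercase letters ([a-z]+, the subtype such as png or jpeg),
// then ";base64," — and replaces that whole prefix with an empty string
const viaRegex = img.replace(/^data:image\/[a-z]+;base64,/, "");
console.log(viaSplit === viaRegex);  // true — both yield the same bare Base64 payload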

Related

When exchanging data between a browser and a server, the data can only be text. Why?

I understood why we are using JSON to exchange data between browser and server in place of XML, but I could not understand why we are using only the string type of JSON even though we have six different value datatypes. I mean, why can't we use integers or Booleans or any other value datatype?
Hope you guys understand what I'm trying to say. Thanks in advance.
If I understand correctly, the limitation is because of the way data needs to be encoded to be sent over HTTP and ultimately over the wire. Your JSON object (or XML, etc.) is ultimately just a payload for HTTP (which is just a payload for TCP in turn, and so on).
HTTP inherently does not and should not identify data types in the payload; to HTTP it is just an array of bytes. You can select how to represent this array, i.e. how to encode it; it can be text (ASCII, UTF-8, etc.) or binary, but it has to be uniform for the whole payload.
HTTP does offer different encoding methods for the payload, which the receiver can interpret by looking at the Content-Type header and decoding the data accordingly.
Hope this helps.
why we are using only string type of JSON
Uhm, we're not. I believe you're misunderstanding something here. HTTP responses can really contain anything; every time you download a PDF or an image from a web server, the web server is sending a binary payload, which can literally be anything. So it's not even true that all HTTP bodies must be text.
To exchange data between systems, you send bytes. For these bytes to mean anything, you need an encoding scheme. Image formats have a particular way in which bytes need to be arranged, and when properly doing so, you can send pictures with them. Same for PDFs, video, audio, and anything else (including text).
If you want to send structured data, you need to express that structure somehow. How do you send a, for example, PHP array over HTTP…? (Substitute your equivalent list data structure in your language of choice.) You can't. A PHP array is a specific data structure in memory of a PHP runtime, sending that as is over HTTP has no meaning (because it deals with internal pointers and such). This array needs to be serialised first. There are many possible serialisation methods, some of them using binary data, and some using formats which are human readable to varying degrees. You could simply join all array elements with commas and .split(',') them again on the other end, but that's rather simplistic and misses many more complex cases and edge cases.
JSON and XML (and YAML and whatnot) are human readable formats which can serialise data structures like arrays (and dictionaries and numbers and booleans etc), and which happen to be text-based (purposely, to make them developer-friendly). You can use any of those data types JSON allows. Nothing prevents you from doing so, and not using them is insane. JSON and XML also happen to be two formats easily parsed with tools built into every browser. You could use any other binary format too, but then you'd have to manually parse it in Javascript.
Communication between browser and server can be done in many ways. It's just that JSON comes out of the box. You can use protobuf, XML and a plethora of other data serialization techniques to communicate with the server, as long as both sides understand the format. On the browser side, you would probably have to implement protobuf, XML, etc. serialization/deserialization yourself in JavaScript.
Any valid JSON is permitted for data exchange. The keys are quoted strings, but the values can be strings, numbers, booleans, arrays or other objects. Before transmission, everything is converted into a string, and the receiving side parses it back into the correct types.
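A minimal sketch of that last point, with hypothetical field names: the non-string value types survive a stringify/parse round trip intact.
// The object mixes strings, numbers, booleans, arrays and nested objects
const payload = { username: "alice", age: 30, admin: false, scores: [1, 2, 3], profile: { bio: null } };
const wire = JSON.stringify(payload);   // what actually travels over HTTP: one string
const parsed = JSON.parse(wire);        // the receiver restores the original value types
console.log(typeof parsed.age, typeof parsed.admin, Array.isArray(parsed.scores));
// "number" "boolean" true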

Is browser extensions' Storage API data stored UTF-8 encoded?

My currently "best understanding" is that Javascript Strings, while in memory are represented as DOMString, which means that the the unicode Glyph a (Latin Small Letter A) is represented by 2 bytes (in memory) using UTF-16 text encoding.
This encoding is as maintained when using the Browser's Storage API localStorage, where the documentation also states that what is stored is a DOMString, meaning contrary to popular myth one can usually store 10MB and not incorrectly 5MB in localStorage.
My question however is not with regard to window.localStorage, but instead with the web extensions Storage API browser.storage.local. With chromium I was able to test (using getBytesInUse) that the data stored was encoded using UTF-8, but I did not find any documentation/specification yet, which states what I up to know have only found out by experiment.
An answer to this question should:
tell whether the browser extensions' Storage API data is stored UTF-8 encoded,
and provide a reference that specifies this to be the case.
Background / Rationale
I develop a browser extension which stores text data that I seek to compress before storage, to conserve space. Since the Storage API provided does not allow storing raw binary data, I seek to tweak the compression algorithm to be least wasteful, which makes it counter-productive to Base64-convert the binary data. To store information efficiently within text, however, it makes a huge difference which text encoding is used.
The data stored in the browser extension is mostly compressed HTML markup in English, which would benefit most from data storage using the UTF-8 text encoding.
For reference I have checked/read through the following information regarding string types related to the browser's JavaScript engine and DOM engine: String, DOMString, USVString
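For what it's worth, the experiment mentioned above can be reproduced with a sketch along these lines (assuming the promise-based browser.storage API; with the chrome.* namespace the callback form would be used instead). It stores equal-length strings of 1-byte and 2-byte UTF-8 characters and compares the reported sizes:
async function measureStorageEncoding() {
  const ascii = "a".repeat(1000);       // "a" is 1 byte in UTF-8, 2 bytes in UTF-16
  const greek = "\u03C0".repeat(1000);  // "π" is 2 bytes in UTF-8, 2 bytes in UTF-16
  await browser.storage.local.set({ ascii, greek });
  const asciiBytes = await browser.storage.local.getBytesInUse("ascii");
  const greekBytes = await browser.storage.local.getBytesInUse("greek");
  // Roughly 1000 vs 2000 (plus key/JSON overhead) would point at UTF-8 storage;
  // roughly 2000 vs 2000 would point at UTF-16.
  console.log(asciiBytes, greekBytes);
}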

Decode Base64 string in node.js

I'm trying to decode a base64 string representing an image stored in a db.
I tried many libraries and solutions provided on SO, but I'm still unable to decode the image correctly. In particular, using the following code:
var img = new Buffer(b64, 'base64').toString('ascii');
I get a similar binary representation, except for the first bytes.
This is the initial part of the base64 string:
/9j/4RxVRXhpZgAASUkqAAgAAAANADIBAgAUAAAAqgAAACWIBAABAAAAiwYAABABAgAIAAAAvgAA
Here are the first 50 bytes of the original image:
ffd8ffe11c5545786966000049492a00080000000d003201020014000000aa00000025880400010000008b06000010010200
And here are the first 50 bytes of the string I get with javascript:
7f587f611c5545786966000049492a00080000000d0032010200140000002a00000025080400010000000b06000010010200
As you can see, the two strings are identical except for the first 3 bytes and a few bytes in the middle.
Can somebody help me understand why this is happening and how to solve it? Thanks
The problem is that you're trying to convert binary data to ASCII, which more likely than not will mean loss of data, since ASCII only consists of the values 0x00-0x7F. When the conversion takes place, the high bit of every byte above 0x7F is stripped, which is why 0xff became 0x7f and 0xd8 became 0x58 in your output.
If you do this instead, you can see the data matches your first 50 bytes of the original image:
console.log(Buffer.from(b64, 'base64').toString('hex'));
But if you want to keep the binary data intact, just keep it as a Buffer instance without calling .toString(), as many functions that work with binary data can deal with Buffers (e.g. fs core module).
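Putting that together, here is a minimal Node.js sketch (the getBase64FromDb helper is hypothetical, standing in for however the string is actually fetched from the database):
const fs = require('fs');
const b64 = getBase64FromDb();               // hypothetical: the Base64 string stored in the db
const buf = Buffer.from(b64, 'base64');      // decode without any lossy text conversion
console.log(buf.subarray(0, 4).toString('hex')); // e.g. "ffd8ffe1" — the JPEG/Exif magic bytes
fs.writeFileSync('decoded.jpg', buf);            // write the raw bytes straight to disk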

Replacing Base64 - Is http/https communication 8 bit clean?

Here is an overview of what 8 bit clean means.
In the context of web applications, why are images saved as Base64? There is a 33% overhead associated with Base64 encoding.
If the transmission method is 8 bit clean, there is no need for this.
But basically, my images are saved in Base64 on the server, and transferred to the client, which as we all know can read Base64.
Here is the client side version of Base 64 in an SO Post.
How can you encode a string to Base64 in JavaScript?
Is http/https 8 bit clean?
Reference
http://www.princeton.edu/~achaney/tmve/wiki100k/docs/8-bit_clean.html
http://en.wikipedia.org/wiki/8-bit_clean
You are asking two different things.
Q: Is http 8 bit clean?
A: Yes, HTTP is "8 bit clean".
Q: In the context of web applications, why are images saved as Base64?
A: Images are not usually saved in Base64; in fact, they almost never are. They are usually saved, transmitted or streamed in a compressed binary format (PNG, JPG or similar).
Base64 is used to embed images inside the HTML.
So, you have an image logo.png. You include it statically in your page as <img src='logo.png'>. The image is transmitted through HTTP in binary, with no encoding on either the browser or the server side. This is the most common case.
Alternatively, you might decide to embed the contents of the image inside the HTML. It has some advantages: the browser will not need to make a second trip to the server to fetch the image, because it has already received it in the same HTTP GET response as the HTML file. But it also has disadvantages: because HTML files are text, and certain character values have special meaning in HTML (not in HTTP), you cannot just embed the binary values inside the HTML text. You have to encode them to avoid such collisions. The most common encoding method is Base64, which avoids all the collisions with only 33% overhead.
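A minimal sketch of the two options (the Base64 payload below is a truncated placeholder, not a real image):
// Option 1: reference the file — the browser makes a second HTTP request for logo.png
const referenced = '<img src="logo.png">';
// Option 2: embed the image bytes as a Base64 data URI — no extra round trip,
// but the markup grows by roughly 33% of the image size
// (a real data URI carries the whole encoded file, not just this fragment)
const embedded = '<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...">';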
RFC 2616's abstract states:
A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
HTTP always starts with a text-only header and in this header the content-type is specified.
As long as sender and receiver agree on this content type, anything is possible.
HTTP relies on a reliable (recognize the wordplay) transport layer such as TCP. HTTPS only adds security to the transport layer (or between the transport layer and HTTP, not sure about this).
So yep, http(s) is 8 bit clean.
In addition to PA's answer and your question "But why use an encoding method that adds 33% overhead, when you don't need it?": because that's part of a different concept!
HTTP transfers data of any kind, and the HTTP content may be an HTML file with an embedded picture. But after receiving that HTML file, a browser or some other renderer has to interpret the HTML content. And that follows different standards, which require arbitrary data to be encoded. HTML is not 8-bit clean; in fact, it is not even 7-bit clean, as there are many restrictions on the characters used and their order of appearance.
In the context of web applications, why are images saved as Base64?
There is a 33% overhead associated with being 8 bit clean.
Base64 is used to allow 8-bit binary data to be presented as printable text within the ASCII definition. ASCII is only 7 bits, not 8, and the upper 128 character values depend on the encoding in use (Latin-1, UTF-8, etc.), which means the data could be mangled if a different encoding were assumed at the client/receiver end than at the source.
As there aren't enough printable characters within ASCII to represent all 8-bit values (which are absolute values and aren't dependent on any encoding), you need to "water out the bits": Base64 keeps the values low enough that every chunk of data can be represented by a printable character.
This is the 33% overhead you see, since byte values outside the printable range must be spread across more characters to stay printable within the ASCII table; Base64 does exactly that (you could also use quoted-printable, which was common in the past, e.g. with Usenet, email, etc.).
I'm thinking about writing another encoding type to remove the overhead.
Good luck :-)
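For completeness, the arithmetic behind that overhead can be checked in a browser console with a small sketch: every 3 input bytes come out as 4 Base64 characters, so the encoded form is about 4/3 of the original size.
// 300 random byte values (0x00–0xFF), i.e. genuinely 8-bit data
const bytes = Array.from({ length: 300 }, () => Math.floor(Math.random() * 256));
const encoded = btoa(String.fromCharCode(...bytes));
console.log(bytes.length, encoded.length, encoded.length / bytes.length);
// 300 400 1.333… — the ~33% overhead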
Related to the query:
Is HTTP 8-bit clean?
The HTTP protocol is not, in its entirety, an 8-bit clean protocol.
The HTTP entity body is 8-bit clean, since there is a provision to indicate the content type, allowing content negotiation between the interacting parties, as pointed out by everyone in this thread.
However, the request line, the headers and the status line are not 8-bit clean.
In order to send any binary information as part of the request line (as query parameters or path segments) or as part of a header, one must use a binary-to-text encoding to preserve the binary values.
For instance, when sending a signature as part of query parameters or headers, as in the signed-URL technique employed by CDNs, the signature (which is binary information) has to be encoded to preserve its binary value.
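A minimal Node.js sketch of that idea (the secret, resource path and parameter name are made up and not any particular CDN's scheme; Buffer's 'base64url' encoding needs Node 15.7+):
const crypto = require('crypto');
const secret = 'demo-signing-key';                        // hypothetical signing key
const resource = '/videos/clip.mp4?expires=1700000000';   // hypothetical resource to sign
const signature = crypto.createHmac('sha256', secret).update(resource).digest(); // raw binary bytes
const param = signature.toString('base64url');            // URL-safe text: '+' → '-', '/' → '_', no '='
console.log(`${resource}&sig=${param}`);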

Difference between readAsBinaryString and readAsText using FileReader

So as an example, when I read the π character (\u03C0) from a File using the FileReader API, I get the pi character back to me when I read it using FileReader.readAsText(blob) which is expected. But when I use FileReader.readAsBinaryString(blob), I get the result \xcf\x80 instead, which doesn't seem to have any visible correlation with the pi character. What's going on? (This probably has something to do with the way UTF-8/16 is encoded...)
FileReader.readAsText takes the encoding of the file into account. In particular, since you have the file encoded in UTF-8, there may be multiple bytes per character. Reading it as text, the UTF-8 is read as it is, and you get your string.
FileReader.readAsBinaryString, on the other hand, does exactly what it says. It reads the file byte by byte. It doesn't recognise multi-byte characters, which in particular is good news for binary files (basically anything except a text file). Since π is a two-byte character, you get the two individual bytes that make it up in your string.
This difference can be seen in many places, in particular when the encoding is lost and you see characters like é displayed as Ã©.
Oh well, if that's all you needed... :)
CF80 is the UTF-8 encoding for π.
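A minimal browser sketch showing both readings of the same one-character file:
const blob = new Blob(['\u03C0'], { type: 'text/plain' }); // stored as the UTF-8 bytes 0xCF 0x80
const textReader = new FileReader();
textReader.onload = () => console.log(textReader.result);  // "π" — bytes decoded as UTF-8 text
textReader.readAsText(blob);
const binaryReader = new FileReader();
binaryReader.onload = () => {
  // one string character per raw byte: code points 0xCF and 0x80
  console.log([...binaryReader.result].map(c => c.charCodeAt(0).toString(16))); // ["cf", "80"]
};
binaryReader.readAsBinaryString(blob);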
