Decode Base64 string in node.js - javascript

I'm trying to decode a base64 string representing an image stored in a db.
I tried many libraries and solutions provided on SO, but I'm still unable to decode the image correctly. In particular, using the following code:
var img = new Buffer(b64, 'base64').toString('ascii');
I get a similar binary representation, except for the first bytes.
This is the initial part of the base64 string:
/9j/4RxVRXhpZgAASUkqAAgAAAANADIBAgAUAAAAqgAAACWIBAABAAAAiwYAABABAgAIAAAAvgAA
Here are the first 50 bytes of the original image:
ffd8ffe11c5545786966000049492a00080000000d003201020014000000aa00000025880400010000008b06000010010200
And here are the first 50 bytes of the string I get with javascript:
7f587f611c5545786966000049492a00080000000d0032010200140000002a00000025080400010000000b06000010010200
As you can see, the two strings are identical except for the first 3 bytes and a few bytes in the middle.
Can somebody help me understand why this is happening and how to solve it? Thanks

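The problem is that you're trying to convert binary data to ASCII, which more likely than not means loss of data, since ASCII only consists of values 0x00-0x7F. When Node.js decodes a Buffer as 'ascii', it unsets the highest bit of every byte, so each byte > 0x7F loses 0x80: 0xFF becomes 0x7F, 0xD8 becomes 0x58, 0xE1 becomes 0x61 and 0xAA becomes 0x2A, which are exactly the differences in your output.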
If you do this instead, you can see the data matches your first 50 bytes of the original image:
console.log(Buffer.from(b64, 'base64').toString('hex'));
But if you want to keep the binary data intact, just keep it as a Buffer instance without calling .toString(), as many functions that work with binary data accept Buffers (e.g. the fs core module).
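To see both effects on the data from the question, here is a small Node.js sketch. It decodes only the 76-character prefix quoted above, compares the hex view with what 'ascii' decoding produces, and writes the untouched Buffer to a file. The output path under os.tmpdir() is a hypothetical example; the 'ascii' behaviour relied on (unsetting the highest bit of each byte) is the one the Node.js Buffer documentation describes.

```javascript
// Sketch: decode the base64 prefix quoted in the question and compare two views of it.
const fs = require('fs');
const os = require('os');
const path = require('path');

// First 76 characters of the question's base64 string (57 bytes once decoded).
const b64 = '/9j/4RxVRXhpZgAASUkqAAgAAAANADIBAgAUAAAAqgAAACWIBAABAAAAiwYAABABAgAIAAAAvgAA';
const bytes = Buffer.from(b64, 'base64');

// Correct view: hex of the raw bytes, starting with the JPEG marker ff d8 ff.
const hex = bytes.toString('hex');

// Lossy view: 'ascii' decoding unsets the highest bit of every byte.
// Re-encoding that string as latin1 (one byte per char) exposes the damaged bytes.
const viaAscii = Buffer.from(bytes.toString('ascii'), 'latin1');

// The same damage reproduced by hand: clear bit 7 (0x80) of every byte.
const masked = Buffer.from(Uint8Array.from(bytes, (b) => b & 0x7f));

console.log(hex.slice(0, 12));                      // ffd8ffe11c55
console.log(viaAscii.toString('hex').slice(0, 12)); // 7f587f611c55
console.log(viaAscii.equals(masked));               // true

// Keeping the Buffer as-is preserves every byte, e.g. when writing a file.
// Hypothetical output path, used only for this demonstration.
const out = path.join(os.tmpdir(), 'decoded-prefix.jpg');
fs.writeFileSync(out, bytes);
console.log(fs.readFileSync(out).equals(bytes));    // true
```

Here viaAscii and masked hold the same bytes, which is why the decoded output differs from the original image exactly where the original has bytes of 0x80 or above.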

Related

UInt8Array of image data has incorrect length

I need to convert a base64 string of an image to a Uint8Array to feed into a machine learning model. I've been attempting the conversion with this Node.js code:
new Uint8Array(Buffer.from(base64String, 'base64'));
where base64String is a base64-encoded image. I've attached the base64 in this gist.
Since the image is 100x100, I would expect the Uint8Array to have 30,000 values (3 values per pixel due to RGB, and 10,000 total pixels). This is also what my machine learning model expects. However, after running that code, the Uint8Array only has 3889 values according to uintarray.length. I've tried multiple methods in different environments to get the Uint8Array and it's always 3889.
When I print the Uint8Array, I see a list of 3889 numbers from 0-255. When I convert the Uint8Array back to base64 through Buffer.from(uintarray).toString('base64'), it returns the correct base64.
Why are there only 3889 values in my Uint8Array instead of 30,000? How can I get the array in the format that I want, with three values for each pixel and 10,000 pixels?
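One likely explanation can be checked directly: the base64 of an image file usually encodes the compressed file (PNG, JPEG, ...), not raw pixels, so its decoded length need not equal width * height * 3. A minimal sketch for checking what the bytes are, assuming the image is PNG or JPEG; sniffImageFormat is a hypothetical helper, and the sample is just the 8-byte PNG signature rather than the gist's data.

```javascript
// Hypothetical helper: report the container format from the leading "magic" bytes.
function sniffImageFormat(bytes) {
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= 8 && pngSignature.every((b, i) => bytes[i] === b)) return 'png';
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return 'jpeg';
  return 'unknown';
}

// Numbers from the question: 100x100 pixels, 3 channels (RGB).
const width = 100;
const height = 100;
const channels = 3;
const expectedRawLength = width * height * channels; // 30000

// Stand-in for base64String: only the 8-byte PNG signature, not a whole image.
const sample = new Uint8Array(Buffer.from('iVBORw0KGgo=', 'base64'));

console.log(sniffImageFormat(sample));            // png
console.log(sample.length === expectedRawLength); // false
```

If the real data reports png or jpeg, the 3889 values are file bytes, and an image-decoding library would be needed to expand them into 30,000 RGB values; which one fits depends on your environment.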

What is the logic for sending the base64 string into image?

I have generated a Base64 string, which I have shared as an image using Capacitor FileSharer.
For this I have used two approaches:
img.split(',')[1],
This one I understand: it gives me the image file by removing the "data:image" prefix from the string.
img.replace(/^data:image\/[a-z]+;base64,/, "")
This one I haven't properly understood: what does it do to the string so that I end up with an image file? If possible, please provide an explanation.
I have used both of them, and both work fine. I am only asking because if I use something in my project, I should know how it actually works.
(PS: I am new to JavaScript.)
Introduction to Base64 encoding
In computer science, Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding. Each Base64 digit represents exactly 6 bits of data. Three 8-bit bytes (i.e., a total of 24 bits) can therefore be represented by four 6-bit Base64 digits (you can read more here).
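A worked example of the 3-byte to 4-digit grouping, as a Node.js sketch: encodeGroup is a hypothetical helper that packs three bytes into 24 bits and reads them back out as four 6-bit indices into the standard Base64 alphabet. Node's built-in encoder gives the same result for the classic input "Man".

```javascript
// Standard Base64 alphabet: index 0..63 -> character.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode exactly three bytes (24 bits) as four Base64 digits.
function encodeGroup(b0, b1, b2) {
  const n = (b0 << 16) | (b1 << 8) | b2; // pack three 8-bit bytes into 24 bits
  return ALPHABET[(n >> 18) & 0x3f] +    // bits 23..18
         ALPHABET[(n >> 12) & 0x3f] +    // bits 17..12
         ALPHABET[(n >> 6) & 0x3f] +     // bits 11..6
         ALPHABET[n & 0x3f];             // bits 5..0
}

const bytes = Buffer.from('Man', 'latin1'); // 0x4d 0x61 0x6e
console.log(encodeGroup(bytes[0], bytes[1], bytes[2])); // TWFu
console.log(bytes.toString('base64'));                  // TWFu
```

Because every 3 bytes become 4 characters, Base64 text is about a third larger than the binary data it encodes.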
Where can we use Base64 encoding on images specifically?
Basically, there are multiple advantages to using base64 for images, or even for files such as PDF, CSV, etc., in web interactions:
Storing them easily in databases as strings and retrieving them accordingly.
In JSON- or XML-based web architectures (such as REST or SOAP), it is usually hard to send images alongside form data. For example, sending a profile picture alongside user form data such as username, password, first name, last name, etc., in JSON format.
Obscurity: someone who knows nothing about base64 encoding cannot open the files as easily. This is not real security, though: base64 is an encoding, not encryption, and anyone can decode it.
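To tie this back to the two snippets in the question: a data URL is a prefix such as data:image/png;base64, followed by the Base64 payload. split(',')[1] keeps whatever follows the first comma, while replace() deletes the part matched by the regular expression. The sketch below spells out the regex and shows one case where the two differ; the data URLs are hypothetical examples.

```javascript
// Hypothetical data URLs; 'iVBORw0KGgo=' is the 8-byte PNG signature, 'PHN2Zy8+' is "<svg/>".
const pngUrl = 'data:image/png;base64,iVBORw0KGgo=';
const svgUrl = 'data:image/svg+xml;base64,PHN2Zy8+';

const DATA_URL_PREFIX = /^data:image\/[a-z]+;base64,/;
//   ^             anchor at the start of the string
//   data:image\/  literal "data:image/" (the slash is escaped inside /.../)
//   [a-z]+        one or more lowercase letters: png, jpeg, gif, ...
//   ;base64,      literal ";base64,"

const viaSplit = (url) => url.split(',')[1];                  // everything after the first comma
const viaReplace = (url) => url.replace(DATA_URL_PREFIX, ''); // delete the matched prefix

console.log(viaSplit(pngUrl));   // iVBORw0KGgo=
console.log(viaReplace(pngUrl)); // iVBORw0KGgo=

// "svg+xml" contains '+', which [a-z]+ does not match, so replace() changes nothing.
console.log(viaSplit(svgUrl));   // PHN2Zy8+
console.log(viaReplace(svgUrl)); // data:image/svg+xml;base64,PHN2Zy8+
```

So for image types whose subtype contains characters outside a to z (such as svg+xml), split(',')[1] still works while this particular regex does not.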

Shared memory | Node.JS + PHP | blocksize | remove trailing "00"

I am using shared memory to make data available between a PHP process and a Node.JS process (on Debian 9). I open the memory block in PHP with shmop_open(). That function requires the size of the memory block in bytes. Since the block is created once and then repeatedly filled with data of different sizes, I choose a block size with some spare room. That means the block size can't always match the data size, since the data size changes often. Data written with shmop_write() is a string, by the way.
In Node.JS I use the module shm-typed-array to access the shared memory. I use shm.get(key, 'Buffer'). After that I convert it into a string using toString('utf8').
The problem: shm.get() reads the entire shared memory block, no matter how many bytes are actually used. So I receive a hexadecimal dump followed by a lot of 00 pairs. If I convert that into a string, I receive my previously saved data followed by a lot of (spaces?). How am I supposed to fix this? I cannot trim() the resulting string, which makes me guess those characters after my data are not "real" spaces.
Thanks in advance
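The symptom fits zero bytes: an unused, zero-filled tail decodes to U+0000 characters, and String.prototype.trim() does not remove them because U+0000 is not whitespace. A minimal sketch, assuming the data PHP writes never contains a zero byte itself: cut the Buffer at the first 0x00 before decoding. The 32-byte block stands in for what shm.get(key, 'Buffer') returns, and readUntilNul is a hypothetical helper.

```javascript
// Hypothetical stand-in for the Buffer from shm.get(key, 'Buffer'):
// a zero-filled 32-byte block where only the first bytes were written.
const block = Buffer.alloc(32);
block.write('{"temp":21.5}', 'utf8');

// trim() does not help: U+0000 is not whitespace, so all 32 characters remain.
const raw = block.toString('utf8');
console.log(raw.trim().length); // 32

// Hypothetical helper: decode only the bytes before the first zero byte.
function readUntilNul(buf) {
  const end = buf.indexOf(0);
  return buf.subarray(0, end === -1 ? buf.length : end).toString('utf8');
}
console.log(readUntilNul(block)); // {"temp":21.5}

// Alternative on the string: strip trailing NUL characters with a regex.
console.log(raw.replace(/\0+$/, '')); // {"temp":21.5}
```

One caveat under this assumption: if a longer value was written earlier, its leftover bytes remain after a shorter one, so the PHP side would also need to write a terminating zero byte (or a length prefix) for this to be reliable.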

encode string as utf-16 to base64 in javascript

I'm struggling to find any resources on this online, which is concerning.
I've been reading about UCS-2 and UTF-16 woes, but I can't find a solution.
I need to get a value from an input:
var val = $('input').val()
and encode it to base64, treating the text as utf-16, so:
this is a test
becomes:
dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==
and not the below, which you get treating it as UTF-8:
dGhpcyBpcyBhIHRlc3Q=
Your data, once read into JavaScript, is a string: a sequence of UTF-16 code units, not bytes in any particular transfer encoding (Unicode itself is just a series of identifying numbers for each character in the Unicode lexicon; it's encoding-less). So: if you specifically need the data encoded as a UTF-16 byte sequence, produce those bytes, then base64-encode them.
But here's the fun part: which UTF-16 do you need? Little or Big Endian? With or without BOM? UTF-16 is a really inconvenient encoding format (we're not even going to touch UCS-2. It's obsolete. Has been for a long time).
What you most likely need is to get the text value from your HTML element, Base64-encode it, and then have whatever receives that data unpack it as UTF-8; don't try to make JavaScript do more work than it has to. I presume you're sending this data to a server or something, in which case your server language is way more elaborate than JavaScript and can unpack text in about a million different encodings thanks to built-in functions. So just use that. Don't solve Y for X.
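If UTF-16 bytes really are required, the question's expected output is UTF-16 little endian without a BOM. A sketch under that assumption: utf16leBase64 is a hypothetical helper that writes each UTF-16 code unit low byte first and then base64-encodes the bytes with btoa (available in browsers and in recent Node.js versions). In Node.js, Buffer.from(str, 'utf16le') produces the same bytes directly.

```javascript
// Hypothetical helper: UTF-16LE bytes (no BOM) of a string, base64-encoded.
function utf16leBase64(str) {
  const bytes = new Uint8Array(str.length * 2);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit, 0..0xFFFF
    bytes[2 * i] = unit & 0xff;     // low byte first (little endian)
    bytes[2 * i + 1] = unit >> 8;   // then the high byte
  }
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

const text = 'this is a test';
console.log(utf16leBase64(text));                             // dABoAGkAcwAgAGkAcwAgAGEAIAB0AGUAcwB0AA==
console.log(Buffer.from(text, 'utf16le').toString('base64')); // same bytes, Node.js only
console.log(Buffer.from(text, 'utf8').toString('base64'));    // dGhpcyBpcyBhIHRlc3Q=
```

For big endian, swap the two byte assignments; for a BOM, prepend the code unit 0xFEFF to the string before encoding.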

Difference between readAsBinaryString and readAsText using FileReader

So as an example, when I read the π character (\u03C0) from a File using the FileReader API, I get the pi character back when I read it using FileReader.readAsText(blob), which is expected. But when I use FileReader.readAsBinaryString(blob), I get the result \xcf\x80 instead, which doesn't seem to have any visible correlation with the pi character. What's going on? (This probably has something to do with the way UTF-8/16 is encoded...)
FileReader.readAsText takes the encoding of the file into account. In particular, since your file is encoded in UTF-8, there may be multiple bytes per character. When it is read as text, the UTF-8 bytes are decoded and you get your string.
FileReader.readAsBinaryString, on the other hand, does exactly what it says: it reads the file byte by byte. It doesn't recognise multi-byte characters, which is good news for binary files in particular (basically anything except a text file). Since π takes two bytes in UTF-8, your string contains the two individual bytes that make it up.
This difference can be seen in many places, in particular when the encoding is lost and you see characters like é displayed as Ã©.
Oh well, if that's all you needed... :)
CF80 is the UTF-8 encoding for π.
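The difference can be reproduced without a browser. This sketch approximates the two FileReader calls with TextEncoder and TextDecoder (it is not the FileReader API itself): decoding the bytes as UTF-8 mimics readAsText, and turning each byte into one character mimics readAsBinaryString. The same byte-per-character reading turns é into Ã©.

```javascript
// Stand-in for the file contents: the UTF-8 bytes of "π" (U+03C0).
const piBytes = new TextEncoder().encode('\u03C0');

// Like readAsText(blob) with UTF-8: a multi-byte sequence becomes one character.
const asText = new TextDecoder('utf-8').decode(piBytes);

// Like readAsBinaryString(blob): one character per byte, no decoding.
const asBinaryString = String.fromCharCode(...piBytes);
const hexCodes = [...asBinaryString].map((c) => c.charCodeAt(0).toString(16));

console.log(asText, asText.length); // π 1
console.log(hexCodes);              // [ 'cf', '80' ]

// Mojibake: the UTF-8 bytes of "é" (c3 a9) read one byte per character.
const mojibake = String.fromCharCode(...new TextEncoder().encode('\u00E9'));
console.log(mojibake);              // Ã©
```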
