Base64 Decode embedded PDF in Typescript - javascript

Within an XML file we have a base64-encoded string representing a PDF file that contains some table representations, similar to this example. When decoding the base64 string of that PDF document (such as this one), we end up with a PDF document of 66 kB in size, which opens correctly in any PDF viewer.
When trying to decode that same base64-encoded string with Buffer in TypeScript (within a VS Code extension), using the functions below:
function decodeBase64(base64String: string): string {
const buf: Buffer = Buffer.from(base64String, "base64");
return buf.toString();
}
// the base64 encoded string is usually extracted from an XML file directly
// for testing purposes we load that base64 encoded string from a local file
const base64Enc: string = fs.readFileSync(".../base64Enc.txt", "ascii");
const base64Decoded: string = decodeBase64(base64Enc);
fs.writeFileSync(".../table.pdf", base64Decoded);
we end up with a PDF of 109 kB in size and a document that can't be opened using PDF viewers.
For a simple PDF, such as this one, with a base64 encoded string representation like this, the code above works and the PDF can be read in any PDF viewer.
I've also tried to directly read in the locally stored base64 encoded representation of the PDF file using
const buffer: string | Buffer = fs.readFileSync(".../base64Enc.txt", "base64");
but that doesn't produce anything useful either.
Even with a slight adaptation of this suggestion, replacing atob(...) with Buffer because atob is not available, as in the code below:
const buffer: string = fs.readFileSync(".../base64Enc.txt", "ascii");
// atob(...) is not present, other answers suggest to use Buffer for conversion
const binary: string = Buffer.from(buffer, 'base64').toString();
const arrayBuffer: ArrayBuffer = new ArrayBuffer(binary.length);
const uintArray: Uint8Array = new Uint8Array(arrayBuffer);
for (let i: number = 0; i < binary.length; i++) {
uintArray[i] = binary.charCodeAt(i);
}
const decoded: string = Buffer.from(uintArray.buffer).toString();
fs.writeFileSync(".../table.pdf", decoded);
I'm not ending up with a readable PDF. The "decoded" table.pdf sample ends up with 109 kB in size.
What am I doing wrong here? How can I decode a PDF such as the table.pdf sample to obtain a readable PDF document, similar to the functionality provided by Notepad++?

Borrowing heavily from the answers to How to get an array from ArrayBuffer?, you can get a Uint8Array straight from the Buffer using the Uint8Array constructor and write that to the file:
const buffer: string = fs.readFileSync(".../base64Enc.txt", "ascii");
const uintArray: Uint8Array = new Uint8Array(Buffer.from(buffer, 'base64'));
fs.writeFileSync(".../table.pdf", uintArray);
Writing the Uint8Array directly to the file guarantees there's no corruption due to encoding changes from moving to and from strings.
Just a note: new Uint8Array(buf) copies the bytes out of the Buffer rather than sharing its memory. Since a Buffer is itself a Uint8Array, passing the result of Buffer.from(buffer, 'base64') straight to fs.writeFileSync works as well. What matters is that the decoded data never passes through a string.
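The size jump from 66 kB to 109 kB in the question is consistent with a UTF-8 round trip: buf.toString() decodes the bytes as UTF-8 by default, and each byte that is not valid UTF-8 is replaced by U+FFFD, which takes three bytes when written back out. A minimal sketch of that effect, using a made-up four-byte sample instead of the real PDF:

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sample data: 0x89 and 0xff are not valid UTF-8 on their own.
const original: Buffer = Buffer.from([0x25, 0x89, 0x50, 0xff]);
const base64: string = original.toString("base64");

// Lossy path (as in the question): bytes -> UTF-8 string -> bytes.
const viaString: Buffer = Buffer.from(Buffer.from(base64, "base64").toString(), "utf8");

// Lossless path (as in the answer): keep the decoded bytes as bytes.
const direct: Buffer = Buffer.from(base64, "base64");

console.log(viaString.length > original.length); // true: replaced bytes take more room
console.log(direct.equals(original));            // true: identical bytes
```

Simple PDFs that contain only ASCII text survive the string round trip, which would explain why the code in the question worked for the small sample but not for the one with tables.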

Related

Blob's DataUri vs Base64 string DataUri

As you know, and as stated by the W3C, it is possible to create a URL for a Blob object in JavaScript by using URL.createObjectURL. On the other hand, if we have data as a base64-encoded string, we can present it as a URL with the format "data:[<MIME type>];base64,<data>".
Let's suppose that I have a base64-encoded string that was generated from an image that is very popular these days :) the "red dot" image on Wikipedia.
var reddotB64 = "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg";
I'm 100% sure that if I create a URL conforming to the data URI scheme as stated above, I'll be able to create a link element and open it from the browser. Please see the code example below:
var reddotB64 = "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg";
var reddotLink = document.createElement("a");
reddotLink.target = "_blank";
reddotLink.href = "data:image/png;base64," + reddotB64;
document.body.appendChild(reddotLink);
reddotLink.click();
document.body.removeChild(reddotLink);
This works pretty well and displays the image in a new tab. On the other hand, I tried to create the link by using a Blob as follows:
var reddotB64 = "iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg";
var reddotBlob = new Blob([atob(reddotB64)], { type: 'image/png' });
var reddotLink = document.createElement("a");
reddotLink.target = "_blank";
reddotLink.href = URL.createObjectURL(reddotBlob);
document.body.appendChild(reddotLink);
reddotLink.click();
document.body.removeChild(reddotLink);
This code decodes the base64-encoded string variable reddotB64 via the atob function, then creates a Blob object and passes it to URL.createObjectURL. Since I've decoded reddotB64 from base64 to binary, created a Blob of type image/png and then created an object URL from it, I expect this to work, but it doesn't.
Do you have a clue why it's not working? Am I missing anything in the standards, or doing something wrong in JavaScript?
Here is the answer: it is an encoding issue. Using atob alone is not enough to convert a base64 string to binary (Uint8Array/bytes). atob returns a "binary string" in which each character's code is one byte of the decoded data, and when that string is passed to the Blob constructor it is encoded as UTF-8, so every byte above 0x7F turns into two bytes and the image is corrupted. After atob, read the code of every character with charCodeAt and store the values in a Uint8Array. Then create the Blob from that array and call URL.createObjectURL, which definitely works.
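A minimal sketch of the charCodeAt step, assuming the input is a binary string such as the one atob returns. The helper name binaryStringToBytes is made up for illustration:

```typescript
// Convert a "binary string" (one character per byte, as returned by atob)
// into the bytes it represents.
function binaryStringToBytes(binary: string): Uint8Array {
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    // Each char code is already a byte value; the mask only guards against bad input.
    bytes[i] = binary.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Sample input: the first four bytes of a PNG file written as a binary string.
const pngMagic: Uint8Array = binaryStringToBytes("\x89PNG");
console.log(Array.from(pngMagic)); // [137, 80, 78, 71]
```

In the browser, the result would be used as new Blob([binaryStringToBytes(atob(reddotB64))], { type: 'image/png' }) before calling URL.createObjectURL.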

How to parse a DICOM in JavaScript using cornerstone?

I'm trying to parse a dicom file in javascript. I download the dicom with axios, the data I get is a string that looks like this:
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000DICM\u0002\u0000\u0000\u0000UL\u0004\u0000�\u0000\u0000\u0000\u0002\u0000\u0001\u0000OB\u0000\u0000\u0002\u0000\u0000\u0000\u0000\u0001\u0002\u0000\u0002\u0000UI\u001a\u00001.2.840.10008.5.1.4.1.1.1\u0000\u0002\u0000\u0003\u0000UI0\u00001.3.12.2.1104.5.3.33.1388.11.201703201514180234\u0000\u0002\u0000\u0010\u0000UI\u0014\u00001.2.840.10008.1.2.1\u0000\u0002\u0000\u0012\u0000UI\u0014\u00001.3.12.2.1107.5.3.4\u0000\u0002\u0000\u0013\u0000SH\u000e\u0000Siemens_FLC_60\u0008\u0000\u0005\u0000CS\n\u0000ISO_IR 100\u0008\u0000\u0008\u0000CS\u0016\u0000ORIGINAL\\PRIMARY\\\\RAD \u0008\u0000\u0016\u0000UI\u001a\u00001.2.840.11008.5.1.4.1.1.1\u0000\u0008\u0000\u0018\u0000UI0\u00001.3.12.2.1417.5.3.33.1398.11.201703201514180234\u0000\u0008\u0000 \u0000DA\u0008\u000020170320\u0008\u0000!\u0000DA\u0008\u000020170320\u0008\u0000\"\u0000DA\u0008\u000020170320\u0008\u0000#\u0000DA\u0008\u000020170320\u0008\u00000\u0000TM\u0006\u0000151324\u0008\u00001\u0000TM\u000c\u0000151418.0234 \u0008\u00002\u0000TM\u000c\u0000151418.0234 \u0008\u00003\u0000TM\u000c\u0000151418.0234 \u0008\u0000P\u0000SH\u0000\u0000\u0008\u0000`\u0000CS\u0002\u0000CR\u0008\u0000p\u0000LO\u0008\u0000SIEMENS \u0008\u0000�\u0000LO\u0018\u0000CH Foo Bar - 
PARIS\u0008\u0000�\u0000PN\u0000\u0000\u0008\u0000\u0010\u0010SH\u0010\u0000AX10094200-1398 \u0008\u00000\u0010LO\n\u0000LDQK001 RC\u0008\u00002\u0010SQ\u0000\u00000\u0000\u0000\u0000��\u0000�(\u0000\u0000\u0000\u0008\u0000\u0000\u0001SH\u0002\u0000RC\u0008\u0000\u0002\u0001SH\u0004\u0000QDOC\u0008\u0000\u0004\u0001LO\n\u0000LDQK001 RC\u0008\u0000>\u0010LO\u001a\u0000RAD_Rachis Cerv. F 3/4 AP \u0008\u0000#\u0010LO\u0002\u000077\u0008\u0000�\u0010LO\u0000\u0000\u0008\u0000�\u0010LO\u0016\u0000Fluorospot Compact FD \u0008\u0000\u0010\u0011SQ\u0000\u0000V\u0000\u0000\u0000��\u0000�N\u0000\u0000\u0000\u0008\u0000P\u0011UI\u0018\u00001.2.840.10008.3.1.2.3.1\u0000\u0008\u0000U\u0011UI&\u00001.3.51.0.1.1.10.2.1.94.2417819.2393805\u0008\u0000\u0011\u0011SQ\u0000\u0000Z\u0000\u0000\u0000��\u0000�R\u0000\u0000\u0000\u0008\u0000P\u0011UI\u0018\u00001.2.840.10008.…"
I need to decode this to a json format (or a readable format like dcmdump does for example) in a js script.
I have tried to use the cornerstone dicom parser (https://github.com/cornerstonejs/dicomParser) like this:
import * as dicomParser from 'dicom-parser';
let enc = new TextEncoder("utf-8")
let arr8 = enc.encode(dicom_data).map(Number)
console.log(dicomParser.parseDicom(arr8))
But I get the following error:
"uncaught exception: dicomParser.parseDicom: missing required meta header attribute 0002,0010".
Does anyone know a simple way to do this?
I am not familiar with JavaScript. However, the attribute (0002,0010 = Transfer Syntax UID) is present in the header.
But the way you handle the data does not appear right to me.
The data is not in DICOM format.
Instead, it seems to be encoded as a string in which non-printable characters have been converted to \uXXXX escapes, so a single byte in the header has been expanded to a multi-character escape. You should try to obtain the file in its original binary representation.
Transcoding to UTF-8 does not appear appropriate to me. Some DICOM attributes contain a value in binary representation. Plus the attributes "addresses" (group, element) and their length are encoded in binary format. Transcoding to UTF-8 will destroy this.
So I think it should work once you pass the original binary DICOM file to the dicomParser without applying any modifications before the parsing.
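The � characters in the dump suggest that the response body was decoded as text before it reached the parser. A small sketch of what that does to binary data, using a made-up header fragment (the byte values are assumptions, not taken from the real file):

```typescript
// Hypothetical header fragment: "UL", a 2-byte length, then a 4-byte value
// whose first byte (0xC8) is not valid UTF-8 in this position.
const raw: Uint8Array = new Uint8Array([0x55, 0x4c, 0x04, 0x00, 0xc8, 0x00, 0x00, 0x00]);

// What happens when a client treats the body as UTF-8 text...
const asText: string = new TextDecoder("utf-8").decode(raw);
// ...and the text is later turned back into bytes, as in the question.
const reencoded: Uint8Array = new TextEncoder().encode(asText);

console.log(asText.indexOf("\ufffd") !== -1); // true: 0xC8 was replaced by U+FFFD
console.log(reencoded.length > raw.length);   // true: later offsets no longer line up
```

Once lengths and offsets are shifted like this, the parser can no longer find the attributes it expects. With axios, requesting the body with the responseType: 'arraybuffer' option and wrapping response.data in new Uint8Array(...) should hand the parser the untouched bytes; whether that fits the setup in the question is an assumption.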
Well, I have never used the toolkit you mention in the question. I did a simple Google search and found its GitHub repository with the documentation.
The documentation also provides a sample for a DICOM dump. You can view the source of that page in your browser, which gives you the complete code to print a dump.
The following is a snippet:
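var reader = new FileReader();
reader.onload = function(event) {
var arrayBuffer = reader.result;
// Here we have the file data as an ArrayBuffer. dicomParser requires as input a
// Uint8Array so we create that here
var byteArray = new Uint8Array(arrayBuffer);
var kb = byteArray.length / 1024;
var mb = kb / 1024;
var byteStr = mb > 1 ? mb.toFixed(3) + " MB" : kb.toFixed(0) + " KB";
// The quoted snippet stops here; presumably the unmodified bytes then go to the parser
var dataSet = dicomParser.parseDicom(byteArray);
};
// file is assumed to be a File object, e.g. picked through an <input type="file"> element
reader.readAsArrayBuffer(file);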

Angular: Convert base64 string to Byte Array in IE

I am trying to convert a base64 string to byte array and open it as a pdf file in IE. The only problem is atob is not supported in IE, so trying to use Buffer like this:
let b64Data = myBase64Url.split(',', 2)[1];
var byteArray = new Buffer(b64Data ,'base64').toString('binary');
var blob = new Blob([byteArray], {type: 'application/pdf'});
window.navigator.msSaveOrOpenBlob(blob);
I successfully get a popup to open the file, but the file is corrupted.
What am I doing wrong? Is there a better way to convert base64 to a byte array in IE?
In order for the base64 to be properly decoded, it must be only the base64 data, i.e. no mimetype information preceding it.
You will also need to remove .toString('binary') so that you're passing a buffer instead of a string. A string passed to the Blob constructor is encoded as UTF-8, which corrupts every byte above 0x7F.
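Both points can be sketched as follows, assuming a Buffer implementation is available as in the question. The helper name dataUrlToBytes is hypothetical, and the node:buffer import is only there so the sketch runs under Node:

```typescript
import { Buffer } from "node:buffer";

// Strip an optional "data:<mediatype>;base64," prefix and decode the rest to bytes.
function dataUrlToBytes(dataUrl: string): Uint8Array {
  const commaIndex = dataUrl.indexOf(",");
  const b64 = commaIndex >= 0 ? dataUrl.slice(commaIndex + 1) : dataUrl;
  // Buffer.from replaces the deprecated `new Buffer(...)` constructor.
  // The result stays a byte array; it is never converted to a string.
  return Buffer.from(b64, "base64");
}

// "JVBERg==" is the base64 encoding of the four bytes "%PDF".
const bytes: Uint8Array = dataUrlToBytes("data:application/pdf;base64,JVBERg==");
console.log(Array.from(bytes)); // [37, 80, 68, 70]
```

The returned bytes can then be passed directly to new Blob([bytes], {type: 'application/pdf'}).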

websocket api - image encoding yields no image type on client side

I have a web socket server on tomcat 8 with the following binary use:
sess.getBasicRemote().sendBinary(bf);
where bf is a simple image to bytes conversion as follows:
BufferedImage img = ImageIO.read(...);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write( img, "png", baos );
ByteBuffer bf = ByteBuffer.wrap(baos.toByteArray());
This data arrives on the client side (JavaScript) as a Blob and is eventually rendered as an image in the browser, which seems to work just fine.
The only strange thing is that the image is rendered without a type, as:
data:;base64,iVBORw0KGgoAAAA......== without the type (image/png).
if I use online encoders for the same image I will get:
data:image/png;base64,iVBORw0KGgoAAAA......== (notice the image/png type)
and so my question is why is that?
is my image to byte conversion wrong? like I said, the image is displayed fine it just is missing the type.
Note that the data sent from the Java websocket server is not base64-encoded; that is something I do on the client side (via JS's FileReader.readAsDataURL(blob), which is very common).
thanks a lot and sorry for the long post
No, your image to byte array conversion is not wrong. Byte array conversion treats the images as a binary stream, it has nothing to do with the MediaType contained in it.
The type that you want to see is a Data URI media type.
Plain Java code that converts a file to a byte array won't give you a URL that complies with the data URL scheme.
From the RFC (RFC 2397):
data:[<mediatype>][;base64],<data>
The <mediatype> is an Internet media type specification (with
optional parameters.) The appearance of ";base64" means that the data
is encoded as base64. Without ";base64", the data (as a sequence of
octets) is represented using ASCII encoding for octets inside the
range of safe URL characters and using the standard %xx hex encoding
of URLs for octets outside that range. If <mediatype> is omitted, it
defaults to text/plain;charset=US-ASCII. As a shorthand,
"text/plain" can be omitted but the charset parameter supplied.
RFC source
When you're creating the Blob object in Javascript you have an option to pass MediaType to it so that when you read it using FileReader.readAsDataURL it fills the appropriate media type.
Example is below
var blob = new Blob( [ arrayBufferView ], { type: "image/jpeg" } );
Source
You probably don't need BufferedImage in your code, simple file read should suffice.
The following is the equivalent of your code using Apache Commons IO FileUtils.
ByteBuffer bf = ByteBuffer.wrap(FileUtils.readFileToByteArray(new File("test.jpg")));
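To illustrate the difference between the two URLs in the question, here is a small sketch that builds a data URL by hand from a media type and base64 data. It only mimics the shape of the result; it is not how FileReader is implemented, and toDataUrl is a made-up helper name:

```typescript
// Build a base64 data URL following the RFC syntax quoted above.
function toDataUrl(mediaType: string, base64Data: string): string {
  return "data:" + mediaType + ";base64," + base64Data;
}

// Base64 of the 8-byte PNG file signature, used as a stand-in for real image data.
const payload = "iVBORw0KGgo=";

console.log(toDataUrl("", payload));          // data:;base64,iVBORw0KGgo=
console.log(toDataUrl("image/png", payload)); // data:image/png;base64,iVBORw0KGgo=
```

The first output has the same typeless form as in the question, which is what a Blob with an empty type produces. On the client, wrapping the received data as new Blob([event.data], { type: "image/png" }) before calling readAsDataURL should yield the second form; event.data is an assumed name for the received websocket payload.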

Returning a byte string to ExternalInterface.call throws an error

I am working on my open source project Downloadify, and up until now it simply handles returning Strings in response to ExternalInterface.call commands.
I am trying to put together a test case using JSZip and Downloadify together, the end result being that a Zip file is created dynamically in the browser, then saved to the disk using FileReference.save. However, this is my problem:
The JSZip library can return either a base64 encoded string of the Zip, or the raw byte string. The problem is, if I return that byte string in response to the ExternalInterface.call command, I get this error:
Error #1085: The element type "string" must be terminated by the matching end-tag "</string>"
ActionScript 3:
var theData:* = ExternalInterface.call('Downloadify.getTextForSave',queue_name);
Where queue_name is just a string used to identify the correct instance in JS.
JavaScript:
var zip = new JSZip();
zip.add("test.txt", "Hello world!\n");
var content = zip.generate(true);
return content;
If I return a normal string instead of the byte string, the call works correctly. I would like to avoid using base64, as I would have to include a base64 decoder in my SWF, which would increase its size.
Finally: I am not looking for a AS3 Zip generator. It is imperative to my project to have that part run in JavaScript
I am admittedly not a AS3 programmer by trade, so if you need any more detail please let me know.
When data is returned from JavaScript calls, it is serialized into an XML string. So if the raw string returned by JSZip includes characters that make the XML invalid, which is what I think is happening here, you'll get errors like that.
What you get as a return is actually:
<string>[your JSZip generated string]</string>
Imagine your return string includes a "<" character: this will make the XML invalid, and it's hard to tell what character codes a raw byte stream will translate to.
You can read more about the external API's XML format on LiveDocs
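A rough sketch of why a raw ZIP byte string is likely to break this while base64 is not, assuming the returned string is embedded in the XML as-is. The check is simplified and isSafeXmlText is a made-up helper name:

```typescript
// True if the string can sit inside an XML element without escaping:
// no markup characters and none of the control characters XML 1.0 forbids.
// (Simplified: it does not check surrogates or U+FFFE/U+FFFF.)
function isSafeXmlText(s: string): boolean {
  return !/[<&\u0000-\u0008\u000b\u000c\u000e-\u001f]/.test(s);
}

// A ZIP local file header starts with the bytes "PK\x03\x04"; shown here as a raw byte string.
const rawZipStart = "PK\u0003\u0004";
// The same four bytes encoded as base64.
const base64ZipStart = "UEsDBA==";

console.log(isSafeXmlText(rawZipStart));    // false: contains control characters
console.log(isSafeXmlText(base64ZipStart)); // true: only letters, digits, "+", "/" and "="
```

The base64 alphabet contains no characters that need escaping in XML text, which is why the base64 output of JSZip passes through the call where the raw byte string does not.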
i think the problem is caused by the fact, that flash expects a utf8 String and you throw some binary stuff at it. i think for example 0x00FF will not turn out to be valid utf8 ...
you can try fiddling around with flash.system::System.setCodePage, but i wouldn't be too optimistic ...
i guess a base64 decoder is probably really the easiest ... i'd rather worry about speed than about file size though ... this rudimentary decoder method uses less than half a K:
public function decodeBase64(source:String):ByteArray {
var ret:ByteArray = new ByteArray();
var map:Object = new Object();
var i:int = 0;
for each (var char:String in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".split("")) map[char] = i++;
map["="] = 0;
source = source.split("\n").join("").split("\r").join("");//remove linebreaks
for (i = 0; i < source.length/4; i++) {
var buf:int = 0;
for each (char in source.substr(i * 4, 4).split("")) buf = (buf << 6) + map[char];
ret.writeByte(buf >>> 16);
ret.writeShort(buf);
}
return ret;
}
you could simply shorten function names and take a smaller image ... or use ColorTransform or ConvolutionFilter on one image instead of four ... or compile the image into the SWF for smaller overall size ... or reduce function name length ...
so unless you're planning on working with MBs of data, this is the way to go ...
