Displaying UTF-8 characters in PDF


I am trying to display a PDF by converting it into a binary string from the backend.
This is the ajax call I am making
$.ajax({
type : 'GET',
url : '<url>',
data : oParameters,
contentType : 'application/pdf;charset=UTF-8',
success : function(odata) {
window.open("data:application/pdf;charset=utf-8," + escape(odata));
}
});
When I try to open the PDF in a new window, the url looks like
data:application/pdf;charset=utf-8,%25PDF-1.3%0D%0A%25%uFFFD%uFFFD%uFFFD%uFFFD%0D%0A2%200%20obj%0D%0A/WinAnsiEncoding%0D........
As you can see, it uses "WinAnsiEncoding" to display the PDF. Because of this, some of the characters are not being displayed properly. How do I change this to UTF-8?
EDIT : The backend is in ABAP. I am converting a smartform to OTF and then to a string using the function module "CONVERT_OTF".
CALL FUNCTION fname
EXPORTING
user_settings = space
control_parameters = ls_ctropt
output_options = ls_output
gv_lang = lv_lang
IMPORTING
job_output_info = ls_body_text
EXCEPTIONS
formatting_error = 1
internal_error = 2
send_error = 3
user_canceled = 4
OTHERS = 5.
CALL FUNCTION 'CONVERT_OTF'
EXPORTING
format = 'PDF'
IMPORTING
bin_filesize = ls_pdf_len
bin_file = ls_pdf_xstring
TABLES
otf = ls_body_text-otfdata
lines = lt_lines
EXCEPTIONS
err_max_linewidth = 1
err_format = 2
err_conv_not_possible = 3
err_bad_otf = 4
OTHERS = 5.
CALL METHOD server->response->set_header_field( name = 'Content-Type'
value = 'application/pdf;charset=UTF-8' ).
CALL METHOD server->response->append_data( data = lv_pdf_string
length = lv_len ).

Concerning your remark that it uses "WinAnsiEncoding" to display the PDF:
After the comma in
data:application/pdf;charset=utf-8,%25PDF-1.3%0D%0A%25%uFFFD%uFFFD%uFFFD%uFFFD%0D%0A2%200%20obj%0D%0A/WinAnsiEncoding%0D........
everything is pure data. Thus, "WinAnsiEncoding" is merely part of the content of the PDF, and if it is the cause of your trouble, the fix has to happen in the PDF generator, i.e. the PDF generation process itself must be changed.
In the case at hand, your data is:
%PDF-1.3
%...
2 0 obj
/WinAnsiEncoding
........
which is completely normal PDF structure. It merely means that the PDF object 2 is defined as /WinAnsiEncoding which may or may not be used for some font definition, and even if it is used, it may still be adapted by some /Differences to include the characters you require. Furthermore it does not make sense to change this to UTF-8 (as you request) because UTF-8 is not a standard encoding for PDF page content. If you somehow put UTF-8 there, you'll break the PDF even more.
I'm afraid, though, that there are other problems, too.
You add a charset parameter to the type application/pdf --- this does not make sense, PDF is a binary format, i.e. a sequence of bytes is expected and, therefore, no charset is involved.
Your method call escape(odata) creates %uFFFD%uFFFD%uFFFD%uFFFD --- this is invalid according to the RFCs which only define
A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value.
(RFC 3986, section 2.1)
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.
(ibidem, section 2.4)
Thus, %uFFFD%uFFFD%uFFFD%uFFFD is invalid.
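To make the difference concrete, here is a small sketch comparing the legacy escape() output with percent-encoding as RFC 3986 describes it. The helper name percentEncodeOctets is made up for this illustration, and it encodes every octet, including ones that would not strictly need it.

```javascript
// Hypothetical helper: percent-encode each octet as "%" plus two hex digits,
// the triplet form described in RFC 3986, section 2.1.
function percentEncodeOctets(bytes) {
  let out = "";
  for (const b of bytes) {
    out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// The legacy escape() function works on UTF-16 code units, not octets,
// so a replacement character becomes the non-standard "%uFFFD" form.
const legacyForm = escape("\uFFFD");

// Three raw octets, as they might appear near the start of a PDF.
const octetForm = percentEncodeOctets(new Uint8Array([0x25, 0xe2, 0xe3]));

console.log(legacyForm); // %uFFFD
console.log(octetForm);  // %25%E2%E3
```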
PDF, being a binary format, is better suited to Base64 encoding, i.e.
data:application/pdf;base64,BASE_64_ENCODED_PDF
Thus, I propose you change your client side process accordingly.
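A minimal sketch of that client-side change, assuming the PDF arrives as raw bytes in a Uint8Array (for example by requesting the response with an XMLHttpRequest whose responseType is "arraybuffer") rather than as a decoded string. The function names bytesToBase64 and pdfDataUri are made up for this example.

```javascript
// Convert raw bytes to Base64. The bytes are mapped to a "binary string"
// in chunks so that very large inputs do not exceed the argument limit
// of String.fromCharCode.apply.
function bytesToBase64(bytes) {
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Build the data URI without any charset parameter, since PDF is binary.
function pdfDataUri(bytes) {
  return "data:application/pdf;base64," + bytesToBase64(bytes);
}

// Example with the first eight bytes of a PDF header, "%PDF-1.3".
const headerBytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x33]);
console.log(pdfDataUri(headerBytes)); // data:application/pdf;base64,JVBERi0xLjM=
```

The important part is that the bytes never pass through a text decoder on the way; once they have been read as a string, bytes that are not valid UTF-8 are already lost.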

Related

How to parse a DICOM in JavaScript using cornerstone?

I'm trying to parse a dicom file in javascript. I download the dicom with axios, the data I get is a string that looks like this:
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000DICM\u0002\u0000\u0000\u0000UL\u0004\u0000�\u0000\u0000\u0000\u0002\u0000\u0001\u0000OB\u0000\u0000\u0002\u0000\u0000\u0000\u0000\u0001\u0002\u0000\u0002\u0000UI\u001a\u00001.2.840.10008.5.1.4.1.1.1\u0000\u0002\u0000\u0003\u0000UI0\u00001.3.12.2.1104.5.3.33.1388.11.201703201514180234\u0000\u0002\u0000\u0010\u0000UI\u0014\u00001.2.840.10008.1.2.1\u0000\u0002\u0000\u0012\u0000UI\u0014\u00001.3.12.2.1107.5.3.4\u0000\u0002\u0000\u0013\u0000SH\u000e\u0000Siemens_FLC_60\u0008\u0000\u0005\u0000CS\n\u0000ISO_IR 100\u0008\u0000\u0008\u0000CS\u0016\u0000ORIGINAL\\PRIMARY\\\\RAD \u0008\u0000\u0016\u0000UI\u001a\u00001.2.840.11008.5.1.4.1.1.1\u0000\u0008\u0000\u0018\u0000UI0\u00001.3.12.2.1417.5.3.33.1398.11.201703201514180234\u0000\u0008\u0000 \u0000DA\u0008\u000020170320\u0008\u0000!\u0000DA\u0008\u000020170320\u0008\u0000\"\u0000DA\u0008\u000020170320\u0008\u0000#\u0000DA\u0008\u000020170320\u0008\u00000\u0000TM\u0006\u0000151324\u0008\u00001\u0000TM\u000c\u0000151418.0234 \u0008\u00002\u0000TM\u000c\u0000151418.0234 \u0008\u00003\u0000TM\u000c\u0000151418.0234 \u0008\u0000P\u0000SH\u0000\u0000\u0008\u0000`\u0000CS\u0002\u0000CR\u0008\u0000p\u0000LO\u0008\u0000SIEMENS \u0008\u0000�\u0000LO\u0018\u0000CH Foo Bar - 
PARIS\u0008\u0000�\u0000PN\u0000\u0000\u0008\u0000\u0010\u0010SH\u0010\u0000AX10094200-1398 \u0008\u00000\u0010LO\n\u0000LDQK001 RC\u0008\u00002\u0010SQ\u0000\u00000\u0000\u0000\u0000��\u0000�(\u0000\u0000\u0000\u0008\u0000\u0000\u0001SH\u0002\u0000RC\u0008\u0000\u0002\u0001SH\u0004\u0000QDOC\u0008\u0000\u0004\u0001LO\n\u0000LDQK001 RC\u0008\u0000>\u0010LO\u001a\u0000RAD_Rachis Cerv. F 3/4 AP \u0008\u0000#\u0010LO\u0002\u000077\u0008\u0000�\u0010LO\u0000\u0000\u0008\u0000�\u0010LO\u0016\u0000Fluorospot Compact FD \u0008\u0000\u0010\u0011SQ\u0000\u0000V\u0000\u0000\u0000��\u0000�N\u0000\u0000\u0000\u0008\u0000P\u0011UI\u0018\u00001.2.840.10008.3.1.2.3.1\u0000\u0008\u0000U\u0011UI&\u00001.3.51.0.1.1.10.2.1.94.2417819.2393805\u0008\u0000\u0011\u0011SQ\u0000\u0000Z\u0000\u0000\u0000��\u0000�R\u0000\u0000\u0000\u0008\u0000P\u0011UI\u0018\u00001.2.840.10008.…"
I need to decode this to a json format (or a readable format like dcmdump does for example) in a js script.
I have tried to use the cornerstone dicom parser (https://github.com/cornerstonejs/dicomParser) like this:
import * as dicomParser from 'dicom-parser';
let enc = new TextEncoder("utf-8")
let arr8 = enc.encode(dicom_data).map(Number)
console.log(dicomParser.parseDicom(arr8))
But I get the following error:
"uncaught exception: dicomParser.parseDicom: missing required meta header attribute 0002,0010".
Does anyone know a simple way to do this?

I am not familiar with JavaScript. However, the attribute (0002,0010 = Transfer Syntax UID) is present in the header.
But the way you handle the data does not appear right to me.
The data is not in DICOM format.
Instead, it seems to be encoded as a string in which non-printable characters have been converted to \u escape sequences. As a result, one byte in the header has been expanded to a two-byte hex number. You should try to obtain the file in its original binary representation.
Transcoding to UTF-8 does not appear appropriate to me. Some DICOM attributes contain a value in binary representation. Plus the attributes "addresses" (group, element) and their length are encoded in binary format. Transcoding to UTF-8 will destroy this.
So I think it should work once you pass the original binary DICOM file to the dicomParser without applying any modifications before the parsing.
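The following sketch illustrates why the text round trip is destructive. It uses only the standard TextDecoder and TextEncoder; the six sample bytes are invented for the demonstration and are not taken from the file in the question.

```javascript
// Six sample bytes: a little-endian tag (0002,0010) followed by two bytes
// that are not valid UTF-8 (0xFE and 0xFF).
const originalBytes = new Uint8Array([0x02, 0x00, 0x10, 0x00, 0xfe, 0xff]);

// What happens when a binary response is read as text: every invalid
// byte is replaced by U+FFFD, the replacement character.
const asText = new TextDecoder("utf-8").decode(originalBytes);

// Encoding the string again yields three bytes (EF BF BD) for each
// replacement character, so offsets and lengths no longer match.
const roundTripped = new TextEncoder().encode(asText);

console.log(originalBytes.length); // 6
console.log(roundTripped.length);  // 10
```

With axios, asking for the response as binary (the responseType: "arraybuffer" request option) and wrapping the result in a Uint8Array should avoid the string step entirely; that part is not shown here because it needs a real server.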

Well, I have never used the toolkit you mentioned in the question. I did a simple Google search and found the GitHub repository with the documentation.
The documentation also provides a sample DICOM dump. You can view the source of that page in your browser, which gives you the complete code to print a dump.
The following is a snippet:
var reader = new FileReader();
reader.onload = function(file) {
var arrayBuffer = reader.result;
// Here we have the file data as an ArrayBuffer. dicomParser requires as input a
// Uint8Array so we create that here
var byteArray = new Uint8Array(arrayBuffer);
var kb = byteArray.length / 1024;
var mb = kb / 1024;
var byteStr = mb > 1 ? mb.toFixed(3) + " MB" : kb.toFixed(0) + " KB";

How do I insert a PNG comment block when saving an HTML5 canvas using javascript toDataURL?

I have a compact canvas-to-png download saver function (see code below).
This code works very well and I am satisfied with its output... mostly.
Would a second replace suffice? What would that replace look like?
My only other option is to post-process the file with imagemagick.
Any ideas?
More completely: I want to add metadata from javascript.
I found this link http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_PNG_files
which details the structures, and I may be able to figure it out with sufficient time.
If anyone has experience and can shorten this for me, I would appreciate it.
//------------------------------------------------------------------
function save () // has to be function not var for onclick to work.
//------------------------------------------------------------------
{
var element = document.getElementById("saver");
element.download = savename;
element.href = document.
getElementById(id.figure1a.canvas).
toDataURL("image/png").
replace(/^data:image\/[^;]+/,'data:application/octet-stream');
}

The Base-64 representation has little to do with the internal chunks. It's just [any] binary data encoded as string so it can be transferred over string-only protocols (or displayed in a textual context).
It's perhaps a bit broad to create an example, but hopefully showing the main steps will help to achieve what you're looking for:
To add a chunk to a PNG you would first have to convert the data for it into an ArrayBuffer using XHR/fetch in the case of Data-URIs, or FileReader in case you have the PNG as Blob (which I recommend. See toBlob()).
Add a DataView to the ArrayBuffer
Go to position 0x08 in the array, which is the start of the IHDR chunk, and read the length of the chunk data (Uint32). It's very likely the same static size for almost any PNG, but since it could change and you don't need to remember the chunk size, we'll just read it from here. Add the length to the position, plus 4 for the length field itself (if you didn't move the pointer while reading it), plus 4 for the chunk type and plus 4 for the CRC-32 at the end of the chunk. Typically this should land you at position 0x21.
You now have the position for the next chunk which we can use to insert our own text chunks
Split that first part into a part-array (a regular array) using a sub-array with the original ArrayBuffer, e.g. new Uint8Array(arraybuffer, 0, position); - you can also use the subarray method.
Produce the new chunk* as typed array and add to part-array
Add the remaining part of the original PNG array without the first part to the part-array, e.g. new Uint8Array(arraybuffer, position, length - position);
Convert the part-array to a Blob using the part-array directly as argument (var newPng = new Blob(partArray, {type: "image/png"});). This will now contain the custom chunk. From there you can use an Object-URL with it to read it back as an image (or make it available for download).
*) Chunk:
For tEXt, be aware that it is limited to the Latin-1 charset, which means you'll have to sanitize the string you want to use. Use iTXt for Unicode (UTF-8) content; we'll use tEXt here for simplicity.
The keyword and value are separated by a NUL byte (0x00) in a tEXt chunk, and the keyword must be typed exactly as defined in the spec.
Build the chunk this way:
get byte-size from string
add 12 bytes (for length, four-cc and crc-32)
format the array this way (you can use a DataView here as well):
Uint32 - length of chunk (data only in number of bytes)
Uint32 - "tEXt" as four-cc
[...] - The data itself (copy byte-wise)
Uint32 - CRC32* which includes the FourCC but not length and itself.
All data in a PNG is big-endian.
To calculate CRC-32 feel free to use this part of my pngtoy solution (the LUT is built this way). Here is one way to format a four-cc:
function makeFourCC(n) { // n = "tEXt" etc., big-endian
var c = n.charCodeAt.bind(n);
return (c(0) & 0x7f) << 24 | (c(1) & 0x7f) << 16 | (c(2) & 0x7f) << 8 | c(3) & 0x7f
}
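Putting the steps together, a rough sketch could look like the following. It is not the pngtoy code referred to above: the CRC-32 is a generic table-based implementation, the function names are invented for this example, and it joins the parts by copying into one Uint8Array instead of passing a part-array to a Blob.

```javascript
// CRC-32 lookup table (reflected polynomial 0xEDB88320), as used by PNG.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes) {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// tEXt only allows Latin-1, so reject anything above 0xFF.
function latin1Bytes(str) {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) throw new RangeError("tEXt only supports Latin-1");
    out[i] = code;
  }
  return out;
}

// Layout: length (4) + "tEXt" (4) + keyword + NUL + text + CRC-32 (4).
function buildTextChunk(keyword, text) {
  const key = latin1Bytes(keyword);
  const val = latin1Bytes(text);
  const dataLength = key.length + 1 + val.length;
  const chunk = new Uint8Array(12 + dataLength);
  const view = new DataView(chunk.buffer);
  view.setUint32(0, dataLength);              // DataView is big-endian by default
  chunk.set(latin1Bytes("tEXt"), 4);
  chunk.set(key, 8);                          // NUL separator is already 0
  chunk.set(val, 8 + key.length + 1);
  // The CRC covers the chunk type and the data, not the length field.
  view.setUint32(8 + dataLength, crc32(chunk.subarray(4, 8 + dataLength)));
  return chunk;
}

// Insert the chunk right after IHDR, which starts at offset 8.
function insertAfterIHDR(png, chunk) {
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const position = 8 + 4 + 4 + view.getUint32(8) + 4; // length + type + data + CRC
  const out = new Uint8Array(png.length + chunk.length);
  out.set(png.subarray(0, position), 0);
  out.set(chunk, position);
  out.set(png.subarray(position), position + chunk.length);
  return out;
}
```

The result can then be wrapped as new Blob([bytes], {type: "image/png"}) and exposed through an Object-URL for download.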

websocket api - image encoding yields no image type on client side

I have a web socket server on tomcat 8 with the following binary use:
sess.getBasicRemote().sendBinary(bf);
where bf is a simple image to bytes conversion as follows:
BufferedImage img = ImageIO.read(...);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write( img, "png", baos );
ByteBuffer bf = ByteBuffer.wrap(baos.toByteArray());
This code ends up on the client side (JavaScript) as a blob and is eventually rendered as an image in the browser, and this seems to work just fine.
The only thing that's strange is that the image is rendered typeless as:
data:;base64,iVBORw0KGgoAAAA......== without the type (image/png).
if I use online encoders for the same image I will get:
data:image/png;base64,iVBORw0KGgoAAAA......== (notice the image/png type)
and so my question is why is that?
is my image to byte conversion wrong? like I said, the image is displayed fine it just is missing the type.
Note that the data sent from the Java websocket server is not encoded with Base64; it's something I do on the client side (via JS's FileReader.readAsDataURL(blob) - very common).
thanks a lot and sorry for the long post

No, your image to byte array conversion is not wrong. Byte array conversion treats the image as a binary stream; it has nothing to do with the media type of its content.
The type that you want to see is a data URI media type.
Plain Java code for converting a file to a byte array won't give you a URL that complies with the data URL scheme.
From the RFC
data:[<mediatype>][;base64],<data>
The <mediatype> is an Internet media type specification (with
optional parameters.) The appearance of ";base64" means that the data
is encoded as base64. Without ";base64", the data (as a sequence of
octets) is represented using ASCII encoding for octets inside the
range of safe URL characters and using the standard %xx hex encoding
of URLs for octets outside that range. If <mediatype> is omitted, it
defaults to text/plain;charset=US-ASCII. As a shorthand,
"text/plain" can be omitted but the charset parameter supplied.
RFC source
When you're creating the Blob object in Javascript you have an option to pass MediaType to it so that when you read it using FileReader.readAsDataURL it fills the appropriate media type.
Example is below
var blob = new Blob( [ arrayBufferView ], { type: "image/jpeg" } );
Source
You probably don't need BufferedImage in your code; a simple file read should suffice.
The following is the equivalent of your code using Apache Commons IO FileUtils.
ByteBuffer bf = ByteBuffer.wrap(FileUtils.readFileToByteArray(new File("test.jpg")));
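On the client side, a blob that arrives over a WebSocket carries no media type, so one option is to re-tag it before handing it to FileReader.readAsDataURL. This is a small sketch under the assumption that you know the server always sends PNG data; the helper name withType is made up.

```javascript
// Return a copy of the blob with the given media type attached.
// Blob.prototype.slice accepts an optional content type as third argument.
function withType(blob, mimeType) {
  return blob.slice(0, blob.size, mimeType);
}

// Stand-in for the blob received in the WebSocket message event:
// the first four bytes of a PNG signature, with no type set.
const untyped = new Blob([new Uint8Array([0x89, 0x50, 0x4e, 0x47])]);
const typed = withType(untyped, "image/png");

console.log(JSON.stringify(untyped.type)); // ""
console.log(typed.type);                   // image/png
```

Reading typed with FileReader.readAsDataURL should then produce a URL that starts with data:image/png;base64, instead of data:;base64,.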

DOM Exception 5 INVALID CHARACTER error on valid base64 image string in javascript

I'm trying to decode a base64 string for an image back into binary so it can be downloaded and displayed locally by an OS.
The string I have successfully renders when put as the src of an HTML IMG element with the data URI preface (data: img/png;base64, ) but when using the atob function or a goog closure function it fails.
However decoding succeeds when put in here: http://www.base64decode.org/
Any ideas?
EDIT:
I successfully got it to decode with another library other than the built-in JS function. But, it still won't open locally - on a Mac says it's damaged or in an unknown format and can't get opened.
The code is just something like:
imgEl.src = 'data:img/png;base64,' + contentStr; //this displays successfully
decodedStr = window.atob(contentStr); //this throws the invalid char exception but i just
//used a different script to get it decode successfully but still won't display locally
the base64 string itself is too long to display here (limit is 30,000 characters)

I was just banging my head against the wall on this one for awhile.
There are a couple of possible causes to the problem. 1) Utf-8 problems. There's a good write up + a solution for that here.
In my case, I also had to make sure all the whitespace was out of the string before passing it to atob. e.g.
function decodeFromBase64(input) {
input = input.replace(/\s/g, '');
return atob(input);
}
What was really frustrating was that the base64 parsed correctly using the base64 library in python, but not in JS.

I had to remove the data:audio/wav;base64, in front of the b64, as this was given as part of the b64.
var data = b64Data.substring(b64Data.indexOf(',')+1);
var processed = atob(data);
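Both fixes can be combined into one helper that also returns real bytes instead of a binary string, which is usually what is needed before saving the result as a file. This is only a sketch; the name base64ToBytes is invented, and it assumes the input is either plain Base64 or a data URI.

```javascript
// Decode Base64 (optionally wrapped in a data URI, optionally containing
// whitespace or line breaks) into a Uint8Array.
function base64ToBytes(input) {
  // Drop a "data:...;base64," prefix if there is one.
  const comma = input.startsWith("data:") ? input.indexOf(",") : -1;
  // Remove whitespace that some encoders insert as line wrapping.
  const b64 = input.slice(comma + 1).replace(/\s/g, "");
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "iVBORw0K" is the Base64 form of the first six bytes of a PNG file.
const pngStart = base64ToBytes("data:image/png;base64,iVBO\nRw0K");
console.log(Array.from(pngStart)); // [137, 80, 78, 71, 13, 10]
```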

Returning a byte string to ExternalInterface.call throws an error

I am working on my open source project Downloadify, and up until now it simply handles returning Strings in response to ExternalInterface.call commands.
I am trying to put together a test case using JSZip and Downloadify together, the end result being that a Zip file is created dynamically in the browser, then saved to the disk using FileReference.save. However, this is my problem:
The JSZip library can return either a base64 encoded string of the Zip, or the raw byte string. The problem is, if I return that byte string in response to the ExternalInterface.call command, I get this error:
Error #1085: The element type "string" must be terminated by the matching end-tag "</string>"
ActionScript 3:
var theData:* = ExternalInterface.call('Downloadify.getTextForSave',queue_name);
Where queue_name is just a string used to identify the correct instance in JS.
JavaScript:
var zip = new JSZip();
zip.add("test.txt", "Hello world!\n");
var content = zip.generate(true);
return content;
If I instead return a normal string instead of the byte string, the call works correctly. I would like to avoid using base64 as I would have to include a base64 decoder in my swf which will increase its size.
Finally: I am not looking for a AS3 Zip generator. It is imperative to my project to have that part run in JavaScript
I am admittedly not a AS3 programmer by trade, so if you need any more detail please let me know.

When data is returned from JavaScript calls, it is serialized into an XML string. So if the "raw string" returned by JSZip includes characters that make the XML invalid, which is what I think is happening here, you'll get errors like that.
What you get as a return is actually:
<string>[your JSZip generated string]</string>
Imagine your return string includes a "<" char: this will make the XML invalid, and it's hard to tell what character codes a raw byte stream will translate to.
You can read more about the external API's XML format on LiveDocs

i think the problem is caused by the fact, that flash expects a utf8 String and you throw some binary stuff at it. i think for example 0x00FF will not turn out to be valid utf8 ...
you can try fiddling around with flash.system::System.setCodePage, but i wouldn't be too optimistic ...
i guess a base64 decoder is probably really the easiest ... i'd rather worry about speed than about file size though ... this rudimentary decoder method uses less than half a K:
public function decodeBase64(source:String):ByteArray {
var ret:ByteArray = new ByteArray();
var map:Object = new Object();
var i:int = 0;
for each (var char:String in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".split("")) map[char] = i++;
map["="] = 0;
source = source.split("\n").join("").split("\r").join("");//remove linebreaks
for (i = 0; i < source.length/4; i++) {
var buf:int = 0;
for each (char in source.substr(i * 4, 4).split("")) buf = (buf << 6) + map[char];
ret.writeByte(buf >>> 16);
ret.writeShort(buf);
}
return ret;
}
you could simply shorten function names and take a smaller image ... or use ColorTransform or ConvolutionFilter on one image instead of four ... or compile the image into the SWF for smaller overall size ... or reduce function name length ...
so unless you're planning on working with MBs of data, this is the way to go ...
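On the JavaScript side, the matching step would be to Base64-encode the byte string before returning it, so that nothing in it can break the XML wrapper. A minimal sketch using the built-in btoa, with an invented function name and an invented sample byte string rather than real JSZip output:

```javascript
// Encode a "binary string" (one char per byte, all codes <= 0xFF) so that
// it only contains characters that are harmless inside an XML element.
// btoa throws if the string contains a character code above 0xFF.
function toXmlSafe(byteString) {
  return btoa(byteString);
}

// Sample bytes resembling the start of a zip file, plus a "<", a NUL and
// a 0xFF byte, which are the kinds of values that would corrupt the XML.
const rawBytes = "PK\u0003\u0004<\u0000\u00ff";
const safe = toXmlSafe(rawBytes);

console.log(/^[A-Za-z0-9+\/=]*$/.test(safe)); // true
console.log(atob(safe) === rawBytes);         // true
```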
