JavaScript Base 64 Decoding Binary Data Doesn't Work


I have a simple PHP file which loads a file from my server, base64 encodes it and echoes it out.
Then I have a simple HTML page that uses jQuery to fetch this file, base64 decode it and do a checksum test. The checksum test is not working.
I md5'd the file in PHP after encoding it and md5'd it in JavaScript before decoding it, and the checksums matched (so nothing went wrong in transit). However, the pre-encoding and post-decoding checksums do NOT match.
I am using webtoolkit.base64.js for decoding it in JavaScript. The file is a binary file (A ZIP archive).
Is there a problem with the decoding library, or something else I'm not aware of, that could cause this issue? Could it be a problem with the MD5 library I'm using (http://pajhome.org.uk/crypt/md5/md5.html)?

Summary
Your MD5 library is OK; your base64 library is broken.
I have created and verified a ZIP file of 15097 bytes.
MD5 sum: a9de6b8e5a9173140cb46d4b3b31b67c
I have base64-encoded this file: http://pastebin.com/2rfdTzYT (20132 bytes).
Verify the length of the base64 file at pastebin, using the following JavaScript snippet:
document.querySelector('.de1').textContent.replace(/\s/g,'').length;
Base64-decode the file properly using atob, and verify the size:
window.b64_str = document.querySelector('.de1').textContent.replace(/\s/g,'');
console.log( atob(window.b64_str).length ); /* 15097 */
I verified that both files were exactly equal using the Hexdump JavaScript library, and the xxd UNIX command (available as EXE file for Windows).
Using your Base64 decoder, I get a string with the size of 8094. That is not 15097!
During my tests, I discovered that the atob method returned incorrect bytes after certain byte sequences, including carriage returns. I have not yet found a solution to this.
Your MD5 library is OK.
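As a minimal sketch of the byte-for-byte check described above: the helper below (base64ToBytes is a hypothetical name, not part of any library mentioned here) decodes base64 with atob and copies each character code into a Uint8Array, so the length and contents can be compared with the original file. The Buffer fallback is an assumption for runtimes that do not define atob.

```javascript
// Decode base64 into raw bytes without any text (UTF-8) interpretation.
// Assumption: fall back to Buffer where atob is not defined (older Node.js).
const decodeBase64 = typeof atob === 'function'
  ? (s) => atob(s)
  : (s) => Buffer.from(s, 'base64').toString('binary');

// Hypothetical helper: base64 text in, Uint8Array of the original bytes out.
function base64ToBytes(b64) {
  const binary = decodeBase64(b64.replace(/\s/g, '')); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i); // always in the range 0..255
  }
  return bytes;
}

// "UEsDBA==" is the base64 form of the 4-byte ZIP signature 50 4B 03 04.
const zipSignature = base64ToBytes('UEsDBA==');
console.log(zipSignature.length, Array.from(zipSignature));
```

Any checksum should then be computed over these bytes (or over the one-character-per-byte string), not over a string that has been reinterpreted as text.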

I may be misunderstanding the question, but if I'm not, I've run into something like this before. The JavaScript library you're using doesn't handle binary data. What PHP encodes is raw binary (a bunch of 1's and 0's), but what the JavaScript library returns is text. If you want a binary string, you'll have to convert the resulting text to binary; then it should be the same as your original file.
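To illustrate the text-versus-binary point, here is a small sketch. It assumes the decoder treats its output as UTF-8 text, which would be consistent with a decoded string that is shorter than the original file but is not confirmed in this thread. The sample bytes are made up for the demonstration.

```javascript
// Four raw bytes; the last two happen to form a valid UTF-8 sequence (U+00E9).
const rawBytes = new Uint8Array([0x50, 0x4b, 0xc3, 0xa9]);

// Binary string: one character per byte, nothing is reinterpreted.
const asBinaryString = Array.from(rawBytes, (b) => String.fromCharCode(b)).join('');

// Text decoding: multi-byte UTF-8 sequences collapse into single characters.
const asUtf8Text = new TextDecoder('utf-8').decode(rawBytes);

console.log(asBinaryString.length); // 4
console.log(asUtf8Text.length);     // 3
```

Once bytes have been merged like this, the length and any checksum of the result no longer match the original file.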

Related

storing images in xml file

I am new to XML. I have an XML document that I am inserting data manually. I wanted to know if it is possible to include an image in an XML file and not by using the file path. I have found something about encoding but I do not understand how this work and the option is not even available in the XML editor. After storing the images in the XML file, I will access it using javascript. Please provide further information on this matter.

An image is binary data, and the usual way to store binary data in an XML document is by encoding it in base64 (which turns it into ASCII characters). Libraries to convert from binary to base64, and back, are widely available, but the details depend very much on your programming environment. There are also online services where you can upload an image and get back its base64 representation: an example is here https://www.base64encode.net/base64-image-encoder
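A rough sketch of that round trip in JavaScript follows. The element and attribute names (image, type, encoding) and the helper names are hypothetical, and the 8-byte PNG signature stands in for a complete image; a real document would hold the base64 of the whole file.

```javascript
// Encode raw bytes as base64 (btoa in browsers, Buffer as a Node.js fallback).
function bytesToBase64(bytes) {
  if (typeof btoa === 'function') {
    let binary = '';
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary);
  }
  return Buffer.from(bytes).toString('base64');
}

// Decode base64 back into an array of byte values.
function base64ToByteArray(b64) {
  const binary = typeof atob === 'function'
    ? atob(b64)
    : Buffer.from(b64, 'base64').toString('binary');
  return Array.from(binary, (ch) => ch.charCodeAt(0));
}

// First 8 bytes of every PNG file, used here instead of a full image.
const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical element and attribute names.
const xml = '<image type="image/png" encoding="base64">' +
  bytesToBase64(pngSignature) + '</image>';

// Reading it back: take the element's text content and decode it.
const match = /<image[^>]*>([^<]*)<\/image>/.exec(xml);
const restored = base64ToByteArray(match[1]);
console.log(xml);
console.log(restored.join(',') === pngSignature.join(','));
```

In a browser, the stored base64 text can also be used directly as an image source by prefixing it with data:image/png;base64, so no decoding step is needed just to display it.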

Is it possible to insert a base64 Word file using Office 2013 office api

I have a successfully running script that loads Word files from SharePoint and inserts them into Word 2017 (Office 365 Word local client, not online).
The current script reads the files using Ajax, extracts the base64 file, and calls
body.insertFileFromBase64(myBase64, end)
I now need to extend the functionality to support Word 2013 (i.e. use Office.js instead of the Word JavaScript API). So the code has changed to
Office.context.document.setSelectedDataAsync(file, someCoercionType)
I hoped to be able to use a variant of
Office.context.document.setSelectedDataAsync(myBase64, {coercionType: Office.CoercionType.Ooxml}, function (
But I get an error back "The Format of the specified data object is invalid", which is correct enough as the Office API assumes a base64 file is an image.
Is it possible to convert the Base64 file to XML in JavaScript? (Elsewhere in my code I unzip the docx and extract bookmarks, but only from document.xml which lacks all formatting and images, footers etc.)

Base64 is simply a binary-to-text encoding and is blissfully unaware of the underlying content type. So if your source content was OOXML, decoding it would give you that OOXML back. What you cannot do is type conversion. For example, a Base64-encoded JPEG cannot be decoded directly into a BMP. To do that, you would first need to decode it and then convert from JPEG to BMP using some other tool.
If you're seeking to manipulate or extract content from an existing document, you may want to look at Aspose.Words. Aspose provides tools that allow you to programmatically work with Word documents (they have similar tools for a slew of other file types as well). Using this, you may be able to extract the OOXML you're looking for so you can then insert it into Word using Office.js.
At the moment, the only Coercion Type that accepts Base64 encoded content is Office.CoercionType.Image.

What is this encoding / why are these .txt files not plain text?

I am browsing the deck.gl repo. It ships with some examples with text files, for example this one. These files have a .txt extension, but aren't plain text:
!OohmwFjqwbMg#[?ADKJYXF#^?N?FAD
=wnmwFvvwbM_#WNg####C?C_#UA?AD#?Of#_#UTu#??BK?A??FUVP?#JF?AVP?#JF?AVPGTA?EL#?
=urmwF|swbM_#UFS##BK?C#C#A#E?CIGA?GE?CIGA#CF?#ABA#CJ##GR]Ud#wA\T?#DB?AXP?#DB?A\T
<aymwFnvwbMaAOKCA#OKPk#CCDKAADKAADKAADKAADKAAL_#fBjAIVCCEL
The examples also contain JavaScript files that look as though they are used to decode these files, for example this one for the file above.
What exactly is going on here? I assume this is a way of reducing the size of the data, but why not just rely on browser gzipping?
And why use a plain text extension when the file is clearly not plain text? And why have a custom decoder at all?

It looks like a custom encoding that uses byte values to encode coordinates/GeoJSON features.
For example, this line from /dist-demo/data/building-data.txt:
!GqgmwFrhwbM}C}##K#IBO#IlBh#BOBMn#PHBGd#KC
is decoded using the decodePolyline() utility function into this array:
[
[0.00004,0.00001],
[40.70541,0.00002],
[40.7062,-74.01624],
[40.70619,-74.01593],
[40.70618,-74.01587],
[40.70616,-74.01582],
[40.70615,-74.01574],
[40.7056,-74.01569],
[40.70558,-74.0159],
[40.70556,-74.01582],
[40.70532,-74.01575],
[40.70527,-74.01584],
[40.70531,-74.01586],
[40.70537,-74.01605],
[40.70537,-74.01603]
]
which is substantially larger in JSON format.
So my guess would be that the main reason is to be able to use smaller data files that are still portable/cacheable. It's still line-based clear text, so it's diffable as well.
Also, these files are still compressible. I assume that a full JSON file is not only larger to begin with but also exhibits less favorable compression characteristics than this file. A quick test on building-data.txt shows a compression ratio of roughly 2:1 for gzip/deflate (139,089 bytes to 72,660 bytes compressed). The compression result for the same file in raw JSON won't be anywhere near that.
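For illustration, here is a sketch of a decoder for the standard encoded polyline format published by Google, which the coordinate runs in these files resemble. This is an assumption: the decodePolyline() utility in the deck.gl examples is not reproduced here and evidently handles extra details, such as the leading marker characters (!, =, <) on each line. The function name decodePolylineSketch is hypothetical, and the sample string is the well-known example from Google's documentation rather than data from the repo.

```javascript
// Sketch of the standard encoded polyline algorithm (not deck.gl's decoder).
function decodePolylineSketch(encoded, precision = 5) {
  const factor = Math.pow(10, precision);
  const points = [];
  let index = 0;
  let lat = 0;
  let lng = 0;

  // Read one variable-length, zigzag-encoded signed delta.
  const readDelta = () => {
    let result = 0;
    let shift = 0;
    let chunk;
    do {
      chunk = encoded.charCodeAt(index++) - 63; // characters are offset by 63
      result |= (chunk & 0x1f) << shift;        // low 5 bits carry data
      shift += 5;
    } while (chunk >= 0x20);                    // bit 0x20 means "more chunks follow"
    return (result & 1) ? ~(result >> 1) : (result >> 1);
  };

  while (index < encoded.length) {
    lat += readDelta(); // each value is a delta from the previous point
    lng += readDelta();
    points.push([lat / factor, lng / factor]);
  }
  return points;
}

const decoded = decodePolylineSketch('_p~iF~ps|U_ulLnnqC_mqNvxq`@');
console.log(decoded); // [[38.5, -120.2], [40.7, -120.95], [43.252, -126.453]]
```

Because every coordinate is stored as a small delta in a handful of printable characters, a point typically costs far fewer bytes than the same pair of numbers written out in JSON.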

Working with characters based on their UTF-8 hex codes

I'm working on something that will read a user's text messages and export them to a csv file, which they can then download. The messages are being retrieved from a third-party web interface: I am essentially using JS to grab the HTML of each message and compiling it as needed. The content of each message is added to a variable which, once all messages are gathered, is given to a new Blob, which is then downloaded.
The problem I am having is that, in this web interface, emoji are represented as images, rather than characters. Thus, when writing a message containing an emoji to a file, the result is as so:
"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"
Now, from this image, we can get 2 workable values:
The UTF-8 hex value
F09F9892
and the Unicode codepoint (I may be referring to this wrong, I don't know much about encoding).
U+1f612
Now, what I want to do is take either of these values (whichever works better), and write it to the csv file as the character itself. So that, when viewing the csv file in a text editor or what have you, it would appear as
"Blah blah blah 😒"
Though I have no idea where to even start with this. Maybe it's as simple as throwing some syntax around the character values, but I haven't been able to get anything from google, because I'm not familiar enough with encoding to know what to Google.

I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.
You can then use decodeURIComponent() to decode the percent-encoded string:
decodeURIComponent('%F0%9F%98%92')
Combine that with jQuery to access the data-textvalue-attribute:
decodeURIComponent($(element).data('textvalue'))
I created a simple example on JSFiddle.
For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.
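As a quick sketch without jQuery or a DOM, both values from the question can be turned into the same character. The variable names are illustrative only; decodeURIComponent and String.fromCodePoint are standard JavaScript functions.

```javascript
// 1. The percent-encoded UTF-8 bytes, as found in data-textvalue.
const fromPercent = decodeURIComponent('%F0%9F%98%92');

// 2. The bare UTF-8 hex string: insert a "%" before every byte pair first.
const fromHex = decodeURIComponent('F09F9892'.replace(/../g, '%$&'));

// 3. The Unicode code point taken from the "sprite-1f612" class name.
const fromCodePoint = String.fromCodePoint(parseInt('1f612', 16));

console.log(fromPercent, fromHex, fromCodePoint);
console.log(fromPercent === fromHex && fromHex === fromCodePoint); // true
```

All three produce the same string, so whichever value is easiest to extract from the HTML can be used.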

The CSV file format does not carry character encoding information, so Excel usually assumes ASCII.
https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality
Microsoft Excel mangles Diacritics in .csv files?
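One common workaround, not mentioned in the answer above, is to start the file with a byte order mark (U+FEFF), which Excel generally takes as a hint that the file is UTF-8. The sketch below assumes that approach; toCsvWithBom and the sample rows are hypothetical.

```javascript
// Build CSV text that starts with a byte order mark (U+FEFF).
function toCsvWithBom(rows) {
  // Quote every cell and double any embedded quotes.
  const escapeCell = (cell) => '"' + String(cell).replace(/"/g, '""') + '"';
  const body = rows.map((row) => row.map(escapeCell).join(',')).join('\r\n');
  return '\uFEFF' + body;
}

const csv = toCsvWithBom([
  ['from', 'message'],
  ['Alice', 'Blah blah blah \u{1F612}'],
]);
console.log(csv.charCodeAt(0).toString(16)); // feff
```

The resulting string can be passed to new Blob([csv], {type: 'text/csv;charset=utf-8'}); the Blob constructor encodes strings as UTF-8, so the mark ends up as the bytes EF BB BF at the start of the file.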

Images with Base64 Automatism - Javascript (with PHP?)

I have heard a lot about Base64 encoding for images in web design.
And I have seen a lot of developers use it for their headlines, with: data:image/png;base64,iVBORw0...
Is there any automation (with JavaScript) behind this?
Or have they all converted and inserted the images by hand? (I can't believe that.)
Example: http://obox-inkdrop.tumblr.com/ (- Headlines)

First of all, the encoding has to be done on the server side, be it:
automated with a script that reads the original image file and returns the base64-encoded string, to inject it into the HTML that's being generated,
or by hand, and directly placed into the HTML.
Doing the base64 encoding on the client side would defeat the purpose, as the goal is to avoid sending the image file from the server to the browser (to minimize the number of HTTP requests).
Depending on the language that's used on the server side, you'll probably find some function to do base64 encoding.
In PHP, you might be interested in base64_encode().
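For comparison, the same server-side step can be sketched in JavaScript, assuming a Node.js server where Buffer is available. The helper name toDataUri is hypothetical, and the 8-byte PNG signature stands in for a real image file.

```javascript
// Hypothetical helper: turn image bytes into a data: URI for inlining in HTML.
function toDataUri(bytes, mimeType) {
  return 'data:' + mimeType + ';base64,' + Buffer.from(bytes).toString('base64');
}

// First 8 bytes of every PNG file, used here instead of a real image.
const pngHeader = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const uri = toDataUri(pngHeader, 'image/png');

// The generated HTML then carries the image inline, with no extra request.
const html = '<img alt="headline" src="' + uri + '">';
console.log(html);
```

In a real script the bytes would come from reading the image file, for example with fs.readFileSync, and the resulting string would be injected into the page template.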
