Decompressing bzip2 data in JavaScript

I ultimately have to consume some data from a Javascript file that looks as follows:
Note: The base64 is illustrative only.
function GetTripsDataCompressed() { return 'QlpoOTFBWSZTWdXoWuEDCAgfgBAHf/.....=='; }
GetTripsDataCompressed() returns a Base64 string: an array of objects is serialized to JSON using JSON.NET, the resulting string is compressed to bzip2 using SharpCompress, and the resulting memory stream is then Base64 encoded.
This is what I have and cannot change it.
I am struggling to find a bzip2 JavaScript implementation that will take the result of:
var rawBzip2Data = atob(GetTripsDataCompressed());
and convert rawBzip2Data back into the string containing the JSON array. I cannot use something like compressjs because I need to support IE 10, and since it relies on typed arrays that rules it out.
So it appears my best option is https://github.com/antimatter15/bzip2.js; however, because I have not created an archive but only bzip2-compressed a single string, it raises an error of Uncaught No magic number found after doing:
var c = GetTripsDataCompressed();
c = atob(c);
var arr = new Uint8Array(c);
var bitstream = bzip2.array(arr);
bzip2.simple(bitstream);
So can anyone help me decompress a bzip2-compressed, Base64-encoded string in JavaScript in a way that is IE 10 compliant? Ultimately I don't care whether it uses https://github.com/antimatter15/bzip2.js or some other native JavaScript implementation.

It seems to me the answer is in the readme:
decompress(bitstream, size[, len]) does the main decompression of a single block. It'll return -1 if it detects that it's the final block, otherwise it returns a string with the decompressed data. If you want to cap the output to a certain number of bytes, set the len argument.
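Pulling that together, here is a minimal sketch, assuming bzip2.js's array(), header() and decompress() behave as the README describes (header() validating the "BZh" magic and returning the block-size level that decompress() expects). Note that atob() yields a binary string, so each character needs to be mapped to its byte value rather than handing the whole string to a typed-array constructor; using a plain array also keeps it IE 10-friendly.
// Sketch only: relies on the bzip2.js API as described in its README.
var raw = atob(GetTripsDataCompressed());

// Map the binary string to an array of byte values (plain array, no typed arrays).
var bytes = [];
for (var i = 0; i < raw.length; i++) {
    bytes.push(raw.charCodeAt(i) & 0xff);
}

var bitstream = bzip2.array(bytes);
var size = bzip2.header(bitstream); // raises "No magic number found" if the data is mangled

var json = "", block = "";
do {
    json += block;
    block = bzip2.decompress(bitstream, size); // returns -1 once the final block is reached
} while (block !== -1);

var trips = JSON.parse(json);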
Also, keep in mind the repository doesn't have a license attached. You'll need to reach out to the author if you want to use the code. That might be tricky given that the repository is eight years old.
On the other hand, the bzip2 algorithm itself is open source (BSD-like license), so you can just reimplement it yourself in JavaScript. It's just a few hundred lines of relatively straightforward code.

Related

How do I insert a PNG comment block when saving an HTML5 canvas using javascript toDataURL?

I have a compact canvas-to-png download saver function (see code below).
This code works very well and I am satisfied with its output... mostly.
Would a second replace suffice? What would that replace look like?
My only other option is to post-process the file with imagemagick.
Any ideas?
More completely: I want to add metadata from javascript.
I found this link http://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_PNG_files
which details the structures, and I may be able to figure it out with sufficient time.
If anyone has experience and can shorten this for me, I would appreciate it.
//------------------------------------------------------------------
function save () // has to be function not var for onclick to work.
//------------------------------------------------------------------
{
  var element = document.getElementById("saver");
  element.download = savename;
  element.href = document
    .getElementById(id.figure1a.canvas)
    .toDataURL("image/png")
    .replace(/^data:image\/[^;]/, 'data:application/octet-stream');
}
The Base-64 representation has little to do with the internal chunks. It's just [any] binary data encoded as string so it can be transferred over string-only protocols (or displayed in a textual context).
It's perhaps a bit broad to create an example, but hopefully showing the main steps will help to achieve what you're looking for:
To add a chunk to a PNG you would first have to convert the data for it into an ArrayBuffer, using XHR/fetch in the case of Data-URIs, or FileReader in case you have the PNG as a Blob (which I recommend; see toBlob()).
Add a DataView to the ArrayBuffer
Go to position 0x08 in the array, which is the start of the IHDR chunk, and read the length of the chunk (Uint32). (It's very likely the same static size for almost any PNG, but since it can in principle change, and you don't need to remember the chunk size anyway, we'll just read it from here.) Add the length to the position (+4 for the CRC-32 at the end of the chunk, and +4 more if you didn't move the pointer while reading the length); typically this should land you at position 0x21.
You now have the position for the next chunk which we can use to insert our own text chunks
Split that first part into a part-array (a regular array) using a sub-array with the original ArrayBuffer, e.g. new Uint8Array(arraybuffer, 0, position); - you can also use the subarray method.
Produce the new chunk* as typed array and add to part-array
Add the remaining part of the original PNG array without the first part to the part-array, e.g. new Uint8Array(arraybuffer, position, length - position);
Convert the part-array to a Blob using the part-array directly as argument (var newPng = new Blob(partArray, {type: "image/png"});). This will now contain the custom chunk. From there you can use an Object-URL with it to read it back as an image (or make it available for download).
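Pulled together, the splitting and reassembly above might look roughly like the sketch below; buildTextChunk() is a hypothetical helper following the chunk layout described next (a possible version of it is sketched after the makeFourCC example further down).
// Rough sketch of the steps above; buildTextChunk() is hypothetical (see below).
function insertTextChunk(arraybuffer, keyword, text) {
  var view = new DataView(arraybuffer);

  // 0x08 is the start of the IHDR chunk; the first 4 bytes there are its data length.
  var ihdrLength = view.getUint32(8);            // big-endian, the DataView default
  var position = 8 + 4 + 4 + ihdrLength + 4;     // signature + length + four-cc + data + CRC-32, typically 0x21

  var head  = new Uint8Array(arraybuffer, 0, position);
  var tail  = new Uint8Array(arraybuffer, position, arraybuffer.byteLength - position);
  var chunk = buildTextChunk(keyword, text);     // hypothetical helper, sketched below

  // The Blob constructor takes the part-array directly.
  return new Blob([head, chunk, tail], {type: "image/png"});
}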
*) Chunk:
For tEXt, be aware that it is limited to the Latin-1 charset, which means you'll have to whitewash the string you want to use; use iTXt for Unicode (UTF-8) content. We'll use tEXt here for simplicity.
The keyword and value are separated by a NUL byte (0x00) in a tEXt chunk, and the keyword must be typed exactly as defined in the spec.
Build the chunk this way:
get byte-size from string
add 12 bytes (for length, four-cc and crc-32)
format the array this way (you can use a DataView here as well):
Uint32 - length of chunk (data only in number of bytes)
Uint32 - "tEXt" as four-cc
[...] - The data itself (copy byte-wise)
Uint32 - CRC32* which includes the FourCC but not length and itself.
All data in a PNG is big-endian.
To calculate CRC-32 feel free to use this part of my pngtoy solution (the LUT is built this way). Here is one way to format a four-cc:
function makeFourCC(n) { // n = "tEXt" etc., big-endian
  var c = n.charCodeAt.bind(n);
  return (c(0) & 0x7f) << 24 | (c(1) & 0x7f) << 16 | (c(2) & 0x7f) << 8 | c(3) & 0x7f;
}
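And a sketch of the hypothetical buildTextChunk() helper used above, following the layout just described. crc32() stands in for whatever CRC-32 routine you end up using (the pngtoy one, for instance), so its exact signature may differ.
function buildTextChunk(keyword, text) {
  // tEXt data = keyword + NUL separator + text (Latin-1 only, per the note above).
  var data = keyword + "\x00" + text;
  var length = data.length;

  var buffer = new ArrayBuffer(length + 12);     // 4 length + 4 four-cc + data + 4 CRC-32
  var view = new DataView(buffer);
  var bytes = new Uint8Array(buffer);

  view.setUint32(0, length);                     // chunk length (data only)
  view.setUint32(4, makeFourCC("tEXt"));         // chunk type
  for (var i = 0; i < length; i++) {
    bytes[8 + i] = data.charCodeAt(i) & 0xff;    // copy the data byte-wise
  }
  // CRC-32 covers the four-cc and the data, but not the length field or itself.
  view.setUint32(8 + length, crc32(bytes.subarray(4, 8 + length)));

  return bytes;
}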

Differing SHA1 hashes for identical values on the server and the client

On the client I'm using Rusha, which I've put into a wrapper:
function cSHA1(m){
return (new Rusha).digest(m);
}
On the server I'm using Node's native crypto module,
function sSHA1(m){
var h = crypto.createHash('sha1');
h.update(m);
return h.digest('hex');
}
Let's try it:
cSHA1('foo')
"0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
sSHA1('foo')
'0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33'
cSHA1('bar')
"62cdb7020ff920e5aa642c3d4066950dd1f01f4d"
sSHA1('bar')
'62cdb7020ff920e5aa642c3d4066950dd1f01f4d'
So far, so good.
Now let's throw them a curveball...
cSHA1(String.fromCharCode(10047))
"5bab61eb53176449e25c2c82f172b82cb13ffb9d"
sSHA1(String.fromCharCode(10047))
'5bab61eb53176449e25c2c82f172b82cb13ffb9d'
Ok, fine.
I have a string; how I got it shouldn't be important (it's a long story anyway), but:
s.split('').map(function(c){
return c.charCodeAt();
})
yields the exact same result in both places:
[58, 34, 10047, 32, 79]
Now, let's hash it:
s
":"✿ O"
cSHA1(s)
"a199372c8471f35d14955d6abfae4ab12cacf4fb"
s
':"? O'
sSHA1(s)
'fc67b1e4ceb3e57e5d9f601ef4ef10c347eb62e6'
This has caused me a fair bit of grief; what the hell?
I have run into the same problem with the German umlaut characters when comparing SHA1 hashes from PHP's sha1 and Rusha.
The reason is simple: some stoned fool decided JavaScript strings are UTF-16 - and PHP doesn't give a sh*t about encoding, it just takes what is there. So, if you supply PHP a json_decode("\u00e4"), it will turn this into a 2-byte string 0xc3 0xa4 (UTF-8).
JS instead will make a single UTF-16 code unit out of this (0xE4) - and Rusha's manual explicitly says all code points must be below 256.
To help yourself, use the UTF-16-to-8 library at http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt, like sha.digest(utf16to8("\u00e4")). This will feed Rusha the correct code points.
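As an alternative to pulling in utf.txt, a minimal sketch of the same idea using the well-known encodeURIComponent/unescape trick, which yields one character per UTF-8 byte before the string reaches Rusha (the wrapper name is just illustrative):
function utf8SHA1(m) {
  // encodeURIComponent() produces UTF-8 percent-escapes and unescape() turns
  // them back into one character per byte, so every code point is below 256.
  var utf8 = unescape(encodeURIComponent(m));
  return (new Rusha()).digest(utf8);
}

utf8SHA1(String.fromCharCode(10047)); // hashes the UTF-8 bytes, as Node's crypto does by default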

Fastest Way to Parse this XML in JS

Say I have this XML with about 1000+ bookinfo nodes.
<results>
  <books>
    <bookinfo>
      <name>1</name>
    </bookinfo>
    <bookinfo>
      <name>2</name>
    </bookinfo>
    <bookinfo>
      <name>3</name>
    </bookinfo>
  </books>
</results>
I'm currently using this to get the name of each book:
var books = this.req.responseXML.getElementsByTagName("books")[0].getElementsByTagName("bookinfo")
Then use a for loop to do something with each book name:
var bookName = books[i].getElementsByTagName("name")[0].firstChild.nodeValue;
I'm finding this really slow when books is really big. Unfortunately, there's no way to limit the result set nor specify a different return type.
Is there a faster way?
You can try fast-xml-parser, which is implemented in JS, to convert XML data to JSON. Here is a benchmark against other parsers.
var parser = require('fast-xml-parser');
var jsonObj = parser.parse(xmlData);
// when a tag has attributes
var options = {
  attrPrefix: "#_"
};
var jsonObj = parser.parse(xmlData, options);
If you don't want to use the npm package, you can include parser.js in your HTML directly.
Disclaimer: I'm the author of this library.
Presumably you are using XMLHttpRequest, in which case the XML is parsed before you call any methods of responseXML (i.e. the XML has already been parsed and turned into a DOM). If you want a faster parser, you'll probably need a different user agent or a different javascript engine for your current UA.
If you want a faster way to access content in the XML document, consider XPath:
Mozilla documentation
MSDN documentation
I used an XPath expression (like //parentNode/node/text()) on a 134KB local file to extract the text nodes of 439 elements, put those into an array (because that's what my standard evalXPath() function does), then iterated over that array to put the nodeValue of each text node into another array, doing two replace calls with regular expressions to format the text, then alert()ed that to the screen with join('\n'). It took 3ms.
A 487KB file with 529 nodes took 4ms (IE 6 reported 15ms but its clock has very poor resolution). Of course my network latency will be nearly zero, but it shows that the XML parser, XPath evaluator and script in general can process that size file quickly.
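For reference, a sketch of what that XPath approach can look like with the standard DOM document.evaluate() API; older IE would need the MSXML selectNodes() route covered in the MSDN link instead.
// Sketch: pull all book names out of the responseXML document with one XPath query.
var xmlDoc = this.req.responseXML;
var result = xmlDoc.evaluate("//books/bookinfo/name/text()", xmlDoc, null,
                             XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

var names = [];
for (var i = 0; i < result.snapshotLength; i++) {
  names.push(result.snapshotItem(i).nodeValue);
}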
If you want to parse the information from that XML much faster, try txml. It is very easy to use and, for the type of XML you have shown, you can use its simplify method. It will give you very clean objects to work with.
https://www.npmjs.com/package/txml
Disclaimer: I'm the author of this library.

VBScript slow byte array copy

I am using the following code to read in a binary file in VBScript and store it in a byte array, which I then access from JavaScript and copy to a JS array; basically just a sneaky way (the only way!) I've found of reading binary data into my JS.
Function readBinaryFile(fileName)
    Dim inStream, buff
    Set inStream = CreateObject("ADODB.Stream")
    inStream.Open
    inStream.Type = 1
    inStream.LoadFromFile fileName
    buff = inStream.Read()
    inStream.Close

    Dim byteArray()
    Dim i
    Dim len
    len = LenB(buff)
    ReDim byteArray(len)
    For i = 1 To len
        byteArray(i - 1) = AscB(MidB(buff, i, 1))
    Next
    readBinaryFile = byteArray
End Function
It appears to work exactly as expected, the only problem being it seems extremely slow. For example, reading in a 300kb file can take over 2 minutes. I am expecting to read files up to around 2meg.
Could anyone explain why this is such a slow operation and if there's anything I can do to speed it up?
Thanks.
The problem is the loop. Try using a disconnected recordset to do the conversion:
Function RSBinaryToString(xBinary)
    'Antonin Foller, http://www.motobit.com
    'RSBinaryToString converts binary data (VT_UI1 | VT_ARRAY Or MultiByte string)
    'to a string (BSTR) using ADO recordset
    Dim Binary
    'MultiByte data must be converted To VT_UI1 | VT_ARRAY first.
    If VarType(xBinary) = 8 Then Binary = MultiByteToBinary(xBinary) Else Binary = xBinary

    Dim RS, LBinary
    Const adLongVarChar = 201
    Set RS = CreateObject("ADODB.Recordset")
    LBinary = LenB(Binary)

    If LBinary > 0 Then
        RS.Fields.Append "mBinary", adLongVarChar, LBinary
        RS.Open
        RS.AddNew
        RS("mBinary").AppendChunk Binary
        RS.Update
        RSBinaryToString = RS("mBinary")
    Else
        RSBinaryToString = ""
    End If
End Function

Function MultiByteToBinary(MultiByte)
    '© 2000 Antonin Foller, http://www.motobit.com
    'MultiByteToBinary converts multibyte string To real binary data (VT_UI1 | VT_ARRAY)
    'using recordset
    Dim RS, LMultiByte, Binary
    Const adLongVarBinary = 205
    Set RS = CreateObject("ADODB.Recordset")
    LMultiByte = LenB(MultiByte)

    If LMultiByte > 0 Then
        RS.Fields.Append "mBinary", adLongVarBinary, LMultiByte
        RS.Open
        RS.AddNew
        RS("mBinary").AppendChunk MultiByte & ChrB(0)
        RS.Update
        Binary = RS("mBinary").GetChunk(LMultiByte)
    End If
    MultiByteToBinary = Binary
End Function
In your case, have readBinaryFile return the "ASCII contents" of the file and use it instead of the array: readBinaryFile = RSBinaryToString(buff)
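For completeness, a sketch of the JavaScript side after that change, assuming readBinaryFile now returns the string produced by RSBinaryToString rather than a VBScript array. The file path is just an example, and character codes at or above 0x80 depend on the codepage used for the adLongVarChar conversion, so spot-check them against a known file.
var data = readBinaryFile("C:\\data\\example.bin"); // hypothetical path
var bytes = new Array(data.length);
for (var i = 0; i < data.length; i++) {
  // One character per byte of the file.
  bytes[i] = data.charCodeAt(i) & 0xff;
}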
I think it's because you are using a high-level scripting language to emulate something that should be done by a low-level compiled language. I guess there's a reason scripts don't support binary data: they are not designed to deal with data one byte at a time. Looping through 300,000 bytes of data would take a noticeable amount of time in many languages, but a non-compiled (scripting) language makes it even worse. The only things I can suggest are using a compiled language instead, or using some ActiveX object created in a compiled language that supports the operations you want to perform without having to perform them byte-by-byte in script. Do you have the option of using compiled components or other languages?
I still haven't found a solution to this, but it's a side issue now and it does work as it is (if very slowly in some circumstances), so unfortunately I haven't got time to look at it any further.

Returning a byte string to ExternalInterface.call throws an error

I am working on my open source project Downloadify, and up until now it simply handles returning Strings in response to ExternalInterface.call commands.
I am trying to put together a test case using JSZip and Downloadify together, the end result being that a Zip file is created dynamically in the browser, then saved to the disk using FileReference.save. However, this is my problem:
The JSZip library can return either a base64 encoded string of the Zip, or the raw byte string. The problem is, if I return that byte string in response to the ExternalInterface.call command, I get this error:
Error #1085: The element type "string" must be terminated by the matching end-tag "</string>"
ActionScript 3:
var theData:* = ExternalInterface.call('Downloadify.getTextForSave',queue_name);
Where queue_name is just a string used to identify the correct instance in JS.
JavaScript:
var zip = new JSZip();
zip.add("test.txt", "Hello world!\n");
var content = zip.generate(true);
return content;
If I instead return a normal string rather than the byte string, the call works correctly. I would like to avoid using Base64, as I would have to include a Base64 decoder in my SWF, which will increase its size.
Finally: I am not looking for an AS3 Zip generator. It is imperative to my project that that part run in JavaScript.
I am admittedly not an AS3 programmer by trade, so if you need any more detail please let me know.
When data is returned from JavaScript calls it is serialized into an XML string. So if the raw string returned by JSZip includes characters that make the XML invalid, which is what I think is happening here, you'll get errors like that.
What you get as a return is actually:
<string>[your JSZip generated string]</string>
Imagine your return string includes a "<" char: this will make the XML invalid, and it's hard to tell what character codes a raw byte stream will translate to.
You can read more about the external API's XML format on LiveDocs.
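One way around it on the JavaScript side, then, is to hand back something XML-safe. A sketch assuming the older JSZip API used in the question, where generate() with no argument returns a Base64 string; the handler shown is illustrative, and the real code would keep dispatching on queue_name as before.
function getTextForSave(queue_name) { // illustrative stand-in for Downloadify.getTextForSave
  var zip = new JSZip();
  zip.add("test.txt", "Hello world!\n");
  // generate() with no argument returns Base64, which survives
  // ExternalInterface's XML serialization intact.
  return zip.generate();
}
The Base64 result can then be decoded back into a ByteArray on the ActionScript side, for example with the decodeBase64() routine in the next answer.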
I think the problem is caused by the fact that Flash expects a UTF-8 String and you throw some binary stuff at it. I think, for example, 0x00FF will not turn out to be valid UTF-8 ...
You can try fiddling around with flash.system::System.setCodePage, but I wouldn't be too optimistic ...
I guess a Base64 decoder is probably really the easiest ... I'd rather worry about speed than about file size, though ... this rudimentary decoder method uses less than half a K:
public function decodeBase64(source:String):ByteArray {
    var ret:ByteArray = new ByteArray();
    var map:Object = new Object();
    var i:int = 0;
    for each (var char:String in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".split("")) map[char] = i++;
    map["="] = 0;
    source = source.split("\n").join("").split("\r").join(""); // remove linebreaks
    for (i = 0; i < source.length / 4; i++) {
        var buf:int = 0;
        for each (char in source.substr(i * 4, 4).split("")) buf = (buf << 6) + map[char];
        ret.writeByte(buf >>> 16);
        ret.writeShort(buf);
    }
    return ret;
}
You could simply shorten function names and use a smaller image ... or use ColorTransform or ConvolutionFilter on one image instead of four ... or compile the image into the SWF for a smaller overall size ...
So unless you're planning on working with MBs of data, this is the way to go ...
