The R rgl package exports an HTML widget with the rglwidget() function, built using the htmlwidgets package. Often the data for this widget is quite large, and Pandoc and webshot2 choke on it.
I would like to try compressing the data when the HTML page is created, and decompressing it in JavaScript before display. I can see that there's a JavaScript package pako that appears to do what I want, and it can be "browserified", but I can't see how to make it available to rglwidget(). Can anyone describe what's necessary?
Edited to add some more detail in response to a comment: This needs to happen on the server (or actually, even before the file gets to the server). htmlwidgets produces output in a Markdown document that is converted to HTML by Pandoc, and that step is failing because Pandoc chokes on the large JSON datasets. I'm not sure if it's just the size, or the complexity (I think Pandoc parses the JSON, and appears to blow up to a huge memory footprint before it crashes). I'm hoping that by using a base64 blob Pandoc will handle it better. webshot2 also has problems that may be the same.
2nd Edit: I've got some evidence that it's the size that matters, not the complexity. I used base64 encoding on the JSON to make it simpler (just one long string), but 33% bigger, and things were worse. So I'm back to thinking that compression would help.
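What I have in mind on the JavaScript side is roughly the sketch below; the function and variable names are only illustrative, it assumes the R side has written the data out as a base64 string of gzip-compressed JSON, and it still relies on pako somehow being available to the widget, which is exactly the part I can't figure out:

// Sketch only: turn a base64 string of gzipped JSON back into an object.
function decompressWidgetData(base64Gzip) {
  const binary = atob(base64Gzip);                   // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);                 // one byte per character
  }
  return JSON.parse(pako.ungzip(bytes, { to: 'string' }));  // pako inflates the gzip data
}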
A buddy of mine is trying to convert a CSV file full of data into something called a tcsv file for use with a service he is developing.
I could barely find anything on TCSV files except for here; this seems to describe what makes a TCSV file.
So my question is, would I be able to use Python (or JS, as in the example) to convert a CSV to a TCSV file? Is this something I can do with code? If someone could explain what a TCSV file is and how it's used, that would help. Thank you!
Tell your buddy the whole point of .csv files is that they are unstructured apart from the commas separating fields and the newlines demarcating new records.
A clue to how useful the .tcsv file extension may be at some future point in time is in the title given by the person on GitHub (your link) experimenting with them: "Simpler, streamable, more compact, easier to read, gzip friendlier than JSON. Hopefully..."
If your buddy insists on a more human-readable form of data, he can transform it pretty easily into XML, or even JSON, and he'll very likely have more joy than following that GitHub post's apparently aborted experiment.
That tcsv stuff doesn't look easier to read than well-formatted JSON to me. Though XML seems to win hands-down if written with readability in mind.
See http://json.org/example.html.
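If it helps, here's a rough sketch of the CSV-to-JSON route in Node.js. It assumes a simple file with a header row and no quoted or escaped commas (the file name is just an example); a real converter should use a proper CSV parsing library:

const fs = require('fs');

// Read a naive CSV (header row, comma-separated, no quoting) and build an
// array of objects keyed by the header fields.
function csvToJson(path) {
  const [header, ...rows] = fs.readFileSync(path, 'utf8').trim().split('\n');
  const fields = header.split(',').map(f => f.trim());
  return rows.map(row => {
    const values = row.split(',');
    return Object.fromEntries(fields.map((f, i) => [f, values[i]]));
  });
}

console.log(JSON.stringify(csvToJson('data.csv'), null, 2));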
I am browsing the deck.gl repo. It ships with some examples with text files, for example this one. These files have a .txt extension, but aren't plain text:
!OohmwFjqwbMg#[?ADKJYXF#^?N?FAD
=wnmwFvvwbM_#WNg####C?C_#UA?AD#?Of#_#UTu#??BK?A??FUVP?#JF?AVP?#JF?AVPGTA?EL#?
=urmwF|swbM_#UFS##BK?C#C#A#E?CIGA?GE?CIGA#CF?#ABA#CJ##GR]Ud#wA\T?#DB?AXP?#DB?A\T
<aymwFnvwbMaAOKCA#OKPk#CCDKAADKAADKAADKAADKAAL_#fBjAIVCCEL
The examples also contain JavaScript files that look as though they are used to decode these files, for example this one for the file above.
What exactly is going on here? I assume this is a way of reducing the size of the data, but why not just rely on browser gzipping?
And why use a plain text extension when the file clearly isn't plain text? And why have a custom decoder at all?
It looks like a custom encoding that uses byte values to encode coordinates/GeoJSON features.
For example, this line from /dist-demo/data/building-data.txt:
!GqgmwFrhwbM}C}##K#IBO#IlBh#BOBMn#PHBGd#KC
is decoded using the decodePolyline() utility function into this array:
[
[0.00004,0.00001],
[40.70541,0.00002],
[40.7062,-74.01624],
[40.70619,-74.01593],
[40.70618,-74.01587],
[40.70616,-74.01582],
[40.70615,-74.01574],
[40.7056,-74.01569],
[40.70558,-74.0159],
[40.70556,-74.01582],
[40.70532,-74.01575],
[40.70527,-74.01584],
[40.70531,-74.01586],
[40.70537,-74.01605],
[40.70537,-74.01603]
]
which is substantially larger in JSON format.
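For reference, a decoder for this kind of data is quite short. Below is a sketch that follows the standard encoded-polyline scheme (deltas scaled by 1e5, packed into 5-bit chunks offset by 63); deck.gl's own decodePolyline() may differ in details, and the leading !/=/< characters in those files appear to be per-line markers that the demo code handles separately:

// Decode a standard encoded polyline into an array of [lat, lng] pairs.
function decodePolyline(encoded) {
  const coords = [];
  let index = 0, lat = 0, lng = 0;
  while (index < encoded.length) {
    for (const which of [0, 1]) {                 // 0 = latitude delta, 1 = longitude delta
      let shift = 0, result = 0, byte;
      do {
        byte = encoded.charCodeAt(index++) - 63;  // undo the printable-ASCII offset
        result |= (byte & 0x1f) << shift;         // accumulate 5 bits per character
        shift += 5;
      } while (byte >= 0x20);                     // high bit set means more chunks follow
      const delta = (result & 1) ? ~(result >> 1) : (result >> 1);  // zig-zag decode
      if (which === 0) lat += delta; else lng += delta;
    }
    coords.push([lat / 1e5, lng / 1e5]);
  }
  return coords;
}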
So my guess would be that the main reason is to be able to use smaller data files that are still portable/cacheable. It's still line-based clear text, so it's diffable as well.
Also, these files are still compressible. I assume that a full JSON file is not only larger to begin with but also exhibits less favorable compression characteristics than this file. A quick test on building-data.txt shows a compression ratio of roughly 2:1 for gzip/deflate (139,089 bytes down to 72,660 bytes compressed). The same data as raw JSON is unlikely to compress down anywhere near that far.
I have a pretty run-of-the-mill node.js server with expressjs installed locally for development purposes; I store various files and request them via a basic HTTP call that returns the file through express' res.download feature. Most of the time, this works without a hitch. For a very small subset of files, however, the end-user receives a file that is much larger than expected (almost 2x) and is unreadable by any conventional viewer. Out of maybe a hundred files, this has only happened twice, and both were JPG files, but the sample is too small to draw any conclusion. What I know:
The issue is reproducible: if it happens with a file, it always happens;
The issue is not related to the way files are stored: if I swap the problematic file with another one but keep everything else the same (name, location, etc.), it works fine;
Right before the res.download happens, the file is okay: checking its size with fs.stats returns the correct value
The HTTP response encounters no visible problem: no error, 200 response code...
The source file seems to have normal metadata and JPG markers
UPDATE I did some tests, and the issue seems to be somehow related to encoding: the mangled response file is, for reasons unknown, encoded in UTF-8; the size discrepancy comes from every byte that isn't valid UTF-8 being replaced by EF BF BD (the UTF-8 replacement character)! I still can't understand why it happens, what makes these few files different from others, and if it can be detected and/or corrected upstream.
UPDATE 2 After some additional tests, I still can't quite pinpoint the cause, but I can add the following info:
Systems-wise, the issue happens during the data streaming in fs.js
The root cause is located somewhere in the EXIF data of the image
For those interested, the source image (source.JPG) and download result (response.JPG) can be found here : http://www.sycomor.fr/test/ ; I also added a similar image that isn't affected by the download and comes out clean. For what it's worth, both pictures were taken minutes apart, with the same camera at the same settings, so I strongly doubt the issue is caused by some external source.
Thanks!
Your issue comes from 'connect-livereload' in your Express configuration.
It corrupts the binary stream while injecting the reload script.
Refer to https://github.com/intesso/connect-livereload/issues/39 for details.
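One way to work around it, sketched below under the assumption of a fairly standard Express setup (the route path is just an example), is to register the download route before the livereload middleware, so binary responses never pass through the script-injecting step:

const express = require('express');
const livereload = require('connect-livereload');
const app = express();

// Register binary downloads first: a route that sends its response here never
// reaches middleware registered later, so livereload cannot rewrite it.
app.get('/files/:name', (req, res) => {
  res.download(__dirname + '/files/' + req.params.name);
});

// Only after that, inject the livereload script into the HTML pages.
app.use(livereload());

// ... the rest of the app (views, static assets, etc.)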
I've been working on a project recently that needs to pass a binary stream from an NPAPI plugin to JavaScript. I've tried the following approaches:
Using NPN_InvokeDefault, I created a string variant that stores the binary stream and invoked it in JavaScript; it failed. (I've tried passing a binary stream read from a XXX.txt file, and that works!)
I tried NPN_NewStream; the example listed at http://www.terraluna.org/dgp/cvsweb/PluginSDK/Documentation/pi3.htm#npnnewstream works, but the picture is loaded in a new browser tab, and I don't know how to receive it in JavaScript.
Has anyone met a similar problem before? Or maybe NPAPI can't support this kind of data transfer?
Looking forward to your suggestions, thanks a lot.
Unfortunately, NPAPI was never designed with this purpose in mind. There are a couple of ways you can do it, and none of them are really ideal:
You can create a JavaScript array and pass the data in small 1-4 byte chunks (this is really very inefficient)
You could create a webserver embedded in the plugin and request the data from there (I have done this and it can work quite well, but keep in mind that if you use this from an SSL website you'll get security warnings when the embedded webserver isn't SSL)
You can base64 encode the binary data and send it as a string.
Those are the ways I have seen it done. The reason you can't send the actual binary data directly as a string is that NPAPI requires string data to be UTF-8, but if you base64 encode it then it works fine.
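For the base64 route, the JavaScript side is straightforward. Here is a sketch, assuming the plugin invokes a page callback with the encoded string (the callback name is made up):

// Called by the plugin (e.g. via NPN_InvokeDefault) with a base64 string.
function onPluginData(base64) {
  var binary = atob(base64);                 // base64 -> binary string
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);         // one byte per character
  }
  return bytes;                              // raw bytes, ready for further processing
}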
Sorry I can't give you a "happier" solution :-/
I've heard a lot about Base64 encoding for images in web design.
And I've seen a lot of developers use it for their headlines with: data:image/png;base64,iVBORw0...
Is there some automatic mechanism (with JavaScript) behind this?
Or have they all converted and inserted the images by hand? (I can't believe that.)
Example: http://obox-inkdrop.tumblr.com/ (see the headlines)
First of all, the encoding has to be done on the server side, be it:
automated with a script that reads the original image file and returns the base64-encoded string, to inject it into the HTML that's being generated,
or done by hand and placed directly into the HTML.
The base64 encoding cannot be done on the client-side, as the goal is to avoid sending the image file from the server to the browser (to minimize the number of HTTP requests).
Depending on the language that's used on the server side, you'll probably find some function to do base64 encoding.
In PHP, you might be interested in base64_encode().
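The same idea in Node.js, as a sketch (the file name and MIME type are just examples): read the image on the server and emit a data: URI that can be dropped into the generated HTML.

const fs = require('fs');

// Read the image file and build a data: URI for direct embedding in HTML.
function imageToDataUri(path, mimeType) {
  const base64 = fs.readFileSync(path).toString('base64');
  return 'data:' + mimeType + ';base64,' + base64;
}

// e.g. '<img src="' + imageToDataUri('headline.png', 'image/png') + '">'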