I want to parse binary_compressed point cloud file in javascript.
I found that in writeBinaryCompressed() the points are rearranged ('Convert the XYZRGBXYZRGB structure to XXYYZZRGBRGB to aid compression'), but I don't understand what that means. I also found that the parsing method is supposed to 'Unpack the xxyyzz to xyz'.
What does this mean? How do I parse the points after decompression?
In PCL, point clouds are represented as an array of structures (AoS), meaning that all fields of a point lie one after another in memory, followed by the fields of the next point, and so on. This is in contrast to the structure-of-arrays (SoA) layout, where first all x coordinates of the points are written, then all y coordinates, and so on. You can find more information and motivation for these layouts in a Wikipedia article.
That being said, I have an implementation of a PCD file loader for three.js that can handle the binary compressed format; you may find it here. Specifically, decompression and unpacking happen in lines 96-112.
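For illustration, here is a minimal sketch (not the loader code linked above) of what the unpacking step does. It assumes the LZF-decompressed payload is already available as an ArrayBuffer, that the cloud contains only float32 x, y and z fields, and that numPoints comes from the POINTS line of the PCD header; all of these are assumptions made for the sake of the example:

// decompressed: ArrayBuffer holding the decompressed data in SoA order (all x, then all y, then all z)
// numPoints: number of points, taken from the PCD header
function unpackXYZ(decompressed, numPoints) {
  var channels = new Float32Array(decompressed);
  var points = new Float32Array(numPoints * 3); // AoS order: x, y, z, x, y, z, ...
  for (var i = 0; i < numPoints; i++) {
    points[3 * i + 0] = channels[0 * numPoints + i]; // the x block comes first
    points[3 * i + 1] = channels[1 * numPoints + i]; // then the y block
    points[3 * i + 2] = channels[2 * numPoints + i]; // then the z block
  }
  return points;
}

With RGB or other fields present, the per-channel offsets have to be computed from the SIZE and COUNT entries of the header instead of assuming three equal float32 blocks.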
I'm building a visualization for some industrial devices that produce large amounts of time-based data like temperatures, currents or voltages. All data is constantly written to a SQL Server database (can't control that part).
The HTML5 frontend consists of an interactive zoomable chart I made with d3.js. Data series can be added (loaded) to the chart on demand, in which case the frontend sends an AJAX request, and ASP.NET MVC with EF6 fetches the values from the DB and returns them as JSON.
Each data element simply consists of a DateTime and a value. Please note that the values are not written at regular intervals (like every 2 seconds or so) but at irregular intervals, because the device isn't polled regularly; instead it sends data on specific events, such as a rise/drop in temperature by a given amount, e.g. 0.1 °C.
So far everything works really well and smoothly, but the large amount of data is becoming a problem. For example, when I want to show a line chart for a selected period of, let's say, 3 months, each data series already consists of approx. 500,000 values, so the JSON response from the server gets bigger and bigger and the request takes longer as the time period grows.
So I am looking for a way to reduce the amount of data without losing relevant information, such as peaks in temperature curves, while at the same time smoothing out the noise in the signal.
Here's an example, please keep in mind that this is just a chosen period of some hours or days, usually the user would like to see data for several months or even years as well:
The green lines are temperatures, the red bars are representations of digital states (in this case a heater that makes one of the temp curves go up).
You can clearly see the noise in the signals; this is what I want to get rid of. At the same time, I want to keep characteristic features like the ones after the heater turns on, when the temperature rises and falls strongly.
I already tried chopping the raw data into blocks of a given length and then aggregating the data in them, so that I have a min, max and average for each interval. This works, but in doing so the characteristic features of the curve get lost and everything gets kind of flattened or averaged out. Here's a picture of the same period as above, zoomed out a bit so that the aggregation kicks in:
The average of the upper series is shown as the green line, the extent (min/max) of each chop is represented by the green area around the average line.
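For reference, here is roughly what that block aggregation looks like, sketched in JavaScript for brevity (in the setup above it would of course live in the ASP.NET layer); the record shape and the fixed bucket length are just assumptions for the example:

// samples: array of { time: Date, value: Number }, assumed sorted by time
// bucketMs: length of one aggregation interval in milliseconds
function aggregate(samples, bucketMs) {
  var buckets = {};
  samples.forEach(function (s) {
    var key = Math.floor(s.time.getTime() / bucketMs);
    var b = buckets[key] || (buckets[key] = { min: Infinity, max: -Infinity, sum: 0, count: 0 });
    b.min = Math.min(b.min, s.value);
    b.max = Math.max(b.max, s.value);
    b.sum += s.value;
    b.count++;
  });
  return Object.keys(buckets).map(function (key) {
    var b = buckets[key];
    return { time: new Date(key * bucketMs), min: b.min, max: b.max, avg: b.sum / b.count };
  });
}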
Is there some kind of fancy algorithm that I can use to filter/smooth/reduce my data right when it comes out of the DB and before it gets sent to the frontend? What are the buzzwords here that I need to dig into? Any specific libraries, frameworks or techniques are highly appreciated, as well as general comments on this topic. I'm primarily interested in server-side solutions, but please feel free to mention client-side JavaScript solutions as well, as they might be of interest to other people facing the same problem.
"Is there some kind of fancy algorithm that I can use to filter/smoothen/reduce my data right when it comes out of the DB and before it gets send to the frontend? What are the buzzwords here that I need to dig after?"
I've asked a friend at the university where I work, and she says Fourier transforms can probably be used... but that looks like Dutch to me :)
Edit: looking at it a bit more myself, and because your data is time-sampled, I'm guessing you'll be interested in discrete-time Fourier transforms.
Further searching around this topic led me here, and that, to my (admittedly inexpert) eyes, looks like something useful...
Further Edit:
So, that link makes me think you should be able to remove (for example) every second sample on the server side. Then, on the client side, you can use the interpolation technique described in that link (using the inverse Fourier transform) to effectively "restore" the missing points: you've transferred half of the points, and yet the resulting graph will be exactly the same because you've interpolated the missing samples on the client... or is that way off base? :)
In a short amount of time, I ran twice into the same problem:
I have a list of coordinates (latitude, longitude in the case of geo-coordinates, or x, y, z in the case of a 3D OBJ file)
the coordinates are stored as numbers written out as ASCII decimals, e.g. 3.14159265
the coordinates have decimals
the coordinates are stored as text in a text file or database
the whole bunch gets too large
Now, we could simply ignore the problem and accept a slow response or a more jagged shape - but it nags. A decimal digit in ASCII uses 8 bits (where we only need 4 bits to represent the digits 0…9), and many coordinates share the same first couple of digits... It feels like these files could be compressed easily. Zipping obviously reduces the files a bit, although it varies. Base encoding also seems to help, but it turns out not to be as efficient as I hoped (about 30%).
Using PHP, what would be a pragmatic approach to compressing coordinates stored in text files?
(Pragmatic meaning: reasonably fast, preferably using vanilla PHP)
You can use a quadkey to presort the geo-coordinates, possibly combined with other presorting transforms such as move-to-front or Burrows-Wheeler. Quadkeys are most often used in mapping applications, especially for map tiles, but they have interesting properties: you convert each coordinate into binary, interleave and concatenate the bits, and treat the result as a base-4 number. There is free source code here: http://msdn.microsoft.com/en-us/library/bb259689.aspx. Then apply a statistical compressor such as Huffman coding. The same spatial-sorting approach is used in Delaunay triangulation.
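As a rough illustration of the quadkey idea (sketched in JavaScript, although the question is about PHP), here is the tile-to-quadkey step from the Bing Maps tile system linked above; converting latitude/longitude to tile coordinates first is omitted, and the zoom level is something you have to pick:

// Build a quadkey for tile (tileX, tileY) at a given level of detail.
// Each base-4 digit interleaves one bit of tileX with one bit of tileY.
function tileXYToQuadKey(tileX, tileY, levelOfDetail) {
  var quadKey = '';
  for (var i = levelOfDetail; i > 0; i--) {
    var digit = 0;
    var mask = 1 << (i - 1);
    if ((tileX & mask) !== 0) digit += 1;
    if ((tileY & mask) !== 0) digit += 2;
    quadKey += digit;
  }
  return quadKey;
}

Sorting the coordinates by their quadkeys places spatially close points next to each other, which is what makes a later statistical compression pass (Huffman, or plain gzip) more effective.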
I have raw data in a text file format with a lot of repetitive tokens (~25%). I would like to know if there's an algorithm that will help me:
(A) store the data in a compact form
(B) yet allow the original file to be reconstituted at run time.
Any ideas?
More details:
the raw data is consumed in a pure HTML+JavaScript app, for instant search using regexes.
the data is made of tokens containing (case-sensitive) alphabetic characters, plus a few punctuation symbols.
tokens are separated by spaces and newlines.
Most promising algorithm so far: the succinct data structures discussed in the links below, but reconstituting the original looks difficult.
http://stevehanov.ca/blog/index.php?id=120
http://ejohn.org/blog/dictionary-lookups-in-javascript/
http://ejohn.org/blog/revised-javascript-dictionary-search/
PS: server-side gzip is being employed right now, but it's only a transport-layer optimization and doesn't help maximize the use of offline storage, for example. Given the massive 25% repetitiveness, it should be possible to store the data in a more compact way, shouldn't it?
Given that the actual use is pretty unclear I have no idea whether this is helpful or not, but for the smallest total size (HTML + JavaScript + data) some people came up with the idea of storing text data in a greyscale .png file, one byte per pixel. A small loader script can then draw the .png to a canvas, read it pixel by pixel and reassemble the original data this way. This gives you DEFLATE compression without having to implement it in JavaScript. See e.g. here for more detailed information.
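For the curious, the read-back side of that trick looks roughly like this; 'data.png' and the one-byte-per-character assumption are both made up for the example:

var img = new Image();
img.onload = function () {
  var canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  var ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  var pixels = ctx.getImageData(0, 0, img.width, img.height).data; // RGBA, 4 bytes per pixel
  var bytes = [];
  for (var i = 0; i < pixels.length; i += 4) {
    bytes.push(pixels[i]); // greyscale image, so the red channel carries the stored byte
  }
  var text = String.fromCharCode.apply(null, bytes); // assumes one byte per character
  // ... hand 'text' over to the rest of the app
};
img.src = 'data.png';

Note that getImageData only works for same-origin (or CORS-enabled) images, and for very large payloads the byte-to-string conversion should be done in chunks.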
Please, do not use a technique like that unless you have pretty esoteric requirements, e.g. for a size-constrained programming competition. Your coworkers will thank you :-)
Generally speaking, it's a bad idea to try to implement compression in JavaScript. Compression is the exact type of work that JS is the worst at: CPU-intensive calculations.
Remember that JS is single-threaded1, so for the entire time spent decompressing data, you block the browser UI. In contrast, HTTP gzipped content is decompressed by the browser asynchronously.
Given that you have to reconstruct the entire dataset (so as to test every record against a regex), I doubt the Succinct Trie will work for you. To be honest, I doubt you'll get much better compression than the native gzipping.
1 - Web Workers notwithstanding.
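If UI blocking is the main worry, the footnote's point can be taken further by pushing the decompression into a Web Worker. A rough sketch, where the worker file name and the callback are hypothetical:

// main thread: hand the compressed payload to a worker so the UI stays responsive
var worker = new Worker('decompress-worker.js');
worker.onmessage = function (e) {
  useDecompressedData(e.data); // hypothetical callback in your app
};
worker.postMessage(compressedBuffer);

// decompress-worker.js:
// self.onmessage = function (e) {
//   var result = decompress(e.data); // whatever decompression routine you end up with
//   self.postMessage(result);
// };

That only moves the cost off the UI thread, though; the point about browsers decompressing gzipped HTTP responses for free still stands.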
I'm writing a standalone JavaScript application with Spine, Node.js, etc. (here is an earlier incarnation of it if you are interested). Basically, the application is an interactive 'number property' explorer: the idea is that you can select any number and see what properties it possesses. Is it a prime, or triangular, etc.? Where are other numbers that share the same properties? That kind of thing.
At the moment I can pretty easily show numbers 1-10k, but I would like to show properties for numbers up to 1 million, or even better 1 billion.
I want my client to download a set of static data files, and then use them to present the information to the user. I don't want to write a server backend.
Currently I'm using JSON for the data files. For some data, I know a simple algorithm to derive the information I'm looking for on the client side, and I use that (e.g., is it even?). For the harder numbers, I precompute them and then store the values in JSON-parseable data files. I've kinda gone a little overboard with the whole thing - I implemented a pure JavaScript Bloom filter, and when that didn't scale to 1 million for primes, I tried using CONCISE bitmaps underneath (which didn't help). Eventually I realized that it doesn't matter too much how 'compressed' I get my data if I'm representing it as JSON.
So the question is: I want to display 30 properties for each number, and I want to show a million numbers... that's like 30 million data points. I want the JavaScript app to download this data and present it to the user, but I don't want the user to have to download megabytes of information to use the app...
What options do I have for efficiently sending these large sets of data to my javascript only solution?
Can I convert to binary and then read binary on the client side? Examples, please!
How about just computing these data points on the client?
You'll save yourself a lot of headache. You can pre-compute the index chart and leave the rest of the data-points to be processed only when the user selects a particular number.
As for the properties exhibited per number: pure JavaScript on modern desktops is blindingly fast (if you stay away from the DOM). I think you'll find the processing-speed difference between the algorithmic and the pre-computed JSON solutions is negligible, and you'll be saving yourself a lot of pain and unnecessary bandwidth usage.
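To give an idea of the cost, here is a quick sketch of two of the properties mentioned in the question computed on the fly; both are easily fast enough for numbers up to a few million:

function isPrime(n) {
  if (n < 2) return false;
  for (var i = 2; i * i <= n; i++) {
    if (n % i === 0) return false;
  }
  return true;
}

function isTriangular(n) {
  // n is triangular exactly when 8n + 1 is a perfect square
  var root = Math.sqrt(8 * n + 1);
  return root === Math.floor(root);
}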
As for the initial index chart, this displays only the number of properties per number and can be transferred as an array:
'[18,12,9,11,9,7,8,2,6,1,4, ...]'
or in JSON:
{"i": [18,12,9,11,9,7,8,2,6,1,4, ...]}
Note that this works the same for a logarithmic scale, since either way you can only attach a value to one point on the screen at any one time. You just have to tailor the contents of the array accordingly (by returning logarithmically spaced values sequentially in a 1-2K-sized array).
You can even use DEFLATE to compress it further, but since you can only display a limited number of values on screen (< 1-2K pixels on a desktop), I would recommend you build your solution around this fact, for example by checking whether you can calculate 2K × 30 = 60K properties on the fly with minimal impact, which will probably be faster than asking the server for some JSON at that point.
UPDATE 10-Jan-2012
I just saw your comment about users being able to click on a particular property and get a list of numbers that display that property.
I think the initial transfer of the number of properties above can be jazzed up to include all properties in the initial payload, bearing in mind that you only want to transfer values for the numbers shown on the initial logarithmic scale (that means you can skip numbers that won't be represented on screen when a user first loads the page or clicks on a property). Anything beyond the initial payload can be calculated on the client.
{
"n": [18,12,9,11,9,7,8,2,6,1,4, ...] // number of properties x 1-2K
"p": [1,2,3,5,7,13,...] // prime numbers x 1-2K
"f": [1,2,6, ...] // factorials x 1-2K
}
My guess is that a JSON object like this will be around 30-60K, but you can further reduce this by removing properties whose algorithms are not recursive and letting the client calculate those locally.
If you want an alternative way to compress those arrays when you get to large numbers, you can format each array as a vector of differences (delta encoding) instead of a list of absolute values: keep the first number, then store the difference between each number and the previous one. This keeps the size down when you are dealing with large numbers (>1000). An example of the JSON above using vectors would be as follows (a small encode/decode sketch follows the example):
{
"n": [18,-6,-3,2,-2,-2,1,-6,4,-5,-1, ...] // vectorised no of properties x 1-2K
"p": [1,1,2,2,2,6,...] // vectorised prime numbers x 1-2K
"f": [1,1,4, ...] // vectorised factorials x 1-2K
}
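For completeness, a minimal sketch of encoding and decoding such a difference vector:

function deltaEncode(values) {
  return values.map(function (v, i) {
    return i === 0 ? v : v - values[i - 1]; // keep the first value, then store differences
  });
}

function deltaDecode(deltas) {
  var values = [];
  deltas.forEach(function (d, i) {
    values.push(i === 0 ? d : values[i - 1] + d);
  });
  return values;
}

// deltaEncode([1, 2, 3, 5, 7, 13]) -> [1, 1, 1, 2, 2, 6]
// deltaDecode([1, 1, 1, 2, 2, 6]) -> [1, 2, 3, 5, 7, 13]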
I would say the easiest way would be to break the dataset out into multiple data files. The client can then download the files as needed, based on which number(s) the user is looking at.
One advantage of this is that you can tune the size of the data files as you see fit, from one number per file up to all of the numbers in one file. The client only has to know how to pick the file its numbers are in. This does require there to be some server, but all it needs to do is serve out the static data files.
To reduce the data load, you can also cache the data files using local storage within the browser.
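A rough sketch of the client side of that idea, assuming the precomputed data is split into files of 10,000 numbers each, named like 'data/chunk-0.json', 'data/chunk-1.json', and so on (the chunk size, file names and record layout are all just assumptions for the example):

var CHUNK_SIZE = 10000;
var chunkCache = {}; // keep already-downloaded chunks in memory (or in local storage, as mentioned above)

function loadPropertiesFor(number, callback) {
  var chunkIndex = Math.floor(number / CHUNK_SIZE);
  if (chunkCache[chunkIndex]) {
    callback(chunkCache[chunkIndex][number % CHUNK_SIZE]);
    return;
  }
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'data/chunk-' + chunkIndex + '.json');
  xhr.onload = function () {
    var chunk = JSON.parse(xhr.responseText); // assumed to be an array of per-number records
    chunkCache[chunkIndex] = chunk;
    callback(chunk[number % CHUNK_SIZE]);
  };
  xhr.send();
}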
I'm sitting here with a huge GeoJSON file that I got from an OpenStreetMap shapefile. However, most of the polygons are unnecessary. These could, in theory, easily be singled out based on certain properties.
But how do I query the GeoJSON file to remove certain elements (features)? Or would it be easier to save the shapefile in another format (working in QGIS)?
Link to sample of json-file: http://dl.dropbox.com/u/15955488/hki_test_sample.json (240 kB)
When you say "query the geoJSON," are you talking about having the source where you get the geoJSON give you a subset of data? There is no widely-implemented standard for "querying" JSON like this, but each site you retrieve from may have its own parameters to reduce the size of data you get.
If you're talking about paring down the data in client-side code, simply looping through the structure and removing properties (with delete) and array items is what you'd have to do.
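A minimal sketch of that, assuming a standard FeatureCollection; the property tests are made up and would be replaced by whatever criteria single out the unnecessary polygons:

// drop whole features that fail some test
geojson.features = geojson.features.filter(function (feature) {
  return feature.properties && feature.properties.building === 'yes'; // hypothetical criterion
});

// drop individual attributes from the remaining features
geojson.features.forEach(function (feature) {
  delete feature.properties.unwantedField; // hypothetical field name
});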
Shapefile beats GeoJSON for large (not mega) data. It supports random access to features. To get at the GeoJSON features in a collection you have to read and deserialize the entire file.
Depending on how you want to edit it and what software is available, you have a few options. If you have access to Safe FME, that is by far the best geographic feature manipulation software and will give you tons of options (it can read/write, and convert between, just about any geographic format). If you're just looking for a text editor that can handle the volume of data, I would look at Notepad++ - it can hold a lot of text and you can do find/replace using regular expressions. Safe FME can be a little pricey, but you might be able to get a trial.
As Jacob says, just iterate and remove the elements you don't want. I like http://documentcloud.github.com/underscore/#reject for convenience.
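For example, assuming the GeoJSON is already parsed into an object and using a made-up property test:

// removes every feature for which the test returns true
geojson.features = _.reject(geojson.features, function (feature) {
  return feature.properties.landuse === 'farmland'; // hypothetical criterion
});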
If you are going to permanently remove fields, just convert it to a shapefile, remove the fields you don't want, and re-export it as GeoJSON.
I realize this question is old, but if anyone comes across this now, I'd recommend TopoJSON.
Convert it to TopoJSON.
By default TopoJSON removes all attributes, but you can flag those you'd like to keep like this:
topojson -o output.topojson -p fieldToKeep,anotherFieldToKeep input.geojson
More info is in the TopoJSON command-line reference.