Efficiently compress numbers in ASCII (using PHP or JS) - javascript

In a short amount of time, I ran twice into the same problem:
I have a list of coordinates ( latitude, longitude in the case of geo-coordinates — or x,y,z in the case of a 3D OBJ file)
the coordinates are stored as numbers written out as ASCII decimals, e.g. 3.14159265
the coordinates have decimals
the coordinates are stored as text in a text file or database
the whole bunch gets too large
Now, we could simply ignore the problem and accept a slow response or a more jagged shape, but it nags. A decimal digit in ASCII uses 8 bits (where we only need 4 to represent the digits 0…9) and many coordinates share the same first couple of digits... It feels like these files could be compressed easily. Zipping obviously reduces the files a bit, although the gain varies. Base-encoding also seems to help, but it turns out not to be as efficient as I hoped (about 30%).
Using PHP, what would be a pragmatic approach to compressing coordinates stored in text files?
( Pragmatic meaning: reasonably fast, preferably using vanilla PHP )
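To make the 4-bits-per-digit observation concrete, here is a minimal JavaScript sketch of the kind of nibble packing I have in mind (the 14-symbol alphabet and the lack of length framing are simplifications; a PHP port using chr()/ord() would look much the same):

// Pack coordinate text two symbols per byte (BCD-style); every double can be
// written with the 14 symbols below, so each symbol fits in 4 bits.
const SYMBOLS = "0123456789.-e,";

function packDigits(text) {
  const codes = [...text].map((c) => SYMBOLS.indexOf(c));
  const bytes = new Uint8Array(Math.ceil(codes.length / 2));
  for (let i = 0; i < codes.length; i++) {
    bytes[i >> 1] |= codes[i] << (i % 2 === 0 ? 4 : 0); // high nibble first
  }
  return bytes;
}

function unpackDigits(bytes, length) {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += SYMBOLS[(bytes[i >> 1] >> (i % 2 === 0 ? 4 : 0)) & 0xf];
  }
  return out;
}

// "3.14159265,52.37403" round-trips at half the size of the ASCII original.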

You can use a quadkey to presort the geo-coordinates, combined with other presort algorithms such as move-to-front and Burrows-Wheeler. A quadkey is often used in mapping applications, especially for map tiles, but it has interesting features. Just convert the geo-coordinate into binary and interleave it, then treat it as a base-4 number. There is free source code here: http://msdn.microsoft.com/en-us/library/bb259689.aspx. Then use a statistical compressor like Huffman coding. The same approach is used in Delaunay triangulation.
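For reference, a rough JavaScript sketch of the quadkey construction (the linked MSDN page has the canonical C# version; the code below is my own paraphrase and assumes latitudes within the Web Mercator range of roughly ±85.05°):

// Convert lat/lon to a quadkey: project to tile coordinates at a given
// zoom level, then interleave the bits of tileX and tileY as base-4 digits.
function latLonToQuadkey(lat, lon, level) {
  const sinLat = Math.sin((lat * Math.PI) / 180);
  const x = (lon + 180) / 360;
  const y = 0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI);
  const tileX = Math.floor(x * (1 << level));
  const tileY = Math.floor(y * (1 << level));
  let quadkey = "";
  for (let i = level; i > 0; i--) {
    const mask = 1 << (i - 1);
    quadkey += (tileX & mask ? 1 : 0) + (tileY & mask ? 2 : 0);
  }
  return quadkey;
}

Nearby coordinates share long quadkey prefixes, so sorting on the quadkey groups similar digit runs together before the statistical compressor runs.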

Related

How to parse binary_compressed point cloud?

I want to parse a binary_compressed point cloud file in JavaScript.
I found that in writeBinaryCompressed() the points are 'Convert the XYZRGBXYZRGB structure to XXYYZZRGBRGB to aid compression', but I don't understand what that means. I also found that in the parsing method the points are
'Unpack the xxyyzz to xyz'.
What does this mean? How do I parse the points after decompression?
In PCL, point clouds are represented as an array of structures (AoS), meaning that all fields of a point go one after another in the memory, followed by the fields of the next point, and so on. This is in contrast to the structure of arrays (SoA) layout, where first all x coordinates of each point are written, then all y coordinates, and so on. You can find more information and motivation for these layouts in a Wikipedia article.
That being said, I have an implementation of a PCD file loader for three.js that can handle binary compressed format, you may find it here. Specifically, decompression and unpacking happens in lines 96-112.
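For illustration, a minimal JavaScript sketch of that unpacking step, assuming a plain XYZ cloud of 32-bit floats (clouds with more fields follow the same pattern, one contiguous run per field):

// After decompression the buffer is field-by-field (SoA): [x0..xN, y0..yN, z0..zN].
// three.js wants point-by-point (AoS): [x0,y0,z0, x1,y1,z1, ...].
function unpackSoAToAoS(buffer, numPoints) {
  const soa = new Float32Array(buffer);
  const aos = new Float32Array(numPoints * 3);
  for (let i = 0; i < numPoints; i++) {
    aos[3 * i] = soa[i];                     // all x values come first
    aos[3 * i + 1] = soa[numPoints + i];     // then all y values
    aos[3 * i + 2] = soa[2 * numPoints + i]; // then all z values
  }
  return aos;
}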

How can I predict Math.random results?

How can I predict the results from a roulette gaming website csgopolygon.com, given that it is calling Math.random and Math.floor?
Your hunch that it is, in theory, possible to predict the results of Math.random is correct. This is why, if you ever want to build a gaming/gambling application, you should make sure to use a cryptographically secure pseudo-random number generator. If they are using one, then forget about it.
If however you are correct and they are using System.time as the seed to the standard Random generator that comes with Java, there might be a way. It would involve generating millions of number sequences with millions of numbers in each sequence, based on seeds corresponding to (future) timestamps, then observing the actual random numbers generated by the website and trying to find the specific sequence among the millions you generated beforehand. If you have a match, you have found the seed. And if you have the seed and know where in the sequence they are, you could then theoretically predict the next numbers.
Problems with this approach:
You need to know the exact algorithm they are using, so you can make sure you are using the same
It would take huge amounts of processing power to generate all the sequences
It would take huge amounts of storage to store them
It would take huge amounts of processing power to search the observed sequence among the stored sequences
You don't have the full picture. Even if you found the right seed and position in that seed's sequence, you still need to predict the next number that you will get, but as it's a multiplayer site (I assume), they might be giving that number to another player.
Other answers say that predicting the results of Math.random is impossible. This is incorrect: Math.random is actually very predictable once you know the seed and the iteration (how many numbers have been generated since the seed was set). I actually once built a game that generated random levels. I was messing with the seed and found that if I always set the same seed, I would always get the same levels. Which was cool, because my game had infinite levels, but level 27 (for example) always looked exactly the same.
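To illustrate that determinism, here is a tiny seeded generator (mulberry32; it is not the engine behind Math.random, which exposes no seed at all, but the principle is the same): the same seed always reproduces the same sequence.

// mulberry32: a minimal seeded PRNG, shown only to demonstrate that a
// known seed makes the entire sequence reproducible.
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const r1 = mulberry32(27);
const r2 = mulberry32(27);
console.log(r1(), r1(), r1()); // same three numbers...
console.log(r2(), r2(), r2()); // ...every single time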
However,
They are not using Java. Check out the 'Provably Fair' link at the top. They discuss how you can verify past rolls yourself by executing PHP code.
These guys are smart. They publish the old seeds once they retire them. This allows you (using the predictable behavior of pseudo-random number generators) to re-generate all rolls and verify that they never tampered with them.
Basically, you want to find the seed that is currently in use... However, point 5 I mentioned above still holds: you don't have the full picture, so how would you predict what roll you would be getting? Apart from that, finding the seed will prove near impossible. These guys know cryptography, so you can bet they are using a secure random number generator.
You can't, and you probably shouldn't develop a gambling addiction as a 16-year-old. That said, even if you could, that site isn't using JavaScript to generate a number, but a server-side language like PHP or ASP.NET.

Is it possible to sum a huge amount of numbers in parallel using the GPU in a browser?

Suppose I have the task of summing 1048576 numbers. Suppose I can also ignore the time involved in sending those numbers to the GPU (they don't need to be sent; they can be derived from a simple mathematical formula). Is it possible to sum all those numbers in parallel there?
My attempt: I was going to go with a standard parallel reduction, making the texture 1/4 of its size on each pass, so I'd need log(N) passes. The problem I am having is that a texture holds Vec4<byte> values, but I am interested in float values! There is an extension to write Vec4<float> values, but it doesn't allow reading them back, and they are still Vec4s!
Does anyone have a solution for that problem? Considering the tricky nature of WebGL, a minimal code demo would be very helpful.
You may use Web Workers to run your calculation in other threads in parallel.
Please see this for more details:
https://developer.mozilla.org/en-US/docs/Web/Guide/Performance/Using_web_workers
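A minimal sketch of that approach; note this is CPU parallelism across threads, not the GPU, and Math.sin(i) below is just a stand-in for "derived from a simple mathematical formula":

// Split the sum across a few Web Workers created from an inline Blob.
const workerSrc = `
  onmessage = (e) => {
    const { start, end } = e.data;
    let sum = 0;
    for (let i = start; i < end; i++) sum += Math.sin(i); // stand-in formula
    postMessage(sum);
  };
`;
const url = URL.createObjectURL(new Blob([workerSrc], { type: "text/javascript" }));

function parallelSum(n, numWorkers = 4) {
  const chunk = Math.ceil(n / numWorkers);
  const jobs = [];
  for (let w = 0; w < numWorkers; w++) {
    jobs.push(new Promise((resolve) => {
      const worker = new Worker(url);
      worker.onmessage = (e) => { resolve(e.data); worker.terminate(); };
      worker.postMessage({ start: w * chunk, end: Math.min((w + 1) * chunk, n) });
    }));
  }
  return Promise.all(jobs).then((parts) => parts.reduce((a, b) => a + b, 0));
}

parallelSum(1048576).then((total) => console.log(total));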
Are you looking at using WebGL because you need a mobile solution? There are many limitations to doing compute with WebGL. Have you looked at WebCL instead?
If you still want to pursue WebGL, you can take a look at float-encoding threads like the one below:
How do I convert a vec4 rgba value to a float?
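On the JavaScript side, if the fragment shader packs each float's raw IEEE-754 bytes into an RGBA8 pixel (which is what that thread covers), the readback can be reinterpreted with typed arrays. A sketch, assuming gl is your WebGL context, the reduction result sits in the bound framebuffer, and the shader's byte order matches the platform's:

// Read the RGBA8 pixels back and reinterpret each 4-byte pixel as one float32.
const w = 512, h = 512; // assumed size of the reduction output
const bytes = new Uint8Array(w * h * 4);
gl.readPixels(0, 0, w, h, gl.RGBA, gl.UNSIGNED_BYTE, bytes);
const floats = new Float32Array(bytes.buffer); // one float per pixel
console.log(floats[0]);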

How to store large sequences of doubles as literal string in javascript file

What is the best way to store ca. 100 sequences of doubles directly in the JS file? Each sequence will have a length of ca. 10,000 doubles or more.
Requirements
the javascript file must be executed as fast as possible
it is enough for me to iterate through a sequence on demand (I do not need to decode all the numbers when the JS executes; they will be decoded on an event)
it shouldn't take too much space
The simplest option is probably to use a string in CSV format, but then the doubles are not stored in the most efficient manner, right?
Another option might be to store the numbers as a Base64-encoded byte array, but then I have no idea how to read the base64 string into doubles.
EDIT:
I would like to use the doubles to transform Matrix4x4 of 3d nodes in Adobe 3D annotations. Adobe allows importing external files, but it is so complicated that it might be simpler to include all the data in the JS file directly.
As I mentioned in my comment, it is probably not worth it to try and encode the values. Here are some rough figures for the amount of data required to store doubles (updated from my comment).
Assuming 1,000,000 values:
Using direct encoding (won't work well in a JS file): 8 B = 8 MB
Using base64: 10.7 B = 10.7 MB
Literals (best case): 1 B + delimiter = 2 MB
Literals (worst case): 21 B + delimiter = 22 MB
Literals (average case assuming evenly distributed values): 19 B + delimiter = 20 MB
Note: A double can take 21 bytes (assuming 15 digits of precision) in the worst case like this: 1.23456789101112e-131
As you can see, even with encoding you won't be able to cut it below half the size of plain literal values, and if you plan on doing random-access decoding it will get complicated fast. It may be best to stick to literals. You might get some benefit from using the external file that you mentioned, but that depends on how much effort is needed to do so.
Some ideas on how to optimize using literals:
Depending on the precision required, you could approximate the values and limit a value to, say, 5 digits of precision. This would shorten the file considerably.
You could compress the file. I think any double can be written using an alphabet of 14 characters (0123456789.e-,), so theoretically you could compress such a string to half its size. I don't know how good practical modern compression routines are, though.
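For completeness, if you do try the base64 route from the question anyway, here is a minimal round-trip sketch, assuming the bytes are raw IEEE-754 doubles in platform byte order:

// Encode: expose the doubles' raw bytes and base64 them.
// (String.fromCharCode.apply should be chunked for very large arrays.)
function doublesToBase64(values) {
  const bytes = new Uint8Array(new Float64Array(values).buffer);
  return btoa(String.fromCharCode.apply(null, bytes));
}

// Decode: base64 -> bytes -> reinterpret as 64-bit floats.
function base64ToDoubles(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return new Float64Array(bytes.buffer);
}

console.log(base64ToDoubles(doublesToBase64([3.14159265, 2.71828])));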

Best way to encode/decode large quantities of data for a JavaScript client?

I'm writing a standalone javascript application with Spine, Node.js, etc. (Here is an earlier incarnation of it if you are interested). Basically, the application is an interactive 'number property' explorer. The idea being that you can select any number, and see what properties it possesses. Is it a prime, or triangular, etc? Where are other numbers that share the same properties? That kind of thing.
At the moment I can pretty easily show like numbers 1-10k, but I would like to show properties for numbers 1-million, or even better 1-billion.
I want my client to download a set of static data files, and then use them to present the information to the user. I don't want to write a server backend.
Currently I'm using JSON for the data files. For some data, I know a simple algorithm to derive the information I'm looking for on the client side, and I use that (e.g., is it even?). For the harder numbers, I precompute them and store the values in JSON-parseable data files. I've kinda gone a little overboard with the whole thing: I implemented a pure JavaScript Bloom filter, and when that didn't scale to 1 million for primes, I tried using CONCISE bitmaps underneath (which didn't help). Eventually I realized that it doesn't matter too much how 'compressed' I get my data if I'm representing it as JSON.
So the question is: I want to display 30 properties for each number, and I want to show a million numbers... that's like 30 million data points. I want the JavaScript app to download this data and present it to the user, but I don't want the user to have to download megabytes of information to use the app...
What options do I have for efficiently sending these large sets of data to my javascript only solution?
Can I convert to binary and then read binary on the client side? Examples, please!
How about just computing these data points on the client?
You'll save yourself a lot of headache. You can pre-compute the index chart and leave the rest of the data-points to be processed only when the user selects a particular number.
As for the properties exhibited per number: pure JavaScript on modern desktops is blindingly fast (if you stay away from the DOM). I think you'll find the processing-speed difference between the algorithmic and the pre-computed JSON solution is negligible, and you'll be saving yourself a lot of pain and unnecessary bandwidth usage.
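For example, many number properties can be tested directly on the client; a sketch with a few illustrative predicates (the actual property set is yours):

// Compute properties on demand instead of shipping them precomputed.
function isPrime(n) {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

function propertiesOf(n) {
  return {
    even: n % 2 === 0,
    prime: isPrime(n),
    triangular: Number.isInteger((Math.sqrt(8 * n + 1) - 1) / 2),
    square: Number.isInteger(Math.sqrt(n)),
  };
}

console.log(propertiesOf(10)); // { even: true, prime: false, triangular: true, square: false }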
As for the initial index chart, this displays only the number of properties per number and can be transferred as an array:
'[18,12,9,11,9,7,8,2,6,1,4, ...]'
or in JSON:
{"i": [18,12,9,11,9,7,8,2,6,1,4, ...]}
Note that this works the same for a logarithmic scale, since either way you can only attach a value to one point on the screen at any one time. You just have to tailor the contents of the array accordingly (by returning logarithmic values sequentially on a 1-2K sized array).
You can even use a DEFLATE algorithm to compress it further, but since you can only display a limited amount of numbers on screen (<1-2K pixels on desktop), I would recommend you build your solution around this fact, for example by checking if you can calculate 2K × 30 = 60K properties on the fly with minimal impact, which will probably be faster than asking the server for JSON at that point.
UPDATE 10-Jan-2012
I just saw your comment about users being able to click on a particular property and get a list of numbers that display that property.
I think the initial transfer of number of properties above can be jazzed up to include all properties in the initial payload, bearing in mind that you only want to transfer the values for numbers displayed in the initial logarithmic scale you wish to display (that means you can skip numbers if they are not going to be represented on screen when a user first loads the page or clicks on a property). Anything beyond the initial payload can be calculated on the client.
{
  "n": [18,12,9,11,9,7,8,2,6,1,4, ...], // number of properties x 1-2K
  "p": [1,2,3,5,7,13, ...],             // prime numbers x 1-2K
  "f": [1,2,6, ...]                     // factorials x 1-2K
}
My guess is that a JSON object like this will be around 30-60K, but you can further reduce this by removing properties whose algorithms are not recursive and letting the client calculate those locally.
If you want an alternative way to compress those arrays when you get to large numbers, you can format your array as a VECTOR instead of a list of numbers, storing the differences between one number and the next; this will keep the size down when you are dealing with large numbers (>1000). An example of the JSON above using vectors would be as follows:
{
  "n": [18,-6,-3,2,-2,-2,1,-6,4,-5,3, ...], // vectorised no. of properties x 1-2K
  "p": [1,1,1,2,2,6, ...],                  // vectorised prime numbers x 1-2K
  "f": [1,1,4, ...]                         // vectorised factorials x 1-2K
}
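A sketch of the encode/decode pair for such vectors (plain delta coding):

// Store the first value, then differences; decode with a running sum.
function deltaEncode(values) {
  return values.map((v, i) => (i === 0 ? v : v - values[i - 1]));
}

function deltaDecode(deltas) {
  let acc = 0;
  return deltas.map((d) => (acc += d));
}

console.log(deltaEncode([1, 2, 3, 5, 7, 13])); // [1, 1, 1, 2, 2, 6]
console.log(deltaDecode([1, 1, 1, 2, 2, 6]));  // [1, 2, 3, 5, 7, 13]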
I would say the easiest way would be to break the dataset out into multiple data files. The "client" can then download the files as-needed based on what number(s) the user is looking for.
One advantage of this is that you can tune the size of the data files as you see fit, from one number per file up to all of the numbers in one file. The client only has to know how to pick the file its numbers are in. This does require there to be some server, but all it needs to do is serve static data files.
To reduce the data load, you can also cache the data files using local storage within the browser.
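A sketch of that caching layer, with an assumed file-naming scheme and the usual ~5 MB localStorage quota in mind:

// Fetch a chunk once, then serve it from localStorage on later visits.
function loadChunk(chunkId) {
  const key = "numberdata-" + chunkId;
  const cached = localStorage.getItem(key);
  if (cached !== null) return Promise.resolve(JSON.parse(cached));
  return fetch("data/chunk-" + chunkId + ".json")
    .then((res) => res.json())
    .then((data) => {
      localStorage.setItem(key, JSON.stringify(data));
      return data;
    });
}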
