I need to send a blob to the server with ajax, but it can end up getting somewhat large, and I'd like to decrease upload time. I've tried jszip already, but that just gave me an empty file inside the zip. I've also tried btoa(), but it turns out that the encoded value just ends up being [object Blob] instead of the actual blob data. What are my options for compressing blobs?
Here was the code I used for jszip:
var zip = new JSZip();
zip.file("recording.wav", blobFile);
var content = zip.generate();
I then appended "content" to a FormData object and sent it to the server. On the server side, I decoded the POST data (from base64). The zip file opened just fine, but recording.wav was a 0 length file.
Additionally, I've tried using the LZW implementation found here. This was the additional code I used to compress it:
var compressed;
var reader = new FileReader();
reader.onload = function(event){
compressed = LZW.compress(event.target.result);
};
reader.readAsText(blobFile);
However, decompressing it returns null.
Caveat: compressing something like an audio file would be better done with an algorithm designed for that kind of data, probably a lossy one. However, knowing how hard it was to find a reasonable lossless implementation (provided below), I suspect it will be just as hard to find a good JavaScript implementation specific to that data type that meets your needs.
In any case, I've had this general need for compression/decompression in JavaScript as well, and I needed the same algorithm to work both client-side (browser) and server-side (Node.js), and to handle very large files. I checked out jszip and I also tried that LZW algorithm, among at least five or six others, none of which satisfied the requirements. I can't remember the specific issue with each, but suffice to say it is surprisingly hard to find a good and FAST compressor/decompressor in JavaScript that works both server- and client-side and handles large files! I tried at least a dozen different implementations of various compression algorithms, and finally settled on this one - it hasn't failed me yet!
UPDATE
This is the original source:
https://code.google.com/p/jslzjb/source/browse/trunk/Iuppiter.js?r=2
By someone named Bear - thanks Bear, whoever you are, you're the best.
It is LZJB: http://en.wikipedia.org/wiki/LZJB
UPDATE 2
Corrected a problem with a missing semicolon - it should no longer throw the "object is not a function" error.
This implementation stops working on data shorter than about 80 characters, so I updated the example to reflect that.
I also realized the base64 encode/decode methods are in fact exposed on the object passed in for this version, so they're there if you need them.
Currently seeing what we can do about specific blob types - for example, what the best approach would be for an image versus audio, etc. - as that would be useful for JS folks in general... will update here with what is found.
UPDATE 3
There is a much better wrapper around the original Iuppiter source from Bear than the one I posted below. It is written by cscott and on github here: https://github.com/cscott/lzjb
I'll be switching to this one, as it does streams as well.
Below is an example in Node.js of its use with a wav file. But before copying the example, let me give you the terrible news first, at least for this one wav file that I tried:
63128 Jun 19 14:09 beep-1.wav
63128 Jun 19 17:47 beep-2.wav
89997 Jun 19 17:47 beep-2.wav.compressed
So it successfully regenerated the wav (and it played). However, the compressed file turns out to be larger than the original. Well, shoot. In any case, it might be worth trying on your data - you never know, you might get lucky. Here's the code I used:
var fs = require('fs');
var lzjb = require('lzjb');
fs.readFile('beep-1.wav', function(err, wav){
// base 64 first
var encoded = wav.toString('base64');
// then utf8 - you don't want to go utf-8 directly
var data = new Buffer(encoded, 'utf8');
// now compress
var compressed = lzjb.compressFile(data, null, 9);
// the next two lines are unnecessary, but to see what kind of
// size is written to disk to compare with the original binary file
var compressedBuffer = new Buffer(compressed, 'binary');
fs.writeFile('beep-2.wav.compressed', compressedBuffer, 'binary', function(err) {});
// decompress
var uncompressed = lzjb.decompressFile(compressed);
// decode from utf8 back to base64
var encoded2 = new Buffer(uncompressed).toString('utf8');
// decode back to binary original from base64
var decoded = new Buffer(encoded2, 'base64');
// write it out, make sure it is identical
fs.writeFile('beep-2.wav', decoded, function(err) {});
});
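For what it's worth, it may also be worth trying compressFile on the raw buffer itself, skipping the base64 step entirely; this is an untested variation of my own on the snippet above, not something I benchmarked:
var fs = require('fs');
var lzjb = require('lzjb');
fs.readFile('beep-1.wav', function(err, wav){
    // compress the raw bytes directly - no base64, no utf8 wrapping
    var compressed = lzjb.compressFile(wav, null, 9);
    var uncompressed = lzjb.decompressFile(compressed);
    // write the round-tripped bytes back out to verify they are identical
    fs.writeFile('beep-3.wav', new Buffer(uncompressed), function(err) {});
});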
At the end of the day, I think it's going to be too difficult to achieve any meaningful compression on most forms of binary data that isn't clobbered by the resulting base64 encoding, which emits four characters for every three input bytes (roughly 33% overhead). The days of control characters for terminals still haunt us to this day. You could try moving up to a different base, but that has its own risks and issues.
See this for example:
What is the most efficient binary to text encoding?
And this:
Why don't people use base128?
One thing though: before you accept the answer, please do try it out on your blob. I've mainly used it for compressing UTF-8, and I'd like to be sure it works on your specific data.
In any case, here it is!
/**
$Id: Iuppiter.js 3026 2010-06-23 10:03:13Z Bear $
Copyright (c) 2010 Nuwa Information Co., Ltd, and individual contributors.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of Nuwa Information nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
$Author: Bear $
$Date: 2010-06-23 18:03:13 +0800 (Wednesday, 23 June 2010) $
$Revision: 3026 $
*/
var fastcompressor = {};
(function (k) {
k.toByteArray = function (c) {
var h = [],
b, a;
for (b = 0; b < c.length; b++) a = c.charCodeAt(b), 127 >= a ? h.push(a) : (2047 >= a ? h.push(a >> 6 | 192) : (65535 >= a ? h.push(a >> 12 | 224) : (h.push(a >> 18 | 240), h.push(a >> 12 & 63 | 128)), h.push(a >> 6 & 63 | 128)), h.push(a & 63 | 128));
return h
};
k.Base64 = {
CA: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
CAS: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_",
IA: Array(256),
IAS: Array(256),
init: function () {
var c;
for (c = 0; 256 > c; c++) k.Base64.IA[c] = -1, k.Base64.IAS[c] = -1;
c = 0;
for (iS = k.Base64.CA.length; c < iS; c++) k.Base64.IA[k.Base64.CA.charCodeAt(c)] = c, k.Base64.IAS[k.Base64.CAS.charCodeAt(c)] = c;
k.Base64.IA["="] = k.Base64.IAS["="] = 0
},
encode: function (c, h) {
var b, a, d, e, m, g, f, l, j;
b = h ? k.Base64.CAS : k.Base64.CA;
d = c.constructor == Array ? c : k.toByteArray(c);
e = d.length;
m = 3 * (e / 3);
g = (e - 1) / 3 + 1 << 2;
a = Array(g);
for (l = f = 0; f < m;) j = (d[f++] & 255) << 16 | (d[f++] & 255) << 8 | d[f++] & 255, a[l++] = b.charAt(j >> 18 & 63), a[l++] = b.charAt(j >> 12 & 63), a[l++] = b.charAt(j >> 6 & 63), a[l++] = b.charAt(j & 63);
f = e - m;
0 < f && (j = (d[m] &
255) << 10 | (2 == f ? (d[e - 1] & 255) << 2 : 0), a[g - 4] = b.charAt(j >> 12), a[g - 3] = b.charAt(j >> 6 & 63), a[g - 2] = 2 == f ? b.charAt(j & 63) : "=", a[g - 1] = "=");
return a.join("")
},
decode: function (c, h) {
var b, a, d, e, m, g, f, l, j, p, q, n;
b = h ? k.Base64.IAS : k.Base64.IA;
c.constructor == Array ? (d = c, m = !0) : (d = k.toByteArray(c), m = !1);
e = d.length;
g = 0;
for (f = e - 1; g < f && 0 > b[d[g]];) g++;
for (; 0 < f && 0 > b[d[f]];) f--;
l = "=" == d[f] ? "=" == d[f - 1] ? 2 : 1 : 0;
a = f - g + 1;
j = 76 < e ? ("\r" == d[76] ? a / 78 : 0) << 1 : 0;
e = (6 * (a - j) >> 3) - l;
a = Array(e);
q = p = 0;
for (eLen = 3 * (e / 3); p < eLen;) n = b[d[g++]] << 18 | b[d[g++]] <<
12 | b[d[g++]] << 6 | b[d[g++]], a[p++] = n >> 16 & 255, a[p++] = n >> 8 & 255, a[p++] = n & 255, 0 < j && 19 == ++q && (g += 2, q = 0);
if (p < e) {
for (j = n = 0; g <= f - l; j++) n |= b[d[g++]] << 18 - 6 * j;
for (b = 16; p < e; b -= 8) a[p++] = n >> b & 255
}
if (m) return a;
for (n = 0; n < a.length; n++) a[n] = String.fromCharCode(a[n]);
return a.join("")
}
};
k.Base64.init();
NBBY = 8;
MATCH_BITS = 6;
MATCH_MIN = 3;
MATCH_MAX = (1 << MATCH_BITS) + (MATCH_MIN - 1);
OFFSET_MASK = (1 << 16 - MATCH_BITS) - 1;
LEMPEL_SIZE = 256;
k.compress = function (c) {
var h = [],
b, a = 0,
d = 0,
e, m, g = 1 << NBBY - 1,
f, l, j = Array(LEMPEL_SIZE);
for (b = 0; b < LEMPEL_SIZE; b++) j[b] =
3435973836;
c = c.constructor == Array ? c : k.toByteArray(c);
for (b = c.length; a < b;) {
if ((g <<= 1) == 1 << NBBY) {
if (d >= b - 1 - 2 * NBBY) {
f = b;
for (d = a = 0; f; f--) h[d++] = c[a++];
break
}
g = 1;
m = d;
h[d++] = 0
}
if (a > b - MATCH_MAX) h[d++] = c[a++];
else if (e = (c[a] + 13 ^ c[a + 1] - 13 ^ c[a + 2]) & LEMPEL_SIZE - 1, l = a - j[e] & OFFSET_MASK, j[e] = a, e = a - l, 0 <= e && e != a && c[a] == c[e] && c[a + 1] == c[e + 1] && c[a + 2] == c[e + 2]) {
h[m] |= g;
for (f = MATCH_MIN; f < MATCH_MAX && c[a + f] == c[e + f]; f++);
h[d++] = f - MATCH_MIN << NBBY - MATCH_BITS | l >> NBBY;
h[d++] = l;
a += f
} else h[d++] = c[a++]
}
return h
};
k.decompress = function (c,
h) {
var b, a = [],
d, e = 0,
m = 0,
g, f, l = 1 << NBBY - 1,
j;
b = c.constructor == Array ? c : k.toByteArray(c);
for (d = b.length; e < d;) {
if ((l <<= 1) == 1 << NBBY) l = 1, f = b[e++];
if (f & l)
if (j = (b[e] >> NBBY - MATCH_BITS) + MATCH_MIN, g = (b[e] << NBBY | b[e + 1]) & OFFSET_MASK, e += 2, 0 <= (g = m - g))
for (; 0 <= --j;) a[m++] = a[g++];
else break;
else a[m++] = b[e++]
}
if (!("undefined" == typeof h ? 0 : h)) {
for (b = 0; b < m; b++) a[b] = String.fromCharCode(a[b]);
a = a.join("")
}
return a
}
})(fastcompressor);
And if memory serves... here's how you use it:
var compressed = fastcompressor.compress("0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"); // data less than this length poses issues.
var decompressed = fastcompressor.decompress(compressed);
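And since the original question was about a Blob rather than a string, here is a rough sketch of my own (untested) for feeding a blob's raw bytes through it with a FileReader:
var reader = new FileReader();
reader.onload = function (e) {
    // compress() accepts either a string or a plain Array of byte values
    var bytes = Array.prototype.slice.call(new Uint8Array(e.target.result));
    var compressed = fastcompressor.compress(bytes);
    // second argument true => return the decompressed data as an array rather than a string
    var restored = fastcompressor.decompress(compressed, true);
    console.log(bytes.length, compressed.length, restored.length);
};
reader.readAsArrayBuffer(blobFile);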
Rgds....Hoonto/Matt
Also, what I've posted is minified but beautified, and very slightly adapted for ease-of-use. Check the link in the update above for the original stuff.
JSZip will work fine - just correct your syntax:
function create_zip() {
var zip = new JSZip();
zip.add("recording.wav", blobfile);//here you have to give blobFile in the form of raw bits >> convert it in json notation.. or stream ..
zip.add("hello2.txt", "Hello Second World\n");//this is optional..
content = zip.generate();
location.href="data:application/zip;base64," + content;
}
You can add multiple files too. Just zip.file becomes zip.add, and then zip.generate() will do the rest, as you have done.
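For instance, reading the Blob into an ArrayBuffer first and handing JSZip the raw bytes might look like the sketch below (my own, untested guess; newer JSZip versions accept an ArrayBuffer or Uint8Array directly, older ones want a binary string plus {binary: true}):
function create_zip_from_blob(blobFile, callback) {
    var reader = new FileReader();
    reader.onload = function (e) {
        var zip = new JSZip();
        // hand JSZip the actual bytes instead of the Blob object itself
        zip.file("recording.wav", e.target.result);
        callback(zip.generate()); // base64 string, ready to append to your FormData
    };
    reader.readAsArrayBuffer(blobFile);
}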
Or refer to this older post - its last part covers the JavaScript side, and NativeBridge may be helpful if you can make use of it. In that post the user records audio with Objective-C (which you can ignore) but sends the object using JavaScript and a socket, which you may be able to reuse.
I hope this will do ... :)
I built an application to suggest email addresses fixes, and I need to detect email addresses that are basically not real existing email addresses, like the following:
14370afcdc17429f9e418d5ffbd0334a@magic.com
ce06e817-2149-6cfd-dd24-51b31e93ea1a@stackoverflow.org.il
87c0d782-e09f-056f-f544-c6ec9d17943c@microsoft.org.il
root@ns3160176.ip-151-106-35.eu
ds4-f1g-54-h5-dfg-yk-4gd-htr5-fdg5h@outlook.com
h-rt-dfg4-sv6-fg32-dsv5-vfd5-ds312@gmail.com
test@454-fs-ns-dff4-xhh-43d-frfs.com
I could do multiple regex checks, but I don't think I would catch a good percentage of the suspected 'not-real' email addresses that way, since each regex only targets one specific pattern.
I looked in:
Javascript script to find gibberish words in form inputs
Translate this JavaScript Gibberish please?
Detect keyboard mashed email addresses
Finally I looked over this:
Unable to detect gibberish names using Python
And it seems to fit my needs, I think: a script that gives me a score for how likely each part of the email address is to be gibberish (i.e. not a real address).
So what I want is the output to be:
const strings = ["14370afcdc17429f9e418d5ffbd0334a", "gmail", "ce06e817-2149-6cfd-dd24-51b31e93ea1a",
"87c0d782-e09f-056f-f544-c6ec9d17943c", "space-max", "ns3160176.ip-151-106-35",
"ds4-f1g-54-h5-dfg-yk-4gd-htr5-fdg5h", "outlook", "h-rt-dfg4-sv6-fg32-dsv5-vfd5-
ds312", "system-analytics", "454-fs-ns-dff4-xhh-43d-frfs"];
for (let i = 0; i < strings.length; i++) {
validateGibberish(strings[i]);
}
And this validateGibberish function logic will be similar to this python code:
from nltk.corpus import brown
from collections import Counter
import numpy as np
text = '\n'.join([' '.join([w for w in s]) for s in brown.sents()])
unigrams = Counter(text)
bigrams = Counter(text[i:(i+2)] for i in range(len(text)-2))
trigrams = Counter(text[i:(i+3)] for i in range(len(text)-3))
weights = [0.001, 0.01, 0.989]
def strangeness(text):
r = 0
text = ' ' + text + '\n'
for i in range(2, len(text)):
char = text[i]
context1 = text[(i-1):i]
context2 = text[(i-2):i]
num = unigrams[char] * weights[0] + bigrams[context1+char] * weights[1] + trigrams[context2+char] * weights[2]
den = sum(unigrams.values()) * weights[0] + unigrams[char] * weights[1] + bigrams[context1] * weights[2]
r -= np.log(num / den)
return r / (len(text) - 2)
So in the end I will loop on all the strings and get something like this:
"14370afcdc17429f9e418d5ffbd0334a" -> 8.9073
"gmail" -> 1.0044
"ce06e817-2149-6cfd-dd24-51b31e93ea1a" -> 7.4261
"87c0d782-e09f-056f-f544-c6ec9d17943c" -> 8.3916
"space-max" -> 1.3553
"ns3160176.ip-151-106-35" -> 6.2584
"ds4-f1g-54-h5-dfg-yk-4gd-htr5-fdg5h" -> 7.1796
"outlook" -> 1.6694
"h-rt-dfg4-sv6-fg32-dsv5-vfd5-ds312" -> 8.5734
"system-analytics" -> 1.9489
"454-fs-ns-dff4-xhh-43d-frfs" -> 7.7058
Does anybody have a hint how to do it and can help?
Thanks a lot :)
UPDATE (12-22-2020)
I managed to write some code based on @Konstantin Pribluda's answer - the Shannon entropy calculation:
const getFrequencies = str => {
let dict = new Set(str);
return [...dict].map(chr => {
return str.match(new RegExp(chr, 'g')).length;
});
};
// Measure the entropy of a string in bits per symbol.
const entropy = str => getFrequencies(str)
.reduce((sum, frequency) => {
let p = frequency / str.length;
return sum - (p * Math.log(p) / Math.log(2));
}, 0);
const strings = ['14370afcdc17429f9e418d5ffbd0334a', 'or', 'sdf', 'test', 'dave coperfield', 'gmail', 'ce06e817-2149-6cfd-dd24-51b31e93ea1a',
'87c0d782-e09f-056f-f544-c6ec9d17943c', 'space-max', 'ns3160176.ip-151-106-35',
'ds4-f1g-54-h5-dfg-yk-4gd-htr5-fdg5h', 'outlook', 'h-rt-dfg4-sv6-fg32-dsv5-vfd5-ds312', 'system-analytics', '454-fs-ns-dff4-xhh-43d-frfs'];
for (let i = 0; i < strings.length; i++) {
const str = strings[i];
let result = 0;
try {
result = entropy(str);
}
catch (error) { result = 0; }
console.log(`Entropy of '${str}' in bits per symbol:`, result);
}
The output is:
Entropy of '14370afcdc17429f9e418d5ffbd0334a' in bits per symbol: 3.7417292966721747
Entropy of 'or' in bits per symbol: 1
Entropy of 'sdf' in bits per symbol: 1.584962500721156
Entropy of 'test' in bits per symbol: 1.5
Entropy of 'dave coperfield' in bits per symbol: 3.4565647621309536
Entropy of 'gmail' in bits per symbol: 2.3219280948873626
Entropy of 'ce06e817-2149-6cfd-dd24-51b31e93ea1a' in bits per symbol: 3.882021446536749
Entropy of '87c0d782-e09f-056f-f544-c6ec9d17943c' in bits per symbol: 3.787301737252941
Entropy of 'space-max' in bits per symbol: 2.94770277922009
Entropy of 'ns3160176.ip-151-106-35' in bits per symbol: 3.1477803284561103
Entropy of 'ds4-f1g-54-h5-dfg-yk-4gd-htr5-fdg5h' in bits per symbol: 3.3502926596166693
Entropy of 'outlook' in bits per symbol: 2.1280852788913944
Entropy of 'h-rt-dfg4-sv6-fg32-dsv5-vfd5-ds312' in bits per symbol: 3.619340871812292
Entropy of 'system-analytics' in bits per symbol: 3.327819531114783
Entropy of '454-fs-ns-dff4-xhh-43d-frfs' in bits per symbol: 3.1299133176846836
It's still not working as expected, as 'dave coperfield' gets about the same score as the gibberish strings.
Does anyone have better logic or ideas on how to do this?
This is what I come up with:
// gibberish detector js
(function (h) {
function e(c, b, a) { return c < b ? (a = b - c, Math.log(b) / Math.log(a) * 100) : c > a ? (b = c - a, Math.log(100 - a) / Math.log(b) * 100) : 0 } function k(c) { for (var b = {}, a = "", d = 0; d < c.length; ++d)c[d] in b || (b[c[d]] = 1, a += c[d]); return a } h.detect = function (c) {
if (0 === c.length || !c.trim()) return 0; for (var b = c, a = []; a.length < b.length / 35;)a.push(b.substring(0, 35)), b = b.substring(36); 1 <= a.length && 10 > a[a.length - 1].length && (a[a.length - 2] += a[a.length - 1], a.pop()); for (var b = [], d = 0; d < a.length; d++)b.push(k(a[d]).length); a = 100 * b; for (d = b =
0; d < a.length; d++)b += parseFloat(a[d], 10); a = b / a.length; for (var f = d = b = 0; f < c.length; f++) { var g = c.charAt(f); g.match(/^[a-zA-Z]+$/) && (g.match(/^(a|e|i|o|u)$/i) && b++, d++) } b = 0 !== d ? b / d * 100 : 0; c = c.split(/[\W_]/).length / c.length * 100; a = Math.max(1, e(a, 45, 50)); b = Math.max(1, e(b, 35, 45)); c = Math.max(1, e(c, 15, 20)); return Math.max(1, (Math.log10(a) + Math.log10(b) + Math.log10(c)) / 6 * 100)
}
})("undefined" === typeof exports ? this.gibberish = {} : exports)
// email syntax validator
function validateSyntax(email) {
return /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/.test(email.toLowerCase());
}
// shannon entropy
function entropy(str) {
return Object.values(Array.from(str).reduce((freq, c) => (freq[c] = (freq[c] || 0) + 1) && freq, {})).reduce((sum, f) => sum - f / str.length * Math.log2(f / str.length), 0)
}
// vowel counter
function countVowels(word) {
var m = word.match(/[aeiou]/gi);
return m === null ? 0 : m.length;
}
// dummy function
function isTrue(value){
return value
}
// validate string by multiple tests
function detectGibberish(str){
var strWithoutPunct = str.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"");
var entropyValue = entropy(str) < 3.5;
var gibberishValue = gibberish.detect(str) < 50;
var vowelValue = 30 < 100 / strWithoutPunct.length * countVowels(strWithoutPunct) && 100 / strWithoutPunct.length * countVowels(strWithoutPunct) < 35;
return [entropyValue, gibberishValue, vowelValue].filter(isTrue).length > 1
}
// main function
function validateEmail(email) {
return validateSyntax(email) ? detectGibberish(email.split("@")[0]) : false
}
// tests
document.write(validateEmail("dsfghjdhjs#gmail.com") + "<br/>")
document.write(validateEmail("jhon.smith#gmail.com"))
I have combined multiple tests: gibberish-detector.js, Shannon entropy and vowel counting (between 30% and 35%). You can adjust some of the values for a more accurate result.
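Another direction, closer to the character n-gram idea in the question's Python snippet, is to train simple bigram counts on a sample of ordinary text and score how surprising each transition is. This is only a rough sketch of my own - the corpus and any threshold you pick are placeholders to be tuned on real data:
// train bigram counts on a toy corpus - replace with a decent chunk of real English text
const corpus = "the quick brown fox jumps over the lazy dog dave copperfield john smith " +
    "system analytics space max outlook gmail support account info contact";
const bigramCounts = {};
let totalBigrams = 0;
for (let i = 0; i < corpus.length - 1; i++) {
    const bg = corpus.slice(i, i + 2);
    bigramCounts[bg] = (bigramCounts[bg] || 0) + 1;
    totalBigrams++;
}
// average negative log-probability per bigram, with add-one smoothing; higher = more gibberish-like
function strangeness(str) {
    const s = str.toLowerCase();
    let score = 0, n = 0;
    for (let i = 0; i < s.length - 1; i++) {
        const count = bigramCounts[s.slice(i, i + 2)] || 0;
        score -= Math.log((count + 1) / (totalBigrams + 1));
        n++;
    }
    return n ? score / n : 0;
}
console.log(strangeness("dave coperfield"));         // comparatively low
console.log(strangeness("ce06e817-2149-6cfd-dd24")); // comparatively high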
A thing you may consider doing is checking how random each string is, then sorting the results by that score and, given a threshold, excluding the ones with high randomness. It is inevitable that you will miss some.
There are some implementations for checking the randomness of strings, for example:
https://en.wikipedia.org/wiki/Diehard_tests
http://www.cacert.at/random/
You may have to create a mapping (from chars and symbols to sequences of integers) before you apply some of these, because some of them work only with integers, since they test properties of random number generators.
Also a stack exchange link that can be of help is this:
https://stats.stackexchange.com/questions/371150/check-if-a-character-string-is-not-random
PS. I am having a similar problem in a service, since robots create accounts with these types of fake emails. After years of dealing with this issue (basically deleting the fake emails manually from the DB), I am now considering introducing a visual check (captcha) on the signup page to avoid the frustration.
I already tried CRC8 but I wasn't able to get the correct checksum.
Does anyone have an idea of how this checksum could be generated using JS?
This proved to be a very tricky one to solve, but I think I have a solution. It's a JavaScript port (mostly) of the Java. I did some simplifying of the code to eliminate the things that didn't seem to affect the answer.
First I had to export the hex equivalent of the CRC8_DATA in your Java program. I achieved this using just a simple bytesToHex routine I found here (this part is in Java):
System.out.print(bytesToHex(CRC8_DATA));
...
private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();
public static String bytesToHex(byte[] bytes) {
char[] hexChars = new char[bytes.length * 2];
for (int j = 0; j < bytes.length; j++) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = HEX_ARRAY[v >>> 4];
hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
}
return new String(hexChars);
}
Once I had this table, I just converted the Java code into JavaScript, to arrive at this:
var CRC8_DATA = '005EBCE2613FDD83C29C7E20A3FD1F419DC3217FFCA2401E5F01E3BD3E6082DC237D9FC1421CFEA0E1BF5D0380DE3C62BEE0025CDF81633D7C22C09E1D43A1FF4618FAA427799BC584DA3866E5BB5907DB856739BAE406581947A5FB7826C49A653BD987045AB8E6A7F91B45C6987A24F8A6441A99C7257B3A6486D85B05E7B98CD2306EEDB3510F4E10F2AC2F7193CD114FADF3702ECC92D38D6F31B2EC0E50AFF1134DCE90722C6D33D18F0C52B0EE326C8ED0530DEFB1F0AE4C1291CF2D73CA947628ABF517490856B4EA6937D58B5709EBB536688AD495CB2977F4AA4816E9B7550B88D6346A2B7597C94A14F6A8742AC896154BA9F7B6E80A54D7896B35';
function strToArr(str) {
var arr = str.match(/[0-9a-f]{2}/ig); // convert into array of hex pairs
arr = arr.map(x=> parseInt(x, 16)); // convert hex pairs into ints (bytes)
return arr;
}
CRC8_DATA = strToArr(CRC8_DATA);
function calculateCRC8(bArr) {
var i = 1;
var i2 = bArr.length - 1;
var b = 0;
while (i <= i2) {
b = CRC8_DATA[(b ^ bArr[i]) & 255];
i++;
}
return b;
}
function calcFromString(str) {
// convert from string to "byte" array
var byte_array = strToArr(str);
var checksum = calculateCRC8(byte_array)
console.log(str, checksum.toString(16));
}
calcFromString('02FD 1000 2322 4978 0140 00AF 6000 0000 0000');
calcFromString('02FD 1000 D82E 4F76 0189 00AF FA14 0000 0000');
The original Java is starting with i=1, so it's actually not including the first byte in the checksum calculation. That's probably one of the reasons why a lot of the JavaScript libraries weren't giving the same answer.
I went back and compared this with this online CRC calculator. When eliminating the first byte (0x02), I was able to get equivalent results using CRC-8/MAXIM;DOW-CRC.
I was not able to get the original fiddle working though, even after dropping the first byte and changing the polynomial to match the one from that website. Some of the other options must be different as well.
Checksum calculation in JavaScript
It can be quite difficult to do integer bitwise math in a language where numbers are double-precision floating point by default. Your code has to make sure that a number stays an integer, perhaps even stays unsigned and within a specific bit range (16-bit, 32-bit). These extra steps can complicate things.
Tricks
A few tricks to ensure that a number is an x-bit integer: use the AND operator with a bitmask that allows all bits in that range - e.g. a 16-bit number can be enforced with number &= 0xffff;. Furthermore, I use operations such as num | 0 or num >>> 0 to ensure a value is a 32-bit integer, signed or unsigned respectively. This is required to prevent negative results from being generated, which look especially weird when you display the checksum in hexadecimal.
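A quick illustration of those tricks (the values are arbitrary):
var n = 0x12345;
n &= 0xffff;                            // force into 16 bits
console.log(n.toString(16));            // "2345"
var big = 0xffffffff;
console.log(big | 0);                   // -1, interpreted as a signed 32-bit integer
console.log(big >>> 0);                 // 4294967295, forced back to unsigned 32-bit
console.log((big >>> 0).toString(16));  // "ffffffff" - no stray minus sign in the hex output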
Speed
Obviously I did not do this to build a fast checksum engine; I merely wanted to explore bitwise calculation further after my SmallPRNG and ISAAC CPRNG pens (post pending ;)). The algorithms will perform differently across browsers and browser versions and might even be extremely slow in some. I do believe Chrome handles this kind of data very quickly, though, and checksums can be calculated at a reasonable speed!
Base class
I'm going to implement multiple checksum algorithms, so I will create a base class which will construct a state (ctx) for the chosen algorithm. This state will keep track of the checksum and an instance of the algorithm class. This class will also handle strings properly, taking their encoding into account.
This class also contains a test for Uint8Array support. I'm not sure if this is the best way to test for support, but it did the trick.
var hasTyped = (function() {
if(!('Uint8Array' in window)) {
return false;
}
try {
var a = new window.Uint8Array(10);
a[0] = 100;
if(a[0] === 100) {
return true;
}
return false;
} catch (e) {
return false;
}
}());
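The Checksum base class itself isn't reproduced in this excerpt (it's in the full article linked below), but a minimal sketch of the shape the algorithm classes assume - a registry plus a small state object - could look roughly like this. Treat the update/result method names as my own guesses rather than the article's exact API:
var Checksum = (function() {
    'use strict';
    var algorithms = {};
    // the constructor builds a state ("ctx") for the chosen algorithm
    function Checksum(name) {
        var Algo = algorithms[name];
        if (!Algo) throw "Unknown checksum algorithm: " + name;
        this.algo = new Algo();
        if (this.algo.setup) this.algo.setup.apply(this.algo, [].slice.call(arguments, 1));
        this.sum = this.algo.init || 0;
    }
    // feed a single byte or an array of bytes into the running checksum
    Checksum.prototype.update = function(data) {
        this.sum = (typeof data === "number")
            ? this.algo.single(data, this.sum)
            : this.algo.array(data, this.sum);
        return this;
    };
    Checksum.prototype.result = function() {
        return this.sum;
    };
    // algorithm classes register themselves under a name
    Checksum.registerChecksum = function(name, ctor) {
        algorithms[name] = ctor;
    };
    return Checksum;
}());
window.Checksum = Checksum;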
Algorithm classes
Now we can just add algorithms with Checksum.registerChecksum. Each Algorithm class should implement a single and an array method for processing data and optionally a setup method which will be called when the Checksum object is constructed.
BSD16
Here's a very simple one, only a little bit of code is required for this algorithm. The BSD Checksum algorithm!
(function() {
'use strict';
if(typeof(window.Checksum) === "undefined") {
throw "The Checksum class is required for this code.";
}
/**
* Initialize anything you want in the constructor, such as setting up a lookup
* table of bytes. Make sure to cache your calculations, so they only have to be
* done once.
*/
var BSD16 = function BSD16() {
this.init = 0;
};
/**
* bsd16 for arrays, each value must be numeric and will be bound to 8-bits (Int8Array or Uint8Array works best!)
* @param {Array} a input (8-bit array)
* @param {Number} p previous checksum state
* @returns {Number} new checksum state
*/
BSD16.prototype.array = function(a, p) {
var c = p || 0, i = 0, l = a.length;
for(; i < l; i++) c = (((((c >>> 1) + ((c & 1) << 15)) | 0) + (a[i] & 0xff)) & 0xffff) | 0;
return c;
};
/**
* bsd16 for a single value, update a checksum with a new byte
* @param {Number} b byte (0-255)
* @param {Number} p previous checksum state
* @returns {Number} new checksum state
*/
BSD16.prototype.single = function(b, p) {
var c = p || 0;
return (((((c >>> 1) + ((c & 1) << 15)) | 0) + (b & 0xff)) & 0xffff) | 0;
};
Checksum.registerChecksum("bsd16", BSD16);
}());
FNV32 (FNV-0 and FNV-1)
Another easy one, the FNV Hash algorithm - which generates 32-bit checksums!
(function() {
'use strict';
if(typeof(window.Checksum) === "undefined") {
throw "The Checksum class is required for this code.";
}
var prime = 0x01000193;
/**
* Initialize anything you want in the constructor, such as setting up a lookup
* table of bytes. Make sure to cache your calculations, so they only have to be
* done once.
*/
var FNV32 = function FNV32() {
this.init = 2166136261; // fnv-1!
};
/**
* The setup method which will be called when new Checksum("fletcher", ...) is called.
* This method is supposed to initialize the checksum cipher and to receive parameters
* from the constructor.
*
* @param {Number} mode the FNV32 mode (FNV-1 (default) or FNV-0)
*/
FNV32.prototype.setup = function(mode) {
if(mode === 0) {
this.init = 0; // fnv-0.
}
};
FNV32.prototype.array = function(a, p) {
var len = a.length,
fnv = p || this.init;
for(var i = 0; i < len; i++) {
fnv = (fnv + (((fnv << 1) + (fnv << 4) + (fnv << 7) + (fnv << 8) + (fnv << 24)) >>> 0)) ^ (a[i] & 0xff);
}
return fnv >>> 0;
};
FNV32.prototype.single = function(b, p) {
var fnv = p || this.init;
return ((fnv + (((fnv << 1) + (fnv << 4) + (fnv << 7) + (fnv << 8) + (fnv << 24)) >>> 0)) ^ (b & 0xff)) >>> 0;
};
Checksum.registerChecksum("fnv32", FNV32);
}());
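Assuming a base class along the lines of the sketch above, usage might then look something like this (again my own guess at the calling convention, not the article's):
var bsd = new Checksum("bsd16");
bsd.update([0x68, 0x65, 0x6c, 0x6c, 0x6f]);     // "hello" as bytes
console.log(bsd.result().toString(16));
var fnv = new Checksum("fnv32");                // FNV-1 by default; pass 0 for FNV-0
fnv.update(0x68).update(0x65);                  // single bytes can be chained as well
console.log((fnv.result() >>> 0).toString(16));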
The full article, with the rest of the checksum engine, is here:
https://codepen.io/ImagineProgramming/post/checksum-algorithms-in-javascript-checksum-js-engine
I need to create UIDs in my server-side code as well as in client-side code (in the browser). I currently use NewID() as my default value, but when creating objects client-side (in the browser) I use uuid.js. Am I more likely to get collisions with NewSequentialId() as my default value (which will be used when objects are created server-side)?
FYI here is the uuid.js code, as I can't recall where I downloaded it.
// uuid.js
//
// Copyright (c) 2010-2012 Robert Kieffer
// MIT License - http://opensource.org/licenses/mit-license.php
(function() {
var _global = this;
// Unique ID creation requires a high quality random # generator. We feature
// detect to determine the best RNG source, normalizing to a function that
// returns 128-bits of randomness, since that's what's usually required
var _rng;
// Node.js crypto-based RNG - http://nodejs.org/docs/v0.6.2/api/crypto.html
//
// Moderately fast, high quality
if (typeof(require) == 'function') {
try {
var _rb = require('crypto').randomBytes;
_rng = _rb && function() {return _rb(16);};
} catch(e) {}
}
if (!_rng && _global.crypto && crypto.getRandomValues) {
// WHATWG crypto-based RNG - http://wiki.whatwg.org/wiki/Crypto
//
// Moderately fast, high quality
var _rnds8 = new Uint8Array(16);
_rng = function whatwgRNG() {
crypto.getRandomValues(_rnds8);
return _rnds8;
};
}
if (!_rng) {
// Math.random()-based (RNG)
//
// If all else fails, use Math.random(). It's fast, but is of unspecified
// quality.
var _rnds = new Array(16);
_rng = function() {
for (var i = 0, r; i < 16; i++) {
if ((i & 0x03) === 0) r = Math.random() * 0x100000000;
_rnds[i] = r >>> ((i & 0x03) << 3) & 0xff;
}
return _rnds;
};
}
// Buffer class to use
var BufferClass = typeof(Buffer) == 'function' ? Buffer : Array;
// Maps for number <-> hex string conversion
var _byteToHex = [];
var _hexToByte = {};
for (var i = 0; i < 256; i++) {
_byteToHex[i] = (i + 0x100).toString(16).substr(1);
_hexToByte[_byteToHex[i]] = i;
}
// **`parse()` - Parse a UUID into it's component bytes**
function parse(s, buf, offset) {
var i = (buf && offset) || 0, ii = 0;
buf = buf || [];
s.toLowerCase().replace(/[0-9a-f]{2}/g, function(oct) {
if (ii < 16) { // Don't overflow!
buf[i + ii++] = _hexToByte[oct];
}
});
// Zero out remaining bytes if string was short
while (ii < 16) {
buf[i + ii++] = 0;
}
return buf;
}
// **`unparse()` - Convert UUID byte array (ala parse()) into a string**
function unparse(buf, offset) {
var i = offset || 0, bth = _byteToHex;
return bth[buf[i++]] + bth[buf[i++]] +
bth[buf[i++]] + bth[buf[i++]] + '-' +
bth[buf[i++]] + bth[buf[i++]] + '-' +
bth[buf[i++]] + bth[buf[i++]] + '-' +
bth[buf[i++]] + bth[buf[i++]] + '-' +
bth[buf[i++]] + bth[buf[i++]] +
bth[buf[i++]] + bth[buf[i++]] +
bth[buf[i++]] + bth[buf[i++]];
}
// **`v1()` - Generate time-based UUID**
//
// Inspired by https://github.com/LiosK/UUID.js
// and http://docs.python.org/library/uuid.html
// random #'s we need to init node and clockseq
var _seedBytes = _rng();
// Per 4.5, create and 48-bit node id, (47 random bits + multicast bit = 1)
var _nodeId = [
_seedBytes[0] | 0x01,
_seedBytes[1], _seedBytes[2], _seedBytes[3], _seedBytes[4], _seedBytes[5]
];
// Per 4.2.2, randomize (14 bit) clockseq
var _clockseq = (_seedBytes[6] << 8 | _seedBytes[7]) & 0x3fff;
// Previous uuid creation time
var _lastMSecs = 0, _lastNSecs = 0;
// See https://github.com/broofa/node-uuid for API details
function v1(options, buf, offset) {
var i = buf && offset || 0;
var b = buf || [];
options = options || {};
var clockseq = options.clockseq != null ? options.clockseq : _clockseq;
// UUID timestamps are 100 nano-second units since the Gregorian epoch,
// (1582-10-15 00:00). JSNumbers aren't precise enough for this, so
// time is handled internally as 'msecs' (integer milliseconds) and 'nsecs'
// (100-nanoseconds offset from msecs) since unix epoch, 1970-01-01 00:00.
var msecs = options.msecs != null ? options.msecs : new Date().getTime();
// Per 4.2.1.2, use count of uuid's generated during the current clock
// cycle to simulate higher resolution clock
var nsecs = options.nsecs != null ? options.nsecs : _lastNSecs + 1;
// Time since last uuid creation (in msecs)
var dt = (msecs - _lastMSecs) + (nsecs - _lastNSecs)/10000;
// Per 4.2.1.2, Bump clockseq on clock regression
if (dt < 0 && options.clockseq == null) {
clockseq = clockseq + 1 & 0x3fff;
}
// Reset nsecs if clock regresses (new clockseq) or we've moved onto a new
// time interval
if ((dt < 0 || msecs > _lastMSecs) && options.nsecs == null) {
nsecs = 0;
}
// Per 4.2.1.2 Throw error if too many uuids are requested
if (nsecs >= 10000) {
throw new Error('uuid.v1(): Can\'t create more than 10M uuids/sec');
}
_lastMSecs = msecs;
_lastNSecs = nsecs;
_clockseq = clockseq;
// Per 4.1.4 - Convert from unix epoch to Gregorian epoch
msecs += 12219292800000;
// `time_low`
var tl = ((msecs & 0xfffffff) * 10000 + nsecs) % 0x100000000;
b[i++] = tl >>> 24 & 0xff;
b[i++] = tl >>> 16 & 0xff;
b[i++] = tl >>> 8 & 0xff;
b[i++] = tl & 0xff;
// `time_mid`
var tmh = (msecs / 0x100000000 * 10000) & 0xfffffff;
b[i++] = tmh >>> 8 & 0xff;
b[i++] = tmh & 0xff;
// `time_high_and_version`
b[i++] = tmh >>> 24 & 0xf | 0x10; // include version
b[i++] = tmh >>> 16 & 0xff;
// `clock_seq_hi_and_reserved` (Per 4.2.2 - include variant)
b[i++] = clockseq >>> 8 | 0x80;
// `clock_seq_low`
b[i++] = clockseq & 0xff;
// `node`
var node = options.node || _nodeId;
for (var n = 0; n < 6; n++) {
b[i + n] = node[n];
}
return buf ? buf : unparse(b);
}
// **`v4()` - Generate random UUID**
// See https://github.com/broofa/node-uuid for API details
function v4(options, buf, offset) {
// Deprecated - 'format' argument, as supported in v1.2
var i = buf && offset || 0;
if (typeof(options) == 'string') {
buf = options == 'binary' ? new BufferClass(16) : null;
options = null;
}
options = options || {};
var rnds = options.random || (options.rng || _rng)();
// Per 4.4, set bits for version and `clock_seq_hi_and_reserved`
rnds[6] = (rnds[6] & 0x0f) | 0x40;
rnds[8] = (rnds[8] & 0x3f) | 0x80;
// Copy bytes to buffer, if provided
if (buf) {
for (var ii = 0; ii < 16; ii++) {
buf[i + ii] = rnds[ii];
}
}
return buf || unparse(rnds);
}
// Export public API
var uuid = v4;
uuid.v1 = v1;
uuid.v4 = v4;
uuid.parse = parse;
uuid.unparse = unparse;
uuid.BufferClass = BufferClass;
if (typeof define === 'function' && define.amd) {
// Publish as AMD module
define(function() {return uuid;});
} else if (typeof(module) != 'undefined' && module.exports) {
// Publish as node.js module
module.exports = uuid;
} else {
// Publish as global (in browsers)
var _previousRoot = _global.uuid;
// **`noConflict()` - (browser only) to reset global 'uuid' var**
uuid.noConflict = function() {
_global.uuid = _previousRoot;
return uuid;
};
_global.uuid = uuid;
}
}).call(this);
This is actually a pretty interesting question, with a number of levels to it.
First, it's worth noting that uuid.js supports two different forms of id. uuid.v4() creates IDs using random numbers, while uuid.v1() creates IDs based on timestamps. The "version" of the id is actually encoded in the id itself, which guarantees that in theory no v4 id will ever collide with a v1 id. That's part of RFC4122, the UUID specification.
It's also worth noting that for v1 ids, each id source is supposed to have a unique "node id", also encoded in the id, that guarantees the uniqueness of the id sequence created by that source. For id sources that have access to a guaranteed-unique value (e.g. a device's MAC address) this works well. However, uuid.js doesn't have access to such a value and thus generates a random value for its node id. This introduces the risk of it generating a node id that matches the one used by your server. The node id is a 48-bit value, meaning the chance of a node id collision is about 1 in 281,474,976,710,656. So there's a chance, but it's pretty damn low.
... but none of that matters!
It turns out that even though NewSequentialID() produces IDs that are superficially similar to v1 IDs, Microsoft for whatever reason decided to swap the various fields within the ID around, breaking RFC4122 compatibility. What this means is that, depending on the sequence number, the IDs may or may not look like valid v1 ids, or valid v4 ids, or simply invalid UUIDs. I.e. using NewSequentialID() throws a wrench into the works if you want to reason about the possibility of uuid collision.
I'm not sure there's a simple way to quantify the risk of collision given this last issue. At the end of the day, UUIDs are 128-bit values, meaning there's a HUGE numberspace to draw from. For all but the most demanding of requirements you're probably okay. But there will be an increased risk of collision compared to what you'd have if you used an RFC-compliant UUID source.
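If you do want a rough number for the purely random (v4) case, the usual birthday-bound approximation is p ≈ 1 - e^(-n²/2N) with N = 2^122 for the 122 random bits in a v4 id. A back-of-envelope sketch of my own, not something from uuid.js:
// probability of at least one collision among n random v4 UUIDs (122 random bits)
function v4CollisionProbability(n) {
    var space = Math.pow(2, 122);
    // -expm1(-x) is 1 - e^(-x), but stays accurate for very small x
    return -Math.expm1(-n * (n - 1) / (2 * space));
}
console.log(v4CollisionProbability(1e9)); // ~9.4e-20 - vanishingly small, even at a billion ids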
[FWIW, your uuid.js comes from the node-uuid project (*cough* said the author).]
Executing this JavaScript code in Safari
// expected output - array containing 32 bit words
b = "a";
var a = Array((b.length+3) >> 2);
for (var i = 0; i < b.length; i++) a[i>>2] |= (b.charCodeAt(i) << (24-(i & 3)*8));
and this (Objective-)C code in iOS Simulator
int array[((#"a".length + 3) >> 2)];
for (int i = 0; i < #"a".length; i++) {
int c = (int) [#"a" characterAtIndex:i];
array[i>>2] |= (c << (24-((i & 3)*8)));
}
gives me different output - respectively 1627389952 (JavaScript) and 1627748484 (Objective-C).
Since the first four digits are always the same I think that the error is connected with precision but I cannot spot the issue.
EDIT
Sorry for the lack of attention and thank you very much (@Joni and all of you guys). You were right that the array in the C code was filled with some random values. I solved the issue by setting all elements of the array to zero:
memset(array, 0, sizeof(array));
If anyone is curious the C code looks like this now:
int array[((#"a".length + 3) >> 2)];
memset(array, 0, sizeof(array));
for (int i = 0; i < #"a".length; i++) {
int c = (int) [#"a" characterAtIndex:i];
array[i>>2] |= (c << (24-((i & 3)*8)));
}
I don't know how Objective-C initializes arrays, but in JavaScript array elements are not initialized to anything (in fact, the indices don't even exist), so take care of that at least:
var b = "a";
var a = Array((b.length + 3) >> 2);
for( var i = 0, len = a.length; i < len; ++i ) {
a[i] = 0; //initialize a values to 0
}
for (var i = 0; i < b.length; i++) {
a[i >> 2] |= (b.charCodeAt(i) << (24 - (i & 3) * 8));
}
Secondly, this effectively should calculate 97 << 24, for which the correct
answer is 1627389952, so the Objective-C result is wrong. Probably because
the array values are not initialized to 0?
You are not setting the array to zeros in objective c, so it may have some random garbage to start with.
Maybe I am just not good enough at math, but I am having a problem converting a number into pure alphabetical bijective hexavigesimal, just like Microsoft Excel/OpenOffice Calc do it.
Here is a version of my code that did not give me the output I needed:
var toHexvg = function(a){
var x='';
var let="_abcdefghijklmnopqrstuvwxyz";
var len=let.length;
var b=a;
var cnt=0;
var y = Array();
do{
a=(a-(a%len))/len;
cnt++;
}while(a!=0)
a=b;
var vnt=0;
do{
b+=Math.pow((len),vnt)*Math.floor(a/Math.pow((len),vnt+1));
vnt++;
}while(vnt!=cnt)
var c=b;
do{
y.unshift( c%len );
c=(c-(c%len))/len;
}while(c!=0)
for(var i in y)x+=let[y[i]];
return x;
}
The best output my efforts could get is: a b c d ... y z ba bb bc - though not from the actual code above. The intended output is supposed to be a b c ... y z aa ab ac ... zz aaa aab aac ... zzzzz aaaaaa aaaaab - you get the picture.
Basically, my problem is more with doing the 'math' than with the function itself. Ultimately my question is: how do I do the math for hexavigesimal conversion, up to a [supposed] infinity, just like Microsoft Excel does?
And, if possible, some source code. Thank you in advance.
Okay, here's my attempt, assuming you want the sequence to start with "a" (representing 0) and go:
a, b, c, ..., y, z, aa, ab, ac, ..., zy, zz, aaa, aab, ...
This works and hopefully makes some sense. The funky line is there because it mathematically makes more sense for 0 to be represented by the empty string and then "a" would be 1, etc.
alpha = "abcdefghijklmnopqrstuvwxyz";
function hex(a) {
// First figure out how many digits there are.
a += 1; // This line is funky
c = 0;
var x = 1;
while (a >= x) {
c++;
a -= x;
x *= 26;
}
// Now you can do normal base conversion.
var s = "";
for (var i = 0; i < c; i++) {
s = alpha.charAt(a % 26) + s;
a = Math.floor(a/26);
}
return s;
}
However, if you're planning to simply print them out in order, there are far more efficient methods - for example, using recursion and/or prefixes, as sketched below.
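For instance, if you only need them in order you can skip the arithmetic entirely and just extend prefixes level by level - a quick sketch of my own (it keeps the previous level in memory, so it is simple rather than frugal):
function* alphaSequence() {
    var alphabet = "abcdefghijklmnopqrstuvwxyz";
    var current = [""];                   // all strings of the previous length
    while (true) {
        var next = [];
        for (var p = 0; p < current.length; p++) {
            for (var c = 0; c < alphabet.length; c++) {
                var s = current[p] + alphabet[c];
                yield s;
                next.push(s);
            }
        }
        current = next;
    }
}
var it = alphaSequence();
for (var i = 0; i < 28; i++) console.log(it.next().value); // a, b, ..., z, aa, ab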
Although @user826788 has already posted working code (which is even a third quicker), I'll post my own work that I did before finding the posts here (as I didn't know the word "hexavigesimal" back then). However, it also includes the function for the other direction. Note that I use a = 1, since I use it to convert the starting list element from
aa) first
ab) second
to
<ol type="a" start="27">
<li>first</li>
<li>second</li>
</ol>
:
function linum2int(input) {
input = input.replace(/[^A-Za-z]/, '');
output = 0;
for (i = 0; i < input.length; i++) {
output = output * 26 + parseInt(input.substr(i, 1), 26 + 10) - 9;
}
console.log('linum', output);
return output;
}
function int2linum(input) {
var zeros = 0;
var next = input;
var generation = 0;
while (next >= 27) {
next = (next - 1) / 26 - (next - 1) % 26 / 26;
zeros += next * Math.pow(27, generation);
generation++;
}
output = (input + zeros).toString(27).replace(/./g, function ($0) {
return '_abcdefghijklmnopqrstuvwxyz'.charAt(parseInt($0, 27));
});
return output;
}
linum2int("aa"); // 27
int2linum(27); // "aa"
You could accomplish this with recursion, like this:
const toBijective = n => (n > 26 ? toBijective(Math.floor((n - 1) / 26)) : "") + ((n % 26 || 26) + 9).toString(36);
// Parsing is not recursive
const parseBijective = str => str.split("").reverse().reduce((acc, x, i) => acc + ((parseInt(x, 36) - 9) * (26 ** i)), 0);
toBijective(1) // "a"
toBijective(27) // "aa"
toBijective(703) // "aaa"
toBijective(18279) // "aaaa"
toBijective(127341046141) // "overflow"
parseBijective("Overflow") // 127341046141
I don't understand how to work it out from a formula, but I fooled around with it for a while and came up with the following algorithm to literally count up to the requested column number:
var getAlpha = (function() {
var alphas = [null, "a"],
highest = [1];
return function(decNum) {
if (alphas[decNum])
return alphas[decNum];
var d,
next,
carry,
i = alphas.length;
for(; i <= decNum; i++) {
next = "";
carry = true;
for(d = 0; d < highest.length; d++){
if (carry) {
if (highest[d] === 26) {
highest[d] = 1;
} else {
highest[d]++;
carry = false;
}
}
next = String.fromCharCode(
highest[d] + 96)
+ next;
}
if (carry) {
highest.push(1);
next = "a" + next;
}
alphas[i] = next;
}
return alphas[decNum];
};
})();
alert(getAlpha(27)); // "aa"
alert(getAlpha(100000)); // "eqxd"
Demo: http://jsfiddle.net/6SE2f/1/
The highest array holds the current highest number with an array element per "digit" (element 0 is the least significant "digit").
When I started the above it seemed a good idea to cache each value once calculated, to save time if the same value was requested again, but in practice (with Chrome) it only took about 3 seconds to calculate the 1,000,000th value (bdwgn) and about 20 seconds to calculate the 10,000,000th value (uvxxk). With the caching removed it took about 14 seconds to the 10,000,000th value.
Just finished writing this code earlier tonight, and I found this question while on a quest to figure out what to name the damn thing. Here it is (in case anybody feels like using it):
/**
* Convert an integer to bijective hexavigesimal notation (alphabetic base-26).
*
* @param {Number} int - A positive integer above zero
* @return {String} The number's value expressed in uppercased bijective base-26
*/
function bijectiveBase26(int){
const sequence = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const length = sequence.length;
if(int <= 0) return int;
if(int <= length) return sequence[int - 1];
let index = (int % length) || length;
let result = [sequence[index - 1]];
while((int = Math.floor((int - 1) / length)) > 0){
index = (int % length) || length;
result.push(sequence[index - 1]);
}
return result.reverse().join("")
}
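For reference, a few example calls (my own quick check, not part of the original snippet):
console.log(bijectiveBase26(1));    // "A"
console.log(bijectiveBase26(26));   // "Z"
console.log(bijectiveBase26(27));   // "AA"
console.log(bijectiveBase26(703));  // "AAA"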
I had to solve this same problem today for work. My solution is written in Elixir and uses recursion, but I explain the thinking in plain English.
Here are some example transformations:
0 -> "A", 1 -> "B", 2 -> "C", 3 -> "D", ..
25 -> "Z", 26 -> "AA", 27 -> "AB", ...
At first glance it might seem like a normal 26-base counting system
but unfortunately it is not so simple.
The "problem" becomes clear when you realize:
A = 0
AA = 26
This is at odds with a normal counting system, where "0" does not behave like "1" when it is in a place other than the units place.
To understand the algorithm, consider a simpler but equivalent base-2 system:
A = 0
B = 1
AA = 2
AB = 3
BA = 4
BB = 5
AAA = 6
In a normal binary counting system we can determine the "value" of decimal places by
taking increasing powers of 2 (1, 2, 4, 8, 16) and the value of a binary number is
calculated by multiplying each digit by that digit place's value.
e.g. 10101 = 1 * (2 ^ 4) + 0 * (2 ^ 3) + 1 * (2 ^ 2) + 0 * (2 ^ 1) + 1 * (2 ^ 0) = 21
In our more complicated AB system, we can see by inspection that the decimal place values are:
1, 2, 6, 14, 30, 62
The pattern reveals itself to be (previous_unit_place_value + 1) * 2.
As such, to get the next lower unit place value, we divide by 2 and subtract 1.
This can be extended to a base-26 system. Simply divide by 26 and subtract 1.
Now a formula for transforming a normal base-10 number to special base-26 is apparent.
Say the input is x.
1. Create an accumulator list l.
2. If x is less than 26, set l = [x | l] and go to step 5. Otherwise, continue.
3. Divide x by 26. The floored result is d and the remainder is r.
4. Push the remainder onto the head of the accumulator list, i.e. l = [r | l], then go to step 2 with (d - 1) as the input, i.e. x = d - 1.
5. Convert all elements of l to their corresponding chars: 0 -> A, etc.
So, finally, here is my answer, written in Elixir:
defmodule BijectiveHexavigesimal do
def to_az_string(number, base \\ 26) do
number
|> to_list(base)
|> Enum.map(&to_char/1)
|> to_string()
end
def to_09_integer(string, base \\ 26) do
string
|> String.to_charlist()
|> Enum.reverse()
|> Enum.reduce({0, nil}, fn
char, {_total, nil} ->
{to_integer(char), 1}
char, {total, previous_place_value} ->
char_value = to_integer(char + 1)
place_value = previous_place_value * base
new_total = total + char_value * place_value
{new_total, place_value}
end)
|> elem(0)
end
def to_list(number, base, acc \\ []) do
if number < base do
[number | acc]
else
to_list(div(number, base) - 1, base, [rem(number, base) | acc])
end
end
# to_char/to_integer map between 0-based values and uppercase chars (?A == 65)
defp to_char(x), do: x + 65
defp to_integer(x), do: x - 65
end
You use it simply as BijectiveHexavigesimal.to_az_string(420). It also accepts an optional "base" arg.
I know the OP asked about Javascript but I wanted to provide an Elixir solution for posterity.
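Since the OP asked for JavaScript, here is a direct translation of to_list/to_az_string (my own sketch, keeping the same 0 -> "A", 26 -> "AA" convention):
// mirrors BijectiveHexavigesimal.to_az_string/2 above
function toAzString(number, base) {
    base = base || 26;
    var acc = [];
    while (number >= base) {
        acc.unshift(number % base);
        number = Math.floor(number / base) - 1;
    }
    acc.unshift(number);
    return acc.map(function (x) { return String.fromCharCode(x + 65); }).join("");
}
console.log(toAzString(0));   // "A"
console.log(toAzString(26));  // "AA"
console.log(toAzString(27));  // "AB"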
I have published these functions in an npm package here:
https://www.npmjs.com/package/@gkucmierz/utils
It converts bijective numeration to a number and back (a BigInt version is also included):
https://github.com/gkucmierz/utils/blob/main/src/bijective-numeration.mjs