I am trying to find values that commonly appear next to each other in an array.
For example, given the array:
["dog","cat","goat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"]
it should return something similar to:
[[["dog","cat"],4],[["cat","pig"],2],[["dog","cat","pig"],2]]
Here is some better data: https://pastebin.com/UG4iswrZ
Help would be greatly appreciated. Here is my current failed attempt at doing something similar:
function findAssociations(words){
var temp = [],tempStore = [],store = [],found = false;
//loop through the words counting occurrences of words together with a window of 5
for(var i = 0;i<words.length-1;i++){
if(i % 5 == 0){
//on every fifth element, loop through store attempting to add combinations of words stored in tempStore
for(var j = 0;j<5;j++){
temp = []
//create the current combination
for(var k = 0;k<j;k++){
temp.push(tempStore[k]);
}
//find if element is already stored, if it is, increment the occurrence counter
for(var k = 0;k<store.length;k++){
if(store[k][0]===temp){
found = true;
store[k][1] = store[k][1]+1;
}
}
//if it isn't add it
if(found == false){
store.push([temp,1]);
}
found == false;
}
tempStore = [];
} else {
//add word to tempStore if i isn't a multiple of 5
tempStore.push(words[i]);
}
}
}
This script doesn't remove combinations that appear once, doesn't sort the output by occurrences, and doesn't work. It is just an outline of how a possible solution might work (as suggested by benvc).
Here is a generic solution working with multiple group sizes.
You specify a range of group sizes, for example [2,4] for groups of 2 to 4 elements and a minimum number of occurrences.
The function then generates all groups of neighbours of the given sizes, sorts each group and counts the duplicates. The sorting step can be removed if the order within the groups matters.
The duplicates are counted by creating a dictionary whose keys are the sorted group elements joined with a special marker. The values in the dictionary are the counts.
It then returns the groups sorted by occurrences and then by group size.
const data = ["dog","cat","goat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"];
function findSimilarNeighbors(groupSizeRange, minOccurences, data) {
const getNeighbors = (size, arr) => arr.reduce((acc, x) => {
acc.push([]);
for (let i = 0; i < size; ++ i) {
const idx = acc.length - i - 1;
(acc[idx] || []).push(x);
}
return acc;
}, []).filter(x => x.length === size);
const groups = [];
for (let groupSize = groupSizeRange[0]; groupSize <= groupSizeRange[1]; ++groupSize) {
groups.push(...getNeighbors(groupSize, data));
}
const groupName = group => group.sort().join('###'); // use a separator that won't occur in the strings
const groupsInfo = groups.reduce((acc, group) => {
const name = groupName(group);
acc[name] = acc[name] || {};
acc[name] = { group, count: (acc[name].count || 0) + 1 };
return acc;
}, {});
return Object.values(groupsInfo)
.filter(group => group.count >= minOccurences)
.sort((a, b) => {
const countDiff = b.count - a.count;
return countDiff ? countDiff : b.group.length - a.group.length;
})
.map(({ group, count }) => [group, count]);
};
console.log(findSimilarNeighbors([2, 4], 2, data));
console.log(findSimilarNeighbors([4, 4], 2, data));
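If the order inside a group matters (so that dog,cat and cat,dog are counted separately), the same counting idea can be sketched with a sliding window and a Map. This is only a minimal sketch, not the code above: countRuns and its parameters are hypothetical names, and JSON.stringify is used to build the keys so that no separator has to be chosen.

```javascript
// Hypothetical helper: count every contiguous run of minSize..maxSize items,
// keeping the original order of the items inside each run.
function countRuns(items, minSize, maxSize, minCount) {
  const counts = new Map();
  for (let size = minSize; size <= maxSize; size++) {
    for (let start = 0; start + size <= items.length; start++) {
      // JSON.stringify gives an unambiguous key for an array of strings.
      const key = JSON.stringify(items.slice(start, start + size));
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .map(([key, count]) => [JSON.parse(key), count])
    // Most frequent first; for equal counts, longer runs first.
    .sort((a, b) => b[1] - a[1] || b[0].length - a[0].length);
}

const sample = ["dog","cat","goat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"];
console.log(countRuns(sample, 2, 3, 2));
// [[["dog","cat"],4],[["dog","cat","pig"],2],[["cat","pig"],2]]
```

Because the runs are not sorted before counting, this variant matches the output format asked for in the question.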
Here is what I came up with. It only finds pairs, but you could modify it to find sets of 3, 4, etc. by changing the modulus (the number after %):
const animals = ['dog','cat','goat','dog','cat','elephant','dog','cat','pig','seal','dog','cat','pig','monkey'];
let pairs = ',';
animals.forEach((animal, i) => {
let separator = ',';
if (i % 2 === 0) {
separator = ';'
}
pairs += animal + separator;
});
const evenPairs = pairs.split(',');
const oddPairs = pairs.split(';');
const allPairs = evenPairs.concat(oddPairs).map(pair => pair.replace(/[;,]/, ' '));
let result = {}
allPairs.forEach(pair => {
if (pair.length) {
if (result[pair] === undefined) {
result[pair] = 1;
} else {
result[pair]++;
}
}
});
results in:
dog: 1
cat elephant: 1
cat goat: 1
cat pig: 2
dog cat: 4
elephant dog: 1
goat dog: 1
monkey : 1
pig monkey: 1
pig seal: 1
seal dog: 1
https://stackblitz.com/edit/typescript-wvuvnr
You need to be clear what you mean by close and how close. Just looking at first neighbours you could try:
const findAssociations = words => {
const associations = {}
for (let i = 0; i < words.length - 1; i++) {
const word = words[i]
const wordRight = words[i+1]
const wordOne = word < wordRight ? word : wordRight;
const wordTwo = word < wordRight ? wordRight : word;
const key = `${wordOne}:${wordTwo}`
// Increment the count for this pair, starting from zero the first time it is seen.
associations[key] = (associations[key] || 0) + 1
}
// Convert the counts to [key, count] pairs and sort by count, highest first.
const zipped = Object.entries(associations)
zipped.sort((a, b) => b[1] - a[1]);
return zipped;
}
https://stackblitz.com/edit/js-3ppdit
You can call this function from another function, extending the pattern (starting from ["dog", "cat"]) by one element each time:
const arr = ["dog", "cat", "goat", "dog", "cat", "dog", "cat", "elephant", "dog", "cat", "pig", "seal", "dog", "cat", "pig", "monkey"]
const findArrayInArray = (arr1, arr2) => {
let count = 0,
arrString1 = arr1.join(""),
arrString2 = arr2.join("");
while (arrString2.indexOf(arrString1) > -1) {
count += 1;
arrString2 = arrString2.replace(arrString1, '');
}
return count;
}
console.log(`["dog", "cat"] exist ${findArrayInArray(["dog", "cat"], arr)} times`)
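One thing to watch with the string-joining approach is that joining with an empty string loses the element boundaries, so a pattern such as ["do", "gcat"] would also match "dogcat". A minimal sketch that compares element by element instead is shown below; countSubarray is a hypothetical name, and it counts non-overlapping matches only.

```javascript
// Hypothetical helper: count non-overlapping occurrences of `needle`
// as a contiguous run inside `haystack`, comparing element by element.
function countSubarray(needle, haystack) {
  if (needle.length === 0) return 0;
  let count = 0;
  let i = 0;
  while (i + needle.length <= haystack.length) {
    const matches = needle.every((item, offset) => haystack[i + offset] === item);
    if (matches) {
      count += 1;
      i += needle.length; // skip past the match so matches do not overlap
    } else {
      i += 1;
    }
  }
  return count;
}

const animals = ["dog", "cat", "goat", "dog", "cat", "dog", "cat", "elephant", "dog", "cat", "pig", "seal", "dog", "cat", "pig", "monkey"];
console.log(countSubarray(["dog", "cat"], animals)); // 5
console.log(countSubarray(["do", "gcat"], animals)); // 0
```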
Assuming each item in the list is a delimiter of a set, and each set counts once for each item (i.e. ["dog", "cat", "goat"] counts as ["dog", "cat"] and ["dog", "cat", "goat"]), and assuming you don't want any single occurrences, here's one way:
const full_list = ["dog","cat","goat","dog","cat","dog","cat","elephant","dog","cat","pig","seal","dog","cat","pig","monkey"];
// create list of unique items
const distinct = (value, index, self) => {
return self.indexOf(value) ===index;
}
const unique_items = full_list.filter(distinct);
// get all patterns
var pre_report = {};
for (var i in unique_items) {
const item = unique_items[i];
var pattern = [item];
var appending = false;
for (var j = full_list.indexOf(item) + 1; j < full_list.length; ++j) {
const related_item = full_list[j];
if (item == related_item) {
pattern = [item]
continue;
}
pattern.push(related_item);
if (pattern in pre_report) {
++pre_report[pattern];
} else {
pre_report[pattern] = 1;
}
}
}
// filter out only single occurring patterns
var report = {};
for (const key in pre_report) {
if (pre_report[key] > 1) {
report[key] = pre_report[key];
}
}
console.log(report);
produces:
{ 'dog,cat': 5, 'dog,cat,pig': 2, 'cat,pig': 2 }
I've seen several similar questions about how to generate all possible combinations of elements in an array. But I'm having a very hard time figuring out how to write an algorithm that will only output combination pairs. Any suggestions would be super appreciated!
Starting with the following array (with N elements):
var array = ["apple", "banana", "lemon", "mango"];
And getting the following result:
var result = [
"apple banana",
"apple lemon",
"apple mango",
"banana lemon",
"banana mango",
"lemon mango"
];
I was trying out the following approach, but it produces all possible combinations instead of only combination pairs.
var letters = splSentences;
var combi = [];
var temp= "";
var letLen = Math.pow(2, letters.length);
for (var i = 0; i < letLen ; i++){
temp= "";
for (var j=0;j<letters.length;j++) {
if ((i & Math.pow(2,j))){
temp += letters[j]+ " "
}
}
if (temp !== "") {
combi.push(temp);
}
}
Here are some functional programming solutions:
Using ECMAScript 2019's flatMap:
var array = ["apple", "banana", "lemon", "mango"];
var result = array.flatMap(
(v, i) => array.slice(i+1).map( w => v + ' ' + w )
);
console.log(result);
Before the introduction of flatMap (my answer in 2017), you would go for reduce or [].concat(...) in order to flatten the array:
var array = ["apple", "banana", "lemon", "mango"];
var result = array.reduce( (acc, v, i) =>
acc.concat(array.slice(i+1).map( w => v + ' ' + w )),
[]);
console.log(result);
Or:
var array = ["apple", "banana", "lemon", "mango"];
var result = [].concat(...array.map(
(v, i) => array.slice(i+1).map( w => v + ' ' + w ))
);
console.log(result);
A simple way would be to do a double for loop over the array where you skip the first i elements in the second loop.
let array = ["apple", "banana", "lemon", "mango"];
let results = [];
// Since you only want pairs, there's no reason
// to iterate over the last element directly
for (let i = 0; i < array.length - 1; i++) {
// This is where you'll capture that last value
for (let j = i + 1; j < array.length; j++) {
results.push(`${array[i]} ${array[j]}`);
}
}
console.log(results);
Rewritten with ES5:
var array = ["apple", "banana", "lemon", "mango"];
var results = [];
// Since you only want pairs, there's no reason
// to iterate over the last element directly
for (var i = 0; i < array.length - 1; i++) {
// This is where you'll capture that last value
for (var j = i + 1; j < array.length; j++) {
results.push(array[i] + ' ' + array[j]);
}
}
console.log(results);
In my case, I wanted to get the combinations as follows, based on the size range of the array:
function getCombinations(valuesArray: string[])
{
var combi = [];
var temp = [];
var slent = Math.pow(2, valuesArray.length);
for (var i = 0; i < slent; i++)
{
temp = [];
for (var j = 0; j < valuesArray.length; j++)
{
if ((i & Math.pow(2, j)))
{
temp.push(valuesArray[j]);
}
}
if (temp.length > 0)
{
combi.push(temp);
}
}
combi.sort((a, b) => a.length - b.length);
console.log(combi.join("\n"));
return combi;
}
Example:
// "results" holds an array of string arrays
let results = getCombinations(['apple', 'banana', 'lemon', 'mango']);
Output in console:
The function is based on the logic described in the following reference:
https://www.w3resource.com/javascript-exercises/javascript-function-exercise-3.php
if ((i & Math.pow(2, j)))
The bitwise AND checks whether bit j of i is set: Math.pow(2, j) has only bit j set, so the result is non-zero (truthy) when element j belongs to the current combination and zero otherwise, in which case the condition is not met.
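As a sketch of how that bit test can be narrowed to the pairs asked for in the question, the version below walks the same bitmasks but keeps only those with exactly two bits set. pairsByBitmask is a hypothetical name, and 1 << j is used in place of Math.pow(2, j), which assumes fewer than 31 values.

```javascript
// Sketch: enumerate bitmasks and keep only those with exactly two bits set.
function pairsByBitmask(values) {
  const pairs = [];
  const total = 1 << values.length; // assumes fewer than 31 values
  for (let mask = 0; mask < total; mask++) {
    const chosen = [];
    for (let j = 0; j < values.length; j++) {
      // (1 << j) has only bit j set, so the AND is non-zero
      // exactly when bit j of mask is set.
      if (mask & (1 << j)) chosen.push(values[j]);
    }
    if (chosen.length === 2) pairs.push(chosen.join(" "));
  }
  return pairs;
}

console.log(pairsByBitmask(["apple", "banana", "lemon", "mango"]));
```

This still visits all 2^n masks, so for pairs alone a nested loop is cheaper; the sketch is only meant to show what the bit test selects.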
Although solutions have already been posted, here is an algorithm for the general case: finding all combinations of size n from m elements (m > n). In your case, we have n=2 and m=4.
const result = [];
result.length = 2; //n=2
function combine(input, len, start) {
if(len === 0) {
console.log( result.join(" ") ); //process here the result
return;
}
for (let i = start; i <= input.length - len; i++) {
result[result.length - len] = input[i];
combine(input, len-1, i+1 );
}
}
const array = ["apple", "banana", "lemon", "mango"];
combine( array, result.length, 0);
I ended up writing a general solution to this problem, which is functionally equivalent to nhnghia's answer, but I'm sharing it here as I think it's easier to read/follow and is also full of comments describing the algorithm.
/**
* Generate all combinations of an array.
* @param {Array} sourceArray - Array of input elements.
* @param {number} comboLength - Desired length of combinations.
* @return {Array} Array of combination arrays.
*/
function generateCombinations(sourceArray, comboLength) {
const sourceLength = sourceArray.length;
if (comboLength > sourceLength) return [];
const combos = []; // Stores valid combinations as they are generated.
// Accepts a partial combination, an index into sourceArray,
// and the number of elements required to be added to create a full-length combination.
// Called recursively to build combinations, adding subsequent elements at each call depth.
const makeNextCombos = (workingCombo, currentIndex, remainingCount) => {
const oneAwayFromComboLength = remainingCount == 1;
// For each element that remains to be added to the working combination.
for (let sourceIndex = currentIndex; sourceIndex < sourceLength; sourceIndex++) {
// Get next (possibly partial) combination.
const next = [ ...workingCombo, sourceArray[sourceIndex] ];
if (oneAwayFromComboLength) {
// Combo of right length found, save it.
combos.push(next);
}
else {
// Otherwise go deeper to add more elements to the current partial combination.
makeNextCombos(next, sourceIndex + 1, remainingCount - 1);
}
}
}
makeNextCombos([], 0, comboLength);
return combos;
}
The best solution I have found: https://lowrey.me/es6-javascript-combination-generator/
It uses ES6 generator functions; I adapted it to TypeScript. Most often you don't need all of the combinations at the same time, and I was getting annoyed by writing nested loops like for (let i=0; ... for (let j=i+1; ... for (let k=j+1; ... just to get combos one by one and test whether I need to terminate the loops.
export function* combinations<T>(array: T[], length: number): IterableIterator<T[]> {
for (let i = 0; i < array.length; i++) {
if (length === 1) {
yield [array[i]];
} else {
const remaining = combinations(array.slice(i + 1, array.length), length - 1);
for (let next of remaining) {
yield [array[i], ...next];
}
}
}
}
usage:
for (const combo of combinations([1,2,3], 2)) {
console.log(combo)
}
output:
> (2) [1, 2]
> (2) [1, 3]
> (2) [2, 3]
Just to give another option for the next person who searches for this:
const arr = ['a', 'b', 'c']
const combinations = ([head, ...tail]) => tail.length > 0 ? [...tail.map(tailValue => [head, tailValue]), ...combinations(tail)] : []
console.log(combinations(arr)) //[ [ 'a', 'b' ], [ 'a', 'c' ], [ 'b', 'c' ] ]
There is also this answer:
https://stackoverflow.com/a/64414875/19518308
The algorithm in this answer generates all combinations, i.e. choose(n, k), of k items taken from n.
The algorithm:
function choose(arr, k, prefix=[]) {
if (k == 0) return [prefix];
return arr.flatMap((v, i) =>
choose(arr.slice(i+1), k-1, [...prefix, v])
);
}
console.log(choose([0,1,2,3,4], 3));
I had a similar problem and this algorithm is working very well for me.
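Using map and flatMap, the following can be done (at the time of writing, flatMap was only supported in Chrome and Firefox). Note that this produces each pair in both orders, e.g. both "apple banana" and "banana apple".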
var array = ["apple", "banana", "lemon", "mango"]
array.flatMap(x => array.map(y => x !== y ? x + ' ' + y : null)).filter(x => x)
I think it is an answer to all such questions.
/**
*
* Generates all combination of given Array or number
*
* @param {Array | number} item - Accepts an array or a number. For an array, generates all combinations of its items; for a number, generates all combinations of the indices 0..item-1.
* @param {number} n - Maximum combination length: combinations contain at most `n` items.
* @param {boolean} filter - If true, only combinations of exactly `n` items are returned; otherwise combinations of every possible length are returned.
* @return {Array} Array of combination arrays.
*
* Usage Example:
*
* console.log(combination(['A', 'B', 'C', 'D'], 2, true)); // [[ 'A','A' ], [ 'A', 'B' ]...] (16 items)
* console.log(combination(['A', 'B', 'C', 'D'])); // [['A', 'A', 'A', 'B' ],.....,['A'],] (340 items)
* console.log(combination(4, 2)); // all possible values [[ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 0, 0 ], [ 0, 1 ], [ 0, 2 ]...] (20 items)
*/
function combination(item, n) {
const filter = typeof n !=='undefined';
n = n ? n : item.length;
const result = [];
const isArray = item.constructor.name === 'Array';
const count = isArray ? item.length : item;
const pow = (x, n, m = []) => {
if (n > 0) {
for (var i = 0; i < count; i++) {
const value = pow(x, n - 1, [...m, isArray ? item[i] : i]);
result.push(value);
}
}
return m;
}
pow(isArray ? item.length : item, n);
return filter ? result.filter(item => item.length == n) : result;
}
console.log("#####first sample: ", combination(['A', 'B', 'C', 'D'], 2)); // with filter
console.log("#####second sample: ", combination(['A', 'B', 'C', 'D'])); // without filter
console.log("#####third sample: ", combination(4, 2)); // gives array with index number
Generating combinations of elements in an array is a lot like counting in a numeral system,
where the base is the number of elements in your array (if you account for the leading zeros that will be missing).
This gives you all the indices to your array (concatenated):
arr = ["apple", "banana", "lemon", "mango"]
base = arr.length
idx = [...Array(Math.pow(base, base)).keys()].map(x => x.toString(base))
You are only interested in pairs of two, so restrict the range accordingly:
range = (from, to) => [...Array(to).keys()].map(el => el + from)
indices = range => range.map(x => x.toString(base).padStart(2,"0"))
indices( range( 0, Math.pow(base, 2))) // range starts at 0, single digits are zero-padded.
Now what's left to do is map indices to values.
As you don't want elements paired with themselves and order doesn't matter,
those need to be removed, before mapping to the final result.
const range = (from, to) => [...Array(to).keys()].map(el => el + from)
const combinations = arr => {
const base = arr.length
return range(0, Math.pow(base, 2))
.map(x => x.toString(base).padStart(2, "0"))
.filter(i => !i.match(/(\d)\1/) && i === i.split('').sort().join(''))
.map(i => arr[i[0]] + " " + arr[i[1]])
}
console.log(combinations(["apple", "banana", "lemon", "mango"]))
With more than ten elements, toString() returns letters for the higher digits, which the code above does not handle; and since toString() accepts a radix of at most 36, the approach cannot go beyond 36 elements.
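The counting idea itself does not depend on toString(). As a minimal sketch, the two "digits" can be extracted with integer division and remainder, which sidesteps the letter digits and the radix limit; pairsByCounting is a hypothetical name introduced here for illustration.

```javascript
// Sketch: treat each number below base^2 as a two-digit number in base
// arr.length, but extract the digits with arithmetic instead of toString().
const pairsByCounting = arr => {
  const base = arr.length;
  const result = [];
  for (let n = 0; n < base * base; n++) {
    const first = Math.floor(n / base); // leading digit
    const second = n % base;            // trailing digit
    // first < second drops self-pairs and mirrored duplicates.
    if (first < second) result.push(arr[first] + " " + arr[second]);
  }
  return result;
};

console.log(pairsByCounting(["apple", "banana", "lemon", "mango"]));
```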
Generating combinations is a classic problem. Here's my interpretation of that solution:
const combinations = (elements) => {
if (elements.length == 1) {
return [elements];
} else {
const tail = combinations(elements.slice(1));
return tail.reduce(
(combos, combo) => { combos.push([elements[0], ...combo]); return combos; },
[[elements[0]], ...tail]
);
}
};
const array = ["apple", "banana", "lemon", "mango"];
console.log(combinations(array));
Here is a non-mutating ES6 approach (in TypeScript):
function combine (tail: any[], length: number, head: any[][] = [[]]): any[][] {
return tail.reduce((acc, tailElement) => {
const tailHeadVariants = head.reduce((acc, headElement: any[]) => {
const combination = [...headElement, tailElement]
return [...acc, combination]
}, [])
if (length === 1) return [...acc, tailHeadVariants]
const subCombinations = combine(tail.filter(t => t !== tailElement), length - 1, tailHeadVariants)
return [...acc, ...subCombinations]
}, [])
}
As this post is well indexed on Google under the keywords "generate all combinations", lots of people coming here simply need to generate all the unique combinations, regardless of the size of the output (not only pairs).
This post answers this need.
All unique combinations, without recursion:
const getCombos = (a) => {
const separator = '';
const o = Object();
for (let i = 0; i < a.length; ++i) {
for (let j = i + 1; j <= a.length; ++j) {
const left = a.slice(i, j);
const right = a.slice(j, a.length);
o[left.join(separator)] = 1;
for (let k = 0; k < right.length; ++k) {
o[[...left, right[k]].join(separator)] = 1;
}
}
}
return Object.keys(o);
}
const a = ['a', 'b', 'c', 'd'];
const b = getCombos(a);
console.log(b);
// (14) ['a', 'ab', 'ac', 'ad', 'abc', 'abd', 'abcd',
// 'b', 'bc', 'bd', 'bcd', 'c', 'cd', 'd']
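This code splits the array into two sub-arrays, left and right, then iterates over the right array to combine each of its elements with the left array. The left becomes bigger over time, while the right becomes smaller. The result has only unique values. Note that because the left part is always a contiguous run extended by at most one later element, some combinations are missed: the output above has 14 of the 15 non-empty subsets of four elements ('acd' is absent).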
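For comparison, a bitmask enumeration produces all 2^n - 1 non-empty subsets, including ones such as 'acd' that do not appear in the output listed above. This is a minimal sketch under the assumption that the array has fewer than 31 elements; allCombos is a hypothetical name.

```javascript
// Sketch: every non-empty subset corresponds to one bitmask from 1 to 2^n - 1.
const allCombos = items => {
  const combos = [];
  for (let mask = 1; mask < (1 << items.length); mask++) { // assumes n < 31
    // Keep the items whose bit is set in the current mask.
    combos.push(items.filter((_, j) => mask & (1 << j)).join(""));
  }
  return combos;
};

const combos = allCombos(["a", "b", "c", "d"]);
console.log(combos.length);          // 15
console.log(combos.includes("acd")); // true
```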
Beating a dead horse a bit, but with smaller sets where recursion limit and performance is not a problem, the general combination generation can be done recursively with "recurse combinations containing the first element in given array" plus "recurse combinations not containing the first element". It gives quite compact implementation as a generator:
// Generator yielding k-item combinations of array a
function* choose(a, k) {
if(a.length == k) yield a;
else if(k == 0) yield [];
else {
for(let rest of choose(a.slice(1), k-1)) yield [a[0], ...rest];
for(let rest of choose(a.slice(1), k)) yield rest;
}
}
An even slightly shorter version (and about twice as fast: 1 M calls of 7 choose 5 took 3.9 seconds on my MacBook) is a function returning an array of combinations:
// Return an array of combinations
function comb(a, k) {
if(a.length === k) return [a];
else if(k === 0) return [[]];
else return [...comb(a.slice(1), k-1).map(c => [a[0], ...c]),
...comb(a.slice(1), k)];
}
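One practical difference between the two versions is laziness: with the generator, a consumer can stop as soon as it has found what it needs, and the remaining combinations are never built. A small sketch of that usage follows; firstMatching and sumsToTen are hypothetical names, and choose is repeated from above only so the block runs on its own.

```javascript
// Generator from the answer above, repeated so this block runs on its own.
function* choose(a, k) {
  if (a.length === k) yield a;
  else if (k === 0) yield [];
  else {
    for (const rest of choose(a.slice(1), k - 1)) yield [a[0], ...rest];
    for (const rest of choose(a.slice(1), k)) yield rest;
  }
}

// Hypothetical helper: return the first combination accepted by `test`,
// without generating the remaining ones.
function firstMatching(items, k, test) {
  for (const combo of choose(items, k)) {
    if (test(combo)) return combo;
  }
  return null;
}

const sumsToTen = combo => combo.reduce((sum, n) => sum + n, 0) === 10;
console.log(firstMatching([1, 2, 3, 4, 5, 6], 3, sumsToTen)); // [1, 3, 6]
```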
I have a string like the following:
11222233344444445666
What I would like to do is output each number followed by the number of times it appears:
122433475163
I want this to be efficient. I can store the counts in an object like the following:
1: { id: 1, displayed: 2},
2: { id: 2, displayed: 1},
3: { id: 3, displayed: 2},
etc.
I can access this object and increment displayed.
My issue is that there is no guarantee of the order. I would like to store the keys in the order in which they appear in the string. How do I preserve that order in the object?
This is a proposal for run-length encoding using an array that holds information about each character and its count:
{
"char": "1",
"count": 2
},
var string = "11222233344444445666",
array = function () {
var r = [], o = {};
string.split('').forEach(function (a, i, aa) {
if (a !== aa[i - 1]) {
o[a] = { char: a, count: 0 };
r.push(o[a]);
}
o[a].count++;
});
return r;
}(string);
document.write('<pre>' + JSON.stringify(array, 0, 4) + '</pre>');
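If only the encoded string from the question is needed rather than the array of objects, the runs can also be picked out with a regular expression. A minimal sketch, with runLengthEncode as a hypothetical name:

```javascript
// Sketch: /(.)\1*/g matches each maximal run of one repeated character.
const runLengthEncode = text =>
  (text.match(/(.)\1*/g) || [])
    // Each run becomes its character followed by the run length.
    .map(run => run[0] + run.length)
    .join("");

console.log(runLengthEncode("11222233344444445666")); // "122433475163"
```

Like the array version, this counts consecutive runs, so a character that reappears later in the string starts a new run.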
Quick solution with for loop:
var str = "7771122229933344444445666",
obj = {},
len = str.length,
val = null,
count_str = "",
key = "";
for (var i = 0; i < len; i++) {
val = str[i], key = 'k' + val;
if (!obj[key]) {
obj[key] = {'id': val, 'displayed': 1};
} else {
obj[key].displayed++;
}
}
for (var p in obj) {
count_str += obj[p]['id'] + obj[p]['displayed'];
}
console.log(count_str); // "7312249233475163"
Because you have such a small set of distinct numbers, I see no reason why you can't use an array (it's not ideal memory-wise if you skip values and it becomes sparse, but for such a small set it won't affect you enough to worry about). Then you can use (number-1) as the index and increment that entry as needed.
var counts = [];
var str = "11222233344444445666";
for(var i in str){
var index = parseInt(str[i])-1
counts[index] = (counts[index]||0)+1;
}
for(var i in counts){
var which = 1+parseInt(i);
var count = counts[i];
console.log("# of " + which +"'s: "+count);
}
https://jsfiddle.net/ga0fqpqn/
Note: for...in yields the indices as strings, so 1 + i would concatenate rather than add. parseInt(i) converts the index to a number; the unary plus (+i) would work as well.
You could store an additional array with the order of the numbers, which you only append to if the object doesn't yet contain the given number. Then once you're done counting, iterate through that array and output the number and the count from the lookup dictionary.
var chars = "1234576123452345".split("");
var order = [];
var hash = {};
chars.forEach(function(char) {
if (!hash[char]) {
hash[char] = 1;
order.push(char);
} else {
hash[char]++;
}
});
console.log(order.map(function(char) {
return char + hash[char];
}).join(""));
// "12233343537161"
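The same bookkeeping can be sketched with a Map. Unlike a plain object, whose integer-like keys are iterated in ascending numeric order, a Map iterates its keys in the order they were first inserted, so the separate order array is not needed. countInOrder is a hypothetical name used for illustration.

```javascript
// Sketch: a Map remembers the order in which keys were first inserted.
const countInOrder = text => {
  const counts = new Map();
  for (const char of text) {
    counts.set(char, (counts.get(char) || 0) + 1);
  }
  // Spread the Map into [char, count] pairs, in first-seen order.
  return [...counts].map(([char, count]) => char + count).join("");
};

console.log(countInOrder("1234576123452345")); // "12233343537161"
```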
Just wondering if there is some other way than this.
var hashStringArray = function(array) {
array.sort();
return array.join('|');
};
I don't like sorting much, and using that delimiter is not safe either if it's contained in one of the strings. Overall, I need to produce the same hash no matter the order of the strings. The arrays will be rather short (up to 10 items), but the hash will be needed very often, so it shouldn't be too slow.
I intend to use it with ES6 Map object and I need to easily find same array collection.
Updated example of use
var theMap = new Map();
var lookup = function(arr) {
var item = null;
var hashed = hashStringArray(arr);
if (item = theMap.get( hashed )) {
return item;
}
theMap.set( hashed, itemBasedOnInput );
return itemBasedOnInput;
}
var arr1 = ['alpha','beta','gama'];
var arr2 = ['beta','alpha','gama'];
lookup(arr1) === lookup(arr2)
Performance tests
http://jsperf.com/hashing-array-of-strings/5
Two things occurred to me as the basis of a solution:
(1) summing doesn't depend on order, which is actually a flaw in simple checksums (they don't catch changes in block order within a word), and
(2) we can convert strings to summable numbers using their charcodes
Here's a function to do (2) :
charsum = function(s) {
var i, sum = 0;
for (i = 0; i < s.length; i++) {
sum += (s.charCodeAt(i) * (i+1));
}
return sum
}
Here's a version of (1) that computes an array hash by summing the charsum values:
array_hash = function(a) {
var i, sum = 0
for (i = 0; i < a.length; i++) {
var cs = charsum(a[i])
sum = sum + (65027 / cs)
}
return ("" + sum).slice(0,16)
}
Fiddle here: http://jsfiddle.net/WS9dC/11/
If we did a straight sum of the charsum values, then the array ["a", "d"] would have the same hash as the array ["b", "c"] - leading to undesired collisions. So based on using non-UTF strings, where charcodes go up to 255, and allowing for 255 characters in each string, then the max return value of charsum is 255 * 255 = 65025. So I picked the next prime number up, 65027, and used (65027 / cs) to compute the hash. I am not 100% convinced this removes collisions... perhaps more thought needed... but it certainly fixes the [a, d] versus [b, c] case.
Testing:
var arr1 = ['alpha','beta','gama'];
var arr2 = ['beta','alpha','gama'];
console.log(array_hash(arr1))
console.log(array_hash(arr2))
console.log(array_hash(arr1) == array_hash(arr2))
Outputs:
443.5322979371356
443.5322979371356
true
And testing a case that shows different hashes:
var arr3 = ['a', 'd'];
var arr4 = ['b', 'c'];
console.log(array_hash(arr3))
console.log(array_hash(arr4))
console.log(array_hash(arr3) == array_hash(arr4))
outputs:
1320.651443298969
1320.3792001649144
false
Edit:
Here's a revised version, which ignores duplicates in the arrays as it goes and returns the hash based on unique items only:
http://jsfiddle.net/WS9dC/7/
array_hash = function(a) {
var i, sum = 0, product = 1
for (i = 0; i < a.length; i++) {
var cs = charsum(a[i])
if (product % cs > 0) {
product = product * cs
sum = sum + (65027 / cs)
}
}
return ("" + sum).slice(0, 16)
}
testing:
var arr1 = ['alpha', 'beta', 'gama', 'delta', 'theta', 'alpha', 'gama'];
var arr2 = ["beta", "gama", "alpha", "theta", "delta", "beta"];
console.log(array_hash(arr1))
console.log(array_hash(arr2))
console.log(array_hash(arr1) === array_hash(arr2))
returns:
689.878503111701
689.878503111701
true
Edit
I've revised the answer above to account for arrays of words that have the same letters. We need these to return different hashes, which they now do:
var arr1 = ['alpha', 'beta']
var arr2 = ['alhpa', 'ateb']
The fix was to add a multiplier to the charsum func based on the char index:
sum += (s.charCodeAt(i) * (i+1));
If you calculate a numeric hash code for each string, then you can combine them with an operator where the order doesn't matter, like the ^ XOR operator, then you don't need to sort the array:
function hashStringArray(array) {
var code = 0;
for (var i = 0; i < array.length; i++) {
var n = 0;
for (var j = 0; j < array[i].length; j++) {
n = n * 251 ^ array[i].charCodeAt(j);
}
code ^= n;
}
return code
};
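One property of XOR worth keeping in mind is that a string appearing twice cancels itself out, so ['alpha', 'alpha'] hashes to 0, the same as an empty array. If that matters, the per-string codes can be combined with addition instead, which is also order-independent. A minimal sketch under that assumption; hashStringArrayBySum is a hypothetical name and Math.imul is used only to keep the arithmetic in 32-bit integers.

```javascript
// Sketch: same per-string hash idea, but the per-string codes are combined
// with 32-bit addition instead of XOR.
function hashStringArrayBySum(array) {
  let code = 0;
  for (const str of array) {
    let n = 0;
    for (let j = 0; j < str.length; j++) {
      // Math.imul keeps the multiplication within 32-bit integers.
      n = Math.imul(n, 251) ^ str.charCodeAt(j);
    }
    code = (code + n) | 0; // addition is commutative, so order is irrelevant
  }
  return code;
}

console.log(hashStringArrayBySum(["alpha", "beta"]) === hashStringArrayBySum(["beta", "alpha"])); // true
console.log(hashStringArrayBySum(["alpha", "alpha"]) === hashStringArrayBySum([]));               // false
```

As with any 32-bit hash, different arrays can still collide, so a Map lookup keyed on this value would need a final equality check.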
You can do this:
var hashStringArray = function(array) {
return array.sort().join('\u200b');
};
The \u200b character is the Unicode zero-width space, which is unlikely to appear in ordinary strings. It is not the same as the \0 (null) character, which is more widely used as a separator.
'\u200b' == '\0'
> false
An idea for a very fast hash if your set of possible strings has fewer than 32 items: hash each string with a function that returns a distinct power of two:
function getStringHash(aString) {
var currentPO2 = 0;
var hashSet = [];
getStringHash = function ( aString) {
var aHash = hashSet[aString];
if (aHash) return aHash;
aHash = 1 << currentPO2++;
hashSet[aString] = aHash;
return aHash;
}
return getStringHash(aString);
}
Then use this hash on your string array, ORing the hashes ( | ) :
function getStringArrayHash( aStringArray) {
var aHash = 0;
for (var i=0; i<aStringArray.length; i++) {
aHash |= getStringHash(aStringArray[i]);
}
return aHash;
}
So to test a bit :
console.log(getStringHash('alpha')); // 1
console.log(getStringHash('beta')); // 2
console.log(getStringHash('gamma')); // 4
console.log(getStringHash('alpha')); // 1 again
var arr1 = ['alpha','beta','gama'];
var arr2 = ['beta','alpha','gama'];
console.log(getStringArrayHash(arr1)); // 11
console.log(getStringArrayHash(arr2)); // 11 also, like for arr1
var arr3 = ['alpha', 'teta'];
console.log(getStringArrayHash(arr3)); // 17 : a different array has != hashset
jsbin is here : http://jsbin.com/rozanufa/1/edit?js,console
Note: with this method, arrays are treated as sets, meaning that a repeated item won't change the hash of an array.
This has to be faster, since it uses only 1) a function call, 2) a lookup and 3) integer arithmetic.
So no sort, no (long) string, no concat.
jsperf confirms that :
http://jsperf.com/hashing-array-of-strings/4
EDIT :
version with prime numbers, here : http://jsbin.com/rozanufa/3/edit?js,console
// return the unique prime associated with the string.
function getPrimeStringHash(aString) {
var hashSet = [];
var currentPrimeIndex = 0;
var primes = [ 2, 3, 5, 7, 11, 13, 17 ];
getPrimeStringHash = function ( aString) {
var aPrime = hashSet[aString];
if (aPrime) return aPrime;
if (currentPrimeIndex == primes.length) aPrime = getNextPrime();
else aPrime = primes[currentPrimeIndex];
currentPrimeIndex++
hashSet[aString] = aPrime;
return aPrime;
};
return getPrimeStringHash(aString);
// compute next prime number, store it and returns it.
function getNextPrime() {
var pr = primes[primes.length-1];
do {
pr+=2;
var divides = false;
// discard the number if it divides by one earlier prime.
for (var i=0; i<primes.length; i++) {
if ( ( pr % primes[i] ) == 0 ) {
divides = true;
break;
}
}
} while (divides == true)
primes.push(pr);
return pr;
}
}
function getStringPrimeArrayHash( aStringArray) {
var primeMul = 1;
for (var i=0; i<aStringArray.length; i++) {
primeMul *= getPrimeStringHash(aStringArray[i]);
}
return primeMul;
}
function compareByPrimeHash( aStringArray, anotherStringArray) {
var mul1 = getStringPrimeArrayHash ( aStringArray ) ;
var mul2 = getStringPrimeArrayHash ( anotherStringArray ) ;
return ( mul1 > mul2 ) ?
! ( mul1 % mul2 )
: ! ( mul2 % mul1 );
// Note: just test for mul1 == mul2 if you are sure there are no duplicates
}
Tests :
console.log(getPrimeStringHash('alpha')); // 2
console.log(getPrimeStringHash('beta')); // 3
console.log(getPrimeStringHash('gamma')); // 5
console.log(getPrimeStringHash('alpha')); // 2 again
console.log(getPrimeStringHash('a1')); // 7
console.log(getPrimeStringHash('a2')); // 11
var arr1 = ['alpha','beta','gamma'];
var arr2 = ['beta','alpha','gamma'];
var arr4 = ['alpha','beta','gamma', 'alpha']; // == arr1 + duplicate 'alpha'
console.log(getStringPrimeArrayHash(arr1)); // 30
console.log(getStringPrimeArrayHash(arr2)); // 30 also, like for arr1
var arr3 = ['alpha', 'teta'];
console.log(getStringPrimeArrayHash(arr3)); // 26 : a different array has != hashset
console.log(compareByPrimeHash(arr1, arr2) ); // true
console.log(compareByPrimeHash(arr1, arr3) ); // false
console.log(compareByPrimeHash(arr1, arr4) ); // true despite duplicate
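One caveat with multiplying primes as ordinary numbers is that the product can exceed Number.MAX_SAFE_INTEGER once many distinct strings have been registered, after which the hashes are no longer exact. Below is a minimal sketch of the same idea with BigInt, assuming an environment that supports it; primeOf, knownPrimes, nextPrime and primeHash are hypothetical names, not part of the code above.

```javascript
// Sketch: assign each distinct string the next prime and multiply as BigInt,
// so products stay exact beyond Number.MAX_SAFE_INTEGER.
const primeOf = new Map(); // string -> BigInt prime (hypothetical registry)
const knownPrimes = [];    // primes handed out so far, in ascending order

function nextPrime() {
  let candidate = knownPrimes.length ? knownPrimes[knownPrimes.length - 1] + 1n : 2n;
  // Trial division by every smaller prime found so far.
  while (knownPrimes.some(p => candidate % p === 0n)) candidate += 1n;
  knownPrimes.push(candidate);
  return candidate;
}

function primeHash(strings) {
  let product = 1n;
  for (const s of strings) {
    if (!primeOf.has(s)) primeOf.set(s, nextPrime());
    product *= primeOf.get(s);
  }
  return product;
}

console.log(primeHash(["alpha", "beta", "gamma"])); // 30n
console.log(primeHash(["gamma", "alpha", "beta"])); // 30n
```

BigInt arithmetic is slower than plain integer arithmetic, so this trades some of the speed of the original for exactness.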
What is a clean way of taking a random sample, without replacement, from an array in JavaScript? So suppose there is an array
x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
and I want to randomly sample 5 unique values; i.e. generate a random subset of length 5. To generate one random sample one could do something like:
x[Math.floor(Math.random()*x.length)];
But if this is done multiple times, there is a risk of grabbing the same entry more than once.
I suggest shuffling a copy of the array using the Fisher-Yates shuffle and taking a slice:
function getRandomSubarray(arr, size) {
var shuffled = arr.slice(0), i = arr.length, temp, index;
while (i--) {
index = Math.floor((i + 1) * Math.random());
temp = shuffled[index];
shuffled[index] = shuffled[i];
shuffled[i] = temp;
}
return shuffled.slice(0, size);
}
var x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
var fiveRandomMembers = getRandomSubarray(x, 5);
Note that this will not be the most efficient method for getting a small random subset of a large array because it shuffles the whole array unnecessarily. For better performance you could do a partial shuffle instead:
function getRandomSubarray(arr, size) {
var shuffled = arr.slice(0), i = arr.length, min = i - size, temp, index;
while (i-- > min) {
index = Math.floor((i + 1) * Math.random());
temp = shuffled[index];
shuffled[index] = shuffled[i];
shuffled[i] = temp;
}
return shuffled.slice(min);
}
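One thing to watch with the partial shuffle: when size exceeds arr.length, min becomes negative and the loop runs past index 0. The following is a minimal sketch of the same technique with the size clamped; the name partialShuffleSample and the clamp itself are assumptions added for illustration, not part of the answer above.

```javascript
// Sketch: partial Fisher-Yates shuffle with a clamp on size.
// partialShuffleSample is a hypothetical name; the clamp is an added assumption.
function partialShuffleSample(arr, size) {
  var shuffled = arr.slice(0);
  var i = arr.length;
  // Never ask for more items than exist, and never fewer than zero.
  var wanted = Math.max(0, Math.min(size, i));
  var min = i - wanted, temp, index;
  while (i-- > min) {
    // Pick a random position in the not-yet-fixed prefix [0, i].
    index = Math.floor((i + 1) * Math.random());
    temp = shuffled[index];
    shuffled[index] = shuffled[i];
    shuffled[i] = temp;
  }
  // The last `wanted` slots hold the sample.
  return shuffled.slice(min);
}

var source = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
var picked = partialShuffleSample(source, 5);
console.log(picked.length);                              // 5
console.log(new Set(picked).size);                       // 5: no value picked twice
console.log(partialShuffleSample([1, 2, 3], 10).length); // 3
```

Because the function works on a copy, source keeps its original order.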
A little late to the party but this could be solved with underscore's new sample method (underscore 1.5.2 - Sept 2013):
var x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
var randomFiveNumbers = _.sample(x, 5);
In my opinion, shuffling the entire deck is not necessary. You just need to make sure your sample is random, not your deck. What you can do is take the first size items from the front, then swap each of them with an item at a random position in the sampling array. So, if you allow the array to be modified in place, it gets more and more shuffled with every call.
function getRandom(length) { return Math.floor(Math.random()*(length)); }
function getRandomSample(array, size) {
var length = array.length;
for(var i = size; i--;) {
var index = getRandom(length);
var temp = array[index];
array[index] = array[i];
array[i] = temp;
}
return array.slice(0, size);
}
This algorithm takes only 2*size steps, counting the slice call, to select the random sample.
More Random
To make the sample more random, we can randomly select the starting point of the sample. But it is a little more expensive to get the sample.
function getRandomSample(array, size) {
var length = array.length, start = getRandom(length);
for(var i = size; i--;) {
var index = (start + i)%length, rindex = getRandom(length);
var temp = array[rindex];
array[rindex] = array[index];
array[index] = temp;
}
var end = start + size, sample = array.slice(start, end);
if(end > length)
sample = sample.concat(array.slice(0, end - length));
return sample;
}
What makes this more random is that when you always shuffle only the front items, you tend not to get them in the sample very often if the sampling array is large and the sample is small. This would not be a problem if the array were not supposed to always stay the same. So, what this method does is change the position where the shuffled region starts.
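The wrap-around at the end of the array is the part that is easiest to get wrong, so here it is in isolation. This is only a sketch: circularSlice is a hypothetical helper name, and it assumes size is no larger than array.length.

```javascript
// Sketch: take `size` items starting at `start`, wrapping past the end.
// circularSlice is a hypothetical name used only for this illustration.
function circularSlice(array, start, size) {
  var end = start + size;
  // slice() stops at the end of the array on its own.
  var part = array.slice(start, end);
  if (end > array.length) {
    // Pick up the remainder from the front of the array.
    part = part.concat(array.slice(0, end - array.length));
  }
  return part;
}

var letters = ['a', 'b', 'c', 'd', 'e'];
console.log(circularSlice(letters, 3, 4)); // [ 'd', 'e', 'a', 'b' ]
console.log(circularSlice(letters, 1, 2)); // [ 'b', 'c' ]
```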
No Replacement
To avoid copying the sampling array while still leaving it unchanged afterwards, you can do the following, but it takes 3*size steps instead of 2*size.
function getRandomSample(array, size) {
var length = array.length, swaps = [], i = size, temp;
while(i--) {
var rindex = getRandom(length);
temp = array[rindex];
array[rindex] = array[i];
array[i] = temp;
swaps.push({ from: i, to: rindex });
}
var sample = array.slice(0, size);
// Put everything back.
i = size;
while(i--) {
var pop = swaps.pop();
temp = array[pop.from];
array[pop.from] = array[pop.to];
array[pop.to] = temp;
}
return sample;
}
No Replacement and More Random
To apply the algorithm that gave a little bit more random samples to the no replacement function:
function getRandomSample(array, size) {
var length = array.length, start = getRandom(length),
swaps = [], i = size, temp;
while(i--) {
var index = (start + i)%length, rindex = getRandom(length);
temp = array[rindex];
array[rindex] = array[index];
array[index] = temp;
swaps.push({ from: index, to: rindex });
}
var end = start + size, sample = array.slice(start, end);
if(end > length)
sample = sample.concat(array.slice(0, end - length));
// Put everything back.
i = size;
while(i--) {
var pop = swaps.pop();
temp = array[pop.from];
array[pop.from] = array[pop.to];
array[pop.to] = temp;
}
return sample;
}
Faster...
Like all of these posts, this uses the Fisher-Yates shuffle, but I removed the overhead of copying the array.
function getRandomSample(array, size) {
var r, i = array.length, end = i - size, temp, swaps = getRandomSample.swaps;
while (i-- > end) {
r = getRandom(i + 1);
temp = array[r];
array[r] = array[i];
array[i] = temp;
swaps.push(i);
swaps.push(r);
}
var sample = array.slice(end);
while(size--) {
i = swaps.pop();
r = swaps.pop();
temp = array[i];
array[i] = array[r];
array[r] = temp;
}
return sample;
}
getRandomSample.swaps = [];
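A quick way to convince yourself that the swaps really are undone is to compare the array before and after a call. The sketch below uses the same swap-then-restore idea but keeps the swap stack local for simplicity; sampleAndRestore is a hypothetical name, and it assumes size is no larger than array.length.

```javascript
function getRandom(length) { return Math.floor(Math.random() * length); }

// Sketch: partial Fisher-Yates in place, then undo the swaps in reverse order.
// sampleAndRestore is a hypothetical name; the local stack is a simplification.
function sampleAndRestore(array, size) {
  var r, i = array.length, end = i - size, temp, swaps = [];
  while (i-- > end) {
    r = getRandom(i + 1);
    temp = array[r]; array[r] = array[i]; array[i] = temp;
    swaps.push(i, r);            // remember which two slots were exchanged
  }
  var sample = array.slice(end); // the last `size` slots hold the sample
  while (swaps.length) {
    r = swaps.pop();
    i = swaps.pop();
    temp = array[i]; array[i] = array[r]; array[r] = temp;
  }
  return sample;
}

var data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
var before = data.slice();
var sample = sampleAndRestore(data, 5);
console.log(sample.length);                       // 5
console.log(data.join(',') === before.join(',')); // true: original order restored
```

Undoing the swaps last-in, first-out is what makes the restore exact; replaying them in the original order would not generally put the array back.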
Or... if you use underscore.js...
var _und = require('underscore');
...
function sample(a, n) {
return _und.take(_und.shuffle(a), n);
}
Simple enough.
You can get a 5-element sample this way:
var sample = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
.map(a => [a,Math.random()])
.sort((a,b) => {return a[1] < b[1] ? -1 : 1;})
.slice(0,5)
.map(a => a[0]);
You can define it as a function to use in your code:
var randomSample = function(arr,num){ return arr.map(a => [a,Math.random()]).sort((a,b) => {return a[1] < b[1] ? -1 : 1;}).slice(0,num).map(a => a[0]); }
Or add it to the Array object itself:
Array.prototype.sample = function(num){ return this.map(a => [a,Math.random()]).sort((a,b) => {return a[1] < b[1] ? -1 : 1;}).slice(0,num).map(a => a[0]); };
If you want, you can split the code into two separate functions (shuffle and sample):
Array.prototype.shuffle = function(){ return this.map(a => [a,Math.random()]).sort((a,b) => {return a[1] < b[1] ? -1 : 1;}).map(a => a[0]); };
Array.prototype.sample = function(num){ return this.shuffle().slice(0,num); };
While I strongly support using the Fisher-Yates Shuffle, as suggested by Tim Down, here's a very short method for achieving a random subset as requested, mathematically correct, including the empty set, and the given set itself.
Note: this solution depends on lodash / underscore:
Lodash v4
const _ = require('lodash')
function subset(arr) {
return _.sampleSize(arr, _.random(arr.length))
}
Lodash v3
const _ = require('lodash')
function subset(arr) {
return _.sample(arr, _.random(arr.length));
}
If you're using lodash the API changed in 4.x:
const oneItem = _.sample(arr);
const nItems = _.sampleSize(arr, n);
https://lodash.com/docs#sampleSize
A lot of these answers talk about cloning, shuffling, and slicing the original array. I was curious why this helps from an entropy/distribution perspective.
I'm no expert, but I did write a sample function that uses indexes to avoid any array mutations, although it does add to a Set. I also don't know what the random distribution of this is, but the code was simple enough that I think it warrants an answer here.
function sample(array, size = 1) {
const { floor, random } = Math;
let sampleSet = new Set();
for (let i = 0; i < size; i++) {
let index;
do { index = floor(random() * array.length); }
while (sampleSet.has(index));
sampleSet.add(index);
}
return [...sampleSet].map(i => array[i]);
}
const words = [
'confused', 'astonishing', 'mint', 'engine', 'team', 'cowardly', 'cooperative',
'repair', 'unwritten', 'detailed', 'fortunate', 'value', 'dogs', 'air', 'found',
'crooked', 'useless', 'treatment', 'surprise', 'hill', 'finger', 'pet',
'adjustment', 'alleged', 'income'
];
console.log(sample(words, 4));
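One caveat with retrying until an unused index turns up: if size is larger than array.length, there is never an unused index left and the loop does not terminate. On the distribution question, each accepted draw is uniform over the indexes not yet chosen, so this behaves like drawing without replacement. A minimal sketch with the size clamped follows; sampleClamped is a hypothetical name and the clamp is an added assumption.

```javascript
// Sketch: Set-based index sampling with the size clamped to the array length.
// sampleClamped is a hypothetical name; the clamp is not in the answer above.
function sampleClamped(array, size = 1) {
  const wanted = Math.max(0, Math.min(size, array.length));
  const chosen = new Set();
  // Adding an index that is already present is a no-op, so the loop
  // simply retries until `wanted` distinct indexes have been collected.
  while (chosen.size < wanted) {
    chosen.add(Math.floor(Math.random() * array.length));
  }
  // A Set iterates in insertion order, so this is the order of selection.
  return [...chosen].map(i => array[i]);
}

const animals = ['dog', 'cat', 'goat', 'pig', 'seal'];
console.log(sampleClamped(animals, 3).length);  // 3
console.log(sampleClamped(animals, 10).length); // 5
```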
Perhaps I am missing something, but it seems there is a solution that does not require the complexity or potential overhead of a shuffle:
function sample(array,size) {
const results = [],
sampled = {};
while(results.length<size && results.length<array.length) {
const index = Math.trunc(Math.random() * array.length);
if(!sampled[index]) {
results.push(array[index]);
sampled[index] = true;
}
}
return results;
}
Here is another implementation based on the Fisher-Yates shuffle, optimized for the case where the sample size is significantly smaller than the array length. It neither scans the entire array nor allocates arrays as large as the original. It uses sparse arrays to reduce memory allocation.
function getRandomSample(array, count) {
var indices = [];
var result = new Array(count);
for (let i = 0; i < count; i++ ) {
let j = Math.floor(Math.random() * (array.length - i) + i);
result[i] = array[indices[j] === undefined ? j : indices[j]];
indices[j] = indices[i] === undefined ? i : indices[i];
}
return result;
}
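The trick is easier to see if the sparse array is written as an explicit map of "virtual swaps": only the slots that would have been overwritten by a real Fisher-Yates swap are recorded. The following is a sketch of that reading, not the answer's own code; sparseSample and the helper at are hypothetical names, and it assumes count is no larger than array.length.

```javascript
// Sketch: Fisher-Yates on a virtual array, recording only the moved slots.
// sparseSample and `at` are hypothetical names used for this illustration.
function sparseSample(array, count) {
  const moved = new Map(); // slot -> index of the element currently "stored" there
  const at = k => (moved.has(k) ? moved.get(k) : k);
  const result = [];
  for (let i = 0; i < count; i++) {
    // Pick a slot j in [i, array.length).
    const j = i + Math.floor(Math.random() * (array.length - i));
    result.push(array[at(j)]);
    // Virtual swap: slot j now holds what slot i held. Slot i is never
    // read again, so the other half of the swap does not need recording.
    moved.set(j, at(i));
  }
  return result;
}

const pool = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'];
const few = sparseSample(pool, 3);
console.log(few.length);        // 3
console.log(new Set(few).size); // 3: no element picked twice
```

The map never holds more than count entries, which is where the memory saving over a full copy comes from.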
You can remove the elements from a copy of the array as you select them. Performance is probably not ideal, but it might be OK for what you need:
function getRandom(arr, size) {
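var copy = arr.slice(0), rand = [];
// Fix the number of picks up front: copy.length shrinks on every splice,
// so testing i < copy.length inside the loop would stop too early.
var count = Math.min(size, copy.length);
for (var i = 0; i < count; i++) {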
var index = Math.floor(Math.random() * copy.length);
rand.push(copy.splice(index, 1)[0]);
}
return rand;
}
For very large arrays, it's more efficient to work with indexes rather than the members of the array.
This is what I ended up with after not finding anything I liked on this page.
/**
* Get a random subset of an array
* @param {Array} arr - Array to take a sample of.
* @param {Number} sample_size - Size of sample to pull.
* @param {Boolean} return_indexes - If true, return indexes rather than members
* @returns {Array|Boolean} - An array containing a random subset of the members or indexes, or false if sample_size exceeds arr.length.
*/
function getArraySample(arr, sample_size, return_indexes = false) {
if(sample_size > arr.length) return false;
const sample_idxs = [];
const randomIndex = () => Math.floor(Math.random() * arr.length);
while(sample_size > sample_idxs.length){
let idx = randomIndex();
while(sample_idxs.includes(idx)) idx = randomIndex();
sample_idxs.push(idx);
}
sample_idxs.sort((a, b) => a > b ? 1 : -1);
if(return_indexes) return sample_idxs;
return sample_idxs.map(i => arr[i]);
}
My approach is to create a getRandomIndexes method that builds an array of the indexes you will pull from the main array. I added some simple logic to avoid repeating an index in the sample. This is how it works:
const getRandomIndexes = (length, size) => {
const indexes = [];
const created = {};
while (indexes.length < size) {
const random = Math.floor(Math.random() * length);
if (!created[random]) {
indexes.push(random);
created[random] = true;
}
}
return indexes;
};
Whatever your array contains, this function gives you an array of indexes that you can use to pull values from an array of length length, so the array can be sampled with:
const myArray = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
getRandomIndexes(myArray.length, 3).map(i => myArray[i])
Every time you call the method you get a different sample of myArray. This solution already works, but it could be even better if it sampled different sizes. If you want to do that, you can use:
getRandomIndexes(myArray.length, Math.ceil(Math.random() * 6)).map(i => myArray[i])
This will give you a different sample size from 1-6 every time you call it.
I hope this has helped :D
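Two edge cases are worth noting here. Math.random() can return exactly 0, in which case Math.ceil(Math.random() * 6) gives 0 rather than 1, and a size larger than length leaves the while loop with nothing new to find. A hedged sketch that handles both follows; randomInt is a hypothetical helper name and the clamp is an added assumption.

```javascript
// randomInt is a hypothetical helper: uniform integer in [min, max], inclusive.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Same idea as getRandomIndexes above, with the size clamped to length
// (the clamp is an addition so the loop always terminates).
const getRandomIndexes = (length, size) => {
  const wanted = Math.min(size, length);
  const indexes = [];
  const created = {};
  while (indexes.length < wanted) {
    const random = Math.floor(Math.random() * length);
    if (!created[random]) {
      indexes.push(random);
      created[random] = true;
    }
  }
  return indexes;
};

const myArray = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'];
const size = randomInt(1, 6); // always between 1 and 6
const sample = getRandomIndexes(myArray.length, size).map(i => myArray[i]);
console.log(sample.length === size); // true
```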
Underscore.js is about 70kb. If you don't need all the extra crap, rando.js is only about 2kb (97% smaller), and it works like this:
console.log(randoSequence([8, 6, 7, 5, 3, 0, 9]).slice(-5));
<script src="https://randojs.com/2.0.0.js"></script>
You can see that it keeps track of the original indices by default in case two values are the same but you still care about which one was picked. If you don't need those, you can just add a map, like this:
console.log(randoSequence([8, 6, 7, 5, 3, 0, 9]).slice(-5).map((i) => i.value));
<script src="https://randojs.com/2.0.0.js"></script>
D3-array's shuffle uses the Fisher-Yates shuffle algorithm to randomly re-order arrays. It is a mutating function, meaning that the original array is re-ordered in place, which is good for performance.
D3 is for the browser - it is more complicated to use with node.
https://github.com/d3/d3-array#shuffle
npm install d3-array
//import {shuffle} from "d3-array"
let x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
d3.shuffle(x)
console.log(x) // it is shuffled
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.0.0/d3.min.js"></script>
If you don't want to mutate the original array
let x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
let shuffled_x = d3.shuffle(x.slice()) //calling slice with no parameters returns a copy of the original array
console.log(x) // not shuffled
console.log(shuffled_x)
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.0.0/d3.min.js"></script>