Efficient way to dedupe large list of arrays

I have a very large array of arrays (on the order of 960,799 entries or possibly much larger). I need to process it into a new array such that:
Each sub-array contains no duplicates.
The main array contains no duplicate sub-arrays.
The problem is that "duplicate sub-arrays" must include arrays with the same values in a different order. In other words, if I had these sub-arrays:
[[1,2,3], [1,2,3], [3,1,2]]
They would all be considered duplicates and only one would be kept (any of them, it doesn't matter; I've been just keeping the first one; it's also fine if the order of the selected sub-array doesn't actually match, i.e. if the order of elements in the sub-array changes during processing).
My attempted solution has been to map all the sub-arrays into strings based on de-duping the sub-array, sorting it, and joining it with a delimiter. Then I de-dupe that final array, then map them back to arrays with a split. It works, but the process is extremely slow. It takes over 30 seconds for a single pass, and since the array I end up processing can grow exponentially larger, this is not acceptable. I need a more efficient algorithm.
Here's the code I'm using now that's slow (ret is the input array):
const stringList = ret.map(list => {
return [...new Set(list)].sort().join('|');
});
const hashSet = new Set(stringList);
const output = [...hashSet].map(str => str.split('|'));
Can anyone help me get the same result more efficiently? Thanks.
EDIT
To elaborate, I'm getting these massive input arrays by calculating what is essentially the power set of some input of strings. This is the code; if it's possible to stop it from producing duplicate entries in the first place, that would work well, too, I think:
// Calculate the Cartesian product of set s
function cart(s) {
return s.reduce((acc, val) => {
return acc.map((x, i) => {
return val.map(y => {
return x.concat([y]);
});
}).flat();
}, [[]]);
}
// Use the Cartesian product to calculate the power set of set s
function pset(s) {
let ret = [];
for (let i = 0; i < s.length; ++i) {
const temp = [];
for (let j = 0; j <= i; ++j) {
temp.push([].concat(s));
}
ret = ret.concat(cart(temp));
}
return ret;
}

You could generate the power set without duplicates.
function pset(array) {
function iter(index, temp) {
if (index >= array.length) {
temp.length && result.push(temp);
return;
}
iter(index + 1, temp.concat(array[index]));
iter(index + 1, temp);
}
var result = [];
iter(0, []);
return result;
}
console.log(pset(['a', 'b', 'c']));

Given that I'm not able to perform a benchmark with real data, I can't verify how much faster this approach is for your use case, but by using basic for loops and avoiding functional code as much as conveniently possible, I've come up with the following:
const ret = [[1, 2, 3], [1, 2, 3], [3, 1, 2], [1, 4, 5], [4, 1, 5]];
function ascending (a, b) {
// works for strings and numbers
return -(a < b) || +(a > b);
}
function ascending2d (a, b) {
const aLength = a.length;
const bLength = b.length;
const length = Math.min(aLength, bLength);
for (let i = 0; i < length; ++i) {
const difference = ascending(a[i], b[i]);
if (difference !== 0) return difference;
}
return aLength - bLength;
}
for (let i = 0; i < ret.length; ++i) {
ret[i].sort(ascending);
}
ret.sort(ascending2d);
const output = [ret[0]];
for (let i = 1; i < ret.length; ++i) {
const value = ret[i];
if (ascending2d(ret[i - 1], value) !== 0) output.push(value);
}
console.log(output);
Let me know if this is an improvement over your current approach. You can always improve performance further by profiling your code and looking for bottlenecks that can be re-written.
Performance Benchmark
I've published a benchmark using the test data in my example here, comparing your original solution, my solution, and Andrew's solution. I couldn't include Nina's for comparison because hers doesn't perform deduplication on ret, instead it modifies the generation of ret.

EDIT: Never mind, I had not benchmarked my implementation. It is slower, due to the underlying implementation of JSON.parse and JSON.stringify and the default algorithm for Array#sort.
Since you're looking for bleeding edge performance, it's hard to get an elegant solution. If you instantiate an object with Object.create(null) you minimize the overhead for O(1) insertion. It creates a POJO with no prototype. You also don't need to check in the for in loop for Object.hasOwnProperty, because there's no prototype to search.
const ret = [[], [1, 2, 3], [3, 1, 2], [1, 4, 5], [4, 1, 5]];
const hashMap = Object.create(null)
function createUniqArraysOfPrimitiveArrays(ret) {
for (let i = 0; i < ret.length; i++) {
const currEl = ret[i]
if (currEl.length === 0) {
hashMap['[]'] = null
} else if (currEl.length === 1) {
hashMap[`[${currEl[0]}]`] = null
} else {
hashMap[JSON.stringify(currEl.sort())] = null
}
}
const outputArray = []
for (const array in hashMap) {
outputArray.push(JSON.parse(array))
}
return outputArray
}
console.log(createUniqArraysOfPrimitiveArrays(ret))
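The key-based idea from the question can also be written without the final split or JSON.parse step, by storing the canonical sub-array itself next to its key. The following is only a sketch, not a benchmarked solution: dedupeUnordered is a hypothetical helper name, and it assumes the elements are all strings or all numbers and never contain the delimiter.

```javascript
// Hypothetical helper: dedupe sub-arrays regardless of element order.
// Assumption: elements are all strings or all numbers, none containing the delimiter.
function dedupeUnordered(lists, delimiter = '|') {
  const seen = new Map();
  for (const list of lists) {
    // Drop duplicates inside the sub-array, then sort to get a canonical order.
    // The default (string) sort is enough here because it only has to be consistent.
    const canonical = [...new Set(list)].sort();
    const key = canonical.join(delimiter);
    // Keep the first canonical array seen for each key.
    if (!seen.has(key)) seen.set(key, canonical);
  }
  return [...seen.values()];
}

console.log(dedupeUnordered([[1, 2, 3], [1, 2, 3], [3, 1, 2], [1, 4, 5], [4, 1, 5, 5]]));
// [[1, 2, 3], [1, 4, 5]]
```

Because the Map holds the arrays directly, the elements keep their original types instead of coming back as strings. Whether this is faster on real data would have to be measured.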

Related

javascript function to find the second largest element in an array

I am completing the hackerrank's 10 days of javascript challenge. The question:
write a function to take an array as an argument and then return the second largest element in the array.
I have written the code but my code is returning the largest element and not the second largest as asked.
function getSecondLargest(nums) {
// Complete the function
var largest=nums[0];
for(let i=1;i<nums.length;++i)
{
if(nums[i]>largest)
largest=nums[i];
}
var large=nums[0];
for(let j=1;j<nums.length;++j)
{
if(large<nums[j]&&large<largest)
large=nums[j];
}
return large;
}
With the input array nums = [2,3,6,6,5] the result is 6, while the expected output is 5. Please help and point out the errors in the function above.

You should not initialize large with the first value (var large=nums[0];), because that may be the largest value, in which case the second loop never updates it.
You should use nums[j]<largest instead of large<largest, as mentioned above.
I don't think you need the second loop: all checks can be done in the first loop, and you can assign the previous largest to large whenever largest changes:
function getSecondLargest(nums) {
var largest = nums[0];
var large;
for (let i = 1; i < nums.length; ++i) {
if (nums[i] > largest) {
large = largest;
largest = nums[i];
} else if (nums[i] < largest && (typeof large === 'undefined' || nums[i] > large)) {
// repeats of the largest value are skipped so they are not reported as second largest
large = nums[i];
}
}
return large;
}
console.log(getSecondLargest([5,1,-2,3]))
console.log(getSecondLargest([-5,1,-2,3]))

GET SECOND LARGEST
First, I create a new array with unique values.
let arr = [...new Set(nums)];
Second, I sort the values using the built-in function .sort().
Note: by default .sort() compares elements as strings, which gives the wrong order for numbers (for example, 10 sorts before 9). So I pass (a, b) => { return a - b } to sort numerically.
arr = arr.sort((a, b) => { return a - b });
Third, I get the value from arr. If there is only one unique value, I fall back to it.
let result = arr.length > 1 ? arr[arr.length - 2] : arr[0];
Finally, I return the result.
return result
function getSecondLargest(nums) {
let arr = [...new Set(nums)];
// Without a comparator, .sort() compares elements as strings.
arr = arr.sort((a, b) => { return a - b });
// A length check is used instead of ||, which would misfire when the second largest value is 0.
let result = arr.length > 1 ? arr[arr.length - 2] : arr[0];
return result
}

Just one minor change:
Use nums[j]<largest instead of large<largest in the second for loop
function getSecondLargest(nums) {
// Complete the function
var largest=nums[0];
for(let i=1;i<nums.length;++i)
{
if(nums[i]>largest)
largest=nums[i];
}
var large;
//To ensure that the selected number is not the largest
for(let j=0;j<nums.length;++j)
{
if (nums[j] !== largest){
large = nums[j];
break;
}
}
for(let j=1;j<nums.length;++j)
{
if(large<nums[j]&&nums[j]!=largest)
large=nums[j];
}
return large;
}
var secondLargest = getSecondLargest([6,3,6,6,5]);
console.log("Second largest number", secondLargest);

If you want to avoid using library functions like @ifaruki suggests, this line
if(large<nums[j]&&large<largest)
should read
if (large<nums[j] && nums[j] < largest)
Sorting and picking the second or second-to-last value fails when there are duplicates of the highest value in the input array.

Another simple approach is to remove duplicates from the array and sort it.
let givenArray = [2, 3, 6, 6, 5];
let uniqueArray = [...new Set(givenArray)];
// sort numerically; the default sort compares elements as strings
uniqueArray.sort((a, b) => a - b);
console.log("The second largest element is", uniqueArray[uniqueArray.length - 2]);

I know you had your question answered, just thought I would provide my solution for any future users looking into this.
You can use reduce to go through the array while remembering the two largest numbers so far.
You just make a simple reduction function:
function twoMax(two_max, candidate)
{
if (candidate > two_max[0]) return [candidate,two_max[0]];
else if (candidate > two_max[1]) return [two_max[0],candidate];
else return two_max;
}
And then you use it for example like this:
let my_array = [0,1,5,7,0,8,12];
let two_largest = my_array.reduce(twoMax,[-Infinity,-Infinity]);
let second_largest = two_largest[1];
This solution doesn't require sorting and goes through the array only once.
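One caveat: when the maximum is repeated, as in [2, 3, 6, 6, 5], the reducer above ends with [6, 6], so the "second largest" equals the largest. If the second largest distinct value is wanted, a variant could ignore candidates equal to the current maximum. This is a sketch; twoMaxDistinct is a hypothetical name, not part of the answer above.

```javascript
// Variant reducer that tracks the two largest DISTINCT values seen so far.
function twoMaxDistinct(twoMax, candidate) {
  const [first, second] = twoMax;
  if (candidate > first) return [candidate, first];
  // Candidates equal to the current maximum are skipped.
  if (candidate < first && candidate > second) return [first, candidate];
  return twoMax;
}

const secondLargestDistinct =
  [2, 3, 6, 6, 5].reduce(twoMaxDistinct, [-Infinity, -Infinity])[1];
console.log(secondLargestDistinct); // 5
```

If the array has fewer than two distinct values, the second slot stays at -Infinity, which the caller can check for.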

If you want to avoid using the sort method, I think this is the simplest logic; it also works for arrays that contain duplicates of the largest integer. Note that it overwrites the largest values in the input array with -Infinity.
function getSecondLargest(arr) {
const largest = Math.max.apply(null, arr);
for (let i = 0; i < arr.length; i++) {
if (largest === arr[i]) {
arr[i] = -Infinity;
}
}
return Math.max.apply(null, arr);
}
console.log(getSecondLargest([5, 7, 11, 11, 11])); //7

Optimize Time Complexity For Odd Occurrences In Array

I have this code that pairs identical elements in an array, with the expectation that the array has an odd length, and returns the only element that could not be paired. The code works fine for smaller arrays, but for very large arrays, or arrays with large integers of over 1 billion, the time complexity becomes O(N**2), so I need to refactor it to get much better performance. Here is my code:
function solution(A) {
if(!Array.isArray(A)) return 0;
var temp = new Array(A.length);
var position = 0;
for(let i=0; i<A.length; i++){
if(temp.includes(A[i])){
position = temp.indexOf(A[i]);
index = A.indexOf(A[i]);
delete temp[position];
delete A[index];
delete A[i];
}else{
temp[i] = A[i];
}
}
for(let j=0; j<A.length; j++){
if(A[j] !== undefined) return A[j];
else continue;
}
}
To test it, the source data can look like [2,3,6,7,3,5,5,6,2], which gives an output of 7. But when the array is very large, such as [1,2,....] with length n = 999,999 or n = 5000,000,000, the running time grows far too quickly.

You might use Object to store non-paired elements only.
Please note that you don't need to store all the array elements and their counts in the Object and then filter by count (like @StepUp does).
Everything is done in a single loop.
The function returns Array of all non-paired elements:
const solution = A => Array.isArray(A) ?
Object.keys(
A.reduce((r, k) => {
r[k] = r[k] || 0;
if (++r[k] > 1) delete r[k];
return r;
}, {})
) : [];
console.log(solution([2, 3, 6, 7, 3, 5, 5, 6, 2]))

We can find the odd occurrences in one pass by using an object as a hash map. An object stores key-value pairs, and access by key is O(1) on average. So each time we meet an element we increment its count, and afterwards we keep the keys whose count is odd:
const hashMap = arr.reduce((a, c)=> {
a[c] = a[c] || 0;
a[c] += 1;
return a;
},{})
// an odd count (1, 3, 5, ...) means one occurrence was left without a pair
const result = Object.keys(hashMap).filter(key => hashMap[key] % 2 === 1);
An example:
let arr = [2, 3, 6, 7, 3, 5, 5, 6, 2];
const hashMap = arr.reduce((a, c)=> {
a[c] = a[c] || 0;
a[c] += 1;
return a;
},{})
const result = Object.keys(hashMap).filter(key => hashMap[key] % 2 === 1);
console.log(result);

My two 100% JavaScript solutions with optimized time complexity. The first one is using Set:
function solution(A) {
const pairs = new Set();
for (const num of A) {
if (pairs.has(num)) {
pairs.delete(num);
} else {
pairs.add(num);
}
}
const [unpaired] = pairs;
return unpaired;
}
The second one is using bitwise XOR:
function solution(A) {
let unpaired = 0;
for (const num of A) {
unpaired ^= num;
}
return unpaired;
}
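The XOR version works because x ^ x is 0 and x ^ 0 is x, and XOR can be applied in any order, so every paired value cancels and only the unpaired one remains. One limitation worth knowing: JavaScript's bitwise operators convert their operands to 32-bit signed integers, so this only gives the right answer for integers in that range. A small sketch (unpairedByXor is a hypothetical name):

```javascript
// XOR all values together; pairs cancel out and the unpaired value is left.
// Assumption: every value is an integer that fits in 32 bits (signed).
function unpairedByXor(values) {
  let acc = 0;
  for (const v of values) {
    acc ^= v;
  }
  return acc;
}

console.log(unpairedByXor([2, 3, 6, 7, 3, 5, 5, 6, 2])); // 7

// Outside the 32-bit range the operand is truncated before the XOR:
console.log(unpairedByXor([2 ** 32 + 7])); // 7, not 4294967303
```

For inputs that may exceed that range, the Set-based version does not have this restriction.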

JS - Lesson on codility involving perm check

I'm studying a solution to this lesson:
https://app.codility.com/programmers/lessons/4-counting_elements/perm_check/
I came across this solution made by a GitHub user:
https://github.com/daraosn/codility/tree/master/02-CountingElements/02-PermCheck/javascript
I understood everything in the code below:
function solution(A) {
var N = A.length;
var sum = (N * (N+1)) / 2;
var tap = [];
for (var i in A) {
sum-=A[i];
if(tap[A[i]]) {
return 0;
}
tap[A[i]] = true;
}
return +(sum==0);
}
with the exception of these lines:
if(tap[A[i]]) {
return 0;
}
tap[A[i]] = true;
What is their purpose? I didn't understand.
I tested deleting these lines from the answer in the Codility interface, and it scored 75% instead of the 100% it scored with them.

function solution(A) {
const set = new Set(A)
const max = Math.max(...A)
return set.size === max && set.size === A.length ? 1:0
}

That section checks to see if the number being iterated over has been found before, and per the instructions, duplicates are forbidden:
A permutation is a sequence containing each element from 1 to N once, and only once.
On every normal iteration, the current number being iterated over is assigned to a property of tap:
tap[A[i]] = true;
Then, on subsequent iterations, that test checks to see if the new number being iterated over has already been used:
if(tap[A[i]]) {
return 0;
}
This helps to invalidate inputs like [2, 2, 2], while permitting inputs like [1, 2, 3].
That said, there are two major red flags with this. First, for..in shouldn't be used to iterate over arrays. Instead:
for (const num of A) {
// use num
}
Also, sparse arrays are a very bad idea - it would make much more sense to use an object:
var tap = {};
or a Set:
var tap = new Set();
for (const num of A) {
sum -= num;
if (tap.has(num)) {
return 0;
}
tap.add(num);
}
return +(sum == 0);
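Putting those suggestions together, a complete function could look like the sketch below. It is a variant rather than the original author's code: instead of the running sum it checks that every value lies in the range 1..N, which together with the duplicate check is enough, because N distinct values in 1..N must be exactly the numbers 1 to N. The name isPermutation is hypothetical.

```javascript
// Returns 1 if A is a permutation of 1..N (N = A.length), otherwise 0.
function isPermutation(A) {
  const N = A.length;
  const seen = new Set();
  for (const num of A) {
    // A value outside 1..N, or one seen before, rules out a permutation.
    if (!Number.isInteger(num) || num < 1 || num > N || seen.has(num)) {
      return 0;
    }
    seen.add(num);
  }
  // N distinct integers, all within 1..N: every number appears exactly once.
  return 1;
}

console.log(isPermutation([4, 1, 3, 2])); // 1
console.log(isPermutation([4, 1, 3]));    // 0 (2 is missing)
console.log(isPermutation([2, 2, 2]));    // 0 (duplicates)
```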

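The array-based solution is not the most appropriate approach, as explained above. But here is a sort-based solution, in case you want it :) Note that it is O(n log n) because of the sort, and that it sorts A in place.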
const solution = A => ~~(A.sort((a,b) => a-b).every((a,i) => a === i+1));

Finding nested duplicate arrays in JavaScript. (Nested Array uniq in lodash/underscore)

I am trying to determine if an array of JavaScript arrays contains duplicates. Is this possible? I am first trying to see if I can strip the duplicates out and then do an equality check but I cannot get past the first part. Here is what underscore returns:
var arr1 = [[1,2], [2,3], [1,2]];
var arr2 = _.uniq(arr1);
var arraysAreEqual = _.isEqual(arr1, arr2);
console.log(arraysAreEqual, arr1, arr2);
// true
Jsbin: http://jsbin.com/vogumo/1/edit?js,console
Anyone know of a way to determine if the array contains duplicate arrays?

It's a little sloppy, but (possible)
var arr2 = _.uniq(arr1, function(item) {
return JSON.stringify(item);
});
will give you a correct result

Try This:
var numArray = [1, 7, 3, 0, 9, 7, 8, 6, 2, 3];
var duplicates = [];
var sortednumArray = numArray.sort();
for (var i = 0; i < sortednumArray.length; i++) {
//console.log(sortednumArray[i]);
if (sortednumArray[i] == sortednumArray[i + 1]) {
duplicates.push(sortednumArray[i]);
}
}
if (duplicates.length == 0) {
console.log("Sorted Array:");
for(var i = 0; i < sortednumArray.length; i++) {
console.log(sortednumArray[i]);
}
} else {
console.log("Duplicates:");
for(var i = 0; i < duplicates.length; i++){
console.log(duplicates[i]);
}
}
The program pushes all duplicates to an array called 'duplicates' and then displays it; if none are present, it displays the sorted version of numArray.

From the underscore.js documentation:
uniq _.uniq(array, [isSorted], [iteratee]) Alias: unique
Produces a duplicate-free version of the array, using === to test object equality. If you know in advance that the array is sorted, passing true for isSorted will run a much faster algorithm. If you want to compute unique items based on a transformation, pass an iteratee function.
But arrays can't be strictly compared in JavaScript.
Therefore, you can use a transformation function to enable comparison with uniq. For example:
console.log([1,2] === [1,2]) // false, can't strict compare arrays
console.log([1,2].toString()) // "1,2" - string representation
console.log([1,2].toString() === [1,2].toString()) // true, strings can be compared
var valueToString = function(v) {return v.toString()}; // transform array to string
var arr1 = [[1,2], [2,3], [1,2]];
var arr2 = _.uniq(arr1, false, valueToString); // compare based on transformation
var arraysAreEqual = _.isEqual(arr1, arr2);
console.log("arraysAreEqual:", arraysAreEqual, arr1, arr2);
// false
// [[1, 2], [2, 3], [1, 2]]
// [[1, 2], [2, 3]]
Note that transforming to string is "hacky": you would be better off comparing each value of the array, as discussed in this StackOverflow question.
By using the proposed equals implementation in that question, you would need to implement your own version of uniq that uses equals instead of ===.
The implementation of uniq in Underscore is very straight-forward - it creates a new result array and loops through the given array. If the current value is not already in result, insert it.
console.log("Using array comparison:");
var arrayEquals = function (array1, array2) {
// if any array is a falsy value, return
if (!array1 || !array2)
return false;
// compare lengths - can save a lot of time
if (array1.length != array2.length)
return false;
for (var i = 0, l=array1.length; i < l; i++) {
// Check if we have nested arrays
if (array1[i] instanceof Array && array2[i] instanceof Array) {
// recurse into the nested arrays
if (!arrayEquals(array1[i],array2[i]))
return false;
}
else if (array1[i] !== array2[i]) {
return false;
}
}
return true;
};
_.uniqArrays = function(array) {
if (array == null) return [];
var result = [];
for (var i = 0, length = array.length; i < length; i++) {
var value = array[i];
var arrayEqualsToValue = arrayEquals.bind(this, value); // arrayEquals with first argument set to value
var existing = _.find(result, arrayEqualsToValue); // did we already find this?
if (!existing) {
result.push(value);
}
}
return result;
};
var arr3 = _.uniqArrays(arr1);
arraysAreEqual = _.isEqual(arr1, arr3);
console.log("arraysAreEqual:", arraysAreEqual, arr1, arr3); // false
I made a jsbin with all the code, if you want to play around.

In the latest lodash (4.6.1) you could do something like this:
if (_.uniqWith(arr, _.isEqual).length < arr.length) {
// then there were duplicates
}
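If no library is available, or if only a yes/no answer is needed, the same check can be sketched with a Set of serialized keys, which stops at the first repeat. This assumes the inner arrays contain only JSON-serializable values (numbers, strings, booleans, nested arrays); hasDuplicateArrays is a hypothetical name.

```javascript
// Returns true as soon as two inner arrays serialize to the same JSON string.
// Assumption: inner arrays hold only JSON-serializable values.
function hasDuplicateArrays(arrays) {
  const seen = new Set();
  for (const arr of arrays) {
    const key = JSON.stringify(arr);
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

console.log(hasDuplicateArrays([[1, 2], [2, 3], [1, 2]])); // true
console.log(hasDuplicateArrays([[1, 2], [2, 1]]));         // false (order matters)
```

Like the JSON.stringify iteratee shown earlier, this treats arrays as equal only when their elements appear in the same order.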

Binomial sub arrays

I have an A array with n length.
I want to take all possible k (0
For example, if A has length five:
[1,2,3,4,5]
and k = 3, the algorithm must give me array B:
[1,2,3 ]
[1,2, 4 ]
[1,2, 5]
[1, 3,4 ]
[1, 3, 5]
[1, 4,5]
[ 2,3,4 ]
[ 2,3, 5]
[ 2, 4,5]
[ 3,4,5]
The length of B would be equal to n!/(k!(n-k)!) (the binomial coefficient; '!' means factorial).
I'm using JavaScript, so I included it in my tags, but this is just about the algorithm; it doesn't have to be written in JavaScript.
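The expected length of B can be computed directly, which is handy for checking any implementation. A small sketch (binomial is a hypothetical helper name) that uses the multiplicative form of the formula, so that no large factorials are needed:

```javascript
// Number of k-element subsets of an n-element set: n! / (k! (n-k)!).
// Computed step by step; each intermediate result is itself a binomial
// coefficient, so it stays an integer (within the safe integer range).
function binomial(n, k) {
  if (k < 0 || k > n) return 0;
  k = Math.min(k, n - k); // C(n, k) === C(n, n - k)
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

console.log(binomial(5, 3)); // 10, matching the ten rows listed above
```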

You could do this via a filter method.
In your example you want to receive all permutations of an array, taking a specific number of elements of that array.
You can easily do that in an iterative manner.
Start by taking all permutations of n - 1 elements of an array:
// return all (n - 1) element permutations of an array
var permutations = function(arr) {
return arr.reduce(function(re, value, i) {
// add an array for each element in the original array
return re.concat([arr.filter(function(v, index) {
// drop each element with the same index
return index !== i
})])
}, [])
}
Now permutations([1,2,3]) would return [[2,3], [1,3], [1,2]]
The resulting arrays are always distinct, provided the source array contains only unique values.
To receive all 3-element arrays of a 5-element array, you would first calculate the list of 4-element arrays and transform each of them to a 3-element array.
permutations([1,2,3,4]).map(permutations)
// [2,3,4] => [[3,4], [2,4], [2,3]]
// [1,3,4] => [[3,4], [1,4], [1,3]]
// [1,2,4] => [[2,4], [1,4], [1,2]]
// [1,2,3] => [[2,3], [1,3], [1,2]]
Obviously the problem here is that there are duplicates.
That can be solved by dropping all non-unique values.
var unique = function(arr) {
var s = arr.map(function(v) { return "" + v })
return arr.filter(function(v, i) { return s.indexOf("" + v) == i })
}
Packing it all into one function could be done like this:
var permutationsWithLength = function(arr, length) {
var re = [arr]
// each pass removes one element from every array, so run arr.length - length passes
for (var i = arr.length; i > length; i--) {
re = re.reduce(function(tmp, perms) {
return unique(tmp.concat(permutations(perms)))
}, [])
}
return re
}
I admit that this may not be the fastest approach, especially regarding the unique function, but it's a very generic one and will work for the problem you described even with larger arrays.
Hope it helps ;)

Below is the copy-paste from one of my projects. Don't know if it still works ;)
var choose = function choose_func(elems, len) {
var result = [];
for (var i=0; i<elems.length; i++) {
if (len == 1) {
result.push([elems[i]]);
} else {
var remainingItems = choose_func(elems.slice(i+1, elems.length), len - 1);
for (var j=0; j<remainingItems.length; j++)
result.push([elems[i]].concat(remainingItems[j]));
}
}
return result;
};
var result = choose([1,2,3,4,5], 3)
/*result = [[1,2,3],[1,2,4],[1,2,5],[1,3,4],[1,3,5],
[1,4,5],[2,3,4],[2,3,5],[2,4,5],[3,4,5]] */
