I have a group of arrays from which I need to filter out duplicates. The result must satisfy two conditions: within each array there are no duplicates, and within the whole group no two arrays hold the same values.
The first part is easy - for each inner array, I can apply Set to the array and filter it out. So, given the matrix arrays I can apply the following to filter:
const sets : string[][] = arrays.map(arr=>[...new Set(arr)].sort());
This will give me an array of sets. How can I make this into a set of sets? As in, if sets=[[a, b],[c],[d, a],[c],[e]] I would like setOfSets to equal [[a, b],[c],[d, a],[e]]?
Applying setOfSets = [...new Set(sets)]; would not work, since arrays that are equal are not considered equal by default if they have different addresses. Is there a way to force set to check by value, or another effective way to create this effect?
Edit
Original matrix:
[[a, b, b],
[c,c],
[b,a],
[d,a],
[c,c],
[e,e]]
after creating and sorting sets:
[[a,b],
[c],
[a,b],
[d,a],
[c],
[e]]
desired result:
[[a,b],
[c],
[d,a],
[e]]
If the data in your set is easy to serialize, I would opt for a solution like this:
const data = [
["a", "b", "b"],
["c","c"],
["b","a"],
["d","a"],
["c","c"],
["e","e"]
];
// Create the "hash" of your set
const serializeSet = s => Array
.from(s)
.sort()
.join("___");
// Create a map (or object) that ensures 1 entry per hash
const outputMap = data
.map(xs => new Set(xs))
.reduce(
(acc, s) => acc.set(serializeSet(s), s),
new Map()
);
// Turn your Map and Sets back in to arrays
const output = Array
.from(outputMap.values())
.map(s => Array.from(s));
console.log(output);
To come up with a good hash function for your set, you need to have a good look at your data. For example:
When your arrays consist of single characters from a-z, like in my example above, we can sort those strings using a default sorter and then join the result using a character from outside the a-z range.
If your arrays consist of random strings or numbers, JSON.stringify(Array.from(s).sort()) is safer to use
When your arrays consist of plain objects, you could JSON.stringify its sorted elements, but watch out for differences in the order of objects properties! (e.g. {a: 1, b: 2} vs {b: 2, a: 1})
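As a minimal sketch of the JSON.stringify variant mentioned above (the function name uniqueSets and the sample data are illustrative assumptions, and every element is assumed to be a string), the key below stays unambiguous even when an element happens to contain the separator that join("___") would use:

```javascript
// Deduplicate arrays by value, ignoring element order and repeated elements.
// Assumes every element is a string, so the default sort() is adequate.
function uniqueSets(arrays) {
  const seen = new Map();
  for (const arr of arrays) {
    const members = [...new Set(arr)].sort();
    // JSON quoting and escaping keep keys of different sets distinct.
    const key = JSON.stringify(members);
    if (!seen.has(key)) seen.set(key, members); // keep the first occurrence
  }
  return [...seen.values()];
}

// ["a___b"] and ["a", "b"] would collide under join("___"), but not here.
const tricky = [["a___b"], ["a", "b"], ["b", "a", "a"], ["c"]];
console.log(uniqueSets(tricky)); // [["a___b"], ["a", "b"], ["c"]]
```

The trade-off is a slightly longer key in exchange for not having to reason about which separator characters can appear in the data.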
I am learning about join algorithms in regards to relational data query processing. The simple case is the nested loop join:
function nestedJoin(R, S, compare) {
const out = []
for (const r of R) {
for (const s of S) {
if (compare(r, s)) {
out.push([ r, s ])
}
}
}
return out
}
Where compare would compare the join attribute.
The case I'm wondering about is the index join. Copying from that cheat sheet into JS sort of, we have:
function indexJoin(R, S) {
const out = []
for (const r of R) {
const X = findInIndex(S.C, r.c)
for (const s of X) {
out.push([ r, s ])
}
}
return out
}
But what is that findInIndex(S.C, r.c)? What is being passed into it (S.C)? And how does it work?
The join indices paper says this:
With no indices, two basic algorithms based on sorting [4] and hashing
[5, 10] avoid the prohibitive cost of the nested loop method.
A join index is a binary relation. It only contains pairs of surrogates which makes
it small. However, for generality, we assume that it does not always fit in RAM.
Therefore, a join index must be clustered. Since we may need fast access to JI
tuples via either r values or s values depending on whether there are selects on
relations R or S, a JI should be clustered on (r, s). A simple and uniform solution
is to maintain two copies of the JI, one clustered on r and the other clustered on
s. Each copy is implemented by a B+-tree, an efficient variation of the versatile
B-tree [1, 7].
So if it were a B+tree, what would the key and value be in each B+tree, and how would you use them (in what order do you plug in keys and get values)? Also, couldn't the "join index" just be implemented something like this in JavaScript?
const joinIndex = {}
function join(r, s) {
const rest = joinIndex[r.c] = joinIndex[r.c] ?? {}
rest[s.c] = true
}
function findInIndex(leftKey) {
return Object.keys(joinIndex[leftKey])
}
Please show how the join algorithm would be implemented, either with my approach or the B+tree approach. If it is the B+tree approach, you don't need to implement a B+tree, but just explain how you would plug things in/out of the 2 B+trees to explain the algorithm with more clarity.
First of all, the join index that the paper speaks of can best be imagined as a table that implements a many-to-many relationship between two tables. A record in this join index table consists of two foreign keys: one referencing the primary key in the R table, and another referencing the primary key in the S table.
I didn't get the S.C notation used in the cheat sheet. But it is clear you'll somehow need to specify which join index to use, and more specifically, which B+Tree (clustering) you want to use on it (in case two of them are defined), and finally, which value (r.c, the key of r) you want to find in it.
The role of the B+tree is to provide an ordered dictionary, i.e. a structure where you can search for a key efficiently and then easily walk from that point to the subsequent entries in order. In this particular use for a join index, this allows you to efficiently find all pairs (r1, s) for a given r1. The first of those is found by drilling down from the root of the B+tree to the first leaf entry having r1. A walk forward across the bottom layer of the B+tree then finds all the other tuples with r1, until a tuple is encountered that no longer has this r1 value.
Note that you still need an index on the original tables as well, in order to find the complete record for a given key. In practice that could also be done with a B+Tree, but in JavaScript, a simple dictionary (plain object) would suffice.
So in JavaScript syntax we could imagine something like this:
// Arguments:
// - joinIndexTree: a B+Tree having (rKey, sKey) tuples, keyed and ordered by rKey.
// - rKey: the key to search all matches for
function findInIndex(joinIndexTree, rKey) {
let result = []; // This will collect all the sKey for
// which there is a (rKey, sKey)
// Find left-most location in B+Tree where rKey should occur (if present)
let btreeCursor = joinIndexTree.find(rKey);
if (btreeCursor.EOF()) return result; // At the far-right end of the B+Tree
let tuple = btreeCursor.get(); // Read the tuple found at this location
while (tuple[0] == rKey) { // First member of tuple matches rKey
result.push(tuple[1]); // Collect the corresponding s-value
btreeCursor.next(); // Walk to the next tuple
if (btreeCursor.EOF()) break; // At the end of the B+Tree
tuple = btreeCursor.get(); // Read the tuple found at this location
}
return result;
}
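To try this scan without a real B+Tree, here is a minimal sketch that uses a sorted array as a stand-in. The class SortedTupleIndex and the function scanIndex are hypothetical names introduced only for illustration, keys are assumed to be numbers, and the cursor offers only the find/EOF/get/next operations assumed above. A real B+Tree would replace the binary search with a descent from the root, but the forward walk is the same.

```javascript
// Hypothetical stand-in for a B+Tree: a sorted array of [rKey, sKey] tuples.
class SortedTupleIndex {
  constructor(tuples) {
    // Order by rKey, then sKey, like an index clustered on (r, s).
    this.tuples = [...tuples].sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  }

  // Binary search for the left-most position whose rKey is >= the given key,
  // then hand back a cursor positioned there.
  find(rKey) {
    const tuples = this.tuples;
    let lo = 0;
    let hi = tuples.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (tuples[mid][0] < rKey) lo = mid + 1;
      else hi = mid;
    }
    let pos = lo;
    return {
      EOF: () => pos >= tuples.length,
      get: () => tuples[pos],
      next: () => { pos += 1; },
    };
  }
}

// Same scan as findInIndex, repeated here so the sketch runs on its own.
function scanIndex(index, rKey) {
  const result = [];
  const cursor = index.find(rKey);
  while (!cursor.EOF() && cursor.get()[0] === rKey) {
    result.push(cursor.get()[1]); // collect the matching sKey
    cursor.next();
  }
  return result;
}

const demoIndex = new SortedTupleIndex([[2, 20], [1, 11], [3, 30], [1, 10]]);
console.log(scanIndex(demoIndex, 1)); // [10, 11]
console.log(scanIndex(demoIndex, 4)); // []
```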
The main program would be:
let joinIndexTree; // Placeholder: assign a B+Tree with (rKey, sKey) pairs, ordered by rKey
const sIndex = Object.fromEntries(S.map(s => [s.id, s])); // dictionary
function indexJoin(joinIndexTree, R, S, sIndex) {
const out = []
for (const r of R) {
const sids = findInIndex(joinIndexTree, r.id)
for (const s_id of sids) {
const s = sIndex[s_id]; // Look up by primary key index
out.push([ r, s ])
}
}
return out
}
When you only need read-only operations on the table (queries), then instead of a B+Tree, you can create a dictionary of arrays, where you look up joinIndex[r.id] and get an array of s.id values. This is certainly easy to set up and work with, but it is a pain to keep updated when the tables are not read-only.
As an alternative to a B+Tree, you can also use other balanced search trees, such as AVL and red-black trees, but in my experience B+Trees have superior performance.
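The read-only dictionary-of-arrays variant described above could look roughly like this. It is a sketch under assumptions: the sample records, the pairs array (standing in for the join index relation) and the function name dictionaryJoin are all made up for illustration.

```javascript
// Hypothetical sample tables; only the id field matters for the join.
const R = [{ id: 1 }, { id: 2 }, { id: 3 }];
const S = [{ id: 10 }, { id: 11 }, { id: 12 }];

// The join index as a binary relation of (r.id, s.id) pairs.
const pairs = [[1, 10], [1, 11], [3, 12]];

// Build the dictionary once: r.id -> array of matching s.id values.
const joinIndex = {};
for (const [rId, sId] of pairs) {
  if (!joinIndex[rId]) joinIndex[rId] = [];
  joinIndex[rId].push(sId);
}

// Primary-key index on S, so a full record can be fetched by id.
const sIndex = Object.fromEntries(S.map(s => [s.id, s]));

function dictionaryJoin(R, joinIndex, sIndex) {
  const out = [];
  for (const r of R) {
    // An r without any partner simply contributes nothing.
    for (const sId of joinIndex[r.id] ?? []) {
      out.push([r, sIndex[sId]]);
    }
  }
  return out;
}

const joined = dictionaryJoin(R, joinIndex, sIndex);
console.log(joined.map(([r, s]) => [r.id, s.id])); // [[1, 10], [1, 11], [3, 12]]
```

Keeping this structure current under inserts and deletes means updating the arrays by hand, which is the maintenance cost mentioned above.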
I need some help finding the symmetric difference of a multidimensional array and a simple array. The first value in each inner array of the multidimensional array is the key that is compared against the simple array.
So
array1 = [1,4,6,7]
array2 = [[1,"more",12],[8,"some",12]]
the result should be something like:
compare(array1, array2) = //[4,6,7] // there are three differences when compared this way
compare(array2, array1) = //[8,"some",12] // there is only one difference when compared this way
I need to return an array that has both the difference of array1 from array2 AND the difference of array2 from array1, each in the same format as the lead array.
Ideally these are not overwriting the existing arrays but creates a new with the output results. There won't be other array formats besides these two array formats.
Each compare can use a different function if it helps. You don't have to use the same function for both, but if you can, great.
I tried a few permutations of loop comparisons
Also solutions found here
How to get the difference between two arrays of objects in JavaScript
And of the simple array methods here
How to get the difference between two arrays in JavaScript?
But I am just not succeeding. Can someone give me a hand, and also explain their solution? Any modern tools are fine as long as it's broadly cross-browser compatible. All my other code sticks to ES6, so that would be ideal. If whipping out a one-liner solution, please explain what is going on so I can learn.
Thanks!
Update: @Dave, this made sense to me, but after it failed I started trying different filter methods and other techniques from the posts above, without much success.
let newNurkles = new Array();
for(var i = 0; i < nurkles.length; i++){
if(this.activeNurkles.includes(nurkles[i])){
} else {
newNurkles.push(nurkles[i]);// if not then push to array
}
}
console.warn("Nurkles to Add" + newNurkles);
This shows how to perform a disjunctive union (symmetric difference) on two arrays, one being single dimensional while the other is a multidimensional array.
Each element of the single-dimensional array is matched against the first element of each sub-array in the multidimensional one. The multidimensional array is only one level deep.
Uses: Set, Array.prototype.map(), Array.prototype.filter()
Steps:
Collect the match keys of each input into a Set
Filter the first input to exclude elements found among the keys of the second input
Filter the second input to exclude sub-arrays whose key is found in the first input
Notes:
o is the current element of array1
t is the current sub-array of array2
t[0] represents the match key
Results from array2 will produce a multidimensional array
const array1 = [1, 4, 6, 7];
const array2 = [[1, "more", 12],[8, "some", 12], [7, 3, 9], [2, 7, 5, 4], [4, 3]];
const keys1 = new Set(array1); // match keys of array1
const keys2 = new Set(array2.map(t => t[0])); // match keys of array2
const oneToTwo = array1.filter(o => !keys2.has(o)); // [6]
const twoToOne = array2.filter(t => !keys1.has(t[0])); // [[8, "some", 12], [2, 7, 5, 4]]
console.log(oneToTwo);
console.log(twoToOne)
I am trying to sort a Map using the code below:
var m=new Map();
m.set('0900','0910');
m.set('1100','1200');
m.set('1000','1030');
m.set('1235','1240');
var ma=new Map([...m.entries()].sort());
console.log(ma);
Output:{ 900 => 910, 1000 => 1030, 1100 => 1200, 1235 => 1240}
The map is sorted correctly, but when I use integers instead of strings it is no longer sorted:
var m=new Map();
m.set(0900,0910);
m.set(1100,1200);
m.set(1000,1030);
m.set(1235,1240);
var ma=new Map([...m.entries()].sort());
console.log(ma)
Output:
{1000 => 1030, 1100 => 1200, 1235 => 1240, 900 => 910}
The sort() function, when you don't supply a compareFunction as an argument, does not work the way you might instinctively expect. See the following quote from the relevant MDN page:
If compareFunction is not supplied, all non-undefined array elements
are sorted by converting them to strings and comparing strings in
UTF-16 code units order. For example, "banana" comes before "cherry".
In a numeric sort, 9 comes before 80, but because numbers are
converted to strings, "80" comes before "9" in the Unicode order. All
undefined elements are sorted to the end of the array.
The numeric sort bit in the quote explains why you're getting two different sorts with strings and numbers (with "0900" and 900). To overcome this, simply provide a function to sort that does the comparisons the way you want, like so:
let ma = new Map([...m.entries()].sort((a, z) => a[0] - z[0]));
You can look into the details of how these compareFunctions work in the same MDN page: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort
Here, the first element (the key) of each entry is extracted with array destructuring and then compared:
var m=new Map();
m.set(0900,0910);
m.set(1100,1200);
m.set(1000,1030);
m.set(1235,1240);
var ma=new Map([...m.entries()].sort(([a], [b]) => a - b));
for(const e of ma) {
console.log(e);
}
MDN on Map.entries():
The entries() method returns a new Iterator object that contains the [key, value] pairs for each element in the Map object in insertion order.
When .sort() is called on the entries without a comparator, it first converts each key-value pair into a string before comparing. That means the entry for 0900 becomes "900,910", which sorts after "1235,1240" because '9' > '1' (the first character of each string).
If you want to sort the entries by key, you will need to pass in a custom compareFunction argument to sort, which handles the key-value pairs and compares the keys as numbers:
var m = new Map();
m.set(0900,0910);
m.set(1100,1200);
m.set(1000,1030);
m.set(1235,1240);
var ma = new Map([...m.entries()].sort((kv1, kv2) => kv1[0] - kv2[0]));
console.log(ma);
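One side note on the snippets above: numeric literals with a leading zero such as 0900 are a legacy form that strict mode and ES modules reject as a syntax error. A minimal sketch of the same sort with plain decimal literals follows; the variable names times, sortedTimes and defaultOrder are illustrative only.

```javascript
// Same data as above, written without leading zeros.
const times = new Map([[900, 910], [1100, 1200], [1000, 1030], [1235, 1240]]);

// Numeric comparison of the keys, extracted by destructuring each entry.
const sortedTimes = new Map([...times].sort(([a], [b]) => a - b));
console.log([...sortedTimes.keys()]); // [900, 1000, 1100, 1235]

// For contrast: the default sort compares the entries as strings,
// e.g. "900,910" against "1000,1030".
const defaultOrder = [...times].sort().map(([key]) => key);
console.log(defaultOrder); // [1000, 1100, 1235, 900]
```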
I have an array a which is filled with node objects. For example, I have 200 nodes on my screen and all are saved inside this array. These nodes are addressed by individual indexes a[0], a[1], etc. Now, when I select a number of random nodes (for example with shortest path) and store them in a second array b which looks like b = [ object, object, object, object.....],
is there a way to automatically return the position in array a? For example, when I click a random node which is at the 3rd position in array a, that is at a[2], I want this position to be returned or stored automatically, maybe in a third array c. At the end, it could look like: c = [ a[2], a[6], a[18], a[7], a[0] .....]. Hope someone can help me with my problem. Thanks so much!
I'm not sure of the format of your arrays and what the objects contain. Using indexOf, https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Array/indexOf, may be a solution.
Using indexOf you can search the first array for the object that was clicked and stored in the second array, and then push the resulting index to a third array.
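A minimal sketch of that idea, assuming b holds references to the very same objects that are stored in a (the node objects and their label field are made up for illustration):

```javascript
// Hypothetical node objects standing in for the nodes on screen.
const a = [{ label: "n0" }, { label: "n1" }, { label: "n2" }, { label: "n3" }];

// b holds references to some of the same objects, in selection order.
const b = [a[2], a[0], a[3]];

// indexOf compares with ===, so it finds the identical object in a.
const c = b.map(node => a.indexOf(node));
console.log(c); // [2, 0, 3]

// A look-alike copy is a different object and is therefore not found.
console.log(a.indexOf({ label: "n2" })); // -1
```

Each indexOf call scans a from the start, which is fine for a few hundred nodes; for much larger arrays, a Map from node to index built once would avoid the repeated scans.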
If I understand your problem correctly, your array a looks like this :
a = [{id:1} , {id:2} , {id:3} , {id:4}]
and then you randomly click on elements, so that they are then stored in an array b as such :
b = [{id:3}, {id:1}]
So eventually you want to memorize the information that a[2] is stored in position b[0], and a[0] is stored in position b[1], right?
What you could do is not store the entire objects in b, but just their id :
b = [3, 1]
This way, you can easily find what position they're at :
function findPosition(node) {
return b.indexOf(node.id); // position of this node's id within b, or -1
}
console.log(findPosition(a[2])) // 0
Is there any way efficiently to join JSON data? Suppose we have two JSON datasets:
{"COLORS":[[1,red],[2,yellow],[3,orange]]}
{"FRUITS":[[1,apple],[2,banana],[3,orange]]}
And I want to turn this into the following client side:
{"NEW_FRUITS":[[1,apple,red],[2,banana,yellow],[3,orange,orange]]}
Keep in mind there will be thousands of records here with much more complex data structures. jQuery and vanilla javascript are both fine. Also keep in mind that there may be colors without fruits and fruits without colors.
NOTE: For the sake of simplicity, let's say that the two datasets are both in the same order, but the second dataset may have gaps.
Alasql JavaScript SQL library does exactly what you need in one line:
<script src="alasql.min.js"></script>
<script>
var data = { COLORS: [[1,"red"],[2,"yellow"],[3,"orange"]],
FRUITS: [[1,"apple"],[2,"banana"],[3,"orange"]]};
data.NEW_FRUITS = alasql('SELECT MATRIX COLORS.[0], COLORS.[1], FRUITS.[1] AS [2] \
FROM ? AS COLORS JOIN ? AS FRUITS ON COLORS.[0] = FRUITS.[0]',
[data.COLORS, data.FRUITS]);
</script>
You can play with this example in jsFiddle.
This is a SQL expression, where:
SELECT - select operator
MATRIX - modifier, which converts the result set from an array of objects to an array of arrays
COLORS.[0] - first column of COLORS array, etc.
FRUITS.[1] AS [2] - the second column of array FRUITS will be stored as third column in resulting recordset
FROM ? AS COLORS - data array from parameters named COLORS in SQL statement
JOIN ? ON ... - join
[data.COLORS, data.FRUITS] - parameters with data arrays
The fact that there will be thousands of inputs and the keys are not necessarily ordered means your best bet (at least for large objects) is to sort by key first. For objects of size less than about 5 or so, a brute-force n^2 approach should suffice.
Then you can write out the result by walking through the two arrays in parallel, appending new "records" to your output as you go. This sort-then-merge idea is a relatively powerful one and is used frequently. If you do not want to sort first, you can add elements to a priority queue, merging as you go. The sort-then-merge approach is conceptually simpler to code perhaps; if performance matters you should do some profiling.
For colors-without-fruits and fruits-without-colors, I assume writing null for the missing value is sufficient. If the same key appears more than once in either color or fruit, you can either choose one arbitrarily, or throw an exception.
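The sort-then-merge idea described above can be sketched as follows. This is an illustration under assumptions, not the linked fiddle's code: the function name mergeJoin and the sample data are made up, keys are assumed to be numbers that are unique within each input, and missing values are written as null.

```javascript
// Full outer join of two arrays of [key, value] pairs by sort-then-merge.
function mergeJoin(fruits, colors) {
  const byKey = (x, y) => x[0] - y[0];
  const f = [...fruits].sort(byKey); // sort copies so the inputs stay untouched
  const c = [...colors].sort(byKey);
  const out = [];
  let i = 0;
  let j = 0;
  // Walk both sorted arrays in parallel, always consuming the smaller key.
  while (i < f.length || j < c.length) {
    if (j >= c.length || (i < f.length && f[i][0] < c[j][0])) {
      out.push([f[i][0], f[i][1], null]); // fruit without a color
      i++;
    } else if (i >= f.length || c[j][0] < f[i][0]) {
      out.push([c[j][0], null, c[j][1]]); // color without a fruit
      j++;
    } else {
      out.push([f[i][0], f[i][1], c[j][1]]); // keys match
      i++;
      j++;
    }
  }
  return out;
}

const fruits = [[3, "orange"], [1, "apple"], [4, "kiwi"]];
const colors = [[1, "red"], [2, "yellow"], [3, "orange"]];
console.log(mergeJoin(fruits, colors));
// [[1, "apple", "red"], [2, null, "yellow"], [3, "orange", "orange"], [4, "kiwi", null]]
```

The two sorts dominate the cost; the merge itself visits each record once.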
ADDENDUM I did a fiddle as well: http://jsfiddle.net/LuLMz/. It makes no assumptions on the order of the keys nor any assumptions on the relative lengths of the arrays. The only assumptions are the names of the fields and the fact that each subarray has two elements.
There is not a direct way, but you can write logic to get a combined object like this. Since "apple, red, banana...." are all strings, they should be wrapped in a single or double quote.
If you can match the COLORS and FRUITS config array by adding null values for missing items then you can use this approach.
Working demo
var colors = {"COLORS":[[1,'red'],[2,'yellow'],[3,'orange']]}
var fruits = {"FRUITS":[[1,'apple'],[2,'banana'],[3,'orange']]}
var newFruits = {"NEW_FRUITS": [] }
//Just to make sure both arrays are the same size, otherwise the logic will break
if(colors.COLORS.length == fruits.FRUITS.length){
var temp;
$.each(fruits.FRUITS, function(i){
temp = this;
temp.push(colors.COLORS[i][1]); // index 1 holds the color name
newFruits.NEW_FRUITS.push(temp);
});
}
Alternatively, if you can create the colors and fruits configs as arrays of objects, instead of arrays of arrays, you can try this solution. The entries are still paired by position, so both arrays must have the same size and list their keys in the same order.
Working demo
var colors = {"COLORS":[ {"1": 'red'}, { "2": 'yellow'}, {"3":'orange'}]}
var fruits = {"FRUITS":[ {"1":'apple'}, { "2": 'banana'}, {"3":'orange'}]}
var newFruits = {"NEW_FRUITS": [] }
if(colors.COLORS.length == fruits.FRUITS.length){
var temp, first;
$.each(fruits.FRUITS, function(i){
for(first in this)break;
temp = {};
temp[first] = [];
temp[first].push(this[first]);
temp[first].push(colors.COLORS[i][first]);
newFruits.NEW_FRUITS.push(temp);
});
}