I have a dataset that I add regularly to a Dexie database, which includes data like this:
...
name: Bill Smith, age: 21, location: New York
name: John Smith, age: 36, location: Los Angeles // These are two different people
name: John Smith, age: 36, location: Los Angeles // with the same information
name: Susie Smith, age: 72, location: Detroit
...
I'm hashing each entry to create a unique submission id:
// Create a submission.id by hashing the input data
const input = `${submission.name}${submission.age}${submission.location}`;
const encoder = new TextEncoder();
const data = encoder.encode(input);
crypto.subtle.digest('SHA-256', data)
  .then((hash) => {
    // Convert the ArrayBuffer to a hex string
    const hashArray = Array.from(new Uint8Array(hash));
    const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
    submission.id = hashHex;
  });
You can see my problem. When I go to detect duplicates, both Johns have been assigned the same hashed id and are erroneously detected as duplicate entries.
// Detect duplicate ids
db.entries.filter(entry => entry.id === submission.id)
  .count(count => {
    if (count > 0)
      console.log("ID already exists in the database. Ending loop.");
  });
I don't have any control over the structure of the data I'm working with; otherwise I would assign unique identifiers when the data is created.
So my question for you folks is: How do I detect and discard "true duplicates" while identifying "non-duplicates" like our two Johns and adding them both to the database?
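One possible direction, sketched under a loud assumption: every row inside a single import batch is a distinct person, so two identical rows in the same batch are two people, while the same row arriving again in a later batch is a re-submission. Under that assumption you could number the repeats within a batch and make the occurrence number part of the text you hash. The helper name `assignOccurrenceKeys` and the `key` field are hypothetical; `key` would simply take the place of `input` in the SHA-256 step.

```javascript
// Hypothetical helper: gives each row a key made of its fields plus an
// occurrence counter, so identical rows within ONE batch stay distinct.
// Assumption: identical rows in the same batch are different people.
function assignOccurrenceKeys(batch) {
  const seen = new Map(); // base key -> times seen so far in this batch
  return batch.map((submission) => {
    // JSON.stringify of an array keeps field boundaries unambiguous,
    // unlike plain concatenation ("ab" + "c" equals "a" + "bc").
    const base = JSON.stringify([submission.name, submission.age, submission.location]);
    const occurrence = (seen.get(base) || 0) + 1;
    seen.set(base, occurrence);
    return { ...submission, key: `${base}#${occurrence}` };
  });
}

const batch = [
  { name: 'Bill Smith', age: 21, location: 'New York' },
  { name: 'John Smith', age: 36, location: 'Los Angeles' },
  { name: 'John Smith', age: 36, location: 'Los Angeles' },
];
const keyed = assignOccurrenceKeys(batch);
console.log(keyed.map((s) => s.key));
```

Running the same batch again produces the same keys, so rows that are already stored would still be caught by the id check. If the assumption does not hold for your data, the three fields alone cannot tell the two cases apart.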
Related
I am working on something where I take data from 2 different APIs that I have no control of and I want to combine the results in the most efficient way.
One of the arrays holds some assets, let's say books; the other holds the transactions for those books. Here is an example:
const array1 = [
  {
    author: {name: 'J.K. Rowling'},
    assetName: 'Book1'
  }]
const array2 = [
  {from: 'John',
    to: 'Sarah',
    price: 10,
    timeStamp: 123,
    assetName: 'Book1',
    authorName: 'J.K. Rowling'
  }]
Note that finding the transaction for a given book requires both assetName and authorName to match. You can own more than one book by the same author, and you can own two books with the same name by different authors, but an author has only one book with a given name. There are no other unique identifiers.
The naive approach is to iterate over one of the arrays and, for each entry, search the second array for the matching transaction, but that looks like it will take too long if the arrays are of substantial size.
What more efficient ways are there to merge two arrays of objects with different structures?
Well, if author.name + assetName form an id, you could iterate over array1 once & create a Map with keys being author.name + assetName & values being the original objects.
Then you could iterate over array2 once as well & enrich it in whatever way you want. All lookups in the second iteration will be fast since you access the Map instead of searching the array.
const indexedArray1 = new Map();
// Index every book under its composite key
array1.forEach(data => indexedArray1.set(data.author.name + data.assetName, data));
const enrichedArray2 = array2.map(transaction => {
  const relatedBook = indexedArray1.get(transaction.authorName + transaction.assetName);
  // Merge relatedBook & transaction the way you want here, for example:
  return { ...transaction, book: relatedBook };
});
I often do the following when merging arrays
The time complexity is O(n)
const array1 = [{
author: {name: 'J.K. Rowling' },
assetName: 'Book1'
}]
const array2 = [{
from: 'John',
to: 'Sarah',
price: 10,
timeStamp: 123,
assetName: 'Book1',
authorName: 'J.K. Rowling'
}]
const array2_map = {}
array2.forEach(e => {
const key = `${e.assetName}:${e.authorName}`
if (!array2_map[key]) array2_map[key] = []
const { from, to, price, timeStamp } = e
array2_map[key].push({
from,
to,
price,
timeStamp
})
})
// array1 stores the author under author.name, so build the key from that
const merged_array = array1.map(e => ({
  ...e,
  transaction: array2_map[`${e.assetName}:${e.author.name}`] || []
}))
const users = [
{id:1, email:"abc#email.com"},
{id:2, email:"xyz#email.com"},
{....}(~70,000 objects)
]
function a(){
const id = 545
users.filter((value)=>{
if(value.id === id)
return true
})
}
We have 70,000 user objects, and we need to find the email for a given id.
users = [{id: '1001', email: "abc#gmail.com"}, {id: '1002', email: "spc#gmail.com"}, ..];
Using the array with array.filter() ends in an error.
Error
What's the best approach for this?
It might be best to convert your array into a Map so that lookups can be made without scanning the entire array. As such:
const lookupMap = new Map(users.map((u) => [u.id, u]));
so now you can
const user = lookupMap.get(userId)
without having to scan all 70000 user objects.
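A self-contained sketch of the same idea, using a tiny stand-in for the 70,000-element array (the sample data and the `emailById` helper are illustrative only). One thing to watch: `Map` keys are compared without type coercion, so a numeric id `545` and a string id `'545'` are different keys.

```javascript
const users = [
  { id: 1, email: 'abc@example.com' },
  { id: 2, email: 'xyz@example.com' },
  { id: 545, email: 'someone@example.com' },
];

// Build the index once: a single O(n) pass over the array.
const lookupMap = new Map(users.map((u) => [u.id, u]));

// Hypothetical helper: each lookup is then constant time on average.
function emailById(id) {
  const user = lookupMap.get(id);
  return user ? user.email : undefined;
}

console.log(emailById(545));   // someone@example.com
console.log(emailById('545')); // undefined: the string '545' is not the number 545
```

If the ids arrive as strings in one place and numbers in another, normalizing them (for example with `String(id)`) both when building the Map and when looking up avoids silent misses.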
Which JavaScript function can I use to loop through an array of JSON data and return the entries whose key value includes the user's input query?
Suppose that we have an array books with this data:
const books = [
{ title: 'chemistry', pages: 123 },
{ title: 'chemical abcd', pages: 103 },
{ title: 'anatomy of something ', pages: 423 }
];
When a user query is
let query= 'chemi'
Then the output should be
filteredBooks = [
{ title: 'chemistry', pages: 123 },
{ title: 'chemical abcd', pages: 103 }
];
For this I would use the js Array.filter method:
const filteredBooks = books.filter(book => book.title.includes(query))
In addition to the other answers, using destructuring saves a bit of code:
const filteredBooks = books.filter(({title}) => title.includes(query));
This method checks whether any of the object's values contains the query. You should make your query lowercase first:
query=query.toLowerCase();
filteredBooks = books.filter(book=>Object.values(book).some(value=>value.toString().toLowerCase().includes(query)));
If the book object has sub-fields, this is a lazy way to query them:
query=query.toLowerCase();
filteredBooks = books.filter(book=>JSON.stringify(Object.values(book)).toLowerCase().includes(query))
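One caveat with the JSON.stringify shortcut: the property names of nested objects end up in the string too, so a query can match a key rather than a value. A minimal sketch of a recursive alternative that only looks at values; the `deepIncludes` helper and the sample `author` sub-field are hypothetical.

```javascript
// Hypothetical helper: true if any (possibly nested) value contains the query.
// The query is expected to be lowercase already.
function deepIncludes(value, query) {
  if (value === null || value === undefined) return false;
  if (typeof value === 'object') {
    // Object.values works for both arrays and plain objects.
    return Object.values(value).some((v) => deepIncludes(v, query));
  }
  return String(value).toLowerCase().includes(query);
}

const books = [
  { title: 'chemistry', pages: 123, author: { name: 'Ann' } },
  { title: 'chemical abcd', pages: 103, author: { name: 'Bob' } },
  { title: 'anatomy of something', pages: 423, author: { name: 'Chemi Lee' } },
];

const query = 'Chemi'.toLowerCase();
const filteredBooks = books.filter((book) => deepIncludes(book, query));
// All three match; the last one only through author.name.
console.log(filteredBooks.map((b) => b.title));
```

With this sample data, searching for 'name' returns nothing, whereas the JSON.stringify version would match every book because of the nested "name" key.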
I have an array of objects that looks like:
{
name: 'steve',
plaintiff:'IRS',
amount: 5000,
otherliens:[
{
amount:5000,
plaintiff:'irs'
},
{amount:5000,
plaintiff:'irs'
}
]
}
I need to send this as a CSV, so I need to iterate over this subarray and flatten it into the main object, like so:
{
name:'steve',
plaintiff:'irs',
amount:5000,
plaintiff2:'irs',
amount2:5000,
plaintiff3:'irs',
amount3:5000
}
Normally I do this by mapping the contents of the original array into a new array with arr.map(a,i =>{ a[i] ? a[i].amount = a[i].amount }). For the string-based subarrays I get by with guessing a flat number of entries (see phones and emails), because a missing entry just comes back blank, which isn't the worst thing in a CSV. I can't do the same for the object subarray, because accessing a property of an element that doesn't exist obviously won't work. Here is the map I'm using, where emailAddresses is a string array, phoneNumbers is a string array, and otherliens is an object array.
Any help would be appreciated. Bear in mind that this is a bulk data transfer and the CSVs will have a fixed number of columns in the end, so I don't mind null values; I guess you would take the longest subarray length and use that for all the other objects.
Full code
prospects.map((list, i) => {
result[i]
? (result[i].fullName = list.fullName)
(result[i].First_Name = list.firstName)
(result[i].Last_Name = list.lastName)
(result[i].Delivery_Address = list.deliveryAddress)
(result[i].City = list.city)
(result[i].State = list.state)
(result[i].Zip_4 = list.zip4)
(result[i].County = list.county)
(result[i].plaintiff= list.plaintiff)
(result[i].Amount = list.amount)
(result[i].age = list.age)
(result[i].dob= list.dob)
(result[i].snn= list.ssn)
(result[i].plaintiff2= list.otherliens[1].plaintiff )
(result[i].filingDate2= list.otherliens[1].filingDate)
(result[i].amount2= list.otherliens[1].amount )
(result[i].plaintiff3= list.otherliens[2].plaintiff)
(result[i].filingDate3= list.otherliens[2].filingDate )
(result[i].amount3= list.otherliens[2].amount )
(result[i].amount4= list.otherliens[3].amount)
(result[i].plaintiff4= list.otherliens[3].plaintiff )
(result[i].filingDate4= list.otherliens[3].filingDate )
(result[i].phone1 = list.phones[0])
(result[i].phone2 = list.phones[1])
(result[i].phone3 = list.phones[2])
(result[i].phone4 = list.phones[3])
(result[i].phone5 = list.phones[4])
(result[i].phone6 = list.phones[5])
(result[i].phone7 = list.phones[6])
(result[i].phone8 = list.phones[7])
(result[i].phone9 = list.phones[8])
(result[i].emailAddress1 = list.emailAddresses[0])
(result[i].emailAddress2 = list.emailAddresses[1])
(result[i].emailAddress3 = list.emailAddresses[2])
(result[i].emailAddress4 = list.emailAddresses[3])
(result[i].emailAddress5 = list.emailAddresses[4])
(result[i].emailAddress6 = list.emailAddresses[5])
(result[i].emailAddress7 = list.emailAddresses[6])
: (result[i] = {
Full_Name: list.fullName ,
First_Name: list.firstName,
Last_Name: list.lastName,
Delivery_Address: list.deliveryAddress,
City: list.city,
State: list.state,
Zip_4: list.zip4,
County: list.county,
dob: list.dob,
ssn:list.ssn,
age:list.age,
Amount: list.amount,
plaintiff: list.plaintiff,
filingDate: list.filingDate,
phone1:list.phones[0],
phone2:list.phones[1],
phone3:list.phones[3],
phone4:list.phones[4],
phone5:list.phones[5],
phone6:list.phones[6],
phone7:list.phones[7],
phone8:list.phones[8],
emailAddress1:list.emailAddresses[0],
emailAddress2:list.emailAddresses[1],
emailAddress3:list.emailAddresses[2],
emailAddress4:list.emailAddresses[3],
emailAddress5:list.emailAddresses[4],
emailAddress6:list.emailAddresses[5],
plaintiff2: list.otherliens[1].plaintiff,
amount2: list.otherliens[1].amount,
filingDate2: list.otherliens[1].filingDate,
plaintiff3: list.otherliens[2].plaintiff,
filingDate3: list.otherliens[2].filingDate,
amount3: list.otherliens[2].amount,
plaintiff4: list.otherliens[3].plaintiff,
amount4: list.otherliens[3].amount,
filingDate4: list.otherliens[3].filingDate,
})
} );
Use loops to assign properties from the nested arrays, rather than hard-coding the number of items.
I also don't see the need for the conditional expression. Since each input element maps directly to an output element, there won't already be result[i] that needs to be updated.
result = prospects.map(({fullName, firstName, lastName, deliveryAddress, city, state,zip4, county, plaintiff, amount, age, dob, ssn, otherliens, phones, emailAddresses}) => {
let obj = {
fullName: fullName,
First_Name: firstName,
Last_Name: lastName,
Delivery_Address: deliveryAddress,
City: city,
State: state,
Zip_4: zip4,
County: county,
plaintiff: plaintiff,
Amount: amount,
age: age,
dob: dob,
ssn: ssn
};
// The main lien is number 1, so the first entry of otherliens becomes plaintiff2/amount2
otherliens.forEach(({plaintiff, amount}, i) => {
  obj[`plaintiff${i+2}`] = plaintiff;
  obj[`amount${i+2}`] = amount;
});
phones.forEach((phone, i) => obj[`phone${i+1}`] = phone);
emailAddresses.forEach((addr, i) => obj[`emailAddress${i+1}`] = addr);
return obj;
})
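Because the CSV needs a fixed set of columns, one way to extend the loop approach is to find the longest nested array first and pad every row to that length. A rough sketch on a trimmed-down record shape; `flattenProspects` and the reduced field list are illustrative, and missing entries are written as `null`. As in the desired output, the main lien counts as number 1, so `otherliens[0]` becomes `plaintiff2`/`amount2`.

```javascript
function flattenProspects(prospects) {
  // The longest nested array decides how many numbered columns every row gets.
  const maxLen = (key) =>
    prospects.reduce((max, p) => Math.max(max, (p[key] || []).length), 0);
  const maxLiens = maxLen('otherliens');
  const maxPhones = maxLen('phones');

  return prospects.map(({ name, plaintiff, amount, otherliens = [], phones = [] }) => {
    const row = { name, plaintiff, amount };
    for (let i = 0; i < maxLiens; i++) {
      const lien = otherliens[i]; // undefined when this record has fewer liens
      row[`plaintiff${i + 2}`] = lien ? lien.plaintiff : null;
      row[`amount${i + 2}`] = lien ? lien.amount : null;
    }
    for (let i = 0; i < maxPhones; i++) {
      row[`phone${i + 1}`] = phones[i] !== undefined ? phones[i] : null;
    }
    return row;
  });
}

const rows = flattenProspects([
  { name: 'steve', plaintiff: 'irs', amount: 5000,
    otherliens: [{ amount: 5000, plaintiff: 'irs' }, { amount: 5000, plaintiff: 'irs' }],
    phones: ['555-0100'] },
  { name: 'ann', plaintiff: 'irs', amount: 100, otherliens: [], phones: [] },
]);
console.log(rows);
```

Every row ends up with the same keys in the same order, which is what most CSV writers expect when they take the header from the first row.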
I need to loop through my JSON file, where the info is stored under user ids, like this:
{
"350707981178109964":{"wins":1,"losses":0,"rank":4,"username":"TheeSniper95"},
"426459326031593482":{"wins":0,"losses":0,"rank":1,"username":"Ding Dang Test"},
"267752826623492102":{"wins":0,"losses":0,"rank":1,"username":"MrDooba"}
}
The long number is the user id accessed through member.id or message.author.id using the discord.js package.
I need to build a leaderboard array of usernames and wins, sorted so that the more wins a user has, the higher they are on the leaderboard.
But I have had trouble keeping the username together with the wins, and with using the user id to access each entry, sort them all, and save the result in a variable.
If you just need an array of users without their ids sorted by wins, you can try this:
let board = Object.values(users).sort((a, b) => b.wins - a.wins);
The Object.values method returns an array of the object's values (which are your users), and then you sort them by wins in descending order using the sort function.
If you need to include the user id in your objects, you can use Object.entries to get all of the object's [key, value] pairs and then use the map function to create an array of users with their ids included:
let users = {
"350707981178109964":{"wins":1,"losses":0,"rank":4,"username":"TheeSniper95"},
"426459326031593482":{"wins":0,"losses":1,"rank":1,"username":"Ding Dang Test"},
"267752826623492102":{"wins":10,"losses":0,"rank":1,"username":"MrDooba"},
"267752827723492576":{"wins":3,"losses":0,"rank":1,"username":"Johny"},
"267733277234925765":{"wins":7,"losses":4,"rank":1,"username":"Sam"}
};
let board = Object.entries(users)
.map(([key, val]) => ({id: key, ...val}))
.sort((a, b) => b.wins - a.wins);
console.log(board);
var res = [];
var obj = {
"350707981178109964": {"wins":1,"losses":0,"rank":4,"username":"TheeSniper95"},
"42645932603159342": {"wins":0,"losses":0,"rank":1,"username":"Ding Dang Test"},
"267752826623492102": {"wins":0,"losses":0,"rank":1,"username":"MrDooba"}
}
var users = Object.keys(obj);
// Sort the ids by wins in descending order so the leader comes first
users.sort(function(a,b) {
  return obj[b].wins - obj[a].wins;
});
users.forEach(function(user) {
var newObj = {
user: user,
details: obj[user]
}
res.push(newObj);
})
To sort a collection you can use underscore.js:
var stooges = [{name: 'moe', age: 40}, {name: 'larry', age: 50}, {name: 'curly', age: 60}];
_.sortBy(stooges, 'name');
// => [{name: 'curly', age: 60}, {name: 'larry', age: 50}, {name: 'moe', age: 40}]
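If pulling in a library only for this is not desirable, the same result can be had with the built-in `Array.prototype.sort`. A small sketch; the `sortBy` helper here is hypothetical and, like `_.sortBy`, returns a sorted copy instead of sorting in place.

```javascript
// Hypothetical helper: sort a copy of the list by one property, ascending.
function sortBy(list, key) {
  return [...list].sort((a, b) => {
    if (a[key] < b[key]) return -1;
    if (a[key] > b[key]) return 1;
    return 0;
  });
}

const stooges = [{ name: 'moe', age: 40 }, { name: 'larry', age: 50 }, { name: 'curly', age: 60 }];
const byName = sortBy(stooges, 'name');
console.log(byName.map((s) => s.name)); // curly, larry, moe

// For the leaderboard case, descending by wins needs only a comparator:
// players.sort((a, b) => b.wins - a.wins);
```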