I am trying to get multiple random documents from a dynamic collection. Until now, I have thought of doing it with simple queries, something like this:
Pseudocode
arr = [];
while (arr.length < 5) {
  // Start the query at a random integer position within the collection
  startAt = Math.floor(Math.random() * collection.size);
  randomDoc = await dbRef.startAt(startAt).limit(1).get( ... );
  arr.push(randomDoc);
}
Here, I first have to get the collection size, as it can be 0 or greater, and then select a random document using a kind of "db random pointer/index".
My question is whether there is any way to get the same result without all the loop logic, using only the query syntax.
Thank you.
Clever use of startAt and limit!
As you can see in the reference, there are no built-in methods that would return random documents.
In order to avoid the loop, you can use Promise.all:
const indices = getRandomIndices(collection.size);
const docs = await Promise.all(indices.map(i => {
  return dbRef.startAt(i).limit(1).get();
}));
And for getRandomIndices, I suggest: create an array [0, 1, 2, ...], shuffle it as described in this SO answer, and slice its first 5 elements.
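If it helps, here is a minimal sketch of what getRandomIndices might look like (the function name and the count of 5 come from the question; the shuffle is the Fisher-Yates algorithm that answer describes):
// Sketch: build [0, 1, ..., size - 1], shuffle it (Fisher-Yates), take the first 5.
// Returns fewer than 5 indices if the collection holds fewer than 5 documents.
function getRandomIndices(size) {
  const indices = Array.from({ length: size }, (_, i) => i);
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices.slice(0, 5);
}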
Related
In my case the JSON data I am working with is at minimum 90k items, and currently I am using the .filter method. Nothing is wrong; everything works perfectly without any issue. But purely from a performance point of view, I am wondering (and would appreciate suggestions): which approach could we use to improve performance, or is there a better way than what I have?
The requests coming from the backend cannot be modified or split.
For reference, I am including an example with 5k items, which takes around 1 second.
I value every developer's time, so I have added a code snippet as well.
I appreciate any help and suggestions.
const load5kData = async () => {
  const url = 'https://jsonplaceholder.typicode.com/photos';
  const obj = await (await fetch(url)).json();
  const filteredValue = obj.filter(item => item.albumId == 36);
  console.log(filteredValue);
};
load5kData();
It looks like the response is returned with albumId ordered in ascending order. You could take advantage of that by using a traditional for loop and short-circuiting once you reach albumId 37.
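A minimal sketch of that early exit, assuming the response really is sorted by albumId in ascending order:
const filteredValue = [];
for (const item of obj) {
  if (item.albumId > 36) break; // sorted input: nothing relevant past this point
  if (item.albumId === 36) filteredValue.push(item);
}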
In my opinion, if you aren't having performance issues just using the filter method, I would say leave it and don't over-optimize!
Another option: there are only 50 items with albumId == 36, so you could store those objects in your own JSON file. However, you would obviously lose out on fetching the latest images if the results of the API ever change.
filter is really your only solution; it will involve iterating over each element.
If you need to do multiple searches against the same data, you can index it by the key you will be querying. Finding data with a specific albumId then requires no additional filtering, though building the index still requires iterating over each element once.
const indexByAlbumId = data =>
  data.reduce((a, c) => {
    if (a[c.albumId] === undefined) {
      a[c.albumId] = [c];
    } else {
      a[c.albumId].push(c);
    }
    return a;
  }, {});
const load5kData = async () => {
  const url = 'https://jsonplaceholder.typicode.com/photos';
  const data = await (await fetch(url)).json();
  const indexedData = indexByAlbumId(data);
  console.log('36', indexedData[36]);
  console.log('28', indexedData[28]);
};
load5kData();
Another optimisation: if the data is sorted by the key you are searching, you can take advantage of that with a divide-and-conquer search. First find one entry with the value you need, then find where the chunk begins and ends by searching to the left and right of that element.
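A sketch of that idea, assuming the array is sorted ascending by albumId (the helper name is made up, and the block boundaries are found with a simple linear scan here rather than a second binary search, which is fine when each album only holds ~50 items):
// Sketch: return all items with the given albumId from a sorted array.
// Binary search locates one match, then we expand outwards to the block edges.
function findByAlbumId(sorted, albumId) {
  let lo = 0, hi = sorted.length - 1, hit = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid].albumId === albumId) { hit = mid; break; }
    if (sorted[mid].albumId < albumId) lo = mid + 1;
    else hi = mid - 1;
  }
  if (hit === -1) return [];
  let start = hit, end = hit;
  while (start > 0 && sorted[start - 1].albumId === albumId) start--;
  while (end < sorted.length - 1 && sorted[end + 1].albumId === albumId) end++;
  return sorted.slice(start, end + 1);
}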
My task is:
Implement the function duplicateStudents(), which gets the variable "students" and filters for students with the same matriculation number. First, project all elements in students by matriculation number. After that you can filter for duplicates relatively easily. At the end, project using the following format: { matrikelnummer: (matrikelnummer), students: [ (students[i], students[j], ...) ] }.
Implement the function invalidGrades(), which gets the variable "grades" and filters for possibly incorrect grades. For example, in order to keep manual checking to a minimum, the function should determine the matriculation numbers for which several grades were transmitted for the same course. Example: for matriculation number X, a 2.7 and a 2.3 were transmitted for course Y. However, the function would also pick up the valid case, i.e. for matriculation number X, a 5.0 was transmitted once and a 2.3 once for course Y.
In this task you should only use map(), reduce(), and filter(). Do not implement for loops.
function duplicateStudents(students) {
  return students
    // TODO: implement me
}
function invalidGrades(grades) {
  return grades
    .map((s) => {
      // TODO: implement me
      return {
        matrikelnummer: -1, /* put something here */
        grades: [], /* put something here */
      };
    })
    .filter((e) => e.grades.length > 0);
}
I have the variables students and grades in a separate file. I know it might be helpful to upload the files too, but one is 1000 lines long and the other 500, which is why I'm not uploading them. I hope it is possible to do the task without the actual values. It is important to note that the values are represented as arrays.
I'll give you an example of using reduce for duplicateStudents. It does not return the expected format, but you can go from there.
const duplicateStudents = (students) => {
  const grouping = students.reduce((previous, current) => {
    if (previous[current.matrikelnummer]) previous[current.matrikelnummer].push(current); // add student if matrikelnummer already exists
    else previous[current.matrikelnummer] = [current];
    return previous;
  }, {});
  console.log(grouping);
  return; // you could process `grouping` into the expected format here
};
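As a hedged sketch of that final step (the helper name toExpectedFormat is made up; it assumes the output shape given in the task), grouping could be converted like this:
// Sketch: turn `grouping` into the expected output format.
// Object keys are strings, so matrikelnummer is converted back to a number.
const toExpectedFormat = (grouping) =>
  Object.entries(grouping)
    .filter(([, group]) => group.length > 1) // keep only duplicated matriculation numbers
    .map(([matrikelnummer, group]) => ({
      matrikelnummer: Number(matrikelnummer),
      students: group,
    }));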
Here are some references for you:
map
filter
reduce
I have a JavaScript array of nested data that will be displayed to the user.
The user would like to be able to apply 0 to n filter conditions to the data they are looking at.
In order to meet this goal, I need to first find the elements that match the 0 to n filter conditions, then perform some data manipulation on those entries. An obvious way of solving this is to have several filter statements back to back (each with a conditional check inside to see whether that filter needs to be applied), followed by a map function at the end, like this:
var firstFilterList = _.filter(myData, firstFilterFunction);
var secondFilterList = _.filter(firstFilterList, secondFilterFunction);
var thirdFilterList = _.filter(secondFilterList, thirdFilterFunction);
var finalList = _.map(thirdFilterList, postFilterFunction);
In this case, however, the array would be traversed four times. A way around this is a single filter that checks all 3 (or 0 to n) conditions before determining a match, and then performs the data manipulation inside the filter at the end. However, this seems a bit hacky and makes the "filter" responsible for more than one thing, which is not ideal. The upside is that the array is traversed only once.
Is there a "best practices" way of doing what I am trying to accomplish?
EDIT: I am also interested in hearing if it is considered bad practice to perform data manipulation (adding fields to javascript objects etc...) within a filter function.
You could collect all filter functions in an array, check every filter against each item of the data set, and filter by the combined result. Then apply your mapping function to get the wanted result.
var data = [ /* ... */ ],
    filterFn1 = () => Math.round(Math.random()),
    filterFn2 = ({ age }) => age > 35,
    filterFn3 = ({ year }) => year === 1955,
    fns = [filterFn1, filterFn2, filterFn3],
    whatever = ..., // final function for mapping
    result = data
      .filter(x => fns.every(f => f(x)))
      .map(whatever);
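A nice property of this approach: every() returns true for an empty array, so the "0 filters" case works naturally and every item passes through to the map:
console.log([].every(() => false)); // true — vacuously true for an empty array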
One thing you can do is to combine all those filter functions into one single function, with reduce, then call filter with the combined function.
var combined = [firstFilterFunction, secondFilterFunction, ...]
  .reduce((x, y) => (z => x(z) && y(z)));
var filtered = myData.filter(combined);
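One caveat: calling reduce with no initial value on an empty array throws a TypeError, so if 0 active filters is a possible case, you may want to seed the reduce with an always-true function (filterFunctions below stands in for your array of filters):
var combined = filterFunctions.reduce((x, y) => (z => x(z) && y(z)), () => true);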
I have two data sets that vary in length. I need to loop through both arrays and push the photo URLs of the matching player IDs to a new array. I am doing this after the API data has loaded into my React component, so it does not seem to be a matter of the data not being loaded yet. I can console.log both data sets and see that they are successfully returned from the API. All of this works in JS Fiddle (link below), but not in React, so I am having a tough time tracking down what is going on.
Here are my data sets (all of them have obviously been shortened):
This object array is returned from an API endpoint that holds all of the player information, such as IDs, photo URLs, etc. (this is 1200+ objects):
playerInfo = [
  {PlayerID: 123, PhotoURL: "url123"},
  {PlayerID: 345, PhotoURL: "url345"},
  {PlayerID: 678, PhotoURL: "url678"},
  {PlayerID: 910, PhotoURL: "url910"},
  {PlayerID: 1112, PhotoURL: "url1112"},
  {PlayerID: 1213, PhotoURL: "url1213"}
];
This data is returned from a separate API endpoint that lists the current leaders (returns top 40 leaders):
playerIDs = [123, 345, 678, 910]
Unfortunately this leaderboard data does not include the photo URLs with the returned player IDs, so I have to compare the player IDs in both arrays and then push the PhotoURL from the matching playerInfo entry above.
My approach is as follows (I probably could be using some ES6 feature for this, so please let me know if that's the case):
let leaderPhotos = [];
for (let i = 0; i < playerInfo.length; i++) {
  if (playerInfo[i].PlayerID === playerIDs[i]) {
    leaderPhotos.push(playerInfo[i].PhotoURL);
  }
}
My thinking is that even though the playerIDs array is shorter than the playerInfo array, setting the loop to the length of playerInfo allows the shorter leaderboard array to be compared against every player ID in playerInfo, since any player could be on the leaderboard at any given time. If a player ID appears in both arrays, the photo URL from the playerInfo object is pushed to an array that I can then use to load the top leaders' photos. Since the loop goes in order through the leaderboard playerIDs, the returned photo URLs are in leaderboard order.
Here it is working in JS Fiddle.
You can see in the console that it only pushes the photo URLs of the matching player IDs. Perfect, just what I want.
However, when this same logic is applied inside the React component responsible for handling the API data, I get back an empty array in the console after it has run its course. I can console.log both of the returned API data sets to see that I am successfully getting data back to work with.
I was messing around and decided to see if setting both array indexes to a fixed value would work, like so:
let leaderPhotos = [];
if (playerInfo[0].PlayerID === playerIDs[0]) {
  leaderPhotos.push(playerInfo[0].PhotoURL);
} else {
  return false;
}
console.log(leaderPhotos);
And sure enough, this pushes the matching pair's photo URL to leaderPhotos as expected, without the for loop. I am having a difficult time understanding why the for loop is not working, even though JS Fiddle proves it should return the desired results. Thanks for any tips in advance!
Short clean version
const leaderPhotos = playerIDs.map(id =>
  playerInfo.find(({ PlayerID }) => PlayerID === id).PhotoURL
);
Note: just to let you know, this is a nested loop, where map loops through the leaderboard and find loops through the players. The time complexity is O(n * m), where n is the total number of users and m is the length of the leaderboard.
Though if the total number of users isn't huge, it shouldn't matter, and as long as the code looks clean and understandable, that is always better (:
If you have a HUGE number of users and you want the most efficient code, you should store your playerInfo in a map/object where the key is the PlayerID and the value is the PhotoURL. This way you only need to loop through the players once, and photo retrieval is O(1).
Efficient Version
const playerMap = {};
playerInfo.forEach((player) => {
  playerMap[player.PlayerID] = player.PhotoURL;
});
const leaderPhotos = playerIDs.map(leaderId => playerMap[leaderId]);
The time complexity of this is at most O(n) where n is the number of players.
Try this
let leaderPhotos = playerInfo.filter(plr => playerIDs.indexOf(plr.PlayerID) !== -1).map( t => t.PhotoURL);
Make sure that both of your arrays have the same order based on playerId.
Otherwise, you need to use two loops:
let leaderPhotos = [];
for (let i = 0; i < playerInfo.length; i++) {
  for (let j = 0; j < playerIDs.length; j++) {
    if (playerInfo[i].PlayerID === playerIDs[j]) {
      leaderPhotos.push(playerInfo[i].PhotoURL);
    }
  }
}
I think this is the shortest solution compared to the other answers:
let leaderPhotos = playerInfo.filter(info => playerIDs.includes(info.PlayerID)).map(info => info.PhotoURL);
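Note that, like the indexOf version above, this returns the photos in playerInfo order rather than leaderboard order; if leaderboard order matters, map over playerIDs instead, as in the first answer.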
I would really appreciate it if someone could help me with something: I need to make a normal query to the database, but as my collection is very large (10,000 documents), I need to use $limit and $skip in the query. That part I have solved, but now I also want a count of all matching documents, even if the number returned is smaller. The output should look something like this:
{
  count: 1150,
  data: [ /* length of 50 */ ]
}
Could anyone help please? Thank you very much.
Since you mentioned you are making a normal query, it's not wise to go for aggregation; find() will be a much better option here. You can use the find query itself. The commands to do this in the MongoDB console are shown below:
> var docs = db.collection.find().skip(10).limit(50)
> docs.count()
189000
> docs.length()
50
You can do this with a single query. In Node.js and Express.js, you will have to use it like this to be able to use the count function along with toArray's result.
var curFind = db.collection('tasks').find({query});
Then you can run two functions after it, like this (one nested in the other):
curFind.count(function (e, count) {
  // Use count here
  curFind.skip(0).limit(10).toArray(function (err, result) {
    // Use result and count here
  });
});
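If your driver version returns promises, a rough async/await equivalent of the above might look like this (same query object assumed):
const curFind = db.collection('tasks').find(query);
const count = await curFind.count();
const result = await curFind.skip(0).limit(10).toArray();
// Use count and result here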
I don't think it is possible in a single query to get the total count of the result along with the paginated data without using aggregation.
You can probably achieve this via aggregation, but since you mentioned your collection is very large, you should avoid that and break the query into two parts. Here is an example considering a user collection with a rating field and more than 10,000 records:
var finalResult = {};
var query = {
  rating: {$gt: 2}
};

// Get first 50 records of users having rating greater than 2
var users = db.user.find(query).limit(50).skip(0).toArray();

// Get total count of users having rating greater than 2
var totalUsers = db.user.count(query);

finalResult.count = totalUsers;
finalResult.data = users;
And your final output can be like:
finalResult == {
  count: 1150,
  data: [ /* length of 50 users */ ]
}
Hope this makes sense. Popular frameworks such as Grails do this internally to achieve pagination.
Another cleaner approach could be:
var finalResult = {};
var query = {
  rating: {$gt: 2}
};

var cursor = db.user.find(query).limit(50).skip(0);

// Get total count of users having rating greater than 2.
// By default, the count() method ignores the effects of cursor.skip() and cursor.limit().
finalResult.count = cursor.count();
finalResult.data = cursor.toArray();
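One caveat worth noting: in newer MongoDB Node.js drivers, cursor.count() is deprecated. A hedged sketch of the modern equivalent, using countDocuments() and running both queries in parallel (this assumes the Node.js driver rather than the mongo shell):
const [count, data] = await Promise.all([
  db.collection('user').countDocuments(query), // total matches, ignoring skip/limit
  db.collection('user').find(query).skip(0).limit(50).toArray(),
]);
const finalResult = { count, data };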
As mentioned by Sarath Nair, you can use count, as it ignores skip and limit. Check: MongoDB Count
By the way, this is a duplicate of another Stack Overflow question: Limiting results in MongoDB but still getting the full count?