I'd like to be able to sort this collection according to the following attributes of each item.
var distance1 = { velocity: 100, time: 20 };
distance1.rank = distance1.time * distance1.velocity; // 2000
var distance2 = { velocity: 50, time: 25 };
distance2.rank = distance2.time * distance2.velocity; // 1250
var distance3 = { velocity: 300, time: 10 };
distance3.rank = distance3.time * distance3.velocity; // 3000
collection.insert([distance1, distance2, distance3]);
So basically, rank them according to 'distance' (time * velocity) in ascending order.
The list should end up like this:
distance2
distance1
distance3
However, the rank needs to be derived from the time and velocity fields, since otherwise you'd run into a problem when the user changes the values:
distance1.time = 1;
distance1.velocity = 1;
distance2.time = 100;
distance2.velocity = 20;
distance3.time = 10000;
distance3.velocity = 20000;
Which should change the ranking to:
distance1
distance2
distance3
But it will not do so if the rank is static.
How can I make use of MongoDB's meta and sorting functionality to allow a dynamic ranking algorithm based on user input? Kind of like how Reddit uses upvotes to determine where a page ranks.
EDIT: fixed a dumb physics mistake.
The most efficient approach would be to update the rank field when you change the values in your application code (i.e. don't allow time and velocity to be set without also recalculating the rank). You can then add an appropriate index (or indexes) including the rank field to support your common queries (see Optimizing MongoDB Compound Indexes for some helpful background on creating indexes).
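A rough sketch of that idea with the Node.js driver (the collection name distances and the helper function are assumptions, not part of the question):

// Never set time or velocity without recomputing rank in the same update.
function setTimeAndVelocity(distances, id, time, velocity) {
  return distances.updateOne(
    { _id: id },
    { $set: { time: time, velocity: velocity, rank: time * velocity } }
  );
}

// An ascending index on rank then supports the common sorted query:
// distances.createIndex({ rank: 1 })
// distances.find().sort({ rank: 1 })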
You could also compute rank dynamically via a query with the aggregation framework, but that would involve a very inefficient recalculation for all matching documents in the collection to determine result order.
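For illustration, a sketch of that dynamic computation in the mongo shell (collection name assumed):

// Recomputes rank for every matching document on every query -- expensive.
db.distances.aggregate([
  { $addFields: { rank: { $multiply: ["$time", "$velocity"] } } },
  { $sort: { rank: 1 } }
]);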
For my Node.js application I need to choose the best performing structure to represent a grid.
My requirements/limitations are:
The grid to store is two-dimensional by nature (x, y) and not very large (100-300 cells)
Some cells in the grid contain nothing, i.e. up to 25% of the grid will be empty
I will have to access the grid very often, and I'll need to run some heavy algorithms on it, like flood fill, A* pathfinding and more
This will be a repetitive simulation process of changing the grid and applying the algorithms again
I aim at hundreds of simulations in a limited time, so every millisecond matters
I do not care about readability of the code
Amount of memory used is also a minor concern
Switch to another programming language is not possible
I've been choosing between three options:
// Option 1: a 2D array (array of rows)
var grid = new Array(height);
for (var y = 0; y < height; y++) grid[y] = new Array(width);
grid[y][x] = {};

// Option 2: a flat 1D array with computed indices
var grid = new Array(height * width);
grid[y * width + x] = {};
// converting an index back to coordinates:
var y = ~~(index / width);
var x = index % width;

// Option 3: an object used as a hash table with string keys
var grid = {};
var key = x + ',' + y;
grid[key] = {};
The first one is the most comfortable, as I will manipulate the coordinates a lot, so having x and y handy all the time is convenient. A possible disadvantage: I've read that it can be much slower than a 1D array when holding objects.
The second is fine and probably very fast, but I will need to convert between index and (x, y) in both directions, which means extra calculations, including a modulo for the index-to-coordinates conversion. Still, I see this approach in many good sources.
The third way is new to me, but I've seen it in some robust code examples, and I've read that retrieving an object from a hash table can be faster than from a 2D array as well.
I do not trust synthetic benchmarks much, so I do not wish to set up a code competition over almost-empty logic at this stage. But I'm afraid it will be a very long way back if I pick the wrong approach now and have to revert later.
I've seen similar questions asking about different pairs of these methods, but none of them matches my requirements closely enough.
Thank you for your consideration; code samples are welcome.
I've recently finished a course where we used an older version of TensorFlow.js, and there was a useful method on tensors (not just buffers): .get(). Since it has been removed, I have to find a different way to implement my simplified learning-rate optimisation, where I compare the previous cost to the new cost: if the previous is bigger, I increase the learning rate, otherwise I decrease it. Cost is always a scalar tensor. I stack the previous cost with the new cost, get the index of the bigger one with .argMax(), and use that index (the result of .argMax()) to get the item from my "constant" tensor, which just stores the two values to multiply the learning rate by.
An example would be:
let learningRate = tf.tensor(1);
const prevCost = tf.tensor(1);
const nextCost = tf.tensor(2);
const modifiers = tf.tensor([1.05, 0.5]);
const bigger = tf.stack([prevCost, nextCost]).argMax(); // 1
const modifier = modifiers.get( // if it would still exist
  bigger
); // 0.5
learningRate = learningRate.mul(modifier); // 1 * 0.5 = 0.5
But unfortunately .get() doesn't exist anymore; however, there should be a method to do this.
tf.slice can be used as a replacement:
tensor.slice([...coordinates], 1)
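Applied to the example above, a sketch of the .slice()-based version (reading the argMax result back with .dataSync() is my assumption about the intended usage):

let learningRate = tf.tensor(1);
const prevCost = tf.tensor(1);
const nextCost = tf.tensor(2);
const modifiers = tf.tensor([1.05, 0.5]);

// read the winning index back as a plain number
const bigger = tf.stack([prevCost, nextCost]).argMax().dataSync()[0]; // 1

// slice a length-1 tensor starting at that index -- the .get() replacement
const modifier = modifiers.slice([bigger], 1); // tensor([0.5])

learningRate = learningRate.mul(modifier); // tensor(0.5)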
I want to use FusionCharts to read my Firebase data and create a chart in my web app, but my Firebase DB has the wrong structure, so FusionCharts can't get the data (my Firebase config is right).
The following is the code with which I write data to Firebase; num is a value that is incremented in each loop. But as shown in the attached picture, the child's name is not added as a sequence number.
Another question: I don't want the unique push key inside child 1; just the six values inside child 1 would be fine.
Any suggestions will be appreciated.
firebase.database().ref('testdata/User1').child(num).push({
  x: posX,
  y: posY,
  MaxSpeed: maxSpeed,
  steps: counter,
  time: Timeperiod / 1000,
  speed: SpeedRecord,
});
If you don't want the push ID inside your pseudo-numeric keys, call set instead of push.
So:
firebase.database().ref('testdata/User1').child(num).set({
  x: posX,
  y: posY,
  MaxSpeed: maxSpeed,
  steps: counter,
  time: Timeperiod / 1000,
  speed: SpeedRecord,
});
Your other problem seems (it's impossible to be certain, since you didn't include the code for the increment) to come from the fact that num is a string. If that is indeed the case, increment it with:
num = String(parseInt(num) + 1);
Using such numeric keys is an antipattern in Firebase though, so I'd usually recommend against them. If you must use them, at least pad them to a fixed length, so that you can sort/filter on them easily.
Something as simple as:
num = String(parseInt(num) + 1).padStart(5, "0");
will work on all modern browsers, and ensures that all keys-that-look-like-numbers-but-behave-like-strings show up in the order you expect.
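A quick illustration of why the padding matters (string keys sort lexicographically):

["1", "10", "2", "9"].sort();
// -> ["1", "10", "2", "9"]   ("10" sorts before "2")

["00001", "00010", "00002", "00009"].sort();
// -> ["00001", "00002", "00009", "00010"]   (matches numeric order)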
I need to calculate the percentile rank of a particular value against a large number of values filtered in various different ways. The data is all stored on Parse.com, which has a limitation of returning a maximum of 1000 rows per query. The number of values stored is likely to exceed well over 100,000.
By 'percentile rank', I mean the percentage of values that are less than or equal to the provided value. I am not trying to calculate the value at a provided percentile. For example, given the list of values {20, 23, 24, 29, 30, 31, 35, 40, 40, 43}, the percentile rank of the provided value 35 is 70%. The algorithm for this is simply: rank of the value / count of values * 100. Not sure if 'percentile rank' is the correct terminology for this.
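As a quick illustration of that calculation:

const values = [20, 23, 24, 29, 30, 31, 35, 40, 40, 43];
const target = 35;
const rank = values.filter(function(v) { return v <= target; }).length; // 7
const percentileRank = rank / values.length * 100; // 70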
I have considered a couple of different approaches to this. The first is to pull down the full list of values (into Parse Cloud) and then calculate the percentile rank from there, then filter the list and calculate again, repeating the last two steps as many times as required. The problem with this approach is it will not work once we reach 1000 values, which we can expect pretty quickly.
Another option, which is the best I can come up with so far, is to query the count of items, and the rank of the provided value. For example:
var rank_world_alltime = new Parse.Query("Values")
  // values less than the provided value, so counting this query returns the rank
  .lessThan("value", request.params.value)
  .count();

var count_world_alltime = new Parse.Query("Values")
  .count();

Parse.Promise.when(rank_world_alltime, count_world_alltime).then(function(rank, count) {
  var percentile = rank / count * 100;
  console.log("world_alltime_percentile = " + percentile);
});
This works well for a single calculation, but I need to perform multiple calculations, and this approach very quickly becomes a lot of queries. I expect to need to run about 15 calculations per call, which is 30 queries. All calculations need to complete in under 3 seconds before Parse terminates the job, and I am limited to 30 reqs/second, so this is very quickly going to become a problem.
Does anyone have any suggestions on how else I could approach this? I've thought about somehow pre-processing some of this but can't quite work out how to do so, as the filters will be based on time and location (city and country), so there are potentially a LOT of pre-calculations that will need to be run at regular intervals. The results do not need to be 100% accurate but something close.
I don't know much about Parse, but as far as I understand what you say, it is some kind of cloud database thingy that holds your high scores, and it limits you to 1000 rows per query, 3 seconds per job, and 30 queries per second.
To get approximate results and halve the number of queries, I would first of all cache the totals (count_world_alltime, count per region, per week, whatever), if you can save them somewhere locally. For counts around 100K, just getting the order of magnitude (thus not the latest updated number) should be good enough to compute a percentile.
Maybe you can get several counts per query. However, my lack of expertise in Parse/NoSQL stops me from being sure of this; you'll have to check their documentation. If it is possible, then for the case where you need percentiles for a series of values all in the same category, I would:
Order the values, let's call them a,b,c,d,e (once ordered)
Get the number of values between the intervals [0,a] [a,b] [b,c] [c,d] [d,e]
Use the cached total to get the percentiles (where Nxy is the number of values in [x,y]):
Pa = 100 * N0a / total
Pb = 100 * (N0a + Nab) / total
Pc = 100 * (N0a + Nab + Nbc) / total
and so on...
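A minimal sketch of that cumulative computation (the counts and total here are made-up numbers):

// counts of stored values in the intervals [0,a], [a,b], [b,c], [c,d], [d,e]
const intervalCounts = [12000, 8000, 20000, 40000, 20000]; // hypothetical
const total = 100000; // the cached total

let cumulative = 0;
const percentiles = intervalCounts.map(function(n) {
  cumulative += n;
  return 100 * cumulative / total; // Pa, Pb, Pc, ...
});
// percentiles -> [12, 20, 40, 80, 100]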
If you need one value ranked worldwide, another per region, some per week, and others over all time, etc., this doesn't apply. In that case I don't think you can get below one query per number, beyond caching the totals.
I'm building a leaderboard using Firebase. The player's position in the leaderboard is tracked using Firebase's priority system.
At some point in my program's execution, I need to know what position a given user is at in the leaderboard. I might have thousands of users, so iterating through all of them to find an object with the same ID (thus giving me the index) isn't really an option.
Is there a more performant way to determine the index of an object in an ordered list in Firebase?
edit: I'm trying to figure out the following:
/
---- leaderboard
--------user4 {...}
--------user1 {...}
--------user3 {...} <- what is the index of user3, given a snapshot of user3?
--------...
If you are processing tens or hundreds of elements and don't mind taking a bandwidth hit, see Kato's answer.
If you're processing thousands of records, you'll need to follow an approach outlined in principle in pperrin's answer. The following answer details that.
Step 1: setup Flashlight to index your leaderboard with ElasticSearch
Flashlight is a convenient node script that syncs elasticsearch with Firebase data.
Read about how to set it up here.
Step 2: modify Flashlight to allow you to pass query options to ElasticSearch
As of this writing, Flashlight gives you no way to tell ElasticSearch that you're only interested in the number of documents matched and not the documents themselves.
I've submitted this pull request, which uses a simple one-line fix to add this functionality. If it hasn't been merged by the time you read this answer, simply make the change in your copy/fork of Flashlight manually.
Step 3: Perform the query!
This is the query I sent via Firebase:
{
  index: 'firebase',
  type: 'allTime',
  query: {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "points": {
            "gte": minPoints
          }
        }
      }
    }
  },
  options: {
    "search_type": "count"
  }
};
Replace points with the name of the field tracking points for your users, and minPoints with the number of points of the user whose rank you are interested in.
The response will look something like:
{
  max_score: 0,
  total: 2
}
total is the number of users who have the same or greater number of points -- in other words, the user's rank!
Since Firebase stores objects, not arrays, the elements do not have an "index" in the list: JavaScript and, by extension, JSON objects are inherently unordered. As explained in Ordered Docs and demonstrated in the leaderboard example, you accomplish ordering by using priorities.
A set operation:
var ref = new Firebase('URL/leaderboard');
ref.child('user1').setPriority( newPosition /*score?*/ );
A read operation:
var ref = new Firebase('URL/leaderboard');
ref.child('user1').once('value', function(snap) {
  console.log('user1 is at position', snap.getPriority());
});
To get the info you want, at some point a process is going to have to enumerate the nodes to count them. So the question is where/when the counting takes place.
Using .count() in the client means it is done every time it is needed; it will be pretty accurate, but processing/traffic heavy.
If you keep a separate index of the count, it will need regular refreshing or constant updating (each insert causing a shuffling up of the remaining entries).
Depending on the distribution and volume of your data, I would be tempted to go with a background process that just updates(/rebuilds) the index every (say) ten or twenty additions, and indexes every (say) 10 positions.
"Leaderboard",$UserId = priority=$score
...
"Rank",'10' = $UserId,priority=$score
"Rank",'20' = $UserId,priority=$score
...
From a score you get the rank to within ten, and then using a startAt/endAt/count on your "Leaderboard" you get it down to the unit.
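Something like this with the legacy SDK used above (indexScore and indexedRank here are hypothetical values read from the "Rank" index):

var ref = new Firebase('URL/leaderboard');
// count the entries between the nearest indexed score and the user's score
ref.startAt(indexScore).endAt(userScore).once('value', function(snap) {
  var offset = snap.numChildren();      // entries between the index point and the user
  var exactRank = indexedRank - offset; // refine the rank-within-ten to the unit
});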
If your background process is monitoring the updates to the leaderboard, it could be more intelligent about its updates to the index, updating only as required.
I know this is an old question, but I just wanted to share my solutions for future reference. First of all, the Firebase ecosystem has changed quite a bit, and I'm assuming the current best practices (i.e. Firestore and serverless functions). I personally considered these solutions while building a real application, and ended up picking the scheduled approximated ranks.
Live ranks (most up-to-date, but expensive)
When preparing a user leaderboard I make a few assumptions:
The leaderboard ranks users based on a number which I'll call 'score' from now on
New users rank lowest on the leaderboard, so upon user creation, their rank is set to the total user count (with a Firebase function, which sets the rank, but also increases the 'total user' counter by 1).
Scores can only increase (with a few adaptations, decreasing scores can also be supported).
Deleted users keep a 'ghost' spot on the leaderboard.
Whenever a user increases their score, a Firebase function responds to the change by querying all surpassed users (whose score is >= the user's old score but < the user's new score) and moving each of them down one spot (their rank number goes up by 1). The user's own rank number goes down by the size of that query (they move up that many spots).
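A rough sketch of that trigger (Cloud Functions + Firestore; the collection and field names are placeholders of my choosing):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Assumed schema: users/{uid} docs with numeric 'score' and 'rank' fields.
exports.onScoreIncrease = functions.firestore
  .document('users/{uid}')
  .onUpdate(async (change, context) => {
    const before = change.before.data();
    const after = change.after.data();
    if (after.score <= before.score) return null; // assumption: scores only increase

    // Everyone the user just surpassed.
    const surpassed = await admin.firestore().collection('users')
      .where('score', '>=', before.score)
      .where('score', '<', after.score)
      .get();

    const batch = admin.firestore().batch();
    surpassed.docs.forEach((doc) => {
      batch.update(doc.ref, { rank: doc.data().rank + 1 }); // move down one spot
    });
    // Move the user up by the number of surpassed users.
    batch.update(change.after.ref, { rank: after.rank - surpassed.size });
    await batch.commit(); // batches cap at 500 writes; chunk for bigger jumps
    return null;
  });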
The rank is now immediately available on client reads. However, the ranking updates inside the proposed function are fairly read- and write-heavy. The exact number of operations depends greatly on your application, but for my own application the high frequency of score changes and the relative closeness of scores made this approach too inefficient. I'm curious whether anyone has found a more efficient (live) alternative.
Scheduled ranks (simplest, but expensive and periodic)
Schedule a Firebase function to simply sort the entire user collection by ascending score and write back the rank for each user (in a batch update). This process can be repeated daily, or more or less frequently depending on your application. For N users, the function always makes N reads and N writes.
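A minimal sketch of such a scheduled function (the daily schedule, collection name, and field names are assumptions):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.updateRanks = functions.pubsub.schedule('every 24 hours').onRun(async () => {
  // N reads: the whole user collection, sorted by ascending score
  const users = await admin.firestore().collection('users').orderBy('score').get();

  // N writes: the lowest score gets the highest rank number
  const batch = admin.firestore().batch();
  users.docs.forEach((doc, i) => {
    batch.update(doc.ref, { rank: users.size - i });
  });
  await batch.commit(); // note: a batch holds at most 500 writes, so chunk for larger N
  return null;
});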
Scheduled approximated ranks (cheapest, but non-precise and periodic)
As an alternative to the 'Scheduled ranks' option, I would suggest an approximation technique: instead of writing each user's exact rank on each scheduled update, the collection of users (still sorted as before) is simply split into M chunks of equal size, and the scores that bound these chunks are written to a separate 'stats' collection.
So, for example: if we use M = 3 for simplicity and we read 60 users sorted by ascending score, we get three chunks of 20 users. For each (still sorted) chunk we take the score of the first user (lowest score of the chunk) and the last user (highest score of the chunk), i.e. the range that contains all scores of that chunk. Say the chunk with the lowest scores has scores ranging from 20-120, the second chunk has scores from 130-180, and the chunk with the highest scores has scores from 200-350. We now simply write these ranges to a 'stats' collection (the write count is reduced to 1, no matter how many users!).
Upon rank retrieval, the user simply reads the most recent 'stats' document and approximates their percentile rank by comparing the ranges with their own score. Of course it is possible that a user scores higher than the greatest score or lower than the lowest score from the previous 'stats' update, but I would just consider them belonging to the highest scoring group and the lowest scoring group respectively.
In my own application I used M = 20 and could therefore show percentile ranks with 5% granularity, estimating within that range using linear interpolation (for example, if the user score is 450 and falls into the 40%-45% chunk ranging from 439-474, we estimate the user's percentile rank to be 40 + (450 - 439) / (474 - 439) * 5 = 41.57...%).
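A sketch of that client-side lookup (the shape of the 'stats' document here is an assumption):

// ranges: ascending array of { min, max } score bounds, one per chunk
function approximatePercentile(score, ranges) {
  const step = 100 / ranges.length; // e.g. 5 when M = 20
  for (let i = 0; i < ranges.length; i++) {
    const r = ranges[i];
    if (score <= r.max) {
      if (score < r.min) return i * step; // falls in a gap below this chunk
      // linear interpolation inside the chunk
      return i * step + ((score - r.min) / (r.max - r.min)) * step;
    }
  }
  return 100; // above the highest recorded score
}

// the M = 3 example from above:
const ranges = [
  { min: 20,  max: 120 },
  { min: 130, max: 180 },
  { min: 200, max: 350 },
];
approximatePercentile(250, ranges); // ~77.8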
If you want to get real fancy you can also estimate exact percentile ranks by fitting your expected score distribution (e.g. normal distribution) to the measured ranges.
Note: all users DO need to read the 'stats' document to approximate their rank. However, in most applications not all users actually view the statistics (as they are either not active daily or just not interested in the stats). Personally, I also used the 'stats' document (named differently) for storing other DB values that are shared among users, so this document is retrieved anyway. Besides that, reads are 3x cheaper than writes. The worst-case scenario is 2N reads and 1 write.