As far as I know, you can call ObjectId("something") to generate a new id.
Is it possible to generate a random id which does not already exist in the database/collection and has a specific format?
In my case, I want the ObjectId to generate a unique random 10-digit number.
So the result should be:
var ObjectId = require('mongodb').ObjectID;
var id = new ObjectId("something");
console.log(id); // ==> 0123456789
As noted in the comments, your best bet would be to use the seconds into the current year when inserting the document. But the real question is how intensive the insertion of new documents will be. You need to account for multiple factors, the first being that the more documents you insert, the higher the chance your ids will collide at some point.
I would recommend just keeping the standard ObjectId Mongo generates for you. However, a solution I can think of off the top of my head would be getting the seconds into the current year, taking a substring to keep the last 5 digits, and then generating 5 random digits and merging the two together:
// 5 trailing digits of the timestamp + 5 random digits = a 10-digit string
new Date().getTime().toString().substring(8) + (Math.floor(Math.random() * 90000) + 10000);
For an 80-bit id the collision probability over 1,000,000 documents would be about 4.135898×10⁻¹³; a 10-digit decimal id carries only about 33 bits, though, so with a million documents a collision becomes practically certain.
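If the id must be guaranteed not to exist yet, the usual pattern is to let the database enforce that: put a unique index on the field and retry on a duplicate-key error. A minimal sketch with the Node.js MongoDB driver (the collection name "docs" and field name "shortId" are just examples):

// Sketch: generate a 10-digit id and let a unique index reject duplicates.
// Assumes the collection has a unique index on "shortId", e.g.:
//   db.collection('docs').createIndex({ shortId: 1 }, { unique: true })
function random10DigitId() {
  return new Date().getTime().toString().substring(8) +
         (Math.floor(Math.random() * 90000) + 10000);
}

async function insertWithUniqueId(db, doc) {
  for (;;) {
    try {
      return await db.collection('docs')
        .insertOne({ ...doc, shortId: random10DigitId() });
    } catch (err) {
      if (err.code !== 11000) throw err; // 11000 = duplicate key error
      // collision: generate a fresh id and try again
    }
  }
}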
I am (VERY) new to Apps Script and JS generally. I am trying to write a script that will automatically tally the difference between a student's entry time and the start time of a course to deliver total minutes missed.
I have been able to get a function working that does this for a single cell value, but I am having trouble iterating it across a range. Doubtless this is due to a fundamental misunderstanding I have about the for loop I am using, but I am not sure where to look for more detailed information.
Any and all advice is appreciated. Please keep in mind my extreme "beginner status".
I have tried declaring a blank variable and adding multiple results of the previously written single-cell functions to that total, but it returns 0 regardless of the given information.
I am including all three functions below; the idea is that each will do one part of the overall task.
function LATENESS(entry, start) {
  return (entry - start) / 60000;
}

function MISSEDMINUTES(studenttime, starttime) {
  const time = studenttime;
  const begin = starttime;
  if (time == "Present") {
    return 0;
  } else if (time == "Absent") {
    return 90;
  } else {
    return LATENESS(time, begin);
  }
}

function TOTALMISSED(range, begintime) {
  var total = 0;
  for (let i = 0; i < range.length; i++) {
    total = total + MISSEDMINUTES(i, begintime);
  }
}
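Note: the loop above passes the loop index i (a plain number) to MISSEDMINUTES instead of the cell value, and TOTALMISSED never returns total, which is why the result comes back as 0 or blank. A minimal corrected sketch, assuming the range arrives as the 2D array Apps Script passes to custom functions:

// Corrected sketch: walk the 2D array of cell values and return the total.
function TOTALMISSED(range, begintime) {
  var total = 0;
  for (var i = 0; i < range.length; i++) {        // rows
    for (var j = 0; j < range[i].length; j++) {   // columns
      total += MISSEDMINUTES(range[i][j], begintime);
    }
  }
  return total; // the original was missing this return statement
}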
If you slightly tweak your layout to have the 'missing minutes' column immediately adjacent to the column of names, you can have a single formula which will calculate the missing minutes for any number of students over any number of days:
Name          | * | 2/6     | 2/7     | 2/8     | 2/9
John Smith    | - | Present | Present | Absent  | 10:06
Lucy Jenkins  | - | Absent  | Absent  | Absent  | Absent
Darren Polter | - | Present | Present | Present | 10:01
With 'Name' present in A1, add the following to cell B1 (where I've marked an asterisk):
={"mins missed";
byrow(map(
C2:index(C2:ZZZ,counta(A2:A),counta(C1:1)),
lambda(x,switch(x,"Present",0,"Absent",90,,0,1440*(x-timevalue("10:00"))))),
lambda(row,sum(row)))}
We are MAPping a minute value onto each entry in the table (where 'Present'=0, 'Absent'=90 & a time entry = the number of minutes difference between then and 10am), then summing BYROW.
Updated
Based on the example, you could use a formula like the one below to do the summation:
=Sum(ARRAYFORMULA(if(B2:E2="Absent",90,if(isnumber(B2:E2),(B2:E2-$K$1)*60*24,0))))
Note that K1 holds the start time of 10:00. The same sample sheet has a working example.
Original Answer
I'm pretty sure you could do what you want with regular Sheets formulas. Here's a sample sheet that shows how to get the difference between two times in minutes and seconds, along with accounting for absences.
Here's the formula used that will update with new entries.
=Filter({if(B2:B="Absent",90*60,Round((C2:C-B2:B)*3600*24,0)),if(B2:B="Absent",90,Round((C2:C-B2:B)*3600*24/60,1))},(B2:B<>""))
This example might not solve all your issues, but from what I'm seeing, there's no need to use an Apps Script. If this doesn't cover it, post some sample data as a Markdown table.
I have a use case where I need to generate alphanumeric upper-case strings of length 25, so the number of possible unique combinations is very high:
36^25 = 808281277464764060643139600456536293376
The string is to be stored in a MySQL table column with a unique constraint.
I am using following code to generate the string:
const Chance = require('chance');
const chance = new Chance(Date.now() + Math.random());
let randomStr = chance.string({
  length: 25,
  pool: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
});
console.log(randomStr);
Node.js can run in cluster mode, so the timestamp value can be the same for different requests, which is why I also added Math.random(). Is this enough to ensure that the MySQL unique constraint won't be violated by the random strings?
Is this enough to ensure that the MySQL unique constraint won't be violated by the random strings?
36^25 is about 129 bits. If we apply the birthday problem, you would likely get a collision only around 2^64 strings, and you'll probably generate far fewer than that. This is only true provided that you use a good randomness source.
Math.random() is not a good randomness source.
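If you want a strong source, Node's built-in crypto module can drive the same kind of generator. A minimal sketch (pool and length taken from the question):

// Sketch: 25-char upper-case alphanumeric string from crypto randomness.
const crypto = require('crypto');
const POOL = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

function randomString(length) {
  const bytes = crypto.randomBytes(length);
  let out = '';
  for (let i = 0; i < length; i++) {
    out += POOL[bytes[i] % POOL.length]; // slight modulo bias, negligible here
  }
  return out;
}

console.log(randomString(25));

Even then, keep the unique constraint as a backstop and retry the insert on a duplicate-key error.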
I need a random number that I will store in the database per record, and that generated random number should not repeat in the future. Is Math.random() good for that?
Thanks in advance.
Random is random, and statistically it will repeat somewhere in the future. What you can do is combine something unique with a random part, for example the Unix timestamp with a random number:
function getRand() {
  return new Date().getTime().toString() + Math.floor(Math.random() * 1000000);
}
Math.random() will not give you a unique number; in general it will not even give you a truly random number.
You can try a datetime-based random algorithm, or generate a random number and then check whether it is already in your database, but neither approach is 100% safe. There is pretty much only one way to ensure the number you store is unique, and that is at the database level.
See the question below for a GUID generator for unique keys for your DB; there you can also find a lot of good information about random mechanisms.
Create GUID / UUID in JavaScript?
time + increment + random:
var newGuid = (function() {
  var guid = Math.floor(Math.random() * 36); // per-process counter with a random start
  return function newGuid() {
    return Date.now().toString(36) +               // time
           (guid++ % 36).toString(36) +            // increment
           Math.random().toString(36).slice(2, 4); // random
  };
})();
If you look at the spec for Math.random, you will see that it is defined as a pseudo-random number generator, meaning it is not REALLY random. Moreover, a real random number generator will for sure repeat a result somewhere along the line, BUT when that happens, the series following the repetition will not resemble the series that followed the number's first appearance.
Now, you mentioned that you need to store this in a database. Why don't you use a SEQUENCE (in Oracle; other DBMSs have different mechanisms for this)? It guarantees that a used number will NEVER be reused. Moreover, if you don't want to use the sequence numbers directly, you can use the sequence value as the seed for a random number (or hash it). This will give you uniqueness to quite many digits.
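A sketch of that sequence-as-seed idea in Node.js (the sequence, query, and names are hypothetical; hashing hides the ordering while the sequence guarantees a never-repeating input):

// Sketch: derive an opaque id from a database sequence value.
const crypto = require('crypto');

function idFromSequence(seqValue) {
  // seqValue is unique by construction, so distinct inputs give distinct
  // ids for all practical purposes with a full SHA-256 digest.
  return crypto.createHash('sha256').update(String(seqValue)).digest('hex');
}

// e.g. with node-postgres and a hypothetical sequence "record_seq":
// const { rows } = await client.query("SELECT nextval('record_seq') AS v");
// const id = idFromSequence(rows[0].v);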
I need to calculate the percentile rank of a particular value against a large number of values filtered in various different ways. The data is all stored on Parse.com, which has a limitation of returning a maximum of 1000 rows per query. The number of values stored is likely to exceed well over 100,000.
By 'percentile rank', I mean I need to calculate the percentage of values that the provided value is greater than. I am not trying to calculate the value of a provided percentile. For example, given a list of values {20, 23, 24, 29, 30, 31, 35, 40, 40, 43} the percentile rank of the provided value 35 is 70%. The algorithm for this is simply the rank of the value / count of values * 100. Not sure if 'percentile rank' is the correct terminology for this.
I have considered a couple of different approaches to this. The first is to pull down the full list of values (into Parse Cloud) and then calculate the percentile rank from there, then filter the list and calculate again, repeating the last two steps as many times as required. The problem with this approach is it will not work once we reach 1000 values, which we can expect pretty quickly.
Another option, which is the best I can come up with so far, is to query the count of items, and the rank of the provided value. For example:
var rank_world_alltime = new Parse.Query("Values")
  .lessThan("value", request.params.value) // counting values below gives the rank
  .count();

var count_world_alltime = new Parse.Query("Values")
  .count();

Parse.Promise.when(rank_world_alltime, count_world_alltime).then(function(rank, count) {
  var percentile = rank / count * 100;
  console.log("world_alltime_percentile = " + percentile);
});
This works well for a single calculation, but I need to perform multiple calculations, and this approach very quickly becomes a lot of queries. I expect to need to run about 15 calculations per call, which is 30 queries. All calculations need to complete in under 3 seconds before Parse terminates the job, and I am limited to 30 reqs/second, so this is very quickly going to become a problem.
Does anyone have any suggestions on how else I could approach this? I've thought about somehow pre-processing some of this but can't quite work out how to do so, as the filters will be based on time and location (city and country), so there are potentially a LOT of pre-calculations that will need to be run at regular intervals. The results do not need to be 100% accurate but something close.
I don't know much about Parse, but as far as I understand what you say, it is some kind of cloud database thingy that holds your high scores and limits you to 1000 rows per query, 3 seconds per job, and 30 queries per second.
In order to keep the calculations approximate and halve the number of queries, I would first of all cache the totals (count_world_alltime, count_region_week, whatever), if you can save them somewhere locally. For numbers around 100K, just having the order of magnitude (thus not the latest updated number) should be good enough to get a percentile.
Maybe you can get several counts per query. However, my lack of expertise in Parse/NoSQL stops me from being sure of this; you'll have to check their documentation. If it is possible, then for the case where you need percentiles for a series of values all in the same category, I would do the following (see the sketch after this list):
Order the values; let's call them a, b, c, d, e (once ordered)
Get the number of values in each of the intervals [0,a], [a,b], [b,c], [c,d], [d,e]
Use the cached total to get the percentiles (where Nxy is the number of values in [x,y]):
Pa = 100 * N0a / total
Pb = 100 * ( N0a + Nab ) / total
Pc = 100 * ( N0a + Nab + Nbc ) / total
and so on...
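A sketch of that interval approach with the Parse JS SDK used in the question (the class and field names follow the question; cachedTotal is the cached count described above):

// Sketch: percentile ranks for several values in one category via
// interval counts, so each extra value costs one count query, not two.
function percentilesFor(values, cachedTotal) {
  var sorted = values.slice().sort(function(a, b) { return a - b; });
  var counts = sorted.map(function(v, i) {
    var q = new Parse.Query("Values").lessThan("value", v);
    if (i > 0) q.greaterThanOrEqualTo("value", sorted[i - 1]); // interval [prev, v)
    return q.count();
  });
  return Parse.Promise.when(counts).then(function() {
    var intervalCounts = Array.prototype.slice.call(arguments);
    var running = 0; // accumulates N0a, N0a+Nab, N0a+Nab+Nbc, ...
    return sorted.map(function(v, i) {
      running += intervalCounts[i];
      return { value: v, percentile: 100 * running / cachedTotal };
    });
  });
}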
If you need one value ranked worldwide, another per region, some per week, others over all time, etc., this doesn't apply. In that case I don't think you can get below one query per number, even with the totals cached.
I'm building a leaderboard using Firebase. The player's position in the leaderboard is tracked using Firebase's priority system.
At some point in my program's execution, I need to know what position a given user is at in the leaderboard. I might have thousands of users, so iterating through all of them to find an object with the same ID (thus giving me the index) isn't really an option.
Is there a more performant way to determine the index of an object in an ordered list in Firebase?
Edit: I'm trying to figure out the following:
/
---- leaderboard
--------user4 {...}
--------user1 {...}
--------user3 {...} <- what is the index of user3, given a snapshot of user3?
--------...
If you are processing tens or hundreds of elements and don't mind taking a bandwidth hit, see Kato's answer.
If you're processing thousands of records, you'll need to follow an approach outlined in principle in pperrin's answer. The following answer details that.
Step 1: set up Flashlight to index your leaderboard with ElasticSearch
Flashlight is a convenient Node script that syncs ElasticSearch with Firebase data.
Read about how to set it up here.
Step 2: modify Flashlight to allow you to pass query options to ElasticSearch
As of this writing, Flashlight gives you no way to tell ElasticSearch that you're only interested in the number of documents matched, not the documents themselves.
I've submitted this pull request which uses a simple one-line fix to add this functionality. If it isn't closed by the time you read this answer, simply make the change in your copy/fork of flashlight manually.
Step 3: Perform the query!
This is the query I sent via Firebase:
{
  index: 'firebase',
  type: 'allTime',
  query: {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "points": {
            "gte": minPoints
          }
        }
      }
    }
  },
  options: {
    "search_type": "count"
  }
};
Replace points with the name of the field tracking points for your users, and minPoints with the number of points of the user whose rank you are interested in.
The response will look something like:
{
  max_score: 0,
  total: 2
}
total is the number of users who have the same or greater number of points -- in other words, the user's rank!
Since Firebase stores objects, not arrays, the elements do not have an "index" in the list; JavaScript and, by extension, JSON objects are inherently unordered. As explained in Ordered Docs and demonstrated in the leaderboard example, you accomplish ordering by using priorities.
A set operation:
var ref = new Firebase('URL/leaderboard');
ref.child('user1').setPriority( newPosition /*score?*/ );
A read operation:
var ref = new Firebase('URL/leaderboard');
ref.child('user1').once('value', function(snap) {
  console.log('user1 is at position', snap.getPriority());
});
To get the info you want, at some point a process is going to have to enumerate the nodes to count them. So the question is where/when the counting takes place.
Using .count() in the client means it is done every time it is needed; it will be pretty accurate, but processing/traffic heavy.
If you keep a separate index of the count, it will need regular refreshing or constant updating (each insert causing a shuffling up of the remaining entries).
Depending on the distribution and volume of your data, I would be tempted to go with a background process that just updates (or rebuilds) the index every, say, ten or twenty additions, and that indexes every, say, 10 positions.
"Leaderboard",$UserId = priority=$score
...
"Rank",'10' = $UserId,priority=$score
"Rank",'20' = $UserId,priority=$score
...
From a score you get the rank to within ten, and then a startAt/endAt/count over your "Leaderboard" narrows it down to the unit.
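A rough sketch of that two-step lookup with the legacy SDK used elsewhere on this page (the paths and the ten-position bucket size follow the layout above; treat the arithmetic as approximate):

// Sketch: resolve a score to a rank via the sparse "Rank" index,
// then refine within the ten-position bucket by counting.
function rankFor(score, callback) {
  var rankRef = new Firebase('URL/Rank');
  // Nearest indexed rank whose score is at or above the given score.
  rankRef.startAt(score).limit(1).once('value', function(rankSnap) {
    rankSnap.forEach(function(entry) {
      var bucketRank = parseInt(entry.name(), 10); // e.g. 20
      var bucketScore = entry.getPriority();       // score held at that rank
      new Firebase('URL/Leaderboard')
        .startAt(score)
        .endAt(bucketScore)
        .once('value', function(span) {
          // entries between the two scores give the offset inside the bucket
          callback(bucketRank - span.numChildren() + 1);
        });
    });
  });
}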
If your background process is monitoring the updates to the leaderboard, it could be more intelligent about its updates to the index, updating only as required.
I know this is an old question, but I just wanted to share my solutions for future reference. First of all, the Firebase ecosystem has changed quite a bit, and I'm assuming the current best practices (i.e. Firestore and serverless functions). I personally considered these solutions while building a real application, and ended up picking the scheduled approximated ranks.
Live ranks (most up-to-date, but expensive)
When preparing a user leaderboard I make a few assumptions:
The leaderboard ranks users based on a number which I'll call 'score' from now on
New users rank lowest on the leaderboard, so upon user creation, their rank is set to the total user count (with a Firebase function, which sets the rank, but also increases the 'total user' counter by 1).
Scores can only increase (with a few adaptations decreasing scores can also be supported).
Deleted users keep a 'ghost' spot on the leaderboard.
Whenever a user gets to increase their score, a Firebase function responds to the change by querying all surpassed users (whose score is >= the user's old score but < the user's new score) and moving each of them down one position, while the user's own position improves by the size of that query.
The rank is now immediately available on client reads. However, the ranking updates inside of the proposed functions are fairly read- and write-heavy. The exact number of operations depends greatly on your application, but for my personal application a great frequency of score changes and relative closeness of scores rendered this approach too inefficient. I'm curious if anyone has found a more efficient (live) alternative.
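A hedged sketch of that trigger (Firestore with Cloud Functions v1; the users collection and score/rank field names are assumptions, with rank 1 being the top spot; note that a Firestore batch caps at 500 writes, so a very large surpassed set would need chunking):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.onScoreIncrease = functions.firestore
  .document('users/{userId}')
  .onUpdate(async (change) => {
    const before = change.before.data();
    const after = change.after.data();
    if (after.score <= before.score) return null; // scores only increase

    // Everyone this user has just surpassed.
    const surpassed = await admin.firestore().collection('users')
      .where('score', '>=', before.score)
      .where('score', '<', after.score)
      .get();
    if (surpassed.empty) return null;

    const batch = admin.firestore().batch();
    surpassed.docs.forEach((doc) => {
      // Each surpassed user drops one position.
      batch.update(doc.ref, { rank: admin.firestore.FieldValue.increment(1) });
    });
    // The user climbs past all of them.
    batch.update(change.after.ref, {
      rank: admin.firestore.FieldValue.increment(-surpassed.size),
    });
    return batch.commit();
  });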
Scheduled ranks (simplest, but expensive and periodic)
Schedule a Firebase function to simply sort the entire user collection by score and write back the rank of each user (in a batch update). This process can be repeated daily, or more or less frequently depending on your application. For N users, the function always makes N reads and N writes.
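A hedged sketch of that scheduled job (same assumed users collection and score/rank fields as above; Firestore batches cap at 500 writes, hence the chunking):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.updateRanks = functions.pubsub
  .schedule('every 24 hours')
  .onRun(async () => {
    const snap = await admin.firestore().collection('users')
      .orderBy('score', 'desc')
      .get();

    const commits = [];
    let batch = admin.firestore().batch();
    snap.docs.forEach((doc, i) => {
      batch.update(doc.ref, { rank: i + 1 }); // rank 1 = highest score
      if ((i + 1) % 500 === 0) {              // batch limit reached
        commits.push(batch.commit());
        batch = admin.firestore().batch();
      }
    });
    commits.push(batch.commit());
    return Promise.all(commits);
  });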
Scheduled approximated ranks (cheapest, but non-precise and periodic)
As an alternative to the 'Scheduled ranks' option, I would suggest an approximation technique: instead of writing each user's exact rank on each scheduled update, the collection of users (still sorted as before) is simply split into M chunks of equal size, and the scores that bound these chunks are written to a separate 'stats' collection.
So, for example: if we use M = 3 for simplicity and we read 60 users sorted by ascending score, we have three chunks of 20 users. For each of the (still sorted) chunks we take the score of the first user (lowest score of the chunk) and of the last user (highest score of the chunk), i.e. the range that contains all scores of that chunk. Let's say that the chunk with the lowest scores has scores ranging from 20-120, the second chunk has scores from 130-180, and the chunk with the highest scores has scores 200-350. We now simply write these ranges to a 'stats' collection (the write count is reduced to 1, no matter how many users!). A sketch of this scheduled update follows below.
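A hedged sketch of this scheduled approximation (M = 3 matches the example; the users and stats names are assumptions):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.updateStats = functions.pubsub
  .schedule('every 24 hours')
  .onRun(async () => {
    const M = 3; // number of chunks
    const snap = await admin.firestore().collection('users')
      .orderBy('score', 'asc')
      .get();

    const docs = snap.docs;
    const size = Math.ceil(docs.length / M);
    const ranges = [];
    for (let i = 0; i < docs.length; i += size) {
      const chunk = docs.slice(i, i + size);
      ranges.push({
        min: chunk[0].get('score'),                // lowest score in the chunk
        max: chunk[chunk.length - 1].get('score'), // highest score in the chunk
      });
    }

    // One write, no matter how many users.
    return admin.firestore().doc('stats/leaderboard').set({ ranges });
  });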
Upon rank retrieval, the user simply reads the most recent 'stats' document and approximates their percentile rank by comparing the ranges with their own score. Of course it is possible that a user scores higher than the greatest score or lower than the lowest score from the previous 'stats' update, but I would just consider them belonging to the highest scoring group and the lowest scoring group respectively.
In my own application I used M = 20 and could therefore show the user percentile ranks by 5% accuracy, and estimate even within that range using linear interpolation (for example, if the user score is 450 and falls into the 40%-45%-chunk ranging from 439-474, we estimate the user's percentile rank to be 40 + (450 - 439) / (474 - 439) * 5 = 41.57...%).
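A sketch of that client-side estimate, reading the ranges array written by the sketch above (all names remain assumptions):

// Sketch: approximate a percentile rank from the chunk boundary ranges.
// ranges[i] = { min, max } for chunk i, sorted by ascending score.
function estimatePercentile(score, ranges) {
  const M = ranges.length;
  if (score < ranges[0].min) return 0;        // below every measured score
  if (score > ranges[M - 1].max) return 100;  // above every measured score
  for (let i = 0; i < M; i++) {
    if (score <= ranges[i].max) {
      const { min, max } = ranges[i];
      const raw = max > min ? (score - min) / (max - min) : 0.5;
      const within = Math.min(1, Math.max(0, raw)); // clamp gaps between chunks
      return (i + within) * (100 / M); // linear interpolation inside the chunk
    }
  }
  return 100;
}

// With M = 20 and the 40%-45% chunk spanning 439-474, a score of 450 falls
// in chunk index 8 and yields (8 + 11 / 35) * 5 ≈ 41.57%, matching the
// example above.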
If you want to get real fancy you can also estimate exact percentile ranks by fitting your expected score distribution (e.g. normal distribution) to the measured ranges.
Note: all users DO need to read the 'stats' document to approximate their rank. However, in most applications not all users actually view the statistics (as they are either not active daily or just not interested in the stats). Personally, I also used the 'stats' document (named differently) for storing other DB values that are shared among users, so this document is already retrieved anyways. Besides that, reads are 3x cheaper than writes. Worst case scenario is 2N reads and 1 write.