Scalable Unique ID Function in Typescript - javascript

What is the methodology for generating a unique ID that minimizes the chance for an overlap?
for (let i = 0; i < <Arbitrary Limit>; i++)
    generateID();
There are a few current solutions, but they are all roundabout ways of solving the problem.
Possible Solution 1:
Use a database to generate the new ID.
This does not work for me since I would like to do this in the frontend.
Possible Solution 2:
Use Math.floor(Math.random() * LIMIT), where LIMIT is a numeric value.
This does not work as minimizing the chance for overlap requires a large limit, and thus a massive ID. If working with hundreds of thousands of instances that need an ID, the chance increases greatly.
Possible Solution 3:
'_' + Math.random().toString(36).substr(2, 9);
This is close to working but I believe Math.random() is pseudo-random.
Possible Solution 4:
Date.now(), new Date().getTime(), etc.
This does not work as generating [Date.now(), Date.now()] will cause the same ID. It arguably also needs a long ID to minimize overlap.
I do not require absolutely 0% chance of generating the same ID, I wish to minimize the chance as much as possible without:
Storing a count
Using 'other technology' (no database, no library, etc)
Making a massive ID
This should preferably be scalable, e.g. it should work for 10 or 1,000,000 IDs.
Edit: Unique IDs generated locally and without need for communication across users of the frontend. Ex: A component needs to render many instances of the same class and needs a key to assign to it. Keys must be different and upon unmounting the component with its generated keys/instances is removed.

It seems you want to generate unique IDs entirely on the front end, without relying on a backend at all or "storing a count". The solution then depends, in part, on how many different users you expect will access your application's frontend during its lifetime, and the size of the IDs depends on how much collision risk you're willing to tolerate (assuming you're generating IDs at random); for that, see the birthday problem.
Then, depending on the size of IDs you choose, you generate the IDs at random using a cryptographic RNG (such as crypto.randomBytes in Node.js or crypto.getRandomValues in the browser), which is the closest available thing to "truly" random IDs.
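A minimal sketch of that idea, assuming a browser environment with the Web Crypto API and an arbitrary 16-byte ID length (neither is specified in the question):

// Sketch: cryptographically random IDs via the Web Crypto API.
// The 16-byte length is an illustrative assumption.
function generateID(byteLength = 16) {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  // Hex-encode the random bytes so the ID is a plain string.
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Example usage: collisions are astronomically unlikely at this length.
const ids = Array.from({ length: 1000 }, () => generateID());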
On the other hand, if only a few users will access your front end, but each one generates many unique IDs, then you can assign each user a unique value from a central database, because then each front-end computer can use that unique value as part of the unique identifiers it generates, ensuring that the identifiers are unique across the application without further contacting other computers or the central database.
Notice that there are other considerations as well: You should also consider whether just anyone should access the resource an ID identifies simply by knowing the ID (even without being logged in or authorized in some way). If not, then additional access control will be necessary.
For your purposes, you can try applying sequential IDs. If sequential IDs alone are not adequate, then you can try applying a reversible operation to sequential IDs. One example of such an operation is what is technically called a "linear congruential generator with a power-of-2 modulus", such as the ones described in the following pages (a minimal sketch follows the links):
How to sync a PRNG between C#/Unity and Python?
Algorithm or formula that can take an incrementing counter and make it appear uniquely random
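A rough sketch of that counter-scrambling idea in JavaScript (the multiplier and increment below are arbitrary example constants, not taken from the linked pages):

// Sketch: map a sequential counter to a random-looking but still unique ID.
// An LCG step modulo 2^32 with an odd multiplier is a bijection on [0, 2^32),
// so distinct counter values always yield distinct IDs.
// The constants here are illustrative assumptions only.
let counter = 0;
function nextScrambledID() {
  counter = (counter + 1) >>> 0;
  // x -> (a*x + c) mod 2^32, with odd multiplier a
  const scrambled = (Math.imul(counter, 2654435761) + 1013904223) >>> 0;
  return scrambled.toString(36);
}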

Related

How to generate a referral link from a hash code in nodejs

I have a system where each user account has a unique id. I can't use that id as a referral link because it contains dashes (-).
I need to generate a unique referral code given that id, I thought about using a sha256 of that id.
The problem is that the sha256 is too long to be used as a referral but if I truncate it, the collision chances increase.
Is there any way to generate a referral link given an id?
this is the format id: 04v23533-680d-1107-j4h1-1c32343c1004
If your user IDs are random and unique, then you can't use them to generate shorter IDs without collisions (because you're effectively trying to compress random data).
Realistically though, are collisions a concern? If you have 500 users and a truncated hash has a one in 100 billion chance of colliding, is that really a problem? It depends on how many users you have and how long your codes can be.
But referral links don't need to be secret, they just need to be unique. So at the simplest level you could just generate sequential integers and use those for the IDs? Sure, you could guess other valid referral IDs, but who cares?
If you want them to be a bit more complex, you could generate random strings, and check them against the current list of IDs in the database to make sure that they're unique. But this might slow down once you have millions of users.
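A minimal sketch of that last approach, assuming a MongoDB-style users collection with a unique index on a referralCode field (the collection, field names, and 8-character length are all illustrative):

// Sketch: short random referral codes, retried on the (rare) collision.
// Assumes Node.js and a users collection with a unique referralCode index;
// all names and the code length are placeholders.
const crypto = require('crypto');

function randomReferralCode(length = 8) {
  // hex keeps the code free of dashes and underscores
  return crypto.randomBytes(Math.ceil(length / 2)).toString('hex').slice(0, length);
}

async function assignReferralCode(users, userId) {
  for (;;) {
    const code = randomReferralCode();
    const existing = await users.findOne({ referralCode: code });
    if (!existing) {
      await users.updateOne({ _id: userId }, { $set: { referralCode: code } });
      return code;
    }
    // Collision: extremely unlikely at small user counts; just retry.
  }
}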

Generate a unique id in react that persist

I need to keep a unique identifier for each client using my react app.
Doing the following generates a random string (what I want), but it regenerates it on each refresh, which is not what I want:
const [id] = useState(Math.random().toString(36).substr(2, 8));
I've found uniqueId() from lodash, but I'm afraid the ids won't be unique across multiple clients, as it only gives an id and increments it on every call (1, 2, 3...)
const [id] = useState(_uniqueId());
Is there some kind of _uniqueId that generates a random string and also persist through page refresh?
I don't think there is a built-in or out-of-the-box solution that generates a unique id in React that persists automatically. You have two problems to solve.
How to generate a unique id, which is already solved by using uuid.
And how to persist it.
There are plenty of storage options you can use depending on your needs. Here are a few places where you can persist your data, assuming you want it stored on the client side:
LocalStorage
SessionStorage
Cookie
IndexedDB API
FileSystem
Again, it depends on your use case, so carefully check which one fits your requirements.
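A minimal sketch that combines the two pieces, using the uuid package and localStorage (the 'client-id' storage key is an arbitrary choice):

// Sketch: generate a uuid once and persist it in localStorage.
// Assumes the 'uuid' package is installed; 'client-id' is a placeholder key.
import { useState } from 'react';
import { v4 as uuidv4 } from 'uuid';

function usePersistentId() {
  const [id] = useState(() => {
    const existing = localStorage.getItem('client-id');
    if (existing) return existing;
    const fresh = uuidv4();
    localStorage.setItem('client-id', fresh);
    return fresh;
  });
  return id;
}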
Another way to generate a temporary ID that would be the same for the same client, without storing it is to use browser fingerprinting.
For example, you can take user-agent, client timezone, and screen resolution, apply some hash function to them and call it an ID.
There are more advanced ways of fingerprinting that would result in less chance of two different users having the same ID, but it'll never be a 0% chance.
You also might want to use some libraries, such as https://github.com/fingerprintjs/fingerprintjs for this.
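For illustration only, a very rough sketch of the manual approach (the choice of signals and the SHA-256 digest are assumptions; real fingerprinting libraries combine many more signals):

// Sketch: hash a few browser signals into a fingerprint-style ID.
// Uses the Web Crypto API; the signals chosen here are illustrative.
async function fingerprintId() {
  const signals = [
    navigator.userAgent,
    Intl.DateTimeFormat().resolvedOptions().timeZone,
    `${screen.width}x${screen.height}x${screen.colorDepth}`,
  ].join('|');
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(signals));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}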

How to generate unique number from string in node.js

I would like to generate a unique number from string. The string is a combination of username and password. I would like to generate a unique number id (not string) from this combination. I first md5 the combination and then convert it to number. The number length needs to be 10. Any suggestions?
It would be best if you can provide more details about the third-party you're trying to interface with, because this is a very odd request and it contains a fundamental flaw. You ask for the number to be unique, but you are allowing for only 10 decimal ("number id") digits, or ~10 billion possible values.
This sounds like an awful lot but it's really not. This gives you a hash of just over 33 bits. The simple hash collision probability calculator at http://davidjohnstone.net/pages/hash-collision-probability puts this at a 44% chance of a collision at just 100,000 entries. But that assumes full usage of all the available input characters. Since username and password combinations are almost always limited to alphabetic and numeric characters, the real collision chance is much worse at far fewer entries (can't be calculated without knowing the characters you allow for these fields - but it's bad).
NodeJS provides numerous crypto functions in the crypto module. A whole set of hashing functions is available, including the ideal-case SHA* options. These can be used to provide safe, irreversible hashes with astronomically low collision probabilities.
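For illustration, producing a full SHA-256 digest in Node looks like this; truncating it back down to 10 decimal digits would reintroduce exactly the collision problem described above:

// Sketch: hashing a username/password combination with Node's crypto module.
// A full SHA-256 digest is 256 bits (64 hex characters); forcing it into
// 10 decimal digits discards most of that, hence the collision risk.
const crypto = require('crypto');

function hashCombination(username, password) {
  return crypto.createHash('sha256')
    .update(`${username}:${password}`)
    .digest('hex');
}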
If these options are not usable for you, I would suggest you have a fundamental design flaw. You're almost certainly mapping a user/pass combination to a userID in a remote system in a way that an attacker would find easy to compromise with a simple brute-force attack, given the high collision risk in your model.
If you are doing what I think you are doing, the "right" way to do this would be to have a simple database on a server somewhere. The user/pass would be assigned a unique ID in there, and it doesn't matter what this is - it could be an auto-increment ID field in a single MySQL table. The server would then contact this remote service with the ID value for any API calls necessary, and return the results to the user. This eliminates the security risk because the username/password are not actually hashed, just stored, and can be checked 100% on every call.
Never use a hash as a primary data value. It's a simplification, not a real value on its own.

Followers - mongodb database design

So I'm using mongodb and I'm unsure if I've got the correct / best database collection design for what I'm trying to do.
There can be many items, and a user can create new groups with these items in. Any user may follow any group!
I have not just added the followers and items into the group collection because there could be 5 items in the group, or there could be 10000 (and the same for followers) and from research I believe that you should not use unbound arrays (where the limit is unknown) due to performance issues when the document has to be moved because of its expanding size. (Is there a recommended maximum for array lengths before hitting performance issues anyway?)
I think with the following design a real performance issue could be when I want to get all of the groups that a user is following for a specific item (based off of the user_id and item_id), because then I have to find all of the groups the user is following, and from that find all of the item_groups with the group_id $in and the item id. (but I can't actually see any other way of doing this)
Follower
  .find({ user_id: "54c93d61596b62c316134d2e" })
  .exec(function (err, following) {
    if (err) { throw err; }

    var groups = [];
    for (var i = 0; i < following.length; i++) {
      groups.push(following[i].group_id);
    }

    item_groups.find({
      'group_id': { $in: groups },
      'item_id': '54ca9a2a6508ff7c9ecd7810'
    })
    .exec(function (err, groups) {
      if (err) { throw err; }
      res.json(groups);
    });
  });
Are there any better DB patterns for dealing with this type of setup?
UPDATE: Example use case added in comment below.
Any help / advice will be really appreciated.
Many Thanks,
Mac
I agree with the general notion of other answers that this is a borderline relational problem.
The key to MongoDB data models is write-heaviness, but that can be tricky for this use case, mostly because of the bookkeeping that would be required if you wanted to link users to items directly (a change to a group that is followed by lots of users would incur a huge number of writes, and you need some worker to do this).
Let's investigate whether the read-heavy model is inapplicable here, or whether we're doing premature optimization.
The Read Heavy Approach
Your key concern is the following use case:
a real performance issue could be when I want to get all of the groups that a user is following for a specific item [...] because then I have to find all of the groups the user is following, and from that find all of the item_groups with the group_id $in and the item id.
Let's dissect this:
Get all groups that the user is following
That's a simple query: db.followers.find({userId : userId}). We're going to need an index on userId which will make the runtime of this operation O(log n), or blazing fast even for large n.
from that find all of the item_groups with the group_id $in and the item id
Now this is the trickier part. Let's assume for a moment that it's unlikely for items to be part of a large number of groups. Then a compound index { itemId, groupId } would work best, because we can reduce the candidate set dramatically through the first criterion - if an item is shared in only 800 groups and the user is following 220 groups, MongoDB only needs to find the intersection of these, which is comparatively easy because both sets are small.
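In mongo shell terms, the two indexes discussed so far would look something like this (collection and field names follow the discussion above and are assumptions about the real schema):

// Sketch: indexes supporting the two queries above; names are illustrative.
db.followers.createIndex({ userId: 1 })
db.item_groups.createIndex({ itemId: 1, groupId: 1 })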
We'll need to go deeper than this, though:
The structure of your data is probably that of a complex network. Complex networks come in many flavors, but it makes sense to assume your follower graph is nearly scale-free, which is also pretty much the worst case. In a scale free network, a very small number of nodes (celebrities, super bowl, Wikipedia) attract a whole lot of 'attention' (i.e. have many connections), while a much larger number of nodes have trouble getting the same amount of attention combined.
The small nodes are no reason for concern; the queries above, including round-trips to the database, are in the 2ms range on my development machine on a dataset with tens of millions of connections and > 5GB of data. Now that data set isn't huge, but no matter what technology you choose, you will be RAM bound because the indices must be in RAM in any case (data locality and separability in networks is generally poor), and the set intersection size is small by definition. In other words: this regime is dominated by hardware bottlenecks.
What about the supernodes though?
Since that would be guesswork and I'm interested in network models a lot, I took the liberty of implementing a dramatically simplified network tool based on your data model to make some measurements. (Sorry it's in C#, but generating well-structured networks is hard enough in the language I'm most fluent in...).
When querying the supernodes, I get results in the range of 7ms tops (that's on 12M entries in a 1.3GB db, with the largest group having 133,000 items in it and a user that follows 143 groups.)
The assumption in this code is that the number of groups followed by a user isn't huge, but that seems reasonable here. If it's not, I'd go for the write-heavy approach.
Feel free to play with the code. Unfortunately, it will need a bit of optimization if you want to try this with more than a couple of GB of data, because it's simply not optimized and does some very inefficient calculations here and there (especially the beta-weighted random shuffle could be improved).
In other words: I wouldn't worry about the performance of the read-heavy approach yet. The problem is often not so much that the number of users grows, but that users use the system in unexpected ways.
The Write Heavy Approach
The alternative approach is probably to reverse the order of linking:
UserItemLinker
{
  userId,
  itemId,
  groupIds[]  // for faster retrieval of the linker. It's unlikely that this grows large
}
This is probably the most scalable data model, but I wouldn't go for it unless we're talking about HUGE amounts of data where sharding is a key requirement. The key difference here is that we can now efficiently compartmentalize the data by using the userId as part of the shard key. That helps to parallelize queries, shard efficiently and improve data locality in multi-datacenter-scenarios.
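For example, sharding that collection with userId leading the shard key could look like this (mongo shell sketch; the database and collection names are placeholders):

// Sketch: shard the write-heavy collection by user.
// 'mydb' and 'userItemLinker' are placeholder names; sharding must first
// be enabled on the database, e.g. sh.enableSharding("mydb").
sh.shardCollection("mydb.userItemLinker", { userId: 1, itemId: 1 })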
This could be tested with a more elaborate version of the testbed, but I didn't find the time yet, and frankly, I think it's overkill for most applications.
I read your comment/use case, so I have updated my answer.
I suggest changing the design as per this article: MongoDB Many-To-Many
The design approach is different and you might want to remodel your approach to this. I'll try to give you an idea to start with.
I make the assumption that a User and a Follower are basically the same entities here.
I think the point you might find interesting is that in MongoDB you can store array fields and this is what I will use to simplify/correct your design for MongoDB.
The two entities I would omit are: Followers and ItemGroups
Followers: It is simply a User who can follow Groups. I would add an array of group ids to have a list of Groups that the User follows. So instead of having an entity Follower, I would only have User with an array field that has a list of Group Ids.
ItemGroups: I would remove this entity too. Instead I would use an array of Item Ids in the Group entity and an array of Group Ids in the Item entity.
This is basically it. You will be able to do what you described in your use case. The design is simpler and more accurate in the sense that it reflects the design decisions of a document based database.
Notes:
You can define indexes on array fields in MongoDB. See Multikey Indexes for example.
Be wary about using indexes on array fields though. You need to understand your use case in order to decide whether it is reasonable or not. See this article. Since you only reference ObjectIds I thought you could try it, but there might be other cases where it is better to change the design.
Also note that the _id field is a MongoDB-specific field of type ObjectID, used as the primary key. To access the ids you can refer to them as e.g. user.id, group.id, etc. You can use an index to ensure uniqueness as per this question.
Your schema design could look like this:
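A rough sketch of that design, in the same pseudo-schema style as above (field names are illustrative, following the description of the User, Group, and Item entities):

User  { _id, name, groupIds: [...] }   // Groups this User follows
Group { _id, name, itemIds: [...] }    // Items in this Group
Item  { _id, name, groupIds: [...] }   // Groups this Item belongs to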
As to your other questions/concerns:
Is there a recommended maximum for array lengths before hitting performance issues anyway?
The answer is that in MongoDB the document size is limited to 16 MB and there is no way to work around that. However, 16 MB is considered sufficient; if you hit the 16 MB limit, then your design has to be improved. See here for info, section Document Size Limit.
I think with the following design a real performance issue could be when I want to get all of the groups that a user is following for a specific item (based off of the user_id and item_id)...
I would do it this way. Note how much "easier" it sounds when using MongoDB:
get the item of the user
get groups that reference that item
I would be rather concerned if the arrays get very large and you are using indexes on them. This could overall slow down write operations on the respective document(s). Maybe not so much in your case, but I'm not entirely sure.
You're on the right track to creating a performant NoSQL schema design, and I think you're asking the right questions as to how to properly lay things out.
Here's my understanding of your application:
It looks like Groups can both have many Followers (mapping users to groups) and many Items, but Items may not necessarily be in many Groups (although it is possible). And from your given use-case example, it sounds like retrieving all the Groups an Item is in and all the Items in a Group will be some common read operations.
In your current schema design, you've implemented a model between mapping users to groups as followers and items to groups as item_groups. This works alright until you mention the problem with more complex queries:
I think with the following design a real performance issue could be when I want to get all of the groups that a user is following for a specific item (based off of the user_id and item_id)
I think a few things could help you out in this situation:
Take advantage of MongoDB's powerful indexing capabilities. In particular, I think you should consider creating compound indexes on your Follower objects covering your Group and User, and your Item_Groups on Item and Group, respectively. You'll also want to make sure this kind of relationship is unique, in that a user can only follow a group once and an item can only be added to a group once. This would best be achieved in some pre-save hooks defined in your schema, or using a plugin to check for validity.
FollowerSchema.index({ group: 1, user: 1 }, { unique: true });
Item_GroupsSchema.index({ group: 1, item: 1 }, { unique: true });
Using an index on these fields will create some overhead when writing to the collection, but it sounds like reading from the collection will be a more common interaction so it'll be worth it (I'd suggest reading more up on index performance).
Since a User probably won't be following thousands of groups, I think it'd be worthwhile to include in the user model an array of groups the user is following. This will help you out with that complex query when you want to find all instances of an item in groups that a user is currently following, since you'll have the list of groups right there. You'll still have the implementation where you're using $in: groups, but it'll be with one less query to the collection.
As I mentioned before, it seems like items may not necessarily be in that many groups (just like users won't necessarily be following thousands of groups). If the case may commonly be that an item is in maybe a couple hundred groups, I'd consider just adding an array to the item model for each group that it gets added to. This would increase your performance when reading all the groups an item is in, a query you mentioned would be a common one. Note: You'd still use the Item_Groups model to retrieve all the items in a group by querying on the (now indexed) group_id.
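A sketch of what that simplified lookup could look like once the user document carries its followed group ids (Mongoose-style, reusing the illustrative model and field names from the indexes above):

// Sketch: with the followed groups embedded on the user, a single
// Item_Groups query is enough. Model and field names are assumptions.
User.findById(userId).exec(function (err, user) {
  if (err) { throw err; }
  Item_Groups.find({ group: { $in: user.groups }, item: itemId })
    .exec(function (err, itemGroups) {
      if (err) { throw err; }
      res.json(itemGroups);
    });
});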
Unfortunately, NoSQL databases aren't a good fit in this case. Your data model appears strictly relational. According to the MongoDB documentation, we can only use the data models and perform the operations it describes.
There are some established practices, though. MongoDB advises using a Followers collection to record which user follows which group (and vice versa) with good performance. You may find the case closest to your situation on slide 14 of this page. But I think those slides only apply if you want to show each result on a separate page. For instance: you are a Twitter user, and when you click the followers button you see all your followers; then you click a follower's name and see that follower's messages, and so on. As you can see, all of that works step by step; no relational query is needed.
I believe that you should not use unbound arrays (where the limit is unknown) due to performance issues when the document has to be moved because of its expanding size. (Is there a recommended maximum for array lengths before hitting performance issues anyway?)
Yes, you're right. http://askasya.com/post/largeembeddedarrays .
But if you have about a hundred items in your array there is no problem.
If you have some fixed-size data, even in the thousands, you may embed it into your collections as arrays, and you can query your indexed embedded document fields rapidly.
In my humble opinion, you should create hundreds of thousands of test records and check the performance of embedded documents and arrays for your case. Don't forget to create indexes appropriate to your queries. You may also try using document references in your tests. If you like the performance of the results after testing, go ahead.
You tried to find the group_id records followed by a specific user, and then to find a specific item within those group_ids. Could it be that the Item_Groups and Followers collections have a many-to-many relation?
If so, many-to-many relations aren't supported by NoSQL databases.
Is there any chance you can change your database to MySQL?
If so, you should check this out.
Briefly, MongoDB pros compared to MySQL:
- Better write performance
Briefly, MongoDB cons compared to MySQL:
- Worse read performance
If you work on Node.js you may check https://www.npmjs.com/package/mysql and https://github.com/felixge/node-mysql/
Good luck...

What are the uniqueness guarantees of names generated with Firebase's push()/childByAutoID?

I'd like to use Firebase to make publicly-readable data whose location is difficult to guess. So, to give someone access to the data stored in "element [element ID = X]", I'd like to just send them "X", instead of sending them "X" along with a security token crafted to give them access to the element. Firebase's push() and childByAutoID seem like a natural fit: I can grant public read access to all individual elements, but deny public listing. My code will be blissfully free of token and random number generation. The automatically generated ID is supposed to be unique, and thus should be difficult to guess.
From looking at Firebase.js, it appears the first 8 characters of the automatically generated ID are based on the current timestamp, and the next 12 characters are randomly generated using Math.random(). I assume that the iOS framework does the same thing, and although I can't see the code, the library links to both SecRandomCopyBytes and arc4random.
For my purposes, this looks good enough, but has anyone seen guidance from Firebase on whether we can count on this behavior? I would hate to build code that assumes these names are relatively strong random strings and then have that assumption violated when I upgraded to a newer version of Firebase.
The purpose of the auto-generated IDs provided by Firebase is to allow the developer to create a chronologically ordered list in a distributed manner. It relies on Math.random and the timestamp to generate an ID unique to that client.
However, if you're going to use the auto IDs as security keys, it may not be the best idea depending on how secure you want your system to be. Math.random is not a cryptographically secure random number generator and since push() relies on it, the IDs generated by it aren't either.
The general concept of giving a user access to some data in Firebase if they know the key is a good one though. We have an example of using this type of security rule, but instead of using push IDs, we use a SHA-256 hash of the content itself (in this particular case, they are images). Hashing the content to generate the keys is more secure than relying on push() IDs.
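A minimal sketch of that content-hashing idea in Node.js (this mirrors the concept, not Firebase's actual example code):

// Sketch: derive a hard-to-guess key from the content itself (e.g. image bytes).
// Illustrative only; not Firebase's sample implementation.
const crypto = require('crypto');

function keyForContent(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}

// e.g. store the data under a path like images/<keyForContent(imageBytes)>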
