I'm writing a Node API and have a model for which I need to generate a random 15-digit number. It must be unique and should not look trivial (so I can't use an auto-increment).
I really don't want to generate a number and then query the Mongo database to check whether it already exists; that would mean building some kind of promise-based while loop.
I thought about simply using the current epoch timestamp (e.g. Date.now()), but is that guaranteed to be unique? Could I ever get a duplicate?
Then I also thought of using something like:
var crypto = require('crypto');

function privateKey(howMany, chars) {
  chars = chars || "0123456789";
  var rnd = crypto.randomBytes(howMany),
      value = new Array(howMany),
      len = chars.length;
  for (var i = 0; i < howMany; i++) {
    value[i] = chars[rnd[i] % len];
  }
  // Note: parseInt drops leading zeros, so the result may have fewer than howMany digits.
  return parseInt(value.join(''), 10);
}
with some duplicate-avoidance logic added on top. How should I implement this?
Edit: the value has to be a number.
I know about uuid and Mongo's ObjectId, but they are not purely numeric.
I don't think using the timestamp is a good idea. One reason is system time skew: when the clock is synchronized against a reference (NTP) server, it can jump backwards and you would get duplicates. In fact this can happen at runtime every couple of hours; some servers have serious clock drift and resynchronize every once in a while, and each time that happens you can get duplicates.
Why not use ObjectId?
According to the MongoDB documentation, ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
ObjectIds are small and fast to generate. MongoDB clients should add an _id field with a unique ObjectId. However, if a client does not add an _id field, mongod will add an _id field that holds an ObjectId.
Edit:
You can derive a number from an ObjectId by parsing part of its hex string. Note that parseInt's second argument is a radix rather than a length, and the full 24-hex-digit value does not fit in a JavaScript number, so you can only keep a prefix (which weakens the uniqueness guarantee):
// 13 hex digits = 52 bits, which still fits under Number.MAX_SAFE_INTEGER (2^53 - 1)
var idNum = parseInt(new ObjectId().toHexString().slice(0, 13), 16);
Using time to generate unique IDs is not safe. As ak. pointed out, you could get duplicates due to clock synchronization issues.
If the value does not strictly have to be a number, you should use node-uuid, which implements RFC 4122 for unique ID generation.
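For instance, a minimal sketch with the uuid package (the successor to node-uuid); the field names are just placeholders:
// npm install uuid
const { v4: uuidv4 } = require('uuid');

const doc = {
  _id: uuidv4(),          // e.g. '110ec58a-a0f2-4ac4-8393-c866d813b8d1'
  createdAt: new Date()
};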
I have a unique field in a SQL table that currently has 200K rows.
I use randomString(6, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ') to fill that field, and I get too many unique-constraint conflicts when inserting new rows.
In my log I can see that randomString generated the string HEGDDX today, but it also generated it three months ago, so the insert failed.
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' has 36 characters and I generate a 6-character random string, so there are 36^6 = 2,176,782,336 ≈ 2.17E9 possible values. 200K rows out of ~2 billion gives roughly a 0.00009 chance that a new value collides with an existing row.
Is 0.00009 really big enough to cause this many errors? Is Math.random a bad random generator? What is the alternative for me?
// Build a random string of the given length from the supplied alphabet.
const randomString = function(length, chars) {
  let str = '';
  const charsLen = chars.length;
  for (let i = 0; i < length; i++) {
    str += chars.charAt(Math.floor(Math.random() * charsLen));
  }
  return str;
};
At first sight, your implementation looks OK.
The JS built-in Math.random may not be cryptographically secure, but it is fine for your use case.
The problem lies in the math: it is counter-intuitive, but with billions of possibilities you should expect collisions after only a few hundred thousand draws. This "paradox" is closely related to the birthday paradox. For instance, this blog post is very close to your issue.
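To put numbers on it, here is a quick sketch using the standard birthday-problem approximation; the only inputs are the figures from your question:
// P(at least one collision) ≈ 1 - exp(-n(n-1) / (2N)) for n draws from N possible values
const collisionProbability = (n, N) => 1 - Math.exp(-n * (n - 1) / (2 * N));

const N = Math.pow(36, 6);                      // 2,176,782,336 possible 6-char strings
console.log(collisionProbability(200000, N));   // ~0.9999: a collision is almost certain
console.log(collisionProbability(55000, N));    // ~0.5: even ~55K rows give a 50% chance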
Potential solutions
Since it's supposed to be a short, user-friendly code, you clearly don't want to use a UUID / GUID or the like.
Instead I suggest the following options:
Use a retry strategy. That may seem like a poor hack, but I think it is appropriate in this case. Just catch the DB error for a non-unique value on insert and generate a new one. Limit the retries to 10 or so, because you want to avoid an infinite error loop if something else goes wrong (a sketch of this appears after the list of options).
The caveat of this basic approach is that it may lead to some rare slowness, for those cases, if your DB itself is slow to respond.
You could also load all coupons already generated into memory before the insert and do the generate-and-retry in code instead of waiting for an error from the DB. That means reading all coupons first, so you had better index them.
If performance is more critical, you could even mix the two: refresh a global cache of all coupons at whatever interval fits your case (every hour, every day, etc.), so you can first check quickly against this list without a big query to the DB. Collisions may still happen, since values may have been added in the meantime, so you still check for errors and retry.
You could also change strategy and not enforce uniqueness at all. You need to check this against your requirements, but you could, for example, add extra fields or a child "coupon" table per user.
Get some ideas to generate those values in DB here: Generating a random & unique 8 character string using MySQL (it's for MySQL, but some ideas could apply for all DBs)
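Here is the retry sketch mentioned above. It assumes a generic insertCoupon(code) that rejects on a duplicate key and an isUniqueViolation(err) predicate, because the exact error code depends on your database and driver; randomString is the function from the question:
async function insertWithRetry(insertCoupon, isUniqueViolation, maxRetries = 10) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const code = randomString(6, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ');
    try {
      await insertCoupon(code);                 // rejects on a duplicate key
      return code;
    } catch (err) {
      if (!isUniqueViolation(err)) throw err;   // only retry on uniqueness conflicts
    }
  }
  throw new Error(`Could not generate a unique coupon after ${maxRetries} attempts`);
}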
You can use uuid. I have used it in many of my applications.
If data arrives slowly and not many requests hit the same server at the same time, you can also use the timestamp as an id.
I ran into this issue with ids/duplicates in Google's Firestore and Realtime Database, but I think it is a more general problem and may have several solutions that don't involve Firebase at all.
Right now, I create IDs from my JSON object like this:
// Preparing data
let data = {
  workouts: [{...}],
  movements: [{...}]
}
// Creating the ID from the data
const id = btoa(JSON.stringify(data))
// Deleting workouts and movements
delete data.workouts
delete data.movements
// Adding multiple other properties to the data object, for example:
data.date = new Date()
// ... and much more
// Creating a new document in the DB here, or
// alerting the user that it already exists if data.id already exists
When I load the data object from Firestore, I decode it like this:
const data = JSON.parse(atob(this.workout.id))
My goal is to have only unique workouts + movements combinations in my database, and generating the id from the workouts + movements data solves that.
The issue is that the Realtime DB has a limit of 750 bytes per id (roughly 750 UTF-8 characters) and Firestore has a limit of 1500 bytes per id. I just discovered this when an id came out at ~1000 characters, and I believe user data could push me past even the 1500-character limit.
My ideas:
1) Use some different encoding (UTF-8 friendly) that would shrink even a long (1000-char) string down to something like 100 chars max. It would still need to be decodable. Is that even possible, or is Base64 the shortest it can get?
2) Use auto-generated IDs, save the encoded string as a data.id property in the DB, and when creating a new workout always compare this data.id against the data.id(s) of the workouts already stored.
Is it possible to solve this without looping through all existing workouts?
3) Any other idea? I am still thinking in terms of encoding/decoding, but I believe there must be a different, simpler solution.
Do not btoa
First off, a Base64 string is about a third longer than the stringified JSON it encodes, so if you are struggling with a character limit and can use the full UTF-8 range, do not btoa anything.
IDs
You're looking for a hash. You could (not recommended) try to roll your own by writing a hashing function for each JSON primitive, where each function returns a number:
{ ... } an object has its properties sorted by name, then hashed
string a string builds its hash from its individual characters (.charCodeAt())
number a number can probably be kept as-is
[ ... ] not really sure what I would do with arrays; probably assume that a different order means a different hash and hash them in order
Then you'd walk the JSON recursively, combining the value along these lines:
// hashStringValue and hashObjectValue are the per-type helpers described above
let hash = 0;
hash += hashStringValue("ddd");
hash *= 31;   // multiply by a prime before adding the next part
hash += hashObjectValue({one: 1, test: "text"});
return hash;
The multiplication by a prime before each addition is a cheap trick, but it only works for a limited depth of object.
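As an illustration only (not recommended, as said above), a minimal sketch of such hand-rolled helpers for strings and flat objects; the names mirror the hypothetical hashStringValue / hashObjectValue used above:
// Classic 31-multiplier string hash, kept within 32-bit integer range.
function hashStringValue(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * 31 + str.charCodeAt(i)) | 0;   // | 0 keeps it a 32-bit int
  }
  return hash;
}

// Hash an object with primitive values by sorting its keys first,
// so {foo: 1, bar: 2} and {bar: 2, foo: 1} produce the same hash.
function hashObjectValue(obj) {
  let hash = 0;
  for (const key of Object.keys(obj).sort()) {
    hash = (hash * 31 + hashStringValue(key + ':' + String(obj[key]))) | 0;
  }
  return hash;
}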
Use a library for the hash
I googled javascript hash json and found this: https://www.npmjs.com/package/json-hash which looks like what you want:
npm install json-hash

// If you're not on babel use:
// require('babel/polyfill')
var assert = require('assert')
var hash = require('json-hash')
// hash.digest(any, options?)
assert.equal(hash.digest({foo:1,bar:1}), hash.digest({bar:1,foo:1}))
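To tie this back to the original goal, here is a hedged sketch of using such a digest as the Firestore document id with the Firebase Admin SDK; the collection name and the assumption that digest() returns something usable as a string are mine:
// Sketch: use the content hash as the document id so that identical
// workouts + movements combinations map to the same document.
const admin = require('firebase-admin');
const hash = require('json-hash');

async function saveWorkout(data) {
  const id = hash.digest({ workouts: data.workouts, movements: data.movements });
  const ref = admin.firestore().collection('workouts').doc(String(id));
  const existing = await ref.get();
  if (existing.exists) {
    throw new Error('This workouts + movements combination already exists');
  }
  await ref.set({ ...data, date: new Date() });
  return id;
}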
Storage
For storing the JSON data itself, if you really need to, use a compression algorithm such as LZString. You could also filter the JSON and keep only the values you really need.
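For instance, a small sketch with the lz-string package; the compressToUTF16/decompressFromUTF16 pair suits stores that keep strings as UTF-16:
// npm install lz-string
const LZString = require('lz-string');

function compressWorkout(workoutData) {
  // compressToUTF16 produces a string that is safe to store in UTF-16 based stores
  return LZString.compressToUTF16(JSON.stringify(workoutData));
}

function decompressWorkout(compressed) {
  return JSON.parse(LZString.decompressFromUTF16(compressed));
}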
I currently have a chat bot set up to store records in MongoDB. The object stored in Mongo looks like this:
{ ..., expiration_time: 12451525, ... }
where expiration_time is a number expressed in minutes.
My initial approach was to use setInterval in the web application to query the database and delete all records whose expiration time is less than or equal to the current time. However, I feel that would add a lot of extra queries to the database on top of the existing read and write operations.
I read about storing functions in Mongo, but I'm not sure how to automate invoking such a function so the records delete themselves.
I would definitely love any feedback, approach, or guidelines for best practices.
Thanks in advance!
You can use a TTL index:
db.messages.createIndex( { "expiration_time": 1 }, { expireAfterSeconds: 0 } )
The only requirement is that expiration_time has to be a date instead of an integer. From the MongoDB documentation:
To expire documents at a specific clock time, begin by creating a TTL index on a field that holds values of BSON date type.
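A minimal sketch with the Node MongoDB driver; it assumes the expiration_time in the question is minutes since the Unix epoch and stores it as a BSON Date instead of a plain number:
// `db` is an open handle obtained from MongoClient.connect().
async function saveMessage(db, text, expirationTimeInMinutes) {
  const messages = db.collection('messages');

  // One-time setup: remove each document as soon as its expiration_time has passed.
  await messages.createIndex({ expiration_time: 1 }, { expireAfterSeconds: 0 });

  // Convert minutes since the epoch to a Date so the TTL monitor can act on it.
  await messages.insertOne({
    text,
    expiration_time: new Date(expirationTimeInMinutes * 60 * 1000)
  });
}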
IMHO there are 2 options:
One can use a cronjob to remove the out-of-date entries
One can use a capped collection. It works like a ring buffer, so the oldest entry is overwritten first. Here one must choose the right fixed size for the capped collection, e.g. max = 24 * 60 = 1440 documents if the chat bot writes to the collection once a minute (see the sketch below).
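A minimal sketch of creating such a capped collection with the Node MongoDB driver; the byte size is a placeholder you would tune to your document size:
// Capped collection: fixed size, the oldest documents are overwritten once it is full.
async function createCappedMessages(db) {
  await db.createCollection('messages', {
    capped: true,
    size: 1024 * 1024,   // maximum size in bytes (required for capped collections)
    max: 1440            // at most one day of once-a-minute entries
  });
}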
I am creating an app where I sometimes need to allow the user to generate random strings. I would like to force them to be generated in the following format:
xxxx-xxxx-xxxx
Where "x" is some number [0-9] or character [A-Z]. What would the most efficient way to do this? When generated, I would also need to check does it already exist in database so I am a little bit worried about the time which it would take.
You can do it as simply as:
require("crypto").randomBytes(64).toString('hex')
Note that this produces hexadecimal characters only, not the full [0-9A-Z] alphabet or the xxxx-xxxx-xxxx grouping.
You can use crypto library.
var crypto = require('crypto');
//function code taken from http://blog.tompawlak.org/how-to-generate-random-values-nodejs-javascript
function randomValueHex (len) {
return crypto.randomBytes(Math.ceil(len/2))
.toString('hex') // convert to hexadecimal format
.slice(0,len).toUpperCase(); // return required number of characters
}
var string = randomValueHex(4)+"-"+randomValueHex(4)+"-"+randomValueHex(4);
console.log(string);
Check these threads: Generate random string/characters in JavaScript
You can check whether the value already exists in the database; if it does, just generate a new token and check again. The probability of a collision is really low unless you have a large user base, so the probability of a long loop of checks is low as well.
I am writing a node.js application that relies on redis as its main database, and user info is stored in this database.
I currently have the user data (email, password, date created, etc.) in a hash named user:(incremental uid), and a key email:(email) whose value is that same incremental uid.
When someone logs in, the app looks up email:(email) to get the (incremental uid) and then accesses the user data at user:(incremental uid).
This works great; however, if the number of users reaches into the millions (possible, but a somewhat distant concern), the database size will increase dramatically and I'll start running into problems.
I'm wondering how to hash an email down to an integer that I can use to sort into hash buckets like this (pseudocode):
hash(thisguy@somedomain.com) returns 1234
1234 % 3 or something returns 1
store { thisguy@somedomain.com : (his incremental uid) } in hash emailbucket:1
Then when I need to look up the uid for the email thisguy@somedomain.com, I use a similar procedure:
hash(thisguy@somedomain.com) returns 1234
1234 % 3 or something returns 1
look up thisguy@somedomain.com in hash emailbucket:1, which returns his (incremental uid)
So, my questions in list form:
Is this practical / is there a better way?
How can I hash the email to a few digits?
What is the best way to organize these hashes into buckets?
It probably won't end up mattering that much. Redis doesn't have a separate integer type, so you're only saving yourself a few bytes (and less each time your counter rolls over to the next digit). Doing some napkin math, at a million users the difference in actual storage would be ~50 MB. With hard drives in the < $1/GB range, it's not worth the time it would take to implement.
As a thought experiment, you could maintain a key that is your current user counter, and just GET and INCR each time you add a new user.
Yes, this is a better way to save millions of key-value pairs, using hashes.
You need to design the bucketing algorithm yourself. For example, you can use a timestamp to create a bucket value that changes after every 1000 values. There are many other possible approaches.
Read this article for more background: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value
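A minimal sketch of the bucketing idea from the question, assuming the node-redis v4 client and an arbitrary choice of 1024 buckets; the hash simply takes a few hex digits of an MD5 of the email:
const crypto = require('crypto');
const redis = require('redis');   // node-redis v4: const client = redis.createClient(); await client.connect();

const NUM_BUCKETS = 1024;   // arbitrary; pick a count that keeps each hash reasonably small

// Map an email to a small integer bucket id.
function bucketFor(email) {
  const digest = crypto.createHash('md5').update(email.toLowerCase()).digest('hex');
  return parseInt(digest.slice(0, 8), 16) % NUM_BUCKETS;
}

async function storeUid(client, email, uid) {
  // HSET emailbucket:<n> <email> <uid>
  return client.hSet('emailbucket:' + bucketFor(email), email, String(uid));
}

async function lookupUid(client, email) {
  // HGET emailbucket:<n> <email>  -> the incremental uid, or null if unknown
  return client.hGet('emailbucket:' + bucketFor(email), email);
}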