I have a unique field in a SQL table that currently has 200K rows.
I use randomString(6, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ') to insert data into that field, and I get too many unique-constraint conflict errors when inserting new rows.
In my log I see that randomString generated the string HEGDDX today, but it also generated it 3 months ago, so the insert failed.
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' has 36 characters and I generate a 6-character random string, so there are 36^6 = 2,176,782,336 ≈ 2.2E9 possible values. 200K rows out of ~2.2 billion gives about a 0.00009 chance of hitting an existing value on a single insert.
Is 0.00009 big enough to cause this many errors? Is Math.random a bad random generator? What is an alternative for me?
const randomString = function(length, chars) {
  let str = '';
  const charsLen = chars.length;
  for (let i = 0; i < length; i++) {
    str += chars.charAt(Math.floor(Math.random() * charsLen));
  }
  return str;
};
At first sight, your implementation seems OK.
The built-in Math.random may not be cryptographically secure, but it's fine for your use case.
The problem lies in the math: it's counter-intuitive that with billions of possibilities you get collisions after only a few hundred thousand draws. This "paradox" is closely related to the birthday paradox. For instance, this blog post is very close to your issue.
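To see why, plug your numbers into the standard birthday-problem approximation (a quick sketch; nothing here is specific to your code, only to the 200K rows and the 36^6 key space):

// Birthday-problem approximation: P(collision) ≈ 1 - exp(-n * (n - 1) / (2 * d))
const d = Math.pow(36, 6); // 2,176,782,336 possible 6-char keys
const n = 200000;          // rows already in the table

const pCollision = 1 - Math.exp(-n * (n - 1) / (2 * d));
const expectedPairs = n * (n - 1) / (2 * d);

console.log(pCollision.toFixed(4));    // ~0.9999: a collision is almost certain
console.log(expectedPairs.toFixed(1)); // ~9.2 expected colliding pairs

So even though each individual insert only has a ~0.00009 chance of colliding, across 200K inserts you should expect roughly nine collisions, which matches the errors you are seeing.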
Potential solutions
Since it's supposed to be a user-friendly value, you clearly don't want to use a UUID / GUID or the like.
Instead, I suggest the following options:
Use a retry strategy. That may seem like a poor hack to fix the problem, but I think it is appropriate in this case. Just catch the DB error for a non-unique value on insert and generate a new one. Limit the retries to 10 or so, because you want to avoid an infinite error loop if something else goes wrong.
The caveat of this basic approach is that it may lead to some rare slowness in those cases, if your DB itself is slow to respond. A minimal sketch follows.
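Here insertCoupon and isUniqueViolation are placeholders for your own DB layer (adapt the error check to your driver):

async function insertWithRetry(maxRetries = 10) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const code = randomString(6, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ');
    try {
      await insertCoupon(code); // your INSERT; throws on duplicate key
      return code;
    } catch (err) {
      if (!isUniqueViolation(err)) throw err; // unrelated failure: bail out
      // duplicate value: loop and try a fresh code
    }
  }
  throw new Error('Could not generate a unique code after 10 attempts');
}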
You could also load all existing coupons into memory before the insert and do the generate-and-retry in code instead of waiting for an error from the DB. That means reading all coupons first, so you'd better index them.
If performance is more critical, you could even use a mix of the two: load a global cache of all coupons at whatever interval fits your case (every hour, every day, etc.), so you can quickly check against this list first without doing a big query in the DB. Collisions may still happen, since some values may have been added in the meantime, so you still check for errors on insert and retry. A rough sketch follows.
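Here loadAllCoupons is a placeholder for a SELECT of the unique column; the DB unique constraint remains the final arbiter:

const knownCodes = new Set();

async function refreshCache() {
  const rows = await loadAllCoupons(); // e.g. SELECT code FROM coupons
  knownCodes.clear();
  for (const row of rows) knownCodes.add(row.code);
}
setInterval(refreshCache, 60 * 60 * 1000); // refresh every hour

function generateCandidate() {
  let code;
  do {
    code = randomString(6, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ');
  } while (knownCodes.has(code)); // cheap check, no DB round trip
  return code; // still verify on INSERT and retry on conflict
}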
You could also change strategy and not enforce uniqueness. You'd need to check this against your requirements, but you could add some extra fields, or a child "coupon" table for users.
Get some ideas for generating those values in the DB here: Generating a random & unique 8 character string using MySQL (it's for MySQL, but some ideas apply to other DBs).
You can use a UUID. I have used it in many of my applications.
If the rate of incoming data is slow, i.e. not many requests hitting the same server at the same time, you can also use a timestamp as an id.
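For the UUID route, recent Node.js versions (14.17+) and modern browsers ship a generator, so no extra library is needed (a sketch; note a UUID is much longer than the 6-character codes in the question):

const { randomUUID } = require('crypto'); // in browsers: crypto.randomUUID()

const id = randomUUID(); // e.g. '3b241101-e2bb-4255-8caf-4136c566a962'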
PROBLEM:
I know there is a limit to the number of records (2^32 - 1, per Mozilla) that can be returned from IndexedDB, but what about the amount of data? I am having a problem with IDBIndex.getAll(), or IDBObjectStore.getAll() for that matter, throwing a generic error whenever the amount of data apparently exceeds a certain size:
"DOMException: The operation failed for reasons unrelated to the database itself and not covered by any other error code."
I am using Firefox, and this has been an issue for a while.
EXAMPLE:
// ... open the database, etc ...
let index = db.transaction("database").objectStore("myStore").index("myIndex");
let range = IDBKeyRange.lowerBound(100000); // That's just a random number for the example
let results = index.getAll(range);
results.onerror = e => {...};
results.onsuccess = e => {...};
That will always throw an error. But if I change the range to IDBKeyRange.lowerBound(100001), it always works.
TESTS:
If I turn lowerBound into upperBound (and adjust the range), the number of records returned without throwing an error changes, but that number is consistent. Since not all records hold the same amount of data, yet the number of records returned in each case is fairly close (19,844 vs 20,188), that seems to support my assertion that the limit is on data size rather than record count. The same thing happens if I change the index, just with a different number of records.
Edit: I tested the database on another machine and got a different maximum number of records returned, but that number was consistent as well.
I've also tried closing programs, thinking it might be a memory issue, and changing the size of my database by adding or removing a significant number of records, but that has had no effect on the results returned.
As a workaround, I am currently using openCursor() to evaluate each record individually (roughly as sketched below), but that is slower than getAll(), since I want all the data from each record that satisfies the query.
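The cursor-based workaround looks about like this (a sketch over the same store and index as above):

let cursorIndex = db.transaction("database").objectStore("myStore").index("myIndex");
let cursorRange = IDBKeyRange.lowerBound(100000);
let collected = [];

cursorIndex.openCursor(cursorRange).onsuccess = e => {
  const cursor = e.target.result;
  if (cursor) {
    collected.push(cursor.value); // one record per event, so no oversized batch
    cursor.continue();
  } else {
    // done: every matching record is in `collected`
  }
};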
Any ideas what's wrong and how to fix it?
See related problem that was never solved
I have certain requirements, and I want to do the following in the quickest way possible.
I have thousands of objects like below:
{id:1,value:"value1"} ... {id:1000,value:"value1000"}
I want to access the objects above by id.
I want to clean out the objects with ids lower than a certain id every few minutes (because my high-frequency algorithm generates thousands of objects every second).
I can clean them easily using this:
myArray = myArray.filter(function(obj) {
  return obj.id > cleanSize;
});
I can find the object by id using
myArray.find(x => x.id === 45);
The problem is here: I feel that find is a little slow with larger sets of data, so I created an object keyed by id, like below:

const id = 22;
myArray["x" + id] = { id: id, value: "test" };

Now I can access my item by id easily via myArray["x22"], but the problem is that I can't find a way to remove the older items by id.
Can someone guide me to a better way to achieve the three points I mentioned above using arrays or objects?
The trouble with your question is that you're asking for a way to finish an algorithm that is supposed to solve a problem of yours, but I think there's something fundamentally wrong with the problem to begin with :)
If you store a sizeable number of data records, each associated with an ID, and allow your code to access them freely, then you cannot have another part of your code dump some of them in the bin out of the blue (say, from within some timer callback) just because they are becoming "too old". You must be sure nobody is still working on them (and ever will) before deleting any of them.
If you don't explicitly synchronize the creation and deletion of your records, you might end up with code that happens to work (because your objects happen to be processed quickly enough to never be deleted too early) but is likely to break at any time (if your processing time increases and your data becomes "too old" before being fully processed).
This is especially true in the context of a browser. Your code is supposed to run on any computer connected to the Internet, which could have dozens of reasons to be running 10 or 100 times slower than the machine you test your code on. So making assumptions about the processing time of thousands of records is asking for serious trouble.
Without further specification, it seems to me that answering your question would be like helping you finish a gun that would only allow you to shoot yourself in the foot :)
All this being said, any JavaScript object inherently does exactly what you ask for, provided you're okay with using strings for IDs, since an object property name can also be used as an index in an associative array.
var associative_array = {}
var bob = { id:1456, name:"Bob" }
var ted = { id:2375, name:"Ted" }
// store some data with arbitrary ids
associative_array[bob.id] = bob
associative_array[ted.id] = ted
console.log(JSON.stringify(associative_array)) // Bob and Ted
// access data by id
var some_guy = associative_array[2375] // index will be converted to string anyway
console.log(JSON.stringify(some_guy)) // Ted
var some_other_guy = associative_array["1456"]
console.log(JSON.stringify(some_other_guy)) // Bob
var some_AWOL_guy = associative_array[9999]
console.log(JSON.stringify(some_AWOL_guy)) // undefined
// delete data by id
delete associative_array[bob.id] // so long, Bob
console.log(JSON.stringify(associative_array)) // only Ted left
Though I doubt speed will really be an issue, this mechanism is about as fast as you will ever get JavaScript to run, since the underlying data structure is a hash table: theoretically O(1) per lookup.
Anything involving array methods like find() or filter() will run in at least O(n).
Besides, each invocation of filter() wastes memory and CPU recreating the array to no avail.
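If you'd rather keep numeric IDs and have an explicit bulk cleanup, a Map gives you the same O(1) lookups (a sketch; the periodic sweep is still O(n), and the synchronization caveats above still apply):

const records = new Map();

records.set(1456, { id: 1456, name: "Bob" });
records.set(2375, { id: 2375, name: "Ted" });

records.get(2375); // O(1) lookup: { id: 2375, name: "Ted" }

function cleanBelow(cleanSize) {
  for (const id of records.keys()) {
    if (id < cleanSize) records.delete(id); // deleting while iterating a Map is safe
  }
}
setInterval(() => cleanBelow(2000), 60 * 1000); // e.g. sweep once a minute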
I'm trying to create an activation code which should be unique and consist only of specific characters.
So, this is the solution I built:
function findByActivationId(activationId) {
  return Activation
    .findOne({ activationId })
    .lean()
    .exec();
}

let activationId = buildActivationId();
while (await findByActivationId(activationId)) {
  activationId = buildActivationId();
}
This makes too many DB calls. Is there a better way to query MongoDB for this?
Well, the major problem with checking whether a key is unique comes down to how you create the keys in the first place.
Choose the approach that suits you best up front, to avoid a bunch of problems later.
Your own generated string as a key
Well, you can do this, but it's important to understand a few caveats.
If you want to generate your own key in code and then check that it is unique against all keys currently in the database, it can be done: create the key with your algorithm, then select all keys from the DB and check whether the array of selected rows contains the freshly created string.
Problems with this solution
As we can see, we need to select all keys from the DB and then compare each one to the freshly created one. The problem appears when your database stores a large amount of data: every time, the application has to "download" that data and compare it to the new key, which can produce some freezes.
But if you are sure that your database will not store that many unique rows, it is fine to work with.
Then it is important to create those keys properly. Now we are talking about complexity: the more symbols a key is built from, the harder it is to generate the same one twice.
Shall we take a look at an example?
If you are creating keys based on the letters a-z and the numbers 1-9, and the length of the key is, for example, 5, the complexity of the key space is 35^5, which gives more than 52 million possibilities.
The same key can still be generated twice, but it is like winning the lottery: almost impossible.
And then you just check whether the generated key is really unique; if not (oh c'mon), repeat.
Other ways
Use the MongoDB _id, which is always unique
Use a UNIX timestamp to create a unique key
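If you stick with your own generated keys, you can also drop the pre-check query entirely: put a unique index on activationId and let the insert itself detect duplicates. A sketch, assuming the Activation Mongoose model from the question (error code 11000 is MongoDB's duplicate-key error):

// Schema side (once): activationId: { type: String, unique: true }
async function createActivation(maxRetries = 10) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await Activation.create({ activationId: buildActivationId() });
    } catch (err) {
      if (err.code !== 11000) throw err; // not a duplicate-key error: rethrow
      // duplicate activationId: loop and try a new one
    }
  }
  throw new Error('Failed to generate a unique activationId');
}

This costs one round trip per attempt instead of a find plus an insert.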
I have a database with roughly 1.2M names. I'm using Twitter's typeahead.js to remotely fetch autocomplete suggestions as you type someone's name. In my local environment it takes roughly 1-2 seconds for the results to appear after you stop typing (the autocomplete doesn't appear while you are typing), and 2-5+ seconds on the deployed app on Heroku (using only 1 dyno).
I'm wondering whether the reason the suggestions only show after you stop typing (and with a few seconds' delay) is that my code isn't optimized?
The script on the page:
<script type="text/javascript">
  $(document).ready(function() {
    $("#navPersonSearch").typeahead({
      name: 'people',
      remote: 'name_autocomplete/?q=%QUERY'
    })
    .keydown(function(e) {
      if (e.keyCode === 13) {
        $("form").trigger('submit');
      }
    });
  });
</script>
The keydown snippet is there because without it my form doesn't submit for some reason when pressing Enter.
My Django view:

def name_autocomplete(request):
    query = request.GET.get('q', '')
    if len(query) > 0:
        results = Person.objects.filter(short__istartswith=query)
        result_list = []
        for item in results:
            result_list.append(item.short)
    else:
        result_list = []
    response_text = json.dumps(result_list, separators=(',', ':'))
    return HttpResponse(response_text, content_type="application/json")
The short field in my Person model is also indexed. Is there a way to improve the performance of my typeahead?
I don't think this is directly related to Django, but I may be wrong. I can offer some generic advice for this kind of situation:
(My money is on #4 or #5 below.)
1) What is the average "ping" from your machine to Heroku? If it's far away, that adds a little extra overhead. Not much, though; certainly not much compared to the multi-second delays you are referring to. The penalty will be larger with HTTPS, mind you.
2) Check the values of rateLimitFn and rateLimitWait in your remote dataset. Are they the defaults?
3) In all likelihood, the problem is database/dataset related. The first thing to check is how long it takes to establish a connection to the database (do you use a connection pool?).
4) The second thing: how long it takes to run the query. My bet is on this point or the next. Add debug prints, or use New Relic (even the free plan is OK). Have a look at the generated query and make sure it is indexed. Have your DB "explain" the execution plan for such a query and make sure it uses the index.
5) The third thing: are the results large? If, for example, you specify "J" as the query, I imagine there will be lots of matches. Just fetching them and streaming them to the client takes time. In such cases:
5.1) Specify a minLength for your dataset. Make it at least 3, if not 4.
5.2) Limit the result set that your DB query returns. Make it return no more than 10, say.
6) I am no Django expert, but make sure the way you use your model in Django doesn't load the entire table into memory first. Just sayin'.
HTH.
results = Person.objects.filter(short__istartswith=query)
result_list = []
for item in results:
    result_list.append(item.short)

Probably not the only cause of your slowness, but this is horrible from a performance point of view: never loop over a Django queryset just to collect a single field. To assemble a list from a Django queryset you should always use values_list. In this specific case:

results = Person.objects.filter(short__istartswith=query)
result_list = list(results.values_list('short', flat=True))

This way you get the single field you need straight from the DB, instead of fetching the entire table row, creating a Person instance from it, and finally reading the single attribute from it.
Nitzan covered a lot of the main points that would improve performance, but unlike him I think this might be directly related to Django (or at least, the server side).
A quick way to test this would be to update your name_autocomplete method to simply return 10 randomly generated strings in the format that typeahead expects. (The reason we want them random is so that typeahead's caching doesn't skew the results.)
What I suspect you will see is that typeahead now runs pretty quickly, and you should start seeing results appear as soon as your minLength of characters has been typed.
If that is the case, then we need to look into what could be slowing the query down; my Python skills are non-existent, so I can't help you there, sorry!
If that isn't the case, then I would consider logging when $('#navPersonSearch') fires typeahead:initialized and typeahead:opened, to see if they bring up anything odd.
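A quick sketch of that logging (event names as used by typeahead.js 0.9.x; the timestamps let you spot where the delay accumulates):

$('#navPersonSearch')
  .on('typeahead:initialized', function() { console.log('initialized', Date.now()); })
  .on('typeahead:opened', function() { console.log('opened', Date.now()); });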
You can use Django Haystack, and your server-side code would look roughly like:

def autocomplete(request):
    sqs = SearchQuerySet().filter(content_auto=request.GET.get('q', ''))[:5]  # or however many names you need
    suggestions = [result.first_name for result in sqs]
    # you have to configure typeahead to process the returned data; this is a simple example
    data = json.dumps({'q': suggestions})
    return HttpResponse(data, content_type='application/json')
I need to implement a simple way to handle localization of weekday names, and I came up with the following structure:

var weekdaysLegend = new Array(
  {'it-it':'Lunedì', 'en-us':'Monday'},
  {'it-it':'Martedì', 'en-us':'Tuesday'},
  {'it-it':'Mercoledì', 'en-us':'Wednesday'},
  {'it-it':'Giovedì', 'en-us':'Thursday'},
  {'it-it':'Venerdì', 'en-us':'Friday'},
  {'it-it':'Sabato', 'en-us':'Saturday'},
  {'it-it':'Domenica', 'en-us':'Sunday'}
);
I know I could implement something like an associative array (given that JavaScript does not provide associative arrays, only objects with a similar structure), but I need to iterate through the array using numeric indexes instead of labels.
So, I would like to handle this in a for loop with computed indexes (like j-1 or the like).
Is my structure correct? Given a variable lang holding either "it-it" or "en-us", I tried to print weekdaysLegend[j-1][lang] (or weekdaysLegend[j-1].lang; I think I tried everything!) but the result is [object Object]. Obviously I'm missing something.
Any idea?
The structure looks fine. You should be able to access values by:
weekdaysLegend[0]["en-us"]; // returns Monday
Of course this will also work with values from variables, such as:
weekdaysLegend[i][lang];
(Note that weekdaysLegend[j-1].lang does not work: dot notation looks up a property literally named "lang", so with a variable you need bracket notation.)
for (var i = 0; i < weekdaysLegend.length; i++) {
  alert(weekdaysLegend[i]["en-us"]);
}
This will alert the days of the week.
Sounds like you're doing everything correctly, and the structure works for me as well.
Just a small note (I see the answer is already marked), as I am currently designing a large application where I want to put locales into a JavaScript array.
Assumption: 1000 words x 4 languages generates 'xx-xx' plus the word itself...
That's 1000 rows per language, plus the same 7 characters used for the language key alone = wasted bandwidth...
And the client/browser will have to PARSE THEM ALL before it can do any lookup in the arrays at all.
Here is my approach:
Why not generate the JavaScript for one language at a time? If the user selects another language, just respond with (send) the right JavaScript for the browser to include (see the sketch below).
Either store a separate JavaScript file with a large array for each language, OR use the language as a parameter to the server-side script.
If the language file changes a lot, or you need to minimize it per user/module, that is quite achievable with this approach, as you can just add an extra parameter for which part/module to generate, or a timestamp, so the cached JavaScript file stays valid until changes occur.
If the dynamic approach is too performance-heavy for the webserver, then publish/generate the files every time there is a change or a new locale is added; all you need is the "language linker" check at the top of the page to decide which language file to serve to the browser.
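A minimal sketch of the client side of this idea (the file layout and the lang variable are assumptions; one file per language, each defining the same weekdaysLegend array):

// lang/en-us.js would contain: var weekdaysLegend = ['Monday', 'Tuesday', ...];
// lang/it-it.js would contain: var weekdaysLegend = ['Lunedì', 'Martedì', ...];

var lang = 'en-us'; // chosen by the user or detected server-side
var script = document.createElement('script');
script.src = 'lang/' + lang + '.js'; // load only the selected language
script.onload = function() {
  console.log(weekdaysLegend[0]); // 'Monday': plain index lookups, no language keys
};
document.head.appendChild(script);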
Conclusion
This approach will remove the overhead of a LOT of repeating "language" ID's if the locales list grows large.
You have to access an index from the array, and then a value by specifying a key from the object.
This works just fine for me: http://jsfiddle.net/98Sda/.
var day = 2;
var lang = 'en-us';
var weekdaysLegend = [
  {'it-it':'Lunedì', 'en-us':'Monday'},
  {'it-it':'Martedì', 'en-us':'Tuesday'},
  {'it-it':'Mercoledì', 'en-us':'Wednesday'},
  {'it-it':'Giovedì', 'en-us':'Thursday'},
  {'it-it':'Venerdì', 'en-us':'Friday'},
  {'it-it':'Sabato', 'en-us':'Saturday'},
  {'it-it':'Domenica', 'en-us':'Sunday'}
];
alert(weekdaysLegend[day][lang]); // alerts "Wednesday"