I have a jQuery autocomplete field that has to search through several thousand items, populated from an IndexedDB query (using the idb wrapper). The following is the autocomplete function called when the user begins typing in the box. hasKW() is a function that finds keywords.
async function siteAutoComplete(request, response) {
    const db = await openDB('AgencySite');
    const hasKW = createKeyWordFunction(request.term);
    const state = "NY";
    const PR = 0;
    const agency_id = 17;
    const range = IDBKeyRange.bound([state, PR, agency_id], [state, PR, agency_id || 9999999]);
    let cursor = await db.transaction('sites').store.index("statePRAgency").openCursor(range);
    let result = [];
    while (cursor) {
        if (hasKW(cursor.value.name)) result.push({
            value: cursor.value.id,
            label: cursor.value.name
        });
        cursor = await cursor.continue();
    }
    response(result);
}
My question is this: I'm not sure if the cursor is making everything slow. Is there a way to get all database rows that match the query without using a cursor? Is building the result array slowing me down? Is there a better way of doing this? Currently it takes 2-3s to show the autocomplete list.
I hope this will be useful to someone else. I removed the cursor and just downloaded the whole DB into a JavaScript array and then used .filter. The speedup was dramatic: it took 2300ms the way above and about 21ms using this:
let result = await db.transaction('sites').store.index("statePRAgency").getAll();
response(result.filter(hasKW));
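For completeness, the whole handler with the cursor loop swapped out might look like this; a sketch (untested) that keeps the original value/label mapping and applies hasKW to the name property, as the cursor version did:
async function siteAutoComplete(request, response) {
    const db = await openDB('AgencySite');
    const hasKW = createKeyWordFunction(request.term);
    // One getAll() call instead of one await per row
    const rows = await db.transaction('sites').store.index("statePRAgency").getAll();
    response(rows
        .filter(row => hasKW(row.name))
        .map(row => ({ value: row.id, label: row.name })));
}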
You probably want to use an index, and by the term index, I mean a custom-built one that represents a search engine index. You cannot easily and efficiently perform "startsWith"-style queries over one of IndexedDB's indices because they are effectively whole-value (or at least lexicographic).
There are many ways to create the search engine index I am suggesting. You probably want something like a prefix-tree, also known informally as a trie.
Here is a nice article by John Resig that you might find helpful: https://johnresig.com/blog/javascript-trie-performance-analysis/. Otherwise, I suggest searching around on Google for trie implementations and then figuring out how to represent a similar data structure within an indexedDB object store or an indexedDB index on an object store.
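To give a sense of the shape of such a structure, here is a minimal in-memory trie sketch (purely illustrative; persisting it into an object store, as suggested above, is the part you would design yourself):
// Each node maps a character to a child node and remembers matching record ids
function createTrie() {
    return { children: {}, ids: [] };
}

function trieInsert(root, word, id) {
    let node = root;
    for (const ch of word.toLowerCase()) {
        node = node.children[ch] ?? (node.children[ch] = { children: {}, ids: [] });
        node.ids.push(id); // every prefix along the path points back to this record
    }
}

function trieLookup(root, prefix) {
    let node = root;
    for (const ch of prefix.toLowerCase()) {
        node = node.children[ch];
        if (!node) return []; // no record starts with this prefix
    }
    return node.ids;
}

// e.g.
// const root = createTrie();
// trieInsert(root, "General Knowledge", 17);
// trieLookup(root, "gen"); // [17]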
Essentially, insert the data first without the properties used by the index. Then, in an "indexing step", visit each object, index its value, and set the properties used by the indexedDB index. Or do this at insert/update time.
From there, you probably want to open a connection shortly after page load and keep it open for the entire duration of the page. Then query against the index every time a character is typed (you probably want to rate-limit this call so it fires no more than n times per second, perhaps using some kind of debounce helper function).
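A debounce helper of the usual shape (a sketch; the 250ms delay, and the input and queryIndex names, are stand-ins for your own choices):
// Returns a wrapper that only calls fn after `delay` ms without a new call
function debounce(fn, delay) {
    let timer = null;
    return function (...args) {
        clearTimeout(timer);
        timer = setTimeout(() => fn.apply(this, args), delay);
    };
}

// e.g. wire it to the text field
input.addEventListener('input', debounce(e => queryIndex(e.target.value), 250));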
On the other hand, I might be a bit rusty on this one, but maybe you can create an index on the string property, then use a lower bound that is the characters entered so far. A string of lesser length than another string that contains it appears earlier in lexicographic order, so maybe it is actually that easy. You would also need to impose an upper bound consisting of the characters entered so far concatenated with some kind of sentinel value that can never realistically exist in the data, something silly like ZZZZZ.
Try this out in the browser's console:
indexedDB.cmp('test', 'tasting'); // 1
indexedDB.cmp('test', 'testing'); // -1
indexedDB.cmp('test', 'test'); // 0
You essentially want to experiment with a query like this:
const sentinel = 'ZZZ';
const index = store.index('myIndex');
const bounds = IDBKeyRange.bound(value, value + sentinel);
// get() resolves with only the first record in range;
// use index.getAll(bounds) if you want every match
const request = index.get(bounds);
You might need to tweak the sentinel, experiment with the other parameters to IDBKeyRange.bound (the inclusive/exclusive flags), probably store the value in homogenized case so that the search is case-insensitive, avoid ever sending a query when nothing has been typed, etc.
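Pulling those tweaks together, a sketch (the nameLower index is an assumption; it presumes the values were lower-cased when stored):
function prefixQuery(store, term) {
    if (!term) return Promise.resolve([]); // nothing typed yet, so send no query
    const q = term.toLowerCase(); // search in homogenized case
    // '\uffff' sorts after every ordinary character, which makes it a
    // safer sentinel than 'ZZZ'
    const bounds = IDBKeyRange.bound(q, q + '\uffff');
    return new Promise((resolve, reject) => {
        const request = store.index('nameLower').getAll(bounds);
        request.onsuccess = () => resolve(request.result);
        request.onerror = () => reject(request.error);
    });
}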
Set and Map are both newer data types in ES6, and in certain situations either can be used.
For example, if I want to store all unique elements, I can use a Set as well as a Map with true as the value.
const data: string[] = ["apple", "banana", "apple"]; // sample input
const set = new Set<string>();
const map = new Map<string, boolean>();
data.forEach((item) => {
    map.set(item, true);
});
data.forEach((item) => {
    set.add(item);
});
Both work, but I was wondering: which one is faster?
Update 1
I am looking at which data structure is faster for storing data,
checking whether a value exists using:
map.has(<value>)
set.has(<value>)
and deleting values.
Also, I understand that true is redundant and not used anywhere; I am just trying to show how Map and Set can be used interchangeably.
What matters is speed.
In the most basic sense:
Maps are for holding key-value pairs
Sets are for holding values
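In code form, the distinction looks like this (the values are illustrative):
// A Map associates keys with values
const roles = new Map([["alice", "admin"], ["bob", "user"]]);
roles.get("alice"); // "admin"

// A Set only records membership
const names = new Set(["alice", "bob"]);
names.has("bob"); // true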
The true in your map is completely redundant: if a key exists, that automatically implies that it is true/exists, so you will never need to use the value of the key-value pair in the map. (So why use a map at all if you're never going to make use of what it is actually for? To me that sounds like a set/array with extra steps.)
If you just want to store values, use an array or set. Which of the two depends on what you are trying to do.
The question of "which is faster" can't really be answered properly, because it largely depends on what you are trying to do with the stored values. (What you are trying to do also determines which data structure to use.)
So choose whatever data structure you think fits your needs best, and when you run into a problem that another would fix, you can always change it later and convert from one into the other.
And the more you use them and the more you see what they can and cannot do, the better you'll get at determining which to use from the start for a given problem.
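If you really want numbers anyway, the only honest approach is to measure your own workload. A rough micro-benchmark sketch (results vary by engine, data shape, and JIT warm-up, so treat it as a starting point rather than an answer):
const N = 1_000_000;
const items = Array.from({ length: N }, (_, i) => `item${i}`);

const set = new Set(items);
const map = new Map(items.map((s) => [s, true]));

// Time N membership checks against each structure
let t = performance.now();
for (const s of items) set.has(s);
console.log(`Set.has: ${(performance.now() - t).toFixed(1)} ms`);

t = performance.now();
for (const s of items) map.has(s);
console.log(`Map.has: ${(performance.now() - t).toFixed(1)} ms`);
Both are hash-based under the hood, so in practice the two tend to land within noise of each other, which is what the advice above would predict.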
I have certain requirements, and I want to do the following in the quickest way possible.
I have thousands of objects like the ones below:
{id:1, value:"value1"} ... {id:1000, value:"value1000"}
I want to access the above objects by id.
I want to clean out objects below a certain id every few minutes, because my high-frequency algorithm generates thousands of objects every second.
I can clean easily using this:
myArray = myArray.filter(function( obj ) {
    return obj.id > cleanSize;
});
I can find an object by id using:
myArray.find(x => x.id === 45);
The problem is that find feels a little slow with larger sets of data, so I created an object of objects like below:
const id = 22;
myArray["x" + id] = { id: id, value: "test" };
so I can access an item easily by id with myArray["x22"]. But the problem is that I am not able to find a way to remove older items by id.
Can someone guide me to a better way of achieving the three points I mentioned above using arrays or objects?
The trouble with your question is, you're asking for a way to finish an algorithm that is supposed to solve a problem of yours, but I think there's something fundamentally wrong with the problem to begin with :)
If you store a sizeable amount of data records, each associated with an ID, and allow your code to access them freely, then you cannot have another part of your code dump some of them to the bin out of the blue (say, from within some timer callback) just because they are becoming "too old". You must be sure nobody is still working on them (and will ever need to) before deleting any of them.
If you don't explicitly synchronize the creation and deletion of your records, you might end up with code that happens to work (because your objects happen to be processed quickly enough never to be deleted too early) but is likely to break at any time (if your processing time increases and your data becomes "too old" before being fully processed).
This is especially true in the context of a browser. Your code is supposed to run on any computer connected to the Internet, which could have dozens of reasons to be running 10 or 100 times slower than the machine you test your code on. So making assumptions about the processing time of thousands of records is asking for serious trouble.
Without further specification, it seems to me answering your question would be like helping you finish a gun that would only allow you to shoot yourself in the foot :)
All this being said, any JavaScript object inherently does exactly what you ask for, provided you're okay with using strings for IDs, since an object property name can also be used as an index in an associative array.
var associative_array = {}
var bob = { id:1456, name:"Bob" }
var ted = { id:2375, name:"Ted" }
// store some data with arbitrary ids
associative_array[bob.id] = bob
associative_array[ted.id] = ted
console.log(JSON.stringify(associative_array)) // Bob and Ted
// access data by id
var some_guy = associative_array[2375] // index will be converted to string anyway
console.log(JSON.stringify(some_guy)) // Ted
var some_other_guy = associative_array["1456"]
console.log(JSON.stringify(some_other_guy)) // Bob
var some_AWOL_guy = associative_array[9999]
console.log(JSON.stringify(some_AWOL_guy)) // undefined
// delete data by id
delete associative_array[bob.id] // so long, Bob
console.log(JSON.stringify(associative_array)) // only Ted left
Though I doubt speed will really be an issue, this mechanism is about as fast as you will ever get JavaScript to run, since the underlying data structure is a hash table, theoretically O(1).
Anything involving array methods like find() or filter() will run in at least O(n).
Besides, each invocation of filter() would waste memory and CPU recreating the array to no avail.
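If you do also want the periodic cleanup from the question, an ES6 Map gives you the same O(1) lookup and delete while keeping numeric ids. A sketch (the five-minute interval is illustrative, and computeCleanSize is a hypothetical stand-in for however you pick the cutoff):
const records = new Map(); // id -> object

function addRecord(record) {
    records.set(record.id, record);
}

function getRecord(id) {
    return records.get(id); // O(1); undefined if absent
}

// Sweep out everything at or below the cutoff. O(n) per sweep,
// but unlike filter() it allocates no new array.
function cleanOld(cleanSize) {
    for (const id of records.keys()) {
        if (id <= cleanSize) records.delete(id);
    }
}

// e.g. run the sweep every five minutes
setInterval(() => cleanOld(computeCleanSize()), 5 * 60 * 1000);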
The structure I have for my firebase database is like this:
fruits:
    apple: 5
    banana: 6
I want to put apple and banana in an array so that when I give a command to Google Assistant, it would give me "apple, 5" and "banana, 6". The code I have is like the one below:
function handleCommand(agent) {
    return admin.database().ref('Fruits').child().once("value").then((snapshot) => {
        var i;
        var fruitlist = [];
        // puts each snapshot child of 'Fruits' in an array
        snapshot.forEach(function(item) {
            var itemVal = item.val();
            fruitlist.push(itemVal);
        });
        // outputs command in google assistant
        for (i = 0; i < fruitlist.length; i++) {
            agent.add(fruitlist[i]);
        }
    });
}
The default response is "not available".
I get the following in the execution logs:
Firebase.child failed. Was called with 0 arguments. Expects at least 1.
I do not know which argument to put inside Firebase.child if I want all the fruits to be "spoken" by Google Assistant.
What I am currently doing to output the fruits is to remove the .child call from the return statement and manually enter each child in the code. That gives me exactly the output I want to see, but the solution is very much hardcoded, which is why I want to use arrays instead.
As the error message suggests, and as you surmise, the child() call expects a parameter: in particular, the name of the child node you want to get information from. However, since you want all the children of the "Fruits" node, you don't need to specify it at all. The child() call just navigates down through the hierarchy, but you don't need to navigate at all if you don't want to.
The snapshot you get back will have a value of the entire object. In some cases this can be pretty large, so it isn't a good idea to get it all at once; in your case it is fairly small, so it's not as big a deal.
On the JavaScript side, you can now handle that value as an object with attributes and values. Your original code didn't quite do what you said you want it to, however: you're getting each value but ignoring the name (which is the attribute name, or key). You can iterate over the attributes of an object in a number of ways, but I like getting the keys of the object, looping over them, getting the value associated with each key, and then "doing something" with it.
While I haven't tested the code, it might look something like this:
function handleCommand(agent) {
    return admin.database().ref('Fruits').once("value").then((snapshot) => {
        // Get an object with all the fruits and values
        var fruits = snapshot.val();
        // Get the keys for the attributes of this object as an array
        var keys = Object.keys(fruits);
        // Iterate over the keys, get the associated value, and do something with it
        for (var i = 0; i < keys.length; i++) {
            var key = keys[i];
            var val = fruits[key];
            agent.add(`The number of ${key} you have are: ${val}`);
        }
    });
}
While this is (or should be) working Firebase and JavaScript code, there are a couple of problems with it on the Actions on Google side.
First, the message returned might have some grammar problems, so using your example, you may see a message such as "The number of Apple you have are: 1". There are ways to resolve this, but keep in mind my sample code is just a starter sample.
More significantly, however, the call to agent.add() with a string creates a "SimpleResponse", and you're only allowed two simple responses per reply in an Action. So while this will work for your example, it will have problems if you have more fruit. You can solve this by concatenating the strings together so you're only calling agent.add() once, as sketched below.
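Reusing the fruits and keys variables from the snippet above, that concatenation might look like this (equally untested):
// Build all the lines first, then send a single SimpleResponse
var lines = keys.map(function (key) {
    return `The number of ${key} you have are: ${fruits[key]}`;
});
agent.add(lines.join('. '));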
Finally, you may wish to actually look at some of the other response options for different surfaces. So while you might read out this list on a speaker, you may read a shorter list on a device with a screen and show a table with the information. Details about these might be better addressed as a new StackOverflow question, however.
I recently discovered (sadly) that WebSQL is no longer being supported for HTML5 and that IndexedDB will be replacing it.
I'm wondering if there is any way to query or search through the entries of an IndexedDB in a similar way to how I can use SQL to search for an entry satisfying multiple conditions.
I've seen that I can search through an IndexedDB using one condition with a key range. However, I can't seem to find any way to search two or more columns of data without grabbing all the data from the database and filtering it with for loops.
I know this is a new feature that's barely implemented in the browsers, but I have a project that I'm starting and I'm researching the different ways I could do it.
Thank you!
Check out this answer to the same question; it is more detailed than the answer I give here. The keyPath parameter to store.createIndex and the IDBKeyRange methods can be an array. So, a crude example:
// In onupgradeneeded
var store = db.createObjectStore('mystore');
store.createIndex('myindex', ['prop1','prop2'], {unique:false});
// In your query section
var transaction = db.transaction('mystore','readonly');
var store = transaction.objectStore('mystore');
var index = store.index('myindex');
// Select only those records where prop1=value1 and prop2=value2
var request = index.openCursor(IDBKeyRange.only([value1, value2]));
// Select the first matching record
var request = index.get(IDBKeyRange.only([value1, value2]));
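Neither request does anything by itself; the results arrive via the request's success event. Consuming the cursor variant looks roughly like this:
request.onsuccess = function (event) {
    var cursor = event.target.result;
    if (cursor) {
        // cursor.value is a record where prop1 === value1 and prop2 === value2
        console.log(cursor.value);
        cursor.continue();
    }
};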
Let's say your SQL Query is something like:
SELECT * FROM TableName WHERE Column1 = 'value1' AND Column2 = 'value2'
Equivalent Query in JsStore library:
var Connection = new JsStore.Instance("YourDbName");
Connection.select({
From: "YourTableName"
Where: {
Column1: 'value1',
Column2: 'value2'
},
OnSuccess:function (results){
console.log(results);
},
OnError:function (error) {
console.log(error);
}
});
Now, if you are wondering what JsStore is, let me tell you: it is a library for querying IndexedDB in a simplified manner.
I mention some suggestions for querying relationships in my answer to this question, which may be of interest:
Conceptual problems with IndexedDB (relationships etc.)
As to querying multiple fields at once, it doesn't look like there's a native way to do that in IndexedDB (I could be wrong; I'm still new to it), but you could certainly create a helper function that uses a separate cursor for each field and iterates over them to see which records meet all the criteria.
Yes, opening a continuous key range on an index is pretty much all there is in IndexedDB. Testing for multiple conditions is not possible in IndexedDB itself; it must be done in a cursor loop, as sketched below.
If you find a better solution, please let me know.
BTW, I think a looping cursor can be very fast and require less memory than is possible with SQLite.
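For illustration, testing a second condition inside the cursor loop might look like this (the index and column names are placeholders):
// Narrow by the indexed column first, then test the second condition per record
var request = store.index('column1').openCursor(IDBKeyRange.only('value1'));
var matches = [];
request.onsuccess = function (event) {
    var cursor = event.target.result;
    if (cursor) {
        if (cursor.value.column2 === 'value2') matches.push(cursor.value);
        cursor.continue();
    } else {
        console.log(matches); // every record satisfying both conditions
    }
};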
I'm a couple of years late, but I'd just like to point out that Josh's answer works on the presumption that all of the "columns" in the condition are part of the index's keyPath.
If any of said "columns" exist outside the index's keyPath, you will have to test the conditions involving them on each entry that the cursor created in the example iterates over. So if you're dealing with such queries, or your index isn't unique, be prepared to write some iteration code!
In any case, I suggest you check out BakedGoods if you can represent your query as a boolean expression.
For these types of operations, it will always open a cursor on the focal objectStore unless you're performing a strict equality query (x === y, where x is an objectStore or index key), but it will save you the trouble of writing your own cursor iteration code:
bakedGoods.getAll({
    filter: "keyObj > 5 && valueObj.someProperty !== 'someValue'",
    storageTypes: ["indexedDB"],
    complete: function(byStorageTypeResultDataObj, byStorageTypeErrorObj){}
});
Just for the sake of complete transparency, BakedGoods is maintained by moi.
I'm using Jorn Zaefferer's Autocomplete plugin on a couple of different pages. In both instances, the order of displayed strings is a little bit messed up.
Example 1: array of strings: basically they are in alphabetical order except for General Knowledge which has been pushed to the top:
General Knowledge,Art and Design,Business Studies,Citizenship,Design and Technology,English,Geography,History,ICT,Mathematics,MFL French,MFL German,MFL Spanish,Music,Physical Education,PSHE,Religious Education,Science,Something Else
Displayed strings:
General Knowledge,Geography,Art and Design,Business Studies,Citizenship,Design and Technology,English,History,ICT,Mathematics,MFL French,MFL German,MFL Spanish,Music,Physical Education,PSHE,Religious Education,Science,Something Else
Note that Geography has been pushed to be the second item, after General Knowledge. The rest are all fine.
Example 2: array of strings: as above but with Cross-curricular instead of General Knowledge.
Cross-curricular,Art and Design,Business Studies,Citizenship,Design and Technology,English,Geography,History,ICT,Mathematics,MFL French,MFL German,MFL Spanish,Music,Physical Education,PSHE,Religious Education,Science,Something Else
Displayed strings:
Cross-curricular,Citizenship,Art and Design,Business Studies,Design and Technology,English,Geography,History,ICT,Mathematics,MFL French,MFL German,MFL Spanish,Music,Physical Education,PSHE,Religious Education,Science,Something Else
Here, Citizenship has been pushed to the number 2 position.
I've experimented a little, and it seems like there's a bug saying "put things that start with the same letter as the first item after the first item, and leave the rest alone". Kind of mystifying. I've tried a bit of debugging by triggering alerts inside the autocomplete plugin code, but everywhere I can see, it's using the correct order; it seems to be just when it's rendered out that it goes wrong.
Any ideas anyone?
max
EDIT - reply to Clint
Thanks for pointing me at the relevant bit of code, btw. To make diagnosis simpler, I changed the array of values to ["carrot", "apple", "cherry"], which autocomplete re-orders to ["carrot", "cherry", "apple"].
Here's the array that it generates for stMatchSets:
stMatchSets = ({'':[#1={value:"carrot", data:["carrot"], result:"carrot"}, #3={value:"apple", data:["apple"], result:"apple"}, #2={value:"cherry", data:["cherry"], result:"cherry"}], c:[#1#, #2#], a:[#3#]})
So, it's collecting the first letters together into a map, which makes sense as a first-pass matching strategy. What I'd like it to do, though, is to use the given array of values, rather than the map, when it comes to populating the displayed list. I can't quite get my head around what's going on with the cache inside the guts of the code (I'm not very experienced with JavaScript).
SOLVED - I fixed this by hacking the JavaScript in the plugin.
On line 549 (or 565) we return a variable csub, which is an object holding the matching data. Before it's returned, I reorder it so that the order matches the original array of values we were given (i.e., the one used to build the index in the first place), which I had stashed in another variable:
// comparators must return a number, not a boolean, for sort() to behave
csub = csub.sort(function(a, b) {
    return originalData.indexOf(a.value) - originalData.indexOf(b.value);
});
Hacky, but it works. Personally, I think that this behaviour (possibly coded more cleanly) should be the default behaviour of the plugin: i.e., the order of results should match the original passed array of possible values. That way the user can sort their array alphabetically if they want (which is trivial) to get the results in alphabetical order, or they can preserve their own "custom" order.
What I did instead of your solution was to add
if (!q && data[q]) { return data[q]; }
just above
var csub = [];
found around line 535.
What this does, if I understood correctly, is fetch the cached data for when the input is empty, specified around line 472: stMatchSets[""] = []. Assuming that the cached data for the empty input is the first data you provided to begin with, it's all good.
I'm not sure about this autocomplete plugin in particular, but are you sure it's not just trying to give you the best match possible? My autocomplete plugin does some heuristics and does reordering of that nature.
Which brings me to my other answer: there are a million jQuery autocomplete plugins out there. If this one doesn't satisfy you, I'm sure there is another that will.
edit:
In fact, I'm completely certain that's what it's doing. Take a look around line 474:
// loop through the array and create a lookup structure
for ( var i = 0, ol = options.data.length; i < ol; i++ ) {
    /* some code */
    var firstChar = value.charAt(0).toLowerCase();
    // if no lookup array for this character exists, look it up now
    if( !stMatchSets[firstChar] )
and so on. So, it's a feature.