How can we search through 5,000 users in a Firestore DB? - javascript

We have a staffing application, built using vuejs and a firestore database with over 5,000 users. Our challenge is that we need a layout for admins to search for users in the db.
Previously, we were loading all users on our Users.vue layout and then searching/viewing them in a Vuetify data table. The problem now is that we just have too many users. That layout loads way too slowly and will even cause the app to crash on mobile browsers.
The solution we are trying to make work is to search for users in the db, and only load those results into our data table. The code below (using vuex) works, as long as the "name" is EXACT.
getUsersState({ commit }, payload) {
  fb.usersCollection.where("name", "==", payload.search).limit(10).onSnapshot(querySnapshot => {
    let usersArray = []
    console.log(payload.search)
    querySnapshot.forEach(doc => {
      let user = doc.data()
      user.id = doc.id
      usersArray.push(user)
    })
    commit('setUsers', usersArray)
  })
},
The problem is that we need it to work even if we only type in the first few letters of a name or even an email address. Firestore only offers the ==, >=, <=, >, and < operators, and "array-contains" only works with an array field, not our "user" object.
On Users.vue:
created () {
  console.log('getting users')
  this.$store.dispatch("getUsersState", {search: this.search})
},
computed: {
  ...mapState(['currentUser', 'users', 'userProfile']),
  isAdmin: function() {
    return this.userProfile.accessLevel >= 5
  },
  isUsers: function() {
    return this.users
  }
},
watch: {
  search: 'updateSearch'
},
methods: {
  clearSearch () {
    this.search = ''
  },
  updateSearch() {
    this.$store.dispatch("getUsersState", {search: this.search})
  },
},
Does anyone have any ideas for how we can search the users in our firestore DB by only typing in the first few letters of their name?

Integrate a full-text search engine and keep it in sync with Firestore. This is nontrivial to implement; the official docs recommend Algolia: https://firebase.google.com/docs/firestore/solutions/search

The right answer is full-text search, but that is a big hammer for this use case. Here are some other options that can keep you going for a while:
1) First, note that Firestore has an index sitting there that looks like
Collection\User\a -> somedoc
Collection\User\aaa -> somedoc
Collection\User\aba -> somedoc
Collection\User\abc -> somedoc
Collection\User\bbc -> somedoc
If you have a username prefix like 'a' there is nothing to say you can't run a query for user >= 'a' and user <= 'b' and have it fetch (in this example) {a, aaa, aba, abc}.
Similarly >= 'ab' && <= 'b' gets you {aba, abc}.
So you go from fetching all 5,000 users to just the users with the prefix -- which is a lot smaller.
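A minimal sketch of that prefix query, adapted to the vuex action from the question ('\uf8ff' is a very high code point, so the range covers every name that starts with the typed prefix):
getUsersState({ commit }, payload) {
  // Range query: every name starting with payload.search
  fb.usersCollection
    .where("name", ">=", payload.search)
    .where("name", "<=", payload.search + "\uf8ff")
    .limit(10)
    .onSnapshot(querySnapshot => {
      let usersArray = []
      querySnapshot.forEach(doc => {
        let user = doc.data()
        user.id = doc.id
        usersArray.push(user)
      })
      commit('setUsers', usersArray)
    })
},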
2) Stuff the things you want to autocomplete into a few documents and load them.
Imagine you have 5,000 users, and you store their names in 10 documents with 500 usernames each -- you keep those documents up to date as users are added or removed. To get the entire autocomplete list you fetch those 10 documents into the browser and feed the 5,000 usernames to some sort of autocomplete widget. You could do the same thing for emails.
The browser can now do fancy instant autocomplete. This is faster/cheaper than fetching the entire collection of 5000 users -- you only ask for the data you need.
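A sketch of option 2, assuming a hypothetical autocomplete collection whose ~10 documents each hold a names array of up to ~500 usernames (fb.db and the collection name are assumptions, not from the question):
async function loadAutocompleteNames() {
  // Fetch the shard documents and flatten them for the autocomplete widget
  const snapshot = await fb.db.collection('autocomplete').get()
  return snapshot.docs.reduce((all, doc) => all.concat(doc.data().names), [])
}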

"Bulk" updating with a Postgres DB and JS/Knex/Express

I have an update endpoint where, when an incoming request contains a site name that matches any site name in my job sites table, I change the status of all matching DB entries to "Pending Transfer" and essentially clear their site location data.
I've been able to make this work with the following:
async function bulkUpdate(req, res){
  const site = req.body.data;
  const data = await knex('assets')
    .whereRaw(`location ->> 'site' = ?`, [site.physical_site_name]) // bind the value to avoid SQL injection
    .update({
      status: "Pending Transfer",
      location: {
        site: site.physical_site_name,
        site_loc: { first_octet: site.first_octet, mdc: '', shelf: '', unit: '' } // remove IP address
      },
      //history: ''
    }) // todo: update history as well
    .returning('*')
    .then((results) => results[0]);
  res.status(200).json({ data });
}
I also want to update history (any action we ever take on an object like a job site is stored in a JSON column, basically used as an array).
As you can see, history is commented out. Since this function essentially "sweeps" over all job sites that match the criteria and makes the change, I would also like to push an entry onto the existing history column here as well. I've done this in other situations by destructuring the existing history data and adding the new entry, etc. But since we are sweeping over the data, is there a way to just push this entry onto that array without having to pull each row's history data via destructuring?
The shape of an entry in history column is like so:
[{"action_date":"\"2022-09-06T22:41:10.232Z\"","action_taken":"Bulk Upload","action_by":"Davi","action_by_id":120,"action_comment":"Initial Upload","action_key":"PRtW2o3OoosRK9oiUUMnByM4V"}]
So ideally I would like to "push" a new object onto this array without having (or overwriting) the previous data.
I'm newer at this, so thank you for all the help.
I had to convert the column from json to jsonb type, but this did the trick (with the concat operator):
history: knex.raw(`history || ?::jsonb`, JSON.stringify({ newObj: newObjData }))
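For reference, a sketch of the original bulk update with the history push folded in (assuming the history column is already jsonb; the newEntry fields just mirror the example shape above):
const newEntry = {
  action_date: new Date().toISOString(),
  action_taken: "Bulk Transfer",
  action_comment: "Cleared site location data"
};
const data = await knex('assets')
  .whereRaw(`location ->> 'site' = ?`, [site.physical_site_name])
  .update({
    status: "Pending Transfer",
    // jsonb || appends a non-array right operand as a single array element
    history: knex.raw(`history || ?::jsonb`, JSON.stringify(newEntry))
  })
  .returning('*');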

Query stored values that contain specific string

I have a small realtime firebase database that's set up like this:
database
-messages
--XXXXXXXXXXXX
---id : "XXX-XXX"
---content : "Hello world!"
It's a very simple message system; the id field is basically a combination of user ids from my MySQL database. I'm trying to return all messages that match one of the ids, either sender or receiver. But I can't do it; it seems like Firebase only supports exact queries. Could you give me some guidance?
Here's the code I'm working with:
firebase.database().ref("messages").orderByChild("id").equalTo(userId).on("value", function(snapshot) { /* ... */ })
I'm looking for something like ".contains(userId)"
Firebase supports exact matches (with equalTo) and so-called prefix queries where the value starts with a certain value (by combining startAt and endAt). It does not support querying for values that end with or contain a certain value.
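For completeness, a prefix query looks like this (it matches ids that start with userId, which still isn't the contains behavior you're after):
firebase.database().ref("messages")
  .orderByChild("id")
  .startAt(userId)
  .endAt(userId + "\uf8ff")
  .on("value", function(snapshot) { /* ... */ });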
I recommend keeping a mapping from each user ID to their message nodes somewhere separate in the database.
So say that you have:
messages: {
  "XXXXXXXXXXXX": {
    id: "YYY-ZZZ",
    content: "Hello world!"
  }
}
You also have the following mappings:
userMessages: {
  "YYY": {
    "XXXXXXXXXXXX": true
  },
  "ZZZ": {
    "XXXXXXXXXXXX": true
  }
}
Now with this information you can look up the messages for each user based on their ID.
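A sketch of that lookup, using the userMessages mapping above:
firebase.database().ref("userMessages/" + userId).once("value", function(snapshot) {
  snapshot.forEach(function(child) {
    // child.key is the message node's key; fetch the actual message
    firebase.database().ref("messages/" + child.key).once("value", function(msgSnap) {
      console.log(msgSnap.val());
    });
  });
});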
For more on the modeling of this type of data, I recommend:
Best way to manage Chat channels in Firebase
Many to Many relationship in Firebase
this article on NoSQL data modeling

Generate a custom unique id and check it does not exist before creating a new user with MySQL and sequelize

I'm trying to create a unique id that is 8 characters long for each new user added to a MySQL database. I am using Sequelize along with express to create users. I've created my own custom function: idGen() that simply returns a randomized 8 character string. Using express router I can handle/validate all the form data used to create a new user. The issue I am having is when I generate a new ID I want to check to make sure that ID does not already exist in the database. So far I have this solution:
let uid = idGen(8);
Users.findAll().then( data => {
  const tableData = data.map(user => user.get('id'));
  while( tableData.includes(uid) ){
    try {
      uid = idGen(8);
    } catch( error ){
      return res.status(400).json( error )
    }
  }
}).then( () => {
  // return the promise so the next .then receives the created user
  return Users.create({
    id: uid,
    name: req.body.name,
    email: req.body.email
  })
}).then( user => res.json(user) );
This block of code is actually working and saving the new user in the DB, but I am almost certain that this is not the best/right way of doing it. Is anyone able to point me in the right direction and show me a better/proper way to check the randomly generated ID, and re-call idGen (if needed) in a loop before adding a new user?
Many Thanks!
Instead of findAll and then filtering in JavaScript, why don't you select just the candidate id from the database right away?
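A sketch of that idea with Sequelize's findByPk, so only the candidate id is ever queried. (Note the check-then-insert still races under concurrency; a unique primary key plus retrying on the constraint error is the real safeguard.)
async function generateUniqueId() {
  let uid = idGen(8);
  // Loop until no row with this primary key exists
  while (await Users.findByPk(uid)) {
    uid = idGen(8);
  }
  return uid;
}

const uid = await generateUniqueId();
const user = await Users.create({ id: uid, name: req.body.name, email: req.body.email });
res.json(user);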
An alternative way I could think of is using a probabilistic filter like a Bloom or cuckoo filter; the false positive rate should be low.
Load the ids into Redis, probably with RedisBloom (https://github.com/RedisBloom/RedisBloom).
Check each newly generated id against the Bloom filter.
If it (probably) exists => re-generate the id. If not, insert. There could be false positives, but the rate is low and you can handle them just the same.
Pros:
- no need to hit the database for every check.
- checking with a Bloom filter is probably much faster than the DB.
- scaling Redis is easier than scaling the DB.
Cons:
- needs Redis and RedisBloom.
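A sketch of the check-and-reserve flow using node-redis v4's generic sendCommand against a RedisBloom-enabled server (the 'user:ids' filter key is illustrative):
const { createClient } = require('redis');

async function isIdTaken(client, uid) {
  // BF.EXISTS returns 1 if uid *may* exist (possible false positive), 0 if definitely not
  return await client.sendCommand(['BF.EXISTS', 'user:ids', uid]) === 1;
}

async function reserveId(client) {
  let uid = idGen(8);
  while (await isIdTaken(client, uid)) {
    uid = idGen(8);
  }
  await client.sendCommand(['BF.ADD', 'user:ids', uid]);
  return uid;
}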

Ajax GET: multiple data-specific calls, or fewer less specific calls?

I'm developing a web app using a Node.js/express backend and MongoDB as a database.
The below example is for an admin dashboard page where I will display cards with different information relating to the users on the site. On the same page I might want to show, for example:
The number of each type of user
The most common location for each user type
How many signups there are by month
Most popular job titles
I could do this all in one route, where I have a controller that performs all of these tasks, and bundles them as an object to a url that I can then pull data from using ajax. Or, I could split each task into its own route/controller, with a separate ajax call to each. What I'm trying to decide is what are the best practices around making multiple ajax calls on a single page.
Example:
I am building up a page where I will make an interactive table using DataTables for different types of user (I currently have two: mentors and mentees). This example requires just two data requests (one for each user type), but my final page will be more like 10.
For each user type, I am making an ajax GET call and building the table from the returned data:
User type 1 - Mentees
$.get('/admin/' + id + '/mentees')
  .done(data => {
    $('#menteeTable').DataTable({
      data: data,
      "columns": [
        { "data": "username" },
        { "data": "status" }
      ]
    });
  })
User type 2 - Mentors
$.get('/admin/' + id + '/mentors')
  .done(data => {
    $('#mentorTable').DataTable({
      data: data,
      "columns": [
        { "data": "username" },
        { "data": "position" }
      ]
    });
  })
This then requires two routes in my Node.js backend:
router.get("/admin/:id/mentors", getMentors);
router.get("/admin/:id/mentees", getMentees);
And two controllers, which are structured identically (but filter for different user types):
getMentees(req, res, next){
  console.log("Controller: getMentees");
  let query = { accountType: 'mentee', isAdmin: false };
  Profile.find(query)
    .lean()
    .then(users => {
      return res.json(users);
    })
    .catch(err => {
      console.log(err)
    })
}
This works great. However, as I need to make multiple data requests I want to make sure that I'm building this the right way. I can see several options:
Make individual ajax calls for each data type, and do any heavy lifting on the backend (e.g. tally user types and return) - as above
Make individual ajax calls for each data type, but do the heavy lifting on the frontend. In the above example I could have just as easily filtered out isAdmin users on the data returned from my ajax call
Make fewer ajax calls that request less refined data. In the above example I could have made one call (requiring only one route/controller) for all users, and then filtered data on the frontend to build two tables
I would love some advice on which strategy is most efficient in terms of time spent sourcing data.
UPDATE
To clarify the question, I could have achieved the same result as above using a controller setup something like this:
Profile.find(query)
  .lean()
  .then(users => {
    let mentors = [],
        mentees = []
    users.forEach(user => {
      if (user.accountType === 'mentee') {
        mentees.push(user);
      } else if (user.accountType === 'mentor') {
        mentors.push(user);
      }
    });
    return res.json({ mentees, mentors });
  })
And then make one ajax call, and split the data accordingly. My question is: which is the preferred option?
TL;DR: Option 1
IMO I wouldn't serve unprocessed data to the front-end. Things can go wrong, you can reveal too much, and it could take a lot for an unspecified client machine to process (it could be a low-power device with limited bandwidth and battery, for example). You want a smooth user experience, and JavaScript on the client churning through a mass of data would detract from that. I use the back-end for the processing (prepare the information how you need it), JS for retrieving and placing the information (AJAX) on the page and for things like switching element states, and CSS for anything moving around (animations and transitions etc.) as much as possible before resorting to JS.
Also, for the routes, my approach would be that each distinct package of information (DataTable) has its own route, so you're not overloading a method with too many purposes; keep it simple and maintainable. You can always abstract away anything that's identical and repeated often.
So to answer your question, I'd go with Option 1.
You could also offer a single 'page-load' endpoint, then if anything changes, update the individual tables later using their distinct endpoints. This initial 'page-load' call could collate the information from the endpoints on the back-end and serve it as one package of data to populate all tables initially. One initial request with one lot of well-defined data, then the ability to update an individual table if the user requests it (or via a push if you get into that).
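A sketch of that 'page-load' collation with Express and Promise.all (the /dashboard path is illustrative, not from the question):
router.get('/admin/:id/dashboard', async (req, res, next) => {
  try {
    // Run the per-table queries in parallel and serve one package of data
    const [mentors, mentees] = await Promise.all([
      Profile.find({ accountType: 'mentor', isAdmin: false }).lean(),
      Profile.find({ accountType: 'mentee', isAdmin: false }).lean()
    ]);
    res.json({ mentors, mentees });
  } catch (err) {
    next(err);
  }
});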
It is a really good question. First of all, you should consider how your application will work with the received data. If a large amount of data never changes on the frontend but is needed in full by several views (like user settings data, which the application reads constantly but changes rarely), it can be cached on the frontend, and you could follow your second option. If instead the frontend works with only a small part of a huge amount of database data (like log data for a specific user), it is preferable to preprocess (filter) it on the server side, as in your first and third options. So for me, the second option is preferable only for caching unchanged data on the frontend.
After clarifying the question: you could group the users in the request handler using the lodash library:
const _ = require('lodash');

Profile.find(query)
  .lean()
  .then(users => {
    const result = _(users)
      .groupBy(elem => elem.accountType)
      .map((vals, key) => ({ accountType: key, users: vals }))
      .value();
    return res.json(result);
  });
Certainly you could map your data however is comfortable. This way you get all types of accounts (not only 'mentee' and 'mentor').
Usually there are 3 things in such architectures:
1. Client
2. API Gateway
3. Micro services (Servers)
In your case:
1. Client is JS application code
2. API Gateway + Server is Nodejs/express (Dual responsibility)
Point 1 to be noted
Servers only provide core APIs. So the API for a server should be only a user API like:
/users?type={mentor/mentee/*}&limit=10&pageNo=8
i.e. anyone can ask for all data or filtered data using the type query string.
Point 2 to be noted
Since web pages are composed of multiple data points, and making a call for every data point to the same server increases round trips and makes the UX worse, API gateways exist. So in this case JS would not communicate directly with the core server; it communicates with the API Gateway with APIs like:
/home
The above API internally calls the API below and aggregates the data into a single JSON with the mentor and mentee lists:
/users?type={mentor/mentee/*}&limit=10&pageNo=8
This API simply passes the call through to the core server with the query attributes.
Now, since in your code the API gateway and core server are merged into a single layer, this is how you could set up your code:
async getHome(req, res, next){
  console.log("Controller: /home");
  const queryMentees = { accountType: 'mentee', isAdmin: false };
  const queryMentors = { accountType: 'mentor', isAdmin: true };
  // Run both core queries in parallel and aggregate the results
  const [mentees, mentors] = await Promise.all([
    getProfileData(queryMentees),
    getProfileData(queryMentors)
  ]);
  return res.json({ mentees, mentors });
}
getUsers(req, res, next){
  console.log("Controller: /users");
  const query = { accountType: req.query.type, isAdmin: req.query.isAdmin };
  return getProfileData(query).then(users => res.json(users));
}
And a common ProfileService.js class with a function like:
getProfileData(query){
  // Return the promise so callers can await the result
  return Profile.find(query)
    .lean()
    .catch(err => {
      console.log(err)
    })
}
More info about API Gateway Pattern here
If you can't estimate how many user types your app will need, then use parameters.
If I wrote this application, I wouldn't write multiple functions for the ajax calls, and I wouldn't write multiple routes and controllers.
Client side, something like this:
let getQuery = (id, userType) => {
  $.get('/admin/' + id + '/userType/' + userType)
    .done(data => {
      let dataTable = null;
      switch (userType) {
        case "mentee":
          dataTable = $('#menteeTable');
          break;
        case "mentor":
          dataTable = $('#mentorTable');
          break;
        // .. you could add more selectors for more datatables, but I wouldn't prefer this way: you can generate the "columns" property on the server like "data", meaning you can use just one datatable element on the client side
      }
      dataTable.DataTable({
        data: data,
        "columns": [
          { "data": "username" },
          { "data": "status" }
        ]
      });
    })
}
My preference for the client side:
let getQuery = (id, userType) => {
  $.get('/admin/' + id + '/userType/' + userType)
    .done(data => {
      $('#dataTable').DataTable({
        data: data.rows,
        "columns": data.columns
      });
    })
}
The server response should then follow the { rows: [{}...], columns: [{}...] } shape used in this scenario; see the DataTables examples.
Server side like this.
Just one route:
router.get("/admin/:id/userType/:userType", getQueryFromDB);
Controller
getQueryFromDB(req, res, next){
  let query = { accountType: req.params.userType, isAdmin: false };
  Profile.find(query)
    .lean()
    .then(users => {
      return res.json(users);
    })
    .catch(err => {
      console.log(err)
    })
}
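As written, this controller returns a plain user array. To match the preferred client above (which reads data.rows and data.columns), a sketch along these lines would work; the columnsByType lookup table is hypothetical, not part of the original answer:
const columnsByType = {
  mentee: [{ "data": "username" }, { "data": "status" }],
  mentor: [{ "data": "username" }, { "data": "position" }]
};

getQueryFromDB(req, res, next){
  const userType = req.params.userType;
  Profile.find({ accountType: userType, isAdmin: false })
    .lean()
    // Return rows plus the per-type column definitions in one response
    .then(users => res.json({ rows: users, columns: columnsByType[userType] }))
    .catch(err => console.log(err))
}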
So the main point of my answer is that mentees, mentors, etc. are parameters, just like "id".
Make sure your authentication checks which users have access to each userType's data, for both code samples (mine and yours); otherwise someone could reach your data just by changing the route.
Have a nice weekend
From the perspective of performance and smoothness of the UI on the user's device:
Sure, it would be better to do one ajax request for all the core data (which is important to show as soon as possible), and possibly perform more requests for lower-priority data with some tiny delay. Or do two requests: one for 'fast' data and another for 'slow' (if applicable), because:
On one hand, many ajax requests could slow down the UI, and there is a limit to how many ajax requests run at the same time (it is browser-dependent and could be from 2 to 10), so if, for example, IE has a limit of 2, then with 10 ajax calls there will be a queue of waiting requests.
On the other hand, if there is much data to show, or some data takes longer to prepare, one big request could mean a long wait for the backend response before anything is shown.
Talking of heavy lifting: it is not good to do such things on the UI side anyway, because:
The user's device may be short on resources and 'slow'.
JavaScript on the page is single-threaded; as a consequence, any long loop 'freezes' the UI for the time the loop takes to run.
Talking of filtering users:
Profile.find(query)
  .lean()
  .then(users => {
    let mentors = [],
        mentees = []
    users.forEach(user => {
      if (user.accountType === 'mentee') {
        mentees.push(user);
      } else if (user.accountType === 'mentor') {
        mentors.push(user);
      }
    });
    return res.json({ mentees, mentors });
  })
This code seems to have one problem: the query will possibly have sorts and limits, and if so the final result will be inconsistent; it could end up with only mentees or only mentors. I think you should do 2 separate queries to the data storage anyway.
From the perspective of project structure, maintainability, flexibility, reusability, and so on, it is of course good to decouple things as much as possible.
So, finally, imagine you made:
1. many microservices, for example one backend microservice per widget, plus a layer which aggregates their results, to optimize traffic from the UI into 1-2 ajax queries.
2. many UI modules, each working with its own data received from some service, which makes 1-2 calls to the aggregating backend and distributes the different datasets it receives to the frontend modules.
On the back end, just make one dynamic, parameterized API. You can pass mentor, mentee, admin, etc. as the role. You should have some type of user authentication and authorization to check whether user A can see users in role B or not.
Regarding the UI, it's up to the users whether they want one page with a drop-down filter or separate URLs to bookmark:
either multiple URLs like /admin, /mentor, etc.,
or one URL with a query string and a drop-down: /user?role=mentor, /user?role=admin.
Based on the URL you have to make controllers. I generally prefer a drop-down that fetches data (with, say, all mentors as the default selection).

Firestore manual entry for indices?

I have the following query:
get invitations() {
  return this.firestore.colWithIds$(`users/${this.authProvider.currentUserId}/meetings`, (ref) => {
    return ref
      .where(`participants.${this.authProvider.currentUserId}.invitation`, '==', 'pending')
      .orderBy('createdAt', 'desc')
  });
}
Every time, Firebase generates an error saying the query requires an index, along with a link that contains the currentUserId. I add the index in the console and everything works as expected.
However, this seems a bit too manual to maintain as users register. How can I generate an index that is dynamic and does not require manual entry every time a new user downloads my app?
The query you want to do, in the general case, is not possible in Cloud Firestore with your data structure.
You will quickly run into the limit of 200 composite indexes per database, probably around your 200th user:
https://firebase.google.com/docs/firestore/quotas#indexes
