Improve performance for repetitive Mongo database access task


I'm building a chat bot (using MeteorJS/NodeJS) which interacts with about 2,000 active users every day. I know the exact number of people who chat with the bot each day because I store each user's activity information in a MongoDB collection called ActiveReports.
This is a scenario in my app: if user A chats with the bot 100 times (= 100 messages) in a day, these steps will be executed:
- receive message from users
- check if this user is marked as 'active' today ? // high cost
- if yes => don't do anything
- if no => mark this user as 'active' for today
As you can see, step 2 is executed for every message. This step is technically equivalent to querying the ActiveReports collection for the document with timestamp = today and user = user A. Since the ActiveReports collection has a lot of documents (about 100,000), this is a fairly heavy task and negatively affects the app's performance.
NOTE 1: This is the ActiveReports collection schema:
SimpleSchema({
// _id must be set `type` as String and `optional` as true
// to avoid ObjectId(_id) after insert in to database
_id: {
type: String,
optional: true,
},
date: {
type: Date, // Note: date is always the timestamp of the start of the current day, so 1AM timestamp and 9PM timestamp will be changed to 0AM timestamp (before the insert)
},
userId: {
type: String,
},
});
And this is how I indexed this collection:
ActiveReports._ensureIndex({ date: 1, userId: 1 }, { unique: true });
NOTE 2: A user is active on a given day if they interact with the bot at least once that day (e.g. send a message to the bot).
Any ideas how I can improve this ? Please tell me if you need further information. Thank you.

Add a last_active_date field to the User schema and check it every time you get a message. If the date matches today, you are done. If it doesn't, update the field and add a record to the ActiveReports collection.
Actually, it seems to me that you are trying to use Mongo here the way you would use a relational database. There is no need for ActiveReports if you just want to mark a user as active.
If you are trying to build some sort of report showing app usage per user per day, you can do it in the background. You can have a job that runs once a day (actually, if you have users in different time zones and you want to account for their local time, you may want to run it a few times a day). This job will query the User collection and add a record to ActiveReports for each user whose last_active_date is the current date.
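The check described above can be sketched roughly as follows. This is only an in-memory illustration: the `users` map and `activeReports` array are hypothetical stand-ins for the User and ActiveReports collections, and using the UTC start of day is an assumption (the question does not say which time zone defines "today").

```javascript
// Hypothetical in-memory stand-ins for the User and ActiveReports collections.
const users = new Map();    // userId -> { lastActiveDate: number }
const activeReports = [];   // rows of { date, userId }

// Timestamp of the start of the day containing `now` (UTC is an assumption).
function startOfDay(now) {
  const d = new Date(now);
  d.setUTCHours(0, 0, 0, 0);
  return d.getTime();
}

// Called for every incoming message. Returns true only when the user was
// newly marked active, which happens at most once per user per day.
function markActive(userId, now = Date.now()) {
  const today = startOfDay(now);
  const user = users.get(userId) || {};
  if (user.lastActiveDate === today) return false; // already active today
  users.set(userId, { ...user, lastActiveDate: today });
  activeReports.push({ date: new Date(today), userId });
  return true;
}
```

With this shape, only the first message of the day results in a write to the reports; every later message stops at the comparison against the user's own record.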

If you're building a stateless server application, the minimum you need to do is pull the user's record to check whether they are active.
You might consider having a daemon task process the ActiveReports and update user dates in the background. That way you only process those records once and the user info is ready to go. Also, that process can hold state, so it can be more efficient than updating every user for every record.

Related

How to create user specific Cron Job in Node.js?

Requirement
I want to create an email scheduling system in which each user sets the time at which an email is sent. Every user can set their own time to send an email every day. How can I handle this for individual users?
Where am I stuck?
If I run the cron job every minute to check the scheduled times set by the users, and the previous call of the Cron() function has not finished, it will not run a second time until the previous execution completes, and hence the same task for the other users will not start. So do I need to create a separate cron job for each user? And if so, how can I implement that?
Cron Function
Inside the Cron function I fetch all the users whose time matches the current time and then send them an email.
const job = new CronJob({
cronTime: `* * * * *`,
onTick: function () {
Cron().catch((err) => console.error(`Error --> ${err.stack}`))
},
start: false,
timeZone: `Asia/Kolkata`
})
job.start()

Use events in Node.js. Node.js handles asynchronous work concurrently on its event loop, so you can run multiple users' tasks without one blocking the others.
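As one possible illustration of keeping a slow or failing user from holding up the rest (a sketch, not this answer's own code), a single tick can start every user's send at once and wait for all of them with `Promise.allSettled`. The `runTick` name and the `sendEmail(user)` helper, assumed to return a Promise, are hypothetical.

```javascript
// Hypothetical helper: `sendEmail(user)` is assumed to return a Promise.
async function runTick(users, sendEmail) {
  // Start all sends concurrently. Wrapping each call in an async function
  // turns synchronous throws into rejections, and allSettled never rejects,
  // so one failing user does not stop the others.
  const results = await Promise.allSettled(
    users.map(async (user) => sendEmail(user))
  );
  // Report how many sends failed so the caller can log or retry them.
  return results.filter((r) => r.status === 'rejected').length;
}
```

Under these assumptions a single cron job per minute is enough, since the users due in that minute are processed concurrently rather than one after another.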

How to save a document with a dynamic id into Cloud Firestore? Always changing

I am using Cloud Firestore as my database
This is the form code on my webpage that creates a new document in my Cloud Firestore collection called "esequiz". How do I code it so that the new document's ID is always the number of documents in the database plus 1? I would also like to set a limit on the number of documents in the database.
form.addEventListener('submit', (e) => {
e.preventDefault();
db.collection('esequiz').add({
question: form.question.value,
right: form.right.value,
wrong: form.wrong.value
});
form.question.value = '';
form.right.value = '';
form.wrong.value = '';
});
It currently works, but the document gets an auto-generated ID. How do I make the ID continue from the numbers of my current documents? When I save, I would like it to read the current last document ID, or simply count the number of documents, and then add 1.
According to Andrei Cusnir, counting documents in Cloud Firestore is not supported.
Now I am trying Andrei's approach 2: query the documents in descending order, then use .limit to retrieve only the first one.
UPDATED
form.addEventListener('submit', (e) => {
e.preventDefault();
let query = db.collection('esequiz');
let getvalue = query.orderBy('id', 'desc').limit(1).get();
let newvalue = getvalue + 1;
db.collection('esequiz').doc(newvalue).set({
question: form.question.value,
right: form.right.value,
wrong: form.wrong.value
});
form.question.value = '';
form.right.value = '';
form.wrong.value = '';
});
There are no more errors, but the code below returns [object Promise]:
let getvalue = query.orderBy('id', 'desc').limit(1).get();
So when my form saves, it saves as [object Promise]1, and I don't know why. Can someone advise me on how to get the document ID value instead of [object Promise]?
I think it is because I did not specify to pull the document ID as the value. How do I do so?
UPDATED: FINAL SOLUTION
I played around with Andrei's code, and here is the final code that works. Many thanks to Andrei!
let query = db.collection('esequiz');
//let getvalue = query.orderBy('id', 'desc').limit(1).get();
//let newvalue = getvalue + 1;
query.orderBy('id', 'desc').limit(1).get().then(querySnapshot => {
querySnapshot.forEach(documentSnapshot => {
var newID = documentSnapshot.id;
console.log(`Found document at ${documentSnapshot.ref.path}`);
console.log(`Document's ID: ${documentSnapshot.id}`);
var newvalue = parseInt(newID, 10) + 1;
var ToString = ""+ newvalue;
db.collection('esequiz').doc(ToString).set({
id: newvalue,
question: form.question.value,
right: form.right.value,
wrong: form.wrong.value
});
});
});

If I understood correctly, you are adding data to Cloud Firestore and each new document will be named with an incremental number.
If you query all the documents and then count them, you are going to end up with many document reads as the database grows. Don't forget that Cloud Firestore charges per document read and write. If you have 100 documents and want to add a new document with ID 101, reading all of them first to count them will cost you 100 reads and then 1 write. The next time it will cost you 101 reads and 1 write, and so on as your database grows.
I see two different approaches:
Approach 1:
You can have a single document that will hold all the information of the database and what the next name should be.
e.g.
The structure of the database:
esequiz:
  0:
    last_document: 2
  1:
    question: "What is 3+3?"
    right: "6"
    wrong: "0"
  2:
    question: "What is 2+3?"
    right: "5"
    wrong: "0"
So the process will go as follows:
Read document "/esequiz/0". Counts as 1 READ.
Create a new document with ID last_document + 1. Counts as 1 WRITE.
Update the document that holds the information: last_document = 3. Counts as 1 WRITE.
This approach costs you 1 READ and 2 WRITES to the database.
Approach 2:
You can load only the last document from the database and get its ID.
e.g.
The structure of the database (same as before, but without the additional doc):
esequiz:
  1:
    question: "What is 3+3?"
    right: "6"
    wrong: "0"
  2:
    question: "What is 2+3?"
    right: "5"
    wrong: "0"
So the process will go as follows:
Read the last document using the approach described in the "Order and limit data with Cloud Firestore" documentation. You can use direction=firestore.Query.DESCENDING in combination with limit(1), which will give you the last document. Counts as 1 READ.
Now you know the ID of the loaded document, so you can create a new document whose ID is the loaded value increased by 1. Counts as 1 WRITE.
This approach costs you 1 READ and 1 WRITE in total to the database.
I hope that this information was helpful and that it resolves your issue. Currently, counting documents in Cloud Firestore is not supported.
UPDATE
In order for the sorting to work, you will also have to include the ID as a field of the document so that you can order based on it. I have tested the following example and it is working for me:
Structure of database:
esequiz:
  1:
    id: 1
    question: "What is 3+3?"
    right: "6"
    wrong: "0"
  2:
    id: 2
    question: "What is 2+3?"
    right: "5"
    wrong: "0"
As you can see, the id field is set to the same value as the document's ID.
Now you can query all the documents and order based on that field. At the same time, you can retrieve only the last document from the query:
const {Firestore} = require('@google-cloud/firestore');
const firestore = new Firestore();
async function getLastDocument(){
let query = firestore.collection('esequiz');
query.orderBy('id', 'desc').limit(1).get().then(querySnapshot => {
querySnapshot.forEach(documentSnapshot => {
console.log(`Found document at ${documentSnapshot.ref.path}`);
console.log(`Document's ID: ${documentSnapshot.id}`);
});
});
}
OUTPUT:
Found document at esequiz/2
Document's ID: 2
Then you can take the ID and increase it by 1 to generate the name for your new document!
UPDATE 2
So, the initial question is "How to store data in Cloud Firestore with documents having incremental IDs", and at the moment you are facing issues setting up Firestore with your project. Unfortunately, the newly raised questions should be discussed in another Stack Overflow post, as they have nothing to do with the logic of incremental document IDs, and it is better to keep one issue per question so that community members looking for a solution to a particular issue are better served. Therefore, in this post I will try to help you execute a simple Node.js script and resolve the initial issue, which is storing documents with incremental IDs in Cloud Firestore. Everything else, such as how to set this up in your project and how to use this function in your page, should be addressed in an additional question, where you will also need to provide as much information as possible about the framework you are using, the project setup, etc.
So, let's make a simple app.js work with the logic described above:
Since you have Cloud Firestore already working, this means that you already have a Google Cloud Platform project (where the Firestore database resides) and the proper APIs already enabled. Otherwise it wouldn't be working.
Your guide in this tutorial is the Cloud Firestore: Node.js Client documentation. It will help you understand all the methods you can use with the Firestore Node.js API. You can find helpful links for adding, reading, and querying documents, and many more operations. (I will post the entire working code later in these steps. I just shared the link so you know where to look for additional features.)
Go to the Google Cloud Console Dashboard page. You should log in with the Google account where your project with the Firestore database is set up.
In the top right corner you should see 4 buttons and your profile picture. The first button is Activate Cloud Shell. This will open a terminal at the bottom of the page with a Linux OS and the Google Cloud SDK already installed. There you can interact with the resources in your GCP projects and test your code locally before using it in your projects.
After clicking that button, you will notice that the terminal will open in the bottom of your page.
To make sure that you are properly authenticated we will set up the project and authenticate the account again, even if it is already done by default. So first execute $ gcloud auth login
On the prompted question type Y and hit enter
Click on the generated link and authenticate your account on the prompted window
Copy the generated string back to the terminal and hit enter. Now you should be properly authenticated.
Then set up the project that contains the Cloud Firestore database with the following command: $ gcloud config set project PROJECT_ID. Now you are ready to build a simple app.js script and execute it.
Create a new app.js file: nano app.js
Inside, paste my code example that can be found in this GitHub link. It contains a fully working example and many comments explaining each part, so it is better shared through a GitHub link than pasted here. Without any modifications, this code will do exactly what you are trying to do. I have tested it myself and it is working.
Execute the script as: node app.js
This will give you the following error:
Error: Cannot find module '@google-cloud/firestore'
This happens because we are importing the @google-cloud/firestore library but haven't installed it yet.
Install the @google-cloud/firestore library as follows: $ npm i @google-cloud/firestore. Described in DOC.
Execute the script again: $ node app.js.
You should see e.g. Document with ID: 3 is written.
If you execute again, you should see e.g. Document with ID: 4 is written.
All those changes should appear in your Cloud Firestore database as well. As you can see it is loading the ID of the last document, it is creating a new ID and then it creates a new document with the given arguments, while using the new generated ID as document name. This is exactly what the initial issue was about.
So I have shared with you the full code that works and does exactly what you are trying to do. Unfortunately, the other newly raised issues, should be addressed in another Stackoverflow post, as they have nothing to do with the initial issue, which is "How to create documents with incremental ID". I recommend you to follow the steps and have a working example and then try to implement the logic to your project. However, if you are still facing any issues with how to setup Firestore in your project then you can ask another question. After that you can combine both solutions and you will have working app!
Good luck!

I don't think the way you are trying to get the length of the collection is right, and I am not sure what the best way to get it is either, because the method you are trying to implement will cost you a lot more, as it reads all the records of the collection.
But there are alternatives for getting the number you require.
Start storing the ID in the record, and make the query with limit 1 and a descending sort on ID.
Store the latest number in another collection and increment it every time you create a new record, and fetch it whenever needed.
These methods might fail if concurrent requests are being made without transactions.
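To illustrate the transaction point, here is a minimal sketch of the counter-document approach wrapped in a Firestore transaction. It assumes `db` is an already initialized Firestore instance exposing `collection()` and `runTransaction()`, and that the counter lives in document esequiz/0 with a `last_document` field, as in Approach 1 of the other answer. The function name `addWithNextId` is hypothetical.

```javascript
// Assumption: `db` is an initialized Firestore instance. The counter
// document esequiz/0 with a `last_document` field follows Approach 1.
function addWithNextId(db, data) {
  const counterRef = db.collection('esequiz').doc('0');
  return db.runTransaction(async (tx) => {
    // Reads inside a transaction must happen before any writes.
    const counter = await tx.get(counterRef);
    const last = counter.exists ? counter.data().last_document : 0;
    const next = last + 1;
    // Write the new document and bump the counter atomically; if another
    // client changes the counter first, Firestore retries this function.
    tx.set(db.collection('esequiz').doc(String(next)), { id: next, ...data });
    tx.set(counterRef, { last_document: next });
    return next;
  });
}
```

Called as `addWithNextId(db, { question: '...', right: '...', wrong: '...' })`, the returned promise resolves with the ID that was assigned, so two simultaneous submissions should not end up with the same number.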

stripe: Create a 30-day trial subscription charge on a saved card but allow user to upgrade and start charging immediately?

Before a user can create an account, I want to save their credit card in order to charge a subscription with a 30-day trial, AND have the ability to charge the card for the subscription immediately if the user requests it.
So my logic is to:
1) create a customer
2) add payment details for customer
3) create subscription with 30 day trial
4) activate subscription upon user upgrade action
I'm not clear on how 4) is possible. I get that with 3), after 30 days, they are on a subscription. But what if the customer wants to start using the full version immediately, before the trial is over? How would I create a charge for the subscription?
const stripe = require('stripe')('sk_test_asdfasdf');
(async () => {
// Create a Customer:
stripe.customers.create({
email: 'jenny.rosen@example.com',
payment_method: 'pm_1FWS6ZClCIKljWvsVCvkdyWg',
invoice_settings: {
default_payment_method: 'pm_1FWS6ZClCIKljWvsVCvkdyWg',
},
}, function(err, customer) {
// asynchronously called
});
//create subscription
stripe.subscriptions.create({
customer: 'cus_4fdAW5ftNQow1a',
items: [
{
plan: 'plan_CBXbz9i7AIOTzr',
},
],
expand: ['latest_invoice.payment_intent'],
}, function(err, subscription) {
// asynchronously called
}
);
})();

I'll chime in on this really quickly to hopefully point you in the right direction. This sounds like a case for Setup Intents: you can collect payment details with the intent to charge at a later time. Since the trial won't incur any charges at first, all is good. However you work out the logic to switch the subscription from trial to active status, you'd update the subscription's trial end date to end the trial.
This is nicely summarized here, for the most part, save for updating the Subscription and setting the trial_end argument:
https://stripe.com/docs/payments/save-and-reuse
API docs entry on updating the Subscription:
https://stripe.com/docs/api/subscriptions/update#update_subscription-trial_end
Once the trial is over, whether "naturally" or by explicitly setting the end timestamp, an invoice should be sent out or the default payment method charged, depending on your settings. It wouldn't hurt to work in a good user-experience flow here, for example a confirmation step that lets the customer know they are about to be charged X amount before you actually fire that off.
Here are other helpful docs:
https://stripe.com/docs/saving-cards
https://stripe.com/docs/payments/setup-intents
https://stripe.com/docs/billing/subscriptions/trials
https://stripe.com/docs/billing/subscriptions/payment#handling-trial
https://stripe.com/docs/api/subscriptions/update
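For the upgrade step, a minimal sketch of ending the trial early through the subscription update call might look like this. It assumes `stripe` is an initialized client (for example `require('stripe')('sk_test_...')`) and that `subscriptionId` is the ID returned when the trial subscription was created. The helper name `endTrialNow` is hypothetical.

```javascript
// Assumptions: `stripe` is an initialized Stripe client and
// `subscriptionId` identifies the subscription that is still in trial.
function endTrialNow(stripe, subscriptionId) {
  // Setting trial_end to the special value 'now' ends the trial
  // immediately; billing then proceeds according to the subscription's
  // collection settings.
  return stripe.subscriptions.update(subscriptionId, { trial_end: 'now' });
}
```

This would be called from whatever handler processes the user's upgrade action, ideally after the confirmation step mentioned above.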

BigQuery similarity to "signed urls"

I have the following use case in BigQuery:
A non-trusted user will be querying a BigQuery table. Let's say the query is SELECT * FROM [bigquery.table123].
The query will return a large amount of data, let's say 200MB, which will then be displayed in the user's browser.
Our goal is to provide the most efficient way to get the 200MB data into the user's browser (and the worst way seems to do two trips instead of one -- from BQ to our server and then (compressed) to the client). I think the solution for this would probably be to enable the end (non-trusted) user to get something like a "signed-url" to perform the query directly from their browser to BigQuery. The flow would then be like this:
User issues query to our backend.
Authentication is done and a signed url is generated and passed back into javascript.
The client then sends the signed url and the data is loaded directly into the browser.
Only that exact query that has been authorized may be performed, and no other queries could be done (for example, if the client copied any tokens from the javascript)
I would never, ever want the end user to know the ProjectId or Table Name(s) that they are querying.
Is something like this possible to do in BigQuery? Here is an example of a similar need in Cloud Storage. Here is an example of an authenticated/trusted user doing this in the browser: https://github.com/googleapis/nodejs-bigquery/blob/master/samples/browseRows.js or https://stackoverflow.com/a/11509425/651174, but is there a way to do this in-browser for a non-trusted user?

Below is an option that involves two levels of authorized views. It not only shields the underlying data from the end user but also hides exactly which data is being used.
Let's assume data is in DatasetA. Below steps explain the logic
Create InternalView in DatasetB - this one will target real data from DatasetA.
Make InternalView as Authorized View for DatasetA
Create PublicView in DatasetC - this one will target InternalView
Make PublicView as Authorized View for DatasetB
Give users read access to DatasetC
Users will be able to run PublicView, which will actually be running InternalView against the real data.
Meanwhile, users will not be able to see the definition of InternalView and thus will never know the ProjectId or Table Name(s) that they are querying, etc.
Note: this does not address the "how would we prevent users from being able to issue queries that we haven't pre-authorized?" part of your question, but I am adding my answer as you asked me to.
Meanwhile, at least theoretically, you can embed some logic into your InternalView (privateView in the example below) that queries an internal meta-table recording which users are allowed to get results and when. This assumes that the meta-table is managed by your backend based on authentication/tokens or whatever else you have in mind.
Below is a simplified and brief outline of that approach:
#standardSQL
WITH `projectA.datasetA.table` AS (
SELECT 'data1' col UNION ALL
SELECT 'data2' UNION ALL
SELECT 'data3'
), `projectA.datasetA.applicationPermissions` AS (
SELECT 'user1@gmail.com' user UNION ALL
SELECT 'user2@gmail.com'
), `projectA.datasetB.privateView` AS (
SELECT d.*
FROM `projectA.datasetA.table` d
CROSS JOIN `projectA.datasetA.applicationPermissions` p
WHERE LOWER(user) = LOWER(SESSION_USER())
), `projectA.datasetC.publicView` AS (
SELECT *
FROM `projectA.datasetB.privateView`
)
SELECT *
FROM `projectA.datasetC.publicView`
If user1@gmail.com or user2@gmail.com runs the query below
SELECT *
FROM `projectA.datasetC.publicView`
they will get below result
Row col
1 data1
2 data2
3 data3
while if user3@gmail.com runs the very same query, the result will be
Row col
Query returned zero records.
Obviously, you can extend your meta-table (applicationPermissions) with for example timeframe during which user will be allowed to get result (respective lines to check time conditions will need to be added to projectA.datasetB.privateView )

How to get active visitors count at this moment on this page?

I have a chat room and I want to show exactly how many people are online in this chat at this moment. Users can join the room with or without registration.

This is strongly dependent on the way you implement the chat room.
You could assign a chat-session id and timeout to each visitor, which gets expired over time and removed from a list.
This list will contain details on visitors, including the count.
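A minimal in-memory sketch of such an expiring list could look like the following. The 60-second timeout, the `heartbeat` and `onlineCount` names, and keeping the list in a single process are all assumptions made for illustration; the answer does not prescribe them.

```javascript
const TIMEOUT_MS = 60 * 1000; // assumption: a visitor expires after 60 s of silence
const lastSeen = new Map();   // sessionId -> timestamp of the last heartbeat

// Record that a visitor (registered or anonymous) is still in the room.
function heartbeat(sessionId, now = Date.now()) {
  lastSeen.set(sessionId, now);
}

// Drop sessions that have timed out and return how many remain.
function onlineCount(now = Date.now()) {
  for (const [id, seen] of lastSeen) {
    if (now - seen > TIMEOUT_MS) lastSeen.delete(id);
  }
  return lastSeen.size;
}
```

Each chat client would call `heartbeat` periodically with its session ID, and the page would display `onlineCount()`.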

Just a quick idea that has come to my mind (can be heavily customized and improved):
1) Call a PHP script periodically (for instance, once a minute) through an AJAX call with a unique per-user ID. Like this, for example:
var visitorCounter = function() {
$.get('audience_checker.php', {
id: get_random_id() // inspiration below
});
}
setInterval(visitorCounter, 60000); // it gets called every 60000 ms = 1 minute
For inspiration on how to build a random ID generator, see here. Or use the IP address.
2) Now write the PHP script that stores the IDs from the $_GET super-global variable in a database, with a timestamp. If the ID already exists, just update the timestamp.
3) And finally, another script, statistics.php, can select the rows from the database whose timestamp is no older than a minute.

Of course this will depend on your chat application logic, but this is what I am using to count the users in my application. It is not perfect, because you never know whether users are still there if they don't log off.
You can add a new table to handle sessions:
`id`, `expire`, `data`, `user_id`, `last_write`
then change the configuration to save the sessions into this table instead of files.
'session' => [
'class' => 'yii\web\DbSession',
'writeCallback' => function ($session) {
return [
'user_id' => Yii::$app->user->id,
'last_write' => time(),
];
},
],
Then you can check for sessions written in the last 5 minutes, for instance.
Hope it helps
