I'm migrating from MongoDB to Firestore and I've been struggling to make this call faster. To give you some context, this is what every document in the collection looks like:
Document sample:
The "entities" object may have the following fields: "customer", "supplier", "product", "product_group" and/or "origin". What I'm trying to do is to build a ranking, e.g., a supplier ranking for docs with origin = "A" and product = "B". That means, I have to get from the DB all documents with entities = { origin:"A", product:"B", supplier: exists }
The collection has more than 300.000 documents, and I make between 4-10 calls depending on the "entities", which return between 0 and a few hundred results each. These calls are taking excessively long to execute (between 10 and 30 seconds in total) in Firestore, whereas with MongoDB it took around 2-3 seconds in total. This is how my code looks like as of now (following the previous example, the parameters in the function would be entities = {origin:"A", product:"B"} and entity = "supplier"):
async getRanking(entities: { [key: string]: any }, stage: string, metric: string, organisationId: string, entity: string) {
  let indicators: IndicatorInsight[] = [];
  const entitySet = ['product', 'origin', 'product_group', 'supplier', 'customer'];
  const entitiesKeys = [];

  // This way we always keep the values in the same order as in entitySet.
  for (const ent of entitySet) {
    if ([...Object.keys(entities), entity].includes(ent)) {
      entitiesKeys.push(ent);
    }
  }

  let rankingRef = OrganisationRef
    .doc(organisationId)
    .collection('insight')
    .where('stage', '==', stage)
    .where('metric', '==', metric)
    .where('insight_type', '==', 'indicator')
    .where('entities_keys', '==', entitiesKeys);

  for (const ent of Object.keys(entities)) {
    rankingRef = rankingRef.where(`entities.${ent}`, '==', entities[ent]);
  }

  (await rankingRef.get()).forEach(snap => indicators.push(snap.data() as IndicatorInsight));
  return indicators;
}
So my question is: do you have any suggestions on how to improve the document structure and the querying in order to improve the performance of this call? As I mentioned, with MongoDB this was quite fast, 2-3 seconds tops.
I've been told Firestore should be extremely fast, so I guess I'm not doing things right here and I'd really appreciate your help. Please let me know if you need more details.
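One thing worth noting: since the 4-10 ranking calls are independent of each other, they can at least be issued concurrently rather than one after another. Below is a minimal sketch of what that could look like; the wrapper function, the stage/metric/organisation values and the list of ranked entities are placeholders, and getRanking refers to the method shown above.

// Sketch only: issue the per-entity ranking calls concurrently instead of sequentially.
// getAllRankings, the entity list and the parameter values below are placeholders.
async function getAllRankings(stage, metric, organisationId) {
  const entities = { origin: 'A', product: 'B' };
  const rankedEntities = ['supplier', 'customer', 'product_group']; // hypothetical subset

  const results = await Promise.all(
    rankedEntities.map(entity =>
      getRanking(entities, stage, metric, organisationId, entity)
    )
  );

  // results[i] holds the IndicatorInsight[] for rankedEntities[i]
  return results;
}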
Forgive me, but I have been trying to solve this for several days and I cannot locate the problem.
I am downloading two arrays with complementary data from MongoDB (with Mongoose). One contains the user data, the other the user survey records.
Both arrays are related through the user's email:
/***************************************************************
 This is what I download from MongoDB with Node.js
****************************************************************/
const user = require('../models/user');
const test = require('../models/test');

let users = await user.find({ email: userEmail });
let tests = await test.find({ email: testEmail });

/****************************************************************
 Result:

 let users = [
   { name: 'pp', email: 'pp#gmail.com' },
   { name: 'aa', email: 'aa#gmail.com' }
 ];
 let tests = [
   { email: 'pp#gmail.com', satisfaction: '5' },
   { email: 'aa#gmail.com', satisfaction: '2' }
 ];
*****************************************************************************/
Now I try to relate both JSON arrays using:
for (let i = 0; i < users.length; i++) {
  for (let z = 0; z < tests.length; z++) {
    if (users[i].email == tests[z].email) {
      users[i]['satisfaction'] = tests[z].satisfaction;
    }
  }
}
If I do a console.log(users[0]), what I want is:
{ name: 'pp', email: 'pp#gmail.com', satisfaction: '5' }
But I receive:
{ name: 'pp', email: 'pp#gmail.com' }
However, note that if I do console.log(users[0].satisfaction), the result is: 5
Please, can someone help me? Thank you very much.
Note:
If, instead of downloading the arrays from MongoDB, I write them by hand, everything works perfectly. Could it be some kind of lock on the models?
MORE INFORMATION
Although I have given a simple example, the user and test arrays are much more complex in my application; however, they are modeled and correctly managed in MongoDB.
The reason for having two documents in MongoDB is that each of them has more than 20 variables. In user I store fixed user data, and in test the data that I collect over time. Later, I download both arrays to my server, where I perform statistical calculations; for that I need to build a single array with data from both. I simply take the test data and add it to the user so that all the parameters are related. I thought that this way I would unburden MongoDB from continuous queries and subsequent writes, since I perform the operations on the server and finally update the test array in MongoDB.
When I query the database I receive the following array of documents:
let registroG = await datosProf.find({idAdmin:userAdmin });
res.json(registroG);
This is the response received on the front-end client:
If I open the object, document [0] would be:
**THE QUESTION**:
Why, when I try to add a key/value pair to the object, does it not get included?
You could use Array.map with the ES6 spread operator to merge the two objects:
let users = [{ name: 'pp', email: 'pp#gmail.com' }, { name: 'aa', email: 'aa#gmail.com' }];
let tests = [{ email: 'pp#gmail.com', satisfaction: '5' }, { email: 'aa#gmail.com', satisfaction: '2' }];
let result = users.map(v => ({
...v,
...tests.find(e => e.email == v.email)
}))
console.log(result)
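As a side note (not part of the original answer): if the arrays grow large, the nested find makes this merge quadratic. A small variant that first indexes the tests by email keeps it linear; a sketch using the same sample data:

// Sketch: build a lookup Map keyed by email once, then merge in a single pass.
const testsByEmail = new Map(tests.map(t => [t.email, t]));

const merged = users.map(u => ({
  ...u,
  ...(testsByEmail.get(u.email) || {})
}));

console.log(merged);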
I have a project I'm working on where I have to seed a database with 10 million random rows, which I have successfully done. However, it takes about 30 minutes to complete, which is expected, but I know it could be faster. I would like to make it run even faster and find a way to seed the 10 million random entries in under 10 minutes, preferably while still using MongoDB/Mongoose. This is my current seed file; any tips on making it run faster? First time posting on here, just FYI. Thanks!
I use 'node database/seed.js' to run this file in the terminal.
const db = require("./index.js");
const mongoose = require("mongoose");
const faker = require("faker");

const productSchema = mongoose.Schema({
  product_name: String,
  image: String,
  price: String
});

let Product = mongoose.model("Product", productSchema);

async function seed() {
  for (let i = 0; i < 10000000; i++) {
    let name = faker.commerce.productName();
    let image = faker.image.imageUrl();
    let price = faker.commerce.price();

    let item = new Product({
      product_name: `${name}`,
      image: `${image}`,
      price: `$${price}`
    });

    await item
      .save()
      .then(success => {})
      .catch(err => {});
  }
}

seed();
You can create batches of, say, 1 million records and use the insertMany function to bulk-insert them into the database.
Use insertMany
Inserts/updates always take time in any kind of database; try to reduce the number of insert operations.
Insert a batch every 1,000 loop iterations or so, instead of saving one document per iteration:
Model.insertMany(arr, function(error, docs) {});
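Putting those suggestions together, here is a minimal sketch of what a batched seed file could look like. It reuses the Product schema and the faker calls from the question; the batch size of 10,000 is an arbitrary choice, not a recommendation from either answer.

const db = require("./index.js"); // same connection setup as the original seed file
const mongoose = require("mongoose");
const faker = require("faker");

// Same schema/model as in the question's seed file.
const productSchema = mongoose.Schema({
  product_name: String,
  image: String,
  price: String
});
const Product = mongoose.model("Product", productSchema);

const TOTAL = 10000000;
const BATCH_SIZE = 10000; // arbitrary; tune for your memory and throughput

async function seed() {
  let batch = [];
  for (let i = 0; i < TOTAL; i++) {
    batch.push({
      product_name: faker.commerce.productName(),
      image: faker.image.imageUrl(),
      price: `$${faker.commerce.price()}`
    });
    if (batch.length === BATCH_SIZE) {
      // One round trip per batch instead of one per document.
      await Product.insertMany(batch, { ordered: false });
      batch = [];
    }
  }
  if (batch.length) {
    await Product.insertMany(batch, { ordered: false });
  }
}

seed();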
I have a small job that runs every minute and performs a scan on a table that has nearly 3,000 rows:
async execute (dialStatus) {
  if (!process.env.DIAL_TABLE) {
    throw new Error('Dial table not found')
  }

  const params = {
    TableName: process.env.DIAL_TABLE,
    FilterExpression: '#name = :name AND #dial_status = :dial_status AND #expires_on > :expires_on',
    ExpressionAttributeNames: {
      '#name': 'name',
      '#dial_status': 'dial_status',
      '#expires_on': 'expires_on'
    },
    ExpressionAttributeValues: {
      ':name': { 'S': this.name },
      ':dial_status': { 'S': dialStatus ? dialStatus : 'received' },
      ':expires_on': { 'N': Math.floor(moment().valueOf() / 1000).toString() }
    }
  }

  console.log('params', params)

  const dynamodb = new AWS.DynamoDB()
  const data = await dynamodb.scan(params).promise()
  return this._buildObject(data)
}
I'm facing a problem with read units and timeouts on DynamoDB. Right now I'm using 50 read units, and it's getting expensive compared to an RDS instance.
The attribute names used in the scan are not my primary key: name is a secondary index and dial_status is a regular attribute in my JSON, but every row has it.
This job runs every minute for a list of parameters (i.e., if I have 10 parameters, I'll perform this scan 10 times a minute).
My table has the following schema:
phone: partition key (hash)
configuration: JSON in string format
dial_status: string
expires_on: TTL number
name: string
origin: string
The job should get all items matching name and dial_status, restricted to 15 elements per execution (each minute). Each element should then be enqueued on SQS to be processed.
I really need to decrease those read units, but I'm not sure how to optimize this function. I've read about reducing the page size or avoiding scans altogether. What are my alternatives to scan if I don't query by the primary key and want to return a group of rows?
Any idea how to fix this code so it can be called 10-15 times every minute?
I suggest you create a GSI (Global Secondary Index) with keys:
HASH: name_dialStatus
RANGE: expiresOn
As you've already guessed, the value of the hash key is the concatenation of the two independent fields name and dialStatus.
Now you can run a query against this GSI, which is much more efficient since it doesn't scan the whole table but reads only the items you are interested in:
async execute(dialStatus) {
  if (!process.env.DIAL_TABLE) {
    throw new Error('Dial table not found')
  }

  const params = {
    TableName: process.env.DIAL_TABLE,
    IndexName: 'MY_GSI_NAME',
    // Replaces `FilterExpression`.
    // Always test the partition key for equality!
    KeyConditionExpression: '#pk = :pk AND #sk > :skLow',
    ExpressionAttributeNames: {
      '#pk': 'name_dialStatus', // partition key name
      '#sk': 'expires_on'       // sort key name
    },
    ExpressionAttributeValues: {
      ':pk': { 'S': `${this.name}:${dialStatus || 'received'}` },
      ':skLow': { 'N': Math.floor(moment().valueOf() / 1000).toString() }
    }
  }

  console.log('params', params)

  // Friendly advice: with AWS.DynamoDB.DocumentClient() there is no need to specify the type of each field.
  const dynamodb = new AWS.DynamoDB();

  // `scan` becomes `query` !!!
  const data = await dynamodb.query(params).promise();
  return this._buildObject(data);
}
It is always recommended to design a DynamoDB table around its access patterns, so it can be queried easily with keys (partition key/sort key) and expensive scan operations can be avoided.
Revisit your table schema if it is not too late.
If it is already too late, create a GSI with "name" as the partition (hash) key and "expires_on" as the sort key, with projected attributes (e.g. "dialStatus"), so that you query only the data you need and lower the read capacity consumed.
If you still do not want to go with option 1 or option 2, use the scan operation with a rate limiter and consume only about 25% of the read capacity so that you avoid spikes.
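For completeness, here is a rough sketch of how the GSI from option 2 could be added with the same aws-sdk v2 client used in the question. The index name and capacity units are placeholders, and a provisioned-capacity table is assumed.

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Sketch only: adds a GSI with "name" as partition key and "expires_on" as sort key,
// projecting "dial_status" so the query returns only the data the job needs.
const params = {
  TableName: process.env.DIAL_TABLE,
  AttributeDefinitions: [
    { AttributeName: 'name', AttributeType: 'S' },
    { AttributeName: 'expires_on', AttributeType: 'N' }
  ],
  GlobalSecondaryIndexUpdates: [
    {
      Create: {
        IndexName: 'name-expires_on-index', // placeholder name
        KeySchema: [
          { AttributeName: 'name', KeyType: 'HASH' },
          { AttributeName: 'expires_on', KeyType: 'RANGE' }
        ],
        Projection: {
          ProjectionType: 'INCLUDE',
          NonKeyAttributes: ['dial_status']
        },
        ProvisionedThroughput: { // omit for on-demand tables
          ReadCapacityUnits: 5,  // placeholder values
          WriteCapacityUnits: 5
        }
      }
    }
  ]
};

dynamodb.updateTable(params).promise()
  .then(data => console.log('GSI creation started:', data.TableDescription.TableStatus))
  .catch(err => console.error(err));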
I have a long list of chat rooms
let chatRooms = {
  "general": ChatRoom,
  "myRoomA": ChatRoom,
  "bobsRoom": ChatRoom,
  ...
}
ChatRoom has a serialize method
ChatRoom.serialize = function () {
  return {
    name: this.name,
    clients: this.clients,
    ...
  }
}
In order to list all ChatRooms to a user, I must send this data to them
ChatRoomManager.serialize = function () {
  let serializedObjects = [];
  Util.each(this.chatRooms, function (i, e) {
    if (e.serialize) {
      serializedObjects.push(e.serialize());
    }
  });
  return serializedObjects;
}
This becomes a performance issue: people regularly request the list of all chat rooms, so it gets serialized very often, and I want to do paging instead. But if an object has no guaranteed order, how can I possibly say "here are the next 10 chat rooms"? Even if I could guarantee order, how could I start at index 11 without looping through all of the objects? Imagine if I were at index 1000, etc.
TL;DR: is it possible to do paging over an object of objects efficiently and accurately?
You could just take the values of the object, which returns an array, so the order is guaranteed:
const ordered = Object.values(chatRooms);
You could now also apply a custom sort order, e.g.:
ordered.sort((roomA, roomB) => roomA.name.localeCompare(roomB.name));
Serializing only one chunk is then as easy as:
let index = 0, chunk = 100;
const result = ordered.slice(index * chunk, (index + 1) * chunk).map(room => room.serialize());
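Putting the pieces together, a small paging helper on the manager could look like the sketch below; the method name serializePage and the alphabetical sort are my own choices, not part of the original code. Note that it still touches every room to sort, but only the requested page gets serialized.

// Sketch: deterministic paging over the chatRooms object.
ChatRoomManager.serializePage = function (page, pageSize) {
  const ordered = Object.values(this.chatRooms)
    .sort((roomA, roomB) => roomA.name.localeCompare(roomB.name));

  return ordered
    .slice(page * pageSize, (page + 1) * pageSize)
    .map(room => room.serialize());
};

// Usage: first 10 rooms, then the next 10.
// ChatRoomManager.serializePage(0, 10);
// ChatRoomManager.serializePage(1, 10);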
I have a field in my documents that is named after its timestamp, like so:
{
  _id: ObjectId("53f2b954b55e91756c81d3a5"),
  domain: "example.com",
  "2014-08-07 01:25:08": {
    A: [
      "123.123.123.123"
    ],
    NS: [
      "ns1.example.com.",
      "ns2.example.com."
    ]
  }
}
This is very impractical for queries, since every document has a different timestamp.
Therefore, I want to rename this field, for all documents, to a fixed name.
However, I need to be able to match the field names using regex, because they are all different.
I tried the following, but it is an illegal query:
db['my_collection'].update({}, {$rename:{ /2014.*/ :"201408"}}, false, true);
Does someone have a solution for this problem?
SOLUTION BASED ON NEIL LUNN'S ANSWER:
conn = new Mongo();
db = conn.getDB("my_db");

var bulk = db['my_coll'].initializeOrderedBulkOp();
var counter = 0;

db['my_coll'].find().forEach(function(doc) {
  for (var k in doc) {
    if (k.match(/^2014.*/)) {
      print("replacing " + k);
      var unset = {};
      unset[k] = 1;
      bulk.find({ "_id": doc._id }).updateOne({ "$unset": unset, "$set": { WK1: doc[k] } });
      counter++;
    }
  }

  if (counter % 1000 == 0) {
    bulk.execute();
    bulk = db['my_coll'].initializeOrderedBulkOp();
  }
});

if (counter % 1000 != 0)
  bulk.execute();
This is not a mapReduce operation, not unless you want a new collection that consists only of the _id and value fields that are produced from mapReduce output, much like:
"_id": ObjectId("53f2b954b55e91756c81d3a5"),
"value": {
"domain": "example.com",
...
}
}
Which at best is a kind of "server side" reworking of your collection, but of course not in the structure you want.
While there are ways to execute all of the code in the server, please don't try to do so unless you are really in a spot. These ways generally don't play well with sharding anyway, which is usually where people "really are in a spot" for the sheer size of records.
When you want to change things and do it in bulk, you generally have to "loop" the collection results and process the updates while having access to the current document information. That is, in the case where your "update" is "based on" information already contained in fields or structure of the document.
There is therefore no "regex replace" operation available, and there certainly is not one for renaming a field based on a pattern. So let's loop with bulk operations, which is the "safest" way of doing this without running all the code on the server.
var bulk = db.collection.initializeOrderedBulkOp();
var counter = 0;

db.collection.find().forEach(function(doc) {
  for (var k in doc) {
    if (k.match(/^2014.*/)) {
      var update = { "$unset": {}, "$set": {} };
      update["$unset"][k] = 1;
      update["$set"][k.replace(/(\d+)-(\d+)-(\d+).+/, "$1$2$3")] = doc[k];
      bulk.find({ "_id": doc._id }).updateOne(update);
      counter++;
    }
  }

  if (counter % 1000 == 0) {
    bulk.execute();
    bulk = db.collection.initializeOrderedBulkOp();
  }
});

if (counter % 1000 != 0)
  bulk.execute();
So the main things there are the $unset operator to remove the existing field and the $set operator to create the new field in the document. You need the document content in order to examine and use both the "field name" and the "value", hence the looping, as there is no other way.
If you don't have MongoDB 2.6 or greater on the server, the looping concept still applies, just without the immediate performance benefit of the bulk operations. You can look into things like .eval() in order to process on the server, but as the documentation suggests, it really is not recommended. Use with caution if you must.
As you already recognized, value-keys are indeed very bad for the MongoDB query language. So bad that what you want to do doesn't work.
But you could do it with a MapReduce: the map and reduce functions wouldn't do anything, and the finalize function would do the conversion in JavaScript.
Or you could write a little program, in a programming language of your choice, which reads all documents from the collection, makes the change, and writes them back using collection.save.
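A rough Node.js sketch of that last approach, assuming the official mongodb driver, a placeholder connection string, and the same 2014-prefixed field names and "201408" target name as in the question (it uses updateOne per field rather than the legacy collection.save):

const { MongoClient } = require("mongodb");

// Sketch only: read every document and move each 2014-prefixed field to the fixed name "201408".
async function renameTimestampFields() {
  const client = await MongoClient.connect("mongodb://localhost:27017"); // placeholder URI
  const coll = client.db("my_db").collection("my_collection");

  const cursor = coll.find({});
  for await (const doc of cursor) {
    for (const key of Object.keys(doc)) {
      if (/^2014/.test(key)) {
        await coll.updateOne(
          { _id: doc._id },
          { $unset: { [key]: "" }, $set: { "201408": doc[key] } }
        );
      }
    }
  }

  await client.close();
}

renameTimestampFields().catch(console.error);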