node.js, mongo quick way to hold unique number - javascript

I want to let mongo hold an incrementing number for me such that I can call that number and then generate a string from it.
x = 1e10
x.toString(36).substring(2,7)
>>'dqpds'
I have a way to increment the number every time I call it from mongo
db.counter.update({ _id: 1 }, { $inc: { seq: 1 } }, {upsert: true},
function(err, val){
//...
})
But I want to set the number to something like 1e10 at the beginning so that I get a 5-character-long string, and I would rather not make more than one call to the database.
How do I set a default value for the upsert in mongo? Or do you have a more efficient way of generating a unique 5-6 character string?

If you only need a unique id which is not necessarily sequential, you can use the first part of ObjectId.
The ObjectId documentation describes it like this:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte timestamp,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
So you can do something like this:
x = ObjectId().toString().substring(0,4)
This approach doesn't involve database IO, so the performance would be better. If you want to be more sure about its uniqueness, add the last 2 bytes of the counter to make a 6 character one.
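For reference, a minimal Node.js sketch of the same idea, assuming the official mongodb driver (the substring bounds are just illustrative):
const { ObjectId } = require('mongodb');

const hex = new ObjectId().toHexString();  // 24 hex characters, the first 8 encode the timestamp
const shortId = hex.substring(0, 4);       // first part only; uniqueness is best-effort, as noted above
console.log(shortId);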

There is a way to do this in MongoDB.
You use the findAndModify command and it's described in detail in exactly the context you are looking for here:
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
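As a rough sketch of the counter pattern that tutorial describes, adapted to the base-36 idea from the question and written for the mongo shell (the collection and counter names are just placeholders):
// seed the counter once so the generated strings start out 5 characters long
db.counters.insert({ _id: "shortId", seq: 1e10 })

function getNextSequence(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
    })
    return ret.seq
}

getNextSequence("shortId").toString(36).substring(2, 7)  // -> "dqpdt"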

Related

MongoDB - slow query on old documents (aggregation and sorting)

I have two DBs for testing, and each contains thousands to hundreds of thousands of documents,
but with the same schemas and CRUD operations.
Let's call them DB1 and DB2.
I am using Mongoose.
Suddenly DB1 became really slow when running:
const eventQueryPipeline = [
  {
    $match: {
      $and: [{ userId: req.body.userId }, { serverId: req.body.serverId }],
    },
  },
  {
    $sort: {
      sort: -1,
    },
  },
];
const aggregation = db.collection
  .aggregate(eventQueryPipeline)
  .allowDiskUse(true);
aggregation.exect((err, result) => {
  res.json(result);
});
In DB2 the exact same query runs in milliseconds, up to a maximum of 10 seconds.
In DB1 the query never takes less than 40 seconds.
I do not understand why. What could I be missing?
I compared the documents and the indexes, and they're the same.
Deleting the collection and re-saving the documents brings the speed back to normal and acceptable, but why is it happening? Has anyone had the same experience?
Short answer:
You should create the following index:
{ "userId": 1, "serverId": 1, "sort": 1 }
Longer answer
Based on your code (I see that you have .allowDiskUse(true)), it looks like Mongo is trying to do an in-memory sort with "a lot" of data. Mongo has a default 100MB memory limit for sort operations, and you can allow it to use temporary files on disk to store data if it hits that limit.
You can read more about it here: https://www.mongodb.com/docs/manual/reference/method/cursor.allowDiskUse/
In order to optimise the performance of your queries, you can use indexes.
A common rule that you should follow when planning indexes is ESR (Equality, Sort, Range). You can read more about it here: https://www.mongodb.com/docs/v4.2/tutorial/equality-sort-range-rule/
If we follow that rule while creating our compound index, we add the equality matches first, in your case "userId" and "serverId". After that comes the sort field, in your case "sort".
If you had a need to additionally filter results based on some range (e.g. some value greater than X, or a timestamp greater than yesterday), you would add that after the "sort".
That means your index should look like this:
schema.index({ userId: 1, serverId: 1, sort: 1 });
Additionally, you can probably remove allowDiskUse and handle err inside the aggregation.exec callback (I'm assuming that aggregation.exect is a typo).
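For example, a minimal sketch of what that could look like, reusing the pipeline from the question (the 500 error response is just an assumption about how you want to surface failures):
db.collection
  .aggregate(eventQueryPipeline)
  .exec((err, result) => {
    if (err) {
      // handle the error instead of passing an undefined result to res.json
      return res.status(500).json({ error: err.message });
    }
    res.json(result);
  });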

How many documents does updateOne effect?

It would seem that it simply provides the Update part of the overall CRUD operations (Create, Read, Update, Delete).
However, the docs don't seem to make sense (Mongoose - updateOne):
const res = await Person.updateOne({ name: 'Jean-Luc Picard' }, { ship: 'USS Enterprise' });
res.n; // Number of documents matched
res.nModified; // Number of documents modified
Why is it returning some parameters that count the number of documents matched and modified?
Does it update one or does it update more than one?
Also, what do param1 and param2 refer to in
const res = await Person.updateOne(param1, param2);
The reference I posted above causes more confusion than help.
updateOne, as the name suggests, can update at most one document.
It's returning n and nModified because that's what the Node.js MongoDB driver API returns for several update operations (updateOne, updateMany, replaceOne).
param1 is the filter that you use to query the document(s) to be updated.
param2 is the change you want to apply to the matched document(s).
n ("Number of documents matched") is the number of documents that match the filter provided as param1; for updateOne it can be 0 or 1.
nModified ("Number of documents modified") is the number of documents that matched the filter and were actually modified, because their previous values did not already match what's given in param2; for updateOne it can be 0 or 1 (it is always less than or equal to n).
see also
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/

How can you store and modify large datasets in node.js?

Basics
So basically I have written a program which generates test data for MongoDB in Node.
The problem
For that, the program reads a schema file and generates a specified amount of test data out of it. The problem is that this data can eventually become quite big (think about creating 1M users, with all the properties each needs, and 20M chat messages, with userFrom and userTo), and it has to keep all of that in RAM to modify/transform/map it, and after that save it to a file.
How it works
The program works like that:
Read schema file
Create test data from the schema and store it in a structure (look down below for the structure)
Run through this structure and link every object's referenceTo to a random object with a matching referenceKey.
Transform the object structure in a string[] of MongoDB insert statements
Store that string[] in a file.
This is the structure of the generated test data:
export interface IGeneratedCollection {
    dbName: string,                     // Name of the database
    collectionName: string,             // Name of the collection
    documents: IGeneratedDocument[]     // One collection has many documents
}
export interface IGeneratedDocument {
    documentFields: IGeneratedField[]   // One document has many fields (which are recursive, because of nested documents)
}
export interface IGeneratedField {
    fieldName: string,                  // Name of the property
    fieldValue: any,                    // Value of the property (can also be IGeneratedField, IGeneratedField[], ...)
    fieldNeedsQuotations?: boolean,     // Whether the value needs to be saved with " ... "
    fieldIsObject?: boolean,            // Whether the value is an object (stored as IGeneratedField[]) (to handle it differently when transforming to MongoDB inserts)
    fieldIsJsonObject?: boolean,        // Whether the value is a plain JSON object
    fieldIsArray?: boolean,             // Whether the value is an array of objects (stored as array of IGeneratedField[])
    referenceKey?: number,              // Field flagged to be a key
    referenceTo?: number                // Value gets set to a random object with matching referenceKey
}
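As a rough sketch of what step 3 above ("link all referenceTo fields") could look like over this structure (the helper name and traversal details are assumptions, not the program's actual code):
function resolveReferences(collections /* IGeneratedCollection[] */) {
    const candidates = new Map();  // referenceKey -> list of fieldValues carrying that key
    const pending = [];            // fields whose referenceTo still has to be resolved

    const walk = (fields) => {
        for (const f of fields) {
            if (f.referenceKey !== undefined) {
                if (!candidates.has(f.referenceKey)) candidates.set(f.referenceKey, []);
                candidates.get(f.referenceKey).push(f.fieldValue);
            }
            if (f.referenceTo !== undefined) pending.push(f);
            if (f.fieldIsObject) walk(f.fieldValue);         // nested IGeneratedField[]
            if (f.fieldIsArray) f.fieldValue.forEach(walk);  // array of IGeneratedField[]
        }
    };

    for (const c of collections) c.documents.forEach((d) => walk(d.documentFields));

    for (const f of pending) {
        const pool = candidates.get(f.referenceTo) || [];
        f.fieldValue = pool[Math.floor(Math.random() * pool.length)];
    }
}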
Actual data
So in the example with 1M Users and 20M messages it would look like this:
1x IGeneratedCollection (collectionName = "users")
    1Mx IGeneratedDocument
        10x IGeneratedField (for example, each user has 10 fields)
1x IGeneratedCollection (collectionName = "messages")
    20Mx IGeneratedDocument
        3x IGeneratedField (message, userFrom, userTo)
Which would result in 70M instances of IGeneratedField (1x1Mx10 + 1x20Mx3 = 70M).
Conclusion
This is obviously a lot for the RAM to handle, as it needs to store all of it at the same time.
Temporary Solution
It now works like that:
Generate 500 documents (rows in SQL) at a time
JSON.stringify those 500 documents and put them in a SQLite table with the schema (dbName STRING, collectionName STRING, value JSON)
Remove those 500 documents from JS and let the garbage collector do its thing
Repeat until all data is generated and in the SQLite table
Take one of the rows (each containing 500 documents) at a time, apply JSON.parse and search for keys in them
Repeat until all data is queried and all keys are retrieved
Take one of the rows at a time, apply JSON.parse and search for key references in them
Apply JSON.stringify and update the row if necessary (if key references were found and resolved)
Repeat until all data is queried and all keys are resolved
Take one of the rows at a time, apply JSON.parse and transform the documents to valid SQL/MongoDB inserts
Add the insert (string) to a SQLite table with the schema (singleInsert STRING)
Remove the old and now unused row from the SQLite table
Write all inserts to a file (if run from the command line) or return a dataHandle to query the data in the SQLite table (if run from another Node app)
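A minimal sketch of the batching step above (steps 1-3), assuming the better-sqlite3 package; the table and column names follow the schema mentioned in step 2:
const Database = require('better-sqlite3');
const db = new Database('testdata.sqlite');

db.exec('CREATE TABLE IF NOT EXISTS generated (dbName TEXT, collectionName TEXT, value TEXT)');
const insertRow = db.prepare('INSERT INTO generated (dbName, collectionName, value) VALUES (?, ?, ?)');

// store one batch of 500 generated documents as a single JSON string,
// so the JS objects can be garbage-collected afterwards
function flushBatch(dbName, collectionName, documents) {
    insertRow.run(dbName, collectionName, JSON.stringify(documents));
}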
This solution does handle the RAM problem, because SQLite automatically swaps to the hard drive when RAM is full.
BUT
As you can see, there are a lot of JSON.parse and JSON.stringify calls involved, which slows down the whole process drastically.
What I have thought of:
Maybe I should modify IGeneratedField to use only shortened names as variables (fieldName -> fn, fieldValue -> fv, fieldIsObject -> fio, fieldIsArray -> fia, ...).
This would make the needed storage in the SQLite table smaller, BUT it would also make the code harder to read.
Use a document-oriented database (but I have not really found one) to handle the JSON data better.
The Question
Is there any better solution to handle big objects like this in node?
Is my temporary solution OK? What is bad about it? Can it be changed to perform better?
Conceptually, generate items in a stream.
You don't need to build all 1M users at once; you can add 10k at a time.
For the messages, randomly sample 2n users from the db and have them send messages to each other. Repeat until satisfied.
Example:
// Assume Users and Messages are both db.collections
// Assume functions generateUser() and generateMessage(u1, u2) exist.
const desiredUsers = 10000;
const desiredMessages = 5000000;
const blockSize = 1000;
(async () => {
for (const i of _.range(desiredUsers / blockSize) ) {
const users = _.range(blockSize).map(generateUser);
await Users.insertMany(users);
}
for (const i of _.range(desiredMessages / blockSize) ) {
const users = await Users.aggregate([ { $sample: { size: 2 * blockSize } } ]).toArray();
const messages = _.chunk(users, 2).map( (usr) => generateMessage(usr[0], usr[1]));
await Messages.insertMany(messages);
}
})();
Depending on how you tweak the stream, you get a different distribution. This one is a uniform distribution. You can get a more long-tailed distribution by interleaving the users and messages. For example, you might want to do this for message boards.
It went to 200MB after I switched the blockSize to 1000.

How do I query MongoDB for 2 ranges and text search?

I have event objects in MongoDB that look like this:
{
    "start": ISODate("2010-09-04T16:54:11.216Z"),
    "title": "Short descriptive title",
    "description": "Full description",
    "score": 56
}
And I need to get a query across three parameters:
Time window (event start is between two dates)
Score threshold (score is > x)
Full-text search of title and description
What's the right way to approach this efficiently? I think the first two are done with an aggregation but I'm not sure how text search would factor in.
Assuming your start field is of type date (which it should be) and not a string, here are the basic components that you'd want to play with. In fact, given the ISO 8601 structure of a MongoDB date, a string-based comparison would work just as well.
// create your text index
db.collection.ensureIndex({
    description: "text",
    title: "text"
})

// optionally create an index on your other fields
db.collection.ensureIndex({
    start: 1,
    score: 1
})

x = 50
lowerDate = ISODate("2010-09-04T16:54:11.216Z") // or just the string part for string fields
upperDate = ISODate("2010-09-05T16:54:11.216Z")

// simple find to retrieve your result set
db.collection.find({
    start: {
        $gte: lowerDate, // depending on your exact scenario, you'd need to use $gt
        $lte: upperDate  // depending on your exact scenario, you'd need to use $lt
    },
    score: { $gt: x },   // depending on your exact scenario, you'd need to use $gte
    $text: {             // here comes the text search
        $search: "descriptive"
    }
})
There is an important topic with respect to performance/indexing that needs to be understood, though, which is very well documented here: Combine full text with other index
This is why I initially wrote "components of what you'd want to play with". So depending on the rest of your application you may want to create different indexes.

What are the security considerations for the size of an array that can be passed over HTTP to a JavaScript server?

I'm dealing with the library qs in Node.js, which lets you stringify and parse query strings.
For example, if I want to send a query with an array of items, I would do qs.stringify({ items: [1,2,3] }), which would send this as my query string:
http://example.com/route?items[0]=1&items[1]=2&items[2]=3
(Encoded URI would be items%5B0%5D%3D1%26items%5B1%5D%3D2%26items%5B2%5D%3D3)
When I do qs.parse(url) on the server, I'd get the original object back:
let query = qs.parse(url) // => { items: [1,2,3] }
However, the default size of the array for qs is limited to 20, according to the docs:
qs will also limit specifying indices in an array to a maximum index of 20. Any array members with an index of greater than 20 will instead be converted to an object with the index as the key
This means that if I have more than 20 items in the array, qs.parse will give me an object like this (instead of the array that I expected):
{ items: { '0': 1, '1': 2 ...plus 19 more items } }
I can override this behavior by setting a param, like this: qs.parse(url, { arrayLimit: 1000 }), and this would allow a max array size of 1,000 for example. This would, thus, turn an array of 1,001 items into a plain old JavaScript object.
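To illustrate the behaviour described above, a small sketch using the qs package (note that qs.parse returns the values as strings):
const qs = require('qs');

qs.parse('items[0]=1&items[1]=2&items[2]=3');
// => { items: ['1', '2', '3'] }

// with more than 20 indices, the default arrayLimit turns the array into an object
const big = Array.from({ length: 25 }, (_, i) => `items[${i}]=${i}`).join('&');
qs.parse(big);                        // => { items: { '0': '0', '1': '1', ... } }
qs.parse(big, { arrayLimit: 1000 });  // => { items: ['0', '1', ..., '24'] }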
According to this github issue, the limit might be for "security considerations" (same in this other github issue).
My questions:
If the default limit of 20 is meant to help mitigate a DoS attack, how is turning an array of over 20 items into a plain old JavaScript object supposed to help anything? (Does the object take less memory or something?)
If the above is true, even if there is an array limit of, say, 20, couldn't the attacker just send more requests and still get the same DoS effect? (The number of requests necessary to be sent would decrease linearly with the size limit of the array, I suppose... so I guess the "impact" or load of a single request would be lower)
