I'm learning DynamoDB, and for that I installed the local server (DynamoDB Local), which comes with a shell at http://localhost:8000/shell.
I created the following table:
var serverUpTimeTableName = 'bingodrive_server_uptime';
var eventUpTimeColumn = 'up_time';
var params = {
TableName: serverUpTimeTableName,
KeySchema: [ // The key schema. Must start with a HASH key, with an optional second RANGE key.
{ // Required HASH type attribute
AttributeName: eventUpTimeColumn,
KeyType: 'HASH',
},
],
AttributeDefinitions: [ // The names and types of all primary and index key attributes only
{
AttributeName: eventUpTimeColumn,
AttributeType: 'N', // (S | N | B) for string, number, binary
},
],
ProvisionedThroughput: { // required provisioned throughput for the table
ReadCapacityUnits: 2,
WriteCapacityUnits: 2,
}
};
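dynamodb.createTable(params, function(err, data) {
if (err) ppJson(err); // an error occurred
else ppJson(data); // successful response
});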
So I created a table with only one hash key, called up_time; that's actually the only attribute in the table.
Now I want to fetch the last 10 inserted up times.
So far, I have written the following code:
var serverUpTimeTableName = 'bingodrive_server_uptime';
var eventUpTimeColumn = 'up_time';
var params = {
TableName: serverUpTimeTableName,
KeyConditionExpression: eventUpTimeColumn + ' != :value',
ExpressionAttributeValues: {
':value':0
},
Limit: 10,
ScanIndexForward: false
}
docClient.query(params, function(err, data) {
if (err) ppJson(err); // an error occurred
else ppJson(data); // successful response
});
OK, a few things to notice:
I don't really need a key condition. I just want the last 10 items, so I used Limit: 10 for the limit and ScanIndexForward: false for reverse order.
!= (NE) is not supported in key condition expressions for hash keys, and it seems that I must use some kind of index in the query. I'm confused about that.
So any information regarding the issue would be greatly appreciated.
Some modern terminology: the hash key is now called the partition key, and the range key is now called the sort key.
Thank you Amazon.
You need to understand that querying is an operation on hash keys: to run a Query, you must supply the exact value of a hash key. Since your table's primary key is only a hash key (not hash + range), a Query can return at most one item, so you can't use it to get "the last 10". You can only Scan the table to find items. A Scan doesn't require any knowledge about the items in the table.
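If you keep the current hash-key-only schema, getting "the last 10" means scanning the whole table (following LastEvaluatedKey until it is absent) and sorting on the client. Below is a minimal sketch of the sorting step. It assumes the scanned items are already plain objects, as DocumentClient returns them, and that a larger up_time means a later event. The helper name latestUpTimes is hypothetical.

```javascript
// Hypothetical helper: Scan returns items in no particular order, so the
// "last N" up times have to be picked on the client after reading every page.
function latestUpTimes(items, n) {
  return items
    .map((item) => item.up_time) // keep only the numeric attribute (new array)
    .sort((a, b) => b - a)       // numeric sort, largest (latest) first
    .slice(0, n);                // keep the first n values
}

// Example values of the kind a Scan might return (unordered, no duplicates).
const scanned = [12312, 53453, 34234, 123, 534534, 3101, 11].map((v) => ({ up_time: v }));
console.log(latestUpTimes(scanned, 3)); // [ 534534, 53453, 34234 ]
```

This reads (and pays for) every item on every call, so it only makes sense for small tables.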
Moving on: when you say "last 10 items", you actually do want a condition, because you are ordering by the time attribute. You haven't defined any index on it, so you can't have the engine return those 10 results for you. If up_time were a range (sort) key, you could get the top 10 ordered elements by querying in reverse order (ScanIndexForward: false); again, that's not your schema.
In your current table, what exactly are you trying to do? You currently have only one attribute, which is also the hash key, so a Scan would return items like these (no order, no duplicates):
12312
53453
34234
123
534534
3101
11
You could move up_time to a range key and add a constant hash key (a "stub") just to be able to initiate the query. However, that goes against DynamoDB's guidelines: every item would share one hot partition, which won't give the best performance. Not sure this bothers you at the moment, but it is worth mentioning.
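Here is a minimal sketch of that "stub" layout. Everything in it is an assumption for illustration: the new table name bingodrive_server_uptime_v2, the partition key name pk, and the constant value 'uptime' (which every item would have to carry). A new table is needed because the key schema of an existing table can't be changed. The params use the same shape as the createTable and DocumentClient query calls above.

```javascript
// Hypothetical new table: a constant partition key "pk" plus up_time as the
// sort (RANGE) key, so one Query can read items in up_time order.
const tableParams = {
  TableName: 'bingodrive_server_uptime_v2',
  KeySchema: [
    { AttributeName: 'pk', KeyType: 'HASH' },       // same stub value on every item
    { AttributeName: 'up_time', KeyType: 'RANGE' }, // items are stored ordered by this
  ],
  AttributeDefinitions: [
    { AttributeName: 'pk', AttributeType: 'S' },
    { AttributeName: 'up_time', AttributeType: 'N' },
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 2, WriteCapacityUnits: 2 },
};

// DocumentClient Query params: equality on the stub key, newest up_time first.
function lastUpTimesQuery(limit) {
  return {
    TableName: tableParams.TableName,
    KeyConditionExpression: '#pk = :pk',
    ExpressionAttributeNames: { '#pk': 'pk' },
    ExpressionAttributeValues: { ':pk': 'uptime' }, // hypothetical stub value
    ScanIndexForward: false, // descending order of the sort key
    Limit: limit,
  };
}
```

In the shell, you would pass tableParams to dynamodb.createTable and lastUpTimesQuery(10) to docClient.query, with the same callbacks as above.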
I have two DBs for testing, and each contains thousands to hundreds of thousands of documents.
Both have the same schemas and CRUD operations.
Let's call them DB1 and DB2.
I am using Mongoose.
Suddenly, DB1 became really slow when running this aggregation:
const eventQueryPipeline = [
{
$match: {
$and: [{ userId: req.body.userId }, { serverId: req.body.serverId }],
},
},
{
$sort: {
sort: -1,
},
},
];
const aggregation = db.collection
.aggregate(eventQueryPipeline)
.allowDiskUse(true);
aggregation.exect((err, result) => {
res.json(result);
});
In DB2, the exact same query runs in anywhere from milliseconds up to a maximum of 10 seconds.
In DB1, the query never takes less than 40 seconds.
I do not understand why. What could I be missing?
I tried comparing the documents and the indexes, and they're the same.
Deleting the collection and saving the documents again brings the speed back to normal, but why is this happening? Has anyone had the same experience?
Short answer:
You should create the following index:
{ "userId": 1, "serverId": 1, "sort": 1 }
Longer answer
Based on your code (I see that you have .allowDiskUse(true)), it looks like MongoDB is trying to do an in-memory sort on "a lot" of data. By default, MongoDB limits sort operations to 100 MB of system memory. allowDiskUse lets it write temporary files to disk when it hits that limit, which is much slower than sorting in memory.
You can read more about it here: https://www.mongodb.com/docs/manual/reference/method/cursor.allowDiskUse/
In order to optimise the performance of your queries, you can use indexes.
A common rule to follow when planning indexes is ESR (Equality, Sort, Range). You can read more about it here: https://www.mongodb.com/docs/v4.2/tutorial/equality-sort-range-rule/
If we follow that rule when creating our compound index, we add the equality matches first, in your case "userId" and "serverId". After that comes the sort field, in your case "sort".
If you also needed to filter results on some range (e.g. some value greater than X, or a timestamp later than yesterday), you would add that field after "sort".
That means your index should look like this:
schema.index({ userId: 1, serverId: 1, sort: 1 });
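Below is a minimal sketch that applies the ESR ordering mechanically. esrIndex is a hypothetical helper, not part of Mongoose. It relies on JavaScript keeping string keys in insertion order, which is also how MongoDB reads the field order of the index spec. With equality on the prefix fields, an ascending "sort" key can be walked backwards, so it also serves the $sort: { sort: -1 } stage.

```javascript
// Hypothetical helper applying the ESR (Equality, Sort, Range) rule:
// equality fields first, then sort fields, then range fields.
function esrIndex({ equality = [], sort = {}, range = [] }) {
  const index = {};
  for (const field of equality) index[field] = 1;                      // exact matches
  for (const [field, dir] of Object.entries(sort)) index[field] = dir; // sort direction
  for (const field of range) index[field] = 1;                         // range filters last
  return index;
}

// The query in the question: equality on userId and serverId, sort on "sort".
const spec = esrIndex({ equality: ['userId', 'serverId'], sort: { sort: 1 } });
console.log(spec); // { userId: 1, serverId: 1, sort: 1 }
```

The result can be passed to schema.index(...) as above.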
Additionally, you can probably remove allowDiskUse, and you should handle err inside the aggregation.exec callback (I'm assuming that aggregation.exect is a typo).
There is a DynamoDB table with these fields:
email (primary key)
tenant
other stuff
I want to get all the items where email contains 'mike'.
In my Node.js server, I have this code:
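const aws = require('aws-sdk');
const userTableName = 'UserTable';
const db = new aws.DynamoDB();
const email = 'mike.green#abc.com';
// Wrapped in a Promise, since the callback below uses resolve/reject
const queryByEmail = () => new Promise((resolve, reject) => {
const params = {
TableName: userTableName,
KeyConditionExpression: '#email = :email',
ExpressionAttributeNames: {
'#email': 'email',
},
ExpressionAttributeValues: {
':email': { S: email },
},
};
db.query(params, (err, data) => {
if (err) {
reject(err);
} else {
// The low-level client returns attributes as { S: '...' }, so compare the .S strings
const processedItems = [...data.Items].sort((a, b) => a.email.S < b.email.S ? -1 : 1);
const processedData = { ...data, Items: processedItems };
resolve(processedData);
}
});
});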
This works, but only if I search for the entire email, mike.green#abc.com.
Question 1:
But if I want to search for mike and return all items where email contains mike, how can I get that?
Question 2:
If I want to get all the rows where email contains mike and tenant is Canada, how can I get that?
I'm not a Node.js user, but I hope this will be helpful.
Question 1: But if I want to search for mike and return all items where
email contains mike, how can I get that?
Key condition expressions only accept an equality condition on the partition key (a sort key, if present, can also use range operators and begins_with). If you want more querying flexibility, you need a filter expression. Note that a filter expression can't reference your partition key. You can find more information at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html, but the most important part is:
Key Condition Expression
To specify the search criteria, you use a key condition expression—a
string that determines the items to be read from the table or index.
You must specify the partition key name and value as an equality
condition.
You can optionally provide a second condition for the sort key (if
present). The sort key condition must use one of the following
comparison operators:
a = b — true if the attribute a is equal to the value b
a < b — true if a is less than b
a <= b — true if a is less than or equal to b
a > b — true if a is greater than b
a >= b — true if a is greater than or equal to b
a BETWEEN b AND c — true if a is greater than or equal to b, and less than or equal to c.
The following function is also supported:
begins_with (a, substr)— true if the value of attribute a begins with a particular substring.
......
Question 2: If I want to get all the rows where email contains mike and
tenant is Canada, how can I get that?
You can do that with a filter expression that uses one of the available functions (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html#Expressions.OperatorsAndFunctions.Syntax). From the documentation:
If you need to further refine the Query results, you can optionally
provide a filter expression. A filter expression determines which
items within the Query results should be returned to you. All of the
other results are discarded.
A filter expression is applied after a Query finishes, but before the
results are returned. Therefore, a Query will consume the same amount
of read capacity, regardless of whether a filter expression is
present.
A Query operation can retrieve a maximum of 1 MB of data. This limit
applies before the filter expression is evaluated.
A filter expression cannot contain partition key or sort key
attributes. You need to specify those attributes in the key condition
expression, not the filter expression.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html
To wrap up:
If email is your partition key, you cannot apply contains to it in a Query; you have to query it with its exact value.
Alternatively, you can Scan your table and apply a filter (https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html), but I wouldn't do that because of the consumed capacity and the response time. A Scan reads every item in the table, so if you have hundreds of GB, you will likely not get the information in real time, and real-time serving is one of the purposes of DynamoDB.
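If you do go the Scan route anyway, a minimal sketch follows. It answers both questions with one FilterExpression using contains, and it pages through the table. The helper names buildEmailScanParams and scanAll are hypothetical. The values use the low-level { S: ... } format to match the new aws.DynamoDB() client in the question. scanAll takes a function that fetches one page. With AWS SDK v2, that could be (p) => db.scan(p).promise().

```javascript
// Hypothetical helper: Scan params keeping items whose email contains a
// substring AND whose tenant equals a value (low-level attribute format).
function buildEmailScanParams(tableName, emailPart, tenant) {
  return {
    TableName: tableName,
    FilterExpression: 'contains(#email, :emailPart) AND #tenant = :tenant',
    ExpressionAttributeNames: { '#email': 'email', '#tenant': 'tenant' },
    ExpressionAttributeValues: {
      ':emailPart': { S: emailPart },
      ':tenant': { S: tenant },
    },
  };
}

// Follow LastEvaluatedKey until it is absent. Each page reads up to 1 MB
// before the filter is applied, so a page can come back with no items.
async function scanAll(scanPage, params) {
  const items = [];
  let startKey;
  do {
    const pageParams = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
    const page = await scanPage(pageParams);
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}
```

For example: scanAll((p) => db.scan(p).promise(), buildEmailScanParams('UserTable', 'mike', 'Canada')). Every page still consumes read capacity for all the items it reads, not only for the matches.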
I have event objects in MongoDB that look like this:
{
"start": ISODate("2010-09-04T16:54:11.216Z"),
"title":"Short descriptive title",
"description":"Full description",
"score":56
}
I need to query across three parameters:
Time window (event start is between two dates)
Score threshold (score is > x)
Full-text search of title and description
What's the right way to approach this efficiently? I think the first two are done with an aggregation but I'm not sure how text search would factor in.
Assuming your start field is of type date (which it should be) and not a string, here are the basic components you'd want to play with. That said, if start were stored as an ISO 8601 string in one consistent format (same precision and time zone), a string comparison would work just as well, because such strings sort in chronological order.
// create your text index (ensureIndex is deprecated; createIndex replaces it)
db.collection.createIndex({
description: "text",
title: "text"
})
// optionally create an index on your other fields
db.collection.createIndex({
start: 1,
score: 1
})
x = 50
lowerDate = ISODate("2010-09-04T16:54:11.216Z") // or just the string part for string fields
upperDate = ISODate("2010-09-11T16:54:11.216Z") // example window of one week
// simple find to retrieve your result set
db.collection.find({
start: {
$gte: lowerDate, // depending on your exact scenario, you may need $gt instead
$lte: upperDate // depending on your exact scenario, you may need $lt instead
},
score: { $gt: x }, // depending on your exact scenario, you may need $gte instead
$text: { // here comes the text search
$search: "descriptive"
}
})
There is an important topic with respect to performance/indexing that needs to be understood, though, which is very well documented here: Combine full text with other index
This is why I initially wrote "components of what you'd want to play with". So depending on the rest of your application you may want to create different indexes.
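If the bounds and the search term come from your application, a small builder keeps the three conditions together. Below is a minimal sketch with a hypothetical helper name, buildEventFilter. It assumes start is stored as a date and that the text index above exists, since $text fails without one.

```javascript
// Hypothetical helper combining the three conditions into one find() filter.
function buildEventFilter({ from, to, minScore, search }) {
  return {
    start: { $gte: from, $lte: to }, // time window (inclusive bounds)
    score: { $gt: minScore },        // score threshold
    $text: { $search: search },      // needs the text index on title/description
  };
}

const filter = buildEventFilter({
  from: new Date('2010-09-01T00:00:00Z'),
  to: new Date('2010-09-30T23:59:59Z'),
  minScore: 50,
  search: 'descriptive',
});
```

To rank results by relevance, you could project and sort on the text score under a name that doesn't clash with the existing score field. For example: db.collection.find(filter, { relevance: { $meta: "textScore" } }).sort({ relevance: { $meta: "textScore" } }).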
I am new to AWS Lambda, Amazon DynamoDB and Serverless. I have one user table, and I want to do the following:
I want to fetch records from the user table page by page, 10 records per page.
I want to sort on columns like name and email. Both columns are strings.
I am using Serverless with Node.js. Here is my serverless.yaml file:
UserDynamoDbTable:
  Type: 'AWS::DynamoDB::Table'
  DeletionPolicy: Retain
  Properties:
    AttributeDefinitions:
      -
        AttributeName: id
        AttributeType: S
    KeySchema:
      -
        AttributeName: id
        KeyType: HASH
    ProvisionedThroughput:
      ReadCapacityUnits: 1
      WriteCapacityUnits: 1
    TableName: 'user'
For sorting, I'm trying this query:
let params = {
TableName: 'user',
limit: 10,
ScanIndexForward: false
};
dynamoDb.scan(params, (error, result) => { })
But I don't get the response I need.
Please help; I'm new to this. Thanks in advance.
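ScanIndexForward is a Query parameter (Scan doesn't accept it), and it only controls the order of the range key. Since your table doesn't define a range key (i.e. sort key), the data is not sorted. Note also that the page-size parameter is Limit, not limit.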
Specifies ascending (true) or descending (false) traversal of the
index. DynamoDB returns results reflecting the requested order
determined by the range key.
Unfortunately, DynamoDB can't sort the data by any other attributes. It can sort by range key only.
If you need to work around this, the best solution would be to add a global secondary index (GSI). A GSI is essentially a copy of your table's data under a different key schema, so it can have the sort key you need, which comes in handy in scenarios such as this.
Using Global Secondary Indexes in DynamoDB
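Below is a minimal sketch of paging through such an index 10 items at a time, sorted by name. It assumes a hypothetical GSI named name-index, which you would add under GlobalSecondaryIndexes in serverless.yaml. Its partition key is entityType, holding the constant 'USER' on every user item, and its sort key is name. This shares the hot-partition caveat of any constant partition key. A second GSI with email as the sort key would cover sorting by email. The helper name userPageParams is also hypothetical.

```javascript
// Hypothetical GSI "name-index": partition key entityType (constant "USER" on
// every user item) and sort key name, so a Query returns users ordered by name.
function userPageParams({ descending = false, pageSize = 10, lastKey } = {}) {
  const params = {
    TableName: 'user',
    IndexName: 'name-index',
    KeyConditionExpression: '#type = :type',
    ExpressionAttributeNames: { '#type': 'entityType' },
    ExpressionAttributeValues: { ':type': 'USER' }, // DocumentClient format
    ScanIndexForward: !descending, // true: A to Z by name, false: Z to A
    Limit: pageSize,
  };
  // Pass the previous page's LastEvaluatedKey to continue where it stopped.
  if (lastKey) params.ExclusiveStartKey = lastKey;
  return params;
}
```

With the DocumentClient from the question, a page could be fetched with dynamoDb.query(userPageParams(), callback). The next page would use userPageParams({ lastKey: result.LastEvaluatedKey }). When LastEvaluatedKey is absent, you have reached the last page.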
I have a database of documents which are tagged with keywords. I am trying to find (and then count) the unique tags which are used alongside each other. So for any given tag, I want to know what tags have been used alongside that tag.
For example, if I had one document which had the tags [fruit, apple, plant] then when I query [apple] I should get [fruit, plant]. If another document has tags [apple, banana] then my query for [apple] would give me [fruit, plant, banana] instead.
This is my map function which emits all the tags and their neighbours:
function(doc) {
if(doc.tags) {
doc.tags.forEach(function(tag1) {
doc.tags.forEach(function(tag2) {
emit(tag1, tag2);
});
});
}
}
So in my example above, it would emit:
apple -- fruit
apple -- plant
apple -- banana
fruit -- apple
fruit -- plant
...
My question is: what should my reduce function be? The reduce function should essentially filter out the duplicates and group them all together.
I have tried a number of different approaches, but my database server (CouchDB) keeps giving me an error: reduce_overflow_error. Reduce output must shrink more rapidly.
EDIT: I've found something that seems to work, but I'm not sure why. I see there is an optional "rereduce" parameter to the reduce function call. If I ignore these special cases, then it stops throwing reduce_overflow_errors. Can anyone explain why? And also, should I just be ignoring these, or will this bite me in the ass later?
function(keys, values, rereduce) {
if(rereduce) return null; // Throws error without this.
var a = [];
values.forEach(function(tag) {
if(a.indexOf(tag) < 0) a.push(tag);
});
return a;
}
Your answer is nice, and as I said in the comments, if it works for you, that's all you should care about. Here is an alternative implementation in case you ever bump into performance problems.
CouchDB likes tall lists, not fat lists. Instead of view rows keeping an array of every tag ever seen, this solution keeps the "sibling" tags in the key of the view rows, and then groups them together to guarantee one unique sibling tag per row. Every row is just two tags, but there could be thousands or millions of rows: a tall list, which CouchDB prefers.
The main idea is to emit a 2-array of tag pairs. Suppose we have one doc, tagged fruit, apple, plant.
// Pseudo-code visualization of view rows (before reduce)
// Key , Value
[apple, fruit ], 1
[apple, plant ], 1 // Basically this is every combination of 2 tags in the set.
[fruit, apple ], 1
[fruit, plant ], 1
[plant, apple ], 1
[plant, fruit ], 1
Next I tag something apple, banana.
// Pseudo-code visualization of view rows (before reduce)
// Key , Value
[apple, banana], 1 // This is from my new doc
[apple, fruit ], 1
[apple, plant ], 1
[banana, apple], 1 // This is also from my new doc
[fruit, apple ], 1
[fruit, plant ], 1
[plant, apple ], 1
[plant, fruit ], 1
Why is the value always 1? Because then I can use a very simple built-in reduce function, _sum, to tell me the count of each tag pair. Next, query with ?group_level=2 and CouchDB will give you the unique pairs, each with its total count.
A map function to produce this kind of view might look like this:
function(doc) {
// Emit "sibling" tags, keyed on tag pairs.
var tags = doc.tags || []
tags.forEach(function(tag1) {
tags.forEach(function(tag2) {
if(tag1 != tag2)
emit([tag1, tag2], 1)
})
})
}
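The effect of _sum with ?group_level=2 can be checked locally without CouchDB. Below is a minimal simulation, not CouchDB itself, that re-implements the map function's loop, groups identical [tag1, tag2] keys, and sums their values. The function names pairCounts and siblings are hypothetical. In CouchDB, the rows for one tag could be selected with ?group_level=2&startkey=["apple"]&endkey=["apple",{}], since an object collates after any string.

```javascript
// Local simulation: emit [tag1, tag2] -> 1 for every ordered pair of distinct
// tags, then sum values per identical key (what _sum + group_level=2 returns).
function pairCounts(docs) {
  const counts = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key); // arrays as map keys need a stable string form
    counts.set(k, (counts.get(k) || 0) + value);
  };
  for (const doc of docs) {
    const tags = doc.tags || [];
    tags.forEach((tag1) => {
      tags.forEach((tag2) => {
        if (tag1 !== tag2) emit([tag1, tag2], 1);
      });
    });
  }
  return counts;
}

// Siblings of one tag: the second element of every key that starts with it.
function siblings(counts, tag) {
  return [...counts.keys()]
    .map((k) => JSON.parse(k))
    .filter(([first]) => first === tag)
    .map(([, second]) => second)
    .sort();
}

const docs = [{ tags: ['fruit', 'apple', 'plant'] }, { tags: ['apple', 'banana'] }];
console.log(siblings(pairCounts(docs), 'apple')); // [ 'banana', 'fruit', 'plant' ]
```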
I have found a correct solution I am much happier with. The trick was that CouchDB must be set to reduce_limit = false so that it stops checking its heuristic against your query.
You can set this via Futon on http://localhost:5984/_utils/config.html under the query_server_config settings, by double clicking on the value.
Once that's done, here is my new map function which works better with the "re-reducing" part of the reduce function:
function(doc) {
if(doc.tags) {
doc.tags.forEach(function(tag1) {
doc.tags.forEach(function(tag2) {
emit(tag1, [tag2]); // Array with single value
});
});
}
}
And here is the reduce function:
function(keys, values) {
var a = [];
values.forEach(function(tags) {
tags.forEach(function(tag) {
if(a.indexOf(tag) < 0) a.push(tag);
});
});
return a;
}
Hope this helps someone!
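The reduce function above works in both phases because its inputs and its output are arrays of tags. In the first pass, the inputs are the one-element arrays emitted by the map. In the rereduce pass, they are the arrays returned by earlier reduce calls. Below is a small local check, not CouchDB itself, that reduces two batches separately and then rereduces the partial results. The batch contents are made up for illustration. It also hints at why returning null on rereduce, as in the EDIT above, only hides the error: whenever CouchDB combines partial results, the answer would be null instead of the union.

```javascript
// The union reduce from the answer above (keys are unused).
function unionReduce(keys, values) {
  var a = [];
  values.forEach(function (tags) {
    tags.forEach(function (tag) {
      if (a.indexOf(tag) < 0) a.push(tag);
    });
  });
  return a;
}

// Local check: first-pass reduce on two batches of map values, then a
// rereduce over the two partial results, using the same function.
const batch1 = [['fruit'], ['plant']];  // map values for "apple" from one doc
const batch2 = [['banana'], ['fruit']]; // map values for "apple" from other docs
const partials = [unionReduce(null, batch1), unionReduce(null, batch2)];
console.log(unionReduce(null, partials)); // [ 'fruit', 'plant', 'banana' ]
```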