I have a list of ids, and for each of them I'm fetching the corresponding item from a DynamoDB table using GetItem.
The thing is, some ids are not present in the table.
My question is: let's say that as I go through my list there are 5000 ids that don't match any item in the table, and I make each call with a 2-second delay between them.
What should I expect to happen to my table?
const dynamo = new AWS.DynamoDB.DocumentClient();

const getItem = (key) => {
  const getParams = {
    TableName: 'my-table',
    Key: {
      id: key
    }
  };
  return dynamo
    .get(getParams)
    .promise()
    .then(result => {
      const item = result.Item;
      if (item) {
        return Promise.resolve(item);
      }
      return Promise.reject();
    }).catch(error => {
      console.log('Could not retrieve item with id', key);
      return Promise.reject(error);
    });
};
Well, nothing will happen to your DynamoDB table; it will keep serving requests normally. DynamoDB is very scalable and fast. But pay attention here if you missed it, because this might trick you into increased costs:

If you perform a read operation on an item that does not exist, DynamoDB still consumes provisioned read throughput: a strongly consistent read request consumes one read capacity unit, while an eventually consistent read request consumes 0.5 of a read capacity unit.

See the DynamoDB documentation on provisioned throughput.
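To see this in practice, here is a minimal sketch (reusing the table name and DocumentClient from the question) that asks DynamoDB to report the capacity consumed by each call via ReturnConsumedCapacity; a lookup that finds nothing still reports a non-zero CapacityUnits value.

const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

const getItemWithCost = (key) =>
  dynamo
    .get({
      TableName: 'my-table',
      Key: { id: key },
      ReturnConsumedCapacity: 'TOTAL' // ask DynamoDB to report the RCUs used by this call
    })
    .promise()
    .then(result => {
      // CapacityUnits is reported whether or not an Item was found
      console.log(key, 'found:', !!result.Item, 'RCUs:', result.ConsumedCapacity.CapacityUnits);
      return result.Item;
    });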
Related
I am using Couchbase in a node app. Every time I insert a document, I am using a random UUID.
It inserts fine and I could retrieve data based on this id.
But in reality, I actually want to search by a key called url in the document. To be able to get or update or delete a document.
I could possibly use the url as the id, I suppose, but that is not what I see in any database concepts: ids are not urls or unique names, they are typically random or incremental numbers.
How could I approach this so that I can use a random UUID as the id but still be able to search by url?
Because let's say the id was 56475-asdf-7856; I am not going to know this value to search for, right?
Whereas if the id was https://www.example.com, I know the url and searching for it would give me what I want.
Is it a good idea to make the url the id?
This is in a node app using Couchbase.
databaseRouter.put('/update/:id', (req, res) => {
  updateDocument(req)
    .then(({ document, error }) => {
      if (error) {
        return res.status(404).send(error);
      }
      res.json(document);
    })
    .catch(error => res.status(500).send(error));
});

export const updateDocument = async (req) => {
  try {
    // Feels like id should be the way to do this, but doesn't make sense cos I won't know the id beforehand.
    const result = await collection.get(req.params.id);
    const document = result.content;
    document.url = req.body.url || document.url;
    await collection.replace(req.params.id, document);
    return { document };
  } catch (error) {
    return { error };
  }
};
I think it's okay to use URLs as IDs, especially if that's the primary way you're going to look up documents and you don't need to change the URL later. Yes, IDs are often numbers or UUIDs, but there is no reason you have to be restricted to that.
However, another approach you can take is to use a SQL query (SQL++, technically, since this is a JSON database).
Something like:
SELECT d.*
FROM mybucket.myscope.mydocuments d
WHERE d.url = 'http://example.com/foo/baz/bar'
You'll also need an index with that, something like:
CREATE INDEX ix_url ON mybucket.myscope.mydocuments (url)
I'd recommend checking out the docs for writing a SQL++ query (sometimes still known as "N1QL") with Node.js: https://docs.couchbase.com/nodejs-sdk/current/howtos/n1ql-queries-with-sdk.html
Here's the first example in the docs:
async function queryPlaceholders() {
  const query = `
    SELECT airportname, city FROM \`travel-sample\`.inventory.airport
    WHERE city=$1
  `;
  const options = { parameters: ['San Jose'] };

  try {
    let result = await cluster.query(query, options);
    console.log("Result:", result);
    return result;
  } catch (error) {
    console.error('Query failed: ', error);
  }
}
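Adapting that pattern to the url lookup in your case, a sketch (mybucket.myscope.mydocuments and the cluster connection are placeholders for your own names):

async function findByUrl(url) {
  const query = `
    SELECT META(d).id AS id, d.*
    FROM mybucket.myscope.mydocuments d
    WHERE d.url = $1
  `;
  try {
    // parameterized query, so the url is never concatenated into the statement
    const result = await cluster.query(query, { parameters: [url] });
    return result.rows[0]; // undefined when no document has that url
  } catch (error) {
    console.error('Query failed:', error);
  }
}

META(d).id returns the document's key alongside the fields, so you can still call collection.replace(id, document) afterwards even though the client only knew the url.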
After using the code below to pull data from DynamoDB successfully:
async function pullone(sessionid) {
  const params = {
    TableName: dynamodbTableName,
    Key: {
      'sessionid': sessionid
    }
  };
  return await dynamodb.get(params).promise().then((response) => {
    return response.Item;
  }, (error) => {
    console.error('Do your custom error handling here. I am just gonna log it: ', error);
  });
}
Instead of 'return response.Item', I just want to return the count.
I tried doing count(pullone(sessionid)), but I'm not sure that is even a valid method. Please assist.
Not sure if I understood your question, but:
Since you're requesting data associated with a primary key, you'll get either 0 or 1 elements in Item.
So, if you aim to know whether you've found something or not, you can use Number(response.Item != null) and you'll get 1 in case of "something" and 0 in case of "nothing".
If, instead, your data contains a "count" attribute, then (await pullone(sessionId)).count should work.
Otherwise, you have to query your DB (but you'll get Items, plural, in the response) and use the length property of the Items array you get back.
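To make the first option concrete, a small sketch built on the pullone function above (same dynamodb client and table name) that returns 0 or 1:

async function pulloneCount(sessionid) {
  const params = {
    TableName: dynamodbTableName,
    Key: {
      'sessionid': sessionid
    }
  };
  const response = await dynamodb.get(params).promise();
  // GetItem returns at most one item, so the "count" is either 0 or 1
  return Number(response.Item != null);
}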
I am building a React note-taking app.
I only track the changes the user makes to a note, not the note's current state.
If there are no changes to a specific property, the property is sent as an empty string.
I am handling this in Node (node-postgres) with the following function:
const updateNote = async (req, res) => {
  const { category, title, body, nid } = req.body;
  let noteStatement = "";
  let valueStatement = "";

  for (const key in req.body) {
    if (req.body[key] !== "" && key !== "nid") {
      noteStatement = noteStatement + key + ", ";
      valueStatement = valueStatement + `'${req.body[key]}', `;
    }
  }

  try {
    const result = await pool.query(
      `UPDATE notes SET (${noteStatement}last_updated)
       = (${valueStatement}(to_timestamp(${Date.now()} / 1000.0)))
       WHERE nid = ${nid} RETURNING *`
    );
    const note = result.rows;
    return res.status(200).send({ note });
  } catch (err) {
    return res.send({ error: err });
  }
};
I may be overthinking this, but the goal was to send as little data to the DB as possible.
I have spent a fair amount of time on this, and this is the most pragmatic approach I came up with.
Is writing this type of query bad practice?
Would it make more sense to send all of the note data from React, including properties that have not been updated, and use a fixed query that updates every property?
EDIT: Updated Query
const updateNote = async (req, res) => {
  const { category, title, body, nid } = req.body;
  const text = `UPDATE notes SET (category, title, body, last_updated)
                = ($1, $2, $3, (to_timestamp(${Date.now()} / 1000.0)))
                WHERE nid = $4 RETURNING *`;
  const values = [category, title, body, nid];

  try {
    const result = await pool.query(text, values);
    const note = result.rows;
    return res.status(200).send({ note });
  } catch (err) {
    return res.send({ error: err });
  }
};
I wanted to leave a comment but soon realized I was approaching the character limit, so I'll leave this as an answer.

First off, I want to make sure I understand what you are trying to accomplish. I assume you just want to update your DB with only the fields provided by the client. With that in mind, I want to underline that people often overcomplicate things that should stay simple. Everything in software is a trade-off, and in your case the data isn't big enough to worry about updating only certain fields. It can be done, but not the way you are doing it right now: you can have a utility function that builds a parameterized query from only the values that are not empty/null, depending on how you send the unchanged data from the client (see the sketch below).

Which brings me to the second thing: you should never write a SQL query the way you have done it. Concatenating strings leaves you vulnerable to SQL injection. Instead, always use parameterized queries, unless you use a library that abstracts away writing the queries (an ORM).

As a side note, never trust data that comes from the client, so always validate the data on the server before you make any changes to the DB, even if you already validated it on the client. You can do this using a middleware like celebrate or another validation library. Never trust anything that comes from the client.
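To illustrate the utility-function idea, here is a rough sketch (assuming the same notes table, nid column and pool as in the question) that builds a parameterized UPDATE from only the fields that were actually sent, with a whitelist because column names cannot be sent as query parameters:

const ALLOWED_COLUMNS = ["category", "title", "body"]; // identifiers cannot be parameterized, so whitelist them

const updateNote = async (req, res) => {
  const { nid, ...fields } = req.body;

  // keep only whitelisted fields that the client actually sent
  const entries = Object.entries(fields)
    .filter(([key, value]) => ALLOWED_COLUMNS.includes(key) && value !== "");

  // "category = $1, title = $2, ..., last_updated = to_timestamp(... / 1000.0)"
  const setClause = [
    ...entries.map(([key], i) => `${key} = $${i + 1}`),
    // Date.now() is generated on the server, not taken from the client,
    // so interpolating it here is not an injection risk
    `last_updated = to_timestamp(${Date.now()} / 1000.0)`
  ].join(", ");

  const values = entries.map(([, value]) => value);
  const text = `UPDATE notes SET ${setClause} WHERE nid = $${values.length + 1} RETURNING *`;

  try {
    const result = await pool.query(text, [...values, nid]);
    return res.status(200).send({ note: result.rows });
  } catch (err) {
    return res.status(500).send({ error: err });
  }
};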
I have connected a Lambda to the DynamoDB table I created locally, using this code:
console.log('Starting Function Now');
const AWS = require("aws-sdk");
const docClient = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' });

exports.handler = function (event, ctx, callback) {
  let scanParameters = {
    "TableName": "Accommodation_Request",
    "FilterExpression": "SSD_ID = :val",
    "ExpressionAttributeValues": { ":val": { "N": "2" } },
    "ProjectionExpression": 'SSD_ID,AI_Org_ID'
  };
  console.log(scanParameters);
  console.log("Your Student information");

  docClient.scan(scanParameters, function (err, data) {
    if (err) {
      callback(err, null);
    } else {
      callback(null, data);
    }
  });
};
After running the Lambda, I get this outcome:
{
  "Items": [],
  "Count": 0,
  "ScannedCount": 5
}
Problem: I am unable to return the correct scan results with SSD_ID = 1
This can happen if you are using a String instead of a Number in your filter, or if you have a composite partition key.
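For example, the DocumentClient expects native JavaScript values rather than the low-level attribute-value format, so {"N": "2"} is marshalled as a map attribute and matches nothing. A sketch of the corrected parameters, assuming SSD_ID is stored as a number:

const scanParameters = {
  TableName: 'Accommodation_Request',
  FilterExpression: 'SSD_ID = :val',
  // the DocumentClient marshals values itself, so pass a plain number;
  // {"N": "2"} would be compared as a map attribute and match nothing
  ExpressionAttributeValues: { ':val': 2 },
  ProjectionExpression: 'SSD_ID,AI_Org_ID'
};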
I would guess that there is more data to look at. Is there a value in LastEvaluatedKey? When you do a scan operation with a filter, you may get no results for a particular request. The LastEvaluatedKey value should be passed into the next request's ExclusiveStartKey, and you can get many empty responses before getting any data (see the loop sketched below). Scans should be used with caution and should generally not be part of production workloads (exceptions do exist, of course).
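A sketch of that pagination loop (assuming the same docClient and scan parameters as in the question, and AWS SDK v2):

async function scanAll(params) {
  const items = [];
  let lastEvaluatedKey; // undefined on the first request

  do {
    const request = { ...params };
    if (lastEvaluatedKey) request.ExclusiveStartKey = lastEvaluatedKey;

    const page = await docClient.scan(request).promise();
    items.push(...page.Items); // a filtered page can contribute zero items
    lastEvaluatedKey = page.LastEvaluatedKey; // undefined once the table has been fully scanned
  } while (lastEvaluatedKey);

  return items;
}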
I have a script that is pulling 25,000 records from AWS Athena, which is basically a PrestoDB relational SQL database. Let's say that I'm generating a request for each one of these records, which means I have to make 25,000 requests to Athena; then, when the data comes back, I have to make 25,000 requests to my Redis cluster.
What would be the ideal number of requests to make at one time from Node to Athena?
The reason I ask is that I tried to do this by creating an array of 25,000 promises and then calling Promise.all(promiseArray) on it, but the app just hung forever.
So I decided instead to fire them off one at a time, using recursion to splice the first index out and then pass the remaining records back to the calling function after the promise has resolved.
The problem with this is that it takes forever. I took about an hour-long break, came back, and there were 23,000 records remaining.
I tried to Google how many requests Node and Athena can handle at once, but I came up with nothing. I'm hoping someone might know something about this and be able to share it with me.
Thank you.
Here is my code, just for reference:
As a side note, what I would like to do differently is, instead of sending one request at a time, send 4, 5, 6, 7 or 8 at a time depending on how fast they execute.
Also, how would a Node cluster affect the performance of something like this?
exports.storeDomainTrends = () => {
  return new Promise((resolve, reject) => {
    athenaClient.execute(`SELECT DISTINCT the_column from "the_db"."the_table"`,
      (err, data) => {
        var getAndStoreDomainData = (records) => {
          if (records.length) {
            return new Promise((resolve, reject) => {
              var record = records.splice(0, 1)[0];
              athenaClient.execute(`
                SELECT
                  field,
                  field,
                  field,
                  SUM(field) as field
                FROM "the_db"."the_table"
                WHERE the_field IN ('Month') AND the_field = '` + record.domain_name + `'
                GROUP BY the_field, the_field, the_field
              `, (err, domainTrend) => {
                if (err) {
                  console.log(err);
                  reject(err);
                }
                redisClient.set(('Some String' + domainTrend[0].domain_name), JSON.stringify(domainTrend));
                resolve(domainTrend);
              });
            })
            .then(res => {
              getAndStoreDomainData(records);
            });
          }
        };
        getAndStoreDomainData(data);
      });
  });
};
Using the lib your code could look something like this:
const Fail = function(reason){this.reason=reason;};
const isFail = x=>(x&&x.constructor)===Fail;
const distinctDomains = () =>
new Promise(
(resolve,reject)=>
athenaClient.execute(
`SELECT DISTINCT domain_name from "endpoint_dm"."bd_mb3_global_endpoints"`,
(err,data)=>
(err)
? reject(err)
: resolve(data)
)
);
const domainDetails = domain_name =>
new Promise(
(resolve,reject)=>
athenaClient.execute(
`SELECT
timeframe_end_date,
agg_type,
domain_name,
SUM(endpoint_count) as endpoint_count
FROM "endpoint_dm"."bd_mb3_global_endpoints"
WHERE agg_type IN ('Month') AND domain_name = '${domain_name}'
GROUP BY timeframe_end_date, agg_type, domain_name`,
(err, domainTrend) =>
(err)
? reject(err)
: resolve(domainTrend)
)
);
const redisSet = keyValue =>
new Promise(
(resolve,reject)=>
redisClient.set(
keyValue,
(err,res)=>
(err)
? reject(err)
: resolve(res)
)
);
const process = batchSize => limitFn => resolveValue => domains =>
Promise.all(
domains.slice(0,batchSize)
.map(//map domains to promises
domain=>
//maximum 5 active connections
limitFn(domainName=>domainDetails(domainName))(domain.domain_name)
.then(
domainTrend=>
//the redis client documentation makes no sense whatsoever
//https://redis.io/commands/set
//no mention of a callback
//https://github.com/NodeRedis/node_redis
//mentions a callback, since we need the return value
//and best to do it async we will use callback to promise
redisSet([
`Endpoint Profiles - Checkin Trend by Domain - Monthly - ${domainTrend[0].domain_name}`,
JSON.stringify(domainTrend)
])
)
.then(
  redisReply=>{
    //here is where things get unpredictable, set is documented as
    // a synchronous function returning "OK" or a function that
    // takes a callback but no mention of what that callback receives
    // as response, you should try with one or two records to
    // finish this on reverse engineering because the documentation
    // fails 100% here and cannot be relied upon.
    console.log("bad documentation of redis client... reply is:",redisReply);
    return (redisReply==="OK")
      ? domain
      : Promise.reject(`Redis reply not OK:${redisReply}`);
  }
)
.catch(//catch failed, save error and domain of failed item
e=>
new Fail([e,domain])
)
)
).then(
  results=>{
    console.log(`got ${batchSize} results`);
    const left = domains.slice(batchSize);
    if(left.length===0){//nothing left
      return resolveValue.concat(results);
    }
    //recursively call process until done
    return process(batchSize)(limitFn)(resolveValue.concat(results))(left);
  }
);
const max5 = lib.throttle(5);//max 5 active connections to athena
distinctDomains()//you may want to limit the results to 50 for testing
//you may want to limit batch size to 10 for testing
.then(process(1000)(max5)([]))//we have 25000 domains here
.then(
results=>{//have 25000 results
const successes = results.filter(x=>!isFail(x));
//array of failed items, a failed item has a .reason property
// that is an array of 2 items: [the error, domain]
const failed = results.filter(isFail);
}
)
You should figure out what the redis client actually does; I tried to work it out from the documentation, but I might as well have asked my goldfish. Once you've reverse-engineered the client's behavior, it is best to try with a small batch size to see if there are any errors. You have to import lib to use it; you can find it here.
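Since the link to the lib did not come through above, here is a rough sketch of what a throttle(max) helper along those lines might look like; this is my own guess at its behavior based on how it is used, not the library's actual source:

// limit concurrency: throttle(5) returns limitFn, and limitFn(fn)(arg)
// resolves with fn(arg) while keeping at most `max` calls in flight at once
const throttle = max => {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= max || queue.length === 0) return;
    active += 1;
    const { fn, arg, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(() => fn(arg))
      .then(resolve, reject)
      .finally(() => { active -= 1; next(); });
  };

  return fn => arg =>
    new Promise((resolve, reject) => {
      queue.push({ fn, arg, resolve, reject });
      next();
    });
};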
I was able to take what Kevin B said and find a much quicker way to query the data. What I did was change the query so that I could get the trend for all domains from Athena in one go. I ordered it by domain_name and then consumed it as a Node stream, so that I could separate each domain name into its own JSON object as the data came in.
Anyway, this is what I ended up with.
exports.storeDomainTrends = () => {
  return new Promise((resolve, reject) => {
    var streamObj = athenaClient.execute(`
      SELECT field,
             field,
             field,
             SUM(field) AS field
      FROM "db"."table"
      WHERE field IN ('Month')
      GROUP BY field, field, field
      ORDER BY field desc`).toStream();

    var data = [];

    streamObj.on('data', (record) => {
      if (!data.length || record.field === data[0].field) {
        data.push(record);
      } else if (data[0].field !== record.field) {
        redisClient.set(('Key'), JSON.stringify(data));
        data = [record];
      }
    });

    streamObj.on('end', () => {
      // flush the last accumulated domain before resolving
      if (data.length) {
        redisClient.set(('Key'), JSON.stringify(data));
      }
      resolve();
    });

    streamObj.on('error', reject);
  });
};