Using async/await within a read stream to batch upload to DynamoDB - javascript

A bit of a Node novice here...
I'm trying to write a function that pulls a CSV down from S3 and batch-writes the items to DynamoDB. DynamoDB has a limit of 25 items per batch, so I need to write the entries as I go. The problem I'm running into is that my awaited function that executes the DB write only completes at the .end(), rather than each time my batch-size check passes.
I understand that I can't execute things like this, but I'm not sure how to fix it? I'm using Node12.
Thanks.
async function populateTable(
dataFile: bucketKey,
tableName: string
): Promise<void> {
const s3 = getS3Client();
const stream = s3.getObject(dataFile).createReadStream();
const BATCH_COUNT = 25; // Max size to write to DynamoDB
let counter = 0;
let datarows: any = [];
let datarow = {};
stream
.pipe(parse(DATA_HEADERS))
.on("data", async function(data: DataRow) {
counter++;
datarow = {
PutRequest: {
Item: data
}
};
datarows.push(datarow);
if (counter % BATCH_COUNT === 0) {
console.log("before batch write " + counter); // This fires!
await batchWriteToDynamo(datarows, tableName); // I want this function to fully execute before moving on
console.log("after batch write " + counter); // This does not
datarows = [];
}
})
.on("end", async function() {
await batchWriteToDynamo(datarows, tableName); // This fires!
});
}

Stream event emitters don't wait for an async listener: the promise it returns is ignored, so "data" events keep firing while your write is still pending. You might have to resort to creating your own promise chain. You could potentially do that in the following manner:
let datarow = {};
let pr = Promise.resolve();
// ...
if (counter % BATCH_COUNT === 0) {
let scopedRows = datarows.slice(); // scoped shallow copy
pr = pr.then(()=> batchWriteToDynamo(scopedRows, tableName)); // same casing as the declaration above
// ...
.on("end", async function() {
pr = pr.then(()=> batchWriteToDynamo(datarows, tableName));
This should make sure your batch writes happen one at a time and in the correct order. Note also the shallow copy of datarows during the data event. Pretty sure this is necessary since events and promises will be happening in an unpredictable order.
But in the end event it shouldn't be necessary since datarows shouldn't be changing any more at that point, I would guess.
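As an alternative to the promise chain, readable streams in Node 12 can be consumed with `for await`, which pauses reading while a write is pending. The following is only a minimal sketch under assumptions: `batchWriteToDynamo` here is a hypothetical stand-in that simulates a write, and an in-memory stream built with `Readable.from` stands in for `stream.pipe(parse(DATA_HEADERS))`.

```javascript
const { Readable } = require("stream");

const BATCH_COUNT = 25; // Max size to write to DynamoDB in one batch

// Hypothetical stand-in for the question's batchWriteToDynamo().
const writtenBatches = [];
async function batchWriteToDynamo(rows, tableName) {
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulate I/O
  writtenBatches.push(rows.length);
}

// Consume any async-iterable row source, flushing a batch every 25 rows.
// The loop body is awaited, so no new row is read while a write is in flight.
async function populateTable(rowStream, tableName) {
  let batch = [];
  for await (const row of rowStream) {
    batch.push({ PutRequest: { Item: row } });
    if (batch.length === BATCH_COUNT) {
      await batchWriteToDynamo(batch, tableName);
      batch = [];
    }
  }
  if (batch.length > 0) {
    await batchWriteToDynamo(batch, tableName); // final partial batch
  }
}

// Demo: 60 in-memory rows stand in for the parsed CSV stream.
async function demo() {
  const rows = Array.from({ length: 60 }, (_, i) => ({ id: i }));
  await populateTable(Readable.from(rows), "demo-table");
  return writtenBatches;
}
```

Because `populateTable` only resolves after the last batch is written, the caller can await it, which the event-handler version in the question does not allow.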

Related

How to correctly use 'async, await and promises' in nodejs, while allocating values to a variable returned from a time-consuming function?

Problem Statement:
Our aim is to allocate values in the array ytQueryAppJs, which are returned from a time consuming function httpsYtGetFunc().
The values in ytQueryAppJs needs to be used many times in further part of the code, hence it needs to be done 'filled', before the code proceeds further.
There are many other arrays like ytQueryAppJs, namely one of them is ytCoverAppJs, that needs to be allocated the value, the same way as ytQueryAppJs.
The values in ytCoverAppJs further require the use of values from ytQueryAppJs. So a solution with clean code would be highly appreciated.
(I am an absolute beginner. I have never used async, await or promises and I'm unaware of the correct way to use it. Please guide.)
Flow (to focus on):
The user submits a queryValue in index.html.
An array ytQueryAppJs is logged in console, based on the query.
Expected Log in Console (similar to):
Current Log in Console:
Flow (originally required by the project):
User submits query in index.html.
The values of arrays, ytQueryAppJs, ytCoverAppJs, ytCoverUniqueAppJs, ytLiveAppJs, ytLiveUniqueAppJs gets logged in the console, based on the query.
Code to focus on, from 'app.js':
// https://stackoverflow.com/a/14930567/14597561
function compareAndRemove(removeFromThis, compareToThis) {
return (removeFromThis = removeFromThis.filter(val => !compareToThis.includes(val)));
}
// Declaring variables for the function 'httpsYtGetFunc'
let apiKey = "";
let urlOfYtGetFunc = "";
let resultOfYtGetFunc = "";
let extractedResultOfYtGetFunc = [];
// This function GETs data, parses it, pushes required values in an array.
async function httpsYtGetFunc(queryOfYtGetFunc) {
apiKey = "AI...MI"
urlOfYtGetFunc = "https://www.googleapis.com/youtube/v3/search?key=" + apiKey + "&part=snippet&q=" + queryOfYtGetFunc + "&maxResults=4&order=relevance&type=video";
let promise = new Promise((resolve, reject) => {
// GETting data and storing it in chunks.
https.get(urlOfYtGetFunc, (response) => {
const chunks = []
response.on('data', (d) => {
chunks.push(d)
})
// Parsing the chunks
response.on('end', () => {
resultOfYtGetFunc = JSON.parse((Buffer.concat(chunks).toString()))
// console.log(resultOfYtGetFunc)
// Extracting useful data, and allocating it.
for (i = 0; i < (resultOfYtGetFunc.items).length; i++) {
extractedResultOfYtGetFunc[i] = resultOfYtGetFunc.items[i].id.videoId;
// console.log(extractedResultOfYtGetFunc);
}
resolve(extractedResultOfYtGetFunc);
})
})
})
let result = await promise;
return result;
}
app.post("/", function(req, res) {
// Accessing the queryValue, user submitted in index.html. We're using body-parser package here.
query = req.body.queryValue;
// Fetching top results related to user's query and putting them in the array.
ytQueryAppJs = httpsYtGetFunc(query);
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
});
Complete app.post method from app.js:
(For better understanding of the problem.)
app.post("/", function(req, res) {
// Accessing the queryValue user submitted in index.html.
query = req.body.queryValue;
// Fetcing top results related to user's query and putting them in the array.
ytQueryAppJs = httpsYtGetFunc(query);
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
// Fetching 'cover' songs related to user's query and putting them in the array.
if (query.includes("cover") == true) {
ytCoverAppJs = httpsYtGetFunc(query);
console.log("ytCoverAppJs:");
console.log(ytCoverAppJs);
// Removing redundant values.
ytCoverUniqueAppJs = compareAndRemove(ytCoverAppJs, ytQueryAppJs);
console.log("ytCoverUniqueAppJs:");
console.log(ytCoverUniqueAppJs);
} else {
ytCoverAppJs = httpsYtGetFunc(query + " cover");
console.log("ytCoverAppJs:");
console.log(ytCoverAppJs);
// Removing redundant values.
ytCoverUniqueAppJs = compareAndRemove(ytCoverAppJs, ytQueryAppJs);
console.log("ytCoverUniqueAppJs:");
console.log(ytCoverUniqueAppJs);
}
// Fetching 'live performances' related to user's query and putting them in the array.
if (query.includes("live") == true) {
ytLiveAppJs = httpsYtGetFunc(query);
console.log("ytLiveAppJs:");
console.log(ytLiveAppJs);
// Removing redundant values.
ytLiveUniqueAppJs = compareAndRemove(ytLiveAppJs, ytQueryAppJs.concat(ytCoverUniqueAppJs));
console.log("ytLiveUniqueAppJs:");
console.log(ytLiveUniqueAppJs);
} else {
ytLiveAppJs = httpsYtGetFunc(query + " live");
console.log("ytLiveAppJs:");
console.log(ytLiveAppJs);
// Removing redundant values.
ytLiveUniqueAppJs = compareAndRemove(ytLiveAppJs, ytQueryAppJs.concat(ytCoverUniqueAppJs));
console.log("ytLiveUniqueAppJs:");
console.log(ytLiveUniqueAppJs);
}
// Emptying all the arrays.
ytQueryAppJs.length = 0;
ytCoverAppJs.length = 0;
ytCoverUniqueAppJs.length = 0;
ytLiveAppJs.length = 0;
ytLiveUniqueAppJs.length = 0;
});
Unfortunately you can't use async/await directly with the http module when making requests, because its API is callback-based. You can install and use the axios module instead. In your case it would be something like this:
const axios = require('axios');
// Declaring variables for the function 'httpsYtGetFunc'
let apiKey = "";
let urlOfYtGetFunc = "";
let resultOfYtGetFunc = "";
let extractedResultOfYtGetFunc = [];
// This function GETs data, parses it, pushes required values in an array.
async function httpsYtGetFunc(queryOfYtGetFunc) {
apiKey = "AI...MI"
urlOfYtGetFunc = "https://www.googleapis.com/youtube/v3/search?key=" + apiKey + "&part=snippet&q=" + queryOfYtGetFunc + "&maxResults=4&order=relevance&type=video";
const promise = axios.get(urlOfYtGetFunc).then(response => {
// response.data holds the parsed body; do your data manipulations here
})
.catch(err => {
//decide what happens on error
})
Or async await
const response = await axios.get(urlOfYtGetFunc);
// response.data will be what the API has returned
If you still want to catch errors on async await you can use try catch
try{
const data = await axios.get(urlOfYtGetFunc);
}catch(err){
//In case of error do something
}
Having looked at the code, I think the issue is how you are handling the async code in the request handler. You are not awaiting the result of the call to httpsYtGetFunc in the handler body, so the handler carries on before the promise has settled, which is why you get Promise { <pending> }.
Another issue is that the array extractedResultOfYtGetFunc is not re-initialised on each call, so it can still hold values from an earlier request. Also, the method to add an item to an array is push.
To fix this you need to restructure your code slightly. A possible solution is something like this:
// Declaring variables for the function 'httpsYtGetFunc'
let apiKey = "";
let urlOfYtGetFunc = "";
let resultOfYtGetFunc = "";
let extractedResultOfYtGetFunc = [];
// This function GETs data, parses it, pushes required values in an array.
function httpsYtGetFunc(queryOfYtGetFunc) {
apiKey = "AI...MI";
urlOfYtGetFunc =
"https://www.googleapis.com/youtube/v3/search?key=" +
apiKey +
"&part=snippet&q=" +
queryOfYtGetFunc +
"&maxResults=4&order=relevance&type=video";
return new Promise((resolve, reject) => {
// GETting data and storing it in chunks.
https.get(urlOfYtGetFunc, (response) => {
const chunks = [];
response.on("data", (d) => {
chunks.push(d);
});
// Parsing the chunks
response.on("end", () => {
// Initialising the array
extractedResultOfYtGetFunc = []
resultOfYtGetFunc = JSON.parse(Buffer.concat(chunks).toString());
// console.log(resultOfYtGetFunc)
// Extracting useful data, and allocating it.
for (let i = 0; i < resultOfYtGetFunc.items.length; i++) { // declare i so it is not an implicit global
// Adding the element to the array
extractedResultOfYtGetFunc.push(resultOfYtGetFunc.items[i].id.videoId);
// console.log(extractedResultOfYtGetFunc);
}
resolve(extractedResultOfYtGetFunc);
});
}).on("error", reject); // reject the promise on network errors
});
}
app.post("/", async function (req, res) {
query = req.body.queryValue;
// Fetching top results related to user's query and putting them in the array.
ytQueryAppJs = await httpsYtGetFunc(query);
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
});
Another option would be to use axios,
The code for this would just be,
app.post("/", async function (req, res) {
query = req.body.queryValue;
// Fetching top results related to user's query and putting them in the array.
try{
const response = await axios.get(url); // replace with your URL
ytQueryAppJs = response.data; // the parsed response body
console.log("ytQueryAppJs:");
console.log(ytQueryAppJs);
} catch(e) {
console.log(e);
}
});
Using Axios would be a quicker way as you don't need to write promise wrappers around everything, which is required as the node HTTP(S) libraries don't support promises out of the box.
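For the full flow in the question, where several arrays are fetched and later ones are de-duplicated against earlier ones, the awaits could be arranged roughly as follows. This is a sketch under assumptions: `fakeYtGet` and its catalogue are hypothetical stand-ins for `httpsYtGetFunc` so the example runs without an API key, and `collectResults` is an illustrative name, not part of the original code.

```javascript
// Hypothetical stand-in for httpsYtGetFunc(): resolves with video IDs after a delay.
function fakeYtGet(query) {
  const catalogue = {
    "song": ["a", "b", "c", "d"],
    "song cover": ["b", "e", "f", "a"],
    "song live": ["e", "g", "a", "h"],
  };
  return new Promise((resolve) =>
    setTimeout(() => resolve(catalogue[query] || []), 10)
  );
}

// Same helper as in the question, returning a filtered copy.
function compareAndRemove(removeFromThis, compareToThis) {
  return removeFromThis.filter((val) => !compareToThis.includes(val));
}

async function collectResults(query) {
  const coverQuery = query.includes("cover") ? query : query + " cover";
  const liveQuery = query.includes("live") ? query : query + " live";

  // The three requests do not depend on each other, so start them together.
  const [ytQuery, ytCover, ytLive] = await Promise.all([
    fakeYtGet(query),
    fakeYtGet(coverQuery),
    fakeYtGet(liveQuery),
  ]);

  // De-duplication depends on the earlier results, so it runs afterwards, in order.
  const ytCoverUnique = compareAndRemove(ytCover, ytQuery);
  const ytLiveUnique = compareAndRemove(ytLive, ytQuery.concat(ytCoverUnique));
  return { ytQuery, ytCoverUnique, ytLiveUnique };
}
```

Inside the Express handler this would be called as `const results = await collectResults(query);` from an `async` callback. Keeping the arrays as local constants also removes the need to empty shared arrays at the end of each request.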

async functions not executing in the correct order inside a map function

I have created an async function that extracts data from its argument, runs a Postgres query based on that data, then does some processing on the query result. Yet when I call this function inside a map function, it seems to loop through all the elements and extract the data from each argument first, before it proceeds to the second and third parts, which leads to wrong computation for the second element onwards (the first element is always correct). I am new to async functions; can someone please take a look at the code below? Thanks!
async function testWeightedScore(test, examData) {
var grade = [];
const testID = examData[test.name];
console.log(testID);
var res = await DefaultPostgresPool().query(
//postgres query based on the score constant
);
var result = res.rows;
for (var i = 0; i < result.length; i++) {
const score = result[i].score;
var weightScore = score * 20;
//more computation
const mid = { "testID": testID, "score": weightScore, more values...};
grade.push(mid);
}
return grade;
}
(async () => {
const examSession = [{"name": "Sally"},{"name": "Bob"},{"name": "Steph"}]
const examData = {
"Sally": 384258,
"Bob": 718239,
"Steph": 349285,
};
var test = [];
examSession.map(async sesion => {
var result = await testWeightedScore(sesion,examData);
let counts = result.reduce((prev, curr) => {
let count = prev.get(curr.testID) || 0;
prev.set(curr.testID, curr.score + count);
return prev;
}, new Map());
let reducedObjArr = [...counts].map(([testID, score]) => {
return {testID, score}
})
console.info(reducedObjArr);
}
);
})();
// The console log printed out all the tokenID first(loop through all the element in examSession ), before it printed out reducedObjArr for each element
The async/await behaviour is that the code pauses at await and does something else (asynchronously) until the awaited result is available.
So your code will launch testWeightedScore for the first element, suspend at the PostgreSQL query (the await inside the function), and in the meantime move on to the other entries in your map, log their IDs, then suspend again at the query.
I didn't read your function in detail however so I am unsure if your function is properly isolated or the order and completion of each call is important.
If you want each test to be fully done one after the other and not in 'parallel', you should do a for loop instead of a map.
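A small sketch of the difference, with `scoreFor` as a hypothetical stand-in for `testWeightedScore` (a timer simulates the query). The names `runSequentially` and `runConcurrently` are illustrative only.

```javascript
const log = [];

// Hypothetical stand-in for testWeightedScore(): logs, awaits, logs again.
async function scoreFor(name) {
  log.push(`start ${name}`);
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated query
  log.push(`done ${name}`);
  return name.length * 20;
}

// One at a time: each call finishes before the next one starts.
async function runSequentially(names) {
  const results = [];
  for (const name of names) {
    results.push(await scoreFor(name));
  }
  return results;
}

// All at once: the calls interleave, but Promise.all keeps results in input order.
async function runConcurrently(names) {
  return Promise.all(names.map((name) => scoreFor(name)));
}
```

If the calls are independent, the concurrent form is usually fine and faster. If they share mutable state, as the wrong results in the question suggest, the sequential loop is the safer choice until that state is made local to each call.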

Javascript object retaining "old" properties, can't override?

I have the following code:
const readDataFromSql = () => {
// going to have to iterate through all known activities + load them here
let sql = "[...]"
return new Promise((resolve, reject) => {
executeSqlQuery(sql).then((dict) => {
let loadedData = [];
for (let key in dict) {
let newItemVal = new ItemVal("reading hw", 7121, progress.DONE);
loadedData.push(newItemVal);
}
resolve(loadedData);
});
});
}
ItemVal implementation:
class ItemVal {
constructor(name, time, type) {
this.name = name
this.time = time
this.type = type
}
}
Let's assume that newItemVal = "reading hwj", 5081, progress.PAUSED when readDataFromSql() first runs.
readDataFromSql() is then again called after some state changes -- where it repulls some information from a database and generates new values. What is perplexing, however, is that when it is called the second time, newItemVal still retains its old properties (attaching screenshot below).
Am I misusing the new keyword?
From what I can see in your example code, you are not mutating existing properties but creating new objects with the ItemVal constructor and adding them to an array, which you then return through a resolved promise. Are you sure the examples you give are a correct representation of what you are actually doing?
Given that, I'm not sure what could be causing the issue you are having, but I would at least recommend a different structure for your code, using a simpler function for the itemVal.
Perhaps with this setup, you might get an error returned that might help you debug your issue.
const itemVal = (name, time, type) => ({ name, time, type })
const readDataFromSql = async () => {
try {
const sql = "[...]"
const dict = await executeSqlQuery(sql)
const loadedData = Object.keys(dict).map((key) =>
itemVal("reading hw", 7121, progress.DONE) // call the factory function defined above
)
return loadedData
} catch (error) {
return error
}
};
If the issue is not in the function, then I would assume that the way you handle the data, returned from the readDataFromSql function, is where the issue lies. You need to then share more details about your implementation.
const readDataFromSql = async () => {
let sql = "[...]"
------> await executeSqlQuery(sql).then((dict) => {
Use the await keyword instead of creating a new promise.
I did some modification and found that below code is working correctly, and updating the new values on each call.
const readDataFromSql = () => {
return new Promise((resolve, reject) => {
let loadedData = [];
let randomVal = Math.random();
let newItemVal = new ItemVal(randomVal*10, randomVal*100, randomVal*1000);
loadedData.push(newItemVal);
resolve(loadedData);
});
}
Could you recheck whether you are using the line below in your code? It will instantiate an object with the same properties again and again.
let newItemVal = new ItemVal("reading hw", 7121, progress.DONE);
You can modify your code as below to simplify the problem.
const readDataFromSql = async () => {
// going to have to iterate through all known activities + load them here
let sql = "[...]" // define sql properly
let result = await executeSqlQuery(sql);
let loadedData = [];
for (const row of Object.values(result)) { // for...in would yield keys, not the rows themselves
let newItemVal = new ItemVal(row.name, row.time, row.type);
loadedData.push(newItemVal);
}
return loadedData;
}
class ItemVal {
constructor(name, time, type) {
this.name = name
this.time = time
this.type = type
}
}
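One detail worth checking in loops like the ones above: `for...in` iterates over keys, while `for...of` iterates over values. A minimal illustration, assuming the query result is an array of row objects (the sample rows are made up):

```javascript
// Made-up sample rows standing in for a query result.
const rows = [
  { name: "reading hw", time: 7121 },
  { name: "essay", time: 300 },
];

const viaForIn = [];
for (const key in rows) {
  viaForIn.push(key); // keys are the index strings "0" and "1"
}

const viaForOf = [];
for (const row of rows) {
  viaForOf.push(row.name); // the actual row objects
}
```

With `for...in`, reading `key.name` gives `undefined`, because `key` is a string such as `"0"`.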
What you are talking about is an issue related to Object mutation in Redux, however, you didn't add any redux code. Anyway, you might be making some mistake while recreating(not mutating) the array.
The general solution is to use the spread operator to build a new array:
loadedData = [...loadedData, ...newloadedData]
In Dropdown.js line 188 instead of console.log-ing your variable write debugger;
This will function as a breakpoint. It will halt your code and you can inspect the value by hovering your mouse over the code BEFORE the newItemVal is changed again.
I can see in your screenshot that the newItemVal is modified again after you log it.

asynchronous loop for in Javascript

I'm trying to iterate over and print out, in order, an array in JavaScript that contains the titles of 2 events that I obtained by scraping a website, but it prints out of order. I know JavaScript is asynchronous, but I'm new to this world of asynchronism. How can I implement a for loop to print the array in order and give customized info?
agent.add('...') is like console.log('...'). I'm doing a chatbot with DialogFlow and NodeJs 8 but that's not important at this moment. I used console.log() in the return just for debug.
I tried the next:
async function printEvent(event){
agent.add(event)
}
async function runLoop(eventsTitles){
for (let i = 0; i<eventsTitles.length; i++){
aux = await printEvent(eventsTitles[i])
}
}
But I got this ESLint error: Unexpected await inside a loop (no-await-in-loop)
async function showEvents(agent) {
const cheerio = require('cheerio');
const rp = require('request-promise');
const options = {
uri: 'https://www.utb.edu.co/eventos',
transform: function (body) {
return cheerio.load(body);
}
}
return rp(options)
.then($ => {
//** HERE START THE PROBLEM**
var eventsTitles = [] // array of event's titles
agent.add(`This mont we have these events available: \n`)
$('.product-title').each(function (i, elem) {
var event = $(this).text()
eventsTitles.push(event)
})
agent.add(`${eventsTitles}`) // The array prints out in order but if i iterate it, it prints out in disorder.
// *** IMPLEMENT LOOP FOR ***
agent.add(`To obtain more info click on this link https://www.utb.edu.co/eventos`)
return console.log(`Show available events`);
}).catch(err => {
agent.add(`${err}`)
return console.log(err)
})
}
I would like to always print out Event's title #1 and after Event's title #2. Something like this:
eventsTitles.forEach((event, index) => {
agent.add(`${index}. ${event}`) // remember this is like console.log(`${index}. ${event}`)
})
Thanks for any help and explanation!
There is nothing asynchronous in this case, but if you still face difficulty, then use this loop:
for (let index = 0; index < eventsTitles.length; index++) {
const element = eventsTitles[index];
agent.add(`${index}. ${element}`) // template literals need backticks
}
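The same thing can be written with `forEach`, keeping in mind that its callback receives the element first and the index second. A small sketch, where the `agent` object is a hypothetical stand-in that just collects the messages and the titles are made up:

```javascript
// Hypothetical stand-in for the Dialogflow agent: collects messages in call order.
const messages = [];
const agent = { add: (text) => messages.push(text) };

const eventsTitles = ["Event A", "Event B"];

// forEach passes (element, index), in that order, and runs synchronously.
eventsTitles.forEach((event, index) => {
  agent.add(`${index + 1}. ${event}`);
});
```

Since nothing here is awaited, the messages are added in array order.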

Looping through Protractor's code in `it` statement

Relatively new to writing end to end tests with Protractor. Also relatively inexperienced at working with promises.
I am writing a test where in some cases I need to loop through my code b/c the record that I select does not meet certain criteria. In those cases I would like to proceed back to a previous step and try another record (and continue doing so until I find a suitable record). I am not able to get my test to enter into my loop though.
I can write regular e2e tests with Protractor, but solving this looping issue is proving difficult. I know it must be because I'm dealing with Promises, and am not handling them correctly. Although I've seen examples of looping through protractor code, they often involve a single method that needs to be done to every item in a list. Here I have multiple steps that need to be done in order to arrive at the point where I can find and set my value to break out of the loop.
Here are some of the threads I've looked at trying to resolve this:
protractor and for loops
https://www.angularjsrecipes.com/recipes/27910331/using-protractor-with-loops
Using protractor with loops
Looping through fields in an Angular form and testing input validations using Protractor?
Protractors, promises, parameters, and closures
Asynchronously working of for loop in protractor
My code as it currently stands:
it('should select a customer who has a valid serial number', () => {
const products = new HomePage();
let serialIsValid: boolean = false;
let selectedElement, serialNumber, product, recordCount, recordList;
recordList = element.all(by.css(`mat-list.desco-list`));
recordList.then((records) => {
recordCount = records.length;
console.log('records', records.length, 'recordCount', recordCount);
}
);
for (let i = 0; i < recordCount; i++) {
if (serialIsValid === false) {
const j = i + 1;
products.btnFormsSelector.click();
products.formSelectorRepossession.click();
browser.wait(EC.visibilityOf(products.itemSearch));
products.itemSearch.element(by.tagName('input')).sendKeys(browser.params.search_string);
products.itemSearch.element(by.id('btnSearch')).click();
browser.wait(EC.visibilityOf(products.itemSearch.element(by.id('list-container'))));
selectedElement = element(by.tagName(`#itemSearch mat-list:nth-child(${{j}})`));
selectedElement.click();
browser.wait(EC.visibilityOf(products.doStuffForm));
browser.sleep(1000);
element(by.css('#successful mat-radio-button:nth-child(1) label')).click();
browser.sleep(1000);
expect(element(by.css('.itemDetailsContainer'))).toBeTruthy();
product = products.productIDNumber.getText();
product.then((item) => {
serialNumber = item;
if (item !== 'Unknown') {
expect(serialNumber).not.toContain('Unknown');
serialIsValid = true;
} else {
i++
}
})
} else {
console.log('serial is valid: ' + serialIsValid);
expect(serialNumber).not.toContain('Unknown');
break;
}
}
console.log('serial number validity: ', serialIsValid);
})
I have rewritten and reorganized my code several times, including trying to break it out into functions that group related steps together (as recommended in one of the threads above), and then trying to chain them together, like this:
findValidCustomer() {
const gotoProductSearch = (function () {...})
const searchForRecord = (function () {...})
const populateForm = (function (j) {...})
for (let i = 0; i < recordCount; i++) {
const j = i + 1;
if (serialIsValid === false) {
gotoProductSearch
.then(searchForRecord)
.then(populateForm(j))
.then(findValidSerial(i))
} else {
console.log('serial number validity' + serialIsValid);
expect(serialIsValid).not.toContain('Unknown');
break;
}
}
console.log('serial number validity' + serialIsValid);
}
When I've tried to chain them like that, I received this error
- TS2345: Argument of type 'number | undefined' is not assignable to parameter of type 'number'
Have edited my code from my actual test and apologies if I've made mistakes in doing so. Would greatly appreciate comments or explanation on how to do this in general though, b/c I know I'm not doing it correctly. Thanks in advance.
I would suggest looking into async / await and migrating this test. Why migrate? Protractor 6 and moving forward will require async / await. In order to do that, you will need to have SELENIUM_PROMISE_MANAGER: false in your config and await your promises. In my answer below, I'll use async / await.
Below is my attempt to rewrite this as async / await. Also try to define your ElementFinders, numbers, and other stuff when you need them so you can define them as consts.
it('should select a customer who has a valid serial number', async () => {
const products = new HomePage();
let serialIsValid = false; // Setting the value to false is enough
// and :boolean is not needed
const recordList = element.all(by.css(`mat-list.desco-list`));
const recordCount = await recordList.count();
console.log(`recordCount ${recordCount}`);
// This could be rewritten with .each
// See https://github.com/angular/protractor/blob/master/lib/element.ts#L575
// await recordList.each(async (el: WebElement, index: number) => {
for (let i = 0; i < recordCount; i++) {
if (serialIsValid === false) {
const j = i + 1; // nth-child is 1-based, the loop index is 0-based
await products.btnFormsSelector.click();
await products.formSelectorRepossession.click();
await browser.wait(EC.visibilityOf(products.itemSearch));
await products.itemSearch.element(by.tagName('input'))
.sendKeys(browser.params.search_string);
await products.itemSearch.element(by.id('btnSearch')).click();
await browser.wait(
EC.visibilityOf(await products.itemSearch.element(
by.id('list-container')))); // Maybe use a boolean check?
const selectedElement = element(by.css(
`#itemSearch mat-list:nth-child(${j})`)); // a CSS selector, so by.css; ${j} rather than ${{j}}
await selectedElement.click();
// not sure what doStuffForm is but I'm guessing it returns a promise.
await browser.wait(EC.visibilityOf(await products.doStuffForm));
await browser.sleep(1000); // I would avoid sleeps since this might
// cause errors (if ran on a slower machine)
// or just cause your test to run slow
await element(by.css(
'#successful mat-radio-button:nth-child(1) label')).click();
await browser.sleep(1000);
expect(await element(by.css('.itemDetailsContainer'))).toBeTruthy();
const serialNumber = await products.productIDNumber.getText();
if (serialNumber !== 'Unknown') {
expect(serialNumber).not.toContain('Unknown');
serialIsValid = true;
}
// The else statement if you were using i in a for loop, it is not
// a good idea to increment it twice.
} else {
// So according to this, if the last item is invalid, you will not break
// and not log this. This will not fail the test. It might be a good idea
// to not have this in an else statement.
console.log(`serial is valid: ${serialIsValid}`);
expect(serialIsValid).toBe(true); // serialNumber is scoped to the if-branch above
break;
}
}
console.log('serial number validity: ', serialIsValid);
});
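Stripped of the Protractor calls, the "keep trying records until one qualifies" loop has roughly the following shape. This is a sketch only: `findFirstValidSerial` and `checkRecord` are hypothetical names, with `checkRecord` standing in for the sequence of UI steps that ends in reading the serial number.

```javascript
// Try candidates one at a time until checkRecord resolves to a usable serial.
// checkRecord is a hypothetical async function standing in for the UI steps.
async function findFirstValidSerial(recordCount, checkRecord) {
  for (let i = 0; i < recordCount; i++) {
    const serial = await checkRecord(i); // finishes before the next iteration starts
    if (serial !== "Unknown") {
      return { index: i, serial }; // stop as soon as a record qualifies
    }
  }
  return null; // no record qualified
}
```

Returning from the loop avoids the separate flag and `break`, and a `null` result lets the test fail explicitly when no record has a valid serial number.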
Can you check the count again after updating your code with the following snippet?
element.all(by.css(`mat-list.desco-list`)).then((records) => {
recordCount = records.length;
console.log(recordCount);
});
OR
There is a count() function in the ElementArrayFinder class which returns a promise that resolves to the number of elements matching the locator:
element.all(by.css(`mat-list.desco-list`)).count().then((number) => {
console.log(number);
});
