Insert multiple entries into SQL Server with Node.js

I am rewriting an old API in which I need to insert multiple values at once into an MS SQL Server (2008) database, using the node module mssql. I can get this done somehow, but I want to do it following best practices. I've done my research and tried a lot of things to reach my goal; however, I was not able to find a single solution that works just right.
Before
You may wonder:
Well, you are rewriting this API, so surely this has been done before in some way that was working?
Sure, you're right, it was working before, but... not in a way I'd feel comfortable using in the rewrite. Let me show you how it was done before (with a little abstraction added, of course):
const request = new sql.Request(connection);
let query = "INSERT INTO tbl (col1, col2, col3, col4) VALUES ";
for (/* basic for loop w/ counter variable i */) {
    query += "(1, @col2" + i + ", @col3" + i + ", (SELECT x FROM y WHERE z = @someParam" + i + "))";
    // a check whether to add a comma or not
    request.input("col2" + i, sql.Int(), values[i]);
    // ...
}
request.query(query, function(err, recordset) {
    // ...
});
While this works, again, I don't think it could be called anything like 'best practice'. It also shows the biggest problem: a subselect is used to produce one of the inserted values.
What I tried so far
The easy way
At first I tried the probably easiest thing:
// simplified
const sQuery = "INSERT INTO tbl (col1, col2, col3, col4) VALUES (1, #col2, #col3, (SELECT x FROM y WHERE z = #col4));";
oPool.request().then(oRequest => {
return oRequest
.input("col2", sql.Int(), aValues.map(oValue => oValue.col2))
.input("col3", sql.Int(), aValues.map(oValue => oValue.col3))
.input("col4", sql.Int(), aValues.map(oValue => oValue.col4))
.query(sQuery);
});
I'd say this was a pretty good guess, and it actually worked reasonably well.
Except for the part where it ignores every item after the first one... which makes it pretty useless. So, I tried...
Request.multiple = true
...and I thought it would do the job. But, surprise, it doesn't: still only the first item is inserted.
Using '?' for parameters
At this point I really started searching for a solution, as the previous attempt came from only a quick look into the module's documentation.
I stumbled upon this answer and tried it immediately.
It didn't take long for my terminal to spit out a
RequestError: Incorrect syntax near '?'.
So much for that.
Bulk inserting
Some further research led to bulk inserting.
Pretty interesting, a cool feature, and an excellent update of the question with the solution by the OP!
I had some struggle getting started here, but eventually it looked really good: Multiple records were inserted and the values seemed okay.
Until I added the subquery. Using it as the value for a column didn't cause any error, but when checking the table afterwards, it simply showed a 0 as the value for that column. Not a big surprise, really, but everybody can dream, right?
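For reference, the bulk path in mssql looks roughly like this (a sketch, not my exact code; it assumes oPool is an mssql connection pool and that the target table already exists):
const table = new sql.Table("tbl");
table.create = false; // the table already exists
table.columns.add("col1", sql.Int, { nullable: false });
table.columns.add("col2", sql.Int, { nullable: true });
table.columns.add("col3", sql.Int, { nullable: true });
table.columns.add("col4", sql.Int, { nullable: true });
aValues.forEach(oValue => {
    // col4 has to be a plain value here; a subquery cannot run during a bulk load,
    // which is exactly the limitation described above
    table.rows.add(1, oValue.col2, oValue.col3, oValue.col4);
});

const request = new sql.Request(oPool);
request.bulk(table, (err, result) => {
    // ...
});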
The lazy way
I don't really know what to think about this:
// simplified
Promise.all(aValues.map(oValue => {
    return oPool.request().then(oRequest =>
        oRequest
            .input("col2", sql.Int, oValue.col2)
            .input("col3", sql.Int, oValue.col3)
            .input("col4", sql.Int, oValue.col4)
            .query(sQuery)
    );
}));
It does the job, but if any of the requests fails for whatever reason, the other, non-failing inserts will still be executed, even though they shouldn't be.
Lazy + Transaction
As continuing even when some inserts fail was the major problem with the last method, I tried building a transaction around it. All queries successful? Good, commit. Any query has an error? Well, just roll back then. So I built a transaction, moved my Promise.all construct into it and tried again.
Aaand the next error pops up in my terminal:
TransactionError: Can't acquire connection for the request. There is another request in progress.
If you came this far, I don't need to tell you what the problem is.
Summary
What I haven't tried yet (and I don't think I will) is using the transaction approach and calling the statements sequentially. I do not believe that this is the way to go.
And I also don't think the lazy way is the one to use, as it issues a separate request for every record to insert when this could somehow be done with a single request; that 'somehow' is just not in my head right now. So please, if you have anything that could help me, tell me.
Also, if you see anything else that's wrong with my code, feel free to point it out. I don't consider myself a beginner, but I also don't think that learning ever ends. :)
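For completeness, one approach I haven't tried: a table-valued parameter (supported since SQL Server 2008) would keep it to a single parameterized request, subselect included. A rough sketch, assuming a hypothetical stored procedure InsertTblRows (and a matching user-defined table type) exists on the server:
// all rows are sent as one table-valued parameter
const tvp = new sql.Table();
tvp.columns.add("col2", sql.Int);
tvp.columns.add("col3", sql.Int);
tvp.columns.add("someParam", sql.Int);
aValues.forEach(oValue => tvp.rows.add(oValue.col2, oValue.col3, oValue.col4));

const request = new sql.Request(oPool);
request.input("rows", tvp);
// InsertTblRows (hypothetical) would contain something like:
// INSERT INTO tbl (col1, col2, col3, col4)
// SELECT 1, r.col2, r.col3, (SELECT x FROM y WHERE z = r.someParam) FROM @rows r
return request.execute("InsertTblRows");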

The way I solved this was by using the PQueue library with concurrency 1. It's slow due to the concurrency of one, but it works with thousands of queries:
// assumed imports: mssql and p-queue (newer p-queue releases are ESM-only;
// adjust the import style to the version you use)
const sql = require("mssql");
const PQueue = require("p-queue");

const transaction = new sql.Transaction();
const request = new sql.Request(transaction);
const queue = new PQueue({ concurrency: 1 });
// begin transaction
await transaction.begin();
for (const query of queries) {
    queue.add(async () => {
        try {
            await request.query(query);
        } catch (err) {
            // stop pending queries
            queue.clear();
            await queue.onIdle();
            // roll back the transaction
            await transaction.rollback();
            // rethrow the error
            throw err;
        }
    });
}
// await the queue
await queue.onIdle();
// commit transaction
await transaction.commit();
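Since the concurrency is 1, the queue effectively runs the queries one after another anyway; the same behaviour can be sketched without PQueue by simply awaiting each query in order inside the transaction (a minimal sketch, assuming the same queries array and mssql's promise API):
const transaction = new sql.Transaction();
await transaction.begin();
try {
    const request = new sql.Request(transaction);
    for (const query of queries) {
        // one statement at a time on the transaction's connection,
        // so no "another request in progress" error
        await request.query(query);
    }
    await transaction.commit();
} catch (err) {
    await transaction.rollback();
    throw err;
}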

Related

How to find all or get all in DynamoDB in NodeJS

I want to get all the data from a table in DynamoDB in Node.js; this is my code:
const READ = async (payload) => {
    const params = {
        TableName: payload.TableName,
    };
    let scanResults = [];
    let items;
    do {
        items = await dbClient.scan(params).promise();
        items.Items.forEach((item) => scanResults.push(item));
        params.ExclusiveStartKey = items.LastEvaluatedKey;
    } while (typeof items.LastEvaluatedKey != "undefined");
    return scanResults;
};
I implemented this and it is working fine, but our code review tool is flagging it red, saying that it is not optimized or may cause a memory leak, and I just cannot figure out why. I have read elsewhere that DynamoDB's scan API is not the most efficient way to get all data in Node; is there something else I am missing to optimize this code?
DO THIS ONLY IF YOUR DATA SIZE IS VERY SMALL (fewer than 100 items, or a data size under 1 MB, is what I'd go by, and in that case you don't need the do-while loop).
Think about the following scenario: what happens when, in the future, more and more items are added to the DynamoDB table? The scan will return all of your data and put it into the scanResults variable, right? That will impact memory. Also, the DynamoDB scan operation is expensive, in terms of both memory and cost.
It's perfectly okay to use the SCAN operation if the data is very small. Otherwise, go with pagination (I always prefer this). If there are thousands of items, who is going to look at all of them in a single shot? So use pagination instead.
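A minimal sketch of page-by-page scanning, assuming the same dbClient (an AWS SDK v2 DocumentClient) as in the question; the caller passes LastEvaluatedKey back in to get the next page instead of holding everything in memory:
const readPage = async (tableName, startKey, pageSize = 100) => {
    const params = {
        TableName: tableName,
        Limit: pageSize, // cap the number of items per page
    };
    if (startKey) {
        params.ExclusiveStartKey = startKey; // undefined on the first call
    }
    const result = await dbClient.scan(params).promise();
    return {
        items: result.Items,
        nextKey: result.LastEvaluatedKey, // pass this back in for the next page
    };
};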
Let's take another scenario: if your requirement is to retrieve all the data for some analytics or aggregation, then it is better to store the aggregated data upfront (in the same or a different DynamoDB table) as an item, or to use a dedicated analytics database.
If your requirement is something else, elaborate it in the question.

API can't handle my request because of template literals to make the API dynamic

For a school project, I have to make a quiz app. It is possible to choose a difficulty, a category and a desired amount of questions. The API is a URL which can easily be modified by changing some values, for example: https://quizapi.io/api/v1/questions?apiKey=MYAPIKEY&limit=15&difficulty=hard&category=cms. If you just changed the category in the URL (e.g. from cms to code), you would get a maximum of 15 questions on hard difficulty about HTML and CSS. I think you see where this is going.
However, I have set up my code so that the difficulty, category and amount are stored in localStorage and fetched when the quiz is started. At the moment I get the amount of questions I desire, but I can't change my difficulty or category, probably because the template literals aren't working in the fetch call. Maybe someone can give me an idea, or maybe I'm making a mistake in my current code:
let storageDif = localStorage.getItem("mD");
console.log(storageDif.toString());
let storageCat = localStorage.getItem("mC");
console.log(storageCat);
let geslideVragen = localStorage.getItem("slider");
let MAX_VRAGEN = geslideVragen;
console.log(MAX_VRAGEN);
let vragen = [];

fetch(`https://quizapi.io/api/v1/questions?apiKey=kAFKilHLeEcfLkGE2H0Ia9uTIp1rYHDTIYIHs9qf&limit=15&difficulty=hard&category=${storageCat}`)
    .then((res) => {
        return res.json();
    })
    .then((loadedQuestions) => {
        for (let i = 0; i < MAX_VRAGEN; i++) {
            vragen = loadedQuestions;
            console.log(vragen[i].question);
        }
        startGame();
    })
    .catch((err) => {
        console.error(err);
    });
I'm sure you've found out by now that you're only interpolating the category. To get it to work correctly, you'd need to do this:
`https://quizapi.io/api/v1/questions?apiKey=kAFKilHLeEcfLkGE2H0Ia9uTIp1rYHDTIYIHs9qf&limit=${MAX_VRAGEN}&difficulty=${storageDif}&category=${storageCat}`
That being said, you should never expose your API keys this way; especially with cloud services, it can easily cost you a five-figure sum in a single day if someone decides to use them for their own ends. There are plenty of scrapers that scour GitHub for exposed API keys to use illegitimately.
Also, you should apply a check, e.g. with an if() statement, to make sure all values are present, so that it doesn't fetch anything if a value is undefined.
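A minimal sketch of such a guard, using the variable names from the question; the API_KEY constant is my own stand-in, not part of the original code:
const API_KEY = "..."; // better kept server-side, as noted above
if (storageDif && storageCat && MAX_VRAGEN) {
    fetch(`https://quizapi.io/api/v1/questions?apiKey=${API_KEY}&limit=${MAX_VRAGEN}&difficulty=${storageDif}&category=${storageCat}`)
        .then((res) => res.json())
        .then((loadedQuestions) => startGame())
        .catch((err) => console.error(err));
} else {
    console.error("Missing quiz settings in localStorage");
}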

Ag-grid: duplicate node id 107 detected from getRowNodeId callback, this could cause issues in your grid

I am doing live data streaming into an ag-grid data table, so I used deltaRowDataMode in gridOptions and added a getRowNodeId method, which returns the unique value 'id'.
With that, I get live update results on my grid within the period I set, but some rows are duplicated, so I can see the total count increase a bit each time the updated data loads. The question title is the warning message from the browser console; I get a bunch of these messages with different id numbers. According to the docs, this is not supposed to happen: deltaRowDataMode is supposed to detect duplicates and smartly add only the rows that don't exist yet. Of course there are several ways to refresh data live, but I chose this one since it is said to persist grid state such as selected rows, the current scroll position on the grid, etc. I am using vanilla JS, not any frameworks.
How do I make the data update periodically without changing any current grid state? There is no error in the code, so this is not about a bug. Maybe my current implementation is wrong; in any case, I want to know the idea or hear about any implementation experience with this.
let gridOptions = {
    // ....
    deltaRowDataMode: true,
    getRowNodeId: (data) => {
        return data.id; // return the property you want set as the id
    }
};
fetch(loadUrl).then((res) => {
    return res.json();
}).then((data) => {
    gridOptions.api.setRowData(data);
});
...
If you get a duplicated node warning, it means your getRowNodeId() is returning the same value for two different rows.
Here is the relevant part of the ag-grid source:
if (this.allNodesMap[node.id]) {
    console.warn("ag-grid: duplicate node id '" + node.id + "' detected from getRowNodeId callback, this could cause issues in your grid.");
}
So try to check your data again.
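For example, a quick way to check the fetched rows for duplicate ids before handing them to the grid (a sketch, assuming data is the array from the fetch above):
const seen = new Set();
const dupes = data.filter((row) => {
    if (seen.has(row.id)) return true; // id already encountered
    seen.add(row.id);
    return false;
});
if (dupes.length > 0) {
    console.warn("duplicate row ids:", dupes.map((row) => row.id));
}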
If you're 100% sure the error is not related to your data, cut out the private data and create a Plunker/StackBlitz example to reproduce your issue; then it will be simpler to check and help you.

Better performance when saving large JSON file to MySQL

I have an issue.
So, my story is:
I have a 30 GB (JSON) file of all reddit posts in a specific timeframe.
I will not insert all values of each post into the table.
I have followed this series, and he coded what I'm trying to do in Python.
I tried to follow along (in NodeJS), but when I test it, it's way too slow. It inserts one row every 5 seconds, and there are 500,000+ reddit posts; that would literally take years.
So here's an example of what I'm doing:
var readStream = fs.createReadStream(location);
oboe(readStream)
    .done(async function(post) {
        let { parent_id, body, created_utc, score, subreddit } = post;
        let comment_id = post.name;
        // Checks if there is a comment with the comment id of this post's parent id in the table
        getParent(parent_id, function(parent_data) {
            // Checks if there is a comment with the same parent id, and then checks which one has a higher score
            getExistingCommentScore(parent_id, function(existingScore) {
                // other code above but it isn't relevant for my question
                // this function adds the query I made to a list
                addToTransaction();
            });
        });
    });
Basically, what that does is start a read stream and then pass it on to a module called oboe.
I then get JSON in return.
Then it checks whether there is a parent already saved in the database, and then whether there is an existing comment with the same parent id.
I need to use both functions in order to get the data that I need (only keeping the "best" comment).
This is roughly what addToTransaction looks like:
function addToTransaction(query) {
    // adds the query to a list, then checks if the length of that list is 1000 or more
    transactions.push(query);
    if (transactions.length >= 1000) {
        connection.beginTransaction(function(err) {
            if (err) throw new Error(err);
            for (var n = 0; n < transactions.length; n++) {
                let thisQuery = transactions[n];
                connection.query(thisQuery, function(err) {
                    if (err) throw new Error(err);
                });
            }
            connection.commit();
        });
    }
}
What addToTransaction does is take the queries I made and push them into a list, check the length of that list, and then create a new transaction, execute all those queries in a for loop, and commit (to save).
Problem is, it's so slow that the callback function I made doesn't even get called.
My question (finally) is, is there any way I could improve the performance?
(If you're wondering why I am doing this, it is because I'm trying to create a chatbot)
I know I've posted a lot, but I tried to give you as much information as I could, so you have a better chance of helping me. I appreciate any answers, and I will answer any questions you have.
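One direction worth sketching (not from the original thread): the mysql module expands a nested array bound to VALUES ? into a single multi-row INSERT, so a 1000-query batch becomes one round trip. The table and column names below are made up for illustration:
// each entry in `transactions` is assumed to be an object with the fields shown
const rows = transactions.map((t) => [t.parent_id, t.body, t.score]);
connection.query(
    "INSERT INTO comments (parent_id, body, score) VALUES ?",
    [rows], // nested array -> ('a', 'b', 'c'), ('d', 'e', 'f'), ...
    function (err) {
        if (err) throw new Error(err);
    }
);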

Meteor: Slow template render blocks method return value

I'm experiencing a very strange issue with Meteor whereby a single page is taking a very long time to load. I've gone down numerous routes to try and troubleshoot before identifying a cause which isn't really making sense to me.
Fundamentally I have a two-step process that collects some initial data and then inserts a couple of hundred records into the database. On the second page I then need to be able to assign users to each of the new records. I can also go back into the same page and edit the assigned users so in this second scenario the inserts have already happened - same lag.
The insert code is relatively straightforward.
Client Side
Set up assessment meta-data & create a new assessment record + items:
// set updatingAssessment variable to stop subs re-loading as records are inserted
Session.set('updatingAssessment', true);
Meteor.call('insertAssessment', assessment, function(err, result) {
    if (err) { showError(err.reason); }
    var assessmentId = result.toHexString();
    // some code excluded to get 'statements' array
    Meteor.call('insertAssessmentItems',
        assessmentId,
        statements,
        function(err, result) {
            Session.set('updatingAssessment', false);
        });
});
// hide first modal and show 2nd
Session.set("assessmentStage1", false);
Session.set("assessmentStage2", true);
Server methods
Volume-wise, we're inserting 1 assessment record and then ~200 assessment items:
insertAssessment: function(assessment) {
    this.unblock();
    return Assessments.insert({
        // <<fields>>
    }, function(err, result) { return result; });
},

insertAssessmentItems: function(assessmentId, statements) {
    this.unblock();
    for (var i = 0; i < statements.length; i++) {
        AssessmentItems.insert({
            // <<fields>>
        }, function(err, result) { });
    }
}
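As a side note, one way to cut the ~200 round trips on the server (a sketch of my own, not the poster's code): Meteor collections expose the raw MongoDB driver via rawCollection(), whose insertMany batches all items in one call:
insertAssessmentItems: function(assessmentId, statements) {
    this.unblock();
    // build all item documents up front; field names are placeholders
    var docs = statements.map(function(statement) {
        return { assessmentId: assessmentId /* , <<fields>> */ };
    });
    // one batched insert instead of ~200 separate ones
    return AssessmentItems.rawCollection().insertMany(docs);
}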
Initially I tried numerous SO suggestions: not refreshing subscriptions while the inserts were happening, using this.unblock() to free up DDP traffic, making queries on the slow page non-reactive, only returning the bare minimum fields, calling the inserts asynchronously, and so on. None of it made any difference to either the initial insert or the subsequent editing of the same records (i.e. going out of the page and coming back, so that the inserts have already happened).
After commenting out all database code and literally returning a string from the meteor method (with the same result / lag) I looked at the actual template generation code for the html page itself.
After commenting out various sections, I identified that a helper building up a dropdown list of users (repeated for each of those ~200 records) was causing the massive spike in load time. When I commented out the dropdown creation, the page loads in <5s, which is within a more acceptable range, although still slower than I would like.
Two oddities with this, which I was hoping somebody could shed some light on:
1. Has anything changed recently in Meteor that would have caused a slowdown? I haven't significantly changed my code for a couple of months, and it was working before, albeit still taking around 10 seconds to fully render the page. The page also used to be responsive once loaded, whereas now the CPU is stuck at 100% and selecting users is very laggy.
2. Why is the actual template loading blocking my server method returns? There is no doubt I need to rewrite my template to be more efficient, but I spent a lot of time trying to figure out why my server methods were not returning values quickly. I even analysed the DDP traffic, and there was no activity while it was waiting for the template to load: no reloaded subs, no method returns, nothing.
Thanks in advance,
David
