I have created an ETL to transfer data from Sybase to a MongoDB environment. In one place I match against a legacy ID in order to populate the correct Mongo ObjectId in the model.
This works for my first few thousand records, until I hit a record that's missing a legacy ID, at which point I get an error because the model expects an ObjectId. A null check in my ETL file isn't sufficient; I also need to handle the fact that the model expects a valid ObjectId. How would I do this?
My relevant ETL code looks like this:
let agency = await Agency.findOne({ agencyId: record.agent_house }).exec();
Which I drop in like this:
agency: {
id: agency._id ? agency._id : null,
// Other prop,
// Other prop
}
And the data then gets ported to the model, which looks like this:
agency: {
id: { type: mongoose.Schema.Types.ObjectId, ref: 'agencies' },
// other prop,
// other prop
}
How can I handle a situation where there is no value, which will cause the ObjectId assignment to fail even with the null check in place in the ETL file?
Using the mongoose ObjectId type, you can generate a fresh ID when the lookup misses. Note the guard on agency itself: findOne resolves to null when there is no match, so agency._id alone would throw:
agency: {
id: agency && agency._id ? agency._id : new mongoose.mongo.ObjectId(),
// Other prop,
// Other prop
}
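Here is a minimal runnable sketch of the failure mode and the guard, with a fake lookup standing in for the real mongoose calls (fakeFindOne, makeFallbackId and the id strings are stand-ins, not part of the original code):

```javascript
// Runnable sketch (mongoose not required): fakeFindOne stands in for
// Agency.findOne(), which resolves to null when nothing matches.
const fakeFindOne = (legacyId) => (legacyId ? { _id: 'abc123' } : null);

// With only `agency._id ? agency._id : null`, a null document throws a
// TypeError; optional chaining plus a fallback generator does not.
function buildAgencyField(record, makeFallbackId) {
  const agency = fakeFindOne(record.agent_house);
  // in the real ETL: agency?._id ?? new mongoose.mongo.ObjectId()
  return { id: agency?._id ?? makeFallbackId() };
}

const fallback = () => 'generated-id'; // stand-in for new ObjectId()
console.log(buildAgencyField({ agent_house: 'A1' }, fallback)); // { id: 'abc123' }
console.log(buildAgencyField({}, fallback)); // { id: 'generated-id' }
```

The key point is that the guard must cover the whole document, not just its _id property.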
In my user model I pass this as the second argument (the schema options), to delete __v and the password and to replace _id with id:
{
toJSON: {
transform: function (doc, ret) {
ret.id = ret._id;
delete ret._id;
delete ret.password;
delete ret.__v;
},
},
}
In my signin router I have something like this:
const existingUser = await User.findOne({email});
console.log("existingUser*", existingUser)
res.status(200).send(existingUser);
This is what my console.log prints:
{
_id: 5fe81e29fdd22a00546b05e3,
email: 'chs#hotmail.fr',
password: '0636b425ef0add0056ec85a5596eacf9ff0c71f8c2a1d4bad068a8679398e11870df12262722b911502eacb5fca23cef0cdd3b740481102ead50c58756d14a34.3f82d856ad93bc99',
__v: 0
}
But in Postman I received this:
{
"email": "chs#hotmail.fr",
"id": "5fe81e29fdd22a00546b05e3"
}
I know that with transform, "if set, mongoose will call this function to allow you to transform the returned object".
But could someone explain when the transform occurs, to account for the difference between the console.log output and the data I received in Postman?
Does this have something to do with asynchronicity?
res.status(200).send(existingUser); looks like Express (or look-alike) controller code, so I'll assume it is Express.
The .send(body) method sends the response to the client as a string (well, technically). So, before the actual transmission, the body argument is converted to a string if it isn't one already. existingUser in your code isn't a string, it's a mongoose document, so Express serializes it; effectively this is similar to the following code:
res.status(200)
.send(
existingUser.toString() // .toString() here is the key
);
Under the hood, the mongoose document's .toString() is proxied to its .toJSON() method, so your code becomes equivalent to the following:
...
.send(
existingUser.toJSON() // .toJSON() here
);
...and .toJSON() is the method that takes into account the transform(doc, ret) option you specified for the mongoose schema.
console.log(), on the other hand, does not use the underlying .toString()/.toJSON() methods of its arguments. If you want to print to the console the result that the end consumer (Postman, for example) would receive, you should apply the transform manually:
console.log(existingUser.toJSON()); // like this, transformed, but not stringified
console.log(existingUser.toString()); // or like this, but already stringified
console.log(JSON.stringify(existingUser, null, 3)); // or transform and then stringify with custom formatting (3-space indentation instead of the default single-line output)
The whole transform chain looks like this:
Model
-> `.toJSON()`
-> Mongoose transforms model internally into POJO
if custom transform is defined
-> Mongoose passes POJO to user defined `transform(doc, ret)`,
where `doc` - original document, `ret` - internally transformed POJO
-> `ret` is returned
else
-> POJO is returned
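The chain above can be demonstrated without mongoose at all, since JSON.stringify (which Express uses for object bodies) honors any toJSON() method. This is a simplified stand-in for the schema transform, not mongoose's actual implementation:

```javascript
// Minimal sketch without mongoose: any object with a toJSON() method is
// transformed by JSON.stringify (which is what Express's .send uses for
// objects), while console.log prints the object as-is.
const user = {
  _id: '5fe81e29fdd22a00546b05e3',
  email: 'chs#hotmail.fr',
  password: 'secret-hash',
  __v: 0,
  toJSON() {
    const ret = { ...this }; // mimics the schema's transform(doc, ret)
    ret.id = ret._id;
    delete ret._id;
    delete ret.password;
    delete ret.__v;
    delete ret.toJSON; // only needed for this plain-object stand-in
    return ret;
  },
};

console.log(user); // raw object: _id, password and __v are all visible
console.log(JSON.stringify(user)); // {"email":"chs#hotmail.fr","id":"5fe81e29fdd22a00546b05e3"}
```

So the difference has nothing to do with asynchronicity: it is simply that one code path invokes toJSON() and the other does not.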
I'm using the Joi library to validate input from users and a mongoose schema to validate data sent to MongoDB.
I find myself using the spread operator after validating the input to quickly store the data in a mongoose data model, like this:
validate(req.body.inputObject);
//...
const newData = {...req.body.inputObject } /// spread operator here!!!
//...
if /* required additional data */ {
newData.more = {
_id: someId,
value: someValue,
}
}
//...
const data = await Data.findByIdAndUpdate(req.params.id, newData, { new: true });
// check and report error, otherwise
res.send(data);
I know that mongoose stores only the properties found in the Schema of an object in the database. Therefore, any additional data passed by the user that is not in the schema is dropped. The alternative, as I see it, is something like this:
validate(req.body.inputObject);
//...
const newData = {
value1: req.body.inputObject.value1,
value2: req.body.inputObject.value2,
...
valueN: req.body.inputObject.valueN,
}
//...
if /* require additional data */ {
newData.more = {
_id: someId,
value: someValue,
}
}
//...
const data = await Data.findByIdAndUpdate(req.params.id, newData, { new: true });
// check and report error, otherwise
res.send(data);
The spread operator approach is flexible because I do not need to know or guess which values in the data object the user wants to update. Better, I don't have to assume that the user will provide values for all of the object's properties along with the modifications they want to make; users can send just the updated properties. This is not the case for the second approach.
My question is: are there any down sides or security concerns for using the spread operator in situations like this?
Is it bad to rely on mongoose dropping properties not found in the schema?
Is there a better approach that is both flexible and secure?
Or, should I stick with the second approach and expect users to send all the properties along with updates?
Thanks.
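One middle-ground sketch between the two approaches is an explicit allow-list: copy only known schema keys from the request body, so unknown properties never reach the update at all (pickAllowed and the key names here are hypothetical, not from the original code):

```javascript
// pickAllowed is a hypothetical helper: it copies only an explicit
// allow-list of schema keys, keeping the spread's flexibility without
// relying on mongoose to silently drop unknown properties.
const ALLOWED = ['value1', 'value2', 'valueN']; // assumed schema keys

function pickAllowed(input, allowed) {
  const out = {};
  for (const key of allowed) {
    if (key in input) out[key] = input[key]; // copy only known keys
  }
  return out;
}

const newData = pickAllowed({ value1: 1, injected: true }, ALLOWED);
console.log(newData); // { value1: 1 } -- 'injected' never reaches the update
```

Like the spread, this accepts partial updates; like the explicit second approach, it makes the accepted fields visible in one place.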
In a Node/TypeScript application, I need to pass to the updateMany method a piece of a query retrieved from the database:
{'$unset': {'extras' : {'origin': ''}}}
In the db, the above query is stored as a field of an Object:
"cleanup.aggregated_records_to_modify" : {
"update_fields_by" : "{'$unset': {'extras' : {'origin': ''}}}"
}
If I pass update_fields_by to the updateMany method of the MongoDB Node.js driver, I get an error saying "MongoError: the update operation document must contain atomic operators." (it receives a string instead of an object); if, instead, I create an object variable:
const queryTemp = { $unset: { extras: { origin: "" } } };
to give to the updateMany, all goes well.
How can I retrieve the field from db and correctly pass it to the update method as an object?
If you use JSON.parse(foo) on your variable, it will convert it from a string to an object. Note that JSON.parse requires valid JSON, so keys and string values must be double-quoted; the single-quoted form stored above would throw a SyntaxError.
There was a problem (a bug?) with the tool I use to manage MongoDB; I cannot store an object with a key starting with $ because I receive the error "Illegal argument: Invalid BSON field name $unset". I had to add the $ symbol programmatically.
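A minimal sketch of the round trip, assuming the string ends up stored as valid double-quoted JSON:

```javascript
// The string as it should be stored: valid JSON, i.e. double quotes.
const stored = '{"$unset": {"extras": {"origin": ""}}}';

// Parse it into a real object before handing it to the driver.
const updateDoc = JSON.parse(stored);
console.log(updateDoc.$unset); // { extras: { origin: '' } }

// Then (not run here): await collection.updateMany(filter, updateDoc);
```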
I'm having trouble understanding how to retrieve information from a GraphQL union. In my schema I have something like this:
union Profile = StudentProfile | TeacherProfile
Then in my resolver I have:
Profile: {
__resolveType(obj, context, info) {
if (obj.studentId) {
return 'StudentProfile'
} else if (obj.salaryGrade) {
return 'TeacherProfile'
}
},
},
This doesn't throw any errors, but when I run a query like this:
query {
listUsers {
id
firstName
lastName
email
password
profile {
__typename
... on StudentProfile {
studentId
}
... on TeacherProfile {
salaryGrade
}
}
}
}
This returns everything except for profile, which just comes back null. I'm using Sequelize to handle my database work, but my understanding of unions was that GraphQL would simply look up the relevant type for the ID being queried and return the appropriate details.
If I'm mistaken, how can I get this query to work?
edit:
My list user resolver:
const listUsers = async (root, { filter }, { models }) => {
const Op = Sequelize.Op
return models.User.findAll(
filter
? {
where: {
[Op.or]: [
{
email: filter,
},
{
firstName: filter,
},
{
lastName: filter,
},
],
},
}
: {},
)
}
User model relations (very simple and has no relation to profiles):
User.associate = function(models) {
User.belongsTo(models.UserType)
User.belongsTo(models.UserRole)
}
and my generic user resolvers:
User: {
async type(type) {
return type.getUserType()
},
async role(role) {
return role.getUserRole()
},
},
The easiest way to go about this is to utilize a single table (i.e. single table inheritance).
Create a table that includes columns for all the types. For example, it would include both student_id and salary_grade columns, even though these will be exposed as fields on separate types in your schema.
Add a "type" column that identifies each row's actual type. In practice, it's helpful to name this column __typename (more on that later).
Create a Sequelize model for your table. Again, this model will include all attributes, even if they don't apply to a specific type.
Define your GraphQL types and your interface/union type. You can provide a __resolveType method that returns the appropriate type name based on the "type" field you added. However, if you named this field __typename and populated it with the names of the GraphQL types you are exposing, you can actually skip this step!
You can use your model like normal, utilizing find methods to query your table or creating associations with it. For example, you might add a relationship like User.belongsTo(Profile) and then lazy load it: User.findAll({ include: [Profile] }).
The biggest drawback to this approach is you lose database- and model-level validation. Maybe salary_grade should never be null for a TeacherProfile but you cannot enforce this with a constraint or set the allowNull property for the attribute to false. At best, you can only rely on GraphQL's type system to enforce validation but this is not ideal.
You can take this a step further and create additional Sequelize models for each individual "type". These models would still point to the same table, but would only include the attributes specific to the fields you're exposing for each type. That way you can at least enforce "required" attributes at the model level. For example, you would use your Profile model for querying all profiles, but use TeacherProfile when inserting or updating a teacher profile. This works pretty well; just be mindful that you cannot use the sync method when structuring your models like this -- you'll need to handle migrations manually. You shouldn't use sync in production anyway, so it's not a huge deal, but it is something to be aware of.
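The type-resolution part of the approach above can be sketched with plain objects standing in for Sequelize rows (the column names here are assumptions, matching the snake_case convention mentioned earlier):

```javascript
// Runnable sketch of the single-table idea. If each row stores its
// GraphQL type name in a __typename column, resolving the union is
// just reading that column back:
const resolveProfileType = (row) => row.__typename;

// Without a __typename column, fall back to discriminating fields:
const resolveByFields = (row) => {
  if (row.student_id != null) return 'StudentProfile';
  if (row.salary_grade != null) return 'TeacherProfile';
  return null;
};

const teacherRow = { __typename: 'TeacherProfile', student_id: null, salary_grade: 4 };
console.log(resolveProfileType(teacherRow)); // 'TeacherProfile'
console.log(resolveByFields(teacherRow)); // 'TeacherProfile'
```

With the __typename column populated, Apollo-style servers can skip a custom __resolveType entirely, since the default resolver finds the field on the object.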
I'm creating an application in Node that has some CRUD components. One of my data objects has a save() method that is meant to update a record if the object has an id found in the collection, or upsert a new document if not. Additionally, when upserting I'd like to get back the _id that Mongo generates for the document.
It seems that findAndModify would do this, and indeed it does return an _id value from the database. In my query clause I filter by _id. If my data object doesn't have an id, Mongo correctly performs an upsert; however, besides the keys I set in the update clause, it also sets _id on the new document to whatever value I used in the query clause. Some code for clarity:
User.prototype.save = function(callback) {
var that = this;
var args = {
'query' : { _id: this.getId() }, //getId() returns empty string if not set
'update' : { $set : {
firstName : this.firstName,
lastName : this.lastName,
email : this.email
//_id : this.getId()
// which is blank, is magically getting added due to query clause
}},
'new' : true,
'upsert' : true,
'fields' : {'_id' : true}
};
this.db.collection(dbName).findAndModify(args, function(err, doc){
if(!that.getId()) {
that.setId(doc._id);
}
if (typeof(callback) === "function"){
callback.call(that);
}
});
}
I'm simply looking for update semantics that also happen to return a Mongo-generated _id on upsert. I do not want the values of the query clause to additionally be applied as if they were part of the update map. Is there any way to achieve this?
You can generate the _id client side: const { ObjectID } = require('mongodb'); const id = new ObjectID();
Then you can just do a regular upsert (no need for findAndModify) because you already have the _id.
However, if you are using findAndModify, keep in mind that the node driver accepts the arguments to this function positionally, not as an object (like in the regular mongo shell).
The correct format for findAndModify with the Node driver looks like this:
collection.findAndModify(criteria, sort, update[, options, callback])
(options and callback are optional params). Full docs here:
https://github.com/mongodb/node-mongodb-native/blob/master/docs/insert.md