Node.js / MongoDB / Mongoose: Buffer Comparison - javascript

First, a little background:
I'm trying to check to see if an image's binary data has already been saved in Mongo. Given the following schema:
var mongoose = require('mongoose')
, Schema = mongoose.Schema;
var imageSchema = new Schema({
mime: String,
bin: { type: Buffer, index: { unique: true }},
uses : [{type: Schema.Types.ObjectId}]
});
module.exports = mongoose.model('Image', imageSchema);
...I want to query to see if an image exists. If it does, I want to add a reference showing that my object is using it and then update it. If it doesn't, I want to create (upsert) it.
When the image does not exist, the code below works perfectly. When it does exist, the code below fails and adds another Image document to Mongo. I feel like it is probably a comparison issue between the Mongo Buffer type and the node Buffer, but I can't figure out how to properly compare them. Please let me know how to update the code below! Thanks!
Image.findOneAndUpdate({
mime : contentType,
bin : image
}, {
$pushAll : {
uses : [ myObject._id ]
}
}, {
upsert : true
}, function(err, image) {
if (err)
console.log(err);
// !!!image is created always, never updated!!!
});

Mongoose converts Buffer elements destined to be stored to mongodb Binary, but it performs the appropriate casts when doing queries.
The expected behavior is also checked in unit tests (as is the storage and retrieval of a node.js Buffer).
Are you sure you are passing a node.js Buffer?
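A quick way to check that question is sketched below. It only uses Node's built-in Buffer API; the helper name sameBinary and the sample bytes are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper: two values count as the same binary data only
// when both are real node.js Buffers with identical bytes.
function sameBinary(a, b) {
  if (!Buffer.isBuffer(a) || !Buffer.isBuffer(b)) {
    return false;
  }
  return a.equals(b);
}

// Sample bytes for illustration only.
const stored = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
const incoming = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

console.log(sameBinary(stored, incoming));                    // true
console.log(sameBinary(stored, incoming.toString('binary'))); // false, a string is not a Buffer
```

If the value you pass as bin turns out to be a string or a plain array, that alone could explain why the query never matches.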
In any case I think the best approach to handle the initial problem (check if an image is already in the db) would be storing a strong hash digest (sha1, sha256, ...) of the binary data and check that (using the crypto module).
When querying, as a preliminary test you could also check the binary length to avoid unnecessary computations.
For an example of how to get the digest for your image before storing/querying it:
var crypto = require('crypto');
...
// be sure image is a node.js Buffer
var image_digest = crypto.createHash('sha256');
image_digest.update(image);
image_digest = image_digest.digest('base64');
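To show how the digest and the length pre-check could fit together, here is a minimal sketch. It uses an in-memory Map as a stand-in for the Mongo collection, so registerImage, seen, and the owner ids are all hypothetical names, not Mongoose API.

```javascript
const crypto = require('crypto');

// Compute a base64 SHA-256 digest of the image data.
function imageDigest(image) {
  if (!Buffer.isBuffer(image)) {
    throw new TypeError('image must be a node.js Buffer');
  }
  return crypto.createHash('sha256').update(image).digest('base64');
}

// In-memory stand-in for the collection, keyed by digest (assumption for this sketch).
const seen = new Map();

// Hypothetical upsert: add a use to an existing entry or create a new one.
function registerImage(image, ownerId) {
  const key = imageDigest(image);
  const entry = seen.get(key);
  // Cheap preliminary check on the binary length, then reuse the entry.
  if (entry && entry.length === image.length) {
    entry.uses.push(ownerId);
    return { created: false, key: key };
  }
  seen.set(key, { length: image.length, uses: [ownerId] });
  return { created: true, key: key };
}

const first = registerImage(Buffer.from('fake image bytes'), 'object-1');
const second = registerImage(Buffer.from('fake image bytes'), 'object-2');
console.log(first.created, second.created); // true false
```

In a real schema the digest would go in its own indexed field and the query would match on that field instead of on bin.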

It is not a good idea to query for your image by the node.js Buffer that contains the image data. You're right that it's probably an issue between the BSON binary data type and a node Buffer, but does your application really require such a comparison?
Instead, I'd add an imageID or slug field to your schema, add an index to this field, and query on it instead of bin in your findOneAndUpdate call:
var imageSchema = new Schema({
imageID: { type: String, index: { unique: true }},
mime: String,
bin: Buffer,
uses : [{type: Schema.Types.ObjectId}]
});

The hash approach does work. Another filter I have used is the EXIF data for the image.
As this is structured information, if you have a match on EXIF data, you could then go to the next step of checking for a match on the hash or file size.
There are heaps of Node modules that make it nice and easy to get the EXIF data for your storage :)
Example code to get EXIF data in Node

Related

Mongoose how to update array from child to parent

I have following schema for Audio.
const AudioSchema = new mongoose.Schema({
name: {
type: String,
required: true
},
uploaderId: {
type: String,
required: true
}
});
Instead of referencing the User, I just store the User's _id as uploaderId.
In my User schema I also have audioFiles: [Audio] array for all audio files that user has uploaded.
const UserSchema = new mongoose.Schema({
...,
audioFiles: [Audio]
});
When I update an Audio document from my REST API, I can change all the properties and that works, but after saving the Audio model those changes don't affect the User model.
Now I have created a new branch and am trying to change uploaderId to UserSchema. But I wonder, is there a solution for this without referencing the UserSchema?
I managed to do this with the help of the MongooseArray.prototype.pull method.
The steps for solving this problem are really easy.
First I get the User that is associated with AudioModel.uploaderId, then I use the user.audioFiles.pull() method. The correct code is below.
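let user = await UserService.getUser(userId);
// pull() changes the array in memory and is synchronous, so no await is needed here
user.audioFiles.pull({
_id: audioId // audioId is the id I'm trying to remove from the array
});
// save() is what actually persists the change
await user.save();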
I also added a try-catch block to handle errors.
Anyone having this kind of issue can look up the MongooseArray.prototype.pull method for more information.
You can also check the other answers in this post.

Meteor: How to store and retrieve file in a mongodb collection?

I try to save the content of uploaded file in a collection:
export const Files = new Mongo.Collection('files');
Meteor.methods({
'saveFileToDB': function (buffer) {
Files.insert({ data:buffer,
createdAt: new Date(),
owner: this.userId,
username: Meteor.users.findOne(this.userId).username
});
},
});
In another file I want to retrieve the saved file. First, I don't know what id I should pass to it. Suppose that there is one file in the collection, or I want the first one, or the ones owned by the current user. I tried passing fileId as 1 but it didn't work. I don't know what should go in place of the question marks below:
import {Files} from "./files";
Meteor.methods({
'slides.readFileFromDB'(fileId) {
if (Files.???) { //contain any data
const text = Files.findOne(???);
console.log(text);
// Meteor.call('slides.insert', text);
return text;
}
}
});
There is a special layer for storing files as chunks, called GridFS, because a normal collection runs into problems when you try to store binaries larger than 16 MB (the maximum size of a single BSON document).
There is also a well-established wrapper for handling files in Meteor. Check out the following package: https://github.com/VeliovGroup/Meteor-Files

associating a GridFS-Stream file to a Schema with Mongoose

I am writing an API for my application, using Mongoose, Express, and GridFS-Stream. I have a Schema for the articles the user will create:
var articleSchema = mongoose.Schema({
title:String,
author:String,
type: String,
images: {type: Schema.Types.ObjectId, ref: "fs.files"},
datePublished: { type: Date, default: Date.now },
content: String
})
var Article = mongoose.model("article", articleSchema, "articles");
and my grid-fs set up for when a user uploads an image:
api.post('/file', fileUpload.single("image"), function(req, res) {
var path = req.file.path;
var gridWriteStream = gfs.createWriteStream(path)
.on('close',function(){
//remove file on close of mongo connection
setTimeout(function(){
fs.unlink(req.file.path);
},1000);
})
var readStream = fs.createReadStream(path)
.on('end',function(){
res.status(200).json({"id":readStream.id});
console.log(readStream);
})
.on('error',function(){
res.status(500).send("Something went wrong. :(");
})
.pipe(gridWriteStream)
});
Right now it's set up so that when the user chooses an image, it automatically uploads it via gridfs-stream, puts it in a temp folder, then deletes it once it is uploaded to the mongo server, and prints the ObjectId in the console. Well, that's all fine and dandy, but we need to associate this ID with the articleSchema, so when we call that article in the app, it will display the associated image.
On creation/update of an article, when the user hits submit:
createArticle(event) {
event.preventDefault();
var article = {
type: this.refs.type.getValue(),
author: this.refs.author.getValue(),
title: this.refs.title.getValue(),
content: this.refs.pm.getContent('html')
};
var image = {
images: this.refs.imageUpload.state.imageString
};
var id = {_id: this.refs.id.getValue()};
var payload = _.merge(id, article, image);
var newPayload = _.merge(article, image)
if(this.props.params.id){
superagent.put("http://"+this.context.config.API_SERVER+"/api/v1.0/article/").send(payload).end((err, res) => {
err ? console.log(err) : console.log(res);
});
} else {
superagent.post("http://"+this.context.config.API_SERVER+"/api/v1.0/article").send(newPayload).end((err, res) => {
err ? console.log(err) : console.log(res);
this.replaceState(this.getInitialState())
this.refs.articleForm.reset();
});
}
},
So what I need it to do is put the ID of the image I just uploaded into the images section of my schema when the user hits submit on the creation of an article. I've tried doing a readstream on submit, but again, the problem is I can't get the ID, or the filename, to be able to associate it.
The files are getting stored in the mongo database, and it creates fs.files and fs.chunks, but for the life of me I can't figure out how to get that data and attach it to a schema, or even just get the data out, without knowing the ObjectId.
So how do I get the ObjectId from fs.files or fs.chunks to attach it to the schema? And in the schema, how do I reference fs.files or fs.chunks so it knows what the ObjectId is associated with?
I can provide more data if what I have is too vague. I have a nasty habit of doing that, sorry.
So I ended up solving my problem, might not be the best solution, but it works until I can get a better solution.
in the API changed
res.status(200).json({"id":readStream.id});
to
res.status(200).send(readStream.id);
In my component, I then set the state to the response.body, which sets the state to the id of the uploaded image. So in the main view, I reference the image uploading component and set the image state of my view to the id state of my component, and voilà, I now have the id in my database, associated with the newly created article.
The problem I then ran into was that it didn't know what to reference. So I attached the API URL to the id, and it acts like it is referencing an image URL and renders the image correctly.
Again, this may not be the best way to go about this. In fact, I am pretty sure it isn't, but it is what's working for now until I can either reference the database correctly or create a new component that just stores all the images on the server and references them that way, much like WordPress.

How to use MongoDB find without a hexadecimal _id

I am having a lot of trouble finding a record in my database with the _id: "AAE45/0RQfm/VUrywfb1Gw=="
(eg. db.collection.find( {_id: new BinData(3, "AAE45/0RQfm/VUrywfb1Gw==") }) ).
It works fine using a BinData converter in the mongo console, but refuses to work from inside a javascript file (I am using node.js) even though I have installed the BinData npm and "required" it.
I have also tried the Binary() function, but it keeps telling me it needs to be hexadecimal or 12-byte binary or something. .hex, .str and .toString() don't work either.
I found this somewhere:
{"$binary": "AAE45/0RQfm/VUrywfb1Gw==", "$type": "03"}
which looks promising, but I have no idea how to implement it.
I hope this makes sense. Any suggestions would be very much appreciated, if anyone has any insight on what process I should follow (eg: convert to binary, then hex, then use ...) that would be fantastic.
You'll have to convert the base64 string to a byte array, then use Binary to create the corresponding mongodb object. Here's some working sample code that inserts a document with the given id in a mongodb collection:
var MongoClient = require('mongodb').MongoClient;
var Binary = require('mongodb').Binary;
MongoClient.connect("mongodb://localhost:27017/example", function (err, db) {
if (err) { return console.dir(err); }
var collection = db.collection('test');
// decode the base64 string into a buffer
var buf = Buffer.from("AAE45/0RQfm/VUrywfb1Gw==", 'base64');
// create a mongo 'binary' object w/ subtype 3
var uuid = new Binary(buf, 3);
var doc1 = { 'hello': 'foo bar', '_id' : uuid };
collection.insert(doc1, { w: 1 }, function (err, result) { });
});
You might want to ensure you really want to use subtype 3, because it's the old UUID type.
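If you want to sanity-check the decoding step on its own before involving the driver, a small sketch using only Node's built-in Buffer could look like this. The function name decodeBinaryId is hypothetical; the resulting Buffer is what you would hand to Binary as in the code above.

```javascript
// Hypothetical helper: decode the base64 _id string into raw bytes.
function decodeBinaryId(encoded) {
  return Buffer.from(encoded, 'base64');
}

const encoded = 'AAE45/0RQfm/VUrywfb1Gw==';
const bytes = decodeBinaryId(encoded);

console.log(bytes.length);                          // 16, the size of a UUID
console.log(bytes.toString('hex'));                 // the same id as a hex string
console.log(bytes.toString('base64') === encoded);  // true, the round trip is lossless
```

Seeing 16 bytes confirms the value is UUID-sized, which matches the use of a UUID binary subtype.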

Mongoose/Node server restart and duplicates

Ok so after a ton of trial and error, I've determined that when I drop a collection and then recreate it through my app, unique doesn't work until I restart my local node server. Here's my Schema
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var Services = new Schema ({
type : {type : String},
subscriptionInfo : Schema.Types.Mixed,
data : Schema.Types.Mixed
},{_id:false});
var Hashtags = new Schema ({
name: {type : String},
services : [Services]
},{_id:false});
var SubscriptionSchema = new Schema ({
eventId : {type: String, index: { unique: true, dropDups: true }},
hashtags : [Hashtags]
});
module.exports = mongoose.model('Subscription', SubscriptionSchema);
And Here's my route...
router.route('/')
.post(function(req, res) {
var subscription = new subscribeModel();
subscription.eventId = eventId;
subscription.save(function(err, subscription) {
if (err)
res.send(err);
else
res.json({
message: subscription
});
});
})
If I drop the collection, then hit the /subscribe endpoint seen above, it will create the entry but will not honor the duplicate. It's not until I then restart the server that it starts to honor it. Any ideas why this is? Thanks!
When your application starts and Mongoose initializes, it scans the schema definitions of the registered models and calls the .ensureIndexes() method for the indexes they define. This is the "by design" behavior and is also covered by this statement:
When your application starts up, Mongoose automatically calls ensureIndex for each defined index in your schema. While nice for development, it is recommended this behavior be disabled in production since index creation can cause a significant performance impact. Disable the behavior by setting the autoIndex option of your schema to false.
So your general options here are:
Don't "drop" the collection and call .remove() which leaves the indexes intact.
Manually call .ensureIndexes() when you issue a drop on a collection in order to rebuild them.
The warning in the documentation is generally that creating indexes for large collections can take some time and take up server resources. If the index exists this is more or less a "no-op" to MongoDB, but beware of small changes to the index definition which would result in creating "additional" indexes.
As such, it is generally best to have a deployment plan for production systems where you determine what needs to be done.
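The two options above could be sketched roughly as follows. This assumes Model is a compiled Mongoose model (such as the Subscription model from the question) and a Mongoose version where Model.deleteMany() is available; it stands in here for the .remove() call mentioned above. The function names are made up for illustration.

```javascript
// Option 1 (sketch): delete the documents but keep the collection,
// so the unique index created at startup stays in place.
async function clearDocuments(Model) {
  await Model.deleteMany({});
}

// Option 2 (sketch): drop the collection, then ask Mongoose to
// rebuild the indexes declared in the schema.
async function dropAndReindex(Model) {
  await Model.collection.drop();
  await Model.ensureIndexes();
}

// Example usage, assuming a model exported as in the question:
// const Subscription = require('./models/subscription'); // hypothetical path
// dropAndReindex(Subscription).catch(console.error);
```

Either way, the unique constraint on eventId should be enforced again without restarting the node server.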
This post seems to argue that indexes are not re-built when you restart: Are MongoDB indexes persistent across restarts?
