RethinkDB bulk import while checking for duplicates - javascript

I'm trying to think of the best way to import CSV data into RethinkDB while avoiding possible duplicates (e.g. importing the same file twice).
The CSV data comes from bank statements. There is no real primary key; instead, a composite key of date, description and amount can be used.
I wanted to use the CLI for imports, but I couldn't see how to do that given that I don't have a single primary key.
Is the best way to iterate through the CSV, check each row for existence first, and insert only if it is not found? I struggled to create a single query that would insert only if the existence check came back empty.
Any guidance would be appreciated. I was assuming this is a somewhat common scenario.
(I'm using JavaScript as my language)
Thanks in advance!

You could set the conflict option on the insert statement to either "update" or "replace". This checks whether a document with the same specified id already exists and either updates or replaces it. http://www.rethinkdb.com/api/javascript/insert/
r.table("users").insert(
{id: "william", email: "william@rethinkdb.com"},
{conflict: "replace"}
).run(conn, callback)
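Since the conflict option only looks at the primary key, the composite key from the question has to end up in the id field somehow. One way this might be done is to hash date, description and amount into a deterministic id before inserting. This is a minimal sketch, not a tested import script: the function names, the field names on the row objects and the "transactions" table are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Build a deterministic id from the composite key (date, description, amount).
// The same statement line always yields the same id, so importing a file
// twice hits the same primary key instead of creating a second document.
function compositeId(row) {
  const key = JSON.stringify([
    row.date,
    row.description.trim(),
    Number(row.amount).toFixed(2) // normalise "3.5" and 3.50 to "3.50"
  ]);
  return crypto.createHash('sha1').update(key).digest('hex');
}

// Return copies of the parsed CSV rows with the derived id attached.
function withIds(rows) {
  return rows.map(row => Object.assign({}, row, { id: compositeId(row) }));
}

// Hypothetical usage with the RethinkDB driver (r and conn come from your app):
// r.table("transactions")
//   .insert(withIds(rows), {conflict: "replace"})
//   .run(conn, callback)
```

One caveat with this design: two genuinely separate transactions with identical date, description and amount would collapse into one document, so a statement line number or running balance may need to go into the key as well.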

Related

Is it possible to force morphia to map the ObjectId to the hex representation?

I am currently working on a kotlin multi project solution.
I have one project defining some data classes and defining an api to access a mongodb. The objectId is created automatically. This project is using morphia:1.3.2.
Entries are stored using this function:
fun store(myClass: MyClass) = db.save(myClass).let { myClass.id?.toHexString() ?: "0" }
Now I'm using this project in a spring-boot kotlin project.
I created a small web page with some filters. These filters should be applied on my query. So far so good, everything is working.
The results of my query are returned via my Rest-controller without any conversions. In my web page I want to print the ObjectId foreach result.
But the ObjectId is not a String as it used to be, it is an object.
id:
counter:15304909
date:"2018-08-27T23:45:35.000+0000"
machineIdentifier:123456
processIdentifier:1234
time:1535413535000
timeSecond:1535413535
timestamp:1535413535
Is it possible to force morphia to return the ObjectId in its String representation? Or is there an option to activate the correct mapping? Or do I have to touch each result one by one and convert the ObjectId to its hexadecimal string representation? I hope there is a better and quicker solution than this.
I am also not able to map the object back to a valid id, because of a java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986. The request looks like this:
myClass?id={"timestamp":1535413631,"machineIdentifier":123456,"processIdentifier":1234,"counter":16576969,"time":1535413631000,"date":"2018-08-27T23:47:11.000+0000","timeSecond":1535413631}
I'm a little bit out of ideas on how to fix this issue.
Depending on your REST framework, you would need to provide a serializer that writes the ObjectId out as its String version, say. Most such frameworks make that transparent once it's configured, so you need only worry about returning your objects from your REST service and the framework will serialize them properly.
I, personally, wouldn't muck about by trying to change how it's serialized in the database. ObjectId is a pretty good _id type and I wouldn't change it.
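If configuring a serializer on the server is not possible right away, a different workaround (not the serializer approach described above) is to rebuild the hex string on the web page from the fields shown in the question. The sketch below assumes the legacy 12-byte ObjectId layout: 4-byte timestamp, 3-byte machine identifier, 2-byte process identifier, 3-byte counter. The helper names are made up, and the result should be checked against toHexString() on the server before relying on it.

```javascript
// Render an integer as a fixed-width, zero-padded hex string of `bytes` bytes.
// Negative values (e.g. a signed short process id) are wrapped to unsigned.
function toHex(value, bytes) {
  const unsigned = bytes === 4 ? value >>> 0 : value & ((1 << (bytes * 8)) - 1);
  return unsigned.toString(16).padStart(bytes * 2, '0');
}

// Rebuild the 24-character hex id from the serialized ObjectId fields.
function objectIdToHex(id) {
  return toHex(id.timestamp, 4) +
         toHex(id.machineIdentifier, 3) +
         toHex(id.processIdentifier, 2) +
         toHex(id.counter, 3);
}
```

The resulting string contains only hex digits, so it can be sent as a plain query parameter such as myClass?id=<hex> without running into the invalid character exception.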

Firebase database retrieving data from comma separated list

I want to store comma separated ids on a child node. How can I then filter the data, the way the IN clause is used in SQL? Is there any way to perform this kind of operation in the Firebase database?
Please suggest any possible solution for this.
Firebase Realtime Database doesn't have an equivalent of SQL's IN clause. It also doesn't have a way to find a substring in a value. So the data model you are looking to use doesn't allow the use-case you want. As usual with NoSQL databases, the solution is to pick a data model that does allow your use-case.
The most likely cause I know for the structure you describe is to associate the child node with a bunch of categories. If that is your case, read my answer here for a proper data structure: Firebase query if child of child contains a value
This is one of the cases where the new Cloud Firestore database offers better querying support, since it recently added a feature to efficiently test if an array contains a certain value (video). If you're only just getting started with your project, you might want to check if Firestore is a better fit for your use-cases.
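To make the data-model suggestion for the Realtime Database a bit more concrete, here is a rough sketch in plain JavaScript. It assumes each item currently holds a comma separated string in a property called categories; that property name, the item keys and the function names are all hypothetical. The idea is to store the ids as keys and keep an inverted index from id to items, so that an "IN" lookup becomes a handful of direct reads that are merged on the client.

```javascript
// Turn "c1,c2" into {c1: true, c2: true}, the shape usually stored per item.
function idsToMap(csv) {
  const map = {};
  csv.split(',').map(s => s.trim()).filter(Boolean).forEach(id => {
    map[id] = true;
  });
  return map;
}

// Build an inverted index: category id -> {itemKey: true}.
function buildIndex(items) {
  const index = {};
  Object.keys(items).forEach(itemKey => {
    Object.keys(idsToMap(items[itemKey].categories)).forEach(id => {
      index[id] = index[id] || {};
      index[id][itemKey] = true;
    });
  });
  return index;
}

// Equivalent of "WHERE category IN (...)": union of the index entries.
function itemsIn(index, ids) {
  const found = new Set();
  ids.forEach(id => Object.keys(index[id] || {}).forEach(k => found.add(k)));
  return Array.from(found).sort();
}
```

In a real app the index would live in the database itself, and each id in the list would be read from its own index path before merging the results.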

Return formatted value in MongoDB db.collection.find()

I have a MongoDB JavaScript function saved in db.system.js, and I want to use it to both query and produce output data.
I'm able to query the results of the function using the $where clause like so:
db.records.find(
{$where: "formatEmail(this.email.toString(), 'format type') == 'xxx'"},
{email:1})
but I'm unable to use this to return a formatted value for the projected "email" field, like so:
db.records.find({}, {"formatEmail(this.email.toString(), 'format type')": 1})
Is there any way to do this while preserving the ability to simply use a pre-built function?
UPDATE:
Thank you all for your prompt participation.
Let me explain why I need to do this in MongoDB, and why it's not a matter of client logic at the wrong layer. What I am really trying to do is use the function for a shard bucketing value. Email was just one example; in reality, what I have is a hash function that returns a mod value.
I'm aware that Mongo can shard based on a hashed value, but from what I gather, that produces a highly random value that can burden the re-balancing of shards with unnecessary load. So I want to control it with something like func(_id, mod), which would return a value from 0 to, say, 1000 (depending on the mod value).
I would also like to use the output of the function in some sort of grouping scenario, and Map Reduce does come to mind. I was just hoping to avoid writing overly complex M/R for something so simple. Also, I don't really know how to do Map Reduce .. lol.
So, I gather that from your answers, there is no way to return any formatted value back from mongo (without map/reduce), is that right?
I think you are mixing your "layers" of functionality here -- the database stores and retrieves data, that's all. What you need to do is:
* get that data and store the cursor in a variable
* loop through your cursor, and for every record you go through
* format and output your record as you see fit.
This is somewhat similar to what you have described in your question, but it's not part of MongoDB, and you have to provide the "formatEmail" function in your "application layer".
Hope it helps
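The three steps above might look roughly like this in JavaScript. The formatEmail body here is only a hypothetical stand-in, since the real function from db.system.js is not shown in the question, and formatRecords is a made-up helper name. The records argument can be anything with a forEach method, such as an array or a shell cursor.

```javascript
// Hypothetical stand-in for the stored formatEmail function.
function formatEmail(email, formatType) {
  const cleaned = String(email).trim();
  return formatType === 'lower' ? cleaned.toLowerCase() : cleaned;
}

// Loop over the records and build the formatted output in the application layer.
function formatRecords(records, formatType) {
  const out = [];
  records.forEach(record => {
    out.push({ _id: record._id, email: formatEmail(record.email, formatType) });
  });
  return out;
}

// Hypothetical usage in the mongo shell:
// var cursor = db.records.find({}, {email: 1});
// printjson(formatRecords(cursor, 'lower'));
```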
As @alernerdev has already mentioned, this is generally not done at the database layer. However, sometimes storing a pre-formatted version in your database is the way to go. Here are some instances where you may wish to store extra data:
* If you need to look up data in a particular format. For example, I have "username" and "usernameLowercase" fields in my primary user collection. The lowercased one is indexed, and is the one I use for username lookup. The mixed-case one is used for displaying the username.
* If you need to report a large amount of data in a particular format. 100,000 email addresses all formatted in a particular way? Probably best to just store them in that format in the db.
* If your translation from one format to another is computationally expensive. Doubly so if you're processing lots of records.
In this case, if all you're doing is looking up or retrieving an email in a specific format, I'd recommend adding a field for it and then indexing it. That way you won't need to do actual document retrieval for the lookup or the display. Super fast. Disk storage space for something the size of an email address is super cheap!
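As a small illustration of the extra-field idea, using the username example from above: the formatted value is computed once at write time, and lookups normalise the input the same way. The helper names and the users collection are assumptions, not anything from the question.

```javascript
// Add the pre-formatted lookup field before the document is saved.
function withLookupFields(user) {
  return Object.assign({}, user, {
    usernameLowercase: user.username.toLowerCase()
  });
}

// Build the query for a lookup against the indexed, pre-formatted field.
function lookupQuery(input) {
  return { usernameLowercase: input.trim().toLowerCase() };
}

// Hypothetical usage in the mongo shell:
// db.users.insert(withLookupFields({username: "WilliamB"}));
// db.users.find(lookupQuery(" williamb "));
```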

Meteor mongo insert unique document

I have a simple Tags Collection in Meteor. Currently in order to ensure that a user cannot create a duplicate Tag document I do this:
var existingTag = Tags.findOne({name: "userInput"})
If existingTag is undefined then I can go ahead and do the insert.
Is there a better/correct way of doing this using Meteor's MongoDB syntax? I can't seem to find any documentation on this.
Thanks.
A good solution is to create a unique Mongo index on the field. That way you get the uniqueness validation at the Mongo level, as well as a performance increase for searches on that field.
Meteor currently doesn't support index creation directly, so you need to manually log in to your database and add the index from there. The command for this is (newer MongoDB versions name it createIndex):
db.tags.ensureIndex({name: 1}, {unique: true})
Here and here you can find more information.
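Once the unique index exists, inserting a duplicate name fails with MongoDB's duplicate key error (code 11000) instead of silently creating a second document. A minimal sketch of handling that on the server, assuming the error thrown by the insert carries the MongoDB error code or the E11000 message; insertUnique and isDuplicateKeyError are made-up helper names.

```javascript
// True if the error looks like MongoDB's duplicate key error (code 11000).
function isDuplicateKeyError(err) {
  return Boolean(err) &&
    (err.code === 11000 || /E11000/.test(String(err.message)));
}

// Run the insert and report whether a new document was created.
// Any error other than a duplicate key is rethrown unchanged.
function insertUnique(insertFn, doc) {
  try {
    return { inserted: true, id: insertFn(doc) };
  } catch (err) {
    if (isDuplicateKeyError(err)) {
      return { inserted: false, id: null };
    }
    throw err;
  }
}

// Hypothetical usage in Meteor server code:
// var result = insertUnique(function (doc) { return Tags.insert(doc); },
//                           {name: userInput});
```

Compared with the findOne check from the question, this avoids the window in which two users could both pass the check and then both insert the same tag.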

What does the collection returned by CrmRestKit.RetrieveMultiple look like?

In a form in CRM2011 I am using a JavaScript function to retrieve some attributes from a custom entity unrelated to the one in the form.
I have a successful call to CrmRestKit.RetrieveMultiple but I don't know what the returned collection comprises. Can someone point me in the right direction, please?
To be a little more specific about the requirement: the query returns a set of Field schema names; i.e. the column being queried is in a custom entity and contains schema names of Fields. I want to match each one I retrieve against the calling form's collection of Field-based controls so that I can perform an action on matching ones. Any assistance towards that would also be gratefully received, thanks.
The easiest way I have found to know what you'll be working with is to take the output and run it through JSON.stringify() and write the contents of that out to the page.
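Something along these lines is what I mean. The helper names are made up and the sample object in the usage comment is invented purely for illustration; it says nothing about what CrmRestKit actually returns, which is exactly what the dump is meant to reveal.

```javascript
// Pretty-print any value so its structure can be read on the page or console.
function dump(value) {
  return JSON.stringify(value, null, 2);
}

// List the top-level property names of the returned object.
function topLevelKeys(value) {
  return value && typeof value === 'object' ? Object.keys(value) : [];
}

// Hypothetical usage inside the success callback of the retrieve call:
// function onSuccess(collection) {
//   console.log(topLevelKeys(collection));
//   console.log(dump(collection));
// }
```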
For bits like this I usually just debug with IE. That will allow you to add breakpoints and inspect the object.
Related info: Debugging Script with the Developer Tools.
