I have a field in my documents that is named after its timestamp, like so:
{
    _id: ObjectId("53f2b954b55e91756c81d3a5"),
    domain: "example.com",
    "2014-08-07 01:25:08": {
        A: [
            "123.123.123.123"
        ],
        NS: [
            "ns1.example.com.",
            "ns2.example.com."
        ]
    }
}
This is very impractical for queries, since every document has a different timestamp.
Therefore, I want to rename this field, for all documents, to a fixed name.
However, I need to be able to match the field names using regex, because they are all different.
I tried doing this, but this is an illegal query.
db['my_collection'].update({}, {$rename:{ /2014.*/ :"201408"}}, false, true);
Does someone have a solution for this problem?
SOLUTION BASED ON NEIL LUNN'S ANSWER:
conn = new Mongo();
db = conn.getDB("my_db");

var bulk = db['my_coll'].initializeOrderedBulkOp();
var counter = 0;

db['my_coll'].find().forEach(function(doc) {
    for (var k in doc) {
        if (k.match(/^2014.*/)) {
            print("replacing " + k);
            var unset = {};
            unset[k] = 1;
            bulk.find({ "_id": doc._id }).updateOne({ "$unset": unset, "$set": { WK1: doc[k] } });
            counter++;
        }
    }

    if (counter % 1000 == 0) {
        bulk.execute();
        bulk = db['my_coll'].initializeOrderedBulkOp();
    }
});

if (counter % 1000 != 0)
    bulk.execute();
This is not a mapReduce operation, not unless you want a new collection that consists only of the _id and value fields that are produced from mapReduce output, much like:
"_id": ObjectId("53f2b954b55e91756c81d3a5"),
"value": {
"domain": "example.com",
...
}
}
Which at best is a kind of "server side" reworking of your collection, but of course not in the structure you want.
While there are ways to execute all of the code on the server, please don't try to do so unless you are really in a spot. These ways generally don't play well with sharding anyway, which is usually where people "really are in a spot" due to the sheer size of their records.
When you want to change things and do it in bulk, you generally have to "loop" the collection results and process the updates while having access to the current document information. That is, in the case where your "update" is "based on" information already contained in fields or structure of the document.
There is therefore no "regex replace" operation available, and there certainly is not one for renaming a field. So let's loop with bulk operations for the "safest" form of doing this without running all the code on the server.
var bulk = db.collection.initializeOrderedBulkOp();
var counter = 0;

db.collection.find().forEach(function(doc) {
    for (var k in doc) {
        // match on the field *name*, not its value
        if (k.match(/^2014.*/)) {
            var update = { "$unset": {}, "$set": {} };
            update["$unset"][k] = 1;
            update["$set"][k.replace(/(\d+)-(\d+)-(\d+).+/, "$1$2$3")] = doc[k];
            bulk.find({ "_id": doc._id }).updateOne(update);
            counter++;
        }
    }

    if (counter % 1000 == 0) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});

if (counter % 1000 != 0)
    bulk.execute();
So the main things there are the $unset operator to remove the existing field and the $set operator to create the new field in the document. You need the document content in order to examine and use both the "field name" and the "value", hence the looping, as there is no other way.
If you don't have MongoDB 2.6 or greater on the server then the looping concept still remains without the immediate performance benefit. You can look into things like .eval() in order to process on the server, but as the documentation suggests, it really is not recommended. Use with caution if you must.
As you already recognized, value-keys are indeed very bad for the MongoDB query language. So bad that what you want to do doesn't work.
But you could do it with a MapReduce. The map and reduce functions wouldn't do anything, but the finalize function would do the conversion in Javascript.
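For illustration, such a mapReduce might look like the following sketch in the mongo shell. The output collection name is an assumption, and the fixed field name WK1 is taken from the solution above:

var map = function () { emit(this._id, this); };              // identity map: emit each document as-is
var reduce = function (key, values) { return values[0]; };    // effectively never called, since every key is unique
var finalize = function (key, doc) {
    // rename any timestamp-named field to the fixed name WK1
    for (var k in doc) {
        if (/^2014/.test(k)) {
            doc.WK1 = doc[k];
            delete doc[k];
        }
    }
    return doc;
};

db.my_coll.mapReduce(map, reduce, { out: "my_coll_renamed", finalize: finalize });

Note that, as described in the previous answer, the result lands in a new collection with the data nested under a "value" field.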
Or you could write a little program in a programming language of your choice which reads all documents from the collection, makes the change, and writes them back using collection.save.
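For example, a minimal sketch of that loop done directly in the mongo shell (collection and field names assumed from the question; any driver would follow the same read/modify/save pattern):

db.my_coll.find().forEach(function (doc) {
    Object.keys(doc).forEach(function (k) {
        if (/^2014/.test(k)) {
            doc.WK1 = doc[k];   // copy the value under the fixed field name
            delete doc[k];      // drop the timestamp-named field
        }
    });
    db.my_coll.save(doc);       // write the modified document back
});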
Related
I just started trying Javascript and I'm struggling. My end result is supposed to be a chronological timeline of activities (Calls, Meetings, Tasks).
I'm receiving a file from an application with different types of records. It contains Calls, Meetings, and Tasks. When I receive the file, they are in no particular order and each record type has different fields. I need to get them into the same table but sorted by date.
Here is a sample file with a Call and a Task but it might have 10 or more records of differing type.
[
    {
        "Owner": {
            "name": "Raymond Carlson"
        },
        "Check_In_State": null,
        "Activity_Type": "Calls",
        "Call_Start_Time": "2022-10-23T20:00:00-05:00",
        "$editable": true,
        "Call_Agenda": "Need to call and discuss some upcoming events",
        "Subject": "Call scheduled with Florence"
    },
    {
        "Owner": {
            "name": "Raymond Carlson"
        },
        "Check_In_State": null,
        "Activity_Type": "Tasks",
        "Due_Date": "2022-10-24",
        "$editable": true,
        "Description": "-Review Action Items from Last week",
        "Subject": "Complete Onboarding"
    }
]
This is what I'm doing now and I know it's not the best way to go about it.
for (var i = 0; i < obj.length; i++) {
    var activityType0 = (obj[0].Activity_Type)
    var activityOwner0 = (obj[0].Owner.name);

    if (activityType0 == "Events") {
        start0 = (obj[0].Start_DateTime)
        startDate0 = new Date(start0);
        activityDate0 = startDate0.toLocaleString();
        activityTitle0 = (obj[0].Subject);
        activityDesc0 = (obj[0].Description);
    }
    else if (activityType0 == "Tasks") {
        dueDate0 = (obj[0].Due_Date)
        activityDate0 = dueDate0;
        activityTitle0 = (obj[0].Subject);
        activityDesc0 = (obj[0].Description);
    }
    else if (activityType0 == "Calls") {
        callStart0 = (obj[0].Call_Start_Time)
        callStartTime0 = new Date(callStart0);
        activityDate0 = callStartTime0.toLocaleString();
        activityTitle0 = (obj[0].Subject);
        activityDesc0 = (obj[0].Call_Agenda);
    }
}
So regardless of the type of record, I have an
activityOwner,
activityDate,
activityTitle,
activityDesc,
And that's what I need.
Aside from the code above needing work, my question now is: what do I need to do with these values for each record to put them in order by "activityDate"? Do I need to put them back into an array and then sort, and if so, what's the best approach?
Thank you much!
Right now I'm not really sure what your end goal is: is it sorting by activity type or by activity date?
If it's the latter, you can try referring to this answer, or try to sort activity type by the ASCII number of each starting letter in each type (e.g. "C" in "Call", "T" in "Tasks", etc.)
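If sorting by date is the goal, one approach is to normalize each record into the four fields you listed and then sort the resulting array. A rough sketch, assuming obj is the parsed JSON array from your code and using the field names from your sample:

const normalized = obj.map(rec => {
    let date;
    if (rec.Activity_Type === "Events") date = new Date(rec.Start_DateTime);
    else if (rec.Activity_Type === "Tasks") date = new Date(rec.Due_Date);
    else if (rec.Activity_Type === "Calls") date = new Date(rec.Call_Start_Time);

    return {
        activityOwner: rec.Owner.name,
        activityDate: date,
        activityTitle: rec.Subject,
        activityDesc: rec.Activity_Type === "Calls" ? rec.Call_Agenda : rec.Description
    };
});

// Date objects compare numerically, so subtracting them sorts chronologically.
normalized.sort((a, b) => a.activityDate - b.activityDate);

You can then format activityDate for display (e.g. with toLocaleString) after the sort, so the comparison stays numeric.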
I have a small job that runs every minute and performs a scan on a table that has nearly 3,000 rows:
async execute (dialStatus) {
    if (!process.env.DIAL_TABLE) {
        throw new Error('Dial table not found')
    }

    const params = {
        TableName: process.env.DIAL_TABLE,
        FilterExpression: '#name = :name AND #dial_status = :dial_status AND #expires_on > :expires_on',
        ExpressionAttributeNames: {
            '#name': 'name',
            '#dial_status': 'dial_status',
            '#expires_on': 'expires_on'
        },
        ExpressionAttributeValues: {
            ':name': { 'S': this.name },
            ':dial_status': { 'S': dialStatus ? dialStatus : 'received' },
            ':expires_on': { 'N': Math.floor(moment().valueOf() / 1000).toString() }
        }
    }

    console.log('params', params)

    const dynamodb = new AWS.DynamoDB()
    const data = await dynamodb.scan(params).promise()
    return this._buildObject(data)
}
I'm facing a problem with read units and timeouts on DynamoDB. Right now I'm using 50 read units and it's getting expensive compared to RDS.
The attribute names used in the scan are not my primary key: name is a secondary index and dial_status is a normal attribute in my JSON, but every row has it.
This job runs every minute for a list of parameters (i.e. if I have 10 parameters, I'll perform this scan 10 times a minute).
My table has the following schema:
phone: PK (hash);
configuration: JSON in String format;
dial_status: String;
expires_on: TTL number;
name: String;
origin: String;
The job should get all items based on name and dial_status, and the number of items is restricted to 15 elements per execution (each minute). Each element should then be enqueued on SQS to be processed.
I really need to decrease those read units, but I'm not sure how to optimize this function. I've read about reducing the page size or avoiding scans. What are my alternatives to avoid a scan if I don't have the primary key and I want to return a group of rows?
Any idea on how to fix this code to be called like 10-15 times every minute?
I suggest you create a GSI (Global Secondary Index) with these keys:
HASH: name_dialStatus
RANGE: expiresOn
As you've already guessed, the hash key's value is the concatenation of the two independent fields name and dialStatus.
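For reference, such a GSI can be added to an existing table with updateTable; a minimal sketch (index name, attribute names, and throughput values are assumptions):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

const gsiParams = {
    TableName: process.env.DIAL_TABLE,
    AttributeDefinitions: [
        { AttributeName: 'name_dialStatus', AttributeType: 'S' },
        { AttributeName: 'expires_on', AttributeType: 'N' }
    ],
    GlobalSecondaryIndexUpdates: [{
        Create: {
            IndexName: 'MY_GSI_NAME',
            KeySchema: [
                { AttributeName: 'name_dialStatus', KeyType: 'HASH' },
                { AttributeName: 'expires_on', KeyType: 'RANGE' }
            ],
            Projection: { ProjectionType: 'ALL' },
            // only needed for provisioned-capacity tables
            ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
        }
    }]
};

dynamodb.updateTable(gsiParams).promise()
    .then(() => console.log('GSI creation started'))
    .catch(err => console.error(err));

Remember that an item only appears in the index if it actually carries the name_dialStatus attribute, so it has to be written (or backfilled) on the items themselves.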
Now you may use a query on this GSI, which is much more efficient since it doesn't scan the whole table, but reads only the items you are interested in:
async execute(dialStatus) {
    if (!process.env.DIAL_TABLE) {
        throw new Error('Dial table not found')
    }

    const params = {
        TableName: process.env.DIAL_TABLE,
        IndexName: 'MY_GSI_NAME',
        // replaces `FilterExpression`
        // always test the partition key for equality!
        KeyConditionExpression: '#pk = :pk AND #sk > :skLow',
        ExpressionAttributeNames: {
            '#pk': 'name_dialStatus', // partition key name
            '#sk': 'expires_on'       // sort key name
        },
        ExpressionAttributeValues: {
            ':pk': { 'S': `${this.name}:${dialStatus || 'received'}` },
            ':skLow': { 'N': Math.floor(moment().valueOf() / 1000).toString() }
        }
    }

    console.log('params', params)

    // With AWS.DynamoDB.DocumentClient() there is no need to specify the type of fields. This is friendly advice :)
    const dynamodb = new AWS.DynamoDB();

    // `scan` becomes `query` !!!
    const data = await dynamodb.query(params).promise();
    return this._buildObject(data);
}
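As the comment above hints, with AWS.DynamoDB.DocumentClient the explicit type wrappers go away. A sketch of the same query body under that assumption (the index and key names are still placeholders):

// inside the same async execute(dialStatus) method
const docClient = new AWS.DynamoDB.DocumentClient();

const params = {
    TableName: process.env.DIAL_TABLE,
    IndexName: 'MY_GSI_NAME',
    KeyConditionExpression: '#pk = :pk AND #sk > :skLow',
    ExpressionAttributeNames: {
        '#pk': 'name_dialStatus',
        '#sk': 'expires_on'
    },
    ExpressionAttributeValues: {
        ':pk': `${this.name}:${dialStatus || 'received'}`,   // plain values, no { S: ... } wrappers
        ':skLow': Math.floor(Date.now() / 1000)
    }
};

const data = await docClient.query(params).promise();
// note: data.Items are now plain objects, so _buildObject may need adjusting
return this._buildObject(data);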
It is always recommended to design a DynamoDB table around its access patterns, so that it can be queried easily with keys (partition key/sort key) and expensive scan operations are avoided.
Revisit your table schema if it is not too late.
If it is already too late, then create a GSI with "name" as the partition key and "expires_on" as the sort key, with projected attributes (e.g. "dialStatus") so that you query only the required data and lower the read capacity.
If you still do not want to go with options 1 and 2, use a scan operation with a rate limiter and give it only about 25% of the read capacity so that you can avoid spikes (a sketch of such a throttled scan follows).
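A rough sketch of a rate-limited scan, paging with Limit and LastEvaluatedKey (the page size and delay values are arbitrary assumptions):

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

async function throttledScan(baseParams, pageSize = 100, delayMs = 1000) {
    const items = [];
    let lastKey;
    do {
        const params = Object.assign({}, baseParams, { Limit: pageSize });
        if (lastKey) params.ExclusiveStartKey = lastKey;

        const page = await dynamodb.scan(params).promise();
        items.push(...page.Items);
        lastKey = page.LastEvaluatedKey;

        // crude throttle: pause between pages to spread the consumed read capacity
        await new Promise(resolve => setTimeout(resolve, delayMs));
    } while (lastKey);
    return items;
}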
I can't find an answer for this. All the examples I've seen find, for example, the value 123456 of the field Number in the collection.
But what I want to know is whether I can search the whole collection's documents for any of the numbers contained in an array.
For example, this is the FIRST document of my collection:
{
    "_id": "Qfjgftt7jLh3njw8X",
    "reference": 102020026, // Field that I want to check
    "type": "Moradias",
    "n_beds": 5,
    "price": 30000,
    "departement": "Portalegre",
    "city": "Ponte de Sor",
    "street_name": "Rua do Comércio, CE",
    "l_surface": 248,
    "is_exclusive": true
}
The client is importing an XLSX file that is converted to JSON and added to my collection, so the reference number is inserted by the client.
Then I want to loop over my JSON (there might be more than one reference), get the references imported by the client, and check whether those references already exist in one of my collection's documents. If they do, I'll throw an error.
I know I can do Collection.find({reference: '102020026'}) but the thing is that there can be 100+ different references in my collection.
I hope this makes sense and hasn't been answered somewhere else.
Thanks in advance!
You might want to use the $in mongo operator (see doc) to find all documents by a given array:
const ids = [
    102020026,
    102021407,
    102023660,
];

Collection.find({reference: {$in: ids}});
This returns all documents where reference matches one of the numbers given in the `ids` array.
Here's the solution in my case:
Using the Mongo $in operator, as answered by @Jankapunkt:
let arr = [];
for (let c = 0; c < myJSON.length; c++) {
    arr.push(myJSON[c].reference);
}

if (Collection.find({reference: {$in: arr}}).count() > 1) {
    alert(`Reference ${arr} already exists`);
    throw new Error('Duplicated reference');
}
And without the $in Mongo operator, just using a for loop that ends up doing the same thing:
for (let c = 0; c < myJSON.length; c++) {
    if (Collection.find({reference: myJSON[c].reference}).count() > 1) {
        alert(`Reference ${myJSON[c].reference} already exists`);
        throw new Error('Duplicated reference');
    }
}
I'm working with a large dataset that needs to be efficient with its Mongo queries. The application uses the Ford-Fulkerson algorithm to calculate recommendations and runs in polynomial time, so efficiency is extremely important. The syntax is ES6, but everything is basically the same.
This is an approximation of the data I'm working with. An array of items and one item being matched up against the other items:
let items = ["pen", "marker", "crayon", "pencil"];
let match = "sharpie";
Eventually, we will iterate over match and increase the weight of the pairing by 1. So, after going through the function, my ideal data looks like this:
{
    sharpie: {
        pen: 1,
        marker: 1,
        crayon: 1,
        pencil: 1
    }
}
To further elaborate, the value next to each key is the weight of that relationship, which is to say, the number of times those items have been paired together. What I would like to have happen is something like this:
// For each item in the items array, check to see if the pairing already
// exists. If it does, increment. If it does not, create it.
_.each(items, function(item, i) {
    Database.upsert({ match: { $exist: true }}, { match: { $inc: { item: 1 } } });
})
The problem, of course, is that Mongo does not allow bracket notation, nor does it allow for variable names as keys (match). The other problem, as I've learned, is that Mongo also has problems with deeply nested $inc operators ('The dollar ($) prefixed field \'$inc\' in \'3LhmpJMe9Es6r5HLs.$inc\' is not valid for storage.' }).
Is there anything I can do to make this in as few queries as possible? I'm open to suggestions.
EDIT
I attempted to create objects to pass into the Mongo query:
_.each(items, function(item, i) {
    let selector = {};
    selector[match] = {};
    selector[match][item] = {};

    let modifier = {};
    modifier[match] = {};
    modifier[match]["$inc"] = {};
    modifier[match]["$inc"][item] = 1

    Database.upsert(selector, modifier);
});
Unfortunately, it still doesn't work. The $inc breaks the query and it won't let me go more than 1 level deep to change anything.
Solution
This is the function I ended up implementing. It works like a charm! Thanks Matt.
_.each(items, function(item, i) {
    let incMod = {$inc: {}};
    let matchMod = {$inc: {}};
    matchMod.$inc[match] = 1;
    incMod.$inc[item] = 1;
    Database.upsert({node: item}, matchMod);
    Database.upsert({node: match}, incMod);
});
I think the trouble comes from your ER model. A sharpie isn't a standalone entity; a sharpie is an item. The relationship between 1 item and other items is such that 1 item has many items (1:M recursive) and each item-pairing has a weight.
Fully normalized, you'd have an items table & a weights table. The items table would have the items. The weights table would have something like item1, item2, weight (in doing so, you can have asymmetrical weighting, e.g. sharpie:pencil = 1, pencil:sharpie = .5, which is useful when calculating pushback in the FFA, but I don't think that applies in your case).
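In MongoDB terms, that fully normalized weights table would translate to a collection of small documents, something like the following (a hypothetical sketch; the rest of this answer recommends a nested shape instead):

// one document per pairing in a hypothetical "weights" collection
{
    item1: "sharpie",
    item2: "pencil",
    weight: 1
}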
Great, now let's mongotize it.
When we say 1 item has many items, that "many" is probably not going to exceed a few thousand (think 16MB document cap). That means it's actually 1-to-few, which means we can nest the data, either using subdocs or fields.
So, let's check out that schema!
doc = {
    _id: "sharpie",
    crayon: 1,
    pencil: 1
}
What do we see? sharpie isn't a key, it's a value. This makes everything easy. We leave the items as fields. The reason we don't use an array of objects is because this is faster & cleaner (no need to iterate over the array to find the matching _id).
var match = "sharpie";
var items = ["pen", "marker", "crayon", "pencil"];
var incMod = {$inc:{}};
var matchMod = {$inc:{}};
matchMod.$inc[match] = 1;
for (var i = 0; i < items.length; i++) {
Collection.upsert({_id: items[i]}, matchMod);
incMod.$inc[items[i]] = 1;
}
Collection.upsert({_id: match}, incMod);
That's the easy part. The hard part is figuring out why you want to use an FFA for a suggestion engine :-P.
I have a perl hash which I'm looping over to build a JavaScript array. The JavaScript array starts out with a length of 0 when I initiate it; however, it quickly grows to 1001 in the first pass, 2001 in the second, and 4001 in the third pass. I'm expecting the length to be 3! Here's the code, with the perl hash following.
Code
var offers = [];
% foreach my $amount (keys %$offers) {
    offers['<% $amount %>'] = [];
    console.log(offers.length);
% }
Perl Hash
{
    '1000' => {
        '6' => {
            'payment' => '173.49',
            'fee' => '2',
            'APR' => '13.9'
        },
        '4' => {
            'payment' => '256.23',
            'fee' => '2',
            'APR' => '11.9'
        }
    },
    '2000' => {
        '6' => {
            'payment' => '346.98',
            'fee' => '2',
            'APR' => '13.9'
        },
        '4' => {
            'payment' => '512.46',
            'fee' => '2',
            'APR' => '11.9'
        }
    },
    '4000' => {
        '6' => {
            'payment' => '693.96',
            'fee' => '2',
            'APR' => '13.9'
        },
        '4' => {
            'payment' => '1024.92',
            'fee' => '2',
            'APR' => '11.9'
        }
    }
};
Try
var offers = [];
% foreach my $amount (keys %$offers) {
    offers.push('<% $amount %>');
    console.log(offers.length);
% }
I think what you want is an associative array / object. If you want the data to be identified through code such as offers['1000'] and yet not have 1,000 elements, then you simply need to initialize the offers like this:
var offers = {};
and leave the rest of your code unchanged. There will not be a length property any longer, but you will only be creating one entry rather than 1,000 for each item being stored.
You can iterate through the data by doing like this:
var offer;
for (offer in offers) {
    /* do something with offers[offer] here */
}
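If you still want the count that length used to give you, Object.keys provides it:

console.log(Object.keys(offers).length);  // 3 for the sample hash above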