Is there any way we can query and get location data using a MongoDB geospatial query that matches the following criteria?
Getting all locations that are part of intersection between two boxes or in general two polygons.
For example, can we get in the query output only those locations that are within the yellow area, which is the common area of the purple and red geometric objects [ polygons ]?
My study of the MongoDB documentation so far:
http://docs.mongodb.org/manual/reference/operator/query/geoWithin/
This provides results that are within one or more polygons [ I am looking for the intersection of these individual polygon results as output ]
Use case
db.places.find( {
loc: { $geoWithin: { $box: [ [ 0, 0 ], [ 100, 100 ] ] } }
} )
The above query provides results within a rectangular geometric area [ I am looking for locations that are common to two such individual queries ]
db.places.find( {
loc: { $geoWithin: { $box: [ [ 0, 0 ], [ 100, 100 ] ] } }
} )
db.places.find( {
loc: { $geoWithin: { $box: [ [ 50, 50 ], [ 90, 120 ] ] } }
} )
So looking at this with a fresh mind the answer is staring me in the face. The key thing that you have already stated is that you want to find the "intersection" of two queries in a single response.
Another way to look at this is you want all of the points bound by the first query to then be "input" for the second query, and so on as required. That is essentially what an intersection does, but the logic is actually literal.
So just use the aggregation framework to chain the matching queries. For a simple example, consider the following documents:
{ "loc" : { "type" : "Point", "coordinates" : [ 4, 4 ] } }
{ "loc" : { "type" : "Point", "coordinates" : [ 8, 8 ] } }
{ "loc" : { "type" : "Point", "coordinates" : [ 12, 12 ] } }
And the chained aggregation pipeline, just two queries:
db.geotest.aggregate([
{ "$match": {
"loc": {
"$geoWithin": {
"$box": [ [0,0], [10,10] ]
}
}
}},
{ "$match": {
"loc": {
"$geoWithin": {
"$box": [ [5,5], [20,20] ]
}
}
}}
])
So if you consider that logically, the first $match will find the points that fall within the bounds of the initial box, i.e. the first two points. Those results are then acted on by the second query, and since the new box bounds start at [5,5] that excludes the first point. The third point was already excluded, but if the box restrictions were reversed, the result would still be the middle document only.
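So, run against the three sample documents, only the middle point falls inside both boxes, and the expected result should be just:
{ "loc" : { "type" : "Point", "coordinates" : [ 8, 8 ] } }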
How this works is quite unique to the $geoWithin query operator as compared to various other geo functions:
$geoWithin does not require a geospatial index. However, a geospatial index will improve query performance. Both 2dsphere and 2d geospatial indexes support $geoWithin.
So the results are both good and bad. Good in that you can do this type of operation without an index in place, but bad because once the aggregation pipeline has altered the collection results after the first query operation, no further index can be used. So any performance benefit of an index is lost when merging the "set" results from anything after the initial Polygon/MultiPolygon that is supported.
For this reason I would still recommend that you calculate the intersection bounds "outside" of the query issued to MongoDB. Even though the aggregation framework can do this due to the "chained" nature of the pipeline, and even though resulting intersections will get smaller and smaller, your best performance is a single query with the correct bounds that can use all of the index benefits.
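For the simple $box case from the question, the bounds can even be computed by hand, since the intersection of two axis-aligned boxes is itself a box: take the maximum of the two lower-left corners and the minimum of the two upper-right corners. For the two boxes in the question that works out to a single query:
db.places.find( {
    loc: { $geoWithin: { $box: [ [ 50, 50 ], [ 90, 100 ] ] } }
} )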
There are various methods for doing that, but for reference here is an implementation using the JSTS library, which is a JavaScript port of the popular JTS library for Java. There may be other libraries or other language ports, but this one has simple GeoJSON parsing and built-in methods for such things as getting the intersection bounds:
var async = require('async'),
    util = require('util'),
    jsts = require('jsts'),
    mongo = require('mongodb'),
    MongoClient = mongo.MongoClient;
var parser = new jsts.io.GeoJSONParser();
var polys = [
{
type: 'Polygon',
coordinates: [[
[ 0, 0 ], [ 0, 10 ], [ 10, 10 ], [ 10, 0 ], [ 0, 0 ]
]]
},
{
type: 'Polygon',
coordinates: [[
[ 5, 5 ], [ 5, 20 ], [ 20, 20 ], [ 20, 5 ], [ 5, 5 ]
]]
}
];
var points = [
{ type: 'Point', coordinates: [ 4, 4 ] },
{ type: 'Point', coordinates: [ 8, 8 ] },
{ type: 'Point', coordinates: [ 12, 12 ] }
];
MongoClient.connect('mongodb://localhost/test',function(err,db) {
if (err) throw err;
db.collection('geotest',function(err,geo) {
if (err) throw err;
async.series(
[
// Insert some data
function(callback) {
var bulk = geo.initializeOrderedBulkOp();
bulk.find({}).remove();
async.each(points,function(point,callback) {
bulk.insert({ "loc": point });
callback();
},function(err) {
bulk.execute(callback);
});
},
// Run each version of the query
function(callback) {
async.parallel(
[
// Aggregation
function(callback) {
var pipeline = [];
polys.forEach(function(poly) {
pipeline.push({
"$match": {
"loc": {
"$geoWithin": {
"$geometry": poly
}
}
}
});
});
geo.aggregate(pipeline,callback);
},
// Using external set resolution
function(callback) {
var geos = polys.map(function(poly) {
return parser.read( poly );
});
var bounds = geos[0];
for ( var x=1; x<geos.length; x++ ) {
bounds = bounds.intersection( geos[x] );
}
var coords = parser.write( bounds );
geo.find({
"loc": {
"$geoWithin": {
"$geometry": coords
}
}
}).toArray(callback);
}
],
callback
);
}
],
function(err,results) {
if (err) throw err;
console.log(
util.inspect( results.slice(-1), false, 12, true ) );
db.close();
}
);
});
});
The full GeoJSON "Polygon" representations are used there, as this translates to what JTS can understand and work with. Chances are any input you might receive for a real application would be in this format as well, rather than applying conveniences such as $box.
So it can be done with the aggregation framework, or even parallel queries merging the "set" of results. But while the aggregation framework may do it better than merging sets of results externally, the best results will always come from computing the bounds first.
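For completeness, here is a minimal sketch of that "parallel queries" approach, merging the two result sets by _id on the client. It assumes the same geo collection handle and async module as in the listing above:
async.parallel(
    [
        function(callback) {
            geo.find({
                "loc": { "$geoWithin": { "$box": [ [0,0], [10,10] ] } }
            }).toArray(callback);
        },
        function(callback) {
            geo.find({
                "loc": { "$geoWithin": { "$box": [ [5,5], [20,20] ] } }
            }).toArray(callback);
        }
    ],
    function(err,results) {
        if (err) throw err;
        // keep only the documents that appear in both result sets
        var ids = results[1].map(function(doc) {
            return doc._id.toString();
        });
        var intersection = results[0].filter(function(doc) {
            return ids.indexOf(doc._id.toString()) != -1;
        });
        console.log(intersection);
    }
);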
In case anyone else looks at this, as of mongo version 2.4, you can use $geoIntersects to find the intersection of GeoJSON objects, which supports intersections of two polygons, among other types.
{
<location field>: {
$geoIntersects: {
$geometry: {
type: "<GeoJSON object type>" ,
coordinates: [ <coordinates> ]
}
}
}
}
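For example, a sketch of such a query, assuming a places collection with GeoJSON loc fields like the ones above:
db.places.find( {
    loc: {
        $geoIntersects: {
            $geometry: {
                type: "Polygon",
                coordinates: [[ [ 0, 0 ], [ 0, 10 ], [ 10, 10 ], [ 10, 0 ], [ 0, 0 ] ]]
            }
        }
    }
} )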
There is a nice write-up on this blog.
Related
I have a document that holds lists containing nested objects. The document simplified looks like this:
{
"username": "user",
"listOne": [
{
"name": "foo",
"qnty": 5
},
{
"name": "bar",
"qnty": 3
},
],
"listTwo": [
{
"id": 1,
"qnty": 13
},
{
"id": 2,
"qnty": 9
},
]
}
And I need to update the quantity in the lists based on an identifier. For list one it was easy. I was doing something like this:
db.collection.findOneAndUpdate(
{
"username": "user",
"listOne.name": name
},
{
$inc: {
"listOne.$.qnty": qntyChange,
}
}
)
Then I would catch whenever find failed because there was no object in the list with that name and nothing was updated, and do a new operation with $push. Since this is a rarer case, it didn't bother me to do two queries in the database collection.
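That fallback looked something like this sketch (with name and qntyChange as in the snippet above):
db.collection.updateOne(
    {
        "username": "user"
    },
    {
        $push: {
            "listOne": { "name": name, "qnty": qntyChange }
        }
    }
)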
But now I had to also add list two to the document. And since the identifiers are not the same I would have to query them individually. Meaning four searches in the database collection, in the worst case scenario, if using the same strategy I was using before.
So, to avoid this, I wrote an update using an aggregation pipeline. What it does is:
1. Look if there is an object in list one with the queried identifier.
2. If true, map through the entire array and:
2.1) Return the same object if the identifier is different.
2.2) Return the object with the quantity changed when the identifier matches.
3. If false, push a new object with this identifier to the list.
4. Repeat for list two.
This is the pipeline for list one:
db.coll1.updateOne(
{
"username": "user"
},
[{
"$set": {
"listOne": {
"$cond": {
"if": {
"$in": [
name,
"$listOne.name"
]
},
"then": {
"$map": {
"input": "$listOne",
"as": "one",
"in": {
"$cond": {
"if": {
"$eq": [
"$$one.name",
name
]
},
"then": {
"$mergeObjects": [
"$$one",
{
"qnty": {
"$add": [
"$$one.qnty",
qntyChange
]
}
}
]
},
"else": "$$one"
}
}
}
},
"else": {
"$concatArrays": [
"$listOne",
[
{
"name": name,
"qnty": qntyChange
}
]
]
}
}
}
}
}]
);
The entire pipeline can be found on this Mongo Playground.
So my question is about how efficient this is. As I am paying for server time, I would like to use an efficient solution to this problem. Querying the collection four times, or even just twice on every call, seems like a bad idea, as the collection will have thousands of entries. The two lists, on the other hand, are not that big and should not exceed a thousand elements each. But the way it's written, it looks like it will iterate over each list about two times.
And besides, what worries me the most is: when I use $map to change the list and return the same object in cases where the identifier does not match, does MongoDB rewrite these elements too? Because not only would that increase my time on the server rewriting the entire list with the same objects, but it would also count towards the byte size of my write operation, which is also charged by MongoDB.
So if anyone has a better solution to this, I'm all ears.
According to this SO answer,
What you actually do inside of the document (push around an array, add a field) should not have any significant impact on the total cost of the operation
So, in your case, your array operations should not be causing a heavy impact on the total cost.
I am really new to MongoDB and NoSQL databases in general.
I have this user schema:
const postSchema = {
title: String,
posted_on: Date
}
const userSchema = {
name: String,
posts: [postSchema]
}
I want to retrieve the posts by a user in a given range (/api/users/:userId/posts?from=date&to=date&limit=limit) using a MongoDB query. In a relational database, we would generally create two different tables and query the second table (posts) with some condition to get the required result.
How can we achieve the same in MongoDB? I have tried using $elemMatch by referring to this, but it doesn't seem to work.
There are 2 ways to do it with the aggregation framework, which can do much more than a find can.
With find we mostly select documents from a collection, or project to keep some fields from a document that is selected, but here you need only some members of an array, so aggregation is used.
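(A find projection with $elemMatch, for example, returns at most the first matching array element, which is presumably why that attempt did not work for a date range plus a limit. A sketch of such an attempt, using the same numbers-for-dates convention as the queries below:)
db.collection.find(
    { "name": "n1" },
    { "posts": { "$elemMatch": { "posted_on": { "$gt": 1, "$lt": 5 } } } }
)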
Local way (solution at document level) no unwind etc
Test code here
Query
filter the array and keep only posted_on >1 and <5
(I used numbers for simplicity; with dates it's the same)
take the first 2 elements of the array (limit 2)
db.collection.aggregate([
{
"$match": {
"name": {
"$eq": "n1"
}
}
},
{
"$set": {
"posts": {
"$slice": [
{
"$filter": {
"input": "$posts",
"cond": {
"$and": [
{
"$gt": [
"$$this.posted_on",
1
]
},
{
"$lt": [
"$$this.posted_on",
5
]
}
]
}
}
},
2
]
}
}
}
])
Unwind solution (solution at collection level)
(it's a bit smaller, but keeping things local is better; in your case it doesn't matter)
Test code here
Query
match user
unwind the array, and make each member to be ROOT
match the dates >1 and <5
limit 2
db.collection.aggregate([
{
"$match": {
"name": {
"$eq": "n1"
}
}
},
{
"$unwind": {
"path": "$posts"
}
},
{
"$replaceRoot": {
"newRoot": "$posts"
}
},
{
"$match": {
"$and": [
{
"posted_on": {
"$gt": 1
}
},
{
"posted_on": {
"$lt": 5
}
}
]
}
},
{
"$limit": 2
}
])
I need to know the best way to get the following results
courseFrequency : [
{
'courses': [
'a.i'
],
'count' : 1
},
{
'courses': [
'robotics'
],
'count' : 2
},
{
'courses': [
'software engineering', 'a.i'
],
'count' : 2
},
{
'courses': [
'software engineering', 'a.i','robotics'
],
'count' : 1
}
]
from the following JSON data.
arr = [
{
'courses': [
'a.i'
]
},
{
'courses': [
'robotics'
]
},
{
'courses': [
'software engineering', 'a.i'
]
},
{
'courses': [
'robotics'
]
},
{
'courses': [
'software engineering', 'a.i'
]
},
{
'courses': [
'software engineering', 'a.i', 'robotics'
]
}];
Basically I need to find out the unique course combinations and their frequency. What is the optimal way to do that?
const hash = {}, result = [];
for (const { courses } of arr) {
    // use the joined course list as a lookup key
    const k = courses.join("$");
    if (hash[k]) {
        // seen this combination before: bump its count
        hash[k].count++;
    } else {
        // first occurrence: record it in both the hash and the output
        result.push(hash[k] = { courses, count: 1 });
    }
}
Simply use a hashmap to find duplicates. As arrays are compared by reference, we need to join each array into a string to use as a key (note that this will fail if a course name contains the joining symbol ($)).
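Running this against the sample data above should leave result matching the desired courseFrequency output:
console.log(result);
// [ { courses: [ 'a.i' ], count: 1 },
//   { courses: [ 'robotics' ], count: 2 },
//   { courses: [ 'software engineering', 'a.i' ], count: 2 },
//   { courses: [ 'software engineering', 'a.i', 'robotics' ], count: 1 } ]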
Both of them are best for the areas they relate to. Each has its own properties and methods to accomplish certain tasks: JSON is used for data transfer and as a common data format across browsers, while arrays are really good at storing ordered lists and ordering things, though the cost of removing/splicing elements is a bit higher.
JSON is a representation of a data structure; it is not itself an object or an array.
JSON can be used to send data from the server to the browser, for example, because it is easy for JavaScript to parse into a normal JavaScript data structure. To act on JSON data you need to convert it into an object, which then behaves much like any other object (or array).
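For instance, a minimal sketch of that conversion:
// parse a JSON string into a normal JavaScript object, then act on it
var data = JSON.parse('{ "courses": [ "a.i", "robotics" ] }');
console.log(data.courses.length); // 2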
I'm using MongoDB 2.6.6
I have these documents in a MongoDB collection and here is an example:
{ ..., "field3" : { "one" : [ ISODate("2014-03-18T05:47:33Z"),ISODate("2014-06-02T20:00:25Z") ] }, ...}
{ ..., "field3" : { "two" : [ ISODate("2014-03-18T05:47:33Z"),ISODate("2014-06-02T20:00:25Z") ] }, ...}
{ ..., "field3" : { "three" : [ ISODate("2014-03-18T05:47:39Z"),ISODate("2014-03-19T20:18:38Z") ] }, ... }
I would like to merge these documents into one field. For example, I would like the new result to be as follows:
{ "field3", : { "all" : [ ISODate("2014-03-18T05:47:39Z"),ISODate("2014-03-19T20:18:38Z"),...... ] },}
I'm just not sure any more how to have that result!
Doesn't really leave much to go on here but you can arguably get the kind of merged result with mapReduce:
db.collection.mapReduce(
function() {
var field = this.field3;
Object.keys(field).forEach(function(key) {
field[key].forEach(function(date) {
emit( "field3", { "all": [date] } )
});
});
},
function (key,values) {
var result = { "all": [] };
values.forEach(function(value) {
value.all.forEach(function(date) {
result.all.push( date );
});
});
result.all.sort(function(a,b) { return a.valueOf()-b.valueOf() });
return result;
},
{ "out": { "inline": 1 } }
)
Which, being mapReduce, is not exactly the same output format, given its own restrictions on how it does things:
{
"results" : [
{
"_id" : "field3",
"value" : {
"all" : [
ISODate("2014-03-18T05:47:33Z"),
ISODate("2014-03-18T05:47:33Z"),
ISODate("2014-03-18T05:47:39Z"),
ISODate("2014-03-19T20:18:38Z"),
ISODate("2014-06-02T20:00:25Z"),
ISODate("2014-06-02T20:00:25Z")
]
}
}
],
"timeMillis" : 86,
"counts" : {
"input" : 3,
"emit" : 6,
"reduce" : 1,
"output" : 1
},
"ok" : 1
}
Since the aggregation here into a single document is fairly arbitrary you could pretty much argue that you simply take the same kind of approach in client code.
At any rate this is only going to be useful over a relatively small set of data, with much the same sort of restrictions on the client processing. You can go beyond the 16MB BSON limit that applies within MongoDB, but you are certainly limited by the memory to be consumed.
So I presume you would need to add a "query" argument but it's not really clear from your question. Either using mapReduce or your client code, you are basically going to need to follow this sort of process to "mash" the arrays together.
I would personally go with the client code here.
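A minimal sketch of that client code (Node.js, assuming a connected db handle and the same collection and field3 layout as above):
db.collection('collection').find({}).toArray(function(err,docs) {
    if (err) throw err;
    var all = [];
    docs.forEach(function(doc) {
        // flatten every array under field3, whatever its key is named
        Object.keys(doc.field3).forEach(function(key) {
            all = all.concat(doc.field3[key]);
        });
    });
    // sort the merged dates ascending
    all.sort(function(a,b) { return a.valueOf() - b.valueOf(); });
    console.log({ "field3": { "all": all } });
});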
I have a collection of JSON values that has 3 levels:
cluster > segment > node
Where each cluster is made of segments and each segment is made up of nodes. I am trying to figure out how to represent this as a JSON object and I am unsure how to create the structure.
Each node contains an id and a reference to its segment id and cluster id. I have written up a test object like this:
var customers = [
{
"cluster" :
{"flights":4, "profit":5245, "clv":2364,
"segment" :
{ "flights":2, "profit":2150, "clv":1564,
"node" :
{ 'xpos': 1, 'ypos': 2 }// closes node
}// closes segment
}//closes cluster
},
{
"cluster" :
{"flights":4, "profit":5245, "clv":2364,
"segment" :
{ "flights":2, "profit":2150, "clv":1564,
"node" :
{ 'xpos': 1, 'ypos': 2 }// closes node
}// closes segment
}//closes cluster
}
];
The part that feels a bit flaky is the way segment and node are nested. I am not getting any errors but is this the best way to represent this data?
EDIT:
Thanks for the answers, it definitely pointed me in the right direction as far as tools to use (jsonlint) and get a better understanding of structuring data in json. They're all correct answers which shows me that it was a pretty basic question. Thanks again.
The kind of JSON you have is perfectly valid (the idea of an object nested in an object), if not necessarily syntactically correct (I didn't verify that all your commas were in the right place).
However, you don't have what you said you wanted, which is a collection of segments in a cluster, and a collection of nodes in a segment.
change it to be
[{
"cluster": {..,
"segments": [{ <--- note the array -- you now have a collection
"name": 'segment1', <- optional, just here to show multiple segments
"nodes": [{....}] <-- same here
},
{
"name": 'segment2',
"nodes": [{....}]
}]
}
}]
I think this looks alright for the most part. However, note the following:
JSON keys and values should be in double quotes (") and not single quotes ('). Look at your xpos and ypos values to see what I mean. I usually use JSONLint to ensure that my JSON is valid.
You say that clusters have a collection of segments and segments have a collection of nodes. This might be best represented as arrays.
It also looks like you want multiple clusters. That is also best expressed as an array.
So something of the form (greatly exaggerated the indentation, hopefully that will help):
{
"cluster" : [
{
"flights": 4,
"profit": 5245,
"clv": 2364,
"segment" : [
{
"flights": 2,
"profit": 2150,
"clv": 1564,
"node" : [
{
"xpos": 1,
"ypos": 2
},
{
//node 2
}
]
},
{
//segment 2
}
]
},
{
//next cluster
}
]
}
There is nothing wrong with the nesting; however, if each cluster can contain multiple segments, and each segment can in turn have multiple nodes, then you ought to use arrays.
{
"cluster": {
"flights": 4,
...,
"segments": [ // segments is an array
{
"flights": 6,
"nodes": [ // nodes is an array
{ "xpos": 4, "ypos": 6 },
{ "xpos": 1, "ypos": 6 },
{ third node },
...
]
},
{ second segment },
...
]
}
}
Seems fine to me, though out of habit I check everything in http://www.jsonlint.com and the slightly 'fixed' version validates (remove your single quotes and ensure you name the structure):
{
"customers": [
{
"cluster" : {
"flights": 4,
"profit": 5245,
"clv": 2364,
"segment" : {
"flights": 2,
"profit": 2150,
"clv": 1564,
"node" : {
"xpos": 1,
"ypos": 2
}
}
}
},
{
"cluster" : {
"flights": 4,
"profit": 5245,
"clv": 2364,
"segment" : {
"flights": 2,
"profit": 2150,
"clv": 1564,
"node" : {
"xpos": 1,
"ypos": 2
}
}
}
}
]
}
As a note, if you were to let jQuery or another plugin do the 'JSONification' it would turn out the same. As has also been noted, you're not representing the segments, etc. as a collection (this is where I personally find building the object to be an easier representation).
.. ala (but build your object out):
var stuff = {};
stuff.customers = [];
// add a cluster to the collection
stuff.customers.push(new Cluster());
// add a segment to that cluster
stuff.customers[0].segment.push(new Segment());
// ...etc. -- fill out the rest of the object
$.toJSON({ "customers": stuff.customers });
function Cluster(){
    this.flights = null;
    this.profit = null;
    this.clv = null;
    this.segment = [];
}
function Segment(){
    this.flights = null;
    this.profit = null;
    this.clv = null;
    this.node = [];
}
function Node(){
    this.xpos = null;
    this.ypos = null;
}
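A hypothetical usage of those constructors, building one cluster containing one segment and one node:
var c = new Cluster();
c.flights = 4; c.profit = 5245; c.clv = 2364;
var s = new Segment();
s.flights = 2; s.profit = 2150; s.clv = 1564;
var n = new Node();
n.xpos = 1; n.ypos = 2;
s.node.push(n);
c.segment.push(s);
console.log(JSON.stringify({ "customers": [ c ] }));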
Here's an improvement to the logic with no loss of meaning:
var customers = [
{
"ID" : "client ABC",
"cluster" : { "ID": "cluster 123", "flights": 4, "profit": 5245, "clv": 2364 },
"segment" : { "ID": "segment 456", "flights": 2, "profit": 2150, "clv": 1564 },
"node" : { "xpos" : 1, "ypos" : 2 }
}, {
"ID" : "client DEF",
"cluster" : { "ID": "cluster 789", "flights": 4, "profit": 5245, "clv": 2364 },
"segment" : { "ID": "segment 876", "flights": 2, "profit": 2150, "clv": 1564 },
"node" : { "xpos" : 1, "ypos" : 2 }
}
];
In the above, the actual 'levels' are
clusters > flights etc & segments > flights etc & nodes > xpos etc
which could also be written:
level 1: clusters
level 2: flights, profit, & clv (note: values are unique from segments tho labels are identical)
level 1: segments
level 2: flights, profit, & clv
level 1: nodes
level 2: xpos & ypos
Ok, let's agree the OP's example (as initially written) can meet the strict mechanical requirements of the JSON spec.
However, the OP describes 3 'levels', illustrating them as cluster > segment > node. The word 'level' and the arrows only make any sense if there is a semantic relationship between those objects. After all, 'levels' must relate to each other in a hierarchy, inheritance, sequence or some similarly layered fashion.
The original example gives no hint of the relationship between any part of a cluster and any part of a segment or any part of a node; it gives no way to guess what the relationship should be. The labels just sit adjacent to each other in the example, with a few extraneous braces around them.
Without an apparent relationship to encode, each of these keys most logically names a unique property of a 'customer' object--that is to say, each customer has clusters, segments and nodes. Each property is clearly labeled, and each can happily coexist in a flat structure. If OP has more info on relationships that require levels, the structure is easy to modify.
In short, nesting should have a semantic purpose; if it does not, markers of nesting should be omitted. As presented, much of the JSON syntax in the OP's example has no apparent meaning and introduces logical issues. The revision resolves these issues as well as possible with the given information.