Aggregating data from MongoDB: map reduce or some other way? - javascript

I am struggling with an aggregation problem. I thought the easiest way to solve it would be to use map reduce, or to make separate find queries and then loop through the results with the help of the async library.
The schema is here:
db.keyword
keyword: String
start: Date
source: String (only one of 'google', 'yahoo', 'bing', 'duckduckgo')
job: ref db.job
results: [
{
title: String
url: String
position: Number
}
]
db.job
name: String
keywords: [ String ]
urls: [ String ]
sources: [ String ('google', 'yahoo', 'bing', 'duckduckgo') ]
Now I need to get the data into this form:
data = {
categories: [ 'keyword1', 'keyword2', 'keyword3' ],
series: [
{
name: 'google',
data: [33, 43, 22]
},
{
name: 'yahoo',
data: [12, 5, 3]
}
]
}
The biggest problem is that the series[0].data array requires a really difficult find: matching db.job.urls against db.keyword.results.url and then getting the position.
Is there any way to simplify the query? I have looked through many map reduce examples, but I can't work out which data to map and which to reduce.

It looks as though you are trying to combine data from two separate collections (keyword and job).
Map Reduce as well as the new Aggregation Framework can only operate on a single collection at a time.
Your best bet is probably to query each collection separately and programmatically combine the results, saving them in whichever form is best suited to your application.
If you would like to experiment with Map Reduce, here is a link to a blog post written by a user who used an incremental Map Reduce operation to combine values from two collections.
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
For more information on using Map Reduce with MongoDB, please see the Mongo Documentation:
http://www.mongodb.org/display/DOCS/MapReduce
(The section on incremental Map Reduce is here: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-IncrementalMapreduce)
There are some additional Map Reduce examples in the MongoDB Cookbook:
http://cookbook.mongodb.org/
For a step-by-step walkthrough of how a Map Reduce operation is run, please see the "Extras" section of the MongoDB Cookbook recipe "Finding Max And Min Values with Versioned Documents" http://cookbook.mongodb.org/patterns/finding_max_and_min/
Hopefully the above will give you some ideas for how to achieve your desired results. As I mentioned, I believe that the most straightforward solution is simply to combine the results programmatically. However, if you are successful writing a Map Reduce operation that does this, please post your solution, so that the Community may gain the benefit of your experience.
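A minimal sketch of the "combine the results programmatically" approach, in plain JavaScript. It assumes the job document and its keyword documents have already been fetched with two separate find queries. The function name buildChartData, the choice of the lowest position when several results match, and the use of null for a missing match are illustrative assumptions, not something the question specifies.

```javascript
// Assumption: `job` and `keywordDocs` were already loaded with two separate
// find() queries; their shapes follow the schema given in the question.
function buildChartData(job, keywordDocs) {
  const jobUrls = new Set(job.urls);
  // best[source][keyword] = lowest position of any result whose url belongs to the job
  const best = {};
  for (const doc of keywordDocs) {
    for (const result of doc.results) {
      if (!jobUrls.has(result.url)) continue;
      const bySource = best[doc.source] || (best[doc.source] = {});
      const current = bySource[doc.keyword];
      if (current === undefined || result.position < current) {
        bySource[doc.keyword] = result.position;
      }
    }
  }
  return {
    categories: job.keywords,
    series: job.sources.map((source) => ({
      name: source,
      // null marks a keyword with no matching url for this source (an assumption)
      data: job.keywords.map((keyword) => {
        const bySource = best[source] || {};
        return bySource[keyword] === undefined ? null : bySource[keyword];
      }),
    })),
  };
}
```

Building the Set of job urls first keeps the matching step to a single pass over the keyword documents.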

Related

Algolia search on nested objects in a record - multiple facetFilters in one object

I'm migrating from Mongo to Firebase with Algolia on top to provide the search, but I've hit a snag coming up with a comparable way to search within individual elements of a record.
I have an object that stores when a room is available: from and to. Each record can have many individual from/to combos (see the sample below with 2). I want to be able to run a search something like:
roomavailable.from <= 1522195200 AND roomavailable.to >=1522900799
But only have the query search a match within each element, not any facet in all elements. An element query in Mongo works like that. But if I run that query on the record listed below, it will return the record, because the two roomavailable objects satisfy the .from and .to query. I think.
Is there a way to ensure the search is looking only at matching a pair of .from and .to in an individual object/element?
Below is the pertinent part of the record stored in Algolia so you can see the structure.
"roomavailable": [
{
"_id": "rJbdWvY9M",
"from": 1522195200,
"to": 1522799999
},
{
"_id": "r1H_-vKqz",
"from": 1523923200,
"to": 1524268799
}
],
And here is the Mongo (mongoose) equivalent, where it's searching inside individual elements (this works):
$elemMatch: {
from: {
$lte: moment(dateArray[0]).utc().startOf('day').format()
},
to: {
$gte: moment(dateArray[1]).utc().endOf('day').format()
}
}
I have also tried this query, but it still seems to match the .from and the .to in any of the individual roomavailable elements rather than within a single one:
index.search({
query: '',
filters: filters,
facetFilters: ["roomavailable.from:1522195200", "roomavailable.to:1524268799"],
attributesToRetrieve: [
"roomavailable",
],
restrictHighlightAndSnippetArrays: true
})
I found a couple of posts on Algolia discussing using 1 bracket vs. 2 brackets in the facetFilters. I've tried both. Neither works.
Any suggestions would be awesome. Thanks!
Edit: See discussion on Algolia Discourse:
https://discourse.algolia.com/t/how-to-match-multiple-attributes-in-nested-object-with-numericfilters/4887/8
Hi @kanec, thanks for clarifying your question!
Indeed what @Alefort suggested (using roomavailable in a separate index) would be the easiest option since the query I mentioned above will definitely return the results you want. This will mean that you'll have to query the room availability index separately in order to get which IDs are available, so you'll have to use multiple-queries:
https://www.algolia.com/doc/api-reference/api-methods/multiple-queries/
That said, I asked our core API team to see if there's a more reasonable way to approach this issue, but I fear that this is a filter limitation that exists for performance reasons with arrays. You could transform your data structure into the following and index your rooms as an object instead:
[
{
"roomavailable": {
"0": {
"_id": "rJbdWvY9M",
"from": 1522195200,
"to": 1522799999
},
"1": {
"_id": "r1H_-vKqz",
"from": 1523923200,
"to": 1524268799
}
}
}
]
So you can apply the following filter:
{
"filters": "roomavailable.0.from <= 1522195200 AND roomavailable.0.to >= 1522799999 AND roomavailable.1.from <= 1522195200 AND roomavailable.1.to >=1522900799"
}
The downside of this is that you'll need to know the length of roomavailable in order to build the search query on the front-end (you can do so at indexing time by adding a roomavailable_count property). It will also probably be less performant with a considerable number of rooms per item; in that case, switching to a dedicated index makes total sense for the following reasons:
If in your backend you frequently update available rooms you won't impact the other indices' build time
Filters will perform better (as explained above)
Indexing strategy will be simpler to handle
Let me know what you think about this and if it helps you out.
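As a rough sketch of the "build the search query on the front-end" step, the helper below generates a filter string of the same shape as the one shown above from a slot count such as roomavailable_count. The function name buildAvailabilityFilter and its parameters are hypothetical, and the clauses are joined with AND only because that is how the example filter in this answer is written.

```javascript
// Hypothetical helper: builds the filter string for a record whose
// `roomavailable` object has `count` numbered slots ("0", "1", ...).
function buildAvailabilityFilter(count, from, to) {
  const clauses = [];
  for (let i = 0; i < count; i++) {
    clauses.push(`roomavailable.${i}.from <= ${from}`);
    clauses.push(`roomavailable.${i}.to >= ${to}`);
  }
  // Joined with AND, following the shape of the filter shown in the answer
  return clauses.join(" AND ");
}
```

The returned string would be passed as the filters parameter of the search call.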

How to optimize the performance of searching in two arrays of objects

There are two arrays of objects, one from the database and one from a CSV file. I need to compare the objects in both arrays by their phone and email properties and find the duplicates among them. Because of the odd structure of the database objects, I have to do the comparison in JavaScript. What is the best algorithm for comparing them and finding the duplicates?
To explain with a simple calculation:
There are 5000 contacts in my database, and a user may upload another 3000 contacts from a CSV file. Every time, we need to find the contacts that already exist in the database; those may be overwritten, and the rest should be inserted. If I compare the contacts row by row, that means 5000 database contacts x 3000 CSV contacts = 15,000,000 comparisons.
This is the situation I am facing at present, and it causes the system to hang. I need an efficient solution to this issue.
I am developing this with NodeJS and RethinkDB.
The database objects are structured exactly as shown below, and the same emails and phones may also appear in other contacts.
[{
id: 2349287349082734,
name: "ABC",
phones: [
{
id: 2234234,
flag: true,
value: 982389679823
},
{
id: 65234234,
flag: false,
value: 2979023423
}
],
emails: [
{
id: 22346234,
flag: true,
value: "test@domain.com"
},
{
id: 609834234,
flag: false,
value: "test2@domain.com"
}
]
}]
Please review fiddle code, if you want: https://jsfiddle.net/dipakchavda2912/eua1truj/
I have already done the indexing. The problem looks easy and familiar at first sight, but once concurrency comes into play it becomes really critical and CPU intensive.
If I understand the question correctly, you can use the lodash method differenceWith:
const l = require("lodash");
let csvContacts = []; // fill it with your values
let databaseContacts = []; // load these from your database
// diffArray holds the CSV contacts that have no match in the database,
// using the comparator below (equal `email` values count as a duplicate)
let diffArray = l.differenceWith(
  csvContacts,
  databaseContacts,
  (firstValue, secValue) => firstValue.email == secValue.email
);
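Note that differenceWith still compares contacts pair by pair, so it does not by itself avoid the 5000 x 3000 comparisons described in the question. A possible alternative, sketched here under assumptions, is to index all database phone and email values in Sets first and then check each CSV contact against them in a single pass. The function name splitDuplicates, the assumption that CSV contacts use the same phones/emails shape as database contacts, and the case-insensitive email comparison are all illustrative choices.

```javascript
// Collect every phone and email value of the database contacts into Sets,
// so each CSV contact can be checked with fast lookups instead of a nested loop.
function splitDuplicates(databaseContacts, csvContacts) {
  const knownPhones = new Set();
  const knownEmails = new Set();
  for (const contact of databaseContacts) {
    for (const p of contact.phones || []) knownPhones.add(String(p.value));
    for (const e of contact.emails || []) knownEmails.add(String(e.value).toLowerCase());
  }

  const duplicates = []; // CSV contacts sharing a phone or email with the database
  const fresh = [];      // CSV contacts that can simply be inserted
  for (const contact of csvContacts) {
    const isDuplicate =
      (contact.phones || []).some((p) => knownPhones.has(String(p.value))) ||
      (contact.emails || []).some((e) => knownEmails.has(String(e.value).toLowerCase()));
    (isDuplicate ? duplicates : fresh).push(contact);
  }
  return { duplicates, fresh };
}
```

With this layout the work grows with the total number of phone and email entries rather than with the product of the two contact counts.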

How do I query in an array of objects in mongodb

I have a db object looking like this:
{
user_name: 'string',
skills: [
{ skill: 'skill1', lvl: 3 }
],
wantsToLearn: [
{skill: 'skill2' }
]
}
I want to make a query that finds all users with a wantsToLearn skill matching one of my input user's skills (regardless of lvl) AND vice versa. Basically, I want to be able to find all users with a match between a skill and something they want to learn.
I have looked at the mongodb documentation and am still a bit clueless on how to do this the best way. I am new to databases in general except for some sql.
Any pointers would be very appreciated!
If you want to find all users matching a given skill, all you have to do is:
db.getCollection('yourCollection').find({"wantsToLearn.skill": "skill2" })
That's the way you query subdocuments in MongoDB, even in arrays
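To cover both directions of the match asked about in the question, the same dot notation can be combined with the $in operator. The sketch below only builds the query document from an input user; the function name buildSkillMatchQuery is hypothetical, and the field names are taken from the document shown in the question.

```javascript
// Builds a query for users who want to learn one of `user`'s skills
// AND who have a skill that `user` wants to learn (lvl is ignored).
function buildSkillMatchQuery(user) {
  const mySkills = user.skills.map((s) => s.skill);
  const myWishes = user.wantsToLearn.map((w) => w.skill);
  return {
    // Both conditions in one query document are combined with an implicit AND
    "wantsToLearn.skill": { $in: mySkills },
    "skills.skill": { $in: myWishes },
  };
}

// Usage sketch (not run here):
// db.getCollection('yourCollection').find(buildSkillMatchQuery(inputUser))
```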

MongoDB - Query conundrum - Document refs or subdocument

I've run into a bit of an issue with some data that I'm storing in my MongoDB (Note: I'm using mongoose as an ODM). I have two schemas:
mongoose.model('Buyer',{
credit: Number,
})
and
mongoose.model('Item',{
bid: Number,
location: { type: [Number], index: '2d' }
})
Buyer/Item will have a parent/child association, with a one-to-many relationship. I know that I can set up Items to be embedded subdocs to the Buyer document or I can create two separate documents with object id references to each other.
The problem I am facing is that I need to query Items whose bid is lower than the Buyer's credit but whose location is also near a certain geo coordinate.
To satisfy the first criteria, it seems I should embed Items as a subdoc so that I can compare the two numbers. But, in order to compare locations with a geoNear query, it seems it would be better to separate the documents, otherwise, I can't perform geoNear on each subdocument.
Is there any way that I can perform both tasks on this data? If so, how should I structure my data? If not, is there a way that I can perform one query and then a second query on the result from the first query?
Thanks for your help!
There is another option (besides embedding and normalizing) for storing hierarchies in mongodb, that is storing them as tree structures. In this case you would store Buyers and Items in separate documents but in the same collection. Each Item document would need a field pointing to its Buyer (parent) document, and each Buyer document's parent field would be set to null. The docs I linked to explain several implementations you could choose from.
If your items are stored in two separate collections, then the best option is to write your own function and call it using mongoose.connection.db.eval('some code...');. That way you can execute your advanced logic on the server side.
You can write something like this:
var allNearItems = db.Items.find(
{ location: {
$near: {
$geometry: {
type: "Point" ,
coordinates: [ <longitude> , <latitude> ]
},
$maxDistance: 100
}
}
});
var res = [];
allNearItems.forEach(function(item){
var buyer = db.Buyers.findOne({ id: item.buyerId });
if (!buyer) return; // skip this item; "continue" is not valid inside a forEach callback
if (item.bid < buyer.credit) {
res.push(item.id);
}
});
return res;
After evaluation (place it in a mongoose.connection.db.eval("...") call) you will get the array of item ids.
Use it with caution. If allNearItems is very large or you run the query very often, you may run into performance problems. The MongoDB team has actually deprecated direct JS code execution, but it was still available in the stable release current at the time of writing.
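Since the question also asks about running one query and then a second one on its result, the same filtering step can be sketched in application code instead of server-side evaluation. This assumes the nearby items and their buyers have already been fetched with two ordinary queries; the function name filterAffordableItems is hypothetical, and the field names buyerId, id, bid and credit follow the snippet above.

```javascript
// Assumptions: `nearItems` is the result of the geo query and `buyers` holds
// the Buyer documents referenced by those items.
function filterAffordableItems(nearItems, buyers) {
  // Index buyers by id so each item needs only one lookup
  const buyerById = new Map(buyers.map((b) => [String(b.id), b]));
  return nearItems
    .filter((item) => {
      const buyer = buyerById.get(String(item.buyerId));
      // Keep the item only if its buyer exists and the bid is below the credit
      return buyer !== undefined && item.bid < buyer.credit;
    })
    .map((item) => item.id);
}
```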

Efficient Sorted Data Structure in JavaScript

I'm looking for a way to take a bunch of JSON objects and store them in a data structure that allows both fast lookup and also fast manipulation which might change the position in the structure for a particular object.
An example object:
{
name: 'Bill',
dob: '2014-05-17T15:31:00Z'
}
Given a sort by name ascending and dob descending, how would you go about storing the objects so that if I have a new object to insert, I know very quickly where in the data structure to place it so that the object's position is sorted against the other objects?
In terms of lookup, I need to be able to say, "Give me the object at index 12" and it pulls it quickly.
I can modify the objects to include data that would be helpful such as storing current index position etc in a property e.g. {_indexData: {someNumber: 23, someNeighbour: Object}} although I would prefer not to.
I have looked at b-trees and think this is likely to be the answer but was unsure how to implement using multiple sort arguments (name: ascending, dob: descending) unless I implemented two trees?
Does anyone have a good way to solve this?
The first thing you need to do is store all the objects in an array. That'll be your best bet in terms of lookup: since you want "Give me the object at index 12", you can simply access that object as data[12].
Now coming towards storing and sorting them, consider you have the following array of those objects:
var data = [{
name: 'Bill',
dob: '2014-05-17T15:31:00Z'
},
{
name: 'John',
dob: '2013-06-17T15:31:00Z'
},
{
name: 'Alex',
dob: '2010-06-17T15:31:00Z'
}];
The following simple function (taken from here) will help you in sorting them based on their properties:
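function sortResults(prop, asc) {
  data = data.sort(function(a, b) {
    // A sort comparator must return a negative, zero or positive number, not a boolean
    var order = a[prop] < b[prop] ? -1 : a[prop] > b[prop] ? 1 : 0;
    return asc ? order : -order;
  });
}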
First parameter is the property name on which you want to sort e.g. 'name' and second one is a boolean of ascending sort, if false, it will sort descendingly.
Next step, you need to call this function and give the desired values:
sortResults('name', true);
and voilà! Your array is now sorted by name in ascending order. You can now access the objects by index, e.g. data[12], just as you wished, and they are sorted as well.
You can play around with the example HERE. If i missed anything or couldn't understand your problem properly, feel free to explain and i'll tweak my solution.
EDIT: Going through your question again, I think I missed the part about dynamically adding objects. With my solution, you'll have to call the sortResults function every time you add an object, which might get expensive.
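One way to avoid re-sorting on every insert, sketched here as an assumption rather than as part of the original answer, is to keep the array sorted and find the insert position with a binary search. The comparator below combines both sort keys from the question (name ascending, then dob descending), so a single structure is enough. The names compareRecords and insertSorted are hypothetical, and the dob comparison relies on all dates being ISO 8601 strings in the same format, which compare correctly as plain strings.

```javascript
// Orders by name ascending, then by dob descending (later dates first).
function compareRecords(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.dob !== b.dob) return a.dob > b.dob ? -1 : 1;
  return 0;
}

// Inserts `record` into the already sorted array `sorted` and returns its index.
function insertSorted(sorted, record) {
  let lo = 0;
  let hi = sorted.length;
  // Binary search for the first position whose element sorts after `record`
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compareRecords(sorted[mid], record) <= 0) lo = mid + 1;
    else hi = mid;
  }
  sorted.splice(lo, 0, record);
  return lo;
}
```

Finding the position takes a logarithmic number of comparisons, and lookup by index stays a plain array access. The splice itself still shifts elements, so for very large collections a tree-based structure may be worth considering.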
