Trouble Pivoting data with Map Reduce - javascript

I am having trouble pivoting my dataset with map reduce. I've been using the MongoDB cookbook for help, but I'm getting some weird errors. I want to take the below collection and pivot it so that each user has a list of all of the review ratings.
My collection looks like this:
{
'type': 'review',
'business_id': (encrypted business id),
'user_id': (encrypted user id),
'stars': (star rating),
'text': (review text),
}
Map function (wrapped in Python):
map = Code("""
function(){
    var key = {user: this.user_id};
    var value = {ratings: [this.business_id, this.stars]};
    emit(key, value);
}
""")
The map function should return an array of values associated with the key...
Reduce function (wrapped in Python):
reduce = Code("""
function(key, values){
var result = { value: [] };
var temp = [];
for (var i = 0; i < values.length; i++){
temp.push(values[i].ratings);
}
result.value = temp;
return result;
}
""")
However, the results contain one fewer rating than the total. In fact, some users have None returned, which can't happen. Some entries look like the following:
{u'_id': {u'user': u'zwZytzNIayFoQVEG8Xcvxw'}, u'value': [None, [u'e9nN4XxjdHj4qtKCOPQ_vg', 3.0], None, [...]...]
I can't pinpoint what in my code is causing this. If there are 3 reviews, they all have business IDs and ratings in the document. Plus, using 'values.length + 1' in my loop condition breaks values[i] for some reason.
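A likely explanation, sketched in plain JavaScript so it runs without MongoDB: MongoDB may call reduce more than once for the same key and pass an earlier reduce result back in as one of the values. The reducer above returns { value: [...] }, which has no ratings property, so a second pass pushes undefined (None in Python) in place of everything the first pass collected. The business ids and ratings below are made up for illustration.

```javascript
// Plain-JS reproduction (no MongoDB) of the question's first reducer, fed its
// own output the way MongoDB's re-reduce step is allowed to do.
function reduce(key, values) {
  var result = { value: [] };
  var temp = [];
  for (var i = 0; i < values.length; i++) {
    // An earlier reduce result has no .ratings, so this pushes undefined
    temp.push(values[i].ratings);
  }
  result.value = temp;
  return result;
}

// Hypothetical map outputs for one user with three reviews
var v1 = { ratings: ["bizA", 3] };
var v2 = { ratings: ["bizB", 4] };
var v3 = { ratings: ["bizC", 5] };

var partial = reduce("u1", [v1, v2]);        // first pass over two map outputs
var rereduced = reduce("u1", [partial, v3]); // second pass receives the first pass's output
console.log(JSON.stringify(rereduced));      // {"value":[null,["bizC",5]]}
```

Three reviews go in, two entries come out, and one of them is null: the same symptoms as in the question.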
Edit 1
I've accepted the fact that reduce can be called multiple times, including on its own output, so below is my new reducer. It returns an array of [business, rating, business, rating]. Any idea how to output [business, rating] arrays instead of one giant array?
function(key, values){
    var result = { ratings: [] };
    var temp = [];
    values.forEach(function(value){
        value.ratings.forEach(function(rating){
            if (temp.indexOf(rating) == -1){
                temp.push(rating);
            }
        });
    });
    result.ratings = temp;
    return result;
}

Here's a test example:
1) Add some sample data:
db.test.drop();
db.test.insert(
[{
'type': 'review',
'business_id': 1,
'user_id': 1,
'stars': 1,
},
{
'type': 'review',
'business_id': 2,
'user_id': 1,
'stars': 2,
},
{
'type': 'review',
'business_id': 2,
'user_id': 2,
'stars': 3,
}]
);
2) Map function
var map = function() {
emit(this.user_id, [[this.business_id, this.stars]]);
};
Here we emit the value in the shape we want the final result to have. Why? Because if a user (the key we are grouping by) has only a single review, that key never goes through the reduce phase, so the map output is the final result.
3) Reduce function
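var reduce = function(key, values) {
    // Each value is an array of [business_id, stars] pairs, either straight from
    // map or returned by an earlier reduce call for the same key.
    var ratings = [];
    values.forEach(function(value){
        value.forEach(function(pair){
            ratings.push(pair);
        });
    });
    return ratings;
};
Here we collect all the values. Because map already wrapped each pair in an array, every value is a list of pairs, whether it came from map or from an earlier reduce call, so we just concatenate them. Returning that same shape matters: MongoDB may pass a reduce result back into reduce, so the reduce output has to look like what map emits.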
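To see why the shape of reduce's return value matters, here is a hypothetical in-memory stand-in for mapReduce (runMapReduce, groups, and the two-at-a-time batching are illustration only, not MongoDB's actual scheduling). It passes single values through untouched and feeds reduce its own earlier output; its reduce returns the same array of [business_id, stars] pairs that map emits.

```javascript
// Hypothetical in-memory stand-in for db.test.mapReduce, for illustration only.
// It mimics two documented behaviours: a key with a single emitted value skips
// reduce, and reduce may be called again on its own earlier output.
var groups = new Map();

function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

var map = function () {
  emit(this.user_id, [[this.business_id, this.stars]]);
};

// Returns the same shape map emits: an array of [business_id, stars] pairs
var reduce = function (key, values) {
  var ratings = [];
  values.forEach(function (value) {
    value.forEach(function (pair) {
      ratings.push(pair);
    });
  });
  return ratings;
};

function runMapReduce(docs) {
  groups = new Map();
  docs.forEach(function (doc) {
    map.call(doc); // `this` is the document, as in MongoDB
  });
  var out = [];
  groups.forEach(function (values, key) {
    var value = values[0]; // a single value is passed through without reduce
    if (values.length > 1) {
      // Reduce two values, then feed that result back in with the next one
      value = reduce(key, [values[0], values[1]]);
      for (var i = 2; i < values.length; i++) {
        value = reduce(key, [value, values[i]]);
      }
    }
    out.push({ _id: key, value: value });
  });
  return out;
}

// The sample data above plus one extra (made-up) review by user 1 to force a re-reduce
var docs = [
  { type: "review", business_id: 1, user_id: 1, stars: 1 },
  { type: "review", business_id: 2, user_id: 1, stars: 2 },
  { type: "review", business_id: 2, user_id: 2, stars: 3 },
  { type: "review", business_id: 3, user_id: 1, stars: 5 }
];
console.log(JSON.stringify(runMapReduce(docs)));
// [{"_id":1,"value":[[1,1],[2,2],[3,5]]},{"_id":2,"value":[[2,3]]}]
```

Both users end up with the same shape of value whether or not reduce ran, and the re-reduced user keeps all three ratings.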
4) Run the map reduce:
db.test.mapReduce(map, reduce, {out: {inline: 1}});
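Alternatively, use the aggregation framework:
db.test.aggregate([
    {
        $group: {
            _id: "$user_id",
            ratings: {$addToSet: {business_id: "$business_id", stars: "$stars"}}
        }
    }
]);
Note that $addToSet drops exact duplicate {business_id, stars} pairs; use $push instead if every review should be kept.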

Related

How to declare a Hash/Dictionary of Arrays

I have a program that pushes values into one data structure like this:
if (symbolType == "C" || symbolType == "P") // The calls and puts
{
    stocks.push({
        symbol: symbol,
        undsymbol: undSymbol,
        open: 0,
        type: symbolType,
        expiry: expiry,
        days: days,
        strike: strike
    });
}
else // The stock
{
    stocks.push({
        symbol: symbol,
        open: 0,
        type: symbolType
    });
}
So this is the key: NOT A STRING!
{
symbol: symbol,
open: 0,
type: symbolType
}
And the values, of which there are many, look like this:
{
symbol: symbol,
undsymbol: undSymbol,
open: 0,
type: symbolType,
expiry: expiry,
days: days,
strike: strike
}
The problem is that stocks, calls, and puts are all being put into one collection. Instead, I want to add the stocks and their corresponding calls and puts to a dictionary/map, where the stocks are the keys and the calls and puts get pushed into an array indexed by their stock.
At the end, I want to be able to iterate and get the keys and values.
How do I:
1) declare this object?
2) index into it to see whether key[stock] already exists, and if it doesn't, add it with an empty array?
3) when I get a "C" or "P", get the corresponding array that holds the calls/puts for this key [stock] and push the call/put into it?
Initially I thought the declaration was something like this:
var stockCallsPutDict = {[]}
stockCallsPutDict[stock] = [];
stockCallsPut[stock].push(call);
// Pretty print the dict of keys and its options =
stockCallsPutDict.forEach(function kvp) {
...
}
If ES6 is an option, you can either build an object yourself or use a Map.
Here's some quick code I came up with:
const stocks = {};
const addCallAndPut = callAndPut => {
const symbol = callAndPut.symbol;
if (!stocks[symbol]) {
stocks[symbol] = [];
}
stocks[symbol].push(callAndPut);
}
const showStuff = () => {
for (const symbol in stocks) {
// output stuff using stocks[symbol]
}
}
Or with a Map:
const stocks = new Map();
// basic implementation
const addCallAndPut = callAndPut => {
const stockCallsAndPuts = stocks.get(callAndPut.symbol) || [];
stockCallsAndPuts.push(callAndPut);
stocks.set(callAndPut.symbol, stockCallsAndPuts);
}
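Neither snippet shows the iteration step the question asks for. Here is a sketch of the Map approach end to end, assuming calls and puts carry an undsymbol field (as in the question) and that any other type is a stock; the sample records and the "S" type marker are made up. It keys the Map by symbol string rather than by the stock object, because a Map compares object keys by identity, so an option could not look up its stock without already holding that exact object.

```javascript
// Hypothetical sketch: one Map entry per underlying stock symbol.
// Each value holds the stock object itself plus an array of its calls/puts.
const stocksBySymbol = new Map(); // symbol -> { stock, options }

function getEntry(symbol) {
  // Create the entry on first sight, so an option may arrive before its stock
  if (!stocksBySymbol.has(symbol)) {
    stocksBySymbol.set(symbol, { stock: null, options: [] });
  }
  return stocksBySymbol.get(symbol);
}

function addRecord(record) {
  if (record.type === "C" || record.type === "P") {
    // Calls and puts are filed under their underlying symbol
    getEntry(record.undsymbol).options.push(record);
  } else {
    // Anything else is treated as the stock itself
    getEntry(record.symbol).stock = record;
  }
}

// Made-up records shaped like the objects in the question ("S" marks a stock here)
const sample = [
  { symbol: "XYZ", open: 0, type: "S" },
  { symbol: "XYZ-C120", undsymbol: "XYZ", open: 0, type: "C", expiry: "2017-01-20", days: 30, strike: 120 },
  { symbol: "XYZ-P110", undsymbol: "XYZ", open: 0, type: "P", expiry: "2017-01-20", days: 30, strike: 110 },
  { symbol: "ABC", open: 0, type: "S" }
];
sample.forEach(addRecord);

// A Map iterates its [key, value] pairs in insertion order
for (const [symbol, entry] of stocksBySymbol) {
  const strikes = entry.options.map(o => o.type + o.strike).join(", ");
  console.log(symbol + ": " + (strikes || "no options"));
}
// XYZ: C120, P110
// ABC: no options
```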
There are a few ways to go about this, and the best one depends on how the data needs to be processed later, but from your description I'd go with something along the lines of:
var stocks = {};
var stockCallsPut = {};
// loop over stocks and actions
if (!(symbol in stocks)) {
stocks[symbol] = [];
}
if (!(symbol in stockCallsPut)) {
stockCallsPut[symbol] = {};
}
if (!(symbolType in stockCallsPut[symbol])) {
stockCallsPut[symbol][symbolType] = [];
}
// accumulated stock json items here
stocks[symbol].push(new_stock_item);
// accumulated put/call json items of stock here
stockCallsPut[symbol][symbolType].push(new_action);
I'm still not sure I understand what your data looks like, but it sounds something like this:
// Not sure if data is an object or array
var data = {
'one': {
'name': 'one-somename',
'number': 'one-somenumber',
'symbol': 'C'
},
'two': {
'name': 'two-somename',
'number': 'two-somenumber',
'symbol': 'P'
},
'three': {
'name': 'three-somename',
'number': 'three-somenumber',
'symbol': 'C'
}
};
var stocks = {};
for (var name in data) {
// It sounded like you wanted a call/put array for each object, but I'm not sure that's right since it wouldn't be possible... if so, you can just move this part into the appropriate branch of the if statement below
// Reuse the entry if it already exists; otherwise create it with empty calls/puts arrays
stocks[name] = stocks[name] ? stocks[name] : {'calls': [], 'puts': []};
var type;
if (data[name]['symbol'] === 'C') {
type = 'calls';
} else if (data[name]['symbol'] === 'P') {
type = 'puts';
}
stocks[name][type].push(data[name]);
}

Summarize & Group By with Lodash

I'm new to Lodash and I'm trying to perform a complex sum with a group by, as in SQL, but I can't find a solution. I have tried using and combining multiple Lodash functions without success.
My requirement is as follows. I have a JSON response:
input =
[{"quantity":1067,"gross_revenue":4094.2,"date":"03","company":"Cat1","product":"Car"},
{"quantity":106,"gross_revenue":409,"date":"02","company":"Cat2","product":"Car"},
{"quantity":106,"gross_revenue":85,"date":"03","company":"Cat2","product":"House"},
{"quantity":106,"gross_revenue":100,"date":"02","company":"Cat3","product":"House"},
{"quantity":20,"gross_revenue":150,"date":"03","company":"Cat5","product":"Technology"},
{"quantity":40,"gross_revenue":100,"date":"01","company":"Cat5","product":"Technology"},
{"quantity":20,"gross_revenue":15,"date":"01","company":"Cat5","product":"Car"},
{"quantity":20,"gross_revenue":18,"date":"01","company":"Cat5","product":"House"},
{"quantity":20,"gross_revenue":2,"date":"01","company":"Cat2","product":"House"},
{"quantity":20,"gross_revenue":25,"date":"01","company":"Cat3","product":"House"}]
I need to generate the following result to populate the series of a Highcharts chart:
[{ name: 'Car', data: [15, 409, 4094.2] },
{ name: 'House', data:[45, 100, 85] },
{ name: 'Technology', data:[100, null, 150] }]
Those values are the result of:
1. Grouping by product, with the product as the name tag.
2. Generating the data array for each product as follows:
2.1 Sum the gross revenue per product and date (across all existing dates).
2.2 Include a null value for any existing date that has no gross revenue.
2.3 Sort the gross revenue values by date, in ascending order.
Is this possible? Or is there another solution for this?
Thanks.
Here's one way to do it; it's certainly not the only solution:
var input = [
{"quantity":1067,"gross_revenue":4094.2,"date":"03","company":"Cat1","product":"Car"},
{"quantity":106,"gross_revenue":409,"date":"02","company":"Cat2","product":"Car"},
{"quantity":106,"gross_revenue":85,"date":"03","company":"Cat2","product":"House"},
{"quantity":106,"gross_revenue":100,"date":"02","company":"Cat3","product":"House"},
{"quantity":20,"gross_revenue":150,"date":"03","company":"Cat5","product":"Technology"},
{"quantity":40,"gross_revenue":100,"date":"01","company":"Cat5","product":"Technology"},
{"quantity":20,"gross_revenue":15,"date":"01","company":"Cat5","product":"Car"},
{"quantity":20,"gross_revenue":18,"date":"01","company":"Cat5","product":"House"},
{"quantity":20,"gross_revenue":2,"date":"01","company":"Cat2","product":"House"},
{"quantity":20,"gross_revenue":25,"date":"01","company":"Cat3","product":"House"}
];
var result = [];
var groupedByProduct = _.groupBy(input, "product");
// get the set of unique dates
var dates = _.uniq(_.map(input, 'date'));
// for each product, perform the aggregation
_.forEach(groupedByProduct, function(value, key) {
// initialize the data array for each date
var data = [];
for (var i = 0; i < dates.length; i++) {
data.push(null);
}
// aggregate gross_revenue by date
_.forEachRight(_.groupBy(groupedByProduct[key], "date"), function(dateValue, dateKey) {
// use the date as an array index
data[parseInt(dateKey) - 1] = _.sumBy(dateValue, function(o) {
return o.gross_revenue
});
});
// push into the result array
result.push({"name": key, "data": data});
});
document.getElementById("result").innerHTML = JSON.stringify(result);
<script src="https://cdn.jsdelivr.net/lodash/4.11.1/lodash.min.js"></script>
<pre id="result"></pre>
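The code above uses parseInt(dateKey) - 1 as the array index, which assumes the dates are exactly "01" through "N" with no gaps. Here is a Lodash-free sketch that instead sorts the distinct dates and looks up each date's position; it assumes zero-padded date strings, so that a plain string sort is chronological. toSeries is a hypothetical helper name.

```javascript
// Lodash-free sketch: one series per product, one slot per distinct date
// (sorted ascending), summing gross_revenue and leaving null where there is none.
function toSeries(rows) {
  // Assumes zero-padded date strings ("01", "02", ...), so string sort is chronological
  var dates = Array.from(new Set(rows.map(function (r) { return r.date; }))).sort();
  var slot = new Map(dates.map(function (d, i) { return [d, i]; }));
  var byProduct = new Map(); // product -> array of per-date sums

  rows.forEach(function (r) {
    if (!byProduct.has(r.product)) {
      byProduct.set(r.product, dates.map(function () { return null; }));
    }
    var data = byProduct.get(r.product);
    var i = slot.get(r.date);
    data[i] = (data[i] === null ? 0 : data[i]) + r.gross_revenue;
  });

  // Map preserves insertion order, so products appear in first-seen order
  return Array.from(byProduct, function (entry) {
    return { name: entry[0], data: entry[1] };
  });
}

var input = [
  {"quantity":1067,"gross_revenue":4094.2,"date":"03","company":"Cat1","product":"Car"},
  {"quantity":106,"gross_revenue":409,"date":"02","company":"Cat2","product":"Car"},
  {"quantity":106,"gross_revenue":85,"date":"03","company":"Cat2","product":"House"},
  {"quantity":106,"gross_revenue":100,"date":"02","company":"Cat3","product":"House"},
  {"quantity":20,"gross_revenue":150,"date":"03","company":"Cat5","product":"Technology"},
  {"quantity":40,"gross_revenue":100,"date":"01","company":"Cat5","product":"Technology"},
  {"quantity":20,"gross_revenue":15,"date":"01","company":"Cat5","product":"Car"},
  {"quantity":20,"gross_revenue":18,"date":"01","company":"Cat5","product":"House"},
  {"quantity":20,"gross_revenue":2,"date":"01","company":"Cat2","product":"House"},
  {"quantity":20,"gross_revenue":25,"date":"01","company":"Cat3","product":"House"}
];
console.log(JSON.stringify(toSeries(input)));
// [{"name":"Car","data":[15,409,4094.2]},{"name":"House","data":[45,100,85]},{"name":"Technology","data":[100,null,150]}]
```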

AngularJS: Merge object by ID, i.e. replace old entry when IDs are identical

I am using Ionic with AngularJS and I am using a localForage database and AJAX via $http. My app has a news stream that contains data like this:
{
"feed":[
{
"id":"3",
"title":"Ein Hund",
"comments":"1"
},
{
"id":"2",
"title":"Eine Katze",
"comments":"2"
}
],
"ts":"20150907171943"
}
ts stands for Timestamp. My app saves the feed locally via localForage.
When the app starts it first loads the locally saved items:
$localForage.getItem("feed").then(function(val) { vm.feed = val; })
Then, it loads the new or updated items (ts < current timestamp) and merges both the old and new data:
angular.extend(vm.feed, response.data.feed);
Updated items look like this:
{
"feed":[
{
"id":"2",
"title":"Eine Katze",
"comments":"4"
}
],
"ts":"20150907171944"
}
That is, the comments count on feed item 2 has changed from 2 to 4. When I merge the old and new data, vm.feed has two items with id = 2.
Does AngularJS have a built-in "merge by id" function, i.e. one that copies an element from source to destination if it is new, and otherwise replaces the old element? If AngularJS does not have such a function, what's the best way to implement it?
Thanks in advance!
angular.merge(vm.feed, response.data.feed);
EDIT: This probably won't merge correctly, because angular.merge matches array elements by index rather than by id. Instead, update the ts property yourself, then find the object with the matching id and replace it.
There is no built-in; I usually write my own merge function:
(function(){
function itemsToArray(items) {
var result = [];
if (items) {
// items can be a Map, so don't use angular.forEach here
items.forEach(function(item) {
result.push(item);
});
}
return result;
}
function idOf(obj) {
return obj.id;
}
function defaultMerge(newItem, oldItem) {
return angular.merge(oldItem, newItem);
}
function mergeById(oldItems, newItems, idSelector, mergeItem) {
if (mergeItem === undefined) mergeItem = defaultMerge;
if (idSelector === undefined) idSelector = idOf;
// Map retains insertion order
var mapping = new Map();
angular.forEach(oldItems, function(oldItem) {
var key = idSelector(oldItem);
mapping.set(key, oldItem);
});
angular.forEach(newItems, function(newItem) {
var key = idSelector(newItem);
if (mapping.has(key)) {
var oldItem = mapping.get(key);
mapping.set(key, mergeItem(newItem, oldItem));
} else {
// new items are simply added, will be at
// the end of the result list, in order
mapping.set(key, newItem);
}
});
return itemsToArray(mapping);
}
var olds = [
{ id: 1, name: 'old1' },
{ id: 2, name: 'old2' }
];
var news = [
{ id: 3, name: 'new3' },
{ id: 2, name: 'new2' }
];
var merged = mergeById(olds, news);
console.log(merged);
/* Prints
[
{ id: 1, name: 'old1' },
{ id: 2, name: 'new2' },
{ id: 3, name: 'new3' }
];
*/
})();
This builds a Map from the old items keyed by id, merges in the new items, and converts the map back to a list. Conveniently, a Map iterates over its entries in insertion order, as the specification requires. You can pass in your own idSelector and mergeItem functions.
Thanks, hege_hegedus. Based on your code, I've written my own version and tried to use fewer loops to speed things up a bit:
function updateCollection(localCollection, fetchedCollection) {
angular.forEach(fetchedCollection, function(item) {
var append = true;
for (var i = 0; i < localCollection.length; i++) {
if (Number(localCollection[i].id) === Number(item.id)) {
// Replace item (ids are strings in the feed, so compare them as numbers)
localCollection[i] = item;
append = false;
break;
} else if (Number(localCollection[i].id) > Number(item.id)) {
// Add new element at the right position, if IDs are descending check for "< item.id" instead
localCollection.splice(i, 0, item);
append = false;
break;
}
}
if (append) {
// Add new element with a higher ID at the end
localCollection.push(item);
// When IDs are descending use .unshift(item) instead
}
});
}
There is still room for improvement: since all items are sorted by id, the linear scan through localCollection could be replaced with a binary search.
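A sketch of that binary-search idea, assuming the local collection is kept sorted by ascending numeric id (the feed's ids are strings, so they are compared as numbers); for a descending feed, flip the comparison. upsertSorted and updateCollectionSorted are hypothetical names. The search finds the position in O(log n), but splice still shifts elements, so an insert remains O(n) in the worst case.

```javascript
// Insert-or-replace into an array kept sorted by ascending numeric id,
// using binary search to find the position instead of a linear scan.
function upsertSorted(collection, item) {
  var id = Number(item.id); // feed ids are strings such as "2", "10"
  var lo = 0;
  var hi = collection.length;
  // Lower bound: first index whose id is >= the item's id
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (Number(collection[mid].id) < id) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  if (lo < collection.length && Number(collection[lo].id) === id) {
    collection[lo] = item; // same id: replace the old entry
  } else {
    collection.splice(lo, 0, item); // new id: insert at its sorted position
  }
  return collection;
}

function updateCollectionSorted(localCollection, fetchedCollection) {
  fetchedCollection.forEach(function (item) {
    upsertSorted(localCollection, item);
  });
  return localCollection;
}

// Made-up data in the shape of the feed, sorted ascending by id
var local = [
  { id: "2", title: "Eine Katze", comments: "2" },
  { id: "3", title: "Ein Hund", comments: "1" }
];
updateCollectionSorted(local, [
  { id: "2", title: "Eine Katze", comments: "4" },
  { id: "10", title: "New item", comments: "0" },
  { id: "1", title: "Older item", comments: "5" }
]);
console.log(local.map(function (x) { return x.id + ":" + x.comments; }).join(", "));
// 1:5, 2:4, 3:1, 10:0
```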

Why do I get different results using withMutations?

Am I misunderstanding its purpose or how it works?
var menuItems = Immutable.List.of(
{ parent_id: 0, id: 1 },
{ parent_id: 1, id: 2 },
{ parent_id: 1, id: 3 }
);
var results1 = menuItems
.filter(function(menuItem) { return menuItem.parent_id === 1; }) // Keep only items with parent_id === 1
.sort(function(childA, childB) { return childA.sort_order - childB.sort_order; }); // Sort them by sort_order
var results2 = menuItems.withMutations(function(list) {
list
.filter(function(menuItem) { return menuItem.parent_id === 1; }) // Keep only items with parent_id === 1
.sort(function(childA, childB) { return childA.sort_order - childB.sort_order; }); // Sort them by sort_order
});
console.log(results1.size); // 2
console.log(results2.size); // 3
My understanding is that they would yield the same results, but that withMutations would be faster due to the chaining of operations.
You have misunderstood withMutations. The point of it is to give you a temporary playground where you can actually change the list instead of creating copies.
An example would be:
var results2 = menuItems.withMutations(function(list) {
list.shift()
});
In your code, you call filter inside withMutations. filter returns a new List and leaves the original untouched, so your withMutations call does nothing.
I think you would be better off just not using withMutations at all. If at some point you think "this would be so much easier if I could just modify the array instead of making copies", you can turn to withMutations.
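The core mistake can be seen with plain arrays, without Immutable.js: Array.prototype.filter also returns a new array, so calling it and discarding the result leaves the original unchanged. withMutations behaves the same way in this respect: it ignores the callback's return value and hands back the (possibly mutated) list, so a filtered copy built inside the callback is simply thrown away. A minimal analogy, reusing the question's menu items:

```javascript
// Plain-array analogy (no Immutable.js): filter returns a new array and
// does nothing to the original if its return value is discarded.
const menuItems = [
  { parent_id: 0, id: 1 },
  { parent_id: 1, id: 2 },
  { parent_id: 1, id: 3 }
];

// Return value thrown away, like the chain inside the withMutations callback
menuItems.filter(item => item.parent_id === 1);
console.log(menuItems.length); // 3

// Keeping the returned collection gives the intended result
const children = menuItems.filter(item => item.parent_id === 1);
console.log(children.length); // 2
```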

How to do group.reduce on flat data in Crossfilter

I'm new to Crossfilter. I have flat data, as shown below:
id,name,patientId,conditionId,isPrimary,age,gender,race,Status,CGI
1,M1,1,c1,Y,33,Male,White,Discharged,0
2,M2,1,c1,N,33,Male,White,Discharged,0
3,M3,1,c2,N,33,Male,White,Discharged,0
4,M4,1,c2,N,33,Male,White,Discharged,0
5,M5,1,c3,N,33,Male,White,Discharged,0
6,M6,1,c3,N,33,Male,White,Discharged,0
25,M1,5,c1,Y,33,Male,White,Discharged,1
26,M7,5,c2,N,33,Male,White,Discharged,1
27,M4,5,c4,N,33,Male,White,Discharged,1
28,M4,5,c1,N,33,Male,White,Discharged,1
29,M4,5,c2,N,33,Male,White,Discharged,1
30,M5,5,c4,N,33,Male,White,Discharged,1
29,M2,6,c1,Y,33,Male,White,Discharged,1
30,M2,7,c1,Y,33,Male,White,Discharged,1
I want to count by conditionId, but since several records can belong to the same person (identified by patientId), the count for c1 should be 4 (patientIds 1, 5, 6, and 7). For example, patientId 1 appears 6 times and two of those records have c1, but that patient should be counted only once. I'm struggling to write a group.reduce on conditionId and can't even get started.
Thanks in advance.
Here's one way of doing it. In the example I assumed that the first value was the patientId and the second value the conditionId. The code keeps track of grouping keys (concatenation of the patientId and the conditionId) that were seen already and ignores them.
var countMap = [
[1, 'c1'],
[1, 'c1'],
[2, 'c1'],
[2, 'c2']
].reduce(function (r, v) {
var condition = v[1],
groupKey = v[0] + condition;
if (!r.seen[groupKey]) {
r.seen[groupKey] = true;
r.count[condition] = (r.count[condition] || 0) + 1;
}
return r;
}, {seen: {}, count: {}}).count;
countMap.c1; //2
countMap.c2; //1
I'm not familiar with Crossfilter or dc.js, which is why I gave you a vanilla JS solution.
It's a little complicated to do this in Crossfilter, but the solution is similar to the one provided by @plalx.
Here is a helper function I use in one of my projects. It's not perfect, and it's slightly optimized to reduce dictionary lookups, so it's not the most readable. The basic idea is that, for each group, you keep a dictionary of the values seen so far. You only need to remember patients, because the condition is already known from the group you are in:
function reduceHelper(accessorFunction) {
var internalCount;
return {
add: function (p, v) {
if(p.unique.has(accessorFunction(v))) {
internalCount = p.unique.get(accessorFunction(v));
p.unique.set(accessorFunction(v), internalCount + 1);
} else {
p.unique.set(accessorFunction(v), 1);
++p.count;
}
return p;
},
remove: function (p, v) {
if(p.unique.has(accessorFunction(v))) {
internalCount = p.unique.get(accessorFunction(v));
if(internalCount == 1) {
p.unique.remove(accessorFunction(v));
--p.count;
} else {
p.unique.set(accessorFunction(v), internalCount - 1);
}
}
return p;
},
init: function () {
return {unique: d3.map(), count: 0};
}
};
}
You'll need to create a Crossfilter (xfilter) on your data and then:
var helperFunctions = reduceHelper(function(d) { return d.patientId; });
var dim = xfilter.dimension(function (d) { return d.conditionId; });
var group = dim.group()
.reduce(helperFunctions.add, helperFunctions.remove, helperFunctions.init);
Your group will now count the number of patients that have each condition. If a condition appears more than once for a given patient, that patient will still only be counted once. At least, it will if my solution works properly :-)
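To check the logic without d3 or Crossfilter, here is the same add/remove/init reducer with a native Map (delete in place of d3.map's remove), run through a tiny hypothetical groupReduce stand-in for dim.group().reduce(...). uniqueCountReducer and groupReduce are illustration names, not Crossfilter APIs. On the question's patientId/conditionId pairs it should count each patient once per condition.

```javascript
// Variant of reduceHelper using a native Map, so it runs without d3.
function uniqueCountReducer(accessor) {
  return {
    add: function (p, v) {
      var k = accessor(v);
      var n = p.unique.get(k) || 0;
      p.unique.set(k, n + 1);
      if (n === 0) p.count++; // first record of this patient in this group
      return p;
    },
    remove: function (p, v) {
      var k = accessor(v);
      var n = p.unique.get(k) || 0;
      if (n === 1) {
        p.unique.delete(k);
        p.count--; // last record of this patient left the group
      } else if (n > 1) {
        p.unique.set(k, n - 1);
      }
      return p;
    },
    init: function () {
      return { unique: new Map(), count: 0 };
    }
  };
}

// Not Crossfilter: just calls init/add per group key, as group.reduce would on load
function groupReduce(rows, keyFn, reducer) {
  var groups = new Map();
  rows.forEach(function (row) {
    var key = keyFn(row);
    if (!groups.has(key)) groups.set(key, reducer.init());
    reducer.add(groups.get(key), row);
  });
  return groups;
}

// patientId/conditionId pairs taken from the question's data
var rows = [
  [1, "c1"], [1, "c1"], [1, "c2"], [1, "c2"], [1, "c3"], [1, "c3"],
  [5, "c1"], [5, "c2"], [5, "c4"], [5, "c1"], [5, "c2"], [5, "c4"],
  [6, "c1"], [7, "c1"]
].map(function (pair) {
  return { patientId: pair[0], conditionId: pair[1] };
});

var reducer = uniqueCountReducer(function (d) { return d.patientId; });
var groups = groupReduce(rows, function (d) { return d.conditionId; }, reducer);
groups.forEach(function (p, condition) {
  console.log(condition, p.count); // c1 4, c2 2, c3 1, c4 1
});

// Filtering out one of patient 1's two c1 rows: patient 1 is still counted
reducer.remove(groups.get("c1"), rows[0]);
console.log("c1 after remove:", groups.get("c1").count); // 4
```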
