riak: stumped on a basho MapReduce challenge - javascript

I'm working through Basho's MapReduce tutorial, which poses the following challenge, given daily stock data for the GOOG ticker:
Find the largest day for each month in terms of dollars traded, and
subsequently the largest overall day. Hint: You will need at least
one each of map and reduce phases.
Each day in the goog bucket has a key that corresponds to its date, and the corresponding data looks like this:
"2010-04-21":{
Date: "2010-04-21",
Open: "556.46",
High: "560.25",
Low: "552.16",
Close: "554.30",
Volume: "2391500",
Adj Close: "554.30"
}
Due to my relative lack of familiarity with the MR paradigm (and, candidly, Javascript), I wanted to work through how to do this. I assume that most of the work here would actually get done in the reduce function, and that you'd want a map function that looks something like:
function(value, keyData, arg){
var data = Riak.mapValuesJson(value)[0];
var obj = {};
obj[data.Date] = Math.abs(data.Open - data.Close);
return [ obj ];
}
which would give you a list, by day, if not of dollars traded per day, at least the change in the stock price by day.
The question I would then have would be how to structure a reduce function that is able to parse through by month, select for only the largest value per month, and then sort everything from largest month to smallest.
Am I shortchanging the work that I need to do in my map function here, or is this roughly the right idea?

I originally authored that challenge! Unless you'd like me to just give you the answer, I'll give you this hint: the key here is to think in terms of aggregate functions. How do you need to group the entries to find the maximum for each month, and then the maximum across the entire dataset?
Also, from the given data you can't know the exact amount of money exchanged in the day, but you could make a guess by multiplying the average price by the volume of shares traded.
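As a rough sketch of that hint (and only one possible reading of it), the two phases could look like the plain JavaScript below. The function names, the { month, date, dollars } shape, and the choice of (High + Low) / 2 as the "average price" are assumptions made for illustration. Inside Riak, the map step would first parse the stored object with Riak.mapValuesJson, as in the question.

```javascript
// Map step: turn one parsed day record into a single-element list.
// Assumption: "average price" is approximated as (High + Low) / 2.
function mapDollarsTraded(data) {
  var avgPrice = (parseFloat(data.High) + parseFloat(data.Low)) / 2;
  return [{
    month: data.Date.substring(0, 7), // "2010-04-21" -> "2010-04"
    date: data.Date,
    dollars: avgPrice * parseInt(data.Volume, 10)
  }];
}

// Reduce step 1: keep only the largest entry for each month.
// The output has the same shape as the input, so the function can be
// applied again to partial results without changing the answer.
function reduceMaxPerMonth(entries) {
  var best = {};
  entries.forEach(function (entry) {
    var current = best[entry.month];
    if (!current || entry.dollars > current.dollars) {
      best[entry.month] = entry;
    }
  });
  return Object.keys(best).map(function (month) { return best[month]; });
}

// Reduce step 2: keep only the single largest entry overall.
function reduceMaxOverall(entries) {
  if (entries.length === 0) { return []; }
  return [entries.reduce(function (a, b) {
    return b.dollars > a.dollars ? b : a;
  })];
}
```

Grouping by the "YYYY-MM" prefix of the date is what makes the per-month maximum an aggregate; the overall maximum is then just a second reduce over the monthly winners.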


Simple popular/trending products algorithm for an ecommerce site with Javascript

I am currently building a small ecommerce site and for the home page, I would like to rank the products based on popularity. The product schema has the following data:
{
...
noViews: Number, // Number of times a user has clicked on the product
avgRating: Number, // Average star rating (1 - 5)
datePosted: Date // Date the product was posted
}
What is a (relatively) simple algorithm I can use to implement this (with Node.js), so that the products are ranked by popularity, with time taken into consideration so the content isn't stale?
Thank you.
First of all, you need a time-based metric to solve this.
An overall "trending products" ranking does not make much sense, particularly for an ecommerce app.
Let's say you want to find the trending products for a particular day (say, today).
This is not an easy problem, and there is no single direct way to solve it; each company settles on the method that works for it.
A simple solution would be to create a score based on a combination of factors, for example number of buys, number of views, number of clicks and so on.
Here, you have the datePosted parameter, so create a date_score for it. The basic idea is that a product posted yesterday scores higher than one posted a week ago. The values you choose need to be tweaked and checked against your own data.
Similarly, implement a score for avgRating and noViews.
Once you have these scores, assign a weight to each one. The weighted average of the scores, on a scale of 0 to 100, is the final score.
So, a final example solution:
Date Posted:
If the date is in the last 3 days (score = 100)
If the date is in the last week (score = 75)
If the date is in the week before that (score = 50)
Else default score = 25.
Star Rating:
(Rating / 5) * 100
numberOfViews:
Here, you can score each product relative to the most-viewed one.
Let's say the max views among all the products is x.
Then the view score is (views / x) * 100
Now, handle the corner case of what happens when two scores match.
After this, you can simply order by the score parameter.
Also, make sure everything you do is dynamic, as static thresholds generally don't give great results!
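The scoring scheme above could be sketched roughly as follows. The weights, the function names, and the reading of "previous week" as 7 to 14 days old are assumptions to tune, not recommendations; the tie-break on the newer product is likewise just one arbitrary choice.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// Tiered recency score, following the example thresholds.
function dateScore(datePosted, now) {
  var ageDays = (now - datePosted) / DAY_MS;
  if (ageDays <= 3) { return 100; }
  if (ageDays <= 7) { return 75; }
  if (ageDays <= 14) { return 50; }
  return 25;
}

function ratingScore(avgRating) {
  return (avgRating / 5) * 100;
}

// Views relative to the most-viewed product (0 if nothing has views yet).
function viewScore(noViews, maxViews) {
  return maxViews > 0 ? (noViews / maxViews) * 100 : 0;
}

// Hypothetical weights; they sum to 1 so the final score stays in 0..100.
var WEIGHTS = { date: 0.4, rating: 0.3, views: 0.3 };

function rankProducts(products, now) {
  var maxViews = products.reduce(function (max, p) {
    return Math.max(max, p.noViews);
  }, 0);
  return products
    .map(function (p) {
      var score = WEIGHTS.date * dateScore(p.datePosted, now) +
                  WEIGHTS.rating * ratingScore(p.avgRating) +
                  WEIGHTS.views * viewScore(p.noViews, maxViews);
      return { product: p, score: score };
    })
    .sort(function (a, b) {
      // Highest score first; on a tie, the newer product wins.
      return (b.score - a.score) || (b.product.datePosted - a.product.datePosted);
    });
}
```

Because maxViews is recomputed from the current product list on every call, the view score stays relative to the catalogue rather than to a fixed threshold.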

Getting a portion of a sorted array from start value to end value

I'm pretty new to javascript and I need to get a portion (slice) of a sorted array (Numbers, timestamps basically) by start_value and end_value.
For example, let's say I have an array of random timestamps from last month, and I want to get all timestamps between two weeks ago and a week ago.
This is a pretty simple algorithm to write (using a binary search) but I don't want to mess up my code with these computations.
I've been searching for a way to do this in javascript but haven't found any.
Thanks for any future help :)
Perhaps use filter?
var dates = [123, 234, 456, 468, 568, 678];
var min = 300;
var max = 500;
var inRange = dates.filter(function(date) {
return min < date && date < max;
});
console.log(inRange);
On the plus side, this doesn't even need them to be sorted. On the down side, it probably won't be as fast as a well-implemented binary search for the relevant start and end points. Unless you've got some really harsh performance requirements I don't think that'll matter.
OK, I found a pure JS library called binarysearch that has exactly what I'm looking for: https://www.npmjs.com/package/binarysearch. It has a rangeValue function which accepts start and end values that are not present in the array. Seems to be working :)
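For reference, the binary-search approach mentioned in the question can also be written without a dependency. This is a minimal sketch, assuming an array sorted in ascending order and inclusive bounds; lowerBound, upperBound and sliceByValue are hypothetical helper names, not part of any library.

```javascript
// Index of the first element >= target.
function lowerBound(sorted, target) {
  var lo = 0, hi = sorted.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) { lo = mid + 1; } else { hi = mid; }
  }
  return lo;
}

// Index of the first element > target.
function upperBound(sorted, target) {
  var lo = 0, hi = sorted.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (sorted[mid] <= target) { lo = mid + 1; } else { hi = mid; }
  }
  return lo;
}

// All elements with start <= value <= end; the bounds need not exist in the array.
function sliceByValue(sorted, start, end) {
  return sorted.slice(lowerBound(sorted, start), upperBound(sorted, end));
}

var dates = [123, 234, 456, 468, 568, 678];
console.log(sliceByValue(dates, 300, 500)); // [456, 468]
```

Each lookup takes O(log n) comparisons, so the cost is dominated by copying the matching elements.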

Calculate percentile rank (Parse)

I need to calculate the percentile rank of a particular value against a large number of values filtered in various different ways. The data is all stored on Parse.com, which has a limitation of returning a maximum of 1000 rows per query. The number of values stored is likely to exceed well over 100,000.
By 'percentile rank', I mean I need to calculate the percentage of values that the provided value is greater than. I am not trying to calculate the value of a provided percentile. For example, given a list of values {20, 23, 24, 29, 30, 31, 35, 40, 40, 43} the percentile rank of the provided value 35 is 70%. The algorithm for this is simply the rank of the value / count of values * 100. Not sure if 'percentile rank' is the correct terminology for this.
I have considered a couple of different approaches to this. The first is to pull down the full list of values (into Parse Cloud) and then calculate the percentile rank from there, then filter the list and calculate again, repeating the last two steps as many times as required. The problem with this approach is it will not work once we reach 1000 values, which we can expect pretty quickly.
Another option, which is the best I can come up with so far, is to query the count of items, and the rank of the provided value. For example:
var rank_world_alltime = new Parse.Query("Values")
.lessThan("value", request.params.value) // Filters query to values less than the provided value, so counting this query will return the rank
.count();
var count_world_alltime = new Parse.Query("Values")
.count();
Parse.Promise.when(rank_world_alltime, count_world_alltime).then(function(rank, count) {
percentile = rank / count * 100;
console.log("world_alltime_percentile = " + percentile);
});
This works well for a single calculation, but I need to perform multiple calculations, and this approach very quickly becomes a lot of queries. I expect to need to run about 15 calculations per call, which is 30 queries. All calculations need to complete in under 3 seconds before Parse terminates the job, and I am limited to 30 reqs/second, so this is very quickly going to become a problem.
Does anyone have any suggestions on how else I could approach this? I've thought about somehow pre-processing some of this but can't quite work out how to do so, as the filters will be based on time and location (city and country), so there are potentially a LOT of pre-calculations that will need to be run at regular intervals. The results do not need to be 100% accurate but something close.
I don't know much about Parse, but as far as I understand what you say, it is some kind of cloud database that holds your high scores and limits you to 1000 rows per query, 3 seconds per job, and 30 queries per second.
To get approximate results and halve the number of queries, I would first of all cache the totals (count_world_alltime, a per-region count, a per-week count, whatever), if you can save them somewhere locally. For numbers around 100K, just getting the order of magnitude (and thus not the latest updated number) should be good enough to compute a percentile.
Maybe you can get several counts per query. However, my lack of expertise in Parse/NoSQL stops me from being sure of this, so you'll have to check their documentation. If it is possible, then for the case where you need percentiles for a series of values all in the same category, I would:
Order the values, let's call them a,b,c,d,e (once ordered)
Get the number of values between the intervals [0,a] [a,b] [b,c] [c,d] [d,e]
Use the cached total to get the percentiles (where Nxy is the number of values in [x,y]) :
Pa = 100 * N0a / total
Pb = 100 * ( N0a + Nab ) / total
Pc = 100 * ( N0a + Nab + Nbc ) / total
and so on...
If you need one value ranked worldwide, another per region, some per week and others over all time, etc., this doesn't apply. In that case I don't think you can get below one query per number, even with the totals cached.
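The running-sum step above can be sketched as a small helper. It assumes the interval counts have already been fetched (however Parse allows that) and that total is the cached count; percentilesFromCounts is a hypothetical name.

```javascript
// intervalCounts[i] is the number of stored values falling in the i-th
// interval ([0,a], then [a,b], then [b,c], ...), in ascending order.
// Returns one percentile per interval endpoint: Pa, Pb, Pc, ...
function percentilesFromCounts(intervalCounts, total) {
  if (!(total > 0)) {
    // No cached total yet: report 0 rather than dividing by zero.
    return intervalCounts.map(function () { return 0; });
  }
  var running = 0;
  return intervalCounts.map(function (count) {
    running += count;
    return 100 * running / total;
  });
}

// Example: N0a = 10, Nab = 20, Nbc = 30 out of a cached total of 100.
console.log(percentilesFromCounts([10, 20, 30], 100)); // [10, 30, 60]
```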

Ways to simplify an array of objects that is repeated several times

I wonder if I can simplify this and use fewer lines of code:
I have a class called "worker", and that class has a method that reads the properties (name, age, etc.) from a series of simple arrays.
Up to that point, everything is fine. Now, one of the properties that I want to add is a set of boolean values indicating which months of the year the worker is active. For the moment, I have solved it like this:
var months_worker_1 = [{"jan":true},{"feb":true},{"mar":true},{"apr":false}] //and so on
And then my property reads months_worker_1, but I have one array like that for each worker. I wonder if there is a way to do this that requires fewer lines of code: for example, create a "master" array with all the months of the year and, in the array for each worker, specify just the months they are working. Those months become "true", and the rest become "false" automatically without my having to say so. I have been scratching my head for some time, and so far only my current system works, but I am guessing that there must be a simpler way...
Thanks very much!
Edit: I clarify, there is no "big picture". I am just doing some exercises trying to learn javascript and this one woke my interest, because the solution I thought seems too complicated (repeating same array many times). There is no specific goal I need to achieve, I am just learning ways to do this.
A really nice trick that I use sometimes is to use a binary number to keep track of a fixed amount of flags, and convert it to a decimal for easier storage / URL embedding / etc. Let's assume Mark, a user, is active all months of the year. Considering a binary number, in which 1 means "active" and 0 inactive, Mark's flag would be:
111111111111 (twelve months)
If Mark were only active during January, February and December, his flag value would be:
110000000001
Checking whether Mark is active during a specific month is as simple as checking whether the character at that month's index in Mark's flag is 1 or 0.
This technique has helped me in the past to send values for a large number of flags via URLs, while also keeping the URL reasonably short. Of course, you probably don't need this, but it's a nice thing to know:
Converting from binary to decimal is easy in JS:
parseInt("110000000001", 2); // returns 3073
And the reverse:
(3073).toString(2); // returns "110000000001"
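The same flags can also be kept as a plain integer and tested with bitwise operators instead of string indexing. This is a sketch under one assumed convention (bit 0 = January through bit 11 = December, which mirrors the left-to-right order of the string form); MONTHS, monthsToMask and isActive are hypothetical names.

```javascript
var MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"];

// Build a 12-bit mask from a list of month names; unknown names are ignored.
function monthsToMask(activeMonths) {
  return activeMonths.reduce(function (mask, name) {
    var index = MONTHS.indexOf(name);
    return index === -1 ? mask : mask | (1 << index);
  }, 0);
}

// True if the bit for the given month is set in the mask.
function isActive(mask, monthName) {
  var index = MONTHS.indexOf(monthName);
  return index !== -1 && (mask & (1 << index)) !== 0;
}

var mark = monthsToMask(["jan", "feb", "dec"]);
console.log(isActive(mark, "dec")); // true
console.log(isActive(mark, "mar")); // false
```

Storing the number avoids the lost-leading-zero problem that a binary string has when the first flag is 0.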
Edit
You could just as easily use an array made out of the month numbers:
var months_worker_1 = [1, 2, 3]; // this would mean that the user is active during january, february and march

How do I set up an automated "Quote of the day"?

I am in charge of a website, and I have set up a "Quote of the Day" which currently is quite simplistic. See Here (on the right of the page)
What it currently does is it gets the Day of the month and the month, and normalises to one, then multiplies by the number of quotes (stored in an xml file) and rounds down. While this method will give me the same quote whichever machine I am on (something a random number generator could never do) it has been pointed out to me that this method is flawed. If you consider January the first couple quotes are going to be the same, 1*1/360, 2*1/360, 3*1/360, thus the quote isn't unique.
Can anyone think of a better way to select a quote of the day?
Fun question. Instead of relying on days of the month, why not count days since a given date? JS provides a pretty good method for that: getTime(), which gives you the number of milliseconds since midnight UTC on Jan. 1, 1970, which you can convert to days with some simple division.
The only thing that complicates it is that if you expect your quotes to shift at midnight (and who doesn't?), you have to take the timezone into account. Again, JS provides that with getTimezoneOffset(), which gives the difference, in minutes, between UTC and the user's local time. If you want ALL users to flip at the same time, regardless of where they live, just set this to a static value.
Your code could look something like this:
var intQuoteCount = 51; // The number of quotes in your library
var dtNow = new Date();
var intTZOffset = dtNow.getTimezoneOffset() * 60000; // automatically adjust for user timezone
var intNow = dtNow.getTime() - intTZOffset;
var intDay = Math.floor(intNow / 86400000); // The number of 'local' days since Jan 1, 1970
var intQuoteToDisplay = intDay % intQuoteCount;
True, determinism is something "a random number generator could never do". Fortunately (for this case, at least), programming languages provide pseudo-random number generators, not the real thing. The pseudo-random numbers are generated by doing a bunch of calculations on a "seed" value.
To get a repeatable "random" selection, then, all you need to do is set the seed in a way which is consistent for each day - I would suggest using the date, in "yyyymmdd" format, as the seed, but any other number which will be unchanged over the course of a day will work just as well.
Once you have your seed, tell the PRNG to use it with the command srand(mySeed); and you'll get the same sequence of "random" numbers from rand() every time (until mySeed changes).
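Note that JavaScript's built-in Math.random() takes no seed, so srand/rand have no direct equivalent there. A minimal sketch of the same idea in JavaScript, assuming a small hand-rolled generator modelled on the widely circulated mulberry32 routine; dailySeed and quoteIndexForDate are hypothetical names.

```javascript
// Small seedable pseudo-random generator: the same seed always yields
// the same sequence of numbers in [0, 1).
function mulberry32(seed) {
  var a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    var t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Seed in "yyyymmdd" form, based on the local calendar date.
function dailySeed(date) {
  return date.getFullYear() * 10000 + (date.getMonth() + 1) * 100 + date.getDate();
}

// Same index for every call made on the same calendar day.
function quoteIndexForDate(date, quoteCount) {
  var random = mulberry32(dailySeed(date));
  return Math.floor(random() * quoteCount);
}

var todaysQuote = quoteIndexForDate(new Date(), 51);
```

Unlike counting days and taking a remainder, this does not guarantee that every quote is shown once per cycle; two different days can land on the same quote.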
If you want to show the quotes in order, you could get the current Julian Day number, which will increase by one each day, and take the remainder after dividing it by the number of quotes as the number of today's quote. If you want to show all quotes but the order of them to change each cycle, you can xor the quote number and rearrange the bits using some logic that you get from the quotient of the division.
You could try rounding up on an even day and rounding down on an odd day. But I am sure there are better ways; this is just a quick suggestion.
Also you could try using the current day of the year in the calculation as this is unique for each new day in the year as opposed to repeating each month.
Do you have to limit yourself to a cycle of 360 days? If you have, for example, 500 quotes, some might never be used.
How about this: every day, pick a random number between 1 and the number of quotes, use it as the quote-of-the-day index, and mark it as "used in current cycle".
Next time you pick a number, if you pick a quote that is marked as "used in current cycle", re-pick until you get a quote which isn't marked. When all quotes are marked, un-mark all of them.
This will ensure you're going through all quotes in each cycle, together with randomness, and it will obviously work for any number of quotes.
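A rough sketch of that cycle is below. Instead of re-picking until an unmarked quote turns up, it draws directly from the list of unused indexes, which gives the same result without the retry loop. createQuoteCycle is a hypothetical name, and the state here lives only in memory; a real site would have to persist it somewhere shared (an assumption about the setup) so that every visitor sees the same quote.

```javascript
// Returns a function that yields quote indexes in random order, using
// every index exactly once before a new cycle begins.
function createQuoteCycle(quoteCount) {
  var unused = [];
  return function nextQuoteIndex() {
    if (unused.length === 0) {
      // All quotes have been used: "un-mark" them by refilling the pool.
      for (var i = 0; i < quoteCount; i++) { unused.push(i); }
    }
    var pick = Math.floor(Math.random() * unused.length);
    return unused.splice(pick, 1)[0];
  };
}

var nextQuote = createQuoteCycle(500);
var todaysIndex = nextQuote(); // call once per day
```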
<body onLoad="thoughts_authors()">
<script>
function thoughts_authors()
{
var authors=new Array()
authors[0] = "Charles Schulz";
authors[1] = "Jack Wagner";
authors[2] = "Mark Twain";
authors[3] = "Oscar Wilde";
authors[4] = "David Letterman";
authors[5] = "Lily Tomlin";
var thoughts=new Array()
thoughts[0] = "Good Day Is Today";
thoughts[1] = "Style Is What You Choose";
thoughts[2] = "Be The Best Version Of You.";
thoughts[3] = "Truth Along Triumphs.";
thoughts[4] = "How can Life Be Devastating When YOU Are Present in It.";
thoughts[5] = "Believe In What You Say";
var index = Math.floor(Math.random() * thoughts.length); // declared with var to avoid an implicit global
alert(thoughts[index]+ "-" + authors[index]);
}
</script>
This will show a random quote along with the author stored at the same index.
