I'm plotting the sentiment values of tweets over the last 10 years.
The CSV file has three columns, as shown below.
I plotted each value by date successfully.
However, when I tried to generate an area graph, I ran into a problem: each date has multiple values.
That's because each data point comes from a single tweet, so one x point ends up with multiple y values.
So I want to pick a quartile value for each date, or the largest or smallest y value.
For clarity, please see the example below.
January 8 has multiple y values (textblob).
I want to draw the area graph by picking the largest value or the 2nd quartile value for each point.
How do I pick just one point per date?
I would like to feed that point into the following code as an x/y coordinate for the line or area graph.
function* vlinedrawing(data) {
    for (let i = 0; i < data.length; i++) {
        if (i % 500 == 0) yield svg.node();
        let px = margin + xscale(data[i].date);
        let py = height - margin - yscale(data[i].vader);
        paths.append('path')
            .attr('x', px)
            .attr('y', py);
    }
    yield svg.node();
}
The entire code is in the following link.
https://jsfiddle.net/soonk/uh5djax4/2/
Thank you in advance.
(The reason it is a generator is that I'm going to animate the drawing of the graph.)
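For context, one way such a generator can be driven for the animation (a sketch only, not the exact code from the linked fiddle):
// Sketch: pull one chunk from the generator per animation frame so the
// browser can repaint between the yields (every 500 points above).
const drawer = vlinedrawing(data);
(function step() {
    const result = drawer.next();
    if (!result.done) requestAnimationFrame(step); // schedule the next chunk
})();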
For getting the 2nd quartile you can use d3.quantile like this:
d3.quantile(dataArray, 0.5);
Of course, since the 2nd quartile is the median, you can also just use:
d3.median(dataArray);
But d3.quantile is a bit more versatile, you can just change the p value for any quartile you want.
Using your data, without parsing the dates (so we can use a Set to get the unique values), here is a possible solution:
const aggregatedData = [...new Set(data.map(function(d) {
    return d.date;
}))].map(function(d) {
    return {
        date: parser(d),
        textblob: d3.quantile(data.filter(function(e) {
            return e.date === d;
        }).map(function(e) {
            return e.textblob;
        }).sort(d3.ascending), 0.5) // d3.quantile expects a sorted array
    };
});
This is just a quick answer to show you the way: it's not optimised code, because there are several loops within loops. You can try to optimise it.
Here is the demo:
var parser = d3.timeParse("%m/%d/%y");
d3.csv('https://raw.githubusercontent.com/jotnajoa/Javascript/master/tweetdata.csv', row).then(function(data) {
    const aggregatedData = [...new Set(data.map(function(d) {
        return d.date;
    }))].map(function(d) {
        return {
            date: parser(d),
            textblob: d3.quantile(data.filter(function(e) {
                return e.date === d;
            }).map(function(e) {
                return e.textblob;
            }).sort(d3.ascending), 0.5) // d3.quantile expects a sorted array
        };
    });
    console.log(aggregatedData);
});
function row(d) {
d.vader = +d.vader;
d.textblob = +d.textblob;
return d;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
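As an aside on the optimisation note above, a single-pass alternative using d3.nest (available in the d3 v5 build loaded above) could look roughly like this; swap d3.median for d3.max to take the largest value per date instead:
// Sketch: group rows by date string in one pass, then take the median
// (2nd quartile) of textblob per date. Assumes the same `data` and `parser`.
const aggregatedData = d3.nest()
    .key(function(d) { return d.date; })
    .rollup(function(rows) {
        return d3.median(rows, function(e) { return e.textblob; });
    })
    .entries(data)
    .map(function(group) {
        return { date: parser(group.key), textblob: group.value };
    });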
Related
The situation: I have two plots, a scatterplot and a histogram of the x values in that scatterplot. I wrote a custom reduce function that looks similar to this:
let grouping = this._cf_dimensions[attribute].group().reduce(
    // add
    function(elements, item) {
        elements.items.push(item);
        elements.count++;
        return elements;
    },
    // remove
    function(elements, item) {
        // console.log("item.id = " + item.id);
        let match = false;
        let values = [];
        for (let i = 0; i < elements.items.length && !match; i++) {
            // Compare hyperparameter signature.
            if (item.id === elements.items[i].id) {
                match = true;
                elements.items.splice(i, 1);
                elements.count--;
            }
        }
        return elements;
    },
    // init
    function() {
        return {items: [], count: 0};
    }
);
The problem: When I select points in my scatterplot, the correlated histogram does not update properly. I traced it back to the remove function (i.e., the second of the three functions above) being called for only one of my five groups (I checked by comparing the length of elements with the original group size). That means the items to be removed won't necessarily be found.
In other words, the scatterplot selects the correct set of data points, but the remove function in the bar chart grouping shown above, while registering the incoming filter update, is not called for all groups of this grouping (equivalently: not for all bars in the bar chart).
I'm a bit at a loss, since I seem to remember successfully implementing dashboards with dc.js and crossfilter.js in the past exactly like this. Do I misunderstand something about the custom reduce concept, or is there something obvious I'm overlooking?
Thanks!
Quite an oddly specific question here, but something I've been having a lot of trouble with over the past day or so. Broadly, I'm trying to calculate the maximum of one column using crossfilter and then use that maximum to look up an associated value.
For example, I have a series of Timestamps, each with an associated X Value and a Y Value. I want to aggregate the Timestamps by day, find the maximum X Value, and then report the Y Value associated with that Timestamp. As I understand it, this is essentially a double dimension.
I'm able to do the first stage simply, to find the maximum values, but am having a lot of difficulty getting to the second value.
Working code for the first stage (using Crossfilter and Reductio), assuming that each row has the following four values:
[(Timestamp, Date, XValue, YValue),
(2015-05-15 16:00:00, 2015-05-15, 30, 15),
(2015-05-15 16:45:00, 2015-05-15, 25, 33)
... (many thousand of rows)]
First Dimension
ndx = crossfilter(data);
dailyDimension = ndx.dimension(function(d) { return d.date; });
Get the max of the X Value using reductio
maxXValue = reductio().max(function(d) { return d.XValue;});
XValues = maxXValue(dailyDimension.group())
XValues now contains all of the maximum X Values on a daily basis.
I would now like to use these X Values to identify the corresponding Y Values on a per-date basis.
Using the same data as above, the appropriate value returned would be:
[(date, YValue),
('2015-05-15', 15)]
// Note that it is 15 because it is the Y Value of the row with the max X Value, not the max Y Value.
In Python/Pandas I would set the index of a DataFrame to X and then do an index match to find the Y Values
(Note: it can safely be assumed that the X Values are unique in this case, but in reality we should identify the Timestamp linked to this period and match on that, since Timestamps are strictly guaranteed to be unique, not just loosely.)
I believe this can be accomplished by modifying the reductio maximum code, which I don't fully understand yet. The source code is from here:
var reductio_max = {
add: function (prior, path) {
return function (p, v) {
if(prior) prior(p, v);
path(p).max = path(p).valueList[path(p).valueList.length - 1];
return p;
};
},
remove: function (prior, path) {
return function (p, v) {
if(prior) prior(p, v);
// Check for undefined.
if(path(p).valueList.length === 0) {
path(p).max = undefined;
return p;
}
path(p).max = path(p).valueList[path(p).valueList.length - 1];
return p;
};
},
initial: function (prior, path) {
return function (p) {
p = prior(p);
path(p).max = undefined;
return p;
};
}
};
Perhaps this can be modified so that there is a second valueList of Y Values which maps 1:1 to the X Values used in the max function. In that case it would be the same index lookup in both functions, and the value could be assigned simply.
My apologies that I don't have any more working code.
An alternative approach would be to use some form of filtering function to remove entries which don't satisfy the X criteria and then group by day (there should only be one value per day in this setting, so a simple reduceSum, for example, will still return the correct value).
// Pseudo non working code
dailyDimension.filter(function(p) {return p.XValue === XValues;})
dailyDimension.group().reduceSum(function(d) {return d.YValue;})
The eventual results will be plotted in dc.js.
Not sure if this will work, but maybe give it a try:
maxXValue = reductio()
    .valueList(function(d) {
        // Zero-pad XValue so string ordering matches numeric ordering,
        // and carry YValue along after a comma.
        return ("0000000000" + d.XValue).slice(-10) + ',' + d.YValue;
    })
    .aliasProp({
        max: function(g) {
            return +(g.valueList[g.valueList.length - 1].split(',')[0]);
        },
        yValue: function(g) {
            return +(g.valueList[g.valueList.length - 1].split(',')[1]);
        }
    });
XValues = maxXValue(dailyDimension.group())
This is kind of a less efficient and less safe re-implementation of the maximum calculation using the aliasProp option, which lets you do pretty much whatever you want to a group on every record addition and removal.
My untested assumption here is that the undocumented valueList function used internally in max/min/median keeps the list properly ordered. It might be easier/better to write a plain Crossfilter maximum aggregation and then modify it to also track the y value in the group.
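For example, a rough sketch of that plain-Crossfilter approach (untested; field names XValue/YValue as in the question) might look like this:
// Sketch: keep each day's rows and re-derive the max X / associated Y
// on every add/remove, so filtering elsewhere stays consistent.
var dailyGroup = dailyDimension.group().reduce(
    function add(p, v) {
        p.rows.push(v);
        if (p.maxX === null || v.XValue > p.maxX) {
            p.maxX = v.XValue;
            p.yAtMaxX = v.YValue;
        }
        return p;
    },
    function remove(p, v) {
        p.rows.splice(p.rows.indexOf(v), 1);
        // Re-scan, since the removed row may have held the maximum.
        p.maxX = null;
        p.yAtMaxX = null;
        p.rows.forEach(function(r) {
            if (p.maxX === null || r.XValue > p.maxX) {
                p.maxX = r.XValue;
                p.yAtMaxX = r.YValue;
            }
        });
        return p;
    },
    function init() {
        return { rows: [], maxX: null, yAtMaxX: null };
    }
);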
If you want to work through this with Reductio, I'm happy to do that with you here, but it will be easier if we have a working example on something like JSFiddle.
I have a group of graphs visualizing a bunch of data for me (here), based on a csv with approximately 25,000 lines of data, each with 12 parameters. However, any interaction (such as selecting a range with the brush on any of the graphs) is slow and unwieldy, completely unlike the dc.js demo found here, which deals with thousands of records as well but maintains smooth animations, or crossfilter's demo here, which has 10 times as many records (flights) as I do.
I know the main resource hogs are the two line charts, since they have data points every 15 minutes for about 8 solid months. Removing either of them makes the charts responsive again, but they're the main feature of the visualization, so is there any way I can make them show less fine-grained data?
The code for the two line graphs specifically is below:
var lineZoomGraph = dc.lineChart("#chart-line-zoom")
.width(1100)
.height(60)
.margins({top: 0, right: 50, bottom: 20, left: 40})
.dimension(dateDim)
.group(tempGroup)
.x(d3.time.scale().domain([minDate,maxDate]));
var tempLineGraph = dc.lineChart("#chart-line-tempPer15Min")
.width(1100).height(240)
.dimension(dateDim)
.group(tempGroup)
.mouseZoomable(true)
.rangeChart(lineZoomGraph)
.brushOn(false)
.x(d3.time.scale().domain([minDate,maxDate]));
A separate but related question: how do I modify the y axis on the line charts? By default it doesn't encompass the highest and lowest values found in the dataset, which seems odd.
Edit: some code I wrote to try to solve the problem:
var graphWidth = 1100;
var dataPerPixel = data.length / graphWidth;
var tempGroup = dateDim.group().reduceSum(function(d) {
if (d.pointNumber % Math.ceil(dataPerPixel) === 0) {
return d.warmth;
}
});
d.pointNumber is a unique point ID for each data point, counting up from 0 to about 22 thousand. Now, however, the line graph shows up blank. I checked the group's data using tempGroup.all(), and now every 21st data point has a temperature value, but all the others have NaN. I haven't succeeded in reducing the group size at all; it's still around 22 thousand. I wonder if this is the right approach...
Edit 2: I found a different approach. I create tempGroup normally but then create another group which filters the existing tempGroup further.
var tempGroup = dateDim.group().reduceSum(function(d) { return d.warmth; });
var filteredTempGroup = {
all: function () {
return tempGroup.top(Infinity).filter( function (d) {
if (d.pointNumber % Math.ceil(dataPerPixel) === 0) return d.value;
} );
}
};
The problem here is that d.pointNumber isn't accessible, so I can't tell whether it's the Nth data point (or a multiple of that). If I assign it to a var it'll just be a fixed value anyway, so I'm not sure how to get around that...
When dealing with performance problems with d3-based charts, the usual culprit is the number of DOM elements, not the size of the data. Notice the crossfilter demo has lots of rows of data, but only a couple hundred bars.
It looks like you might be attempting to plot all the points instead of aggregating them. I guess since you are doing a time series it may be unintuitive to aggregate the points, but consider that your plot can only display 1100 points (the width), so it is pointless to overwork the SVG engine plotting 25,000.
I'd suggest bringing it down to somewhere between 100-1000 bins, e.g. by averaging each day:
var daysDim = data.dimension(function(d) { return d3.time.day(d.time); });
function reduceAddAvg(attr) {
return function(p,v) {
if (_.isLegitNumber(v[attr])) {
++p.count;
p.sums += v[attr];
p.averages = (p.count === 0) ? 0 : p.sums / p.count; // guard against dividing by zero
}
return p;
};
}
function reduceRemoveAvg(attr) {
return function(p,v) {
if (_.isLegitNumber(v[attr])) {
--p.count;
p.sums -= v[attr];
p.averages = (p.count === 0) ? 0 : p.sums/p.count;
}
return p;
};
}
function reduceInitAvg() {
return {count:0, sums:0, averages:0};
}
...
// average a parameter (column) named "param"
var daysGroup = daysDim.group().reduce(reduceAddAvg('param'), reduceRemoveAvg('param'), reduceInitAvg);
(reusable average reduce functions from the FAQ)
Then specify your xUnits to match, and use elasticY to auto-calculate the y axis:
chart.xUnits(d3.time.days)
.elasticY(true)
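To tie this back to the charts in the question, here is a sketch of how the averaged group might be wired into the existing line chart (variable names from the snippets above; a valueAccessor is used because the reduced value is an object rather than a number):
// Sketch: point the line chart at the day-level averaged group.
tempLineGraph
    .dimension(daysDim)
    .group(daysGroup)
    .valueAccessor(function(d) { return d.value.averages; })
    .xUnits(d3.time.days)
    .elasticY(true);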
I am trying to update a line graph; it is not throwing any error, but it is also not updating the graph.
I am deleting a point and adding a new one with an incremented rate and a created_at date incremented by a second (trying to follow http://bl.ocks.org/benjchristensen/1148374).
function redrawWithoutAnimation() {
for (var i in chart_data) {
linedata = chart_data[i];
//delete first element of array
linedata.points.reverse().shift();
//create a new point
rate = linedata.points[0].rate + 1;
created_at = linedata.points[0].created_at + 6000;
new_point = {};
new_point.rate = rate;
new_point.created_at = created_at;
linedata.points.push(new_point);
console.log(linedata);
}
// static update without animation
svg.selectAll("path")
.data([linedata.points]); // set the new data
line(linedata.points); // apply the new data values
}
redrawWithoutAnimation();
setInterval(function () {
redrawWithoutAnimation();
}, 8000);
Here is my code:
http://jsfiddle.net/yr2Nw/8/
Working fiddle: http://jsfiddle.net/reblace/GsaGb/1
There are a few issues here...
First, you were updating all of chart_data in the for loop, but after the loop you were only trying to update the single line still stored in the linedata variable. You should try to avoid having variables with greater scope than they need; it can lead to bugs like this one:
svg.selectAll("path").data([linedata.points]);
line(linedata.points);
You should instead use D3's data join to rejoin the new data to all the paths at once, declaratively, like so:
linesGroup.selectAll("path")
.data(chart_data)
.attr("d", function(d){ return line(d.points); });
That code selects the paths, joins each of them to the corresponding chart_data element, and binds the appropriate line generator output to the "d" attribute of each path.
Then, you need to update your x axis and y axis, otherwise the plot will just shoot off the drawn area. This code updates the domains and then rebinds the axes to the DOM elements so they redraw:
xAxis.scale().domain([
d3.min(chart_data, function (c) { return d3.min(c.points, function (v) { return v.created_at; }); }),
d3.max(chart_data, function (c) { return d3.max(c.points, function (v) { return v.created_at; }); })
]);
yAxis.scale().domain([
0,
d3.max(chart_data, function (c) { return d3.max(c.points, function (v) { return v.rate; }); })
]);
svg.select(".x.axis").call(xAxis);
svg.select(".y.axis").call(yAxis);
There were a few other bugs; I fixed them in the Fiddle. For example, you need to calculate the time for the new point based on the last element in the array, not the first, otherwise the line can't interpolate properly since it's no longer a continuous function... and this is a slightly more concise way to do your line updates:
for (var i=0; i<chart_data.length; i++) {
linedata = chart_data[i];
//delete first element of array
var removedPoint = linedata.points.shift();
//create a new point
var lastpoint = linedata.points[linedata.points.length-1];
var new_point = {
rate: removedPoint.rate,
created_at: lastpoint.created_at + 6000
};
linedata.points.push(new_point);
}
Also note that you shouldn't use the for(var in) loop for arrays; that's for iterating over the properties of an object.
There's still some issues, but I think this should help get you over the hurdle you were stuck on. Anyways, it looks cool in action!
Fine, fenac. You are facing so many problems because your data is not in a good format for your requirements.
As per http://bl.ocks.org/benjchristensen/1148374, the x-axis data must be a plain data array (data[]).
Your data is something like this:
[object, object, object], where each object holds one element of the x-axis value, so the pushing and shifting is not possible.
Try to change the format of the data (linedata.points) to an array (data[]) and try it out; it should work.
You just need to put all the values in linedata.points into an array data[] and use this data[] to animate your line.
Since yours is a multi-line chart, you need to create a 2D array and pass the arrays accordingly.
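For example, a rough sketch of that restructuring (assuming each point's y value is its rate field, as in the question's data):
// Sketch: flatten each series' point objects into a plain array of values,
// giving one array per line (a 2D array overall) as the bl.ocks example expects.
var seriesArrays = chart_data.map(function(linedata) {
    return linedata.points.map(function(p) { return p.rate; });
});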
Cheers..
I updated your jsfiddle
setInterval(function () {
console.log(linedata.points);
var v = linedata.points.shift(); // remove the first element of the array
linedata.points.push(v); // add a new element to the array (we're just taking the number we just shifted off the front and appending to the end)
redrawWithoutAnimation();
}, 3000);
http://jsfiddle.net/yr2Nw/9/
But it still won't work until you restructure the data as described above...
Personal suggestion: first try it with a single-line graph, then move on to looping for the multi-line version...
Dygraphs allows easy display of time series...
However, if my data contains only two data points, it automatically fills the gaps in the X axis with hours. Is it possible to disable this behaviour?
I have searched and tried many options but haven't found anything useful.
An example is the 'Time Series Drawing Demo' from the gallery: if executed on only a few data points, it fills the 'gaps' with hours.
This is a good example:
g = new Dygraph(document.getElementById('plot'),"a,b\n2008-12-01,0.9\n2008-12-02,0.3\n2008-12-03,0.7\n")
UPDATE: this seems to be working:
ticker: function(a, b, pixels, opts, dygraph, vals) {
    var chosen = Dygraph.pickDateTickGranularity(a, b, pixels, opts);
    // Bump the chosen granularity from six-hourly (12) up to daily (13);
    // see the symbolic Dygraph.SIX_HOURLY / Dygraph.DAILY version in the answer below.
    if (chosen == 12) chosen = 13;
    if (chosen >= 0) {
        return Dygraph.getDateAxis(a, b, chosen, opts, dygraph);
    } else {
        // This can happen if self.width_ is zero.
        return [];
    }
}
Your issue is not that you have two points, but that your points cover a certain amount of time. Dygraphs tries to calculate the best granularity for the x axis tick marks in a given set of data.
One way to modify the default calculation is by using the pixelsPerLabel option.
Example: http://jsfiddle.net/kaliatech/P8ehg/
var data = "a,b\n2008-12-01,0.9\n2008-12-02,0.3\n2008-12-03,0.7\n";
g = new Dygraph(document.getElementById("plot"), data, {
axes: {
x: {
pixelsPerLabel: 100
}
}
});
This requires hard coding a pixel width though, and it is still ultimately dependent on the data set that you are graphing. A more flexible approach might be to use the ticker option, allowing you to supply your own function for calculating label granularity. See the documentation and built-in functions of dygraph-tickers.js.
See also:
How to set specific y-axis label points in dygraphs?
EDIT: Example using ticker. This requires that you are familiar with the data and the data range is somewhat constant, otherwise you could end up with unreadable x-axis labels.
var g = new Dygraph(document.getElementById("demodiv3"), data(), {
title: 'Example for changing x-axis label granularity 3',
axes: {
x: {
ticker: function(a, b, pixels, opts, dygraph, vals) {
var chosen = Dygraph.pickDateTickGranularity(a, b, pixels, opts);
//Force to DAILY if built-in calculation returned SIX_HOURLY
//if(chosen==Dygraph.SIX_HOURLY)
// chosen=Dygraph.DAILY;
//or
//Force to DAILY always
chosen = Dygraph.DAILY;
if (chosen >= 0) {
return Dygraph.getDateAxis(a, b, chosen, opts, dygraph);
} else {
return [];
}
}
}
}
});