Optimising a group of dc.js line graphs - javascript

I have a group of graphs visualizing a bunch of data for me (here), based off a csv with approximately 25,000 lines of data, each having 12 parameters. However, doing any interaction (such as selecting a range with the brush on any of the graphs) is slow and unwieldy, completely unlike the dc.js demo found here, which deals with thousands of records as well but maintains smooth animations, or crossfilter's demo here which has 10 times as many records (flights) as I do.
I know the main resource hogs are the two line charts, since they have data points every 15 minutes for about 8 solid months. Removing either of them makes the charts responsive again, but they're the main feature of the visualizations, so is there any way I can make them show less fine-grained data?
The code for the two line graphs specifically is below:
var lineZoomGraph = dc.lineChart("#chart-line-zoom")
.width(1100)
.height(60)
.margins({top: 0, right: 50, bottom: 20, left: 40})
.dimension(dateDim)
.group(tempGroup)
.x(d3.time.scale().domain([minDate,maxDate]));
var tempLineGraph = dc.lineChart("#chart-line-tempPer15Min")
.width(1100).height(240)
.dimension(dateDim)
.group(tempGroup)
.mouseZoomable(true)
.rangeChart(lineZoomGraph)
.brushOn(false)
.x(d3.time.scale().domain([minDate,maxDate]));
Separate but relevant question; how do I modify the y-axis on the line charts? By default they don't encompass the highest and lowest values found in the dataset, which seems odd.
Edit: some code I wrote to try to solve the problem:
var graphWidth = 1100;
var dataPerPixel = data.length / graphWidth;
var tempGroup = dateDim.group().reduceSum(function(d) {
if (d.pointNumber % Math.ceil(dataPerPixel) === 0) {
return d.warmth;
}
});
d.pointNumber is a unique point ID for each data point, cumulative from 0 to 22 thousand ish. Now however the line graph shows up blank. I checked the group's data using tempGroup.all() and now every 21st data point has a temperature value, but all the others have NaN. I haven't succeeded in reducing the group size at all; it's still at 22 thousand or so. I wonder if this is the right approach...
Edit 2: found a different approach. I create the tempGroup normally but then create another group which filters the existing tempGroup even more.
var tempGroup = dateDim.group().reduceSum(function(d) { return d.warmth; });
var filteredTempGroup = {
all: function () {
return tempGroup.top(Infinity).filter( function (d) {
if (d.pointNumber % Math.ceil(dataPerPixel) === 0) return d.value;
} );
}
};
The problem I have here is that d.pointNumber isn't accessible so I can't tell if it's the Nth data point (or a multiple of that). If I assign it to a var it'll just be a fixed value anyway, so I'm not sure how to get around that...

When dealing with performance problems with d3-based charts, the usual culprit is the number of DOM elements, not the size of the data. Notice the crossfilter demo has lots of rows of data, but only a couple hundred bars.
It looks like you might be attempting to plot all the points instead of aggregating them. I guess since you are doing a time series it may be unintuitive to aggregate the points, but consider that your plot can only display 1100 points (the width), so it is pointless to overwork the SVG engine plotting 25,000.
I'd suggest bringing it down to somewhere between 100-1000 bins, e.g. by averaging each day:
var daysDim = data.dimension(function(d) { return d3.time.day(d.time); });
function reduceAddAvg(attr) {
return function(p,v) {
if (_.isLegitNumber(v[attr])) {
++p.count;
p.sums += v[attr];
p.averages = (p.count === 0) ? 0 : p.sums/p.count; // guard against dividing by zero
}
return p;
};
}
function reduceRemoveAvg(attr) {
return function(p,v) {
if (_.isLegitNumber(v[attr])) {
--p.count
p.sums -= v[attr];
p.averages = (p.count === 0) ? 0 : p.sums/p.count;
}
return p;
};
}
function reduceInitAvg() {
return {count:0, sums:0, averages:0};
}
...
// average a parameter (column) named "param"
var daysGroup = daysDim.group().reduce(reduceAddAvg('param'), reduceRemoveAvg('param'), reduceInitAvg);
(reusable average reduce functions from the FAQ)
Then specify your xUnits to match, and use elasticY to auto-calculate the y axis:
chart.xUnits(d3.time.days)
.elasticY(true)
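As a rough, library-free illustration of what this binning does (it is not the crossfilter API itself), the sketch below groups timestamped readings by calendar day and averages one attribute. The function name averageByDay, the sample records and the choice of UTC days are assumptions made for the example; the output mirrors the {count, sums, averages} shape produced by the reducers above.

```javascript
// Library-free sketch: bin timestamped readings by UTC calendar day and average them.
function averageByDay(records, attr) {
  const bins = new Map();
  for (const r of records) {
    const v = r[attr];
    // Skip missing or non-numeric values, similar in spirit to the isLegitNumber check above.
    if (typeof v !== 'number' || !Number.isFinite(v)) continue;
    const day = Date.UTC(r.time.getUTCFullYear(), r.time.getUTCMonth(), r.time.getUTCDate());
    const bin = bins.get(day) || { count: 0, sums: 0 };
    bin.count += 1;
    bin.sums += v;
    bins.set(day, bin);
  }
  // Return key/value pairs sorted by day, like group.all() would.
  return [...bins.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([day, b]) => ({
      key: new Date(day),
      value: { count: b.count, sums: b.sums, averages: b.sums / b.count }
    }));
}

// Hypothetical sample: three valid 15-minute readings over two days, plus one NaN.
const readings = [
  { time: new Date(Date.UTC(2015, 0, 1, 0, 0)), warmth: 10 },
  { time: new Date(Date.UTC(2015, 0, 1, 0, 15)), warmth: 14 },
  { time: new Date(Date.UTC(2015, 0, 2, 0, 0)), warmth: 20 },
  { time: new Date(Date.UTC(2015, 0, 2, 0, 15)), warmth: NaN }
];
const daily = averageByDay(readings, 'warmth');
console.log(daily.map(d => d.value.averages)); // [ 12, 20 ]
```

Because each bin's value is an object rather than a plain number, a dc.js chart fed by such a group would presumably also need something like .valueAccessor(function(d) { return d.value.averages; }).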

Related

Picking quartile value on each point

I'm plotting the sentiment value of tweets over the last 10 years.
The csv file has three columns, as shown below.
I plotted each value by date successfully.
However, when I tried to generate an area graph,
I ran into a problem:
each date has multiple values.
That's because each data point comes from a single tweet, so
one x point ends up having multiple y values.
So I tried to pick a quartile value for each date, or the largest or smallest y value.
For clarity, please see the example below.
January 8 has multiple y values (textblob)
I want to draw an area graph by picking the largest value or the 2nd quartile value at each point.
How do I pick only that point?
I would like to feed that point into the following code as an
x/y coordinate for a line or area graph.
function* vlinedrawing(data){
for(let i=0;i<data.length;i++){
if( i%500==0) yield svg.node();
let px = margin+xscale(data[i].date)
let py = height-margin-yscale(data[i].vader)
paths.append('path')
.attr('x',px)
.attr('y',py)
}
yield svg.node()
}
The entire code is in the following link.
https://jsfiddle.net/soonk/uh5djax4/2/
Thank you in advance.
( The reason why it is a generator is that I'm going to visualize the graph in animated way)
For getting the 2nd quartile you can use d3.quantile like this:
d3.quantile(dataArray, 0.5);
Of course, since the 2nd quartile is the median, you can also just use:
d3.median(dataArray);
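But d3.quantile is a bit more versatile: you can just change the p value for any quantile you want. Note that in d3 v5 and earlier, d3.quantile expects the array to be sorted in ascending order, whereas d3.median sorts the values itself.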
Using your data, without parsing the dates (so we can use a Set for unique values), here is a possible solution:
const aggregatedData = [...new Set(data.map(function(d) {
return d.date
}))].map(function(d) {
return {
date: parser(d),
textblob: d3.quantile(data.filter(function(e) {
return e.date === d
}).map(function(e) {
return e.textblob
}).sort(d3.ascending), 0.5) // d3 v5's quantile expects a sorted array
}
});
This is just a quick answer to show you the way: it is not optimised code, because there are several loops within loops. You can try to optimise it.
Here is the demo:
var parser = d3.timeParse("%m/%d/%y");
d3.csv('https://raw.githubusercontent.com/jotnajoa/Javascript/master/tweetdata.csv', row).then(function(data) {
const aggregatedData = [...new Set(data.map(function(d) {
return d.date
}))].map(function(d) {
return {
date: parser(d),
textblob: d3.quantile(data.filter(function(e) {
return e.date === d
}).map(function(e) {
return e.textblob
}).sort(d3.ascending), 0.5) // d3 v5's quantile expects a sorted array
}
});
console.log(aggregatedData)
});
function row(d) {
d.vader = +d.vader;
d.textblob = +d.textblob;
return d;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>
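The nested filter inside map rescans the whole array once per unique date. One way to avoid that, sketched here without d3 so it runs on its own, is to group the rows by date in a single pass with a Map and then take one quantile per group. The names quantileSorted and quantileByDate and the sample rows are made up for this example; quantileSorted uses the R-7 interpolation that d3.quantile documents, and dates are left as raw strings so the parser from the answer could be applied afterwards.

```javascript
// R-7 quantile of an ascending-sorted numeric array.
function quantileSorted(sorted, p) {
  const n = sorted.length;
  if (n === 0) return undefined;
  if (p <= 0 || n < 2) return sorted[0];
  if (p >= 1) return sorted[n - 1];
  const h = (n - 1) * p;
  const lo = Math.floor(h);
  return sorted[lo] + (sorted[lo + 1] - sorted[lo]) * (h - lo);
}

// Group rows by their raw date string in one pass, then compute one quantile per date.
function quantileByDate(rows, attr, p) {
  const byDate = new Map();
  for (const row of rows) {
    if (!byDate.has(row.date)) byDate.set(row.date, []);
    byDate.get(row.date).push(row[attr]);
  }
  return [...byDate].map(([date, values]) => ({
    date: date,
    [attr]: quantileSorted(values.sort((a, b) => a - b), p)
  }));
}

// Hypothetical sample rows in the same shape as the csv (date string plus numeric score).
const sample = [
  { date: '1/8/19', textblob: 0.4 },
  { date: '1/8/19', textblob: -0.2 },
  { date: '1/8/19', textblob: 0.1 },
  { date: '1/9/19', textblob: 0.5 }
];
const medians = quantileByDate(sample, 'textblob', 0.5);
console.log(medians);
```

Passing 1 instead of 0.5 would pick the largest value per date, and 0 the smallest.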

dc.js - rangeChart bars disappearing when filtered out

I'm new to dc.js and trying to implement something like the "Monthly Index Abs Move" graph in the demo at https://dc-js.github.io/dc.js/
(see document source at https://dc-js.github.io/dc.js/docs/stock.html).
i.e. I'm trying to implement a line chart for the "zoomed in" view with a bar chart for the "zoomed out" view (rangeChart).
My problem is that when I filter a date range (e.g. by using the brush on the bar chart), the bars that are filtered out disappear.
The demo has this working correctly - the bars outside the date range are gray and those within the date range are blue - see screenshots.
I'm using the css file used in the demo, and I'm using very similar code (see code below), so I'm not sure why this difference.
var maxDate = new Date(1985, 0, 1);
var minDate = new Date(2200, 12, 31);
events.forEach(function (d) {
d.created = new Date(d.created);
//d.last_modified = new Date(d.last_modified);
d.hour = d3.time.hour(d.created); // precalculate for performance
d.day = d3.time.day(d.created);
if (d.created > maxDate) {
maxDate = d.created;
}
if (d.created < minDate) {
minDate = d.created;
}
});
var ndx = crossfilter(events);
var dateDimension = ndx.dimension(dc.pluck('created'));
var chatHourDim = ndx.dimension(dc.pluck('hour'));
var chatDayDim = ndx.dimension(dc.pluck('day'));
var chatsPerHourGroup = chatHourDim.group().reduceCount();
var chatsPerDayGroup = chatDayDim.group().reduceCount();
visitorsPerHour /* dc.lineChart('#visitors-count', 'chartGroup'); */
.renderArea(true)
.width(900)
.height(200)
.transitionDuration(10)
.margins({top: 30, right: 40, bottom: 25, left: 40})
.dimension(chatHourDim)
.mouseZoomable(true)
// Specify a “range chart” to link its brush extent with the zoom of the current “focus chart”.
.rangeChart(visitorsPerDay)
.x(d3.time.scale().domain([minDate, maxDate]))
.round(d3.time.hour.round)
.xUnits(d3.time.hours)
.elasticY(true)
.renderHorizontalGridLines(true)
.legend(dc.legend().x(650).y(10).itemHeight(13).gap(5))
.brushOn(false)
.group(chatsPerHourGroup, 'Chat events per hour')
.title(function (d) {
var value = d.value;
if (isNaN(value)) {
value = 0;
}
return dateFormat(d.key) + '\n' + value + " chat events";
});
// dc.barChart("visitors-count-per-day", 'chartGroup');
visitorsPerDay.width(900)
.height(40)
.margins({top: 0, right: 50, bottom: 20, left: 40})
.dimension(chatDayDim)
.group(chatsPerDayGroup)
// .centerBar(true)
.gap(1)
.brushOn(true)
.x(d3.time.scale().domain([minDate, maxDate]))
.round(d3.time.day.round)
.alwaysUseRounding(true)
.xUnits(d3.time.days);
The way dc.js and crossfilter ordinarily support this functionality is that a crossfilter group does not observe its own dimension's filters.
The range chart example in the stock example uses the same dimension for both charts (moveMonths). So, when the focus chart is zoomed to the selected range in the range chart, it does filter the data for all the other charts (which you want), but it does not filter the range chart.
If you want to use different dimensions for the two charts, I can see a couple ways to get around this.
Using a fake group
Perhaps the easiest thing to do is snapshot the data and disconnect the range chart from later filters, using a fake group:
function snapshot_group(group) {
// will get evaluated immediately when the charts are initializing
var _all = group.all().map(function(kv) {
// don't just copy the array, copy the objects inside, because they may change
return {key: kv.key, value: kv.value};
});
return {
all: function() { return _all; }
};
}
visitorsPerDay
.group(snapshot_group(chatsPerDayGroup))
However, the range chart also won't respond to filters on other charts, and you probably want it to.
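To make the effect of the snapshot concrete, here is a small stand-alone sketch. The mockGroup object is a hypothetical stand-in for a crossfilter group (anything with an all() method), and the assumption is that bins are updated in place when filters change, which is why the helper copies each object.

```javascript
// Same helper as above: copy the group's bins once, then always return that copy.
function snapshot_group(group) {
  var _all = group.all().map(function (kv) {
    return { key: kv.key, value: kv.value };
  });
  return {
    all: function () { return _all; }
  };
}

// Hypothetical stand-in for a crossfilter group whose bins are mutated in place.
var bins = [{ key: 'Mon', value: 5 }, { key: 'Tue', value: 3 }];
var mockGroup = { all: function () { return bins; } };

var frozen = snapshot_group(mockGroup);
bins[0].value = 0; // simulate a filter emptying the first bin

console.log(mockGroup.all()[0].value); // 0
console.log(frozen.all()[0].value);    // 5
```

The live group reflects the change while the snapshot keeps the original counts, which is exactly why the range chart stops reacting to any filter.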
Same dimension, different groups
So arguably the more correct thing is to use only one time dimension for both the focus and range charts, although it kills the optimization you were trying to do on binning. A group optionally takes its own accessor, which takes the dimension key and produces its own key, which must preserve the ordering.
Seems like it was probably designed for exactly this purpose:
var dateDimension = ndx.dimension(dc.pluck('created'));
var chatsPerHourGroup = dateDimension.group(function(d) {
return d3.time.hour(d);
}).reduceCount();
var chatsPerDayGroup = dateDimension.group(function(d) {
return d3.time.day(d);
}).reduceCount();
visitorsPerHour /* dc.lineChart('#visitors-count', 'chartGroup'); */
.dimension(dateDimension)
.group(chatsPerHourGroup, 'Chat events per hour')
visitorsPerDay.width(900)
.dimension(dateDimension)
.group(chatsPerDayGroup)
I don't know if you'll notice a slowdown. Yes, JavaScript date objects are slow, but this shouldn't be an issue unless you are converting tens or hundreds of thousands of dates. It's usually DOM elements that are the bottleneck in d3/dc, not anything on the JavaScript side.

Highcharts - gap between series in stacked area chart

I've created a stacked area chart in Highcharts, which you can see in the image below and in the following jsfiddle: http://jsfiddle.net/m3dLtmoz/
I have a workaround for the gaps you see, which is to group the data for each series by month so that each series looks something like this instead:
series: [{
data: [
[1464739200000,2471],
[1467331200000,6275],
[1470009600000,2574],
[1472688000000,7221],
[1475280000000,3228]
]}
]
While the above isn't exactly what I'm going for, the way the series above is structured does give me what I ultimately want, which is this:
I'm really dying to know why the original setup isn't working appropriately, however. I've tested other instances where datetimes group and aggregate properly based on a single datetime x axis value. I'm stumped as to why this particular data set isn't working. I've tried using the dataGrouping option in the Highstock library, but wasn't able to integrate that effectively. I've messed with options as far as tickInterval goes to no avail. I tried setting the stacking: 'normal' option in each series instead of in the plotOptions, but that made no difference. I've seen issues on github dealing with the stacked area charts, but nothing seems to exactly match up with what I'm seeing. Any help is appreciated - thank you much!
You receive an error in the console: most series types require data to be sorted in ascending order. Stacking has nothing to do with it, see example.
Series which do not require data to be sorted are scatter or polygon. No error in scatter
You should sort and group the points on your own. If you want to group them by months you have to prepare the data before you put them in a chart. The example below takes averages from the same datetime.
function groupData(unsortedData) {
var data = unsortedData.slice();
data.sort(function (a, b) {
return a[0] - b[0]
});
var i = 1,
len = data.length,
den = 1,
sum = data[0][1],
groupedData = [];
for (; i < len; i++) {
if (data[i - 1][0] === data[i][0]) {
sum += data[i][1];
den++;
} else {
groupedData.push([data[i - 1][0], sum / den]);
den = 1;
sum = data[i][1];
}
}
groupedData.push([data[i-1][0], sum / den]);
return groupedData;
}
example: http://jsfiddle.net/e4enhw9a/1/
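If the goal is one point per month rather than one per identical timestamp, a similar preparation step could bucket the points by month before handing them to the chart. This is only a sketch: groupByMonth is a hypothetical helper, it assumes [timestampMs, value] pairs, it buckets by UTC month, and it sums the values (averaging would work the same way with a count per bucket).

```javascript
// Sum [timestampMs, value] points into one point per UTC calendar month.
function groupByMonth(points) {
  var totals = new Map();
  points.forEach(function (point) {
    var d = new Date(point[0]);
    var monthStart = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
    totals.set(monthStart, (totals.get(monthStart) || 0) + point[1]);
  });
  // Return the points in ascending x order, as most series types require.
  return Array.from(totals).sort(function (a, b) {
    return a[0] - b[0];
  });
}

// Hypothetical unsorted sample: two points in June 2016, one in July 2016.
var raw = [
  [Date.UTC(2016, 6, 20), 5],
  [Date.UTC(2016, 5, 3), 10],
  [Date.UTC(2016, 5, 17), 7]
];
var monthly = groupByMonth(raw);
console.log(monthly);
```

Each resulting x value is the first millisecond of its month in UTC, the same kind of value as 1464739200000 (1 June 2016) in the workaround series from the question.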

dc.js x axis not refreshing

I am using crossfilter with several charts in combination with dc.js.
When filtering with ring charts, the data in the linechart disappears, but the x-axis remains unchanged and doesn't refresh.
var tempLineChartt1 = dc.lineChart("#chart-line-temp-t1");
tempLineChartt1
.width(768)
.height(480)
.elasticX(true)
.x(d3.time.scale().domain([dateDim.bottom(1)[0].dd,dateDim.top(1)[0].dd]))
.elasticX(true)
.dimension(dateDim)
.group(iotmPerDate)
.renderArea(true)
.brushOn(false)
.renderDataPoints(true)
.clipPadding(10)
.yAxisLabel("T1")
I know this has been answered a few times, but I couldn't find a reference to exactly this in a quick search.
Most likely you're referring to the fact that bins aren't automatically removed from crossfilter groups. So dc.js sees no reason to change the X domain - elasticX(true) will only kick in when the set of X keys changes, and here the Y values have only dropped to zero.
You can use a "fake group" to filter out these results dynamically:
function remove_empty_bins(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
return d.value != 0;
});
}
};
}
var filtered_group = remove_empty_bins(group) // or filter_bins, or whatever
chart.dimension(dim)
.group(filtered_group)
https://github.com/dc-js/dc.js/wiki/FAQ#fake-groups
With this in place, each time the line chart is redrawn, the fake group will filter out the zeros as the data is read. Then the line chart will recalculate the domain and zoom to fit.
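The behaviour can be checked in isolation with a mock group. In the sketch below, mockGroup is a hypothetical stand-in for a crossfilter group, and setting a bin's value to zero simulates what a filter on another chart would do.

```javascript
// Same fake group as above: hide bins whose value is zero, evaluated on every all() call.
function remove_empty_bins(source_group) {
  return {
    all: function () {
      return source_group.all().filter(function (d) {
        return d.value != 0;
      });
    }
  };
}

// Hypothetical stand-in for a crossfilter group.
var bins = [
  { key: '2015-01-01', value: 4 },
  { key: '2015-01-02', value: 2 },
  { key: '2015-01-03', value: 7 }
];
var mockGroup = { all: function () { return bins; } };
var filtered = remove_empty_bins(mockGroup);

console.log(filtered.all().length); // 3
bins[2].value = 0; // simulate another chart's filter emptying the last bin
console.log(filtered.all().length); // 2
```

Since the last key is no longer reported, a chart reading from the fake group sees a shorter key range on the next redraw.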

Remove hours from time series

Dygraphs allows easy display of time series...
However, if my data contains only two data points, it automatically fills the gaps in X axis with hours. Is it possible to disable this functionality?
I searched and tried many options but have not found anything useful.
An example might be the 'Time Series Drawing Demo' from the gallery - if executed on only a few data points, it fills the 'gaps' with hours.
This is a good example:
g = new Dygraph(document.getElementById('plot'),"a,b\n2008-12-01,0.9\n2008-12-02,0.3\n2008-12-03,0.7\n")
UPDATE- this seems to be working:
ticker: function(a, b, pixels, opts, dygraph, vals) {
var chosen = Dygraph.pickDateTickGranularity(a, b, pixels, opts);
if(chosen==12) chosen=13;
if (chosen >= 0) {
return Dygraph.getDateAxis(a, b, chosen, opts, dygraph);
} else {
// this can happen if self.width_ is zero.
return [];
}
};
Your issue is not that you have two points, but that your points cover a certain amount of time. Dygraphs tries to calculate the best granularity for the x axis tick marks in a given set of data.
One way to modify the default calculation is by using the pixelsPerLabel option.
Example: http://jsfiddle.net/kaliatech/P8ehg/
var data = "a,b\n2008-12-01,0.9\n2008-12-02,0.3\n2008-12-03,0.7\n";
g = new Dygraph(document.getElementById("plot"), data, {
axes: {
x: {
pixelsPerLabel: 100
}
}
});
This requires hard coding a pixel width though, and it is still ultimately dependent on the data set that you are graphing. A more flexible approach might be to use the ticker option, allowing you to supply your own function for calculating label granularity. See the documentation and built-in functions of dygraph-tickers.js.
See also:
How to set specific y-axis label points in dygraphs?
EDIT: Example using ticker. This requires that you are familiar with the data and the data range is somewhat constant, otherwise you could end up with unreadable x-axis labels.
var g = new Dygraph(document.getElementById("demodiv3"), data(), {
title: 'Example for changing x-axis label granularity 3',
axes: {
x: {
ticker: function(a, b, pixels, opts, dygraph, vals) {
var chosen = Dygraph.pickDateTickGranularity(a, b, pixels, opts);
//Force to DAILY if built-in calculation returned SIX_HOURLY
//if(chosen==Dygraph.SIX_HOURLY)
// chosen=Dygraph.DAILY;
//or
//Force to DAILY always
chosen = Dygraph.DAILY;
if (chosen >= 0) {
return Dygraph.getDateAxis(a, b, chosen, opts, dygraph);
} else {
return [];
}
}
}
}
});
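Instead of forcing one of the built-in granularities, a ticker can also build its own tick list. The sketch below is library-free and relies on one assumption: that a ticker returns an array of { v, label } objects, as described in dygraph-tickers.js. dailyTicks is a hypothetical helper that emits one tick per local calendar day between the axis bounds a and b (millisecond timestamps).

```javascript
// Build one tick per local calendar day between a and b (millisecond timestamps).
function dailyTicks(a, b) {
  var ticks = [];
  var day = new Date(a);
  day.setHours(0, 0, 0, 0); // snap to local midnight
  if (day.getTime() < a) {
    day.setDate(day.getDate() + 1); // first midnight inside the range
  }
  while (day.getTime() <= b) {
    var label = day.getFullYear() + '-' +
      String(day.getMonth() + 1).padStart(2, '0') + '-' +
      String(day.getDate()).padStart(2, '0');
    ticks.push({ v: day.getTime(), label: label });
    day.setDate(day.getDate() + 1);
  }
  return ticks;
}

// Same three-day span as the question's sample data, in local time.
var start = new Date(2008, 11, 1).getTime();
var end = new Date(2008, 11, 3).getTime();
var labels = dailyTicks(start, end).map(function (t) { return t.label; });
console.log(labels);
```

It could then be supplied as ticker: function(a, b) { return dailyTicks(a, b); }, with the same caveat as above: over a long date range one tick per day would make the labels unreadable.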
