Check Validity of Data Before Update Phase in D3.js - javascript

I have data which updates every 10 seconds, and I would like to check that all the data is valid before progressing with updates. I am intermittently getting bad data, which shows up as a negative number in one of the values. If one of the objects has a negative value then I don't trust the whole set and don't want to update any elements.
Ideally I don't want to update some items and then bail once the incorrect value occurs, but rather determine whether the whole set is good before updating anything.
I'm not sure how D3 can manage this, but I've tried the following and it seems to work. It doesn't seem particularly in keeping with the elegance of D3, though, so I think there's probably a correct and better way to do it. But maybe not?!
var dataValid = true;
abcItems.each(function (d, i) {
    if (0 > d.Number1 - d.Number2) dataValid = false;
});
if (dataValid) {
    abcItems.each(function (d, i) {
        // updating elements here
    });
} else {
    console.log("negative value occurred");
}
Is there a better way to manage this through D3?
A little bit more context:
The data (JSON provided via a RESTful API) and visualisation (a bar chart) are updating every 10 seconds. The glitch in the API results in incorrect data once every hour or so at the most (sometimes it doesn't happen all day). The effect of the glitch is that the bars all change dramatically whereas the data should only change by ones or twos each iteration. In the next fetch of data 10 seconds later the data is fine and the visualisation comes right.
The data itself is always "well-formed"; it's just that the values provided are incorrect. Therefore even during the glitch it is safe to bind the data to elements.
What I want to do, is skip the entire iteration and update phase if the data contains one of these negative values.
Perhaps also worth noting is that the items in the data are always the same; that is to say, the only "enter" phase that occurs is on page load, and there are no items that exit (though I do include these operations to capture any unexpected fluctuations in the data). The values for these items do change, though.

Looking at your code, it seems you have already bound the dataset to your DOM elements (abcItems.each(...)).
Why not bail out of the update function when the data is not valid?
d3.json("bar-tooltip.json", function(dataset) {
if (!dataset.every(d => d.Number2 <= d.Number1)) return;
// do the update of the graph
});
The example assumes you call d3.json() from a function that is called every update interval, but you can use a different update method.
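For context, a minimal sketch of that polling setup, assuming a refresh function wired up with setInterval (the function name and the interval wiring are assumptions; the validation check is the one from the snippet above):
function refresh() {
    d3.json("bar-tooltip.json", function (dataset) {
        // skip this entire update cycle if any row has a negative difference
        if (!dataset.every(function (d) { return d.Number2 <= d.Number1; })) {
            console.log("negative value occurred");
            return;
        }
        // ...bind dataset to the bars and run the normal update phase here...
    });
}
setInterval(refresh, 10000); // data updates every 10 seconds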

Related

Why is my reducer behaving differently between the first filter and subsequent filters applied in dc.js?

I'm working on a data visualization that has an odd little bug:
It's a little tricky to see, but essentially, when I click on a point in the line chart, that point corresponds to a specific issue of a magazine. The choropleth updates to reflect geodata for that issue, but, critically, the geodata is for a sampled period that corresponds to the issue. In practice, the choropleth will look the same for any issue between January-June or July-December of a given year.
As you can see, I have a key called Sampled Issue Date (for Geodata), and the value should be the date of the issue on which the geodata is based (basically, they would get geographical distribution for one specific issue and call it representative of ALL data in a six-month period). Yet, when I initially click on an issue, I'm always getting the last sampled date in my data. All of the geodata is correct, and, annoyingly, all subsequent clicks display the correct information. So it's only on that first click (after refreshing the page OR clearing an issue) that I have a problem.
Honestly, my code is a nightmare right now because I'm focused on debugging, but you can see my reducer for the remove function on GitHub which is also copy/pasted below:
// Reducer function for raw geodata
function geoReducerAdd(p, v) {
    // console.log(p.sampled_issue_date, v.sampled_issue_date, state.periodEnding, state.periodStart)
    ++p.count
    p.sampled_mail_subscriptions += v.sampled_mail_subscriptions
    p.sampled_single_copy_sales += v.sampled_single_copy_sales
    p.sampled_total_sales += v.sampled_total_sales
    p.state_population = v.state_population // only valid for population viz
    p.sampled_issue_date = v.sampled_issue_date
    return p
}
function geoReducerRemove(p, v) {
    const currDate = new Date(v.sampled_issue_date)
    // if(currDate.getFullYear() === 1921) {
    //     console.log(currDate)
    // }
    currDate <= state.periodEnding && currDate >= state.periodStart ? console.log(v.sampled_issue_date, p.sampled_issue_date) : null
    const dateToRender = currDate <= state.periodEnding && currDate >= state.periodStart ? v.sampled_issue_date : p.sampled_issue_date
    --p.count
    p.sampled_mail_subscriptions -= v.sampled_mail_subscriptions
    p.sampled_single_copy_sales -= v.sampled_single_copy_sales
    p.sampled_total_sales -= v.sampled_total_sales
    p.state_population = v.state_population // only valid for population viz
    p.sampled_issue_date = dateToRender
    return p
}
// generic georeducer
function geoReducerDefault() {
    return {
        count: 0,
        sampled_mail_subscriptions: 0,
        sampled_single_copy_sales: 0,
        sampled_total_sales: 0,
        state_population: 0,
        sampled_issue_date: ""
    }
}
The problem could be somewhere else, but I don't think it's a crossfilter issue (I'm not running into the "two groups from the same dimension" problem for sure) and adding additional logic to the add reducer makes things even less predictable (understandably - I don't ever really need to render the sample date for all values anyway.) The point of this is that I'm completely lost about where the flaw in my logic is, and I'd love some help!
EDIT: Note that the reducers are for the reduce method on a dc.js dimension, not the native javascript reducer! :D
Two crossfilters! Always fun to see that... but it can be tricky because nothing in dc.js directly supports that, except for the chart registry. You're on your own for filtering between different chart groups, and it can be tricky to map between data sets with different time resolutions and so on.
The problem
As I understand your app, when a date is selected in the line chart, the choropleth and accompanying text should have exactly one row from the geodata dataset selected per state.
The essential problem is that Crossfilter is not great at telling you which rows are in any given bin. So even though there's just one row selected, you don't know what it is!
This is the same problem that makes minimum, maximum, and median reductions surprisingly complicated. You often end up building new data structures to capture what crossfilter throws away in the name of efficiency.
A general solution
I'll go with a general solution that's more than you need, but can be helpful in similar situations. The only alternative I know of is to go completely outside crossfilter and look in the original dataset. That's fine too, and maybe more efficient, but it can be buggy and it's nice to work within the system.
So let's keep track of which dates we've seen per bin. When we start out, every bin will have all the dates. Once a date is selected, there will be only one date (but not exactly the one that was selected, because of your two-crossfilter setup).
Instead of the sampled_issue_date stuff, we'll keep track of an object called date_counts now:
// Reducer function for raw geodata
function geoReducerAdd(p, v) {
    // ...
    const canonDate = new Date(v.sampled_issue_date).getTime()
    p.date_counts[canonDate] = (p.date_counts[canonDate] || 0) + 1
    return p
}
function geoReducerRemove(p, v) {
    // ...
    const canonDate = new Date(v.sampled_issue_date).getTime()
    if(!--p.date_counts[canonDate])
        delete p.date_counts[canonDate]
    return p
}
// generic georeducer
function geoReducerDefault() {
    return {
        // ...
        date_counts: {}
    }
}
What does it do?
Line by line
const canonDate = new Date(v.sampled_issue_date).getTime()
Maybe this is paranoid, but this canonicalizes the input dates by converting them to the number of milliseconds since 1970. I'm sure you'd be safe using the string dates directly, but who knows, there could be a stray space or a zero or something.
You can't index an object with a Date object; you have to convert it to an integer.
p.date_counts[canonDate] = (p.date_counts[canonDate] || 0) + 1
When we add a row, we'll check if we currently have a count for the row's date. If so, we'll use the count we have. Otherwise we'll default to zero. Then we'll add one.
if(!--p.date_counts[canonDate])
delete p.date_counts[canonDate]
When we remove a row, we know that we have a count for the date for that row (because crossfilter won't tell us it's removing the row unless it was added earlier). So we can go ahead and decrement the count. Then if it hits zero we can remove the entry.
Like I said, it's overkill. In your case, the count will only go to 1 and then drop to 0. But it's not much more expensive to do this than to just keep a single date.
Rendering the side panel
When we render the side panel, there should only be one date left in date_counts for that selected item.
console.assert(Object.keys(date_counts).length === 1) // only one entry
console.assert(Object.entries(date_counts)[0][1] === 1) // with count 1
document.getElementById('geo-issue-date').textContent =
    new Date(+Object.keys(date_counts)[0]).toLocaleDateString('en-US', { year: 'numeric', month: 'short', day: 'numeric' })
Usability notes
From a usability perspective, I would recommend not to filter(null) on mouseleave, or if you really want to, then put it on a timeout which gets cancelled when you see a mouseenter. One should be able to "scrub" over the line chart and see the changes over time in the choropleth without accidentally switching back to the unfiltered colors.
I also noticed (and filed) an issue: dots to the right of the mouse pointer are shown on top, making them difficult to click. The reason is that the dots are overlapping, so only a little sliver of a crescent is hoverable. At least with my trackpad, the click causes the pointer to travel leftward. (I can see the date go back a week in the tooltip and then return.) It's not as much of a problem when you're zoomed in.
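A rough sketch of that timeout idea, using plain DOM events on the line chart's container (the element id, the 500 ms delay, and the lineChart variable name are assumptions, not from the original code):
var resetTimer = null;
var lineChartRoot = document.getElementById('line-chart'); // assumed container id
lineChartRoot.addEventListener('mouseleave', function () {
    // defer clearing the filter so a quick re-entry can cancel it
    resetTimer = setTimeout(function () {
        lineChart.filter(null); // assumed dc.js chart variable
        dc.redrawAll();
    }, 500);
});
lineChartRoot.addEventListener('mouseenter', function () {
    if (resetTimer) {
        clearTimeout(resetTimer); // cancel a pending reset when the pointer comes back
        resetTimer = null;
    }
});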

Cesium large number of entity updates

I am working on a project dealing with sensor data. In my backend everything is stored in a database which is polled by a controller and converted into KML to display on the Cesium globe. This poll happens every 5-10 seconds and contains around 4000-8000 objects (we store up to 1 minute's worth of data, so we are looking at somewhere like 20k-50k points). Following this, I have an update function, run every 5 seconds, which slowly fades the markers out.
To load the KML onto the map I use the following:
var dataSource = new Cesium.KmlDataSource();
dataSource.load('link').then(function(value) {
    viewer.dataSources.add(dataSource);
});
In the color-update function I am iterating over all of the objects within the data source's entity collection and updating them like so (this is very inefficient):
var colorUpdate = Cesium.Color.fromAlpha(newColor, .4);
dataSource.entities.values[i].billboard.color = colorUpdate;
When I do an add or a color update I see a large amount of lag, and I was curious whether there is anything you would suggest to fix this? Generally I get a freeze-up for a few seconds. After 60 seconds of the data being on the map it gets removed like so (just a different if case within the color-update loop):
dataSource.entities.remove(dataSource.entities.values[i]);
Is there potentially a way to set a property for an entire entity collection, so that when the collection becomes 30 seconds old it updates the color to a new one? It seems I just need to find a way to set a property for the entire collection rather than for individual entities. Does anyone know how to do that, or have a suggestion for something better?
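For reference, here is the fade/remove loop described above consolidated into one sketch. The age bookkeeping (getAgeInSeconds) and fadedColor are hypothetical placeholders, and the suspendEvents()/resumeEvents() batching is a separate suggestion for grouping many changes into one collection notification, not something from this question:
var entities = dataSource.entities;
entities.suspendEvents(); // batch the per-entity changes into a single collectionChanged event
var values = entities.values;
for (var i = values.length - 1; i >= 0; i--) {
    var entity = values[i];
    var ageSeconds = getAgeInSeconds(entity); // hypothetical helper: seconds since this sample arrived
    if (ageSeconds > 60) {
        entities.remove(entity); // drop markers older than a minute
    } else if (ageSeconds > 30) {
        entity.billboard.color = Cesium.Color.fromAlpha(fadedColor, 0.4); // fade older markers
    }
}
entities.resumeEvents();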

d3 — Progressively draw a large dataset

I'm using d3.js to plot the contents of an 80,000 row .tsv onto a chart.
The problem I'm having is that since there is so much data, the page becomes unresponsive for approximately 5 seconds while the entire dataset is churned through at once.
Is there an easy way to process the data progressively, spreading the work over a longer period of time?
Ideally the page would remain responsive, and the data would be plotted as it became available, instead of in one big hit at the end.
I think you'll have to chunk your data and display it in groups using setInterval or setTimeout. This gives the UI a chance to jump in between chunks.
The basic approach is:
1) chunk the data set
2) render each chunk separately
3) keep track of each rendered group
Here's an example:
var dataPool = chunkArray(data, 100); // split the data into chunks of 100 rows
var poolPosition = 0;
function updateVisualization() {
    // each chunk gets its own group, with its own data join
    var group = canvas.append("g").selectAll("circle")
        .data(dataPool[poolPosition])
        .enter()
        .append("circle");
        /* ... presentation stuff .... */
    if (++poolPosition >= dataPool.length) clearInterval(iterator); // stop once all chunks are drawn
}
var iterator = setInterval(updateVisualization, 100);
You can see a demo fiddle of this -- done before I had coffee -- here:
http://jsfiddle.net/thudfactor/R42uQ/
Note that I'm making a new group, with its own data join, for each array chunk. If you keep adding to the same data join over time (data(oldData.concat(nextChunk))), the entire data set still gets processed and compared even if you're only using the enter() selection, so it doesn't take long for things to start crawling.
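The chunkArray helper used above isn't part of D3; a minimal version (an assumption about how the one in the fiddle behaves) could be:
// Split an array into chunks of at most `size` elements.
function chunkArray(array, size) {
    var chunks = [];
    for (var i = 0; i < array.length; i += size) {
        chunks.push(array.slice(i, i + size));
    }
    return chunks;
}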

Hiding all series except one very slow in IE [duplicate]

I'm using Highcharts to represent groups of time series. So, data points collected from the same individual are connected by lines, and data points from individuals that belong to the same group share the same color. The Highcharts legend displays each individual time series instead of groups, and I have over a hundred time series, so it's ugly and impractical to hide and show data that way.
Instead I made buttons and used jQuery to associate them with functions that would search for matching colors among the time series and toggle the visibility of each matching series.
Here is an example with a small dataset: http://jsfiddle.net/bokov/VYkmg/6/
Here is the series-hiding function from that example:
$("#button").click(function() {
if ($(this).hasClass("hideseries")) {
hs = true;
} else {
hs = false;
}
$(chart.series).each(function(idx, item) {
if (item.color == 'green') {
if (hs) {
item.show();
} else {
item.hide();
}
}
});
$(this).toggleClass("hideseries");
});
The above works. The problem is, my real data can have over a hundred individual time series and it looks like checking the color of each series is really slow. So, can anybody suggest a more efficient way to solve this problem? Are there some built-in Highcharts methods that already do this? Or, can I give jQuery a more specific selector?
I tried digging into the <svg> element created by Highcharts but I can't figure out which child elements correspond to the series in the chart.
Thanks.
The issue here is that Highcharts is redrawing the chart after every series change. I checked the API to see if there was a param you could pass to defer that, but that doesn't appear to be the case.
Instead, you can stub out the redraw method until you are ready, like so:
var _redraw = chart.redraw;
chart.redraw = function(){};
//do work
chart.redraw = _redraw;
chart.redraw();
Check out the full example here. For me, it was about 10 times faster to do it this way.
Rather than calling show() or hide() for each series, call setVisible(/* TRUE OR FALSE HERE */, false). The second parameter is the redraw flag, and passing false avoids triggering a (slow) redraw for each series.
Then, after you're done changing visibilities, call chart.redraw() once.
http://api.highcharts.com/highcharts#Series.setVisible
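Applied to the loop from the question, that might look roughly like this (a sketch reusing the hs flag and the green-color check from above):
$(chart.series).each(function(idx, item) {
    if (item.color == 'green') {
        item.setVisible(hs, false); // second argument false: don't redraw per series
    }
});
chart.redraw(); // a single redraw once every visibility change has been made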

KnockoutJS foreach blocks main thread

When I have a large dataset in my viewModel and I use foreach to loop over an Array of Objects to render each Object as a row within a table, KnockoutJS will block the main thread until it can render, which sometimes takes minutes (!).
Here is a jsFiddle example using a dataset containing 2000 Objects, each containing a url and a code. Real data will have longer URLs in some cases and 4 other columns (only 2 in this example). I also added some simple styles, because adding styles also seems to slow things down a bit during the process.
Warning: your browser might break
http://jsfiddle.net/DESC3/7/
I suggest that if you have such large datasets you try an alternative solution. For example, SlickGrid renders large datasets in a much more efficient way, by only generating HTML elements for the data that is actually visible. We've used this for large datasets, and it performs well.
How about something like this? Say you've got viewModel.items = ko.observableArray() that you'd like to render.
Have a separate, non-observable array of all your data: var itemsToRender = functionThatReturnsLargeArray().
Put some portion of your data from itemsToRender into your observable array. Say, 50 elements only.
Keep adding elements into the observable array in portions inside a setTimeout callback.
NOTE1: You can add some time-tracking into setTimeout callback and increase/reduce the number of items that you add at each iteration. Your goal is to keep each callback time below 50-100 milliseconds so your application still feels responsive.
var batchSize = 50; // default number of items rendered per iteration
var batchOffset = 0;
function render(items, itemsToRender, done) {
    setTimeout(function () {
        var startTime = new Date().getTime();
        // push the next chunk in a single call so Knockout raises one change notification
        items.push.apply(items, itemsToRender.slice(batchOffset, batchOffset + batchSize));
        batchOffset += batchSize;
        // at this point Knockout has rendered the next batchSize items from itemsToRender
        var endTime = new Date().getTime();
        // update batchSize for the next iteration, aiming at roughly 50 milliseconds per batch
        batchSize = Math.max(1, Math.floor(batchSize * 50 / Math.max(1, endTime - startTime)));
        if (batchOffset < itemsToRender.length) {
            render(items, itemsToRender, done);
        } else if (done) {
            done(); // callback if you need one
        }
    }, 0);
}
/* I haven't actually tested the code */
Another batch size updating strategy could be based on target FPS. Say you'd like to achieve 60 fps update rate and thus 60 calls to setTimeout per 1000 milliseconds. That would take somewhat longer to process the whole collection. You can also use requestAnimationFrame instead of setTimeout and see how that would work out.
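A rough sketch of that requestAnimationFrame variant, reusing the batchSize/batchOffset variables from the snippet above (untested, like the original):
function renderFrame(items, itemsToRender, done) {
    requestAnimationFrame(function () {
        items.push.apply(items, itemsToRender.slice(batchOffset, batchOffset + batchSize));
        batchOffset += batchSize;
        if (batchOffset < itemsToRender.length) {
            renderFrame(items, itemsToRender, done); // schedule the next chunk on the next frame
        } else if (done) {
            done();
        }
    });
}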
EDIT: Built-in throttling was added in Knockout JS 1.3 (currently it's in beta but seems pretty stable).
NOTE2: If some other data on the view depends on viewModel.items you can still map it down to the original array itemsToRender. Say, for example, that you'd like to show the number of items in the collection. If you use viewModel.items().length you'll end up with a changing size value in the UI while more items get rendered. To avoid that, you can first define your size binding as a dependentObservable based on itemsToRender, not viewModel.items. After you've finished rendering all items you can re-map it onto viewModel.items if you see fit.
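For example, a count binding based on the plain array rather than the observable one might look like this (a sketch; dependentObservable is the pre-1.3 name for what later became ko.computed):
// The total never changes while chunks are being pushed, so bind the UI to this instead
viewModel.itemCount = ko.dependentObservable(function () {
    return itemsToRender.length;
});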
