D3 Scatterplot with thousands of data points

D3 Scatterplot with thousands of data points - javascript

I would like to make a scatter plot using D3 with the ability of only looking at a small section at a time using some sort of slider across the x-axis. Is there a method in javascript where I can efficiently buffer the data and quickly access the elements as the user scrolls left or right?
My goal is similar to this protovis example here, but with 10 times the amount of data. This example chokes when I make that many data points.

I have done a scatterplot with around 10k points where I needed to filter sections of the plot interactively.
I share a series of tips that worked for me, which I hope some may hopefully help you too:
Use a key function for your .data() operator as it is done at the end of this tutorial. The advantage of using keys is that you do not need to update elements that do not change.
Not related to d3, but I divided my data space into a grid, so that each data point is associated to a single cell (in other words each cell is an index to a set of points). In this way, when I needed to access from, let's say, from x_0 to x_1, I knew what cells I needed, and hence I could access a much more refined set of possible data points (avoiding iterating along all points).
Avoid transitions: from my personal experiences the .transition() is not very smooth when thousand of SVG elements are selected (it may be better now in newer versions or with faster processors)
In my case it was more convenient to make points invisible (.attr("display","none")) or visible rather than removing and creating SVG elements (I wonder if this is more time efficient too)

Related

How can I prevent overlapping in a family tree generator?

I'm creating an interactive family tree creator, unlike more simpler versions which are simple pedigree charts/trees.
The requirements for mine (based on familyecho.com) are:
multiple partners vs just a simple 2 parent to 1 child that you normally see.
multiple siblings
partners dont necessarily need to have children
there doesn't always have to be a parent "pair", there can just be a single father/mother
The problem I'm encountering is: I'm generating the offsets based on the "current" node/family member and when I go past the first generation with say, 2 parents, it overlaps.
Example of the overlap as well as partner not being drawn on the same X axis:
Here is the actual app and main js file where I'm having the issue. And here is a simplified jsfiddle I created that demonstrates the parent/offset issue though I really have to solve overlapping for this in general, in addition to making sure partners are drawn on the same x axis as other partners.
How can I go about solving this and possible future overlapping conflicts? Do I need some sort of redraw function that detects collisions and adjusts the offsets of each block upon detecting? I'm trying to make it seamless so there's a limited amount of redrawing done.
An example of calculating offset relative to the "context" or current node:
var offset = getCurrentNodeOffset();
if ( relationship == RELATIONSHIPS.PARTNER ) {
var t = offset.top; // same level
var l = offset.left + ( blockWidth + 25 );
} else {
var t = offset.top - (blockHeight + 123 ); // higher
var l = offset.left - ( blockWidth - 25 );
}

I'm going to give a complicated answer, and that's because this situation is more complicated than you seem aware of. Graph layout algorithms are an active field of research. It's easy to attempt a simpler-than-general algorithm and then have it fail in spectacular ways when you make unwarranted, and usually hidden, assumptions.
In general, genetic inheritance graphs are not planar (see Planar Graphs on Wikipedia). Although uncommon, it certainly happens that all the ancestral relationships are not filled by unique people. This happens, for example, when second cousins have children.
Another non-planar situation can occur in the situation of children from non-monogamous parents. The very simplest example is two men and two women, each pairing with children (thus at least four). You can't lay out even the four parent pairs in one rank without curved lines.
These are only examples. I'm sure you'll discover more as you work on your algorithm. The real lesson here is to explicitly model the class of relationship your algorithm is able to lay out and to have verification code in the algorithm to detect when the data doesn't meet these requirements.
The question you are actually asking, though, is far more basic. You're having basic difficulties because you need to be using a depth-first traversal of the graph. This is the (easiest) full version of what it means to lay out "from the top down" (in one of the comments). This is only one of many algorithms for tree traversal.
You're laying out a directed graph with (at least) implicit notion of rank. The subject is rank 0; parents are rank 1; grandparents at rank 2. (Apropos the warnings above, ranking is not always unique.) Most of the area of such graphs is in the ancestry. If you don't lay out the leaf nodes first, you don't have any hope of succeeding. The idea is that you lay out nodes with the highest rank first, progressively incorporating lower-ranked nodes. Depth-first traversal is the most common way of doing this.
I would treat this as a graph-rewriting algorithm. The basic data structure is a hybrid of rendered subgraphs and the underlying ancestry graph. A rendered subgraph is a (1) a subtree of the whole graph with (1a) a set of progeny, all of whose ancestors are rendered and (2) a collection of rendering data: positions of nodes and lines, etc. The initial state of the hybrid is the whole graph and has no rendered subgraphs. The final state is a rendered whole graph. Each step of the algorithm converts some set of elements at the leaf boundary of the hybrid graph into a (larger) rendered subgraph, reducing the number of elements in the hybrid. At the end there's only one element, the render graph as a whole.

Since you are already using Family Echo, I'd suggest you look at how they develop their online family tree diagram, since they seem to have solved your problem.
When I enter your sample diagram into Family Echo, I can build a nice looking tree that seems to be what you are looking for with no cross over.
Although they are creating their diagrams with html and css, you can add the people to their diagrams one by one and then inspect where the boxes are being placed in terms of the left and top pixel locations of each element.
If I had more expertise in JavaScript, I would have tried building up some code to replicate some of what Family Echo is doing, but I'm afraid that's not my mojo.

you'll have to adjust all the branches off the node that you affect, each branch will have to recalculate the position of its nodes, and each node will have to be recalculated locally reaching the leaves.You calculated once the leaves are going to have to recalculate all the way to backing up, all that recursively. It's like a real tree, when you add physically branch to trunk ... the other branches move alone to leave some space, all sheets are automatically reset, so you have to imagine. And simulate this process in your diagram. Processes each branch reaches each leaf, and recalculates up to recompute the modified node neighbors. (one level above you started) That is not easy or single job to do.

D3 with crossfilter becomes very slow large datasets

I have a visualization similar to the crossfilter example, except that the crossfilter selection dynamically updates a timeline. The code works well with around 100 elements. I tested the code with close to 5000 elements and it became very slow when applying/changing the brush to filter the dimensions. I was wondering where the performance issue was and how to fix it. The timeline with around 5000 svg rects rendered well. The crossfilter histograms were very slorw (not creating the histogram, but when resizing and applying brush) when applying and resizing the brush. I even disabled the timeline from updating after the brush changes and that did not help performance. I am generating the crossfilter histograms using the svg path method similar to that in the crossfilter example.I am not sure why drawing the brush is taking so long.
Could it be related to the crossfilter? A note about the data: while there are only a few thousand elements of data in the crossfilter, the elements are very large (contain around a hundred or so attributes). not sure if that contributes to the problem.
Thanks
a

Usually performance problems such as you're describing will be caused by the browser rendering the generated SVG rather than the actual processing of the data in Javascript. One thing you could try is to use HTML canvas instead of SVG. Note that converting your code to do so will not be a trivial process however.

Create a smooth curve from a series of GPS coordinates

I have looked at a lot of Q/A here on SO regarding similar (if not the same) question I have. Yet none had an answer I was able to understand.
I wish to input a series of GPS coordinates, and create a smooth curve that connects them all, and passes through ALL of them. Javascript is my preferred language and I have found this page
http://jsdraw2d.jsfiction.com/demo/curvesbezier.htm
It allows you to plot any number of points and when clicking the 'Draw Curve' button it does exactly what I want (except it is on html5 canvas whereas I want to use lat/lon values)
You can download the jsDraw2D source code here:
http://jsdraw2d.jsfiction.com/download.htm
The function in question is drawCurve() and it appears to calculate the points of the curve, creating a separate 'div' for each point as it goes along, while also appending them to the html page. I am presuming I need to get rid of the code for creating the html divs and instead add each point as it is calculated to an array or string. However, it is simply over my head (perhaps because it seems overwhelming and my understanding is not quite spot on).
I would post the code here, but it is pretty long, plus I am not sure how many other functions it calls/requires from the rest of the script.
The only other thing I can think of that needs to be considered is the +/- values in GPS coordinates. I am hoping that altitude changes would not effect the smooth line created too much, especially since it seems to create the new points so close together.
Any help in modifying that code would be greatly appreciated. If someone has some other approach, I am open to suggestions - however I would prefer a way that passes through ALL the input points (unlike some mathematical curve functions that do not)
Thanks!

Javascript directed acyclic graph library? (Graph visualization is NOT necessary)

I have a dataset which is best represented by a graph. It consists of nodes of 6 or 7 different "types" with directed edges (dependencies on one another, guaranteed not to have cyclic dependencies). The dataset is essentially a template of a layered configuration, and the user needs to be able to select bits and pieces of the configuration from different layers which are desired, and have the dependent bits be brought in automatically.
The general UI need is for a user to select or un-select items from multi-select boxes (one such box for each node type), and have "depended-on" items in the other boxes become selected or unselected as needed. I need to be able to pull down the dataset from the server, let the user select the desired bits (with the dependency processing being done in javascript on the client side for responsiveness), and then submit the result back when they are finished.
The dataset is large and complex enough that actually showing it as a graph would be overwhelming and confusing to the user. Only basic graph traversal operations are needed, since all that is required is to cascade selections out the dependencies. (For example, a user un-selecting a node would result in that nodes dependencies becoming unselected if there were no other selected node which still depended on them. A user selecting a node would result in all of that node's dependencies becoming selected.) A simple depth or breadth first search following directed edges from the start node will suffice to visit all affected nodes. If I can follow edges either direction, bonus. (If not I can easily generate an edge-reversed graph and use that when needed.)
I have dug around on here and found references to a number of javascript graph visualization libraries, but most of these discussions seem to interpret "graph" as "chart" and I have no charting needs here. My digging has led me to this list: Raphael, protovis, flare, D3, jsVis, Dracula, and prefuse. From this list it looks like jsVis or Dracula might have the underlying graph constructs I need if I just ignore the visualization side, but it isn't clear to me from the documentation if that is the case. I have to rule out a few others because I cannot bring in any flash dependencies. Unfortunately I don't have time to prototype things with this many libraries. (I will be digging into jsVis and dracula more though, barring some handy input here.)
If anyone has experience with something from that list and believes that the graph portion of it can be used independently of the visualization portion, that will certainly meet my needs. If there is some other library I could use that meets my needs, that would be great too. One final requirement regarding licensing: the library needs to be "free" in a non-copyleft way - So ideally Apache v2.0, BSD, MIT, or something like that.

I haven't used it, but you might want to check out data.js. It's an MIT-licensed library with a range of data-structure utilities. In particular, it includes Data.Node and Data.Graph:
A Data.Graph can be used for representing arbitrary complex object graphs. Relations between objects are expressed through links that point to referred objects. Data.Graphs can be traversed in various ways.

Data visualization: Bubble charts, Venn diagrams, and tag clouds (oh my!)

Suppose I have a large list of objects (thousands or tens of thousands), each of which is tagged with a handful of tags.
There are dozens or hundreds of possible tags and their usage follows a typical power law:
some tags are used extremely often but most are rare.
All but the most frequent couple dozen tags could typically be ignored, in fact.
Now the problem is how to visualize the relationship between these tags.
A tag cloud is a nice visualization of just their frequencies but it ignores which tags occur with which other tags.
Suppose tag :bar only occurs on objects also tagged :foo.
That should be visually apparent.
Similarly for three tags that tend to occur together.
You could make each tag a bubble and let them partially overlap with each other.
Technically that's a Venn diagram but treating it that way might be unwieldy.
For example, Google charts can create Venn diagrams, but only for 3 or fewer sets (tags):
http://code.google.com/apis/chart/docs/gallery/venn_charts.html
The reason they limit it to 3 sets is that any more and it looks horrendous.
See "extentions to higher numbers of sets" on the Wikipedia page: http://en.wikipedia.org/wiki/Venn_diagrams
But that's only if every possible intersection is non-empty.
If no more than 3 tags ever co-occur (maybe after throwing out the rare tags) then a collection of Venn diagrams could work (with the sizes of the bubbles representing tag frequency).
Or perhaps a graph (as in vertices and edges) with visually thicker or thinner edges to represent frequency of co-occurrence.
Do you have any ideas, or pointers to tools or libraries?
Ideally I'd do this with javascript but I'm open to things like R and Mathematica or really anything else.
I'm happy to share some actual data (you'll laugh if I tell you what it represents) if anyone is curious.
Addendum: The application I originally had in mind was TagTime but it occurs to me that this also maps well to the problem of visualizing one's delicious bookmarks.

If i understand your question correctly, an image matrix should work nicely here. The implementation i have in mind would be an n x m matrix in which the tagged items are rows, and each tags type is a separate column. Every cell in the matrix would consist entirely of "1's" and "0's", i.e., a particular item either has a given tag or it doesn't.
In the matrix below (which i rotated 90 degrees so it would fit better in this window--so columns actually represent tagged items, and each row shows the presence or absence of a given tag across all items), i simulated the scenario in which there are 8 tags and 200 tagged items. , a "0" is blue and a "1" is light yellow.
All values in this matrix were randomly selected (each tagged item is eight draws from a box consisting of two tokens, one blue and one yellow (no tag and tag, respectively). So not surprisingly there's no visual evidence of a pattern here, but if there is one in your data, this technique, which is dead simple to implement, can help you find it.
I used R to generate and plot the simulated data, using only base graphics (no external packages or libraries):
# create the matrix
A = matrix(data=r1, nrow=1, ncol=8)
# populate it with random data
for (i in seq(0, 200, 1)){r1 = sample(0:1, 8, replace=TRUE); A = rbind(A, r1)}
# now plot it
image(z=A, ann=F, axes=F, col=topo.colors(12))

I would create something like this if you are targeting the web. Edges connecting the nodes could be thicker or darker in color, or perhaps a stronger force connecting them so they are close in distance. I would also add the tag name inside the circle.
Some libraries that would be very good for this include:
Protovis (Javascript)
Flare (Adobe Flash)
Some other fun javascript libraries worth looking into are:
Processing for Javascript
Raphael

Although this is an old thread, I just came across it today.
You may also want to consider using a Self-Organizing Map.
Here is an example of a self-organizing map for world poverty. It used 39 of what you call your "tags" to arrange what you call your "objects".
http://www.cis.hut.fi/research/som-research/povertymap.gif

Note sure it would work as I did not test that, but here is how I would start:
You can create a matrix as doug suggests in his answer, but instead of having documents as rows and tags as columns, you take a square matrix where tags are rows and columns. Value of the cell T1;T2 will be the number of documents tagged with both T1 and T2 (note that by doing that you'll get a symetric matrix because [T1;T2] will have the same value as [T2;T1]).
Once you have done that, each row (or column) is a vector locating the tag in a space with T dimensions. Tags near each others in this space often occur together. To visualize co-occurrence you can then use a method to reduce your space dimensionality or any clustering method. For example you can use a kohonen self organizing map to project your T-dimensions space to a 2D space, you'll then get a 2D matrix where each cell represents an abstract vector in the tag space (meaning the vector won't necessary exists in your data set). This vector reflect a topological constraint of your source space, and can be seen as a "model" vector reflecting a significant co-occurence of some tags. Moreover, cells near each others on this map will represent vectors close to each other in the source space, thus allowing you to map the tag space on a 2D matrix.
Final visualization of the matrix can be done in many ways but I cannot give you advice on that without first seeing the results of the previous processing.

We Keep Coding

JavaScript is the programming language of the Web.