Friday, August 03, 2007

Visualising very big trees, Part II

OK, time to put my money where my mouth is. Here's a first stab at displaying big trees in a browser. Not terribly sophisticated, but reasonably fast. Take a look at Big Trees.

Approach
Given a tree, I draw it in a predetermined area (in these examples 400 x 600 pixels). If there are more leaves than can be drawn without overlapping, I cull the leaf labels. If there are internal node labels, I draw vertical lines showing the span of each labelled subtree, which is simply the range between the left-most and right-most descendants of that node. If internal node labels are nested (e.g., "Mammalia" and "Primates") I draw only the most recent internal node label, the rationale being that I want only a single set of vertical bars. This gives the visual effect of partitioning the leaves into non-overlapping sets, and produces a diagram like this:


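The culling and span rules described above can be sketched in a few lines. This is only an illustration, not the code behind Big Trees: the `Node` class, `label_stride` and `label_spans` are hypothetical names, the 10-pixel label height is an assumption, and "most recent" is read here as "the labelled node closest to the leaves".

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str = ""
    children: list = field(default_factory=list)


def label_stride(n_leaves, height_px, label_px):
    """Return k such that drawing every k-th leaf label avoids overlap."""
    capacity = max(1, height_px // label_px)  # labels that fit vertically
    return max(1, math.ceil(n_leaves / capacity))


def label_spans(root):
    """Map each drawn internal label to its (first, last) leaf index.

    A labelled internal node is skipped if one of its internal descendants
    is also labelled, so the resulting bars never nest.
    """
    spans = {}
    next_leaf = 0

    def walk(node):
        nonlocal next_leaf
        if not node.children:  # leaf: takes the next slot in drawing order
            index = next_leaf
            next_leaf += 1
            return index, index, False
        first, last, nested = None, None, False
        for child in node.children:
            c_first, c_last, c_labelled = walk(child)
            if first is None:
                first = c_first
            last = c_last
            nested = nested or c_labelled
        if node.label and not nested:
            spans[node.label] = (first, last)
        return first, last, nested or bool(node.label)

    walk(root)
    return spans


primates = Node("Primates", [Node("Homo"), Node("Pan")])
mammalia = Node("Mammalia", [primates, Node("Mus")])
print(label_spans(mammalia))        # only the inner label gets a bar
print(label_stride(1000, 600, 10))  # draw every 17th label
```

With 1000 leaves in a 600-pixel-high area and 10-pixel labels, only 60 labels fit, so this sketch would keep every 17th one.
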
OK, but what about all the nodes we can't see? What I do here is make the tree "clickable" in the following way. If there are internal node labels I make the corresponding subtree clickable. I also traverse the tree looking for well-defined clusters -- basically subtrees that are isolated by a long branch from their nearest neighbours -- and make these clickable. This approach is partly a hangover from earlier experiments on automatically folding a tree (partly inspired by doi:10.1111/1467-8659.00235). The key point is that I'm trying to avoid testing for mouse clicks on individual nodes and edges: many of these will be occluded by other nodes and edges, and hit testing on every node and edge in a big tree would be expensive.
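
The post does not spell out how "isolated by a long branch" is measured, so the following is just one plausible heuristic: treat a subtree as a cluster when the branch leading to it is at least `ratio` times as long as the subtree is deep. The `Node` class, `height`, `find_clusters` and the default `ratio` are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading to this node
    children: list = field(default_factory=list)


def height(node):
    """Longest path length from this node down to any of its leaves."""
    if not node.children:
        return 0.0
    return max(child.length + height(child) for child in node.children)


def find_clusters(node, ratio=2.0):
    """Return the outermost subtrees that sit on a long stem.

    Heights are recomputed on each call, which is fine for a sketch;
    a real implementation on a big tree would cache them.
    """
    clusters = []
    for child in node.children:
        if child.children and child.length >= ratio * height(child):
            clusters.append(child)  # clickable; do not look inside it
        else:
            clusters.extend(find_clusters(child, ratio))
    return clusters


a = Node("A", 5.0, [Node("a1", 1.0), Node("a2", 1.0)])
b = Node("B", 0.5, [Node("b1", 1.0), Node("b2", 1.0)])
root = Node("root", 0.0, [a, b])
print([c.name for c in find_clusters(root)])
```

Here subtree A hangs from a branch five times its own depth and is reported, while B, on a short branch, is not.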

If you click on one of these regions, the script extracts the subtree and reloads the display showing just that part of the tree, using exactly the same approach as above. Behind the scenes the code is doing a least common ancestor (LCA) query, hence it defines subtrees rather like the Phylocode does (oh the irony).
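
As a rough illustration of the LCA step, here is a minimal sketch using parent pointers: the subtree to display is the one rooted at the least common ancestor of two chosen leaves. The `Node` class and the `lca` and `leaf_names` helpers are hypothetical, and the real script may well use a faster LCA method.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def lca(a, b):
    """Least common ancestor of a and b, found by walking up to the root."""
    seen = set()
    node = a
    while node is not None:  # record a and all of its ancestors
        seen.add(id(node))
        node = node.parent
    node = b
    while id(node) not in seen:  # climb from b until the paths meet
        node = node.parent
    return node


def leaf_names(node):
    """Leaves of the subtree rooted at node, in left-to-right order."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names


root = Node("root")
mammalia = Node("Mammalia", root)
primates = Node("Primates", mammalia)
homo, pan = Node("Homo", primates), Node("Pan", primates)
mus = Node("Mus", mammalia)
gallus = Node("Gallus", root)

subtree = lca(homo, mus)
print(subtree.name, leaf_names(subtree))
```

Defining a clade by two of its leaves in this way is what makes the comparison with the Phylocode apt.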

Pros
  • Reasonably fast (everything you see is done live "on the fly").
  • Works in any modern browser, with no dependence on plugins or technology that has limited support.
  • Image is clear, text is small but legible.
  • Entirely automated layout.


Cons
  • Reloading a new page is costly in terms of time, and potentially disorienting (you lose your sense of the larger tree).
  • It is not obvious where to click on the tree (needs to be highlighted).
  • Text is not clickable. This would be really useful for internal node labels.

3 comments:

David Shorthouse said...

Now I am starting to appreciate what you're wanting to do. But, I'm still not convinced that panning and zooming is a problem so perhaps I need to have it better explained. What is the value in being able to simultaneously see a large portion of a tree while zoomed into a specific branch? I assumed the cognitive significance of wanting to see more detail is to temporarily push aside the big picture. Isn't your bitmap-based solution of clicks & page reloads equivalent to zooming in where the browser back button is equivalent to zooming out? You can however implement some AJAX without any great difficulty to expand/contract branches with onclick events or to also make use of the fisheye zoom (e.g. the Dojo toolkit: http://dojotoolkit.org/demos)

David Shorthouse said...

Out of curiosity, what are you using to generate your imagemaps? I muck around with MapServer, which has a pretty good imagemap generator (see: HERE). It's entirely possible that this could be kludged to pull data from a backend & overlay the imagemap on pre-existing bitmap trees.

Roderic Page said...

Firstly, there's a bit of a gap between what I'd like to do and what the first experiment achieves. Click and reload is rather like zoom, I agree. At this stage I'm exploring what is doable (by me).

The reason for being able to see the entire picture while at the same time zoomed in on one part is to maintain a sense of context, and my impression from the computer science literature is that usability studies suggest this matters (I'll try and dig up some references for you).

Regarding fisheye, the issue is likely to be one of scalability. The Dojo example (and the Mac OS X Dock) are tiny examples compared to a tree with 1000s of nodes. To do fisheye effectively I think you'd need to either have the tree in memory (which programs such as Dendroscope can do), or really fast access to a server that would return new co-ordinates for each transformation -- neither of which seems easy to do for big trees over the web. One strategy might be to use a simpler transformation, such as bifocal visualisation (see, for example, The Bifocal Tree: a Technique for the Visualization of Hierarchical Information Structures).
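
To show why a bifocal view is cheaper than a fisheye, here is a generic piecewise-linear bifocal mapping for a single axis. It is not taken from the Bifocal Tree paper; the function name `bifocal` and its parameters are assumptions. Positions are normalised to the range 0 to 1, the focus band is stretched to fill `focus_share` of the display, and the context on either side is compressed uniformly.

```python
def bifocal(y, focus_start, focus_end, focus_share=0.6):
    """Map a position y in [0, 1] to a display position in [0, 1].

    The band [focus_start, focus_end] is given focus_share of the output;
    everything outside it shares the remainder at a single, smaller scale.
    Assumes 0 < focus_start < focus_end < 1.
    """
    focus_width = focus_end - focus_start
    context_width = 1.0 - focus_width
    focus_scale = focus_share / focus_width
    context_scale = (1.0 - focus_share) / context_width

    if y < focus_start:  # context before the focus band
        return y * context_scale
    if y <= focus_end:   # inside the magnified band
        return focus_start * context_scale + (y - focus_start) * focus_scale
    # context after the focus band
    return (focus_start * context_scale + focus_share
            + (y - focus_end) * context_scale)


for y in (0.0, 0.4, 0.5, 0.6, 1.0):
    print(y, round(bifocal(y, 0.4, 0.6), 3))
```

Because only two scale factors are involved, new co-ordinates are a single multiply and add per node, rather than a per-node distortion function.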

Regarding the image maps, these are generated using a C++ program for parsing and drawing trees (based on code I use in TreeView X), then I use ImageMagick to convert the output to a bitmap. The C++ program also generates the image maps.

However, this evening I've abandoned image maps in favour of DIVs, because I can now show the user where they are in the tree when they click (I hope to get this next version up online later this week).
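
For readers wondering what a DIV-based overlay might look like, the following sketch turns one clickable region into an absolutely positioned DIV laid over the tree image. The markup, the `clade` class name, the `region_div` function and the `?node=42` link are all hypothetical; the actual pages may be built quite differently.

```python
from html import escape


def region_div(name, x, y, width, height, href):
    """Return HTML for one clickable rectangle positioned over the image."""
    style = (
        f"position:absolute; left:{x}px; top:{y}px; "
        f"width:{width}px; height:{height}px;"
    )
    # The link fills the whole DIV, so clicking anywhere in it follows href.
    link = (
        f'<a href="{escape(href, quote=True)}" '
        f'style="display:block; width:100%; height:100%;"></a>'
    )
    return (
        f'<div class="clade" title="{escape(name, quote=True)}" '
        f'style="{style}">{link}</div>'
    )


print(region_div("Primates", 10, 120, 380, 45, "?node=42"))
```

Unlike an image map area, a DIV can be styled (for example with a border or background colour on hover), which is one way to show users where they are in the tree.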