Thursday, 30 June 2011

Why did I move into data visualization?

Preamble: It's been very quiet on this blog since I left the Wellcome Trust Sanger Institute in the UK and took up my position here at Leuven University in Belgium last October. The truth is that the type of work changed so profoundly that it takes a while to give it all a place in your head, let alone in a blog. Then this morning I remembered why I started this blog in the first place: to help me order my thoughts. So writing might actually have sped things up...

Anyway... In this post I'd like to explain why I'm moving into the *data visualization* field. And it's not just because it's always nice to look at pretty pictures.

Statistics are great, but...

Ben Shneiderman, Professor at the University of Maryland, very eloquently stated that "The great fun of information visualization is that it gives you answers to questions you didn't know you had". I'd rephrase that slightly to "The great use of data visualization is that it gives you clues to questions you didn't know you had". To me, it's as much about finding the questions to ask as about finding the answers to those questions. Data visualization should not be used to "prove" things; that's what statistics is for. But a visualization can give you ideas about which statistical models to test. Like many others, I see a strong connection between statistics and data visualization. Simplifying a bit, you could say that statistics is about proving what you expect, while visualization is about discovering what you didn't expect and refining those expectations.

From my own experience, I've seen that many (but not all!) statisticians look down on data visualization with the argument that it can't prove anything. That's true. But their reaction then often becomes to throw the baby out with the bathwater, instead of trying to see how both fields can benefit from each other. It's not always easy to convince people of the effectiveness of visualizations, but we're getting there...

Explain and explore

In the data visualization field, there is often a tension between explanation and exploration. The work I'll be doing here in Leuven will cover the whole spectrum. In the explanation corner, the goal is to make sense of complex data: for example, helping cancer genetics researchers understand how tumours evolve (e.g. the phylogeny of cancer cells) or what the rearranged genome in those tumours looks like. This type of visualization sits downstream of the data analysis, after the data have been crunched. On the other hand, there are the exploration projects, where we focus on showing the raw(ish) data to help us decide what type of analysis to perform, for example when investigating the parameter space of an algorithm. Of course, many projects will fit somewhere in the middle...

The visualization model

Jarke van Wijk's paper "The Value of Visualization" (doi: 10.1109/VISUAL.2005.1532781) is a masterpiece in that it describes a comprehensive model of what visualization is and how we can quantify its effectiveness (cost). I'll just leave the picture here for you to contemplate:

My inspiration

There are several people whose work I keep in mind when discussing what I want to do in my group; my sources of inspiration, so to speak. They have one very important thing in common: they don't take shortcuts in their work and are not afraid to really think about what their visualizations are intended to do.

These include:

  • Cydney Nielsen: ABySS-Explorer - a sequence assembly visualization tool
  • Miriah Meyer: Pathline - a tool for comparative functional genomics
  • Martin Krzywinski: Hive plots - rational network visualization

It's really exciting to work in this field; I'm looking forward to what the next few years will bring :-)