Thursday, 1 September 2011

Visualize This (by Nathan Yau) arrived...


Last Friday I received my long-anticipated copy of "Visualize This" by Nathan Yau. On its website it is described as a "practical guide on visualization and how to approach real-world data". You can guess what my weekend looked like :-)

Overall, I believe this book is a very good choice for people interested in getting started in data visualization. Not only does it provide the context in which to create visualizations (chapters 1, 2 and 9), it also covers different tools for creating them: R, Protovis, Flash, ... Chapter 3 is dedicated entirely to that topic, and the examples throughout the rest of the book were created using different tools as well, which gives people a good feel for what's possible in each and how "hard" or "easy" the coding itself is for the different options. The remaining chapters each discuss a type of data that you could encounter: patterns over time, proportions, relationships, ...

There are some minor points I'd raise if I were asked to review the book (but that's just my opinion, and I don't want to pretend to be an expert). First of all, it would have been nice if Nathan had gone a little deeper into the theories behind what is seen as good visualization. In the first chapter ("Telling Stories with Data") he does mention Cleveland & McGill in a side-note, but I think that information (along with Gestalt laws, etc.) definitely deserves one or two full paragraphs, if not half a chapter.

I also don't completely agree with the use of a stacked bar chart (around page 109). In my experience, they're worth less than the time it takes to create them. After all, it's impossible to compare any segments other than the ones at the bottom (which therefore share a common "zero" line). For example: look at the first picture below. This shows the number of "stupid things done" by women and men, stratified over six different groups (A-F). Although it is easy to compare total stupidity per group (group C is doing particularly badly), as well as the numbers for men, we can't see which of the groups A, D or F scores the worst for women. That's because the women's segments don't share a common baseline. We could of course put the women next to the men, but then we'd lose the total numbers.


In the second plot, however, it is possible to compare women, men and totals. The bars for women are put next to those for men, but I've added a larger shaded bar at the back that shows the sum of the two. This plot was originally created in R using ggplot2, but I'm afraid I can't find the reference that explained how to do this... Let me know if you can find it.
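For readers who want to try the idea, here is a minimal sketch of the "side-by-side bars with a shaded total behind them" layout. It is written in Python with matplotlib rather than the R/ggplot2 used for the original plot, and it is not the lost reference mentioned above. The counts are made-up illustrative numbers (not the data behind the pictures), and the names `COUNTS`, `layout` and `draw` are hypothetical helpers invented for this example.

```python
from typing import Dict, List, Tuple

# Made-up illustrative counts, NOT the data behind the original figure.
# group: (women, men)
COUNTS: Dict[str, Tuple[int, int]] = {
    "A": (12, 20),
    "B": (8, 15),
    "C": (18, 30),
    "D": (13, 10),
    "E": (6, 18),
    "F": (11, 22),
}

Rect = Tuple[float, float, float]  # (left edge, width, height)


def layout(
    counts: Dict[str, Tuple[int, int]], bar_width: float = 0.35
) -> Tuple[List[Rect], List[Rect], List[Rect]]:
    """Return rectangles for the women, men and total (backdrop) bars.

    Every bar starts at zero, so all three series share one baseline.
    """
    women: List[Rect] = []
    men: List[Rect] = []
    totals: List[Rect] = []
    for i, (w, m) in enumerate(counts.values()):
        centre = float(i)
        # Women left of the group centre, men right of it.
        women.append((centre - bar_width, bar_width, w))
        men.append((centre, bar_width, m))
        # A wider bar behind both shows the sum of the two.
        totals.append((centre - 1.25 * bar_width, 2.5 * bar_width, w + m))
    return women, men, totals


def draw(counts: Dict[str, Tuple[int, int]],
         path: str = "grouped_with_totals.png") -> None:
    """Draw the chart to a PNG file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    women, men, totals = layout(counts)
    fig, ax = plt.subplots()
    # Totals are drawn first with a lower zorder so they sit at the back.
    for rects, colour, label, z in (
        (totals, "lightgrey", "total", 1),
        (women, "tab:orange", "women", 2),
        (men, "tab:blue", "men", 2),
    ):
        ax.bar(
            [r[0] for r in rects],
            [r[2] for r in rects],
            width=[r[1] for r in rects],
            align="edge",
            color=colour,
            label=label,
            zorder=z,
        )
    ax.set_xticks(range(len(counts)))
    ax.set_xticklabels(list(counts))
    ax.set_ylabel("stupid things done")
    ax.legend()
    fig.savefig(path)
```

Calling `draw(COUNTS)` writes the chart to `grouped_with_totals.png`. The point of the design is that all three series start at zero, so women, men and totals can each be compared across groups, which a stacked bar chart does not allow for the upper segments.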



The content of the book is, of course, not world-shattering. But that's not the point of the book. For people new to the field it's a great addition to their library (and I learned a thing or two myself as well). If you're interested in data visualization, go out and get it.