About two years ago I became very interested in UX and everything about it. This interest eventually evolved into a strong awareness of how information is presented, visually in particular. One just can't help thinking in terms of visualization when reading Tufte, Cleveland and Bertin. Ideas keep pouring in: how to make things more visual, easier to grasp, clearer (in our product, in particular).
I will try to share this feeling and explain some principles of information visualization through a few very impressive stories. I apologize in advance for a couple of boring definitions. There are no jokes in this article, intentionally. It's deadly serious business, so muster all your patience and read on.
Disclaimer: the article is quite lengthy. Even so, it's my sincere hope that you will get through it in no time, as I can't stress enough how engaging and fascinating the subject is.
Why does visualization matter?
- Visualization lets you see things that would otherwise go unnoticed. Any data set contains information, but without a visual representation you miss out on trends, behavior patterns and dependencies.
- Visualization gives answers faster. Looking at a graph and identifying a trend takes an instant. Now imagine how much time it would take to eye-scan rows of numbers.
- A good visualization lets you explore data, play with it and investigate curious cause-effect relationships. This matters a lot in investigative and research work, such as journalism.
- Data volume is growing at a crazy rate. Visualization helps to cope not only with the volume but also with the ever-increasing diversity of data.
- Color pictures are pretty and fun to look at. Rows of boring numbers are no fun. Subjectively, we give more credit to information if it comes in a nice visual wrapping.
Here's a classic example of using visualization to resolve a life-and-death situation: the cholera epidemic in London (see Visual Explanations by Edward R. Tufte). Dr. John Snow used visualization techniques to back his theory that cholera spreads through water, although the prevailing belief at the time was that it spreads through the air or by some other means.
In 40 days more than 660 people died in one area of London. Dr. Snow gathered the mortality statistics and plotted them on a map. Black rectangles represent deaths from cholera. The densest cluster of death marks surrounds the notorious water pump, near the letter D in the Broad Street label.
Some deaths were reported in households located quite a distance from the pump. John Snow investigated each death, looking for evidence to back his case for a possible cause-effect relationship. Strangely, no one died at a nearby brewery. The owner of the brewery said the workmen were allowed to drink beer, and there was also a deep well right in the brewery. The epidemic ended shortly after the pump was shut down.
As we can see, data visualization helped discover the cause of the epidemic and bring it to an end. This evidence is good enough both to inspire deep respect for Dr. Snow and to develop a liking for Tufte's books.
Information Visualization: General Principles
Let's look into just two of them: the Image concept and the Visualization Mantra. I'm leaving out such low-level principles as the data-ink ratio and legibility rules on purpose, as they influence the execution, not the patterns.
Bertin introduced the concept of Image and defined the levels of reading a representation.
Boring definition #1
“Image is the meaningful visual form, perceptible in the minimum instant of vision”.
Here's the same in plain English: if we can say right away what we see in a picture, then this picture is an image. How is this related to information visualization?
Let's now look into the levels of reading, of which we've got 3: elementary, intermediate and overall.
What you see above is the most boring graph ever. It shows the daily dynamics of stock quotes.
- The elementary level question would be “What was the quotation for stock X on March 5?”
- Intermediate: “Over the first 3 days, what was the movement of the stock?”
- Overall: “During the entire period, what was the trend of the stock?”
A good visualization gives instant answers to questions at all 3 levels. A poor visualization can't do that. In other words, a good visualization should get as close as possible to the Image.
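To make the three levels more tangible, here is a minimal Python sketch that asks the same three questions of raw numbers. The dates and prices are made up for illustration, and the function names are hypothetical. A good chart should let the eye answer each question as quickly as these lookups do.

```python
from datetime import date

# Hypothetical daily closing prices for stock X (invented numbers).
quotes = {
    date(2010, 3, 1): 20.0,
    date(2010, 3, 2): 21.5,
    date(2010, 3, 3): 22.0,
    date(2010, 3, 4): 21.0,
    date(2010, 3, 5): 19.5,
    date(2010, 3, 8): 18.0,
}


def elementary(quotes, day):
    """Elementary level: one value on one day."""
    return quotes[day]


def change(quotes, days):
    """Price change between the first and the last of the given days."""
    return quotes[days[-1]] - quotes[days[0]]


days = sorted(quotes)

# Elementary: what was the quotation on March 5?
price_on_march_5 = elementary(quotes, date(2010, 3, 5))

# Intermediate: how did the stock move over the first 3 days?
first_three_days = change(quotes, days[:3])

# Overall: what was the trend over the entire period?
whole_period = change(quotes, days)

print(price_on_march_5, first_three_days, whole_period)
```

With a table of numbers, each question needs its own lookup or calculation. With an Image, one glance covers all three.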
Getting an answer should take next to no time
Ben Shneiderman has also contributed a lot to information visualization studies. He formulated the information visualization mantra.
Here comes the slightly less boring definition #2
“Overview first, zoom and filter, then details-on-demand.”
The mantra has been refined since then. Here's the latest version:
These are some tangible quality standards for good visualization. Users should be able to:
- Select and filter data. Take stock quotations. There should be an option to filter companies by category, by assets, etc.
- Reconfigure. Sometimes I want to browse data in a simple list, or give it a more complex representation, such as a matrix or a scatter chart.
- Encode. I want to distinguish the value of capitalization by size. Large bubbles for large companies, small bubbles for small ones. Or, I want to color hi-tech businesses in red, and energy businesses in grey.
- Connect (if it makes sense). If data are somehow related (e.g. people), I want to be able to display/hide these connections.
- Zoom+Details. What's this big green bubble? Mouseover… Oh, that's Apple (see the Sector Snapshot Chart below).
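The Select and Encode steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the company names, the numbers, the color table and the function names. The sketch also assumes one common convention, scaling the bubble radius with the square root of the value so that the bubble area (not the radius) is proportional to capitalization.

```python
import math

# Hypothetical companies: (name, sector, market capitalization in $ billions).
companies = [
    ("Alpha", "hi-tech", 400.0),
    ("Beta", "energy", 100.0),
    ("Gamma", "hi-tech", 25.0),
]

# Encoding of sector by color, following the example in the list above.
SECTOR_COLORS = {"hi-tech": "red", "energy": "grey"}


def select(companies, sector):
    """Select and filter: keep only the companies of one sector."""
    return [c for c in companies if c[1] == sector]


def bubble_radius(cap, max_cap, max_radius=40.0):
    """Encode size: the radius grows with the square root of the value,
    so the bubble area stays proportional to capitalization."""
    return max_radius * math.sqrt(cap / max_cap)


def encode(companies):
    """Encode: turn each company into the visual attributes of a bubble."""
    max_cap = max(cap for _, _, cap in companies)
    return [
        {
            "name": name,
            "color": SECTOR_COLORS[sector],
            "radius": bubble_radius(cap, max_cap),
        }
        for name, sector, cap in companies
    ]


bubbles = encode(companies)
for bubble in bubbles:
    print(bubble)
```

A company with a quarter of the largest capitalization gets half the largest radius, which keeps large bubbles from visually exaggerating the difference.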
As you can see, visualization tends to be very interactive. It's a rather new, exciting and still little-explored research domain.
Information Visualization = Representation + Interaction
Basically, there are 4 visualization patterns:
- Maps. Geographic maps, infographic maps, cartograms, etc. We see those maps almost every day as we search the web or watch news on TV.
- Timelines. Timelines are used to visualize anything related to time, for example, a project delivery schedule (remember the notorious Gantt chart?) or the moon cycle.
- Many variables. This is a rather broad class of patterns used to visualize data with many variables, e.g. annual per capita Coke consumption in relation to the gender, weight and height of the consumers.
- Networks. Used to visualize dependencies, connections and hierarchies.
Let's take a look at some examples.
Maps are the least interesting pattern for me, as I don't need them in my work at all. The general public, on the contrary, finds them much more engaging.
Below is a great example of an interactive map created by the wizards at The New York Times. The newspaper abounds in nice visualizations, and you can learn lots of cool stuff from them.
Immigration Explorer successfully applies all 5 principles of good visualization to maps.
You can filter data by country and by year, show it either as a percentage of the population or as a number of residents, zoom in and out on specific areas and adjust bubble sizes.
Playing with an interactive map is much more fun than staring at a static image. Looking into the dynamics of immigration trends, you relate them to the historical facts you know and discover some interesting details. By the way, immigrants from the former Soviet Union have developed a special liking for LA and New York City.
That's what I can (almost) wholeheartedly call an ideal visualization.
There are a great many timelines in the world. Let's take a look at the CalTrain timetable. Departure times are shown on the X axis, stations on the Y axis.
We can see right away that the stations are spaced in proportion to the distance between them. This display also reveals whether trains bypass stations or not. Besides, the slope of a line reflects the actual speed of the train: the steeper the line, the faster the train. Some limited service trains (colored in black) are overtaken by bullet trains that depart later.
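Why does the slope equal the speed? With time on the X axis and distance on the Y axis, the slope of a segment is distance divided by time. Here is a minimal Python sketch with made-up stops; the station names, distances and times are hypothetical and are not taken from the real timetable.

```python
# Hypothetical stops for one train:
# (station, distance from origin in km, time in minutes after midnight).
stops = [
    ("Station A", 0.0, 480),
    ("Station B", 10.0, 490),
    ("Station C", 40.0, 510),
]


def segment_speeds(stops):
    """Average speed in km/h between each pair of consecutive stops.

    This is the slope of the corresponding line segment on the chart:
    distance covered (Y) divided by time elapsed (X).
    """
    speeds = []
    for (name_a, km_a, min_a), (name_b, km_b, min_b) in zip(stops, stops[1:]):
        km_per_hour = (km_b - km_a) * 60 / (min_b - min_a)
        speeds.append((name_a, name_b, km_per_hour))
    return speeds


speeds = segment_speeds(stops)
for start, end, speed in speeds:
    print(f"{start} -> {end}: {speed} km/h")
```

The second segment covers three times the distance in twice the time, so its line is steeper and the train is faster there.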
Unfortunately, this timetable is quite limited in terms of interaction with viewers. You can only filter trains by speed, direction and day of the week. There's also a mouseover pop-up showing the exact time of arrival. What I'd like to be able to do is zoom in on certain time or distance intervals, highlight trains by type and switch the representation to a more conventional timeline.
Seems like I can't keep it to myself any longer, so let me share an example from TargetProcess, the agile project management tool that we develop (disclaimer: this is a shameless ad). What you see below is a progress timeline showing how many days a user story spent in each state, who worked on it, and when new tasks and bugs were added and closed.
We're using Kanban, so this user story is flowing through several states. In general, the flow was smooth; it only got stuck in the Release Branch state for a week.
There are no filters, everything is set in stone, and it's pretty hard to get a bird's-eye view if the development takes more than several weeks. So there's lots of room for improvement. Still, this timeline is good enough to fit diverse information on one screen.
With many variables you've got to do a lot of thinking. Is there a decent way to visualize as many as 6 variables? Why not. What we see below is the visualized 10-year risk of fatal CVD (cardiovascular disease) in relation to gender, age, blood pressure, cholesterol level and smoking status.
Let's ignore the styling, which leaves much to be desired, and focus on the message. One can explore this representation for quite a long time, noticing that for women the average risk is about half as high, that young people are nearly risk-free and can relax, and that smoking starts taking its heavy toll on people in their 40s. The diverse data are visualized very well here.
Now let's get back to The New York Times. How do you tell leaders from outsiders? Who is getting back on track, and who's lagging behind? The beautiful chart below has the answers. This visualization might seem quite complicated at first sight, but you'll like it once you understand how it works.
The chart shows how companies perform on the market over two time periods: a year (Y axis) and a user-selected period of a day, week, month or quarter (X axis). Companies fall into 4 groups:
- Leaders: ahead both for year and in short term
- Outsiders: behind both for year and in short term
- Improving: behind for year but ahead in short term
- Slipping: ahead for year but behind in short term
You can filter companies by category, see details for any company and slide the time period scale. And, yes, the large green bubble on the upper right is Apple.
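The four groups boil down to the signs of two numbers. Here is a minimal Python sketch of that rule. The function name and the sample values are hypothetical, and treating exactly zero as "behind" is my own assumption, since the chart description doesn't say how ties are handled.

```python
def classify(year_change, short_term_change):
    """Return the quadrant of the chart for one company.

    Both arguments are performance figures where a positive value means
    "ahead" and anything else means "behind" (zero counts as behind,
    which is an assumption made for this sketch).
    """
    ahead_for_year = year_change > 0
    ahead_short_term = short_term_change > 0
    if ahead_for_year and ahead_short_term:
        return "Leaders"
    if not ahead_for_year and not ahead_short_term:
        return "Outsiders"
    if ahead_short_term:
        return "Improving"  # behind for the year, ahead in the short term
    return "Slipping"       # ahead for the year, behind in the short term


# Hypothetical companies: (name, change over the year, change over the month).
samples = [
    ("Alpha", 12.0, 3.5),
    ("Beta", -8.0, -1.0),
    ("Gamma", -4.0, 2.0),
    ("Delta", 6.0, -0.5),
]

groups = {name: classify(year, month) for name, year, month in samples}
print(groups)
```

Sliding the time period scale on the chart amounts to recomputing the second argument for every company, so bubbles can jump between quadrants.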
A mind map is probably the most common example of a tree network visualization. Take a look at the mind map of this article.
You can also check a more detailed version of this mind map which includes links to some other nice examples.
It's quite difficult to do a treemap visualization for a small screen. Treemaps look great on large wall posters.
With this article I wanted to convey to you the importance of visualizing information. Using some hands-on examples, I tried to show how visualization helps people make decisions and work with data.
I'll be overwhelmingly happy if I managed to accomplish this. Thanks for scrolling all the way down to the end (I tried to avoid this joke but couldn't help myself, sorry…).