STAT 80B - Data Visualization
10 Mar 2026
Reading: Wilke Ch 17 & 29
“The sizes of shaded areas in a visualization need to be proportional to the data values they represent.” — Wilke, Ch 17
Ink = any visual element that deviates from the background (bars, lines, areas, points).
When shaded area encodes a value, that area must scale with the value. Violating this is one of the most common ways visualizations mislead.
The problem: A bar chart with a y-axis starting at $50,000 instead of $0. The bars look dramatically different, but the actual income differences are modest.
Why it misleads: Bar height is no longer proportional to the underlying values. The visual difference is amplified far beyond the data difference.
The fix: Bar charts on a linear scale must always start at zero.
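To see how much a truncated axis exaggerates, compare the ratio of the drawn bar heights with the ratio of the underlying values. The sketch below uses two hypothetical incomes (invented for illustration, not taken from the chart described above). `drawn_height_ratio` is an illustrative helper, not a library function.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of two bars when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (a - baseline) / (b - baseline)

# Hypothetical median incomes, invented for illustration.
income_high = 60_000
income_low = 55_000

honest = drawn_height_ratio(income_high, income_low, baseline=0)
truncated = drawn_height_ratio(income_high, income_low, baseline=50_000)

print(f"drawn ratio, axis from $0:      {honest:.2f}")
print(f"drawn ratio, axis from $50,000: {truncated:.2f}")
```

With these numbers the incomes differ by about 9%, but the truncated chart draws one bar twice as tall as the other.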


Line graphs are different from bar charts: lines encode change and trend, not absolute quantity, so their y-axis does not have to start at zero.
Rule of thumb
Ask: “Does the reader perceive magnitude from the length/area of a shape?” If yes → start at zero.
With a partner, look at these chart descriptions and identify whether the proportional ink principle is violated:
3D visualizations add visual complexity without adding information:
The only exception
True 3D spatial data (e.g., topographic maps, molecular structures) — but even then, consider 2D alternatives.
The same data in 2D is almost always clearer, easier to read, and more honest. When someone uses 3D in a presentation, ask: what does the third dimension actually represent?
We covered color theory in Week 2 — here we focus on design errors:
Redundant coding = encoding the same variable through multiple aesthetics (color + shape, color + line type, etc.)
Best practice
When using color to distinguish groups, also use different shapes (for points) or line types (for lines).
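One way to apply the best practice above is to assign every group a color, a marker, and a line style together, so the groups stay distinguishable in grayscale or for color-blind readers. This is a minimal sketch under stated assumptions: the palette is a subset of the Okabe-Ito colors, the marker and line-style codes follow matplotlib's naming, and `redundant_styles` and the group names are hypothetical.

```python
from itertools import cycle

# Subset of the Okabe-Ito color-blind-friendly palette (an assumed choice).
COLORS = ["#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7"]
# Marker and line-style codes use matplotlib's naming conventions.
MARKERS = ["o", "s", "^", "D", "v", "X"]
LINESTYLES = ["-", "--", "-.", ":"]

def redundant_styles(groups):
    """Give each group a color AND a marker AND a line style."""
    styles = {}
    # zip stops when `groups` runs out; the style lists repeat if needed.
    for group, color, marker, linestyle in zip(
        groups, cycle(COLORS), cycle(MARKERS), cycle(LINESTYLES)
    ):
        styles[group] = {"color": color, "marker": marker, "linestyle": linestyle}
    return styles

# Hypothetical group names, invented for illustration.
styles = redundant_styles(["control", "treatment A", "treatment B"])
for group, style in styles.items():
    print(group, style)
```

Each dictionary can be passed as keyword arguments to a plotting call that accepts `color`, `marker`, and `linestyle`. With more than six groups the styles start to repeat, which is usually a sign that the figure has too many groups for a single panel.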
When many points overlap, individual data points are hidden (overplotting). Solutions:
| Problem | Solution | Trade-off |
|---|---|---|
| Moderate overplotting | Transparency (alpha) | Can still create dark blobs |
| Many points | Jittering | Slightly moves points; use carefully |
| Very many points | 2D density / hexbin | Loses individual point identity |
| Categorical x-axis | Sina plot / beeswarm | Preserves distribution shape |
| Time series | Reduce to summary stats | Loses detail |
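Two of the remedies in the table can be sketched with the standard library alone. `jitter` adds a small random offset to each value, and `bin_2d` counts points per square cell, a simplified stand-in for a hexbin. Both function names and the data are hypothetical, and a real plot would normally use a plotting library's built-in versions.

```python
import random
from collections import Counter

def jitter(values, width, seed=0):
    """Add uniform noise in [-width/2, +width/2] to each value."""
    rng = random.Random(seed)  # fixed seed so the figure is reproducible
    return [v + rng.uniform(-width / 2, width / 2) for v in values]

def bin_2d(xs, ys, bin_width):
    """Count points per square bin; individual point identity is lost."""
    counts = Counter()
    for x, y in zip(xs, ys):
        counts[(int(x // bin_width), int(y // bin_width))] += 1
    return counts

# Hypothetical survey ratings on a 1-5 scale, invented for illustration.
xs = [1, 1, 1, 2, 2, 3, 3, 3, 3, 5]
ys = [2, 2, 2, 4, 4, 3, 3, 3, 3, 5]

xs_jittered = jitter(xs, width=0.4)
counts = bin_2d(xs, ys, bin_width=1)

print("largest jitter offset:", max(abs(a - b) for a, b in zip(xs, xs_jittered)))
print("most crowded bin:", counts.most_common(1))
```

Keep the jitter width small relative to the spacing between values, so that a moved point is never mistaken for a different value.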
Small multiples (or facets) show the same visualization repeated for different subgroups or conditions.
Wilke’s advice (Ch 29)
When building up to a complex multi-panel figure, first show your audience one panel alone so they understand the structure, then reveal the full grid.
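The data preparation behind small multiples can be sketched as follows: split the records into one group per panel, and compute the axis limits over all records so that every panel shares the same scale and the panels can be compared directly. The shared scale is an assumption of this sketch, and `facet`, `shared_limits`, and the records are hypothetical.

```python
from collections import defaultdict

def facet(records, by):
    """Split records into one list per value of the `by` field (one per panel)."""
    panels = defaultdict(list)
    for rec in records:
        panels[rec[by]].append(rec)
    return dict(panels)

def shared_limits(records, key):
    """Axis limits computed over ALL records, so every panel uses the same scale."""
    values = [rec[key] for rec in records]
    return min(values), max(values)

# Hypothetical records, invented for illustration.
records = [
    {"region": "North", "year": 2020, "sales": 12},
    {"region": "North", "year": 2021, "sales": 15},
    {"region": "South", "year": 2020, "sales": 30},
    {"region": "South", "year": 2021, "sales": 28},
]

panels = facet(records, by="region")
y_limits = shared_limits(records, key="sales")

# Show one panel first so the audience learns the structure, then the full grid.
first_name = next(iter(panels))
print("intro panel:", first_name, panels[first_name])
for name, rows in panels.items():
    print(name, "y-axis:", y_limits, [(r["year"], r["sales"]) for r in rows])
```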
A compound figure combines multiple different plot types into one figure (panel A, panel B, etc.).
Best practices:
Titles and captions are often treated as afterthoughts, but they’re critical:
Rewrite these poor titles as informative ones:
Then: write a two-sentence figure caption for one of them that includes (1) what is shown, and (2) the data source.
In data visualization, a story has:
“Every time you decide what to include and what to leave out, you are creating a story.”
Wilke uses a military analogy: generals need to understand the situation quickly, not memorize every detail. Your audience usually needs the same thing.
The scatter plot is technically impressive. The bar charts actually tell the story.
When you need a complex visualization, earn it:
In a presentation
Never drop a complex figure cold. Walk your audience through it: “This is the structure… here’s what the x-axis shows… here’s what each color means… and here’s what I want you to notice.”
People remember images, not tables.
Choose one graph from the NYT collection.
Share with the class — we’ll discuss 2–3 examples.
Due Week 9: Design Principles Concept Map. Due tomorrow!
You’ll be synthesizing the principles from Wilke Ch 17–26 into a visual concept map. Think about how the ideas from today connect:
More details in the assignment instructions (ConceptMap3).
STAT 80B - Winter 2026 | Week 8 - Thursday