STAT 80B Week 3 - Tuesday
2026-03-10
The Question: You have data showing sales across 10 product categories. How do you show it?
Bar chart? Dot plot? Heatmap? Table?
All can work—but which tells the story best?
Today we’re building your “visualization vocabulary” so you can make smart choices!
We’re learning how to choose and create effective visualizations for amounts and distributions.
Why it matters: These are the foundation for most data visualizations you’ll ever make!
Think of Wilke Chapter 5 as a visual menu of charts:
Today’s focus: Amounts (Ch 6) and Distributions (Ch 7)
You have data on:
Brainstorm: How many different ways could you visualize this?
Sketch at least 2 different approaches!
For that simple dataset, you could use:
There’s rarely ONE right answer, but some choices are better than others!
Here’s a collection of MANY more plots: https://datavizcatalogue.com/
Definition: Numerical values for different categories
Examples:
The common thread: One number per category
The workhorse of data visualization
When to use vertical: Few categories (2-8), category names are short
When to use horizontal: Many categories (8+), category names are long
Scenario: Visualizing the 50 US states by population
Option A: Vertical bars
Option B: Horizontal bars
Discuss: Which would you use? Why?
Horizontal bars win when:
💡 Pro tip: If your vertical bars need rotated labels, consider going horizontal!
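To make the pro tip concrete, here is a minimal sketch of a horizontal bar chart in Python with matplotlib. The category names and sales numbers are made up for illustration; they are not from the lecture data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: names long enough to collide on a vertical chart's x-axis.
sales = {
    "Home and Kitchen Appliances": 42,
    "Outdoor and Garden Equipment": 31,
    "Toys": 18,
    "Office Supplies and Stationery": 27,
    "Books": 12,
}

# Sort ascending so the largest bar ends up at the top of the chart.
ordered = sorted(sales.items(), key=lambda item: item[1])
names = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.barh(names, values)  # barh = horizontal bars, labels read left to right
ax.set_xlabel("Sales (made-up units)")
ax.set_xlim(left=0)  # bar length encodes the amount, so keep the zero baseline
fig.tight_layout()
```

Call `fig.savefig("bars.png")` (any file name you like) to write the chart to disk. Swapping `ax.barh` for `ax.bar` gives the vertical version, which is a quick way to see where the labels start to overlap.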
New challenge: You have TWO sets of amounts
Example: Sales by product AND by quarter
Grouped bars:
Bars side-by-side
Easy to compare within category
Stacked bars:
Bars on top of each other
Shows total + parts
You’re comparing:
Question 1: Which is easier with grouped bars?
Question 2: Which is easier with stacked bars?
Try sketching both!
Grouped bars: Great for comparing specific values
Stacked bars: Great for seeing totals
Trade-off: You can’t optimize for both comparisons at once!
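The trade-off is easier to see with both charts drawn from the same numbers. The sketch below assumes a tiny made-up table of sales by product and quarter and draws it grouped and stacked side by side with matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical sales: one list per quarter, one value per product.
products = ["A", "B", "C"]
quarters = {"Q1": [10, 14, 8], "Q2": [12, 9, 11], "Q3": [7, 13, 15]}

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(9, 3))

width = 0.8 / len(quarters)           # each group fills 80% of its slot
positions = range(len(products))
bottoms = [0] * len(products)         # running totals for the stacked chart

for i, (quarter, values) in enumerate(quarters.items()):
    # Grouped: shift each quarter sideways so every bar starts at zero.
    offsets = [p + i * width for p in positions]
    ax_grouped.bar(offsets, values, width=width, label=quarter)
    # Stacked: start each quarter where the previous one ended.
    ax_stacked.bar(products, values, bottom=bottoms, label=quarter)
    bottoms = [b + v for b, v in zip(bottoms, values)]

# Center the tick labels under each group of bars.
ax_grouped.set_xticks([p + width * (len(quarters) - 1) / 2 for p in positions])
ax_grouped.set_xticklabels(products)
ax_grouped.set_title("Grouped: compare specific values")
ax_stacked.set_title("Stacked: compare totals")
ax_grouped.legend()
ax_stacked.legend()
fig.tight_layout()
```

After the loop, `bottoms` holds the total per product, which is exactly what the stacked chart makes easy to read and the grouped chart hides.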
A minimalist alternative to bars
When dots work better than bars:
Showing amounts with color instead of position
Best for:
| Method | Best When | Avoid When |
|---|---|---|
| Vertical bars | Few categories, short names | Many categories, long names |
| Horizontal bars | Many categories, long names | Very few categories |
| Grouped bars | Comparing specific values | Need to see totals |
| Stacked bars | Need to see totals | Comparing middle segments |
| Dot plots | Precise values matter | Need to emphasize magnitude |
| Heatmaps | 2D patterns, many categories | Small datasets |
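The last two rows of the table can be sketched in a few lines. This example assumes a small made-up sales grid and draws a dot plot of one quarter next to a heatmap of the whole grid; the numbers and labels are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical sales grid: rows are products, columns are quarters.
products = ["A", "B", "C"]
quarters = ["Q1", "Q2", "Q3"]
sales = [[10, 12, 7], [14, 9, 13], [8, 11, 15]]

fig, (ax_dot, ax_heat) = plt.subplots(1, 2, figsize=(9, 3))

# Dot plot: one marker per category, value encoded by position only.
q1_sales = [row[0] for row in sales]
ax_dot.plot(q1_sales, products, "o")
ax_dot.set_xlabel("Q1 sales (made-up units)")
ax_dot.set_title("Dot plot")

# Heatmap: value encoded by color, one cell per product/quarter pair.
image = ax_heat.imshow(sales, cmap="viridis")
ax_heat.set_xticks(range(len(quarters)))
ax_heat.set_xticklabels(quarters)
ax_heat.set_yticks(range(len(products)))
ax_heat.set_yticklabels(products)
ax_heat.set_title("Heatmap")
fig.colorbar(image, ax=ax_heat, label="Sales")
fig.tight_layout()
```

With only nine cells the heatmap is overkill, which matches the "Avoid When: Small datasets" entry; the same code pays off once the grid has dozens of rows and columns.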
Stand up, stretch, grab water!
Instead of one number per category, we have MANY values:
The question: How are these values spread out?
Understanding spread helps you answer:
Real impact: This affects everything from quality control to medical diagnoses!
The classic approach:
Example: Student heights
Same data, different stories:
Too few bins (wide bins):
Too many bins (narrow bins):
Take the heights data from before and draw three different histograms:
Option A: Use 3 bins
Option B: Use 6 bins
Option C: Use 12 bins
Discuss: What would each show/hide? Which would you choose?
The right bin width depends on:
💡 Best practice: Try 3-5 different bin widths and see what story emerges!
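Here is a rough sketch of what "try different bin widths" looks like in code. It assumes simulated heights (two overlapping groups, not the class data) and a small hypothetical helper, `bin_counts`, that counts values in equal-width bins using only the standard library.

```python
import random

random.seed(0)  # fixed seed so the simulated data is reproducible

# Hypothetical heights in cm: two overlapping groups of 100 students each.
heights = (
    [random.gauss(162, 5) for _ in range(100)]
    + [random.gauss(176, 5) for _ in range(100)]
)

def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Same data, three different bin counts.
for n_bins in (3, 6, 12):
    print(n_bins, bin_counts(heights, n_bins))
```

In practice you would let the plotting library do the counting: in matplotlib, `ax.hist(heights, bins=n_bins)` bins and draws in one call. Printing the raw counts first is just a way to see how much the shape changes before any pixels are involved.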
A smooth version of histograms:
Advantage: No arbitrary bins!
Disadvantage: The smoothness itself is arbitrary (bandwidth parameter)
Same issue as bin width, different name:
Large bandwidth:
Small bandwidth:
The solution: Try multiple bandwidths, just like bin widths!
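To see the bandwidth effect without any plotting library, here is a minimal sketch of a Gaussian kernel density estimate written by hand. The six heights, the evaluation grid, and the helper names `gaussian_kde` and `count_peaks` are all made up for illustration; real tools provide their own, more careful, implementations.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x.

    Each data point contributes one Gaussian bump whose standard
    deviation is the bandwidth; the bumps are averaged.
    """
    n = len(data)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density

def count_peaks(curve):
    """Count grid points that are strictly higher than both neighbors."""
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    )

# Hypothetical heights in cm forming two clusters.
heights = [150, 152, 155, 170, 172, 175]
grid = list(range(140, 186))  # evaluate the curve at 140, 141, ..., 185

peaks_by_bandwidth = {}
for bandwidth in (1.5, 5, 15):
    density = gaussian_kde(heights, bandwidth)
    curve = [density(x) for x in grid]
    peaks_by_bandwidth[bandwidth] = count_peaks(curve)
    print(f"bandwidth={bandwidth}: {peaks_by_bandwidth[bandwidth]} peak(s)")
```

The number of peaks drops as the bandwidth grows: a narrow kernel keeps the clusters (and some noise) visible, while a wide one blurs everything into a single hump. That is the same story as narrow versus wide histogram bins.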
Common scenarios:
Three approaches:
You’re comparing:
Sketch or describe: How would you show both distributions?
What makes your choice better than alternatives?
For comparing distributions:
Side-by-side works when:
Overlapping works when:
Avoid stacked histograms: Only the bottom distribution is easy to read!
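One possible way to draw the overlapping option is sketched below with matplotlib. The two groups are simulated (not real class data), and the bin count of 15 is an arbitrary choice. The key detail is that both histograms share the same bin edges and start from zero, so neither one sits on top of the other.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

random.seed(1)  # fixed seed so the simulated data is reproducible

# Hypothetical heights in cm for two groups.
group_a = [random.gauss(165, 6) for _ in range(200)]
group_b = [random.gauss(175, 6) for _ in range(200)]

# Shared bin edges so the bars of both groups line up exactly.
everything = group_a + group_b
edges = np.linspace(min(everything), max(everything), 16)  # 15 bins

fig, ax = plt.subplots(figsize=(6, 3))
# alpha makes the bars see-through, so overlap stays readable.
counts_a, _, _ = ax.hist(group_a, bins=edges, alpha=0.5, label="Group A")
counts_b, _, _ = ax.hist(group_b, bins=edges, alpha=0.5, label="Group B")
ax.set_xlabel("Height (cm, simulated)")
ax.set_ylabel("Count")
ax.legend()
fig.tight_layout()
```

For the side-by-side option, create two panels with `plt.subplots(1, 2, sharex=True, sharey=True)` and draw one group in each; sharing the axes keeps the comparison fair.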
The crime: Starting the y-axis at a non-zero value
Why it’s bad: Exaggerates differences
Example:
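A quick way to put a number on the exaggeration, as a sketch with made-up values (50 and 55 are illustrative, and `apparent_ratio` is a hypothetical helper, not a library function):

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths when the axis starts at `baseline`.

    A bar's visible length is its value minus the axis baseline,
    so the ratio readers see depends on where the axis starts.
    """
    return (b - baseline) / (a - baseline)

a, b = 50, 55  # hypothetical values: b is 10% larger than a

print(apparent_ratio(a, b))               # axis starts at 0  -> 1.1
print(apparent_ratio(a, b, baseline=45))  # axis starts at 45 -> 2.0
```

With the axis starting at 45, the second bar is drawn twice as long as the first even though the real difference is only 10%.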
The crime: 30 tiny bars you can’t read
The fix:
The crime: Using default settings without thinking
Why it’s bad: Software defaults might hide important patterns
The fix: ALWAYS try at least 3 different bin widths!
The crime: Using stacked bars when comparisons matter
Why it’s bad: Only the bottom segment has a common baseline
Example: Comparing middle segments across groups is nearly impossible!
Every visualization is a choice:
Your job: Make informed choices that reveal truth, not distortion!
Lab 2: Hands-on practice!
We’ll create:
Your choice of tools: Tableau, R, or Python
Due: Same day in class
I’ll be here if you have any questions :)