Visualizing Many Distributions

Boxplots, Violin Plots, Ridgelines, and Small Multiples

STAT 80: Data Visualization

Week 4, Day 2

Comparing Multiple Distributions

Today’s focus:

  • Boxplots (box-and-whisker plots)
  • Violin plots
  • Ridgeline plots
  • Small multiples (faceting)

The Challenge

You have data for multiple groups and want to compare their distributions:

  • Test scores across different classes
  • Salaries across different companies
  • Temperatures across different cities
  • Product ratings across different brands

Problem: Overlaid histograms get messy with many groups!

Solution: Specialized comparison plots

Introducing: The Boxplot

Also called “box-and-whisker plot”

Shows 5 key numbers in one compact visual

Boxplot Anatomy: The Five Numbers

  1. Minimum - smallest value (end of bottom whisker, unless outliers are drawn separately)
  2. Q1 (25th percentile) - bottom of box
  3. Median (50th percentile) - line in middle of box
  4. Q3 (75th percentile) - top of box
  5. Maximum - largest value (end of top whisker, unless outliers are drawn separately)

The Box = Middle 50%

The box contains the middle half of your data

  • Bottom of box = 25% of data below this
  • Top of box = 25% of data above this
  • Height of box = “Interquartile Range” (IQR)

Taller box = more spread out
Shorter box = more tightly clustered

Example: Building a Boxplot

Data: Test scores: 55, 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95

Step 1: Find the five numbers

  • Minimum = 55
  • Q1 (25th %ile) = 70 (median of the lower six scores)
  • Median (50th %ile) = 79
  • Q3 (75th %ile) = 86.5 (median of the upper six scores)
  • Maximum = 95
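
To check a hand calculation like this one, here is a minimal Python sketch. It assumes the "median of halves" rule for quartiles (split the sorted data at the median, then take the median of each half). The function name five_number_summary is made up for this example. Other quartile conventions exist, and software that uses them can report slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # values above it (middle value skipped when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [55, 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]
print(five_number_summary(scores))   # (55, 70.0, 79.0, 86.5, 95)
```

With this rule the lower half is 55 to 78 and the upper half is 80 to 95, so Q1 falls halfway between 68 and 72 and Q3 halfway between 85 and 88.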

Step 2: Draw it!

Building a Boxplot by Hand

Try it yourself:

  1. Draw a number line
  2. Mark the five numbers
  3. Draw box from Q1 to Q3
  4. Draw line at median
  5. Draw whiskers to min and max

What About Outliers?

Sometimes we have extreme values that don’t fit the pattern…

Convention: Values more than 1.5 × IQR beyond the box are shown as individual points

Outlier Detection Rule

Step 1: Calculate IQR = Q3 - Q1

Step 2: Calculate fences:

  • Lower fence = Q1 - (1.5 × IQR)
  • Upper fence = Q3 + (1.5 × IQR)

Step 3: Any values outside the fences are outliers

Whiskers go to: The most extreme value within the fences (not all the way to the fences!)

Example: With Outliers

Data: 12, 15, 16, 18, 19, 21, 22, 24, 25, 28, 55

  • Q1 = 16, Q3 = 25, IQR = 9
  • Lower fence = 16 - (1.5 × 9) = 2.5
  • Upper fence = 25 + (1.5 × 9) = 38.5
  • 55 > 38.5, so it’s an outlier!
  • Upper whisker goes to 28 (not 55)
  • 55 is plotted as a point
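
The same steps can be sketched in Python. This is an illustration, not a standard library routine: the function name tukey_fences is invented here, and the quartiles again assume the median-of-halves rule, so plotting software that uses a different quartile convention may place the fences slightly differently.

```python
from statistics import median

def tukey_fences(values, k=1.5):
    """Apply the 1.5 x IQR rule: return fences, whisker ends, and outliers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])              # median of the lower half
    q3 = median(data[half + n % 2:])      # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

data = [12, 15, 16, 18, 19, 21, 22, 24, 25, 28, 55]
result = tukey_fences(data)
print(result["fences"])    # (2.5, 38.5)
print(result["whiskers"])  # (12, 28)
print(result["outliers"])  # [55]
```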

Comparing Groups with Boxplots

This is where boxplots really shine!

Easy to see:

  • Which group has higher median
  • Which group has more variability
  • Whether there are outliers
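
If you want to try this outside Tableau, here is a rough sketch using Python's matplotlib library. The three classes and their scores are invented for illustration (Class A reuses the test scores from the earlier example), and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

# Hypothetical scores for three classes (made up for this example)
scores = {
    "Class A": [55, 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95],
    "Class B": [60, 64, 66, 69, 70, 71, 73, 74, 76, 78, 80, 99],
    "Class C": [40, 52, 58, 65, 71, 77, 83, 88, 90, 93, 96, 98],
}

fig, ax = plt.subplots(figsize=(6, 4))
# One box per group, side by side; whiskers follow the 1.5 x IQR rule by default.
parts = ax.boxplot(list(scores.values()))
ax.set_xticklabels(list(scores.keys()))
ax.set_ylabel("Test score")
ax.set_title("Score distributions by class")
fig.savefig("boxplots_by_class.png", dpi=150)
```

Because all boxes share one axis, medians (the lines inside the boxes) and spreads (the box heights) can be compared at a glance.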

Reading Boxplots: Practice

Questions:

  1. Which class has the highest median score?
  2. Which class has the most variability?
  3. Which class has outliers?
  4. In which class did 75% of students score above 70?

Boxplot Summary

Advantages

Compact - can show many groups side-by-side

Clear comparison - easy to see medians and spreads

Outliers highlighted - unusual values stand out

Standardized - a widely recognized format, though whisker rules can vary between software tools

Works with small or large datasets

Limitations

Hides the actual shape - can’t see if distribution is bumpy, smooth, symmetric

Can’t see sample size - 10 points vs 1000 points look similar

Oversimplifies - reduces all data to 5 numbers

Solution: Combine with other visualizations!

Violin Plots: Best of Both Worlds

Violin plot = Boxplot + smoothed histogram (rotated and mirrored)

Shows both summary statistics and shape!

Violin Plot Anatomy

  • Width = density (how many values at that level)
  • Often includes boxplot inside
  • Shape shows full distribution

Wide part = many values
Narrow part = few values
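
To see where the width comes from, here is a small sketch of a Gaussian kernel density estimate, one common way to get the "smoothed histogram." It uses only Python's standard library. The function name and the bandwidth of 5 points are choices made for this example; plotting software normally picks the bandwidth automatically.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates density at y by averaging Gaussian bumps."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(y):
        return sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in values) / norm

    return density

scores = [55, 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]
density = gaussian_kde(scores, bandwidth=5.0)  # bandwidth chosen by eye for this sketch

# Text version of half a violin: more symbols = wider violin at that score level.
for level in range(50, 101, 5):
    half_width = round(density(level) * 400)   # 400 is just a display scale
    print(f"{level:>3} " + "#" * half_width)
```

The rows near 80 come out widest because most scores sit there. A real violin plot draws this curve vertically and mirrors it on both sides.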

When to Use Violin Plots

Best for:

  • When shape matters (is it symmetric? bimodal?)
  • Moderate number of groups (2-8)
  • Technical or data-savvy audiences
  • When you want more detail than boxplot

Avoid when:

  • Too many groups (gets cluttered)
  • Very small sample sizes (shape is noisy)
  • General public audiences (less familiar)

Ridgeline Plots: Elegant Overlaps

Also called “joy plots” (after the cover of Joy Division’s album Unknown Pleasures!)

Shows: Many distributions, slightly overlapped, like mountain ridges

Ridgeline Plot Characteristics

Each “ridge” is:

  • A smoothed histogram (density curve)
  • Partially transparent
  • Offset vertically

Great for:

  • Many groups (6+)
  • Showing overall patterns
  • When order matters (time, ranking)
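
A ridgeline plot can be approximated by drawing one density curve per group and shifting each one down by a fixed step. The sketch below assumes matplotlib for drawing and a hand-rolled Gaussian density estimate. The temperature data, bandwidth, and offset step are all invented for illustration.

```python
import math
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

def kde_curve(values, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at every point of grid."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
            for x in grid]

# Hypothetical daily high temperatures (deg C) by month, made up for this example
temps = {
    "Jan": [1, 3, 4, 5, 6, 7, 8, 9, 11],
    "Apr": [9, 11, 12, 14, 15, 16, 17, 19, 21],
    "Jul": [22, 24, 25, 26, 27, 28, 29, 30, 33],
    "Oct": [10, 12, 13, 15, 16, 17, 18, 20, 22],
}

grid = [x * 0.5 for x in range(-20, 91)]   # -10 to 45 deg C in half-degree steps
step = 0.04                                # vertical offset between ridges
fig, ax = plt.subplots(figsize=(6, 4))
baselines = []
for i, (month, values) in enumerate(temps.items()):
    base = -i * step                       # each later month sits one step lower
    curve = kde_curve(values, grid, bandwidth=2.0)
    # Partially transparent fill; lower ridges are drawn in front of higher ones.
    ax.fill_between(grid, base, [base + d for d in curve], alpha=0.6, zorder=i)
    baselines.append(base)
ax.set_yticks(baselines)
ax.set_yticklabels(list(temps.keys()))
ax.set_xlabel("Daily high (deg C)")
fig.savefig("ridgeline.png", dpi=150)
```

Keeping the groups in time order lets the eye follow the peak as it drifts right toward July and back again.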

Ridgelines vs Boxplots vs Violins

Boxplot:

  • Most compact
  • Best for quick comparison
  • Focuses on medians and spreads

Violin:

  • Shows shape + summary
  • Good for detailed comparison
  • 2-8 groups

Ridgeline:

  • Most elegant
  • Best for many groups
  • Emphasizes overall patterns

Small Multiples (Faceting)

Idea: Instead of overlaying, make separate mini-charts for each group

Also called: trellis plots, panel plots, facets

Small Multiples: When and Why

Best for:

  • Comparing many groups without overlap
  • When each group needs detail
  • Complex patterns within groups
  • Different types of visualizations per group

Design principles:

  • Keep scales consistent (same axes!)
  • Arrange logically (alphabetical, by value, by time)
  • Make it easy to compare across panels
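
As one possible sketch of these principles in matplotlib: the sharex and sharey options keep both axes identical across panels, and a single list of bin edges keeps the histograms comparable. The class scores are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

# Hypothetical scores for three classes (made up for this example)
scores = {
    "Class A": [55, 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95],
    "Class B": [60, 64, 66, 69, 70, 71, 73, 74, 76, 78, 80, 99],
    "Class C": [40, 52, 58, 65, 71, 77, 83, 88, 90, 93, 96, 98],
}
bins = list(range(40, 101, 10))   # same bin edges in every panel

# One panel per group; shared axes force identical scales.
fig, axes = plt.subplots(1, len(scores), sharex=True, sharey=True, figsize=(9, 3))
for ax, (name, values) in zip(axes, scores.items()):
    ax.hist(values, bins=bins)
    ax.set_title(name)
    ax.set_xlabel("Test score")
axes[0].set_ylabel("Number of students")
fig.savefig("small_multiples.png", dpi=150)
```

Without the shared axes, each panel would rescale to its own data and bars of the same height could mean different counts.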

Small Multiples: Good vs Bad

Good

  • Same scales
  • Clear labels
  • Easy to compare

Bad

  • Different scales
  • Hard to compare
  • Confusing

Choosing Your Comparison Method

How many groups?

  • 2-3 groups: Overlaid histograms, ECDFs, or boxplots
  • 4-8 groups: Boxplots or violin plots
  • More than 8 groups: Ridgeline plots or small multiples

What matters most?

  • Medians and spreads: Boxplots
  • Full shape: Violin plots or ridgelines
  • Fine details: Small multiples with histograms

Reference Lines and Bands

Adding context to your distribution plots:

Reference line = straight line marking an important value (mean, target, threshold); vertical on a histogram, horizontal on upright boxplots

Reference band = shaded region showing acceptable range

When to Use Reference Lines

Good uses:

  • Industry average or benchmark
  • Passing threshold or cutoff
  • Previous year’s value
  • Legal limit or standard

Tips:

  • Use a different color/style than data
  • Add a label explaining what it represents
  • Don’t clutter - 1-2 reference lines max
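
For reference, here is roughly how a reference line and band could be added in matplotlib using axvline and axvspan. The passing cutoff of 70 and the target range of 75 to 90 are hypothetical values chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to use an interactive window
import matplotlib.pyplot as plt

scores = [55, 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(scores, bins=list(range(50, 101, 5)), color="lightgray", edgecolor="white")
# Reference band: shaded region for an acceptable range (hypothetical values)
band = ax.axvspan(75, 90, color="tab:green", alpha=0.15, label="Target range (hypothetical)")
# Reference line: dashed and colored so it reads as context, not data
line = ax.axvline(70, color="tab:red", linestyle="--", label="Passing cutoff (hypothetical)")
ax.set_xlabel("Test score")
ax.set_ylabel("Number of students")
ax.legend()   # the legend labels say what each reference mark represents
fig.savefig("reference_line_band.png", dpi=150)
```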

Combining Multiple Charts: Dashboards

Sometimes you want multiple visualizations side-by-side:

In Tableau: Use “Dashboard” feature to combine worksheets

Dashboard Design Principles

Do:

  • ✅ Have a clear story or comparison goal
  • ✅ Use consistent color schemes
  • ✅ Align related charts
  • ✅ Guide the viewer’s eye

Don’t:

  • ❌ Cram too many visualizations
  • ❌ Use different scales for same metric
  • ❌ Rely on color alone for meaning
  • ❌ Forget titles and labels

Using Pages/Filters in Tableau

Pages: Animate through one category at a time (like PowerPoint slides)

Filters: Let viewer choose which categories to show

Great for: Interactive exploration, presentations

Summary: Choosing Your Tool

Visualization        Best For                          Limitations
Boxplot              Quick comparison, many groups     Hides shape
Violin plot          Shape + summary, 2-8 groups       Complex for general audience
Ridgeline            Many groups, elegant display      Needs more space
Small multiples      Detailed comparison, any number   Takes up space
ECDF (from Tuesday)  Exact percentiles, few groups     Unfamiliar to most

Coming Up

Next class (Tuesday):

  • Visualizing proportions
  • Pie charts: when they work (and when they don’t!)
  • Alternative proportion visualizations
  • The great pie chart debate

Don’t forget:

  • Project Proposal due tomorrow
  • I have office hours today after class