Week 8: Geospatial Data & Uncertainty

STAT 80B - Data Visualization

10 Mar 2026

🌍 Geospatial Data

Overview

  • Map projections and coordinate systems
  • Choropleth maps and their pitfalls
  • Cartograms and alternative geographic visualizations
  • Visualizing uncertainty: error bars, confidence intervals
  • Confidence bands for curves
  • Frequency framing for probabilities

Reading: Wilke Ch 15 & 16

The Earth Is Not Flat (But Our Screens Are)

The fundamental challenge of maps: projecting a 3D sphere onto a 2D surface always introduces distortion.

Every map projection makes a tradeoff — it can preserve:

  • Shape (conformal projections, e.g., Mercator)
  • Area (equal-area projections, e.g., Albers, Mollweide)
  • Distance (equidistant projections)
  • Direction (azimuthal projections)

No projection can preserve all properties at once.

Common Map Projections

Projection        | Preserves    | Distorts         | Best For
Mercator          | Shape/angles | Area at poles    | Navigation
Albers Equal-Area | Area         | Shape            | Thematic maps of US
Robinson          | Compromise   | Both slightly    | World maps
Mollweide         | Area         | Shape near edges | Global comparisons
Web Mercator      | (Speed)      | Area             | Web tiles (Google Maps)

The Mercator Problem

The Mercator projection wildly inflates area at high latitudes:

  • Greenland appears as large as Africa → in reality, Africa is ~14× larger
  • Antarctica looks enormous
  • Europe looks larger relative to Africa than it actually is
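
The inflation can be quantified. The sketch below assumes a spherical Earth, where Mercator stretches both map directions by sec(latitude), so local areas grow by sec²(latitude). The function name is made up for this illustration.

```python
import math

def mercator_area_inflation(lat_deg: float) -> float:
    """Local area exaggeration of Mercator at a given latitude.

    On a spherical Earth, Mercator stretches x and y by sec(latitude),
    so areas are enlarged by sec(latitude) squared relative to the equator.
    """
    lat = math.radians(lat_deg)
    return 1.0 / math.cos(lat) ** 2

for lat in (0, 30, 60, 75):
    factor = mercator_area_inflation(lat)
    print(f"{lat:>2} degrees latitude: areas drawn {factor:.1f}x too large")
```

The factor grows without bound toward the poles, which is why Mercator maps are always cut off before reaching them.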

🔍 Find this example yourself

Search: “Mercator projection distortion comparison” — Wilke Figure 15.2 shows a good side-by-side. Also try thetruesize.com — drag countries to compare their real sizes!

Choosing the Right Projection

For U.S. data: Albers Equal-Area Conic is the standard choice

  • Preserves area accurately across the contiguous states
  • Widely used by the Census Bureau, USGS

For global data:

  • Mollweide or Equal-Earth for area comparisons
  • Avoid Mercator for thematic maps showing quantities

Key question: Are you comparing quantities across regions? → Use an equal-area projection.
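
To see what "equal-area" buys you, here is a minimal sketch comparing how tall two latitude bands are drawn. It assumes a spherical Earth and uses the Lambert cylindrical equal-area projection as a simple stand-in for Albers or Mollweide (it is not the projection recommended above, just the easiest equal-area formula to write down). Both projections here use x = longitude, so the ratio of drawn heights equals the ratio of drawn areas.

```python
import math

def mercator_y(lat_deg: float) -> float:
    """Vertical map coordinate of the (spherical) Mercator projection."""
    lat = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + lat / 2))

def equal_area_y(lat_deg: float) -> float:
    """Vertical map coordinate of the Lambert cylindrical equal-area projection."""
    return math.sin(math.radians(lat_deg))

def band_ratio(project, band_a, band_b) -> float:
    """Drawn size of latitude band_a divided by drawn size of band_b."""
    size_a = project(band_a[1]) - project(band_a[0])
    size_b = project(band_b[1]) - project(band_b[0])
    return size_a / size_b

northern, tropical = (60, 70), (0, 10)  # two 10-degree latitude bands

# On the sphere, a band's area is proportional to sin(lat2) - sin(lat1),
# which is exactly what the equal-area projection draws.
print("true / equal-area ratio:", round(band_ratio(equal_area_y, northern, tropical), 2))
print("Mercator ratio:         ", round(band_ratio(mercator_y, northern, tropical), 2))
```

On the globe the northern band is less than half the size of the tropical one; on a Mercator map it is drawn more than twice as large.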

Maps Have Layers

A complete map is built from layers (Wilke Ch 15.2):

  1. Terrain / background — coastlines, country/state boundaries
  2. Data layer — what you’re visualizing (colors, symbols, sizes)
  3. Context layer — labels, graticules, scale bars, north arrows

Think of it like ggplot’s geom_* layers — each adds meaning.

Choropleth Maps

A choropleth map colors geographic regions according to a data value.

✅ When choropleths work well

  • Data represents density (value ÷ area): e.g., population per km²
  • All regions are approximately the same size
  • Color scale matches data type (sequential, diverging)

⚠️ Choropleth pitfalls

  • Large regions dominate visually even if their values are small
  • Raw counts (not rates) are misleading on a choropleth
  • Poor color choice makes patterns hard to read

The Big Area Problem

Imagine a choropleth of total votes across the U.S. The counties of Wyoming together cover a huge geographic area, but the whole state has fewer than 250,000 voters. Los Angeles County alone is tiny on the map by comparison but has ~5 million voters.

The eye is drawn to area, not to the data.
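
A quick way to check for this problem is to compare each region's share of the map with its share of the data. The sketch below uses rough, rounded figures chosen only for illustration (they are not official statistics), and the function name is hypothetical.

```python
# Rough, rounded figures for illustration only; not official statistics.
REGIONS = {
    "Wyoming (whole state)": {"area_km2": 253_000, "voters": 250_000},
    "Los Angeles County": {"area_km2": 12_000, "voters": 5_000_000},
}

def area_vs_data_share(regions):
    """Return each region's share of total area and of total voters."""
    total_area = sum(r["area_km2"] for r in regions.values())
    total_voters = sum(r["voters"] for r in regions.values())
    return {
        name: {
            "area_share": r["area_km2"] / total_area,
            "voter_share": r["voters"] / total_voters,
            "voters_per_km2": r["voters"] / r["area_km2"],
        }
        for name, r in regions.items()
    }

for name, s in area_vs_data_share(REGIONS).items():
    print(
        f"{name}: {s['area_share']:.0%} of the area, "
        f"{s['voter_share']:.0%} of the voters, "
        f"{s['voters_per_km2']:.1f} voters per km2"
    )
```

When the area share and the data share disagree this badly, a choropleth of raw counts will mislead; mapping a density (voters per km²) or switching to a cartogram avoids the mismatch.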

Choropleth: Color Scale Matters

  • Sequential palette → for data that goes from low to high (e.g., income, density)
  • Diverging palette → for data centered on a meaningful midpoint (e.g., % change, above/below average)
  • Qualitative palette → for categorical regions (e.g., political parties, climate zones)
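
The difference between sequential and diverging scales comes down to how values are mapped onto the palette. The following is a minimal sketch with hypothetical helper names: each function returns a position from 0 to 1 along a palette, and the diverging version pins the meaningful midpoint to the neutral middle color at 0.5.

```python
def sequential_position(value: float, vmin: float, vmax: float) -> float:
    """Position 0..1 along a sequential palette (low -> high)."""
    t = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, t))

def diverging_position(value: float, vmin: float, midpoint: float, vmax: float) -> float:
    """Position 0..1 along a diverging palette; `midpoint` always maps to 0.5."""
    if value <= midpoint:
        t = 0.5 * (value - vmin) / (midpoint - vmin)
    else:
        t = 0.5 + 0.5 * (value - midpoint) / (vmax - midpoint)
    return min(1.0, max(0.0, t))

# Hypothetical data: % change ranging from -10 to +40, with 0 as the midpoint.
print(sequential_position(0, -10, 40))    # 0% change lands off-center
print(diverging_position(0, -10, 0, 40))  # 0% change lands on the neutral color
```

With a sequential mapping, "no change" ends up at an arbitrary color; with the diverging mapping, increases and decreases are visually separated around the neutral center.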

❌ Common mistake

Using a rainbow/jet colormap: it is not perceptually uniform, so it suggests sharp boundaries and uneven steps that are not in the data, and it is not colorblind-friendly.

🧠 Active Learning: Critique a Choropleth (8 min)

Look at this map: https://www.nytimes.com/elections/2012/results/president.html

Discuss with a partner:

  1. What does the map make you perceive at first glance?
  2. Is the color scale appropriate? Why or why not?
  3. What would happen if you mapped vote margin per km² instead?
  4. Would a cartogram improve this visualization? Why?

Cartograms: Distorting Geography for Clarity

A cartogram rescales regions proportionally to some data variable — usually population.

Types of cartograms:

  • Contiguous cartogram: regions stay connected but areas are distorted
  • Non-contiguous cartogram: regions float free, sized by data
  • Dorling cartogram: regions become circles, sized by data
  • Cartogram heatmap: equal-sized tiles arranged geographically (e.g., US state squares)
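
For circle-based cartograms, "sized by data" should mean that circle area, not radius, is proportional to the value. Here is a minimal sketch of that sizing step only (a full Dorling cartogram also needs a layout step that keeps circles from overlapping, which is omitted). The function name, the regions, and the values are hypothetical.

```python
import math

def dorling_radii(values: dict[str, float], max_radius: float = 40.0) -> dict[str, float]:
    """Radii such that each circle's AREA is proportional to its data value.

    Area grows with radius squared, so the radius must scale with the
    square root of the value. The largest value gets `max_radius`.
    """
    largest = max(values.values())
    return {name: max_radius * math.sqrt(v / largest) for name, v in values.items()}

# Hypothetical populations for three made-up regions.
populations = {"Region A": 4_000_000, "Region B": 1_000_000, "Region C": 250_000}
for name, radius in dorling_radii(populations).items():
    print(f"{name}: radius {radius:.1f}")
```

Scaling the radius directly by the value would make Region A look 16 times larger than Region B instead of 4 times larger.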

Cartogram Heatmap: A Practical Alternative

The cartogram heatmap (tilegram) gives every region equal visual weight — great when you care equally about all units.

Tradeoff: geographic accuracy is lost, but no region dominates visually.

When to Use Which Map Type

Situation                                  | Best Choice
Data is a rate/density                     | Choropleth (equal-area projection)
Showing counts, all regions matter equally | Cartogram heatmap
Emphasizing population-weighted patterns   | Contiguous/Dorling cartogram
Showing point locations with data          | Bubble map
Comparing a variable over time per region  | Small multiples map

📊 Visualizing Uncertainty

Why Uncertainty Visualization Matters

  • Nearly every dataset has uncertainty — measurement error, sampling variability, model uncertainty
  • Choosing not to show uncertainty is itself a design decision — and often a misleading one
  • Different audiences respond differently to uncertainty representations
  • People tend to interpret ranges as hard limits (deterministic construal error)

“One of the most challenging aspects of data visualization is the visualization of uncertainty.” (Wilke, Ch 16)

Error Bars: The Classic (and Often Misread) Tool

Error bars extend from a central estimate to show a range. But what does the bar represent?

  • ± 1 standard deviation (SD)?
  • ± 1 standard error (SE)?
  • 95% confidence interval?
  • Min/max range?

The error bar problem

Studies show readers often can’t distinguish these even when labeled. Always state explicitly what your error bars represent.
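
The four candidates can give very different bar lengths for the same data. The sketch below computes all of them for a made-up sample; the 95% interval uses a normal approximation (z ≈ 1.96), which is an assumption made here for simplicity, and a t-based interval would be somewhat wider for a sample this small.

```python
import math
import statistics

def error_bar_options(sample: list[float]) -> dict[str, tuple[float, float]]:
    """Four common meanings of an 'error bar' for the same sample."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)                # sample standard deviation
    se = sd / math.sqrt(len(sample))             # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96 (normal approximation)
    return {
        "sd": (mean - sd, mean + sd),
        "se": (mean - se, mean + se),
        "ci95": (mean - z * se, mean + z * se),
        "range": (min(sample), max(sample)),
    }

# Hypothetical exam scores.
scores = [62, 70, 71, 75, 78, 80, 84, 88, 91]
for label, (low, high) in error_bar_options(scores).items():
    print(f"{label:>5}: {low:.1f} to {high:.1f}")
```

SD and range describe the spread of the data, while SE and the confidence interval describe the precision of the mean, so they answer different questions and shrink differently as the sample grows.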

Types of Error Bar Representations

Visual               | What it shows              | Good for
Simple error bars    | A single range             | Point estimates with CI
Graded error bars    | Multiple confidence levels | Showing uncertainty spectrum
Box plots            | Quartiles + outliers       | Distribution shape
Violin plots         | Full distribution          | Comparing distributions
Half-eye / eye plots | CI + distribution          | Combining precision + shape
Quantile dot plots   | Discretized distribution   | Lay audiences, frequency framing

Graded Error Bars

Instead of one confidence level, graded error bars show multiple levels simultaneously (e.g., 50%, 80%, 95% CI).

The thicker inner bar marks the narrower, lower-confidence interval (e.g., 50%); the thinner outer bar marks the wider, higher-confidence interval (e.g., 95%). Readers get a sense of the full range of plausible values.
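
The intervals behind a graded error bar can be computed in one pass. This is a minimal sketch for the mean of a made-up sample, again assuming a normal approximation; the function name is hypothetical.

```python
import math
import statistics

def graded_intervals(sample: list[float], levels=(0.50, 0.80, 0.95)) -> dict[float, tuple[float, float]]:
    """Confidence intervals for the mean at several levels (normal approximation)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    std_normal = statistics.NormalDist()
    intervals = {}
    for level in levels:
        z = std_normal.inv_cdf(0.5 + level / 2)  # two-sided critical value
        intervals[level] = (mean - z * se, mean + z * se)
    return intervals

# Hypothetical exam scores.
scores = [62, 70, 71, 75, 78, 80, 84, 88, 91]
for level, (low, high) in graded_intervals(scores).items():
    print(f"{level:.0%} interval: {low:.1f} to {high:.1f}")
```

The intervals nest: each higher confidence level contains all the lower ones, which is what lets them be drawn on top of each other as a single graded bar.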

Confidence Bands for Curves

When fitting a model to data, the fitted line itself has uncertainty. We visualize this with a confidence band.

  • The band shows the range of lines compatible with the data at a given confidence level
  • Confidence bands are curved even for straight-line fits — because the line can both shift up/down and rotate
  • Graded confidence bands can show multiple levels simultaneously
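
The curvature falls out of the standard formula for the uncertainty of a fitted line. The sketch below computes a pointwise band for simple linear regression on made-up data; it uses a normal critical value instead of the t distribution (an assumption made here to stay within the standard library), so the band is slightly too narrow for small samples.

```python
import math
import statistics

def confidence_band(xs, ys, x_grid, level=0.95):
    """Pointwise confidence band for a least-squares line.

    Returns (x, lower, upper) for each x in x_grid.
    """
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))             # residual standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    band = []
    for x0 in x_grid:
        fit = intercept + slope * x0
        # 1/n: the line can shift up/down; (x0 - x_bar)^2 / sxx: it can rotate.
        half_width = z * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
        band.append((x0, fit - half_width, fit + half_width))
    return band

# Hypothetical study hours and exam scores.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 58, 61, 70, 68, 77, 83, 85]
for x0, low, high in confidence_band(hours, scores, [1, 4.5, 8]):
    print(f"x = {x0}: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

The rotation term is zero at the mean of x and grows with distance from it, so the band is narrowest in the middle of the data and widens toward both ends.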

🧠 Active Learning: Read the Uncertainty (6 min)

Look at this plot description:

A scatter plot of exam scores vs. study hours. A fitted regression line is shown with a gray shaded band. The band is narrow in the middle and widens at the extremes.

Discuss:

  1. Why does the band widen at the extremes?
  2. What would a graded confidence band add?
  3. If you showed this to a non-statistician, what would they likely misinterpret?

Frequency Framing: Making Probability Intuitive

People are bad at reasoning about probabilities. Frequency framing reframes probability as counts out of a concrete group.

Hard to grasp: “There is a 17% chance of rain.”

Easier: “In 17 out of 100 days like today, it rained.”

Quantile dot plot: shows a distribution as discrete dots, where each dot stands for an equal share of the probability (e.g., 1 of 20 equally likely outcomes).
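
The dot positions for such a plot can be sketched in a few lines. The example assumes a normally distributed forecast with made-up parameters, and the function names are hypothetical; each dot sits at the midpoint quantile of one equal-probability slice.

```python
import statistics

def quantile_dots(dist: statistics.NormalDist, n_dots: int = 20) -> list[float]:
    """Values at the midpoints of n_dots equal-probability slices of dist."""
    return [dist.inv_cdf((i + 0.5) / n_dots) for i in range(n_dots)]

def frequency_statement(dots: list[float], threshold: float) -> str:
    """Turn a probability question into a count of dots."""
    above = sum(1 for d in dots if d > threshold)
    return f"{above} out of {len(dots)} outcomes are above {threshold}"

# Hypothetical forecast: waiting time with mean 30 min and SD 5 min.
forecast = statistics.NormalDist(mu=30, sigma=5)
dots = quantile_dots(forecast)
print(frequency_statement(dots, 35))
```

Readers can answer "how likely is a wait over 35 minutes?" by counting dots rather than estimating the area under a curve.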

Hypothetical Outcome Plots (HOPs)

HOPs animate through multiple possible outcomes, one at a time. Each frame = one draw from the distribution.

  • More intuitive than static confidence intervals for lay audiences
  • Forces viewer to confront the variability of outcomes
  • Important: outcomes must be representative of the true distribution
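
The data side of a HOP is just a sequence of random draws, one per animation frame. This is a minimal sketch assuming a normal distribution with made-up parameters; the function name is hypothetical and the animation itself is left out.

```python
import random

def hypothetical_outcomes(mu: float, sigma: float, n_frames: int, seed: int = 0) -> list[float]:
    """One random draw per animation frame from a normal distribution.

    A fixed seed keeps the animation reproducible between renders.
    """
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n_frames)]

# Hypothetical forecast: mean 30, SD 5, shown as 10 frames.
for frame, value in enumerate(hypothetical_outcomes(30, 5, n_frames=10), start=1):
    print(f"frame {frame:>2}: {value:.1f}")
```

Drawing the frames at random from the distribution itself, rather than hand-picking "interesting" outcomes, is what keeps them representative.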

Choosing Your Uncertainty Visualization

Audience           | Goal                        | Best approach
Scientists         | Precise inference           | Graded error bars, CI bands
General public     | Intuition about variability | Quantile dot plots, HOPs
Decision-makers    | Range of plausible outcomes | Frequency framing
Data-savvy readers | Full distribution shape     | Violin plots, half-eyes

🧠 Wrap-Up Discussion (5 min)

Choose one graph from the NYT collection:

  1. Was uncertainty shown? If not, should it have been?
  2. Which visualization type would work best for that uncertainty, and for what audience?
  3. What’s the risk of not showing uncertainty?

Summary: Week 8 Tuesday

  • Map projections always involve tradeoffs — choose based on what you need to preserve (area for comparisons!)
  • Choropleths work best with rates/densities; large areas visually dominate raw count maps
  • Cartograms correct for area bias but sacrifice geographic accuracy
  • Error bars must always be labeled — readers can’t guess what they represent
  • Confidence bands curve even for straight-line fits — because the line can shift and rotate
  • Frequency framing and quantile dot plots make probability more intuitive for general audiences

For Thursday

Read: Wilke Ch 17 (Proportional Ink) and Wilke Ch 29 (Telling Stories with Data)

We’ll shift from what to show to how to show it well — design principles, avoiding common pitfalls, and building visualizations that tell a clear story.