Week 8: Geospatial Data & Uncertainty

STAT 80B - Data Visualization

Marcela Alfaro Córdoba

Statistics - UCSC

10 Mar 2026

🌍 Geospatial Data

Overview

Map projections and coordinate systems
Choropleth maps and their pitfalls
Cartograms and alternative geographic visualizations
Visualizing uncertainty: error bars, confidence intervals
Confidence bands for curves
Frequency framing for probabilities

Reading: Wilke Ch 15 & 16

The Earth Is Not Flat (But Our Screens Are)

The fundamental challenge of maps: projecting a 3D sphere onto a 2D surface always introduces distortion.

Every map projection makes a tradeoff — it can preserve:

Shape (conformal projections, e.g., Mercator)
Area (equal-area projections, e.g., Albers, Mollweide)
Distance (equidistant projections)
Direction (azimuthal projections)

No projection can preserve all properties at once.

Common Map Projections

Projection	Preserves	Distorts	Best For
Mercator	Shape/angles	Area at high latitudes	Navigation
Albers Equal-Area	Area	Shape	Thematic maps of US
Robinson	Compromise	Both slightly	World maps
Mollweide	Area	Shape near edges	Global comparisons
Web Mercator	(Speed)	Area	Web tiles (Google Maps)

The Mercator Problem

The Mercator projection wildly inflates area at high latitudes:

Greenland appears about as large as Africa → in reality, Africa’s area is ~14× Greenland’s
Antarctica looks enormous
Europe looks larger relative to Africa than it actually is
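
To put numbers on this inflation, here is a minimal Python sketch. It assumes a spherical Earth, for which Mercator's local area scale factor is 1 / cos²(latitude). The function name and the choice of 72°N as a roughly central latitude for Greenland are illustrative assumptions, not taken from Wilke.

```python
import math


def mercator_area_inflation(lat_deg: float) -> float:
    """Local area scale factor of the Mercator projection at a latitude.

    On a sphere, Mercator stretches both east-west and north-south by
    1 / cos(latitude), so areas are inflated by 1 / cos(latitude) ** 2.
    """
    if abs(lat_deg) >= 90:
        raise ValueError("Mercator is undefined at the poles")
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# 72 degrees is an assumed, roughly central latitude for Greenland.
for lat in (0, 30, 60, 72):
    factor = mercator_area_inflation(lat)
    print(f"{lat:>2} deg -> areas drawn {factor:.1f}x too large")
```

The factor is 1 at the equator and grows without bound toward the poles, which is why equatorial Africa stays close to its true size while Greenland swells.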

🔍 Find this example yourself

Search: “Mercator projection distortion comparison” — Wilke Figure 15.2 shows a good side-by-side. Also try thetruesize.com — drag countries to compare their real sizes!

Choosing the Right Projection

For U.S. data: Albers Equal-Area Conic is the standard choice

Preserves area accurately across the contiguous states
Widely used by the Census Bureau, USGS

For global data:

Mollweide or Equal-Earth for area comparisons
Avoid Mercator for thematic maps showing quantities

Key question: Are you comparing quantities across regions? → Use an equal-area projection.
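
One way to see why this matters is to compare how large a projection draws two bands of latitude. The sketch below assumes a spherical Earth, where the true area of a latitude band is proportional to sin(lat2) - sin(lat1). It uses the Lambert cylindrical equal-area projection (y = sin(lat)) as a simple stand-in for the equal-area family, not Albers or Mollweide themselves. All function names are illustrative.

```python
import math


def mercator_y(lat_deg: float) -> float:
    """Vertical map coordinate of a latitude in the Mercator projection."""
    phi = math.radians(lat_deg)
    return math.log(math.tan(math.pi / 4 + phi / 2))


def equal_area_y(lat_deg: float) -> float:
    """Vertical coordinate in the Lambert cylindrical equal-area projection."""
    return math.sin(math.radians(lat_deg))


def drawn_area_ratio(y, band_a, band_b) -> float:
    """Ratio of drawn areas of two latitude bands in a cylindrical projection.

    Both bands span the full map width, so the ratio of drawn areas
    equals the ratio of their drawn heights.
    """
    return (y(band_a[1]) - y(band_a[0])) / (y(band_b[1]) - y(band_b[0]))


north, tropics = (60, 70), (0, 10)
print("equal-area:", round(drawn_area_ratio(equal_area_y, north, tropics), 2))
print("Mercator:  ", round(drawn_area_ratio(mercator_y, north, tropics), 2))
```

On the globe the 60° to 70° band is less than half the size of the 0° to 10° band. The equal-area projection draws it that way, while Mercator draws it more than twice as large.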

Maps Have Layers

A complete map is built from layers (Wilke Ch 15.2):

Terrain / background — coastlines, country/state boundaries
Data layer — what you’re visualizing (colors, symbols, sizes)
Context layer — labels, graticules, scale bars, north arrows

Think of it like ggplot’s geom_* layers — each adds meaning.

Choropleth Maps

A choropleth map colors geographic regions according to a data value.

✅ When choropleths work well

Data represents density (value ÷ area): e.g., population per km²
All regions are approximately the same size
Color scale matches data type (sequential, diverging)

⚠️ Choropleth pitfalls

Large regions dominate visually even if their values are small
Raw counts (not rates) are misleading on a choropleth
Poor color choice makes patterns hard to read
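
A minimal sketch of why rates and densities are mapped instead of raw counts. The two regions and all their numbers are hypothetical, made up only to show how the ranking can flip once counts are normalized.

```python
# Hypothetical regions: invented values for illustration only.
regions = [
    {"name": "Region A", "cases": 120, "population": 4_000, "area_km2": 9_000},
    {"name": "Region B", "cases": 900, "population": 90_000, "area_km2": 300},
]


def per_capita_rate(count: float, population: float, per: int = 1_000) -> float:
    """Count expressed per `per` residents."""
    return count / population * per


def density(count: float, area_km2: float) -> float:
    """Count per square kilometre."""
    return count / area_km2


for r in regions:
    rate = per_capita_rate(r["cases"], r["population"])
    dens = density(r["cases"], r["area_km2"])
    print(f'{r["name"]}: count={r["cases"]}, '
          f'rate={rate:.0f} per 1,000, density={dens:.3f} per km2')
```

Region B has the larger raw count, but Region A has the higher per-capita rate. A choropleth of raw counts would also paint the much larger Region A across more of the map, regardless of which value it holds.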

The Big Area Problem

Imagine a U.S. choropleth of total votes. The entire state of Wyoming covers a huge geographic area, but has fewer than 250,000 voters. Los Angeles County is tiny on the map but has ~5 million voters.

The eye is drawn to area, not to the data.

Choropleth: Color Scale Matters

Sequential palette → for data that goes from low to high (e.g., income, density)
Diverging palette → for data centered on a meaningful midpoint (e.g., % change, above/below average)
Qualitative palette → for categorical regions (e.g., political parties, climate zones)
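
The difference between a sequential and a diverging scale can be sketched as two ways of turning a data value into a position along the palette (0 = one end, 1 = the other). This is an illustrative sketch with hypothetical function names, not the API of any plotting library.

```python
def sequential_position(value: float, vmin: float, vmax: float) -> float:
    """Position in [0, 1] running from the lowest to the highest value."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    pos = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, pos))


def diverging_position(value: float, vmin: float, vmax: float,
                       midpoint: float = 0.0) -> float:
    """Position in [0, 1] where the midpoint always lands on 0.5.

    The same distance from the midpoint gets the same color intensity
    on both sides, even when the data range is lopsided.
    """
    half_range = max(midpoint - vmin, vmax - midpoint)
    if half_range <= 0:
        raise ValueError("the data range must extend beyond the midpoint")
    pos = 0.5 + 0.5 * (value - midpoint) / half_range
    return min(1.0, max(0.0, pos))


# Hypothetical % change data ranging from -10 to +30.
for v in (-10, 0, 10, 30):
    print(v, round(diverging_position(v, -10, 30), 2))
```

With the diverging version, 0% change always gets the neutral middle color, and -10% and +10% get equally strong colors on opposite sides.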

❌ Common mistake

Using a rainbow/jet colormap: its uneven color steps suggest order, boundaries, and magnitudes that may not be in the data, and it is not colorblind-friendly.

🧠 Active Learning: Critique a Choropleth (8 min)

Look at this map: https://www.nytimes.com/elections/2012/results/president.html

Discuss with a partner:

What does the map make you perceive at first glance?
Is the color scale appropriate? Why or why not?
What would happen if you mapped vote margin per km² instead?
Would a cartogram improve this visualization? Why?

Cartograms: Distorting Geography for Clarity

A cartogram rescales regions proportionally to some data variable — usually population.

Types of cartograms:

Contiguous cartogram — regions stay connected but areas are distorted
Non-contiguous cartogram — regions float free, sized by data
Dorling cartogram — regions become circles, sized by data
Cartogram heatmap — equal-sized tiles arranged geographically (e.g., US state squares).
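
For the Dorling cartogram, "sized by data" is usually taken to mean that the circle's area, not its radius, is proportional to the value. A minimal sketch of that sizing rule, with a hypothetical function name and an arbitrary maximum radius:

```python
import math


def dorling_radius(value: float, max_value: float,
                   max_radius: float = 40.0) -> float:
    """Radius that makes circle AREA proportional to the data value.

    Area grows with radius squared, so the radius must scale with the
    square root of the value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# A region with one quarter of the largest value gets half the radius,
# and therefore one quarter of the area.
print(dorling_radius(25, 100))
print(dorling_radius(100, 100))
```

Scaling the radius directly by the value would exaggerate differences: a region with twice the population would cover four times the area.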

Cartogram Heatmap: A Practical Alternative

The cartogram heatmap (tilegram) gives every region equal visual weight — great when you care equally about all units.

Tradeoff: geographic accuracy is lost, but no region dominates visually.

When to Use Which Map Type

Situation	Best Choice
Data is a rate/density	Choropleth (equal-area projection)
Showing counts, all regions matter equally	Cartogram heatmap
Emphasizing population-weighted patterns	Contiguous/Dorling cartogram
Showing point locations with data	Bubble map
Comparing a variable over time per region	Small multiples map

📊 Visualizing Uncertainty

Why Uncertainty Visualization Matters

Nearly every dataset has uncertainty — measurement error, sampling variability, model uncertainty
Choosing not to show uncertainty is itself a design decision — and often a misleading one
Different audiences respond differently to uncertainty representations
People tend to interpret ranges as hard limits (deterministic construal error)

“One of the most challenging aspects of data visualization is the visualization of uncertainty.” (Wilke, Ch 16)

Error Bars: The Classic (and Often Misread) Tool

Error bars extend from a central estimate to show a range. But what does the bar represent?

± 1 standard deviation (SD)?
± 1 standard error (SE)?
95% confidence interval?
Min/max range?

The error bar problem

The bar alone cannot tell readers which of these it shows, and studies suggest that many readers misread error bars even when they are labeled. Always state explicitly what your error bars represent.
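
To see how different these choices are, the sketch below computes all four ranges for the same data. The exam scores are hypothetical. The 95% interval uses a normal approximation (z ≈ 1.96), which is an assumption of this sketch; for a sample this small a t-based interval would be somewhat wider.

```python
import math
import statistics


def error_bar_candidates(sample):
    """Four ranges that an unlabeled error bar might represent."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)             # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)                    # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 (normal approx.)
    return {
        "+/- 1 SD": (mean - sd, mean + sd),
        "+/- 1 SE": (mean - se, mean + se),
        "95% CI": (mean - z * se, mean + z * se),
        "min/max": (min(sample), max(sample)),
    }


scores = [62, 70, 71, 75, 78, 80, 83, 85, 90, 96]  # hypothetical data
for label, (low, high) in error_bar_candidates(scores).items():
    print(f"{label:>9}: {low:5.1f} to {high:5.1f}")
```

The same ten numbers produce four bars of very different lengths, so an unlabeled bar leaves the reader guessing.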

Types of Error Bar Representations

Visual	What it shows	Good for
Simple error bars	A single range	Point estimates with CI
Graded error bars	Multiple confidence levels	Showing uncertainty spectrum
Box plots	Quartiles + outliers	Distribution shape
Violin plots	Full distribution	Comparing distributions
Half-eye / eye plots	CI + distribution	Combining precision + shape
Quantile dot plots	Discretized distribution	Lay audiences, frequency framing

Graded Error Bars

Instead of one confidence level, graded error bars show multiple levels simultaneously (e.g., 50%, 80%, 95% CI).

The thicker inner bar marks the lowest confidence level (the narrowest interval, e.g., 50%); the thinner outer bar marks the highest (the widest interval, e.g., 95%). Readers get a sense of the full range of plausible values.
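
A minimal sketch of the intervals behind a graded error bar. It assumes a normal approximation around an estimate with a known standard error; the estimate and standard error used here are hypothetical. Each higher confidence level gives a wider interval, so the intervals nest inside one another.

```python
import statistics


def graded_intervals(estimate: float, se: float, levels=(0.50, 0.80, 0.95)):
    """Nested intervals for several confidence levels (normal approximation)."""
    std_normal = statistics.NormalDist()
    intervals = {}
    for level in sorted(levels):
        z = std_normal.inv_cdf(0.5 + level / 2)  # two-sided critical value
        intervals[level] = (estimate - z * se, estimate + z * se)
    return intervals


# Hypothetical estimate of 79 with a standard error of 3.2.
for level, (low, high) in graded_intervals(79.0, 3.2).items():
    print(f"{level:.0%}: {low:.1f} to {high:.1f}")
```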

Confidence Bands for Curves

When fitting a model to data, the fitted line itself has uncertainty. We visualize this with a confidence band.

The band shows the range of lines compatible with the data at a given confidence level
Confidence bands are curved even for straight-line fits — because the line can both shift up/down and rotate
Graded confidence bands can show multiple levels simultaneously
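
The curvature can be sketched numerically. For an ordinary least-squares line, the standard error of the fitted value at x is s · sqrt(1/n + (x − x̄)² / Sxx). The 1/n term reflects the line shifting up or down, and the (x − x̄)² term reflects it rotating around the center of the data. The study-hours data below are hypothetical, and the function name is illustrative.

```python
import math


def fit_line_with_band_se(xs, ys):
    """Least-squares line plus a function giving the SE of the fitted value."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))  # residual standard deviation

    def se_of_fit(x: float) -> float:
        # Shift term (1 / n) plus rotation term that grows away from x_bar.
        return s * math.sqrt(1 / n + (x - x_bar) ** 2 / sxx)

    return slope, intercept, se_of_fit


hours = [1, 2, 3, 4, 5, 6, 7, 8]            # hypothetical data
scores = [52, 58, 61, 70, 68, 77, 83, 85]   # hypothetical data
slope, intercept, se_of_fit = fit_line_with_band_se(hours, scores)
for x in (1, 4.5, 8):
    print(f"x={x}: fitted={intercept + slope * x:.1f}, SE={se_of_fit(x):.2f}")
```

Multiplying this standard error by a critical value (for example, a t quantile with n − 2 degrees of freedom) gives the half-width of the band at each x. The band is narrowest at the mean of x and widens toward both ends.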

🧠 Active Learning: Read the Uncertainty (6 min)

Look at this plot description:

A scatter plot of exam scores vs. study hours. A fitted regression line is shown with a gray shaded band. The band is narrow in the middle and widens at the extremes.

Discuss:

Why does the band widen at the extremes?
What would a graded confidence band add?
If you showed this to a non-statistician, what would they likely misinterpret?

Frequency Framing: Making Probability Intuitive

People are bad at reasoning about probabilities. Frequency framing reframes probability as counts out of a concrete group.

Hard to grasp: “There is a 17% chance of rain.”

Easier: “On 17 out of 100 days like today, it rains.”
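
A tiny sketch of the reframing itself: turning a probability into a count out of a concrete group. The function name, default group size, and wording are illustrative assumptions.

```python
def frequency_frame(probability: float, out_of: int = 100,
                    unit: str = "days like today") -> str:
    """Express a probability as a count out of a concrete group."""
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    count = round(probability * out_of)
    return f"{count} out of {out_of} {unit}"


print(frequency_frame(0.17))
print(frequency_frame(0.005, out_of=1000, unit="patients"))
```

For small probabilities, choose a group large enough that the count is at least 1; otherwise rounding turns a rare event into "0 out of 100".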

Quantile dot plot: shows a distribution as a small number of discrete dots, where each dot represents one equally likely possible outcome (an equal share of the probability).
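
A minimal sketch of how the dot positions for a quantile dot plot can be computed. It assumes the uncertainty is described by a normal distribution, here a hypothetical bus arrival time with mean 12 minutes and standard deviation 3 minutes. The function name is illustrative and the plotting step is omitted.

```python
import statistics


def quantile_dots(dist: statistics.NormalDist, n_dots: int = 20):
    """Values at evenly spaced quantiles: each dot carries 1 / n_dots probability."""
    return [dist.inv_cdf((i + 0.5) / n_dots) for i in range(n_dots)]


arrival = statistics.NormalDist(mu=12, sigma=3)  # hypothetical distribution
dots = quantile_dots(arrival, n_dots=20)

# Frequency-style reading: count the dots at or before 9 minutes.
early = sum(d <= 9 for d in dots)
print(f"The bus arrives within 9 minutes in {early} out of {len(dots)} cases")
```

Because every dot stands for the same share of probability, readers can answer probability questions by counting dots.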

Hypothetical Outcome Plots (HOPs)

HOPs animate through multiple possible outcomes, one at a time. Each frame = one draw from the distribution.

More intuitive than static confidence intervals for lay audiences
Forces viewer to confront the variability of outcomes
Important: outcomes must be representative of the true distribution
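
The data behind a HOP can be sketched as a list of random draws, one per animation frame. This sketch assumes a normal distribution with hypothetical parameters and leaves out the animation itself. Drawing at random, instead of hand-picking frames, is what keeps the outcomes representative.

```python
import random


def hop_frames(mean: float, sd: float, n_frames: int = 10, seed: int = 0):
    """One random draw per animation frame from a normal distribution."""
    rng = random.Random(seed)  # fixed seed so the animation is reproducible
    return [rng.gauss(mean, sd) for _ in range(n_frames)]


# Hypothetical forecast: mean 12, standard deviation 3.
for frame, outcome in enumerate(hop_frames(12, 3, n_frames=5, seed=1), start=1):
    print(f"frame {frame}: {outcome:.1f}")
```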

Choosing Your Uncertainty Visualization

Audience	Goal	Best approach
Scientists	Precise inference	Graded error bars, CI bands
General public	Intuition about variability	Quantile dot plots, HOPs
Decision-makers	Range of plausible outcomes	Frequency framing
Data-savvy readers	Full distribution shape	Violin plots, half-eyes

🧠 Wrap-Up Discussion (5 min)

Choose one graph from this collection: NYT collection

Was uncertainty shown? If not, should it have been?
Which visualization type would work best for that uncertainty, and for what audience?
What’s the risk of not showing uncertainty?

Summary: Week 8 Tuesday

Map projections always involve tradeoffs — choose based on what you need to preserve (area for comparisons!)
Choropleths work best with rates/densities; large areas visually dominate raw count maps
Cartograms correct for area bias but sacrifice geographic accuracy
Error bars must always be labeled — readers can’t guess what they represent
Confidence bands curve even for straight-line fits — because the line can shift and rotate
Frequency framing and quantile dot plots make probability more intuitive for general audiences

For Thursday

Read: Wilke Ch 17 (The Principle of Proportional Ink) and Wilke Ch 29 (Telling a Story and Making a Point)

We’ll shift from what to show to how to show it well — design principles, avoiding common pitfalls, and building visualizations that tell a clear story.