Scatter Plots and Correlation
By the end of this lecture, you will be able to:
When we have two quantitative variables, we want to know:
Do they relate to each other?
Examples:
Required Reading
Chapter 12: Visualizing Associations
Sections to focus on:
Available at: clauswilke.com/dataviz
The Language of Scatter Plots
We plot the variable on the y-axis AGAINST the variable on the x-axis
Example: Plot head length (y) against body mass (x)
Correlation coefficient (r): A number between -1 and +1 that measures the strength and direction of the linear relationship between two variables
r = +1
Perfect positive relationship
As x ↑, y ↑
r = 0
No linear relationship
No pattern
r = -1
Perfect negative relationship
As x ↑, y ↓
Think of correlation in ranges:
Note: |r| means the absolute value (ignore the sign)
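To make the two extreme cases concrete, here is a minimal sketch that computes r from its definition (the sum of products of deviations from the mean, divided by the square root of the product of the summed squared deviations). The helper name `pearson_r` and the five data points are made up for illustration; they are not from the lecture or from Wilke.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up illustrative data (an assumption, not lecture data)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # y rises in lockstep with x
y_down = [10, 8, 6, 4, 2]  # y falls in lockstep with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print("perfect positive:", r_up)
print("perfect negative:", r_down)
```

Real data almost never lands exactly on +1 or -1; these two cases only mark the ends of the scale.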
Critical Thinking Required!
Just because two variables are correlated does NOT mean one causes the other!
Examples of spurious correlations:
Work with a partner:
Plot 1: [scatter plot; x-axis 5 to 30, y-axis 5 to 20]
Your estimate: r = ____
Plot 2: [scatter plot; x-axis 5 to 30, y-axis 5 to 20]
Your estimate: r = ____
Plot 3: [scatter plot; x-axis 5 to 30, y-axis 5 to 20]
Your estimate: r = ____
Plot 4: [scatter plot; x-axis 5 to 30, y-axis 5 to 20]
Your estimate: r = ____
Plot 5: [scatter plot; x-axis 5 to 30, y-axis 5 to 20]
Your estimate: r = ____
Plot 6: [scatter plot; x-axis 5 to 30, y-axis 5 to 20]
Your estimate: r = ____
Every scatter plot needs:
From Wilke Chapter 12 - measurements on 123 blue jays:
Research question: Do heavier birds have longer heads?
Head length vs body mass:
This is typical of biological data - relationships exist but aren’t perfect!
Here are exam scores and study hours for 10 students:
| Student | Study Hours | Exam Score |
|---|---|---|
| A | 2 | 65 |
| B | 5 | 78 |
| C | 3 | 68 |
| D | 8 | 92 |
| E | 1 | 58 |
| F | 6 | 85 |
| G | 4 | 75 |
| H | 7 | 88 |
| I | 9 | 95 |
| J | 3 | 70 |
On graph paper or blank paper:
Draw axes
Label everything clearly
Plot each student as a point
Answer these questions:
Let’s share what you found:
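Once you have drawn the plot by hand, you can check your impression of the relationship numerically. This is a rough sketch that uses the ten students from the table above; the helper name `pearson_r` is an illustrative choice, and the calculation is the standard Pearson formula rather than anything specific to this course.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Students A through J, copied from the table
hours = [2, 5, 3, 8, 1, 6, 4, 7, 9, 3]
scores = [65, 78, 68, 92, 58, 85, 75, 88, 95, 70]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}")
```

The result is close to +1, which matches a tight upward trend in the plot. It still says nothing about why the two variables move together.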
Correlation (r) only measures linear relationships.
Other patterns exist:
Exercise intensity vs. enjoyment:
This is an inverted U-shape - correlation might be near zero even though there’s a clear pattern!
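A small sketch can show how a clear curved pattern still produces r near zero. The intensity and enjoyment numbers below are invented to form a symmetric inverted U (they are an assumption, not measured data), and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Invented data: enjoyment peaks at moderate intensity (5) and drops
# off symmetrically on either side.
intensity = list(range(11))                      # 0, 1, ..., 10
enjoyment = [25 - (x - 5) ** 2 for x in intensity]

r = pearson_r(intensity, enjoyment)
print(f"r = {r:.3f}")
```

The rising half and the falling half cancel each other in the calculation, so r comes out at zero even though enjoyment is completely determined by intensity in this toy example.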
Blue jay data with sex included:
When Wilke colored points by bird sex:
Work in groups of 3-4. For each scenario, decide:
A. Age of a car vs. its resale value
B. Outside temperature vs. hot chocolate sales
C. Number of slugs visible on campus vs. humidity in Santa Cruz
D. Hours of sleep vs. test performance
E. Distance from equator vs. average temperature
F. Coffee consumption vs. alertness (think about too much coffee!)
G. Years of education vs. lifetime earnings
H. Number of bikes rented in Santa Cruz per day vs. amount of daily rain in Santa Cruz
What did you decide for each scenario?
Watch Out For:
A single extreme point can:
Always look at your scatter plot! Don’t just calculate r.
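The sketch below illustrates how much one extreme point can move r. All values are made up for the demonstration (an assumption, not lecture data): five points with no linear trend, then the same five plus one far-away point. `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up data: five points with no linear trend
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_without = pearson_r(x, y)

# Same data plus a single extreme point at (20, 20)
r_with = pearson_r(x + [20], y + [20])

print(f"without the extreme point: r = {r_without:.2f}")
print(f"with the extreme point:    r = {r_with:.2f}")
```

The calculation alone gives no hint that one point is responsible for the jump; the scatter plot makes it obvious.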
Next class we’ll cover:
Homework: Read Wilke Chapter 12 sections 12.2-12.4
Office hours: Check Canvas for times
Concept Map 2 is coming: Start thinking about how to organize chart types and when to use different visualizations.