28 Apr 2026
Last time: Sofia learned to calculate probabilities for student participation in climate programs
The new challenge: Sofia noticed something interesting…
Today’s goal: Use contingency tables and conditional probability to uncover relationships between variables and make data-driven decisions!
By the end of this lecture, you will be able to:
Before we dive in, let’s refresh the key rules from last class:
Addition Rule (for “OR” questions): \[P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\]
If mutually exclusive: P(A or B) = P(A) + P(B)
Multiplication Rule (for “AND” questions): \[P(A \text{ and } B) = P(A) \times P(B|A)\]
If independent: P(A and B) = P(A) × P(B)
Remember Sofia’s survey? (n = 600 students)
Question from last time: Are these events independent?
Test: If independent, P(First-gen AND On-campus) should equal P(First-gen) × P(On-campus)
Check: 0.30 × 0.40 = 0.12
Actual: 0.15
0.15 ≠ 0.12 → NOT independent! ❌
Today: We’ll use contingency tables to make this analysis clearer and more powerful! 💪
Sofia’s raw data looks like this:
| Student | Lives On Campus | Uses Bike-sharing |
|---|---|---|
| 1 | Yes | Yes |
| 2 | Yes | No |
| 3 | No | Yes |
| 4 | No | No |
| … | … | … |
| 600 | Yes | No |
Challenges:
Solution: Organize data into a contingency table! 🎯
Structure:
Why use them? They make it easy to see relationships between variables at a glance! 👁️
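A minimal Python sketch of how raw rows become a contingency table. The `records` list is a hypothetical stand-in for Sofia's spreadsheet, rebuilt here from her survey's cell counts because the raw file itself is not shown; all variable names are illustrative.

```python
from collections import Counter

# Hypothetical stand-in for the 600 raw rows: (lives on campus, uses bike-sharing).
# Rebuilt from the survey's cell counts, since the raw data is not available here.
records = (
    [("Yes", "Yes")] * 84
    + [("Yes", "No")] * 156
    + [("No", "Yes")] * 36
    + [("No", "No")] * 324
)

# Joint frequencies: one count per (row category, column category) pair.
cells = Counter(records)

# Marginal frequencies: tally each variable on its own.
row_totals = Counter(on_campus for on_campus, _ in records)
col_totals = Counter(bikes for _, bikes in records)
grand_total = len(records)

print(cells[("Yes", "Yes")], row_totals["Yes"], col_totals["Yes"], grand_total)
# 84 240 120 600
```

Each row of raw data lands in exactly one interior cell, which is why the cells always add up to the grand total.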
Question: Is there a relationship between living on campus and using bike-sharing?
Data: Survey of 600 UCSC students
| | Uses Bike-sharing | No Bike-sharing | Row Total |
|---|---|---|---|
| Lives On Campus | 84 | 156 | 240 |
| Lives Off Campus | 36 | 324 | 360 |
| Column Total | 120 | 480 | 600 |
What do we notice?
Step-by-step process:
| | Uses Bike-sharing | No Bike-sharing | Row Total |
|---|---|---|---|
| Lives On Campus | 84 ← Joint frequency | 156 | 240 ← Marginal frequency |
| Lives Off Campus | 36 | 324 | 360 |
| Column Total | 120 ↑ Marginal frequency | 480 | 600 ← Grand total |
Key terms:
From a contingency table, we can calculate three types of probabilities:
Let’s explore each type! 🚀
Simple Probability: Probability of a single characteristic, regardless of other variables
Formula: Marginal frequency / Grand total
From Sofia’s table:
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| On Campus | 84 | 156 | 240 |
| Off Campus | 36 | 324 | 360 |
| Column Total | 120 | 480 | 600 |
Calculate:
P(On Campus) = 240/600 = 0.40 or 40%
P(Uses Bike) = 120/600 = 0.20 or 20%
Interpretation: 40% of surveyed students live on campus; 20% use bike-sharing.
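The same calculation as a short Python sketch. The dictionary layout and the helper name `simple_probability` are illustrative assumptions, not part of the lecture materials.

```python
# Marginal frequencies from Sofia's table.
row_totals = {"On Campus": 240, "Off Campus": 360}
col_totals = {"Uses Bike": 120, "No Bike": 480}
grand_total = 600

def simple_probability(marginal_count, total):
    """Simple (marginal) probability: marginal frequency / grand total."""
    return marginal_count / total

p_on_campus = simple_probability(row_totals["On Campus"], grand_total)
p_uses_bike = simple_probability(col_totals["Uses Bike"], grand_total)

print(p_on_campus, p_uses_bike)  # 0.4 0.2
```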
Using the same table, calculate:
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| On Campus | 84 | 156 | 240 |
| Off Campus | 36 | 324 | 360 |
| Column Total | 120 | 480 | 600 |
Your turn:
Answers:
Joint Probability: Probability that two events occur together
Formula: Joint frequency / Grand total
From Sofia’s table:
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| On Campus | 84 | 156 | 240 |
| Off Campus | 36 | 324 | 360 |
| Column Total | 120 | 480 | 600 |
Calculate:
P(On Campus AND Uses Bike) = 84/600 = 0.14 or 14%
P(Off Campus AND No Bike) = 324/600 = 0.54 or 54%
Every interior cell represents a joint probability:
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| On Campus | 84/600 = 0.14 | 156/600 = 0.26 | 240 |
| Off Campus | 36/600 = 0.06 | 324/600 = 0.54 | 360 |
| Column Total | 120 | 480 | 600 |
Check: All joint probabilities should sum to 1.00
0.14 + 0.26 + 0.06 + 0.54 = 1.00 ✅
Interpretation: 14% of all surveyed students live on campus AND use bike-sharing
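A small Python sketch of the joint-probability table and the sum-to-one check. The `counts` dictionary is just one possible way to store the four interior cells.

```python
# Interior cells of Sofia's table: (residence, bike use) -> count.
counts = {
    ("On Campus", "Uses Bike"): 84,
    ("On Campus", "No Bike"): 156,
    ("Off Campus", "Uses Bike"): 36,
    ("Off Campus", "No Bike"): 324,
}

grand_total = sum(counts.values())

# Joint probability of each cell: joint frequency / grand total.
joint = {cell: n / grand_total for cell, n in counts.items()}

# Sanity check: the joint probabilities should add up to 1 (up to rounding).
total_probability = sum(joint.values())

for cell, p in joint.items():
    print(cell, round(p, 2))
print("Sum:", round(total_probability, 2))
```

If the sum is not 1, a cell was most likely miscounted or divided by the wrong total.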
Remember: For independent events, P(A and B) = P(A) × P(B)
Test for independence:
P(On Campus) × P(Uses Bike) = 0.40 × 0.20 = 0.08
Actual P(On Campus AND Uses Bike) = 0.14
0.14 ≠ 0.08 → NOT independent! 🚨
What does this mean?
Living on campus and bike usage are related! Students on campus are more likely to bike than we’d expect if these were independent. This is a valuable insight for Sofia! 🎯
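The independence test above, sketched in Python. Exact fractions are used here (an implementation choice, not something the lecture requires) so the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction

grand_total = 600
p_on_campus = Fraction(240, grand_total)
p_uses_bike = Fraction(120, grand_total)
p_both = Fraction(84, grand_total)  # actual joint probability from the table

# What the joint probability would be IF the two events were independent.
expected_if_independent = p_on_campus * p_uses_bike

is_independent = p_both == expected_if_independent
print(float(expected_if_independent), float(p_both), is_independent)
# 0.08 0.14 False
```

With real survey data an exact match is rare, so the size of the gap (0.14 vs 0.08 here) is what carries the message.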
Practice with Contingency Tables:
Sofia surveys 500 students about workshop attendance and garden volunteering:
| | Volunteers in Garden | No Garden | Row Total |
|---|---|---|---|
| Attends Workshops | 40 | 60 | 100 |
| No Workshops | 35 | 365 | 400 |
| Column Total | 75 | 425 | 500 |
Calculate:
The most powerful type of probability for decision-making!
Conditional Probability: The probability of event A occurring GIVEN that event B has already occurred
Notation: P(A|B) - read as “probability of A given B”
Key insight: We’re now working with a reduced sample space - only considering cases where B is true! 🔍
Sofia’s question: “Among students who live ON CAMPUS, what percentage use bike-sharing?”
This is NOT the same as P(Uses Bike-sharing)!
Why? We’re only looking at the subset of students who live on campus.
Answer: P(Bikes | On Campus) = 84/240 = 0.35 or 35%
Compare to overall: P(Bikes) = 120/600 = 0.20 or 20%
On-campus students use bike-sharing at a higher rate than students overall: 35% vs 20%! 📈
Method 1 - Using the contingency table directly:
\[P(A|B) = \frac{\text{Count in both A and B}}{\text{Count in B}}\]
Method 2 - Using probabilities:
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
Both methods give the same answer!
P(Bikes | On Campus) = 84/240 = 0.35
OR
P(Bikes | On Campus) = (84/600) / (240/600) = 0.14/0.40 = 0.35 ✅
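Both methods side by side in a short Python sketch, again using exact fractions as an illustrative choice. The variable names are assumptions made for this example.

```python
from fractions import Fraction

count_both = 84         # on campus AND uses bike-sharing
count_on_campus = 240   # row total for the condition "On Campus"
grand_total = 600

# Method 1: counts only, working inside the reduced sample space.
method_1 = Fraction(count_both, count_on_campus)

# Method 2: joint probability divided by the probability of the condition.
method_2 = Fraction(count_both, grand_total) / Fraction(count_on_campus, grand_total)

print(method_1, method_2, float(method_1))  # 7/20 7/20 0.35
```

The grand total cancels in Method 2, which is why the two methods always agree.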
Sofia’s table:
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| On Campus | 84 | 156 | 240 |
| Off Campus | 36 | 324 | 360 |
| Column Total | 120 | 480 | 600 |
Calculate: P(Bikes | On Campus)
Solution: Focus on the “On Campus” row only
Interpretation: Among on-campus students, 35% use bike-sharing 🚲
Same table:
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| On Campus | 84 | 156 | 240 |
| Off Campus | 36 | 324 | 360 |
| Column Total | 120 | 480 | 600 |
Calculate: P(Bikes | Off Campus)
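Solution: Focus on the “Off Campus” row only
P(Bikes | Off Campus) = 36/360 = 0.10 or 10%
Compare the two conditional probabilities:
P(Bikes | On Campus) = 0.35 vs P(Bikes | Off Campus) = 0.10
On-campus students are 3.5× as likely to use bike-sharing as off-campus students! 🎯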
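A Python sketch of the row-by-row comparison. The nested-dictionary layout and the helper `p_bike_given` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

table = {
    "On Campus": {"Uses Bike": 84, "No Bike": 156},
    "Off Campus": {"Uses Bike": 36, "No Bike": 324},
}

def p_bike_given(residence):
    """P(Uses Bike | residence): look only at that residence's row."""
    row = table[residence]
    return Fraction(row["Uses Bike"], sum(row.values()))

p_on = p_bike_given("On Campus")
p_off = p_bike_given("Off Campus")

# How many times as likely on-campus students are to use bike-sharing.
ratio = p_on / p_off
print(float(p_on), float(p_off), float(ratio))  # 0.35 0.1 3.5
```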
The reduced sample space concept:
When calculating P(A|B), we ONLY look at cases where B is true!
IMPORTANT: P(A|B) ≠ P(B|A) in general!
Example from Sofia’s data:
P(Bikes | On Campus) = 84/240 = 0.35
P(On Campus | Bikes) = 84/120 = 0.70
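These answer different questions:
P(Bikes | On Campus): among on-campus students, what share use bike-sharing?
P(On Campus | Bikes): among bike-sharing users, what share live on campus?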
Both are useful, but very different! 🔄
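A quick Python sketch showing that the only thing that changes between the two directions is the denominator. Variable names are illustrative.

```python
from fractions import Fraction

both = 84              # on campus AND uses bike-sharing (same cell both times)
on_campus_total = 240  # row total: condition is "On Campus"
bike_total = 120       # column total: condition is "Uses Bike"

p_bike_given_on_campus = Fraction(both, on_campus_total)
p_on_campus_given_bike = Fraction(both, bike_total)

print(float(p_bike_given_on_campus), float(p_on_campus_given_bike))  # 0.35 0.7
```

Same numerator, different reduced sample space: the condition after the bar decides which total goes on the bottom.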
Conditional probabilities guide strategy:
Finding: P(Bikes | On Campus) = 0.35 vs P(Bikes | Off Campus) = 0.10
Action: Install more bike racks and repair stations near residence halls! 🚲
Finding: P(Workshop | First-gen) = 0.32 vs P(Workshop | Not first-gen) = 0.18
Action: Market workshops more to first-gen students who are already interested! 📢
Finding: P(Garden | Attends Workshop) = 0.40 vs P(Garden) = 0.15 overall
Action: Recruit garden volunteers at workshops - they’re already engaged! 🌿
Formal definition using conditional probability:
Events A and B are independent if and only if:
\[P(A|B) = P(A)\]
In words: Knowing that B occurred doesn’t change the probability of A!
Equivalently: Events are independent if:
\[P(A \text{ and } B) = P(A) \times P(B)\]
Sofia’s data:
Test for independence: Does P(Bikes | On Campus) = P(Bikes)?
0.35 ≠ 0.20
Not independent! Living on campus changes the probability of biking! 📊
Three equivalent ways to test independence:
All three will give the same answer! ✅
Creating a contingency table in Sheets:
Method 1 - Pivot Table (best for raw data):
Method 2 - COUNTIFS (manual):
=COUNTIFS($A$2:$A$601, "On Campus", $B$2:$B$601, "Yes")
This counts students who are BOTH on campus AND use bikes
Calculating probabilities:
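=B2/$D$5 (joint or simple probability: a count divided by the grand total)
=B2/D2 (conditional probability, row-based: a cell divided by its row total)
Type only the formula into the cell; adjust the cell references to match where your table sits.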
Sofia expands her analysis - now with three categories for each variable:
| | Bike | Bus | Walk | Row Total |
|---|---|---|---|---|
| On Campus | 84 | 48 | 108 | 240 |
| Off Campus < 2 miles | 36 | 45 | 69 | 150 |
| Off Campus > 2 miles | 12 | 138 | 60 | 210 |
| Column Total | 132 | 231 | 237 | 600 |
Now we can ask more nuanced questions:
Insight: Distance matters! Students far from campus heavily rely on buses! 🚌
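A Python sketch that turns each row of the 3×3 table into a conditional distribution, P(mode | residence). The data structure is an assumption made for this example.

```python
modes = ["Bike", "Bus", "Walk"]
table = {
    "On Campus": [84, 48, 108],
    "Off Campus < 2 miles": [36, 45, 69],
    "Off Campus > 2 miles": [12, 138, 60],
}

# P(mode | residence): divide each cell by its own row total.
conditional = {
    residence: {mode: n / sum(counts) for mode, n in zip(modes, counts)}
    for residence, counts in table.items()
}

for residence, dist in conditional.items():
    print(residence, {mode: round(p, 3) for mode, p in dist.items()})
```

Reading across a row shows how one residence group splits among the three modes; for example, P(Bus | Off Campus > 2 miles) = 138/210, roughly 0.66.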
Comprehensive Contingency Table Analysis:
Sofia surveys 800 students about zero-waste dining participation:
| | Participates | Doesn’t Participate | Row Total |
|---|---|---|---|
| Has meal plan | 280 | 120 | 400 |
| No meal plan | 80 | 320 | 400 |
| Column Total | 360 | 440 | 800 |
Calculate and interpret:
Post your complete analysis on Ed Discussion with reasoning!
A powerful tool when you know conditional probabilities:
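\[P(A) = P(A|B_1) \times P(B_1) + P(A|B_2) \times P(B_2) + \cdots\]
Here \(B_1, B_2, \ldots\) must be non-overlapping groups that together cover the whole sample (for example, On Campus and Off Campus).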
Sofia’s example: Overall bike usage probability
P(Bikes) = P(Bikes|On Campus) × P(On Campus) + P(Bikes|Off Campus) × P(Off Campus)
= 0.35 × 0.40 + 0.10 × 0.60
= 0.14 + 0.06
= 0.20 ✅
This matches our direct calculation: 120/600 = 0.20
When is this useful? When you know conditional probabilities but need the overall probability!
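The weighted-sum calculation as a Python sketch. The `groups` dictionary and its (P(group), P(Bikes | group)) layout are illustrative assumptions.

```python
from fractions import Fraction

# For each residence group: (P(group), P(Bikes | group)).
groups = {
    "On Campus": (Fraction(240, 600), Fraction(84, 240)),
    "Off Campus": (Fraction(360, 600), Fraction(36, 360)),
}

# The groups must cover everyone exactly once, so their probabilities sum to 1.
group_weight_total = sum(p_group for p_group, _ in groups.values())

# Law of total probability: weight each conditional by its group's size.
p_bikes = sum(p_group * p_bike_given for p_group, p_bike_given in groups.values())

print(group_weight_total, p_bikes, float(p_bikes))  # 1 1/5 0.2
```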
Sofia’s strategic questions answered by conditional probability:
What if a cell in your contingency table is zero?
| | Uses Bike | No Bike | Row Total |
|---|---|---|---|
| Lives >10 miles away | 0 | 150 | 150 |
| Lives <10 miles away | 120 | 330 | 450 |
| Column Total | 120 | 480 | 600 |
Interpretation:
For Sofia: Don’t target far-away students for bike program; focus on bus/carpool instead! 🚌
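A Python sketch of how a zero cell behaves in the formulas. The helper `conditional_probability` is a hypothetical name; it also guards the separate case where the condition's total is zero, which would make the probability undefined.

```python
def conditional_probability(cell_count, condition_total):
    """Return P(A|B) from counts, or None when nobody satisfies condition B."""
    if condition_total == 0:
        return None  # undefined: the reduced sample space is empty
    return cell_count / condition_total

# Zero cell from the table: far-away students who use bike-sharing.
p_bike_given_far = conditional_probability(0, 150)   # P(Bikes | >10 miles)
p_far_given_bike = conditional_probability(0, 120)   # P(>10 miles | Bikes)

print(p_bike_given_far, p_far_given_bike)  # 0.0 0.0
```

A zero here means no one in this sample fell in that cell; it does not prove the combination is impossible in the wider population.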
Comprehensive Analysis Challenge:
Sofia conducts a final survey on sustainable transportation (n = 900):
| | Bike | Bus | Car | Row Total |
|---|---|---|---|---|
| Lives On Campus | 120 | 90 | 60 | 270 |
| Lives Off <3mi | 135 | 90 | 135 | 360 |
| Lives Off >3mi | 45 | 135 | 90 | 270 |
| Column Total | 300 | 315 | 285 | 900 |
Your comprehensive analysis:

Why this matters beyond the classroom:
Public Health: Vaccine effectiveness by age group
Marketing: Purchase rates by customer demographics
Education: Success rates by teaching method
Climate Action: Participation rates by student characteristics (Sofia’s work!)
Social Justice: Understanding disparities in outcomes across groups
Data-driven decisions are only as good as your understanding of probability! 💡
Q: “How do I know which probability type to calculate?”
A: Read the question carefully!
- “Overall” or “in general” → Simple
- “Both” or “and” → Joint
- “Given” or “if we know” → Conditional
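Q: “Why doesn’t \(P(A|B) + P(A|B^c)\) equal 1?”
A: Because they have different denominators! They’re probabilities within different subgroups (B and its complement), not complementary events. What does sum to 1 is \(P(A|B) + P(A^c|B)\), because both are measured within the same subgroup.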
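A quick numeric check with Sofia's bike-sharing table (On Campus: 84 bike, 156 no bike; Off Campus: 36 bike, 324 no bike), sketched in Python. Variable names are illustrative.

```python
from fractions import Fraction

p_bike_given_on = Fraction(84, 240)      # P(Bikes | On Campus)
p_bike_given_off = Fraction(36, 360)     # P(Bikes | Off Campus)
p_no_bike_given_on = Fraction(156, 240)  # P(No Bike | On Campus)

# Same event, different subgroups: no reason for these to sum to 1.
across_groups = p_bike_given_on + p_bike_given_off

# Complementary events within the SAME subgroup: these must sum to 1.
within_group = p_bike_given_on + p_no_bike_given_on

print(float(across_groups), float(within_group))  # 0.45 1.0
```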
Q: “Can I use contingency tables for continuous variables?”
A: Not directly! You need to “bin” continuous data into categories first (e.g., age → age groups).
What Sofia learned about effective activism:
Lesson 1: Not all students are equally likely to participate - understand your audience!
Lesson 2: Some behaviors cluster together - leverage existing engagement!
Lesson 3: Distance and infrastructure matter more than enthusiasm!
Rate your confidence (1-5) on Ed Discussion:
If you rated anything 3 or below, please come to office hours! These concepts build on each other. 🤗
Make the most of collaborative learning:
Form study groups on Ed Discussion! Research shows collaborative learning improves outcomes. 📚
Simple (Marginal) Probability: \(P(A) = \frac{\text{Row or Column Total}}{\text{Grand Total}}\)
Joint Probability: \(P(A \text{ and } B) = \frac{\text{Cell Frequency}}{\text{Grand Total}}\)
Conditional Probability: \(P(A|B) = \frac{\text{Cell Frequency}}{\text{Row or Column Total for B}}\)
OR equivalently:
\(P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\)
Three equivalent methods:
Method 1 - Using conditional probability:
Check if P(A|B) = P(A)
If equal → Independent ✅
If not equal → Dependent ❌
Method 2 - Using multiplication rule:
Check if P(A and B) = P(A) × P(B)
If equal → Independent ✅
If not equal → Dependent ❌
Method 3 - Compare the two conditional probabilities:
Check if \(P(A|B) = P(A|B^c)\)
If equal → Independent ✅
If not equal → Dependent ❌
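A Python sketch that runs all three checks on a 2×2 table to show they agree. The function name and argument order are assumptions for this example, and it assumes every row and column total is nonzero. The second table is hypothetical data constructed to be exactly independent.

```python
from fractions import Fraction

def independence_checks(a_and_b, a_and_not_b, not_a_and_b, not_a_and_not_b):
    """Apply the three independence checks to the counts of a 2x2 table."""
    n = a_and_b + a_and_not_b + not_a_and_b + not_a_and_not_b
    p_a = Fraction(a_and_b + a_and_not_b, n)
    p_b = Fraction(a_and_b + not_a_and_b, n)
    p_a_given_b = Fraction(a_and_b, a_and_b + not_a_and_b)
    p_a_given_not_b = Fraction(a_and_not_b, a_and_not_b + not_a_and_not_b)
    return {
        "method_1": p_a_given_b == p_a,                   # P(A|B) = P(A)
        "method_2": Fraction(a_and_b, n) == p_a * p_b,    # P(A and B) = P(A) x P(B)
        "method_3": p_a_given_b == p_a_given_not_b,       # P(A|B) = P(A|B^c)
    }

# Sofia's data with A = Uses Bike, B = On Campus.
sofia = independence_checks(84, 36, 156, 324)

# Hypothetical table built so that A and B are exactly independent.
made_up = independence_checks(20, 30, 40, 60)

print(sofia)    # {'method_1': False, 'method_2': False, 'method_3': False}
print(made_up)  # {'method_1': True, 'method_2': True, 'method_3': True}
```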
✅ Comparing two categorical variables
✅ Looking for relationships between factors
✅ Calculating conditional probabilities
✅ Testing for independence
✅ Making data-driven decisions about groups
✅ Organizing survey or experimental data
Questions? I have office hours right after class today!
Next up: Probability Distributions
Remember: Contingency tables are your friends - they organize complex information into clear, actionable insights! 🎯
STAT 17 – Fall 2025