STAT 17: Statistical Methods for Business and Economics

28 Apr 2026

Sofia’s Climate Action Update 🌍

Last time: Sofia learned to calculate probabilities for student participation in climate programs

The new challenge: Sofia noticed something interesting…

  • Students living on campus seem MORE likely to use bike-sharing
  • First-generation students participate in workshops at different rates
  • Are these observations real patterns or just coincidence? 🤔

Today’s goal: Use contingency tables and conditional probability to uncover relationships between variables and make data-driven decisions!

What We’ll Accomplish Today

By the end of this lecture, you will be able to:

  • Create and interpret contingency tables from raw data
  • Calculate simple, marginal, and joint probabilities from contingency tables
  • Apply conditional probability to answer “what if” questions
  • Use probability to test for independence between variables
  • Make informed decisions using probability relationships

Quick Probability Review 🔄

Before we dive in, let’s refresh the key rules from last class:

Addition Rule (for “OR” questions): \[P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\]

If mutually exclusive: P(A or B) = P(A) + P(B)

Multiplication Rule (for “AND” questions): \[P(A \text{ and } B) = P(A) \times P(B|A)\]

If independent: P(A and B) = P(A) × P(B)

Review: Testing for Independence

Remember Sofia’s survey? (n = 600 students)

  • 180 students are first-generation (P = 0.30)
  • 240 students live on campus (P = 0.40)
  • 90 students are BOTH first-gen AND on campus (P = 0.15)

Question from last time: Are these events independent?

Test: If independent, P(First-gen AND On-campus) should equal P(First-gen) × P(On-campus)

Check: 0.30 × 0.40 = 0.12

Actual: 0.15

0.15 ≠ 0.12 → NOT independent!
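This check uses nothing but arithmetic on counts, so it fits in a few lines of Python. What follows is a minimal sketch. The function name `is_independent` is our own, not part of any course tool, and it uses `fractions.Fraction` so the comparison is exact instead of depending on floating-point rounding.

```python
from fractions import Fraction

def is_independent(n, count_a, count_b, count_ab):
    """Compare P(A) * P(B) with P(A and B) using exact fractions."""
    p_a = Fraction(count_a, n)
    p_b = Fraction(count_b, n)
    p_ab = Fraction(count_ab, n)
    expected = p_a * p_b  # what P(A and B) would be if A and B were independent
    return expected, p_ab, expected == p_ab

# Sofia's survey: 600 students, 180 first-gen, 240 on campus, 90 both
expected, actual, indep = is_independent(600, 180, 240, 90)
print(float(expected), float(actual), indep)  # 0.12 0.15 False
```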

Today: We’ll use contingency tables to make this analysis clearer and more powerful! 💪

The Problem with Long Lists 📋

Sofia’s raw data looks like this:

Student | Lives On Campus | Uses Bike-sharing
1       | Yes             | Yes
2       | Yes             | No
3       | No              | Yes
4       | No              | No
…       | …               | …
600     | Yes             | No

Challenges:

  • Hard to see patterns 👀
  • Difficult to calculate probabilities quickly
  • Tough to compare groups
  • Cannot easily visualize relationships

Solution: Organize data into a contingency table! 🎯

Introducing: The Contingency Table! 📊

Contingency Table (also called a two-way table or cross-tabulation):
A table that displays the frequency distribution of two categorical variables

Structure:

  • Rows represent one variable (e.g., Lives on Campus: Yes/No)
  • Columns represent another variable (e.g., Uses Bike-sharing: Yes/No)
  • Cells contain counts or frequencies
  • Margins show row and column totals

Why use them? They make it easy to see relationships between variables at a glance! 👁️

Sofia’s First Contingency Table

Question: Is there a relationship between living on campus and using bike-sharing?

Data: Survey of 600 UCSC students

                 | Uses Bike-sharing | No Bike-sharing | Row Total
Lives On Campus  | 84                | 156             | 240
Lives Off Campus | 36                | 324             | 360
Column Total     | 120               | 480             | 600

What do we notice?

  • 84 students live on campus AND use bike-sharing
  • 36 students live off campus AND use bike-sharing
  • 240 students total live on campus
  • 600 students total surveyed

Building a Contingency Table

Step-by-step process:

  1. Identify your two variables (both must be categorical)
    • Variable 1: Lives on Campus (Yes/No)
    • Variable 2: Uses Bike-sharing (Yes/No)
  2. Count combinations - How many students fall into each category?
    • On Campus + Bikes = 84
    • On Campus + No Bikes = 156
    • Off Campus + Bikes = 36
    • Off Campus + No Bikes = 324
  3. Calculate margins (row and column totals)
    • Row totals: 84+156=240, 36+324=360
    • Column totals: 84+36=120, 156+324=480
  4. Verify - All margins should sum to total sample size (600) ✅
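Here is a minimal Python sketch of those four steps. Sofia's raw file is not shown, so the 600 records are rebuilt from the four cell counts above. The helper name `build_table` is hypothetical. Step 2 uses the standard library's `collections.Counter` to count each combination.

```python
from collections import Counter

def build_table(records, row_levels, col_levels):
    """Cross-tabulate (row_value, col_value) records and add the margins."""
    cells = Counter(records)  # step 2: count each combination
    table = {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}
    row_totals = {r: sum(table[r].values()) for r in row_levels}  # step 3
    col_totals = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    grand_total = sum(row_totals.values())
    # Step 4: both sets of margins must add up to the number of records
    assert grand_total == sum(col_totals.values()) == len(records)
    return table, row_totals, col_totals, grand_total

# Stand-in for the raw file: 600 (lives_on_campus, uses_bike) records
# rebuilt from the four cell counts in Sofia's table.
records = ([("Yes", "Yes")] * 84 + [("Yes", "No")] * 156
           + [("No", "Yes")] * 36 + [("No", "No")] * 324)

table, row_totals, col_totals, n = build_table(records, ["Yes", "No"], ["Yes", "No"])
print(table)       # {'Yes': {'Yes': 84, 'No': 156}, 'No': {'Yes': 36, 'No': 324}}
print(row_totals)  # {'Yes': 240, 'No': 360}
print(col_totals)  # {'Yes': 120, 'No': 480}
```

With the real survey data, you would replace `records` with one `(lives_on_campus, uses_bike)` pair per student.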

Anatomy of a Contingency Table 🔍

                 | Uses Bike-sharing | No Bike-sharing | Row Total
Lives On Campus  | 84                | 156             | 240
Lives Off Campus | 36                | 324             | 360
Column Total     | 120               | 480             | 600

(84 ← joint frequency; 240 and 120 ← marginal frequencies; 600 ← grand total)

Key terms:

  • Joint frequency: Count in interior cells (e.g., 84)
  • Marginal frequency: Row and column totals (e.g., 240, 120)
  • Grand total: Total sample size (600)

Three Types of Probability 🎲

From a contingency table, we can calculate three types of probabilities:

1. Simple (Marginal) Probability
Probability of a single event, found in the margins
2. Joint Probability
Probability of two events occurring together, found in interior cells
3. Conditional Probability
Probability of one event GIVEN another has occurred

Let’s explore each type! 🚀

Type 1: Simple (Marginal) Probability

Simple Probability: Probability of a single characteristic, regardless of other variables

Formula: Marginal frequency / Grand total

From Sofia’s table:

             | Uses Bike | No Bike | Row Total
On Campus    | 84        | 156     | 240
Off Campus   | 36        | 324     | 360
Column Total | 120       | 480     | 600

Calculate:

  • P(Lives on Campus) = 240/600 = 0.40 or 40%
  • P(Uses Bike-sharing) = 120/600 = 0.20 or 20%

Interpretation: 40% of the surveyed UCSC students live on campus; 20% use bike-sharing
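Simple probabilities come straight from the margins. In this minimal Python sketch, the table is stored as a dict of rows, and the helper names `grand_total`, `row_marginal`, and `col_marginal` are our own.

```python
from fractions import Fraction

# Sofia's 2x2 table: rows = housing, columns = bike-sharing use
table = {
    "On Campus":  {"Uses Bike": 84, "No Bike": 156},
    "Off Campus": {"Uses Bike": 36, "No Bike": 324},
}

def grand_total(table):
    """Total number of students in the table."""
    return sum(sum(row.values()) for row in table.values())

def row_marginal(table, row):
    """Simple probability of a row category: row total / grand total."""
    return Fraction(sum(table[row].values()), grand_total(table))

def col_marginal(table, col):
    """Simple probability of a column category: column total / grand total."""
    return Fraction(sum(row[col] for row in table.values()), grand_total(table))

print(float(row_marginal(table, "On Campus")))  # 0.4
print(float(col_marginal(table, "Uses Bike")))  # 0.2
```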

Simple Probability Practice

Using the same table, calculate:

             | Uses Bike | No Bike | Row Total
On Campus    | 84        | 156     | 240
Off Campus   | 36        | 324     | 360
Column Total | 120       | 480     | 600

Your turn:

  1. P(Lives off campus) = ?
  2. P(Does NOT use bike-sharing) = ?

Answers:

  1. P(Off campus) = 360/600 = 0.60 or 60%
  2. P(No bike-sharing) = 480/600 = 0.80 or 80%

Type 2: Joint Probability 🤝

Joint Probability: Probability that two events occur together

Formula: Joint frequency / Grand total

From Sofia’s table:

             | Uses Bike | No Bike | Row Total
On Campus    | 84        | 156     | 240
Off Campus   | 36        | 324     | 360
Column Total | 120       | 480     | 600

Calculate:

P(On Campus AND Uses Bike) = 84/600 = 0.14 or 14%

P(Off Campus AND No Bike) = 324/600 = 0.54 or 54%

Joint Probability: All Four Cells

Every interior cell represents a joint probability:

             | Uses Bike      | No Bike        | Row Total
On Campus    | 84/600 = 0.14  | 156/600 = 0.26 | 240
Off Campus   | 36/600 = 0.06  | 324/600 = 0.54 | 360
Column Total | 120            | 480            | 600

Check: All joint probabilities should sum to 1.00

0.14 + 0.26 + 0.06 + 0.54 = 1.00 ✅

Interpretation: 14% of all students live on campus AND use bike-sharing
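This minimal Python sketch divides each interior cell by the grand total and confirms that the four joint probabilities sum to exactly 1. The cell labels used as dictionary keys are our own choice.

```python
from fractions import Fraction

counts = {
    ("On Campus", "Uses Bike"): 84, ("On Campus", "No Bike"): 156,
    ("Off Campus", "Uses Bike"): 36, ("Off Campus", "No Bike"): 324,
}
n = sum(counts.values())  # grand total: 600

# Joint probability of each cell = cell count / grand total
joint = {cell: Fraction(k, n) for cell, k in counts.items()}

for (housing, bike), p in joint.items():
    print(f"P({housing} AND {bike}) = {float(p):.2f}")

# The four cells cover every student exactly once, so they sum to 1
assert sum(joint.values()) == 1
```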

Connecting to the Multiplication Rule 🔗

Remember: For independent events, P(A and B) = P(A) × P(B)

Test for independence:

P(On Campus) × P(Uses Bike) = 0.40 × 0.20 = 0.08

Actual P(On Campus AND Uses Bike) = 0.14

0.14 ≠ 0.08 → NOT independent! 🚨

What does this mean?

Living on campus and bike usage are related! Students on campus are more likely to bike than we’d expect if these were independent. This is a valuable insight for Sofia! 🎯
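You can see the same gap in counts. Under independence, the expected count in a cell is P(row) × P(column) × n, which simplifies to row total × column total / grand total. This is a minimal Python sketch, and the helper name `expected_count` is our own.

```python
def expected_count(row_total, col_total, grand_total):
    """Cell count expected if the row and column variables were independent.

    P(row) * P(col) * n = (row_total / n) * (col_total / n) * n
                        = row_total * col_total / n
    """
    return row_total * col_total / grand_total

observed = 84  # on campus AND uses bike-sharing
expected = expected_count(240, 120, 600)
print(expected)        # 48.0
print(observed / 600)  # 0.14
print(expected / 600)  # 0.08
```

Independence predicts 48 on-campus bikers, but Sofia observed 84. This is the same 0.14 vs 0.08 comparison, expressed in students.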

📊 THINK-PAIR-SHARE #1 (5 minutes)

Practice with Contingency Tables:

Sofia surveys 500 students about workshop attendance and garden volunteering:

                  | Volunteers in Garden | No Garden | Row Total
Attends Workshops | 40                   | 60        | 100
No Workshops      | 35                   | 365       | 400
Column Total      | 75                   | 425       | 500

Calculate:

  1. P(Attends Workshops) - simple probability
  2. P(Volunteers in Garden) - simple probability
  3. P(Attends Workshops AND Volunteers) - joint probability
  4. If independent, what would P(Workshops AND Garden) be?
  5. Are these events independent? What does this mean for Sofia?

Share your answers in Poll Everywhere!


🧘‍♀️ STRETCH BREAK

Time to move! (5 minutes)

  • Stand up and stretch 🤸‍♀️
  • Chat with neighbors about conditional probability 💬
  • Grab some water 💧

Type 3: Conditional Probability 🎯

The most powerful type of probability for decision-making!

Conditional Probability: The probability of event A occurring GIVEN that event B has already occurred

Notation: P(A|B) - read as “probability of A given B”

Key insight: We’re now working with a reduced sample space - only considering cases where B is true! 🔍

Understanding Conditional Probability

Sofia’s question: “Among students who live ON CAMPUS, what percentage use bike-sharing?”

This is NOT the same as P(Uses Bike-sharing)!

Why? We’re only looking at the subset of students who live on campus.

  • Total universe: All 600 students
  • Reduced universe: Only 240 on-campus students
  • Within this group: 84 use bikes

Answer: P(Bikes | On Campus) = 84/240 = 0.35 or 35%

Compare to overall: P(Bikes) = 120/600 = 0.20 or 20%

Among on-campus students, bike usage is 35%, compared with 20% overall! 📈

Conditional Probability Formula 📐

Method 1 - Using the contingency table directly:

\[P(A|B) = \frac{\text{Count in both A and B}}{\text{Count in B}}\]

Method 2 - Using probabilities:

\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]

Both methods give the same answer!

P(Bikes | On Campus) = 84/240 = 0.35

OR

P(Bikes | On Campus) = (84/600) / (240/600) = 0.14/0.40 = 0.35 ✅
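The two methods agree because the grand total cancels in Method 2. This minimal Python sketch computes both with exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

n = 600
on_campus = 240            # count in B (the "given" group)
on_campus_and_bike = 84    # count in A and B

# Method 1: counts straight from the table
method_1 = Fraction(on_campus_and_bike, on_campus)

# Method 2: ratio of probabilities; the grand total n cancels out
method_2 = Fraction(on_campus_and_bike, n) / Fraction(on_campus, n)

print(method_1, method_2, float(method_1))  # 7/20 7/20 0.35
assert method_1 == method_2
```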

Conditional Probability Example

Sofia’s table:

             | Uses Bike | No Bike | Row Total
On Campus    | 84        | 156     | 240
Off Campus   | 36        | 324     | 360
Column Total | 120       | 480     | 600

Calculate: P(Bikes | On Campus)

Solution: Focus on the “On Campus” row only

  • Students on campus: 240
  • Students on campus who bike: 84
  • P(Bikes | On Campus) = 84/240 = 0.35

Interpretation: Among on-campus students, 35% use bike-sharing 🚲

Another Conditional Probability

Same table:

             | Uses Bike | No Bike | Row Total
On Campus    | 84        | 156     | 240
Off Campus   | 36        | 324     | 360
Column Total | 120       | 480     | 600

Calculate: P(Bikes | Off Campus)

Solution: Focus on the “Off Campus” row only

  • Students off campus: 360
  • Students off campus who bike: 36
  • P(Bikes | Off Campus) = 36/360 = 0.10

Compare the two conditional probabilities:

  • P(Bikes | On Campus) = 0.35
  • P(Bikes | Off Campus) = 0.10

On-campus students are 3.5× as likely to bike as off-campus students (0.35 / 0.10 = 3.5)! 🎯
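Each conditional probability uses only its own row as the sample space. This minimal Python sketch computes both rows and their ratio. The `rows` layout of (bike users, row total) pairs is our own.

```python
from fractions import Fraction

# Each housing group's row: (bike users, row total)
rows = {"On Campus": (84, 240), "Off Campus": (36, 360)}

# Conditional probability within each row: the row is the reduced sample space
p_bike_given = {group: Fraction(bikes, total) for group, (bikes, total) in rows.items()}

ratio = p_bike_given["On Campus"] / p_bike_given["Off Campus"]
print({g: float(p) for g, p in p_bike_given.items()})  # {'On Campus': 0.35, 'Off Campus': 0.1}
print(float(ratio))  # 3.5
```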

Visualizing Conditional Probability

The reduced sample space concept:

[Figure: the reduced sample space for P(A|B); image from Seeing Theory]

When calculating P(A|B), we ONLY look at cases where B is true!

Conditional Probability Direction Matters! ⚠️

IMPORTANT: P(A|B) ≠ P(B|A) in general!

Example from Sofia’s data:

P(Bikes | On Campus) = 84/240 = 0.35

P(On Campus | Bikes) = 84/120 = 0.70

These answer different questions:

  • First: “Among on-campus students, what % bike?”
  • Second: “Among bike users, what % live on campus?”

Both are useful, but very different! 🔄
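Both probabilities share the same numerator (84). Only the "given" total in the denominator changes. A minimal Python sketch with illustrative variable names:

```python
from fractions import Fraction

both = 84        # on campus AND uses bike-sharing
on_campus = 240  # row total: the "given" group for P(Bikes | On Campus)
bikers = 120     # column total: the "given" group for P(On Campus | Bikes)

p_bike_given_campus = Fraction(both, on_campus)  # divide by the row total
p_campus_given_bike = Fraction(both, bikers)     # divide by the column total

print(float(p_bike_given_campus), float(p_campus_given_bike))  # 0.35 0.7
```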

Real-World Application for Sofia 🌱

Conditional probabilities guide strategy:

Finding: P(Bikes | On Campus) = 0.35 vs P(Bikes | Off Campus) = 0.10

Action: Install more bike racks and repair stations near residence halls! 🚲

Finding: P(Workshop | First-gen) = 0.32 vs P(Workshop | Not first-gen) = 0.18

Action: Market workshops more to first-gen students who are already interested! 📢

Finding: P(Garden | Attends Workshop) = 0.40 vs P(Garden) = 0.15

Action: Recruit garden volunteers at workshops - they’re already engaged! 🌿

Independence Revisited

Formal definition using conditional probability:

Events A and B (with P(B) > 0) are independent if and only if:

\[P(A|B) = P(A)\]

In words: Knowing that B occurred doesn’t change the probability of A!

Equivalently: Events are independent if:

\[P(A \text{ and } B) = P(A) \times P(B)\]

Testing Independence with Conditionals

Sofia’s data:

  • P(Bikes) = 120/600 = 0.20
  • P(Bikes | On Campus) = 84/240 = 0.35

Test for independence: Does P(Bikes | On Campus) = P(Bikes)?

0.35 ≠ 0.20

Not independent! Knowing that a student lives on campus changes the probability that they bike! 📊

Three equivalent ways to test independence:

  1. Does P(A|B) = P(A)?
  2. Does P(B|A) = P(B)?
  3. Does P(A and B) = P(A) × P(B)?

All three will give the same answer! ✅
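This minimal Python sketch runs all three tests side by side. The function name `independence_checks` is our own. It assumes P(A) > 0 and P(B) > 0, because the conditional tests divide by them. The second call uses hypothetical counts chosen so that independence holds exactly, which shows that the three tests agree in both directions.

```python
from fractions import Fraction

def independence_checks(n, count_a, count_b, count_ab):
    """Run the three equivalent independence tests on exact fractions.

    Assumes count_a > 0 and count_b > 0 (the conditionals divide by them).
    """
    p_a, p_b, p_ab = Fraction(count_a, n), Fraction(count_b, n), Fraction(count_ab, n)
    return {
        "P(A|B) == P(A)": p_ab / p_b == p_a,
        "P(B|A) == P(B)": p_ab / p_a == p_b,
        "P(A and B) == P(A)P(B)": p_ab == p_a * p_b,
    }

# Sofia's data: A = uses bike-sharing (120), B = lives on campus (240), both = 84
print(independence_checks(600, 120, 240, 84))  # all three False

# Hypothetical counts where "both" = 48, exactly what independence predicts
print(independence_checks(600, 120, 240, 48))  # all three True
```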

Google Sheets for Contingency Tables 💻

Creating a contingency table in Sheets:

Method 1 - Pivot Table (best for raw data):

  1. Select your data range
  2. Insert → Pivot Table
  3. Add row: First variable
  4. Add column: Second variable
  5. Add value: COUNTA of any column

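Method 2 - COUNTIFS (manual):

Assuming the raw data sits in columns A to C (Student, Lives On Campus, Uses Bike-sharing) with headers in row 1 and students in rows 2 to 601:

=COUNTIFS($B$2:$B$601, "Yes", $C$2:$C$601, "Yes")

This counts students who are BOTH on campus AND use bikes

Calculating probabilities (assuming the finished 2×2 table occupies A1:D4, with labels in column A, row totals in column D, and the grand total in D4):

  • Joint probability, P(On Campus AND Uses Bike): =B2/$D$4
  • Simple probability, P(On Campus): =D2/$D$4
  • Conditional probability (row-based), P(Uses Bike | On Campus): =B2/D2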

Complex Contingency Table Example

Sofia expands her analysis - now with three categories for each variable:

                     | Bike | Bus | Walk | Row Total
On Campus            | 84   | 48  | 108  | 240
Off Campus < 2 miles | 36   | 45  | 69   | 150
Off Campus > 2 miles | 12   | 138 | 60   | 210
Column Total         | 132  | 231 | 237  | 600

Now we can ask more nuanced questions:

  • P(Walk | On Campus) = 108/240 = 0.45
  • P(Bus | Off Campus >2mi) = 138/210 ≈ 0.66
  • P(Bike | Off Campus <2mi) = 36/150 = 0.24

Insight: Distance matters! Students far from campus heavily rely on buses! 🚌
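With more than two categories, the same idea applies row by row: divide each cell by its row total to get P(mode | housing). The following is a minimal Python sketch. The function name `conditional_row_distribution` is our own.

```python
from fractions import Fraction

# Sofia's expanded table: rows = where students live, columns = travel mode
table = {
    "On Campus":            {"Bike": 84, "Bus": 48,  "Walk": 108},
    "Off Campus < 2 miles": {"Bike": 36, "Bus": 45,  "Walk": 69},
    "Off Campus > 2 miles": {"Bike": 12, "Bus": 138, "Walk": 60},
}

def conditional_row_distribution(table, row):
    """P(column | row) for every column: each cell divided by its row total."""
    row_total = sum(table[row].values())
    return {col: Fraction(k, row_total) for col, k in table[row].items()}

for row in table:
    dist = conditional_row_distribution(table, row)
    print(row, {col: round(float(p), 2) for col, p in dist.items()})
```

Within each row, the conditional probabilities sum to 1, because every student in that row uses exactly one of the three modes.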

📊 THINK-PAIR-SHARE #2 (7 minutes)

Comprehensive Contingency Table Analysis:

Sofia surveys 800 students about zero-waste dining participation:

              | Participates | Doesn’t Participate | Row Total
Has meal plan | 280          | 120                 | 400
No meal plan  | 80           | 320                 | 400
Column Total  | 360          | 440                 | 800

Calculate and interpret:

  1. P(Has meal plan AND Participates) - joint probability
  2. P(Participates | Has meal plan) - conditional probability
  3. P(Participates | No meal plan) - conditional probability
  4. Are meal plan status and zero-waste participation independent? Show your work.
  5. What should Sofia recommend based on this analysis?

Post your complete analysis on Ed Discussion with reasoning!

The Law of Total Probability 📏

A powerful tool when you know conditional probabilities:

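\[P(A) = P(A|B_1) \times P(B_1) + P(A|B_2) \times P(B_2) + \cdots\]

where the events \(B_1, B_2, \ldots\) are mutually exclusive and together cover every case (here: On Campus and Off Campus).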

Sofia’s example: Overall bike usage probability

P(Bikes) = P(Bikes|On Campus) × P(On Campus) + P(Bikes|Off Campus) × P(Off Campus)

= 0.35 × 0.40 + 0.10 × 0.60

= 0.14 + 0.06

= 0.20

This matches our direct calculation: 120/600 = 0.20
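This minimal Python sketch of the calculation weights each group's conditional probability by that group's share of students. The `groups` dictionary layout is our own. The first assertion checks that the groups form a partition, so their probabilities sum to 1.

```python
from fractions import Fraction

# Partition of students by housing: P(group) and P(Bikes | group)
groups = {
    "On Campus":  {"p_group": Fraction(240, 600), "p_bike_given": Fraction(84, 240)},
    "Off Campus": {"p_group": Fraction(360, 600), "p_bike_given": Fraction(36, 360)},
}

# The groups must be mutually exclusive and exhaustive
assert sum(g["p_group"] for g in groups.values()) == 1

# Law of total probability: weight each conditional by the size of its group
p_bike = sum(g["p_bike_given"] * g["p_group"] for g in groups.values())
print(p_bike, float(p_bike))  # 1/5 0.2
```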

When is this useful? When you know conditional probabilities but need the overall probability!

Common Mistakes with Conditional Probability ⚠️

  1. Confusing P(A|B) with P(B|A) - Direction matters!
    • Example: P(Cancer|Positive test) ≠ P(Positive test|Cancer)
  2. Using the wrong denominator - Always use the “given” total
    • P(A|B) = count(A and B) / count(B), not count(A and B) / total population!
  3. Assuming independence without testing
    • Must verify: P(A|B) = P(A) or P(A and B) = P(A) × P(B)
  4. Forgetting the reduced sample space
    • When “given” something, you’re working with fewer cases!
  5. Wrong row/column in table
    • P(A|B) uses B’s margin (row or column) as denominator

Decision-Making with Conditional Probability 🎯

Sofia’s strategic questions answered by conditional probability:

  1. Target marketing: Which students respond best to emails?
    • P(Click | First-gen) vs P(Click | Not first-gen)
  2. Resource allocation: Where to place bike stations?
    • P(Bikes | Location) for each campus area
  3. Program bundling: Which activities go together?
    • P(Activity A | Participates in Activity B)
  4. Outreach timing: When are students most engaged?
    • P(Attendance | Day of week) or P(Attendance | Time)

Interpreting Zero Probabilities 🔍

What if a cell in your contingency table is zero?

                     | Uses Bike | No Bike | Row Total
Lives >10 miles away | 0         | 150     | 150
Lives <10 miles away | 120       | 330     | 450
Column Total         | 120       | 480     | 600

Interpretation:

  • P(Bikes | >10 miles) = 0/150 = 0
  • Not independent: P(Bikes | >10 miles) = 0, while P(Bikes) = 120/600 = 0.20
  • Practical meaning: Distance is a barrier to cycling

For Sofia: Don’t target far-away students for bike program; focus on bus/carpool instead! 🚌

📊 THINK-PAIR-SHARE #3 (7 minutes)

Comprehensive Analysis Challenge:

Sofia conducts a final survey on sustainable transportation (n = 900):

                | Bike | Bus | Car | Row Total
Lives On Campus | 120  | 90  | 60  | 270
Lives Off <3mi  | 135  | 90  | 135 | 360
Lives Off >3mi  | 45   | 135 | 90  | 270
Column Total    | 300  | 315 | 285 | 900

Your comprehensive analysis:

  1. Calculate P(Bike), P(On Campus), and P(Bike AND On Campus)
  2. Calculate P(Bike | Housing) for each of the three housing categories
  3. Which housing group has the highest bike usage rate?
  4. Are bike usage and housing location independent? Prove it.
  5. Calculate P(Bus | Lives >3mi) - what does this tell Sofia?
  6. Recommend TWO specific actions Sofia should take based on this data
  7. If Sofia only has budget for one new bike station, where should she put it?

Share your answers in Poll Everywhere!


Real-World Impact 🌍

Why this matters beyond the classroom:

Public Health: Vaccine effectiveness by age group

Marketing: Purchase rates by customer demographics

Education: Success rates by teaching method

Climate Action: Participation rates by student characteristics (Sofia’s work!)

Social Justice: Understanding disparities in outcomes across groups

Data-driven decisions are only as good as your understanding of probability! 💡

Common Student Questions 🙋

Q: “How do I know which probability type to calculate?”

A: Read the question carefully!

  • “Overall” or “in general” → Simple
  • “Both” or “and” → Joint
  • “Given” or “if we know” → Conditional

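Q: “Why doesn’t P(A|B) + P(A|Bᶜ) equal 1?” (Bᶜ means “not B”)

A: Because they have different denominators! They’re probabilities within different subgroups, not complementary events. What does sum to 1 is P(A|B) + P(Aᶜ|B), since both are computed within the same subgroup B.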
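A quick check with Sofia's bike data makes the difference concrete. In this minimal Python sketch, the variable names are illustrative.

```python
from fractions import Fraction

p_bike_given_on = Fraction(84, 240)     # P(Bikes | On Campus)
p_bike_given_off = Fraction(36, 360)    # P(Bikes | Off Campus): a different subgroup
p_nobike_given_on = Fraction(156, 240)  # P(No Bikes | On Campus): same subgroup

print(float(p_bike_given_on + p_bike_given_off))  # 0.45, no reason to equal 1
print(p_bike_given_on + p_nobike_given_on)        # 1, complements within one subgroup
```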

Q: “Can I use contingency tables for continuous variables?”

A: Not directly! You need to “bin” continuous data into categories first (e.g., age → age groups).

Sofia’s Final Insights 💡

What Sofia learned about effective activism:

Lesson 1: Not all students are equally likely to participate - understand your audience!

  • P(Participate | On Campus) = 0.42
  • P(Participate | Off Campus) = 0.26
  • Target housing groups differently

Lesson 2: Some behaviors cluster together - leverage existing engagement!

  • P(Garden | Workshop) = 0.40 vs P(Garden) = 0.15
  • Recruit where people already are

Lesson 3: Distance and infrastructure matter more than enthusiasm!

  • P(Bike | <2mi from campus) = 0.30
  • P(Bike | >5mi from campus) = 0.02
  • Remove barriers, don’t just increase awareness

Quick Knowledge Check ✅

Rate your confidence (1-5) on Ed Discussion:

  1. Creating contingency tables from raw data ⭐⭐⭐⭐⭐
  2. Calculating simple (marginal) probabilities ⭐⭐⭐⭐⭐
  3. Calculating joint probabilities ⭐⭐⭐⭐⭐
  4. Calculating conditional probabilities ⭐⭐⭐⭐⭐
  5. Testing for independence using tables ⭐⭐⭐⭐⭐
  6. Understanding P(A|B) vs P(B|A) distinction ⭐⭐⭐⭐⭐
  7. Making decisions using conditional probability ⭐⭐⭐⭐⭐

If you rated anything 3 or below, please come to office hours! These concepts build on each other. 🤗

Group Study Tips 👥

Make the most of collaborative learning:

  1. Practice explaining conditional probability to each other
  2. Create practice problems from real campus scenarios
  3. Quiz each other on probability types
  4. Check each other’s work on contingency tables
  5. Discuss interpretations - what do the numbers mean?

Form study groups on Ed Discussion! Research shows collaborative learning improves outcomes. 📚

Summary: Contingency Table Probabilities

Three Types of Probability:

Simple (Marginal) Probability: \(P(A) = \frac{\text{Row or Column Total}}{\text{Grand Total}}\)

Joint Probability: \(P(A \text{ and } B) = \frac{\text{Cell Frequency}}{\text{Grand Total}}\)

Conditional Probability: \(P(A|B) = \frac{\text{Cell Frequency}}{\text{Row or Column Total for B}}\)

OR equivalently:

\(P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\)

Summary: Testing for Independence

Three equivalent methods:

Method 1 - Using conditional probability:

Check if P(A|B) = P(A)

If equal → Independent ✅

If not equal → Dependent ❌

Method 2 - Using multiplication rule:

Check if P(A and B) = P(A) × P(B)

If equal → Independent ✅

If not equal → Dependent ❌

Method 3 - Compare the two conditionals:

Check if P(A|B) = P(A|Bᶜ)

If equal → Independent ✅

If not equal → Dependent ❌

Your Contingency Table Toolkit 🧰

When to use contingency tables:

✅ Comparing two categorical variables

✅ Looking for relationships between factors

✅ Calculating conditional probabilities

✅ Testing for independence

✅ Making data-driven decisions about groups

✅ Organizing survey or experimental data

Quick reference:

  • Simple: Use margins
  • Joint: Use cells
  • Conditional: Use row/column of “given” variable
  • Independence: Compare conditional to simple

Thank you! 📊✨

Questions? I have office hours right after class today!

Next up: Probability Distributions

Remember: Contingency tables are your friends - they organize complex information into clear, actionable insights! 🎯