Lecture 6: Bayes’ Theorem & Diagnostic Testing

STAT 7 - Statistical Methods for the Biological, Environmental & Health Sciences

Prof. Marcela Alfaro Córdoba

Statistics - UCSC

10 Mar 2026

Welcome Back! Quick Recap Poll

PollEv.com/slugstats

Which concept from Tuesday needs more clarification?

A. Conditional probability formula
B. Independence vs. dependence
C. Tree diagrams
D. I’m good with all of these!

Tree Diagrams

Organizing sequential probability problems

Why Tree Diagrams?

Problem: Many probability problems involve sequences of events

Examples:

First test positive, then confirmatory test
First have disease, then symptom appears
First treatment assigned, then outcome observed

Solution: Tree diagrams help us:

Organize information systematically
Visualize conditional probabilities
Calculate joint probabilities
Avoid common mistakes

Building a Tree: Health Insurance Example

Scenario: In a population:

87.4% have health insurance
12.6% don’t have health insurance

Among those with insurance:

90% report excellent/very good health

Among those without insurance:

72% report excellent/very good health

Question: What’s the probability someone is insured AND has excellent/very good health?

Step 1: First Branch (Insurance Status)

         ┌─ Insured (0.874)
    ○────┤
         └─ Not Insured (0.126)

Step 2: Second Branches (Health Status)

            ┌─ Excellent/VGood (0.90) ─→ 0.874 × 0.90 = 0.787
            │
Insured ────○ 
(0.874)     │
            └─ Not Exc/VGood (0.10) ─→ 0.874 × 0.10 = 0.087
         
            ┌─ Excellent/VGood (0.72) ─→ 0.126 × 0.72 = 0.091
            │
Not Insur ─○ 
(0.126)     │
            └─ Not Exc/VGood (0.28) ─→ 0.126 × 0.28 = 0.035

Reading the Tree

Each path represents a joint probability:

Path 1:
Insured AND Excellent Health
0.874 × 0.90 = 0.787

Path 2:
Insured AND Not Excellent
0.874 × 0.10 = 0.087

Path 3:
Not Insured AND Excellent
0.126 × 0.72 = 0.091

Path 4:
Not Insured AND Not Excellent
0.126 × 0.28 = 0.035

Check: 0.787 + 0.087 + 0.091 + 0.035 = 1.000 ✓
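
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the tree above; the variable names are made up for this example and are not part of the course materials.

```python
# Branch probabilities taken from the health insurance example.
p_status = {"Insured": 0.874, "Not Insured": 1 - 0.874}
p_excellent_given = {"Insured": 0.90, "Not Insured": 0.72}

# Each path multiplies the first-branch probability by the
# conditional probability on the second branch.
paths = {}
for status, p_s in p_status.items():
    p_exc = p_excellent_given[status]
    paths[(status, "Excellent/VGood")] = p_s * p_exc
    paths[(status, "Not Exc/VGood")] = p_s * (1 - p_exc)

for (status, health), p in paths.items():
    print(f"{status:12s} AND {health:16s}: {p:.3f}")

# The four joint probabilities must add up to 1.
print(f"Total: {sum(paths.values()):.3f}")
```

If the total is not 1, a branch probability was mistyped somewhere.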

General Multiplication Rule

From the tree diagram, we can see:

For any two events A and B:

\[P(A \text{ and } B) = P(B) \times P(A|B)\]

or equivalently:

\[P(A \text{ and } B) = P(A) \times P(B|A)\]

Note: This works whether or not A and B are independent!

When independent: P(A|B) = P(A), so we get back P(A and B) = P(A) × P(B)

Activity: Medical Test Tree

Your Task

Scenario: A medical test has:

95% sensitivity (true positive rate)
90% specificity (true negative rate)
Disease prevalence: 5%

Individual (3 min): Work on your own to

Draw a tree diagram for this scenario
Calculate P(Disease AND Positive Test)
Calculate P(No Disease AND Positive Test)

Pair (4 min): Compare diagrams and calculations

Poll: PollEv.com/slugstats

Solution: Medical Test Tree

         ┌─ Test + (0.95) ─→ 0.05 × 0.95 = 0.0475
         │
 D  ○────┤ 
(0.05)   │
         └─ Test - (0.05) ─→ 0.05 × 0.05 = 0.0025
         
         ┌─ Test + (0.10) ─→ 0.95 × 0.10 = 0.0950
         │
ND  ○────┤ 
(0.95)   │
         └─ Test - (0.90) ─→ 0.95 × 0.90 = 0.8550

Key Results from Tree

From our medical test tree:

P(Disease AND Test+) = 0.0475 (4.75%)
P(No Disease AND Test+) = 0.0950 (9.5%)
P(Test+) = 0.0475 + 0.0950 = 0.1425 (14.25%)

Surprising finding:
Among those who test positive, about two-thirds (0.0950 out of 0.1425) don’t actually have the disease!

This is why we need Bayes’ Theorem (today’s topic) to calculate P(Disease | Test+)

Contingency Tables

Another tool for organizing probability information

What is a Contingency Table?

A contingency table (also called a two-way table) displays the relationship between two categorical variables.

Uses:

Display joint probabilities
Calculate marginal probabilities
Find conditional probabilities
Check for independence

Alternative to: Tree diagrams (same information, different format)

Example: Health Coverage & Health Status

	Excellent/VG	Not Exc/VG	Total
Insured	0.787	0.087	0.874
Not Insured	0.091	0.035	0.126
Total	0.878	0.122	1.000

Note: This contains exactly the same information as our tree diagram!

Reading a Contingency Table

	Excellent/VG	Not Exc/VG	Total
Insured	0.787	0.087	0.874
Not Insured	0.091	0.035	0.126
Total	0.878	0.122	1.000

Joint probabilities: Interior cells (e.g., 0.787)
Marginal probabilities: Row and column totals (e.g., 0.874, 0.878)
Conditional probabilities: Calculate using formula

Calculating Conditional Probabilities

Question: What’s P(Excellent Health | Insured)?

\[P(\text{Excellent | Insured}) = \frac{P(\text{Excellent and Insured})}{P(\text{Insured})}\]

From table:

Numerator: 0.787 (joint probability)
Denominator: 0.874 (marginal probability)

\[P(\text{Excellent | Insured}) = \frac{0.787}{0.874} = 0.900 = 90\%\]

Tip: For P(A|B), find row/column for B, then look at proportion for A within that row/column

Activity: Practice with Tables

Multiple Conditional Probabilities

	Excellent/VG	Not Exc/VG	Total
Insured	0.787	0.087	0.874
Not Insured	0.091	0.035	0.126
Total	0.878	0.122	1.000

Calculate:

P(Excellent | Not Insured)
P(Insured | Excellent)
P(Not Insured | Not Excellent)

Individual (2 min) → Pair (2 min) → Share

Poll: PollEv.com/slugstats

Solutions

P(Excellent | Not Insured) = 0.091 / 0.126 = 0.722 = 72.2%
P(Insured | Excellent) = 0.787 / 0.878 = 0.896 = 89.6%
P(Not Insured | Not Excellent) = 0.035 / 0.122 = 0.287 = 28.7%

Interpretation tips:

“Given X” → X goes in denominator
Look at the row/column for X
Find the proportion for the outcome of interest
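
A minimal Python sketch of the same "joint divided by marginal" recipe, using the cells of the table above. The helper names (`marginal`, `health_given_insurance`, `insurance_given_health`) are hypothetical and chosen only for this illustration.

```python
# Joint probabilities copied from the interior cells of the table.
joint = {
    ("Insured", "Excellent"): 0.787,
    ("Insured", "Not Excellent"): 0.087,
    ("Not Insured", "Excellent"): 0.091,
    ("Not Insured", "Not Excellent"): 0.035,
}

def marginal(position, value):
    """Total probability of one row (position 0) or one column (position 1)."""
    return sum(p for key, p in joint.items() if key[position] == value)

def health_given_insurance(health, insurance):
    """P(health | insurance): joint cell divided by its row total."""
    return joint[(insurance, health)] / marginal(0, insurance)

def insurance_given_health(insurance, health):
    """P(insurance | health): joint cell divided by its column total."""
    return joint[(insurance, health)] / marginal(1, health)

print(f"{health_given_insurance('Excellent', 'Not Insured'):.3f}")
print(f"{insurance_given_health('Insured', 'Excellent'):.3f}")
print(f"{insurance_given_health('Not Insured', 'Not Excellent'):.3f}")
```

The "given" event always decides which total goes in the denominator: a row total when conditioning on insurance status, a column total when conditioning on health status.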

Tree vs. Table: When to Use Which?

Tree Diagrams:

Sequential events are natural
Clear conditional structure
Good for explaining step-by-step
Easier for some people to visualize

Contingency Tables:

Compact representation
Easy to calculate marginals
Quick independence checks
Standard in research papers

Bottom line: Use whichever helps YOU understand the problem better!

Motivation: The Prosecutor’s Fallacy

When conditional probability goes wrong in court

People v. Collins (1968)

The Case:

A woman’s purse was snatched in Los Angeles. Witnesses described:

Blonde woman with ponytail
Black man with beard
Interracial couple in yellow car

The Couple:

Malcolm and Janet Collins matched this description

The Evidence:

Prosecutor used probability to argue guilt

The Prosecutor’s Argument

The prosecutor brought in a mathematician who estimated:

Characteristic	Probability
Yellow car	1 in 10
Man with mustache	1 in 4
Woman with ponytail	1 in 10
Woman with blonde hair	1 in 3
Black man with beard	1 in 10
Interracial couple in car	1 in 1000

Claimed calculation:
(1/10) × (1/4) × (1/10) × (1/3) × (1/10) × (1/1000) = 1 in 12,000,000

“Only one in 12 million couples match this description!”
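
The claimed product can be reproduced exactly with Python fractions. This sketch only checks the arithmetic; note that multiplying the six estimates treats the characteristics as independent, which is an assumption built into the argument rather than something the table establishes.

```python
from fractions import Fraction
from math import prod

# Estimates presented by the prosecution (from the table above).
estimates = {
    "Yellow car": Fraction(1, 10),
    "Man with mustache": Fraction(1, 4),
    "Woman with ponytail": Fraction(1, 10),
    "Woman with blonde hair": Fraction(1, 3),
    "Black man with beard": Fraction(1, 10),
    "Interracial couple in car": Fraction(1, 1000),
}

# Multiplying all six is only valid if the characteristics are independent.
combined = prod(estimates.values())
print(combined)  # 1/12000000
```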

The Critical Question

The prosecutor argued:

“The probability that another couple matching this description exists is only 1 in 12 million, so the Collinses must be guilty.”

But wait…

What probability did we actually calculate?

What probability do we actually NEED?

The Fallacy Revealed

What was calculated:
P(Matching description | Innocent random couple)

What’s needed for guilt:
P(Innocent | Matching description)

The Prosecutor’s Fallacy:
Confusing P(Evidence | Innocent) with P(Innocent | Evidence)

These are NOT the same!

The California Supreme Court overturned the conviction in 1968, citing misuse of probability.

Why They’re Different

P(DNA match | Innocent) = 1 in 1,000,000

Among innocent people, only 1 in a million would match
This is about the accuracy of the DNA test

P(Innocent | DNA match) = ?

Among people who match DNA, what proportion are innocent?
This is what juries need to know!

Missing piece: How many people might have been at the crime scene? (Base rate!)

Bayes’ Theorem

The tool for reversing conditional probabilities

The Setup

We often know: P(Evidence | Hypothesis)
e.g., P(Positive test | Disease)

But we want: P(Hypothesis | Evidence)
e.g., P(Disease | Positive test)

Bayes’ Theorem tells us how to reverse conditional probabilities!

Bayes’ Theorem: The Formula

Bayes’ Theorem

For events A and B:

\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]

Or more fully:

\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|A^C) \times P(A^C)}\]

Parts:

P(A): Prior probability (before seeing evidence)
P(B|A): Likelihood (probability of evidence given hypothesis)
P(A|B): Posterior probability (after seeing evidence)
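
A short Python sketch of the fuller formula. The function name `bayes_posterior` and its parameter names are hypothetical; the inputs are the values from the medical test tree activity (5% prevalence, 95% sensitivity, 90% specificity).

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A)."""
    numerator = p_b_given_a * prior
    # Denominator is P(B): B can occur together with A or with not-A.
    denominator = numerator + p_b_given_not_a * (1 - prior)
    return numerator / denominator

# A = has the disease, B = tests positive.
posterior = bayes_posterior(
    prior=0.05,                  # prevalence
    p_b_given_a=0.95,            # sensitivity
    p_b_given_not_a=1 - 0.90,    # 1 - specificity (false positive rate)
)
print(f"P(Disease | Test+) = {posterior:.3f}")  # 0.333
```

This agrees with the tree: 0.0475 / 0.1425 is one-third, so about two-thirds of positives are false.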

Deriving Bayes’ Theorem

Start with conditional probability definition:

\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]

\[P(B|A) = \frac{P(B \text{ and } A)}{P(A)}\]

Since P(A and B) = P(B and A):

From second equation: \(P(B \text{ and } A) = P(B|A) \times P(A)\)

Substitute into first: \(P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\)
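
The fuller form of the formula needs one more step, sketched here: B can happen together with A or together with its complement \(A^C\), and the general multiplication rule applies to each piece.

\[P(B) = P(B \text{ and } A) + P(B \text{ and } A^C) = P(B|A) \times P(A) + P(B|A^C) \times P(A^C)\]

Replacing \(P(B)\) in the denominator with this sum gives the longer version of Bayes’ Theorem.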

Example: Rare Disease Testing

Scenario:

Disease prevalence: 1% (P(Disease) = 0.01)
Test sensitivity: 95% (P(+|Disease) = 0.95)
Test specificity: 90% (P(-|No Disease) = 0.90)

Question: If someone tests positive, what’s the probability they have the disease?

What we want: P(Disease | +)

What we know: P(+ | Disease) = 0.95

Solution: Step by Step

Step 1: Identify what we know

P(Disease) = 0.01 → P(No Disease) = 0.99
P(+|Disease) = 0.95
P(+|No Disease) = 1 - 0.90 = 0.10

Step 2: Apply Bayes’ Theorem

\[P(\text{Disease}|+) = \frac{P(+|\text{Disease}) \times P(\text{Disease})}{P(+)}\]

Where \(P(+) = P(+|\text{Disease}) \times P(\text{Disease}) + P(+|\text{No Disease}) \times P(\text{No Disease})\)

Solution: Calculation

Calculate denominator (total probability of +):

\[P(+) = P(+|\text{Disease}) \times P(\text{Disease}) + P(+|\text{No Disease}) \times P(\text{No Disease})\]

\[P(+) = (0.95)(0.01) + (0.10)(0.99) = 0.0095 + 0.099 = 0.1085\]

Calculate posterior:

\[P(\text{Disease}|+) = \frac{(0.95)(0.01)}{0.1085} = \frac{0.0095}{0.1085} = 0.0876\]

Result: Only 8.76% of people who test positive actually have the disease!

Surprising Result!

The test has 95% sensitivity and 90% specificity, but a positive result means only an 8.76% chance of disease?

Why?

Disease is rare (1% prevalence)
Many more healthy people than sick people
False positives among healthy outnumber true positives
Base rate matters - you can’t ignore prevalence!

This is why screening rare diseases is challenging!

Visualizing with Tree Diagram

         ┌─ Test + (0.95) ─→ 100 × 0.95 = 95 [TRUE POS]
         │
 D  ○────┤ 
 (100)   │
         └─ Test - (0.05) ─→ 100 × 0.05 = 5 [FALSE NEG]
         
         ┌─ Test + (0.10) ─→ 9900 × 0.10 = 990 [FALSE POS]
         │
ND  ○────┤  
(9900)   │
         └─ Test - (0.90) ─→ 9900 × 0.90 = 8910 [TRUE NEG]

Among the 95 + 990 = 1085 positive tests in a population of 10,000 (1% prevalence), only 95 are true positives!
95 / 1085 = 8.76%
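
One way to build intuition for this count is a small simulation. It is a swapped-in technique, not the calculation used on the slide: it draws a random population and counts outcomes instead of multiplying probabilities. The function name, population size, and seed are arbitrary choices for this sketch.

```python
import random

def simulate_ppv(n, prevalence, sensitivity, specificity, seed=0):
    """Simulate n people and return the share of positives who are diseased."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_pos = false_pos = 0
    for _ in range(n):
        has_disease = rng.random() < prevalence
        # Chance of a positive result depends on true disease status.
        p_positive = sensitivity if has_disease else 1 - specificity
        if rng.random() < p_positive:
            if has_disease:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos / (true_pos + false_pos)

estimate = simulate_ppv(200_000, prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(f"Simulated P(Disease | +): {estimate:.3f}")
```

The simulated value should land close to 95 / 1085, with some random variation from run to run if the seed is changed.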

Solution: 10% Prevalence

Tree for 10,000 people:

Disease: 1000 people
- Test+: 1000 × 0.95 = 950
- Test-: 1000 × 0.05 = 50
No Disease: 9000 people
- Test+: 9000 × 0.10 = 900
- Test-: 9000 × 0.90 = 8100

Bayes’ Theorem: \[P(\text{Disease}|+) = \frac{(0.95)(0.10)}{(0.95)(0.10) + (0.10)(0.90)} = \frac{0.095}{0.185} = 0.514 = 51.4\%\]

Interpretation: When prevalence is 10%, a positive test means 51.4% chance of disease (vs. 8.76% when prevalence was 1%)

The Power of Base Rates

Prevalence	P(Disease | Positive Test)
1%	8.76%
10%	51.4%
20%	70.4%
50%	90.5%

Same test accuracy, dramatically different interpretation!

Clinical Lesson: The predictive value of a test depends heavily on disease prevalence in the population being tested.
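
The table can be regenerated by looping over prevalence values while holding the test fixed. A minimal sketch, assuming the same 95% sensitivity and 90% specificity as the example; the function name `ppv` is just a label for this illustration.

```python
def ppv(prevalence, sensitivity=0.95, specificity=0.90):
    """P(Disease | Positive Test) for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same test every time; only the prevalence changes.
for prevalence in (0.01, 0.10, 0.20, 0.50):
    print(f"{prevalence:>4.0%}  {ppv(prevalence):.2%}")
```

Adding more prevalence values to the tuple shows how quickly the predictive value climbs as the disease becomes more common.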

Break Time! ☕ 5-minute break

Stretch, grab water, chat with neighbors!

When we return: Medical screening metrics (sensitivity, specificity, PPV, NPV)

Medical Screening Metrics

Understanding test performance characteristics

The 2×2 Table for Diagnostic Tests

	Disease +	Disease -	Total
Test +	a (TP)	b (FP)	a + b
Test -	c (FN)	d (TN)	c + d
Total	a + c	b + d	n

Key:

TP = True Positive (correctly identified disease)
FP = False Positive (incorrectly identified disease)
TN = True Negative (correctly identified no disease)
FN = False Negative (missed disease)

Four Key Metrics

Sensitivity (True Positive Rate)

\[\text{Sensitivity} = \frac{\text{TP}}{\text{TP + FN}} = \frac{a}{a+c}\]

Probability test is positive given person has disease: P(+ | Disease)

Specificity (True Negative Rate)

\[\text{Specificity} = \frac{\text{TN}}{\text{TN + FP}} = \frac{d}{b+d}\]

Probability test is negative given person doesn’t have disease: P(- | No Disease)

Two More Key Metrics

Positive Predictive Value (PPV)

\[\text{PPV} = \frac{\text{TP}}{\text{TP + FP}} = \frac{a}{a+b}\]

Probability person has disease given test is positive: P(Disease | +)

Negative Predictive Value (NPV)

\[\text{NPV} = \frac{\text{TN}}{\text{TN + FN}} = \frac{d}{c+d}\]

Probability person doesn’t have disease given test is negative: P(No Disease | -)

How to Remember These

Sensitivity & Specificity:

Based on true disease status (columns)
Properties of the test itself
Don’t depend on prevalence

PPV & NPV:

Based on test result (rows)
What clinicians and patients want to know
DO depend on prevalence

Key Distinction:
Sensitivity/specificity = test characteristics
PPV/NPV = clinical interpretation (prevalence-dependent)

Example: Diabetes Screening

A study of 1000 people screened for diabetes:

	Diabetes	No Diabetes	Total
Screen +	85	50	135
Screen -	15	850	865
Total	100	900	1000

Calculate:

Sensitivity = ?
Specificity = ?
PPV = ?
NPV = ?

Solutions: Diabetes Screening

Sensitivity = TP / (TP + FN) = 85 / (85 + 15) = 85 / 100 = 85%

Among people with diabetes, 85% test positive

Specificity = TN / (TN + FP) = 850 / (850 + 50) = 850 / 900 = 94.4%

Among people without diabetes, 94.4% test negative

PPV = TP / (TP + FP) = 85 / (85 + 50) = 85 / 135 = 63.0%

Among those who screen positive, 63.0% have diabetes

NPV = TN / (TN + FN) = 850 / (850 + 15) = 850 / 865 = 98.3%

Among those who screen negative, 98.3% don’t have diabetes
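
All four metrics come from the same four counts, so they can be computed together. A minimal sketch using the diabetes table; `screening_metrics` is a hypothetical helper name.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return the four diagnostic metrics from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # column: people with disease
        "specificity": tn / (tn + fp),  # column: people without disease
        "ppv": tp / (tp + fp),          # row: positive tests
        "npv": tn / (tn + fn),          # row: negative tests
    }

diabetes = screening_metrics(tp=85, fp=50, fn=15, tn=850)
for name, value in diabetes.items():
    print(f"{name:12s} {value:.1%}")
```

The function works for any 2×2 table of counts, as long as the cells are passed under the right labels.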

Clinical Interpretation

For this diabetes screening test:

Strengths:

High NPV (98.3%) - negative result is reassuring
Good specificity (94.4%) - few false alarms

Limitations:

Moderate sensitivity (85%) - misses 15% of cases
Moderate PPV (63%) - only 63% of positives are true cases

Clinical decision: Good as a screening tool, but positive results should be confirmed with more specific testing

Screening vs. Confirmatory Tests

Initial Screening (want high sensitivity):

Cast a wide net
Don’t want to miss anyone with disease
OK with false positives (will confirm later)
Example: Mammograms, HIV rapid tests

Confirmatory Testing (want high specificity):

Verify suspected cases
Don’t want to falsely diagnose
OK with missing some cases (already screened)
Example: Biopsies, Western blot for HIV

Strategy: Use sensitive test first, follow up positives with specific test
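
To see why the two-step strategy helps, the posterior after the screening test can be used as the prior for the confirmatory test. This is a sketch under two loud assumptions: the confirmatory test's sensitivity (90%) and specificity (99%) are hypothetical numbers invented for illustration, and the two tests are assumed to err independently given true disease status. The screening numbers are those of the rare disease example (1% prevalence, 95% sensitivity, 90% specificity).

```python
def update(prior, sensitivity, specificity):
    """P(Disease | +) after one positive result, starting from `prior`."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Step 1: sensitive screening test (values from the rare disease example).
after_screen = update(prior=0.01, sensitivity=0.95, specificity=0.90)

# Step 2: specific confirmatory test.
# HYPOTHETICAL values; also assumes the two tests' errors are independent
# once true disease status is known.
after_confirm = update(prior=after_screen, sensitivity=0.90, specificity=0.99)

print(f"After positive screen:       {after_screen:.1%}")
print(f"After positive confirmation: {after_confirm:.1%}")
```

The first positive moves the probability from 1% to under 10%; a second positive on a more specific test moves it much higher, because the tested group now has a far higher base rate.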

Practice Problem: HIV Testing

Setting: Urban clinic, high-risk population

	HIV +	HIV -	Total
Test +	47	15	62
Test -	3	935	938
Total	50	950	1000

Calculate:

Sensitivity
Specificity
PPV
NPV

Then interpret: Is this a good screening test?

Solution: HIV Test Performance

Sensitivity = 47 / 50 = 94%
(Among HIV+ people, 94% test positive)

Specificity = 935 / 950 = 98.4%
(Among HIV- people, 98.4% test negative)

PPV = 47 / 62 = 75.8%
(Among positive tests, 75.8% truly have HIV)

NPV = 935 / 938 = 99.7%
(Among negative tests, 99.7% truly don’t have HIV)

Interpretation: Excellent test! High sensitivity catches most cases, high specificity minimizes false alarms, and in this higher-prevalence population, the PPV is strong.

Returning to Our Case

HIV Testing in General Population

Remember This Scenario?

Tuesday’s Setup:

25-year-old patient
Tests positive for HIV
What’s the probability they actually have HIV?

Population prevalence: 0.2% in young adults
Test sensitivity: 99.7%
Test specificity: 98.5%

Now we have the tools to answer properly!

Setting Up the 2×2 Table

For 100,000 people:

	HIV +	HIV -	Total
Test +	?	?	?
Test -	?	?	?
Total	200	99,800	100,000

Given:

Total HIV+: 100,000 × 0.002 = 200
Total HIV-: 100,000 × 0.998 = 99,800
Sensitivity = 0.997, Specificity = 0.985

Filling in the Table

	HIV +	HIV -	Total
Test +	199	1,497	1,696
Test -	1	98,303	98,304
Total	200	99,800	100,000

Calculations:

TP: 200 × 0.997 = 199.4 ≈ 199
FN: 200 × 0.003 = 0.6 ≈ 1
TN: 99,800 × 0.985 = 98,303
FP: 99,800 × 0.015 = 1,497
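
The cell counts can be generated from the three inputs directly. A minimal sketch; `two_by_two` is a hypothetical helper name, and it returns unrounded expected counts so that rounding to whole people is an explicit final step.

```python
def two_by_two(n, prevalence, sensitivity, specificity):
    """Expected (unrounded) cell counts for a diagnostic 2x2 table."""
    positives = n * prevalence   # people who truly have the condition
    negatives = n - positives    # people who truly do not
    tp = positives * sensitivity
    tn = negatives * specificity
    return {"TP": tp, "FN": positives - tp, "FP": negatives - tn, "TN": tn}

cells = two_by_two(100_000, prevalence=0.002, sensitivity=0.997, specificity=0.985)
for name, count in cells.items():
    print(f"{name}: {count:10.1f}  (rounded: {round(count)})")

# PPV from the unrounded counts.
ppv = cells["TP"] / (cells["TP"] + cells["FP"])
print(f"PPV = {ppv:.4f}")
```

Working with whole people (199 and 1,497) or with unrounded expected counts changes the PPV only in the second decimal place of the percentage.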

The Answer

PPV = TP / (TP + FP) = 199 / (199 + 1,497) = 199 / 1,696 = 11.7%

What this means:

Even with a 99.7% sensitive and 98.5% specific test…
In a low-prevalence population (0.2%)…
A positive test only indicates 11.7% probability of actually having HIV

Clinical response:

Don’t panic patients!
Follow up with confirmatory testing
Consider risk factors and symptoms
Understand the base rate effect

Bayes’ Theorem Verification

Using Bayes directly:

\[P(\text{HIV}|+) = \frac{P(+|\text{HIV}) \times P(\text{HIV})}{P(+)}\]

\[= \frac{(0.997)(0.002)}{(0.997)(0.002) + (0.015)(0.998)}\]

\[= \frac{0.001994}{0.001994 + 0.01497} = \frac{0.001994}{0.016964} = 0.1175 = 11.75\%\]

Same answer, up to rounding of the table counts! Bayes’ Theorem and the 2×2 table are just different ways of organizing the same calculation.

Solution: High-Risk Population

For 10,000 people with 5% prevalence:

	HIV +	HIV -	Total
Test +	499	143	642
Test -	1	9,357	9,358
Total	500	9,500	10,000

PPV = 499 / 642 = 77.7%

Comparison:

General population (0.2% prevalence): PPV = 11.7%
High-risk population (5% prevalence): PPV = 77.7%

Same test, dramatically different interpretation based on who you’re testing!

Key Lessons from HIV Example

Base rates matter enormously
- Same test, different populations → different interpretations
High accuracy ≠ High predictive value (in rare diseases)
- 99%+ accuracy can still give majority false positives
Context is critical for interpretation
- Risk factors, symptoms, population prevalence all matter
Sequential testing is standard
- Screening test → Confirmatory test for positives
Communication is essential
- Explain probabilities clearly to avoid panic or false reassurance

Wrapping Up

Synthesis and looking ahead

Today’s Key Concepts

Bayes’ Theorem: Tool for reversing conditional probabilities \[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]
Sensitivity & Specificity: Test characteristics (independent of prevalence)
PPV & NPV: Clinical interpretation (depend on prevalence!)
Base Rate Effect: Prevalence dramatically affects predictive values
Prosecutor’s Fallacy: Don’t confuse P(A|B) with P(B|A)

The Big Picture

Why This Matters:

Understanding medical test results
Evaluating screening programs
Legal reasoning with evidence
Scientific reasoning with data
Decision-making under uncertainty

Life Skill: Always ask yourself:

What’s the base rate?
Am I confusing the direction of conditional probability?
What additional information do I need?

Looking Ahead: Random Variables

Next Week (Week 4):

What is a random variable?
Discrete vs. continuous random variables
Probability distributions
Expected value and variance
Binomial and normal distributions

Why it matters: Random variables let us model uncertainty systematically—essential for statistical inference!

Quick Knowledge Check

Exit ticket

PollEv.com/slugstats

A rare disease (0.1% prevalence) has a test with 99% sensitivity and 99% specificity. What’s approximately the PPV?

A. 99%
B. 90%
C. 50%
D. 9%
E. I need to calculate it!

Answer: Knowledge Check

Given:

Prevalence = 0.1% = 0.001
Sensitivity = 0.99
Specificity = 0.99

Bayes: \[\text{PPV} = \frac{(0.99)(0.001)}{(0.99)(0.001) + (0.01)(0.999)} = \frac{0.00099}{0.01098} \approx 0.090 \approx 9\%\]

Answer: D - Only about 9%!

Even with 99% sensitivity and 99% specificity, the low prevalence means most positive tests are false positives.

Final Thoughts

Two weeks of probability - why?

Because understanding uncertainty is:

Essential for statistical inference
Critical for scientific reasoning
Necessary for evaluating evidence
Important for decision-making
A life skill beyond this course

You’ve learned powerful tools!

From here forward, we build on probability to develop statistical inference methods.

Bonus: Interactive Bayes’ Theorem

Want to explore more?

Check out these interactive visualizations:

Practice problems:

Textbook Section 2.3.3

Understanding takes practice - keep working with examples!
