STAT 7 - Statistical Methods for the Biological, Environmental & Health Sciences
10 Mar 2026
PollEv.com/slugstats
Which concept from Tuesday needs more clarification?
A. Conditional probability formula
B. Independence vs. dependence
C. Tree diagrams
D. I’m good with all of these!
Organizing sequential probability problems
Problem: Many probability problems involve sequences of events
Examples:
Solution: Tree diagrams help us:
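Scenario: In a population, 87.4% are insured and 12.6% are not.
Among those with insurance: 90% have excellent/very good health; 10% do not.
Among those without insurance: 72% have excellent/very good health; 28% do not.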
Question: What’s the probability someone is insured AND has excellent/very good health?
         ┌─ Insured (0.874)
    ○────┤
         └─ Not Insured (0.126)

                ┌─ Excellent/VGood (0.90) ─→ 0.874 × 0.90 = 0.787
                │
    Insured ────○
    (0.874)     │
                └─ Not Exc/VGood (0.10) ─→ 0.874 × 0.10 = 0.087

                ┌─ Excellent/VGood (0.72) ─→ 0.126 × 0.72 = 0.091
                │
    Not Insur ──○
    (0.126)     │
                └─ Not Exc/VGood (0.28) ─→ 0.126 × 0.28 = 0.035
Each path represents a joint probability:
Path 1:
Insured AND Excellent Health
0.874 × 0.90 = 0.787
Path 2:
Insured AND Not Excellent
0.874 × 0.10 = 0.087
Path 3:
Not Insured AND Excellent
0.126 × 0.72 = 0.091
Path 4:
Not Insured AND Not Excellent
0.126 × 0.28 = 0.035
Check: 0.787 + 0.087 + 0.091 + 0.035 = 1.000 ✓
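The four path products can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides; the dictionary layout and variable names are illustrative assumptions.

```python
# First-level branches of the tree: P(insurance status)
p_insurance = {"Insured": 0.874, "Not Insured": 0.126}

# Second-level branches: P(health status | insurance status)
p_health_given = {
    "Insured": {"Excellent/VGood": 0.90, "Not Exc/VGood": 0.10},
    "Not Insured": {"Excellent/VGood": 0.72, "Not Exc/VGood": 0.28},
}

# Multiply along each path to get the joint probability of that path
joint = {
    (status, health): p_status * p_health
    for status, p_status in p_insurance.items()
    for health, p_health in p_health_given[status].items()
}

for path, p in joint.items():
    print(path, round(p, 3))

# The four paths cover every possible outcome, so they should sum to 1
print("Total:", round(sum(joint.values()), 3))
```

Storing the second level as conditional probabilities mirrors how the tree is drawn: each inner dictionary is one set of branches and sums to 1 on its own.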
From the tree diagram, we can see:
General Multiplication Rule
For any two events A and B:
\[P(A \text{ and } B) = P(B) \times P(A|B)\]
or equivalently:
\[P(A \text{ and } B) = P(A) \times P(B|A)\]
Note: This works whether or not A and B are independent!
When independent: P(A|B) = P(A), so we get back P(A and B) = P(A) × P(B)
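As a rough check of the rule, the sketch below compares the joint probability from the general multiplication rule with the product P(A) × P(B) that would hold only under independence. It reuses the insurance tree numbers; the variable names are illustrative assumptions.

```python
p_insured = 0.874
p_exc_given_insured = 0.90
p_exc_given_not_insured = 0.72

# General multiplication rule: P(A and B) = P(A) * P(B|A)
p_insured_and_exc = p_insured * p_exc_given_insured

# Overall P(Excellent): add the two tree paths that end in Excellent
p_exc = p_insured_and_exc + (1 - p_insured) * p_exc_given_not_insured

# This product equals the joint probability only if the events are independent
p_if_independent = p_insured * p_exc

print("Joint (general rule):    ", round(p_insured_and_exc, 3))
print("P(A) x P(B) (indep. only):", round(p_if_independent, 3))
```

The two values differ (about 0.787 versus 0.767), which suggests that insurance status and health status are dependent in this example, so the simpler product rule would give the wrong answer here.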
Your Task
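Scenario: A disease (D) affects 5% of a population. A medical test has: a 95% chance of a positive result for people with the disease and a 10% chance of a positive result for people without it (ND)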
Individual (3 min): Work on your own to
Pair (4 min): Compare diagrams and calculations
Poll: PollEv.com/slugstats
               ┌─ Test + (0.95) ─→ 0.05 × 0.95 = 0.0475
               │
    D ○────────┤
    (0.05)     │
               └─ Test - (0.05) ─→ 0.05 × 0.05 = 0.0025

               ┌─ Test + (0.10) ─→ 0.95 × 0.10 = 0.0950
               │
    ND ○───────┤
    (0.95)     │
               └─ Test - (0.90) ─→ 0.95 × 0.90 = 0.8550
From our medical test tree:
Surprising finding:
Among those who test positive, more than half (9.5% out of 14.25%) don’t actually have the disease!
This is why we need Bayes’ Theorem (today’s topic) to calculate P(Disease | Test+)
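The "more than half" claim can be checked directly from the two Test + paths of the tree. A minimal sketch, with variable names chosen for illustration:

```python
p_disease = 0.05
p_pos_given_disease = 0.95
p_pos_given_no_disease = 0.10

# Joint probabilities for the two paths that end in a positive test
true_pos = p_disease * p_pos_given_disease             # D and Test +
false_pos = (1 - p_disease) * p_pos_given_no_disease   # ND and Test +

# Total probability of a positive test is the sum of those two paths
p_pos = true_pos + false_pos

# Share of positive tests that come from people without the disease
share_false = false_pos / p_pos
share_true = true_pos / p_pos

print("P(Test +):", round(p_pos, 4))
print("Share of positives without disease:", round(share_false, 3))
print("Share of positives with disease:", round(share_true, 3))
```

With these inputs about two thirds of positive results belong to people without the disease, even though the test picks up 95% of true cases.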
Another tool for organizing probability information
A contingency table (also called a two-way table) displays the relationship between two categorical variables.
Uses:
Alternative to: Tree diagrams (same information, different format)
| | Excellent/VG | Not Exc/VG | Total |
|---|---|---|---|
| Insured | 0.787 | 0.087 | 0.874 |
| Not Insured | 0.091 | 0.035 | 0.126 |
| Total | 0.878 | 0.122 | 1.000 |
Note: This contains exactly the same information as our tree diagram!
| | Excellent/VG | Not Exc/VG | Total |
|---|---|---|---|
| Insured | 0.787 | 0.087 | 0.874 |
| Not Insured | 0.091 | 0.035 | 0.126 |
| Total | 0.878 | 0.122 | 1.000 |
Question: What’s P(Excellent Health | Insured)?
\[P(\text{Excellent | Insured}) = \frac{P(\text{Excellent and Insured})}{P(\text{Insured})}\]
From table:
\[P(\text{Excellent | Insured}) = \frac{0.787}{0.874} = 0.900 = 90\%\]
Important
Tip: For P(A|B), find row/column for B, then look at proportion for A within that row/column
Multiple Conditional Probabilities
| | Excellent/VG | Not Exc/VG | Total |
|---|---|---|---|
| Insured | 0.787 | 0.087 | 0.874 |
| Not Insured | 0.091 | 0.035 | 0.126 |
| Total | 0.878 | 0.122 | 1.000 |
Calculate:
Individual (2 min) → Pair (2 min) → Share
Poll: PollEv.com/slugstats
P(Excellent | Not Insured) = 0.091 / 0.126 = 0.722 = 72.2%
P(Insured | Excellent) = 0.787 / 0.878 = 0.896 = 89.6%
P(Not Insured | Not Excellent) = 0.035 / 0.122 = 0.287 = 28.7%
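The three answers above follow one pattern: divide a cell by the total of the row or column being conditioned on. A small Python sketch of that pattern, assuming a nested-dictionary layout for the table (the helper names are hypothetical):

```python
# Joint probabilities from the contingency table
# (rows: insurance status, columns: health status)
table = {
    "Insured": {"Excellent": 0.787, "Not Excellent": 0.087},
    "Not Insured": {"Excellent": 0.091, "Not Excellent": 0.035},
}

def row_total(row):
    """Marginal probability of a row category."""
    return sum(table[row].values())

def col_total(col):
    """Marginal probability of a column category."""
    return sum(table[row][col] for row in table)

def p_col_given_row(col, row):
    """P(column category | row category) = cell / row total."""
    return table[row][col] / row_total(row)

def p_row_given_col(row, col):
    """P(row category | column category) = cell / column total."""
    return table[row][col] / col_total(col)

print(round(p_col_given_row("Excellent", "Not Insured"), 3))
print(round(p_row_given_col("Insured", "Excellent"), 3))
print(round(p_row_given_col("Not Insured", "Not Excellent"), 3))
```

Keeping two separate helpers makes the direction of conditioning explicit: the same cell (for example 0.787) gives different answers depending on whether it is divided by a row total or a column total.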
Interpretation tips:
Tree Diagrams:
Contingency Tables:
Bottom line: Use whichever helps YOU understand the problem better!
When conditional probability goes wrong in court
The Case:
A woman’s purse was snatched in Los Angeles. Witnesses described:
The Couple:
Malcolm and Janet Collins matched this description
The Evidence:
Prosecutor used probability to argue guilt
The prosecutor brought in a mathematician who estimated:
| Characteristic | Probability |
|---|---|
| Yellow car | 1 in 10 |
| Man with mustache | 1 in 4 |
| Woman with ponytail | 1 in 10 |
| Woman with blonde hair | 1 in 3 |
| Black man with beard | 1 in 10 |
| Interracial couple in car | 1 in 1000 |
Claimed calculation:
(1/10) × (1/4) × (1/10) × (1/3) × (1/10) × (1/1000) = 1 in 12,000,000
“Only one in 12 million couples match this description!”
The prosecutor argued:
“The probability that another couple matching this description exists is only 1 in 12 million, so the Collinses must be guilty.”
But wait…
What probability did we actually calculate?
What probability do we actually NEED?
What was calculated:
P(Matching description | Innocent random couple)
What’s needed for guilt:
P(Innocent | Matching description)
The Prosecutor’s Fallacy:
Confusing P(Evidence | Innocent) with P(Innocent | Evidence)
These are NOT the same!
The California Supreme Court overturned the conviction in 1968, citing misuse of probability.
P(DNA match | Innocent) = 1 in 1,000,000
P(Innocent | DNA match) = ?
Missing piece: How many people might have been at the crime scene? (Base rate!)
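To see how much the base rate matters, here is a hypothetical sketch. It assumes exactly one guilty person in a pool of possible sources, that the guilty person always matches, and that everyone in the pool is equally likely beforehand. None of these assumptions or pool sizes come from the slides; only the 1-in-1,000,000 match probability does.

```python
def p_innocent_given_match(pool_size, p_match_given_innocent=1e-6):
    """P(Innocent | DNA match) under the stated assumptions:
    one guilty person in the pool who matches with certainty,
    and an equal prior for every person in the pool."""
    expected_innocent_matches = (pool_size - 1) * p_match_given_innocent
    # One guilty match plus however many innocent matches we expect
    return expected_innocent_matches / (expected_innocent_matches + 1)

# Hypothetical pool sizes, chosen only to show the trend
for pool_size in (1_000, 100_000, 1_000_000, 10_000_000):
    print(f"{pool_size:>10,}  P(Innocent | match) = "
          f"{p_innocent_given_match(pool_size):.3f}")
```

Under these assumptions P(Innocent | match) is tiny for a small pool but rises to roughly one half once the pool holds about a million people, even though P(match | Innocent) never changes.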
The tool for reversing conditional probabilities
We often know:
- P(Evidence | Hypothesis), e.g., P(Positive test | Disease)
But we want:
- P(Hypothesis | Evidence), e.g., P(Disease | Positive test)
Bayes’ Theorem tells us how to reverse conditional probabilities!
Bayes’ Theorem
For events A and B:
\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]
Or more fully:
\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|A^C) \times P(A^C)}\]
Parts:
Start with conditional probability definition:
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
\[P(B|A) = \frac{P(B \text{ and } A)}{P(A)}\]
Since P(A and B) = P(B and A):
From second equation: \(P(B \text{ and } A) = P(B|A) \times P(A)\)
Substitute into first: \(P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\)
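Scenario: A disease affects 1% of the population. The test is 95% sensitive and 90% specific (10% false positive rate).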
Question: If someone tests positive, what’s the probability they have the disease?
What we want: P(Disease | +)
What we know: P(+ | Disease) = 0.95
Step 1: Identify what we know: P(Disease) = 0.01, P(+ | Disease) = 0.95, P(+ | No Disease) = 0.10
Step 2: Apply Bayes’ Theorem
\[P(\text{Disease}|+) = \frac{P(+|\text{Disease}) \times P(\text{Disease})}{P(+)}\]
Where \(P(+) = P(+|\text{Disease}) \times P(\text{Disease}) + P(+|\text{No Disease}) \times P(\text{No Disease})\)
Calculate denominator (total probability of +):
\[P(+) = P(+|\text{Disease}) \times P(\text{Disease}) + P(+|\text{No Disease}) \times P(\text{No Disease})\]
\[P(+) = (0.95)(0.01) + (0.10)(0.99) = 0.0095 + 0.099 = 0.1085\]
Calculate posterior:
\[P(\text{Disease}|+) = \frac{(0.95)(0.01)}{0.1085} = \frac{0.0095}{0.1085} = 0.0876\]
Result: Only 8.76% of people who test positive actually have the disease!
The test is 95% sensitive and 90% specific, yet a positive result means only an 8.76% chance of disease?
Why?
This is why screening rare diseases is challenging!
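The calculation above can be wrapped in a small function. This is a sketch of the expanded form of Bayes' Theorem for a two-way split (A and its complement); the function and argument names are illustrative assumptions.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) for a hypothesis H and its complement.

    prior                  -- P(H)
    p_evidence_given_h     -- P(E | H)
    p_evidence_given_not_h -- P(E | not H)
    """
    numerator = p_evidence_given_h * prior
    # Total probability of the evidence: sum over H and not H
    total = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / total

# Disease example: prevalence 1%, P(+|Disease) = 0.95, P(+|No Disease) = 0.10
p_disease_given_pos = bayes_posterior(
    prior=0.01,
    p_evidence_given_h=0.95,
    p_evidence_given_not_h=0.10,
)
print(round(p_disease_given_pos, 4))
```

The third argument is the false positive rate, that is, 1 minus the specificity, which is why a 90% specific test enters the formula as 0.10.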
               ┌─ Test + (0.95) ─→ 100 × 0.95 = 95 [TRUE POS]
               │
    D ○────────┤
    (100)      │
               └─ Test - (0.05) ─→ 100 × 0.05 = 5 [FALSE NEG]

               ┌─ Test + (0.10) ─→ 9900 × 0.10 = 990 [FALSE POS]
               │
    ND ○───────┤
    (9900)     │
               └─ Test - (0.90) ─→ 9900 × 0.90 = 8910 [TRUE NEG]
Among 95 + 990 = 1085 positive tests, only 95 are true positives!
95 / 1085 = 8.76%
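The same counting argument can be sketched in Python using whole numbers of people. The population of 10,000 matches the tree; integer arithmetic is used here only so the counts stay exact.

```python
population = 10_000

# Split the population by disease status (1% prevalence)
diseased = population * 1 // 100
healthy = population - diseased

# Split each group by test result (95% sensitivity, 10% false positive rate)
true_pos = diseased * 95 // 100
false_neg = diseased - true_pos
false_pos = healthy * 10 // 100
true_neg = healthy - false_pos

# Among everyone who tests positive, what fraction has the disease?
all_pos = true_pos + false_pos
ppv = true_pos / all_pos

print("Positives:", all_pos, "of which true positives:", true_pos)
print(f"P(Disease | +) = {ppv:.2%}")
```

Counting people rather than multiplying probabilities gives the same answer as Bayes' Theorem, and it makes the source of the surprise visible: the 990 false positives come from the much larger healthy group.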
Your task
Scenario: A disease affects 10% of the population (instead of 1%). Same test (95% sensitive, 90% specific).
Tasks:
Individual (3 min): Work through the calculation
Pair (4 min): Compare answers and interpretation
Share: How did the result change?
PollEv.com/slugstats - What’s your calculated PPV?
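Tree for 10,000 people: 1,000 have the disease (950 test +, 50 test -); 9,000 do not (900 test +, 8,100 test -). PPV = 950 / 1,850 = 51.4%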
Bayes’ Theorem: \[P(\text{Disease}|+) = \frac{(0.95)(0.10)}{(0.95)(0.10) + (0.10)(0.90)} = \frac{0.095}{0.185} = 0.514 = 51.4\%\]
Interpretation: When prevalence is 10%, a positive test means a 51.4% chance of disease (vs. 8.76% when prevalence was 1%)
| Prevalence | P(Disease \| Positive Test) |
|---|---|
| 1% | 8.76% |
| 10% | 51.4% |
| 20% | 70.4% |
| 50% | 90.5% |
Same test accuracy, dramatically different interpretation!
Clinical Lesson: The predictive value of a test depends heavily on disease prevalence in the population being tested.
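The table can be regenerated by holding the test fixed and varying only the prevalence. A minimal sketch, assuming the same 95% sensitivity and 90% specificity; the function name `ppv` is an illustrative choice.

```python
def ppv(prevalence, sensitivity=0.95, specificity=0.90):
    """P(Disease | Positive Test) from Bayes' Theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same test, four different populations
for prevalence in (0.01, 0.10, 0.20, 0.50):
    print(f"Prevalence {prevalence:.0%}: PPV = {ppv(prevalence):.1%}")
```

Because sensitivity and specificity are default arguments, the loop changes nothing about the test itself; the whole spread in PPV comes from the prevalence.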
Break Time! ☕ 5-minute break
Stretch, grab water, chat with neighbors!
When we return: Medical screening metrics (sensitivity, specificity, PPV, NPV)
Understanding test performance characteristics
| | Disease + | Disease - | Total |
|---|---|---|---|
| Test + | a (TP) | b (FP) | a + b |
| Test - | c (FN) | d (TN) | c + d |
| Total | a + c | b + d | n |
Key: TP = true positive, FP = false positive, FN = false negative, TN = true negative
Sensitivity (True Positive Rate)
\[\text{Sensitivity} = \frac{\text{TP}}{\text{TP + FN}} = \frac{a}{a+c}\]
Probability test is positive given person has disease: P(+ | Disease)
Specificity (True Negative Rate)
\[\text{Specificity} = \frac{\text{TN}}{\text{TN + FP}} = \frac{d}{b+d}\]
Probability test is negative given person doesn’t have disease: P(- | No Disease)
Positive Predictive Value (PPV)
\[\text{PPV} = \frac{\text{TP}}{\text{TP + FP}} = \frac{a}{a+b}\]
Probability person has disease given test is positive: P(Disease | +)
Negative Predictive Value (NPV)
\[\text{NPV} = \frac{\text{TN}}{\text{TN + FN}} = \frac{d}{c+d}\]
Probability person doesn’t have disease given test is negative: P(No Disease | -)
Sensitivity & Specificity:
PPV & NPV:
Key Distinction:
Sensitivity/specificity = test characteristics
PPV/NPV = clinical interpretation (prevalence-dependent)
A study of 1000 people screened for diabetes:
| | Diabetes | No Diabetes | Total |
|---|---|---|---|
| Screen + | 85 | 50 | 135 |
| Screen - | 15 | 850 | 865 |
| Total | 100 | 900 | 1000 |
Calculate:
Sensitivity = TP / (TP + FN) = 85 / (85 + 15) = 85 / 100 = 85%
Specificity = TN / (TN + FP) = 850 / (850 + 50) = 850 / 900 = 94.4%
PPV = TP / (TP + FP) = 85 / (85 + 50) = 85 / 135 = 63.0%
NPV = TN / (TN + FN) = 850 / (850 + 15) = 850 / 865 = 98.3%
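All four metrics come from the same four cell counts, so they fit naturally in one helper. The sketch below applies the formulas above to the diabetes table; the function name and the dictionary it returns are illustrative assumptions.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute the four screening metrics from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # P(+ | Disease)
        "specificity": tn / (tn + fp),  # P(- | No Disease)
        "ppv": tp / (tp + fp),          # P(Disease | +)
        "npv": tn / (tn + fn),          # P(No Disease | -)
    }

# Diabetes screening study: 1000 people
diabetes = screening_metrics(tp=85, fp=50, fn=15, tn=850)

for name, value in diabetes.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity divide by column totals (disease status), while PPV and NPV divide by row totals (test result), which is one way to remember which pair depends on prevalence.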
For this diabetes screening test:
Strengths:
Limitations:
Clinical decision: Good as a screening tool, but positive results should be confirmed with more specific testing
Two scenarios for a cancer screening test
Scenario A: High sensitivity (95%), Low specificity (70%)
Scenario B: Low sensitivity (70%), High specificity (95%)
Questions:
Individual (3 min): Think through each scenario
Pair (3 min): Discuss tradeoffs
Share: What factors matter for your choice?
Initial Screening (want high sensitivity):
Confirmatory Testing (want high specificity):
Strategy: Use sensitive test first, follow up positives with specific test
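A rough sketch of why the two-step strategy helps: apply Bayes' Theorem once for the sensitive test, then again for the specific test, using the first result as the new prior. The 1% prevalence is a hypothetical value, not from the slides, and the sketch assumes the two tests err independently given disease status, which may not hold for real tests.

```python
def update(prior, sensitivity, specificity):
    """P(Disease | +) after one positive result, via Bayes' Theorem."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

prevalence = 0.01  # hypothetical starting prevalence (assumption)

# Step 1: sensitive screening test (Scenario A: 95% sens., 70% spec.)
after_screen = update(prevalence, sensitivity=0.95, specificity=0.70)

# Step 2: specific confirmatory test (Scenario B: 70% sens., 95% spec.),
# applied only to people who screened positive
after_confirm = update(after_screen, sensitivity=0.70, specificity=0.95)

print(f"After positive screen:       {after_screen:.1%}")
print(f"After positive confirmation: {after_confirm:.1%}")
```

Under these assumptions a positive screen alone raises the chance of disease only to about 3%, while a second positive on the specific test raises it to about 31%.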
Setting: Urban clinic, high-risk population
| | HIV + | HIV - | Total |
|---|---|---|---|
| Test + | 47 | 15 | 62 |
| Test - | 3 | 935 | 938 |
| Total | 50 | 950 | 1000 |
Calculate:
Then interpret: Is this a good screening test?
Sensitivity = 47 / 50 = 94%
(Among HIV+ people, 94% test positive)
Specificity = 935 / 950 = 98.4%
(Among HIV- people, 98.4% test negative)
PPV = 47 / 62 = 75.8%
(Among positive tests, 75.8% truly have HIV)
NPV = 935 / 938 = 99.7%
(Among negative tests, 99.7% truly don’t have HIV)
Interpretation: Excellent test! High sensitivity catches most cases, high specificity minimizes false alarms, and in this higher-prevalence population, the PPV is strong.
HIV Testing in General Population
Tuesday’s Setup:
Population prevalence: 0.2% in young adults
Test sensitivity: 99.7%
Test specificity: 98.5%
Now we have the tools to answer properly!
For 100,000 people:
| | HIV + | HIV - | Total |
|---|---|---|---|
| Test + | ? | ? | ? |
| Test - | ? | ? | ? |
| Total | 200 | 99,800 | 100,000 |
Given: Sensitivity = 99.7%, Specificity = 98.5%
| | HIV + | HIV - | Total |
|---|---|---|---|
| Test + | 199 | 1,497 | 1,696 |
| Test - | 1 | 98,303 | 98,304 |
| Total | 200 | 99,800 | 100,000 |
Calculations:
PPV = TP / (TP + FP) = 199 / (199 + 1,497) = 199 / 1,696 = 11.7%
What this means:
Clinical response:
Using Bayes directly:
\[P(\text{HIV}|+) = \frac{P(+|\text{HIV}) \times P(\text{HIV})}{P(+)}\]
\[= \frac{(0.997)(0.002)}{(0.997)(0.002) + (0.015)(0.998)}\]
\[= \frac{0.001994}{0.001994 + 0.01497} = \frac{0.001994}{0.016964} = 0.1175 = 11.75\%\]
Same answer! Bayes’ Theorem and the 2×2 table are just different ways of organizing the same calculation.
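The equivalence can be checked numerically by building the expected 2×2 table and comparing its PPV with the direct Bayes calculation. A minimal sketch; the function name is hypothetical, and the cell counts are kept as unrounded expected values, so they differ slightly from the whole-person counts in the table above.

```python
def expected_table(n, prevalence, sensitivity, specificity):
    """Expected (unrounded) 2x2 cell counts for n people."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity   # true positives
    fn = diseased - tp            # false negatives
    tn = healthy * specificity    # true negatives
    fp = healthy - tn             # false positives
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

cells = expected_table(100_000, prevalence=0.002,
                       sensitivity=0.997, specificity=0.985)

# PPV read off the table
ppv_from_table = cells["TP"] / (cells["TP"] + cells["FP"])

# PPV from Bayes' Theorem directly
ppv_from_bayes = (0.997 * 0.002) / (
    0.997 * 0.002 + (1 - 0.985) * (1 - 0.002)
)

print(f"Table: {ppv_from_table:.2%}   Bayes: {ppv_from_bayes:.2%}")
```

The population size cancels out of the table-based ratio, which is why any convenient n (100,000 here) gives the same PPV as the formula.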
Your task
Scenario: Same test, but now in a high-risk population where prevalence is 5% (not 0.2%)
Tasks:
Individual (3 min): Work through the numbers
Pair (3 min): Discuss the dramatic difference
Share: Clinical implications?
PollEv.com/slugstats - What’s the PPV with 5% prevalence?
For 10,000 people with 5% prevalence:
| | HIV + | HIV - | Total |
|---|---|---|---|
| Test + | 499 | 143 | 642 |
| Test - | 1 | 9,357 | 9,358 |
| Total | 500 | 9,500 | 10,000 |
PPV = 499 / 642 = 77.7%
Comparison: 0.2% prevalence gives PPV = 11.7%; 5% prevalence gives PPV = 77.7%
Same test, dramatically different interpretation based on who you’re testing!
Synthesis and looking ahead
Bayes’ Theorem: Tool for reversing conditional probabilities \[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]
Sensitivity & Specificity: Test characteristics (independent of prevalence)
PPV & NPV: Clinical interpretation (depend on prevalence!)
Base Rate Effect: Prevalence dramatically affects predictive values
Prosecutor’s Fallacy: Don’t confuse P(A|B) with P(B|A)
Why This Matters:
Life Skill: Always ask yourself:
- What’s the base rate?
- Am I confusing the direction of conditional probability?
- What additional information do I need?
Next Week (Week 4):
Why it matters: Random variables let us model uncertainty systematically—essential for statistical inference!
Exit ticket
PollEv.com/slugstats
A rare disease (0.1% prevalence) has a test with 99% sensitivity and 99% specificity. What’s approximately the PPV?
A. 99%
B. 90%
C. 50%
D. 9%
E. I need to calculate it!
Given:
- Prevalence = 0.1% = 0.001
- Sensitivity = 0.99
- Specificity = 0.99
Bayes: \[\text{PPV} = \frac{(0.99)(0.001)}{(0.99)(0.001) + (0.01)(0.999)} = \frac{0.00099}{0.01098} \approx 0.090 = 9\%\]
Answer: D - Only about 9%!
Even with 99% sensitivity and 99% specificity, the low prevalence means most positive tests are false positives.
Two weeks of probability - why?
Because understanding uncertainty is:
You’ve learned powerful tools!
From here forward, we build on probability to develop statistical inference methods.
Want to explore more?
Check out these interactive visualizations:
Practice problems:
- Textbook Section 2.3.3
Understanding takes practice - keep working with examples!
STAT 7 – Winter 2026