HW3: Probability and Medical Testing
Understanding Chance and Diagnostic Results in Medicine
🧬 The Case
Welcome back, Statistical Detective!
Dr. Sarah Martinez from the County Public Health Department has a critical challenge for you. “Understanding probability is essential for public health,” she explains. “We need to assess disease risks, interpret genetic test results, evaluate screening programs, and help patients understand what their test results really mean. The county is facing questions about genetic disorders, disease transmission, and the reliability of medical tests.”
A woman with a positive mammogram asks, “Do I have cancer?” A man with a positive prostate screening asks, “What are the chances this is wrong?” Your mission: Use probability concepts to help the Public Health Department make informed decisions and communicate results clearly to patients!
Question 1: Probability Vocabulary and Basic Concepts
Dr. Martinez wants to ensure everyone on her team understands probability terminology.
a. Define in your own words
Provide a definition and a biological or medical example for each term:
- Sample space:
  - Definition:
  - Example:
- Event:
  - Definition:
  - Example:
- Probability:
  - Definition:
  - Example:
b. Blood type probabilities
A biology lab is studying blood types in a population. They collected data from 200 randomly selected individuals:
| Blood Type | Frequency |
|---|---|
| O | 90 |
| A | 72 |
| B | 28 |
| AB | 10 |
Calculate the probability that a randomly selected person has:
- Type O blood: P(O) =
- Type A blood: P(A) =
- Type AB blood: P(AB) =
Show your work using the formula: P(event) = favorable outcomes / total outcomes
c. Interpretation
If 1,000 people from this population donate blood, approximately how many would you expect to have Type O blood? Explain your reasoning.
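To check your hand calculations for parts b and c, here is a minimal Python sketch. It treats the sample proportions as estimates of the population proportions, which assumes the 200 people are a representative random sample. The variable names are illustrative only.

```python
from fractions import Fraction

# Observed blood-type counts from the sample of 200 individuals (Question 1b)
counts = {"O": 90, "A": 72, "B": 28, "AB": 10}
total = sum(counts.values())

# P(event) = favorable outcomes / total outcomes, kept as exact fractions
probs = {blood_type: Fraction(n, total) for blood_type, n in counts.items()}

for blood_type, p in probs.items():
    print(f"P({blood_type}) = {counts[blood_type]}/{total} = {float(p):.2f}")

# Expected number of Type O donors among 1,000 people: n * P(O)
donors = 1000
expected_o = donors * probs["O"]
print(f"Expected Type O donors out of {donors}: {float(expected_o):.0f}")
```

Using `Fraction` keeps the probabilities exact, so you can also confirm that the four blood-type probabilities add to exactly 1.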
Question 2: Addition Rule (OR Probabilities)
The Public Health Department is tracking flu and COVID cases in the community.
In a sample of 500 patients at a clinic:
- 150 tested positive for flu
- 100 tested positive for COVID
- 30 tested positive for both flu and COVID
a. Venn diagram
Draw and label a Venn diagram showing these events. Include the number of patients in each region.
b. Calculate probabilities
Calculate:
- P(Flu) =
- P(COVID) =
- P(Flu AND COVID) =
- P(Flu OR COVID) =
For P(Flu OR COVID), show your work using the addition rule: P(A or B) = P(A) + P(B) - P(A and B)
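The following Python sketch, with illustrative variable names, splits the patients into the regions of the Venn diagram and confirms that the addition rule gives the same answer as adding the non-overlapping regions directly.

```python
from fractions import Fraction

n_patients = 500
flu = 150
covid = 100
both = 30

# Regions inside the two circles of the Venn diagram (patient counts)
flu_only = flu - both        # flu but not COVID
covid_only = covid - both    # COVID but not flu

p_flu = Fraction(flu, n_patients)
p_covid = Fraction(covid, n_patients)
p_both = Fraction(both, n_patients)

# Addition rule: subtract the overlap so patients with both illnesses are counted once
p_flu_or_covid = p_flu + p_covid - p_both

# Cross-check by adding the three non-overlapping regions directly
assert p_flu_or_covid == Fraction(flu_only + covid_only + both, n_patients)

print(f"Flu only: {flu_only}, COVID only: {covid_only}, both: {both}")
print(f"P(Flu OR COVID) = {float(p_flu_or_covid):.2f}")
```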
c. Interpret
What percentage of patients tested positive for at least one respiratory illness? Explain why we need to subtract P(Flu AND COVID) in the addition rule.
d. Neither illness
What’s the probability that a randomly selected patient tested negative for both illnesses? (Hint: This is the complement of “Flu OR COVID”)
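A short Python sketch of the complement rule for part d. It recomputes P(Flu OR COVID) so it runs on its own, then checks the answer against a separate count of the patients outside both circles of the Venn diagram. Variable names are illustrative.

```python
from fractions import Fraction

n_patients = 500
flu, covid, both = 150, 100, 30

# Addition rule gives P(Flu OR COVID), the chance of at least one positive test
p_at_least_one = Fraction(flu + covid - both, n_patients)

# Complement rule: P(negative for both) = 1 - P(Flu OR COVID)
p_neither = 1 - p_at_least_one

# Cross-check: subtract every region inside the circles from the total
flu_only = flu - both
covid_only = covid - both
neither_count = n_patients - flu_only - covid_only - both
assert p_neither == Fraction(neither_count, n_patients)

print(f"P(neither) = {float(p_neither):.2f} ({neither_count} of {n_patients} patients)")
```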
Question 3: Multiplication Rule and Independence
a. Define independence
In your own words, explain what it means for two events to be independent. Give a biological example of:
- Two independent events
- Two dependent events
b. Genetic independence
A couple is planning to have two children. Assume the probability of a boy is 0.5 and the probability of a girl is 0.5, and that the sex of each child is independent.
Calculate:
- P(first child is a boy) =
- P(second child is a boy) =
- P(both children are boys) =
- Show how you used the multiplication rule for independent events
c. At least one girl
What’s the probability the couple has at least one girl? (Hint: Use the complement rule)
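For parts b and c, a minimal Python sketch applies the multiplication and complement rules, then double-checks them by listing the four equally likely outcomes. It assumes, as the question does, that P(boy) = P(girl) = 0.5 and that the two births are independent.

```python
from fractions import Fraction
from itertools import product

# Assumed, as in the question: P(boy) = P(girl) = 1/2, births independent
p = {"B": Fraction(1, 2), "G": Fraction(1, 2)}

# Multiplication rule for independent events: P(boy AND boy) = P(boy) * P(boy)
p_both_boys = p["B"] * p["B"]

# Complement rule: "at least one girl" is the complement of "both boys"
p_at_least_one_girl = 1 - p_both_boys

# Cross-check by listing the sample space: BB, BG, GB, GG
outcomes = ["".join(pair) for pair in product("BG", repeat=2)]
p_by_listing = sum(p[o[0]] * p[o[1]] for o in outcomes if "G" in o)
assert p_by_listing == p_at_least_one_girl

print("Sample space:", outcomes)
print(f"P(both boys) = {p_both_boys}, P(at least one girl) = {p_at_least_one_girl}")
```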
Question 4: Tree Diagrams for Genetics
A genetic counselor is working with a family. Sarah is a carrier of a gene for cystic fibrosis. Each of her children has a 50% chance of inheriting the gene.
a. Draw a tree diagram
Draw a complete tree diagram for Sarah having two children, showing:
- First child: carrier or not carrier (50% each)
- Second child: carrier or not carrier (50% each)
- Label all branches with probabilities
- Show all possible outcomes with their probabilities
b. Calculate probabilities
Using your tree diagram, calculate:
- P(both children are carriers) =
- P(neither child is a carrier) =
- P(exactly one child is a carrier) =
- P(at least one child is a carrier) =
Show your work for each calculation.
c. Verify
Do all the probabilities for the final outcomes sum to 1? Show this calculation.
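Each path through the tree diagram is one final outcome, and its probability is the product of the branch probabilities along that path. A Python sketch of that idea follows; the helper `prob` is hypothetical, written only for this illustration, and the sketch assumes the two children inherit the gene independently.

```python
from fractions import Fraction
from itertools import product

# Each child independently inherits the gene with probability 1/2 (Question 4)
branch = {"carrier": Fraction(1, 2), "not carrier": Fraction(1, 2)}

# One entry per path through the tree: (first child, second child) -> probability
paths = {
    (first, second): branch[first] * branch[second]
    for first, second in product(branch, repeat=2)
}


def prob(condition):
    """Add up the probabilities of every path whose number of carriers satisfies condition."""
    return sum(p for path, p in paths.items() if condition(path.count("carrier")))


p_both = prob(lambda k: k == 2)
p_neither = prob(lambda k: k == 0)
p_exactly_one = prob(lambda k: k == 1)
p_at_least_one = prob(lambda k: k >= 1)

for (first, second), p in paths.items():
    print(f"first: {first:<11} second: {second:<11} probability: {p}")
print("Sum of all outcomes:", sum(paths.values()))
```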
Question 5: Contingency Tables and Joint Probabilities
A study examined the relationship between exercise habits and heart disease in 1,000 adults:
| | Heart Disease | No Heart Disease | Total |
|---|---|---|---|
| Regular Exercise | 80 | 420 | 500 |
| No Regular Exercise | 180 | 320 | 500 |
| Total | 260 | 740 | 1000 |
a. Marginal probabilities
Calculate:
- P(Regular Exercise) =
- P(Heart Disease) =
b. Joint probabilities
Calculate:
- P(Regular Exercise AND Heart Disease) =
- P(No Regular Exercise AND Heart Disease) =
- P(Regular Exercise AND No Heart Disease) =
- P(No Regular Exercise AND No Heart Disease) =
Verify that all four joint probabilities sum to 1.
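The marginal and joint probabilities can be checked with a small Python sketch that stores the four interior cells of the table. A joint probability divides one cell by the grand total; a marginal probability adds the joint probabilities along a row or down a column. The dictionary layout is just one convenient choice.

```python
from fractions import Fraction

# Interior cells of the exercise / heart disease table (Question 5)
table = {
    ("Regular Exercise", "Heart Disease"): 80,
    ("Regular Exercise", "No Heart Disease"): 420,
    ("No Regular Exercise", "Heart Disease"): 180,
    ("No Regular Exercise", "No Heart Disease"): 320,
}
n = sum(table.values())

# Joint probability: cell count / grand total
joint = {cell: Fraction(count, n) for cell, count in table.items()}

# Marginal probability: add joint probabilities across a row or down a column
p_exercise = sum(p for (row, _), p in joint.items() if row == "Regular Exercise")
p_heart_disease = sum(p for (_, col), p in joint.items() if col == "Heart Disease")

for (row, col), p in joint.items():
    print(f"P({row} AND {col}) = {float(p):.2f}")
print(f"P(Regular Exercise) = {float(p_exercise):.2f}")
print(f"P(Heart Disease) = {float(p_heart_disease):.2f}")
print("Joint probabilities sum to:", sum(joint.values()))
```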
c. Public health interpretation
Among people who exercise regularly, what proportion have heart disease? Among people who don’t exercise regularly, what proportion have heart disease? What does this suggest?
Question 6: Understanding Conditional Probability
a. Notation and meaning
Explain what P(A|B) means in your own words, then provide a medical example.
b. Calculate from contingency table
Using the exercise and heart disease data from Question 5, calculate:
- P(Heart Disease | Regular Exercise) =
- P(Heart Disease | No Regular Exercise) =
Show your work and interpret: How many times higher is the risk of heart disease for non-exercisers compared to regular exercisers?
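A conditional probability uses only the group named after the bar as its denominator. This Python sketch computes both conditional risks from the row totals and their ratio; the name `risk_ratio` is illustrative.

```python
from fractions import Fraction

# Row counts from the Question 5 table
exercise_hd, exercise_total = 80, 500
no_exercise_hd, no_exercise_total = 180, 500

# P(HD | group) = heart disease count in that row / row total
p_hd_given_exercise = Fraction(exercise_hd, exercise_total)
p_hd_given_no_exercise = Fraction(no_exercise_hd, no_exercise_total)

# How many times higher the risk is for non-exercisers
risk_ratio = p_hd_given_no_exercise / p_hd_given_exercise

print(f"P(HD | Regular Exercise) = {float(p_hd_given_exercise):.2f}")
print(f"P(HD | No Regular Exercise) = {float(p_hd_given_no_exercise):.2f}")
print(f"Risk ratio = {float(risk_ratio):.2f}")
```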
c. Reverse conditional
Calculate:
- P(Regular Exercise | Heart Disease) =
- P(No Regular Exercise | Heart Disease) =
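Reversing the condition changes the denominator from a row total to a column total. In this Python sketch the same 80 patients form the numerator of both P(Heart Disease | Regular Exercise) and P(Regular Exercise | Heart Disease), yet the two probabilities differ because the denominators differ.

```python
from fractions import Fraction

# Heart Disease column from the Question 5 table
hd_with_exercise = 80
hd_without_exercise = 180
hd_total = hd_with_exercise + hd_without_exercise

# Condition on Heart Disease: divide by the column total
p_exercise_given_hd = Fraction(hd_with_exercise, hd_total)
p_no_exercise_given_hd = Fraction(hd_without_exercise, hd_total)

# The other direction divides the same 80 by the Regular Exercise row total (500)
p_hd_given_exercise = Fraction(hd_with_exercise, 500)

print(f"P(Regular Exercise | HD) = {float(p_exercise_given_hd):.3f}")
print(f"P(No Regular Exercise | HD) = {float(p_no_exercise_given_hd):.3f}")
print(f"P(HD | Regular Exercise) = {float(p_hd_given_exercise):.3f}")
```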
d. Important distinction
Explain the difference between P(Heart Disease | Regular Exercise) and P(Regular Exercise | Heart Disease).
Why is this distinction important?
Question 7: Tree Diagrams for Diagnostic Testing
A rapid COVID test has the following characteristics:
- Community infection rate: 10%
- Sensitivity (correctly identifies infected people): 85%
- Specificity (correctly identifies uninfected people): 95%
a. Draw a tree diagram
Create a tree diagram with:
- First branch: Infected (10%) or Not Infected (90%)
- Second branches: Test Positive or Test Negative (for each first branch)
- Label all branches with probabilities
- Calculate the probability for each final outcome
b. Calculate outcomes
For a population of 1,000 people, use your tree diagram to calculate:
- How many are actually infected?
- How many infected people test positive (true positives)?
- How many infected people test negative (false negatives)?
- How many uninfected people test positive (false positives)?
- How many uninfected people test negative (true negatives)?
c. Conditional probability question
If someone tests positive, what’s the probability they’re actually infected?
P(Infected | Test Positive) =
Show your work step by step.
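The tree diagram translates directly into expected counts. This Python sketch assumes a hypothetical group of exactly 1,000 people who match the stated rates, multiplies along each branch, and then keeps only the positive results. Plain floating-point arithmetic can show tiny rounding noise (for example 15.000000000000002) if you print without formatting.

```python
# Rapid COVID test (Question 7), applied to a hypothetical group of 1,000 people
population = 1000
prevalence = 0.10
sensitivity = 0.85
specificity = 0.95

# First-level branches of the tree
infected = population * prevalence
not_infected = population - infected

# Second-level branches, converted to expected counts
true_pos = infected * sensitivity
false_neg = infected * (1 - sensitivity)
false_pos = not_infected * (1 - specificity)
true_neg = not_infected * specificity

# P(Infected | Test Positive) = true positives / all positives
ppv = true_pos / (true_pos + false_pos)

print(f"Infected: {infected:.0f}, TP: {true_pos:.0f}, FN: {false_neg:.0f}, "
      f"FP: {false_pos:.0f}, TN: {true_neg:.0f}")
print(f"P(Infected | Test Positive) = {ppv:.3f}")
```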
Question 8: Creating Contingency Tables for Screening
A screening test for a rare genetic disorder is being evaluated in a population where 2% of people have the disorder.
The test has:
- Sensitivity = 95% (correctly identifies 95% of people with the disorder)
- Specificity = 90% (correctly identifies 90% of people without the disorder)
a. Complete the contingency table
For 10,000 people, fill in all cells:
| | Disease Present | Disease Absent | Total |
|---|---|---|---|
| Test Positive | | | |
| Test Negative | | | |
| Total | | | 10,000 |
Show your calculations for each cell.
b. Calculate key probabilities
From your table, calculate:
- P(Test Positive) =
- P(Disease Present | Test Positive) =
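Filling the table can be automated with a small function. `screening_table` below is a hypothetical helper, not part of any library; it returns the four expected cell counts, and exact fractions keep the counts as whole numbers for these inputs.

```python
from fractions import Fraction


def screening_table(n, prevalence, sensitivity, specificity):
    """Return the expected cell counts of a 2x2 screening table (hypothetical helper)."""
    present = n * prevalence
    absent = n - present
    return {
        "TP": present * sensitivity,        # Test Positive, Disease Present
        "FP": absent * (1 - specificity),   # Test Positive, Disease Absent
        "FN": present * (1 - sensitivity),  # Test Negative, Disease Present
        "TN": absent * specificity,         # Test Negative, Disease Absent
    }


# Question 8 inputs, written as exact fractions to avoid rounding noise
cells = screening_table(10_000, Fraction(2, 100), Fraction(95, 100), Fraction(90, 100))

positives = cells["TP"] + cells["FP"]
p_test_positive = positives / 10_000
p_disease_given_positive = cells["TP"] / positives

for name, count in cells.items():
    print(f"{name}: {count}")
print(f"P(Test Positive) = {float(p_test_positive):.4f}")
print(f"P(Disease Present | Test Positive) = {float(p_disease_given_positive):.4f}")
```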
c. Interpretation
How many people who test positive actually have the disease? How many are false positives? What might this mean for people who receive positive test results?
Question 9: Bayes’ Theorem Introduction
a. State Bayes’ Theorem
Write out Bayes’ Theorem for calculating P(Disease | Positive Test).
b. Apply Bayes’ Theorem
Use the genetic disorder screening data from Question 8. Apply Bayes’ Theorem to calculate P(Disease Present | Test Positive).
Step 1: Identify the components
- P(Disease) =
- P(Positive | Disease) =
- P(Positive | No Disease) =
Step 2: Calculate P(Positive)
Step 3: Apply Bayes’ Theorem
Step 4: Verify this matches your answer from Question 8b.
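Bayes' Theorem can also be written as a few lines of Python. This sketch (the function name `bayes_posterior` is illustrative) computes P(Positive) with the law of total probability and then divides, so its result should match the table-based answer from Question 8b.

```python
from fractions import Fraction


def bayes_posterior(prior, p_pos_given_disease, p_pos_given_no_disease):
    """Return (P(Disease | Positive), P(Positive)) using Bayes' Theorem."""
    # Law of total probability: P(+) = P(+ | D) P(D) + P(+ | no D) P(no D)
    p_positive = p_pos_given_disease * prior + p_pos_given_no_disease * (1 - prior)
    # Bayes' Theorem: P(D | +) = P(+ | D) P(D) / P(+)
    return p_pos_given_disease * prior / p_positive, p_positive


# Question 8 screening test; P(Positive | No Disease) = 1 - specificity
prior = Fraction(2, 100)
sensitivity = Fraction(95, 100)
false_positive_rate = 1 - Fraction(90, 100)

posterior, p_positive = bayes_posterior(prior, sensitivity, false_positive_rate)
print(f"P(Positive) = {float(p_positive):.4f}")
print(f"P(Disease | Positive) = {float(posterior):.4f}")
```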
c. Interpret
Explain this result to a patient in plain language. The test is “95% accurate” for detecting the disease, but what does a positive result really mean?
Question 10: Sensitivity, Specificity, and Predictive Values
A new rapid test for strep throat is being evaluated. In a study of 1,000 patients:
- 200 actually have strep throat
- Of the 200 with strep: 180 test positive, 20 test negative
- Of the 800 without strep: 80 test positive, 720 test negative
a. Create the 2×2 table
| | Strep Present | Strep Absent | Total |
|---|---|---|---|
| Test Positive | | | |
| Test Negative | | | |
| Total | | | 1000 |
b. Calculate sensitivity
Sensitivity = P(Test Positive | Disease Present)
Sensitivity =
Interpretation: What does this tell us about the test?
c. Calculate specificity
Specificity = P(Test Negative | Disease Absent)
Specificity =
Interpretation: What does this tell us about the test?
d. Positive Predictive Value (PPV)
PPV = P(Disease Present | Test Positive)
PPV =
Interpretation: If someone tests positive, what’s the probability they actually have strep?
e. Negative Predictive Value (NPV)
NPV = P(Disease Absent | Test Negative)
NPV =
Interpretation: If someone tests negative, what’s the probability they truly don’t have strep?
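All four measures come from the same 2×2 table; what changes is which row or column supplies the denominator. A minimal Python sketch using the study counts (variable names are illustrative):

```python
from fractions import Fraction

# Strep throat study counts (Question 10)
tp, fn = 180, 20    # the 200 patients with strep
fp, tn = 80, 720    # the 800 patients without strep

sensitivity = Fraction(tp, tp + fn)   # P(Test+ | Strep present): Strep Present column
specificity = Fraction(tn, tn + fp)   # P(Test- | Strep absent): Strep Absent column
ppv = Fraction(tp, tp + fp)           # P(Strep present | Test+): Test Positive row
npv = Fraction(tn, tn + fn)           # P(Strep absent | Test-): Test Negative row

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {float(value):.3f}")
```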
f. Summary and patient communication
Fill in this summary table:
| Measure | Value | What it tells us |
|---|---|---|
| Sensitivity | | |
| Specificity | | |
| PPV | | |
| NPV | | |
A patient tests positive. Write 2-3 sentences explaining what this means using the PPV. Make your explanation clear for someone without statistical training.
Question 11: Impact of Prevalence on Predictive Values
Consider a highly accurate COVID test with sensitivity = 99% and specificity = 98%.
a. High prevalence scenario
During a major outbreak, 20% of people tested actually have COVID.
Create a table for 10,000 people and calculate PPV and NPV:
| | COVID+ | COVID- | Total |
|---|---|---|---|
| Test Positive | | | |
| Test Negative | | | |
| Total | | | 10,000 |
PPV =
NPV =
b. Low prevalence scenario
When community spread is low, only 1% of people tested have COVID.
Create a table for 10,000 people and calculate PPV and NPV:
| | COVID+ | COVID- | Total |
|---|---|---|---|
| Test Positive | | | |
| Test Negative | | | |
| Total | | | 10,000 |
PPV =
NPV =
c. Compare and explain
How does the PPV change from high prevalence to low prevalence? Why does this happen even though sensitivity and specificity stay the same?
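To see the prevalence effect without redrawing tables, here is a Python sketch of a hypothetical helper `predictive_values` that builds the expected counts for a given prevalence. The group size `n` cancels out of PPV and NPV; it is kept only so the intermediate counts match the 10,000-person tables.

```python
def predictive_values(prevalence, sensitivity, specificity, n=10_000):
    """Return (PPV, NPV) computed from expected counts in a group of n people."""
    present = n * prevalence      # people who truly have COVID
    absent = n - present          # people who do not
    tp = present * sensitivity    # true positives
    fn = present - tp             # false negatives
    tn = absent * specificity     # true negatives
    fp = absent - tn              # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Same test (sensitivity 99%, specificity 98%) at the two prevalence levels in Question 11
for prevalence in (0.20, 0.01):
    ppv, npv = predictive_values(prevalence, 0.99, 0.98)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

Only `prevalence` changes between the two calls; sensitivity and specificity are passed in unchanged.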
d. Public health implication
Why might the same test be useful for diagnosing symptomatic patients (higher prevalence) but less useful for screening asymptomatic people (lower prevalence)?
Question 12: Mammography Screening Application
Consider these facts about breast cancer screening:
- Breast cancer prevalence in women age 40-50: 1.4%
- Mammogram sensitivity: 75%
- Mammogram specificity: 92%
a. Create a contingency table
For 10,000 women screened, complete the table (show all calculations):
| | Cancer | No Cancer | Total |
|---|---|---|---|
| Positive Mammogram | | | |
| Negative Mammogram | | | |
| Total | | | 10,000 |
b. Calculate predictive values
- PPV = P(Cancer | Positive Mammogram) =
- NPV = P(No Cancer | Negative Mammogram) =
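A Python sketch for checking the mammography table and predictive values. With these rates some expected counts are not whole numbers (for example 788.8 false positives), so you may need to round when you fill in the table; the sketch keeps exact fractions and rounds only for display. Variable names are illustrative.

```python
from fractions import Fraction

# Question 12 inputs for 10,000 women screened
n = 10_000
prevalence = Fraction(14, 1000)   # 1.4%
sensitivity = Fraction(75, 100)
specificity = Fraction(92, 100)

cancer = n * prevalence             # women who have breast cancer
no_cancer = n - cancer              # women who do not

true_pos = cancer * sensitivity     # positive mammogram, cancer
false_neg = cancer - true_pos       # negative mammogram, cancer
true_neg = no_cancer * specificity  # negative mammogram, no cancer
false_pos = no_cancer - true_neg    # positive mammogram, no cancer

ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)

print(f"Cancer: {float(cancer):.0f}, no cancer: {float(no_cancer):.0f}")
print(f"True positives: {float(true_pos):.1f}, false positives: {float(false_pos):.1f}")
print(f"False negatives: {float(false_neg):.1f}, true negatives: {float(true_neg):.1f}")
print(f"PPV = {float(ppv):.3f}, NPV = {float(npv):.4f}")
```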
c. Interpret for a patient
A 45-year-old woman receives a positive mammogram. She’s terrified and assumes she has cancer. Write a compassionate but accurate paragraph explaining what her positive result actually means, using the PPV.
d. False positives discussion
How many women in this screening will have false positive results (positive mammogram but no cancer)? What are the potential consequences of false positives in cancer screening?
💭 Question 13: Detective’s Reflection
Reflect on probability and medical testing (6-8 sentences):
- Why is understanding probability important for interpreting genetic test results and disease screening?
- What’s the difference between P(Disease | Positive Test) and P(Positive Test | Disease)? Why does this matter?
- How can a test with 99% sensitivity and 99% specificity still produce many false positives?
- Why do sensitivity and specificity stay the same regardless of disease prevalence, but PPV and NPV change with prevalence?
- When is a tree diagram most helpful versus a contingency table?
- How might understanding these concepts change the way doctors communicate test results to patients?
- Name one specific way that probability concepts could reduce unnecessary medical interventions or patient anxiety.
🎉 Excellent work, Statistical Detective! Understanding probability, conditional probability, and Bayes’ Theorem is crucial for interpreting medical test results and making informed health decisions!
Remember: A positive test doesn’t always mean you have the disease, and understanding the mathematics helps both patients and doctors make better decisions. Your work matters!
