HW3: Probability and Medical Testing

Understanding Chance and Diagnostic Results in Medicine

Author

Dr. Marcela Alfaro Córdoba

🧬 The Case

Welcome back, Statistical Detective!

Dr. Sarah Martinez from the County Public Health Department has a critical challenge for you. “Understanding probability is essential for public health,” she explains. “We need to assess disease risks, interpret genetic test results, evaluate screening programs, and help patients understand what their test results really mean. The county is facing questions about genetic disorders, disease transmission, and the reliability of medical tests.”

A woman with a positive mammogram asks, “Do I have cancer?” A man with a positive prostate screening asks, “What are the chances this is wrong?” Your mission: Use probability concepts to help the Public Health Department make informed decisions and communicate results clearly to patients!

Question 1: Probability Vocabulary and Basic Concepts

Dr. Martinez wants to ensure everyone on her team understands probability terminology.

a. Define in your own words

Provide a definition and a biological or medical example for each term:

Sample space:
- Definition:
- Example:
Event:
- Definition:
- Example:
Probability:
- Definition:
- Example:

b. Blood type probabilities

A biology lab is studying blood types in a population. They collected data from 200 randomly selected individuals:

Blood Type	Frequency
O	90
A	72
B	28
AB	10

Calculate the probability that a randomly selected person has:

Type O blood: P(O) =
Type A blood: P(A) =
Type AB blood: P(AB) =

Show your work using the formula: P(event) = favorable outcomes / total outcomes
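The formula above can be sketched in a few lines of Python. This is only an illustration: the counts are hypothetical eye-color tallies (not the blood type data), and the function name `relative_frequencies` is an assumed helper, not part of any course material.

```python
from fractions import Fraction


def relative_frequencies(counts):
    """Return P(category) = count / total for every category in a frequency table."""
    total = sum(counts.values())
    return {category: Fraction(n, total) for category, n in counts.items()}


# Hypothetical tallies (NOT the blood type data from this question).
eye_color_counts = {"brown": 55, "blue": 30, "green": 15}
probs = relative_frequencies(eye_color_counts)

for category, p in probs.items():
    print(f"P({category}) = {p} = {float(p):.2f}")

# Probabilities over the whole sample space should sum to 1.
print("Sum:", sum(probs.values()))
```

Using `Fraction` keeps each probability exact, which makes it easy to check that the values over the whole sample space add up to 1.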

c. Interpretation

If 1,000 people from this population donate blood, approximately how many would you expect to have Type O blood? Explain your reasoning.

Question 2: Addition Rule (OR Probabilities)

The Public Health Department is tracking flu and COVID cases in the community.

In a sample of 500 patients at a clinic:
- 150 tested positive for flu
- 100 tested positive for COVID
- 30 tested positive for both flu and COVID

a. Venn diagram

Draw and label a Venn diagram showing these events. Include the number of patients in each region.

b. Calculate probabilities

Calculate:

P(Flu) =
P(COVID) =
P(Flu AND COVID) =
P(Flu OR COVID) =

For P(Flu OR COVID), show your work using the addition rule: P(A or B) = P(A) + P(B) - P(A and B)

c. Interpret

What percentage of patients tested positive for at least one respiratory illness? Explain why we need to subtract P(Flu AND COVID) in the addition rule.

d. Neither illness

What’s the probability that a randomly selected patient tested negative for both illnesses? (Hint: This is the complement of “Flu OR COVID”)
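The addition rule and the complement rule used in this question can be sketched as follows. The counts are hypothetical (a made-up group of 40 patients with cough and fever), and `prob_a_or_b` is an assumed helper name chosen for illustration.

```python
def prob_a_or_b(n_a, n_b, n_both, n_total):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_both = n_both / n_total
    # Patients in the overlap were counted in both P(A) and P(B),
    # so the overlap is subtracted once.
    return p_a + p_b - p_both


# Hypothetical counts (NOT the clinic data): 40 patients,
# 12 with a cough, 10 with a fever, 4 with both.
p_either = prob_a_or_b(n_a=12, n_b=10, n_both=4, n_total=40)
p_neither = 1 - p_either  # complement rule

print(f"P(cough OR fever) = {p_either:.3f}")
print(f"P(neither)        = {p_neither:.3f}")
```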

Question 3: Multiplication Rule and Independence

a. Define independence

In your own words, explain what it means for two events to be independent. Give a biological example of:
- Two independent events
- Two dependent events

b. Genetic independence

A couple is planning to have two children. Assume the probability of a boy is 0.5 and the probability of a girl is 0.5, and that each child's sex is independent of the other's.

Calculate:

P(first child is a boy) =
P(second child is a boy) =
P(both children are boys) =
Show how you used the multiplication rule for independent events

c. At least one girl

What’s the probability the couple has at least one girl? (Hint: Use the complement rule)
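As a rough sketch, the multiplication rule for independent events and the complement rule look like this in Python. The seed germination scenario and its 0.8 probability are hypothetical, and the helper names `prob_all` and `prob_at_least_one` are assumptions made for this example.

```python
from math import prod


def prob_all(probabilities):
    """Multiplication rule for independent events: P(A and B and ...) = P(A) * P(B) * ..."""
    return prod(probabilities)


def prob_at_least_one(probabilities):
    """Complement rule: P(at least one occurs) = 1 - P(none occur)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical example: three seeds that each germinate with
# probability 0.8, independently of one another.
germination = [0.8, 0.8, 0.8]

print(f"P(all three germinate)     = {prob_all(germination):.3f}")
print(f"P(at least one germinates) = {prob_at_least_one(germination):.3f}")
```

Note that both helpers are only valid when the events really are independent; for dependent events the individual probabilities cannot simply be multiplied.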

Question 4: Tree Diagrams for Genetics

A genetic counselor is working with a family. Sarah is a carrier of a gene for cystic fibrosis. Each of her children has a 50% chance of inheriting the gene.

a. Draw a tree diagram

Draw a complete tree diagram for Sarah having two children, showing:
- First child: carrier or not carrier (50% each)
- Second child: carrier or not carrier (50% each)
- Label all branches with probabilities
- Show all possible outcomes with their probabilities

b. Calculate probabilities

Using your tree diagram, calculate:

P(both children are carriers) =
P(neither child is a carrier) =
P(exactly one child is a carrier) =
P(at least one child is a carrier) =

Show your work for each calculation.

c. Verify

Do all the probabilities for the final outcomes sum to 1? Show this calculation.
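A tree diagram with independent stages can also be enumerated in code: multiply along each path, then add the paths that belong to an event. The sketch below assumes a hypothetical two-plant example with a 0.3 flowering probability (not the 50% carrier case), and `enumerate_tree` is an illustrative helper name.

```python
from itertools import product


def enumerate_tree(stages):
    """List every root-to-leaf path of a tree whose stages are independent.

    `stages` is a list of dicts mapping branch label -> branch probability.
    Each path's probability is the product of the branch probabilities along it.
    """
    outcomes = {}
    for path in product(*(stage.items() for stage in stages)):
        labels = tuple(label for label, _ in path)
        probability = 1.0
        for _, p in path:
            probability *= p
        outcomes[labels] = probability
    return outcomes


# Hypothetical two-stage tree: each of two plants flowers with
# probability 0.3, independently of the other.
stage = {"flowers": 0.3, "no flower": 0.7}
tree = enumerate_tree([stage, stage])

for labels, p in tree.items():
    print(labels, round(p, 2))

# An event's probability is the sum over the leaves that belong to it.
exactly_one = sum(p for labels, p in tree.items() if labels.count("flowers") == 1)
print("P(exactly one flowers) =", round(exactly_one, 2))
print("Sum of all leaves      =", round(sum(tree.values()), 2))
```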

Question 5: Contingency Tables and Joint Probabilities

A study examined the relationship between exercise habits and heart disease in 1,000 adults:

	Heart Disease	No Heart Disease	Total
Regular Exercise	80	420	500
No Regular Exercise	180	320	500
Total	260	740	1,000

a. Marginal probabilities

Calculate:

P(Regular Exercise) =
P(Heart Disease) =

b. Joint probabilities

Calculate:

P(Regular Exercise AND Heart Disease) =
P(No Regular Exercise AND Heart Disease) =
P(Regular Exercise AND No Heart Disease) =
P(No Regular Exercise AND No Heart Disease) =

Verify that all four joint probabilities sum to 1.
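One way to see how marginal and joint probabilities come out of a 2×2 table is the sketch below. The smoking and cough counts are hypothetical (they are not the exercise study), and the dictionary layout is just one assumed way to store the table.

```python
# Hypothetical 2x2 counts (NOT the exercise study):
# rows = smoking status, columns = chronic cough status.
table = {
    ("smoker", "cough"): 30,
    ("smoker", "no cough"): 70,
    ("non-smoker", "cough"): 20,
    ("non-smoker", "no cough"): 280,
}
grand_total = sum(table.values())

# Joint probability: one cell divided by the grand total.
joint = {cell: n / grand_total for cell, n in table.items()}

# Marginal probability: add the joint probabilities across a row or a column.
p_smoker = sum(p for (row, _), p in joint.items() if row == "smoker")
p_cough = sum(p for (_, col), p in joint.items() if col == "cough")

print(f"P(smoker)           = {p_smoker:.3f}")
print(f"P(cough)            = {p_cough:.3f}")
print(f"P(smoker AND cough) = {joint[('smoker', 'cough')]:.3f}")
print(f"Sum of joint cells  = {sum(joint.values()):.3f}")
```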

c. Public health interpretation

Among people who exercise regularly, what proportion have heart disease? Among people who don’t exercise regularly, what proportion have heart disease? What does this suggest?

Question 6: Understanding Conditional Probability

a. Notation and meaning

Explain what P(A|B) means in your own words, then provide a medical example.

b. Calculate from contingency table

Using the exercise and heart disease data from Question 5, calculate:

P(Heart Disease | Regular Exercise) =
P(Heart Disease | No Regular Exercise) =

Show your work and interpret: How many times higher is the risk of heart disease for non-exercisers compared to regular exercisers?

c. Reverse conditional

Calculate:

P(Regular Exercise | Heart Disease) =
P(No Regular Exercise | Heart Disease) =

d. Important distinction

Explain the difference between P(Heart Disease | Regular Exercise) and P(Regular Exercise | Heart Disease).

Why is this distinction important?
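The two conditionals share a numerator but divide by different groups, which a short sketch makes visible. The counts here are hypothetical (a made-up smoking and cough survey), and `conditional` is an assumed helper name.

```python
def conditional(n_a_and_b, n_b):
    """P(A | B) = P(A and B) / P(B); with counts this is n(A and B) / n(B)."""
    return n_a_and_b / n_b


# Hypothetical counts (NOT the exercise study): among 400 adults,
# 100 smoke, 50 have a chronic cough, and 30 both smoke and have a cough.
n_smoker, n_cough, n_both = 100, 50, 30

# Same numerator, different denominators:
p_cough_given_smoker = conditional(n_both, n_smoker)  # restrict to smokers
p_smoker_given_cough = conditional(n_both, n_cough)   # restrict to people with a cough

print(f"P(cough | smoker) = {p_cough_given_smoker:.2f}")
print(f"P(smoker | cough) = {p_smoker_given_cough:.2f}")
```

The condition after the bar decides which group forms the denominator, so reversing the condition generally changes the answer.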

Question 7: Tree Diagrams for Diagnostic Testing

A rapid COVID test has the following characteristics:
- Community infection rate: 10%
- Sensitivity (correctly identifies infected): 85%
- Specificity (correctly identifies not infected): 95%

a. Draw a tree diagram

Create a tree diagram with:
- First branch: Infected (10%) or Not Infected (90%)
- Second branches: Test Positive or Test Negative (for each first branch)
- Label all branches with probabilities
- Calculate the probability for each final outcome

b. Calculate outcomes

For a population of 1,000 people, use your tree diagram to calculate:

How many are actually infected?
How many infected people test positive (true positives)?
How many infected people test negative (false negatives)?
How many uninfected people test positive (false positives)?
How many uninfected people test negative (true negatives)?

c. Conditional probability question

If someone tests positive, what’s the probability they’re actually infected?

P(Infected | Test Positive) =

Show your work step by step.
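The tree-to-counts reasoning in this question can be sketched with expected counts. The test characteristics below are hypothetical (5% prevalence, 90% sensitivity, 80% specificity, 2,000 people), not the rapid COVID test above, and `expected_counts` is an assumed helper name.

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Split n people into the four leaves of a diagnostic-test tree."""
    infected = n * prevalence
    not_infected = n - infected
    return {
        "true_positive": infected * sensitivity,
        "false_negative": infected * (1 - sensitivity),
        "false_positive": not_infected * (1 - specificity),
        "true_negative": not_infected * specificity,
    }


# Hypothetical test (NOT the rapid COVID test in this question).
counts = expected_counts(n=2000, prevalence=0.05, sensitivity=0.90, specificity=0.80)

for leaf, value in counts.items():
    print(f"{leaf:>14}: {value:.0f}")

# P(Infected | Test Positive): true positives out of ALL positives.
all_positives = counts["true_positive"] + counts["false_positive"]
p_infected_given_positive = counts["true_positive"] / all_positives
print(f"P(infected | positive) = {p_infected_given_positive:.3f}")
```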

Question 8: Creating Contingency Tables for Screening

A screening test for a rare genetic disorder is being evaluated in a population where 2% of people have the disorder.

The test has:
- Sensitivity = 95% (correctly identifies 95% of people with the disorder)
- Specificity = 90% (correctly identifies 90% of people without the disorder)

a. Complete the contingency table

For 10,000 people, fill in all cells:

	Disease Present	Disease Absent	Total
Test Positive
Test Negative
Total			10,000

Show your calculations for each cell.

b. Calculate key probabilities

From your table, calculate:

P(Test Positive) =
P(Disease Present | Test Positive) =

c. Interpretation

How many people who test positive actually have the disease? How many are false positives? What might this mean for people who receive positive test results?

Question 9: Bayes’ Theorem Introduction

a. State Bayes’ Theorem

Write out Bayes’ Theorem for calculating P(Disease | Positive Test).

b. Apply Bayes’ Theorem

Use the genetic disorder screening data from Question 8. Apply Bayes’ Theorem to calculate P(Disease Present | Test Positive).

Step 1: Identify the components
- P(Disease) =
- P(Positive | Disease) =
- P(Positive | No Disease) =

Step 2: Calculate P(Positive)

Step 3: Apply Bayes’ Theorem

Step 4: Verify this matches your answer from Question 8b.
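The four steps above can be checked with a short sketch. The inputs are hypothetical (a 5% prior, 90% sensitivity, and a 20% false positive rate), not the screening data from Question 8, and `bayes_posterior` is an assumed helper name.

```python
def bayes_posterior(prior, p_pos_given_disease, p_pos_given_no_disease):
    """Return P(Disease | Positive) from the three components in Step 1."""
    # Step 2: total probability of a positive test, over both groups.
    p_positive = (p_pos_given_disease * prior
                  + p_pos_given_no_disease * (1 - prior))
    # Step 3: share of all positives that come from people with the disease.
    return p_pos_given_disease * prior / p_positive


# Hypothetical inputs (NOT the Question 8 values).
posterior = bayes_posterior(prior=0.05,
                            p_pos_given_disease=0.90,
                            p_pos_given_no_disease=0.20)
print(f"P(disease | positive) = {posterior:.3f}")
```

Here P(Positive | No Disease) is the false positive rate, which equals 1 minus the specificity.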

c. Interpret

Explain this result to a patient in plain language. The test is “95% accurate” for detecting the disease, but what does a positive result really mean?

Question 10: Sensitivity, Specificity, and Predictive Values

A new rapid test for strep throat is being evaluated. In a study of 1,000 patients:
- 200 actually have strep throat
- Of the 200 with strep: 180 test positive, 20 test negative
- Of the 800 without strep: 80 test positive, 720 test negative

a. Create the 2×2 table

	Strep Present	Strep Absent	Total
Test Positive
Test Negative
Total			1,000

b. Calculate sensitivity

Sensitivity = P(Test Positive | Disease Present)

Sensitivity =

Interpretation: What does this tell us about the test?

c. Calculate specificity

Specificity = P(Test Negative | Disease Absent)

Specificity =

Interpretation: What does this tell us about the test?

d. Positive Predictive Value (PPV)

PPV = P(Disease Present | Test Positive)

PPV =

Interpretation: If someone tests positive, what’s the probability they actually have strep?

e. Negative Predictive Value (NPV)

NPV = P(Disease Absent | Test Negative)

NPV =

Interpretation: If someone tests negative, what’s the probability they truly don’t have strep?
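All four measures come from the same 2×2 table and differ only in which row or column total is the denominator. The sketch below uses hypothetical counts (not the strep study), and `diagnostic_metrics` is an assumed helper name.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute the four standard measures from the cells of a 2x2 table."""
    return {
        # Column-based: condition on the true disease status.
        "sensitivity": tp / (tp + fn),  # P(Test Positive | Disease Present)
        "specificity": tn / (tn + fp),  # P(Test Negative | Disease Absent)
        # Row-based: condition on the test result.
        "ppv": tp / (tp + fp),          # P(Disease Present | Test Positive)
        "npv": tn / (tn + fn),          # P(Disease Absent | Test Negative)
    }


# Hypothetical counts (NOT the strep study): 500 patients in total.
metrics = diagnostic_metrics(tp=45, fn=5, fp=30, tn=420)

for name, value in metrics.items():
    print(f"{name:>11}: {value:.3f}")
```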

f. Summary and patient communication

Fill in this summary table:

Measure	Value	What it tells us
Sensitivity
Specificity
PPV
NPV

A patient tests positive. Write 2-3 sentences explaining what this means using the PPV. Make your explanation clear for someone without statistical training.

Question 11: Impact of Prevalence on Predictive Values

Consider a highly accurate COVID test with sensitivity = 99% and specificity = 98%.

a. High prevalence scenario

During a major outbreak, 20% of people tested actually have COVID.

Create a table for 10,000 people and calculate PPV and NPV:

	COVID+	COVID-	Total
Test Positive
Test Negative
Total			10,000

PPV =

NPV =

b. Low prevalence scenario

When community spread is low, only 1% of people tested have COVID.

Create a table for 10,000 people and calculate PPV and NPV:

	COVID+	COVID-	Total
Test Positive
Test Negative
Total			10,000

PPV =

NPV =

c. Compare and explain

How does the PPV change from high prevalence to low prevalence? Why does this happen even though sensitivity and specificity stay the same?

d. Public health implication

Why might the same test be useful for diagnosing symptomatic patients (higher prevalence) but less useful for screening asymptomatic people (lower prevalence)?
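To explore how prevalence alone moves the predictive values, one option is to hold the test fixed and loop over several prevalences. The sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity (not the COVID test in this question), and `predictive_values` is an illustrative helper name.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    fp = (1 - prevalence) * (1 - specificity)
    tn = (1 - prevalence) * specificity
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test: sensitivity and specificity stay fixed,
# only the prevalence changes between rows.
results = {}
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(prevalence, sensitivity=0.90, specificity=0.90)
    results[prevalence] = (ppv, npv)
    print(f"prevalence = {prevalence:.2f}  PPV = {ppv:.3f}  NPV = {npv:.3f}")
```

As prevalence falls, the false positives from the large disease-free group make up a growing share of all positive results.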

Question 12: Mammography Screening Application

Consider these facts about breast cancer screening:
- Breast cancer prevalence in women age 40-50: 1.4%
- Mammogram sensitivity: 75%
- Mammogram specificity: 92%

a. Create a contingency table

For 10,000 women screened, complete the table (show all calculations):

	Cancer	No Cancer	Total
Positive Mammogram
Negative Mammogram
Total			10,000

b. Calculate predictive values

PPV = P(Cancer | Positive Mammogram) =
NPV = P(No Cancer | Negative Mammogram) =

c. Interpret for a patient

A 45-year-old woman receives a positive mammogram. She’s terrified and assumes she has cancer. Write a compassionate but accurate paragraph explaining what her positive result actually means, using the PPV.

d. False positives discussion

How many women in this screening will have false positive results (positive mammogram but no cancer)? What are the potential consequences of false positives in cancer screening?

💭 Question 13: Detective’s Reflection

Reflect on probability and medical testing (6-8 sentences):

Why is understanding probability important for interpreting genetic test results and disease screening?
What’s the difference between P(Disease | Positive Test) and P(Positive Test | Disease)? Why does this matter?
How can a test with 99% sensitivity and 99% specificity still produce many false positives?
Why do sensitivity and specificity stay the same regardless of disease prevalence, but PPV and NPV change with prevalence?
When is a tree diagram most helpful versus a contingency table?
How might understanding these concepts change the way doctors communicate test results to patients?
Name one specific way that probability concepts could reduce unnecessary medical interventions or patient anxiety.

🎉 Excellent work, Statistical Detective! Understanding probability, conditional probability, and Bayes’ Theorem is crucial for interpreting medical test results and making informed health decisions!

Remember: A positive test doesn’t always mean you have the disease, and understanding the mathematics helps both patients and doctors make better decisions. Your work matters!

End of Assignment