Inference for Proportions

Week 9

Welcome!

Today’s Plan

  • Opening application: vaccine efficacy in the real world
  • Confidence intervals for a single proportion
  • Hypothesis tests for a single proportion
  • Comparing two proportions
  • Checking conditions for inference

Opening Application: mRNA Vaccines

In December 2020, the Pfizer-BioNTech COVID-19 vaccine received Emergency Use Authorization.

Here is the key result from the Phase 3 clinical trial:

Group      n        COVID-19 cases
Vaccine    18,198   8
Placebo    18,325   162

Vaccine efficacy = 1 − (risk in vaccine group / risk in placebo group) = 1 − (8/18198)/(162/18325) ≈ 95%
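The efficacy arithmetic above can be reproduced in a few lines of R. This is a minimal sketch; the variable names are chosen here for illustration and the counts are copied from the trial table.

```r
# Counts copied from the trial table above
cases_vaccine <- 8
n_vaccine     <- 18198
cases_placebo <- 162
n_placebo     <- 18325

risk_vaccine <- cases_vaccine / n_vaccine   # observed risk, vaccine arm
risk_placebo <- cases_placebo / n_placebo   # observed risk, placebo arm

# Efficacy = 1 - (risk in vaccine group / risk in placebo group)
efficacy <- 1 - risk_vaccine / risk_placebo
round(100 * efficacy, 1)                    # efficacy as a percentage
```

Note that this is a point estimate computed from sample counts.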

But is 95% a parameter or a statistic? How confident should we be in that number?

From Means to Proportions

So far, we’ve focused on inference for means (µ) using t-tests.

But many scientific questions are really about proportions (p):

  • What fraction of patients respond to a treatment?
  • Does vaccination reduce the probability of infection?
  • Are two groups equally likely to develop a disease?

The logic is exactly the same — we just need a different standard error.

Notation

Symbol   Meaning
p        Population proportion (parameter)
p̂        Sample proportion (statistic)
n        Sample size
p₀       Hypothesized proportion (in H₀)

Example: In the placebo group, \(\hat{p} = 162/18325 = 0.00884\) (about 0.88% infection rate).

We want to make inferences about the true infection rate p in an unvaccinated population.

The Sampling Distribution of p̂

Just like \(\bar{x}\), the sample proportion \(\hat{p}\) varies from sample to sample.

When conditions are met, the sampling distribution of \(\hat{p}\) is approximately normal:

\[\hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)\]

The standard deviation of \(\hat{p}\) depends on the unknown p, so in practice we plug in \(\hat{p}\). The resulting estimate is called the standard error of a proportion:

\[SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
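A quick simulation can make the sampling distribution concrete. The sketch below uses made-up values (p = 0.3, n = 200, and an arbitrary seed), none of which come from the trial. It draws many samples and compares the spread of the simulated \(\hat{p}\) values with the formula.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
p <- 0.3         # hypothetical true proportion (assumption)
n <- 200         # hypothetical sample size (assumption)

# Draw 10,000 samples of size n and record each sample proportion
p_hat <- rbinom(10000, size = n, prob = p) / n

mean(p_hat)                 # should be close to p
sd(p_hat)                   # empirical SD of the simulated p-hats
sqrt(p * (1 - p) / n)       # theoretical SD from the formula above
```

A histogram via `hist(p_hat)` should look roughly bell-shaped when the conditions below hold.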

Conditions for Inference with Proportions

Before we compute anything, we must check three conditions:

1. Independence: Observations must be independent. → Usually satisfied when the sample is drawn at random, or when participants are randomly assigned to treatment and control groups.

2. Success/Failure (Large Sample) Condition: \[n\hat{p} \geq 10 \quad \text{AND} \quad n(1-\hat{p}) \geq 10\]

We need at least 10 “successes” and 10 “failures” in the sample.

3. Sample size / Population size: n < 10% of the population (if sampling without replacement).

Checking Conditions: Placebo Group

For the placebo group in the Pfizer trial:

  • \(n = 18{,}325\), \(\hat{p} = 162/18325 \approx 0.00884\)

Check Success/Failure:

  • Successes: \(n\hat{p} = 18325 \times 0.00884 \approx 162\) ✅ (≥ 10)
  • Failures: \(n(1-\hat{p}) = 18325 \times 0.99116 \approx 18163\) ✅ (≥ 10)

Independence: Participants were randomly assigned to placebo or vaccine ✅

10% condition: 18,325 is far less than 10% of all adults worldwide ✅

Conditions met — we can proceed.
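The success/failure check is easy to script. The helper below is hypothetical (it is not a base R function) and takes the threshold of 10 from the condition stated earlier.

```r
# Hypothetical helper: TRUE when both the success and failure counts
# reach the threshold (10 by default)
check_success_failure <- function(x, n, threshold = 10) {
  successes <- x
  failures  <- n - x
  (successes >= threshold) && (failures >= threshold)
}

# Placebo group: 162 cases out of 18,325 participants
check_success_failure(162, 18325)   # TRUE
```

The function covers only the large-sample condition. Independence and the 10% condition depend on the study design and cannot be checked from counts alone.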

Confidence Interval for a Single Proportion

\[\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

where \(z^*\) is the critical value from the standard normal:

Confidence Level   \(z^*\)
90%                1.645
95%                1.960
99%                2.576

Notice: We use \(z^*\) (normal), not \(t^*\) (t-distribution). For a proportion, the standard error depends only on \(\hat{p}\) and n, so there is no separate standard deviation to estimate (unlike s for a mean).

CI Example: Placebo Infection Rate

Placebo group: \(\hat{p} = 162/18325 \approx 0.00884\), \(n = 18325\)

Standard Error: \[SE = \sqrt{\frac{0.00884 \times 0.99116}{18325}} = \sqrt{0.000000478} \approx 0.000691\]

95% CI: \[0.00884 \pm 1.96 \times 0.000691 = 0.00884 \pm 0.00135 = (0.00749,\ 0.01019)\]

Interpretation: We are 95% confident that the true infection rate in the unvaccinated population is between 0.75% and 1.02%.
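The same interval can be computed in R. This is a sketch using only base functions; `qnorm(0.975)` returns the 95% critical value (about 1.96) and the variable names are illustrative.

```r
x <- 162        # COVID-19 cases in the placebo group
n <- 18325      # placebo group size

p_hat  <- x / n
se     <- sqrt(p_hat * (1 - p_hat) / n)   # standard error using p-hat
z_star <- qnorm(0.975)                    # critical value for 95% confidence

ci <- p_hat + c(-1, 1) * z_star * se      # lower and upper limits
round(100 * ci, 2)                        # limits in percent
```

Because R carries full precision, the last digit can differ slightly from a hand calculation that rounds at each step.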

Think-Pair-Share #1

[Poll Everywhere — respond now!]

In the Pfizer trial, the vaccine group had: \(n = 18{,}198\) participants, 8 COVID cases.

Discuss with your neighbor (2 min):

  1. Calculate \(\hat{p}\) for the vaccine group.
  2. Check the success/failure condition. Does it pass? Does anything concern you?
  3. Calculate the 95% CI for the true infection rate in vaccinated people.
  4. Does the CI for vaccinated people overlap with the CI for unvaccinated people? What does that suggest?

→ On Poll Everywhere: Type one word that describes what these two CIs tell us about vaccine effectiveness.

Hypothesis Test for a Single Proportion

We want to test a specific claim about p.

Step 1: State hypotheses \[H_0: p = p_0 \qquad H_a: p \neq p_0 \text{ (or } < \text{ or } >)\]

Step 2: Check conditions (same as for a CI, except that success/failure is checked with p₀: \(n p_0 \geq 10\) and \(n(1-p_0) \geq 10\))

Step 3: Compute the test statistic (z-score) \[z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\]

Important: Under \(H_0\), we know p = p₀, so we use p₀ (not p̂) in the SE.

Step 4: Find the p-value using the standard normal distribution.

Step 5: Conclude in context.
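Steps 3 and 4 can be sketched as a small R function. `one_prop_z_test()` is a hypothetical helper written for this illustration, not a built-in, and the data in the usage line are made up.

```r
# Hypothetical helper for a one-proportion z-test
one_prop_z_test <- function(x, n, p0,
                            alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n
  se0   <- sqrt(p0 * (1 - p0) / n)        # SE under H0, so it uses p0
  z     <- (p_hat - p0) / se0
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),       # both tails
    less      = pnorm(z),                 # lower tail
    greater   = pnorm(z, lower.tail = FALSE)  # upper tail
  )
  list(p_hat = p_hat, z = z, p_value = p_value)
}

# Made-up data: 60 successes in 100 trials, testing H0: p = 0.5
res <- one_prop_z_test(60, 100, p0 = 0.5)
res$z         # 2
res$p_value   # two-sided p-value
```

Steps 1, 2, and 5 (hypotheses, conditions, and the conclusion in context) still have to be done by the analyst.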

Test Statistic: Why p₀ in the Denominator?

When computing CIs, we don’t know p, so we estimate it with \(\hat{p}\).

When testing, we’re asking: “Assuming H₀ is true (p = p₀), how surprising is our result?”

So we use p₀ in the SE under H₀:

\[SE_{H_0} = \sqrt{\frac{p_0(1-p_0)}{n}}\]

This is a subtle but important distinction from the CI formula.

Hypothesis Test Example

A hospital claims its surgical infection rate is only 2% (the national benchmark). You audit 350 surgeries and find 11 infections.

Hypotheses: \(H_0: p = 0.02\) vs. \(H_a: p > 0.02\) (one-sided — the concern is rates above benchmark)

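Check conditions: \(n p_0 = 350 \times 0.02 = 7 < 10\) ⚠️ (while \(n(1-p_0) = 343 \geq 10\))

The success/failure condition is not met: under H₀ we expect only 7 infections, fewer than the 10 required. The normal approximation is therefore rough, and we should state this caveat before proceeding.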

\(\hat{p} = 11/350 = 0.0314\)

\[z = \frac{0.0314 - 0.02}{\sqrt{(0.02)(0.98)/350}} = \frac{0.0114}{0.00748} = 1.52\]

p-value = P(Z > 1.52) = 0.064
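Since the expected count under H₀ is below 10, it can be worth comparing the z-test with an exact binomial test. The sketch below uses base R's `binom.test()` as one possible cross-check. It is not part of the slide's calculation.

```r
x  <- 11      # infections found in the audit
n  <- 350     # surgeries audited
p0 <- 0.02    # benchmark rate under H0

# Normal-approximation z-test (one-sided, H_a: p > p0)
p_hat     <- x / n
z         <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_z <- pnorm(z, lower.tail = FALSE)

# Exact binomial test of the same hypotheses
exact <- binom.test(x, n, p = p0, alternative = "greater")

c(z = z, normal_p = p_value_z, exact_p = exact$p.value)
```

Without intermediate rounding, z comes out near 1.53 rather than 1.52. The exact p-value need not agree closely with the normal approximation when counts are this small.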

Interpreting the Hospital Example

Decision: p-value = 0.064 > 0.05, so we fail to reject H₀.

Conclusion: There is not sufficient evidence at the 5% significance level to conclude the true infection rate exceeds the 2% national benchmark.

But wait — practical significance matters too!

  • The observed rate is 3.1% vs. the 2% benchmark, roughly a 57% relative increase (0.0314 / 0.02 ≈ 1.57)
  • With only n = 350, we may lack the power to detect a difference of this size
  • A larger audit might well find statistical significance

Statistical non-significance ≠ evidence that rates are equal
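To see how sample size drives the result, the sketch below recomputes the test statistic for larger audits. It holds the observed rate fixed at 11/350, which is an assumption made only for illustration, since a real larger audit would produce its own \(\hat{p}\). The audit sizes are hypothetical.

```r
p0    <- 0.02
p_hat <- 11 / 350                 # observed rate, held fixed (assumption)
n     <- c(350, 700, 1400)        # hypothetical audit sizes

z       <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- pnorm(z, lower.tail = FALSE)   # one-sided, H_a: p > p0

data.frame(n, z = round(z, 2), p_value = round(p_value, 4))
```

The same observed rate gives a larger z and a smaller p-value as n grows, because the standard error shrinks with \(\sqrt{n}\).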

☕ BREAK — 10 minutes

Coming up:

What if we want to compare two proportions — like the vaccine group vs. placebo? We need a test that handles two independent groups.

Comparing Two Proportions

Research question: Is the infection rate significantly lower in the vaccine group than the placebo group?

Setup:

Group      n        Cases   \(\hat{p}\)
Vaccine    18,198   8       0.000440
Placebo    18,325   162     0.008840

Hypotheses: \[H_0: p_V = p_P \quad \text{(vaccine has no effect)}\] \[H_a: p_V < p_P \quad \text{(vaccine reduces infection)}\]

SE for the Difference of Two Proportions

For a confidence interval, we use both sample proportions:

\[SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]

For a hypothesis test (where \(H_0: p_1 = p_2\)), we pool the proportions:

\[\hat{p}_{pool} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{8 + 162}{18198 + 18325} = \frac{170}{36523} \approx 0.00465\]

\[SE_{pool} = \sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]

Two-Proportion Z-Test: Pfizer

\[\hat{p}_{pool} = \frac{170}{36523} \approx 0.004655\]

\[SE_{pool} = \sqrt{0.004655 \times 0.995345 \times \left(\frac{1}{18198} + \frac{1}{18325}\right)} = \sqrt{5.07 \times 10^{-7}} \approx 0.000712\]

\[z = \frac{(0.000440 - 0.008840) - 0}{0.000712} = \frac{-0.008400}{0.000712} \approx -11.8\]

p-value ≈ 0 (essentially zero for a one-sided test with z = −11.8)

Conclusion: There is overwhelming statistical evidence that the true infection rate is lower in the vaccinated group than in the placebo group (z = −11.8, p ≈ 0).
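The pooled z-test can be reproduced step by step in R. This is a sketch with illustrative variable names, using the counts from the trial table.

```r
x <- c(vaccine = 8,     placebo = 162)     # COVID-19 cases
n <- c(vaccine = 18198, placebo = 18325)   # group sizes

p_hat   <- x / n                           # sample proportions
p_pool  <- sum(x) / sum(n)                 # pooled proportion under H0
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

z <- unname((p_hat["vaccine"] - p_hat["placebo"]) / se_pool)
p_value <- pnorm(z)                        # lower tail, H_a: p_V < p_P

c(z = z, p_value = p_value)
```

The p-value is not literally zero, but it is far smaller than any conventional significance level.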

R Output for Two-Proportion Test

In R, we use prop.test(). Here is what the output looks like:

    2-sample test for equality of proportions

data:  c(8, 162) out of c(18198, 18325)
X-squared = 139.23, df = 1, p-value < 2.2e-16
alternative hypothesis: less
95 percent confidence interval:
 -1.0000000 -0.0073064
sample estimates:
    prop 1     prop 2 
0.00043961 0.00884038

Notes on the output:

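  • prop.test() reports a chi-square statistic rather than z. Without the continuity correction (correct = FALSE), \(X^2 = z^2\), which is why \(139.23 \approx 11.8^2\). By default prop.test() applies Yates' continuity correction, which makes \(X^2\) slightly smaller ✅
  • p-value < 2.2e-16 (essentially zero) ✅
  • Because the alternative is one-sided ("less"), R reports a one-sided interval for \(p_V - p_P\): the lower limit is fixed at −1, and only the upper bound (about −0.0073) is informative. The whole interval lies below 0, so we are 95% confident the vaccine group's true rate is at least 0.73 percentage points lower ✅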
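A call along the following lines produces output of this kind. It is a sketch: the slide does not show the exact call, so the arguments are assumptions. `correct = FALSE` is chosen so that the chi-square statistic equals z² from the hand calculation.

```r
# Two-proportion test: vaccine first, placebo second
result <- prop.test(x = c(8, 162), n = c(18198, 18325),
                    alternative = "less",   # H_a: p_V < p_P
                    correct = FALSE)        # no continuity correction

result$statistic   # X-squared, equal to z^2 when correct = FALSE
result$p.value
result$estimate    # the two sample proportions
result$conf.int    # one-sided interval for p_V - p_P
```

Printing `result` shows the full report. The numbers R prints may differ slightly from those on the slide, depending on rounding and on whether the continuity correction is applied.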

Think-Pair-Share #2

[Poll Everywhere — respond now!]

Consider this R output from a different vaccine study:

    2-sample test for equality of proportions

data:  c(45, 68) out of c(900, 920)
X-squared = 4.21, df = 1, p-value = 0.040
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.0461  -0.0006
sample estimates:
   prop 1    prop 2 
0.050000  0.073913

Discuss (2 min):

  1. What are the two sample proportions? Which group had a higher infection rate?
  2. Write a one-sentence conclusion based on the p-value (α = 0.05).
  3. Interpret the 95% CI in plain language.
  4. Would you call this result practically significant? (Hint: absolute vs. relative reduction)

→ Poll Everywhere: True or False — If p-value < 0.05, the effect must be large.

CI for the Difference of Two Proportions

\[(\hat{p}_1 - \hat{p}_2) \pm z^* \cdot \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]

For the Pfizer trial:

\[\hat{p}_V - \hat{p}_P = 0.000440 - 0.008840 = -0.008400\]

\[SE = \sqrt{\frac{0.000440 \times 0.999560}{18198} + \frac{0.008840 \times 0.991160}{18325}} \approx 0.000709\]

95% CI:

\[-0.008400 \pm 1.96 \times 0.000709 = (-0.00979,\ -0.00701)\]

We are 95% confident that the true difference in infection rates (vaccine − placebo) is between −0.98 and −0.70 percentage points.
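The interval can be checked in R with the unpooled standard error. This is a sketch with illustrative variable names.

```r
x <- c(8, 162)           # cases: vaccine, placebo
n <- c(18198, 18325)     # group sizes: vaccine, placebo

p_hat <- x / n
d     <- p_hat[1] - p_hat[2]                  # vaccine minus placebo
se    <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled SE, used for CIs

ci <- d + c(-1, 1) * qnorm(0.975) * se
round(100 * ci, 2)                            # limits in percentage points
```

Note that this standard error (about 0.000709) is slightly smaller than the pooled one used in the test (about 0.000712), because the two formulas estimate the variance differently.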

Conditions for Two-Proportion Inference

Check these for each group separately:

1. Independence: Observations are independent within each group, and the two groups are independent of each other ✅ (Random assignment ensures this)

2. Success/Failure in each group:

  • Group 1: \(n_1 \hat{p}_1 \geq 10\) AND \(n_1(1-\hat{p}_1) \geq 10\)
  • Group 2: \(n_2 \hat{p}_2 \geq 10\) AND \(n_2(1-\hat{p}_2) \geq 10\)

For the Pfizer vaccine group: \(n_1 \hat{p}_1 = 18198 \times 0.000440 \approx 8 < 10\) ⚠️ (only 8 observed cases)

Note: With rare events (very small p), even large samples may not meet this condition perfectly. The z-test is approximate; exact methods exist for such cases.
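One such exact method is Fisher's exact test, available in base R as `fisher.test()`. The sketch below applies it to the trial counts as an illustration. It is not the analysis used in the trial itself.

```r
# 2 x 2 table: rows are outcomes, columns are groups
tab <- matrix(c(8,   18198 - 8,      # vaccine: cases, non-cases
                162, 18325 - 162),   # placebo: cases, non-cases
              nrow = 2,
              dimnames = list(outcome = c("case", "no case"),
                              group   = c("vaccine", "placebo")))

# alternative = "less" tests whether the odds of being a case
# are lower in the first column (vaccine) than in the second
exact <- fisher.test(tab, alternative = "less")

exact$p.value    # exact one-sided p-value
exact$estimate   # estimated odds ratio
```

Here the exact test leads to the same conclusion as the z-test, which is reassuring given that the vaccine group has fewer than 10 cases.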

Think-Pair-Share #3

[Poll Everywhere — respond now!]

A public health researcher studies HPV vaccination in two counties:

  • County A: 480 out of 600 eligible teens vaccinated (\(\hat{p}_A = 0.80\))
  • County B: 390 out of 500 eligible teens vaccinated (\(\hat{p}_B = 0.78\))

Discuss (2 min):

  1. Check the success/failure condition for both counties.
  2. What are the hypotheses for a two-sided test of whether the vaccination rates differ?
  3. Compute the pooled proportion \(\hat{p}_{pool}\).
  4. Do you think this difference (80% vs. 78%) is likely to be statistically significant? What about practically significant?

→ Poll Everywhere: What is \(\hat{p}_{pool}\) for this study? (Round to 2 decimal places)

Summary: Inference for Proportions

Quantity       Single Proportion                        Two Proportions
Statistic      \(\hat{p}\)                              \(\hat{p}_1 - \hat{p}_2\)
SE (CI)        \(\sqrt{\hat{p}(1-\hat{p})/n}\)          \(\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}\)
SE (test)      \(\sqrt{p_0(1-p_0)/n}\)                  \(\sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})(1/n_1 + 1/n_2)}\)
Distribution   Normal (z)                               Normal (z)
Conditions     np̂ ≥ 10, n(1-p̂) ≥ 10 (p₀ for a test)     Success/failure in both groups separately

Key Distinctions to Remember

CI vs. Test SE: They use different formulas!

  • CI uses \(\hat{p}\) because we don’t assume a specific p
  • Test uses \(p_0\) (single) or \(\hat{p}_{pool}\) (two-sample) because we assume H₀ is true

Pooling: We pool for two-proportion tests because under H₀, both groups have the same true p — so we combine all data to estimate it.

Conditions matter: Always check success/failure for each group. When counts are small, interpret results cautiously.

Looking Ahead

Thursday: We extend contingency table analysis to the chi-square test — a more flexible approach when we have multiple categories.

HW8 / Practice Final: Covers all material from Week 5 onward with emphasis on interpretation. You’ll be given R output and asked to explain what it means.

Key reminder for the final: No distribution tables. No software. Focus on:

  • Reading R output correctly
  • Interpreting p-values, CIs, and test statistics
  • Checking conditions
  • Choosing the right standard error