Independent Samples & Statistical Power

STAT 7

Welcome!

STAT 7 - Winter 2026

Today’s Plan

  • Continue: The DASH Diet Study
  • Independent samples t-tests
  • Introduction to statistical power
  • Calculating power for a study
  • Sample size determination
  • Planning future research

Recap: Tuesday’s Analysis

The DASH Diet Study (Appel et al., 1997, NEJM)

We analyzed a paired design:

  • Same participants before and after DASH diet
  • Mean reduction: 5.5 mmHg systolic BP
  • p < 0.001
  • Both statistically and clinically significant!

Today: How do we compare the DASH diet to OTHER diets?

The Full DASH Study Design

Three independent groups (different people in each):

  1. Control diet (n = 154) - typical American diet
  2. Fruits & Vegetables (n = 154) - control + more produce
  3. DASH diet (n = 151) - full intervention

Key difference from Tuesday:

  • Not the same people measured twice
  • Different participants in each diet group
  • This is an independent samples design

Comparing Two Diet Groups

Research Question: Does the DASH diet reduce blood pressure more than the Fruits & Vegetables diet?

Summary statistics (change in systolic BP from baseline):

  • DASH diet: mean = -5.5 mmHg, SD = 7.8 mmHg, n = 151
  • F&V diet: mean = -2.8 mmHg, SD = 7.5 mmHg, n = 154

Note: Negative values indicate BP decreased (good!)

Independent samples - Different people in each diet group

Independent Samples: Hypotheses

Let μ₁ = mean BP change for DASH diet
Let μ₂ = mean BP change for F&V diet

  • H₀: μ₁ = μ₂ (no difference between diets)
    • Equivalently: μ₁ - μ₂ = 0
  • Hₐ: μ₁ ≠ μ₂ (there is a difference)
    • Equivalently: μ₁ - μ₂ ≠ 0
  • Two-sided test at α = 0.05

Why two-sided? Though we expect DASH to be better, we test for any difference.

Independent Samples t-Test Formula

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Where:

  • \(\bar{x}_1, \bar{x}_2\) = sample means
  • \(s_1, s_2\) = sample standard deviations
  • \(n_1, n_2\) = sample sizes
  • Under H₀: μ₁ - μ₂ = 0

Degrees of Freedom for Independent Samples

Simple approximation: \(df = \min(n_1 - 1, n_2 - 1)\)

Better approximation (Welch’s):

\[df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}\]

Statistical software computes this for you when you run Welch’s t-test (the default in some packages, e.g. R’s t.test).

For our example: df ≈ 150 (using simpler method)
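As a rough check, both df approximations can be computed from the DASH and F&V summary statistics. This is a minimal sketch in plain Python; the function name `welch_df` is our own label, not something from a library.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch's degrees of freedom from summary statistics."""
    v1 = s1**2 / n1  # variance of the sample mean, group 1
    v2 = s2**2 / n2  # variance of the sample mean, group 2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# DASH (group 1) vs. F&V (group 2): SDs and sample sizes from the summary statistics
s1, n1 = 7.8, 151
s2, n2 = 7.5, 154

df_simple = min(n1 - 1, n2 - 1)       # simple, conservative approximation
df_welch = welch_df(s1, n1, s2, n2)   # Welch's approximation

print(f"simple df = {df_simple}")
print(f"Welch df  = {df_welch:.1f}")
```

Welch’s df comes out near 300, about double the simple value. With samples this large, both give critical values within about 0.02 of 1.96, so the choice barely affects the conclusion.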

DASH vs. F&V: Calculation

Difference in means: -2.7 mmHg
Standard error: 0.88 mmHg
t-statistic: -3.08 
df: 150 
p-value: ≈ 0.002

Decision: p ≈ 0.002 < 0.05, reject H₀

Conclusion: The DASH diet reduces blood pressure significantly more than the Fruits & Vegetables diet alone (additional reduction of 2.7 mmHg, p ≈ 0.002).

Clinical and Practical Significance

Statistical result: p ≈ 0.002 - statistically significant at α = 0.05

Practical significance:

  • DASH reduces BP 2.7 mmHg more than F&V diet
  • This additional reduction, while statistically significant, is modest
  • Both diets show benefit compared to control

The bigger picture:

  • DASH: 5.5 mmHg reduction from baseline
  • F&V: 2.8 mmHg reduction from baseline
  • Control: 0.9 mmHg reduction from baseline

All three differ significantly from each other!

Confidence Interval: Independent Samples

\[(\bar{x}_1 - \bar{x}_2) \pm t^* \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

For 95% CI with large df, \(t^* \approx 1.96\):

95% CI: (-4.42, -0.98) mmHg

Interpretation: We’re 95% confident that the DASH diet reduces blood pressure between 0.98 and 4.42 mmHg more than the F&V diet.

Note: Entire interval is negative (favoring DASH)
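The test statistic and the interval can both be reproduced from the summary statistics alone. The sketch below uses only Python’s standard library and, as an assumption, swaps the t distribution for the standard normal (reasonable here because df is large); the variable names are our own.

```python
import math
from statistics import NormalDist

# Summary statistics: change in systolic BP (mmHg)
mean1, s1, n1 = -5.5, 7.8, 151   # DASH
mean2, s2, n2 = -2.8, 7.5, 154   # F&V

diff = mean1 - mean2                      # difference in sample means
se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # unpooled standard error
t_stat = diff / se                        # test statistic under H0: mu1 - mu2 = 0

# ASSUMPTION: normal approximation to the t distribution (df is large)
z = NormalDist()
p_two_sided = 2 * z.cdf(-abs(t_stat))

z_star = z.inv_cdf(0.975)                 # about 1.96
ci = (diff - z_star * se, diff + z_star * se)

print(f"diff = {diff:.1f}, SE = {se:.2f}, t = {t_stat:.2f}")
print(f"approx. two-sided p = {p_two_sided:.4f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The normal approximation gives a slightly smaller p-value than an exact t-based calculation would, but at these sample sizes the difference is small.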

Think-Pair-Share 1

Comparison: Control diet showed 0.9 mmHg reduction. DASH showed 5.5 mmHg reduction. Difference = 4.6 mmHg (p < 0.001).

  1. Think (1 min): Is this a larger effect than DASH vs. F&V? Why might the F&V diet also reduce BP somewhat?
  2. Pair (2 min): Discuss the role of fruits and vegetables in both diets
  3. Share: What does this tell us about dietary components?

Conditions for Independent Samples t-Test

  1. Independence:
    • Observations within each group are independent
    • The two groups are independent of each other
    • DASH study: Random assignment to diet groups ✓
  2. Normality:
    • Data in each group should be approximately normal
    • OR large enough samples (n₁, n₂ ≥ 30)
    • DASH study: n > 150 in each group ✓
  3. Note: We DON’T assume equal variances (use Welch’s t-test)

The DASH study meets all conditions!

Break Time! ☕ 5-minute break

Stretch, grab water, chat with neighbors!

We’ll resume with statistical power.

Planning the Next Study

Based on DASH results, researchers want to design a new study:

Question: Can a modified DASH diet be effective in adolescents with pre-hypertension?

  • Before collecting data, need to determine sample size
  • How many adolescents to enroll?
  • This requires power analysis

Statistical Power: Introduction

Imagine planning this new dietary intervention study…

  • The diet truly does work (unknown to you)
  • You run your study
  • Will you detect the effect?

Power = Probability of correctly rejecting H₀ when Hₐ is true

In other words: Probability of detecting a real effect when it exists

Type I and Type II Errors (Review)

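|                   | H₀ is TRUE                                      | H₀ is FALSE (Hₐ is TRUE)                             |
|-------------------|-------------------------------------------------|------------------------------------------------------|
| Reject H₀         | Type I Error (False Positive) - Probability = α | Correct! (True Positive) - Probability = 1-β (Power) |
| Fail to Reject H₀ | Correct (True Negative) - Probability = 1-α     | Type II Error (False Negative) - Probability = β     |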

Power = 1 - β = Probability of detecting a true effect

Why Does Power Matter?

  • Low power → High chance of missing real effects
    • Wastes resources (time, money, participants)
    • Misleading null results
    • Ethical concerns (participants take on burden for little scientific gain)
  • High power → Good chance of detecting real effects
    • More confidence in study design
    • Better use of resources

Typical goal: Power ≥ 80% (sometimes 90%)

What Affects Power?

Power depends on:

  1. Effect size (Δ) - How large is the true difference?
    • Larger effects → easier to detect → higher power
  2. Variability (σ) - How much do measurements vary?
    • Less variability → easier to detect effects → higher power
  3. Sample size (n) - How many subjects?
    • Larger samples → more precise estimates → higher power
  4. Significance level (α) - How strict is our threshold?
    • Larger α → easier to reject H₀ → higher power (but more Type I errors!)
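To see these four factors at work, here is a small sketch that computes approximate power for a two-sided comparison of two group means. It assumes a normal approximation with known, equal σ and equal group sizes; the function `power_two_sample` and the baseline values (Δ = 4, σ = 8, n = 50, α = 0.05) are illustrative choices of ours, not part of any library.

```python
import math
from statistics import NormalDist

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    ASSUMPTION: equal sigma and equal n per group, normal approximation.
    """
    z = NormalDist()
    se = math.sqrt(2 * sigma**2 / n)     # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)    # e.g. 1.96 for alpha = 0.05
    shift = abs(delta) / se              # true effect measured in SE units
    # Probability of landing in either tail of the rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

base = dict(delta=4, sigma=8, n=50, alpha=0.05)   # illustrative baseline
scenarios = {
    "baseline": {},
    "larger effect (delta=6)": {"delta": 6},
    "less variability (sigma=6)": {"sigma": 6},
    "larger sample (n=100)": {"n": 100},
    "larger alpha (0.10)": {"alpha": 0.10},
}
for label, change in scenarios.items():
    power = power_two_sample(**{**base, **change})
    print(f"{label:28s} power = {power:.2f}")
```

Each change to the baseline moves power in the direction listed above: every modified scenario prints a higher power than the baseline.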

Example: Planning a Dietary Study

Scenario: Researchers want to test a simplified DASH-style diet in adolescents.

  • Previous studies (like DASH): σ ≈ 8 mmHg for BP changes
  • Effect of interest: Δ = 4 mmHg reduction (clinically meaningful)
  • Significance level: α = 0.05 (two-sided)
  • Question: With n = 50 per group, what’s the power?

Why n = 50? Budget constraints for this pilot study

Visualizing the Problem

Null distribution (H₀: no difference):

  • Center: 0
  • SE = \(\sqrt{8^2/50 + 8^2/50} = 1.60\) mmHg

Alternative distribution (Hₐ: difference = -4):

  • Center: -4
  • Same SE = 1.60 mmHg

Rejection region: |difference| > 1.96 × 1.60 = 3.14 mmHg

The Power Calculation

Power = P(Reject H₀ | Hₐ is true) ≈ 71% (the area under the alternative distribution that falls inside the rejection region)

Computing Power

Step 1: Find rejection region for H₀

  • For α = 0.05, two-sided: critical values at ±1.96 SE
  • Rejection region: difference < -3.14 or > 3.14 mmHg

Step 2: Calculate probability under Hₐ (when true difference = -4)

  • Convert to z-score: \(z = \frac{-3.14 - (-4)}{1.60} = 0.54\)
  • P(Z < 0.54) ≈ 0.71

Power ≈ 71% - Decent, but could be higher.

Interpretation: If the diet truly reduces BP by 4 mmHg, there’s a 71% chance this study will detect it.
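One way to make this interpretation concrete is to simulate many hypothetical studies and count how often H₀ is rejected. The sketch below assumes normally distributed BP changes with known σ = 8, a true difference of -4 mmHg, and a comparison-group mean of 0 (an arbitrary choice that does not affect the result). The seed and the number of simulated studies are also arbitrary.

```python
import math
import random
from statistics import fmean

random.seed(7)   # arbitrary seed so the run is reproducible

SIGMA, N, TRUE_DIFF = 8.0, 50, -4.0
SE = math.sqrt(SIGMA**2 / N + SIGMA**2 / N)   # 1.60 mmHg
CUTOFF = 1.96 * SE                            # reject H0 if |difference| > 3.14

def one_study():
    """Simulate one two-group study; return True if H0 is rejected."""
    diet = [random.gauss(TRUE_DIFF, SIGMA) for _ in range(N)]
    comparison = [random.gauss(0.0, SIGMA) for _ in range(N)]
    diff = fmean(diet) - fmean(comparison)
    return abs(diff) > CUTOFF

n_sims = 4000
rejections = sum(one_study() for _ in range(n_sims))
estimated_power = rejections / n_sims
print(f"Estimated power: {estimated_power:.3f}")
```

The estimate should land close to the calculated 71%, give or take simulation noise of about a percentage point.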

Think-Pair-Share 2

The dietary intervention study has 71% power with n=50 per group to detect a 4 mmHg reduction.

  1. Think (1 min): What does 71% power mean? Is this adequate?
  2. Pair (2 min): The researchers want 80% power. What could they do?
  3. Share: What are the tradeoffs?

Increasing Power to 80%

Target: 80% power to detect Δ = 4 mmHg reduction

Key insight: Rejection region is always 1.96 SE from 0. We need the alternative distribution far enough left that 80% falls in the rejection region.

This requires: \(0.84 \times SE + 1.96 \times SE = 4\)

Where 0.84 is the z-score for 80th percentile (for 80% power)

Sample Size Calculation

\[2.8 \times SE = 4\]

\[SE = \frac{4}{2.8} = 1.43\]

Since \(SE = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}} = \sqrt{\frac{2\sigma^2}{n}}\) with σ = 8:

\[\sqrt{\frac{2 \times 8^2}{n}} = 1.43\]

\[n = \frac{2 \times 8^2}{1.43^2} \approx 62.6 \;\Rightarrow\; 63\] participants per group (rounding up)

Conclusion: Need 63 adolescents in each diet group for 80% power.

Sample Size Formula

For comparing two means with equal n per group:

\[n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}\]

Where:

  • σ₁, σ₂ = population standard deviations (often assumed equal)
  • Δ = minimum effect size of interest
  • z_{1-α/2} = critical value for significance (1.96 for α=0.05)
  • z_{1-β} = critical value for power (0.84 for 80% power, 1.28 for 90%)

Always round UP!
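The formula translates directly into a few lines of code. This is a minimal sketch using Python’s standard library; the function name `n_per_group` and its parameter names are our own, and it computes the z-values exactly rather than using the rounded 1.96 and 0.84.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma1, sigma2, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    n = (sigma1**2 + sigma2**2) * (z_alpha + z_beta) ** 2 / delta**2
    return math.ceil(n)                  # always round up

# Dietary study: sigma = 8 mmHg in both groups, delta = 4 mmHg, 80% power
print(n_per_group(delta=4, sigma1=8, sigma2=8))   # 63
```

Using `math.ceil` rather than `round` enforces the round-up rule: rounding down would leave the study slightly short of its target power.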

Verify Our Calculation

Using the formula for our dietary study:

\[n = \frac{(8^2 + 8^2)(1.96 + 0.84)^2}{4^2}\]

\[n = \frac{128 \times (2.8)^2}{16} = \frac{128 \times 7.84}{16} = 62.7\]

Round up to n = 63 per group ✓

This matches our earlier calculation!

Practice: Omega-3 Supplementation Study

A nutrition researcher wants to test omega-3 supplementation on inflammation markers.

  • Previous data: σ ≈ 12 mg/L for C-reactive protein (CRP)
  • Target reduction: Δ = 8 mg/L (clinically meaningful)
  • Desired power: 90%
  • Significance: α = 0.05 (two-sided)

Calculate: How many participants needed per group?

Solution

\[n = \frac{(12^2 + 12^2)(1.96 + 1.28)^2}{8^2}\]

Where:

  • z_{1-α/2} = 1.96 (for α = 0.05)
  • z_{1-β} = 1.28 (for 90% power)

\[n = \frac{2 \times 144 \times (3.24)^2}{64} = \frac{288 \times 10.50}{64} = 47.25\]

Answer: Need 48 participants per group (supplement vs. placebo)

This is a modest sample size - omega-3 studies are feasible!

Think-Pair-Share 3

Researchers want to study the DASH diet in older adults (65+). They can afford to enroll 40 participants total (20 per group). Previous data shows σ = 10 mmHg, and they want to detect Δ = 5 mmHg.

  1. Think (1 min): Will this study have good power (≥80%)?
  2. Pair (2 min): What tradeoffs might the researchers consider?
  3. Share: How would you advise them to proceed?

The DASH Study’s Legacy

Original DASH trial (1997):

  • Carefully planned with adequate sample size
  • Multiple diet groups for comparisons
  • Detected clinically meaningful effects
  • Published in top medical journal

Impact:

  • Changed dietary guidelines nationwide
  • Led to DASH eating plan recommendations
  • Spawned follow-up studies (DASH-Sodium, DASH-Plus)
  • Continues to influence public health policy

Key lesson: Good study design with proper power analysis leads to impactful science!

Power Analysis: Key Principles

  1. Do power analysis BEFORE collecting data
    • Determines appropriate sample size
    • Justifies study to reviewers/funders
  2. Balance competing factors:
    • Higher power → need more participants → more cost/time
    • Lower power → risk missing real effects
  3. Be realistic about effect sizes
    • Use previous studies
    • Consider minimum clinically important difference
  4. Remember: Power = 1 - P(Type II Error)

Real-World Considerations

Why not always use huge samples for 99% power?

  • Cost - Each participant costs money
  • Time - Recruitment takes time
  • Ethics - Don’t expose more participants than necessary
  • Diminishing returns - Going from 80% to 90% power requires a much larger increase in n than going from 60% to 70%

Standard practice: Target 80-90% power
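The diminishing-returns point can be checked numerically. As an illustration only, the sketch below reuses the dietary-study assumptions (σ = 8 mmHg per group, Δ = 4 mmHg, two-sided α = 0.05) with the normal-approximation sample size formula; the helper name `n_for_power` is our own.

```python
import math
from statistics import NormalDist

def n_for_power(power, delta=4.0, sigma=8.0, alpha=0.05):
    """Per-group n for a given power (equal sigma and n in both groups)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * sigma**2 * z_sum**2 / delta**2)

sizes = {p: n_for_power(p) for p in (0.60, 0.70, 0.80, 0.90, 0.99)}
for p, n in sizes.items():
    print(f"power {p:.0%}: n = {n} per group")
```

Under these assumptions, moving from 60% to 70% power adds about 10 participants per group, while moving from 80% to 90% adds about 22.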

Summary: Power and Sample Size

| To Increase Power:            | Effect on Study:                       |
|-------------------------------|----------------------------------------|
| Increase sample size (n)      | More expensive, takes longer           |
| Study larger effect sizes (Δ) | May not match research question        |
| Reduce variability (σ)        | Better measurement, stricter inclusion |
| Increase α                    | More Type I errors - usually not done  |

Bottom line: Sample size is usually the most practical lever

Key Takeaways

  1. Independent samples t-tests compare means from two separate groups (like DASH vs. F&V diet)
  2. Power = probability of detecting a real effect when it exists
  3. Power depends on: effect size (Δ), variability (σ), sample size (n), and α
  4. Sample size planning ensures adequate power - do this BEFORE collecting data!
  5. Standard target: 80-90% power
  6. The DASH study exemplifies well-designed research with lasting impact

Bottom line: Invest time in planning. The DASH researchers did, and it changed nutrition guidelines!

Looking Ahead

Next week (Week 8):

  • Correlation between two quantitative variables
  • Simple linear regression
  • Moving from comparing groups to studying relationships

Before next class:

  • Complete DSA 6 (due after your DS, or in Thursday’s class for those in the Monday DS)
  • Complete HW 5 (due Friday)
  • Review: paired vs. independent designs