STAT 17: Confidence Intervals (cont.) & Hypothesis Testing

Prof. Marcela Alfaro Cordoba

Statistics - UCSC

29 May 2026

What We’ll Accomplish Today

Last time: We introduced the Central Limit Theorem and started confidence intervals.

We still need to cover:

  • Confidence intervals with unknown σ (t-distribution) — recap
  • Confidence intervals for proportions

Then, new material:

  • The logic of hypothesis testing
  • Null and alternative hypotheses
  • Test statistics and p-values
  • Complete hypothesis tests (four-step process)

Quick Recap: Confidence Intervals

General form:

\[\text{Estimate} \pm \text{Margin of Error}\]

Three CIs we need:

  1. Mean, σ known (or large n): \(\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\)

  2. Mean, σ unknown: \(\bar{x} \pm t_{\alpha/2,\,df} \times \frac{s}{\sqrt{n}}\)

  3. Proportion: \(\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)

Key insight: CIs give a range of plausible values for a population parameter. Our confidence is in the method, not in any single interval.

Confidence Intervals: σ Unknown — The t-Distribution

Reality check: We almost never know σ in practice.

When σ is unknown:

  • Replace σ with the sample standard deviation s
  • Replace the z critical value with a t critical value
  • Use degrees of freedom df = n − 1

\[\bar{x} \pm t_{\alpha/2,\, df} \times \frac{s}{\sqrt{n}}\]

Why a different distribution?

  • Using s instead of σ introduces extra uncertainty
  • The t-distribution has heavier tails than the normal — it accounts for this
  • As df increases (larger n), the t-distribution approaches the z-distribution

See the applet: https://istats.shinyapps.io/tdist/

The t-Distribution: Critical Values

Finding t critical values in Google Sheets:

=T.INV.2T(alpha, df)

where alpha = 1 − confidence level.

Example: 95% CI with n = 25 (df = 24)

=T.INV.2T(0.05, 24) ≈ 2.064

Compare to z:

=NORM.S.INV(0.975) ≈ 1.96

The t critical value is always larger — giving a wider interval for small samples.

Sample size   df   t (95%)   z (95%)
n = 10         9    2.262     1.960
n = 30        29    2.045     1.960
n = 100       99    1.984     1.960

As n grows, t → z.
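You can cross-check the table above outside Google Sheets. Below is a minimal Python sketch that uses only the standard library. Python has no built-in t-distribution, so `t_pdf`, `t_cdf`, and `t_inv_2t` are hypothetical helpers written for this example. The CDF comes from numerical integration (Simpson's rule), and the critical value comes from bisection, which mimics what `T.INV.2T` returns. `statistics.NormalDist().inv_cdf` plays the role of `NORM.S.INV`.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv_2t(alpha: float, df: int) -> float:
    """Positive c with P(|T| > c) = alpha (like T.INV.2T), found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)  # like =NORM.S.INV(0.975)
print("Sample size   df   t (95%)   z (95%)")
for n in (10, 30, 100):
    df = n - 1
    t_crit = t_inv_2t(0.05, df)  # like =T.INV.2T(0.05, df)
    print(f"n = {n:<10}{df:>3}    {t_crit:.3f}     {z_crit:.3f}")
```

In everyday work, a library routine such as SciPy's `scipy.stats.t.ppf` would replace the hand-rolled helpers. They are written out here only so the sketch runs without dependencies.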

Example: t-Confidence Interval

Chloe tests a new production method for phone batteries:

  • Sample: n = 25 phones
  • Sample mean: x̄ = 24.5 hours
  • Sample std dev: s = 3.5 hours
  • Want 95% confidence interval

Step 1: Check conditions

  • Random sample? ✓
  • n = 25 < 30, but battery life is approximately normal ✓

Step 2: Critical value

df = 24
=T.INV.2T(0.05, 24) ≈ 2.064

Steps 3–5: Calculate the SE, the margin of error, and the CI

SE = 3.5 / SQRT(25) = 0.7
ME = 2.064 × 0.7 = 1.445
CI: 24.5 ± 1.445 = (23.055, 25.945)

Interpretation: We are 95% confident the true mean battery life with the new method is between 23.1 and 25.9 hours.
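Here is a minimal Python sketch of the same t-interval computation, assuming only the standard library. `t_interval` and the t helpers are hypothetical names. The helpers stand in for `T.INV.2T` using numerical integration and bisection.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv_2t(alpha: float, df: int) -> float:
    """Positive c with P(|T| > c) = alpha (like T.INV.2T), found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar: float, s: float, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Two-sided t confidence interval for a mean when sigma is unknown."""
    df = n - 1
    t_crit = t_inv_2t(1 - conf, df)  # Step 2: critical value
    se = s / math.sqrt(n)            # standard error
    me = t_crit * se                 # margin of error
    return xbar - me, xbar + me


low, high = t_interval(24.5, 3.5, 25)  # Chloe's summary statistics
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The result agrees with the hand calculation above up to rounding of the critical value.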

Confidence Intervals for Proportions

New scenario: Estimating a population proportion p

Examples:

  • Proportion of defective products in a batch
  • Percentage of voters supporting a candidate
  • Proportion of customers who would repurchase

Sample proportion:

\[\hat{p} = \frac{x}{n}\]

where x = number of “successes,” n = sample size.

Standard error for proportions:

\[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

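Why z, not t? For proportions, the SE is computed entirely from p̂, so there is no separate σ to estimate as in the mean case. Instead, the normal approximation, justified by the success-failure condition, supplies the z critical value.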

CI for Proportions: Formula and Conditions

Confidence interval for a proportion:

\[\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Conditions — check these before proceeding!

  1. Random sample from population
  2. Success-failure condition: np̂ ≥ 10 and n(1 − p̂) ≥ 10
  3. Population is at least 10× the sample size (if sampling without replacement)

If condition 2 fails, the normal approximation breaks down and this formula is not valid.

Example: Proportion CI

Chloe surveys customers:

  • n = 200 customers sampled
  • x = 156 satisfied with their phone
  • p̂ = 156/200 = 0.78
  • Want 95% CI for the true satisfaction rate

Step 1: Check conditions

  • Random sample? ✓
  • np̂ = 200 × 0.78 = 156 ≥ 10 ✓
  • n(1 − p̂) = 200 × 0.22 = 44 ≥ 10 ✓

Step 2: Calculate

SE = SQRT(0.78 * 0.22 / 200) = 0.0293
z  = NORM.S.INV(0.975) = 1.96
ME = 1.96 × 0.0293 = 0.0574
CI: 0.78 ± 0.0574 = (0.723, 0.837)

Interpretation: We are 95% confident that between 72.3% and 83.7% of all customers are satisfied.
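Here is a Python sketch of the steps above, using only the standard library. `proportion_ci` is a hypothetical helper. It checks only the success-failure condition, because randomness and the 10% condition depend on how the data were collected and cannot be verified from counts alone.

```python
import math
from statistics import NormalDist


def proportion_ci(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a population proportion p."""
    p_hat = x / n
    # Success-failure condition: at least 10 successes and 10 failures.
    # (Random sampling and the 10% condition must be checked separately.)
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("success-failure condition fails; formula not valid")
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # like =NORM.S.INV(0.975)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z_crit * se
    return p_hat - me, p_hat + me


low, high = proportion_ci(156, 200)  # Chloe's customer survey
print(f"95% CI: ({low:.3f}, {high:.3f})")
```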

Sample Size for Proportion CI

How large must n be for a desired margin of error?

\[n = \left(\frac{z_{\alpha/2}}{ME}\right)^2 \times \hat{p}(1-\hat{p})\]

The catch: We need p̂ to compute n, but we need n to compute p̂!

Solutions:

  1. Use prior data (historical p̂ if available)
  2. Use p̂ = 0.5 — most conservative, gives the largest (safest) n
  3. Use an educated guess from similar studies

Example: ME = 0.03, 95% confidence, no prior estimate (p̂ = 0.5)

= (NORM.S.INV(0.975) / 0.03)^2 * 0.5 * 0.5
= (1.96 / 0.03)^2 × 0.25 ≈ 1067.1, which rounds up to 1068

Need n = 1068 for a margin of error of 3% with no prior information.

Always round up.
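The sample-size formula, including the round-up step, fits in a few lines of Python. This is a minimal sketch. `sample_size_for_proportion` is a hypothetical name, and the default `p_guess = 0.5` encodes the conservative choice described above.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(me: float, conf: float = 0.95, p_guess: float = 0.5) -> int:
    """Smallest n giving margin of error <= me for a proportion CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # like =NORM.S.INV(0.975)
    n_exact = (z_crit / me) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n_exact)  # always round up, never down


print(sample_size_for_proportion(0.03))
```

`math.ceil` corresponds to the "always round up" rule. Ordinary rounding could return an n that is slightly too small to achieve the target margin of error.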

THINK-PAIR-SHARE 1 (7 minutes)

Proportion Confidence Interval

A politician claims to have 55% support. A poll of n = 400 voters finds 200 support the politician (p̂ = 0.50).

  1. Check the conditions for constructing a CI for p.
  2. Construct a 99% confidence interval for the true proportion p.
  3. Does the interval contain 0.55? What does that tell you?
  4. What would happen to the interval if we used 95% confidence instead of 99%?
  5. How many voters would we need to poll to get a margin of error of ±2% with 95% confidence (use p̂ = 0.50)?

Use Google Sheets! Post on Ed Discussion with your partner’s name.

Share your answers in Poll Everywhere!

What is the 99% CI for the true proportion p?

Summary: Which CI to Use?

  • Mean, σ known (or large n): \(\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\), use z
  • Mean, σ unknown: \(\bar{x} \pm t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}}\), use t (df = n − 1)
  • Proportion: \(\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\), use z

When in doubt, use t for means — it is always more conservative, and Google Sheets makes it just as easy.

Always check conditions before building any interval!

🧘‍♀️ STRETCH BREAK

Time to move! (5 minutes)

  • Stand up and stretch 🤸‍♀️
  • Chat with neighbors about CIs 💬
  • Grab some water 💧

Hypothesis Testing

Case Study: The Drug Approval Decision

Meet Dr. Chen, a medical researcher testing a new drug to lower blood pressure.

Her challenge: The pharmaceutical company claims the drug lowers blood pressure by at least 10 mmHg. Dr. Chen must:

  • Test whether this claim is supported by data
  • Balance two types of errors: approving ineffective drugs vs. rejecting effective ones
  • Make a decision with statistical evidence
  • Communicate findings to the FDA

The stakes: Approving an ineffective drug wastes money and gives false hope. Rejecting an effective drug denies patients a helpful treatment.

The tool: Hypothesis testing — the scientific method in statistical form!

From Estimation to Testing

So far: Using data to estimate parameters

  • Point estimate: x̄ = 24.5 hours
  • Interval estimate: 95% CI = (23.1, 25.9) hours

Now: Using data to test claims about parameters

  • Claim: “The average battery life is 24 hours”
  • Question: Do our data support or refute this claim?

This is hypothesis testing!

The Logic of Hypothesis Testing

The scientific method:

  1. Start with a claim (hypothesis)
  2. Collect data
  3. See if the data are consistent with the claim
  4. Make a decision: reject the claim or do not reject it

Key principle: A statistical version of proof by contradiction

  • Assume the claim is true
  • If data are very unlikely under this assumption → reject the claim
  • If data are reasonably likely → do not reject the claim

Important: We never prove hypotheses; we only gather evidence for or against them!

Example: Drug lowers BP by 10 mmHg

  • Observe only 2 mmHg reduction with a large sample → reject claim
  • Observe 9 mmHg reduction → not enough evidence to reject

But how do we find the cut-off point?

The Null and Alternative Hypotheses

Null Hypothesis (H₀):

  • The “status quo” or “nothing interesting is happening” claim
  • What we assume is true initially
  • Always contains =, ≤, or ≥
  • The hypothesis we try to find evidence against

Alternative Hypothesis (H₁ or Hₐ):

  • The “research” hypothesis
  • What we are trying to find evidence for
  • Contains ≠, <, or >
  • Determines the type of test (two-tailed, left-tailed, right-tailed)

Key rule: H₀ and H₁ must be:

  • Mutually exclusive (can’t both be true)
  • Exhaustive (one must be true)
  • Statements about population parameters, not sample statistics

Types of Alternative Hypotheses

1. Two-tailed (≠):

  • H₀: μ = μ₀    H₁: μ ≠ μ₀
  • Use when: Interested in detecting any difference, in either direction
  • Example: “Is the mean different from 24 hours?”

2. Right-tailed (>):

  • H₀: μ ≤ μ₀    H₁: μ > μ₀
  • Use when: Want to show the parameter is greater
  • Example: “Has the new process increased battery life?”

3. Left-tailed (<):

  • H₀: μ ≥ μ₀    H₁: μ < μ₀
  • Use when: Want to show the parameter is less
  • Example: “Has the drug lowered blood pressure?”

The alternative hypothesis determines which tail(s) of the distribution we examine!

Formulating Hypotheses: Examples

Example 1: Drug testing

Claim: Drug lowers BP by at least 10 mmHg (μ ≥ 10, where μ is the true mean reduction in mmHg)

  • H₀: μ ≥ 10   (drug is effective)
  • H₁: μ < 10   (drug is not effective enough)
  • Type: Left-tailed

Example 2: Quality control

Standard: Battery life should be 24 hours (μ = 24)

  • H₀: μ = 24   (meeting standard)
  • H₁: μ ≠ 24   (not meeting standard)
  • Type: Two-tailed

Example 3: Process improvement

Question: Has training improved customer satisfaction above 75%?

  • H₀: p ≤ 0.75   (no improvement)
  • H₁: p > 0.75   (improvement occurred)
  • Type: Right-tailed

Key: The research question determines H₁; H₀ is its complement.

THINK-PAIR-SHARE 2 (7 minutes)

Formulating Hypotheses

For each scenario, write H₀ and H₁, and identify the test type:

  1. A company claims their phone battery lasts at least 48 hours. You want to test this claim.
  2. Historical average GPA at UCSC is 3.2. Has it changed?
  3. A website claims their ads have a 5% click-through rate. You think it’s lower.
  4. A manufacturer wants to know if a new process produces parts with mean weight different from the current 50 grams.
  5. A hospital wants to show that their new treatment reduces recovery time below the current 7 days.

For each: Identify the parameter, write the hypotheses, name the test type.

Post on Ed Discussion with your partner’s name!

Test Statistics: The Evidence Measure

Test statistic: A single number measuring how far the sample data are from H₀

For means (σ known):

\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]

For means (σ unknown, df = n − 1):

\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]

For proportions:

\[z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]

Interpretation: How many standard errors is our sample statistic from the null value? A large |test statistic| means the data are inconsistent with H₀.
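All three formulas share one pattern: (estimate − null value) / standard error. The minimal Python sketch below makes this explicit. The function names are hypothetical. The z and t statistics for a mean use identical arithmetic and differ only in which distribution the result is compared to.

```python
import math


def mean_test_statistic(xbar: float, mu0: float, sd: float, n: int) -> float:
    """(x̄ − μ0) / (sd/√n): pass σ for a z-test, or s for a t-test."""
    return (xbar - mu0) / (sd / math.sqrt(n))


def proportion_z(p_hat: float, p0: float, n: int) -> float:
    """(p̂ − p0) / √(p0(1 − p0)/n): the SE uses the null value p0, not p̂."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# Battery-life numbers from the worked example later in this lecture;
# s = 4 is a sample SD, so this is a t statistic.
print(mean_test_statistic(23.5, 24, 4, 100))
```

The result, −1.25, means the sample mean lies 1.25 standard errors below the null value of 24 hours.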

The P-Value: Measuring Evidence

P-value definition:

The probability of observing data as extreme as, or more extreme than, what we actually observed, assuming H₀ is true.

Interpretation:

  • Small p-value → data unlikely under H₀ → evidence against H₀
  • Large p-value → data consistent with H₀ → insufficient evidence against H₀

The p-value is NOT:

  • The probability that H₀ is true
  • The probability that H₁ is true
  • The probability that we made the wrong decision

Calculating P-Values by Test Type

The p-value depends on the alternative hypothesis:

Two-tailed test (H₁: μ ≠ μ₀):

  • p-value = 2 × P(Z > |test statistic|)
  • Look at both tails

Right-tailed test (H₁: μ > μ₀):

  • p-value = P(Z > test statistic)
  • Look at right tail only

Left-tailed test (H₁: μ < μ₀):

  • p-value = P(Z < test statistic)
  • Look at left tail only

Google Sheets formulas on the next slide!

Google Sheets: P-Value Calculations

z-tests

(proportions or means with σ known)

Two-tailed:

=2*NORM.S.DIST(-ABS(z), TRUE)

Right-tailed:

=1-NORM.S.DIST(z, TRUE)

Left-tailed:

=NORM.S.DIST(z, TRUE)

t-tests

(means with σ unknown)

Two-tailed:

=2*T.DIST(-ABS(t), df, TRUE)

Right-tailed:

=1-T.DIST(t, df, TRUE)

Left-tailed:

=T.DIST(t, df, TRUE)

Pro tip: Use ABS() for the absolute value in two-tailed tests!
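The six Sheets formulas above reduce to one rule: pick the tail(s) named by H₁, then evaluate a CDF. Here is a minimal Python sketch, assuming only the standard library. `p_value`, the `"two-sided"`/`"greater"`/`"less"` labels, and the t helpers are hypothetical names. `NormalDist().cdf` stands in for `NORM.S.DIST(x, TRUE)`, and `t_cdf` (numerical integration) stands in for `T.DIST(x, df, TRUE)`.

```python
import math
from statistics import NormalDist
from typing import Optional


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(stat: float, alternative: str = "two-sided", df: Optional[int] = None) -> float:
    """p-value for a z statistic (df=None) or a t statistic (df given)."""
    def cdf(x: float) -> float:
        # NORM.S.DIST(x, TRUE) for z; T.DIST(x, df, TRUE) for t
        return NormalDist().cdf(x) if df is None else t_cdf(x, df)

    if alternative == "two-sided":  # H1: mu != mu0
        return 2 * cdf(-abs(stat))
    if alternative == "greater":    # H1: mu > mu0
        return 1 - cdf(stat)
    if alternative == "less":       # H1: mu < mu0
        return cdf(stat)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


print(round(p_value(-1.25), 4))         # treated as a z statistic
print(round(p_value(-1.25, df=99), 4))  # treated as a t statistic, df = 99
```

For the same statistic, the t p-value comes out slightly larger than the z p-value because the t-distribution has heavier tails.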

Significance Level (α)

Significance level (α): The threshold for rejecting H₀

Common choices:

  • α = 0.05 (most common)
  • α = 0.01 (more conservative)
  • α = 0.10 (more liberal)

Decision rule:

  • If p-value ≤ α → Reject H₀ (statistically significant)
  • If p-value > α → Fail to reject H₀ (not statistically significant)

Relationship to confidence intervals (for two-tailed tests):

  • α = 0.05 corresponds to a 95% CI
  • α = 0.01 corresponds to a 99% CI
  • α = 0.10 corresponds to a 90% CI

Note: α must be chosen before seeing the data!
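For a two-tailed test, the decision rule and the matching confidence interval agree: H₀: μ = μ₀ is rejected at level α exactly when μ₀ falls outside the (1 − α) CI. The minimal Python sketch below checks this on Chloe's battery numbers (x̄ = 24.5, s = 3.5, n = 25). The list of μ₀ values is purely illustrative, and the t helpers are hypothetical, standard-library-only stand-ins for `T.DIST` and `T.INV.2T`.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv_2t(alpha: float, df: int) -> float:
    """Positive c with P(|T| > c) = alpha (like T.INV.2T), found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05                # chosen before looking at the data
xbar, s, n = 24.5, 3.5, 25  # Chloe's t-interval example
df = n - 1
se = s / math.sqrt(n)
t_crit = t_inv_2t(alpha, df)
low, high = xbar - t_crit * se, xbar + t_crit * se  # 95% CI

for mu0 in (22.5, 23.0, 24.0, 25.5, 26.0):  # illustrative null values
    t = (xbar - mu0) / se
    p = 2 * t_cdf(-abs(t), df)  # two-tailed p-value
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    inside = low <= mu0 <= high
    print(f"mu0 = {mu0}: p = {p:.4f} -> {decision}; mu0 in 95% CI? {inside}")
```

In every row, "reject H0" pairs with μ₀ lying outside the interval, and "fail to reject H0" pairs with μ₀ lying inside it.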

The Four-Step Hypothesis Test

Step 1: STATE

  • State H₀ and H₁
  • Define parameters clearly
  • Choose significance level α

Step 2: PLAN

  • Check conditions (random sample, sample size, etc.)
  • Choose test statistic (z or t)
  • Identify the distribution

Step 3: SOLVE

  • Calculate the test statistic
  • Find the p-value using Google Sheets

Step 4: CONCLUDE

  • Compare p-value to α
  • Make decision (reject or fail to reject H₀)
  • State conclusion in context

Always follow all four steps!

Example: Battery Life Test

Scenario: Sarah’s company claims μ = 24 hours. She tests n = 100 phones and finds x̄ = 23.5 hours, s = 4 hours. Test at α = 0.05.

STEP 1: STATE

  • H₀: μ = 24 hours    H₁: μ ≠ 24 hours (two-tailed)
  • α = 0.05    Parameter: μ = true mean battery life

STEP 2: PLAN

  • Random sample ✓, n = 100 ≥ 30 ✓, σ unknown → use t
  • df = 99

STEP 3: SOLVE

t = (23.5 - 24) / (4/SQRT(100)) = -0.5 / 0.4 = -1.25
p-value = 2*T.DIST(-1.25, 99, TRUE) ≈ 0.214

STEP 4: CONCLUDE

p-value = 0.214 > α = 0.05 → Fail to reject H₀

There is insufficient evidence at the 0.05 level to conclude that the mean battery life is different from 24 hours.
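The four steps map naturally onto a small function. Here is a minimal Python sketch, assuming a two-tailed one-sample t-test computed from summary statistics. `one_sample_t_test` and the t helpers are hypothetical names. The STATE and PLAN steps (writing hypotheses, checking conditions) remain the analyst's job, so they appear only as comments.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry (like T.DIST)."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t_test(xbar: float, s: float, n: int, mu0: float, alpha: float = 0.05):
    # STEP 1 (STATE): H0: mu = mu0 vs H1: mu != mu0; alpha fixed in advance.
    # STEP 2 (PLAN): analyst checks random sampling and n >= 30 (or roughly
    # normal data); sigma is unknown, so use t with df = n - 1.
    df = n - 1
    # STEP 3 (SOLVE): test statistic and two-tailed p-value.
    t = (xbar - mu0) / (s / math.sqrt(n))
    p = 2 * t_cdf(-abs(t), df)  # like =2*T.DIST(-ABS(t), df, TRUE)
    # STEP 4 (CONCLUDE): compare p to alpha; the in-context sentence is still ours to write.
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return t, df, p, decision


t, df, p, decision = one_sample_t_test(23.5, 4, 100, 24)  # Sarah's battery data
print(f"t = {t:.2f}, df = {df}, p-value = {p:.3f} -> {decision}")
```

The function returns only the decision. The conclusion in context, as written on the slide above, still has to be stated in words.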

THINK-PAIR-SHARE 3 (7 minutes)

Complete Hypothesis Test

A coffee shop claims the average wait time is 5 minutes. You sample n = 36 customers and find x̄ = 5.8 minutes with s = 2.4 minutes. Test at α = 0.05 whether the true mean wait time is different from 5 minutes.

Follow the four-step process:

  1. STATE: Write H₀, H₁, define α and the parameter
  2. PLAN: Check conditions, identify test statistic and distribution
  3. SOLVE: Calculate test statistic and p-value (use Google Sheets)
  4. CONCLUDE: Make decision and write conclusion in context

Bonus: What would change if this were a right-tailed test (want to show wait time exceeds 5 minutes)?

Post on Ed Discussion with your partner’s name!

Share your answers in Poll Everywhere!

What is the p-value for this test?

Quick Knowledge Check ✅

Rate your confidence (1–5) on Ed Discussion:

  1. Constructing CIs with unknown σ (t-interval) ⭐⭐⭐⭐⭐
  2. Constructing CIs for proportions ⭐⭐⭐⭐⭐
  3. Formulating null and alternative hypotheses ⭐⭐⭐⭐⭐
  4. Calculating and interpreting p-values ⭐⭐⭐⭐⭐
  5. Conducting complete hypothesis tests (4 steps) ⭐⭐⭐⭐⭐

If you rated anything 3 or below, come to office hours!

Thank you! 📊✨

Questions? I have office hours right after class!

Next up: Type I & II errors, comparing two groups

Remember:

  • Post Think-Pair-Share responses on Ed Discussion and Poll Everywhere
  • Rate your confidence
  • Practice the four-step process with complete, in-context conclusions
  • Always check conditions before testing or building a CI