STAT 17: Confidence Intervals (cont.) & Hypothesis Testing

Prof. Marcela Alfaro Cordoba

Statistics - UCSC

08 Jun 2026

What We’ll Accomplish Today

Last time: We introduced the Central Limit Theorem and started confidence intervals.

We still need to cover:

Confidence intervals with unknown σ (t-distribution) — recap
Confidence intervals for proportions

Then, new material:

The logic of hypothesis testing
Null and alternative hypotheses
Test statistics and p-values
Complete hypothesis tests (four-step process)

Quick Recap: Confidence Intervals

General form:

\[\text{Estimate} \pm \text{Margin of Error}\]

Three CIs we need:

Mean, σ known (or large n): \(\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\)
Mean, σ unknown: \(\bar{x} \pm t_{\alpha/2,\,df} \times \frac{s}{\sqrt{n}}\)
Proportion: \(\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)

Key insight: CIs give a range of plausible values for a population parameter. Our confidence is in the method, not in any single interval.
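The claim that our confidence is in the method can be illustrated with a small simulation. The Python sketch below is only an illustration: the population values (μ = 24, σ = 3.5), the sample size n = 30, and the function name coverage are assumptions chosen for this example, not part of the course materials. It builds many 95% z-intervals from simulated samples and counts how often they capture the true mean.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=24.0, sigma=3.5, n=30, trials=2000, confidence=0.95, seed=17):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5                  # same margin for every sample
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:   # did this interval capture mu?
            hits += 1
    return hits / trials

print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The share should land close to 0.95. Any single interval either contains μ or it does not; the 95% describes the long-run behavior of the procedure.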

Confidence Intervals: σ Unknown — The t-Distribution

Reality check: We almost never know σ in practice.

When σ is unknown:

Replace σ with the sample standard deviation s
Replace the z critical value with a t critical value
Use degrees of freedom df = n − 1

\[\bar{x} \pm t_{\alpha/2,\, df} \times \frac{s}{\sqrt{n}}\]

Why a different distribution?

Using s instead of σ introduces extra uncertainty
The t-distribution has heavier tails than the normal — it accounts for this
As df increases (larger n), the t-distribution approaches the z-distribution

See the applet: https://istats.shinyapps.io/tdist/

The t-Distribution: Critical Values

Finding t critical values in Google Sheets:

=T.INV.2T(alpha, df)

where alpha = 1 − confidence level.

Example: 95% CI with n = 25 (df = 24)

=T.INV.2T(0.05, 24) ≈ 2.064

Compare to z:

=NORM.S.INV(0.975) ≈ 1.96

For the same confidence level, the t critical value is always larger than z, giving a wider interval. The difference is most noticeable for small samples.

Sample size	df	t (95%)	z (95%)
n = 10	9	2.262	1.960
n = 30	29	2.045	1.960
n = 100	99	1.984	1.960

As n grows, t → z.
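The table can be reproduced outside Google Sheets. Python's standard library has no t-distribution, so the sketch below integrates the t density numerically (Simpson's rule) and inverts it by bisection. The helper names t_pdf, t_cdf, and t_critical are hypothetical, and the search range assumes df ≥ 2 and common confidence levels. In practice a statistics package or =T.INV.2T would do this for you.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)          # symmetry about 0
    h = x / steps                                  # steps must be even
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value, playing the role of =T.INV.2T(1 - confidence, df)."""
    target = 1 - (1 - confidence) / 2              # e.g. 0.975 for 95%
    lo, hi = 0.0, 100.0                            # assumed search range
    for _ in range(50):                            # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for n in (10, 30, 100):
    print(f"n = {n:3d}  df = {n - 1:2d}  t = {t_critical(0.95, n - 1):.3f}  z = {z:.3f}")
```

The printed t values should agree with the table to about three decimals and shrink toward z as n grows.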

Example: t-Confidence Interval

Chloe tests a new production method:

Sample: n = 25 phones
Sample mean: x̄ = 24.5 hours
Sample std dev: s = 3.5 hours
Want 95% confidence interval

Step 1: Check conditions

Random sample? ✓
n = 25 < 30, but battery life is approximately normal ✓

Step 2: Critical value

df = 24
=T.INV.2T(0.05, 24) ≈ 2.064

Steps 3–5: Calculate SE, ME, and the interval

SE = 3.5 / SQRT(25) = 0.7
ME = 2.064 × 0.7 = 1.445
CI: 24.5 ± 1.445 = (23.055, 25.945)

Interpretation: We are 95% confident the true mean battery life with the new method is between 23.1 and 25.9 hours.
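The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function name t_interval is hypothetical, and the critical value 2.064 is taken from the slide rather than computed.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar ± t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)      # standard error of the mean
    me = t_crit * se           # margin of error
    return xbar - me, xbar + me

# Chloe's data: n = 25, x̄ = 24.5, s = 3.5, t critical value for df = 24
low, high = t_interval(24.5, 3.5, 25, 2.064)
print(f"95% CI: ({low:.3f}, {high:.3f})")   # 95% CI: (23.055, 25.945)
```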

Confidence Intervals for Proportions

New scenario: Estimating a population proportion p

Examples:

Proportion of defective products in a batch
Percentage of voters supporting a candidate
Proportion of customers who would repurchase

Sample proportion:

\[\hat{p} = \frac{x}{n}\]

where x = number of “successes,” n = sample size.

Standard error for proportions:

\[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Why z, not t? For proportions, we estimate SE directly from p̂ — there is no unknown σ analogous to the mean case.

CI for Proportions: Formula and Conditions

Confidence interval for a proportion:

\[\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Conditions — check these before proceeding!

Random sample from population
Success-failure condition: np̂ ≥ 10 and n(1 − p̂) ≥ 10
Population is at least 10× the sample size (if sampling without replacement)

If condition 2 fails, the normal approximation breaks down and this formula is not valid.

Example: Proportion CI

Chloe surveys customers:

n = 200 customers sampled
x = 156 satisfied with their phone
p̂ = 156/200 = 0.78
Want 95% CI for the true satisfaction rate

Step 1: Check conditions

Random sample? ✓
np̂ = 200 × 0.78 = 156 ≥ 10 ✓
n(1 − p̂) = 200 × 0.22 = 44 ≥ 10 ✓

Step 2: Calculate

SE = SQRT(0.78 * 0.22 / 200) = 0.0293
z  = NORM.S.INV(0.975) = 1.96
ME = 1.96 × 0.0293 = 0.0574
CI: 0.78 ± 0.0574 = (0.723, 0.837)

Interpretation: We are 95% confident that between 72.3% and 83.7% of all customers are satisfied.
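The same steps, including the success-failure check, can be sketched in Python. The function name proportion_ci is hypothetical, and the sketch assumes the random-sample condition has already been verified by the analyst, since code cannot check it.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """z-interval for a proportion, refusing to run if success-failure fails."""
    failures = n - successes
    if successes < 10 or failures < 10:            # n*p̂ >= 10 and n*(1 - p̂) >= 10
        raise ValueError("success-failure condition not met")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - me, p_hat + me

# Chloe's survey: 156 satisfied customers out of 200
low, high = proportion_ci(156, 200)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Raising an error when the condition fails is one possible design choice; it prevents the formula from being used where the normal approximation is unreliable.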

Sample Size for Proportion CI

How large must n be for a desired margin of error?

\[n = \left(\frac{z_{\alpha/2}}{ME}\right)^2 \times \hat{p}(1-\hat{p})\]

The catch: The formula needs p̂, but p̂ comes from the sample we have not collected yet!

Solutions:

Use prior data (historical p̂ if available)
Use p̂ = 0.5 — most conservative, gives the largest (safest) n
Use an educated guess from similar studies

Example: ME = 0.03, 95% confidence, no prior estimate (p̂ = 0.5)

= (NORM.S.INV(0.975) / 0.03)^2 * 0.5 * 0.5
= (1.96 / 0.03)^2 × 0.25 ≈ 1067.1 → round up to 1068

Need n = 1068 for a margin of error of 3% with no prior information.

Always round up.
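The sample-size formula, with the round-up rule built in, might look like this in Python. The function name required_n and its default p_guess = 0.5 are assumptions for illustration.

```python
import math
from statistics import NormalDist

def required_n(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`, given a guess for p̂."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n_exact)              # always round up

print(required_n(0.03))                    # no prior estimate, so p̂ = 0.5
print(required_n(0.03, p_guess=0.78))      # with a prior estimate, n is smaller
```

The second call shows why p̂ = 0.5 is the conservative choice: any other guess makes p̂(1 − p̂) smaller and therefore reduces the required n.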

Summary: Which CI to Use?

Situation	Formula	Distribution
Mean, σ known (or large n)	\(\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\)	z
Mean, σ unknown	\(\bar{x} \pm t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}}\)	t (df = n−1)
Proportion	\(\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)	z

When in doubt, use t for means — it is always more conservative, and Google Sheets makes it just as easy.

Always check conditions before building any interval!

🧘‍♀️ STRETCH BREAK

Time to move! (5 minutes)

Stand up and stretch 🤸‍♀️
Chat with neighbors about CIs 💬
Grab some water 💧

Hypothesis Testing

Case Study: The Drug Approval Decision

Meet Dr. Chen, a medical researcher testing a new drug to lower blood pressure.

Her challenge: The pharmaceutical company claims the drug lowers blood pressure by at least 10 mmHg. Dr. Chen must:

Test whether this claim is supported by data
Balance two types of errors: approving ineffective drugs vs. rejecting effective ones
Make a decision with statistical evidence
Communicate findings to the FDA

The stakes: Approving an ineffective drug wastes money and gives false hope. Rejecting an effective drug denies patients a helpful treatment.

The tool: Hypothesis testing — the scientific method in statistical form!

From Estimation to Testing

So far: Using data to estimate parameters

Point estimate: x̄ = 24.3 hours
Interval estimate: 95% CI = (23.0, 25.6) hours

Now: Using data to test claims about parameters

Claim: “The average battery life is 24 hours”
Question: Do our data support or refute this claim?

This is hypothesis testing!

The Logic of Hypothesis Testing

The scientific method:

Start with a claim (hypothesis)
Collect data
See if the data are consistent with the claim
Make a decision: reject the claim or fail to reject it

Key principle: Reasoning by contradiction (a probabilistic version of proof by contradiction)

Assume the claim is true
If data are very unlikely under this assumption → reject the claim
If data are reasonably likely → do not reject the claim

Important: We never prove hypotheses; we only gather evidence for or against them!

Example: Drug lowers BP by 10 mmHg

Observe only 2 mmHg reduction with a large sample → reject claim
Observe 9 mmHg reduction → not enough evidence to reject

But how do we find the cut-off point?

The Null and Alternative Hypotheses

Null Hypothesis (H₀):

The “status quo” or “nothing interesting is happening” claim
What we assume is true initially
Always contains =, ≤, or ≥
The hypothesis we try to find evidence against

Alternative Hypothesis (H₁ or Hₐ):

The “research” hypothesis
What we are trying to find evidence for
Contains ≠, <, or >
Determines the type of test (two-tailed, left-tailed, right-tailed)

Key rule: H₀ and H₁ must be:

Mutually exclusive (can’t both be true)
Exhaustive (one must be true)
Statements about population parameters, not sample statistics

Types of Alternative Hypotheses

1. Two-tailed (≠):

H₀: μ = μ₀ H₁: μ ≠ μ₀
Use when: Interested in detecting any difference, in either direction
Example: “Is the mean different from 24 hours?”

2. Right-tailed (>):

H₀: μ ≤ μ₀ H₁: μ > μ₀
Use when: Want to show the parameter is greater
Example: “Has the new process increased battery life?”

3. Left-tailed (<):

H₀: μ ≥ μ₀ H₁: μ < μ₀
Use when: Want to show the parameter is less
Example: “Has the drug lowered mean blood pressure?” (here μ is the mean blood pressure itself, not the size of the reduction)

The alternative hypothesis determines which tail(s) of the distribution we examine!

Formulating Hypotheses: Examples

Example 1: Drug testing

Claim: Drug lowers BP by at least 10 mmHg (μ ≥ 10)

H₀: μ ≥ 10 (drug is effective)
H₁: μ < 10 (drug is not effective enough)
Type: Left-tailed

Example 2: Quality control

Standard: Battery life should be 24 hours (μ = 24)

H₀: μ = 24 (meeting standard)
H₁: μ ≠ 24 (not meeting standard)
Type: Two-tailed

Example 3: Process improvement

Question: Has training improved customer satisfaction above 75%?

H₀: p ≤ 0.75 (no improvement)
H₁: p > 0.75 (improvement occurred)
Type: Right-tailed

Key: The research question determines H₁; H₀ is its complement.

Test Statistics: The Evidence Measure

Test statistic: A single number measuring how far the sample data are from H₀

For means (σ known or large sample):

\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]

For means (σ unknown):

\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]

For proportions:

\[z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]

Interpretation: How many standard errors is our sample statistic from the null value? A large |test statistic| means the data are inconsistent with H₀.
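The three test statistics translate directly into code. In this Python sketch the function names are hypothetical and the inputs are illustrative numbers only: a mean of 23.5 against a null value of 24 with s = 4 and n = 100, and a sample proportion of 0.78 against a null value of 0.75 with n = 200.

```python
import math

def z_stat_mean(xbar, mu0, sigma, n):
    """z statistic for a mean when sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def t_stat_mean(xbar, mu0, s, n):
    """t statistic for a mean when sigma is unknown (df = n - 1)."""
    return (xbar - mu0) / (s / math.sqrt(n))

def z_stat_prop(p_hat, p0, n):
    """z statistic for a proportion; the SE uses the null value p0, not p_hat."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(f"t = {t_stat_mean(23.5, 24, 4, 100):.2f}")    # 1.25 standard errors below 24
print(f"z = {z_stat_prop(0.78, 0.75, 200):.2f}")     # about 0.98 standard errors above 0.75
```

Note that the proportion test uses p₀ in the standard error because the calculation assumes H₀ is true, whereas the confidence interval uses p̂.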

The P-Value: Measuring Evidence

P-value definition:

The probability of observing data as extreme or more extreme than what we got, assuming H₀ is true.

Interpretation:

Small p-value → data unlikely under H₀ → evidence against H₀
Large p-value → data consistent with H₀ → insufficient evidence against H₀

The p-value is NOT:

The probability that H₀ is true
The probability that H₁ is true
The probability that we made the wrong decision

Calculating P-Values by Test Type

The p-value depends on the alternative hypothesis (shown here with Z; for t-tests, use the t-distribution with df = n − 1 instead):

Two-tailed test (H₁: μ ≠ μ₀):

p-value = 2 × P(Z > |test statistic|)
Look at both tails

Right-tailed test (H₁: μ > μ₀):

p-value = P(Z > test statistic)
Look at right tail only

Left-tailed test (H₁: μ < μ₀):

p-value = P(Z < test statistic)
Look at left tail only

Google Sheets formulas on the next slide!

Google Sheets: P-Value Calculations

z-tests

(proportions or means with σ known)

Two-tailed:

=2*NORM.S.DIST(-ABS(z), TRUE)

Right-tailed:

=1-NORM.S.DIST(z, TRUE)

Left-tailed:

=NORM.S.DIST(z, TRUE)

t-tests

(means with σ unknown)

Two-tailed:

=2*T.DIST(-ABS(t), df, TRUE)

Right-tailed:

=1-T.DIST(t, df, TRUE)

Left-tailed:

=T.DIST(t, df, TRUE)

Pro tip: Use ABS() for the absolute value in two-tailed tests!
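For comparison, the three z-test formulas can be mirrored in Python using only the standard library. This is a sketch: the function name z_p_value and the tail labels "two", "right", and "left" are assumptions made for this example.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a z statistic; tail is 'two', 'right', or 'left'."""
    cdf = NormalDist().cdf                 # standard normal CDF, like NORM.S.DIST(z, TRUE)
    if tail == "two":
        return 2 * cdf(-abs(z))            # both tails
    if tail == "right":
        return 1 - cdf(z)                  # area above z
    if tail == "left":
        return cdf(z)                      # area below z
    raise ValueError("tail must be 'two', 'right', or 'left'")

for tail in ("two", "right", "left"):
    print(f"z = -1.25, {tail}-tailed: p = {z_p_value(-1.25, tail):.4f}")
```

Running the loop with the same z shows how strongly the p-value depends on the direction of H₁: a negative z gives a small left-tailed p-value but a large right-tailed one.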

Significance Level (α)

Significance level (α): The threshold for rejecting H₀

Common choices:

α = 0.05 (most common)
α = 0.01 (more conservative)
α = 0.10 (more liberal)

Decision rule:

If p-value ≤ α → Reject H₀ (statistically significant)
If p-value > α → Fail to reject H₀ (not statistically significant)

Relationship to confidence intervals (two-tailed tests):

α = 0.05 corresponds to a 95% CI
α = 0.01 corresponds to a 99% CI
α = 0.10 corresponds to a 90% CI

Note: α must be chosen before seeing the data!
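The link between α and the confidence level can be checked numerically: a two-tailed test at level α rejects H₀ exactly when the null value falls outside the (1 − α) confidence interval. The Python sketch below uses a z-test and treats σ as known, which is an assumption made to keep the example within the standard library. The function name and the input numbers are illustrative.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return (p_value, ci, reject, mu0_outside_ci) for a two-tailed z-test."""
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    reject = p_value <= alpha                       # decision rule
    mu0_outside_ci = not (ci[0] <= mu0 <= ci[1])    # CI-based check
    return p_value, ci, reject, mu0_outside_ci

for xbar in (23.5, 23.0):
    p, ci, reject, outside = two_tailed_z_test(xbar, mu0=24, sigma=4, n=100)
    print(f"x̄ = {xbar}: p = {p:.4f}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"reject = {reject}, 24 outside CI = {outside}")
```

In both cases the last two values agree: the test and the interval lead to the same conclusion.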

The Four-Step Hypothesis Test

Step 1: STATE

State H₀ and H₁
Define parameters clearly
Choose significance level α

Step 2: PLAN

Check conditions (random sample, sample size, etc.)
Choose test statistic (z or t)
Identify the distribution

Step 3: SOLVE

Calculate the test statistic
Find the p-value using Google Sheets

Step 4: CONCLUDE

Compare p-value to α
Make decision (reject or fail to reject H₀)
State conclusion in context

Always follow all four steps!

Example: Battery Life Test

Scenario: Sarah’s company claims μ = 24 hours. She tests n = 100 phones and finds x̄ = 23.5 hours, s = 4 hours. Test at α = 0.05.

STEP 1: STATE

H₀: μ = 24 hours H₁: μ ≠ 24 hours (two-tailed)
α = 0.05 Parameter: μ = true mean battery life

STEP 2: PLAN

Random sample ✓, n = 100 ≥ 30 ✓, σ unknown → use t
df = 99

STEP 3: SOLVE

t = (23.5 - 24) / (4/SQRT(100)) = -0.5 / 0.4 = -1.25
p-value = 2*T.DIST(-1.25, 99, TRUE) ≈ 0.214

STEP 4: CONCLUDE

p-value = 0.214 > α = 0.05 → Fail to reject H₀

There is insufficient evidence at the 0.05 level to conclude that the mean battery life is different from 24 hours.
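The four steps of this example can be traced in a short Python script. The standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule. The helpers t_pdf and t_cdf are hypothetical stand-ins for =T.DIST(t, df, TRUE), not a library API.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) via Simpson's rule on [0, |x|]; steps must be even."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

# STEP 1: STATE  (H0: mu = 24, H1: mu != 24)
mu0, alpha = 24.0, 0.05
n, xbar, s = 100, 23.5, 4.0

# STEP 2: PLAN  (sigma unknown, so use t with df = n - 1)
df = n - 1

# STEP 3: SOLVE
t = (xbar - mu0) / (s / math.sqrt(n))
p_value = 2 * t_cdf(-abs(t), df)           # two-tailed

# STEP 4: CONCLUDE
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```

The script covers only the calculation. Checking the conditions and writing the conclusion in context still have to be done by the analyst.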

Quick Knowledge Check ✅

Rate your confidence (1–5) on Ed Discussion:

Constructing CIs with unknown σ (t-interval) ⭐⭐⭐⭐⭐
Constructing CIs for proportions ⭐⭐⭐⭐⭐
Formulating null and alternative hypotheses ⭐⭐⭐⭐⭐
Calculating and interpreting p-values ⭐⭐⭐⭐⭐
Conducting complete hypothesis tests (4 steps) ⭐⭐⭐⭐⭐

If you rated anything 3 or below, come to office hours!

Thank you! 📊✨

Questions? I have office hours right after class!

Next up: Type I & II errors, comparing two groups

Remember:

Post Think-Pair-Share responses on Ed Discussion and Poll Everywhere
Rate your confidence
Practice the four-step process with complete, in-context conclusions
Always check conditions before testing or building a CI