Statistics - UCSC
29 May 2026
Last time: We introduced the Central Limit Theorem and started confidence intervals.
We still need to cover:
Then, new material:
General form:
\[\text{Estimate} \pm \text{Margin of Error}\]
Three CIs we need:
Mean, σ known (or large n): \(\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\)
Mean, σ unknown: \(\bar{x} \pm t_{\alpha/2,\,df} \times \frac{s}{\sqrt{n}}\)
Proportion: \(\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
Key insight: CIs give a range of plausible values for a population parameter. Our confidence is in the method, not in any single interval.
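A quick simulation can make "confidence in the method" concrete. The sketch below is illustrative only: the population values (mean 24, σ = 3.5, n = 25) are assumptions picked for the demo, not course data. It draws many samples, builds a 95% z-interval with known σ each time, and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, mean

# Assumed (hypothetical) population and design for this demo
TRUE_MEAN, SIGMA, N, REPS = 24.0, 3.5, 25, 2000

rng = random.Random(0)                  # fixed seed so the run is repeatable
z_star = NormalDist().inv_cdf(0.975)    # same role as =NORM.S.INV(0.975)
margin = z_star * SIGMA / N ** 0.5      # margin of error when sigma is known

hits = 0
for _ in range(REPS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    # Does this particular interval contain the true mean?
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / REPS
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds over many repetitions.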
Reality check: We almost never know σ in practice.
When σ is unknown:
\[\bar{x} \pm t_{\alpha/2,\, df} \times \frac{s}{\sqrt{n}}\]
Why a different distribution?
See the applet: https://istats.shinyapps.io/tdist/
Finding t critical values in Google Sheets:
=T.INV.2T(alpha, df)
where alpha = 1 − confidence level.
Example: 95% CI with n = 25 (df = 24)
=T.INV.2T(0.05, 24) ≈ 2.064
Compare to z:
=NORM.S.INV(0.975) ≈ 1.96
At the same confidence level, the t critical value is always larger than the z critical value, so the t interval is wider. The gap is largest for small samples.
| Sample size | df | t (95%) | z (95%) |
|---|---|---|---|
| n = 10 | 9 | 2.262 | 1.960 |
| n = 30 | 29 | 2.045 | 1.960 |
| n = 100 | 99 | 1.984 | 1.960 |
As n grows, t → z.
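For anyone who wants to check the table outside Google Sheets, here is a rough Python sketch using only the standard library. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are made up for this example; the t CDF is approximated by numerical integration, so treat the results as approximate. With SciPy installed, `scipy.stats.t.ppf` would be the usual shortcut.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df):
    """Two-tailed critical value, analogous to =T.INV.2T(alpha, df)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):                 # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)    # analogous to =NORM.S.INV(0.975)
for n in (10, 30, 100):
    print(f"n = {n:3d}  df = {n - 1:2d}  t = {t_critical(0.05, n - 1):.3f}  z = {z_star:.3f}")
```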
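Chloe tests a new production method. A sample of n = 25 gives x̄ = 24.5 hours of battery life with s = 3.5 hours; she wants a 95% CI for the mean.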
Step 1: Check conditions
Step 2: Critical value
df = 24
=T.INV.2T(0.05, 24) ≈ 2.064
Steps 3–5: Calculate
SE = 3.5 / SQRT(25) = 0.7
ME = 2.064 × 0.7 = 1.445
CI: 24.5 ± 1.445 = (23.055, 25.945)
Interpretation: We are 95% confident the true mean battery life with the new method is between 23.1 and 25.9 hours.
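The same arithmetic can be sketched in Python. The numbers are the ones in the calculation above; the critical value 2.064 is copied from the `=T.INV.2T(0.05, 24)` result rather than computed, since the standard library has no t quantile function.

```python
import math

n, x_bar, s = 25, 24.5, 3.5     # sample size, sample mean, sample sd (from above)
t_star = 2.064                  # =T.INV.2T(0.05, 24), copied from the slide

se = s / math.sqrt(n)           # standard error of the mean
me = t_star * se                # margin of error
ci = (x_bar - me, x_bar + me)   # confidence interval

print(f"SE = {se:.3f}, ME = {me:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```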
New scenario: Estimating a population proportion p
Examples:
Sample proportion:
\[\hat{p} = \frac{x}{n}\]
where x = number of “successes,” n = sample size.
Standard error for proportions:
\[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Why z, not t? For proportions, we estimate SE directly from p̂ — there is no unknown σ analogous to the mean case.
Confidence interval for a proportion:
\[\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Conditions — check these before proceeding!
If condition 2 fails, the normal approximation breaks down and this formula is not valid.
Chloe surveys n = 200 customers and finds p̂ = 0.78 are satisfied; she wants a 95% CI for the proportion.
Step 1: Check conditions
Step 2: Calculate
SE = SQRT(0.78 * 0.22 / 200) = 0.0293
z = NORM.S.INV(0.975) = 1.96
ME = 1.96 × 0.0293 = 0.0574
CI: 0.78 ± 0.0574 = (0.723, 0.837)
Interpretation: We are 95% confident that between 72.3% and 83.7% of all customers are satisfied.
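A small Python version of this calculation, as a sketch. The function name `proportion_ci` is hypothetical, and the code assumes the conditions for the normal approximation have already been checked.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """z-interval for a proportion: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z_star * se
    return p_hat - me, p_hat + me

low, high = proportion_ci(0.78, 200)    # survey values from the example above
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Changing `confidence` to 0.99 swaps in the larger critical value and widens the interval.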
How large must n be for a desired margin of error?
\[n = \left(\frac{z_{\alpha/2}}{ME}\right)^2 \times \hat{p}(1-\hat{p})\]
The catch: The formula needs p̂, but p̂ comes from a sample we have not collected yet!
Solutions:
Example: ME = 0.03, 95% confidence, no prior estimate (p̂ = 0.5)
= (NORM.S.INV(0.975) / 0.03)^2 * 0.5 * 0.5
= (1.96 / 0.03)^2 × 0.25 ≈ 1067.1, which rounds up to 1068
Need n = 1068 for a margin of error of 3% with no prior information.
Always round up.
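The sample-size formula, including the round-up step, can be sketched as follows. The function name and its defaults are assumptions made for this example; `p_guess=0.5` is the no-prior-information choice used above.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(me, confidence=0.95, p_guess=0.5):
    """Smallest n giving at most the desired margin of error `me`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_raw = (z_star / me) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n_raw)             # always round up, never to nearest

n_needed = sample_size_for_proportion(0.03)
print(n_needed)
```

Rounding down would leave the margin of error slightly above the target, which is why `math.ceil` is used instead of `round`.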
Proportion Confidence Interval
A politician claims to have 55% support. A poll of n = 400 voters finds 200 support the politician (p̂ = 0.50).
Use Google Sheets! Post on Ed Discussion with your partner’s name.
What is the 99% CI for the true proportion p?
| Situation | Formula | Distribution |
|---|---|---|
| Mean, σ known (or large n) | \(\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\) | z |
| Mean, σ unknown | \(\bar{x} \pm t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}}\) | t (df = n−1) |
| Proportion | \(\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) | z |
When in doubt, use t for means — it is always more conservative, and Google Sheets makes it just as easy.
Always check conditions before building any interval!
Meet Dr. Chen, a medical researcher testing a new drug to lower blood pressure.
Her challenge: The pharmaceutical company claims the drug lowers blood pressure by at least 10 mmHg. Dr. Chen must:
The stakes: Approving an ineffective drug wastes money and gives false hope. Rejecting an effective drug denies patients a helpful treatment.
The tool: Hypothesis testing — the scientific method in statistical form!
So far: Using data to estimate parameters
Now: Using data to test claims about parameters
This is hypothesis testing!
The scientific method:
Key principle: Proof by contradiction
Important: We never prove hypotheses; we only gather evidence for or against them!
Example: Drug lowers BP by 10 mmHg
But how do we find the cut-off point?
Null Hypothesis (H₀):
Alternative Hypothesis (H₁ or Hₐ):
Key rule: H₀ and H₁ must be:
1. Two-tailed (≠):
2. Right-tailed (>):
3. Left-tailed (<):
The alternative hypothesis determines which tail(s) of the distribution we examine!
Example 1: Drug testing
Claim: Drug lowers BP by at least 10 mmHg (μ ≥ 10)
Example 2: Quality control
Standard: Battery life should be 24 hours (μ = 24)
Example 3: Process improvement
Question: Has training improved customer satisfaction above 75%?
Key: The research question determines H₁; H₀ is its complement.
Formulating Hypotheses
For each scenario, write H₀ and H₁, and identify the test type:
For each: Identify the parameter, write the hypotheses, name the test type.
Post on Ed Discussion with your partner’s name!
Test statistic: A single number measuring how far the sample data are from H₀
For means (σ known or large sample):
\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]
For means (σ unknown):
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]
For proportions:
\[z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]
Interpretation: How many standard errors is our sample statistic from the null value? A large |test statistic| means the data are inconsistent with H₀.
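The formulas above translate directly into code. This is a minimal sketch with hypothetical function names, and the inputs are made-up numbers chosen only so the arithmetic is easy to follow.

```python
import math

def mean_test_statistic(x_bar, mu_0, sd, n):
    """(x_bar - mu_0) / (sd / sqrt(n)): a z statistic if sd is sigma, a t statistic if sd is s."""
    return (x_bar - mu_0) / (sd / math.sqrt(n))

def proportion_test_statistic(p_hat, p_0, n):
    """Uses the null value p_0 (not p_hat) in the standard error."""
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Hypothetical inputs for illustration only
t_stat = mean_test_statistic(x_bar=25.0, mu_0=24.0, sd=4.0, n=64)    # 1.0 / 0.5
z_stat = proportion_test_statistic(p_hat=0.80, p_0=0.75, n=300)      # 0.05 / 0.025

print(f"t = {t_stat:.2f}, z = {z_stat:.2f}")
```

In both made-up cases the sample statistic sits about two standard errors above the null value.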
P-value definition:
The probability of observing data as extreme or more extreme than what we got, assuming H₀ is true.
Interpretation:
The p-value is NOT:
The p-value depends on the alternative hypothesis:
Two-tailed test (H₁: μ ≠ μ₀):
Right-tailed test (H₁: μ > μ₀):
Left-tailed test (H₁: μ < μ₀):
Google Sheets formulas on the next slide!
z-tests
(proportions or means with σ known)
Two-tailed:
=2*NORM.S.DIST(-ABS(z), TRUE)
Right-tailed:
=1-NORM.S.DIST(z, TRUE)
Left-tailed:
=NORM.S.DIST(z, TRUE)
t-tests
(means with σ unknown)
Two-tailed:
=2*T.DIST(-ABS(t), df, TRUE)
Right-tailed:
=1-T.DIST(t, df, TRUE)
Left-tailed:
=T.DIST(t, df, TRUE)
Pro tip: Use ABS() for the absolute value in two-tailed tests!
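For comparison, the three z-test formulas look like this in Python. This is a sketch using the standard library's `NormalDist`; the function name `z_p_value` and its `tail` argument are invented for the example.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value for a z statistic; tail is 'two', 'right', or 'left'."""
    cdf = NormalDist().cdf              # standard normal CDF, like NORM.S.DIST(z, TRUE)
    if tail == "two":
        return 2 * cdf(-abs(z))         # both tails beyond |z|
    if tail == "right":
        return 1 - cdf(z)               # area to the right of z
    if tail == "left":
        return cdf(z)                   # area to the left of z
    raise ValueError("tail must be 'two', 'right', or 'left'")

print(f"{z_p_value(1.96):.4f}")          # two-tailed p-value for z = 1.96
```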
Significance level (α): The threshold for rejecting H₀
Common choices:
Decision rule:
Relationship to confidence intervals:
Note: α must be chosen before seeing the data!
Step 1: STATE
Step 2: PLAN
Step 3: SOLVE
Step 4: CONCLUDE
Always follow all four steps!
Scenario: Sarah’s company claims μ = 24 hours. She tests n = 100 phones and finds x̄ = 23.5 hours, s = 4 hours. Test at α = 0.05.
STEP 1: STATE
STEP 2: PLAN
STEP 3: SOLVE
t = (23.5 - 24) / (4/SQRT(100)) = -0.5 / 0.4 = -1.25
p-value = 2*T.DIST(-1.25, 99, TRUE) ≈ 0.214
STEP 4: CONCLUDE
p-value = 0.214 > α = 0.05 → Fail to reject H₀
There is insufficient evidence at the 0.05 level to conclude that the mean battery life is different from 24 hours.
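Sarah's SOLVE and CONCLUDE steps can be reproduced in Python as a rough check. The standard library has no t distribution, so the sketch below approximates the t CDF by numerical integration; `t_pdf` and `t_cdf` are hypothetical helper names, and SciPy's `scipy.stats.t.cdf` would be the usual alternative if it is available.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|]; plays the role of T.DIST(x, df, TRUE)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Values from the scenario above
n, x_bar, s, mu_0, alpha = 100, 23.5, 4.0, 24.0, 0.05

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))     # test statistic
p_value = 2 * t_cdf(-abs(t_stat), n - 1)         # two-tailed p-value
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"

print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```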
Complete Hypothesis Test
A coffee shop claims the average wait time is 5 minutes. You sample n = 36 customers and find x̄ = 5.8 minutes with s = 2.4 minutes. Test at α = 0.05 whether the true mean wait time is different from 5 minutes.
Follow the four-step process:
Bonus: What would change if this were a right-tailed test (want to show wait time exceeds 5 minutes)?
Post on Ed Discussion with your partner’s name!
What is the p-value for this test?
Rate your confidence (1–5) on Ed Discussion:
If you rated anything 3 or below, come to office hours!
Questions? I have office hours right after class!
Next up: Type I & II errors, comparing two groups
Remember:
STAT 17 – Fall 2025