Statistics - UCSC
20 Nov 2025
Quick recap: Last class we covered hypothesis testing — the four-step process, p-values, and confidence interval connections.
Today:
Four possible outcomes in hypothesis testing:
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct Decision (Power) |
| Fail to Reject H₀ | Correct Decision | Type II Error (β) |
Type I Error (False Positive):
Type II Error (False Negative):
Power = 1 − β: Probability of correctly rejecting a false H₀. Higher power is better!
Dr. Chen’s drug trial:
Type I Error (α):
Type II Error (β):
The trade-off:
Type I error rate = α — we choose it directly, before seeing the data.
Life-or-death decision:
Preliminary research:
Standard research:
Controlling β:
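To make the trade-off concrete, here is a minimal sketch that computes power and β for a right-tailed z-test with known σ. The function name `power_one_sided_z` and the scenario (a true shift of 0.5 with σ = 2) are illustrative assumptions, not values from this lecture.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha):
    """Power of a right-tailed z-test when the true mean exceeds mu0 by delta.

    Assumes sigma is known and the sample mean is normally distributed.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)     # rejection cutoff under H0
    shift = delta * sqrt(n) / sigma              # true shift measured in standard errors
    return 1 - NormalDist().cdf(z_crit - shift)  # P(reject H0 | H0 false)

# Hypothetical scenario: true shift 0.5, sigma 2
for alpha in (0.01, 0.05, 0.10):
    for n in (30, 100):
        power = power_one_sided_z(0.5, 2.0, n, alpha)
        print(f"alpha={alpha:.2f}  n={n:3d}  power={power:.3f}  beta={1 - power:.3f}")
```

Reading down the output, a smaller α lowers power (raises β) at a fixed sample size, while a larger n raises power at a fixed α.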
Statistical significance: p-value < α
Practical significance: Effect size matters in the real world
Example — Large study (n = 10,000):
Example — Small study (n = 20):
Always report: p-value and effect size together.
Mistake 1: “Accepting” H₀
Mistake 2: Wrong interpretation of p-value
Mistake 3: Changing α after seeing results (“p-hacking”)
Mistake 4: Confusing significance with importance
Mistake 5: Conclusion not in context — always state what the decision means for the problem, not just “reject H₀”
Errors in Context
A factory produces bolts with a specified length of 5 cm. Quality control samples n = 40 bolts and finds x̄ = 5.15 cm, s = 0.4 cm.
Use Google Sheets! Post on Ed Discussion with your partner’s name.
What is the test statistic for the bolt length test?
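As a cross-check on the spreadsheet work, the one-sample t statistic for the bolt data can be sketched in Python. This assumes the null value is the specified length, μ₀ = 5 cm; the variable names are illustrative.

```python
from math import sqrt

mu0 = 5.0     # specified bolt length (cm), taken as the null value
xbar = 5.15   # sample mean (cm)
s = 0.4       # sample standard deviation (cm)
n = 40        # sample size

se = s / sqrt(n)            # standard error of the mean
t_stat = (xbar - mu0) / se  # one-sample t statistic, df = n - 1

print(f"SE = {se:.4f}, t = {t_stat:.2f}, df = {n - 1}")
```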
Scenario: A major online retailer is testing two checkout designs:
Questions we’ll answer today:
Key assumption: Two independent samples from two populations.
This is different from the one-sample tests we’ve been doing — now we have no known μ₀ to test against. We let the data from both groups speak.
Hypotheses:
Test statistic (under H₀):
\[t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}\]
Two cases for SE — depending on whether we assume equal variances:
| Case | Assumption | SE formula |
|---|---|---|
| Pooled | σ₁² = σ₂² | uses pooled \(s_p\) |
| Welch’s | σ₁² ≠ σ₂² | uses \(s_1, s_2\) separately |
Case 1: Equal Variances (pooled t-test)
\[SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad \text{where } s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\]
Case 2: Unequal Variances (Welch’s t-test)
\[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
In practice: When in doubt, use Welch’s. It does not rely on the equal-variance assumption, and it is the default in some software (for example, R’s t.test).
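A minimal sketch of the two standard errors follows. The Welch-Satterthwaite degrees-of-freedom formula is the standard approximation and is not written out in these notes; the function names and the summary statistics in the example are hypothetical.

```python
from math import sqrt

def pooled_se(s1, n1, s2, n2):
    """SE of (xbar1 - xbar2) assuming equal population variances."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return sp * sqrt(1 / n1 + 1 / n2)

def welch_se(s1, n1, s2, n2):
    """SE of (xbar1 - xbar2) without assuming equal variances."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical summary statistics: unequal spreads and unequal sample sizes
s1, n1, s2, n2 = 4.0, 15, 9.0, 40
print(f"pooled SE = {pooled_se(s1, n1, s2, n2):.3f}")
print(f"Welch SE  = {welch_se(s1, n1, s2, n2):.3f}")
print(f"Welch df  = {welch_df(s1, n1, s2, n2):.1f} (pooled df = {n1 + n2 - 2})")
```

When n₁ = n₂ the two standard errors coincide and only the degrees of freedom differ, so the choice matters most when both the sample sizes and the spreads are unequal.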
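Data from the checkout design test (n = 250 per design): mean purchase of $87.50 for Design A and $92.80 for Design B, with sample standard deviations of $22.30 and $24.10.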
Test: H₀: μ_A = μ_B vs Hₐ: μ_B > μ_A at α = 0.05 (one-sided, assuming equal variances)
Step 1: Pooled standard deviation
\[s_p = \sqrt{\frac{(249)(22.30)^2 + (249)(24.10)^2}{498}} = \sqrt{\frac{123{,}825 + 144{,}622}{498}} = \sqrt{539.05} = 23.22\]
Step 2: Standard error
\[SE = 23.22 \times \sqrt{\frac{1}{250} + \frac{1}{250}} = 23.22 \times 0.0894 = 2.08\]
Step 3: Test statistic
\[t = \frac{92.80 - 87.50}{2.08} = \frac{5.30}{2.08} = 2.55\]
Step 4: P-value and decision
df = 498, one-sided test at α = 0.05:
=1 - T.DIST(2.55, 498, TRUE) ≈ 0.0055
p-value = 0.0055 < 0.05 → Reject H₀
Conclusion: There is significant evidence that Design B leads to higher average purchase amounts than Design A.
Practical interpretation: The new checkout design increases average purchases by about $5.30.
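The four steps can be reproduced in a few lines of Python as a check on the arithmetic. The sketch uses only the summary statistics from the calculation above; because the sample sizes are equal, it does not matter which standard deviation belongs to which design. Since df = 498 is large, it uses the standard normal as an approximation to the t distribution for the p-value, so the last digit can differ slightly from T.DIST.

```python
from math import sqrt
from statistics import NormalDist

n_a = n_b = 250
xbar_a, xbar_b = 87.50, 92.80
s_1, s_2 = 22.30, 24.10   # the two sample standard deviations from Step 1

# Step 1: pooled standard deviation
sp = sqrt(((n_a - 1) * s_1**2 + (n_b - 1) * s_2**2) / (n_a + n_b - 2))
# Step 2: standard error of the difference in means
se = sp * sqrt(1 / n_a + 1 / n_b)
# Step 3: test statistic for Ha: mu_B > mu_A
t_stat = (xbar_b - xbar_a) / se
# Step 4: one-sided p-value (normal approximation to t with df = 498)
p_approx = 1 - NormalDist().cdf(t_stat)

print(f"sp = {sp:.2f}, SE = {se:.2f}, t = {t_stat:.2f}, p ~ {p_approx:.4f}")
```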
Function: =T.TEST(array1, array2, tails, type)
| Parameter | Meaning |
|---|---|
| array1 | First sample data range |
| array2 | Second sample data range |
| tails | 1 = one-sided, 2 = two-sided |
| type | 1 = paired, 2 = equal variance, 3 = unequal variance (Welch’s) |
Example:
=T.TEST(A2:A251, B2:B251, 1, 2)
Returns the p-value directly for a one-sided, equal-variance test.
For the test statistic and CI manually:
// Pooled SD
=SQRT(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2))
// SE
=sp * SQRT(1/n1 + 1/n2)
// t-statistic
=(xbar1 - xbar2) / SE
Statistical significance ≠ Practical importance.
Even a tiny difference can be “significant” with a large enough sample.
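A quick numerical illustration: hold a tiny difference fixed and let the sample size grow. The sketch assumes two equal-sized groups, in which case the pooled test statistic equals the standardized difference times √(n/2), and it uses a normal approximation for the p-value. The difference of 0.05 standard deviations is a hypothetical value.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized difference d
    between two groups of equal size (normal approximation)."""
    z = d * sqrt(n_per_group / 2)   # test statistic grows like sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

d = 0.05  # hypothetical: means differ by 0.05 standard deviations
for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group = {n:>7,}  p ~ {two_sided_p(d, n):.4f}")
```

The difference itself never changes; only the sample size does, yet the p-value moves from clearly non-significant to far below 0.05.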
Cohen’s d measures the standardized difference between two means:
\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\]
It tells us how far apart the means are in standard deviation units.
Cohen’s benchmarks:
| Cohen’s d | Interpretation |
|---|---|
| 0.2 | Small effect — difficult to notice |
| 0.5 | Medium effect — noticeable |
| 0.8 | Large effect — very noticeable |
Our example:
\[d = \frac{92.80 - 87.50}{23.22} = \frac{5.30}{23.22} = 0.23\]
A small effect (just above the 0.2 benchmark): statistically significant, but modest in practical terms.
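The same calculation can be sketched in Python. Treating Cohen’s benchmarks as hard cutoffs is a simplifying assumption made here for illustration, and the function names `cohens_d` and `label` are hypothetical.

```python
from math import sqrt

def cohens_d(xbar1, xbar2, s1, n1, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (xbar1 - xbar2) / sp

def label(d):
    """Rough label, using Cohen's benchmarks (0.2, 0.5, 0.8) as cutoffs."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Checkout example; with equal n the pairing of SDs to groups does not change sp
d = cohens_d(92.80, 87.50, 24.10, 250, 22.30, 250)
print(f"d = {d:.2f} ({label(d)})")
```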
Two-Sample t-Test
A software company tested two training methods for new employees.
Calculate:
Post on Ed Discussion with your partner’s name!
Is this difference practically important?
When: Testing whether two groups differ on a binary outcome (success/failure)
Examples:
Hypotheses:
Key difference from one-sample proportion test: We no longer have a known p₀. Instead, we estimate the common proportion under H₀ by pooling both samples.
Step 1: Pooled proportion (our best estimate of p under H₀)
\[\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]
Step 2: Standard error under H₀
\[SE = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]
Step 3: Test statistic
\[z = \frac{\hat{p}_1 - \hat{p}_2}{SE}\]
Under H₀, z follows the standard normal distribution, so we use z critical values and NORM.S.DIST for p-values.
Data from the checkout design test: Design A had 47 conversions out of 250 (18.8%); Design B had 63 conversions out of 250 (25.2%).
Test: H₀: p_A = p_B vs Hₐ: p_B > p_A at α = 0.05
Step 1: Pooled proportion
\[\hat{p} = \frac{47 + 63}{250 + 250} = \frac{110}{500} = 0.220\]
Step 2: Standard error
\[SE = \sqrt{0.220 \times 0.780 \times \left(\frac{1}{250} + \frac{1}{250}\right)} = \sqrt{0.001373} = 0.0371\]
Step 3: Test statistic
\[z = \frac{0.252 - 0.188}{0.0371} = \frac{0.064}{0.0371} = 1.72\]
Step 4: P-value and decision
One-sided test at α = 0.05:
=1 - NORM.S.DIST(1.72, TRUE) ≈ 0.043
p-value = 0.043 < 0.05 → Reject H₀
Conclusion: There is significant evidence that Design B has a higher conversion rate than Design A.
Practical interpretation: Design B increases conversion by about 6.4 percentage points (18.8% → 25.2%).
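For a check on the arithmetic, here is a minimal Python sketch of the pooled two-proportion z-test; the function name `two_proportion_z` is illustrative. Carrying full precision gives z of about 1.73 and a one-sided p-value of about 0.042; the hand calculation’s 1.72 and 0.043 come from rounding SE to 0.0371 before dividing. The decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test for the difference p1_hat - p2_hat.

    Returns (z, right-tailed p-value, two-sided p-value).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                        # Step 1: pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # Step 2: SE under H0
    z = (p1_hat - p2_hat) / se                            # Step 3: test statistic
    p_right = 1 - NormalDist().cdf(z)                     # Step 4: Ha is p1 > p2
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_right, p_two

# Design B (63 of 250) goes first so that z > 0 supports Ha: p_B > p_A
z, p_right, p_two = two_proportion_z(63, 250, 47, 250)
print(f"z = {z:.2f}, one-sided p = {p_right:.3f}, two-sided p = {p_two:.3f}")
```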
// Pooled proportion
=(x1 + x2) / (n1 + n2)
// Standard error
=SQRT(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
// z-statistic
=(phat1 - phat2) / SE
// P-value (one-sided, right-tailed)
=1 - NORM.S.DIST(z, TRUE)
// P-value (two-sided)
=2 * (1 - NORM.S.DIST(ABS(z), TRUE))
Conditions to check before running this test:
| Scenario | Parameter | Test statistic | Distribution |
|---|---|---|---|
| One mean, σ known | μ | \(z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\) | z |
| One mean, σ unknown | μ | \(t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}\) | t (df = n−1) |
| One proportion | p | \(z = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}\) | z |
| Two means (equal var) | μ₁−μ₂ | \(t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1+1/n_2}}\) | t (df = n₁+n₂−2) |
| Two means (Welch’s) | μ₁−μ₂ | \(t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}}\) | t (df approx.) |
| Two proportions | p₁−p₂ | \(z = \frac{\hat{p}_1 - \hat{p}_2}{SE_\text{pool}}\) | z |
Two Proportions
A university tested two formats of an online course:
Use Google Sheets! Post on Ed Discussion with your partner’s name.
What is the z-statistic for the pass rate comparison?
Error types:
Comparing two groups:
| Goal | Test | Key formula |
|---|---|---|
| Compare two means | Two-sample t | \(t = \frac{\bar{x}_1 - \bar{x}_2}{SE}\) |
| Quantify difference | Cohen’s d | \(d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\) |
| Compare two proportions | Two-proportion z | \(z = \frac{\hat{p}_1 - \hat{p}_2}{SE_\text{pool}}\) |
Always report statistical significance (p-value) AND effect size!
Rate your confidence (1–5) on Ed Discussion:
If you rated anything 3 or below, come to office hours!
Questions? I have office hours right after class!
Next up: One-way ANOVA — comparing more than two groups
Remember:
STAT 17 – Fall 2025