Statistics - UCSC
06 Nov 2025
Meet Dr. Chen, a medical researcher testing a new drug to lower blood pressure.
Her challenge: The pharmaceutical company claims the drug lowers blood pressure by at least 10 mmHg. Dr. Chen must:
The stakes: Approving an ineffective drug wastes money and gives false hope. Rejecting an effective drug denies patients a helpful treatment.
The tool: Hypothesis testing - the scientific method in statistical form!
Understanding hypothesis testing helps Dr. Chen (and you!) make evidence-based decisions.
What we learned last time:
Central Limit Theorem:
For large n, \(\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)\) approximately, where the second argument is the standard deviation of \(\bar{x}\)
Standard Error: SE = σ/√n
Three types of CIs:
Mean, σ known: \(\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\)
Mean, σ unknown: \(\bar{x} \pm t_{\alpha/2,df} \times \frac{s}{\sqrt{n}}\)
Proportion: \(\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
Key insight: CIs provide a range of plausible values for a parameter
By the end of this lecture, you will be able to:
So far: Using data to estimate parameters
Now: Using data to test claims about parameters
This is hypothesis testing!
The scientific method:
Key principle: Proof by contradiction
Important: We never “prove” hypotheses, we only gather evidence for or against them!
Example: Drug lowers BP by 10 mmHg
If we observe only 2 mmHg reduction with large sample → reject claim
If we observe 9 mmHg reduction → not enough evidence to reject
But how do we find the cutoff point?
Null Hypothesis (H₀):
The “status quo” or “nothing interesting” claim
What we assume is true initially
Always has =, ≤, or ≥
The hypothesis we try to find evidence AGAINST
Alternative Hypothesis (H₁ or Hₐ):
The “research” hypothesis
What we’re trying to find evidence FOR
Has ≠, <, or >
Determines the type of test (two-tailed, left-tailed, right-tailed)
Null Hypothesis (H₀)
Alternative Hypothesis (H₁ or Hₐ)
Key rule: H₀ and H₁ must be:
Mutually exclusive (can’t both be true)
Exhaustive (one must be true)
Statements about POPULATION parameters, not sample statistics
Three types based on research question:
1. Two-tailed (≠):
H₀: μ = μ₀
H₁: μ ≠ μ₀
Use when: Interested in detecting any difference (either direction)
Example: “Is the mean different from 24?”
2. Right-tailed (>):
H₀: μ ≤ μ₀
H₁: μ > μ₀
Use when: Want to show parameter is greater
Example: “Has the new process increased battery life?”
3. Left-tailed (<):
H₀: μ ≥ μ₀
H₁: μ < μ₀
Use when: Want to show parameter is less
Example: “Has the drug lowered blood pressure?”
The alternative hypothesis determines which tail(s) we look at!
Example 1: Drug testing
Claim: Drug lowers BP by at least 10 mmHg (mean reduction μ ≥ 10)
Example 2: Quality control
Standard: Battery life should be 24 hours (μ = 24)
Example 3: Process improvement
Question: Has training improved customer satisfaction above 75%?
Key: The research question determines H₁, and H₀ is the complement!
Formulating Hypotheses
For each scenario, write H₀ and H₁, and identify the test type:
A company claims their phone battery lasts at least 48 hours. You want to test this claim.
Historical average GPA at UCSC is 3.2. Has it changed?
A website claims their ads have a 5% click-through rate. You think it’s lower.
A manufacturer wants to know if a new process produces parts with mean weight different from the current 50 grams.
A hospital wants to show that their new treatment reduces recovery time below the current 7 days.
For each: Identify the parameter, write the hypotheses, and name the test type.
Post on Ed Discussion with partner’s name!
Test statistic: A single number that measures how far the sample data are from H₀
For means (σ known):
\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]
For means (σ unknown):
\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]
For proportions:
\[z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]
Interpretation:
How many standard errors is our sample statistic from the null value?
Large |test statistic| → data inconsistent with H₀
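A minimal Python sketch of the three formulas above. The function names are illustrative, and the inputs are the battery and survey numbers used in the worked examples later in this lecture.

```python
import math

def z_stat_mean(x_bar, mu0, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def t_stat_mean(x_bar, mu0, s, n):
    """t statistic for a mean when sigma is estimated by the sample s."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def z_stat_proportion(p_hat, p0, n):
    """z statistic for a proportion; the SE uses the null value p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Battery example: x-bar = 23.5, mu0 = 24, s = 4, n = 100
t_battery = t_stat_mean(23.5, 24, 4, 100)
# Survey example: p-hat = 0.74, p0 = 0.80, n = 200
z_survey = z_stat_proportion(0.74, 0.80, 200)
print(round(t_battery, 2), round(z_survey, 2))
```

Each function returns "how many standard errors from the null value", so the sign shows the direction and the size shows the strength of the evidence.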
P-value definition:
The probability of observing data as extreme or more extreme than what we got, assuming H₀ is true
Interpretation:
Important: P-value is NOT:
The probability H₀ is true
The probability H₁ is true
The probability we made the wrong decision
The p-value depends on the alternative hypothesis:
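Two-tailed test (H₁: μ ≠ μ₀): p-value = 2 × P(Z ≥ |z|), the area in both tails
Right-tailed test (H₁: μ > μ₀): p-value = P(Z ≥ z), the area to the right
Left-tailed test (H₁: μ < μ₀): p-value = P(Z ≤ z), the area to the left
(For t-tests, replace Z with T on n − 1 degrees of freedom.)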
Google Sheets formulas coming up!
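For readers who prefer code to spreadsheets, here is a rough Python equivalent for z statistics. It uses `statistics.NormalDist` from the standard library; the function name `p_value_from_z` and the `tail` labels are illustrative choices, not part of the lecture.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """p-value for a z statistic; tail is 'two', 'right', or 'left'."""
    cdf = NormalDist().cdf  # standard normal CDF: P(Z <= z)
    if tail == "two":
        return 2 * cdf(-abs(z))   # both tails
    if tail == "right":
        return 1 - cdf(z)         # area to the right of z
    if tail == "left":
        return cdf(z)             # area to the left of z
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Same z statistic, three different alternatives
for tail in ("two", "right", "left"):
    print(tail, round(p_value_from_z(-2.12, tail), 4))
```

The same z gives three different p-values, which is why the alternative hypothesis has to be fixed before computing anything.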
Significance level (α): The threshold for rejecting H₀
Common choices:
α = 0.05 (most common)
α = 0.01 (more conservative)
α = 0.10 (more liberal)
Decision rule:
If p-value ≤ α → Reject H₀ (statistically significant)
If p-value > α → Fail to reject H₀ (not statistically significant)
Relationship to confidence intervals:
α = 0.05 corresponds to 95% CI
α = 0.01 corresponds to 99% CI
α = 0.10 corresponds to 90% CI
Note: α is chosen BEFORE seeing the data!
Step 1: STATE
Step 2: PLAN
Step 3: SOLVE
Step 4: CONCLUDE
Always follow all four steps for complete tests!
Scenario: Sarah’s company claims μ = 24 hours. She tests n = 100 phones, finds x̄ = 23.5 hours, s = 4 hours. Test at α = 0.05.
STEP 1: STATE
STEP 2: PLAN
STEP 3: SOLVE
t = (23.5 - 24)/(4/SQRT(100)) = -0.5/0.4 = -1.25
p-value = 2 × P(T < -1.25) with df = 99
=2*T.DIST(-1.25, 99, TRUE) ≈ 0.214
STEP 4: CONCLUDE
z-tests
For proportions or means with σ known:
Two-tailed:
=2*NORM.S.DIST(-ABS(z), TRUE)
Right-tailed:
=1-NORM.S.DIST(z, TRUE)
Left-tailed:
=NORM.S.DIST(z, TRUE)
t-tests
For means with σ unknown:
Two-tailed:
=2*T.DIST(-ABS(t), df, TRUE)
Right-tailed:
=1-T.DIST(t, df, TRUE)
Left-tailed:
=T.DIST(t, df, TRUE)
Pro tip: Use ABS() for absolute value in two-tailed tests!
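Outside Google Sheets, the t-test p-values can be sketched in plain Python. The standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule; that is an assumption made to keep the example dependency-free, and statistical software would normally be used instead. All function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area

def p_value_from_t(t, df, tail):
    """p-value for a t statistic; tail is 'two', 'right', or 'left'."""
    if tail == "two":
        return 2 * t_cdf(-abs(t), df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "left":
        return t_cdf(t, df)
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Battery example: t = -1.25, df = 99, two-tailed
print(round(p_value_from_t(-1.25, 99, "two"), 3))
# Drug example: n = 50, x-bar = 8.5, s = 6, left-tailed
t_drug = (8.5 - 10) / (6 / math.sqrt(50))
print(round(p_value_from_t(t_drug, 49, "left"), 3))
```

Both results can be compared with the p-values in the lecture's battery and drug examples.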
Complete Hypothesis Test
A coffee shop claims the average wait time is 5 minutes. You sample n = 36 customers and find x̄ = 5.8 minutes with s = 2.4 minutes. Test at α = 0.05 whether the true mean wait time is different from 5 minutes.
Follow the four-step process:
Bonus: What would change if this were a right-tailed test (want to show wait time exceeds 5 minutes)?
Post on Ed Discussion with partner’s name!
What is the p-value for this test?
Four possible outcomes in hypothesis testing:
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct Decision (Power) |
| Fail to Reject H₀ | Correct Decision | Type II Error (β) |
Type I Error (False Positive):
Type II Error (False Negative):
Power = 1 - β:
Probability of correctly rejecting a false H₀
Higher power is better!
Dr. Chen’s drug trial:
Type I Error (α):
Type II Error (β):
The trade-off:
Decrease α → increase β (more conservative)
Increase α → decrease β (more liberal)
Increase n → decrease β at a fixed α (a larger sample makes it possible to lower both error rates)!
Type I error rate = α (by definition!)
We CHOOSE α, which directly sets the Type I error rate
Example scenarios:
Life-or-death medical decision:
Preliminary research:
Standard research:
Key insight: We control Type I error directly by choosing α!
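A small simulation can make this concrete. The sketch below repeatedly samples from a population where H₀ is true and counts how often a two-tailed z-test rejects. The population values (μ₀ = 24, σ = 4), the sample size, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha, mu0=24.0, sigma=4.0, n=30, trials=2000, seed=1):
    """Fraction of two-tailed z-tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mu = mu0) really holds
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        p_value = 2 * std_normal.cdf(-abs(z))
        if p_value <= alpha:
            rejections += 1  # a Type I error: H0 is true but we rejected it
    return rejections / trials

for alpha in (0.01, 0.05, 0.10):
    print(alpha, type_one_error_rate(alpha))
```

The observed rejection rate should land close to each α, with some simulation noise.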
Type II error rate (β) depends on:
Calculating β requires:
Power = 1 - β:
Sample size planning:
Dr. Chen’s power analysis:
Setup:
Question: What sample size gives 80% power?
Answer: Use power analysis (Google Sheets add-on or statistical software)
Interpretation:
80% chance of detecting that the drug falls short of the claim if the true mean reduction is 8 mmHg
20% chance of a Type II error (approving a drug that only reduces BP by 8 mmHg)
This is BEFORE conducting the study!
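As a rough sketch of what a power calculation does, the code below uses the normal approximation for a left-tailed test of H₀: μ ≥ 10 against a true mean of 8. The values σ = 6 (borrowed from the sample standard deviation in the drug example) and α = 0.05 are assumptions, since the slide's setup is not reproduced here. A t-based calculation in statistical software would give a slightly larger n.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_left_tailed(mu0, mu_true, sigma, n, alpha):
    """Power of a left-tailed z-test of H0: mu >= mu0 when the true mean is mu_true."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    # How many standard errors the true mean sits below the null value
    shift = (mu0 - mu_true) * math.sqrt(n) / sigma
    return STD_NORMAL.cdf(shift - z_alpha)

def sample_size_for_power(mu0, mu_true, sigma, alpha, power):
    """Smallest n reaching the target power (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    n = ((z_alpha + z_beta) * sigma / (mu0 - mu_true)) ** 2
    return math.ceil(n)

# Assumed inputs: sigma = 6 mmHg, alpha = 0.05, target power = 0.80
n_needed = sample_size_for_power(mu0=10, mu_true=8, sigma=6, alpha=0.05, power=0.80)
print(n_needed, round(power_left_tailed(10, 8, 6, n_needed, 0.05), 3))
```

A smaller gap between μ₀ and the true mean, or a larger σ, pushes the required n up quickly because both enter the formula squared.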
Scenario: Company claims 80% customer satisfaction (p = 0.80). You survey n = 200 customers, find 148 satisfied (p̂ = 0.74). Test at α = 0.05.
STEP 1: STATE
STEP 2: PLAN
STEP 3: SOLVE
SE = SQRT(0.80*0.20/200) = 0.0283
z = (0.74 - 0.80)/0.0283 = -2.12
p-value = 2*NORM.S.DIST(-2.12, TRUE) ≈ 0.034
STEP 4: CONCLUDE
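The arithmetic in this example can be checked with a short Python sketch. It follows the two-tailed calculation shown in Step 3; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p0, n, successes, alpha = 0.80, 200, 148, 0.05
p_hat = successes / n  # sample proportion

# Large-sample conditions are checked with the null value p0
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

se = math.sqrt(p0 * (1 - p0) / n)          # standard error under H0
z = (p_hat - p0) / se                       # test statistic
p_value = 2 * NormalDist().cdf(-abs(z))     # two-tailed p-value

decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(conditions_ok, round(se, 4), round(z, 2), round(p_value, 3), decision)
```

Because the p-value is below α = 0.05, the data give evidence that the true satisfaction rate differs from the claimed 80%.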
Two-tailed test:
One-tailed test:
Example: Battery life (μ₀ = 24 hours)
Two-tailed:
Is it different? (could be better OR worse)
H₁: μ ≠ 24
One-tailed:
Is it worse? (only care about decrease)
H₁: μ < 24
Important: Choose BEFORE seeing data based on research question!
Key connection: CIs and two-tailed tests give same conclusion!
For a two-tailed test at α:
Example: Battery life
95% CI: (22.7, 24.3) hours
Test H₀: μ = 24 vs H₁: μ ≠ 24
Since 24 is IN the CI → Fail to reject H₀ at α = 0.05
Test H₀: μ = 22 vs H₁: μ ≠ 22
Since 22 is NOT in the CI → Reject H₀ at α = 0.05
Why this works:
Note: This relationship only holds for two-tailed tests on independent observations, with the CI's confidence level equal to 1 − α (a 95% CI pairs with α = 0.05)!
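The battery example can be reproduced with a short sketch. It assumes the normal critical value in place of the t critical value (reasonable for n = 100, where the two are nearly equal), and the function names are illustrative.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence=0.95):
    """Approximate CI for a mean using the normal critical value."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def two_tailed_decision(ci, mu0):
    """Reject H0: mu = mu0 exactly when mu0 falls outside the CI."""
    low, high = ci
    return "Fail to reject H0" if low <= mu0 <= high else "Reject H0"

# Battery example: x-bar = 23.5, s = 4, n = 100
ci = z_interval(23.5, 4, 100)
print(tuple(round(bound, 1) for bound in ci))
print(24, two_tailed_decision(ci, 24))
print(22, two_tailed_decision(ci, 22))
```

One interval answers the two-tailed test for every possible null value at once, which is the "test many values" flexibility of a CI.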
Hypothesis Testing with Proportions
A politician claims to have 55% support. A poll of n = 400 voters finds 200 support the politician (p̂ = 0.50). Test at α = 0.01 whether the true support differs from 55%.
Complete the test:
Additional questions:
Work in pairs, and then answer on PE individually.
What is the test statistic (z-value)?
Statistical significance: p-value ≤ α
Practical significance: Effect size matters in real world
Example: Large study (n = 10,000)
Example: Small study (n = 20)
Always report: p-value AND effect size (like difference in means)
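To see why both numbers matter, here is a sketch with hypothetical data: a battery-life test with μ₀ = 24 hours and a known σ = 4 hours. The sample means below are invented for illustration and are not from the lecture.

```python
import math
from statistics import NormalDist

def two_tailed_p(x_bar, mu0, sigma, n):
    """Two-tailed z-test p-value for a mean with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical large study: tiny effect (0.1 h), n = 10,000
large_effect = 24.1 - 24
large_p = two_tailed_p(24.1, 24, 4, 10_000)

# Hypothetical small study: sizable effect (1.5 h), n = 20
small_effect = 25.5 - 24
small_p = two_tailed_p(25.5, 24, 4, 20)

print(round(large_effect, 1), round(large_p, 4))
print(round(small_effect, 1), round(small_p, 4))
```

The large study is statistically significant at α = 0.05 even though a 0.1-hour difference rarely matters in practice, while the small study misses significance despite an effect 15 times larger.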
Mistake 1: “Accepting” H₀
Mistake 2: Wrong interpretation of p-value
Mistake 3: Changing α after seeing results
Mistake 4: Confusing significance with importance
Mistake 5: Wrong conclusion statement
Setup: Drug should reduce BP by ≥10 mmHg. Test n = 50 patients, find x̄ = 8.5 mmHg reduction, s = 6 mmHg. Test at α = 0.05.
STEP 1: STATE
STEP 2: PLAN
STEP 3: SOLVE
t = (8.5 - 10)/(6/SQRT(50)) = -1.5/0.849 = -1.77
p-value = T.DIST(-1.77, 49, TRUE) ≈ 0.042
STEP 4: CONCLUDE
Practical implication: FDA should not approve based on this evidence.
| Scenario | Parameter | Conditions | Test Statistic | Distribution |
|---|---|---|---|---|
| Mean, σ known | μ | Random, any n if normal, n≥30 if not | \(z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\) | z |
| Mean, σ unknown | μ | Random, n≥30 or normal | \(t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}\) | t (df=n-1) |
| Proportion | p | Random, np₀≥10, n(1-p₀)≥10 | \(z = \frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\) | z |
Decision tree:
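One way to read the summary table as a decision procedure is sketched below. The function names and argument names are illustrative, and random sampling still has to be judged from the study design since code cannot check it.

```python
def choose_test(parameter, sigma_known=False):
    """Return 'z' or 't' following the summary table."""
    if parameter == "proportion":
        return "z"
    if parameter == "mean":
        return "z" if sigma_known else "t"
    raise ValueError("parameter must be 'mean' or 'proportion'")

def conditions_met(parameter, n, p0=None, population_normal=False):
    """Check the sample-size conditions (random sampling is assumed)."""
    if parameter == "proportion":
        return n * p0 >= 10 and n * (1 - p0) >= 10
    if parameter == "mean":
        return population_normal or n >= 30
    raise ValueError("parameter must be 'mean' or 'proportion'")

# Coffee shop: mean wait time, sigma unknown, n = 36
print(choose_test("mean"), conditions_met("mean", 36))
# Politician poll: proportion, p0 = 0.55, n = 400
print(choose_test("proportion"), conditions_met("proportion", 400, p0=0.55))
```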
Complete Hypothesis Test with Errors
A factory produces bolts with specified length of 5 cm. Quality control samples n = 40 bolts, finds x̄ = 5.15 cm, s = 0.4 cm.
Test at α = 0.05 whether mean length differs from 5 cm (complete 4 steps)
What type of error could you have made? Describe it in context.
If you failed to reject H₀, what would the Type II error be in context?
How would you reduce the probability of Type II error?
Construct a 95% CI for μ. Does it support your hypothesis test conclusion?
If α = 0.01 instead, would your conclusion change?
Use Google Sheets! Post on Ed Discussion!
The Four-Step Process:
Key Concepts:
Google Sheets: T.DIST, NORM.S.DIST for p-values
Both methods use:
Same conditions (random sampling, sample size)
Same distributions (z or t)
Same standard errors for means (for proportions, the test uses p₀ in the SE and the CI uses p̂)
Key differences:
| Feature | Confidence Interval | Hypothesis Test |
|---|---|---|
| Purpose | Estimate parameter | Test claim about parameter |
| Output | Range of values | Decision (reject or not) |
| Information | Shows precision | Shows significance |
| Flexibility | Test many values | Test one value |
When to use each:
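Example: Drug trial
95% CI: (6.8, 10.2) mmHg reduction (8.5 ± 2.01 × 0.849)
Test: Reject H₀: μ ≥ 10 (one-tailed p = 0.042)
Together: The best estimate is an 8.5 mmHg reduction, and the one-tailed test gives evidence that the mean falls short of the claimed 10 mmHg. The two-sided 95% CI just includes 10; the interval that matches a one-tailed test at α = 0.05 is the 90% CI, about (7.1, 9.9).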
Dr. Chen now understands:
✅ How to formulate hypotheses about drug effectiveness
✅ The trade-off between Type I and Type II errors
✅ How to calculate and interpret p-values
✅ That rejecting H₀: μ ≥ 10 means the data give evidence that the mean reduction is below the claimed 10 mmHg
✅ How to communicate findings: “Drug reduces BP by 8.5 mmHg (95% CI: 6.8-10.2), which is significantly less than the claimed 10 mmHg in a one-tailed test (p = 0.042)”
Her decision:
This is evidence-based decision making!
Medicine:
Clinical trials for new treatments
Comparing treatment effectiveness
Business:
A/B testing for website designs
Quality control in manufacturing
Science:
Testing scientific theories
Comparing experimental conditions
Public Policy:
Evaluating program effectiveness
Testing policy impacts
Sports:
Player performance analysis
Strategy effectiveness
All use the same hypothesis testing framework we learned today!
Rate your confidence (1-5) on Ed Discussion:
If you rated anything 3 or below, visit office hours!
Questions? I have office hours right after class today!
Coming up: More hypothesis tests
Remember:
STAT 17 – Fall 2025