STATS 17 — Final Exam Review

Spring 2026

What to Expect on the Final

Exam logistics

120 minutes (+ DRC time if applicable)
100 points total
In-person, scheduled location
Permitted: pen, photo ID, calculator
No notes, textbooks, phones, AI tools

Structure

Section	Items	Points
Multiple Choice	20 Qs	40 pts
Free Response	4 Qs (multi-part)	60 pts

Emphasis: Material covered after the midterm; midterm topics (descriptive stats, basic probability, discrete distributions) may appear but are not the focus.

Topics at a Glance

Topic	Key Idea
Normal Distribution & Z-scores	Standardize, use empirical rule, find percentiles
Central Limit Theorem	Sampling distribution of $\bar{X}$, standard error
Confidence Intervals	Means (z or t), proportions, sample size
Hypothesis Testing	H₀ vs Hₐ, p-value, Type I/II error
Two-Sample Inference	Pooled t-test, two-proportion z-test, Cohen’s d
Chi-Square Tests	Independence, expected counts, df
ANOVA	F-ratio, between vs. within variation
Correlation & Regression	r, r², slope/intercept interpretation, residuals

Tip

Exam tip: For every conclusion, state it in context — what do your results mean for the actual problem?

Normal Distribution: Key Ideas

Z-score: $z = \dfrac{x - \mu}{\sigma}$ counts how many standard deviations $x$ lies from the mean; it transforms any normal distribution into the standard normal $N(0,1)$.

Empirical Rule:

Range	% of data
$\mu \pm \sigma$	~68%
$\mu \pm 2\sigma$	~95%
$\mu \pm 3\sigma$	~99.7%

Finding a value from a percentile: $x = \mu + z^* \cdot \sigma$, where $z^*$ is the standard normal value with the given percentile's area to its left
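For checking practice answers while studying, here is a minimal Python sketch of both formulas. It uses the standard library's `statistics.NormalDist` (Python 3.8+) to look up the z-value for a percentile; the helper names `z_score` and `value_at_percentile` are illustrative, not course-provided.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def value_at_percentile(pct, mu, sigma):
    """Invert the z-score formula: x = mu + z * sigma, where z is the
    standard normal value with `pct` of the area to its left."""
    z = NormalDist().inv_cdf(pct)  # e.g. pct = 0.95 gives z of about 1.645
    return mu + z * sigma

# Hypothetical example: a N(100, 15) population
print(z_score(130, 100, 15))                         # 2.0
print(round(value_at_percentile(0.5, 100, 15), 2))   # 100.0 (the median equals the mean)
```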

Critical values (provided on the exam):

CI Level	$z^*$
90%	1.645
95%	1.96
98%	2.33
99%	2.576
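Each critical value leaves $(1 - C)/2$ of the area in each tail, so $z^*$ is the standard normal value with $1 - (1 - C)/2$ of the area to its left. A short sketch that reproduces the table, assuming Python's `statistics.NormalDist` (the function name `z_critical` is illustrative):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z*: leaves (1 - confidence)/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```

The 98% line prints 2.326, which the table rounds to 2.33.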

Warning

On the exam you won’t have a z-table. Use the critical value list and sketch the distribution to reason about whether a probability is greater or less than 0.5.

Normal Distribution: Practice

Q1. Test scores are normally distributed with $\mu = 70$, $\sigma = 8$. What score corresponds to the 95th percentile?

Q2. Heights are normal with $\mu = 65$ in, $\sigma = 3$ in. Is the probability that a randomly selected woman is shorter than 62 inches greater than, less than, or equal to 0.5? Explain without calculating the exact value.

A1. 95th percentile → use $z^* = 1.645$ (the value that leaves 5% in the upper tail, same as the 90% CI critical value).
$x = 70 + 1.645(8) = 70 + 13.16 = \mathbf{83.16}$

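A2. $z = (62 - 65)/3 = -1.0$. Since 62 is below the mean and the normal curve is symmetric about $\mu$, the area to the left of 62 is less than 0.5, so the probability is less than 0.5. (By the empirical rule it is roughly $(1 - 0.68)/2 = 16\%$.)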

Central Limit Theorem

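What it says: For a random sample of size $n$, the sampling distribution of $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ (the second parameter below is a standard deviation, not a variance): \[\bar{X} \sim N\!\left(\mu,\; \frac{\sigma}{\sqrt{n}}\right)\]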

When it applies:

Population is normal → any $n$
Population is not normal → need $n \geq 30$ (rule of thumb)

Standard error: $SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$
Z-score for sample means: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Tip

As $n$ increases, the standard error decreases — the sampling distribution becomes more concentrated around $\mu$.
Key distinction: individual observations use $\sigma$; sample means use $\sigma/\sqrt{n}$.
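To see the CLT numerically, here is a small simulation sketch. It assumes a skewed, non-normal population (Exponential with rate 1, so $\mu = 1$ and $\sigma = 1$), draws many samples of size $n$, and compares the spread of the sample means to $\sigma/\sqrt{n}$. The population, seed, and repetition count are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(n, reps=10_000, seed=17):
    """Draw `reps` samples of size n from an Exponential(rate=1) population
    (mu = 1, sigma = 1) and return the sample mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

for n in (4, 16, 64):
    means = simulate_sample_means(n)
    print(f"n={n:>2}: mean of x-bars = {statistics.fmean(means):.3f}, "
          f"SD of x-bars = {statistics.stdev(means):.3f}, "
          f"sigma/sqrt(n) = {1 / n ** 0.5:.3f}")
```

The mean of the sample means stays near $\mu = 1$, while their standard deviation shrinks roughly like $1/\sqrt{n}$.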

CLT: Practice

Q3. A population has $\mu = 80$, $\sigma = 20$. For samples of size $n = 64$:

a. What is the standard error of $\bar{X}$?
b. Is it more likely for an individual value to exceed 85, or for a sample mean to exceed 85? Explain.

a. $SE = \sigma/\sqrt{n} = 20/\sqrt{64} = 20/8 = \mathbf{2.5}$

b. An individual value is more likely to exceed 85 than a sample mean is. For an individual: $z = (85-80)/20 = 0.25$, a small z, so exceeding 85 is fairly likely. For the sample mean: $z = (85-80)/2.5 = 2.0$, a much larger z, so exceeding 85 is much less likely. Sample means have less variability than individual observations.
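The exact probabilities behind part b can be sketched with `statistics.NormalDist`. One assumption to flag: Q3 does not say the population is normal, so the individual-value probability below assumes it is; the sample-mean probability relies only on the CLT with $n = 64$.

```python
from statistics import NormalDist

mu, sigma, n = 80, 20, 64
se = sigma / n ** 0.5  # standard error = 20 / 8 = 2.5

# Individual value: assumes a normal population (not stated in Q3)
p_individual = 1 - NormalDist(mu, sigma).cdf(85)
# Sample mean: approximately normal by the CLT since n >= 30
p_mean = 1 - NormalDist(mu, se).cdf(85)

print(f"P(X > 85)    = {p_individual:.3f}")
print(f"P(Xbar > 85) = {p_mean:.3f}")
```

This prints about 0.401 for an individual value and about 0.023 for the sample mean, matching the z-score reasoning above.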

Confidence Intervals

Three formulas to know:

Parameter	CI Formula	Use when
$\mu$, $\sigma$ known	$\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}$	$\sigma$ given, large $n$
$\mu$, $\sigma$ unknown	$\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}$, df $= n-1$	$s$ from data
$p$	$\hat{p} \pm z^* \cdot \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$	Check: $n\hat{p} \geq 10$, $n(1-\hat{p}) \geq 10$

Correct interpretation: “We are [level]% confident that the interval captures the true population parameter.”
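A minimal sketch of the three formulas, assuming you supply the critical value yourself (from the $z^*$ list or a $t$-table), since the Python standard library has no $t$ quantile function. The helper names are illustrative, and the survey numbers in the example are hypothetical.

```python
from math import sqrt

def mean_interval(xbar, spread, n, crit):
    """xbar ± crit * spread / sqrt(n).
    Use spread = sigma with crit = z* (sigma known),
    or spread = s with crit = t* from a t-table with df = n - 1."""
    margin = crit * spread / sqrt(n)
    return xbar - margin, xbar + margin

def proportion_interval(successes, n, z_star):
    """p-hat ± z* * sqrt(p-hat * (1 - p-hat) / n), after the success/failure check."""
    p_hat = successes / n
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("need n*p_hat >= 10 and n*(1 - p_hat) >= 10")
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical example: 120 "yes" answers in a random sample of 200
low, high = proportion_interval(120, 200, 1.96)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```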

Warning

Higher confidence → wider interval.
A CI and a two-sided hypothesis test are related: if the null value falls outside a C% CI, reject $H_0$ at $\alpha = 1 - C/100$ (e.g., a 95% CI pairs with $\alpha = 0.05$).

CI: Practice

Q4. A random sample of 36 store receipts has $\bar{x} = \$47.20$ and $s = \$9.00$. Construct a 95% CI for the population mean purchase amount. Interpret your interval.

Use the $t$-distribution since $\sigma$ is unknown. df $= 35$ → $t^* \approx 2.03$ (slightly larger than $z^* = 1.96$; the two get closer as df grows).

\[47.20 \pm 2.03 \cdot \frac{9.00}{\sqrt{36}} = 47.20 \pm 2.03(1.5) = 47.20 \pm 3.05\]

Interval: ($44.15, $50.25)

Interpretation: We are 95% confident that the true mean purchase amount for all customers falls between $44.15 and $50.25.
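The arithmetic can be checked with a few lines of Python. The value $t^* = 2.03$ is taken from the solution above (a rounded $t$-table value), not computed here.

```python
from math import sqrt

xbar, s, n = 47.20, 9.00, 36
t_star = 2.03  # t-table value for df = 35, 95% confidence (rounded)

se = s / sqrt(n)       # 9 / 6 = 1.5
margin = t_star * se   # 3.045 before rounding
low, high = xbar - margin, xbar + margin
print(f"SE = {se:.2f}, margin = {margin:.3f}, CI = ({low:.3f}, {high:.3f})")
```

Without rounding the margin first, the endpoints are 44.155 and 50.245; the hand solution rounds the margin to 3.05, giving ($44.15, $50.25).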

Hypothesis Testing: The Framework

Steps for every test:

State $H_0$ and $H_a$ in symbols and words
Check conditions
Calculate the test statistic
Find the p-value (or compare to critical value)
Make a decision: reject $H_0$ if p-value $< \alpha$
State conclusion in context

✅ Say this	❌ Not this
Reject $H_0$	Accept $H_a$
Fail to reject $H_0$	Accept $H_0$

Tip

P-value: The probability of observing data as extreme as (or more extreme than) what we got, assuming $H_0$ is true.
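Steps 3 to 5 of the framework can be sketched for the simplest case, a one-sample z-test for a mean with $\sigma$ known. Steps 1, 2, and 6 (hypotheses, conditions, conclusion in context) still have to be written by hand. The function name and the numbers in the example are hypothetical.

```python
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Test statistic, p-value, and decision for H0: mu = mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / n ** 0.5)        # step 3: test statistic
    std_normal = NormalDist()
    if alternative == "greater":                 # Ha: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":                  # Ha: mu < mu0
        p = std_normal.cdf(z)
    else:                                        # Ha: mu != mu0, both tails count
        p = 2 * (1 - std_normal.cdf(abs(z)))
    decision = "reject H0" if p < alpha else "fail to reject H0"  # step 5
    return z, p, decision

# Hypothetical example: H0: mu = 100 vs Ha: mu != 100, sigma = 15, n = 36, xbar = 105
z, p, decision = one_sample_z_test(105, 100, 15, 36)
print(f"z = {z:.2f}, p-value = {p:.4f} -> {decision}")
```

The p-value here is exactly the definition above: the probability, assuming $H_0$, of a z at least as far from 0 as the observed one.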

Type I and Type II Errors

	$H_0$ is True	$H_0$ is False
Reject $H_0$	Type I Error ($\alpha$)	Correct ✅ (Power)
Fail to Reject $H_0$	Correct ✅	Type II Error ($\beta$)

Power $= 1 - \beta$ = probability of correctly rejecting a false $H_0$
Lowering $\alpha$ (with $n$ fixed) → reduces the chance of a Type I error but increases the chance of a Type II error (trade-off)
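The trade-off can be made concrete with the power of an upper-tailed z-test: $\text{power} = 1 - \Phi\!\left(z_\alpha - \dfrac{\mu_{\text{true}} - \mu_0}{\sigma/\sqrt{n}}\right)$. The sketch below assumes hypothetical values ($\mu_0 = 100$, true mean 105, $\sigma = 15$, $n = 36$) and shows power falling, so $\beta$ rising, as $\alpha$ is lowered.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha):
    """Power of the upper-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # rejection cutoff for z
    shift = (mu_true - mu0) / (sigma / n ** 0.5)   # center of z when mu = mu_true
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical scenario: mu0 = 100, true mean 105, sigma = 15, n = 36
for alpha in (0.10, 0.05, 0.01):
    power = power_one_sided_z(100, 105, 15, 36, alpha)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```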

Q5. A company tests whether a new drug reduces blood pressure. State the consequences of a Type I and a Type II error in this context.

Type I error: Concluding the drug works when it actually doesn’t — patients receive an ineffective treatment.
Type II error: Concluding the drug doesn’t work when it actually does — patients are denied an effective treatment.

Two-Sample Inference

Comparing two means (pooled t-test, equal variances):

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \quad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \quad df = n_1+n_2-2\]

Effect size (Cohen’s d): $d = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p}$

d value	Interpretation
0.2	Small
0.5	Medium
0.8	Large

Warning

Statistical significance ≠ practical significance. A tiny difference can be statistically significant with large $n$. Always report effect size.
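A sketch of the pooled formulas in Python, useful for checking hand calculations. The function names and the group summaries in the example are hypothetical; the p-value would still come from a $t$-table with the returned df.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """s_p: the two sample variances weighted by their df, then square-rooted."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def pooled_t_and_d(n1, xbar1, s1, n2, xbar2, s2):
    """Return (t, df, Cohen's d) for the pooled two-sample t-test."""
    sp = pooled_sd(n1, s1, n2, s2)
    t = (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))
    d = (xbar1 - xbar2) / sp
    return t, n1 + n2 - 2, d

# Hypothetical summaries: group 1 (n=10, mean 52, s=4), group 2 (n=15, mean 48, s=6)
t, df, d = pooled_t_and_d(10, 52, 4, 15, 48, 6)
print(f"t = {t:.3f}, df = {df}, d = {d:.3f}")
```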

Two-Sample Practice

Q6. Two training programs are compared on exam scores.
Group A: $n_1 = 20$, $\bar{x}_1 = 78$, $s_1 = 10$
Group B: $n_2 = 20$, $\bar{x}_2 = 72$, $s_2 = 10$

a. Calculate the pooled standard deviation $s_p$.
b. Calculate Cohen’s $d$ and interpret it.

a. Since $n_1 = n_2$ and $s_1 = s_2 = 10$: $s_p = 10$

b. $d = \dfrac{78 - 72}{10} = \dfrac{6}{10} = 0.6$

A Cohen’s $d$ of 0.6 is between medium (0.5) and large (0.8) — a practically meaningful difference between the two programs.

Chi-Square Test of Independence

Setup:

$H_0$: The two categorical variables are independent
$H_a$: The two variables are associated/related

Test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where $E = \dfrac{(\text{row total})(\text{col total})}{\text{grand total}}$

Degrees of freedom: $df = (r-1)(c-1)$

Warning

A significant result tells you an association exists — it does NOT specify the direction or nature of the relationship.
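The expected-count formula, the statistic, and df can be sketched for any $r \times c$ table stored as a list of rows. The function name and the 2×2 counts below are hypothetical; the p-value would still come from a $\chi^2$ table.

```python
def chi_square_independence(observed):
    """Expected counts, chi-square statistic, and df for an r x c table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    # E = (row total)(col total) / grand total, cell by cell
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, chi2, df

# Hypothetical 2 x 2 table (rows: two groups, columns: yes/no)
expected, chi2, df = chi_square_independence([[30, 20], [20, 30]])
print(expected, round(chi2, 2), df)
```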

Q7. A 3×4 contingency table (exercise: Low/Med/High vs. health: Poor/Fair/Good/Excellent) gives $\chi^2 = 18.5$, p-value $= 0.005$. What are the degrees of freedom? What is the conclusion at $\alpha = 0.01$?

$df = (3-1)(4-1) = 2 \times 3 = 6$. p-value $= 0.005 < \alpha = 0.01$ → Reject $H_0$. There is significant evidence at the 0.01 level that exercise frequency and health status are associated.
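Where does the p-value come from? For an even number of degrees of freedom, the upper tail of $\chi^2_{df}$ has a closed form, $P(\chi^2_{df} > x) = e^{-x/2}\sum_{i=0}^{df/2-1} \dfrac{(x/2)^i}{i!}$, which is enough to check Q7 without software. This sketch only handles even df (odd df needs a table or a statistics library); the function name is illustrative.

```python
from math import exp, factorial

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for EVEN df only."""
    if df % 2:
        raise ValueError("this closed form only works for even df")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))

df = (3 - 1) * (4 - 1)                       # 3 x 4 table -> df = 6
p_value = chi2_upper_tail_even_df(18.5, df)
print(df, round(p_value, 4))
```

The result, about 0.0051, agrees with the p-value of 0.005 given in Q7.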

ANOVA

When to use: Comparing means across 3 or more groups (use a two-sample $t$-test for 2 groups).

\[F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(n-k)}\]

$MSB$ = between-group variation; $MSW$ = within-group variation
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$; $H_a$: At least one mean differs

ANOVA table structure:

Source	SS	df	MS	F
Between	SSB	$k-1$	SSB/$(k-1)$	MSB/MSW
Within	SSW	$n-k$	SSW/$(n-k)$
Total	SST	$n-1$

Rejecting $H_0$ only tells you at least one mean differs — not which ones. Post-hoc tests (e.g., Tukey’s HSD) are needed for pairwise comparisons.
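The ANOVA table can be filled in from raw data with a short sketch; the function name and the three small groups are hypothetical. As in the table, $F = MSB/MSW$ with $(k-1, n-k)$ degrees of freedom, and the p-value would come from an F table.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = fmean(all_values)
    # Between: group means vs. grand mean, weighted by group size
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within: each value vs. its own group mean
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    return ssb, ssw, k - 1, n - k, msb / msw

# Hypothetical data: three groups of three observations each
ssb, ssw, df_b, df_w, F = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(ssb, ssw, df_b, df_w, F)
```

Here SSB + SSW = 54 + 6 = 60 = SST, matching the Total row of the table.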

Correlation & Regression

Correlation: $-1 \leq r \leq 1$; measures strength and direction of a linear relationship.

Regression equation: $\hat{y} = b_0 + b_1 x$

\[b_1 = r\frac{s_y}{s_x} \qquad b_0 = \bar{y} - b_1\bar{x}\]

Key interpretations:

Slope $b_1$: For a 1-unit increase in $x$, $y$ is predicted to change by $b_1$ units on average.
Intercept $b_0$: Predicted value of $y$ when $x = 0$ (check if this makes sense in context).
$r^2$: Proportion of variation in $y$ explained by the linear relationship with $x$.
Residual: $e = y - \hat{y}$ (positive = above the line; negative = below).

Warning

Correlation ≠ causation. Avoid extrapolation (predicting outside the range of observed $x$ values).
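The slope and intercept formulas can be sketched directly from $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$, with $r$ computed as the average product of z-scores (dividing by $n - 1$). The function name and the five data points are hypothetical.

```python
from statistics import fmean, stdev

def least_squares(x, y):
    """Return (b0, b1, r) using b1 = r * s_y / s_x and b0 = ybar - b1 * xbar."""
    xbar, ybar = fmean(x), fmean(y)
    sx, sy = stdev(x), stdev(y)
    n = len(x)
    # r: sum of products of z-scores, divided by n - 1 to match stdev
    r = sum((xi - xbar) / sx * (yi - ybar) / sy for xi, yi in zip(x, y)) / (n - 1)
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1, r

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, r = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e = y - y_hat
print(f"y_hat = {b0:.2f} + {b1:.2f}x, r = {r:.3f}, r^2 = {r**2:.3f}")
```

For a least-squares line the residuals always sum to zero, which is a quick check on hand calculations.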

Regression: Practice

Q8. Advertising spending (in $1000s) and monthly sales (in $1000s): $\hat{y} = 12.5 + 2.3x$, $r = 0.78$, $r^2 = 0.608$, $n = 25$.

a. Interpret the slope in context.
b. Estimate sales when advertising spending is $10,000.
c. Interpret $r^2$.
d. A month with $10,000 in advertising had actual sales of $40,000. What is the residual?

a. For every additional $1,000 spent on advertising, monthly sales are predicted to increase by $2,300 on average.
b. $x = 10$: $\hat{y} = 12.5 + 2.3(10) = 35.5$ → estimated sales of $35,500.
c. About 60.8% of the variation in monthly sales is explained by the linear relationship with advertising spending.
d. Actual $y = 40$ (thousands); predicted $\hat{y} = 35.5$. Residual $= 40 - 35.5 = \mathbf{4.5}$ (i.e., $4,500 above predicted).
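A quick arithmetic check of parts b to d, keeping both variables in thousands of dollars as the problem does:

```python
b0, b1 = 12.5, 2.3   # y_hat = 12.5 + 2.3x, both variables in $1000s
r = 0.78

x = 10               # $10,000 of advertising = 10 (thousands)
y_hat = b0 + b1 * x  # predicted sales, in thousands
residual = 40 - y_hat  # actual minus predicted
print(f"y_hat = {y_hat:.1f}, residual = {residual:.1f}, r^2 = {r**2:.4f}")
```

Squaring $r = 0.78$ gives 0.6084, which the problem rounds to 0.608.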

Putting It All Together: Strategy Tips

Before you compute

Identify what type of test/interval is needed
Check conditions (random sample? $n \geq 30$? $n\hat{p} \geq 10$?)
Write hypotheses before looking at the data

When you write answers

State every conclusion in context
Use “fail to reject $H_0$” — never “accept $H_0$”
Report units on all numerical answers
For free response: show your formula, plug-in values, then compute

Tip

Common point-losers to avoid:

Forgetting to check conditions
Concluding “we accept $H_0$”
Interpreting a CI as “the probability the true mean is in this interval”
Forgetting that a significant chi-square or ANOVA result doesn’t specify which groups differ
Extrapolating beyond the data range in regression

Good Luck! 🎉

Final reminders:

Bring: pen, photo ID, calculator
Arrive a few minutes early
Sketch normal distributions when reasoning about probabilities
Show all work — partial credit is available
Expect your student ID to be checked when you turn in your exam
Leave smart glasses, smartwatches, and cell phones at home, or make sure they are turned off for the entire exam

Tip

You’ve got this. Take a breath, read each question carefully, and state every conclusion in context.
