STATS 17 — Final Exam Review

Spring 2026

What to Expect on the Final

Exam logistics

  • 120 minutes (+ DRC time if applicable)
  • 100 points total
  • In-person, scheduled location
  • Permitted: pen, photo ID, calculator
  • No notes, textbooks, phones, AI tools

Structure

Section | Items | Points
Multiple Choice | 20 Qs | 40 pts
Free Response | 4 Qs (multi-part) | 60 pts

Emphasis: Material covered after the midterm; midterm topics (descriptive stats, basic probability, discrete distributions) may appear but are not the focus.

Topics at a Glance

Topic | Key Idea
Normal Distribution & Z-scores | Standardize, use empirical rule, find percentiles
Central Limit Theorem | Sampling distribution of \(\bar{X}\), standard error
Confidence Intervals | Means (z or t), proportions, sample size
Hypothesis Testing | H₀ vs Hₐ, p-value, Type I/II error
Two-Sample Inference | Pooled t-test, two-proportion z-test, Cohen’s d
Chi-Square Tests | Independence, expected counts, df
ANOVA | F-ratio, between vs. within variation
Correlation & Regression | r, r², slope/intercept interpretation, residuals

Tip

Exam tip: For every conclusion, state it in context — what do your results mean for the actual problem?

Normal Distribution: Key Ideas

Z-score: \(z = \dfrac{x - \mu}{\sigma}\) transforms any normal to standard normal \(N(0,1)\).

Empirical Rule:

Range % of data
\(\mu \pm \sigma\) ~68%
\(\mu \pm 2\sigma\) ~95%
\(\mu \pm 3\sigma\) ~99.7%

Finding a value from a percentile: \(x = \mu + z^* \cdot \sigma\)

Critical values (provided on the exam):

CI Level \(z^*\)
90% 1.645
95% 1.96
98% 2.33
99% 2.576
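For checking these values while studying at home (the exam itself allows only a calculator), a minimal sketch with Python's standard-library `statistics.NormalDist` (Python 3.8+) reproduces the table. The helper name `critical_z` is our own, not a library function.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2              # area left in each tail
    return NormalDist().inv_cdf(1 - tail)    # quantile of the standard normal N(0, 1)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z* = {critical_z(level):.3f}")
```

For 98% the exact value is about 2.326, which the table rounds to 2.33.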

Warning

On the exam, you won’t have a z-table. Use the critical value list and sketch the distribution to reason about whether a probability is greater than or less than 0.5.

Normal Distribution: Practice

Q1. Test scores are normally distributed with \(\mu = 70\), \(\sigma = 8\). What score corresponds to the 95th percentile?

Q2. Heights are normal with \(\mu = 65\) in, \(\sigma = 3\) in. Is the probability that a randomly selected woman is shorter than 62 inches greater than, less than, or equal to 0.5? Explain without calculating the exact value.

A1. 95th percentile → use \(z^* = 1.645\) (the value that leaves 5% in the upper tail, same as the 90% CI critical value).
\(x = 70 + 1.645(8) = 70 + 13.16 = \mathbf{83.16}\)

A2. \(z = (62 - 65)/3 = -1.0\). Since 62 is below the mean, the area to the left is less than 0.5. The probability is less than 0.5.
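A short sketch for checking A1 and A2 afterward, again using `statistics.NormalDist`. For A2 the empirical rule already gives an estimate: 68% lies within one standard deviation, so about (100% − 68%)/2 = 16% lies below \(z = -1\).

```python
from statistics import NormalDist

# A1: 95th percentile of N(mu = 70, sigma = 8)
scores = NormalDist(mu=70, sigma=8)
q1 = scores.inv_cdf(0.95)          # equals 70 + z* * 8 with unrounded z* (about 1.6449)
print(f"A1: {q1:.2f}")

# A2: P(height < 62) for N(mu = 65, sigma = 3)
heights = NormalDist(mu=65, sigma=3)
z = (62 - heights.mean) / heights.stdev   # (62 - 65) / 3 = -1.0
p = heights.cdf(62)                       # area to the left of 62
print(f"A2: z = {z:.1f}, P(X < 62) = {p:.4f}")
```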

Central Limit Theorem

What it says: For a random sample of size \(n\), the sampling distribution of \(\bar{X}\) is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\): \[\bar{X} \sim N\!\left(\mu,\; \frac{\sigma}{\sqrt{n}}\right)\]

When it applies:

  • Population is normal → any \(n\)
  • Population is not normal → need \(n \geq 30\) (rule of thumb)

Standard Error: \(SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\)

Z-score for sample means: \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)

Tip

As \(n\) increases, the standard error decreases — the sampling distribution becomes more concentrated around \(\mu\).
Key distinction: individual observations use \(\sigma\); sample means use \(\sigma/\sqrt{n}\).
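A small simulation makes the CLT concrete. This sketch assumes a hypothetical, strongly skewed population (exponential with mean 10, so \(\mu = \sigma = 10\)); the seed, sample size, and number of repetitions are arbitrary choices, not course data. The standard deviation of the simulated sample means should land close to \(\sigma/\sqrt{n} = 10/8 = 1.25\).

```python
import random
import statistics

random.seed(17)  # fixed seed so the run is reproducible

# Hypothetical skewed population: exponential with mean 10, so mu = sigma = 10
MU = SIGMA = 10.0
N = 64        # sample size
REPS = 5000   # number of simulated samples

# Each entry is the mean of one random sample of size N
sample_means = [
    statistics.fmean(random.expovariate(1 / MU) for _ in range(N))
    for _ in range(REPS)
]

theory_se = SIGMA / N ** 0.5                  # sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)  # spread of the simulated x-bars

print(f"mean of x-bars: {statistics.fmean(sample_means):.2f} (mu = {MU})")
print(f"SD of x-bars:   {observed_se:.3f} (sigma/sqrt(n) = {theory_se})")
```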

CLT: Practice

Q3. A population has \(\mu = 80\), \(\sigma = 20\). For samples of size \(n = 64\):

  a. What is the standard error of \(\bar{X}\)?
  b. Is it more likely for an individual value to exceed 85, or a sample mean to exceed 85? Explain.

a. \(SE = \sigma/\sqrt{n} = 20/\sqrt{64} = 20/8 = \mathbf{2.5}\)

b. The sample mean is less likely to exceed 85. For an individual: \(z = (85-80)/20 = 0.25\) — a small z, so fairly likely. For the sample mean: \(z = (85-80)/2.5 = 2.0\) — a much larger z, so much less likely. Sample means have less variability than individual observations.
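The arithmetic for Q3 as a quick sketch. Only the sample-mean probability is computed: the problem does not say the population is normal, so the CLT (with \(n = 64 \geq 30\)) justifies a normal model for \(\bar{X}\) but not for a single observation.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 80, 20, 64
se = sigma / sqrt(n)              # a. 20 / 8 = 2.5

z_individual = (85 - mu) / sigma  # 0.25: 85 is close to mu for one observation
z_mean = (85 - mu) / se           # 2.0: 85 is far out for a sample mean

# Upper-tail probability for the sample mean, using the CLT normal approximation
p_mean = 1 - NormalDist().cdf(z_mean)
print(se, z_individual, z_mean, round(p_mean, 4))
```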

Confidence Intervals

Three formulas to know:

Parameter | CI Formula | Use when
\(\mu\), \(\sigma\) known | \(\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}\) | \(\sigma\) given; normal population or large \(n\)
\(\mu\), \(\sigma\) unknown | \(\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}\), df \(= n-1\) | \(s\) from data
\(p\) | \(\hat{p} \pm z^* \cdot \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\) | Check: \(n\hat{p} \geq 10\), \(n(1-\hat{p}) \geq 10\)

Correct interpretation: “We are [level]% confident that the interval captures the true population parameter.”
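Two of the interval formulas above written as small Python functions, as a sketch for self-checking. The function names and the example inputs are hypothetical; \(z^*\) is passed in rather than computed. The \(t\)-interval has the same shape as `z_interval` with \(s\) in place of \(\sigma\) and a \(t^*\) read from a table with df \(= n-1\).

```python
from math import sqrt

def z_interval(xbar, sigma, n, z_star):
    """CI for mu when sigma is known: xbar ± z* · sigma / sqrt(n)."""
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def proportion_interval(p_hat, n, z_star):
    """CI for p: p_hat ± z* · sqrt(p_hat (1 - p_hat) / n)."""
    # Success/failure condition from the formula table
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("need n*p_hat >= 10 and n*(1 - p_hat) >= 10")
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical inputs, for illustration only
print(z_interval(xbar=50, sigma=10, n=25, z_star=1.96))
print(proportion_interval(p_hat=0.40, n=200, z_star=1.96))
```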

Warning

Higher confidence → wider interval.
A CI and a hypothesis test are related: for a two-sided test, if the null value falls outside a C% CI, reject \(H_0\) at \(\alpha = 1 - C\) (e.g., a 95% CI corresponds to \(\alpha = 0.05\)).

CI: Practice

Q4. A random sample of 36 store receipts has \(\bar{x} = \$47.20\) and \(s = \$9.00\). Construct a 95% CI for the population mean purchase amount. Interpret your interval.

Use the \(t\)-distribution since \(\sigma\) is unknown: df \(= 35\) → \(t^* \approx 2.03\) (slightly larger than \(z^* = 1.96\); \(t^*\) approaches 1.96 as df grows).

\[47.20 \pm 2.03 \cdot \frac{9.00}{\sqrt{36}} = 47.20 \pm 2.03(1.5) = 47.20 \pm 3.05\]

Interval: ($44.15, $50.25)

Interpretation: We are 95% confident that the true mean purchase amount for all customers falls between $44.15 and $50.25.
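The Q4 arithmetic as a sketch. Python's standard library has no \(t\)-quantile function, so \(t^* = 2.03\) is typed in from a table, as on the slide. Unrounded, the margin is 3.045; the slide rounds it to 3.05, which gives ($44.15, $50.25).

```python
from math import sqrt

xbar, s, n = 47.20, 9.00, 36
t_star = 2.03            # t* for df = 35 at 95%, read from a t-table (not computed here)

se = s / sqrt(n)         # 9 / 6 = 1.5
margin = t_star * se     # about 3.045
lower, upper = xbar - margin, xbar + margin
print(f"({lower:.3f}, {upper:.3f})")
```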

Hypothesis Testing: The Framework

Steps for every test:

  1. State \(H_0\) and \(H_a\) in symbols and words
  2. Check conditions
  3. Calculate the test statistic
  4. Find the p-value (or compare to critical value)
  5. Make a decision: reject \(H_0\) if p-value \(< \alpha\)
  6. State conclusion in context

✅ Say this | ❌ Not this
Reject \(H_0\) | Accept \(H_a\)
Fail to reject \(H_0\) | Accept \(H_0\)

Tip

P-value: The probability of observing data as extreme as (or more extreme than) what we got, assuming \(H_0\) is true.
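Steps 3 to 5 for the simplest case, a one-sample \(z\)-test with \(\sigma\) known, sketched in Python. The function name and its `alternative` parameter are our own; the inputs reuse the Q3 population values (\(\mu_0 = 80\), \(\sigma = 20\), \(n = 64\)) with a hypothetical observed mean of 85. The two-sided p-value adds up the "as extreme or more extreme" area in both tails.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))     # step 3: test statistic
    lower_tail = NormalDist().cdf(z)         # P(Z <= z) assuming H0 is true
    upper_tail = 1 - lower_tail              # P(Z >= z) assuming H0 is true
    if alternative == "greater":
        p = upper_tail
    elif alternative == "less":
        p = lower_tail
    else:                                    # two-sided: extreme in either direction
        p = 2 * min(lower_tail, upper_tail)
    return z, p

z, p = one_sample_z_test(xbar=85, mu0=80, sigma=20, n=64)
alpha = 0.05
decision = "Reject H0" if p < alpha else "Fail to reject H0"   # step 5
print(f"z = {z:.2f}, p = {p:.4f} -> {decision}")
```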

Type I and Type II Errors

Decision | \(H_0\) is True | \(H_0\) is False
Reject \(H_0\) | Type I Error (\(\alpha\)) | Correct ✅ (Power)
Fail to Reject \(H_0\) | Correct ✅ | Type II Error (\(\beta\))

  • Power \(= 1 - \beta\) = probability of correctly rejecting a false \(H_0\)
  • Lowering \(\alpha\) → reduces Type I error but increases Type II error for a fixed sample size (trade-off)
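The \(\alpha\)/\(\beta\) trade-off can be computed directly for a one-sided \(z\)-test. This sketch assumes \(H_a: \mu > 80\), \(\sigma = 20\), \(n = 64\), and a hypothetical true mean of 84; none of these come from an exam problem. Holding \(n\) fixed, dropping \(\alpha\) from 0.05 to 0.01 lowers the power (roughly 0.48 to 0.23), so \(\beta\) rises.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_z_test(mu0, mu_true, sigma, n, alpha):
    """Power of a one-sided (H_a: mu > mu0) z-test when the true mean is mu_true."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)     # reject H0 when z > z_crit
    shift = (mu_true - mu0) / se                 # distance of the truth from H0, in SEs
    return 1 - NormalDist().cdf(z_crit - shift)  # P(reject H0 | mu = mu_true)

for alpha in (0.05, 0.01):
    power = power_upper_z_test(mu0=80, mu_true=84, sigma=20, n=64, alpha=alpha)
    print(f"alpha = {alpha}: power = {power:.3f}, beta = {1 - power:.3f}")
```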

Q5. A company tests whether a new drug reduces blood pressure. State the consequences of a Type I and a Type II error in this context.

Type I error: Concluding the drug works when it actually doesn’t — patients receive an ineffective treatment.
Type II error: Concluding the drug doesn’t work when it actually does — patients are denied an effective treatment.

Two-Sample Inference

Comparing two means (pooled t-test, equal variances):

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \quad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \quad df = n_1+n_2-2\]

Effect size (Cohen’s d): \(d = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p}\)

d value Interpretation
0.2 Small
0.5 Medium
0.8 Large

Warning

Statistical significance ≠ practical significance. A tiny difference can be statistically significant with large \(n\). Always report effect size.

Two-Sample Practice

Q6. Two training programs are compared on exam scores.
Group A: \(n_1 = 20\), \(\bar{x}_1 = 78\), \(s_1 = 10\)
Group B: \(n_2 = 20\), \(\bar{x}_2 = 72\), \(s_2 = 10\)

  a. Calculate the pooled standard deviation \(s_p\).
  b. Calculate Cohen’s \(d\) and interpret it.

a. Since \(n_1 = n_2\) and \(s_1 = s_2 = 10\): \(s_p = 10\)

b. \(d = \dfrac{78 - 72}{10} = \dfrac{6}{10} = 0.6\)

A Cohen’s \(d\) of 0.6 is between medium (0.5) and large (0.8) — a practically meaningful difference between the two programs.
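The pooled-\(t\) and Cohen's \(d\) formulas as a Python sketch, applied to the Q6 numbers. The function names are our own. It also computes the \(t\) statistic (about 1.90 with df = 38), which Q6 does not ask for, to show how \(s_p\) feeds into both quantities.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """s_p = sqrt(((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2))."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def pooled_t(n1, x1, s1, n2, x2, s2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    sp = pooled_sd(n1, s1, n2, s2)
    t = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def cohens_d(x1, x2, sp):
    """Effect size: difference in means measured in pooled-SD units."""
    return (x1 - x2) / sp

# Q6: Group A (n=20, mean 78, s=10) vs. Group B (n=20, mean 72, s=10)
sp = pooled_sd(20, 10, 20, 10)
t, df = pooled_t(20, 78, 10, 20, 72, 10)
d = cohens_d(78, 72, sp)
print(f"s_p = {sp}, t = {t:.2f} (df = {df}), d = {d}")
```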

Chi-Square Test of Independence

Setup:

  • \(H_0\): The two categorical variables are independent
  • \(H_a\): The two variables are associated/related

Test statistic: \(\chi^2 = \sum \dfrac{(O - E)^2}{E}\), where \(E = \dfrac{(\text{row total})(\text{col total})}{\text{grand total}}\)

Degrees of freedom: \(df = (r-1)(c-1)\)

Warning

A significant result tells you an association exists — it does NOT specify the direction or nature of the relationship.
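The expected-count and \(\chi^2\) formulas as a standard-library sketch. The 2×2 table of counts below is hypothetical, chosen so every expected count is 25 and \(\chi^2 = 4\) with df = 1. Turning \(\chi^2\) into a p-value still needs a chi-square table (or a library routine such as SciPy's `scipy.stats.chi2_contingency`, not used here).

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)

    # E = (row total)(col total) / grand total, cell by cell
    expected = [[r * c / grand for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi2, df, expected

# Hypothetical 2x2 table of counts (illustration only)
observed = [[20, 30],
            [30, 20]]
chi2, df, expected = chi_square_independence(observed)
print(chi2, df, expected)
```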

Q7. A 3×4 contingency table (exercise: Low/Med/High vs. health: Poor/Fair/Good/Excellent) gives \(\chi^2 = 18.5\), p-value \(= 0.005\). What are the degrees of freedom? What is the conclusion at \(\alpha = 0.01\)?

\(df = (3-1)(4-1) = 2 \times 3 = 6\). p-value \(= 0.005 < \alpha = 0.01\) → Reject \(H_0\). There is significant evidence at the 0.01 level that exercise frequency and health status are associated.

ANOVA

When to use: Comparing means across 3 or more groups (use a two-sample \(t\)-test for 2 groups).

\[F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(n-k)}\]

  • \(MSB\) = between-group variation; \(MSW\) = within-group variation; \(k\) = number of groups; \(n\) = total number of observations across all groups
  • \(H_0\): \(\mu_1 = \mu_2 = \cdots = \mu_k\); \(H_a\): At least one mean differs

ANOVA table structure:

Source | SS | df | MS | F
Between | SSB | \(k-1\) | SSB/\((k-1)\) | MSB/MSW
Within | SSW | \(n-k\) | SSW/\((n-k)\) |
Total | SST | \(n-1\) | |

Rejecting \(H_0\) only tells you at least one mean differs — not which ones. Post-hoc tests (e.g., Tukey’s HSD) are needed for pairwise comparisons.
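The ANOVA table computed row by row, as a sketch with three small hypothetical groups (not course data). For these groups SSB = 54, SSW = 6 (so SST = 60), df = 2 and 6, and \(F = 27/1 = 27\). Getting a p-value from \(F\) would still require an F-table.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = fmean(x for g in groups for x in g)

    # Between: distance of each group mean from the grand mean, weighted by group size
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within: spread of each observation around its own group mean
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return msb / msw, k - 1, n - k

# Hypothetical data: three small groups (illustration only)
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)
```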

Correlation & Regression

Correlation: \(-1 \leq r \leq 1\); measures strength and direction of a linear relationship.

Regression equation: \(\hat{y} = b_0 + b_1 x\)

\[b_1 = r\frac{s_y}{s_x} \qquad b_0 = \bar{y} - b_1\bar{x}\]

Key interpretations:

  • Slope \(b_1\): For a 1-unit increase in \(x\), \(y\) is predicted to change by \(b_1\) units on average.
  • Intercept \(b_0\): Predicted value of \(y\) when \(x = 0\) (check if this makes sense in context).
  • \(r^2\): Proportion of variation in \(y\) explained by the linear relationship with \(x\).
  • Residual: \(e = y - \hat{y}\) (positive = above the line; negative = below).

Warning

Correlation ≠ causation. Avoid extrapolation (predicting outside the range of observed \(x\) values).

Regression: Practice

Q8. Advertising spending (in $1000s) and monthly sales (in $1000s): \(\hat{y} = 12.5 + 2.3x\), \(r = 0.78\), \(r^2 = 0.608\), \(n = 25\).

  a. Interpret the slope in context.
  b. Estimate sales when advertising spending is $10,000.
  c. Interpret \(r^2\).
  d. A month with $10,000 in advertising had actual sales of $40,000. What is the residual?

a. For every additional $1,000 spent on advertising, monthly sales are predicted to increase by $2,300 on average.
b. \(x = 10\): \(\hat{y} = 12.5 + 2.3(10) = 35.5\) → estimated sales of $35,500.
c. About 60.8% of the variation in monthly sales is explained by the linear relationship with advertising spending.
d. Actual \(y = 40\) (thousands); predicted \(\hat{y} = 35.5\). Residual \(= 40 - 35.5 = \mathbf{4.5}\) (i.e., $4,500 above predicted).
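A quick sketch of the Q8 arithmetic that makes the unit conversion explicit: both variables are in thousands of dollars, so $10,000 enters the equation as \(x = 10\) and results are multiplied by 1000 only at the end.

```python
b0, b1 = 12.5, 2.3          # y-hat = 12.5 + 2.3x, both variables in $1000s

def predict(x_thousands):
    """Predicted monthly sales (in $1000s) for advertising x (in $1000s)."""
    return b0 + b1 * x_thousands

x = 10                      # $10,000 of advertising -> x = 10
y_hat = predict(x)          # 35.5, i.e. $35,500
residual = 40 - y_hat       # actual $40,000 -> y = 40; positive = above the line
r_squared = 0.78 ** 2       # about 0.608

print(f"predicted ${y_hat * 1000:,.0f}, residual ${residual * 1000:,.0f}, r^2 = {r_squared:.3f}")
```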

Putting It All Together: Strategy Tips

Before you compute

  • Identify what type of test/interval is needed
  • Check conditions (random sample? \(n \geq 30\)? \(n\hat{p} \geq 10\)?)
  • Write hypotheses before looking at the data

When you write answers

  • State every conclusion in context
  • Use “fail to reject \(H_0\)”; never “accept \(H_0\)”
  • Report units on all numerical answers
  • For free response: show your formula, plug-in values, then compute

Tip

Common point-losers to avoid:

  1. Forgetting to check conditions
  2. Concluding “we accept \(H_0\)”
  3. Interpreting a CI as “the probability the true mean is in this interval”
  4. Forgetting that a significant chi-square or ANOVA result doesn’t specify which groups differ
  5. Extrapolating beyond the data range in regression

Good Luck! 🎉

Final reminders:

  • Bring: pen, photo ID, calculator
  • Arrive a few minutes early
  • Sketch normal distributions when reasoning about probabilities
  • Show all work — partial credit is available
  • Your student ID will be checked at the end of the exam
  • Leave smart glasses, smartwatches, and cell phones at home, or make sure they are turned off for the entire exam

Tip

You’ve got this. Take a breath, read each question carefully, and state every conclusion in context.