Spring 2026
Exam logistics
Structure
| Section | Items | Points |
|---|---|---|
| Multiple Choice | 20 Qs | 40 pts |
| Free Response | 4 Qs (multi-part) | 60 pts |
Emphasis: Material covered after the midterm; midterm topics (descriptive stats, basic probability, discrete distributions) may appear but are not the focus.
| Topic | Key Idea |
|---|---|
| Normal Distribution & Z-scores | Standardize, use empirical rule, find percentiles |
| Central Limit Theorem | Sampling distribution of \(\bar{X}\), standard error |
| Confidence Intervals | Means (z or t), proportions, sample size |
| Hypothesis Testing | H₀ vs Hₐ, p-value, Type I/II error |
| Two-Sample Inference | Pooled t-test, two-proportion z-test, Cohen’s d |
| Chi-Square Tests | Independence, expected counts, df |
| ANOVA | F-ratio, between vs. within variation |
| Correlation & Regression | r, r², slope/intercept interpretation, residuals |
Exam tip: For every conclusion, state it in context: what do your results mean for the actual problem?
Z-score: \(z = \dfrac{x - \mu}{\sigma}\) transforms any normal to standard normal \(N(0,1)\).
Empirical Rule:
| Range | % of data |
|---|---|
| \(\mu \pm \sigma\) | ~68% |
| \(\mu \pm 2\sigma\) | ~95% |
| \(\mu \pm 3\sigma\) | ~99.7% |
Finding a value from a percentile: \(x = \mu + z^* \cdot \sigma\)
Critical values (included in the exam):
| CI Level | \(z^*\) |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 98% | 2.33 |
| 99% | 2.576 |
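The critical values in the table can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is made up for illustration. It assumes a two-sided interval, so half of the leftover probability sits in each tail.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value z* for a confidence level such as 0.95."""
    upper_tail = (1 - confidence) / 2      # area left in ONE tail
    return NormalDist().inv_cdf(1 - upper_tail)


# Rebuild the table of critical values, rounded to three decimals.
z_table = {level: round(z_critical(level), 3) for level in (0.90, 0.95, 0.98, 0.99)}
print(z_table)
```

The 98% value comes out as 2.326 to three decimals, which the table above rounds to 2.33.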
Warning: On the exam you won’t have a z-table. Use the critical value list and sketch the distribution to reason about whether a probability is greater or less than 0.5.
Q1. Test scores are normally distributed with \(\mu = 70\), \(\sigma = 8\). What score corresponds to the 95th percentile?
Q2. Heights are normal with \(\mu = 65\) in, \(\sigma = 3\) in. Is the probability that a randomly selected woman is shorter than 62 inches greater than, less than, or equal to 0.5? Explain without calculating the exact value.
A1. 95th percentile → use \(z^* = 1.645\) (the value that leaves 5% in the upper tail, same as the 90% CI critical value).
\(x = 70 + 1.645(8) = 70 + 13.16 = \mathbf{83.16}\)
A2. \(z = (62 - 65)/3 = -1.0\). Since 62 is below the mean, the area to the left is less than 0.5. The probability is less than 0.5.
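For checking answers while studying (not something available on the exam), A1 and A2 can be verified numerically. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Q1: score at the 95th percentile of N(mu=70, sigma=8)
scores = NormalDist(mu=70, sigma=8)
x_95 = scores.inv_cdf(0.95)

# Q2: probability that a height from N(mu=65, sigma=3) is below 62 inches
heights = NormalDist(mu=65, sigma=3)
p_short = heights.cdf(62)

print(f"Q1: 95th percentile score = {x_95:.2f}")
print(f"Q2: P(height < 62) = {p_short:.4f}")
```

The exact probability for Q2 is about 0.16, consistent with the sketch-based reasoning that it must be less than 0.5.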
What it says: For a random sample of size \(n\) (with \(n\) sufficiently large), the sampling distribution of \(\bar{X}\) is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\): \[\bar{X} \sim N\!\left(\mu,\; \frac{\sigma}{\sqrt{n}}\right)\]
When it applies:
Standard Error: \(SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\)
Z-score for sample means: \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)
Tip: As \(n\) increases, the standard error decreases, so the sampling distribution becomes more concentrated around \(\mu\).
Key distinction: individual observations use \(\sigma\); sample means use \(\sigma/\sqrt{n}\).
Q3. A population has \(\mu = 80\), \(\sigma = 20\). For samples of size \(n = 64\):
a. \(SE = \sigma/\sqrt{n} = 20/\sqrt{64} = 20/8 = \mathbf{2.5}\)
b. The sample mean is less likely to exceed 85. For an individual: \(z = (85-80)/20 = 0.25\) — a small z, so fairly likely. For the sample mean: \(z = (85-80)/2.5 = 2.0\) — a much larger z, so much less likely. Sample means have less variability than individual observations.
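The comparison in part b can be made concrete by computing both tail probabilities. This is a sketch that assumes the population itself is normal (so the individual-observation probability is meaningful) and uses the standard-library `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 80, 20, 64
se = sigma / sqrt(n)                     # standard error of the sample mean

# Individual observation: spread is sigma
p_individual = 1 - NormalDist(mu, sigma).cdf(85)

# Sample mean of n = 64: spread is sigma / sqrt(n)
p_sample_mean = 1 - NormalDist(mu, se).cdf(85)

print(f"SE = {se}")
print(f"P(X > 85)     = {p_individual:.4f}")
print(f"P(X-bar > 85) = {p_sample_mean:.4f}")
```

The individual probability is roughly 0.40, while the sample-mean probability is roughly 0.02, matching the z-scores of 0.25 and 2.0.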
Three formulas to know:
| Parameter | CI Formula | Use when |
|---|---|---|
| \(\mu\), \(\sigma\) known | \(\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}\) | \(\sigma\) given, large \(n\) |
| \(\mu\), \(\sigma\) unknown | \(\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}\), df \(= n-1\) | \(s\) from data |
| \(p\) | \(\hat{p} \pm z^* \cdot \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\) | Check: \(n\hat{p} \geq 10\), \(n(1-\hat{p}) \geq 10\) |
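As an illustration of the proportion formula and its success/failure check, here is a small sketch. The function name `proportion_ci` and the sample counts (120 successes out of 200) are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt


def proportion_ci(successes: int, n: int, z_star: float = 1.96):
    """z-interval for a population proportion; z_star defaults to the 95% value."""
    p_hat = successes / n
    # Condition from the table: at least 10 expected successes and failures.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("need n*p_hat >= 10 and n*(1 - p_hat) >= 10")
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 120 successes in a sample of 200 (p_hat = 0.6)
low, high = proportion_ci(120, 200)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these made-up counts the interval runs from about 0.532 to about 0.668.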
Correct interpretation: “We are [level]% confident that the interval captures the true population parameter.”
Warning: Higher confidence → wider interval.
A CI and a two-sided hypothesis test are related: if the null value falls outside the CI, reject \(H_0\) at the corresponding \(\alpha\) (for example, a 95% CI corresponds to \(\alpha = 0.05\)).
Q4. A random sample of 36 store receipts has \(\bar{x} = \$47.20\) and \(s = \$9.00\). Construct a 95% CI for the population mean purchase amount. Interpret your interval.
Use the \(t\)-distribution since \(\sigma\) is unknown. df \(= 36 - 1 = 35\) → \(t^* \approx 2.03\) (slightly larger than \(z^* = 1.96\); \(t^*\) approaches \(z^*\) as df grows).
\[47.20 \pm 2.03 \cdot \frac{9.00}{\sqrt{36}} = 47.20 \pm 2.03(1.5) = 47.20 \pm 3.05\]
Interval: ($44.15, $50.25)
Interpretation: We are 95% confident that the true mean purchase amount for all customers falls between $44.15 and $50.25.
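The Q4 arithmetic can be checked with a few lines of Python. This sketch takes \(t^* = 2.03\) directly from the worked answer rather than computing it, since the standard library has no \(t\)-distribution.

```python
from math import sqrt

x_bar, s, n = 47.20, 9.00, 36
t_star = 2.03            # from the worked answer (df = 35); not computed here

se = s / sqrt(n)         # standard error of the mean: 9 / 6
margin = t_star * se     # margin of error
ci = (x_bar - margin, x_bar + margin)

print(f"SE = {se:.2f}, margin = {margin:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The unrounded margin is 3.045; the worked answer rounds it to 3.05 before forming the interval, which is why its endpoints are $44.15 and $50.25.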
Steps for every test:
| ✅ Say this | ❌ Not this |
|---|---|
| Reject \(H_0\) | Accept \(H_a\) |
| Fail to reject \(H_0\) | Accept \(H_0\) |
Tip: The p-value is the probability of observing data as extreme as (or more extreme than) what we got, assuming \(H_0\) is true.
| | \(H_0\) is True | \(H_0\) is False |
|---|---|---|
| Reject \(H_0\) | Type I Error (\(\alpha\)) | Correct ✅ (Power) |
| Fail to Reject \(H_0\) | Correct ✅ | Type II Error (\(\beta\)) |
Q5. A company tests whether a new drug reduces blood pressure. State the consequences of a Type I and a Type II error in this context.
Type I error: Concluding the drug works when it actually doesn’t — patients receive an ineffective treatment.
Type II error: Concluding the drug doesn’t work when it actually does — patients are denied an effective treatment.
Comparing two means (pooled t-test, equal variances):
\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \quad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \quad df = n_1+n_2-2\]
Effect size (Cohen’s d): \(d = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p}\)
| d value | Interpretation |
|---|---|
| 0.2 | Small |
| 0.5 | Medium |
| 0.8 | Large |
Warning: Statistical significance ≠ practical significance. A tiny difference can be statistically significant with large \(n\). Always report effect size.
Q6. Two training programs are compared on exam scores.
Group A: \(n_1 = 20\), \(\bar{x}_1 = 78\), \(s_1 = 10\)
Group B: \(n_2 = 20\), \(\bar{x}_2 = 72\), \(s_2 = 10\)
a. Since \(n_1 = n_2\) and \(s_1 = s_2 = 10\): \(s_p = 10\)
b. \(d = \dfrac{78 - 72}{10} = \dfrac{6}{10} = 0.6\)
A Cohen’s \(d\) of 0.6 is between medium (0.5) and large (0.8) — a practically meaningful difference between the two programs.
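The pooled standard deviation, Cohen's \(d\), and the pooled \(t\) statistic for Q6 can be computed directly from the formulas above. This is a sketch; the helper name `pooled_sd` is illustrative, and the \(t\) statistic is included only to show the formula in use.

```python
from math import sqrt


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled standard deviation for two groups with assumed equal variances."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


n1, mean1, s1 = 20, 78, 10   # Group A
n2, mean2, s2 = 20, 72, 10   # Group B

sp = pooled_sd(n1, s1, n2, s2)
d = (mean1 - mean2) / sp                                  # Cohen's d
t_stat = (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))   # pooled t statistic
df = n1 + n2 - 2

print(f"s_p = {sp:.2f}, d = {d:.2f}, t = {t_stat:.3f}, df = {df}")
```

Here \(t \approx 1.90\) with 38 degrees of freedom; deciding significance would still require a \(t\) critical value or p-value, which this sketch does not compute.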
Setup:
Test statistic: \(\chi^2 = \sum \dfrac{(O - E)^2}{E}\), where \(E = \dfrac{(\text{row total})(\text{col total})}{\text{grand total}}\)
Degrees of freedom: \(df = (r-1)(c-1)\)
Warning: A significant result tells you an association exists; it does NOT specify the direction or nature of the relationship.
Q7. A 3×4 contingency table (exercise: Low/Med/High vs. health: Poor/Fair/Good/Excellent) gives \(\chi^2 = 18.5\), p-value \(= 0.005\). What are the degrees of freedom? What is the conclusion at \(\alpha = 0.01\)?
\(df = (3-1)(4-1) = 2 \times 3 = 6\). p-value \(= 0.005 < \alpha = 0.01\) → Reject \(H_0\). There is significant evidence at the 0.01 level that exercise frequency and health status are associated.
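Q7 gives only the test statistic, not the counts, so the sketch below uses a small made-up 2×2 table purely to show how expected counts, \(\chi^2\), and df are computed. The function name `chi_square_independence` and the observed counts are hypothetical.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected count
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Made-up counts for illustration only (not from Q7)
made_up_table = [[30, 10],
                 [20, 40]]
chi2, df = chi_square_independence(made_up_table)
print(f"chi2 = {chi2:.3f}, df = {df}")

# Degrees of freedom for the 3x4 table in Q7
q7_df = (3 - 1) * (4 - 1)
print(f"Q7 df = {q7_df}")
```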
When to use: Comparing means across 3 or more groups (use a two-sample \(t\)-test for 2 groups).
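\[F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(n-k)}\]
Here \(k\) is the number of groups and \(n\) is the total number of observations across all groups.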
ANOVA table structure:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | SSB | \(k-1\) | SSB/\((k-1)\) | MSB/MSW |
| Within | SSW | \(n-k\) | SSW/\((n-k)\) | |
| Total | SST | \(n-1\) | | |
Rejecting \(H_0\) only tells you at least one mean differs — not which ones. Post-hoc tests (e.g., Tukey’s HSD) are needed for pairwise comparisons.
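To see how the ANOVA table fits together, the sketch below fills in MSB, MSW, and \(F\) from sums of squares. The function name `anova_f` and the input values (SSB = 120, SSW = 540, 3 groups, 30 observations) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def anova_f(ssb: float, ssw: float, k: int, n: int):
    """Return (MSB, MSW, F) given sums of squares, k groups, n total observations."""
    msb = ssb / (k - 1)    # between-group mean square, df = k - 1
    msw = ssw / (n - k)    # within-group mean square, df = n - k
    return msb, msw, msb / msw


# Hypothetical values for illustration only
msb, msw, f_ratio = anova_f(ssb=120.0, ssw=540.0, k=3, n=30)
print(f"MSB = {msb}, MSW = {msw}, F = {f_ratio}")
```

With these made-up numbers, MSB = 60, MSW = 20, and \(F = 3\); a large \(F\) means the between-group variation is large relative to the within-group variation.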
Correlation: \(-1 \leq r \leq 1\); measures strength and direction of a linear relationship.
Regression equation: \(\hat{y} = b_0 + b_1 x\)
\[b_1 = r\frac{s_y}{s_x} \qquad b_0 = \bar{y} - b_1\bar{x}\]
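The slope and intercept formulas can be tried on a tiny dataset. The five \((x, y)\) pairs below are made up for illustration, and the sketch computes \(r\) by hand from the sample standard deviations so that every step matches the formulas above.

```python
from statistics import mean, stdev

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)          # sample standard deviations

# Correlation: sum of cross-deviations divided by (n - 1) * s_x * s_y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r = sxy / ((len(xs) - 1) * s_x * s_y)

b1 = r * s_y / s_x                       # slope
b0 = y_bar - b1 * x_bar                  # intercept

print(f"r = {r:.4f}, b1 = {b1:.2f}, b0 = {b0:.2f}")
```

For this data the fitted line is approximately \(\hat{y} = 2.2 + 0.6x\).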
Key interpretations:
Warning: Correlation ≠ causation. Avoid extrapolation (predicting outside the range of observed \(x\) values).
Q8. Advertising spending (in $1000s) and monthly sales (in $1000s): \(\hat{y} = 12.5 + 2.3x\), \(r = 0.78\), \(r^2 = 0.608\), \(n = 25\).
a. For every additional $1,000 spent on advertising, monthly sales are predicted to increase by $2,300 on average.
b. \(x = 10\): \(\hat{y} = 12.5 + 2.3(10) = 35.5\) → estimated sales of $35,500.
c. About 60.8% of the variation in monthly sales is explained by the linear relationship with advertising spending.
d. Actual \(y = 40\) (thousands); predicted \(\hat{y} = 35.5\). Residual \(= 40 - 35.5 = \mathbf{4.5}\) (i.e., $4,500 above predicted).
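The Q8 prediction, residual, and \(r^2\) can be checked with a short sketch. The function name `predict` is illustrative; all numbers come from the problem statement, and values are in thousands of dollars.

```python
b0, b1, r = 12.5, 2.3, 0.78   # intercept, slope, correlation from Q8


def predict(x: float) -> float:
    """Predicted monthly sales ($1000s) for advertising spend x ($1000s)."""
    return b0 + b1 * x


y_hat = predict(10)            # part b: prediction at x = 10
residual = 40 - y_hat          # part d: actual minus predicted
r_squared = r ** 2             # part c: proportion of variation explained

print(f"y_hat = {y_hat:.1f}, residual = {residual:.1f}, r^2 = {r_squared:.3f}")
```

A positive residual means the observed sales were above what the line predicted.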
Before you compute
When you write answers
Tip
Common point-losers to avoid:
Final reminders:
Tip
You’ve got this. Take a breath, read each question carefully, and state every conclusion in context.