Full Quarter Review

STAT 7 — Winter 2026

Welcome to Review Day!

Today’s structure (90 min):

  1. The big picture: what is statistics? (5 min)
  2. Content map: all topics at a glance (10 min)
  3. Formula sheet: every formula, organized (25 min)
  4. Break (10 min)
  5. Practice problems with real data (35 min)
  6. Final exam logistics & Q&A (5 min)

The Big Picture

Statistics is the science of learning from data under uncertainty.

The entire quarter follows one arc:

Collect data carefully → Describe what you see → Draw conclusions about the bigger picture → Quantify your uncertainty

Every method we learned asks one of two questions:

  1. What does the data tell us? → Estimation (confidence intervals)
  2. Is there evidence for a claim? → Hypothesis testing

Content Map: The Whole Quarter

FOUNDATIONS (Weeks 1–3)
├── Data types: numerical (continuous/discrete) vs. categorical (ordinal/nominal)
├── Study design: observational vs. experimental, randomization, confounding
├── Descriptive stats: mean, median, SD, IQR, shape, outliers
└── Visualizations: histograms, boxplots, scatterplots, bar charts

PROBABILITY (Weeks 3–4)
├── Basic rules: addition, multiplication, independence
├── Conditional probability, Bayes' Theorem
├── Diagnostic testing: sensitivity, specificity, PPV, NPV
└── Random variables: discrete, binomial, continuous, normal

INFERENCE — THE CORE (Weeks 5–9)
├── Week 5: Sampling distributions, CLT, intro to inference
├── Week 6: Confidence intervals & hypothesis tests for means
├── Week 7: t-tests (paired & independent), power analysis
├── Week 8: Correlation, regression, ANOVA
└── Week 9: Inference for proportions, chi-square tests

The Inference Roadmap

One question guides all of inference: What type of data do I have?

  • One mean (large n): z-test / z-interval
  • One mean (small n or unknown σ): One-sample t-test
  • Compare two means (paired): Paired t-test
  • Compare two means (independent): Two-sample t-test
  • Compare 3+ means: ANOVA (F-test)
  • One proportion: z-test for p / z-interval
  • Compare two proportions: Two-proportion z-test
  • Association between two categorical variables: Chi-square test
  • Association between two numerical variables: Correlation / Regression

FORMULA SHEET: Part 1 — Descriptive Statistics

Mean: \(\bar{x} = \dfrac{1}{n}\sum x_i\)

Sample variance: \(s^2 = \dfrac{\sum(x_i - \bar{x})^2}{n-1}\)

Sample standard deviation: \(s = \sqrt{s^2}\)

IQR: \(IQR = Q_3 - Q_1\)

Outlier fences: Lower = \(Q_1 - 1.5 \times IQR\); Upper = \(Q_3 + 1.5 \times IQR\)

Standardized score (z-score): \(z = \dfrac{x - \mu}{\sigma}\)
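
These formulas are easy to sanity-check by hand. Here is a quick sketch in Python (standard library only; the heart-rate numbers are invented for illustration):

```python
from statistics import mean, stdev, quantiles

# Hypothetical resting heart rates (n = 8); numbers invented for illustration
x = [61, 64, 66, 68, 70, 73, 75, 98]

xbar = mean(x)                 # sample mean
s = stdev(x)                   # sample SD (divides by n - 1)
q1, q2, q3 = quantiles(x, n=4, method="inclusive")   # quartiles
iqr = q3 - q1                  # IQR = Q3 - Q1
lower = q1 - 1.5 * iqr         # outlier fences
upper = q3 + 1.5 * iqr
outliers = [v for v in x if v < lower or v > upper]
z_98 = (98 - xbar) / s         # z-score, using sample estimates of mu, sigma
```

With this sample, \(\bar{x} = 71.875\), IQR = 8, and 98 falls above the upper fence, so it is flagged as an outlier (its z-score is about 2.3).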

FORMULA SHEET: Part 2 — Probability

Basic probability: \(P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}\)

Addition rule (general): \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)

Addition rule (mutually exclusive): \(P(A \cup B) = P(A) + P(B)\)

Multiplication rule (general): \(P(A \cap B) = P(A) \times P(B|A)\)

Multiplication rule (independent): \(P(A \cap B) = P(A) \times P(B)\)

Conditional probability: \(P(A|B) = \dfrac{P(A \cap B)}{P(B)}\)

Bayes’ Theorem: \(P(A|B) = \dfrac{P(B|A) \cdot P(A)}{P(B)}\)
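
Bayes' Theorem is easiest to trust once you have pushed numbers through it. A small Python sketch with invented probabilities (a rare condition and a fairly accurate test):

```python
# All probabilities invented for illustration:
p_A = 0.01                 # P(A): prior probability of the condition
p_B_given_A = 0.95         # P(B|A): positive test given the condition
p_B_given_notA = 0.05      # P(B|not A): false positive rate

# Law of total probability gives the denominator P(B)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B       # Bayes' Theorem
```

Even with a highly sensitive test, the posterior probability is only about 16% because the condition is rare; this is exactly the base-rate effect in the next section.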

FORMULA SHEET: Part 3 — Diagnostic Testing

  • Sensitivity = TP / (TP + FN) = P(+ test | disease)
  • Specificity = TN / (TN + FP) = P(− test | no disease)
  • PPV = TP / (TP + FP) = P(disease | + test)
  • NPV = TN / (TN + FN) = P(no disease | − test)

PPV and NPV depend on prevalence (base rate). Sensitivity and specificity do not.
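
To see that prevalence dependence concretely, here is a hypothetical screening example in Python (all counts invented):

```python
# Invented 2x2 screening counts: 1000 people, 10% prevalence
TP, FN, FP, TN = 90, 10, 45, 855

sensitivity = TP / (TP + FN)    # 0.90 (a property of the test itself)
specificity = TN / (TN + FP)    # 0.95 (also a property of the test)
ppv = TP / (TP + FP)            # depends on prevalence
npv = TN / (TN + FN)

# Same test applied where prevalence is only 1%: PPV collapses,
# while sensitivity and specificity are unchanged
prev = 0.01
ppv_low = (sensitivity * prev) / (
    sensitivity * prev + (1 - specificity) * (1 - prev))
```

At 10% prevalence the PPV is about 0.67; drop prevalence to 1% and the same test's PPV falls to about 0.15.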

FORMULA SHEET: Part 4 — Random Variables

Expected value (discrete): \(E(X) = \mu_X = \sum x \cdot P(X=x)\)

Variance (discrete): \(\text{Var}(X) = \sum (x - \mu_X)^2 \cdot P(X=x)\)

Binomial: \(X \sim Bin(n, p)\)

  • \(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\)
  • \(\mu_X = np\); \(\sigma_X = \sqrt{np(1-p)}\)

Normal: \(X \sim N(\mu, \sigma)\)

  • Empirical rule: 68% within ±1σ; 95% within ±2σ; 99.7% within ±3σ
  • Normal approximation to binomial valid when \(np \geq 10\) and \(n(1-p) \geq 10\)
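
A short Python sketch can check the binomial formulas and the normal-approximation rule of thumb (parameters invented):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.3                       # hypothetical X ~ Bin(50, 0.3)
mu = n * p                           # np = 15
sigma = sqrt(n * p * (1 - p))        # about 3.24

pmf_15 = comb(n, 15) * p**15 * (1 - p)**35        # exact P(X = 15)

# np = 15 and n(1-p) = 35 are both >= 10, so the approximation applies
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(21))
approx = NormalDist(mu, sigma).cdf(20.5)   # P(X <= 20), continuity-corrected
```

The exact binomial probability \(P(X \le 20)\) and its normal approximation agree to about two decimal places here.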

FORMULA SHEET: Part 5 — Sampling Distributions & CLT

Central Limit Theorem: For large n, \(\bar{x}\) is approximately \(N\!\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)\), regardless of the shape of the population distribution.

Standard Error of the mean: \(SE(\bar{x}) = \dfrac{s}{\sqrt{n}}\)

Standard Error of a proportion: \(SE(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

Standard Error of difference of means: \(SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\)

Standard Error of difference of proportions (CI): \(SE = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)

Standard Error of difference of proportions (test, pooled): \(SE = \sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}\)
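
The CLT claim is easy to verify by simulation. A minimal Python sketch, drawing samples of size 40 from a deliberately skewed (exponential) population:

```python
import random
from statistics import mean, stdev

random.seed(7)
mu_pop, n, reps = 10.0, 40, 5000    # Exp population: mean 10, SD 10 (skewed)

# 5000 sample means, each from a sample of size n = 40
xbars = [mean(random.expovariate(1 / mu_pop) for _ in range(n))
         for _ in range(reps)]

m, s = mean(xbars), stdev(xbars)    # center and spread of the sampling dist
```

Both summary values land close to the CLT predictions of 10 and \(10/\sqrt{40} \approx 1.58\), even though the population itself is far from normal.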

FORMULA SHEET: Part 6 — Confidence Intervals

General form: \(\text{statistic} \pm (\text{critical value}) \times SE\)

  • Mean (z, σ known): \(\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}\)
  • Mean (t, σ unknown): \(\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}\)
  • Paired difference: \(\bar{d} \pm t^* \cdot \dfrac{s_d}{\sqrt{n}}\)
  • Two means: \((\bar{x}_1 - \bar{x}_2) \pm t^* \cdot \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\)
  • One proportion: \(\hat{p} \pm z^* \cdot \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
  • Two proportions: \((\hat{p}_1-\hat{p}_2) \pm z^* \cdot \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)
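
As a worked example of the one-proportion interval, a Python sketch with invented data (42 successes in 150 trials, so the success/failure condition holds):

```python
from math import sqrt
from statistics import NormalDist

x, n = 42, 150                        # invented: 42 successes, 108 failures
p_hat = x / n                         # 0.28
z_star = NormalDist().inv_cdf(0.975)  # 1.960 for 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)    # SE uses p-hat for a CI
ci = (p_hat - z_star * se, p_hat + z_star * se)
```

This gives roughly (0.208, 0.352): we are 95% confident the population proportion lies in that range.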

FORMULA SHEET: Part 7 — Hypothesis Tests

Test statistic (general): \(\text{test stat} = \dfrac{\text{statistic} - \text{null value}}{SE_{H_0}}\)

  • One mean (z): \(z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\), with \(SE_{H_0} = \sigma/\sqrt{n}\)
  • One mean (t): \(t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\), with \(SE_{H_0} = s/\sqrt{n}\)
  • Paired t: \(t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}\), with \(SE_{H_0} = s_d/\sqrt{n}\)
  • Two-sample t: \(t = \dfrac{\bar{x}_1-\bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}\)
  • One proportion: \(z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}\), with \(SE_{H_0} = \sqrt{p_0(1-p_0)/n}\)
  • Two proportions: \(z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_{pool}}\), with the pooled SE from Part 5
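
A worked one-proportion test in Python (data invented); note that the SE under H₀ uses \(p_0\), not \(\hat{p}\):

```python
from math import sqrt
from statistics import NormalDist

# Invented data: 118 successes in 200 trials; H0: p = 0.5 vs Ha: p > 0.5
x, n, p0 = 118, 200, 0.5
p_hat = x / n                          # 0.59
se0 = sqrt(p0 * (1 - p0) / n)          # SE under H0 uses p0, not p_hat
z = (p_hat - p0) / se0                 # test statistic
p_value = 1 - NormalDist().cdf(z)      # one-sided p-value for Ha: p > 0.5
```

Here \(z \approx 2.55\) and the one-sided p-value is about 0.005, so we would reject H₀ at α = 0.05.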

FORMULA SHEET: Part 8 — ANOVA, Regression, Chi-Square

ANOVA F-statistic: \(F = \dfrac{MSG}{MSE} = \dfrac{\text{variation between groups}}{\text{variation within groups}}\)

  • \(df_{\text{between}} = k - 1\) (k = number of groups)
  • \(df_{\text{within}} = n - k\)

Regression:

  • Slope: \(b_1 = r \cdot \dfrac{s_y}{s_x}\)
  • Intercept: \(b_0 = \bar{y} - b_1 \bar{x}\)
  • \(R^2\): proportion of variability in y explained by x
  • Residual: \(e_i = y_i - \hat{y}_i\)
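
A tiny numeric check of the slope and intercept formulas (all summary statistics invented):

```python
# Regression from summary statistics; every number here is invented
r, s_x, s_y = 0.6, 2.0, 5.0       # correlation and the two sample SDs
x_bar, y_bar = 10.0, 50.0         # sample means

b1 = r * s_y / s_x                # slope
b0 = y_bar - b1 * x_bar           # intercept
r_squared = r ** 2                # proportion of variation in y explained
y_hat = b0 + b1 * 12              # prediction at x = 12
```

Slope 1.5, intercept 35, \(R^2 = 0.36\), and the predicted value at x = 12 is 53.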

Chi-Square:

  • Statistic: \(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)
  • Expected counts: \(E_{ij} = \dfrac{\text{row}_i \times \text{col}_j}{n}\)
  • Degrees of freedom: \(df = (r-1)(c-1)\)
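
The chi-square mechanics in Python (standard library only; the 3×2 table is invented). For df = 2 the p-value happens to have the closed form \(e^{-\chi^2/2}\), so no table lookup is needed in this sketch:

```python
from math import exp

# Chi-square test of association on a hypothetical 3x2 table (counts invented)
observed = [[30, 70],
            [45, 55],
            [25, 75]]

row_tot = [sum(row) for row in observed]            # row totals
col_tot = [sum(col) for col in zip(*observed)]      # column totals
n = sum(row_tot)                                    # grand total

# Expected counts: E_ij = row_i * col_j / n
expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]

chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (3-1)(2-1) = 2
p_value = exp(-chi2 / 2)    # exact survival function for df = 2 only
```

Here \(\chi^2 \approx 9.75\) with df = 2 and p ≈ 0.008, and every expected count exceeds 5, so the expected-count condition is satisfied.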

FORMULA SHEET: Part 9 — Power & Critical Values

Type I error (α): P(reject H₀ | H₀ is true)

Type II error (β): P(fail to reject H₀ | H₀ is false)

Power: \(1 - \beta\) = P(reject H₀ | H₀ is false)

Power increases when: larger n, larger effect size, larger α, smaller σ
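
Those four levers become concrete in a one-sided z-test power calculation (all numbers invented, σ treated as known):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard normal
# Invented scenario: H0: mu = 120 vs Ha: mu > 120, true mu = 125,
# sigma = 15, n = 36, alpha = 0.05
mu0, mu_true, sigma, n, alpha = 120, 125, 15, 36, 0.05

se = sigma / sqrt(n)                        # 2.5
cutoff = mu0 + Z.inv_cdf(1 - alpha) * se    # reject H0 when xbar > cutoff
power = 1 - Z.cdf((cutoff - mu_true) / se)  # P(reject | true mu = 125)

# Doubling n shrinks the SE, which raises power (one of the four levers)
se2 = sigma / sqrt(2 * n)
cutoff2 = mu0 + Z.inv_cdf(1 - alpha) * se2
power2 = 1 - Z.cdf((cutoff2 - mu_true) / se2)
```

Power rises from about 0.64 to about 0.88 when n doubles, with everything else held fixed.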

Common critical values (memorize these!):

  • 90% confidence: \(z^* = 1.645\)
  • 95% confidence: \(z^* = 1.960\)
  • 99% confidence: \(z^* = 2.576\)

For t and F: use R output on the exam — critical values will be given, or p-values will be provided directly.
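
The z* table comes straight from the standard normal inverse CDF; Python's statistics.NormalDist reproduces it:

```python
from statistics import NormalDist

Z = NormalDist()                      # standard normal
for conf in (0.90, 0.95, 0.99):
    tail = (1 - conf) / 2             # area in each tail
    z_star = Z.inv_cdf(1 - tail)      # upper critical value
    print(f"{conf:.0%}: z* = {z_star:.3f}")
```

This prints 1.645, 1.960, and 2.576, matching the table above.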

☕ BREAK — 10 minutes

After the break: Practice problems with real data. Bring your questions!

Practice: Reading R Output

Here is output from an independent samples t-test comparing resting heart rates of athletes vs. non-athletes:

    Welch Two Sample t-test

data:  heart_rate by group
t = -4.832, df = 87.3, p-value = 5.8e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.4  -5.2
sample estimates:
mean in group athlete mean in group non-athlete 
               62.4                      71.2 

Q1: What is the difference in sample means?

Q2: Interpret the 95% CI.

Q3: State the conclusion (use α = 0.05).

Q4: Is this a paired or independent test? How do you know?

Practice: ANOVA Output

A study compares tree growth (cm/year) across three forest types: temperate, tropical, and boreal.

             Df Sum Sq Mean Sq F value  Pr(>F)    
forest_type   2   8.84    4.42   18.74 3.2e-08 ***
Residuals   297  69.99    0.24                    

Q1: How many groups? How many total observations?

Q2: What are the null and alternative hypotheses?

Q3: Calculate the F-statistic from the output. Does it match?

Q4: Write a complete conclusion.

Q5: If the ANOVA is significant, what do we do next?

Practice: Regression Output

Predicting penguin body mass (g) from flipper length (mm):

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -5780.83     305.81  -18.90   <2e-16 ***
flipper_length_mm    49.69       1.52   32.72   <2e-16 ***

Residual standard error: 394.3 on 331 degrees of freedom
Multiple R-squared:  0.7592

Q1: Write the regression equation.

Q2: Interpret the slope in biological terms.

Q3: A penguin has a flipper length of 200 mm. Predict its body mass.

Q4: What does R² = 0.7592 mean?

Q5: Is flipper length a significant predictor? How do you know?

Practice: Chi-Square Output

A study examines whether diet type (omnivore/vegetarian/vegan) is associated with vitamin D deficiency (yes/no).

    Pearson's Chi-squared test

data:  diet_vitD
X-squared = 8.43, df = 2, p-value = 0.0148

Expected counts:

             Deficient Not.Deficient
omnivore       34.2          165.8
vegetarian     22.8          110.2
vegan          13.0           63.0

Q1: What are the dimensions of this table?

Q2: Is this independence or homogeneity? Why?

Q3: Is the expected count condition satisfied?

Q4: Write a complete conclusion.

Practice: Proportion Test Output

Testing whether a new drug reduces the proportion of patients experiencing side effects below the industry standard of 15%:

    1-sample proportions test with continuity correction

data:  24 out of 200, null probability 0.15
X-squared = 1.1863, df = 1, p-value = 0.138
alternative hypothesis: less
95 percent confidence interval:
 0.000000 0.161884
sample estimates:
p 
0.12 

Q1: What are H₀ and Hₐ?

Q2: What is the sample proportion?

Q3: Write the conclusion.

Q4: The CI extends up to 0.162, which is above the null value of 0.15. Is this consistent with the test result? Explain.

Common Mistakes to Avoid on the Final

1. Using the wrong SE formula
  • CI uses \(\hat{p}\) in the SE; test uses \(p_0\)
  • Two-proportion test uses pooled \(\hat{p}\); CI does not

2. Reversing H₀ and Hₐ
  • \(H_0\) always includes equality (=); \(H_a\) is the research claim

3. Misinterpreting “fail to reject”
  • We do NOT “accept H₀”; we simply lack enough evidence to reject it

4. Ignoring conditions
  • Always check success/failure counts (proportions), independence, and expected counts (chi-square)

5. Confusing statistical and practical significance
  • Large samples can make tiny effects statistically significant

Learning Objectives Checklist

Weeks 5–9 (emphasized on final):

☐ Explain CLT and sampling distributions
☐ Interpret CI (what does 95% mean?)
☐ Conduct and interpret z/t/paired/two-sample tests
☐ Distinguish paired vs. independent designs
☐ Define and interpret power
☐ Read regression output: slope, intercept, R², t-test on slope
☐ Interpret ANOVA output: F-statistic, df, p-value
☐ Compute expected counts; conduct chi-square test
☐ Distinguish independence vs. homogeneity
☐ Inference for single and two proportions — correct SE
☐ Check all conditions before interpreting results
☐ Distinguish statistical vs. practical significance
☐ Communicate results clearly in biological/health context

Final Exam Logistics

  • No tables (normal, t, F, chi-square) — not needed, p-values are in the R output
  • No calculators needed for distribution lookups
  • Focus on: reading output, checking conditions, writing conclusions
  • Critical values for 90/95/99% CIs will be provided if needed
  • R output for each question will be clearly labeled

Study tips:

  • HW8 practice final covers everything — work through all 200 questions
  • Focus on why you choose a specific SE formula
  • Practice writing conclusions in one statistical sentence + one plain-language sentence

End of Week 10 — Review Complete

Questions? Now is the time!

Good luck on the final. You’ve covered an enormous amount of material this quarter — from data collection to chi-square tests. Be proud of how far you’ve come!