STAT 7 — Winter 2026
Today’s structure (90 min):
Statistics is the science of learning from data under uncertainty.
The entire quarter follows one arc:
Collect data carefully → Describe what you see → Draw conclusions about the bigger picture → Quantify your uncertainty
Every inferential method we learned asks one of two questions: how large is the effect (estimate it with a confidence interval), or is there an effect at all (test it with a hypothesis test)? Here is the map of everything we covered:
FOUNDATIONS (Weeks 1–3)
├── Data types: numerical (continuous/discrete) vs. categorical (ordinal/nominal)
├── Study design: observational vs. experimental, randomization, confounding
├── Descriptive stats: mean, median, SD, IQR, shape, outliers
└── Visualizations: histograms, boxplots, scatterplots, bar charts
PROBABILITY (Weeks 3–4)
├── Basic rules: addition, multiplication, independence
├── Conditional probability, Bayes' Theorem
├── Diagnostic testing: sensitivity, specificity, PPV, NPV
└── Random variables: discrete, binomial, continuous, normal
INFERENCE — THE CORE (Weeks 5–9)
├── Week 5: Sampling distributions, CLT, intro to inference
├── Week 6: Confidence intervals & hypothesis tests for means
├── Week 7: t-tests (paired & independent), power analysis
├── Week 8: Correlation, regression, ANOVA
└── Week 9: Inference for proportions, chi-square tests
One question guides all of inference: What type of data do I have?
| Situation | Method |
|---|---|
| One mean (large n) | z-test / z-interval |
| One mean (small n or unknown σ) | One-sample t-test |
| Compare two means (paired) | Paired t-test |
| Compare two means (independent) | Two-sample t-test |
| Compare 3+ means | ANOVA (F-test) |
| One proportion | z-test for p / z-interval |
| Compare two proportions | Two-proportion z-test |
| Association: two categorical vars | Chi-square test |
| Association: two numerical vars | Correlation / Regression |
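In R (the software whose output appears later in this review), each row of the table corresponds roughly to one function call. The data frame and column names below (`dat`, `y`, `x`, `group`, `group2`, `before`, `after`) are placeholders, not course data; this is just a sketch of which function does which job.

```r
# Hypothetical data frame `dat` with columns y, x, before, after (numerical)
# and group, group2 (categorical); all names are placeholders
t.test(dat$y, mu = 10)                         # one mean (t-test against mu0 = 10)
t.test(dat$before, dat$after, paired = TRUE)   # paired means
t.test(y ~ group, data = dat)                  # two independent means (Welch by default)
summary(aov(y ~ group, data = dat))            # 3+ means: one-way ANOVA F-test
prop.test(x = 42, n = 100, p = 0.5)            # one proportion vs. p0 = 0.5
prop.test(x = c(42, 57), n = c(100, 110))      # two proportions
chisq.test(table(dat$group, dat$group2))       # association of two categorical variables
summary(lm(y ~ x, data = dat))                 # association of two numerical variables
```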
Mean: \(\bar{x} = \dfrac{1}{n}\sum x_i\)
Sample variance: \(s^2 = \dfrac{\sum(x_i - \bar{x})^2}{n-1}\)
Sample standard deviation: \(s = \sqrt{s^2}\)
IQR: \(IQR = Q_3 - Q_1\)
Outlier fences: Lower = \(Q_1 - 1.5 \times IQR\); Upper = \(Q_3 + 1.5 \times IQR\)
Standardized score (z-score): \(z = \dfrac{x - \mu}{\sigma}\)
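A quick R translation of these descriptive formulas, using made-up numbers purely for illustration:

```r
# Illustrative values only (not course data)
x <- c(4.2, 5.1, 5.6, 6.0, 6.3, 7.1, 9.8)

mean(x); median(x); sd(x); IQR(x)                      # center and spread
q <- quantile(x, c(0.25, 0.75))                        # Q1 and Q3
c(lower = q[[1]] - 1.5 * IQR(x),
  upper = q[[2]] + 1.5 * IQR(x))                       # outlier fences
(x - mean(x)) / sd(x)                                  # z-scores relative to the sample
```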
Basic probability: \(P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}\)
Addition rule (general): \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
Addition rule (mutually exclusive): \(P(A \cup B) = P(A) + P(B)\)
Multiplication rule (general): \(P(A \cap B) = P(A) \times P(B|A)\)
Multiplication rule (independent): \(P(A \cap B) = P(A) \times P(B)\)
Conditional probability: \(P(A|B) = \dfrac{P(A \cap B)}{P(B)}\)
Bayes’ Theorem: \(P(A|B) = \dfrac{P(B|A) \cdot P(A)}{P(B)}\)
| Measure | Formula | Meaning |
|---|---|---|
| Sensitivity | TP / (TP + FN) | P(+ test \| disease) |
| Specificity | TN / (TN + FP) | P(− test \| no disease) |
| PPV | TP / (TP + FP) | P(disease \| + test) |
| NPV | TN / (TN + FN) | P(no disease \| − test) |
PPV and NPV depend on prevalence (base rate). Sensitivity and specificity do not.
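To see the prevalence point concretely, here is a small R sketch that computes PPV from Bayes' Theorem for a hypothetical test with sensitivity 0.95 and specificity 0.90 (numbers chosen only for illustration):

```r
# Hypothetical test characteristics (illustrative only)
sens <- 0.95; spec <- 0.90

ppv <- function(prev) {
  # Bayes' Theorem: P(disease | + test)
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}
ppv(0.01)   # rare disease: PPV is low (about 0.09) despite a "good" test
ppv(0.20)   # common disease: PPV is much higher (about 0.70)
```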
Expected value (discrete): \(E(X) = \mu_X = \sum x \cdot P(X=x)\)
Variance (discrete): \(\text{Var}(X) = \sum (x - \mu_X)^2 \cdot P(X=x)\)
Binomial: \(X \sim Bin(n, p)\)
Normal: \(X \sim N(\mu, \sigma)\)
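In R, binomial and normal probabilities come from `dbinom`/`pbinom` and `pnorm`; the parameter values here are illustrative only:

```r
# Binomial: P(X = 3) and P(X <= 3) for X ~ Bin(n = 10, p = 0.2)
dbinom(3, size = 10, prob = 0.2)
pbinom(3, size = 10, prob = 0.2)

# Normal: P(X < 100) for X ~ N(mu = 90, sigma = 8), two equivalent ways
pnorm(100, mean = 90, sd = 8)
pnorm((100 - 90) / 8)          # standardize first, then use the standard normal
```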
Central Limit Theorem: For large n, \(\bar{x} \sim N\!\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)\)
Standard Error of the mean: \(SE(\bar{x}) = \dfrac{s}{\sqrt{n}}\)
Standard Error of a proportion: \(SE(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
Standard Error of difference of means: \(SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\)
Standard Error of difference of proportions (CI): \(SE = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)
Standard Error of difference of proportions (test, pooled): \(SE = \sqrt{\hat{p}_{pool}(1-\hat{p}_{pool})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}\)
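If you want to convince yourself of the CLT, a short simulation works well. This sketch draws repeated samples from a skewed exponential population (an arbitrary choice for illustration) and checks that the sample means behave as the theorem promises:

```r
# Simulation sketch: Exp(1) population has mu = 1 and sigma = 1, but is strongly skewed
set.seed(7)
n <- 50
xbars <- replicate(10000, mean(rexp(n, rate = 1)))

mean(xbars)   # close to mu = 1
sd(xbars)     # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.14
hist(xbars)   # roughly bell-shaped even though the population is skewed
```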
General form: \(\text{statistic} \pm (\text{critical value}) \times SE\)
| Parameter | Interval |
|---|---|
| Mean (z, σ known) | \(\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}\) |
| Mean (t, σ unknown) | \(\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}\) |
| Paired difference | \(\bar{d} \pm t^* \cdot \dfrac{s_d}{\sqrt{n}}\) |
| Two means | \((\bar{x}_1 - \bar{x}_2) \pm t^* \cdot \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\) |
| One proportion | \(\hat{p} \pm z^* \cdot \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\) |
| Two proportions | \((\hat{p}_1-\hat{p}_2) \pm z^* \cdot \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\) |
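Any of these intervals can be built "by hand" in R from summary statistics. Here is the t-interval row with made-up numbers (not course data):

```r
# 95% t-interval for a mean from summary statistics (illustrative numbers)
xbar <- 24.3; s <- 5.1; n <- 36
t_star <- qt(0.975, df = n - 1)          # critical value for 95% confidence
xbar + c(-1, 1) * t_star * s / sqrt(n)   # statistic +/- critical value * SE
```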
Test statistic (general): \(\text{test stat} = \dfrac{\text{statistic} - \text{null value}}{SE_{H_0}}\)
| Test | Statistic | SE under H₀ |
|---|---|---|
| One mean (z) | \(z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\) | \(\sigma/\sqrt{n}\) |
| One mean (t) | \(t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\) | \(s/\sqrt{n}\) |
| Paired t | \(t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}\) | \(s_d/\sqrt{n}\) |
| Two-sample t | \(t = \dfrac{(\bar{x}_1-\bar{x}_2)}{SE}\) | \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) |
| One proportion | \(z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}\) | \(\sqrt{p_0(1-p_0)/n}\) |
| Two proportions | \(z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_{pool}}\) | See pooled SE formula |
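The same pattern works for test statistics. A one-sample t-test computed by hand, again with made-up numbers:

```r
# One-sample t-test "by hand" (illustrative numbers, not course data)
xbar <- 24.3; s <- 5.1; n <- 36; mu0 <- 22
se <- s / sqrt(n)
t_stat <- (xbar - mu0) / se                   # (statistic - null value) / SE
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value
c(t = t_stat, p = p_value)
```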
ANOVA F-statistic: \(F = \dfrac{MSG}{MSE} = \dfrac{\text{variation between groups}}{\text{variation within groups}}\)
Regression:
- Slope: \(b_1 = r \cdot \dfrac{s_y}{s_x}\)
- Intercept: \(b_0 = \bar{y} - b_1 \bar{x}\)
- \(R^2\): proportion of the variability in y explained by x
- Residual: \(e_i = y_i - \hat{y}_i\)
Chi-Square:
- Test statistic: \(\chi^2 = \sum \dfrac{(O-E)^2}{E}\)
- Expected count: \(E_{ij} = \dfrac{\text{row}_i \times \text{col}_j}{n}\)
- Degrees of freedom: \(df = (r-1)(c-1)\)
Type I error (α): P(reject H₀ | H₀ is true)
Type II error (β): P(fail to reject H₀ | H₀ is false)
Power: \(1 - \beta\) = P(reject H₀ | H₀ is false)
Power increases when: larger n, larger effect size, larger α, smaller σ
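Base R's `power.t.test()` makes these four levers easy to see. The settings below are arbitrary illustrations (a two-sample t-test with 20 per group as the baseline):

```r
# Each line changes one lever relative to the baseline (illustrative values)
power.t.test(n = 20, delta = 5, sd = 10, sig.level = 0.05)$power   # baseline
power.t.test(n = 40, delta = 5, sd = 10, sig.level = 0.05)$power   # larger n -> more power
power.t.test(n = 20, delta = 8, sd = 10, sig.level = 0.05)$power   # larger effect -> more power
power.t.test(n = 20, delta = 5, sd = 10, sig.level = 0.10)$power   # larger alpha -> more power
power.t.test(n = 20, delta = 5, sd = 6,  sig.level = 0.05)$power   # smaller sigma -> more power
```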
Common critical values (memorize these!):
| Confidence | z* |
|---|---|
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
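These z* values are just normal quantiles, so you can always recover them (outside the exam) with `qnorm()`:

```r
qnorm(c(0.95, 0.975, 0.995))   # 1.645, 1.960, 2.576 -- the z* values above
qt(0.975, df = 24)             # a t* critical value for 95% confidence when n = 25
```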
For t and F: use R output on the exam — critical values will be given, or p-values will be provided directly.
After the break: Practice problems with real data. Bring your questions!
Here is output from an independent samples t-test comparing resting heart rates of athletes vs. non-athletes:
Welch Two Sample t-test
data: heart_rate by group
t = -4.832, df = 87.3, p-value = 5.8e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.4 -5.2
sample estimates:
mean in group athlete mean in group non-athlete
62.4 71.2
Q1: What is the difference in sample means?
Q2: Interpret the 95% CI.
Q3: State the conclusion (use α = 0.05).
Q4: Is this a paired or independent test? How do you know?
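For reference, output like this comes from a call of the following form; `hr_data` is a placeholder name for a data frame with one row per person and columns `heart_rate` and `group`:

```r
# Welch two-sample t-test (R's default when var.equal is not set)
t.test(heart_rate ~ group, data = hr_data)

# A paired analysis would instead need two measurements per subject, e.g.
# t.test(before, after, paired = TRUE)
```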
A study compares tree growth (cm/year) across three forest types: temperate, tropical, and boreal.
Df Sum Sq Mean Sq F value Pr(>F)
forest_type 2 8.84 4.42 18.74 3.2e-08 ***
Residuals 297 69.99 0.24
Q1: How many groups? How many total observations?
Q2: What are the null and alternative hypotheses?
Q3: Calculate the F-statistic from the output. Does it match?
Q4: Write a complete conclusion.
Q5: If the ANOVA is significant, what do we do next?
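Output in this form comes from a one-way ANOVA fit like the sketch below; `growth` and `forests` are placeholder names (the output only tells us the factor is `forest_type`):

```r
fit <- aov(growth ~ forest_type, data = forests)
summary(fit)      # Df, Sum Sq, Mean Sq, F value, Pr(>F) table
TukeyHSD(fit)     # pairwise follow-up comparisons if the overall F-test is significant
```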
Predicting penguin body mass (g) from flipper length (mm):
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5780.83 305.81 -18.90 <2e-16 ***
flipper_length_mm 49.69 1.52 32.72 <2e-16 ***
Residual standard error: 394.3 on 331 degrees of freedom
Multiple R-squared: 0.7592
Q1: Write the regression equation.
Q2: Interpret the slope in biological terms.
Q3: A penguin has a flipper length of 200 mm. Predict its body mass.
Q4: What does R² = 0.7592 mean?
Q5: Is flipper length a significant predictor? How do you know?
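A sketch of the model fit behind this output. Whether the data are the `penguins` data frame from the palmerpenguins package is an assumption, though the column names match that package:

```r
library(palmerpenguins)   # assumption: data come from this package
fit <- lm(body_mass_g ~ flipper_length_mm, data = penguins)
summary(fit)              # coefficients, R-squared, t-test on the slope

# Predicted body mass for a 200 mm flipper, two equivalent ways
predict(fit, newdata = data.frame(flipper_length_mm = 200))
-5780.83 + 49.69 * 200    # plug 200 into the printed regression equation
```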
A study examines whether diet type (omnivore/vegetarian/vegan) is associated with vitamin D deficiency (yes/no).
Pearson's Chi-squared test
data: diet_vitD
X-squared = 8.43, df = 2, p-value = 0.0148
Expected counts:
Deficient Not.Deficient
omnivore 34.2 165.8
vegetarian 22.8 110.2
vegan 13.0 63.0
Q1: What are the dimensions of this table?
Q2: Is this independence or homogeneity? Why?
Q3: Is the expected count condition satisfied?
Q4: Write a complete conclusion.
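The expected counts above imply the table's margins, so you can reproduce them directly from the \(E_{ij} = \text{row}_i \times \text{col}_j / n\) formula (with the observed table in hand, `chisq.test(obs)$expected` returns the same matrix):

```r
# Margins deduced from the printed expected counts
row_totals <- c(omnivore = 200, vegetarian = 133, vegan = 76)
col_totals <- c(Deficient = 70, Not.Deficient = 339)
n <- sum(row_totals)                  # 409 subjects in total

outer(row_totals, col_totals) / n     # E_ij = row_i * col_j / n, matching the output
```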
Testing whether a new drug reduces the proportion of patients experiencing side effects below the industry standard of 15%:
1-sample proportions test with continuity correction
data: 24 out of 200, null probability 0.15
X-squared = 5.72, df = 1, p-value = 0.0084
alternative hypothesis: less
95 percent confidence interval:
0.000000 0.161884
sample estimates:
p
0.12
Q1: What are H₀ and Hₐ?
Q2: What is the sample proportion?
Q3: Write the conclusion.
Q4: The CI includes values up to 0.162. Is this contradicting the significant result?
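Output of this form comes from `prop.test()` with the counts passed directly; the second half of the sketch previews the SE distinction in common mistake 1 below:

```r
prop.test(24, 200, p = 0.15, alternative = "less")

# Note which SE goes where:
sqrt(0.15 * 0.85 / 200)   # SE under H0 (p0 = 0.15), used in the test statistic
sqrt(0.12 * 0.88 / 200)   # SE based on p-hat = 0.12, used for a confidence interval
```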
1. Using the wrong SE formula
   - A CI for a proportion uses \(\hat{p}\) in the SE; the test uses \(p_0\).
   - The two-proportion test uses the pooled \(\hat{p}\); the CI does not (see the sketch after this list).
2. Reversing H₀ and Hₐ
   - \(H_0\) always includes equality (=); \(H_a\) is the research claim.
3. Misinterpreting "fail to reject"
   - We do NOT "accept H₀"; we just don't have enough evidence to reject it.
4. Ignoring conditions
   - Always check: success/failure counts (proportions), independence, expected counts (chi-square).
5. Confusing statistical and practical significance
   - Large samples can make tiny effects statistically significant.
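Here is mistake 1's two-proportion version in code, with counts made up purely for illustration:

```r
# Pooled vs. unpooled SE for two proportions (illustrative counts only)
x1 <- 45; n1 <- 120; x2 <- 30; n2 <- 110
p1 <- x1 / n1; p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)

se_ci   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)     # use this SE for the CI
se_test <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # use this SE for the test statistic
c(se_ci = se_ci, se_test = se_test)
```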
Weeks 5–9 (emphasized on final):
☐ Explain CLT and sampling distributions
☐ Interpret CI (what does 95% mean?)
☐ Conduct and interpret z/t/paired/two-sample tests
☐ Distinguish paired vs. independent designs
☐ Define and interpret power
☐ Read regression output: slope, intercept, R², t-test on slope
☐ Interpret ANOVA output: F-statistic, df, p-value
☐ Compute expected counts; conduct chi-square test
☐ Distinguish independence vs. homogeneity
☐ Inference for single and two proportions — correct SE
☐ Check all conditions before interpreting results
☐ Distinguish statistical vs. practical significance
☐ Communicate results clearly in biological/health context
Study tips:
Questions? Now is the time!
Good luck on the final. You’ve covered an enormous amount of material this quarter — from data collection to chi-square tests. Be proud of how far you’ve come!