HW8: Practice Final Exam
STAT 7 — Winter 2026
🔬 Your Mission
Welcome, Statistical Detective!
You’ve made it to the end of the quarter. Before the final exam, we have assembled the ultimate challenge: a comprehensive set of problems drawn from real biological and health research.
Your mission: Work through these problems to prepare for the final. Everything emphasizes interpretation over calculation. In most questions, R output is provided — your job is to read it correctly, check conditions, and communicate conclusions.
Format: 150 Multiple Choice (1 point each) + 50 Short Answer (2 points each) = 250 points
Coverage: All learning objectives from Week 5 onward (post–normal distribution)
PART 1: MULTIPLE CHOICE (150 questions, 1 point each)
Section A: Sampling Distributions and the Central Limit Theorem
Question 1
A researcher takes repeated samples of size n = 50 from a population with mean μ = 120 and standard deviation σ = 15. Which of the following best describes the sampling distribution of the sample mean?
- Normal distribution with mean 120 and standard deviation 15
- Normal distribution with mean 120 and standard deviation 2.12
- Skewed distribution with mean 120 and standard deviation 15
- Normal distribution with mean 120 and standard deviation 0.30
Answer: b)
The sampling distribution of \(\bar{x}\) is approximately normal (by the CLT, n = 50 is large enough) with mean μ = 120 and standard error \(SE = \sigma/\sqrt{n} = 15/\sqrt{50} \approx 2.12\). The standard deviation of the sampling distribution is the standard error, not σ.
Question 2
What does the Central Limit Theorem state?
- All populations are approximately normally distributed
- Sample means from large samples are approximately normally distributed, regardless of the population shape
- The sample mean always equals the population mean
- Larger samples always produce more accurate estimates
Answer: b)
The CLT tells us about the shape of the sampling distribution of \(\bar{x}\), not the population itself. It applies regardless of population shape, as long as n is sufficiently large (generally n ≥ 30).
Question 3
A population of bacterial colony sizes is strongly right-skewed with mean μ = 45 and σ = 20. If a researcher takes random samples of n = 100, the sampling distribution of \(\bar{x}\) will be:
- Right-skewed with mean 45 and SD 20
- Approximately normal with mean 45 and SD 2
- Approximately normal with mean 45 and SD 20
- Right-skewed with mean 45 and SD 2
Answer: b)
With n = 100 (large), the CLT guarantees the sampling distribution of \(\bar{x}\) is approximately normal (not skewed), with mean μ = 45 and \(SE = 20/\sqrt{100} = 2\).
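A short simulation can make this concrete. The sketch below is only an illustration: it assumes a gamma distribution as a stand-in for the right-skewed population (shape and scale chosen so that μ = 45 and σ = 20), and the seed and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

MU, SIGMA, N = 45.0, 20.0, 100

# Assumption: a gamma population stands in for the skewed colony sizes.
# shape * scale = 45 and shape * scale**2 = 400 reproduce mu and sigma.
SHAPE = (MU / SIGMA) ** 2
SCALE = SIGMA ** 2 / MU

rng = random.Random(7)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))

sample_means = [sample_mean(N) for _ in range(2000)]

center = statistics.fmean(sample_means)  # should land near mu = 45
spread = statistics.stdev(sample_means)  # should land near sigma / sqrt(n) = 2
print(round(center, 2), round(spread, 2))
```

A histogram of `sample_means` would look roughly bell-shaped even though individual draws from the population are skewed.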
Question 4
The standard error of the mean is 3.2 when n = 25. What is the population standard deviation σ?
- 3.2
- 16
- 0.64
- 80
Answer: b)
\(SE = \sigma/\sqrt{n}\), so \(\sigma = SE \times \sqrt{n} = 3.2 \times \sqrt{25} = 3.2 \times 5 = 16\).
Question 5
Which change would reduce the standard error of the mean by half?
- Double the population mean
- Double the sample size
- Quadruple the sample size
- Reduce the population standard deviation by half
Answer: c)
\(SE = \sigma/\sqrt{n}\). To halve SE, we need \(\sqrt{n}\) to double, which requires n to quadruple. Doubling n only reduces SE by a factor of \(\sqrt{2} \approx 1.41\).
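The square-root relationship is easy to check numerically. In this minimal sketch the value σ = 15 and the starting sample size n = 25 are hypothetical; only the ratios matter.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15.0  # hypothetical population SD
base = standard_error(sigma, 25)
doubled = standard_error(sigma, 50)      # n doubled
quadrupled = standard_error(sigma, 100)  # n quadrupled

print(base / doubled)     # SE shrinks by a factor of sqrt(2)
print(base / quadrupled)  # SE shrinks by a factor of 2
```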
Question 6
A sampling distribution is:
- The distribution of scores in a single sample
- The distribution of a statistic computed across many samples of the same size
- The distribution of all possible population values
- A histogram of the population
Answer: b)
A sampling distribution shows how a statistic (like \(\bar{x}\)) varies across all possible samples of size n drawn from the same population.
Question 7
As sample size increases, the shape of the sampling distribution of \(\bar{x}\) becomes:
- More skewed
- More variable
- More normally distributed
- More like the population distribution
Answer: c)
This is the essence of the CLT: larger n → the sampling distribution becomes more normal, regardless of the population’s shape.
Question 8
A medical researcher samples n = 9 patients from a normally distributed population (σ = 12). The standard error of the mean is:
- 12
- 4
- 1.33
- 36
Answer: b)
\(SE = \sigma/\sqrt{n} = 12/\sqrt{9} = 12/3 = 4\).
Question 9
The Central Limit Theorem is particularly important when:
- The population is already normally distributed
- The sample size is small (n < 10)
- The population is skewed and sample sizes are large
- The sample mean equals the population mean
Answer: c)
When the population is already normal, the sampling distribution is automatically normal for any n. The CLT’s value is precisely when populations are non-normal — for large n, inference based on normality is still valid.
Question 10
If \(SE = \sigma/\sqrt{n}\), and a researcher wants to halve the SE, they should:
- Double σ
- Double n
- Quadruple n
- Halve σ
Answer: c)
To halve \(\sigma/\sqrt{n}\), you need \(\sqrt{n}\) to double, which means n must quadruple. See also Q5.
Section B: Confidence Intervals
Question 11
What is the correct interpretation of a 95% confidence interval?
- There is a 95% chance the population mean is in this interval
- 95% of the data falls within this interval
- If we repeated this procedure many times, about 95% of the resulting intervals would contain the true population mean
- The sample mean has a 95% probability of being correct
Answer: c)
This is a frequentist interpretation. The parameter is fixed (not random); it is the interval that varies from sample to sample. 95% of all such intervals would capture the true mean.
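The "repeated procedure" idea can be sketched with a simulation. The population values below (μ = 120, σ = 15, n = 50) are borrowed from Question 1 purely for illustration, σ is treated as known so a z-interval applies, and the seed and number of repetitions are arbitrary.

```python
import math
import random
import statistics

MU, SIGMA, N = 120.0, 15.0, 50  # hypothetical population and sample size
Z_STAR = 1.96                   # critical value for 95% confidence
REPS = 2000

rng = random.Random(11)
se = SIGMA / math.sqrt(N)

hits = 0
for _ in range(REPS):
    xbar = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    lower, upper = xbar - Z_STAR * se, xbar + Z_STAR * se
    if lower <= MU <= upper:  # did this interval capture the true mean?
        hits += 1

coverage = hits / REPS
print(coverage)
```

The true mean never moves; what changes from run to run is the interval, and the fraction of intervals that capture μ should be close to 0.95.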
Question 12
A 95% CI for mean blood pressure is (118, 134) mmHg. A researcher claims “the true mean blood pressure is probably around 126 mmHg.” Which statement is correct?
- The researcher is wrong; we can only say the mean is in the interval
- 126 is the sample mean, but we cannot make probability statements about the population mean
- 126 is the sample mean (midpoint of the CI) and it is our best point estimate
- Both b and c are correct
Answer: d)
The midpoint of the CI is \(\bar{x} = (118+134)/2 = 126\), which is our best point estimate. However, we cannot say there is a “95% chance” that the mean equals any particular value; the 95% describes the long-run success rate of the interval procedure, not the fixed parameter.
Question 13
A wider confidence interval indicates:
- Greater precision
- A larger sample size
- More uncertainty about the parameter
- A smaller standard deviation
Answer: c)
Width = \(2 \times z^* \times SE\). A wider CI results from a larger SE (smaller n or larger s) or higher confidence level — all of which reflect more uncertainty in our estimate.
Question 14
Which of the following would produce the NARROWEST 95% confidence interval for a population mean?
- n = 30, s = 10
- n = 100, s = 10
- n = 100, s = 5
- n = 30, s = 5
Answer: c)
Width depends on \(SE = s/\sqrt{n}\). Compute for each: (a) 10/√30 ≈ 1.83; (b) 10/√100 = 1.0; (c) 5/√100 = 0.5 ✅; (d) 5/√30 ≈ 0.91. Largest n and smallest s → narrowest interval.
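The comparison above can be reproduced in a few lines. This sketch compares standard errors only, which is enough to rank the widths because options with the same n share the same critical value and a larger n never increases it.

```python
import math

# (n, s) for each answer option
options = {"a": (30, 10), "b": (100, 10), "c": (100, 5), "d": (30, 5)}

se = {label: s / math.sqrt(n) for label, (n, s) in options.items()}
narrowest = min(se, key=se.get)

for label, value in se.items():
    print(label, round(value, 2))
print("narrowest:", narrowest)
```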
Question 15
The margin of error in a confidence interval is:
- The sample mean minus the population mean
- The critical value multiplied by the standard error
- The standard deviation divided by the sample size
- The width of the entire confidence interval
Answer: b)
Margin of error \(= z^* \times SE\) (or \(t^* \times SE\)). The CI is statistic ± margin of error, so the margin of error is half the width.
Question 16
A 99% CI will be ________ than a 95% CI based on the same data.
- Narrower
- Wider
- The same width
- Centered at a different value
Answer: b)
Higher confidence requires a larger critical value (\(z^* = 2.576\) for 99% vs. \(z^* = 1.960\) for 95%), producing a wider interval. More confidence = less precision.
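The critical values quoted above come from the standard normal quantile function. A minimal sketch using Python's standard library (the helper name `z_star` is made up for this example):

```python
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z95 = z_star(0.95)
z99 = z_star(0.99)
print(round(z95, 3), round(z99, 3))
print(round(z99 / z95, 2))  # how much wider the 99% interval is for the same SE
```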
Question 17
The t-distribution is used instead of the z-distribution for confidence intervals when:
- The sample size is large
- The population standard deviation is unknown
- The data are skewed
- The sample mean is large
Answer: b)
When σ is unknown, we estimate it with s, which introduces additional uncertainty. The t-distribution accounts for this by having heavier tails than the normal.
Question 18
A 95% CI for mean cholesterol level is (185, 205) mg/dL. Can we conclude that the mean is significantly different from 200?
- Yes, because 200 is above the center of the interval
- No, because 200 falls within the interval
- Yes, because the interval does not include 0
- No, because 200 is close to the upper bound
Answer: b)
A two-sided hypothesis test at α = 0.05 is equivalent to checking whether the null value falls in the 95% CI. Since 200 falls within (185, 205), we fail to reject H₀: μ = 200.
Question 19
As sample size increases, a 95% confidence interval becomes:
- Wider
- Narrower
- More likely to contain the true mean
- Centered at a different value
Answer: b)
Larger n → smaller SE → narrower CI. The confidence level (95%) is fixed by design and does not change with n.
Question 20
A researcher reports: “We are 95% confident that the mean recovery time is between 7.2 and 9.8 days.” The margin of error is:
- 2.6 days
- 1.3 days
- 8.5 days
- 0.65 days
Answer: b)
Margin of error = half the width = \((9.8 - 7.2)/2 = 2.6/2 = 1.3\) days.
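Because a confidence interval for a mean is symmetric, both the point estimate and the margin of error can be recovered from the endpoints. A small sketch (the function name is hypothetical):

```python
def center_and_margin(lower, upper):
    """Return (point estimate, margin of error) for a symmetric interval."""
    return (lower + upper) / 2, (upper - lower) / 2

estimate, margin = center_and_margin(7.2, 9.8)
print(round(estimate, 2), round(margin, 2))
```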
Question 21
Which condition is NOT required for a valid confidence interval for a mean?
- The sample is random
- The population is normally distributed OR n ≥ 30
- The population standard deviation equals the sample standard deviation
- Observations are independent
Answer: c)
We never need σ = s. We use s precisely because σ is unknown. The actual requirements are: random sample, independence, and normality (either from a normal population or from the CLT when n is large).
Question 22
A 95% CI for the difference (A − B) in mean weight loss between two diet groups is (1.2, 4.8) kg. Which conclusion is correct?
- Diet A produces significantly higher weight loss (α = 0.05)
- There is no significant difference because the interval is wide
- We need to know the p-value to draw any conclusion
- The interval needs to include 0 to be valid
Answer: a)
The entire CI is positive (does not include 0), meaning we are 95% confident the true difference (A − B) is positive. This is equivalent to rejecting H₀: μ_A = μ_B at α = 0.05. The width of the interval is irrelevant to the significance decision.
Section C: Hypothesis Testing
Question 23
In hypothesis testing, the null hypothesis H₀ typically states:
- The research hypothesis we hope to prove
- No effect, no difference, or a specific parameter value
- The alternative we will accept if p < 0.05
- The result we observed in our sample
Answer: b)
H₀ is always the “status quo” or “no effect” claim. It always contains an equality (=). We test against it, not for it.
Question 24
A p-value of 0.03 means:
- There is a 3% chance H₀ is true
- The probability of observing data this extreme (or more), assuming H₀ is true, is 3%
- There is a 97% chance Hₐ is true
- The result is not practically significant
Answer: b)
The p-value is a conditional probability: P(data this extreme or more | H₀ true). It says nothing about the probability that H₀ or Hₐ is true.
Question 25
A researcher uses α = 0.05 and obtains p = 0.08. The correct conclusion is:
- There is strong evidence against H₀
- H₀ is true
- There is insufficient evidence to reject H₀
- The study needs to be repeated
Answer: c)
p = 0.08 > 0.05 = α, so we fail to reject H₀. We never “accept” H₀ — absence of evidence is not evidence of absence.
Question 26
A Type I error occurs when:
- We fail to reject a false H₀
- We reject a true H₀
- Our p-value is too large
- Our sample size is too small
Answer: b)
Type I error = false positive = rejecting H₀ when it is actually true. Its probability is α.
Question 27
A Type II error occurs when:
- We reject a true H₀
- We fail to reject a false H₀
- Our sample is not random
- The p-value is below α
Answer: b)
Type II error = false negative = failing to detect a real effect. Its probability is β. Power = 1 − β.
Question 28
Statistical significance (p < 0.05) means:
- The effect is large and important
- There is strong evidence that the effect is not zero
- The study will be published
- The effect has a 95% chance of being real
Answer: b)
Statistical significance only tells us that the observed data are unlikely under H₀. It says nothing about effect size, importance, or the probability that the effect is real.
Question 29
A researcher tests whether a new drug reduces blood pressure with H₀: μ = 0 vs. Hₐ: μ < 0. This is a:
- Two-sided test
- Left-tailed test
- Right-tailed test
- Paired test
Answer: b)
Hₐ: μ < 0 points to the left tail. The p-value is the area to the left of the observed test statistic.
Question 30
For a left-tailed test with z = −8.10, the p-value is approximately:
- more than 0.5
- less than 0.001
- 0.05
- 0.10
Answer: b)
For a left-tailed test, p-value = P(Z < −8.10) < 0.00001. For a two-sided test it would also be close to zero.
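For a tail this extreme, the complementary error function keeps precision where subtracting from 1 would not. The sketch below is one way to compute the normal tail area with the standard library; the helper name is made up for illustration.

```python
import math

def left_tail_p(z):
    """P(Z < z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

p_left = left_tail_p(-8.10)
p_two_sided = 2 * p_left

print(p_left < 0.001, p_two_sided < 0.001)
print(round(left_tail_p(-1.96), 3))  # familiar check: about 0.025
```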
Question 31
The difference between statistical significance and practical significance is:
- There is no difference; they mean the same thing
- Statistical significance indicates the effect is real; practical significance indicates the effect is important
- Practical significance is only relevant for medical research
- Statistical significance is only meaningful with large samples
Answer: b)
Statistical significance (small p-value) means the data provide evidence that the effect is not zero. Practical significance asks whether the effect is large enough to matter in real-world terms. Large samples can make tiny, meaningless effects statistically significant.
Question 32
A study with n = 10,000 finds that a dietary supplement increases mean weight loss by 0.1 kg (p = 0.002). Which statement is most accurate?
- The supplement is an effective weight loss treatment
- The result is statistically significant but likely not practically meaningful
- The large sample invalidates the result
- The p-value proves the supplement works
Answer: b)
With n = 10,000, even a trivially small effect (0.1 kg ≈ 3.5 oz) can achieve statistical significance. A 0.1 kg weight loss is not clinically meaningful in most contexts.
Question 33
Decreasing α from 0.05 to 0.01:
- Increases the probability of a Type I error
- Decreases the probability of a Type I error
- Increases statistical power
- Decreases the probability of a Type II error
Answer: b)
α = P(Type I error). Decreasing α makes it harder to reject H₀, so Type I errors become less likely — but Type II errors (and β) increase, and power decreases.
Question 34
To reject H₀ at α = 0.05 in a two-sided test, you need:
- p < 0.025
- p < 0.05
- z > 1.645
- z > 2.576
Answer: b)
For a two-sided test at α = 0.05, reject H₀ when p < 0.05. The critical values are ±1.96 (not 1.645, which is for one-sided α = 0.05).
Question 35
A 95% CI for a parameter that does NOT include the null value implies:
- Failing to reject H₀ at α = 0.05
- Rejecting H₀ at α = 0.05 (two-sided)
- The parameter equals the null value
- The CI was calculated incorrectly
Answer: b)
The duality between CIs and hypothesis tests: if the null value falls outside the 95% CI, the two-sided test rejects H₀ at α = 0.05.
Section D: t-Tests
Use the following output for Questions 36–40:
Paired t-test
data: after - before
t = -3.21, df = 24, p-value = 0.0037
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-8.43 -1.87
sample estimates:
mean of x
-5.15
This test compares blood pressure (mmHg) before and after a meditation program for 25 participants.
Question 36
The p-value = 0.0037 means:
- There is a 0.37% chance the meditation works
- There is a 0.37% chance of observing a difference this large if the true mean difference is 0
- The mean difference is 0.37 mmHg
- The confidence interval is 0.37% accurate
Answer: b)
The p-value is the probability of observing a mean difference as extreme as −5.15 mmHg (or more extreme) assuming H₀: mean difference = 0 is true. It is not the probability that the treatment works.
Question 37
The correct conclusion from this output (α = 0.05) is:
- The meditation program does not significantly reduce blood pressure
- There is significant evidence that blood pressure changed after meditation
- Blood pressure increased significantly after meditation
- The study has insufficient power
Answer: b)
p = 0.0037 < 0.05, so we reject H₀. The mean difference is negative (after − before = −5.15), indicating blood pressure decreased. The alternative is two-sided, so we conclude it “changed” — and the direction (decrease) is shown by the negative mean.
Question 38
How many participants were in this study?
- 24
- 25
- 26
- Cannot tell from the output
Answer: b)
For a paired t-test, df = n − 1 = 24, so n = 25 participants.
Question 39
The 95% CI (−8.43, −1.87) means:
- 95% of participants experienced a reduction between 1.87 and 8.43 mmHg
- We are 95% confident the true mean reduction in blood pressure is between 1.87 and 8.43 mmHg
- The meditation reduced blood pressure by exactly 5.15 mmHg
- The p-value is between 1.87% and 8.43%
Answer: b)
The CI is for the population mean difference, not individual outcomes. Because both bounds are negative, the entire interval excludes 0, consistent with the significant result.
Question 40
Why is a paired t-test appropriate here instead of a two-sample t-test?
- Because the sample size is small
- Because each participant’s before and after measurements are linked
- Because the data are normally distributed
- Because the researcher wanted a smaller p-value
Answer: b)
The same 25 people are measured twice. Before and after values are not independent — they share the same subject. The paired design accounts for between-person variability by analyzing the differences within each person.
Use the following output for Questions 41–45:
Welch Two Sample t-test
data: cortisol by group
t = 2.84, df = 58.3, p-value = 0.0061
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.42 2.38
sample estimates:
mean in group control mean in group stressed
12.1 13.5
Question 41
The standard error used to compute this t-statistic is:
- (13.5 − 12.1) / 2.84
- 1.4 / 2.84
- The square root of the sum of the two group variances
- The pooled standard deviation
Answer: b)
\(t = (\bar{x}_1 - \bar{x}_2)/SE\), so \(SE = (13.5 - 12.1)/2.84 = 1.4/2.84 \approx 0.493\). Options a and b describe the same calculation; b is simply the evaluated form.
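The same back-calculation, written out as a sketch. It assumes the difference is taken as stressed − control so that its sign matches the positive t-statistic and interval in the output.

```python
mean_control, mean_stressed = 12.1, 13.5
t_stat = 2.84
ci_lower, ci_upper = 0.42, 2.38

diff = mean_stressed - mean_control  # assumed direction: stressed - control
se = diff / t_stat                   # rearranged from t = diff / SE
ci_center = (ci_lower + ci_upper) / 2

print(round(diff, 2), round(se, 3), round(ci_center, 2))
```

The interval is centered on the observed difference, a quick consistency check worth doing whenever reading t-test output.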
Question 42
This test uses “Welch” correction because:
- The data are skewed
- The two groups may have different population variances
- The sample sizes are unequal
- The p-value is small
Answer: b)
The Welch (unpooled) t-test does not assume equal population variances. It adjusts the degrees of freedom to account for this, making it more robust than the pooled t-test.
Question 43
The 95% CI (0.42, 2.38) represents:
- The range of values for cortisol in the stressed group
- The plausible values for the difference in mean cortisol (stressed − control)
- The plausible values for the difference in mean cortisol (control − stressed)
- The range of individual cortisol values
Answer: b)
The interval estimates the difference in mean cortisol taken as stressed − control: the observed difference 13.5 − 12.1 = 1.4 sits at the center of (0.42, 2.38). Note that R's t.test normally reports the first group minus the second (here control − stressed), which would give a negative t-statistic and an entirely negative interval; the positive values in this output correspond to stressed − control, so b) is the intended answer.
Question 44
Can we conclude that stress causes higher cortisol from this study? To answer this, you need to know:
- Whether the study was observational or experimental
- Whether the sample size was sufficient
- Whether the CI includes zero
- Whether the t-statistic is large
Answer: a)
Causation requires an experiment with random assignment. If this is an observational study (people self-identified as “stressed”), confounding variables could explain the difference in cortisol.
Question 45
The difference in sample means is:
- 2.84
- 0.42
- 1.4
- 13.5
Answer: c)
13.5 − 12.1 = 1.4. The value 2.84 is the t-statistic, not the mean difference.
Section E: Statistical Power
Question 46
Statistical power is defined as:
- The probability of making a Type I error
- The probability of correctly rejecting a false H₀
- The probability of failing to detect an effect
- The significance level α
Answer: b)
Power = 1 − β = P(reject H₀ | H₀ is false). It is the probability of detecting a true effect.
Question 47
Which factor does NOT increase statistical power?
- Larger sample size
- Larger effect size
- Smaller α (e.g., 0.01 vs. 0.05)
- Smaller within-group variability
Answer: c)
Smaller α makes the rejection region harder to reach, which decreases power (increases β). Larger n, larger effect size, and smaller σ all increase power.
Question 48
A study has power = 0.80. This means:
- 80% of participants will show an effect
- There is an 80% chance of detecting a real effect if it exists
- The p-value will be less than 0.80
- The Type I error rate is 20%
Answer: b)
Power = P(reject H₀ | Hₐ is true) = 0.80. There is a 20% chance of a Type II error (missing a real effect), not a 20% Type I error rate.
Question 49
A pilot study finds an effect size of d = 0.3 (small). To achieve 80% power at α = 0.05, researchers would need ________ than if d = 0.8 (large).
- Fewer participants
- More participants
- The same number of participants
- Cannot determine without more information
Answer: b)
Smaller effects are harder to detect and require larger samples to achieve the same power. This is why small-effect studies are so resource-intensive.
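To get a feel for the scale involved, the sketch below uses a common normal-approximation formula for the sample size per group. It assumes a two-sided comparison of two independent group means, which the question does not specify, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

n_small = n_per_group(0.3)  # small effect
n_large = n_per_group(0.8)  # large effect
print(n_small, n_large)
```

Under these assumptions the small effect needs roughly seven times as many participants per group as the large one.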
Question 50
A researcher fails to reject H₀ (p = 0.23). The most important follow-up question is:
- Was the p-value close to 0.05?
- Was the study adequately powered to detect a meaningful effect?
- Should they lower α to 0.01?
- Was a two-sided test used?
Answer: b)
A non-significant result is hard to interpret without knowing the power. A low-powered study (small n) that fails to reject H₀ tells us very little — we may simply have been unable to detect the effect even if it exists.
Section F: Correlation and Regression
Use the following output for Questions 51–60:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    12.400      1.850    6.70   <2e-16 ***
exercise_hrs   -0.380      0.092   -4.13  0.00008 ***
Residual standard error: 2.14 on 98 degrees of freedom
Multiple R-squared: 0.1483
This model predicts resting heart rate (bpm) from weekly exercise hours for 100 adults.
Question 51
The slope estimate is −0.380. This means:
- For every 1-hour increase in weekly exercise, resting heart rate decreases by 0.380 bpm on average
- For every 1-bpm decrease in heart rate, exercise increases by 0.380 hours
- Exercise causes heart rate to decrease by 38% per hour
- The correlation between exercise and heart rate is −0.380
Answer: a)
The slope is always interpreted as: for a one-unit increase in x, the predicted y changes by the slope value. Slopes are not correlations, percentages, or reverse predictions.
Question 52
Is the slope statistically significant at α = 0.05?
- No, because the p-value is very small
- Yes, because p = 0.00008 < 0.05
- Yes, because the slope is negative
- No, because R² < 0.50
Answer: b)
p = 0.00008 < 0.05, so we reject H₀: β₁ = 0. The sign of the slope and the value of R² do not determine significance.
Question 53
The value R² = 0.1483 means:
- The model explains 14.83% of the variability in resting heart rate
- The correlation coefficient is 0.1483
- 14.83% of participants show the expected pattern
- The model is not useful
Answer: a)
R² is the proportion of variability in y explained by the linear model. Note: \(r = -\sqrt{0.1483} \approx -0.385\) (negative because slope is negative), not 0.1483.
Question 54
What is the predicted resting heart rate for someone who exercises 5 hours per week?
- −0.380 + 12.4 × 5 = 61.6 bpm
- 12.4 − 0.380 × 5 = 10.5 bpm
- 12.4 + (−0.380) × 5 = 10.5 bpm
- Both b and c (same calculation)
Answer: d)
\(\hat{y} = 12.4 + (-0.380)(5) = 12.4 - 1.9 = 10.5\) bpm. Options b and c are the same calculation written differently. Note: a resting heart rate of 10.5 bpm is biologically implausible, as is the intercept of 12.4 bpm, so treat these coefficients as illustrative only. With real data, check that predictions are sensible and avoid extrapolating beyond the observed range of x.
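The prediction is just the fitted line evaluated at x = 5. A minimal sketch using the coefficients from the output (the function name is hypothetical):

```python
INTERCEPT = 12.400
SLOPE = -0.380  # bpm per additional weekly exercise hour

def predict_heart_rate(exercise_hrs):
    """Fitted value from the simple linear model: intercept + slope * x."""
    return INTERCEPT + SLOPE * exercise_hrs

pred_5 = predict_heart_rate(5)
pred_6 = predict_heart_rate(6)

print(round(pred_5, 2))
print(round(pred_6 - pred_5, 3))  # one more hour shifts the prediction by the slope
```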
Question 55
The intercept (12.400) represents:
- The predicted heart rate when exercise = 0 hours per week
- The average resting heart rate in the sample
- The heart rate reduction per hour of exercise
- The maximum heart rate in the sample
Answer: a)
The intercept is the predicted y when x = 0. Whether this value is meaningful depends on whether x = 0 is within the observed range of data.
Question 56
A researcher states: “Since exercise is a significant predictor of heart rate, we can conclude that exercise lowers heart rate.” What is the flaw?
- The p-value is too small to justify this conclusion
- Regression does not prove causation; this may be an observational study
- The R² is too low for this conclusion
- The sample size is insufficient
Answer: b)
Regression shows statistical association. Without random assignment of exercise hours to participants (an experiment), we cannot rule out confounding. Healthier people may exercise more AND have lower heart rates for other reasons.
Question 57
The correlation coefficient r for this data is approximately:
- 0.1483
- 0.3851
- −0.3851
- −0.1483
Answer: c)
\(r = -\sqrt{R^2} = -\sqrt{0.1483} \approx -0.385\). The sign is negative because the slope is negative (negative relationship between exercise and heart rate).
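In simple linear regression r is the square root of R² carrying the sign of the slope, which can be sketched as:

```python
import math

r_squared = 0.1483
slope = -0.380

# copysign attaches the slope's sign to the magnitude sqrt(R^2).
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))
```

This shortcut applies only with a single predictor; with several predictors R² no longer corresponds to one correlation coefficient.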
Question 58
Residuals are defined as:
- The predicted values from the regression line
- The difference between observed and predicted y values
- The sum of squared deviations
- The slope multiplied by the error
Answer: b)
Residual \(e_i = y_i - \hat{y}_i\). Residuals measure how far each observed point is from the regression line.
Question 59
A residual plot shows a fan-shaped pattern (variance increases with fitted values). This suggests:
- The linearity condition is violated
- The equal variance (homoscedasticity) condition is violated
- The independence condition is violated
- The normality condition is violated
Answer: b)
A fan shape indicates non-constant variance (heteroscedasticity) — the spread of residuals changes across the range of fitted values. This violates the “E” in LINE (Equal variance).
Question 60
An influential point in regression is one that:
- Has a very large residual
- Is an outlier in the x direction that strongly affects the slope
- Has a y-value far from the mean
- Corresponds to a participant who didn’t follow protocol
Answer: b)
Influential points are extreme in x (high leverage) and their inclusion or removal substantially changes the slope. A point can have a large residual without being influential, and vice versa.
Section G: ANOVA
Use the following output for Questions 61–68:
Analysis of Variance Table
           Df  Sum Sq Mean Sq F value  Pr(>F)
treatment   3   487.2   162.4   14.78 2.4e-08 ***
Residuals 196  2154.9    11.0
This output compares mean immune cell counts across 4 treatment conditions.
Question 61
How many groups are being compared?
- 3
- 4
- 196
- 200
Answer: b)
\(df_{between} = k - 1 = 3\), so \(k = 4\) groups.
Question 62
The total number of observations in this study is:
- 196
- 199
- 200
- 204
Answer: c)
Total df = df_between + df_within = 3 + 196 = 199 = n − 1, so n = 200.
Question 63
The F-statistic (14.78) is calculated as:
- Sum Sq treatment / Sum Sq Residuals
- Mean Sq treatment / Mean Sq Residuals
- Mean Sq Residuals / Mean Sq treatment
- df treatment / df Residuals
Answer: b)
\(F = MSG/MSE = 162.4/11.0 \approx 14.76\), which matches the reported 14.78 up to rounding of the mean squares ✅. Always use mean squares (not sums of squares) for the F ratio.
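Every entry in the ANOVA table can be rebuilt from the sums of squares and degrees of freedom. The sketch below redoes that arithmetic with the values from the output; small differences from the printed F value come from rounding in the table.

```python
ss_treatment, df_treatment = 487.2, 3
ss_residual, df_residual = 2154.9, 196

ms_treatment = ss_treatment / df_treatment  # mean square between groups
ms_residual = ss_residual / df_residual     # mean square within groups
f_stat = ms_treatment / ms_residual

k_groups = df_treatment + 1                 # df_between = k - 1
n_total = df_treatment + df_residual + 1    # total df = n - 1

print(round(ms_treatment, 1), round(ms_residual, 1), round(f_stat, 2))
print(k_groups, n_total)
```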
Question 64
What are the null and alternative hypotheses for this ANOVA?
- H₀: All treatments are the same; Hₐ: At least one treatment is different
- H₀: μ₁ = μ₂ = μ₃ = μ₄; Hₐ: All means are different
- H₀: At least one mean differs; Hₐ: All means are equal
- H₀: μ₁ = μ₂ = μ₃ = μ₄; Hₐ: At least one mean differs
Answer: d)
H₀ states all group means are equal. Hₐ only requires that at least one mean differs — not that all differ from each other. Option b’s Hₐ is too strong.
Question 65
The conclusion from this ANOVA (α = 0.05) is:
- All four treatments have the same mean immune cell count
- There is significant evidence that at least one treatment mean differs from the others
- All treatment means are significantly different from each other
- The study has insufficient power
Answer: b)
p = 2.4 × 10⁻⁸ << 0.05. We reject H₀. The ANOVA tells us that means differ but not which ones — that requires post-hoc testing.
Question 66
Why is ANOVA preferable to running all six pairwise t-tests when comparing 4 groups?
- ANOVA is faster to compute
- Conducting multiple t-tests inflates the Type I error rate beyond α
- ANOVA uses the F-distribution which is more accurate
- t-tests cannot be used with more than 2 groups
Answer: b)
With 4 groups there are \(\binom{4}{2} = 6\) pairwise comparisons. At α = 0.05 each, P(at least one false positive) = 1 − (0.95)⁶ ≈ 0.26 — far above 5%.
Question 67
With 4 groups and α = 0.05, the probability of at least one false positive from all pairwise t-tests would be:
- 0.05
- 1 − (0.95)⁶ ≈ 0.265
- 1 − (0.95)³ ≈ 0.143
- 6 × 0.05 = 0.30
Answer: b)
4 groups → \(\binom{4}{2} = 6\) pairwise comparisons. Treating the tests as independent, P(at least one Type I error) = 1 − (1 − 0.05)⁶ ≈ 0.265. Option d is the Bonferroni upper bound, not the exact value.
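The inflation of the Type I error rate can be computed for any number of groups. This sketch treats the pairwise tests as independent, which is a simplification since tests that share a group are correlated; the function name is made up for illustration.

```python
import math

def familywise_error(k_groups, alpha=0.05):
    """Return (number of pairwise tests, P(at least one false positive)),
    assuming the tests are independent."""
    m = math.comb(k_groups, 2)
    return m, 1 - (1 - alpha) ** m

m, fwer = familywise_error(4)
bonferroni_alpha = 0.05 / m  # per-test level that keeps the family-wise rate at or below 0.05

print(m, round(fwer, 3), round(bonferroni_alpha, 4))
```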
Question 68
After a significant ANOVA result, a post-hoc test (like Tukey’s HSD) is used to:
- Recalculate the F-statistic
- Determine which specific pairs of groups differ significantly
- Check the ANOVA conditions
- Increase statistical power
Answer: b)
ANOVA tells us that groups differ; post-hoc tests (with appropriate corrections for multiple comparisons) tell us which pairs differ.
Section H: Inference for Proportions
Question 69
A study finds that 45 out of 200 patients in a clinical trial showed a positive response. The sample proportion is:
- 200/45 = 4.44
- 45/200 = 0.225
- (45 + 200)/2 = 122.5
- 45²/200 = 10.125
Answer: b)
\(\hat{p} = x/n = 45/200 = 0.225\). The sample proportion is always number of successes divided by sample size.
Question 70
The standard error for the sample proportion 0.225 with n = 200 is:
- √(0.225 × 0.775 / 200) ≈ 0.0295
- 0.225 / √200 ≈ 0.0159
- √(0.225 / 200) ≈ 0.0335
- 0.225 × 0.775 / 200 ≈ 0.000872
Answer: a)
\(SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.225 \times 0.775 / 200} = \sqrt{0.000872} \approx 0.0295\).
Question 71
The success/failure condition for proportions requires:
- p > 0.5
- np̂ ≥ 10 AND n(1−p̂) ≥ 10
- n > 30
- p̂ is normally distributed
Answer: b)
We need at least 10 observed successes and 10 observed failures. This ensures the normal approximation to the binomial is adequate.
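Questions 69 to 71 fit together as one short calculation. The sketch below uses the 45-out-of-200 data and additionally builds a 95% interval with z* = 1.96; the exam does not ask for that interval, so it is included only as an illustration.

```python
import math

successes, n = 45, 200
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Success/failure check: at least 10 observed successes and 10 failures.
condition_ok = successes >= 10 and (n - successes) >= 10

z_star = 1.96  # 95% confidence
ci = (p_hat - z_star * se, p_hat + z_star * se)

print(round(p_hat, 3), round(se, 4), condition_ok)
print(tuple(round(x, 3) for x in ci))
```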
Question 72
For testing H₀: p = p₀, the SE in the z-statistic uses:
- p̂ (the sample proportion)
- p₀ (the null value)
- The pooled proportion
- The population proportion
Answer: b)
Under H₀, we assume p = p₀ is true, so we use \(SE = \sqrt{p_0(1-p_0)/n}\). For a CI, we don’t assume a specific value for p, so we use \(\hat{p}\) instead.
Question 73
A hospital claims its C-section rate is 20%. You audit 150 deliveries and find 38 C-sections. The z-statistic for testing H₀: p = 0.20 vs. Hₐ: p ≠ 0.20 is:
- z = (0.253 − 0.20) / √(0.253 × 0.747 / 150)
- z = (0.253 − 0.20) / √(0.20 × 0.80 / 150)
- z = (38 − 30) / √(0.20 × 0.80 × 150)
- z = (0.253 − 0.20) / √(0.20 × 0.80 × 150)
Answer: b)
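For a one-proportion z-test, use \(p_0\) (not \(\hat{p}\)) in the SE: \(z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}\). Here \(\hat{p} = 38/150 = 0.253\) and \(p_0 = 0.20\), giving z ≈ 1.63. Option c is the same statistic written on the count scale, (38 − 30)/√(150 × 0.20 × 0.80), so it yields the identical value; b is the standard proportion-scale form.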
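The test statistic can be evaluated directly. The sketch below computes it on the proportion scale and, as a cross-check, on the count scale (observed minus expected count over the binomial standard deviation under H₀); the variable names are chosen for this example.

```python
import math

x, n, p0 = 38, 150, 0.20
p_hat = x / n

# Proportion scale: the SE uses the null value p0, not p_hat.
z_prop = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Count scale: observed minus expected count over the binomial SD under H0.
z_count = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))

print(round(z_prop, 2), round(z_count, 2))
```

Since |z| is below 1.96, this audit would not reject H₀: p = 0.20 at α = 0.05 in a two-sided test.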
Use this R output for Questions 74–78:
2-sample test for equality of proportions
data: c(85, 110) out of c(400, 500)
X-squared = 0.375, df = 1, p-value = 0.540
alternative hypothesis: two.sided
95 percent confidence interval:
-0.0503 0.0903
sample estimates:
prop 1 prop 2
0.2125 0.2200
Question 74
What are the two sample proportions?
- 85 and 110
- 0.2125 and 0.2200
- 400 and 500
- 0.540 and 0.375
Answer: b)
The “sample estimates” row gives the sample proportions: 85/400 = 0.2125 and 110/500 = 0.2200.
Question 75
The p-value = 0.540. The correct conclusion (α = 0.05) is:
- There is significant evidence of a difference in proportions
- There is insufficient evidence of a difference in proportions
- The proportions are exactly equal
- We should reduce α to find significance
Answer: b)
p = 0.540 >> 0.05, so we fail to reject H₀. We do not conclude the proportions are equal — only that we lack evidence to distinguish them.
Question 76
The 95% CI (−0.0503, 0.0903) includes 0. This is consistent with:
- Rejecting H₀ at α = 0.05
- Failing to reject H₀ at α = 0.05
- The CI and hypothesis test giving different answers
- A statistically significant difference
Answer: b)
When the CI for a difference includes 0, we cannot rule out that the true difference is zero — consistent with failing to reject H₀.
Question 77
Why does the two-proportion test use a pooled proportion in the SE, but the CI does not?
- The test assumes a specific value (H₀: p₁ = p₂) so it pools; the CI estimates without that assumption
- The CI always uses pooled proportions
- The test uses a larger standard error to be conservative
- The CI uses individual proportions because they are more accurate
Answer: a)
Under H₀: p₁ = p₂, both groups share the same true p. Pooling uses all the data to estimate this common p. For a CI, we make no such assumption and estimate each group’s p separately.
Question 78
The success/failure condition for the two-proportion test requires:
- Both groups to have n > 30
- n₁p̂₁ ≥ 10, n₁(1−p̂₁) ≥ 10, n₂p̂₂ ≥ 10, n₂(1−p̂₂) ≥ 10
- The pooled proportion to be greater than 0.5
- Both proportions to be equal
Answer: b)
We check the success/failure condition separately in each group. All four counts must be ≥ 10. We can also check this with the pooled proportion p̂_pooled, in which case we need n₁p̂_pooled ≥ 10, n₁(1−p̂_pooled) ≥ 10, n₂p̂_pooled ≥ 10, and n₂(1−p̂_pooled) ≥ 10; the requirement is not p̂_pooled > 0.5, as stated in c).
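The checks discussed in Questions 74–78 can be sketched in R using the counts from the output. Variable names are illustrative. Figures from an actual run may not match the illustrative output printed above exactly, in part because prop.test() applies a continuity correction by default.

```r
# Counts from the output used in Questions 74-78
x <- c(85, 110)    # successes in each group
n <- c(400, 500)   # group sizes
p_hat <- x / n     # 0.2125 and 0.2200

# Success/failure check: all four counts should be at least 10
counts <- rbind(successes = x, failures = n - x)
sf_ok  <- all(counts >= 10)

# Pooled SE (used by the test, which assumes p1 = p2 under H0)
p_pool  <- sum(x) / sum(n)
se_test <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

# Unpooled SE (used by the confidence interval)
se_ci <- sqrt(sum(p_hat * (1 - p_hat) / n))

# Built-in test; a continuity correction is applied by default
res <- prop.test(x, n)
res$estimate
res$conf.int
```

The two standard errors are nearly identical here because the sample proportions are so close; they diverge more when the groups differ substantially.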
Section I: Chi-Square Tests
Use the following output and table for Questions 79–86:
Pearson's Chi-squared test
data: diet_cancer
X-squared = 12.43, df = 2, p-value = 0.00200
| Diet | Negative | Positive | Total |
|---|---|---|---|
| Mediterranean | 210 | 40 | 250 |
| Western | 180 | 70 | 250 |
| Vegan | 110 | 15 | 125 |
| Total | 500 | 125 | 625 |
Question 79
Why is df = 2 for a 3×2 table?
- (3−1) × (2−1) = 2
- (3+2) − 1 = 4
- 3 × 2 = 6
- 3 − 1 = 2
Answer: a)
\(df = (r-1)(c-1) = (3-1)(2-1) = 2 \times 1 = 2\).
Question 80
The expected count for the cell “Mediterranean / Positive” is:
- 40
- 125 × 250/625 = 50
- 500 × 250/625 = 200
- Cannot determine from the table
Answer: b)
\(E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = \frac{250 \times 125}{625} = 50\).
The observed count was 40 — fewer positive results than expected under independence.
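The expected-count formula can be applied to every cell at once. The sketch below rebuilds the table in R (the object name diet_cancer is taken from the output; the layout of the matrix is an assumption) and also checks the expected-count condition.

```r
# Observed counts: rows are diets, columns are screening results
diet_cancer <- matrix(
  c(210, 40,
    180, 70,
    110, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    diet   = c("Mediterranean", "Western", "Vegan"),
    result = c("Negative", "Positive")
  )
)

# Expected count = row total * column total / grand total, for every cell
expected <- outer(rowSums(diet_cancer), colSums(diet_cancer)) / sum(diet_cancer)

expected[1, 2]        # Mediterranean / Positive: 50
all(expected >= 5)    # TRUE, so the expected-count condition holds
```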
Question 81
The p-value = 0.002 means:
- Diet causes cancer 0.2% of the time
- If diet type and cancer screening result were independent, the probability of observing an association this strong or stronger is 0.2%
- 0.2% of participants had cancer
- The chi-square statistic is wrong
Answer: b)
The p-value is always interpreted as: assuming H₀ (independence) is true, the probability of observing a test statistic this large or larger.
Question 82
A researcher concludes “the Mediterranean diet protects against cancer.” What is the limitation?
- Chi-square can only detect associations, not causation
- The p-value is too small to support this conclusion
- Chi-square tests cannot be used with proportions
- The sample size is too small
Answer: a)
Chi-square tests association between categorical variables. Without random assignment to diet, we cannot establish causation. Confounders (lifestyle, income, other health behaviors) could explain the pattern.
Question 83
This study sampled 625 individuals and asked about both diet and cancer screening. This is a test of:
- Homogeneity
- Independence
- Goodness of fit
- Proportions
Answer: b)
When one sample is drawn and two categorical variables are measured on each individual, we test for independence.
Question 84
If instead researchers had recruited 250 Mediterranean dieters, 250 Western dieters, and 125 vegans separately, then measured cancer screening, this would be:
- A test of independence
- A test of homogeneity
- A two-proportion z-test
- An ANOVA
Answer: b)
When multiple independent samples are drawn (row totals fixed by the researcher) and one outcome is measured, we test for homogeneity of proportions across groups.
Question 85
The conditions for this chi-square test require all expected counts to be:
- E > 0
- E ≥ 5
- E ≥ 10
- E ≥ 30
Answer: b)
The standard condition for chi-square tests is all expected cell counts ≥ 5 (not observed counts). If this fails, consider combining categories or using Fisher’s Exact Test.
Question 86
Which diet group has the lowest proportion of positive screening results?
- Mediterranean
- Western
- Vegan
- Vegan and Mediterranean tied
Answer: c)
Vegan: 15/125 = 12%; Mediterranean: 40/250 = 16%; Western: 70/250 = 28%. Vegan has the lowest positive rate.
Question 87
For a chi-square test, a large test statistic (relative to df) indicates:
- Strong agreement between observed and expected counts
- Large differences between observed and expected counts
- A small sample size
- That H₀ should be accepted
Answer: b)
\(\chi^2 = \sum (O-E)^2/E\). Large values arise when observed counts are far from what independence would predict.
Question 88
The chi-square distribution is:
- Symmetric around 0
- Always left-skewed
- Non-negative and right-skewed
- Identical to the t-distribution
Answer: c)
\(\chi^2\) is always ≥ 0 (because we square the deviations) and right-skewed. It approaches symmetry as df increases.
Question 89
A 2×2 table chi-square test and a two-proportion z-test are related by:
- χ² = z
- χ² = z²
- z = √χ²
- Both b and c
Answer: d)
For a 2×2 table, \(\chi^2 = z^2\), and equivalently \(z = \sqrt{\chi^2}\) (taking the appropriate sign). This is why both tests give identical two-sided p-values for 2×2 tables.
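The identity χ² = z² can be verified numerically. The sketch below reuses the counts from Questions 74–78 as a convenient 2×2 example; correct = FALSE turns off the continuity correction, which would otherwise break the exact equality.

```r
x <- c(85, 110)    # successes in each group
n <- c(400, 500)   # group sizes

# Two-proportion z-statistic with the pooled standard error
p_hat  <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) /
  sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

# Chi-square test on the equivalent 2x2 table (successes and failures by group)
tab <- rbind(successes = x, failures = n - x)
chi <- chisq.test(tab, correct = FALSE)

all.equal(unname(chi$statistic), z^2)        # TRUE
all.equal(chi$p.value, 2 * pnorm(-abs(z)))   # TRUE
```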
Question 90
If all expected counts exactly equal observed counts, the chi-square statistic is:
- Undefined
- Equal to the degrees of freedom
- 0
- 1
Answer: c)
\(\chi^2 = \sum (O-E)^2/E\). If O = E for every cell, every term is \((0)^2/E = 0\), so \(\chi^2 = 0\) — perfect consistency with H₀.
Section J: Mixed Concepts and Reading R Output
Question 91
A study tests H₀: μ₁ = μ₂ = μ₃ and gets F = 0.42, p = 0.66. The correct interpretation is:
- All three means are significantly different
- There is insufficient evidence that any of the group means differ
- Two of the three means are equal
- The F-statistic is too large to be meaningful
Answer: b)
p = 0.66 >> 0.05. We fail to reject H₀. A small F (here 0.42, below 1) means between-group variation is no larger than what within-group variation alone would produce.
Question 92
In a regression model, a t-test is run on the slope coefficient. H₀ is:
- The slope equals 1
- The slope equals 0 (no linear relationship)
- The intercept equals 0
- R² equals 0
Answer: b)
H₀: β₁ = 0 tests whether the predictor has any linear relationship with the response. Failing to reject this means the slope is not significantly different from zero.
Question 93
The conditions for a two-sample t-test include:
- Normal populations OR large samples (n₁ ≥ 30 and n₂ ≥ 30), independent groups
- The two populations must have equal variances
- Both samples must be the same size
- The data must be from a controlled experiment
Answer: a)
The Welch t-test does not require equal variances. Equal sample sizes are not required. Observational studies can use t-tests (though causation cannot be concluded). The key conditions are independence and normality/large n.
Question 94
A paired t-test on 30 pairs has df = _____.
- 30
- 29
- 58
- 60
Answer: b)
For a paired t-test, df = n − 1 where n is the number of pairs, not the total number of observations. Here df = 30 − 1 = 29.
Question 95
Which test is most appropriate: 200 patients are randomized to Drug A, Drug B, or placebo, and mean pain scores are compared?
- One-sample t-test
- Paired t-test
- Two-sample t-test
- One-way ANOVA
Answer: d)
Three independent groups, comparing a numerical mean outcome → one-way ANOVA. Two-sample t-tests only handle two groups.
Question 96
Which test is most appropriate: 50 patients’ pain scores are measured before and after treatment?
- One-sample t-test
- Paired t-test
- Two-sample t-test
- Chi-square test
Answer: b)
Same patients measured at two time points → paired design. Each before–after pair is linked.
Question 97
Which test is most appropriate: testing whether blood type (A/B/AB/O) is associated with BMI category (normal/overweight/obese)?
- ANOVA
- Two-sample t-test
- Chi-square test for independence
- Regression
Answer: c)
Both variables are categorical (blood type = nominal; BMI category = ordinal treated as nominal) → chi-square test.
Question 98
Which test is most appropriate: studying the relationship between age and resting heart rate?
- Chi-square test
- ANOVA
- Linear regression
- Paired t-test
Answer: c)
Both variables are numerical → linear regression (or correlation). ANOVA would require categorizing one variable.
Question 99
A 95% confidence interval for a proportion is (0.42, 0.58). If we test H₀: p = 0.50 (two-sided, α = 0.05), we would:
- Reject H₀ because 0.50 is not the midpoint
- Fail to reject H₀ because 0.50 falls within the interval
- Reject H₀ because the interval is wide
- Need a p-value to decide
Answer: b)
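0.50 lies within (0.42, 0.58), so we fail to reject H₀: p = 0.50 at α = 0.05. A 95% CI and a two-sided test at α = 0.05 generally lead to the same decision; for proportions the agreement is approximate, because the test uses p₀ in the SE while the CI uses p̂.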
Question 100
A study comparing penguin bill lengths across 3 species produces:
Df Sum Sq Mean Sq F value Pr(>F)
species 2 7194 3597 410.6 <2e-16 ***
Residuals 330 2892 8.8
The between-species variability is ______ times greater than the within-species variability:
- 2
- 410.6
- 7194
- 3597
Answer: b)
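F = MSG/MSE = 3597/(2892/330) ≈ 3597/8.76 ≈ 410, which matches the reported 410.6 up to rounding of the table entries (the table displays MSE as 8.8). The F-statistic is the ratio of between-group to within-group variance.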
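As a sketch, the F-statistic and its p-value can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table. Variable names are illustrative.

```r
# Values read from the ANOVA table in Question 100
ss_between <- 7194; df_between <- 2
ss_within  <- 2892; df_within  <- 330

msg <- ss_between / df_between   # mean square between groups
mse <- ss_within  / df_within    # mean square within groups (residual)
f_stat <- msg / mse

# Right-tail area of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

f_stat    # about 410; small differences from 410.6 reflect table rounding
p_value   # far below 2e-16
```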
Question 101
A 95% CI for a proportion is (0.31, 0.49). The sample proportion is:
- 0.31
- 0.49
- 0.40
- 0.18
Answer: c)
\(\hat{p}\) = midpoint = \((0.31 + 0.49)/2 = 0.40\).
Question 102
What is the margin of error for the CI (0.31, 0.49)?
- 0.18
- 0.09
- 0.40
- 0.31
Answer: b)
Margin of error = half the width = \((0.49 - 0.31)/2 = 0.18/2 = 0.09\).
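Questions 101 and 102 both work backward from the interval endpoints. A minimal sketch in R, with illustrative variable names:

```r
ci <- c(0.31, 0.49)   # 95% CI from Questions 101-102

p_hat <- mean(ci)       # midpoint of the interval: 0.40
moe   <- diff(ci) / 2   # half the width: 0.09

# Implied standard error, assuming a 95% z-interval (moe = z* x SE)
se <- moe / qnorm(0.975)
```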
Question 103
For large samples, the sampling distribution of p̂ is approximately:
- t-distributed
- chi-square distributed
- Normal
- Uniform
Answer: c)
By the CLT applied to proportions: when np̂ ≥ 10 and n(1−p̂) ≥ 10, the distribution of p̂ is approximately normal.
Question 104
The residual standard error in a regression output (2.14 on 98 df) estimates:
- The average error in predicting y
- The standard deviation of x
- The slope of the regression line
- The mean of the residuals
Answer: a)
The residual standard error (RSE) estimates the typical distance between observed y values and the regression line — the standard deviation of the residuals.
Question 105
An R² = 0.92 in a regression means:
- The slope is 0.92
- 92% of the variability in y is explained by the linear relationship with x
- The correlation coefficient is 0.92
- The model predicts correctly 92% of the time
Answer: b)
R² is always “proportion of variability in y explained by the model.” Note: r = ±√0.92 ≈ ±0.959, not 0.92 itself.
Question 106
If r = −0.85 between two variables, we can say:
- As x increases by 1 unit, y decreases by 0.85 units
- There is a strong negative linear relationship
- x causes y to decrease
- 85% of the variability in y is explained by x
Answer: b)
r describes strength and direction. It is not a slope (option a), does not imply causation (option c), and \(R^2 = r^2 = 0.7225\), not 0.85 (option d).
Question 107
A chi-square test with df = 4 and χ² = 2.1 (p = 0.72) would lead to:
- Rejecting H₀ of independence
- Failing to reject H₀ of independence
- Concluding the variables are definitely independent
- Concluding the test was underpowered
Answer: b)
p = 0.72 >> 0.05. We fail to reject H₀. We do NOT conclude independence — only that we lack evidence of association.
Question 108
ANOVA assumes:
- The populations have equal means
- The populations have equal variances (homoscedasticity)
- The populations are all skewed
- The sample sizes are all equal
Answer: b)
ANOVA conditions: (1) independent random samples, (2) approximately normal populations (or large n), (3) equal population variances. Equal sample sizes are helpful but not required.
Question 109
The pooled standard deviation in a two-sample t-test:
- Weights both sample standard deviations equally
- Weights the standard deviations by their respective degrees of freedom
- Always gives a smaller SE than the Welch method
- Is used when populations clearly have unequal variances
Answer: b)
The pooled variance is a weighted average of the two sample variances, using df as weights: \(s_p^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2]/(n_1+n_2-2)\). The pooled SD is its square root, \(s_p\). It is used when the population variances are assumed equal.
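The weighting can be seen in a small sketch. The function name pooled_sd and the summary statistics below are hypothetical, chosen only to illustrate the formula.

```r
# Pooled SD: square root of the df-weighted average of the sample variances
pooled_sd <- function(s1, n1, s2, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

# Hypothetical groups: the larger group (n = 30, s = 6) gets more weight
sp_unequal <- pooled_sd(s1 = 4, n1 = 10, s2 = 6, n2 = 30)

# With equal sample sizes, the pooled variance is the simple average: (16 + 36) / 2
sp_equal <- pooled_sd(s1 = 4, n1 = 10, s2 = 6, n2 = 10)

sp_unequal   # pulled toward 6
sp_equal^2   # 26
```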
Question 110
A paired study finds t = 1.85, df = 19, p = 0.079 (two-sided). At α = 0.05:
- The difference is statistically significant
- The difference is not statistically significant
- The null hypothesis is accepted
- The study must be redone
Answer: b)
p = 0.079 > 0.05. Fail to reject H₀. We never “accept” H₀. The study could be repeated with more power, but there is no requirement to do so.
Question 111
Power = 1 − β. If β = 0.20, power is:
- 0.20
- 0.80
- 1.20
- 0.80%
Answer: b)
Power = 1 − β = 1 − 0.20 = 0.80. Power of 80% is considered the conventional minimum for well-designed studies.
Question 112
Which of the following best describes the relationship between α and power?
- Increasing α increases power
- Increasing α decreases power
- α and power are independent
- Power = 1 − α
Answer: a)
Increasing α makes it easier to reject H₀ → increases power but also increases Type I error. There is a fundamental trade-off between α (Type I error) and β (Type II error).
Question 113
A study is designed with 80% power. This means:
- There is an 80% chance H₀ is false
- If the effect exists, there is an 80% chance of detecting it
- The p-value will be less than 0.20
- The Type I error rate is 20%
Answer: b)
Power = P(reject H₀ | H₀ is false) = 0.80. There is a 20% chance of a Type II error — missing a real effect.
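The ideas in Questions 111–113 can be explored with power.t.test() from base R. The planning values below (a 5-point difference, SD of 10, 40 per group) are hypothetical and chosen only for illustration.

```r
# Sample size per group for 80% power in a two-sample t-test
plan <- power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(plan$n)   # 64 per group

# Same design at two significance levels: a larger alpha gives more power
power_05 <- power.t.test(n = 40, delta = 5, sd = 10, sig.level = 0.05)$power
power_10 <- power.t.test(n = 40, delta = 5, sd = 10, sig.level = 0.10)$power
power_10 > power_05   # TRUE
```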
Question 114
If two 95% confidence intervals for two group means do not overlap, we can conclude:
- The difference is not statistically significant
- The difference is statistically significant at α = 0.05
- The difference is practically significant
- A t-test is unnecessary
Answer: b)
Non-overlapping 95% CIs imply a significant difference at α = 0.05; in fact, the corresponding p-value is usually well below 0.05. The rule is conservative: overlapping CIs do not necessarily imply non-significance, but non-overlapping CIs do imply significance.
Question 115
The standard error of the difference between two independent means is:
- SE = s₁/√n₁ + s₂/√n₂
- SE = √(s₁²/n₁ + s₂²/n₂)
- SE = (s₁ + s₂) / √(n₁ + n₂)
- SE = s_pooled / √(n₁ + n₂)
Answer: b)
The correct formula adds variances (not standard deviations): \(SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}\). Option a is wrong because you cannot add SEs directly.
Question 116
In the output Pr(>|t|) = 0.0043, the test is:
- One-sided
- Two-sided
- Cannot tell from this notation
- This notation indicates a chi-square test
Answer: b)
The notation |t| (absolute value of t) indicates the p-value is for a two-sided test: P(|T| > |t_observed|).
Question 117
Cramér’s V measures:
- The significance of a chi-square test
- The effect size (strength) of an association in a contingency table
- The degrees of freedom
- The expected cell count
Answer: b)
Cramér’s V ranges from 0 (no association) to 1 (perfect association). It is the chi-square analog of a correlation coefficient and measures effect size, not significance.
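A minimal sketch of the standard definition, V = √(χ² / (n × (min(r, c) − 1))), is shown below. The helper name cramers_v is hypothetical; the inputs are the statistic and sample size reported in the diet output used for Questions 79–86.

```r
# Cramer's V from a chi-square statistic, sample size, and table dimensions
cramers_v <- function(chi_sq, n, n_rows, n_cols) {
  sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))
}

# Reported values: X-squared = 12.43, n = 625, 3 x 2 table
v <- cramers_v(chi_sq = 12.43, n = 625, n_rows = 3, n_cols = 2)
round(v, 2)   # 0.14
```

A value this close to 0 indicates a weak association even though the test itself is statistically significant, which is why effect size and significance should be reported together.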
Question 118
For a regression slope, a 95% CI that does NOT include 0 means:
- The intercept is significant
- The slope is significantly different from 0 at α = 0.05
- R² > 0.50
- The residuals are normally distributed
Answer: b)
A 95% CI for the slope that excludes 0 is equivalent to rejecting H₀: β₁ = 0 at α = 0.05 (two-sided).
Question 119
A researcher says “increasing sleep by 1 hour causes exam scores to increase by 3 points, based on our regression.” What is the issue?
- The slope is too small to be meaningful
- Regression shows association; “causes” requires experimental design
- Regression cannot predict exam scores
- The intercept is not reported
Answer: b)
Unless sleep was experimentally manipulated (randomized), we only have an observational association. Confounders (study habits, stress, health) could explain the relationship.
Question 120
When checking conditions for a chi-square test, you calculate expected count = 3.5 for one cell. You should:
- Proceed with the test as normal
- Consider combining categories or using an alternative test
- Increase α to 0.10 to compensate
- Remove that cell from the table
Answer: b)
The chi-square approximation is not valid when expected counts are below 5. Options include collapsing categories, collecting more data, or using Fisher’s Exact Test.
Question 121
The df for a one-sample t-test with n = 25 is:
- 25
- 24
- 26
- 50
Answer: b)
df = n − 1 = 25 − 1 = 24. The same rule, df = n − 1, applies to any one-sample or paired t-test (with n the number of pairs in the paired case).
Question 122
An ANOVA F-test is always _____ tailed.
- Left
- Two
- Right
- It depends on the alternative hypothesis
Answer: c)
F-statistics are always non-negative. Large F values provide evidence against H₀. The p-value is always the area in the right tail of the F-distribution.
Question 123
Which R function conducts a chi-square test?
- t.test()
- prop.test()
- chisq.test()
- aov()
Answer: c)
chisq.test() conducts Pearson’s chi-square test. prop.test() conducts proportion tests (which for 2×2 tables produces an equivalent chi-square result, but is framed differently).
Question 124
Which R function conducts a paired t-test?
- t.test(x, y, paired = TRUE)
- chisq.test(x, y)
- aov(y ~ group)
- lm(y ~ x)
Answer: a)
Setting paired = TRUE in t.test() tells R to compute differences within pairs and test whether the mean difference equals zero.
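The sketch below shows that a paired t-test is the same as a one-sample t-test on the within-pair differences. The before/after scores are made up for illustration.

```r
# Hypothetical before/after scores for 6 subjects (illustrative only)
before <- c(6.1, 5.4, 7.0, 4.8, 6.5, 5.9)
after  <- c(5.2, 5.0, 6.1, 4.9, 5.6, 5.1)

paired     <- t.test(after, before, paired = TRUE)
one_sample <- t.test(after - before, mu = 0)

# Same t-statistic and p-value; df = number of pairs - 1 = 5
all.equal(unname(paired$statistic), unname(one_sample$statistic))   # TRUE
all.equal(paired$p.value, one_sample$p.value)                       # TRUE
unname(paired$parameter)                                            # 5
```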
Question 125
In a two-proportion test where the success/failure condition is barely met (np̂ = 10), you should:
- Proceed and report results as you normally would
- Note that the condition is just met and interpret results cautiously
- Use a t-test instead
- Double the sample size before analyzing
Answer: b)
np̂ = 10 is the minimum threshold. The normal approximation will be adequate but not ideal. It is good practice to note the borderline condition and interpret with appropriate caution.
Question 126
A scatterplot of residuals vs. fitted values should show:
- A clear curved pattern
- Randomly scattered points with no pattern
- A strong positive linear pattern
- All residuals close to zero
Answer: b)
A random scatter (no pattern) indicates that the linearity and equal variance conditions are met. Any systematic pattern (curve, fan, etc.) signals a violation.
Question 127
Which of the following is a correct statement about p-values?
- p = 0.05 means H₀ has a 5% chance of being true
- A very small p-value means a large effect
- p-value is the probability of the observed data (or more extreme) given H₀ is true
- p < 0.05 always implies the result is important
Answer: c)
This is the correct definition. p-values say nothing about the probability of H₀ being true (a), the effect size (b), or practical importance (d).
Question 128
A 90% CI is wider than a 99% CI.
- True
- False — a 99% CI is wider
- They are the same width
- It depends on the sample size
Answer: b)
Higher confidence level → larger critical value → wider CI. A 99% CI (\(z^* = 2.576\)) is wider than a 90% CI (\(z^* = 1.645\)).
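The critical values quoted above come from the standard normal quantile function. A short sketch:

```r
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical value: leave (1 - level) / 2 in each tail
z_star <- qnorm(1 - (1 - conf_levels) / 2)

round(z_star, 3)   # 1.645 1.960 2.576
```

Because the margin of error is z* × SE, the interval widens as the confidence level rises when everything else is held fixed.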
Question 129
When a 95% CI for the difference in two means is (−2, 8), we:
- Conclude there is a significant difference (α = 0.05)
- Cannot conclude there is a significant difference (interval includes 0)
- Conclude the means are equal
- Need to know the sample size to interpret
Answer: b)
The CI contains 0, meaning 0 is a plausible value for the difference. We fail to reject H₀: μ₁ = μ₂ at α = 0.05.
Question 130
The mean of the sampling distribution of \(\bar{x}\) is:
- The sample mean \(\bar{x}\)
- The standard error σ/√n
- The population mean μ
- Zero
Answer: c)
The sampling distribution of \(\bar{x}\) is centered at the population mean μ. This is why \(\bar{x}\) is an unbiased estimator of μ.
Question 131
If H₀: p = 0.40 and the sample has \(\hat{p} = 0.40\), the test statistic z =:
- 0
- 1
- Undefined
- It depends on n
Answer: a)
\(z = (\hat{p} - p_0)/SE = (0.40 - 0.40)/SE = 0/SE = 0\). The observed proportion exactly matches the null, so there is no deviation to speak of.
Question 132
In ANOVA, MSG (Mean Square Between Groups) estimates:
- The variance within groups
- The variance explained by group differences
- The total variance
- The residual variance
Answer: b)
MSG = SS_between / df_between. It captures variability due to group membership. MSE captures variability within groups. The F-ratio compares them.
Question 133
A correlation of r = 0 means:
- There is no relationship between x and y
- There is no linear relationship between x and y
- x and y are perfectly related
- The regression slope is 1
Answer: b)
r measures the strength of the linear relationship. r = 0 could still be consistent with a strong nonlinear (e.g., quadratic) relationship.
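A tiny constructed example makes the point: below, y is completely determined by x, yet the correlation is zero because the relationship is quadratic rather than linear.

```r
x <- -3:3    # values symmetric around 0
y <- x^2     # perfect, but nonlinear, relationship

r <- cor(x, y)
r   # 0
```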
Question 134
For a regression line, the predicted value at x = \(\bar{x}\) (the mean of x) is always:
- 0
- b₀ (the intercept)
- \(\bar{y}\) (the mean of y)
- R²
Answer: c)
The regression line always passes through \((\bar{x}, \bar{y})\). Plugging x = \(\bar{x}\) into \(\hat{y} = b_0 + b_1 x\) gives \(\hat{y} = \bar{y}\).
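This property can be confirmed on simulated data. The ages and heart rates below are generated at random purely for illustration; they are not from any study in this exam.

```r
set.seed(1)
x <- runif(30, min = 20, max = 70)         # hypothetical ages
y <- 80 - 0.2 * x + rnorm(30, sd = 5)      # hypothetical resting heart rates

fit <- lm(y ~ x)

# Prediction at the mean of x equals the mean of y
pred_at_mean <- predict(fit, newdata = data.frame(x = mean(x)))
all.equal(unname(pred_at_mean), mean(y))   # TRUE
```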
Question 135
A hospital has a 3% surgical complication rate. A quality control audit of 30 surgeries finds 2 complications (6.7%). What is the most important reason to be cautious about testing this?
- 2 complications is a very small number
- The success/failure condition: np₀ = 30 × 0.03 = 0.9 < 10
- The sample size is too large
- The complication rate is too high
Answer: b)
np₀ = 30 × 0.03 = 0.9 << 10. The success/failure condition is badly violated, so the normal approximation cannot be trusted. With so few expected complications, the binomial distribution is extremely skewed and the z-test is not reliable.
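When the condition fails, one option is an exact binomial calculation instead of the z-test. The sketch below assumes a one-sided alternative (complication rate above 3%), which the question does not specify.

```r
n  <- 30     # surgeries audited
x  <- 2      # complications observed
p0 <- 0.03   # claimed complication rate

expected_successes <- n * p0   # 0.9, far below 10

# Exact P(X >= 2) when X ~ Binomial(30, 0.03)
p_upper <- 1 - pbinom(x - 1, size = n, prob = p0)

# binom.test() gives the same one-sided p-value
exact <- binom.test(x, n, p = p0, alternative = "greater")
all.equal(p_upper, exact$p.value)   # TRUE
```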
Question 136
In a two-sample t-test, the degrees of freedom (Welch approximation) are:
- n₁ + n₂
- n₁ + n₂ − 1
- n₁ + n₂ − 2
- A complex formula, typically smaller than n₁ + n₂ − 2
Answer: d)
The Welch-Satterthwaite df formula produces a non-integer value that is ≤ n₁ + n₂ − 2 (the pooled df). R computes this automatically.
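For reference, the Welch-Satterthwaite calculation can be sketched as below. The function name welch_se_df and the summary statistics are hypothetical.

```r
# SE of the difference and Welch-Satterthwaite df from summary statistics
welch_se_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1   # estimated variance of the first sample mean
  v2 <- s2^2 / n2   # estimated variance of the second sample mean
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(se = se, df = df)
}

# Hypothetical groups with unequal spreads and sizes
out <- welch_se_df(s1 = 4, n1 = 20, s2 = 9, n2 = 30)
out

# Welch df lies between min(n1, n2) - 1 and n1 + n2 - 2
out[["df"]] <= 20 + 30 - 2   # TRUE
```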
Question 137
The residual in a regression equals:
- y − ŷ
- ŷ − y
- y − ȳ
- x − x̄
Answer: a)
Residual = observed − predicted = \(y_i - \hat{y}_i\). Positive residuals mean the model underpredicts; negative residuals mean overprediction.
Question 138
If p-value = 0.001 and α = 0.05, we:
- Fail to reject H₀
- Reject H₀
- Accept H₀
- Cannot make a decision
Answer: b)
0.001 < 0.05 = α, so we reject H₀. We never “accept” H₀ — we can only reject or fail to reject it.
Question 139
Which is the best description of “80% confidence interval”?
- 80% of all data falls in this interval
- If repeated many times, 80% of such intervals contain the true parameter
- There is an 80% chance the true mean equals the midpoint
- The interval is correct 80% of the time
Answer: b)
The confidence level is a long-run frequency property of the procedure, not a probability statement about any single interval.
Question 140
For a regression, the LINE conditions stand for:
- Likelihood, Independence, Normality, Error
- Linearity, Independence, Normal residuals, Equal variance
- Large n, Independence, Null hypothesis, Estimation
- Linearity, Intercept, Nonlinearity, Errors
Answer: b)
LINE: Linearity (linear relationship between x and y), Independence (observations independent), Normal residuals (residuals approximately normal), Equal variance (constant spread of residuals).
Question 141
A statistically significant ANOVA tells us:
- Which specific group means are different
- That the variation between groups is larger than chance alone would produce, given the variation within groups
- All groups are significantly different from each other
- The F-statistic is greater than 5
Answer: b)
A significant ANOVA (F large, p small) only tells us that group means are not all equal. Post-hoc tests are needed to identify which groups differ.
Question 142
A chi-square test for homogeneity is used when:
- One sample is drawn and two categorical variables are measured
- Multiple independent samples are drawn and one categorical outcome is measured
- Means from multiple groups are compared
- Proportions are compared using a z-test
Answer: b)
Homogeneity: researcher fixes group sizes (row totals) and measures one outcome. Independence: one sample, two variables measured on each person.
Question 143
In a 4×3 contingency table, df =:
- (4−1) × (3−1) = 6
- (4 × 3) − 1 = 11
- 4 + 3 = 7
- 4 × 3 = 12
Answer: a)
\(df = (r-1)(c-1) = (4-1)(3-1) = 3 \times 2 = 6\).
Question 144
All else equal, a larger effect size leads to:
- Lower power
- Higher power
- A larger Type I error rate
- A wider confidence interval
Answer: b)
Larger effects are easier to detect. Higher power means we are more likely to correctly reject a false H₀.
Question 145
The standard normal distribution is used (instead of t) for inference about proportions because:
- Proportions are always normally distributed
- The SE formula for proportions comes from the binomial distribution, which is approximately normal for large n
- t-distributions cannot be used with categorical data
- Proportions have smaller standard errors than means
Answer: b)
The binomial distribution is well approximated by the normal for large n (success/failure condition). Because the SE \(\sqrt{p(1-p)/n}\) depends only on p and n, there is no separate σ to estimate, so we use z (not t, which accounts for estimating an unknown σ).
Question 146
A researcher reports r = 0.65 and p = 0.002 for a correlation between diet quality and cognitive test scores. The correct interpretation is:
- Diet quality causes higher cognitive scores
- There is a moderately strong positive linear relationship; diet quality explains approximately 42% of variability in cognitive scores
- There is a 65% correlation, meaning 65% of the variation in cognitive scores is due to diet
- The p-value proves a causal relationship
Answer: b)
r = 0.65 indicates a moderately strong positive linear relationship. \(R^2 = r^2 = 0.4225\), so about 42% of the variability is explained. Correlation alone does not establish causation.
Question 147
Adding more explanatory variables to a regression model always:
- Decreases R²
- Increases or maintains R²
- Decreases the residual standard error
- Increases the significance of the slope
Answer: b)
Adding predictors never decreases R² (it can only stay the same or increase), which is why adjusted R² is preferred when comparing models with different numbers of predictors.
Question 148
The null hypothesis for a chi-square test of independence is:
- The two variables are perfectly correlated
- The two variables are independent (no association)
- All expected counts equal all observed counts
- The test statistic equals the degrees of freedom
Answer: b)
H₀: the two categorical variables are independent (knowing the value of one tells you nothing about the other). Option c would be perfect fit, not independence.
Question 149
When the population is normal, a one-sample t-test is valid for:
- Any sample size
- Only n ≥ 30
- Only n ≥ 10
- Only large samples
Answer: a)
When the population is truly normal, the t-test is exact for any n. The n ≥ 30 rule is a practical guideline for when the CLT applies to non-normal populations.
Question 150
A result is practically significant if:
- p < 0.05
- The effect is large enough to matter in real-world terms
- The study was well-designed
- The confidence interval does not include 0
Answer: b)
Practical significance is a judgment about whether the effect size is meaningful, not a statistical threshold. It requires domain knowledge, not just a p-value.
PART 2: SHORT ANSWER (50 questions, 2 points each)
For all questions, write complete sentences. Provide both a statistical conclusion and a plain-language interpretation.
SA1
The sampling distribution of \(\bar{x}\) for samples of size n = 36 from a population with μ = 70 and σ = 18 has what mean and standard error? Describe in one sentence what this distribution represents.
Mean = 70; \(SE = 18/\sqrt{36} = 3\).
This distribution represents all possible sample means we could get if we repeatedly drew samples of size 36 from this population — most would be near 70, and about 95% would fall within ±6 (i.e., two standard errors) of 70.
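A short simulation can make the sampling distribution concrete. The sketch below assumes, purely for illustration, that the population is normal; the question itself does not specify the population shape.

```r
mu <- 70; sigma <- 18; n <- 36
se <- sigma / sqrt(n)   # theoretical standard error: 3

set.seed(7)
# Draw 10,000 samples of size 36 and record each sample mean
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbars)                        # close to 70
sd(xbars)                          # close to 3
mean(abs(xbars - mu) <= 2 * se)    # close to 0.95
```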
SA2
A 95% CI for mean recovery time is (6.4, 9.2) days. A doctor claims “most patients recover in 6 to 9 days.” Is this a correct interpretation of the CI? Explain.
No. The CI is a statement about the population mean, not individual patients. It says we are 95% confident the true mean recovery time lies between 6.4 and 9.2 days. Individual recovery times will vary much more widely around that mean — many patients could recover outside this range. To describe individual patient variation, you would use a prediction interval, not a confidence interval.
SA3
A study testing whether a new antidepressant reduces depression scores gets t = −1.85, df = 28, p = 0.075 (two-sided). Write a complete conclusion at α = 0.05, including what this p-value means.
Statistical conclusion: p = 0.075 > 0.05, so we fail to reject H₀ at α = 0.05. There is insufficient evidence that the antidepressant significantly reduces depression scores.
Interpretation of p-value: If the drug had no true effect (H₀ true), there would be a 7.5% chance of observing a mean difference at least as extreme as this one, in either direction, just by sampling variability.
Note: This is not strong evidence that the drug doesn’t work — the study may simply lack sufficient power to detect a real effect.
SA4
Define Type I and Type II errors. In the context of testing whether a new drug lowers blood pressure, describe a real-world consequence of each error type.
Type I error (false positive, probability = α): Rejecting H₀ when it is true — concluding the drug lowers blood pressure when it actually does not. Consequence: the drug may be approved and prescribed, exposing patients to side effects and costs with no real benefit.
Type II error (false negative, probability = β): Failing to reject H₀ when it is false — concluding there is insufficient evidence that the drug works, even though it actually does. Consequence: an effective treatment is abandoned; patients continue to suffer uncontrolled hypertension.
SA5
A study with n = 5,000 finds that people who exercise 3 hours/week have 0.5% lower resting heart rate than sedentary people (p = 0.03). Comment on both statistical and practical significance.
Statistically significant: p = 0.03 < 0.05, so we reject H₀. The difference is unlikely due to chance.
Practically insignificant: A 0.5% difference in resting heart rate (e.g., ~0.35 bpm if mean is 70 bpm) is far too small to be clinically meaningful. With n = 5,000, even trivially small effects achieve statistical significance. Clinicians would not change treatment recommendations based on a 0.35 bpm difference. This is a classic example where statistical significance overstates the importance of the finding.
SA6
Use the following output to answer this question:
Paired t-test
data: post - pre
t = 4.12, df = 39, p-value = 0.00018
95 percent confidence interval:
2.10 6.30
sample estimates:
mean of x
4.20
This test compares sleep quality scores (0–10) before and after a sleep hygiene program for 40 participants. Write a complete conclusion with interpretation.
Hypotheses: H₀: mean difference = 0; Hₐ: mean difference ≠ 0.
Statistical conclusion: t(39) = 4.12, p = 0.00018 < 0.05. We reject H₀. There is very strong evidence that sleep quality scores changed significantly after the program.
Interpretation: The mean increase in sleep quality score was 4.20 points. We are 95% confident the true mean improvement is between 2.10 and 6.30 points on the 0–10 scale. Since the CI is entirely positive, participants improved. An improvement of 2–6 points on a 10-point scale would likely be considered clinically meaningful.
SA7
Explain why a paired design is more powerful than an independent samples design for measuring change over time within the same subjects.
In a paired design, we analyze within-subject differences, which removes between-subject variability. People naturally differ in their baseline values (e.g., some have inherently higher blood pressure). In an independent samples design, this between-person variation inflates the SE, making it harder to detect the treatment effect. By looking only at each person’s own change, the paired test eliminates this noise — the SE of the differences is often much smaller than the SE of the group means, leading to a larger t-statistic and greater power.
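The effect of pairing on the standard error can be seen with a small numerical sketch. The six pre/post readings below are hypothetical values invented for illustration (they are not course data); the point is only to compare the SE of the within-person differences with the SE a two-sample (Welch-style) analysis would use on the same numbers.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical systolic BP readings for six people (illustrative only).
pre = [120, 135, 150, 110, 142, 128]
post = [115, 131, 144, 106, 137, 124]

# Paired analysis: work with each person's own change.
diffs = [a - b for a, b in zip(pre, post)]
se_paired = stdev(diffs) / sqrt(len(diffs))

# Independent-samples analysis: between-person spread stays in the SE.
se_independent = sqrt(stdev(pre) ** 2 / len(pre) + stdev(post) ** 2 / len(post))

print(f"mean change      = {mean(diffs):.2f}")
print(f"SE (paired)      = {se_paired:.2f}")
print(f"SE (independent) = {se_independent:.2f}")
```

Because baseline values vary far more between people than the change does within a person, the paired SE comes out much smaller here, which is the source of the larger t-statistic.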
SA8
A Welch two-sample t-test gives t = 2.14, df = 45.2, p = 0.037. The 95% CI for the difference is (0.18, 5.82). Interpret the CI and comment on practical significance for weight loss in kg.
CI interpretation: We are 95% confident the true difference in mean weight loss between the two groups is between 0.18 and 5.82 kg. Since the interval excludes 0, there is a statistically significant difference at α = 0.05.
Practical significance: The very wide CI reveals substantial uncertainty. The lower bound (0.18 kg ≈ 6 oz) is trivially small and clinically irrelevant, while the upper bound (5.82 kg) would be clinically meaningful. With this much uncertainty, we cannot confidently say whether the treatment produces a practically important weight difference. A larger study is needed to narrow the CI and determine whether the effect is clinically relevant.
SA9
A power analysis suggests you need n = 120 participants per group to detect an effect size of d = 0.40 with 80% power at α = 0.05. Your budget allows only n = 60 per group. Describe two consequences of proceeding with the smaller sample.
1. Reduced power: With n = 60 instead of 120, power will be well below 80% (roughly 55–60% for d = 0.40). There is a substantially higher probability of failing to detect a real effect (Type II error), meaning the study may conclude “no effect” even if the treatment truly works.
2. Less precise estimates: Confidence intervals will be wider, providing less informative estimates of effect size. Even if the result is statistically significant, the CI will span a large range, making it difficult to assess whether the effect is practically meaningful.
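The "roughly 55–60%" figure can be sanity-checked with a normal approximation to the two-sample test. This is a rough sketch, not the exact t-based calculation a power routine would perform; `approx_power` is a hypothetical helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under the true effect
    # Probability the statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

power_60 = approx_power(d=0.40, n_per_group=60)
print(f"approximate power with n = 60 per group: {power_60:.2f}")
```

Under this approximation the result falls inside the 55–60% range quoted above; an exact calculation based on the t-distribution would differ slightly.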
SA10
A researcher states: “Our study failed to detect a significant effect (p = 0.15), proving that the treatment doesn’t work.” Identify and explain the flaw.
Flaw: Failing to reject H₀ is not the same as proving H₀ is true. “Absence of evidence is not evidence of absence.”
A non-significant result (p = 0.15) could occur because: (1) the treatment truly has no effect, OR (2) the study was underpowered — the sample was too small to detect a real effect. Without knowing the power of the study, we cannot interpret the non-significant result as proof that the treatment doesn’t work. The researcher should report a confidence interval and discuss whether the study was adequately powered to detect a clinically meaningful effect.
SA11
Use this regression output for SA11–SA15:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 95.200 12.800 7.44 <2e-16 ***
salt_intake 1.820 0.410 4.44 0.0001 ***
Multiple R-squared: 0.289
Residual standard error: 8.3 on 73 degrees of freedom
This model predicts systolic blood pressure (mmHg) from daily salt intake (grams).
Write the regression equation and interpret the slope in biological terms.
Equation: \(\widehat{\text{BP}} = 95.2 + 1.82 \times \text{salt intake}\)
Slope interpretation: For each additional gram of daily salt intake, predicted systolic blood pressure increases by 1.82 mmHg on average. This suggests a positive association between salt intake and blood pressure. (Because this model has a single predictor, nothing else is being held constant.)
SA12
Interpret R² = 0.289 in context.
Daily salt intake explains approximately 28.9% of the variability in systolic blood pressure across participants. This means the linear relationship with salt accounts for less than a third of BP variation — other factors (age, genetics, exercise, medications, overall diet) explain the remaining ~71%.
SA13
Is salt intake a statistically significant predictor of blood pressure? Cite specific output values.
Yes. The t-statistic for the slope is t = 4.44 with p = 0.0001 < 0.05. We reject H₀: β₁ = 0. There is very strong evidence that salt intake is a statistically significant linear predictor of systolic blood pressure.
SA14
A patient consumes 10g of salt per day. What is the predicted systolic blood pressure? Show your work.
\(\hat{y} = 95.2 + 1.82 \times 10 = 95.2 + 18.2 = 113.4\) mmHg.
We predict a systolic blood pressure of 113.4 mmHg for someone consuming 10g of salt daily. This prediction is valid only if 10g is within the range of salt intakes observed in the study.
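The same prediction written as a small function, using the intercept and slope from the SA11 output. The function name `predict_bp` is a hypothetical label for this sketch.

```python
INTERCEPT = 95.2  # mmHg, from the regression output
SLOPE = 1.82      # mmHg per gram of daily salt

def predict_bp(salt_grams):
    """Predicted systolic BP from the fitted line (valid only inside the observed x range)."""
    return INTERCEPT + SLOPE * salt_grams

predicted = predict_bp(10)
print(f"{predicted:.1f} mmHg")
```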
SA15
A researcher concludes: “Reducing salt intake will lower blood pressure.” Is this conclusion justified? Explain.
Not fully justified. The regression output shows a statistically significant positive association between salt intake and blood pressure. However, unless this study involved random assignment of salt intake levels (an experiment), we cannot conclude causation. Confounders — such as overall diet quality, exercise habits, or socioeconomic status — could explain why people who eat more salt also have higher blood pressure. To justify a causal claim, a randomized controlled trial where participants are assigned to different sodium intake levels would be needed.
SA16
Use this ANOVA output for SA16–SA20:
Analysis of Variance Table
Df Sum Sq Mean Sq F value Pr(>F)
habitat 3 156.8 52.27 8.43 0.0001 ***
Residuals 96 595.2 6.20
This compares mean nest-building time (hours) for birds from 4 habitat types.
State the hypotheses and write a complete conclusion.
H₀: μ₁ = μ₂ = μ₃ = μ₄ (mean nest-building times are equal across all four habitat types)
Hₐ: At least one habitat type has a different mean nest-building time
Conclusion: F(3, 96) = 8.43, p = 0.0001 < 0.05. We reject H₀. There is very strong evidence that at least one habitat type has a different mean nest-building time.
SA17
Calculate the total sample size.
Total df = df_between + df_within = 3 + 96 = 99 = n − 1, so n = 100 total birds observed.
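As a quick check, every other entry of the ANOVA table can be rebuilt from the degrees of freedom and sums of squares. The sketch below uses only the numbers printed in the SA16 output.

```python
# Values copied from the ANOVA table above.
df_between, ss_between = 3, 156.8
df_within, ss_within = 96, 595.2

ms_between = ss_between / df_between  # Mean Sq for habitat
ms_within = ss_within / df_within     # Mean Sq for residuals
f_stat = ms_between / ms_within

n_groups = df_between + 1
n_total = df_between + df_within + 1

print(round(ms_between, 2), round(ms_within, 2), round(f_stat, 2))
print(n_groups, "groups,", n_total, "birds")
```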
SA18
Why would it be inappropriate to test all pairwise comparisons with individual t-tests after seeing this ANOVA result?
With 4 groups there are \(\binom{4}{2} = 6\) pairwise comparisons. If each is tested at α = 0.05 and the tests are treated as independent, the probability of at least one false positive is \(1 - (0.95)^6 \approx 0.26\), more than 5 times the intended error rate. This inflated Type I error means we would have a substantial chance of concluding differences exist even when all means are truly equal. Post-hoc procedures (like Tukey’s HSD) correct for this by adjusting the threshold for each comparison.
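The family-wise error calculation generalizes to any number of groups. The sketch below assumes the pairwise tests are independent (a simplification) and also shows the per-comparison threshold a Bonferroni correction would use.

```python
from math import comb

k_groups = 4
alpha = 0.05

n_pairs = comb(k_groups, 2)                # number of pairwise comparisons
familywise = 1 - (1 - alpha) ** n_pairs    # P(at least one false positive), assuming independence
bonferroni_alpha = alpha / n_pairs         # per-comparison threshold under Bonferroni

print(n_pairs, round(familywise, 3), round(bonferroni_alpha, 4))
```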
SA19
The ANOVA is significant. What does this tell us (and what does it NOT tell us)?
What it tells us: At least one habitat type has a significantly different mean nest-building time from at least one other. The between-habitat variation is much larger than within-habitat variation (F = 8.43).
What it does NOT tell us: Which specific habitats differ from each other. We know there are differences somewhere, but not whether it’s one habitat vs. all others, or multiple pairwise differences. Post-hoc tests are needed for that.
SA20
What additional analysis would be conducted, and what would it determine?
A post-hoc multiple comparison procedure (e.g., Tukey’s Honestly Significant Difference, or Bonferroni correction) would be conducted. It would test all pairwise comparisons (e.g., forest vs. meadow, forest vs. wetland, etc.) while controlling the family-wise Type I error rate. This identifies specifically which habitat pairs have significantly different mean nest-building times.
SA21
In a survey of 400 patients, 124 reported side effects from a medication (\(\hat{p} = 0.31\)). Calculate a 95% confidence interval for the true proportion. Verify the success/failure condition.
Success/Failure check:
- Successes: \(n\hat{p} = 400 \times 0.31 = 124 \geq 10\) ✅
- Failures: \(n(1-\hat{p}) = 400 \times 0.69 = 276 \geq 10\) ✅
SE: \(\sqrt{0.31 \times 0.69 / 400} = \sqrt{0.000534} \approx 0.0231\)
95% CI: \(0.31 \pm 1.96 \times 0.0231 = 0.31 \pm 0.045 = (0.265, 0.355)\)
We are 95% confident that between 26.5% and 35.5% of patients on this medication experience side effects.
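The condition check and the interval can be wrapped in two short helpers. This is a minimal sketch of the normal-approximation (Wald) interval used above; the names `wald_ci` and `success_failure_ok` are hypothetical.

```python
from math import sqrt

def success_failure_ok(successes, n, minimum=10):
    """True when both the success and failure counts reach the minimum."""
    return successes >= minimum and (n - successes) >= minimum

def wald_ci(successes, n, z_star=1.96):
    """Normal-approximation confidence interval for one proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

lower, upper = wald_ci(124, 400)
print(success_failure_ok(124, 400), round(lower, 3), round(upper, 3))
```

The same check applied to 4 cases out of 500 returns `False`, which is the situation described in SA25.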
SA22
A public health official claims “fewer than 25% of adults in our county are vaccinated against flu.” You survey 300 adults and find 66 vaccinated (\(\hat{p} = 0.22\)). Set up the hypotheses and describe what you need from R output to reach a conclusion.
H₀: p = 0.25 (vaccination rate equals 25%)
Hₐ: p < 0.25 (vaccination rate is less than 25%) — one-sided, matching the official’s claim
From R output, you need:
- The test statistic z
- The p-value for the one-sided alternative
- Verification that the success/failure condition is met (\(np_0 = 300 \times 0.25 = 75 \geq 10\) and \(n(1-p_0) = 300 \times 0.75 = 225 \geq 10\) ✅)
If the p-value is below 0.05, we reject H₀ and conclude there is evidence that fewer than 25% are vaccinated.
SA23
For testing H₀: p = 0.25 vs. Hₐ: p < 0.25 with \(\hat{p} = 0.22\) and n = 300, write the SE formula and explain why it uses p₀ rather than p̂.
Formula: \(SE = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.25 \times 0.75}{300}} = \sqrt{0.000625} = 0.025\)
Why p₀: Under the null hypothesis, we assume the true proportion is p = 0.25. The test asks: “If p really were 0.25, how surprising is our result?” So we compute the SE using the assumed (null) value. Using \(\hat{p}\) instead would mean we’re not testing against a specific claim — we’d be estimating, which is what we do for CIs.
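For reference, the test statistic and lower-tail p-value can be worked out by hand from the null SE. This is a sketch of the uncorrected z-test; software that applies a continuity correction by default will report a slightly different value.

```python
from math import sqrt
from statistics import NormalDist

p0, n, successes = 0.25, 300, 66

p_hat = successes / n
se_null = sqrt(p0 * (1 - p0) / n)   # SE computed from the null value, not from p_hat
z = (p_hat - p0) / se_null
p_value = NormalDist().cdf(z)       # lower tail, matching the alternative p < 0.25

print(f"SE = {se_null:.3f}, z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```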
SA24
Two hospitals report C-section rates: Hospital A: 180/800 = 22.5%; Hospital B: 210/700 = 30.0%. R output shows a 95% CI for the difference of (−0.118, −0.032). Interpret this CI.
We are 95% confident that the true difference in C-section rates (Hospital A − Hospital B) is between −11.8 and −3.2 percentage points. Since the entire interval is negative, Hospital A’s true C-section rate is significantly lower than Hospital B’s at α = 0.05. The magnitude of the difference is between about 3 and 12 percentage points, which represents a clinically meaningful gap in surgical practice.
SA25
The success/failure condition fails for a rare disease study (only 4 cases out of 500 patients tested). Explain the implication and what you might do instead.
When \(n\hat{p} = 4 < 10\), the sampling distribution of \(\hat{p}\) is highly skewed (not approximately normal), so the z-based normal approximation is unreliable. The resulting CIs and p-values could be inaccurate.
Alternatives: (1) Use Fisher’s Exact Test for 2×2 tables, which does not rely on the normal approximation; (2) use exact binomial methods for single proportions; (3) collect more data until the condition is met; (4) report the raw counts and use exact methods in the analysis.
SA26
Use the following for SA26–SA30:
Pearson's Chi-squared test
data: exercise_obesity
X-squared = 9.87, df = 1, p-value = 0.0017
| | Obese | Not Obese | Total |
|---|---|---|---|
| Exercise regularly | 45 | 255 | 300 |
| Does not exercise | 80 | 220 | 300 |
| Total | 125 | 475 | 600 |
Calculate the expected count for “Exercise regularly / Obese” and check conditions.
\(E = \frac{300 \times 125}{600} = \frac{37500}{600} = 62.5\)
All expected counts:
- Exercise / Obese: 62.5 ✅ (≥ 5)
- Exercise / Not Obese: 237.5 ✅
- No exercise / Obese: 62.5 ✅
- No exercise / Not Obese: 237.5 ✅
Conditions are met. Note: the observed count (45) is notably lower than expected (62.5), contributing to a large chi-square.
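All four expected counts follow from the same rule, (row total × column total) / grand total. A minimal sketch applied to the observed table; `expected_counts` is a hypothetical helper name.

```python
# Observed counts: rows = exercise / no exercise, columns = obese / not obese.
observed = [[45, 255], [80, 220]]

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
all_at_least_5 = all(e >= 5 for row in expected for e in row)
print(expected, all_at_least_5)
```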
SA27
Write a complete conclusion based on the R output.
Statistical conclusion: \(\chi^2(1) = 9.87\), p = 0.0017 < 0.05. We reject H₀ of independence. There is strong evidence of a statistically significant association between exercise status and obesity.
Plain language: People who do not exercise regularly are more likely to be classified as obese (80/300 ≈ 26.7%) than those who exercise regularly (45/300 = 15.0%). This association is unlikely to be due to chance alone.
SA28
Is this a test of independence or homogeneity? Explain.
Test of independence. The description implies one random sample of 600 individuals was drawn, and both exercise status and obesity status were measured on each person. When both categorical variables are measured on the same sample (neither margin was fixed in advance by the researcher), we use the test of independence.
If the researcher had recruited 300 exercisers and 300 non-exercisers separately and then measured obesity in each group, it would be homogeneity.
SA29
A student says “Since the chi-square test is significant, we can conclude that lack of exercise causes obesity.” Respond.
This conclusion is not justified. The chi-square test detects a statistically significant association between exercise and obesity, but cannot establish causation. This appears to be an observational study — participants chose whether to exercise, they were not randomly assigned. Numerous confounders could explain the relationship: diet, genetics, occupation, socioeconomic status, or underlying health conditions. To support causation, we would need a randomized controlled experiment where participants are assigned to exercise or not exercise, with all other factors controlled.
SA30
The relative risk (RR) of obesity for non-exercisers vs. exercisers is (80/300)/(45/300). Calculate and interpret this RR. Does the chi-square test provide this information?
\(RR = \frac{80/300}{45/300} = \frac{0.267}{0.150} = 1.78\)
Interpretation: Non-exercisers are 1.78 times as likely to be obese as those who exercise regularly — a 78% higher relative risk of obesity.
Does chi-square provide this? No. The chi-square test only tells us whether the association is statistically significant. It does not quantify the strength or direction of the association. The RR (or odds ratio) is needed to describe the magnitude of the relationship.
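The relative risk is a ratio of two sample proportions, so it is easy to compute directly from the table counts. A short sketch; `relative_risk` is a hypothetical helper name.

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Risk in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)

# Non-exercisers (80 of 300 obese) versus exercisers (45 of 300 obese).
rr = relative_risk(80, 300, 45, 300)
print(round(rr, 2))
```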
SA31
Explain in plain language what a p-value is and what it is not. Give an example of a common misinterpretation and correct it.
What it is: The p-value is the probability of observing data as extreme as (or more extreme than) what we saw, assuming the null hypothesis is true. It measures how surprising our result would be if H₀ were correct.
What it is not: It is NOT the probability that H₀ is true, nor the probability that the result is due to chance, nor the probability that the alternative hypothesis is true.
Common misinterpretation: “p = 0.03 means there is only a 3% chance this result was due to chance.”
Correction: p = 0.03 means: if H₀ were true, we would observe a result this extreme only 3% of the time. It says nothing about the probability that H₀ is true.
SA32
A 95% CI for a regression slope is (0.3, 1.2). Interpret this interval and state what you can conclude about the significance of the predictor.
Interpretation: We are 95% confident the true slope is between 0.3 and 1.2. For each one-unit increase in x, the predicted y increases by somewhere between 0.3 and 1.2 units (on average) in the population.
Significance: Since the entire CI is positive and does not include 0, the slope is statistically significantly different from zero at α = 0.05 (two-sided). This means x is a statistically significant linear predictor of y.
SA33
Describe the three main conditions for valid t-tests and explain what happens if they are violated.
1. Random sampling / Independence: Observations must be independent. Violation (e.g., clustered data, repeated measures handled incorrectly) leads to artificially small standard errors and inflated Type I error rates.
2. Normality (or large n): Either the population is approximately normal, or n is large enough for the CLT to apply (generally n ≥ 30). For small samples from highly skewed populations, the t-test can produce inaccurate p-values.
3. For two-sample tests — independence between groups: The two groups must not be paired or related. Ignoring pairing (using two-sample instead of paired t) inflates the SE and reduces power.
SA34
Explain the relationship between confidence intervals and hypothesis tests: how can you use a CI to make a hypothesis test decision?
A 95% CI and a two-sided hypothesis test at α = 0.05 are mathematically equivalent:
- If the null value falls inside the 95% CI → fail to reject H₀ at α = 0.05
- If the null value falls outside the 95% CI → reject H₀ at α = 0.05
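For example: if testing H₀: μ = 0 and the 95% CI is (2.1, 5.8), then 0 is not in the interval → reject H₀. For means, this works because both procedures use the same critical value (\(t^*\), or \(z^* = 1.96\)) and the same SE. For proportions the agreement is close but not exact, because the test computes the SE from p₀ while the CI uses p̂.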
The CI is generally more informative because it also tells you the magnitude of the effect, not just whether it’s significant.
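The decision rule is mechanical enough to write as a one-line check. The sketch below assumes a two-sided test whose α matches the CI's confidence level; `decide_from_ci` is a hypothetical name.

```python
def decide_from_ci(ci, null_value):
    """Two-sided test decision implied by a confidence interval at the matching level."""
    lower, upper = ci
    if lower <= null_value <= upper:
        return "fail to reject H0"
    return "reject H0"

print(decide_from_ci((2.1, 5.8), 0))    # the example above
print(decide_from_ci((-1.8, 3.9), 0))   # an interval that contains 0
```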
SA35
A study finds no significant difference between two drugs (p = 0.18). A pharmaceutical company says “this proves the drugs are equally effective.” Write a rebuttal in at most three sentences.
Failing to reject H₀ is not the same as proving H₀ is true — absence of evidence is not evidence of absence. The non-significant result (p = 0.18) could easily reflect a study that was underpowered to detect a real difference, rather than evidence that no difference exists. To support a claim of equivalence, the study would need to use an equivalence testing framework and demonstrate with a confidence interval that any possible difference is too small to be practically meaningful.
SA36
Compare the chi-square test for independence and the two-proportion z-test. When would you prefer each? What is the mathematical relationship for a 2×2 table?
Use two-proportion z-test when: you have exactly two groups with a binary outcome and want a directional (one-sided) test, or want a CI for the difference in proportions.
Use chi-square when: you have more than two categories in either variable, when you want to describe overall association without a directional hypothesis, or when a one-sided test is not meaningful.
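Mathematical relationship for 2×2 tables: \(\chi^2 = z^2\) (and \(z = \pm\sqrt{\chi^2}\)), provided z uses the pooled proportion in its SE and neither test applies a continuity correction. Under those conditions both tests produce the same two-sided p-value for a 2×2 table.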
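The identity \(\chi^2 = z^2\) can be verified numerically. The counts below are hypothetical (30 of 100 versus 50 of 100, chosen only for illustration), z uses the pooled proportion, and no continuity correction is applied to either statistic.

```python
from math import sqrt

# Hypothetical 2x2 data: successes and sample sizes for two groups.
x1, n1 = 30, 100
x2, n2 = 50, 100

# Two-proportion z statistic with the pooled SE.
pooled = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Pearson chi-square on the same table, without continuity correction.
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(2)
    for j in range(2)
)

print(round(z ** 2, 4), round(chi_sq, 4))
```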
SA37
A researcher presents regression output with R² = 0.95 and significant predictors, but the residual plot shows a clear U-shape. What concern does this raise?
A U-shaped residual plot indicates that the linearity condition is violated — the true relationship between x and y is nonlinear, and a straight line is not an adequate model. Despite the high R², the model is systematically wrong: it overpredicts in the middle range of x and underpredicts at the extremes (or vice versa). This means the inference results (p-values, CIs for the slope) are not valid, because they rely on the assumption that the linear model is correctly specified. A nonlinear model (e.g., quadratic) or transformation of variables would be more appropriate.
SA38
Interpret the following ANOVA result for a study comparing 5 diets:
Df Sum Sq Mean Sq F value Pr(>F)
diet 4 1240 310 2.15 0.076
Residuals 95 13690 144
Write a complete conclusion and comment on α = 0.05 vs. α = 0.10 decisions.
Conclusion at α = 0.05: F(4, 95) = 2.15, p = 0.076 > 0.05. We fail to reject H₀. There is insufficient evidence at the 5% significance level that mean outcomes differ across the five diets.
At α = 0.10: p = 0.076 < 0.10, so we would reject H₀ and conclude significant differences exist at the 10% level.
Commentary: The p-value is borderline. The choice of α matters here. Most scientific fields use α = 0.05 as the standard. The study may be underpowered; examining effect sizes and CIs for each diet comparison would provide more context than the p-value alone.
SA39
A study reports Cramér’s V = 0.08 from a chi-square test with p < 0.001 and n = 10,000. Explain what this tells us.
This result illustrates the distinction between statistical and practical significance. The chi-square test is highly significant (p < 0.001), meaning we are very confident there is a real association between the two categorical variables in the population. However, Cramér’s V = 0.08 indicates a very weak association — far below the 0.3 threshold for a moderate effect. With n = 10,000, even trivially small associations become statistically detectable. In practical terms, knowing one variable tells us almost nothing about the other. This association, while real, is likely not meaningful for decision-making.
SA40
Explain why the standard error in the denominator of the one-proportion z-test is built from p₀(1-p₀)/n rather than p̂(1-p̂)/n.
The z-test asks: Assuming H₀ is true (p = p₀), how surprising is our observed p̂? Under H₀, the true standard deviation of p̂ is \(\sqrt{p_0(1-p_0)/n}\) — not \(\sqrt{\hat{p}(1-\hat{p})/n}\). Using \(p_0\) is consistent with the logic of hypothesis testing: we evaluate the evidence against H₀ from H₀’s perspective, not from our sample’s perspective. For confidence intervals, we have no assumed value for p, so we estimate it with \(\hat{p}\) — hence the different SE formula.
SA41
A regression line has \(\hat{y} = 50 + 2.5x\). For a new observation (x = 10, y = 73), calculate the residual and explain what it means.
\(\hat{y} = 50 + 2.5(10) = 50 + 25 = 75\)
Residual \(= y - \hat{y} = 73 - 75 = -2\)
Interpretation: The observed value (73) is 2 units below what the model predicted (75). The model slightly overpredicts for this individual. A negative residual means the model predicts higher than what actually occurred.
SA42
In an ANOVA comparing mean reaction times for 3 groups, MSE = 250 and MSG = 750. What is F? Interpret it.
\(F = MSG/MSE = 750/250 = 3.0\)
Interpretation: The variability between group means is 3 times as large as the typical variability within groups. Whether this is statistically significant depends on the degrees of freedom and the p-value from the F-distribution (not given here, consistent with the final exam format where p-values come from R output).
SA43
A 95% CI for a proportion is (0.48, 0.64). Someone asks: “Does this mean there’s a 95% chance the true proportion is between 0.48 and 0.64?” How do you respond?
No — this is a common misinterpretation. The true proportion is a fixed (though unknown) value, not a random variable. It either is or isn’t between 0.48 and 0.64 — we just don’t know which. The correct interpretation is: the interval (0.48, 0.64) was constructed using a procedure that, if repeated many times with different samples, would capture the true proportion in 95% of the resulting intervals. The 95% refers to the long-run success rate of the procedure, not to any probability about this specific interval.
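The "long-run success rate" reading can be illustrated by simulation: fix a true proportion, draw many samples, build an interval from each, and count how often the interval captures the truth. The true proportion (0.56), sample size (100), number of repetitions, and seed are all arbitrary choices made for this sketch.

```python
import random
from math import sqrt

random.seed(7)   # fixed seed so the run is reproducible
TRUE_P = 0.56    # hypothetical true proportion (unknown in real studies)
N = 100          # hypothetical sample size
REPS = 2000      # number of simulated studies

covered = 0
for _ in range(REPS):
    successes = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    se = sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - 1.96 * se <= TRUE_P <= p_hat + 1.96 * se:
        covered += 1

coverage = covered / REPS
print(f"share of intervals capturing the true proportion: {coverage:.3f}")
```

Each individual interval either contains 0.56 or does not; it is the share across repetitions that sits near 95%.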
SA44
A paired t-test gives t = 0.72, df = 24, p = 0.479 (two-sided). The 95% CI is (−1.8, 3.9). Write a complete conclusion and discuss whether the lack of significance means “no effect.”
Conclusion: t(24) = 0.72, p = 0.479 > 0.05. We fail to reject H₀. There is insufficient evidence of a significant mean difference.
Does this mean no effect? Not necessarily. The 95% CI (−1.8, 3.9) is quite wide and spans negative values, zero, and positive values, so the data are compatible with a mean change in either direction as well as with no change. This range includes potentially meaningful effects in either direction. With only 25 pairs (df = 24), the study likely lacks sufficient power to detect a real effect if it exists. A well-powered study with n large enough to narrow the CI would be needed before concluding there is truly no effect.
SA45
Explain what an “influential point” is in regression. How does it differ from an outlier in the y-direction? Why does it matter for inference?
An influential point is an observation that, when removed, substantially changes the regression slope. Influential points typically have an extreme x-value (high leverage): they sit far from the mean of x. A point can be influential without having a large residual, because the fitted line is pulled toward it.
An outlier in y has a large residual (its observed y is far from the regression line) but may not be influential if it has an average x-value.
Why it matters: If an influential point drives the slope, then removing that one observation would lead to very different conclusions. Inference results (significance of slope, CI for slope) may hold only because of that point’s influence, not because of the overall pattern. Sensitivity analyses should check results with and without influential points.
SA46
A clinical trial randomizes 200 patients to vaccine (n = 100) or placebo (n = 100). Vaccine: 5 infections; Placebo: 18 infections.
2-sample test for equality of proportions
X-squared = 7.92, df = 1, p-value = 0.0049
95 percent confidence interval:
-0.195 -0.035
sample estimates:
prop 1 prop 2
0.050 0.180
Write a complete statistical analysis: check conditions, state hypotheses, interpret output, and write conclusions.
Conditions:
- Vaccine: \(n\hat{p}_1 = 100 \times 0.05 = 5 < 10\) ⚠️ (condition not met; note this caveat); \(n(1-\hat{p}_1) = 95 \geq 10\) ✅
- Placebo: \(n\hat{p}_2 = 100 \times 0.18 = 18 \geq 10\) ✅; \(n(1-\hat{p}_2) = 82 \geq 10\) ✅
- Independence: random assignment ✅
The success/failure condition is not fully met for the vaccine group (only 5 infections). Results should be interpreted with some caution; exact methods would be more reliable.
Hypotheses: H₀: p_vaccine = p_placebo; Hₐ: p_vaccine ≠ p_placebo (two-sided)
Conclusion: \(\chi^2(1) = 7.92\), p = 0.0049 < 0.05. We reject H₀. There is strong evidence that infection rates differ between the vaccine and placebo groups.
Plain language: The infection rate was significantly lower in the vaccine group (5%) than the placebo group (18%). We are 95% confident the true reduction in infection rate due to vaccination is between 3.5 and 19.5 percentage points — a practically meaningful benefit.
SA47
ANOVA output compares cholesterol for 4 diets (n = 25/group). F = 5.23, p = 0.002. Means: Diet 1 = 185, Diet 2 = 195, Diet 3 = 200, Diet 4 = 205.
a) Write the conclusion. b) Does this tell you which diets differ? What would you do next? c) Can you conclude Diet 1 is best?
a) F(3, 96) = 5.23, p = 0.002 < 0.05. We reject H₀: μ₁ = μ₂ = μ₃ = μ₄. There is strong evidence that at least one diet has a different mean cholesterol level.
b) No — ANOVA only tells us that differences exist somewhere. A post-hoc test (e.g., Tukey’s HSD) would determine which specific pairs of diets have significantly different mean cholesterol levels.
c) Not yet. Diet 1 has the lowest observed mean (185 mg/dL), but we do not know whether it is significantly lower than Diets 2, 3, or 4. Tukey’s HSD would determine whether the difference between Diet 1 and others (e.g., 185 vs. 195 = 10 mg/dL difference) is statistically significant.
SA48
Chi-square test: insurance type (4 categories) × preventive screening (Yes/No) gives χ² = 21.4, df = 3, p < 0.001. Cell: uninsured / No screening: O = 145, E = 98.
a) Does this cell drive the result? b) Write the conclusion. c) What does the association imply?
a) Yes. This cell’s contribution to the chi-square is \((145-98)^2/98 = 2209/98 \approx 22.5\), which is on the order of the entire test statistic of 21.4. (Strictly, a single cell’s contribution cannot exceed the total χ², so the O and E quoted here are best read as approximate.) The uninsured group has substantially more people forgoing screening than expected under independence, and this is the primary driver of the result.
b) \(\chi^2(3) = 21.4\), p < 0.001 < 0.05. We reject H₀ of independence. There is very strong evidence that insurance type is associated with likelihood of receiving preventive screening.
c) Uninsured individuals are less likely to receive preventive screening than would be expected if insurance status and screening were unrelated. This is consistent with lack of insurance acting as a barrier to preventive care, with potential implications for health equity and policy, though causation cannot be confirmed from this observational study.
SA49
Regression output (birth weight ~ maternal age):
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2800.0 180.0 15.56 <2e-16 ***
maternal_age 12.5 4.2 2.98 0.003 **
Multiple R-squared: 0.041
a) Write and interpret equation. b) Comment on R². c) Is slope significant? d) Should maternal age be used clinically?
a) \(\widehat{\text{birth weight}} = 2800 + 12.5 \times \text{maternal age}\) (in grams)
For each additional year of maternal age, predicted birth weight increases by 12.5 grams on average.
b) R² = 0.041: maternal age explains only 4.1% of the variability in birth weight. The vast majority of variation in birth weight is explained by other factors (gestational age, nutrition, genetics, etc.).
c) Yes. t = 2.98, p = 0.003 < 0.05. Maternal age is a statistically significant predictor of birth weight. However, with such a small R², statistical significance does not imply clinical importance.
d) No. Despite statistical significance, maternal age explains only 4% of birth weight variation. A 12.5-gram increase per year of maternal age is not clinically meaningful, and the model would produce very imprecise predictions for individual patients. More relevant clinical predictors should be used.
SA50
Describe two situations from this course where a statistically significant result did NOT imply a practically meaningful conclusion. For each, explain what additional information is needed. Then describe one situation where a non-significant result was still worth reporting.
Situation 1 — Large n makes tiny effects significant: The dietary supplement study (n = 10,000, weight loss = 0.1 kg, p = 0.002). Statistical significance was achieved, but a 0.1 kg difference is clinically trivial. Additional information needed: effect size (Cohen’s d or raw difference with CI) and clinical judgment about the minimum meaningful threshold.
Situation 2 — Significant correlation, tiny R²: The exercise-heart rate regression (r = −0.385, p < 0.001) where R² = 0.148 — exercise explained only 15% of heart rate variability. While statistically significant, the model would make poor predictions for individual patients. Additional information needed: R² and residual standard error to assess practical predictive value.
Non-significant result worth reporting: The study comparing two vaccines (X-squared = 0.375, p = 0.540). Even though no difference was detected, this result, combined with the CI for the difference (−0.05, 0.09), informs public health policy. The interval bounds how large any difference in efficacy could plausibly be (roughly −5 to +9 percentage points), which can guide procurement decisions, particularly if one vaccine is cheaper or easier to distribute. The non-significant result does not prove the vaccines are equivalent; it is informative only when accompanied by a CI that shows the range of plausible differences.
Remember: The final exam emphasizes interpretation. There are no distribution tables — p-values will be provided in R output. Focus on reading output correctly, checking conditions, and writing clear conclusions in biological context.
End of Practice Final — Good Luck! 🎓
“The goal is not to find statistical significance — the goal is to learn something true about the world.”
