HW8: Practice Final Exam

STAT 7 — Winter 2026

🔬 Your Mission

Welcome, Statistical Detective!

You’ve made it to the end of the quarter. Before the final exam, we have assembled the ultimate challenge: a comprehensive set of problems drawn from real biological and health research.

Your mission: Work through these problems to prepare for the final. Everything emphasizes interpretation over calculation. In most questions, R output is provided — your job is to read it correctly, check conditions, and communicate conclusions.

Format: 150 Multiple Choice (1 point each) + 50 Short Answer (2 points each) = 250 points

Coverage: All learning objectives from Week 5 onward (post–normal distribution)

PART 1: MULTIPLE CHOICE (150 questions, 1 point each)

Section A: Sampling Distributions and the Central Limit Theorem

Question 1

A researcher takes repeated samples of size n = 50 from a population with mean μ = 120 and standard deviation σ = 15. Which of the following best describes the sampling distribution of the sample mean?

a) Normal distribution with mean 120 and standard deviation 15
b) Normal distribution with mean 120 and standard deviation 2.12
c) Skewed distribution with mean 120 and standard deviation 15
d) Normal distribution with mean 120 and standard deviation 0.30

Solution

Answer: b)

The sampling distribution of \(\bar{x}\) is approximately normal (by the CLT, n = 50 is large enough) with mean μ = 120 and standard error \(SE = \sigma/\sqrt{n} = 15/\sqrt{50} \approx 2.12\). The standard deviation of the sampling distribution is the standard error, not σ.

Question 2

What does the Central Limit Theorem state?

a) All populations are approximately normally distributed
b) Sample means from large samples are approximately normally distributed, regardless of the population shape
c) The sample mean always equals the population mean
d) Larger samples always produce more accurate estimates

Solution

Answer: b)

The CLT tells us about the shape of the sampling distribution of \(\bar{x}\), not the population itself. It applies regardless of population shape, as long as n is sufficiently large (generally n ≥ 30).

Question 3

A population of bacterial colony sizes is strongly right-skewed with mean μ = 45 and σ = 20. If a researcher takes random samples of n = 100, the sampling distribution of \(\bar{x}\) will be:

a) Right-skewed with mean 45 and SD 20
b) Approximately normal with mean 45 and SD 2
c) Approximately normal with mean 45 and SD 20
d) Right-skewed with mean 45 and SD 2

Solution

Answer: b)

With n = 100 (large), the CLT guarantees the sampling distribution of \(\bar{x}\) is approximately normal (not skewed), with mean μ = 45 and \(SE = 20/\sqrt{100} = 2\).
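A small simulation can make this concrete. The sketch below assumes a gamma population with mean 45 and SD 20; the gamma shape is an illustrative choice, not something the question specifies. It draws 5,000 samples of size 100 and summarizes the resulting sample means.

```r
set.seed(7)  # arbitrary seed so the run is reproducible

# Assumed right-skewed population: gamma with mean 45 and SD 20
mu    <- 45
sigma <- 20
shape <- (mu / sigma)^2   # mean^2 / variance
rate  <- mu / sigma^2     # mean / variance

# Sample mean of n = 100 draws, repeated 5000 times
xbar <- replicate(5000, mean(rgamma(100, shape = shape, rate = rate)))

mean(xbar)  # should land near 45
sd(xbar)    # should land near sigma / sqrt(100) = 2
```

Plotting `hist(xbar)` next to a histogram of a single large draw from `rgamma` shows the contrast: the population is skewed, while the sample means are close to symmetric.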

Question 4

The standard error of the mean is 3.2 when n = 25. What is the population standard deviation σ?

a) 3.2
b) 16
c) 0.64
d) 80

Solution

Answer: b)

\(SE = \sigma/\sqrt{n}\), so \(\sigma = SE \times \sqrt{n} = 3.2 \times \sqrt{25} = 3.2 \times 5 = 16\).

Question 5

Which change would reduce the standard error of the mean by half?

a) Double the population mean
b) Double the sample size
c) Quadruple the sample size
d) Reduce the population standard deviation by half

Solution

Answer: c)

\(SE = \sigma/\sqrt{n}\). To halve SE, we need \(\sqrt{n}\) to double, which requires n to quadruple. Doubling n only reduces SE by a factor of \(\sqrt{2} \approx 1.41\).
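The square-root relationship is easy to check numerically. In this minimal sketch, `se_mean` is a hypothetical helper and the values of `sigma` and `n` are placeholders; the ratios do not depend on them.

```r
# Standard error of the mean for a given sigma and n
se_mean <- function(sigma, n) sigma / sqrt(n)

sigma <- 15   # illustrative value; any positive sigma gives the same ratios
n     <- 50   # illustrative starting sample size

ratio_double    <- se_mean(sigma, 2 * n) / se_mean(sigma, n)
ratio_quadruple <- se_mean(sigma, 4 * n) / se_mean(sigma, n)

ratio_double     # 1 / sqrt(2), about 0.707
ratio_quadruple  # 0.5
```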

Question 6

A sampling distribution is:

a) The distribution of scores in a single sample
b) The distribution of a statistic computed across many samples of the same size
c) The distribution of all possible population values
d) A histogram of the population

Solution

Answer: b)

A sampling distribution shows how a statistic (like \(\bar{x}\)) varies across all possible samples of size n drawn from the same population.

Question 7

As sample size increases, the shape of the sampling distribution of \(\bar{x}\) becomes:

a) More skewed
b) More variable
c) More normally distributed
d) More like the population distribution

Solution

Answer: c)

This is the essence of the CLT: larger n → the sampling distribution becomes more normal, regardless of the population’s shape.

Question 8

A medical researcher samples n = 9 patients from a normally distributed population (σ = 12). The standard error of the mean is:

a) 12
b) 4
c) 1.33
d) 36

Solution

Answer: b)

\(SE = \sigma/\sqrt{n} = 12/\sqrt{9} = 12/3 = 4\).

Question 9

The Central Limit Theorem is particularly important when:

a) The population is already normally distributed
b) The sample size is small (n < 10)
c) The population is skewed and sample sizes are large
d) The sample mean equals the population mean

Solution

Answer: c)

When the population is already normal, the sampling distribution is automatically normal for any n. The CLT’s value is precisely when populations are non-normal — for large n, inference based on normality is still valid.

Question 10

If \(SE = \sigma/\sqrt{n}\), and a researcher wants to halve the SE, they should:

a) Double σ
b) Double n
c) Quadruple n
d) Halve σ

Solution

Answer: c)

To halve \(\sigma/\sqrt{n}\), you need \(\sqrt{n}\) to double, which means n must quadruple. See also Q5.

Section B: Confidence Intervals

Question 11

What is the correct interpretation of a 95% confidence interval?

a) There is a 95% chance the population mean is in this interval
b) 95% of the data falls within this interval
c) If we repeated this procedure many times, about 95% of the resulting intervals would contain the true population mean
d) The sample mean has a 95% probability of being correct

Solution

Answer: c)

This is a frequentist interpretation. The parameter is fixed (not random); it is the interval that varies from sample to sample. 95% of all such intervals would capture the true mean.
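The "repeated procedure" idea can be simulated. The sketch below assumes a normal population with μ = 120 and σ = 15 (illustrative values, known only because we are simulating) and counts how many of 5,000 intervals capture μ.

```r
set.seed(7)  # arbitrary seed for reproducibility

mu    <- 120   # assumed true mean
sigma <- 15    # assumed population SD
n     <- 25
reps  <- 5000

# For each simulated sample, build a 95% t-interval and record whether it covers mu
covers <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

mean(covers)  # proportion of intervals that captured mu; should be close to 0.95
```

Each individual interval either contains μ or it does not; the 95% describes the long-run success rate of the method.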

Question 12

A 95% CI for mean blood pressure is (118, 134) mmHg. A researcher claims “the true mean blood pressure is probably around 126 mmHg.” Which statement is correct?

a) The researcher is wrong; we can only say the mean is in the interval
b) 126 is the sample mean, but we cannot make probability statements about the population mean
c) 126 is the sample mean (midpoint of the CI) and it is our best point estimate
d) Both b and c are correct

Solution

Answer: d)

The midpoint of the CI is \(\bar{x} = (118+134)/2 = 126\), which is our best point estimate. However, in the frequentist framework we cannot attach a probability to the population mean itself: the 95% describes how often the interval procedure captures the mean, not the chance that the mean takes any particular value.

Question 13

A wider confidence interval indicates:

a) Greater precision
b) A larger sample size
c) More uncertainty about the parameter
d) A smaller standard deviation

Solution

Answer: c)

Width = \(2 \times z^* \times SE\). A wider CI results from a larger SE (smaller n or larger s) or higher confidence level — all of which reflect more uncertainty in our estimate.

Question 14

Which of the following would produce the NARROWEST 95% confidence interval for a population mean?

a) n = 30, s = 10
b) n = 100, s = 10
c) n = 100, s = 5
d) n = 30, s = 5

Solution

Answer: c)

Width depends on \(SE = s/\sqrt{n}\). Compute for each: (a) 10/√30 ≈ 1.83; (b) 10/√100 = 1.0; (c) 5/√100 = 0.5 ✅; (d) 5/√30 ≈ 0.91. Largest n and smallest s → narrowest interval.
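The four standard errors can be computed in one step. This is a minimal sketch; the vector names `n` and `s` and the option labels are just bookkeeping for the four answer choices.

```r
# Sample sizes and sample SDs for options a) through d)
n <- c(a = 30, b = 100, c = 100, d = 30)
s <- c(a = 10, b = 10,  c = 5,   d = 5)

se <- s / sqrt(n)
round(se, 2)

narrowest <- names(which.min(se))
narrowest  # the option with the smallest SE, hence the narrowest interval
```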

Question 15

The margin of error in a confidence interval is:

a) The sample mean minus the population mean
b) The critical value multiplied by the standard error
c) The standard deviation divided by the sample size
d) The width of the entire confidence interval

Solution

Answer: b)

Margin of error \(= z^* \times SE\) (or \(t^* \times SE\)). The CI is statistic ± margin of error, so the margin of error is half the width.

Question 16

A 99% CI will be ________ than a 95% CI based on the same data.

a) Narrower
b) Wider
c) The same width
d) Centered at a different value

Solution

Answer: b)

Higher confidence requires a larger critical value (\(z^* = 2.576\) for 99% vs. \(z^* = 1.960\) for 95%), producing a wider interval. More confidence = less precision.

Question 17

The t-distribution is used instead of the z-distribution for confidence intervals when:

a) The sample size is large
b) The population standard deviation is unknown
c) The data are skewed
d) The sample mean is large

Solution

Answer: b)

When σ is unknown, we estimate it with s, which introduces additional uncertainty. The t-distribution accounts for this by having heavier tails than the normal.
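The heavier tails show up directly in the critical values. The sketch below compares \(z^*\) with \(t^*\) for a few degrees of freedom; the particular df values are chosen only for illustration.

```r
conf  <- 0.95
alpha <- 1 - conf

z_star <- qnorm(1 - alpha / 2)                  # about 1.960
t_star <- qt(1 - alpha / 2, df = c(4, 24, 99))  # t* for n = 5, 25, 100

z_star
t_star  # each value exceeds z_star and shrinks toward it as df grows

z_star_99 <- qnorm(0.995)  # z* for 99% confidence, about 2.576
z_star_99
```

A larger critical value means a wider interval, which is how the t-distribution pays for estimating σ with s.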

Question 18

A 95% CI for mean cholesterol level is (185, 205) mg/dL. Can we conclude that the mean is significantly different from 200?

a) Yes, because 200 is above the center of the interval
b) No, because 200 falls within the interval
c) Yes, because the interval does not include 0
d) No, because 200 is close to the upper bound

Solution

Answer: b)

A two-sided hypothesis test at α = 0.05 is equivalent to checking whether the null value falls in the 95% CI. Since 200 falls within (185, 205), we fail to reject H₀: μ = 200.

Question 19

As sample size increases, a 95% confidence interval becomes:

a) Wider
b) Narrower
c) More likely to contain the true mean
d) Centered at a different value

Solution

Answer: b)

Larger n → smaller SE → narrower CI. The confidence level (95%) is fixed by design and does not change with n.

Question 20

A researcher reports: “We are 95% confident that the mean recovery time is between 7.2 and 9.8 days.” The margin of error is:

a) 2.6 days
b) 1.3 days
c) 8.5 days
d) 0.65 days

Solution

Answer: b)

Margin of error = half the width = \((9.8 - 7.2)/2 = 2.6/2 = 1.3\) days.

Question 21

Which condition is NOT required for a valid confidence interval for a mean?

a) The sample is random
b) The population is normally distributed OR n ≥ 30
c) The population standard deviation equals the sample standard deviation
d) Observations are independent

Solution

Answer: c)

We never need σ = s. We use s precisely because σ is unknown. The actual requirements are: random sample, independence, and normality (either from a normal population or from the CLT when n is large).

Question 22

A 95% CI for the difference (A-B) in mean weights between two diet groups is (1.2, 4.8) kg. Which conclusion is correct?

a) Diet A produces significantly higher weight loss (α = 0.05)
b) There is no significant difference because the interval is wide
c) We need to know the p-value to draw any conclusion
d) The interval needs to include 0 to be valid

Solution

Answer: a)

The entire CI is positive (does not include 0), meaning we are 95% confident the true difference is positive. This is equivalent to rejecting H₀: μ₁ = μ₂ at α = 0.05. Width is irrelevant to the significance decision.

Section C: Hypothesis Testing

Question 23

In hypothesis testing, the null hypothesis H₀ typically states:

a) The research hypothesis we hope to prove
b) No effect, no difference, or a specific parameter value
c) The alternative we will accept if p < 0.05
d) The result we observed in our sample

Solution

Answer: b)

H₀ is always the “status quo” or “no effect” claim. It always contains an equality (=). We test against it, not for it.

Question 24

A p-value of 0.03 means:

a) There is a 3% chance H₀ is true
b) The probability of observing data this extreme (or more), assuming H₀ is true, is 3%
c) There is a 97% chance Hₐ is true
d) The result is not practically significant

Solution

Answer: b)

The p-value is a conditional probability: P(data this extreme or more | H₀ true). It says nothing about the probability that H₀ or Hₐ is true.

Question 25

A researcher uses α = 0.05 and obtains p = 0.08. The correct conclusion is:

a) There is strong evidence against H₀
b) H₀ is true
c) There is insufficient evidence to reject H₀
d) The study needs to be repeated

Solution

Answer: c)

p = 0.08 > 0.05 = α, so we fail to reject H₀. We never “accept” H₀ — absence of evidence is not evidence of absence.

Question 26

A Type I error occurs when:

a) We fail to reject a false H₀
b) We reject a true H₀
c) Our p-value is too large
d) Our sample size is too small

Solution

Answer: b)

Type I error = false positive = rejecting H₀ when it is actually true. Its probability is α.

Question 27

A Type II error occurs when:

a) We reject a true H₀
b) We fail to reject a false H₀
c) Our sample is not random
d) The p-value is below α

Solution

Answer: b)

Type II error = false negative = failing to detect a real effect. Its probability is β. Power = 1 − β.

Question 28

Statistical significance (p < 0.05) means:

a) The effect is large and important
b) There is strong evidence that the effect is not zero
c) The study will be published
d) The effect has a 95% chance of being real

Solution

Answer: b)

Statistical significance only tells us that the observed data are unlikely under H₀. It says nothing about effect size, importance, or the probability that the effect is real.

Question 29

A researcher tests whether a new drug reduces blood pressure with H₀: μ = 0 vs. Hₐ: μ < 0. This is a:

a) Two-sided test
b) Left-tailed test
c) Right-tailed test
d) Paired test

Solution

Answer: b)

Hₐ: μ < 0 points to the left tail. The p-value is the area to the left of the observed test statistic.

Question 30

For a left-tailed (one-sided) test with z = −8.10, the p-value is approximately:

a) more than 0.5
b) less than 0.001
c) 0.05
d) 0.10

Solution

Answer: b)

For a left-tailed test, p-value = P(Z < −8.10) < 0.00001. For a two-sided test it would also be close to zero.
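These tail areas can be read off the standard normal distribution with `pnorm`. The sketch below also computes the right-tail area for contrast, to show why the direction of the alternative matters.

```r
z <- -8.10

p_left      <- pnorm(z)                      # left-tailed: P(Z < -8.10)
p_two_sided <- 2 * pnorm(-abs(z))            # two-sided: both tails
p_right     <- pnorm(z, lower.tail = FALSE)  # right-tailed, for contrast: P(Z > -8.10)

p_left
p_two_sided
p_right  # essentially 1, since almost all of the curve lies to the right of -8.10
```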

Question 31

The difference between statistical significance and practical significance is:

a) There is no difference; they mean the same thing
b) Statistical significance indicates the effect is real; practical significance indicates the effect is important
c) Practical significance is only relevant for medical research
d) Statistical significance is only meaningful with large samples

Solution

Answer: b)

Statistical significance (small p-value) gives evidence that an effect exists in the population; it does not prove it. Practical significance asks whether the effect is large enough to matter in real-world terms. Large samples can make tiny, meaningless effects statistically significant.

Question 32

A study with n = 10,000 finds that a dietary supplement increases mean weight loss by 0.1 kg (p = 0.002). Which statement is most accurate?

a) The supplement is an effective weight loss treatment
b) The result is statistically significant but likely not practically meaningful
c) The large sample invalidates the result
d) The p-value proves the supplement works

Solution

Answer: b)

With n = 10,000, even a trivially small effect (0.1 kg ≈ 3.5 oz) can achieve statistical significance. A 0.1 kg weight loss is not clinically meaningful for most contexts.

Question 33

Decreasing α from 0.05 to 0.01:

a) Increases the probability of a Type I error
b) Decreases the probability of a Type I error
c) Increases statistical power
d) Decreases the probability of a Type II error

Solution

Answer: b)

α = P(Type I error). Decreasing α makes it harder to reject H₀, so Type I errors become less likely — but Type II errors (and β) increase, and power decreases.

Question 34

To reject H₀ at α = 0.05 in a two-sided test, you need:

a) p < 0.025
b) p < 0.05
c) z > 1.645
d) z > 2.576

Solution

Answer: b)

For a two-sided test at α = 0.05, reject H₀ when p < 0.05. The critical values are ±1.96 (not 1.645, which is for one-sided α = 0.05).

Question 35

A 95% CI for a parameter that does NOT include the null value implies:

a) Failing to reject H₀ at α = 0.05
b) Rejecting H₀ at α = 0.05 (two-sided)
c) The parameter equals the null value
d) The CI was calculated incorrectly

Solution

Answer: b)

The duality between CIs and hypothesis tests: if the null value falls outside the 95% CI, the two-sided test rejects H₀ at α = 0.05.

Section D: t-Tests

Use the following output for Questions 36–40:

    Paired t-test

data:  after - before
t = -3.21, df = 24, p-value = 0.0037
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -8.43  -1.87
sample estimates:
mean of x 
    -5.15

This test compares blood pressure (mmHg) before and after a meditation program for 25 participants.

Question 36

The p-value = 0.0037 means:

a) There is a 0.37% chance the meditation works
b) There is a 0.37% chance of observing a difference this large if the true mean difference is 0
c) The mean difference is 0.37 mmHg
d) The confidence interval is 0.37% accurate

Solution

Answer: b)

The p-value is the probability of observing a mean difference as extreme as −5.15 mmHg (or more extreme) assuming H₀: mean difference = 0 is true. It is not the probability that the treatment works.

Question 37

The correct conclusion from this output (α = 0.05) is:

a) The meditation program does not significantly reduce blood pressure
b) There is significant evidence that blood pressure changed after meditation
c) Blood pressure increased significantly after meditation
d) The study has insufficient power

Solution

Answer: b)

p = 0.0037 < 0.05, so we reject H₀. The mean difference is negative (after − before = −5.15), indicating blood pressure decreased. The alternative is two-sided, so we conclude it “changed” — and the direction (decrease) is shown by the negative mean.

Question 38

How many participants were in this study?

a) 24
b) 25
c) 26
d) Cannot tell from the output

Solution

Answer: b)

For a paired t-test, df = n − 1 = 24, so n = 25 participants.

Question 39

The 95% CI (−8.43, −1.87) means:

a) 95% of participants experienced a reduction between 1.87 and 8.43 mmHg
b) We are 95% confident the true mean reduction in blood pressure is between 1.87 and 8.43 mmHg
c) The meditation reduced blood pressure by exactly 5.15 mmHg
d) The p-value is between 1.87% and 8.43%

Solution

Answer: b)

The CI is for the population mean difference, not individual outcomes. Because both bounds are negative, the entire interval excludes 0, consistent with the significant result.
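The pieces of the paired t-test output fit together, and that can be checked from the printed numbers alone. This is a sketch that works backward from the rounded values in the output, so small discrepancies are expected; the variable names are illustrative.

```r
# Values read from the paired t-test output
mean_diff  <- -5.15
ci         <- c(-8.43, -1.87)
df         <- 24
t_reported <- -3.21

t_star <- qt(0.975, df)    # critical value for a 95% interval
margin <- diff(ci) / 2     # half the interval width
se     <- margin / t_star  # implied standard error of the mean difference

midpoint   <- mean(ci)         # equals the sample mean difference
t_implied  <- mean_diff / se   # close to the reported t (printed values are rounded)
p_two_side <- 2 * pt(-abs(t_reported), df)  # two-sided p-value, about 0.0037

c(midpoint = midpoint, t_implied = t_implied, p = p_two_side)
```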

Question 40

Why is a paired t-test appropriate here instead of a two-sample t-test?

a) Because the sample size is small
b) Because each participant’s before and after measurements are linked
c) Because the data are normally distributed
d) Because the researcher wanted a smaller p-value

Solution

Answer: b)

The same 25 people are measured twice. Before and after values are not independent — they share the same subject. The paired design accounts for between-person variability by analyzing the differences within each person.

Use the following output for Questions 41–45:

    Welch Two Sample t-test

data:  cortisol by group
t = 2.84, df = 58.3, p-value = 0.0061
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.42  2.38
sample estimates:
mean in group control mean in group stressed 
               12.1                    13.5

Question 41

The standard error used to compute this t-statistic is:

a) (13.5 − 12.1) / 2.84
b) 1.4 / 2.84
c) The square root of the sum of the two group variances
d) The pooled standard deviation

Solution

Answer: b)

\(t = (\bar{x}_1 - \bar{x}_2)/SE\), so \(SE = (13.5 - 12.1)/2.84 = 1.4/2.84 \approx 0.493\). Both a and b are equivalent statements; b is more direct.
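The implied standard error can also be used to rebuild the confidence interval as a consistency check. This sketch uses only the rounded values printed in the output and takes the difference as stressed − control, which is an assumption about direction made here so that the sign matches the reported t and interval.

```r
# Values read from the Welch output
mean_control  <- 12.1
mean_stressed <- 13.5
t_stat <- 2.84
df     <- 58.3

diff_means <- mean_stressed - mean_control  # 1.4
se         <- diff_means / t_stat           # about 0.493

# Rebuild the 95% interval from the difference and the implied SE
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se
round(ci, 2)  # close to the reported (0.42, 2.38), up to rounding
```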

Question 42

This test uses “Welch” correction because:

a) The data are skewed
b) The two groups may have different population variances
c) The sample sizes are unequal
d) The p-value is small

Solution

Answer: b)

The Welch (unpooled) t-test does not assume equal population variances. It adjusts the degrees of freedom to account for this, making it more robust than the pooled t-test.

Question 43

The 95% CI (0.42, 2.38) represents:

a) The range of values for cortisol in the stressed group
b) The plausible values for the difference in mean cortisol (stressed − control)
c) The plausible values for the difference in mean cortisol (control − stressed)
d) The range of individual cortisol values

Solution

Answer: b)

R normally reports the difference as the first group listed minus the second, which here would be control − stressed = 12.1 − 13.5 = −1.4; an interval for that difference would be entirely negative. The reported interval (0.42, 2.38) is entirely positive and contains 13.5 − 12.1 = 1.4, so it describes the difference stressed − control. Either way, the data show higher mean cortisol in the stressed group.

Question 44

Can we conclude that stress causes higher cortisol from this study? To answer this, you need to know:

a) Whether the study was observational or experimental
b) Whether the sample size was sufficient
c) Whether the CI includes zero
d) Whether the t-statistic is large

Solution

Answer: a)

Causation requires an experiment with random assignment. If this is an observational study (people self-identified as “stressed”), confounding variables could explain the difference in cortisol.

Question 45

The difference in sample means is:

a) 2.84
b) 0.42
c) 1.4
d) 13.5

Solution

Answer: c)

13.5 − 12.1 = 1.4. The value 2.84 is the t-statistic, not the mean difference.

Section E: Statistical Power

Question 46

Statistical power is defined as:

a) The probability of making a Type I error
b) The probability of correctly rejecting a false H₀
c) The probability of failing to detect an effect
d) The significance level α

Solution

Answer: b)

Power = 1 − β = P(reject H₀ | H₀ is false). It is the probability of detecting a true effect.

Question 47

Which factor does NOT increase statistical power?

a) Larger sample size
b) Larger effect size
c) Smaller α (e.g., 0.01 vs. 0.05)
d) Smaller within-group variability

Solution

Answer: c)

Smaller α makes the rejection region harder to reach, which decreases power (increases β). Larger n, larger effect size, and smaller σ all increase power.

Question 48

A study has power = 0.80. This means:

a) 80% of participants will show an effect
b) There is an 80% chance of detecting a real effect if it exists
c) The p-value will be less than 0.80
d) The Type I error rate is 20%

Solution

Answer: b)

Power = P(reject H₀ | Hₐ is true) = 0.80. There is a 20% chance of a Type II error (missing a real effect), not a 20% Type I error rate.

Question 49

A pilot study finds an effect size of d = 0.3 (small). To achieve 80% power at α = 0.05, researchers would need ________ than if d = 0.8 (large).

a) Fewer participants
b) More participants
c) The same number of participants
d) Cannot determine without more information

Solution

Answer: b)

Smaller effects are harder to detect and require larger samples to achieve the same power. This is why small-effect studies are so resource-intensive.
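To get a feel for the size of the gap, base R's `power.t.test` can solve for the sample size. The sketch below assumes a two-sample, two-sided t-test (the question does not state the design), and sets `sd = 1` so that `delta` plays the role of the standardized effect size d.

```r
# Per-group n for a two-sample t-test at 80% power and alpha = 0.05.
# With sd = 1, delta is the standardized effect size d.
n_small <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80)$n
n_large <- power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.80)$n

# Round up to whole participants per group
ceiling(c(d_0.3 = n_small, d_0.8 = n_large))
```

Under these assumptions the small effect needs several times as many participants per group as the large one.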

Question 50

A researcher fails to reject H₀ (p = 0.23). The most important follow-up question is:

a) Was the p-value close to 0.05?
b) Was the study adequately powered to detect a meaningful effect?
c) Should they lower α to 0.01?
d) Was a two-sided test used?

Solution

Answer: b)

A non-significant result is hard to interpret without knowing the power. A low-powered study (small n) that fails to reject H₀ tells us very little — we may simply have been unable to detect the effect even if it exists.

Section F: Correlation and Regression

Use the following output for Questions 51–60:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   12.400      1.850    6.70   <2e-16 ***
exercise_hrs   -0.380      0.092   -4.13   0.00008 ***

Residual standard error: 2.14 on 98 degrees of freedom
Multiple R-squared:  0.1483

This model predicts resting heart rate (bpm) from weekly exercise hours for 100 adults.

Question 51

The slope estimate is −0.380. This means:

a) For every 1-hour increase in weekly exercise, resting heart rate decreases by 0.380 bpm on average
b) For every 1-bpm decrease in heart rate, exercise increases by 0.380 hours
c) Exercise causes heart rate to decrease by 38% per hour
d) The correlation between exercise and heart rate is −0.380

Solution

Answer: a)

The slope is always interpreted as: for a one-unit increase in x, the predicted y changes by the slope value. Slopes are not correlations, percentages, or reverse predictions.

Question 52

Is the slope statistically significant at α = 0.05?

a) No, because the p-value is very small
b) Yes, because p = 0.00008 < 0.05
c) Yes, because the slope is negative
d) No, because R² < 0.50

Solution

Answer: b)

p = 0.00008 < 0.05, so we reject H₀: β₁ = 0. The sign of the slope and the value of R² do not determine significance.
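The t value and p-value in the coefficient table follow from the estimate and its standard error. This sketch recomputes them from the printed (rounded) numbers; the variable names are illustrative.

```r
# Values read from the exercise_hrs row of the coefficient table
estimate  <- -0.380
std_error <- 0.092
df_resid  <- 98

t_value <- estimate / std_error             # about -4.13
p_value <- 2 * pt(-abs(t_value), df_resid)  # two-sided p-value

t_value
p_value
p_value < 0.05  # TRUE, so reject H0: slope = 0
```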

Question 53

The value R² = 0.1483 means:

a) The model explains 14.83% of the variability in resting heart rate
b) The correlation coefficient is 0.1483
c) 14.83% of participants show the expected pattern
d) The model is not useful

Solution

Answer: a)

R² is the proportion of variability in y explained by the linear model. Note: \(r = -\sqrt{0.1483} \approx -0.385\) (negative because slope is negative), not 0.1483.

Question 54

What is the predicted resting heart rate for someone who exercises 5 hours per week?

a) −0.380 + 12.4 × 5 = 61.6 bpm
b) 12.4 − 0.380 × 5 = 10.5 bpm
c) 12.4 + (−0.380) × 5 = 10.5 bpm
d) Both b and c (same calculation)

Solution

Answer: d)

\(\hat{y} = 12.4 + (-0.380)(5) = 12.4 - 1.9 = 10.5\) bpm. Options b and c are the same calculation written differently. Note: this seems biologically implausible — a heart rate of 10.5 bpm is not possible. This illustrates the danger of extrapolation if 5 hours is outside the observed range.
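The prediction is just the fitted line evaluated at x = 5. In this minimal sketch, `predict_hr` is a hypothetical helper built from the printed coefficients, not a function from the original analysis.

```r
# Coefficients read from the regression output
b0 <- 12.400   # intercept
b1 <- -0.380   # slope for exercise_hrs

predict_hr <- function(hours) b0 + b1 * hours

pred_5 <- predict_hr(5)
pred_5               # 10.5
predict_hr(c(0, 1))  # 12.40 and 12.02: one more hour changes the prediction by b1
```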

Question 55

The intercept (12.400) represents:

a) The predicted heart rate when exercise = 0 hours per week
b) The average resting heart rate in the sample
c) The heart rate reduction per hour of exercise
d) The maximum heart rate in the sample

Solution

Answer: a)

The intercept is the predicted y when x = 0. Whether this value is meaningful depends on whether x = 0 is within the observed range of data.

Question 56

A researcher states: “Since exercise is a significant predictor of heart rate, we can conclude that exercise lowers heart rate.” What is the flaw?

a) The p-value is too small to justify this conclusion
b) Regression does not prove causation; this may be an observational study
c) The R² is too low for this conclusion
d) The sample size is insufficient

Solution

Answer: b)

Regression shows statistical association. Without random assignment of exercise hours to participants (an experiment), we cannot rule out confounding. Healthier people may exercise more AND have lower heart rates for other reasons.

Question 57

The correlation coefficient r for this data is approximately:

a) 0.1483
b) 0.3851
c) −0.3851
d) −0.1483

Solution

Answer: c)

\(r = -\sqrt{R^2} = -\sqrt{0.1483} \approx -0.385\). The sign is negative because the slope is negative (negative relationship between exercise and heart rate).
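The same conversion in R, as a short sketch using the printed values (this relationship holds for simple linear regression with one predictor):

```r
r_squared <- 0.1483
slope     <- -0.380

# In simple linear regression, r has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 4)  # -0.3851
```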

Question 58

Residuals are defined as:

a) The predicted values from the regression line
b) The difference between observed and predicted y values
c) The sum of squared deviations
d) The slope multiplied by the error

Solution

Answer: b)

Residual \(e_i = y_i - \hat{y}_i\). Residuals measure how far each observed point is from the regression line.

Question 59

A residual plot shows a fan-shaped pattern (variance increases with fitted values). This suggests:

a) The linearity condition is violated
b) The equal variance (homoscedasticity) condition is violated
c) The independence condition is violated
d) The normality condition is violated

Solution

Answer: b)

A fan shape indicates non-constant variance (heteroscedasticity) — the spread of residuals changes across the range of fitted values. This violates the “E” in LINE (Equal variance).

Question 60

An influential point in regression is one that:

a) Has a very large residual
b) Is an outlier in the x direction that strongly affects the slope
c) Has a y-value far from the mean
d) Corresponds to a participant who didn’t follow protocol

Solution

Answer: b)

Influential points are extreme in x (high leverage) and their inclusion or removal substantially changes the slope. A point can have a large residual without being influential, and vice versa.

Section G: ANOVA

Use the following output for Questions 61–68:

Analysis of Variance Table

             Df Sum Sq Mean Sq F value  Pr(>F)    
treatment     3  487.2  162.4   14.78  2.4e-08 ***
Residuals   196 2154.9   11.0

This output compares mean immune cell counts across 4 treatment conditions.

Question 61

How many groups are being compared?

Solution

Answer: b)

\(df_{between} = k - 1 = 3\), so \(k = 4\) groups.

Question 62

The total number of observations in this study is:

Solution

Answer: c)

Total df = \(df_{between} + df_{within} = 3 + 196 = 199 = n - 1\), so \(n = 200\).

Question 63

The F-statistic (14.78) is calculated as:

a) Sum Sq treatment / Sum Sq Residuals
b) Mean Sq treatment / Mean Sq Residuals
c) Mean Sq Residuals / Mean Sq treatment
d) df treatment / df Residuals

Solution

Answer: b)

\(F = MSG/MSE = 162.4/11.0 \approx 14.76\), which matches the reported 14.78 up to rounding of the printed mean squares ✅. Always use mean squares (not sums of squares) for the F ratio.
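Every entry in the ANOVA table can be rebuilt from the sums of squares and degrees of freedom. This sketch uses the printed values, so the last digits may differ slightly from R's own output.

```r
# Values read from the ANOVA table
ss_treatment <- 487.2;  df_treatment <- 3
ss_residual  <- 2154.9; df_residual  <- 196

ms_treatment <- ss_treatment / df_treatment  # 162.4
ms_residual  <- ss_residual / df_residual    # about 10.99, printed as 11.0

f_stat  <- ms_treatment / ms_residual
p_value <- pf(f_stat, df_treatment, df_residual, lower.tail = FALSE)

f_stat   # about 14.77
p_value  # far below 0.05
```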

Question 64

What are the null and alternative hypotheses for this ANOVA?

a) H₀: All treatments are the same; Hₐ: At least one treatment is different
b) H₀: μ₁ = μ₂ = μ₃ = μ₄; Hₐ: All means are different
c) H₀: At least one mean differs; Hₐ: All means are equal
d) H₀: μ₁ = μ₂ = μ₃ = μ₄; Hₐ: At least one mean differs

Solution

Answer: d)

H₀ states all group means are equal. Hₐ only requires that at least one mean differs — not that all differ from each other. Option b’s Hₐ is too strong.

Question 65

The conclusion from this ANOVA (α = 0.05) is:

a) All four treatments have the same mean immune cell count
b) There is significant evidence that at least one treatment mean differs from the others
c) All treatment means are significantly different from each other
d) The study has insufficient power

Solution

Answer: b)

p = 2.4 × 10⁻⁸ << 0.05. We reject H₀. The ANOVA tells us that means differ but not which ones — that requires post-hoc testing.

Question 66

Why is ANOVA preferable to running separate pairwise t-tests for every pair of groups when comparing 4 groups?

a) ANOVA is faster to compute
b) Conducting multiple t-tests inflates the Type I error rate beyond α
c) ANOVA uses the F-distribution which is more accurate
d) t-tests cannot be used with more than 2 groups

Solution

Answer: b)

With 4 groups there are \(\binom{4}{2} = 6\) pairwise comparisons. At α = 0.05 each, P(at least one false positive) = 1 − (0.95)⁶ ≈ 0.26 — far above 5%.

Question 67

With 4 groups and α = 0.05, the probability of at least one false positive from all pairwise t-tests would be:

a) 0.05
b) 1 − (0.95)⁶ ≈ 0.264
c) 1 − (0.95)³ ≈ 0.143
d) 6 × 0.05 = 0.30

Solution

Answer: b)

4 groups → \(\binom{4}{2} = 6\) pairwise comparisons. P(at least one Type I error) = 1 − (1 − 0.05)⁶ ≈ 0.264. Option d is a conservative Bonferroni bound, not exact.
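The calculation generalizes to any number of groups. This sketch treats the pairwise tests as independent, which is the simplifying assumption behind the formula above; tests that share groups are not strictly independent, so the result is an approximation.

```r
k     <- 4      # number of groups
alpha <- 0.05   # per-test significance level

m    <- choose(k, 2)        # number of pairwise comparisons: 6
fwer <- 1 - (1 - alpha)^m   # P(at least one false positive), assuming independent tests

m
round(fwer, 4)  # 0.2649
m * alpha       # 0.30, the upper bound from simply adding the error rates
```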

Question 68

After a significant ANOVA result, a post-hoc test (like Tukey’s HSD) is used to:

a) Recalculate the F-statistic
b) Determine which specific pairs of groups differ significantly
c) Check the ANOVA conditions
d) Increase statistical power

Solution

Answer: b)

ANOVA tells us that groups differ; post-hoc tests (with appropriate corrections for multiple comparisons) tell us which pairs differ.

Section H: Inference for Proportions

Question 69

A study finds that 45 out of 200 patients in a clinical trial showed a positive response. The sample proportion is:

a) 200/45 = 4.44
b) 45/200 = 0.225
c) (45 + 200)/2 = 122.5
d) 45²/200 = 10.125

Solution

Answer: b)

\(\hat{p} = x/n = 45/200 = 0.225\). The sample proportion is always number of successes divided by sample size.

Question 70

The standard error for the sample proportion 0.225 with n = 200 is:

a) √(0.225 × 0.775 / 200) ≈ 0.0295
b) 0.225 / √200 ≈ 0.0159
c) √(0.225 / 200) ≈ 0.0335
d) 0.225 × 0.775 / 200 ≈ 0.000872

Solution

Answer: a)

\(SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.225 \times 0.775 / 200} = \sqrt{0.000872} \approx 0.0295\).
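The same arithmetic in R, with the success/failure counts alongside as a quick condition check. The counts come from Question 69; the variable names are illustrative.

```r
x <- 45    # patients with a positive response
n <- 200   # patients in the trial

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)

p_hat         # 0.225
round(se, 4)  # 0.0295

# Success/failure check: both counts should be at least 10
c(successes = n * p_hat, failures = n * (1 - p_hat))
```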

Question 71

The success/failure condition for proportions requires:

a) p > 0.5
b) np̂ ≥ 10 AND n(1−p̂) ≥ 10
c) n > 30
d) p̂ is normally distributed

Solution

Answer: b)

We need at least 10 observed successes and 10 observed failures. This ensures the normal approximation to the binomial is adequate.

Question 72

For testing H₀: p = p₀, the SE in the z-statistic uses:

a) p̂ (the sample proportion)
b) p₀ (the null value)
c) The pooled proportion
d) The population proportion

Solution

Answer: b)

Under H₀, we assume p = p₀ is true, so we use \(SE = \sqrt{p_0(1-p_0)/n}\). For a CI, we don’t assume a specific value for p, so we use \(\hat{p}\) instead.

Question 73

A hospital claims its C-section rate is 20%. You audit 150 deliveries and find 38 C-sections. The z-statistic for testing H₀: p = 0.20 vs. Hₐ: p ≠ 0.20 is:

z = (0.253 − 0.20) / √(0.253 × 0.747 / 150)
z = (0.253 − 0.20) / √(0.20 × 0.80 / 150)
z = (38 − 30) / √(0.20 × 0.80 × 150)
z = (0.253 − 0.20) / √(0.20 × 0.80 × 150)

Solution

Answer: b)

For a one-proportion z-test, use \(p_0\) (not \(\hat{p}\)) in the SE: \(z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}\). Here \(\hat{p} = 38/150 = 0.253\) and \(p_0 = 0.20\).
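The formula can be evaluated directly in R. This is a sketch of the hand calculation rather than a call to a built-in test, so no continuity correction is involved; the variable names are illustrative.

```r
# C-section audit: 38 of 150 deliveries, null value 20%
x <- 38
n <- 150
p0 <- 0.20

p_hat <- x / n
se_null <- sqrt(p0 * (1 - p0) / n)   # SE uses p0, not p_hat
z <- (p_hat - p0) / se_null

# Two-sided p-value from the standard normal
p_value <- 2 * pnorm(-abs(z))

round(c(p_hat = p_hat, z = z, p_value = p_value), 3)
```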

Use this R output for Questions 74–78:

    2-sample test for equality of proportions with continuity correction

data:  c(85, 110) out of c(400, 500)
X-squared = 0.036, df = 1, p-value = 0.849
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.0638  0.0488
sample estimates:
prop 1 prop 2 
0.2125 0.2200

Question 74

What are the two sample proportions?

85 and 110
0.2125 and 0.2200
400 and 500
0.849 and 0.036

Solution

Answer: b)

The “sample estimates” row gives the sample proportions: 85/400 = 0.2125 and 110/500 = 0.2200.

Question 75

The p-value = 0.849. The correct conclusion (α = 0.05) is:

There is significant evidence of a difference in proportions
There is insufficient evidence of a difference in proportions
The proportions are exactly equal
We should reduce α to find significance

Solution

Answer: b)

p = 0.849 is far above 0.05, so we fail to reject H₀. We do not conclude the proportions are equal, only that we lack evidence to distinguish them.

Question 76

The 95% CI (−0.0638, 0.0488) includes 0. This is consistent with:

Rejecting H₀ at α = 0.05
Failing to reject H₀ at α = 0.05
The CI and hypothesis test giving different answers
A statistically significant difference

Solution

Answer: b)

When the CI for a difference includes 0, we cannot rule out that the true difference is zero — consistent with failing to reject H₀.

Question 77

Why does the two-proportion test use a pooled proportion in the SE, but the CI does not?

The test assumes a specific value (H₀: p₁ = p₂) so it pools; the CI estimates without that assumption
The CI always uses pooled proportions
The test uses a larger standard error to be conservative
The CI uses individual proportions because they are more accurate

Solution

Answer: a)

Under H₀: p₁ = p₂, both groups share the same true p. Pooling uses all the data to estimate this common p. For a CI, we make no such assumption and estimate each group’s p separately.
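The two standard errors can be placed side by side in R. This sketch uses the counts from the output above and computes only the SEs by hand (no continuity correction); the variable names are illustrative.

```r
# Counts from the R output: 85/400 and 110/500
x <- c(85, 110)
n <- c(400, 500)

p_hat <- x / n               # separate estimates, used for the CI
p_pool <- sum(x) / sum(n)    # common estimate under H0: p1 = p2

# SE for the test: one pooled proportion for both groups
se_test <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# SE for the CI: each group keeps its own proportion
se_ci <- sqrt(sum(p_hat * (1 - p_hat) / n))

round(c(p_pool = p_pool, se_test = se_test, se_ci = se_ci), 5)
```

When the two sample proportions are close, as here, the two SEs nearly coincide; they diverge more as the proportions move apart.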

Question 78

The success/failure condition for the two-proportion test requires:

Both groups to have n > 30
n₁p̂₁ ≥ 10, n₁(1−p̂₁) ≥ 10, n₂p̂₂ ≥ 10, n₂(1−p̂₂) ≥ 10
The pooled proportion to be greater than 0.5
Both proportions to be equal

Solution

Answer: b)

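We check the success/failure condition separately in each group: all four counts must be ≥ 10. For the hypothesis test, the check can instead use the pooled proportion \(\hat{p}_{\text{pool}}\): \(n_1\hat{p}_{\text{pool}} \ge 10\), \(n_1(1-\hat{p}_{\text{pool}}) \ge 10\), \(n_2\hat{p}_{\text{pool}} \ge 10\), and \(n_2(1-\hat{p}_{\text{pool}}) \ge 10\). Option c is wrong because the requirement is on these counts, not on whether the pooled proportion exceeds 0.5.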

Section I: Chi-Square Tests

Use the following output and table for Questions 79–86:

    Pearson's Chi-squared test

data:  diet_cancer
X-squared = 17.5, df = 2, p-value = 0.000158

	Negative	Positive	Total
Mediterranean	210	40	250
Western	180	70	250
Vegan	110	15	125
Total	500	125	625

Question 79

Why is df = 2 for a 3×2 table?

(3−1) × (2−1) = 2
(3+2) − 1 = 4
3 × 2 = 6
3 − 1 = 2

Solution

Answer: a)

\(df = (r-1)(c-1) = (3-1)(2-1) = 2 \times 1 = 2\).

Question 80

The expected count for the cell “Mediterranean / Positive” is:

40
125 × 250/625 = 50
500 × 250/625 = 200
Cannot determine from the table

Solution

Answer: b)

\(E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = \frac{250 \times 125}{625} = 50\).

The observed count was 40, so there were fewer positive results than expected under independence.

Question 81

The p-value ≈ 0.00016 means:

Diet causes cancer 0.016% of the time
If diet type and cancer screening result were independent, the probability of observing an association this strong or stronger is about 0.016%
0.016% of participants had cancer
The chi-square statistic is wrong

Solution

Answer: b)

The p-value is always interpreted as: assuming H₀ (independence) is true, the probability of observing a test statistic this large or larger.

Question 82

A researcher concludes “the Mediterranean diet protects against cancer.” What is the limitation?

Chi-square can only detect associations, not causation
The p-value is too small to support this conclusion
Chi-square tests cannot be used with proportions
The sample size is too small

Solution

Answer: a)

Chi-square tests association between categorical variables. Without random assignment to diet, we cannot establish causation. Confounders (lifestyle, income, other health behaviors) could explain the pattern.

Question 83

This study sampled 625 individuals and asked about both diet and cancer screening. This is a test of:

Homogeneity
Independence
Goodness of fit
Proportions

Solution

Answer: b)

When one sample is drawn and two categorical variables are measured on each individual, we test for independence.

Question 84

If instead researchers had recruited 250 Mediterranean dieters, 250 Western dieters, and 125 vegans separately, then measured cancer screening, this would be:

A test of independence
A test of homogeneity
A two-proportion z-test
An ANOVA

Solution

Answer: b)

When multiple independent samples are drawn (row totals fixed by the researcher) and one outcome is measured, we test for homogeneity of proportions across groups.

Question 85

The conditions for this chi-square test require all expected counts to be:

E > 0
E ≥ 5
E ≥ 10
E ≥ 30

Solution

Answer: b)

The standard condition for chi-square tests is all expected cell counts ≥ 5 (not observed counts). If this fails, consider combining categories or using Fisher’s Exact Test.
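For the diet table above, the expected counts and the ≥ 5 check can be sketched in R as follows. The matrix is typed in from the table; the object names are illustrative.

```r
# Observed counts from the diet / screening table
observed <- matrix(
  c(210, 40,
    180, 70,
    110, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    diet = c("Mediterranean", "Western", "Vegan"),
    result = c("Negative", "Positive")
  )
)

# Expected count = row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Condition check: every expected count at least 5
all(expected >= 5)

# chisq.test() stores the same expected counts
all.equal(expected, chisq.test(observed)$expected, check.attributes = FALSE)
```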

Question 86

Which diet group has the lowest proportion of positive screening results?

Mediterranean
Western
Vegan
Vegan and Mediterranean tied

Solution

Answer: c)

Vegan: 15/125 = 12%; Mediterranean: 40/250 = 16%; Western: 70/250 = 28%. Vegan has the lowest positive rate.

Question 87

For a chi-square test, a large test statistic (relative to df) indicates:

Strong agreement between observed and expected counts
Large differences between observed and expected counts
A small sample size
That H₀ should be accepted

Solution

Answer: b)

\(\chi^2 = \sum (O-E)^2/E\). Large values arise when observed counts are far from what independence would predict.

Question 88

The chi-square distribution is:

Symmetric around 0
Always left-skewed
Non-negative and right-skewed
Identical to the t-distribution

Solution

Answer: c)

\(\chi^2\) is always ≥ 0 (because we square the deviations) and right-skewed. It approaches symmetry as df increases.

Question 89

A 2×2 table chi-square test and a two-proportion z-test are related by:

χ² = z
χ² = z²
z = √χ²
Both b and c

Solution

Answer: d)

For a 2×2 table, \(\chi^2 = z^2\), and equivalently \(z = \sqrt{\chi^2}\) (taking the appropriate sign). This is why both tests give identical two-sided p-values for 2×2 tables.
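The identity can be verified numerically. The counts below are hypothetical (they do not come from any question here), and the match holds only when the continuity correction is switched off, since `chisq.test()` applies it to 2×2 tables by default.

```r
# Hypothetical data: 30/100 successes in group 1, 45/100 in group 2
x <- c(30, 45)
n <- c(100, 100)

# Two-proportion z statistic with pooled SE
p_hat <- x / n
p_pool <- sum(x) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_z <- 2 * pnorm(-abs(z))

# Chi-square test on the same 2x2 table, no continuity correction
tab <- cbind(success = x, failure = n - x)
chi <- chisq.test(tab, correct = FALSE)

c(z_squared = z^2, chi_squared = unname(chi$statistic))
c(p_from_z = p_z, p_from_chisq = chi$p.value)
```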

Question 90

If all expected counts exactly equal observed counts, the chi-square statistic is:

Undefined
Equal to the degrees of freedom
0
1

Solution

Answer: c)

\(\chi^2 = \sum (O-E)^2/E\). If O = E for every cell, every term is \((0)^2/E = 0\), so \(\chi^2 = 0\) — perfect consistency with H₀.

Section J: Mixed Concepts and Reading R Output

Question 91

A study tests H₀: μ₁ = μ₂ = μ₃ and gets F = 0.42, p = 0.66. The correct interpretation is:

All three means are significantly different
There is insufficient evidence that any of the group means differ
Two of the three means are equal
The F-statistic is too large to be meaningful

Solution

Answer: b)

p = 0.66 is far above 0.05, so we fail to reject H₀. A small F (near or below 1, as here) means the between-group variation is no larger than what within-group variation alone would produce.

Question 92

In a regression model, a t-test is run on the slope coefficient. H₀ is:

The slope equals 1
The slope equals 0 (no linear relationship)
The intercept equals 0
R² equals 0

Solution

Answer: b)

H₀: β₁ = 0 tests whether the predictor has any linear relationship with the response. Failing to reject this means the slope is not significantly different from zero.

Question 93

The conditions for a two-sample t-test include:

Normal populations OR large samples (n₁ ≥ 30 and n₂ ≥ 30), independent groups
The two populations must have equal variances
Both samples must be the same size
The data must be from a controlled experiment

Solution

Answer: a)

The Welch t-test does not require equal variances. Equal sample sizes are not required. Observational studies can use t-tests (though causation cannot be concluded). The key conditions are independence and normality/large n.

Question 94

A paired t-test on 30 pairs has df = _____.

Solution

Answer: b)

For a paired t-test, df = n − 1 where n is the number of pairs, not the total number of observations. Here df = 30 − 1 = 29.

Question 95

Which test is most appropriate: 200 patients are randomized to Drug A, Drug B, or placebo, and mean pain scores are compared?

One-sample t-test
Paired t-test
Two-sample t-test
One-way ANOVA

Solution

Answer: d)

Three independent groups, comparing a numerical mean outcome → one-way ANOVA. Two-sample t-tests only handle two groups.

Question 96

Which test is most appropriate: 50 patients’ pain scores are measured before and after treatment?

One-sample t-test
Paired t-test
Two-sample t-test
Chi-square test

Solution

Answer: b)

Same patients measured at two time points → paired design. Each before–after pair is linked.

Question 97

Which test is most appropriate: testing whether blood type (A/B/AB/O) is associated with BMI category (normal/overweight/obese)?

ANOVA
Two-sample t-test
Chi-square test for independence
Regression

Solution

Answer: c)

Both variables are categorical (blood type = nominal; BMI category = ordinal treated as nominal) → chi-square test.

Question 98

Which test is most appropriate: studying the relationship between age and resting heart rate?

Chi-square test
ANOVA
Linear regression
Paired t-test

Solution

Answer: c)

Both variables are numerical → linear regression (or correlation). ANOVA would require categorizing one variable.

Question 99

A confidence interval for a proportion is (0.42, 0.58). If we test H₀: p = 0.50 (two-sided, α = 0.05), we would:

Reject H₀ because 0.50 is not the midpoint
Fail to reject H₀ because 0.50 falls within the interval
Reject H₀ because the interval is wide
Need a p-value to decide

Solution

Answer: b)

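0.50 lies within (0.42, 0.58), so it is a plausible value for p and we fail to reject H₀: p = 0.50 at α = 0.05. A two-sided test and the matching CI lead to the same decision; for proportions the agreement is approximate, because the CI uses p̂ in the SE while the test uses p₀.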

Question 100

A study comparing penguin bill lengths across 3 species produces:

             Df Sum Sq Mean Sq F value   Pr(>F)    
species       2  7194   3597   410.6  <2e-16 ***
Residuals   330  2892      8.8

The between-species variability is ______ times greater than the within-species variability:

2
410.6
7194
3597

Solution

Answer: b)

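F = MSG/MSE = 3597/8.76 ≈ 410, which matches the reported 410.6 up to rounding of the table entries (MSE = 2892/330 ≈ 8.76 is displayed as 8.8). The F-statistic is the ratio of between-group to within-group variance.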
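The ratio can be recomputed from the sums of squares in the table. This sketch uses only the four printed numbers; because they are rounded, the result lands close to, but not exactly on, the printed F value.

```r
# Values read from the ANOVA table
ss_between <- 7194
df_between <- 2
ss_within <- 2892
df_within <- 330

msg <- ss_between / df_between   # mean square between groups
mse <- ss_within / df_within     # mean square within groups (residual)
f_stat <- msg / mse

# Right-tail area of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(MSG = msg, MSE = round(mse, 2), F = round(f_stat, 1))
p_value
```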

Question 101

A 95% CI for a proportion is (0.31, 0.49). The sample proportion is:

0.31
0.49
0.40
0.18

Solution

Answer: c)

\(\hat{p}\) = midpoint = \((0.31 + 0.49)/2 = 0.40\).

Question 102

What is the margin of error for the CI (0.31, 0.49)?

0.18
0.09
0.40
0.31

Solution

Answer: b)

Margin of error = half the width = \((0.49 - 0.31)/2 = 0.18/2 = 0.09\).

Question 103

For large samples, the sampling distribution of p̂ is approximately:

t-distributed
chi-square distributed
Normal
Uniform

Solution

Answer: c)

By the CLT applied to proportions: when np̂ ≥ 10 and n(1−p̂) ≥ 10, the distribution of p̂ is approximately normal.

Question 104

The residual standard error in a regression output (2.14 on 98 df) estimates:

The average error in predicting y
The standard deviation of x
The slope of the regression line
The mean of the residuals

Solution

Answer: a)

The residual standard error (RSE) estimates the typical distance between observed y values and the regression line — the standard deviation of the residuals.

Question 105

An R² = 0.92 in a regression means:

The slope is 0.92
92% of the variability in y is explained by the linear relationship with x
The correlation coefficient is 0.92
The model predicts correctly 92% of the time

Solution

Answer: b)

R² is always “proportion of variability in y explained by the model.” Note: r = ±√0.92 ≈ ±0.959, not 0.92 itself.

Question 106

If r = −0.85 between two variables, we can say:

As x increases by 1 unit, y decreases by 0.85 units
There is a strong negative linear relationship
x causes y to decrease
85% of the variability in y is explained by x

Solution

Answer: b)

r describes strength and direction. It is not a slope (option a), does not imply causation (option c), and \(R^2 = r^2 = 0.7225\), not 0.85 (option d).

Question 107

A chi-square test with df = 4 and χ² = 2.1 (p = 0.72) would lead to:

Rejecting H₀ of independence
Failing to reject H₀ of independence
Concluding the variables are definitely independent
Concluding the test was underpowered

Solution

Answer: b)

p = 0.72 >> 0.05. We fail to reject H₀. We do NOT conclude independence — only that we lack evidence of association.

Question 108

ANOVA assumes:

The populations have equal means
The populations have equal variances (homoscedasticity)
The populations are all skewed
The sample sizes are all equal

Solution

Answer: b)

ANOVA conditions: (1) independent random samples, (2) approximately normal populations (or large n), (3) equal population variances. Equal sample sizes are helpful but not required.

Question 109

The pooled standard deviation in a two-sample t-test:

Weights both sample standard deviations equally
Weights the standard deviations by their respective degrees of freedom
Always gives a smaller SE than the Welch method
Is used when populations clearly have unequal variances

Solution

Answer: b)

The pooled SD is a weighted average using df as weights: \(s_p^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2]/(n_1+n_2-2)\). It is used when variances are assumed equal.
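A small helper makes the weighting visible. The function name `pooled_sd` and the input values are hypothetical, chosen only to illustrate the formula above.

```r
# Pooled SD: variances weighted by their degrees of freedom
pooled_sd <- function(s1, n1, s2, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

# Unequal sample sizes: the larger group (n = 30, s = 6) dominates
sp_unequal <- pooled_sd(s1 = 4, n1 = 10, s2 = 6, n2 = 30)

# Equal sample sizes: reduces to the root of the average variance
sp_equal <- pooled_sd(s1 = 3, n1 = 20, s2 = 4, n2 = 20)

round(c(unequal_n = sp_unequal, equal_n = sp_equal), 3)
```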

Question 110

A paired study finds t = 1.85, df = 19, p = 0.079 (two-sided). At α = 0.05:

The difference is statistically significant
The difference is not statistically significant
The null hypothesis is accepted
The study must be redone

Solution

Answer: b)

p = 0.079 > 0.05. Fail to reject H₀. We never “accept” H₀. The study could be repeated with more power, but there is no requirement to do so.

Question 111

Power = 1 − β. If β = 0.20, power is:

0.20
0.80
1.20
0.80%

Solution

Answer: b)

Power = 1 − β = 1 − 0.20 = 0.80. Power of 80% is considered the conventional minimum for well-designed studies.

Question 112

Which of the following best describes the relationship between α and power?

Increasing α increases power
Increasing α decreases power
α and power are independent
Power = 1 − α

Solution

Answer: a)

Increasing α makes it easier to reject H₀ → increases power but also increases Type I error. There is a fundamental trade-off between α (Type I error) and β (Type II error).
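One way to see the trade-off is to hold the design fixed and change only α. The sketch below uses `power.t.test()` from base R's stats package; the sample size, effect size, and SD are hypothetical values chosen for illustration.

```r
# Hypothetical two-sample design: 30 per group, true difference 5, SD 10
power_at <- function(alpha) {
  power.t.test(n = 30, delta = 5, sd = 10, sig.level = alpha)$power
}

pw_05 <- power_at(0.05)
pw_10 <- power_at(0.10)

# Power rises when alpha is relaxed; beta = 1 - power falls
round(c(power_alpha_05 = pw_05, power_alpha_10 = pw_10), 3)
```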

Question 113

A study is designed with 80% power. This means:

There is an 80% chance H₀ is false
If the effect exists, there is an 80% chance of detecting it
The p-value will be less than 0.20
The Type I error rate is 20%

Solution

Answer: b)

Power = P(reject H₀ | H₀ is false) = 0.80. There is a 20% chance of a Type II error — missing a real effect.

Question 114

If two 95% confidence intervals for two group means do not overlap, we can conclude:

The difference is not statistically significant
The difference is statistically significant at α = 0.05
The difference is practically significant
A t-test is unnecessary

Solution

Answer: b)

Non-overlapping 95% CIs imply that the difference is significant at α = 0.05 (in fact at a somewhat stricter level). The rule is conservative in one direction only: overlapping CIs do not necessarily imply non-significance, so in that case a formal test is still needed.

Question 115

The standard error of the difference between two independent means is:

SE = s₁/√n₁ + s₂/√n₂
SE = √(s₁²/n₁ + s₂²/n₂)
SE = (s₁ + s₂) / √(n₁ + n₂)
SE = s_pooled / √(n₁ + n₂)

Solution

Answer: b)

The correct formula adds variances (not standard deviations): \(SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}\). Option a is wrong because you cannot add SEs directly.
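A short numeric comparison shows how far off option a can be. The summary statistics below are hypothetical.

```r
# Hypothetical summary statistics for two independent groups
s1 <- 4; n1 <- 25
s2 <- 6; n2 <- 36

# Correct: add the variances of the two sample means, then take the root
se_diff <- sqrt(s1^2 / n1 + s2^2 / n2)

# Incorrect (option a): adding the two standard errors directly
se_wrong <- s1 / sqrt(n1) + s2 / sqrt(n2)

round(c(correct = se_diff, option_a = se_wrong), 3)
```

Adding the SEs directly always gives a value at least as large as the correct one, so it would understate the test statistic.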

Question 116

In the output Pr(>|t|) = 0.0043, the test is:

One-sided
Two-sided
Cannot tell from this notation
This notation indicates a chi-square test

Solution

Answer: b)

The notation |t| (absolute value of t) indicates the p-value is for a two-sided test: P(|T| > |t_observed|).

Question 117

Cramér’s V measures:

The significance of a chi-square test
The effect size (strength) of an association in a contingency table
The degrees of freedom
The expected cell count

Solution

Answer: b)

Cramér’s V ranges from 0 (no association) to 1 (perfect association). It is the chi-square analog of a correlation coefficient and measures effect size, not significance.

Question 118

For a regression slope, a 95% CI that does NOT include 0 means:

The intercept is significant
The slope is significantly different from 0 at α = 0.05
R² > 0.50
The residuals are normally distributed

Solution

Answer: b)

A 95% CI for the slope that excludes 0 is equivalent to rejecting H₀: β₁ = 0 at α = 0.05 (two-sided).

Question 119

A researcher says “increasing sleep by 1 hour causes exam scores to increase by 3 points, based on our regression.” What is the issue?

The slope is too small to be meaningful
Regression shows association; “causes” requires experimental design
Regression cannot predict exam scores
The intercept is not reported

Solution

Answer: b)

Unless sleep was experimentally manipulated (randomized), we only have an observational association. Confounders (study habits, stress, health) could explain the relationship.

Question 120

When checking conditions for a chi-square test, you calculate expected count = 3.5 for one cell. You should:

Proceed with the test as normal
Consider combining categories or using an alternative test
Increase α to 0.10 to compensate
Remove that cell from the table

Solution

Answer: b)

The chi-square approximation is not valid when expected counts are below 5. Options include collapsing categories, collecting more data, or using Fisher’s Exact Test.

Question 121

The df for a one-sample t-test with n = 25 is:

Solution

Answer: b)

df = n − 1 = 25 − 1 = 24 for any one-sample or paired t-test.

Question 122

An ANOVA F-test is always _____ tailed.

Left
Two
Right
It depends on the alternative hypothesis

Solution

Answer: c)

F-statistics are always non-negative. Large F values provide evidence against H₀. The p-value is always the area in the right tail of the F-distribution.

Question 123

Which R function conducts a chi-square test?

t.test()
prop.test()
chisq.test()
aov()

Solution

Answer: c)

chisq.test() conducts Pearson’s chi-square test. prop.test() conducts proportion tests (which for 2×2 tables produces an equivalent chi-square result, but is framed differently).

Question 124

Which R function conducts a paired t-test?

t.test(x, y, paired = TRUE)
chisq.test(x, y)
aov(y ~ group)
lm(y ~ x)

Solution

Answer: a)

Setting paired = TRUE in t.test() tells R to compute differences within pairs and test whether the mean difference equals zero.
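The equivalence between a paired test and a one-sample test on the differences can be confirmed directly. The `pre` and `post` vectors are hypothetical data made up for this sketch.

```r
# Hypothetical before/after scores for 8 subjects
pre  <- c(5.1, 4.8, 6.0, 5.5, 4.9, 6.2, 5.8, 5.0)
post <- c(5.9, 5.0, 6.8, 5.4, 5.6, 6.9, 6.1, 5.7)

# Paired t-test
paired <- t.test(post, pre, paired = TRUE)

# One-sample t-test on the within-pair differences
one_sample <- t.test(post - pre)

# Same statistic, same df (number of pairs minus 1), same p-value
c(t_paired = unname(paired$statistic), t_diff = unname(one_sample$statistic))
c(df = unname(paired$parameter), p_value = paired$p.value)
```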

Question 125

In a two-proportion test where the success/failure condition is barely met (np̂ = 10), you should:

Proceed and report results as you normally would
Note that the condition is just met and interpret results cautiously
Use a t-test instead
Double the sample size before analyzing

Solution

Answer: b)

np̂ = 10 is the minimum threshold. The normal approximation will be adequate but not ideal. It is good practice to note the borderline condition and interpret with appropriate caution.

Question 126

A scatterplot of residuals vs. fitted values should show:

A clear curved pattern
Randomly scattered points with no pattern
A strong positive linear pattern
All residuals close to zero

Solution

Answer: b)

A random scatter (no pattern) indicates that the linearity and equal variance conditions are met. Any systematic pattern (curve, fan, etc.) signals a violation.

Question 127

Which of the following is a correct statement about p-values?

p = 0.05 means H₀ has a 5% chance of being true
A very small p-value means a large effect
p-value is the probability of the observed data (or more extreme) given H₀ is true
p < 0.05 always implies the result is important

Solution

Answer: c)

This is the correct definition. p-values say nothing about the probability of H₀ being true (a), the effect size (b), or practical importance (d).

Question 128

A 90% CI is wider than a 99% CI.

True
False — a 99% CI is wider
They are the same width
It depends on the sample size

Solution

Answer: b)

Higher confidence level → larger critical value → wider CI. A 99% CI (\(z^* = 2.576\)) is wider than a 90% CI (\(z^* = 1.645\)).
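The critical values quoted above come from the standard normal quantile function. A minimal sketch:

```r
# Two-sided critical values z* for several confidence levels
conf_levels <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# Margin of error scales with z*, so the interval widens with confidence
data.frame(conf_level = conf_levels, z_star = round(z_star, 3))
```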

Question 129

When a 95% CI for the difference in two means is (−2, 8), we:

Conclude there is a significant difference (α = 0.05)
Cannot conclude there is a significant difference (interval includes 0)
Conclude the means are equal
Need to know the sample size to interpret

Solution

Answer: b)

The CI contains 0, meaning 0 is a plausible value for the difference. We fail to reject H₀: μ₁ = μ₂ at α = 0.05.

Question 130

The mean of the sampling distribution of \(\bar{x}\) is:

The sample mean \(\bar{x}\)
The standard error σ/√n
The population mean μ
Zero

Solution

Answer: c)

The sampling distribution of \(\bar{x}\) is centered at the population mean μ. This is why \(\bar{x}\) is an unbiased estimator of μ.

Question 131

If H₀: p = 0.40 and the sample has \(\hat{p} = 0.40\), the test statistic z =:

0
1
Undefined
It depends on n

Solution

Answer: a)

\(z = (\hat{p} - p_0)/SE = (0.40 - 0.40)/SE = 0/SE = 0\). The observed proportion exactly matches the null, so there is no deviation to speak of.

Question 132

In ANOVA, MSG (Mean Square Between Groups) estimates:

The variance within groups
The variance explained by group differences
The total variance
The residual variance

Solution

Answer: b)

MSG = SS_between / df_between. It captures variability due to group membership. MSE captures variability within groups. The F-ratio compares them.

Question 133

A correlation of r = 0 means:

There is no relationship between x and y
There is no linear relationship between x and y
x and y are perfectly related
The regression slope is 1

Solution

Answer: b)

r measures the strength of the linear relationship. r = 0 could still be consistent with a strong nonlinear (e.g., quadratic) relationship.

Question 134

For a regression line, the predicted value at x = \(\bar{x}\) (the mean of x) is always:

0
b₀ (the intercept)
\(\bar{y}\) (the mean of y)
R²

Solution

Answer: c)

The regression line always passes through \((\bar{x}, \bar{y})\). Plugging x = \(\bar{x}\) into \(\hat{y} = b_0 + b_1 x\) gives \(\hat{y} = \bar{y}\).
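This property is easy to confirm with a fitted model. The data below are hypothetical values invented for the sketch.

```r
# Hypothetical data
x <- c(1, 2, 4, 5, 7, 9)
y <- c(2.1, 2.9, 5.2, 5.8, 8.1, 9.7)

fit <- lm(y ~ x)

# Prediction at the mean of x
y_at_xbar <- predict(fit, newdata = data.frame(x = mean(x)))

# It coincides with the mean of y (up to floating-point error)
c(predicted = unname(y_at_xbar), y_bar = mean(y))
```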

Question 135

A hospital has a 3% surgical complication rate. A quality control audit of 30 surgeries finds 2 complications (6.7%). What is the most important reason to be cautious about testing this?

2 complications is a very small number
The success/failure condition: np₀ = 30 × 0.03 = 0.9 < 10
The sample size is too large
The complication rate is too high

Solution

Answer: b)

np₀ = 30 × 0.03 = 0.9, far below 10, so the success/failure condition fails badly and the normal approximation cannot be trusted. With so few expected complications, the binomial distribution is strongly right-skewed and the z-test is not reliable.
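When the condition fails like this, one common alternative is to work with the binomial distribution directly. The sketch below is an illustration of that idea, not part of the question; it uses base R's `pbinom()` and `binom.test()`, and the one-sided alternative is an assumption made for the example.

```r
# Audit: 2 complications in 30 surgeries, claimed rate 3%
x <- 2
n <- 30
p0 <- 0.03

# Expected number of complications under H0 (the failed condition)
n * p0

# Exact P(X >= 2) under Binomial(30, 0.03)
p_exact <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# Same one-sided p-value from the built-in exact test
p_binom_test <- binom.test(x, n, p = p0, alternative = "greater")$p.value

round(c(manual = p_exact, binom_test = p_binom_test), 4)
```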

Question 136

In a two-sample t-test, the degrees of freedom (Welch approximation) are:

n₁ + n₂
n₁ + n₂ − 1
n₁ + n₂ − 2
A complex formula, typically smaller than n₁ + n₂ − 2

Solution

Answer: d)

The Welch-Satterthwaite df formula produces a non-integer value that is ≤ n₁ + n₂ − 2 (the pooled df). R computes this automatically.
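The formula R applies can be written out and compared with what `t.test()` reports. The two samples are simulated, hypothetical data; the seed is fixed only so the sketch is reproducible.

```r
set.seed(42)

# Hypothetical samples with different sizes and spreads
g1 <- rnorm(25, mean = 50, sd = 4)
g2 <- rnorm(36, mean = 52, sd = 6)

# Variance of each sample mean
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch-Satterthwaite degrees of freedom
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# df reported by R's default (Welch) two-sample t-test
r_df <- unname(t.test(g1, g2)$parameter)

# Pooled df for comparison
pooled_df <- length(g1) + length(g2) - 2

c(welch_manual = welch_df, welch_from_R = r_df, pooled = pooled_df)
```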

Question 137

The residual in a regression equals:

y − ŷ
ŷ − y
y − ȳ
x − x̄

Solution

Answer: a)

Residual = observed − predicted = \(y_i - \hat{y}_i\). Positive residuals mean the model underpredicts; negative residuals mean overprediction.

Question 138

If p-value = 0.001 and α = 0.05, we:

Fail to reject H₀
Reject H₀
Accept H₀
Cannot make a decision

Solution

Answer: b)

0.001 < 0.05 = α, so we reject H₀. We never “accept” H₀ — we can only reject or fail to reject it.

Question 139

Which is the best description of “80% confidence interval”?

80% of all data falls in this interval
If repeated many times, 80% of such intervals contain the true parameter
There is an 80% chance the true mean equals the midpoint
The interval is correct 80% of the time

Solution

Answer: b)

The confidence level is a long-run frequency property of the procedure, not a probability statement about any single interval.

Question 140

For a regression, the LINE conditions stand for:

Likelihood, Independence, Normality, Error
Linearity, Independence, Normal residuals, Equal variance
Large n, Independence, Null hypothesis, Estimation
Linearity, Intercept, Nonlinearity, Errors

Solution

Answer: b)

LINE: Linearity (linear relationship between x and y), Independence (observations independent), Normal residuals (residuals approximately normal), Equal variance (constant spread of residuals).

Question 141

A statistically significant ANOVA tells us:

Which specific group means are different
That the variation between groups is larger than within groups, relative to chance
All groups are significantly different from each other
The F-statistic is greater than 5

Solution

Answer: b)

A significant ANOVA (F large, p small) only tells us that group means are not all equal. Post-hoc tests are needed to identify which groups differ.

Question 142

A chi-square test for homogeneity is used when:

One sample is drawn and two categorical variables are measured
Multiple independent samples are drawn and one categorical outcome is measured
Means from multiple groups are compared
Proportions are compared using a z-test

Solution

Answer: b)

Homogeneity: researcher fixes group sizes (row totals) and measures one outcome. Independence: one sample, two variables measured on each person.

Question 143

In a 4×3 contingency table, df =:

(4−1) × (3−1) = 6
(4 × 3) − 1 = 11
4 + 3 = 7
4 × 3 = 12

Solution

Answer: a)

\(df = (r-1)(c-1) = (4-1)(3-1) = 3 \times 2 = 6\).

Question 144

All else equal, a larger effect size leads to:

Lower power
Higher power
A larger Type I error rate
A wider confidence interval

Solution

Answer: b)

Larger effects are easier to detect. Higher power means we are more likely to correctly reject a false H₀.

Question 145

The standard normal distribution is used (instead of t) for inference about proportions because:

Proportions are always normally distributed
The SE formula for proportions is derived from the binomial, and for large n approximates the normal
t-distributions cannot be used with categorical data
Proportions have smaller standard errors than means

Solution

Answer: b)

For large n (success/failure condition met), the binomial distribution is well approximated by the normal. The SE \(\sqrt{p(1-p)/n}\) depends only on p, so there is no separate σ to estimate from the data; we therefore use z rather than t, which exists to account for estimating an unknown σ.

Question 146

A researcher reports r = 0.65 and p = 0.002 for a correlation between diet quality and cognitive test scores. The correct interpretation is:

Diet quality causes higher cognitive scores
There is a moderately strong positive linear relationship; diet quality explains approximately 42% of variability in cognitive scores
There is a 65% correlation, meaning 65% of the variation in cognitive scores is due to diet
The p-value proves a causal relationship

Solution

Answer: b)

r = 0.65 indicates a moderate positive relationship. \(R^2 = r^2 = 0.4225 \approx 42\%\) of variability explained. Correlation never implies causation.

Question 147

Adding more explanatory variables to a regression model always:

Decreases R²
Increases or maintains R²
Decreases the residual standard error
Increases the significance of the slope

Solution

Answer: b)

Adding predictors never decreases R² (it can only stay the same or increase), which is why adjusted R² is preferred when comparing models with different numbers of predictors.
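A small simulation illustrates the point. Everything here is hypothetical: the data are simulated, and `noise` is a predictor generated independently of the response.

```r
set.seed(1)
n <- 50

# Simulated response that depends on x1 only
x1 <- rnorm(n)
y <- 2 + 1.5 * x1 + rnorm(n)

# A predictor unrelated to y
noise <- rnorm(n)

fit_small <- lm(y ~ x1)
fit_big <- lm(y ~ x1 + noise)

r2 <- c(small = summary(fit_small)$r.squared,
        big = summary(fit_big)$r.squared)
adj_r2 <- c(small = summary(fit_small)$adj.r.squared,
            big = summary(fit_big)$adj.r.squared)

# R-squared cannot go down; adjusted R-squared is free to drop
rbind(r2, adj_r2)
```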

Question 148

The null hypothesis for a chi-square test of independence is:

The two variables are perfectly correlated
The two variables are independent (no association)
All expected counts equal all observed counts
The test statistic equals the degrees of freedom

Solution

Answer: b)

H₀: the two categorical variables are independent (knowing the value of one tells you nothing about the other). Option c would be perfect fit, not independence.

Question 149

When the population is normal, a one-sample t-test is valid for:

Any sample size
Only n ≥ 30
Only n ≥ 10
Only large samples

Solution

Answer: a)

When the population is truly normal, the t-test is exact for any n. The n ≥ 30 rule is a practical guideline for when the CLT applies to non-normal populations.

Question 150

A result is practically significant if:

p < 0.05
The effect is large enough to matter in real-world terms
The study was well-designed
The confidence interval does not include 0

Solution

Answer: b)

Practical significance is a judgment about whether the effect size is meaningful, not a statistical threshold. It requires domain knowledge, not just a p-value.

PART 2: SHORT ANSWER (50 questions, 2 points each)

For all questions, write complete sentences. Provide both a statistical conclusion and a plain-language interpretation.

SA1

The sampling distribution of \(\bar{x}\) for samples of size n = 36 from a population with μ = 70 and σ = 18 has what mean and standard error? Describe in one sentence what this distribution represents.

Solution

Mean = 70; \(SE = 18/\sqrt{36} = 3\).

This distribution represents all possible sample means we could get if we repeatedly drew samples of size 36 from this population — most would be near 70, and about 95% would fall within ±6 (i.e., two standard errors) of 70.
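The idea of "all possible sample means" can be approximated by simulation. This sketch assumes a normal population, which the question does not state, and uses 10,000 repeated samples; the seed is fixed only for reproducibility.

```r
set.seed(7)

mu <- 70
sigma <- 18
n <- 36

# Theoretical standard error
se <- sigma / sqrt(n)

# Draw many samples of size n and record each sample mean
xbar <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

# Centre and spread of the simulated sampling distribution
c(mean_of_means = mean(xbar), sd_of_means = sd(xbar), theoretical_se = se)

# Share of sample means within two standard errors of mu
mean(abs(xbar - mu) <= 2 * se)
```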

SA2

A 95% CI for mean recovery time is (6.4, 9.2) days. A doctor claims “most patients recover in 6 to 9 days.” Is this a correct interpretation of the CI? Explain.

Solution

No. The CI is a statement about the population mean, not individual patients. It says we are 95% confident the true mean recovery time lies between 6.4 and 9.2 days. Individual recovery times will vary much more widely around that mean — many patients could recover outside this range. To describe individual patient variation, you would use a prediction interval, not a confidence interval.

SA3

A study testing whether a new antidepressant reduces depression scores gets t = −1.85, df = 28, p = 0.075 (two-sided). Write a complete conclusion at α = 0.05, including what this p-value means.

Solution

Statistical conclusion: p = 0.075 > 0.05, so we fail to reject H₀ at α = 0.05. There is insufficient evidence that the antidepressant reduces depression scores.

Interpretation of p-value: If the drug had no true effect (H₀ true), there would be a 7.5% chance of observing a mean difference at least as far from zero as this one, in either direction, just by sampling variability.

Note: This is not strong evidence that the drug doesn’t work — the study may simply lack sufficient power to detect a real effect.

SA4

Define Type I and Type II errors. In the context of testing whether a new drug lowers blood pressure, describe a real-world consequence of each error type.

Solution

Type I error (false positive, probability = α): Rejecting H₀ when it is true — concluding the drug lowers blood pressure when it actually does not. Consequence: the drug may be approved and prescribed, exposing patients to side effects and costs with no real benefit.

Type II error (false negative, probability = β): Failing to reject H₀ when it is false — concluding there is insufficient evidence that the drug works, even though it actually does. Consequence: an effective treatment is abandoned; patients continue to suffer uncontrolled hypertension.

SA5

A study with n = 5,000 finds that people who exercise 3 hours/week have 0.5% lower resting heart rate than sedentary people (p = 0.03). Comment on both statistical and practical significance.

Solution

Statistically significant: p = 0.03 < 0.05, so we reject H₀. A difference this large would be unlikely to arise from sampling variability alone if there were no true difference.

Practically insignificant: A 0.5% difference in resting heart rate (e.g., ~0.35 bpm if mean is 70 bpm) is far too small to be clinically meaningful. With n = 5,000, even trivially small effects achieve statistical significance. Clinicians would not change treatment recommendations based on a 0.35 bpm difference. This is a classic example where statistical significance overstates the importance of the finding.

SA6

Use the following output to answer this question:

    Paired t-test

data:  post - pre
t = 4.12, df = 39, p-value = 0.00018
95 percent confidence interval:
 2.10  6.30
sample estimates:
mean of x 
     4.20

This test compares sleep quality scores (0–10) before and after a sleep hygiene program for 40 participants. Write a complete conclusion with interpretation.

Solution

Hypotheses: H₀: mean difference = 0; Hₐ: mean difference ≠ 0.

Statistical conclusion: t(39) = 4.12, p = 0.00018 < 0.05. We reject H₀. There is very strong evidence that sleep quality scores changed significantly after the program.

Interpretation: The mean increase in sleep quality score was 4.20 points. We are 95% confident the true mean improvement is between 2.10 and 6.30 points on the 0–10 scale. Since the CI is entirely positive, scores improved on average after the program. An improvement of 2–6 points on a 10-point scale would likely be considered clinically meaningful.

SA7

Explain why a paired design is more powerful than an independent samples design for measuring change over time within the same subjects.

Solution

In a paired design, we analyze within-subject differences, which removes between-subject variability. People naturally differ in their baseline values (e.g., some have inherently higher blood pressure). In an independent samples design, this between-person variation inflates the SE, making it harder to detect the treatment effect. By looking only at each person’s own change, the paired test eliminates this noise: the SE of the mean difference is often much smaller than the SE of the difference between two independent group means, leading to a larger t-statistic and greater power.

SA8

A Welch two-sample t-test gives t = 2.14, df = 45.2, p = 0.037. The 95% CI for the difference is (0.18, 5.82). Interpret the CI and comment on practical significance for weight loss in kg.

Solution

CI interpretation: We are 95% confident the true difference in mean weight loss between the two groups is between 0.18 and 5.82 kg. Since the interval excludes 0, there is a statistically significant difference at α = 0.05.

Practical significance: The very wide CI reveals substantial uncertainty. The lower bound (0.18 kg ≈ 6 oz) is trivially small and clinically irrelevant, while the upper bound (5.82 kg) would be clinically meaningful. With this much uncertainty, we cannot confidently say whether the treatment produces a practically important weight difference. A larger study is needed to narrow the CI and determine whether the effect is clinically relevant.

SA9

A power analysis suggests you need n = 120 participants per group to detect an effect size of d = 0.40 with 80% power at α = 0.05. Your budget allows only n = 60 per group. Describe two consequences of proceeding with the smaller sample.

Solution

1. Reduced power: With n = 60 instead of 120, power will be well below 80% (roughly 55–60% for d = 0.40). There is a substantially higher probability of failing to detect a real effect (Type II error), meaning the study may conclude “no effect” even if the treatment truly works.

2. Less precise estimates: Confidence intervals will be wider, providing less informative estimates of effect size. Even if the result is statistically significant, the CI will span a large range, making it difficult to assess whether the effect is practically meaningful.
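The power figure quoted in point 1 can be approximated by hand. The sketch below uses a normal approximation to the two-sample test (an assumption made for simplicity; an exact calculation based on the t-distribution would give a slightly lower value), and the function name `approx_power` is hypothetical:

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    shift = d * math.sqrt(n_per_group / 2)   # expected size of the test statistic
    # Probability the statistic lands beyond the critical value
    # in the direction of the true effect (the far tail is ignored)
    return 1 - z.cdf(z_crit - shift)

power_60 = approx_power(d=0.40, n_per_group=60)
print(f"Approximate power with n = 60 per group: {power_60:.2f}")
```

Under these assumptions the result lands near 0.59, in line with the 55–60% range stated above.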

SA10

A researcher states: “Our study failed to detect a significant effect (p = 0.15), proving that the treatment doesn’t work.” Identify and explain the flaw.

Solution

Flaw: Failing to reject H₀ is not the same as proving H₀ is true. “Absence of evidence is not evidence of absence.”

A non-significant result (p = 0.15) could occur because: (1) the treatment truly has no effect, OR (2) the study was underpowered — the sample was too small to detect a real effect. Without knowing the power of the study, we cannot interpret the non-significant result as proof that the treatment doesn’t work. The researcher should report a confidence interval and discuss whether the study was adequately powered to detect a clinically meaningful effect.

SA11

Use this regression output for SA11–SA15:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   95.200     12.800    7.44    <2e-16 ***
salt_intake    1.820      0.410    4.44    0.0001 ***

Multiple R-squared:  0.289
Residual standard error: 8.3 on 73 degrees of freedom

This model predicts systolic blood pressure (mmHg) from daily salt intake (grams).

Write the regression equation and interpret the slope in biological terms.

Solution

Equation: \(\widehat{\text{BP}} = 95.2 + 1.82 \times \text{salt intake}\)

Slope interpretation: For each additional gram of daily salt intake, predicted systolic blood pressure increases by 1.82 mmHg on average. This indicates a positive association between salt intake and blood pressure. (Because salt intake is the only predictor in this model, no other variables are being held constant.)

SA12

Interpret R² = 0.289 in context.

Solution

Daily salt intake explains approximately 28.9% of the variability in systolic blood pressure across participants. The linear relationship with salt therefore accounts for less than a third of BP variation; the remaining ~71% is not explained by this model and reflects other factors (age, genetics, exercise, medications, overall diet) and individual variation.

SA13

Is salt intake a statistically significant predictor of blood pressure? Cite specific output values.

Solution

Yes. The t-statistic for the slope is t = 4.44 with p = 0.0001 < 0.05. We reject H₀: β₁ = 0. There is very strong evidence that salt intake is a statistically significant linear predictor of systolic blood pressure.

SA14

A patient consumes 10g of salt per day. What is the predicted systolic blood pressure? Show your work.

Solution

\(\hat{y} = 95.2 + 1.82 \times 10 = 95.2 + 18.2 = 113.4\) mmHg.

We predict a systolic blood pressure of 113.4 mmHg for someone consuming 10g of salt daily. This prediction is valid only if 10g is within the range of salt intakes observed in the study.
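The same prediction can be written as a small function. This is a sketch only: the coefficients come from the regression output, while the function name `predict_bp` is illustrative.

```python
# Coefficients read from the regression output
intercept = 95.2   # predicted BP (mmHg) at 0 g of salt per day
slope = 1.82       # change in predicted BP per additional gram of salt

def predict_bp(salt_grams):
    """Predicted systolic blood pressure (mmHg) for a given daily salt intake."""
    return intercept + slope * salt_grams

predicted = predict_bp(10)
print(f"Predicted systolic BP at 10 g/day: {predicted:.1f} mmHg")
```

The function will happily return a value for any input, so the caution about staying within the observed range of salt intakes still has to be applied by the analyst.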

SA15

A researcher concludes: “Reducing salt intake will lower blood pressure.” Is this conclusion justified? Explain.

Solution

Not fully justified. The regression output shows a statistically significant positive association between salt intake and blood pressure. However, unless this study involved random assignment of salt intake levels (an experiment), we cannot conclude causation. Confounders — such as overall diet quality, exercise habits, or socioeconomic status — could explain why people who eat more salt also have higher blood pressure. To justify a causal claim, a randomized controlled trial where participants are assigned to different sodium intake levels would be needed.

SA16

Use this ANOVA output for SA16–SA20:

Analysis of Variance Table

            Df Sum Sq Mean Sq F value   Pr(>F)    
habitat      3  156.8   52.27   8.43   0.0001 ***
Residuals   96  595.2    6.20

This compares mean nest-building time (hours) for birds from 4 habitat types.

State the hypotheses and write a complete conclusion.

Solution

H₀: μ₁ = μ₂ = μ₃ = μ₄ (mean nest-building times are equal across all four habitat types)

Hₐ: At least one habitat type has a different mean nest-building time

Conclusion: F(3, 96) = 8.43, p = 0.0001 < 0.05. We reject H₀. There is very strong evidence that mean nest-building time is not the same across all four habitat types; at least one habitat type has a different mean.

SA17

Calculate the total sample size.

Solution

Total df = df_between + df_within = 3 + 96 = 99 = n − 1, so n = 100 total birds observed.

SA18

Why would it be inappropriate to test all pairwise comparisons with individual t-tests after seeing this ANOVA result?

Solution

With 4 groups there are \(\binom{4}{2} = 6\) pairwise comparisons. If each is tested at α = 0.05 (and the tests are treated as independent), the probability of at least one false positive is \(1 - (0.95)^6 \approx 0.26\), more than 5 times the intended error rate. This inflated Type I error means we would be much more likely than intended to conclude differences exist even when all means are truly equal. Post-hoc procedures (like Tukey’s HSD) correct for this by adjusting the threshold for each comparison.
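The inflation of the error rate is easy to reproduce. The following sketch assumes the six tests are independent, which is a simplification (pairwise tests that share groups are correlated), so the result is best read as an approximation:

```python
import math

groups = 4
alpha = 0.05

# Number of distinct pairs of groups
comparisons = math.comb(groups, 2)

# Chance of at least one false positive if the tests were independent
familywise_error = 1 - (1 - alpha) ** comparisons

print(f"Pairwise comparisons: {comparisons}")
print(f"Family-wise error rate: {familywise_error:.3f}")
```

Changing `groups` shows how quickly the problem grows as more groups are compared.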

SA19

The ANOVA is significant. What does this tell us (and what does it NOT tell us)?

Solution

What it tells us: At least one habitat type has a significantly different mean nest-building time from at least one other. The between-habitat variation is much larger than within-habitat variation (F = 8.43).

What it does NOT tell us: Which specific habitats differ from each other. We know there are differences somewhere, but not whether it’s one habitat vs. all others, or multiple pairwise differences. Post-hoc tests are needed for that.

SA20

What additional analysis would be conducted, and what would it determine?

Solution

A post-hoc multiple comparison procedure (e.g., Tukey’s Honestly Significant Difference, or Bonferroni correction) would be conducted. It would test all pairwise comparisons (e.g., forest vs. meadow, forest vs. wetland, etc.) while controlling the family-wise Type I error rate. This identifies specifically which habitat pairs have significantly different mean nest-building times.

SA21

In a survey of 400 patients, 124 reported side effects from a medication (\(\hat{p} = 0.31\)). Calculate a 95% confidence interval for the true proportion. Verify the success/failure condition.

Solution

Success/Failure check:

Successes: \(n\hat{p} = 400 \times 0.31 = 124 \geq 10\) ✅
Failures: \(n(1-\hat{p}) = 400 \times 0.69 = 276 \geq 10\) ✅

SE: \(\sqrt{0.31 \times 0.69 / 400} = \sqrt{0.000534} \approx 0.0231\)

95% CI: \(0.31 \pm 1.96 \times 0.0231 = 0.31 \pm 0.045 = (0.265, 0.355)\)

We are 95% confident that between 26.5% and 35.5% of patients on this medication experience side effects.
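The interval can be reproduced step by step. This is a minimal sketch of the standard large-sample interval used in the solution; the variable names are illustrative:

```python
import math
from statistics import NormalDist

successes = 124
n = 400
p_hat = successes / n

# Success/failure condition: both counts should be at least 10
condition_met = successes >= 10 and (n - successes) >= 10

# Standard error based on the sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence

ci_lower = p_hat - z_star * se
ci_upper = p_hat + z_star * se

print(f"Condition met: {condition_met}")
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
```

R functions such as `prop.test` use a different interval formula by default, so their output may differ slightly from this hand calculation.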

SA22

A public health official claims “fewer than 25% of adults in our county are vaccinated against flu.” You survey 300 adults and find 66 vaccinated (\(\hat{p} = 0.22\)). Set up the hypotheses and describe what you need from R output to reach a conclusion.

Solution

H₀: p = 0.25 (vaccination rate equals 25%)

Hₐ: p < 0.25 (vaccination rate is less than 25%) — one-sided, matching the official’s claim

From R output, you need:

The test statistic z
The p-value for the one-sided alternative
Verification that the success/failure condition is met (\(np_0 = 300 \times 0.25 = 75 \geq 10\) and \(n(1-p_0) = 300 \times 0.75 = 225 \geq 10\) ✅)

If p < 0.05, we reject H₀ and conclude there is evidence that fewer than 25% are vaccinated.

SA23

For testing H₀: p = 0.25 vs. Hₐ: p < 0.25 with \(\hat{p} = 0.22\) and n = 300, write the SE formula and explain why it uses p₀ rather than p̂.

Solution

Formula: \(SE = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.25 \times 0.75}{300}} = \sqrt{0.000625} = 0.025\)

Why p₀: Under the null hypothesis, we assume the true proportion is p = 0.25. The test asks: “If p really were 0.25, how surprising is our result?” So we compute the SE using the assumed (null) value. Using \(\hat{p}\) instead would mean we’re not testing against a specific claim — we’d be estimating, which is what we do for CIs.
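To see how the null-based SE feeds into the test, the sketch below carries the calculation through to a z statistic and a one-sided p-value. It assumes the plain normal approximation with no continuity correction, so R output for the same data may differ slightly:

```python
import math
from statistics import NormalDist

p0 = 0.25      # proportion claimed under H0
p_hat = 0.22   # observed sample proportion
n = 300

# Standard error computed from the null value, not from p_hat
se_null = math.sqrt(p0 * (1 - p0) / n)

# How many null standard errors the sample proportion sits from p0
z = (p_hat - p0) / se_null

# Lower-tail probability, because the alternative is p < p0
p_value = NormalDist().cdf(z)

print(f"SE = {se_null:.3f}, z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```

Swapping `p0` for `p_hat` inside the square root gives the SE that would be used for a confidence interval instead.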

SA24

Two hospitals report C-section rates: Hospital A: 180/800 = 22.5%; Hospital B: 210/700 = 30.0%. R output shows a 95% CI for the difference of (−0.118, −0.032). Interpret this CI.

Solution

We are 95% confident that the true difference in C-section rates (Hospital A − Hospital B) is between −11.8 and −3.2 percentage points. Since the entire interval is negative, Hospital A’s true C-section rate is significantly lower than Hospital B’s at α = 0.05. The magnitude of the difference is between about 3 and 12 percentage points, which represents a clinically meaningful gap in surgical practice.

SA25

The success/failure condition fails for a rare disease study (only 4 cases out of 500 patients tested). Explain the implication and what you might do instead.

Solution

When \(n\hat{p} = 4 < 10\), the sampling distribution of \(\hat{p}\) is highly skewed (not approximately normal), so the z-based normal approximation is unreliable. The resulting CIs and p-values could be inaccurate.

Alternatives: (1) use exact binomial methods for a single proportion; (2) use Fisher’s Exact Test when comparing groups in a 2×2 table (neither relies on the normal approximation); (3) collect more data until the condition is met. In every case, report the raw counts alongside the analysis.

SA26

Use the following for SA26–SA30:

    Pearson's Chi-squared test

data:  exercise_obesity
X-squared = 9.87, df = 1, p-value = 0.0017

                     Obese   Not Obese   Total
Exercise regularly      45         255     300
Does not exercise       80         220     300
Total                  125         475     600

Calculate the expected count for “Exercise regularly / Obese” and check conditions.

Solution

\(E = \frac{300 \times 125}{600} = \frac{37500}{600} = 62.5\)

All expected counts:

Exercise / Obese: 62.5 ✅ (≥ 5)
Exercise / Not Obese: 237.5 ✅
No exercise / Obese: 62.5 ✅
No exercise / Not Obese: 237.5 ✅

Conditions are met. Note: the observed count (45) is notably lower than expected (62.5), contributing to a large chi-square.
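All four expected counts follow from the same rule, (row total × column total) / grand total. A minimal sketch of that calculation, with the observed counts typed in from the table and illustrative variable names:

```python
# Observed counts from the table
observed = [
    [45, 255],   # Exercise regularly: obese, not obese
    [80, 220],   # Does not exercise:  obese, not obese
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Condition check: every expected count should be at least 5
all_at_least_5 = all(e >= 5 for row in expected for e in row)

print(expected)
print(f"All expected counts >= 5: {all_at_least_5}")
```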

SA27

Write a complete conclusion based on the R output.

Solution

Statistical conclusion: \(\chi^2(1) = 9.87\), p = 0.0017 < 0.05. We reject H₀ of independence. There is strong evidence of a statistically significant association between exercise status and obesity.

Plain language: People who do not exercise regularly are disproportionately more likely to be classified as obese compared to those who exercise regularly. This association is unlikely to be due to chance alone.

SA28

Is this a test of independence or homogeneity? Explain.

Solution

Test of independence. The description implies one random sample of 600 individuals was drawn, and both exercise status and obesity status were measured on each person. When both categorical variables are measured on the same sample (neither margin was fixed in advance by the researcher), we use the test of independence.

If the researcher had recruited 300 exercisers and 300 non-exercisers separately and then measured obesity in each group, it would be homogeneity.

SA29

A student says “Since the chi-square test is significant, we can conclude that lack of exercise causes obesity.” Respond.

Solution

This conclusion is not justified. The chi-square test detects a statistically significant association between exercise and obesity, but cannot establish causation. This appears to be an observational study — participants chose whether to exercise, they were not randomly assigned. Numerous confounders could explain the relationship: diet, genetics, occupation, socioeconomic status, or underlying health conditions. To support causation, we would need a randomized controlled experiment where participants are assigned to exercise or not exercise, with all other factors controlled.

SA30

The relative risk (RR) of obesity for non-exercisers vs. exercisers is (80/300)/(45/300). Calculate and interpret this RR. Does the chi-square test provide this information?

Solution

\(RR = \frac{80/300}{45/300} = \frac{0.267}{0.150} = 1.78\)

Interpretation: Non-exercisers are 1.78 times as likely to be obese as those who exercise regularly; that is, their risk of obesity is 78% higher.

Does chi-square provide this? No. The chi-square test only tells us whether the association is statistically significant. It does not quantify the strength or direction of the association. The RR (or odds ratio) is needed to describe the magnitude of the relationship.
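The relative risk is a simple ratio of two proportions, as this short sketch shows (variable names are illustrative):

```python
# Risk of obesity in each group, from the table counts
risk_no_exercise = 80 / 300
risk_exercise = 45 / 300

# Relative risk: non-exercisers compared with exercisers
relative_risk = risk_no_exercise / risk_exercise

# Percent increase in risk implied by the ratio
percent_higher = (relative_risk - 1) * 100

print(f"RR = {relative_risk:.2f} ({percent_higher:.0f}% higher risk)")
```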

SA31

Explain in plain language what a p-value is and what it is not. Give an example of a common misinterpretation and correct it.

Solution

What it is: The p-value is the probability of observing data as extreme as (or more extreme than) what we saw, assuming the null hypothesis is true. It measures how surprising our result would be if H₀ were correct.

What it is not: It is NOT the probability that H₀ is true, nor the probability that the result is due to chance, nor the probability that the alternative hypothesis is true.

Common misinterpretation: “p = 0.03 means there is only a 3% chance this result was due to chance.”

Correction: p = 0.03 means: if H₀ were true, we would observe a result this extreme only 3% of the time. It says nothing about the probability that H₀ is true.

SA32

A 95% CI for a regression slope is (0.3, 1.2). Interpret this interval and state what you can conclude about the significance of the predictor.

Solution

Interpretation: We are 95% confident the true slope is between 0.3 and 1.2. For each one-unit increase in x, the predicted y increases by somewhere between 0.3 and 1.2 units (on average) in the population.

Significance: Since the entire CI is positive and does not include 0, the slope is statistically significantly different from zero at α = 0.05 (two-sided). This means x is a statistically significant linear predictor of y.

SA33

Describe the three main conditions for valid t-tests and explain what happens if they are violated.

Solution

1. Random sampling / Independence: Observations must be independent. Violation (e.g., clustered data, repeated measures handled incorrectly) leads to artificially small standard errors and inflated Type I error rates.

2. Normality (or large n): Either the population is approximately normal, or n is large enough for the CLT to apply (generally n ≥ 30). For small samples from highly skewed populations, the t-test can produce inaccurate p-values.

3. For two-sample tests — independence between groups: The two groups must not be paired or related. Ignoring pairing (using two-sample instead of paired t) inflates the SE and reduces power.

SA34

Explain the relationship between confidence intervals and hypothesis tests: how can you use a CI to make a hypothesis test decision?

Solution

A 95% CI and a two-sided hypothesis test at α = 0.05 are mathematically equivalent:

If the null value falls inside the 95% CI → fail to reject H₀ at α = 0.05
If the null value falls outside the 95% CI → reject H₀ at α = 0.05

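For example: if testing H₀: μ = 0 and the 95% CI is (2.1, 5.8), then 0 is not in the interval → reject H₀. This works for means because both procedures use the same critical value (\(z^* = 1.96\) or \(t^*\)) and the same SE. For proportions, the test uses p₀ in the SE while the CI uses p̂, so the two can occasionally disagree when the result is close to the cutoff.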

The CI is generally more informative because it also tells you the magnitude of the effect, not just whether it’s significant.

SA35

A study finds no significant difference between two drugs (p = 0.18). A pharmaceutical company says “this proves the drugs are equally effective.” Write a rebuttal in at most three sentences.

Solution

Failing to reject H₀ is not the same as proving H₀ is true — absence of evidence is not evidence of absence. The non-significant result (p = 0.18) could easily reflect a study that was underpowered to detect a real difference, rather than evidence that no difference exists. To support a claim of equivalence, the study would need to use an equivalence testing framework and demonstrate with a confidence interval that any possible difference is too small to be practically meaningful.

SA36

Compare the chi-square test for independence and the two-proportion z-test. When would you prefer each? What is the mathematical relationship for a 2×2 table?

Solution

Use two-proportion z-test when: you have exactly two groups with a binary outcome and want a directional (one-sided) test, or want a CI for the difference in proportions.

Use chi-square when: you have more than two categories in either variable, when you want to describe overall association without a directional hypothesis, or when a one-sided test is not meaningful.

Mathematical relationship for 2×2 tables: \(\chi^2 = z^2\) (and \(z = \pm\sqrt{\chi^2}\)) when the z-test uses the pooled proportion and neither test applies a continuity correction. Both tests then produce the same two-sided p-value for a 2×2 table.
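The identity between the two statistics can be verified numerically. The sketch below uses hypothetical counts (30 of 100 versus 45 of 100, not taken from any question in this assignment) and assumes a pooled standard error for the z-test with no continuity correction in either test:

```python
import math

# Hypothetical counts: successes out of n in each group
x1, n1 = 30, 100
x2, n2 = 45, 100

# Two-proportion z statistic with a pooled standard error
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pool

# Chi-square statistic from the same 2x2 table
observed = [[x1, n1 - x1], [x2, n2 - x2]]
row_totals = [n1, n2]
col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
total = n1 + n2
chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

print(f"z^2 = {z ** 2:.4f}, chi-square = {chi_sq:.4f}")
```

The sign of z carries the direction of the difference, which the chi-square statistic discards when it squares the deviations.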

SA37

A researcher presents regression output with R² = 0.95 and significant predictors, but the residual plot shows a clear U-shape. What concern does this raise?

Solution

A U-shaped residual plot indicates that the linearity condition is violated — the true relationship between x and y is nonlinear, and a straight line is not an adequate model. Despite the high R², the model is systematically wrong: it overpredicts in the middle range of x and underpredicts at the extremes (or vice versa). This means the inference results (p-values, CIs for the slope) are not valid, because they rely on the assumption that the linear model is correctly specified. A nonlinear model (e.g., quadratic) or transformation of variables would be more appropriate.

SA38

Interpret the following ANOVA result for a study comparing 5 diets:

          Df Sum Sq Mean Sq F value Pr(>F)
diet       4   1240     310    2.15  0.076
Residuals 95  13690     144

Write a complete conclusion and comment on α = 0.05 vs. α = 0.10 decisions.

Solution

Conclusion at α = 0.05: F(4, 95) = 2.15, p = 0.076 > 0.05. We fail to reject H₀. There is insufficient evidence at the 5% significance level that mean outcomes differ across the five diets.

At α = 0.10: p = 0.076 < 0.10, so we would reject H₀ and conclude significant differences exist at the 10% level.

Commentary: The p-value is borderline. The choice of α matters here. Most scientific fields use α = 0.05 as the standard. The study may be underpowered; examining effect sizes and CIs for each diet comparison would provide more context than the p-value alone.

SA39

A study reports Cramér’s V = 0.08 from a chi-square test with p < 0.001 and n = 10,000. Explain what this tells us.

Solution

This result illustrates the distinction between statistical and practical significance. The chi-square test is highly significant (p < 0.001), meaning we are very confident there is a real association between the two categorical variables in the population. However, Cramér’s V = 0.08 indicates a very weak association — far below the 0.3 threshold for a moderate effect. With n = 10,000, even trivially small associations become statistically detectable. In practical terms, knowing one variable tells us almost nothing about the other. This association, while real, is likely not meaningful for decision-making.

SA40

Explain why using \(\sqrt{p_0(1-p_0)/n}\) (rather than \(\sqrt{\hat{p}(1-\hat{p})/n}\)) as the standard error in the denominator of the one-proportion z-test is correct.

Solution

The z-test asks: Assuming H₀ is true (p = p₀), how surprising is our observed p̂? Under H₀, the true standard deviation of p̂ is \(\sqrt{p_0(1-p_0)/n}\) — not \(\sqrt{\hat{p}(1-\hat{p})/n}\). Using \(p_0\) is consistent with the logic of hypothesis testing: we evaluate the evidence against H₀ from H₀’s perspective, not from our sample’s perspective. For confidence intervals, we have no assumed value for p, so we estimate it with \(\hat{p}\) — hence the different SE formula.

SA41

A regression line has \(\hat{y} = 50 + 2.5x\). For a new observation (x = 10, y = 73), calculate the residual and explain what it means.

Solution

\(\hat{y} = 50 + 2.5(10) = 50 + 25 = 75\)

Residual \(= y - \hat{y} = 73 - 75 = -2\)

Interpretation: The observed value (73) is 2 units below what the model predicted (75). The model slightly overpredicts for this individual. A negative residual means the model predicts higher than what actually occurred.
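The residual calculation generalizes to any observation. A minimal sketch, with the function name `predict` chosen for illustration:

```python
def predict(x):
    """Fitted value from the regression line y-hat = 50 + 2.5x."""
    return 50 + 2.5 * x

x_obs, y_obs = 10, 73

# Residual = observed value minus predicted value
residual = y_obs - predict(x_obs)

print(f"Predicted: {predict(x_obs):.1f}, residual: {residual:.1f}")
```

Keeping the order as observed minus predicted is what makes a negative residual correspond to an overprediction.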

SA42

In an ANOVA comparing mean reaction times for 3 groups, MSE = 250 and MSG = 750. What is F? Interpret it.

Solution

\(F = MSG/MSE = 750/250 = 3.0\)

Interpretation: The variability between group means is 3 times as large as the typical variability within groups. Whether this is statistically significant depends on the degrees of freedom and the p-value from the F-distribution (not given here, consistent with the final exam format where p-values come from R output).

SA43

A 95% CI for a proportion is (0.48, 0.64). Someone asks: “Does this mean there’s a 95% chance the true proportion is between 0.48 and 0.64?” How do you respond?

Solution

No — this is a common misinterpretation. The true proportion is a fixed (though unknown) value, not a random variable. It either is or isn’t between 0.48 and 0.64 — we just don’t know which. The correct interpretation is: the interval (0.48, 0.64) was constructed using a procedure that, if repeated many times with different samples, would capture the true proportion in 95% of the resulting intervals. The 95% refers to the long-run success rate of the procedure, not to any probability about this specific interval.
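The long-run meaning of "95%" can be illustrated by simulation. In the sketch below, the true proportion (0.56), the sample size (150), and the number of repeated samples (2,000) are all hypothetical values chosen for illustration; in a real study the true proportion is unknown.

```python
import math
import random

random.seed(7)       # fixed seed so the run is repeatable

true_p = 0.56        # hypothetical true proportion
n = 150              # hypothetical sample size
n_samples = 2000     # number of repeated samples
z_star = 1.96

covered = 0
for _ in range(n_samples):
    # Draw one sample and count the successes
    successes = sum(random.random() < true_p for _ in range(n))
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Does this sample's interval capture the true proportion?
    if p_hat - z_star * se <= true_p <= p_hat + z_star * se:
        covered += 1

coverage = covered / n_samples
print(f"Share of intervals capturing the true proportion: {coverage:.3f}")
```

Each individual interval either contains 0.56 or it does not; only the share across many intervals is expected to be close to 95%.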

SA44

A paired t-test gives t = 0.72, df = 24, p = 0.479 (two-sided). The 95% CI is (−1.8, 3.9). Write a complete conclusion and discuss whether the lack of significance means “no effect.”

Solution

Conclusion: t(24) = 0.72, p = 0.479 > 0.05. We fail to reject H₀. There is insufficient evidence of a significant mean difference.

Does this mean no effect? Not necessarily. The 95% CI (−1.8, 3.9) is quite wide and spans both negative and positive values, so the true mean difference could plausibly lie in either direction. This range may include practically meaningful effects. The study likely lacks sufficient power to detect a real effect if it exists. A well-powered study with n large enough to narrow the CI would be needed before concluding there is truly no effect.

SA45

Explain what an “influential point” is in regression. How does it differ from an outlier in the y-direction? Why does it matter for inference?

Solution

An influential point is an observation that, when removed, substantially changes the regression slope. Influential points typically have an extreme x-value (high leverage): they sit far from the mean of x. A point can be influential without having a large residual if the regression line is pulled toward it.

An outlier in y has a large residual (its observed y is far from the regression line) but may not be influential if it has an average x-value.

Why it matters: If an influential point drives the slope, then removing that one observation would lead to very different conclusions. Results (significance of slope, CI for slope) may appear significant only because of that point’s influence, not because of the overall pattern. Sensitivity analyses should check results with and without influential points.

SA46

A clinical trial randomizes 200 patients to vaccine (n = 100) or placebo (n = 100). Vaccine: 5 infections; Placebo: 18 infections.

    2-sample test for equality of proportions

X-squared = 7.92, df = 1, p-value = 0.0049
95 percent confidence interval:
 -0.195  -0.035
sample estimates:
prop 1 prop 2 
 0.050  0.180

Write a complete statistical analysis: check conditions, state hypotheses, interpret output, and write conclusions.

Solution

Conditions:

Vaccine: \(n\hat{p}_1 = 100 \times 0.05 = 5 < 10\) ⚠️ (condition not met; note this caveat); \(n(1-\hat{p}_1) = 95 \geq 10\) ✅
Placebo: \(n\hat{p}_2 = 100 \times 0.18 = 18 \geq 10\) ✅; \(n(1-\hat{p}_2) = 82 \geq 10\) ✅
Independence: random assignment ✅

The success/failure condition is not fully met for the vaccine group (only 5 infections). Results should be interpreted with some caution; exact methods would be more reliable.

Hypotheses: H₀: p_vaccine = p_placebo; Hₐ: p_vaccine ≠ p_placebo (two-sided)

Conclusion: \(\chi^2(1) = 7.92\), p = 0.0049 < 0.05. We reject H₀. There is strong evidence that infection rates differ between the vaccine and placebo groups.

Plain language: The infection rate was significantly lower in the vaccine group (5%) than the placebo group (18%). We are 95% confident the true reduction in infection rate due to vaccination is between 3.5 and 19.5 percentage points — a practically meaningful benefit.

SA47

ANOVA output compares cholesterol for 4 diets (n = 25/group). F = 5.23, p = 0.002. Means: Diet 1 = 185, Diet 2 = 195, Diet 3 = 200, Diet 4 = 205.

a) Write the conclusion. b) Does this tell you which diets differ? What would you do next? c) Can you conclude Diet 1 is best?

Solution

a) F(3, 96) = 5.23, p = 0.002 < 0.05. We reject H₀: μ₁ = μ₂ = μ₃ = μ₄. There is strong evidence that mean cholesterol levels differ across at least one pair of the four diets.

b) No — ANOVA only tells us that differences exist somewhere. A post-hoc test (e.g., Tukey’s HSD) would determine which specific pairs of diets have significantly different mean cholesterol levels.

c) Not yet. Diet 1 has the lowest observed mean (185 mg/dL), but we do not know whether it is significantly lower than Diets 2, 3, or 4. Tukey’s HSD would determine whether the difference between Diet 1 and others (e.g., 185 vs. 195 = 10 mg/dL difference) is statistically significant.

SA48

Chi-square test: insurance type (4 categories) × preventive screening (Yes/No) gives χ² = 21.4, df = 3, p < 0.001. Cell: uninsured / No screening: O = 145, E = 98.

a) Does this cell drive the result? b) Write the conclusion. c) What does the association imply?

Solution

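a) Yes. This cell’s contribution to the chi-square is \((145-98)^2/98 = 2209/98 \approx 22.5\), which is as large as the entire reported test statistic of 21.4. (A single cell’s contribution cannot truly exceed the total, so the reported values are evidently approximate; the point is that this one cell accounts for nearly all of the statistic.) The uninsured group has substantially more people forgoing screening than expected under independence, and this is the primary driver of the result.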

b) \(\chi^2(3) = 21.4\), p < 0.001 < 0.05. We reject H₀ of independence. There is very strong evidence that insurance type is associated with likelihood of receiving preventive screening.

c) Uninsured individuals are disproportionately less likely to receive preventive screening than would be expected if insurance status and screening were unrelated. This suggests that lack of insurance may create a barrier to preventive care, with potential implications for health equity and policy, though causation cannot be confirmed from this observational study.

SA49

Regression output (birth weight ~ maternal age):

              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2800.0     180.0   15.56   <2e-16 ***
maternal_age    12.5       4.2    2.98    0.003 **

Multiple R-squared:  0.041

a) Write and interpret the equation. b) Comment on R². c) Is the slope significant? d) Should maternal age be used clinically?

Solution

a) \(\widehat{\text{birth weight}} = 2800 + 12.5 \times \text{maternal age}\) (in grams)

For each additional year of maternal age, predicted birth weight increases by 12.5 grams on average.

b) R² = 0.041: maternal age explains only 4.1% of the variability in birth weight. The vast majority of variation in birth weight is left unexplained by this model and reflects other factors (gestational age, nutrition, genetics, etc.).

c) Yes. t = 2.98, p = 0.003 < 0.05. Maternal age is a statistically significant predictor of birth weight. However, with such a small R², statistical significance does not imply clinical importance.

d) No. Despite statistical significance, maternal age explains only 4% of birth weight variation. A 12.5-gram increase per year of maternal age is not clinically meaningful, and the model would produce very imprecise predictions for individual patients. More relevant clinical predictors should be used.

SA50

Describe two situations from this course where a statistically significant result did NOT imply a practically meaningful conclusion. For each, explain what additional information is needed. Then describe one situation where a non-significant result was still worth reporting.

Solution

Situation 1 — Large n makes tiny effects significant: The dietary supplement study (n = 10,000, weight loss = 0.1 kg, p = 0.002). Statistical significance was achieved, but a 0.1 kg difference is clinically trivial. Additional information needed: effect size (Cohen’s d or raw difference with CI) and clinical judgment about the minimum meaningful threshold.

Situation 2 — Significant correlation, tiny R²: The exercise-heart rate regression (r = −0.385, p < 0.001) where R² = 0.148 — exercise explained only 15% of heart rate variability. While statistically significant, the model would make poor predictions for individual patients. Additional information needed: R² and residual standard error to assess practical predictive value.

Non-significant result worth reporting: The study comparing two vaccines (X-squared = 0.375, p = 0.540). Even though no difference was detected, this result — combined with the wide CI (−0.05, 0.09) — informs public health policy. Knowing the vaccines are likely comparable in efficacy can guide procurement decisions, particularly if one is cheaper or easier to distribute. The non-significant result is informative, but only when accompanied by a CI that shows the range of plausible differences.

Remember: The final exam emphasizes interpretation. There are no distribution tables — p-values will be provided in R output. Focus on reading output correctly, checking conditions, and writing clear conclusions in biological context.

End of Practice Final — Good Luck! 🎓

“The goal is not to find statistical significance — the goal is to learn something true about the world.”