STATS 17 Practice Quiz
100 Questions Covering All Learning Objectives
0.1 Instructions
This practice set contains 100 questions designed to help you prepare for the final exam. Click on the “Show Answer” buttons to reveal solutions.
1 Part I: Multiple Choice Questions (70 questions)
1.1 Normal Distribution and Z-Scores (Questions 1-10)
1.1.1 Question 1
If X follows a normal distribution with μ = 100 and σ = 15, what is P(X > 115)?
- Smaller than 0.5
- Greater than 0.5
- Equal to 0.5
- There is no way to know with the available information
Answer: a) Smaller than 0.5
Solution: - z = (115 - 100) / 15 = 1.0 - P(Z > 1.0) = 1 - 0.8413 = 0.1587 However, you are not going to be able to calculate the exact number in the exam. You can know if it is smaller, greater or equal to 0.5 by sketching the distribution and identifying the area under the curve that you are trying to calculate.
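For checking your work outside the exam, the calculation above can be reproduced with a minimal Python sketch. It assumes Python 3.8+ and the standard-library statistics.NormalDist class; the variable names are illustrative and not part of the course materials.

```python
from statistics import NormalDist

# Question 1: X ~ N(mu = 100, sigma = 15); we want the upper-tail area P(X > 115).
x_dist = NormalDist(mu=100, sigma=15)

z = (115 - x_dist.mean) / x_dist.stdev  # standardize: z = (x - mu) / sigma
p_upper = 1 - x_dist.cdf(115)           # area to the right of 115

print(f"z = {z:.2f}, P(X > 115) = {p_upper:.4f}")
print("smaller than 0.5:", p_upper < 0.5)
```

Because 115 lies above the mean, the shaded area is less than half of the curve, which is the only fact the multiple-choice answer requires.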
1.1.2 Question 2
A z-score of -1.96 corresponds to what percentile in a standard normal distribution?
- 2.5th percentile
- 5th percentile
- 95th percentile
- 97.5th percentile
Answer: a) 2.5th percentile
Solution: - P(Z < -1.96) = 0.025 = 2.5th percentile. Once again, you may not have the tools to calculate this directly. However, you can use the critical value list in the formula list and recognize 1.96 as the critical value used in 95% confidence intervals (2.5% in each tail), so -1.96 cuts off the lower 2.5%.
1.1.3 Question 3
Test scores are normally distributed with μ = 70 and σ = 8. What score corresponds to the 95th percentile?
- 80.24
- 79.12
- 60.76
- 83.16
Answer: d) 83.16
Solution: - A 90% CI uses the critical value z = 1.645 according to the critical value list in the formulas. That interval leaves 5% in each tail, so the upper tail begins at the 95th percentile. Because we want the 95th percentile, we take only the positive critical value.
- X = μ + zσ = 70 + (1.645)(8) = 70 + 13.16 = 83.16
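As a check on the percentile calculation, here is a short sketch that assumes Python 3.8+ and the standard-library NormalDist.inv_cdf method (the inverse of the cumulative distribution function). The names z_95 and score are illustrative.

```python
from statistics import NormalDist

mu, sigma = 70, 8

# z-value with 95% of the standard normal area to its left
z_95 = NormalDist().inv_cdf(0.95)

# Convert back to the score scale: X = mu + z * sigma
score = mu + z_95 * sigma

print(f"z = {z_95:.3f}, 95th percentile score = {score:.2f}")
```

The same value can be obtained in one step with NormalDist(70, 8).inv_cdf(0.95); the two-step version mirrors the hand calculation.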
1.1.4 Question 4
For a normal distribution, what proportion of data (approximately) falls between μ - σ and μ + σ?
- 50%
- 68%
- 95%
- 99.7%
Answer: b) 68%
Solution: - This is the empirical rule (68-95-99.7 rule) - About 68% of data falls within 1 standard deviation of the mean - About 95% within 2 standard deviations - About 99.7% within 3 standard deviations
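The three percentages in the empirical rule are rounded normal areas. A minimal sketch that recovers them, assuming Python 3.8+ and the standard-library statistics.NormalDist (names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, N(0, 1)

# Area within k standard deviations of the mean: P(-k < Z < k)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD of the mean: {area:.4f}")
```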
1.1.5 Question 5
If Z ~ N(0,1), what is P(-2.33 < Z < 2.33)?
- 0.98
- 0.68
- 0.95
- 0.75
Answer: a) 0.98
Solution: - P(-2.33 < Z < 2.33) = 0.98 by using the critical value for 98% CI
1.1.6 Question 6
A value that is 2.5 standard deviations below the mean has a z-score of:
- 2.5
- -2.5
- 0.25
- -0.25
Answer: b) -2.5
Solution: - Values below the mean have negative z-scores - 2.5 standard deviations below means z = -2.5
1.1.7 Question 7
Heights of adult women are normally distributed with μ = 65 inches and σ = 3 inches. What is the probability a randomly selected woman is shorter than 62 inches?
- Smaller than 0.5
- Greater than 0.5
- Equal to 0.5
- There is no way to know with the available information
Answer: a) Smaller than 0.5
Solution: - z = (62 - 65) / 3 = -1.0 - P(Z < -1.0) = 0.1587 However, you are not going to be able to calculate the exact number in the exam. You can know if it is smaller, greater or equal to 0.5 by sketching the distribution and identifying the area under the curve that you are trying to calculate.
1.1.8 Question 8
Which z-score represents a value furthest from the mean?
- z = 1.8
- z = -2.3
- z = 0.5
- z = -1.2
Answer: b) z = -2.3
Solution: - Distance from mean is measured by absolute value of z-score - |1.8| = 1.8, |-2.3| = 2.3, |0.5| = 0.5, |-1.2| = 1.2 - z = -2.3 is furthest from the mean
1.1.9 Question 9
If a normal distribution has μ = 50 and σ = 10, what is the value corresponding to z = 1.5?
- 35
- 65
- 55
- 75
Answer: b) 65
Solution: - X = μ + zσ = 50 + (1.5)(10) = 50 + 15 = 65
1.1.10 Question 10
The area under the entire standard normal curve equals:
- 0
- 0.5
- 1
- 100
Answer: c) 1
Solution: - The total area under any probability density function equals 1 - This represents the total probability of all possible outcomes
1.2 Central Limit Theorem (Questions 11-20)
1.2.1 Question 11
The Central Limit Theorem states that the sampling distribution of the sample mean will be approximately normal if:
- The population is normal or the sample size is large enough
- The population is always normal
- The sample size is less than 30
- The population variance is known
Answer: a) The population is normal or the sample size is large enough
Solution: - CLT applies when: (1) the population is normal (any n), OR (2) n is sufficiently large (typically n ≥ 30) - If the population is normal, sampling distribution of X̄ is normal for any sample size - If the population is not normal, we need a large enough sample for CLT to apply
1.2.2 Question 12
For the Central Limit Theorem to apply, a general rule of thumb is that n should be at least:
- 10
- 20
- 30
- 100
Answer: c) 30
Solution: - The rule of thumb is n ≥ 30 for CLT to apply when population is not normal - For more skewed populations, larger samples may be needed - For normal populations, any sample size works
1.2.3 Question 13
A population has μ = 80 and σ = 20. For samples of size 64, what is the standard error of the mean?
- 2.5
- 5.0
- 10.0
- 20.0
Answer: a) 2.5
Solution: - Standard error (SE) = σ/√n = 20/√64 = 20/8 = 2.5
1.2.4 Question 14
If the population distribution is highly skewed, the sampling distribution of X̄ will be approximately normal when:
- n is small
- n is sufficiently large
- σ is known
- Never
Answer: b) n is sufficiently large
Solution: - The Central Limit Theorem tells us that regardless of population shape, the sampling distribution of X̄ approaches normality as n increases - More skewed populations require larger samples
1.2.5 Question 15
The mean of the sampling distribution of X̄ equals:
- σ/√n
- μ/n
- μ
- σ
Answer: c) μ
Solution: - E(X̄) = μ, meaning the sampling distribution of X̄ is centered at the population mean - X̄ is an unbiased estimator of μ
1.2.6 Question 16
A population has σ = 12. To cut the standard error in half, the sample size must be:
- Doubled
- Quadrupled
- Tripled
- Cut in half
Answer: b) Quadrupled
Solution: - SE = σ/√n - To cut SE in half: σ/(√n) → σ/(2√n) - This requires √n → 2√n, so n → 4n - Sample size must be quadrupled
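The quadrupling rule can be checked numerically. The sketch below uses σ = 12 from the question; the starting sample size n = 25 is a hypothetical value chosen only for illustration, since the question does not give one.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12
n = 25  # hypothetical starting sample size (not given in the question)

se_n = standard_error(sigma, n)
se_2n = standard_error(sigma, 2 * n)  # doubling n is not enough
se_4n = standard_error(sigma, 4 * n)  # quadrupling n halves the SE

print(f"SE(n) = {se_n:.3f}, SE(2n) = {se_2n:.3f}, SE(4n) = {se_4n:.3f}")
```

Doubling n only shrinks the standard error by a factor of 1/√2 (about 0.71), which is why "Doubled" is the tempting wrong answer.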
1.2.7 Question 17
For a population with μ = 50 and σ = 15, samples of size 100 are drawn. What is P(X̄ > 52)?
- Smaller than 0.5
- Greater than 0.5
- Equal to 0.5
- There is no way to know with the available information
Answer: a) Smaller than 0.5
Solution: - SE = σ/√n = 15/√100 = 1.5 - z = (52 - 50) / 1.5 = 1.33 - P(Z > 1.33) = 1 - 0.9082 = 0.0918. However, you are not going to be able to calculate the exact number in the exam. You can know if it is smaller, greater or equal to 0.5 by sketching the distribution and identifying the area under the curve that you are trying to calculate.
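For practice outside the exam, the same steps can be sketched in Python (assuming Python 3.8+ and the standard-library statistics.NormalDist; names are illustrative). The code keeps z unrounded, so its probability differs slightly from the table-based 0.0918, which uses z = 1.33.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 15, 100

se = sigma / math.sqrt(n)  # standard error of the sample mean
z = (52 - mu) / se         # z-score of the sample mean 52

# The sampling distribution of the sample mean is approximately N(mu, se)
p_upper = 1 - NormalDist(mu, se).cdf(52)

print(f"SE = {se}, z = {z:.2f}, P(sample mean > 52) = {p_upper:.4f}")
```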
1.2.8 Question 18
According to the Central Limit Theorem, as sample size increases, the sampling distribution of X̄:
- Becomes more skewed
- Has larger standard deviation
- Becomes more concentrated around μ
- Approaches the population distribution
Answer: c) Becomes more concentrated around μ
Solution: - As n increases, SE = σ/√n decreases - This means the sampling distribution becomes narrower and more concentrated around μ - The sampling distribution approaches a normal distribution (not the population distribution)
1.2.9 Question 19
A population is uniformly distributed. The sampling distribution of X̄ for n = 50 will be:
- Uniform
- Skewed
- Approximately normal
- Bimodal
Answer: c) Approximately normal
Solution: - By CLT, even though the population is uniform (not normal), the sampling distribution of X̄ will be approximately normal when n is large - n = 50 is sufficiently large for CLT to apply
1.2.10 Question 20
If samples of size 25 are drawn from a normal population with σ = 10, the standard deviation of X̄ is:
- 0.4
- 2
- 5
- 10
Answer: b) 2
Solution: - Standard deviation of X̄ is the standard error: SE = σ/√n = 10/√25 = 10/5 = 2
1.3 Confidence Intervals (Questions 21-35)
1.3.1 Question 21
A 95% confidence interval means that:
- 95% of the data falls within the interval
- We are 95% confident the interval captures the true parameter
- The probability the parameter is in the interval is 95%
- 95% of sample means fall within the interval
Answer: b) We are 95% confident the interval captures the true parameter
Solution: - Correct interpretation: We are 95% confident this specific interval contains the true parameter - The parameter is fixed; the interval is random - If we repeated this process many times, about 95% of intervals would contain the true parameter
1.3.2 Question 22
A researcher constructs a 90% confidence interval for μ as (23.5, 28.5). The point estimate is:
- 5
- 26
- 2.5
- Cannot be determined
Answer: b) 26
Solution: - The point estimate (X̄) is at the center of the interval - Point estimate = (23.5 + 28.5) / 2 = 52 / 2 = 26
1.3.3 Question 23
When σ is unknown and n = 15, we should construct a confidence interval using:
- Z-distribution
- t-distribution with 15 degrees of freedom
- t-distribution with 14 degrees of freedom
- Normal distribution always
Answer: c) t-distribution with 14 degrees of freedom
Solution: - When σ is unknown, we use the t-distribution - Degrees of freedom = n - 1 = 15 - 1 = 14
1.3.4 Question 24
To decrease the width of a confidence interval, we can:
- Increase the confidence level
- Decrease the sample size
- Increase the sample size
- Increase the standard deviation
Answer: c) Increase the sample size
Solution: - CI width depends on margin of error: ME = critical value × SE - SE = s/√n, so increasing n decreases SE and thus narrows the interval - Increasing confidence level or σ would widen the interval
1.3.5 Question 25
A 99% confidence interval will be _____ a 95% confidence interval (all else equal):
- Narrower than
- Wider than
- The same width as
- Cannot determine
Answer: b) Wider than
Solution: - Higher confidence requires wider interval to be more “confident” we capture the parameter - 99% CI uses z* = 2.576 vs 95% CI uses z* = 1.96 - Larger critical value → larger margin of error → wider interval
1.3.6 Question 26
For a confidence interval for a proportion, we need np ≥ 10 and n(1-p) ≥ 10 to:
- Ensure the sample is random
- Ensure the normal approximation is valid
- Calculate the margin of error
- Determine the confidence level
Answer: b) Ensure the normal approximation is valid
Solution: - These are the success-failure conditions - We need enough successes (np ≥ 10) and failures (n(1-p) ≥ 10) - This ensures the sampling distribution of p̂ is approximately normal
1.3.7 Question 27
A sample of 100 gives X̄ = 45 with s = 12. The 95% confidence interval for μ is approximately:
- (42.6, 47.4)
- (43.0, 47.0)
- (44.0, 46.0)
- (40.5, 49.5)
Answer: a) (42.6, 47.4)
Solution: - SE = s/√n = 12/√100 = 1.2 - For large n, use z* ≈ 1.96 for 95% CI - ME = 1.96 × 1.2 = 2.352 ≈ 2.4 - CI: 45 ± 2.4 = (42.6, 47.4)
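The interval can be reproduced with a short sketch. It follows the solution in using a z critical value for the large sample, and assumes Python 3.8+ with the standard-library statistics.NormalDist; variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x_bar, s = 100, 45, 12
confidence = 0.95

se = s / math.sqrt(n)                                    # standard error
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
me = z_star * se                                         # margin of error
ci = (x_bar - me, x_bar + me)

print(f"z* = {z_star:.3f}, ME = {me:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The unrounded interval is slightly narrower than (42.6, 47.4) because the solution rounds the margin of error up to 2.4.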
1.3.8 Question 28
The margin of error in a confidence interval is:
- Half the width of the interval
- The width of the interval
- The standard error
- The confidence level
Answer: a) Half the width of the interval
Solution: - CI = point estimate ± margin of error - Width = upper limit - lower limit = 2 × margin of error - Therefore, margin of error = width / 2
1.3.9 Question 29
A researcher wants to estimate a population proportion with margin of error 0.03 at 95% confidence. The required sample size is approximately:
- 267
- 384
- 1068
- 33
Answer: c) 1068
Solution: - When no prior estimate is available, use p = 0.5 (most conservative) - ME = z* √(p(1-p)/n) - 0.03 = 1.96 √(0.25/n) - n = (1.96)² × 0.25 / (0.03)² = 1067.1 - Always round the required sample size up, so n = 1068
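The sample-size formula can be wrapped in a small helper. This is a sketch under the assumptions in the solution (conservative p = 0.5, z critical value); the function name required_n is hypothetical, and the code assumes Python 3.8+ for statistics.NormalDist.

```python
import math
from statistics import NormalDist

def required_n(margin, confidence=0.95, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin (rounded up)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z_star**2 * p * (1 - p) / margin**2)

n_needed = required_n(0.03)
print("required sample size:", n_needed)
```

math.ceil rounds up, never to the nearest integer, because rounding down would give a margin of error slightly larger than the target.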
1.3.10 Question 30
If a 95% CI for μ is (40, 50), which null hypothesis would be rejected at α = 0.05?
- H₀: μ = 45
- H₀: μ = 48
- H₀: μ = 42
- H₀: μ = 55
Answer: d) H₀: μ = 55
Solution: - If a value is inside the 95% CI, we would not reject H₀ at α = 0.05 - If a value is outside the 95% CI, we would reject H₀ at α = 0.05 - 55 is outside the interval (40, 50), so we would reject H₀: μ = 55
1.3.11 Question 31
The t-distribution differs from the normal distribution in that it:
- Is more spread out with heavier tails
- Is always skewed
- Has mean different from 0
- Cannot be used for inference
Answer: a) Is more spread out with heavier tails
Solution: - t-distribution has heavier tails than normal (more probability in extremes) - As df increases, t-distribution approaches normal distribution - Both are symmetric with mean 0
1.3.12 Question 32
As degrees of freedom increase, the t-distribution:
- Becomes more skewed
- Approaches the normal distribution
- Becomes more spread out
- Stays exactly the same
Answer: b) Approaches the normal distribution
Solution: - As df → ∞, t-distribution → standard normal distribution - This is why we can use z-values for large samples
1.3.13 Question 33
A sample of 400 voters shows 220 favor a proposition. The 90% CI for the true proportion is approximately:
- (0.51, 0.59)
- (0.49, 0.61)
- (0.52, 0.58)
- (0.50, 0.60)
Answer: a) (0.51, 0.59)
Solution: - p̂ = 220/400 = 0.55 - SE = √(0.55 × 0.45 / 400) = √(0.0006188) = 0.0249 - z* for 90% CI = 1.645 - ME = 1.645 × 0.0249 = 0.041 - CI: 0.55 ± 0.041 = (0.509, 0.591) ≈ (0.51, 0.59)
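A minimal sketch of the same proportion interval, including the success-failure check from Question 26. It assumes Python 3.8+ and the standard-library statistics.NormalDist; variable names are illustrative.

```python
import math
from statistics import NormalDist

successes, n = 220, 400
confidence = 0.90

p_hat = successes / n

# Success-failure condition: at least 10 successes and 10 failures
conditions_ok = successes >= 10 and (n - successes) >= 10

se = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645
me = z_star * se
ci = (p_hat - me, p_hat + me)

print(f"p_hat = {p_hat}, SE = {se:.4f}, ME = {me:.3f}")
print(f"90% CI = ({ci[0]:.3f}, {ci[1]:.3f}), conditions met: {conditions_ok}")
```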
1.3.14 Question 34
To halve the margin of error in a confidence interval (keeping everything else constant), you must:
- Double the sample size
- Quadruple the sample size
- Take the square root of the sample size
- Divide the sample size by 4
Answer: b) Quadruple the sample size
Solution: - ME ∝ 1/√n - To halve ME: need √n to double - If √n doubles, then n quadruples
1.3.15 Question 35
The critical value for a 98% confidence interval using the standard normal distribution is approximately:
- 1.96
- 2.33
- 2.58
- 1.645
Answer: b) 2.33
Solution: - This solution comes directly from the critical values listed in the formulas.
- 98% confidence means 2% in tails, so 1% in each tail - P(Z < z) = 0.99 - z = 2.33
1.4 Hypothesis Testing Fundamentals (Questions 36-50)
1.4.1 Question 36
The null hypothesis typically represents:
- The researcher’s belief
- The status quo or no effect
- The alternative theory
- The sample statistic
Answer: b) The status quo or no effect
Solution: - H₀ represents the claim of no difference, no effect, or status quo - It’s what we assume to be true unless we have strong evidence against it
1.4.2 Question 37
The p-value is the probability of:
- The null hypothesis being true
- The alternative hypothesis being true
- Observing data as extreme or more extreme than what we got, assuming H₀ is true
- Making a Type I error
Answer: c) Observing data as extreme or more extreme than what we got, assuming H₀ is true
Solution: - p-value = P(observing test statistic as extreme or more extreme | H₀ is true) - It measures the strength of evidence against H₀ - It is NOT the probability that H₀ is true
1.4.3 Question 38
If we reject H₀ when it is actually true, we have made:
- Type I error
- Type II error
- Correct decision
- No error
Answer: a) Type I error
Solution: - Type I error: Rejecting H₀ when H₀ is true (false positive) - P(Type I error) = α
1.4.4 Question 39
The probability of Type II error is denoted by:
- α
- β
- p
- 1 - α
Answer: b) β
Solution: - β = P(Type II error) = P(Fail to reject H₀ | H₀ is false) - Power = 1 - β
1.4.5 Question 40
Power of a test is:
- α
- β
- 1 - β
- 1 - α
Answer: c) 1 - β
Solution: - Power = P(Reject H₀ | H₀ is false) - Power = 1 - P(Type II error) = 1 - β - Higher power is better (more likely to detect a true effect)
1.4.6 Question 41
If α = 0.05 and p-value = 0.03, we should:
- Fail to reject H₀
- Reject H₀
- Accept H₀
- Cannot determine
Answer: b) Reject H₀
Solution: - Decision rule: If p-value < α, reject H₀ - 0.03 < 0.05, so we reject H₀
1.4.7 Question 42
A two-tailed test at α = 0.05 is equivalent to:
- A one-tailed test at α = 0.05
- A one-tailed test at α = 0.025
- Checking if the parameter equals the null value
- Using a 95% confidence interval
Answer: d) Using a 95% confidence interval
Solution: - A two-tailed test at α = 0.05 is equivalent to checking if the null value falls within a 95% CI - If the null value is outside the 95% CI, we reject at α = 0.05
1.4.8 Question 43
Which statement is correct?
- Failing to reject H₀ proves H₀ is true
- Rejecting H₀ proves Hₐ is true
- We never “accept” the null hypothesis
- P-value equals α
Answer: c) We never “accept” the null hypothesis
Solution: - We either reject H₀ or fail to reject H₀ - Failing to reject ≠ accepting; it just means insufficient evidence against H₀ - We never “prove” hypotheses with statistical tests
1.4.9 Question 44
To increase the power of a test, we can:
- Decrease sample size
- Increase α
- Decrease α
- Make the test two-tailed
Answer: b) Increase α
Solution: - Power = 1 - β = P(Reject H₀ | H₀ is false) - Increasing α makes it easier to reject H₀, thus increasing power - Also: increasing sample size, increasing effect size, or decreasing variance increases power
1.4.10 Question 45
A researcher finds a strong positive correlation (r = 0.82) between ice cream sales and drowning incidents. Which conclusion is most appropriate?
- Eating ice cream causes drowning
- Drowning causes people to buy ice cream
- A third variable (like temperature) likely affects both variables
- The strong correlation proves a causal relationship
Answer: c) A third variable (like temperature) likely affects both variables
Solution: - Correlation does NOT imply causation - This is a classic example of a confounding variable - Temperature (or summer weather) likely causes both increased ice cream sales and more swimming (leading to more drowning incidents) - The correlation between ice cream and drowning is spurious (not causal)
1.4.11 Question 46
The significance level α represents:
- P(Type II error)
- P(Type I error)
- The p-value
- Power
Answer: b) P(Type I error)
Solution: - α = P(Reject H₀ | H₀ is true) = P(Type I error) - Common values: α = 0.05, 0.01, 0.10
1.4.12 Question 47
A study finds that students who sit in the front rows of classrooms have higher exam scores on average than students who sit in the back (r = 0.65, p < 0.01). What can we conclude?
- Sitting in the front causes higher exam scores
- Higher exam scores cause students to sit in the front
- There is a significant association, but causation cannot be determined from this study
- Random assignment would eliminate this correlation
Answer: c) There is a significant association, but causation cannot be determined from this study
Solution: - The correlation is statistically significant (p < 0.01), so there is a real association - However, this is an observational study, not an experiment - Possible explanations: Motivated students choose to sit in front AND study more, Better vision/hearing in front helps learning, Less distraction in front - Cannot establish causation without a randomized experiment - Answer (d) is incorrect because random assignment would be part of designing an experiment, but wouldn’t “eliminate” a real relationship
1.4.13 Question 48
When comparing a p-value to α:
- If p-value < α, fail to reject H₀
- If p-value < α, reject H₀
- If p-value > α, reject H₀
- P-value and α are unrelated
Answer: b) If p-value < α, reject H₀
Solution: - Decision rule: Reject H₀ if p-value < α - Fail to reject H₀ if p-value ≥ α
1.4.14 Question 49
In hypothesis testing, we test:
- Sample statistics
- Population parameters
- Both statistics and parameters
- Neither
Answer: b) Population parameters
Solution: - Hypotheses are statements about population parameters (μ, p, σ, etc.) - We use sample statistics to make inferences about parameters
1.4.15 Question 50
A researcher obtains a test statistic of t = 2.5 with a p-value of 0.01. At α = 0.05, this provides:
- No evidence against H₀
- Evidence against H₀
- Nothing
- Cannot determine
Answer: b) Evidence against H₀
Solution: - p-value = 0.01 < 0.05, so we reject H₀ - A p-value this small indicates evidence against H₀
1.5 Two-Sample Tests and Effect Sizes (Questions 51-60)
1.5.1 Question 51
When comparing two population means with independent samples and unknown but equal variances, we use:
- Paired t-test
- Pooled t-test
- Z-test for proportions
- Chi-square test
Answer: b) Pooled t-test
Solution: - Equal variances → pooled t-test - Unequal variances → Welch’s t-test (unpooled) - Paired data → paired t-test (outside of the scope of our class)
1.5.2 Question 52
Cohen’s d = 0.8 represents:
- Small effect
- Medium effect
- Large effect
- No effect
Answer: c) Large effect
Solution: - Cohen’s standards: d = 0.2 (small), 0.5 (medium), 0.8 (large) - d = 0.8 is considered a large, practically meaningful effect
1.5.3 Question 53
A pooled t-test assumes:
- The samples are dependent
- Population variances are equal
- Population variances are unequal
- Sample sizes must be equal
Answer: b) Population variances are equal
Solution: - Pooled t-test pools the variances, assuming σ₁² = σ₂² - If variances are unequal, use Welch’s t-test instead
1.5.4 Question 54
To test H₀: p₁ = p₂ vs. Hₐ: p₁ ≠ p₂, we use:
- t-test
- ANOVA
- Two-proportion z-test
- Chi-square goodness of fit
Answer: c) Two-proportion z-test
Solution: - Comparing two population proportions → two-proportion z-test - Uses pooled proportion under H₀: p₁ = p₂
1.5.5 Question 55
When comparing two means with known population standard deviations, we use:
- t-test
- z-test
- F-test
- Chi-square test
Answer: b) z-test
Solution: - Known σ → z-test - Unknown σ → t-test - In practice, σ is almost always unknown
1.5.6 Question 56
Cohen’s d is calculated as:
- (X̄₁ - X̄₂) / s_pooled
- (X̄₁ - X̄₂) / SE
- s_pooled / (X̄₁ - X̄₂)
- SE / (X̄₁ - X̄₂)
Answer: a) (X̄₁ - X̄₂) / s_pooled
Solution: - Cohen’s d = (difference in means) / (pooled standard deviation) - Measures effect size in standard deviation units - Not affected by sample size (unlike test statistics)
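To illustrate that Cohen's d does not depend on sample size, here is a sketch using the usual pooled standard deviation, s_pooled = √(((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)). The function name cohens_d and all summary statistics below are hypothetical values chosen for illustration; they do not come from the quiz.

```python
import math

def cohens_d(mean1, mean2, s1, s2, n1, n2):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical summary statistics: same means and SDs, different sample sizes
d_small_n = cohens_d(75, 70, 10, 10, n1=30, n2=30)
d_large_n = cohens_d(75, 70, 10, 10, n1=3000, n2=3000)

print(f"d with n = 30 per group:   {d_small_n:.2f}")
print(f"d with n = 3000 per group: {d_large_n:.2f}")
```

A test statistic would grow with the larger samples because its denominator is the standard error, while d stays the same because its denominator is a standard deviation.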
1.5.7 Question 57
According to Cohen’s standards, d = 0.4 is closest to:
- Small effect
- Medium effect
- Large effect
- Very large effect
Answer: b) Medium effect
Solution: - Cohen’s standards: 0.2 (small), 0.5 (medium), 0.8 (large) - d = 0.4 is between small and medium, but closer to medium
1.5.8 Question 58
When testing the difference between two proportions, the null hypothesis is typically:
- p₁ - p₂ = 1
- p₁ - p₂ = 0
- p₁/p₂ = 1
- p₁ + p₂ = 1
Answer: b) p₁ - p₂ = 0
Solution: - H₀: p₁ - p₂ = 0, which is equivalent to H₀: p₁ = p₂ - Tests if the two proportions are equal
1.5.9 Question 59
A researcher wants to estimate the average income of all residents in a city. She surveys 500 people who visit an upscale shopping mall on a Saturday afternoon and constructs a 95% confidence interval. What is the primary concern with this approach?
- The sample size is too small for the Central Limit Theorem to apply
- The sampling method is not random, so the confidence interval may not be valid
- A 99% confidence interval should be used instead
- The t-distribution should be used instead of the z-distribution
Answer: b) The sampling method is not random, so the confidence interval may not be valid
Solution: - All inferential procedures (confidence intervals, hypothesis tests) require random sampling - This is a convenience sample from an upscale shopping mall, which likely: - Overrepresents higher-income individuals - Excludes people who don’t shop at malls - Only captures Saturday afternoon shoppers - The resulting confidence interval will be biased and not representative of all city residents - Sample size (n = 500) is actually quite large, so (a) is incorrect - The issue isn’t about choosing 95% vs 99% confidence level (c) - The issue isn’t about z vs t distribution (d) - Key principle: Without random sampling, we cannot validly generalize from our sample to the population, regardless of sample size or statistical technique used
1.5.10 Question 60
In a two-sample t-test, if we fail to reject H₀: μ₁ = μ₂, we conclude:
- μ₁ = μ₂ is definitely true
- There is insufficient evidence that μ₁ ≠ μ₂
- The samples are identical
- μ₁ > μ₂
Answer: b) There is insufficient evidence that μ₁ ≠ μ₂
Solution: - Failing to reject H₀ means we don’t have enough evidence to conclude the means differ - It does NOT prove the means are equal
1.6 Chi-Square Tests and ANOVA (Questions 61-70)
1.6.1 Question 61
The chi-square distribution is:
- Symmetric
- Always right-skewed
- Always left-skewed
- Can be negative
Answer: b) Always right-skewed
Solution: - χ² distribution is right-skewed (positive values only) - Approaches normal as df increases - Used for: tests of independence, goodness of fit, variance tests
1.6.2 Question 62
The degrees of freedom for a chi-square test of independence with a 3×4 contingency table is:
- 12
- 7
- 6
- 11
Answer: c) 6
Solution: - df = (r - 1)(c - 1) where r = rows, c = columns - df = (3 - 1)(4 - 1) = 2 × 3 = 6
1.6.3 Question 63
In ANOVA, the null hypothesis states that:
- All sample means are equal
- All population means are equal
- All population variances are equal
- Sample and population means are equal
Answer: b) All population means are equal
Solution: - H₀: μ₁ = μ₂ = μ₃ = … = μₖ - Tests if all k population means are equal
1.6.4 Question 64
The F-statistic in ANOVA is always:
- Negative
- Between -1 and 1
- Non-negative
- Greater than 1
Answer: c) Non-negative
Solution: - F = MSB / MSW (ratio of two variances) - Variances are always non-negative, so F ≥ 0 - F close to 1 suggests no difference in means
1.6.5 Question 65
MSB (Mean Square Between) measures:
- Variation within groups
- Variation between groups
- Total variation
- Sample variance
Answer: b) Variation between groups
Solution: - MSB = SSB / (k-1) measures variation between group means - MSW = SSW / (n-k) measures variation within groups - F = MSB / MSW
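The formulas above can be traced on a tiny data set. The three groups below are hypothetical numbers invented for illustration; the sketch uses only the Python standard library.

```python
from statistics import mean

# Hypothetical data: k = 3 groups with 3 observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = mean([x for g in groups for x in g])

# Between-group sum of squares: group means versus the grand mean
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations versus their own group mean
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

msb = ssb / (k - 1)
msw = ssw / (n - k)
f_stat = msb / msw

print(f"df = ({k - 1}, {n - k}), MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_stat:.2f}")
```

Here the group means (2, 3, 6) are far apart relative to the spread inside each group, so F comes out well above 1.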
1.6.6 Question 66
If the F-statistic in ANOVA is close to 1, this suggests:
- Strong evidence against H₀
- Group means are very different
- Little difference between group means
- The test is invalid
Answer: c) Little difference between group means
Solution: - F ≈ 1 means MSB ≈ MSW - Between-group variation is similar to within-group variation - Suggests group means are similar (fail to reject H₀)
1.6.7 Question 67
The chi-square test for independence tests whether:
- Two means are equal
- A distribution is normal
- Two categorical variables are related
- Variances are equal
Answer: c) Two categorical variables are related
Solution: - H₀: Variables are independent - Hₐ: Variables are related/associated/dependent - Uses contingency tables
1.6.8 Question 68
In a chi-square test, expected frequencies are calculated assuming:
- The alternative hypothesis is true
- The null hypothesis is true
- The sample is biased
- Variables are dependent
Answer: b) The null hypothesis is true
Solution: - Expected frequencies assume independence (H₀ is true) - E = (row total × column total) / grand total - Compare observed to expected frequencies
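The expected-count formula can be applied to a whole table at once. The 2×3 table of observed counts below is hypothetical, made up for illustration; the sketch uses plain Python lists and computes only the statistic and its degrees of freedom, not a p-value.

```python
# Hypothetical observed counts: 2 rows x 3 columns
observed = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total x column total) / grand total, computed under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected counts:", expected)
print(f"chi-square = {chi2:.2f}, df = {df}")
```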
1.6.9 Question 69
For a one-way ANOVA with 4 groups and 40 total observations, the df for MSW is:
- 36
- 39
- 3
- 4
Answer: a) 36
Solution: - df for MSW (within groups) = n - k - n = 40 total observations, k = 4 groups - df = 40 - 4 = 36
1.6.10 Question 70
A statistically significant F-test in ANOVA tells us:
- All means are different
- At least one mean differs from the others
- Exactly which means differ
- All means are equal
Answer: b) At least one mean differs from the others
Solution: - Rejecting H₀ in ANOVA means at least one μᵢ ≠ μⱼ - Doesn’t tell us which specific means differ - Need post-hoc tests (e.g., Tukey’s HSD) to identify differences
2 Part II: Free Response Questions (30 questions)
2.1 Section A: Normal Distribution (Questions 1-3)
2.1.1 Question 1
Battery life for a certain laptop is normally distributed with μ = 6.5 hours and σ = 0.8 hours.
- What proportion of laptops have battery life between 6 and 7 hours?
- Find the battery life that represents the 75th percentile.
- If a laptop’s battery lasts 8 hours, is this unusual? Explain using the z-score.
Part a: - z₁ = (6 - 6.5) / 0.8 = -0.625 - z₂ = (7 - 6.5) / 0.8 = 0.625 - You won’t have a way to calculate the following probabilities in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(-0.625 < Z < 0.625) = P(Z < 0.625) - P(Z < -0.625) - = 0.7340 - 0.2660 = 0.468 or 46.8%
Part b: - You won’t have a way to calculate the following percentile in the exam, but we may ask you to sketch the N(0,1) distribution and mark the values on the x-axis that you are looking for. - 75th percentile corresponds to z = 0.674 - X = μ + zσ = 6.5 + (0.674)(0.8) = 6.5 + 0.539 = 7.04 hours
Part c: - z = (8 - 6.5) / 0.8 = 1.875 - This is between 1.5 and 2 standard deviations above the mean - This is somewhat unusual (in the upper 3-4% of the distribution) - Not extremely unusual (would need |z| > 2 or 3 for that)
2.1.2 Question 2
SAT scores are normally distributed with μ = 1050 and σ = 200.
- What percentage of students score above 1300?
- What score represents the bottom 10% of all scores?
- Between what two scores (symmetric around the mean) do the middle 90% of students score?
Part a: - z = (1300 - 1050) / 200 = 1.25 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z > 1.25) = 1 - 0.8944 = 0.1056 or 10.56%
Part b: - You won’t have a way to calculate the following percentile in the exam, but we may ask you to sketch the N(0,1) distribution and mark the values on the x-axis that you are looking for. - Bottom 10% means z = -1.28 - X = 1050 + (-1.28)(200) = 1050 - 256 = 794
Part c: - Middle 90% leaves 5% in each tail - z-scores: ±1.645 (from the critical values listed in the formulas) - Lower bound: 1050 + (-1.645)(200) = 721 - Upper bound: 1050 + (1.645)(200) = 1379 - Middle 90% score between 721 and 1379
2.1.3 Question 3
A factory produces bolts with diameters that are normally distributed with μ = 10 mm and σ = 0.2 mm. Bolts are acceptable if their diameter is between 9.671 mm and 10.329 mm.
- What proportion of bolts are acceptable?
- If the factory produces 10,000 bolts per day, how many are expected to be unacceptable?
- What should the standard deviation be (keeping μ = 10) so that 99% of bolts are acceptable?
Part a: - z₁ = (9.671 - 10) / 0.2 = -1.645 - z₂ = (10.329 - 10) / 0.2 = 1.645 - P(-1.645 < Z < 1.645) = 90% (using the critical values listed in the formulas)
Part b: - Proportion unacceptable = 1 - 0.90 = 0.10 - Expected unacceptable = 10,000 × 0.10 = 1000 bolts
Part c: - For 99% acceptable, need P(9.671 < X < 10.329) = 0.99 - Need z = 2.576 for each endpoint (from the critical values listed in the formulas) - 10.329 = 10 + 2.576σ - σ = 0.329 / 2.576 = 0.1277 mm
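All three parts of the bolt question can be checked with one short sketch (assuming Python 3.8+ and the standard-library statistics.NormalDist; variable names are illustrative). Because the code uses unrounded normal areas, the results differ from the hand calculation only in the later decimal places.

```python
from statistics import NormalDist

mu, sigma = 10, 0.2
lower, upper = 9.671, 10.329
bolts_per_day = 10_000

diameter = NormalDist(mu, sigma)

# Part a: proportion of bolts inside the acceptance limits
p_accept = diameter.cdf(upper) - diameter.cdf(lower)

# Part b: expected number of unacceptable bolts per day
expected_bad = bolts_per_day * (1 - p_accept)

# Part c: sigma that puts the upper limit at the 99.5th percentile,
# leaving 0.5% in each tail (99% acceptable)
z_star = NormalDist().inv_cdf(0.995)  # about 2.576
sigma_needed = (upper - mu) / z_star

print(f"P(acceptable) = {p_accept:.4f}")
print(f"expected unacceptable per day = {expected_bad:.0f}")
print(f"sigma needed for 99% acceptable = {sigma_needed:.4f} mm")
```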
2.2 Section B: Central Limit Theorem (Questions 4-6)
2.2.1 Question 4
A population of customer service wait times has μ = 12 minutes and σ = 4 minutes. The distribution is right-skewed.
- Can we use the Central Limit Theorem for samples of size n = 5? Why or why not?
- For samples of size n = 64, describe the sampling distribution of X̄.
- What is the probability that a sample of 64 customers has a mean wait time less than 11.5 minutes?
Part a: - No, we cannot reliably use CLT for n = 5 - The population is right-skewed, so we need a larger sample (typically n ≥ 30) - With n = 5, the sampling distribution will still be skewed
Part b: - By CLT, for n = 64 (large sample), X̄ is approximately normal - Mean of X̄: μ_X̄ = 12 minutes - Standard error: SE = σ/√n = 4/√64 = 0.5 minutes - X̄ ~ N(12, 0.5), where 0.5 is the standard deviation of X̄ (the standard error), not the variance
Part c: - z = (11.5 - 12) / 0.5 = -1.0 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z < -1.0) = 0.1587 or 15.87%
2.2.2 Question 5
Monthly cell phone bills for a population have μ = $85 and σ = $20.
- For random samples of 100 customers, what is the mean and standard deviation of the sampling distribution of X̄?
- What is P(X̄ > $87)?
- Would it be unusual to observe a sample mean of $90? Explain.
Part a: - Mean of sampling distribution: μ_X̄ = μ = $85 - Standard deviation (SE): σ_X̄ = σ/√n = 20/√100 = $2
Part b: - z = (87 - 85) / 2 = 1.0 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z > 1.0) = 1 - 0.8413 = 0.1587 or 15.87%
Part c: - z = (90 - 85) / 2 = 2.5 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z > 2.5) = 0.0062 or 0.62% - Yes, this would be unusual (more than 2 standard errors from mean) - Only occurs about 0.6% of the time by chance
2.2.3 Question 6
A population is uniformly distributed on the interval [0, 10].
- What are μ and σ for this population?
- For samples of size 36, describe the sampling distribution of X̄.
- Calculate P(4.5 < X̄ < 5.5) for n = 36.
Part a: - For uniform distribution on [a, b]: - μ = (a + b) / 2 = (0 + 10) / 2 = 5 - σ = (b - a) / √12 = 10 / √12 = 2.887
Part b: - By CLT, even though the population is uniform, X̄ is approximately normal for n = 36 - μ_X̄ = 5 - SE = σ/√n = 2.887/√36 = 0.481 - X̄ ~ N(5, 0.481), where 0.481 is the standard deviation (the SE)
Part c: - z₁ = (4.5 - 5) / 0.481 = -1.04 - z₂ = (5.5 - 5) / 0.481 = 1.04 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(-1.04 < Z < 1.04) = 0.8508 - 0.1492 = 0.7016 or 70.16%
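The uniform-population formulas and the normal approximation for X̄ can be combined in a short Python sketch. This is an illustration under the CLT approximation, not an exact calculation for the mean of 36 uniform draws; names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 0.0, 10.0, 36            # interval endpoints and sample size from the question

mu = (a + b) / 2                   # mean of a uniform distribution on [a, b]
sigma = (b - a) / sqrt(12)         # standard deviation of a uniform distribution on [a, b]
se = sigma / sqrt(n)               # standard error of the sample mean

xbar_dist = NormalDist(mu, se)     # CLT approximation for the sample mean
p_between = xbar_dist.cdf(5.5) - xbar_dist.cdf(4.5)

print(mu, round(sigma, 3), round(se, 3), round(p_between, 3))
```

The printed probability should be close to the 0.70 found above; any small gap comes from rounding z to 1.04 in the hand calculation.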
2.3 Section C: Confidence Intervals (Questions 7-12)
2.3.1 Question 7
A random sample of 50 students has mean GPA of 3.2 with standard deviation 0.6.
- Construct a 95% confidence interval for the true mean GPA.
- Interpret this interval in context.
- Based on this interval, is it plausible that the true mean GPA is 3.0? Explain.
Part a: - n = 50 (large), so use z* ≈ 1.96 for 95% CI (from list of critical values in formulas) - SE = s/√n = 0.6/√50 = 0.0849 - ME = 1.96 × 0.0849 = 0.166 - CI: 3.2 ± 0.166 = (3.034, 3.366)
Part b: - We are 95% confident that the true mean GPA for all students is between 3.034 and 3.366. - If we repeated this sampling process many times, about 95% of the intervals would contain the true population mean GPA.
Part c: - No, μ = 3.0 is NOT plausible at the 95% confidence level, because 3.0 lies below the lower bound of 3.034. However, we could always be in the 5% of cases where the interval misses the true mean. - Equivalently, we would reject H₀: μ = 3.0 in a two-tailed test at α = 0.05
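The interval construction can be sketched in Python as below, assuming the z* = 1.96 critical value used in the solution. The names are illustrative only.

```python
from math import sqrt

n, xbar, s = 50, 3.2, 0.6     # sample size, sample mean, sample standard deviation
z_star = 1.96                 # 95% critical value from the formula list

se = s / sqrt(n)              # standard error of the mean
me = z_star * se              # margin of error
ci = (xbar - me, xbar + me)   # confidence interval

# Part c: is 3.0 inside the interval?
contains_3 = ci[0] <= 3.0 <= ci[1]

print(round(ci[0], 3), round(ci[1], 3), contains_3)
```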
2.3.2 Question 8
A survey of 500 voters finds that 270 support a ballot measure.
- Calculate the sample proportion.
- Construct a 99% confidence interval for the true proportion of supporters.
- Based on this interval, is there evidence the measure will pass (needs >50%)? Explain.
Part a: - p̂ = 270/500 = 0.54
Part b: - Check conditions: np̂ = 270 ≥ 10, n(1-p̂) = 230 ≥ 10 ✓ - SE = √(p̂(1-p̂)/n) = √(0.54×0.46/500) = 0.0223 - z* = 2.576 for 99% CI - ME = 2.576 × 0.0223 = 0.057 - CI: 0.54 ± 0.057 = (0.483, 0.597) or (48.3%, 59.7%)
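Part c: - No. The interval (0.483, 0.597) contains 0.50 and values below it, so at the 99% confidence level we cannot rule out that support is 50% or less - The sample proportion (54%) is above 50%, but the data do not provide sufficient evidence at this confidence level that the measure will pass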
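A minimal Python sketch of the proportion interval follows, using the same z* = 2.576 as the solution. The variable names are assumptions made for illustration. The printed bounds can then be compared with 0.50 for Part c.

```python
from math import sqrt

x, n = 270, 500                 # supporters and sample size from the question
z_star = 2.576                  # 99% critical value from the formula list

p_hat = x / n                   # sample proportion

# Success/failure condition for the normal approximation
conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
me = z_star * se                     # margin of error
ci = (p_hat - me, p_hat + me)

print(p_hat, conditions_ok, round(ci[0], 3), round(ci[1], 3))
```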
2.3.3 Question 9
A manufacturer wants to estimate the mean lifetime of light bulbs with margin of error 50 hours at 95% confidence. Previous studies suggest σ = 200 hours.
- What sample size is needed?
- If the desired margin of error is reduced to 25 hours, what sample size is needed?
- Explain why the sample size changes the way it does.
Part a: - ME = z* × σ/√n - 50 = 1.96 × 200/√n - √n = (1.96 × 200) / 50 = 7.84 - n = 61.47, round up to n = 62
Part b: - 25 = 1.96 × 200/√n - √n = (1.96 × 200) / 25 = 15.68 - n = 245.86, round up to n = 246
Part c: - To cut the margin of error in half (from 50 to 25), we need to roughly quadruple the sample size (from 62 to 246; before rounding up, 245.86 is exactly 4 × 61.47) - This is because ME ∝ 1/√n - To reduce ME by a factor of k, we need to increase n by a factor of k²
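The sample-size formula n = (z*σ/ME)² can be wrapped in a small helper. This is a sketch; `sample_size` is a hypothetical function name, not something from the course formula list.

```python
from math import ceil

def sample_size(me, sigma, z_star=1.96):
    """Smallest n whose margin of error is at most `me` (always rounds up)."""
    return ceil((z_star * sigma / me) ** 2)

n_50 = sample_size(me=50, sigma=200)   # Part a
n_25 = sample_size(me=25, sigma=200)   # Part b

print(n_50, n_25)
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target.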
2.3.4 Question 10
A sample of 15 measurements has X̄ = 42.5 and s = 6.8. Assume the population is normally distributed.
- Why must we use the t-distribution for inference?
- Construct a 90% confidence interval for μ.
- How would the interval change if n = 50 instead of 15 (all else equal)?
Part a: - We must use t-distribution because: - σ is unknown (we only have s) - Sample size is small (n = 15 < 30) - df = n - 1 = 14
Part b: - t* = 1.761 for 90% CI with df = 14 - SE = s/√n = 6.8/√15 = 1.756 - ME = 1.761 × 1.756 = 3.092 - CI: 42.5 ± 3.09 = (39.41, 45.59)
Part c: - With n = 50: - df = 49, t* ≈ 1.677 (closer to z* = 1.645) - SE = 6.8/√50 = 0.962 (much smaller) - ME = 1.677 × 0.962 = 1.61 (much smaller) - The interval would be much narrower due to larger sample size
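The two intervals compared above can be reproduced with a short Python sketch. The t* values are copied from the solution rather than computed, because the standard library has no t-distribution; `t_interval` is an illustrative helper name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """Confidence interval xbar ± t* · s/√n for a given critical value."""
    me = t_star * s / sqrt(n)
    return xbar - me, xbar + me

ci_15 = t_interval(42.5, 6.8, 15, t_star=1.761)   # df = 14, 90% confidence
ci_50 = t_interval(42.5, 6.8, 50, t_star=1.677)   # df = 49, 90% confidence

width_15 = ci_15[1] - ci_15[0]
width_50 = ci_50[1] - ci_50[0]

print([round(v, 2) for v in ci_15], [round(v, 2) for v in ci_50])
```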
2.3.5 Question 11
A 95% confidence interval for the difference in mean salaries between two departments is ($2,000, $8,000).
- Interpret this interval in context.
- Is there significant evidence at α = 0.05 that mean salaries differ? Explain.
- What would change if we computed a 99% CI instead?
Part a: - We are 95% confident that the true difference in mean salaries between the two departments is between $2,000 and $8,000. - Department 1 appears to have higher mean salary than Department 2 by somewhere between $2,000 and $8,000.
Part b: - Yes, there is statistically significant evidence that mean salaries differ - The interval does not contain 0, so we would reject H₀: μ₁ - μ₂ = 0 at α = 0.05 - This is consistent with a two-tailed test rejecting H₀
Part c: - A 99% CI would be wider than the 95% CI - It might still exclude 0, but its lower bound would be closer to 0 - We would be more confident but less precise
2.3.6 Question 12
Two samples are collected: Sample A (n = 100) and Sample B (n = 400), both with the same standard deviation.
- Which sample will produce a narrower confidence interval? Explain.
- How much narrower will it be?
- What does this tell us about the value of larger samples?
Part a: - Sample B (n = 400) will produce a narrower CI - CI width depends on SE = s/√n - Larger n → smaller SE → narrower CI
Part b: - SE_A = s/√100 = s/10 - SE_B = s/√400 = s/20 - SE_B = SE_A / 2 - Sample B’s CI will be half as wide as Sample A’s
Part c: - Larger samples provide more precise estimates - To cut the width in half, we need to quadruple the sample size - Diminishing returns: going from 100 to 400 cuts the width in half, but cutting it in half again requires going from 400 to 1600
2.4 Section D: Hypothesis Testing (Questions 13-18)
2.4.1 Question 13
A coffee shop claims the mean wait time is 5 minutes. A sample of 35 customers has X̄ = 5.8 minutes with s = 2.1 minutes. Test at α = 0.05.
- State the hypotheses in symbols and words.
- Calculate the test statistic.
- If the p-value is 0.027, what conclusion should be made? State it in context.
- What type of error might have been made? What would it mean in context?
Part a: - H₀: μ = 5 (The mean wait time is 5 minutes) - Hₐ: μ ≠ 5 (The mean wait time is not 5 minutes) - Two-tailed test
Part b: - t = (X̄ - μ₀) / (s/√n) - t = (5.8 - 5) / (2.1/√35) - t = 0.8 / 0.355 = 2.25
Part c: - p-value = 0.027 < α = 0.05, so reject H₀ - Conclusion: There is statistically significant evidence at the 0.05 level that the mean wait time is not 5 minutes. The data suggest the actual mean wait time is different from (likely greater than) the claimed 5 minutes.
Part d: - If we reject H₀, we might have made a Type I error - Type I error: Concluding the mean wait time is not 5 minutes when it actually is 5 minutes - Consequence: The coffee shop might unnecessarily change their operations based on incorrect conclusion
2.4.2 Question 14
A university claims that 70% of students graduate in 4 years. In a random sample of 200 students, 130 graduated in 4 years. Test at α = 0.01.
- State the hypotheses.
- Check if conditions for the test are met.
- The test statistic is z = -1.55 with p-value = 0.121. What conclusion should be made?
- Explain what Type I and Type II errors would mean in this context.
Part a: - H₀: p = 0.70 - Hₐ: p ≠ 0.70
Part b: - Random sample? yes - np₀ = 200(0.70) = 140 ≥ 10 ✓ - n(1-p₀) = 200(0.30) = 60 ≥ 10 ✓ - Conditions are met for normal approximation
Part c: - p-value = 0.121 > α = 0.01, so fail to reject H₀ - Conclusion: There is insufficient evidence at the 0.01 significance level to conclude that the graduation rate differs from 70%. The data are consistent with the university’s claim.
Part d: - Type I error: Concluding the rate is not 70% when it actually is 70% - Consequence: University might waste resources investigating a non-existent problem - Type II error: Failing to conclude the rate differs from 70% when it actually does differ - Consequence: University might not address a real problem with graduation rates
2.4.3 Question 15
A researcher tests H₀: μ = 100 vs. Hₐ: μ ≠ 100 at α = 0.05. Sample data: n = 50, X̄ = 103, s = 12.
- Calculate the test statistic.
- If t₀.₀₂₅,₄₉ = 2.010, what is the rejection region?
- What decision should be made?
- Construct a 95% CI for μ. How does this relate to your hypothesis test conclusion?
Part a: - t = (X̄ - μ₀) / (s/√n) - t = (103 - 100) / (12/√50) - t = 3 / 1.697 = 1.77
Part b: - Two-tailed test with α = 0.05 - Rejection region: t < -2.010 or t > 2.010
Part c: - t = 1.77 does not fall in rejection region - Fail to reject H₀ - There is insufficient evidence to conclude μ ≠ 100
Part d: - 95% CI: X̄ ± t* × SE - CI: 103 ± 2.010 × 1.697 = 103 ± 3.41 = (99.59, 106.41) - The interval contains 100, which is consistent with failing to reject H₀: μ = 100 - If 100 is in the 95% CI, we fail to reject H₀: μ = 100 at α = 0.05
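The link between the two-tailed test and the confidence interval can be illustrated with a short Python sketch. It uses the critical value 2.010 given in the question; the variable names are assumptions.

```python
from math import sqrt

n, xbar, s, mu0 = 50, 103.0, 12.0, 100.0   # sample values and null value from the question
t_crit = 2.010                             # t critical value given in Part b (df = 49)

se = s / sqrt(n)
t_stat = (xbar - mu0) / se

reject = abs(t_stat) > t_crit                 # two-tailed decision at alpha = 0.05
ci = (xbar - t_crit * se, xbar + t_crit * se) # matching 95% confidence interval
mu0_in_ci = ci[0] <= mu0 <= ci[1]

print(round(t_stat, 2), reject, [round(v, 2) for v in ci], mu0_in_ci)
```

The test rejects exactly when the null value falls outside the interval, so `reject` and `mu0_in_ci` always take opposite values when the same critical value is used.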
2.4.4 Question 16
A medical researcher claims a new drug reduces blood pressure by more than 10 points on average.
- Set up appropriate hypotheses.
- Describe Type I error in context and explain its consequences.
- Describe Type II error in context and explain its consequences.
- If you were the researcher, would you want α to be large or small? Why?
Part a: - H₀: μ ≤ 10 (Drug reduces BP by 10 or fewer points) - Hₐ: μ > 10 (Drug reduces BP by more than 10 points) - This is a right-tailed test
Part b: - Type I error: Concluding the drug reduces BP by more than 10 points when it actually doesn’t - Consequences: - Patients might be prescribed an ineffective drug - Healthcare resources wasted on inferior treatment - False hope given to patients - Potentially dangerous if they stop other effective treatments
Part c: - Type II error: Failing to conclude the drug reduces BP by more than 10 points when it actually does - Consequences: - An effective drug might not be approved or used - Patients miss out on beneficial treatment - Research investment wasted - Public health opportunity lost
Part d: - Want α to be small (like 0.01 or 0.05) - In medical research, Type I error is typically considered more serious - Don’t want to falsely claim a drug is effective - FDA requires strong evidence (small α) before approval - However, this increases β (Type II error probability)
2.4.5 Question 17
Two hypothesis tests are performed on the same data. Test 1 uses α = 0.01 and Test 2 uses α = 0.10.
- Which test is more likely to make a Type I error?
- Which test is more likely to make a Type II error?
- Which test has more power?
- Explain the trade-off between Type I and Type II errors.
Part a: - Test 2 (α = 0.10) is more likely to make Type I error - P(Type I error) = α - 0.10 > 0.01
Part b: - Test 1 (α = 0.01) is more likely to make Type II error - More stringent criterion means harder to reject H₀ - More likely to miss a true effect (β is larger)
Part c: - Test 2 (α = 0.10) has more power - Power = 1 - β - Larger α → easier to reject H₀ → higher power
Part d: - There is an inverse relationship between Type I and Type II errors - Decreasing α (being more conservative) increases β - Increasing α (being more liberal) decreases β but increases false positives - Need to balance based on consequences of each error type - Can improve both by increasing sample size
2.4.6 Question 18
A p-value of 0.08 is obtained for a hypothesis test.
- What decision would be made at α = 0.05?
- What decision would be made at α = 0.10?
- Is the p-value the probability that H₀ is true? Explain what it actually means.
Part a: - p-value = 0.08 > α = 0.05 - Fail to reject H₀
Part b: - p-value = 0.08 < α = 0.10 - Reject H₀
Part c: - No, the p-value is NOT the probability that H₀ is true - The p-value is: P(observing data as extreme or more extreme than what we got | H₀ is true) - It measures how surprising our data would be if H₀ were true - It’s the probability of the data given H₀, not the probability of H₀ given the data
2.5 Section E: Two-Sample Tests (Questions 19-22)
2.5.1 Question 19
Two teaching methods are compared. Method A: n₁ = 30, X̄₁ = 78, s₁ = 12. Method B: n₂ = 35, X̄₂ = 82, s₂ = 10. Assume equal variances.
- State hypotheses to test if mean scores differ.
- Calculate the pooled standard deviation.
- Calculate Cohen’s d and interpret the effect size.
- If the p-value is 0.14, what conclusion is made at α = 0.05?
Part a: - H₀: μ₁ = μ₂ (Mean scores are equal) - Hₐ: μ₁ ≠ μ₂ (Mean scores differ)
Part b: - s_pooled = √[((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)] - s_pooled = √[(29×144 + 34×100) / 63] - s_pooled = √[(4176 + 3400) / 63] - s_pooled = √(7576/63) = √120.25 = 10.97
Part c: - Cohen’s d = (X̄₁ - X̄₂) / s_pooled - d = (78 - 82) / 10.97 = -4 / 10.97 = -0.36 - |d| = 0.36, which is between small (0.2) and medium (0.5) - This represents a small to medium effect size - Method B has slightly higher scores than Method A
Part d: - p-value = 0.14 > α = 0.05 - Fail to reject H₀ - Conclusion: There is insufficient statistical evidence at the 0.05 level to conclude that the mean scores differ between the two teaching methods.
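The pooled standard deviation and Cohen's d from Parts b and c might be sketched in Python as follows. `pooled_sd` is a hypothetical helper name used for illustration.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation under the equal-variance assumption."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

n1, mean1, s1 = 30, 78, 12    # Method A
n2, mean2, s2 = 35, 82, 10    # Method B

sp = pooled_sd(n1, s1, n2, s2)
d = (mean1 - mean2) / sp      # Cohen's d; the sign reflects A minus B

print(round(sp, 2), round(d, 2))
```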
2.5.2 Question 20
A company tests whether the proportion of defects differs between two production lines. Line 1: 15 defects in 200 items. Line 2: 25 defects in 250 items.
- Calculate both sample proportions.
- State appropriate hypotheses.
- Calculate the pooled proportion.
- If the test statistic is z = -0.82 with p-value = 0.41, what is the conclusion at α = 0.05?
Part a: - p̂₁ = 15/200 = 0.075 - p̂₂ = 25/250 = 0.100
Part b: - H₀: p₁ = p₂ (Defect rates are equal) - Hₐ: p₁ ≠ p₂ (Defect rates differ)
Part c: - p̂_pooled = (x₁ + x₂) / (n₁ + n₂) - p̂_pooled = (15 + 25) / (200 + 250) - p̂_pooled = 40/450 = 0.089
Part d: - p-value = 0.41 > α = 0.05 - Fail to reject H₀ - Conclusion: There is insufficient evidence at the 0.05 level to conclude that the defect rates differ between the two production lines. The observed difference could easily be due to random variation.
2.5.3 Question 21
A study compares weight loss for two diet plans. Diet A: n = 25, X̄ = 8.2 lbs, s = 3.1 lbs. Diet B: n = 20, X̄ = 6.5 lbs, s = 2.8 lbs.
- If we assume equal population variances, should we use a pooled or unpooled test?
- Calculate the pooled variance.
- The 95% CI for μ₁ - μ₂ is (-0.3, 3.7). Interpret this interval.
- Based on this CI, what would you conclude about H₀: μ₁ = μ₂ at α = 0.05?
Part a: - If we assume equal variances, use a pooled test - Pooled t-test combines variance estimates for more power
Part b: - s²_pooled = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2) - s²_pooled = [24×9.61 + 19×7.84] / 43 - s²_pooled = [230.64 + 148.96] / 43 - s²_pooled = 379.6 / 43 = 8.83 - s_pooled = √8.83 = 2.97 lbs
Part c: - We are 95% confident that the true difference in mean weight loss (Diet A - Diet B) is between -0.3 and 3.7 pounds. - Diet A might result in 0.3 lbs less weight loss up to 3.7 lbs more weight loss than Diet B - The interval includes both positive and negative values
Part d: - The interval contains 0, so we fail to reject H₀: μ₁ = μ₂ at α = 0.05 - There is insufficient evidence to conclude the diets differ in effectiveness - The difference is not statistically significant
2.5.4 Question 22
Below is computer output comparing mean salaries for two positions:
Two-sample t-test
           n    Mean    Std Dev
Group 1   40   62500    8200
Group 2   45   58300    7500
t = 2.56, p-value = 0.012
95% CI for difference: (950, 7450)
Cohen's d = 0.53
- Interpret the p-value in context.
- What conclusion should be made at α = 0.05?
- Interpret Cohen’s d. Is this a meaningful difference?
- Interpret the confidence interval.
Part a: - p-value = 0.012 means: - If the true mean salaries were equal, there is a 1.2% probability of observing a difference as large or larger than $4,200 just by chance - This provides statistical evidence that mean salaries from the population differ
Part b: - p-value = 0.012 < α = 0.05, so reject H₀ - Conclusion: There is statistically significant evidence at the 0.05 level that mean salaries differ between the two positions. Position 1 has significantly higher mean salary than Position 2.
Part c: - Cohen’s d = 0.53 represents a medium effect size (between 0.5 and 0.8) - The difference is about half a standard deviation - This is a practically meaningful difference, not just statistically significant - The higher salary for Position 1 is both statistically and practically significant
Part d: - We are 95% confident that the true difference in mean salaries (Position 1 - Position 2) is between $950 and $7,450 - Position 1’s mean salary is somewhere between about $1,000 and $7,500 higher than Position 2’s - The interval does not contain 0, consistent with rejecting H₀
2.6 Section F: Chi-Square and ANOVA (Questions 23-26)
2.6.1 Question 23
A chi-square test examines if major choice (Business, STEM, Arts) is independent of whether students live on campus. The output shows:
Chi-square test of independence
χ² = 8.45, df = 2, p-value = 0.015
- State the null and alternative hypotheses.
- What conclusion should be made at α = 0.05?
- What does this conclusion mean in practical terms?
- What additional information would help you interpret these results?
Part a: - H₀: Major choice and living on campus are independent - Hₐ: Major choice and living on campus are related/associated/dependent
Part b: - p-value = 0.015 < α = 0.05, so reject H₀ - Conclusion: There is significant evidence at the 0.05 level that major choice and living arrangements are related
Part c: - Students in different majors have different patterns of living on vs. off campus - For example, maybe STEM students are more likely to live on campus than Arts students - The association is statistically significant - However, we don’t know the nature of the relationship from just the chi-square test
Part d: - The contingency table with observed frequencies would help - Expected frequencies to see where largest deviations occur - Percentages/proportions for each major - Sample size (n) - Residuals to identify which cells contribute most to χ²
2.6.2 Question 24
An ANOVA compares mean productivity scores for four different work schedules:
ANOVA Table
Source     SS     df   MS       F      p-value
Between    320    3    106.67   4.27   0.008
Within     900    36   25
Total      1220   39
- How many workers were in this study?
- State the null and alternative hypotheses.
- What is the conclusion at α = 0.01?
- Can we determine which specific schedules differ based on this output alone?
Part a: - Total df = n - 1 = 39 - Therefore n = 40 workers
Part b: - H₀: μ₁ = μ₂ = μ₃ = μ₄ (All four schedules have equal mean productivity) - Hₐ: At least one mean differs from the others
Part c: - p-value = 0.008 < α = 0.01, so reject H₀ - Conclusion: There is significant evidence at the 0.01 level that mean productivity differs across the four work schedules
Part d: - No, we cannot determine which specific schedules differ - ANOVA only tells us that at least one mean is different - Need post-hoc tests (like Tukey’s HSD) to identify which pairs of means differ - Post-hoc procedures control the overall Type I error rate across the multiple pairwise comparisons
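The entries of the ANOVA table are tied together by simple arithmetic, which the following Python sketch illustrates. It recomputes the mean squares, the F statistic and the sample size from the SS and df columns only; the p-value is not recomputed here, and the variable names are illustrative.

```python
ss_between, df_between = 320, 3     # "Between" row of the table
ss_within, df_within = 900, 36      # "Within" row of the table

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_stat = ms_between / ms_within        # F statistic

k_groups = df_between + 1              # number of schedules compared
n_total = df_between + df_within + 1   # total df = n - 1

print(round(ms_between, 2), ms_within, round(f_stat, 2), k_groups, n_total)
```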
2.6.3 Question 25
A study tests if customer satisfaction ratings (on a 50-point scale) differ across three store locations. Location A: n = 20, X̄ = 42. Location B: n = 25, X̄ = 38. Location C: n = 22, X̄ = 45.
ANOVA: F = 5.82, p-value = 0.005
- What is the response variable? Is it quantitative or categorical?
- What is the explanatory variable? How many levels does it have?
- What conclusion should be drawn at α = 0.05?
- If you reject H₀, does this mean all three locations have different mean ratings? Explain.
Part a: - Response variable: Customer satisfaction rating - This is quantitative (numerical scale from 1-50) - Could be treated as continuous for ANOVA purposes
Part b: - Explanatory variable: Store location - This is categorical with 3 levels (Location A, B, C) - Also called the factor or grouping variable
Part c: - p-value = 0.005 < α = 0.05, so reject H₀ - Conclusion: There is statistically significant evidence at the 0.05 level that mean customer satisfaction ratings differ across the three store locations.
Part d: - No, this does NOT mean all three locations have different ratings - It means at least one location differs from the others - Possibilities: - A ≠ B ≠ C (all different) - A = B ≠ C - A ≠ B = C - A = C ≠ B - Need post-hoc comparisons to determine which specific pairs differ
2.6.4 Question 26
A contingency table shows the relationship between exercise frequency (Low, Medium, High) and health status (Poor, Fair, Good, Excellent).
- How many degrees of freedom does the chi-square test have?
- If χ² = 18.5 and the p-value = 0.005, what is the conclusion at α = 0.01?
- What does “independence” mean in this context?
- If the test is significant, what additional analysis might be helpful?
Part a: - df = (r - 1)(c - 1) - r = 3 rows (Low, Medium, High) - c = 4 columns (Poor, Fair, Good, Excellent) - df = (3-1)(4-1) = 2 × 3 = 6
Part b: - p-value = 0.005 < α = 0.01, so reject H₀ - Conclusion: There is significant evidence at the 0.01 level that exercise frequency and health status are related/associated.
Part c: - Independence would mean exercise frequency and health status are unrelated - Knowing someone’s exercise frequency wouldn’t help predict their health status - The distribution of health status would be the same across all exercise levels - Rejecting independence means there IS an association
Part d: - Examine the contingency table with observed vs. expected frequencies - Calculate residuals to see which cells contribute most to χ² - Look at conditional percentages (e.g., % in Good health | High exercise) - Create a mosaic plot or grouped bar chart - Compute measures of association (like Cramér’s V) - This helps understand the nature and strength of the relationship
2.7 Section G: Correlation and Regression (Questions 27-30)
2.7.1 Question 27
A regression analysis examines the relationship between advertising spending (in $1000s) and monthly sales (in $1000s):
Regression Output:
Ŷ = 12.5 + 2.3X
r = 0.78, r² = 0.608
n = 25
SE(slope) = 0.42
- Interpret the slope in context.
- Estimate sales when advertising spending is $10,000.
- Interpret r².
Part a: - For every $1,000 increase in advertising spending, monthly sales are expected to increase by $2,300 on average. - The slope of 2.3 means sales increase by 2.3 (thousand dollars) per unit increase in advertising
Part b: - X = 10 (representing $10,000) - Ŷ = 12.5 + 2.3(10) = 12.5 + 23 = 35.5 - Estimated sales are $35,500
Part c: - r² = 0.608 means 60.8% of the variation in monthly sales can be explained by the linear relationship with advertising spending - The remaining 39.2% is due to other factors or random variation
2.7.2 Question 28
The correlation between hours of TV watched per week and GPA is r = -0.65.
- Describe the relationship between these variables.
- What proportion of variation in GPA is explained by TV watching?
- Does this correlation prove that watching TV causes lower GPA? Explain.
- If the p-value for testing H₀: ρ = 0 is 0.003, what can we conclude?
Part a: - There is a moderately strong, negative linear relationship between TV hours and GPA (|r| = 0.65) - As TV watching increases, GPA tends to decrease
Part b: - r² = (-0.65)² = 0.4225 - About 42.25% of the variation in GPA is explained by hours of TV watched
Part c: - No, correlation does not prove causation - Possible explanations: - TV watching might cause lower GPA (possible) - Lower GPA might cause more TV watching (reverse causation) - A third variable (like motivation, work hours, study habits) might affect both - This is an observational study, not a randomized experiment - Cannot establish causal relationship from correlation alone
Part d: - p-value = 0.003 < 0.05, so reject H₀: ρ = 0 - There is significant evidence of a linear relationship (correlation ≠ 0) - The negative relationship is statistically significant - However, this still doesn’t prove causation
2.7.3 Question 29
A regression of exam scores (Y) on hours studied (X) gives:
Ŷ = 35 + 5.2X
r² = 0.42
- Predict the exam score for a student who studies 8 hours.
- The mean hours studied is 7 with standard deviation 2.5. The mean exam score is 71.4. Calculate the correlation coefficient r.
- What does r² = 0.42 tell us?
- Would you feel confident predicting the exam score for someone who studied 20 hours? Why or why not?
Part a: - Ŷ = 35 + 5.2(8) = 35 + 41.6 = 76.6 - Predicted score is 76.6
Part b: - Check that the line passes through the point of means (X̄, Ȳ): 35 + 5.2(7) = 71.4 ✓ - The slope formula b = r(s_y/s_x) would require s_y, which is not given - Instead, use r² = 0.42: r = ±√0.42 = ±0.648 - Since the slope is positive, r = +0.648 or about 0.65
Part c: - 42% of the variation in exam scores is explained by the linear relationship with hours studied - 58% is due to other factors (aptitude, prior knowledge, test anxiety, etc.)
Part d: - No, should not feel confident - This is extrapolation (20 hours likely outside the range of data) - The linear relationship may not hold at extreme values - May encounter ceiling effects (scores can’t exceed 100) - Regression is most reliable within the range of observed X values
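The prediction, the sign of r and the extrapolation concern can be illustrated with a minimal Python sketch. The helper name `predict` is hypothetical.

```python
from math import sqrt, copysign

intercept, slope, r_squared = 35.0, 5.2, 0.42   # from the regression output

def predict(hours):
    """Predicted exam score from the fitted line."""
    return intercept + slope * hours

score_8 = predict(8)                      # Part a
r = copysign(sqrt(r_squared), slope)      # Part b: r takes the sign of the slope
score_20 = predict(20)                    # Part d: extrapolated prediction

print(round(score_8, 1), round(r, 3), round(score_20, 1))
```

The extrapolated value for 20 hours comes out well above 100, which illustrates the ceiling effect mentioned in Part d.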
2.7.4 Question 30
Below is regression output examining the relationship between years of experience (X) and salary in thousands (Y):
Coefficients:
Intercept: 45.2 (SE = 2.1, t = 21.5, p < 0.001)
Experience: 3.8 (SE = 0.4, t = 9.5, p < 0.001)
r² = 0.63, n = 50
- Write the regression equation.
- Interpret the intercept. Does it make practical sense?
- Test if the slope is significantly different from zero at α = 0.05.
- A person with 10 years of experience earns $95,000. What is the residual for this person?
Part a: - Ŷ = 45.2 + 3.8X - Where Y is salary in thousands and X is years of experience
Part b: - The intercept of 45.2 means the predicted starting salary (0 years experience) is $45,200 - This may or may not make practical sense depending on the field - Be cautious: if no one in the data had 0 years experience, this is extrapolation - However, it’s reasonably close to typical entry-level salaries in many fields
Part c: - t = 9.5 with p < 0.001 - p-value < α = 0.05, so reject H₀: β₁ = 0 - The slope is significantly different from zero - There is very strong evidence of a relationship between experience and salary
Part d: - Predicted: Ŷ = 45.2 + 3.8(10) = 45.2 + 38 = 83.2 thousand = $83,200 - Actual: Y = $95,000 = 95 thousand - Residual = Y - Ŷ = 95 - 83.2 = 11.8 thousand = $11,800 - This person earns $11,800 more than predicted (positive residual)
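The slope test statistic and the residual can be checked with a few lines of Python. This sketch assumes only the numbers shown in the regression output; the names are illustrative.

```python
intercept, slope = 45.2, 3.8      # coefficients from the output (salary in $1000s)
se_slope = 0.4                    # standard error of the slope from the output

# Part c: the t statistic for the slope is estimate / standard error
t_slope = slope / se_slope

# Part d: residual = observed - predicted
years, actual = 10, 95.0
predicted = intercept + slope * years
residual = actual - predicted

print(round(t_slope, 1), round(predicted, 1), round(residual, 1))
```

A positive residual means the observed salary lies above the fitted line.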
3 End of Practice Set
Key Reminders for the Exam:
- Always interpret results in context
- Check conditions before using tests
- Distinguish between statistical significance and practical importance
- Remember that correlation ≠ causation
- Be precise with language (reject vs. fail to reject, not accept)
- Show all work for partial credit. If you are using your calculator for computations, you must explain what you are calculating and write down the formula or process you are using. Numerical results without justification or explanation will receive minimal or no credit.
- State conclusions in context, not just as statistical decisions
Good luck on your final exam!