STATS 17 Practice Quiz

100 Questions Covering All Learning Objectives

Published: 28 April 2026

0.1 Instructions

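This practice set contains 100 questions designed to help you prepare for the final exam. Each question is followed by its answer and a worked solution. Answer choices are listed in order and are referred to by letter in the answers (1 = a, 2 = b, 3 = c, 4 = d).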


1 Part I: Multiple Choice Questions (70 questions)

1.1 Normal Distribution and Z-Scores (Questions 1-10)

1.1.1 Question 1

If X follows a normal distribution with μ = 100 and σ = 15, what is P(X > 115)?

  1. Smaller than 0.5
  2. Greater than 0.5
  3. Equal to 0.5
  4. There is no way to know with the available information

Answer: a) Smaller than 0.5

Solution: - z = (115 - 100) / 15 = 1.0 - P(Z > 1.0) = 1 - 0.8413 = 0.1587. However, you will not be able to calculate the exact number in the exam. You can tell whether it is smaller than, greater than, or equal to 0.5 by sketching the distribution and identifying the area under the curve that you are trying to calculate.
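For checking your work while studying (not in the exam), the exact area can be computed numerically. This is a minimal sketch that assumes Python 3.8+ and its standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# X ~ Normal(mu = 100, sigma = 15), as in Question 1
x_dist = NormalDist(mu=100, sigma=15)

z = (115 - 100) / 15            # z-score of the cutoff 115
p_above = 1 - x_dist.cdf(115)   # upper-tail area P(X > 115)

print(f"z = {z:.2f}, P(X > 115) = {p_above:.4f}")
print("Smaller than 0.5" if p_above < 0.5 else "Not smaller than 0.5")
```

Because 115 lies above the mean, the shaded area is less than half of the curve, which is all the exam asks you to recognize.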

1.1.2 Question 2

A z-score of -1.96 corresponds to what percentile in a standard normal distribution?

  1. 2.5th percentile
  2. 5th percentile
  3. 95th percentile
  4. 97.5th percentile

Answer: a) 2.5th percentile

Solution: - P(Z < -1.96) = 0.025, which is the 2.5th percentile. Once again, you may not have the tools to calculate this directly. However, you can use the critical value list in the formula list and recognize that 1.96 is the critical value used in 95% confidence intervals (2.5% in each tail).

1.1.3 Question 3

Test scores are normally distributed with μ = 70 and σ = 8. What score corresponds to the 95th percentile?

  1. 80.24
  2. 79.12
  3. 60.76
  4. 83.16

Answer: d) 83.16

Solution: - The 95th percentile leaves 5% of the distribution in the upper tail. A 90% CI also leaves 5% in each tail, so its critical value, z = 1.645 (from the critical value list in the formulas), is the z-score of the 95th percentile. Because we want the 95th percentile, we use only the positive tail.

  • X = μ + zσ = 70 + (1.645)(8) = 70 + 13.16 = 83.16
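The same percentile can be checked with an inverse CDF. A small sketch, assuming Python's standard-library `statistics.NormalDist` (3.8+); it is a study aid rather than an exam technique.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=8)    # test scores from Question 3

z_95 = NormalDist().inv_cdf(0.95)      # z-score with 95% of the area below it
score_95 = scores.inv_cdf(0.95)        # same percentile on the score scale

print(f"z = {z_95:.3f}, 95th percentile score = {score_95:.2f}")
```

The value agrees with X = μ + zσ because `inv_cdf` on the score scale applies exactly that shift and rescaling to the standard normal percentile.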

1.1.4 Question 4

For a normal distribution, what proportion of data (approximately) falls between μ - σ and μ + σ?

  1. 50%
  2. 68%
  3. 95%
  4. 99.7%

Answer: b) 68%

Solution: - This is the empirical rule (68-95-99.7 rule) - About 68% of data falls within 1 standard deviation of the mean - About 95% within 2 standard deviations - About 99.7% within 3 standard deviations

1.1.5 Question 5

If Z ~ N(0,1), what is P(-2.33 < Z < 2.33)?

  1. 0.98
  2. 0.68
  3. 0.95
  4. 0.75

Answer: a) 0.98

Solution: - P(-2.33 < Z < 2.33) = 0.98 by using the critical value for 98% CI
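A quick numerical check of this central area, sketched with the standard-library `statistics.NormalDist` (an assumption about your study environment, Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist()   # Z ~ N(0, 1)

# Area between the two critical values is the difference of two CDF values
p_middle = std_normal.cdf(2.33) - std_normal.cdf(-2.33)

print(f"P(-2.33 < Z < 2.33) = {p_middle:.4f}")
```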

1.1.6 Question 6

A value that is 2.5 standard deviations below the mean has a z-score of:

  1. 2.5
  2. -2.5
  3. 0.25
  4. -0.25

Answer: b) -2.5

Solution: - Values below the mean have negative z-scores - 2.5 standard deviations below means z = -2.5

1.1.7 Question 7

Heights of adult women are normally distributed with μ = 65 inches and σ = 3 inches. What is the probability a randomly selected woman is shorter than 62 inches?

  1. Smaller than 0.5
  2. Greater than 0.5
  3. Equal to 0.5
  4. There is no way to know with the available information

Answer: a) Smaller than 0.5

Solution: - z = (62 - 65) / 3 = -1.0 - P(Z < -1.0) = 0.1587. However, you will not be able to calculate the exact number in the exam. You can tell whether it is smaller than, greater than, or equal to 0.5 by sketching the distribution and identifying the area under the curve that you are trying to calculate.

1.1.8 Question 8

Which z-score represents a value furthest from the mean?

  1. z = 1.8
  2. z = -2.3
  3. z = 0.5
  4. z = -1.2

Answer: b) z = -2.3

Solution: - Distance from mean is measured by absolute value of z-score - |1.8| = 1.8, |-2.3| = 2.3, |0.5| = 0.5, |-1.2| = 1.2 - z = -2.3 is furthest from the mean

1.1.9 Question 9

If a normal distribution has μ = 50 and σ = 10, what is the value corresponding to z = 1.5?

  1. 35
  2. 65
  3. 55
  4. 75

Answer: b) 65

Solution: - X = μ + zσ = 50 + (1.5)(10) = 50 + 15 = 65

1.1.10 Question 10

The area under the entire standard normal curve equals:

  1. 0
  2. 0.5
  3. 1
  4. 100

Answer: c) 1

Solution: - The total area under any probability density function equals 1 - This represents the total probability of all possible outcomes


1.2 Central Limit Theorem (Questions 11-20)

1.2.1 Question 11

The Central Limit Theorem states that the sampling distribution of the sample mean will be approximately normal if:

  1. The population is normal or the sample size is large enough
  2. The population is always normal
  3. The sample size is less than 30
  4. The population variance is known

Answer: a) The population is normal or the sample size is large enough

Solution: - CLT applies when: (1) the population is normal (any n), OR (2) n is sufficiently large (typically n ≥ 30) - If the population is normal, sampling distribution of X̄ is normal for any sample size - If the population is not normal, we need a large enough sample for CLT to apply

1.2.2 Question 12

For the Central Limit Theorem to apply, a general rule of thumb is that n should be at least:

  1. 10
  2. 20
  3. 30
  4. 100

Answer: c) 30

Solution: - The rule of thumb is n ≥ 30 for CLT to apply when population is not normal - For more skewed populations, larger samples may be needed - For normal populations, any sample size works

1.2.3 Question 13

A population has μ = 80 and σ = 20. For samples of size 64, what is the standard error of the mean?

  1. 2.5
  2. 5.0
  3. 10.0
  4. 20.0

Answer: a) 2.5

Solution: - Standard error (SE) = σ/√n = 20/√64 = 20/8 = 2.5

1.2.4 Question 14

If the population distribution is highly skewed, the sampling distribution of X̄ will be approximately normal when:

  1. n is small
  2. n is sufficiently large
  3. σ is known
  4. Never

Answer: b) n is sufficiently large

Solution: - The Central Limit Theorem tells us that regardless of population shape, the sampling distribution of X̄ approaches normality as n increases - More skewed populations require larger samples

1.2.5 Question 15

The mean of the sampling distribution of X̄ equals:

  1. σ/√n
  2. μ/n
  3. μ
  4. σ

Answer: c) μ

Solution: - E(X̄) = μ, meaning the sampling distribution of X̄ is centered at the population mean - X̄ is an unbiased estimator of μ

1.2.6 Question 16

A population has σ = 12. To cut the standard error in half, the sample size must be:

  1. Doubled
  2. Quadrupled
  3. Tripled
  4. Cut in half

Answer: b) Quadrupled

Solution: - SE = σ/√n - To cut SE in half: σ/(√n) → σ/(2√n) - This requires √n → 2√n, so n → 4n - Sample size must be quadrupled
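The quadrupling rule can be verified numerically. In this sketch the helper `standard_error` and the starting sample size n = 25 are hypothetical choices made for illustration, since the question does not specify n.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12
n = 25                                  # hypothetical starting sample size
se_n = standard_error(sigma, n)         # SE with n observations
se_4n = standard_error(sigma, 4 * n)    # SE after quadrupling n

print(f"SE(n={n}) = {se_n}, SE(n={4 * n}) = {se_4n}")
```

Any starting n gives the same ratio of one half, because the factor of 4 becomes a factor of 2 under the square root.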

1.2.7 Question 17

For a population with μ = 50 and σ = 15, samples of size 100 are drawn. What is P(X̄ > 52)?

  1. Smaller than 0.5
  2. Greater than 0.5
  3. Equal to 0.5
  4. There is no way to know with the available information

Answer: a) Smaller than 0.5

Solution: - SE = σ/√n = 15/√100 = 1.5 - z = (52 - 50) / 1.5 = 1.33 - P(Z > 1.33) = 1 - 0.9082 = 0.0918. However, you will not be able to calculate the exact number in the exam. You can tell whether it is smaller than, greater than, or equal to 0.5 by sketching the distribution and identifying the area under the curve that you are trying to calculate.
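The calculation can be reproduced as follows. This is a sketch assuming Python's standard-library `statistics.NormalDist`; it treats X̄ as normal with standard deviation equal to the standard error, which is the CLT approximation used in the solution.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 15, 100

se = sigma / math.sqrt(n)               # standard error of the sample mean
xbar_dist = NormalDist(mu=mu, sigma=se) # approximate sampling distribution of X-bar

z = (52 - mu) / se
p_above = 1 - xbar_dist.cdf(52)         # P(X-bar > 52)

print(f"SE = {se}, z = {z:.2f}, P(X-bar > 52) = {p_above:.4f}")
```

The result is slightly below the table-based 0.0918 because the code keeps z unrounded instead of rounding it to 1.33.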

1.2.8 Question 18

According to the Central Limit Theorem, as sample size increases, the sampling distribution of X̄:

  1. Becomes more skewed
  2. Has larger standard deviation
  3. Becomes more concentrated around μ
  4. Approaches the population distribution

Answer: c) Becomes more concentrated around μ

Solution: - As n increases, SE = σ/√n decreases - This means the sampling distribution becomes narrower and more concentrated around μ - The sampling distribution approaches a normal distribution (not the population distribution)

1.2.9 Question 19

A population is uniformly distributed. The sampling distribution of X̄ for n = 50 will be:

  1. Uniform
  2. Skewed
  3. Approximately normal
  4. Bimodal

Answer: c) Approximately normal

Solution: - By CLT, even though the population is uniform (not normal), the sampling distribution of X̄ will be approximately normal when n is large - n = 50 is sufficiently large for CLT to apply

1.2.10 Question 20

If samples of size 25 are drawn from a normal population with σ = 10, the standard deviation of X̄ is:

  1. 0.4
  2. 2
  3. 5
  4. 10

Answer: b) 2

Solution: - Standard deviation of X̄ is the standard error: SE = σ/√n = 10/√25 = 10/5 = 2


1.3 Confidence Intervals (Questions 21-35)

1.3.1 Question 21

A 95% confidence interval means that:

  1. 95% of the data falls within the interval
  2. We are 95% confident the interval captures the true parameter
  3. The probability the parameter is in the interval is 95%
  4. 95% of sample means fall within the interval

Answer: b) We are 95% confident the interval captures the true parameter

Solution: - Correct interpretation: We are 95% confident this specific interval contains the true parameter - The parameter is fixed; the interval is random - If we repeated this process many times, about 95% of intervals would contain the true parameter

1.3.2 Question 22

A researcher constructs a 90% confidence interval for μ as (23.5, 28.5). The point estimate is:

  1. 5
  2. 26
  3. 2.5
  4. Cannot be determined

Answer: b) 26

Solution: - The point estimate (X̄) is at the center of the interval - Point estimate = (23.5 + 28.5) / 2 = 52 / 2 = 26

1.3.3 Question 23

When σ is unknown and n = 15, we should construct a confidence interval using:

  1. Z-distribution
  2. t-distribution with 15 degrees of freedom
  3. t-distribution with 14 degrees of freedom
  4. Normal distribution always

Answer: c) t-distribution with 14 degrees of freedom

Solution: - When σ is unknown, we use the t-distribution - Degrees of freedom = n - 1 = 15 - 1 = 14

1.3.4 Question 24

To decrease the width of a confidence interval, we can:

  1. Increase the confidence level
  2. Decrease the sample size
  3. Increase the sample size
  4. Increase the standard deviation

Answer: c) Increase the sample size

Solution: - CI width depends on margin of error: ME = critical value × SE - SE = s/√n, so increasing n decreases SE and thus narrows the interval - Increasing confidence level or σ would widen the interval

1.3.5 Question 25

A 99% confidence interval will be _____ a 95% confidence interval (all else equal):

  1. Narrower than
  2. Wider than
  3. The same width as
  4. Cannot determine

Answer: b) Wider than

Solution: - Higher confidence requires wider interval to be more “confident” we capture the parameter - 99% CI uses z* = 2.576 vs 95% CI uses z* = 1.96 - Larger critical value → larger margin of error → wider interval

1.3.6 Question 26

For a confidence interval for a proportion, we need np ≥ 10 and n(1-p) ≥ 10 to:

  1. Ensure the sample is random
  2. Ensure the normal approximation is valid
  3. Calculate the margin of error
  4. Determine the confidence level

Answer: b) Ensure the normal approximation is valid

Solution: - These are the success-failure conditions - We need enough successes (np ≥ 10) and failures (n(1-p) ≥ 10) - This ensures the sampling distribution of p̂ is approximately normal

1.3.7 Question 27

A sample of 100 gives X̄ = 45 with s = 12. The 95% confidence interval for μ is approximately:

  1. (42.6, 47.4)
  2. (43.0, 47.0)
  3. (44.0, 46.0)
  4. (40.5, 49.5)

Answer: a) (42.6, 47.4)

Solution: - SE = s/√n = 12/√100 = 1.2 - For large n, use z* ≈ 1.96 for 95% CI - ME = 1.96 × 1.2 = 2.352 ≈ 2.4 - CI: 45 ± 2.4 = (42.6, 47.4)
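The interval can be reproduced with a few lines of code. The helper `z_interval_mean` is a hypothetical name introduced here, and the sketch follows the solution in using z* rather than t* because n is large.

```python
import math
from statistics import NormalDist

def z_interval_mean(xbar, s, n, confidence):
    """Large-sample confidence interval for a mean: xbar +/- z* * s / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval_mean(xbar=45, s=12, n=100, confidence=0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Rounding the margin of error up to 2.4, as the solution does, gives the listed answer (42.6, 47.4).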

1.3.8 Question 28

The margin of error in a confidence interval is:

  1. Half the width of the interval
  2. The width of the interval
  3. The standard error
  4. The confidence level

Answer: a) Half the width of the interval

Solution: - CI = point estimate ± margin of error - Width = upper limit - lower limit = 2 × margin of error - Therefore, margin of error = width / 2

1.3.9 Question 29

A researcher wants to estimate a population proportion with margin of error 0.03 at 95% confidence. The required sample size is approximately:

  1. 267
  2. 384
  3. 1068
  4. 33

Answer: c) 1068

Solution: - When there is no prior estimate, use p = 0.5 (most conservative) - ME = z* √(p(1-p)/n) - 0.03 = 1.96 √(0.25/n) - n = (1.96)² × 0.25 / (0.03)² = 1067.1, which we round up to 1068 (always round up so the margin of error is not exceeded)
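The sample-size formula translates directly into code. The function name `sample_size_for_proportion` is hypothetical, and z* = 1.96 is taken from the critical value list rather than computed.

```python
import math

def sample_size_for_proportion(margin, z_star, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin; p = 0.5 is the conservative default."""
    return math.ceil(z_star ** 2 * p * (1 - p) / margin ** 2)

n_required = sample_size_for_proportion(margin=0.03, z_star=1.96)
print(n_required)
```

`math.ceil` rounds up, which matches the convention of never rounding a required sample size down.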

1.3.10 Question 30

If a 95% CI for μ is (40, 50), which null hypothesis would be rejected at α = 0.05?

  1. H₀: μ = 45
  2. H₀: μ = 48
  3. H₀: μ = 42
  4. H₀: μ = 55

Answer: d) H₀: μ = 55

Solution: - If a value is inside the 95% CI, we would not reject H₀ at α = 0.05 - If a value is outside the 95% CI, we would reject H₀ at α = 0.05 - 55 is outside the interval (40, 50), so we would reject H₀: μ = 55

1.3.11 Question 31

The t-distribution differs from the normal distribution in that it:

  1. Is more spread out with heavier tails
  2. Is always skewed
  3. Has mean different from 0
  4. Cannot be used for inference

Answer: a) Is more spread out with heavier tails

Solution: - t-distribution has heavier tails than normal (more probability in extremes) - As df increases, t-distribution approaches normal distribution - Both are symmetric with mean 0

1.3.12 Question 32

As degrees of freedom increase, the t-distribution:

  1. Becomes more skewed
  2. Approaches the normal distribution
  3. Becomes more spread out
  4. Stays exactly the same

Answer: b) Approaches the normal distribution

Solution: - As df → ∞, t-distribution → standard normal distribution - This is why we can use z-values for large samples

1.3.13 Question 33

A sample of 400 voters shows 220 favor a proposition. The 90% CI for the true proportion is approximately:

  1. (0.51, 0.59)
  2. (0.49, 0.61)
  3. (0.52, 0.58)
  4. (0.50, 0.60)

Answer: a) (0.51, 0.59)

Solution: - p̂ = 220/400 = 0.55 - SE = √(0.55 × 0.45 / 400) = √(0.0006188) = 0.0249 - z* for 90% CI = 1.645 - ME = 1.645 × 0.0249 = 0.041 - CI: 0.55 ± 0.041 = (0.509, 0.591) ≈ (0.51, 0.59)
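The steps above can be sketched in code. The variable names are illustrative, and z* = 1.645 is taken from the critical value list.

```python
import math

successes, n = 220, 400
z_star = 1.645                              # critical value for 90% confidence

p_hat = successes / n                       # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error of p-hat
margin = z_star * se                        # margin of error
low, high = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat}, SE = {se:.4f}, ME = {margin:.3f}")
print(f"90% CI: ({low:.2f}, {high:.2f})")
```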

1.3.14 Question 34

To halve the margin of error in a confidence interval (keeping everything else constant), you must:

  1. Double the sample size
  2. Quadruple the sample size
  3. Take the square root of the sample size
  4. Divide the sample size by 4

Answer: b) Quadruple the sample size

Solution: - ME ∝ 1/√n - To halve ME: need √n to double - If √n doubles, then n quadruples

1.3.15 Question 35

The critical value for a 98% confidence interval using the standard normal distribution is approximately:

  1. 1.96
  2. 2.33
  3. 2.58
  4. 1.645

Answer: b) 2.33

Solution: - This solution comes directly from the critical values listed in the formulas.
- 98% confidence means 2% in tails, so 1% in each tail - P(Z < z) = 0.99 - z = 2.33


1.4 Hypothesis Testing Fundamentals (Questions 36-50)

1.4.1 Question 36

The null hypothesis typically represents:

  1. The researcher’s belief
  2. The status quo or no effect
  3. The alternative theory
  4. The sample statistic

Answer: b) The status quo or no effect

Solution: - H₀ represents the claim of no difference, no effect, or status quo - It’s what we assume to be true unless we have strong evidence against it

1.4.2 Question 37

The p-value is the probability of:

  1. The null hypothesis being true
  2. The alternative hypothesis being true
  3. Observing data as extreme or more extreme than what we got, assuming H₀ is true
  4. Making a Type I error

Answer: c) Observing data as extreme or more extreme than what we got, assuming H₀ is true

Solution: - p-value = P(observing test statistic as extreme or more extreme | H₀ is true) - It measures the strength of evidence against H₀ - It is NOT the probability that H₀ is true

1.4.3 Question 38

If we reject H₀ when it is actually true, we have made:

  1. Type I error
  2. Type II error
  3. Correct decision
  4. No error

Answer: a) Type I error

Solution: - Type I error: Rejecting H₀ when H₀ is true (false positive) - P(Type I error) = α

1.4.4 Question 39

The probability of Type II error is denoted by:

  1. α
  2. β
  3. p
  4. 1 - α

Answer: b) β

Solution: - β = P(Type II error) = P(Fail to reject H₀ | H₀ is false) - Power = 1 - β

1.4.5 Question 40

Power of a test is:

  1. α
  2. β
  3. 1 - β
  4. 1 - α

Answer: c) 1 - β

Solution: - Power = P(Reject H₀ | H₀ is false) - Power = 1 - P(Type II error) = 1 - β - Higher power is better (more likely to detect a true effect)

1.4.6 Question 41

If α = 0.05 and p-value = 0.03, we should:

  1. Fail to reject H₀
  2. Reject H₀
  3. Accept H₀
  4. Cannot determine

Answer: b) Reject H₀

Solution: - Decision rule: If p-value < α, reject H₀ - 0.03 < 0.05, so we reject H₀

1.4.7 Question 42

A two-tailed test at α = 0.05 is equivalent to:

  1. A one-tailed test at α = 0.05
  2. A one-tailed test at α = 0.025
  3. Checking if the parameter equals the null value
  4. Using a 95% confidence interval

Answer: d) Using a 95% confidence interval

Solution: - A two-tailed test at α = 0.05 is equivalent to checking if the null value falls within a 95% CI - If the null value is outside the 95% CI, we reject at α = 0.05

1.4.8 Question 43

Which statement is correct?

  1. Failing to reject H₀ proves H₀ is true
  2. Rejecting H₀ proves Hₐ is true
  3. We never “accept” the null hypothesis
  4. P-value equals α

Answer: c) We never “accept” the null hypothesis

Solution: - We either reject H₀ or fail to reject H₀ - Failing to reject ≠ accepting; it just means insufficient evidence against H₀ - We never “prove” hypotheses with statistical tests

1.4.9 Question 44

To increase the power of a test, we can:

  1. Decrease sample size
  2. Increase α
  3. Decrease α
  4. Make the test two-tailed

Answer: b) Increase α

Solution: - Power = 1 - β = P(Reject H₀ | H₀ is false) - Increasing α makes it easier to reject H₀, thus increasing power - Also: increasing sample size, increasing effect size, or decreasing variance increases power

1.4.10 Question 45

A researcher finds a strong positive correlation (r = 0.82) between ice cream sales and drowning incidents. Which conclusion is most appropriate?

  1. Eating ice cream causes drowning
  2. Drowning causes people to buy ice cream
  3. A third variable (like temperature) likely affects both variables
  4. The strong correlation proves a causal relationship

Answer: c) A third variable (like temperature) likely affects both variables

Solution: - Correlation does NOT imply causation - This is a classic example of a confounding variable - Temperature (or summer weather) likely causes both increased ice cream sales and more swimming (leading to more drowning incidents) - The correlation between ice cream and drowning is spurious (not causal)

1.4.11 Question 46

The significance level α represents:

  1. P(Type II error)
  2. P(Type I error)
  3. The p-value
  4. Power

Answer: b) P(Type I error)

Solution: - α = P(Reject H₀ | H₀ is true) = P(Type I error) - Common values: α = 0.05, 0.01, 0.10

1.4.12 Question 47

A study finds that students who sit in the front rows of classrooms have higher exam scores on average than students who sit in the back (r = 0.65, p < 0.01). What can we conclude?

  1. Sitting in the front causes higher exam scores
  2. Higher exam scores cause students to sit in the front
  3. There is a significant association, but causation cannot be determined from this study
  4. Random assignment would eliminate this correlation

Answer: c) There is a significant association, but causation cannot be determined from this study

Solution: The correlation is statistically significant (p < 0.01), so there is evidence of a real association. However, this is an observational study, not an experiment, so causation cannot be established without a randomized experiment. Possible explanations:

  • Motivated students choose to sit in front AND study more
  • Better vision/hearing in front helps learning
  • Less distraction in front

Answer (d) is incorrect because random assignment would be part of designing an experiment, but it wouldn’t “eliminate” a real relationship.

1.4.13 Question 48

When comparing a p-value to α:

  1. If p-value < α, fail to reject H₀
  2. If p-value < α, reject H₀
  3. If p-value > α, reject H₀
  4. P-value and α are unrelated

Answer: b) If p-value < α, reject H₀

Solution: - Decision rule: Reject H₀ if p-value < α - Fail to reject H₀ if p-value ≥ α

1.4.14 Question 49

In hypothesis testing, we test:

  1. Sample statistics
  2. Population parameters
  3. Both statistics and parameters
  4. Neither

Answer: b) Population parameters

Solution: - Hypotheses are statements about population parameters (μ, p, σ, etc.) - We use sample statistics to make inferences about parameters

1.4.15 Question 50

A researcher obtains a test statistic of t = 2.5 with a p-value of 0.01. At α = 0.05, this provides:

  1. No evidence against H₀
  2. Evidence against H₀
  3. Nothing
  4. Cannot determine

Answer: b) Evidence against H₀

Solution: - p-value = 0.01 < 0.05, so we reject H₀ - A p-value this small indicates evidence against H₀


1.5 Two-Sample Tests and Effect Sizes (Questions 51-60)

1.5.1 Question 51

When comparing two population means with independent samples and unknown but equal variances, we use:

  1. Paired t-test
  2. Pooled t-test
  3. Z-test for proportions
  4. Chi-square test

Answer: b) Pooled t-test

Solution: - Equal variances → pooled t-test - Unequal variances → Welch’s t-test (unpooled) - Paired data → paired t-test (outside of the scope of our class)

1.5.2 Question 52

Cohen’s d = 0.8 represents:

  1. Small effect
  2. Medium effect
  3. Large effect
  4. No effect

Answer: c) Large effect

Solution: - Cohen’s standards: d = 0.2 (small), 0.5 (medium), 0.8 (large) - d = 0.8 is considered a large, practically meaningful effect

1.5.3 Question 53

A pooled t-test assumes:

  1. The samples are dependent
  2. Population variances are equal
  3. Population variances are unequal
  4. Sample sizes must be equal

Answer: b) Population variances are equal

Solution: - Pooled t-test pools the variances, assuming σ₁² = σ₂² - If variances are unequal, use Welch’s t-test instead

1.5.4 Question 54

To test H₀: p₁ = p₂ vs. Hₐ: p₁ ≠ p₂, we use:

  1. t-test
  2. ANOVA
  3. Two-proportion z-test
  4. Chi-square goodness of fit

Answer: c) Two-proportion z-test

Solution: - Comparing two population proportions → two-proportion z-test - Uses pooled proportion under H₀: p₁ = p₂

1.5.5 Question 55

When comparing two means with known population standard deviations, we use:

  1. t-test
  2. z-test
  3. F-test
  4. Chi-square test

Answer: b) z-test

Solution: - Known σ → z-test - Unknown σ → t-test - In practice, σ is almost always unknown

1.5.6 Question 56

Cohen’s d is calculated as:

  1. (X̄₁ - X̄₂) / s_pooled
  2. (X̄₁ - X̄₂) / SE
  3. s_pooled / (X̄₁ - X̄₂)
  4. SE / (X̄₁ - X̄₂)

Answer: a) (X̄₁ - X̄₂) / s_pooled

Solution: - Cohen’s d = (difference in means) / (pooled standard deviation) - Measures effect size in standard deviation units - Not affected by sample size (unlike test statistics)
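A minimal sketch of the calculation follows. The pooled standard deviation uses the standard equal-variance formula, which is an assumption here because the quiz does not spell it out; the function name and the summary statistics in the example are hypothetical.

```python
import math

def cohens_d(mean1, mean2, s1, s2, n1, n2):
    """Cohen's d = (mean1 - mean2) / s_pooled, with the usual pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical summary statistics: two groups of 30, both with s = 5
d_small_n = cohens_d(78, 74, 5, 5, 30, 30)
d_large_n = cohens_d(78, 74, 5, 5, 300, 300)

print(d_small_n, d_large_n)
```

Both calls give the same d because the means and standard deviations are unchanged, which illustrates that effect size does not grow with sample size the way a test statistic does.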

1.5.7 Question 57

According to Cohen’s standards, d = 0.4 is closest to:

  1. Small effect
  2. Medium effect
  3. Large effect
  4. Very large effect

Answer: b) Medium effect

Solution: - Cohen’s standards: 0.2 (small), 0.5 (medium), 0.8 (large) - d = 0.4 is between small and medium, but closer to medium

1.5.8 Question 58

When testing the difference between two proportions, the null hypothesis is typically:

  1. p₁ - p₂ = 1
  2. p₁ - p₂ = 0
  3. p₁/p₂ = 1
  4. p₁ + p₂ = 1

Answer: b) p₁ - p₂ = 0

Solution: - H₀: p₁ - p₂ = 0, which is equivalent to H₀: p₁ = p₂ - Tests if the two proportions are equal

1.5.9 Question 59

A researcher wants to estimate the average income of all residents in a city. She surveys 500 people who visit an upscale shopping mall on a Saturday afternoon and constructs a 95% confidence interval. What is the primary concern with this approach?

  1. The sample size is too small for the Central Limit Theorem to apply
  2. The sampling method is not random, so the confidence interval may not be valid
  3. A 99% confidence interval should be used instead
  4. The t-distribution should be used instead of the z-distribution

Answer: b) The sampling method is not random, so the confidence interval may not be valid

Solution: All inferential procedures (confidence intervals, hypothesis tests) require random sampling. This is a convenience sample from an upscale shopping mall, which likely:

  • Overrepresents higher-income individuals
  • Excludes people who don’t shop at malls
  • Only captures Saturday afternoon shoppers

The resulting confidence interval will be biased and not representative of all city residents.

  • Sample size (n = 500) is actually quite large, so (a) is incorrect
  • The issue isn’t about choosing a 95% vs 99% confidence level (c)
  • The issue isn’t about the z vs t distribution (d)

Key principle: Without random sampling, we cannot validly generalize from our sample to the population, regardless of sample size or statistical technique used.

1.5.10 Question 60

In a two-sample t-test, if we fail to reject H₀: μ₁ = μ₂, we conclude:

  1. μ₁ = μ₂ is definitely true
  2. There is insufficient evidence that μ₁ ≠ μ₂
  3. The samples are identical
  4. μ₁ > μ₂

Answer: b) There is insufficient evidence that μ₁ ≠ μ₂

Solution: - Failing to reject H₀ means we don’t have enough evidence to conclude the means differ - It does NOT prove the means are equal


1.6 Chi-Square Tests and ANOVA (Questions 61-70)

1.6.1 Question 61

The chi-square distribution is:

  1. Symmetric
  2. Always right-skewed
  3. Always left-skewed
  4. Can be negative

Answer: b) Always right-skewed

Solution: - χ² distribution is right-skewed (positive values only) - Approaches normal as df increases - Used for: tests of independence, goodness of fit, variance tests

1.6.2 Question 62

The degrees of freedom for a chi-square test of independence with a 3×4 contingency table is:

  1. 12
  2. 7
  3. 6
  4. 11

Answer: c) 6

Solution: - df = (r - 1)(c - 1) where r = rows, c = columns - df = (3 - 1)(4 - 1) = 2 × 3 = 6

1.6.3 Question 63

In ANOVA, the null hypothesis states that:

  1. All sample means are equal
  2. All population means are equal
  3. All population variances are equal
  4. Sample and population means are equal

Answer: b) All population means are equal

Solution: - H₀: μ₁ = μ₂ = μ₃ = … = μₖ - Tests if all k population means are equal

1.6.4 Question 64

The F-statistic in ANOVA is always:

  1. Negative
  2. Between -1 and 1
  3. Non-negative
  4. Greater than 1

Answer: c) Non-negative

Solution: - F = MSB / MSW (ratio of two variances) - Variances are always non-negative, so F ≥ 0 - F close to 1 suggests no difference in means

1.6.5 Question 65

MSB (Mean Square Between) measures:

  1. Variation within groups
  2. Variation between groups
  3. Total variation
  4. Sample variance

Answer: b) Variation between groups

Solution: - MSB = SSB / (k-1) measures variation between group means - MSW = SSW / (n-k) measures variation within groups - F = MSB / MSW
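To see how these pieces fit together, here is a small sketch. The sums of squares follow the standard one-way ANOVA definitions (an assumption, since the quiz only gives the mean-square formulas), and both the function name `anova_f` and the data are hypothetical.

```python
from statistics import fmean

def anova_f(groups):
    """Return (MSB, MSW, F) for a one-way ANOVA on a list of groups."""
    k = len(groups)                                   # number of groups
    all_values = [x for g in groups for x in g]
    n = len(all_values)                               # total observations
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group size times squared distance of each group mean
    ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each value from its own group mean
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return msb, msw, msb / msw

# Hypothetical data: three groups of three observations
msb, msw, f_stat = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(msb, msw, f_stat)
```

Turning F into a p-value requires the F distribution, which is not in the Python standard library and is left out of this sketch.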

1.6.6 Question 66

If the F-statistic in ANOVA is close to 1, this suggests:

  1. Strong evidence against H₀
  2. Group means are very different
  3. Little difference between group means
  4. The test is invalid

Answer: c) Little difference between group means

Solution: - F ≈ 1 means MSB ≈ MSW - Between-group variation is similar to within-group variation - Suggests group means are similar (fail to reject H₀)

1.6.7 Question 67

The chi-square test for independence tests whether:

  1. Two means are equal
  2. A distribution is normal
  3. Two categorical variables are related
  4. Variances are equal

Answer: c) Two categorical variables are related

Solution: - H₀: Variables are independent - Hₐ: Variables are related/associated/dependent - Uses contingency tables

1.6.8 Question 68

In a chi-square test, expected frequencies are calculated assuming:

  1. The alternative hypothesis is true
  2. The null hypothesis is true
  3. The sample is biased
  4. Variables are dependent

Answer: b) The null hypothesis is true

Solution: - Expected frequencies assume independence (H₀ is true) - E = (row total × column total) / grand total - Compare observed to expected frequencies
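The expected-count formula can be applied to a whole table at once. The function name and the observed counts below are hypothetical, chosen only to illustrate E = (row total × column total) / grand total.

```python
def expected_counts(observed):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[10, 20, 30],     # hypothetical 2 x 3 contingency table
            [20, 20, 20]]
expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)

for row in expected:
    print(row)
print("df =", df)
```

Each expected row has the same total as the corresponding observed row, which is a quick sanity check when doing this by hand.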

1.6.9 Question 69

For a one-way ANOVA with 4 groups and 40 total observations, the df for MSW is:

  1. 36
  2. 39
  3. 3
  4. 4

Answer: a) 36

Solution: - df for MSW (within groups) = n - k - n = 40 total observations, k = 4 groups - df = 40 - 4 = 36

1.6.10 Question 70

A statistically significant F-test in ANOVA tells us:

  1. All means are different
  2. At least one mean differs from the others
  3. Exactly which means differ
  4. All means are equal

Answer: b) At least one mean differs from the others

Solution: - Rejecting H₀ in ANOVA means at least one μᵢ ≠ μⱼ - Doesn’t tell us which specific means differ - Need post-hoc tests (e.g., Tukey’s HSD) to identify differences


2 Part II: Free Response Questions (30 questions)

2.1 Section A: Normal Distribution (Questions 1-3)

2.1.1 Question 1

Battery life for a certain laptop is normally distributed with μ = 6.5 hours and σ = 0.8 hours.

  1. What proportion of laptops have battery life between 6 and 7 hours?
  2. Find the battery life that represents the 75th percentile.
  3. If a laptop’s battery lasts 8 hours, is this unusual? Explain using the z-score.

Part a: - z₁ = (6 - 6.5) / 0.8 = -0.625 - z₂ = (7 - 6.5) / 0.8 = 0.625 - You won’t have a way to calculate the following probabilities in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(-0.625 < Z < 0.625) = P(Z < 0.625) - P(Z < -0.625) = 0.7340 - 0.2660 = 0.468 or 46.8%

Part b: - You won’t have a way to calculate the following percentile in the exam, but we may ask you to sketch the N(0,1) distribution and mark the values on the x-axis that you are looking for. - 75th percentile corresponds to z = 0.674 - X = μ + zσ = 6.5 + (0.674)(0.8) = 6.5 + 0.539 = 7.04 hours

Part c: - z = (8 - 6.5) / 0.8 = 1.875 - This is between 1.5 and 2 standard deviations above the mean - This is somewhat unusual (in the upper 3-4% of the distribution) - Not extremely unusual (would need |z| > 2 or 3 for that)
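All three parts can be checked numerically while studying. This sketch assumes Python's standard-library `statistics.NormalDist` (3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

battery = NormalDist(mu=6.5, sigma=0.8)

# Part a: area between 6 and 7 hours
p_between = battery.cdf(7) - battery.cdf(6)

# Part b: 75th percentile of battery life
p75 = battery.inv_cdf(0.75)

# Part c: z-score of an 8-hour battery and the area above it
z_8 = (8 - 6.5) / 0.8
p_above_8 = 1 - battery.cdf(8)

print(f"a) {p_between:.3f}  b) {p75:.2f} hours  c) z = {z_8:.3f}, upper tail = {p_above_8:.3f}")
```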

2.1.2 Question 2

SAT scores are normally distributed with μ = 1050 and σ = 200.

  1. What percentage of students score above 1300?
  2. What score represents the bottom 10% of all scores?
  3. Between what two scores (symmetric around the mean) do the middle 90% of students score?

Part a: - z = (1300 - 1050) / 200 = 1.25 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z > 1.25) = 1 - 0.8944 = 0.1056 or 10.56%

Part b: - You won’t have a way to calculate the following percentile in the exam, but we may ask you to sketch the N(0,1) distribution and mark the values on the x-axis that you are looking for. - Bottom 10% means z = -1.28 - X = 1050 + (-1.28)(200) = 1050 - 256 = 794

Part c: - Middle 90% leaves 5% in each tail - z-scores: ±1.645 (from the critical values listed in the formulas) - Lower bound: 1050 + (-1.645)(200) = 721 - Upper bound: 1050 + (1.645)(200) = 1379 - Middle 90% score between 721 and 1379
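The three SAT answers can be verified with the same tools. A sketch under the assumption that Python's standard-library `statistics.NormalDist` is available; names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=1050, sigma=200)

p_above_1300 = 1 - sat.cdf(1300)     # Part a: share scoring above 1300
score_p10 = sat.inv_cdf(0.10)        # Part b: cutoff for the bottom 10%
lower = sat.inv_cdf(0.05)            # Part c: middle 90% leaves 5% in each tail
upper = sat.inv_cdf(0.95)

print(f"a) {p_above_1300:.4f}  b) {score_p10:.0f}  c) ({lower:.0f}, {upper:.0f})")
```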

2.1.3 Question 3

A factory produces bolts with diameters that are normally distributed with μ = 10 mm and σ = 0.2 mm. Bolts are acceptable if their diameter is between 9.671 mm and 10.329 mm.

  1. What proportion of bolts are acceptable?
  2. If the factory produces 10,000 bolts per day, how many are expected to be unacceptable?
  3. What should the standard deviation be (keeping μ = 10) so that 99% of bolts are acceptable?

Part a: - z₁ = (9.671 - 10) / 0.2 = -1.645 - z₂ = (10.329 - 10) / 0.2 = 1.645 - P(-1.645 < Z < 1.645) = 90% (using the critical values listed in the formulas)

Part b: - Proportion unacceptable = 1 - 0.90 = 0.10 - Expected unacceptable = 10,000 × 0.10 = 1000 bolts

Part c: - For 99% acceptable, need P(9.671 < X < 10.329) = 0.99 - Need z = 2.576 for each endpoint (from the critical values listed in the formulas) - 10.329 = 10 + 2.576σ - σ = 0.329 / 2.576 = 0.1277 mm
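A numerical check of the bolt calculations, sketched with the standard-library `statistics.NormalDist` (assumed available, Python 3.8+):

```python
from statistics import NormalDist

lower, upper = 9.671, 10.329
bolts = NormalDist(mu=10, sigma=0.2)

# Part a: proportion of bolts inside the tolerance limits
p_ok = bolts.cdf(upper) - bolts.cdf(lower)

# Part b: expected number of unacceptable bolts out of 10,000
expected_bad = 10_000 * (1 - p_ok)

# Part c: sigma that puts the limits at the 99% critical value (0.5% in each tail)
z_star_99 = NormalDist().inv_cdf(0.995)
sigma_needed = (upper - 10) / z_star_99

print(f"a) {p_ok:.3f}  b) {expected_bad:.0f}  c) sigma = {sigma_needed:.4f} mm")
```

Part c works because the limits stay fixed: shrinking σ stretches the same 0.329 mm half-width over more standard deviations.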


2.2 Section B: Central Limit Theorem (Questions 4-6)

2.2.1 Question 4

A population of customer service wait times has μ = 12 minutes and σ = 4 minutes. The distribution is right-skewed.

  1. Can we use the Central Limit Theorem for samples of size n = 5? Why or why not?
  2. For samples of size n = 64, describe the sampling distribution of X̄.
  3. What is the probability that a sample of 64 customers has a mean wait time less than 11.5 minutes?

Part a: - No, we cannot reliably use CLT for n = 5 - The population is right-skewed, so we need a larger sample (typically n ≥ 30) - With n = 5, the sampling distribution will still be skewed

Part b: - By CLT, for n = 64 (large sample), X̄ is approximately normal - Mean of X̄: μ_X̄ = 12 minutes - Standard error: SE = σ/√n = 4/√64 = 0.5 minutes - X̄ ~ N(12, 0.5), where 0.5 is the standard deviation (not the variance)

Part c: - z = (11.5 - 12) / 0.5 = -1.0 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z < -1.0) = 0.1587 or 15.87%

2.2.2 Question 5

Monthly cell phone bills for a population have μ = $85 and σ = $20.

  1. For random samples of 100 customers, what is the mean and standard deviation of the sampling distribution of X̄?
  2. What is P(X̄ > $87)?
  3. Would it be unusual to observe a sample mean of $90? Explain.

Part a: - Mean of sampling distribution: μ_X̄ = μ = $85 - Standard deviation (SE): σ_X̄ = σ/√n = 20/√100 = $2

Part b: - z = (87 - 85) / 2 = 1.0 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z > 1.0) = 1 - 0.8413 = 0.1587 or 15.87%

Part c: - z = (90 - 85) / 2 = 2.5 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(Z > 2.5) = 0.0062 or 0.62% - Yes, this would be unusual (more than 2 standard errors from mean) - Only occurs about 0.6% of the time by chance
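
The sampling-distribution calculations above follow one pattern: compute the standard error, then treat X̄ as normal. A minimal sketch of that pattern, assuming the CLT approximation holds for n = 100; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85, 20, 100

# Part a: the sampling distribution of the mean is centered at mu
# with standard error sigma / sqrt(n)
se = sigma / sqrt(n)
xbar_dist = NormalDist(mu=mu, sigma=se)

# Part b: P(X-bar > 87)
p_above_87 = 1 - xbar_dist.cdf(87)

# Part c: how many standard errors away is 90, and how rare is it?
z_90 = (90 - mu) / se
p_above_90 = 1 - xbar_dist.cdf(90)

print(f"SE = {se:.2f}")
print(f"P(X-bar > 87) = {p_above_87:.4f}")
print(f"z for 90 = {z_90:.2f}, P(X-bar > 90) = {p_above_90:.4f}")
```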

2.2.3 Question 6

A population is uniformly distributed on the interval [0, 10].

  1. What are μ and σ for this population?
  2. For samples of size 36, describe the sampling distribution of X̄.
  3. Calculate P(4.5 < X̄ < 5.5) for n = 36.

Part a: - For uniform distribution on [a, b]: - μ = (a + b) / 2 = (0 + 10) / 2 = 5 - σ = (b - a) / √12 = 10 / √12 = 2.887

Part b: - By CLT, even though population is uniform, X̄ is approximately normal for n = 36 - μ_X̄ = 5 - SE = σ/√n = 2.887/√36 = 0.481 - X̄ ~ N(5, 0.481), where 0.481 is the standard deviation (not the variance)

Part c: - z₁ = (4.5 - 5) / 0.481 = -1.04 - z₂ = (5.5 - 5) / 0.481 = 1.04 - You won’t have a way to calculate the following probability in the exam, but we may ask you to sketch the N(0,1) distribution and color the area under the curve you are looking for. - P(-1.04 < Z < 1.04) = 0.8508 - 0.1492 = 0.7016 or 70.16%
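
As a check on the uniform-population example, the sketch below computes μ and σ from the interval endpoints and then applies the normal approximation for X̄. It assumes the CLT approximation is adequate at n = 36; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 0, 10, 36

# Part a: mean and standard deviation of a uniform distribution on [a, b]
mu = (a + b) / 2
sigma = (b - a) / sqrt(12)

# Part b: standard error of the sample mean
se = sigma / sqrt(n)

# Part c: P(4.5 < X-bar < 5.5) under the normal approximation
xbar_dist = NormalDist(mu=mu, sigma=se)
p_between = xbar_dist.cdf(5.5) - xbar_dist.cdf(4.5)

print(f"mu = {mu}, sigma = {sigma:.3f}, SE = {se:.3f}")
print(f"P(4.5 < X-bar < 5.5) = {p_between:.4f}")
```

The software result differs from the hand value in the fourth decimal place because the hand solution rounds z to 1.04 before looking up the probability.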


2.3 Section C: Confidence Intervals (Questions 7-12)

2.3.1 Question 7

A random sample of 50 students has mean GPA of 3.2 with standard deviation 0.6.

  1. Construct a 95% confidence interval for the true mean GPA.
  2. Interpret this interval in context.
  3. Based on this interval, is it plausible that the true mean GPA is 3.0? Explain.

Part a: - n = 50 (large), so use z* ≈ 1.96 for 95% CI (from list of critical values in formulas) - SE = s/√n = 0.6/√50 = 0.0849 - ME = 1.96 × 0.0849 = 0.166 - CI: 3.2 ± 0.166 = (3.034, 3.366)

Part b: - We are 95% confident that the true mean GPA for all students is between 3.034 and 3.366. - If we repeated this sampling process many times, about 95% of the intervals would contain the true population mean GPA.

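Part c: - No, μ = 3.0 is NOT plausible at the 95% confidence level: 3.0 lies below the lower bound of 3.034, so it is outside the interval. - However, about 5% of intervals built this way miss the true mean, so this conclusion could still be wrong. - Equivalently, we would reject H₀: μ = 3.0 in a two-tailed test at α = 0.05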
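
The interval arithmetic can be wrapped in a small helper. This is a sketch only: `z_interval` is a hypothetical function name, and the critical value 1.96 is taken from the formula sheet rather than computed.

```python
from math import sqrt

def z_interval(xbar, s, n, z_star):
    """Return (lower, upper) for xbar +/- z_star * s / sqrt(n)."""
    margin = z_star * s / sqrt(n)
    return xbar - margin, xbar + margin

# Question 7: n = 50, mean GPA 3.2, standard deviation 0.6, 95% confidence
lower, upper = z_interval(xbar=3.2, s=0.6, n=50, z_star=1.96)

# Part c: is the value 3.0 inside the interval?
contains_3 = lower <= 3.0 <= upper

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"Interval contains 3.0: {contains_3}")
```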

2.3.2 Question 8

A survey of 500 voters finds that 270 support a ballot measure.

  1. Calculate the sample proportion.
  2. Construct a 99% confidence interval for the true proportion of supporters.
  3. Based on this interval, is there evidence the measure will pass (needs >50%)? Explain.

Part a: - p̂ = 270/500 = 0.54

Part b: - Check conditions: np̂ = 270 ≥ 10, n(1-p̂) = 230 ≥ 10 ✓ - SE = √(p̂(1-p̂)/n) = √(0.54×0.46/500) = 0.0223 - z* = 2.576 for 99% CI - ME = 2.576 × 0.0223 = 0.057 - CI: 0.54 ± 0.057 = (0.483, 0.597) or (48.3%, 59.7%)

Part c: - No. The interval (0.483, 0.597) contains 0.50 and values below it, so at the 99% confidence level we cannot rule out that support is 50% or less - The sample proportion (54%) is above 50%, but this is not sufficient evidence that the measure will pass
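
The interval in Part b can be reproduced in a few lines. A minimal sketch, with z* = 2.576 copied from the solution rather than computed; the variable names are illustrative.

```python
from math import sqrt

x, n = 270, 500
z_star = 2.576  # 99% critical value from the formula sheet

# Part a: sample proportion
p_hat = x / n

# Part b: success/failure condition, standard error, and interval
conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10
se = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.2f}, conditions met: {conditions_ok}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```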

2.3.3 Question 9

A manufacturer wants to estimate the mean lifetime of light bulbs with margin of error 50 hours at 95% confidence. Previous studies suggest σ = 200 hours.

  1. What sample size is needed?
  2. If the desired margin of error is reduced to 25 hours, what sample size is needed?
  3. Explain why the sample size changes the way it does.

Part a: - ME = z* × σ/√n - 50 = 1.96 × 200/√n - √n = (1.96 × 200) / 50 = 7.84 - n = 61.47, round up to n = 62

Part b: - 25 = 1.96 × 200/√n - √n = (1.96 × 200) / 25 = 15.68 - n = 245.86, round up to n = 246

Part c: - To cut the margin of error in half (from 50 to 25), we need to quadruple the sample size (before rounding, 61.47 × 4 = 245.86, so n goes from 62 to 246) - This is because ME ∝ 1/√n - To reduce ME by factor of k, need to increase n by factor of k²
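
Solving ME = z* × σ/√n for n gives n = (z*σ/ME)², rounded up. The sketch below applies that formula; `sample_size_for_mean` is a hypothetical helper name, and 1.96 is the 95% critical value from the formula sheet.

```python
from math import ceil

def sample_size_for_mean(sigma, margin, z_star=1.96):
    """Smallest whole n with z_star * sigma / sqrt(n) <= margin."""
    return ceil((z_star * sigma / margin) ** 2)

n_50 = sample_size_for_mean(sigma=200, margin=50)
n_25 = sample_size_for_mean(sigma=200, margin=25)

print(f"ME = 50 hours: n = {n_50}")
print(f"ME = 25 hours: n = {n_25}")
```

Rounding up rather than to the nearest integer guarantees the margin of error does not exceed the target.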

2.3.4 Question 10

A sample of 15 measurements has X̄ = 42.5 and s = 6.8. Assume the population is normally distributed.

  1. Why must we use the t-distribution for inference?
  2. Construct a 90% confidence interval for μ.
  3. How would the interval change if n = 50 instead of 15 (all else equal)?

Part a: - We must use t-distribution because: - σ is unknown (we only have s) - Sample size is small (n = 15 < 30) - df = n - 1 = 14

Part b: - t* = 1.761 for 90% CI with df = 14 - SE = s/√n = 6.8/√15 = 1.756 - ME = 1.761 × 1.756 = 3.092 - CI: 42.5 ± 3.09 = (39.41, 45.59)

Part c: - With n = 50: - df = 49, t* ≈ 1.677 (closer to z* = 1.645) - SE = 6.8/√50 = 0.962 (much smaller) - ME = 1.677 × 0.962 = 1.61 (much smaller) - The interval would be much narrower due to larger sample size
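
The comparison between n = 15 and n = 50 can be laid out side by side. In this sketch the t* values are copied from the solution (they are table lookups, not computed here), and `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_star):
    """Return (lower, upper, margin) for xbar +/- t_star * s / sqrt(n)."""
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

# Part b: n = 15, df = 14, t* = 1.761 for 90% confidence
lo_15, hi_15, me_15 = t_interval(42.5, 6.8, 15, t_star=1.761)

# Part c: n = 50, df = 49, t* = 1.677 for 90% confidence
lo_50, hi_50, me_50 = t_interval(42.5, 6.8, 50, t_star=1.677)

print(f"n = 15: ({lo_15:.2f}, {hi_15:.2f}), ME = {me_15:.2f}")
print(f"n = 50: ({lo_50:.2f}, {hi_50:.2f}), ME = {me_50:.2f}")
```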

2.3.5 Question 11

A 95% confidence interval for the difference in mean salaries between two departments is ($2,000, $8,000).

  1. Interpret this interval in context.
  2. Is there significant evidence at α = 0.05 that mean salaries differ? Explain.
  3. What would change if we computed a 99% CI instead?

Part a: - We are 95% confident that the true difference in mean salaries between the two departments is between $2,000 and $8,000. - Department 1 appears to have higher mean salary than Department 2 by somewhere between $2,000 and $8,000.

Part b: - Yes, there is statistically significant evidence that mean salaries differ - The interval does not contain 0, so we would reject H₀: μ₁ - μ₂ = 0 at α = 0.05 - This is consistent with a two-tailed test rejecting H₀

Part c: - A 99% CI would be wider than the 95% CI - It might still exclude 0, but its lower bound would be closer to 0 (or could fall below 0) - We would be more confident but less precise

2.3.6 Question 12

Two samples are collected: Sample A (n = 100) and Sample B (n = 400), both with the same standard deviation.

  1. Which sample will produce a narrower confidence interval? Explain.
  2. How much narrower will it be?
  3. What does this tell us about the value of larger samples?

Part a: - Sample B (n = 400) will produce a narrower CI - CI width depends on SE = s/√n - Larger n → smaller SE → narrower CI

Part b: - SE_A = s/√100 = s/10 - SE_B = s/√400 = s/20 - SE_B = SE_A / 2 - Sample B’s CI will be half as wide as Sample A’s

Part c: - Larger samples provide more precise estimates - To cut width in half, need to quadruple sample size - Diminishing returns: going from 100 to 400 cuts width in half, but going from 400 to 1600 would be needed to cut in half again


2.4 Section D: Hypothesis Testing (Questions 13-18)

2.4.1 Question 13

A coffee shop claims the mean wait time is 5 minutes. A sample of 35 customers has X̄ = 5.8 minutes with s = 2.1 minutes. Test at α = 0.05.

  1. State the hypotheses in symbols and words.
  2. Calculate the test statistic.
  3. If the p-value is 0.027, what conclusion should be made? State it in context.
  4. What type of error might have been made? What would it mean in context?

Part a: - H₀: μ = 5 (The mean wait time is 5 minutes) - Hₐ: μ ≠ 5 (The mean wait time is not 5 minutes) - Two-tailed test

Part b: - t = (X̄ - μ₀) / (s/√n) - t = (5.8 - 5) / (2.1/√35) - t = 0.8 / 0.355 = 2.25

Part c: - p-value = 0.027 < α = 0.05, so reject H₀ - Conclusion: There is statistically significant evidence at the 0.05 level that the mean wait time is not 5 minutes. The data suggest the actual mean wait time is different from (likely greater than) the claimed 5 minutes.

Part d: - If we reject H₀, we might have made a Type I error - Type I error: Concluding the mean wait time is not 5 minutes when it actually is 5 minutes - Consequence: The coffee shop might unnecessarily change their operations based on incorrect conclusion

2.4.2 Question 14

A university claims that 70% of students graduate in 4 years. In a random sample of 200 students, 130 graduated in 4 years. Test at α = 0.01.

  1. State the hypotheses.
  2. Check if conditions for the test are met.
  3. The test statistic is z = -1.55 with p-value = 0.121. What conclusion should be made?
  4. Explain what Type I and Type II errors would mean in this context.

Part a: - H₀: p = 0.70 - Hₐ: p ≠ 0.70

Part b: - Random sample? yes - np₀ = 200(0.70) = 140 ≥ 10 ✓ - n(1-p₀) = 200(0.30) = 60 ≥ 10 ✓ - Conditions are met for normal approximation

Part c: - p-value = 0.121 > α = 0.01, so fail to reject H₀ - Conclusion: There is insufficient evidence at the 0.01 significance level to conclude that the graduation rate differs from 70%. The data are consistent with the university’s claim.

Part d: - Type I error: Concluding the rate is not 70% when it actually is 70% - Consequence: University might waste resources investigating a non-existent problem - Type II error: Failing to conclude the rate differs from 70% when it actually does differ - Consequence: University might not address a real problem with graduation rates
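
The test statistic and p-value quoted in the question can be recomputed from the sample counts. This is a minimal sketch of a two-tailed one-proportion z-test with illustrative variable names; it lands close to the stated z = -1.55 and p-value = 0.121, with small differences due to rounding.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 130, 200, 0.70, 0.01

p_hat = x / n

# The standard error uses p0 because the test assumes H0 is true
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Two-tailed p-value: area in both tails beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = p_value < alpha

print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```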

2.4.3 Question 15

A researcher tests H₀: μ = 100 vs. Hₐ: μ ≠ 100 at α = 0.05. Sample data: n = 50, X̄ = 103, s = 12.

  1. Calculate the test statistic.
  2. If t₀.₀₂₅,₄₉ = 2.010, what is the rejection region?
  3. What decision should be made?
  4. Construct a 95% CI for μ. How does this relate to your hypothesis test conclusion?

Part a: - t = (X̄ - μ₀) / (s/√n) - t = (103 - 100) / (12/√50) - t = 3 / 1.697 = 1.77

Part b: - Two-tailed test with α = 0.05 - Rejection region: t < -2.010 or t > 2.010

Part c: - t = 1.77 does not fall in rejection region - Fail to reject H₀ - There is insufficient evidence to conclude μ ≠ 100

Part d: - 95% CI: X̄ ± t* × SE - CI: 103 ± 2.010 × 1.697 = 103 ± 3.41 = (99.59, 106.41) - The interval contains 100, which is consistent with failing to reject H₀: μ = 100 - If 100 is in the 95% CI, we fail to reject H₀: μ = 100 at α = 0.05
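
The agreement between the test decision and the confidence interval can be seen by computing both from the same standard error. A sketch using the critical value 2.010 given in the question; the variable names are illustrative.

```python
from math import sqrt

xbar, mu0, s, n = 103, 100, 12, 50
t_crit = 2.010  # t(0.025, df = 49), given in the question

# Parts a-c: test statistic and two-tailed decision
se = s / sqrt(n)
t_stat = (xbar - mu0) / se
reject = abs(t_stat) > t_crit

# Part d: 95% confidence interval built from the same SE and critical value
margin = t_crit * se
ci = (xbar - margin, xbar + margin)
mu0_in_ci = ci[0] <= mu0 <= ci[1]

print(f"t = {t_stat:.2f}, reject H0: {reject}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), contains {mu0}: {mu0_in_ci}")
```

Because both use the same SE and critical value, the two-tailed test rejects exactly when the hypothesized value falls outside the interval.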

2.4.4 Question 16

A medical researcher claims a new drug reduces blood pressure by more than 10 points on average.

  1. Set up appropriate hypotheses.
  2. Describe Type I error in context and explain its consequences.
  3. Describe Type II error in context and explain its consequences.
  4. If you were the researcher, would you want α to be large or small? Why?

Part a: - H₀: μ ≤ 10 (Drug reduces BP by 10 or fewer points) - Hₐ: μ > 10 (Drug reduces BP by more than 10 points) - This is a right-tailed test

Part b: - Type I error: Concluding the drug reduces BP by more than 10 points when it actually doesn’t - Consequences: - Patients might be prescribed an ineffective drug - Healthcare resources wasted on inferior treatment - False hope given to patients - Potentially dangerous if they stop other effective treatments

Part c: - Type II error: Failing to conclude the drug reduces BP by more than 10 points when it actually does - Consequences: - An effective drug might not be approved or used - Patients miss out on beneficial treatment - Research investment wasted - Public health opportunity lost

Part d: - Want α to be small (like 0.01 or 0.05) - In medical research, Type I error is typically considered more serious - Don’t want to falsely claim a drug is effective - FDA requires strong evidence (small α) before approval - However, this increases β (Type II error probability)

2.4.5 Question 17

Two hypothesis tests are performed on the same data. Test 1 uses α = 0.01 and Test 2 uses α = 0.10.

  1. Which test is more likely to make a Type I error?
  2. Which test is more likely to make a Type II error?
  3. Which test has more power?
  4. Explain the trade-off between Type I and Type II errors.

Part a: - Test 2 (α = 0.10) is more likely to make Type I error - P(Type I error) = α - 0.10 > 0.01

Part b: - Test 1 (α = 0.01) is more likely to make Type II error - More stringent criterion means harder to reject H₀ - More likely to miss a true effect (β is larger)

Part c: - Test 2 (α = 0.10) has more power - Power = 1 - β - Larger α → easier to reject H₀ → higher power

Part d: - There is an inverse relationship between Type I and Type II errors - Decreasing α (being more conservative) increases β - Increasing α (being more liberal) decreases β but increases false positives - Need to balance based on consequences of each error type - Can improve both by increasing sample size
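
The trade-off can be made concrete with a small power calculation. This sketch assumes a right-tailed z-test and a hypothetical true effect of 2 standard errors; neither assumption comes from the question, and they are chosen only to illustrate the direction of the effect.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_right_tailed(alpha, shift_in_se):
    """Power of a right-tailed z-test when the true mean sits
    `shift_in_se` standard errors above the null value."""
    z_crit = Z.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is N(shift_in_se, 1)
    return 1 - Z.cdf(z_crit - shift_in_se)

SHIFT = 2.0  # hypothetical effect size, for illustration only

for alpha in (0.01, 0.10):
    power = power_right_tailed(alpha, SHIFT)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With everything else held fixed, the larger α gives the smaller β, which is the inverse relationship described in Part d.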

2.4.6 Question 18

A p-value of 0.08 is obtained for a hypothesis test.

  1. What decision would be made at α = 0.05?
  2. What decision would be made at α = 0.10?
  3. Is the p-value the probability that H₀ is true? Explain what it actually means.

Part a: - p-value = 0.08 > α = 0.05 - Fail to reject H₀

Part b: - p-value = 0.08 < α = 0.10 - Reject H₀

Part c: - No, the p-value is NOT the probability that H₀ is true - The p-value is: P(observing data as extreme or more extreme than what we got | H₀ is true) - It measures how surprising our data would be if H₀ were true - It’s the probability of the data given H₀, not the probability of H₀ given the data


2.5 Section E: Two-Sample Tests (Questions 19-22)

2.5.1 Question 19

Two teaching methods are compared. Method A: n₁ = 30, X̄₁ = 78, s₁ = 12. Method B: n₂ = 35, X̄₂ = 82, s₂ = 10. Assume equal variances.

  1. State hypotheses to test if mean scores differ.
  2. Calculate the pooled standard deviation.
  3. Calculate Cohen’s d and interpret the effect size.
  4. If the p-value is 0.14, what conclusion is made at α = 0.05?

Part a: - H₀: μ₁ = μ₂ (Mean scores are equal) - Hₐ: μ₁ ≠ μ₂ (Mean scores differ)

Part b: - s_pooled = √[((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)] - s_pooled = √[(29×144 + 34×100) / 63] - s_pooled = √[(4176 + 3400) / 63] - s_pooled = √(7576/63) = √120.25 = 10.97

Part c: - Cohen’s d = (X̄₁ - X̄₂) / s_pooled - d = (78 - 82) / 10.97 = -4 / 10.97 = -0.36 - |d| = 0.36, which is between small (0.2) and medium (0.5) - This represents a small to medium effect size - Method B has slightly higher scores than Method A

Part d: - p-value = 0.14 > α = 0.05 - Fail to reject H₀ - Conclusion: There is insufficient statistical evidence at the 0.05 level to conclude that the mean scores differ between the two teaching methods.
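
The pooled standard deviation and Cohen's d can be checked with two short functions. A minimal sketch; `pooled_sd` and `cohens_d` are hypothetical helper names, not functions from a course library.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation for two independent samples."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d(mean1, mean2, sp):
    """Standardized difference in means (group 1 minus group 2)."""
    return (mean1 - mean2) / sp

# Method A: n = 30, mean 78, s = 12; Method B: n = 35, mean 82, s = 10
sp = pooled_sd(30, 12, 35, 10)
d = cohens_d(78, 82, sp)

print(f"s_pooled = {sp:.2f}")
print(f"Cohen's d = {d:.2f}")
```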

2.5.2 Question 20

A company tests whether the proportion of defects differs between two production lines. Line 1: 15 defects in 200 items. Line 2: 25 defects in 250 items.

  1. Calculate both sample proportions.
  2. State appropriate hypotheses.
  3. Calculate the pooled proportion.
  4. If the test statistic is z = -0.82 with p-value = 0.41, what is the conclusion at α = 0.05?

Part a: - p̂₁ = 15/200 = 0.075 - p̂₂ = 25/250 = 0.100

Part b: - H₀: p₁ = p₂ (Defect rates are equal) - Hₐ: p₁ ≠ p₂ (Defect rates differ)

Part c: - p̂_pooled = (x₁ + x₂) / (n₁ + n₂) - p̂_pooled = (15 + 25) / (200 + 250) - p̂_pooled = 40/450 = 0.089

Part d: - p-value = 0.41 > α = 0.05 - Fail to reject H₀ - Conclusion: There is insufficient evidence at the 0.05 level to conclude that the defect rates differ between the two production lines. The observed difference could easily be due to random variation.

2.5.3 Question 21

A study compares weight loss for two diet plans. Diet A: n = 25, X̄ = 8.2 lbs, s = 3.1 lbs. Diet B: n = 20, X̄ = 6.5 lbs, s = 2.8 lbs.

  1. If we assume equal population variances, should we use a pooled or unpooled test?
  2. Calculate the pooled variance.
  3. The 95% CI for μ₁ - μ₂ is (-0.3, 3.7). Interpret this interval.
  4. Based on this CI, what would you conclude about H₀: μ₁ = μ₂ at α = 0.05?

Part a: - If we assume equal variances, use a pooled test - Pooled t-test combines variance estimates for more power

Part b: - s²_pooled = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁+n₂-2) - s²_pooled = [24×9.61 + 19×7.84] / 43 - s²_pooled = [230.64 + 148.96] / 43 - s²_pooled = 379.6 / 43 = 8.83 - s_pooled = √8.83 = 2.97 lbs

Part c: - We are 95% confident that the true difference in mean weight loss (Diet A - Diet B) is between -0.3 and 3.7 pounds. - Diet A might result in 0.3 lbs less weight loss up to 3.7 lbs more weight loss than Diet B - The interval includes both positive and negative values

Part d: - The interval contains 0, so we fail to reject H₀: μ₁ = μ₂ at α = 0.05 - There is insufficient evidence to conclude the diets differ in effectiveness - The difference is not statistically significant

2.5.4 Question 22

Below is computer output comparing mean salaries for two positions:

Two-sample t-test

         n    Mean    Std Dev
Group 1  40   62500   8200
Group 2  45   58300   7500

t = 2.56, p-value = 0.012
95% CI for difference: (950, 7450)
Cohen's d = 0.53
  1. Interpret the p-value in context.
  2. What conclusion should be made at α = 0.05?
  3. Interpret Cohen’s d. Is this a meaningful difference?
  4. Interpret the confidence interval.

Part a: - p-value = 0.012 means: - If the true mean salaries were equal, there would be a 1.2% probability of observing a difference in sample means at least as large as $4,200 (in either direction) just by chance - This provides statistical evidence that the population mean salaries differ

Part b: - p-value = 0.012 < α = 0.05, so reject H₀ - Conclusion: There is statistically significant evidence at the 0.05 level that mean salaries differ between the two positions. Position 1 has significantly higher mean salary than Position 2.

Part c: - Cohen’s d = 0.53 represents a medium effect size (between 0.5 and 0.8) - The difference is about half a standard deviation - This is a practically meaningful difference, not just statistically significant - The higher salary for Position 1 is both statistically and practically significant

Part d: - We are 95% confident that the true difference in mean salaries (Position 1 - Position 2) is between $950 and $7,450 - Position 1’s mean salary is somewhere between about $1,000 and $7,500 higher than Position 2’s - The interval does not contain 0, consistent with rejecting H₀


2.6 Section F: Chi-Square and ANOVA (Questions 23-26)

2.6.1 Question 23

A chi-square test examines if major choice (Business, STEM, Arts) is independent of whether students live on campus. The output shows:

Chi-square test of independence

χ² = 8.45, df = 2, p-value = 0.015
  1. State the null and alternative hypotheses.
  2. What conclusion should be made at α = 0.05?
  3. What does this conclusion mean in practical terms?
  4. What additional information would help you interpret these results?

Part a: - H₀: Major choice and living on campus are independent - Hₐ: Major choice and living on campus are related/associated/dependent

Part b: - p-value = 0.015 < α = 0.05, so reject H₀ - Conclusion: There is significant evidence at the 0.05 level that major choice and living arrangements are related

Part c: - Students in different majors have different patterns of living on vs. off campus - For example, maybe STEM students are more likely to live on campus than Arts students - The association is statistically significant - However, we don’t know the nature of the relationship from just the chi-square test

Part d: - The contingency table with observed frequencies would help - Expected frequencies to see where largest deviations occur - Percentages/proportions for each major - Sample size (n) - Residuals to identify which cells contribute most to χ²
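
For the special case df = 2, the chi-square upper-tail probability has the closed form exp(-χ²/2), so the reported p-value can be checked without a table. A minimal sketch; the function name is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
from math import exp

def chi2_p_value_df2(chi2_stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.
    Valid ONLY for df = 2, where the distribution is exponential with mean 2."""
    return exp(-chi2_stat / 2)

p_value = chi2_p_value_df2(8.45)

print(f"p-value = {p_value:.3f}")
print("Reject H0 at alpha = 0.05" if p_value < 0.05 else "Fail to reject H0")
```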

2.6.2 Question 24

An ANOVA compares mean productivity scores for four different work schedules:

ANOVA Table

Source      SS     df    MS      F      p-value
Between    320     3    106.67  4.27   0.008
Within     900    36     25
Total     1220    39
  1. How many workers were in this study?
  2. State the null and alternative hypotheses.
  3. What is the conclusion at α = 0.01?
  4. Can we determine which specific schedules differ based on this output alone?

Part a: - Total df = n - 1 = 39 - Therefore n = 40 workers

Part b: - H₀: μ₁ = μ₂ = μ₃ = μ₄ (All four schedules have equal mean productivity) - Hₐ: At least one mean differs from the others

Part c: - p-value = 0.008 < α = 0.01, so reject H₀ - Conclusion: There is significant evidence at the 0.01 level that mean productivity differs across the four work schedules

Part d: - No, we cannot determine which specific schedules differ - ANOVA only tells us that at least one mean is different - Need post-hoc tests (like Tukey’s HSD) to identify which pairs of means differ - Post-hoc procedures control the overall Type I error rate across the multiple pairwise comparisons
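
The entries of the ANOVA table are linked by simple arithmetic, which is how the sample size in Part a is recovered. The sketch below rebuilds the mean squares, F, and n from the SS and df columns; the variable names are illustrative.

```python
# SS and df values copied from the ANOVA table
ss_between, df_between = 320, 3
ss_within, df_within = 900, 36

# Mean square = sum of squares / degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group to within-group mean squares
f_stat = ms_between / ms_within

# df_between = k - 1 and df_total = n - 1
n_groups = df_between + 1
n_total = df_between + df_within + 1

print(f"MS_between = {ms_between:.2f}, MS_within = {ms_within:.2f}")
print(f"F = {f_stat:.2f}, groups = {n_groups}, n = {n_total}")
```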

2.6.3 Question 25

A study tests if customer satisfaction ratings (on a 50-point scale) differ across three store locations. Location A: n = 20, X̄ = 42. Location B: n = 25, X̄ = 38. Location C: n = 22, X̄ = 45.

ANOVA: F = 5.82, p-value = 0.005
  1. What is the response variable? Is it quantitative or categorical?
  2. What is the explanatory variable? How many levels does it have?
  3. What conclusion should be drawn at α = 0.05?
  4. If you reject H₀, does this mean all three locations have different mean ratings? Explain.

Part a: - Response variable: Customer satisfaction rating - This is quantitative (numerical scale from 1-50) - Could be treated as continuous for ANOVA purposes

Part b: - Explanatory variable: Store location - This is categorical with 3 levels (Location A, B, C) - Also called the factor or grouping variable

Part c: - p-value = 0.005 < α = 0.05, so reject H₀ - Conclusion: There is statistically significant evidence at the 0.05 level that mean customer satisfaction ratings differ across the three store locations.

Part d: - No, this does NOT mean all three locations have different ratings - It means at least one location differs from the others - Possibilities: - A ≠ B ≠ C (all different) - A = B ≠ C - A ≠ B = C - A = C ≠ B - Need post-hoc comparisons to determine which specific pairs differ

2.6.4 Question 26

A contingency table shows the relationship between exercise frequency (Low, Medium, High) and health status (Poor, Fair, Good, Excellent).

  1. How many degrees of freedom does the chi-square test have?
  2. If χ² = 18.5 and the p-value = 0.005, what is the conclusion at α = 0.01?
  3. What does “independence” mean in this context?
  4. If the test is significant, what additional analysis might be helpful?

Part a: - df = (r - 1)(c - 1) - r = 3 rows (Low, Medium, High) - c = 4 columns (Poor, Fair, Good, Excellent) - df = (3-1)(4-1) = 2 × 3 = 6

Part b: - p-value = 0.005 < α = 0.01, so reject H₀ - Conclusion: There is significant evidence at the 0.01 level that exercise frequency and health status are related/associated.

Part c: - Independence would mean exercise frequency and health status are unrelated - Knowing someone’s exercise frequency wouldn’t help predict their health status - The distribution of health status would be the same across all exercise levels - Rejecting independence means there IS an association

Part d: - Examine the contingency table with observed vs. expected frequencies - Calculate residuals to see which cells contribute most to χ² - Look at conditional percentages (e.g., % in Good health | High exercise) - Create a mosaic plot or grouped bar chart - Compute measures of association (like Cramér’s V) - This helps understand the nature and strength of the relationship


2.7 Section G: Correlation and Regression (Questions 27-30)

2.7.1 Question 27

A regression analysis examines the relationship between advertising spending (in $1000s) and monthly sales (in $1000s):

Regression Output:
Ŷ = 12.5 + 2.3X
r = 0.78, r² = 0.608
n = 25
SE(slope) = 0.42
  1. Interpret the slope in context.
  2. Estimate sales when advertising spending is $10,000.
  3. Interpret r².

Part a: - For every $1,000 increase in advertising spending, monthly sales are expected to increase by $2,300 on average. - The slope of 2.3 means sales increase by 2.3 (thousand dollars) per unit increase in advertising

Part b: - X = 10 (representing $10,000) - Ŷ = 12.5 + 2.3(10) = 12.5 + 23 = 35.5 - Estimated sales are $35,500

Part c: - r² = 0.608 means 60.8% of the variation in monthly sales can be explained by the linear relationship with advertising spending - The remaining 39.2% is due to other factors or random variation
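
The prediction and r² can be checked directly from the output. A minimal sketch; `predict_sales` is a hypothetical helper name, and both variables are in thousands of dollars as in the question.

```python
def predict_sales(ad_spend_thousands):
    """Fitted line from the output: Y-hat = 12.5 + 2.3 X (both in $1000s)."""
    return 12.5 + 2.3 * ad_spend_thousands

# Part b: advertising of $10,000 corresponds to X = 10
predicted = predict_sales(10)

# Part c: r-squared is the square of the correlation
r = 0.78
r_squared = r ** 2

print(f"Predicted sales: ${predicted * 1000:,.0f}")
print(f"r^2 = {r_squared:.3f}")
```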

2.7.2 Question 28

The correlation between hours of TV watched per week and GPA is r = -0.65.

  1. Describe the relationship between these variables.
  2. What proportion of variation in GPA is explained by TV watching?
  3. Does this correlation prove that watching TV causes lower GPA? Explain.
  4. If the p-value for testing H₀: ρ = 0 is 0.003, what can we conclude?

Part a: - There is a moderately strong, negative linear relationship between TV hours and GPA (r = -0.65) - As TV watching increases, GPA tends to decrease

Part b: - r² = (-0.65)² = 0.4225 - About 42.25% of the variation in GPA is explained by hours of TV watched

Part c: - No, correlation does not prove causation - Possible explanations: - TV watching might cause lower GPA (possible) - Lower GPA might cause more TV watching (reverse causation) - A third variable (like motivation, work hours, study habits) might affect both - This is an observational study, not a randomized experiment - Cannot establish causal relationship from correlation alone

Part d: - p-value = 0.003 < 0.05, so reject H₀: ρ = 0 - There is significant evidence of a linear relationship (correlation ≠ 0) - The negative relationship is statistically significant - However, this still doesn’t prove causation

2.7.3 Question 29

A regression of exam scores (Y) on hours studied (X) gives:

Ŷ = 35 + 5.2X
r² = 0.42
  1. Predict the exam score for a student who studies 8 hours.
  2. The mean hours studied is 7 with standard deviation 2.5. The mean exam score is 71.4. Calculate the correlation coefficient r.
  3. What does r² = 0.42 tell us?
  4. Would you feel confident predicting the exam score for someone who studied 20 hours? Why or why not?

Part a: - Ŷ = 35 + 5.2(8) = 35 + 41.6 = 76.6 - Predicted score is 76.6

Part b: - Check: the regression line passes through (X̄, Ȳ): 35 + 5.2(7) = 71.4 ✓ (confirms the equation) - The formula b = r(s_y/s_x) would require s_y, which is not given, so use r² instead - r² = 0.42, so r = ±√0.42 = ±0.648 - Since the slope is positive, r = +0.648, or about 0.65

Part c: - 42% of the variation in exam scores is explained by the linear relationship with hours studied - 58% is due to other factors (aptitude, prior knowledge, test anxiety, etc.)

Part d: - No, should not feel confident - This is extrapolation (20 hours likely outside the range of data) - The linear relationship may not hold at extreme values - May encounter ceiling effects (scores can’t exceed 100) - Regression is most reliable within the range of observed X values
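
The sketch below reproduces the prediction, recovers r from r² using the sign of the slope, and shows what the line returns at 20 hours. The names are illustrative, and the 100-point maximum is taken from the ceiling-effect remark in Part d.

```python
from math import copysign, sqrt

INTERCEPT, SLOPE, R_SQUARED = 35, 5.2, 0.42

def predict_score(hours):
    """Fitted line from the output: Y-hat = 35 + 5.2 X."""
    return INTERCEPT + SLOPE * hours

# Part b: r has the same sign as the slope
r = copysign(sqrt(R_SQUARED), SLOPE)

at_mean = predict_score(7)   # the line passes through (X-bar, Y-bar)
at_8 = predict_score(8)      # Part a
at_20 = predict_score(20)    # Part d: extrapolation

print(f"r = {r:.3f}")
print(f"Prediction at 7 h: {at_mean:.1f}, at 8 h: {at_8:.1f}")
print(f"Prediction at 20 h: {at_20:.1f}")
```

At 20 hours the line predicts a score above 100, which is one concrete sign that the linear pattern cannot hold that far outside the data.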

2.7.4 Question 30

Below is regression output examining the relationship between years of experience (X) and salary in thousands (Y):

Coefficients:
Intercept: 45.2 (SE = 2.1, t = 21.5, p < 0.001)
Experience: 3.8 (SE = 0.4, t = 9.5, p < 0.001)

r² = 0.63, n = 50
  1. Write the regression equation.
  2. Interpret the intercept. Does it make practical sense?
  3. Test if the slope is significantly different from zero at α = 0.05.
  4. A person with 10 years of experience earns $95,000. What is the residual for this person?

Part a: - Ŷ = 45.2 + 3.8X - Where Y is salary in thousands and X is years of experience

Part b: - The intercept of 45.2 means the predicted starting salary (0 years experience) is $45,200 - This may or may not make practical sense depending on the field - Be cautious: if no one in the data had 0 years experience, this is extrapolation - However, it’s reasonably close to typical entry-level salaries in many fields

Part c: - t = 9.5 with p < 0.001 - p-value < α = 0.05, so reject H₀: β₁ = 0 - The slope is significantly different from zero - There is very strong evidence of a relationship between experience and salary

Part d: - Predicted: Ŷ = 45.2 + 3.8(10) = 45.2 + 38 = 83.2 thousand = $83,200 - Actual: Y = $95,000 = 95 thousand - Residual = Y - Ŷ = 95 - 83.2 = 11.8 thousand = $11,800 - This person earns $11,800 more than predicted (positive residual)
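
The slope's t statistic and the residual both come straight from the output. A minimal sketch with illustrative variable names; salary is in thousands of dollars, as in the question.

```python
intercept, slope = 45.2, 3.8
se_slope = 0.4

# Part c: t statistic for H0: beta_1 = 0 is the estimate divided by its SE
t_slope = slope / se_slope

# Part d: residual = actual - predicted for 10 years of experience
years, actual = 10, 95
predicted = intercept + slope * years
residual = actual - predicted

print(f"t for slope = {t_slope:.1f}")
print(f"Predicted = {predicted:.1f}, residual = {residual:.1f} (thousands)")
```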


3 End of Practice Set

Key Reminders for the Exam:

  1. Always interpret results in context
  2. Check conditions before using tests
  3. Distinguish between statistical significance and practical importance
  4. Remember that correlation ≠ causation
  5. Be precise with language (reject vs. fail to reject, not accept)
  6. Show all work for partial credit. If you are using your calculator for computations, you must explain what you are calculating and write down the formula or process you are using. Numerical results without justification or explanation will receive minimal or no credit.
  7. State conclusions in context, not just as statistical decisions

Good luck on your final exam!