Sampling Distributions and the Central Limit Theorem
A researcher wants to know the average sleep duration of college students per day.
Population: All college students
Parameter: Average sleep hours (μ) per day
Sample: A random sample of 100 students surveyed
Statistic: Sample mean = 6.8 hours (\(\bar{x}\))
Key Question: How confident can we be that 6.8 hours represents the true population average?
Scenario: You want to estimate the average mercury level in anchovies fished in Monterey Bay.
You obtain a random sample of 25 anchovies from local fisheries and test them, finding an average mercury level of 0.055 ppm.
Questions to discuss (2 minutes):
Quick Report: Answer the poll with your opinion ✋
We use statistics to estimate parameters, but statistics vary due to sampling variability.
Let’s conduct an experiment together!
Activity: Estimate the average height (in cm) of the students in this class
What we’ll observe: Different samples → different sample means (sampling variability)
Even from the same population, different samples give different results!
Example: If we took 1000 different samples of 25 fish, we’d get 1000 different sample means. The sampling distribution shows how those means are distributed.
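The fish example can be sketched as a small simulation. The population here is hypothetical: mercury levels are drawn from a gamma distribution whose shape and scale are assumptions chosen only so that the mean is about 0.055 ppm. They are not real data. The helper name `sample_means` is also just for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population of mercury levels (ppm). The gamma shape and
# scale are illustrative assumptions, chosen so the mean is about 0.055.
population = [random.gammavariate(4.0, 0.055 / 4.0) for _ in range(100_000)]

def sample_means(pop, n, reps):
    """Draw `reps` random samples of size `n` and return their means."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(reps)]

# 1000 samples of 25 fish each: one sample mean per sample.
means = sample_means(population, n=25, reps=1000)

print(f"population mean:      {statistics.fmean(population):.4f}")
print(f"mean of sample means: {statistics.fmean(means):.4f}")
print(f"SD of sample means:   {statistics.stdev(means):.4f}")
```

The list `means` is an approximation of the sampling distribution: its center sits near the population mean, and its spread is much smaller than the spread of individual fish.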
Scenario: You’re studying bird migration distances. You take samples of 50 birds each.
Sample 1: \(\bar{x}_1\) = 2,430 km
Sample 2: \(\bar{x}_2\) = 2,510 km
Sample 3: \(\bar{x}_3\) = 2,465 km
Questions to discuss (2 minutes):
Quick Report: What affects how much sample means vary? ✋
In plain language:
When you take sufficiently large random samples from almost any population (any population with a finite variance), the sampling distribution of the sample mean will be approximately normal (bell-shaped).
[Figure: a population distribution (can be any shape!) next to the sampling distribution of \(\bar{x}\) (approximately normal!)]
The magic: Regardless of the population’s shape, sample means follow a normal pattern!
Scenario: A biologist measures shell thickness in snails. The population distribution is heavily right-skewed (most snails have thin shells, few have very thick shells).
She takes samples of n = 40 snails at a time and calculates the average shell thickness for each sample.
Questions to discuss (2 minutes):
Quick Report: What’s the difference between the population distribution and the sampling distribution? ✋
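A rough way to see the snail scenario in action is to simulate it. The sketch below assumes a hypothetical right-skewed population (an exponential distribution with mean 1.0 mm, chosen only for illustration) and compares the skewness of the population with the skewness of the means of samples of n = 40. The `skewness` helper is defined here and is not a library function.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Hypothetical right-skewed shell thickness (mm): exponential with mean 1.0.
population = [random.expovariate(1.0) for _ in range(50_000)]

# 2000 samples of n = 40 snails each, keeping one mean per sample.
means = [statistics.fmean(random.sample(population, 40)) for _ in range(2000)]

print(f"population skewness:  {skewness(population):.2f}")
print(f"sample-mean skewness: {skewness(means):.2f}")
```

The population stays heavily skewed, while the sample means are much closer to symmetric. That is the difference between the two distributions in the question above.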
The standard error (SE) measures the typical distance between a sample statistic and the population parameter.
\[SE = \frac{\sigma}{\sqrt{n}}\]
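As a quick check on the formula, the sketch below compares \(\sigma/\sqrt{n}\) with the standard deviation of simulated sample means. The population (normal, mean 170, \(\sigma = 10\)) and the function name `standard_error` are assumptions made up for this illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Hypothetical normal population (illustrative values only).
mu, sigma = 170.0, 10.0

for n in (25, 100, 400):
    # Simulate 2000 sample means of size n and compare their SD to the formula.
    means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(2000)]
    print(f"n={n:>3}  formula SE={standard_error(sigma, n):.3f}  "
          f"simulated SD={statistics.stdev(means):.3f}")
```

Each time the sample size is multiplied by 4, the standard error is cut in half.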
Key insights:
Now we can connect these ideas to statistical inference:
All three rely on understanding sampling distributions!
Example: “We are 95% confident that the average sleep duration of college students is between 6.5 and 7.1 hours”
NOT: “There’s a 95% chance the true mean is in this interval”
YES: “If we repeated this process many times, about 95% of the intervals we create would contain the true population mean”
Analogy: It’s like a fishing net. Our method catches the fish (the parameter) about 95% of the time, but once a particular net has been cast, the fish is either in it or not; there is no probability left for that one net.
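The "repeat the process many times" interpretation can be checked by simulation. This is a minimal sketch under stated assumptions: a hypothetical normal population of sleep durations with true mean 6.8 hours and \(\sigma = 1.5\) hours treated as known. Both values, and the helper `z_interval`, are invented for illustration.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the run is reproducible

def z_interval(sample, sigma, z_star=1.96):
    """95% z-interval for the mean, with sigma treated as known."""
    center = statistics.fmean(sample)
    margin = z_star * sigma / math.sqrt(len(sample))
    return center - margin, center + margin

# Hypothetical population values (assumptions, not survey results).
true_mu, sigma, n, reps = 6.8, 1.5, 100, 2000

hits = 0
for _ in range(reps):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    hits += low <= true_mu <= high  # True counts as 1, False as 0

coverage = hits / reps
print(f"intervals containing the true mean: {coverage:.1%}")
```

Each individual interval either contains 6.8 or it does not. The 95% describes how often the procedure succeeds across many repetitions.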
Scenario: A health researcher wants to estimate the average daily water intake of adults. They survey 100 adults and calculate a 95% confidence interval: (1.8, 2.4) liters per day.
Questions to discuss (3 minutes):
Quick Report: How would you explain this confidence interval to someone without statistics training? ✋
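One way to start unpacking the water-intake interval is to recover its point estimate and margin of error from the endpoints. The sketch below also backs out an implied sample standard deviation. That last step assumes the interval was built as \(\bar{x} \pm 1.96 \, s/\sqrt{n}\), which the scenario does not actually state.

```python
import math

low, high, n = 1.8, 2.4, 100  # interval and sample size from the scenario

point_estimate = (low + high) / 2   # center of the interval
margin_of_error = (high - low) / 2  # half the interval width

# Assumption: a z-based 95% interval, so margin = 1.96 * s / sqrt(n).
implied_s = margin_of_error * math.sqrt(n) / 1.96

print(f"point estimate:  {point_estimate:.2f} L/day")
print(f"margin of error: {margin_of_error:.2f} L/day")
print(f"implied s:       {implied_s:.2f} L/day")
```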
The general form:
\[\text{Point Estimate} \pm \text{Margin of Error}\]
For a population mean:
\[\bar{x} \pm z^* \times SE(\bar{x})\]
where:
First Question: What are you estimating?
Quantitative variable
(height, age, income, test scores)
Formula structure: \[\bar{x} \pm (\text{critical value}) \times SE\]
where \(SE = \frac{s}{\sqrt{n}}\)
Categorical variable
(% yes/no, success/failure)
Formula structure: \[\hat{p} \pm (\text{critical value}) \times SE\]
where \(SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
Which distribution should you use?
Conditions:
- Population \(\sigma\) is known, OR
- Large sample (\(n \geq 30\)) AND \(\sigma\) unknown
Critical value: \(z^*\) from standard normal
(e.g., \(z^* = 1.96\) for 95% CI)
Formula: \[\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}\] or \[\bar{x} \pm z^* \frac{s}{\sqrt{n}}\]
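Here is a minimal sketch of the z-interval computed from summary statistics. It reuses the sleep example (\(\bar{x} = 6.8\), \(n = 100\)). The standard deviation of 1.5 hours is an assumption, picked because it roughly reproduces the interval of 6.5 to 7.1 hours quoted earlier; the function name is hypothetical.

```python
import math

def z_interval_from_summary(x_bar, sd, n, z_star=1.96):
    """z-interval for a mean: x_bar +/- z* * sd / sqrt(n)."""
    margin = z_star * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# x_bar and n come from the sleep example; sd = 1.5 is an assumed value.
low, high = z_interval_from_summary(x_bar=6.8, sd=1.5, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f}) hours")
```

The same function covers both forms of the formula: pass \(\sigma\) when it is known, or \(s\) when the sample is large.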
Conditions:
- Population \(\sigma\) is unknown, AND
- Small sample (\(n < 30\)), AND
- Population is approximately normal
Critical value: \(t^*\) with \(df = n-1\)
(e.g., \(t^* \approx 2.09\) for 95% CI, \(n=20\))
Formula: \[\bar{x} \pm t^* \frac{s}{\sqrt{n}}\]
Note: The t-interval is wider than the z-interval (more uncertainty, because \(\sigma\) is estimated by \(s\))
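To see the extra width, the sketch below compares the two margins of error for \(n = 20\). The sample standard deviation \(s = 1.5\) is a made-up value. Python's standard library has no t quantile function, so \(t^* = 2.093\) is hard-coded from a t-table for \(df = 19\); with SciPy installed, `scipy.stats.t.ppf(0.975, 19)` should give the same value.

```python
import math

def margin_of_error(critical_value, s, n):
    """Margin of error: critical value times s / sqrt(n)."""
    return critical_value * s / math.sqrt(n)

n, s = 20, 1.5   # hypothetical sample size and sample SD
t_star = 2.093   # table value for 95% confidence, df = n - 1 = 19
z_star = 1.96    # standard normal value for 95% confidence

t_margin = margin_of_error(t_star, s, n)
z_margin = margin_of_error(z_star, s, n)

print(f"t margin: {t_margin:.3f}")
print(f"z margin: {z_margin:.3f}")
```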
Use Normal (z) approximation
Conditions (must check!):
1. Random sample
2. \(n\hat{p} \geq 10\)
3. \(n(1-\hat{p}) \geq 10\)
Formula: \[\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Common critical values:
- 90% CI: \(z^* = 1.645\)
- 95% CI: \(z^* = 1.96\)
- 99% CI: \(z^* = 2.576\)
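These critical values do not have to be memorized. As a sketch, they can be recovered from the standard normal distribution with `statistics.NormalDist` from Python's standard library. The helper name `z_critical` is an assumption for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: z* = {z_critical(level):.3f}")
```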
Example: In a survey of 200 students, 120 prefer online learning.
\(\hat{p} = 120/200 = 0.6\)
Check: \(200(0.6) = 120 \geq 10\) ✓ and \(200(0.4) = 80 \geq 10\) ✓
95% CI: \(0.6 \pm 1.96\sqrt{\frac{0.6(0.4)}{200}} = 0.6 \pm 0.068 = (0.532, 0.668)\)
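The survey calculation can be reproduced with a short function. This is one possible sketch, not a standard library routine. The name `proportion_ci` is hypothetical, and raising an error when the success-failure check fails is a design choice made here.

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Success-failure check: n * p_hat >= 10 and n * (1 - p_hat) >= 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("normal approximation conditions not met")
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# 120 of 200 students prefer online learning.
low, high = proportion_ci(120, 200)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The function cannot verify the random-sample condition; that still has to be judged from how the data were collected.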
Key insight: For a fixed sample size, there’s always a trade-off between precision and confidence!
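The trade-off shows up directly in the margin of error. The sketch below reuses the survey numbers (\(\hat{p} = 0.6\), \(n = 200\)) with the three common critical values. The last lines quadruple the sample size to a hypothetical \(n = 800\) to show the other way to gain precision.

```python
import math

p_hat, n = 0.6, 200
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Higher confidence means a larger z*, so a wider (less precise) interval.
critical_values = {"90%": 1.645, "95%": 1.96, "99%": 2.576}
margins = {level: z_star * se for level, z_star in critical_values.items()}

for level, margin in margins.items():
    print(f"{level} CI: margin of error = {margin:.3f}")

# Hypothetical larger sample: quadrupling n halves the margin at 95%.
se_big = math.sqrt(p_hat * (1 - p_hat) / (4 * n))
print(f"95% CI with n = {4 * n}: margin of error = {1.96 * se_big:.3f}")
```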
Next class we’ll build on these ideas to test specific claims:
Preview: We’ll use sampling distributions to determine how likely we would be to see our sample result if a claim were true.
Questions?