28 Apr 2026
Last time, Sarah discovered that the median watch time is 12 hours, while the mean is 18 hours…
But her manager asks: “How consistent are our users?”
The Problem: Measures of center don’t tell the whole story! 🤔
Today’s mission: Learn to measure and interpret data spread and distribution shape!
By the end of this lecture, you will be able to:
Two datasets with the same mean (15 hours):
Dataset A: 14, 14, 15, 15, 16, 16
Dataset B: 5, 8, 12, 18, 22, 25
Both have mean = 15 hours
But they look completely different!
Question: Which dataset represents more consistent user behavior?
Answer: Dataset A! The values are clustered tightly around the mean.
Red dashed line = Mean (15 hours for both)
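For anyone who wants to check this comparison outside Google Sheets, here is a minimal sketch in Python (an assumption on our part, since the lecture itself works in Sheets). It suggests that both datasets share the same mean while covering very different intervals.

```python
from statistics import mean

dataset_a = [14, 14, 15, 15, 16, 16]
dataset_b = [5, 8, 12, 18, 22, 25]

for name, data in (("A", dataset_a), ("B", dataset_b)):
    # Same center, very different smallest and largest values.
    print(f"Dataset {name}: mean = {mean(data)}, min = {min(data)}, max = {max(data)}")
```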
Three key measures: range, variance, and standard deviation.
Each measure tells us how spread out or dispersed the data is!
Definition: Range = Maximum - Minimum
Example: Sarah’s watch times (hours)
Data: 5, 8, 10, 12, 15, 18, 95
Range = 95 - 5 = 90 hours
Pros:
Cons:
Example data in cells A1:A7: 5, 8, 10, 12, 15, 18, 95
Formula: =MAX(A1:A7) - MIN(A1:A7)
Result: 90
Alternative (more explicit):
In cell B1: =MAX(A1:A7)
In cell B2: =MIN(A1:A7)
In cell B3: =B1 - B2
Limitation: Notice how one outlier (95) drastically affects the range!
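To see how much that single value matters, the rough Python sketch below recomputes the range with and without the 95-hour viewer. Dropping the outlier is done here purely for illustration, not as a recommended cleaning step.

```python
watch_times = [5, 8, 10, 12, 15, 18, 95]

# Range = Maximum - Minimum
data_range = max(watch_times) - min(watch_times)

# Same calculation after setting aside the one extreme value (illustration only).
without_outlier = [t for t in watch_times if t != 95]
range_without_outlier = max(without_outlier) - min(without_outlier)

print(data_range)             # 90
print(range_without_outlier)  # 13
```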
To measure spread better, we need to consider how far each value is from the mean.
Deviation = (Value - Mean) = \(x_i - \bar{x}\)
Example: If mean = 12 hours
Positive deviation = above average
Negative deviation = below average
Let’s try: Data = 5, 8, 10, 12, 15, 18, 95
Mean = (5+8+10+12+15+18+95)/7 = 23.3 hours
Deviations:
Sum of deviations = -18.3 - 15.3 - 13.3 - 11.3 - 8.3 - 5.3 + 71.7 ≈ 0 😱 (exactly 0 before rounding)
Mathematical fact: \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\) always!
Why? Positive and negative deviations always cancel each other out.
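The cancellation follows from the definition of the mean: the values add up to n times \(\bar{x}\), so subtracting \(\bar{x}\) from each of the n values leaves a total of zero. A small Python sketch can confirm this for Sarah's data. It uses exact fractions (our choice, not part of the lecture) so that rounding does not blur the result.

```python
from fractions import Fraction

watch_times = [5, 8, 10, 12, 15, 18, 95]

# Exact mean as a fraction (163/7), so no rounding error creeps in.
mean_exact = Fraction(sum(watch_times), len(watch_times))

# Deviation of each value from the mean.
deviations = [x - mean_exact for x in watch_times]
total_deviation = sum(deviations)

print(float(mean_exact))  # roughly 23.29
print(total_deviation)    # 0
```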
Solution: Square the deviations to make them all positive!
\((x_i - \bar{x})^2\) is always positive (or zero)
Definition: The average of the squared deviations from the mean
Sample Variance: \(s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\)
Population Variance: \(\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}\)
Key differences:
Short answer: To get an unbiased estimate of population variance.
Intuition: When we calculate \(\bar{x}\) from the sample, we “use up” one degree of freedom. Only (n-1) values are truly “free” to vary.
For this class: Just remember to use n-1 for sample variance!
Good news: Google Sheets handles this automatically! 🎉
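For the curious, the "unbiased" claim can be checked on a tiny example. The sketch below is an illustration under stated assumptions, not something needed for this class: it treats Dataset B as the whole population, lists every possible ordered sample of size 2 drawn with replacement, and averages the two versions of the variance. Python's `statistics.variance` divides by n-1 and `statistics.pvariance` divides by n.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance, variance

# Assumption for illustration: these six values ARE the entire population.
population = [Fraction(x) for x in (5, 8, 12, 18, 22, 25)]
true_variance = pvariance(population)  # divides by N

# All 36 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average each estimator over every possible sample.
avg_dividing_by_n_minus_1 = mean(variance(s) for s in samples)
avg_dividing_by_n = mean(pvariance(s) for s in samples)

print(true_variance, avg_dividing_by_n_minus_1, avg_dividing_by_n)
```

In this example, dividing by n-1 matches the population variance on average, while dividing by n comes out too small.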
Example: Dataset A = 14, 14, 15, 15, 16, 16 (n = 6)
Step 1: Calculate mean
\(\bar{x} = \frac{14+14+15+15+16+16}{6} = 15\) hours
Step 2: Calculate each squared deviation
Step 3: Sum the squared deviations
\(\sum (x_i - \bar{x})^2 = 1 + 1 + 0 + 0 + 1 + 1 = 4\)
Step 4: Divide by n-1
\(s^2 = \frac{4}{6-1} = \frac{4}{5} = 0.8\) hours²
Units: Variance is in squared units (hours²) - not very intuitive! 🤔
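The four steps above can be sketched in plain Python, one line per step. The variable names are ours, chosen only for illustration.

```python
data = [14, 14, 15, 15, 16, 16]
n = len(data)

# Step 1: calculate the mean.
x_bar = sum(data) / n

# Step 2: calculate each squared deviation.
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Step 3: sum the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 4: divide by n - 1 (sample variance, in hours squared).
sample_variance = sum_of_squares / (n - 1)

print(x_bar, squared_deviations, sum_of_squares, sample_variance)
```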
Example data in A1:A6: 14, 14, 15, 15, 16, 16
Formula: =VAR.S(A1:A6)
Result: 0.8
Alternative functions:
=VAR.S(range) - Sample variance (use this for samples!)
=VAR.P(range) - Population variance (rarely used)
=VAR(range) - Older function (same as VAR.S)
Remember: The “S” stands for “Sample” (divides by n-1)!
Definition: The square root of the variance
Sample Standard Deviation: \(s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}\)
Population Standard Deviation: \(\sigma = \sqrt{\sigma^2}\)
Why take the square root?
To get back to the original units! 🎯
Dataset A: Variance = 0.8 hours²
Standard Deviation: \(s = \sqrt{0.8} \approx 0.89\) hours
Interpretation: Users’ watch times typically differ from the mean by about 0.89 hours (roughly 54 minutes).
Much more interpretable than “0.8 hours²”! ✅
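A short Python sketch of the same relationship, assuming the standard-library `statistics` module as a stand-in for the spreadsheet: taking the square root of the sample variance should agree with the built-in sample standard deviation.

```python
import math
from statistics import stdev, variance

dataset_a = [14, 14, 15, 15, 16, 16]

s_squared = variance(dataset_a)  # sample variance, in hours squared
s = math.sqrt(s_squared)         # back to the original units (hours)

print(round(s_squared, 2))          # 0.8
print(round(s, 2))                  # 0.89
print(round(stdev(dataset_a), 2))   # 0.89, computed directly
```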
Example data in A1:A6: 14, 14, 15, 15, 16, 16
Formula: =STDEV.S(A1:A6)
Result: ≈ 0.89
Alternative functions:
=STDEV.S(range) - Sample standard deviation (use this!)
=STDEV.P(range) - Population standard deviation
=STDEV(range) - Older function (same as STDEV.S)
Pro tip: You can also use =SQRT(VAR.S(A1:A6)) - gives the same result!
Dataset A: 14, 14, 15, 15, 16, 16
Dataset B: 5, 8, 12, 18, 22, 25
Interpretation: Dataset B has users with much more variable viewing habits!
Dataset B: 5, 8, 12, 18, 22, 25 (n = 6, mean = 15)
Squared deviations:
\(\sum (x_i - \bar{x})^2 = 316\)
\(s^2 = \frac{316}{5} = 63.2\) hours²
\(s = \sqrt{63.2} \approx 7.95\) hours ✅
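To compare the two datasets side by side, the sketch below wraps the variance formula in a small helper. The function name `sample_variance` is hypothetical, introduced only for this example.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

dataset_a = [14, 14, 15, 15, 16, 16]
dataset_b = [5, 8, 12, 18, 22, 25]

var_a, var_b = sample_variance(dataset_a), sample_variance(dataset_b)
sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)

print(f"A: variance = {var_a:.2f}, SD = {sd_a:.2f}")  # A: variance = 0.80, SD = 0.89
print(f"B: variance = {var_b:.2f}, SD = {sd_b:.2f}")  # B: variance = 63.20, SD = 7.95
```

Both datasets have the same mean, yet Dataset B's standard deviation is almost nine times larger.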
Practice with Spread Measures:
Dataset C: Weekly Netflix watch times (hours)
10, 11, 12, 13, 14
Tasks:
Discuss with a partner: How would you interpret the standard deviation? What does it tell Sarah about these users?
Share your answers in Poll Everywhere!
When we return: Distribution shape
Spread tells us how dispersed data is, but SHAPE tells us how it’s distributed!
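Three main shapes: symmetric, right-skewed, and left-skewed.
Key feature of a symmetric distribution: Mean ≈ Median (they overlap!)
Key feature of a right-skewed distribution: Mean > Median (pulled by outliers on the right!)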
Sarah’s actual data (first 20 users, hours per week):
5, 6, 7, 8, 8, 9, 10, 10, 11, 12, 13, 14, 15, 18, 20, 22, 45, 67, 82, 95
Calculate in Google Sheets:
=AVERAGE(A1:A20) = 23.85 hours
=MEDIAN(A1:A20) = 12.5 hours
Mean > Median → Right-skewed!
The extreme binge-watchers (45, 67, 82, 95) pull the mean up!
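The pull of the binge-watchers can be sketched in Python by computing the mean and median twice, once on all 20 users and once without the four largest values. Setting those users aside is only an illustration of sensitivity, not a suggestion to delete them.

```python
from statistics import mean, median

watch_times = [5, 6, 7, 8, 8, 9, 10, 10, 11, 12,
               13, 14, 15, 18, 20, 22, 45, 67, 82, 95]

mean_all = mean(watch_times)
median_all = median(watch_times)

# Set aside the four extreme binge-watchers (45, 67, 82, 95) for comparison.
typical = [t for t in watch_times if t < 45]
mean_typical = mean(typical)
median_typical = median(typical)

print(f"All users:       mean = {mean_all}, median = {median_all}")
print(f"Without extremes: mean = {mean_typical}, median = {median_typical}")
```

The mean moves a great deal when the extremes are set aside, while the median shifts only slightly.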
Key feature of a left-skewed distribution: Mean < Median (pulled by low outliers on the left!)
Three-step process:
Pro tip: Look at a histogram too - visual confirmation helps! 📊
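The mean-versus-median comparison can be sketched as a small Python helper. The function name `describe_skew` and the `tolerance` cutoff are hypothetical choices made for this example. There is no official threshold for "about equal", so treat the label as a first guess to confirm with a histogram.

```python
from statistics import mean, median, stdev

def describe_skew(data, tolerance=0.1):
    """Label the shape by comparing mean and median.

    `tolerance` (an arbitrary, hypothetical cutoff) is the fraction of one
    standard deviation within which mean and median count as "about equal".
    """
    gap = mean(data) - median(data)
    if abs(gap) <= tolerance * stdev(data):
        return "approximately symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"

dataset_a = [14, 14, 15, 15, 16, 16]
sarah_first_20 = [5, 6, 7, 8, 8, 9, 10, 10, 11, 12,
                  13, 14, 15, 18, 20, 22, 45, 67, 82, 95]

print(describe_skew(dataset_a))       # approximately symmetric
print(describe_skew(sarah_first_20))  # right-skewed
```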
For Sarah’s Netflix analysis:
Right-skewed watch times tell her:
Business impact:
Right-skewed examples (Mean > Median):
Left-skewed examples (Mean < Median):
Steps to check for skewness:
=AVERAGE(A:A) (mean)
=MEDIAN(A:A) (median)
=STDEV.S(A:A) (standard deviation)
Skewness Detective Challenge:
Dataset D: User satisfaction ratings (1-5 scale)
5, 5, 5, 5, 4, 4, 4, 3, 3, 2, 1
Tasks:
Discuss: What does this distribution tell Sarah about user satisfaction?
Post your analysis on Ed Discussion with your group members’ names!
To fully describe a dataset, Sarah needs BOTH:
Measures of Center:
Where is the data centered?
Measures of Spread:
How dispersed is the data?
Plus Shape:
Weekly Watch Time Data (hours):
Measures of Center:
Measures of Spread:
Distribution Shape:
“Based on my analysis…”
Center: “The median viewing time is 12 hours per week, which better represents our typical user than the mean (18 hours), since we have some extreme binge-watchers.”
Spread: “There’s substantial variation in viewing habits (SD = 15.3 hours), indicating diverse user segments.”
Shape: “Our distribution is right-skewed, with most users watching 8-15 hours, but a valuable segment watching 50+ hours weekly.”
Action: “We should create targeted strategies for both casual viewers and super-users!”
Essential Functions for Spread & Shape:
| Function | Purpose | Example |
|---|---|---|
| =MAX(range) | Maximum value | =MAX(A1:A100) |
| =MIN(range) | Minimum value | =MIN(A1:A100) |
| =VAR.S(range) | Sample variance | =VAR.S(A1:A100) |
| =STDEV.S(range) | Sample std dev | =STDEV.S(A1:A100) |
| =AVERAGE(range) | Mean | =AVERAGE(A1:A100) |
| =MEDIAN(range) | Median | =MEDIAN(A1:A100) |
Remember: Use .S versions for samples (which is almost always!)
Use STDEV.S() and VAR.S() for samples (the .S versions!)
Without calculating, predict the skewness:
Scenario 1: Ages of US residents
Most people in middle age ranges, fewer children and elderly
Answer: Approximately symmetric (slight variations)
Scenario 2: Incomes in the United States
Most people earn moderate incomes, few earn millions
Answer: Right-skewed! (Mean > Median)
Scenario 3: Scores on an easy exam
Most students score 80-100, few score below 70
Answer: Left-skewed! (Mean < Median)
Next Class: Introduction to Probability
This week’s WS and HW: Will include analyzing a dataset using everything we’ve learned - center, spread, and shape! 📊
Questions? Office hours are the perfect place to practice these calculations! 🤗
Rate your confidence (1-5) on Ed Discussion:
If you rated anything 3 or below, please visit office hours! 🤝
Questions? Office hours information on Canvas.
Next up: Introduction to Probability & Statistical Thinking!
STAT 17 – Spring 2026