
28 Apr 2026
After solving the recommendation mystery, Sarah now faces a new challenge…
She has 50,000 user survey responses with:
The Problem: Raw numbers everywhere! How can she make sense of it all? 🤔
Today’s mission: Learn to visualize data and calculate measures of center!
By the end of this lecture, you will be able to:
Raw Numbers 😵
12, 15, 8, 22, 18, 9, 14,
16, 20, 11, 13, 17, 19, 10,
15, 21, 14, 16, 18, 12, 15,
19, 13, 17, 20, 25, 8, 14,
16, 22, 11...
Hard to see patterns!
Visualized Data 😊

Patterns emerge immediately!
Key insight: People spot patterns in a chart far faster than in a wall of raw numbers!
Let’s explore each one with Sarah’s Netflix data! 🎬
Key Features:
Sarah’s Data: Weekly watch time (hours) for 100 users
What do you notice? Most users watch 10-15 hours weekly!
Steps:
Pro tip: Experiment with different bin sizes: too few bins hide patterns, and too many create noise!
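To see what a histogram actually counts, here is a minimal Python sketch that sorts values into equal-width bins and prints a text histogram. It uses only the values visible in the raw-number list above (that list is truncated), and the bin widths of 5 and 2 are arbitrary choices so you can compare "few bins" against "many bins."

```python
# Only the values visible in the raw-number list (the original list is truncated)
visible_values = [12, 15, 8, 22, 18, 9, 14,
                  16, 20, 11, 13, 17, 19, 10,
                  15, 21, 14, 16, 18, 12, 15,
                  19, 13, 17, 20, 25, 8, 14,
                  16, 22, 11]

def bin_counts(values, width, start=0):
    """Count values per equal-width bin [lo, lo + width), keyed by left edge lo."""
    counts = {}
    for v in values:
        lo = start + ((v - start) // width) * width  # left edge of v's bin
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

def text_histogram(values, width):
    """Print one row of '#' marks per bin (empty bins are skipped here)."""
    for lo, n in bin_counts(values, width).items():
        print(f"[{lo:>2}, {lo + width:>2}) {'#' * n}")

for width in (5, 2):
    print(f"Bin width = {width}")
    text_histogram(visible_values, width)
```

With width 5 the peak in the middle is obvious; with width 2 the same data splits into more, shorter bars, which is the "too many bins create noise" effect.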
Bar Charts
Example: Number of users per genre preference
Histograms
Example: Distribution of watch times
Sarah’s Genre Preferences:
Creating in Google Sheets: Insert → Chart → Column chart (or Bar chart)
When to Use:
When NOT to Use:
Subscription Type Distribution:
Google Sheets: Insert → Chart → Pie chart
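Each pie slice is one category's share of the whole, so the slices must add up to 100%. A minimal Python sketch of that calculation; the plan names and counts are made-up placeholders, not Sarah's data.

```python
# Hypothetical subscription counts (placeholders, not real survey data)
plan_counts = {"Basic": 120, "Standard": 200, "Premium": 80}

def shares(counts):
    """Convert category counts to percentages of the total (pie-slice sizes)."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares(plan_counts).items():
    print(f"{name}: {pct:.1f}%")
```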
Chart Selection Challenge: Which visualization would you use?
Scenario 1: Show the distribution of user ages (18-65 years old)
Scenario 2: Compare the number of users across five subscription plans
Scenario 3: Display the market share percentages of different streaming platforms
Discuss with a partner:
Share your answers on Poll Everywhere!
Box Plot shows the distribution through five key values:
Bonus: Easily spots outliers! 🎯
Red dots = Outliers (values far from the rest)
Unfortunately, Google Sheets doesn’t have a built-in box plot option! 😔
Solutions:
For now: Focus on calculating the five numbers using Google Sheets functions!
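The five numbers behind a box plot are the minimum, Q1, median, Q3, and maximum (in Google Sheets: MIN, QUARTILE(range, 1), MEDIAN, QUARTILE(range, 3), MAX). Here is a minimal Python sketch using the watch-time data from Sarah's dilemma later in this lecture. Two assumptions: quartiles use linear interpolation (`method="inclusive"`), and other tools may use a slightly different quartile convention; outliers are flagged with the common 1.5 × IQR rule, which the slide itself does not specify.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using linear-interpolation quartiles."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3 (assumed rule: k = 1.5)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

watch_hours = [5, 8, 10, 12, 15, 18, 95]
print(five_number_summary(watch_hours))  # (5, 9.0, 12.0, 16.5, 95)
print(iqr_outliers(watch_hours))         # [95]
```

The 95-hour user lands far above the upper fence (16.5 + 1.5 × 7.5 = 27.75), so it would be drawn as a red dot.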
Sarah asks: “What’s a typical Netflix user’s weekly watch time?”
She has 50,000 different numbers. Which single number best represents them all?
Three measures of “center”:
Definition: The sum of all values divided by the number of values
Why use it?:
Caution: Sensitive to outliers! 🚨
Sample Mean: \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + ... + x_n}{n}\)
Population Mean: \(\mu = \frac{\sum_{i=1}^{N} x_i}{N}\)
Understanding Sigma Notation (Σ):
Calculate: \(\sum_{i=1}^{5} x_i\) where x = {10, 15, 12, 18, 20}
Expanding:
\(\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5\)
\(= 10 + 15 + 12 + 18 + 20 = 75\)
Mean: \(\bar{x} = \frac{75}{5} = 15\) hours
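Sigma notation is just a running total. A minimal Python sketch of the same calculation: the loop plays the role of Σ, and `len(x)` plays the role of n.

```python
x = [10, 15, 12, 18, 20]  # x_1 ... x_5 from the example

# Sum from i = 1 to 5 of x_i: start at 0 and add each term in turn
total = 0
for x_i in x:
    total += x_i

n = len(x)
mean = total / n  # x-bar = (sum of x_i) / n

print(total, mean)  # 75 15.0
```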
Example data in cells A1:A5: 10, 15, 12, 18, 20
Formula: =AVERAGE(A1:A5)
Result: 15
Alternative (manual calculation):
=SUM(A1:A5)/COUNT(A1:A5)
Also gives 15!
Pro tip: Use AVERAGE for simplicity, but know the underlying calculation!
Definition: The middle value when data is arranged in order
How to find it:
Odd n (n=5):
Data: 10, 12, 15, 18, 20
Median = 15 hours
(The 3rd value)
Even n (n=6):
Data: 10, 12, 15, 18, 20, 22
Median = (15 + 18)/2 = 16.5 hours
(Average of 3rd and 4th values)
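Position formula: Once the data are sorted, the median sits at position (n+1)/2. For even n this lands halfway between two positions (e.g., 3.5 for n = 6), so average those two values.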
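The sort-then-pick-the-middle procedure, as a minimal Python sketch (a hand-rolled version for illustration; Python's built-in `statistics.median` does the same job).

```python
def median(values):
    """Median by the textbook method: sort, then take the middle value(s)."""
    data = sorted(values)           # order the data first
    n = len(data)
    mid = n // 2                    # 0-based index of position (n+1)/2 when n is odd
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # even n: average the two middle values

print(median([10, 12, 15, 18, 20]))      # 15
print(median([10, 12, 15, 18, 20, 22]))  # 16.5
print(median([20, 10, 15, 12, 22, 18]))  # 16.5 (unsorted input works too)
```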
Example data in A1:A6: 20, 10, 15, 12, 22, 18
Formula: =MEDIAN(A1:A6)
Result: 16.5
Google Sheets automatically:
Key advantage: You don’t need to manually sort! 🎉
Definition: The value that appears most frequently in the dataset
Types:
Unimodal:
12, 15, 15, 15, 18, 20
Mode = 15
(appears 3 times)
Bimodal:
12, 12, 15, 18, 18, 20
Modes = 12 and 18
(both appear twice)
No Mode:
10, 12, 15, 18, 20
No mode
(all values unique)
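A minimal Python sketch that finds every mode and labels the dataset as unimodal, bimodal, or no mode. One design choice to note: Python's `statistics.multimode` would return all five values for the "no mode" example, so this sketch follows the slide's convention and returns an empty list when every value is unique.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count; [] if all values are unique."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

def mode_type(values):
    """Label a dataset by how many modes it has."""
    m = modes(values)
    return {0: "no mode", 1: "unimodal", 2: "bimodal"}.get(len(m), "multimodal")

print(modes([12, 15, 15, 15, 18, 20]), mode_type([12, 15, 15, 15, 18, 20]))
print(modes([12, 12, 15, 18, 18, 20]), mode_type([12, 12, 15, 18, 18, 20]))
print(modes([10, 12, 15, 18, 20]), mode_type([10, 12, 15, 18, 20]))
```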
Best for:
Categorical data!
Example: Most popular genre, common subscription type
Example data in A1:A7: 15, 12, 15, 18, 15, 20, 12
Formula: =MODE(A1:A7)
or
=MODE.SNGL(A1:A7)
Result: 15
For multiple modes: =MODE.MULT(A1:A7) returns an array of all the modes
Limitation: MODE only works with numerical data. For categorical data, use a pivot table or COUNTIF!
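For categorical data, the mode comes from a frequency count, which is what a pivot table or one COUNTIF per category gives you. A minimal Python sketch of the same idea; the genre answers below are hypothetical placeholders, not survey data.

```python
from collections import Counter

# Hypothetical genre answers (placeholders, not real survey data)
genres = ["Drama", "Comedy", "Drama", "Sci-Fi", "Comedy", "Drama"]

counts = Counter(genres)                # one count per genre, like COUNTIF per category
genre, n = counts.most_common(1)[0]     # the most frequent category is the mode
print(counts)
print(f"Mode: {genre} ({n} responses)")
```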
Sarah’s Dilemma: Weekly watch times (hours) from 7 users
Data: 5, 8, 10, 12, 15, 18, 95
Mean: (5+8+10+12+15+18+95)/7 = 163/7 ≈ 23.3 hours
Median: in the ordered data 5, 8, 10, 12, 15, 18, 95, the middle (4th) value is 12 hours
Question: Which better represents a “typical” user? 🤔
The 95-hour user is an OUTLIER: someone who binges excessively!
Impact:
Sarah’s decision: Report the median to executives when the data have outliers!
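To see why the median is the safer summary here, a minimal Python sketch that recomputes both measures with and without the 95-hour user, using Python's built-in `statistics` module.

```python
from statistics import mean, median

watch_hours = [5, 8, 10, 12, 15, 18, 95]
without_outlier = [h for h in watch_hours if h != 95]

print(round(mean(watch_hours), 1), median(watch_hours))          # 23.3 12
print(round(mean(without_outlier), 1), median(without_outlier))  # 11.3 11.0
```

Removing one user moves the mean by about 12 hours but the median by only 1 hour.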
Use MEAN when:
Example: Average test scores in a class
Use MEDIAN when:
Example: Household income, home prices
Use MODE when:
Example: Most popular shoe size, favorite ice cream flavor
Pro tip: Often report multiple measures for complete picture!
Example: “Mean watch time is 23 hours, but median is 12 hours, suggesting some extreme binge-watchers.”
Scenario Analysis - Work with a partner:
Scenario 1: Netflix user watch times (hours):
5, 8, 10, 12, 15, 18, 95
Scenario 2: User satisfaction ratings:
1-star, 3-star, 4-star, 5-star, 5-star, 5-star
Post one paragraph with your analysis on Ed Discussion!
Post on Ed Discussion:
Essential Functions:
| Function | Purpose | Example |
|---|---|---|
| =AVERAGE(range) | Calculate mean | =AVERAGE(A1:A100) |
| =MEDIAN(range) | Find median | =MEDIAN(A1:A100) |
| =MODE(range) | Find mode | =MODE(A1:A100) |
| =SUM(range) | Sum all values | =SUM(A1:A100) |
| =COUNT(range) | Count values | =COUNT(A1:A100) |
| =MIN(range) | Minimum value | =MIN(A1:A100) |
| =MAX(range) | Maximum value | =MAX(A1:A100) |
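If you later move from Google Sheets to Python, each function in the table has a close analogue. A minimal sketch, with a made-up list standing in for a cell range; one caveat is that Sheets' COUNT counts only numeric cells, while Python's `len` counts every item, so they agree only when the range is all numbers.

```python
import statistics

cells = [10, 15, 12, 18, 20, 15]  # hypothetical stand-in for a range like A1:A6

summary = {
    "AVERAGE": statistics.mean(cells),
    "MEDIAN": statistics.median(cells),
    "MODE": statistics.mode(cells),
    "SUM": sum(cells),
    "COUNT": len(cells),   # counts all items; Sheets' COUNT counts numeric cells only
    "MIN": min(cells),
    "MAX": max(cells),
}

for name, value in summary.items():
    print(f"{name:>7}: {value}")
```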
Steps:
What Sarah Discovered:
The Impact:
AVERAGE, MEDIAN, MODE
Next Class: Measures of Spread & Distribution Shape
Questions? These concepts are fundamental, so make sure you’re comfortable with them! 🤝
Rate your confidence (1-5) on Ed Discussion:
If you rated anything 3 or below, please visit office hours! 🤝
Questions? Office hours information on Canvas.
Next up: Measures of Spread & Distribution Shape!
STAT 17 – Spring 2026