28 Apr 2026
2019: Netflix has 167 million subscribers worldwide.
Sarah, a marketing specialist, notices something puzzling…
Today’s mission: Follow Sarah’s statistical detective story and learn the fundamental concepts that helped her solve this mystery!
By the end of this lecture, you will be able to:
Statistics is the science of collecting, organizing, analyzing, and interpreting data to make informed decisions.
Think of it as Sarah’s toolbox for understanding Netflix users! 🧰
Sarah’s case: All 167 million Netflix subscribers worldwide
Sarah’s case: The 50,000 users she actually studies
Sarah’s case: The true average satisfaction rating of ALL Netflix users
Sarah’s case: The average satisfaction rating from her 50,000 users
Sarah’s case: The probability that a user will watch a recommended movie
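To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population is simulated and scaled down (100,000 made-up ratings rather than 167 million real users), and the sample size of 500 is an arbitrary choice for illustration, so treat the numbers as hypothetical.

```python
import random
import statistics

random.seed(17)  # fixed seed so the sketch is reproducible

# Hypothetical population: satisfaction ratings (1-5) for 100,000 simulated users.
# This is a scaled-down stand-in for the real subscriber base, not real data.
population = [random.randint(1, 5) for _ in range(100_000)]

# Parameter: a number describing the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Sample: the subset of the population we actually study.
sample = random.sample(population, 500)

# Statistic: a number describing the sample, used to estimate the parameter.
statistic = statistics.mean(sample)

print(f"Parameter (population mean): {parameter:.3f}")
print(f"Statistic (sample mean):     {statistic:.3f}")
```

The two printed values should be close but not identical: the statistic changes from sample to sample, while the parameter is fixed.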
Your Turn: Work with a partner and identify the following:
“A YouTuber wants to understand the running habits of their subscribers. They survey 2,000 of their 15 million subscribers about their average daily running time. The survey shows that the surveyed subscribers run an average of 1.3 hours daily.”
Identify:
Discussion: Share your answers in Poll Everywhere!
Sarah realizes she needs to collect different types of information about users.
Variable: A characteristic of interest for each person or object in a population
Variables that take on values that are names or labels
Examples from Netflix:
Preferred genre (Comedy, Drama, Horror)
Subscription type (Basic, Standard, Premium)
Are you happy with your Netflix recommendations? (Always, Most of the time, Sometimes, Rarely, Never)
Are you sharing your Netflix account now? (Yes/No)
Variables that take on values indicated by numbers
Result of counting (whole numbers)
Hours watched per week: 0, 1, 2, 3…
Number of shows in watchlist: 5, 12, 47…
Result of measuring (can be any value)
Time spent browsing: 12.7 minutes
User satisfaction rating: 3.8/5.0
Classify each variable as: Categorical Nominal, Categorical Ordinal, Categorical Binary, Discrete Numerical, or Continuous Numerical
Discussion: Share your answers in Poll Everywhere!
Sarah can’t survey all 167 million users.
How does she choose who to study? 🤔
Give each population member a number, then randomly select numbers until the sample is full
Netflix Example: Randomly select 2,000 user IDs from the database
Pros ✅ - Unbiased - Every member has an equal chance of being selected
Cons ❌ - May not represent subgroups well
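A simple random sample can be sketched in a few lines of Python. The user IDs below are hypothetical (one million consecutive integers standing in for a real database), and `random.sample` draws without replacement so no user is picked twice.

```python
import random

random.seed(17)  # fixed seed so the sketch is reproducible

# Hypothetical user IDs: 1 to 1,000,000 (a scaled-down stand-in for a real database)
user_ids = range(1, 1_000_001)

# Draw 2,000 IDs without replacement; every ID has the same chance of selection
srs = random.sample(user_ids, 2_000)

print(srs[:5])  # peek at the first few selected IDs
```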
Select every kth individual from a list
Netflix Example: Select every 83,500th subscriber
(167M ÷ 2,000 = 83,500)
Pros ✅ - Easy to implement - Spreads sample across population
Cons ❌ - Can introduce bias if there’s a pattern
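The arithmetic for the sampling interval k can be checked with a short Python sketch. Positions are zero-based list positions, and the random starting point within the first interval is a common convention assumed here, not something specified above.

```python
import random

random.seed(17)  # fixed seed so the sketch is reproducible

N = 167_000_000  # population size
n = 2_000        # desired sample size
k = N // n       # sampling interval: every k-th subscriber

# Assumed convention: pick a random start within the first interval,
# then take every k-th position after it.
start = random.randrange(k)
systematic = list(range(start, N, k))

print(f"k = {k:,}; sample size = {len(systematic):,}")
```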
Divide population into groups (strata), then randomly sample from each
Netflix Example: Sample proportionally from each country/age group
Pros ✅ - Ensures representation of subgroups
Cons ❌ - Requires knowledge of population characteristics
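Proportional stratified sampling can be sketched as follows. The regions and their sizes are invented for illustration; the point is only that each stratum contributes to the sample in proportion to its share of the population.

```python
import random

random.seed(17)  # fixed seed so the sketch is reproducible

# Hypothetical population of (user_id, region) pairs; regions and shares are made up
regions = ["Americas"] * 5_000 + ["Europe"] * 3_000 + ["Asia"] * 2_000
population = list(enumerate(regions))

sample_size = 100

# Step 1: divide the population into strata by region
strata = {}
for user_id, region in population:
    strata.setdefault(region, []).append(user_id)

# Step 2: randomly sample from each stratum, proportional to its size
stratified = {}
for region, members in strata.items():
    n_stratum = round(sample_size * len(members) / len(population))
    stratified[region] = random.sample(members, n_stratum)

for region, chosen in stratified.items():
    print(region, len(chosen))
```

With real data the proportions rarely divide evenly, so the rounded stratum sizes may not sum exactly to the target sample size.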
Divide population into clusters, randomly select clusters, include all members of the selected clusters
Netflix Example: Randomly select cities, then survey ALL users in those cities
Pros ✅ - Practical when populations are geographically spread
Cons ❌ - Members within clusters may be similar
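For contrast with stratified sampling, here is a minimal cluster-sampling sketch. The 20 cities and 50 users per city are hypothetical; note that the randomness is in choosing the cities, not the individual users.

```python
import random

random.seed(17)  # fixed seed so the sketch is reproducible

# Hypothetical clusters: 20 cities with 50 users each (names are made up)
clusters = {
    f"city_{c}": [f"city_{c}_user_{u}" for u in range(50)]
    for c in range(20)
}

# Step 1: randomly select whole clusters (cities)
chosen_cities = random.sample(sorted(clusters), 4)

# Step 2: include ALL members of each selected cluster
cluster_sample = [user for city in chosen_cities for user in clusters[city]]

print(chosen_cities, len(cluster_sample))
```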
Select individuals that are easily accessible
Netflix Example: Survey users who respond to an email invitation
Pros ✅ - Quick and cheap
Cons ❌ - Often biased - Not representative
Scenario: You’re studying student satisfaction with campus dining services.
Work in Groups of 3-4:
One person from the group shares your sampling strategy on Ed Discussion in one paragraph and adds the names of your group members to the post!
When we return: Research ethics and bias - crucial for Sarah’s investigation!
Sarah wants to test if changing the recommendation algorithm improves user satisfaction.
But she faces ethical considerations… 🤔
Netflix example: Users don’t know which algorithm they’re using; customer service reps evaluating satisfaction don’t know either
Even with careful planning, Sarah must watch for bias that could mislead her conclusions.
Bias can sneak in everywhere! 🕵️‍♀️
Not all population members are equally likely to be selected
Example: Only surveying users who respond to emails (active-user bias)
Impact: Results don’t represent all Netflix users
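A small simulation can show how this kind of bias distorts an estimate. Everything here is an assumption for illustration: the made-up model says heavier viewers are both more satisfied and more likely to answer a survey email.

```python
import random
import statistics

random.seed(17)  # fixed seed so the sketch is reproducible

# Hypothetical population of (hours watched per week, satisfaction score) pairs.
# Assumption for illustration only: satisfaction rises with hours watched.
population = []
for _ in range(50_000):
    hours = random.uniform(0, 20)
    satisfaction = 2.0 + 0.1 * hours + random.gauss(0, 0.5)
    population.append((hours, satisfaction))

true_mean = statistics.mean([s for _, s in population])

# Assumed response model: the chance of answering the email grows with
# hours watched (0% at 0 hours, 50% at 20 hours).
respondents = [s for h, s in population if random.random() < h / 40]
biased_mean = statistics.mean(respondents)

# Simple random sample of the same size, for comparison
srs = [s for _, s in random.sample(population, len(respondents))]
srs_mean = statistics.mean(srs)

print(f"True mean:            {true_mean:.2f}")
print(f"Email respondents:    {biased_mean:.2f}")
print(f"Simple random sample: {srs_mean:.2f}")
```

Under these assumptions the email respondents overstate satisfaction, while the simple random sample of the same size lands close to the true mean. A larger biased sample does not fix the problem.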
Issues affecting reliability beyond natural variation:
Variables affecting the study that aren’t explanatory or response variables
Example: Time of year (people watch differently during holidays)
Impact: Makes it seem like algorithm changes caused differences when they didn’t
Netflix example: Telling users they have a “new improved system” when nothing changed
Why it matters: People may report higher satisfaction just because they think something improved
Read each scenario and identify potential sources of bias
Scenario 1: “A university emails a survey about online learning satisfaction to all 10,000 students and receives 500 responses.”
Scenario 2: “A coffee shop owner asks customers who visit between 2 and 4 PM on weekdays about their roast preference.”
For Each: (1) Identify bias type(s), (2) Explain impact, (3) Suggest improvements
Individual → Partner → Poll Everywhere (3+3+2 min)
Sarah discovered: Recommendation accuracy varied by cultural context - the algorithm needed regional customization!
The Impact:
Netflix improved user satisfaction by 23%
Reduced subscription cancellations by 15%
Estimated annual savings: $15+ million
Next Class: We’ll explore descriptive statistics and learn how to summarize and visualize data - the next step in Sarah’s journey! 📊
Homework Preview: You’ll design a sampling strategy for a real-world scenario and identify potential biases.
Questions? This is your foundation for statistical thinking - make sure you’re comfortable with these concepts! 🤔
Before you leave, rate your confidence (1-5) on each learning outcome on Ed Discussion:
If you rated anything 3 or below, please stay after class; I have office hours today! 🤝
📊✨
Questions? Office hours after class today.
STAT 17 – Spring 2026