HW3:

A Probability Investigation

🎬 The Case

Welcome back, Statistical Detective!

Your success with Brew Haven has caught the attention of StreamFlix, a growing streaming service competing with Netflix and Disney+. The VP of User Experience, Jordan Chen, has reached out with an urgent problem:

“We’re losing subscribers at an alarming rate, but we don’t understand the patterns. Some users cancel after trying our free trial, others leave after a few months. We track viewing habits, subscription tiers, and device usage, but we need someone who understands probability to help us predict churn risk and identify our most valuable customer segments. Can you help us make sense of these patterns?”

Jordan has extensive data on 10,000 subscribers and needs you to use probability theory to uncover the relationships between viewing behavior, subscription choices, and cancellation risk.

Your mission: Use probability concepts to help StreamFlix understand their subscriber patterns and reduce churn!


Question 1: Probability Foundations

StreamFlix has 10,000 current subscribers. Answer the following:

a. Define these terms in the context of StreamFlix:

  • Experiment:

  • Sample Space:

  • Event:

  • Complement:

b. Let A = “subscriber watches action movies” and B = “subscriber watches on a mobile device.”

Write in words what each represents:

  • \(A^c\):

  • \(A \cup B\):

  • \(A \cap B\):

  • \(P(A|B)\):


Question 2: Independent vs. Mutually Exclusive Events

Jordan is confused about independence and mutual exclusivity.

a. Definitions

Explain in YOUR OWN WORDS (no textbook definitions):

  • What makes two events independent?

  • What makes two events mutually exclusive?

  • Can two events be both independent AND mutually exclusive? Explain why or why not.

b. Classify these pairs

For each pair, state if they are: Independent, Mutually Exclusive, Both, or Neither. Justify each answer.

| Event Pair | Classification | Justification |
|---|---|---|
| “Subscribes to Premium tier” and “Subscribes to Basic tier” | | |
| “Watches content on weekends” and “Prefers comedy genre” | | |
| “Cancels within 30 days” and “Completes free trial” | | |
| “Uses mobile app” and “Uses smart TV app” (same subscriber can use both) | | |

Question 3: Addition and Multiplication Rules

StreamFlix data shows:

  • 35% of subscribers watch action movies
  • 45% of subscribers watch comedies
  • 15% of subscribers watch both action movies AND comedies
  • 60% of Premium subscribers renew for another month
  • 40% of subscribers are on the Premium tier

a. Using the Addition Rule

Calculate \(P(\text{Action OR Comedy})\)

Show your work using the formula: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)

b. More Addition Rule Practice

  • What’s the probability a randomly selected subscriber watches NEITHER action nor comedy?

  • Show your calculation:
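As a minimal sketch of the addition rule and the complement rule, the Python snippet below works through the arithmetic with hypothetical probabilities (deliberately not the StreamFlix figures). The function names `union_probability` and `neither_probability` are illustrative assumptions, not part of any library.

```python
def union_probability(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def neither_probability(p_a: float, p_b: float, p_a_and_b: float) -> float:
    """Complement rule: P(neither A nor B) = 1 - P(A or B)."""
    return 1 - union_probability(p_a, p_b, p_a_and_b)


# Hypothetical values chosen only for illustration.
p_union = union_probability(0.50, 0.30, 0.10)
p_neither = neither_probability(0.50, 0.30, 0.10)
print(round(p_union, 2), round(p_neither, 2))
```

Subtracting the overlap once keeps subscribers who fall in both events from being counted twice.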

c. Using the Multiplication Rule

What’s the probability that a randomly selected subscriber is BOTH on Premium tier AND renews for another month?

Assume renewal and tier choice are independent. Show your work using: \(P(A \cap B) = P(A) \times P(B)\)
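The sketch below contrasts the independence shortcut with the general multiplication rule, \(P(A \cap B) = P(A) \times P(B|A)\). All numbers are hypothetical and the function names are assumptions made for illustration.

```python
def joint_if_independent(p_a: float, p_b: float) -> float:
    """Multiplication rule under independence: P(A and B) = P(A) * P(B)."""
    return p_a * p_b


def joint_general(p_a: float, p_b_given_a: float) -> float:
    """General multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


# Hypothetical values: P(A) = 0.25, P(B) = 0.50, P(B | A) = 0.80.
shortcut = joint_if_independent(0.25, 0.50)
general = joint_general(0.25, 0.80)
print(round(shortcut, 3), round(general, 3))
```

The two results agree only when \(P(B|A) = P(B)\), which is exactly what the independence assumption claims.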

d. Critical Thinking

In part (c), we assumed independence. List TWO reasons why renewal and tier choice might NOT actually be independent in reality:


Question 4: Contingency Table Analysis

StreamFlix surveyed 1,000 subscribers about their viewing habits and subscription status. Here’s the data:

| | Canceled Subscription | Active Subscription | Total |
|---|---|---|---|
| Watches Daily | 80 | 420 | 500 |
| Watches Weekly | 120 | 230 | 350 |
| Watches Rarely | 100 | 50 | 150 |
| Total | 300 | 700 | 1,000 |

a. Define probability types

Using this table as context, explain:

  • Joint Probability:

  • Marginal Probability:

  • Conditional Probability:

b. Calculate the following (show work as fractions, then decimals):

  1. \(P(\text{Watches Daily})\) =

  2. \(P(\text{Canceled})\) =

  3. \(P(\text{Watches Daily AND Canceled})\) =

  4. \(P(\text{Watches Daily OR Canceled})\) =

  5. \(P(\text{Canceled | Watches Daily})\) =

  6. \(P(\text{Canceled | Watches Rarely})\) =

  7. \(P(\text{Watches Daily | Active})\) =

c. Interpretation

  • Compare \(P(\text{Canceled | Watches Daily})\) with \(P(\text{Canceled | Watches Rarely})\). What does this tell Jordan about viewing frequency and churn risk?

  • Is viewing frequency independent of subscription status? Provide statistical evidence using the definition: Events A and B are independent if \(P(A|B) = P(A)\).
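One way to keep the three probability types straight is to see which cell goes in the numerator and which total goes in the denominator. The Python sketch below does this for a small hypothetical contingency table (not the StreamFlix survey); the table contents and helper names are assumptions for illustration only.

```python
# Hypothetical counts: rows are billing plans, columns are outcomes.
counts = {
    "Monthly": {"Left": 30, "Stayed": 70},
    "Annual": {"Left": 10, "Stayed": 90},
}
grand_total = sum(sum(row.values()) for row in counts.values())


def joint(row: str, col: str) -> float:
    """Joint: one interior cell divided by the grand total."""
    return counts[row][col] / grand_total


def marginal_row(row: str) -> float:
    """Marginal: a row total divided by the grand total."""
    return sum(counts[row].values()) / grand_total


def marginal_col(col: str) -> float:
    """Marginal: a column total divided by the grand total."""
    return sum(row[col] for row in counts.values()) / grand_total


def conditional_col_given_row(col: str, row: str) -> float:
    """Conditional: one interior cell divided by its row total."""
    return counts[row][col] / sum(counts[row].values())


print(joint("Monthly", "Left"))
print(marginal_row("Monthly"), marginal_col("Left"))
print(conditional_col_given_row("Left", "Monthly"))
```

Only the conditional probability changes the denominator: it restricts attention to the row (or column) named after the bar.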


Question 5: Device Usage Analysis

StreamFlix tracks which devices subscribers use. Here’s data from 2,000 subscribers:

| | Mobile Only | TV Only | Both Devices | Total |
|---|---|---|---|---|
| Premium Tier | 180 | 320 | 300 | 800 |
| Standard Tier | 240 | 360 | 280 | 880 |
| Basic Tier | 150 | 130 | 40 | 320 |
| Total | 570 | 810 | 620 | 2,000 |

a. Calculate:

  1. \(P(\text{Premium Tier})\) =

  2. \(P(\text{Both Devices})\) =

  3. \(P(\text{Premium AND Both Devices})\) =

  4. \(P(\text{Both Devices | Premium})\) =

  5. \(P(\text{Premium | Both Devices})\) =

b. Comparison

Notice that \(P(\text{Both Devices | Premium}) \neq P(\text{Premium | Both Devices})\)
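For reference, the general definitions for any two events A and B (assuming \(P(A) > 0\) and \(P(B) > 0\)) are:

\[
P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B|A) = \frac{P(A \cap B)}{P(A)}
\]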

Explain in words what EACH probability means and why they’re different:

c. Business Insight

  • Which tier has the highest percentage of multi-device users? Show calculation.

  • What should Jordan conclude about the relationship between tier choice and device usage?


Question 6: Free Trial Conversion

StreamFlix offers a 30-day free trial. Historical data shows:

  • 70% of trial users watch at least 5 hours of content
  • Of those who watch 5+ hours, 60% convert to paid subscribers
  • Of those who watch less than 5 hours, only 15% convert to paid subscribers

a. Create a tree diagram

Sketch a tree diagram showing:

  • First branch: Hours watched (5+ hours vs. <5 hours)
  • Second branch: Conversion outcome (Convert vs. Don’t Convert)
  • Label all probabilities on branches
  • Calculate all four endpoint probabilities

b. Calculate:

  1. What’s the probability a trial user BOTH watches 5+ hours AND converts?

  2. What’s the overall probability that a trial user converts (regardless of viewing)? Use the Law of Total Probability: \(P(\text{Convert}) = P(\text{Convert AND 5+ hours}) + P(\text{Convert AND <5 hours})\)

  3. Given that a user converted to paid, what’s the probability they watched 5+ hours during the trial? Use Bayes’ Theorem and show your work.
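The Law of Total Probability and Bayes’ Theorem for a two-branch tree can be sketched in Python as follows. The inputs are hypothetical (not the trial data above), and the function names are assumptions for illustration.

```python
def total_probability(p_b: float, p_a_given_b: float, p_a_given_not_b: float) -> float:
    """Law of Total Probability over B and its complement:
    P(A) = P(A | B) * P(B) + P(A | B^c) * P(B^c)."""
    return p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)


def bayes_posterior(p_b: float, p_a_given_b: float, p_a_given_not_b: float) -> float:
    """Bayes' Theorem: P(B | A) = P(A | B) * P(B) / P(A)."""
    p_a = total_probability(p_b, p_a_given_b, p_a_given_not_b)
    return p_a_given_b * p_b / p_a


# Hypothetical values: P(B) = 0.5, P(A | B) = 0.8, P(A | B^c) = 0.2.
p_a = total_probability(0.5, 0.8, 0.2)
posterior = bayes_posterior(0.5, 0.8, 0.2)
print(round(p_a, 2), round(posterior, 2))
```

Each term in `total_probability` is one endpoint of the tree diagram, and the posterior is the share of the total contributed by the branch of interest.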

c. Recommendation

Based on these probabilities, what strategy would you recommend to increase conversions?


Question 7: Content Preference Patterns

Jordan provides this data on genre preferences and age groups (1,500 subscribers):

| | Action | Romance | Documentary | Total |
|---|---|---|---|---|
| Ages 18-30 | 240 | 180 | 80 | 500 |
| Ages 31-50 | 200 | 220 | 180 | 600 |
| Ages 51+ | 60 | 140 | 200 | 400 |
| Total | 500 | 540 | 460 | 1,500 |

a. Calculate these probabilities:

  1. \(P(\text{Documentary})\) =

  2. \(P(\text{Ages 18-30})\) =

  3. \(P(\text{Documentary AND Ages 51+})\) =

  4. \(P(\text{Documentary | Ages 51+})\) =

  5. \(P(\text{Ages 51+ | Documentary})\) =

b. Independence check

Are age group and genre preference independent?

Test this by checking if \(P(\text{Action | Ages 18-30}) = P(\text{Action})\)

Show your calculations and state your conclusion.
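The check above looks at a single cell. A stricter version, sketched below in Python, compares every cell against the product of its row and column totals. Both tables are hypothetical and the function name `is_independent` is an assumption for illustration. The comparison is done with integer counts, so it tests exact equality only.

```python
def is_independent(table: list[list[int]]) -> bool:
    """Return True if every cell satisfies P(row and col) = P(row) * P(col).

    Cross-multiplied to stay in integers: cell * N == row_total * col_total.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        table[i][j] * n == row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Hypothetical tables: the first has proportional rows, the second does not.
proportional = [[20, 30], [40, 60]]
skewed = [[30, 20], [30, 70]]
print(is_independent(proportional), is_independent(skewed))
```

With real survey counts, exact equality almost never holds, so in practice the question is whether the gap between \(P(A|B)\) and \(P(A)\) is large enough to matter.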

c. Marketing insight

Based on the conditional probabilities, which genre should StreamFlix promote to each age group? Justify with data.


💭 Question 8: Detective’s Reflection

Reflect on your probability investigation (5-7 sentences):

  • How does understanding conditional probability help businesses make better decisions?
  • What’s the difference between \(P(A|B)\) and \(P(B|A)\), and why does this matter in real applications?
  • How did contingency tables help you see patterns in the data?
  • What surprised you about independence vs. mutual exclusivity?
  • Name one other business or real-world scenario where these probability concepts would be crucial.

🎉 Excellent work, Statistical Detective! StreamFlix is grateful for your probability insights!