STAT 17: Comparing two means and testing for independence

Prof. Marcela Alfaro Cordoba

Statistics - UCSC

18 Nov 2025

Today’s Learning Objectives

By the end of this lecture, you will be able to:

Compare two independent population means using appropriate tests
Understand and apply Cohen’s standards for effect sizes
Test for differences in means assuming equal population variances
Compare two independent population proportions
Conduct tests with known population standard deviations
Understand properties and applications of chi-square distribution
Test for independence between categorical variables

Retail A/B Testing

Scenario: A major online retailer is testing two checkout designs:

Design A (Control): Traditional multi-page checkout
Design B (Treatment): New single-page checkout

Questions we’ll answer:

Does Design B increase average purchase amount?
How large is the effect of the new design?
Does Design B improve conversion rates?
Is customer satisfaction independent of checkout design?

Part 1: Comparing Two Means

When do we compare two means?

Testing if two groups differ on a continuous outcome
A/B testing in business contexts
Clinical trials comparing treatments
Product testing and quality control

Key assumption: Two independent samples from two populations

The Two-Sample t-Test Framework

Hypotheses:

$H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$)
$H_a: \mu_1 \neq \mu_2$ (two-sided)
$H_a: \mu_1 > \mu_2$ or $H_a: \mu_1 < \mu_2$ (one-sided)

Test Statistic:

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}\]

Under $H_0$: $(\mu_1 - \mu_2) = 0$, so:

\[t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}\]

Standard Error: Two Cases

Case 1: Equal Variances ($\sigma_1^2 = \sigma_2^2$)

\[SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]

where pooled standard deviation:

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\]

Case 2: Unequal Variances (Welch’s t-test)

\[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
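A minimal Python sketch of both standard-error formulas, working from summary statistics. The function name `two_sample_se` is our own. It also returns the Welch-Satterthwaite degrees of freedom. That formula is not on the slide, but Welch's test uses it in place of $n_1 + n_2 - 2$.

```python
import math

def two_sample_se(s1, n1, s2, n2):
    """Return (pooled SE, Welch SE, Welch-Satterthwaite df) for two independent samples."""
    # Case 1: pooled standard deviation, then the SE assuming equal variances
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)

    # Case 2: Welch SE, where each sample keeps its own variance
    v1, v2 = s1**2 / n1, s2**2 / n2
    se_welch = math.sqrt(v1 + v2)

    # Welch-Satterthwaite approximation to the degrees of freedom
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se_pooled, se_welch, df_welch

# Checkout example: s_1 = 22.30, s_2 = 24.10, n_1 = n_2 = 250
se_p, se_w, df_w = two_sample_se(22.30, 250, 24.10, 250)
print(f"pooled SE = {se_p:.2f}, Welch SE = {se_w:.2f}, Welch df = {df_w:.1f}")
```

When $n_1 = n_2$ the two standard errors are algebraically identical. In that case, assuming equal variances only changes the degrees of freedom: 498 here, versus roughly 495 for Welch.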

Example: Purchase Amounts

Data from our checkout design test:

Design A: $n_1 = 250$, $\bar{x}_1 = \$87.50$, $s_1 = \$22.30$
Design B: $n_2 = 250$, $\bar{x}_2 = \$92.80$, $s_2 = \$24.10$

Test: $H_0: \mu_A = \mu_B$ vs $H_a: \mu_B > \mu_A$ at $\alpha = 0.05$

Let’s assume equal variances for simplicity.

Calculating the Test Statistic

Step 1: Calculate pooled standard deviation

\[s_p = \sqrt{\frac{(250-1)(22.30)^2 + (250-1)(24.10)^2}{250 + 250 - 2}}\]

\[s_p = \sqrt{\frac{123,825.2 + 144,621.7}{498}} = \sqrt{539.05} = 23.22\]

Calculating the Test Statistic (cont.)

Step 2: Calculate standard error

\[SE = 23.22 \sqrt{\frac{1}{250} + \frac{1}{250}} = 23.22 \times 0.0894 = 2.08\]

Step 3: Calculate t-statistic

\[t = \frac{92.80 - 87.50}{2.08} = \frac{5.30}{2.08} = 2.55\]
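A quick Python check of the three steps above, offered as a sketch. The variable names are our own.

```python
import math

# Summary statistics from the checkout test
n_a, mean_a, s_a = 250, 87.50, 22.30   # Design A (control)
n_b, mean_b, s_b = 250, 92.80, 24.10   # Design B (treatment)

# Step 1: pooled standard deviation
sp = math.sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2))

# Step 2: standard error of the difference in means
se = sp * math.sqrt(1 / n_a + 1 / n_b)

# Step 3: t-statistic for H_a: mu_B > mu_A (B minus A in the numerator)
t = (mean_b - mean_a) / se

print(f"s_p = {sp:.2f}, SE = {se:.2f}, t = {t:.2f}")
```

Carrying full precision gives $s_p \approx 23.22$, $SE \approx 2.08$, and $t \approx 2.55$.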

Making a Decision

Degrees of freedom: $df = n_1 + n_2 - 2 = 498$

For one-sided test at $\alpha = 0.05$ with $df = 498$: Critical value ≈ 1.648 (essentially the standard normal value 1.645)

Our test statistic: $t = 2.55 > 1.648$

Conclusion: Reject $H_0$. There is significant evidence that Design B leads to higher average purchase amounts than Design A.

Practical interpretation: The new checkout design increases the average purchase amount by an estimated $5.30.
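The same decision can be expressed as a p-value. Python's standard `statistics` module has no t distribution. This sketch therefore assumes the standard normal is an adequate stand-in at $df = 498$. That is an approximation; an exact t p-value needs a t-distribution routine such as `T.TEST` or a statistics library.

```python
from statistics import NormalDist

t = 2.55  # test statistic from the slide (df = 498, large enough for a normal approximation)

# One-sided p-value P(T > t), approximated by P(Z > t)
p_one_sided = 1 - NormalDist().cdf(t)
print(f"approx. one-sided p-value = {p_one_sided:.4f}")
print("reject H0" if p_one_sided < 0.05 else "fail to reject H0")
```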

Google Sheets for Two-Sample t-Test

Function: =T.TEST(array1, array2, tails, type)

Parameters:

array1: First sample data range
array2: Second sample data range
tails: 1 for one-sided, 2 for two-sided
type: 1 for paired, 2 for equal variance, 3 for unequal variance

Example:

=T.TEST(A2:A251, B2:B251, 1, 2)

Returns p-value for one-sided test with equal variances

Effect Size: Beyond Statistical Significance

Statistical significance ≠ Practical importance

Effect Size measures the magnitude of a difference in standardized units

Cohen’s d:

\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\]

Standardized mean difference (in standard deviations)

Cohen’s Standards for Effect Sizes

Interpretation of Cohen’s d:

Effect Size	Cohen’s d	Interpretation
Small	0.2	Difficult to detect
Medium	0.5	Noticeable difference
Large	0.8	Very noticeable

Our example:

\[d = \frac{92.80 - 87.50}{23.22} = \frac{5.30}{23.22} = 0.23\]

Small effect size (just above Cohen’s 0.2 benchmark): statistically significant but modest practical impact
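A sketch of Cohen’s d from summary statistics, with a helper that maps |d| onto the benchmarks in the table. The function names are our own, and so is the "below small" label for |d| < 0.2, since the table does not name that range.

```python
import math

def cohens_d(mean1, mean2, s1, n1, s2, n2):
    """Standardized mean difference (mean1 - mean2) / s_p using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / sp

def cohen_label(d):
    """Return the largest of Cohen's benchmarks (0.2, 0.5, 0.8) that |d| reaches."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d(92.80, 87.50, 24.10, 250, 22.30, 250)  # Design B minus Design A
print(f"d = {d:.2f} ({cohen_label(d)})")
```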

Comparing Two Proportions

When: Testing if two groups differ on a binary outcome (success/failure)

Examples:

Conversion rates between two designs
Default rates between two loan types
Customer satisfaction (satisfied/not satisfied) between services

Hypotheses:

$H_0: p_1 = p_2$ or $p_1 - p_2 = 0$
$H_a: p_1 \neq p_2$ (two-sided)

Test Statistic for Two Proportions

Pooled proportion: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$

Standard error under $H_0$:

\[SE = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]

Test statistic:

\[z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE}\]

Under $H_0$, $z \sim N(0,1)$
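A minimal sketch of the pooled two-proportion z-statistic. The function name `two_prop_z` is our own, and the call uses the conversion counts from the checkout example (63 of 250 for Design B, 47 of 250 for Design A).

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Design B (63 of 250) versus Design A (47 of 250)
z = two_prop_z(63, 250, 47, 250)
print(f"z = {z:.2f}")
```

At full precision this gives $z \approx 1.73$.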

Example: Conversion Rates

Data from checkout design test:

Design A: 250 visitors, 47 completed purchases → $\hat{p}_A = 0.188$
Design B: 250 visitors, 63 completed purchases → $\hat{p}_B = 0.252$

Test: $H_0: p_A = p_B$ vs $H_a: p_B > p_A$ at $\alpha = 0.05$

Step 1: Pooled proportion

\[\hat{p} = \frac{47 + 63}{250 + 250} = \frac{110}{500} = 0.220\]

Calculating the Test (cont.)

Step 2: Standard error

\[SE = \sqrt{0.220(1-0.220)\left(\frac{1}{250} + \frac{1}{250}\right)}\]

\[SE = \sqrt{0.1716 \times 0.008} = \sqrt{0.001373} = 0.0371\]

Step 3: Test statistic

\[z = \frac{0.252 - 0.188}{0.0371} = \frac{0.064}{0.0371} = 1.73\]

Making a Decision

For one-sided test at $\alpha = 0.05$: Critical value = 1.645

Our test statistic: $z = 1.73 > 1.645$

Conclusion: Reject $H_0$. There is significant evidence that Design B has a higher conversion rate than Design A.

Practical interpretation: Design B increases conversion rate by about 6.4 percentage points (from 18.8% to 25.2%).

Google Sheets for Proportion Tests

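Manual calculation approach (lines starting with // are labels, not formulas; x1, n1, p1, pooled, SE, etc. stand for the cells holding those values):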

// Pooled proportion
=(x1 + x2)/(n1 + n2)

// Standard error
=SQRT(pooled*(1-pooled)*(1/n1 + 1/n2))

// Z-statistic
=(p1 - p2)/SE

// P-value (one-sided)
=1 - NORM.S.DIST(z, TRUE)

// P-value (two-sided)
=2*(1 - NORM.S.DIST(ABS(z), TRUE))
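For anyone working outside Sheets, here is a sketch of the same manual steps in Python. `statistics.NormalDist().cdf` plays the role of `NORM.S.DIST(z, TRUE)`, and the variable names mirror the placeholders above.

```python
import math
from statistics import NormalDist

x1, n1 = 63, 250  # Design B: purchases, visitors
x2, n2 = 47, 250  # Design A: purchases, visitors
p1, p2 = x1 / n1, x2 / n2

pooled = (x1 + x2) / (n1 + n2)                             # pooled proportion
SE = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error
z = (p1 - p2) / SE                                         # z-statistic

std_normal = NormalDist()                       # mean 0, sd 1
p_one_sided = 1 - std_normal.cdf(z)             # like =1 - NORM.S.DIST(z, TRUE)
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # like =2*(1 - NORM.S.DIST(ABS(z), TRUE))
print(f"z = {z:.2f}, one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

Here the one-sided p-value is about 0.042 and the two-sided p-value about 0.084. The conclusion at $\alpha = 0.05$ therefore depends on which alternative was specified in advance.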

🧘‍♀️ STRETCH BREAK

Time to move! (5 minutes)

Stand up and stretch 🤸‍♀️
Chat with neighbors about differences of proportions 💬
Grab some water 💧

Welcome Back!

Quick recap of Part 1:

Two-sample t-tests for comparing means
Effect sizes (Cohen’s d) for practical significance
Two-proportion z-tests for comparing rates

Now: What if we have more than two categories?

Part 2: The Chi-Square Distribution

The chi-square ($\chi^2$) distribution:

Only takes non-negative values
Skewed right (especially for small df)
Defined by degrees of freedom (df)
Used for testing with categorical data

Notation: $\chi^2_{df}$ or $\chi^2(df)$

Properties of Chi-Square Distribution

Key properties:

Mean: $E(\chi^2_{df}) = df$
Variance: $Var(\chi^2_{df}) = 2 \times df$
As df increases, distribution becomes more symmetric
Sum of squared standard normals: If $Z_1, \dots, Z_{df}$ are independent $N(0,1)$, then $\sum_{i=1}^{df} Z_i^2 \sim \chi^2_{df}$

Shape depends on df:

Small df (1-3): Very right-skewed
Medium df (10-20): Moderately skewed
Large df (>30): Approaches normal
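The sum-of-squares property can be checked by simulation. The sketch below is our own and uses only Python's `random` module with a fixed seed. It draws `df` independent standard normals, squares and sums them, and compares the sample mean and variance with $df$ and $2 \times df$.

```python
import random
from statistics import fmean, pvariance

def simulate_chi_square(df, draws=50_000, seed=17):
    """Draw chi-square(df) values as sums of df squared independent N(0, 1) variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(draws)]

df = 3
values = simulate_chi_square(df)
print(f"sample mean = {fmean(values):.2f} (theory: {df})")
print(f"sample variance = {pvariance(values):.2f} (theory: {2 * df})")
```

Changing `df` to a larger value and plotting `values` shows the skew fading as df grows.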


Chi-Square Test for Independence

Question: Are two categorical variables related?

Example: Is customer satisfaction level independent of checkout design?

Contingency Table:

	Very Satisfied	Satisfied	Neutral	Dissatisfied
Design A	45	102	68	35
Design B	72	115	48	15

Hypotheses for Independence Test

Hypotheses:

$H_0$: The two variables are independent
$H_a$: The two variables are associated (dependent)

Test Statistic:

\[\chi^2 = \sum_{all\ cells} \frac{(O - E)^2}{E}\]

where:

O = Observed frequency
E = Expected frequency under independence

Calculating Expected Frequencies

Formula:

\[E_{ij} = \frac{(\text{Row}_i\ \text{Total}) \times (\text{Column}_j\ \text{Total})}{\text{Grand Total}}\]

Our example:

	Very Satisfied	Satisfied	Neutral	Dissatisfied	Total
Design A	45	102	68	35	250
Design B	72	115	48	15	250
Total	117	217	116	50	500

Expected Frequencies Calculation

For Design A, Very Satisfied:

\[E_{11} = \frac{250 \times 117}{500} = \frac{29,250}{500} = 58.5\]

Complete expected frequency table:

	Very Satisfied	Satisfied	Neutral	Dissatisfied
Design A	58.5	108.5	58.0	25.0
Design B	58.5	108.5	58.0	25.0

Note: Row and column totals match the observed totals (250 per row; 117, 217, 116, 50 per column)
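The expected-frequency formula applied to every cell at once, as a short Python sketch. The variable names are our own.

```python
observed = [
    [45, 102, 68, 35],  # Design A
    [72, 115, 48, 15],  # Design B
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row i total) * (column j total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(["Design A", "Design B"], expected):
    print(label, row)
```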

Calculating Chi-Square Statistic

\[\chi^2 = \sum_{all\ cells} \frac{(O - E)^2}{E}\]

Design A, Very Satisfied: $\frac{(45-58.5)^2}{58.5} = \frac{182.25}{58.5} = 3.12$

Design A, Satisfied: $\frac{(102-108.5)^2}{108.5} = \frac{42.25}{108.5} = 0.39$

Design A, Neutral: $\frac{(68-58.0)^2}{58.0} = \frac{100}{58.0} = 1.72$

Design A, Dissatisfied: $\frac{(35-25.0)^2}{25.0} = \frac{100}{25.0} = 4.00$

Calculating Chi-Square Statistic (cont.)

Design B, Very Satisfied: $\frac{(72-58.5)^2}{58.5} = \frac{182.25}{58.5} = 3.12$

Design B, Satisfied: $\frac{(115-108.5)^2}{108.5} = \frac{42.25}{108.5} = 0.39$

Design B, Neutral: $\frac{(48-58.0)^2}{58.0} = \frac{100}{58.0} = 1.72$

Design B, Dissatisfied: $\frac{(15-25.0)^2}{25.0} = \frac{100}{25.0} = 4.00$

\[\chi^2 = 3.12 + 0.39 + 1.72 + 4.00 + 3.12 + 0.39 + 1.72 + 4.00 = 18.46\]
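The eight cell contributions can be summed in a loop. This is a sketch with our own variable names, and it recomputes each expected count from the margins.

```python
observed = [
    [45, 102, 68, 35],  # Design A
    [72, 115, 48, 15],  # Design B
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi_square += (o - e) ** 2 / e         # this cell's contribution
print(f"chi-square = {chi_square:.2f}")
```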

Degrees of Freedom and Decision

Degrees of freedom:

\[df = (r-1)(c-1)\]

where r = number of rows, c = number of columns

Our example: $df = (2-1)(4-1) = 3$

For $\alpha = 0.05$ and $df = 3$: Critical value = 7.815

Our test statistic: $\chi^2 = 18.46 > 7.815$

Conclusion: Reject $H_0$. There is significant evidence that customer satisfaction and checkout design are associated.

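Interpretation: The test itself establishes only an association. Comparing observed with expected counts shows that Design B has more Very Satisfied and fewer Neutral and Dissatisfied customers than independence predicts, which suggests higher satisfaction with Design B.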

Google Sheets for Chi-Square Test

Function: =CHISQ.TEST(actual_range, expected_range)

Returns p-value for the test

For critical value: =CHISQ.INV.RT(alpha, df)

For p-value from statistic: =CHISQ.DIST.RT(chi_square, df)

Example:

// P-value
=CHISQ.DIST.RT(18.46, 3)

// Critical value
=CHISQ.INV.RT(0.05, 3)
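`CHISQ.DIST.RT` returns the right-tail area of the chi-square distribution. For integer degrees of freedom that area has a closed form, built up two degrees of freedom at a time starting from $df = 1$ or $df = 2$. The sketch below uses that recurrence so the Sheets results can be checked without a statistics package. The helper `chi2_right_tail` is our own and uses the standard library only.

```python
import math

def chi2_right_tail(x, df):
    """P(X > x) for X ~ chi-square(df), integer df >= 1 (the quantity CHISQ.DIST.RT returns)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        tail, k = math.exp(-x / 2), 2             # df = 2: exponential with mean 2
    else:
        tail, k = math.erfc(math.sqrt(x / 2)), 1  # df = 1: square of a N(0, 1)
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return tail

print(f"p-value = {chi2_right_tail(18.46, 3):.5f}")       # compare with =CHISQ.DIST.RT(18.46, 3)
print(f"tail at 7.815 = {chi2_right_tail(7.815, 3):.3f}")  # about 0.05, consistent with =CHISQ.INV.RT(0.05, 3)
```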

THINK-PAIR-SHARE 2 (7 minutes)

Poll Everywhere Time!

Question: A company surveys 400 employees about work preference (office/hybrid/remote) across 3 departments. Here’s the data:

	Office	Hybrid	Remote
Sales	30	45	25
Tech	15	50	85
Admin	40	55	55

Calculate the expected frequency for Sales-Office cell and the contribution to chi-square for that cell.

Work with your neighbor (4 minutes), then submit!

Assumptions for Chi-Square Test

Requirements for valid test:

Random sample: Data collected randomly
Independence: Observations are independent
Expected frequencies: All expected counts ≥ 5
Categorical data: Variables are categorical (not continuous)

What if these requirements aren’t met (for example, expected counts below 5)?

Fisher’s exact test for small samples
Combine categories if some cells have low counts
Use simulation-based methods

Interpreting Chi-Square Results

What does rejection of $H_0$ tell us?

Variables are associated (not independent)
Does NOT tell us direction of association
Does NOT tell us strength of association
Does NOT imply causation

To understand the relationship:

Examine residuals: $(O-E)/\sqrt{E}$
Look at cell contributions to $\chi^2$
Calculate measures of association (Cramér’s V, odds ratios)
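Two of these follow-up measures can be computed from the satisfaction table. The sketch below computes the residuals $(O-E)/\sqrt{E}$ and Cramér’s V. For V it uses the standard definition $V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}$, which is not on the slide. The code itself is our own helper.

```python
import math

observed = [
    [45, 102, 68, 35],  # Design A
    [72, 115, 48, 15],  # Design B
]
columns = ["Very Satisfied", "Satisfied", "Neutral", "Dissatisfied"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

residuals, chi_square = [], 0.0
for i, row in enumerate(observed):
    res_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        res_row.append((o - e) / math.sqrt(e))  # residual for cell (i, j)
        chi_square += (o - e) ** 2 / e
    residuals.append(res_row)

r, c = len(observed), len(observed[0])
cramers_v = math.sqrt(chi_square / (n * (min(r, c) - 1)))

for label, res_row in zip(["Design A", "Design B"], residuals):
    print(label, {col: round(x, 2) for col, x in zip(columns, res_row)})
print(f"Cramér's V = {cramers_v:.2f}")
```

Positive residuals mark cells with more customers than independence predicts. For Design B, the Very Satisfied residual is about +1.8 and the Dissatisfied residual is -2.00, consistent with higher satisfaction under Design B. Cramér’s V comes out near 0.19.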

Looking Ahead

Next lecture:

One-way Analysis of Variance (ANOVA)
F distribution and F-ratio
Comparing more than two means
Post-hoc tests

This builds on:

Today’s two-sample tests
Understanding of hypothesis testing
Comparing multiple groups simultaneously

Quick Knowledge Check ✅

Rate your confidence (1-5) on Ed Discussion:

Conducting two-sample t-tests ⭐⭐⭐⭐⭐
Interpreting effect sizes ⭐⭐⭐⭐⭐
Testing two proportions ⭐⭐⭐⭐⭐
Understanding chi-square distribution ⭐⭐⭐⭐⭐
Testing for independence ⭐⭐⭐⭐⭐

Thank you! 📊✨

Questions? I have office hours right after class today!

Next up: ANOVA and Linear Regression

Remember:

Post Think-Pair-Share on Ed Discussion and Poll Everywhere
Rate your confidence
Statistical significance + Effect size = Complete picture
