Statistics - UCSC
18 Nov 2025
By the end of this lecture, you will be able to:
Scenario: A major online retailer is testing two checkout designs:
Questions we’ll answer:
When do we compare two means?
Key assumption: Two independent samples from two populations
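Hypotheses: \(H_0: \mu_1 = \mu_2\) vs \(H_a: \mu_1 \neq \mu_2\) (two-sided), or \(H_a: \mu_1 > \mu_2\) / \(H_a: \mu_1 < \mu_2\) (one-sided)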
Test Statistic:
\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}\]
Under \(H_0\): \((\mu_1 - \mu_2) = 0\), so:
\[t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}\]
Case 1: Equal Variances (\(\sigma_1^2 = \sigma_2^2\))
\[SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
where pooled standard deviation:
\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}\]
Case 2: Unequal Variances (Welch’s t-test)
\[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
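The two standard-error formulas can be compared side by side in a short Python sketch. The function names and the sample numbers are illustrative assumptions, not values from the lecture, and only the standard library is used.

```python
import math


def pooled_se(s1, n1, s2, n2):
    """Standard error of (x1_bar - x2_bar) assuming equal population variances."""
    # Pooled variance: weighted average of the two sample variances
    sp_squared = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp_squared) * math.sqrt(1 / n1 + 1 / n2)


def welch_se(s1, n1, s2, n2):
    """Standard error of (x1_bar - x2_bar) without assuming equal variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


# Hypothetical summary statistics: unequal spreads and unequal sample sizes
se_pooled = pooled_se(s1=10.0, n1=20, s2=20.0, n2=80)
se_welch = welch_se(s1=10.0, n1=20, s2=20.0, n2=80)
print(f"pooled SE = {se_pooled:.3f}, Welch SE = {se_welch:.3f}")
```

When the two sample sizes are equal, both formulas give the same standard error. They diverge when the sample sizes and the sample standard deviations both differ, as in the made-up numbers here.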
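Data from our checkout design test: both designs have \(n = 250\) customers; the sample mean purchase amounts (in dollars) are 87.50 for Design A and 92.80 for Design B, and the sample standard deviations used in the calculation below are 22.30 and 24.10.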
Test: \(H_0: \mu_A = \mu_B\) vs \(H_a: \mu_B > \mu_A\) at \(\alpha = 0.05\)
Let’s assume equal variances for simplicity.
Step 1: Calculate pooled standard deviation
\[s_p = \sqrt{\frac{(250-1)(22.30)^2 + (250-1)(24.10)^2}{250 + 250 - 2}}\]
\[s_p = \sqrt{\frac{123,825.2 + 144,621.7}{498}} = \sqrt{539.05} = 23.22\]
Step 2: Calculate standard error
\[SE = 23.22 \sqrt{\frac{1}{250} + \frac{1}{250}} = 23.22 \times 0.0894 = 2.08\]
Step 3: Calculate t-statistic
\[t = \frac{92.80 - 87.50}{2.08} = \frac{5.30}{2.08} = 2.55\]
Degrees of freedom: \(df = n_1 + n_2 - 2 = 498\)
For one-sided test at \(\alpha = 0.05\) with \(df = 498\): Critical value ≈ 1.648 (very close to the standard normal value 1.645 because df is large)
Our test statistic: \(t = 2.55 > 1.648\)
Conclusion: Reject \(H_0\). There is significant evidence that Design B leads to higher average purchase amounts than Design A.
Practical interpretation: The new checkout design increases average purchases by about $5.30.
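The three steps of the worked example can be checked with a minimal Python sketch. The variable names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption of this sketch that is reasonable only because df is large.

```python
import math
from statistics import NormalDist

# Summary statistics as used in the worked example above
n_a, mean_a = 250, 87.50
n_b, mean_b = 250, 92.80
# The two sample SDs from Step 1; with equal sample sizes the pooled SD
# does not depend on which SD belongs to which design
sd_1, sd_2 = 22.30, 24.10

# Step 1: pooled standard deviation
sp = math.sqrt(((n_a - 1) * sd_1**2 + (n_b - 1) * sd_2**2) / (n_a + n_b - 2))
# Step 2: standard error of the difference in means
se = sp * math.sqrt(1 / n_a + 1 / n_b)
# Step 3: t-statistic for H_a: mu_B > mu_A
t_stat = (mean_b - mean_a) / se
df = n_a + n_b - 2

# Assumption: with df this large the t distribution is close to N(0, 1),
# so the normal tail area serves as an approximate one-sided p-value
p_approx = 1 - NormalDist().cdf(t_stat)

print(f"t = {t_stat:.2f}, df = {df}, approximate one-sided p = {p_approx:.4f}")
```

For small samples the normal shortcut would understate the p-value, so an exact t distribution (for example Excel's T.TEST) is the safer choice there.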
Function: =T.TEST(array1, array2, tails, type)
Parameters:
array1: First sample data range
array2: Second sample data range
tails: 1 for one-sided, 2 for two-sided
type: 1 for paired, 2 for equal variance, 3 for unequal variance
Example:
=T.TEST(A2:A251, B2:B251, 1, 2)
Returns p-value for one-sided test with equal variances
Statistical significance ≠ Practical importance
Effect Size measures the magnitude of a difference in standardized units
Cohen’s d:
\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\]
Standardized mean difference (in standard deviations)
Interpretation of Cohen’s d:
| Effect Size | Cohen’s d | Interpretation |
|---|---|---|
| Small | 0.2 | Difficult to detect |
| Medium | 0.5 | Noticeable difference |
| Large | 0.8 | Very noticeable |
Our example:
\[d = \frac{92.80 - 87.50}{23.22} = \frac{5.30}{23.22} = 0.23\]
Small effect size (just above the 0.2 benchmark): statistically significant but modest practical impact
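Cohen's d for the checkout example can be computed with a few lines of Python. The function names are hypothetical, and binning values that fall between the 0.2 / 0.5 / 0.8 benchmarks is an assumption of this sketch, since the table lists only the benchmark values themselves.

```python
import math


def cohens_d(mean_1, mean_2, s1, n1, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean_1 - mean_2) / sp


def label_effect(d):
    """Rough label; the cut points between benchmarks are an assumption."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Means: Design B minus Design A. With equal sample sizes the pooled SD
# is the same whichever design each SD belongs to.
d = cohens_d(92.80, 87.50, 22.30, 250, 24.10, 250)
print(f"d = {d:.2f} ({label_effect(d)})")
```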
Question: A software company tested two training methods for new employees. Method A (n=40): mean productivity score = 78.5, SD = 12.3. Method B (n=40): mean = 82.7, SD = 11.8.
Calculate:
Is this difference practically important?
Discuss with your neighbor (3 minutes), then submit your answer!
When: Testing if two groups differ on a binary outcome (success/failure)
Examples:
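Hypotheses: \(H_0: p_1 = p_2\) vs \(H_a: p_1 \neq p_2\) (two-sided), or \(H_a: p_1 > p_2\) / \(H_a: p_1 < p_2\) (one-sided)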
Pooled proportion: \(\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\)
Standard error under \(H_0\):
\[SE = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]
Test statistic:
\[z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE}\]
Under \(H_0\), \(z \sim N(0,1)\)
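Data from checkout design test: Design A had 47 conversions out of 250 visitors (\(\hat{p}_A = 0.188\)); Design B had 63 conversions out of 250 visitors (\(\hat{p}_B = 0.252\)).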
Test: \(H_0: p_A = p_B\) vs \(H_a: p_B > p_A\) at \(\alpha = 0.05\)
Step 1: Pooled proportion
\[\hat{p} = \frac{47 + 63}{250 + 250} = \frac{110}{500} = 0.220\]
Step 2: Standard error
\[SE = \sqrt{0.220(1-0.220)\left(\frac{1}{250} + \frac{1}{250}\right)}\]
\[SE = \sqrt{0.1716 \times 0.008} = \sqrt{0.001373} = 0.0371\]
Step 3: Test statistic
\[z = \frac{0.252 - 0.188}{0.0371} = \frac{0.064}{0.0371} = 1.73\]
For one-sided test at \(\alpha = 0.05\): Critical value = 1.645
Our test statistic: \(z = 1.73 > 1.645\)
Conclusion: Reject \(H_0\). There is significant evidence that Design B has a higher conversion rate than Design A.
Practical interpretation: Design B increases conversion rate by about 6.4 percentage points (from 18.8% to 25.2%).
Manual calculation approach:
// Pooled proportion
=(x1 + x2)/(n1 + n2)
// Standard error
=SQRT(pooled*(1-pooled)*(1/n1 + 1/n2))
// Z-statistic
=(p1 - p2)/SE
// P-value (one-sided)
=1 - NORM.S.DIST(z, TRUE)
// P-value (two-sided)
=2*(1 - NORM.S.DIST(ABS(z), TRUE))
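The same manual calculation can be sketched in Python with only the standard library. The function name `two_proportion_z` is hypothetical; the counts are the 47 and 63 conversions out of 250 used in the worked example.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, pooled SE) for H0: p1 = p2, with z based on p1_hat - p2_hat."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se, se


# Design B is listed first so that a positive z supports H_a: p_B > p_A
z, se = two_proportion_z(x1=63, n1=250, x2=47, n2=250)

standard_normal = NormalDist()  # N(0, 1)
p_one_sided = 1 - standard_normal.cdf(z)
p_two_sided = 2 * (1 - standard_normal.cdf(abs(z)))

print(f"SE = {se:.4f}, z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

Carrying full precision through the calculation, rather than rounding the standard error first, avoids small discrepancies in the last digit of z.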
Quick recap of Part 1:
Now: What if we have more than two categories?
The chi-square (\(\chi^2\)) distribution:
Notation: \(\chi^2_{df}\) or \(\chi^2(df)\)
Key properties:
Shape depends on df:
Question: Are two categorical variables related?
Example: Is customer satisfaction level independent of checkout design?
Contingency Table:
| | Very Satisfied | Satisfied | Neutral | Dissatisfied |
|---|---|---|---|---|
| Design A | 45 | 102 | 68 | 35 |
| Design B | 72 | 115 | 48 | 15 |
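Hypotheses: \(H_0\): satisfaction level and checkout design are independent vs \(H_a\): satisfaction level and checkout design are associated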
Test Statistic:
\[\chi^2 = \sum_{all\ cells} \frac{(O - E)^2}{E}\]
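where \(O\) is the observed frequency in a cell and \(E\) is the frequency expected in that cell if \(H_0\) is true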
Formula:
\[E_{ij} = \frac{(\text{Row}_i\ \text{Total}) \times (\text{Column}_j\ \text{Total})}{\text{Grand Total}}\]
Our example:
| | Very Satisfied | Satisfied | Neutral | Dissatisfied | Total |
|---|---|---|---|---|---|
| Design A | 45 | 102 | 68 | 35 | 250 |
| Design B | 72 | 115 | 48 | 15 | 250 |
| Total | 117 | 217 | 116 | 50 | 500 |
For Design A, Very Satisfied:
\[E_{11} = \frac{250 \times 117}{500} = \frac{29,250}{500} = 58.5\]
Complete expected frequency table:
| | Very Satisfied | Satisfied | Neutral | Dissatisfied |
|---|---|---|---|---|
| Design A | 58.5 | 108.5 | 58.0 | 25.0 |
| Design B | 58.5 | 108.5 | 58.0 | 25.0 |
Note: Row totals match observed (250 each)
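The expected-frequency formula can be applied to every cell at once. This is a minimal Python sketch using plain lists; the variable names are hypothetical, and the counts are the observed satisfaction data from the table above.

```python
# Observed counts from the contingency table (rows: Design A, Design B)
observed = [
    [45, 102, 68, 35],
    [72, 115, 48, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row i total) * (column j total) / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print(row)
```

Both rows of expected counts are identical here because the two designs have the same row total.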
\[\chi^2 = \sum_{all\ cells} \frac{(O - E)^2}{E}\]
Design A, Very Satisfied: \(\frac{(45-58.5)^2}{58.5} = \frac{182.25}{58.5} = 3.12\)
Design A, Satisfied: \(\frac{(102-108.5)^2}{108.5} = \frac{42.25}{108.5} = 0.39\)
Design A, Neutral: \(\frac{(68-58.0)^2}{58.0} = \frac{100}{58.0} = 1.72\)
Design A, Dissatisfied: \(\frac{(35-25.0)^2}{25.0} = \frac{100}{25.0} = 4.00\)
Design B, Very Satisfied: \(\frac{(72-58.5)^2}{58.5} = \frac{182.25}{58.5} = 3.12\)
Design B, Satisfied: \(\frac{(115-108.5)^2}{108.5} = \frac{42.25}{108.5} = 0.39\)
Design B, Neutral: \(\frac{(48-58.0)^2}{58.0} = \frac{100}{58.0} = 1.72\)
Design B, Dissatisfied: \(\frac{(15-25.0)^2}{25.0} = \frac{100}{25.0} = 4.00\)
\[\chi^2 = (3.12 + 0.39 + 1.72 + 4.00) + (3.12 + 0.39 + 1.72 + 4.00) = 18.46\]
Degrees of freedom:
\[df = (r-1)(c-1)\]
where r = number of rows, c = number of columns
Our example: \(df = (2-1)(4-1) = 3\)
For \(\alpha = 0.05\) and \(df = 3\): Critical value = 7.815
Our test statistic: \(\chi^2 = 18.46 > 7.815\)
Conclusion: Reject \(H_0\). There is significant evidence that customer satisfaction and checkout design are associated.
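Interpretation: The test itself shows only that an association exists. Comparing observed with expected counts shows its direction: Design B has more Very Satisfied customers (72 vs 58.5 expected) and fewer Dissatisfied customers (15 vs 25.0 expected), so satisfaction is higher under Design B.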
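The whole test of independence, from observed counts to the decision, can be reproduced in a short Python sketch. The variable names are hypothetical, and the critical value 7.815 is taken from the text rather than computed.

```python
observed = [
    [45, 102, 68, 35],   # Design A
    [72, 115, 48, 15],   # Design B
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi_square += (o - e) ** 2 / e                    # cell contribution

# df = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # alpha = 0.05, df = 3, as quoted in the text

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("Reject H0" if chi_square > critical_value else "Fail to reject H0")
```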
Function: =CHISQ.TEST(actual_range, expected_range)
Returns p-value for the test
For critical value: =CHISQ.INV.RT(alpha, df)
For p-value from statistic: =CHISQ.DIST.RT(chi_square, df)
Example:
// P-value
=CHISQ.DIST.RT(18.46, 3)
// Critical value
=CHISQ.INV.RT(0.05, 3)
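Outside Excel, the right-tail p-value for this particular example can be checked with the Python standard library. The sketch relies on a closed-form expression for the chi-square right-tail probability that holds only for df = 3. That shortcut is an assumption of this example, not part of the lecture, and other df values need a general-purpose statistics library.

```python
import math


def chi2_sf_df3(x):
    """Right-tail probability P(X > x) for a chi-square variable with df = 3 only."""
    # For df = 3: P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(18.46)
print(f"p-value = {p_value:.5f}")

# Sanity check: the tail area beyond the 0.05 critical value should be about 0.05
print(f"tail area beyond 7.815 = {chi2_sf_df3(7.815):.3f}")
```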
Question: A company surveys 400 employees about work preference (office/hybrid/remote) across 3 departments. Here’s the data:
| | Office | Hybrid | Remote |
|---|---|---|---|
| Sales | 30 | 45 | 25 |
| Tech | 15 | 50 | 85 |
| Admin | 40 | 55 | 55 |
Calculate the expected frequency for the Sales-Office cell and that cell's contribution to the chi-square statistic.
Work with your neighbor (4 minutes), then submit!
Requirements for valid test:
What if expectations aren’t met?
What does rejection of \(H_0\) tell us?
To understand the relationship:
Next lecture:
This builds on:
Rate your confidence (1-5) on Ed Discussion:
Questions? I have office hours right after class today!
Next up: ANOVA and Linear Regression
Remember:
STAT 17 – Fall 2025