STAT 7 - Statistical Methods for the Biological, Environmental & Health Sciences
09 Jan 2026
📋 Quick Reminders
We’ll explore how to think statistically and learn about data collection methods that impact everything we do in this course.
By the end of today’s class, you will be able to:
Explain what statistical thinking is and why it matters for scientific research
Identify the key components of a statistical study (population, sample, variables, parameters, statistics)
Distinguish between different types of variables (categorical vs. numerical; nominal, ordinal, binary, discrete, continuous)
Recognize common sources of bias in data collection
Evaluate the quality and reliability of data based on collection methods
Apply these concepts to real-world case studies in biological and health sciences

Key Question: Does early peanut consumption prevent allergies?
| participant.ID | treatment.group | overall.V60.outcome |
|---|---|---|
| LEAP_100522 | Peanut Consumption | PASS OFC |
| LEAP_103358 | Peanut Consumption | PASS OFC |
| LEAP_105069 | Peanut Avoidance | PASS OFC |
| LEAP_994047 | Peanut Avoidance | PASS OFC |
| LEAP_997608 | Peanut Consumption | PASS OFC |
Individual-level data for five children shows their treatment group and outcome.
But 640 children are too many to examine one at a time…
| Treatment group | FAIL OFC | PASS OFC | Total |
|---|---|---|---|
| Peanut Avoidance | 36 | 227 | 263 |
| Peanut Consumption | 5 | 262 | 267 |
| Total | 41 | 489 | 530 |

Is the 11.8 percentage-point difference in allergy rates (13.7% vs. 1.9%) real, or just chance variation?
This is where statistical thinking comes in!
We need tools to:
- Understand whether this difference is meaningful
- Account for uncertainty
- Draw reliable conclusions
Later in the course, you’ll learn how to answer this question using hypothesis testing
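One way to get a feel for "chance variation" before we formalize it: if the treatment made no difference, the group labels would be arbitrary, so we can shuffle them many times and see how often a gap as large as the observed one shows up by luck alone. Below is a minimal Python sketch of that shuffle (permutation) idea using only the counts in the table; the 2,000 shuffles and the seed are arbitrary choices, and this is a preview, not necessarily the exact procedure we'll use later in the course.

```python
import random

# Counts from the LEAP table (FAIL OFC = developed peanut allergy).
AVOID_FAIL, AVOID_N = 36, 263
CONSUME_FAIL, CONSUME_N = 5, 267


def diff_in_fail_rates(outcomes, n_avoid):
    """Avoidance failure rate minus consumption failure rate.

    `outcomes` holds one 0/1 value per child (1 = FAIL OFC); the first
    `n_avoid` entries are treated as the avoidance group.
    """
    avoid = outcomes[:n_avoid]
    consume = outcomes[n_avoid:]
    return sum(avoid) / len(avoid) - sum(consume) / len(consume)


def shuffle_test(n_shuffles=2000, seed=7):
    # Build one outcome per child, avoidance group first.
    outcomes = ([1] * AVOID_FAIL + [0] * (AVOID_N - AVOID_FAIL)
                + [1] * CONSUME_FAIL + [0] * (CONSUME_N - CONSUME_FAIL))
    observed = diff_in_fail_rates(outcomes, AVOID_N)

    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(outcomes)  # pretend group labels carry no information
        if diff_in_fail_rates(outcomes, AVOID_N) >= observed:
            as_extreme += 1
    return observed, as_extreme / n_shuffles


observed, frac_as_extreme = shuffle_test()
print(f"observed gap: {observed:.3f}, share of shuffles at least as large: {frac_as_extreme}")
```

If essentially none of the shuffled datasets produce a gap as large as the real one, chance alone is a poor explanation for the difference.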
The Belief Bias Effect
When asked if an argument is logically valid, people tend to be influenced by whether the conclusion seems believable, even when they shouldn’t be.
We naturally let our biases affect our reasoning!
Recommended reading: Chapter 1 from Learning Statistics with R
📝 Your Task (5 minutes total)
Think individually (2 min): Can you think of one example of belief bias you’ve encountered? Consider:
Pair with neighbor (2 min): Share and discuss both examples
Share (1 min): Decide which example best illustrates belief bias and submit to Poll Everywhere (both partners submit the same answer)

PollEv.com/slugstats
“Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty.”
An interdisciplinary field that uses various methods and processes to extract insight from data and apply actionable insight across application domains.
PPDAC: Problem → Plan → Data → Analysis → Conclusion
Every statistical study follows this cycle!
From Chris Wild: What is Statistics?
Use numerical and graphical methods to:
Example: “In our sample, 13.7% of children in the avoidance group developed allergies”
Use sample data to:
Example: “We conclude that peanut consumption reduces allergy risk in the broader population of at-risk children”
This course covers both! We start with description, then move to inference.
Unit (Subject): An object we collect data about
Population: The full set of units we’re interested in
Sample: A subset of the population we actually observe
Variable: A characteristic we measure for each unit
Note: A census collects data from every member of a population (rare and expensive!)

All units of interest (often unobserved)
Observed units (subset we measure)
Key insight: We use the sample (what we can observe) to learn about the population (what we want to know about)
A numerical measurement describing a population
Examples:
A numerical measurement describing a sample
Examples:
Statistics estimate parameters! We calculate statistics from our sample to guess the parameter values for the population.
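A small simulation can make the parameter/statistic distinction concrete. The sketch below builds a made-up population of 10,000 units in which exactly 12% have some trait (the population, its size, and the 12% are all hypothetical), computes the parameter from every unit, then computes a statistic from one random sample of 500.

```python
import random

# Hypothetical population: 1 = has the trait, 0 = does not.
rng = random.Random(2026)
population = [1] * 1200 + [0] * 8800

# Parameter: uses the whole population (usually unknowable in practice).
p = sum(population) / len(population)

# Statistic: uses only a random sample of 500 units.
sample = rng.sample(population, 500)
p_hat = sum(sample) / len(sample)

print(f"parameter p = {p:.3f}, statistic p_hat = {p_hat:.3f}")
```

Change the seed and `p_hat` moves around while `p` stays put: the statistic varies from sample to sample, the parameter is fixed.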
For the LEAP peanut allergy study, identify each component:
Question 1: What is the unit (subject)?
A. Peanuts
B. Children
C. Allergy tests
D. Treatment groups
Question 2: What is the population?
A. All peanuts produced in the UK
B. UK children with eczema/egg allergy in 2006
C. All allergy tests in 2009
D. The 640 children enrolled

Question 3: Is this a census or sample?
A. Census - measured everyone in the population
B. Sample - measured a subset of the population
Question 4: What is a variable in this study?
A. Children
B. Treatment assignment (consumption vs. avoidance)
C. The United Kingdom
D. 2006-2009
Question 5: The parameter of interest is:
A. The true proportion of all at-risk UK children who would develop allergies under each treatment
B. The 13.7% failure rate in our avoidance group
C. The 530 children in the study
D. Whether a child passed or failed the OFC
Unit: Children (individuals with eczema/egg allergy)
Population: UK children aged 4-11 months with eczema/egg allergy in 2006
Sample: The 530 children with a negative skin test who completed the study
Variables:
- Treatment group (consumption/avoidance)
- OFC result (pass/fail)
- Age, severity of eczema, etc.
Parameter:
- True proportion of the population who would develop allergies under avoidance: p₁
- True proportion under consumption: p₂
- We want to know: p₁ − p₂
Statistic:
- Sample proportion with avoidance: 36/263 ≈ 0.137
- Sample proportion with consumption: 5/267 ≈ 0.019
- Observed difference: 0.137 − 0.019 = 0.118 (11.8 percentage points)
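These statistics can be reproduced directly from the contingency table counts. A minimal Python sketch (the dictionary layout is just one convenient choice):

```python
# Counts from the LEAP contingency table (FAIL OFC = developed peanut allergy).
counts = {
    "Peanut Avoidance":   {"FAIL": 36, "PASS": 227},
    "Peanut Consumption": {"FAIL": 5,  "PASS": 262},
}


def fail_proportion(row):
    """Sample proportion failing the oral food challenge in one group."""
    return row["FAIL"] / (row["FAIL"] + row["PASS"])


p1_hat = fail_proportion(counts["Peanut Avoidance"])    # estimates p1
p2_hat = fail_proportion(counts["Peanut Consumption"])  # estimates p2
diff = p1_hat - p2_hat                                  # estimates p1 - p2

print(round(p1_hat, 3), round(p2_hat, 3), round(diff, 3))
# → 0.137 0.019 0.118
```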
Sometimes populations are conceptually infinite:
In these cases, the parameter represents the theoretical property of the process generating the data.
Break Time! ☕ 5-minute break
Stretch, grab water, chat with neighbors!
We’ll resume with types of variables and data collection.
Variables that place individuals into groups or categories
Subtypes:
Nominal: Unordered categories
- Blood type (A, B, AB, O)
- Species of animal
- Brand of detector
Ordinal: Ordered categories
- Education level (HS, BS, MS, PhD)
- Disease severity (mild, moderate, severe)
- Course grade (A, B, C, D, F)
Binary: Only two outcomes
- Yes/No, Pass/Fail, Alive/Dead
- Gender (in some contexts)
Variables that take on numerical values where arithmetic makes sense
Subtypes:
Discrete: Countable values, often integers
- Number of children
- Number of mutations
- Number of emergency room visits
Continuous: Any value in a range
- Height, weight, temperature
- Blood pressure
- Reaction time
Art by Allison Horst
Different variable types require different methods!
Categorical variables:
Numerical variables:
Bottom line: Correctly identifying variable types is the first step in any data analysis!
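As a rough illustration of why type matters, the sketch below summarizes each kind of variable with Python's standard library. All values are made up; the point is which summaries make sense: counts for categorical data, an ordering you supply yourself for ordinal data, and means/medians for numerical data.

```python
from collections import Counter
from statistics import mean, median

# Made-up values, for illustration only.
blood_type = ["A", "O", "O", "B", "A", "O", "AB"]      # nominal
education = ["HS", "PhD", "BS", "HS", "MS"]             # ordinal
er_visits = [0, 2, 1, 0, 3, 1]                          # discrete
height_cm = [162.5, 170.1, 158.3, 181.0, 175.4]         # continuous

# Nominal: counts per category are the natural summary.
blood_counts = Counter(blood_type)

# Ordinal: we must supply the order; Python would otherwise sort alphabetically.
LEVELS = ["HS", "BS", "MS", "PhD"]
ranked = sorted(education, key=LEVELS.index)
median_education = ranked[len(ranked) // 2]  # odd count, so this is the middle value

# Numerical: arithmetic is meaningful, so means and medians apply.
mean_visits = mean(er_visits)
median_height = median(height_cm)

print(blood_counts, median_education, round(mean_visits, 2), median_height)
```

Sorting the education levels alphabetically instead would put BS before HS and report HS as the "middle" category, which is wrong; and there is no sensible "mean blood type" at all.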
For each variable, identify the type (nominal, ordinal, binary, discrete, or continuous):
Recall the carbon monoxide detector study. Classify these variables:
“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
— John Tukey, legendary statistician
Your analysis is only as good as your data collection!
Undercoverage: Some groups are systematically excluded from the sample
Nonresponse bias: Those who respond differ from those who don’t
Voluntary response bias: People self-select into the sample
Measurement error: Measurements are systematically wrong
Loaded/ambiguous questions: Question wording influences response
Question order effects: Earlier questions influence later responses
During WWII, military officials wanted to determine where to add extra armor to bombers.
They recorded damage on planes that returned from missions.
Initial thought: Add armor where we see the most damage (wings, fuselage)
Problem: What about the planes that didn’t return?
Correct answer: Add armor where returning planes show little damage (engines, cockpit) — because planes hit there didn’t make it back!
Survivor bias: Only observing “survivors” gives a misleading picture of the full population
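A quick simulation shows how survivor bias arises. The survival probabilities below are invented for illustration (not historical data), and each simulated plane takes exactly one hit in a random area: hits are spread evenly across areas, but the planes we get to inspect show far fewer engine and cockpit hits.

```python
import random
from collections import Counter

# Hypothetical chance that a plane hit in each area makes it home.
SURVIVAL = {"wings": 0.9, "fuselage": 0.85, "engine": 0.3, "cockpit": 0.35}

rng = random.Random(1943)
all_hits = Counter()
returned_hits = Counter()
for _ in range(10_000):
    area = rng.choice(list(SURVIVAL))  # one hit, equally likely in any area
    all_hits[area] += 1
    if rng.random() < SURVIVAL[area]:
        returned_hits[area] += 1       # only these planes can be inspected

for area in SURVIVAL:
    print(f"{area:9s} hit: {all_hits[area]:5d}   seen on returning planes: {returned_hits[area]:5d}")
```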
An influencer posted: “Have you ever been bitten by an animal?”
Questions:

Problems:
Likely truth: Far less than 65% of all people have been bitten!
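The same mechanism drives a voluntary-response poll. In the sketch below, we assume (purely hypothetically) that 30% of people have ever been bitten and that bitten people are three times as likely to reply to the post; these rates are not meant to reproduce the influencer's actual result, only to show the direction of the bias.

```python
import random

# Hypothetical assumptions, made up for illustration:
TRUE_RATE = 0.30          # fraction of all people ever bitten
P_RESPOND_BITTEN = 0.15   # bitten people find the post relevant and reply
P_RESPOND_NOT = 0.05      # people never bitten reply less often

rng = random.Random(42)
responses = []
for _ in range(100_000):
    bitten = rng.random() < TRUE_RATE
    p_respond = P_RESPOND_BITTEN if bitten else P_RESPOND_NOT
    if rng.random() < p_respond:       # people self-select into the sample
        responses.append(bitten)

poll_rate = sum(responses) / len(responses)
print(f"true rate {TRUE_RATE:.0%}, poll says {poll_rate:.0%}")
```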
📝 Your Task (10 minutes total)
Consider this scenario:
A pharmaceutical company wants to test a new weight-loss drug. They post ads on social media asking for volunteers. Participants who complete the 6-month study will receive $500. The company measures weight loss by asking participants to self-report their weight at the beginning and end of the study.
Questions to discuss:
Think (2 min): What sources of bias can you identify?
Pair (5 min): Share with neighbor, identify at least 3 different bias types
Share (3 min): Groups share their answers via Poll Everywhere
Volunteer bias: People who respond to ads may be more motivated than the general population
Nonresponse bias: Only those who complete 6 months are measured (people who quit or had bad reactions are excluded)
Measurement error: Self-reported weights can be inaccurate
Financial incentive: $500 may encourage favorable reporting
Better for researchers to take the measurements themselves when possible!
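Why does self-reporting matter so much? Random measurement noise tends to average out over many participants, but a systematic shift does not. The sketch below uses invented numbers: true weight changes centered at -2 kg, 1 kg of random noise, and a hypothetical 1.5 kg downward shading in self-reports.

```python
import random
from statistics import mean

rng = random.Random(500)
# Hypothetical true weight change (kg) for 200 participants.
true_change = [rng.gauss(-2.0, 3.0) for _ in range(200)]

# Random error only: noise equally likely to push up or down.
noisy = [c + rng.gauss(0, 1.0) for c in true_change]
# Systematic error: the same noise plus a consistent 1.5 kg under-report.
self_report = [c + rng.gauss(0, 1.0) - 1.5 for c in true_change]

print(f"true mean change:        {mean(true_change):.2f} kg")
print(f"with random error only:  {mean(noisy):.2f} kg")
print(f"with self-report bias:   {mean(self_report):.2f} kg")
```

The second average lands close to the truth; the third is off by roughly the size of the built-in bias, no matter how many participants are added.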
“Medication reduces migraines by 150%”
Problem: A quantity can’t be reduced by more than 100%!
Example: 12% of 500 = (12/100) × 500 = 60
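Both percentage calculations are one-liners. In the sketch below (the function names are ours, and the migraine counts are made up), a percent reduction is measured relative to the starting value, so it can only reach 100% when the quantity drops all the way to zero.

```python
def percent_of(pct, whole):
    """pct% of whole, e.g. 12% of 500."""
    return pct * whole / 100


def percent_reduction(before, after):
    """Percent decrease from `before` to `after`, relative to `before`."""
    return (before - after) * 100 / before


print(percent_of(12, 500))        # 60.0
print(percent_reduction(10, 4))   # 10 migraines/month down to 4: 60.0
print(percent_reduction(10, 0))   # eliminated entirely: 100.0
```

Since a count of migraines cannot go below zero, no value of `after` pushes `percent_reduction` past 100, which is why a "150% reduction" is a red flag.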
Just because two things are related doesn’t mean one causes the other!
Classic example:
Hidden variable: Temperature/season
We’ll study causation more when we cover experimental design next week!
✅ Statistical thinking and the PPDAC cycle
✅ Key concepts: Population, sample, variable, parameter, statistic
✅ Variable types: Categorical (nominal, ordinal, binary) and numerical (discrete, continuous)
✅ Bias in data collection: Survivor bias, volunteer bias, nonresponse bias, measurement error
✅ Real-world applications through the LEAP study and CO detector example
Key Takeaway: Good statistics starts with good data. Always ask: How was this data collected?
Rate your confidence (1-25 ⭐s) on Ed Discussion:
Can you now:
If your stars add up to more than 16, you’re ready to move forward! 🎉
If not, review Chapter 1 from the textbook and come to office hours.
📝 Before You Leave
Exit ticket: Complete today’s lecture summary
Check attendance: Did you complete Poll Everywhere activities?
Complete assignments:
Data Visualization & Summary Statistics
- How to create effective graphs
- Summarizing categorical data
- Introduction to distributions
Read: Textbook Sections 1.6, 1.7, 2.1
Great work today!
See you next class! 📊✨
Questions? Catch me after class or on Ed Discussion
STAT 7 – Winter 2026