Statistical Methods for Biological, Environmental, and Health Sciences
When: Thursday 5:20 pm, usual classroom
Format:
What’s covered: Everything we have discussed so far.
Questions welcome throughout!
Statistics is the science of collecting, analyzing, and interpreting data to answer questions and make decisions
Why it matters in biological & health sciences:
Numerical (Quantitative):
Categorical (Qualitative):
Observational Study:
Experiment:
Important
Only well-designed experiments allow causal conclusions!
Random Assignment: Participants randomly assigned to treatment groups
Control Group: Comparison group (often receives a placebo)
Blinding:
Replication: Large sample size for reliable results
Convenience Sampling:
Voluntary Response Bias:
Confounding:
Mean (Average):
Median:
Mode:
Tip
Use median for skewed data, mean for symmetric data
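Here is a quick illustration of the tip, using Python's standard `statistics` module on a small made-up sample. The hospital length-of-stay numbers are hypothetical. One very long stay drags the mean upward, while the median stays put.

```python
import statistics

# Hypothetical lengths of hospital stay (days); the single 30-day stay
# creates a long right tail.
stays = [2, 3, 3, 4, 4, 5, 5, 6, 30]

mean_stay = statistics.mean(stays)      # pulled toward the long tail
median_stay = statistics.median(stays)  # middle value, resistant to the tail

print(f"mean = {mean_stay:.2f} days, median = {median_stay} days")
# mean = 6.89 days, median = 4 days
```

Eight of the nine stays are 6 days or shorter, yet the mean is nearly 7. Here the median is the better summary of a typical stay.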
Range:
Interquartile Range (IQR):
Standard Deviation (SD):
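Below is a minimal sketch of the three spread measures on a hypothetical sample. One caveat: there are several conventions for computing quartiles, so Q1, Q3, and therefore the IQR reported by software can differ slightly from a hand calculation on small samples. The `method="inclusive"` choice is an assumption and may not match the convention used in class.

```python
import statistics

# Hypothetical resting heart rates (beats per minute)
rates = [62, 65, 68, 70, 71, 73, 75, 78, 84]

data_range = max(rates) - min(rates)  # Range = max - min

# quantiles(n=4) returns the three cut points [Q1, median, Q3]
q1, median, q3 = statistics.quantiles(rates, n=4, method="inclusive")
iqr = q3 - q1  # IQR = Q3 - Q1, spread of the middle 50%

sd = statistics.stdev(rates)  # sample SD (divides by n - 1)

print(f"range = {data_range}, IQR = {iqr}, SD = {sd:.2f}")
```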
Symmetric:
Right-Skewed (Positive skew):
Left-Skewed (Negative skew):
Always include three components:
Also mention:
| Variable Type(s) | Graph Type |
|---|---|
| One categorical | Bar chart |
| One numerical | Histogram or box plot |
| Two categorical | Segmented bar chart or mosaic plot |
| Numerical + Categorical | Side-by-side box plots |
| Two numerical | Scatterplot |
Purpose: Show the distribution of a numerical variable
Key features:
Shows five-number summary:
Box: IQR (middle 50%)
Whiskers: Extend to the smallest and largest values that are not outliers
Outliers: Shown as individual points
Step 1: Calculate IQR = Q3 - Q1
Step 2: Calculate fences: lower fence = Q1 - 1.5 × IQR, upper fence = Q3 + 1.5 × IQR
Step 3: Values outside fences are outliers
Impact: Outliers can greatly affect the mean and SD but have little effect on the median and IQR
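The outlier steps can be sketched in Python as follows. The sketch assumes the common 1.5 × IQR multiplier and Python's `inclusive` quartile method; other quartile conventions can shift the fences slightly. The function name `iqr_fences` and the data are hypothetical. The example also checks the impact claim by comparing summaries with and without the flagged value.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Lower and upper outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical measurements with one suspicious reading
values = [10, 12, 12, 13, 14, 15, 15, 16, 40]

lower, upper = iqr_fences(values)
outliers = [v for v in values if v < lower or v > upper]
kept = [v for v in values if lower <= v <= upper]

print(f"fences = ({lower}, {upper}), outliers = {outliers}")
print(f"mean:   {statistics.mean(values):.2f} -> {statistics.mean(kept):.2f}")
print(f"median: {statistics.median(values)} -> {statistics.median(kept)}")
```

Dropping the single outlier moves the mean by about 3 units but the median by only 0.5. This is why the median and IQR are called resistant summaries.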
Purpose: Show the relationship between two numerical variables
Look for:
Sample Space (S): All possible outcomes
Event (E): Collection of outcomes
Probability: Likelihood an event occurs
When all outcomes are equally likely:
\[P(\text{Event}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}\]
Properties:
Addition Rule (OR):
For mutually exclusive events: \[P(A \text{ or } B) = P(A) + P(B)\]
For non-mutually exclusive events: \[P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\]
Multiplication Rule (AND):
For independent events: \[P(A \text{ and } B) = P(A) \times P(B)\]
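You can check both rules by brute-force counting over equally likely outcomes. This sketch uses a fair six-sided die as the example, and the helper names are hypothetical. It computes classical probabilities exactly with `fractions.Fraction`, then verifies:

- the general addition rule for two overlapping events;
- the multiplication rule for two independent rolls.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Classical probability: favorable outcomes / total outcomes (equally likely)."""
    space = list(space)
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

die = range(1, 7)  # sample space for one fair die

def is_even(x):
    return x % 2 == 0      # {2, 4, 6}

def is_high(x):
    return x > 4           # {5, 6}; overlaps is_even at 6

# Addition rule (not mutually exclusive): P(A or B) = P(A) + P(B) - P(A and B)
p_or = prob(lambda x: is_even(x) or is_high(x), die)
p_rule = prob(is_even, die) + prob(is_high, die) - prob(lambda x: is_even(x) and is_high(x), die)
print(p_or, p_rule)  # 2/3 2/3

# Multiplication rule (independent rolls): P(six and six) = P(six) * P(six)
two_rolls = list(product(die, die))
p_two_sixes = prob(lambda r: r[0] == 6 and r[1] == 6, two_rolls)
print(p_two_sixes, prob(lambda x: x == 6, die) ** 2)  # 1/36 1/36
```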
Independent:
Dependent:
Probability of A given B has occurred:
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
Example: What is the probability that a patient has the disease, given that they tested positive?
\[P(\text{Disease}|\text{Positive test}) = \frac{P(\text{Disease and Positive})}{P(\text{Positive})}\]
Useful for:
How to use:
Two-way tables are also called contingency tables
Organize data by two categorical variables
| | Disease | No Disease | Total |
|---|---|---|---|
| Test+ | A | B | A + B |
| Test- | C | D | C + D |
| Total | A + C | B + D | N |
Useful for calculating conditional probabilities
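To make the conditional-probability formula concrete, here is a sketch with hypothetical counts filled into the table layout: A = 45, B = 30, C = 5, D = 920, so N = 1000.

- Conditioning on a row divides by that row's total.
- Conditioning on a column divides by that column's total.

This is why \(P(\text{Disease}|\text{Test+})\) and \(P(\text{Test+}|\text{Disease})\) come out different. The helper `p` is a hypothetical name.

```python
from fractions import Fraction

# Hypothetical counts: rows = test result, columns = disease status
counts = {
    ("Test+", "Disease"): 45, ("Test+", "No Disease"): 30,
    ("Test-", "Disease"): 5,  ("Test-", "No Disease"): 920,
}
N = sum(counts.values())

def p(row=None, col=None):
    """Joint (row and col) or marginal (row or col only) probability from counts."""
    hits = sum(n for (r, c), n in counts.items()
               if (row is None or r == row) and (col is None or c == col))
    return Fraction(hits, N)

# P(A|B) = P(A and B) / P(B)
p_disease_given_pos = p("Test+", "Disease") / p(row="Test+")   # A / (A + B)
p_pos_given_disease = p("Test+", "Disease") / p(col="Disease")  # A / (A + C)

print(p_disease_given_pos, p_pos_given_disease)  # 3/5 9/10
```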
Sensitivity: Probability the test is positive, given the person has the disease
\[\text{Sensitivity} = P(\text{Test+}|\text{Disease})\]
Specificity: Probability the test is negative, given the person does not have the disease
\[\text{Specificity} = P(\text{Test-}|\text{No Disease})\]
Positive Predictive Value (PPV): Probability of disease, given a positive test
\[\text{PPV} = P(\text{Disease}|\text{Test+})\]
Negative Predictive Value (NPV): Probability of no disease, given a negative test
\[\text{NPV} = P(\text{No Disease}|\text{Test-})\]
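In terms of the cells A, B, C, D of the two-way table, each measure is one cell divided by a row or column total. Here is a minimal sketch; the function name `diagnostic_metrics` and the example counts are hypothetical.

```python
def diagnostic_metrics(a, b, c, d):
    """Accuracy measures for a 2x2 table laid out as
                 Disease   No Disease
        Test+       a          b
        Test-       c          d
    """
    return {
        "sensitivity": a / (a + c),  # P(Test+ | Disease): column total
        "specificity": d / (b + d),  # P(Test- | No Disease): column total
        "ppv": a / (a + b),          # P(Disease | Test+): row total
        "npv": d / (c + d),          # P(No Disease | Test-): row total
    }

for name, value in diagnostic_metrics(45, 30, 5, 920).items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.900
# specificity: 0.968
# ppv: 0.600
# npv: 0.995
```

The two pairs condition on different things:

- Sensitivity and specificity condition on true disease status (the columns).
- PPV and NPV condition on the test result (the rows).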
General formula:
\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]
For medical testing:
\[P(\text{Disease}|\text{Test+}) = \frac{P(\text{Test+}|\text{Disease}) \times P(\text{Disease})}{P(\text{Test+})}\]
Note
PPV depends on disease prevalence! The same test has a different PPV in populations with different prevalence.
When \(P(\text{Test+})\) is not given, expand it with the law of total probability:
\[P(\text{Disease}|\text{Test+}) = \frac{P(\text{Test+}|\text{Disease}) \times P(\text{Disease})}{P(\text{Test+}|\text{Disease}) \times P(\text{Disease}) + P(\text{Test+}|\text{No Disease}) \times P(\text{No Disease})}\]
Equivalently, in terms of sensitivity, specificity, and prevalence:
\[\frac{\text{Sensitivity} \times \text{Prevalence}}{\text{Sensitivity} \times \text{Prevalence} + (1-\text{Specificity}) \times (1-\text{Prevalence})}\]
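The last formula translates directly into code, which makes the note about prevalence easy to see. This sketch assumes a hypothetical test with 90% sensitivity and 95% specificity and evaluates its PPV at three illustrative prevalences. The function name `ppv` is an illustrative choice.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(Disease | Test+) from Bayes' theorem with P(Test+) expanded."""
    true_pos = sensitivity * prevalence                # P(Test+ and Disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(Test+ and No Disease)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics
for prevalence in (0.01, 0.10, 0.50):
    print(f"prevalence {prevalence:.0%}: PPV = {ppv(0.90, 0.95, prevalence):.3f}")
# prevalence 1%: PPV = 0.154
# prevalence 10%: PPV = 0.667
# prevalence 50%: PPV = 0.947
```

When the disease is rare, most positive results come from the large healthy group. PPV therefore drops sharply even though the test itself has not changed.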
Random Variable: Numerical outcome of a random process
Discrete Random Variable:
Continuous Random Variable:
Probability Distribution:
Expected Value (Mean):
\[E(X) = \mu = \sum [x \times P(X = x)]\]
Standard Deviation:
\[SD(X) = \sigma = \sqrt{\sum [(x - \mu)^2 \times P(X = x)]}\]
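Both formulas are weighted sums over the possible values of X. Here is a sketch using a hypothetical distribution for the number of clinic visits in a year.

```python
import math

# Hypothetical distribution of X = number of clinic visits in a year
dist = {0: 0.2, 1: 0.4, 2: 0.3, 3: 0.1}
assert math.isclose(sum(dist.values()), 1.0)  # a valid distribution sums to 1

mu = sum(x * p for x, p in dist.items())                           # E(X)
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))  # SD(X)

print(f"E(X) = {mu:.2f}, SD(X) = {sigma:.2f}")  # E(X) = 1.30, SD(X) = 0.90
```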
When to use:
Examples:
Probability of exactly k successes:
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\)
Mean and Standard Deviation:
\[\mu = np\]
\[\sigma = \sqrt{np(1-p)}\]
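Here is a sketch of the binomial formulas in Python. `math.comb` requires Python 3.8 or later, and the setting (10 patients, response probability 0.3) is hypothetical. The last two lines confirm that \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\) agree with the general E(X) and SD(X) definitions applied to the binomial probabilities.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical: n = 10 independent patients, each responds with probability p = 0.3
n, p = 10, 0.3
print(f"P(X = 3) = {binom_pmf(3, n, p):.4f}")  # P(X = 3) = 0.2668
print(f"mean = {n * p:.1f}, SD = {math.sqrt(n * p * (1 - p)):.3f}")  # mean = 3.0, SD = 1.449

# Same mean and SD from the discrete-variable definitions
mu = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
sigma = math.sqrt(sum((k - mu) ** 2 * binom_pmf(k, n, p) for k in range(n + 1)))
```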
Characteristics:
Notation: \(X \sim N(\mu, \sigma)\)
Key feature: Total area under curve = 1
For a normal distribution:
Tip
Use this for quick estimates of percentages and unusual values
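Assuming the tip refers to the usual empirical (68-95-99.7) rule, you can check those percentages with Python's `statistics.NormalDist` (Python 3.8+). Because every normal curve becomes N(0, 1) after standardizing, the standard normal is enough.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0, sigma = 1

# Area within k SDs of the mean, P(-k < Z < k); the same for any normal curve
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} SD: {area:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```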
Z-score: Number of standard deviations from the mean
\[z = \frac{x - \mu}{\sigma}\]
Interpretation:
Use: Compare values from different distributions
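Here is a small sketch of comparing values across distributions. The two reference distributions below are hypothetical numbers chosen for illustration, not clinical norms, and the function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Standard deviations from the mean: positive above, negative below."""
    return (x - mu) / sigma

# Hypothetical reference distributions for two different lab measurements
z_glucose = z_score(110, mu=95, sigma=10)   # fasting glucose, mg/dL
z_sbp = z_score(140, mu=120, sigma=15)      # systolic blood pressure, mmHg

print(f"glucose z = {z_glucose:.2f}, blood pressure z = {z_sbp:.2f}")
# glucose z = 1.50, blood pressure z = 1.33
```

The raw units differ, but the glucose reading is further from its mean in SD units. It is therefore the more unusual of the two.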
Standard Normal: \(N(0, 1)\)
Any normal can be standardized:
If \(X \sim N(\mu, \sigma)\), then \(Z = \frac{X - \mu}{\sigma} \sim N(0,1)\)
Use a z-table or technology to find probabilities
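"Technology" can be as small as Python's standard-library `statistics.NormalDist` (Python 3.8+). This sketch uses a hypothetical N(120, 15) model and covers four typical question types:

- below a value;
- above a value;
- between two values;
- the reverse problem of finding a cutoff (percentile).

It also checks that standardizing first gives the same answer.

```python
import math
from statistics import NormalDist

# Hypothetical model: X ~ N(mu = 120, sigma = 15)
X = NormalDist(mu=120, sigma=15)

p_below = X.cdf(100)                 # P(X < 100)
p_above = 1 - X.cdf(150)             # P(X > 150)
p_between = X.cdf(140) - X.cdf(110)  # P(110 < X < 140)
cutoff = X.inv_cdf(0.90)             # 90th percentile: P(X < cutoff) = 0.90

# Standardizing gives the same answer as using X directly
z = (100 - 120) / 15
assert math.isclose(NormalDist().cdf(z), p_below)

print(f"{p_below:.4f} {p_above:.4f} {p_between:.4f} {cutoff:.1f}")
```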
Common questions:
Process:
How to check whether data are approximately normal:
Warning
Many statistical methods assume normality, so always check!
When appropriate:
How to use:
Approximate \(X \sim \text{Binomial}(n, p)\) with \(N(\mu, \sigma)\), where \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\)
Note
Apply a continuity correction (adjust each boundary by 0.5) for a better approximation
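This sketch compares the exact binomial probability with the normal approximation, with and without the continuity correction. The setting (n = 100, p = 0.2) is hypothetical and chosen so that np = 20 and n(1 - p) = 80 are both comfortably large. A commonly quoted condition is np ≥ 10 and n(1 - p) ≥ 10, though some texts use 5.

```python
import math
from statistics import NormalDist

n, p = 100, 0.2                                # hypothetical binomial setting
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # mean 20, SD 4

# Exact: P(X <= 15) = sum of binomial probabilities for k = 0..15
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(16))

normal = NormalDist(mu, sigma)
no_correction = normal.cdf(15)      # treats 15 as the boundary
with_correction = normal.cdf(15.5)  # includes the whole bar for k = 15

print(f"exact {exact:.4f}, normal {no_correction:.4f}, corrected {with_correction:.4f}")
```

For P(X ≤ 15) the correction uses 15.5. For P(X ≥ 15) it would use 14.5, so that the bar for the boundary value stays inside the region.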
A researcher wants to know if a new drug reduces blood pressure. She recruits 100 volunteers and lets them choose whether to take the drug or placebo.
Questions:
In a population, 2% have a disease. A test has:
Questions:
Adult female heights are normally distributed with mean 64 inches and SD 2.5 inches.
Questions:
Confusing causation with correlation in observational studies
Mixing up conditional probabilities: \(P(A|B) \neq P(B|A)\)
Forgetting to check the conditions for the binomial model or the normal approximation
Using mean/SD for skewed data (use median/IQR instead)
Misinterpreting p-values in probability (wait, we haven’t covered this yet!)
Not checking if events are independent before multiplying probabilities
Forgetting that sensitivity/specificity ≠ PPV/NPV
Before the exam:
During the exam:
Let’s work through any topics you’d like to review!
Good luck on the midterm! 🍀
Remember: You’ve been working with these concepts all quarter. Trust your preparation!