Lecture 4: Data Visualization & Relationships

STAT 7 - Statistical Methods for the Biological, Environmental & Health Sciences

10 Mar 2026

Learning Objectives

By the end of today’s lecture, you will be able to:

  • Create and interpret histograms and box plots for numerical data
  • Create and interpret bar charts and pie charts for categorical data
  • Explore relationships between two variables using appropriate visualizations
  • Calculate and interpret correlation coefficients
  • Understand basic probability concepts and rules

Recap from Last Time

  • Descriptive vs Inferential statistics
  • Types of variables
  • Observational vs experimental studies
  • Types of sampling methods
  • Numerical data summarization

We couldn’t finish categorical data summarization last time, so we will start with that today :)

Categorical Data Summarization

What happens when we don’t have numbers to summarize, but instead, we have categories?

Marine Science Research Example

Researcher_ID  Ecosystem_Studied  Research_Focus        Conservation_Challenge
1              Coral Reefs        Coral Bleaching       Overfishing
2              Open Ocean         Marine Biodiversity   Climate Change
3              Estuaries          Carbon Sequestration  Coastal Development
4              Estuaries          Nutrient Cycling      Pollution
5              Estuaries          Hydrothermal Vents    Pollution
6              Kelp Forests       Species Interactions  Ocean Acidification
7              Seagrass Meadows   Habitat Restoration   Agricultural Runoff
8              Rocky Intertidal   Invasive Species      Climate Change
9              Salt Marshes       Erosion Control       Sea Level Rise
10             Open Ocean         Fisheries Management  Overfishing

Categorical Data: Frequency Tables

How do we summarize categorical data?

  1. Frequency Tables (counts)
  2. Relative Frequency Tables (proportions/percentages)
  3. Cross-tabulation (relationships between variables)
  4. Plots (we’ll see these later today!)

Example: Conservation Challenges

Conservation Challenge  Absolute Frequency  Relative Frequency
Pollution               2                   0.20
Overfishing             2                   0.20
Climate Change          2                   0.20
Coastal Development     1                   0.10
Ocean Acidification     1                   0.10
Agricultural Runoff     1                   0.10
Sea Level Rise          1                   0.10
TOTAL                   10                  1.00
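
The counts and proportions in this table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the course materials: the list is typed in by hand from the Conservation_Challenge column, and the helper name `frequency_table` is a hypothetical choice made for illustration.

```python
from collections import Counter

# Conservation_Challenge column, typed in from the Marine Science Research Example
# (researchers 1 through 10, in order).
challenges = [
    "Overfishing", "Climate Change", "Coastal Development", "Pollution",
    "Pollution", "Ocean Acidification", "Agricultural Runoff",
    "Climate Change", "Sea Level Rise", "Overfishing",
]

def frequency_table(values):
    """Return (category, absolute frequency, relative frequency) rows.

    `frequency_table` is a hypothetical helper name used only in this sketch.
    Rows are ordered from most to least frequent.
    """
    counts = Counter(values)
    n = len(values)
    return [(category, count, count / n) for category, count in counts.most_common()]

table = frequency_table(challenges)
for category, count, proportion in table:
    print(f"{category:<22}{count:>4}{proportion:>8.2f}")

# The relative frequencies should always add up to 1.
total_proportion = sum(proportion for _, _, proportion in table)
print(f"{'TOTAL':<22}{len(challenges):>4}{total_proportion:>8.2f}")
```

The same function works for the Ecosystem_Studied activity: replace the list with that column's values.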

Activity: Your Turn - Ecosystem Table

Your Task (5 minutes)

Create a frequency table for Ecosystem_Studied using the data provided.

Calculate:

  1. Absolute frequencies (counts)
  2. Relative frequencies (proportions)

Can you also create a cross-table between Ecosystem Studied and Research Focus?

Cross-Tabulation Example

Ecosystem Studied × Conservation Challenge

Conservation Challenge  Estuaries  Open Ocean  Other  Total
Pollution               2          0           0      2
Climate Change          0          1           1      2
Overfishing             0          1           1      2
Other                   1          0           3      4
Total                   3          2           5      10
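
A cross-tabulation counts how often each pair of categories occurs together. The sketch below rebuilds the table above from the ten researcher records, lumping the less common categories into "Other" the way the slide does. The helper names `lump` and `cross_tab` are hypothetical, and the choice of which categories to keep is read off the table rather than stated in the lecture.

```python
from collections import Counter

# (Ecosystem_Studied, Conservation_Challenge) pairs typed in from the example data.
records = [
    ("Coral Reefs", "Overfishing"),
    ("Open Ocean", "Climate Change"),
    ("Estuaries", "Coastal Development"),
    ("Estuaries", "Pollution"),
    ("Estuaries", "Pollution"),
    ("Kelp Forests", "Ocean Acidification"),
    ("Seagrass Meadows", "Agricultural Runoff"),
    ("Rocky Intertidal", "Climate Change"),
    ("Salt Marshes", "Sea Level Rise"),
    ("Open Ocean", "Overfishing"),
]

# Categories shown by name in the table; everything else becomes "Other".
KEEP_ECOSYSTEMS = {"Estuaries", "Open Ocean"}
KEEP_CHALLENGES = {"Pollution", "Climate Change", "Overfishing"}

def lump(value, keep):
    """Return the value unchanged if it is in `keep`, otherwise 'Other'."""
    return value if value in keep else "Other"

def cross_tab(pairs):
    """Count (challenge, ecosystem) combinations after lumping rare categories."""
    return Counter(
        (lump(challenge, KEEP_CHALLENGES), lump(ecosystem, KEEP_ECOSYSTEMS))
        for ecosystem, challenge in pairs
    )

cells = cross_tab(records)
row_labels = ["Pollution", "Climate Change", "Overfishing", "Other"]
col_labels = ["Estuaries", "Open Ocean", "Other"]

print(f"{'':<16}" + "".join(f"{c:>12}" for c in col_labels) + f"{'Total':>8}")
for r in row_labels:
    counts = [cells[(r, c)] for c in col_labels]  # a Counter returns 0 for unseen pairs
    print(f"{r:<16}" + "".join(f"{k:>12}" for k in counts) + f"{sum(counts):>8}")
```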

Part 1: Palmer Penguins

Meet the Penguins!

Note

Palmer Archipelago (Antarctica) penguin data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER.

See more info here.

The Dataset

Variables:

  • species: penguin species (Chinstrap, Adélie, or Gentoo)
  • culmen_length_mm: culmen length (mm)
  • culmen_depth_mm: culmen depth (mm)
  • flipper_length_mm: flipper length (mm)
  • body_mass_g: body mass (g)
  • sex: penguin sex (female or male)
  • island: island name (Dream, Torgersen, or Biscoe)

Activity: Types of Variables

Poll Question

Classify each variable as:

  • Categorical (Nominal or Ordinal)
  • Numerical (Discrete or Continuous)

Data: Google Sheets Link

Poll: PollEv.com/slugstats

Exploratory Data Analysis (EDA)

Where to start?

Quantitative (Numerical):

  • culmen_length_mm
  • culmen_depth_mm
  • flipper_length_mm
  • body_mass_g

Qualitative (Categorical):

  • species
  • sex
  • island

How can we summarize this dataset?

Numerical Summaries

Statistic  Culmen Length (mm)  Culmen Depth (mm)  Flipper Length (mm)  Body Mass (g)
Min        32.10               13.10              172.00               2700.00
Q1         39.23               15.60              190.00               3550.00
Median     44.45               17.30              197.00               4050.00
Mean       43.92               17.15              200.92               4201.75
Q3         48.50               18.70              213.00               4750.00
Max        59.60               21.50              231.00               6300.00
SD         5.46                1.97               14.06                801.95

Categorical Summaries

Species

Species Count
Adelie 152
Chinstrap 68
Gentoo 124
TOTAL 344

Island

Island Count
Biscoe 168
Dream 124
Torgersen 52
TOTAL 344

Sex

Sex Count
Female 165
Male 169
TOTAL 334

What’s missing to describe the data?

Part 2: Visualizing Numerical Data

Histogram: Culmen Length

Reading Histograms

What to check:

  1. Shape: Is it roughly symmetric or skewed (negatively or positively)?
  2. Modality: Is it unimodal, bimodal, or multimodal?
  3. Center: Where is the typical value?
  4. Spread: How variable are the data?
  5. Outliers: Are there observations far from the others?

Skewness
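
One rough way to anticipate skew before looking at a histogram is to compare the mean with the median, since the mean tends to be pulled toward the longer tail. The sketch below applies that comparison to the mean and median values from the Numerical Summaries table. It is a heuristic only (it can mislead for multimodal data), and the function name `skew_hint` is a hypothetical choice for illustration.

```python
# (mean, median) pairs copied from the Numerical Summaries table.
summaries = {
    "culmen_length_mm": (43.92, 44.45),
    "culmen_depth_mm": (17.15, 17.30),
    "flipper_length_mm": (200.92, 197.00),
    "body_mass_g": (4201.75, 4050.00),
}

def skew_hint(mean, median):
    """Heuristic only: mean > median hints at right skew, mean < median at left skew."""
    if mean > median:
        return "right (positive) skew suggested"
    if mean < median:
        return "left (negative) skew suggested"
    return "roughly symmetric"

for name, (mean, median) in summaries.items():
    print(f"{name:<18} mean={mean:>8.2f}  median={median:>8.2f}  ->  {skew_hint(mean, median)}")
```

The histogram remains the real check: a small gap between mean and median, as for culmen depth, may not correspond to any visible skew.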

Activity: Interpret Histograms

Your Task

For each histogram, describe:

  1. Shape (symmetric, skewed left, skewed right)
  2. Center (approximately)
  3. Spread (range, typical deviation)
  4. Any unusual features

Multiple Histograms

Question: Should we directly compare the different distributions? Why or why not?

Box Plots: Overview

Box Plots: Key Features

The Box:

  • Q1 (25th percentile)
  • Median (Q2, 50th percentile)
  • Q3 (75th percentile)
  • IQR = Q3 - Q1

The Whiskers:

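  • Extend to the most extreme observations within 1.5 × IQR of the box (below Q1 and above Q3)
  • Points beyond the whiskers are flagged as potential outliers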

The Mean:

  • Often shown as + (in some software)
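
To make the 1.5 × IQR rule concrete, the sketch below computes the outlier cutoffs (often called fences) for body mass, using the Q1, Q3, Min, and Max values from the Numerical Summaries table. The function name `whisker_fences` is hypothetical, and 1.5 is the conventional multiplier rather than something specific to this dataset.

```python
def whisker_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs beyond which points are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Body Mass (g) values copied from the Numerical Summaries table.
q1, q3 = 3550.0, 4750.0
minimum, maximum = 2700.0, 6300.0

lower_fence, upper_fence = whisker_fences(q1, q3)
print(f"IQR         = {q3 - q1:.0f} g")
print(f"Lower fence = {lower_fence:.0f} g")
print(f"Upper fence = {upper_fence:.0f} g")

# If even the most extreme observations sit inside the fences, no point is flagged.
has_low_outliers = minimum < lower_fence
has_high_outliers = maximum > upper_fence
print(f"Low outliers?  {has_low_outliers}")
print(f"High outliers? {has_high_outliers}")
```

Here the IQR is 1200 g, so the fences fall at 1750 g and 6550 g. Both the minimum and the maximum lie inside them, which means a box plot of body mass for all penguins combined would show no individual outlier points.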

Box Plots: Flipper Length by Island

Box Plots: Flipper Length by Species

What patterns do you notice?

Part 3: Visualizing Categorical Data

Bar Plot: Species Count

How many species are present, and how many individuals belong to each species?

Pie Chart: Species Proportion

Discussion

What are the limitations of pie charts? When might a bar plot be better?

Break Time! ☕ 5-minute break

Stretch, grab water, chat with neighbors!

We’ll resume with relationships between variables.

Part 4: Relationships Between Variables

Scatterplot: Body Mass vs Flipper Length

Scatterplot by Species

What patterns emerge when we add species information?

Correlation

Correlation coefficient (r) measures the strength and direction of a linear relationship:

\[r_{xy} = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}\]

  • Range: -1 to +1
  • Sign: Direction of relationship
  • Magnitude: Strength of relationship

Interpreting Correlation

Interactive practice: https://istats.shinyapps.io/guesscorr/

Correlation Patterns

Warning

Correlation only measures linear relationships!
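
A small made-up example shows why this matters. In the sketch below, y is completely determined by x (y = x²), yet the correlation coefficient comes out as zero because the relationship is not linear. The data are hypothetical values chosen for illustration, and `pearson_r` is a hand-written helper that follows the formula for r given above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed directly from the definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical data: a perfect U-shaped (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

The positive and negative products cancel exactly, so r is 0 even though knowing x tells you y perfectly. Always look at the scatterplot before trusting r.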

Activity: Calculate Correlation

Example Data

id x y
1 25 60
2 48 123
3 39 96
4 34 83
5 16 42

Average x = 32.4, Average y = 80.8

Calculate the correlation coefficient!

Calculation Steps

  1. Calculate deviations: \((x_i - \bar{x})\) and \((y_i - \bar{y})\)
  2. Calculate products: \((x_i - \bar{x})(y_i - \bar{y})\)
  3. Calculate sum of products: \(\sum(x_i - \bar{x})(y_i - \bar{y})\)
  4. Calculate sum of squared deviations for x and y
  5. Plug into formula
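
The five steps can be followed line by line in code. This is a sketch for checking a hand calculation, using the five (x, y) pairs listed in the Example Data table; the means are computed directly from those five rows, and the variable names are illustrative.

```python
import math

# Example Data: the five rows listed in the table.
x = [25, 48, 39, 34, 16]
y = [60, 123, 96, 83, 42]

# Means of the listed values.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Step 1: deviations from the mean.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

# Step 2: products of paired deviations.
products = [a * b for a, b in zip(dx, dy)]

# Step 3: sum of products (the numerator of r).
sum_products = sum(products)

# Step 4: sums of squared deviations for x and y.
sum_sq_x = sum(a ** 2 for a in dx)
sum_sq_y = sum(b ** 2 for b in dy)

# Step 5: plug into the formula.
r = sum_products / math.sqrt(sum_sq_x * sum_sq_y)

print(f"x_bar = {x_bar:.1f}, y_bar = {y_bar:.1f}")
print(f"sum of products = {sum_products:.2f}")
print(f"sum of squares: x = {sum_sq_x:.2f}, y = {sum_sq_y:.2f}")
print(f"r = {r:.3f}")
```

For these five points r comes out very close to +1, which matches the nearly straight upward pattern you would see in a scatterplot of the data.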

Scatterplot for Example

What correlation do you expect?

Scatterplot for Example: Body Mass vs Flipper Length

Calculated correlation: r ≈ 0.87

Activity: Find New Plot Types

Your Turn (Pairs)

Visit: https://datavizcatalogue.com/

Find NEW plots to represent:

  1. Quantitative vs Quantitative
  2. Qualitative vs Qualitative
  3. Qualitative vs Quantitative
  4. More than 2 variables

Share what you found using screenshots in Ed Discussion!

Summary: How to choose?

Key Takeaways

  1. Visualizations help us see patterns in data
  2. Histograms and box plots for numerical data show distribution
  3. Bar charts for categorical data (avoid pie charts when possible)
  4. Scatterplots explore relationships between numerical variables
  5. Correlation measures linear relationships

Quick Knowledge Check ✅

Rate your confidence (1-25 ⭐s) on Ed Discussion:

Can you now:

If your stars add up to more than 16, you’re ready to move forward! 🎉

If not, review Chapter 1 from the textbook and come to office hours.

Exit Ticket

  1. Summary: Post your self-assessment in Ed Discussion. Please reply to the poll only, no need to leave comments.

  2. Attendance: Did you complete at least one attendance activity? If not, see me now!

  3. Complete:

    • HW 1 (due Friday)
    • DSA 1 (due after your Discussion Section)

Looking Ahead

Next class:

  • Probability concepts
  • Conditional probability
  • Probability distributions

Readings:

  • Section 2.1 (continued)
  • Section 2.2

Great work today!

See you next class! 📊✨

Questions? Catch me after class or on Ed Discussion