Lab 2: Visualizing Amounts & Distributions

STAT 80B - Winter 2026

Overview

In this lab, you’ll practice creating fundamental visualizations for amounts and distributions. You’ll create the same visualizations using different methods to understand when each approach works best, and you’ll explore how parameter choices (like bin width) dramatically affect how we interpret data.

Due: Thursday, Jan 22nd, at the end of lecture
Submit: One PDF via Canvas
Filename: LastName_FirstName_Lab2.pdf

Learning Objectives

By completing this lab, you will:

  • Create multiple visualization types for the same data (bars, dots, heatmaps)
  • Build histograms with different bin widths
  • Understand how visualization choices affect interpretation
  • Practice using your chosen tool (Tableau, R, or Python)

Dataset

Download the dataset from here and save it as congress.csv in a folder you can easily find.

The dataset contains:

  • Categorical variables for visualizing amounts
  • Numerical variables for creating distributions
  • Sufficient data to explore multiple visualization approaches
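If you are using Python, you can sort the columns into categorical and numerical groups by checking their data types. This is a minimal sketch that assumes pandas is installed. The helper name `split_columns` is hypothetical, and the toy DataFrame is made up for illustration and does not come from congress.csv.

```python
import pandas as pd


def split_columns(df):
    """Return (categorical, numerical) column names based on each column's dtype."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = [col for col in df.columns if col not in numerical]
    return categorical, numerical


if __name__ == "__main__":
    # Made-up rows for illustration only; with the real file use pd.read_csv("congress.csv")
    toy = pd.DataFrame({"party": ["D", "R", "D"], "age": [61, 55, 70]})
    cats, nums = split_columns(toy)
    print("Categorical:", cats)
    print("Numerical:", nums)
```

Numeric codes, such as district numbers, are classified as numerical, so check the result against what each column actually means.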

Lab Structure

This lab has three parts:

  1. Part 1: Visualizing Amounts (3 visualizations)
  2. Part 2: Exploring Distributions (3+ histograms)
  3. Part 3: Written Reflection (1 paragraph)

Part 1: Visualizing Amounts

Task

Create three different visualizations showing amounts across categories:

  1. Bar chart (vertical OR horizontal - your choice)
  2. Dot plot
  3. Heatmap (or color-coded table)

All three should show the same underlying data but use different visual approaches.
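The bar chart, dot plot, and heatmap code below assumes one row per category. congress.csv may instead have one row per member; this is an assumption, so check it with head(). In that case, summarize the data first so that all three charts show the same numbers. This is a minimal pandas sketch. `summarize_by_category` is a hypothetical helper, and the column names are placeholders.

```python
import pandas as pd


def summarize_by_category(df, category_col, value_col=None, how="mean"):
    """Collapse a row-per-observation table to one row per category.

    If value_col is None, count the rows in each category; otherwise apply
    `how` (for example "mean", "median", or "sum") to value_col per category.
    """
    if value_col is None:
        return df.groupby(category_col).size().reset_index(name="count")
    return df.groupby(category_col)[value_col].agg(how).reset_index()


if __name__ == "__main__":
    # Toy rows for illustration only
    toy = pd.DataFrame({
        "party": ["D", "R", "D", "R", "I"],
        "age": [60, 50, 70, 54, 66],
    })
    print(summarize_by_category(toy, "party"))         # rows per party
    print(summarize_by_category(toy, "party", "age"))  # mean age per party
```

Pass the summarized table, not the raw data, to the plotting code, and reuse that same summary for all three charts. The right summary (count, mean, or median) depends on the question you are asking.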

Instructions by Tool

Choose ONE tool and follow the corresponding instructions below.

Tableau

Step 1: Load Data

  1. Open Tableau Desktop
  2. Click “Connect to Data” → “Text file”
  3. Navigate to congress.csv and open
  4. Verify data loaded correctly in the Data Source tab

Step 2: Create Vertical Bar Chart

  1. Drag your category field to Columns
  2. Drag your amount field to Rows
  3. Tableau automatically creates vertical bars
  4. Right-click on the y-axis → “Edit Axis” → Add title
  5. Double-click the title area → Add chart title

To make horizontal instead:

  • Swap: Category to Rows, Amount to Columns
  • Or click the Swap Rows and Columns button in the toolbar

Step 3: Create Dot Plot

  1. Option A: Start fresh
    • Create new sheet
    • Drag category to Columns, amount to Rows
    • In the Marks card, change from “Automatic” to “Circle”
  2. Option B: Duplicate your bar chart
    • Right-click sheet tab → Duplicate
    • Change Marks type to “Circle”
    • Adjust size using Size slider in Marks card

Step 4: Create Heatmap

  1. Create new sheet
  2. If you have two categorical variables:
    • Drag first category to Rows
    • Drag second category to Columns
    • Drag amount to Color
  3. If you only have one category:
    • Create a simple text table with color
    • Drag category to Rows
    • Drag amount to Text AND Color
    • Choose appropriate color palette

Tip: For heatmaps, use a sequential color palette (one color gradient) rather than diverging colors.

R

Step 1: Load Packages and Data

# Load required packages
library(tidyverse)  # Includes ggplot2 and readr

# Load the data (adjust the path if congress.csv is saved in another folder)
data <- read_csv("congress.csv")

# Preview the data
head(data)

Step 2: Create Bar Chart

Vertical bars:

ggplot(data, aes(x = category_column, y = amount_column)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Your Title Here",
    x = "Category",
    y = "Amount"
  ) +
  theme_minimal()

# Save the plot
ggsave("barplot.png", width = 8, height = 6)

Horizontal bars:

ggplot(data, aes(x = amount_column, y = category_column)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Your Title Here",
    x = "Amount",
    y = "Category"
  ) +
  theme_minimal()

# Save the plot
ggsave("barplot_horizontal.png", width = 8, height = 6)

Note: Replace category_column and amount_column with your actual column names.

Step 3: Create Dot Plot

ggplot(data, aes(x = category_column, y = amount_column)) +
  geom_point(size = 4, color = "darkblue") +
  labs(
    title = "Dot Plot: Your Title",
    x = "Category",
    y = "Amount"
  ) +
  theme_minimal()

# Save
ggsave("dotplot.png", width = 8, height = 6)

Optional enhancement: Add a line connecting points if order matters:

ggplot(data, aes(x = category_column, y = amount_column)) +
  geom_line(aes(group = 1), color = "gray50") +  # group = 1 connects points across a discrete x-axis
  geom_point(size = 4, color = "darkblue") +
  labs(title = "Dot Plot with Lines") +
  theme_minimal()

Step 4: Create Heatmap

If you have two categorical variables:

ggplot(data, aes(x = category1, y = category2, fill = amount)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "darkblue") +
  labs(
    title = "Heatmap: Your Title",
    x = "Category 1",
    y = "Category 2",
    fill = "Amount"
  ) +
  theme_minimal()

# Save
ggsave("heatmap.png", width = 8, height = 6)

If you have one categorical variable:

# Create a simple color-coded table
ggplot(data, aes(x = 1, y = category_column, fill = amount_column)) +
  geom_tile() +
  geom_text(aes(label = amount_column), color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Color-Coded Table") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank())  # hide the placeholder x = 1 axis

# Save
ggsave("heatmap_simple.png", width = 6, height = 8)

Python

Step 1: Import Packages and Load Data

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data (adjust the path if congress.csv is saved in another folder)
data = pd.read_csv("congress.csv")

# Preview data
print(data.head())

Step 2: Create Bar Chart

Using matplotlib:

# Vertical bar chart
plt.figure(figsize=(10, 6))
plt.bar(data['category_column'], data['amount_column'], color='steelblue')
plt.title('Your Title Here')
plt.xlabel('Category')
plt.ylabel('Amount')
plt.xticks(rotation=45, ha='right')  # Rotate labels if needed
plt.tight_layout()
plt.savefig('barplot.png', dpi=300)
plt.show()

Horizontal bar chart:

plt.figure(figsize=(10, 6))
plt.barh(data['category_column'], data['amount_column'], color='steelblue')
plt.title('Your Title Here')
plt.xlabel('Amount')
plt.ylabel('Category')
plt.tight_layout()
plt.savefig('barplot_horizontal.png', dpi=300)
plt.show()

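Using seaborn (fancier). By default, sns.barplot draws the mean of each category with error bars, so it matches the matplotlib version only when your data has one row per category:

plt.figure(figsize=(10, 6))
sns.barplot(data=data, x='category_column', y='amount_column', color='steelblue',
            errorbar=None)  # hide error bars (errorbar= requires seaborn 0.12+)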
plt.title('Your Title Here')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('barplot_seaborn.png', dpi=300)
plt.show()

Step 3: Create Dot Plot

plt.figure(figsize=(10, 6))
plt.scatter(data['category_column'], data['amount_column'], 
            s=100, color='darkblue', alpha=0.7)
plt.title('Dot Plot: Your Title')
plt.xlabel('Category')
plt.ylabel('Amount')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('dotplot.png', dpi=300)
plt.show()

With seaborn:

plt.figure(figsize=(10, 6))
sns.stripplot(data=data, x='category_column', y='amount_column',
              size=10, color='darkblue', jitter=False)  # keep dots centered on each category
plt.title('Dot Plot: Your Title')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('dotplot_seaborn.png', dpi=300)
plt.show()

Step 4: Create Heatmap

If you have two categorical variables:

# Pivot data for heatmap (pivot() needs exactly one row per category1/category2 pair)
heatmap_data = data.pivot(index='category1', 
                          columns='category2', 
                          values='amount')

# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(heatmap_data, annot=True, fmt='.1f', 
            cmap='Blues', cbar_kws={'label': 'Amount'})
plt.title('Heatmap: Your Title')
plt.tight_layout()
plt.savefig('heatmap.png', dpi=300)
plt.show()

If you have one categorical variable (color-coded table):

# Create a simple heatmap
plt.figure(figsize=(8, 10))
# Reshape data into matrix form
matrix_data = data[['category_column', 'amount_column']].set_index('category_column')
sns.heatmap(matrix_data, annot=True, fmt='.1f', 
            cmap='Blues', cbar_kws={'label': 'Amount'})
plt.title('Color-Coded Table')
plt.tight_layout()
plt.savefig('heatmap_simple.png', dpi=300)
plt.show()
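`data.pivot()` raises a ValueError when a (category1, category2) pair appears more than once, which is likely if each row is one member. One workaround is to let pandas aggregate while it reshapes: `pd.crosstab` for counts, or `pivot_table` for a summary of a numerical column. This is a sketch under that assumption. `heatmap_matrix` is a hypothetical helper, and the column names are placeholders.

```python
import pandas as pd


def heatmap_matrix(df, row_col, col_col, value_col=None, how="mean"):
    """Build a row_col x col_col matrix ready to pass to sns.heatmap.

    Without value_col, each cell counts the rows in that category pair.
    With value_col, each cell applies `how` to value_col within that pair.
    """
    if value_col is None:
        return pd.crosstab(df[row_col], df[col_col])
    return df.pivot_table(index=row_col, columns=col_col,
                          values=value_col, aggfunc=how)


if __name__ == "__main__":
    # Toy rows for illustration only
    toy = pd.DataFrame({
        "party": ["D", "D", "R", "R", "R"],
        "chamber": ["House", "Senate", "House", "House", "Senate"],
        "age": [50, 60, 40, 60, 70],
    })
    print(heatmap_matrix(toy, "party", "chamber"))         # counts
    print(heatmap_matrix(toy, "party", "chamber", "age"))  # mean age
```

Pass the returned matrix to `sns.heatmap` in the same way as `heatmap_data` above. For count matrices, `fmt='d'` prints whole numbers in the cells.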

Part 2: Exploring Distributions

Task

Create at least 3 histograms of the same numerical variable using different bin widths:

  1. Wide bins (few bins: 5-10)
  2. Medium bins (moderate: 15-25)
  3. Narrow bins (many bins: 30-50+)

Why This Matters

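Bin width can dramatically change which patterns you see. Too few (wide) bins can merge separate peaks and hide gaps or outliers. Too many (narrow) bins can turn random noise into spikes that look meaningful. This is one of the most important lessons in data visualization.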
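One concrete way to see this effect is to count how many local peaks a histogram shows at different bin counts. The sketch below runs numpy's `np.histogram` on a made-up, seeded two-cluster sample, not on congress.csv. The peak rule (a bin strictly taller than its neighbors) is a simple heuristic chosen for illustration, and `count_peaks` is a hypothetical helper.

```python
import numpy as np


def count_peaks(counts):
    """Count bins that are strictly taller than their immediate neighbor(s)."""
    counts = np.asarray(counts)
    peaks = 0
    for i, c in enumerate(counts):
        left_ok = i == 0 or c > counts[i - 1]
        right_ok = i == len(counts) - 1 or c > counts[i + 1]
        if left_ok and right_ok:
            peaks += 1
    return peaks


if __name__ == "__main__":
    rng = np.random.default_rng(seed=80)
    # Made-up sample: two overlapping clusters
    sample = np.concatenate([rng.normal(45, 6, 300), rng.normal(65, 6, 200)])
    for n_bins in (5, 20, 50):
        counts, edges = np.histogram(sample, bins=n_bins)
        print(f"{n_bins:>2} bins: width {edges[1] - edges[0]:.1f}, "
              f"{count_peaks(counts)} peak(s)")
```

Typically, very few bins blur the two clusters into one hump. Many bins tend to add small spikes caused by sampling noise rather than real structure.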

Instructions by Tool

Tableau

Create Histograms with Different Bins

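  1. Create the first histogram:
    • In the Data pane, right-click your numerical field → “Create” → “Bins”
    • Set the bin size (see Calculating Bin Size below; start with a size that gives about 10 bins)
    • Drag the new binned field to Columns
    • Drag the original numerical field to Rows, then right-click it → “Measure” → “Count”
  2. Create one bin field per bin width:
    • Bin fields belong to the data source, so editing one bin field changes every sheet that uses it
    • For each additional width, repeat “Create” → “Bins” with a distinct field name, or right-click the bin field → “Duplicate” and edit the copy
    • Right-click the sheet tab → “Duplicate”, then replace the bin field on Columns with the new one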
  3. Add clear titles:
    • Label each: “Histogram - 5 bins”, “Histogram - 20 bins”, etc.

Calculating Bin Size

Tableau asks for a bin size rather than a number of bins, so convert the number of bins you want into a size:

Bin Size = (Max Value - Min Value) / Desired Number of Bins

Example: Data ranges from 0 to 100, you want 10 bins:

Bin Size = (100 - 0) / 10 = 10
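The same arithmetic in Python, assuming numpy is installed: a hypothetical helper `bin_size` applies the formula above. When `np.histogram` is given an integer bin count, it splits the range from minimum to maximum into equal-width bins, so it can confirm that its edges are exactly that far apart. Tableau may align its bin boundaries differently, so use this to pick a starting bin size, not to reproduce Tableau's bins exactly.

```python
import numpy as np


def bin_size(min_value, max_value, n_bins):
    """Bin Size = (Max Value - Min Value) / Desired Number of Bins."""
    return (max_value - min_value) / n_bins


if __name__ == "__main__":
    print(bin_size(0, 100, 10))  # the worked example above → 10.0

    # For your own data, plug in the column's min and max
    values = np.array([0.0, 12.5, 40.0, 77.0, 100.0])  # made-up values
    for n_bins in (5, 20, 50):
        size = bin_size(values.min(), values.max(), n_bins)
        _, edges = np.histogram(values, bins=n_bins)
        print(n_bins, size, np.allclose(np.diff(edges), size))
```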

R

Create Histograms with Different Bins

# Check the range and summary statistics of your numerical column
summary(data$numerical_column)

# Histogram with 5 bins
ggplot(data, aes(x = numerical_column)) +
  geom_histogram(bins = 5, fill = "steelblue", color = "white") +
  labs(
    title = "Histogram - 5 Bins",
    x = "Values",
    y = "Count"
  ) +
  theme_minimal()

ggsave("histogram_5bins.png", width = 8, height = 6)

# Histogram with 20 bins
ggplot(data, aes(x = numerical_column)) +
  geom_histogram(bins = 20, fill = "steelblue", color = "white") +
  labs(
    title = "Histogram - 20 Bins",
    x = "Values",
    y = "Count"
  ) +
  theme_minimal()

ggsave("histogram_20bins.png", width = 8, height = 6)

# Histogram with 50 bins
ggplot(data, aes(x = numerical_column)) +
  geom_histogram(bins = 50, fill = "steelblue", color = "white") +
  labs(
    title = "Histogram - 50 Bins",
    x = "Values",
    y = "Count"
  ) +
  theme_minimal()

ggsave("histogram_50bins.png", width = 8, height = 6)

Optional: Create All Three in One View

library(patchwork)  # run install.packages("patchwork") first if needed

p1 <- ggplot(data, aes(x = numerical_column)) +
  geom_histogram(bins = 5, fill = "steelblue", color = "white") +
  labs(title = "5 Bins") + theme_minimal()

p2 <- ggplot(data, aes(x = numerical_column)) +
  geom_histogram(bins = 20, fill = "steelblue", color = "white") +
  labs(title = "20 Bins") + theme_minimal()

p3 <- ggplot(data, aes(x = numerical_column)) +
  geom_histogram(bins = 50, fill = "steelblue", color = "white") +
  labs(title = "50 Bins") + theme_minimal()

# Combine plots (stacked vertically) and display
combined <- p1 / p2 / p3
combined

ggsave("histograms_combined.png", plot = combined, width = 8, height = 10)

Python

Create Histograms with Different Bins

# Replace with the name of your numerical column
numerical_col = 'your_numerical_column'

# Check data range
print(data[numerical_col].describe())

# Histogram with 5 bins
plt.figure(figsize=(10, 6))
plt.hist(data[numerical_col], bins=5, color='steelblue', edgecolor='white')
plt.title('Histogram - 5 Bins')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('histogram_5bins.png', dpi=300)
plt.show()

# Histogram with 20 bins
plt.figure(figsize=(10, 6))
plt.hist(data[numerical_col], bins=20, color='steelblue', edgecolor='white')
plt.title('Histogram - 20 Bins')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('histogram_20bins.png', dpi=300)
plt.show()

# Histogram with 50 bins
plt.figure(figsize=(10, 6))
plt.hist(data[numerical_col], bins=50, color='steelblue', edgecolor='white')
plt.title('Histogram - 50 Bins')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('histogram_50bins.png', dpi=300)
plt.show()

Using Seaborn

# More polished histograms with seaborn
fig, axes = plt.subplots(3, 1, figsize=(10, 12))

# 5 bins
sns.histplot(data=data, x=numerical_col, bins=5, ax=axes[0], color='steelblue')
axes[0].set_title('Histogram - 5 Bins')

# 20 bins
sns.histplot(data=data, x=numerical_col, bins=20, ax=axes[1], color='steelblue')
axes[1].set_title('Histogram - 20 Bins')

# 50 bins
sns.histplot(data=data, x=numerical_col, bins=50, ax=axes[2], color='steelblue')
axes[2].set_title('Histogram - 50 Bins')

plt.tight_layout()
plt.savefig('histograms_combined.png', dpi=300)
plt.show()

What to Observe

For each histogram, note:

  • Shape: Is it symmetric? Skewed? Multiple peaks?
  • Center: Where is most of the data?
  • Spread: How wide is the distribution?
  • Outliers: Any extreme values visible?
  • Detail: What patterns appear/disappear with different bin widths?
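To back up what you see with numbers, a short pandas sketch can report center, spread, skew, and candidate outliers. It flags extreme values with the common 1.5 × IQR rule, which is one convention among several and not a requirement of this lab. `describe_distribution` is a hypothetical helper name.

```python
import pandas as pd


def describe_distribution(values):
    """Summarize center, spread, skew, and 1.5*IQR outliers of numeric values."""
    s = pd.Series(values).dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": s.mean(),
        "median": s.median(),
        "std": s.std(),    # sample standard deviation
        "iqr": iqr,
        "skew": s.skew(),  # > 0 suggests a longer right tail
        "outliers": s[(s < low) | (s > high)].tolist(),
    }


if __name__ == "__main__":
    print(describe_distribution([1, 2, 3, 4, 100]))  # made-up values
```

When the mean sits well above the median, the histogram usually has a long right tail. Check whether all three of your histograms show that tail.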

Part 3: Written Reflection

Task

Write one paragraph (150-300 words) addressing:

  1. Comparing amount visualizations: Which of your three visualizations (bar, dot, heatmap) was most effective for this data? Why?

  2. Impact of bin width: How did changing bin width affect what you saw in the histograms? What did you learn?

  3. Recommendations: What would you recommend to someone analyzing similar data?

What Makes a Strong Reflection

Good reflections include:

Specific comparisons - “The dot plot made it easier to compare exact values because…”

Trade-offs - “Bar charts emphasized magnitude well, but the heatmap revealed patterns I missed…”

Evidence - “With 5 bins, the distribution looked smooth and normal, but with 30 bins I could see a small secondary peak around…”

Practical insights - “For presentations, I’d use the bar chart because it’s familiar, but for detailed analysis…”

Weak reflections:

❌ “I liked the bar chart.”

❌ “The histograms were different.”

❌ Generic statements not tied to your actual visualizations

Example Paragraph Structure

  1. [1-2 sentences comparing your amount visualizations and picking the best one, with reasoning]
  2. [2-3 sentences about how bin width changed your interpretation of the distribution]
  3. [1-2 sentences with recommendations for future work or lessons learned]


Submission Guidelines

Format Your PDF

Your final PDF should contain:

  1. Title page (optional but nice):
    • “Lab 2: Amounts & Distributions”
    • Your name and date
  2. Part 1: Amount Visualizations
    • Bar chart (labeled)
    • Dot plot (labeled)
    • Heatmap (labeled)
  3. Part 2: Distribution Visualizations
    • Histogram with wide bins (labeled with bin count)
    • Histogram with medium bins (labeled)
    • Histogram with narrow bins (labeled)
    • Additional histograms if you created more
  4. Part 3: Written Reflection
    • Your paragraph

Creating the PDF

Several options:

  • Word/Google Docs: Insert images, export as PDF
  • LaTeX/Markdown: Compile to PDF
  • PowerPoint: Create slides, save as PDF
  • Direct export: Some tools export directly to PDF

Important: Ensure all visualizations are readable and clearly labeled!

Grading Rubric

Component             | Points | Criteria
----------------------|--------|----------------------------------------------------
Bar Chart             | 1      | Clear, properly labeled, appropriate orientation
Dot Plot              | 1      | Clear, properly labeled, same data as bar chart
Heatmap               | 1      | Appropriate color scale, readable
Histogram 1           | 0.5    | Wide bins, labeled
Histogram 2           | 0.5    | Medium bins, labeled
Histogram 3           | 0.5    | Narrow bins, labeled
Visual Quality        | 1      | All charts readable, professional appearance
Reflection - Depth    | 2      | Shows critical thinking about visualization choices
Reflection - Evidence | 1.5    | Uses specific examples from your visualizations
Formatting            | 1      | Proper PDF format, organized, labeled clearly
Total                 | 10     |

Tips for Success

Time Management

  • Don’t wait until the last minute! Creating visualizations takes longer than you think.
  • Start with Part 1, then Part 2, then write the reflection.
  • If you get stuck on one visualization, move to the next and come back.

Common Mistakes to Avoid

  1. Missing axis labels - Every chart needs labeled axes!
  2. No titles - Viewers shouldn’t have to guess what they’re looking at
  3. Only one histogram - You need at least 3 with different bin widths
  4. Vague reflection - Be specific! Reference your actual visualizations
  5. Unreadable text - Make sure fonts are large enough

Getting Help

  • During lab: Ask questions!
  • Lab instructions: This page has step-by-step code for each tool

Going Beyond (Optional)

Want to challenge yourself?

  • Try creating the same visualizations in multiple tools
  • Experiment with color schemes and formatting
  • Create additional histogram variations (8 bins, 15 bins, etc.)
  • Try density plots instead of histograms
  • Compare grouped vs. stacked bar charts
  • Try to build a map

Frequently Asked Questions

Q: Can I use a different dataset?

A: No, please use the provided congress.csv for consistency and grading purposes.

Q: Do I have to use all three software tools?

A: No! Pick ONE tool (Tableau, R, or Python) and stick with it for the entire lab.

Q: What if I can’t make a heatmap work?

A: If you only have one categorical variable, a color-coded table is acceptable. The point is to use color to encode amount.

Q: How many histograms do I need?

A: At least 3 with noticeably different bin widths. More is fine if you want to explore further!

Q: Can the visualizations be in color?

A: Yes! Color is encouraged, especially for the heatmap.

Q: What if my histograms all look very similar?

A: Try more extreme bin widths—go to 3-5 bins on the low end and 40-60 on the high end.

Q: How long should my reflection be?

A: 150-300 words. That’s roughly 1 good paragraph or 2 shorter paragraphs.

Q: Can I work with a partner?

A: You can discuss concepts together, but each person must create and submit their own visualizations and reflection.


Good luck! Remember, the goal is learning, not perfection. Experiment, explore, and don’t be afraid to try things!
