Lab 2: Visualizing Amounts & Distributions
STAT 80B - Winter 2026
Overview
In this lab, you’ll practice creating fundamental visualizations for amounts and distributions. You’ll create the same visualizations using different methods to understand when each approach works best, and you’ll explore how parameter choices (like bin width) dramatically affect how we interpret data.
Due: Thursday, Jan 22nd, at the end of lecture
Submit: One PDF via Canvas
Filename: LastName_FirstName_Lab2.pdf
Learning Objectives
By completing this lab, you will:
- Create multiple visualization types for the same data (bars, dots, heatmaps)
- Build histograms with different bin widths
- Understand how visualization choices affect interpretation
- Practice using your chosen tool (Tableau, R, or Python)
Dataset
Download the dataset from here and save it as congress.csv in a folder that you can easily find.
The dataset contains:
- Categorical variables for visualizing amounts
- Numerical variables for creating distributions
- Sufficient data to explore multiple visualization approaches
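Before choosing columns for each part, it can help to list which columns are categorical and which are numerical. The sketch below is one way to do that in Python; the helper name `split_columns` and the column names in the stand-in table are hypothetical and are not taken from congress.csv.

```python
import pandas as pd

def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (categorical_columns, numerical_columns) based on pandas dtypes."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return categorical, numerical

# Tiny stand-in table; these column names are made up for illustration.
sample = pd.DataFrame({
    "party": ["A", "B", "A"],
    "chamber": ["house", "senate", "house"],
    "age": [45.2, 61.0, 38.7],
})
categorical_cols, numerical_cols = split_columns(sample)
print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)

# With the real file you would instead run:
# data = pd.read_csv("congress.csv")
# print(split_columns(data))
```

Categorical columns are candidates for Part 1, and numerical columns are candidates for the histograms in Part 2.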
Lab Structure
This lab has three parts:
- Part 1: Visualizing Amounts (3 visualizations)
- Part 2: Exploring Distributions (3+ histograms)
- Part 3: Written Reflection (1 paragraph)
Part 1: Visualizing Amounts
Task
Create three different visualizations showing amounts across categories:
- Bar chart (vertical OR horizontal - your choice)
- Dot plot
- Heatmap (or color-coded table)
All three should show the same underlying data but use different visual approaches.
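One way to make sure all three charts show the same numbers is to compute a single summary table (one row per category) and plot from it each time. The following is a minimal sketch in Python, assuming the amount is the mean of a numerical column; the helper `summarize_amounts` and the columns `party` and `age` are hypothetical placeholders.

```python
import pandas as pd

def summarize_amounts(df, category, value, how="mean"):
    """Collapse df to one row per category so every chart plots the same numbers."""
    summary = (
        df.groupby(category, as_index=False)[value]
          .agg(how)
          .sort_values(value, ascending=False)
          .reset_index(drop=True)
    )
    return summary

# Made-up rows standing in for the real dataset.
sample = pd.DataFrame({
    "party": ["A", "B", "A", "B", "C"],
    "age": [40.0, 60.0, 50.0, 70.0, 30.0],
})
amounts = summarize_amounts(sample, "party", "age")
print(amounts)
```

Sorting by the amount is a design choice that usually makes bar charts and dot plots easier to compare; pass `how="sum"` or `how="count"` if a total or a count is the amount you want to show.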
Instructions by Tool
Choose ONE tool and follow the corresponding instructions below.
Tableau
Step 1: Load Data
- Open Tableau Desktop
- Click “Connect to Data” → “Text file”
- Navigate to congress.csv and open it
- Verify that the data loaded correctly in the Data Source tab
Step 2: Create Vertical Bar Chart
- Drag your category field to Columns
- Drag your amount field to Rows
- Tableau automatically creates vertical bars
- Right-click on the y-axis → “Edit Axis” → Add title
- Double-click the title area → Add chart title
To make horizontal instead:
- Swap: Category to Rows, Amount to Columns
- Or click the swap icon in toolbar
Step 3: Create Dot Plot
- Option A: Start fresh
- Create new sheet
- Drag category to Columns, amount to Rows
- In the Marks card, change from “Automatic” to “Circle”
- Option B: Duplicate your bar chart
- Right-click sheet tab → Duplicate
- Change Marks type to “Circle”
- Adjust size using Size slider in Marks card
Step 4: Create Heatmap
- Create new sheet
- If you have two categorical variables:
- Drag first category to Rows
- Drag second category to Columns
- Drag amount to Color
- If you only have one category:
- Create a simple text table with color
- Drag category to Rows
- Drag amount to Text AND Color
- Choose appropriate color palette
Tip: For heatmaps, use a sequential color palette (one color gradient) rather than diverging colors.
R
Step 1: Load Packages and Data
# Load required packages
library(tidyverse) # Includes ggplot2 and readr
# Load the data
data <- read_csv("congress.csv")
# Preview the data
head(data)
Step 2: Create Bar Chart
Vertical bars:
ggplot(data, aes(x = category_column, y = amount_column)) +
geom_col(fill = "steelblue") +
labs(
title = "Your Title Here",
x = "Category",
y = "Amount"
) +
theme_minimal()
# Save the plot
ggsave("barplot.png", width = 8, height = 6)
Horizontal bars:
ggplot(data, aes(x = amount_column, y = category_column)) +
geom_col(fill = "steelblue") +
labs(
title = "Your Title Here",
x = "Amount",
y = "Category"
) +
theme_minimal()
# Save the plot
ggsave("barplot_horizontal.png", width = 8, height = 6)
Note: Replace category_column and amount_column with your actual column names.
Step 3: Create Dot Plot
ggplot(data, aes(x = category_column, y = amount_column)) +
geom_point(size = 4, color = "darkblue") +
labs(
title = "Dot Plot: Your Title",
x = "Category",
y = "Amount"
) +
theme_minimal()
# Save
ggsave("dotplot.png", width = 8, height = 6)
Optional enhancement: Add a line connecting points if order matters:
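# group = 1 tells geom_line to connect points across the categories;
# without it, each category is its own group and no line is drawn
ggplot(data, aes(x = category_column, y = amount_column, group = 1)) +
geom_line(color = "gray50") +
geom_point(size = 4, color = "darkblue") +
labs(title = "Dot Plot with Lines") +
theme_minimal()
Step 4: Create Heatmap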
If you have two categorical variables:
ggplot(data, aes(x = category1, y = category2, fill = amount)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "darkblue") +
labs(
title = "Heatmap: Your Title",
x = "Category 1",
y = "Category 2",
fill = "Amount"
) +
theme_minimal()
# Save
ggsave("heatmap.png", width = 8, height = 6)
If you have one categorical variable:
# Create a simple color-coded table
ggplot(data, aes(x = 1, y = category_column, fill = amount_column)) +
geom_tile() +
geom_text(aes(label = amount_column), color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "Color-Coded Table") +
theme_minimal() +
theme(axis.text.x = element_blank())
# Save
ggsave("heatmap_simple.png", width = 6, height = 8)
Python
Step 1: Import Packages and Load Data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
data = pd.read_csv("congress.csv")
# Preview data
print(data.head())
Step 2: Create Bar Chart
Using matplotlib:
# Vertical bar chart
plt.figure(figsize=(10, 6))
plt.bar(data['category_column'], data['amount_column'], color='steelblue')
plt.title('Your Title Here')
plt.xlabel('Category')
plt.ylabel('Amount')
plt.xticks(rotation=45, ha='right') # Rotate labels if needed
plt.tight_layout()
plt.savefig('barplot.png', dpi=300)
plt.show()
Horizontal bar chart:
plt.figure(figsize=(10, 6))
plt.barh(data['category_column'], data['amount_column'], color='steelblue')
plt.title('Your Title Here')
plt.xlabel('Amount')
plt.ylabel('Category')
plt.tight_layout()
plt.savefig('barplot_horizontal.png', dpi=300)
plt.show()
Using seaborn (fancier):
plt.figure(figsize=(10, 6))
sns.barplot(data=data, x='category_column', y='amount_column', color='steelblue')
plt.title('Your Title Here')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('barplot_seaborn.png', dpi=300)
plt.show()
Step 3: Create Dot Plot
plt.figure(figsize=(10, 6))
plt.scatter(data['category_column'], data['amount_column'],
s=100, color='darkblue', alpha=0.7)
plt.title('Dot Plot: Your Title')
plt.xlabel('Category')
plt.ylabel('Amount')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('dotplot.png', dpi=300)
plt.show()
With seaborn:
plt.figure(figsize=(10, 6))
sns.stripplot(data=data, x='category_column', y='amount_column',
size=10, color='darkblue')
plt.title('Dot Plot: Your Title')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('dotplot_seaborn.png', dpi=300)
plt.show()
Step 4: Create Heatmap
If you have two categorical variables:
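# Pivot data for heatmap. pivot_table aggregates repeated category pairs,
# whereas plain pivot() raises an error on duplicates. aggfunc='sum' is an
# assumption: use 'mean' if an average is the amount you want to show.
heatmap_data = data.pivot_table(index='category1',
columns='category2',
values='amount',
aggfunc='sum')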
# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(heatmap_data, annot=True, fmt='.1f',
cmap='Blues', cbar_kws={'label': 'Amount'})
plt.title('Heatmap: Your Title')
plt.tight_layout()
plt.savefig('heatmap.png', dpi=300)
plt.show()
If you have one categorical variable (color-coded table):
# Create a simple heatmap
plt.figure(figsize=(8, 10))
# Reshape data into matrix form
matrix_data = data[['category_column', 'amount_column']].set_index('category_column')
sns.heatmap(matrix_data, annot=True, fmt='.1f',
cmap='Blues', cbar_kws={'label': 'Amount'})
plt.title('Color-Coded Table')
plt.tight_layout()
plt.savefig('heatmap_simple.png', dpi=300)
plt.show()
Part 2: Exploring Distributions
Task
Create at least 3 histograms of the same numerical variable using different bin widths:
- Wide bins (few bins: 5-10)
- Medium bins (moderate: 15-25)
- Narrow bins (many bins: 30-50+)
Why This Matters
Bin width dramatically changes what patterns you see! This is one of the most important lessons in data visualization.
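To see the effect without drawing anything, the sketch below counts values per bin with NumPy on a small synthetic dataset (two evenly spaced clusters with a gap between them). The data are made up for illustration and are unrelated to congress.csv.

```python
import numpy as np

# Synthetic values: one cluster from 21 to 39 and another from 61 to 79.
values = np.concatenate([np.linspace(21, 39, 50), np.linspace(61, 79, 50)])

# Same data, two different bin counts.
counts_wide, edges_wide = np.histogram(values, bins=2)
counts_narrow, edges_narrow = np.histogram(values, bins=6)

# Empty bins in the middle are what reveal the gap between the clusters.
empty_wide = int((counts_wide == 0).sum())
empty_narrow = int((counts_narrow == 0).sum())

print("2 bins:", counts_wide.tolist(), "empty bins:", empty_wide)
print("6 bins:", counts_narrow.tolist(), "empty bins:", empty_narrow)
```

With two bins the data look like two equal halves of one distribution; with six bins the empty middle bins show that there are really two separate groups.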
Instructions by Tool
Tableau
Create Histograms with Different Bins
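- Create first histogram:
- In the Data pane, right-click your numerical field → “Create” → “Bins”
- Set the bin size (aim for about 10 bins to start; see Calculating Bin Size below to convert a number of bins into a size)
- Drag the new binned field to Columns
- Drag the original numerical field to Rows and change its aggregation to Count (right-click the pill → “Measure” → “Count”)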
- Duplicate sheet for different bin widths:
- Right-click sheet tab → “Duplicate Sheet”
- Right-click on bin field → “Edit” → Change bin size
- Repeat for each bin width
- Add clear titles:
- Label each: “Histogram - 5 bins”, “Histogram - 20 bins”, etc.
Calculating Bin Size
If Tableau asks for “Bin Size” instead of number of bins:
Bin Size = (Max Value - Min Value) / Desired Number of Bins
Example: Data ranges from 0 to 100, you want 10 bins:
Bin Size = (100 - 0) / 10 = 10
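The same arithmetic can be written as a small helper, shown here as a sketch in Python. The function name `bin_size` is hypothetical, and the 0 to 100 range is the example from the text, not the range of any column in congress.csv.

```python
def bin_size(min_value: float, max_value: float, n_bins: int) -> float:
    """Bin Size = (Max Value - Min Value) / Desired Number of Bins."""
    if n_bins <= 0:
        raise ValueError("n_bins must be a positive integer")
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    return (max_value - min_value) / n_bins

# The worked example from the text: data from 0 to 100, 10 bins.
example = bin_size(0, 100, 10)
print(example)

# Bin sizes for wide, medium, and narrow bins over the same range.
sizes = {n: bin_size(0, 100, n) for n in (5, 20, 50)}
print(sizes)
```

For your own variable, replace 0 and 100 with the minimum and maximum of the column you are plotting.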
R
Create Histograms with Different Bins
# Find your numerical column
summary(data$numerical_column)
# Histogram with 5 bins
ggplot(data, aes(x = numerical_column)) +
geom_histogram(bins = 5, fill = "steelblue", color = "white") +
labs(
title = "Histogram - 5 Bins",
x = "Values",
y = "Count"
) +
theme_minimal()
ggsave("histogram_5bins.png", width = 8, height = 6)
# Histogram with 20 bins
ggplot(data, aes(x = numerical_column)) +
geom_histogram(bins = 20, fill = "steelblue", color = "white") +
labs(
title = "Histogram - 20 Bins",
x = "Values",
y = "Count"
) +
theme_minimal()
ggsave("histogram_20bins.png", width = 8, height = 6)
# Histogram with 50 bins
ggplot(data, aes(x = numerical_column)) +
geom_histogram(bins = 50, fill = "steelblue", color = "white") +
labs(
title = "Histogram - 50 Bins",
x = "Values",
y = "Count"
) +
theme_minimal()
ggsave("histogram_50bins.png", width = 8, height = 6)
Optional: Create All Three in One View
library(patchwork)
p1 <- ggplot(data, aes(x = numerical_column)) +
geom_histogram(bins = 5, fill = "steelblue", color = "white") +
labs(title = "5 Bins") + theme_minimal()
p2 <- ggplot(data, aes(x = numerical_column)) +
geom_histogram(bins = 20, fill = "steelblue", color = "white") +
labs(title = "20 Bins") + theme_minimal()
p3 <- ggplot(data, aes(x = numerical_column)) +
geom_histogram(bins = 50, fill = "steelblue", color = "white") +
labs(title = "50 Bins") + theme_minimal()
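# Combine plots (patchwork stacks them vertically with /)
combined <- p1 / p2 / p3
combined
# Pass the combined plot explicitly so ggsave does not depend on which plot was drawn last
ggsave("histograms_combined.png", plot = combined, width = 8, height = 10)
Python
Create Histograms with Different Bins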
# Get column name
numerical_col = 'your_numerical_column'
# Check data range
print(data[numerical_col].describe())
# Histogram with 5 bins
plt.figure(figsize=(10, 6))
plt.hist(data[numerical_col], bins=5, color='steelblue', edgecolor='white')
plt.title('Histogram - 5 Bins')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('histogram_5bins.png', dpi=300)
plt.show()
# Histogram with 20 bins
plt.figure(figsize=(10, 6))
plt.hist(data[numerical_col], bins=20, color='steelblue', edgecolor='white')
plt.title('Histogram - 20 Bins')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('histogram_20bins.png', dpi=300)
plt.show()
# Histogram with 50 bins
plt.figure(figsize=(10, 6))
plt.hist(data[numerical_col], bins=50, color='steelblue', edgecolor='white')
plt.title('Histogram - 50 Bins')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('histogram_50bins.png', dpi=300)
plt.show()
Using Seaborn
# More polished histograms with seaborn
fig, axes = plt.subplots(3, 1, figsize=(10, 12))
# 5 bins
sns.histplot(data=data, x=numerical_col, bins=5, ax=axes[0], color='steelblue')
axes[0].set_title('Histogram - 5 Bins')
# 20 bins
sns.histplot(data=data, x=numerical_col, bins=20, ax=axes[1], color='steelblue')
axes[1].set_title('Histogram - 20 Bins')
# 50 bins
sns.histplot(data=data, x=numerical_col, bins=50, ax=axes[2], color='steelblue')
axes[2].set_title('Histogram - 50 Bins')
plt.tight_layout()
plt.savefig('histograms_combined.png', dpi=300)
plt.show()
What to Observe
For each histogram, note:
- Shape: Is it symmetric? Skewed? Multiple peaks?
- Center: Where is most of the data?
- Spread: How wide is the distribution?
- Outliers: Any extreme values visible?
- Detail: What patterns appear/disappear with different bin widths?
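Numerical summaries can back up what you see in the histograms. The sketch below computes center, spread, skewness, and outliers for one column, using the common 1.5 × IQR rule to flag outliers (that rule is an assumption here; the lab does not prescribe one). The helper `describe_distribution` and the sample values are made up for illustration.

```python
import pandas as pd

def describe_distribution(values: pd.Series) -> dict:
    """Summarize center, spread, shape, and outliers of a numerical column."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # 1.5 x IQR fences
    outliers = values[(values < lower) | (values > upper)]
    return {
        "mean": values.mean(),        # center (sensitive to outliers)
        "median": values.median(),    # center (robust to outliers)
        "std": values.std(),          # spread
        "iqr": iqr,                   # spread (robust)
        "skew": values.skew(),        # > 0 suggests a right-skewed shape
        "outliers": outliers.tolist(),
    }

# Made-up values with one extreme point.
sample = pd.Series([1, 2, 3, 4, 100])
summary = describe_distribution(sample)
print(summary)
```

A large gap between mean and median, as in this sample, is a hint that the distribution is skewed or contains outliers, which is worth checking against your narrow-bin histogram.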
Part 3: Written Reflection
Task
Write one paragraph (150-300 words) addressing:
Comparing amount visualizations: Which of your three visualizations (bar, dot, heatmap) was most effective for this data? Why?
Impact of bin width: How did changing bin width affect what you saw in the histograms? What did you learn?
Recommendations: What would you recommend to someone analyzing similar data?
What Makes a Strong Reflection
Good reflections include:
✅ Specific comparisons - “The dot plot made it easier to compare exact values because…”
✅ Trade-offs - “Bar charts emphasized magnitude well, but the heatmap revealed patterns I missed…”
✅ Evidence - “With 5 bins, the distribution looked smooth and normal, but with 30 bins I could see a small secondary peak around…”
✅ Practical insights - “For presentations, I’d use the bar chart because it’s familiar, but for detailed analysis…”
Weak reflections:
❌ “I liked the bar chart.”
❌ “The histograms were different.”
❌ Generic statements not tied to your actual visualizations
Example Paragraph Structure
[1-2 sentences comparing your amount visualizations and picking the best one with reasoning]
[2-3 sentences about how bin width changed your interpretation of the distribution]
[1-2 sentences with recommendations for future work or lessons learned]
Submission Guidelines
Format Your PDF
Your final PDF should contain:
- Title page (optional but nice):
- “Lab 2: Amounts & Distributions”
- Your name and date
- Part 1: Amount Visualizations
- Bar chart (labeled)
- Dot plot (labeled)
- Heatmap (labeled)
- Part 2: Distribution Visualizations
- Histogram with wide bins (labeled with bin count)
- Histogram with medium bins (labeled)
- Histogram with narrow bins (labeled)
- Additional histograms if you created more
- Part 3: Written Reflection
- Your paragraph
Creating the PDF
Several options:
- Word/Google Docs: Insert images, export as PDF
- LaTeX/Markdown: Compile to PDF
- PowerPoint: Create slides, save as PDF
- Direct export: Some tools export directly to PDF
Important: Ensure all visualizations are readable and clearly labeled!
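If you work in Python, one possible direct-export route is matplotlib's `PdfPages`, which writes each figure to its own page of a single PDF. This is only a sketch: the two figures use placeholder data, the helper `save_figures_to_pdf` is hypothetical, and the file is written to a temporary folder. The written reflection would still need to be added separately (for example by merging with a document exported from Word or Google Docs).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def save_figures_to_pdf(figures, path):
    """Write each matplotlib figure to its own page of a single PDF."""
    with PdfPages(path) as pdf:
        for fig in figures:
            pdf.savefig(fig)
    return path

# Two placeholder figures standing in for your lab charts.
fig1, ax1 = plt.subplots()
ax1.bar(["A", "B", "C"], [3, 5, 2])
ax1.set_title("Bar chart (placeholder data)")
ax1.set_xlabel("Category")
ax1.set_ylabel("Amount")

fig2, ax2 = plt.subplots()
ax2.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
ax2.set_title("Histogram - 5 bins (placeholder data)")
ax2.set_xlabel("Values")
ax2.set_ylabel("Frequency")

pdf_path = os.path.join(tempfile.gettempdir(), "LastName_FirstName_Lab2.pdf")
save_figures_to_pdf([fig1, fig2], pdf_path)
plt.close("all")
print("Saved:", pdf_path)
```

Pass your own figure objects in the order required by the format above so the pages follow Part 1 and then Part 2.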
Checklist Before Submitting
Grading Rubric
| Component | Points | Criteria |
|---|---|---|
| Bar Chart | 1 | Clear, properly labeled, appropriate orientation |
| Dot Plot | 1 | Clear, properly labeled, same data as bar chart |
| Heatmap | 1 | Appropriate color scale, readable |
| Histogram 1 | 0.5 | Wide bins, labeled |
| Histogram 2 | 0.5 | Medium bins, labeled |
| Histogram 3 | 0.5 | Narrow bins, labeled |
| Visual Quality | 1 | All charts readable, professional appearance |
| Reflection - Depth | 2 | Shows critical thinking about visualization choices |
| Reflection - Evidence | 1.5 | Uses specific examples from your visualizations |
| Formatting | 1 | Proper PDF format, organized, labeled clearly |
| Total | 10 | |
Tips for Success
Time Management
- Don’t wait until the last minute! Creating visualizations takes longer than you think.
- Start with Part 1, then Part 2, then write the reflection.
- If you get stuck on one visualization, move to the next and come back.
Common Mistakes to Avoid
- Missing axis labels - Every chart needs labeled axes!
- No titles - Viewers shouldn’t have to guess what they’re looking at
- Only one histogram - You need at least 3 with different bin widths
- Vague reflection - Be specific! Reference your actual visualizations
- Unreadable text - Make sure fonts are large enough
Getting Help
- During lab: Ask questions!
- Lab instructions: This page has step-by-step code for each tool
Going Beyond (Optional)
Want to challenge yourself?
- Try creating the same visualizations in multiple tools
- Experiment with color schemes and formatting
- Create additional histogram variations (8 bins, 15 bins, etc.)
- Try density plots instead of histograms
- Compare grouped vs. stacked bar charts
- Try to build a map
Frequently Asked Questions
Q: Can I use a different dataset?
A: No, please use the provided congress.csv for consistency and grading purposes.
Q: Do I have to use all three software tools?
A: No! Pick ONE tool (Tableau, R, or Python) and stick with it for the entire lab.
Q: What if I can’t make a heatmap work?
A: If you only have one categorical variable, a color-coded table is acceptable. The point is to use color to encode amount.
Q: How many histograms do I need?
A: At least 3 with noticeably different bin widths. More is fine if you want to explore further!
Q: Can the visualizations be in color?
A: Yes! Color is encouraged, especially for the heatmap.
Q: What if my histograms all look very similar?
A: Try more extreme bin widths—go to 3-5 bins on the low end and 40-60 on the high end.
Q: How long should my reflection be?
A: 150-300 words. That’s roughly 1 good paragraph or 2 shorter paragraphs.
Q: Can I work with a partner?
A: You can discuss concepts together, but each person must create and submit their own visualizations and reflection.
Good luck! Remember, the goal is learning, not perfection. Experiment, explore, and don’t be afraid to try things!
