Lab 3: Proportions and Hierarchies
STAT 80B: Data Visualization - Edible Plants Edition
Overview
Due: Thursday, at the end of class
Points: 10
Submission: PDF with visualizations + written critique
In this lab, you'll practice creating and critiquing visualizations of proportions and hierarchical data using the TidyTuesday Edible Plants Database. You'll create multiple visualizations of the same data and then analyze which ones work best and why.
Learning Objectives
By completing this lab, you will:
- Create different types of proportion visualizations (pie charts, stacked bars, side-by-side bars)
- Build hierarchical visualizations (treemaps, sunburst charts)
- Critically evaluate the strengths and weaknesses of different visualization choices
- Develop design judgment for choosing appropriate visualizations
- Explore patterns in edible plant diversity across families, regions, and uses
Dataset: Edible Plants Database
For this lab, you'll use the TidyTuesday Edible Plants Database (February 3, 2026, Week 5).
About the Data
This dataset contains information about edible plants from around the world, including:
- Plant families: Taxonomic classification (e.g., Fabaceae, Solanaceae, Rosaceae)
- Plant cultivation: Cultivation class (e.g., Brassica, Legume)
- Sunlight: How much sunlight the plant requires
- Common and scientific names: Plant identification
- Possibly: nutritional information, culinary uses, growing conditions
Loading the Data
# Install tidytuesdayR if needed
# install.packages("tidytuesdayR")
# Load required packages
library(tidyverse)
library(treemapify) # For treemaps
# Load data directly from TidyTuesday
tuesdata <- tidytuesdayR::tt_load('2026-02-03')
# OR, equivalently (you only need to run one of these two calls)
tuesdata <- tidytuesdayR::tt_load(2026, week = 5)
edible_plants <- tuesdata$edible_plants
# View structure
glimpse(edible_plants)
head(edible_plants)
names(edible_plants) # See all column names
# View the readme for more information about the dataset
readme(tuesdata)
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import squarify # For treemaps: pip install squarify
import plotly.express as px # For interactive plots
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 300
# Load data
edible_plants = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-03/edible_plants.csv')
# View structure
print(edible_plants.head())
print(edible_plants.info())
print(edible_plants.columns)
print(edible_plants.describe())
- Open Tableau Desktop
- Download CSV from GitHub:
https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-03/edible_plants.csv
- In Tableau: Connect to Data → Text file → select downloaded CSV
Another option - Use R to download:
- Run R code from the R section
- Export:
write_csv(edible_plants, "edible_plants.csv")
- Load into Tableau
Exploring the Data
First step: Explore what's actually in your dataset!
# In R - explore the structure
glimpse(edible_plants)
summary(edible_plants)
# Check unique values for key columns
edible_plants %>% count(family) %>% arrange(desc(n))
edible_plants %>% count(plant_type) %>% arrange(desc(n)) # or whatever the column is called
In every tool, explore all the variables: how many are categorical, how many are numerical, what they represent, and which questions you could answer with them.
Part 1: Visualizing Plant Diversity Proportions (4 points)
Task
Create FOUR different visualizations showing compositional patterns in the edible plants data.
Choose ONE of these approaches:
Option A: Plant Family Distribution
Analyze which plant families contribute the most edible species.
Create these four visualizations:
- Pie chart - showing proportion of plants by family (top 8-10 families)
- Bar chart - ordered by count (most common families first)
- Bar chart - different ordering (alphabetical, or grouped by botanical order)
- Donut chart - alternative circular visualization with top families highlighted
Example questions to explore:
- What proportion of edible plants come from the Fabaceae (legume) family?
- How dominant is the Rosaceae family (roses, apples, cherries)?
- Are edible plants concentrated in a few families or widely distributed?
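One way to approach the concentration question is to look at cumulative shares: if the first few families already account for most species, the data is concentrated. The sketch below is a minimal Python illustration on a made-up toy series. The helper name `cumulative_share` and the toy values are assumptions for illustration only; with the real data you would pass the family column (check its actual name first).

```python
import pandas as pd

def cumulative_share(values: pd.Series) -> pd.DataFrame:
    """Return counts, shares (%), and running cumulative shares (%), largest first."""
    # value_counts() sorts from most to least common and drops missing values by default
    counts = values.value_counts()
    out = counts.rename_axis("family").reset_index(name="count")
    out["share"] = out["count"] / out["count"].sum() * 100
    out["cumulative_share"] = out["share"].cumsum()
    return out

# Toy stand-in for the family column (hypothetical values, not real results)
toy = pd.Series(["Fabaceae"] * 5 + ["Rosaceae"] * 3 + ["Solanaceae"] + ["Poaceae"])
conc = cumulative_share(toy)
print(conc)
```

In this toy example the top two families reach 80% of all rows, which would suggest a concentrated distribution; a slowly rising cumulative share would suggest the opposite.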
Option B: Plant Type Comparison Across Water Requirements
Compare the distribution of plant types across different categories of how much water the plant requires.
Create these four visualizations:
- Stacked bar chart - comparing plant types (vegetables, fruits, grains, herbs) by water requirement
- 100% stacked bar chart - proportional distribution showing composition differences
- Side-by-side bar chart - for direct comparison of each type across categories
- Small multiples - separate pie or bar charts for each category
Example questions to explore:
- Do fruit species need more water?
- Which water requirement level has the greatest diversity of edible plants?
- How does plant type composition vary by water requirements?
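All four Option B charts can be drawn from the same two tables: raw counts (for stacked and side-by-side bars) and within-category percentages (for the 100% stacked bar). The following is a rough Python sketch using `pd.crosstab`. The column names `plant_type` and `water` and the toy rows are assumptions; the real dataset may name these columns differently.

```python
import pandas as pd

# Toy data with hypothetical column names; replace with the real columns
toy = pd.DataFrame({
    "plant_type": ["fruit", "fruit", "vegetable", "vegetable", "vegetable", "herb"],
    "water": ["high", "low", "high", "high", "low", "low"],
})

# Raw counts: rows = water requirement, columns = plant type
counts = pd.crosstab(toy["water"], toy["plant_type"])

# Row percentages: each water level sums to 100, ready for a 100% stacked bar
shares = pd.crosstab(toy["water"], toy["plant_type"], normalize="index") * 100

print(counts)
print(shares.round(1))
```

Calling `counts.plot(kind="bar", stacked=True)` or `shares.plot(kind="bar", stacked=True)` would then give the stacked and 100% stacked versions respectively.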
Requirements
Each visualization must:
- Be properly labeled (title, axis labels, legend if needed)
- Show counts or percentages clearly
- Use appropriate colors (consider green palette for plants, or color by category)
- Be readable (appropriate size, font, spacing)
- Include a brief caption (2-3 sentences) explaining what it shows and one key insight
Data Preparation Tips
You'll need to:
- Count occurrences: Use count() in R or value_counts() in Python
- Calculate percentages: count / total * 100
- Filter to top categories: Too many small slices make pie charts unreadable
  - Consider showing only top 10-12 families/categories
  - Group remaining into "Other"
- Handle missing data: Decide how to deal with NAs
- Create clear labels: Shorten long scientific names if needed
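The counting, percentage, and "Other" steps above can be sketched together in Python as follows. This is only an illustration under stated assumptions: the helper name `lump_top_categories`, the "Unknown" label for missing values, and the toy data are all made up for this example, and showing missing values as their own category is just one of the choices you could make.

```python
import pandas as pd

def lump_top_categories(values: pd.Series, top_n: int = 10,
                        other_label: str = "Other") -> pd.DataFrame:
    """Count categories, keep the top_n most frequent, and pool the rest."""
    # Show missing values as a visible category instead of silently dropping them
    counts = values.fillna("Unknown").value_counts()
    top = counts.iloc[:top_n]
    remainder = counts.iloc[top_n:].sum()
    if remainder > 0:
        top = pd.concat([top, pd.Series({other_label: remainder})])
    result = top.rename_axis("category").reset_index(name="count")
    result["percentage"] = result["count"] / result["count"].sum() * 100
    return result

# Toy stand-in for the family column (hypothetical values)
toy_families = pd.Series(
    ["Fabaceae"] * 4 + ["Rosaceae"] * 3 + ["Solanaceae"] * 2 + ["Poaceae", None]
)
summary = lump_top_categories(toy_families, top_n=2)
print(summary)
```

Because the percentages are computed after pooling, they still sum to 100, so the resulting table can feed a pie chart or a bar chart directly.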
Deliverable
- 4 visualizations (in Tableau, these can be arranged on a dashboard)
- Each with a 2-3 sentence caption that includes one specific insight
- Professional appearance with consistent styling
Part 2: Visualizing Plant Classification Hierarchies (4 points)
Task
Create ONE hierarchical visualization showing the nested structure of edible plant classifications.
Recommended Hierarchy Structures
Option 1: Taxonomic Hierarchy
- Level 1: Taxonomy (e.g., Fabaceae, Rosaceae, Solanaceae)
- Level 2: Individual species count OR specific edible parts
Best for: Understanding botanical diversity and which families dominate our food system
Option 2: Two-Level Simplified
- Level 1: Taxonomy (top 15-20 families)
- Level 2: Water needs within each family
Best for: Clear, focused visualization when you have limited space
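Both hierarchy options boil down to one tidy table with a row per parent-child pair and a count column. The sketch below shows one possible way to build it in Python. The helper name `build_hierarchy`, the column names `family` and `water`, and the toy rows are assumptions for illustration, not the dataset's actual schema.

```python
import pandas as pd

def build_hierarchy(df: pd.DataFrame, parent: str, child: str,
                    top_n: int = 15) -> pd.DataFrame:
    """Keep the top_n most common parent groups and count rows per (parent, child)."""
    top_parents = df[parent].value_counts().index[:top_n]
    subset = df[df[parent].isin(top_parents)]
    # groupby drops rows where parent or child is missing (pandas default)
    return subset.groupby([parent, child]).size().reset_index(name="count")

# Toy data with hypothetical column names
toy = pd.DataFrame({
    "family": ["Fabaceae", "Fabaceae", "Fabaceae", "Rosaceae", "Rosaceae", "Poaceae"],
    "water": ["high", "high", "low", "low", "low", "high"],
})
hierarchy = build_hierarchy(toy, parent="family", child="water", top_n=2)
print(hierarchy)
```

A table in this shape can be handed to a treemap or sunburst function that takes a path of columns plus a values column, as the Plotly examples later in this lab do.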
Visualization Type Options
Choose ONE:
- Treemap (recommended) - rectangles sized by count, excellent for comparing categories
- Sunburst chart - circular rings showing hierarchy, good for part-whole relationships
- Icicle plot - vertical rectangles, like a treemap rotated
- Circle packing - nested circles, aesthetically pleasing but harder to compare sizes
Requirements
Your visualization must:
- Show at least 2 levels of hierarchy clearly
- Use size to encode importance (larger = more species, more diversity)
- Use color strategically:
  - Color by plant family (with consistent palette)
  - OR color by plant type (vegetables=green, fruits=red/orange, etc.)
  - OR color by region
  - NOT random colors!
- Label major categories clearly (family names, major groups)
- Include a clear, descriptive title
- Have a 2-3 sentence caption explaining the main pattern or insight
Deliverable
- 1 hierarchical visualization
- Professional appearance with thoughtful design
- Caption explaining what patterns are visible
Part 3: Critical Evaluation (2 points)
Task
Write a 1-page critique (approximately 200-300 words) analyzing your visualizations.
Address the following questions:
Section 1: Comparing Proportion Visualizations
For your four proportion visualizations:
- Which visualization is most effective for showing the pattern you explored (family distribution or water requirements)?
- Which visualization is least effective?
- For what audience or purpose would each be appropriate?
- What design choices did you make and why?
Section 2: Evaluating Your Hierarchical Visualization
For your hierarchical visualization:
- What story does it tell about edible plant diversity?
- What works well in your design?
- What could be improved?
- Would a different visualization type be better?
Deliverable
- Written analysis
- Thoughtful, specific critique with concrete examples from your visualizations
- References to visualization principles from class (pre-attentive attributes, color theory, etc.)
- Professional writing (well-organized, proofread!)
Grading Rubric
Part 1: Proportions (4 points)
| Criterion | Points | What We're Looking For |
|---|---|---|
| Four distinct visualizations | 1.6 | All four types completed; appropriate for botanical/count data; different enough to compare |
| Design quality | 1.2 | Professional appearance; thoughtful color choices; proper sizing; clean layout |
| Labels and annotations | 0.8 | Clear titles, axis labels, legends; counts/percentages shown appropriately; readable text |
| Captions with insights | 0.4 | 2-3 sentence captions that explain the visualization AND state one specific insight about plants |
Part 2: Hierarchy (4 points)
| Criterion | Points | What We're Looking For |
|---|---|---|
| Appropriate visualization choice | 1 | Chosen type suits the plant data and classification hierarchy |
| Hierarchy clearly shown | 1 | Multiple levels visible; parent-child relationships clear; well-organized structure |
| Design quality | 1 | Effective use of size (species count), color (families/types), layout; professional look |
| Labels and annotations | 1 | Major categories labeled; clear title; insightful 2-3 sentence caption about diversity patterns |
Part 3: Critique (2 points)
| Criterion | Points | What We're Looking For |
|---|---|---|
| Comparison of proportion visualizations | 0.8 | Thoughtful comparison; specific examples; considers different audiences; identifies trade-offs |
| Evaluation of hierarchical visualization | 0.8 | Honest self-assessment; identifies both strengths and weaknesses; considers alternatives |
| Writing quality | 0.4 | Clear, well-organized; proper grammar; professional tone; references class concepts |
Total: 10 points
Technical Instructions & Code Examples
R Code Examples
Data Preparation & Exploration
library(tidyverse)
library(treemapify)
# Load data
tuesdata <- tidytuesdayR::tt_load('2026-02-03')
edible_plants <- tuesdata$edible_plants
# FIRST: Explore your data!
glimpse(edible_plants)
head(edible_plants, 20)
names(edible_plants)
# Look at unique values for key columns
# (Adjust column names based on what you actually have!)
edible_plants %>% count(family, sort = TRUE)
edible_plants %>% count(plant_type, sort = TRUE)
# Option 1: Count plants by family
family_counts <- edible_plants %>%
count(family, sort = TRUE) %>%
mutate(percentage = n / sum(n) * 100)
# Filter to top 10 families for cleaner visualizations
# (with_ties = FALSE keeps exactly 10 rows even when counts are tied)
top_families <- family_counts %>%
slice_max(n, n = 10, with_ties = FALSE)
# Add "Other" category for remaining families
top_families_with_other <- family_counts %>%
mutate(family_group = if_else(family %in% top_families$family, family, "Other")) %>%
group_by(family_group) %>%
summarize(n = sum(n)) %>%
mutate(percentage = n / sum(n) * 100)
# Option 2: Count by edible part
edible_part_counts <- edible_plants %>%
count(edible_part, sort = TRUE) %>% # Adjust column name!
mutate(percentage = n / sum(n) * 100)
# Option 3: Cross-tabulation - families by plant type
family_type <- edible_plants %>%
count(family, plant_type) %>%
group_by(family) %>%
mutate(total = sum(n)) %>%
ungroup() %>%
filter(total >= 5) # Only families with 5+ species
# Option 4: Regional comparison (if you have region data)
regional_comparison <- edible_plants %>%
count(region, plant_type) %>%
group_by(region) %>%
mutate(percentage = n / sum(n) * 100)
Visualization 1: Pie Chart
# Basic pie chart - top plant families
ggplot(top_families, aes(x = "", y = n, fill = family)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void() +
labs(title = "Distribution of Edible Plants by Family (Top 10)",
subtitle = "Based on TidyTuesday Edible Plants Database",
fill = "Plant Family") +
scale_fill_brewer(palette = "Set3") + # or "Greens", "YlGn"
geom_text(aes(label = paste0(round(percentage, 1), "%")),
position = position_stack(vjust = 0.5),
size = 3,
color = "black")
ggsave("pie_chart_families.png", width = 8, height = 6, dpi = 300)
# Alternative: Pie chart with "Other" category
ggplot(top_families_with_other, aes(x = "", y = n, fill = family_group)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void() +
labs(title = "Edible Plant Families",
fill = "Family") +
scale_fill_viridis_d(option = "G", begin = 0.2, end = 0.9)
ggsave("pie_chart_with_other.png", width = 8, height = 6, dpi = 300)
Visualization 2: Bar Chart (Horizontal)
# Horizontal bar chart - easier to read family names
ggplot(top_families, aes(x = n, y = reorder(family, n), fill = family)) +
geom_col(show.legend = FALSE) +
labs(title = "Top 10 Edible Plant Families by Species Count",
subtitle = "Number of edible species per plant family",
x = "Number of Species",
y = "Plant Family") +
theme_minimal() +
scale_fill_brewer(palette = "Set3") +
geom_text(aes(label = paste0(n, " (", round(percentage, 1), "%)")),
hjust = -0.1, size = 3.5) +
scale_x_continuous(expand = expansion(mult = c(0, 0.15))) # Make room for labels
ggsave("bar_chart_horizontal.png", width = 10, height = 6, dpi = 300)
Visualization 3: Bar Chart (Vertical, Different Order)
# Vertical bar chart - alphabetical order
# (ggplot2 orders character values alphabetically by default, so no reorder() is needed)
ggplot(top_families, aes(x = family, y = n, fill = family)) +
geom_col(show.legend = FALSE) +
labs(title = "Top Plant Families - Alphabetical Order",
x = "Plant Family",
y = "Number of Edible Species") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) +
scale_fill_viridis_d(option = "viridis") +
geom_text(aes(label = n), vjust = -0.5, size = 3)
ggsave("bar_chart_alphabetical.png", width = 10, height = 6, dpi = 300)
Visualization 4: Stacked or Grouped Bars
# If comparing across regions or types:
# Stacked bar chart
ggplot(regional_comparison, aes(x = region, y = n, fill = plant_type)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Edible Plant Types by Region",
x = "Region",
y = "Number of Species",
fill = "Plant Type") +
theme_minimal() +
scale_fill_brewer(palette = "Set2") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("stacked_bar_regional.png", width = 10, height = 6, dpi = 300)
# 100% stacked bar
ggplot(regional_comparison, aes(x = region, y = percentage, fill = plant_type)) +
geom_bar(stat = "identity", position = "fill") +
labs(title = "Composition of Edible Plant Types by Region",
x = "Region",
y = "Percentage",
fill = "Plant Type") +
# position = "fill" rescales each bar to the 0-1 range, so use the default percent scale
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
scale_fill_brewer(palette = "Set2")
ggsave("stacked_bar_percent.png", width = 10, height = 6, dpi = 300)
# Grouped (side-by-side) bars
ggplot(regional_comparison, aes(x = region, y = n, fill = plant_type)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Edible Plant Types by Region - Direct Comparison",
x = "Region",
y = "Number of Species",
fill = "Plant Type") +
theme_minimal() +
scale_fill_brewer(palette = "Set2") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("grouped_bar.png", width = 12, height = 6, dpi = 300)
Hierarchical Visualization: Treemap
# Treemap: Family > Plant Type > Count
# Prepare hierarchical data
treemap_data <- edible_plants %>%
count(family, plant_type) %>%
filter(n >= 2) # Only include family-type combos with 2+ species
# Basic treemap
ggplot(treemap_data,
aes(area = n, fill = family, label = plant_type, subgroup = family)) +
geom_treemap() +
geom_treemap_text(colour = "white",
place = "centre",
grow = FALSE,
reflow = TRUE,
size = 10) +
geom_treemap_subgroup_border(colour = "white", size = 3) +
geom_treemap_subgroup_text(place = "centre",
alpha = 0.7,
colour = "white",
fontface = "bold",
grow = TRUE,
size = 14) +
labs(title = "Edible Plant Diversity: Family → Type",
subtitle = "Size represents number of edible species",
fill = "Plant Family") +
theme(legend.position = "bottom",
legend.text = element_text(size = 8)) +
scale_fill_viridis_d(option = "G", begin = 0.1, end = 0.9) +
guides(fill = guide_legend(nrow = 2))
ggsave("treemap_family_type.png", width = 14, height = 10, dpi = 300)
# Alternative: Just families (simpler)
family_counts_filtered <- family_counts %>%
filter(n >= 3) # Only families with 3+ species
ggplot(family_counts_filtered,
aes(area = n, fill = n, label = paste0(family, "\n", n))) +
geom_treemap() +
geom_treemap_text(colour = "white",
place = "centre",
size = 12,
grow = TRUE) +
labs(title = "Edible Plant Families by Species Count",
subtitle = "Each rectangle represents a plant family, sized by number of edible species") +
scale_fill_gradient(low = "#d4f1d4", high = "#0d5c0d",
name = "Species\nCount") +
theme(legend.position = "right")
ggsave("treemap_families_only.png", width = 12, height = 8, dpi = 300)
Creating Multi-Panel PDFs
# Save all plots to one PDF using gridExtra
library(gridExtra)
# Assign each of your ggplot objects first, for example:
# p1 <- your pie chart
# p2 <- your first bar chart
# p3 <- your second bar chart
# p4 <- your stacked/grouped bar chart
# Arrange in a grid; arrangeGrob() returns the layout so ggsave() can write it
# (ggsave() would otherwise save only the last single plot)
combined <- arrangeGrob(p1, p2, p3, p4, ncol = 2,
top = "Lab 3: Edible Plants Proportion Visualizations")
ggsave("all_proportion_viz.pdf", combined, width = 14, height = 10)
# Alternative: patchwork package
library(patchwork)
combined_pw <- (p1 | p2) / (p3 | p4) +
plot_annotation(title = 'Edible Plants Visualizations',
theme = theme(plot.title = element_text(size = 16, face = "bold")))
ggsave("all_proportion_viz_patchwork.pdf", combined_pw, width = 14, height = 10)
Python Code Examples
Data Preparation
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import squarify # pip install squarify
import plotly.express as px
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 300
# Load data
edible_plants = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-03/edible_plants.csv')
# Explore
print(edible_plants.head())
print(edible_plants.info())
print(edible_plants.columns)
# Count by family
family_counts = edible_plants['family'].value_counts().reset_index()
family_counts.columns = ['family', 'count']
family_counts['percentage'] = (family_counts['count'] / family_counts['count'].sum()) * 100
# Top 10 families
top_families = family_counts.head(10)
# Count by edible part
edible_part_counts = edible_plants['edible_part'].value_counts().reset_index()
edible_part_counts.columns = ['edible_part', 'count']
# Cross-tabulation
family_type = pd.crosstab(edible_plants['family'], edible_plants['plant_type'])
Pie Chart
# Pie chart
plt.figure(figsize=(10, 8))
colors = sns.color_palette('Set3', len(top_families))
plt.pie(top_families['count'],
labels=top_families['family'],
autopct='%1.1f%%',
startangle=90,
colors=colors,
textprops={'fontsize': 11})
plt.title('Distribution of Edible Plants by Family (Top 10)',
fontsize=16, fontweight='bold', pad=20)
plt.axis('equal')
plt.tight_layout()
plt.savefig('pie_chart.png', bbox_inches='tight', dpi=300)
plt.show()
Horizontal Bar Chart
# Horizontal bar chart
plt.figure(figsize=(10, 8))
colors_bar = sns.color_palette('Set3', len(top_families))
plt.barh(top_families['family'], top_families['count'], color=colors_bar)
plt.xlabel('Number of Species', fontsize=12)
plt.ylabel('Plant Family', fontsize=12)
plt.title('Top 10 Edible Plant Families by Species Count',
fontsize=14, fontweight='bold')
# Add count labels
for i, (count, pct) in enumerate(zip(top_families['count'], top_families['percentage'])):
    plt.text(count + max(top_families['count'])*0.02, i,
             f'{count} ({pct:.1f}%)',
             va='center', fontsize=10)
plt.gca().invert_yaxis() # Highest at top
plt.tight_layout()
plt.savefig('bar_chart_horizontal.png', dpi=300)
plt.show()
Stacked Bar Chart
# Stacked bar chart (example with plant types by region)
# First create a pivot table
pivot_data = edible_plants.groupby(['region', 'plant_type']).size().unstack(fill_value=0)
# Plot
ax = pivot_data.plot(kind='bar', stacked=True,
figsize=(12, 6),
colormap='Set2')
plt.xlabel('Region', fontsize=12)
plt.ylabel('Number of Species', fontsize=12)
plt.title('Edible Plant Types by Region', fontsize=14, fontweight='bold')
plt.legend(title='Plant Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('stacked_bar.png', dpi=300)
plt.show()
# 100% stacked
pivot_pct = pivot_data.div(pivot_data.sum(axis=1), axis=0) * 100
ax = pivot_pct.plot(kind='bar', stacked=True,
figsize=(12, 6),
colormap='Set2')
plt.ylabel('Percentage', fontsize=12)
plt.title('Composition of Plant Types by Region (%)', fontsize=14, fontweight='bold')
plt.legend(title='Plant Type', bbox_to_anchor=(1.05, 1))
plt.tight_layout()
plt.savefig('stacked_bar_percent.png', dpi=300)
plt.show()
Treemap
# Static treemap using squarify
plt.figure(figsize=(14, 10))
# Prepare data
sizes = top_families['count'].tolist()
labels = [f"{row['family']}\n{row['count']} species"
for _, row in top_families.iterrows()]
colors_tree = sns.color_palette('Greens', len(top_families))
squarify.plot(sizes=sizes,
label=labels,
color=colors_tree,
alpha=0.8,
text_kwargs={'fontsize': 11, 'weight': 'bold', 'color': 'white'})
plt.title('Edible Plant Families by Species Count',
fontsize=16, fontweight='bold')
plt.axis('off')
plt.tight_layout()
plt.savefig('treemap.png', dpi=300)
plt.show()
# Interactive treemap using Plotly
# Create hierarchical data
tree_data = edible_plants.groupby(['family', 'plant_type']).size().reset_index(name='count')
tree_data = tree_data[tree_data['count'] >= 2] # Filter small groups
fig = px.treemap(tree_data,
path=['family', 'plant_type'],
values='count',
title='Edible Plant Diversity: Family → Type',
color='family',
color_discrete_sequence=px.colors.qualitative.Set3)
fig.update_layout(font=dict(size=12))
fig.write_html('treemap_interactive.html')
fig.show()
# Save a static image for the PDF (requires the kaleido package: pip install kaleido)
fig.write_image('treemap_interactive.png', width=1400, height=1000)
Sunburst Chart (Interactive)
# Sunburst chart using Plotly
fig = px.sunburst(tree_data,
path=['family', 'plant_type'],
values='count',
title='Edible Plants: Hierarchical View',
color='family',
color_discrete_sequence=px.colors.qualitative.Set3)
fig.update_layout(font=dict(size=12))
fig.write_html('sunburst_interactive.html')
fig.show()
Tableau Instructions
For Proportion Visualizations
Pie Chart:
- Drag Family to Color (in Marks card)
- Drag Family to Angle (or use automatic measure)
- Change mark type to Pie
- Right-click on the worksheet → Duplicate; to count, use COUNT([Family])
- Add labels: Drag Family to Label, add percentage
- Edit colors: Click Color → Edit Colors → choose palette
Bar Chart:
- Drag Family to Rows
- Drag CNT(Record) or count measure to Columns
- Click Sort descending button
- Drag Family to Color (optional)
- Right-click axis → Add Reference Line if needed
Stacked Bar:
- Drag Region to Columns
- Drag Plant Type to Color
- Drag count measure to Rows
- Analysis → Stack Marks → On
- For percentages: Right-click axis → Quick Table Calculation → Percent of Total
For Treemap
- Create hierarchy: Right-click Family → Hierarchy → Create
- Add Plant Type to the hierarchy
- New worksheet: Change mark type to Square
- Drag hierarchy to Detail
- Drag count measure to Size
- Drag Family to Color
- Drag labels to Label
- Format: Color → Edit Colors; Borders for clarity
Submission Instructions
What to Submit
Submit ONE PDF file containing:
- Part 1: Four proportion visualizations (1-2 pages)
  - All four visualizations clearly visible
  - Each with a 2-3 sentence caption that includes one specific insight
- Part 2: One hierarchical visualization (1 page)
  - Your treemap/sunburst/etc.
  - With 2-3 sentence caption about diversity patterns
- Part 3: Written critique (1 page)
  - Well-organized with clear sections
  - Professional writing
Total: 3-4 pages
Formatting Requirements
- PDF format only
- Filename: Lab3_YourLastName.pdf
- Readable visualizations: Not too small! Use high DPI (300+)
- Professional appearance: Consistent fonts, clean layout
- Pages numbered
- Your name on first page
How to Submit
- Create all visualizations and save as high-quality images (PNG, 300 DPI)
- Write your critique in Word/Google Docs
- Combine everything into one PDF:
  - Word: File → Save As → PDF
  - Google Docs: File → Download → PDF
  - Or use online PDF merger
- Upload to Canvas before Thursday class ends
Tips for Success
General Tips
- Start early! Explore the data first before committing to specific visualizations
- Check column names - they may differ from examples, adapt code accordingly
- Ask questions in office hours or on the discussion board
- Be honest in your critique - self-reflection earns points!
Data Exploration Tips
- Look at the readme - it has important context about the data
- Count things first - understand distributions before visualizing
- Filter strategically - show top 10-15 categories, combine rare ones into "Other"
- Handle NAs - decide if you want to show missing data or exclude it
Visualization Tips
- Less is more: Don't try to show everything at once
- Color matters: Use green for plants, or color by category consistently
- Label clearly: Plant family names can be long - consider abbreviating or using horizontal bars
- Tell a story: What's the main message each visualization should convey?
- Be consistent: Use same color scheme across your four proportion visualizations
Writing Tips
- Be specific: "The Fabaceae family dominates with 47 species (23%)" is better than "One family is very common"
- Reference concepts from class: Pre-attentive attributes, color theory, gestalt principles
- Compare thoughtfully: Discuss trade-offs between visualization types
- Connect to real world: What do these patterns mean for biodiversity, food security, agriculture?
- Proofread! Typos hurt your credibility
Frequently Asked Questions
Q: What if column names are different from the examples?
A: Expected! Use glimpse(), head(), or info() to see actual column names. The readme will help too. Adapt the code to match your data.
Q: How many families should I show in my pie chart?
A: 8-12 categories maximum. More than that becomes hard to read. Group rare families into "Other".
Q: Should I filter out plants with missing data?
A: Up to you! Justify your choice in the critique. You could exclude NAs or show them as a separate category.
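To see how much this choice matters before you commit, it can help to compute the percentages both ways. The sketch below is a small Python illustration on made-up values; the "Missing" label and the toy series are assumptions, and with the real data you would use the column you are plotting.

```python
import pandas as pd

# Toy stand-in for a categorical column with missing entries (hypothetical values)
toy = pd.Series(["Fabaceae", "Fabaceae", "Rosaceae", None, None])

# How many values are missing?
n_missing = toy.isna().sum()

# Choice 1: exclude missing values (value_counts drops them by default)
excluded = toy.value_counts(normalize=True) * 100

# Choice 2: show missing values as their own category
shown = toy.fillna("Missing").value_counts(normalize=True) * 100

print(n_missing)
print(excluded.round(1))
print(shown.round(1))
```

In this toy case Fabaceae is about 67% of the known values but only 40% of all rows, which is the kind of difference worth mentioning in your critique.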
Q: Can I create an interactive visualization?
A: Yes! Use Plotly (Python/R) or Tableau. Submit the HTML file as a bonus, but include screenshots in your PDF (required).
Q: What if I want to explore something not in the options?
A: Great! As long as you meet the requirements (4 proportion viz, 1 hierarchical, critique), explore what interests you.
Q: How critical should I be in my critique?
A: Very! Honest, thoughtful critique earns more points than generic praise. Identify real weaknesses and suggest improvements.
Q: Can I work with a partner?
A: Discuss ideas and help debug code, but submit your own work. Your visualizations and critique must be entirely your own.
Q: What if I find something really interesting in the data?
A: Awesome! Mention it in your critique. Curiosity and genuine insights are valued.
Q: I've never heard of some of these plant families. Is that okay?
A: Totally fine! You're learning about data visualization, not botany. The patterns are what matter.
