Project Proposal

STAT 80: Data Visualization

Overview

Due: End of Week 4
Points: 100 (10% of final grade)
Format: PDF submission via Canvas
Groups: Work must be done in pairs (one group will be a trio)

This is the first checkpoint for your final project. The goal is to help you select a dataset, start exploring it, and get early feedback before you invest too much time in a particular direction.

What You’ll Submit

Your proposal should be 3-4 pages total:

  • First 2-3 pages: Your exploratory visualizations with brief captions
  • Final page: Your written plan (see requirements below)

Part 1: Choose Your Dataset (Required)

Dataset Requirements

Your dataset should:

  • Have at least 100 observations (rows)
  • Include at least 5 variables (columns) of different types (numbers, categories, dates, etc.)
  • Come from a trustworthy source (see examples below)
  • Be something you’re genuinely curious about
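If you plan to explore your data in Python (one of the optional tools listed in Part 2), a quick check like this can confirm the size and variety requirements before you commit to a dataset. This is a minimal sketch, not a required step. It assumes pandas is installed and your data is already loaded into a DataFrame, and the function name `check_requirements` is hypothetical. It also counts missing values, one of the problems exploratory work is meant to surface.

```python
import pandas as pd


def check_requirements(df: pd.DataFrame, min_rows: int = 100, min_cols: int = 5) -> dict:
    """Summarize whether a DataFrame meets the proposal's dataset requirements."""
    # Sort columns into broad kinds: numbers, dates, and everything else
    # (text or categories). Dates read from a CSV count as "other" unless parsed.
    numeric = df.select_dtypes(include="number").columns
    dates = df.select_dtypes(include="datetime").columns
    other = df.columns.difference(numeric.union(dates))
    kinds_present = sum(len(cols) > 0 for cols in (numeric, dates, other))
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "enough_rows": len(df) >= min_rows,
        "enough_columns": df.shape[1] >= min_cols,
        "kinds_present": kinds_present,  # between 0 and 3 broad column kinds
        "mixed_types": kinds_present >= 2,
        "missing_values": int(df.isna().sum().sum()),  # total empty cells
    }
```

For example, you might call `check_requirements(pd.read_csv("my_dataset.csv", parse_dates=["date"]))`, where the file and column names are placeholders. Dates stored as text are counted as categories unless you parse them (for example with `parse_dates`), so the check may report fewer column types than your data really has.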

Trusted Data Sources

Here are some excellent places to find datasets:

Government & Official Sources:

Other Trustworthy Sources:

What to Write About Your Dataset

In your written plan (Part 3), include:

  1. Where the data comes from: Name the source and provide a link
  2. Why you trust this source: Is it from a government agency, research institution, or reputable organization?
  3. What the data represents: What does each row represent? (e.g., “Each row is one country in 2020” or “Each row is one student’s survey response”)
  4. When the data was collected: Is it recent? Historical? Does timing matter for your questions?

Example: “My dataset comes from the CDC’s National Health and Nutrition Examination Survey (NHANES) 2017-2020. I trust this source because the CDC is a federal agency that uses rigorous data collection methods. Each row represents one survey participant, and the data includes demographic information, dietary habits, and health measurements. This data was collected between 2017 and 2020, which is recent enough to reflect current health patterns.”

Part 2: Create 3-5 Exploratory Visualizations (Required)

What is Exploratory Visualization?

Exploratory visualizations are the charts you make for yourself when you first start working with data. They help you:

  • Understand what variables you have
  • See patterns or interesting features
  • Identify potential problems (missing data, outliers, errors)
  • Generate ideas for research questions

These don’t need to be perfect or beautiful! They’re about discovery.

What to Include

Create 3-5 different visualizations that show:

  1. At least one chart showing a single variable’s distribution, such as a:
    • Histogram of a numeric variable
    • Bar chart of a categorical variable
  2. At least one chart showing a relationship between two variables, such as a:
    • Scatterplot of two numeric variables
    • Grouped bar chart comparing categories
    • Line chart over time
  3. At least one chart exploring something that surprised you or caught your interest

Tips for Success

  • Use simple chart types: Histograms, bar charts, scatterplots, and line charts are your friends
  • Add titles and labels: Even for exploratory work, label your axes!
  • Write captions: Under each chart, write 1-2 sentences about what you observe
  • Try different variables: Don’t make 5 histograms. Mix it up!

Good Caption Example: “This histogram shows the distribution of ages in the dataset. Most participants are between 20 and 60 years old, with a peak around 30-40. There are relatively few participants over 70.”

Not Helpful: “Figure 1: Age distribution”

Tools You Can Use

  • Tableau (what we’re learning in class)
  • R (if you’re comfortable with it)
  • Python (if you’re comfortable with it)
  • Excel or Google Sheets (totally fine for exploration!)
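If you choose Python from the list above, the kinds of charts described in “What to Include” can be drafted quickly with pandas and matplotlib. This is one possible sketch, not a template you must follow: the function name and the column names in the usage note are hypothetical, and it selects matplotlib’s off-screen `Agg` backend so it also runs outside a notebook. Each chart gets a title and axis labels, as the tips above recommend.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; you can drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def exploratory_charts(df: pd.DataFrame, numeric_col: str, category_col: str,
                       x_col: str, y_col: str) -> plt.Figure:
    """Draw three quick exploratory charts side by side and return the figure."""
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

    # 1. Distribution of one numeric variable (histogram, missing values skipped).
    ax1.hist(df[numeric_col].dropna(), bins=20)
    ax1.set(title=f"Distribution of {numeric_col}", xlabel=numeric_col, ylabel="Count")

    # 2. Distribution of one categorical variable (bar chart of counts per category).
    counts = df[category_col].value_counts()
    ax2.bar(counts.index.astype(str), counts.values)
    ax2.set(title=f"Count by {category_col}", xlabel=category_col, ylabel="Count")

    # 3. Relationship between two numeric variables (scatterplot).
    ax3.scatter(df[x_col], df[y_col], alpha=0.5)
    ax3.set(title=f"{y_col} vs. {x_col}", xlabel=x_col, ylabel=y_col)

    fig.tight_layout()
    return fig
```

Usage might look like `fig = exploratory_charts(df, "age", "region", "hours_studied", "exam_score")` followed by `fig.savefig("exploration.png")`. You would still write a 1-2 sentence caption under each chart yourself.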

Part 3: Written Plan (Required)

On your final page, include:

A. Dataset Description (1-2 paragraphs)

Write 1-2 paragraphs answering:

  • Where does your data come from? (Include a link/citation)
  • Why do you trust this source?
  • What does the data represent? (What is one row?)
  • When was it collected?
  • What variables does it include? (List 5-7 key variables)

B. Three Research Questions (Required)

List 3 specific questions you want to explore with visualizations. Good research questions are:

  • Specific: “How does income vary by education level?” not “What about income?”
  • Visual: Can be answered by looking at a chart
  • Interesting: Something you actually want to know!

Example Research Questions:

❌ Too vague: “What about climate change?”
✅ Better: “How have global temperatures changed over the past 50 years?”

❌ Too simple: “What’s the average price?”
✅ Better: “How do housing prices compare across different neighborhoods, and has this changed over time?”

❌ Not visual: “What is the correlation coefficient?”
✅ Better: “Is there a relationship between hours studied and exam scores?”

C. Visualization Plan (1 paragraph)

Describe the types of visualizations you’re planning to create for your final project (the rubric below expects roughly 5-7). You don’t need to be too specific yet, but give us a sense of your approach and why each chart type fits your questions.

Example: “For my final project, I plan to create an infographic about food deserts in California. I’ll include a map showing the location of food deserts across different counties, bar charts comparing access to grocery stores across urban vs. rural areas, and a comparison chart showing the relationship between food desert status and health outcomes. I’m also considering a timeline showing how food desert patterns have changed over the past decade.”

Grading Rubric (100 points)

  • Dataset Description (25 points): Dataset meets requirements; source is clearly identified and trustworthy; proper citation included
  • Initial Visualizations (25 points): 3-5 visualizations created; variety of chart types; clear labels and titles; thoughtful captions; shows genuine exploration
  • Research Questions (20 points): Three specific, visual, and interesting questions listed; questions are answerable with the chosen dataset
  • Planned Visualizations (20 points): 5-7 visualizations planned; clear rationale; appropriate types; shows course knowledge; realistic visualization plan; demonstrates understanding of project scope
  • Overall Quality (10 points): Professional, clear writing; proper formatting; easy to follow

What Happens Next?

After you submit your proposal:

  1. You’ll receive feedback (within 1 week) on your dataset choice and research questions
  2. You can revise your approach based on feedback
  3. You’ll continue working on this project through the quarter
  4. Your final deliverable will be an infographic incorporating your visualizations

Frequently Asked Questions

Q: Can I change my dataset after the proposal?
A: Yes, but only with instructor approval. The proposal is meant to help you avoid problems later, so if you get feedback suggesting a change, take it seriously!

Q: Do my exploratory visualizations need to be beautiful?
A: No! They should be clear and labeled, but they’re exploratory. Beauty comes later.

Q: What if I can’t think of good research questions?
A: Start by exploring your data! Make some charts. What surprises you? What makes you curious? Those surprises often lead to the best questions.

Q: Can I use a dataset from my major or research?
A: Absolutely! Using data related to your field is encouraged.

Q: How many variables should my dataset have?
A: At least 5, but more is better. You want enough variables to explore interesting relationships.

Q: What if my dataset is really big (millions of rows)?
A: That’s fine! You can work with a sample for your exploratory phase. Just mention this in your dataset description.
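In Python, one way to do this is a reproducible random sample, sketched below under the assumption that the full table fits in memory as a pandas DataFrame. The function name, the default of 10,000 rows, and the seed are arbitrary choices for illustration. Avoid simply taking the first rows (for example with `read_csv(..., nrows=...)`), since files are often sorted by date or ID and the top rows may not be representative.

```python
import pandas as pd


def exploration_sample(df: pd.DataFrame, n: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return a reproducible random sample of at most n rows for quick exploration."""
    if len(df) <= n:
        # Small enough already: keep every row.
        return df.copy()
    # A fixed random_state means re-running your code gives the same sample,
    # so your charts and the numbers you report stay consistent.
    return df.sample(n=n, random_state=seed)
```

Because the seed is fixed, you can state exactly how the sample was drawn when you mention it in your dataset description.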

Getting Help

  • Office hours: Come show me your dataset and ask questions!
  • Discussion board: Share dataset ideas and get feedback from peers on Ed Discussion
  • Library: Research librarians can help you find datasets in your area of interest

Final Tips

  1. Start early: Finding the right dataset takes time
  2. Choose something you care about: You’ll be working with this data all quarter
  3. Don’t overthink it: Your proposal doesn’t need to be perfect
  4. Ask questions: Use office hours or the discussion board
  5. Have fun: This is your chance to explore something that interests you!

Remember: The goal of this proposal is to get you started and get you feedback. We want to help you succeed, so don’t worry about having everything figured out perfectly. Just show us you’re thinking carefully about your data and your questions!