STAT 80B Final Project Guidelines
Data Visualization Term Project
Project Overview
The final project is your opportunity to demonstrate mastery of data visualization principles by creating a comprehensive visual analysis of a dataset of your choice. You will produce a series of high-quality visualizations that tell a coherent story about your data and show both technical proficiency and design excellence.
| Component | Weight | Due Date | Deliverable |
|---|---|---|---|
| Project Proposal | 10% | Week 5 (Friday) | PDF with dataset description, initial visualizations, questions, and plan |
| Exploratory Data Analysis (EDA) | 10% | Week 9 (Friday) | PDF with 10-15 exploratory visualizations and summary |
| Presentation | 10% | Week 10 (in class) | 8-10 minute presentation with peer feedback |
| Final Visual Report | 13% | Finals Week | 5-8 page report with publication-quality visualizations |
| Reproducible Documentation | 2% | Finals Week | Code (R/Python) OR Process Documentation (Tableau) |
| TOTAL | 45% | | |
Project Goals
By completing this project, you will:
- Apply visualization principles from the entire course to a real-world dataset
- Make informed decisions about chart types, color schemes, and design elements
- Create publication-quality visualizations suitable for professional portfolios
- Communicate complex data insights through visual storytelling
- Demonstrate technical proficiency with your chosen tool (Tableau, R, or Python)
- Provide and receive constructive peer feedback
Working in Pairs
- Both partners must contribute equally and submit a joint statement describing each person’s contributions
Component 1: Project Proposal (10%)
Due: Week 5 (Friday, 11:59 PM)
Submission: PDF via Canvas
Purpose
The proposal ensures you have a suitable dataset and clear research direction before investing significant time in the project. This is an opportunity to get early feedback from the instructor.
Requirements
Your proposal should be 3-4 pages (including visualizations) and include:
1. Dataset Description (1 page)
- Source: Where did you obtain the data? Include URL or citation
- Topic: What is this data about? Why is it interesting?
- Variables: List key variables with descriptions and data types (quantitative continuous, quantitative discrete, categorical, ordinal)
- Size: Number of observations and variables
- Quality: Are there missing values? Outliers? Data quality issues?
- Relevance: Why did you choose this dataset? What makes it suitable for visualization?
- Trust: Is the source of the data an institution we can trust? If not, why do you trust the entity that produced the data?
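For students working in Python, the Size and Quality items above can be filled in from a short summary script. The following is a minimal sketch, not a required method: the data frame, its column names, and the helper `describe_dataset` are hypothetical, and the 1.5 × IQR outlier rule is one common rule of thumb rather than a course requirement.

```python
import pandas as pd

# Hypothetical stand-in data; in practice use something like pd.read_csv("your_file.csv").
df = pd.DataFrame(
    {
        "neighborhood": ["A", "B", "A", "C", "B", "C"],
        "price": [410_000, 525_000, None, 2_300_000, 498_000, 610_000],
        "bedrooms": [2, 3, 2, 5, 3, 3],
    }
)


def describe_dataset(data: pd.DataFrame) -> dict:
    """Collect size, data types, missing values, and a rough outlier count."""
    numeric = data.select_dtypes("number")
    # Flag values more than 1.5 * IQR beyond the quartiles (rule of thumb only).
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return {
        "n_observations": len(data),
        "n_variables": data.shape[1],
        "dtypes": data.dtypes.astype(str).to_dict(),
        "missing_per_variable": data.isna().sum().to_dict(),
        "outliers_per_numeric_variable": is_outlier.sum().to_dict(),
    }


summary = describe_dataset(df)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The printed counts are a starting point for the written description; they do not replace your own judgment about whether a flagged value is an error or a genuine extreme.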
2. Initial Exploratory Visualizations (1-2 pages)
Create 3-5 initial visualizations that explore your data:
- Use at least 3 different visualization types
- Include proper titles, axis labels, and legends
- Add 2-3 sentence captions explaining what each visualization shows
- Tool flexibility: Use Tableau, R, or Python - whichever you plan to use for the final project
These are exploratory - they don’t need to be polished or publication-ready yet.
3. Research Questions (0.5 page)
List 3 specific questions you want to explore through visualization. Good questions are:
- Specific: “How has temperature changed in California over the past 50 years?” not “What about climate?”
- Answerable: Your data should contain the information needed
- Visual: The answer would benefit from visualization
- Interesting: The answer isn’t immediately obvious
Examples:
- How do housing prices vary by neighborhood and over time?
- What factors are most strongly associated with customer satisfaction?
- How have COVID-19 case rates differed across age groups and regions?
4. Planned Visualizations (0.5 page)
Outline 5-7 visualizations you plan to create for your final report:
- What type of chart will you use? (scatter plot, time series, choropleth map, etc.)
- What variables will you show?
- What question will it answer?
- Why is this visualization type appropriate?
Example:
> Visualization 3: Time series line graph
> Variables: Date (x-axis), Average temperature (y-axis), colored by region
> Question: How has temperature changed over time across different regions?
> Why: Line graphs effectively show trends over time; color distinguishes regions
Tool-Specific Guidance
Tableau:
- Connect to your data source (CSV, Excel, database)
- Create initial visualizations using drag-and-drop
- Export each visualization as an image (Worksheet → Export → Image)
- Include screenshots in your PDF proposal
- Mention if you plan to use calculated fields or parameters
R:
- Load data using read.csv(), read_excel(), or appropriate function
- Create visualizations using ggplot2
- Save plots using ggsave()
- Include code snippets if they clarify your approach
- Cite any LLM assistance: # Used ChatGPT to help with faceting syntax
Python:
- Load data using pandas (pd.read_csv(), etc.)
- Create visualizations using matplotlib, seaborn, plotly, or altair
- Save figures using plt.savefig() or equivalent
- Include code snippets if they clarify your approach
- Cite any LLM assistance: # Generated initial plot with Claude assistance
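As an illustration of the Python workflow (load, plot, save), here is a minimal sketch that produces three different chart types for a proposal. The data values, column names, file names, and the `save` helper are all hypothetical; a real project would load its own file with pd.read_csv() and save into a project folder instead of a temporary directory.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; in practice: df = pd.read_csv("your_data.csv")
df = pd.DataFrame(
    {
        "year": [2019, 2020, 2021, 2022, 2023, 2024],
        "avg_temp_c": [15.1, 15.4, 15.2, 15.9, 16.0, 16.3],
        "rainfall_mm": [510, 430, 470, 390, 360, 340],
        "region": ["North", "South", "North", "South", "North", "South"],
    }
)

# Temporary folder so the sketch runs anywhere; swap for e.g. Path("figures").
out_dir = Path(tempfile.mkdtemp())


def save(fig, name):
    """Save a figure as PNG, close it, and return the file path."""
    path = out_dir / name
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path


# 1. Line chart: trend over time
fig, ax = plt.subplots()
ax.plot(df["year"], df["avg_temp_c"], marker="o")
ax.set(title="Average temperature by year", xlabel="Year", ylabel="Temperature (°C)")
line_path = save(fig, "proposal_01_line.png")

# 2. Scatter plot: relationship between two quantitative variables
fig, ax = plt.subplots()
ax.scatter(df["avg_temp_c"], df["rainfall_mm"])
ax.set(title="Rainfall vs. temperature", xlabel="Temperature (°C)", ylabel="Rainfall (mm)")
scatter_path = save(fig, "proposal_02_scatter.png")

# 3. Bar chart: comparison across categories
means = df.groupby("region")["rainfall_mm"].mean()
fig, ax = plt.subplots()
ax.bar(means.index, means.values)
ax.set(title="Mean rainfall by region", xlabel="Region", ylabel="Rainfall (mm)")
bar_path = save(fig, "proposal_03_bar.png")

print(line_path, scatter_path, bar_path, sep="\n")
```

Each saved PNG can then be placed in the proposal PDF with its 2-3 sentence caption.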
Submission Format
- File type: PDF only
- Filename: LastName_FirstName_Proposal.pdf (or LastName1_LastName2_Proposal.pdf for pairs)
- Page limit: 3-4 pages (content beyond 4 pages will not be reviewed)
- Include: Your name, date, and “STAT 80B Project Proposal” at the top
Grading Rubric: Project Proposal (100 points)
| Criterion | Excellent (90-100%) | Good (80-89%) | Satisfactory (70-79%) | Needs Improvement (<70%) | Points |
|---|---|---|---|---|---|
| Dataset Description | Complete, clear description with all required elements; dataset is appropriate and interesting | Mostly complete; minor gaps in description; dataset is suitable | Incomplete description; missing some key information; dataset concerns | Significant gaps; dataset may not be suitable for project | /25 |
| Initial Visualizations | 3-5 clear visualizations; diverse types; appropriate for data; properly labeled | 3-5 visualizations; mostly appropriate; minor labeling issues | 2-3 visualizations; limited variety; labeling issues | <2 visualizations or major appropriateness issues | /25 |
| Research Questions | 3 specific, answerable, interesting questions well-suited for visualization | 3 questions; mostly specific and answerable; could be stronger | 3 questions but vague or not well-suited for visualization | <3 questions or very weak questions | /20 |
| Planned Visualizations | 5-7 visualizations planned; clear rationale; appropriate types; shows course knowledge | 5-7 visualizations; mostly appropriate; rationale could be clearer | 4-5 visualizations; weak rationale; questionable choices | <4 visualizations or very poor planning | /20 |
| Presentation & Writing | Professional, clear writing; proper formatting; easy to follow | Clear writing; minor formatting issues | Adequate writing; formatting issues affect clarity | Poor writing or formatting significantly hinders understanding | /10 |
Component 2: Exploratory Data Analysis (EDA) (10%)
Due: Week 9 (Friday, 11:59 PM)
Submission: PDF via Canvas
Purpose
The EDA is a complete exploratory analysis of your dataset where you “play” with the data, discover patterns, identify relationships, and determine which visualizations best tell your story. This is where you refine your approach before creating polished final visualizations.
Requirements
Your EDA should be 8-12 pages and include:
1. Introduction (1 page)
- Brief dataset description (can build on your proposal)
- Updated or refined research questions (if they’ve evolved)
- Overview of your exploration process
2. Exploratory Visualizations (5-8 pages)
Create 10-15 exploratory visualizations that:
- Explore different aspects of your data
- Try multiple visualization types for the same data (compare effectiveness)
- Experiment with different variables and relationships
- Test different design choices (colors, scales, arrangements)
- Include both “winners” (visualizations you’ll polish for final) and “experiments” (things you tried)
For each visualization:
- Include a descriptive title
- Label axes and include legends as needed
- Write 2-4 sentences about what you observe and what you learned
Visualization variety requirements:
- At least 5 different visualization types (e.g., bar chart, scatter plot, time series, box plot, heatmap)
- At least 2 multi-panel or small multiple figures
- At least 1 visualization showing uncertainty or distributions
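One figure can satisfy both the multi-panel and the distribution requirement. The sketch below, using simulated data and hypothetical region names, draws one histogram per group as small multiples with shared axes so the panels are directly comparable.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated daily temperatures for three hypothetical regions.
rng = np.random.default_rng(0)
groups = {
    "North": rng.normal(14, 2.0, 200),
    "Central": rng.normal(17, 2.5, 200),
    "South": rng.normal(20, 3.0, 200),
}

# One panel per region; shared axes keep the panels comparable.
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4), sharex=True, sharey=True)
for ax, (name, values) in zip(axes, groups.items()):
    ax.hist(values, bins=20, color="#0072B2")
    ax.axvline(values.mean(), color="black", linestyle="--")  # dashed line marks the mean
    ax.set_title(name)
    ax.set_xlabel("Temperature (°C)")
axes[0].set_ylabel("Number of days")
fig.suptitle("Distribution of daily temperature by region")
fig.tight_layout()
```

Sharing the x- and y-axes is a design choice: it makes differences in center and spread visible at a glance, at the cost of some empty space in panels with narrower distributions.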
3. Findings and Patterns (1-2 pages)
Summarize what you discovered:
- What patterns or relationships did you find?
- What surprised you?
- What anomalies or outliers did you notice?
- How did your understanding of the data evolve?
- What challenges did you encounter (data quality, missing values, etc.)?
4. Final Visualization Selection (1 page)
Identify your top 5-7 visualizations for your final report:
- List which exploratory visualizations you plan to polish
- Explain why you chose these (most important insights, clearest communication, etc.)
- Note any design improvements you plan to make
Example:
> Selected for Final: Exploratory Viz #3 (Scatter plot of price vs. square footage)
> Why: Clearly shows positive relationship; interesting outliers to discuss
> Planned improvements: Add color by neighborhood, increase label size, add trend line, improve title
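The planned improvements in the example above (color by neighborhood, larger labels, a trend line, a better title) could look roughly like this in Python. This is a sketch under assumptions: the numbers and neighborhood names are made up, and the trend line is a simple least-squares fit from np.polyfit, which is only one of several ways to add a trend.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical housing data.
sqft = np.array([800, 950, 1100, 1300, 1500, 1700, 2000, 2400])
price_k = np.array([320, 360, 410, 450, 520, 560, 640, 900])  # thousands of dollars
neighborhood = np.array(
    ["Eastside", "Westside", "Downtown", "Eastside",
     "Westside", "Downtown", "Eastside", "Downtown"]
)
palette = {"Eastside": "#0072B2", "Westside": "#E69F00", "Downtown": "#009E73"}

# Least-squares straight line: price_k is approximately slope * sqft + intercept.
slope, intercept = np.polyfit(sqft, price_k, 1)

fig, ax = plt.subplots(figsize=(8, 5))
for name, color in palette.items():
    mask = neighborhood == name
    ax.scatter(sqft[mask], price_k[mask], color=color, label=name, s=60)

xs = np.linspace(sqft.min(), sqft.max(), 100)
ax.plot(xs, slope * xs + intercept, color="black", linewidth=1.5, label="Linear trend")

ax.set_title("Larger homes sell for more, with one high-priced outlier", fontsize=16)
ax.set_xlabel("Square footage", fontsize=14)
ax.set_ylabel("Sale price (thousands of $)", fontsize=14)
ax.legend(title="Neighborhood")
fig.tight_layout()
```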
5. Reflection (0.5-1 page)
- What did you learn from the exploration process?
- What would you do differently if starting over?
- What additional data would be helpful?
Tool-Specific Guidance
Tableau:
- Use worksheets to create individual visualizations
- Experiment with Show Me for different chart types
- Try different color palettes and test for colorblindness
- Use dashboards to arrange multiple views
- Export each visualization separately or as a dashboard
- Save your Tableau workbook - you’ll need it for the final project
R:
- Create a well-organized R script or R Markdown document
- Use ggplot2 for consistency
- Try different themes: theme_minimal(), theme_classic(), etc.
- Experiment with faceting: facet_wrap(), facet_grid()
- Save exploratory plots: ggsave("exploratory_plot_01.png")
- Comment your code to document your thought process
- Cite LLM assistance where used
Python:
- Use a Jupyter notebook for interactive exploration
- Try different libraries: matplotlib for basic plots, seaborn for statistical graphics, plotly for interactive
- Experiment with subplots: plt.subplot() or fig, axes = plt.subplots()
- Document your thought process with markdown cells
- Save figures: fig.savefig("exploratory_plot_01.png")
- Cite LLM assistance where used
Submission Format
- File type: PDF only
- Filename: LastName_FirstName_EDA.pdf
- Page limit: 8-12 pages (content beyond 12 pages will not be reviewed)
- Include: Your name, date, and “STAT 80B EDA” at the top
Grading Rubric: EDA (100 points)
| Criterion | Excellent (90-100%) | Good (80-89%) | Satisfactory (70-79%) | Needs Improvement (<70%) | Points |
|---|---|---|---|---|---|
| Quantity & Variety | 10-15 visualizations; at least 5 types; includes multi-panel and uncertainty | 10-15 visualizations; 4-5 types; mostly meets requirements | 8-10 visualizations; 3-4 types; missing some requirements | <8 visualizations; limited variety | /25 |
| Quality of Exploration | Thoughtful experimentation; tries multiple approaches; clear learning progression | Good exploration; some experimentation; adequate variety | Limited exploration; mostly straightforward visualizations | Minimal exploration; superficial analysis | /25 |
| Findings & Patterns | Clear, insightful summary; identifies interesting patterns; discusses challenges | Good summary; identifies main patterns; mentions challenges | Basic summary; obvious patterns only; limited reflection | Weak summary; misses key patterns | /20 |
| Final Viz Selection | 5-7 selections with clear, strong rationale; shows critical thinking | 5-7 selections; adequate rationale | 4-5 selections; weak rationale | <4 selections or very poor rationale | /15 |
| Documentation | All visualizations well-labeled and captioned; clear, professional presentation | Most visualizations well-documented; minor issues | Some documentation issues; harder to follow | Poor documentation; difficult to understand | /15 |
Component 3: Presentation (10%)
Due: Week 10 (in class)
Format: 8-10 minute presentation + Q&A
Purpose
Present your project to the class, sharing your data story and key visualizations. This is an opportunity to practice communicating insights clearly and receiving feedback before finalizing your report.
Requirements
Presentation Structure (8-10 minutes)
1. Introduction (1-2 minutes)
- What is your dataset? Where is it from?
- Why is it interesting or important?
- What questions are you exploring?
2. Key Visualizations (5-6 minutes)
- Show your top 5-7 visualizations (the ones going in your final report)
- For each visualization:
- Explain what it shows (2-3 sentences)
- Highlight the main insight or pattern
- Mention 1-2 key design decisions you made
- Build a narrative - connect your visualizations into a coherent story
3. Insights & Conclusions (1-2 minutes)
- What did you learn from your data?
- What are the main takeaways?
- What surprised you or challenged your expectations?
4. Q&A (2-3 minutes)
- Answer questions from classmates and instructor
- Be prepared to explain design choices
Presentation Format
- Slides: Create a slide deck (PowerPoint, Google Slides, or PDF)
- Timing: Practice to stay within 8-10 minutes
- Delivery: Speak clearly; engage with your audience; avoid reading slides
- Visuals: Let your visualizations speak - don’t overcrowd slides with text
Recommended slide structure:
- Title slide (project title, your name)
- Dataset introduction
- Research questions
- 5-7 slides with visualizations (one per slide, or 2 small ones)
- Conclusions slide
- (Optional) Thank you / Questions slide
Tool-Specific Guidance
Tableau:
- Export high-quality images: Worksheet → Export → Image (PNG, highest quality)
- Or create a Story in Tableau and present directly from Tableau
- Consider using Tableau’s presentation mode (full screen)
- Test your exported images in slides beforehand
R:
- Export plots at high resolution: ggsave("plot.png", width=10, height=6, dpi=300)
- Or knit an R Markdown presentation (ioslides, slidy, or xaringan)
- Ensure all plots are clearly visible on projector
- Have backup static images in case of technical issues
Python:
- Export high-quality figures: plt.savefig("plot.png", dpi=300, bbox_inches='tight')
- Or use Jupyter notebook in presentation mode
- For interactive plotly charts, export as static images for slides
- Test that visualizations are clearly visible
Peer Feedback
During presentations, all students will provide feedback to their classmates:
Your Responsibilities as Audience Member
- Listen actively to each presentation
- Complete a feedback form for each presentation (provided in class)
- Ask at least one question during the Q&A period (across all presentations)
- Provide constructive feedback that is:
- Specific (not just “good job”)
- Balanced (mention strengths and areas for improvement)
- Actionable (suggest concrete improvements)
- Respectful (kind tone, focus on the work not the person)
Feedback Form Questions
You will evaluate each presentation on:
- Clarity: Were the research questions clear? Was the presentation easy to follow?
- Visualizations: Were the visualizations effective? Well-designed? Appropriately chosen?
- Insights: Did the presenter communicate interesting findings? Was there a clear data story?
- Delivery: Was the presentation well-paced? Engaging? Professional?
- One strength: What was the best aspect of this presentation?
- One suggestion: What could be improved for the final report?
Completing thoughtful peer feedback forms is part of your participation grade. These forms help your classmates improve their final reports and contribute to our learning community.
Submission Format
- Slides: Upload PDF of slides to Canvas by the start of class
- Filename: LastName_FirstName_Presentation.pdf
- Peer feedback forms: Complete in class (paper or electronic, as instructed)
Grading Rubric: Presentation (100 points)
| Criterion | Excellent (90-100%) | Good (80-89%) | Satisfactory (70-79%) | Needs Improvement (<70%) | Points |
|---|---|---|---|---|---|
| Content & Story | Clear, compelling narrative; research questions well-defined; strong coherence | Good narrative; clear questions; mostly coherent | Adequate content; questions okay; some coherence issues | Weak narrative; unclear questions; lacks coherence | /30 |
| Visualizations | 5-7 excellent visualizations; well-designed; clearly visible; effectively support story | 5-7 good visualizations; mostly well-designed; support story | 4-5 visualizations; some design issues; partial support | <4 visualizations or poor quality/design | /30 |
| Communication | Excellent delivery; clear explanation; engaging; perfect timing (8-10 min) | Good delivery; clear explanation; good timing | Adequate delivery; understandable; timing issues (±2 min) | Poor delivery; unclear; major timing issues | /20 |
| Design & Professionalism | Professional slides; clean layout; appropriate text; polished presentation | Professional slides; mostly clean; minor issues | Adequate slides; some layout/text issues | Poor slide design; unprofessional | /10 |
| Q&A Response | Thoughtful, clear answers; demonstrates deep understanding | Good answers; shows understanding | Adequate answers; some understanding | Weak answers; limited understanding | /10 |
Component 4: Final Visual Report (13%)
Due: Finals Week
Submission: PDF via Canvas
Purpose
The final visual report is your polished, publication-quality project deliverable. This demonstrates your mastery of data visualization principles and your ability to communicate insights through well-designed visual narratives.
Requirements
Your final report should be 5-8 pages and include:
1. Title Page (1 page)
- Project title
- Your name (and partner’s name if working in pairs)
- Course name and quarter
- Date
- (Optional) Compelling visualization as background or header image
2. Introduction (0.5-1 page)
- Dataset: Briefly describe your data (source, topic, variables, size)
- Context: Why is this data interesting or important?
- Questions: State your 3 research questions
- Purpose: What story are you telling with this data?
3. Visualizations (3-5 pages)
Present your 5-7 publication-quality visualizations:
Each visualization must include:
- The visualization itself - high quality, properly sized
- Descriptive title - tells readers what they’re looking at
- Clear labels - axis labels, legend, units
- Caption (3-5 sentences):
- What does this visualization show?
- What pattern or insight is revealed?
- Why is this important or interesting?
Design requirements:
- All visualizations must follow design principles from the course
- Appropriate chart types for each data structure
- Colorblind-safe color palettes
- Proper use of scales (no misleading axes)
- Readable labels (large enough font sizes)
- Consistent visual style across all figures
- Professional appearance
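For Python users, one way to get a consistent visual style, readable labels, and a colorblind-safe palette across all figures is to set matplotlib defaults once at the top of the script. The sketch below is an assumption about how this might be done, not a course-mandated style: the helper name `apply_report_style` and the specific font sizes are illustrative, and the colors are the widely used Okabe-Ito palette.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
from cycler import cycler

# Okabe-Ito palette, designed to remain distinguishable under common
# forms of color vision deficiency.
OKABE_ITO = [
    "#E69F00", "#56B4E9", "#009E73", "#F0E442",
    "#0072B2", "#D55E00", "#CC79A7", "#000000",
]


def apply_report_style():
    """Set defaults once so every figure in the report shares one look."""
    plt.rcParams.update(
        {
            "axes.prop_cycle": cycler(color=OKABE_ITO),
            "font.size": 12,
            "axes.titlesize": 16,
            "axes.labelsize": 14,
            "legend.fontsize": 12,
            "figure.figsize": (10, 6),
            "axes.spines.top": False,
            "axes.spines.right": False,
            "savefig.dpi": 300,
            "savefig.facecolor": "white",
        }
    )


apply_report_style()

# Any figure created afterwards picks up the shared style.
fig, ax = plt.subplots()
for i, label in enumerate(["Group A", "Group B", "Group C"]):
    ax.plot([0, 1, 2], [i, i + 1, i + 0.5], label=label)
ax.set_title("Example with shared style")
ax.set_xlabel("Clear Label")
ax.set_ylabel("Clear Label")
ax.legend()
```

Calling the style function before creating any figure matters, because defaults such as the figure size and color cycle are read when the figure and axes are created.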
Narrative flow:
- Arrange visualizations in logical order
- Each visualization should build on the previous
- Together they tell a coherent story
4. Design Justification (1-2 pages)
For your visualizations overall, explain:
Chart Type Choices:
- Why did you choose these visualization types?
- How do they suit your data and questions?
- What alternatives did you consider?
Color & Aesthetic Decisions:
- Why did you choose these colors?
- How do they enhance understanding?
- Did you test for colorblindness?
Design Principles Applied:
- Which course principles guided your design? (proportional ink, redundant coding, multi-panel figures, etc.)
- What specific design problems did you solve?
- How did you ensure accessibility?
Iteration & Refinement:
- What changed from your EDA to final report?
- What feedback did you incorporate from your presentation?
- What design decisions were most challenging?
5. Conclusion (0.5-1 page)
- Main findings: What are the key insights from your visualizations?
- Answers to research questions: Directly address your 3 questions
- Broader implications: What do these findings mean? Why should readers care?
- Limitations: What are the limitations of your data or analysis?
- Future directions: What would you explore with more time/data?
6. (Optional) References
If you cited external sources, include a references section.
Tool-Specific Requirements
Tableau:
Final Visualization Quality:
- Use highest quality export settings
- PNG format, 300+ DPI recommended
- Ensure text is readable at final size
- Clean up titles, labels, tooltips
- Remove unnecessary gridlines or chart junk
- Test how visualizations look in your PDF
Polishing in Tableau:
- Edit titles: double-click on title area
- Format axes: right-click axis → Format
- Adjust colors: right-click on color legend → Edit Colors
- Add annotations: Right-click → Annotate
- Create professional dashboards with consistent styling
R:
Final Visualization Quality:
# Export high-quality plots
ggsave("final_plot_1.png",
       width = 10, height = 6,
       dpi = 300,
       bg = "white")
Polishing in R:
- Use consistent theme: theme_minimal() or theme_classic()
- Increase text sizes: theme(text = element_text(size = 14))
- Clean axis labels: labs(x = "Clear Label", y = "Clear Label")
- Professional titles: ggtitle("Descriptive Title")
- Save with white background: bg = "white"
- Consider using patchwork or cowplot for multi-panel figures
If using R Markdown:
- Can create entire report in R Markdown and knit to PDF
- Use code chunks with echo=FALSE to hide code
- Cite LLM assistance in code comments, not in final PDF
Python:
Final Visualization Quality:
# Export high-quality figures
fig.savefig("final_plot_1.png",
            dpi=300,
            bbox_inches='tight',
            facecolor='white')
Polishing in Python:
# Matplotlib/Seaborn
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
plt.title("Descriptive Title", fontsize=16)
plt.xlabel("Clear Label", fontsize=14)
plt.ylabel("Clear Label", fontsize=14)
plt.tight_layout()
# For multiple plots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
If using Jupyter:
- Can export notebook to PDF, but hide code cells
- Or export visualizations and assemble in separate document
- Cite LLM assistance in code comments, not in final PDF
Submission Format
- File type: PDF only (not .docx, not .pptx)
- Filename: LastName_FirstName_FinalReport.pdf
- Page limit: 5-8 pages (content beyond 8 pages will not be reviewed)
- File size: Keep under 25 MB (compress images if needed)
- Quality: Ensure all visualizations are crisp and readable
Grading Rubric: Final Visual Report (150 points)
| Criterion | Excellent (90-100%) | Good (80-89%) | Satisfactory (70-79%) | Needs Improvement (<70%) | Points |
|---|---|---|---|---|---|
| Visualizations Quality | 5-7 excellent, publication-quality visualizations; all principles applied correctly; visually stunning | 5-7 good visualizations; mostly correct principles; professional appearance | 4-5 visualizations; some principle violations; adequate quality | <4 or poor quality visualizations; major principle violations | /50 |
| Visualization Appropriateness | Perfect chart type choices for each data structure; enhances understanding; shows mastery | Good chart choices; appropriate for data; shows good understanding | Adequate choices; some questionable decisions | Poor choices; inappropriate for data | /20 |
| Design Excellence | Exceptional use of color, layout, typography; accessible; consistent style; creative yet clear | Very good design; accessible; mostly consistent; professional | Adequate design; some accessibility issues; inconsistencies | Poor design; not accessible; inconsistent | /25 |
| Narrative & Captions | Compelling story; excellent flow; insightful captions; clear takeaways | Good story; logical flow; clear captions | Basic story; adequate captions; some flow issues | Weak or absent story; poor captions | /20 |
| Design Justification | Thoughtful, detailed justification; shows deep understanding of design principles; excellent reflection | Good justification; clear understanding; good reflection | Adequate justification; basic understanding | Weak justification; limited understanding | /15 |
| Introduction & Conclusion | Excellent framing; clear questions; strong insights; thoughtful limitations | Good framing; clear questions; good insights | Adequate framing; basic insights | Weak framing; unclear or missing insights | /10 |
| Professional Presentation | Flawless formatting; perfect writing; publication-ready | Very professional; minor writing issues | Adequate formatting; some writing issues | Poor formatting; significant writing issues | /10 |
Component 5: Reproducible Documentation (2%)
Due: Finals Week (with Final Report)
Submission: Code files OR process documentation via Canvas
Purpose
Provide documentation of how you created your visualizations. This ensures reproducibility (for R/Python) or transparency of process (for Tableau), and demonstrates good workflow practices.
- R/Python users: Submit reproducible code
- Tableau users: Submit process documentation
Option A: For R and Python Users - Reproducible Code
Submit all code files needed to reproduce your final visualizations:
Required Files
1. Main code file(s):
R:
- Submit your .R script or .Rmd file
- Include code for all final visualizations
- Clean, well-commented code
- Load libraries at the top
- Set working directory or use relative paths
- Filename: LastName_FirstName_Project.R or .Rmd
Python:
- Submit your .py script or .ipynb (Jupyter notebook)
- Include code for all final visualizations
- Clean, well-commented code
- Import libraries at the top
- If using notebook, can include or hide code cells
- Filename: LastName_FirstName_Project.py or .ipynb
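A Python main code file that meets these requirements might be organized as in the sketch below: imports at the top, paths built relative to the project folder, and one function per step. All names here (the data/ and output/ folders, housing.csv, the function names) are hypothetical. The demo section creates a throwaway project folder so the sketch runs anywhere; in a real script the project folder would typically be the folder containing the script.

```python
# --- Libraries (all imports at the top) ---
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def load_data(project_dir: Path) -> pd.DataFrame:
    """Read the data using a path relative to the project folder."""
    return pd.read_csv(project_dir / "data" / "housing.csv")


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Document each transformation: here, drop rows with a missing price."""
    return df.dropna(subset=["price"])


def plot_price_by_neighborhood(df: pd.DataFrame, out_dir: Path) -> Path:
    """Final visualization 1: mean price per neighborhood as a bar chart."""
    means = df.groupby("neighborhood")["price"].mean().sort_values()
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.barh(means.index, means.values)
    ax.set_title("Mean sale price by neighborhood")
    ax.set_xlabel("Mean sale price ($)")
    path = out_dir / "fig1_price_by_neighborhood.png"
    fig.savefig(path, dpi=300, bbox_inches="tight", facecolor="white")
    plt.close(fig)
    return path


def main(project_dir: Path) -> list:
    """Run the whole pipeline: load -> prepare -> plot -> save."""
    df = prepare(load_data(project_dir))
    out_dir = project_dir / "output"
    out_dir.mkdir(exist_ok=True)
    return [plot_price_by_neighborhood(df, out_dir)]


# --- Demo only: build a tiny throwaway project so the sketch is runnable ---
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "data").mkdir()
(demo_dir / "data" / "housing.csv").write_text(
    "neighborhood,price\nEastside,410000\nWestside,\nDowntown,610000\n"
)
saved = main(demo_dir)
print(saved)
```

Keeping every path relative to one project folder means a grader can unzip the submission anywhere and run it without editing the code.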
2. README file:
Create a README.md or README.txt file that includes:
- Your name and project title
- Software/tool used (R or Python) and version
- Required libraries/packages
- Instructions for running the code
- Location of data file
- Any special notes or instructions
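Python users can generate the software and package lines of the README instead of typing version numbers by hand. The following is a small sketch using only the standard library; the function name `readme_software_section` and the package list are illustrative assumptions, and the output format simply mirrors the headings used in these guidelines.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def readme_software_section(packages):
    """Return README lines listing the Python version and package versions."""
    lines = [
        "### Software",
        f"- Python version {platform.python_version()}",
        "### Required Packages",
    ]
    for name in packages:
        try:
            lines.append(f"- {name} {version(name)}")
        except PackageNotFoundError:
            # Package is not installed where this script is running.
            lines.append(f"- {name} (not installed in this environment)")
    return "\n".join(lines)


print(readme_software_section(["pandas", "matplotlib", "seaborn"]))
```

Run it in the same environment you used for the project so the reported versions match the ones that produced your figures.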
Example README:
# STAT 80B Final Project: California Housing Prices
## Author: Jane Smith
### Software
- R version 4.3.0
- RStudio recommended
### Required Packages
- tidyverse
- ggplot2
- scales
- patchwork
### Data
- File: `california_housing.csv` (included)
- Source: [URL or citation]
### How to Run
1. Set working directory to project folder
2. Install required packages if needed: install.packages("tidyverse")
3. Run entire script: source("Smith_Jane_Project.R")
4. Visualizations will be saved as PNG files in output/ folder
### Notes
- Code uses LLM assistance (documented in comments)
- All visualizations from final report are generated by this code
3. Data file (if applicable):
- Include your data file if it’s not too large (<10 MB)
- If data is public, provide clear download instructions in README
- If data is too large, include a sample or link to full data
Code Quality Standards
✅ Organization:
- Clear structure with sections/comments
- Related code grouped together
- Logical flow from data loading → processing → visualization
✅ Comments:
- Explain what each section does
- Document any data transformations
- Note any tricky or complex parts
- Cite LLM assistance where used
✅ Reproducibility:
- Code runs without errors
- Paths are relative or clearly documented
- All required libraries/packages listed
- Clear instructions provided
✅ Style:
- Consistent naming conventions
- Proper indentation
- Meaningful variable names
- Functions if appropriate
✅ LLM Citation Example:
# Used ChatGPT to help with facet_wrap syntax for multi-panel figure
# Modified color scheme and labels for my specific data
ggplot(data, aes(x = var1, y = var2)) +
  geom_point(aes(color = category)) +
  facet_wrap(~region) +
  scale_color_brewer(palette = "Set2") + # Changed from default
  theme_minimal()
What Not to Include
❌ Don’t submit:
- Exploratory code that isn’t used in final visualizations
- Multiple versions of the same file
- Code that doesn’t run
- Code you don’t understand (if using LLMs)
Submission Format for R/Python
Create a ZIP file containing:
- Main code file(s)
- README file
- Data file (if applicable and <10 MB)
ZIP filename: LastName_FirstName_Code.zip
Upload to Canvas with your final report.
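If you prefer to build the ZIP file from a script, the sketch below shows one way to do it with Python's standard library. The helper `build_submission_zip` and the example file names are hypothetical; the 10 MB check mirrors the data-file limit stated above, and the demo uses placeholder files in a temporary folder.

```python
import tempfile
import zipfile
from pathlib import Path

MAX_DATA_BYTES = 10 * 1024 * 1024  # 10 MB limit for an included data file


def build_submission_zip(project_dir, zip_name, code_files, readme, data_file=None):
    """Bundle code, README, and (if small enough) the data file into one ZIP."""
    members = [*code_files, readme]
    if data_file is not None:
        size = (project_dir / data_file).stat().st_size
        if size < MAX_DATA_BYTES:
            members.append(data_file)
        else:
            print(f"{data_file} is too large; describe how to get it in the README.")
    zip_path = project_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in members:
            # arcname keeps paths inside the ZIP relative to the project folder.
            zf.write(project_dir / name, arcname=name)
    return zip_path


# Demo with placeholder files in a temporary folder.
demo = Path(tempfile.mkdtemp())
for name in ["Smith_Jane_Project.py", "README.md", "california_housing.csv"]:
    (demo / name).write_text("placeholder\n")

zip_path = build_submission_zip(
    demo,
    "Smith_Jane_Code.zip",
    code_files=["Smith_Jane_Project.py"],
    readme="README.md",
    data_file="california_housing.csv",
)
with zipfile.ZipFile(zip_path) as zf:
    contents = zf.namelist()
print(contents)
```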
Option B: For Tableau Users - Process Documentation
Since Tableau is point-and-click rather than code-based, submit process documentation that explains how you created your visualizations.
Required: Process Documentation (2-3 pages)
Create a document that describes your Tableau workflow:
1. Overview (0.5 page)
- Tableau version used
- Data source and how it was prepared/imported
- Overall approach to building your dashboard/visualizations
2. Step-by-Step Process for Each Visualization (1-2 pages)
For each of your 5-7 final visualizations, document:
Basic Setup:
- Which worksheet contains this visualization
- Data source and fields used
- Chart type selected (from Show Me or custom)
Key Steps Taken:
- Dimensions and measures placed on shelves (Rows, Columns, Color, Size, etc.)
- Any calculated fields created (show formula)
- Any parameters or sets used (explain purpose)
- Filters applied
- Sorting or grouping applied
Design Choices:
- Color palette selected and why
- Format settings adjusted (axis, labels, tooltips)
- Any custom annotations or reference lines added
- Layout decisions for dashboards
Example Documentation for One Visualization:
Visualization 3: Housing Price by Neighborhood (Map)
Worksheet: "Price_Map"
Chart Type: Symbol Map (filled map)
Steps:
1. Drag "Neighborhood" to Detail
2. Drag "Latitude" to Rows, "Longitude" to Columns
3. Drag "Avg Price" to Color
4. Changed color palette to Orange-Blue Diverging (colorblind safe)
5. Edited color scale: set midpoint to $500,000
6. Added "Neighborhood" to Label
7. Formatted tooltips to show: Neighborhood, Avg Price, Number of Sales
8. Created calculated field for price category:
IF [Price] < 400000 THEN "Affordable"
ELSEIF [Price] < 800000 THEN "Moderate"
ELSE "Expensive"
END
9. Added filter for year (2020-2024)
10. Adjusted map style to "Light" for clarity
3. Dashboard Assembly (if applicable) (0.5 page)
If you created dashboards:
- How did you arrange visualizations?
- What actions or interactions did you add?
- How did you ensure consistent styling?
- Any layout considerations for different screen sizes?
4. Iterations and Refinements (0.5 page)
- What changed from your exploratory work to final visualizations?
- What feedback did you incorporate from your presentation?
- What design challenges did you solve?
- Any features you tried but decided not to use?
Submission Format for Tableau
Submit TWO files:
1. Tableau Workbook:
- .twbx (Tableau Packaged Workbook) file
- This should include your data and all visualizations
- Ensure all visualizations from your final report are in the workbook
- Clean up: delete exploratory worksheets you don’t need
- Filename: LastName_FirstName_Project.twbx
2. Process Documentation:
- PDF document describing your process
- Filename: LastName_FirstName_Process.pdf
Upload both files to Canvas with your final report.
Grading Rubric: Reproducible Documentation (50 points)
For R/Python Users - Code Rubric
| Criterion | Excellent (90-100%) | Good (80-89%) | Satisfactory (70-79%) | Needs Improvement (<70%) | Points |
|---|---|---|---|---|---|
| Code Runs | Code runs perfectly; reproduces all visualizations; no errors | Code runs; reproduces most visualizations; minor issues | Code runs with some effort; some errors; partial reproduction | Code doesn’t run or major errors | /20 |
| Documentation | Excellent README; clear comments; LLM use properly cited; easy to follow | Good README; adequate comments; LLM cited | Basic README; some comments; minimal documentation | Poor or missing documentation | /15 |
| Code Quality | Clean, well-organized, efficient code; professional style | Good organization; readable code; good style | Adequate organization; readable with effort | Poor organization; hard to read | /10 |
| Reproducibility | Perfect reproducibility; all dependencies listed; clear instructions | Good reproducibility; minor setup needed | Partial reproducibility; unclear steps | Not reproducible | /5 |
For Tableau Users - Process Documentation Rubric
| Criterion | Excellent (90-100%) | Good (80-89%) | Satisfactory (70-79%) | Needs Improvement (<70%) | Points |
|---|---|---|---|---|---|
| Completeness | All visualizations documented with detailed steps; nothing missing | Most visualizations well-documented; minor gaps | Some visualizations documented; several gaps | Incomplete documentation; major gaps | /20 |
| Clarity | Crystal clear explanations; anyone could recreate visualizations from description | Clear explanations; mostly replicable | Adequate explanations; some ambiguity | Unclear; difficult to follow | /15 |
| Technical Detail | Specific details on calculated fields, parameters, filters, formatting; shows mastery | Good technical detail; shows competence | Basic detail; some steps unclear | Lacking technical detail | /10 |
| Workbook Quality | Clean, organized workbook; all final visualizations present; well-labeled | Good organization; visualizations present; mostly labeled | Adequate organization; some confusion | Poor organization; missing elements | /5 |
Dataset Suggestions
Need help finding a dataset? Here are some excellent sources:
General Data Repositories
- data.gov: https://data.gov/ - US government data
- Data is Plural: https://www.data-is-plural.com/ - Weekly newsletter of interesting datasets
- Google Dataset Search: https://datasetsearch.research.google.com/
- FiveThirtyEight: https://data.fivethirtyeight.com/ - Data from news articles
Domain-Specific
Health & Medicine:
- CDC Data: https://data.cdc.gov/
- WHO Data: https://www.who.int/data
- HealthData.gov: https://healthdata.gov/
Climate & Environment:
- NOAA Climate Data: https://www.ncdc.noaa.gov/cdo-web/
- NASA Earth Data: https://earthdata.nasa.gov/
- Our World in Data: https://ourworldindata.org/
Social & Economic:
- World Bank: https://data.worldbank.org/
- US Census Bureau: https://data.census.gov/
- Pew Research Center: https://www.pewresearch.org/
- Bureau of Labor Statistics: https://www.bls.gov/data/
Sports:
- FiveThirtyEight Sports: https://fivethirtyeight.com/
Arts & Culture:
- Metropolitan Museum of Art: https://github.com/metmuseum/openaccess
- Spotify API: https://developer.spotify.com/
- The Movie Database (TMDB): https://www.themoviedb.org/
Dataset Criteria
A good project dataset should:
✅ Have at least 100-200 observations (rows)
✅ Have multiple variables (at least 5-7) of different types
✅ Be interesting to you personally
✅ Be appropriate for visualization (not just suited for statistical modeling)
✅ Have good data quality (or interesting quality issues to address)
✅ Allow for 5-7 different meaningful visualizations
✅ Permit public sharing (no privacy or confidentiality issues)
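For R/Python users, the two size-related criteria above (number of rows, number and variety of variables) can be checked quickly before you commit to a dataset. The sketch below is only an illustration, not a required step: the function name check_dataset, the default thresholds (100 rows, 5 columns, taken from the lower ends of the ranges above), and the tiny sample data are all assumptions made for this example. It uses only Python's standard library and classifies each column roughly as "numeric" or "text".

```python
import csv
import io


def check_dataset(file_obj, min_rows=100, min_columns=5):
    """Report row count, column count, and rough column types for a CSV file.

    Hypothetical helper: the thresholds are assumed defaults, not official cutoffs.
    """
    reader = csv.DictReader(file_obj)
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    column_types = {}
    for name in columns:
        # Ignore missing or empty cells when guessing the type.
        values = [row[name] for row in rows if row[name] not in (None, "")]
        # A column counts as numeric only if every non-empty value parses as a number.
        if values and all(is_number(v) for v in values):
            column_types[name] = "numeric"
        else:
            column_types[name] = "text"

    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "column_types": column_types,
        "enough_rows": len(rows) >= min_rows,
        "enough_columns": len(columns) >= min_columns,
        "mixed_types": len(set(column_types.values())) > 1,
    }


if __name__ == "__main__":
    # Tiny made-up sample, only to show the shape of the report.
    sample = "city,year,population\nSanta Cruz,2020,62956\nWatsonville,2020,52590\n"
    report = check_dataset(io.StringIO(sample))
    for key, value in report.items():
        print(f"{key}: {value}")
```

To run it on your own file, open the file and pass the handle, for example `with open("my_data.csv", newline="") as f: report = check_dataset(f)` (the filename is a placeholder). The remaining criteria, such as personal interest, data quality, and permission to share, still need your own judgment.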
Tips for Success
General Advice
- Start early - good visualization requires iteration
- Choose data you care about - you’ll be working with it for weeks
- Sketch first - plan your visualizations on paper before coding
- Iterate - create version 1, get feedback, improve to version 2
- Test on others - show visualizations to friends/family; can they understand them?
- Follow the principles - review course material on design best practices
- Less is more - better to have 5 excellent visualizations than 7 mediocre ones
- Tell a story - your visualizations should connect and build on each other
Tool-Specific Tips
Tableau
- Learn keyboard shortcuts to work faster
- Use calculations for custom metrics
- Test dashboards on different screen sizes
- Save often and use version control (save as…)
R
- Use R Markdown or Quarto for integrated report creation
- Leverage ggplot2 extensions (ggthemes, patchwork, gghighlight)
- Create a consistent theme and reuse it
- Save intermediate data transformations
Python
- Use Jupyter notebooks for exploration
- Try multiple libraries (matplotlib, seaborn, plotly) and pick the best one for each visualization
- Create reusable plotting functions
- Use a virtual environment for package management
Getting Help
- Office hours: Best for discussing your specific dataset and questions
- Ed Discussion: Good for technical questions about software
- Peer feedback: Use classmates as sounding boards
- Online communities: Stack Overflow, RStudio Community, Tableau Forums
- LLMs (for R/Python): Use for code assistance, but you must understand the output
Frequently Asked Questions
Q: Can I change my dataset after the proposal?
A: Only with instructor permission and only before Week 7. Changes require resubmission of the proposal.
Q: Can I use multiple datasets?
A: Yes, if they’re related and you can tell a coherent story. Discuss with instructor.
Q: How many visualizations is too many?
A: 7 is the maximum we’ll review. Quality over quantity!
Q: What if I’m using Tableau? Do I need to submit code?
A: No! Tableau users submit process documentation (2-3 pages) explaining the steps taken to create each visualization, plus the .twbx workbook file.
Q: For Tableau, how detailed should the process documentation be?
A: Detailed enough that someone could recreate your visualizations. Include calculated fields (with formulas), parameters, filters, color choices, and any special formatting.
Q: Can I include interactive visualizations?
A: For Tableau/Plotly, yes, but also submit static versions. Final report PDF must include static images.
Q: What if my data has privacy concerns?
A: Either anonymize the data, aggregate it, or choose a different dataset.
Q: Can I use visualization types not covered in class?
A: Yes, but be prepared to justify why they’re appropriate.
Q: How much can I use LLMs?
A: As much as you want for R/Python code, but you MUST understand every line. See Academic Integrity policy.
Q: What if I can’t make it to the presentation day?
A: Contact the instructor immediately. No-shows receive 0 without prior approval.
Q: Can I include statistical analysis in my project?
A: Light analysis is fine (means, correlations), but focus should be on visualization, not complex statistics.
Q: Do all visualizations need to be static?
A: Final report must have static images, but you can include interactive versions as supplementary materials.
This project is your opportunity to showcase everything you’ve learned about data visualization. Take pride in your work, be creative, and most importantly - tell a compelling story with your data!
Good luck! 🎨📊
