# LOAD ANY RELEVANT PACKAGES HERE
Lab 4: Data Visualization
Remember to follow the instructions below and use R Markdown to create a PDF document containing your code and your answers to the following questions, then submit it on Gradescope. You may find a template file by clicking “Code” in the top right corner of this page.
Collaborators
INSERT NAMES OF ANY COLLABORATORS
A. Basic visualizations
For this portion, we’ll be using the palmerpenguins data. Use the following code to load the data.
library(palmerpenguins)
data(penguins)
Create and interpret a histogram of bill_length_mm using base R code. Be sure to use meaningful axis labels and titles.
Create and interpret a histogram of bill_length_mm using ggplot2. Be sure to use meaningful axis labels and titles.
Create and interpret a scatterplot of bill_length_mm versus bill_depth_mm using base R code. Be sure to use meaningful axis labels and titles.
Create and interpret a scatterplot of bill_length_mm versus bill_depth_mm using ggplot2. Be sure to use meaningful axis labels and titles.
Update your ggplot2 scatterplot of bill_length_mm versus bill_depth_mm so that the color of a point represents the corresponding penguin’s species. What do you notice?
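As a reference for the syntax these questions rely on, here is a minimal sketch of the same kinds of plots in base R and in ggplot2. It deliberately uses the built-in iris data as a stand-in rather than the penguins data, so the variable names, titles, and the choice of 20 bins are assumptions made for illustration and are not part of the lab.

```r
library(ggplot2)

# Base R histogram: main, xlab, and ylab set the title and axis labels.
h <- hist(
  iris$Sepal.Length,
  main = "Distribution of sepal length",
  xlab = "Sepal length (cm)",
  ylab = "Number of flowers"
)

# Base R scatterplot: pass the x and y vectors directly.
plot(
  iris$Sepal.Length, iris$Sepal.Width,
  main = "Sepal width versus sepal length",
  xlab = "Sepal length (cm)",
  ylab = "Sepal width (cm)"
)

# ggplot2 histogram: map the variable to x, then add a histogram layer.
p_hist <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20) +
  labs(
    title = "Distribution of sepal length",
    x = "Sepal length (cm)",
    y = "Number of flowers"
  )

# ggplot2 scatterplot: mapping color to a categorical variable
# draws each group in its own color and adds a legend.
p_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(
    title = "Sepal width versus sepal length",
    x = "Sepal length (cm)",
    y = "Sepal width (cm)",
    color = "Species"
  )

print(p_hist)
print(p_scatter)
```

Note that base R draws a plot as soon as hist() or plot() is called, while a ggplot2 plot is an object that is drawn when it is printed.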
B. Analyzing trends in San Jose rental prices
For this component, you will be exploring and visualizing data on Craigslist apartment rental postings in the Bay Area. The data are available here from Tidy Tuesday, as prepared by Dr. Kate Pennington. Note that you can pass a URL directly to read_csv() to read an online .csv file. I recommend saving a copy of the unprocessed .csv on your machine in a data subfolder within your project folder so you will be able to work offline.
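One way to follow the offline recommendation is to download the file once and read the local copy afterwards. The sketch below assumes the readr package is installed. The helper name read_cached_csv, the file name data/rent.csv, and the URL in the commented example are all hypothetical placeholders, not the actual location of the data.

```r
library(readr)

# Hypothetical helper: download the file only if no local copy exists,
# then always read from the local copy.
read_cached_csv <- function(url, path) {
  if (!file.exists(path)) {
    # Create the data subfolder if it is missing.
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  }
  read_csv(path, show_col_types = FALSE)
}

# Example call (placeholder URL; replace it with the actual link to the data):
# rent <- read_cached_csv("https://example.com/rent.csv", "data/rent.csv")
```

After the first successful run, the call works without an internet connection because the URL is only used when the local file is missing.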
How many 1 bedroom listings from Santa Clara county are in this dataset?
What is the median price for a 1 bedroom listing in Santa Clara county in 2018?
Which county has the highest median price for a 1 bedroom listing in 2018?
Create two histograms for the prices of 1 bedroom listings in Santa Clara county in 2005 and 2018. Compare and discuss.
Create and interpret a line plot with year on the x-axis and median price for a 1 bedroom apartment for Santa Clara county on the y-axis from 2000 to 2018.
Create and interpret a single plot with year on the x-axis and median price for a 1 bedroom apartment on the y-axis, using a separate line for each city in Santa Clara county, for the years 2000 to 2018.
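The questions above follow a common pattern: filter to the rows of interest, summarize within groups, then plot the summary. Here is a minimal sketch of that pattern using dplyr and ggplot2 on a small made-up data frame. The column names (year, county, beds, price) and every value are assumptions for illustration, so check the actual column names in the rental data before adapting it.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the listings; all values are made up.
listings <- data.frame(
  year   = c(2017, 2017, 2017, 2018, 2018, 2018, 2018),
  county = c("A", "A", "B", "A", "A", "A", "B"),
  beds   = c(1, 1, 1, 1, 1, 2, 1),
  price  = c(1000, 1200, 900, 1500, 1700, 2500, 1100)
)

# Keep 1-bedroom listings in county "A", then count the listings
# and compute the median price within each year.
median_by_year <- listings %>%
  filter(beds == 1, county == "A") %>%
  group_by(year) %>%
  summarise(n_listings = n(), median_price = median(price))

# Line plot of the yearly medians.
p_trend <- ggplot(median_by_year, aes(x = year, y = median_price)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Median 1-bedroom price by year (toy data)",
    x = "Year",
    y = "Median price (USD)"
  )

print(p_trend)
```

To draw a separate line per group, one option is to add the grouping variable to group_by() and map it to color inside aes().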
C. Open ended data visualization
For this part, choose a dataset that interests you and identify a set of questions that you would like to explore via data visualizations. In particular, you should create three visualizations that satisfy the following requirements.
Instructions
- Identify three research questions of interest that you want to study using this dataset.
- For each of your three research questions, generate a data visualization using your dataset. Discuss and interpret your findings.
- Your project should include at least two different types of visualizations (e.g., scatterplots, box plots, bar plots, histograms, or line plots).
- At least one of your plots should display variation over time or location (or both) in some way.
- Each visualization should include a caption that fully explains how to understand your visualization (i.e. explain all the components, legends, etc.). A good guideline is that someone who has not read your report should be able to look at just a visualization and its caption and fully understand what that visualization is showing.
- Each visualization must be accompanied by at least one paragraph of text. This text should include an interpretation of your visualization as well as what is interesting about your visualization. A strong visualization will be accompanied by text explaining what patterns or insights it helps us glean from the data.
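As a rough illustration of the caption requirement, the sketch below attaches a caption to a plot that shows variation over time. It uses the economics data set that ships with ggplot2 purely as an example, and the caption wording is an assumption about what a self-explanatory caption might look like.

```r
library(ggplot2)

# A caption that explains every component of the plot.
caption_text <- paste(
  "Each point on the line is one monthly observation from the economics",
  "data set that ships with ggplot2. The x-axis shows the date and the",
  "y-axis shows the number of unemployed people in thousands.",
  sep = "\n"
)

p_time <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(
    title = "Unemployment over time",
    x = "Date",
    y = "Unemployed (thousands)",
    caption = caption_text
  )

print(p_time)
```

In R Markdown, the fig.cap chunk option is another way to attach a caption to a figure.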