<- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-12-20/weather_forecasts.csv') weather_forecasts
Lab 3: Descriptive Statistics
Follow the instructions below and use R Markdown to create a pdf document with your code and answers to the following questions on Gradescope. You may find a template file by clicking “Code” in the top right corner of this page.
Your final submission should clearly include all code needed to generate your answers and should be formatted according to the guidelines outlined in class. In particular, make sure:
- Code and output are clearly organized by question.
- Unnecessary messages, warning, and output are removed.
You may collaborate with your classmates and consult external resources, but you should write and submit your own answer. Any classmates with whom you collaborate should be credited at the top of your submission. Similarly, if you consult any external references, you should cite them clearly and explicitly.
A. Weather Forecast Data
- For this lab, we’ll be using data on weather forecasts gathered by student at Saint Louis University. You can read about the dataset here. Download the weather forecasts data using the following code:
- How many rows are in this dataset? How many columns?
# YOUR CODE HERE
- How many cities are represented in this dataset?
# YOUR CODE HERE
- Create a new data frame containing only the forecasts for San Jose. You may have to explore the values for the
city
variable.
# YOUR CODE HERE
- Compute the mean absolute error between
observed_temp
andforecast_temp
for San Jose.
# YOUR CODE HERE
- Compute the mean absolute error between
observed_temp
andforecast_temp
for San Jose using only forecasts made 48 hours in advance.
# YOUR CODE HERE
- Compute the mean absolute error between
observed_temp
andforecast_temp
for San Jose using only forecasts made 12 hours in advance.
# YOUR CODE HERE
- Compare your answers to 6 and 7. What do you notice? How does this compare to your expectation?
- Pick two cities in this dataset. Investigate whether the forecast accuracy is better for one city than for the other, using an appropriate statistic. Discuss your findings.
B. Find your own data
For this component, pick a Tidy Tuesday dataset and complete the following activity.
- Provide a brief description of your dataset. Identify at least two questions you could try to answer using this dataset.
- Open your dataset in R and compute one or more descriptive statistics that shed light on your questions. Discuss your findings.
- Are there any limitations of your analysis? Could additional data or more complicated methods improve your analysis? Discuss.