MATH167R: Overview

Peter Gao

Course information

Instructor Peter Gao
Lectures MW 12-1:15pm (Section 01) and 1:30-2:45pm (Section 02) in MH235.
Office Hours MW 10:30-11:30am in MH311 or email for an appointment
Email

peter.gao [at sjsu]

Feel free to send me a reminder after 48 hours have passed. Please include [MATH 167R] in your subject line.

Learning objectives

Upon successful completion of this course, students will be able to:

  1. Understand the structures of R objects
  2. Import data from a variety of sources.
  3. Save data in formats that can be used by other programs.
  4. Create publication quality graphs.
  5. Download and install packages.
  6. Create reusable functions.
  7. Perform statistical analysis on R.

Learning objectives

In this class we will discuss some basic computer science concepts, but we will emphasize tools for data analysis.

Advances in computation have enabled advances at every step of the data analysis pipeline:

  • Data collection, storage, and sharing

  • Exploratory data analysis and visualization

  • Statistical inference and prediction

  • Simulation 

  • Communication and distribution of results

Where to look

Course website: Course slides, assignment instructions.

Canvas: Official syllabus, receiving grades, data.

Gradescope: Submitting assignments.

If you ever have questions about accessing materials, please contact me. If you need any kind of accommodations, please let me know as soon as possible.

What you need

Access to a computer with R and RStudio: the computer lab in MacQuarrie Hall 221 contains computers with all of the software that will be used during the semester.

All of the coursework may be completed on a personal computer and the software is freely available to students.

Course structure

Your final grade will be calculated as follows:

  • 10%: Check-ins

  • 40%: Labs

  • 20%: Final exam

  • 30%: Class project

Assignments

  • Check-ins: Throughout the semester, you will be assigned short check-in assignments. These are designed to be completed during lab or shortly after and will be due at the start of the next class. At the end of the quarter, your lowest check-in grade will be dropped. They will be graded on the following two point scale:

    • 0: indicates incomplete or unacceptable work

    • 1: effort towards completing at least 75% of the assignment

    • 2: effort towards completing the entire assignment.

Assignments

  • Labs: Throughout the semester, you will be assigned Labs. These are extended, more complicated assignments that you will likely not be able to complete during class. You will typically have two weeks for a lab assignment. At the end of the quarter, your lowest lab grade will be dropped.

Assignments

  • Final Exam: There will be one final exam during finals week. Practice questions will be provided in advance of the exam.

  • Class Project: During the semester, you will complete a class project that requires you to apply the data manipulation, visualization, and analysis skills covered in this course to a real-world dataset of your choice.

Late policy

In general, the late policy is as follows:

  • Any assignment that is received late but less than 24 hours late will receive a grade penalty of 25%.

  • Any assignment that is received 24–48 hours late will receive a grade penalty of 50%.

  • Assignments will not be accepted more than 48 hours late.

Late policy

Extensions for academic purposes (ex. conference presentation or job interview) or extreme circumstances (ex. illness, emergency) will generally be granted. If you would like to request an extension, please do your best to email me at least 24 hours before an assignment is due, along with the reason. 

Collaboration

  • You may discuss problems, approaches, and solutions with your classmates.

  • You must credit anyone with whom you worked on each assignment.

  • All submitted work must be your own; you should not submit code or answers copied from any resource including your classmates.

  • If you have any questions, ask!

ChatGPT (and other tools)…

Students may use external online resources including discussion forums (ex. StackOverflow) large language model-based chatbots (ex. ChatGPT) as aids for learning and understanding course material.

However, submitting code or answers for course assignments obtained using resources like StackOverflow or ChatGPT is not permitted.

When external resources are consulted for an assignment, they must be cited clearly at the top of your submission.

Discussion

Canvas discussion forum: post questions about assignments and answer questions from other students.

Posts may not include substantial amounts of code that can be used for a solution to any problem, but may include code snippets.

  • Worth up to 2% extra credit on your final grade!

Discussion

Bad questions:

  • How do you do problem 2?

  • Here’s my code and it’s broken. How do I fix it?

Good questions:

  • Here’s a snippet of code I used for problem 2: 
    code snippet 
    It returned the following error: 
    error message 
    Does anyone know why? I already tried...

  • I don’t understand the concept from Slide 18 today. Could anyone elaborate on why...

Office Hours (aka drop-in hours)

In-person MW 10:30-11:30am in MH311 or email for an appointment
Zoom By appointment

Come by:

  • for a snack
  • to ask questions or work with other students
  • just to chat!

My advice: make it a habit to drop by office hours, starting early in the semester when schedules are less busy.

Lecture slides

Slides will usually be posted on the course website before class. I encourage you to return to the slides after class and make sure you understand the code used and the concepts covered.

Generally speaking, code will be contained in blocks that look like this:

# example comment
example line of code

Getting to know each other

Form groups of 4-5 and discuss the following:

  1. Introduce yourself (names, major/program)

  2. What are you excited/nervous/confused about with regards to this course? What questions do you have?

  3. Have you ever used R? Programmed?

  4. What is one area of interest you would like to use statistics/data science to study?

Introduction to R

What is R?

R is a programming language designed for statistical analysis.

  • open-source

  • free

  • large and active community of developers and users

  • great analysis tools

  • great visualization tools

Why R?

In this class we will cover programming through the use of the R language, emphasizing statistical computing skills.

Advances in computation have enabled advances at every step of the data analysis pipeline:

  • Data collection, storage, and sharing
  • Exploratory data analysis and visualization
  • Statistical inference and prediction
  • Communication and distribution of results

R is a programming language tailored to these tasks.

R: A fancy calculator

# Addition
6 + 3
[1] 9
# Subtraction
6 - 3
[1] 3
# Multiplication
6 * 3
[1] 18
# Division
6 / 3
[1] 2

Here, the [1] indicates the first output value. We only have one output value here, but in the future we will have more.

R: A fancy calculator

# Exponentiation
3 ^ 20
[1] 3486784401

R can usually handle bigger numbers than your handheld calculator, but even R has limits:

# Bigger exponentiation
3 ^ 1000
[1] Inf

Assignment

R can also store values (like numbers) as objects, which can be referenced later:

x <- 7
print(x)
[1] 7
y <- 3
print(x + y)
[1] 10

A tricky calculator

What do you expect the output to be?

x <- 7
y <- x + 1
print(y)
[1] 8

What about this code?

x <- 7
y <- x + 1
x <- 0
print(y)
[1] 8

A tricky calculator

When we assign y <- x + 1, we essentially create a copy of x. As a result, when we reassign x <- 0, y is not affected.

This isn’t necessarily “right” or “wrong”– this is just how R works! R is full of these little subtleties that you need to be able to grasp in order to write functional code.

What is RStudio?

RStudio is an integrated development environment (IDE) designed to make your life easier.

  • Organizes scripts, files, plots, code console, ...

  • Highlights syntax

  • Helpful interactive graphical interface

  • Will make an efficient, reproducible workflow much easier

A tour through RStudio

By default…

  • Top left: Editor pane. Browse and edit scripts and data with tabs
  • Top right: List of objects in your Environment (recall ls()), code History
  • Bottom left: Console for running R code line-by-line (> prompt)
  • Bottom right: Files, plots, packages, help files

More than a fancy calculator

Of course, we wouldn’t have a whole course on R if this was all it could do.

Data analysis

library(palmerpenguins)
data(penguins)
lm(body_mass_g ~ flipper_length_mm, data = penguins)

Call:
lm(formula = body_mass_g ~ flipper_length_mm, data = penguins)

Coefficients:
      (Intercept)  flipper_length_mm  
         -5780.83              49.69  

Visualization

library(ggplot2)
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + 
  geom_point() + geom_smooth(method = "lm")

Communication

R Markdown files (see examples here)

  • Combine code, output, and writing
  • Self-contained analyses
  • Creates HTML, PDF, slides (like these!), webpages, …

Installing R and RStudio

  1. Download and install R from this link

  2. Download and install RStudio Desktop from this link.