Instructor | Peter Gao |
---|---|
Lectures | MW 12-1:15pm (Section 01) and 1:30-2:45pm (Section 02) in MH235. |
Office Hours | MW 10:30-11:30am in MH311 or email for an appointment |
peter.gao [at sjsu] Feel free to send me a reminder after 48 hours have passed. Please include [MATH 167R] in your subject line. |
Upon successful completion of this course, students will be able to:
In this class we will discuss some basic computer science concepts, but we will emphasize tools for data analysis.
Advances in computation have enabled advances at every step of the data analysis pipeline:
Data collection, storage, and sharing
Exploratory data analysis and visualization
Statistical inference and prediction
Simulation
Communication and distribution of results
Course website: Course slides, assignment instructions.
Canvas: Official syllabus, receiving grades, data.
Gradescope: Submitting assignments.
If you ever have questions about accessing materials, please contact me. If you need any kind of accommodations, please let me know as soon as possible.
Access to a computer with R and RStudio: the computer lab in MacQuarrie Hall 221 contains computers with all of the software that will be used during the semester.
All of the coursework may be completed on a personal computer and the software is freely available to students.
Your final grade will be calculated as follows:
10%: Check-ins
40%: Labs
20%: Final exam
30%: Class project
Check-ins: Throughout the semester, you will be assigned short check-in assignments. These are designed to be completed during lab or shortly after and will be due at the start of the next class. At the end of the quarter, your lowest check-in grade will be dropped. They will be graded on the following two point scale:
0: indicates incomplete or unacceptable work
1: effort towards completing at least 75% of the assignment
2: effort towards completing the entire assignment.
Final Exam: There will be one final exam during finals week. Practice questions will be provided in advance of the exam.
Class Project: During the semester, you will complete a class project that requires you to apply the data manipulation, visualization, and analysis skills covered in this course to a real-world dataset of your choice.
In general, the late policy is as follows:
Any assignment that is received late but less than 24 hours late will receive a grade penalty of 25%.
Any assignment that is received 24–48 hours late will receive a grade penalty of 50%.
Assignments will not be accepted more than 48 hours late.
Extensions for academic purposes (ex. conference presentation or job interview) or extreme circumstances (ex. illness, emergency) will generally be granted. If you would like to request an extension, please do your best to email me at least 24 hours before an assignment is due, along with the reason.
You may discuss problems, approaches, and solutions with your classmates.
You must credit anyone with whom you worked on each assignment.
All submitted work must be your own; you should not submit code or answers copied from any resource including your classmates.
If you have any questions, ask!
Students may use external online resources including discussion forums (ex. StackOverflow) large language model-based chatbots (ex. ChatGPT) as aids for learning and understanding course material.
However, submitting code or answers for course assignments obtained using resources like StackOverflow or ChatGPT is not permitted.
When external resources are consulted for an assignment, they must be cited clearly at the top of your submission.
Canvas discussion forum: post questions about assignments and answer questions from other students.
Posts may not include substantial amounts of code that can be used for a solution to any problem, but may include code snippets.
Bad questions:
How do you do problem 2?
Here’s my code and it’s broken. How do I fix it?
Good questions:
Here’s a snippet of code I used for problem 2:
code snippet
It returned the following error:
error message
Does anyone know why? I already tried...
I don’t understand the concept from Slide 18 today. Could anyone elaborate on why...
In-person | MW 10:30-11:30am in MH311 or email for an appointment |
Zoom | By appointment |
Come by:
My advice: make it a habit to drop by office hours, starting early in the semester when schedules are less busy.
Slides will usually be posted on the course website before class. I encourage you to return to the slides after class and make sure you understand the code used and the concepts covered.
Generally speaking, code will be contained in blocks that look like this:
Form groups of 4-5 and discuss the following:
Introduce yourself (names, major/program)
What are you excited/nervous/confused about with regards to this course? What questions do you have?
Have you ever used R? Programmed?
What is one area of interest you would like to use statistics/data science to study?
R is a programming language designed for statistical analysis.
open-source
free
large and active community of developers and users
great analysis tools
great visualization tools
In this class we will cover programming through the use of the R language, emphasizing statistical computing skills.
Advances in computation have enabled advances at every step of the data analysis pipeline:
R is a programming language tailored to these tasks.
R can usually handle bigger numbers than your handheld calculator, but even R has limits:
R can also store values (like numbers) as objects, which can be referenced later:
What do you expect the output to be?
[1] 8
[1] 8
When we assign y <- x + 1
, we essentially create a copy of x
. As a result, when we reassign x <- 0
, y
is not affected.
This isn’t necessarily “right” or “wrong”– this is just how R works! R is full of these little subtleties that you need to be able to grasp in order to write functional code.
RStudio is an integrated development environment (IDE) designed to make your life easier.
Organizes scripts, files, plots, code console, ...
Highlights syntax
Helpful interactive graphical interface
Will make an efficient, reproducible workflow much easier
By default…
ls()
), code History>
prompt)Of course, we wouldn’t have a whole course on R if this was all it could do.
R Markdown files (see examples here)