This course meets on Tuesdays and Thursdays in Sequoia Hall, Room 200.
The course meets only during the first 5 weeks of the quarter. The first meeting is on March 29, 2022, and the final meeting is on April 28, 2022.
Instructor: Harrison Li. I am a second-year PhD student in Statistics at Stanford. My e-mail is hli90722@stanford.edu.
Office hours: Tuesdays 9:00 am - 11:00 am, March 29 - April 26, 2022.
The course aims to provide a fast-paced introduction to R. No prior experience in programming or statistics is assumed.
As a statistics course, the focus is not just on writing code but also on how to leverage R effectively to do good data science. This means that you will be challenged to apply the basic ideas covered in class to solve real-world data problems, in a way that may require substantial critical thinking and persistence. The hope is that this will help you deepen your understanding of how to think about data.
Here is a broad outline of the topics we will cover: R fundamentals, data manipulation, data visualization, and data modeling.
More details can be found in the detailed course schedule below.
Grades in this course are determined solely by weekly problem sets. There will be 4 equally weighted problem sets, each released on a Thursday and due the following Thursday at 1:30 pm; see the detailed course schedule below. Because of the fast pace of the course, the late policy is strict: homework submitted up to 24 hours late will lose 20% of the maximum possible credit, and homework submitted more than 24 hours after the due date will not be accepted.
All problem subparts are graded on a 0-1-2 scale, and are equally weighted within each assignment. Some will be much more straightforward than others.
Please submit each assignment as a single PDF file knitted from R Markdown. Assignments submitted in any other format will not be accepted.
Because the problem sets are the only graded work in the course, they will not necessarily be easy. Please do not leave them until the morning of the due date.
You are encouraged to discuss the problems with other students in the class, but every individual student must submit their own code.
You may consult Internet sources while working on the homework, since anyone programming in the real world constantly looks things up online.
However, all code you write must be your own original work, and you must cite any sources you use that are outside the scope of the class. You may not use Internet forums to ask for answers to the assignments. Failure to adhere to these guidelines may constitute a violation of the Honor Code and may lead to disciplinary action.
All problem set questions can be solved using the tools presented in lecture, though as discussed above, some may require some creative thinking in terms of how to apply them.
You are highly encouraged to try all the problems on your own before asking for help, since you will learn best by figuring things out yourself. Do not be discouraged if you cannot solve a question immediately; that is part of the learning process. That said, do not hesitate to reach out to classmates or to me for help.
We will use Gradescope for assignment submission, and Canvas for course materials.
Each class meeting will have 2 components: a lecture portion, followed by a more interactive “lab” portion in which no new content is presented. Instead, we will review the concepts from lecture together using one or more realistic data examples.
You are required to bring a laptop computer to class so that you can follow along interactively with the labs. Attendance is expected.
Unit | Lecture | Date | Topics | Assignments |
---|---|---|---|---|
Fundamentals | 1 | 3/29/2022 | Introduction to R and RStudio. Basic data structures. | |
Fundamentals | 2 | 3/31/2022 | Data frames, functions, packages, and the tidyverse. | HW 1 released |
Data manipulation | 3 | 4/5/2022 | File I/O, paths, logical and comparison operators. Introduction to dplyr. | |
Data manipulation | 4 | 4/7/2022 | Summary statistics, grouping, joins. Categorical and quantitative variables. | HW 1 due; HW 2 released |
Data visualization | 5 | 4/12/2022 | Introduction to ggplot2. Univariate visualizations. | |
Data visualization | 6 | 4/14/2022 | More ggplot2. Multivariate visualizations. | HW 2 due; HW 3 released |
Data modeling | 7 | 4/19/2022 | Simple linear regression: Correlation, prediction, and graphical diagnostics. | |
Data modeling | 8 | 4/21/2022 | Multiple linear regression. Polynomial regression and locally weighted regression. | HW 3 due; HW 4 released |
Data modeling | 9 | 4/26/2022 | Hypothesis testing. Parametric and non-parametric tests. | |
Data modeling | 10 | 4/28/2022 | Model evaluation and selection. Hypothesis testing for nested linear models. Cross-validation. Prediction vs. inference. | HW 4 due |
Stanford is committed to providing equal educational opportunities for disabled students. Disabled students are a valued and essential part of the Stanford community. We welcome you to our class.
If you experience disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu.
If you already have an Academic Accommodation Letter, we invite you to share your letter with us. Academic Accommodation Letters should be shared at the earliest possible opportunity so we may partner with you and OAE to identify any barriers to access and inclusion that might be encountered in your experience of this course.