Duke University


Broadly, it’s turning data into knowledge using the computer. Along the way we will:
- Tidy messy data sets
- Wrangle the written word with regular expressions
- Interact with databases
- Optimize code in R
- Model data with complicated likelihood functions and then write algorithms to maximize the likelihood
- Build Shiny web apps
- and more!

By the end of this course you will be able to…
- write efficient R code to (1) wrangle, explore, and analyze data and (2) program algorithms to make inference under a variety of data-generating models
- conduct independent data analysis and subsequently write up and present results effectively
| Assignment | Description |
|---|---|
| Labs (30%) | Biweekly lab assignments. |
| Exams (50%) | Two hybrid (both an in-class and take-home component) exams. |
| Final Project (15%) | Written report and presentation. |
| Quizzes (5%) | In-class pop quizzes. |
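The weights above combine into a final percentage as a weighted average. A minimal sketch, assuming every component is scored on a 0 to 100 scale; the scores are hypothetical, and any rounding or curving the course may apply is not described here.

```r
# Component weights from the grading table (they sum to 1)
weights <- c(labs = 0.30, exams = 0.50, project = 0.15, quizzes = 0.05)

# Hypothetical component averages, each on a 0-100 scale
scores <- c(labs = 92, exams = 85, project = 90, quizzes = 80)

# Weighted average: multiply element by element (matched by name), then sum
final <- sum(weights * scores[names(weights)])
final
# [1] 87.6
```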
Uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.
Any violation of the academic honesty standards outlined in the Duke Community Standard, or of those specific to this course, will automatically result in a 0 for the assignment and will be reported to the Office of Student Conduct for further action.
The final project and several labs will be completed in teams. All group members are expected to participate equally. Commit history may be used to give individual team members different grades, so your grade may differ from those of the rest of your group.
The use of online resources (including generative AI, as well as static webpages like Stack Overflow, etc.) is strictly prohibited on in-class quizzes and exams. For take-home assignments, you may make use of online resources for the coding portions of assignments. If you directly use code from a source (or use it as inspiration), you must explicitly cite where you obtained the code. If you used generative AI to create the code, you should include your prompt(s) in your citation as well.
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source.
Narrative (non-code) solutions should always be entirely your own.
Warning
Extensive use of AI on take-home assessments will likely set you up for poor performance on graded in-class assignments.
Labs can be turned in up to 48 hours after the deadline with a grade penalty (5% off per day).
Exams and the final project cannot be turned in late and can only be excused under exceptional circumstances.
The Duke policy for illness requires a short-term illness report or a letter from the Dean; except in emergencies, all other absenteeism must be approved in advance (e.g., an athlete who must miss class may be excused by prior arrangement for specific days). For emergencies, email notification is needed at the first reasonable time.
Extensions will not be granted for last-minute coding/rendering issues.
| Resource | Description |
|---|---|
| course website | course notes, deadlines, assignments, office hours, syllabus |
| Canvas | class recordings, solutions, announcements, Ed Discussion |
| course organization | assignments, collaboration |
| RStudio containers* | online coding platform |
*You are welcome to install R and RStudio locally on your computer. If you work locally, make sure that your environment meets the following requirements:
- latest R version
- latest RStudio
- working git installation
- ability to create SSH keys (for GitHub authentication)
- all R packages updated to their latest versions from CRAN
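Several of these requirements can be checked from the R console. A rough sketch using only base R; it is not an official course script, it only reports what is installed, and it does not confirm that anything is the latest version.

```r
# Which version of R is running? Compare it with the current release on CRAN.
R.version.string

# Is git on the PATH? Sys.which() returns "" when a program is not found.
has_git <- nzchar(Sys.which("git"))
has_git

# Is ssh-keygen (used to create SSH keys) on the PATH?
has_ssh_keygen <- nzchar(Sys.which("ssh-keygen"))
has_ssh_keygen

# With internet access, this lists installed packages that have newer CRAN
# versions; update.packages(ask = FALSE) would then install the updates.
# old.packages(repos = "https://cloud.r-project.org")
```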
If you have questions about homework/lab exercises, debugging, or any question about course materials
Warning
The teaching team will not debug via email.
When you miss a class:
Post on Ed Discussion
Create a GitHub account (unless you already have one) at https://github.com/
Tell me your username by taking this survey. This is essential to receive credit on future assignments.
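The fundamental building block of data in R is a vector (a collection of related values, objects, other data structures, etc.).
R has two types of vectors: atomic vectors, in which every element has the same type, and generic vectors (lists), in which elements can be of any type, including other vectors.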
I will use the term component or element when referring to a value inside a vector.
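The difference between the two kinds of vectors (atomic vectors, whose elements share one type, and lists, whose elements can be anything) can be seen with base R's `typeof()`, `is.atomic()`, and `is.list()`. A short sketch with made-up values:

```r
# An atomic vector: every element has the same type
x <- c(1.5, 2, 3)
typeof(x)      # "double"
is.atomic(x)   # TRUE

# A generic vector (list): elements can differ in type and length
y <- list(1.5, "a", TRUE, c(1L, 2L))
typeof(y)      # "list"
is.atomic(y)   # FALSE
is.list(y)     # TRUE

# Components are accessed by position; [[ ]] extracts a single list element
x[2]           # 2
y[[2]]         # "a"
```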
R has six atomic vector types:
logical, integer, double, character, complex, raw
In this course we will mostly work with the first four; you will rarely work with the last two types, complex and raw.
If you try to combine components of different types into a single atomic vector, R will coerce all elements to the most flexible type among them, so that every element can be represented. The ordering is logical < integer < double < character, where logical is considered the “simplest” and character the most flexible.
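One literal of each atomic type, with `typeof()` reporting how R stores it. A small sketch; note that a bare number such as `1` is a double, and the `L` suffix is what makes an integer.

```r
# One example of each of the six atomic types
examples <- list(
  logical   = TRUE,
  integer   = 1L,          # the L suffix makes an integer literal
  double    = 1.5,
  character = "hello",
  complex   = 1 + 2i,
  raw       = as.raw(255)
)
sapply(examples, typeof)

typeof(1)                  # "double": numbers without L are doubles
```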
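The coercion rules can be checked directly with `typeof()`. A brief sketch combining values of different types with `c()`:

```r
# Mixing types in c() coerces everything to the most flexible type present
typeof(c(TRUE, 2L))        # "integer": TRUE becomes 1L
typeof(c(2L, 3.5))         # "double"
typeof(c(TRUE, 3.5, "a"))  # "character": every element becomes a string
c(TRUE, 3.5, "a")          # "TRUE" "3.5" "a"

# Coercion also happens in arithmetic: TRUE counts as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))  # 2
```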
| Operator | Definition | Vectorized? |
|---|---|---|
| x \| y | or | yes |
| x & y | and | yes |
| !x | not | yes |
| x \|\| y | or | no |
| x && y | and | no |
| xor(x,y) | exclusive or | yes |
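A sketch contrasting the vectorized logical operators with the scalar ones. The vectorized forms return one result per element; `||` and `&&` return a single `TRUE`/`FALSE` and short-circuit, which is why they are the ones to use in `if ()` conditions. In recent versions of R, passing a vector longer than one to `||` or `&&` is an error.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# Vectorized: element-by-element results
x | y       # TRUE  TRUE  TRUE FALSE
x & y       # TRUE FALSE FALSE FALSE
!x          # FALSE FALSE  TRUE  TRUE
xor(x, y)   # FALSE  TRUE  TRUE FALSE

# Scalar and short-circuiting: the right side is evaluated only if needed,
# so stop() below is never called
TRUE || stop("never evaluated")    # TRUE
FALSE && stop("never evaluated")   # FALSE
```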
| Operator | Definition | Vectorized? |
|---|---|---|
| x < y | less than | yes |
| x > y | greater than | yes |
| x <= y | less than or equal to | yes |
| x >= y | greater than or equal to | yes |
| x != y | not equal to | yes |
| x == y | equal to | yes |
| x %in% y | is x contained in y | yes (over x) |
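A short sketch of the comparison operators on made-up values. "Vectorized over x" for `%in%` means the result has one entry per element of the left-hand side, however long the right-hand side is. The last lines show a common pitfall: `==` compares doubles exactly, so `all.equal()` is safer for computed decimals.

```r
x <- c(1, 5, 10)

# Each element of x is compared with 5 (5 is recycled)
x < 5     # TRUE FALSE FALSE
x != 5    # TRUE FALSE  TRUE
x == 5    # FALSE  TRUE FALSE

# %in%: for each element on the left, does it appear anywhere on the right?
x %in% c(10, 20)   # FALSE FALSE  TRUE
c(10, 20) %in% x   # TRUE FALSE

# Floating-point caution: exact equality can fail for computed decimals
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```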
The shorter of two atomic vectors in an operation is recycled until it is the same length as the longer atomic vector.
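Recycling in practice, as a small sketch: a length-1 vector is repeated to match, a length-2 vector is repeated as a pattern, and when the longer length is not a multiple of the shorter one R still recycles but issues a warning.

```r
# Length 1 is recycled to length 4
c(1, 2, 3, 4) * 10            # 10 20 30 40

# Length 2 is recycled as 100, 200, 100, 200
c(1, 2, 3, 4) + c(100, 200)   # 101 202 103 204

# 3 is not a multiple of 2: the result is computed, with a warning
c(1, 2, 3) + c(10, 20)        # 11 22 13
```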
What does each of the following return? Run the code to check your answer.
Exercise 1.
Exercise 2.
Exercise 3.