Discuss the following lines of code. What do they do?
Vectorization: applying a function repeatedly to every entry in a vector/array
Vectorization allows us to quickly carry out computations for every individual in a dataset.
Note that R recycles, repeating elements of shorter vectors to match longer vectors. This is incredibly useful when done on purpose, but can also easily lead to hard-to-catch bugs in your code!
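As a minimal sketch of vectorization and recycling, the example below uses a made-up vector of heights (the name heights_cm and its values are assumptions for illustration, not part of the original material).

```r
# Hypothetical heights (in cm) for five individuals
heights_cm <- c(150, 160, 170, 180, 190)

# Vectorized arithmetic: the division is applied to every entry at once
heights_m <- heights_cm / 100

# Recycling on purpose: the length-1 vector 10 is repeated for every entry
shifted <- heights_cm + 10

# Recycling a length-2 vector across a length-4 vector: it is reused as 1, 2, 1, 2
recycled <- c(10, 20, 30, 40) + c(1, 2)

print(heights_m)
print(shifted)
print(recycled)  # 11 22 31 42
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is often the first sign of an accidental length mismatch.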
We can apply many functions component-wise to vectors, including comparison operators.
In code, entries that are TRUE or FALSE are called booleans (logicals in R). These are incredibly important, because they can be used to give your computer conditions. What will the following code do?
[1] 1 2 3 3 3
We can also do basic arithmetic with booleans. TRUE is encoded as 1 and FALSE is encoded as 0.
[1] 3
[1] 0.6
What is this last quantity telling us?
By taking the mean, we are looking at the proportion of our vector that is TRUE.
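A small sketch of this idea, using a hypothetical vector x (not necessarily the one used to produce the output above): a comparison returns a logical vector, and sum() and mean() then count TRUE values and compute their proportion.

```r
# Hypothetical data; chosen so that three of the five entries exceed 5
x <- c(2, 7, 4, 9, 6)

# Comparison operators work component-wise and return logicals
is_big <- x > 5
print(is_big)        # FALSE TRUE FALSE TRUE TRUE

# TRUE counts as 1 and FALSE as 0 in arithmetic
print(sum(is_big))   # 3: the number of entries greater than 5
print(mean(is_big))  # 0.6: the proportion of entries greater than 5
```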
We can also get more complicated with our indexing.
We can compare entire vectors using identical()
What do you think the function rev() is doing in the code above?
Hint: Use ?rev to read the help files for the function
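The following sketch illustrates identical() and rev() on hypothetical vectors (the names x, y, and palindrome are made up for this example and are not the original slide code).

```r
x <- c(1, 2, 3)
y <- c(1, 2, 3)

# == compares entry by entry; identical() returns a single TRUE or FALSE
print(x == y)           # TRUE TRUE TRUE
print(identical(x, y))  # TRUE

# rev() returns the entries in reverse order
print(rev(x))                # 3 2 1
print(identical(x, rev(x)))  # FALSE

# A vector that reads the same in both directions equals its own reverse
palindrome <- c(1, 2, 1)
print(identical(palindrome, rev(palindrome)))  # TRUE

# identical() is strict about type: 1:3 is integer, c(1, 2, 3) is double
print(identical(1:3, c(1, 2, 3)))  # FALSE
```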
Lists, like vectors and matrices, are a class of objects in R. Lists are special because they can store multiple different types of data.
my_list <- list("some_numbers" = 1:5,
                "some_characters" = c("a", "b", "c"),
                "a_matrix" = diag(2))
my_list$some_numbers
[1] 1 2 3 4 5
$some_characters
[1] "a" "b" "c"
$a_matrix
     [,1] [,2]
[1,]    1    0
[2,]    0    1
Make sure to name items within a list using the = operator (as when assigning arguments), not the assignment arrow <-. With <-, the item is stored in the list without a name.
There are three ways to access an item within a list
- [[]] with its name in quotes
- [[]] with its index as a number
- $ followed by its name without quotes
If you use a single bracket to index, like we do with matrices and vectors, you will return a list with a single element.
Note that this means you can only return a single item in a list using double brackets or the dollar sign! (Why?)
This is a subtle but important difference!
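The three access methods, and the single-bracket contrast, can be sketched with the my_list object defined earlier (the variable names by_name, by_index, by_dollar, and wrapped are assumptions for illustration).

```r
my_list <- list("some_numbers" = 1:5,
                "some_characters" = c("a", "b", "c"),
                "a_matrix" = diag(2))

# Three ways to pull out the item itself
by_name   <- my_list[["some_numbers"]]  # name in quotes
by_index  <- my_list[[1]]               # index as a number
by_dollar <- my_list$some_numbers       # name without quotes

# Single brackets return a list that contains the item
wrapped <- my_list["some_numbers"]

print(class(by_name))  # "integer"
print(class(wrapped))  # "list"
```

Because wrapped is still a list, vector functions such as sum() cannot be applied to it directly, while sum(by_name) works.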
You can subset a list similarly to vectors and matrices using single brackets.
We can use the same tools we used to access list elements to add to a list. However, if we use double brackets, we must use quotes, otherwise R will search for something that does not yet exist.
Call names() to get a vector of list item names.
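A short sketch of subsetting, adding to, and naming list items, reusing my_list from earlier. The new item names a_logical and a_word are hypothetical.

```r
my_list <- list("some_numbers" = 1:5,
                "some_characters" = c("a", "b", "c"),
                "a_matrix" = diag(2))

# Single brackets keep the result as a list, so they can return several items
first_two <- my_list[1:2]
print(names(first_two))

# Adding items: $ takes the name without quotes, [[ ]] needs quotes
my_list$a_logical <- TRUE
my_list[["a_word"]] <- "hello"
# my_list[[a_word]] <- "hello" would fail: R would look for a variable a_word

# names() returns a character vector of the item names
print(names(my_list))
```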
output within a list, we can always search for it, regardless of how the list was created or what else it contains
A data frame in R is essentially a special type of list, where each item is a vector of equal length. Typically, we say that data has \(n\) rows (one for each observation) and \(p\) columns (one for each variable).
Unlike in a matrix, the columns of a data frame can have different types. However, many column functions still apply (such as colSums() and summary())!
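To make the "special type of list" idea concrete, here is a minimal sketch with a made-up data frame (the name df and its columns are assumptions for illustration). Note that colSums() needs numeric columns, so it is applied to the numeric ones only.

```r
# Hypothetical data: n = 3 observations, p = 3 variables of different types
df <- data.frame(height_cm = c(150, 160, 170),
                 weight_kg = c(50, 60, 70),
                 name      = c("a", "b", "c"))

print(is.list(df))  # TRUE: a data frame is a list of columns
print(length(df))   # 3: one list item per column
print(dim(df))      # 3 3: rows, then columns

# Column functions still work on the numeric columns
totals <- colSums(df[, c("height_cm", "weight_kg")])
print(totals)
```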
There are plenty of free datasets available through R and its packages. If you haven’t already, run install.packages("palmerpenguins") in your console. Then, we can load the penguins dataset.
We can use the head function to look at the first several rows:
  species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1  Adelie Torgersen           39.1          18.7               181        3750
2  Adelie Torgersen           39.5          17.4               186        3800
3  Adelie Torgersen           40.3          18.0               195        3250
4  Adelie Torgersen             NA            NA                NA          NA
5  Adelie Torgersen           36.7          19.3               193        3450
6  Adelie Torgersen           39.3          20.6               190        3650
     sex year
1   male 2007
2 female 2007
3 female 2007
4   <NA> 2007
5 female 2007
6   male 2007
Using the $ operator, we can access individual columns.
We can then use any of our useful functions for vectors to summarize this column (e.g., max(), min(), mean(), median(), sum(), sd(), var(), length()).
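As a sketch that runs without installing the package, the block below rebuilds just the body_mass_g values from the six rows printed above into a small hypothetical data frame (penguins_head is a made-up name). Because one value is NA, most summaries need na.rm = TRUE.

```r
# The six body_mass_g values shown in the head() output above
penguins_head <- data.frame(
  species     = rep("Adelie", 6),
  body_mass_g = c(3750, 3800, 3250, NA, 3450, 3650)
)

# $ pulls the column out as an ordinary vector
mass <- penguins_head$body_mass_g

print(mean(mass))                # NA, because of the missing value
print(mean(mass, na.rm = TRUE))  # 3580
print(max(mass, na.rm = TRUE))   # 3800
print(length(mass))              # 6: length() still counts the NA entry
```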
An easy way to create a data frame is to use the function data.frame().
As with lists, make sure you define the column names using = and not <-!
If you import or create numeric data as a matrix, you can also convert it easily using as.data.frame()
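Both approaches can be sketched as follows. The object names my_data, m, and m_df are assumptions, and the column values are illustrative rather than taken from the original code.

```r
# Create a data frame directly; column names are set with =
my_data <- data.frame(var1 = c(1, 2, 3, 4),
                      var2 = c("a", "b", "c", "d"),
                      var3 = c(TRUE, FALSE, TRUE, FALSE),
                      var4 = c(3, 2, 1, 0))
print(my_data)

# Convert a numeric matrix; unnamed columns get the default names V1, V2
m <- matrix(1:6, nrow = 3)
m_df <- as.data.frame(m)
print(names(m_df))
```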
We can subset data frames using most of the tools we’ve learned about subsetting so far. We can use keys or indices.
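A few subsetting patterns, sketched on a hypothetical my_data (names and values are assumptions for illustration): rows and columns by index, columns by key, and rows by a logical condition.

```r
my_data <- data.frame(var1 = c(1, 2, 3, 4),
                      var2 = c("a", "b", "c", "d"),
                      var3 = c(TRUE, FALSE, TRUE, FALSE),
                      var4 = c(3, 2, 1, 0))

first_rows   <- my_data[1:2, ]     # rows 1 and 2, all columns
col_by_key   <- my_data[, "var2"]  # one column by name, returned as a vector
col_by_index <- my_data[, 2]       # the same column by position

# Logical indexing: keep rows where var3 is TRUE, and only two of the columns
flagged <- my_data[my_data$var3, c("var1", "var4")]

print(first_rows)
print(col_by_key)
print(flagged)
```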
We can add to a data frame using rbind() and cbind(), but be careful with type mismatches! We can also add columns using the column index methods.
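The sketch below shows one way a type mismatch can arise. It is an illustration under assumed data (my_data and the new column names var5 and var6 are hypothetical), not necessarily the code behind the output in this section.

```r
my_data <- data.frame(var1 = c(1, 2, 3),
                      var2 = c("a", "b", "c"),
                      var3 = c(TRUE, FALSE, TRUE),
                      var4 = c(3, 2, 1))

# Adding a row of plain numbers: each column is coerced to fit the new entry
with_row <- rbind(my_data, c(1, 2, 3, 4))
str(with_row)  # var2 remains character; var3 is no longer logical

# Adding a column with cbind()
with_col <- cbind(my_data, var5 = c(10, 20, 30))

# Adding a column with the column index methods
my_data$var6 <- my_data$var1 * 2

print(names(with_col))
print(my_data$var6)
```

Checking the result with str() after rbind() is a quick way to catch columns whose type changed silently.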
We can use str() to see the structure of a data frame (or any other object!)
'data.frame':   4 obs. of  4 variables:
 $ var1: num  1 2 3 1
 $ var2: chr  "a" "b" "c" "2"
 $ var3: num  1 0 1 3
 $ var4: num  3 2 1 4
'data.frame':   4 obs. of  4 variables:
 $ var1: num  1 2 3 4
 $ var2: chr  "a" "b" "c" "d"
 $ var3: logi  TRUE FALSE TRUE FALSE
 $ var4: num  3 2 1 0
Most data frames will have column names describing the variables. They can also include rownames, which we can add using rownames().
     var1 var2  var3 var4
Obs1    1    a  TRUE    3
Obs2    2    b FALSE    2
Obs3    3    c  TRUE    1
Obs4    4    d FALSE    0
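Row names like those in the table above could be set as sketched below. The my_data object is rebuilt here from the values shown, and the use of paste0() to generate the labels is an assumption about how they were created.

```r
my_data <- data.frame(var1 = c(1, 2, 3, 4),
                      var2 = c("a", "b", "c", "d"),
                      var3 = c(TRUE, FALSE, TRUE, FALSE),
                      var4 = c(3, 2, 1, 0))

# Assign row names Obs1, ..., Obs4
rownames(my_data) <- paste0("Obs", 1:4)
print(my_data)

# Row names can then be used as keys when subsetting
obs2 <- my_data["Obs2", ]
print(obs2)
```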