MATH167R: Functions, vectors, and matrices

Peter Gao

Warm-up

Discuss the following lines of code with a neighbor. What do they do?

x <- TRUE
y <- 3 > 4
x <- x & y
as.numeric(x)

Answer:

x <- TRUE
y <- 3 > 4
x <- x & y
as.numeric(x)
[1] 0

Overview of today

  • Functions and arguments
  • Vectors and matrices
  • Indexing data

Functions and Arguments

Functions

“To understand computations in R, two slogans are helpful: Everything that exists is an object. Everything that happens is a function call.”

(Chambers, 2014)

We have already seen some functions, including the sample function:

sides <- 1:6
rolls <- sample(sides, 10, replace = T)

and the typeof function:

x <- T
typeof(x)
[1] "logical"

Functions provide code to execute some task given a set of inputs.

Functions

A function call is a command to execute the code of a function:

function_name(argument1, argument2, ...)

Arguments or parameters are expressions/values that are the inputs to the function.

exp(-1)
[1] 0.3678794
exp(0)
[1] 1

The parentheses following the name of a function are still required even when there are no arguments:

ls()

Functions

Whenever you are using a function for the first time, it is a good idea to access the documentation by typing ?function_name into the console.

?exp

Specifying arguments

A formal argument is a named argument that is used in the code of a function.

The function args displays the formal arguments:

args(sample)
function (x, size, replace = FALSE, prob = NULL) 
NULL

An actual argument is the value specified by the user during a function call:

sides <- c("H", "T")
sample(x = sides, size = 1, replace = T)
[1] "H"

Matching arguments

The two most common ways to specify arguments are positional and exact:

  • Positional: the actual arguments are matched to the formal arguments in order:
sample(sides, 1, T)
[1] "H"
sample(1, T, sides)
Error in sample.int(x, size, replace, prob): invalid 'replace' argument
  • Exact: the actual arguments are matched to the formal arguments using names:
sample(size = 1, replace = T, x = sides)
[1] "T"
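Because sample() is random, it is hard to see directly that the two matching styles give the same call. A minimal sketch using seq() instead (our choice for illustration; seq() is not otherwise used in these slides) lets us compare the results exactly:

```r
# Positional matching: from = 1, to = 10, by = 2
by_position <- seq(1, 10, 2)

# Exact (named) matching: the order of the arguments no longer matters
by_name <- seq(by = 2, to = 10, from = 1)

by_position
identical(by_position, by_name)

# The two styles can be mixed: named arguments are matched first,
# then the remaining arguments are matched by position
mixed <- seq(1, by = 2, to = 10)
identical(by_position, mixed)
```

Naming the arguments is usually the safer habit once a call has more than one or two inputs.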

Check your understanding: functions

How can we use functions to compute (feel free to look online):

  • \(\ln 10\)
  • \(\log_{10} 10\)

Answer:

log(10)
[1] 2.302585
log10(10)
[1] 1

Vectors

Atomic Vectors

  • Last class, we introduced atomic vectors, but we only considered vectors of length one.

  • Generally, atomic vectors are ordered collections of elements of the same type.

  • We create vectors using the function c()

    c(16, 3, 0, 7, -2)
    [1] 16  3  0  7 -2

Accessing elements of vectors

  • We index vectors using [index] after the vector name:

    x <- c(16, 3, 0, 7, -2)
    x[3]
    [1] 0
    x[4]
    [1] 7
  • If we use a negative index, we return the vector with that element removed

    x[-4]
    [1] 16  3  0 -2
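The same bracket notation also accepts a vector of indices. The short sketch below reuses the vector x from this slide; the name x_short is just an illustrative choice.

```r
x <- c(16, 3, 0, 7, -2)

# Select several elements at once with a vector of positions
x[c(1, 3)]       # first and third elements: 16 0

# Drop several elements at once with a vector of negative positions
x[-c(1, 2)]      # everything except the first two: 0 7 -2

# Indexing does not change x itself; assign the result to keep it
x_short <- x[-4]
length(x)        # still 5
length(x_short)  # 4
```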

Atomic vectors and data types

Note that atomic vectors can only have one type of data. So the following lines work:

x <- c(1, 2, 3)
y <- c("a", "b", "c")
z <- c(T, F, T)

but when we try

c(1, "b", 3)
[1] "1" "b" "3"

R will force the elements in our vector to be of the same type! This is a common source of bugs.
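One way to catch this kind of bug is to check the type with typeof(), which we saw earlier. The sketch below is illustrative only; the vector name values is made up for the example.

```r
typeof(c(1, 2, 3))      # "double"
typeof(c(1, "b", 3))    # "character": the numbers became strings
typeof(c(TRUE, 2))      # "double": TRUE is converted to 1
typeof(c(TRUE, "a"))    # "character"

# A typical bug: one stray string turns a numeric vector into text
values <- c(1, "2", 3)
typeof(values)          # "character"

# sum(values) would stop with an error because values is not numeric.
# Converting back with as.numeric() works here because every entry
# looks like a number.
sum(as.numeric(values)) # 6
```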

Check your understanding: vectors

What do you expect the output of the following chunk to be?

x <- c(1, 2, 3)
y <- c("a", "b", "c")
c(x, y)

Answer:

x <- c(1, 2, 3)
y <- c("a", "b", "c")
c(x, y)
[1] "1" "2" "3" "a" "b" "c"

We can use the c() function to concatenate vectors (forcing elements to be the same type).

Check your understanding: vectors

What do you expect the output of the following chunk to be?

x <- c(3 > 4, T, 5 > 6)
x[3]

Answer:

x <- c(3 > 4, T, 5 > 6)
x[3]
[1] FALSE

R evaluates expressions when creating vectors.

Useful functions for vectors

  • max(), min(), mean(), median(), sum(), sd(), var()
  • length() returns the number of elements in the vector
  • head() and tail() return the beginning and end of a vector
  • sort() will sort
  • summary() returns a five-number summary (plus the mean, for numeric vectors)
  • any() and all() to check conditions on Boolean vectors
  • hist() will draw a crude histogram (we’ll learn how to make this nicer later)

If you are unclear about what any of them do, use ? before the function name to read the documentation. You should get in the habit of checking function documentation a lot!
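As a quick illustration, here are several of these functions applied to the vector from the indexing slide. The second argument to head() and tail() (how many elements to return) is optional.

```r
x <- c(16, 3, 0, 7, -2)

length(x)     # 5
sum(x)        # 24
mean(x)       # 4.8
median(x)     # 3
max(x)        # 16
min(x)        # -2

sort(x)       # -2 0 3 7 16
head(x, 2)    # first two elements: 16 3
tail(x, 2)    # last two elements: 7 -2

# any() and all() take a logical vector, here built with a comparison
any(x < 0)    # TRUE: at least one element is negative
all(x > 0)    # FALSE: not every element is positive
```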

Generating vectors

The notation a:b generates integers starting at a and ending at b.

1:6
[1] 1 2 3 4 5 6

The rep function repeats values of the first argument.

rep("Hello", times = 3)
[1] "Hello" "Hello" "Hello"

The rnorm function randomly generates n values from a normal distribution with the specified mean and sd (standard deviation).

rnorm(n = 10, mean = 1, sd = 1)
 [1]  1.0334186  1.2893213  1.8097966  1.2287419  2.6956954 -0.1966823
 [7]  1.2522572 -0.4786321  1.2067770  1.4449909
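A few variations that may be useful. Note that seq(), the each argument of rep(), and set.seed() are additions not covered on the slide above, and the seed value 167 is arbitrary.

```r
# seq() generalizes a:b to steps other than 1
seq(0, 1, by = 0.25)

# times repeats the whole vector; each repeats every element in turn
rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"

# rnorm() gives different values on every call. Setting a seed first
# makes the random draws reproducible.
set.seed(167)
draws_1 <- rnorm(n = 5, mean = 1, sd = 1)
set.seed(167)
draws_2 <- rnorm(n = 5, mean = 1, sd = 1)
identical(draws_1, draws_2)   # TRUE
```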

Matrices

Matrices

  • Matrices are two-dimensional extensions of vectors: they have rows and columns
  • We can create a matrix using the function matrix()
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
my_matrix <- matrix(c(x, y), nrow = 2, ncol = 5, byrow = TRUE)
my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    5    4    3    2    1

Constructing matrices

# Note: byrow = FALSE is the default
my_matrix2 <- matrix(c(x, y), nrow = 2, ncol = 5)
my_matrix2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    4    2
[2,]    2    4    5    3    1

Warning: be careful not to call your matrix matrix! Why not?

Constructing matrices

We can also generate matrices by column binding (cbind()) and row binding (rbind()) vectors.

cbind(x, y)
     x y
[1,] 1 5
[2,] 2 4
[3,] 3 3
[4,] 4 2
[5,] 5 1
rbind(x, y)
  [,1] [,2] [,3] [,4] [,5]
x    1    2    3    4    5
y    5    4    3    2    1

Indexing and Subsetting Matrices

Indexing a matrix is similar to indexing a vector, except we must index both the row and column, in that order.

my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    5    4    3    2    1

What is the output of the following line?

my_matrix[2, 3]
[1] 3

Indexing and Subsetting Matrices

my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    5    4    3    2    1

What is the output of the following line?

my_matrix[2, c(1, 3, 5)]
[1] 5 3 1
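Two more indexing patterns, sketched with the same my_matrix: leaving an index blank selects everything along that dimension, and a range such as 2:3 selects several columns at once.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
my_matrix <- matrix(c(x, y), nrow = 2, ncol = 5, byrow = TRUE)

# Blank column index: the entire first row
my_matrix[1, ]          # 1 2 3 4 5

# Blank row index: all rows, columns 2 and 3
my_matrix[, 2:3]

# The result is still a matrix, now with 2 rows and 2 columns
dim(my_matrix[, 2:3])   # 2 2
```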

Dropping entries

As with vectors, we can drop rows or columns by using a negative index.

my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    5    4    3    2    1
my_matrix[-2, -4]
[1] 1 2 3 5
# Note: Leaving an index blank includes all indices
my_matrix[, -c(1, 3, 4, 5)]
[1] 2 4

Dropping entries

my_matrix[, -c(1, 3, 4, 5)]
[1] 2 4
is.matrix(my_matrix[, -c(1, 3, 4, 5)])
[1] FALSE

What happened here? When subsetting a matrix reduces one dimension to length 1, R automatically coerces it into a vector. We can prevent this by including drop = FALSE.

Dropping entries

my_matrix[, -c(1, 3, 4, 5), drop = FALSE]
     [,1]
[1,]    2
[2,]    4
is.matrix(my_matrix[, -c(1, 3, 4, 5), drop = FALSE])
[1] TRUE

Filling in a Matrix

We can also fill in an empty matrix using indices. In R, you should always start by initializing an empty matrix of the right size.

my_results <- matrix(NA, nrow = 3, ncol = 3)
my_results
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA

Filling in a Matrix

Then I can replace a single row (or column) using indices as follows.

my_results[2, ] <- c(2, 4, 3)
my_results
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    2    4    3
[3,]   NA   NA   NA

We can also fill in multiple rows (or columns) at once. (Likewise, we can fill in subsets of rows/columns, or individual entries.) Note that recycling applies here: the single value 7 is repeated to fill every selected entry.

my_results[c(1, 3), ] <- 7
my_results
     [,1] [,2] [,3]
[1,]    7    7    7
[2,]    2    4    3
[3,]    7    7    7
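Recycling a vector longer than one value can be surprising, because R fills the selected entries column by column. A small sketch (the names m and m2 are made up for this example):

```r
m <- matrix(NA, nrow = 3, ncol = 3)

# The values 1, 2 are recycled down each column of the selected rows,
# so row 1 becomes all 1s and row 3 becomes all 2s
m[c(1, 3), ] <- c(1, 2)
m

# To put a different vector in each row, assign the rows one at a time
m2 <- matrix(NA, nrow = 3, ncol = 3)
m2[1, ] <- c(1, 2, 3)
m2[3, ] <- c(4, 5, 6)
m2
```

When in doubt, print the matrix after each assignment to confirm the entries landed where you expected.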

Matrix Data Types

Matrices, like vectors, can only have entries of one type.

rbind(c(1, 2, 3), c("a", "b", "c"))
     [,1] [,2] [,3]
[1,] "1"  "2"  "3" 
[2,] "a"  "b"  "c" 

Matrix functions

Let’s create 3 matrices for the purposes of demonstrating matrix functions.

mat1 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
mat1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
mat2 <- matrix(1:6, nrow = 3, ncol = 2)
mat2
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
mat3 <- matrix(5:10, nrow = 2, ncol = 3, byrow = TRUE)
mat3
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]    8    9   10

Matrix functions

Matrix Sums +

mat1 + mat3
     [,1] [,2] [,3]
[1,]    6    8   10
[2,]   12   14   16

Element-wise Matrix Multiplication *

mat1 * mat3
     [,1] [,2] [,3]
[1,]    5   12   21
[2,]   32   45   60

Matrix functions

Matrix Multiplication %*%

mat_square <- mat1 %*% mat2
mat_square
     [,1] [,2]
[1,]   14   32
[2,]   32   77
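Matrix multiplication only works when the number of columns of the first matrix equals the number of rows of the second. A brief sketch with the three demonstration matrices:

```r
mat1 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
mat2 <- matrix(1:6, nrow = 3, ncol = 2)
mat3 <- matrix(5:10, nrow = 2, ncol = 3, byrow = TRUE)

# (2 x 3) times (3 x 2) gives a 2 x 2 matrix
dim(mat1 %*% mat2)      # 2 2

# Order matters: (3 x 2) times (2 x 3) gives a 3 x 3 matrix
dim(mat2 %*% mat1)      # 3 3

# mat1 and mat3 are both 2 x 3, so mat1 * mat3 works element-wise,
# but the line below would stop with "non-conformable arguments":
# mat1 %*% mat3

# Transposing mat3 makes the inner dimensions agree
dim(mat1 %*% t(mat3))   # 2 2
```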

Column Bind Matrices cbind()

cbind(mat1, mat3)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    5    6    7
[2,]    4    5    6    8    9   10

Matrix functions

Transpose t()

t(mat1)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Column Sums colSums()

colSums(mat1)
[1] 5 7 9

Matrix functions

Row Sums rowSums()

rowSums(mat1)
[1]  6 15

Column Means colMeans()

colMeans(mat1)
[1] 2.5 3.5 4.5

Matrix functions

Row Means rowMeans()

rowMeans(mat1)
[1] 2 5

Dimensions dim()

dim(mat1)
[1] 2 3

Matrix functions

Determinant det()

det(mat_square)
[1] 54

Matrix Inverse solve()

solve(mat_square)
           [,1]       [,2]
[1,]  1.4259259 -0.5925926
[2,] -0.5925926  0.2592593

Matrix Diagonal diag()

diag(mat_square)
[1] 14 77
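One way to convince yourself that solve() returns the inverse is to multiply it back. This is a sketch only: the vector b is an arbitrary right-hand side chosen for illustration, and all.equal() is used because floating-point arithmetic makes the product only approximately equal to the identity.

```r
mat_square <- matrix(c(14, 32, 32, 77), nrow = 2, ncol = 2)
mat_inverse <- solve(mat_square)

# A matrix times its inverse is the identity, up to rounding error.
# diag(2) builds the 2 x 2 identity matrix.
all.equal(mat_square %*% mat_inverse, diag(2))        # TRUE

# With a second argument, solve(A, b) solves A %*% sol = b directly
b <- c(1, 2)
sol <- solve(mat_square, b)
all.equal(as.vector(mat_square %*% sol), b)           # TRUE
```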

Commenting Code and Style

What is a comment?

  • Computers completely ignore comments (in R, everything on a line after a #)
  • Comments do not impact the functionality of your code at all.

So why write them?

  • Comments let you write notes that are meant only for the human readers of your code

  • Usually, that reader is you!

  • Coding without comments is ill-advised, bordering on impossible

  • Sneak peek at functions…

Example

#' Wald-type t test
#' @param mod an object of class \code{bbdml}
#' @return Matrix with wald test statistics and p-values. Univariate tests only.
waldt <- function(mod) {
  # Covariance matrix
  covMat <- try(chol2inv(chol(hessian(mod))), silent = TRUE)
  if (inherits(covMat, "try-error")) {
    warning("Singular Hessian! Cannot calculate p-values in this setting.")
    np <- length(mod$param)
    se <- tvalue <- pvalue <- rep(NA, np)
  } else {
    # Standard errors
    se <- sqrt(diag(covMat))
    # test statistic
    tvalue <- mod$param/se
    # P-value
    pvalue <- 2*stats::pt(-abs(tvalue), mod$df.residual)
  }
  # make table
  coef.table <- cbind(mod$param, se, tvalue, pvalue)
  dimnames(coef.table) <- list(names(mod$param),
                               c("Estimate", "Std. Error", "t value", "Pr(>|t|)"))
  return(coef.table)
}

Comment Style Guide

Frequent use of comments should allow most comments to be restricted to one line for readability.

A comment should go above its corresponding line, be indented equally with the next line, and use a single # to mark a comment.

Use a string of - or = to break your code into easily noticeable chunks.

  • Example: # Data Manipulation -----------
  • RStudio allows you to collapse chunks marked like this to help with clutter.

Comment Style Guide

There are exceptions to every rule! Usually, comments are to help you!

Example of breaking rules

Here’s a snippet of a long mathematical function (lots of code omitted with ellipses for space).

Code is divided into major steps marked by easily visible comments

Example of breaking rules

objfun <- function(theta, W, M, X, X_star, np, npstar, link, phi.link) {

  ### STEP 1 - Negative Log-likelihood

  # extract matrix of betas (np x 1), first np entries
  b      <- utils::head(theta, np)
  # extract matrix of beta stars (npstar x 1), last npstar entries
  b_star <- utils::tail(theta, npstar)

  ...

  ### STEP 2 - Gradient

  # define gam
  gam <- phi/(1 - phi)

A final plea

Being a successful programmer requires commenting your code

Want to understand code you wrote >24 hours ago without comments?

What style

We will be using a mix of the Tidyverse Style Guide by Hadley Wickham and the Google Style Guide. Please see the links for details, but I will summarize some main points here and throughout the class as we learn more functionality, such as functions and packages.

You may be graded on following good code style.

Object Names

Use either underscores (_) or big camel case (BigCamelCase) to separate words within an object name. Do not use dots (.) to separate words in the names of R objects or functions!

# Good
day_one
day_1
DayOne

# Bad
dayone

Object Names

Names should be concise, meaningful, and (generally) nouns.

# Good
day_one

# Bad
first_day_of_the_month
djm1

Object Names

It is very important that object names do not write over common functions!

# Very extra super bad
c <- 7
t <- 23
T <- FALSE
mean <- "something"

Note: T and F are R shorthand for TRUE and FALSE, respectively. In general, spell them out to be as clear as possible.
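To see why this matters, note that T is an ordinary variable that can be overwritten, while TRUE is a reserved word that cannot. A small sketch (the names shadowed and restored are made up for the example):

```r
# Overwriting T silently changes the meaning of any later code that uses it
T <- FALSE
shadowed <- T
shadowed       # FALSE

# Removing our copy makes the built-in value visible again
rm(T)
restored <- T
restored       # TRUE

# By contrast, R refuses to run an assignment such as TRUE <- FALSE
```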

Spacing

Put a space after every comma, just like in English writing.

# Good
x[, 1]

# Bad
x[,1]
x[ ,1]
x[ , 1]

Do not put spaces inside or outside parentheses for regular function calls.

# Good
mean(x, na.rm = TRUE)

# Bad
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )

Spacing with Operators

Most of the time when you are doing math, conditionals, logicals, or assignment, your operators should be surrounded by spaces. (e.g. for ==, +, -, <-, etc.)

# Good
height <- (feet * 12) + inches
mean(x, na.rm = TRUE)

# Bad
height<-feet*12+inches
mean(x, na.rm=TRUE)

There are some exceptions we will learn more about later, such as the power symbol ^. See the Tidyverse Style Guide for more details!

Extra Spacing

Adding extra spaces is OK if it improves alignment of = or <-.

# Good
list(
  total = a + b + c,
  mean  = (a + b + c) / n
)

# Also fine
list(
  total = a + b + c,
  mean = (a + b + c) / n
)

Long Lines of Code

Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font.

If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing ). This makes the code easier to read and to change later.

# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )

Tip! Try RStudio > Preferences > Code > Display > Show Margin with Margin column 80 to give yourself a visual cue!

Assignment

We use <- instead of = for assignment. This is moderately controversial if you find yourself in the right (wrong?) communities.

# Good
x <- 5

# Bad
x = 5

Semicolons

In R, semicolons (;) let you put several statements on a single line. In general, this is bad practice and should be avoided. Also, you never need to end lines of code with semicolons!

# Bad
a <- 2; b <- 3

# Also bad
a <- 2;
b <- 3;

# Good
a <- 2
b <- 3

Quotes and Strings

Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.

# Bad
'Text'
'Text with "double" and \'single\' quotes'

# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'