MATH167R: Programming Basics

Peter Gao

Overview of today

  • What is an algorithm?
  • Control statements
  • Writing R functions

Programming and this class

Up until now, we have been focusing on using R for exploratory data analysis:

  • Downloading and combining datasets
  • Objects and types of data in R
  • Creating data visualizations and computing descriptive statistics
  • Writing R Markdown documents for communication

For the next several weeks, we will focus on programming and computer science concepts (as they might be used for data science).

What is an algorithm?

An algorithm is a finite sequence of instructions for performing some task or solving some problem.

Algorithms should be unambiguous and precise. In other words, if the instructions are followed correctly, the output should be reproducible: using the same inputs twice should yield the same outputs each time.

A more precise example: the smallest number

Input: A list of numbers L. Output: The smallest number in the list L

  1. If L is empty, then there is no smallest number.
  2. Assume the first number in L is the smallest.
  3. For each remaining number in L, compare with the current assumed smallest number. If it is smaller, assume this number is now the smallest number.
  4. When there are no more numbers in L to consider, report the current smallest number.

Flowcharts for algorithms

You may also find it helpful to imagine an algorithm as a flowchart (from Wikipedia):

Algorithms in code

Since algorithms are simply sets of instructions or “recipes,” we can represent them as:

  • Written “pseudo-code” instructions (in English or other languages)
  • Flow charts
  • Code (in R, Python, or other programming languages)

Our goal will be to write basic algorithms in R that we can use to carry out data science tasks.

Functions as algorithms

Each function is an implementation of an algorithm. For example, consider the rank() function which returns the rank-ordering of each number in a vector:

rank(c(10, 9, 8))
[1] 3 2 1
rank
function (x, na.last = TRUE, ties.method = c("average", "first", 
    "last", "random", "max", "min")) 
{
    stopifnot(length(na.last) == 1L)
    nas <- is.na(x)
    nm <- names(x)
    ties.method <- match.arg(ties.method)
    if (is.factor(x)) 
        x <- as.integer(x)
    x <- x[!nas]
    y <- switch(ties.method, average = , min = , max = .Internal(rank(x, 
        length(x), ties.method)), first = sort.list(sort.list(x)), 
        last = sort.list(rev.default(sort.list(x, decreasing = TRUE))), 
        random = sort.list(order(x, stats::runif(sum(!nas)))))
    if (!is.na(na.last) && any(nas)) {
        yy <- NA
        NAkeep <- (na.last == "keep")
        if (NAkeep || na.last) {
            yy[!nas] <- y
            if (!NAkeep) 
                yy[nas] <- (length(y) + 1L):length(yy)
        }
        else {
            len <- sum(nas)
            yy[!nas] <- y + len
            yy[nas] <- seq_len(len)
        }
        y <- yy
        names(y) <- nm
    }
    else names(y) <- nm[!nas]
    y
}
<bytecode: 0x13bcf1048>
<environment: namespace:base>

Exercise

Can you write an algorithm to check if a positive integer N is divisible by 3?

Control statements

Most algorithms will require the interpreter to check some conditions to determine the outcome.

By default, the expressions in the body of a function are evaluated sequentially.

In programming, control statements allow the developer (the person writing the code) to tell the interpreter what to do in different scenarios.

if statements

R provides the if syntax for conditional evaluation:

if (condition) code

  • if statements give conditions for which a chunk of code is evaluated.

  • First, a condition is specified before a chunk of code.

    • If the condition is TRUE, then the chunk is evaluated.
    • If the condition is FALSE, then it is not evaluated.

if statements

x <- 1
# Conditions go in parentheses after if
if (x > 0) {
  # code chunks get surrounded by curly brackets
  print(paste0("x is equal to ", x, ", a positive number!"))
}
[1] "x is equal to 1, a positive number!"

if statements

x <- -1
# Conditions go in parenthesis after if
if (x > 0) {
  # code chunks get surrounded by curly brackets
  print(paste0("x is equal to ", x, ", a positive number!"))
}

if statements

We can also write one line if statements without braces:

grade <- NULL
score <- 65
if (score >= 60) grade <- "CR"
print(grade)
[1] "CR"
grade <- NULL
score <- 55
if (score >= 60) grade <- "CR"
print(grade)
NULL

else statements

We can use else to specify what we want to happen when the condition is FALSE.

x <- 1
if (x > 0) {
  print(paste0("x is equal to ", x, ", a positive number!"))
} else {
  print(paste0("x is equal to ", x, ", a negative number!"))
}
[1] "x is equal to 1, a positive number!"

else statements

x <- -1
if (x > 0) {
  print(paste0("x is equal to ", x, ", a positive number!"))
} else {
  print(paste0("x is equal to ", x, ", a negative number!"))
}
[1] "x is equal to -1, a negative number!"

else if

We use else if when there are more than two possible paths. Here, the final else chunk will evaluate for any cases not covered by the if or else if.

x <- 1
if (x > 0) {
  paste0("x is equal to ", x, ", a positive number!")
} else if (x < 0) {
  paste0("x is equal to ", x, ", a negative number!")
} else {
  paste0("x is equal to ", x, "!")
}
[1] "x is equal to 1, a positive number!"

else if

x <- -1
if (x > 0) {
  paste0("x is equal to ", x, ", a positive number!")
} else if (x < 0) {
  paste0("x is equal to ", x, ", a negative number!")
} else {
  paste0("x is equal to ", x, "!")
}
[1] "x is equal to -1, a negative number!"

else if

x <- 0
if (x > 0) {
  paste0("x is equal to ", x, ", a positive number!")
} else if (x < 0) {
  paste0("x is equal to ", x, ", a negative number!")
} else {
  paste0("x is equal to ", x, "!")
}
[1] "x is equal to 0!"

Control Statements: Examples

Divisibility

Suppose we want to check if x is divisible by 3 and print out the answer. What should CONDITION be?

x <- 5
if (CONDITION) {
  IF TRUE, DO THIS
} else {
  ELSE, DO THIS
}

Divisibility

Suppose we want to check if x is divisible by 5 and print out the answer. What should CONDITION be?

x <- 5
# modulo operator
if (x %% 3 == 0) {
  print("divisible by 3")
} else {
  print("not divisible by 3")
}
[1] "not divisible by 3"

Check length of strings

Note: We will need the stringr package for this.

# Run this if you have never installed stringr before!
# install.packages("stringr")
library(stringr)
x <- "cat"
if (str_length(x) <= 10) {
  cat("x is a short string.")
} else {
  cat("x is a long string.")
}
x is a short string.

Check length of strings

x <- "A big fluffy cat with orange fur and stripes"
if (str_length(x) <= 10) {
  cat("x is a short string.")
} else {
  cat("x is a long string.")
}
x is a long string.

Check class

x <- 5
if (is.numeric(x)) {
  cat("x is a numeric.")
} else if (is.character(x)) {
  cat("x is a character.")
} else {
  cat("x is some class I didn't check for in my code.")
}
x is a numeric.

Check class

x <- list()
if (is.numeric(x)) {
  cat("x is a numeric.")
} else if (is.character(x)) {
  cat("x is a character.")
} else {
  cat("x is some class I didn't check for in my code.")
}
x is some class I didn't check for in my code.

Exercise

Write code that, given two numerics x and y, prints out:

  • EQUAL if sum(x) and sum(y) are equal.
  • X if sum(x) is bigger.
  • Y if sum(y) is bigger.

Functions

We’ve already seen and used several functions, but you can also create your own.

This is incredibly useful when:

  • You use the same code repeatedly
  • You want to perform the same task for slightly different input values
  • You want others to be able to use your code

Anatomy of a function

The function named function is used to create a function:

function(formal arguments) {body}

The body comprises one or more lines of code that will perform the desired computations.

celsiusToFahrenheit <- function(x) {x * 9 / 5 + 32}
celsiusToFahrenheit(0)
[1] 32

Anatomy of a function

For more complicated functions, we typically move the body to a new line:

function_name <- function(param1, param2 = "default") {
  # body of the function
  return(output)
}
  • function_name: the name you want to give your function, what you will use to call it
  • function(): call this to define a function
  • param1, param2: formal arguments input by the user. You can assign default values by setting them equal to something in the call to function()
  • body: the actual code that is executed
  • return(): the output value returned to the user

Square a number, add 2

square_plus_2 <- function(x) {
  y <- x^2 + 2
  return(y)
}

square_plus_2(4)
[1] 18
square_plus_2(10)
[1] 102
square_plus_2(1:5)
[1]  3  6 11 18 27

Square a number, add 2

square_plus_2("some string")
Error in x^2: non-numeric argument to binary operator

What happened here?

We wrote a function for numerics only but didn’t check the input.

Square a number, add 2

Let’s try making our function more robust by adding a if statement and a stop call.

square_plus_2 <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric!")
  } else {
    y <- x^2 + 2
    return(y)
  }
}

square_plus_2("some string")
Error in square_plus_2("some string"): x must be numeric!

Style Guide

Function Names

Strive to have function names based on verbs. Otherwise, standard variable name style guidelines apply!

# Good
add_row()
permute()

# Bad
row_adder()
permutation()

Spacing

Place a space before and after () when used with if, for, or while.

# Good
if (condition) {
  x + 2
}

# Bad
if(condition){
  x + 2
}

Spacing

Place a space after () used for function arguments.

# Good
if (debug) {
  show(x)
}

# Bad
if(debug){
  show(x)
}

Code Blocks

  • { should be the last character on the line. Related code (e.g., an if clause, a function declaration, a trailing comma, …) must be on the same line as the opening brace. It should be preceded by a single space.
  • The contents within code blocks should be indented by two spaces from where it started
  • } should be the first character on the line.

Code Blocks

# Good
if (y < 0) {
  message("y is negative")
}

if (y == 0) {
  if (x > 0) {
    log(x)
  } else {
    message("x is negative or zero")
  }
} else {
  y^x
}

Code Blocks

# Bad
if (y<0){
message("Y is negative")
}

if (y == 0)
{
    if (x > 0) {
      log(x)
    } else {
  message("x is negative or zero")
    }
} else { y ^ x }

In-line Statements

In general, it’s ok to drop the curly braces for very simple statements that fit on one line. However, function calls that affect control flow (return, stop, etc.) should always go in their own {} block:

# Good
y <- 10
x <- if (y < 20) "Too low" else "Too high"

if (y < 0) {
  stop("Y is negative")
}

find_abs <- function(x) {
  if (x > 0) {
    return(x)
  }
  x * -1
}

In-line Statements

In general, it’s ok to drop the curly braces for very simple statements that fit on one line. However, function calls that affect control flow (return, stop, etc.) should always go in their own {} block:

# Bad
if (y < 0) stop("Y is negative")

if (y < 0)
  stop("Y is negative")

find_abs <- function(x) {
  if (x > 0) return(x)
  x * -1
}

Long lines in functions

If a function definition runs over multiple lines, indent the second line to where the definition starts.

# Good
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# Bad
long_function_name <- function(a = "a long argument",
  b = "another argument",
  c = "another long argument") {
  # Here it's hard to spot where the definition ends and the
  # code begins
}

return

Strictly speaking, return is not necessary in a function definition. The function will output the last line of executable R code. The following function definitions will output the same results!

Add_Values <- function(x, y) {
  return(x + y)
}

Add_Values <- function(x, y) {
  x + y
}

Commenting functions

For now, when commenting functions, include (at least) 3 lines of comments:

  • a comment describing the purpose of a function
  • a comment describing each input
  • a comment describing the output

The function body should be commented as usual!

Commenting functions

# Good ----
# Function: square_plus_2, squares a number and then adds 2
# Input: x, must be numeric
# Output: numeric equal to x^2 + 2
square_plus_2 <- function(x) {
  # check that x is numeric
  if (!is.numeric(x)) {
    stop("x must be numeric!")
  } else {
    # if numeric, then square and add 2
    y <- x^2 + 2
    return(y)
  }
}

Commenting functions

# Bad ----
# Function for problem 2c
square_plus_2 <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric!")
  } else {
    y <- x^2 + 2
    return(y)
  }
}

Exercises

  1. Write a function that takes as input three numbers, a, b, and c, and returns the sum, without using the sum function. Be sure to test your functions out on example input.
  2. Write a function that takes as input one number, a, and returns its absolute value, without using the abs function.