[1] 3 2 1
Up until now, we have been focusing on using R for exploratory data analysis.
For the next several weeks, we will focus on programming and computer science concepts (as they might be used for data science).
An algorithm is a finite sequence of instructions for performing some task or solving some problem.
Algorithms should be unambiguous and precise. In other words, if the instructions are followed correctly, the output should be reproducible: using the same inputs twice should yield the same outputs each time.
Input: A list of numbers L. Output: The smallest number in the list L
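As a minimal sketch, the input/output specification above could be implemented in R as follows. The name find_smallest is hypothetical, and the code assumes L is a non-empty numeric vector with no missing values (R's built-in min() does the same job).

```r
# Hypothetical helper: return the smallest number in a non-empty numeric vector L
find_smallest <- function(L) {
  # start by assuming the first element is the smallest
  smallest <- L[1]
  # compare every element against the current smallest value
  for (value in L) {
    if (value < smallest) {
      smallest <- value
    }
  }
  smallest
}

find_smallest(c(3, 2, 1))
```

Following the same steps with the same input always gives the same output, which is the reproducibility property described above.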
You may also find it helpful to imagine an algorithm as a flowchart (from Wikipedia):
Since algorithms are simply sets of instructions or “recipes,” we can represent them as:
Our goal will be to write basic algorithms in R that we can use to carry out data science tasks.
Each function is an implementation of an algorithm. For example, consider the rank() function, which returns the rank-ordering of each number in a vector:
function (x, na.last = TRUE, ties.method = c("average", "first",
    "last", "random", "max", "min"))
{
    stopifnot(length(na.last) == 1L)
    nas <- is.na(x)
    nm <- names(x)
    ties.method <- match.arg(ties.method)
    if (is.factor(x))
        x <- as.integer(x)
    x <- x[!nas]
    y <- switch(ties.method, average = , min = , max = .Internal(rank(x,
        length(x), ties.method)), first = sort.list(sort.list(x)),
        last = sort.list(rev.default(sort.list(x, decreasing = TRUE))),
        random = sort.list(order(x, stats::runif(sum(!nas)))))
    if (!is.na(na.last) && any(nas)) {
        yy <- NA
        NAkeep <- (na.last == "keep")
        if (NAkeep || na.last) {
            yy[!nas] <- y
            if (!NAkeep)
                yy[nas] <- (length(y) + 1L):length(yy)
        }
        else {
            len <- sum(nas)
            yy[!nas] <- y + len
            yy[nas] <- seq_len(len)
        }
        y <- yy
        names(y) <- nm
    }
    else names(y) <- nm[!nas]
    y
}
<bytecode: 0x13bcf1048>
<environment: namespace:base>
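To see what this algorithm does in practice, here is a small sketch that calls rank() on a made-up vector (the name scores and its values are illustrative only). It uses the default ties.method of "average" and then the "min" option that appears in the source above.

```r
# Illustrative data: two values are tied at 20
scores <- c(10, 30, 20, 20)

# Default ties.method = "average": tied values share the mean of their ranks
avg_ranks <- rank(scores)
print(avg_ranks)

# ties.method = "min": tied values all receive the smallest of their ranks
min_ranks <- rank(scores, ties.method = "min")
print(min_ranks)
```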
Can you write an algorithm to check if a positive integer N is divisible by 3?
Most algorithms will require the interpreter to check some conditions to determine the outcome.
By default, the expressions in the body of a function are evaluated sequentially.
In programming, control statements allow the developer (the person writing the code) to tell the interpreter what to do in different scenarios.
if statements
R provides the if syntax for conditional evaluation:
if (condition) code
if statements give conditions for which a chunk of code is evaluated.
First, a condition is specified before a chunk of code.
If the condition is TRUE, then the chunk is evaluated.
If the condition is FALSE, then it is not evaluated.
[1] "x is equal to 1, a positive number!"
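The chunk that printed the output above is not shown here. A minimal sketch that would produce the same line, assuming x was set to 1 and the condition tested whether x is positive, might look like this:

```r
x <- 1
message_text <- paste0("x is equal to ", x, ", a positive number!")

# the condition x > 0 is TRUE, so the chunk inside the braces is evaluated
if (x > 0) {
  print(message_text)
}
```

If x were set to -1 instead, the condition would be FALSE and nothing would be printed.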
We can also write one-line if statements without braces:
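A brace-free version could be sketched as follows; the variable names x and label are placeholders chosen for illustration.

```r
x <- 1
label <- "unknown"

# no braces: the single expression after the condition is the whole chunk
if (x > 0) label <- "positive"

print(label)
```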
else statements
We can use else to specify what we want to happen when the condition is FALSE.
[1] "x is equal to 1, a positive number!"
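As a sketch of the pattern (not necessarily the original chunk), the following if/else pair prints the line above when x is 1 and takes the else path for any non-positive value. The wording of the else message is an assumption.

```r
x <- 1

if (x > 0) {
  # evaluated when the condition is TRUE
  result <- paste0("x is equal to ", x, ", a positive number!")
} else {
  # evaluated when the condition is FALSE
  result <- paste0("x is equal to ", x, ", which is not a positive number.")
}

print(result)
```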
else if
We use else if when there are more than two possible paths. The final else chunk is evaluated for any case not covered by the if or else if conditions.
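A three-way branch might be sketched like this, with sign_label as a hypothetical variable name. Only the first chunk whose condition is TRUE is evaluated.

```r
x <- 0

if (x > 0) {
  sign_label <- "positive"
} else if (x < 0) {
  sign_label <- "negative"
} else {
  # reached only when neither condition above is TRUE
  sign_label <- "zero"
}

print(sign_label)
```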
Suppose we want to check if x is divisible by 3 and print out the answer. What should CONDITION be?
Suppose we want to check if x is divisible by 5 and print out the answer. What should CONDITION be?
[1] "not divisible by 3"
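One possible answer uses the modulo operator %%, which gives the remainder after division: a number is divisible by 3 exactly when that remainder is 0. In the sketch below the value x <- 5 is an assumption, chosen so that the result matches the output above.

```r
x <- 5

# x %% 3 is the remainder when x is divided by 3
if (x %% 3 == 0) {
  answer <- "divisible by 3"
} else {
  answer <- "not divisible by 3"
}

print(answer)
```

The same pattern works for 5 by using x %% 5 == 0 as the condition.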
Note: We will need the stringr package for this.
Write code that, given two numerics x and y, prints out:
EQUAL if sum(x) and sum(y) are equal.
X if sum(x) is bigger.
Y if sum(y) is bigger.
We’ve already seen and used several functions, but you can also create your own.
This is incredibly useful when:
The function named function is used to create a function:
function(formal arguments) {body}
The body comprises one or more lines of code that will perform the desired computations.
[1] 32
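The definition that produced the output above is not shown. As a hypothetical stand-in, here is one short function, written on a single line, that also returns 32 (the name to_fahrenheit and the input 0 are assumptions made for illustration):

```r
# formal argument: celsius; body: the conversion formula
to_fahrenheit <- function(celsius) {celsius * 9 / 5 + 32}

to_fahrenheit(0)
```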
For more complicated functions, we typically move the body to a new line:
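A sketch of this multi-line layout, using the placeholder names function_name, param1, and param2 (the multiplication in the body and the default value of 2 are arbitrary choices for illustration):

```r
# function_name, param1, and param2 are placeholder names
function_name <- function(param1, param2 = 2) {
  # body: compute the result from the formal arguments
  output <- param1 * param2
  # return(): hand the result back to the caller
  return(output)
}

function_name(5)              # param2 falls back to its default of 2
function_name(5, param2 = 3)  # param2 is overridden by the caller
```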
function_name: the name you want to give your function, what you will use to call it
function(): call this to define a function
param1, param2: formal arguments input by the user. You can assign default values by setting them equal to something in the call to function()
return(): the output value returned to the user
What happened here?
We wrote a function for numerics only but didn’t check the input.
Let’s try making our function more robust by adding an if statement and a stop call.
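To illustrate the idea, the sketch below contrasts an unchecked function with a checked one. The names add_one_unchecked and add_one are hypothetical, and tryCatch() is used only so the error message can be captured and printed instead of halting the script.

```r
# Hypothetical function written for numerics, with no input check
add_one_unchecked <- function(x) {
  x + 1
}

# Same computation, but the input is validated first
add_one <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric!")
  }
  x + 1
}

add_one(2)

# capture the error message raised by stop() for a non-numeric input
error_message <- tryCatch(add_one("two"), error = function(e) conditionMessage(e))
print(error_message)
```

The unchecked version also fails on "two", but with R's generic arithmetic error rather than a message that tells the user what the function expects.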
Strive to have function names based on verbs. Otherwise, standard variable name style guidelines apply!
Place a space before and after () when used with if, for, or while.
Place a space after () used for function arguments.
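As a small illustration of these spacing rules, here is a sketch with a hypothetical function is_positive. The commented-out version runs identically but ignores the guidelines.

```r
# Good: space before and after () with if, and after () in the function definition
is_positive <- function(x) {
  if (x > 0) {
    TRUE
  } else {
    FALSE
  }
}

# Bad (same behavior, but harder to read):
# is_positive <- function(x){
#   if(x > 0){
#     TRUE
#   }else{
#     FALSE
#   }
# }

is_positive(3)
```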
{ should be the last character on the line. Related code (e.g., an if clause, a function declaration, a trailing comma, …) must be on the same line as the opening brace. It should be preceded by a single space.
} should be the first character on the line.
In general, it’s ok to drop the curly braces for very simple statements that fit on one line. However, function calls that affect control flow (return, stop, etc.) should always go in their own {} block:
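A sketch of this guideline, with hypothetical names y, x, and check_non_negative:

```r
y <- 10

# Fine: a very simple statement that fits on one line
x <- if (y < 20) "Too low" else "Too high"

# Good: stop() affects control flow, so it gets its own {} block
check_non_negative <- function(value) {
  if (value < 0) {
    stop("value is negative")
  }
  value
}

# Bad (runs, but the control flow is easy to miss):
# if (value < 0) stop("value is negative")

check_non_negative(y)
```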
If a function definition runs over multiple lines, indent the second line to where the definition starts.
# Good
long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

# Bad
long_function_name <- function(a = "a long argument",
  b = "another argument",
  c = "another long argument") {
  # Here it's hard to spot where the definition ends and the
  # code begins
}
return
Strictly speaking, return is not necessary in a function definition. A function returns the value of the last expression it evaluates. The following function definitions will output the same results!
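For instance, these two hypothetical definitions (square_explicit and square_implicit) differ only in whether return() is written out:

```r
# With an explicit return()
square_explicit <- function(x) {
  return(x^2)
}

# Without return(): the value of the last expression is returned
square_implicit <- function(x) {
  x^2
}

square_explicit(4)
square_implicit(4)
```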
For now, when commenting functions, include (at least) 3 lines of comments:
The function body should be commented as usual!
# Good ----
# Function: square_plus_2, squares a number and then adds 2
# Input: x, must be numeric
# Output: numeric equal to x^2 + 2
square_plus_2 <- function(x) {
  # check that x is numeric
  if (!is.numeric(x)) {
    stop("x must be numeric!")
  } else {
    # if numeric, then square and add 2
    y <- x^2 + 2
    return(y)
  }
}
Write a function that takes a, b, and c, and returns the sum, without using the sum function. Be sure to test your functions out on example input.
Write a function that takes a, and returns its absolute value, without using the abs function.