The primary reference for these notes are r4ds

What you will learn

  • conditionals (if/else)
  • loops (for, while)
  • how to define your own functions
  • vectors and lists
library(tidyverse)

R supports the use of two types of conditional statements: if and if-else.

if/else

An if statement is a true-or-false condition (given in parentheses), followed by a block of coded (surrounded by curly brackets). The commands are only executed if the condition turns out to be true.

if (2 > 1) {
  print('fact')
}
## [1] "fact"
if (2 < 1) {
  print('alternative fact')
}

if statements can be followed by else statements. Note runif (r-uniform) returns a random number between 0 and 1.

# Flip a coin 
if (runif(1) < 0.5) {
    print('heads')
} else {
    print('tails')
}
## [1] "tails"

You can also use else if

r <- runif(1)
# rock paper sissors
if (r < 1/3) {
    print('rock')
} else if (1/3 < r && r < 2/3) {
    print('paper')
} else{
    print('scissors')
}
## [1] "rock"

conditionals

if/else statements usually evaluate conditional statements.

equality

Use the double equals (==) to check for equality between some objects

2*5 == 10
## [1] TRUE

warning: when checking if two decimal numbers are equal do not use ==; instead use near from the dplyr package. Why? Computers do finite precision arithmetic meaning they approximate decimal numbers.

# oops
sqrt(2)^2 == 2
## [1] FALSE
dplyr::near(sqrt(2)^2, 2)
## [1] TRUE

To check if an object is NA use is.na.

is.na(NaN)
## [1] TRUE

For special values such as Inf, and NaN see section 20.3.2

vectorized logial operations

Note that == is vectorized i.e.

c(1, 1, 1) == c(1, 2, 1)
## [1]  TRUE FALSE  TRUE

When using and/or in a conditional statement use the double && and || (opposed to single & and |). Notice the difference

c(T, T, T) || c(T, F, T)
## [1] TRUE
c(T, T, T) |c(T, F, T)
## [1] TRUE TRUE TRUE

Loops

A loop is any part of code that causes a repeat of the same commands to be executed. There are three kinds of loops available in R: for, while and repeat (I’ve never used repeat). All three require some form of conditional statements, like those considered above.

for loops

This is the most common type of loop, and it can be very powerful. In a for loop, you explicitly define both a counter (or index) and a vector for which the counter takes its values. When the counter has finished taking the values of the vector, the loop is terminated.

# counter is i and the vector is the integers from 1 to 10
for (i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Often you will use a loop to fill a vector with values. You should preallocate the memory for the vector

nums <- vector("double", 10) # or rep(0, 10) or something else
for (i in 1:10) {
  nums[i] <- runif(1)
}

# or you could dynamically allocate memory which is probably the wrong way to do it

# nums <- c()
# for (i in 1:10) {
#   nums <- c(nums, runif(1))
# }

while loops

A “while” loop is a block of code following a single logical condition. The code will continue to run again and again so long as the condition remains true. The condition will be checked once before every iteration.

# Random walk on the number line until we have strayed too far

current_position <- 10
n_iter <- 0

# run until stopping condition
while (current_position > 0){
    current_position <- current_position + rnorm(1)
    n_iter <- n_iter + 1
}

print(paste0('you lost all your money after ', n_iter, ' trips to the casino'))
## [1] "you lost all your money after 269 trips to the casino"

Most of the time, you will want the code block to affect the condition somehow. In the above example, the variable “distance” changes every time, so we eventually expect the condition (distance < 5) to be false.

Caution: if you are not careful, you can get into an “infinite loop”!

while (TRUE){
    print('Duke sucks')
}

vectorization

for loops can be slow and are necessarily sequential. Most computers have multiple cores. If you have access to a cluster you might have access to dozens or hundreds of cores. Many tasks that use for loops are not inherently sequential.

Loops also require a lot of written code. We will talk later about imperative and functional programming in R which replaces for loops in many cases. For more details see section 21.


Functions

Functions are super important. In theory you could just copy and paste a lot however writing a function has some major benefits (from r4ds)

  1. You can give a function an evocative name that makes your code easier to understand.

  2. As requirements change, you only need to update code in one place, instead of many.

  3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).

General life advice

write a function whenever you’ve copied and pasted a block of code more than twice

define a function

You can define a function as follows

power <- function(num, exponent){
    # returns num raised to the exponent
    num ^ exponent
}

power(2, 3)
## [1] 8

num and exponent are called arguments. You can give them default values

power <- function(num, exponent=3){
    # returns num raised to the exponent
    num ^ exponent
}

power(2)
## [1] 8

Most statements call a function. To figure out how to use a function pull up the documentation with ?

?runif

min and max are given default values, while the argument n is not.

return values

The last statement run in a function will be returned. You can also use return()

random_rps <- function(){
    # randomly returns one of rock, paper or scissors
    
    r <- runif(1)
    # rock paper sissors
    if (r < 1/3) {
        return('rock')
    } else if (1/3 < r && r < 2/3) {
        return('paper')
    } else{
        return('scissors')
    }
}

random_rps()
## [1] "scissors"

check arguments

Often you want to check the arguments of a function to make sure they are what your function expects. Use the stop command

get_rps_winner <- function(alice, bob){
    # returns the winner of a rock paper scissors match
    
    if (!alice %in% c('rock', 'paper', 'scissors')){
        stop('alice is cheating!')
    }
    
    if (! bob %in% c('rock', 'paper', 'scissors')){
        stop('alice is cheating!')
    }
    
    if (alice == bob){
        return('tie')
    }
    
    if (alice == 'paper'){
        if(bob == 'rock'){
            return('alice')
        } else{
            return('bob')
        }
    } else if (alice == 'rock'){
        if(bob == 'scissors'){
            return('alice')
        } else{
            return('bob')
        }
    } else{ # alice is 'scissors'
        if(bob == 'paper'){
            return('alice')
        } else{
            return('bob')
        }
    }
    
}

get_rps_winner('rock', 'paper')
## [1] "bob"
# r markdown won't run when stop is called
# get_rps_winner('sledgehammer', 'paper')

Functions are for humans and computers

Long, complicated code that jumps around a lot is harder to deal with in every way. A benefit of using functions is that you can break up the overall coding task into smaller functions. This way, you can leave all the details for each step within a single function.

Functions should be named informatively; typically a verb describing what the function is doing. Functions should also be commented well so someone else (e.g. future you) can understand what’s going on. Consider your audience to be someone who knows R but has not read your code.

Long programs are broken up into dozens (or hundreds) of helper functions. It’s a good idea to organize functions that go together into the same R script.

play_rps <- function(victory_thresehold=10){
    # determins the winner of rock paper scissors
    if(victory_thresehold <= 0){
        stop('victory thresehold must be positive')
    }
    
    alice_wins <- 0
    bob_wins <- 0
    
    # play until someone wints
    while(alice_wins < victory_thresehold  && bob_wins < victory_thresehold){
        alice <- random_rps()
        bob <- random_rps()
        
        winner <- get_rps_winner(alice, bob)
        
        if (winner == 'alice'){
            alice_wins <- alice_wins + 1
        }
        
        if (winner == 'bob'){
            bob_wins <- bob_wins + 1
        }
    }
    
    if (alice_wins == 10){
        return('alice')
    } else{
        return('bob')
    }
}
    


play_rps(15)
## [1] "bob"

importing your own functions

Say we write a bunch of helper functions in a file called fun.R. You can import them with the source command

source('fun.R')

helper_fun()
## [1] "Im not a very helpful helper function"

The source command will look for R scripts relative to the current working directory.

Vectors

See section 20 from r4ds for more details.

Vectors are a basic data type in R and there are two types: atomic vectors and lists. Atomic vectors are homogeneous, one dimensional objects while lists are heterogeneous, hierarchical objects.

If you are familiar with python: atomic vectors in R are lists in Python while lists in R are dicts in python.

Atomic vectors

An atomic vector is just a list of things that are the same type (integers, booleans, etc).

# integer
c(1,2,3) # 1:3
## [1] 1 2 3
# boolean
c(TRUE, FALSE, TRUE) 
## [1]  TRUE FALSE  TRUE
# string
c('I', 'wish', 'vectors', 'were', 'named', 'lists', 'instead')
## [1] "I"       "wish"    "vectors" "were"    "named"   "lists"   "instead"

There are six possible types: boolean, character, complex, raw, integer and double. Integers/doubles are called numeric vectors.

The typeof command will tell you the type of a list

typeof(rep(TRUE, 4))
## [1] "logical"

To test if a vector is a given type use the as_BLAH functions from the purr package that comes with tidyverse (see section 20.4.2)

is_logical(c(TRUE, FALSE))
## [1] TRUE

You can explicitly coerce some types into other types with as.BLAH() functions

as.integer(c('1', '2', '3'))
## [1] 1 2 3
as.logical(c(0,1,1))
## [1] FALSE  TRUE  TRUE

Sometimes vectors are implicitly coerced

# logical gets implicily coerced to integer
c(1, 2, TRUE)
## [1] 1 2 1

Implicit coercion sometimes happens within a function

sum(c(T, T, F))
## [1] 2

Vectorized operations are very common

c(-2, -1, 1, 2) + 2
## [1] 0 1 3 4

often they are combined with implicit coercion

sum(c(-2, -1, 1, 2) > 0)
## [1] 2

You can subset a vector with [] and a vector of indices

v <- 1:10
v[c(1,10)]
## [1]  1 10

Or a vector of booleans (the same length of the original vector)

v[v %%2 == 0]
## [1]  2  4  6  8 10

Vectors can also have names (read about this in section 20).

Lists

Lists can contain objects of multiple types and are indexed by names (as opposed to index sequentially)

L <- list(number=1, letter='a', bool=TRUE)
L
## $number
## [1] 1
## 
## $letter
## [1] "a"
## 
## $bool
## [1] TRUE

To access elements of a list use [[]]

L[['number']]
## [1] 1

you can use a single [] and this will return a list

L['number']
## $number
## [1] 1

See section 20.5.3 for an explanation of the difference. Lists can store other lists making them hierarchical

LoL <- list(names = list('Iain', 'Brendan', 'Varun'),
            numbers=list(1:3, 1:5, 1:7))

LoL[['numbers']][[2]]
## [1] 1 2 3 4 5

Other topics

This lecture covers the programming basics in R. There are a number of advanced topics (some of which we may cover later) you should be aware of/learn more about. Almost all of these are covered in Advanced R

  • testing

  • vectorization

  • recursion

  • debugging / profiling

  • environments / scoping

  • object oriented programming

  • functional programming

  • non-standard evaluation

  • dynamic programming

  • memory usage

  • using Rcpp