The primary reference for these notes is r4ds.
library(tidyverse)
R supports two types of conditional statements: if and if-else.
An if statement is a true-or-false condition (given in parentheses), followed by a block of code (surrounded by curly brackets). The commands are executed only if the condition turns out to be true.
if (2 > 1) {
print('fact')
}
## [1] "fact"
if (2 < 1) {
print('alternative fact')
}
if statements can be followed by else statements. Note that runif (r-uniform) returns a random number between 0 and 1.
# Flip a coin
if (runif(1) < 0.5) {
print('heads')
} else {
print('tails')
}
## [1] "tails"
You can also use else if
r <- runif(1)
# rock paper scissors
if (r < 1/3) {
print('rock')
} else if (1/3 < r && r < 2/3) {
print('paper')
} else{
print('scissors')
}
## [1] "rock"
The condition in an if/else statement is usually a comparison.
Use the double equals (==) to check whether two objects are equal
2*5 == 10
## [1] TRUE
Warning: when checking whether two decimal numbers are equal, do not use ==; instead use near from the dplyr package. Why? Computers do finite-precision arithmetic, meaning they only approximate most decimal numbers.
# oops
sqrt(2)^2 == 2
## [1] FALSE
dplyr::near(sqrt(2)^2, 2)
## [1] TRUE
To check if an object is NA use is.na.
is.na(NaN)
## [1] TRUE
For special values such as Inf and NaN, see section 20.3.2.
Note that == is vectorized, i.e. it compares two vectors element by element
c(1, 1, 1) == c(1, 2, 1)
## [1] TRUE FALSE TRUE
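When using and/or in a conditional statement, use the double && and || (as opposed to the single & and |). The double forms expect a single TRUE or FALSE on each side, while the single forms are vectorized. Notice the difference
# older versions of R silently used only the first element of each side;
# since R 4.3.0 the following line is an error
# c(T, T, T) || c(T, F, T)
TRUE || FALSE
## [1] TRUE
c(T, T, T) | c(T, F, T)
## [1] TRUE TRUE TRUE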
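Since an if condition needs a single TRUE or FALSE, a vectorized comparison has to be collapsed first. A minimal sketch using the base functions any and all (the vector x is made up for illustration):

```r
x <- c(-2, -1, 1, 2)

# any() is TRUE if at least one element is TRUE
if (any(x < 0)) {
  print('x has a negative entry')
}

# all() is TRUE only if every element is TRUE
if (!all(x > 0)) {
  print('not every entry is positive')
}
```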
A loop is a piece of code that causes the same commands to be executed repeatedly. There are three kinds of loops available in R: for, while and repeat (I’ve never used repeat). All three require some form of conditional statements, like those considered above.
The for loop is the most common type of loop, and it can be very powerful. In a for loop, you explicitly define both a counter (or index) and a vector from which the counter takes its values. When the counter has taken every value in the vector, the loop terminates.
# counter is i and the vector is the integers from 1 to 10
for (i in 1:10) {
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
Often you will use a loop to fill a vector with values. You should preallocate the memory for the vector
nums <- vector("double", 10) # or rep(0, 10) or something else
for (i in 1:10) {
nums[i] <- runif(1)
}
# or you could dynamically allocate memory which is probably the wrong way to do it
# nums <- c()
# for (i in 1:10) {
# nums <- c(nums, runif(1))
# }
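One related detail, sketched here as a suggestion rather than something from the notes above: when looping over the positions of a vector, seq_along is a safer choice than 1:length(x), because it behaves sensibly when the vector is empty.

```r
empty <- c()

# 1:length(empty) is 1:0, i.e. c(1, 0), so a loop over it runs twice
length(1:length(empty))   # 2

# seq_along(empty) has length zero, so the loop body is skipped
length(seq_along(empty))  # 0

# the usual preallocate-and-fill pattern with seq_along
squares <- vector("double", 5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
squares
```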
A “while” loop is a block of code following a single logical condition. The code will continue to run again and again so long as the condition remains true. The condition will be checked once before every iteration.
# Random walk on the number line until we have strayed too far
current_position <- 10
n_iter <- 0
# run until stopping condition
while (current_position > 0){
current_position <- current_position + rnorm(1)
n_iter <- n_iter + 1
}
print(paste0('you lost all your money after ', n_iter, ' trips to the casino'))
## [1] "you lost all your money after 269 trips to the casino"
Most of the time, you will want the code block to affect the condition somehow. In the above example, the variable current_position changes on every iteration, so we eventually expect the condition (current_position > 0) to be false.
Caution: if you are not careful, you can get into an “infinite loop”!
while (TRUE){
print('Duke sucks')
}
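One way to guard against a loop that never ends is to cap the number of iterations. The sketch below uses repeat with an explicit break; the cap max_iter and the seed are arbitrary values chosen for illustration.

```r
set.seed(1)        # arbitrary seed so the run is reproducible
max_iter <- 1000   # hypothetical safety cap
n_iter <- 0
position <- 10

repeat {
  position <- position + rnorm(1)
  n_iter <- n_iter + 1
  # leave the loop when the walk reaches zero or the cap is hit
  if (position <= 0 || n_iter >= max_iter) {
    break
  }
}
n_iter
```

The same guard works in a while loop by adding the cap to the condition, e.g. while (position > 0 && n_iter < max_iter).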
for loops can be slow and are necessarily sequential. Most computers have multiple cores. If you have access to a cluster you might have access to dozens or hundreds of cores. Many tasks that use for loops are not inherently sequential.
Loops also require a lot of written code. We will talk later about imperative and functional programming in R; the functional style can replace for loops in many cases. For more details see section 21.
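As a small preview of that style, here is a sketch that computes the same values three ways using only base R. The variable names are made up for illustration.

```r
# loop version: preallocate, then fill
roots_loop <- vector("double", 5)
for (i in 1:5) {
  roots_loop[i] <- sqrt(i)
}

# functional version: apply a function to each element;
# vapply() also checks that every result is a single double
roots_apply <- vapply(1:5, function(i) sqrt(i), numeric(1))

# sqrt() is itself vectorized, so here no loop is needed at all
roots_vec <- sqrt(1:5)

all.equal(roots_loop, roots_apply)
```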
Functions are super important. In theory you could just copy and paste a lot; however, writing a function has some major benefits (from r4ds):
You can give a function an evocative name that makes your code easier to understand.
As requirements change, you only need to update code in one place, instead of many.
You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
General life advice: write a function whenever you’ve copied and pasted a block of code more than twice.
You can define a function as follows
power <- function(num, exponent){
# returns num raised to the exponent
num ^ exponent
}
power(2, 3)
## [1] 8
num and exponent are called arguments. You can give them default values
power <- function(num, exponent=3){
# returns num raised to the exponent
num ^ exponent
}
power(2)
## [1] 8
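Arguments can be passed by position or by name. A short sketch, redefining power so the block runs on its own:

```r
power <- function(num, exponent = 3) {
  # returns num raised to the exponent
  num ^ exponent
}

power(2, 4)                   # positional: 2^4
power(exponent = 2, num = 5)  # named: 5^2, so the order does not matter
power(num = 2)                # exponent falls back to its default of 3
```

Naming arguments tends to make calls easier to read when a function has more than two or three parameters.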
Most statements call a function. To figure out how to use a function pull up the documentation with ?
?runif
In runif, min and max are given default values, while the argument n is not.
The value of the last expression evaluated in a function is returned. You can also return a value explicitly with return()
random_rps <- function(){
# randomly returns one of rock, paper or scissors
r <- runif(1)
# rock paper scissors
if (r < 1/3) {
return('rock')
} else if (1/3 < r && r < 2/3) {
return('paper')
} else{
return('scissors')
}
}
random_rps()
## [1] "scissors"
Often you want to check the arguments of a function to make sure they are what your function expects. Use the stop command
get_rps_winner <- function(alice, bob){
# returns the winner of a rock paper scissors match
if (!alice %in% c('rock', 'paper', 'scissors')){
stop('alice is cheating!')
}
if (! bob %in% c('rock', 'paper', 'scissors')){
stop('bob is cheating!')
}
if (alice == bob){
return('tie')
}
if (alice == 'paper'){
if(bob == 'rock'){
return('alice')
} else{
return('bob')
}
} else if (alice == 'rock'){
if(bob == 'scissors'){
return('alice')
} else{
return('bob')
}
} else{ # alice is 'scissors'
if(bob == 'paper'){
return('alice')
} else{
return('bob')
}
}
}
get_rps_winner('rock', 'paper')
## [1] "bob"
# r markdown won't run when stop is called
# get_rps_winner('sledgehammer', 'paper')
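If you want to see what an error looks like without halting the script, one option is tryCatch. The sketch below uses a hypothetical stand-in, check_move, for the argument check in get_rps_winner, so the block is self-contained.

```r
# Hypothetical stand-in for the argument check in get_rps_winner
check_move <- function(move) {
  if (!move %in% c('rock', 'paper', 'scissors')) {
    stop('invalid move: ', move)
  }
  move
}

# tryCatch() evaluates the first expression; if stop() is called, the
# error handler receives the condition object instead of halting
result <- tryCatch(
  check_move('sledgehammer'),
  error = function(e) conditionMessage(e)
)
result
```

Here result holds the error message as a string, so the rest of the script keeps running.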
Long, complicated code that jumps around a lot is harder to deal with in every way. A benefit of using functions is that you can break up the overall coding task into smaller functions. This way, you can leave all the details for each step within a single function.
Functions should be named informatively; typically a verb describing what the function is doing. Functions should also be commented well so someone else (e.g. future you) can understand what’s going on. Consider your audience to be someone who knows R but has not read your code.
Long programs are broken up into dozens (or hundreds) of helper functions. It’s a good idea to organize functions that go together into the same R script.
play_rps <- function(victory_threshold=10){
# determines the winner of a rock paper scissors series
if(victory_threshold <= 0){
stop('victory threshold must be positive')
}
alice_wins <- 0
bob_wins <- 0
# play until someone wins
while(alice_wins < victory_threshold && bob_wins < victory_threshold){
alice <- random_rps()
bob <- random_rps()
winner <- get_rps_winner(alice, bob)
if (winner == 'alice'){
alice_wins <- alice_wins + 1
}
if (winner == 'bob'){
bob_wins <- bob_wins + 1
}
}
# compare against the threshold, not a hard-coded 10
if (alice_wins == victory_threshold){
return('alice')
} else{
return('bob')
}
}
play_rps(15)
## [1] "bob"
Say we write a bunch of helper functions in a file called fun.R. You can import them with the source command
source('fun.R')
helper_fun()
## [1] "Im not a very helpful helper function"
The source command will look for R scripts relative to the current working directory.
See section 20 from r4ds for more details.
Vectors are a basic data type in R and there are two types: atomic vectors and lists. Atomic vectors are homogeneous, one dimensional objects while lists are heterogeneous, hierarchical objects.
If you are familiar with Python: atomic vectors in R are roughly like lists in Python (except that every element must have the same type), while named lists in R are roughly like dicts in Python.
An atomic vector is just a sequence of things that are the same type (integers, booleans, etc).
# numeric (c(1,2,3) is stored as double; 1:3 is stored as integer)
c(1,2,3) # 1:3
## [1] 1 2 3
# boolean
c(TRUE, FALSE, TRUE)
## [1] TRUE FALSE TRUE
# string
c('I', 'wish', 'vectors', 'were', 'named', 'lists', 'instead')
## [1] "I" "wish" "vectors" "were" "named" "lists" "instead"
There are six possible types: logical (boolean), character, complex, raw, integer and double. Integers/doubles are called numeric vectors.
The typeof command will tell you the type of a vector
typeof(rep(TRUE, 4))
## [1] "logical"
To test if a vector is a given type use the is_BLAH functions from the purrr package that comes with tidyverse (see section 20.4.2)
is_logical(c(TRUE, FALSE))
## [1] TRUE
You can explicitly coerce some types into other types with as.BLAH() functions
as.integer(c('1', '2', '3'))
## [1] 1 2 3
as.logical(c(0,1,1))
## [1] FALSE TRUE TRUE
Sometimes vectors are implicitly coerced
# logical gets implicitly coerced to numeric (double here, since 1 and 2 are doubles)
c(1, 2, TRUE)
## [1] 1 2 1
Implicit coercion sometimes happens within a function
sum(c(T, T, F))
## [1] 2
Vectorized operations are very common
c(-2, -1, 1, 2) + 2
## [1] 0 1 3 4
Often they are combined with implicit coercion
sum(c(-2, -1, 1, 2) > 0)
## [1] 2
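A few more illustrations of implicit coercion, as a sketch: when types are mixed, R converts everything to the most general type present (logical, then integer, then double, then character).

```r
x <- c(-2, -1, 1, 2)

mean(x > 0)          # proportion of positive entries: 0.5
typeof(c(1L, TRUE))  # "integer": the logical is promoted to integer
typeof(c(1, TRUE))   # "double"
typeof(c(1, 'a'))    # "character": everything becomes a string
```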
You can subset a vector with [] and a vector of indices
v <- 1:10
v[c(1,10)]
## [1] 1 10
Or a vector of booleans (the same length as the original vector)
v[v %%2 == 0]
## [1] 2 4 6 8 10
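Two other subsetting idioms that come up often, shown as a short sketch: negative indices drop elements, and which converts a logical vector into the positions that are TRUE.

```r
v <- 1:10

v[-1]               # drop the first element
v[-c(1, 10)]        # drop the first and last elements
which(v %% 2 == 0)  # positions (not values) where the condition holds
v[v > 3 & v < 7]    # combine conditions with the vectorized &
```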
Vectors can also have names (read about this in section 20).
Lists can contain objects of multiple types and can be indexed by name (in addition to being indexed by position)
L <- list(number=1, letter='a', bool=TRUE)
L
## $number
## [1] 1
##
## $letter
## [1] "a"
##
## $bool
## [1] TRUE
To access elements of a list use [[]]
L[['number']]
## [1] 1
You can use a single [] and this will return a list
L['number']
## $number
## [1] 1
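A quick way to see the difference between the two forms is to check the type of what comes back. The sketch below rebuilds the same list so it runs on its own.

```r
L <- list(number = 1, letter = 'a', bool = TRUE)

typeof(L['number'])    # "list": a shorter list containing one element
typeof(L[['number']])  # "double": the element itself
L$number               # $ is shorthand for [[ ]] with a name
L[[2]]                 # position also works: the second element
```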
See section 20.5.3 for an explanation of the difference. Lists can store other lists making them hierarchical
LoL <- list(names = list('Iain', 'Brendan', 'Varun'),
numbers=list(1:3, 1:5, 1:7))
LoL[['numbers']][[2]]
## [1] 1 2 3 4 5
This lecture covers the programming basics in R. There are a number of advanced topics (some of which we may cover later) that you should be aware of and learn more about. Almost all of these are covered in Advanced R:
testing
vectorization
recursion
debugging / profiling
environments / scoping
object oriented programming
functional programming
non-standard evaluation
dynamic programming
memory usage
using Rcpp