= 5
n = 0:n
k
# individual terms
= rep(1/(2^k))
terms terms
[1] 1.00000 0.50000 0.25000 0.12500 0.06250 0.03125
# series
sum(terms)
[1] 1.96875
Stat 133
In this module, we review various constructs and idioms in R to handle iterative computations:
for()
loopwhile()
looprepeat
loopapply()
, lapply()
, sapply()
functionssweep()
Many times we need to perform a procedure several times
In other words, we have to perform the same operation several times as long as some condition is fulfilled
For this purpose we use loops
The main idea is that of iteration or repetition
R provides three basic paradigms to handle this situations: for()
loops, while
loops, and repeat
loops.
Consider the following summation series:
\[ \sum_{k=0}^{n} \frac{1}{2^k} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots + \frac{1}{2^n} \]
Assuming \(n = 5\), we can compute the summation series using vectorized code. First, we generate each of the individual terms \(1/2^k\) with values for \(k = 0, 1, \dots, n\). And then we add those terms:
= 5
n = 0:n
k
# individual terms
= rep(1/(2^k))
terms terms
[1] 1.00000 0.50000 0.25000 0.12500 0.06250 0.03125
# series
sum(terms)
[1] 1.96875
for
loopFor learning purposes, we are going to ask you to forget about vectorization for a moment. And instead let’s see how to use a for
loop.
In the following loop, we generate each individual term
at each iteration, storing them in a vector terms
. We also accumulate each term
in the object series_sum
:
# input (5 terms)
= 5
n
# initialize terms and series objects
= 0
terms = 0
series_sum
# generate individual terms and accumulate them
for (k in 0:n) {
<- 1 / (2^k)
term print(term)
+1] <- term
terms[k= series_sum + term
series_sum }
[1] 1
[1] 0.5
[1] 0.25
[1] 0.125
[1] 0.0625
[1] 0.03125
series_sum
[1] 1.96875
Consider the following arithmetic series:
\[ a_n = a_1 + (n-1)d, \qquad n = 2, 3, \dots \]
For instance, when \(a_1 = 3\), \(d = 3\), the terms \(a_2, \dots, a_5\) are given by:
\[\begin{align*} a_2 &= a_1 + (2 - 1) d = 6 \\ a_3 &= a_2 + (3 - 1) d = 9 \\ a_4 &= a_3 + (4 - 1) d = 12 \\ a_5 &= a_4 + (5 - 1) d = 15 \\ \end{align*}\]
for
loopHere’s one way to obtain the above series using a for
loop:
# Arithmetic Series
= 3
a1 = 3
d = 5
num
# output
= rep(0, num)
series
# iterations
for (n in 1:num) {
= a1 + (n-1)*d
an <- an
series[n]
}
series
[1] 3 6 9 12 15
A sequence such as \(3, 6, 12, 24, 48\) is an example of a geometric sequence. In this type of sequence, the \(n\)-th term is obtained as:
\[ a_n = a_1 \times r^{n-1} \]
where: \(a_1\) is the first term, \(r\) is the common ratio, and \(n\) is the number of terms.
for
loopWrite a for loop to compute the sum of the first \(n\) terms of: 3 + 6 + 12 + 24 + …
Obtain the geometric series up to \(n = 10\).
# inputs
= 3
a1 = 2
r = 10
num
# output
= rep(0, num)
series
# iterations
for (n in 1:num) {
= a1 * r^(n-1)
an <- an
series[n] }
As you know, the average of \(n\) numbers \(x_1, x_2, \dots, x_n\) is given by the following formula:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n} \]
Write R code, using both a for
loop, and a while
loop, to compute the average of the vector x = 1:100
. Don’t use sum()
or mean()
.
= 1:100
x = length(x)
n
= 0
sum_entries
for (i in 1:n) {
<- sum_entries + x[i]
sum_entries
}
= sum_entries / n avg
apply()
and sweep()
functionsIn this part we want to show you some interesting and convenient functions in R for applying a function to the elements of various kinds of objects.
apply()
: apply a function to the elements of an array
(e.g. a matrix
)
lapply()
: apply a function to the elements of a list
sapply()
: simplified apply a function to the elements of a list
Consider the following matrix (based on data frame mtcars
)
= as.matrix(mtcars[1:10,1:5])
mat mat
mpg cyl disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
Duster 360 14.3 8 360.0 245 3.21
Merc 240D 24.4 4 146.7 62 3.69
Merc 230 22.8 4 140.8 95 3.92
Merc 280 19.2 6 167.6 123 3.92
A common statistical operation involves computing summary statistics (e.g. mean, median, min, max) of the variables in a table. For example, say you want to calculate column-means. This could be done using a for
loop that iterates over all columns, computing the mean for each of them:
# pre-allocate (i.e. initialize) vector of means
= c(0, ncol(mat))
col_means
# iterate over all columns of mat
for (j in 1:ncol(mat)) {
= mean(mat[ ,j])
col_means[j]
}
# assign names
names(col_means) = colnames(mat)
col_means
mpg cyl disp hp drat
20.370 5.800 208.610 122.800 3.538
apply()
Interestingly, instead of using a loop, you can also use apply()
which allows you to apply a function to the columns, or the rows, or both cols-rows, of a matrix. The three main arguments of apply()
are:
X
: input array (including a matrix)
MARGIN
: dimension index (1 = rows, 2 = columns) on which the function will be applied
FUN
: the function to be applied
So, to obtain the column means of mat
, we apply the mean()
function (FUN = mean
) to each column (MARGIN = 2
) of the input matrix mat
= apply(X = mat, MARGIN = 2, FUN = mean)
col_means col_means
mpg cyl disp hp drat
20.370 5.800 208.610 122.800 3.538
apply()
Sometimes you may want to apply a computation for which R has no built-in function. This can be done by passing an anonymous function to the argument FUN
. Recall that an anonymous function is a function that has no name. Here’s an example to calculate column-means with our own anonymous function:
= apply(
col_avgs X = mat,
MARGIN = 2,
FUN = function(x) sum(x) / length(x)
)
col_avgs
mpg cyl disp hp drat
20.370 5.800 208.610 122.800 3.538
If the body of the anonymous function involves several lines of code, you just have to use curly braces to wrap the body of the function:
= apply(
col_avgs X = mat,
MARGIN = 2,
FUN = function(x) {
= length(x)
n sum(x) / n
}
)
col_avgs
mpg cyl disp hp drat
20.370 5.800 208.610 122.800 3.538
Alternatively, you can create your own (non-anonymous) function first, and then pass it to apply()
= function(x, na.rm = FALSE) {
average if (na.rm) {
= x[!is.na(x)]
x
}sum(x) / length(x)
}
= apply(X = mat, MARGIN = 2, FUN = average)
col_avgs col_avgs
mpg cyl disp hp drat
20.370 5.800 208.610 122.800 3.538
Notice that the function average()
takes two arguments: x
and na.rm
. When we pass a function to apply()
that takes more than one argument, these additional arguments can also be provided to apply()
. The way you do this is by specifying them after the argument FUN
, for example:
= apply(X = mat, MARGIN = 2, FUN = average, na.rm = TRUE)
col_avgs col_avgs
mpg cyl disp hp drat
20.370 5.800 208.610 122.800 3.538
sweep()
exampleHaving obtained column-means, we can mean-center the values in mat
, that is, compute deviations from the mean for each column. Again, you could write a loop to do this:
# pre-allocate (i.e. initialize) matrix object
= matrix(0, nrow = nrow(mat), ncol = ncol(mat))
mat_centered
for (j in 1:ncol(mat)) {
= mat[ ,j] - col_means[j]
mat_centered[ ,j]
} mat_centered
[,1] [,2] [,3] [,4] [,5]
[1,] 0.63 0.2 -48.61 -12.8 0.362
[2,] 0.63 0.2 -48.61 -12.8 0.362
[3,] 2.43 -1.8 -100.61 -29.8 0.312
[4,] 1.03 0.2 49.39 -12.8 -0.458
[5,] -1.67 2.2 151.39 52.2 -0.388
[6,] -2.27 0.2 16.39 -17.8 -0.778
[7,] -6.07 2.2 151.39 122.2 -0.328
[8,] 4.03 -1.8 -61.91 -60.8 0.152
[9,] 2.43 -1.8 -67.81 -27.8 0.382
[10,] -1.17 0.2 -41.01 0.2 0.382
The same can be accomplished without writing a loop thanks to the function sweep()
= sweep(mat, MARGIN = 2, STATS = col_means, FUN = "-")
mat_centered mat_centered
mpg cyl disp hp drat
Mazda RX4 0.63 0.2 -48.61 -12.8 0.362
Mazda RX4 Wag 0.63 0.2 -48.61 -12.8 0.362
Datsun 710 2.43 -1.8 -100.61 -29.8 0.312
Hornet 4 Drive 1.03 0.2 49.39 -12.8 -0.458
Hornet Sportabout -1.67 2.2 151.39 52.2 -0.388
Valiant -2.27 0.2 16.39 -17.8 -0.778
Duster 360 -6.07 2.2 151.39 122.2 -0.328
Merc 240D 4.03 -1.8 -61.91 -60.8 0.152
Merc 230 2.43 -1.8 -67.81 -27.8 0.382
Merc 280 -1.17 0.2 -41.01 0.2 0.382
MARGIN = 2
indicates sweeping column-by-columnSTATS
is the statistic to be taking into accountFUN
is the function applied (with the given STATS
)apply()
Consider the matrix mat
defined above.
apply()
to get a vector with column maxima (see below).= apply(mat, MARGIN = 2, FUN = max) col_max
mpg cyl disp hp drat
24.40 8.00 360.00 245.00 3.92
sweep()
to scale the columns of mat
by dividing them by the column maxima (see below).= sweep(mat, MARGIN = 2, STATS = col_max, FUN = "/") mat_scaled
mpg cyl disp hp drat
Mazda RX4 0.8606557 0.75 0.4444444 0.4489796 0.9948980
Mazda RX4 Wag 0.8606557 0.75 0.4444444 0.4489796 0.9948980
Datsun 710 0.9344262 0.50 0.3000000 0.3795918 0.9821429
Hornet 4 Drive 0.8770492 0.75 0.7166667 0.4489796 0.7857143
Hornet Sportabout 0.7663934 1.00 1.0000000 0.7142857 0.8035714
Valiant 0.7418033 0.75 0.6250000 0.4285714 0.7040816
Duster 360 0.5860656 1.00 1.0000000 1.0000000 0.8188776
Merc 240D 1.0000000 0.50 0.4075000 0.2530612 0.9413265
Merc 230 0.9344262 0.50 0.3911111 0.3877551 1.0000000
Merc 280 0.7868852 0.75 0.4655556 0.5020408 1.0000000