Practice: Intro to Iterations

Stat 133

Author

Gaston Sanchez

1 Introduction

In this module, we review various constructs and idioms in R to handle iterative computations:

for() loop
while() loop
repeat loop
apply(), lapply(), sapply() functions
sweep()

2 About Loops

Many times we need to perform a procedure several times
In other words, we have to perform the same operation several times as long as some condition is fulfilled
For this purpose we use loops
The main idea is that of iteration or repetition
R provides three basic paradigms to handle this situations: for() loops, while loops, and repeat loops.

2.1 Example: Summation Series

Consider the following summation series:

\[ \sum_{k=0}^{n} \frac{1}{2^k} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots + \frac{1}{2^n} \]

2.1.1 Using vectorized code

Assuming \(n = 5\), we can compute the summation series using vectorized code. First, we generate each of the individual terms \(1/2^k\) with values for \(k = 0, 1, \dots, n\). And then we add those terms:

n = 5
k = 0:n

# individual terms
terms = rep(1/(2^k))
terms

[1] 1.00000 0.50000 0.25000 0.12500 0.06250 0.03125

# series
sum(terms)

[1] 1.96875

2.1.2 Using a `for` loop

For learning purposes, we are going to ask you to forget about vectorization for a moment. And instead let’s see how to use a for loop.

In the following loop, we generate each individual term at each iteration, storing them in a vector terms. We also accumulate each term in the object series_sum:

# input (5 terms)
n = 5

# initialize terms and series objects
terms = 0
series_sum = 0

# generate individual terms and accumulate them
for (k in 0:n) {
  term <- 1 / (2^k)
  print(term)
  terms[k+1] <- term
  series_sum = series_sum + term
}

[1] 1
[1] 0.5
[1] 0.25
[1] 0.125
[1] 0.0625
[1] 0.03125

series_sum

[1] 1.96875

2.2 Example: Arithmetic Series

Consider the following arithmetic series:

\[ a_n = a_1 + (n-1)d, \qquad n = 2, 3, \dots \]

For instance, when \(a_1 = 3\), \(d = 3\), the terms \(a_2, \dots, a_5\) are given by:

\[\begin{align*} a_2 &= a_1 + (2 - 1) d = 6 \\ a_3 &= a_2 + (3 - 1) d = 9 \\ a_4 &= a_3 + (4 - 1) d = 12 \\ a_5 &= a_4 + (5 - 1) d = 15 \\ \end{align*}\]

2.2.1 Using a `for` loop

Here’s one way to obtain the above series using a for loop:

# Arithmetic Series
a1 = 3
d = 3
num = 5

# output
series = rep(0, num)

# iterations
for (n in 1:num) {
  an = a1 + (n-1)*d
  series[n] <- an
}

series

[1]  3  6  9 12 15

2.3 Geometric Sequence

A sequence such as \(3, 6, 12, 24, 48\) is an example of a geometric sequence. In this type of sequence, the \(n\)-th term is obtained as:

\[ a_n = a_1 \times r^{n-1} \]

where: \(a_1\) is the first term, \(r\) is the common ratio, and \(n\) is the number of terms.

2.3.1 Your Turn: `for` loop

Write a for loop to compute the sum of the first \(n\) terms of: 3 + 6 + 12 + 24 + …
Obtain the geometric series up to \(n = 10\).

Show answer

# inputs
a1 = 3
r = 2
num = 10

# output
series = rep(0, num)

# iterations
for (n in 1:num) {
  an = a1 * r^(n-1)
  series[n] <- an
}

2.4 Average

As you know, the average of \(n\) numbers \(x_1, x_2, \dots, x_n\) is given by the following formula:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n} \]

2.4.1 Your Turn: Average

Write R code, using both a for loop, and a while loop, to compute the average of the vector x = 1:100. Don’t use sum() or mean().

Show answer

x = 1:100
n = length(x)

sum_entries = 0

for (i in 1:n) {
  sum_entries <- sum_entries + x[i]
}

avg = sum_entries / n

3 `apply()` and `sweep()` functions

In this part we want to show you some interesting and convenient functions in R for applying a function to the elements of various kinds of objects.

apply(): apply a function to the elements of an array (e.g. a matrix)
lapply(): apply a function to the elements of a list
sapply(): simplified apply a function to the elements of a list

3.1 Example: Column-means with a loop

Consider the following matrix (based on data frame mtcars)

mat = as.matrix(mtcars[1:10,1:5])
mat

                   mpg cyl  disp  hp drat
Mazda RX4         21.0   6 160.0 110 3.90
Mazda RX4 Wag     21.0   6 160.0 110 3.90
Datsun 710        22.8   4 108.0  93 3.85
Hornet 4 Drive    21.4   6 258.0 110 3.08
Hornet Sportabout 18.7   8 360.0 175 3.15
Valiant           18.1   6 225.0 105 2.76
Duster 360        14.3   8 360.0 245 3.21
Merc 240D         24.4   4 146.7  62 3.69
Merc 230          22.8   4 140.8  95 3.92
Merc 280          19.2   6 167.6 123 3.92

A common statistical operation involves computing summary statistics (e.g. mean, median, min, max) of the variables in a table. For example, say you want to calculate column-means. This could be done using a for loop that iterates over all columns, computing the mean for each of them:

# pre-allocate (i.e. initialize) vector of means
col_means = c(0, ncol(mat))

# iterate over all columns of mat
for (j in 1:ncol(mat)) {
  col_means[j] = mean(mat[ ,j])
}

# assign names
names(col_means) = colnames(mat)
col_means

    mpg     cyl    disp      hp    drat 
 20.370   5.800 208.610 122.800   3.538

3.2 Example: Column-means with `apply()`

Interestingly, instead of using a loop, you can also use apply() which allows you to apply a function to the columns, or the rows, or both cols-rows, of a matrix. The three main arguments of apply() are:

X: input array (including a matrix)
MARGIN: dimension index (1 = rows, 2 = columns) on which the function will be applied
FUN: the function to be applied

So, to obtain the column means of mat, we apply the mean() function (FUN = mean) to each column (MARGIN = 2) of the input matrix mat

col_means = apply(X = mat, MARGIN = 2, FUN = mean)
col_means

    mpg     cyl    disp      hp    drat 
 20.370   5.800 208.610 122.800   3.538

3.3 Example: Column-means with `apply()`

Sometimes you may want to apply a computation for which R has no built-in function. This can be done by passing an anonymous function to the argument FUN. Recall that an anonymous function is a function that has no name. Here’s an example to calculate column-means with our own anonymous function:

col_avgs = apply(
  X = mat, 
  MARGIN = 2, 
  FUN = function(x) sum(x) / length(x)
)

col_avgs

    mpg     cyl    disp      hp    drat 
 20.370   5.800 208.610 122.800   3.538

If the body of the anonymous function involves several lines of code, you just have to use curly braces to wrap the body of the function:

col_avgs = apply(
  X = mat, 
  MARGIN = 2, 
  FUN = function(x) {
    n = length(x)
    sum(x) / n
  }
)

col_avgs

    mpg     cyl    disp      hp    drat 
 20.370   5.800 208.610 122.800   3.538

Alternatively, you can create your own (non-anonymous) function first, and then pass it to apply()

average = function(x, na.rm = FALSE) {
  if (na.rm) {
    x = x[!is.na(x)]
  }
  sum(x) / length(x)
}

col_avgs = apply(X = mat, MARGIN = 2, FUN = average)
col_avgs

    mpg     cyl    disp      hp    drat 
 20.370   5.800 208.610 122.800   3.538

Notice that the function average() takes two arguments: x and na.rm. When we pass a function to apply() that takes more than one argument, these additional arguments can also be provided to apply(). The way you do this is by specifying them after the argument FUN, for example:

col_avgs = apply(X = mat, MARGIN = 2, FUN = average, na.rm = TRUE)
col_avgs

    mpg     cyl    disp      hp    drat 
 20.370   5.800 208.610 122.800   3.538

3.4 `sweep()` example

Having obtained column-means, we can mean-center the values in mat, that is, compute deviations from the mean for each column. Again, you could write a loop to do this:

# pre-allocate (i.e. initialize) matrix object
mat_centered = matrix(0, nrow = nrow(mat), ncol = ncol(mat))

for (j in 1:ncol(mat)) {
  mat_centered[ ,j] = mat[ ,j] - col_means[j]
}
mat_centered

       [,1] [,2]    [,3]  [,4]   [,5]
 [1,]  0.63  0.2  -48.61 -12.8  0.362
 [2,]  0.63  0.2  -48.61 -12.8  0.362
 [3,]  2.43 -1.8 -100.61 -29.8  0.312
 [4,]  1.03  0.2   49.39 -12.8 -0.458
 [5,] -1.67  2.2  151.39  52.2 -0.388
 [6,] -2.27  0.2   16.39 -17.8 -0.778
 [7,] -6.07  2.2  151.39 122.2 -0.328
 [8,]  4.03 -1.8  -61.91 -60.8  0.152
 [9,]  2.43 -1.8  -67.81 -27.8  0.382
[10,] -1.17  0.2  -41.01   0.2  0.382

The same can be accomplished without writing a loop thanks to the function sweep()

mat_centered = sweep(mat, MARGIN = 2, STATS = col_means, FUN = "-")
mat_centered

                    mpg  cyl    disp    hp   drat
Mazda RX4          0.63  0.2  -48.61 -12.8  0.362
Mazda RX4 Wag      0.63  0.2  -48.61 -12.8  0.362
Datsun 710         2.43 -1.8 -100.61 -29.8  0.312
Hornet 4 Drive     1.03  0.2   49.39 -12.8 -0.458
Hornet Sportabout -1.67  2.2  151.39  52.2 -0.388
Valiant           -2.27  0.2   16.39 -17.8 -0.778
Duster 360        -6.07  2.2  151.39 122.2 -0.328
Merc 240D          4.03 -1.8  -61.91 -60.8  0.152
Merc 230           2.43 -1.8  -67.81 -27.8  0.382
Merc 280          -1.17  0.2  -41.01   0.2  0.382

MARGIN = 2 indicates sweeping column-by-column
STATS is the statistic to be taking into account
FUN is the function applied (with the given STATS)

3.5 Your Turn: `apply()`

Consider the matrix mat defined above.

Use apply() to get a vector with column maxima (see below).

Show answer

col_max = apply(mat, MARGIN = 2, FUN = max)

   mpg    cyl   disp     hp   drat 
 24.40   8.00 360.00 245.00   3.92

Use sweep() to scale the columns of mat by dividing them by the column maxima (see below).

Show answer

mat_scaled = sweep(mat, MARGIN = 2, STATS = col_max, FUN = "/")

                        mpg  cyl      disp        hp      drat
Mazda RX4         0.8606557 0.75 0.4444444 0.4489796 0.9948980
Mazda RX4 Wag     0.8606557 0.75 0.4444444 0.4489796 0.9948980
Datsun 710        0.9344262 0.50 0.3000000 0.3795918 0.9821429
Hornet 4 Drive    0.8770492 0.75 0.7166667 0.4489796 0.7857143
Hornet Sportabout 0.7663934 1.00 1.0000000 0.7142857 0.8035714
Valiant           0.7418033 0.75 0.6250000 0.4285714 0.7040816
Duster 360        0.5860656 1.00 1.0000000 1.0000000 0.8188776
Merc 240D         1.0000000 0.50 0.4075000 0.2530612 0.9413265
Merc 230          0.9344262 0.50 0.3911111 0.3877551 1.0000000
Merc 280          0.7868852 0.75 0.4655556 0.5020408 1.0000000

1 Introduction

2 About Loops

2.1 Example: Summation Series

2.1.1 Using vectorized code

2.1.2 Using a for loop

2.2 Example: Arithmetic Series

2.2.1 Using a for loop

2.3 Geometric Sequence

2.3.1 Your Turn: for loop

2.4 Average

2.4.1 Your Turn: Average

3 apply() and sweep() functions

3.1 Example: Column-means with a loop

3.2 Example: Column-means with apply()

3.3 Example: Column-means with apply()

3.4 sweep() example

3.5 Your Turn: apply()

2.1.2 Using a `for` loop

2.2.1 Using a `for` loop

2.3.1 Your Turn: `for` loop

3 `apply()` and `sweep()` functions

3.2 Example: Column-means with `apply()`

3.3 Example: Column-means with `apply()`

3.4 `sweep()` example

3.5 Your Turn: `apply()`