Practice: Vectors (part 2)

Stat 133

Author

Gaston Sanchez

Learning Objectives
  • Work with vectors of different data types
  • Create vectors of numeric sequences
  • Understand the concept of atomic vectors
  • Learn how to subset and slice R vectors

1 Vectors from data "starwars"

In this module, you are going to work with the data set starwars from the package "dplyr", which is part of the "tidyverse" ecosystem of packages.

library(tidyverse)
data(starwars)

To be more specific, we will focus on columns name, height, mass, sex, and homeworld, and we will also remove rows that contain missing values:

variables = c("name", "height", "mass", "sex", "homeworld")
dat = na.omit(starwars[ ,variables])
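na.omit() keeps only the rows that have no missing value in any column. Here is a minimal sketch on a small made-up data frame. The values are hypothetical and are not taken from starwars. It shows which rows survive, and it uses complete.cases() to show the logical vector behind the same filter.

```r
# small made-up data frame with missing values (hypothetical, not starwars)
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  mass = c(77, NA, 32, 120),
  homeworld = c("X", "Y", NA, "Z"),
  stringsAsFactors = FALSE
)

# na.omit() drops every row that has an NA in at least one column
toy_complete <- na.omit(toy)
toy_complete$name       # "A" "D"

# complete.cases() gives the logical vector behind the same filter
complete.cases(toy)     # TRUE FALSE FALSE TRUE
toy[complete.cases(toy), ]
```

Row B is dropped for its missing mass and row C for its missing homeworld, even though each has only one NA.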

1.1 Columns into vectors

Because we want to work with these five columns as vectors, we need to “break apart” the table dat into five separate vectors:

# creating 5 vectors (from columns in dat)
name = dat$name
height = dat$height
mass = dat$mass
sex = dat$sex
homeworld = dat$homeworld

Use the function typeof() to see the data type of each of the above vectors.

typeof(name)       # "character"
typeof(height)     # "integer"
typeof(mass)       # "double"
typeof(sex)        # "character"
typeof(homeworld)  # "character"
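Each of these columns is an atomic vector, so all of its elements share a single data type, and typeof() reports that type. The minimal sketch below uses literal values rather than the starwars columns. It shows the four most common atomic types and what c() does when you mix them: R coerces every element to the most flexible type present, in the order logical < integer < double < character.

```r
# one literal of each common atomic type
typeof(TRUE)    # "logical"
typeof(5L)      # "integer" (the L suffix makes an integer)
typeof(5.5)     # "double"
typeof("five")  # "character"

# mixing types in c() coerces all elements to a common type
mixed_num <- c(TRUE, 5L, 5.5)
typeof(mixed_num)  # "double"
mixed_num          # 1.0 5.0 5.5

mixed_chr <- c(5.5, "five")
typeof(mixed_chr)  # "character"
mixed_chr          # "5.5" "five"
```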

1.2 Your turn: subsetting vectors

The code below is one way to create a vector four by selecting the first four elements in name:

four = head(name, n = 4)

Single brackets [ ] are used to subset (also called subscript or slice) vectors. Without running the code, try to guess the output of each of the following commands, and then run them to check your guesses:

  1. a positive index: four[1]?
  2. an index of zero: four[0]?
  3. a negative index: four[-1]?
  4. various negative indices: four[-c(1,2,3)]?
  5. an index greater than the length of the vector: four[5]?
  6. repeated indices: four[c(1,2,2,3,3,3)]?
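After making your guesses, you can compare them with the same rules applied to a small stand-in vector. This sketch uses the built-in constant letters instead of four. The printed values therefore differ from yours, but each kind of index behaves the same way.

```r
# stand-in for `four`: the first four lowercase letters
x <- letters[1:4]        # "a" "b" "c" "d"

x[1]                     # "a": positions start at 1
x[0]                     # character(0): index 0 selects nothing
x[-1]                    # "b" "c" "d": negative indices drop positions
x[-c(1, 2, 3)]           # "d"
x[5]                     # NA: position 5 does not exist
x[c(1, 2, 2, 3, 3, 3)]   # "a" "b" "b" "c" "c" "c": indices may repeat

# head(x, n = 2) selects the same elements as x[1:2]
identical(head(x, n = 2), x[1:2])  # TRUE
```

Positive and negative indices cannot be mixed in one subscript (for example x[c(-1, 2)]). R raises an error in that case.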

Often, you will need to generate vectors of numeric sequences, such as the positions of the first five elements, 1:5, or the positions from the first to the last element, 1:length(name). R provides the colon operator : and the functions seq() and rep() to create various types of sequences.
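Here is a quick tour of these tools, plus rev(), which the next exercise also uses. The last lines add something not covered in the text: seq_along(), a base R function that returns the positions of a vector. It is a safer alternative to 1:length(v), because 1:length(v) counts down to 0 when v is empty.

```r
# colon operator: integer steps of 1
1:5                                     # 1 2 3 4 5

# seq(): custom step size, or a fixed number of values
seq(from = 2, to = 10, by = 2)          # 2 4 6 8 10
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# rep(): repeat a whole vector, or each of its elements
rep(c(1, 2), times = 3)                 # 1 2 1 2 1 2
rep(c(1, 2), each = 3)                  # 1 1 1 2 2 2

# rev(): reverse any vector
rev(1:5)                                # 5 4 3 2 1

# caution: 1:length(v) misbehaves when v is empty
v <- character(0)
1:length(v)                             # 1 0
seq_along(v)                            # integer(0)
```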

1.3 Your turn: sequences and repetitions

Figure out how to use seq(), rep(), rev(), and bracket notation to extract:

  1. the elements of name in even positions (i.e. positions 2, 4, 6, etc.)
# elements of name in even positions
name[seq(from = 2, to = length(name), by = 2)]

  2. the elements of height in odd positions (i.e. positions 1, 3, 5, etc.)
# elements of height in odd positions
height[seq(from = 1, to = length(height), by = 2)]

  3. the elements of sex in positions that are multiples of 5 (i.e. positions 5, 10, 15, etc.)
# elements of sex in positions 5, 10, 15, etc.
sex[seq(from = 5, to = length(sex), by = 5)]

  4. the elements of mass in positions 10, 20, 30, 40, etc.
# elements of mass in positions 10, 20, 30, 40, etc.
mass[seq(from = 10, to = length(mass), by = 10)]

  5. the elements of name in even positions, but this time in reverse order
# elements of name in even positions, in reverse order
rev(name[seq(from = 2, to = length(name), by = 2)])
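The answers above rely on seq(), but the exercise also mentions rep(). One way to use rep() is to build a logical mask with one TRUE or FALSE per element, which previews the logical subsetting of the next section. This sketch works on a made-up vector x rather than name. The mask approach is an alternative, not the only intended answer.

```r
# made-up stand-in vector (not the starwars data)
x <- c("a", "b", "c", "d", "e", "f", "g")

# even positions with seq(): positions 2, 4, 6
even_seq <- x[seq(from = 2, to = length(x), by = 2)]

# same selection with rep(): a mask FALSE, TRUE, FALSE, TRUE, ...
mask <- rep(c(FALSE, TRUE), length.out = length(x))
even_rep <- x[mask]

even_rep                       # "b" "d" "f"
identical(even_seq, even_rep)  # TRUE
rev(even_rep)                  # "f" "d" "b"
```

Because R recycles a logical index that is shorter than the vector, x[c(FALSE, TRUE)] selects the same elements without calling rep().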

2 Logical Subsetting and Comparisons

Another style of subsetting is so-called logical subsetting. You index a vector with a logical vector of TRUE and FALSE values, and R keeps the elements paired with TRUE. Such logical vectors typically come from comparisons, which you make with comparison operators such as:

  • > greater than
  • >= greater than or equal
  • < less than
  • <= less than or equal
  • == equal
  • != not equal

For example:

height_four <- height[1:4]

# elements greater than 100
height_four[height_four > 100]

# elements less than 100
height_four[height_four < 100]

# elements less than or equal to 10
height_four[height_four <= 10]

# elements not equal to 10
height_four[height_four != 10]
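The example above hides an intermediate step. The comparison itself returns a logical vector, and only then does bracket notation use it to keep elements. This sketch uses a made-up vector h in place of height_four, with hypothetical values, so that each step can be printed separately.

```r
# made-up stand-in for height_four (hypothetical values)
h <- c(172, 96, 202, 150)

# step 1: the comparison returns one TRUE/FALSE per element
above_100 <- h > 100
above_100        # TRUE FALSE TRUE TRUE

# step 2: bracket notation keeps the elements paired with TRUE
h[above_100]     # 172 202 150

# a comparison that is never TRUE yields an empty vector
h[h <= 10]       # numeric(0)
```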

In addition to comparison operators, you can use logical operators to produce or combine logical vectors. The most common logical operators are:

  • & AND
  • | OR
  • ! negation

Run the following commands to see what R does:

# AND
TRUE & TRUE
TRUE & FALSE
FALSE & FALSE

# OR
TRUE | TRUE
TRUE | FALSE
FALSE | FALSE

# NOT
!TRUE
!FALSE
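The same operators also work element by element on whole logical vectors, which is what makes them useful for subsetting. The sketch below also covers NA, which the module removed from dat up front. NA stands for an unknown value, so the result is NA unless the other operand alone determines it. The vectors a and b are made up for illustration.

```r
# & and | compare logical vectors element by element
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b       # TRUE FALSE FALSE FALSE
a | b       # TRUE TRUE TRUE FALSE
!a          # FALSE FALSE TRUE TRUE

# NA is "unknown": the result is NA unless the other value decides it
NA & FALSE  # FALSE
NA & TRUE   # NA
NA | TRUE   # TRUE
NA | FALSE  # NA
```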

Logical operators allow you to combine several comparisons:

# vectors for first 10 elements
name10 <- name[1:10]
height10 <- height[1:10]
mass10 <- mass[1:10]
sex10 <- sex[1:10]

# names of first 10 individuals with mass greater than 70kg
name10[mass10 > 70]

# names of first 10 individuals with heights between 150 and 200 (exclusive)
name10[height10 > 150 & height10 < 200]
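Two companions to these combined conditions come up in the exercises below. which() converts a logical vector into the positions of its TRUE values, and ! flips a condition. This sketch uses made-up stand-ins nm and ht with hypothetical values, in place of name10 and height10.

```r
# made-up stand-ins (hypothetical values, not the starwars data)
nm <- c("A", "B", "C", "D", "E")
ht <- c(172, 96, 202, 150, 180)

in_range <- ht > 150 & ht < 200
in_range             # TRUE FALSE FALSE FALSE TRUE

# which() returns the positions that are TRUE
which(in_range)      # 1 5
nm[which(in_range)]  # "A" "E", same as nm[in_range]

# ! flips the condition: everyone outside the range
nm[!in_range]        # "B" "C" "D"
```

Note that 150 is excluded because both comparisons are strict.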

2.1 Your turn: logical subsetting

Write commands, using bracket notation, to answer the following questions (you may need functions such as is.na(), sum(), min(), max(), median(), which(), which.min(), and which.max()):

  1. name of individuals from homeworld Naboo
# name of individuals from homeworld Naboo
name[homeworld == "Naboo"]

  2. name of individuals from homeworlds Naboo or Corellia; hint: the OR operator | is your friend.
# name of individuals from Naboo or Corellia
name[homeworld == "Naboo" | homeworld == "Corellia"]

  3. name of female individuals
# name of female individuals
name[sex == "female"]

  4. number (i.e. count) of male individuals; hint: the sum() function is your friend.
# number of male individuals
sum(sex == "male")

  5. name of the individual with the largest mass; hint: the which.max() function is your friend.
# name of the individual with the largest mass
name[which.max(mass)]

  6. largest height among all females; hint: the max() function is your friend.
# largest height among all females
max(height[sex == "female"])

  7. name of individual(s) with height equal to the median height; hint: the median() function is your friend.
# name of individual(s) with height equal to the median height
name[height == median(height)]

  8. name of individual(s) with height of at most 180 AND mass of at least 120; hint: the logical AND operator & is your friend.
# name of individual(s) with height of at most 180 and mass of at least 120
name[height <= 180 & mass >= 120]
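A few variations on these answers are worth knowing. The %in% operator tests membership in a set and avoids chaining several == comparisons with |. which.max() returns only the first position of the maximum, while comparing against max() returns every tied element. Summing a logical vector counts its TRUE values. This sketch uses made-up stand-ins nm, world, and wt with hypothetical values, in place of name, homeworld, and mass.

```r
# made-up stand-ins (hypothetical values, not the starwars data)
nm    <- c("A", "B", "C", "D")
world <- c("Naboo", "Tatooine", "Corellia", "Naboo")
wt    <- c(80, 120, 120, 45)

# %in%: an alternative to world == "Naboo" | world == "Corellia"
nm[world %in% c("Naboo", "Corellia")]  # "A" "C" "D"

# which.max() keeps only the FIRST position of the maximum...
nm[which.max(wt)]                      # "B"
# ...while comparing against max() keeps every tied individual
nm[wt == max(wt)]                      # "B" "C"

# summing a logical vector counts the TRUEs
sum(world == "Naboo")                  # 2
```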