library(tidyverse)
data(starwars)
Practice: Vectors (part 2)
Stat 133
- Work with vectors of different data types
- Create vectors of numeric sequences
- Understand the concept of atomic vectors
- Learn how to subset and slice R vectors
1 Vectors from data "starwars"
In this module, you are going to work with the data set starwars from the package "dplyr", which is part of the "tidyverse" ecosystem of packages. To be more specific, we will focus on the columns name, height, mass, sex, and homeworld. We will also remove rows that contain missing values:
variables = c("name", "height", "mass", "sex", "homeworld")
dat = na.omit(starwars[ , variables])
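The order of the two steps above matters: na.omit() drops a row if any of the selected columns is missing. Selecting columns first therefore keeps rows whose only NAs are in columns you do not need. The sketch below shows this on a small hypothetical data frame, not on the real starwars data, so it runs with base R alone.

```r
# a hypothetical toy data frame (not the real starwars data) with a few NAs
toy <- data.frame(
  name   = c("A", "B", "C", "D"),
  height = c(172, NA, 96, 202),
  mass   = c(77, 75, NA, 136)
)

# select the columns of interest first, then drop incomplete rows
cols <- c("name", "height")
complete <- na.omit(toy[ , cols])
complete
nrow(complete)      # 3: only row "B" had a missing height

# dropping NAs over ALL columns also removes row "C" (missing mass)
nrow(na.omit(toy))  # 2
```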
1.1 Columns into vectors
Because we want to work with these five columns as vectors, we need to "break apart" the table dat into five vectors:
# creating 5 vectors (from columns in dat)
name = dat$name
height = dat$height
mass = dat$mass
sex = dat$sex
homeworld = dat$homeworld
Use the function typeof() to see the data type of each of the above vectors.
typeof(name)
typeof(height)
typeof(mass)
typeof(sex)
typeof(homeworld)
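typeof() also helps explain what an atomic vector is: every element of an atomic vector has the same data type. When you combine values of different types with c(), R coerces them all to a common type. The vectors below are hypothetical examples chosen to illustrate this.

```r
# typeof() on small hypothetical vectors
typeof(c(172L, 96L))       # "integer"  (the L suffix creates integers)
typeof(c(77, 32.5))        # "double"
typeof(c("Luke", "Leia"))  # "character"
typeof(c(TRUE, FALSE))     # "logical"

# atomic vectors hold a single data type, so mixing types coerces them
mixed <- c(1, "two", TRUE)
mixed                # "1" "two" "TRUE"
typeof(mixed)        # "character"

typeof(c(1L, 2.5))   # "double": the integer is promoted to double
typeof(c(TRUE, 2L))  # "integer": TRUE becomes 1L
```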
1.2 Your turn: subsetting vectors
The code below is one way to create a vector four by selecting the first four elements in name:
four = head(name, n = 4)
Single brackets [ ] are used to subset (i.e. subscript, split, slice) vectors. Without running the code, try to guess the output of the following commands. Then run them to check your guesses:
- number one: four[1]
- an index of zero: four[0]
- a negative index: four[-1]
- various negative indices: four[-c(1,2,3)]
- an index greater than the length of the vector: four[5]
- repeated indices: four[c(1,2,2,3,3,3)]
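After you have checked your guesses, you can use the sketch below as a summary of the same rules. It uses a hypothetical four-element vector x that plays the role of four.

```r
# a hypothetical four-element vector, analogous to four
x <- c("w", "x", "y", "z")

x[2]           # "x": positions start at 1
x[0]           # character(0): an empty vector of the same type
x[-2]          # "w" "y" "z": a negative index drops that position
x[-c(1, 2)]    # "y" "z"
x[6]           # NA: an out-of-range position gives a missing value
x[c(4, 4, 1)]  # "z" "z" "w": indices can repeat and reorder
```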
Often, you will need to generate vectors of numeric sequences. Examples are the positions of the first five elements, 1:5, or the positions from the first to the last element, 1:length(name). To create various types of sequences, R provides the colon operator : and the functions seq() and rep().
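Here is a quick sketch of the main arguments of seq() and rep(), plus rev(), which you will need in the next exercise. It also shows a small caveat with 1:length(x): when x is empty, the colon operator counts down and returns c(1, 0). The base function seq_along() avoids this problem.

```r
seq(from = 1, to = 10, by = 3)         # 1 4 7 10
seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
rep(c(1, 2), times = 3)                # 1 2 1 2 1 2
rep(c(1, 2), each = 2)                 # 1 1 2 2
rev(1:5)                               # 5 4 3 2 1

# 1:length(x) misbehaves when x is empty; seq_along() does not
empty <- character(0)
1:length(empty)   # 1 0  (two indices, although there are no elements)
seq_along(empty)  # integer(0)
```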
1.3 Your turn: sequences and repetitions
Figure out how to use seq(), rep(), rev(), and bracket notation to extract:
- all the even elements in name (i.e. extract positions 2, 4, 6, etc.)
# all the even elements in name
name[seq(from = 2, to = length(name), by = 2)]
- all the odd elements in height (i.e. extract positions 1, 3, 5, etc.)
# all the odd elements in height
height[seq(from = 1, to = length(height), by = 2)]
- the elements of sex in positions that are multiples of 5 (i.e. positions 5, 10, 15, etc.)
# all multiples of 5 (e.g. 5, 10, 15, etc) of sex
sex[seq(from = 5, to = length(sex), by = 5)]
- elements in positions 10, 20, 30, 40, etc. of mass
# elements in positions 10, 20, 30, 40, etc of mass
mass[seq(from = 10, to = length(mass), by = 10)]
- all the even elements in name, but this time in reverse order
# all the even elements in name but this time in reverse order
rev(name[seq(from = 2, to = length(name), by = 2)])
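The answers above only use seq() and rev(). rep() offers a different route: you can build a logical vector that is TRUE at the positions you want and use it inside the brackets. The sketch below is one possible approach, not the only intended answer. It uses the hypothetical vector letters[1:12] in place of name or sex.

```r
# a hypothetical vector standing in for name or sex
x <- letters[1:12]

# TRUE at even positions, FALSE at odd positions
even_mask <- rep(c(FALSE, TRUE), length.out = length(x))
x[even_mask]       # "b" "d" "f" "h" "j" "l"
rev(x[even_mask])  # "l" "j" "h" "f" "d" "b"

# TRUE at every 5th position (positions 5, 10, ...)
fifth_mask <- rep(c(rep(FALSE, 4), TRUE), length.out = length(x))
x[fifth_mask]      # "e" "j"
```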
2 Logical Subsetting and Comparisons
Another kind of subsetting is logical subsetting. Here the brackets contain a logical vector (TRUE/FALSE values), and only the elements paired with TRUE are kept. Such logical vectors are typically produced by comparisons, that is, by using comparison operators such as:
- >   greater than
- >=  greater than or equal to
- <   less than
- <=  less than or equal to
- ==  equal to
- !=  not equal to (different)
For example:
height_four <- height[1:4]

# elements greater than 100
height_four[height_four > 100]

# elements less than 100
height_four[height_four < 100]

# elements less than or equal to 10
height_four[height_four <= 10]

# elements different from 10
height_four[height_four != 10]
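To see what happens inside the brackets, you can print the comparison by itself. It returns a logical vector of the same length, and the brackets keep the TRUE positions. The sketch below uses hypothetical heights. It also shows one caveat: if a vector contains NA, the comparison yields NA, and the bracket returns NA for that position. which() keeps only the TRUE positions. This does not arise with dat, because its NAs were removed, but it matters for data that still has them.

```r
# hypothetical heights, not taken from starwars
h <- c(172, 167, 96, 202)

h > 100                         # TRUE TRUE FALSE TRUE
h[h > 100]                      # 172 167 202
h[c(TRUE, TRUE, FALSE, TRUE)]   # same result, written out by hand

# with a missing value, the comparison itself returns NA
h_na <- c(172, NA, 96)
h_na > 100                      # TRUE NA FALSE
h_na[h_na > 100]                # 172 NA
# which() returns only the positions that are TRUE
h_na[which(h_na > 100)]         # 172
```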
In addition to comparison operators, you can use logical operators to produce a logical vector. The most common logical operators are:
- &   AND
- |   OR
- !   negation (NOT)
Run the following commands to see what R does:
# AND
TRUE & TRUE
TRUE & FALSE
FALSE & FALSE
# OR
TRUE | TRUE
TRUE | FALSE
FALSE | FALSE
# NOT
!TRUE
!FALSE
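These operators are vectorized: applied to two logical vectors, & and | compare them element by element. This is what lets them combine comparisons on whole vectors. The short sketch below uses hypothetical vectors a and b. It also shows how NA ("unknown") behaves: the result is NA unless the other operand already decides the answer.

```r
# & and | work element by element on logical vectors
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b       # TRUE FALSE FALSE FALSE
a | b       # TRUE TRUE TRUE FALSE
!a          # FALSE FALSE TRUE TRUE

# NA propagates unless the other operand determines the result
NA & FALSE  # FALSE
NA | TRUE   # TRUE
NA & TRUE   # NA
```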
Logical operators allow you to combine several comparisons:
# vectors for first 10 elements
name10 <- name[1:10]
height10 <- height[1:10]
mass10 <- mass[1:10]
sex10 <- sex[1:10]

# names of first 10 individuals with mass greater than 70 kg
name10[mass10 > 70]

# names of first 10 individuals with heights between 150 and 200 (exclusive)
name10[height10 > 150 & height10 < 200]
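The sketch below extends the last example with hypothetical names and heights. It contrasts an exclusive range with an inclusive one, uses ! to select everything outside a range, and uses | to combine two conditions.

```r
# hypothetical names and heights, not from starwars
nm <- c("Ann", "Bo", "Cy", "Di", "Ed")
ht <- c(150, 96, 202, 183, 200)

# exclusive range: 150 and 200 themselves are left out
nm[ht > 150 & ht < 200]             # "Di"
# inclusive range
nm[ht >= 150 & ht <= 200]           # "Ann" "Di" "Ed"
# outside the inclusive range, using negation
nm[!(ht >= 150 & ht <= 200)]        # "Bo" "Cy"
# shortest OR tallest
nm[ht == min(ht) | ht == max(ht)]   # "Bo" "Cy"
```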
2.1 Your turn: logical subsetting
Write commands, using bracket notation, to answer the following questions (you may need to use is.na(), min(), max(), which(), which.min(), which.max()):
- name of individuals from homeworld Naboo
# name of individuals from homeworld Naboo
name[homeworld == "Naboo"]
- name of individuals from homeworlds Naboo or Corellia; hint: the OR operator | is your friend.
# name of individuals from Naboo or Corellia
name[homeworld == "Naboo" | homeworld == "Corellia"]
- name of female individuals
# name of female individuals
name[sex == "female"]
- number (i.e. count) of male individuals; hint: the sum() function is your friend.
# number of male individuals
sum(sex == "male")
- name of individuals with largest mass; hint: the which.max() function is your friend.
# name of individuals with largest mass
name[which.max(mass)]
- largest height of all females; hint: the max() function is your friend.
# largest height of all females
max(height[sex == "female"])
- name of individual(s) with height equal to the median height; hint: the median() function is your friend.
# name of individual(s) with height equal to the median height
name[height == median(height)]
- name of individual(s) with height of at most 180 AND mass of at least 120; hint: the logical AND operator & is your friend.
# name of individual(s) with height of at most 180, and mass of at least 120
name[height <= 180 & mass >= 120]
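Two refinements of the answers above are worth knowing. First, the base operator %in% tests membership in a set of values. It is a compact alternative to chaining == with |. Second, which.max() returns only the first position of the maximum. Comparing against max() keeps every tied individual. The sketch below illustrates both, plus counting with sum(), on hypothetical toy vectors standing in for name, homeworld, and mass.

```r
# hypothetical toy vectors standing in for name, homeworld, and mass
nm    <- c("Ann", "Bo", "Cy", "Di")
world <- c("Naboo", "Tatooine", "Corellia", "Naboo")
ms    <- c(80, 120, 120, 75)

# %in% gives the same result as chaining == with |
nm[world %in% c("Naboo", "Corellia")]       # "Ann" "Cy" "Di"
nm[world == "Naboo" | world == "Corellia"]  # "Ann" "Cy" "Di"

# which.max() returns only the FIRST position of the maximum
nm[which.max(ms)]   # "Bo"
# comparing against max() keeps every tie
nm[ms == max(ms)]   # "Bo" "Cy"

# sum() on a logical vector counts the TRUE values
sum(world == "Naboo")  # 2
```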