Stat 133
In this module, we are going to use data from the so-called Terrestrial planets: Mercury, Venus, Earth, and Mars. They get this name because they are “Earth-like” planets, in contrast to the Jovian planets, which are similar to Jupiter (i.e. Jupiter, Saturn, Uranus and Neptune). The main characteristics of terrestrial planets are that they are relatively small in size and in mass, with a solid rocky surface and metals deep in their interiors.
planet | gravity | daylength | temp | moons | haswater |
---|---|---|---|---|---|
Mercury | 3.7 | 4222.6 | 167 | 0 | FALSE |
Venus | 8.9 | 2802 | 464 | 0 | FALSE |
Earth | 9.8 | 24 | 15 | 1 | TRUE |
Mars | 3.7 | 24.7 | -65 | 2 | FALSE |
The first step is to create vectors for each of the columns in the data table displayed above. For illustration purposes, we are going to assume the following data types:
planet
: character vector
gravity
: real (i.e. double) vector (\(m/s^2\))
daylength
: real (i.e. double) vector (hours)
temp
: integer vector (mean temperature in Celsius)
moons
: integer vector (number of moons)
haswater
: logical vector (whether the planet has known bodies of liquid water on its surface)
c()
The most common way to create an R vector is with the combine function c(). Here’s an example:
jovians = c("Jupiter", "Saturn", "Uranus", "Neptune")
jovians
[1] "Jupiter" "Saturn" "Uranus" "Neptune"
Create a vector planets with the names of the Terrestrial planets.
planets = c("Mercury", "Venus", "Earth", "Mars")
Use c() to make vectors gravity and daylength for the Terrestrial planets.
gravity = c(3.7, 8.9, 9.8, 3.7)
daylength = c(4222.6, 2802, 24, 24.7)
The creation of the temperature vector seems to be straightforward:
temp <- c(167, 464, 15, -65)
temp
[1] 167 464 15 -65
But there is a catch: temp was created as a vector of type "double" instead of type "integer", as required:
typeof(temp)
[1] "double"
So how do you create integer vectors in R? You have to use a special notation for integer numbers. Here’s an example:
ints = c(2L, 4L, 6L)
ints
[1] 2 4 6
Notice how we append an upper case L at the end of every numeric value. This is how you tell R to store such numbers as integers.
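As a quick check of the L suffix, the sketch below compares the storage type of the same numbers written with and without it. The names with_l, without_l and converted are illustrative only, and as.integer() is shown as one alternative way to get integers from an existing double vector.

```r
# Literals with the L suffix are stored as integers
with_l <- c(2L, 4L, 6L)
typeof(with_l)       # "integer"

# Without the suffix, the same numbers are stored as doubles
without_l <- c(2, 4, 6)
typeof(without_l)    # "double"

# as.integer() converts an existing double vector to integer
converted <- as.integer(without_l)
typeof(converted)    # "integer"

# The values compare as equal even though the storage types differ
all(with_l == without_l)   # TRUE
```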
Create integer vectors temp and moons for the Terrestrial planets. Inspect their data types, with typeof(), to confirm that they are integer vectors.
temp = c(167L, 464L, 15L, -65L)
moons = c(0L, 0L, 1L, 2L)
Create a logical vector (with values TRUE and FALSE) for the variable haswater.
haswater = c(FALSE, FALSE, TRUE, FALSE)
Now that we have various vectors of different data types to play with, let’s talk about coercion, which is what R does when you try to combine distinct data types into a single vector.
We have an integer vector ints:
ints
[1] 2 4 6
What happens if we combine ints with one or more double values? What is the data type of the new vector values?
values = c(ints, 8, 10)
values
[1] 2 4 6 8 10
typeof(values)
[1] "double"
Can you guess why values is not of type "integer" anymore?
Inspect the data type of the following combinations of vectors:
planets with gravity
typeof(c(planets, gravity))
planets with temp
typeof(c(planets, temp))
planets with haswater
typeof(c(planets, haswater))
gravity with daylength
typeof(c(gravity, daylength))
gravity with temp
typeof(c(gravity, temp))
temp with moons
typeof(c(temp, moons))
temp with haswater
typeof(c(temp, haswater))
Can you see a pattern?
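One way to summarize the pattern, as a sketch: when c() combines different types, R converts every element to the most flexible type present, following the order logical, integer, double, character. The small vectors below are made up for illustration and are not part of the planets data.

```r
# Each call mixes two types; the result takes the more flexible one
typeof(c(TRUE, 1L))     # "integer":   logical with integer
typeof(c(1L, 2.5))      # "double":    integer with double
typeof(c(2.5, "a"))     # "character": double with character
typeof(c(TRUE, "a"))    # "character": logical with character

# Coerced values are converted, not dropped
c(TRUE, 1L)             # 1 1
c(FALSE, "a")           # "FALSE" "a"
```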
In addition to coercion, another fundamental concept to learn about R vectors is that of vectorization. This is what happens when a function or an operation is applied to all the elements of a vector at once, without you having to loop over them.
Here’s an example. Let’s bring back the integer vector ints, and suppose that we want to obtain the square root of all its elements. One option is to take the square root of each element in ints separately, one by one, using the sqrt() function:
sqrt(ints[1])
[1] 1.414214
sqrt(ints[2])
[1] 2
sqrt(ints[3])
[1] 2.44949
We haven’t talked about this yet, but notice how you refer to the elements in a vector by indicating their position: using square brackets [ ] with a numeric index for the position of the element you want to operate on.
Now, instead of having to repeat the same command three times, we can use the function sqrt() in a single call because it is a vectorized function. This means that sqrt() can compute the square root of all the elements in a vector simultaneously:
sqrt(ints)
[1] 1.414214 2.000000 2.449490
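To make the contrast concrete, here is a minimal sketch that computes the same square roots with an explicit loop and with a single vectorized call. The names roots_loop and roots_vec are illustrative, and ints is redefined so the block runs on its own.

```r
ints <- c(2L, 4L, 6L)

# Element-by-element version: fill a result vector inside a loop
roots_loop <- numeric(length(ints))
for (i in seq_along(ints)) {
  roots_loop[i] <- sqrt(ints[i])
}

# Vectorized version: one call handles every element
roots_vec <- sqrt(ints)

# Both approaches give the same numbers
all.equal(roots_loop, roots_vec)   # TRUE
```

The vectorized call is shorter and avoids the bookkeeping of an index and a pre-allocated result.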
Likewise, pretty much all arithmetic operators (addition, subtraction, multiplication, division, power, etc.) are vectorized. For instance, say we want to add c(1, 2, 3) to ints. Here’s how to do it with vectorized code:
ints + c(1, 2, 3)
[1] 3 6 9
Take gravity and create a new vector gravity_log by taking the logarithm, with log(), of the values in gravity.
gravity_log = log(gravity)
Take the vectors moons and haswater. Try adding them and see what happens. What is R doing in this case?
moons + haswater
Take the vectors moons and haswater. Try subtracting them, e.g. moons - haswater, and see what happens. What is R doing in this case?
moons - haswater
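A sketch of what is going on in the last two exercises: in arithmetic, R treats TRUE as 1 and FALSE as 0, so the logical vector is coerced to numbers before the operation takes place. The two vectors are redefined here so the block runs on its own.

```r
moons    <- c(0L, 0L, 1L, 2L)
haswater <- c(FALSE, FALSE, TRUE, FALSE)

# Logical values become 0 and 1 when used as numbers
as.integer(haswater)       # 0 0 1 0

moons + haswater           # 0 0 2 2
moons - haswater           # 0 0 0 2

# An integer vector combined with a logical vector stays integer
typeof(moons + haswater)   # "integer"
```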
Related to vectorization, there is another important concept called recycling, which has to do with what R does when you operate on two (or more) vectors of different lengths.
Consider the vectors ints and values which, by the way, are of different lengths:
ints
[1] 2 4 6
values
[1] 2 4 6 8 10
What if we try to add ints and values? Is this possible?
ints + values
Warning in ints + values: longer object length is not a multiple of shorter
object length
[1] 4 8 12 10 14
Yes, you can add two numeric vectors of different lengths such as ints plus values. Notice though that R gives a warning message along the lines of
longer object length is not a multiple of shorter object length
This message tells you that the length of the longer vector, values, is not a multiple of the length of the shorter vector, ints.
When computing ints + values, R is basically recycling or repeating some of the numbers in ints to match the length of values. To be more precise, here is what R is adding:
2L + 2 = 4
4L + 4 = 8
6L + 6 = 12
2L + 8 = 10
4L + 10 = 14
All the integer numbers come from ints whereas all the double numbers come from values.
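The recycling step can be made explicit with rep_len(), which repeats a vector until it reaches a given length. This is only a sketch to mirror what R does internally. The names ints_recycled, manual and auto are illustrative.

```r
ints   <- c(2L, 4L, 6L)
values <- c(2, 4, 6, 8, 10)

# Repeat ints until it has the same length as values
ints_recycled <- rep_len(ints, length(values))
ints_recycled             # 2 4 6 2 4

# Adding the explicitly recycled vector produces no warning
manual <- ints_recycled + values
manual                    # 4 8 12 10 14

# Same result as ints + values (warning silenced for the comparison)
auto <- suppressWarnings(ints + values)
identical(manual, auto)   # TRUE
```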
Take daylength (measured in hours) and create a new vector dayminutes in which the units are expressed in minutes instead of hours.
dayminutes = daylength * 60
Take temp (measured in degrees Celsius) and create a new vector temp2 in which the units are expressed in degrees Fahrenheit. The conversion formula from Celsius to Fahrenheit, shown here for 1 degree Celsius, is:
\[ (1^{\circ}\text{C} \times 9/5) + 32 = 33.8^{\circ}\text{F} \]
temp2 = (temp * 9/5) + 32
Use the operator ^ to raise ints to the values of moons: e.g. ints^moons. If you get a warning, why is that so? What is R doing behind this operation?
# R gives a warning because the length of the longer vector "moons" (4)
# is not a multiple of the length of the shorter vector "ints" (3)
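The last two exercises can be sketched together, since both rely on recycling. The helper celsius_to_fahrenheit() is a hypothetical name introduced here for illustration, and the vectors are redefined so the block runs on its own.

```r
temp  <- c(167L, 464L, 15L, -65L)
ints  <- c(2L, 4L, 6L)
moons <- c(0L, 0L, 1L, 2L)

# Hypothetical helper: the single numbers 9, 5 and 32 are vectors of
# length one, so R recycles them across every element of celsius
celsius_to_fahrenheit <- function(celsius) {
  (celsius * 9 / 5) + 32
}
celsius_to_fahrenheit(temp)   # 332.6 867.2 59.0 -85.0

# ints has length 3 and moons has length 4, so ints is recycled to
# 2 4 6 2 before the power is computed (warning silenced here)
powers <- suppressWarnings(ints^moons)
powers                        # 1 1 6 4
```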