library(tidyverse)
Practice: Graphics with ggplot2 (part 2)
Stat 133
- Get started with "ggplot2"
- Produce basic plots with ggplot()
- Gain familiarity with the aes() function
- Learn about the various geoms, or geometric objects, and recognize them
- Understand why and how to facet
- Try out different plot themes
1 First contact with ggplot()
In this module you will learn how to create graphics with "ggplot2", which is part of the "tidyverse" ecosystem of packages.
1.1 Data mpg
For illustration purposes we are going to use the mpg data, which is one of the data sets in "ggplot2":
mpg
# A tibble: 234 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
# ℹ 224 more rows
2 Example: Scatterplots
Let’s start with a scatter plot to visualize the relationship between engine displacement, displ, and highway miles per gallon, hwy.
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point()
Recap:
- ggplot() creates an object of class "ggplot"
- the main input for ggplot() is data, which must be a data frame
- then we use the "+" operator to add a layer
- the geometric objects (geoms) are points: geom_point()
- aes() is used to specify the x and y coordinates, by taking columns displ and hwy from the data frame
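The first recap point can be checked directly. The sketch below stores the plot in a variable and inspects it before and after a layer is added; the names p_base and p_points are chosen here only for illustration.

```r
library(ggplot2)

# ggplot() alone sets up the data and the mapping but draws no geom
p_base <- ggplot(data = mpg, aes(x = displ, y = hwy))
inherits(p_base, "ggplot")

# adding a layer with "+" returns a new ggplot object
p_points <- p_base + geom_point()
inherits(p_points, "ggplot")

# the plot is rendered only when the object is printed
p_points
```

Keeping the base plot in a variable makes it easy to try different layers on top of the same data and mapping.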
The same scatter plot can also be created with this alternative use of ggplot():
# scatterplot (option 2)
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy))
3 Using aes()
Does anything happen if you don’t name the arguments to aes(), i.e. you type in aes(displ, hwy)? What if you type in aes(hwy, displ)?
Show answer
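# Typing aes(displ, hwy) gives the same plot as before, because
# the first two arguments of aes() are x and y, in that order.
# Typing aes(hwy, displ) gives you another scatter plot
# in which 'hwy' goes to the x-axis, and 'displ' to the y-axis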
ggplot(data = mpg, aes(hwy, displ)) +
geom_point()
Let’s restrict the data set to just cars manufactured by Audi.
audi = filter(mpg, manufacturer == 'audi')
ggplot(data = audi, aes(x = displ, y = hwy)) +
geom_point()
3.1 Using geom_text()
Let’s label each point using model by adding a geom_text() layer, and mapping model to the label aesthetic with aes():
ggplot(data = audi, aes(hwy, displ)) +
geom_point() +
geom_text(aes(label = model))
- The model names overlap with the points. Modify your code above by using the nudge_y argument in geom_text(). Does it go inside or outside of aes()? Now, replace geom_text() with geom_label(). What difference do you notice? Did you have to modify the arguments to aes() at all?
Show answer
# argument nudge_y goes outside aes()
ggplot(data = audi, aes(hwy, displ)) +
geom_point() +
geom_text(aes(label = model), nudge_y = 0.1)
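For the second half of this exercise, one possible version with geom_label() is sketched below. It assumes "dplyr" is available for filter() (it is loaded with the "tidyverse"), and the name p_label is only illustrative. geom_label() draws each label inside a rectangle, and the aes() call stays the same.

```r
library(ggplot2)
library(dplyr)

audi <- filter(mpg, manufacturer == "audi")

# same aes() arguments as with geom_text(); only the geom changes
p_label <- ggplot(data = audi, aes(hwy, displ)) +
  geom_point() +
  geom_label(aes(label = model), nudge_y = 0.1)
p_label
```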
- Next, cut and paste the aes(hwy, displ) mapping from the argument of ggplot() to the argument of geom_point(). Do you run into an error? What if you copy the x and y arguments over to the aes() function in geom_text()?
Show answer
# specifying a local mapping just for geom_point() results in an error,
# because geom_text() is then missing its x and y aesthetics
ggplot(data = audi) +
geom_point(aes(hwy, displ)) +
geom_text(aes(label = model), nudge_y = 0.1)
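The last question in this exercise can be tried with the sketch below, which copies x and y into the aes() of geom_text() as well. The name p_local is hypothetical, and "dplyr" is assumed to be available for filter().

```r
library(ggplot2)
library(dplyr)

audi <- filter(mpg, manufacturer == "audi")

# each layer carries its own x and y, so geom_text() can be drawn
p_local <- ggplot(data = audi) +
  geom_point(aes(x = hwy, y = displ)) +
  geom_text(aes(x = hwy, y = displ, label = model), nudge_y = 0.1)
p_local
```

This works, but the mapping is repeated in every layer. Placing aes() inside ggplot() states it once and lets all layers inherit it.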
4 Adding color
Let’s go back to the full data set.
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point()
- First, make all the points blue. Should you use aes()?
Show answer
# to make all points blue don't use aes()
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(color = "blue")
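A common slip is to put the constant inside aes(). In the sketch below (p_mapped and p_set are illustrative names), the first plot treats "blue" as a variable with a single value, so ggplot2 picks a color from its default discrete scale and adds a legend. Only the second plot sets the color directly.

```r
library(ggplot2)

# "blue" inside aes() is mapped like a data value, not used as a color
p_mapped <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "blue"))

# "blue" outside aes() is a fixed setting for the whole layer
p_set <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

p_mapped
p_set
```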
- Next, color the points by class. Should you use aes() this time?
Show answer
# to color points by 'class' you should use aes()
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class))
- Try coloring the points by other variables. Which variables display a reasonable number of colors? Which ones display far too many?
Show answer
# you can color points by 'drv' (the type of drive train)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = drv))
# coloring points by 'manufacturer' gives too many colors
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = manufacturer))
- Also try modifying the size of points, both inside and outside of aes(). For which variables does it make sense to map them to size? For which ones does it not make sense?
Show answer
# mapping 'cty' to size
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class, size = cty))
# setting size to 3
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class), size = 3)
5 Adding smoothers
Let’s fit a line to our scatter plot. To be more specific, let’s fit a linear model (i.e. least squares regression line) of hwy on displ:
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = 'lm')
`geom_smooth()` using formula = 'y ~ x'
geom_smooth() with the argument method = 'lm' plots a least squares regression line for highway mileage on engine displacement. The translucent gray band is a confidence interval (95% by default) around the fitted line.
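To see what the smoother is drawing, the same model can be fitted with lm() from base R. This is a minimal sketch; the names fit and p_line are chosen here for illustration.

```r
library(ggplot2)

# the same least squares fit that geom_smooth(method = 'lm') draws
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope of the fitted line

# se = FALSE drops the confidence band and keeps only the line
p_line <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
p_line
```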
- Modify the code above by adding a vertical line at \(x=4\) using geom_vline(). Does it require aes()?
Show answer
# adding a vertical line does not require aes()
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = 'lm') +
geom_vline(xintercept = 4)
- Now try adding a vertical line at the mean of displ. Does it require aes() this time?
Show answer
# adding a vertical line at the mean of 'displ'
# requires aes() when 'displ' is referred to by its column name
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = 'lm') +
geom_vline(aes(xintercept = mean(displ)))
- Do the same with a horizontal line at the mean of hwy. Play around with color and size. Should those arguments go inside or outside of aes()?
Show answer
# adding a horizontal line at the mean of 'hwy'
# requires aes() when 'hwy' is referred to by its column name
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(method = 'lm') +
geom_hline(aes(yintercept = mean(hwy)))
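The answer above leaves out the color and size part of the exercise. One possible version is sketched below, assuming ggplot2 3.4.0 or later, where the thickness of a line is controlled by linewidth (older versions use size instead). The name p_mean is illustrative.

```r
library(ggplot2)

# the position is computed from the data, so it is mapped inside aes();
# color and linewidth are fixed settings, so they go outside aes()
p_mean <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  geom_hline(aes(yintercept = mean(hwy)), color = "red", linewidth = 1)
p_mean
```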
6 Using facets
Let’s return to the basic scatter plot again.
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point()
There are only two unique values for year, 1999 and 2008. Let’s compare the relationship between hwy and displ, distinguishing by years using facet_wrap().
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ year)
- What happens when you facet (essentially, compare) using a different variable? Modify the code above and try.
Show answer
# facets by 'cyl' (the number of cylinders)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(~ cyl)
facet_grid() works slightly differently from facet_wrap(). The latter takes in only one variable, which always goes after the ~, and it ‘wraps’ the plots left to right, top to bottom. facet_grid() allows you to facet into just rows or just columns.
Show answer
# facet into columns (the variable after the ~ defines the columns)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_grid(. ~ year)
Show answer
# facet into rows (the variable before the ~ defines the rows)
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
facet_grid(year ~ .)
The . is a placeholder for a variable. Modify either code chunk above by replacing the . with another variable, such as cyl. How does the display change?
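As one possible answer, the sketch below replaces the . with cyl, so the grid has one row per number of cylinders and one column per year. The name p_grid is chosen here for illustration.

```r
library(ggplot2)

# rows are defined by cyl, columns by year
p_grid <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(cyl ~ year)
p_grid
```

Unlike facet_wrap(), the grid keeps a panel for every combination of row and column values, even when a combination has no observations.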
6.1 More facets
Finally, let’s study just the distribution of highway mileage.
ggplot(data = mpg) +
geom_histogram(aes(x = hwy)) +
facet_wrap(~ class)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
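The message above is a reminder that 30 bins is only a default. A minimal sketch with an explicit bin width follows; the value 2 (miles per gallon) and the name p_hist are arbitrary choices made here for illustration.

```r
library(ggplot2)

# each bin now spans 2 miles per gallon, and the stat_bin() message goes away
p_hist <- ggplot(data = mpg) +
  geom_histogram(aes(x = hwy), binwidth = 2) +
  facet_wrap(~ class)
p_hist
```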
Instead of using a histogram to study the distribution, let’s try a boxplot.
ggplot(data = mpg) +
geom_boxplot(aes(x = hwy))
Notice that geom_histogram() and geom_boxplot() required only an x aesthetic. What happens if you replace x with y?
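One way to explore this question is sketched below, assuming a version of ggplot2 that accepts either orientation for boxplots (the x-only call above already relies on this). The names p_box_y and p_box_class are illustrative.

```r
library(ggplot2)

# mapping hwy to y turns the single boxplot upright
p_box_y <- ggplot(data = mpg) +
  geom_boxplot(aes(y = hwy))

# adding a discrete x gives one box per class
p_box_class <- ggplot(data = mpg) +
  geom_boxplot(aes(x = class, y = hwy))

p_box_y
p_box_class
```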
- Facet again by modifying the code above, this time using any variable of your choice.
Show answer
# facets by 'drv' (the type of drive train)
ggplot(data = mpg) +
geom_boxplot(aes(x = hwy)) +
facet_wrap(~ drv)
7 Using Themes
Graphics produced with ggplot() use a default theme for things such as the color of the background, the grid lines (auxiliary horizontal and vertical lines), the tick marks, the position of a legend, etc.
You can change the appearance of a graphic by using other themes. Here’s one example with theme_bw(), which adds a black-and-white theme layer to the original scatter plot:
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
theme_bw()
Look at more theme options in the "ggplot2" cheatsheet or check the help() documentation of theme functions, e.g. ?theme_bw, and try at least 2 more themes:
Show answer
# minimal theme
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
theme_minimal()
# classic theme
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point() +
theme_classic()
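Complete themes can also be adjusted. The sketch below is only one possibility: it enlarges the base font size of theme_minimal() and moves the legend with theme(). The color mapping to drv is added just so that there is a legend to move, and the name p_theme is illustrative.

```r
library(ggplot2)

# start from a complete theme, then adjust single elements with theme()
p_theme <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")
p_theme
```

The call to theme() comes after theme_minimal() because a complete theme replaces any theme settings added before it.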