Two big ideas today:
Using base R data frame subsetting, create from star_wars…
Let's look at three ways to solve this.
mean(height)
1 1.75
mean(height)
1 1.75

2014: magrittr introduces %>%
2021: R 4.1.0 introduces |>
R now provides a simple native forward pipe syntax |>. The simple form of the forward pipe inserts the left-hand side as the first argument in the right-hand side call.
mean(height)
1 1.75
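To make the first-argument rule concrete, here is a minimal sketch. It assumes star_wars is the four-row table whose rows appear later in these slides (rebuilt here by hand) and that the computation is a plain summarize() of mean height; the object names nested, piped, and magrittr_piped are illustrative only.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

# Ordinary call: the data frame is the first argument
nested <- summarize(star_wars, mean(height))

# Native pipe: the left-hand side becomes that first argument
piped <- star_wars |> summarize(mean(height))

# magrittr pipe (re-exported by dplyr) behaves the same way here
magrittr_piped <- star_wars %>% summarize(mean(height))

identical(nested, piped)
```

All three forms run the same call, so they should return the same one-row result.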
It's good practice to understand the output of each line by breaking the pipe.
Question: What are the dimensions of the data frame at each stage of the pipe: A, B, C, and D?
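One way to break a pipe is to assign each stage to its own object and inspect it with dim(). The pipeline below is hypothetical: the stage names step_a to step_d and the particular filter/select/summarize steps are assumptions for illustration, not the pipeline from the exercise.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

# Each stage is stored separately so it can be inspected on its own
step_a <- star_wars                                         # full data
step_b <- step_a |> filter(homeworld == "Tatooine")         # fewer rows
step_c <- step_b |> select(name, height)                    # fewer columns
step_d <- step_c |> summarize(mean_height = mean(height))   # one summary row

dim(step_a)
dim(step_b)
dim(step_c)
dim(step_d)
```

filter() changes the number of rows, select() changes the number of columns, and summarize() collapses what remains to a single row.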
|> works everywhere
Using base R data frame subsetting, create from star_wars…
You could set up two pipelines with different filters. But there's a better way.
group_by()
Flags the rows of a data frame as belonging to a group defined by a factor, for use in downstream operations.
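A small sketch of what "flagging" means in practice: group_by() leaves the rows and columns alone and only attaches grouping metadata. The star_wars table is rebuilt by hand here as an assumption, and grouped is an illustrative name.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

grouped <- star_wars |> group_by(homeworld)

dim(grouped)             # same shape as star_wars
is_grouped_df(grouped)   # the grouping flag is set
group_vars(grouped)      # which variable defines the groups
n_groups(grouped)        # how many groups there are
```

Nothing is computed yet; the grouping only matters once a downstream verb such as summarize(), filter(), or mutate() is applied.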
# A tibble: 2 × 2
homeworld `mean(height)`
<chr> <dbl>
1 Naboo 1.75
2 Tatooine 1.75
Draw diagram
group_by() with summarize()
# A tibble: 2 × 2
homeworld `mean(height)`
<chr> <dbl>
1 Naboo 1.75
2 Tatooine 1.75
Makes a summary row for each group.
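A call along these lines would give one summary row per homeworld. This is a sketch that assumes the hand-built star_wars table below; by_world is an illustrative name.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

# One mean per group instead of one mean for the whole table
by_world <- star_wars |>
  group_by(homeworld) |>
  summarize(mean(height))

by_world
```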
group_by() with filter()
# A tibble: 2 × 4
# Groups: homeworld [2]
name homeworld height weight
<chr> <chr> <dbl> <dbl>
1 Anakin Tatooine 1.8 84
2 JarJar Naboo 1.9 90
Changes the scope of functions inside filter() to operate within groups.
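The change of scope can be seen by running the same filter with and without grouping. This sketch assumes the condition is height == max(height), which is consistent with the two rows shown above but is not confirmed by the slides; tallest_overall and tallest_per_world are illustrative names.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

# Ungrouped: max(height) is taken over the whole table
tallest_overall <- star_wars |>
  filter(height == max(height))

# Grouped: max(height) is taken separately within each homeworld
tallest_per_world <- star_wars |>
  group_by(homeworld) |>
  filter(height == max(height))

tallest_overall$name
tallest_per_world$name
```

The ungrouped version keeps only the single tallest character, while the grouped version keeps the tallest character from each homeworld.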
group_by() with arrange()
# A tibble: 4 × 4
# Groups: homeworld [2]
name homeworld height weight
<chr> <chr> <dbl> <dbl>
1 JarJar Naboo 1.9 90
2 Anakin Tatooine 1.8 84
3 Luke Tatooine 1.7 77
4 Padme Naboo 1.6 45
arrange() ignores group_by() by default and sorts globally; set .by_group = TRUE to sort within groups.
Why???
name homeworld height weight
1 Padme Naboo 1.6 45
2 JarJar Naboo 1.9 90
3 Luke Tatooine 1.7 77
4 Anakin Tatooine 1.8 84
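The two orderings can be compared side by side. This sketch assumes the hand-built star_wars table below and uses arrange()'s .by_group argument; it is one way to obtain a within-group ordering, not necessarily the code behind the table above.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

grouped <- star_wars |> group_by(homeworld)

# Default: the grouping is ignored and rows are sorted across the whole table
global_sort <- grouped |> arrange(desc(height))

# .by_group = TRUE: sort by the grouping variable first, then by height
within_sort <- grouped |> arrange(height, .by_group = TRUE)

global_sort$name
within_sort$name
```

With .by_group = TRUE the rows come out as Padme, JarJar, Luke, Anakin: Naboo first, then Tatooine, each sorted by ascending height.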
group_by() with mutate()
# A tibble: 4 × 5
# Groups: homeworld [2]
name homeworld height weight height_z
<chr> <chr> <dbl> <dbl> <dbl>
1 Anakin Tatooine 1.8 84 0.707
2 Padme Naboo 1.6 45 -0.707
3 Luke Tatooine 1.7 77 -0.707
4 JarJar Naboo 1.9 90 0.707
Changes the scope of functions inside mutate() to operate within groups.
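A within-group z-score is a natural example of this. The sketch below assumes height_z is computed as (height - mean(height)) / sd(height), which reproduces the values shown above for the hand-built star_wars table; z_scored is an illustrative name.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

# mean() and sd() are evaluated once per homeworld, not once overall
z_scored <- star_wars |>
  group_by(homeworld) |>
  mutate(height_z = (height - mean(height)) / sd(height))

z_scored
```

Each character is compared with the mean and spread of their own homeworld, so with two characters per group every z-score is either 1/sqrt(2) or -1/sqrt(2).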
What will this produce?
# A tibble: 2 × 2
homeworld `mean(height_z)`
<chr> <dbl>
1 Naboo 0
2 Tatooine 0
Once grouped, a data frame stays grouped until summarize() reduces it to one row per group or it is explicitly ungrouped with ungroup().
What will this produce?
# A tibble: 1 × 1
`mean(height_z)`
<dbl>
1 0
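The persistence of grouping can be checked directly. This sketch assumes the hand-built star_wars table and the z-score definition below, and it assumes the one-row result comes from calling ungroup() before summarize(); per_world and overall are illustrative names.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

z_scored <- star_wars |>
  group_by(homeworld) |>
  mutate(height_z = (height - mean(height)) / sd(height))

# Still grouped after mutate(), so summarize() works per homeworld
per_world <- z_scored |> summarize(mean(height_z))

# After ungroup(), summarize() sees the whole table as one group
overall <- z_scored |> ungroup() |> summarize(mean(height_z))

group_vars(z_scored)
nrow(per_world)
nrow(overall)
```

Because there is a single grouping variable, the result of summarize() is no longer grouped.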
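.by
Several dplyr verbs, including summarize(), mutate(), and filter(), accept a .by argument (dplyr 1.1.0 and later) that applies the operation within the groups of another variable for that one call.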
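As a rough comparison, the same per-homeworld mean can be written with group_by() or with .by. The sketch assumes dplyr 1.1.0 or later and the hand-built star_wars table below; via_group_by and via_by are illustrative names.

```r
library(dplyr)

# Assumed reconstruction of the star_wars data used in these slides
star_wars <- tibble(
  name      = c("Anakin", "Padme", "Luke", "JarJar"),
  homeworld = c("Tatooine", "Naboo", "Tatooine", "Naboo"),
  height    = c(1.8, 1.6, 1.7, 1.9),
  weight    = c(84, 45, 77, 90)
)

# Persistent grouping: group first, then summarize
via_group_by <- star_wars |>
  group_by(homeworld) |>
  summarize(mean_height = mean(height))

# Per-operation grouping: the grouping applies to this call only
via_by <- star_wars |>
  summarize(mean_height = mean(height), .by = homeworld)

via_group_by
via_by
```

Both return one row per homeworld. The .by result is never grouped, and its rows follow the order in which the groups first appear in the data rather than sorted order.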
