5) Tables and Tidyverse-1
UC Berkeley, STAT 133, Fall 2024
📖 Lecture
One of the main parts of the course involves working with tabular data, the most common format in which data is handled. Even when the raw data is not tabular, sooner or later you’ll be working with tables in most data analysis projects.
The good news is that R has many tools (packages and functions) for working with tables. Due to time constraints, we will focus on only two approaches:
- The “classic” approach, which is based on data.frames, the dedicated data object in R for handling tabular data.
- The “tidyverse” approach, which is based on tibbles (a modern take on the data.frame) and provides a rich ecosystem of packages chiefly designed for working with tables.
We start this week by looking at some important details about data frames, and at how to manipulate them using the “classic” approach in R.
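As a rough illustration of the “classic” approach, the sketch below builds a small data frame and manipulates it as a 2-dimensional object. The object name dat and all of its values are made up for this example; only base R functions are used.

```r
# A small, made-up data frame (names and values are illustrative only)
dat <- data.frame(
  name = c("Anna", "Ben", "Carla"),
  height = c(165, 180, 172),      # centimeters
  student = c(TRUE, FALSE, TRUE)
)

# A data frame is a 2-dimensional object: it has rows and columns
dim(dat)     # number of rows, number of columns
nrow(dat)
ncol(dat)

# Bracket notation follows the pattern dat[rows, columns]
dat[1, ]     # first row, all columns
dat[, 2]     # second column, returned as a vector

# Logical subsetting of rows, selecting columns by name
tall <- dat[dat$height > 170, c("name", "height")]

# The dollar operator extracts a column by name ...
dat$height

# ... and can also be used to add a new column
dat$height_m <- dat$height / 100
```

Bracket notation treats the table like a matrix, while the dollar operator treats it like a list of columns; both views are handy in practice.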
We are also going to get our feet wet making graphics with the Tidyverse package "ggplot2". In particular, you’ll learn about the Grammar of Graphics and its central idea: a graphic involves mapping data to geometric objects and their visual attributes. This is the underlying framework behind the mechanics and core functions available in “ggplot2” for creating simple graphics:
- Begin your graphic specification with ggplot()
- Indicate where the data comes from
- Establish which variables are going to be mapped into visual attributes: aes()
- Decide what kind of geometric object(s) will be used to display data; e.g. geom_point()
- Optional: do you need to use facets?
- Optional: do you need to explicitly use one or more statistical transformations?
- Optional: do you need to explicitly set a non-default system of coordinates?
- Optional: do you need to explicitly use a non-default theme of graphical elements?
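The steps above can be sketched in code as follows. This is only one possible illustration: it assumes the built-in mtcars data set as stand-in data, and the particular facet, statistic, coordinate system, and theme are arbitrary choices made for the example.

```r
library(ggplot2)

# mtcars ships with R; it is used here only as stand-in data
p <- ggplot(data = mtcars,                      # begin the graphic; say where the data comes from
            mapping = aes(x = wt, y = mpg)) +   # map variables to visual attributes
  geom_point(aes(color = factor(cyl))) +        # geometric object: points, colored by cylinders
  facet_wrap(~ am) +                            # optional: facets (one panel per transmission type)
  stat_smooth(method = "lm",                    # optional: explicit statistical transformation
              formula = y ~ x, se = FALSE) +
  coord_flip() +                                # optional: non-default coordinate system
  theme_minimal()                               # optional: non-default theme

print(p)
```

Each + adds a component to the specification, so a graphic can be built up one piece at a time. Dropping the four optional lines still produces a valid scatterplot.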
📚 Reading
Read chapter 9 of “R Coding Basics”:
Read chapters 2 and 4 of “R Tidy Hurricanes”:
🔬 Lab
You will be introduced to the ggplot2 system for making graphics when data is in tabular format. You will also begin your shiny journey by adapting the default Old Faithful app so that it produces the histogram using "ggplot2" functions.
🎯 Objectives
At the end of this week you will be able to:
- Explain how you can manipulate a data frame as a 2-dimensional object
- Explain how to use the dollar $ operator with a data frame
- Provide a simple example for creating a data frame
- Get a first contact with "ggplot2"
- Describe the purpose of the aesthetic mapping function aes()
- Describe the notion of “layers” and how to use them for making a graphic using “ggplot2”
🔆 Shiny Friday
This week’s shiny app uses data on NBA players to draw some univariate plots with ggplot2.
We have two versions:
- nba-histograms1-basic: histograms of NBA players’ data; includes a select input to choose the variable to visualize, and a slider input to determine the number of bins in the histogram.
  https://github.com/data133/shiny/tree/main/nba-histograms1-basic
- nba-histograms2-facets: same as app1, but this one also includes another select input to decide whether the histograms should be split into facets by either team or player’s position.
  https://github.com/data133/shiny/tree/main/nba-histograms2-facets
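To give a feel for how such an app might be put together, here is a minimal sketch. It is not the code in the repositories above: it assumes the built-in iris data set as a stand-in for the NBA data, and the input ids (variable, bins, facet) are hypothetical names chosen for this example.

```r
library(shiny)
library(ggplot2)

# Stand-in data: iris ships with R (the NBA data is not reproduced here)
num_vars <- names(iris)[sapply(iris, is.numeric)]

ui <- fluidPage(
  titlePanel("Histogram sketch"),
  sidebarLayout(
    sidebarPanel(
      # select input: which variable to visualize
      selectInput("variable", "Variable", choices = num_vars),
      # slider input: number of bins in the histogram
      sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
      # select input: whether to split the histogram into facets
      selectInput("facet", "Facet by", choices = c("None", "Species"))
    ),
    mainPanel(plotOutput("histogram"))
  )
)

server <- function(input, output) {
  output$histogram <- renderPlot({
    # .data[[...]] looks up the column whose name is stored in the input
    p <- ggplot(iris, aes(x = .data[[input$variable]])) +
      geom_histogram(bins = input$bins) +
      labs(x = input$variable)
    # add facets only when the user asks for them
    if (input$facet != "None") {
      p <- p + facet_wrap(input$facet)
    }
    p
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch the app only when running interactively
if (interactive()) runApp(app)
```

Leaving out the second select input and the if block gives the shape of the basic version; keeping them gives the shape of the faceted one.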
🔔 Assignments
- HW3 due 9/27
- HW4 released on 9/28, due 10/04