Group_by, Summarize, Arrange
We will continue with penguins for this exercise. Questions and solutions (Q1-Q3) were taken from https://allisonhorst.shinyapps.io/dplyr-learnr/#section-dplyrgroup_by-summarize.
First, let's convert penguins to a tibble.
penguins <- dplyr::as_tibble(penguins)
Q1: Use group_by() and summarize() to obtain the mean and standard deviation of penguin bill length, grouped by penguin species and sex.
Q1: Solution
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
penguins %>%
  group_by(species, sex) %>%
  summarize(bill_length_mean = mean(bill_len, na.rm = TRUE),
            bill_length_sd = sd(bill_len, na.rm = TRUE))
## `summarise()` has grouped output by 'species'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 4
## # Groups:   species [3]
##   species   sex    bill_length_mean bill_length_sd
##   <fct>     <fct>             <dbl>          <dbl>
## 1 Adelie    female             37.3           2.03
## 2 Adelie    male               40.4           2.28
## 3 Adelie    <NA>               37.8           2.80
## 4 Chinstrap female             46.6           3.11
## 5 Chinstrap male               51.1           1.56
## 6 Gentoo    female             45.6           2.05
## 7 Gentoo    male               49.5           2.72
## 8 Gentoo    <NA>               45.6           1.37
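The message printed before the Q1 output points at the `.groups` argument. As a minimal sketch of how that argument can be used, the example below runs the same kind of summary on a small made-up tibble (`toy` and its values are hypothetical stand-ins, not the real penguins data) and sets `.groups = "drop"` so that the result comes back ungrouped and the message is not printed.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins (values are made up)
toy <- tibble(
  species  = c("A", "A", "A", "B", "B"),
  sex      = c("female", "male", "male", "female", NA),
  bill_len = c(37.0, 40.0, 42.0, 46.0, 45.0)
)

bill_summary <- toy %>%
  group_by(species, sex) %>%
  summarize(
    bill_length_mean = mean(bill_len, na.rm = TRUE),
    bill_length_sd   = sd(bill_len, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble; no grouping message
  )

bill_summary
```

Groups that contain a single observation get `NA` for the standard deviation, because `sd()` needs at least two values. Rows where `sex` is missing form their own group, which is also why the Q1 output contains `<NA>` rows.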
Q2: Use group_by() and summarize() to prepare a summary table containing the maximum and minimum flipper length for male Adelie penguins, grouped by island.
Q2: Solution
penguins %>%
  filter(species == "Adelie", sex == "male") %>%
  group_by(island) %>%
  summarize(flip_max_length = max(flipper_len),
            flip_min_length = min(flipper_len))
## # A tibble: 3 × 3
##   island    flip_max_length flip_min_length
##   <fct>               <int>           <int>
## 1 Biscoe                203             180
## 2 Dream                 208             178
## 3 Torgersen             210             181
Q3: Starting with penguins, create a summary table containing the maximum and minimum flipper length (call the columns "flip_max" and "flip_min") for Chinstrap penguins, grouped by island.
Q3: Solution
penguins %>%
  filter(species == "Chinstrap") %>%
  group_by(island) %>%
  summarize(flip_max = max(flipper_len),
            flip_min = min(flipper_len))
## # A tibble: 1 × 3
##   island flip_max flip_min
##   <fct>     <int>    <int>
## 1 Dream       212      178
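Q2 and Q3 follow the same filter, group, summarize pattern. As an alternative sketch, dplyr 1.1.0 and later also accept a per-operation `.by` argument in `summarize()`, which groups for that one call only and returns an ungrouped result. The example uses a small made-up tibble (`toy` and its values are hypothetical); the dplyr documentation marks `.by` as experimental, so treat this as an option rather than a replacement for `group_by()`.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins (values are made up)
toy <- tibble(
  species     = c("Adelie", "Adelie", "Adelie", "Chinstrap"),
  island      = c("Dream", "Dream", "Biscoe", "Dream"),
  flipper_len = c(180L, 190L, 185L, 200L)
)

flip_summary <- toy %>%
  filter(species == "Adelie") %>%
  summarize(
    flip_max = max(flipper_len),
    flip_min = min(flipper_len),
    .by = island  # group by island for this call only
  )

flip_summary
```

One difference to keep in mind: `group_by()` sorts the output by the grouping keys, while `.by` keeps the groups in the order they first appear in the data.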
Q4: Create a data frame reordering penguins by year, island, and sex.
Q4: Solution
penguins %>% arrange(year, island, sex)
## # A tibble: 344 × 8
##    species island bill_len bill_dep flipper_len body_mass sex     year
##    <fct>   <fct>     <dbl>    <dbl>       <int>     <int> <fct>  <int>
##  1 Adelie  Biscoe     37.8     18.3         174      3400 female  2007
##  2 Adelie  Biscoe     35.9     19.2         189      3800 female  2007
##  3 Adelie  Biscoe     35.3     18.9         187      3800 female  2007
##  4 Adelie  Biscoe     40.5     17.9         187      3200 female  2007
##  5 Adelie  Biscoe     37.9     18.6         172      3150 female  2007
##  6 Gentoo  Biscoe     46.1     13.2         211      4500 female  2007
##  7 Gentoo  Biscoe     48.7     14.1         210      4450 female  2007
##  8 Gentoo  Biscoe     46.5     13.5         210      4550 female  2007
##  9 Gentoo  Biscoe     45.4     14.6         211      4800 female  2007
## 10 Gentoo  Biscoe     43.3     13.4         209      4400 female  2007
## # ℹ 334 more rows
Q5: Create a data frame containing male Adelie penguins reordered by body_mass in descending order.
Q5: Solution
penguins %>%
  filter(species == "Adelie", sex == "male") %>%
  arrange(desc(body_mass))
## # A tibble: 73 × 8
##    species island    bill_len bill_dep flipper_len body_mass sex    year
##    <fct>   <fct>        <dbl>    <dbl>       <int>     <int> <fct> <int>
##  1 Adelie  Biscoe        43.2     19           197      4775 male   2009
##  2 Adelie  Biscoe        41       20           203      4725 male   2009
##  3 Adelie  Torgersen     42.9     17.6         196      4700 male   2008
##  4 Adelie  Torgersen     39.2     19.6         195      4675 male   2007
##  5 Adelie  Dream         39.8     19.1         184      4650 male   2007
##  6 Adelie  Dream         39.6     18.8         190      4600 male   2007
##  7 Adelie  Biscoe        45.6     20.3         191      4600 male   2009
##  8 Adelie  Torgersen     42.5     20.7         197      4500 male   2007
##  9 Adelie  Dream         37.5     18.5         199      4475 male   2009
## 10 Adelie  Torgersen     41.8     19.4         198      4450 male   2008
## # ℹ 63 more rows
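Q4 and Q5 can be combined: `arrange()` accepts several sort keys, and `desc()` can wrap any one of them. The sketch below uses a small made-up tibble (`toy`, `id`, and the values are hypothetical) to sort by year ascending and then by body mass descending within each year. It also shows that `arrange()` places missing values last, even for a key wrapped in `desc()`.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins (values are made up)
toy <- tibble(
  id        = c("a", "b", "c", "d"),
  year      = c(2008L, 2007L, 2007L, 2008L),
  body_mass = c(3500L, NA, 4200L, 3900L)
)

# Sort by year (ascending), then by body_mass (descending) within each year
by_year_then_mass <- toy %>%
  arrange(year, desc(body_mass))

by_year_then_mass
```

Here the rows come back in the order c, b, d, a: within 2007 the row with the missing body mass follows the row with a known value, and within 2008 the heavier individual comes first.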