
group_by(), summarize(), arrange()

We will continue with the penguins data for this exercise. Questions and solutions Q1-Q3 were taken from https://allisonhorst.shinyapps.io/dplyr-learnr/#section-dplyrgroup_by-summarize.

First, let's convert penguins to a tibble.

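# penguins ships with base R's datasets package (R >= 4.5.0), with columns
# such as bill_len, bill_dep, flipper_len and body_mass
penguins <- dplyr::as_tibble(penguins)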
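A tibble is still a data frame underneath, so base R functions keep working on it; it mainly changes printing and some subsetting behaviour. A minimal sketch with a small made-up data frame (`df` and its values are illustrative, not taken from penguins):

```r
library(dplyr)

# Hypothetical two-row data frame standing in for penguins
df <- data.frame(species = c("Adelie", "Gentoo"),
                 body_mass = c(3750L, 5000L))

# as_tibble() adds the tibble classes on top of "data.frame"
tb <- as_tibble(df)

class(tb)          # "tbl_df" "tbl" "data.frame"
is.data.frame(tb)  # TRUE
```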

Q1: Use group_by() and summarize() to obtain the mean and standard deviation of penguin bill length, grouped by penguin species and sex.

Q1: Solution
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
penguins %>%
  group_by(species, sex) %>%
  summarize(bill_length_mean = mean(bill_len, na.rm = TRUE),
            bill_length_sd = sd(bill_len, na.rm = TRUE))
## `summarise()` has grouped output by 'species'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 4
## # Groups:   species [3]
##   species   sex    bill_length_mean bill_length_sd
##   <fct>     <fct>             <dbl>          <dbl>
## 1 Adelie    female             37.3           2.03
## 2 Adelie    male               40.4           2.28
## 3 Adelie    <NA>               37.8           2.80
## 4 Chinstrap female             46.6           3.11
## 5 Chinstrap male               51.1           1.56
## 6 Gentoo    female             45.6           2.05
## 7 Gentoo    male               49.5           2.72
## 8 Gentoo    <NA>               45.6           1.37
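The message printed above refers to the `.groups` argument of `summarize()`. By default the last grouping variable (here `sex`) is dropped, so the result stays grouped by `species`. A minimal sketch on an invented tibble (`toy` and its values are hypothetical) comparing the default with `.groups = "drop"`:

```r
library(dplyr)

# Hypothetical toy data standing in for penguins (values are invented)
toy <- tibble(
  species  = c("A", "A", "B", "B"),
  sex      = c("female", "male", "female", "male"),
  bill_len = c(37, 40, 46, 51)
)

# Default: sex is peeled off, result stays grouped by species
kept <- toy %>%
  group_by(species, sex) %>%
  summarize(bill_length_mean = mean(bill_len, na.rm = TRUE))

# .groups = "drop" removes all grouping and silences the message
dropped <- toy %>%
  group_by(species, sex) %>%
  summarize(bill_length_mean = mean(bill_len, na.rm = TRUE),
            .groups = "drop")

group_vars(kept)     # "species"
group_vars(dropped)  # character(0)
```

Dropping the groups is often the safer choice when the summary is piped into further steps, since later verbs would otherwise still operate per species.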

Q2: Use group_by() and summarize() to prepare a summary table containing the maximum and minimum flipper length for male Adelie penguins, grouped by island.

Q2: Solution
penguins %>%
  filter(species == "Adelie", sex == "male") %>%
  group_by(island) %>%
  summarize(flip_max_length = max(flipper_len),
            flip_min_length = min(flipper_len))
## # A tibble: 3 × 3
##   island    flip_max_length flip_min_length
##   <fct>               <int>           <int>
## 1 Biscoe                203             180
## 2 Dream                 208             178
## 3 Torgersen             210             181
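The Q2 solution calls `max()` and `min()` without `na.rm`, and the output above contains no `NA`, so no missing flipper lengths appear to remain after the filter. If a group did contain a missing value, `max()` would return `NA` for that whole group. A sketch on invented data (`toy` and island names `"X"`/`"Y"` are hypothetical):

```r
library(dplyr)

# Hypothetical toy data; island "Y" has one missing flipper length
toy <- tibble(
  island      = c("X", "X", "Y", "Y"),
  flipper_len = c(180L, 195L, 190L, NA)
)

res <- toy %>%
  group_by(island) %>%
  summarize(
    flip_max_naive = max(flipper_len),               # NA for island Y
    flip_max       = max(flipper_len, na.rm = TRUE), # 190 for island Y
    flip_min       = min(flipper_len, na.rm = TRUE)
  )

res
```

If every value in a group is missing, `max(..., na.rm = TRUE)` returns `-Inf` with a warning instead.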

Q3: Starting with penguins, create a summary table containing the maximum and minimum flipper length (call the columns "flip_max" and "flip_min") for Chinstrap penguins, grouped by island.

Q3: Solution
penguins %>%
  filter(species == "Chinstrap") %>%
  group_by(island) %>%
  summarize(flip_max = max(flipper_len),
            flip_min = min(flipper_len))
## # A tibble: 1 × 3
##   island flip_max flip_min
##   <fct>     <int>    <int>
## 1 Dream       212      178
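Since dplyr 1.1.0 (the session above loads 1.1.4), `summarize()` also accepts a `.by` argument that groups for that single call and always returns an ungrouped tibble. A sketch of the Q3 pattern written this way, on invented data (`toy` and its values are hypothetical) so it runs without penguins:

```r
library(dplyr)

# Hypothetical toy data (invented values)
toy <- tibble(
  species     = c("Chinstrap", "Chinstrap", "Chinstrap", "Gentoo"),
  island      = c("Dream", "Dream", "Dream", "Biscoe"),
  flipper_len = c(212L, 178L, 196L, 230L)
)

# .by groups only inside this summarize(); no group_by() needed
res <- toy %>%
  filter(species == "Chinstrap") %>%
  summarize(flip_max = max(flipper_len),
            flip_min = min(flipper_len),
            .by = island)

res
```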

Q4: Create a data frame with the penguins rows reordered by year, then island, then sex.

Q4: Solution
penguins %>% arrange(year, island, sex)
## # A tibble: 344 × 8
##    species island bill_len bill_dep flipper_len body_mass sex     year
##    <fct>   <fct>     <dbl>    <dbl>       <int>     <int> <fct>  <int>
##  1 Adelie  Biscoe     37.8     18.3         174      3400 female  2007
##  2 Adelie  Biscoe     35.9     19.2         189      3800 female  2007
##  3 Adelie  Biscoe     35.3     18.9         187      3800 female  2007
##  4 Adelie  Biscoe     40.5     17.9         187      3200 female  2007
##  5 Adelie  Biscoe     37.9     18.6         172      3150 female  2007
##  6 Gentoo  Biscoe     46.1     13.2         211      4500 female  2007
##  7 Gentoo  Biscoe     48.7     14.1         210      4450 female  2007
##  8 Gentoo  Biscoe     46.5     13.5         210      4550 female  2007
##  9 Gentoo  Biscoe     45.4     14.6         211      4800 female  2007
## 10 Gentoo  Biscoe     43.3     13.4         209      4400 female  2007
## # ℹ 334 more rows
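`arrange()` sorts factor columns such as `island` and `sex` by their level order rather than by the label text, and places missing values last. In penguins the levels happen to be alphabetical, so the two orders coincide. A sketch with invented data where they differ (`toy` and its deliberately reversed `sex` levels are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: sex levels deliberately set to male before female
toy <- tibble(
  year = c(2008L, 2007L, 2007L, 2007L),
  sex  = factor(c("female", NA, "female", "male"),
                levels = c("male", "female"))
)

res <- toy %>% arrange(year, sex)
res
# 2007 rows come first, ordered male, female, then NA; 2008 is last
```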

Q5: Create a data frame containing male Adelie penguins reordered by body_mass in descending order.

Q5: Solution
penguins %>% 
  filter(species == "Adelie", sex == "male") %>%
  arrange(desc(body_mass))
## # A tibble: 73 × 8
##    species island    bill_len bill_dep flipper_len body_mass sex    year
##    <fct>   <fct>        <dbl>    <dbl>       <int>     <int> <fct> <int>
##  1 Adelie  Biscoe        43.2     19           197      4775 male   2009
##  2 Adelie  Biscoe        41       20           203      4725 male   2009
##  3 Adelie  Torgersen     42.9     17.6         196      4700 male   2008
##  4 Adelie  Torgersen     39.2     19.6         195      4675 male   2007
##  5 Adelie  Dream         39.8     19.1         184      4650 male   2007
##  6 Adelie  Dream         39.6     18.8         190      4600 male   2007
##  7 Adelie  Biscoe        45.6     20.3         191      4600 male   2009
##  8 Adelie  Torgersen     42.5     20.7         197      4500 male   2007
##  9 Adelie  Dream         37.5     18.5         199      4475 male   2009
## 10 Adelie  Torgersen     41.8     19.4         198      4450 male   2008
## # ℹ 63 more rows
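When only the heaviest few birds matter, dplyr's `slice_max()` combines the descending sort and the truncation. A sketch on invented data (`toy`, `id` and the masses are hypothetical), which also shows that `arrange(desc())` still puts a missing value last:

```r
library(dplyr)

# Hypothetical toy data (invented body masses in grams)
toy <- tibble(
  id        = c("a", "b", "c", "d"),
  body_mass = c(3800L, 4775L, NA, 4200L)
)

# arrange(desc()) keeps every row; the NA row is sorted to the end
sorted <- toy %>% arrange(desc(body_mass))

# slice_max() keeps only the n largest values (ties kept by default)
top2 <- toy %>% slice_max(body_mass, n = 2)

sorted
top2
```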