Skip to content

Data Reshape

Q1. Import data from the sheet "iris_data_long" from the excel workbook (file_path = "./data/iris_data.xlsx"). Make sure the column names are unique and do not contain spaces. Save the imported data to an object called iris_long.

Q1 Solution
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
iris_long<-readxl::read_excel("./data/iris_data.xlsx",sheet="iris_data_long",.name_repair="universal",skip=3)
## New names:
## • `Iris ID` -> `Iris.ID`
## • `Measurement location` -> `Measurement.location`

iris_long
## # A tibble: 600 × 4
##    Iris.ID Species Measurement.location Measurement
##      <dbl> <chr>   <chr>                      <dbl>
##  1       1 setosa  Sepal.Length                 5.1
##  2       1 setosa  Sepal.Width                  3.5
##  3       1 setosa  Petal.Length                 1.4
##  4       1 setosa  Petal.Width                  0.2
##  5       2 setosa  Sepal.Length                 4.9
##  6       2 setosa  Sepal.Width                  3  
##  7       2 setosa  Petal.Length                 1.4
##  8       2 setosa  Petal.Width                  0.2
##  9       3 setosa  Sepal.Length                 4.7
## 10       3 setosa  Sepal.Width                  3.2
## # ℹ 590 more rows

Q2. Reshape iris_long to a wide format. Your new column names will contain names from Measurement.location. Your wide data should look as follows:

# A tibble: 150 × 6
   Iris.ID Species Sepal.Length Sepal.Width Petal.Length Petal.Width
     <dbl> <chr>          <dbl>       <dbl>        <dbl>       <dbl>
 1       1 setosa           5.1         3.5          1.4         0.2
 2       2 setosa           4.9         3            1.4         0.2
 3       3 setosa           4.7         3.2          1.3         0.2
 4       4 setosa           4.6         3.1          1.5         0.2
 5       5 setosa           5           3.6          1.4         0.2
 6       6 setosa           5.4         3.9          1.7         0.4
 7       7 setosa           4.6         3.4          1.4         0.3
 8       8 setosa           5           3.4          1.5         0.2
 9       9 setosa           4.4         2.9          1.4         0.2
10      10 setosa           4.9         3.1          1.5         0.1
# ℹ 140 more rows
Q2 Solution
tidyr::pivot_wider(iris_long, names_from = Measurement.location, values_from = Measurement)
## # A tibble: 150 × 6
##    Iris.ID Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##      <dbl> <chr>          <dbl>       <dbl>        <dbl>       <dbl>
##  1       1 setosa           5.1         3.5          1.4         0.2
##  2       2 setosa           4.9         3            1.4         0.2
##  3       3 setosa           4.7         3.2          1.3         0.2
##  4       4 setosa           4.6         3.1          1.5         0.2
##  5       5 setosa           5           3.6          1.4         0.2
##  6       6 setosa           5.4         3.9          1.7         0.4
##  7       7 setosa           4.6         3.4          1.4         0.3
##  8       8 setosa           5           3.4          1.5         0.2
##  9       9 setosa           4.4         2.9          1.4         0.2
## 10      10 setosa           4.9         3.1          1.5         0.1
## # ℹ 140 more rows

Q3. Let's use table4a from the tidyr package. Use pivot_longer() to place the year columns in a column named year and their values in a column named cases.

data(table4a)
table4a
# A tibble: 3 × 3
  country     `1999` `2000`
  <chr>        <dbl>  <dbl>
1 Afghanistan    745   2666
2 Brazil       37737  80488
3 China       212258 213766

The resulting data frame should appear as follows:

# A tibble: 6 × 3
  country     year   cases
  <chr>       <chr>  <dbl>
1 Afghanistan 1999     745
2 Afghanistan 2000    2666
3 Brazil      1999   37737
4 Brazil      2000   80488
5 China       1999  212258
6 China       2000  213766
Q3 Solution
pivot_longer(table4a,2:3, names_to = "year", values_to = "cases")
## # A tibble: 6 × 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Afghanistan 2000    2666
## 3 Brazil      1999   37737
## 4 Brazil      2000   80488
## 5 China       1999  212258
## 6 China       2000  213766

Q4. Separate the column rate from tidyr's table3 into two columns: cases and population.

data(table3)
table3
# A tibble: 6 × 3
  country      year rate             
  <chr>       <dbl> <chr>            
1 Afghanistan  1999 745/19987071     
2 Afghanistan  2000 2666/20595360    
3 Brazil       1999 37737/172006362  
4 Brazil       2000 80488/174504898  
5 China        1999 212258/1272915272
6 China        2000 213766/1280428583

The result should appear as follows:

# A tibble: 6 × 4
  country      year cases  population
  <chr>       <dbl> <chr>  <chr>     
1 Afghanistan  1999 745    19987071  
2 Afghanistan  2000 2666   20595360  
3 Brazil       1999 37737  172006362 
4 Brazil       2000 80488  174504898 
5 China        1999 212258 1272915272
6 China        2000 213766 1280428583
Q4 Solution
separate(table3, rate, into = c("cases", "population"))
## # A tibble: 6 × 4
##   country      year cases  population
##   <chr>       <dbl> <chr>  <chr>     
## 1 Afghanistan  1999 745    19987071  
## 2 Afghanistan  2000 2666   20595360  
## 3 Brazil       1999 37737  172006362 
## 4 Brazil       2000 80488  174504898 
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583

Reshape challenge

Q5 Use pivot_longer to reshape countB. You will need to import countB (file_path = "./data/countB.csv"). Your reshaped data should look the same as the data below.

# A tibble: 27 × 4
   Feature Replicate SampleA SampleB
   <chr>   <chr>       <int>   <int>
 1 Tspan6  1             703      71
 2 Tspan6  2             567     970
 3 Tspan6  3             867     242
 4 TNMD    1             490     342
 5 TNMD    2             482     935
 6 TNMD    3              18     469
 7 DPM1    1             921     661
 8 DPM1    2             797       8
 9 DPM1    3             622     500
10 SCYL3   1             335     774
# ℹ 17 more rows
Q5 Solution
countB<-read.csv("../data/countB.csv",row.names=1) %>% rownames_to_column("Feature")

countB_l<-pivot_longer(countB,
        cols=2:length(countB),
        names_to = c(".value", "Replicate"),
        names_sep = "_")

tibble(countB_l)
## # A tibble: 27 × 4
##    Feature Replicate SampleA SampleB
##    <chr>   <chr>       <int>   <int>
##  1 Tspan6  1             703      71
##  2 Tspan6  2             567     970
##  3 Tspan6  3             867     242
##  4 TNMD    1             490     342
##  5 TNMD    2             482     935
##  6 TNMD    3              18     469
##  7 DPM1    1             921     661
##  8 DPM1    2             797       8
##  9 DPM1    3             622     500
## 10 SCYL3   1             335     774
## # ℹ 17 more rows