Data Reshape
Q1. Import data from the sheet "iris_data_long" from the excel workbook (file_path = "./data/iris_data.xlsx"). Make sure the column names are unique and do not contain spaces. Save the imported data to an object called iris_long
.
Q1 Solution
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
iris_long<-readxl::read_excel("./data/iris_data.xlsx",sheet="iris_data_long",.name_repair="universal",skip=3)
## New names:
## • `Iris ID` -> `Iris.ID`
## • `Measurement location` -> `Measurement.location`
iris_long
## # A tibble: 600 × 4
## Iris.ID Species Measurement.location Measurement
## <dbl> <chr> <chr> <dbl>
## 1 1 setosa Sepal.Length 5.1
## 2 1 setosa Sepal.Width 3.5
## 3 1 setosa Petal.Length 1.4
## 4 1 setosa Petal.Width 0.2
## 5 2 setosa Sepal.Length 4.9
## 6 2 setosa Sepal.Width 3
## 7 2 setosa Petal.Length 1.4
## 8 2 setosa Petal.Width 0.2
## 9 3 setosa Sepal.Length 4.7
## 10 3 setosa Sepal.Width 3.2
## # ℹ 590 more rows
Q2. Reshape iris_long
to a wide format. Your new column names will contain names from Measurement.location
. Your wide data should look as follows:
# A tibble: 150 × 6
Iris.ID Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 setosa 5.1 3.5 1.4 0.2
2 2 setosa 4.9 3 1.4 0.2
3 3 setosa 4.7 3.2 1.3 0.2
4 4 setosa 4.6 3.1 1.5 0.2
5 5 setosa 5 3.6 1.4 0.2
6 6 setosa 5.4 3.9 1.7 0.4
7 7 setosa 4.6 3.4 1.4 0.3
8 8 setosa 5 3.4 1.5 0.2
9 9 setosa 4.4 2.9 1.4 0.2
10 10 setosa 4.9 3.1 1.5 0.1
# ℹ 140 more rows
Q2 Solution
tidyr::pivot_wider(iris_long, names_from = Measurement.location, values_from = Measurement)
## # A tibble: 150 × 6
## Iris.ID Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 setosa 5.1 3.5 1.4 0.2
## 2 2 setosa 4.9 3 1.4 0.2
## 3 3 setosa 4.7 3.2 1.3 0.2
## 4 4 setosa 4.6 3.1 1.5 0.2
## 5 5 setosa 5 3.6 1.4 0.2
## 6 6 setosa 5.4 3.9 1.7 0.4
## 7 7 setosa 4.6 3.4 1.4 0.3
## 8 8 setosa 5 3.4 1.5 0.2
## 9 9 setosa 4.4 2.9 1.4 0.2
## 10 10 setosa 4.9 3.1 1.5 0.1
## # ℹ 140 more rows
Q3. Let's use table4a
from the tidyr
package. Use pivot_longer()
to place the year columns in a column named year
and their values in a column named cases
.
data(table4a)
table4a
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
The resulting data frame should appear as follows:
# A tibble: 6 × 3
country year cases
<chr> <chr> <dbl>
1 Afghanistan 1999 745
2 Afghanistan 2000 2666
3 Brazil 1999 37737
4 Brazil 2000 80488
5 China 1999 212258
6 China 2000 213766
Q3 Solution
pivot_longer(table4a,2:3, names_to = "year", values_to = "cases")
## # A tibble: 6 × 3
## country year cases
## <chr> <chr> <dbl>
## 1 Afghanistan 1999 745
## 2 Afghanistan 2000 2666
## 3 Brazil 1999 37737
## 4 Brazil 2000 80488
## 5 China 1999 212258
## 6 China 2000 213766
Q4. Separate the column rate
from tidyr's table3 into two columns: cases
and population
.
data(table3)
table3
# A tibble: 6 × 3
country year rate
<chr> <dbl> <chr>
1 Afghanistan 1999 745/19987071
2 Afghanistan 2000 2666/20595360
3 Brazil 1999 37737/172006362
4 Brazil 2000 80488/174504898
5 China 1999 212258/1272915272
6 China 2000 213766/1280428583
The result should appear as follows:
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <chr> <chr>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
Q4 Solution
separate(table3, rate, into = c("cases", "population"))
## # A tibble: 6 × 4
## country year cases population
## <chr> <dbl> <chr> <chr>
## 1 Afghanistan 1999 745 19987071
## 2 Afghanistan 2000 2666 20595360
## 3 Brazil 1999 37737 172006362
## 4 Brazil 2000 80488 174504898
## 5 China 1999 212258 1272915272
## 6 China 2000 213766 1280428583
Reshape challenge
Q5 Use pivot_longer
to reshape countB. You will need to import countB (file_path = "./data/countB.csv"). Your reshaped data should look the same as the data below.
# A tibble: 27 × 4
Feature Replicate SampleA SampleB
<chr> <chr> <int> <int>
1 Tspan6 1 703 71
2 Tspan6 2 567 970
3 Tspan6 3 867 242
4 TNMD 1 490 342
5 TNMD 2 482 935
6 TNMD 3 18 469
7 DPM1 1 921 661
8 DPM1 2 797 8
9 DPM1 3 622 500
10 SCYL3 1 335 774
# ℹ 17 more rows
Q5 Solution
countB<-read.csv("../data/countB.csv",row.names=1) %>% rownames_to_column("Feature")
countB_l<-pivot_longer(countB,
cols=2:length(countB),
names_to = c(".value", "Replicate"),
names_sep = "_")
tibble(countB_l)
## # A tibble: 27 × 4
## Feature Replicate SampleA SampleB
## <chr> <chr> <int> <int>
## 1 Tspan6 1 703 71
## 2 Tspan6 2 567 970
## 3 Tspan6 3 867 242
## 4 TNMD 1 490 342
## 5 TNMD 2 482 935
## 6 TNMD 3 18 469
## 7 DPM1 1 921 661
## 8 DPM1 2 797 8
## 9 DPM1 3 622 500
## 10 SCYL3 1 335 774
## # ℹ 17 more rows