Help Session Lesson 4
Plotting with ggplot2
For the following plots, let's use the diamonds data (?diamonds).
The diamonds dataset comes in ggplot2 and contains information about ~54,000 diamonds, including the price, carat, color, clarity, and cut of each diamond. --- R4DS
library(ggplot2)
data(diamonds)
diamonds
## # A tibble: 53,940 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # ℹ 53,930 more rows
Create a scatter plot demonstrating how carat (x axis) relates to price (y axis).
Solution
ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price))
- Color the points above by clarity and scale the colors using the viridis package, option "magma".
Solution
ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
viridis::scale_color_viridis(discrete = TRUE, option = "magma")
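As a possible alternative, ggplot2 ships its own viridis scales, so the same palette can be requested without calling the viridis package directly. The following is a minimal sketch, assuming ggplot2 version 3.0.0 or later; the object name p_magma is only illustrative and does not appear in the lesson.

```r
library(ggplot2)

# scale_color_viridis_d() is the discrete viridis scale built into ggplot2;
# option = "magma" requests the same palette as the solution above.
p_magma <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
  scale_color_viridis_d(option = "magma")

# Printing the object draws the plot.
p_magma
```

Either approach should give the same colors; the built-in scale simply avoids one extra package dependency.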
- Apply the complete theme, theme_classic().
Solution
ggplot(data = diamonds) +
geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
viridis::scale_color_viridis(discrete = TRUE, option = "magma") +
theme_classic()
Create a bar chart displaying the number of diamonds per cut. Be sure to check out the help documentation for geom_bar().
Solution
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
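To see what geom_bar() does behind the scenes, it can help to compute the counts by hand. The sketch below assumes the dplyr package is installed; cut_counts is a hypothetical object name used only for illustration. It counts the rows per cut and draws the bars with geom_col(), which takes bar heights from a column instead of counting rows itself.

```r
library(ggplot2)
library(dplyr)

# Count the number of diamonds in each cut category explicitly.
cut_counts <- count(diamonds, cut)
cut_counts

# geom_col() uses the precomputed counts in column n as the bar heights,
# so this plot should look the same as the geom_bar() solution above.
ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = n))
```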
- Fill the bars by clarity. Modify the position of the bars so that they are set to dodge.
Solution
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
Examine how the price of a diamond changes across different diamond color categories using a boxplot.
Solution
ggplot(data = diamonds) +
geom_boxplot(mapping = aes(x = color, y = price))
- Apply the complete theme, theme_bw().
Solution
ggplot(data = diamonds) +
geom_boxplot(mapping = aes(x = color, y = price)) +
theme_bw()
- Change the font of all text elements to "Times New Roman" and change the size of the font to 12. Bold the x and y axis labels.
Solution
ggplot(data = diamonds) +
geom_boxplot(mapping = aes(x = color, y = price)) +
theme_bw() +
theme(text = element_text(family = "Times New Roman", size = 12),
axis.title = element_text(face = "bold"))
Challenge Question
Using the boxplot you created above, reorder the x-axis so that color is organized from worst (J) to best (D). There are multiple possible solutions. Hint: Check out the functions in the forcats package (a core tidyverse package).
Solution
ggplot(data = diamonds) +
geom_boxplot(mapping = aes(x = forcats::fct_rev(color), y = price))+
labs(x="color",y="price")+
theme_bw()+
theme(text=element_text(family="Times New Roman",size=12),
axis.title = element_text(face="bold"))
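Since there are multiple possible solutions, here is one alternative sketch that uses only base R instead of forcats. It assumes the levels of color are stored in the order D to J, as in the ggplot2 version of diamonds; diamonds_rev is a hypothetical object name introduced just for this example.

```r
library(ggplot2)

# Work on a copy so the original diamonds object is left untouched.
diamonds_rev <- diamonds

# Rebuild the factor with its levels reversed, so J comes first and D last.
diamonds_rev$color <- factor(diamonds_rev$color,
                             levels = rev(levels(diamonds_rev$color)))

ggplot(data = diamonds_rev) +
  geom_boxplot(mapping = aes(x = color, y = price)) +
  theme_bw()
```

Because the column itself is reordered here, the x-axis label stays "color" without an extra labs() call.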
Putting it all together
- Load in the comma-separated file "./data/countB.csv" and save it to an object named gcounts.
Solution
gcounts<-readr::read_csv("../data/countB.csv")
## New names:
## Rows: 9 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ...1
## dbl (6): SampleA_1, SampleA_2, SampleA_3, SampleB_1, SampleB_2, SampleB_3
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
colnames(gcounts)[1] <- "Gene"
gcounts
## # A tibble: 9 × 7
## Gene SampleA_1 SampleA_2 SampleA_3 SampleB_1 SampleB_2 SampleB_3
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Tspan6 703 567 867 71 970 242
## 2 TNMD 490 482 18 342 935 469
## 3 DPM1 921 797 622 661 8 500
## 4 SCYL3 335 216 222 774 979 793
## 5 FGR 574 574 515 584 941 344
## 6 CFH 577 792 672 104 192 936
## 7 FUCA2 798 766 995 27 756 546
## 8 GCLC 822 874 923 705 667 522
## 9 NFYA 622 793 918 868 334 64
- Plot the values (gene counts) from Sample A on the y axis and Sample B on the x axis. Hint: you will need to reshape the data to accomplish this task.
Solution
library(tidyverse)
# Reshape the data so that all replicates are stacked in a single column per treatment
gcount2 <- pivot_longer(gcounts,
cols = 2:length(gcounts),
names_to = c(".value", "Replicate"),
names_sep = "_")
ggplot(data = gcount2) +
geom_point(aes(x = SampleB, y = SampleA))
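The ".value" entry in names_to can be hard to picture, so here is a small sketch of the same reshaping on made-up data. The tibble toy and its numbers are invented purely for illustration and are not part of countB.csv. The part of each column name before the underscore becomes a new column (SampleA, SampleB), and the part after it goes into Replicate.

```r
library(tidyr)
library(tibble)

# Invented toy data with the same column naming pattern as gcounts.
toy <- tibble(
  Gene = c("g1", "g2"),
  SampleA_1 = c(1, 2),
  SampleA_2 = c(3, 4),
  SampleB_1 = c(5, 6),
  SampleB_2 = c(7, 8)
)

# Split each column name at "_": the first piece names the output column
# (".value"), the second piece is stored in the Replicate column.
toy_long <- pivot_longer(
  toy,
  cols = -Gene,
  names_to = c(".value", "Replicate"),
  names_sep = "_"
)

toy_long
```

Each gene now has one row per replicate, with the Sample A and Sample B counts side by side, which is the layout the scatter plot needs.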
- Add a linear model to your scatter plot (see geom_smooth()). Also, the trend line should be red and the confidence interval around the trend line should NOT be visible. Change the panel background to white.
Solution
ggplot(data = gcount2, aes(x = SampleB, y = SampleA)) +
geom_point() +
geom_smooth(method = "lm", color = "red", se = FALSE) +
# linewidth replaces the size argument for lines, which is deprecated as of ggplot2 3.4.0
theme(panel.background = element_rect(fill = "white", colour = "black"),
panel.grid.major = element_line(color = "black", linewidth = 0.05))