Help Session Lesson 4

Plotting with ggplot2

For the following plots, let's use the diamonds data (?diamonds).

"The diamonds dataset comes in ggplot2 and contains information about ~54,000 diamonds, including the price, carat, color, clarity, and cut of each diamond." (Source: R4DS)

library(ggplot2)
data(diamonds)
diamonds

## # A tibble: 53,940 × 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # ℹ 53,930 more rows

Create a scatter plot demonstrating how carat (x axis) relates to price (y axis).

Solution

ggplot(data = diamonds) + 
  geom_point(mapping = aes(x = carat, y = price))

Color the points above by clarity and scale the colors using the viridis package, option "magma".

Solution

ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "magma")
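The same palette can probably be reached without the viridis package, since ggplot2 ships its own viridis scales. The following is a minimal sketch, assuming ggplot2 3.0.0 or later; `p_magma` is an illustrative name that does not appear in the lesson.

```r
library(ggplot2)

# Same scatter plot as in the solution above, but using ggplot2's built-in
# discrete viridis scale (assumes ggplot2 >= 3.0.0).
# p_magma is an illustrative object name.
p_magma <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
  scale_color_viridis_d(option = "magma")

p_magma
```

Either approach should give an equivalent color mapping; the built-in scale simply avoids one extra package dependency.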

Apply the complete theme, theme_classic().

Solution

ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "magma") +
  theme_classic()

Create a bar chart displaying the number of diamonds per cut. Be sure to check out the help documentation for geom_bar().

Solution

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))
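As the geom_bar() help page describes, the bar heights come from counting rows in each group (stat_count()), which is why no y aesthetic is supplied. One way to check this is to count the rows yourself and plot the precomputed values with geom_col(). This is only a sketch; `cut_counts` is an illustrative name.

```r
library(ggplot2)

# Count the diamonds in each cut category; these counts are the bar heights.
# cut_counts is an illustrative name; table() yields the columns cut and Freq.
cut_counts <- as.data.frame(table(cut = diamonds$cut))
cut_counts

# geom_col() plots precomputed values, so this should reproduce the bar chart
ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = Freq))
```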

Fill the bars by clarity. Modify the position of the bars so that they are set to dodge.

Solution

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

Examine how the price of a diamond changes across different diamond color categories using a boxplot.

Solution

ggplot(data = diamonds) + 
  geom_boxplot(mapping = aes(x = color, y = price))

Apply the complete theme, theme_bw().

Solution

ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = color, y = price)) +
  theme_bw()

Change the font of all text elements to "Times New Roman" and change the size of the font to 12. Bold the x and y axis labels.

Solution

ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = color, y = price)) +
  theme_bw() +
  theme(text = element_text(family = "Times New Roman", size = 12),
        axis.title = element_text(face = "bold"))

Challenge Question

Using the boxplot you created above, reorder the x-axis so that color is organized from worst (J) to best (D). There are multiple possible solutions. Hint: check out the functions in the forcats package (a core tidyverse package).

Solution

ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = forcats::fct_rev(color), y = price)) +
  labs(x = "color", y = "price") +  # restore the axis titles changed by fct_rev()
  theme_bw() +
  theme(text = element_text(family = "Times New Roman", size = 12),
        axis.title = element_text(face = "bold"))
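Since the question notes that there are multiple possible solutions, here is one alternative sketch that uses only base R to reverse the factor levels before plotting. `diamonds_rev` is an illustrative name, not part of the lesson.

```r
library(ggplot2)

# Alternative to forcats::fct_rev(): reverse the factor levels with base R.
# diamonds_rev is an illustrative copy so the original data stay untouched.
diamonds_rev <- diamonds
diamonds_rev$color <- factor(diamonds_rev$color,
                             levels = rev(levels(diamonds_rev$color)))

# The levels now run from J (worst) to D (best)
levels(diamonds_rev$color)

ggplot(data = diamonds_rev) +
  geom_boxplot(mapping = aes(x = color, y = price)) +
  theme_bw()
```

Because the column itself is modified, the axis title stays "color" and no labs() call is needed here.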

Putting it all together

Load in the comma separated file "./data/countB.csv" and save to an object named gcounts.

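Solution

# The path is relative to your working directory; use "./data/countB.csv"
# if you are running this from the project root.
gcounts <- readr::read_csv("../data/countB.csv")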

## New names:
## • `` -> `...1`
## Rows: 9 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ...1
## dbl (6): SampleA_1, SampleA_2, SampleA_3, SampleB_1, SampleB_2, SampleB_3
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

colnames(gcounts)[1] <- "Gene"  # the first column was read in without a name
gcounts

## # A tibble: 9 × 7
##   Gene   SampleA_1 SampleA_2 SampleA_3 SampleB_1 SampleB_2 SampleB_3
##   <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 Tspan6       703       567       867        71       970       242
## 2 TNMD         490       482        18       342       935       469
## 3 DPM1         921       797       622       661         8       500
## 4 SCYL3        335       216       222       774       979       793
## 5 FGR          574       574       515       584       941       344
## 6 CFH          577       792       672       104       192       936
## 7 FUCA2        798       766       995        27       756       546
## 8 GCLC         822       874       923       705       667       522
## 9 NFYA         622       793       918       868       334        64

Plot the values (gene counts) from Sample A on the y axis and Sample B on the x axis. Hint: you will need to reshape the data to accomplish this task.

Solution

library(tidyverse)

# Reshape so that the replicates are stacked in a single column per treatment
# (SampleA, SampleB), with the replicate number stored in "Replicate"
gcount2 <- pivot_longer(gcounts,
  cols = -Gene,  # every column except Gene
  names_to = c(".value", "Replicate"),
  names_sep = "_"
)

ggplot(data = gcount2) +
  geom_point(aes(x = SampleB, y = SampleA))
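To see what the ".value" reshape produces, it can help to run it on a small piece of the data and inspect the result. The sketch below rebuilds the first three genes from the gcounts table printed above; `gc_small` and `gc_long` are illustrative names.

```r
library(tidyr)
library(tibble)

# First three genes copied from the gcounts table printed above
gc_small <- tribble(
  ~Gene,    ~SampleA_1, ~SampleA_2, ~SampleA_3, ~SampleB_1, ~SampleB_2, ~SampleB_3,
  "Tspan6", 703, 567, 867,  71, 970, 242,
  "TNMD",   490, 482,  18, 342, 935, 469,
  "DPM1",   921, 797, 622, 661,   8, 500
)

# ".value" keeps the part before "_" (SampleA / SampleB) as column names;
# the part after "_" (1, 2, 3) is stored in the Replicate column
gc_long <- pivot_longer(gc_small,
  cols = -Gene,
  names_to = c(".value", "Replicate"),
  names_sep = "_"
)

dim(gc_long)    # 3 genes x 3 replicates = 9 rows, 4 columns
names(gc_long)  # Gene, Replicate, SampleA, SampleB
gc_long
```

Each row now pairs one SampleA count with the SampleB count from the same gene and replicate, which is the shape the scatter plot needs.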

Add a linear model fit to your scatter plot (see geom_smooth()). The trend line should be red, and the confidence interval around it should NOT be visible. Change the panel background to white.

Solution

ggplot(data = gcount2, aes(x = SampleB, y = SampleA)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  theme(panel.background = element_rect(fill = "white", color = "black"),
        panel.grid.major = element_line(color = "black", linewidth = 0.05))
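With method = "lm", geom_smooth() fits a straight line of y on x by default (formula y ~ x). To look at the fitted line itself, a roughly equivalent model can be fitted directly with lm(). This is a sketch only: the values are copied from the gcounts table above, and `counts_long` and `fit` are illustrative names.

```r
# Values copied from the gcounts table above, stacked gene by gene
# (three replicates per gene, same order as the reshaped data)
counts_long <- data.frame(
  SampleA = c(703, 567, 867, 490, 482,  18, 921, 797, 622,
              335, 216, 222, 574, 574, 515, 577, 792, 672,
              798, 766, 995, 822, 874, 923, 622, 793, 918),
  SampleB = c( 71, 970, 242, 342, 935, 469, 661,   8, 500,
              774, 979, 793, 584, 941, 344, 104, 192, 936,
               27, 756, 546, 705, 667, 522, 868, 334,  64)
)

# Same model form that geom_smooth(method = "lm") uses by default: y ~ x
fit <- lm(SampleA ~ SampleB, data = counts_long)

coef(fit)  # intercept and slope of the red trend line
```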