Help Session Lesson 4

Plotting with ggplot2

For the following plots, let's use the diamonds data (?diamonds).

The diamonds dataset comes in ggplot2 and contains information about ~54,000 diamonds, including the price, carat, color, clarity, and cut of each diamond. --- R4DS

library(ggplot2)
data(diamonds)
diamonds
## # A tibble: 53,940 × 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # ℹ 53,930 more rows

Create a scatter plot demonstrating how carat (x axis) relates to price (y axis).

Solution

ggplot(data = diamonds) + 
  geom_point(mapping = aes(x = carat, y = price))

  • Color the points above by clarity and scale the colors using the viridis package, option "magma".

    Solution

    ggplot(data = diamonds) + 
      geom_point(mapping = aes(x = carat, y = price,color=clarity)) +
      viridis::scale_color_viridis(discrete=TRUE,option="magma")
    

  • Apply the complete theme, theme_classic().

    Solution

    ggplot(data = diamonds) + 
      geom_point(mapping = aes(x = carat, y = price,color=clarity)) +
      viridis::scale_color_viridis(discrete=TRUE,option="magma") +
      theme_classic()
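
The same colour scale can also be sketched without the viridis package, since ggplot2 (3.0.0 and later) bundles its own viridis scales. The object name `p_magma` is only illustrative; `scale_color_viridis_d()` is the discrete variant, which suits `clarity` because it is an ordered factor.

```r
library(ggplot2)

# Same plot as the solution above, but using the viridis scale that ships
# with ggplot2 instead of viridis::scale_color_viridis(discrete = TRUE).
p_magma <- ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, color = clarity)) +
  scale_color_viridis_d(option = "magma") +
  theme_classic()

p_magma
```

Both versions should give the same palette; the built-in scale simply avoids one extra package dependency.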
    

Create a bar chart displaying the number of diamonds per cut. Be sure to check out the help documentation for geom_bar().

Solution

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))

  • Fill the bars by clarity. Modify the position of the bars so that they are set to dodge.

    Solution

    ggplot(data = diamonds) + 
      geom_bar(mapping = aes(x = cut, fill=clarity),position="dodge")
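
As the help page for geom_bar() explains, the bar heights come from counting the rows in each group. A minimal sketch of the same chart with the counting done by hand is shown below; `cut_counts` is a hypothetical helper name, and geom_col() is used because it plots the heights supplied in the data rather than counting.

```r
library(ggplot2)

# Count the diamonds in each cut by hand. as.data.frame(table(...)) returns
# one row per cut with the columns `cut` and `Freq`.
cut_counts <- as.data.frame(table(cut = diamonds$cut))
cut_counts

# geom_col() draws bars whose heights are taken directly from `Freq`.
ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = Freq))
```

This should match the geom_bar() plot above, which is a quick way to check what the default count statistic is doing.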
    

Examine how the price of a diamond changes across different diamond color categories using a boxplot.

Solution

ggplot(data = diamonds) + 
  geom_boxplot(mapping = aes(x = color, y = price))

  • Apply the complete theme, theme_bw().

    Solution

    ggplot(data = diamonds) + 
      geom_boxplot(mapping = aes(x = color, y = price))+
      theme_bw()
    

  • Change the font of all text elements to "Times New Roman" and change the size of the font to 12. Bold the x and y axis labels.

    Solution

    ggplot(data = diamonds) + 
      geom_boxplot(mapping = aes(x = color, y = price))+
      theme_bw()+
      theme(text=element_text(family="Times New Roman",size=12), 
            axis.title = element_text(face="bold"))
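
Whether "Times New Roman" is found depends on the fonts installed on your system and on the graphics device in use. If R warns that the font family is not found, one possible workaround is the generic "serif" family, sketched below. This is an assumption about what is acceptable for the exercise: "serif" usually maps to a Times-like typeface, but the exact font varies by device. The name `p_serif` is only illustrative.

```r
library(ggplot2)

# Assumption: "serif" is a portable stand-in for "Times New Roman".
# The typeface actually drawn depends on the graphics device.
p_serif <- ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = color, y = price)) +
  theme_bw() +
  theme(text = element_text(family = "serif", size = 12),
        axis.title = element_text(face = "bold"))

p_serif
```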
    

Challenge Question

Using the boxplot you created above, reorder the x-axis so that color is organized from worst (J) to best (D). There are multiple possible solutions. Hint: Check out the functions in the forcats package (a core tidyverse package).

Solution

ggplot(data = diamonds) + 
  geom_boxplot(mapping = aes(x = forcats::fct_rev(color), y = price))+
  labs(x="color",y="price")+
  theme_bw()+
  theme(text=element_text(family="Times New Roman",size=12), 
        axis.title = element_text(face="bold"))
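
Since there are multiple possible solutions, two alternatives that do not use forcats are sketched here. The names `p_rev` and `color_rev` are hypothetical, and passing a function to `limits` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Alternative 1: leave the data alone and reverse the axis order in the scale.
# The axis title stays "color" because the mapping itself is unchanged.
p_rev <- ggplot(data = diamonds) +
  geom_boxplot(mapping = aes(x = color, y = price)) +
  scale_x_discrete(limits = rev) +
  theme_bw()

p_rev

# Alternative 2: reverse the factor levels with base R before plotting.
color_rev <- factor(diamonds$color, levels = rev(levels(diamonds$color)))
levels(color_rev)
```

The first alternative changes only how the axis is drawn, while the second changes the factor itself, which matters if the reordered variable is reused elsewhere.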

Putting it all together

  • Load in the comma separated file "./data/countB.csv" and save to an object named gcounts.

    Solution

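    # The path is relative to your working directory; use "./data/countB.csv" if the data folder sits inside it.
    gcounts<-readr::read_csv("../data/countB.csv") 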
    
    ## New names:
    ## • `` -> `...1`
    ## Rows: 9 Columns: 7
    ## ── Column specification ────────────────────────────────────────
    ## Delimiter: ","
    ## chr (1): ...1
    ## dbl (6): SampleA_1, SampleA_2, SampleA_3, SampleB_1, SampleB_2, SampleB_3
    ## ℹ Use `spec()` to retrieve the full column specification for this data.
    ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    
    colnames(gcounts)[1]<-"Gene"
    gcounts
    

    ## # A tibble: 9 × 7
    ##   Gene   SampleA_1 SampleA_2 SampleA_3 SampleB_1 SampleB_2 SampleB_3
    ##   <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
    ## 1 Tspan6       703       567       867        71       970       242
    ## 2 TNMD         490       482        18       342       935       469
    ## 3 DPM1         921       797       622       661         8       500
    ## 4 SCYL3        335       216       222       774       979       793
    ## 5 FGR          574       574       515       584       941       344
    ## 6 CFH          577       792       672       104       192       936
    ## 7 FUCA2        798       766       995        27       756       546
    ## 8 GCLC         822       874       923       705       667       522
    ## 9 NFYA         622       793       918       868       334        64
    

  • Plot the values (gene counts) from Sample A on the y axis and Sample B on the x axis. Hint: you will need to reshape the data to accomplish this task.

    Solution

    library(tidyverse)
    
    gcount2<-pivot_longer(gcounts,
      cols=2:length(gcounts),
      names_to = c(".value", "Replicate"),
      names_sep = "_"
    ) #reshaping data so that all replicates are stacked in a single column by treatment
    
    ggplot(data=gcount2) +
      geom_point(aes(x=SampleB,y=SampleA))
    

  • Add a linear model to your scatter plot (See geom_smooth()). Also, the trend line should be red and the confidence interval around the trend line should NOT be visible. Change the panel background to white.

    Solution

    ggplot(data=gcount2,aes(x=SampleB,y=SampleA)) +
      geom_point() +
      geom_smooth(method="lm",color="red", se=FALSE)+
      theme(panel.background = element_rect(fill = "white", colour = "black"),
            panel.grid.major = element_line(color="black",linewidth = 0.05)) # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
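
If you do not have the CSV file to hand, the reshaping and model-fitting steps can be followed with the sketch below, which rebuilds the table from the printed output above. The names `gcounts_demo`, `gcount_long`, and `fit` are hypothetical and used only for illustration. It shows what the special ".value" entry in `names_to` does, and fits the model that `geom_smooth(method = "lm")` draws with its default formula `y ~ x`.

```r
library(tibble)
library(tidyr)

# Counts copied from the printed gcounts tibble above.
gcounts_demo <- tribble(
  ~Gene,    ~SampleA_1, ~SampleA_2, ~SampleA_3, ~SampleB_1, ~SampleB_2, ~SampleB_3,
  "Tspan6", 703, 567, 867,  71, 970, 242,
  "TNMD",   490, 482,  18, 342, 935, 469,
  "DPM1",   921, 797, 622, 661,   8, 500,
  "SCYL3",  335, 216, 222, 774, 979, 793,
  "FGR",    574, 574, 515, 584, 941, 344,
  "CFH",    577, 792, 672, 104, 192, 936,
  "FUCA2",  798, 766, 995,  27, 756, 546,
  "GCLC",   822, 874, 923, 705, 667, 522,
  "NFYA",   622, 793, 918, 868, 334,  64
)

# Each column name such as "SampleA_1" is split at "_". The first piece
# (".value") becomes an output column name (SampleA or SampleB); the second
# piece is stored in the Replicate column.
gcount_long <- pivot_longer(gcounts_demo,
  cols = -Gene,
  names_to = c(".value", "Replicate"),
  names_sep = "_"
)

dim(gcount_long)   # 27 rows (9 genes x 3 replicates) and 4 columns
head(gcount_long)

# The same straight-line fit that geom_smooth(method = "lm") overlays.
fit <- lm(SampleA ~ SampleB, data = gcount_long)
coef(fit)
```

Here `cols = -Gene` selects every column except `Gene`, which is equivalent to `2:length(gcounts)` in the solution above but does not depend on column positions.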