Recommendations and Tips for Creating Effective Plots with ggplot2

Learning Objectives

Evaluate general principles and best practices for designing clear, publication-quality figures in ggplot2.
Construct multi-panel figures using tools such as patchwork.
Identify and explore specialized R packages that support particular plot types.
Write simple R functions that wrap ggplot2 code to streamline the creation of repeatable or customized plot templates.

In the previous lessons, we learned the basics of the grammar of graphics. In this lesson, we will focus on miscellaneous topics that will help you on your plot-making journey.

Included topics:

recommendations for publishable figures
additional packages that enhance ggplot2 functionality (e.g., patchwork, gghighlight, ggthemes, ggrepel, scales)
creating plotting functions
resources for further learning

Recommendations for creating publishable figures

(Inspired by Visualizing Data in the Tidyverse, a Coursera lesson)

Consider whether the plot type you have chosen is the best way to convey your message
Make your plot visually appealing
- Careful color selection - color blind friendly if possible (e.g., library(viridis))
- Eliminate unnecessary white space
- Carefully choose themes including font types
Label all axes with concise and informative labels
- These labels should be straightforward and adequately describe the data
Ask yourself "Does the data make sense?"
- Does the data plotted address the question you are answering?
Try not to mislead the audience
- Often this means starting the y-axis at 0
- Keep axes consistent when arranging facets or multiple plots
- Keep colors consistent across plots
Do not try to convey too much information in the same plot
- Keep plots fairly simple
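
Several of these recommendations can be combined in a few lines. The sketch below is only an illustration: it uses R's built-in mtcars data rather than the lesson's data, the viridis scale shown is the one that ships with ggplot2 itself, and the labels, theme, and font size are assumed choices rather than requirements.

```r
library(ggplot2)

# Built-in example data; cyl is converted to a factor so it maps to discrete colors
cars <- transform(mtcars, cyl = factor(cyl))

p_pub <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point(size = 2) +
  # Color-blind-friendly discrete palette built into ggplot2
  scale_color_viridis_d(end = 0.9) +
  # Concise, informative labels that include units
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders") +
  # A clean theme with an explicit base font size
  theme_bw(base_size = 12)

p_pub
```

Reusing the same scale and theme calls across related plots is one way to keep colors and fonts consistent.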

There are many complementary R packages related to creating publishable figures using ggplot2. You can browse them in the ggplot2 extensions gallery, where packages are listed by popularity by default.

Here is a sampling of data visualization packages you may be interested in:

Warning

These packages do not exclusively use ggplot2 for graphic generation.

Genomics

gggenomes - extends the grammar of graphics for comparative genomics.
Gviz - plot data and annotation information along genomic coordinates.
ComplexHeatmap - generate simple or complex heatmaps
EnhancedVolcano - generate high quality, publication ready volcano plots
pcaExplorer - general-purpose interactive companion tool for RNA-seq analysis (uses a Shiny application)
OmicCircos - generate high quality circular plots for omics data.

You may also search Bioconductor for plotting packages using terms such as "plot" or "visualization": https://bioconductor.org/packages/release/BiocViews.html#___Software

Can I add ggplot2 layers?

There are many -omics related packages that include data visualization wrappers (e.g., DESeq2, Seurat, etc.). These are not visualization-specific packages, but the plots returned by many of their functions can be customized by adding ggplot2 layers. How do we know if we can add ggplot2 layers? Try any or all of the following:

Check imports → does package depend on ggplot2? (e.g., packageDescription("package")$Imports)
Check the source code. Does it use ggplot2? (e.g., DESeq2::plotPCA)
1. Print the function by typing its name without parentheses: DESeq2::plotPCA
2. showMethods("plotPCA")
3. getMethod("plotPCA", "DESeqTransform")
Inspect the output object → class(x) includes "gg" or "ggplot"?
Try adding a layer → does + theme_minimal() work?
Read examples/vignettes → do they use + syntax?
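
Some of these checks can be run directly in the console. The sketch below is a hedged illustration: ggplot2 stands in for the package being inspected so the code runs without any -omics packages installed, and wrapper_output is a hypothetical stand-in for whatever a package's plotting function returns.

```r
library(ggplot2)

# Check imports: list the packages that a package imports
# (ggplot2 itself is inspected here so the example runs anywhere)
imports <- packageDescription("ggplot2")$Imports
imports_scales <- grepl("scales", imports)

# Hypothetical stand-in for the object returned by a plotting wrapper
wrapper_output <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Inspect the output object: ggplot objects carry "gg" / "ggplot" classes
class(wrapper_output)
can_add_layers <- inherits(wrapper_output, "ggplot")

# Try adding layers only if the object is a ggplot
if (can_add_layers) {
  customized <- wrapper_output +
    theme_minimal() +
    labs(title = "Customized wrapper output")
}
```

If the class check fails, the function probably draws with base graphics or grid, and `+` will not work on its output.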

Check out this BTEP tutorial on EnhancedVolcano and ComplexHeatmap.

Statistics integration

ggpubr - generate out-of-the-box publication quality plots. Includes statistical integration.
- Coding Club tutorial: https://bioinformatics.ccr.cancer.gov/docs/btep-coding-club/CC2024/ggpubr/Intro_to_ggpubr/
ggfortify - easily visualize statistical results including PCA.
factoextra - visualize multivariate statistics (e.g., PCA).

Combining plots

patchwork - the go-to package for combining plots.

Example:

library(tidyverse)

library(patchwork)

sc <- read.csv("./data/sc.csv")

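# Boxplot of total counts by treatment
a <- ggplot(data = sc) +
    geom_boxplot(aes(x = dex, y = TotalCounts))

# Scatter plot of total counts against number of transcripts
b <- ggplot() +
    geom_point(data = sc, aes(x = Num_transcripts, y = TotalCounts))

# With patchwork loaded, + places the two plots side by side
a + b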

cowplot - also includes nice themes and annotation functions.

You may find this BTEP tutorial on combining R graphics useful.
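
Returning to patchwork, layouts beyond a simple a + b are built with a few extra operators and helpers. The sketch below assumes a small made-up data frame, sc_demo, that only borrows the column names of sc; the values and the particular layout are illustrative.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for sc with the same column names
sc_demo <- data.frame(
  dex = rep(c("trt", "untrt"), each = 4),
  Num_transcripts = c(120, 150, 135, 160, 90, 110, 95, 105),
  TotalCounts = c(2100, 2600, 2300, 2800, 1500, 1900, 1600, 1750)
)

a <- ggplot(sc_demo) + geom_boxplot(aes(x = dex, y = TotalCounts))
b <- ggplot(sc_demo) + geom_point(aes(x = Num_transcripts, y = TotalCounts))

# `|` places plots side by side and `/` stacks them
combined <- (a | b) / b +
  # Make the top row twice as tall as the bottom row
  plot_layout(heights = c(2, 1)) +
  # Tag the panels A, B, C for a figure legend
  plot_annotation(tag_levels = "A")

combined
```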

Miscellaneous

gghighlight - highlight specific points, lines, etc. in a plot
scales - tools for working with the ggplot2 scaling infrastructure (functions involving scale).
ggthemes - extra geoms, scales, and themes for ggplot2.
ggrepel - repel overlapping text labels.
plotly - create interactive plots (ggplotly to work with ggplot2 plots).
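
As one small illustration of these helpers, the scales package can format axis labels. The sketch below uses a made-up data frame (demo_counts) and is just one possible use of the package.

```r
library(ggplot2)

# Hypothetical example data with large count values
demo_counts <- data.frame(
  sample = c("S1", "S2", "S3"),
  TotalCounts = c(1250000, 2400000, 1875000)
)

p_scales <- ggplot(demo_counts, aes(x = sample, y = TotalCounts)) +
  geom_col() +
  # Show 1,250,000 style labels instead of scientific notation
  scale_y_continuous(labels = scales::label_comma())

p_scales

# label_comma() returns a function that formats numbers as text
comma_label <- scales::label_comma()(1250000)
comma_label
```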

Note

There are many more packages. Shop around, especially if you are interested in plotting a specific data type.

Using ggplot2 in a function

While we have learned how to use existing functions in R, we have not covered writing functions.

The Syntax

The syntax for writing a function is as follows:

function(x) {
    body # do something with x
}

where function is the keyword that creates a function,
x is one or more arguments,
and body is the code that performs the function's task.

We would name the function by assigning it to an object using function_name <-.

Here is an example.

add5 <- function(x){
    x+5
}

add5(5)

[1] 10

This function named add5 simply adds 5 to whatever number we include as an argument.

Note

When you call a function in R, a bare name passed as an argument is normally looked up as an ordinary object in the environment the function was called from, and its value is what the function body works with (unless evaluation is deliberately captured or delayed with special tools like tidy evaluation). This has important implications.
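
One implication, sketched below with the add5 function from above: a bare name passed as an argument is treated as an ordinary R object, so a name that is not defined in the workspace produces an error. The name not_defined_anywhere is hypothetical and chosen because no such object exists.

```r
add5 <- function(x) {
  x + 5
}

# `not_defined_anywhere` is a hypothetical name with no object behind it;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  add5(not_defined_anywhere),
  error = function(e) conditionMessage(e)
)

msg
```

The same thing happens when a data frame column name is passed as a bare name to an ordinary function, which is why plotting functions need special handling.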

Functions that use ggplot2

Now that you know the basics, you may be interested in creating a function that will plot different sets of data the same way using ggplot2.

However, tidyverse functions use something called "tidy evaluation to allow you to refer to the names of variables inside your data frame without any special treatment". While there are two types of tidy evaluation to be aware of, data-masking and tidy-selection, these are generally beyond the scope of this lesson. You can learn more about tidy evaluation here.

What you really need to know is that when you pass expressions containing column names to a function that uses tidyverse verbs, including aes(), you need to use {{}}. Let's see why.

Let's use our data sc to create a function that makes a boxplot.

my_boxplot<- function(data){
  ggplot(data,aes(x=dex, y = TotalCounts, fill=dex)) + 
    geom_boxplot() +
    geom_point() +
    scale_fill_manual(values=c("red","purple"))+
    theme_bw() +
    labs(x="Treatment",y="Total Counts")
}

my_boxplot(sc)

Here, we need to supply the data frame to use this function, and everything works fine.

But what if we intend to use this function on a data set where the x variable is not "dex"? In that case, we want to supply the column name as an argument.

For example,

my_boxplot_x<- function(data,x){
  ggplot(data,aes(x=x, y = TotalCounts, fill=dex)) + 
    geom_boxplot() +
    geom_point() +
    scale_fill_manual(values=c("red","purple"))+
    theme_bw() +
    labs(x="Treatment",y="Total Counts")


}

my_boxplot_x(sc, dex)

Error in `geom_boxplot()`:
! Problem while computing aesthetics.
i Error occurred in the 1st layer.
Caused by error:
! object 'dex' not found

We run into an error that says "object 'dex' not found". We know "dex" is in sc, so what is happening?

When we run my_boxplot_x(sc, dex), R tries to find an object called dex in our global environment, not in sc. Because dex is not in the global environment, an error is thrown. We need to tell our function not to evaluate the argument right away, but instead to capture it as an expression that is evaluated in the right context (inside aes()).

How do we fix this? We use something called embracing. "Embracing a variable means to wrap it in braces so (e.g.) var becomes {{ var }}. Embracing a variable tells the [Tidyverse] verb to use the value stored inside the argument, not the argument as the literal variable name."
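
In practice, embracing means wrapping the argument in {{ }} inside aes(). The sketch below rewrites my_boxplot_x that way. It assumes a small made-up data frame, sc_demo, that stands in for sc; the cell column and all values are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for sc; `cell` is an invented second grouping column
sc_demo <- data.frame(
  dex = rep(c("trt", "untrt"), each = 4),
  cell = rep(c("N1", "N2"), times = 4),
  TotalCounts = c(2100, 2600, 2300, 2800, 1500, 1900, 1600, 1750)
)

my_boxplot_x <- function(data, x) {
  # {{ x }} passes the column name through to aes() instead of evaluating it
  ggplot(data, aes(x = {{ x }}, y = TotalCounts, fill = dex)) +
    geom_boxplot() +
    geom_point() +
    scale_fill_manual(values = c("red", "purple")) +
    theme_bw() +
    labs(x = "Treatment", y = "Total Counts")
}

# The column name is supplied unquoted, just as it would be inside aes()
p_dex <- my_boxplot_x(sc_demo, dex)
p_cell <- my_boxplot_x(sc_demo, cell)

p_dex
```

The x-axis label is still hard-coded as "Treatment" here; a fuller version might take the label as another argument.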

Tips on Saving and Scaling

ggplot2 comes with its own function for simplified saving, ggsave(). When creating plots, we tend to work interactively and save interactively. While you may create the perfect figure at a width of 7 inches and height of 5 inches, this may not scale well (either smaller or larger). For example, you may notice the text becomes very small when the size of your image is scaled up. Text is set using an absolute point size. If you come across this issue, try the suggestions outlined here.

Tips for saving:

Use vector graphics (PDF, SVG) to save your figure. You can then scale the size of the image outside of R and maintain proportions and crispness.

If you need to use raster graphics (PNG, TIFF, JPEG), which suffer from blurring when resized, use the R package ragg for image resizing.

For example,

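volcano <- readRDS("./data/Volcano.rds")

# Display the plot; ggsave() saves the last plot displayed when no plot is supplied
volcano

# Save at the original size
ggsave("png_small.png",width=7,height=5, dpi=300,units="in")

# Save at three times the size; scaling = 3 enlarges text and line widths to match
ggsave("scale_png.png", volcano,
    device = ragg::agg_png,
    width = 21, height = 15, units = "in", res = 300,
    scaling = 3)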

The arguments res and scaling are specific to ragg::agg_png.
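
The vector-graphics tip above can be sketched in the same way. This example assumes a simple plot built from the built-in mtcars data rather than the volcano plot, and it writes to a temporary directory so that no project files are overwritten.

```r
library(ggplot2)

# Simple example plot built from built-in data
p_save <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Write into a temporary directory (an assumption made so the example is safe to run)
pdf_file <- file.path(tempdir(), "scatter.pdf")

# ggsave() picks the PDF device from the file extension
ggsave(pdf_file, plot = p_save, width = 7, height = 5, units = "in")

file.exists(pdf_file)
```

Because the PDF stores shapes and text rather than pixels, it can be resized in other software without losing sharpness.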

Vector vs Raster Graphics

What are vector and raster graphics and why does this matter?

Raster and vector graphics differ in how they represent visual information, and that distinction directly affects how visualizations look and scale. In short, this means that the output format of a plot matters.

Raster graphics are made of a fixed grid of pixels, each storing a color value. Example formats include PNG, TIFF, JPEG. This type of graphic is generally great for photos or heatmaps, but suffers from blurring when resized. Vector graphics (e.g., PDF, SVG), in contrast, describe shapes, lines, and text mathematically, so they remain crisp at any zoom level and produce smaller files for simple plots. This matters for data visualization because the choice determines clarity and flexibility. Raster formats are better for complex, image-heavy displays or web use, while vector formats are ideal for reports, publications, and presentations where sharp text and scalable detail are essential.

Finding R packages for Beginners

Google Search
1. Rseek - A special Google-powered search engine that searches R-related websites (CRAN, R-bloggers, Stack Overflow, GitHub, etc.).
Repository Search
1. CRAN - try CRAN Task Views
2. Bioconductor - repository for bioinformatics, genomics, and clinical data analysis.
3. r-universe - a modern R package ecosystem and discovery platform built by the rOpenSci team. Publish, explore, and evaluate R packages (CRAN and other sources).
Blogs
1. Posit - Highlights new Tidyverse and ecosystem tools.
2. R bloggers - aggregates posts from hundreds of R users and developers.
3. R Weekly - A weekly digest of new packages, tutorials, and news.

Resources for Further Learning

Official ggplot2 documentation - https://ggplot2.tidyverse.org/
BTEP
1. Coding Club
2. Data Visualization with R
Online books / tutorials
A self-learning platform (e.g., Coursera)