Course Overview

Welcome to the Data Wrangling with R course series

The purpose of this course is to introduce you to essential R packages and functions that will make your life easier when it comes time to explore, clean, transform, and summarize your data. This course will include a series of lessons for scientists with little to no experience in R.

Course objectives

  • Learn how to navigate RStudio.
  • Learn how to load different types of data formats.
  • Get acquainted with the tidyverse packages, especially dplyr.
  • Become familiar with functions useful for cleaning, transforming, and summarizing data.

While this course will not make you an expert R programmer or full-fledged data analyst, it will help you learn how to analyze real-life, messy data and prepare it for visualization and further analyses.

Course Expectations

This course consists of eight one-hour lessons. Each lesson will be held virtually using the Webex platform on Mondays / Wednesdays at 1 pm and will be followed immediately by a one-hour help session. Help sessions will be structured around a set of practice problems for you to test your new skills, though we welcome all questions!


You will not be able to practice or work through this documentation using DNAnexus outside of class hours. If you are using this documentation asynchronously, please install R and RStudio.

Lesson 1: Introduction to R, RStudio, and the Tidyverse

This will be a no-coding introduction to R, RStudio, and the Tidyverse. In this lesson, we will review some of the advantages of using R for data analysis and will get you acquainted with the RStudio environment.

Lesson 2: Getting started with R

Lesson 2 will focus on some of the basics of R programming, including naming and assigning R objects, recognizing and using R functions, understanding data types and classes, and becoming familiar with R programming syntax.

Lesson 3: Importing and reshaping data

In Lesson 3, we will learn how to import simple and complex data and how to avoid common mistakes. We will also learn how to reshape data, for example, from wide to long format, with tidyr.
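
As a preview of the reshaping step, here is a minimal sketch of a wide-to-long conversion with tidyr's pivot_longer(). The data frame and its column names (gene, sample1, sample2) are made up for illustration and are not taken from the course data.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene    = c("geneA", "geneB"),
  sample1 = c(10, 5),
  sample2 = c(12, 7)
)

# Collapse the sample columns into two columns:
# "sample" holds the former column names, "count" holds the values
long <- pivot_longer(
  wide,
  cols      = c(sample1, sample2),
  names_to  = "sample",
  values_to = "count"
)

print(long)
```

The long table has one row per gene and sample combination, which is the layout most tidyverse functions expect.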

Lesson 4: Data Visualization with ggplot2

Lesson 4 will be a brief reprieve from data wrangling. In this lesson, we will learn the basics of plotting with ggplot2.

Lesson 5: Introducing dplyr and the pipe (Part 1)

In Lesson 5, we will learn how to improve code interpretability with the pipe %>% from the magrittr package. We will also learn how to merge and filter data frames.
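
To give a sense of what this looks like, the sketch below merges two small data frames and filters the result, first as a nested call and then with the pipe. The tables (counts, metadata) and their columns are hypothetical, and left_join() is just one of several dplyr join functions that could be used for merging.

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

# Hypothetical example tables
counts <- data.frame(
  sample = c("s1", "s2", "s3"),
  count  = c(10, 250, 40)
)
metadata <- data.frame(
  sample    = c("s1", "s2", "s3"),
  treatment = c("control", "treated", "treated")
)

# Without the pipe: calls are nested and read inside-out
nested <- filter(left_join(counts, metadata, by = "sample"),
                 treatment == "treated")

# With the pipe: each step reads top to bottom
piped <- counts %>%
  left_join(metadata, by = "sample") %>%
  filter(treatment == "treated")

print(piped)
```

Both versions produce the same rows; the piped version simply lists the steps in the order they happen.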

Lesson 6: Introducing dplyr and the pipe (Part 2)

In Lesson 6, we will continue to wrangle data using dplyr. This lesson will focus on functions such as group_by(), arrange(), summarize(), and mutate().
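
A minimal sketch of how these four functions can be chained is shown below. The measurements data frame and its column names are invented for illustration only.

```r
library(dplyr)

# Hypothetical measurements: two replicates per treatment
measurements <- data.frame(
  treatment = c("control", "control", "treated", "treated"),
  count     = c(10, 20, 40, 60)
)

summary_tbl <- measurements %>%
  group_by(treatment) %>%                     # one group per treatment
  summarize(total = sum(count), n = n()) %>%  # collapse each group to one row
  mutate(mean_count = total / n) %>%          # add a column computed from others
  arrange(desc(mean_count))                   # sort rows, largest mean first

print(summary_tbl)
```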

Lesson 7: Introduction to Bioconductor -omics classes (containers)

In this lesson, we will learn about specialized data containers / classes that are shared across Bioconductor packages. These classes allow us to store and easily manage multiple -omics types. We will discuss some of the properties of these classes and gain insight into how to access and subset the data stored within.

Lesson 8: Data Wrangling Review and Practice

In Lesson 8, we will review many of the important concepts we learned throughout the course. We will also practice using our skills together on a realistic data set.

Required Course Materials

To participate in this class, you will need your government-issued computer and a reliable internet connection. You do not need to download or install any software to participate in the class.