R Introductory Course Series 2024

Welcome!

Welcome to the R Introductory Series!

Who: Novices and beginners\
What: A course series introducing R and RStudio. This course will introduce the foundational skills necessary to begin to analyze and visualize data in R.\
Where: Webex - The link is on the BTEP calendar!\
When: T/Th 1:00 - 2:00 pm; Help session immediately after.

Course Overview

  • Lesson 1: Introduction to R and RStudio
  • Lesson 2: The Basics of R Programming (syntax and base R)
  • Lesson 3: R Data Structures: Introducing Data Frames
  • Lesson 4: Data Frames and Data Wrangling (part 1)
  • Lesson 5: Data Frames and Data Wrangling (part 2)
  • Lesson 6: Introduction to Data Visualization with R (part 1)
  • Lesson 7: Introduction to Data Visualization with R (part 2)
  • Lesson 8: Introduction to Bioconductor and report generation with R

Course Materials

::: columns
::: {.column width="75%"}
- No install of R or RStudio required.
:::

::: {.column width="25%"}
:::
:::

If you haven't already, please create a free DNAnexus account and provide us with your username via the SurveyMonkey survey.

::: notes
DNAnexus is a cloud-based platform for NGS analysis for which CCR has a "site license". For this class, we are using the platform to provide a uniform, stable, preinstalled interface for R training. This interface uses RStudio Server, which lets you access RStudio through a browser. It also integrates the course notes for the class in the same window.

Files will disappear at the end of each session! (I will say this multiple times.)
:::

Help Sessions

Each lesson is followed by a 1-hour optional help session.

What can you do in the help session?

  • Work on practice problems

  • Ask questions!

::: notes
We accept all R-related questions. We can help you install R / RStudio locally, troubleshoot R code, provide guidance on packages related to a specific workflow, and even go over segments of the lesson that you may have had trouble understanding.

These sessions are for you!
:::

Objectives

To understand:

1. the difference between R and the RStudio IDE.
2. how to work within the RStudio environment, including:
   - creating an R project and R script
   - navigating between directories
   - using functions
   - obtaining help
3. how R can enhance data analysis reproducibility.

By the end of this section, you should be able to easily navigate and explore your RStudio environment.
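As a small preview of the skills listed above, the sketch below shows what navigating directories, using a function, and obtaining help can look like at the R console. The object names (`current_dir`, `files_here`, `rounded`) are illustrative choices, not part of the course materials.

```r
# Where am I? getwd() returns the current working directory as a string.
current_dir <- getwd()
print(current_dir)

# What is in this directory? list.files() returns a character vector of names.
files_here <- list.files(current_dir)
print(files_here)

# Using a function: round() takes a number and, optionally, the digits to keep.
rounded <- round(3.14159, digits = 2)
print(rounded)  # prints [1] 3.14

# Obtaining help: in an interactive session, type ?round or help("round")
# to open the documentation. args() shows a function's arguments.
args(round)
```

The output of `getwd()` and `list.files()` depends on where your R session was started, so it will differ from one machine to the next.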

What is R?

  • Both a programming language and an environment for statistical computing and graphics.
  • Open-source and widely used by scientists, not just bioinformaticians.
  • Extensible.
  • Maintained by a network of collaborators, the R Core Team.
  • A resource for and by scientists
    • R functionality makes it easy to develop and share packages on any topic.

Check out more about R on The R Project for Statistical Computing website.

Why R?

R is a particularly great resource for statistical analyses, plotting, and report generation.

  • Wide use = functions and packages covering a broad range of topics.
    • CRAN has 20,000+ packages.
    • Bioconductor (v 3.18) includes 2,266 software packages, 429 experiment data packages, 920 annotation packages, 30 workflows, and 4 books.
  • Help is a quick Google search away!
  • R = Reproducibility.

Where do we get R packages?{.smaller}

  • R packages are loadable extensions that contain code, data, documentation, and tests in a standardized shareable format that can easily be installed by R users.
  • CRAN = the primary repository for R packages.
    • To install a CRAN package, use install.packages("packageName").
  • GitHub
    • Packages hosted on GitHub do not necessarily meet CRAN standards, so approach with caution.
    • Use library(devtools) followed by install_github().
  • Bioconductor
    • Many genomics and other packages useful to biologists / molecular biologists.

Use METACRAN to search and browse CRAN/R packages.
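The three package sources above can be sketched as follows. This is a minimal illustration, not the course's prescribed workflow: `install_if_missing()` is a hypothetical helper written for this example, and `"owner/repo"` and `"packageName"` are placeholders. The GitHub and Bioconductor lines are commented out because they need network access.

```r
# Hypothetical helper: install a CRAN package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not trigger a download.
install_if_missing("stats")

# GitHub: install devtools from CRAN first, then install from a repository.
# install_if_missing("devtools")
# devtools::install_github("owner/repo")   # placeholder repository

# Bioconductor: packages are installed through the BiocManager package.
# install_if_missing("BiocManager")
# BiocManager::install("packageName")      # placeholder package name
```

Installing a package only needs to happen once per R installation; loading it with `library()` is needed in every new session.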

::: notes
CRAN is a global network of servers that store identical versions of R code, packages, documentation, etc. (cran.r-project.org).

Bioconductor is a project and repository for R packages useful in the analysis of biological data sets.
:::

Ways to run R

  • From the command line, interactively.
  • From the command line, using a script.
  • Interactively, through an environment.
    • This course will demonstrate the utility of the RStudio integrated development environment (IDE).
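To make the three options concrete, here is a tiny script that could be run in any of these ways. The file name `hello.R` is a hypothetical choice for this sketch; `Rscript` is the command-line front end that ships with R.

```r
# hello.R (hypothetical file name)
#
# Option 1: start R in a terminal by typing `R`, then enter these lines one by one.
# Option 2: save the lines to a file and run it with `Rscript hello.R`.
# Option 3: open the file in RStudio and run lines from the editor.

x <- c(2, 4, 6)               # a small numeric vector
avg <- mean(x)                # arithmetic mean of the vector
cat("Mean of x:", avg, "\n")  # print the result to the console
```

The code is identical in all three cases; only the way it is sent to R changes.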

::: notes
I am also a fan of the R extension for VS Code.
:::

What is RStudio?

RStudio is an integrated development environment for R, and now Python.

  • includes a console, editor, and tools for plotting, history, debugging, and workspace management.
  • provides a graphical user interface for working with R.
  • freely available and can be installed locally or used through a browser (RStudio Server).

We will be showcasing RStudio Server, but we highly encourage new users to install R and RStudio locally on their PC or Mac.

Getting Started with R and RStudio

Transition to DNAnexus!

::: notes
I'm going to move over to DNAnexus. If you provided me with your DNAnexus username by 9:00 am today, you will be able to play with DNAnexus today. If not, that's OK. Take the survey after class to provide us with your username. I will get those added before class on Thursday, which is when things will really get hands-on.
:::