R Introductory Course Series 2024
Welcome!
Welcome to the R Introductory Series!
Who: Novices and beginners\
What: A course series introducing R and RStudio. This course will introduce the foundational skills necessary to begin to analyze and visualize data in R.\
Where: Webex - The link is on the BTEP calendar!\
When: T/Th 1:00 - 2:00 pm; Help session immediately after.
Course Overview
- Lesson 1: Introduction to R and RStudio\
- Lesson 2: The Basics of R Programming (syntax and base R)\
- Lesson 3: R Data Structures: Introducing Data Frames\
- Lesson 4: Data Frames and Data Wrangling (part 1)\
- Lesson 5: Data Frames and Data Wrangling (part 2)\
- Lesson 6: Introduction to Data Visualization with R (part 1)\
- Lesson 7: Introduction to Data Visualization with R (part 2)\
- Lesson 8: Introduction to Bioconductor and report generation with R
Course Materials
::: columns
::: {.column width="75%"}
- No install of R or RStudio required.
- We will access RStudio Server through the DNAnexus platform.
- Course documentation is available at https://bioinformatics.ccr.cancer.gov/docs/rintro/index.html.
:::
::: {.column width="25%"}
:::
\ :::
If you haven't already, please create a free DNAnexus account and provide us with your username via the SurveyMonkey survey.
::: notes DNAnexus is a cloud-based platform for NGS analysis for which CCR has a site license. For this class, we are using the platform to provide a uniform, stable, preinstalled interface for R training. This interface makes use of RStudio Server, which allows you to access RStudio through a browser. It also integrates the course notes for the class in one window.
Files will disappear at the end of each session! - will say multiple times :::
Help Sessions
Each lesson is followed by a 1-hour optional help session.
What can you do in the help session?
- Work on practice problems
- Ask questions!
::: notes We accept all R-related questions. We can help you install R / RStudio locally, troubleshoot R code, provide guidance on packages related to a specific workflow, and even go over segments of the lesson that you may have had trouble understanding.
These sessions are for you! :::
Objectives
To understand:\
1. the difference between R and the RStudio IDE.\
2. how to work within the RStudio environment, including:\
   - creating an R project and an R script\
   - navigating between directories\
   - using functions\
   - obtaining help\
3. how R can enhance data analysis reproducibility
By the end of this section, you should be able to easily navigate and explore your RStudio environment.
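As a rough illustration of the "navigating between directories", "using functions", and "obtaining help" objectives, the sketch below uses only base R. The vector `values` is made up for the example and is not part of the course data.

```r
# Navigating: getwd() returns the current working directory as a string.
current_dir <- getwd()
print(current_dir)

# list.files() returns the names of the files in that directory.
files_here <- list.files()
print(head(files_here))

# Using functions: a function takes arguments inside parentheses.
values <- c(2.5, 3.5, 4.25)            # example numbers, invented for this sketch
avg <- mean(values)                    # one argument
avg_rounded <- round(avg, digits = 1)  # a second, named argument
print(avg_rounded)

# Obtaining help: args() prints a function's arguments.
args(round)

# In an interactive session, these open the help page for round():
# ?round
# help("round")
```

Typing these lines one at a time in the RStudio console is a quick way to see what each function returns.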
What is R?
- Both a computational language and environment for statistical computing and graphics.
- Open-source and widely used by scientists, not just bioinformaticians.
- Extensible.
- Maintained by a network of collaborators - The R Core Team
- A resource for and by scientists
- R functionality makes it easy to develop and share packages on any topic.
Check out more about R on The R Project for Statistical Computing website.
Why R?
R is a particularly great resource for statistical analyses, plotting, and report generation.
- wide use = functions and packages covering a broad range of topics.
- CRAN has 20,000+ packages
- Bioconductor (v 3.18) includes 2,266 software packages, 429 experiment data packages, 920 annotation packages, 30 workflows, and 4 books
- Help is a quick Google search away!
- R = Reproducibility.
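One small facet of reproducibility can be sketched in a few lines: because an analysis is written as code, rerunning the same code gives the same result. The example below is only an illustration and assumes nothing beyond base R; the seed value 2024 is arbitrary.

```r
# Fixing the random seed makes "random" draws repeatable.
set.seed(2024)
first_draw <- rnorm(3)

set.seed(2024)
second_draw <- rnorm(3)

# TRUE when both runs produced exactly the same numbers.
same_result <- identical(first_draw, second_draw)
print(same_result)

# Recording the R version alongside results helps others rerun the analysis.
print(R.version.string)
```

Calling `sessionInfo()` at the end of a script additionally lists the packages that were loaded and their versions.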
Where do we get R packages? {.smaller}
- R packages are loadable extensions that contain code, data, documentation, and tests in a standardized shareable format that can easily be installed by R users.
- CRAN = the primary repository for R packages.
  - To install a CRAN package, use `install.packages("packageName")`.
- GitHub
  - Packages hosted here do not necessarily meet CRAN standards, so approach with caution.
  - Use `library(devtools)` followed by `install_github()`.
- Bioconductor
  - Many genomics and other packages useful to biologists / molecular biologists.
Use METACRAN to search and browse CRAN/R packages.
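The install routes above might look like the following in practice. This is a sketch, not a required setup: the package names `ggplot2` and `DESeq2` are only examples, `"owner/repo"` is a placeholder rather than a real repository, and the `BiocManager` route for Bioconductor is an addition not mentioned on this slide. The install lines are commented out because they need internet access and only have to be run once.

```r
# Install once per R installation (commented out; requires internet access).
# install.packages("ggplot2")             # from CRAN; example package name
# install.packages("devtools")            # needed for the GitHub route
# devtools::install_github("owner/repo")  # from GitHub; placeholder repository
# install.packages("BiocManager")         # installer for Bioconductor packages
# BiocManager::install("DESeq2")          # from Bioconductor; example package name

# Check whether a package is installed without loading it.
# "tools" ships with R, so this should be TRUE on any installation.
has_tools <- requireNamespace("tools", quietly = TRUE)
print(has_tools)

# Load an installed package in each new session before using its functions.
library(tools)
ext <- file_ext("analysis.R")  # file_ext() comes from the tools package
print(ext)
```

Installing happens once, while `library()` is called in every new R session that uses the package.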
::: notes CRAN is a global network of servers that store identical versions of R code, packages, documentation, etc. (cran.r-project.org).
Bioconductor is a project and repository for R packages useful in the analysis of biological data sets. :::
Ways to run R
- command line interactively.\
- command line using a script.\
- interactively through an environment.
- This course will demonstrate the utility of the RStudio integrated development environment (IDE).
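To make the difference between these modes concrete, here is a minimal script sketch. The file name `hello.R` is hypothetical; the same lines can be typed into an interactive console or run from RStudio.

```r
# Save these lines as hello.R (hypothetical file name), then either:
#   - run the whole file from a terminal with:  Rscript hello.R
#   - or start R interactively (type R in a terminal, or open RStudio)
#     and enter the lines one at a time.

greeting <- paste("Hello from", R.version.string)
print(greeting)

# interactive() reports which mode R is running in:
# TRUE in a console or RStudio session, FALSE when run with Rscript.
running_interactively <- interactive()
print(running_interactively)
```

Scripts are the more reproducible option, since the file itself records every step that was run.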
::: notes I am also a fan of the R extension through VSCode. :::
What is RStudio?
RStudio is an integrated development environment for R, and now Python.
- includes a console, editor, and tools for plotting, history, debugging, and workspace management.\
- provides a graphical user interface for working with R.\
- freely available and can be installed locally or used through a browser (RStudio Server).
We will be showcasing RStudio Server, but we highly encourage new users to install R and RStudio locally on their PC or Mac.
Getting Started with R and RStudio
Transition to DNAnexus!
::: notes I'm going to move over to DNAnexus. If you have provided me with your DNAnexus username as of 9:00 am this morning, you will be able to play with DNAnexus today. If not, that's ok. Take the survey after class to provide us with your username. I will get those added before class on Thursday, which is when things will really get hands on. :::