Bioinformatics Training and Education Program

Learning R with BTEP

R is both a programming language and an open-source environment for statistical computing and graphics. Although R ranked 20th in popularity among programming languages in December 2023, it remains a favorite among non-developers (e.g., academics, data scientists, and statisticians). R is particularly well suited to statistical analysis, data visualization, and report generation. It has also become a staple among bioinformaticians and others interested in analyzing biological data thanks to efforts like Bioconductor, a repository of free, open-source R packages that “facilitates [the] rigorous and reproducible analysis of data from current and emerging biological assays”.

R can be used in three ways: interactively at the command line, non-interactively by running a script from the command line, or interactively through an integrated development environment (IDE), most popularly RStudio. CCR researchers can install and use R and RStudio locally for most data analysis needs. For Mac users, the installation is as easy as following the steps here; however, Windows users will need admin privileges and will therefore need IT support from service.cancer.gov. Some types of analyses (e.g., single-cell analyses) require greater computational resources. For such purposes, R and RStudio can be used on Biowulf, the NIH high-performance computing (HPC) cluster.

Whether you are analyzing your data from start to finish, applying additional statistical testing, or tweaking figures for publication, R programming can help you accomplish your goals. The NCI CCR Bioinformatics Training and Education Program (BTEP) facilitates training in R programming through three core courses: the R Introductory Series, Data Visualization with R, and Data Wrangling with R. These courses are tailored to beginners but are also excellent refreshers for those with more experience. The R Introductory Series provides the foundational skills needed to get started with R and RStudio, including a general introduction to R syntax, data types and structures, and data cleaning, wrangling, and visualization. This course is recommended for all novices. While also tailored to beginners, Data Wrangling with R and Data Visualization with R are more focused courses. Data Wrangling with R introduces essential R packages and functions used to explore, clean, transform, and summarize data. This course also includes a lesson on Bioconductor objects, which demonstrates how to access and manipulate data in S4 objects. Data Visualization with R dives deeper into plotting with ggplot2, a popular data visualization package that can be used to generate both simple and complex publication-quality figures.

These free courses are offered online annually via Webex to all NIH staff. If you are unable to attend the live iteration of a course, each lesson is documented in detail and readily available in BTEP’s “Class documents”. The course documentation also includes the data needed to work through each lesson asynchronously, helpful links, practice problems, and additional resources. In addition to the course documentation, each lesson is recorded and available in the BTEP Video Archive. However, it is advantageous to attend live sessions when possible, as these allow attendees to ask questions or obtain help specific to their interests or research questions. While we recommend that learners install R and RStudio locally, BTEP’s suite of R courses does not require an R installation. All hands-on R courses are taught using a pre-established teaching environment with RStudio on DNAnexus.

BTEP also provides intermediate and advanced R lessons, as well as lessons on specific topics. Most single-session lessons (i.e., lessons not associated with a course) are offered through the BTEP Coding Club seminar. If there is a specific topic, package, or skill you would like to see featured in a BTEP Coding Club, please email us at ncibtep@nih.gov. Please note that BTEP Coding Club sessions are not exclusive to R topics. For all other R concerns, including but not limited to R code troubleshooting, contact us via email at ncibtep@nih.gov or visit us in person on the second Monday of the month during our BTEP office hours.

– Alex Emmons (BTEP)