R Crash Course
Accessing R
R can be accessed from the command line by typing R, which opens the R console, or through an integrated development environment (IDE) such as RStudio or VS Code. You can run R commands together as a script or enter them interactively in the console.
You can install and use R locally or on a high-performance computing (HPC) system such as the NIH HPC Biowulf cluster.
R is case sensitive, so watch your capitalization: x and X are different objects. R is also largely whitespace-insensitive, meaning that for the most part it does not care about spaces.
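The snippet below illustrates both points. The object names are arbitrary. It also shows one of the rare cases where spacing changes the meaning of the code.

```r
# Case sensitivity: x and X are two different objects
x <- 10
X <- 20
x == X        # FALSE

# Spaces around operators are usually optional...
y<-x+1
y <- x + 1    # ...same result, but easier to read

# ...with exceptions: "x < -1" is a comparison, whereas "x<-1" would assign 1 to x
x < -1        # FALSE
```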
Installing and loading packages
To take full advantage of R, you need to install R packages. R packages are loadable extensions that contain code, data, documentation, and tests in a standardized, shareable format that R users can easily install. The primary repository for R packages is the Comprehensive R Archive Network (CRAN). CRAN is a global network of servers that store identical versions of R code, packages, documentation, etc. (cran.r-project.org).
An R library is, effectively, a directory of installed R packages which can be loaded and used within an R session. —renv
install.packages() - install packages from CRAN
library() - load an installed package into the current R session
BiocManager::install() - install packages from Bioconductor (requires the BiocManager package)
devtools::install_github() - install an R package from GitHub (requires the devtools package)
.libPaths() - reports the directories where your installed R packages are located
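Here is a minimal sketch of a common "install if missing, then load" pattern. The helper ensure_package() is hypothetical and is not part of R or any package. It is called with "stats", which ships with R, so nothing is downloaded. The Bioconductor and GitHub lines are commented out, and "user/repo" is a placeholder.

```r
# Hypothetical helper: install a CRAN package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN (needs a mirror and internet access)
  }
  library(pkg, character.only = TRUE)
}

ensure_package("stats")   # "stats" ships with R, so nothing is downloaded here

# Bioconductor and GitHub installs (not run; "user/repo" is a placeholder):
# install.packages("BiocManager"); BiocManager::install("DESeq2")
# install.packages("devtools");    devtools::install_github("user/repo")

.libPaths()   # library directories searched for installed packages
```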
Commenting
You can annotate your code with comments, which start with #. Anything to the right of # on a line is ignored by R.
In RStudio, a comment line ending in four or more dashes (e.g., # Section name ----) creates a navigable code section.
For report generation, use R Markdown or Quarto.
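The short script below shows inline comments, a fully commented-out line, and RStudio-style section headers. The section names are arbitrary.

```r
# ---- Load data ----
x <- 5       # everything after the # is ignored by R
# x <- 100   # this entire line is a comment, so x is still 5

# ---- Analysis ----
x * 2        # 10
```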
Assignment operators
Any value you want to keep in memory must be assigned to an R object.
<- is the primary assignment operator; it assigns the value on the right to the object named on the left.
= can also assign values to objects, but it is usually reserved for other purposes (e.g., setting function arguments).
Use ls() to list the objects in your current environment and rm() to remove an object from memory.
When naming R objects:
* avoid spaces and special characters other than “_” and “.”.
* do not begin a name with a number or an underscore.
* do not use names with special meanings (see ?Reserved).
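The example below puts the assignment operators, the naming rules, and ls()/rm() together. All object names are illustrative. The invalid names are left commented out because they would stop the script with a syntax error.

```r
counts <- c(3, 7, 2)             # preferred assignment operator
total = sum(counts)              # "=" also assigns at the top level
round(x = 3.14159, digits = 2)   # "=" matching function arguments: 3.14

sample_1 <- "ok"; sample.2 <- "ok"   # valid names
# 2nd_sample <- "bad"                # invalid: starts with a number
# if <- 5                            # invalid: "if" is reserved (see ?Reserved)

ls()                 # objects in the current environment
rm(total)            # remove "total" from memory
exists("total")      # FALSE
```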
Object data types
An object's base data type (e.g., numeric, character, logical) and its class (e.g., data frame, matrix) determine what you can do with it. Learn more about an object with the following:
class() - returns the class of an object or its base data type
str() - returns the structure of an object
Similar to str() but with much more succinct output.
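Here is a short illustration of class() and str() on a vector and a matrix. It also uses typeof(), which is not mentioned above. typeof() reports the underlying storage type. The matrix class shown assumes R 4.0 or later.

```r
x <- c(1.5, 2, 3)
class(x)    # "numeric"
typeof(x)   # "double": the underlying storage type
str(x)      # num [1:3] 1.5 2 3

m <- matrix(1:6, nrow = 2)
class(m)    # "matrix" "array" (R >= 4.0)
str(m)      # int [1:2, 1:3] 1 2 3 4 5 6
```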
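Coercion is the conversion of an object from one type to another (e.g., with as.numeric() or as.character()). It may produce warnings, for example when values cannot be converted and become NA. Always check that the output matches your expectations.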
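The calls below show explicit coercion with the base as.*() functions, including a case that produces NA and a warning.

```r
as.numeric("42")                        # 42
as.numeric(c("1", "2", "three"))        # 1 2 NA, with warning "NAs introduced by coercion"
as.character(1:3)                       # "1" "2" "3"
as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
```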
Importing and exporting data
Use read functions (e.g., read.csv(), read.delim()) to import data and write functions (e.g., write.csv(), write.table()) to export data.
Some data types have dedicated import functions. For example, we will learn how to import scRNA-seq data using Seurat.
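The round trip below writes a small, made-up data frame to a temporary CSV file and reads it back, so no real data file is assumed. It relies on character columns being read as character, which is the default since R 4.0.

```r
df <- data.frame(gene = c("A", "B"), count = c(10L, 20L))

path <- tempfile(fileext = ".csv")       # temporary file, so nothing is left behind
write.csv(df, path, row.names = FALSE)   # row.names = FALSE avoids an extra index column
df2 <- read.csv(path)

str(df2)
# Tab-delimited equivalents:
# write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE); read.delim(path)
```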
Using functions
R functions perform specific tasks. R has many built-in functions, and many more are available through additional packages. You can also create your own functions.
To call a function, write its name followed by parentheses containing any arguments: function_name() (e.g., round(3.14159, digits = 2)).
To create a function:
function_name <- function(arg_1, arg_2, ...) {
  # function body: code that uses the arguments
  # the value of the last expression is returned
}
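Here is a concrete instance of that template. The function to_proportion() is a hypothetical example. It has one required argument and one argument with a default value.

```r
# Hypothetical example: convert counts to proportions, rounded to `digits` places
to_proportion <- function(x, digits = 3) {
  round(x / sum(x), digits = digits)   # last expression is the return value
}

to_proportion(c(2, 3, 5))              # 0.2 0.3 0.5
to_proportion(c(1, 2), digits = 2)     # 0.33 0.67
```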
Vectors
A vector is a collection of values that are all of the same type (numbers, characters, etc.) — datacarpentry.org
c() - combines elements into a vector
When you combine elements of different types in the same vector, they are forced into a single type via “coercion”, following the hierarchy logical < numeric < character.
* length() - returns the number of elements in a vector
Use brackets to extract elements of a vector:
a <- 1:10   # the integers 1 through 10
a[2]        # the second element: 2
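A few more vector operations follow: coercion when types are mixed, length(), and two indexing forms not covered above, logical indexing and negative indices.

```r
v <- c(TRUE, 2, "three")
v            # "TRUE" "2" "three": everything was coerced to character
length(v)    # 3

a <- 1:10
a[c(1, 3)]   # 1 3
a[a > 7]     # 8 9 10  (logical indexing)
a[-1]        # every element except the first
```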
Lists
Unlike vectors, lists can hold values of different types.
list(1, "apple", 3)
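Lists can also be named, and their elements can be extracted by name. The element names below are arbitrary. Note the difference between [[ ]], which returns the element itself, and [ ], which returns a smaller list.

```r
my_list <- list(id = 1, fruit = "apple", scores = c(3, 5))
str(my_list)
my_list$fruit         # "apple"
my_list[["scores"]]   # 3 5
my_list["fruit"]      # single brackets return a (sub)list, not the element itself
```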
Data frames
Data frames hold tabular data comprised of rows and columns; they can be created using data.frame().
To learn more about the structure of a data frame (or another object), consider the following functions:
str() - displays the structure of any object, not just data frames
dplyr::glimpse() - similar to str() but designed for data frames, with cleaner output
summary() - produces summaries of an object, e.g., per-column statistics for a data frame or the results of various model-fitting functions
ncol() - returns the number of columns in a data frame
nrow() - returns the number of rows in a data frame
dim() - returns the number of rows and columns
unique() - returns a vector with duplicates removed (for a data frame, the unique rows); also see dplyr::distinct()
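The base R functions from that list are applied below to a small, made-up data frame. The sample and group values are illustrative only. dplyr::glimpse() is left out so the snippet runs without extra packages. The unique() result assumes R 4.0 or later, where strings are not converted to factors.

```r
df <- data.frame(sample = c("s1", "s2", "s3", "s4"),
                 group  = c("ctrl", "ctrl", "trt", "trt"),
                 count  = c(12, 8, 30, 25))

str(df)             # column names, types, and first values
summary(df)         # per-column summaries (e.g., min/median/mean/max for count)
dim(df)             # 4 3
nrow(df)            # 4
ncol(df)            # 3
unique(df$group)    # "ctrl" "trt"
```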
We can subset data frames using bracket notation df[row,column]:
df <- data.frame(Counts = seq(1, 5), animals = c("raccoon", "squirrel", "bird", "dog", "cat"))
# return just the animals column
df[, "animals"]
We can also use dplyr functions such as filter() to subset rows and select() to subset columns.
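Here is a sketch comparing base R bracket subsetting with the dplyr approach on the same data frame. The dplyr part is guarded so the snippet still runs if dplyr is not installed. It uses the %>% pipe that dplyr provides.

```r
df <- data.frame(Counts = seq(1, 5),
                 animals = c("raccoon", "squirrel", "bird", "dog", "cat"))

# Base R: rows where Counts > 2, animals column only
df[df$Counts > 2, "animals"]   # "bird" "dog" "cat"

# dplyr equivalent; only runs if dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  df %>%
    filter(Counts > 2) %>%   # keep rows
    select(animals)          # keep columns
}
```

The base R version returns a character vector. The dplyr version returns a one-column data frame.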
Plotting
There are three primary plotting systems in R: base R graphics, ggplot2, and lattice. Seurat's data visualization functions primarily use ggplot2, so their plots can easily be customized by adding ggplot2 layers.
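Here is a minimal comparison of base R and ggplot2 on toy data. It shows how a ggplot2 plot is extended by adding layers with +, which is the same mechanism used to customize Seurat plots. The ggplot2 part runs only if the package is installed. The title and theme are arbitrary choices.

```r
df <- data.frame(animals = c("raccoon", "squirrel", "bird"),
                 Counts  = c(1, 2, 3))

# Base R
barplot(df$Counts, names.arg = df$animals)

# ggplot2: a plot is built by adding layers with "+" (runs only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = animals, y = Counts)) +
    geom_col()
  p <- p + ggtitle("Counts per animal") + theme_minimal()   # add more layers/settings
  print(p)
}
```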
Check out the R Graph Gallery for data visualization examples and code.
Conditionals and Looping
See the attached resources on
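Here is a brief sketch of a for loop containing an if / else if / else chain. It is followed by a vectorized ifelse() version of the high/low labeling, which is often preferred in R. The thresholds and labels are arbitrary. The vectorized line does not reproduce the special "no counts" case.

```r
counts <- c(5, 0, 12)

for (i in seq_along(counts)) {       # loop over positions 1, 2, 3
  if (counts[i] == 0) {
    message("Sample ", i, ": no counts")
  } else if (counts[i] > 10) {
    message("Sample ", i, ": high")
  } else {
    message("Sample ", i, ": low")
  }
}

# Vectorized alternative for the high/low labels
labels <- ifelse(counts > 10, "high", "low")
labels                                # "low" "low" "high"
```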
Getting information about your R session
sessionInfo() - prints version information about R, the operating system, and attached or loaded packages. This is useful for reporting methods for publication. Consider using the renv package to track and share the exact package versions used in a given R script or project.
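The calls below capture version information. The renv lines are commented out because they assume renv is installed and that you are working inside a project directory.

```r
info <- sessionInfo()
info                      # R version, OS, locale, attached and loaded packages
R.version.string          # just the R version, e.g., for a methods section
packageVersion("stats")   # version of a single installed package

# renv workflow (not run; assumes renv is installed and you are in a project directory)
# renv::init()       # create a project-specific package library
# renv::snapshot()   # record exact package versions in renv.lock
# renv::restore()    # reinstall those versions, e.g., on another machine
```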