The R Project for Statistical Computing
Description
R is both a computational language and environment for statistical computing and graphics. It is open-source and widely used by scientists and other researchers, not just bioinformaticians. Base packages of R are built into the initial installation, but R functionality is greatly improved by installing other packages.
R is a great resource for statistical analysis, data visualization, and report generation. It is a particularly powerful programming language and environment due to its extensive community support. The widespread use of R means that tutorials, data analysis workflows / examples, and help are only a Google search away, and there are packages available for most types of analyses.
Recommendations
To take full advantage of R, you need to install R packages. R packages are loadable extensions that contain code, data, documentation, and tests in a standardized, shareable format that R users can easily install. The primary repository for R packages is the Comprehensive R Archive Network (CRAN), a global network of servers that store identical versions of R code, packages, and documentation (cran.r-project.org). At the time of writing, CRAN houses 18,825 available packages. GitHub is another common source of R packages; however, these packages do not necessarily meet CRAN standards, so approach them with caution.
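As a rough sketch of the CRAN workflow described above, the snippet below checks which packages are missing and installs only those. The helper name `missing_packages` is hypothetical (it is not part of R), and the package names are placeholders: `stats` and `utils` ship with R, so nothing is downloaded when the snippet runs as written.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder package list; swap in the CRAN packages your analysis needs.
wanted <- c("stats", "utils")
to_install <- missing_packages(wanted)

# install.packages() downloads from CRAN, so call it only when something is missing.
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Attach a package for the current session.
library(stats)
```

Checking before installing keeps a script safe to re-run, since packages that are already present are left untouched.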
There are also many field-specific packages, including those useful in the -omics fields (genomics, transcriptomics, metabolomics, etc.). Check out Bioconductor, a repository for R packages related to biological data analysis, and GitHub for -omics packages and pipelines. Try out the biocViews search in Bioconductor.
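Bioconductor packages are installed through the BiocManager package rather than directly with `install.packages()`. The sketch below wraps that workflow in a hypothetical helper, `install_bioc`, which is defined but not called because calling it downloads packages. The package name in the commented call, DESeq2, is only an illustration.

```r
# Hypothetical wrapper around the BiocManager workflow; defined here but not
# called, because calling it downloads packages.
install_bioc <- function(pkgs) {
  # BiocManager itself is distributed through CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # update = FALSE and ask = FALSE skip the prompt to update installed packages.
  BiocManager::install(pkgs, update = FALSE, ask = FALSE)
}

# Example call (requires an internet connection):
# install_bioc("DESeq2")
```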
Examples of top ranked Bioconductor packages by topic
- RNA-Seq
- ChIP-Seq
- Variant Detection
- Mass Spec / Proteomics / Metabolomics
- Single cell
Things to Know
- R is freely available and can be used via command line, through an integrated development environment (RStudio), and online (RStudio Server).
- Using R effectively can make scientific data analysis more reproducible. Data reports can be easily generated using R Markdown.
- Because R is a programming language, the learning curve is fairly steep. However, if you take the time to learn the basics, a plethora of data analysis and visualization packages will become accessible to you.
Input Data Types
R can read a wide range of input data types thanks to its extensive library of multidisciplinary packages. Tab-delimited files (e.g., .txt, .tsv), comma-separated files (.csv), and other delimited files are easily imported using base R import functions. Excel spreadsheets (.xls, .xlsx) require an add-on package such as readxl.
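A minimal sketch of importing delimited files with base R follows. The data frame and file paths are made up for illustration; the example writes its own temporary files so it can run without any external data.

```r
# Made-up example data, written to temporary files so the sketch is self-contained.
samples <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  count = c(10L, 25L, 7L)
)
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
write.csv(samples, csv_path, row.names = FALSE)
write.table(samples, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.csv() expects comma-separated values with a header row.
from_csv <- read.csv(csv_path)

# read.delim() expects tab-separated values with a header row.
from_tsv <- read.delim(tsv_path)

# Inspect the column types of the imported data.
str(from_csv)
```

For other delimiters, `read.table()` accepts a `sep` argument, for example `sep = ";"`.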
Output Data Types
Again, thanks to a wide array of packages, output data types are essentially limitless. Two file types specific to R are noteworthy: .RData and .rds. An .RData file can store multiple objects, up to everything in an R workspace (the global environment), while an .rds file holds a single R object.
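The difference between the two formats can be sketched as follows. The object names and data are invented for illustration, and the files are written to temporary paths.

```r
# Invented example objects.
fit_input <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
model <- lm(y ~ x, data = fit_input)

# .rds: one object per file; readRDS() returns it, so it can be given any name.
rds_path <- tempfile(fileext = ".rds")
saveRDS(model, rds_path)
restored_model <- readRDS(rds_path)

# .RData: several named objects per file.
# save.image(file = ...) would instead save the entire workspace.
rdata_path <- tempfile(fileext = ".RData")
save(fit_input, model, file = rdata_path)

# load() restores objects under their original names; loading into a new
# environment avoids overwriting objects in the current workspace.
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)
```

Because `load()` restores objects under their saved names, it can silently overwrite existing objects when used on the global environment. `readRDS()` avoids this by returning the object for explicit assignment.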
Access Information
R and RStudio are free resources that can be downloaded directly from the internet. Click here for installation instructions. To install R and RStudio on NIH laptops, please submit a ticket at service.cancer.gov.
Getting Help
Tutorials and courses are easily accessible.
- Check out BTEP R course offerings and BTEP R Course Documentation.
- Check out the NIH library.
- Check out self-learning platforms: Coursera and Dataquest.