A common question posed to the Bioinformatics Training and Education Program (BTEP) is “How can I learn R and Python to analyze my data?” First, it’s important to state that learning any programming language can be daunting, and often you do not need to learn a programming language to analyze high-throughput data. There are many graphical user interface (GUI)-based alternatives. For example, commercial licenses for NIDAP, Partek Flow, Partek Genomics Suite, Qiagen IPA, Qiagen CLC Genomics Workbench, Qlucore Omics Explorer, and others are available to CCR researchers. These programs and others include complete workflows for -omics analyses. However, open-source tools prevail in the published literature, and you may find yourself wanting to replicate research methods using open-source tools.
So, should you learn R or Python? The answer to this question ultimately depends on your goals. Are you trying to analyze your data? Are you hoping to use a specific tool? Or are you hoping to learn computational science broadly so that you can apply your skills to a multitude of bioinformatics problems? If you are trying to analyze your data, the programs and languages that you use will depend on the analysis. Bioinformatics workflows can include tools that rely on R, Python, Bash, Perl, and more. You may need to learn a bit of each of these to incorporate open-source tools into your analysis. That being said, a good foundation in computer programming can spare you future headaches.
Python, in particular, is a good language to tackle for learning the fundamentals of programming. It is a high-level, general-purpose programming language with a readable, easy-to-learn syntax. Once you have a handle on the basics of programming with Python, you may find it easier to learn other languages as needed. In contrast, R, which was developed for and excels at statistical computation, can be a bit confusing for beginners because of its syntactic and functional idiosyncrasies. In a head-to-head comparison, both R and Python are highly extensible via external packages, meaning they can both be used for very specific tasks by importing external libraries. They both excel at data wrangling and data visualization, though R is arguably the leader in data visualization thanks to packages such as ggplot2 and lattice. Python also has its strengths: it is generally regarded as more efficient than R and easier to use for highly iterative tasks, and it excels at machine learning (see scikit-learn).
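To give a sense of what "readable" means in practice, here is a minimal sketch of a small iterative task in Python: computing the GC content of a handful of DNA sequences. The sequence names, the sequences themselves, and the `gc_content` helper are all made up for illustration, and the example uses only the Python standard library.

```python
# Hypothetical example data: these names and sequences are made up.
sequences = {
    "seq1": "ATGCGC",
    "seq2": "ATATAT",
    "seq3": "GGCC",
}


def gc_content(sequence):
    """Return the fraction of bases in a sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        # Avoid dividing by zero for an empty sequence.
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


# Loop over every sequence and store its GC content by name.
gc_by_name = {name: gc_content(seq) for name, seq in sequences.items()}

# Report each result rounded to two decimal places.
for name, gc in gc_by_name.items():
    print(f"{name}: {gc:.2f}")
```

Even without prior programming experience, you can probably follow what each line does, which is part of why Python is often suggested as a first language. A real analysis would more likely read sequences from a file and lean on an established library rather than a hand-written helper.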
If you are interested in using a specific bioinformatics tool, R seems to be the leader in -omics-focused packages thanks to efforts like Bioconductor, a repository for R packages related to biological data analysis. Bioconductor ensures that packages in a given release are “mutually compatible, traceable, and guaranteed to function for the associated version of R”, making it easy to mix and match different packages in a given analysis. Popular R packages include limma, DESeq2, edgeR, Seurat, clusterProfiler, ComplexHeatmap, and phyloseq, among others. R is often a go-to for non-programmers, including academics and scientists. The RStudio (now Posit) IDE facilitates interactive programming with R, which has likely contributed to R's popularity in the bioinformatics community. If your colleagues regularly use R, you may find greater support for troubleshooting R code and analysis concerns.
All in all, there is no right answer to the question “Which programming language should I learn, R or Python?” They are both valuable programming languages with different strengths and weaknesses. Before tackling either, you should reflect on why you would like to learn a programming language at all. If you are in a time crunch to analyze your data, you may find groups like CCBR invaluable for your data analysis needs.
If you would like to learn R or Python, check out the NIH Bioinformatics calendar for upcoming courses, or request a Dataquest or Coursera license to begin programming on your own time. The BTEP website also includes self-guided course materials and recordings from past classes. BTEP is here for you and your training needs; if you have any questions, email us at ncibtep@nih.gov.
— Alex Emmons (BTEP)