Lesson 2: Getting Started with R on Biowulf
Learning objectives
- Understand how R can be deployed on Biowulf
- Understand how to access and use R modules
- Learn to create a custom R library on Biowulf
Deploying R on Biowulf
There are multiple ways to use R on Biowulf. See the HPC documentation.
Note
R sessions are not allowed on Helix or the login node. All R sessions must use computational nodes.
Interactively
Your workflow may require some element of interactivity (e.g., modifying code based on graphical output). In such cases, users generally like to use an IDE (integrated development environment). The preferred IDE for R programming is generally RStudio. However, if you are experiencing significant lag, there are other options, including Jupyter Lab and VSCode.
There are currently two ways to run RStudio on Biowulf.
Using NoMachine (To be demoed)
- To get started, you will need to install NoMachine (NX), "a graphical client that presents a full virtual Linux desktop to a window on the user's local machine".
- Once NoMachine is installed, follow these instructions to start RStudio.
Warning
NoMachine uses X11 forwarding and will experience lags.
- Using RStudio Server (Warning: Under development) (To be demoed)
- To use the VSCode R extension, use these instructions.
Connecting and using just an R console (Lesson 1)
This is how we will use R in today's lesson.
Submitting R scripts via sbatch (Lesson 4)
To submit an R script from the command line, you can use the command Rscript or R CMD BATCH. Rscript is preferred and prints output to stdout. R CMD BATCH prints R commands and output to a .Rout file. See more information here.
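As a rough sketch of what a batch submission could look like, the snippet below writes a small Slurm batch script to disk. The file names (run_analysis.sh, my_analysis.R) and the resource values are hypothetical placeholders, not recommendations from the Biowulf documentation; adjust them to your own job.

```shell
# Write a minimal Slurm batch script (file names and resources are hypothetical).
cat > run_analysis.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=r_analysis
#SBATCH --cpus-per-task=2
#SBATCH --mem=4g
#SBATCH --time=01:00:00
#SBATCH --gres=lscratch:5

# Load R on the compute node, then run the script non-interactively.
module load R/4.2.2
Rscript my_analysis.R
# Alternative: R CMD BATCH my_analysis.R   (output goes to my_analysis.Rout)
EOF

# Submit from a Biowulf session with:
#   sbatch run_analysis.sh
echo "Wrote run_analysis.sh"
```

Because the job runs on a compute node, the module load line belongs inside the batch script rather than in the shell you submit from.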
Remove your hands from your keyboard, sit back, and enjoy a demo on how to use the RStudio IDE on Biowulf. If you intend to use an IDE to interact with R on Biowulf and you experience difficulties in the future, please email us at ncibtep@nih.gov.
Connect to Biowulf (Hands-on)
To connect to Biowulf, you must be on the NIH network, either on campus or via VPN.
We will then connect using the ssh protocol.
Open your terminal if you are on a Mac, or the Command Prompt if you are on Windows, and type the following:
ssh username@biowulf.nih.gov
Replace username with your NIH user name. You will then be prompted for your NIH password.
Note
While you enter your password, the cursor will not move and you will not see what you type.
Getting started with R
We will be working with R from our /data/$USER directory. There is not much space in ~ (16 GB), so it is good practice to always cd to /data/$USER.
cd /data/$USER
Info
$USER is an environment variable. You can read more about environment variables here.
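To see what the shell substitutes for $USER, you can print the variable and the path built from it. This is only an illustrative sketch; the variable names user_name and data_dir are made up for the example, and the fallback to id -un is there in case USER is not set in your shell.

```shell
# Fall back to `id -un` in case USER is not set in this shell.
user_name="${USER:-$(id -un)}"

# The shell expands the variable before the path is used.
data_dir="/data/${user_name}"

echo "USER expands to: ${user_name}"
echo "Data directory:  ${data_dir}"
```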
The default R installation on Biowulf is R/4.3.0 as of May 2023. R is available on Biowulf via environment modules.
To see the available modules use:
module -r avail '^R$'
Here we are using the module command with the option for regular expression matching (-r) and avail to return a list of available modules.
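The anchors in '^R$' are what restrict the listing to the module named exactly R. The sketch below imitates that behavior with grep on a made-up list of names; the names are illustrative and not taken from Biowulf's actual module list, and grep is used here only as a stand-in for the module command's own matching.

```shell
# Illustrative module names (not Biowulf's actual module list).
names="R
RSEM
RStudio
STAR
bowtie"

# Anchored pattern: only the exact name "R" matches.
exact=$(printf '%s\n' "$names" | grep -E '^R$')
echo "Anchored pattern matches: $exact"

# Unanchored pattern: every name containing an uppercase R matches.
loose_count=$(printf '%s\n' "$names" | grep -c 'R')
echo "Unanchored pattern matches $loose_count names"
```

Without ^ and $, a search for R would also return every other module whose name happens to contain that letter.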
Before loading the R module and running R, we first need an interactive session. R cannot be used on the login node or on helix.
sinteractive --gres=lscratch:5
sinteractive default allocations
The default sinteractive allocation is 1 core (2 CPUs), 0.768 GB of memory per CPU (1.536 GB in total, shown rounded to 2 GB in the terminal), and a walltime of 8 hours.
Note: lscratch
"R will automatically use lscratch for temporary files if it has been allocated" (HPC Biowulf docs). lscratch space can be requested using --gres=lscratch:#, where gres stands for "generic resources" and # is the number of GB you would like allocated. This will be code dependent.
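One way to confirm that lscratch was actually allocated is to look for the job's scratch directory from inside the session. The sketch below assumes the Biowulf convention that allocated scratch space appears at /lscratch/$SLURM_JOB_ID; that path is an assumption not stated in this lesson, so verify it against the HPC documentation.

```shell
# Assumption: allocated lscratch lives at /lscratch/$SLURM_JOB_ID on Biowulf.
scratch_dir="/lscratch/${SLURM_JOB_ID:-}"

if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "$scratch_dir" ]; then
    scratch_status="lscratch available at: $scratch_dir"
else
    # Outside a job, or when no lscratch was requested.
    scratch_status="no lscratch allocation found; fallback is ${TMPDIR:-/tmp}"
fi

echo "$scratch_status"
```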
Info: more memory and CPUs?
You may also want to request more memory and more CPUs (for multi-threaded code), e.g., sinteractive --cpus-per-task=2 --mem=6g --gres=lscratch:20. However, more memory is often not needed, and most R code is single-threaded unless written specifically to be multi-threaded. Track memory and CPU usage using jobload or the user dashboard.
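From inside a session, Slurm's environment variables give a quick view of what was actually allocated. This is a hedged sketch: show_allocation is a hypothetical helper name, and which of these variables are populated can depend on how the session was requested, so each one has a fallback.

```shell
# show_allocation: print what Slurm reports for the current job
# (hypothetical helper; variables are empty outside a Slurm job).
show_allocation() {
    echo "Job ID:        ${SLURM_JOB_ID:-not in a Slurm job}"
    echo "CPUs on node:  ${SLURM_JOB_CPUS_PER_NODE:-unknown}"
    echo "CPUs per task: ${SLURM_CPUS_PER_TASK:-unknown}"
    echo "Memory (MB):   ${SLURM_MEM_PER_NODE:-unknown}"
}

show_allocation
```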
Loading modules
Load the R module and begin the R session.
# Load the module
module load R/4.2.2
# Begin the R session
R
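Before starting R, it can be worth confirming from the shell that the module load worked and which version ended up on your PATH. A minimal sketch, assuming only that a loaded R module provides the Rscript command:

```shell
# Report the R version if Rscript is on PATH, otherwise give a hint.
if command -v Rscript >/dev/null 2>&1; then
    r_status=$(Rscript -e 'cat(R.version.string)')
else
    r_status="Rscript not found on PATH; load an R module first"
fi

echo "$r_status"
```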
Setting up local libraries
Each version of R loaded as a module includes a number of installed packages. However, you may want to install additional packages, which will by default be stored in "~/R/%v/library where %v is the major.minor version of R (e.g. 4.2)".
Due to the space constraints associated with Biowulf home directories (16 GB), it is safer to save installed packages to /data/$USER.
First, make a new package directory.
#replace %v with the major.minor version of R you plan to use (e.g., 4.2)
mkdir -p /data/$USER/R/%v
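If you would rather not type the major.minor version by hand, shell parameter expansion can derive it from a full version string. The helper below, r_lib_dir, is a hypothetical name used only for illustration; it expects a three-part version such as 4.2.2.

```shell
# r_lib_dir VERSION BASE_DIR: print BASE_DIR/R/<major.minor>
# (hypothetical helper; VERSION must look like major.minor.patch).
r_lib_dir() {
    version="$1"
    base_dir="$2"
    # Strip the patch component: 4.2.2 -> 4.2
    major_minor="${version%.*}"
    printf '%s/R/%s\n' "$base_dir" "$major_minor"
}

# Print the directory that would be used for R 4.2.2.
r_lib_dir 4.2.2 "/data/${USER:-$(id -un)}"
```

On Biowulf, the printed path could then be passed to mkdir -p, for example mkdir -p "$(r_lib_dir 4.2.2 "/data/$USER")".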
Next, set this location using $R_LIBS_USER in your ~/.bashrc file.
nano ~/.bashrc
Copy and paste export R_LIBS_USER="/data/$USER/R/%v" to the file. Replace $USER with your username and %v with the correct version number. Use Ctrl + O to write the file, press return, and Ctrl + X to exit.
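As an alternative to editing the file in nano, the same line can be appended from the command line. The sketch below is one possible approach, not part of the official instructions: add_r_libs_user is a hypothetical helper that only appends the export line when no R_LIBS_USER export is present, so running it twice does not create duplicates.

```shell
# add_r_libs_user RC_FILE LIB_DIR: append the export line to RC_FILE unless
# an R_LIBS_USER export is already there (hypothetical helper).
add_r_libs_user() {
    rc_file="$1"
    lib_dir="$2"
    if grep -q '^export R_LIBS_USER=' "$rc_file" 2>/dev/null; then
        echo "R_LIBS_USER is already set in $rc_file"
    else
        printf 'export R_LIBS_USER="%s"\n' "$lib_dir" >> "$rc_file"
        echo "Added R_LIBS_USER to $rc_file"
    fi
}

# Example call (commented out so nothing is modified by accident):
# add_r_libs_user ~/.bashrc "/data/$USER/R/4.2"
```

Either way, the new setting only applies to shells started afterwards, or to the current shell once you run source ~/.bashrc.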
Open R and check your library path.
R
.libPaths()
module load R
.
Let's quit R and end the interactive session.
q() # quit R
exit # end interactive session
Next time
Lesson 3 will feature R project management and using renv to manage package dependencies.