Lesson 2: Getting Started with R on Biowulf

Learning objectives

  1. Understand how R can be deployed on Biowulf
  2. Understand how to access and use R modules
  3. Learn to create a custom R library on Biowulf

Deploying R on Biowulf

There are multiple ways to use R on Biowulf. See the HPC documentation.

Note

R sessions are not allowed on Helix or the login node. All R sessions must use computational nodes.

  1. Interactively

    Your workflow may require some interactivity (e.g., modifying code based on graphical output). In such cases, users generally like to use an IDE (integrated development environment). The preferred IDE for R programming is generally RStudio. However, if you are experiencing significant lag, there are other options, including Jupyter Lab and VSCode.

    • RStudio

      There are currently 2 ways to run RStudio on Biowulf.

      1. Using NoMachine (To be demoed)

        • To get started, you will need to install NoMachine (NX), "a graphical client that presents a full virtual Linux desktop to a window on the user's local machine".
        • Once NoMachine is installed, follow these instructions to start RStudio.

      Warning

      NoMachine uses X11 forwarding and may lag.

      2. Using RStudio Server (Warning: Under development) (To be demoed)

    • Jupyter Lab

    • VSCode

    • Connecting and using just an R console (Lesson 1)

      This is how we will use R in today's lesson.

  2. Submitting R scripts via sbatch (Lesson 4)

    To run an R script from the command line, you can use either Rscript or R CMD BATCH. Rscript is preferred and prints output to stdout. R CMD BATCH writes both the R commands and their output to a .Rout file. See more information here.
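As a rough sketch of the two options above, the following writes a tiny R script and a batch script that runs it with Rscript. The file names (hello.R, run_hello.sh) and the resource values are illustrative assumptions, not recommendations from the HPC documentation. The submission command is left commented out because it only works on a system with Slurm.

```shell
# Hypothetical file names, chosen only for illustration.
cat > hello.R <<'EOF'
cat("Hello from R\n")
EOF

cat > run_hello.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=6g
#SBATCH --gres=lscratch:20

module load R/4.2.2
# Rscript prints output to stdout, which Slurm captures in the job's output file.
Rscript hello.R
# Alternative: R CMD BATCH writes commands and output to hello.Rout.
# R CMD BATCH hello.R
EOF

# On Biowulf, the batch script would be submitted with:
# sbatch run_hello.sh
```

Only one of the two R invocations is needed in a given batch script; the second is shown as a comment for comparison.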

Remove your hands from your keyboard, sit back, and enjoy a demo on how to use the RStudio IDE on Biowulf. If you intend to use an IDE to interact with R on Biowulf and you experience difficulties in the future, please email us at ncibtep@nih.gov.

Connect to Biowulf (Hands-on)

To connect to Biowulf, you must be on the NIH network, either on campus or via VPN.

We will then connect using SSH.

Open your terminal if you are on a Mac, or the command prompt if you are on Windows, and type the following:

ssh username@biowulf.nih.gov  

Replace username with your NIH user name. You will then be prompted for your NIH password.

Note

The cursor will not move nor will you be able to see what you type when entering your password.

Getting started with R

We will be working with R from our /data/$USER directory. There is not much space in ~ (16 GB), so it is good practice to always cd to /data/$USER.

cd /data/$USER  

Info

$USER is an environment variable. You can read more about environment variables here.
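To see what the shell substitutes for $USER, a minimal sketch like the one below can help. The variable names user_name and data_dir are made up for this example, and the fallback to id -un is an assumption for environments where USER happens to be unset.

```shell
# Use the value of USER; fall back to `id -un` if the variable is unset.
user_name="${USER:-$(id -un)}"
echo "Logged in as: $user_name"

# The shell expands the variable before the command runs,
# so this is the path that `cd /data/$USER` would resolve to.
data_dir="/data/$user_name"
echo "Data directory: $data_dir"
```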

The default R installation on Biowulf is R/4.3.0 as of May 2023. R is available on Biowulf via environment modules.

To see the available modules use:

module -r avail '^R$'  

Here we are using the module command with the -r option, which enables regular expression matching, and avail, which returns a list of available modules.
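The effect of the ^ and $ anchors in '^R$' can be illustrated with grep on a made-up list of names. The names below are hypothetical and are not the actual Biowulf module list. How the module command applies the pattern internally is not shown here; this sketch only demonstrates what the anchors do.

```shell
# Hypothetical module names, used only to illustrate the pattern.
names=$(printf '%s\n' R Rstudio bioR rclone)

# Unanchored: matches every name that contains an uppercase R.
loose=$(printf '%s\n' "$names" | grep 'R')

# Anchored: ^ and $ require the whole name to be exactly "R".
exact=$(printf '%s\n' "$names" | grep '^R$')

echo "Unanchored matches:"
echo "$loose"
echo "Anchored matches:"
echo "$exact"
```

Without the anchors, names that merely contain the letter R would also be listed, which is why the anchored form is used to find the R module itself.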

Before loading the R module and running R, we first need an interactive session on a compute node. R cannot be used on the login node or on Helix.

sinteractive --gres=lscratch:5  

sinteractive default allocations

The default sinteractive allocation is 1 core (2 CPUs), 0.768 GB of memory per CPU (1.536 GB in total, shown rounded to 2 GB in the terminal), and a walltime of 8 hours.
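The total memory figure follows directly from the per-CPU value. A small sketch of the arithmetic, with variable names invented for this example:

```shell
# Default allocation values quoted above.
cpus=2
gb_per_cpu=0.768

# bash arithmetic is integer-only, so awk is used for the decimal product.
total_gb=$(awk -v c="$cpus" -v g="$gb_per_cpu" 'BEGIN { printf "%.3f", c * g }')
echo "Default memory: ${total_gb} GB"
```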

Note: lscratch

"R will automatically use lscratch for temporary files if it has been allocated" (HPC Biowulf docs). lscratch space can be requested using --gres=lscratch:#, where gres stands for "generic resources" and # is the number of GB you would like allocated. This will be code dependent.

Info: more memory and CPUs?

You may also want to request more memory and more CPUs (for multi-threaded code), e.g., sinteractive --cpus-per-task=2 --mem=6g --gres=lscratch:20. However, more memory is often not needed, and most R code is single-threaded unless it was written specifically to be multi-threaded. Track memory and CPU usage using jobload or the user dashboard.

Loading modules

Load the R module and begin the R session.

# Load the module
module load R/4.2.2 
# Begin the R session 
R

Setting up local libraries

Each version of R loaded as a module includes a number of installed packages. However, you may want to install additional packages, which will by default be stored in "~/R/%v/library where %v is the major.minor version of R (e.g. 4.2)".

Due to the space constraints on Biowulf home directories (16 GB), it is safer to save installed packages to /data/$USER.

First, make a new package directory.

# Replace %v with the major.minor version of R you plan to use (e.g., 4.2)
mkdir -p /data/$USER/R/%v  
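If you prefer not to type the version by hand, the major.minor part can be derived from a full version string. The sketch below assumes the version matches the module you load (e.g., R/4.2.2). LIB_BASE is a hypothetical variable introduced here; it defaults to a temporary directory so the sketch can be tried safely, whereas on Biowulf it would be /data/$USER/R.

```shell
# Assumption: the full version string matches the module you load (e.g., R/4.2.2).
r_version="4.2.2"

# Strip the trailing ".patch" component to get major.minor.
r_major_minor="${r_version%.*}"

# LIB_BASE is a hypothetical stand-in; on Biowulf it would be /data/$USER/R.
LIB_BASE="${LIB_BASE:-$(mktemp -d)/R}"

# -p creates parent directories as needed and does not fail if they already exist.
mkdir -p "$LIB_BASE/$r_major_minor"
echo "Created $LIB_BASE/$r_major_minor"
```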

Next, set this location using $R_LIBS_USER in your ~/.bashrc file.

nano ~/.bashrc  

Copy and paste export R_LIBS_USER="/data/$USER/R/%v" into the file. Replace $USER with your username and %v with the correct version number. Press Ctrl + O and then return to write the file, and Ctrl + X to exit.
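The same line can also be appended from the command line rather than in an editor. The following is a minimal sketch, not the method from the HPC documentation. RC_FILE is a hypothetical stand-in for ~/.bashrc that defaults to a temporary file so the example can be tried without touching your real configuration, and the version 4.2 is assumed. Here $USER is left literal in the file, since bash expands it when the file is read.

```shell
# RC_FILE is a hypothetical stand-in for ~/.bashrc.
RC_FILE="${RC_FILE:-$(mktemp)}"

# Single quotes keep $USER literal in the file; bash expands it when the file is sourced.
line='export R_LIBS_USER="/data/$USER/R/4.2"'

# Append only if the exact line is not already present.
grep -qxF "$line" "$RC_FILE" || echo "$line" >> "$RC_FILE"

# Running the same command again does not add a duplicate.
grep -qxF "$line" "$RC_FILE" || echo "$line" >> "$RC_FILE"

# Count how many lines mention the variable.
count=$(grep -c 'R_LIBS_USER' "$RC_FILE")
echo "Lines setting R_LIBS_USER: $count"
```

Checking for the line before appending keeps the file clean if the setup step is repeated.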

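Reload your shell configuration with source ~/.bashrc (or start a new session) so that the new variable takes effect. Then open R and check your library path.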

R  
.libPaths() 

You should see the new path to your personal library listed first, followed by the library established by module load R.

Let's quit R and end the interactive session.

q() # quit R
exit # end interactive session 

Next time

Lesson 3 will feature R project management and using renv to manage package dependencies.