Introduction to Data Transfer using Globus

Joe Wu, PhD
NCI CCR Bioinformatics Training and Education Program
ncibtep@nih.gov

Globus

"Globus lets you share data on your storage systems with collaborators at other institutions. You specify what data. You specify which colleagues. Globus manages access simply and securely, so you can focus on your research." -- https://www.globus.org/

Why use Globus

  • Recommended for transferring large quantities of data including next generation sequencing (NGS).
  • "Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer." -- Biowulf. Once a transfer is initiated, the user can walk away from the computer.
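As a rough illustration of what "retry failures" means in general, the sketch below repeats a failing step with growing pauses between attempts. It is not how Globus is implemented; the function names, the delay values, and the simulated fault are all assumptions made for this example.

```python
import time


def run_with_retries(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky copy step: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_copy():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network fault")
    return "copied"


# The sleep function is replaced so the demonstration does not actually wait.
result = run_with_retries(flaky_copy, sleep=lambda seconds: None)
```

With a managed transfer service, this kind of loop runs on the service side, which is why the user does not need to stay at the computer.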

General Steps to Using Globus

  • Have a Biowulf account.
  • Install the Globus desktop client on the local computer. This enables the use of the local computer as a data transfer endpoint.
  • Set up data transfer endpoints.
  • Initiate data transfer.

Help Resources

Biowulf has an extensive tutorial on using Globus. See https://hpc.nih.gov/docs/globus/setup.php.

Logging into Globus

Go to https://www.globus.org/ to sign in to Globus. Google Chrome is recommended.

Accessing Globus from Biowulf HPC OnDemand

Globus can be accessed from Biowulf HPC OnDemand. Just click on any of the user's Biowulf directories under the "Files" tab and then on the "Globus" icon.

Select Affiliation

After clicking "LOG IN" on the Globus page, users will be prompted to select an institutional affiliation. For NIH, just type "national institutes of health" in the drop-down menu. Click "Continue" when ready.

Sign in with PIV Card

Next, choose to sign in to Globus with NIH PIV card credentials and enter the PIN when prompted.

Agree to the Terms of Globus and Authenticate

At the next screen, click "I Agree" to accept the terms of Globus.

Globus Landing Page

Globus File Endpoint

"An "endpoint" is one of the two file transfer locations – either the source or the destination – between which files can move. Once a resource (server, cluster, storage system, laptop, or other system) is defined as an endpoint, it will be available to authorized users who can transfer files to or from this endpoint." -- Globus

Globus Collection

The Collections tab provides a table with metadata about the data transfer endpoints that a user has set up.

Globus Collection Table

Clicking on the "Collections" tab displays the following table.

Globus Collection Table: Columns Explanation

  • COLLECTION: Contains the name of the endpoint. The example below shows the NIH HPC data transfer endpoint.
  • SUBSCRIBED: When checked, indicates that the endpoint belongs to an organization that has a Globus subscription.
  • HA: Refers to high assurance collections. When checked, indicates that the endpoint is suitable for data such as protected health information (PHI). Biowulf cannot be used for PHI, so this column is not checked.
  • STATUS: Indicates whether the endpoint is ready to use.
  • ROLE: Indicates the user's role on the endpoint, such as whether the user has administrative rights.
  • The box with the up arrow on the far right links to the file transfer manager.
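The columns above can be pictured as one record per endpoint. The sketch below is only a way to read the table, not a Globus data format: the class name, field names, and the values for SUBSCRIBED, STATUS, and ROLE are assumptions chosen for illustration. Only the unchecked HA column for Biowulf comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRow:
    collection: str        # COLLECTION: name of the endpoint
    subscribed: bool       # SUBSCRIBED: organization has a Globus subscription
    high_assurance: bool   # HA: suitable for data such as PHI
    status: str            # STATUS: whether the endpoint is ready to use
    role: str              # ROLE: the user's rights on the endpoint


def usable_for_phi(row):
    """PHI should only go to a high assurance (HA) collection."""
    return row.high_assurance


biowulf = CollectionRow(
    collection="NIH HPC Data Transfer (Biowulf)",
    subscribed=True,        # assumed for this example
    high_assurance=False,   # HA is not checked for Biowulf
    status="ready",         # hypothetical value
    role="user",            # hypothetical value
)

print(usable_for_phi(biowulf))
```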

Globus Endpoint Overview

Click on ">" on the far right of the Collection table to see more detailed information regarding an endpoint. The UUID is important and needed for data transfer.
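Because the UUID is usually copied and pasted, it is easy to pick up stray spaces or lose a character. A minimal sketch for checking a pasted value with Python's standard `uuid` module is shown below. The function name is hypothetical and the UUID in the usage note is made up, not a real endpoint.

```python
import uuid


def normalize_uuid(text):
    """Return the canonical lowercase form of a UUID string, or None if invalid."""
    try:
        return str(uuid.UUID(text.strip()))
    except ValueError:
        return None
```

For example, `normalize_uuid(" 12345678-1234-5678-1234-56781234ABCD ")` returns the trimmed, lowercase value, while `normalize_uuid("not-a-uuid")` returns `None`.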

Globus Endpoint to Local Computer

Setting up Globus Local Endpoint (step 1)

Launch Globus desktop client and choose "Log In".

Setting up Globus Local Endpoint (step 2)

Users may need to allow the Globus desktop client to find local folders. Select "Allow" and sign in to Globus as shown in earlier slides.

Setting up Globus Local Endpoint (step 3)

After re-authenticating, fill out a name for the endpoint.

Setting up Globus Local Endpoint (step 4)

Subsequently, provide the name of the collection as well as a description. Hit "Save" when ready.

Setting up Globus Local Endpoint (step 5)

The endpoint appears under the "Administered By You" tab in the collections table.

Data Transfer from Local to Biowulf (step 1)

Click on the "COLLECTIONS" tab and select the "NIH HPC Data Transfer (Biowulf)" endpoint. Select "Transfer or Sync" to start a data transfer. A second file manager window opens. Here, click the magnifying glass to search for the endpoint to transfer to or from. This example will use the local computer endpoint that was set up.

Data Transfer from Local to Biowulf (step 2)

Once the local endpoint ("Joe.Wu.local") is selected, type the path to the file or folder that needs to be transferred. Click "Start" when ready. To transfer from Biowulf, just click on "Start" on the "NIH HPC Data Transfer (Biowulf)" endpoint panel.
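The same source-and-destination choice can also be written out as a command. The sketch below assumes the separate Globus command line client (`globus`), which is not covered in this tutorial, is installed and logged in. It only builds the argument list for `globus transfer` and does not run anything. The function name, UUID placeholders, and paths are hypothetical.

```python
def build_transfer_command(source_uuid, source_path, dest_uuid, dest_path,
                           label=None, recursive=False):
    """Build the argument list for a `globus transfer` command.

    Each side is written as ENDPOINT_UUID:PATH.
    """
    command = [
        "globus", "transfer",
        f"{source_uuid}:{source_path}",
        f"{dest_uuid}:{dest_path}",
    ]
    if recursive:
        command.append("--recursive")  # needed when transferring a folder
    if label:
        command += ["--label", label]
    return command


# Placeholders only: substitute the UUIDs shown in each endpoint's overview.
command = build_transfer_command(
    "LOCAL-UUID-PLACEHOLDER", "/hypothetical/local/folder/",
    "BIOWULF-UUID-PLACEHOLDER", "/hypothetical/biowulf/folder/",
    label="local to Biowulf", recursive=True,
)
print(" ".join(command))
```

Swapping the two UUID and path pairs gives the transfer in the other direction, which mirrors clicking "Start" on the other panel in the file manager.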

Data Transfer from Local to Biowulf (step 3)

A message will appear if the transfer request was successfully submitted.

Data Transfer from Local to Biowulf (step 4)

Click on "ACTIVITY" to view details such as transfer progress. Once the transfer is complete, users will get an email.
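For users who would rather script the check than watch the "ACTIVITY" page, the idea can be sketched as a polling loop. This is a minimal sketch under assumptions: `get_status` is a hypothetical callable that returns the task's current status string from whatever client the user has available, and "TIMED_OUT" is a local marker used by this example, not a Globus status.

```python
import time

# A Globus transfer task ends in one of these two states.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_transfer(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_status() until the task finishes or max_polls is used up."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    return "TIMED_OUT"  # local marker: the task was still running when polling stopped
```

In practice the email notification serves the same purpose, so a loop like this is mainly useful when a later analysis step should start automatically once the data has arrived.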

Data Transfer from Local to Biowulf (step 5)

After the transfer completes, refresh the "NIH HPC Data Transfer (Biowulf)" endpoint and click on "LAST MODIFIED" to ensure the data was successfully transferred.
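An optional extra check, not part of the steps above, is to compare checksums of the original file and the transferred copy. The sketch below computes a SHA-256 digest with Python's standard library; the function name is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequencing files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `sha256_of_file` on the local computer and again on Biowulf should give the same digest if the two copies are identical.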

Schedule Data Transfer

Users can schedule data transfer.

Transfer from NCI CCR Sequencing Facility Data Management Environment: Overview

This example applies to those researchers who utilize the NCI CCR Sequencing Facility for sequencing experiments. The sequencing facility will:

  • Provide a link to their Data Management Environment (DME) for researchers to access their data.
  • Perform many of the analysis steps for researchers, including QC.

Please check with the specific core for data management and transfer issues if not using NCI CCR Sequencing Facility.

Make a New Folder in Globus

Open the "NIH HPC Data Transfer (Biowulf)" endpoint and, in the instructor's data directory, create a new folder called globus_transfers.

Add Guest Collection

Next, go back to the "NIH HPC Data Transfer (Biowulf)" endpoint and click on "Add Guest Collection".

Provide Guest Collection Information

On the subsequent page, supply the directory to which the guest collection will be linked. Here, it is the globus_transfers folder under the instructor's Biowulf data folder. Provide a display name and description for the guest collection and hit "Create Collection" when done.

Granting Permission for NCI CCR SF DME to Transfer to Guest Collection (step 1)

Next, grant permission for the NCI CCR Sequencing Facility Data Management Environment to share data with the globus demonstration collection.

Granting Permission for NCI CCR SF DME to Transfer to Guest Collection (step 2)

  • Keep / in the Path box.
  • Be sure to select share with group.
  • Make sure that permissions are set to read and write.
  • Click "Select a Group" to find the group that will be granted permission to this endpoint.
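The choices in this form (path, share with group, read and write) can be summarized as a small record. The sketch below builds such a record as a Python dictionary. The field names are modeled on the access rule document of the Globus Transfer API as best understood here and should be checked against the current Globus documentation before use. The function name and the group ID are placeholders.

```python
def build_group_access_rule(group_id, path="/", permissions="rw"):
    """Describe a permission grant: which group, on which path, with what rights."""
    if permissions not in {"r", "rw"}:
        raise ValueError("permissions must be 'r' (read) or 'rw' (read and write)")
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": "group",  # "share with group" in the web form
        "principal": group_id,      # ID of the group chosen via "Select a Group"
        "path": path,               # "/" covers the whole guest collection
        "permissions": permissions, # "rw" is read and write
    }


rule = build_group_access_rule("GROUP-ID-PLACEHOLDER")
print(rule)
```

Keeping "/" as the path means the group can write anywhere inside the guest collection, which is limited to the folder the collection was linked to.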

Granting Permission for NCI CCR SF DME to Transfer to Guest Collection (step 3)

Enter the name of the group to grant permission for the collection, or start typing and a list of options will appear for users to choose from.

Granting Permission for NCI CCR SF DME to Transfer to Guest Collection (step 4)

The group that is getting permission granted is now listed next to the "Group" column. Click "Add Permission" when ready.

Granting Permission for NCI CCR SF DME to Transfer to Guest Collection (step 5)

When prompted, click "Done" to complete the permission granting process.

Granting Permission for NCI CCR SF DME to Transfer to Guest Collection (step 6)

Look at the "Overview" for the "globus demonstration" collection. Note the UUID. This will be needed when transferring files from the sequencing facility DME to Biowulf.

Getting Data from Sequencing Facility DME (step 1)

On the sequencing facility DME page where the data is stored, click on "Browse project data" to peruse specific data or download all of the data.

Getting Data from Sequencing Facility DME (step 2)

This example will browse for specific data to download. Just click on the "Download" button to the far right of the file content table when ready.

Getting Data from Sequencing Facility DME (step 3)

On the next page, select the Globus radio button under "Transfer Type".

Getting Data from Sequencing Facility DME (step 4)

Users will see the message highlighted in blue in the image below when the transfer request has been successfully submitted.

Getting Data from Sequencing Facility DME (step 5)

Click on "Manage" and then "Download Tasks". The status for the data transfer will change to complete when done.

Getting Data from Sequencing Facility DME (step 6)

Users will also receive emails from the sequencing facility DME and Globus confirming that the transfer completed successfully.

Getting Data from Sequencing Facility DME (step 7)

The folder to which the data was transferred is now populated with content.