
Transferring Data to the NIH Partek Flow Server Using Globus

Globus

NCI CCR researchers will likely use the NCI CCR Sequencing Facility (SF) for sequencing projects. Data generated by the NCI CCR SF are stored in its Data Management Environment (DME). Researchers can use Globus to transfer data from the NCI CCR SF DME to the NIH Partek Flow server. The steps are described below.

Note

The staff at Biowulf has created detailed documents for Globus, which can be found at https://hpc.nih.gov/docs/globus/setup.php.

Step 1: Logging into Globus

Go to https://www.globus.org and log in by clicking the "LOG IN" button at the top right corner of the page.

On the next page, select the organizational affiliation from the drop-down menu (in this example, National Institutes of Health).

Click on "Continue" when the organizational affiliation has been selected.

Users will then be taken to the NIH authentication page. Click "Sign in" to authenticate using a PIV card.

Next, select the appropriate PIV card certificate (usually the one with the user's name followed by "- A (Affiliate)"). A pop-up will then prompt for the user's NIH PIN.

On the next page, scroll to the bottom and click "I Agree".

Users will then be brought to the Globus interface, where file transfers are managed. Click "COLLECTIONS" to see recently used Globus data transfer endpoints. The "NIH HPC Data Transfer (Biowulf)" endpoint points to the content available on the Biowulf cluster.

Step 2: Setting up a Globus Endpoint to the Partek Flow Server

Click the "NIH HPC Data Transfer (Biowulf)" endpoint and then "Open in File Manager". This opens the user's Biowulf /home/username directory, where username is the user's Biowulf user name.

Recall

The user's Biowulf /home directory is not suitable for analyzing data. To conduct analysis, use the /data directory.

To go to the user's /data directory, replace /home/username in the box labeled "Path" with /data/username.

After switching to the user specific /data directory, find and click into the PartekFlow folder. Recall that the PartekFlow folder will exist only if the user has contacted Biowulf staff about activating a Partek Flow account.

Next, click on "New Folder" to make a folder called "globus" to store data uploaded to the Partek Flow server via Globus.

Then, click into the globus folder and create a subfolder named example_data_transfer.
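For users who prefer a terminal, the same folder layout can likely be created from a Biowulf shell session instead of with the "New Folder" button, since both act on the same /data directory. The sketch below is a hypothetical helper, not part of Globus or Partek Flow. It assumes that $USER holds the Biowulf user name and that the PartekFlow folder already exists (that is, the Partek Flow account has been activated). The optional base-directory argument exists only so the function can be tried outside /data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create PartekFlow/globus/example_data_transfer from a shell.
# Assumption: the base directory defaults to /data/$USER, the user's Biowulf
# /data directory; passing another path as $1 is only for trying it out.
make_transfer_dir() {
    local base_dir="${1:-/data/${USER:-}}"
    local partek_dir="${base_dir}/PartekFlow"
    local target_dir="${partek_dir}/globus/example_data_transfer"

    # PartekFlow exists only after Biowulf staff activate a Partek Flow account,
    # so stop instead of creating it by hand.
    if [[ ! -d "${partek_dir}" ]]; then
        echo "Error: ${partek_dir} not found; contact Biowulf staff about a Partek Flow account." >&2
        return 1
    fi

    # -p also creates the intermediate globus folder and does nothing if both exist.
    mkdir -p "${target_dir}" || return 1

    # Print the folder to select later with "Browse" in "Add Guest Collection".
    echo "${target_dir}"
}
```

After defining the function, running `make_transfer_dir` should print /data/username/PartekFlow/globus/example_data_transfer, which is the folder to select when creating the guest collection.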

After the example_data_transfer folder has been created, go back to the "NIH HPC Data Transfer (Biowulf)" endpoint and click the "COLLECTIONS" tab. From there, click "Add Guest Collection".

In the "Add Guest Collection" menu, click "Browse" to select the folder that the endpoint will reference (i.e., /data/username/PartekFlow/globus/example_data_transfer). Enter a display name (this example uses "example data transfer") and a description for the endpoint, then click "Create Collection" when ready.

The user will then be taken to a page for setting up sharing between the endpoint and the location where the data resides (i.e., the NCI CCR Sequencing Facility DME).

Leave the entry in the box labeled "Path" as "/". Make sure to check the "Write" permission box, because the NCI CCR Sequencing Facility DME must write the data into this Globus endpoint. Then click "Select a Group" to choose the group with which to share /data/username/PartekFlow/globus/example_data_transfer.

From the drop-down menu, select "HPCDME-PROD-App-Accts-Pool-FNLCR"; this returns the user to the "Add Permissions" page. Click "Add Permission" when ready.

Click "Done" to finish the adding permission process.

The user will then be returned to the "example data transfer" endpoint and will see that it has been shared with "HPCDME-PROD-App-Accts-Pool-FNLCR".

Click the "Overview" tab and scroll to the bottom of the page. Take note of the UUID, which tells the Sequencing Facility DME where to send the data. Each Globus endpoint has a different UUID.

Step 3: Downloading Data from the NCI CCR SF DME

Copy the link to the data provided by the NCI CCR SF into a web browser and sign in with the user's NIH credentials. From the page that appears, users can download all of their data or browse through it.

This example uses the tab for browsing data to download the FASTQ files in the folder labeled "Sample_N_1395BL_NextGEM". To download, click the "down arrow" for this folder in the column labeled "Download".

After clicking the download arrow, users will be taken to a download dialog. Select Globus for the "Transfer Type". Then go back to Globus, copy the endpoint UUID, and paste it into the dialog; this ensures that the data is transferred to the right place. Finally, leave the path as "/", as it was set when creating the "example data transfer" Globus endpoint. When ready, click "Download".
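A mistyped or truncated UUID will not point the DME at the intended endpoint, so it may help to sanity-check the copied value before clicking "Download". The function below is a hypothetical convenience, not part of Globus or the DME. It only checks that the string has the standard 8-4-4-4-12 hexadecimal UUID layout; it cannot confirm that the UUID belongs to the "example data transfer" endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a copied endpoint UUID has the usual
# 8-4-4-4-12 hexadecimal layout. It cannot tell whether the UUID is the
# "example data transfer" endpoint; compare it with the Overview tab for that.
is_uuid_format() {
    local candidate
    # Drop any whitespace (e.g. a trailing space or newline) picked up while copying.
    candidate="$(printf '%s' "$1" | tr -d '[:space:]')"
    # Return success (0) only if the whole string matches the UUID pattern.
    [[ "${candidate}" =~ ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$ ]]
}
```

For example, `is_uuid_format "<pasted UUID>" && echo "looks like a UUID"` prints the message only when the pasted value has the expected shape.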

If all goes well, users will see a message indicating that the data transfer request has been submitted successfully.

Click on "Manage" and then "Download Tasks" to check download progress. Each download is assigned a task ID.

Go back to the "example data transfer" endpoint on Globus and click on "Open in File Manager".

The data will appear in the /data/username/PartekFlow/globus/example_data_transfer folder (i.e., the folder referenced by the "example data transfer" endpoint) as the download proceeds.

The transferred files are also visible from a Biowulf command line. Again, replace username with the user's Biowulf user name.

ls /data/username/PartekFlow/globus/example_data_transfer
N_1395BL_NextGEM_S2_L001_I1_001.fastq.gz
N_1395BL_NextGEM_S2_L001_R1_001.fastq.gz
N_1395BL_NextGEM_S2_L001_R2_001.fastq.gz
N_1395BL_NextGEM_S2_L002_I1_001.fastq.gz
N_1395BL_NextGEM_S2_L002_R1_001.fastq.gz
N_1395BL_NextGEM_S2_L002_R2_001.fastq.gz
N_1395BL_NextGEM_S2_L003_I1_001.fastq.gz
N_1395BL_NextGEM_S2_L003_R1_001.fastq.gz
N_1395BL_NextGEM_S2_L003_R2_001.fastq.gz
N_1395BL_NextGEM_S2_L004_I1_001.fastq.gz
N_1395BL_NextGEM_S2_L004_R1_001.fastq.gz
N_1395BL_NextGEM_S2_L004_R2_001.fastq.gz
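Once the DME download task finishes, a quick shell check on Biowulf can help confirm that every expected file arrived intact. The sketch below assumes the naming pattern in the listing above (lanes L001 to L004, each with I1, R1, and R2 files) and uses `gzip -t` to test each archive. The function name and the prefix, lane, and read lists are illustrative assumptions to adjust for other samples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the FASTQ files from the listing above arrived
# and are readable gzip archives. Assumption: the prefix, lanes, and read types
# match the Sample_N_1395BL_NextGEM example; edit them for other samples.
check_fastqs() {
    local dir="$1"
    local prefix="N_1395BL_NextGEM_S2"
    local lane read_type file
    local missing=0 corrupt=0

    for lane in L001 L002 L003 L004; do
        for read_type in I1 R1 R2; do
            file="${dir}/${prefix}_${lane}_${read_type}_001.fastq.gz"
            if [[ ! -f "${file}" ]]; then
                echo "missing: ${file}" >&2
                missing=$((missing + 1))
            elif ! gzip -t "${file}" 2>/dev/null; then
                # A failed test may mean the transfer is still in progress or was interrupted.
                echo "corrupt: ${file}" >&2
                corrupt=$((corrupt + 1))
            fi
        done
    done

    echo "missing=${missing} corrupt=${corrupt}"
    # Succeed only when all 12 files are present and pass gzip -t.
    [[ ${missing} -eq 0 && ${corrupt} -eq 0 ]]
}
```

For example, `check_fastqs /data/username/PartekFlow/globus/example_data_transfer` prints the number of missing and corrupt files and returns a non-zero status if there are any. Running it before the download task completes will likely report missing or truncated files.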