Accessing Partek Flow at NIH and tips for data transfer
Learning objectives
After consulting this guide, participants will:
- Know how to access Partek Flow at NIH.
- Be able to transfer data from the NCI CCR Sequencing Facility Data Management Environment to their Biowulf Partek Flow folder.
Instructions for accessing Partek Flow
NCI researchers can find instructions for accessing Partek Flow at https://bioinformatics.ccr.cancer.gov/btep/partek-flow-bulk-and-single-cell-rna-seq-data-analysis/. In short, each user needs the following:
- A Biowulf (the NIH High Performance Computing cluster) account; see here for information about how to obtain an HPC account.
- A /data directory on Biowulf with enough disk space to hold their Partek Flow files; please fill out this online form if you do not already have a /data directory or if you require more disk space.
- A Partek Flow account created for them; please contact staff@hpc.nih.gov.
Once these steps have been accomplished, Partek Flow is available at https://partekflow.cit.nih.gov/flow.
The Partek Flow folder on Biowulf
HPC staff will create a folder called "PartekFlow" in the user's Biowulf data directory. This folder will hold all Partek Flow projects.
Transferring data from NCI CCR Sequencing Facility to Partek Flow on Biowulf
Researchers who had sequencing done at the NCI CCR Sequencing Facility will receive a link to their data. The data can be transferred to the "PartekFlow" folder on Biowulf using Globus. The steps for setting up a Globus endpoint for the Biowulf "PartekFlow" folder can be found at https://partekflow.cit.nih.gov/#upload_globus. The embedded PDF shows how to connect the sequencing facility's data management environment to a Globus endpoint.
For those who have not set up a Globus account, refer to https://hpc.nih.gov/docs/globus/setup.php for instructions.
Tip
When following the Biowulf instructions for creating a Globus endpoint for the "PartekFlow" folder, it is a good idea to use a separate subdirectory for the data from each experiment. This exercise will use a subdirectory called fnl_example_single_cell_fastq.
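As a minimal sketch of the tip above, a per-experiment subdirectory can be created from the Biowulf command line. The helper name make_experiment_dir is hypothetical, and the base path in the commented example call is an assumption about where the "PartekFlow" folder lives; adjust it to your own data directory.

```shell
# Hypothetical helper: create one subdirectory per experiment under a base folder
# and print the resulting path.
make_experiment_dir() {
    base="$1"   # folder that holds the experiment subdirectories
    name="$2"   # experiment name, e.g. fnl_example_single_cell_fastq
    mkdir -p "$base/$name" && printf '%s\n' "$base/$name"
}

# Example call (commented out). The base path is an assumption, not taken from
# the Biowulf instructions; replace it with your own PartekFlow folder.
# make_experiment_dir "/data/$USER/PartekFlow" fnl_example_single_cell_fastq
```

Because mkdir -p is used, calling the helper again for an existing experiment leaves that directory and its contents untouched.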
Sign onto Globus
Information regarding Globus and how to obtain it can be found at https://hpc.nih.gov/docs/globus/setup.php.
For those who already have Globus set up, go to https://www.globus.org and log in by clicking on the "LOG IN" icon at the top right of the page.
After clicking on the log in button, select the organizational affiliation, which is National Institutes of Health in this example.
Click on "Continue" when the organizational affiliation has been selected.
After clicking "Continue", users will be brought to the Globus interface where file transfers are managed. Clicking on "COLLECTIONS" and then checking "ADMINISTERED BY YOU" will reveal several endpoints, including the one for the instructor's Biowulf "PartekFlow" folder, labeled "example fastq from fnl sf dme to biowulf partek flow". Click on this endpoint to see its overview, and note the UUID.
NCI CCR Sequencing Facility Data Management Environment
The NCI CCR Sequencing Facility will send researchers a link to the data, which is stored in their Data Management Environment (DME). Again, instructions for connecting DME to a Globus endpoint on Biowulf are in the embedded PDF.
Users are able to download an entire collection of data or browse the collection and download a subset.
This example will browse the collection and download the file N_1395BL_NextGEM_count.tar.
On the subsequent page, enter the UUID for the Biowulf Partek Flow Globus endpoint and "/" for the path. Then click "Download".
Globus will send an email to the user's NIH email account after the transfer has completed.
These files will show up on Biowulf as well and will need to be unpacked using tar -xvf.
[wuz8@biowulf fnl_example_single_cell_fastq]$ ls -1 *.tar
N_1395BL_NextGEM_count.tar
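The unpacking step can be sketched as a small shell helper that extracts every .tar archive found in a directory. The function name unpack_tars is hypothetical, and the directory in the commented example call is an assumption; only the standard tar options -x, -v, -f and -C are used.

```shell
# Hypothetical helper: extract every .tar archive in a directory, in place.
unpack_tars() {
    dir="${1:-.}"                       # default to the current directory
    for archive in "$dir"/*.tar; do
        [ -e "$archive" ] || continue   # skip when no archive matches the pattern
        tar -xvf "$archive" -C "$dir"   # -C extracts into the same directory
    done
}

# Example call (commented out); run from inside the experiment subdirectory,
# which is assumed here to be fnl_example_single_cell_fastq.
# unpack_tars .
```

For a single archive, the equivalent direct command is tar -xvf N_1395BL_NextGEM_count.tar, run from the directory that contains the file.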
Importing data to Partek Flow project
Log into Partek Flow at https://partekflow.cit.nih.gov/flow. This example will use the nci_ccr_sf_example_scrna project, so click on it.
Click on "Add data".
Select Single cell, scRNA-Seq, and check 10x Genomics Cell Ranger counts h5. Then click Next.
Navigate through the PartekFlow, globus, fnl_example_single_cell_fastq, N_1395BL_NextGEM, and outs folders, select the filtered_feature_bc_matrix.h5 file, and then click Next at the bottom of the screen.
On the subsequent page, provide an informative sample name and select the appropriate assembly. Then click Finish.
When the import is done, there will be a "Single cell counts" data node in the Analyses window.
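If the file is not visible in the Partek Flow file browser, it may help to confirm on Biowulf that the unpacked archive actually contains it. The sketch below uses the standard find command; the function name find_count_matrix is hypothetical, and the search root in the commented example call is an assumption.

```shell
# Hypothetical helper: list every filtered_feature_bc_matrix.h5 file below a directory.
find_count_matrix() {
    root="${1:-.}"   # directory to search; defaults to the current directory
    find "$root" -type f -name 'filtered_feature_bc_matrix.h5'
}

# Example call (commented out); run from the experiment subdirectory.
# find_count_matrix .
```

An empty result suggests that the archive has not been unpacked yet or that the search started in the wrong directory.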