Accessing Partek Flow at NIH and tips for data transfer
Learning objectives
Instructions for accessing Partek Flow
NCI researchers can find instructions for accessing Partek Flow at https://bioinformatics.ccr.cancer.gov/btep/partek-flow-bulk-and-single-cell-rna-seq-data-analysis/. In short, the following are required:
- A Biowulf account (Biowulf is the NIH high-performance computing cluster). See here for information about how to obtain an HPC account.
- A /data directory on Biowulf with enough disk space to hold the Partek Flow files. Please fill out this online form if you do not already have a /data directory or if you require more disk space.
- A Partek Flow account. Please contact staff@hpc.nih.gov to have one created.
Once these steps have been accomplished, Partek Flow is available at https://partekflow.cit.nih.gov/flow.
Data transfer using command line:
Tip
File transfer using command line:
- Copying from the /data directory to the Partek Flow uploads folder is OK.
- Copying from the /data directory to a Partek Flow project folder is not OK.
- Copying from the Partek Flow uploads folder to a project folder is not OK.
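The three rules above reduce to one check: a command-line copy is acceptable only when its destination is the uploads folder or something inside it. The following is a minimal sketch of that check, not an official Biowulf tool. The function names are hypothetical, and the path layout /data/username/PartekFlow/uploads is taken from the later sections of this page.

```python
import posixpath
from pathlib import PurePosixPath


def uploads_dir(username: str) -> PurePosixPath:
    """Return the uploads folder for a user, following the layout named on this page."""
    return PurePosixPath("/data") / username / "PartekFlow" / "uploads"


def is_allowed_destination(dest: str, username: str) -> bool:
    """Return True only if dest is the uploads folder or lies inside it."""
    # Normalize first so that ".." segments cannot step out of the uploads folder.
    normalized = PurePosixPath(posixpath.normpath(dest))
    target = uploads_dir(username)
    return normalized == target or target in normalized.parents


if __name__ == "__main__":
    user = "username"  # placeholder, replace with the actual Biowulf username
    candidates = [
        "/data/username/PartekFlow/uploads/hbr_uhr_fastq",
        "/data/username/PartekFlow/demonstration_project",
    ]
    for dest in candidates:
        print(dest, is_allowed_destination(dest, user))
```

The check only inspects path strings; it does not touch the file system or replace the permissions policy described in the caution below.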
Caution
"Virtually all file transfer activities should be run from within the web interface, rather than from the command line. We have implemented a permissions policy on users' PartekFlow directories to prevent inadvertent file removal mistakes that breaks the old way of moving files around." -- Biowulf staff
Data transfer using SFTP client
Biowulf recommends several options for transferring data from a local computer to the cluster, including graphical SFTP clients and the SCP or SFTP command-line tools (see https://hpc.nih.gov/docs/transfer.html). Regardless of the method, the only place users can transfer data into is the /data/username/PartekFlow/uploads folder.
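As an illustration of the SCP option, the sketch below assembles an scp command whose destination is always the uploads folder. It only builds and prints the command; it does not run it. The host name helix.nih.gov is an assumption that does not appear on this page and should be checked against the transfer documentation linked above. The function name and the FASTQ file names are hypothetical.

```python
import shlex


def build_scp_command(local_files, username, host="helix.nih.gov"):
    """Return an scp argument list that targets the PartekFlow uploads folder.

    The default host is an assumption; confirm it in the Biowulf transfer docs.
    """
    if not local_files:
        raise ValueError("at least one local file is required")
    # The uploads folder is the only location that accepts incoming files.
    destination = f"{username}@{host}:/data/{username}/PartekFlow/uploads/"
    return ["scp", *local_files, destination]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    command = build_scp_command(
        ["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "username"
    )
    # Print the command for review instead of executing it.
    print(shlex.join(command))
```

Printing the command first makes it easy to confirm the destination before anything is copied.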
Data transfer using Globus
Step 1:
Go to https://www.globus.org and log in by clicking on the "LOG IN" icon at the top right of the page.
After logging in, select the organizational affiliation, which is National Institutes of Health in this example.
Click on "Continue" when the organizational affiliation has been selected.
After clicking "Continue", users will be brought to the Globus interface where file transfers are managed. Users will see the recently accessed Globus data transfer endpoints. The image below shows the NIH HPC Data Transfer (Biowulf) endpoint, which exposes the content available on the cluster. "Joe Wu collection" points to the files and directories on a local computer.
To access the NIH HPC Data Transfer (Biowulf) Globus endpoint, type either of the following in the search box.
- NIH HPC Data Transfer (Biowulf)
- e2620047-6d04-11e5-ba46-22000b92c6ec (used in the example below)
Clicking on the NIH endpoint takes users to the Biowulf home directory. Type /data/username to change into the researcher's own data directory. Those with a Partek Flow account set up will see a folder called "PartekFlow" in the data directory.
Click on the "PartekFlow" directory to see what is inside. In the PartekFlow folder below, there are several projects, but importantly there is a folder called "uploads", which is the only place where files can be uploaded.
To learn more about Globus, refer to "A guide to Globus from Biowulf".
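For those who prefer scripting to the web interface, the same transfer can be expressed with the Globus command-line client. The sketch below only assembles a `globus transfer` command and prints it. It assumes the Globus CLI is installed and logged in, which this page does not cover. The Biowulf endpoint ID is the one listed above; the local endpoint ID, the local path, and the function name are hypothetical placeholders.

```python
import shlex

# Endpoint ID for "NIH HPC Data Transfer (Biowulf)" as listed on this page.
BIOWULF_ENDPOINT = "e2620047-6d04-11e5-ba46-22000b92c6ec"


def build_globus_transfer(local_endpoint, local_path, username, folder_name):
    """Return a `globus transfer` argument list that copies a folder into uploads."""
    # Files can only be placed under the PartekFlow uploads folder.
    dest_path = f"/data/{username}/PartekFlow/uploads/{folder_name}/"
    return [
        "globus",
        "transfer",
        "--recursive",  # copy the whole directory
        f"{local_endpoint}:{local_path}",
        f"{BIOWULF_ENDPOINT}:{dest_path}",
    ]


if __name__ == "__main__":
    command = build_globus_transfer(
        "LOCAL-ENDPOINT-ID",  # placeholder for the ID of the local collection
        "/home/me/hbr_uhr_fastq/",  # hypothetical local folder
        "username",
        "hbr_uhr_fastq",
    )
    print(shlex.join(command))
```

The ID of a local collection is shown in the Globus web interface; substitute it for the placeholder before running the printed command.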
Importing data to Partek Flow project at NIH
To access the NIH Partek Flow server, go to https://partekflow.cit.nih.gov/flow and enter the user's NIH username and then password.
Note
Users may have chosen a password different from their NIH password when the Partek Flow account was set up.
As an example, click on the demonstration_project.
Then, click on the "Add data" button.
This exercise will import FASTQ files from bulk RNA sequencing. Click "Next" when ready.
Users will then be directed to their Biowulf PartekFlow folder.
Expand the PartekFlow folder to reveal the uploads directory and import the FASTQ files in the hbr_uhr_fastq folder.
Check the "name" box to select all FASTQ files in the hbr_uhr_fastq folder.
Click "Finish" when ready.
Upon successful import of data, users will see a circular data node in the workflow builder.
After data has been imported to a project, click on Metadata to see a table listing the sample names and the data files associated with each sample. Note that Partek Flow will automatically recognize the paired-end relationship of files. The "+" and "-" buttons allow users to associate additional files with a sample or dissociate files from it. Click on "Hide data files" to hide the data files associated with samples. Users can assign descriptions or attributes to samples under the "Sample attributes" section, either from an existing file or manually by clicking on the "Manage" button. This exercise will use the "Manage" button to assign attributes to the samples (i.e., the treatment group to which they belong).
On the subsequent page, click on "Add new attribute". In the box that appears, choose between the categorical and numeric attribute types. This example assigns the FASTQ files to their respective treatment groups, so categorical will be selected. Enter the variable name ("TREATMENT" in this case) in the box labeled "Name".
Enter each category in the "New category" box and then click on the "+" sign. "HBR" and "UHR" will be added in this example. When done adding categories, click on the "Back to metadata" tab.
Back on the metadata page, click on "Assign values" to assign the FASTQ files to their corresponding treatments.
Select the appropriate treatment group for the FASTQ files using the drop down under the "TREATMENT" column. Click on "Apply changes" when done.
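The same assignment could also be prepared as a file, since attributes can be supplied from an existing file. The sketch below writes a small tab-delimited table that maps each sample to its TREATMENT category based on the sample name prefix. The sample names and the function names are hypothetical, and the exact file layout Partek Flow expects is an assumption (sample names in the first column, one attribute per additional column) that should be checked against the Partek Flow documentation.

```python
import csv
import io


def treatment_for(sample_name, groups=("HBR", "UHR")):
    """Return the treatment group whose label prefixes the sample name."""
    for group in groups:
        if sample_name.upper().startswith(group):
            return group
    raise ValueError(f"no treatment group matches sample {sample_name!r}")


def attribute_table(sample_names):
    """Build a tab-delimited attribute table as a string (layout is an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample name", "TREATMENT"])
    for name in sample_names:
        writer.writerow([name, treatment_for(name)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample names; use the names shown in the Metadata table.
    samples = ["HBR_1", "HBR_2", "UHR_1", "UHR_2"]
    print(attribute_table(samples), end="")
```

Raising an error for an unmatched name, rather than leaving the cell blank, makes a mislabeled sample visible before the table is used.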
Transferring data via web upload
Web upload is another way to transfer data from a local computer to the NIH Partek Flow server. To do this, click on the "Add data" button in the project.
Next, select "Transfer files to the server".
Then, click on "Transfer files".
Keep the /data/username/PartekFlow/uploads folder in the "Select directory" box.
Then drag local files to the data upload client.
After a successful data transfer, users can follow the steps described previously to import the data into the project.