Importing Data to Project and Assigning Metadata

The next step is to import data into the project. Click on the "Add data" button and select "Bulk". RNA sequencing is the default option, and since FASTQ files will be imported, leave the "fastq" radio button selected. Click "Next" when ready. On the next page, users can navigate the Partek Flow folder of their own Biowulf account to select the needed files. Specify that the data is mRNA and click "Finish" when ready. While the data is importing, users will see a rectangular task node. Once the data has been imported successfully, the rectangular task node will turn into a circular data node.

After the FASTQ files have been imported, it is time to assign metadata to the files to help keep track of which condition each file came from. To do this, click on the "Metadata" tab in the project analysis page. Once in the "Metadata" page, click on "Show data files" and users will see the two paired-end FASTQ files associated with each sample. Partek Flow uses the portion of the filename before "_R1.fq" and "_R2.fq" as the sample name.

This class will assign metadata using the "Assign values from file" option, as this is more convenient. The metadata are available in the tab-delimited file "hcc1395_phenotype.txt" in the instructor's ./PartekFlow/uploads/hcc1395 folder. Samples that start with "n" are normal and those that start with "t" are tumors, so this dataset contains 3 normal and 3 tumor samples. Select "hcc1395_phenotype.txt" and click on "Next" when finished. On the next page, check the import box for the appropriate "Attribute name" (the variable), which in this case is "disease_type", because the file already has a column named "sample" containing the sample names. Click "Import" when ready. The contents of the file are shown below.

sample  disease_type
n1      normal
n2      normal
n3      normal
t1      tumor
t2      tumor
t3      tumor
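
For users who want to prepare a similar phenotype file for their own data, the following Python sketch shows one way to derive sample names from FASTQ filenames and write a tab-delimited table like the one above. It is an illustration only and not part of Partek Flow. The filenames in fastq_files, the optional ".gz" suffix, and the helper names sample_name, disease_type, and build_phenotype_table are assumptions made for this example; the naming rules simply mirror what is described in this section.

```python
import csv
import io
import re

# Hypothetical FASTQ filenames used for illustration only; the actual
# filenames in the class folder may differ (for example, they may end in .gz).
fastq_files = [
    "n1_R1.fq", "n1_R2.fq",
    "n2_R1.fq", "n2_R2.fq",
    "n3_R1.fq", "n3_R2.fq",
    "t1_R1.fq", "t1_R2.fq",
    "t2_R1.fq", "t2_R2.fq",
    "t3_R1.fq", "t3_R2.fq",
]


def sample_name(filename):
    """Return the part of the filename before _R1.fq or _R2.fq."""
    match = re.match(r"(.+)_R[12]\.fq(\.gz)?$", filename)
    if match is None:
        raise ValueError(f"Unexpected FASTQ filename: {filename}")
    return match.group(1)


def disease_type(sample):
    """Map a sample name to a condition using the n/t prefix convention."""
    if sample.startswith("n"):
        return "normal"
    if sample.startswith("t"):
        return "tumor"
    raise ValueError(f"Cannot infer disease type for sample: {sample}")


def build_phenotype_table(filenames):
    """Build a tab-delimited table with one row per unique sample."""
    samples = sorted({sample_name(name) for name in filenames})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "disease_type"])
    for sample in samples:
        writer.writerow([sample, disease_type(sample)])
    return buffer.getvalue()


table = build_phenotype_table(fastq_files)
print(table, end="")
```

To save the table for upload, the returned text can be written to a file, for example with open("my_phenotype.txt", "w").write(table), where the filename is again only a placeholder.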