Bulk RNA Sequencing Analysis Using Partek Flow

Joe Wu, PhD NCI CCR Bioinformatics Training and Education Program ncibtep@nih.gov

What is Partek Flow and why use it

Partek Flow
Point-and-click bioinformatics software that enables biologists to create workflows for analyzing high-throughput sequencing data, including:
- DNA
- Bulk and single cell RNA, ATAC/ChIP
- CITE
- Spatial transcriptomics.
Hosted on Biowulf, so users have access to more compute resources for analyzing large genomic datasets.
Getting started with Partek Flow at NIH
Institutional licenses available for NCI, NHGRI, NIH Library.

After this class, participants will understand how to construct a bulk RNA analysis workflow, ranging from file import to differential expression analysis and construction of visualizations. This class will not turn the audience into experts.
Mention the Partek Flow bulk and single cell RNA training offered through the NIH Library in December.
We will assume that the Partek Flow account is set up and the data has already been transferred to the Partek Flow server (see Getting started with Partek Flow at NIH to learn about the different options for getting your data to the server).

Click on the "Add project" tab to create a new analysis project.
Import data:
Partek Flow handles many data types but for this class, we will select bulk and then RNA.
Partek Flow also allows users to start anywhere (want to start with a BAM file, you can do that!).
As data is importing, we will see a light blue task node. After import is complete, we see a circular data node.

QC all reads
K-mer length: if specified, generates a report for each sample of the positions of the most commonly occurring k-mers (nucleotide sequences of the specified length). Enriched k-mers can hint at overrepresented sequences (maybe adapters).
Summary table
Sample names (click to access sample-level report)
Quality - likelihood of error in sequencing (all samples have a quality score of 38, which indicates a 0.0158% error likelihood)
Essentially no unknown reads as indicated by the "%N" column
"Average base quality score per position" plot shows the average quality at each position for all reads/sequences in a sample
Quality score distribution
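
The relationship between a quality score and the error likelihood quoted above can be checked with a short calculation. This is a minimal sketch assuming the standard Phred definition, Q = -10 * log10(P); the function names are illustrative and are not part of Partek Flow.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the probability that a base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability back into a Phred quality score."""
    return -10 * math.log10(p)


# Print the error likelihood for a few common quality scores,
# including the score of 38 reported in the summary table.
for q in (20, 30, 38):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.6f} ({p * 100:.4f}%)")
```

A score of 38 corresponds to roughly 1.58 errors per 10,000 base calls, which matches the 0.0158% in the summary table.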

Adapter available from file
Trim from both sides
Run pre-alignment QC again after this step to confirm that trimming did not degrade the data. Quality is still great after trimming, although the average read length per sample is reduced (due to trimming).
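
To illustrate why the average read length drops after trimming, here is a toy sketch of exact-match adapter removal on both ends of a read. It is not Partek Flow's trimming algorithm; the adapter sequence, the reads, and the function name are assumptions made up for this example, and real trimmers also handle mismatches and partial adapters.

```python
from statistics import mean

# Illustrative adapter sequence (assumption, not taken from the class data).
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter_both_ends(read: str, adapter: str) -> str:
    """Remove an exact adapter copy from the 5' end and from the 3' end of a read."""
    # 5' side: drop a leading copy of the adapter.
    if read.startswith(adapter):
        read = read[len(adapter):]
    # 3' side: drop the adapter and everything that follows it.
    idx = read.find(adapter)
    if idx != -1:
        read = read[:idx]
    return read


# Made-up reads: adapter at the 3' end, adapter at the 5' end, and no adapter.
reads = [
    "ACGTACGTAC" + ADAPTER + "TT",
    ADAPTER + "GGGTTTCCC",
    "ACGTTGCA",
]
trimmed = [trim_adapter_both_ends(r, ADAPTER) for r in reads]

mean_before = mean([len(r) for r in reads])
mean_after = mean([len(r) for r in trimmed])
print(trimmed)
print(f"mean read length before: {mean_before:.2f}, after: {mean_after:.2f}")
```

Reads without adapter sequence pass through unchanged, so only the mean length changes, not the base qualities of what remains.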

RNA sequencing requires a splice-aware aligner to accommodate reads that map across exons, such as:
STAR
HISAT2 (will use this here)
HISAT2 indexes the reference genome prior to alignment to make alignment more efficient. If your index is not available in the menu, scroll down and choose "New assembly" to add it.
Run HISAT2 with defaults on the adapter-trimmed reads; users can adjust alignment stringency, such as the mismatch penalty, under "Configure" next to advanced options.

Use the Partek quantify to annotation model (Partek E/M) algorithm since a GTF annotation is available
Uses statistics to assign expression to multi-mappers rather than discarding them
Output includes gene and transcript level expression quantifications
- Summary table indicating the percentage of reads that mapped to features such as exons and introns.
- Count distribution table showing minimum, maximum, median, 25th percentile (Q1), and 75th percentile (Q3). These are also available as visualizations (box-and-whisker plot as well as density plot)
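
The idea of assigning multi-mapped reads statistically rather than discarding them can be sketched with a toy expectation-maximization (E/M) loop. This is a simplified illustration under stated assumptions, not Partek's implementation: it ignores transcript length and other details a real quantifier would model, and the transcript names and read counts are made up.

```python
def em_abundance(read_compat, transcripts, n_iter=100):
    """Toy E/M: fractionally assign each read to the transcripts it is compatible with.

    read_compat: list of tuples, each holding the transcripts one read maps to.
    Returns the expected read count per transcript after n_iter iterations.
    """
    # Start by assuming all transcripts are equally abundant.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: update abundances from the fractional counts.
        n_reads = len(read_compat)
        theta = {t: counts[t] / n_reads for t in transcripts}
    return counts


# Made-up example: 6 reads unique to T1, 2 unique to T2, 4 multi-mappers.
reads = [("T1",)] * 6 + [("T2",)] * 2 + [("T1", "T2")] * 4
expected_counts = em_abundance(reads, ["T1", "T2"])
print(expected_counts)
```

In this example the four multi-mappers end up split about 3:1 in favor of T1, following the evidence from the uniquely mapped reads, and the total read count is preserved.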

Remove technical variation while keeping biological differences
Will use median ratio for DESeq2, which removes variation from:
Differing sequencing depth per sample
Variations in RNA composition between biological conditions
Post normalization report shows distribution table and before/after box and density plots
Both gene and transcript level expression estimates will be normalized
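
The median ratio idea can be sketched in a few lines. This is a minimal pure-Python illustration of median-of-ratios size factors, not Partek Flow's code; the gene names and counts are made up, and genes with a zero count in any sample are skipped here because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c == 0 for c in gene_counts):
            continue  # geometric mean would be zero; skip this gene
        # Log of the geometric mean across samples acts as a pseudo-reference.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio per sample is robust to a few strongly changed genes.
    return [math.exp(median(r)) for r in log_ratios]


# Made-up counts: sample 2 is sequenced twice as deep as sample 1,
# and geneD is strongly up in sample 2 (an RNA composition difference).
raw = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [50, 100],
    "geneD": [5, 1000],
}
sf = size_factors(raw)
normalized = {g: [c / s for c, s in zip(v, sf)] for g, v in raw.items()}
print(sf)
print(normalized["geneA"])
```

After dividing by the size factors, geneA, geneB, and geneC have equal values in both samples, while geneD keeps its large difference, which is the behavior the two bullet points above describe.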

Filter the normalized transcript expression to exclude those whose sum across all samples is less than or equal to 3 (low expression genes or transcripts may represent noise).
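
The filter rule above amounts to keeping only features whose summed expression across samples is greater than 3. A minimal sketch, with made-up transcript names and values and a hypothetical function name:

```python
def filter_low_expression(expr, threshold=3.0):
    """Keep features whose expression summed across all samples exceeds the threshold."""
    return {feature: values for feature, values in expr.items() if sum(values) > threshold}


# Made-up normalized expression values for four samples.
expr = {
    "tx1": [0.0, 1.0, 0.0, 2.0],       # sum = 3, excluded (less than or equal to 3)
    "tx2": [0.0, 0.0, 0.0, 0.0],       # sum = 0, excluded
    "tx3": [1.0, 1.0, 1.0, 1.0],       # sum = 4, kept
    "tx4": [120.5, 98.2, 110.0, 87.3],  # clearly expressed, kept
}
kept = filter_low_expression(expr)
print(sorted(kept))
```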

Use Partek's implementation of DESeq2.
Assign tumor as the numerator and normal as the denominator so that the expression ratio is calculated as average expression for tumor/average expression for normal.
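
The effect of choosing the numerator and denominator can be shown with a simple ratio of group means. This sketch only illustrates the direction of the comparison; DESeq2 itself estimates fold changes from a negative binomial model rather than a plain ratio of averages. The expression values and the function name are made up.

```python
import math
from statistics import fmean


def fold_change(numerator, denominator):
    """Return (ratio, log2 ratio) of the mean expression in two groups."""
    ratio = fmean(numerator) / fmean(denominator)
    return ratio, math.log2(ratio)


# Made-up normalized expression for one gene.
tumor = [200, 220, 180]
normal = [50, 55, 45]

ratio, log2_fc = fold_change(tumor, normal)
print(f"tumor/normal ratio: {ratio:.2f}, log2 fold change: {log2_fc:.2f}")

# Swapping the groups flips the sign of the log2 fold change.
swapped_ratio, swapped_log2_fc = fold_change(normal, tumor)
print(f"normal/tumor ratio: {swapped_ratio:.2f}, log2 fold change: {swapped_log2_fc:.2f}")
```

With tumor as the numerator, a positive log2 fold change means higher expression in tumor and a negative value means higher expression in normal.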