RNAseq Salmon (rnaseq_salmon) is a DNAnexus workflow that combines two DNAnexus applets: Salmon Scatter-Process-Gather Workflow and quant_sf2express_table. Salmon Scatter-Process-Gather Workflow (salmon_spg_wf) is a DNAnexus applet that processes a batch of paired-end FASTQ read files and runs Salmon to produce expression count files. quant_sf2express_table is a DNAnexus applet that takes a set of quant.sf files produced by Salmon and generates expression table files suitable for RNA-seq expression analysis tools (e.g., BioJupies or iDEP).
Required Input Files
- A Set of quant.sf Files – Select a set of files that follow the naming convention sample_name_quant.sf, where sample_name is replaced with a unique sample name containing only alphanumeric characters and no spaces. You can obtain quant.sf files:
- FASTQ Gzip Compressed Paired-end Files – A batch of paired-end sample read files of the form sample_name_R1.fastq.gz and sample_name_R2.fastq.gz, where sample_name is replaced with a unique sample name containing only alphanumeric characters and no spaces.
- Salmon Index tar.gz File – A Salmon-indexed genome file of the form genome_name_salmon_idx.tar.gz, where genome_name is replaced with a unique genome name containing only alphanumeric characters and no spaces. This file is generated using salmon_indexer.
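Before uploading a batch, it can help to check the FASTQ file names against the naming convention above. The following is a minimal sketch and is not part of the workflow itself: the function name pair_fastq_files and the strict reading of "alphanumeric" (letters and digits only) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample_name>_R1.fastq.gz / <sample_name>_R2.fastq.gz,
# with sample_name limited to letters and digits (a strict reading of the rule).
FASTQ_RE = re.compile(r"^([A-Za-z0-9]+)_R([12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group file names by sample and report names that break the convention.

    Returns (complete, incomplete, rejected):
      complete   - {sample: (R1 file, R2 file)} for samples with both mates
      incomplete - sorted sample names that are missing one mate
      rejected   - file names that do not match the assumed pattern
    """
    reads = defaultdict(dict)
    rejected = []
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            rejected.append(name)
            continue
        sample, mate = match.groups()
        reads[sample]["R" + mate] = name
    complete = {s: (m["R1"], m["R2"]) for s, m in reads.items() if len(m) == 2}
    incomplete = sorted(s for s, m in reads.items() if len(m) != 2)
    return complete, incomplete, rejected
```

Any name listed in rejected or incomplete should be renamed or completed before the batch is submitted.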
Output Files
- Expression HTML File – An output file that provides useful links, DNAnexus job information, and instructions on submitting to BioJupies or iDEP, which provide downstream RNA-seq expression analysis.
- Raw Counts Table File – File containing a table of unprocessed raw counts.
- TPM Counts Table File – File containing a table of TPM (transcripts per million) counts.
- Design Table File for iDEP – File used by iDEP containing a table of sample names and conditions or treatments. The file can be opened in Excel or a text editor and customized.
- Salmon Results Directory tar.gz File – A file of the form sample_name_salmon.tar.gz. This is a tar.gz-compressed directory that must be expanded using the command tar -xzf tarfile. These files are provided in case you wish to do custom analysis; otherwise, they can be ignored.
- Salmon’s Quant.sf File – A file of the form sample_name_quant.sf. This file contains Salmon's per-transcript quantification results, including TPM values and estimated read counts.
- Kallisto’s abundance.h5 File – A file of the form sample_name_abundance.h5. This file is converted from the sample_name_quant.sf file into a Kallisto-style Hierarchical Data Format (HDF5) file.
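For custom analysis, the per-sample quant.sf files can be combined into sample-by-transcript tables similar in spirit to the raw counts and TPM tables listed above. The sketch below is an illustration only and is not the quant_sf2express_table implementation: the function names and the output layout are assumptions, and the applet may aggregate or format its tables differently. It relies only on the standard quant.sf columns (Name, Length, EffectiveLength, TPM, NumReads).

```python
import csv
import io


def read_quant_sf(handle):
    """Parse an open quant.sf file into {transcript: (tpm, num_reads)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (float(row["TPM"]), float(row["NumReads"]))
        for row in reader
    }


def build_table(quants, column):
    """Build a table with one row per transcript and one column per sample.

    quants: {sample_name: parsed quant.sf} as returned by read_quant_sf
    column: 0 for TPM values, 1 for estimated read counts (NumReads)
    Transcripts missing from a sample are filled with 0.0 (an assumption).
    """
    samples = sorted(quants)
    transcripts = sorted({t for q in quants.values() for t in q})
    rows = [["Name"] + samples]
    for t in transcripts:
        rows.append([t] + [quants[s].get(t, (0.0, 0.0))[column] for s in samples])
    return rows


if __name__ == "__main__":
    # Tiny in-memory example standing in for a real sample_name_quant.sf file.
    demo = "Name\tLength\tEffectiveLength\tTPM\tNumReads\nT1\t100\t80.0\t10.5\t3\n"
    quants = {"sampleA": read_quant_sf(io.StringIO(demo))}
    for row in build_table(quants, 0):
        print("\t".join(str(value) for value in row))
```

With real data, each file would be opened with open(path, newline="") and keyed by the sample_name prefix of its file name.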
For NCI Members
- Find out more about GAU’s DNAnexus Pilot Program.
- NCI members who want to use DNAnexus or develop on it must get an account.