Salmon Scatter-Process-Gather Workflow (salmon_spg_wf) is a DNAnexus applet that processes a batch of paired-end FASTQ read files and runs Salmon on each sample to produce expression count files.
Required Input Files
- FASTQ Gzip Compressed Paired-end Files – A batch of paired-end read files, two per sample, named sample_name_R1.fastq.gz and sample_name_R2.fastq.gz, where sample_name is replaced by a unique sample name containing only alphanumeric characters and no spaces.
- Salmon Index tar.gz File – A tar.gz archive of Salmon-indexed genome files named genome_name_salmon_idx.tar.gz, where genome_name is replaced by a unique genome name containing only alphanumeric characters and no spaces. This file is generated using salmon_indexer.
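Before uploading a batch, it can help to check that every file follows the naming convention above and that each sample has both mates. The sketch below does this locally; it is not part of the applet, the function name pair_fastq_files is hypothetical, and treating "alphanumeric" as strictly ASCII letters and digits is an assumption.

```python
import re
from collections import defaultdict

# Assumption: sample names are strictly ASCII letters/digits, as the README
# states; the applet's actual validation rule may differ.
FASTQ_PATTERN = re.compile(r"^([A-Za-z0-9]+)_R([12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group FASTQ file names into {sample_name: (R1_file, R2_file)}.

    Hypothetical helper: raises ValueError for names that do not match
    sample_name_R1.fastq.gz / sample_name_R2.fastq.gz, for duplicate mates,
    and for samples that are missing one mate.
    """
    mates = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        sample, read = match.groups()
        if read in mates[sample]:
            raise ValueError(f"duplicate R{read} file for sample {sample}")
        mates[sample][read] = name

    pairs = {}
    for sample, reads in mates.items():
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing a mate file")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs
```

For example, pair_fastq_files(["liver1_R1.fastq.gz", "liver1_R2.fastq.gz"]) returns {"liver1": ("liver1_R1.fastq.gz", "liver1_R2.fastq.gz")}, while a name containing a space is rejected.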
Output Files
- Salmon Results Directory tar.gz File – A file named sample_name_salmon.tar.gz. This is a tar.gz-compressed Salmon output directory; expand it with the command tar -xzf sample_name_salmon.tar.gz. These files are provided in case you want to do custom analysis; otherwise, they can be ignored.
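- Salmon’s Quant.sf File – A file named sample_name_quant.sf. This tab-separated file contains Salmon's per-transcript quantification, including estimated read counts (NumReads) and TPM values.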
- Kallisto’s abundance.h5 File – A file named sample_name_abundance.h5. This file is converted from sample_name_quant.sf into Kallisto's HDF5 (Hierarchical Data Format) abundance format.
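For custom analysis, the sample_name_quant.sf file can be read directly with Python's standard csv module. This minimal sketch assumes the standard Salmon quant.sf header (Name, Length, EffectiveLength, TPM, NumReads); the function name read_quant_sf is hypothetical and not part of the applet.

```python
import csv


def read_quant_sf(lines):
    """Parse a Salmon quant.sf table into {transcript_name: (TPM, NumReads)}.

    `lines` is any iterable of text lines, such as an open file handle.
    Assumes the tab-separated header Name, Length, EffectiveLength, TPM,
    NumReads written by Salmon.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["Name"]: (float(row["TPM"]), float(row["NumReads"]))
        for row in reader
    }
```

Typical use: with open("sample_name_quant.sf", newline="") as handle: quant = read_quant_sf(handle). Summing the NumReads values gives the total number of reads Salmon assigned to transcripts for that sample.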
For NCI Members
- Find out more about GAU’s DNAnexus Pilot Program.
- NCI members who want to use or develop on DNAnexus must first get an account.
Developed by GAU
- Peter FitzGerald (email: fitzgepe@mail.nih.gov)
- Carl McIntosh (email: mcintoshc@mail.nih.gov)