Experimental Design: Best Practices

RNA-Seq
ChIP-Seq
Exome-Seq and Whole Genome-Seq

RNA-Seq

Many Researchers Have Questions About How to Run Their RNA-Seq Experiments. Here are Some Best Practices Guidelines:

Plan for at least 3 replicates per condition (absolute minimum), and 4 if possible (recommended minimum). Biological replicates are recommended rather than technical replicates.
Always process your RNA extractions at the same time. Extractions done at different times lead to unwanted batch effects.
There are 2 major considerations for RNA-Seq libraries:
- If you are interested in coding mRNA, you can choose the mRNA library prep. The recommended sequencing depth is 10-20M paired-end (PE) reads. Your RNA has to be of high quality (RIN > 8).
- If you are interested in long noncoding RNA as well, you can select the total RNA method, with sequencing depth ~25-60M PE reads. This is also an option if your RNA is degraded.
Ideally, to avoid lane batch effects, all samples should be multiplexed together and run on the same lane. This may require an initial MiSeq run for library balancing. Additional lanes can be run if more sequencing depth is needed.
If you are unable to process all your RNA samples together and need to process them in batches, make sure that replicates for each condition are in each batch so that the batch effects can be measured and removed bioinformatically.
For sequencing depth and machine requirements, visit the Illumina Sequencing Coverage website.
For cost estimates from the CCR Sequencing Facility, visit Sequencing Facility pricing for NGS. For information regarding NGS sequencing at the CCR Genomics Core (Building 41, Bethesda campus), please email ncilecdnacore@mail.nih.gov.
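
The batching guideline above can be illustrated with a short sketch. The Python below is only an illustration, not part of any CCBR pipeline: the function name `assign_batches` and the sample names are hypothetical, and a simple round-robin is assumed as the way to spread each condition's replicates across batches.

```python
from collections import defaultdict


def assign_batches(samples, n_batches):
    """Spread each condition's replicates across batches round-robin.

    samples: list of (sample_id, condition) tuples (hypothetical names).
    Returns a dict mapping batch index -> list of sample_ids.
    """
    # Group sample IDs by condition, keeping the input order.
    by_condition = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)

    batches = defaultdict(list)
    for condition, ids in by_condition.items():
        # With fewer replicates than batches, some batch would lack this
        # condition and the batch effect could not be separated from it.
        if len(ids) < n_batches:
            raise ValueError(
                f"condition {condition!r} has {len(ids)} replicates "
                f"but {n_batches} batches were requested"
            )
        for i, sample_id in enumerate(ids):
            batches[i % n_batches].append(sample_id)
    return dict(batches)


# Example: 2 conditions x 4 replicates, extracted in 2 batches.
samples = [(f"{cond}_{rep}", cond)
           for cond in ("control", "treated")
           for rep in range(1, 5)]
for batch, ids in sorted(assign_batches(samples, 2).items()):
    print(batch, ids)
```

With this layout every batch contains replicates of every condition, so batch can be included as a covariate in the downstream analysis.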

For further assistance in planning your RNA-Seq experiment or to discuss specifics of your project, please contact us by email: CCBR@mail.nih.gov OR visit us during office hours on Fridays 10am to noon (Bldg37/Room3041). For cost and specific information about setting up an RNA-Seq experiment, please visit the Frederick Sequencing and Genomics Core (FSGC) website or contact Bao Tran.

ChIP-Seq

Many Researchers Have Questions About How to Design Their ChIP Experiments. Here are Some Best Practices Guidelines:

Factor in at least 2 replicates (absolute minimum), but 3 if possible. Biological replicates are required, not technical replicates.
There are several major considerations for ChIP-Seq libraries:
- A high-quality “ChIP-seq grade” antibody is recommended for the immunoprecipitation step.
  - If the antibody is purchased from a commercial vendor, the lot number is also important. The quality of antibodies with the same catalog number but different lot numbers often varies.
  - If possible, use antibodies confirmed by reliable sources or consortiums such as ENCODE or Epigenome Roadmap.
  - If the antibody has not been verified by others, we recommend immunoblot or immunofluorescence validation as suggested in the ENCODE ChIP-seq guidelines (Landt et al., 2012, Genome Research).
- For successful ChIP-seq experiments, complex, high-depth ChIP controls (input or IgG) are strongly recommended.
  - It is recommended to have ChIP controls in all experimental conditions, but if the experimental condition does not cause chromatin state changes, a single ChIP control might be enough.
  - Spike-ins derived from remote organisms, e.g. fly spike-in for human or mouse samples, might help in comparing binding affinities of the proteins qualitatively in different conditions or samples.
- If you are interested in ChIP-seq for a transcription factor that possibly binds a specific DNA sequence motif (expecting a narrow, punctate binding pattern), the recommended sequencing depth is 10-15M reads. If you are interested in a modified histone or another DNA-binding protein that does not have a specific binding motif (expecting a broad binding pattern), we recommend a sequencing depth of ~30M reads or more.
- Generally, single-end sequencing (read length 75nt) is recommended, as it is usually the most economical. If you know your protein binds to highly repetitive or low-complexity regions of the chromosome, you might need longer or paired-end reads.
Ideally, to avoid lane batch effects, all samples would need to be multiplexed together and run on the same lane. This may require an initial MiSeq run for library balancing. Additional lanes can be run if more sequencing depth is needed.
If you are unable to process all your DNA samples together and need to process them in batches, make sure that replicates for each condition are in each batch so that the batch effects can be measured and removed bioinformatically.
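
The depth and multiplexing guidelines above can be turned into a rough lane estimate. The sketch below is a back-of-the-envelope calculation only: the per-sample targets come from the guidelines above (using the upper end of the 10-15M range for narrow peaks), while the lane yields in the example are hypothetical placeholders. The actual yield depends on the instrument and flow cell and should be confirmed with the sequencing facility.

```python
import math

# Reads per sample, taken from the guidelines above.
TARGET_READS = {
    "narrow": 15_000_000,  # transcription factors (punctate peaks)
    "broad": 30_000_000,   # histone marks and other broad patterns
}


def lanes_needed(n_samples, peak_type, lane_yield_reads):
    """Return how many lanes are needed to reach the target depth.

    n_samples should count controls (input or IgG) as well.
    lane_yield_reads is an assumption supplied by the caller.
    """
    total_reads = n_samples * TARGET_READS[peak_type]
    return math.ceil(total_reads / lane_yield_reads)


# Example: 12 libraries (ChIP plus controls) for a broad histone mark.
# Both lane yields below are hypothetical values for illustration.
print(lanes_needed(12, "broad", 400_000_000))  # 360M reads fit on one lane
print(lanes_needed(12, "broad", 200_000_000))  # same libraries need two lanes
```

If the estimate is close to a whole number of lanes, it is safer to round up, since library balancing is rarely perfect.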

For further assistance in planning your ChIP-Seq experiment or to discuss specifics of your project, please contact us by email: CCBR@mail.nih.gov OR visit us during office hours on Fridays 10am to noon (Bldg37/Room3041). For cost and specific information about setting up a ChIP-Seq experiment, please visit the Frederick Sequencing and Genomics Core (FSGC) website or contact Bao Tran.

Exome-Seq and Whole Genome-Seq

Many Researchers Have Questions About How to Design Their Exome-Seq and Whole Genome-Seq Experiments. Here are Some Best Practices Guidelines:

Tumor / Normal Variant Calling

Matched tumor and germline sample pairs are ideal. For mouse, a matched tumor/germline pair is also ideal, but a germline sample from another individual in the same cohort can also be effective. However, when using mouse germline samples that are not from the same individual, it is critical that the germline sample be from the same generation, and that at least 2 germline samples are used. In addition, somatic calling and filtering of germline variation are significantly improved when several generations of backcrossing have been done to reduce variation among individuals.
When germline samples are unavailable, somatic variant detection is still possible, but sensitivity and precision are significantly reduced.
Mean target depth should be >=100X for the tumor sample and >=50X for the germline sample.

Germline Variant Detection

Mean target depth should be >=50X for exome and >=30X for genome.
If structural and/or copy number variation detection are potential objectives, then it is strongly recommended that whole genome sequencing be performed rather than exome.
Due to considerable reductions in cost and an overall higher accuracy of variant detection for whole genome sequencing relative to exome sequencing, whole genome is preferred for germline variant detection, even when exonic variants are largely the focus of analysis.
For familial studies, careful selection of samples for sequencing is required to maximize statistical power for downstream linkage or subtraction analyses. As a result, consultation prior to sequence generation is strongly encouraged.
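
The mean depth targets above can be converted into an approximate number of read pairs. The sketch below assumes the usual relationship depth = (usable sequenced bases) / (target size). The function name, the `usable_fraction` parameter (covering losses such as duplicates and off-target reads), the ~3.1 Gb human genome size, and the 150 bp paired-end read length are all illustrative assumptions rather than facility specifications.

```python
import math


def read_pairs_needed(mean_depth, target_bp, read_length, usable_fraction=1.0):
    """Estimate the read pairs required to reach a mean target depth.

    mean_depth:      desired mean coverage (e.g. 30 for 30X)
    target_bp:       size of the genome or capture target in base pairs
    read_length:     length of each read in a pair, in bases
    usable_fraction: assumed fraction of sequenced bases that end up as
                     usable on-target coverage (hypothetical parameter)
    """
    bases_needed = mean_depth * target_bp / usable_fraction
    return math.ceil(bases_needed / (2 * read_length))


HUMAN_GENOME_BP = 3_100_000_000  # approximate size, used for illustration

# 30X germline whole genome with 2 x 150 bp reads, no losses assumed.
print(read_pairs_needed(30, HUMAN_GENOME_BP, 150))       # 310000000
# Same target if only half of the sequenced bases are usable.
print(read_pairs_needed(30, HUMAN_GENOME_BP, 150, 0.5))  # 620000000
```

For exome sequencing, `target_bp` would be the size of the capture kit's target region, and `usable_fraction` is typically well below 1 because of off-target reads, so the estimate should be treated as a lower bound.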

For further assistance in planning your Exome-Seq or Whole Genome-Seq experiment or to discuss specifics of your project, please contact us by email: CCBR@mail.nih.gov OR visit us during office hours on Fridays 10am to noon (Bldg37/Room3041). For cost and specific information about setting up an Exome-Seq or Whole Genome-Seq experiment, please visit the Frederick Sequencing and Genomics Core (FSGC) website or contact Bao Tran.