Bioinformatics Training and Education Program

BTEP Question Forum

BTEP maintains several Question and Answer Forums of interest to the NCI/CCR community.

Pipelines and QC: Since reads are mapped to unique regions of the genome, what happens with data from repeat regions such as long non-coding RNA, LINEs, etc? Will any reads be mapped to these regions?

1 Answer:


It depends on the type of repeat region and on how you choose to handle blacklisted regions. Simple repeats and transposable elements vary little between copies, so reads from them often align equally well to many locations (multimapping). Because you cannot tell which copy in the genome such a read came from, these reads tend to cause problems at many steps of ChIP-seq processing, and it is often better to remove them from a standard analysis. Long non-coding RNAs, on the other hand, are much more degenerate and do not tend to fall in blacklisted regions, so reads from them can generally be placed uniquely and kept. If you are setting up a workflow yourself, you can remove multimapping or blacklisted reads during the mapping step or at any later stage of the pipeline. - answered by Tovah Markowitz, Paul Schaughency, Vishal Koparde.
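As a rough illustration of the two filters mentioned in the answer, here is a minimal Python sketch that drops reads with low mapping quality (a common proxy for multimapping) and reads that overlap a blacklisted interval. The `Read` class, the function names, the example coordinates, and the MAPQ cutoff of 30 are all assumptions made for this example and are not part of the answer above. How multimapped reads are scored depends on the aligner, so the right cutoff may differ, and a real workflow would normally apply these filters to BAM files with dedicated tools rather than in plain Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified stand-in for one aligned read."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int   # mapping quality reported by the aligner


def overlaps(read, region):
    """Return True if the read overlaps a (chrom, start, end) half-open interval."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and start < read.end


def filter_reads(reads, blacklist, min_mapq=30):
    """Keep reads that pass the MAPQ cutoff and avoid every blacklisted region.

    min_mapq=30 is an illustrative assumption, not a recommended value.
    """
    kept = []
    for read in reads:
        if read.mapq < min_mapq:
            continue  # likely multimapped or otherwise ambiguous
        if any(overlaps(read, region) for region in blacklist):
            continue  # falls in a blacklisted region
        kept.append(read)
    return kept


# Made-up example data.
reads = [
    Read("unique_read", "chr1", 1000, 1050, 42),
    Read("multimapped_read", "chr1", 5000, 5050, 0),
    Read("blacklisted_read", "chr2", 200, 250, 42),
]
blacklist = [("chr2", 100, 300)]

kept = filter_reads(reads, blacklist)
print([r.name for r in kept])  # prints ['unique_read']
```

Calling `filter_reads(reads, [], min_mapq=0)` would keep all three reads, which corresponds to postponing both filters to a later stage of the pipeline.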


Answered on June 5th, 2020 by