This DNAnexus application implements a QC evaluation module as part of a larger application based on wf-clone-validation, a Nextflow workflow developed by Oxford Nanopore for de novo assembly of plasmids. The module is intended to help you evaluate the accuracy and reliability of the assemblies generated by this pipeline. The three criteria below can be used to judge the usefulness of an assembly:
- The Expected Size and Assembly Size differ. If the expected size is believed to be accurate and the two are noticeably different, the assembly is likely wrong. The ratio of the two values is shown in the table, and when they differ by a factor of 1.5 or more the sample is given a Warning status.
- The sequence is based on a single assembly. The pipeline includes a step that generates three separate subsamples of the reads for assembly. Ideally, the three resulting assemblies can be represented by a single consensus sequence, which indicates relatively high confidence in the result. In some cases, however, the three assemblies cannot be represented by a single consensus, and one assembly is randomly selected as the final result. In the table below, sequences derived from only a single assembly are flagged (Warning), since they carry a lower degree of confidence in their accuracy.
- The average quality score for the assembly is low. Each base in the assembly is assigned a quality score (similar to the Phred scores of the raw reads), and each sample is assigned an average quality score. If this value falls below 30, the sample is assigned a Warning label.
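The three criteria above can be sketched as a small check. This is only an illustration, not the module's actual code: the `SampleQC` fields and warning names are hypothetical, the size ratio is assumed to be symmetric (larger value divided by smaller), and the average quality is assumed to be a plain arithmetic mean of the per-base scores.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

SIZE_RATIO_LIMIT = 1.5  # from the text: sizes differing by a factor of 1.5 or more
MIN_MEAN_QUALITY = 30   # from the text: average quality below 30


@dataclass
class SampleQC:
    # Field names are hypothetical, chosen for this illustration only.
    expected_size: int
    assembly_size: int
    single_assembly: bool        # True if no consensus of the three assemblies exists
    base_qualities: List[float]  # per-base quality scores of the assembly


def qc_warnings(sample: SampleQC) -> List[str]:
    """Return the names of the criteria that would flag this sample."""
    warnings = []

    # Criterion 1: expected and assembled sizes differ by a factor >= 1.5.
    # Assumption: the ratio is taken as larger / smaller, so both directions count.
    larger = max(sample.expected_size, sample.assembly_size)
    smaller = min(sample.expected_size, sample.assembly_size)
    if smaller <= 0 or larger / smaller >= SIZE_RATIO_LIMIT:
        warnings.append("size")

    # Criterion 2: the result comes from a single, randomly chosen assembly.
    if sample.single_assembly:
        warnings.append("single_assembly")

    # Criterion 3: the average per-base quality is below 30.
    # Assumption: a simple arithmetic mean; samples without scores are skipped.
    if sample.base_qualities and mean(sample.base_qualities) < MIN_MEAN_QUALITY:
        warnings.append("low_quality")

    return warnings
```

A sample with an empty list of warnings passes all three checks; any entry corresponds to a Warning in the relevant table column.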
Visual Inspection
Clicking anywhere on a row of the table shows a plot of the derived quality score for each base in the assembly, together with a histogram of the distribution of these scores. Large discontinuities in the plot suggest major problems in the assembly. Small regional dips in the score may indicate local uncertainty. An individual low score for a single base is unlikely to indicate a significant problem. Oddities in the score distribution should also be treated as a reason to question the assembly.
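The distinction between an isolated low-scoring base and a regional dip can also be checked programmatically. The sketch below is one possible approach, not something the application provides: it reports runs of consecutive low-scoring bases and ignores shorter ones. The function name, the threshold of 30, and the minimum run length of 5 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def low_quality_regions(
    scores: Sequence[float],
    threshold: float = 30.0,  # assumed cut-off, matching the "Low" band
    min_length: int = 5,      # hypothetical: shorter runs are treated as isolated bases
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of consecutive low scores."""
    regions = []
    run_start = None  # 0-based index where the current low-scoring run began

    for i, score in enumerate(scores):
        if score < threshold:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ended at index i - 1; keep it only if long enough.
            if run_start is not None and i - run_start >= min_length:
                regions.append((run_start + 1, i))
            run_start = None

    # Handle a run that extends to the end of the sequence.
    if run_start is not None and len(scores) - run_start >= min_length:
        regions.append((run_start + 1, len(scores)))

    return regions
```

Positions returned here are 1-based so they can be compared with the positions shown when hovering over the graph, assuming the graph also counts from 1.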
Instructions for interacting with this interface
The generated table and graph are highly interactive:
Table
- The table can be sorted by any column by clicking on its header.
- Clicking on a row will bring up the per-base quality graph and quality histogram.
- A search feature can help filter the data.
- Selected rows can be exported in several formats (multiple rows can be selected with Shift-click). The Copy, CSV, and Excel buttons export the selected rows in the given format. If no rows are selected, all rows are exported.
Graph – Control icons can be found in the top right corner of the graph:
Controls
- The quality scores have been color-coded as follows and are presented as separate traces:
- Score < 30, Low – Red
- Score > 30 and < 50, Medium – Blue
- Score > 50, High – Green
- No color coding based on score value, All – Black
- The graph can be zoomed by selecting the Zoom icon and dragging over a part of the graph
- The Home icon will restore the graph to its original form.
- Mousing over the graph will show the location, score, and base for each position.
- The graph can be exported as an image (PNG) using the camera icon.
- Each trace can be toggled on/off by clicking on the legend item.
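The color bands listed above can be expressed as a simple classification. This is a minimal sketch rather than the application's own code: the function and variable names are hypothetical, and because the list does not say which band scores of exactly 30 or 50 belong to, they are assumed here to fall into the higher band.

```python
from typing import Dict, List, Sequence, Tuple

# Colors as listed in the Controls section.
BAND_COLORS = {"Low": "red", "Medium": "blue", "High": "green", "All": "black"}


def score_band(score: float) -> str:
    """Map a per-base quality score to its band."""
    # Assumption: boundary values (exactly 30 or 50) go to the higher band.
    if score < 30:
        return "Low"
    if score < 50:
        return "Medium"
    return "High"


def split_into_traces(scores: Sequence[float]) -> Dict[str, List[Tuple[int, float]]]:
    """Group (position, score) pairs into one trace per band plus an 'All' trace."""
    traces: Dict[str, List[Tuple[int, float]]] = {"Low": [], "Medium": [], "High": []}
    # Positions are numbered from 1 (an assumption about how the graph counts bases).
    for position, score in enumerate(scores, start=1):
        traces[score_band(score)].append((position, score))
    # The 'All' trace carries every base, with no color coding by score.
    traces["All"] = [(p, s) for p, s in enumerate(scores, start=1)]
    return traces
```

Keeping each band as its own trace is what would allow a single band to be toggled on or off from the legend.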