Why is high-molecular-weight DNA important?
The effect of DNA quality (i.e. fragment size) on genomic datasets can
vary depending on how the data is generated. Three of the most common
sequencing applications in genomic research are reduced-representation
sequencing, long-range sequencing, and whole-genome sequencing.
Reduced-representation sequencing approaches include popular methods like
genotyping-by-sequencing (GBS), restriction-site associated DNA (RAD)
sequencing, and double digest restriction-site associated DNA (ddRAD)
sequencing (Andrews et al., 2016). With reduced-representation
sequencing, only specific sections of the genome are sequenced.
Reduced-representation sequencing is therefore more cost-effective than
sequencing whole genomes and is particularly popular when dealing with
organisms that have large genomes, such as many plant species (Clugston
et al., 2019). Reduced-representation
sequencing protocols use one (RAD) or two (ddRAD) restriction enzymes to
cut the genomes of individuals at common sites and subsequently
isolate a set of fragments, usually 300–500 bp long. Based on
the restriction enzyme cut sites, specific regions of the genome can be
consistently targeted for sequencing in all individuals. DNA degradation
reduces the efficiency of reduced-representation sequencing by increasing
the number of missing loci. For instance, Guo et al. (2018) showed that,
with ddRAD, degraded samples yielded fewer total reads and fewer reads
that successfully mapped to the reference genome. Graham et al. (2015)
observed similar results: DNA degradation induced by incubating samples
at room temperature for up to 96 hours reduced the final number of SNPs.
These studies show that reduced-representation sequencing performs best
when high-molecular-weight DNA is used to generate population data. With
long-range sequencing applications like PacBio and Oxford Nanopore
Technologies, the effect of DNA degradation is a reduction in read
length. PacBio and MinION instruments are able to sequence fragments
of >80 kbp, up to 100 kbp (van Dijk et al., 2018), and the
degradation of a DNA sample will drastically reduce the average fragment
size that can be obtained. Whole-genome sequencing (generally referred
to as WGS), or whole-genome resequencing, involves sequencing the
entire genome of one or multiple organisms. Whole-genome resequencing
tolerates some variation in DNA quality because it does not rely on
restriction enzymes for fragmentation. Instead, during library
preparation, DNA is fragmented into small sizes (e.g. 300–500 bp) using
sonication or another mechanical method. However, high-quality DNA provides
more consistent SNP coverage across samples (Anderson et al., 2010).
Moreover, to create genome assemblies, DNA quality should be
sufficiently high to provide a large number of reads evenly distributed
along the genome, which allows long, continuous scaffolds to be built
for the assembly (Del Angel et al., 2018).
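
The reduced-representation case described above can be illustrated with a small in-silico sketch. It is a toy model under stated assumptions, not the protocol of any cited study. The enzyme pair (EcoRI and MspI, chosen here only as a familiar ddRAD example), the random 500 kbp "genome", the uniformly random strand breaks, and the function names `ddrad_loci` and `surviving_loci` are all hypothetical choices made for illustration. The sketch finds fragments flanked by one cut site of each enzyme within the 300–500 bp window, then counts how many of those loci remain unbroken as the mean fragment size of the input DNA decreases.

```python
import random
import re

# Recognition sequences of the two illustrative enzymes (assumption: the exact
# cut offset within each site is ignored; the motif start is used as the cut).
ECORI = "GAATTC"
MSPI = "CCGG"


def cut_sites(genome, motif, label):
    """Return (position, label) for every occurrence of the motif."""
    return [(m.start(), label) for m in re.finditer(motif, genome)]


def ddrad_loci(genome, size_range=(300, 500)):
    """Fragments between adjacent cut sites of different enzymes, within the size window."""
    sites = sorted(cut_sites(genome, ECORI, "E") + cut_sites(genome, MSPI, "M"))
    loci = []
    for (start, a), (end, b) in zip(sites, sites[1:]):
        if a != b and size_range[0] <= end - start <= size_range[1]:
            loci.append((start, end))
    return loci


def surviving_loci(loci, breaks):
    """Loci with no degradation break between their two cut sites."""
    return [(s, e) for s, e in loci if not any(s < b < e for b in breaks)]


# Toy data: a random sequence stands in for a genome (assumption).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(500_000))
loci = ddrad_loci(genome)

# Simulate progressively more degraded DNA by adding more random breaks.
for mean_fragment in (50_000, 5_000, 500):
    n_breaks = len(genome) // mean_fragment
    breaks = [rng.randrange(len(genome)) for _ in range(n_breaks)]
    kept = surviving_loci(loci, breaks)
    print(f"mean fragment ~{mean_fragment} bp: {len(kept)}/{len(loci)} loci intact")
```

In this model a locus is lost as soon as a single break falls between its two cut sites, so the number of recoverable loci can only stay the same or fall as the input DNA becomes more fragmented. This mirrors, in simplified form, the increase in missing loci reported for degraded samples.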