Why is high-molecular-weight DNA important?
The effect of DNA quality (i.e. fragment size) on genomic datasets depends on how the data are generated. Three of the most common sequencing applications in genomic research are reduced-representation sequencing, long-range sequencing, and whole-genome sequencing. Reduced-representation sequencing approaches include popular methods such as genotyping-by-sequencing (GBS), restriction-site associated DNA (RAD) sequencing, and double digest restriction-site associated DNA (ddRAD) sequencing (Andrews et al., 2016). With reduced-representation sequencing, only specific sections of the genome are sequenced. This makes it more cost-effective than sequencing the whole genome, and particularly popular for organisms with large genomes, such as many plant species (Clugston et al., 2019). Reduced-representation sequencing protocols use one (RAD) or two (ddRAD) restriction enzymes to cut the genomes of individuals at common sites, and subsequently isolate a set of fragments, usually 300–500 bp long. Because the restriction enzyme cut sites are shared, the same regions of the genome can be consistently targeted for sequencing in all individuals. DNA degradation reduces the efficiency of reduced-representation sequencing and increases the number of missing loci. For instance, Guo et al. (2018) showed that, with ddRAD, degraded samples yielded fewer total reads and fewer reads that mapped successfully to the reference genome. Similar results were observed by Graham et al. (2015), where DNA degradation induced by incubating samples at room temperature for up to 96 hours reduced the final number of SNPs. These studies indicate that reduced-representation sequencing performs best when high-molecular-weight DNA is used to generate population data.

With long-range sequencing applications such as PacBio and Oxford Nanopore technologies, the effect of DNA degradation is a reduction in read length.
PacBio and MinION can sequence fragments of more than 80 kbp, up to 100 kbp (van Dijk et al., 2018), and degradation of a DNA sample will drastically reduce the average fragment size that can be obtained.

Whole-genome sequencing (WGS), or whole-genome resequencing, involves sequencing the entire genome of one or multiple organisms. Whole-genome resequencing tolerates some variation in DNA quality because it does not rely on restriction enzymes for fragmentation. Instead, during library preparation DNA is fragmented into small sizes (e.g. 300–500 bp) using sonication or another mechanical method. However, high-quality DNA provides more consistent SNP coverage across samples (Anderson et al., 2010). Moreover, to create genome assemblies, DNA quality should be sufficiently high to provide a large number of reads evenly distributed along the genome, which allows long, continuous scaffolds to be assembled (Del Angel et al., 2018).
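
To illustrate why degraded DNA leads to missing loci in ddRAD, the digestion and size-selection step described above can be sketched as a simple in silico digest. This is a minimal sketch under stated assumptions, not a protocol taken from the cited studies: the enzyme pair (EcoRI and MspI), the helper names `cut_positions`, `ddrad_fragments` and `surviving_fragments`, and the toy sequence are all chosen here for illustration only. A locus is treated as recoverable only if it is flanked by one cut site from each enzyme, falls within the 300–500 bp window, and contains no random break.

```python
import re

# Illustrative assumption: a common ddRAD enzyme pair. Each entry is
# (recognition sequence, cut offset within the site); both sites are palindromic.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MspI": ("CCGG", 1)}


def cut_positions(seq, site, offset):
    """Return every position at which an enzyme cuts `seq`."""
    # A lookahead pattern also finds overlapping recognition sites.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def ddrad_fragments(seq, enzyme_a="EcoRI", enzyme_b="MspI",
                    min_len=300, max_len=500):
    """Return (start, end) of fragments kept by a simulated double digest.

    A fragment is kept if its two ends were cut by different enzymes and
    its length lies within the size-selection window.
    """
    cuts = []
    for name in (enzyme_a, enzyme_b):
        site, offset = ENZYMES[name]
        cuts += [(pos, name) for pos in cut_positions(seq, site, offset)]
    cuts.sort()

    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


def surviving_fragments(fragments, breaks):
    """Drop fragments that contain a degradation break strictly inside them."""
    return [(s, e) for s, e in fragments
            if not any(s < b < e for b in breaks)]


# Toy sequence: one EcoRI-MspI fragment of 400 bp, then a short 104 bp fragment.
genome = "A" * 100 + "GAATTC" + "A" * 394 + "CCGG" + "A" * 100 + "GAATTC" + "A" * 50

loci = ddrad_fragments(genome)
print(loci)                                # [(101, 501)]
print(surviving_fragments(loci, [250]))    # [] : a break inside the locus loses it
```

In this toy case a single break inside the 400 bp fragment removes the only selectable locus, which mirrors the increase in missing loci reported for degraded samples. Real digests depend on the enzymes, genome, and size-selection method actually used.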