
POOLPARTY2: An integrated pipeline for analyzing pooled or indexed low coverage whole genome sequencing data to discover the genetic basis of diversity
  • Stuart Willis,
  • Steven Micheletti,
  • Kimberly Andrews,
  • Shawn Narum
Stuart Willis
Columbia River Inter-Tribal Fish Commission

Corresponding Author: swillis@critfc.org

Steven Micheletti
Columbia River Inter-Tribal Fish Commission
Kimberly Andrews
University of Idaho
Shawn Narum
Columbia River Inter-Tribal Fish Commission

Abstract

Whole genome sequencing data allow variation to be surveyed across the genome, reducing the need to balance genome sub-sampling against recombination rates and linkage between sampled markers and target loci. As sequencing costs decrease, low coverage whole genome sequencing of pooled or individually indexed samples is commonly used to identify loci associated with phenotypes or environmental axes in non-model organisms. There are, however, relatively few publicly available bioinformatic pipelines designed explicitly to analyze these types of data, and fewer still that process the raw sequencing data, provide useful quality control metrics, and then execute analyses. Here, we present an updated version of a bioinformatics pipeline, called POOLPARTY2, that can effectively handle either pooled or indexed DNA samples and includes new features to improve computational efficiency. Using simulated data, we demonstrate the ability of our pipeline to recover segregating variants, estimate their allele frequencies accurately, and identify genomic regions harboring loci under selection. Using the same simulated data set, we benchmark our pipeline against another bioinformatic suite, ANGSD, and illustrate the compatibility and complementarity of the two suites by passing alignment files and variants from POOLPARTY2 to ANGSD, which generates genotype likelihoods used as input for identifying linkage outlier regions. Finally, we apply our updated pipeline to an empirical dataset of low coverage whole genome sequencing data from uncurated population samples of Columbia River steelhead trout (Oncorhynchus mykiss); the results demonstrate the genomic impacts of decades of artificial selection in a prominent hatchery stock.
31 May 2023: Submitted to Molecular Ecology Resources
06 Jun 2023: Submission Checks Completed
06 Jun 2023: Assigned to Editor
06 Jun 2023: Review(s) Completed, Editorial Evaluation Pending
15 Jun 2023: Reviewer(s) Assigned
02 Aug 2023: Editorial Decision: Revise Minor
29 Sep 2023: 1st Revision Received
30 Sep 2023: Submission Checks Completed
30 Sep 2023: Assigned to Editor
30 Sep 2023: Review(s) Completed, Editorial Evaluation Pending
19 Oct 2023: Editorial Decision: Accept