Excessive and asymmetrical removal of heterozygous sites by maxSharedH biases downstream population genetic inference: Implications for hybridization between two primroses
Jie Zhang, Yunnan University
Francisco Pina-Martins, Universidade de Lisboa Faculdade de Ciencias
Zushi Jin, Tibet Academy of Agricultural and Animal Husbandry Sciences
Yongpeng Cha, Yunnan University
Zuyao Liu, University of Bern
Junchu Peng, Yunnan University
Jianli Zhao, Yunnan University
Qingjun Li, Yunnan University

Corresponding Author: qingjun.li@ynu.edu.cn

Abstract

Reduced-representation sequencing (RRS) techniques have revolutionized ecological and evolutionary genomics. Precisely establishing orthologs is a critical challenge for RRS, especially when a reference genome is absent. The proportion of heterozygous sites shared across samples is an alternative criterion for filtering paralogs, as divergent lineages should be less likely to share heterozygosity. In PYRAD/IPYRAD, the prevailing pipeline for variant calling of RRS data, maxSharedH is an often overlooked parameter with implications for detecting and filtering paralogs according to shared heterozygosity. Using empirical GBS data from two primroses (Primula alpicola Stapf and Primula florindae Ward) and their putative hybrids, we explore the impact of maxSharedH on paralog filtering and on downstream analyses. Our study sheds light on both the validity and the risk of filtering paralogs using maxSharedH, and on its significant effects on downstream outlier detection, population assignment, and demographic modelling, emphasizing the importance of attention to detail in bioinformatic processing. The mutual confirmation between the population assignment and demographic modelling results suggests that maxSharedH = 0.10 has a potentially excessive and asymmetrical effect, removing truly shared heterozygous sites as paralogs. These results indicate that the hypotheses of hybrid origin for the putative hybrids represented by the results at maxSharedH = 0.25 and 0.50 are more credible. In conclusion, we reveal the critical hazard of filtering paralogs according to shared heterozygosity, and we therefore propose using specific protocols, rather than maxSharedH, to filter potential paralogs in closely related lineages.
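
To make the filtering criterion concrete, the following is a minimal Python sketch of a shared-heterozygosity filter of the kind the abstract describes. It is not the PYRAD/IPYRAD implementation. The function names (`max_shared_het`, `passes_filter`), the use of IUPAC ambiguity codes to mark heterozygous calls, the exclusion of missing data from the denominator, and the `<=` comparison are all assumptions made for illustration.

```python
# Illustrative sketch only; names and details are assumptions, not pipeline code.

# IUPAC two-base ambiguity codes, used here to mark a heterozygous diploid call.
HET_CODES = set("RYSWKM")
# Characters treated as missing data (assumption).
MISSING = set("N-")


def max_shared_het(locus):
    """Return the highest per-site fraction of samples that are heterozygous.

    `locus` is a list of equal-length aligned sequences, one per sample.
    Samples with missing data at a site are left out of that site's
    denominator (an assumption of this sketch).
    """
    worst = 0.0
    for column in zip(*locus):
        called = [base.upper() for base in column if base.upper() not in MISSING]
        if not called:
            continue
        n_het = sum(1 for base in called if base in HET_CODES)
        worst = max(worst, n_het / len(called))
    return worst


def passes_filter(locus, max_shared_h):
    """Keep the locus only if no site exceeds the shared-heterozygosity threshold."""
    return max_shared_het(locus) <= max_shared_h


# Hypothetical locus: two of four samples are heterozygous (R = A/G) at the last site.
locus = ["ACGTR", "ACGTR", "ACGTA", "ACGTA"]
for threshold in (0.10, 0.25, 0.50):
    print(threshold, passes_filter(locus, threshold))
```

In this toy locus half of the samples share a heterozygous site, so the locus is retained at a threshold of 0.50 but discarded at 0.25 and 0.10. If that site were a true polymorphism shared between closely related lineages rather than a paralog, a strict threshold would remove it, which is the risk the abstract raises.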