Abstract
Reduced-representation sequencing (RRS) techniques have
revolutionized ecological and evolutionary genomics, especially
for species without a reference genome. However, precisely establishing
homologous loci from RRS data remains a major challenge, and it strongly
affects the accuracy of downstream analyses and the reliability of
biological inferences. maxSH, a parameter of PYRAD/IPYRAD (a prevailing
pipeline for genotyping RADseq and GBS data), is often overlooked
as a means of detecting paralogs. Using GBS data of two
primroses (Primula alpicola Stapf and P. florindae Ward)
and their putative hybrids as an empirical case study, we explore the
efficiency of maxSH in filtering paralogs and its impact on
downstream analyses. We also assess whether the putative
hybrids truly arose through hybridization. Our study sheds light on
the efficiency of maxSH in filtering paralogs and on the significant
effects of maxSH, together with clustering threshold and missing
data, on downstream analyses (outlier detection, population
assignment, and demographic modelling), underscoring the importance of
careful bioinformatic processing. On the other hand,
although the putative hybrids show a genetic mixture of P.
alpicola and P. florindae in most STRUCTURE and PCA
results, we cannot draw a clear conclusion about their origin
because the demographic scenarios conflict among the nine chosen
datasets, mainly as a result of altering the maxSH value. However, gene
flow patterns in most of the optimal models across multiple maxSH values
collectively indicate incomplete reproductive isolation between the putative
hybrids and the two primroses, and the existence of indirect introgression
between P. alpicola and P. florindae.