Analysis of linkage disequilibrium
Four
microsatellite pairs showed significant LD in both KNP and HiP (van
Hooft et al., 2019; van Hooft et al., 2014). Three of these occur close
on chromosome 1 in cattle and its homologous chromosome in African
buffalo (whole genome sequence data of cattle and contig sequence data
of buffalo at NCBI website): BM1824 -CSMM019 (0.5 Mb in
buffalo), CSSM01 9-BM3205 (5.4 Mb), BM1824 -BM3205 (5.9 Mb) (Ihara et al., 2004). The fourth pair consists of
microsatellites ILSTS026 and INRA006 , which are probably
~18 Mb apart on the same chromosome in buffalo. This
chromosome corresponds to a fusing of chromosomes 2 and 3 in cattle,
with the former harbouring ILSTS026 and the latterINRA006 , in both cases ~9.5 Mb from the end of
the left-side of their p arms (Text S4) (Gallagher & Womack, 1992;
Goudet, 1995, 2001; Ihara et al., 2004; Lane-deGraaf et al., 2015; Liu
& Muse, 2005; Weng, Saatchi, Schnabel, Taylor, & Garrick, 2014). Two
additional microsatellite pairs are at close chromosomal distance,
although they did not show significant LD in KNP or HiP:INRA006 -TGLA263 (27.5 Mb in buffalo, chromosome 3 in
cattle) and TGLA057 -INRA128 (28.3 Mb, chromosome 1) (Ihara
et al., 2004). The chromosomal distances at the aforementioned locus
pairs correspond to recombination rates of 0.6-31%/generation based on
a recombination rate of 1.1 cM/Mb at chromosomes 1 to 3 in cattle
(Mouresan et al., 2019; Weng et al., 2014).
In this study, we conducted two statistical tests for LD, with LD
estimated per population per locus pair, assuming two alleles per locus:
the DE allele or SAE allele on the one hand and the wild-type allele
(the pool of all original microsatellite alleles not associated with a
male-deleterious trait, including the rare alleles) on the other. In
these tests, we did not include the BM1824 -BM3205 pair,
because this specific combination of microsatellites was only genotyped
in KNP and HiP and because their LD would not be independent of theBM1824 -CSMM019 and CSMM019 -BM3205 locus
pairs. Thus, the two LD tests were conducted with the following five
locus pairs: BM1824 -CSMM019 (0.5 Mb),CSSM01 9-BM3205 (5.4 Mb), ILSTS026 -INRA006(~18 Mb), INRA006 -TGLA263 (27.5 Mb) andTGLA057 -INRA128 (28.3 Mb).
We measured LD by the Pearson correlation coefficient for a 2x2 table,r LD, for bi-allelic locus pairs. When the linkage
phase is known, r LD can be derived from the
observed and expected haplotype frequencies (χ 2value, four possible haplotypes) and the total number of gametes or
chromosomes analysed (k ), using Equation 2 below (Charlesworth &
Charlesworth, 2010). A negative sign is added tor LD when most pairs of focal alleles are in
repulsion. However, when the linkage phase is unknown, as in our case,r LD can be derived from maximum-likelihood
estimates of the haplotype frequencies (Schaid, 2004; Weir, 1990). In
our case, positive r LD values indicate that most
pairs of male-deleterious alleles are in coupling (i.e., DE-DE, SAE-SAE
or DE-SAE).
\(r_{\text{LD}}=\ \sqrt{\frac{\chi^{2}}{k}}\) (2)
We estimated r LD with Genalex, following Weir’s
method (Weir, 1990). Genalex provides three estimates ofr LD: 1) r LD based on
estimated frequencies of double heterozygotes in coupling assuming
Hardy-Weinberg equilibrium, 2) r LD based on
estimated frequencies of double heterozygotes in repulsion assuming
Hardy-Weinberg equilibrium, 3) r LD based on
estimated frequencies of both types of double heterozygotes without
assuming Hardy-Weinberg equilibrium. The first two estimates should be
identical, but the maximum-likelihood procedure may result in
deviations, especially with small sample sizes. We used ther LD estimates that assume HWE (i.e., estimates 1
and 2). We only included cases where estimates 1 and 2 were identical
and of the same sign as estimate 3 (67 out of 78 cases), except for two
cases where estimates 1 and 2 were of opposite sign but very close to
zero (|r LD| =0.0002) while
estimate 3 was zero. For these two cases we assumedr LD=0.
We pooled the samples from the neighbouring populations in Mana Pools
NP-Nyakasanga (~55 km distance) and in Limpopo
NP-Manguana (~200 km distance) to increase sample size,
after checking that in the latter case this pooling, considering the
large geographic distance, did not bias estimates ofr LD (paired sample t -test on 30 locus
pairs, comparing weighted average r LD of the two
separate populations with r LD of the pooled
population: pooled population: average r LD =
-0.010, two separation populations: average r LD =
0.002; P = 0.52; Pearson r = 0.91). It was not possible to
estimate r LD for all populations and locus pairs
because of small sample sizes. However, in some cases we were able to
get an estimate of r LD by a leave-one-out
approach (CSMM019 -BM3205 in Queen Elizabeth NP (both
sectors) and Amboseli NP, BM1824 -CSMM019 in Limpopo
NP-Manguana, INRA006 -TGLA263 in Mana Pools NP-Nyakasanga).
Hereby, we calculated average r LD for all
possible population samples with one individual excluded, with multiple
estimates per population always being of the same sign. We excluded HiP
because of earlier observed negative selection (van Hooft et al., 2019),
the two central African populations because of small sample size, and
the admixed population from Save Valley Conservancy because admixing may
bias LD estimates (Pfaff et al., 2001).
In the first LD test, we hypothesized that pairs of
male-deleterious-trait-associated alleles (i.e.; DE-DE, SAE-SAE and
DE-SAE allele pairs) are coupled more often than expected with random
association, resulting in positive r LD values. We
conducted a χ 2 test to determine whether the
total fraction of locus pairs (across loci and across populations) with
a positive r LD value significantly deviated from
0.5, considering that positive and negative r LDvalues are equally likely in the absence of selection (Slatkin, 2008).
LD estimates of the two closest linked locus pairs,BM1824 -CSMM019 and CSMM019 -BM3205 , for
populations that are part of a larger metapopulation may not be
independent of each other. To err on the conservative side, we used the
average r LD value per metapopulation across both
locus pairs (KNP was the only metapopulation in which both locus pairs
were analysed). With respect to these locus pairs, the following four
groups of populations were considered to be part a metapopulation: 1)
northern KNP - southern KNP, 2) Malilangwe Wildlife Res. - Gonarezhou NP
- Sengwe Safari Area - Crooks Corner, 3) Victoria Falls NP - Chobe NP -
Okavango Delta, 4) Amboseli NP - Tsavo NP, 5) Queen Elizabeth NP Mw -
Queen Elizabeth NP Is. We assumed independence ofr LD estimates for the other three physically
linked locus pairs, even when part of the same metapopulation, because
of their relatively large interlocus distances. For these locus pairs,
the expected recombination rate (~20-31% per
generation) likely exceeds the migration rate between nearby
populations.
As a control, we estimated the total fraction of locus pairs with
positive r LDacross all populations for: 1) the
five closely linked locus pairs with genotypes randomized per locus per
population, and 2) locus pairs with interlocus distance >45
Mb (seven locus pairs) or that occur on different chromosomes (116 locus
pairs). Both fractions are expected to be close to 0.5 (free
recombination). In the first control we conducted 30 randomizations of
the total dataset, resulting in 1541 randomizedr LD values. The r LD values
in the second control were estimated with GDA, instead of Genalex,
because it can better handle large data sets with missing data (Labate,
2000). At the five closely linked locus pairs, Genalex and GDA always
provided identical estimates.
In the second LD test, we estimated the significance of regression ofr LD for each population against the natural
logarithm of the variable l =latitude (67 data points) over the
range [-24.75°, 3.85°], i.e., we fitted the parameters a andb in the equation:
\(r_{\text{LD}}=\operatorname{aln}\left(l-\ l_{0}\right)+b\) (3)
using a general linear mixed model (GLMM) approach, specifically the
’lme4’ (version 1.1.21), ’lmerTest’ (version 3.1.0) and ’performance’
(version 0.4.4.1) packages in R (Bates, Machler, Bolker, &
Walker, 2015; Kuznetsova, Brockhoff, & Christensen, 2017). We applied
log-transformation to account for the obvious non-linearity in the data,
using l 0 = -27. Locus pair (five levels),
population (25 levels, including three singletons) and metapopulation
(57 levels, including 52 singletons) were included as random factors.
GLMMs were not weighted by sample size as this only had a minor effect
on parameter estimates (< 0.5% difference). Goodness-of-fit
was estimated with marginal R 2; the proportion
of variability explained by the fixed effects (Nakagawa & Schielzeth,
2013).