2.7 Phylogenetic analysis
Phylogenetic trees for S. chinensis and eight other aphid species,
including Daktulosphaira vitifoliae, Sipha flava, Aphis glycines,
R. maidis, A. pisum, Myzus persicae, Diuraphis noxia, and
E. lanigerum, were reconstructed (International Aphid Genomics
Consortium, 2010; Li et al., 2019; Mathers, 2020; Mathers et al., 2017;
Mathers, Mugford, et al., 2020; Mathers, Wouters, et al., 2020;
Nicholson et al., 2015; Thorpe et al., 2018; Wenger et al., 2016). The
whitefly Bemisia tabaci was used as the outgroup. The aphid genome
sequences and gene structure annotation files were downloaded from the
NCBI genome database; genes containing mRNA information were retained,
and the CDS was modified. The longest isoform was selected as the
representative sequence of each gene, and the predicted proteins encoded
by all putative genes were obtained. Orthologous groups were assigned by
OrthoMCL (v2.0.9) (Li, Stoeckert & Roos, 2003) based on the
all-versus-all BLASTP results (E-value ≤ 1 × 10⁻⁵).
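
Two bookkeeping steps of this workflow, the choice of the longest isoform
per gene (above) and the single-copy filter applied to the orthologous
groups (described in the sentences that follow), can be illustrated with
a minimal Python sketch. It is not the script used in this study: the
record layouts, the function names, the tie-breaking rule, and the
reading of "single copy" as exactly one gene per represented species are
all assumptions made for illustration.

```python
def longest_isoforms(transcripts):
    """Pick one representative transcript per gene: the one with the longest CDS.

    `transcripts` is an iterable of (gene_id, transcript_id, cds_length)
    tuples. This record layout is hypothetical.
    """
    best = {}
    for gene_id, tx_id, cds_len in transcripts:
        # On ties the first transcript seen is kept (assumed tie-break rule).
        if gene_id not in best or cds_len > best[gene_id][1]:
            best[gene_id] = (tx_id, cds_len)
    return {gene: tx for gene, (tx, _) in best.items()}


def keep_group(group, n_species, min_coverage=0.5, max_shortest_bp=6000):
    """Return True if an orthologous group passes the single-copy filter.

    `group` maps species name -> list of CDS lengths (bp) of its member
    genes (hypothetical layout). `n_species` is the total number of
    species in the analysis.
    """
    if not group:
        return False
    # Single copy: every represented species contributes exactly one gene.
    if any(len(lengths) != 1 for lengths in group.values()):
        return False
    # Coverage: at least 50% of all species must be represented.
    if len(group) < min_coverage * n_species:
        return False
    # Length: drop the group if even its shortest sequence exceeds 6000 bp.
    shortest = min(lengths[0] for lengths in group.values())
    return shortest <= max_shortest_bp
```

With ten species in total (nine aphids plus the outgroup), a group needs
single-copy members from at least five species to be retained under this
reading of the coverage criterion.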
Single-copy orthologous groups were extracted from the OrthoMCL results,
retaining groups in which single-copy genes covered at least 50% of all
species. Groups whose shortest sequence exceeded 6000 bp were filtered
out, because overly long sequences may reduce the accuracy of the
tree. Multiple sequence alignments of single-copy orthologous
genes were performed using MAFFT (version 7.221,
Katoh, Misawa, Kuma, & Miyata,
2002; Katoh & Standley, 2013) and the conserved amino-acid sites were
identified using Gblocks (version 0.91, Clore, 2014). RAxML (version
8.1.24) (Stamatakis, 2014) was employed to construct the phylogenetic
tree under the GTRGAMMA model with 1000 bootstrap replicates
(Castresana, 2000). The branch lengths of homologous genes were analyzed
with PAML (Yang, 2007) and compared with the standard tree to eliminate
abnormal genes. The tree was then rebuilt using RAxML
(Stamatakis, 2014). Using the root number and the multiple sequence
alignment together with calibration point information, species
divergence times were calculated with MCMCtree in the PAML package
(version 14.9), and the divergence times within the evolutionary tree
were obtained with 95% confidence intervals (CI) (Yang, 2007).
Divergence times and fossil record ages were derived from TimeTree
(http://www.timetree.org/) and applied as the calibration points.
According to the divergence times from TimeTree, the nodal dates of
Ac. pisum and Ap. glycines were 28-61 million years ago (MYA), those of
D. vitifoliae and S. flava were 87-162 MYA, and those of B. tabaci and
D. vitifoliae were 245-351 MYA (Johnson et al., 2018).
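
As an illustration of how such calibration intervals can be passed to
MCMCtree, the sketch below converts the three intervals listed above into
the '>lower<upper' bound labels that MCMCtree accepts on calibrated nodes
of the tree file. The time unit of 100 MY and the helper name
`mcmctree_bound` are assumptions; the control file and the exact label
style used in this study are not stated here.

```python
# Calibration intervals in MYA, copied from the text above.
CALIBRATIONS_MYA = {
    ("Ac. pisum", "Ap. glycines"): (28, 61),
    ("D. vitifoliae", "S. flava"): (87, 162),
    ("B. tabaci", "D. vitifoliae"): (245, 351),
}


def mcmctree_bound(min_mya, max_mya, unit_mya=100.0):
    """Format a pair of bounds as an MCMCtree node label.

    Ages are rescaled to units of `unit_mya` million years; one time unit
    of 100 MY is an assumed convention, not taken from the text.
    """
    if not 0 < min_mya < max_mya:
        raise ValueError("expected 0 < min_mya < max_mya")
    lower = min_mya / unit_mya
    upper = max_mya / unit_mya
    # The single quotes are part of the label as written in the tree file.
    return f"'>{lower:g}<{upper:g}'"


# One label per calibrated node, keyed by the species pair defining it.
labels = {pair: mcmctree_bound(*bounds)
          for pair, bounds in CALIBRATIONS_MYA.items()}

for pair, label in labels.items():
    print(" / ".join(pair), label)
```

Each label would be attached in the Newick tree to the internal node that
is the most recent common ancestor of the two species named in the pair.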