2.7 Phylogenetic analysis
Phylogenetic trees were reconstructed for S. chinensis and eight other aphid species: Daktulosphaira vitifoliae, Sipha flava, Aphis glycines, R. maidis, A. pisum, Myzus persicae, Diuraphis noxia, and E. lanigerum (International Aphid Genomics Consortium, 2010; Li et al., 2019; Mathers, 2020; Mathers et al., 2017; Mathers, Mugford, et al., 2020; Mathers, Wouters, et al., 2020; Nicholson et al., 2015; Thorpe et al., 2018; Wenger et al., 2016). The whitefly Bemisia tabaci was used as the outgroup. Genome sequences and gene structure annotation files for the aphids were downloaded from the NCBI genome database; genes containing mRNA information were retained, and the CDS was modified. For each gene, the longest isoform was selected as the representative sequence, and the predicted proteins encoded by all putative genes were obtained. Orthologous groups were assigned with OrthoMCL (v2.0.9) (Li, Stoeckert & Roos, 2003) based on all-versus-all BLASTP results (E-value ≤ 1 × 10⁻⁵). Single-copy orthologous groups were extracted from the OrthoMCL results, requiring that the single-copy genes cover at least 50% of all species. Single-copy orthologous groups whose shortest sequence exceeded 6000 bp were removed, because overly long sequences may affect the accuracy of the tree. Multiple sequence alignments of the single-copy orthologous genes were performed with MAFFT (version 7.221; Katoh, Misawa, Kuma, & Miyata, 2002; Katoh & Standley, 2013), and conserved amino-acid sites were identified with Gblocks (version 0.91; Clore, 2014). RAxML (version 8.1.24) (Stamatakis, 2014) was used to construct the phylogenetic tree under the GTRGAMMA model with 1000 bootstrap replicates (Castresana, 2000). Branch lengths of the homologous genes were analyzed with PAML (Yang, 2007) and compared with the standard tree to eliminate abnormal genes. The tree was then rebuilt with RAxML (Stamatakis, 2014).
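The orthologous-group filtering described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `select_single_copy_groups`, the input layout (group ID mapped to a list of species, gene ID, and sequence length), and the reading of the coverage criterion (no species contributes more than one gene, and at least 50% of species are represented) are all assumptions made for illustration.

```python
from collections import Counter


def select_single_copy_groups(groups, n_species,
                              min_coverage=0.5, max_shortest_len=6000):
    """Keep single-copy orthologous groups (hypothetical helper).

    groups: dict mapping group ID -> list of (species, gene_id, seq_len_bp).
    n_species: total number of species in the analysis (outgroup included).
    """
    kept = {}
    for group_id, members in groups.items():
        copies = Counter(species for species, _, _ in members)
        if not copies:
            continue
        # Single copy: no species may contribute more than one gene.
        if any(count > 1 for count in copies.values()):
            continue
        # Coverage: the group must span at least min_coverage of all species.
        if len(copies) / n_species < min_coverage:
            continue
        # Length filter: drop the group if even its shortest sequence
        # is longer than max_shortest_len.
        if min(length for _, _, length in members) > max_shortest_len:
            continue
        kept[group_id] = members
    return kept


# Toy input with 10 species (nine aphids plus the outgroup); values are invented.
example_groups = {
    "OG1": [(f"sp{i}", f"g{i}", 900 + 10 * i) for i in range(1, 6)],
    "OG2": [("sp1", "a", 900), ("sp1", "b", 800), ("sp2", "c", 700),
            ("sp3", "d", 650), ("sp4", "e", 720), ("sp5", "f", 810)],
    "OG3": [(f"sp{i}", f"h{i}", 1000) for i in range(1, 5)],
    "OG4": [(f"sp{i}", f"k{i}", 7000 + i) for i in range(1, 6)],
}
kept_groups = select_single_copy_groups(example_groups, n_species=10)
```

In this toy input only `OG1` passes: `OG2` has two copies in one species, `OG3` covers fewer than half of the species, and the shortest sequence in `OG4` exceeds 6000 bp.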
Species divergence times were estimated with MCMCtree in the PAML package (version 14.9), supplied with the root number, the multiple sequence alignment results, and calibration point information. Divergence times within the evolutionary tree were obtained with 95% confidence intervals (CI) (Yang, 2007). Divergence times and ages of fossil records were derived from TimeTree (http://www.timetree.org/) and applied as calibration points. According to TimeTree, the nodal dates were 28-61 million years ago (MYA) for Ac. pisum and Ap. glycines, 87-162 MYA for D. vitifoliae and S. flava, and 245-351 MYA for B. tabaci and D. vitifoliae (Johnson et al., 2018).
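As an illustration of how such calibration ranges might be prepared for MCMCtree, the following Python sketch converts the three TimeTree intervals into bound labels of the form `'B(lower,upper)'`, which MCMCtree accepts as node annotations in its tree file. The time unit of 100 million years, the helper name `bound_label`, and the dictionary layout are assumptions; the settings actually used in this study are not stated, and the labels would still need to be attached to the corresponding nodes of the rooted tree.

```python
# Calibration ranges in million years ago (MYA), taken from the text above.
calibrations_mya = {
    ("Acyrthosiphon pisum", "Aphis glycines"): (28, 61),
    ("Daktulosphaira vitifoliae", "Sipha flava"): (87, 162),
    ("Bemisia tabaci", "Daktulosphaira vitifoliae"): (245, 351),
}


def bound_label(lower_mya, upper_mya, unit_myr=100):
    """Format a soft-bound calibration label (hypothetical helper).

    unit_myr is the assumed time unit of the analysis: with 100,
    an age of 28 MYA is written as 0.28.
    """
    if not 0 < lower_mya < upper_mya:
        raise ValueError("expected 0 < lower bound < upper bound")
    return f"'B({lower_mya / unit_myr:g},{upper_mya / unit_myr:g})'"


# One label per calibrated node, keyed by the species pair that defines it.
labels = {pair: bound_label(*bounds) for pair, bounds in calibrations_mya.items()}

for (species_a, species_b), label in labels.items():
    print(f"{species_a} / {species_b}: {label}")
```

Keeping the ranges in MYA and converting them in one place makes it easier to check that every calibration uses the same time unit as the rest of the analysis.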