2.4 Comparative genomics analysis
The following 17 species were used for comparative genomic analysis: F. chinensis, P. monodon, L. vannamei, Portunus trituberculatus, Trinorchestia longiramus, H. azteca, Eurytemora affinis, Drosophila melanogaster, Acyrthosiphon pisum, Apis mellifera, Bombyx mori, Zootermopsis nevadensis, Cryptotermes secundus, Pediculus humanus, Tribolium castaneum, Anopheles gambiae, and Limulus polyphemus. Genomic sequences were downloaded from NCBI. The gene set of each species was filtered as follows. When a gene had multiple alternatively spliced transcripts, only the longest protein-coding transcript was retained for further analysis. Genes were also excluded if the encoded protein consisted of fewer than 30 amino acids or if the sequence contained degenerate bases or termination codons. The similarity between protein sequences of all species was assessed using BLASTp (E-value ≤ 1e−7). The results were clustered using OrthoMCL (L. Li, Stoeckert, & Roos, 2003) with an expansion coefficient of 1.5. Single-copy and multiple-copy homologous genes were identified from the resulting clusters.
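The gene-set filtering rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The input layout (tuples of gene ID, transcript ID, and protein sequence), the function name `filter_gene_set`, the set of ambiguity symbols taken to represent degenerate bases, and the treatment of a trailing `*` as a normal terminal stop are all assumptions made for illustration.

```python
# Illustrative sketch only: names, input layout, and symbol sets are assumptions.
MIN_LENGTH = 30            # minimum protein length stated in the text
AMBIGUOUS = set("XBZJ")    # assumed symbols for residues arising from degenerate bases


def filter_gene_set(records):
    """Filter (gene_id, transcript_id, protein_sequence) tuples.

    Returns a dict mapping gene_id -> (transcript_id, protein_sequence)
    for the genes that pass all filters.
    """
    # Step 1: keep only the longest protein-coding transcript per gene.
    longest = {}
    for gene_id, transcript_id, seq in records:
        # Assumption: trailing '*' symbols mark the normal stop and are removed.
        seq = seq.upper().rstrip("*")
        if gene_id not in longest or len(seq) > len(longest[gene_id][1]):
            longest[gene_id] = (transcript_id, seq)

    # Step 2: drop short proteins and those with internal stops or ambiguous residues.
    kept = {}
    for gene_id, (transcript_id, seq) in longest.items():
        if len(seq) < MIN_LENGTH:
            continue
        if "*" in seq:                 # internal termination codon
            continue
        if AMBIGUOUS & set(seq):       # residue from a degenerate base
            continue
        kept[gene_id] = (transcript_id, seq)
    return kept


if __name__ == "__main__":
    example = [
        ("geneA", "tA.1", "M" + "A" * 39 + "*"),   # 40 aa, kept
        ("geneA", "tA.2", "M" + "A" * 34),         # shorter isoform, discarded
        ("geneB", "tB.1", "M" + "A" * 19),         # 20 aa, too short
        ("geneC", "tC.1", "M" + "A" * 30 + "X"),   # ambiguous residue
        ("geneD", "tD.1", "M" + "A" * 20 + "*" + "A" * 20),  # internal stop
    ]
    for gene, (transcript, seq) in filter_gene_set(example).items():
        print(gene, transcript, len(seq))
```

In this sketch the longest isoform is selected before the quality filters are applied, so a gene whose longest transcript fails a filter is dropped entirely instead of falling back to a shorter isoform. The text does not specify this order; it is one possible reading.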
A phylogenetic tree was constructed using the single-copy homologous genes of the 17 species. MUSCLE (Edgar, 2004) was used for sequence alignment. The final dataset was used to construct the phylogenetic tree with RAxML (Rokas, 2011) using the maximum likelihood method. The best tree was used as the input tree for divergence time estimation using MCMCTREE in the PAML package (Yang, 2007), with the following parameters: burn-in = 700, sample number = 1,000,000, sample frequency = 2. The following fossil calibrations were used as priors for the divergence time estimation: P. monodon and L. vannamei (58–108 million years ago, Mya), Acyrthosiphon pisum and Eurytemora affinis (452–557 Mya), Zootermopsis nevadensis and Cryptotermes secundus (103–156 Mya), Zootermopsis nevadensis and Pediculus humanus (330–398 Mya), Anopheles gambiae and Drosophila melanogaster (217–301 Mya), Drosophila melanogaster and Bombyx mori (243–317 Mya), and Tribolium castaneum and Apis mellifera (308–366 Mya). For the gene family expansion and contraction analysis, the gene families obtained from the clustering analysis were filtered and analyzed using CAFE software (De Bie, Cristianini, Demuth, & Hahn, 2006). To detect positive selection, protein sequences of single-copy homologous genes in F. chinensis, L. vannamei, P. trituberculatus, and H. azteca were subjected to multiple alignment using MUSCLE. The ratios of nonsynonymous substitutions per nonsynonymous site (dN) to synonymous substitutions per synonymous site (dS) were calculated using the branch-site model of the Codeml tool included in the PAML package. Likelihood ratio tests were applied to test for positive selection.
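As an illustration of the likelihood ratio test mentioned above, the following Python sketch computes the test statistic 2(lnL_alt − lnL_null) from the log-likelihoods of a null and an alternative model and converts it to a p-value. The text does not state the degrees of freedom or the significance threshold. The sketch assumes a chi-square distribution with one degree of freedom, a common convention for the branch-site test, and the log-likelihood values in the usage example are hypothetical, not results from this study.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Assumption: the statistic is compared with a chi-square distribution
    with 1 degree of freedom. Returns (statistic, p_value).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    # Numerical noise can make the statistic slightly negative; clamp to zero.
    stat = max(stat, 0.0)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene (null model vs. alternative model).
    stat, p = lrt_pvalue(lnl_null=-5234.71, lnl_alt=-5229.48)
    print(f"LRT statistic = {stat:.2f}, p = {p:.4g}")
```

When many genes are tested, the resulting p-values are usually corrected for multiple testing; whether and how this was done here is not stated in the text.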