2.4 Comparative genomics analysis
The following 17 species were used for comparative genomic analysis: F. chinensis, P. monodon, L. vannamei, Portunus trituberculatus,
Trinorchestia longiramus, H. azteca, Eurytemora affinis, Drosophila melanogaster, Acyrthosiphon pisum, Apis mellifera, Bombyx mori,
Zootermopsis nevadensis, Cryptotermes secundus, Pediculus humanus, Tribolium castaneum, Anopheles gambiae, and Limulus polyphemus. Genomic
sequences were downloaded from NCBI. The gene set of each species was
filtered. In brief, when a gene had multiple alternatively spliced
transcripts, only the longest protein-coding transcript was retained for
further analysis. Furthermore, genes were excluded if the proteins they
encoded consisted of fewer than 30 amino acids or contained degenerate
bases or termination codons. The similarity between protein sequences of
all species was assessed using BLASTp (E-value ≤ 1e−7).
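The filtering rules above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Transcript` record, the function names, and the decision to apply the checks at the protein level (treating any character outside the 20 standard amino acids, including `X` and `*`, as evidence of a degenerate base or termination codon) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: only the 20 standard residues count as unambiguous.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 30  # proteins with fewer residues than this are excluded


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one protein-coding transcript."""
    gene_id: str
    transcript_id: str
    protein: str


def is_valid_protein(protein: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True if the protein is long enough and has no ambiguous or stop symbols."""
    seq = protein.upper()
    if len(seq) < min_length:
        return False
    return all(residue in STANDARD_AA for residue in seq)


def filter_gene_set(transcripts):
    """Keep the longest transcript per gene, then drop genes whose protein fails the checks."""
    longest = {}
    for t in transcripts:
        current = longest.get(t.gene_id)
        if current is None or len(t.protein) > len(current.protein):
            longest[t.gene_id] = t
    return {gene: t for gene, t in longest.items() if is_valid_protein(t.protein)}
```

Selecting the longest transcript before applying the quality checks mirrors the order in which the two steps are described; a pipeline that checks every transcript first and then picks the longest valid one would retain slightly more genes.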
The BLASTp results were clustered using OrthoMCL (L. Li, Stoeckert, &
Roos, 2003), with an expansion coefficient of 1.5. Single-copy and
multiple-copy homologous genes were identified from the resulting clusters.
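As an illustration of how single-copy and multiple-copy families might be separated once clustering is done, the sketch below parses cluster lines and classifies each group. It assumes a groups file in which each line reads `group_id: taxon|gene taxon|gene ...`; the function names, the example taxon codes, and the exact definitions used here (single-copy means exactly one gene in every species; multiple-copy means at least one species with more than one gene) are assumptions for the example, not definitions taken from the text.

```python
from collections import Counter


def parse_group_line(line: str):
    """Split 'group_id: taxon|gene ...' into (group_id, [(taxon, gene), ...])."""
    group_id, members = line.split(":", 1)
    genes = []
    for member in members.split():
        taxon, gene = member.split("|", 1)
        genes.append((taxon, gene))
    return group_id.strip(), genes


def classify_group(genes, all_species):
    """Label a cluster as 'single-copy', 'multiple-copy', or 'other'."""
    counts = Counter(taxon for taxon, _ in genes)
    present_in_all = set(counts) == set(all_species)
    if present_in_all and all(n == 1 for n in counts.values()):
        return "single-copy"
    if any(n > 1 for n in counts.values()):
        return "multiple-copy"
    return "other"  # e.g. one gene each, but missing from some species


# Usage with hypothetical taxon codes for three species
species = ["fch", "lva", "pmo"]
group_id, genes = parse_group_line("OG1: fch|gene1 lva|gene7 pmo|gene3")
label = classify_group(genes, species)
```

Only groups labeled `single-copy` under this definition would feed into a species-tree alignment, since each contributes exactly one sequence per species.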
A phylogenetic tree was constructed using single-copy homologous genes
in the 17 species. MUSCLE (Edgar, 2004) was used for sequence alignment.
The final dataset was used to construct the phylogenetic tree with RAxML
(Rokas, 2011) using the maximum likelihood method. The best tree was
used as an input tree for divergence time estimation using MCMCTREE in
the PAML package (Yang, 2007), with the following parameters: burn-in =
700, number of samples = 1,000,000, sampling frequency = 2. Fossil calibrations
were used as priors for the divergence time estimation, as follows: P. monodon and L. vannamei (58–108 million years ago
[Mya]), Acyrthosiphon pisum and Eurytemora affinis (452–557 Mya), Zootermopsis nevadensis and Cryptotermes
secundus (103–156 Mya), Zootermopsis nevadensis and Pediculus humanus (330–398 Mya), Anopheles gambiae and Drosophila melanogaster (217–301 Mya),
Drosophila melanogaster and Bombyx mori (243–317 Mya), and Tribolium
castaneum and Apis mellifera (308–366 Mya).
For the gene family expansion and contraction analysis, the gene
families obtained from the clustering analysis were filtered and
analyzed using CAFE software (De Bie, Cristianini, Demuth, & Hahn, 2006).
Protein sequences
of single-copy homologous genes in F. chinensis, L. vannamei,
P. trituberculatus, and H. azteca were aligned using MUSCLE for the
detection of positive selection. The
ratios of nonsynonymous substitutions per nonsynonymous site (dN) to
synonymous substitutions per synonymous site (dS) were calculated using
the branch-site model of the Codeml tool included in the PAML package.
Likelihood ratio tests were applied to test for positive selection.
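A likelihood ratio test of this kind compares the log-likelihood of a null model with that of the alternative model that allows positive selection. The sketch below shows the arithmetic only. The function names are hypothetical, and the choice of a chi-square reference distribution with one degree of freedom is an assumption commonly made for branch-site tests, not a detail stated in the text.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null), floored at zero to absorb optimisation noise."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1 this equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    return math.erfc(math.sqrt(x / 2.0))


def lrt_p_value(lnl_null: float, lnl_alt: float) -> float:
    """P-value of the likelihood ratio test under the df = 1 assumption."""
    return chi2_sf_df1(lrt_statistic(lnl_null, lnl_alt))


# Usage with made-up log-likelihoods: the statistic is 4.0, giving p of about 0.0455
p = lrt_p_value(-1000.0, -998.0)
```

When many genes are tested, the resulting p-values would normally need a multiple-testing correction before any gene is reported as positively selected; the text does not say which correction, if any, was applied.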