2.7 Repetitive sequence annotation
We combined two approaches to identify repeats in the genome: homology-based and de novo prediction. For the homology-based analysis, we identified known transposable elements (TEs) in the C. sonnerati genome using RepeatMasker 4.0.9 (Tarailo-Graovac et al., 2009) with the Repbase TE library (Jurka et al., 2000, 2005). RepeatProteinMask searches were also conducted using the TE protein database as the query library. For de novo prediction, we constructed a de novo repeat library of the C. sonnerati genome using RepeatModeler (http://www. org/RepeatModeler/), which automatically runs two core de novo repeat-finding programs, RECON v1.08 (Bao & Eddy, 2002) and RepeatScout v1.0.5 (Price et al., 2005), to construct, refine and classify consensus models of putative interspersed repeats. Furthermore, we performed a de novo search for long terminal repeat (LTR) retrotransposons in the C. sonnerati genome sequences using LTR_FINDER v1.0.7 (Xu & Wang, 2007). We also identified tandem repeats using Tandem Repeats Finder (TRF) (Benson, 1999), and non-interspersed repeat sequences, including low-complexity repeats, satellites and simple repeats, using RepeatMasker. Finally, we merged the library files from the two methods and used RepeatMasker to identify the repeat content.
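
The final merging step can be illustrated with a minimal Python sketch. The text does not specify how the library files were combined, so the sketch makes its own assumptions: the libraries are in FASTA format, exact duplicate sequences are dropped, and each header is prefixed with a source label. The function names and the labels "homology" and "denovo" are hypothetical and are not part of any of the tools named above.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(libraries: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Concatenate repeat libraries, dropping exact duplicate sequences.

    Assumption: each header is prefixed with its source label so that
    identical IDs from different libraries cannot collide.
    """
    seen = set()
    merged: List[Tuple[str, str]] = []
    for label, lines in libraries.items():
        for header, seq in parse_fasta(lines):
            key = seq.upper()  # compare case-insensitively
            if key in seen:
                continue
            seen.add(key)
            merged.append((f"{label}|{header}", seq))
    return merged


def to_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Serialize (header, sequence) records as FASTA text, wrapped at `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(f">{header}")
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy records standing in for a homology-based and a de novo library.
    homology = [">TE1#DNA/hAT", "ACGT", "ACGT"]
    denovo = [">rnd-1#LTR/Gypsy", "acgtacgt", ">rnd-2#Unknown", "GGGG"]
    merged = merge_libraries({"homology": homology, "denovo": denovo})
    print(to_fasta(merged), end="")
```

In this sketch, the merged FASTA text would be written to a file and supplied to RepeatMasker as a custom library. Exact-match deduplication is a deliberately simple choice; it does not collapse near-identical consensus sequences, which would require clustering.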