2.7 Repetitive sequence annotation
Two methods were combined to identify the repeat content of the genome:
homology-based and de novo prediction. For homology-based analysis, we
identified known transposable elements (TEs) in the C. sonnerati genome
with RepeatMasker 4.0.9 (Tarailo-Graovac et al., 2009), searching against
the Repbase TE library (Jurka et al., 2000, 2005). RepeatProteinMask
searches were also conducted using the TE protein database as the query
library. For de novo prediction, we constructed a de novo repeat
library of the C. sonnerati genome using RepeatModeler
(http://www. org/RepeatModeler/), which automatically runs two
core de novo repeat-finding programs, RECON v1.08 (Bao & Eddy, 2002)
and RepeatScout v1.0.5 (Price et al., 2005), to build, refine and
classify consensus models of putative interspersed repeats in the
C. sonnerati genome. Furthermore, we performed a de novo search for
long terminal repeat (LTR) retrotransposons in the C. sonnerati genome
sequences using LTR_FINDER v1.0.7 (Xu
& Wang, 2007). We also identified tandem repeats using Tandem Repeats
Finder (TRF) (Benson, 1999), and non-interspersed repeat sequences,
including low-complexity repeats, satellites and simple repeats, using
RepeatMasker. Finally, we merged the libraries from the two methods and
used RepeatMasker with the merged library to identify the repeat content
of the genome.
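The text does not state how overlaps between the outputs of the different tools (RepeatMasker, RepeatProteinMask, TRF, LTR_FINDER) were resolved when reporting repeat content. One common approach, shown here only as an illustrative sketch, is to take the union of all annotated intervals per sequence so that bases hit by several tools are counted once. The function names (`merge_intervals`, `masked_bases`, `repeat_content`) and the input layout are hypothetical; the sketch assumes the tool outputs have already been parsed into `(sequence_id, start, end, source)` tuples with 1-based inclusive coordinates, as in GFF3. Parsing of the individual output formats is not shown.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Union of 1-based inclusive (start, end) intervals.

    Overlapping or directly adjacent intervals are joined into one.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def masked_bases(hits):
    """Count bases covered by at least one hit, without double-counting."""
    by_seq = defaultdict(list)
    for seq_id, start, end, _source in hits:
        by_seq[seq_id].append((start, end))
    return sum(
        end - start + 1
        for intervals in by_seq.values()
        for start, end in merge_intervals(intervals)
    )


def repeat_content(hits, genome_size):
    """Percent of the genome covered, per source and combined (non-redundant)."""
    by_source = defaultdict(list)
    for hit in hits:
        by_source[hit[3]].append(hit)
    report = {
        source: 100.0 * masked_bases(source_hits) / genome_size
        for source, source_hits in by_source.items()
    }
    report["total"] = 100.0 * masked_bases(hits) / genome_size
    return report


# Hypothetical toy annotations on a 1,000 bp "genome".
hits = [
    ("chr1", 1, 100, "RepeatMasker"),
    ("chr1", 51, 150, "RepeatProteinMask"),  # overlaps the hit above by 50 bp
    ("chr1", 301, 400, "TRF"),
    ("chr2", 1, 50, "LTR_FINDER"),
]
print(repeat_content(hits, genome_size=1_000))
```

In this toy example the per-source percentages sum to 35%, but the combined, non-redundant value is 30% because the 50 bp shared by the RepeatMasker and RepeatProteinMask hits is counted only once.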