A Survey of Annotated smORF-Encoded Polypeptides in Bacteria Interacting
with Eukaryotic Hosts
Abstract
smORF-encoded polypeptides (SEPs), the products of small open reading
frames (smORFs), are difficult to predict because of their small size.
Although modern genome annotation tools can identify smORFs, the
reliability of these predictions is often uncertain. Furthermore,
experimental validation of smORFs has primarily focused on a limited set
of model organisms. Here we conduct a comprehensive analysis of
annotated smORFs in a diverse range of bacteria interacting with
eukaryotic hosts. Our analysis revealed that bacterial genomes typically
harbor between 100 and 300 annotated smORFs, most of which encode SEPs
longer than 40 residues and are annotated as hypothetical proteins. We show
that functional annotation of SEPs can be improved to some extent with
the currently available resources, and that SEPs exhibit distinct
functional profiles in bacteria associated with different host types
(plant vs. animal). We also found that most experimentally validated
SEPs are conserved, and that all annotated SEPs begin with methionine,
whereas experimentally validated SEPs do not always do so. Our findings
underscore the need for improved annotation
methods and further experimental characterization to fully understand
the functional roles and evolutionary significance of smORFs in
bacteria-host interactions.