PETase homologues
We are aware that the term PETase is debated, but here we define all
PET-active enzymes as PETases.
Sixteen protein sequences for enzymes with known activity against PET
were clustered using CD-HIT (version 4.6.8-1) at a threshold of 90%
sequence identity and a word length of 5 to derive a reduced set of
twelve centroid sequences 15,16. These protein
sequences were aligned in a structure-guided multiple sequence alignment
by T-COFFEE (version 11.00.8cbe486-1) 17. A profile
hidden Markov model (HMM) was derived from this multiple sequence
alignment by HMMER (version 3.1b2, http://hmmer.org). The profile
HMM was trimmed by selecting alignment columns that corresponded to the
region between amino acid positions 32 and 274 in the PETase from Ideonella sakaiensis (IsPETase, UniProt identifier
A0A0K8P6T7) to avoid ambiguities at the N- and C-termini (Table
S4 and Figure S1). The profile HMM and the underlying multiple
sequence alignment can be downloaded from
https://doi.org/10.18419/darus-2055. This PETase-profile HMM was
used to search both the NCBI non-redundant (nr) protein database and the
Protein Data Bank (PDB) for an update of the Lipase Engineering
Database (LED, https://led.biocatnet.de), which was previously
established as a collection of protein sequences from α/β-hydrolases 18–20. Hits for the PETase-profile HMM were selected
from the HMMER results with a minimum score of 100, a minimum profile
coverage of 95%, and a maximum bias/score ratio of 10%.
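
The three selection criteria can be illustrated with a minimal Python sketch that filters the per-domain table written by `hmmsearch --domtblout`. This is not the authors' script. It assumes that the score and bias are the full-sequence values, and that profile coverage is the fraction of profile positions spanned by a single domain alignment (`hmm to` minus `hmm from` plus one, divided by the profile length). The names `DomainHit`, `parse_domtblout`, `passes_thresholds`, and `select_hits` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class DomainHit(NamedTuple):
    target: str          # target sequence name
    score: float         # full-sequence bit score (assumption: not the per-domain score)
    bias: float          # full-sequence bias correction
    hmm_from: int        # first profile position covered by the domain alignment
    hmm_to: int          # last profile position covered by the domain alignment
    profile_length: int  # length of the query profile HMM


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per data row of an hmmsearch --domtblout table."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 22 whitespace-separated columns followed by a free-text description
        f = line.split(maxsplit=22)
        yield DomainHit(
            target=f[0],
            score=float(f[7]),
            bias=float(f[8]),
            hmm_from=int(f[15]),
            hmm_to=int(f[16]),
            profile_length=int(f[5]),
        )


def passes_thresholds(
    hit: DomainHit,
    min_score: float = 100.0,
    min_coverage: float = 0.95,
    max_bias_ratio: float = 0.10,
) -> bool:
    """Apply the score, profile coverage, and bias/score criteria."""
    if hit.score < min_score:
        return False  # also ensures a positive denominator below
    coverage = (hit.hmm_to - hit.hmm_from + 1) / hit.profile_length
    return coverage >= min_coverage and hit.bias / hit.score <= max_bias_ratio


def select_hits(lines: Iterable[str]) -> List[str]:
    """Return the target names of all rows that satisfy the three criteria."""
    return [h.target for h in parse_domtblout(lines) if passes_thresholds(h)]
```

Under these assumptions, a target whose alignment is split across several domains is judged per domain, so a different definition of coverage (for example, the union of all domains of a target) could select a slightly different set of hits.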
HMMER was also used to identify the C-terminal region for the Type IX
secretion system sorting domain, using the profile HMM TIGR04183, which
was derived from a multiple sequence alignment of 889 protein sequences
in the TIGRFAM database
(http://tigrfams.jcvi.org/cgi-bin/index.cgi); hits with an E-value
below 1 were accepted.
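
As a hedged illustration of this step, the following Python sketch reads an `hmmsearch --domtblout` table for TIGR04183 and reports the envelope coordinates of each accepted domain on its target sequence. The text does not state which E-value was thresholded, so the per-domain independent E-value (i-Evalue) is used here as an assumption. The names `SortingDomain` and `find_sorting_domains` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class SortingDomain(NamedTuple):
    target: str         # target sequence name
    evalue: float       # per-domain independent E-value (assumed criterion)
    env_from: int       # start of the domain envelope on the target sequence
    env_to: int         # end of the domain envelope on the target sequence
    target_length: int  # full length of the target sequence


def find_sorting_domains(
    lines: Iterable[str], max_evalue: float = 1.0
) -> Iterator[SortingDomain]:
    """Yield domains from a --domtblout table whose E-value is below max_evalue."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.split(maxsplit=22)
        evalue = float(f[12])  # column 13: i-Evalue of this domain
        if evalue < max_evalue:  # strict "below" comparison, as in the text
            yield SortingDomain(
                target=f[0],
                evalue=evalue,
                env_from=int(f[19]),
                env_to=int(f[20]),
                target_length=int(f[2]),
            )
```

Comparing `env_to` with `target_length` then indicates how close to the C-terminus the detected domain lies.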