PETase homologues
We are aware that the term PETase is debated, but we use it here for all enzymes with activity against PET. Sixteen protein sequences of enzymes with known activity against PET were clustered with CD-HIT (version 4.6.8-1) at a sequence identity threshold of 90% and a word length of 5, yielding a reduced set of twelve centroid sequences [15,16]. The twelve centroid sequences were aligned in a structure-guided multiple sequence alignment with T-COFFEE (version 11.00.8cbe486-1) [17]. A profile hidden Markov model (HMM) was built from this multiple sequence alignment with HMMER (version 3.1b2, http://hmmer.org). To avoid ambiguities at the N- and C-termini, the profile HMM was trimmed to the alignment columns corresponding to amino acid positions 32 to 274 of the PETase from Ideonella sakaiensis (IsPETase, UniProt identifier A0A0K8P6T7) (Table S4 and Figure S1). The profile HMM and the underlying multiple sequence alignment can be downloaded from https://doi.org/10.18419/darus-2055. This PETase profile HMM was used to search both the NCBI non-redundant (nr) protein database and the Protein Data Bank (PDB) to update the Lipase Engineering Database (LED, https://led.biocatnet.de), a previously established collection of α/β-hydrolase protein sequences [18–20]. Hits were retained from the HMMER results if they had a score of at least 100, a profile coverage of at least 95%, and a bias-to-score ratio of at most 10%.
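CD-HIT reduces a sequence set by greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins the first existing representative it matches at or above the identity threshold or becomes a new representative. The minimal sketch below illustrates that idea only. It assumes a crude ungapped identity (matching positions divided by the length of the shorter sequence) as a stand-in for CD-HIT's alignment-based identity and omits its short-word filter (word length 5 in this study). The function names are hypothetical.

```python
from typing import Callable, Dict, List


def identity(a: str, b: str) -> float:
    """Crude ungapped identity: matching positions / length of the shorter sequence.

    Assumption: CD-HIT itself aligns the sequences and applies a short-word
    pre-filter; this position-by-position comparison is only a stand-in.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)  # zip stops at the shorter sequence
    return matches / shorter


def greedy_cluster(
    seqs: Dict[str, str],
    threshold: float = 0.90,
    ident: Callable[[str, str], float] = identity,
) -> Dict[str, List[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Returns a mapping from each representative (centroid) to its members.
    """
    clusters: Dict[str, List[str]] = {}
    # Longest sequences first, so each representative is the longest member of its cluster.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if ident(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters
```

With a toy input such as `{"A": "MKTAYIAKQR", "B": "MKTAYIAKQW", "C": "GGGGGGGGGG"}`, sequences A and B share 9 of 10 positions and fall into one cluster, while C forms its own, giving two centroids.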
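Restricting the model to IsPETase positions 32 to 274 requires mapping residue numbers of the ungapped reference onto alignment columns. The sketch below assumes the trimming is applied to the multiple sequence alignment before the profile HMM is built (for example with `hmmbuild`); note that `hmmbuild` may itself discard gappy columns, so model match states need not correspond one-to-one with alignment columns. The reference identifier and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def reference_column_range(
    aligned_ref: str, start: int, end: int, gaps: str = "-."
) -> Tuple[int, int]:
    """Return the 0-based first and last alignment columns that hold residues
    start..end (1-based, inclusive) of the ungapped reference sequence."""
    first: Optional[int] = None
    last: Optional[int] = None
    pos = 0  # residue counter in the ungapped reference
    for col, ch in enumerate(aligned_ref):
        if ch in gaps:
            continue
        pos += 1
        if pos == start:
            first = col
        if pos == end:
            last = col
            break
    if first is None or last is None:
        raise ValueError("reference does not cover the requested residue range")
    return first, last


def trim_alignment(msa: Dict[str, str], ref_id: str, start: int, end: int) -> Dict[str, str]:
    """Keep only the alignment columns spanned by the reference residue range."""
    first, last = reference_column_range(msa[ref_id], start, end)
    return {name: row[first:last + 1] for name, row in msa.items()}
```

For the alignment described here, a call would look like `trim_alignment(msa, "IsPETase", 32, 274)`, where `msa` maps (hypothetical) sequence identifiers to aligned rows of equal length.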
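The three hit criteria can be expressed as a single predicate. This is a minimal sketch under stated assumptions: profile coverage is taken as the fraction of profile positions spanned by the hit, `(hmm_to - hmm_from + 1) / model_len`, and the score and bias are the bit score and composition-bias values that HMMER reports. The section does not specify whether full-sequence or per-domain values were used, and the `Hit` record is a hypothetical container rather than a HMMER output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    target: str
    score: float    # bit score reported by HMMER
    bias: float     # composition bias correction, in bits
    hmm_from: int   # first profile position covered by the hit (1-based)
    hmm_to: int     # last profile position covered by the hit (1-based)
    model_len: int  # number of positions in the profile HMM


def passes(hit: Hit, min_score: float = 100.0, min_coverage: float = 0.95,
           max_bias_ratio: float = 0.10) -> bool:
    """Apply the score, profile coverage and bias/score thresholds."""
    coverage = (hit.hmm_to - hit.hmm_from + 1) / hit.model_len
    # The score check comes first, so bias / score is never computed for a zero score.
    return (hit.score >= min_score
            and coverage >= min_coverage
            and hit.bias / hit.score <= max_bias_ratio)


def filter_hits(hits: Iterable[Hit]) -> List[str]:
    """Return the target names of hits that meet all three criteria."""
    return [h.target for h in hits if passes(h)]
```

A hit scoring 250 bits with a bias of 5 bits that spans the whole profile is kept; one scoring 90 bits, one with a bias above 10% of its score, or one covering less than 95% of the profile is discarded.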
HMMER was also used to identify C-terminal Type IX secretion system (T9SS) sorting domains with the profile HMM TIGR04183, which was derived from a multiple sequence alignment of 889 protein sequences in the TIGRFAM database (http://tigrfams.jcvi.org/cgi-bin/index.cgi); hits with an E-value below 1 were retained.
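The E-value cut-off can be applied to the tabular output that `hmmsearch` writes with its `--tblout` option, for example from a run like `hmmsearch --tblout t9ss.tbl TIGR04183.hmm proteins.fasta` (file names hypothetical). In that table, comment lines start with `#`, and the first and fifth whitespace-separated fields are the target name and the full-sequence E-value. The sketch assumes the full-sequence E-value is the relevant one, which the section does not state; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_tblout(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (target name, full-sequence E-value) pairs from hmmsearch --tblout lines."""
    hits: List[Tuple[str, float]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()  # the free-text description at the end is ignored
        hits.append((fields[0], float(fields[4])))
    return hits


def t9ss_candidates(lines: Iterable[str], max_evalue: float = 1.0) -> List[str]:
    """Keep targets whose E-value is strictly below the cut-off."""
    return [name for name, evalue in parse_tblout(lines) if evalue < max_evalue]
```

Reading the table with `open("t9ss.tbl")` and passing the file object to `t9ss_candidates` would return the names of proteins predicted to carry a T9SS sorting domain under this cut-off.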