Sequence motifs
The PETase-profile HMM was applied to analyse the conservation of amino acid residues in the 2930 PETase core domains annotated in the LED (Table S6), relative to the equivalent positions in the PETase from Ideonella sakaiensis (Is PETase, Uniprot identifier A0A0K8P6T7) and in LCC (Uniprot identifier G9BY57). The catalytic triad, the previously suggested PET binding subsite I (which includes an aromatic clamp for possible substrate interaction) and PET binding subsite II from ref. 39 were found to be highly conserved (Table 2). The extension of the second α-helix and the extended loop region, both previously described as functionally relevant in Is PETase, were also found in several PETase homologues in the LED.
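Comparing homologues "at the equivalent positions" of Is PETase means first translating reference residue numbers into alignment columns. The sketch below illustrates that step on a toy alignment. It is a simplified stand-in for the profile-HMM analysis, not a reproduction of it. It assumes the sequences are already aligned to the reference (for example, taken from an HMM alignment), and it uses plain per-column identity as the conservation measure. The function names, the `start` offset and the toy sequences are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def ref_column_map(ref_aligned: str, start: int = 1) -> dict[int, int]:
    """Map reference residue numbers to 0-based alignment columns.

    `start` is the residue number of the first non-gap reference character
    (useful if the alignment covers only a core domain, not the full protein).
    """
    mapping: dict[int, int] = {}
    pos = start - 1
    for col, aa in enumerate(ref_aligned):
        if aa != "-":  # gap columns have no reference residue number
            pos += 1
            mapping[pos] = col
    return mapping


def conservation_at(ref_aligned: str, homologs_aligned: list[str],
                    positions: list[int], start: int = 1
                    ) -> dict[int, tuple[str, float, str]]:
    """For each reference position return (reference residue,
    fraction of homologues with the identical residue, most frequent residue)."""
    if not homologs_aligned:
        raise ValueError("need at least one aligned homologue")
    cols = ref_column_map(ref_aligned, start)
    result: dict[int, tuple[str, float, str]] = {}
    for pos in positions:
        col = cols[pos]
        ref_aa = ref_aligned[col]
        column = [seq[col] for seq in homologs_aligned]
        identical = sum(aa == ref_aa for aa in column)  # a gap never counts as identical
        most_common = Counter(column).most_common(1)[0][0]
        result[pos] = (ref_aa, identical / len(column), most_common)
    return result


if __name__ == "__main__":
    # Toy alignment (hypothetical): reference row plus three homologues.
    ref = "AY-QS"
    homologs = ["AYGQS", "AF-QS", "GY-QT"]
    print(conservation_at(ref, homologs, [2, 3, 4]))
```

A real analysis would weight sequences and score columns against the profile's emission probabilities; raw identity is used here only because it keeps the position-mapping logic easy to follow.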
Using the Is PETase position numbering, we suggest the following typical PETase sequence motif (X denotes an arbitrary amino acid): [YF]87, Q119, X3 139-141, S160, M161, W185, D206, H237, X6 242-247, followed by one of the amino acid substitutions previously published in ref. 40. Interestingly, two sequences, from an uncultured bacterium (NCBI: ACC95208.1) and from Alkalilimnicola ehrlichii (NCBI: WP_116302080.1), contain the PETase sequence motif together with W238, a substitution reported to improve activity and substrate binding. Four additional sequences, from Caldimonas manganoxidans (NCBI: WP_019560450.1), C. taiwanensis (NCBI: WP_062195544.1), Rhizobacter gummiphilus (NCBI: WP_085749610.1) and Aquabacterium sp. (NCBI: MBI3384080.1), contain the PETase sequence motif together with M241, a substitution reported to improve thermostability. These six novel protein sequences, each selected by a motif of seventeen amino acid positions in total, are proposed as candidates for future studies on PETase activity.
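The motif can be encoded as a position-to-allowed-residue table in Is PETase numbering. The sketch below does this and screens a query that has already been aligned to Is PETase. It also shows the motif's size: sixteen fixed or wildcard positions plus one substitution position give seventeen. Several details are assumptions rather than taken from the original method. "X" is read as "any residue, but not a gap". Only the two substitutions discussed here (W238, M241) are listed, not the full set from ref. 40. All names are hypothetical.

```python
from __future__ import annotations

# Motif positions in Is PETase numbering, as listed in the text.
# None stands for "X": any amino acid, but a residue must be present (assumption).
PETASE_MOTIF: dict[int, str | None] = {
    87: "YF", 119: "Q",
    139: None, 140: None, 141: None,
    160: "S", 161: "M", 185: "W", 206: "D", 237: "H",
    242: None, 243: None, 244: None, 245: None, 246: None, 247: None,
}

# Only the two substitutions discussed in this section; ref. 40 lists more.
SUBSTITUTIONS: dict[int, str] = {238: "W", 241: "M"}


def residues_by_ref_position(ref_aligned: str, query_aligned: str,
                             start: int = 1) -> dict[int, str]:
    """Return {Is PETase residue number: query character ('-' for a gap)}."""
    mapping: dict[int, str] = {}
    pos = start - 1
    for r, q in zip(ref_aligned, query_aligned):
        if r != "-":  # only reference residues carry a position number
            pos += 1
            mapping[pos] = q
    return mapping


def matches_motif(residues: dict[int, str]) -> bool:
    """True if every motif position is occupied by an allowed residue."""
    for pos, allowed in PETASE_MOTIF.items():
        aa = residues.get(pos, "-")
        if aa == "-":
            return False
        if allowed is not None and aa not in allowed:
            return False
    return True


def motif_substitutions(residues: dict[int, str]) -> list[str]:
    """List the known substitutions (e.g. 'W238') carried by the query."""
    return [f"{aa}{pos}" for pos, aa in SUBSTITUTIONS.items()
            if residues.get(pos) == aa]


if __name__ == "__main__":
    # Hypothetical query: alanine everywhere except the motif residues and W238.
    demo = {pos: "A" for pos in range(1, 261)}
    demo.update({87: "F", 119: "Q", 160: "S", 161: "M", 185: "W",
                 206: "D", 237: "H", 238: "W"})
    print(matches_motif(demo), motif_substitutions(demo))
```

In this reading, a sequence counts as a candidate only if `matches_motif` is true and `motif_substitutions` is non-empty. The `start` argument matters if the aligned reference does not begin at residue 1 of the numbering used for the motif.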
Polyurethane (PUR)-active enzymes