3.1 | To split or not to split
3.1.1 | Summary
Of the 77 CASP15 tertiary structure prediction targets, 43 were single-domain targets, 21 had two domains, and the remaining 13 had three or more domains (Table 1). For 52 targets no domain rearrangement was necessary, and the targets were evaluated either as whole-length structures (41) or as unchanged constituent domains (11). Of the remaining 25 targets, in 20 cases we merged at least some domains according to Grishin plots, in two cases we merged domains based on other considerations, and in three cases we split targets into more EUs than suggested by the domain parsing programs. The domain splitting and re-joining procedure (Methods) yielded 112 evaluation units, 109 of which were included in the final tertiary structure evaluation [21]; the other three (T1114s1-D2, T1157s1-D2 and T1157s1-D3) were cancelled because of the low resolution of the cryo-EM maps in their local areas.
Out of 34 multi-domain targets, 14 were evaluated as one EU and 20 were split into multiple EUs (Table 1). Below we discuss different scenarios of forming evaluation units and present case studies for some targets.
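The counts quoted in this summary can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the variable names are illustrative, and the expansion of "-D3" to T1157s1-D3 is our reading of the text.

```python
# Counts taken from the summary above; all names are illustrative only.
total_targets = 77
one_domain = 43
two_domain = 21
three_plus_domain = total_targets - one_domain - two_domain  # "the rest"

multi_domain = two_domain + three_plus_domain

# Targets that needed no domain rearrangement.
no_rearrangement = {"whole_length": 41, "unchanged_domains": 11}
# Targets where domains were merged or split further.
rearranged = {"merged_grishin": 20, "merged_other": 2, "split_further": 3}

evaluation_units = 112
# Assumption: "-D3" in the text refers to T1157s1-D3.
cancelled = ["T1114s1-D2", "T1157s1-D2", "T1157s1-D3"]
evaluated_units = evaluation_units - len(cancelled)

# Multi-domain targets evaluated as one EU vs. split into several EUs.
multi_as_one_eu = 14
multi_split = 20


def tallies_consistent():
    """Return True if the quoted counts add up to the quoted totals."""
    return (
        sum(no_rearrangement.values()) + sum(rearranged.values()) == total_targets
        and multi_as_one_eu + multi_split == multi_domain
        and evaluated_units == 109
    )


if __name__ == "__main__":
    print("three or more domains:", three_plus_domain)
    print("multi-domain targets:", multi_domain)
    print("tallies consistent:", tallies_consistent())
```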
3.1.2 | Multi-domain targets not requiring splitting (14)
Fourteen multi-domain targets (as defined by the automatic parsers; section 2.1, Step 2) were proposed for evaluation without splitting into substructures.
In two cases, T1131 and T1133, we disagreed with the automatic domain parsing results and considered the targets to be single-domain structures. Target T1131 is a small protein in which a long central helix holds the two parts of the structure together and is needed for the structural integrity of the protein. Target T1133 (PDB: 8DYS) is a nine-bladed beta-propeller that is fully and reliably covered by templates (e.g., 3WJ9_B) and was well predicted as a whole.
For eleven targets, the decision to join domains into single EUs was based on the analysis of Grishin plots. Two examples of such targets are shown in Figure 1. Even though these targets are clearly two-domain entities, most groups predicted their whole structures as accurately as the constituent domains, so splitting was not required.
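The reasoning behind such a merge decision can be illustrated with a minimal sketch. It assumes that each model has one whole-structure score and one score per domain on a 0-100 scale (e.g., GDT_TS), and that the domain scores are combined as a length-weighted average before being compared with the whole-structure score. The function names, the tolerance and the majority cutoff are hypothetical and are not the criteria used in the actual assessment, which relied on visual analysis of the plots.

```python
def weighted_domain_score(domain_scores, domain_lengths):
    """Length-weighted average of per-domain scores for one model."""
    total_length = sum(domain_lengths)
    weighted_sum = sum(s * n for s, n in zip(domain_scores, domain_lengths))
    return weighted_sum / total_length


def fraction_predicted_as_whole(models, domain_lengths, tolerance=5.0):
    """Fraction of models whose whole-structure score is within `tolerance`
    points of (or above) the length-weighted domain score.

    `models` is a list of (whole_score, [domain_scores]) pairs.
    `tolerance` is a hypothetical cutoff chosen for illustration.
    """
    close = 0
    for whole_score, domain_scores in models:
        domain_based = weighted_domain_score(domain_scores, domain_lengths)
        if domain_based - whole_score <= tolerance:
            close += 1
    return close / len(models)


def suggest_single_eu(models, domain_lengths, tolerance=5.0, min_fraction=0.5):
    """Suggest one EU when most models score the whole target about as well
    as its domains. `min_fraction` is a hypothetical majority cutoff."""
    fraction = fraction_predicted_as_whole(models, domain_lengths, tolerance)
    return fraction >= min_fraction


if __name__ == "__main__":
    # Made-up scores for a two-domain target with domains of 100 and 50 residues.
    example_models = [
        (80.0, [82.0, 81.0]),
        (60.0, [85.0, 90.0]),
        (75.0, [78.0, 74.0]),
    ]
    example_lengths = [100, 50]
    print(fraction_predicted_as_whole(example_models, example_lengths))
    print(suggest_single_eu(example_models, example_lengths))
```

In this toy example, two of the three models lose little accuracy when scored on the whole structure, so the sketch would suggest keeping the target as a single EU. A large, systematic drop in the whole-structure score would instead point towards splitting.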