3.2 Protease substrate specificity profiling using YESS.
Because all its components are DNA-encoded, the YESS system offers a
platform capable of performing three high-throughput experiments: enzyme
engineering, substrate specificity profiling, and mutational scanning.
Performing these experiments would otherwise require three separate
technologies. For instance, one can engineer a protease and profile the
substrate specificity of evolved variants during the engineering
campaign. Furthermore, when combined with next-generation sequencing and
deep learning, YESS can map the substrate specificity landscape of
proteases (Figure 3B).
To optimize YESS for protease substrate specificity profiling, Qing and
coworkers sought to analyze and remove major endogenous proteolytic
events in the yeast secretory pathway, which could confound the analysis of
cleavage specificities of recombinantly expressed proteases (Li et al.,
2017). Screening a DNA-encoded pentapeptide library revealed that a
secretory pathway protease cleaved many arginine- and lysine-containing
sequences. This protease was identified as the Golgi-resident kex2
protease, with a major cleavage pattern of Ali/Leu-X-Lys/Arg-Arg. These
results guided the construction of a kex2 knockout yeast strain, which is
better suited to profiling the substrate specificity of proteases,
particularly those with trypsin-like cleavage patterns.
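As a rough illustration of how the Ali/Leu-X-Lys/Arg-Arg pattern could be used to flag library members at risk of background cleavage, the sketch below scans peptide sequences for the motif. It is not the analysis used by Li et al.; the function names are hypothetical, and the reading of "Ali" as a small aliphatic residue (Ala, Val, or Ile here) is an assumption that should be checked against the original paper.

```python
import re

# The 20 canonical amino acids, used for the wildcard position "X".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: "Ali" is read as a small aliphatic residue (A, V, I).
# Leu is listed explicitly in the pattern, so it is added alongside.
ALI_OR_LEU = "AVIL"

# Ali/Leu - X - Lys/Arg - Arg, wrapped in a lookahead so that
# overlapping occurrences are all reported.
KEX2_LIKE = re.compile(rf"(?=([{ALI_OR_LEU}][{AMINO_ACIDS}][KR]R))")


def kex2_like_sites(peptide):
    """Return (0-based start, matched 4-mer) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in KEX2_LIKE.finditer(peptide.upper())]


def flag_library(peptides):
    """Map each peptide to True if it contains at least one motif occurrence."""
    return {p: bool(kex2_like_sites(p)) for p in peptides}
```

For example, `flag_library(["GLAKRS", "GGGGG"])` marks only the first peptide. A filter of this kind only matches the reported consensus; it says nothing about cleavage efficiency.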
Predicting PTM-enzyme substrate specificity is essential for designing
specific activity probes and inhibitors, inferring physiological
substrates, and guiding PTM-enzyme substrate specificity engineering.
The main obstacle to overcome in enzyme-substrate specificity profiling
is undersampling. Substrate specificity is relative, and for promiscuous
enzymes, it is better defined when more substrates are interrogated.
Unfortunately, even the largest substrate libraries generated with yeast
or phage display (>10^9 unique sequences)
only sample a fraction of possible amino acid combinations in a
heptapeptide library. Machine learning can overcome this bottleneck, and
the DNA-encoded substrate libraries in the YESS system provide the
sequence-function datasets to build ML models for substrate specificity
prediction. Khare and coworkers showed that combining the
YESS system, computational modeling, and machine learning allows one to
entirely map the P6-P2 substrate specificity and energetic landscape of
HCVp (Pethe et al., 2019). They sorted a naïve pentapeptide library
spanning the P6 to P2 sites of HCVp and selected three distinct
populations by FACS: uncleaved, partially cleaved, and completely
cleaved sequences. They showed that fully and partially cleaved
sequences form separate clusters and that one can map sequence
preference trajectories by single substrate mutation tracking within the
data. To predict the cleavability of the entire pentapeptide library
diversity (3.2 million sequences), they implemented a support vector
machine method trained on energetic features of experimentally derived
sequences obtained from Rosetta modeling. This approach allowed them to
reconstruct the pentapeptide substrate landscape completely. Most
importantly, they discovered and characterized a novel cleavage pattern
(PSTVF) in addition to the four previously known HCVp cleavage
specificities. This deep analysis could be tailored to any PTM-enzyme
and its variants (including drug-resistant mutations) to explore
sequence and structure landscapes of enzyme-substrate interactions that are
not accessible by experiments alone. One obvious next step would be to
leverage machine learning and substrate profiling to infer physiological
substrates as a complement to more expensive proteomics approaches such
as SILAC.
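The undersampling argument made earlier in this paragraph can be put in numbers with a short calculation. The sketch below computes the theoretical diversity of a fully randomized peptide library and the expected fraction of that diversity recovered from a given number of clones. It assumes uniform, independent sampling of all 20 amino acids at each position, which is an idealization: real libraries carry codon bias and stop codons, so actual coverage will be lower. The function names are illustrative only.

```python
import math

N_AMINO_ACIDS = 20


def library_size(length):
    """Number of distinct peptides of the given length (20^length)."""
    return N_AMINO_ACIDS ** length


def expected_coverage(n_clones, length):
    """Expected fraction of distinct sequences observed at least once.

    ASSUMPTION: clones are drawn uniformly at random with replacement,
    so coverage is approximately 1 - exp(-n_clones / library_size).
    """
    m = library_size(length)
    return -math.expm1(-n_clones / m)


if __name__ == "__main__":
    print(library_size(5))  # pentapeptide diversity
    print(library_size(7))  # heptapeptide diversity
    print(round(expected_coverage(1e9, 7), 3))
```

Under these assumptions, a pentapeptide library contains 20^5 = 3.2 million sequences, matching the figure quoted above, whereas a heptapeptide library contains 20^7 = 1.28 x 10^9. A library of 10^9 clones would therefore be expected to cover only about half of heptapeptide space even in the ideal case, and the gap widens rapidly for longer substrates.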