Introduction
The class 2 clustered regularly interspaced short palindromic repeats
(CRISPR)-CRISPR-associated (Cas) proteins, which are derived from
prokaryotic immune systems, were identified as programmable, RNA-guided
nucleases.[1-7] Generally, each CRISPR-Cas system is composed of Cas
proteins and a guide RNA. In a broad spectrum of eukaryotic and
prokaryotic species, CRISPR/Cas9 and CRISPR/Cpf1 can be expressed
heterologously with their corresponding guide RNAs to target
complementary DNA sequences, which makes them powerful genome editing
tools.[8-11] Cpf1 has been reported to differ from Cas9 in several respects:
first, Cpf1 processes its own guide RNAs and does not require a
tracrRNA; second, there is a longer distance between the seed sequence
and the cleavage site; third, Cpf1 recognizes a thymidine-rich PAM
sequence; fourth, Cpf1 generates cleavage products with 5′ overhangs.[12,13,14]
These features allow Cpf1 to expand the toolkit for genome editing.[15,16,17]
A general issue for the application of Cpf1 appears to be the
unpredictable success of guide RNA design.[18,19] Only limited
information on the relationship between guide RNA sequence and activity
is available. A number of tools and applications have been developed to
predict guide RNA performance for Cas9.[20-28] It may seem that
guide RNA design for Cpf1 would benefit from this information and these
strategies. Recent studies of Cpf1 have attempted to describe the guide RNA
sequence-activity relationship and to present algorithms that predict the
activity of Cpf1 guide RNAs.[20-22]
Nevertheless, such approaches were developed in mammalian cell lines,
where Cpf1 activities at endogenous sites were found to be affected by
chromatin accessibility as well as target sequence composition. In
addition, the known preference of the nonhomologous end-joining (NHEJ)
pathway for different double-strand break (DSB) substrates may also
reshape the guide RNA activity landscape.
To exclude these factors and gain more general insights into the
relationship between guide RNA sequence and activity, we performed
high-throughput screening experiments and collected large-scale datasets
in E. coli cells, in which the NHEJ molecular machinery is entirely
absent.
In this paper, we describe a library of >12,500 target
sequence and guide RNA pairs and evaluate guide RNA activity in E. coli by associating CRISPR/Cpf1-induced DNA cleavage with
cellular lethality. The measured guide RNA activities showed considerable
diversity. Notably, current guide RNA activity prediction models showed
Spearman correlations of only 0.56 when tested on our data. We therefore
propose a computational approach to Cpf1 guide RNA design that predicts
efficient and inefficient guide RNAs with improved performance (Spearman
correlation of 0.80). Lastly, our model identified important guide RNA
sequence features that contribute to DNA cleavage.
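
For readers unfamiliar with the evaluation metric, the Spearman correlation quoted above is the Pearson correlation computed on the ranks of the measured and predicted activities, so it rewards a model for ordering guide RNAs correctly rather than for matching absolute values. The following is a minimal sketch in Python, not the authors' evaluation code; the function names `average_ranks` and `spearman` and the five toy activity values are hypothetical and are not taken from the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over any run of equal values (ties).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only (NOT data from this study):
# measured activity versus a model's predicted score for five guide RNAs.
measured = [0.91, 0.15, 0.62, 0.40, 0.78]
predicted = [0.80, 0.30, 0.35, 0.45, 0.70]
rho = spearman(measured, predicted)
print(round(rho, 2))  # prints 0.9
```

In this toy example the model swaps the order of only two guide RNAs, which lowers the correlation from 1.0 to 0.9. Because only ranks enter the calculation, any monotonic rescaling of the predicted scores leaves the value unchanged.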