SPiP: Splicing Prediction Pipeline, a machine learning tool for massive
detection of exonic and intronic variant effect on mRNA splicing.
Abstract
Modeling splicing is essential for tackling the challenge of variant
interpretation, because any nucleotide variant can be pathogenic if it
affects pre-mRNA splicing by disrupting or creating splicing motifs
such as 5’/3’ splice sites, branch sites or splicing regulatory
elements. Unfortunately, most in silico tools focus on a single type
of splicing motif. We therefore developed the Splicing Prediction
Pipeline (SPiP), which relies on a machine learning approach to
comprehensively assess variant effects on the different splicing
motifs in a single bioinformatic analysis. We gathered a curated set
of 4,616 variants distributed along the entire sequence of 227 genes,
together with their corresponding splicing studies. Bayesian analysis
was used to determine the number of control variants (i.e., variants
with no impact on splicing) needed to mimic the deluge of variants
produced by high-throughput sequencing. Results show that SPiP handles the diversity of
splicing alterations, detecting spliceogenic variants with 83.13%
sensitivity and 99% specificity. Overall performance, measured as the
area under the receiver operating characteristic (ROC) curve, was 0.986,
significantly better than the 0.965 achieved by SpliceAI on the same
dataset. SPiP thus provides a unique suite for comprehensive prediction
of spliceogenicity in the genomic medicine era. SPiP is available at:
[https://sourceforge.net/projects/splicing-prediction-pipeline/](https://sourceforge.net/projects/splicing-prediction-pipeline/)
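Sensitivity and specificity are the fractions of spliceogenic and control variants, respectively, that a predictor classifies correctly at a given score threshold. The ROC AUC summarizes performance across all thresholds and equals the probability that a randomly chosen spliceogenic variant receives a higher score than a randomly chosen control variant. The Python sketch below illustrates how these metrics can be computed. The function names, the 0.5 threshold, and the toy scores are hypothetical and chosen for illustration only; they are not SPiP's code, decision threshold, or data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when a variant is called
    spliceogenic if its score is >= threshold (hypothetical rule).

    labels: 1 = spliceogenic (positive), 0 = no splicing impact (control).
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count as 1/2). Pairwise, O(P * N) comparisons."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from the SPiP study.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.92, 0.75, 0.30, 0.40, 0.10, 0.05, 0.20, 0.15]
    sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")  # 0.667, 1.000
    print(f"AUC={roc_auc(labels, scores):.3f}")              # 0.933
```

The pairwise comparison makes the probabilistic meaning of the AUC explicit; for datasets with thousands of variants, a rank-based (Mann-Whitney) formulation computes the same value more efficiently.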