Driving rare disease research
The ~40,000 openly consented patient records in DECIPHER
contain more than 51,000 variants and ~172,000
phenotypes, and represent a rich dataset to drive rare disease
research. Since its inception in 2004, DECIPHER has been cited more than
2,600 times in peer-reviewed publications (Fig. 7A), a testament to its
impact on rare disease research. In some cases there is a large
genotypic patient series, which allows, for example, the full spectrum
of phenotypes associated with a gene to be recognised. At the time of
writing, the genes with the most open-access sequence variants were NF1 (162), ANKRD11 (123), ARID1B (107), KMT2A (107), and DDX3X (78) (Fig. 7B).
Search: To identify the most relevant patient records and gene
information, DECIPHER offers a powerful search function allowing users to
search using many different categories including gene, phenotype, HPO
identifier, genomic position (in GRCh37 or GRCh38), chromosome band,
pathogenicity, and inheritance. Advanced searches are supported, such as
searching for multiple terms either from the same category (e.g.
multiple phenotypes) or different categories (e.g. gene plus phenotype).
Results are displayed in a tabular format, in addition to genome
browser-based representations.
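As an illustration of how terms from different categories can be combined, the following minimal Python sketch filters a small in-memory list of records by gene and by a set of HPO identifiers. It is not DECIPHER's implementation or API: the `PatientRecord` class, the `search` function, the toy patient identifiers, and the choice to require all supplied phenotype terms (logical AND) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical, simplified record: one gene and a set of HPO identifiers."""
    patient_id: str
    gene: str
    hpo_terms: FrozenSet[str]


def search(records: Iterable[PatientRecord],
           gene: Optional[str] = None,
           hpo_terms: Iterable[str] = ()) -> List[PatientRecord]:
    """Return records matching the gene (if given) and ALL given HPO terms.

    Combining phenotype terms with AND is an assumption of this sketch.
    """
    wanted = frozenset(hpo_terms)
    return [
        r for r in records
        if (gene is None or r.gene == gene) and wanted <= r.hpo_terms
    ]


# Toy data, invented for illustration only.
RECORDS = [
    PatientRecord("P1", "NF1", frozenset({"HP:0001250", "HP:0001263"})),
    PatientRecord("P2", "NF1", frozenset({"HP:0001263"})),
    PatientRecord("P3", "DDX3X", frozenset({"HP:0001250", "HP:0001263"})),
]

# Gene plus two phenotypes (terms from different categories).
hits = search(RECORDS, gene="NF1", hpo_terms=["HP:0001250", "HP:0001263"])
print([r.patient_id for r in hits])
```

Omitting the `gene` argument reduces this to a search over multiple terms from a single category (phenotype only).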
Driving discovery: The genotype-linked phenotypic data allows,
for example, new variant-disease associations to be discovered, such as
loss-of-function variants in ARFGEF1 causing developmental delay
and epilepsy (Thomas et al., 2021). The dataset also enables the
phenotypic spectrum to be extended, both for new syndromes (e.g.
Witteveen-Kolk syndrome, a SIN3A-related disorder; Balasubramanian et al., 2021) and for well-established syndromes (e.g. ALG13 congenital disorder of glycosylation; Alsharhan et
al., 2021). It also permits the understanding of contiguous gene
effects, such as that around ERF, which causes a novel
craniosynostosis syndrome with varying degrees of intellectual
disability (Calpena et al., 2021).
DDD Research variants: In addition to the openly consented
patient data, DECIPHER openly shares the DDD research variants, which
are variants of unknown significance identified in undiagnosed probands
with developmental disorders in the DDD study. These include functional de novo variants and rare loss-of-function homozygous, compound
heterozygous, and hemizygous variants in genes that are neither
developmental disorder genes, nor OMIM-morbid genes. At present this
dataset comprises nearly 5,000 variants. High-level phenotype terms are
provided for each variant (Fig. 7C). The number of patients with each
variant in the DDD dataset is displayed, in addition to the number of
patients identified in the GeneDx and Radboud University Medical Center de novo variant dataset, as described by Kaplanis et al.
(2020). This dataset enables the discovery of new gene-disease
associations.
Bulk data for research: The openly consented patient data is
available for bulk download for research purposes, subject to a data
access agreement. In bulk the data can be used, for example, for
developing new analytical methods, in understanding patterns of
polymorphism, and in refining critical intervals to map genes involved
in specific phenotypes and diseases. The dataset has recently been used
to associate phenotypes with functional systems (Jabato et al.,
2021), and to develop a new tool to assist clinical interpretation of
CNVs (Requena et al., 2021). DECIPHER also shares the data in
bulk for display, subject to a Data Display Agreement. This allows
third-party variant analysis companies and academic genome browser
providers, such as Ensembl and UCSC, to display the data, maximising the
possibility of finding patient matches.
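To make the idea of refining a critical interval from bulk data concrete, the sketch below computes the smallest region of overlap among a set of copy-number variants observed in patients sharing a phenotype. This is one simple approach, shown under stated assumptions; it is not necessarily the method used in the studies cited above. The function name, the representation of CNVs as inclusive `(start, end)` coordinates on a single chromosome and assembly, and the example coordinates are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) on one chromosome


def smallest_region_of_overlap(cnvs: Iterable[Interval]) -> Optional[Interval]:
    """Return the region shared by every CNV, or None if there is none.

    Assumes all intervals lie on the same chromosome and genome assembly.
    """
    cnvs = list(cnvs)
    if not cnvs:
        return None
    start = max(s for s, _ in cnvs)  # rightmost start
    end = min(e for _, e in cnvs)    # leftmost end
    return (start, end) if start <= end else None


# Invented coordinates for three overlapping deletions.
deletions = [(100_000, 500_000), (200_000, 600_000), (150_000, 450_000)]
print(smallest_region_of_overlap(deletions))
```

Genes lying within the returned interval would then be candidates for the shared phenotype. In practice, incomplete penetrance and imprecise breakpoints mean that a strict intersection is only a starting point.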