Genetic Data
As our knowledge of rare disease genetics develops and the interactions between genetic loci become more fully understood, there is a pressing need to visualise all types of genetic variation within a single interface. DECIPHER meets this need by supporting many types of genetic variation, including sequence variants, CNVs, aneuploidy, uniparental disomy (UPD), inversions, insertions and short tandem repeats (STRs) (Fig. 2).
Variant deposition: Variants are deposited using genomic coordinates. Sequence variants can also be deposited using a relevant subset of HGVS nomenclature (den Dunnen et al., 2006) and are normalised (left-aligned and parsimonious) during deposition (Tan et al., 2015). For known STRs, the disease-relevant STR can be selected from a dropdown menu in the web interface. Additional information about each variant, such as inheritance, genotype, pathogenicity, and contribution to phenotype, can also be recorded.
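The normalisation step can be illustrated with a minimal sketch of the left-align and parsimony procedure described by Tan et al. (2015). It assumes a 1-based position, a REF allele that matches the supplied reference string, and a simple `ref_seq` string standing in for the chromosome; the function name `normalise` is hypothetical and this is not DECIPHER's own implementation.

```python
def normalise(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim a REF/ALT pair (sketch after Tan et al., 2015).

    Assumptions: `ref_seq` is the reference sequence for one chromosome (or a
    window starting at position 1), `pos` is 1-based, and `ref` matches it.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalise")
    if ref_seq[pos - 1 : pos - 1 + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")

    # Step 1 (left-align): trim a shared trailing base; whenever an allele
    # becomes empty, extend both alleles by one reference base to the left.
    changed = True
    while changed:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True

    # Step 2 (parsimony): drop shared leading bases while both alleles
    # are longer than one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A "CA" deletion in a CACACA repeat, written at its right-most position,
# is shifted to the left-most equivalent representation.
print(normalise("TGCACACAG", 6, "ACA", "A"))
```

Because every representation of the same deletion converges on one left-most form, two submitters describing the variant differently end up with identical records.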
Mosaicism: For de novo mosaic variants, the level of mosaicism observed in each tissue can be recorded as a percentage. This information is clinically important because it can help explain variability in clinical presentation, for example the difference between nevus sebaceous and Schimmelpenning syndrome (in which extracutaneous abnormalities are also present), both of which are caused by HRAS and KRAS variants (Groesser et al., 2012).
Mitochondrial variants: DECIPHER supports the deposition and interpretation of variants in both the nuclear and mitochondrial genomes. Mitochondrial diseases are the most common group of inherited neurometabolic disorders and are caused by mutations in either the nuclear or the mitochondrial genome. In addition, nuclear genetic factors have been shown to influence clinical outcomes for mitochondrial DNA mutations (Boggan et al., 2019). Displaying both genomes in a single interface is therefore clinically important. In DECIPHER it is possible to record homoplasmy or the percentage of heteroplasmy per tissue; this is clinically essential, as the heteroplasmy level has been shown to contribute to disease progression (Grady et al., 2018).
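Both per-tissue mosaicism and per-tissue heteroplasmy reduce to the same shape of data: one percentage per sampled tissue. The sketch below shows one possible way to hold and check such values; the class `TissueLevels`, its methods, and the tissue values in the usage lines are hypothetical and illustrative, not DECIPHER's schema or patient data.

```python
from dataclasses import dataclass, field


@dataclass
class TissueLevels:
    """Per-tissue variant levels stored as percentages (hypothetical model).

    The same structure can hold the mosaicism of a de novo nuclear variant
    (percentage of cells) or the heteroplasmy of an mtDNA variant
    (percentage of mtDNA copies) in each sampled tissue.
    """
    levels: dict[str, float] = field(default_factory=dict)

    def record(self, tissue: str, percent: float) -> None:
        # Levels are percentages, so they must lie between 0 and 100.
        if not 0.0 <= percent <= 100.0:
            raise ValueError(f"{tissue}: level {percent} is outside 0-100%")
        self.levels[tissue] = percent

    def is_homoplasmic(self, tissue: str) -> bool:
        # Homoplasmy: every mtDNA copy in the tissue carries the variant.
        return self.levels[tissue] == 100.0

    def spread(self) -> float:
        # Difference between the highest and lowest recorded level; a wide
        # spread flags tissue-to-tissue variability in the variant load.
        if not self.levels:
            return 0.0
        return max(self.levels.values()) - min(self.levels.values())


mt = TissueLevels()
mt.record("blood", 20.0)   # illustrative values only
mt.record("urine", 55.0)
print(mt.spread())         # 35.0
```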
Variant haplotypes: Variants may act in cis to create or modify a disease allele, or in trans to cause a biallelic disorder. For this reason, DECIPHER users can assign variants to a haplotype; for example, compound heterozygous variants are shown in trans. As our understanding of rare disease genetics improves, representing its complexity is becoming increasingly important. Genetic modifiers are known to alleviate or exacerbate disease severity (Rahit and Tarailo-Graovac, 2020), and rare pathogenic haplotypes have recently been shown to cause disease, such as an albinism-causing TYR haplotype (Campbell et al., 2019).
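The cis/trans logic behind haplotype assignment can be sketched as follows, assuming each record is a single heterozygous variant labelled with haplotype 1 or 2 (or `None` when its phase is unknown). The `PhasedVariant` class, the helper functions and the gene and variant names are hypothetical, chosen only to illustrate the reasoning.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PhasedVariant:
    name: str
    gene: str
    haplotype: Optional[int]  # 1 or 2 if phased; None if phase is unknown


def phase_relationship(a: PhasedVariant, b: PhasedVariant) -> str:
    """Return 'cis' (same haplotype), 'trans' (opposite) or 'unknown'."""
    if a.haplotype is None or b.haplotype is None:
        return "unknown"
    return "cis" if a.haplotype == b.haplotype else "trans"


def is_compound_heterozygous(variants: Iterable[PhasedVariant], gene: str) -> bool:
    """True if the gene carries phased variants on both haplotypes,
    i.e. at least one pair of its variants lies in trans."""
    haplotypes = {v.haplotype for v in variants
                  if v.gene == gene and v.haplotype is not None}
    return haplotypes >= {1, 2}


v1 = PhasedVariant("var1", "GENE1", 1)
v2 = PhasedVariant("var2", "GENE1", 2)
print(phase_relationship(v1, v2), is_compound_heterozygous([v1, v2], "GENE1"))
```

Keeping phase as an explicit field, rather than inferring it from genotype alone, lets a display distinguish two variants on one allele (cis) from a biallelic configuration (trans).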
Pathogenicity predictors: For all sequence variants deposited in DECIPHER, predictions from the Ensembl Variant Effect Predictor (VEP; McLaren et al., 2016) are displayed across all Ensembl/GENCODE transcripts. Predictions include the consequence (e.g. missense, frameshift), the protein change, and several pathogenicity scores: SIFT (Sim et al., 2012), PolyPhen-2 (Adzhubei et al., 2013), CADD (Kircher et al., 2014), REVEL (Ioannidis et al., 2016), and SpliceAI (Jaganathan et al., 2019). Before including additional scores, DECIPHER seeks advice from experts in the field and consults benchmarking studies of pathogenicity predictors (e.g. Gunning et al., 2021), supporting good practice in variant interpretation.
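Because a variant can have a different consequence on each transcript, a common way to summarise per-transcript predictions is to report the most severe one. The sketch below assumes a simplified input (a dict from transcript ID to consequence terms) rather than VEP's real output format, and uses a subset of the Ensembl consequence severity ranking; the transcript IDs and the function name are hypothetical.

```python
# Subset of the Ensembl VEP consequence ranking, most severe first.
# Using only this subset is a simplifying assumption for illustration.
SEVERITY_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "intron_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


def most_severe(per_transcript: dict[str, list[str]]) -> tuple[str, str]:
    """Return (transcript_id, term) for the most severe consequence.

    Terms missing from SEVERITY_ORDER rank last; ties are broken by
    transcript ID so the result is deterministic.
    """
    best = None
    for tx_id, terms in per_transcript.items():
        for term in terms:
            key = (RANK.get(term, len(SEVERITY_ORDER)), tx_id, term)
            if best is None or key < best:
                best = key
    if best is None:
        raise ValueError("no consequences supplied")
    return best[1], best[2]


print(most_severe({
    "TX1": ["missense_variant", "splice_region_variant"],
    "TX2": ["intron_variant"],
    "TX3": ["stop_gained"],
}))
```

A single worst-case summary is convenient but can hide which transcript drives it, which is one reason to show predictions for every transcript alongside it.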
Reference genome: All genomic information is displayed on the GRCh38 assembly of the human genome, so that the most up-to-date genome and transcript information is used for accurate variant interpretation. Displaying genomic data on GRCh38 also allows DECIPHER to promote the use of Matched Annotation from NCBI and EMBL-EBI (MANE) transcripts, in which a RefSeq transcript and an Ensembl/GENCODE transcript for a protein-coding gene are paired and identical (5’ UTR, coding region, and 3’ UTR). DECIPHER currently promotes and highlights MANE Select transcripts: one high-quality representative transcript per protein-coding gene that is well supported by experimental data and represents the biology of the gene (https://tark.ensembl.org/web/mane_project). Describing variants relative to a single recommended transcript, together with sequence variant normalisation, helps to standardise variant reporting.
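The requirement that a MANE pair is identical across the 5’ UTR, coding region and 3’ UTR can be illustrated with a structural comparison on GRCh38 coordinates. This is a simplified sketch: it assumes protein-coding transcripts described by 1-based inclusive exon and CDS coordinates, checks structure only (the real MANE process also requires identical sequence), and the `Transcript` class and accessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    accession: str
    chrom: str
    strand: str                           # "+" or "-"
    exons: tuple[tuple[int, int], ...]    # sorted (start, end), 1-based inclusive
    cds: tuple[int, int]                  # genomic CDS start and end


def split_regions(tx: Transcript) -> dict[str, list[tuple[int, int]]]:
    """Split exons into 5' UTR, CDS and 3' UTR intervals."""
    cds_start, cds_end = tx.cds
    left, cds, right = [], [], []
    for start, end in tx.exons:
        if start < cds_start:
            left.append((start, min(end, cds_start - 1)))
        if end >= cds_start and start <= cds_end:
            cds.append((max(start, cds_start), min(end, cds_end)))
        if end > cds_end:
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the genomic left-hand side is the 3' end.
    if tx.strand == "+":
        return {"5' UTR": left, "CDS": cds, "3' UTR": right}
    return {"5' UTR": right, "CDS": cds, "3' UTR": left}


def differing_regions(a: Transcript, b: Transcript) -> list[str]:
    """List the regions whose structure differs; [] means a structural match."""
    names = ["5' UTR", "CDS", "3' UTR"]
    if (a.chrom, a.strand) != (b.chrom, b.strand):
        return names
    ra, rb = split_regions(a), split_regions(b)
    return [name for name in names if ra[name] != rb[name]]


refseq_tx = Transcript("REFSEQ_TX", "chr1", "+", ((100, 200), (300, 400)), (150, 350))
ensembl_tx = Transcript("ENSEMBL_TX", "chr1", "+", ((90, 200), (300, 400)), (150, 350))
print(differing_regions(refseq_tx, ensembl_tx))  # ["5' UTR"]
```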
Reference conversion tools: Deposition with GRCh37/hg19 coordinates is still supported; prior to normalisation, DECIPHER remaps GRCh37 coordinates onto the GRCh38 assembly using an algorithm based on the UCSC LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver; Kuhn et al., 2013). A range of tools is also provided to help users visualise the differences between assemblies. These include comparative GRCh37 and GRCh38 genome browsers, gene lists for variants lifted over by DECIPHER that show genes no longer overlapping the variant, and a liftover mapping genome browser track (Fig. 3).
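The idea behind liftover, and behind the "genes no longer overlapping" lists, can be sketched with gapless aligned blocks of the kind stored in UCSC chain files. This toy version assumes a single chromosome, the same strand in both assemblies, and half-open 0-based intervals, and it maps only the interval end points; the real LiftOver tool handles strand changes, multiple chains and minimum-match thresholds. All class names, block coordinates and gene names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    src_start: int  # 0-based start on the source assembly (e.g. GRCh37)
    tgt_start: int  # 0-based start on the target assembly (e.g. GRCh38)
    size: int       # length of the gapless aligned block


class SimpleLiftOver:
    """Toy single-chromosome, same-strand coordinate mapper."""

    def __init__(self, blocks: list[Block]):
        self.blocks = sorted(blocks, key=lambda b: b.src_start)
        self.starts = [b.src_start for b in self.blocks]

    def map_pos(self, pos: int) -> Optional[int]:
        # Find the last block starting at or before pos.
        i = bisect_right(self.starts, pos) - 1
        if i < 0:
            return None
        b = self.blocks[i]
        if pos < b.src_start + b.size:
            return b.tgt_start + (pos - b.src_start)
        return None  # pos falls in an unaligned gap

    def map_interval(self, start: int, end: int) -> Optional[tuple[int, int]]:
        # Map the first and last base of [start, end); fail if either is unmapped.
        s, e = self.map_pos(start), self.map_pos(end - 1)
        if s is None or e is None or e < s:
            return None
        return s, e + 1


def overlapping_genes(interval, genes):
    start, end = interval
    return {name for name, (g_start, g_end) in genes.items()
            if g_start < end and start < g_end}


def genes_lost(src, lifted, genes_src, genes_tgt):
    """Genes overlapped on the source assembly but not after liftover."""
    return sorted(overlapping_genes(src, genes_src) - overlapping_genes(lifted, genes_tgt))


lo = SimpleLiftOver([Block(0, 1000, 500), Block(600, 1200, 400)])
lifted = lo.map_interval(100, 300)
print(lifted, genes_lost((100, 300), lifted,
                         {"GENE_A": (50, 150), "GENE_B": (250, 350)},
                         {"GENE_A": (1050, 1150), "GENE_B": (5000, 5100)}))
```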