Evolution of Matchmaking
Matchmaking using MME is based on a two-sided framework in which two
interested parties are both looking for a match for the same gene (Philippakis et al., 2015). As the past 7 years of experience have
demonstrated, this approach has been very successful in advancing the
discovery of novel disease-gene relationships. However, this approach
only works when both interested parties have taken the time to flag a
highly compelling novel candidate gene of interest. But what of all the
datasets that, for any number of reasons, have not undergone extensive
manual review? Discovery in such datasets needs to happen
differently. One-sided matchmaking (Figure 1) can occur when
one party is interested in a candidate gene and queries a database
hosting genome-wide sequencing data from undiagnosed patients to
identify variants in the candidate gene along with their associated
information (e.g., phenotype). Zero-sided matchmaking (Figure 1) describes
the situation in which no candidates have been identified and
computational analysis across the cohort is instead used to identify
genes with predicted damaging variants in common across phenotypes. For
example, the genebass.org website allows users to query precomputed gene
burden analyses across all genes for all phenotypes in the UK Biobank (Karczewski et al., 2021). In another example, the Deciphering
Developmental Disorders (DDD) study applies burden testing frameworks to
identify genes with significant enrichments of damaging variants, such
as genes with more de novo loss-of-function variants in the DDD
cohort than expected (PMC: 7116826). Likewise, the new GREGoR consortium
(gregorconsortium.org) is amassing rare disease data on the AnVIL
platform (Schatz et al., 2022), both from the prior NIH Centers
for Mendelian Genomics and from prospective collection, to
improve power for identifying gene-disease candidates. As more and more
data are generated, this type of approach will be critical to ensure we
can analyze unsolved datasets at scale.
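To make the zero-sided idea concrete, the following is a minimal sketch of one simple form of gene burden test: a one-sided Poisson test comparing the observed number of damaging de novo variants per gene with the number expected under a mutation model, with a Bonferroni correction across genes. It is not the specific method used by DDD or Genebass. The function names, gene labels, counts, and expected values are all hypothetical and chosen only for illustration.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_genes(counts, alpha=0.05):
    """Return (gene, p) pairs passing a Bonferroni-corrected threshold.

    `counts` maps gene -> (observed, expected) damaging variant counts.
    """
    if not counts:
        return []
    threshold = alpha / len(counts)  # Bonferroni correction across genes
    hits = []
    for gene, (observed, expected) in counts.items():
        p = poisson_upper_tail(observed, expected)
        if p < threshold:
            hits.append((gene, p))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical cohort-level counts (assumption, not real data):
# gene -> (observed de novo LoF variants, expected under a mutation model)
EXAMPLE_COUNTS = {
    "GENE_A": (9, 0.8),
    "GENE_B": (1, 0.9),
    "GENE_C": (0, 0.5),
}

if __name__ == "__main__":
    for gene, p in enriched_genes(EXAMPLE_COUNTS):
        print(gene, f"p = {p:.2e}")
```

In this toy input only GENE_A carries many more damaging variants than expected, so it is the only gene reported. A production analysis would typically use a calibrated per-gene mutation model and a numerically robust tail computation (subtracting the cumulative sum from 1 loses precision for very small p-values).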
Several data platforms are approaching one-sided matchmaking by
providing information about the existence of a specific variant and its
associated information (e.g., phenotype). These databases include
MyGene2 (NHGRI/NHLBI University of Washington-Center for Mendelian
Genomics (UW-CMG), Seattle, WA), Geno2MP (University of
Washington Center for Mendelian Genomics), VariantMatcher (Wohler
et al., 2021), and Franklin (Genoox). MyGene2 and Geno2MP are
public databases in which anyone can access the displayed variant-level
data associated with phenotypic terms; in the case of MyGene2, sharing
is driven by families. VariantMatcher accepts variant-specific queries,
searches its database of variants, and reports whether the variant is
present, along with the associated phenotype if available, with dual
notification to the querier and the data submitter (Wohler et al., 2021). Franklin is an
interpretation and connection platform that supports a community of
users to facilitate variant interpretation. These four data platforms
are working to facilitate a federated connection to one another using
Data Connect, a standard for discovery and search from GA4GH
(Rodrigues et al., 2022; this issue).
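The variant-level query pattern described above (look up a specific variant, return the associated phenotype if available, and notify both the querier and the data submitter) can be sketched with a small in-memory registry. This is an illustrative sketch only and does not reflect the actual VariantMatcher implementation or API. The class names, the `(chromosome, position, ref, alt)` key format, the e-mail addresses, and the message text are assumptions.

```python
from dataclasses import dataclass, field

# Assumed variant key: (chromosome, position, reference allele, alternate allele)
VariantKey = tuple[str, int, str, str]


@dataclass
class Submission:
    submitter: str
    phenotypes: list[str] = field(default_factory=list)


@dataclass
class VariantRegistry:
    """Hypothetical in-memory store of submitted variants."""

    records: dict[VariantKey, list[Submission]] = field(default_factory=dict)
    # Queued notifications as (recipient, message) pairs
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, key: VariantKey, submitter: str, phenotypes=None) -> None:
        """Register a variant observed by `submitter`, with optional phenotypes."""
        entry = Submission(submitter, list(phenotypes or []))
        self.records.setdefault(key, []).append(entry)

    def query(self, key: VariantKey, querier: str) -> list[Submission]:
        """Return submissions carrying this exact variant.

        On a match, a notification is queued for both the querier and
        the data submitter (dual notification).
        """
        matches = self.records.get(key, [])
        for match in matches:
            variant = "{}-{}-{}-{}".format(*key)
            self.outbox.append((querier, f"{variant} matched data from {match.submitter}"))
            self.outbox.append((match.submitter, f"{variant} was queried by {querier}"))
        return matches


if __name__ == "__main__":
    registry = VariantRegistry()
    # HP:0001250 is used here purely as an example phenotype term
    registry.submit(("1", 12345, "A", "G"), "lab_a@example.org", ["HP:0001250"])
    for hit in registry.query(("1", 12345, "A", "G"), "lab_b@example.org"):
        print(hit.submitter, hit.phenotypes)
    print(registry.outbox)
```

A query for a variant that is absent returns an empty list and queues no notifications, so a submitter is contacted only when a match exists.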
At the gene level, several databases, such as DECIPHER (Foreman et al.,
2022; this issue), RD-Connect GPAP (Laurie et al., 2022; this issue),
Genomics4RD (Driver et al., 2022; this issue), and seqr if used in
collaboration with the Broad Center for Mendelian Genomics (Pais et al.,
2022; this issue), are now individually
approaching this challenge using internal one-sided matchmaking, in which an
internal user with a candidate gene identified in an undiagnosed patient
can query the genomic data housed in the database to see all variants
identified in this candidate gene at a certain frequency, or of a
certain type, across the dataset, along with associated phenotypic and
often inheritance data. While these approaches are currently siloed and
only available to internal users due to the level of data being shared,
efforts are underway to make more of this data available. For example,
Geno2MP (University of Washington Center for Mendelian Genomics) allows searches of the rare variants generated by the majority of the
Centers for Mendelian Genomics, which are linked to very high-level
phenotypic information (Baxter et al., 2022). Genomics4RD
(Driver et al., 2022; this issue) is piloting a one-sided matchmaking
platform for external users that uses a registered access model,
facilitates multi-level filtering of both genetic variation and
phenotypic information, and ensures that compound heterozygous variants
in a single participant are
identifiable. Beacon is a genomic discovery protocol and data access API
issued by the GA4GH. Its most recent version (v2), presented in this
issue, adds new and enhanced features for complex queries and
richer responses (Rambla et al., 2022). Beacon v2 is designed to sit on
top of existing solutions, can be integrated into Beacon networks, and
provides a way forward for the next phase of genomic matchmaking and
other data queries.
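The gene-level one-sided query described in this section (all variants in a candidate gene below a frequency cutoff or of a given type, with compound heterozygous variants in a single participant kept identifiable) can be sketched as follows. This is a simplified illustration, not the query interface of any platform named above and not the Beacon v2 API. The record fields, the default frequency cutoff, and the consequence labels are assumptions. The sketch flags participants with two or more distinct heterozygous variants in the gene as candidates only, because confirming compound heterozygosity requires phasing or parental data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One variant call in one participant (hypothetical record layout)."""

    participant: str
    gene: str
    variant: str
    consequence: str  # e.g. "missense", "loss_of_function"
    allele_frequency: float
    genotype: str  # "het" or "hom"


def query_gene(calls, gene, max_af=0.001, consequences=None):
    """Return calls in `gene` at or below `max_af`, optionally filtered by type."""
    return [
        c
        for c in calls
        if c.gene == gene
        and c.allele_frequency <= max_af
        and (consequences is None or c.consequence in consequences)
    ]


def candidate_compound_hets(hits):
    """Group heterozygous hits by participant; keep those with >= 2 distinct variants.

    These are candidates only: phase (cis versus trans) is not assessed here.
    """
    by_participant = defaultdict(list)
    for c in hits:
        if c.genotype == "het":
            by_participant[c.participant].append(c)
    return {
        p: cs
        for p, cs in by_participant.items()
        if len({c.variant for c in cs}) >= 2
    }


# Hypothetical calls (assumption, not real data)
EXAMPLE_CALLS = [
    Call("P1", "GENE_X", "v1", "missense", 0.0001, "het"),
    Call("P1", "GENE_X", "v2", "loss_of_function", 0.00005, "het"),
    Call("P2", "GENE_X", "v1", "missense", 0.0001, "het"),
    Call("P3", "GENE_X", "v3", "missense", 0.05, "het"),
    Call("P4", "GENE_Y", "v4", "loss_of_function", 0.0001, "hom"),
]

if __name__ == "__main__":
    hits = query_gene(EXAMPLE_CALLS, "GENE_X")
    print(sorted({c.participant for c in hits}))
    print(sorted(candidate_compound_hets(hits)))
```

Keeping the participant identifier on every returned call is what allows two variants in the same individual to be linked; a response that reported only per-variant counts would lose that information.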