Discussion
In the last 10 years, multiple public databases such as the Exome Variant
Server (https://evs.gs.washington.edu/EVS/) (ExomeVariantServer), the
1000 Genomes Project
(https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/)
(1000 Genomes Project Consortium et al. 2015), ExAC (gs://gnomad-public/legacy) (Lek et
al. 2016), gnomAD (https://gnomad.broadinstitute.org/) (Karczewski
et al. 2020) and ABraOM (https://abraom.ib.usp.br/search.php)
(Naslavsky et al. 2017) have facilitated the identification of rare
pathogenic variants in the represented populations by publicly sharing
the genomic data of mostly healthy individuals in an accessible way.
However, these databases are de-identified and do not harbor phenotypic
information on the included individuals. Therefore, these databases are not
optimized for the interpretation of rare variants possibly associated
with rare phenotypes, particularly those characterized by mild
presentation, incomplete penetrance, and/or late onset.
To overcome this limitation, databases such as DECIPHER (Firth et al.
2009), MyGene2/Geno2MP (Chong et al. 2016), VariantMatcher (Wohler et
al. 2021) and Franklin (Genoox) have created a public way to share
genomic and phenotypic data from individuals with rare phenotypes that
is easily accessible to researchers, clinicians, health care providers,
and patients. While they are each queried in a slightly different way,
they all harbor accessible genomic and phenotypic information from
patients with rare phenotypes. The use of these databases has supported
the identification of novel disease-causing variants and the more
precise classification of many variants of uncertain significance (VUSs). To
date, most of the VUSs investigated in databases such as MyGene2/Geno2MP,
VariantMatcher and Franklin could be classified as benign after close
comparison of the phenotypes, which facilitated the identification of
stronger candidate causative variants for the phenotypes being investigated
(Wohler et al. 2021).
We plan to follow the successful Matchmaker Exchange (MME) model to
connect these databases and others in a federated network using the
GA4GH Data Connect standard. This will facilitate data sharing, the
identification of individuals harboring the same variant, and the
exchange of phenotypic information, making variant classification more
specific. Users will be able to choose the most appropriate database to
share their data and easily query other connected databases for similar
cases. After a query is run across the connected databases, the users will
automatically and simultaneously receive an email notification informing
them of the presence or absence of a match in the queried database(s). A
matching email will contain the matching data (genomic data with or without
phenotypic features), contact information of the users to whom they matched,
and additional metadata that will be shared at the discretion of the
databases that harbor the matching cases. Subsequently, the matched
users can choose to contact each other to exchange further information
about their cases including detailed phenotypic information.
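The query-and-notify workflow described above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the Data Connect API itself: the `Case` and `Node` classes, the `federated_query` and `build_notification` functions, the variant string format, and the example node names and addresses are all hypothetical. Real nodes would be reached over the network and would each decide which metadata to release.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """One submitted case: a variant plus optional phenotype terms (hypothetical schema)."""
    variant: str                      # assumed format: "chrom-pos-ref-alt"
    contact_email: str
    phenotypes: list = field(default_factory=list)


class Node:
    """Hypothetical stand-in for one connected database."""

    def __init__(self, name, cases):
        self.name = name
        self.cases = list(cases)

    def find(self, variant):
        # Match on the variant only, the minimal requirement for a query.
        return [c for c in self.cases if c.variant == variant]


def federated_query(query, nodes):
    """Ask each chosen node for cases carrying the same variant."""
    return {node.name: node.find(query.variant) for node in nodes}


def build_notification(query, results):
    """Compose an email body reporting a match or no match for every queried node."""
    lines = [f"Results for variant {query.variant}:"]
    for node_name, matches in results.items():
        if not matches:
            lines.append(f"- {node_name}: no match")
            continue
        for m in matches:
            phenos = ", ".join(m.phenotypes) or "not provided"
            lines.append(
                f"- {node_name}: match, contact {m.contact_email}, phenotypes: {phenos}"
            )
    return "\n".join(lines)


# Illustrative data only; none of these cases or addresses are real.
query = Case("1-12345-A-G", "submitter@example.org", ["HP:0001250"])
nodes = [
    Node("NodeA", [Case("1-12345-A-G", "clinician@example.org", ["HP:0001250"])]),
    Node("NodeB", [Case("2-500-C-T", "lab@example.org")]),
]
results = federated_query(query, nodes)
print(build_notification(query, results))
```

The notification lists every queried node, so the submitter learns about the absence of a match as well as its presence.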
If there is no match in any of the queried databases, the submission
will only be stored in the database from which the query originated, not
in the external databases queried. In the future, if the users would like
to repeat the query, they will need to send the submission again. In
some databases, such as VariantMatcher, the users have the option to
automatically resend the data from their submissions to the other chosen
databases on a periodic basis.
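The storage and resend behavior can be illustrated with a small sketch. It assumes a hypothetical `Database` class, a `submit` function, and a `resend_due` helper with an arbitrary 30-day interval. None of these names or values come from the participating databases, which may implement persistence and scheduling quite differently.

```python
from datetime import date, timedelta


class Database:
    """Hypothetical stand-in for a database; `stored` holds persisted submissions."""

    def __init__(self, name):
        self.name = name
        self.stored = []

    def has_match(self, variant):
        # Read-only lookup: querying never writes to this database.
        return variant in self.stored


def submit(variant, origin, external):
    """Query external databases, but persist the submission only at the origin."""
    matched = [db.name for db in external if db.has_match(variant)]
    origin.stored.append(variant)
    return matched


def resend_due(last_sent, today, interval_days):
    """True when an opted-in periodic resend should be triggered."""
    return today - last_sent >= timedelta(days=interval_days)


origin = Database("origin")
other = Database("other")
matched = submit("1-12345-A-G", origin, [other])
print(matched, origin.stored, other.stored)

# Later, another user submits the same variant to the other database.
other.stored.append("1-12345-A-G")
if resend_due(date(2024, 1, 1), date(2024, 2, 1), interval_days=30):
    # Repeating the query now finds the newer case without storing anything remotely.
    print([db.name for db in [other] if db.has_match("1-12345-A-G")])
```

Because an unmatched query leaves no trace in the external databases, a later submission there can only be found by repeating the query, which is what the periodic resend option automates.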
Variant information in the format of genomic location is the minimal
requirement to start a query among the connected nodes. Matching on
variant features such as zygosity or phenotypic features in addition to
the required variant will also be supported by some of the databases,
such as VariantMatcher. However, even if some of the databases match only
on the variant information, we expect that users querying the
databases through the Data Connect API will also submit zygosity
information and detailed phenotypic information, so that this
information can be shared in the email notification, which will
facilitate further communication among the users who matched. To enhance
the likelihood of a match on pathogenic variants, databases such as
VariantMatcher and Geno2MP only harbor rare coding variants.
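The distinction between the required variant match and the optional criteria could look like the sketch below. It is an assumption-laden illustration: the `VariantQuery` fields, the `location_key` normalization, and the `is_match` flags are hypothetical, the genomic location is assumed to be chromosome, position, reference allele, and alternate allele on a shared genome build, and phenotypes are assumed to be sets of term identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantQuery:
    chrom: str
    pos: int
    ref: str
    alt: str
    zygosity: Optional[str] = None        # optional, e.g. "heterozygous"
    phenotypes: frozenset = frozenset()   # optional phenotype term identifiers


def location_key(v):
    """Normalize to a comparable key; strips a leading 'chr' from the chromosome."""
    chrom = v.chrom[3:] if v.chrom.lower().startswith("chr") else v.chrom
    return (chrom.upper(), v.pos, v.ref.upper(), v.alt.upper())


def is_match(query, candidate, use_zygosity=False, use_phenotypes=False):
    """Variant identity is always required; the other criteria are opt-in."""
    if location_key(query) != location_key(candidate):
        return False
    if use_zygosity and query.zygosity and query.zygosity != candidate.zygosity:
        return False
    if use_phenotypes and query.phenotypes and not (query.phenotypes & candidate.phenotypes):
        return False
    return True


# Illustrative values only.
q = VariantQuery("chr1", 12345, "A", "G", "heterozygous", frozenset({"HP:0001250"}))
c = VariantQuery("1", 12345, "a", "g", "homozygous", frozenset({"HP:0001250"}))
print(is_match(q, c))                       # True: same genomic location and alleles
print(is_match(q, c, use_zygosity=True))    # False: zygosity differs
print(is_match(q, c, use_phenotypes=True))  # True: at least one shared phenotype term
```

A node that matches only on the variant would call `is_match` with the defaults, while a node that supports stricter matching could enable the extra flags. In both cases the zygosity and phenotype fields remain available to include in the notification.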
We will follow the recommendations of the Consent Task Team from the
GA4GH Regulatory and Ethics Working Group, and individual written
informed consent will typically be required since variant-level data
and/or phenotypic data will be provided. Each database will manage the
security and privacy of the data it harbors.
By connecting variant-level databases that also facilitate phenotypic
data access, we expect to improve the variant classification process in
research and clinical settings and also to increase the discovery rate
of novel disease-causing variants by increasing the specificity of
matches. Nevertheless, incomplete penetrance, variable expressivity of
the phenotype, age of onset, and zygosity are some of the factors that
should be considered when the variants and phenotypes are being compared
before the final classification of a candidate variant.