Yujiang Wang

Public Documents 1
Machine learning ascertains candidate genes for rare musculoskeletal disorders in 100...
Yujiang Wang
Anshul Thakur

and 10 more

April 27, 2025
The rich and unique whole genome sequencing (WGS) data in the UK's 100,000 Genomes Project (100kGP) remain under-explored, and the diagnostic yield for rare diseases (RDs), including rare musculoskeletal (MSK) disorders, remains low. Machine learning (ML) algorithms are a powerful means of identifying novel candidate genes, but they suffer from the long-standing curse of dimensionality, especially in MSK studies, where patient cohorts are small and the number of candidate variants is enormous. To address this, we propose an ML framework that hierarchically collapses evidence from multiple variant-level annotations into a gene-level representation, which both supports inference about genetic causality and mitigates the dimensionality problem. We curated variant data from 449 patients across four MSK subtypes in 100kGP, together with 980 non-MSK control participants, and trained an ML model to identify MSK-risk candidate genes. The top-ranked genes suggest that this approach can yield useful algorithmic insights for genetic research on MSK disorders and RDs.
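
The idea of collapsing variant-level evidence into a gene-level representation can be illustrated with a minimal sketch. This is not the authors' framework: the record layout, the annotation names, the gene and participant labels, and the choice of mean-then-max aggregation are all assumptions made only for illustration. It shows how many variants per participant reduce to one number per gene, so the feature count scales with genes rather than variants.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_gene_level(variants):
    """Collapse variant-level annotations into one score per (sample, gene).

    `variants` is an iterable of (sample_id, gene, annotations) tuples, where
    `annotations` maps a hypothetical annotation name to a score in [0, 1].
    Step 1 (assumed): average the annotation scores of each variant.
    Step 2 (assumed): keep the highest variant score within each gene.
    """
    gene_scores = defaultdict(dict)
    for sample_id, gene, annotations in variants:
        variant_score = mean(annotations.values())
        previous = gene_scores[sample_id].get(gene, 0.0)
        gene_scores[sample_id][gene] = max(previous, variant_score)
    return {sample_id: dict(genes) for sample_id, genes in gene_scores.items()}


# Toy data: sample IDs, gene names, and annotation names are hypothetical.
toy_variants = [
    ("patient_1", "GENE_A", {"deleteriousness": 0.9, "rarity": 0.7}),
    ("patient_1", "GENE_A", {"deleteriousness": 0.2, "rarity": 0.4}),
    ("patient_1", "GENE_B", {"deleteriousness": 0.5, "rarity": 0.5}),
    ("control_1", "GENE_A", {"deleteriousness": 0.1, "rarity": 0.3}),
]

gene_level = collapse_to_gene_level(toy_variants)
for sample_id, genes in gene_level.items():
    print(sample_id, genes)
```

The resulting per-gene scores could then serve as input features for a case-versus-control classifier, whose feature importances would give one possible gene ranking; how the actual model ranks genes is not specified in the abstract.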
