Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits
  • Chensong Chen, The University of Queensland (corresponding author: chensong.chen@uq.edu.au)
  • Owen Powell, Queensland Alliance for Agriculture and Food Innovation
  • Eric Dinglasan, Queensland Alliance for Agriculture and Food Innovation
  • Elizabeth Ross, Queensland Alliance for Agriculture and Food Innovation
  • Seema Yadav, Queensland Alliance for Agriculture and Food Innovation
  • Xianming Wei, Sugar Research Australia Ltd
  • Felicity Atkin, Sugar Research Australia Ltd
  • Emily Deomano, Sugar Research Australia Ltd
  • Ben Hayes, University of Queensland

Abstract

Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning methods, which are purportedly able to deal with high levels of complexity in prediction. Here we investigate deep learning (DL) networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), and Random Forest (RF) for genomic prediction in sugarcane. The data set comprised 2912 sugarcane clones, scored for 26,086 genome-wide SNP markers, with final assessment trial (FAT) data for total cane harvested (TCH), commercial cane sugar (CCS) and fibre content. The clones in the latest trial (2017) were used as a validation set. We compared the performance of these methods with that of GBLUP extended to include dominance and epistatic effects. The prediction accuracies from the GBLUP models were 0.37 for TCH, 0.37 for CCS and 0.48 for fibre, while the DL models had accuracies of 0.33 for TCH, 0.38 for CCS and 0.43 for fibre. Optimised RF achieved prediction accuracies of 0.35 for TCH, 0.38 for CCS and 0.48 for fibre. Both DL and RF predictions were more accurate than additive GBLUP but generally less accurate than extended GBLUP. Finally, we identified a partially shared distribution of selected SNPs between the RF and GBLUP models. We conclude that RF may have some utility for genomic prediction in crops with highly complex genomes, particularly if non-additive interactions can be captured with clonal propagation.
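
The validation scheme in the abstract (latest trial held out, accuracy reported per trait) can be illustrated with a minimal sketch. It assumes that prediction accuracy means the Pearson correlation between predicted and observed phenotypes in the validation set, a common convention in genomic prediction that the abstract does not state explicitly. The record layout, the clone names, the helper names `split_by_latest_year` and `pearson`, and all numbers are hypothetical toy values, not data from the study.

```python
from math import sqrt


def split_by_latest_year(records):
    """Return (training, validation), holding out the latest trial year."""
    latest = max(r["year"] for r in records)
    training = [r for r in records if r["year"] < latest]
    validation = [r for r in records if r["year"] == latest]
    return training, validation


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy phenotype records (made-up values, one trait).
records = [
    {"clone": "c1", "year": 2015, "phenotype": 3.0},
    {"clone": "c2", "year": 2016, "phenotype": 2.5},
    {"clone": "c3", "year": 2017, "phenotype": 2.0},
    {"clone": "c4", "year": 2017, "phenotype": 1.0},
    {"clone": "c5", "year": 2017, "phenotype": 4.0},
    {"clone": "c6", "year": 2017, "phenotype": 3.0},
]

# Hypothetical output of any model (GBLUP, RF, DL) fitted on the training set.
predicted = {"c3": 1.0, "c4": 2.0, "c5": 3.0, "c6": 4.0}

training, validation = split_by_latest_year(records)
observed = [r["phenotype"] for r in validation]
fitted = [predicted[r["clone"]] for r in validation]
accuracy = pearson(fitted, observed)
print(f"validation clones: {len(validation)}, accuracy: {accuracy:.2f}")
```

Splitting by trial year rather than at random mimics forward prediction, in which a model fitted on earlier trials is applied to clones it has not seen.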
09 Dec 2022: Submitted to The Plant Genome
13 Dec 2022: Submission Checks Completed
13 Dec 2022: Assigned to Editor
13 Dec 2022: Review(s) Completed, Editorial Evaluation Pending
14 Dec 2022: Reviewer(s) Assigned
03 Jan 2023: Editorial Decision: Revise Minor