Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits

Chensong Chen; Owen Powell; Eric Dinglasan; Elizabeth Ross; Seema Yadav; Xianming Wei; Felicity Atkin; Emily Deomano; Ben Hayes

doi:10.22541/essoar.167407908.82445315/v1

Abstract

Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning methods, which are purportedly able to deal with high levels of complexity in prediction. Here we investigate deep learning (DL) networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), and Random Forest (RF) for genomic prediction in sugarcane. The data set comprised 2912 sugarcane clones, scored for 26,086 genome-wide SNP markers, with final assessment trial (FAT) data for total cane harvested (TCH), commercial cane sugar (CCS) and Fibre content. The clones in the latest trial (2017) were used as a validation set. We compared the performance of these methods with that of GBLUP extended to include dominance and epistatic effects. The prediction accuracies from the GBLUP models were 0.37 for TCH, 0.37 for CCS and 0.48 for Fibre, while the DL models had accuracies of 0.33 for TCH, 0.38 for CCS and 0.43 for Fibre. Optimised RF achieved a prediction accuracy of 0.35 for TCH, 0.38 for CCS and 0.48 for Fibre. Both DL and RF predictions were more accurate than additive GBLUP but generally less accurate than extended GBLUP. Finally, we identified a partially shared distribution of SNP selections between the RF and GBLUP models. We conclude that RF may have some utility for genomic prediction in crops with highly complex genomes, particularly if non-additive interactions can be captured with clonal propagation.

09 Dec 2022 Submitted to The Plant Genome

13 Dec 2022 Submission Checks Completed

13 Dec 2022 Assigned to Editor

13 Dec 2022 Review(s) Completed, Editorial Evaluation Pending

14 Dec 2022 Reviewer(s) Assigned

03 Jan 2023 Editorial Decision: Revise Minor

Peer review status: IN REVISION