Sunwha Park

and 9 more

Objective: To use machine learning to select, from among various vaginal microbiota, markers that improve predictive power for preterm birth, and to develop a prediction algorithm that combines these markers with clinical information. Design, setting and population: A multicenter case-control study of 150 Korean pregnant women. Methods: Cervicovaginal fluid was collected from pregnant women during mid-pregnancy. Their demographic profiles, white blood cell counts, and cervical lengths were recorded, and the microbiome profiles of the cervicovaginal fluid were analyzed. The subjects were randomly divided into a training set and a test set. A univariate analysis using seven statistical tests was performed to select markers. Prediction models were then built from the selected markers using machine learning. Main outcome measures: The preterm birth prediction model showed a sensitivity of 79% and a specificity of 83% (AUC = 0.84). Results: The logistic regression model with the 17 microbiome markers had a test area under the curve (AUC) of 0.72. When white blood cell count and cervical length were combined with the markers, the random forest model achieved a test AUC of 0.84. The GUIDE model confirmed that the association with preterm birth was high when Prevotella and Ureaplasma increased, and likewise showed that the association with preterm birth was higher as the number of Peptoniphilus increased. Conclusions: Our study demonstrates that several candidate bacteria could be used as potential predictors of preterm birth, and that the predictive rate can be increased by a machine learning model that also uses cervical length and white blood cell count.
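The outcome measures reported above (sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. It assumes binary labels (1 = preterm birth) and model scores on a held-out test set. The toy labels, scores, and the 0.5 threshold are hypothetical and are not data from the study, and the function names `sensitivity_specificity` and `roc_auc` are illustrative only.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = preterm birth."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy test set (NOT study data): 4 cases, 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

threshold = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are reported together.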