Prediction of Lung Cancer Metastasis Using Machine Learning Models Based
on Clinical Data
Abstract
Background: Clinical laboratory data, which reflect tumor cell growth
and metabolic activity, warrant investigation for their potential to
predict lung cancer metastasis. Aims: To develop machine learning
models that integrate clinical laboratory information and patient
characteristics to predict regional lymph node involvement and skip
metastasis in lung cancer.
Methods: Data from lung cancer patients treated at Chongqing University
Fuling Hospital between January 2020 and December 2022 were
retrospectively analyzed. Patients were divided into an N group
(prediction of regional lymph node involvement) and an M group
(prediction of skip metastasis) according to the TNM staging criteria.
Predictive factors were identified by univariate analysis and LASSO
regression, and machine learning algorithms were then used to build
predictive models. Results: Of the 1629 cases analyzed, 861 were in
the N group and 519 were in the M group. Univariate analysis identified
40 parameters in the N group and 27 in the M group that differed
significantly (p < 0.05). LASSO regression then selected 13 feature
variables for the N group and 12 for the M group. In the N
group, these factors were tumor size, prothrombin time (PT), mean
platelet volume, fibrinogen, platelet count, procalcitonin, carbohydrate
antigen 15-3 (CA 15-3), carcinoembryonic antigen (CEA), adenosine
deaminase, red blood cell distribution width, thrombin time, smoking
history, and drinking history. In the M group, they were cytokeratin 19
fragment, tumor size, CEA, CA 15-3, squamous cell carcinoma-related
antigen, alkaline phosphatase, fibrinogen, hemoglobin, calcium, albumin,
PT, and absolute monocyte count. On the test set, the logistic
regression model performed best in both groups, with AUCs of 0.888 (N
group) and 0.875 (M group). Conclusion: This study
demonstrates the potential of using machine learning algorithms
alongside clinical characteristics and laboratory data to predict
regional lymph node involvement and skip metastasis in lung cancer.
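As a rough illustration of the modeling workflow described in the Methods (LASSO feature selection followed by a logistic regression classifier scored by test-set AUC), the pure-Python sketch below runs on synthetic data. It is not the authors' code: the helper names (`lasso_select`, `fit_logistic`, `auc`), the penalty `lam=0.05`, the 300/100 train/test split, and the simulated outcome are all hypothetical. For simplicity it uses a squared-error LASSO on the 0/1 outcome and omits the univariate screening step; the abstract does not state the exact LASSO formulation or how the penalty was tuned.

```python
import math
import random

# Hypothetical sketch: helper names, synthetic data, split sizes and the
# penalty value are illustrative assumptions, not the study's settings.


def column_stats(X):
    """Per-column mean and population SD (an SD of 0 is replaced by 1)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    return means, sds


def standardize(X, means, sds):
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, returning 0 when |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_select(X, y, lam, n_sweeps=200):
    """L1-penalised least squares by cyclic coordinate descent.

    Minimises (1/2n)||y - mean(y) - Xb||^2 + lam * ||b||_1, assuming X is
    standardised so that (1/n)||x_j||^2 = 1 for every column j.
    Returns (indices of nonzero coefficients, coefficient vector).
    """
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    r = [yi - ybar for yi in y]  # residuals for b = 0
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + b[j]
            delta = soft_threshold(rho, lam) - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep residuals in sync with b
                b[j] += delta
    return [j for j in range(p) if b[j] != 0.0], b


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Unpenalised logistic regression by batch gradient descent on log-loss."""
    n, p = len(X), len(X[0])
    w0, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for i in range(n):
            err = sigmoid(w0 + sum(w[j] * X[i][j] for j in range(p))) - y[i]
            g0 += err
            for j in range(p):
                g[j] += err * X[i][j]
        w0 -= lr * g0 / n
        w = [w[j] - lr * g[j] / n for j in range(p)]
    return w0, w


def predict_proba(model, X):
    w0, w = model
    return [sigmoid(w0 + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def auc(y, scores):
    """ROC AUC: P(random positive outscores random negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def keep_columns(Z, cols):
    """Keep only the LASSO-selected columns."""
    return [[row[j] for j in cols] for row in Z]


# Synthetic data: 8 candidate predictors, only columns 0 and 1 carry signal.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(400)]
y = [1 if random.random() < sigmoid(1.5 * row[0] - 1.0 * row[1]) else 0
     for row in X]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

means, sds = column_stats(X_train)  # scaling fitted on training data only
Z_train = standardize(X_train, means, sds)
Z_test = standardize(X_test, means, sds)
selected, coefs = lasso_select(Z_train, y_train, lam=0.05)
model = fit_logistic(keep_columns(Z_train, selected), y_train)
test_auc = auc(y_test, predict_proba(model, keep_columns(Z_test, selected)))
print("selected:", selected, "test AUC:", round(test_auc, 3))
```

In practice the penalty would usually be chosen by cross-validation, and library routines such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` and `roc_auc_score` would replace these hand-written loops; they are written out here only to make each step visible.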