Supervised Machine Learning Models
In the current study, each data point in the training set is a pair (X, Y), where X denotes the feature vector \((x_{1},\ x_{2},\ \ldots,\ x_{n})\) and Y is the output label, taking the value 0 (absence of CM) or 1 (presence of CM). The following five supervised ML models were constructed on feature vectors of varying sizes \((1,\ 2,\ \ldots,\ 11)\). For each measurement, models were built both on single features and on combinations of the features obtained from the best-performing classifiers, as illustrated in the sketch below.
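As a minimal sketch of this setup, the snippet below builds a placeholder dataset with 11 features and a binary CM label, splits it into training and test sets, and scores a classifier on feature subsets of increasing size (shown here for single features only). The data, the variable names (X_train, y_train, etc.), and the use of scikit-learn are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for the study's dataset:
# X holds 11 features per subject; y is 0 (CM absent) or 1 (CM present).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Score a classifier on feature subsets of increasing size (1, 2, ..., 11);
# shown here for single-feature subsets only, to keep the sketch short.
for j in range(X.shape[1]):
    score = cross_val_score(LogisticRegression(max_iter=1000),
                            X_train[:, [j]], y_train, cv=5).mean()
    print(f"feature {j}: CV accuracy = {score:.3f}")
```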
XGBoost (Extreme Gradient Boosting) algorithm: XGBoost is an algorithm that employs gradient-boosted decision trees and can compute the boosted trees efficiently in parallel. XGBoost divides its tree models into regression and classification trees (9). The biggest advantage of XGBoost is its scalability: it runs efficiently in distributed and real-time settings and scales to datasets with millions of instances. This efficiency is owed to several systems-level and algorithmic optimizations. A sparsity-aware tree learning method handles sparse data, and a theoretically justified weighted quantile sketch is employed in approximate tree learning to handle instance weights (10). Further details of XGBoost can be found in the relevant study (10).
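A minimal sketch of fitting such a model with the xgboost Python package follows; the hyperparameter values are illustrative assumptions rather than the ones used in the study, and X_train, y_train, and X_test are taken from the sketch above.

```python
from xgboost import XGBClassifier  # pip install xgboost

# Illustrative hyperparameters, not the study's tuned values.
model = XGBClassifier(
    n_estimators=200,    # number of boosted trees
    max_depth=4,         # depth of each tree
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    n_jobs=-1,           # parallel tree construction
)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # estimated P(CM present)
```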
Stochastic Gradient Boosting (SGB): Gradient boosting is a supervised machine learning approach for regression and classification problems; the SGB algorithm creates a predictive model from a collection of weak learners/classifiers, usually decision trees. SGB builds the model in a stage-wise manner, constructing additive regression models by fitting a simple parameterized function (base learner) to the current pseudo-residuals by least squares at each iteration. The pseudo-residuals are the negative gradient of the loss function being minimized. It is worth noting that integrating randomization into the technique, by fitting each base learner on a random subsample of the training data, improves both the approximation accuracy and the execution speed of gradient boosting. For the prediction models, the SGB parameters (interaction depth, number of trees, and shrinkage) were optimized (11). The full description of SGB was reported in the related study (12).
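In scikit-learn, stochastic gradient boosting corresponds to GradientBoostingClassifier with subsample set below 1.0. The sketch below is a hedged illustration with assumed hyperparameter values, mapping interaction depth to max_depth, number of trees to n_estimators, and shrinkage to learning_rate.

```python
from sklearn.ensemble import GradientBoostingClassifier

# subsample < 1.0 makes the boosting "stochastic": each base tree is fit
# on a random fraction of the training data. Values are illustrative.
sgb = GradientBoostingClassifier(
    n_estimators=150,    # number of trees
    max_depth=3,         # interaction depth
    learning_rate=0.05,  # shrinkage
    subsample=0.7,       # the randomization step of SGB
)
sgb.fit(X_train, y_train)
```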
Bagged classification and regression trees (Bagged CART): CART has been widely utilized for ML modeling with adequate results, including disease prediction in the health field. Because CART is regarded as an unstable model, the bagging method can significantly increase its accuracy. Bagged CART effectively reduces the estimation variance, thereby improving classification accuracy and mitigating over-fitting. Thus, promising results can be expected from employing bagged CART in new classification problems (13). A comprehensive explanation of bagged CART was given in the concerned reference (14).
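A minimal sketch of bagged CART with scikit-learn follows (assuming version 1.2 or later, where the base learner is passed as estimator); the number of trees is an illustrative assumption.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Each CART is trained on a bootstrap resample of the training data and the
# trees' votes are combined, reducing the variance of the unstable base learner.
bagged_cart = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # CART base learner
    n_estimators=100,                    # number of bootstrap trees (illustrative)
    bootstrap=True,
)
bagged_cart.fit(X_train, y_train)
```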
Random Forest (RF): Random Forest is an ensemble method that constructs several decision trees and classifies a new case by the majority vote of those trees. Each decision tree node uses a subset of attributes selected randomly from the whole original set of attributes. Additionally, as in bagging, each tree is grown on a different bootstrap sample of the data. Random Forest is also computationally efficient and can run rapidly over large datasets; it has been used in diverse domains in many recent research projects and real-world applications (14). The details of Random Forest were introduced in the relevant paper (15).
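The sketch below shows these two sources of randomness, a bootstrap sample per tree and a random attribute subset per split, using scikit-learn's RandomForestClassifier; the settings are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier

# Each tree sees a bootstrap sample; each split considers only a random
# subset of the attributes. The forest classifies by majority vote.
rf = RandomForestClassifier(
    n_estimators=300,     # number of trees (illustrative)
    max_features="sqrt",  # random attribute subset per split
    n_jobs=-1,            # trees are grown in parallel
)
rf.fit(X_train, y_train)
```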
Logistic Regression (LR): Logistic regression analysis is used to obtain an "odds ratio" that incorporates two or more covariates/features. The process is similar to multiple linear regression, except that the response attribute follows a binomial distribution. The fitted coefficients indicate the strength of each predictor on the observed outcome of interest. The method estimates the probability \(P(Y\mid X)\) for an observation point X by applying a logistic transformation to a linear combination of the input features: \(P(Y=1\mid X)=\frac{1}{1+e^{-\left(b_{0}+b_{1}x_{1}+\ldots+b_{n}x_{n}\right)}}\), where \(b_{0},\ b_{1},\ \ldots,\ b_{n}\) are the model parameters, estimated during the training stage with the maximum likelihood technique (16,17).
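To make the formula concrete, the sketch below fits scikit-learn's LogisticRegression (which estimates the coefficients by maximum likelihood) and recomputes \(P(Y=1\mid X)\) by hand from the fitted intercept and coefficients; X_train, y_train, and X_test are assumed from the first sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(max_iter=1000)  # coefficients fit by maximum likelihood
lr.fit(X_train, y_train)

# P(Y = 1 | X) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))
b0, b = lr.intercept_[0], lr.coef_[0]
p_manual = 1.0 / (1.0 + np.exp(-(b0 + X_test @ b)))

# The hand-computed probabilities match the library's output.
assert np.allclose(p_manual, lr.predict_proba(X_test)[:, 1])
```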