Supervised Machine Learning Models
In the current study, each data point in the training set consists of a
pair (X, Y), where X denotes the feature vector
\((x_{1},\ x_{2},\ \ldots,\ x_{n})\) and Y denotes the output label,
which takes the value 0 (absence of CM) or 1 (presence of CM). The
following five supervised ML models were constructed on feature vectors
of varying sizes \((1,\ 2,\ \ldots,\ 11)\). Single features and
combinations of features obtained from the best-performing classifiers
were evaluated for each measurement, as sketched in the example below.
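As a minimal sketch of this data layout (using synthetic NumPy arrays in place of the study's actual measurements, which are not reproduced here), the training set can be represented as a feature matrix X and a binary label vector y:

```python
import numpy as np

# Hypothetical illustration of the training-set layout described above:
# each row of X is a feature vector (x1, ..., xn) and each entry of y is
# the output label Y (0 = absence of CM, 1 = presence of CM).
rng = np.random.default_rng(seed=0)

n_samples, n_features = 100, 11           # up to 11 features, as in the study
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 2, size=n_samples)    # binary labels 0/1

# Feature subsets of increasing size (1, 2, ..., 11) can be formed by
# slicing columns, e.g. the first k features:
k = 3
X_subset = X[:, :k]
```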
XGBoost (Extreme Gradient Boosting) algorithm: XGBoost is an
algorithm that employs gradient-boosted decision trees and can compute
boosted trees efficiently in parallel. XGBoost divides its tree models
into regression and classification trees (9). A major advantage of
XGBoost is its scalability across a wide range of scenarios, which
allows it to handle large datasets and distributed settings efficiently.
The efficacy of XGBoost is owing to several well-engineered systems and
algorithmic optimizations: a novel sparsity-aware tree learning method
is used to handle sparse data, and a theoretically justified weighted
quantile sketch technique is implemented in approximate tree learning to
handle instance weights (10). The details of XGBoost can be found in the
relevant study (10).
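A minimal sketch of fitting such a classifier with the xgboost Python package is shown below; the hyperparameter values are illustrative, not those tuned in the study:

```python
from xgboost import XGBClassifier

# Illustrative XGBoost classifier; hyperparameters are examples only.
model = XGBClassifier(
    n_estimators=100,     # number of boosted trees
    max_depth=3,          # depth of each tree
    learning_rate=0.1,    # shrinkage applied to each boosting step
    eval_metric="logloss",
)
model.fit(X, y)                       # X, y as defined above
probs = model.predict_proba(X)[:, 1]  # predicted probability of CM
```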
Stochastic Gradient Boosting (SGB): Gradient boosting is a
supervised machine learning approach for regression and classification
problems that creates a predictive model from a collection of weak
learners/classifiers, usually decision trees. SGB builds the model in a
stage-wise manner and constructs additive regression models by fitting a
simple parameterized function (base learner) to the current
pseudo-residuals by least squares at each iteration. The
pseudo-residuals are the negative gradient of the loss function being
minimized. It is worth noting that integrating randomization into the
technique improves gradient boosting's approximation accuracy and
execution speed. For the prediction models, the SGB hyperparameters
(interaction depth, number of trees, and shrinkage) were optimized (11).
The full details of SGB were reported in the related study (12).
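A minimal sketch using scikit-learn's GradientBoostingClassifier is shown below; setting subsample below 1.0 introduces the randomization that makes the boosting stochastic, and the hyperparameter values (number of trees, depth, shrinkage) are illustrative only:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Stochastic gradient boosting: subsample < 1.0 fits each tree on a
# random fraction of the training data (the randomization noted above).
sgb = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting iterations (trees)
    max_depth=3,        # controls the interaction depth of each base learner
    learning_rate=0.1,  # shrinkage
    subsample=0.8,      # stochastic part: fraction of samples per tree
)
sgb.fit(X, y)
```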
Bagged classification and regression trees (Bagged CART): CART
has been widely utilized for ML modeling with adequate results,
including disease prediction in the health field. As CART is regarded as
an unstable model, the bagging method can significantly increase its
precision. Bagged CART effectively reduces the variance of the
estimates, improves classification accuracy, and mitigates over-fitting.
Thus, promising results can be expected when bagged CART is applied to
new classification problems (13). A comprehensive explanation of bagged
CART is given in the corresponding reference (14).
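A minimal sketch of bagged CART using scikit-learn's BaggingClassifier over CART-style decision trees is shown below; the number of bootstrap trees is illustrative:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagged CART: many CART trees are fitted on bootstrap resamples of the
# training data and their predictions are aggregated, which reduces the
# variance of the unstable single-tree model.
bagged_cart = BaggingClassifier(
    DecisionTreeClassifier(),  # CART base learner
    n_estimators=50,           # illustrative number of bootstrap trees
    bootstrap=True,            # sample with replacement
)
bagged_cart.fit(X, y)
```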
Random Forest (RF): Random Forest is an ensemble method that
constructs several decision trees and classifies a new case by the
majority vote of those trees. Each decision tree node uses a subset of
attributes selected randomly from the whole original attribute set.
Additionally, as in bagging, each tree is trained on a different
bootstrap sample of the data. Random Forest is also computationally
efficient and can run rapidly over large datasets. It has been used in
diverse domains in many recent research projects and real-world
applications (14). The details of Random Forest were introduced in the
relevant paper (15).
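A minimal sketch with scikit-learn's RandomForestClassifier is shown below; max_features controls the random attribute subset considered at each node, and the values shown are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier

# Random Forest: each tree is grown on a bootstrap sample, each split
# considers only a random subset of the features, and class predictions
# are made by majority vote across the trees.
rf = RandomForestClassifier(
    n_estimators=100,     # number of trees (illustrative)
    max_features="sqrt",  # random subset of attributes at each node
)
rf.fit(X, y)
pred = rf.predict(X)      # majority vote over the forest
```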
Logistic Regression (LR): Logistic regression analysis is used to
obtain an "odds ratio" that incorporates two or more
covariates/features. The process is similar to multiple linear
regression, except that the response attribute follows a binomial
distribution. The estimated coefficient of each variable indicates the
strength of that predictor's effect on the observed outcome of interest.
This method estimates the probability \(P(Y \mid X)\) for an observation
X by applying a logistic transformation to a linear combination of the
input features:
\(P(Y=1 \mid X)=\frac{1}{1+e^{-\left(b_{0}+b_{1}x_{1}+\ldots+b_{n}x_{n}\right)}}\),
where \(b_{0},\ b_{1},\ \ldots,\ b_{n}\) are the parameters of the
model, which are estimated during the training stage with the maximum
likelihood technique (16,17).
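A minimal sketch relating the fitted model to the formula above is shown below; note that scikit-learn's LogisticRegression applies L2 regularization by default, so it performs penalized rather than plain maximum likelihood estimation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit the model and verify the formula
# P(Y = 1 | X) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn))).
lr = LogisticRegression(max_iter=1000)
lr.fit(X, y)

b0 = lr.intercept_[0]   # b0
b = lr.coef_[0]         # b1, ..., bn
manual_p = 1.0 / (1.0 + np.exp(-(b0 + X @ b)))

# manual_p matches lr.predict_proba(X)[:, 1] up to numerical precision.
```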