2.4 Extraction and Selection of Quantitative Features
After image preprocessing, 1409 quantitative imaging features were extracted from the CMP or NP CT images using the Pyradiomics package (v2.1.2), giving a total of 2818 features for the combined CMP+NP set. These features fall into three groups. Group 1 (first-order statistics) describes the distribution of voxel intensities within the CT image using common, basic metrics. Group 2 (shape- and size-based features) reflects the shape and size of the region. Group 3 (texture features) comprises features calculated from grey-level run-length and grey-level co-occurrence matrices, which quantify differences in regional heterogeneity.
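To make the first-order group concrete, the following minimal sketch computes a few such statistics from a flat list of voxel intensities using only the Python standard library. It is an illustration, not the Pyradiomics implementation: the function name `first_order_features`, the `bin_width` value, the simplified formulas, and the sample intensities are all assumptions introduced here.

```python
import math
from collections import Counter


def first_order_features(voxels, bin_width=25):
    """Compute a few first-order statistics for a flat list of voxel intensities.

    Hypothetical helper for illustration; definitions are simplified.
    """
    n = len(voxels)
    mean = sum(voxels) / n
    # Population variance (divide by n, not n - 1).
    variance = sum((v - mean) ** 2 for v in voxels) / n
    std = math.sqrt(variance)
    # Skewness is undefined for a constant region; report 0.0 in that case.
    skewness = (sum((v - mean) ** 3 for v in voxels) / n) / std ** 3 if std > 0 else 0.0
    energy = sum(v ** 2 for v in voxels)
    # Shannon entropy of the intensity histogram after fixed-width binning.
    counts = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "energy": energy,
        "entropy": entropy,
    }


# Hypothetical intensities inside a segmented region (not data from the study).
roi = [30, 42, 55, 61, 48, 39, 70, 52]
features = first_order_features(roi)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Shape and texture features follow the same pattern, a function mapping the segmented region to a scalar, but they operate on the region mask and on grey-level matrices rather than on the raw intensity list.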
As described above, a large number of image features can be computed, but not all of them are useful for a particular task. Dimensionality reduction and selection of task-specific features are therefore necessary steps. To reduce redundancy, three feature selection methods were used: the variance threshold, SelectKBest, and the least absolute shrinkage and selection operator (LASSO). For the variance threshold method, the threshold was 0.8, so features with a variance smaller than 0.8 were removed. SelectKBest, a univariate feature selection method, uses the p value to analyze the relationship between each feature and the classification result; all features with p < 0.05 were retained. For the LASSO model, an L1 regularizer was included in the cost function, the cross-validation parameter was set to 5, and the maximum number of iterations was 1000.
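The mechanics of two of these steps can be sketched in plain Python. The block below filters features by variance and then fits a LASSO model by coordinate descent with soft-thresholding, which is what drives some coefficients to exactly zero. It is a sketch under stated assumptions, not the study's pipeline: the function names, the penalty strength `alpha`, the chaining of the two steps, and the toy data are hypothetical, and the p-value filter and cross-validation are omitted. The method names in the text match scikit-learn classes, but the source does not state which implementation was used.

```python
def variance(column):
    """Population variance of a list of numbers."""
    m = sum(column) / len(column)
    return sum((x - m) ** 2 for x in column) / len(column)


def variance_threshold(X, threshold=0.8):
    """Return indices of the columns whose variance exceeds the threshold."""
    n_features = len(X[0])
    return [j for j in range(n_features)
            if variance([row[j] for row in X]) > threshold]


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, max_iter=1000, tol=1e-8):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1.

    No intercept is fitted, so X and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in X]
            norm_sq = sum(x * x for x in col) / n
            if norm_sq == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / norm_sq
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


# Toy, already-centred data (hypothetical): feature 0 is nearly constant,
# feature 1 determines y, feature 2 is unrelated to y.
X = [[0.1, 2.0, 2.0],
     [-0.1, -2.0, 2.0],
     [0.1, 2.0, -2.0],
     [-0.1, -2.0, -2.0]]
y = [6.0, -6.0, 6.0, -6.0]

kept = variance_threshold(X, threshold=0.8)
X_kept = [[row[j] for j in kept] for row in X]
weights = lasso_coordinate_descent(X_kept, y, alpha=1.0, max_iter=1000)
selected = [j for j, w in zip(kept, weights) if w != 0.0]
print("kept after variance filter:", kept)
print("LASSO weights:", weights)
print("selected features:", selected)
```

Features whose LASSO coefficient is exactly zero are discarded. In a cross-validated setting, `alpha` would be chosen by comparing validation error across candidate values rather than fixed in advance as it is here.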