Abstract
Accurate gland segmentation from digitized H&E (hematoxylin and
eosin) histology images spanning a wide range of histologic cancer grades
is challenging. In recent years, several fully convolutional
network methods have been proposed, with UNet being the most representative.
The UNet model, with its symmetric structure, has shown excellent
performance in gland segmentation tasks. However, the locality of
convolution operations limits its ability to capture global
dependencies. To address this limitation, this paper proposes a novel
deep glandular tissue image segmentation network based on Swin UNet,
termed EMA-Swin UNet. This network replaces CNN modules with Swin
Transformer modules to capture both local and global representations.
Additionally, the EMA-Swin UNet incorporates an Efficient Multi-scale
Attention (EMA) module to enhance multi-scale feature extraction for
glandular tissues of various sizes by capturing global dynamic features
and long-range smooth features from the encoder outputs. By integrating
edge-detection pooling, we further refine the prediction maps
produced by EMA-Swin UNet. Moreover, we standardize the staining
of both the GlaS dataset and the six-grade tumor differentiation
dataset from EBHI-Seg using Reinhard normalization. The final
segmentation results are compared with those of classical gland
segmentation algorithms on the GlaS and EBHI-Seg datasets, demonstrating
the effectiveness of the proposed method. In particular, on the GlaS
dataset the mDice reaches 0.894.
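For context, the Reinhard-style stain standardization mentioned above can be sketched as follows. This is a minimal illustration only, using a common LAB-space approximation with OpenCV and NumPy (the original Reinhard method operates in the lαβ color space); the function name and variables are ours and are not taken from the paper's implementation.

```python
# Hedged sketch of Reinhard-style stain normalization (illustrative, not the paper's code).
import cv2
import numpy as np

def reinhard_normalize(source_bgr: np.ndarray, target_bgr: np.ndarray) -> np.ndarray:
    """Match the per-channel LAB mean/std of `source_bgr` to `target_bgr`."""
    # Convert both images to LAB (an approximation of the original lαβ space).
    src = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    tgt = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)

    # Per-channel statistics; small epsilon avoids division by zero.
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1)) + 1e-6
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))

    # Shift and scale each channel of the source toward the target statistics.
    normalized = (src - src_mean) / src_std * tgt_std + tgt_mean
    normalized = np.clip(normalized, 0, 255).astype(np.uint8)
    return cv2.cvtColor(normalized, cv2.COLOR_LAB2BGR)
```

In such a scheme, one reference slide is typically chosen as the target, and every training and test image is mapped to its color statistics before segmentation.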