Article

AER-Net: Attention-Enhanced Residual Refinement Network for Nuclei Segmentation and Classification in Histology Images

by Ruifen Cao 1, Qingbin Meng 1, Dayu Tan 2, Pijing Wei 2, Yun Ding 3 and Chunhou Zheng 3,*
1 The Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China
2 Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
3 School of Artificial Intelligence, Anhui University, Hefei 230601, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(22), 7208; https://doi.org/10.3390/s24227208
Submission received: 7 October 2024 / Revised: 4 November 2024 / Accepted: 8 November 2024 / Published: 11 November 2024
(This article belongs to the Section Biomedical Sensors)

Abstract
The accurate segmentation and classification of nuclei in histological images are crucial for the diagnosis and treatment of colorectal cancer. However, the aggregation of nuclei and intra-class variability in histology images present significant challenges for nuclei segmentation and classification. In addition, the imbalance among nuclei classes exacerbates the difficulty of nuclei classification and segmentation using deep learning models. To address these challenges, we present a novel attention-enhanced residual refinement network (AER-Net), which consists of one encoder and three decoder branches that share the same network structure. In addition to the nuclei instance segmentation branch and the nuclei classification branch, one branch predicts the vertical and horizontal distances from each pixel to its nuclear center, and its output is combined with that of the segmentation branch to improve the final segmentation results. AER-Net utilizes an attention-enhanced encoder module to focus on more valuable features. To further refine predictions and achieve more accurate results, an attention-enhanced residual refinement module is employed at the end of each decoder branch. Moreover, the coarse and refined predictions are combined in a loss function that employs cross-entropy loss and generalized dice loss to tackle the class imbalance among nuclei in histology images. Compared with other state-of-the-art methods on two colorectal cancer datasets and a pan-cancer dataset, AER-Net demonstrates outstanding performance, validating its effectiveness in nuclear segmentation and classification.

1. Introduction

Colorectal cancer is recognized as one of the most prevalent malignant tumors, with various factors, including genetic and environmental influences, potentially leading to its development [1]. The pathological analysis and diagnosis of colorectal cancer based on hematoxylin and eosin (H&E)-stained histology images have become the gold standard in clinical practice. However, manual pathological analysis not only depends on expert experience but is also time-consuming, labor-intensive, and inefficient. To overcome the difficulties of visually evaluating tissue sections, there is increasing interest in digital pathology (DP), where scanning devices are used to obtain digital whole slide images (WSIs) from glass tissue sections. This allows for the effective processing, analysis, and management of tissue specimens [2]. Each whole slide image contains numerous nuclei of different types. The precise segmentation and classification of the nuclei of interest play a vital role in improving the quantification of WSIs, offering critical information for tasks such as cancer diagnosis and the prediction of clinical outcomes [3,4]. Nuclei segmentation and classification mainly aim to distinguish different nuclei on the basis of morphological features such as shape, contour, size, color, and texture. This study focuses on the segmentation and classification of miscellaneous, inflammatory, epithelial, and spindle-shaped nuclei in pathological images of colorectal cancer.
With the advancement of deep learning technology, methods utilizing deep learning have exhibited outstanding performance in both the segmentation and classification of nuclei. Notably, Graham et al. [5] proposed a deep learning framework based on vertical and horizontal distance maps to simultaneously segment and classify instances in histopathological images. However, histopathological images contain numerous nuclear clusters, and it is difficult to accurately predict the nuclear boundaries and types in these regions. In addition, the number of nuclei varies significantly across types. Class imbalance in the segmentation and classification of nuclei in pathology images refers to this large difference in the number of nuclei of each type. For example, in the CoNIC data we used, the number of nuclei for each of the six types is shown in Figure 1 [6], indicating a serious imbalance between the classes. This class imbalance can adversely affect the performance of an algorithm, especially on classes with a small number of instances. In this paper, we propose a novel attention-enhanced residual refinement network (AER-Net), which consists of one encoder and three decoder branches that have the same network structure. AER-Net utilizes an attention-enhanced encoder module to focus on more valuable features. To further refine predictions and achieve more accurate results, an attention-enhanced residual refinement module is employed at the end of each decoder branch. Moreover, the coarse and refined predictions are combined in a loss function that employs cross-entropy loss and generalized dice loss to tackle the class imbalance among nuclei in histology images. We conducted experiments using three datasets, and our method demonstrated superior performance compared with other state-of-the-art approaches. The contributions of this study are summarized below:
1. To enhance the ability of the AER-Net encoder to extract nuclei features from pathology images, a spatial and channel attention enhancement encoder module is utilized to improve nuclei segmentation and classification performance.
2. To address the challenge of accurately predicting nuclear boundaries, we integrate an attention-enhanced residual refinement module into our nuclear segmentation and classification approach. This enhancement enables the model to produce more precise predictions, and experimental results confirm the effectiveness of this attention-enhanced residual refinement module.
3. A loss function that not only combines cross-entropy loss and generalized dice loss but also integrates coarse predictions and refined predictions is utilized to mitigate the class imbalance of nuclei and improve the overall results.
Figure 1. The number of nuclei per type in the CoNIC2022 dataset.

2. Related Work

2.1. Nuclei Segmentation Methods

For nuclei segmentation, there exist both traditional methods and deep learning-based approaches. Numerous methods based on traditional image processing techniques have been proposed for nuclear segmentation. Yang et al. [7] proposed a marker-controlled watershed based on mathematical morphology. It first utilizes a thresholding method to obtain markers and energy landscapes and then inputs them into the watershed algorithm to obtain segmentation results. Ali et al. [8] proposed a synergistic boundary and region-based active contouring model that combines an active contours approach with nuclear shape modeling. Consequently, it obtains the final segmentation result by a level-set method. Active contours is a classical image segmentation method. It is based on the mathematical idea called “energy minimization”, which iteratively adjusts the contour lines to better fit the boundary of the segmentation target [9]. The level-set method is a mathematical method that can be used to track the evolution of curves and surfaces and is widely used in the field of image segmentation.
However, the performance of these methods is highly dependent on the choice of parameters, and they are not highly generalizable. Deep learning models can automatically learn image features, and many excellent deep learning models have emerged in computer vision tasks. In recent years, deep learning models have made tremendous progress in the field of medical image processing. Ronneberger et al. [10] proposed U-Net, which has been successfully applied to various medical segmentation tasks. Inspired by UNet, Micro-Net [11] was proposed by Raza et al., which is an architecture for segmenting nuclei and glands in multi-resolution histology images. Chen et al. [12] proposed a deep contour-aware network (DCAN) with two independent branches for predicting instances and contours simultaneously. Zhou et al. [13] proposed CIA-Net, which utilizes a multi-level information aggregation module between two task-specific decoders. These decoders separately output nuclei and contour predictions, enabling the full utilization of the spatial and textural dependencies between nuclei and contours. Naylor et al. [14] used distance map regression for segmentation to split adjacent nuclei. He et al. proposed Mask-RCNN [15] for instance segmentation. It is a two-stage model that initially predicts the candidate regions and then inputs the proposed regions into the segmentation module to obtain the segmentation results. Currently, researchers have already applied Mask-RCNN, an instance segmentation model, to the task of nuclei segmentation [16].

2.2. Nuclei Classification Methods

Many previous nuclear classification methods involve two models, the first for nuclear detection and the second for nuclear classification. Yuan et al. [17] utilized the AdaBoost classifier to classify each nucleus based on intensity, morphology, and texture features. Wang et al. [18] proposed a mitotic detection cascade approach that combines CNN-extracted features and hand-crafted features to classify mitotic and anaphase nuclei. Sirinukunwattana et al. [19] proposed a Spatially Constrained Convolutional Neural Network to detect nuclei and a Neighborhood Ensemble Predictor to further predict the class of nuclei. Rahaman et al. [20] proposed a DL-based hybrid depth feature fusion (HDFF) technique for the accurate classification of cervical cytopathic cells. Lately, several methods have been proposed to segment and classify nuclei simultaneously.

2.3. Simultaneous Nuclei Segmentation and Classification Methods

In recent years, methods for the simultaneous nuclear segmentation and classification of pathology images have emerged in this field. Graham et al. [5] proposed a deep learning method that allows for the simultaneous segmentation and classification of nuclei. This method predicts horizontal and vertical distance maps from nuclear pixels to the center of mass of their nucleus. It obtains the results of nucleus instance segmentation via a watershed algorithm and predicts the class of each nucleus with a separate classification branch. Xiao et al. [21] proposed a scale and regional enhanced decoder network for nuclei classification in histology images, which improves classification accuracy by enhancing nuclear region information. Powerful representations in encoders are essential for the segmentation and classification of nuclei. However, the methods described above neither design an encoder specifically for these tasks nor use a prediction refinement module to improve nuclear segmentation and classification performance. In the segmentation and classification of nuclei in pathology images, class imbalance is the situation where some types of nuclei are much more numerous than others. This is very common in real pathology datasets and makes it challenging for a model to predict the nuclei of the less numerous types. The class imbalance problem in the nuclear classification task needs to be further addressed.

3. Methodology

The AER-Net employs a classical encoder–decoder architecture. The model comprises three decoder branches: a nuclear probability branch, a distance map regression branch, and a nuclear classification branch. All of these branches share an attention-augmented encoder, as illustrated in Figure 2. AER-Net initially predicts the distance map and probability map to segment the instances of nuclei, and it subsequently integrates the predictions from the classification prediction branch to predict the types of nuclei. AER-Net utilizes spatial and channel attention modules within the encoder to enhance the representational power of the extracted features. An attention-enhanced residual refinement module is added to the end of each branch of the decoder to achieve accurate nuclear boundaries and class predictions.

3.1. Encoder

The model employs an encoder module enhanced with channel and spatial attention, as illustrated in Figure 3. Inspired by ResNet [22] and CBAM [23], we propose a channel and spatial attention-enhanced residual (CSAR) encoder block, where n represents the number of residual units, and the corresponding n values for CSAR blocks 1 to 4 are 3, 4, 6, and 3, respectively. Specifically, we improve the ResNet-50 network by removing the initial max pooling module and setting the stride of the first convolutional layer to 1, aiming to obtain as many detailed features of the nuclei as possible. In addition, we add channel and spatial attention modules at the end of each stage of ResNet-50. The addition of the channel and spatial attention modules enhances the encoder’s representation, which is important for predicting nuclear contours and classes. The role of the channel attention module is to allow the model to dynamically adjust the weights of each channel based on the importance of the input features. The spatial attention module computes attention on the spatial dimension of the feature map to focus on regions of the feature map that are more important to the task. The attention mechanism allows the model to pay more attention to the feature information in the nuclear dense region and reduce the influence of other interfering information.
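As a rough illustration of the two attention steps described above, the following NumPy sketch applies a CBAM-style channel attention followed by a spatial attention to a single feature map. It is a minimal sketch under stated assumptions, not the AER-Net implementation: the function names, the reduction to a two-layer shared MLP, the 7 × 7 kernel, and the random weights are all hypothetical placeholders for parameters that a real network would learn.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight channels of feat (C, H, W) using pooled channel descriptors.

    w1 (C_reduced, C) and w2 (C, C_reduced) form a shared two-layer MLP
    (hypothetical stand-ins for learned weights).
    """
    avg = feat.mean(axis=(1, 2))  # global average pooling, shape (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling, shape (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    weights = sigmoid(mlp(avg) + mlp(mx))    # one weight in (0, 1) per channel
    return feat * weights[:, None, None], weights


def spatial_attention(feat, kernel):
    """Reweight spatial positions of feat (C, H, W).

    kernel (2, k, k) convolves the stacked channel-wise mean and max maps.
    """
    stacked = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    logits = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    mask = sigmoid(logits)                   # one weight in (0, 1) per pixel
    return feat * mask[None], mask


# Toy usage with random (untrained) weights.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 6, 6))
w1 = rng.normal(size=(2, 8)) * 0.1
w2 = rng.normal(size=(8, 2)) * 0.1
kernel = rng.normal(size=(2, 7, 7)) * 0.1
out, ch_w = channel_attention(feat, w1, w2)
out, sp_mask = spatial_attention(out, kernel)
```

Applying the channel step before the spatial step follows the ordering used in CBAM [23]; whether AER-Net uses the same ordering inside each stage is an assumption here.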

3.2. Decoder

There are three decoder branches in our model: the nuclear probability branch, the distance map regression branch, and the nuclear classification branch, as shown in Figure 2. Inspired by HoverNet [5], we employ the same structure for the three decoder branches, which include upsampling operations and dense units [24]. The decoder block is illustrated in Figure 4, with dense units employed after the first and second upsampling, utilizing 8 and 4 dense units, respectively. We use skip connections to merge the features of the encoder. Specifically, we add the feature maps of each CSAR module in the encoder to the corresponding feature maps in the decoder. This approach makes better use of the detailed information in the low-level features and enables the decoder to better recover details of nuclei. It is worth noting that the feature maps provided by the CSAR module are augmented by the channel and spatial attention modules, so the CSAR module suppresses irrelevant information and enhances the features that are important for nuclear segmentation and classification. After each decoder block, we add an attention-enhanced residual refinement block to obtain refined prediction results.

3.3. The Attention-Enhanced Residual Refinement Module

The representation of different types of nuclei in pathological tissue images is variable. Tumor nuclei frequently appear as clusters, making the boundaries of these nuclei difficult to predict accurately. Inspired by the works of Qin et al. [25] and Hu et al. [26], we propose an attention-enhanced residual refinement module (ARRM) to address the above problem, as illustrated in Figure 5a. The ARRM is designed with an encoder–decoder architecture, where the encoder and decoder modules have different designs. The encoder is composed of a convolutional layer and four stages, and each stage includes a channel attention convolution module and a max pooling layer. As shown in Figure 5b, the channel attention convolution module is composed of a 3 × 3 convolution layer followed by batch normalization, a ReLU activation function, and a channel attention mechanism. Correspondingly, the decoder consists of four stages and a single 3 × 3 convolutional layer. Each stage in the decoder consists of a 3 × 3 convolutional layer, which is followed by batch normalization and a ReLU activation function. In the attention-enhanced residual refinement module, a channel attention convolution module is used as a bridging module between the encoder and decoder. The residual refinement module allows further adjustments to the coarse predictions to obtain more accurate refined predictions. The initial segmentation results may have rough boundaries and missing details. The refinement module can optimize the boundary of the segmented region by finer feature extraction.

3.4. Loss Function

The total loss function of the proposed network is the sum of the auxiliary loss $L_1$ multiplied by the weight $\alpha$ and the final loss $L_2$:
$$L = \alpha L_1 + L_2 \tag{1}$$
where $L_1$ is the loss calculated from the coarse predictions and $L_2$ is the loss calculated from the predictions refined by the ARRM. The model has three decoder branches, and each branch uses a different loss function:
$$L_i = L_{i,a} + L_{i,b} + L_{i,c}, \quad i \in \{1, 2\} \tag{2}$$
where $L_{i,a}$ is the loss function of the nuclear probability branch, $L_{i,b}$ is the loss function of the distance regression branch, and $L_{i,c}$ is the loss function of the nuclear classification branch.
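The way the coarse and refined losses are combined can be sketched in a few lines of Python. The dictionary keys and function names below are hypothetical, and the per-branch loss values are made-up numbers used only to show the arithmetic.

```python
def branch_sum(losses):
    """L_i = L_{i,a} + L_{i,b} + L_{i,c} for one set of predictions."""
    return losses["a"] + losses["b"] + losses["c"]


def total_loss(coarse, refined, alpha=1.0):
    """L = alpha * L_1 + L_2, with L_1 from coarse and L_2 from refined outputs."""
    return alpha * branch_sum(coarse) + branch_sum(refined)


# Illustrative per-branch values (not taken from any experiment).
coarse = {"a": 0.5, "b": 0.25, "c": 1.0}
refined = {"a": 0.25, "b": 0.125, "c": 0.5}
loss_alpha_1 = total_loss(coarse, refined, alpha=1.0)
loss_alpha_half = total_loss(coarse, refined, alpha=0.5)
```

Because the auxiliary term only supervises the coarse outputs, lowering `alpha` shifts the optimization toward the refined predictions without removing the supervision of the decoder outputs that feed the refinement module.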

3.4.1. Distance Regression Branch

The loss function $L_b$ of the distance map regression branch is as follows:
$$L_b = \beta_1 L_{mse} + \beta_2 L_{msge} \tag{3}$$
$$L_{mse} = \frac{1}{N} \sum_{i=1}^{N} \left( X_i - \hat{X}_i \right)^2 \tag{4}$$
$$L_{msge} = \frac{1}{m} \sum_{i \in M} \left( \nabla_x (X_{i,x}) - \nabla_x (\hat{X}_{i,x}) \right)^2 + \frac{1}{m} \sum_{i \in M} \left( \nabla_y (X_{i,y}) - \nabla_y (\hat{X}_{i,y}) \right)^2 \tag{5}$$
where $L_{mse}$ is the mean squared error loss, $L_{msge}$ is the gradient mean squared error loss, and $\beta_1$ and $\beta_2$ are the corresponding weights of $L_{mse}$ and $L_{msge}$. $\beta_1$ and $\beta_2$ are 1 by default. $\hat{X}_i$ denotes the ground truth of the horizontal and vertical distances of the nuclear pixels to their corresponding centers of mass, and $X_i$ is the horizontal and vertical distance map predicted by the network. $\nabla_x$ and $\nabla_y$ represent the horizontal and vertical gradients, respectively. $M$ represents the set that contains all nuclear pixels, and $m$ represents the number of nuclear pixels in the image.
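The two terms of the distance regression loss can be illustrated with NumPy as follows. This is a sketch under assumptions: the gradient operator is approximated with `np.gradient` (central differences), which may differ from the operator used by the authors, and the array layout (channel 0 horizontal, channel 1 vertical) and function names are hypothetical.

```python
import numpy as np


def mse_loss(pred, true):
    """Mean squared error over all pixels of both distance maps."""
    return float(np.mean((pred - true) ** 2))


def msge_loss(pred, true, nuclear_mask):
    """Gradient MSE restricted to nuclear pixels.

    pred, true: arrays of shape (2, H, W); channel 0 holds horizontal
    distances and channel 1 vertical distances (assumed layout).
    nuclear_mask: boolean array (H, W) marking the pixel set M.
    """
    m = int(nuclear_mask.sum())
    if m == 0:
        return 0.0
    # Horizontal gradient of the horizontal map, vertical gradient of the
    # vertical map; central differences stand in for the gradient operator.
    gx_diff = np.gradient(pred[0], axis=1) - np.gradient(true[0], axis=1)
    gy_diff = np.gradient(pred[1], axis=0) - np.gradient(true[1], axis=0)
    term_x = np.sum((gx_diff ** 2)[nuclear_mask]) / m
    term_y = np.sum((gy_diff ** 2)[nuclear_mask]) / m
    return float(term_x + term_y)


def distance_branch_loss(pred, true, nuclear_mask, beta1=1.0, beta2=1.0):
    """L_b = beta1 * L_mse + beta2 * L_msge."""
    return beta1 * mse_loss(pred, true) + beta2 * msge_loss(pred, true, nuclear_mask)


# Toy nucleus centred at (2, 2) on a 5 x 5 grid.
ys, xs = np.mgrid[0:5, 0:5]
true_maps = np.stack([(xs - 2) / 2.0, (ys - 2) / 2.0])
mask = np.ones((5, 5), dtype=bool)
loss_exact = distance_branch_loss(true_maps, true_maps, mask)
loss_offset = distance_branch_loss(true_maps + 0.5, true_maps, mask)
```

A constant offset in the prediction is penalized only by the MSE term, because its gradient matches that of the ground truth; the gradient term instead reacts to errors in how the distances change from pixel to pixel, which is where adjacent nuclei are separated.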

3.4.2. Nuclear Probability Branch

The loss function $L_a$ of the nuclear probability branch is as follows:
$$L_a = \beta_3 L_{ce} + \beta_4 L_{dice} \tag{6}$$
$$L_{dice} = 1 - \frac{2 \sum_{i=1}^{N} (X_i \hat{X}_i) + \varepsilon}{\sum_{i=1}^{N} X_i + \sum_{i=1}^{N} \hat{X}_i + \varepsilon} \tag{7}$$
$$L_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} X_{i,m} \log \hat{X}_{i,m} \tag{8}$$
where $\hat{X}$ is defined as the ground truth, $X$ is the predicted outcome, $i$ is the pixel of $X$ and $\hat{X}$, and $\varepsilon$ is a constant set to 0.001. $L_{ce}$ and $L_{dice}$ denote cross-entropy loss and dice loss, respectively. $\beta_3$ is the weight of $L_{ce}$ and $\beta_4$ is the weight of $L_{dice}$. $\beta_3$ and $\beta_4$ are 1 by default.
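A small NumPy sketch of the cross-entropy and dice terms is given below. It assumes the usual convention in which the one-hot ground truth weights the logarithm of the predicted probability, and it computes the dice term on the nucleus (foreground) channel only; both choices, as well as the function names and the `(N, 2)` array layout, are assumptions made for illustration.

```python
import numpy as np


def dice_loss(pred, true, eps=1e-3):
    """1 - (2 * overlap + eps) / (sum(pred) + sum(true) + eps)."""
    inter = np.sum(pred * true)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(true) + eps))


def cross_entropy_loss(pred, true, tiny=1e-7):
    """Pixel-averaged cross-entropy; pred and true have shape (N, M)."""
    pred = np.clip(pred, tiny, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(true * np.log(pred), axis=1)))


def probability_branch_loss(pred, true, beta3=1.0, beta4=1.0):
    """L_a = beta3 * L_ce + beta4 * L_dice.

    Column 0 is background and column 1 is nucleus (assumed layout);
    the dice term uses the nucleus column only.
    """
    return (beta3 * cross_entropy_loss(pred, true)
            + beta4 * dice_loss(pred[:, 1], true[:, 1]))


# Four pixels: two nucleus pixels, two background pixels.
true_prob = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
loss_perfect = probability_branch_loss(true_prob.copy(), true_prob)
loss_uncertain = probability_branch_loss(np.full((4, 2), 0.5), true_prob)
```

The cross-entropy term scores each pixel independently, while the dice term scores the overlap of the whole predicted region, so the sum penalizes both per-pixel uncertainty and region-level mismatch.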

3.4.3. Nuclear Classification Branch

In the classification branch, we utilize a loss function that combines cross-entropy loss and generalized dice loss. The class imbalance in nuclear segmentation and classification datasets makes it challenging to detect nuclei in classes with a small number of instances. Although Graham et al. [5] use the sum of cross-entropy and dice loss as the loss function in the nuclei classification branch, addressing class imbalance remains a challenge. We therefore use the sum of the cross-entropy loss and the generalized dice loss in the classification branch. The loss function $L_c$ of the nuclear classification branch is as follows:
$$L_c = \beta_5 L_{ce} + \beta_6 GDL \tag{9}$$
$$GDL = 1 - 2 \, \frac{\sum_{l=1}^{K} w_l \sum_{n} r_{ln} p_{ln}}{\sum_{l=1}^{K} w_l \sum_{n} (r_{ln} + p_{ln})} \tag{10}$$
$$w_l = \frac{1}{\left( \sum_{n=1}^{N} r_{ln} \right)^2} \tag{11}$$
where $L_{ce}$ is the cross-entropy loss function, $GDL$ is the generalized dice loss [27], and $\beta_5$ and $\beta_6$ are the corresponding weights. $\beta_5$ and $\beta_6$ are 1 by default. $K$ is the number of classes and $w_l$ is the weight of class $l$, as shown in Equation (11). $r_{ln}$ represents the true label value, and $p_{ln}$ represents the predicted probability value. Since GDL assigns higher weights to classes with fewer nuclei, it makes the model pay more attention to the minority classes, thus helping to address the problem of class imbalance.
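To make the effect of the class weights concrete, the sketch below evaluates the generalized dice loss on a toy example with one frequent and one rare class. The `(K, N)` array layout, the function name, and the small stabilizing constant `eps` are assumptions added for illustration; they are not specified in the text.

```python
import numpy as np


def generalized_dice_loss(pred, true, eps=1e-7):
    """GDL with per-class weights w_l = 1 / (sum_n r_ln)^2.

    pred, true: arrays of shape (K, N) for K classes and N pixels.
    eps guards against division by zero for classes absent from `true`
    (an added safeguard; such classes then receive a very large weight).
    """
    w = 1.0 / (np.sum(true, axis=1) ** 2 + eps)
    numerator = np.sum(w * np.sum(true * pred, axis=1))
    denominator = np.sum(w * np.sum(true + pred, axis=1))
    return float(1.0 - 2.0 * numerator / (denominator + eps))


def one_hot(labels, num_classes):
    """Convert a label vector of length N to a (K, N) one-hot array."""
    return np.eye(num_classes)[:, labels]


# Ten pixels: eight of the frequent class 0, two of the rare class 1.
truth = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
miss_rare = truth.copy()
miss_rare[9] = 0      # one rare pixel predicted as the frequent class
miss_frequent = truth.copy()
miss_frequent[0] = 1  # one frequent pixel predicted as the rare class

r = one_hot(truth, 2)
loss_miss_rare = generalized_dice_loss(one_hot(miss_rare, 2), r)
loss_miss_frequent = generalized_dice_loss(one_hot(miss_frequent, 2), r)
```

Both predictions contain exactly one wrong pixel, yet losing a pixel of the rare class yields the larger loss, which is the behavior the weighting in Equation (11) is meant to produce.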

3.5. Post-Processing

Inspired by HoverNet [5], we employ the following post-processing method. In the horizontal and vertical distance maps, the pixel differences between different cell nuclei are significant. Therefore, we compute the gradients of the predicted horizontal and vertical distance maps, and the generated gradient maps can highlight the regions where there are significant differences between neighboring pixels.
$$S_m = \max \left( H_x(p_x), H_y(p_y) \right) \tag{12}$$
where $p_x$ and $p_y$ are the horizontal and vertical distance maps output by the distance regression branch, and $H_x$ and $H_y$ are the horizontal and vertical components of the Sobel operator. $S_m$ highlights the regions where there are significant differences between neighboring pixels in the horizontal and vertical directions. The markers $M$ and energy landscape $E$ are computed as follows:
$$M = \sigma \left( \tau(q, h) - \tau(S_m, k) \right) \tag{13}$$
$$E = \left[ 1 - \tau(S_m, k) \right] \ast \tau(q, h) \tag{14}$$
where $q$ is the prediction result of the nuclear probability branch. $\tau(a, b)$ is the threshold function operating on the matrix $a$, which outputs 1 when the element is greater than $b$ and 0 otherwise. $h$ and $k$ are set according to experience. $\sigma$ changes all negative values to 0. After obtaining $M$ and $E$, nuclear instance segmentation results can be obtained using the marker-controlled watershed algorithm. Finally, the instance segmentation results are combined with the output of the nuclei classification branch, and a majority vote over the predicted classes of the pixels within each nucleus instance gives the final nuclei segmentation and classification results.
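The thresholding steps and the final majority vote can be sketched with NumPy as below. The sketch takes $S_m$ and the instance map as given inputs (the Sobel filtering and the watershed step are omitted), and the threshold values 0.5 and 0.4, the function names, and the choice to ignore background votes are hypothetical assumptions, since the text only states that the thresholds are set according to experience.

```python
import numpy as np


def tau(a, b):
    """Threshold function: 1 where a > b, 0 otherwise."""
    return (a > b).astype(np.int64)


def sigma(a):
    """Set all negative values to 0."""
    return np.maximum(a, 0)


def markers_and_energy(q, s_m, h, k):
    """Compute the markers M and the energy landscape E from q and S_m."""
    foreground = tau(q, h)
    boundary = tau(s_m, k)
    markers = sigma(foreground - boundary)
    energy = (1 - boundary) * foreground
    return markers, energy


def majority_vote(instance_map, class_map):
    """Assign each instance the most frequent non-background class inside it."""
    result = {}
    for inst_id in np.unique(instance_map):
        if inst_id == 0:  # 0 is treated as background (assumption)
            continue
        labels = class_map[instance_map == inst_id]
        labels = labels[labels > 0]  # ignore background votes (assumption)
        if labels.size == 0:
            result[int(inst_id)] = 0
            continue
        values, counts = np.unique(labels, return_counts=True)
        result[int(inst_id)] = int(values[np.argmax(counts)])
    return result


# Two touching nuclei separated by a column of high gradient.
q = np.array([[0.9, 0.9, 0.9, 0.1],
              [0.9, 0.9, 0.9, 0.1]])
s_m = np.array([[0.1, 0.8, 0.1, 0.0],
                [0.1, 0.8, 0.1, 0.0]])
markers, energy = markers_and_energy(q, s_m, h=0.5, k=0.4)

instance_map = np.array([[1, 1, 2, 2],
                         [1, 1, 2, 0]])
class_map = np.array([[3, 3, 1, 1],
                      [3, 2, 1, 0]])
types = majority_vote(instance_map, class_map)
```

In the toy example the high-gradient column splits the foreground into two marker regions, one per nucleus, which is what lets the watershed step separate touching nuclei.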

4. Experiments

4.1. Datasets

Three public datasets, namely CoNSep [5], Lizard [28], and PanNuke [29,30], are used to validate the performance of AER-Net. CoNSep and Lizard are colorectal cancer datasets, while PanNuke is a pan-cancer dataset containing a variety of different cancer types.
The CoNSep dataset is a colorectal nuclear segmentation and phenotypes dataset, consisting of 41 H&E stained images scanned at 40 × magnification. Each image in the dataset has a size of 1000 × 1000 pixels. Referring to the setup of the paper [5], the nuclear classes in the CoNSep dataset are divided into four classes which are epithelial, inflammatory, spindle-shaped, and miscellaneous. The training set consists of 27 images, while the test set comprises 14 images.
The Lizard dataset is the largest known dataset for nuclear segmentation and classification, comprising histology image regions of colon tissue from six different data sources. The dataset images are at 20 × magnification and provide nuclear classification labels. The dataset is labeled with six different types of nuclei including neutrophil, epithelial, lymphocyte, plasma, eosinophil, and connective. For the Lizard dataset, we use the data provided in CoNIC 2022 [6], which contains 4981 images with a size of 256 × 256 pixels. In order to better evaluate AER-Net, we divided 4981 images into 2988 as a training set, 996 as a validation set, and 997 as a test set by stratified random sampling.
The PanNuke dataset contains H&E staining images of 19 different tissue types with the nuclear classes divided into five classes. The dataset is organized into three folds, comprising 2656, 2523, and 2722 images, respectively. Each image has a size of 256 × 256 pixels. We use fold1 as the training set, fold2 as the validation set, and fold3 as the test set.

4.2. Evaluation Metrics

4.2.1. Nuclear Instance Segmentation Evaluation

The evaluation metric for instance segmentation is the Panoptic Quality (PQ), which was originally proposed by Kirillov et al. [31]:
$$PQ = \underbrace{\frac{|TP|}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|}}_{\text{Detection Quality (DQ)}} \times \underbrace{\frac{\sum_{(x,y) \in TP} IoU(x, y)}{|TP|}}_{\text{Segmentation Quality (SQ)}} \tag{15}$$
where $x$ is the ground truth, $y$ is the predicted outcome, and $IoU$ is the intersection over union. If $IoU(x, y) > 0.5$, it is assumed that $x$ and $y$ are a unique matching pair. $TP$ denotes matched pairs, $FN$ denotes unmatched ground truth (GT) segments, and $FP$ denotes unmatched predicted segments. Panoptic Quality (PQ) can be interpreted as the product of Detection Quality (DQ) and Segmentation Quality (SQ).
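As a worked example of the metric, the following Python sketch computes DQ, SQ, and PQ from a matrix of pairwise IoU values. The function name and the input format (one row per ground-truth segment, one column per predicted segment, at least one ground-truth row) are assumptions chosen for illustration.

```python
def panoptic_quality(iou, threshold=0.5):
    """Return (DQ, SQ, PQ) from iou[g][p], the IoU of GT g and prediction p.

    Pairs with IoU above the threshold count as matches; with a threshold
    of 0.5 each segment can belong to at most one such pair.
    """
    num_gt = len(iou)
    num_pred = len(iou[0]) if num_gt else 0
    matched = [v for row in iou for v in row if v > threshold]
    tp = len(matched)
    fp = num_pred - tp  # predictions without a match
    fn = num_gt - tp    # ground-truth segments without a match
    if tp == 0:
        return 0.0, 0.0, 0.0
    dq = tp / (tp + 0.5 * fp + 0.5 * fn)
    sq = sum(matched) / tp
    return dq, sq, dq * sq


# Three GT nuclei and three predictions; two pairs overlap well enough.
iou_matrix = [
    [0.8, 0.0, 0.0],
    [0.0, 0.6, 0.1],
    [0.0, 0.0, 0.3],
]
dq, sq, pq = panoptic_quality(iou_matrix)
```

Here two pairs are matched, one prediction and one ground-truth nucleus remain unmatched, so DQ is 2/3, SQ is the mean matched IoU of 0.7, and PQ is their product.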

4.2.2. Nuclear Classification Evaluation

We use the detection metric $F_d$ and the classification metric $F_t$ for the evaluation of the nuclei classification task. $F_d$ is defined as follows:
$$F_d = \frac{2 TP_d}{2 TP_d + FP_d + FN_d} \tag{16}$$
where $TP_d$ denotes correctly detected instances, $FN_d$ denotes misdetected GT instances, and $FP_d$ denotes over-detected instances. $TP_d$ is further divided into correctly classified type $t$ instances ($TP_t$), correctly classified non-type $t$ instances ($TN_t$), incorrectly classified type $t$ instances ($FP_t$), and incorrectly classified non-type $t$ instances ($FN_t$), which are used to compute $F_t$ for the classification task.
$$F_t = \frac{2 (TP_t + TN_t)}{2 (TP_t + TN_t) + 2 FP_t + 2 FN_t + FP_d + FN_d} \tag{17}$$
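The two scores can be evaluated directly from the instance counts, as in the short Python sketch below. The function names and the counts in the example are hypothetical and only illustrate the arithmetic.

```python
def detection_f1(tp_d, fp_d, fn_d):
    """F_d = 2 TP_d / (2 TP_d + FP_d + FN_d)."""
    denom = 2 * tp_d + fp_d + fn_d
    return 2 * tp_d / denom if denom else 0.0


def type_f1(tp_t, tn_t, fp_t, fn_t, fp_d, fn_d):
    """F_t for one nucleus type t, including the detection errors."""
    correct = 2 * (tp_t + tn_t)
    denom = correct + 2 * fp_t + 2 * fn_t + fp_d + fn_d
    return correct / denom if denom else 0.0


# Hypothetical counts: 8 detected nuclei, 2 over-detections, 2 misses.
# Of the 8 detections, 3 + 4 are classified correctly with respect to
# type t and 1 is classified incorrectly.
f_d = detection_f1(tp_d=8, fp_d=2, fn_d=2)
f_t = type_f1(tp_t=3, tn_t=4, fp_t=1, fn_t=0, fp_d=2, fn_d=2)
```

Because $FP_d$ and $FN_d$ appear in the denominator of $F_t$, a model with perfect classification of the detected nuclei is still penalized for nuclei it misses or hallucinates.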

4.3. Implementation Details

We implemented our framework using PyTorch on a workstation that has an NVIDIA GeForce RTX 3090 GPU (24 GB). In the experiments with the CoNSep dataset, the input size of our model and HoverNet is 270 × 270 pixels, and the output size is 80 × 80 pixels. For the experiments on the Lizard dataset, the input and output sizes of both AER-Net and HoverNet are 256 × 256 pixels. In the experiments on the PanNuke dataset, the input size of AER-Net and HoverNet is 256 × 256 pixels, and the output size is 164 × 164 pixels. During model training, various data augmentation methods such as flipping, Gaussian blurring, and median blurring are employed. The whole training process is divided into two stages, each with 50 epochs, and the same settings are applied in each stage. In the first stage, we do not freeze the encoder but train it directly. The batch size is four for both the first and second stages of the training process. The model parameters are initialized with weights pre-trained on the ImageNet [32] dataset. We use the Adam optimization method with an initial learning rate of 0.0001, which is reduced to 0.00001 after 25 epochs. For the loss function parameters in the training on the CoNSep dataset, we set $\beta_2$ to 2 and used default values for all other parameters. For training on the Lizard and PanNuke datasets, the loss function parameters all use default values. Note that for the PanNuke dataset, the ARRM utilizes only two pooling operations because the output of the model for the PanNuke dataset is 164 × 164 pixels.
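The learning-rate schedule described above amounts to a single step decay, which can be written as a small Python function. The function name is hypothetical, and two details are assumptions: that epochs are counted from 0 so the reduced rate first applies at epoch index 25, and that the schedule restarts in the second stage because the same settings are applied in each stage.

```python
def learning_rate(epoch, base_lr=1e-4, drop_epoch=25, factor=0.1):
    """Step decay: base_lr before drop_epoch, base_lr * factor afterwards."""
    return base_lr * factor if epoch >= drop_epoch else base_lr


EPOCHS_PER_STAGE = 50
NUM_STAGES = 2

# One schedule per stage; the same schedule is reused in both stages.
stage_schedule = [learning_rate(e) for e in range(EPOCHS_PER_STAGE)]
full_schedule = stage_schedule * NUM_STAGES
```

In PyTorch the same behavior is commonly obtained with a step-based scheduler attached to the Adam optimizer; the plain function is used here only to keep the sketch free of framework dependencies.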

4.4. Experimental Results

We compare the AER-Net with other state-of-the-art methods on the CoNSep dataset, as illustrated in Table 1. The experimental results for the Lizard dataset are shown in Table 2. In addition, we compared the baseline (HoverNet) with the AER-Net on the PanNuke dataset to validate the predictive ability of our model on different tissues.
Firstly, experiments were conducted on the CoNSep dataset, and the results in Table 1 reveal that AER-Net achieved the best performance on the CoNSep dataset. AER-Net achieved the highest PQ of 0.556, which is 3.5% higher than the second highest value. This result indicates that AER-Net achieves excellent segmentation performance on smaller datasets such as CoNSep. In the CoNSep dataset, there are four types of nuclei: miscellaneous, inflammatory, epithelial, and spindle-shaped nuclei, corresponding to $F_m$, $F_i$, $F_e$, and $F_s$ in Table 1. The number of nuclei in the miscellaneous class is lower, and hence $F_m$ is lower for many methods. Notably, AER-Net still performs well in classifying miscellaneous nuclei, achieving an $F_m$ 16.2% higher than the result of HoverNet. This indicates that our proposed method can effectively address the class imbalance of nuclei instances in the task of nuclei segmentation and classification. The comparison between AER-Net and HoverNet results is depicted in Figure 6, where it can be observed that the segmentation results and the predictions of nuclear classification are more accurate in AER-Net compared with HoverNet.
We conducted experiments on a larger dataset, Lizard, and the experimental results are presented in Table 2. $F_{ne}$, $F_{ep}$, $F_l$, $F_p$, $F_{eo}$, and $F_c$ denote the $F_1$ classification scores for neutrophil, epithelial, lymphocyte, plasma, eosinophil, and connective nuclei, respectively. The experimental results show that AER-Net achieves a PQ of 0.608. It can be observed that AER-Net demonstrates the best performance in terms of $F_d$, $F_{ne}$, $F_{ep}$, $F_l$, $F_p$, $F_{eo}$, and $F_c$. In the Lizard dataset, the number of nuclei in the eosinophil and neutrophil classes is significantly smaller compared with the nuclei in the other classes. It is noteworthy that AER-Net demonstrated significant improvement in the eosinophil and neutrophil classes with a 3.1% improvement in $F_{eo}$ and a 6.0% improvement in $F_{ne}$. This further validates the potential of AER-Net for nuclear classification. Figure 7 illustrates the results of AER-Net compared with HoverNet. It can be observed that despite the numerous challenges in the image, AER-Net still achieves excellent results.
A comparison of Table 1 and Table 2 reveals that AER-Net also achieves better results on the nuclei segmentation and classification tasks as the dataset size increases. In the small-scale dataset CoNSep, the number of samples is limited, and AER-Net can utilize the advantages of the residual refinement module to significantly improve the results of nuclei segmentation and classification. In large-scale datasets, where the diversity and complexity of data increase, AER-Net still performs well. Although the improvement in segmentation performance is not obvious on the large-scale Lizard dataset, the classification performance is still significantly improved, which shows that AER-Net can adapt to data of different scales and complexity with stronger generalization ability.
In order to further validate the segmentation and classification ability of AER-Net for nuclei of different tissues, we performed experiments on the PanNuke dataset, as shown in Table 3. $F_n$, $F_i$, $F_c$, $F_{dead}$ and $F_e$ denote the $F_1$ classification scores for neoplastic, inflammatory, connective tissue, dead and non-neoplastic nuclei, respectively. It can be seen that AER-Net achieved the highest PQ of 0.644, indicating that AER-Net is also competitive in nuclear instance segmentation across different tissues. Moreover, as depicted in Table 3, it can be observed that AER-Net demonstrates improved performance in nuclear classification compared with HoverNet. In conclusion, AER-Net still achieves excellent performance in nuclear segmentation and classification in different tissues. The $F_1$ scores take into account the type of nuclei, so while the improvements in the PQ and $F_d$ metrics are not significant, the improvement in the $F_1$ scores for each class also demonstrates the enhancement of AER-Net in the overall task of nuclear segmentation and classification.

4.5. Ablation Study

To analyze the contribution of each module, we further conducted ablation experiments on the CoNSeP dataset. We used the same experimental settings for each method in the ablation experiments to ensure a fair comparison. Baseline denotes HoverNet, AEncoder denotes the baseline with an encoder that uses the CSAR module, and ARRM denotes the baseline with the ARRM module. AEncoder & ARRM is our model, AER-Net, which includes both CSAR and ARRM. The results are presented in Table 4. A powerful encoder is essential for nuclear segmentation and classification, and the results show that the channel and spatial attention-enhanced encoder significantly improves the predictive performance of the model, raising $PQ$ from 0.521 to 0.551. The best overall performance is observed when both the attention-enhanced encoder and the attention-enhanced residual refinement module are used.
We also performed ablation experiments on the weight α of the auxiliary loss in the loss function, where $L_1$ denotes the auxiliary loss. The results are shown in Table 5; the best performance is achieved when α is set to 1. Note that the weight of the auxiliary loss is a hyperparameter and should be tuned for each dataset.
In addition, we varied the weights of the loss function for each branch in Equation (2). The results are shown in Table 6. The best results are obtained when the weights of $L_a$, $L_b$, and $L_c$ are all 1, which is the setting we used in the comparative experiments and the independent validation.

4.6. Independent Validation

Differences between datasets make it difficult to apply a model developed on one dataset to another, so we evaluated the generalizability of AER-Net across datasets. DigestPath and CRAG are two subsets of the Lizard dataset. CRAG contains image regions extracted from whole-slide images (WSIs) from University Hospitals Coventry and Warwickshire (UHCW). DigestPath contains image regions of biopsy samples from four different hospitals in China. Inspired by the work of Xiao et al. [21], we first divide CRAG into training and validation sets for training and use DigestPath as the test set. Then, we divide DigestPath into training and validation sets for training and use CRAG as the test set. The results are presented in Table 7 and Table 8. Our method achieved the best overall results in both sets of experiments, the only exception being $F_{ne}$ in Table 7 (0.013 versus 0.021 for HoverNet). Furthermore, it is worth noting that the improvement of our method is more pronounced when experimenting on datasets with less data.
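The cross-dataset protocol described above can be sketched as follows. This is a minimal illustration only: the validation fraction, the random seed, the function name, and the placeholder image identifiers are all assumptions, since the split ratio is not stated in this section.

```python
import random


def cross_dataset_runs(subsets, val_fraction=0.2, seed=0):
    """Build one run per ordered (source, target) pair of subsets.

    The source subset is split into training and validation parts; the
    target subset is held out entirely and used only for testing.
    `val_fraction` is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    runs = []
    for source, items in subsets.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * val_fraction))
        for target, target_items in subsets.items():
            if target == source:
                continue
            runs.append({
                "train_on": source,
                "test_on": target,
                "train": shuffled[n_val:],
                "val": shuffled[:n_val],
                "test": list(target_items),
            })
    return runs


# Placeholder identifiers standing in for the Lizard image regions.
subsets = {
    "CRAG": [f"crag_{i:02d}" for i in range(10)],
    "DigestPath": [f"digestpath_{i:02d}" for i in range(6)],
}
runs = cross_dataset_runs(subsets)
for run in runs:
    print(run["train_on"], "->", run["test_on"],
          len(run["train"]), len(run["val"]), len(run["test"]))
```

Keeping the target subset out of both training and validation is what makes the comparison in Table 7 and Table 8 a test of generalization rather than of in-distribution accuracy.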

5. Conclusions

The accurate segmentation and classification of nuclei in histology images are crucial for providing valuable insights for the diagnosis and prognostic analysis of colorectal cancer. Nevertheless, histology images often contain numerous clusters of nuclei, which makes it difficult to predict their boundaries accurately. Moreover, the class imbalance among nuclei presents another significant challenge in the field. In this study, we introduce AER-Net, a method designed to achieve the precise segmentation and classification of nuclei. Firstly, we utilize an attention-enhanced encoder module to enhance feature extraction within the encoder. Secondly, we incorporate ARRM modules into each decoder branch to refine the initial coarse results into more accurate predictions. Additionally, we address the challenge of class imbalance among nuclei in pathology images by implementing a loss function that combines cross-entropy loss and generalized Dice loss. Experimental results show that AER-Net performs well in nuclear segmentation and classification with excellent generalization ability. In the future, the model performance can be further improved by combining the AER-Net method with unsupervised methods to utilize more pathology image data.

Author Contributions

Conceptualization, R.C., Q.M. and D.T.; methodology, R.C., Q.M., D.T., P.W., Y.D. and C.Z.; software, Q.M.; validation, R.C., Q.M. and D.T.; writing (original draft preparation), Q.M.; writing (review and editing), R.C., Q.M. and D.T.; project administration, R.C. and C.Z.; supervision, R.C., Y.D. and C.Z.; funding acquisition, R.C., D.T. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants (62373001, 62303014), in part by the State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia Fund (SKL-HIDCA-2024-AH4), in part by the University Synergy Innovation Program of Anhui Province under Grant (GXXT-2021-030), in part by the National Key Research and Development Program of China under Grant (2020YFA0908700), in part by the Anhui Provincial Natural Science Foundation (2308085QF225), and in part by the Education Department of Anhui Province (2023AH050089, 2023AH050061).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sawicki, T.; Ruszkowska, M.; Danielewicz, A.; Niedźwiedzka, E.; Arłukowicz, T.; Przybyłowicz, K.E. A review of colorectal cancer in terms of epidemiology, risk factors, development, symptoms and diagnosis. Cancers 2021, 13, 2025.
  2. Niazi, M.K.K.; Parwani, A.V.; Gurcan, M.N. Digital pathology and artificial intelligence. Lancet Oncol. 2019, 20, e253–e261.
  3. Alsubaie, N.; Sirinukunwattana, K.; Raza, S.E.A.; Snead, D.; Rajpoot, N. A bottom-up approach for tumour differentiation in whole slide images of lung adenocarcinoma. In Proceedings of the Medical Imaging 2018: Digital Pathology, Houston, TX, USA, 10–15 February 2018; SPIE: Bellingham, WA, USA, 2018; Volume 10581, pp. 104–113.
  4. Lu, C.; Romo-Bucheli, D.; Wang, X.; Janowczyk, A.; Ganesan, S.; Gilmore, H.; Rimm, D.; Madabhushi, A. Nuclear shape and orientation features from H&E images predict survival in early-stage estrogen receptor-positive breast cancers. Lab. Investig. 2018, 98, 1438–1448.
  5. Graham, S.; Vu, Q.D.; Raza, S.E.A.; Azam, A.; Tsang, Y.W.; Kwak, J.T.; Rajpoot, N. Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 2019, 58, 101563.
  6. Graham, S.; Jahanifar, M.; Vu, Q.D.; Hadjigeorghiou, G.; Leech, T.; Snead, D.; Raza, S.E.A.; Minhas, F.; Rajpoot, N. Conic: Colon nuclei identification and counting challenge 2022. arXiv 2021, arXiv:2111.14485.
  7. Yang, X.; Li, H.; Zhou, X. Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy. IEEE Trans. Circuits Syst. I Regul. Pap. 2006, 53, 2405–2414.
  8. Ali, S.; Madabhushi, A. An integrated region-, boundary-, shape-based active contour for multiple object overlap resolution in histological imagery. IEEE Trans. Med. Imaging 2012, 31, 1448–1460.
  9. Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vis. 1988, 1, 321–331.
  10. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
  11. Raza, S.E.A.; Cheung, L.; Shaban, M.; Graham, S.; Epstein, D.; Pelengaris, S.; Khan, M.; Rajpoot, N.M. Micro-Net: A unified model for segmentation of various objects in microscopy images. Med. Image Anal. 2019, 52, 160–173.
  12. Chen, H.; Qi, X.; Yu, L.; Heng, P.A. DCAN: Deep contour-aware networks for accurate gland segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2487–2496.
  13. Zhou, Y.; Onder, O.F.; Dou, Q.; Tsougenis, E.; Chen, H.; Heng, P.A. Cia-net: Robust nuclei instance segmentation with contour-aware information aggregation. In Proceedings of the Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, 2–7 June 2019; Proceedings 26. Springer: Berlin/Heidelberg, Germany, 2019; pp. 682–693.
  14. Naylor, P.; Laé, M.; Reyal, F.; Walter, T. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Trans. Med. Imaging 2018, 38, 448–459.
  15. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  16. Lv, G.; Wen, K.; Wu, Z.; Jin, X.; An, H.; He, J. Nuclei R-CNN: Improve mask R-CNN for nuclei segmentation. In Proceedings of the 2019 IEEE 2nd International Conference on Information Communication and Signal Processing (ICICSP), Weihai, China, 28–30 September 2019; pp. 357–362.
  17. Yuan, Y.; Failmezger, H.; Rueda, O.M.; Ali, H.R.; Gräf, S.; Chin, S.F.; Schwarz, R.F.; Curtis, C.; Dunning, M.J.; Bardwell, H.; et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 2012, 4, 157ra143.
  18. Wang, H.; Cruz-Roa, A.; Basavanhally, A.; Gilmore, H.; Shih, N.; Feldman, M.; Tomaszewski, J.; Gonzalez, F.; Madabhushi, A. Cascaded ensemble of convolutional neural networks and handcrafted features for mitosis detection. In Proceedings of the Medical Imaging 2014: Digital Pathology, San Diego, CA, USA, 15–20 February 2014; SPIE: Bellingham, WA, USA, 2014; Volume 9041, pp. 66–75.
  19. Sirinukunwattana, K.; Raza, S.E.A.; Tsang, Y.W.; Snead, D.R.; Cree, I.A.; Rajpoot, N.M. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 2016, 35, 1196–1206.
  20. Rahaman, M.M.; Li, C.; Yao, Y.; Kulwa, F.; Wu, X.; Li, X.; Wang, Q. DeepCervix: A deep learning-based framework for the classification of cervical cells using hybrid deep feature fusion techniques. Comput. Biol. Med. 2021, 136, 104649.
  21. Xiao, S.; Qu, A.; Zhong, H.; He, P. A scale and region-enhanced decoding network for nuclei classification in histology image. Biomed. Signal Process. Control 2023, 83, 104626.
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  23. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  24. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
  25. Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. Basnet: Boundary-aware salient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7479–7489.
  26. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  27. Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Jorge Cardoso, M. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Québec City, QC, Canada, 14 September 2017; Held in Conjunction with MICCAI 2017; Proceedings 3. Springer: Berlin/Heidelberg, Germany, 2017; pp. 240–248.
  28. Graham, S.; Jahanifar, M.; Azam, A.; Nimir, M.; Tsang, Y.W.; Dodd, K.; Hero, E.; Sahota, H.; Tank, A.; Benes, K.; et al. Lizard: A large-scale dataset for colonic nuclear instance segmentation and classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 684–693.
  29. Gamper, J.; Alemi Koohbanani, N.; Benet, K.; Khuram, A.; Rajpoot, N. PanNuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification. In Proceedings of the Digital Pathology: 15th European Congress, ECDP 2019, Warwick, UK, 10–13 April 2019; Proceedings 15. Springer: Berlin/Heidelberg, Germany, 2019; pp. 11–19.
  30. Gamper, J.; Koohbanani, N.A.; Benes, K.; Graham, S.; Jahanifar, M.; Khurram, S.A.; Azam, A.; Hewitt, K.; Rajpoot, N. Pannuke dataset extension, insights and baselines. arXiv 2020, arXiv:2003.10778.
  31. Kirillov, A.; He, K.; Girshick, R.; Rother, C.; Dollár, P. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9404–9413.
  32. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
Figure 2. Illustration of the overall architecture.
Figure 3. Illustration of the CSAR block.
Figure 4. The structure of the decoder block.
Figure 5. The structure of the attention-enhancing residual refinement module.
Figure 6. Example visualization results on the CoNSeP dataset.
Figure 7. Example visualization results on the Lizard dataset.
Table 1. Comparative experimental results on the CoNSeP dataset.

| Model | $F_d$ | $F_e$ | $F_i$ | $F_s$ | $F_m$ | PQ |
|---|---|---|---|---|---|---|
| SC-CNN [19] | 0.608 | 0.306 | 0.193 | 0.175 | 0.000 | - |
| DIST [14] | 0.712 | 0.617 | 0.534 | 0.505 | 0.000 | 0.372 |
| Micro-Net [11] | 0.743 | 0.615 | 0.592 | 0.532 | 0.117 | 0.430 |
| Mask-RCNN [15] | 0.692 | 0.595 | 0.590 | 0.520 | 0.098 | 0.450 |
| HoverNet [5] | 0.752 | 0.633 | 0.531 | 0.545 | 0.358 | 0.521 |
| AER-Net | 0.782 | 0.682 | 0.626 | 0.597 | 0.520 | 0.556 |

Table 2. Comparative experimental results on the Lizard dataset.

| Model | $F_d$ | $F_{ne}$ | $F_{ep}$ | $F_l$ | $F_p$ | $F_{eo}$ | $F_c$ | PQ |
|---|---|---|---|---|---|---|---|---|
| Unet [10] | 0.780 | 0.141 | 0.673 | 0.611 | 0.299 | 0.280 | 0.620 | 0.456 |
| Mask-RCNN [15] | 0.766 | 0.229 | 0.751 | 0.601 | 0.365 | 0.316 | 0.587 | 0.531 |
| HoverNet [5] | 0.821 | 0.220 | 0.760 | 0.638 | 0.396 | 0.349 | 0.620 | 0.607 |
| AER-Net | 0.825 | 0.280 | 0.785 | 0.649 | 0.421 | 0.380 | 0.654 | 0.608 |

Table 3. Comparative experimental results on the PanNuke dataset.

| Model | $F_d$ | $F_n$ | $F_i$ | $F_c$ | $F_{dead}$ | $F_e$ | PQ |
|---|---|---|---|---|---|---|---|
| HoverNet [5] | 0.791 | 0.561 | 0.472 | 0.405 | 0.160 | 0.406 | 0.643 |
| AER-Net | 0.791 | 0.593 | 0.493 | 0.402 | 0.201 | 0.500 | 0.644 |

Table 4. The results of different modules on the CoNSeP dataset.

| Model | $F_d$ | $F_e$ | $F_i$ | $F_s$ | $F_m$ | PQ |
|---|---|---|---|---|---|---|
| Baseline | 0.752 | 0.633 | 0.531 | 0.545 | 0.358 | 0.521 |
| AEncoder | 0.774 | 0.656 | 0.628 | 0.595 | 0.443 | 0.551 |
| ARRM | 0.763 | 0.637 | 0.575 | 0.569 | 0.382 | 0.546 |
| AEncoder & ARRM | 0.782 | 0.682 | 0.626 | 0.597 | 0.520 | 0.556 |

Table 5. The results of experiments with different α on the CoNSeP dataset.

| Loss | $F_d$ | $F_e$ | $F_i$ | $F_s$ | $F_m$ | PQ |
|---|---|---|---|---|---|---|
| $L_2$ | 0.769 | 0.658 | 0.634 | 0.581 | 0.488 | 0.550 |
| $0.2L_1 + L_2$ | 0.772 | 0.671 | 0.615 | 0.582 | 0.426 | 0.549 |
| $0.4L_1 + L_2$ | 0.760 | 0.649 | 0.614 | 0.583 | 0.488 | 0.548 |
| $0.6L_1 + L_2$ | 0.773 | 0.659 | 0.614 | 0.587 | 0.516 | 0.551 |
| $0.8L_1 + L_2$ | 0.770 | 0.672 | 0.609 | 0.587 | 0.481 | 0.554 |
| $L_1 + L_2$ | 0.782 | 0.682 | 0.626 | 0.597 | 0.520 | 0.556 |

Table 6. The results of experiments with different weights for each branch in the loss function of Equation (2).

| Loss | $F_d$ | $F_e$ | $F_i$ | $F_s$ | $F_m$ | PQ |
|---|---|---|---|---|---|---|
| $2L_a + L_b + L_c$ | 0.769 | 0.658 | 0.622 | 0.573 | 0.504 | 0.548 |
| $L_a + 2L_b + L_c$ | 0.750 | 0.615 | 0.574 | 0.559 | 0.391 | 0.543 |
| $L_a + L_b + 2L_c$ | 0.770 | 0.667 | 0.622 | 0.596 | 0.505 | 0.545 |
| $2L_a + 2L_b + L_c$ | 0.774 | 0.663 | 0.589 | 0.585 | 0.441 | 0.553 |
| $2L_a + L_b + 2L_c$ | 0.765 | 0.657 | 0.631 | 0.590 | 0.482 | 0.551 |
| $L_a + 2L_b + 2L_c$ | 0.755 | 0.652 | 0.613 | 0.578 | 0.476 | 0.546 |
| $L_a + L_b + L_c$ | 0.782 | 0.682 | 0.626 | 0.597 | 0.520 | 0.556 |

Table 7. The results of the independent validation experiments using CRAG as the training set and DigestPath as the test set.

| Model | $F_d$ | $F_{ne}$ | $F_{ep}$ | $F_l$ | $F_p$ | $F_{eo}$ | $F_c$ | PQ |
|---|---|---|---|---|---|---|---|---|
| HoverNet [5] | 0.740 | 0.021 | 0.560 | 0.417 | 0.202 | 0.149 | 0.391 | 0.528 |
| AER-Net | 0.770 | 0.013 | 0.620 | 0.521 | 0.231 | 0.163 | 0.460 | 0.554 |

Table 8. The results of the independent validation experiments using DigestPath as the training set and CRAG as the test set.

| Model | $F_d$ | $F_{ne}$ | $F_{ep}$ | $F_l$ | $F_p$ | $F_{eo}$ | $F_c$ | PQ |
|---|---|---|---|---|---|---|---|---|
| HoverNet [5] | 0.725 | 0.045 | 0.588 | 0.439 | 0.225 | 0.070 | 0.357 | 0.405 |
| AER-Net | 0.729 | 0.072 | 0.627 | 0.462 | 0.265 | 0.104 | 0.410 | 0.415 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
