Article

Attentive Octave Convolutional Capsule Network for Medical Image Classification

1 School of Computer Science and Technology, Minzu University of China, Beijing 100081, China
2 Department of Computer Science and Technology, School of Information, Renmin University of China, Beijing 100872, China
3 Department of Computer Science, School of Engineering and Applied Science, Gonzaga University, Spokane, WA 99258, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(5), 2634; https://doi.org/10.3390/app12052634
Submission received: 14 January 2022 / Revised: 23 February 2022 / Accepted: 25 February 2022 / Published: 3 March 2022
(This article belongs to the Special Issue Medical Signal and Image Processing)

Abstract

Medical image classification plays an essential role in disease diagnosis and clinical treatment, and increasing research effort has been dedicated to the design of effective methods for this task. As an effective framework, the capsule network (CapsNet) can realize translation equivariance, and much current research applies capsule networks to medical image analysis. In this paper, we propose an attentive octave convolutional capsule network (AOC-Caps) for medical image classification. In AOC-Caps, an AOC module is used to replace the traditional convolution operation. The purpose of the AOC module is to process and fuse the high- and low-frequency information in the input image simultaneously and to weight the important parts automatically. Following the AOC module, a matrix capsule is used and the expectation-maximization (EM) algorithm is applied to update the routing weights. The proposed AOC-Caps and comparative methods are tested on seven datasets from MedMNIST: PathMNIST, DermaMNIST, OCTMNIST, PneumoniaMNIST, OrganMNIST_Axial, OrganMNIST_Coronal, and OrganMNIST_Sagittal. The baselines include traditional CNN models, automated machine learning (AutoML) methods, and related capsule network methods. The experimental results demonstrate that the proposed AOC-Caps achieves better performance on most of the seven medical image datasets.

1. Introduction

As an interdisciplinary area, medical image classification is the foundation of automatic disease diagnosis. With the development of deep learning technology, convolutional neural networks (CNNs) [1,2,3,4,5,6,7] have been widely applied in computer vision tasks, such as image classification [8,9], object detection [10], semantic segmentation [11], etc. The performance of these tasks has been greatly improved with the application of CNNs. However, CNNs have limitations. Firstly, the pooling operation provides some translation invariance but results in the loss of important location information. Secondly, CNNs struggle to learn part–whole relationships. To address these weaknesses, CapsNet [12] was proposed to replace scalar outputs with vector outputs for representing different properties, such as the orientation and viewpoint of objects. Different from the translation invariance provided by the pooling operation, CapsNet provides translation equivariance, i.e., the detection of objects that can transform into each other. Different from CNNs, knowledge about part–whole relationships is kept in the capsule network, as discussed in [12,13]. Capsule networks recognize objects through both local features and part–whole knowledge. For example, a bird, as an object, has several parts, including a head, a trunk, wings, claws, and a tail. When these parts are rearranged, a CNN would still recognize the disturbed object as a bird, while CapsNet can determine that it is not a bird through the part–whole relationship. In the original CapsNet [12], information is represented in vectorized form, leading to costly calculation of the routing between capsules of different layers. In the matrix CapsNet [13], vectorized information is replaced by matrix capsules, and routing weights are updated by the expectation-maximization (EM) algorithm.
However, early CapsNets have their drawbacks. First, the low-level features that constitute the capsules are extracted only by shallow convolutional operations, so the capsules contain very little high-level semantic information. Secondly, the low-level convolutional operations lack an attention mechanism, which may import meaningless and redundant information into the capsules. One effective way to boost performance is to employ a better feature extractor, which can capture richer and more semantic contextual patterns to build capsules. Recent efforts focusing on improved feature extraction for CapsNets have been extensively investigated, such as the Multi-Scale CapsNet (MS-CapsNet) [14] and RS-CapsNet [15].
In this paper, we propose a novel capsule network, named the attentive octave convolutional capsule network (AOC-Caps), for medical image classification. In AOC-Caps, the traditional convolution operation is replaced by an octave convolution operation. The octave convolution operation was proposed in [16] to process both the higher and lower frequencies in the input at the same time. In natural images, higher frequencies correspond to detailed information that varies greatly across the image, and lower frequencies correspond to smoothly changing structure. Both types of information are also very important for medical image classification, and it is critical to select which kind of information matters more for a given task. However, the traditional octave convolution operation cannot enhance useful information and suppress useless information. In AOC-Caps, we therefore adopt a convolutional block attention module (CBAM) to identify and select useful information. The CBAM allows AOC-Caps to highlight critical local regions whose rich semantic details serve as distinguishable patterns, leading to a performance gain in medical image classification.
Studies on capsule networks [17,18] have focused on medical image analysis. A recent benchmark, named MedMNIST [19], was proposed and used to validate the performance of different models for medical image analysis. MedMNIST is composed of 10 pre-processed open medical image datasets. Similar to the MNIST dataset [20], the classification tasks in MedMNIST are lightweight, with an image resolution of 28 × 28. These tasks cover primary medical image modalities and diverse data scales. In this paper, we design comparative experiments on seven datasets in MedMNIST. Through these experiments, ResNet [6], AutoML methods [21,22], and capsule-related methods [12,13,14] are compared with the proposed AOC-Caps. In the ablation studies, matrix capsule networks with different convolutional feature extraction layers are compared to determine which type of convolution layer is more suitable for applying capsule networks to medical image classification on MedMNIST.
The main contributions of this research are as follows:
  • An attentive octave convolution operation is proposed. By combining this novel operation with capsule networks, we design an effective classification framework named AOC-Caps for medical image classification.
  • The proposed AOC-Caps is validated via extensive experiments on the MedMNIST benchmark and achieves state-of-the-art (SOTA) performance on two of the seven tasks.
  • The proposed method can serve as a credible benchmark for future reference. We have made the code public at the following link (last accessed on 23 February 2022): https://github.com/aszx87414/Attentive_Octave_Convolutional_Capsule_Network.
The rest of this paper is organized as follows. Section 2 reviews the related research work. Section 3 explains our proposed method. In Section 4, comprehensive experiments are conducted to evaluate the effectiveness of the proposed method. Finally, in Section 5, we conclude the paper.

2. Related Work

Introduced by Hinton et al. [23], the core idea of the "capsule" is to group neurons into a vector, which is defined as a capsule. In CNNs, the activation of a neuron can be considered the likelihood of detecting a specific feature. Different from the feature invariance of CNNs, capsule networks achieve feature equivariance, i.e., the detection of features that can transform into each other.
In [12], the dynamic routing between capsules is applied in the proposed capsule network. The pooling operation is abandoned in [12] for keeping the location information of features. Although CapsNet with dynamic routing achieved SOTA performance in MNIST and its variant, MultiMNIST, it still has drawbacks, such as huge computational cost and the lack of high-level semantic information. In [13], the matrix capsule network is constructed by transforming the capsule form from vector to matrix and changing the link mode between capsules of different layers. The coupling coefficients between lower-layer and higher-layer capsules are updated by the EM algorithm. In [12,13], all the features used to construct capsules are extracted by a convolution layer. These features are low-level information and cannot effectively recognize complex objects.
To handle the drawbacks explained above, several studies [14,15,17,24,25] have focused on applying more powerful feature extraction modules to improve the performance of capsule networks. In [14], multiple convolutional kernels are used to extract multi-scale features for constructing multi-dimensional capsules, and a novel dropout for capsules is proposed. In RS-CapsNet [15], the Res2Net block [26] is used to extract multi-scale features and increase the receptive field of each convolutional layer. Moreover, a new linear combination between capsules and a new routing process are proposed for constructing more effective classification capsules. In HitNet [24], a new layer called hit-or-miss and a centripetal loss function are designed; HitNet also introduces a data augmentation method that combines data space and feature space. The most straightforward idea to improve the performance of capsule networks is to increase the number of intermediate capsule layers to obtain deeper capsule networks. However, it was recently shown that directly stacking fully connected capsule layers results in a decline in performance [27]. To solve this problem, DeepCaps [25] uses a novel 3D convolution-based dynamic routing algorithm. Furthermore, a class-independent decoder network is also proposed to strengthen the use of the reconstruction loss as a regularization term.
Deep learning technology has also been applied in medical image analysis. In [28], the U-Net architecture, which consists of a contracting path to capture context information and an expanding path that enables precise localization, is proposed for biomedical image segmentation. In [29], an approach based on a volumetric, fully convolutional neural network is proposed for 3D image segmentation. USE-Net [30], which incorporates squeeze-and-excitation (SE) modules into U-Net, is proposed for magnetic resonance imaging (MRI) segmentation. In [31], SegNet [32] (an encoder network and a decoder network followed by a pixel-wise classification layer), U-Net, and pix2pix are compared in experiments on two multi-centric MRI prostate datasets.

3. Attentive Octave Convolutional Capsule Network

In this section, we introduce the proposed AOC-Caps in detail. As shown in Figure 1b, input images are fed into a traditional convolution layer followed by batch normalization and a ReLU operation. The feature maps generated by this convolutional layer are then fed into the attentive octave convolution layer (AOC-Layer). In the AOC-Layer, the higher and lower frequencies are processed simultaneously, useful information is enhanced, and useless information is suppressed. The enhanced feature maps generated by the AOC-Layer are then reshaped into a pose matrix and an activation, following the matrix capsule network [13]. In Section 3.1, the details of the AOC-Layer are provided. The process of routing and updating in the capsule layers is introduced in Section 3.2. The loss function of the proposed AOC-Caps is described in Section 3.3.
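To make this data flow concrete, the following is a minimal PyTorch sketch of the pipeline described above; the class and parameter names, the default channel count, and the placeholder AOC-Layer and capsule head are illustrative assumptions of ours rather than the authors' released implementation (which is available at the repository linked in Section 1).

```python
import torch.nn as nn

class AOCCapsSkeleton(nn.Module):
    """Illustrative skeleton: Conv + BN + ReLU -> AOC-Layer -> capsule head."""
    def __init__(self, in_channels=3, feat_channels=64, aoc_layer=None, capsule_head=None):
        super().__init__()
        # First convolution layer (3 x 3 kernel, stride 2 in the paper's setting) with BN and ReLU.
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(feat_channels),
            nn.ReLU(inplace=True),
        )
        # Placeholders for the attentive octave convolution layer and the capsule head,
        # which reshapes feature maps into pose matrices and activations (Section 3.2).
        self.aoc_layer = aoc_layer if aoc_layer is not None else nn.Identity()
        self.capsule_head = capsule_head if capsule_head is not None else nn.Identity()

    def forward(self, x):
        f = self.stem(x)        # low-level feature maps
        f = self.aoc_layer(f)   # high/low-frequency processing with attention
        return self.capsule_head(f)
```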

3.1. Attentive Octave Convolution Layer

In the traditional convolution operation, the input information is processed by convolutional kernels of a certain size. Convolution operations with different kernels can capture information of different frequencies, but there is no effective fusion process between frequencies. The octave convolution operation [16] was proposed to process different frequencies simultaneously: the convolution and fusion of two frequencies an octave apart are performed at the same time, but without an attention mechanism. Since it is very important to select useful information in medical image classification, we add a CBAM [33] to the AOC-Layer to select information of different frequencies.
Suppose the input is defined as $I \in \mathbb{R}^{3 \times h \times w}$, where $h$ and $w$ are the height and width of the image. The feature maps obtained by the first convolution layer are defined as $F \in \mathbb{R}^{c \times h \times w}$, where $c$ is the number of channels. In the AOC-Layer, the feature map $F$ is first divided into two parts $F_1^H$ and $F_1^L$, by a convolution operation (H-H Conv in Figure 1a) and by pooling followed by a convolution operation (H-L Pooling and Conv in Figure 1a), where $F_1^H$ is the higher-frequency part and $F_1^L$ is the lower-frequency part. The channels of the feature maps are divided by a ratio $\alpha$: the size of the higher-frequency part is $\mathbb{R}^{\alpha c \times h \times w}$ and that of the lower-frequency part is $\mathbb{R}^{(1-\alpha) c \times \frac{h}{2} \times \frac{w}{2}}$. To obtain the lower-frequency information, two pooling operations (average and maximum) can be used in the AOC-Layer; their effects are discussed in detail in Section 4.5. To convert lower-frequency information into higher-frequency information, bilinear interpolation is used to upsample the lower-resolution feature maps to the higher resolution. Then, the higher- and lower-frequency branches communicate with each other by summation, as shown in Formulas (1) and (2):

$$F_2^H = \mathrm{Conv}_{H \to H}(F_1^H) + \mathrm{Upsample}_{L \to H}(\mathrm{Conv}_{L \to H}(F_1^L)) \quad (1)$$

$$F_2^L = \mathrm{Pooling}_{H \to L}(\mathrm{Conv}_{H \to L}(F_1^H)) + \mathrm{Conv}_{L \to L}(F_1^L) \quad (2)$$
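As an illustration of Formulas (1) and (2), the following sketch implements the cross-frequency exchange in PyTorch, assuming the feature map has already been split into an $\alpha c$-channel high-frequency branch and a $(1-\alpha)c$-channel low-frequency branch at half resolution; the class and argument names are our own.

```python
import torch.nn as nn
import torch.nn.functional as F

class OctaveExchange(nn.Module):
    """Sketch of Formulas (1)-(2): high/low-frequency convolution with cross-frequency fusion."""
    def __init__(self, ch_high, ch_low, kernel_size=3, pool="max"):
        super().__init__()
        pad = kernel_size // 2
        self.conv_hh = nn.Conv2d(ch_high, ch_high, kernel_size, padding=pad)  # H -> H
        self.conv_hl = nn.Conv2d(ch_high, ch_low,  kernel_size, padding=pad)  # H -> L (then pooled)
        self.conv_lh = nn.Conv2d(ch_low,  ch_high, kernel_size, padding=pad)  # L -> H (then upsampled)
        self.conv_ll = nn.Conv2d(ch_low,  ch_low,  kernel_size, padding=pad)  # L -> L
        self.pool = F.max_pool2d if pool == "max" else F.avg_pool2d           # pooling type, Section 4.5

    def forward(self, f1_high, f1_low):
        # F2^H = Conv_{H->H}(F1^H) + Upsample_{L->H}(Conv_{L->H}(F1^L)), Formula (1)
        up = F.interpolate(self.conv_lh(f1_low), scale_factor=2,
                           mode="bilinear", align_corners=False)
        f2_high = self.conv_hh(f1_high) + up
        # F2^L = Pooling_{H->L}(Conv_{H->L}(F1^H)) + Conv_{L->L}(F1^L), Formula (2)
        f2_low = self.pool(self.conv_hl(f1_high), kernel_size=2) + self.conv_ll(f1_low)
        return f2_high, f2_low
```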
As shown in Figure 1a, attention modules are added to the intermediate feature maps of both frequencies to enhance useful information and suppress useless information. Without loss of generality, an intermediate feature map in the AOC-Layer is defined as $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels and $H$ and $W$ are the height and width of the corresponding higher- or lower-frequency part. The attention module sequentially infers a 1D channel attention map $F_c \in \mathbb{R}^{C \times 1 \times 1}$ and a 2D spatial attention map $F_s \in \mathbb{R}^{1 \times H \times W}$. The selections of channel and spatial information are based on Formulas (3) and (4):

$$F_c^{att} = F_c(F) \otimes F \quad (3)$$

$$F_{s,c}^{att} = F_s(F_c^{att}) \otimes F_c^{att} \quad (4)$$

where $\otimes$ denotes element-wise multiplication. The intermediate feature map $F \in \mathbb{R}^{C \times H \times W}$ is first processed by a channel attention module: $F$ is pooled along the spatial dimensions through average and maximum operations, and the average-pooled and max-pooled features are processed by a multi-layer perceptron (MLP) with one hidden layer to produce the channel attention vector $F_c \in \mathbb{R}^{C \times 1 \times 1}$. The channel-attention-enhanced feature map $F_c^{att}$ is then processed by a spatial attention module: $F_c^{att}$ is pooled along the channel dimension by both average and maximum operations, and the pooled maps are concatenated along the channel dimension to produce a $2 \times H \times W$ map, which is processed by a convolution with kernel size 7 followed by a sigmoid function. In the AOC-Layer, the attention modules are plugged into $F_1^H$, $F_1^L$, $F_2^H$, and $F_2^L$. The role of the attention modules is also discussed in detail in Section 4.5.
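The sketch below shows one way to realize the channel and spatial attention of Formulas (3) and (4) in PyTorch, following the standard CBAM design [33]; the 7 × 7 spatial kernel and the reduction ratio follow the description above, and the remaining names are illustrative.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Sketch of the channel + spatial attention used in the AOC-Layer (Formulas (3)-(4))."""
    def __init__(self, channels, reduction=8, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: convolution over the two channel-pooled maps.
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, f):
        b, c, _, _ = f.shape
        # ---- Channel attention (Formula (3)) ----
        avg = self.mlp(f.mean(dim=(2, 3)))                  # average-pooled descriptor
        mx = self.mlp(f.amax(dim=(2, 3)))                   # max-pooled descriptor
        ch_att = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        f = ch_att * f
        # ---- Spatial attention (Formula (4)) ----
        sp = torch.cat([f.mean(dim=1, keepdim=True), f.amax(dim=1, keepdim=True)], dim=1)
        sp_att = torch.sigmoid(self.spatial_conv(sp))
        return sp_att * f
```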

3.2. Capsule Layer

The two commonly used capsule networks are CapsNet with dynamic routing and matrix CapsNet with EM routing. In CapsNet with dynamic routing, the capsule vector is constructed by stacking the neurons with scalar values. In CapsNet with EM routing, the capsule contains a pose matrix and an activation.
The output feature maps of the AOC-Layer are first reshaped into a series of capsules $s_j$, $j = 1, 2, \dots$. For one capsule $s_j$, its input is the weighted sum of all the prediction vectors $\hat{u}_{j|i}$ generated by the previous layer, as defined in Formulas (5) and (6):

$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i} \quad (5)$$

$$\hat{u}_{j|i} = W_{ij}\, u_i \quad (6)$$

where $c_{ij}$ is the routing coefficient, $W_{ij}$ is the matrix used for voting, and $u_i$ is the output of the previous capsule layer (a vector in the case of dynamic routing). The routing coefficients are computed by dynamic routing [12] or EM routing [13].
The process of dynamic routing is as follows (a minimal code sketch is given after the list):
  • The prior probability $b_{ij}$ between capsule $j$ and capsule $i$ in the previous layer is initialized to 0;
  • The routing coefficients are computed through the softmax function $c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}$;
  • The input to capsule $j$ is computed by Formula (5) and then squashed by $v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \frac{s_j}{\|s_j\|}$;
  • $b_{ij}$ is updated by $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$;
  • Steps 2 to 4 are repeated $r$ times, where $r$ is set empirically, usually from 1 to 3.
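The following is a minimal sketch of the routing-by-agreement procedure listed above, assuming the prediction vectors $\hat{u}_{j|i}$ have already been computed by Formula (6); the tensor shapes and names are our own conventions.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|) -- the non-linearity in step 3."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iterations=2):
    """Sketch of routing by agreement.

    u_hat: prediction vectors, shape (batch, n_in, n_out, dim_out),
           i.e. u_hat[:, i, j] is the vote of lower capsule i for upper capsule j.
    """
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)       # priors b_ij, step 1
    for _ in range(num_iterations):                              # step 5: repeat r times
        c = torch.softmax(b, dim=2)                              # step 2: c_ij over upper capsules j
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)                 # step 3: s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                                            # step 3: squashing
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)             # step 4: agreement update
    return v
```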
Different from capsule networks with dynamic routing, capsules in the matrix capsule network with EM routing consist of a pose matrix and an activation. A pose matrix defines the translation and the rotation of the objects. The aim of the EM algorithm is to cluster datapoints into different Gaussian distributions. Suppose the pose matrix is a $4 \times 4$ matrix, i.e., it has 16 components. Let $v_{ij}$ be the vote from capsule $i$ to capsule $j$, and $v_{ij}^h$ be its $h$-th component. The probability density function of a Gaussian is defined as Formula (7):

$$P(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \quad (7)$$

It can be applied to compute the probability of $v_{ij}^h$ belonging to capsule $j$'s Gaussian model:

$$P_{i|j}^h = \frac{1}{\sqrt{2\pi (\sigma_j^h)^2}}\, e^{-\frac{(v_{ij}^h - \mu_j^h)^2}{2 (\sigma_j^h)^2}} \quad (8)$$

Let $cost_{ij}^h = -\ln P_{i|j}^h$ be the cost to activate the $h$-th component of capsule $j$ by the $h$-th component of capsule $i$, where

$$-\ln P_{i|j}^h = \ln(\sigma_j^h) + \frac{\ln(2\pi)}{2} + \frac{(v_{ij}^h - \mu_j^h)^2}{2 (\sigma_j^h)^2} \quad (9)$$
Whether capsule $j$ is activated is determined by the following equation:

$$a_j = \mathrm{sigmoid}\left(\lambda \left(b_j - \sum_h cost_j^h\right)\right) \quad (10)$$

where

$$cost_j^h = \sum_i r_{ij}\, cost_{ij}^h \quad (11)$$

and $r_{ij}$ is the assignment probability of each datapoint to a capsule. $r_{ij}$, $\mu$, $\sigma$, and $a_j$ are computed iteratively using the EM algorithm.
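For illustration, the sketch below implements one M-step of this procedure, i.e., the update of $\mu$, $\sigma$, and the activation in Formulas (10) and (11); the constant terms of the Gaussian cost are folded into a learned bias $\beta_u$, as in [13], and the tensor shapes and names are our own assumptions.

```python
import torch

def em_m_step(r, votes, a_in, beta_u, beta_a, inv_temp):
    """Illustrative M-step of EM routing (cf. Formulas (8)-(11)).

    r:      assignment probabilities r_ij, shape (B, n_in, n_out)
    votes:  votes v_ij, shape (B, n_in, n_out, h), e.g. h = 16 for a 4 x 4 pose matrix
    a_in:   activations of the lower-level capsules, shape (B, n_in)
    beta_u, beta_a: learned biases; inv_temp is the inverse temperature lambda.
    """
    r = r * a_in.unsqueeze(-1)                               # weight assignments by input activations
    r_sum = r.sum(dim=1, keepdim=True) + 1e-8                # sum over lower capsules i
    mu = (r.unsqueeze(-1) * votes).sum(dim=1, keepdim=True) / r_sum.unsqueeze(-1)
    sigma_sq = (r.unsqueeze(-1) * (votes - mu) ** 2).sum(dim=1, keepdim=True) / r_sum.unsqueeze(-1) + 1e-8
    # cost_j^h = sum_i r_ij * (-ln P_{i|j}^h); folding the constant terms of Formula (9)
    # into the learned bias beta_u leaves (beta_u + ln sigma_j^h) * sum_i r_ij.
    cost_h = (beta_u + 0.5 * torch.log(sigma_sq.squeeze(1))) * r_sum.squeeze(1).unsqueeze(-1)
    a_out = torch.sigmoid(inv_temp * (beta_a - cost_h.sum(dim=-1)))   # Formula (10)
    return a_out, mu.squeeze(1), sigma_sq.squeeze(1)
```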

3.3. Loss Function

In AOC-Caps, if the dynamic routing is applied, the loss function is defined as
$$L_c = T_c\, \max(0,\, m^+ - \|v_c\|)^2 + \lambda\, (1 - T_c)\, \max(0,\, \|v_c\| - m^-)^2 \quad (12)$$

where $T_c = 1$ if class $c$ is present, $m^+$ is set to 0.9, and $m^-$ is set to 0.1.

If the matrix capsule with EM routing is applied, the loss function has a similar design as in [13]. The spread loss is used to directly maximize the gap between the activation of the target class and the activations of the other classes. The loss function is formed as Formulas (13) and (14):

$$L_w = \max(0,\, m - (a_t - a_w))^2 \quad (13)$$

$$L = \sum_{w \neq t} L_w \quad (14)$$

where $a_w$ is the activation of a wrong class and $a_t$ is the activation of the target class.
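The two losses can be sketched as follows. Here, $\lambda$ in the margin loss defaults to 0.5 as in [12], and the fixed margin in the spread loss is only one possible choice (in [13] the margin is increased during training); the function names and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def margin_loss(v_norm, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss (Formula (12)); v_norm holds the capsule lengths |v_c| per class, shape (B, C)."""
    t = F.one_hot(targets, num_classes=v_norm.size(1)).float()
    pos = t * torch.clamp(m_pos - v_norm, min=0.0) ** 2
    neg = lam * (1.0 - t) * torch.clamp(v_norm - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=1).mean()

def spread_loss(activations, targets, margin=0.9):
    """Spread loss (Formulas (13)-(14)); activations has shape (B, C)."""
    a_t = activations.gather(1, targets.unsqueeze(1))            # activation of the target class
    loss = torch.clamp(margin - (a_t - activations), min=0.0) ** 2
    loss = loss.scatter(1, targets.unsqueeze(1), 0.0)            # exclude the target class itself
    return loss.sum(dim=1).mean()
```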

4. Experiments

4.1. Datasets

MedMNIST contains 10 pre-processed open medical image datasets covering multiple tasks, including multi-class classification, binary classification, ordinal regression, and multi-label classification. In our experiments, we focus on seven classification datasets: PathMNIST, DermaMNIST, OCTMNIST, PneumoniaMNIST, OrganMNIST_Axial, OrganMNIST_Coronal, and OrganMNIST_Sagittal. In these seven datasets, the height and width of the images are resized to 28. Figure 2 shows an overview of the seven datasets with samples.
All datasets are divided into a training set, a validation set, and a test set. The number of images in each set is detailed in Table 1. The models are trained on the training sets, validated on the validation sets after each epoch during training, and finally evaluated on the test sets.
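For reference, the seven datasets can be loaded with the medmnist Python package released with [19]; the snippet below is a sketch of its documented interface, and the transform and batch size are example choices of ours.

```python
import medmnist
from medmnist import INFO
from torch.utils.data import DataLoader
import torchvision.transforms as T

# Pick one of the seven datasets by its MedMNIST key, e.g. "pathmnist".
info = INFO["pathmnist"]
DataClass = getattr(medmnist, info["python_class"])          # e.g. medmnist.PathMNIST

transform = T.Compose([T.ToTensor()])                        # example preprocessing
train_set = DataClass(split="train", transform=transform, download=True)
val_set = DataClass(split="val", transform=transform, download=True)
test_set = DataClass(split="test", transform=transform, download=True)

train_loader = DataLoader(train_set, batch_size=96, shuffle=True)
```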
PathMNIST. It is based on a prior study [34] for predicting survival from colorectal cancer histology slides, which provides a dataset of 100,000 non-overlapping image patches from hematoxylin- and eosin-stained histological images and a test dataset of 7180 image patches from a different clinical center. Nine types of tissue are involved, resulting in a multi-class classification task. The details of these nine categories are introduced in Table 2.
DermaMNIST. It is based on HAM10000 [35], a large collection of multi-source dermatoscopic images of common pigmented skin lesions. The dataset consists of 10,015 images labeled as seven different categories, as a multi-class classification task. These seven categories are introduced in Table 3.
OCTMNIST. It is based on a prior dataset [36] of 109,309 valid optical coherence tomography (OCT) images of retinal diseases. Four types are involved, leading to a multi-class classification task. These four categories are introduced in Table 4.
PneumoniaMNIST. It is based on a prior dataset [36] of 5856 pediatric chest X-ray images. The task is binary classification of pneumonia versus normal. The source training set is split into training and validation sets with a ratio of 9:1, and the source validation set is used as the test set. These two categories are introduced in Table 5.
OrganMNIST_(Axial, Coronal, and Sagittal). These three datasets are based on 3D computed tomography (CT) images from the Liver Tumor Segmentation Benchmark [37]. Bounding-box annotations of 11 body organs from another study are used to obtain the organ labels. The only difference among OrganMNIST_Axial, OrganMNIST_Coronal, and OrganMNIST_Sagittal is the view. The 11 categories of the three datasets are introduced in Table 6, Table 7 and Table 8.

4.2. Evaluation Metrics

In our experiments, we use accuracy (ACC), area under ROC curve (AUC), precision (PRE), recall (REC) and F1-score (F1) as the evaluation metrics. The formulas of these metrics are shown below:
$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{N} \mathbb{1}\left[ f(x_i) = y_i \right]}{N} \quad (15)$$

$$\mathrm{precision} = \frac{\sum_{i=1}^{C} \mathrm{precision}_i}{C} \quad (16)$$

$$\mathrm{precision}_i = \frac{TP_i}{TP_i + FP_i}, \quad i = 1, 2, \dots, C \quad (17)$$

$$\mathrm{recall} = \frac{\sum_{i=1}^{C} \mathrm{recall}_i}{C} \quad (18)$$

$$\mathrm{recall}_i = \frac{TP_i}{TP_i + FN_i}, \quad i = 1, 2, \dots, C \quad (19)$$

$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \quad (20)$$
where C is the number of classes and N is the number of total samples. TP, TN, FP, and FN refer to true positive, true negative, false positive, and false negative.
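As a self-contained illustration of Formulas (15)-(20), the helper below computes the metrics from a confusion matrix; it is a didactic sketch, not the TorchMetrics implementation used in our experiments (see Section 4.4).

```python
import numpy as np

def macro_metrics(y_true, y_pred, num_classes):
    """Compute ACC, macro PRE/REC, and F1 following Formulas (15)-(20)."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                   # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                            # predicted as class i but actually another class
    fn = cm.sum(axis=1) - tp                            # class i predicted as another class
    precision_i = tp / np.maximum(tp + fp, 1)
    recall_i = tp / np.maximum(tp + fn, 1)
    precision, recall = precision_i.mean(), recall_i.mean()
    acc = tp.sum() / cm.sum()
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return acc, precision, recall, f1
```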
Accuracy (ACC) is the most commonly used metric among these performance metrics, but it does not indicate the true model performance when the classes are imbalanced. Area under the ROC curve (AUC) is less sensitive to class imbalance than ACC. Precision (PRE) and recall (REC) are related to the positive prediction. In our experiments, we use macro-precision and macro-recall, which are defined by Formulas (16) and (18). The F1-score (F1) is a metric that combines both precision and recall.

4.3. Baselines

In our experiments, we use the same baselines as in [19]. In addition, several methods related to capsule networks are used in the classification tasks of the seven datasets mentioned above.
ResNet18 and ResNet50 [6]. These two models are trained for 100 epochs, using a cross-entropy loss function and an SGD optimizer with a batch size of 128 and an initial learning rate of $10^{-3}$.
AutoML Methods. Several AutoML methods [21,22] have been applied to MedMNIST classification. The experimental settings of the three AutoML methods (Auto-sklearn [21], AutoKeras [22], and Google AutoML Vision) are the same as in [19]. AutoML methods are designed to search for the optimal hyper-parameter setting or neural architecture to maximize predictive ability. Auto-sklearn and AutoKeras are open-source AutoML tools for statistical machine learning and deep learning, while Google AutoML Vision is a commercial black-box AutoML tool. The results of the AutoML methods in our experiments are taken directly from [19].
CapsNet (dynamic routing) [12]. In [12], the output feature maps of a convolutional layer with a 9 × 9 kernel and a stride of 2 are reshaped into primary capsules. The capsules of the previous layer are routed to the classification capsules by agreement. Different from the original setting, we set the number of routing iterations to 2 instead of 3 for the best performance.
CapsNet (EM routing) [13]. In [13], each capsule contains a 4 × 4 pose matrix and an activation. The vote for a capsule in the next layer is computed by matrix multiplication between the pose matrix and a trainable transformation matrix. The routing coefficients between capsules and classification capsules are updated through the EM algorithm. Different from the original setting, the kernel size of the first convolutional layer is set to 3 × 3 for more detailed feature representation.
MS-Caps [14]. In [14], hierarchical features are extracted and reshaped into capsules of different dimensions, and the capsules are cascaded together for the dropout operation. Following the original experimental setting, the Adam optimizer [38] is used for training, with a weight decay factor of 0.00001. The initial learning rates are 0.001 and 0.0001, with 25 and 50 iterations, respectively, for quick convergence to the optimal solution.
DeepCaps [25]. In [25], the Adam optimizer is used with an initial learning rate of 0.001, which is reduced by half after every 20 epochs.

4.4. Implementation Details

In this paper, the experiments are implemented in PyTorch on a PC with four TITAN X GPUs. Different from the experiments in [19], we conduct experiments only with images of size 28 × 28.
In the experiments, the kernel size of the first convolution layer is set to 3 × 3 and the stride is set to 2. In the AOC-Layer, the ratio α for splitting the higher- and lower-frequency channels is set to 0.5, and the convolutional kernel size is set to 3. In the attention modules, the reduction ratio is set to 8 and the convolutional kernel size is set to 3. When the capsule framework with dynamic routing is applied, the vector dimension of the capsule is set to 8 and the number of routing iterations is set to 2. When the matrix capsule framework with EM routing is applied, the pose matrix is set to be a 4 × 4 matrix. The batch size is set to 96 and the initial learning rate is $3 \times 10^{-3}$. The number of training epochs is set to 120.
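For readability, the hyper-parameters listed above can be collected in a single configuration dictionary, as in the hypothetical sketch below; the key names are our own.

```python
# Hypothetical configuration dictionary collecting the hyper-parameters listed above.
config = {
    "stem_conv": {"kernel_size": 3, "stride": 2},
    "aoc_layer": {"alpha": 0.5, "kernel_size": 3},            # ratio of high/low-frequency channels
    "attention": {"reduction": 8, "kernel_size": 3},
    "capsules": {"dynamic": {"dim": 8, "iterations": 2},
                 "em": {"pose": (4, 4)}},
    "training": {"batch_size": 96, "lr": 3e-3, "epochs": 120},
}
```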
In our experiments, all metrics are implemented with the Python package TorchMetrics.

4.5. Ablation Study

To verify the effectiveness of each design in AOC-Caps, we conduct experiments for the ablation study. The ablation study focuses on (1) different forms of capsules and different routing methods; (2) types of pooling operations in the AOC-Layer; (3) the effect of attention modules.
To test different forms of capsules and different routing methods, we set the pooling type to max-pooling and plug in the attention modules as introduced in Section 3.1. It should be noted that, for dynamic routing, we do not add the reconstruction decoder. As shown in Table 9, AOC-Caps with matrix capsules and EM routing outperforms the version with dynamic routing, and the routing method has a great impact on the results. For example, on the OrganMNIST_Axial dataset, the accuracy with EM routing is 93.1%, while the accuracy with dynamic routing is 80.2%. AOC-Caps with EM routing also achieves better precision and recall, both of which are more than 10% higher.
To test the effectiveness of the different pooling operations (average and maximum), we keep the matrix capsule form with EM routing, and the attention modules are plugged into the AOC-Layer as introduced in Section 3.1. As shown in Table 10, AOC-Caps with max-pooling outperforms the version with avg-pooling. Moreover, the type of pooling operation has much less impact on the results than the routing method.
To test the effectiveness of the attention modules, we keep the matrix capsule form with EM routing and set the pooling type to max-pooling. As shown in Table 11, AOC-Caps with the attention modules outperforms the version without them, achieving higher accuracy, precision, and recall. The attention modules have a greater impact on the results than the pooling type but a smaller impact than the routing method.
The results in Table 9, Table 10 and Table 11 demonstrate the effectiveness of EM routing, max-pooling in the AOC-Layer, and the attention modules. Putting them together yields the configuration of AOC-Caps used in the following comparative experiments.

4.6. Comparative Experiments

The results of the comparative experiments are reported in Table 12, in which the results of ResNet18, ResNet50, Auto-sklearn, AutoKeras, and Google AutoML Vision are from [19]. For the three AutoML models, the original paper does not report PRE, REC, or F1; these entries are marked with dashes in Table 12.
Table 12 shows the performance of the comparative models on the seven datasets in MedMNIST. In terms of accuracy (ACC), AOC-Caps achieves the best accuracy on the DermaMNIST and OrganMNIST_Axial datasets and ranks second on the other datasets. CapsNet with dynamic routing and CapsNet with EM routing perform worse than the other methods, due to their shallow feature extraction networks. However, because the data of OrganMNIST_(Axial, Coronal, and Sagittal) are collected from 3D images, the viewpoints differ, so CapsNet with EM routing, with its transformation invariance, can obtain good accuracy even with a shallow feature extraction network.
Due to the class imbalance in certain datasets of MedMNIST, it is necessary to consider different metrics. In terms of precision (PRE) and recall (REC), all methods score lower than their accuracy on DermaMNIST and OrganMNIST_Sagittal because of the imbalance between categories in these two datasets. In this case, AOC-Caps still achieves better results. However, it should be pointed out that we do not apply any dedicated remedy for this class imbalance problem in our experiments.
In our comparative experiments, a model with deeper layers does not necessarily achieve higher accuracy. Consider ResNet18 as an example: it outperforms ResNet50 on datasets such as DermaMNIST, OCTMNIST, OrganMNIST_Axial, and OrganMNIST_Sagittal. Part of the reason is that the low resolution of the image data does not require a deep network model to extract high-level features.

5. Conclusions

In this paper, we proposed a novel attentive octave convolutional capsule network (AOC-Caps) for medical image classification. In the AOC-Layer, octave convolution and attention modules (CBAM) are used to exchange and select the higher- and lower-frequency information. The output feature maps of the AOC-Layer are used as the foundation for constructing capsules. The experiments have verified the effectiveness of the AOC-Layer. Because the images of certain datasets come from 3D volumes and the viewpoint is likely to change, a capsule routing method with transformation invariance, such as the matrix CapsNet with EM routing, can obtain higher accuracy. By combining the AOC-Layer with the matrix CapsNet and EM routing, AOC-Caps achieves better performance than most baselines in our experiments.
However, the AOC-Layer in AOC-Caps is still a shallow convolutional network. Although AOC-Caps achieves better results than DeepCaps on the classification tasks in MedMNIST, this may be due to the small image size of the datasets and the limited amount of detail they contain; a deeper network structure is still necessary when handling high-resolution medical images. In future studies, we will consider high-resolution medical images of different diseases and investigate the effect of AOC-Caps with deeper structures. Furthermore, it is necessary to consider how to address the class imbalance problem in medical image analysis.

Author Contributions

Conceptualization and methodology, Z.L. (Zhengzhen Li) and H.Z. (Hong Zhang); software, validation, Z.L. (Zan Li) and H.Z. (Hao Zhao); original draft preparation, H.Z. (Hong Zhang) and Y.Z.; review and editing, H.Z. (Hong Zhang), Z.L. (Zhengzhen Li) and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data and code supporting the conclusions of this article are available at https://github.com/aszx87414/Attentive_Octave_Convolutional_Capsule_Network (last accessed on 23 February 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates Inc., 2012; Volume 1.
  2. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
  3. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
  4. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
  5. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 4278–4284.
  6. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
  7. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
  8. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
  9. Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1452–1464.
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  11. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  12. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. arXiv 2017, arXiv:1710.09829.
  13. Hinton, G.E.; Sabour, S.; Frosst, N. Matrix capsules with EM routing. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
  14. Xiang, C.; Zhang, L.; Tang, Y.; Zou, W.; Xu, C. MS-CapsNet: A Novel Multi-Scale Capsule Network. IEEE Signal Process. Lett. 2018, 25, 1850–1854.
  15. Yang, S.; Lee, F.; Miao, R.; Cai, J.; Chen, L.; Yao, W.; Kotani, K.; Chen, Q. RS-CapsNet: An Advanced Capsule Network. IEEE Access 2020, 8, 85007–85018.
  16. Chen, Y.; Fan, H.; Xu, B.; Yan, Z.; Kalantidis, Y.; Rohrbach, M.; Shuicheng, Y.; Feng, J. Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 3434–3443.
  17. Hoogi, A.; Wilcox, B.; Gupta, Y.; Rubin, D.L. Self-Attention Capsule Networks for Image Classification. arXiv 2019, arXiv:1904.12483.
  18. LaLonde, R.; Torigian, D.A.; Bagci, U. Encoding High-Level Visual Attributes in Capsules for Explainable Medical Diagnoses. arXiv 2019, arXiv:1909.05926.
  19. Yang, J.; Shi, R.; Ni, B. MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis. arXiv 2020, arXiv:2010.14925.
  20. Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]. IEEE Signal Process. Mag. 2012, 29, 141–142.
  21. Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.T.; Blum, M.; Hutter, F. Auto-sklearn: Efficient and Robust Automated Machine Learning. In The Springer Series on Challenges in Machine Learning; Springer: Cham, Switzerland, 2019.
  22. Jin, H.; Song, Q.; Hu, X. Auto-Keras: An Efficient Neural Architecture Search System. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019.
  23. Hinton, G.E.; Krizhevsky, A.; Wang, S.D. Transforming Auto-Encoders. In Artificial Neural Networks and Machine Learning—ICANN 2011; Springer: Berlin/Heidelberg, Germany, 2011.
  24. Deliège, A.; Cioppa, A.; Van Droogenbroeck, M. HitNet: A neural network with capsules embedded in a Hit-or-Miss layer, extended with hybrid data augmentation and ghost capsules. arXiv 2018, arXiv:1806.06519.
  25. Rajasegaran, J.; Jayasundara, V.; Jayasekara, S.; Jayasekara, H.; Seneviratne, S.; Rodrigo, R. DeepCaps: Going Deeper with Capsule Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
  26. Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662.
  27. Xi, E.; Bing, S.; Jin, Y. Capsule Network Performance on Complex Data. arXiv 2017, arXiv:1712.03480.
  28. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
  29. Milletari, F.; Navab, N.; Ahmadi, S. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. arXiv 2016, arXiv:1606.04797.
  30. Rundo, L.; Han, C.; Nagano, Y.; Zhang, J.; Hataya, R.; Militello, C.; Tangherloni, A.; Nobile, M.S.; Ferretti, C.; Besozzi, D.; et al. USE-Net: Incorporating Squeeze-and-Excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets. Neurocomputing 2019, 365, 31–43.
  31. Rundo, L.; Han, C.; Zhang, J.; Hataya, R.; Nagano, Y.; Militello, C.; Ferretti, C.; Nobile, M.S.; Tangherloni, A.; Gilardi, M.C.; et al. CNN-Based Prostate Zonal Segmentation on T2-Weighted MR Images: A Cross-Dataset Study; Springer: Singapore, 2020.
  32. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
  33. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
  34. Kather, J.N.; Krisam, J.; Charoentong, P.; Luedde, T.; Herpel, E.; Weis, C.-A.; Gaiser, T.; Marx, A.; Valous, N.A.; Ferber, D.; et al. Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Med. 2019, 16, e1002730.
  35. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 2018, 5, 180161.
  36. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.e9.
  37. Bilic, P.; Christ, P.F.; Vorontsov, E.; Chlebus, G.; Chen, H.; Dou, Q.; Fu, C.-W.; Han, X.; Heng, P.-A.; Hesser, J.; et al. The Liver Tumor Segmentation Benchmark (LiTS). arXiv 2019, arXiv:1901.04056.
  38. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
Figure 1. Attentive octave convolution capsule network. (a) Attentive octave convolution layer. (b) Main framework of AOC-Caps.
Figure 2. An Overview of the Datasets: PathMNIST, DermaMNIST, OCTMNIST, PneumoniaMNIST, OrganMNIST_Axial, OrganMNIST_Coronal, and OrganMNIST_ Sagittal.
Table 1. Datasets in MedMNIST used in our experiments.
Name | Data Modality | Tasks | #Training | #Validation | #Test
PathMNIST | Pathology | Multi-Class | 89,996 | 10,004 | 7180
DermaMNIST | Dermatoscope | Multi-Class | 7007 | 1003 | 2005
OCTMNIST | OCT | Multi-Class | 97,477 | 10,832 | 1000
PneumoniaMNIST | Chest X-ray | Binary-Class | 4708 | 524 | 624
OrganMNIST_Axial | Abdominal CT | Multi-Class | 34,581 | 6491 | 17,778
OrganMNIST_Coronal | Abdominal CT | Multi-Class | 13,000 | 2392 | 8268
OrganMNIST_Sagittal | Abdominal CT | Multi-Class | 13,940 | 2452 | 8829
Table 2. Details of PathMNIST dataset.
Name | #Training | #Validation | #Testing
adipose | 9366 | 1041 | 1338
background | 9509 | 1057 | 847
debris | 10,360 | 1152 | 339
lymphocytes | 10,401 | 1156 | 634
mucus | 8006 | 890 | 1035
smooth muscle | 12,182 | 1354 | 592
normal colon mucosa | 7886 | 877 | 741
cancer-associated stroma | 9401 | 1045 | 421
colorectal adenocarcinoma epithelium | 12,885 | 1432 | 1233
Table 3. Details of DermaMNIST dataset.
Name | #Training | #Validation | #Testing
Actinic Keratoses and Intraepithelial Carcinoma | 228 | 33 | 66
Basal Cell Carcinoma | 359 | 52 | 103
Benign Keratosis-like Lesions | 769 | 110 | 220
Dermatofibroma | 80 | 12 | 23
Melanoma | 779 | 111 | 223
Melanocytic Nevi | 4693 | 671 | 1341
Vascular Lesions | 99 | 14 | 29
Table 4. Details of OCTMNIST dataset.
Name | #Training | #Validation | #Testing
Choroidal Neovascularization | 33,484 | 3721 | 250
Diabetic Macular Edema | 10,213 | 1135 | 250
Drusen | 7754 | 862 | 250
Normal | 46,026 | 5114 | 250
Table 5. Details of PneumoniaMNIST dataset.
Name | #Training | #Validation | #Testing
Normal | 1214 | 135 | 234
Pneumonia | 3494 | 389 | 390
Table 6. Details of OrganMNIST_Axial dataset.
Name | #Training | #Validation | #Testing
Bladder | 1956 | 321 | 1036
Femur-left | 1408 | 233 | 784
Femur-right | 1359 | 225 | 793
Heart | 1474 | 392 | 785
Kidney-left | 3963 | 568 | 2064
Kidney-right | 3817 | 637 | 1965
Liver | 6164 | 1033 | 3285
Lung-left | 3919 | 1033 | 1747
Lung-right | 3929 | 1009 | 1813
Pancreas | 3031 | 529 | 1622
Spleen | 3561 | 511 | 1884
Table 7. Details of OrganMNIST_Coronal dataset.
Name | #Training | #Validation | #Testing
Bladder | 1153 | 191 | 833
Femur-left | 626 | 102 | 442
Femur-right | 608 | 96 | 441
Heart | 600 | 202 | 421
Kidney-left | 1088 | 132 | 732
Kidney-right | 1170 | 157 | 737
Liver | 2986 | 429 | 1836
Lung-left | 1002 | 347 | 550
Lung-right | 1022 | 352 | 558
Pancreas | 1173 | 179 | 750
Spleen | 1572 | 205 | 968
Table 8. Details of OrganMNIST_Sagittal dataset.
Name | #Training | #Validation | #Testing
Bladder | 1148 | 188 | 811
Femur-left | 637 | 104 | 439
Femur-right | 615 | 95 | 447
Heart | 721 | 246 | 510
Kidney-left | 1132 | 140 | 704
Kidney-right | 1119 | 159 | 693
Liver | 3464 | 491 | 2078
Lung-left | 741 | 261 | 397
Lung-right | 803 | 275 | 439
Pancreas | 2004 | 280 | 1343
Spleen | 1556 | 213 | 968
Table 9. Performance of different capsule forms and routing.
Dataset Names | AOC-Caps (Dynamic Routing): ACC / PRE / REC / F1 / AUC | AOC-Caps (EM Routing): ACC / PRE / REC / F1 / AUC
PathMNIST | 0.794 / 0.790 / 0.782 / 0.783 / 0.952 | 0.859 / 0.840 / 0.837 / 0.845 / 0.960
DermaMNIST | 0.732 / 0.420 / 0.368 / 0.390 / 0.883 | 0.786 / 0.485 / 0.417 / 0.447 / 0.890
OCTMNIST | 0.707 / 0.788 / 0.760 / 0.773 / 0.950 | 0.750 / 0.820 / 0.802 / 0.819 / 0.955
PneumoniaMNIST | 0.859 / 0.862 / 0.843 / 0.847 / 0.985 | 0.931 / 0.914 / 0.893 / 0.895 / 0.990
OrganMNIST_Axial | 0.802 / 0.799 / 0.773 / 0.785 / 0.950 | 0.931 / 0.920 / 0.903 / 0.919 / 0.973
OrganMNIST_Coronal | 0.794 / 0.803 / 0.892 / 0.899 / 0.940 | 0.907 / 0.911 / 0.904 / 0.904 / 0.965
OrganMNIST_Sagittal | 0.739 / 0.680 / 0.665 / 0.669 / 0.967 | 0.783 / 0.731 / 0.717 / 0.723 / 0.980
Table 10. Performance of different pooling operations in AOC-Layer.
Dataset Names | AOC-Caps (AVG): ACC / PRE / REC / F1 / AUC | AOC-Caps (MAX): ACC / PRE / REC / F1 / AUC
PathMNIST | 0.847 / 0.829 / 0.820 / 0.832 / 0.960 | 0.859 / 0.840 / 0.837 / 0.845 / 0.960
DermaMNIST | 0.764 / 0.447 / 0.380 / 0.403 / 0.860 | 0.786 / 0.485 / 0.417 / 0.447 / 0.890
OCTMNIST | 0.739 / 0.807 / 0.789 / 0.798 / 0.930 | 0.750 / 0.820 / 0.802 / 0.819 / 0.955
PneumoniaMNIST | 0.918 / 0.895 / 0.879 / 0.880 / 0.985 | 0.931 / 0.914 / 0.893 / 0.895 / 0.990
OrganMNIST_Axial | 0.927 / 0.915 / 0.900 / 0.912 / 0.970 | 0.931 / 0.920 / 0.903 / 0.919 / 0.973
OrganMNIST_Coronal | 0.886 / 0.889 / 0.878 / 0.883 / 0.960 | 0.907 / 0.911 / 0.904 / 0.904 / 0.965
OrganMNIST_Sagittal | 0.769 / 0.718 / 0.702 / 0.710 / 0.970 | 0.783 / 0.731 / 0.717 / 0.723 / 0.980
Table 11. Performance of AOC-Caps with/without attention module.
Dataset Names | AOC-Caps (without Attention): ACC / PRE / REC / F1 / AUC | AOC-Caps (with Attention): ACC / PRE / REC / F1 / AUC
PathMNIST | 0.805 / 0.790 / 0.785 / 0.785 / 0.945 | 0.859 / 0.840 / 0.837 / 0.845 / 0.960
DermaMNIST | 0.721 / 0.405 / 0.337 / 0.382 / 0.850 | 0.786 / 0.485 / 0.417 / 0.447 / 0.890
OCTMNIST | 0.689 / 0.756 / 0.743 / 0.749 / 0.943 | 0.750 / 0.820 / 0.802 / 0.819 / 0.955
PneumoniaMNIST | 0.863 / 0.847 / 0.826 / 0.833 / 0.980 | 0.931 / 0.914 / 0.893 / 0.895 / 0.990
OrganMNIST_Axial | 0.877 / 0.864 / 0.845 / 0.856 / 0.972 | 0.931 / 0.920 / 0.903 / 0.919 / 0.973
OrganMNIST_Coronal | 0.832 / 0.843 / 0.830 / 0.836 / 0.955 | 0.907 / 0.911 / 0.904 / 0.904 / 0.965
OrganMNIST_Sagittal | 0.711 / 0.654 / 0.630 / 0.648 / 0.955 | 0.783 / 0.731 / 0.717 / 0.723 / 0.980
Table 12. Performance of seven datasets in MedMNIST.
Dataset Names | ResNet18: ACC / PRE / REC / F1 / AUC | ResNet50: ACC / PRE / REC / F1 / AUC
PathMNIST | 0.844 / 0.817 / 0.817 / 0.805 / 0.966 | 0.864 / 0.832 / 0.833 / 0.820 / 0.978
DermaMNIST | 0.721 / 0.468 / 0.373 / 0.391 / 0.895 | 0.710 / 0.363 / 0.332 / 0.343 / 0.886
OCTMNIST | 0.758 / 0.814 / 0.749 / 0.713 / 0.951 | 0.745 / 0.809 / 0.730 / 0.702 / 0.951
PneumoniaMNIST | 0.843 / 0.895 / 0.797 / 0.817 / 0.970 | 0.857 / 0.895 / 0.797 / 0.817 / 0.966
OrganMNIST_Axial | 0.921 / 0.924 / 0.912 / 0.917 / 0.995 | 0.916 / 0.917 / 0.906 / 0.910 / 0.995
OrganMNIST_Coronal | 0.889 / 0.889 / 0.886 / 0.886 / 0.990 | 0.893 / 0.890 / 0.886 / 0.886 / 0.992
OrganMNIST_Sagittal | 0.762 / 0.714 / 0.706 / 0.694 / 0.969 | 0.746 / 0.705 / 0.697 / 0.692 / 0.970
Dataset Names | Auto-sklearn: ACC / PRE / REC / F1 / AUC | AutoKeras: ACC / PRE / REC / F1 / AUC
PathMNIST | 0.186 / - / - / - / 0.500 | 0.864 / - / - / - / 0.979
DermaMNIST | 0.734 / - / - / - / 0.906 | 0.756 / - / - / - / 0.921
OCTMNIST | 0.595 / - / - / - / 0.883 | 0.736 / - / - / - / 0.956
PneumoniaMNIST | 0.865 / - / - / - / 0.947 | 0.918 / - / - / - / 0.970
OrganMNIST_Axial | 0.563 / - / - / - / 0.797 | 0.929 / - / - / - / 0.996
OrganMNIST_Coronal | 0.676 / - / - / - / 0.898 | 0.915 / - / - / - / 0.992
OrganMNIST_Sagittal | 0.601 / - / - / - / 0.855 | 0.803 / - / - / - / 0.972
Dataset Names | Google AutoML Vision: ACC / PRE / REC / F1 / AUC | CapsNet (Dynamic): ACC / PRE / REC / F1 / AUC
PathMNIST | 0.811 / - / - / - / 0.982 | 0.710 / 0.693 / 0.665 / 0.682 / 0.851
DermaMNIST | 0.766 / - / - / - / 0.925 | 0.601 / 0.332 / 0.314 / 0.308 / 0.807
OCTMNIST | 0.732 / - / - / - / 0.965 | 0.598 / 0.657 / 0.677 / 0.669 / 0.890
PneumoniaMNIST | 0.941 / - / - / - / 0.993 | 0.738 / 0.793 / 0.773 / 0.779 / 0.930
OrganMNIST_Axial | 0.818 / - / - / - / 0.988 | 0.738 / 0.758 / 0.741 / 0.747 / 0.923
OrganMNIST_Coronal | 0.861 / - / - / - / 0.986 | 0.740 / 0.747 / 0.730 / 0.742 / 0.943
OrganMNIST_Sagittal | 0.706 / - / - / - / 0.964 | 0.635 / 0.595 / 0.578 / 0.583 / 0.852
Dataset Names | CapsNet (EM): ACC / PRE / REC / F1 / AUC | MS-Caps: ACC / PRE / REC / F1 / AUC
PathMNIST | 0.810 / 0.799 / 0.776 / 0.783 / 0.951 | 0.843 / 0.830 / 0.809 / 0.817 / 0.955
DermaMNIST | 0.713 / 0.430 / 0.372 / 0.398 / 0.867 | 0.720 / 0.463 / 0.389 / 0.412 / 0.880
OCTMNIST | 0.695 / 0.775 / 0.730 / 0.752 / 0.932 | 0.673 / 0.754 / 0.730 / 0.749 / 0.950
PneumoniaMNIST | 0.842 / 0.879 / 0.836 / 0.858 / 0.960 | 0.810 / 0.833 / 0.819 / 0.825 / 0.960
OrganMNIST_Axial | 0.870 / 0.883 / 0.845 / 0.859 / 0.987 | 0.889 / 0.890 / 0.867 / 0.873 / 0.980
OrganMNIST_Coronal | 0.843 / 0.857 / 0.833 / 0.839 / 0.980 | 0.863 / 0.870 / 0.851 / 0.859 / 0.977
OrganMNIST_Sagittal | 0.701 / 0.667 / 0.601 / 0.628 / 0.963 | 0.742 / 0.701 / 0.678 / 0.688 / 0.962
Dataset Names | DeepCaps: ACC / PRE / REC / F1 / AUC | AOC-Caps: ACC / PRE / REC / F1 / AUC
PathMNIST | 0.791 / 0.783 / 0.770 / 0.779 / 0.965 | 0.859 / 0.840 / 0.837 / 0.845 / 0.960
DermaMNIST | 0.749 / 0.406 / 0.337 / 0.390 / 0.898 | 0.786 / 0.485 / 0.417 / 0.447 / 0.890
OCTMNIST | 0.600 / 0.676 / 0.659 / 0.663 / 0.920 | 0.750 / 0.820 / 0.802 / 0.819 / 0.955
PneumoniaMNIST | 0.821 / 0.847 / 0.828 / 0.836 / 0.955 | 0.931 / 0.914 / 0.893 / 0.895 / 0.990
OrganMNIST_Axial | 0.856 / 0.867 / 0.851 / 0.859 / 0.961 | 0.931 / 0.920 / 0.903 / 0.919 / 0.973
OrganMNIST_Coronal | 0.847 / 0.855 / 0.842 / 0.843 / 0.950 | 0.907 / 0.911 / 0.904 / 0.904 / 0.965
OrganMNIST_Sagittal | 0.737 / 0.693 / 0.659 / 0.673 / 0.960 | 0.783 / 0.731 / 0.717 / 0.723 / 0.980
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
