Article

Convolutional Neural Networks: A Comprehensive Evaluation and Benchmarking of Pooling Layer Variants

1 Department of Computer Science, National University of Technology, Islamabad 44000, Pakistan
2 Department of Computing, NASTP Institute of Information Technology, Lahore 58810, Pakistan
3 Department of Information Systems, College of Computer and Information Science, King Saud University, Riyadh 11543, Saudi Arabia
4 Department of Computing, Riphah International University, Lahore 39101, Pakistan
5 Department of Computer Science, University of Science and Technology, Beijing 100083, China
* Authors to whom correspondence should be addressed.
Symmetry 2024, 16(11), 1516; https://doi.org/10.3390/sym16111516
Submission received: 19 August 2024 / Revised: 5 November 2024 / Accepted: 6 November 2024 / Published: 12 November 2024
(This article belongs to the Section Computer)

Abstract

Convolutional Neural Networks (CNNs) are a class of deep neural networks that have proven highly effective in areas such as image and video recognition. CNNs typically include several types of layers, such as convolutional layers, activation layers, pooling layers, and fully connected layers, all of which contribute to the network’s ability to recognize patterns and features. The pooling layer, which often follows the convolutional layer, is crucial for reducing computational complexity by performing down-sampling while maintaining essential features. This layer’s role in balancing the symmetry of information across the network is vital for optimal performance. However, the choice of pooling method is often based on intuition, which can lead to less accurate or efficient results. This research compares various standard pooling methods (MAX and AVERAGE pooling) on standard datasets (MNIST, CIFAR-10, and CIFAR-100) to determine the most effective approach in preserving detail, performance, and overall computational efficiency while maintaining the symmetry necessary for robust CNN performance.

1. Introduction

Convolutional neural networks (CNNs) are among the most effective methods for challenging computer vision and image-analysis tasks such as image segmentation [1] and classification [2]. Each convolutional layer of a CNN combines convolutions, nonlinear activations, and often pooling operators, and the convolutional layers are typically followed by one or more fully connected layers. CNNs are feedforward networks because information is passed in only one direction, from input to output. CNNs and artificial neural networks (ANNs) are rooted in biological principles: their design is inspired by the brain's visual cortex, which alternates layers of simple and higher-order cells [3]. CNN architectures vary widely, but they all consist of convolutional and pooling layers organized into modules. These modules are followed by one or more fully connected layers, as in standard feedforward neural networks, and are typically stacked on top of each other to build complex models [4]. A typical CNN architecture for a toy image-classification task is shown in Figure 1. Images are fed directly into the network and pass through a series of convolution and pooling layers; the representations formed by these layers are then fed into one or more fully connected layers, and a final classifier built on the fully connected layers produces the prediction. Although this is the most commonly used basic design in the literature, many architectural improvements have recently been proposed to increase image-classification accuracy or reduce computational overhead.
Convolutional layers act as feature extractors: they learn feature representations of the input images. Neurons in a convolutional layer are organized into feature maps. Each neuron in a feature map is connected to a local neighborhood (its receptive field) in the previous layer through a set of trainable weights, often called a filter bank. The input is convolved with the learned weights to create a new feature map, and the output is passed through a nonlinear activation function. All neurons within a feature map share the same weights, while different feature maps in the same convolutional layer use different weights, so multiple features can be extracted at each location [5]. The kth output feature map y_k can be defined more formally as
$y_k = f(w_k \ast x)$
where x represents the input image, w_k represents the convolution filter associated with the kth feature map, ∗ denotes the 2D convolution operator, and f(·) represents the nonlinear activation function. The convolution operator computes the inner product of the filter with the input at each location of the input image.
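As an illustration of Equation (1), the following minimal sketch applies a bank of filters to a random input using Keras/TensorFlow, the framework adopted later in Section 3.3; the filter count, kernel size, and input shape are illustrative assumptions rather than values from this study.

```python
# Minimal sketch of Equation (1), y_k = f(w_k * x): each filter w_k produces one
# feature map y_k after the nonlinearity f. Sizes below are illustrative only.
import tensorflow as tf

x = tf.random.uniform((1, 32, 32, 3))      # one RGB input image x
conv = tf.keras.layers.Conv2D(
    filters=8,                             # eight filter banks w_1..w_8
    kernel_size=3,                         # 3x3 receptive field
    activation="relu",                     # the nonlinearity f(.)
)
y = conv(x)
print(y.shape)                             # (1, 30, 30, 8): one 30x30 map per filter
```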
Before the arrival of Convolutional Neural Networks (CNNs), traditional machine learning models such as Support Vector Machines (SVM) [6] and K-Nearest Neighbors (KNN) [7] were commonly employed for image classification, where each pixel was treated as an individual feature. The introduction of CNNs revolutionized this approach by using convolutional layers to extract multiple features from an image, enhancing the prediction of output values. Since the convolution operation is computationally intensive, pooling layers were integrated into CNNs to make the process more efficient. Pooling reduces the computational load by down-sampling the input, which decreases the number of computations required while preserving the most critical information. The pooling method streamlines the processing within the network, maintaining essential details with significantly lower resource consumption [8].
While CNNs have significantly improved image classification, various enhancements have been proposed to further optimize performance, including hybrid models [9] that combine CNNs with other machine learning algorithms to improve accuracy or reduce computational complexity. Recent advancements, such as adaptive pooling strategies and dynamic pooling methods, adjust pooling operations based on the input, allowing for better flexibility and feature retention [10]. However, the impact of these newer methods on standard architectures like AlexNet, ResNet, and LeNet remains an area of active research.
This study aims to provide an overview of various pooling methods, discussing the benefits and drawbacks of each approach (see Table 1). Additionally, we compare their performance in classification tasks using three distinct datasets.
The main contributions of this study include the following:
  • The proposed study systematically evaluated multiple CNN architectures—LeNet, AlexNet, and ResNet—across various datasets (MNIST, CIFAR-10, and CIFAR-100). This comprehensive analysis sheds light on how these models perform on datasets of differing complexities and sizes, providing insights into their adaptability and generalization capabilities across different image-classification tasks.
  • By presenting the comparative performance metrics, the proposed study identifies which CNN architectures excel or struggle when applied to specific datasets. This helps researchers to understand the strengths and weaknesses of each model in handling distinct image datasets, aiding in informed model selection for particular tasks or datasets.
  • This study provides a comprehensive comparison of standard pooling methods—max and average pooling—evaluated across different CNN architectures, including CNN, AlexNet, ResNet, and LeNet, on multiple datasets. While prior studies have discussed individual pooling methods, few have provided a systematic comparison in this context. Additionally, in this study, these methods were evaluated in light of recent advancements, highlighting the practical implications for resource-constrained environments.
The rest of this study is organized as follows: Section 2 reviews work related to the standard pooling methods proposed for computer vision and various image-analysis applications; Section 3 presents the datasets and experimental procedures, reports the results, and provides a detailed discussion; and Section 4 concludes the study.

2. Related Work

The publications included in this review were sourced through a thorough search that utilized combinations of the terms “Pooling”, “CNN”, and “Convolution” (along with related terms such as “convolutional”) across titles, keywords, and abstracts. Following an initial screening of the results, additional relevant literature was incorporated by carefully examining references and related works from the selected papers, with a focus specifically on the applications of CNNs. While some foundational studies, such as Yamaguchi’s introduction of max pooling in the early 1990s, are noted, the majority of pooling techniques and advancements have emerged in the last decade [11]. Figure 2 illustrates a sustained interest in pooling research over the past eight years, with only minor fluctuations in publication frequency.
Two pooling groups are commonly employed in CNNs for feature reduction. The first is local pooling, which summarizes small local regions of the feature map, such as 3 × 3 neighborhoods. The second is global pooling, which reduces each feature map of the image representation to a single scalar value [12]. A fully connected layer takes all of these representations and performs the classification. For example, the well-known DenseNet consists of one global pooling layer and four local pooling layers. The three commonly used pooling operations are max pooling, average pooling, and min pooling [13]. This study discusses each pooling operation's properties, advantages, and limitations.
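The distinction between the two groups can be sketched in Keras as follows; the tensor sizes and layer choices are illustrative assumptions, not configurations from this study.

```python
# Local pooling summarizes small neighborhoods; global pooling reduces each
# feature map to a single scalar. Shapes below are illustrative.
import tensorflow as tf

feature_maps = tf.random.uniform((1, 12, 12, 64))          # activations from a conv block

local = tf.keras.layers.MaxPooling2D(pool_size=3)(feature_maps)
print(local.shape)    # (1, 4, 4, 64): every 3x3 region collapsed to one value

global_ = tf.keras.layers.GlobalAveragePooling2D()(feature_maps)
print(global_.shape)  # (1, 64): one scalar per feature map, ready for a dense classifier
```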

2.1. Max Pooling

Max pooling is a simple operation widely used in CNNs because it has no tuneable parameters [14]. Max pooling reduces the spatial dimensions of the feature map while providing a degree of invariance to the network: each k × k neighborhood of the feature map is represented by its highest value, i.e., the largest element in each pooling zone is selected. With sparse codes and simple linear classifiers, max pooling shows better performance, which is one reason it has grown in popularity in recent years [15]. Its ability to handle sparse representations efficiently is another reason for its success. The mathematical expression for max pooling is
$f_{\max}(X) = \max_i x_i$
For each of the J filters, the mth max pooling band is composed over a window of R consecutive convolution bands:
$p_{j,m} = \max_{r \in \{1,\dots,R\}} h_{j,(m-1)N + r}$
Here, N ∈ {1, …, R} is termed the pooling shift, which allows for overlap between adjacent pooling regions when N < R. The pooling layer reduces the output from K convolution bands to M = (K − R)/N + 1 pooling regions, and the resulting layer is p = [p_1, …, p_M] ∈ ℝ^{M·J}.
The primary limitation of max pooling lies in its selection of the maximum element from the pooling region while disregarding other values, potentially leading to the loss of distinguishing features and critical information. Studies have highlighted that, despite enhancing computational efficiency and reducing dimensionality in CNNs, max pooling can compromise spatial information and introduce inconsistencies in activations [16]. Furthermore, in object-detection tasks, max pooling often results in poor localization accuracy, particularly for small or low-resolution objects [17]. To mitigate these drawbacks, a novel approach, called Spatial Pyramid Pooling (SPP), has been proposed, which employs multiple pooling layers at varying spatial resolutions to better capture spatial information, demonstrating superior performance over max pooling in benchmark object detection datasets [18].
Figure 3 illustrates the max pooling operation. In this example, the input to the pooling region is 4-by-4, while the filter size is 2-by-2 with a stride of 2. Max pooling extracts the maximum value of 20 from the first 2 × 2 segment (highlighted in green), and the highest value from each segment is selected to generate the output channel. However, max pooling only considers the largest value and ignores the others. As a result, when most elements have high values, significant features may be lost after max pooling, potentially leading to adverse outcomes.
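The operation in Figure 3 can be reproduced numerically with the short sketch below; only the value 20 is taken from the text, and the remaining input entries are made-up placeholders.

```python
# Max pooling over a 4x4 input with a 2x2 window and stride 2, as in Figure 3.
# Only the value 20 comes from the text; the other entries are placeholders.
import numpy as np

x = np.array([[20, 12,  7,  9],
              [ 4, 15,  3,  8],
              [ 6,  2, 18,  5],
              [11, 13,  1, 10]], dtype=float)

def pool2d(x, size=2, stride=2, op=np.max):
    """Apply `op` to each size x size window taken with the given stride."""
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = op(x[i * stride:i * stride + size,
                             j * stride:j * stride + size])
    return out

print(pool2d(x, op=np.max))   # [[20.  9.] [13. 18.]]: the top-left window yields 20
```

Swapping np.max for np.mean or np.min in the same helper yields the average and min pooling variants discussed in the following subsections.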

2.2. Average Pooling

An average pooling layer down-samples the input by splitting it into rectangular pooling regions and computing the average value of each region. The idea of extracting features by averaging was first introduced by the authors of [19] and was implemented in the first convolution-based deep neural network. Figure 4 demonstrates an example of the average pooling operation: the standard method divides the input into several independent rectangular boxes, computes the average value of each box, and assembles these averages into the output channel. Average pooling is mathematically defined as
$f_{\mathrm{ave}}(X) = \frac{1}{N}\sum_{i=1}^{N} x_i$
where x is a vector of the N activations in a rectangular box of an image or channel (for example, the rectangular area in Figure 4 is 2 by 2). Average pooling used to be common, but the arrival of max pooling has constrained its usefulness [20]; its main shortcoming is a loss of information in terms of contrast. Because all activation values in the rectangular box contribute to the mean, the estimated mean will be low whenever all activations are weak, which diminishes contrast. The situation becomes worse when the majority of the activations in the pooling region are zero, in which case the convolutional feature response is reduced significantly. Averaging does reduce noise-inducing elements; however, since it gives each element in the pooling region equal priority, background regions may dominate the pooled representation, which can diminish its discriminative power [21].
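A tiny numerical sketch of Equation (2) and of the contrast-loss issue described above; the activation values are illustrative, not data from this study.

```python
# Average pooling of one rectangular box (Equation (2)) and the contrast-loss
# effect when most activations are zero. Values are illustrative only.
import numpy as np

box = np.array([4.0, 0.0, 2.0, 6.0])        # N = 4 activations x_i from a 2x2 region
print(box.mean())                            # f_ave = (1/N) * sum(x_i) = 3.0

sparse_box = np.array([0.0, 0.0, 0.0, 6.0])  # mostly-zero region
print(sparse_box.mean(), sparse_box.max())   # 1.5 vs 6.0: the mean washes out the
                                             # strong response that max pooling keeps
```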
Since neither max pooling nor average pooling consistently demonstrates superior performance, several techniques have emerged that combine the strengths of both methods, such as weighted pooling [22] and soft pooling [23]. These hybrid approaches introduce additional parameters, leading to increased learning time and computational overhead. However, these methods still face challenges, as they either prioritize the stronger activation or treat all activations equally, with existing studies primarily depending on activation values to address these limitations.

2.3. Min Pooling

Min pooling is a pooling operation that selects the minimum value within a sliding window, though it is less frequently used than max or average pooling, as it tends to preserve the smallest and least significant features of the input [24]. However, it is beneficial in specific applications, such as anomaly detection and background subtraction, where detecting differences from a reference signal is essential. A comparison of standard pooling methods—max, average, and min pooling—along with their strengths and weaknesses, is presented in Table 1. Studies indicate that max pooling is generally more effective for tasks requiring the capture of highly discriminative features, while average pooling offers greater robustness to noise and improved generalization by considering the overall context [25].
Recent advancements in CNNs have introduced hybrid models and adaptive pooling strategies to improve performance in various tasks. For example, Khairandish et al. (2022) proposed a hybrid approach combining CNNs with Support Vector Machines (SVMs) for more accurate image classification in limited-resource environments, demonstrating that hybrid methods can outperform traditional CNNs in certain contexts [26]. Similarly, Ding et al. (2024) introduced an adaptive pooling method for image–text retrieval that adjusts pooling parameters based on the input data, allowing the model to retain more relevant features while reducing computational complexity [27]. Li et al. (2024) explored the benefits of combining CNNs with other algorithms, such as Long Short-Term Memory (LSTM) networks, enabling enhanced feature extraction for recommendation algorithms [28]. Additionally, Han et al. (2019) surveyed dynamic neural networks, whose mechanisms can vary the pooling size depending on input characteristics, offering better adaptability and precision for complex datasets [29]. Zhao et al. (2024) presented a mixed-pooling strategy in which max and average pooling were combined in a single layer, demonstrating improvements in accuracy on specific benchmarks. While these studies have significantly contributed to the enhancement of CNNs, particularly through hybridization and adaptive-pooling techniques, they often focus on specific architectures or use cases. In contrast, the proposed study offers a systematic and direct comparison of three widely used pooling methods—max, min, and average pooling—across multiple architectures (AlexNet, ResNet, and LeNet) on diverse datasets. This broader evaluation provides a more generalizable understanding of how different pooling techniques impact performance across standard CNN architectures. Unlike the aforementioned works, which typically propose new architectures or adaptations, this study focuses on refining the core understanding of pooling operations, offering practical insights for improving CNN performance in resource-constrained environments without introducing additional complexity.

3. Material and Methods

To understand the impact of pooling techniques on the performance of convolutional neural networks (CNNs), this study analyzes three standard datasets, each chosen for its unique characteristics and challenges. The methodologies employed are designed to systematically evaluate and compare the effectiveness of different pooling strategies, thereby offering insights into their implications for CNN performance.

3.1. Datasets

This study used three standard datasets to evaluate the performance of pooling techniques across various convolutional neural network (CNN) architectures: MNIST [30], CIFAR-10 [31], and CIFAR-100 [32]. These datasets represent a range of complexities, from simple handwritten digits (MNIST) to diverse object classes (CIFAR-10 and CIFAR-100). The selection of these datasets allows for a comprehensive analysis of how different pooling methods (max, average, and min pooling) impact model performance across varying levels of classification difficulty.
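The three benchmarks can be obtained directly through Keras, the framework used in Section 3.3; the sketch below only loads the data and implies none of the preprocessing choices made in this study.

```python
# Loading the three benchmark datasets with the Keras dataset utilities.
from tensorflow.keras import datasets

(x_tr, y_tr), (x_te, y_te) = datasets.mnist.load_data()     # 28x28 grayscale digits, 10 classes
print(x_tr.shape, x_te.shape)                               # (60000, 28, 28) (10000, 28, 28)

(x_tr, y_tr), (x_te, y_te) = datasets.cifar10.load_data()   # 32x32 RGB images, 10 classes
(x_tr, y_tr), (x_te, y_te) = datasets.cifar100.load_data()  # 32x32 RGB images, 100 classes
```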

3.2. Model Architecture

The three widely adopted CNN architectures employed in this study are LeNet, AlexNet, and ResNet. These architectures were selected for their established performance and distinct structural characteristics, which provide a robust testing ground for evaluating different pooling strategies. LeNet is a smaller architecture primarily designed for simpler tasks, such as digit classification, and consists of convolutional and fully connected layers. In contrast, AlexNet features a deeper design with more convolutional layers, specifically engineered to address complex image-classification problems. ResNet, recognized for its innovative use of residual connections, is deeper and more intricate than both LeNet and AlexNet, making it particularly well-suited for high-dimensional image-classification tasks. Together, these architectures create a comprehensive platform for analyzing the effects of various pooling techniques on model performance across a range of classification challenges.
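As a concrete example, the LeNet variant summarized later in Table 3 can be sketched in Keras with the pooling layer exposed as a parameter, so that max and average pooling can be swapped; the tanh activations are an assumption, since the paper does not state LeNet's activation function.

```python
# Keras sketch of the LeNet layout in Table 3, parameterized by the pooling layer.
from tensorflow.keras import layers, models

def build_lenet(num_classes, pooling=layers.MaxPooling2D):
    return models.Sequential([
        layers.Input((32, 32, 1)),
        layers.Conv2D(6, 5, activation="tanh"),     # -> 28 x 28 x 6
        pooling(pool_size=2, strides=2),            # -> 14 x 14 x 6
        layers.Conv2D(16, 5, activation="tanh"),    # -> 10 x 10 x 16
        pooling(pool_size=2, strides=2),            # -> 5 x 5 x 16
        layers.Conv2D(120, 5, activation="tanh"),   # -> 1 x 1 x 120
        layers.Flatten(),
        layers.Dense(84, activation="tanh"),
        layers.Dense(num_classes, activation="softmax"),
    ])

# Max pooling vs. average pooling variants of the same network:
max_lenet = build_lenet(10, pooling=layers.MaxPooling2D)
avg_lenet = build_lenet(10, pooling=layers.AveragePooling2D)
max_lenet.summary()
```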

3.3. Experimental Setup

The experiments in this study were conducted using Keras with TensorFlow as the backend framework, establishing a controlled environment for evaluating the performance of different pooling techniques across various CNN architectures. Each dataset, MNIST, CIFAR-10, and CIFAR-100, was divided using an 80/20 split, allocating 80% of the data for training and reserving 20% for testing. This division allowed for a thorough assessment of both recognition capabilities and generalization of performance of the proposed models. To ensure consistency and fairness in the evaluation, model training was executed under multiple configurations of batch sizes (32, 64, 128, and 256), learning rates (0.01, 0.001, and 0.0001), and dropout rates (ranging from 0.1 to 0.5, including a no-dropout condition). The Stochastic Gradient Descent (SGD) optimizer was utilized for the LeNet architecture, while the Adam and RMSProp optimizers were applied to AlexNet and ResNet, respectively, optimizing model performance for multi-class classification tasks. The categorical cross-entropy loss function guided the training process, with early stopping employed based on validation loss to mitigate overfitting. This comprehensive setup ensured a rigorous evaluation of each pooling method’s impact on model accuracy and generalization across diverse datasets. The proposed architecture of comparative approaches is presented in Table 2, Table 3, Table 4 and Table 5.
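The wiring described above can be sketched as follows for one hyperparameter combination from the grids listed; the split utility (scikit-learn), the placeholder model, the epoch count, and the early-stopping patience are assumptions, not settings reported in the paper.

```python
# Sketch of the training setup: 80/20 split, categorical cross-entropy, one of the
# batch-size/learning-rate/dropout combinations, and early stopping on validation loss.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks, optimizers
from sklearn.model_selection import train_test_split

(x, y), _ = tf.keras.datasets.cifar10.load_data()
x = x.astype("float32") / 255.0
y = tf.keras.utils.to_categorical(y, 10)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Placeholder network; any of the architectures in Tables 2-5 would slot in here.
model = models.Sequential([
    layers.Input((32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dropout(0.3),                       # one dropout rate from the 0.1-0.5 grid
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01),   # SGD (LeNet); Adam/RMSProp elsewhere
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          batch_size=64, epochs=50,
          validation_data=(x_test, y_test),
          callbacks=[callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                             restore_best_weights=True)])
```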

3.4. Result and Analysis

In our research, each comparative approach involving various neural network architectures, such as CNN, AlexNet, ResNet, and LeNet, was subjected to different configurations of batch size, learning rate, and dropout. Selecting distinct hyperparameters aimed to explore their impact on model performance and convergence across pooling techniques (max and average) on datasets like MNIST, CIFAR-10, and CIFAR-100. Given the sensitivity of these hyperparameters in influencing training dynamics, their variation resulted in divergent accuracy results across the models. Batch sizes were chosen to regulate the number of samples processed per iteration, affecting gradient updates and the convergence speed. Learning rates played a critical role in controlling the step size during optimization, impacting the model’s ability to navigate the loss landscape. Additionally, dropout rates were manipulated to mitigate overfitting by randomly deactivating neurons during training, affecting the model’s generalization capability. Consequently, the discrepancy in accuracy outcomes underscores the nuanced interplay between these hyperparameters and their consequential effect on model learning and generalization across different network architectures and pooling strategies.

3.5. Performance Evaluation in Terms of Accuracy

This section provides a comprehensive comparative analysis of the accuracy achieved on the MNIST, CIFAR-10, and CIFAR-100 datasets across various convolutional neural network (CNN) architectures, including CNN, LeNet, AlexNet, and ResNet. The performance of these architectures is evaluated under different batch sizes, learning rates, and dropout rates, focusing on the effects of max pooling and average pooling techniques. The objective is to identify which pooling technique and architectural configuration yields the best performance for each dataset, thereby offering valuable insights into optimizing CNN models for diverse image-classification tasks. The detailed results of comparative approaches are presented in Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11.
The comparative analysis of CNN, AlexNet, ResNet, and LeNet using max and average pooling methods under varying batch sizes and dropout rates highlights the distinct impact of pooling strategies on model performance from Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11. Under max pooling conditions, AlexNet consistently outperformed other models, showcasing its ability to extract dominant features from the data efficiently. This superior performance is evident in AlexNet’s stable and high classification accuracy across different configurations, even as dropout rates and batch sizes varied. The architecture of AlexNet is particularly suited for capturing the most salient features, which is effectively facilitated by the max pooling method. In contrast, CNN emerged as the second-best performer in the max pooling setup. Although it demonstrated strong feature-extraction capabilities, CNN showed higher sensitivity to changes in hyperparameters compared to AlexNet, indicating some variability in performance.
When average pooling was employed, the performance dynamics shifted, with CNN taking the lead. CNN’s architecture leveraged the generalization provided by average pooling to smooth feature representations, resulting in more stable and consistent accuracy, particularly at moderate dropout rates and optimized batch sizes. This suggests that average pooling is effective for CNN, as it promotes a balanced feature representation across spatial dimensions and reduces overfitting. AlexNet, while still achieving respectable accuracy, performed suboptimally with average pooling. Its reliance on max pooling for optimal feature extraction limited its ability to fully exploit the benefits of average pooling, which is better suited for models that require feature smoothing rather than emphasizing the most activated features.
Both ResNet and LeNet exhibited relatively lower performance with both pooling methods. ResNet’s complex architecture with skip connections did not gain significant benefits from either pooling strategy, while LeNet’s simpler structure was unable to fully capitalize on the feature-extraction capabilities of max or average pooling. Overall, this study underscores the importance of selecting pooling strategies that align with the architectural strengths of deep learning models. The findings demonstrate that AlexNet excels with max pooling, while CNN performs best with average pooling, highlighting the necessity of adaptive pooling approaches to optimize model accuracy based on the specific dataset and model architecture.

3.6. Statistical Significance Analysis

To further validate the robustness and reliability of the comparative results between max and average pooling methods across standard datasets (CIFAR-10, CIFAR-100, and MNIST) using CNN, AlexNet, ResNet, and LeNet, a statistical significance analysis was conducted. Specifically, p-values were calculated to assess whether the differences in performance metrics, such as accuracy, precision, and computational efficiency, were statistically significant. The p-values provide an objective measure to determine whether the observed differences between pooling methods are due to random variations or represent true distinctions in performance [33]. A significance threshold of 0.05 was adopted, where p-values below this level indicate that the performance differences are statistically significant and unlikely to have occurred by chance. Table 12 shows the comparative analysis of p-values on standard datasets.
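The text does not name the statistical test used to obtain the p-values; one common choice for matched configurations is a paired t-test, sketched below with SciPy on placeholder accuracy values.

```python
# Sketch of a p-value comparison between pooling methods over matched
# hyperparameter configurations. The paired t-test and the accuracy values
# here are illustrative assumptions, not the paper's exact procedure or data.
from scipy import stats

max_pool_acc = [97.8, 96.6, 98.1, 97.5, 96.4, 97.9]   # one model, six configurations
avg_pool_acc = [98.2, 98.3, 98.0, 96.9, 96.8, 96.3]   # same configurations, average pooling

t_stat, p_value = stats.ttest_rel(max_pool_acc, avg_pool_acc)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant = {p_value < 0.05}")
```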
The p-value results highlight the comparative performance of max pooling and average pooling across various models and datasets, demonstrating that max pooling consistently yields more statistically significant results in many cases. For instance, in the MNIST dataset, max pooling shows superior performance, especially with LeNet, where its p-values are significantly lower across all learning rates, indicating stronger statistical significance compared to average pooling. This trend is also evident in the CIFAR-10 dataset, where max pooling exhibits better robustness, particularly at a learning rate of 0.01, with a p-value of 0.0042 compared to 0.0001 for average pooling. Similarly, in the CIFAR-100 dataset, max pooling performs better at lower learning rates, maintaining strong statistical significance with more consistent p-values across architectures like CNN and LeNet. While average pooling occasionally shows lower p-values, particularly in deeper models like ResNet, max pooling demonstrates greater consistency and reliability across the board. Overall, max pooling emerges as the more robust and reliable pooling method, offering better statistical significance and performance stability across a variety of datasets and architectures. Figure 5 illustrates the comparative analysis of p-values for max pooling and average pooling across different neural network architectures and data.

3.7. Convergence Graph

A convergence graph visualizes the learning progress of a model over time, illustrating the effectiveness of various training parameters and architectures. This analysis focuses on the convergence graphs of four neural network architectures, CNN, AlexNet, LeNet, and ResNet, generated using Matplotlib, as shown in Figure 6. Key observations from the comparative analysis reveal that CNN and AlexNet exhibit stable convergence with minimal oscillations, indicating reliable training processes. In contrast, AlexNet converges quickly due to its deeper architecture, while ResNet shows initial instability but ultimately achieves strong performance, reflecting its robustness. AlexNet and ResNet demonstrate rapid initial improvements due to their complex designs, effectively capturing intricate patterns, whereas CNN and LeNet show more gradual progress. The plateau in AlexNet and stabilization in ResNet suggest these models reach optimal performance quickly, while CNN and LeNet may require more epochs for full optimization. Overall, the graphs highlight the distinct strengths of each architecture: CNN and AlexNet provide stable learning for simpler tasks, AlexNet excels in fast early stage learning for complex data, and CNN, despite initial fluctuations, demonstrates high performance and robustness. Adjusting hyperparameters like learning rate and batch size can further enhance these architectures for use in specific datasets and tasks.
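A minimal Matplotlib sketch of how convergence curves like those in Figure 6 can be drawn from Keras training histories; `histories` is an assumed mapping from model names to the objects returned by model.fit() and is not data from the paper.

```python
# Plot validation-accuracy convergence curves for several trained models.
import matplotlib.pyplot as plt

def plot_convergence(histories, metric="val_accuracy"):
    """histories: dict mapping a model name to a Keras History object."""
    for name, history in histories.items():
        plt.plot(history.history[metric], label=name)
    plt.xlabel("Epoch")
    plt.ylabel(metric)
    plt.title("Convergence comparison")
    plt.legend()
    plt.show()

# Usage (assuming the fit() calls have been run):
# plot_convergence({"CNN": cnn_hist, "LeNet": lenet_hist,
#                   "AlexNet": alex_hist, "ResNet": resnet_hist})
```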

3.8. Analysis of Optimal Pooling Performance and Parameter Trends

This section identifies and discusses the optimal performance achieved by max and average pooling across the MNIST, CIFAR-10, and CIFAR-100 datasets while also addressing the observed trends in performance based on parameter variations.
The CNN performs best with average pooling, achieving higher accuracy on MNIST and leveraging this pooling method to generalize effectively across the dataset. Specifically, CNN peaks with average pooling at 98.82% accuracy on MNIST using a learning rate of 0.001 and a batch size of 32 with a low dropout rate, suggesting that CNN benefits from the smoothed feature extraction of average pooling. In contrast, ResNet exhibits superior performance with max pooling, particularly on CIFAR-10 and CIFAR-100, where complex features demand higher selectivity. For instance, ResNet’s optimal CIFAR-10 performance with max pooling reaches 54.68% at a learning rate of 0.0001 and dropout of 0.4–0.5, showcasing its alignment with max pooling’s concentrated feature selection.
On the MNIST dataset, each model displays varied responses to pooling methods. LeNet performs moderately, achieving its peak of 98.9% accuracy with average pooling, a learning rate of 0.001, and a batch size of 128, though it generally trails CNN and AlexNet. AlexNet demonstrates robustness with both pooling methods, attaining high performance with max pooling, especially on CIFAR-10, where it reaches 67.89% accuracy with a learning rate of 0.001 and moderate dropout. This suggests that AlexNet's deeper architecture handles CIFAR-10's feature complexity well under max pooling. ResNet, which excels at retaining intricate feature details, benefits significantly from max pooling across datasets, particularly CIFAR-100, achieving around 17.13% accuracy. Although this result is lower than with simpler datasets, it reflects ResNet's high selectivity in feature extraction.
On CIFAR-100, which is more challenging due to its 100-class complexity, CNN’s accuracy peaks at 22.26% with max pooling, again showing a preference for structured feature selection. LeNet, on the other hand, struggles to maintain high accuracy across both pooling methods, achieving only around 13.5% accuracy on CIFAR-100, pointing to limitations in its simpler architecture for such complex data. AlexNet continues to perform relatively well on CIFAR-100, reaching its highest accuracy at around 30.89% with max pooling, while ResNet achieves its best result with max pooling, reaching around 17.13%. The analysis confirms that CNN benefits most from average pooling on MNIST, while ResNet performs best on max pooling across CIFAR datasets. This pattern underlines the importance of aligning pooling techniques with model architecture and dataset complexity for optimal performance.
The observed parameter trends in Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11 reveal the importance of the batch size, learning rate, and dropout rate adjustments, which significantly impact pooling performance across different architectures and datasets. Lower learning rates and moderate batch sizes generally enhanced model stability and accuracy, especially for complex datasets like CIFAR-100. Higher dropout rates tended to improve generalization in average pooling, which smooths feature representations, though they occasionally hindered accuracy in configurations where feature retention was critical, as in max pooling. These trends underscore that max pooling often benefits from moderate batch sizes and minimal dropout to retain strong activations, whereas average pooling performs optimally with larger batch sizes and higher dropout rates, especially for noisier datasets. It is important to note that while these trends offer insight, generalizing them across all applications is challenging, as parameter effects can vary significantly depending on dataset characteristics and task requirements. Adaptive parameter tuning based on dataset-specific needs could provide more reliable results in future studies. Overall, these findings suggest that selecting pooling methods based on dataset complexity and noise levels is crucial, particularly when applying CNNs to resource-constrained environments, where optimized configurations can greatly enhance performance and generalization.

3.9. Discussion

This research provided an in-depth analysis of various pooling methods in Convolutional Neural Networks (CNNs), specifically max, average, and min pooling, across datasets such as MNIST, CIFAR-10, and CIFAR-100. The results demonstrate that each method has unique strengths and limitations that make it suitable for different tasks. Max pooling consistently performed well in scenarios where preserving high-contrast features and robustness to noise were critical. The ability of max pooling to capture the most prominent features from the feature maps allowed it to excel in classification tasks with high-dimensionality input, such as the CIFAR-100 dataset. However, the technique's downside lies in its tendency to discard potentially useful information, which may explain its reduced performance when applied to datasets with smaller objects or more intricate details, where preserving all information is important. This is particularly relevant in applications such as small object detection, where discarding finer details may lead to poor localization accuracy, as noted in several studies on object detection.
In contrast, average pooling showed more balanced feature representation and performed well in complex datasets, such as CIFAR-10. By smoothing out noise and reducing the impact of any outlier values in the feature map, average pooling can generalize better in tasks where the input is noisy or where capturing the overall context is more important than highlighting specific high-contrast features. For instance, in semantic segmentation tasks, where each pixel's classification matters more than a focus on high-intensity regions, average pooling may outperform max pooling. However, the downside of this method is its inability to preserve sharp edges and fine details, which are crucial in high-precision tasks like medical image analysis or object detection.
Min pooling, though less commonly used, showed its utility in highly specialized tasks such as anomaly detection and background subtraction. By focusing on the least prominent features, min pooling can highlight anomalies or subtle differences in an image, which can be critical in applications like fraud detection or medical imaging, where identifying outliers or rare features is essential. However, the sensitivity of min pooling to noise limits its general applicability in mainstream image-classification tasks, where higher-contrast features dominate.
The results also underscore that no single pooling method universally outperforms others across all tasks, architectures, and datasets. The effectiveness of a pooling technique is highly dependent on the specific task and dataset characteristics. Max pooling might be preferable for tasks involving object detection or datasets with large, prominent features, whereas average pooling would be a better choice for more balanced, noisy datasets requiring more generalization, such as in semantic segmentation. Moreover, recent advancements in CNN architectures, such as hybrid pooling methods and adaptive pooling strategies, have sought to combine the strengths of multiple pooling operations. For example, adaptive pooling methods, which dynamically adjust the pooling strategy based on the input, have shown promise in enhancing performance by balancing feature preservation and computational efficiency. These approaches allow CNNs to adapt better to the varying complexities of different datasets, especially for tasks requiring fine-grained classification like medical imaging and high-precision applications. In practical terms, the choice of pooling method has important implications for resource-constrained environments. Max pooling offers high accuracy but at the cost of potentially discarding important information, while average pooling provides a more generalizable solution but with potential loss of detail. Min pooling is effective for specialized tasks but may not be suitable for broader applications. Future research could benefit from exploring adaptive pooling techniques that optimize the trade-offs between computational cost and feature retention, ensuring that the choice of pooling method aligns with specific task requirements.

4. Conclusions

Convolutional Neural Networks (CNNs) play a crucial role in computer vision, with pooling layers improving efficiency by reducing complexity while preserving key features. This study compared max and average pooling across CNN architectures (AlexNet, ResNet, and LeNet) using the MNIST, CIFAR-10, and CIFAR-100 datasets. Max pooling performed best in high-dimensionality tasks, and average pooling excelled in handling noisy data. Specifically, the practical recommendations based on the results are to use max pooling for resource-limited environments, and average pooling for tasks involving noisy datasets. While max pooling has demonstrated robust performance across the datasets in this study, average pooling may still play a valuable role in certain contexts, such as noisy data environments or tasks emphasizing spatial continuity. By exploring adaptive or hybrid pooling methods in future research, it may be possible to leverage the strengths of both max and average pooling to enhance model flexibility and performance across a broader range of applications. Future research should explore integrating pooling strategies with Vision Transformers (ViTs) to reduce computational overhead, and adaptive pooling methods could enhance performance in tasks like small object detection and fine-grained classification. Vision Transformers may revolutionize feature extraction by processing global image context without traditional pooling layers.

Author Contributions

Conceptualization, A.Z.; methodology, A.A. (Amerah Alabrah) and N.S.; software, S.Z.; investigation, M.N.; writing—original draft preparation, S.R. and M.S.; writing—review and editing, S.R. and A.A. (Amerah Alabrah); supervision, A.A. (Ali Arshad); funding acquisition, A.A. (Amerah Alabrah). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Researchers Supporting Project number RSP2024R476, King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
  2. Archana, R.; Jeevaraj, P.E. Deep learning models for digital image processing: A review. Artif. Intell. Rev. 2024, 57, 11. [Google Scholar] [CrossRef]
  3. Singh, S.; Gupta, A.; Katiyar, K. Neural modeling and neural computation in a medical approach. In Computational Techniques in Neuroscience; CRC Press: Boca Raton, FL, USA, 2023; pp. 19–41. [Google Scholar]
  4. Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
  5. Jiang, P.; Xue, Y.; Neri, F. Convolutional neural network pruning based on multi-objective feature map selection for image classification. Appl. Soft Comput. 2023, 139, 110229. [Google Scholar] [CrossRef]
  6. Valkenborg, D.; Rousseau, A.J.; Geubbelmans, M.; Burzykowski, T. Support vector machines. Am. J. Orthod. Dentofac. Orthop. 2023, 164, 754–757. [Google Scholar] [CrossRef]
  7. Zhang, Z. Introduction to machine learning: K-nearest neighbors. Ann. Transl. Med. 2019, 4, 218. [Google Scholar] [CrossRef]
  8. Zhao, T.; Xie, Y.; Wang, Y.; Cheng, J.; Guo, X.; Hu, B.; Chen, Y. A survey of deep learning on mobile devices: Applications, optimizations, challenges, and research opportunities. Proc. IEEE 2022, 110, 334–354. [Google Scholar] [CrossRef]
  9. De Oliveira, C.I.; do Nascimento, M.Z.; Roberto, G.F.; Tosta, T.A.; Martins, A.S.; Neves, L.A. Hybrid models for classifying histological images: An association of deep features by transfer learning with ensemble classifier. Multimed. Tools Appl. 2024, 83, 21929–21952. [Google Scholar] [CrossRef]
  10. Dogan, Y. A new global pooling method for deep neural networks: Global average of top-k max-pooling. Trait. Du Signal 2023, 40, 577–587. [Google Scholar] [CrossRef]
  11. Chen, Y.; Fang, J.; Zhang, X.; Miao, Y.; Lin, Y.; Tu, R.; Hu, L. Pool fire dynamics: Principles, models and recent advances. Prog. Energy Combust. Sci. 2023, 95, 101070. [Google Scholar] [CrossRef]
  12. Pan, X.; Xu, J.; Pan, Y.; Wen, L.; Lin, W.; Bai, K.; Fu, H.; Xu, Z. Afinet: Attentive feature integration networks for image classification. Neural Netw. 2022, 155, 360–368. [Google Scholar] [CrossRef] [PubMed]
  13. Zhao, L.; Zhang, Z. A improved pooling method for convolutional neural networks. Sci. Rep. 2024, 14, 1589. [Google Scholar] [CrossRef] [PubMed]
  14. Krichen, M. Convolutional neural networks: A survey. Computers 2023, 12, 151. [Google Scholar] [CrossRef]
  15. Matoba, K.; Dimitriadis, N.; Fleuret, F. Benefits of Max Pooling in Neural Networks: Theoretical and Experimental Evidence. In Transactions on Machine Learning Research; 2023. Available online: https://openreview.net/forum?id=YgeXqrH7gA (accessed on 15 September 2024).
  16. Qiu, Y.; Liu, Y.; Chen, Y.; Zhang, J.; Zhu, J.; Xu, J. A2SPPNet: Attentive atrous spatial pyramid pooling network for salient object detection. IEEE Trans. Multimed. 2022, 25, 1991–2006. [Google Scholar] [CrossRef]
  17. Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910. [Google Scholar] [CrossRef]
  18. Zhou, J.; Liang, Z.; Tan, Z.; Li, W.; Li, Q.; Ying, Z.; Zhai, Y.; He, Y.; Shen, Z. RVDNet: Rotated Vehicle Detection Network with Mixed Spatial Pyramid Pooling for Accurate Localization. In International Conference on Artificial Intelligence and Communication Technology; Springer Nature: Singapore, 2023; pp. 303–316. [Google Scholar]
  19. Özdemir, C. Avg-topk: A new pooling method for convolutional neural networks. Expert Syst. Appl. 2023, 223, 119892. [Google Scholar] [CrossRef]
  20. Tang, T.N.; Kim, K.; Sohn, K. Temporalmaxer: Maximize temporal context with only max pooling for temporal action localization. arXiv 2023, arXiv:2303.09055. [Google Scholar]
  21. Bianchi, F.M.; Lachi, V. The expressive power of pooling in graph neural networks. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar] [CrossRef]
  22. Zhu, X.; Meng, Q.; Ding, B.; Gu, L.; Yang, Y. Weighted pooling for image recognition of deep convolutional neural networks. Clust. Comput. 2019, 22, 9371–9383. [Google Scholar] [CrossRef]
  23. Stergiou, A.; Poppe, R.; Kalliatakis, G. Refining activation downsampling with SoftPool. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10357–10366. [Google Scholar]
  24. Walter, B. Analysis of convolutional neural network image classifiers in a hierarchical max-pooling model with additional local pooling. J. Stat. Plan. Inference 2023, 224, 109–126. [Google Scholar] [CrossRef]
  25. Chen, J.; Hu, H.; Wu, H.; Jiang, Y.; Wang, C. Learning the best pooling strategy for visual semantic embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15789–15798. [Google Scholar]
  26. Khairandish, M.O.; Sharma, M.; Jain, V.; Chatterjee, J.M.; Jhanjhi, N.Z. A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM 2022, 43, 290–299. [Google Scholar] [CrossRef]
  27. Ding, Y.; Yu, J.; Lv, Q.; Zhao, H.; Dong, J.; Li, Y. Multiview adaptive attention pooling for image–text retrieval. Knowl.-Based Syst. 2024, 291, 111550. [Google Scholar] [CrossRef]
  28. Li, H.; Cheng, Y.; Ni, H.; Zhang, D. Dual-path recommendation algorithm based on CNN and attention-enhanced LSTM. Cyber-Phys. Syst. 2024, 10, 247–262. [Google Scholar] [CrossRef]
  29. Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7436–7456. [Google Scholar] [CrossRef] [PubMed]
  30. Seng, L.M.; Chiang, B.B.C.; Salam, Z.A.A.; Tan, G.Y.; Chai, H.T. MNIST handwritten digit recognition with different CNN architectures. J. Appl. Technol. Innov 2021, 5, 7–10. [Google Scholar]
  31. Giuste, F.O.; Vizcarra, J.C. Cifar-10 image classification using feature ensembles. arXiv 2020, arXiv:2002.03846. [Google Scholar]
  32. Singla, S.; Singla, S.; Feizi, S. Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100. arXiv 2021, arXiv:2108.04062. [Google Scholar]
  33. Hopkins, W.G.; Rowlands, D.S. Standardization and other approaches to meta-analyze differences in means. Stat. Med. 2024, 43, 3092–3108. [Google Scholar] [CrossRef]
Figure 1. Standard CNN architecture.
Figure 2. Total number of publications on pooling techniques for CNNs in Scopus.
Figure 3. Illustration of max pooling operation.
Figure 4. Illustration of the average pooling operation.
Figure 5. Comparison of the statistical significance of p-values for max pooling and average pooling in CNN, LeNet, AlexNet, and ResNet across MNIST (b), CIFAR-10 (a), and CIFAR-100 (c) datasets.
Figure 6. Accuracy comparison against different parameters on MNIST, CIFAR 10, and CIFAR 100.
Table 1. Standard pooling methods along with strengths and weaknesses.

Max Pooling
  Description: Selects the maximum value within each pooling window.
  Strengths: Preserves strong features; robust to noise; improves generalization.
  Weaknesses: Can discard potentially useful information; may lead to overfitting in some cases.
  Common Use Cases: Image classification (e.g., MNIST and CIFAR-10); object detection; natural language processing.

Average Pooling
  Description: Calculates the average value within each pooling window.
  Strengths: Captures overall feature representation; smoothes out noise; less prone to overfitting.
  Weaknesses: May blur important features; less effective at preserving sharp edges.
  Common Use Cases: Image classification (e.g., ImageNet); semantic segmentation.

Min Pooling
  Description: Selects the minimum value within each pooling window.
  Strengths: Captures dark features or background information; useful in specific domains, like medical imaging.
  Weaknesses: Less common than max or average pooling; can be sensitive to noise.
  Common Use Cases: Medical image analysis; texture analysis.
Table 2. Proposed architecture of CNN.

Layer | Type | No. of Filters/Neurons | Filter/Kernel Size | Size of Feature Map | Activation Function
Input Layer | Image | - | - | 32 × 32 × 3 | -
First Layer | Convolution | 32 | 3 × 3 | 30 × 30 × 32 | relu
Second Layer | Pooling | 32 | 2 × 2 | 15 × 15 × 32 | -
Third Layer | Convolution | 64 | 3 × 3 | 13 × 13 × 64 | relu
Fourth Layer | Pooling | 64 | 2 × 2 | 6 × 6 × 64 | -
Fifth Layer | Convolution | 64 | 3 × 3 | 4 × 4 × 64 | relu
Sixth Layer | Pooling | 64 | 2 × 2 | 2 × 2 × 64 | -
Seventh Layer | Fully Connected | - | - | 64 | -
Output Layer | Fully Connected | - | - | No. of Classes | Softmax
Table 3. Proposed architecture of LeNet.

Layer | Type | No. of Filters/Neurons | Filter/Kernel Size | Stride | Size of Feature Map | Padding
Input Layer | Image | - | - | - | 32 × 32 × 1 | -
First Layer | Convolution | 6 | 5 × 5 | 1 | 28 × 28 × 6 | same
Second Layer | Pooling | 6 | 2 × 2 | 2 | 14 × 14 × 6 | valid
Third Layer | Convolution | 16 | 5 × 5 | 1 | 10 × 10 × 16 | valid
Fourth Layer | Pooling | 16 | 2 × 2 | 2 | 5 × 5 × 16 | valid
Fifth Layer | Convolution | 120 | 5 × 5 | 1 | 120 | valid
Sixth Layer | Fully Connected | - | - | - | 84 | -
Output Layer | Fully Connected | - | - | - | No. of Classes | -
Table 4. Proposed architecture of AlexNet.

Layer | Type | No. of Filters/Neurons | Filter/Kernel Size | Stride | Size of Feature Map | Padding | Activation Function
Input Layer | Image | 1 | - | - | 227 × 227 × 3 | - | relu
First Layer | Convolution | 96 | 11 × 11 | 4 | 55 × 55 × 96 | same | relu
            | Pooling | 96 | 3 × 3 | 2 | 27 × 27 × 96 | - | -
Second Layer | Convolution | 256 | 5 × 5 | 1 | 27 × 27 × 256 | same | relu
             | Pooling | 256 | 3 × 3 | 2 | 13 × 13 × 256 | - | -
Third Layer | Convolution | 384 | 3 × 3 | 1 | 13 × 13 × 384 | same | relu
Fourth Layer | Convolution | 384 | 3 × 3 | 1 | 13 × 13 × 384 | same | relu
Fifth Layer | Convolution | 256 | 3 × 3 | 1 | 13 × 13 × 256 | same | relu
            | Pooling | 256 | 3 × 3 | 2 | 6 × 6 × 256 | - | -
Sixth Layer | Fully Connected | - | - | - | 9216 | - | relu
Seventh Layer | Fully Connected | - | - | - | 4096 | - | relu
Eighth Layer | Fully Connected | - | - | - | 4096 | - | relu
Output Layer | Fully Connected | - | - | - | No. of Classes | - | Softmax
Table 5. Proposed architecture of ResNet.
Layer | Type | No. of Filters/Neurons | Filter/Kernel Size | Stride | Size of Feature Map | Padding | Activation Function
Input Layer | Image | - | - | - | 224 × 224 × 3 | - | -
First Layer | Convolution | 64 | 7 × 7 | 2 | 112 × 112 × 64 | same | relu
Second Layer | Pooling | 64 | 3 × 3 | 2 | 112 × 112 × 64 | - | -
Stage 1 (Third–Eleventh Layer) | Convolution | 64 | 1 × 1 | 1 | 56 × 56 × 64 | valid | relu
                               | Convolution | 64 | 3 × 3 | 1 | | same | relu
                               | Convolution | 256 | 1 × 1 | 1 | | valid | -
Stage 2 (Twelfth–Twenty-Third Layer) | Convolution | 128 | 1 × 1 | 1 | 56 × 56 × 256 | valid | relu
                                     | Convolution | 128 | 3 × 3 | 1 | | same | relu
                                     | Convolution | 512 | 1 × 1 | 1 | | valid | -
Stage 3 (Twenty-Fourth–Forty-First Layer) | Convolution | 256 | 1 × 1 | 1 | 28 × 28 × 512 | valid | relu
                                          | Convolution | 256 | 3 × 3 | 1 | | same | relu
                                          | Convolution | 1024 | 1 × 1 | 1 | | valid | -
Stage 4 (Forty-Second–Fiftieth Layer) | Convolution | 512 | 1 × 1 | 1 | 14 × 14 × 1024 | valid | relu
                                      | Convolution | 512 | 3 × 3 | 1 | | same | relu
                                      | Convolution | 2048 | 1 × 1 | 1 | | valid | -
Fifty-First Layer | Pooling | 2048 | 2 × 2 | - | 7 × 7 × 2048 | - | -
Output Layer | Fully Connected | - | - | - | No. of Classes | - | Softmax
Table 6. Accuracy of max pooling on standard MNIST dataset with different learning rates and batch sizes with comparative approaches.

MNIST, max pooling. For each model, values are grouped by batch size (32 | 64 | 128 | 256); within each group, the six columns correspond to dropout rates NO, 0.1, 0.2, 0.3, 0.4, and 0.5.

Learning Rate 0.01
CNN     | 97.78 97.45 95.97 93.94 94.13 93.58 | 96.04 96.62 94.11 95.70 96.14 92.78 | 96.98 96.78 96.32 97.57 96.83 96.78 | 97.52 96.02 96.25 95.31 96.23 96.15
LeNet   | 95.57 95.12 88.26 90.14 86.44 82.45 | 91.51 89.39 85.56 85.42 88.56 86.54 | 88.75 87.9 89.39 86.6 83.35 88.02 | 89.25 90.57 90.82 85.31 82.85 84.96
AlexNet | 98.58 97.92 96.72 97.59 97.93 96.65 | 98.88 98.41 97.11 96.70 98.54 96.53 | 97.97 98.01 98.25 98.25 97.28 97.14 | 98.44 98.41 98.53 98.69 97.32 98.35
ResNet  | 96.57 95.43 92.18 89.45 89.45 91.58 | 94.03 92.79 91.50 92.72 94.25 89.32 | 94.23 91.25 93.58 94.25 89.25 92.56 | 92.42 93.25 93.76 91.31 89.56 89.84

Learning Rate 0.001
CNN     | 97.59 96.59 98.12 98.79 96.40 97.98 | 95.28 96.59 97.52 96.66 95.56 97.86 | 96.87 95.78 96.70 97.42 96.46 97.05 | 96.53 96.69 97.69 97.42 97.18 95.30
LeNet   | 90.45 91.45 98.85 88.72 98.91 87.26 | 90.58 91.79 90.25 88.83 88.50 87.52 | 88.54 89.34 90.21 88.94 87.81 89.76 | 88.86 90.95 91.80 90.85 91.75 89.75
AlexNet | 98.66 98.47 98.69 97.54 98.56 95.28 | 98.76 98.96 98.40 98.66 98.46 98.84 | 97.25 98.88 98.20 98.42 97.24 97.95 | 98.55 98.37 98.26 98.56 98.95 97.73
ResNet  | 93.49 93.45 89.21 94.62 87.88 90.62 | 93.63 93.72 92.53 93.78 90.45 92.58 | 91.27 89.34 92.01 91.94 89.52 91.02 | 92.27 92.33 93.20 94.94 92.93 90.46

Learning Rate 0.0001
CNN     | 96.21 95.18 96.84 96.55 96.99 96.39 | 97.84 98.01 97.25 96.17 96.98 94.69 | 97.03 96.35 96.80 95.57 95.37 93.06 | 97.06 96.11 95.93 96.47 94.54 94.42
LeNet   | 90.55 89.54 89.38 86.42 85.30 86.04 | 88.36 87.54 85.71 88.16 85.99 86.81 | 86.06 84.02 83.95 80.85 82.58 81.28 | 82.33 82.15 84.07 83.97 84.71 87.54
AlexNet | 98.55 98.69 98.58 97.94 98.20 98.54 | 98.30 98.28 98.19 97.59 97.72 96.52 | 98.28 97.27 97.84 97.87 97.53 95.27 | 98.24 97.59 97.69 98.03 96.61 97.96
ResNet  | 94.09 90.16 91.07 89.01 90.66 91.42 | 93.78 90.45 93.71 91.76 93.42 92.39 | 93.06 94.47 93.66 88.73 92.68 88.77 | 91.72 91.37 90.63 93.93 90.03 92.10
Table 7. Accuracy of average pooling on standard MNIST dataset with different learning rates and batch sizes with comparative approaches.

MNIST, average pooling. For each model, values are grouped by batch size (32 | 64 | 128 | 256); within each group, the six columns correspond to dropout rates NO, 0.1, 0.2, 0.3, 0.4, and 0.5.

Learning Rate 0.01
CNN     | 98.24 98.28 97.96 96.87 96.75 96.33 | 98.36 98.46 98.05 98.16 97.29 96.43 | 98.66 98.45 98.64 98.33 97.78 96.84 | 98.62 98.82 98.79 98.53 98.17 97.41
LeNet   | 95.10 94.91 94.04 94.02 94.49 94.57 | 96.44 96.75 96.86 97.01 96.99 96.70 | 96.55 96.95 96.99 95.91 94.07 94.55 | 96.13 96.01 96.34 95.40 94.24 94.33
AlexNet | 97.50 97.31 96.62 95.77 95.89 94.85 | 97.00 97.48 97.39 96.87 95.40 94.07 | 97.26 97.10 97.03 96.50 95.70 95.09 | 97.20 97.52 97.56 96.55 95.93 95.26
ResNet  | 91.75 91.43 89.26 88.98 88.41 88.91 | 91.37 89.39 88.50 85.42 86.35 82.32 | 91.49 87.71 81.36 86.66 83.35 80.62 | 94.00 90.57 86.02 85.31 82.54 81.69

Learning Rate 0.001
CNN     | 98.51 98.82 98.68 98.61 98.44 97.84 | 98.66 98.55 98.65 98.41 98.19 97.91 | 98.46 98.36 98.58 98.18 97.80 97.38 | 97.90 97.97 98.03 97.74 97.50 96.78
LeNet   | 96.59 96.90 96.83 95.94 96.76 96.65 | 96.68 96.80 96.92 96.96 96.76 96.83 | 96.67 96.82 96.89 96.92 96.84 96.73 | 96.54 96.73 96.68 96.59 96.63 96.34
AlexNet | 97.64 97.27 97.59 96.71 97.50 97.74 | 97.80 97.29 97.45 97.82 97.58 97.63 | 97.57 97.76 97.63 97.50 97.80 97.68 | 97.66 97.06 97.81 97.54 97.56 97.50
ResNet  | 93.73 91.45 89.21 88.72 87.08 85.60 | 93.70 91.79 90.54 88.83 87.50 85.71 | 92.92 91.34 90.21 89.94 87.52 86.02 | 91.79 90.33 89.19 87.94 86.93 85.46

Learning Rate 0.0001
CNN     | 98.99 98.37 97.87 97.18 96.69 96.68 | 98.02 98.71 97.35 97.44 96.50 96.47 | 97.08 97.42 97.84 97.00 97.78 97.57 | 97.19 97.18 97.40 97.18 97.47 97.34
LeNet   | 96.19 96.14 95.83 96.75 95.60 94.27 | 95.57 96.43 95.14 95.92 95.63 94.77 | 95.55 95.18 95.68 95.42 95.83 95.20 | 95.01 94.77 94.19 93.60 92.96 92.23
AlexNet | 97.32 97.20 96.38 97.19 96.56 95.80 | 96.18 97.47 96.25 96.35 97.48 95.38 | 96.23 96.86 96.18 96.29 96.05 96.10 | 96.99 96.48 96.98 96.14 95.42 96.72
ResNet  | 89.32 88.16 87.07 86.01 84.66 83.42 | 87.73 86.54 85.71 84.76 83.42 82.39 | 85.43 88.47 83.66 82.72 81.68 80.77 | 81.98 81.36 80.63 79.93 79.02 78.10
Table 8. Accuracy of max pooling on standard CIFAR 10 dataset with different learning rates and batch sizes with comparative approaches.

CIFAR-10, max pooling. For each model, values are grouped by batch size (32 | 64 | 128 | 256); within each group, the six columns correspond to dropout rates NO, 0.1, 0.2, 0.3, 0.4, and 0.5.

Learning Rate 0.01
CNN     | 48.56 47.10 44.25 45.25 44.23 43.28 | 44.82 44.39 45.17 45.91 45.96 46.21 | 46.47 46.64 47.16 47.93 47.89 48.39 | 48.94 49.08 49.73 49.20 50.08 50.40
LeNet   | 32.85 34.81 33.89 31.75 31.86 31.15 | 30.54 28.16 27.52 27.89 27.60 28.79 | 28.98 28.19 28.40 28.82 29.20 29.72 | 30.60 30.89 30.85 31.54 31.54 32.41
AlexNet | 58.20 56.42 55.20 56.55 54.56 53.87 | 56.71 57.53 55.58 54.85 53.63 52.03 | 54.90 55.82 55.47 53.00 54.87 55.32 | 52.07 50.12 52.73 54.25 53.35 55.39
ResNet  | 40.26 41.25 40.75 41.57 40.58 41.26 | 39.54 36.09 36.47 37.75 36.73 38.95 | 38.21 37.00 37.25 36.95 37.09 38.60 | 39.00 36.03 35.50 36.49 37.29 38.86

Learning Rate 0.001
CNN     | 58.42 57.38 59.83 60.65 61.67 62.15 | 59.32 60.46 59.26 59.65 59.98 60.15 | 60.35 60.75 60.87 60.98 61.12 61.35 | 61.49 61.88 61.95 62.03 62.45 62.56
LeNet   | 32.76 34.15 34.25 34.86 33.58 34.98 | 35.05 35.26 35.48 35.54 35.75 35.85 | 36.02 36.47 36.89 36.93 37.25 37.45 | 37.38 37.48 37.98 38.02 38.19 38.95
AlexNet | 60.05 61.96 62.91 63.68 62.48 64.19 | 64.82 65.03 65.59 65.98 65.89 66.21 | 66.35 66.68 66.74 66.85 66.98 66.96 | 67.05 67.54 67.68 67.89 67.14 67.68
ResNet  | 39.98 40.25 40.96 41.26 41.69 42.03 | 42.29 42.74 42.89 42.98 43.08 43.25 | 43.55 43.75 43.98 44.05 44.35 44.75 | 44.86 45.12 45.25 45.68 45.83 46.02

Learning Rate 0.0001
CNN     | 56.23 57.02 57.86 58.21 58.65 58.98 | 59.02 59.45 59.68 59.97 60.01 60.24 | 60.75 60.87 61.25 61.46 61.78 62.05 | 62.45 62.89 63.19 63.48 64.35 64.26
LeNet   | 33.15 33.57 33.78 34.02 34.54 34.65 | 35.01 35.26 34.68 34.87 35.02 35.15 | 36.06 36.24 36.49 36.78 36.98 37.15 | 37.86 38.19 38.45 39.15 39.45 38.98
AlexNet | 61.28 61.85 62.35 62.45 62.96 63.31 | 63.75 63.98 64.19 64.76 64.58 64.89 | 65.06 65.18 65.46 65.75 65.84 65.96 | 67.10 67.21 67.45 67.85 67.89 68.02
ResNet  | 41.25 41.89 42.05 42.65 42.86 42.97 | 42.98 50.02 50.43 50.65 51.11 51.63 | 51.89 52.14 52.54 52.67 52.89 53.24 | 53.75 53.98 54.02 54.68 54.25 54.36
Table 9. Accuracy (%) of average pooling on the CIFAR-10 dataset across different learning rates, batch sizes, and dropout rates for the comparative models.
CIFAR-10 Average Pooling (batch sizes 32/64/128/256; within each batch-size group the six values correspond to dropout = none, 0.1, 0.2, 0.3, 0.4, 0.5)
Learning Rate 0.01
CNN      batch 32: 56.35, 56.29, 56.01, 56.95, 56.00, 56.54 | batch 64: 56.92, 54.86, 56.09, 56.23, 54.45, 49.00 | batch 128: 58.55, 50.43, 44.59, 47.68, 43.05, 41.60 | batch 256: 58.53, 54.49, 48.99, 49.70, 41.35, 36.06
LeNet    batch 32: 44.58, 46.33, 39.54, 42.47, 34.27, 41.90 | batch 64: 49.71, 31.46, 30.14, 27.41, 27.19, 31.19 | batch 128: 36.26, 36.36, 35.03, 32.61, 31.82, 29.69 | batch 256: 40.22, 39.84, 39.30, 37.75, 35.75, 33.56
AlexNet  batch 32: 54.65, 50.19, 42.32, 48.82, 47.06, 46.84 | batch 64: 55.62, 45.83, 47.41, 44.35, 43.21, 40.54 | batch 128: 49.97, 47.07, 43.44, 44.32, 40.95, 42.17 | batch 256: 49.94, 30.31, 35.74, 41.95, 38.24, 36.59
ResNet   batch 32: 40.26, 32.00, 27.75, 28.57, 28.15, 23.34 | batch 64: 30.54, 26.09, 21.74, 21.75, 20.73, 21.95 | batch 128: 35.21, 28.00, 22.90, 23.95, 23.09, 20.60 | batch 256: 30.00, 26.08, 24.50, 24.49, 24.29, 23.86
Learning Rate 0.001
CNN      batch 32: 70.28, 70.08, 68.44, 68.48, 65.73, 59.65 | batch 64: 67.89, 68.63, 68.05, 66.68, 63.04, 60.25 | batch 128: 66.12, 63.83, 65.76, 63.57, 60.85, 57.05 | batch 256: 69.90, 61.00, 60.72, 59.71, 57.50, 55.37
LeNet    batch 32: 41.86, 41.49, 40.03, 37.76, 35.35, 31.95 | batch 64: 41.99, 40.85, 37.81, 34.83, 31.44, 30.44 | batch 128: 40.63, 38.32, 34.65, 30.98, 30.85, 30.13 | batch 256: 38.69, 34.61, 31.49, 30.62, 30.39, 30.04
AlexNet  batch 32: 53.96, 48.28, 43.79, 48.06, 44.09, 49.10 | batch 64: 54.75, 50.98, 56.16, 40.21, 46.14, 41.83 | batch 128: 52.17, 48.64, 48.02, 48.93, 49.08, 47.54 | batch 256: 50.94, 43.41, 45.02, 40.14, 47.18, 44.74
ResNet   batch 32: 39.28, 34.77, 32.67, 31.27, 29.99, 27.37 | batch 64: 38.83, 34.70, 33.28, 32.28, 30.57, 28.77 | batch 128: 36.77, 34.37, 32.52, 31.40, 29.73, 28.73 | batch 256: 36.56, 35.00, 33.85, 32.88, 32.17, 31.28
Learning Rate 0.0001
CNN      batch 32: 64.04, 64.05, 63.10, 61.83, 60.32, 58.44 | batch 64: 61.46, 60.34, 59.79, 58.77, 57.36, 55.53 | batch 128: 58.85, 58.53, 57.54, 56.63, 55.64, 53.35 | batch 256: 56.80, 55.50, 54.87, 54.18, 52.82, 51.01
LeNet    batch 32: 31.44, 30.14, 29.49, 29.32, 29.26, 29.06 | batch 64: 30.48, 29.49, 29.31, 29.24, 28.99, 28.77 | batch 128: 29.42, 29.08, 28.78, 28.62, 28.76, 28.40 | batch 256: 28.34, 28.29, 28.21, 28.00, 27.88, 27.82
AlexNet  batch 32: 52.21, 53.23, 52.01, 58.92, 54.20, 48.83 | batch 64: 57.41, 59.58, 55.90, 51.60, 51.76, 46.34 | batch 128: 46.83, 42.49, 45.29, 48.27, 46.63, 43.68 | batch 256: 48.63, 56.63, 43.78, 47.74, 47.29, 42.86
ResNet   batch 32: 33.94, 33.23, 32.50, 32.04, 31.29, 30.58 | batch 64: 33.11, 32.44, 31.84, 31.37, 31.13, 30.66 | batch 128: 31.44, 31.17, 32.14, 30.43, 30.30, 29.59 | batch 256: 28.39, 28.16, 27.55, 27.10, 26.80, 26.47
Table 10. Accuracy (%) of max pooling on the standard CIFAR-100 dataset across different learning rates, batch sizes, and dropout rates for the comparative models.
CIFAR-100 Max Pooling (batch sizes 32/64/128/256; within each batch-size group the six values correspond to dropout = none, 0.1, 0.2, 0.3, 0.4, 0.5)
Learning Rate 0.01
CNN      batch 32: 12.04, 11.50, 11.89, 12.00, 12.05, 12.85 | batch 64: 12.45, 12.50, 13.68, 14.01, 14.35, 14.88 | batch 128: 15.54, 15.96, 16.00, 16.32, 16.45, 17.00 | batch 256: 18.29, 18.84, 19.28, 20.01, 20.58, 22.26
LeNet    batch 32: 8.00, 9.09, 9.45, 10.00, 10.45, 10.68 | batch 64: 10.25, 11.01, 11.01, 11.25, 11.28, 11.85 | batch 128: 11.63, 11.98, 12.01, 12.24, 12.35, 12.68 | batch 256: 12.75, 12.88, 13.01, 13.24, 13.28, 13.50
AlexNet  batch 32: 18.30, 17.26, 18.92, 18.00, 18.56, 18.68 | batch 64: 17.86, 18.10, 18.56, 18.87, 19.02, 19.35 | batch 128: 20.8, 23.84, 24.52, 24.85, 25.52, 25.88 | batch 256: 26.32, 26.89, 28.01, 28.84, 29.56, 30.52
ResNet   batch 32: 11.26, 10.70, 11.25, 11.75, 11.85, 12.00 | batch 64: 12.05, 12.85, 12.95, 12.98, 12.08, 12.67 | batch 128: 12.98, 13.02, 13.43, 13.86, 13.98, 14.00 | batch 256: 14.02, 14.56, 14.78, 15.06, 15.46, 16.85
Learning Rate 0.001
CNN      batch 32: 12.05, 12.39, 12.85, 13.01, 13.52, 13.78 | batch 64: 13.99, 13.63, 14.02, 14.79, 14.95, 14.70 | batch 128: 15.78, 15.91, 15.96, 16.48, 16.46, 16.65 | batch 256: 16.50, 17.39, 17.78, 18.78, 90.38, 20.51
LeNet    batch 32: 8.59, 8.76, 8.98, 8.96, 9.15, 9.76 | batch 64: 9.89, 10.01, 10.05, 9.89, 10.76, 10.84 | batch 128: 10.99, 11.00, 11.05, 11.42, 11.65, 11.85 | batch 256: 12.01, 12.25, 12.36, 12.88, 13.01, 13.95
AlexNet  batch 32: 20.25, 20.58, 20.54, 21.25, 21.08, 21.54 | batch 64: 21.56, 23.65, 23.72, 23.85, 24.02, 24.31 | batch 128: 24.08, 24.56, 24.85, 24.92, 25.00, 25.06 | batch 256: 25.96, 26.74, 27.85, 28.05, 29.25, 30.89
ResNet   batch 32: 11.02, 11.58, 11.65, 11.98, 12.25, 12.35 | batch 64: 12.98, 13.02, 13.36, 13.49, 13.95, 13.86 | batch 128: 14.02, 14.09, 14.20, 14.39, 14.56, 13.02 | batch 256: 13.25, 13.98, 14.02, 14.68, 14.96, 15.09
Learning Rate 0.0001
CNN      batch 32: 11.05, 11.59, 12.65, 12.98, 13.08, 13.15 | batch 64: 13.85, 14.45, 14.59, 14.99, 15.21, 15.36 | batch 128: 15.97, 15.94, 16.01, 16.42, 16.85, 16.78 | batch 256: 16.97, 17.05, 17.64, 17.88, 18.42, 18.96
LeNet    batch 32: 9.02, 9.32, 9.67, 9.98, 10.02, 10.25 | batch 64: 10.76, 10.68, 10.98, 11.09, 11.35, 11.68 | batch 128: 11.99, 11.97, 12.08, 12.33, 12.65, 13.02 | batch 256: 13.35, 13.98, 13.88, 13.89, 14.25, 14.65
AlexNet  batch 32: 19.36, 19.95, 20.05, 20.65, 20.98, 20.85 | batch 64: 21.09, 21.85, 21.96, 22.15, 22.85, 22.46 | batch 128: 22.99, 23.32, 23.45, 23.98, 24.00, 24.25 | batch 256: 34.85, 25.98, 26.35, 27.65, 28.97, 29.56
ResNet   batch 32: 10.25, 11.26, 11.75, 12.25, 12.75, 12.98 | batch 64: 13.00, 13.25, 13.75, 13.67, 13.98, 14.20 | batch 128: 14.65, 14.78, 15.02, 15.32, 15.67, 15.94 | batch 256: 16.03, 16.45, 16.62, 16.84, 16.98, 17.13
Table 11. Accuracy (%) of average pooling on the CIFAR-100 dataset across different learning rates, batch sizes, and dropout rates for the comparative models.
CIFAR-100 Average Pooling (batch sizes 32/64/128/256; within each batch-size group the six values correspond to dropout = none, 0.1, 0.2, 0.3, 0.4, 0.5)
Learning Rate 0.01
CNN      batch 32: 14.04, 15.00, 16.47, 16.00, 15.00, 14.00 | batch 64: 15.04, 14.00, 15.47, 15.00, 15.25, 14.37 | batch 128: 17.93, 16.95, 16.36, 14.96, 15.51, 14.26 | batch 256: 22.77, 26.86, 22.00, 22.42, 22.23, 22.00
LeNet    batch 32: 10.00, 9.00, 9.25, 9.25, 10.45, 10.25 | batch 64: 10.00, 9.00, 9.25, 9.25, 10.45, 10.25 | batch 128: 11.23, 9.72, 6.66, 11.00, 11.59, 11.00 | batch 256: 16.82, 12.05, 9.88, 11.46, 9.78, 8.00
AlexNet  batch 32: 13.30, 13.26, 13.02, 13.00, 12.00, 13.24 | batch 64: 13.69, 13.38, 13.78, 13.29, 12.78, 13.89 | batch 128: 14.00, 13.19, 13.01, 13.00, 12.90, 12.39 | batch 256: 20.22, 21.48, 21.35, 21.25, 20.57, 21.40
ResNet   batch 32: 8.26, 6.70, 6.66, 5.28, 3.62, 3.45 | batch 64: 9.00, 5.47, 5.69, 4.76, 5.04, 4.23 | batch 128: 8.45, 6.92, 4.69, 5.85, 5.12, 3.92 | batch 256: 9.56, 8.18, 7.61, 6.62, 6.26, 4.49
Learning Rate 0.001
CNN      batch 32: 31.57, 31.58, 31.19, 30.36, 30.59, 30.60 | batch 64: 29.99, 30.63, 29.02, 26.79, 24.05, 23.70 | batch 128: 30.78, 30.11, 27.36, 25.48, 23.36, 20.45 | batch 256: 29.22, 28.19, 29.47, 29.35, 28.18, 28.51
LeNet    batch 32: 23.84, 24.34, 22.96, 21.59, 20.45, 18.99 | batch 64: 20.96, 20.09, 20.83, 20.99, 20.23, 18.41 | batch 128: 24.33, 23.77, 22.84, 21.66, 19.76, 18.14 | batch 256: 23.59, 22.66, 22.21, 20.64, 19.61, 17.79
AlexNet  batch 32: 25.19, 25.98, 25.41, 25.16, 24.68, 24.88 | batch 64: 21.17, 21.62, 21.41, 21.90, 21.49, 21.15 | batch 128: 27.49, 27.34, 26.76, 24.84, 22.43, 19.27 | batch 256: 25.79, 25.78, 23.92, 23.09, 22.24, 20.69
ResNet   batch 32: 10.72, 9.52, 8.59, 7.65, 7.15, 6.21 | batch 64: 11.21, 10.24, 9.32, 8.85, 8.01, 7.08 | batch 128: 10.77, 9.92, 9.00, 8.54, 8.06, 7.55 | batch 256: 10.24, 9.48, 8.65, 8.12, 7.77, 7.04
Learning Rate 0.0001
CNN      batch 32: 38.86, 37.58, 36.05, 33.95, 33.47, 31.79 | batch 64: 35.08, 35.11, 35.39, 32.71, 31.13, 30.40 | batch 128: 35.37, 33.48, 32.59, 31.03, 39.72, 38.94 | batch 256: 32.89, 32.36, 31.48, 30.66, 38.75, 36.37
LeNet    batch 32: 19.02, 19.31, 18.01, 17.62, 17.18, 15.87 | batch 64: 17.86, 17.51, 17.22, 16.45, 15.91, 14.52 | batch 128: 16.08, 15.85, 15.58, 14.86, 14.35, 13.14 | batch 256: 14.43, 14.95, 13.58, 13.82, 12.71, 11.88
AlexNet  batch 32: 26.95, 27.96, 29.61, 29.03, 26.54, 23.82 | batch 64: 27.59, 30.22, 21.08, 29.56, 25.46, 25.27 | batch 128: 29.93, 25.42, 26.65, 24.14, 25.29, 22.31 | batch 256: 29.55, 28.98, 23.88, 21.75, 22.94, 26.06
ResNet   batch 32: 8.18, 7.89, 7.98, 7.52, 7.44, 6.78 | batch 64: 7.50, 7.55, 7.23, 6.83, 6.65, 6.59 | batch 128: 6.87, 6.84, 6.68, 6.25, 6.25, 6.04 | batch 256: 6.08, 5.71, 5.85, 6.06, 5.66, 5.54
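The accuracy tables above share one experimental grid: two pooling variants, three learning rates (0.01, 0.001, 0.0001), four batch sizes (32 to 256), and six dropout settings (none to 0.5) per model and dataset. A minimal sketch of such a sweep, reusing the hypothetical build_model helper from the earlier sketch, is shown below; the epoch count is an arbitrary placeholder and this is not the authors' experiment script.

import tensorflow as tf

# Load and normalise MNIST; CIFAR-10/100 would follow the same pattern with
# tf.keras.datasets.cifar10 / cifar100.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0   # add channel axis, scale to [0, 1]
x_test = x_test[..., None].astype("float32") / 255.0

results = {}
for pooling in ("max", "average"):
    for lr in (0.01, 0.001, 0.0001):
        for batch_size in (32, 64, 128, 256):
            for dropout in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):   # 0.0 = "none" in the tables
                model = build_model(pooling=pooling, dropout_rate=dropout)  # from the sketch above
                model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                              loss="sparse_categorical_crossentropy",
                              metrics=["accuracy"])
                model.fit(x_train, y_train, epochs=10,        # placeholder training budget
                          batch_size=batch_size, verbose=0)
                _, acc = model.evaluate(x_test, y_test, verbose=0)
                results[(pooling, lr, batch_size, dropout)] = 100.0 * acc   # percent, as tabulated

Each entry of results then corresponds to one cell of the accuracy tables for the given pooling variant, learning rate, batch size, and dropout rate.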
Table 12. Comparative analysis of p-values on the standard datasets.
Model, learning rate: p-value (max pooling), p-value (average pooling)
MNIST
CNN, 0.01: 0.3961, 0.1240
CNN, 0.001: 0.6271, 0.0075
CNN, 0.0001: 0.0446, 0.0003
LeNet, 0.01: 6.3906 × 10^−9, 1.8355 × 10^−12
LeNet, 0.001: 0.0473, 0.0082
LeNet, 0.0001: 2.9744 × 10^−8, 1.5924 × 10^−7
AlexNet, 0.01: 0.2516, 0.8749
AlexNet, 0.001: 0.5902, 0.5471
AlexNet, 0.0001: 0.2287, 0.0172
ResNet, 0.01: 0.9932, 0.9822
ResNet, 0.001: 0.8242, 0.8891
ResNet, 0.0001: 0.1828, 0.0005
CIFAR-10
CNN, 0.01: 0.0042, 0.0001
CNN, 0.001: 0.5674, 0.0490
CNN, 0.0001: 0.8008, 3.1530 × 10^−5
LeNet, 0.01: 5.3257 × 10^−11, 1.2731 × 10^−7
LeNet, 0.001: 5.1826 × 10^−9, 0.1558
LeNet, 0.0001: 5.0826 × 10^−13, 0.0004
AlexNet, 0.01: 0.2337, 0.4183
AlexNet, 0.001: 0.3398, 0.9431
AlexNet, 0.0001: 0.0398, 0.0719
ResNet, 0.01: 0.1309, 0.1332
ResNet, 0.001: 0.8909, 0.8903
ResNet, 0.0001: 1.0087 × 10^−7, 1.1491 × 10^−7
CIFAR-100
CNN, 0.01: 0.2007, 0.2958
CNN, 0.001: 0.9772, 0.2828
CNN, 0.0001: 0.0977, 0.0166
LeNet, 0.01: 0.1201, 0.0002
LeNet, 0.001: 0.0925, 0.8823
LeNet, 0.0001: 1.0136 × 10^−5, 2.5128 × 10^−5
AlexNet, 0.01: 0.3937, 0.0322
AlexNet, 0.001: 1.0139 × 10^−5, 0.0003
AlexNet, 0.0001: 0.5415, 0.2337
ResNet, 0.01: 0.3217, 0.4312
ResNet, 0.001: 0.7516, 0.7295
ResNet, 0.0001: 0.1775, 8.0274 × 10^−7
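Table 12 reports one p-value per model, learning rate, and pooling variant, but this excerpt does not restate how those p-values were computed. Purely to illustrate the mechanics (an assumption, not the authors' procedure), a paired t-test between two accuracy series measured over the same hyperparameter settings can be obtained with scipy.stats.ttest_rel; the numbers below are hypothetical.

from scipy.stats import ttest_rel

# Hypothetical accuracy series for one model and learning rate, measured over the
# same (batch size, dropout) settings; real values would come from a results grid
# such as the sweep sketched above.
accuracies_a = [97.6, 96.6, 98.1, 98.8, 96.4, 98.0]
accuracies_b = [98.5, 98.8, 98.7, 98.6, 98.4, 97.8]

# ttest_rel pairs the observations position by position, which is appropriate when
# both series were obtained under identical hyperparameter settings.
t_stat, p_value = ttest_rel(accuracies_a, accuracies_b)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}")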