LBCNIN: Local Binary Convolution Network with Intra-Class Normalization for Texture Recognition with Applications in Tactile Internet
Abstract
1. Introduction
- Introducing a novel architecture (LBCNIN) specifically tailored to texture recognition in TI with multi-view images of the same object. The architecture encodes features from a backbone without requiring fine-tuning, utilizing a non-trainable local binary convolution layer inspired by LBP.
- Demonstrating the performance of LBCNIN across various texture benchmarks in comparison with state-of-the-art methods. The evaluation assumes the availability of multiple same-class samples, as required by the intra-class normalization technique. Additionally, an ablation study on the different components of the architecture (BatchNorm2d, activation functions, LBC layer, and GAP) highlights their respective impacts on accuracy, providing insights into the architecture’s effectiveness.
- Conducting an extensive evaluation of different backbone architectures (MobileNet V2 1.4, ResNet18, ResNet50, and ConvNeXt-XL) for texture recognition across diverse texture datasets. The evaluation encompasses an ablation study on the components of the proposed architecture and an analysis of different activation functions in the proposed feature encoding scheme. Moreover, 2D t-SNE and GradCAM visualizations provide insights into the model’s behavior, confusion cases highlight its limitations, and CPU and GPU computation times are compared across backbones.
2. Related Work
3. Proposed Method
3.1. LBCNIN Architecture
- MobileNet V2 1.4 [36];
- ResNet18 [37];
- ResNet50 [37] and ConvNeXt-XL pre-trained on ImageNet-21K [38].
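All backbones are used frozen, with pre-trained weights obtained via the timm library [41]. As a minimal sketch of the feature extraction step (the timm model identifiers below are illustrative assumptions rather than the exact checkpoints used in the paper):

```python
import timm
import torch

# Illustrative timm identifiers; the exact pre-trained checkpoints are the
# ones referenced in the paper, e.g. resnet50.ram_in1k for ResNet50 [43].
backbone = timm.create_model(
    "mobilenetv2_140",   # or "resnet18", "resnet50", "convnext_xlarge.fb_in22k"
    pretrained=True,
    num_classes=0,       # remove the classification head
    global_pool="",      # keep the spatial feature map (B, C, H, W)
)
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))  # e.g. (1, 1792, 7, 7) for MobileNet V2 1.4
```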
- Sigmoid Activation: The feature tensor undergoes normalization and the introduction of non-linearity through a Sigmoid function, $\sigma(x) = \frac{1}{1 + e^{-x}}$, ensuring all features are scaled between 0 and 1, which stabilizes the feature values for subsequent processing.
- LBC Layer [17]: The LBC layer introduces sparsity by generating binary random weights (−1 or 1) through a Bernoulli distribution with a probability of 0.5. Additionally, a binary mask (0 or 1) is created from a uniform distribution. The binary mask is applied element-wise to the weights, $w = b \odot m$ with $b_{ij} \in \{-1, 1\}$ and $m_{ij} \in \{0, 1\}$, resulting in final weight values of −1, 0, or 1.
- Batch Normalization: The convolved feature tensor is normalized using BatchNorm2d, which standardizes the data and ensures a consistent feature distribution among samples of the same class: $\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$, where $\mu_B$ and $\sigma_B^2$ are the per-channel batch mean and variance. The BatchNorm2d layer also serves as an intra-class normalization technique, which can significantly improve performance by ensuring that the input to the activation functions is normalized, thus reducing internal covariate shift [39]. This normalization is beneficial within batches of images from the same class, as it helps the architecture learn more efficiently. Additional testing determined that the optimal batch size is 32; this size is also chosen to ensure the stability and consistency of the normalization process during inference.
- Second Sigmoid Activation: A Sigmoid activation is applied to the convolved, normalized feature tensor to further refine the features and intensify the non-linear transformation. The effectiveness of these activation functions is examined further in Section 4.3 and Section 4.4.
- Feature Summation: The initial Sigmoid-transformed features and the processed features are summed, retaining both the original and the enhanced information. This combination yields a more comprehensive feature representation.
- GAP: Finally, a GAP layer is employed to reduce the dimensionality of the combined features. GAP summarizes each encoded feature map into a single value by averaging, $z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{c,i,j}$, effectively capturing the most important texture features while reducing the computational load.
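Taken together, the encoding head can be reconstructed as a short PyTorch module. The sketch below follows the step order described above; it is a reconstruction, not the authors' code, and the 3×3 kernel, channel-preserving convolution, and 0.5 mask sparsity are assumptions not restated in this section:

```python
import torch
import torch.nn as nn

class LBCNINEncoder(nn.Module):
    """Sketch of the LBCNIN encoding head (reconstruction from the text)."""

    def __init__(self, channels: int, sparsity: float = 0.5):
        super().__init__()
        # Non-trainable LBC layer: signs are -1/+1 drawn from Bernoulli(0.5);
        # a uniform binary mask zeroes entries, so weights end up in {-1, 0, 1}.
        lbc = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        signs = torch.bernoulli(torch.full_like(lbc.weight, 0.5)) * 2 - 1
        mask = (torch.rand_like(lbc.weight) < sparsity).float()
        lbc.weight.data = signs * mask
        lbc.weight.requires_grad_(False)
        self.lbc = lbc
        # track_running_stats=False keeps batch statistics at inference, so a
        # batch of same-class images realizes the intra-class normalization.
        self.bn = nn.BatchNorm2d(channels, affine=False, track_running_stats=False)
        self.gap = nn.AdaptiveAvgPool2d(1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        s = torch.sigmoid(feats)                 # first Sigmoid: scale to (0, 1)
        x = torch.sigmoid(self.bn(self.lbc(s)))  # LBC -> BatchNorm2d -> second Sigmoid
        return self.gap(s + x).flatten(1)        # summation, then GAP -> (B, C)
```

At inference, batches of 32 images of the same class (the batch size found optimal above) would be passed through the frozen backbone and this head, and the resulting descriptors handed to the downstream classifier.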
3.2. Implementation Details
4. Experiments
4.1. Datasets and Experimental Protocols
- Describable Texture Dataset (DTD) [13]: This dataset consists of 5640 images categorized into 47 distinct texture classes. It is evaluated using the 10 train/test splits provided with the dataset, ensuring robust performance metrics across different subsets. The same evaluation protocol is used in [30,31].
- KTH-TIPS2-b [8] (KTH-2-b): Featuring 4752 images from 11 different material categories, this dataset employs a fixed set of 4 splits for 4-fold cross-validation. This setup allows the model to be trained and tested on diverse subsets, promoting generalized learning. The same splitting scheme is also used in [34].
- Ground Terrain in Outdoor Scenes (GTOS) [44]: This extensive dataset includes 34,105 images representing 40 outdoor ground material classes. It utilizes a fixed set of 5 train/test splits, offering a robust evaluation framework for models designed to classify ground terrain textures. The same evaluation splits are used in [10,11,12,26,30,32,33].
- GTOS-Mobile [27]: Comprising 100,011 images captured via a mobile phone, this dataset features 31 different outdoor ground material categories. It is divided into a single train/test split, reflecting real-world mobile data collection scenarios. The same evaluation protocols are utilized in [10,11,12,32,33].
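For all four datasets, accuracy is reported as the mean over the fixed splits (with a standard deviation where multiple splits exist). A minimal sketch of this aggregation, where `fit_predict` is a hypothetical stand-in for training and applying the downstream classifier:

```python
import numpy as np

def evaluate_over_splits(features, labels, splits, fit_predict):
    """Mean/std accuracy over fixed train/test splits, e.g. the 10 DTD splits
    or the 4 KTH-2-b folds. `splits` yields (train_idx, test_idx) pairs."""
    accs = []
    for train_idx, test_idx in splits:
        preds = fit_predict(features[train_idx], labels[train_idx],
                            features[test_idx])
        accs.append(np.mean(preds == labels[test_idx]))
    return float(np.mean(accs)), float(np.std(accs))  # reported as mean ± std
```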
4.2. Comparison with State-of-the-Art Methods
4.3. Ablation Study on Different Architecture Components
4.4. Evaluation of Different Activation Functions
4.5. Assessing the Stability of LBCNIN Accuracy with Different Random Masks of LBC
4.6. Detailed Results
4.7. Timing Performance
4.8. Limitations of the Presented Work
- Necessity of high computational resources for larger datasets: The method demonstrates very good results on the DTD and KTH-2-b datasets using the MobileNet V2 1.4 backbone, which is efficient in terms of computational resources. However, achieving the best results on the GTOS and GTOS-Mobile datasets required the computationally intensive ConvNeXt-XL backbone. Additionally, the results on GTOS and GTOS-Mobile are less favorable than those of other methods even when using the ResNet18 and ResNet50 backbones. This reliance on high-resource backbones for certain datasets may limit the method’s applicability in environments where computational resources are constrained.
- Confusion in similar textures: As noted in Section 4.6, LBCNIN has difficulty distinguishing visually similar textures, particularly in the GTOS-Mobile dataset. This indicates that the model may struggle with textures that differ only subtly, which could impact its performance in applications requiring fine-grained texture discrimination. To address these two limitations, optimizing the LBCNIN architecture without significantly increasing computational demands is crucial. This may involve refining the feature extraction process, introducing novel layers, or aggregating information from intermediate blocks of the backbone network, which can capture hierarchical features and nuances effectively. Additionally, integrating data augmentation during training can enhance generalization and robustness across diverse datasets; in particular, augmenting the training data with more diverse examples of similar textures, or creating variations of the existing data, can help tackle the confusion between similar textures.
- Impact of noise on image texture recognition: The study did not investigate the impact of different types of noise on the experimental results. Future research should explore how various noise types affect the performance of texture recognition models like LBCNIN. This includes considering methods for image denoising using CNNs [49] or Gaussian noise enhanced training as in [50].
- Dataset variability: While the LBCNIN method shows strong performance on two benchmark datasets, its effectiveness might not generalize to texture datasets not included in the study. The chosen datasets (DTD, KTH-2-b, GTOS, and GTOS-Mobile) represent a variety of textures, but real-world TI applications may involve textures with different characteristics that were not tested. To address this limitation, it would be beneficial to conduct experiments on additional texture datasets to assess the generalization performance of the LBCNIN method across a wider range of textures, providing insight into the model’s robustness and effectiveness in handling texture characteristics not covered by the current study.
- Imbalanced datasets: Many texture recognition datasets suffer from imbalanced class distributions, where some classes have significantly fewer samples than others. This imbalance can adversely affect the accuracy of models like LBCNIN, particularly for classes with fewer training samples. Future research should explore techniques to mitigate the impact of imbalanced datasets, such as class re-weighting during training, data augmentation strategies that focus on minority classes, or advanced sampling techniques like oversampling and undersampling. These approaches could improve the robustness and generalization of texture recognition models across datasets with varying class distributions.
- Requirement for suitably trained backbones: The performance of LBCNIN relies heavily on the quality and diversity of the features extracted by the pre-trained backbones. If these backbones are not adequately pre-trained, or if they do not capture relevant texture features, the overall performance of LBCNIN may suffer. To mitigate this limitation, it would be beneficial to explore the performance of LBCNIN with a wider range of backbone architectures. Experimenting with different pre-trained backbones, including those trained on diverse datasets or specialized for texture recognition tasks, could provide insights into the robustness and adaptability of the LBCNIN method.
- Requirement for several samples from the same class during the inference phase: The proposed architecture incorporates intra-class normalization, which operates under the assumption that multiple images of the same texture class are available at test time. This is crucial for the method to effectively normalize and learn the distinct features of each texture class. However, if the classification task involves only a single image, LBCNIN cannot apply the normalization as intended, potentially resulting in suboptimal performance.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
BoVW | Bag-of-Visual-Words |
CLASSNet | Cross-Layer Aggregation of a Statistical Self-similarity Network |
CNNs | Convolutional Neural Networks |
DeepTEN | Deep Texture Encoding Network |
DTPNet | Deep Tracing Pattern encoding Network |
DL | Deep Learning |
LBCNIN | Local Binary Convolution Network with Intra-class Normalization |
DNNs | Deep Neural Networks |
ELU | Exponential Linear Unit |
FC | Fully Connected |
FV | Fisher Vector |
GAP | Global Average Pooling |
KNN | K-Nearest Neighbors |
LBC | Local Binary Convolution |
LBCNNs | Local Binary Convolutional Networks |
LBP | Local Binary Patterns |
LDA | Linear Discriminant Analysis |
MPAP | Multiple Primitives and Attributes Perception |
MSBFEN | Multi-Scale Boosting Feature Encoding Network |
RADAM | Random encoding of Aggregated Deep Activation Maps |
RAE | Randomized Autoencoder |
ReLU | Rectified Linear Unit |
RIFT | Rotation Invariant Feature Transform |
RPNet | Residual Pooling Network |
SIFT | Scale Invariant Feature Transform |
SVM | Support Vector Machine |
TI | Tactile Internet |
VLAD | Vector of Locally Aggregated Descriptor |
VR | Virtual Reality |
References
- Agarwal, M.; Singhal, A.; Lall, B. 3D local ternary co-occurrence patterns for natural, texture, face and bio medical image retrieval. Neurocomputing 2018, 313, 333–345. [Google Scholar] [CrossRef]
- Ding, C.; Choi, J.; Tao, D.; Davis, L.S. Multi-directional multi-level dual-cross patterns for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 518–531. [Google Scholar] [CrossRef] [PubMed]
- Akiva, P.; Purri, M.; Leotta, M. Self-supervised material and texture representation learning for remote sensing tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8203–8215. [Google Scholar]
- Liu, L.; Chen, J.; Fieguth, P.; Zhao, G.; Chellappa, R.; Pietikäinen, M. From BoW to CNN: Two decades of texture representation for texture classification. Int. J. Comput. Vis. 2019, 127, 74–109. [Google Scholar] [CrossRef]
- Swetha, R.; Bende, P.; Singh, K.; Gorthi, S.; Biswas, A.; Li, B.; Weindorf, D.C.; Chakraborty, S. Predicting soil texture from smartphone-captured digital images and an application. Geoderma 2020, 376, 114562. [Google Scholar] [CrossRef]
- Bell, S.; Upchurch, P.; Snavely, N.; Bala, K. Material recognition in the wild with the materials in context database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3479–3487. [Google Scholar]
- Scabini, L.F.; Ribas, L.C.; Bruno, O.M. Spatio-spectral networks for color-texture analysis. Inf. Sci. 2020, 515, 64–79. [Google Scholar] [CrossRef]
- Caputo, B.; Hayman, E.; Mallikarjuna, P. Class-specific material categorisation. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, Beijing, China, 17–21 October 2005; IEEE: New York, NY, USA, 2005; Volume 2, pp. 1597–1604. [Google Scholar]
- Yang, Z.; Lai, S.; Hong, X.; Shi, Y.; Cheng, Y.; Qing, C. DFAEN: Double-order knowledge fusion and attentional encoding network for texture recognition. Expert Syst. Appl. 2022, 209, 118223. [Google Scholar] [CrossRef]
- Zhai, W.; Cao, Y.; Zha, Z.J.; Xie, H.; Wu, F. Deep structure-revealed network for texture recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11010–11019. [Google Scholar]
- Zhai, W.; Cao, Y.; Zhang, J.; Zha, Z.J. Deep multiple-attribute-perceived network for real-world texture recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3613–3622. [Google Scholar]
- Chen, Z.; Li, F.; Quan, Y.; Xu, Y.; Ji, H. Deep texture recognition via exploiting cross-layer statistical self-similarity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5231–5240. [Google Scholar]
- Cimpoi, M.; Maji, S.; Kokkinos, I.; Mohamed, S.; Vedaldi, A. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3606–3613. [Google Scholar]
- Andrearczyk, V.; Whelan, P.F. Using filter banks in convolutional neural networks for texture classification. Pattern Recognit. Lett. 2016, 84, 63–69. [Google Scholar] [CrossRef]
- Fujieda, S.; Takayama, K.; Hachisuka, T. Wavelet convolutional neural networks for texture classification. arXiv 2017, arXiv:1707.07394. [Google Scholar]
- Jogin, M.; Mohana; Madhulika, M.; Divya, G.; Meghana, R.; Apoorva, S. Feature extraction using convolution neural networks (CNN) and deep learning. In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 18–19 May 2018; IEEE: New York, NY, USA, 2018; pp. 2319–2323. [Google Scholar]
- Juefei-Xu, F.; Naresh Boddeti, V.; Savvides, M. Local binary convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 19–28. [Google Scholar]
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Lazebnik, S.; Schmid, C.; Ponce, J. A sparse texture representation using local affine regions. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1265–1278. [Google Scholar] [CrossRef] [PubMed]
- Guo, Z.; Zhang, L.; Zhang, D. A completed modeling of local binary pattern operator for texture classification. IEEE Trans. Image Process. 2010, 19, 1657–1663. [Google Scholar] [PubMed]
- Jégou, H.; Douze, M.; Schmid, C.; Pérez, P. Aggregating local descriptors into a compact image representation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; IEEE: New York, NY, USA, 2010; pp. 3304–3311. [Google Scholar]
- Bruna, J.; Mallat, S. Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1872–1886. [Google Scholar] [CrossRef] [PubMed]
- Cimpoi, M.; Maji, S.; Vedaldi, A. Deep filter banks for texture recognition and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3828–3836. [Google Scholar]
- Lin, T.Y.; Maji, S. Visualizing and understanding deep texture representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2791–2799. [Google Scholar]
- Zhang, H.; Xue, J.; Dana, K. Deep ten: Texture encoding network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 708–717. [Google Scholar]
- Xue, J.; Zhang, H.; Dana, K. Deep texture manifold for ground terrain recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 558–567. [Google Scholar]
- Bu, X.; Wu, Y.; Gao, Z.; Jia, Y. Deep convolutional network with locality and sparsity constraints for texture classification. Pattern Recognit. 2019, 91, 34–46. [Google Scholar] [CrossRef]
- Peeples, J.; Xu, W.; Zare, A. Histogram layers for texture analysis. IEEE Trans. Artif. Intell. 2021, 3, 541–552. [Google Scholar] [CrossRef]
- Xu, Y.; Li, F.; Chen, Z.; Liang, J.; Quan, Y. Encoding spatial distribution of convolutional features for texture representation. Adv. Neural Inf. Process. Syst. 2021, 34, 22732–22744. [Google Scholar]
- Mao, S.; Rajan, D.; Chia, L.T. Deep residual pooling network for texture recognition. Pattern Recognit. 2021, 112, 107817. [Google Scholar] [CrossRef]
- Song, K.; Yang, H.; Yin, Z. Multi-scale boosting feature encoding network for texture recognition. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4269–4282. [Google Scholar] [CrossRef]
- Chen, Z.; Quan, Y.; Xu, R.; Jin, L.; Xu, Y. Enhancing texture representation with deep tracing pattern encoding. Pattern Recognit. 2024, 146, 109959. [Google Scholar] [CrossRef]
- Zhai, W.; Cao, Y.; Zhang, J.; Xie, H.; Tao, D.; Zha, Z.J. On exploring multiplicity of primitives and attributes for texture recognition in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 403–420. [Google Scholar] [CrossRef]
- Scabini, L.; Zielinski, K.M.; Ribas, L.C.; Gonçalves, W.N.; De Baets, B.; Bruno, O.M. RADAM: Texture recognition through randomized aggregated encoding of deep activation maps. Pattern Recognit. 2023, 143, 109802. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- PyTorch.org. Installation of PyTorch v1.12.1. Available online: https://pytorch.org/get-started/previous-versions/ (accessed on 1 June 2024).
- Wightman, R. PyTorch Image Models (timm). Available online: https://github.com/rwightman/pytorch-image-models (accessed on 1 June 2024).
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- Wightman, R. PyTorch Image Models (resnet50.ram_in1k). Available online: https://huggingface.co/timm/resnet50.ram_in1k (accessed on 1 June 2024).
- Xue, J.; Zhang, H.; Nishino, K.; Dana, K.J. Differential viewpoints for ground terrain material recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1205–1218. [Google Scholar] [CrossRef] [PubMed]
- Wightman, R. PyTorch Image Models (timm)-MobileNet V2. Available online: https://paperswithcode.com/lib/timm/mobilenet-v2 (accessed on 1 June 2024).
- Wightman, R. PyTorch Image Models (timm)-ResNet. Available online: https://paperswithcode.com/lib/timm/resnet/ (accessed on 1 June 2024).
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
- Gildenblat, J.; Cid, J.; Hjermitslev, O.; Lu, M.; Draelos, R.; Butera, L.; Shah, K.; Fukasawa, Y.; Shekhar, A.; Misra, P.; et al. PyTorch Library for CAM Methods. 2021. Available online: https://github.com/jacobgil/pytorch-grad-cam (accessed on 2 June 2024).
- Ilesanmi, A.E.; Ilesanmi, T.O. Methods for image denoising using convolutional neural network: A review. Complex Intell. Syst. 2021, 7, 2179–2198. [Google Scholar] [CrossRef]
- Ye, H.; Li, W.; Lin, S.; Ge, Y.; Lv, Q. A framework for fault detection method selection of oceanographic multi-layer winch fibre rope arrangement. Measurement 2024, 226, 114168. [Google Scholar] [CrossRef]
Method | Backbone | Backbone Params * (Millions) | Backbone GFLOPs * | DTD | KTH-2-b | GTOS | GTOS-Mobile
---|---|---|---|---|---|---|---
RADAM light [35] | MobileNet V2 1.4 | 6 | 0.77 | 73.1 ± 0.9 | 86.8 ± 3.1 | 81.7 ± 1.7 | 78.2
LBCNIN (proposed) ** | | | | 87.2 ± 0.8 | 90.4 ± 5.0 | 83.5 ± 1.6 | 80.6
DeepTEN [26,33] | ResNet18 | 12 | 2 | - | - | - | 76.1
HistRes [29] | | | | - | - | - | 79.8 ± 0.8
DEPNet [27] | | | | - | - | - | 82.2
FENet [30] | | | | 69.6 | 86.6 ± 0.1 | 83.1 ± 0.2 | 85.1 ± 0.4
RPNet [31,33] | | | | 71.6 ± 0.7 | 86.7 ± 2.7 | 83.3 ± 2.2 | 76.6 ± 1.5
MAPNet [11] | | | | 69.5 ± 0.8 | 80.9 ± 1.8 | 80.3 ± 2.6 | 83.0 ± 1.6
DSRNet [10] | | | | 71.2 ± 0.7 | 81.8 ± 1.6 | 81.0 ± 2.1 | 83.7 ± 1.5
RADAM [35] | | | | 68.1 ± 1.0 | 84.7 ± 3.6 | 80.6 ± 1.7 | 79.5
CLASSNet [12] | | | | 71.5 ± 0.4 | 85.4 ± 1.1 | 84.3 ± 2.2 | 85.3 ± 1.3
DTPNet [33] | | | | 71.8 ± 0.7 | 86.7 ± 1.3 | 84.8 ± 2.4 | 87 ± 1.2
MPAP [34] | | | | 72.4 ± 0.7 | 87.9 ± 1.5 | 85.5 ± 1.7 | 85.5 ± 1.6
LBCNIN (proposed) ** | | | | 89.3 ± 0.6 | 89.8 ± 3.7 | 81.4 ± 1.7 | 74.7
DeepTEN [26,33] | ResNet50 | 26 | 5 | 69.6 | 82.0 ± 3.3 | 84.5 ± 2.9 | -
HistRes [29] | | | | 72.0 ± 1.2 | - | - | -
DEPNet [27] | | | | 73.2 | - | - | -
FENet [30] | | | | 74.2 ± 0.1 | 88.2 ± 0.2 | 85.7 ± 0.1 | 85.2 ± 0.4
RPNet [31,33] | | | | 73.0 ± 0.6 | 87.2 ± 1.8 | 83.6 ± 2.3 | 77.9 ± 0.3
MAPNet [11] | | | | 76.1 ± 0.6 | 84.5 ± 1.3 | 84.7 ± 2.2 | 86.6 ± 1.5
DSRNet [10] | | | | 77.6 ± 0.6 | 85.9 ± 1.3 | 85.3 ± 2.0 | 87.0 ± 1.5
RADAM [35] | | | | 75.6 ± 1.1 | 88.5 ± 3.2 | 81.8 ± 1.1 | 81
CLASSNet [12] | | | | 74.0 ± 0.5 | 87.7 ± 1.3 | 85.6 ± 2.2 | 85.7 ± 1.4
MSBFEN [32] | | | | 77.8 ± 0.5 | 86.2 ± 1.1 | 86.4 ± 1.8 | 87.6 ± 1.6
DTPNet [33] | | | | 73.5 ± 0.4 | 88.5 ± 1.6 | 86.1 ± 2.5 | 88.0 ± 1.2
MPAP [34] | | | | 78.0 ± 0.5 | 89.0 ± 1.0 | 86.1 ± 1.8 | 88.1 ± 1.3
LBCNIN (proposed) ** | | | | 93.5 ± 0.8 | 91.3 ± 4.7 | 81.5 ± 2.1 | 80.8
RADAM [35] | ConvNeXt-XL (ImageNet-21K) | 350 | 60.9 | 83.7 ± 0.9 | 94.4 ± 3.8 | 87.2 ± 1.9 | 90.2
LBCNIN (proposed) ** | | | | 89.4 ± 0.9 | 96.1 ± 3.3 | 87.3 ± 1.6 | 91.8
Backbone | BatchNorm2d | Sigmoid Activations | LBC Layer | GAP | DTD | KTH-2-b
---|---|---|---|---|---|---
MobileNet V2 1.4 | | | | ✓ | 71.4 ± 0.7 | 86.4 ± 2.5
 | ✓ | | | ✓ | 69.3 ± 1.0 | 84.8 ± 3.0
 | ✓ | | ✓ | ✓ | 70.7 ± 1.2 | 85.6 ± 1.8
 | | | ✓ | ✓ | 68.6 ± 0.8 | 83.5 ± 2.5
 | ✓ | ✓ | | ✓ | 83.4 ± 0.7 | 89.6 ± 3.8
 | ✓ | ✓ | ✓ | ✓ | 87.2 ± 0.8 | 90.4 ± 5.0
ConvNeXt-XL (ImageNet-21K) | | | | ✓ | 81.3 ± 0.9 | 93.4 ± 4.1
 | ✓ | | | ✓ | 80.9 ± 0.9 | 93.8 ± 4.2
 | ✓ | | ✓ | ✓ | 82.0 ± 0.9 | 93.7 ± 4.0
 | | | ✓ | ✓ | 79.8 ± 1.0 | 91.7 ± 4.6
 | ✓ | ✓ | | ✓ | 83.4 ± 1.0 | 94.4 ± 4.0
 | ✓ | ✓ | ✓ | ✓ | 89.4 ± 0.9 | 96.1 ± 3.3
Backbone | Activation Function | DTD | KTH-2-b
---|---|---|---
MobileNet V2 1.4 | ReLU | 80.8 ± 0.9 | 88.8 ± 3.6
 | LeakyReLU | 80.8 ± 0.8 | 88.9 ± 3.6
 | ELU | 84.1 ± 0.8 | 89.2 ± 4.2
 | Swish | 80.5 ± 0.9 | 88.7 ± 3.6
 | Sigmoid | 87.2 ± 0.8 | 90.4 ± 5.0
ConvNeXt-XL (ImageNet-21K) | ReLU | 82.7 ± 1.0 | 93.8 ± 4.0
 | LeakyReLU | 82.8 ± 1.0 | 93.8 ± 4.0
 | ELU | 84.57 ± 1.0 | 94.2 ± 4.0
 | Swish | 82.7 ± 1.0 | 93.7 ± 4.0
 | Sigmoid | 89.4 ± 0.9 | 96.1 ± 3.3
Backbone | Component | CPU Time (ms) | GPU Time (ms)
---|---|---|---
MobileNet V2 1.4 | backbone (forward pass) | 35.5 ± 9.1 | 16.3 ± 10.2
 | LBCNIN (inference) | 43.3 ± 9.4 | 17.2 ± 10.2
ConvNeXt-XL (ImageNet-21K) | backbone (forward pass) | 394.6 ± 43.6 | 26.2 ± 6.7
 | LBCNIN (inference) | 523.2 ± 44.2 | 185.8 ± 8.8
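For context, such latencies are typically measured as sketched below (a hedged reconstruction, since the paper's exact measurement procedure is not reproduced in this excerpt). On GPU, the device must be synchronized before each timestamp because kernels launch asynchronously:

```python
import time
import torch

def time_model(model, x, device, warmup=10, runs=100):
    """Rough per-batch latency in ms, averaged over many runs."""
    model = model.to(device).eval()
    x = x.to(device)
    with torch.no_grad():
        for _ in range(warmup):   # warm-up: allocator, cuDNN autotuning
            model(x)
        times = []
        for _ in range(runs):
            if device.type == "cuda":
                torch.cuda.synchronize()
            t0 = time.perf_counter()
            model(x)
            if device.type == "cuda":
                torch.cuda.synchronize()
            times.append((time.perf_counter() - t0) * 1e3)
    mean = sum(times) / len(times)
    std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
    return mean, std  # reported as mean ± std in ms
```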