1. Introduction
Texture analysis seeks to establish the surrounding relationship of texture components and their location with respect to the others (connectivity), the number of components per spatial unit (density), and their commonness (homogeneity) [1]. Models developed in the literature to characterize textures can be divided into statistical and geometric methods [2]. The former aims to compute numerical measures by analyzing how and to what extent some textural properties are distributed. The latter explores distinct types of periodicity within an image and describes the texture therein with the relative spectral energy at distinct periodicities.
Texture characterization is not trivial, as this process presents considerable challenges, one of which relates to the conditions under which images are captured. Changes in lighting geometry or intensity can significantly alter the appearance of a textural image. Moreover, different texture types, both microtexture and macrotexture, may constrain the characterization process and the definition of methods robust enough to represent texture information within an image, especially when both are present. Indeed, each texture type contains intrinsic properties that may require a specific statistical method to represent it to the fullest extent. Characterization involves computer vision and image processing techniques that extract descriptors from an image or a region, related to intrinsic properties such as roughness, regularity, and smoothness. Thus, choosing which texture analysis method to employ for feature extraction is critical to the success of the classification phase. The metric used to compare feature vectors is equally crucial.
A variety of classical and novel methods for extracting texture information from images have been developed and employed successfully, such as the gray-level co-occurrence matrix (GLCM) [3], Haralick descriptors [4], local binary patterns (LBPs) [5], the wavelet transform [6], Markov random fields [7], the histogram of oriented gradients, and fractal models [8]. Other interesting works exploring texture analysis can be found in [9,10,11], and reviews of most of these approaches can be found in [12,13].
Convolutional neural networks (CNNs) have recently drawn the interest of researchers due to their effectiveness in several tasks, such as object detection, segmentation, and classification. CNNs learn abstract features and concepts directly from images [14], with increasing complexity as the images pass through successive convolutional layers (CLs). The first CLs learn features such as edges and simple textures, the intermediate CLs learn more complex textures and patterns, and the last CLs learn objects or their parts [14]. Andrearczyk and Whelan [15] created a straightforward texture CNN architecture (T-CNN) for analyzing texture images that pools an energy measure at the final convolutional layer and discards the overall shape information analyzed by conventional CNNs. Despite the encouraging results, the trade-off between complexity and accuracy is not advantageous. Other texture CNN architectures have also reached reasonable texture classification performance [16,17,18,19]. Another drawback of CNNs is their limited explainability and interpretability.
In most cases, current approaches for texture analysis focus on specific information within images. Accordingly, such approaches usually choose a limited set of texture features from contextual information in the form of a region of interest. However, relying solely on local texture information may harm subsequent classification, because a texture is characterized both by local information and by its global appearance, which represents the repetition of and relationship between local patterns [20]. Another factor that can penalize the classification of textural images is noise, which distorts the observed data and is usually inherent to the acquisition of real-life images. In most cases, conventional and current classification methods still present performance problems when classifying images with noisy textures. In addition, such descriptors often suffer from limitations such as computational complexity and limited robustness to changes in scale, rotation, and illumination.
In recent years, bio-inspired texture descriptors have emerged as a promising alternative, leveraging the inherent capabilities of the human visual system to better characterize and discriminate between different textures. However, despite the success of some bio-inspired methods, there is still room for improvement in terms of accuracy, robustness, and computational efficiency.
To circumvent most of the problems described above, Ataky and Lameiras Koerich [21] proposed the bio-inspired texture (BiT) descriptor, a generic descriptor that can describe both global and local texture information in various images [22,23]. Such a descriptor relies on biodiversity measures and taxonomic indices that can be interpreted and explained based on corresponding ecological concepts. The authors stated that textural patterns behave like ecological patterns: large units can self-organize into assemblages that produce patterns from non-deterministic and nonlinear processes. From the ecosystem point of view, this strategy performs well regardless of the aforementioned challenges in texture classification, since biodiversity indices deal with such complexity. Furthermore, the BiT descriptor benefits from the invariance characteristics of ecological patterns to construct a rotation-, translation-, and permutation-invariant descriptor. It has also shown benefits relative to several texture descriptors and deep approaches.
Diversity describes the multiplicity and abundance of species in a distinct unit of analysis. It is a measurement frequently employed to explain the complexity of a community [24]. Evenness measures the relative abundance of the different species in the same area and indicates how evenly the species are distributed in a community; in other words, it expresses the equitability of the taxa frequencies in a community [25]. Although the concept of diversity is fairly precise, Wagner et al. [25] presented a few reasons that may constrain its application: (i) the existence of multiple commonly used diversity indices that can produce different results; (ii) partitioning diversity into elements, such as evenness and richness, may be helpful, but the partition changes depending on the diversity measure; and (iii) the terminology currently in use to characterize diversity is confusing and complex. For instance, the Shannon and Simpson diversity indices work differently: the former weights evenness and richness equally, whereas the latter gives more importance to evenness, and such differences in weighting explain the differences often perceived in results from each measure. What if we integrate and combine more diversity indices? In this paper, we claim that incorporating more diversity and evenness indices into the existing BiT descriptor, considering the variations in the mathematical properties of such indices, improves its global representation and leads to a more robust descriptor than the original one.
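To make the weighting difference concrete, both indices can be computed from a gray-level histogram in a few lines. The sketch below is illustrative only (it is not the BiT/E-BiT implementation): each gray level plays the role of a species and its pixel count the role of abundance.

```python
import numpy as np

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over occupied bins."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def simpson_index(counts):
    """Simpson diversity 1 - sum(p_i^2): the probability that two random
    individuals (pixels) belong to different species (gray levels)."""
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

# Treat each gray level as a "species" and its pixel count as abundance.
image = np.random.default_rng(0).integers(0, 256, size=(64, 64))
counts = np.bincount(image.ravel(), minlength=256).astype(float)
h, d = shannon_index(counts), simpson_index(counts)
```

Because Shannon also rewards richness, a community with many rare gray levels scores relatively higher under Shannon than under Simpson, which is dominated by the most abundant levels.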
In light of the above, our study proposes a novel extended bio-inspired texture descriptor named E-BiT, which aims to address the limitations of existing approaches, including its baseline BiT, while maintaining high accuracy and robustness in texture analysis and characterization. Specifically, our goals are (i) to develop a descriptor that effectively captures a texture’s local and global information while being robust against geometrical transformation, noise, and illumination changes; (ii) to ensure computational efficiency and make the proposed descriptor suitable for real-time applications; (iii) to rigorously evaluate the performance of the E-BiT descriptor on benchmark datasets and compare it with existing state-of-the-art methods.
The main contribution of this paper is an extension of the BiT descriptor [21] that integrates additional sets of diversity and evenness measures widely used in ecology to approximate the completeness of alpha diversity. Including such measures in the BiT descriptor [21] improves its global representativeness and produces a more robust texture characterization and classification. More precisely, the contributions are (i) an extended version of the BiT descriptor (henceforth E-BiT) combining species diversity, evenness, richness, and taxonomic indices to approximate the completeness of alpha diversity as a generalization of biodiversity analysis; (ii) a descriptor that captures the all-inclusive behavior of texture image patterns (both local and global features); (iii) the E-BiT descriptor is invariant to permutation, scale, and translation; (iv) the E-BiT descriptor is straightforward to calculate and has low computational complexity; (v) the E-BiT descriptor is a generic texture descriptor that works well on diverse image types, from natural textures to medical images. We validated the proposed E-BiT descriptor on histopathologic and natural texture image datasets. The E-BiT descriptor achieved state-of-the-art results, surpassing the BiT descriptor [21], which confirms the relevance of including such diversity and evenness measures.
The rest of this paper is organized as follows. Section 2 presents the alpha diversity and evenness indices that are integrated into the BiT descriptor. Section 3 describes the experimental protocol, and Section 4 presents the experimental results of the E-BiT descriptor on three natural texture datasets and one HI dataset. The performance of the E-BiT descriptor is compared with other texture descriptors, such as the BiT descriptor, and with CNN-based approaches. Finally, the conclusions are presented in the last section.
3. Experimental Protocol
In this section, we show how the proposed E-BiT descriptor can be used in conjunction with other image processing and machine learning techniques to perform texture classification. The classification scheme is the same as that employed in [21] to evaluate the BiT and other texture descriptors, and it is structured into five stages: image channel splitting, preprocessing, feature extraction, normalization, and training/classification. Figure 1 shows an overview of the proposed scheme.
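The five-stage scheme can be outlined in code. The sketch below is a hypothetical stand-in, not the actual E-BiT pipeline: simple histogram statistics and a nearest-centroid classifier merely illustrate where channel splitting, preprocessing, feature extraction, normalization, and classification fit.

```python
import numpy as np

def extract_features(image):
    """Stages 1-3: split channels, shift each to zero minimum, and compute
    per-channel histogram statistics (stand-ins for biodiversity indices)."""
    feats = []
    for c in range(image.shape[-1]):               # stage 1: channel splitting
        channel = image[..., c].astype(int)
        channel = channel - channel.min()          # stage 2: preprocessing
        hist = np.bincount(channel.ravel())
        p = hist[hist > 0] / hist.sum()
        feats += [-np.sum(p * np.log(p)),          # stage 3: feature extraction
                  1.0 - np.sum(p ** 2)]
    return np.array(feats)

def zscore(X):
    """Stage 4: per-feature normalization."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stage 5: a minimal classifier (nearest class centroid)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Two synthetic "texture" classes: narrow vs. wide gray-level histograms.
rng = np.random.default_rng(0)
smooth = [rng.integers(100, 130, (32, 32, 3)) for _ in range(20)]
rough = [rng.integers(0, 256, (32, 32, 3)) for _ in range(20)]
X = zscore(np.stack([extract_features(im) for im in smooth + rough]))
y = np.array([0] * 20 + [1] * 20)
pred = nearest_centroid_fit_predict(X[::2], y[::2], X[1::2])
```

In the actual scheme, stage 3 computes the E-BiT biodiversity and taxonomic indices, and stage 5 uses the classifiers evaluated in the paper (k-NN, SVM, and ensembles).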
To evaluate the E-BiT descriptor, we used three benchmark texture datasets and a histopathological image dataset. The datasets were chosen due to their diverse and challenging textures, which allowed us to test the performance of the proposed descriptor rigorously.
We followed a standard experimental protocol defined in [21] for each dataset to ensure a fair comparison with other methods. We divided each dataset into training and test sets, maintaining the same proportion of samples used in the literature. For natural texture images, we randomly selected 70% of the samples of each texture class for training and used the remaining 30% for testing. For the histopathological images, we used k-fold cross-validation with k = 5 and k = 10.
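Assuming scikit-learn as tooling (the paper does not specify a library), the two protocols can be reproduced as follows; the feature matrix here is a random placeholder.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # placeholder feature vectors
y = np.repeat(np.arange(4), 25)      # 4 texture classes, 25 samples each

# Natural textures: 70/30 split per class (stratify keeps class proportions).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Histopathological images: stratified k-fold CV with k = 5 (likewise k = 10).
folds = [(tr, te)
         for tr, te in StratifiedKFold(n_splits=5, shuffle=True,
                                       random_state=0).split(X, y)]
```

Stratification matters for the per-class 70/30 protocol: without it, a random split could leave some texture class under-represented in the test set.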
3.1. Performance Evaluation and Statistical Analysis
To assess the performance of the E-BiT descriptor, we calculated the classification accuracy, which is the ratio of correctly classified samples to the total number of samples in the test set. We used different classifiers, which have been widely used in texture analysis research and allow for a fair comparison with other methods.
To ensure the statistical validity of our results, we repeated each experiment ten times with different random train–test splits. We calculated the mean classification accuracy and standard deviation for each experiment. The results were compared with those of other state-of-the-art methods using the paired McNemar test at a significance level of 0.05. The paired McNemar test allowed us to determine if the differences in classification accuracy between the E-BiT descriptor and other methods were statistically significant.
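The paired McNemar test compares the discordant decisions of two classifiers on the same test samples. Below is a minimal sketch of the continuity-corrected version (the simulated correctness arrays are illustrative, not results from the paper).

```python
import math
import numpy as np

def mcnemar_test(correct_a, correct_b):
    """Paired McNemar test (continuity-corrected, 1 degree of freedom)
    on per-sample correctness of two classifiers over the same test set."""
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(correct_a & ~correct_b))    # A right, B wrong
    c = int(np.sum(~correct_a & correct_b))    # A wrong, B right
    if b + c == 0:
        return 0.0, 1.0                        # identical error patterns
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square(1) upper tail
    return stat, p_value

# Example: A is right on 95/100 samples, B on 80/100, with disjoint errors.
a = np.array([True] * 95 + [False] * 5)
b_correct = np.array([False] * 20 + [True] * 80)
stat, p = mcnemar_test(a, b_correct)
significant = p < 0.05
```

Only the discordant pairs (b and c) enter the statistic: samples that both classifiers get right or both get wrong carry no information about which classifier is better.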
3.2. Datasets
We used four datasets to assess the performance of the proposed E-BiT descriptor, including histopathological images (HIs) and natural texture images. These datasets have previously been utilized to evaluate other texture descriptors, including Haralick, GLCM, and LBP [12].
The Salzburg dataset (Figure 2a) is made up of 476 color texture images of 10 categories with a resolution of 128 × 128 pixels, of which 70% are used for development (training and validation) and 30% for testing. The Outex_TC_00010_c dataset (Figure 2b) contains a training set of 480 non-rotated color images belonging to 24 classes (20 per class). The test set includes 3840 color images in eight different orientations (5°, 10°, 15°, 30°, 45°, 60°, 75°, and 90°). The KTH-TIPS dataset (Figure 2c) contains 810 color images of 200 × 200 pixels, of which 70% are used for training and 30% for testing. The images were captured under three different lighting directions, at nine scales, and in three different poses, with 81 images per class.
The last dataset is the CRC dataset [35] (Figure 3), which contains 5000 colorectal cancer histopathology images (HIs) cropped into patches of 150 × 150 pixels and labeled with eight structure types: stroma (ST), complex stroma (C), tumor (T), lymphoid or immune cells (L), debris (D), adipose (AD), mucosa (M), and background or empty structures (E). HIs usually contain structures such as nuclei (shape) and tissue variations (colors) within the same classes, making them more challenging than pure texture images [36]. Each structure in the CRC dataset has a distinct textural character. For instance, few shape characteristics can be found in the formation of cell nuclei, which have a circular shape but distinct color due to hematoxylin. The CRC dataset has 625 images for each of the eight structure types, summing to 5000 images. The experiments with the CRC dataset were carried out using stratified 10-fold and 5-fold cross-validation (CV) to allow a fair comparison with state-of-the-art approaches. Figure 3 illustrates samples from each class of the CRC dataset.
We compared the E-BiT descriptor’s performance with the original BiT descriptor [21], which has already achieved state-of-the-art performance on the three texture datasets described above. Our main contribution lies in integrating ten additional diversity and evenness indices to build a more discriminant texture descriptor capable of classifying textures efficiently, owing to its ability to capture the all-inclusive behavior of texture patterns, even when the latter constitute a non-deterministic complex system. Moreover, like the original BiT descriptor, the E-BiT descriptor is also permutation-, rotation-, and reflection-invariant. We followed the same scheme used for the BiT descriptor [21] for feature extraction.
4. Experimental Results
To demonstrate the benefits of the newly integrated indices, Table 1 compares the average accuracy achieved by the E-BiT descriptor and different classification algorithms with the accuracy achieved by the BiT, GLCM, LBP, and Haralick descriptors with the same classification algorithms on the KTH-TIPS, Outex, and Salzburg datasets.
The proposed E-BiT descriptor achieved the best result on the Salzburg dataset with SuperL (95.79%). It outperformed the best BiT descriptor (94.23%), with a difference of 1.56%, and all other texture descriptors. The accuracy differences between E-BiT+SuperL and GLCM+k-NN and Haralick+SVM are nearly 20% and 8%, respectively. On the Outex dataset, the E-BiT descriptor also provided the best accuracy. The accuracy differences between the E-BiT and the other descriptors range from 0.12% to 16.79% for BiT+SVM and LBP+SuperL, respectively. Finally, the E-BiT descriptor surpassed all other descriptors on the KTH-TIPS dataset. For example, the E-BiT+SVM achieved the best accuracy of 98.92%, which is 1.05%, 4.03%, and 12.14% higher than the accuracy achieved by BiT+SVM, Haralick+SVM, and GLCM+SuperL, respectively.
The McNemar test, at a 95% confidence level, also revealed that the proportions of errors produced by the compared methods differ on each dataset. As a result, the E-BiT descriptor shows a statistically significant advantage over the other feature descriptors for the top-performing results.
Figure 4 illustrates the average accuracy of all the descriptors presented in Table 1. In this figure, the E-BiT descriptor consistently outperforms the other methods in terms of average accuracy across all four datasets, indicating its effectiveness in a wide range of situations.
Comparing the results presented in Table 1 with other approaches that have used identical datasets may not be appropriate due to disparities in the experimental protocols. For instance, most results reported on the Salzburg dataset overlook the subclasses used in the experiments and the examples used in the test set. Mehta and Egiazarian [37] proposed a method based on a rotation-invariant LBP with a k-NN that achieved an accuracy of 96.26% on the Outex dataset; however, it has the disadvantage of not exploiting global features or color information. Du et al. [38] proposed an illumination-invariant, impulse-noise-resistant, and rotation-invariant method based on a local spiking pattern with a neural network, which achieved an accuracy of 86.1% on the Outex dataset; nonetheless, it does not consider color textures and has many hyperparameters. Hazgui et al. [39] defined a method based on genetic programming that combines LBP and HOG features. With a k-NN, such an approach achieved 91.20% accuracy on the KTH-TIPS dataset; nonetheless, color information and global features are not taken into account. Nguyen et al. [40] proposed rotation- and noise-invariant statistical binary patterns. This method achieved an accuracy of 97.73% on the KTH-TIPS dataset, which is approximately 1.19% lower than that achieved by E-BiT+SVM; however, it has high computational complexity and is resolution-sensitive. Qi et al. [41] used Shannon entropy and LBP to encode cross-channel texture correlation and investigated the relative variance of texture patterns among color channels, proposing a multi-scale rotation-invariant cross-channel LBP (CCLBP). They compute the LBP descriptors for each channel at three scales and consider co-occurrence statistics. Such an approach achieved 99.01% accuracy on the KTH-TIPS dataset with an SVM, which is nearly equal to the accuracy achieved by E-BiT+SVM; nonetheless, it is not scale-invariant.
Table 2 presents and compares the performance of the BiT and E-BiT descriptors. The accuracy differs by 3.69% and 3.13% for the k-NN and SVM classifiers, respectively, and by 3.04%, 1.90%, 1.07%, and 0.20% for the LightB, XGBCB, SuperL, and HistoB ensemble classifiers, respectively; the E-BiT descriptor wins in all cases. Another remark is that, unlike the BiT descriptor, the E-BiT descriptor obtained its best accuracy with a monolithic classifier (SVM), even though E-BiT+SuperL also outperformed BiT+SuperL, which provided the best average accuracy for the BiT descriptor. Furthermore, the McNemar test for the CRC dataset shows that, with the exception of SuperL, all classifiers produce a percentage of errors on the test set similar to that of the SVM. Thus, there is no statistically significant gap between the E-BiT descriptor’s best result and those of the other classifiers on the CRC dataset; the discrepancy between SVM and SuperL, however, is statistically significant.
Table 3 compares the best result of Table 2, achieved with E-BiT+SVM, to the state of the art for the CRC dataset. The E-BiT outperforms almost all other methods in terms of accuracy. For example, considering an eight-class classification task and a ten-fold CV, the difference in accuracy to the second-best method (shallow) is 1.15%, and the difference is 1.71% to the third-best method (CNN). In addition, the E-BiT descriptor slightly outperformed the second-best method (CNN) for 5-fold CV, with a difference of 0.67%. These results highlight the advantages of the E-BiT descriptor over other shallow and deep methods.
Finally, we conducted an empirical evaluation to estimate the computational time required by the E-BiT descriptor for feature extraction and compare it to the computational time required by its baseline for the three texture datasets. The following is the average time per image for executing a Python implementation of Algorithm 1, including preprocessing: (i) KTH-TIPS dataset: 1.39 s and 1.42 s for BiT and E-BiT, respectively; (ii) Outex dataset: 461.8 ms and 498.8 ms for BiT and E-BiT, respectively; and (iii) Salzburg dataset: 881 ms and 910 ms for BiT and E-BiT, respectively. The methodology was developed using the Microsoft Windows 10 operating system and the Python programming language running on an Intel Core i7-8850H CPU @ 2.60 GHz, 64-bit operating system, x64-based processor, and 32 GB of RAM.
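A timing protocol of this kind can be sketched as follows; the extractor below is a placeholder, not the E-BiT implementation, and absolute times will differ across machines.

```python
import time
import numpy as np

def time_per_image(extract, images, repeats=3):
    """Average wall-clock time (seconds) per image for a feature extractor,
    taking the best of `repeats` passes to reduce scheduling jitter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for im in images:
            extract(im)
        best = min(best, time.perf_counter() - start)
    return best / len(images)

# Stand-in extractor (the actual E-BiT feature extraction would go here).
def histogram_entropy(im):
    hist = np.bincount(im.ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
batch = [rng.integers(0, 256, (200, 200)) for _ in range(20)]
t = time_per_image(histogram_entropy, batch)
```

Taking the best of several passes, rather than a single pass, gives a more stable per-image estimate on a multitasking operating system.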
4.1. Analysis of the Experimental Results
The experimental results on diverse datasets showed that the E-BiT descriptor is generalizable and applicable to various texture analysis tasks. The reasons behind the improved performance and the potential advantages of using the proposed E-BiT descriptor for texture analysis and characterization are as follows. (i) The E-BiT’s design enables the extraction of more discriminative features by incorporating local and global texture information. This is achieved by combining the properties found in descriptors such as LBP, GLCM, and Gabor filters, resulting in a more comprehensive texture representation and better texture characterization and analysis. Compared to the baseline BiT, the E-BiT integrates ten additional diversity and evenness indices to gain greater insight into gray-level interactions within a textural image. (ii) The evaluation demonstrated that the proposed E-BiT descriptor is robust and adaptable across various datasets, image conditions, and application scenarios. As shown in Table 1, Table 2, and Table 3, the E-BiT descriptor consistently outperforms the other methods in average accuracy across all four datasets, indicating its effectiveness in various situations. (iii) The E-BiT descriptor combines the strengths of other methods while addressing their limitations, resulting in better overall performance. (iv) The E-BiT’s performance was thoroughly evaluated using diverse datasets, demonstrating its generalizability and applicability to a wide range of texture analysis tasks. The consistently high accuracy across different datasets highlights the descriptor’s robustness and ability to capture essential features of various texture types.
The advantages of shallow approaches that use the E-BiT descriptor over CNNs are also noteworthy. CNNs achieve remarkable performance on object detection and recognition tasks; however, the shape information extracted by CNNs is of minor importance in texture analysis [15]. For instance, Andrearczyk and Whelan [15] developed a simple texture CNN (T-CNN) architecture for analyzing texture images that pools an energy measure at the last convolutional layer and discards the overall shape information analyzed by classic CNNs. Despite the promising results, the trade-off between accuracy and complexity is unfavorable. Other T-CNN architectures have also achieved moderate performance in texture classification [16,17,19,47]. For instance, de Matos et al. [16] and Ataky et al. [47] carried out experiments on the three texture datasets with a tiny T-CNN of 11,900 parameters, trained with and without data augmentation (1×, 2×, 4×, and 6×). Such a tiny T-CNN achieved best accuracies of 61.06%, 70.60%, and 70.22% for the Salzburg, Outex, and KTH-TIPS datasets, respectively. These results are far below the accuracy achieved by E-BiT+SVM and E-BiT+SuperL, as reported in Table 1.
CNNs have shown promising results in texture recognition tasks, but their performance depends on the dataset’s complexity and the specific characteristics of the textures. In some cases, shallow approaches that employ traditional handcrafted texture descriptors, such as LBP, Gabor filters, or GLCM, still provide competitive results, especially when datasets are small or the textures are less complex. It is worth noting that massive labeled datasets are not always available in medical imaging. Some challenges and limitations need to be addressed when using CNNs for texture classification. (i) They have a large number of parameters to be learned on limited training data, which may lead the model to memorize the training data instead of generalizing well to unseen data; data augmentation, dropout, and regularization can mitigate such overfitting. (ii) They typically require a large amount of labeled data to learn the complex patterns and features within the data. With small datasets, performance may be suboptimal; however, transfer learning, where pre-trained models are fine-tuned on the target dataset, can help in such cases. (iii) They are computationally expensive to train and may require specialized hardware, such as GPUs, for efficient training. Traditional handcrafted descriptors may be more computationally efficient for small datasets or less complex texture analysis tasks. (iv) They still lack explainability and interpretability, especially in the medical field.
4.2. Invariance Properties of the E-BiT Descriptor
We also evaluated the invariance properties of the newly aggregated alpha diversity and evenness indices. We applied different geometric transformations to each image, namely rotations of 90° and 180°, horizontal and vertical reflection, and rescaling by 50%, and computed the values of all newly integrated features. The non-normalized values of the descriptors are shown in Table 4 and Table 5 for images taken randomly from the KTH-TIPS and CRC datasets, respectively.
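The transformation protocol can be sketched as follows, using the Shannon index of the gray-level histogram as a stand-in feature: any histogram-based index is exactly preserved under rotation and reflection, because these transformations only rearrange pixels.

```python
import numpy as np

def shannon(image):
    """Stand-in descriptor: Shannon diversity of the gray-level histogram
    (unchanged by any rearrangement of the pixels)."""
    hist = np.bincount(image.ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
transforms = {
    "rot90": np.rot90(img, 1),
    "rot180": np.rot90(img, 2),
    "hflip": np.fliplr(img),
    "vflip": np.flipud(img),
}
ref = shannon(img)
deltas = {name: abs(shannon(t) - ref) for name, t in transforms.items()}
```

Rescaling is the interesting case: it changes the number of pixels and possibly the set of occupied gray levels, which is why the scale behavior of each index has to be checked separately.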
The values of the E-BiT descriptor shown in Table 4 and Table 5 indicate that all measurements are reflection- and rotation-invariant, as they have similar values for the original texture images and HIs. This also substantiates the fact that the E-BiT descriptor captures the all-inclusive behavior of patterns within an image. The Simpson index, Gini coefficient, Heip’s evenness, Pielou’s evenness, McIntosh dominance diversity, and Simpson evenness are scale-invariant, as they provided values of the same order as for the original images. On the other hand, most diversity measures based on abundance and richness exhibit some scale dependence: rescaling the original image affects the proportion of both factors, which in turn affects the resulting values either inversely or directly. However, normalizing such measurements by the total number of pixels can mitigate this effect. Indices based on evenness, in contrast, rely on the equitability of taxa frequencies in a community. As a result, they are unaffected by the change in scale because evenness is determined by the intrinsic properties of the ecosystem (image).
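The contrast between abundance-based and evenness-based measures under rescaling can be illustrated with Pielou’s evenness, which normalizes Shannon diversity by the log of richness (a simplified stand-in for the full set of indices).

```python
import numpy as np

def pielou_evenness(image):
    """Pielou's J' = H' / ln(S): Shannon diversity normalized by the
    log of richness S (the number of occupied gray levels)."""
    hist = np.bincount(image.ravel(), minlength=256)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log(p)) / np.log(len(p)))

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(128, 128))
half = img[::2, ::2]                 # crude 50% rescaling by subsampling

# Abundance-based quantity: scale-dependent until normalized by pixel count.
abundance_full, abundance_half = img.size, half.size
# Evenness-based quantity: driven by the histogram shape, not image size.
j_full, j_half = pielou_evenness(img), pielou_evenness(half)
```

Here the raw abundance drops by a factor of four under 50% rescaling (and would be restored by dividing by the pixel count), whereas Pielou’s evenness stays essentially unchanged.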
5. Discussion
The E-BiT’s outstanding performance can be attributed to its ability to capture the all-inclusive behavior of texture image patterns at local and global levels. This ability is achieved by combining ecological concepts of species diversity, evenness, richness, and taxonomic indices. These concepts offer a more holistic understanding of texture patterns and their relationships within an image, allowing the E-BiT descriptor to classify textures more effectively. Furthermore, the E-BiT’s invariance to scale, translation, and permutation contributes to its robustness and adaptability in various applications, such as natural image and HI analysis. This is an advantage over other methods, which may fail to account for these invariances.
However, the E-BiT descriptor also has its limitations. First, it relies on preprocessing techniques for normalization and geometric transformations, which may introduce additional complexity into the overall texture characterization process. This could be a disadvantage compared to methods that do not require extensive preprocessing. In addition, the E-BiT descriptor presents further disadvantages that should be considered. (i) It cannot define dynamic ranges for the biodiversity and taxonomic indices; the ranges depend on the texture variability, which may limit the descriptor’s applicability in some cases and necessitates further normalization. In future work, we will therefore seek to develop adaptive strategies for defining the indices’ ranges based on image characteristics. (ii) It does not inherently provide a tolerance parameter to cope with noise in some types of images, meaning that preprocessing techniques may be required to address noise-related issues, adding complexity to the overall texture characterization process. (iii) Its invariance to changes in image contrast is not yet conclusive, as its performance depends on the specific datasets being analyzed. This limitation highlights the need for further investigation into how the descriptor responds to different image contrasts and into potential strategies for enhancing its contrast invariance.
Taking these additional disadvantages into account, the E-BiT descriptor, while promising, requires continued research to address its limitations and further refine its performance across various applications and settings.
In summary, the E-BiT descriptor offers a promising contribution to texture characterization by leveraging ecological concepts for a more comprehensive understanding of texture patterns. However, while it demonstrates superior performance compared to existing methods, further research is needed to explore its limitations and potential improvements in various applications and settings.
6. Conclusions and Future Work
This paper proposed an extended version of the bio-inspired texture descriptor, named E-BiT, to characterize textures in images. The extension combines global ecological concepts of species diversity, evenness, richness, and taxonomic indices to approximate the completeness of alpha diversity as a generalization of biodiversity analysis. This allows the development of a descriptor that captures the all-inclusive behavior of texture image patterns at local and global levels. Furthermore, the E-BiT descriptor is insensitive to scale, translation, and permutation.
Compared to related methods for texture characterization and its baseline version, the E-BiT descriptor emerges as a promising texture characterization tool, achieving state-of-the-art texture classification performance. In addition, its generic nature is notable, as it performs well in natural and histopathologic images.
Notwithstanding the meaningful results, there are still aspects in which the E-BiT descriptor could be improved and extended. For future work, we consider the following research directions to be promising:
(i) Examining how the E-BiT descriptor behaves at different spatial scales and resolutions, which may allow for the most effective texture property extraction while maintaining or improving performance;
(ii) Incorporating dynamic range selection for biodiversity and taxonomic indices to enhance the descriptor’s adaptability to different texture variability within images;
(iii) Defining a tolerance parameter for the E-BiT descriptor to improve its robustness against noise without the need for preprocessing in certain types of images;
(iv) Studying the E-BiT descriptor’s invariance to changes in image contrast across different datasets to establish a more conclusive understanding of its performance. We expect that our work on the E-BiT descriptor will inspire further research and contribute to the ongoing development of texture analysis and characterization methods.