2.2. Five Sensory Systems
Current studies usually utilize five sensory systems to acquire various forms of information from the external environment, and these sensory systems operate both independently and interactively [19]. Schifferstein et al. [20] found that people could understand most product details through the visual and tactile senses. Stadtlander and Murdoch [21] noted that about 60% of the identification and description of product characteristics was obtained through the visual sense, and 32% through the tactile sense. Pietra [22] conducted the design and preliminary testing of a virtual reality driving simulator capable of communicating tactile and visual information to promote ecologically sustainable driving behaviors. Abbasimoshaei and Kern [23] explored the relationship between hardness and roughness perceptions during the pressing of materials with different surface textures under different forces; the results showed a significant correlation between the perceptions of hardness and roughness, as well as an influence of the applied force on the perception of roughness. Osman et al. [24] presented a method for surface detection using a robot and vibrotactile sensing, in which machine learning algorithms classified different surfaces based on the vibrations detected by an accelerometer.
From a physical perspective, light inherently lacks ‘color’; color is purely a perception created by the eyes and brain in response to light frequencies. Different spectra can be perceived as the same color by humans, indicating that color definition is highly subjective. In 1931, the International Commission on Illumination (CIE) proposed the first generation of chromaticity coordinates, establishing color identities based on the corresponding values of mixed red (R), green (G), and blue (B) light, and RGB became a universal color identification method. The YCbCr color space is predominantly utilized in continuous image processing, in television and computer vision technology, and in digital photography systems. Stemming from the CCIR Recommendation 601 (1990) standard, it represents color through luminance (Y) and the chroma of blue and red (Cb, Cr); here, Y is the grayscale value used when converting a color image to a grayscale image. The conversion formulae between YCbCr and RGB are as follows:
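In the commonly used full-range form of Recommendation 601 (with 8-bit values, an offset of 128 for the chroma components, and coefficients rounded to three decimal places), the forward and inverse conversions can be written as:

Y = 0.299 R + 0.587 G + 0.114 B
Cb = −0.169 R − 0.331 G + 0.500 B + 128
Cr = 0.500 R − 0.419 G − 0.081 B + 128

R = Y + 1.402 (Cr − 128)
G = Y − 0.344 (Cb − 128) − 0.714 (Cr − 128)
B = Y + 1.772 (Cb − 128)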
Pass et al. [25] proposed the Color Coherence Vector (CCV) method to improve upon the shortcomings of the color histogram. This method, based on color histogram processing, also considers color space information to extract image color features. A connected region of the same color is called a “coherent area,” and the number of pixels of a certain color lying in coherent areas is defined as the “coherence” of that color. Pixels are classified into two types, coherent and incoherent: coherent pixels belong to a contiguous region of a certain minimum size, while incoherent pixels do not. The color coherence vector can then represent the classification of each color in the image. Liu et al. [26] used the GrabCut automatic segmentation algorithm to segment garment images and extract the image foreground; the color coherence vector (CCV) and the dominant color method were then adopted to extract color features for garment image retrieval. Reddy et al. [27] proposed an algorithm which incorporates the advantages of various other algorithms to improve the accuracy and performance of retrieval; the accuracy of color histogram-based matching can be increased by using the Color Coherence Vector (CCV) for successive refinement.
To illustrate, consider a 6 × 6 grayscale image with each pixel’s grayscale value as shown in Figure 1. The image can be quantized into three color components, each of which is called a bin. For instance, bin1 contains grayscale values from 10 to 19, bin2 contains grayscale values from 20 to 29, and bin3 contains grayscale values from 30 to 39. After quantization, we obtain the results shown in Figure 2.
Using the “Connected Component Labeling” method, we can identify connected regions, as shown in Figure 3. Each component is labeled with a letter (A, B, C, and so on); different letters are used to label regions of the same color that lie in different contiguous areas.
A table is then created to record the color corresponding to each label and the number of pixels of that color, as shown in Table 1.
We now set a coherence threshold T = 4 (the value of the threshold can be chosen freely). If the number of pixels in a connected component exceeds the threshold T, those pixels are coherent; if it is less than the threshold, they are incoherent. Alpha (α) denotes the number of coherent pixels of a color, and beta (β) denotes the number of incoherent pixels. In Table 2, the numbers of pixels labeled A, B, and E all exceed the threshold T, so they are marked as coherent pixels; the numbers of pixels labeled C and D are less than the threshold T, so they are marked as incoherent pixels. Finally, the color coherence vector of this image is obtained, which can be expressed as ((17, 3), (15, 0), (0, 3)).
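As a minimal sketch of the procedure illustrated above (the bin ranges, 4-connectivity, and threshold are taken from the example; SciPy’s connected-component labeling is used here as one possible implementation, and components whose size equals the threshold are treated as coherent):

import numpy as np
from scipy import ndimage

def color_coherence_vector(gray, bins, threshold=4):
    # gray: 2-D array of grayscale values
    # bins: sequence of (low, high) grayscale ranges, one per color bin
    # threshold: minimum component size for its pixels to count as coherent
    ccv = []
    for low, high in bins:
        mask = (gray >= low) & (gray <= high)      # quantization step
        labels, n = ndimage.label(mask)            # 4-connected component labeling
        alpha = beta = 0
        for lab in range(1, n + 1):
            size = int(np.sum(labels == lab))
            if size >= threshold:
                alpha += size                      # coherent pixels
            else:
                beta += size                       # incoherent pixels
        ccv.append((alpha, beta))
    return ccv

# Usage with the three bins of the example above:
# ccv = color_coherence_vector(image, [(10, 19), (20, 29), (30, 39)], threshold=4)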
The HSI color space is a color determination method grounded in three fundamental properties of color: hue, saturation, and intensity. Stemming from the human visual system, the HSI color space conforms more closely to human visual characteristics than the RGB color space. Smith and Chang [28] pointed out that the hue in the HSI color space is composed of the three primary colors of red, green, and blue, distributed every 120°. The hue component of the image is quantized every 20°; quantizing the hue component into 18 (360/20) bins is sufficient to distinguish the various colors in the hue component, and the saturation component only needs to be quantized into three bins to provide enough perceptual tolerance. These can therefore be combined into 18 × 3 = 54 quantized colors, which are encoded as the numbers 1–54. By using the coherence vector method to examine a pixel’s eight-neighborhood and defining a high threshold Tα, if the number of neighboring pixels with the same color as the center pixel exceeds Tα, the coherence (α) of this color is incremented by 1. By scanning all (M − 2) × (N − 2) interior pixels of the image, we can obtain all the highly coherent colors in the image. The main colors of the image and their corresponding coherence values, represented by the numbers 1–54, can then be used to represent the color characteristics of the image. Thoriq et al. [29] identified the level of ripeness of bananas using images of plantain fruit with the skin still intact; the images were preprocessed using HSI (hue, saturation, intensity) color space transformation for feature extraction.
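A minimal sketch of this 54-color quantization and coherence count (assuming hue in degrees, saturation normalized to [0, 1], evenly spaced bin boundaries, and an illustrative default for Tα; the exact boundaries and threshold used in [28] may differ):

import numpy as np

def quantize_hs(hue_deg, sat):
    # Map hue (degrees) and saturation ([0, 1]) to a color code in 1..54:
    # 18 hue bins of 20 degrees each x 3 saturation bins.
    h_bin = int(hue_deg % 360) // 20             # 0..17
    s_bin = min(int(sat * 3), 2)                 # 0..2
    return h_bin * 3 + s_bin + 1                 # 1..54

def color_coherence(codes, t_alpha=5):
    # codes: 2-D array of quantized color codes (1..54)
    # Returns, for each code, the number of pixels whose eight-neighborhood
    # contains more than t_alpha pixels of the same code (coherence alpha).
    M, N = codes.shape
    alpha = np.zeros(55, dtype=int)              # index 0 unused; codes are 1..54
    for y in range(1, M - 1):                    # scan the (M - 2) x (N - 2) interior
        for x in range(1, N - 1):
            c = codes[y, x]
            window = codes[y - 1:y + 2, x - 1:x + 2]
            same = int(np.sum(window == c)) - 1  # exclude the center pixel itself
            if same > t_alpha:
                alpha[c] += 1
    return alpha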
Texture refers to the grooves on the surface of an object that make it appear uneven, with characteristics such as direction and roughness that describe the quality of the material’s surface. Mathematically, texture is the visual representation of the correlation of gray-level or color-space changes among adjacent pixels in an image, such as shapes, stripes, and color blocks.
Local Binary Pattern (LBP) is a basic, grayscale-invariant statistic of local image structure and a powerful feature algorithm for texture classification problems. It uses a 3 × 3 mask to calculate the difference between the central pixel and the surrounding eight neighborhood pixels: each difference is thresholded to a binary value, the resulting binary mask is multiplied element-wise by a weight mask, and the products are summed. The LBP operation formula is given below [30]; each neighborhood pixel value is compared with the central pixel value and assigned 1 if it is larger and 0 if it is smaller. A schematic diagram of the LBP operation is shown in Figure 4.
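A compact way to write this operation, reconstructed from the symbol definitions below, is:

LBP = Σ (i = 0 to 7) di × 2^i, where di = 1 if Pi ≥ Pcenter and di = 0 otherwise (the case of equal values is counted as 1 here, as in the standard formulation).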
Pi: the i-th neighborhood pixel value in the 3 × 3 mask.
Pcenter: the central pixel value of the 3 × 3 mask.
2^i: the weight assigned to each neighborhood position.
di: the 0 or 1 value obtained by comparing the neighboring pixel Pi with the central pixel.
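A minimal sketch of this 3 × 3 LBP computation (the clockwise ordering of the weights is an assumption; any fixed ordering of the eight neighbors can be used, as long as it is applied consistently):

import numpy as np

def lbp_3x3(gray):
    # Compute the basic 3 x 3 LBP code for every interior pixel.
    gray = np.asarray(gray, dtype=np.int32)
    M, N = gray.shape
    out = np.zeros((M - 2, N - 2), dtype=np.uint8)
    # Offsets of the eight neighbors, paired with weights 2^0 .. 2^7
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, M - 1):
        for x in range(1, N - 1):
            center = gray[y, x]
            code = 0
            for i, (dy, dx) in enumerate(offsets):
                d_i = 1 if gray[y + dy, x + dx] >= center else 0
                code += d_i * (2 ** i)
            out[y - 1, x - 1] = code
    return out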
Ojala et al. [31] achieved excellent performance in their research by using LBP analysis on images. In addition, they introduced three other measures to characterize the relationship between image pixels and grayscale texture: SCOV, VAR, and SAC. Similar to LBP, these three measures also operate on a 3 × 3 pixel neighborhood, as shown in Figure 5, and calculate the correlations between the eight neighboring pixels and the center pixel. A brief description of each is provided below:
- (1) SCOV measures the correlation between textures and uses unnormalized local grayscale variables; μ represents the local average of the grayscale values in the window.
- (2) VAR measures the variation of the grayscale values.
- (3) SAC measures the correlation of the eight values in the area using normalized grayscale variables, so the SAC value is limited to between −1 and 1 (an illustrative sketch of all three measures is given after this list).
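The formulas from [31] are not reproduced above, so the following sketch only illustrates one common reading of the three measures for a single 3 × 3 window: SCOV as an unnormalized covariance between the center pixel and its eight neighbors, VAR as the local grayscale variance, and SAC as the normalized counterpart of SCOV, bounded between −1 and 1 by construction; the exact definitions and normalizations in [31] may differ.

import numpy as np

def scov_var_sac(window):
    # window: 3 x 3 grayscale array; window[1, 1] is the center pixel.
    w = np.asarray(window, dtype=float)
    mu = w.mean()                                # local average of the 3 x 3 mask
    center = w[1, 1]
    neighbors = np.delete(w.flatten(), 4)        # the eight surrounding pixels

    # SCOV: unnormalized covariance between the center and its neighbors
    scov = np.mean((neighbors - mu) * (center - mu))

    # VAR: local variance of the grayscale values
    var = np.mean((w - mu) ** 2)

    # SAC: normalized correlation, bounded in [-1, 1] by the Cauchy-Schwarz inequality
    num = np.sum((neighbors - mu) * (center - mu))
    den = np.sqrt(np.sum((neighbors - mu) ** 2) * 8.0 * (center - mu) ** 2)
    sac = num / den if den > 0 else 0.0
    return scov, var, sac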
In related research on color and texture, Hayati [32] classified types of roses by applying the k-nearest neighbor (K-NN) algorithm to extracted hue, saturation, value (HSV) color characteristics and local binary pattern (LBP) texture features. Rusia and Singh [33] proposed a practical approach combining Local Binary Patterns (LBP) and convolutional neural network-based transfer learning models to extract low-level and high-level features; three color spaces (RGB, HSV, and YCbCr) were analyzed to understand the impact of the color distribution on real and spoofed faces in the NUAA benchmark dataset. Vangah et al. [34] combined the statistical Local Binary Pattern (LBP), with a fusion of the Hue Saturation Value (HSV) and Red Green Blue (RGB) color spaces, with frequency descriptors (Gabor filter and Discrete Cosine Transform (DCT)) for the extraction of visual textural and colorimetric features from direct-view images of rocks.
This study converts the RGB values of the sample image into HSI values and uses the hue (H) and saturation (S) components for color analysis; color coherence is used to find the six colors with the highest coherence, which are taken as the main colors, and the corresponding combination of coherence values is used as the color feature of the sample image. The decomposed intensity component (I) is instead calculated using the Y value of the YCbCr color space to obtain the image grayscale values, and the LBP, SCOV, VAR, and SAC methods are used to analyze the correlations between the grayscale values for image texture feature extraction.