We used the national standard for fresh apples (GB/T 10651-2008, General Administration of Quality Supervision, Inspection and Quarantine of the People’s Republic of China) together with market research to determine an appropriate grading standard, which is listed in Table 1. The final grade of an apple is the lowest grade assigned to any single characteristic: size, color, shape, or surface defects. Three levels were used: first-grade, second-grade, and other-grade fruit. The apple size was the maximum cross-sectional diameter of the apple. The apple color was characterized by the ratio of the red area to the total apple surface area.
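The "lowest grade wins" rule can be illustrated with a minimal Python sketch. The names `GRADE_ORDER` and `final_grade` are hypothetical, and the per-characteristic grades are assumed to have already been assigned from the thresholds in Table 1, which are not reproduced here.

```python
# Grade labels from the text, ordered from best to worst.
GRADE_ORDER = ["first-grade", "second-grade", "other-grade"]


def final_grade(grades_by_characteristic):
    """Return the worst grade among the per-characteristic grades.

    grades_by_characteristic maps a characteristic name (e.g. "size")
    to one of the labels in GRADE_ORDER. An unknown label or an empty
    mapping raises ValueError.
    """
    # A larger index in GRADE_ORDER means a worse grade, so the
    # final grade is the label with the largest index.
    return max(grades_by_characteristic.values(), key=GRADE_ORDER.index)


example = {
    "size": "first-grade",
    "color": "second-grade",
    "shape": "first-grade",
    "surface defect": "first-grade",
}
result = final_grade(example)
```

In this example the apple is second-grade overall, because its color grade is the lowest of the four.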
2.2.2. Extraction of the Fruit Shape
A regular shape is an important index in apple grading and also influences the consumer. Deformed apples usually sell at a lower price. In this study, the roundness and shape index were used to evaluate the apple shape.
The roundness describes the complexity of an object’s boundary. Its range is [0, 1], and the closer the value is to 1, the rounder the shape. The roundness $E_1$ was calculated as follows:
$$E_1 = \frac{4\pi S}{P^2}$$
where $S$ is the apple area, i.e., the number of pixels in the projection of the apple’s surface; $P$ is the perimeter of the apple, i.e., the number of pixels on the apple’s circumference.
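A minimal sketch of the roundness computation follows, assuming the common isoperimetric definition E1 = 4πS/P², which equals 1 for a perfect circle and decreases for more irregular outlines. The function name `roundness` is hypothetical, and the area and perimeter are assumed to have already been measured in pixels from the segmented apple region.

```python
import math


def roundness(area_px, perimeter_px):
    """Roundness E1 = 4*pi*S / P**2 (assumed definition).

    area_px:      S, number of pixels inside the apple outline
    perimeter_px: P, number of pixels along the apple outline
    """
    if perimeter_px <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area_px / perimeter_px ** 2


# Ideal circle of radius 10: S = pi*r^2, P = 2*pi*r, so E1 = 1.
circle = roundness(math.pi * 10 ** 2, 2 * math.pi * 10)

# Ideal square of side 10: S = 100, P = 40, so E1 = pi/4 (about 0.785).
square = roundness(100, 40)
```

With pixel counts instead of ideal geometry, the perimeter estimate depends on how boundary pixels are counted, so measured values are approximate.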
The shape index $E_2$ is the ratio of the long axis to the short axis of the apple:
$$E_2 = \frac{D_1}{D_2}$$
where $D_1$ is the long axis of the apple, i.e., the maximum length from the bottom of the calyx to the stem of the apple, mm; $D_2$ is the short axis of the apple, i.e., the maximum cross-sectional diameter of the apple, mm. When the shape index is in the range 0.6–0.8, the apple is oblate; 0.8–0.9 is round or nearly round; 0.9–1.0 is oval or conical; and 1.0 or more is oblong [6].
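The shape index and the shape classes quoted above can be sketched as follows. The function names are hypothetical, and the treatment of the shared boundaries (0.8, 0.9, 1.0) as belonging to the upper class is an assumption, since the text does not say which class a boundary value falls into. Values below 0.6 are outside the ranges given in the text and are reported as such.

```python
def shape_index(d1_mm, d2_mm):
    """Shape index E2 = D1 / D2 (calyx-to-stem length over max diameter)."""
    if d2_mm <= 0:
        raise ValueError("D2 must be positive")
    return d1_mm / d2_mm


def shape_class(e2):
    """Map a shape index to the classes listed in the text.

    Assumption: intervals are half-open, [lower, upper).
    """
    if e2 < 0.6:
        return "below the listed ranges"
    if e2 < 0.8:
        return "oblate"
    if e2 < 0.9:
        return "round or nearly round"
    if e2 < 1.0:
        return "oval or conical"
    return "oblong"


# Hypothetical apple: 68 mm from calyx to stem, 80 mm maximum diameter.
label = shape_class(shape_index(68, 80))
```

For the hypothetical 68 mm by 80 mm apple, the index is 0.85 and the class is "round or nearly round".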
2.2.3. Extraction of the Apple Color
The apple’s color is a visual characteristic and an indicator of the apple’s quality and maturity. Therefore, the color of apples is crucial in apple grading. The hue, saturation, and value (HSV) color space model was adopted in this study since it is a robust color model.
The HSV color space model separates the hue, saturation, and value (brightness) of the apple’s color. The hue is the component least affected by illumination and the most suitable for distinguishing different apple colors. Therefore, we selected the hue channel of the HSV image to extract the apple’s color. In this research, we used OpenCV to convert the collected apple image into the HSV space, and divided the value of hue by 2 to obtain a hue range from 0 to 180. The red, yellow, and background components were extracted from the hue of the side view image of the apple for the statistical analysis, as shown in Figure 3.
The hue of the background was substantially different from that of the red parts of the apple. The hue ranges of the red parts were [0, 15] and [175, 180] (Figure 3a), the hue range of the background was [50, 125] (Figure 3c), and that of the yellow components was about [20, 25] (Figure 3b). Thus, the ratio of the red area was used to describe the color of the apple. The ratio of the red area is the number of red pixels, as determined from the hue, divided by the total number of pixels in the apple area:
$$\text{red area ratio} = \frac{1}{M}\sum_{i=1}^{M} \text{Mask}_i$$
where $M$ is the total number of apple pixels in the camera field of view; $\text{Mask}_i$ is set to 1 when the hue value of the $i$th pixel is within the red threshold range; otherwise, it is 0. The red threshold ranges are [0, 15] and [175, 180]. The red area ratios of the apple images obtained by both cameras were calculated and averaged to describe the color of the apple. The value ranges from 0 to 1, and the closer it is to 1, the redder the apple is.
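The red area ratio, i.e., the fraction of apple pixels whose hue falls in [0, 15] or [175, 180] on the 0 to 180 scale, can be sketched as below. This is an illustration only: it uses Python's standard `colorsys` module instead of OpenCV so that it runs without extra dependencies, the function names are hypothetical, and the input is assumed to contain apple pixels only (background already removed).

```python
import colorsys

# Red hue thresholds from the text, on the 0-180 scale (degrees / 2).
RED_RANGES = ((0, 15), (175, 180))


def hue_180(r, g, b):
    """Hue of an 8-bit RGB pixel, scaled to the 0-180 range."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 180.0  # colorsys returns the hue in [0, 1)


def is_red(hue):
    """Mask_i: True when the hue lies inside a red threshold range."""
    return any(lo <= hue <= hi for lo, hi in RED_RANGES)


def red_area_ratio(apple_pixels):
    """Fraction of apple pixels classified as red.

    apple_pixels: iterable of (r, g, b) tuples for apple pixels only.
    """
    pixels = list(apple_pixels)
    if not pixels:
        raise ValueError("no apple pixels supplied")
    return sum(is_red(hue_180(*p)) for p in pixels) / len(pixels)


def apple_color(view_a, view_b):
    """Average of the red area ratios from the two camera views."""
    return (red_area_ratio(view_a) + red_area_ratio(view_b)) / 2.0


# Toy views: one pure red pixel and one pure yellow pixel each.
toy_view = [(255, 0, 0), (255, 255, 0)]
color_value = apple_color(toy_view, toy_view)
```

Red hues wrap around the ends of the hue scale, which is why two threshold ranges are needed rather than one.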
2.2.4. Extraction of the Fruit Surface Defects
Surface defects are the most critical factors in apple grading. During growth and harvest, the apples are affected by friction, squeezing, and insect pests, resulting in physical damage. Damaged apples are prone to mildew and decay, which affect the taste and commercial value of the apples and can reduce the storage capacity of entire batches of apples.
Apple surface defects include crush injury, stab injury, abrasion injury, sunburn, hail injury, cracks, splits, insect damage, and so on. Due to the low probability of some fruit surface defects, such as sunburn and hail injury, it is necessary to determine the proportion of different types of detected fruit surface defects. To obtain accurate results, we collected 1000 Fuji apples from the Huicheng Orchard (34.31° N, 108.02° E) in Yangling, Shaanxi Province and counted the surface defects. The statistical results are listed in Table 2.
Table 2 shows that the proportions of bruising and cracks were high, owing to physical damage caused by apples falling or colliding during picking. The bruises were divided into two categories: mild bruises, referred to simply as bruises, and severe bruises. Mild bruises were similar in color to the surrounding epidermis, without obvious discoloration. Severe bruises were caused by heavy impact or pressure, sometimes with juice flowing out. Because cracks and splits describe essentially the same type of surface defect, the two were combined into a single category, cracks. Other surface defects were relatively rare; damage caused by farm chemicals was the most common among them, and these remaining defects were collectively referred to as skin defects.
In this study, the single-shot multibox detector (SSD) deep learning algorithm, implemented in the TensorFlow deep learning framework, was used to identify the apple surface defects so that the field grading equipment achieves high accuracy in surface defect detection. SSD is an object detection method [34,35]. The algorithm converts the two fully connected layers of the VGG-16 network structure into convolution layers and adds four further convolution layers to construct the network structure. The specific network structure is shown in Figure 4.
The overall objective loss function is a weighted sum of the localization loss ($L_{loc}$) and the confidence loss ($L_{conf}$):
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
where $N$ is the number of matched default boxes; $x$ indicates whether the prediction box matches the real label box (1 if so, 0 otherwise); $c$ represents the softmax confidence for each class; $l$ represents the prediction box; $g$ represents the real label box; and the weight term $\alpha$ is set to 1 by cross-validation [35].
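The structure of the weighted loss, L = (L_conf + α·L_loc)/N, can be sketched in plain Python. This is a simplified illustration and not the training code used in the study: it assumes the smooth-L1 localization term and softmax cross-entropy confidence term of the original SSD formulation, and it omits default-box matching, box offset encoding, and hard negative mining. All function names are hypothetical.

```python
import math


def smooth_l1(d):
    """Smooth-L1 penalty for a single offset difference."""
    a = abs(d)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def localization_loss(pred_boxes, true_boxes):
    """Sum of smooth-L1 penalties over the 4 offsets of each matched box."""
    return sum(
        smooth_l1(p - g)
        for pred, true in zip(pred_boxes, true_boxes)
        for p, g in zip(pred, true)
    )


def confidence_loss(class_scores, labels):
    """Softmax cross-entropy summed over boxes.

    class_scores: one list of raw class scores per box
    labels:       index of the true class for each box
    """
    total = 0.0
    for scores, label in zip(class_scores, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]
    return total


def ssd_loss(l_conf, l_loc, n_matched, alpha=1.0):
    """Weighted sum (L_conf + alpha * L_loc) / N; 0 when N is 0."""
    if n_matched == 0:
        return 0.0
    return (l_conf + alpha * l_loc) / n_matched


# Toy example with one matched box and two classes.
l_loc = localization_loss([[0.0, 0.0, 0.0, 0.0]], [[0.5, 0.0, 0.0, 2.0]])
l_conf = confidence_loss([[0.0, 0.0]], [0])
total = ssd_loss(l_conf, l_loc, n_matched=1)
```

With α = 1, as stated in the text, the localization and confidence terms contribute equally before normalization by the number of matched boxes.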
The images of the apples used in the training set were manually labeled using the LabelImg toolbox. The labels included no defect, bruising, severe bruising, cracks, and skin defects. Because the number of samples was small, transfer learning was used to optimize the parameters. We used the trained parameters of the MobileNetV2 classification network, removed the last classification layer, kept the parameters of the VGG5 layer unchanged, and randomly initialized the parameters of the other layers using a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. Mini-batch stochastic gradient descent was used for parameter optimization. The batch size was 64, the initial learning rate was 0.04, and the 800 images with a size of 3024 × 3024 pixels were scaled to 300 × 300 pixels. After training, the model was applied to the test set to detect surface defects in the apples. Predicted rectangular boxes with confidence scores exceeding 50% were taken as areas with surface defects.
To verify the model accuracy, the apples in the images were inspected, and the apple damage in the test set images was manually marked. The intersection over union (IOU) was computed to measure the overlap between the boxes predicted by the SSD deep learning algorithm and the manually marked boxes. Based on preliminary tests, an IOU threshold of 70% was adopted: if the IOU exceeds 70%, the detection of the fruit surface defect type is considered correct; otherwise, it is considered incorrect. The evaluation indices of detection accuracy are precision ($P$), recall ($R$), and their harmonic mean $F_1$, calculated as follows:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$
where $TP$ is the number of surface defects detected correctly; $FP$ is the number of non-surface defect areas mistakenly detected as surface defect areas or surface defect detection errors; $FN$ is the number of surface defect areas mistakenly detected as non-surface defect areas; $F_1$ is the harmonic mean of $P$ and $R$, and the closer it is to 1, the better the model is.
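The evaluation steps can be sketched as follows: the IOU of two axis-aligned boxes, then precision P = TP/(TP + FP), recall R = TP/(TP + FN), and F1 = 2PR/(P + R). The function names and the (x_min, y_min, x_max, y_max) box layout are assumptions made for illustration, as is returning 0 when a denominator is zero.

```python
IOU_THRESHOLD = 0.7  # threshold stated in the text


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(pred_box, true_box, pred_label, true_label):
    """A detection counts as correct when the IOU exceeds the threshold
    and the predicted defect type matches the marked one."""
    return pred_label == true_label and iou(pred_box, true_box) > IOU_THRESHOLD


def precision_recall_f1(tp, fp, fn):
    """Return (P, R, F1) from true positive, false positive,
    and false negative counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Two 10 x 10 boxes offset by 5 pixels: intersection 50, union 150.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))

# Hypothetical counts, not results from the study.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

The two offset boxes have an IOU of one third, well below the 70% threshold, so such a prediction would be counted as an incorrect detection.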