This section describes the images and accuracy obtained using the method described in Section 2. We used two types of microscopic images of color materials for training: images with a small (Figure 1a) and a large (Figure 1b) number of particles. Both image types show a mixture of carbon black pigment and resin; however, the compositions of the mixtures differed.
The accuracy of the images obtained by training was evaluated. The confusion matrix presented in Table 1 was used to determine the accuracy, recall, and precision, defined in Equations (8)–(10), respectively [15]. A true positive (TP) is a pixel classified as a particle in both the ground truth and the prediction image, whereas a false positive (FP) is a pixel classified as background in the ground truth but as a particle in the prediction image. A false negative (FN) is a pixel classified as a particle in the ground truth but as background in the prediction image, whereas a true negative (TN) is a pixel classified as background in both [16].
The accuracy is the proportion of correctly classified data among all data, that is, the proportion of TP and TN in the total of TP, TN, FP, and FN. The recall is the proportion of data correctly classified as positive among all data that should be classified as positive, that is, the proportion of TP in the total of TP and FN. The precision is the proportion of data that are actually positive among the data classified as positive, that is, the proportion of TP in the total of TP and FP. Higher accuracy, recall, and precision are preferable.
Here, the accuracy indicates the overall recognition ratio. However, the accuracy alone is insufficient, and additional metrics, namely the precision, recall, and F1-score, are required. The F1-score is defined as the harmonic mean of the precision and the recall and is a useful value for balancing these two metrics [17]. It is calculated as follows:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)
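For reference, all four metrics can be computed directly from a pair of binary masks, as in the following minimal sketch (segmentation_metrics is a hypothetical helper, assuming NumPy arrays; guards against division by zero are omitted for brevity):

```python
import numpy as np

def segmentation_metrics(ground_truth: np.ndarray, prediction: np.ndarray) -> dict:
    """Accuracy, recall, precision, and F1-score for binary segmentation masks.

    Both arrays have the same shape; nonzero pixels mark particles and
    zero pixels mark background.
    """
    gt = ground_truth.astype(bool)
    pred = prediction.astype(bool)
    tp = np.sum(gt & pred)      # particle in both ground truth and prediction
    fp = np.sum(~gt & pred)     # background predicted as particle
    fn = np.sum(gt & ~pred)     # particle predicted as background
    tn = np.sum(~gt & ~pred)    # background in both

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```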
3.1. Learning Based on U-Net Architectures
The training time for the conventional U-Net was 780 s, whereas the improved U-Net #1 and improved U-Net #2 required 450 s and 375 s, respectively, on the GPU (NVIDIA GeForce RTX 2080 Ti). The training time was thus significantly reduced by the network improvements. Inference with the trained model takes 2–3 s for any of the network architectures. In the improved U-Net #1, the number of required parameters was reduced by reducing the number of channels without changing the size of the input image, resulting in a dimensionality reduction and, consequently, a shorter training time. In the improved U-Net #2, the removal of the skip connections in the deeper layers is assumed to have reduced the training time because fewer processing steps are required.
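The following minimal PyTorch sketch is not the architecture used in this work; the depth, channel counts, and layer choices are illustrative assumptions. It only shows how the two modifications would be expressed: passing a smaller base_ch mimics the channel reduction of U-Net #1, and deep_skip=False mimics the removal of a deep skip connection as in U-Net #2.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, as in a standard U-Net stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net illustrating the two modifications discussed above."""

    def __init__(self, base_ch: int = 64, deep_skip: bool = True):
        super().__init__()
        # Channel reduction (cf. improved U-Net #1): pass a smaller base_ch
        # (e.g., 32) so every stage has fewer channels and fewer parameters.
        self.enc1 = conv_block(1, base_ch)
        self.enc2 = conv_block(base_ch, base_ch * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base_ch * 2, base_ch * 4)
        self.up2 = nn.ConvTranspose2d(base_ch * 4, base_ch * 2, 2, stride=2)
        # Skip removal (cf. improved U-Net #2): deep_skip=False drops the
        # deepest skip, so the decoder stage sees only upsampled features.
        self.deep_skip = deep_skip
        dec2_in = base_ch * 4 if deep_skip else base_ch * 2
        self.dec2 = conv_block(dec2_in, base_ch * 2)
        self.up1 = nn.ConvTranspose2d(base_ch * 2, base_ch, 2, stride=2)
        self.dec1 = conv_block(base_ch * 2, base_ch)
        self.head = nn.Conv2d(base_ch, 1, 1)  # particle/background logit map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        u2 = self.up2(b)
        d2 = self.dec2(torch.cat([u2, e2], dim=1) if self.deep_skip else u2)
        u1 = self.up1(d2)
        d1 = self.dec1(torch.cat([u1, e1], dim=1))
        return self.head(d1)  # apply a sigmoid to obtain probabilities
```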
Figure 8 shows the results of segmenting a colorant microscopic image with a small number of particles using the conventional U-Net, U-Net #1, and U-Net #2. For the upper image, the results of all the U-Net architectures are comparable. However, for the lower image, the large particle in the upper right corner is not properly segmented: the particle is almost undetectable with the conventional U-Net, and the segmentation is slightly improved with U-Net #1 and #2.
Figure 9 shows the results of the segmentation of a colorant microscopic image with a large number of particles using the conventional U-Net, U-Net #1, and U-Net #2. For the upper and lower images, the results obtained using the conventional U-Net and U-Net #1 are similar. However, several particles are not properly detected by U-Net #2.
Table 2 presents the segmentation accuracies for colorant microscopic images with a small number of particles using the different U-Net architectures. The accuracy is comparable for all the architectures. U-Net #2 has a higher recall but lower precision than the other architectures, whereas the accuracy of U-Net #1 is similar to that of the conventional U-Net.
Table 3 presents the corresponding segmentation accuracies for images with a large number of particles. Again, the accuracy is comparable for all the architectures, and U-Net #2 has a higher recall but lower precision than the others. The accuracy of U-Net #1 is similar to that of the conventional U-Net; in fact, their accuracies are identical.
In this study, the ground-truth images were manually created and are therefore not perfectly correct. In many cases, small particles in particular are not labeled as particles. On the other hand, there are very few areas where non-particles are incorrectly labeled as particles, because, when creating the ground-truth images, we set a rule that parts are not labeled as particles if it is difficult to determine whether they are particles. Under this rule, the areas labeled as particles during ground-truth creation are almost certainly particles. Therefore, in the accuracy evaluation in this study, priority was given to improving the recall, which indicates the percentage of particles in the ground-truth images that are properly classified as particles. Comparing the recall scores in Table 2 and Table 3, the values for U-Net #1 and U-Net #2 are improved over those of the conventional U-Net.
Figure 10a–c illustrates the training curves of the network architectures. Table 2 and Table 3 present the prediction results for the test data. The test accuracy is similar to that obtained on the training data, indicating that our model was not prone to overfitting.
To quantitatively assess the accuracy of the model, comparisons were made using the area under the receiver operating characteristic (ROC) curve (AUC). Table 4, Table 5 and Table 6 present the confusion matrices for images with a small number of particles for the conventional U-Net, improved U-Net #1, and improved U-Net #2, respectively.
Table 7, Table 8 and Table 9 present the confusion matrices for images with a large number of particles for the conventional U-Net, improved U-Net #1, and improved U-Net #2, respectively. Comparing the values in these tables, only an insignificant difference can be observed among the three architectures; the proposed method therefore does not significantly improve the confusion matrix. However, as noted above, a comparison of Figure 8 and Figure 9 shows visually that the improved networks detect particles more accurately than the conventional U-Net, so better results were obtained with the improved U-Net in terms of the appearance of the segmented particle areas. Comparing the AUC values shown in Table 10 and Table 11, the accuracy of the improved U-Net #1 is comparable to that of the conventional U-Net, whereas the accuracy of the improved U-Net #2 is slightly lower. These results demonstrate that the proposed method maintains the high quantitative accuracy of the conventional network while improving the visual appearance of the segmented particle regions. The number of particles in an image is inversely related to their size: the larger the number of particles, the smaller the particles. The AUC scores in Table 10 and Table 11 are therefore consistent with human perception, in that larger particles are easier to distinguish.
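For reference, a pixel-wise ROC AUC of this kind can be computed as in the following minimal sketch (pixelwise_auc is a hypothetical helper, assuming scikit-learn and that the probability maps are sigmoid outputs of the network):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pixelwise_auc(ground_truth: np.ndarray, probabilities: np.ndarray) -> float:
    """ROC AUC over all pixels of one or more segmentation maps.

    ground_truth: binary mask(s), 1 = particle, 0 = background.
    probabilities: predicted particle probabilities, same shape, in [0, 1].
    """
    # Flatten to pixel-level labels and scores, as required by roc_auc_score.
    return roc_auc_score(ground_truth.ravel(), probabilities.ravel())
```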
We calculated the F1-scores for each network architecture and compared the detection accuracy.
Table 12 presents the F1-scores for the conventional U-Net, improved U-Net #1, and improved U-Net #2. A score closer to 1 indicates better accuracy. The comparison of the values in the table shows that the conventional U-Net presents the best score and that the improved U-Net #2 has a similar score. The evaluation based on the F1-score thus indicates that the conventional U-Net and the improved U-Net #2 perform similarly in our application.
3.2. Learning U-Net #1 Based on Incomplete Labeling
Figure 11 displays the segmentation results for a colorant microscope image with a small number of particles when incomplete labeling data are used as the ground truth. Training with the original ground-truth data and with labeled data from which particles smaller than 200 px were removed correctly detects the particle regions despite the presence of noise. Training with labeled data from which particles smaller than 400 px were removed yields results similar to the correct answers; however, certain small particles are not detected. Training with labeled data from which particles smaller than 600 px were removed leaves more particles undetected than in the other cases.
Figure 12 shows the corresponding segmentation results for a colorant microscope image with a large number of particles. When training is performed with the original ground-truth data or with labeled data from which particles smaller than 200 or 400 px were removed, the particle regions are correctly detected, although several small particles are missed in the 400 px case. Training with labeled data from which particles smaller than 600 px were removed again leaves more particles undetected than in the other cases, which is undesirable.
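Incomplete labels of this kind can be generated automatically. A minimal sketch, assuming boolean ground-truth masks and scikit-image (remove_small_particles is a hypothetical helper; the thresholds correspond to the 200, 400, and 600 px cutoffs above):

```python
import numpy as np
from skimage.morphology import remove_small_objects

def remove_small_particles(mask: np.ndarray, min_area_px: int) -> np.ndarray:
    """Drop connected particle regions smaller than min_area_px pixels.

    mask is a ground-truth mask (nonzero = particle). Connected components
    with fewer than min_area_px pixels are relabeled as background,
    producing the incomplete labeling data described above.
    """
    return remove_small_objects(mask.astype(bool), min_size=min_area_px)

# Example: labels with particles smaller than 200 px removed.
# incomplete = remove_small_particles(ground_truth_mask, min_area_px=200)
```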
Table 13 and Table 14 present the segmentation-accuracy values obtained using U-Net #1 for colorant microscopic images with a small and a large number of particles, respectively, with incomplete labeling data used as the supervised data. The accuracy is comparable for all the labeled data. However, when particles smaller than 400 or 600 px are removed from the labeled data, the precision is higher and the recall is lower.
The F1-score was calculated to compare the accuracy of learning with incomplete labeling data. As shown in Table 15, the F1-score is highest when training with data from which particles smaller than 200 px were removed, for both a small and a large number of particles. This result demonstrates that removing particles smaller than 200 px when creating the ground-truth images does not affect the learning results and simplifies the creation of the dataset.