3.2. Semi-SRUnet Modeling Framework
Formally, this paper defines the OM image as $x \in \mathbb{R}^{H \times W}$ and the label map for the Semi-SRUnet model as $y \in \{0, 1\}^{H \times W}$, with 0 denoting the background category and 1 denoting the grain-boundary category. The dataset $D$ used by the model consists of $n$ labeled samples and $N$ unlabeled samples, denoted as $D = D_l \cup D_u$, where $D_l = \{(x_i, y_i)\}_{i=1}^{n}$ and $D_u = \{x_j\}_{j=1}^{N}$.
The semi-supervised grain boundary segmentation model proposed in this paper is illustrated in Figure 2. The process involves the following steps. (i) The labeled data $D_l$ are used to train the teacher network. The teacher network learns the feature information of the grain boundaries from the OM images $x_i$ and generates predicted grain-boundary images $\hat{y}_i^t$ and boundary-regression images $\hat{r}_i^t$. These predictions are then compared with the ground-truth labels $y_i$ to compute the loss and adjust the parameters of the teacher network. (ii) The trained teacher network segments the unlabeled OM images $x_j$ containing grain boundaries and generates pseudo-label images $\hat{y}_j$. Subsequently, label repair is performed using the breakpoint-connection method to obtain the repaired pseudo-label images $\tilde{y}_j$. (iii) During training, we employ a random noise generation procedure that adds noise points, missing boundaries, and scratches to the labeled OM images $x_i$. Among these, missing boundaries are simulated by binarizing the OM image, extracting regions with significant grain-boundary features, and adding a mask to these regions; this more realistically reproduces the missing grain boundaries observed in real images. The noise generation strategy aims to enhance the model's generalization capability when faced with noise and distortions beyond those present in the training dataset. The noise-corrupted OM images $x_i'$ and their corresponding ground-truth labels $y_i$ are collected in a dataset named MixData, into which the unlabeled OM images $x_j$ and repaired pseudo-label images $\tilde{y}_j$ are also incorporated. Therefore, the MixData dataset is denoted as MixData $= \{(x_i', y_i)\}_{i=1}^{n} \cup \{(x_j, \tilde{y}_j)\}_{j=1}^{N}$. Subsequently, the MixData samples are fed into the student network, which outputs predictions $\hat{y}^s$ and $\hat{r}^s$; together with the predictions $\hat{y}^t$ obtained from the trained teacher network, these are used for knowledge distillation to accelerate the student's feature learning. Finally, the predicted output images are compared with the ground-truth labels to compute the loss value for the student network, and the parameters of the student network are adjusted accordingly to enhance model accuracy.
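As an illustration of the noise generation in step (iii), the three perturbation types could be sketched as follows (a minimal numpy sketch; the function name and all corruption parameters are our own illustrative choices, not the paper's implementation):

```python
import numpy as np

def add_random_noise(image, rng=None, n_points=200, n_scratches=2,
                     mask_frac=0.05):
    """Corrupt a grayscale OM image (H, W) with the three perturbations
    described in the text: noise points, scratches, and a masked region
    simulating missing boundaries. All parameters are illustrative."""
    rng = np.random.default_rng(rng)
    noisy = image.astype(np.float32).copy()
    h, w = noisy.shape

    # 1) Noise points: set random pixels to extreme intensities.
    ys = rng.integers(0, h, n_points)
    xs = rng.integers(0, w, n_points)
    noisy[ys, xs] = rng.choice([0.0, 255.0], n_points)

    # 2) Scratches: draw thin random line segments.
    for _ in range(n_scratches):
        x0, x1 = rng.integers(0, w, 2)
        y0, y1 = rng.integers(0, h, 2)
        steps = max(abs(int(x1) - int(x0)), abs(int(y1) - int(y0)), 1)
        for t in np.linspace(0.0, 1.0, steps):
            noisy[int(y0 + t * (int(y1) - int(y0))),
                  int(x0 + t * (int(x1) - int(x0)))] = 255.0

    # 3) Missing boundaries: mask a rectangular region with the mean
    #    intensity (a stand-in for masking binarized boundary regions).
    mh, mw = max(1, int(h * mask_frac)), max(1, int(w * mask_frac))
    y0 = rng.integers(0, h - mh + 1)
    x0 = rng.integers(0, w - mw + 1)
    noisy[y0:y0 + mh, x0:x0 + mw] = noisy.mean()
    return noisy
```

In practice the masked region would be chosen from binarized high-contrast boundary areas rather than at random, as the text describes.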
3.2.1. Teacher–Student Network
In this paper, we enhance U-Net by introducing SCConv and a boundary regression module to improve its spatial and channel information capture and its edge segmentation performance. This improved version is named SRUnet (U-Net enhanced with SCConv and boundary regression), as shown in Figure 3. SCConv is composed of two parts, the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU) (Figure 3a). SRU filters out irrelevant spatial details, and CRU optimizes key features across channels, helping the model focus on the most representative features even when the input image is distorted. SRUnet therefore inserts three SCConv modules in the encoder to avoid extracting redundant features, and the convolved information is fused with the up-convolution feature maps through skip connections. In the decoder, SRUnet likewise inserts three SCConv modules to capture the spatial and channel information after feature fusion. Finally, two convolutional layers with Rectified Linear Unit (ReLU) activations are applied to the features of the last up-convolution to preserve the detail information of local features and produce the output images. Additionally, three further convolutional layers and a ReLU activation are applied to the same up-convolution output to expand the receptive field, capture contextual information, and accurately locate boundaries, producing the boundary-regression images (Figure 3b).
In the semi-supervised grain boundary segmentation method proposed in this paper, the teacher network is based on SRUnet. The student network also uses SRUnet but without the SCConv modules, and it receives guidance from the teacher network through knowledge distillation. This approach avoids the need for the student model to learn from scratch, helps it find the correct direction more quickly, reduces the search space, and improves learning efficiency.
The output images and regression images of the SRUnet network are compared with the real ground-truth labels using two loss functions: binary cross entropy with logits ($\mathcal{L}_{BCE}$), defined in (1), and mean squared error ($\mathcal{L}_{MSE}$), defined in (2). The teacher uses the loss function $\mathcal{L}_{T}$ defined in (4), while the student's loss function $\mathcal{L}_{S}$, defined in (5), is based on the teacher's and introduces a knowledge distillation loss function ($\mathcal{L}_{KD}$) in the form of the Kullback–Leibler divergence defined in (3) to learn the prior knowledge of the teacher. The calculation formulas are as follows:

$\mathcal{L}_{BCE} = -\frac{1}{M}\sum_{i=1}^{M}\left[y_i \log \sigma(p_i) + (1 - y_i)\log\left(1 - \sigma(p_i)\right)\right]$ (1)

$\mathcal{L}_{MSE} = \frac{1}{M}\sum_{i=1}^{M}\left(r_i - y_i\right)^2$ (2)

$\mathcal{L}_{KD} = \frac{1}{M}\sum_{i=1}^{M}\sigma\left(p_i^{t}\right)\log\frac{\sigma\left(p_i^{t}\right)}{\sigma\left(p_i^{s}\right)}$ (3)

$\mathcal{L}_{T} = \lambda_1 \mathcal{L}_{BCE} + \lambda_2 \mathcal{L}_{MSE}$ (4)

$\mathcal{L}_{S} = \alpha \mathcal{L}_{BCE} + \beta \mathcal{L}_{MSE} + \gamma \mathcal{L}_{KD}$ (5)

where $M$ denotes the number of samples in the training set, $p^{s}$ denotes the predicted output from the student model, $p^{t}$ denotes the predicted output from the teacher model, $r$ denotes the regression images, $y$ denotes the real ground-truth labels, $\sigma$ denotes the sigmoid activation function, and $\lambda_1$, $\lambda_2$ and $\alpha$, $\beta$, $\gamma$ denote the weight ratios of the loss functions.
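For concreteness, the loss terms (1)–(5) can be sketched numerically as follows (a minimal numpy illustration under our reading of the definitions; variable and function names are ours, and only the positive-class term of the distillation KL is shown):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(p, y):
    """Eq. (1): binary cross entropy computed from raw logits p."""
    s = sigmoid(p)
    return float(-np.mean(y * np.log(s) + (1 - y) * np.log(1 - s)))

def mse(r, y):
    """Eq. (2): mean squared error between regression output r and label y."""
    return float(np.mean((r - y) ** 2))

def kd_kl(p_t, p_s):
    """Eq. (3): KL-style distillation term from teacher to student logits."""
    st, ss = sigmoid(p_t), sigmoid(p_s)
    return float(np.mean(st * np.log(st / ss)))

def teacher_loss(p_t, r_t, y, lam1=0.8, lam2=0.2):
    """Eq. (4): weighted teacher loss."""
    return lam1 * bce_with_logits(p_t, y) + lam2 * mse(r_t, y)

def student_loss(p_s, r_s, p_t, y, alpha=0.88, beta=0.02, gamma=0.1):
    """Eq. (5): weighted student loss with knowledge distillation."""
    return (alpha * bce_with_logits(p_s, y) + beta * mse(r_s, y)
            + gamma * kd_kl(p_t, p_s))
```

The default weights correspond to the values reported in Section 3.3; a framework implementation would use the equivalent built-in criteria instead.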
3.2.2. Pseudo-Label Repair
Since only a small amount of labeled data is used for supervised learning, the pseudo-labels predicted by the teacher network contain some errors compared to the real ground truth labels. Therefore, this paper presents an algorithm for pseudo-label repair, as outlined in Algorithm 1:
Algorithm 1: Pseudo-label inpainting
Input: Pseudo-label image $\hat{y}$.
1: Extract the skeleton of $\hat{y}$.
2: Detect breakpoints (non-zero pixels with exactly one non-zero 8-neighbor) to obtain the set $E$.
3: Connect breakpoint pairs in $E$ satisfying the distance threshold $L_1$ and angle thresholds $\theta_1$, $\theta_2$.
4: Detect fork points by shape morphology to obtain the set $F$; connect fork–breakpoint pairs satisfying $L_2$ and $\theta_1$.
5: Repeat steps 2–4 with relaxed thresholds until no further breakpoints can be connected.
6: Extend each remaining breakpoint along its direction until it reaches a grain boundary and connect it.
7: Dilate the result to restore the grain-boundary thickness.
Output: Repaired pseudo-label image $\tilde{y}$.
For Algorithm 1, the main steps are as follows. (1) Perform skeleton extraction on the input pseudo-label image $\hat{y}$. (2) Iterate over each non-zero pixel in the image, treating it as the center, and extract its eight neighboring pixels; if exactly one of these neighbors is non-zero, the central pixel is considered a breakpoint. This process continues until all breakpoints in the image are detected, resulting in a set of breakpoint coordinates $E$. (3) Iterate over any two breakpoints $E_i$ and $E_j$ of the set $E$. If the distance between the two points is less than the threshold $L_1$, and the angle difference $|\theta_i - \theta_j|$ is less than the threshold $\theta_1$ or greater than $(180° - \theta_2)$, then connect the two breakpoints $E_i$ and $E_j$. Here $\theta_i$ is the direction angle of the line segment at $E_i$, $\theta_j$ is the direction angle of the line segment at $E_j$, and $\theta_1$, $\theta_2$ are the angle thresholds. (4) Repeat the operation of (2) to detect breakpoints on the image and obtain a new set of breakpoints $E$. (5) Shape morphology is applied to detect all the fork points of the grain boundaries in the image, yielding the set of fork points $F$. (6) Traverse all combinations of fork points and breakpoints, denoted $F_i$ and $E_j$. Connect $F_i$ and $E_j$ if the distance between them is less than $L_2$ and the absolute value of $\theta_j$ minus $\mathrm{angle}(F_i, E_j)$ is less than $\theta_1$. (7) Return to (2) and continue with the following steps, decreasing $\theta_1$ and increasing the value of $L_1$, until $\theta_1 = 0$. (8) After the breakpoints and fork points are connected, repeat the operation of (2) to detect the remaining grain-boundary breakpoints. (9) Extend each remaining breakpoint along its direction angle, stepping by increments $\Delta x$ and $\Delta y$, until it reaches the coordinates of a grain boundary, and then connect the breakpoint with those coordinates. (10) Perform a dilation operation on the image with the breakpoints connected to restore the thickness of the grain boundaries to that of the real ground-truth labels. (11) Return the repaired pseudo-label image $\tilde{y}$.
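The breakpoint test in step (2), a skeleton pixel with exactly one non-zero 8-neighbor, can be sketched as follows (numpy; the function name is ours):

```python
import numpy as np

def find_breakpoints(skel):
    """Return (row, col) coordinates of skeleton endpoints: non-zero
    pixels with exactly one non-zero pixel among their 8 neighbors."""
    skel = (np.asarray(skel) > 0).astype(np.uint8)
    padded = np.pad(skel, 1)  # zero border so 3x3 windows never overflow
    pts = []
    for r, c in zip(*np.nonzero(skel)):
        # 3x3 window around (r, c); in the padded image the window
        # starting at (r, c) is centered on the original pixel.
        win = padded[r:r + 3, c:c + 3]
        if win.sum() - 1 == 1:  # subtract the center, count neighbors
            pts.append((int(r), int(c)))
    return pts
```

A one-pixel-wide line segment, for example, yields exactly its two endpoints.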
To avoid introducing noise due to erroneous connections between breakpoints, appropriate thresholds for both length and angle must be set. Based on the polygonal characteristics of grain boundaries, when connecting breakpoints, the target point typically falls within the angular range of (−60°, 60°) relative to the breakpoint. Similarly, when connecting a breakpoint to a fork point, the direction angle between the breakpoint and the fork point predominantly lies within the same range. Therefore, we conducted a grid search within a range of 0 to 100 pixels, performing multiple tests and observing the connection results to determine the optimal connection length as 30 pixels. Given that the side length of our images is 512 pixels, and considering that different image sizes may require different connection lengths, we converted the pixel length into a proportion of the image side length to represent $L_1$ and $L_2$; the final optimal threshold values for $L_1$ and $L_2$ are both determined to be 5.85% of the side length. Additionally, within the angular range of (−60°, 60°), we used the same testing method to determine the optimal angle thresholds as $\theta_1 = 30°$ and $\theta_2 = 20°$ (i.e., $180° - \theta_2 = 160°$). When the directions of the line segments at the two breakpoints are nearly opposite, with a difference close to 180°, the grain boundary is likely interrupted in the middle; therefore, $|\theta_i - \theta_j| > 180° - \theta_2$ also meets our connection criteria. However, breakpoint connection may still be impossible in some situations. For example, if the grain boundaries in the pseudo-label image are severely missing, the distances between breakpoints become excessively long, no grain-boundary point can be found by extending the breakpoints, and the breakpoint repair cannot be completed.
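The distance-and-angle criterion described above can be expressed as a single predicate (a sketch under our reading of the thresholds, with the distance threshold given as a fraction of the image side length; names and defaults are illustrative):

```python
import math

def can_connect(p1, theta1, p2, theta2, side=512,
                l1_frac=0.0585, ang1=30.0, ang2=20.0):
    """Breakpoint-pair connection criterion: distance below
    L1 = l1_frac * side, and direction angles either nearly aligned
    (|theta1 - theta2| < ang1) or nearly opposite
    (|theta1 - theta2| > 180 - ang2). Angles are in degrees."""
    dist = math.hypot(p1[0] - p2[0], p1[1] - p2[1])
    if dist >= l1_frac * side:
        return False
    d = abs(theta1 - theta2) % 360.0
    d = min(d, 360.0 - d)  # fold the difference into [0, 180]
    return d < ang1 or d > 180.0 - ang2
```

With the defaults, points up to about 30 pixels apart on a 512-pixel image can be connected when their direction angles differ by less than 30° or by more than 160°.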
Using Algorithm 1, skeleton extraction is first performed on the pseudo-label $\hat{y}$, as shown in Figure 4a. Subsequently, the thresholds $L_1 = 5.85\%$ (the proportion of the image's side length) and $\theta_1 = 30°$ are set to connect the breakpoint pairs that meet the connection conditions, as shown in Figure 4b. Next, the thresholds $L_2$ and $\theta_1$ are applied to connect breakpoints and fork points, as shown in Figure 4c. Then, line segments are extended from the remaining breakpoints until they reach grain-boundary points, as shown in Figure 4d. Finally, shape morphology is applied to expand the grain boundaries, and the image is black-and-white inverted after the breakpoints are connected, yielding the repaired pseudo-label image $\tilde{y}$, as shown in Figure 4e.
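The final expansion step can be sketched with a plain 3×3 binary dilation (numpy; the kernel size and iteration count are illustrative, an implementation might equally use OpenCV's morphological operators):

```python
import numpy as np

def dilate3x3(binary, iterations=1):
    """Binary dilation with a 3x3 square structuring element, used to
    thicken the one-pixel skeleton back toward label thickness."""
    out = (np.asarray(binary) > 0).astype(np.uint8)
    for _ in range(iterations):
        padded = np.pad(out, 1)
        acc = np.zeros_like(out)
        # OR together the 8 shifted copies plus the original.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                acc |= padded[1 + dr:1 + dr + out.shape[0],
                              1 + dc:1 + dc + out.shape[1]]
        out = acc
    return out
```

Each iteration grows the boundary by one pixel in every direction; a single isolated skeleton pixel becomes a 3×3 block.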
3.3. Training Details
In this paper, we empirically determine the optimal hyperparameters for the teacher and student networks. Both networks are trained using the RMSProp optimizer with a batch size of 2, a learning rate of 0.00001, and a weight decay of $1 \times 10^{-8}$ to prevent overfitting. The momentum is set to 0.9 to facilitate faster convergence and improve model performance. After experimentation and testing, the weights $\lambda_1$ and $\lambda_2$ of the loss function in the teacher network were set to 0.8 and 0.2, and the weights $\alpha$, $\beta$, and $\gamma$ of the loss function in the student network were set to 0.88, 0.02, and 0.1. In addition, this study was trained and tested on a computer with an Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10 GHz, an NVIDIA RTX 3090 GPU, Ubuntu 20.04, and PyTorch 1.10.0.