The main flow of this experiment is shown in
Figure 2. First, underwater camera calibration was performed to obtain the camera parameters for aberration correction. The acquired images were then processed along two parallel branches. In one branch, the underwater images were enhanced to reduce the effects of uneven illumination, and the enhanced images were stereo-matched to calculate disparity values and obtain the depth information of the scene. In the other branch, the acquired images were segmented to facilitate the labeling of key points in the next step; after labeling, the pixel body length of the tilapia could be calculated. The depth information and the pixel body length were then combined to reconstruct the tilapia in three dimensions and calculate its actual body length. Finally, the relationship between the body length and the mass of tilapia was modeled, so that the mass of a tilapia could be obtained by inputting its estimated body length.
2.3.1. Camera Calibration
The refraction of light when shooting in underwater environments can lead to image distortion. To ensure the accuracy of the results, the binocular camera must be calibrated and its distortion corrected underwater [14]. Zhang's method [15,16] was used, and a 12 × 9 checkerboard on an aluminum substrate was selected as the calibration plate; each square measured 30 mm × 30 mm.
With the position of the camera fixed, the position of the calibration plate was repeatedly changed, and several groups of images at different angles and positions were captured. The relative position of the binocular camera and the calibration board is shown in
Figure 3. Forty of these images were selected and split programmatically into separate left- and right-view images, which were stored in separate folders. The left and right views were then calibrated automatically using the calibration tools in MATLAB R2017b. The corner detection results for the calibration board are shown in
Figure 4. The optical aberrations in the images can be corrected using Equations (2)–(4):

$$X_1 = x\left(1 + k_1 t^2 + k_2 t^4\right), \qquad Y_1 = y\left(1 + k_1 t^2 + k_2 t^4\right) \tag{2}$$

$$X_2 = x + \left[2 p_1 x y + p_2\left(t^2 + 2x^2\right)\right], \qquad Y_2 = y + \left[p_1\left(t^2 + 2y^2\right) + 2 p_2 x y\right] \tag{3}$$

$$t^2 = x^2 + y^2 \tag{4}$$

where (x, y) are the coordinates of a point in the original image, normalized by the focal lengths fx and fy in each direction; (X1, Y1) and (X2, Y2) are the radially and tangentially corrected coordinates, respectively; k1 and k2 are the correction coefficients for radial distortion; p1 and p2 are the correction coefficients for tangential distortion; and t is the distance from the image pixel point to the center of the image.
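This correction corresponds to the standard distortion model implemented in OpenCV and can be applied directly once the coefficients are known. The sketch below shows the call; the intrinsics and coefficient values are placeholders, not the calibrated results of this study.

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients; the real values come
# from the underwater calibration described above.
fx, fy, cx, cy = 1200.0, 1200.0, 640.0, 360.0
k1, k2, p1, p2 = -0.12, 0.05, 0.001, -0.002

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
dist = np.array([k1, k2, p1, p2, 0.0])   # OpenCV order: k1, k2, p1, p2, k3

img = cv2.imread("left_underwater.png")  # hypothetical file name
undistorted = cv2.undistort(img, K, dist)
```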
The histogram of the reprojection errors from the first calibration in MATLAB R2017b is shown in Figure 5a. The maximum error in the first calibration reached 0.15 pixels, and several image groups clearly had a large impact on the overall error. To further improve the subsequent estimation accuracy, we removed the image groups with large errors. The histogram of the reprojection errors after removal is shown in Figure 5b; all errors were below 0.1 pixels.
The calculated camera parameters are shown in
Table 1. It can be seen that the rotation matrix R approximates the identity matrix. The first element of the translation vector T represents the distance between the optical centers of the two cameras. The nominal baseline of the binocular camera is 60 mm, and the calibrated value falls within the expected error tolerance.
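For readers who prefer a scripted workflow, the same checkerboard calibration can be reproduced with OpenCV instead of the MATLAB tools. The sketch below follows the board geometry given in the text (12 × 9 squares of 30 mm, i.e., 11 × 8 inner corners); the folder layout and file names are assumptions.

```python
import glob
import cv2
import numpy as np

pattern = (11, 8)  # inner corners per row and column of the 12 x 9 board
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 30.0  # 30 mm squares

obj_pts, left_pts, right_pts = [], [], []
for lf, rf in zip(sorted(glob.glob("left/*.png")), sorted(glob.glob("right/*.png"))):
    gl = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    okl, cl = cv2.findChessboardCorners(gl, pattern)
    okr, cr = cv2.findChessboardCorners(gr, pattern)
    if okl and okr:                       # keep only pairs detected in both views
        obj_pts.append(objp)
        left_pts.append(cl)
        right_pts.append(cr)

# Calibrate each camera individually, then solve for the stereo extrinsics.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, gl.shape[::-1], None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, gr.shape[::-1], None, None)

# R and T are the rotation and translation between the two cameras (cf. Table 1).
ret, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, gl.shape[::-1],
    flags=cv2.CALIB_FIX_INTRINSIC)
```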
2.3.2. Image Enhancement and Stereo Matching
Because the underwater images obtained in the experiment suffered from uneven illumination, this study preprocessed them before stereo matching. A Retinex-based image enhancement algorithm was used to enhance the underwater images, eliminating the uneven illumination while preserving the natural appearance of the fish. Several mainstream image enhancement methods were selected for comparison, including single-scale Retinex (SSR), multi-scale Retinex (MSR), and multi-scale Retinex with color restoration (MSRCR). A comparison of the underwater enhancement results is shown in
Figure 6.
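As a reference, MSR can be sketched in a few lines of Python with OpenCV: each scale is a single-scale Retinex term (log of the image minus log of its Gaussian-blurred illumination estimate), and the scales are averaged. The sigma values below are common defaults from the Retinex literature, not necessarily the parameters used in this study.

```python
import cv2
import numpy as np

def multi_scale_retinex(img, sigmas=(15, 80, 250)):
    """Minimal MSR sketch; sigmas are common literature defaults."""
    img = img.astype(np.float64) + 1.0           # offset to avoid log(0)
    msr = np.zeros_like(img)
    for sigma in sigmas:
        blur = cv2.GaussianBlur(img, (0, 0), sigma)  # illumination estimate
        msr += np.log(img) - np.log(blur)            # single-scale Retinex term
    msr /= len(sigmas)
    # stretch the result back to a displayable 8-bit range
    msr = (msr - msr.min()) / (msr.max() - msr.min() + 1e-12)
    return (msr * 255).astype(np.uint8)
```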
The enhanced underwater images also need to be evaluated quantitatively. In this study, two commonly used image quality metrics, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), were used to quantitatively analyze and compare the performance of the different algorithms.
In underwater image enhancement, PSNR can be used to judge whether an image has been over-enhanced. Ideally, the enhanced image should remain close to the original scene rather than exhibit unnatural enhancement. A higher PSNR value means a smaller difference between the enhanced image and the original image, which usually indicates better image quality without over-enhancement. The PSNR is calculated as follows:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}_A^2}{\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[A(i,j) - B(i,j)\right]^2}\right) \tag{5}$$
In the above formula, MAXA denotes the maximum pixel value of the image, m and n denote the numbers of rows and columns of pixels in the image, A(i, j) denotes the original image, and B(i, j) denotes the enhanced image. The four sets of images in Figure 6 are numbered from 1 to 4. The PSNR values of the different algorithms in the same environment are shown in Table 2.
Table 2 shows that the PSNR values of the MSRCR algorithm were higher than those of the other algorithms for the different images in the same environment, indicating that MSRCR performed best on this metric.
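For reference, Equation (5) translates into a short NumPy function (a minimal sketch; the inputs are assumed to be 8-bit images of equal size):

```python
import numpy as np

def psnr(a, b, max_a=255.0):
    """PSNR per Equation (5); a is the original image, b the enhanced one."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_a ** 2 / mse)
```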
SSIM is a metric used to assess the similarity between two images; it is computed from their means, variances, and covariance. A larger SSIM value indicates greater similarity between the images, and when two images are identical, the SSIM is equal to 1. For two images x and y, the structural similarity index between them is calculated as follows:
$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x\mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^2 + \mu_y^2 + c_1\right)\left(\sigma_x^2 + \sigma_y^2 + c_2\right)} \tag{6}$$

In the above equation, the means of x and y are denoted by μx and μy, respectively; σx² denotes the variance of x; σy² denotes the variance of y; σxy denotes the covariance of x and y; and c1 and c2 are small constants that stabilize the division. The structural similarity ranges from 0 to 1.
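A direct, global implementation of Equation (6) is sketched below. The constants follow the common choice c1 = (0.01 × 255)² and c2 = (0.03 × 255)², which is an assumption here; note also that practical SSIM implementations (e.g., in scikit-image) evaluate these statistics in a sliding window and average the result rather than computing them globally.

```python
import numpy as np

def ssim_global(x, y, c1=6.5025, c2=58.5225):
    """Global SSIM per Equation (6) for two same-sized grayscale images."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()                  # means
    vx, vy = x.var(), y.var()                    # variances
    cov = ((x - mx) * (y - my)).mean()           # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```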
Table 3 shows the SSIM index of different algorithms.
As can be seen from
Table 3, the structural similarity index of the MSR algorithm was better than that of the other two algorithms. Because stereo matching requires the enhanced image to remain as faithful as possible to the original, the two evaluation metrics and the final visual quality of the images were considered together. In this study, we chose the MSR algorithm to enhance the underwater images before stereo matching in order to make the images clearer.
After completing the underwater camera calibration and image enhancement, a stereo matching algorithm can be used to match corresponding pixels between the left and right camera images. The depth of each point was calculated from its disparity value and converted into a depth image, and the 3D coordinates of the corresponding target points were obtained to complete the 3D reconstruction. The depth of the target point corresponding to a pixel in the image was calculated using the similar-triangle principle, as shown in
Figure 7. In the figure, Ol and Or are the optical centers of the left and right cameras, respectively, and Xl and Xr are the horizontal coordinates of the pixels on the left and right imaging planes, respectively. The depth D of the target point P is related to the disparity value d by the following equation:

$$D = \frac{f \cdot B_l}{d \cdot L_p} \tag{7}$$

where D is the depth value, f is the focal length, Bl is the baseline length of the camera, d is the disparity value, and Lp is the edge length of a single pixel of the camera.
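Equation (7) translates into a one-line conversion from pixel disparity to metric depth; the focal length and pixel size below are placeholders rather than the calibrated values of this study.

```python
def depth_from_disparity(d_pixels, f_mm=2.8, baseline_mm=60.0, pixel_mm=0.003):
    """Depth per Equation (7): D = f * Bl / (d * Lp).
    f_mm and pixel_mm are illustrative placeholders."""
    return f_mm * baseline_mm / (d_pixels * pixel_mm)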
In this paper, we used the mainstream semi-global block matching (SGBM) algorithm [17,18] for stereo matching; the adopted parameters are shown in Table 4. The matching process was implemented with the relevant functions and methods integrated in the OpenCV library. The texture filtering step of the block matching algorithm, which helps to remove regions with low texture, is also integrated into the preprocessing stage of the SGBM implementation in OpenCV. An example of the stereo matching results is shown in Figure 8. Only small holes appear in the fish-body region of the disparity map [19], which meets the needs of size measurement.
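A sketch of the SGBM disparity computation with OpenCV is shown below. The parameter values are illustrative and are not the exact settings of Table 4, and left_enhanced / right_enhanced stand for the MSR-enhanced, rectified grayscale image pair.

```python
import cv2

block = 5
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,          # search range; must be a multiple of 16
    blockSize=block,
    P1=8 * 1 * block ** 2,       # smoothness penalties suggested by the OpenCV
    P2=32 * 1 * block ** 2,      # documentation for single-channel input
    uniquenessRatio=10,
    speckleWindowSize=100,       # speckle filtering removes small noisy patches
    speckleRange=2,
)
# compute() returns fixed-point disparities scaled by 16
disparity = sgbm.compute(left_enhanced, right_enhanced).astype("float32") / 16.0
```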
2.3.4. Methods of Estimating Body Length
The segmented fish body image was labeled with the coordinates of the muzzle (x1, y1) and the base of the caudal fin (x2, y2), as shown in Figure 11. Because stereo matching does not reliably produce high-density correspondences, a key point may have no valid depth value. In such cases, we acquired the depths of multiple points neighboring the key point and took their average as the depth of the key point. The pixel length (PL) of the tilapia was calculated with Equation (8). When acquiring depth information, we avoided sampling empty (hole) regions: the depth values of five different parts of the fish body were acquired, and their average Davg was taken as the depth value of the fish body, so as to minimize the error. The depth information was then combined with the triangle similarity principle to calculate the body length (BL) of the fish:

$$PL = \sqrt{\left(x_2 - x_1\right)^2 + \left(y_2 - y_1\right)^2} \tag{8}$$

$$BL = \frac{PL \cdot L_p \cdot D_{avg}}{f} \tag{9}$$
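The neighbor-averaging strategy and Equations (8) and (9) can be sketched as follows; the focal length and pixel size are again illustrative placeholders, not the calibrated values.

```python
import numpy as np

def neighborhood_depth(depth_map, x, y, r=3):
    """Average the valid (non-zero) depths in a small window around a key
    point, as described above for handling holes in the disparity map."""
    patch = depth_map[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1]
    valid = patch[patch > 0]
    return float(valid.mean()) if valid.size else 0.0

def body_length(p_snout, p_tail, d_avg, f_mm=2.8, pixel_mm=0.003):
    """Pixel length PL (Equation (8)) and body length BL (Equation (9))."""
    (x1, y1), (x2, y2) = p_snout, p_tail
    pl = np.hypot(x2 - x1, y2 - y1)        # Equation (8)
    return pl * pixel_mm * d_avg / f_mm    # Equation (9)
```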
In practice, it is difficult to capture a high-quality image of the fish at a perfect angle, i.e., with the fish body parallel to the camera's imaging plane [21,22]. In most cases, the fish body is not parallel to the imaging plane. As shown in Figure 12, the angle between the fish body and the imaging plane is a. We therefore need the depth information of the fish's muzzle and of the base of the caudal fin, whose pixel coordinates are combined for the 3D reconstruction of the fish body. In the figure, p1 and p2 are the x-axis pixel coordinates of the fish's muzzle and the base of the caudal fin, respectively, and d1 and d2 are their corresponding depth values. Finally, trigonometry is used to calculate the real length (RL) of the fish body:

$$a = \arctan\!\left(\frac{\left|d_2 - d_1\right|}{BL}\right), \qquad RL = \frac{BL}{\cos a} \tag{10}$$
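A minimal sketch of the tilt correction in Equation (10), taking the in-plane body length BL from Equation (9) and the two key-point depths:

```python
import numpy as np

def real_length(bl, d1, d2):
    """Real length per Equation (10): correct the in-plane body length bl
    for the tilt angle a between the fish body and the imaging plane,
    using the depths d1 and d2 at the muzzle and the caudal fin base."""
    a = np.arctan(abs(d2 - d1) / bl)
    return bl / np.cos(a)   # equivalent to sqrt(bl**2 + (d2 - d1)**2)
```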