Article

Counting of Underwater Static Objects Through an Efficient Temporal Technique

1 Deanship of Postgraduate Studies and Research, Umm Al-Qura University, Makkah 21955, Saudi Arabia
2 ETSI Telecomunicación, Universidad de Málaga, 29071 Malaga, Spain
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(2), 205; https://doi.org/10.3390/jmse13020205
Submission received: 2 December 2024 / Revised: 25 December 2024 / Accepted: 27 December 2024 / Published: 22 January 2025
(This article belongs to the Special Issue New Advances in Marine Remote Sensing Applications)

Abstract

Counting marine species is a challenging task for biologists and marine experts. This paper presents an efficient temporal technique for counting underwater static objects. The proposed method employs deep learning to detect objects over time and an efficient spatial–temporal algorithm to track them, allowing for accurate counting of objects within a given area. The technique is designed to handle challenges that arise in underwater environments, such as low visibility, object occlusion, and water turbulence. The approach is validated through experiments conducted on surveyed data of Nephrops norvegicus, one of the main commercial species in Europe. Nephrops spend most of their time inside their burrows, so tracking and counting burrows is the only way to monitor this species. The proposed technique accurately counts underwater static objects from their spatial–temporal values while minimizing false positives, and has potential applications in marine biology, environmental monitoring, and underwater surveillance.

1. Introduction

Most of the Earth's surface is covered by water, and water is a fundamental part of our lives. Marine species are one of the main sources of food for people, and scientists have worked for decades to learn more about these species and monitor their habitats. With the increase in camera-based underwater monitoring, it has become very difficult to analyze the data manually. Counting underwater species is especially hard with a manual process, since the human eye cannot track an object when similar objects appear in consecutive frames. With the advancement of technology in the last few decades, species can now be automatically detected and counted. Artificial Intelligence and computer vision play an important role in automatically detecting and counting underwater species. Underwater tracking and counting of species is an important area of research in marine biology and ecology. It involves the use of various technologies to monitor and track the movements and behaviors of different species in the ocean, from small planktonic organisms to large whales.
One common method for tracking and counting marine species is acoustic telemetry. This involves attaching small acoustic tags to individual animals, which emit unique signals that can be detected by underwater listening devices or receivers. By triangulating the signals from multiple receivers, researchers can track the movements and behavior of individual animals over time and gain insights into their migration patterns, foraging habits, and social interactions. Another method for underwater tracking and counting of species is video monitoring. This involves deploying underwater cameras or remotely operated vehicles (ROVs) to capture images and footage of marine life in their natural habitats. This can provide valuable information on species diversity, abundance, behavior, and interactions and can help researchers to better understand the ecological dynamics of different marine ecosystems.
Object counting in an underwater environment can be a challenging task due to low visibility and the presence of various marine species. Traditional methods such as manual counting or visual surveys are time-consuming and may be inaccurate. However, advancements in computer vision technology have enabled the development of automated object-counting systems using underwater cameras and image processing techniques. These systems can accurately and efficiently count objects such as fish, corals, and other marine organisms, providing valuable information for conservation and management efforts. Object counting is one of the biggest challenges in vision problems. One major cause of these challenges is the nature of the object itself, such as its shape, size, and movement in the underwater environment. The other major factor is the underwater environment, which has a complex background structure, poor visibility, water turbulence, and a complex seabed, all of which make object counting a complex and challenging problem. With the advancement of AI, many automated tools and mechanisms are available to count objects both underwater and on the ground. In an underwater environment, counting algorithms use a regression-based [1] or detection-based [2] approach to count the objects. Many tracking techniques in the literature are based on the OpenCV KCF tracker [3], optical flow [4], SORT [5], or the Kalman filter [6].
Computer vision plays a crucial role in object counting in underwater environments by automating the task and improving accuracy. Computer vision systems use image processing techniques to extract relevant features from underwater images, such as color, shape, and texture, to identify and count objects of interest. These systems can process large quantities of underwater images quickly and accurately, reducing the time and effort required for manual counting or visual surveys. Additionally, computer vision systems can be trained to recognize specific objects or species, enabling researchers and conservationists to obtain more detailed and targeted information about the underwater ecosystem. Overall, computer vision technology has revolutionized object counting in underwater environments, providing a powerful tool for marine research and conservation efforts.
There are several challenges that computer vision systems may face in underwater object counting and tracking. Those challenges are the following:
  • Low visibility: Underwater environments often have low visibility due to factors such as water turbidity, lighting conditions, and water depth. This can make it difficult for computer vision systems to accurately detect and track objects;
  • Object occlusion: Objects in underwater environments may be partially or fully occluded by other objects, such as rocks or plants. This can make it difficult for computer vision systems to accurately count and track objects;
  • Object variability: Objects in underwater environments can vary greatly in size, shape, and color, making it difficult for computer vision systems to accurately recognize and classify them;
  • Data variability: Underwater images may vary in quality, resolution, and lighting conditions, which can affect the accuracy of computer vision systems;
  • Limited training data: There may be a limited amount of training data available for specific underwater environments or species, which can limit the accuracy of computer vision systems;
  • Motion blur: Objects in underwater environments may move quickly or unpredictably, causing motion blur in images and making it difficult for computer vision systems to accurately track them.
The main difference between underwater static and moving object tracking is that static object tracking involves detecting and monitoring stationary objects in an underwater environment while moving object tracking involves detecting and monitoring objects that are in motion.
Static object tracking in underwater environments is often used for monitoring and counting fixed structures such as coral reefs, underwater structures or shipwrecks, and other stationary objects. It typically involves using computer vision techniques to detect and track changes in the environment over time, such as changes in the size or shape of the object or changes in lighting conditions. Once an object is identified, it can be tracked over time, allowing researchers to monitor changes to the object or surrounding environment.
Moving object tracking in underwater environments involves detecting and tracking objects that are in motion, such as fish, marine mammals, or boats. It requires more sophisticated computer vision algorithms that can track objects as they move through the water, often in complex and unpredictable ways. This typically involves using techniques such as motion estimation, feature tracking, and object recognition to identify and track objects over time. The goal is to provide accurate and reliable information on the location, velocity, and behavior of moving objects in the underwater environment.
In summary, while both underwater static and moving object tracking rely on computer vision techniques, they differ in terms of the types of objects being tracked and the algorithms used to detect and track them. Addressing these challenges requires the development of robust and adaptive computer vision algorithms that can handle these variations and provide accurate object counting and tracking in underwater environments.
There are thousands of species present in our marine ecosystem. Some of them are dynamic (i.e., they are in continuous motion all the time) while others are static and remain fixed to one point. Common examples of static objects are corals and the burrows of underwater species. One popular commercial species in Europe that lives in burrows on the seabed is Nephrops norvegicus, also known as the Norway lobster, which is present in European seas. In Figure 1, the species Nephrops norvegicus, commonly referred to as Nephrops, is depicted. This particular species can be found across a range of depths, specifically from 10 m to 800 m, in both the northeastern Atlantic waters and the Mediterranean Sea [7]. Nephrops inhabit these regions primarily where suitable sediment allows them to construct their burrows.
Nephrops create burrows on the seabed and hide inside them. The burrow architecture of Nephrops norvegicus stands out for its unique construction, featuring entrances that can be singular or multiple [8,9]. These entrances are easily distinguishable from those created by other burrowing creatures due to their specific attributes. Commonly, at least one entry point is shaped like a crescent moon, leading to a tunnel that slopes gently downward. Around these entry points, it is typical to observe a fan-shaped spread of sediment resulting from the excavation process. Additionally, signs such as scratch marks and tracks around the burrows are a frequent sight. In burrow systems with several entrances, the area connecting these openings tends to be noticeably elevated. Nephrops burrows are considered static underwater objects. The only way to monitor the habitat of Nephrops is to count the burrows, which indicate the presence of Nephrops. Monitoring the population levels of Nephrops is primarily conducted through underwater television (UWTV) surveys across various European locations. This approach, initially developed in Scotland during the 1990s, relies on detecting and counting the burrow systems within the established distribution zones of Nephrops [10]. These surveys are fundamental in evaluating Nephrops abundance, forming a critical part of the process for assessing and managing their stocks effectively [11]. A standard operating procedure is adopted at each station for burrow counting. The details include how many minutes are to be counted, warm-up session details, where to count on the screen, and removal of minute counts where footage quality deteriorates. The counting procedure follows blind and independent counts by two different experts.
The techniques proposed for tracking underwater objects perform well with objects in motion, like fishes and other underwater species. In our case, we are facing three major challenges while tracking the Nephrops burrow (static objects). The first challenge is the movement of the camera; our objects are not moving but the camera is moving in the forward direction, leaving the object behind. The second challenge is the characteristics and size of burrows that are not fixed, and each new burrow can vary in size and other characteristics. The third challenge is the angle/opening of the burrow. Each burrow opening can vary in direction and the angle of the burrow can also change. Due to these challenges, the traditional object-tracking mechanism is not very effective.
In this study, we propose an efficient technique for tracking and counting underwater static objects, utilizing spatial and temporal values. Our spatial–temporal technique tracks each Nephrops burrow based on its unique spatial and temporal characteristics, counting only the distinct burrows. This counting process involves analyzing the intersection values of detected burrows across consecutive frames. Our methodology unfolds in three stages: data collection and processing, detection, and counting of burrows. Data are sourced from Underwater TV (UWTV) surveys, specifically focusing on the Gulf of Cadiz (Functional Unit 30). Here, videos are recorded using a camera system mounted on a sledge, angled at 45° relative to the seabed. The collected data undergo processing to discard irrelevant frames. Given the extensive volume of video and image data, manual annotation and analysis is a daunting task, requiring significant time and effort. We adopted a manual annotation approach using the Microsoft VOTT image annotation tool in the Pascal VOC format, with all annotations validated by marine experts from the Working Group on Nephrops Surveys (WGNEPS). The second phase of our methodology involves the detection of Nephrops burrows. Image classification and detection in underwater settings present unique challenges compared to other forms of visual data. In this research, we employed the YOLOv3 (You Only Look Once) algorithm, a single-stage model known for its real-time processing and high accuracy, to identify Nephrops burrows using data from Functional Unit 30 (FU30). In the third and final phase, the outputs from the model are fed into our proposed tracking algorithm to enumerate unique burrows. We benchmarked the performance of our algorithm against existing OpenCV multiple tracking algorithms, identifying challenges and limitations specific to the nature of this problem.
We performed experiments on videos from the FU 30 station. The results show a mAP of more than 80% for Nephrops burrow detection, and the counting of burrows using the proposed spatial–temporal algorithm achieves an accuracy of up to 90%.
The rest of the paper is organized as follows: the background and related work are presented in Section 2, the proposed methodology is discussed in Section 3, Section 4 presents the experiments and results, and Section 5 concludes the paper.

2. Background and Related Work

Object detection and classification is a challenging computer vision problem, and researchers have developed many methods for these tasks. Existing object detection approaches use handcrafted feature-based models [12,13,14,15] and deep feature models [16]. Many approaches have been presented for underwater object tracking that are very useful for tracking and counting objects that have distinct features and move through the water.

2.1. Background

There are hundreds of underwater species in the ocean. Nephrops norvegicus (also known as the Norway lobster) is one of the most important commercial species in Europe. This species is distributed from 10 m to 800 m depth in the northeastern Atlantic waters and the Mediterranean Sea [7], where the sediment is suitable for constructing burrows. The species spends most of its time in muddy seabed sediments and leaves behind burrows with certain characteristics. These burrows are sometimes found in groups of two, three, or four, and sometimes a single burrow represents the presence of Nephrops. The burrows have certain characteristics, such as at least one opening with a crescent moon shape and a shallowly descending tunnel, which sometimes forms a delta-like tunnel opening. The openings of the burrows show scratches and tracks that distinguish them from those of other species. Individual Nephrops are counted by counting their burrows. Figure 2 shows the features of the Nephrops burrow system.

2.2. Related Work

Object counting is also one of the biggest challenges in vision problems. One major cause of these challenges is the nature of the object itself, such as its shape, size, and movement in the underwater environment. The underwater environment has a complex background structure, poor visibility, water turbulence, and a complex seabed, which make object counting a complex and challenging problem. Counting algorithms use regression-based [1,17,18,19,20,21] or detection-based [22,23] methods to count underwater objects. The regression-based methods generate a density map for an image and later integrate it over the image to count the objects; most regression-based counting algorithms are useful for counting objects in a single image. The detection-based methods are either two-stage detectors [24] or single-stage detectors [25,26]. The two-stage detector, also referred to as a sparse detector, uses two steps for detection: the first step generates candidate boxes in the image using a region proposal network, while the second step evaluates these proposals and generates the detections. Some of the most popular two-stage detectors are RCNN [24], Fast RCNN [27], Faster RCNN [2], SPPNet [28], and Pyramid Networks [29]. Single-stage detectors, on the other hand, use single-shot (dense) architectures. This type of detector processes the image only once and uses feature pyramid networks to detect the objects. Some state-of-the-art single-stage detectors are OverFeat [30], SSD [26], YOLO [22], RetinaNet [31], and EfficientDet [32]. The YOLO detector was introduced in 2016 with its first version, YOLOv1, by Redmon et al. [22], achieving 63.4 mAP on the Pascal VOC dataset; the second version, called YOLOv2 or YOLO9000, was introduced in 2017 by Redmon et al. [33], achieving 78.6 mAP on the same dataset used for YOLOv1. In 2018, Redmon et al. [34] introduced YOLOv3, which was evaluated on the more complex MS COCO dataset with 80 classes and achieved 57.9 mAP. In 2020, multiple versions of YOLO appeared from different authors, all evaluated on the MS COCO dataset and achieving higher mAP than YOLOv3: YOLOv4 by Bochkovskiy et al. [25] achieved 65.7 mAP, Scaled-YOLOv4 by Wang et al. [35] achieved 66.2 mAP, PP-YOLO by Huang et al. [36] achieved 65.2 mAP, and YOLOv5 by Ultralytics achieved 68.9 mAP. In 2021, YOLOX was introduced [37], achieving 69.6 mAP on the MS COCO dataset.

2.2.1. Object Counting in Images

In the literature, many methods have been used to automatically count marine objects. A recent survey by Li et al. [38] presents three categories of methods, based on sensor, vision, and acoustic technology, for counting underwater objects. Within computer vision, counting methods have been proposed for both still images and videos.
In underwater images, one of the most used techniques for object counting is object segmentation. It is an important method to differentiate the object of interest from the background based on its intensity value. Solahudin et al. [39] and Labuguen et al. [40] adopted threshold-based segmentation methods to count shrimps and fish. Jing et al. [41] used Sobel-operator-based edge detection to detect fish edges and obtain a count estimate. Other common methods for detecting and counting objects underwater are detection-based methods, where the key point is to select an appropriate classifier to identify and detect the objects accurately. Culverhouse and Pilgrim [42] used an Artificial Neural Network to count fish in underwater images. Fan et al. [43] used a Back Propagation Neural Network (BPNN) to classify the number of fish for counting. Object detection methods also use hand-crafted features for object classification and counting.

2.2.2. Object Counting in Videos

Counting objects in underwater videos is more challenging due to varying frame rates and background changes. Lau et al. [44] presented segmentation and SVM-based detection and tracking methods to count the Norway lobster. Sharif et al. [45] used Kalman and Hungarian methods to count fish in underwater videos from Shutterstock. Chuang et al. [46] presented a multi-object tracking algorithm based on a deformable multiple-kernel method to track and count fish from multiple locations and habitats. With the advent of deep learning, tracking and counting mechanisms have become more efficient. Huang et al. [47] presented a combination of deep learning and a 3D Kalman filter to count fish from underwater stereo images. Spampinato et al. [48] combined blob shape features and histogram matching to track fish underwater; using a moving-average algorithm, they achieved an accuracy of up to 85%.
Modasshir et al. [3] presented a mechanism to identify and count corals. They used RetinaNet [31] to identify and localize the coral samples from the dataset, and the OpenCV KCF tracker [49] to track them. Mohamed et al. [50] proposed a fish farm monitoring system to detect, track, and count fish. They used YOLOv3 to detect the fish in the farm and an optical flow algorithm to track the fish movements in each frame using the fish trajectories. Wageeh et al. [4] presented a method to detect fish in a fish farm; they also count the fish and build trajectories from the detections. The proposed method monitors fish farms using a combination of fish counting and trajectories, matching fish across consecutive frames according to the distance between them. Li et al. [51] proposed an adaptive multi-appearance model and tracking strategy for real-time fish tracking. Tanaka et al. [5] presented a fish tracking and counting method to count fish on deck. They trained YOLOv3 with 13,789 different images of fish for detection. The fish counter proposed in their approach is based on Simple Online Real-time Tracking (SORT) [52], which uses the Kalman filter to approximate the displacement of the fish between consecutive frames. Their approach detects and tracks the fish and then applies a post-processing algorithm to suppress false positives. Gaude et al. [6] proposed a method to track fish in a varying-turbidity environment using the Kalman filter. Table 1 summarizes some tracking algorithms that have been applied to track objects in the underwater environment.

2.2.3. Object Tracking

Object tracking plays a fundamental role in computer vision for locating and following an object. Many applications require accurate and real-time tracking of objects in videos. Tracking in videos encounters many challenges, such as object deformation, background clutter, occlusions, and lighting variations, and many tracking algorithms have been developed to address them. OpenCV provides multiple tracking algorithms that perform very well in tracking objects in videos and are very fast compared to detection algorithms. Here, we present the OpenCV tracking algorithms and discuss their pros and cons. We also apply these algorithms to track the Nephrops burrows and examine the results in detail.
  • BOOSTING Tracker
The BOOSTING tracker works on the same underlying algorithm as the HAAR cascade-based detector. The tracker trains itself at runtime based on the initial bounding box provided in the first frame. This is a very old algorithm that can track objects adequately, but in our experimental tests the tracking usually failed with this tracker.
  • MIL Tracker
The MIL tracker works on the same idea as the BOOSTING tracker, but during tracking it also considers the neighborhood of the object location to obtain positive examples, using Multiple Instance Learning (MIL). The MIL tracker performs very well compared to the BOOSTING tracker, but it can also lead to false positive tracking and does not recover from full occlusion.
  • KCF Tracker
The Kernelized Correlation Filter (KCF) tracker is the most commonly used tracker. It performs better in terms of speed and accuracy than the previous two trackers, but it can also lead to false positive tracking of the object.
  • TLD Tracker
The TLD (Tracking, Learning, and Detection) tracker combines detection and learning with tracking. It follows the object in every single frame and localizes it for tracking. This algorithm gives good results in videos but also produces many false positives, which degrades its efficiency.
  • MEDIANFLOW Tracker
The MEDIANFLOW tracker tracks the object in both forward and backward directions, which enables it to track the object with more accuracy. This tracker keeps track of the object and can identify when tracking fails. It works well with smooth object motion, but it fails and loses the track when the camera is moving.
  • MOSSE tracker
MOSSE stands for Minimum Output Sum of Squared Error; the tracker uses adaptive correlation filters. This tracker is robust and able to resume tracking if the object is lost for some frames.
  • CSRT tracker
The CSRT tracker uses a spatial reliability map for adjusting the filter and tracks the object by localizing the selected region. This tracker uses HoG and Colorname features to describe the object.
Some of the other famous tracking algorithms are the following:
  • DeepSORT
DeepSORT is one of the most widely used tracking algorithms. An object can be tracked with DeepSORT over a longer period because the algorithm integrates the appearance information of the object into the tracking.
  • Object Tracking MATLAB
MATLAB's Computer Vision Toolbox provides object tracking in videos. The toolbox uses CAMShift and Kanade-Lucas-Tomasi (KLT) algorithms for tracking the object.
  • MDNet
MDNet is a CNN-based tracking algorithm. It is mostly used for real-time object tracking, but it is highly expensive in terms of computation and speed.
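For reference, the benchmark in Section 4 exercises the OpenCV trackers listed above through a common interface. The Python sketch below shows one way to create and update them frame by frame; the constructor names assume the OpenCV contrib "legacy" API (in OpenCV 3.x the same trackers are exposed directly, e.g., cv2.TrackerKCF_create), and the video path and initial bounding box are placeholders rather than values from our survey data.

import cv2

# Constructor names assume the OpenCV contrib "legacy" API (opencv-contrib-python >= 4.5).
TRACKER_FACTORIES = {
    "BOOSTING":   cv2.legacy.TrackerBoosting_create,
    "MIL":        cv2.legacy.TrackerMIL_create,
    "KCF":        cv2.legacy.TrackerKCF_create,
    "TLD":        cv2.legacy.TrackerTLD_create,
    "MEDIANFLOW": cv2.legacy.TrackerMedianFlow_create,
    "MOSSE":      cv2.legacy.TrackerMOSSE_create,
    "CSRT":       cv2.legacy.TrackerCSRT_create,
}

def run_tracker(video_path, init_bbox, name="CSRT"):
    # Track a single object through a video, starting from init_bbox = (x, y, w, h).
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("Cannot read video: " + video_path)
    tracker = TRACKER_FACTORIES[name]()
    tracker.init(frame, init_bbox)                 # bounding box chosen on the first frame
    tracked_boxes = [init_bbox]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, bbox = tracker.update(frame)           # ok is False when tracking is lost
        tracked_boxes.append(tuple(map(int, bbox)) if ok else None)
    cap.release()
    return tracked_boxes

# Example with placeholder values:
# boxes = run_tracker("RF09_Min1.mp4", (120, 80, 60, 45), name="KCF")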
The method proposed in this study tracks the burrows using a spatial–temporal technique. The techniques presented in the literature are not able to track the burrows accurately and produce many false positives due to variations in the angle and characteristics of the burrows.

3. Proposed Methodology

The proposed methodology is presented in Figure 3. The data are collected through the annual WGNEPS survey. The collected data are passed through a preprocessing stage, and the detection model is trained using the prepared data. The proposed spatial–temporal algorithm is run with the trained model to track the detected burrows and count the unique burrows.

3.1. Data Collection and Processing

Data were gathered using a unique sledge equipped with cameras, illumination, a laser, and a sensor. The research utilizes data from the 2018–19 survey in the Gulf of Cadiz (FU 30). The recordings were captured at a rate of 25 frames per second under optimal lighting conditions. Each station at FU 30 produced video recordings lasting between 10 and 12 min. In 2018, a total of 70 underwater television (UWTV) stations underwent surveying. However, due to inadequate visibility and lighting, 10 of these stations were excluded from the analysis.
Data from FU 30 undergoes a transformation into individual frames. This dataset includes numerous frames characterized by uneven lighting, low light conditions, and subpar contrast. Frames lacking visible burrows or suffering from poor visibility are eliminated in the annotation stage, along with successive frames that present redundant information. The process of manually marking the burrows employs the Microsoft VOTT [53] image annotation tool, utilizing the Pascal VOC format. The resulting XML annotation files detail the image name, class name (Nephrops), and bounding box coordinates for each highlighted object within the image. Experts in marine sciences from Spain conducted a review of the annotated images to ensure the accuracy and quality of the ground-truth data. Following the validation process, the data are segmented into sets for training and testing, with the training set being utilized for model training purposes.
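For completeness, the short Python sketch below illustrates how the Pascal VOC XML files exported by VoTT can be read into bounding-box records for training and evaluation. It assumes the standard Pascal VOC schema (filename, object/name, object/bndbox) and a placeholder annotation directory; it is an illustrative reading routine, not the exact pipeline used in this work.

import xml.etree.ElementTree as ET
from pathlib import Path

def load_voc_annotations(annotation_dir):
    # Read Pascal VOC XML files (as exported by VoTT) into a list of box records.
    records = []
    for xml_file in sorted(Path(annotation_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        image_name = root.findtext("filename")
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            records.append({
                "image": image_name,
                "label": obj.findtext("name"),              # e.g., "Nephrops"
                "xmin": int(float(box.findtext("xmin"))),
                "ymin": int(float(box.findtext("ymin"))),
                "xmax": int(float(box.findtext("xmax"))),
                "ymax": int(float(box.findtext("ymax"))),
            })
    return records

# Example with a placeholder path:
# annotations = load_voc_annotations("FU30/annotations")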

3.2. Nephrops Burrow Detections

To automatically detect and classify the Nephrops burrow systems, a deep learning-based system is proposed that takes underwater video data as input. The system learns hierarchical characteristics from the input and identifies burrows within every frame of the input video. The detections across all frames are then aggregated into the overall tally of Nephrops burrows. To train the model, transfer learning [25] is utilized to fine-tune the YOLOv3 network. YOLOv3 uses the Darknet backbone, which originally has 53 layers; another 53 layers are added for detection, resulting in a 106-layer fully convolutional architecture. Figure 4 shows the YOLOv3 architecture. YOLOv3 [34] gives better results than the previous region-based convolutional neural networks, which require thousands of evaluations to predict objects in an image. YOLO, in contrast, passes the image through the neural network only once, which is why it is called "You Only Look Once". YOLOv3 has five layer types in general: "convolutional layer", "upsample layer", "route layer", "shortcut layer", and "yolo layer".
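As an illustration of the detection stage, the Python sketch below runs a trained YOLOv3 (Darknet) model on a single frame using OpenCV's DNN module and applies non-maximum suppression. The configuration file, weights file, input size, and thresholds are placeholders, and running inference through OpenCV is only one convenient option; it is not necessarily the exact tooling used for our experiments.

import cv2
import numpy as np

def load_detector(cfg="yolov3_nephrops.cfg", weights="yolov3_nephrops.weights"):
    # File names are placeholders for a Darknet config and trained weights.
    return cv2.dnn.readNetFromDarknet(cfg, weights)

def detect_burrows(net, frame, conf_thresh=0.5, nms_thresh=0.4):
    # Return (x, y, w, h, confidence) boxes for one frame.
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, scores = [], []
    for out in outputs:
        for det in out:                            # det = [cx, cy, bw, bh, objectness, class scores...]
            confidence = float(det[4] * det[5:].max())
            if confidence < conf_thresh:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            scores.append(confidence)

    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [tuple(boxes[i]) + (scores[i],) for i in np.array(keep).flatten()]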

3.3. Tracking and Counting of Burrows

The proposed algorithm uses the spatial and temporal values of the object for tracking and counting. The proposed spatial–temporal technique tracks each burrow based on its spatial and temporal values and counts only the unique burrows. The unique burrows are counted using the intersection values of detected burrows in consecutive frames. Algorithm 1 presents the tracking and counting of burrows using the spatial–temporal value of each burrow. The tracking algorithm runs in parallel with the detection algorithm.

Tracking and Counting Algorithm

The algorithm receives two inputs: an input video V and a threshold value λ (a threshold on the amount of overlap between detections in consecutive frames). The output of the algorithm is the unique count of burrows in the given video. The first step is to detect the burrows in the video frames. The input video is passed to the method Detect_Nephrops_Burrows(V), which converts the video into frames and uses a YOLOv3 detector to detect the burrows in each individual frame. The output of the method is a list of frames I, each with a set of detected Nephrops burrows and their spatial values. The spatial value of each detection is the bounding box {x, y, w, h}, where (x, y) are the coordinates of the initial pixel of the bounding box and w, h are its width and height. For each frame f in I, the algorithm loops through each detection of the current frame and obtains its spatial value using the Get_Spatial_Value(b) method. The current detection is stored in Indexfb and added to the list of burrow counts N. The current detection is marked with a flag, and the algorithm continues to mark each detection of the current frame. Once all detections of the current frame are marked, the algorithm saves the detections with their spatial values to Index(f−1)b and moves to the next frame. In the next consecutive frame, the algorithm again obtains the spatial value of each detection using Get_Spatial_Value(b). Each detection of the current frame is then tracked by comparing it with the previous frame detections stored in Index(f−1)b. The Compare_Overlapping(Indexfb, Index(f−1)b) method compares the bounding box values of two detections. For this comparison, the method does not use the traditional overlap metric IoU, because the position of a detected burrow varies from frame to frame due to the movement of the camera. Instead of calculating the IoU, the algorithm calculates the intersection value of each comparison. This value is stored in the variable delta and compared with the given threshold λ. If delta is greater than or equal to λ, the same burrow has been detected again and it is not counted again; otherwise, the counter list of that frame is updated with a new burrow count. The whole video V is processed in the same way. In the end, the counter values of the frames are accumulated and the unique number of burrows is returned.
Algorithm 1: Tracking and Counting
Input: V, λ, where V is an input video and λ is a threshold value for object overlapping
Result: N = {N1, N2, ..., Nn}, where N are the unique objects and NC is the count of unique burrows
Begin
    I = Detect_Nephrops_Burrows(V)
        // I = {I1, I2, ..., In} is the list of frames; each frame Ii = {B1, B2, ..., Bn} contains n
        // bounding boxes, and each box Bj = {xj, yj, wj, hj}, where (xj, yj) are the coordinates of
        // the initial pixel of bounding box j and wj, hj are its width and height.
    count = 0
    Foreach frame f ∈ I do
        Foreach bounding box b ∈ f do
            Indexfb = Get_Spatial_Value(b)
            if (flag)
                delta = Compare_Overlapping(Indexfb, Index(f−1)b)
                if delta < λ then
                    Nfb++
                endif
            endif
        endFor
        N.add(Nfb)
        flag = true
    endFor
    return N
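A minimal Python sketch of Algorithm 1 is given below, assuming per-frame detections are already available as (x, y, w, h) boxes (for example, from the YOLOv3 detector). The paper only prescribes an intersection value compared against the threshold λ; normalizing that intersection by the smaller of the two box areas, and the example threshold value, are illustrative assumptions made so the sketch is self-contained.

def intersection_ratio(a, b):
    # Intersection area of two (x, y, w, h) boxes, normalized by the smaller box area.
    # (Normalizing by the smaller area rather than the union, as IoU does, is an
    # illustrative choice: burrow boxes shrink and grow between frames as the sledge moves.)
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    smaller = min(aw * ah, bw * bh)
    return (ix * iy) / smaller if smaller > 0 else 0.0

def count_unique_burrows(frame_detections, lam=0.3):
    # frame_detections: list over frames, each a list of (x, y, w, h) boxes.
    # A detection is counted as new only if its overlap with every detection from the
    # previous frame stays below lam; lam = 0.3 is a placeholder threshold.
    count = 0
    previous = []
    for boxes in frame_detections:
        for box in boxes:
            if not any(intersection_ratio(box, prev) >= lam for prev in previous):
                count += 1                         # no match in the previous frame -> unique burrow
        previous = boxes                           # current detections become the reference set
    return count

# Toy example over three frames (placeholder coordinates): the first two frames contain
# the same burrow, the third a new one, so the count is 2.
# frames = [[(100, 100, 50, 40)], [(110, 102, 48, 42)], [(300, 200, 40, 35)]]
# count_unique_burrows(frames)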

4. Experiments and Results

In this section, we analyze the outcomes of various experiments conducted on the tracking and counting of Nephrops burrows. The algorithm for tracking and counting these burrows was implemented on the FU 30 dataset. We employed the YOLOv3 model for object detection, which was specifically trained using the FU 30 dataset. This model underwent training for a total of 100k iterations, with a log recorded at every 10,000 iterations to facilitate evaluation.
In this study, we employed various OpenCV tracking algorithms as a benchmark to compare the effectiveness of our proposed tracking algorithm. The OpenCV trackers used for this comparison included Boosting, MIL, KCF, TLD, MedianFlow, MOSSE, and CSRT. To conduct this comparative analysis, we utilized a nine-minute video from the FU 30 station, which was divided into nine segments of one minute each. Each segment was first analyzed using the OpenCV trackers and then with our proposed tracking and counting algorithm. This approach allowed us to present a comprehensive comparison of the results, showcasing both quantitative and qualitative aspects.
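As a sketch of how such fixed-length segments can be produced, the snippet below cuts an input station video into consecutive one-minute clips with OpenCV, assuming the 25 fps recording rate described in Section 3.1. The paths, output codec, and naming scheme are placeholders rather than the exact tooling used for the survey footage.

import cv2

def split_into_segments(video_path, out_prefix, segment_seconds=60, fps=25):
    # Split an input video into consecutive fixed-length segments (e.g., one minute each).
    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    frames_per_segment = segment_seconds * fps

    writer, segment, frame_idx = None, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % frames_per_segment == 0:    # start a new segment
            if writer is not None:
                writer.release()
            segment += 1
            writer = cv2.VideoWriter("%s_Min%d.mp4" % (out_prefix, segment),
                                     fourcc, fps, (width, height))
        writer.write(frame)
        frame_idx += 1
    if writer is not None:
        writer.release()
    cap.release()

# Example with placeholder names: split_into_segments("RF09.mp4", "RF09")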

4.1. Quantitative Analysis

In the quantitative analysis, an annotated video with a frame rate of 25 fps is divided into nine temporal segments. Each segment is used for the detection and counting of burrows. The detection and tracking algorithms run together to find the unique number of burrows in each temporal segment separately. The number of detections produced by YOLOv3 was recorded for each temporal segment. The detections were then processed through the proposed tracking and counting algorithm to identify unique burrows, and the algorithm counts the number of burrows after each temporal segment. Table 2 shows the complete results for each temporal segment, listing the tracking and counting together with the frame number of each detected burrow. False positive (FP) detections are not considered here; the aim is to test the proposed algorithm on the true positive (TP) tracking and counting of Nephrops burrows.

4.2. Qualitative Analysis

In this section, the performance of the proposed tracking and counting algorithm is measured qualitatively. The blue bounding boxes on the images shown in this section are the original detections obtained from the models with the tracking in the consecutive frames.
Figure 5 shows the identification, tracking, and counting of a Nephrops burrow in consecutive frames extracted from the 'RF09_Min1' temporal segment. The figure shows the detection in frame 1, where the proposed tracking algorithm also starts tracking the burrow. The burrow is detected in the following consecutive frames, and the tracking module tracks it based on the spatial and temporal analysis. The figure shows eight different frames from the first 50 frames to illustrate how the tracking algorithm works and counts the burrow. The burrow count does not increase in this case, as the same burrow is identified and tracked in each frame. Hence, the proposed work is able to count the number of unique Nephrops burrows in a video sequence.
Figure 6 shows a similar result to the previous example. These frames are extracted from the 'RF09_Min3' temporal segment. The figure shows eight consecutive frames from the input video. The burrow is detected and tracked in each frame until it disappears from the visibility window. In each frame, the burrow is detected with a different confidence value and a different size, which is the main reason known tracking algorithms fail. The proposed spatial–temporal tracking algorithm tracks the burrow based on its spatial intersection values. As can be clearly seen in the figure, the detected burrow bounding boxes differ in size from frame to frame, but the algorithm is able to track them and count them as one burrow.
Figure 7 shows the tracking results from the Boosting tracking algorithm. The results clearly show that the algorithm tracks well in the first few frames but, as the camera starts moving, it loses the track and starts tracking some other part of the frame. The first frame in the top left corner shows a burrow being tracked over the frames; in the last frame in the bottom right corner, the tracker has lost the burrow but still reports a track. This leads to an inaccurate count of burrows.
Similarly, Figure 8 shows the results from the CSRT tracking algorithm. The burrow is tracked well in the initial few frames; after that, the tracking algorithm loses its position and coordinates. The last four frames in the figure show the wrong tracking of the burrow. CSRT is less effective than the Boosting algorithm.
Figure 9 shows the results of the KCF tracking algorithm. KCF lost the burrow and its tracking information, which led to a wrong count of burrows. In the later frames, the algorithm continues to track a region at the bottom of the window.
The median flow tracking algorithm shows good results but loses the information in the later frames as other OpenCV tracking algorithms do. Figure 10 shows the tracking results obtained by the median flow tracking algorithm.
The MIL tracking algorithm runs fine as long as the Nephrops burrow is visually present in the frames, but it then loses the information and maps the tracking to the wrong place. Figure 11 shows the MIL tracking algorithm results.
The MOSSE tracking algorithm lost the Nephrops burrow information at an early point and lost the tracking. Figure 12, below, shows the results of the MOSSE tracking algorithm.
Finally, the TLD tracking algorithm is applied to the same data. This algorithm is not able to track the burrows properly from the initial frames and leads to an inaccurate count. Figure 13 shows the results of the TLD tracking algorithm.

5. Conclusions

This study introduced an innovative spatial–temporal tracking technique for efficiently counting static underwater objects, focusing specifically on Nephrops norvegicus burrows. Our approach, leveraging the spatial and temporal characteristics of each burrow, has demonstrated effectiveness in distinguishing and counting unique burrow instances. The utilization of YOLOv3, a state-of-the-art real-time object detection algorithm, has proven to be a significant stride in accurately identifying these burrows. This technique not only surpasses traditional manual annotation and analysis in terms of efficiency but also offers a high degree of accuracy and reliability, as confirmed by experts from WGNEPS. Our methodology's distinct three-step process, encompassing data collection and processing from UWTV surveys, object detection using deep learning techniques, and the subsequent counting of burrows through our tracking algorithm, has showcased a comprehensive approach to addressing the challenges of underwater object detection and enumeration. Our proposed tracking and counting algorithm uses the spatial and temporal values of the object for tracking and counting burrows. The results are compared with various OpenCV tracking algorithms to measure the effectiveness of the proposed algorithm. The algorithm is able to track and count the unique burrows in consecutive video frames.
Looking forward, there are several avenues for further exploration and enhancement of this methodology. Firstly, expanding the dataset to include additional underwater environments and varying conditions would be beneficial to test the versatility and robustness of the technique. This expansion could also include data from different geographical locations and depths, providing a broader scope for analysis. Secondly, the integration of more advanced machine learning algorithms, such as deep reinforcement learning and neural networks with higher computational capacities, could potentially increase the accuracy and speed of burrow detection and tracking. Another promising direction is the application of this technique to other marine species and environmental features. By adapting the algorithm to recognize different types of underwater habitats and organisms, the methodology could serve a wider range of ecological and marine biological studies. Lastly, incorporating real-time data analysis capabilities would enable immediate processing and the interpretation of underwater footage, opening up possibilities for dynamic ecological monitoring and rapid response to environmental changes.
In conclusion, while our current methodology marks a significant contribution to the field of marine biology and underwater imaging, the potential for future enhancements and applications remains vast and promising.

Author Contributions

Data collection, A.N. and E.N.; images annotation, research methodology, and implementation, A.N. and E.N.; validation, E.N.; Counting and Tracking algorithm implementation and testing, A.N. and E.N.; writing—original draft preparation, A.N.; writing—review and editing, E.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by Umm Al-Qura University, Saudi Arabia under grant number: 25UQU4330909GSSR01.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because they are the property of the marine institute in Spain.

Acknowledgments

The authors extend their appreciation to Umm Al-Qura University, Saudi Arabia for funding this research work through grant number: 25UQU4330909GSSR01. We thank the CN Spanish Oceanographic Institute, Cadiz, Spain, for providing the dataset for research. The first author acknowledges that this paper is based on Section 5 of their doctoral dissertation submitted to the University of Malaga [55].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Boominathan, L.; Kruthiventi, S.S.; Babu, R.V. Crowdnet: A deep convolutional network for dense crowd counting. In Proceedings of the 2016 ACM on Multimedia Conference, Amsterdam, The Netherlands, 15–19 October 2016; pp. 640–644. [Google Scholar]
  2. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  3. Modasshir, M.; Rahman, S.; Youngquist, O.; Rekleitis, I. Coral Identification and Counting with an Autonomous Underwater Vehicle. In Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 12–15 December 2018; pp. 524–529. [Google Scholar] [CrossRef]
  4. Wageeh, Y.; Mohamed, H.E.D.; Fadl, A.; Anas, O.; ElMasry, N.; Nabil, A.; Atia, A. YOLO fish detection with Euclidean tracking in fish farms. J. Ambient. Intell. Human Comput. 2021, 12, 5–12. [Google Scholar] [CrossRef]
  5. Tanaka, R.; Nakano, T.; Ogawa, T. Sequential Fish Catch Counter Using Vision-based Fish Detection and Tracking. In Proceedings of the OCEANS 2022—Chennai, Chennai, India, 21–24 February 2022; pp. 1–5. [Google Scholar] [CrossRef]
  6. Gaude, G.S.; Borkar, S. Fish Detection and Tracking for Turbid Underwater Video. In Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; pp. 326–331. [Google Scholar] [CrossRef]
  7. Rice, A.L.; Chapman, C.J. Observations on the burrows and borrowing of two mud-dwelling decapod crustaceans Nephrops norvegicus and Goneplax romboides. Mar. Biol. 1971, 10, 330–342. [Google Scholar] [CrossRef]
  8. Workshop on the Use of UWTV Surveys for Determining Abundance in Nephrops Stocks throughout European Waters. 2007, p. 198. Available online: https://www.ices.dk/sites/pub/CM%20Doccuments/CM-2007/ACFM/ACFM1407.pdf (accessed on 20 April 2022).
  9. Report of the Workshop and Training Course on Nephrops Burrow Identification (WKNEPHBID). Available online: https://archimer.ifremer.fr/doc/00586/69782/67673.pdf (accessed on 20 April 2022).
  10. 2016/SSGIEOM:34; Report of the Workshop on Nephrops Burrow Counting, WKNEPS 2016 Report 9–11 November 2016. ICES: Reykjavík, Iceland, 2016; p. 62.
  11. Leocadio, L.; Weetman, A.; Wieland, K. Using UWTV Surveys to Assess and Advise on Nephrops Stocks; ICES Cooperative Research Report, No. 340; ICES: Lorient, France, 2018; p. 49. [Google Scholar]
  12. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
  13. Dollár, P.; Appel, R.; Belongie, S.; Perona, P. Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1532–1545. [Google Scholar] [CrossRef] [PubMed]
  14. Dollár, P.; Tu, Z.; Perona, P.; Belongie, S. Integral channel features. In Proceedings of the BMVC 2009, London, UK, 7–10 September 2009. [Google Scholar]
  15. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645. [Google Scholar] [CrossRef] [PubMed]
  16. Song, H.A.; Lee, S.-Y. Hierarchical representation using NMF. In International Conference on Neural Information Processing; Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8226, pp. 466–473. [Google Scholar]
  17. Lempitsky, V.; Zisserman, A. Learning to count objects in images. In Advances in Neural Information Processing Systems, Proceedings of the 24th Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6 December 2010; Curran Associates, Inc.: San Francisco, CA, USA, 2010; pp. 1324–1332. [Google Scholar]
  18. Pham, V.-Q.; Kozakaya, T.; Yamaguchi, O.; Okada, R. COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3253–3261. [Google Scholar] [CrossRef]
  19. Xu, B.; Qiu, G. Crowd density estimation based on rich features and random projection forest. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–8. [Google Scholar]
  20. Sam, D.B.; Surya, S.; Babu, R.V. Switching convolutional neural network for crowd counting. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4031–4039. [Google Scholar]
  21. Zhang, Y.; Zhou, D.; Chen, S.; Gao, S.; Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 589–597. [Google Scholar]
  22. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  23. Gao, C.; Li, P.; Zhang, Y.; Liu, J.; Wang, L. People counting based on head detection combining adaboost and cnn in crowded surveillance environment. Neurocomputing 2016, 208, 108–116. [Google Scholar] [CrossRef]
  24. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  25. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  26. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  27. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; ECCV: Milano, Italy, 2014. [Google Scholar]
  29. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. arXiv 2017, arXiv:1612.03144. [Google Scholar]
  30. Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv 2013, arXiv:1312.6229. [Google Scholar]
  31. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. arXiv 2017, arXiv:1708.02002. [Google Scholar]
  32. Tan, M.; Le, Q. Efficientnet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning. 2019, pp. 6105–6114. Available online: http://proceedings.mlr.press/v97/tan19a.html (accessed on 24 December 2024).
  33. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef]
  34. Farhadi, A.; Redmon, J. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  35. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. Scaled-YOLOv4: Scaling Cross Stage Partial Network. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13024–13033. [Google Scholar] [CrossRef]
  36. Huang, X.; Wang, X.; Lv, W.; Bai, X.; Long, X.; Deng, K.; Dang, Q.; Han, S.; Liu, Q.; Hu, X.; et al. PP-YOLOv2: A Practical Object Detector. arXiv 2021, arXiv:2104.10419. [Google Scholar]
  37. Zhang, M.; Wang, C.; Yang, J.; Zheng, K. Research on Engineering Vehicle Target Detection in Aerial Photography Environment based on YOLOX. In Proceedings of the 2021 14th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 11–12 December 2021; pp. 254–256. [Google Scholar] [CrossRef]
  38. Li, D.; Miao, Z.; Peng, F.; Wang, L.; Hao, Y.; Wang, Z.; Chen, T.; Li, H.; Zheng, Y. Automatic counting methods in aquaculture: A review. J. World Aquac. Soc. 2020, 52, 269–283. [Google Scholar] [CrossRef]
  39. Solahudin, M.; Slamet, W.; Dwi, A.S. Vaname (Litopenaeus vannamei) shrimp fry counting based on image processing method. Earth Environ. Sci. 2018, 147, 2014. [Google Scholar] [CrossRef]
  40. Labuguen, R.T.; Volante, E.J.P.; Causo, A.; Bayot, R.; Peren, G.; Macaraig, R.M.; Libatique, N.J.C.; Tangonan, G.L. Automated fish fry counting and schooling behavior analysis using computer vision. In Proceedings of the 2012 IEEE 8th International Colloquium on Signal Processing and its Applications, Malacca, Malaysia, 23–25 March 2012; pp. 255–260. [Google Scholar]
  41. Jing, D.; Han, J.; Wang, X.; Wang, G.; Tong, J.; Shen, W.; Zhang, J. A method to estimate the abundance of fish based on dual-frequency identification sonar (DIDSON) imaging. Fish. Sci. 2017, 83, 685–697. [Google Scholar] [CrossRef]
  42. Newbury, P.F.; Culverhouse, P.F.; Pilgrim, D.A. Automatic fish population counting by artificial neural network. Aquaculture 1995, 133, 45–55. [Google Scholar] [CrossRef]
  43. Fan, L.; Liu, Y. Automate fry counting using computer vision and multi-class least squares support vector machine. Aquaculture 2013, 380, 91–98. [Google Scholar] [CrossRef]
  44. Lau, P.Y.; Correia, P.L.; Fonseca, P.; Campos, A. Estimating Norway lobster abundance from deep-water videos: An automatic approach. IET Image Process. 2012, 6, 22–30. [Google Scholar] [CrossRef]
  45. Sharif, M.H.; Galip, F.; Guler, A.; Uyaver, S. A simple approach to count and track underwater fishes from videos. In Proceedings of the 2015 18th International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 21–23 December 2015; pp. 347–352. [Google Scholar]
  46. Chuang, M.C.; Hwang, J.N.; Williams, K.; Towler, R. Tracking live fish from low-contrast and low-frame-rate stereo videos. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 167–179. [Google Scholar] [CrossRef]
  47. Huang, T.W.; Hwang, J.N.; Romain, S.; Wallace, F. Fish tracking and segmentation from stereo videos on the wild sea surface for electronic monitoring of rail fishing. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3146–3158. [Google Scholar] [CrossRef]
  48. Spampinato, C.; Chen-Burger, Y.-H.; Nadarajan, G.; Fisher, R.B. Detecting, tracking and counting fish in low quality unconstrained underwater videos. In Proceedings of the International Conference on Computer Vision Theory & Applications, Lisboa, Portugal, 5–8 February 2009; Volume 2, pp. 514–519. [Google Scholar]
  49. Available online: https://ehsangazar.com/object-tracking-with-opencv-fd18ccdd7369 (accessed on 24 December 2024).
  50. Mohamed, H.E.-D.; Fadl, A.; Anas, O.; Wageeh, Y.; ElMasry, N.; Nabil, A.; Atia, A. MSR-YOLO: Method to Enhance Fish Detection and Tracking in Fish Farms. Procedia Comput. Sci. 2020, 170, 539–546. [Google Scholar] [CrossRef]
  51. Li, X.; Wei, Z.; Huang, L.; Nie, J.; Zhang, W.; Wang, L. Real-Time Underwater Fish Tracking Based on Adaptive Multi-Appearance Model. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2710–2714. [Google Scholar] [CrossRef]
  52. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and real-time tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP2016), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
  53. Microsoft CSE Group. Visual Object Tagging Tool (VOTT), an Electron App for Building End to End Object Detection Models from Images and Videos, v2.2.0. Available online: https://github.com/microsoft/VoTT (accessed on 3 June 2020).
  54. Available online: https://dev.to/afrozchakure/all-you-need-to-know-about-yolo-v3-you-only-look-once-e4m (accessed on 25 December 2024).
  55. Naseer, A.; Nava, E.; Villa, Y. Nephrops Norvegicus Burrows Detection and Classification from Underwater Videos Using Deep Learning Techniques. Universidad de Malaga, Malaga, Spain. Available online: https://hdl.handle.net/10630/31403 (accessed on 4 April 2024).
Figure 1. Nephrops norvegicus.
Figure 2. Nephrops’ burrow system.
Figure 3. Proposed methodology for Nephrops burrow tracking and counting.
Figure 4. YOLOv3 architecture [54].
Figure 5. Nephrops burrow count in temporal segment 1 on consecutive frames. The blue bounding box shows the detection and tracking of a burrow.
Figure 6. Nephrops burrow count in temporal segment 3 on consecutive frames. The blue bounding box shows the detection and tracking of a burrow.
Figure 7. Nephrops burrow count using the Boosting tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
Figure 8. Nephrops burrow count using the CSRT tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
Figure 9. Nephrops burrow count using the KCF tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
Figure 10. Nephrops burrow count using the Median Flow tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
Figure 11. Nephrops burrow count using the MIL tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
Figure 12. Nephrops burrow count using the MOSSE tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
Figure 13. Nephrops burrow count using the TLD tracking algorithm. The green bounding box shows the detection and tracking of a burrow.
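Figures 7–13 compare the counting results obtained with seven off-the-shelf OpenCV trackers (Boosting, CSRT, KCF, Median Flow, MIL, MOSSE, and TLD). For readers reproducing that comparison, the sketch below shows one way these trackers can be driven behind a common interface. It is only a minimal sketch: it assumes opencv-contrib-python 4.x, where the constructors live under cv2.legacy (older 3.x builds expose them directly on cv2), and the helper function, video handling, and bounding-box seeding are illustrative rather than the authors' exact code.

```python
# Minimal sketch: run any of the seven OpenCV trackers from Figures 7-13 on one burrow.
# Assumes opencv-contrib-python 4.x (constructors under cv2.legacy).
import cv2

TRACKER_FACTORIES = {
    "boosting":   cv2.legacy.TrackerBoosting_create,
    "csrt":       cv2.legacy.TrackerCSRT_create,
    "kcf":        cv2.legacy.TrackerKCF_create,
    "medianflow": cv2.legacy.TrackerMedianFlow_create,
    "mil":        cv2.legacy.TrackerMIL_create,
    "mosse":      cv2.legacy.TrackerMOSSE_create,
    "tld":        cv2.legacy.TrackerTLD_create,
}

def track_burrow(video_path, init_bbox, tracker_name="csrt"):
    """Track a single detected burrow, given as (x, y, w, h), through a video segment."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError(f"Cannot read {video_path}")
    tracker = TRACKER_FACTORIES[tracker_name]()
    tracker.init(frame, init_bbox)            # seed with an initial detection box
    boxes = [init_bbox]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)   # returns (success, (x, y, w, h))
        boxes.append(bbox if found else None)
    cap.release()
    return boxes
```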
Table 1. Comparative analysis of a few tracking techniques in underwater environments.

Approach | Year | Dataset | Detection Algorithm | Tracking Algorithm
Identification and Counting of Coral [3] | 2018 | - | RetinaNet | OpenCV KCF tracker
Detection and Fish Tracking [50] | 2020 | 400 goldfish images | YOLOv3 | Optical flow
Detection and Fish Tracking [4] | 2021 | 2000 images of goldfish | YOLOv3 | Optical flow
Fish Tracking and Counting [5] | 2022 | 13,789 images of fish | YOLOv3 | SORT
Fish Tracking [6] | 2019 | Custom dataset | Hybrid Algorithm | Kalman Filter
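Several of the approaches in Table 1 (for example, [5], which pairs YOLOv3 with SORT) follow the same pattern: a frame-level detector produces boxes, and an association step links boxes across frames so that the same object is counted only once. The toy sketch below illustrates that general idea with a greedy IoU-based association; the threshold, data layout, and function names are illustrative assumptions, not the exact algorithm of any cited work or of the proposed method.

```python
# Toy sketch: count static objects from per-frame detections via greedy IoU association.
def iou(a, b):
    # Boxes given as (x, y, w, h).
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def count_static_objects(detections_per_frame, iou_threshold=0.3):
    """detections_per_frame: list (one entry per frame) of lists of (x, y, w, h) boxes."""
    tracks = []   # last known box of every object seen so far
    total = 0
    for boxes in detections_per_frame:
        for box in boxes:
            if any(iou(box, t) >= iou_threshold for t in tracks):
                # Overlaps an existing object: update its position, do not recount it.
                idx = max(range(len(tracks)), key=lambda i: iou(box, tracks[i]))
                tracks[idx] = box
            else:
                tracks.append(box)   # a new object enters the count
                total += 1
    return total
```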
Table 2. Details of each burrow count framewise distribution in the proposed temporal segments.

Temporal Segment | Burrow Count (Ground Truth) | Proposed Tracking and Counting: Frame No (Burrows Counted at That Frame) | Total Count
RF09_Min1 | 10 | 161 (1), 362 (2), 464 (2), 624 (1), 676 (1), 1040 (1), 1050 (1), 1440 (1) | 10
RF09_Min2 | 16 | 92 (1), 114 (1), 210 (1), 230 (3), 290 (2), 309 (2), 360 (2), 393 (1), 502 (1), 1020 (1), 1070 (1) | 16
RF09_Min3 | 11 | 1 (1), 180 (1), 350 (1), 421 (1), 576 (1), 675 (2), 1154 (3), 1450 (1) | 11
RF09_Min4 | 9 | 160 (1), 410 (1), 810 (1), 996 (2), 1102 (1), 1165 (1), 1349 (1), 1390 (1) | 9
RF09_Min5 | 14 | 224 (1), 290 (2), 776 (2), 870 (1), 940 (1), 1130 (1), 1230 (2), 1250 (1), 1270 (1), 1315 (2) | 14
RF09_Min6 | 10 | 320 (1), 510 (1), 630 (1), 665 (2), 718 (1), 730 (1), 1097 (1), 1111 (1), 1460 (1) | 10
RF09_Min7 | 13 | 460 (1), 539 (1), 775 (2), 805 (1), 825 (1), 856 (1), 900 (1), 920 (2), 1035 (1), 1225 (1), 1400 (1) | 13
RF09_Min8 | 6 | 76 (1), 274 (1), 320 (1), 657 (1), 885 (1), 1360 (1) | 6
RF09_Min9 | 18 | 49 (1), 66 (2), 454 (1), 466 (1), 484 (3), 516 (2), 760 (1), 780 (2), 793 (3), 830 (1), 1335 (1) | 18
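Reading Table 2, the per-segment total is simply the sum of the burrows newly registered at each listed frame; for RF09_Min1, the counts 1 + 2 + 2 + 1 + 1 + 1 + 1 + 1 give the reported total of 10. A minimal check, with the frame-to-count pairs hard-coded from that row:

```python
# Sanity check of Table 2, segment RF09_Min1: frame number -> burrows newly counted there.
rf09_min1 = {161: 1, 362: 2, 464: 2, 624: 1, 676: 1, 1040: 1, 1050: 1, 1440: 1}
print(sum(rf09_min1.values()))  # 10, matching both the ground truth and the reported total
```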