1. Introduction
Pigs are critical to the global livestock industry and a significant source of protein [
1]. Pork consumption is projected to grow at an average annual rate of 1.2% from 2018 to 2027 [
2], underscoring the importance of health and welfare management for pigs. Early detection of health and welfare issues in pigs is essential for maintaining productivity and preventing the spread of diseases. However, it is impractical to manually monitor subtle behavioral changes in commercial-scale pig farms over extended periods [
3], highlighting the need for efficient and automated health monitoring systems. Such systems are essential to enhance productivity while managing pigs’ welfare and health conditions. With increasing demands for automation and efficiency in the livestock industry, automated health monitoring systems are expected to contribute to real-time management and improved productivity in pig farming [
4].
Models with fast detection speeds and high computational efficiency are required to monitor pigs’ health status in real time using cameras installed in individual pens within commercial pig farms. However, the corrosive environment caused by ammonia emissions in pig farms limits the use of high-performance computing resources. In this context, convolutional neural network (CNN)-based models are suitable for object detection tasks. Among these, You Only Look Once (YOLO) [
5], with its CNN-based end-to-end structure, predicts both object locations and classes simultaneously, offering faster processing than Region-Based Convolutional Neural Networks (R-CNNs) [
6]. While R-CNNs use a step-by-step approach with region proposals to achieve high detection accuracy, their computational demands make them unsuitable for real-time processing. Similarly, transformer-based models [
7], leveraging self-attention mechanisms to learn global relationships, provide higher detection accuracy than CNN-based models but require high-performance computing resources, limiting their use in real-time commercial pig farm settings. Given the need for computational efficiency and real-time processing, YOLO is well suited for automated health monitoring systems.
However, these object detection models rely on red, green, and blue (RGB) images, which can experience reduced detection accuracy in environments with complex lighting conditions or significant background variability [
8]. For instance, in top-view footage captured by fixed cameras in the same pen, lighting variations or natural light can cause differences in brightness depending on object positions. Additionally, materials like flooring within the Region of Interest (RoI) may ambiguously blend with objects. Furthermore, these models, trained on seen environments, exhibit limited generalization capabilities and lower detection accuracy in previously unseen environments [
9].
Figure 1 illustrates these challenges, highlighting the limitations of object detection in unseen environments and the reduced detection accuracy of the YOLOv11n [
10] detector in such scenarios.
Figure 1a is a screenshot captured from [
11], whereas Figure 1d is a screenshot captured from [
12]. In this study, the YOLO detector is applied to both the training data and the unseen environments represented by Figure 1a,d. The results in Figure 1b,e, with green boxes marking the detected objects, demonstrate low detection accuracy in these unseen environments. Upon analyzing the causes, it is found that the training utilized the German pig dataset [
13]. This dataset, collected in German environments, has the characteristic that the floor is darker than the pigs in the foreground. However, this characteristic leads to detection errors when the floor’s brightness is similar to or brighter than that of the pigs, as observed in Figure 1b. Furthermore, similar errors occur in Figure 1e due to brightness variations in objects or walls within pigpens that differ from the training environment.
To overcome these limitations, recently developed foundation models have emerged as transformative tools for object detection and segmentation. These models, trained on extensive datasets, demonstrate high adaptability to diverse tasks and environments. The Segment Anything Model (SAM) [
14] offers robust segmentation performance with minimal user input, effectively separating foreground and background across varied conditions. Large Mask Inpainting (LaMa) [
15] enhances masked areas by leveraging surrounding pixels, achieving high performance in background restoration tasks. Additionally, Depth Anything [
16] uses depth information to accurately distinguish objects’ physical positions from the background, maintaining stable performance even in challenging lighting conditions or complex facility layouts. Analyzing the depth information in Figure 1c,f makes it evident that leveraging depth-based techniques could provide a promising solution to the aforementioned errors.
To address these issues, this study proposes a method for generating Depth-Oriented Gray (DOG) images using foundation models such as the SAM, LaMa, and Depth Anything. This approach establishes the RoI in unseen environments and effectively generalizes the brightness of objects and backgrounds in test images. The proposed method improves detection accuracy by accurately distinguishing between foreground and background, even in complex environments. Foundation models are transformer-based architectures that require significant processing time and high-performance computing resources. However, the proposed method achieves real-time processing by utilizing foundation models only during the initialization phase, significantly enhancing both detection accuracy and speed.
The contributions of this study are as follows:
This study proposes a new method that utilizes foundation models, including the SAM, LaMa, and Depth Anything, to accurately separate the foreground and background based on depth information in pigpens. The method effectively generates depth background images and establishes the RoI, even in test environments with diverse lighting conditions and high background complexity.
The proposed method generates DOG images by combining HSV-Value, inverted HSV-Saturation images, and the generated depth background images. This approach is designed to maintain high detection and generalization performance, leveraging depth information to address the accuracy degradation issues observed in conventional detection models when applied to unseen test environments.
This study proposes a cost-effective approach to utilizing depth information without requiring additional depth sensors by leveraging the Depth Anything model. In unseen test environments, depth background images are generated only once using GPU-based Depth Anything. Subsequently, input images are processed in real time using CPU-based DOG image generation. This method enables operation on systems equipped with low-cost CPUs and GPUs, significantly reducing system setup costs.
2. Related Work
Research on object detection and tracking in various environments has been actively conducted alongside advancements in deep learning technologies, which are increasingly applicable to video monitoring systems. Video monitoring enables real-time detection and analysis of object states, which is pivotal in optimizing management and automation across diverse fields such as agriculture, logistics, and security.
However, complex environmental factors, lighting changes, and overlaps with structures remain significant challenges that can reduce object detection accuracy. These issues highlight the limitations of traditional deep learning models, which often fail to ensure generalization performance in environments different from those they were trained on. As a result, frequent object detection errors undermine the reliability and utility of video monitoring systems [17,18,19].
Existing studies have proposed various approaches to improve object detection accuracy and address these challenges. One prominent approach involves attention mechanisms that guide detection models to focus on relevant regions in an image. For instance, methods that define the Region of Interest (RoI) and Regions of Uninterest (RoUs) help detection models exclude irrelevant background information and focus on key objects [20,21,22]. By allowing models to concentrate on foreground areas during detection, RoI settings effectively enhance detection accuracy. Additionally, segmentation-based approaches that separate the foreground from the background further assist detection models in focusing on object features, offering another effective strategy [23,24,25]. These attention-based methods improve object detection performance in complex environments and demonstrate broad applicability across various domains.
Nevertheless, external environmental factors, such as lighting-induced imbalances, continue to pose significant challenges even for these approaches. To overcome such issues, numerous experiments incorporating various preprocessing and postprocessing techniques have been proposed [26,27,28,29,30]. For example, image transformations emphasize the differences between the foreground and background, enhance image quality, or make key features more distinguishable. These techniques mitigate problems caused by lighting variations or noise in complex environments, contributing to improved detection accuracy. Enhancing contrast and emphasizing key features facilitates clear foreground–background separation, even under diverse lighting conditions or complex backgrounds.
Finally, compared to optical cameras, depth cameras offer the advantage of more effectively distinguishing between the foreground and background [31,32,33,34]. Depth cameras utilize the Time of Flight (ToF) principle, emitting signals and measuring the return time after interacting with objects to generate depth information [35]. This capability positions depth cameras as valuable tools for addressing challenges posed by lighting variations and complex backgrounds. However, despite their advantages, such as low cost, fast processing, and high reliability, depth cameras have limitations, including missing information due to installation location or environmental conditions and additional noise from indoor or outdoor variability. The accuracy of depth information collected by cameras can vary significantly depending on lighting conditions and structural factors in the environment. Therefore, optimizing installation locations is crucial to fully leverage the benefits of depth cameras while overcoming these environmental constraints [36].
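For reference, the ToF principle converts the measured round-trip time of the emitted signal into distance: with propagation speed c (the speed of light for optical ToF sensors) and measured round-trip time Δt, the depth of a surface point is

```latex
d = \frac{c \,\Delta t}{2}
```

so any timing noise introduced by the environment translates directly into depth noise.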
Table 1 summarizes the results of a search using the keywords “pig”, “depth”, and “detection”. This study represents the first research to improve object detection model performance by utilizing depth information to establish the RoI and generate generalized images in unseen environments. In contrast to previous studies, this research proposes an approach that leverages the foundation model Depth Anything to utilize depth information without the additional costs associated with purchasing and installing depth sensors.
Furthermore, this study overcomes the limitations of traditional depth sensor-based detection techniques by introducing a new method that employs foundation models such as the SAM, LaMa, and Depth Anything. Specifically, the proposed approach uses foundation models in the test pigpen environment during the initialization process, utilizing GPUs only once. After initialization, real-time DOG images are generated through simple CPU-based image processing. This practical method addresses the generalization challenges of test images in unseen environments while enhancing detection models’ accuracy and efficiency.
3. Materials and Methods
This study utilizes recently introduced deep learning-based foundation models, including the SAM, LaMa, and Depth Anything. The SAM is a deep learning-based object segmentation model trained on large-scale datasets, capable of generating foreground object masks based on user inputs (e.g., the entire image, bounding boxes, or object points). These masks are passed to LaMa, a deep learning model that seamlessly restores masked regions using surrounding pixel information. LaMa generates natural background images, even for complex or large masked areas. The generated background images are then provided to Depth Anything, which calculates the depth information for each pixel to produce depth background images. Depth Anything estimates depth based on the physical position and distance of objects, creating depth images that remain unaffected by the facilities or lighting conditions of the pigpen environment. These depth background images effectively establish the RoI.
We propose a new DOG image generation method, which involves converting RGB colors to the hue, saturation, value (HSV) color space and leveraging HSV-Value and HSV-Saturation information. In the HSV-Saturation space, white and black are defined as achromatic colors, and since most pigs have colors based on white or black, their HSV-Saturation values are close to zero. This consistent representation of pigs’ brightness in HSV-Saturation images, irrespective of their positions in the RGB image, allows the effective separation of foreground pixels that are otherwise difficult to distinguish using color information alone. However, HSV-Hue, which represents pure color information as an angle, was excluded because it is sensitive to changes in object position and lighting conditions, making it less reliable. This HSV-based approach helps address issues arising from complex lighting conditions and color similarities between objects and the background.
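As a minimal illustration of this color-space reasoning (OpenCV is assumed, the file name is a placeholder, and the variable names are illustrative), the saturation channel can be split off and inverted so that achromatic pigs receive consistently high pixel values:

```python
import cv2

# Read a frame from the pen camera (placeholder path); OpenCV loads it as BGR.
frame = cv2.imread("pigpen_frame.png")

# Convert to HSV and separate the channels.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# White/black pigs are achromatic, so their saturation is near zero;
# inverting the channel maps them to values near 255 regardless of lighting.
s_inv = cv2.bitwise_not(s)
```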
For the HSV-Value and depth background images used in the image generation process, we apply Contrast-Limited Adaptive Histogram Equalization (CLAHE) [
46]. CLAHE enhances contrast by uniformly distributing pixel values and adjusts histograms locally to overcome the limitations of global histogram equalization. This technique prevents noise amplification while effectively improving contrast. By applying CLAHE to both HSV-Value images and depth background images, the histograms are smoothed, and object shapes and boundaries become more distinct. Consequently, boundaries between objects and backgrounds, which are challenging to distinguish using HSV-Value images alone, can be effectively separated by combining these images with depth-based images for accurate RoI determination.
Figure 2 illustrates the entire process proposed in this study. Given an input image, the first step is the RoI Setting Module, executed only once to generate the depth images of the input scene. The generated depth information establishes RoIs that effectively separate the foreground and background in a top-view fixed camera environment. Next, the DOG Generation Module takes the initialized depth background image and established RoIs from the RoI Setting Module as inputs to generate DOG images. This module aims to utilize simple image processing techniques to effectively distinguish RoUs and RoIs based on the initialized RoI information from the test pigpen and to create images emphasizing the differences between the foreground and background pixels within the RoI region. The green boxes in the figure represent the detected objects. This approach achieves generalization for test images in unseen environments, enhances object detection model performance, and enables real-time detection.
3.1. RoI Setting Module
The method for setting the RoI to enhance detection accuracy is illustrated in
Figure 3. First, the SAM estimates the probability that each pixel in the input image belongs to the user-specified object region. The resulting mask, which separates the foreground object, is then passed to LaMa. LaMa, an inpainting model specialized in filling large masked areas, removes the masked foreground region and restores the remaining background. The background image generated by LaMa is subsequently provided to Depth Anything, which predicts a depth value for each pixel to create a depth background image. Unlike RGB-based information, this depth information remains unaffected by facility layouts, lighting conditions, and floor materials, making it particularly effective in environments where RGB data alone are insufficient.
Although the inpainting process may leave shadows in areas where the object has been removed, Depth Anything recognizes those shadows as non-object shadows, thereby accurately generating the depth background. Afterward, CLAHE is applied to the depth background image to enhance contrast, followed by the Otsu [
47] thresholding algorithm to finalize the RoI.
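The initialization described above can be sketched as follows. The foundation-model calls are shown as assumed wrapper functions (run_sam, run_lama, and run_depth_anything are placeholders, not library APIs), while the CLAHE and Otsu steps use standard OpenCV operations; the CLAHE parameters follow those reported for the DOG Generation Module and are assumed here:

```python
import cv2

def build_roi(first_frame, run_sam, run_lama, run_depth_anything):
    """One-off RoI initialization for a fixed top-view camera (sketch)."""
    mask = run_sam(first_frame)                  # foreground (pig) mask from the SAM
    background = run_lama(first_frame, mask)     # inpaint the pigs away with LaMa
    depth_bg = run_depth_anything(background)    # assumed to return an 8-bit depth map of the empty pen

    # Contrast enhancement of the depth background before thresholding.
    clahe = cv2.createCLAHE(clipLimit=4, tileGridSize=(4, 4))
    depth_bg = clahe.apply(depth_bg)

    # Otsu thresholding separates the pen floor (RoI) from surrounding structures.
    _, roi = cv2.threshold(depth_bg, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return roi, depth_bg
```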
3.2. DOG Generation Module
The process for generating DOG images, aimed at generalizing test images and improving the accuracy of object detection models, is illustrated in
Figure 4. The proposed DOG Generation Module leverages the depth background images produced during the preprocessing stage to achieve real-time depth-like image generation. Initially, the pixel values of both the grayscale image and the depth background image are summed. This step mitigates the brightness inconsistencies caused by lighting variations and environmental factors, enabling the enhanced representation of object contours and details.
Subsequently, CLAHE is applied with clipLimit = 4 and tileGridSize = (4, 4) to enhance contrast. clipLimit = 4 effectively prevents excessive noise amplification, preserving the integrity of the image during processing, while tileGridSize = (4, 4) emphasizes differences in smaller regions, allowing precise delineation of the boundaries between the floor and pigpen walls as well as the edges of objects. This contrast enhancement improves the distinction between the foreground and background, allowing the object detection model to effectively identify the RoI. After CLAHE is applied, the histograms of the grayscale image and the depth background image are equalized, and the two images are combined using equal weights. Equal weights are used to maintain a balanced contribution between the grayscale image, which contains objects, and the depth background image, which does not.
Grayscale images often struggle to differentiate between the foreground and background under varying lighting conditions or due to the presence of diverse floor materials. To address this challenge, the depth background information is merged with the grayscale image using equal weighting to create a composite image. This composite image retains the structural details and contours of the objects in the grayscale image while effectively incorporating depth background information. Finally, element-wise multiplication is performed between this composite image and the inverted saturation component of the HSV color space. This process emphasizes low-saturation foreground regions, such as pigs, while suppressing high-saturation regions containing rich color information, resulting in the final DOG image.
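Under the same assumptions (OpenCV, 8-bit single-channel images, and the CLAHE-equalized depth background and RoI produced by the RoI Setting Module), a per-frame version of these steps might look roughly like the sketch below; it follows the described pipeline rather than reproducing the authors' exact implementation:

```python
import cv2

def make_dog(frame, depth_bg_eq, roi, clahe):
    """Per-frame DOG image generation on the CPU (sketch of the described steps)."""
    # HSV-Value (used here as the grayscale representation) and inverted HSV-Saturation.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    s_inv = cv2.bitwise_not(s)

    # CLAHE on the value channel, then equal-weight fusion with the
    # CLAHE-equalized depth background computed during initialization.
    v_eq = clahe.apply(v)
    fused = cv2.addWeighted(v_eq, 0.5, depth_bg_eq, 0.5, 0)

    # Emphasize low-saturation (achromatic) pigs and suppress colorful regions.
    dog = cv2.multiply(fused, s_inv, scale=1.0 / 255.0)

    # Outside the RoI, fall back to the depth background values.
    dog[roi == 0] = depth_bg_eq[roi == 0]
    return dog
```

Here the CLAHE object is created once (e.g., cv2.createCLAHE(clipLimit=4, tileGridSize=(4, 4)), the setting reported above) and reused for every frame.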
Figure 5 illustrates the HSV images derived from the input video. The HSV-Hue component, which represents colors such as red, green, and blue as angles in the color space, is not utilized because it is heavily influenced by color and fails to represent the pixel values of objects consistently. In contrast, HSV-Saturation assigns low values to achromatic regions, and when inverted, achromatic objects are represented with high pixel values. This characteristic provides useful features for the element-wise multiplication. HSV-Value, computed from the maximum of the red, green, and blue channels, vividly highlights the texture of objects.
Algorithm 1 describes the process for generating RoI and DOG images from a sequence of video frames. It involves preprocessing steps to define RoIs, creating depth background images, and generating DOG images to enhance object detection accuracy in unseen environments. By leveraging deep learning-based foundation models (SAM, LaMa, Depth Anything) and simple image processing techniques, the algorithm ensures efficient real-time performance.
Algorithm 1. RoI Setting Module and DOG Generation Module
Input: video frame sequence F = {f1, f2, …, fN}.
Output: Region of Interest (RoI) image; depth background (DB) image; Depth-Oriented Gray (DOG) image.
For each frame fi in F:
  If fi is the first frame:
    Mask = inference with SAM given fi
    Background = inference with LaMa given fi and Mask
    DB = inference with Depth Anything given Background
    DB = apply CLAHE on DB
    RoI = apply Otsu thresholding on DB
  Else:
    (H, S, V) = split fi into the HSV color space
    V_eq = apply CLAHE on V
    Sum = weighted sum(V_eq, DB)
    S_inv = bitwise NOT of S
    DOG = element-wise multiply(Sum, S_inv)
    For pixels where RoI is 0:
      copy the pixel value from DB to DOG
  Return DOG
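Tying the two modules together, a driver loop over the frame sequence (reusing the hypothetical build_roi and make_dog helpers sketched in Sections 3.1 and 3.2) could look as follows; the foundation models are touched only on the first frame, and every subsequent step is plain CPU image processing:

```python
import cv2

def process_video(frames, run_sam, run_lama, run_depth_anything):
    """Sketch of Algorithm 1: GPU-based initialization once, CPU-based DOG generation per frame."""
    clahe = cv2.createCLAHE(clipLimit=4, tileGridSize=(4, 4))
    roi = depth_bg_eq = None
    dog_images = []
    for i, frame in enumerate(frames):
        if i == 0:
            # RoI Setting Module: run the foundation models once for this pen.
            roi, depth_bg_eq = build_roi(frame, run_sam, run_lama, run_depth_anything)
        # DOG Generation Module: lightweight per-frame processing.
        dog_images.append(make_dog(frame, depth_bg_eq, roi, clahe))
    return dog_images
```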
In summary, the proposed methodology integrates these state-of-the-art foundation models and advanced processing techniques to address the challenges of unseen environments. The modular design of the RoI Setting Module and DOG Generation Module enables accurate separation of the foreground and background, while enhancing the generalization capabilities for object detection models. This approach demonstrates both its practicality and potential for deployment in real-world scenarios, such as pigpen monitoring systems.
4. Experiment Results
The experiments were conducted in the following hardware and software environment: an Intel Core i7-7700K @ 4.20 GHz processor (Intel Corporation, Santa Clara, CA, USA) and a GeForce GTX 1660 GPU (NVIDIA Corporation, Santa Clara, CA, USA). The software environment was based on the Ubuntu 22.04 LTS operating system (Canonical Ltd., London, UK), with the deep learning models implemented and tested using PyTorch 2.0.1 (Meta Platforms, Inc., Menlo Park, CA, USA) and CUDA 11.7 (NVIDIA Corporation, Santa Clara, CA, USA).
Figure 6 presents the data and training setup used in this study. The training dataset was sourced from the German pig dataset [
13] and consisted of 985 images, divided into 788 images for training and 197 for validation. The test dataset consisted of 200 images derived from videos acquired in Korea. Accuracy measurements were specifically conducted using data filmed at a pigsty on a farm located in Jochiwon-up, Sejong-si, Chungcheongnam-do, South Korea, involving 23 pigs. The test dataset additionally included images of another Korean pig, Belgian pigs, and Chinese pigs, specifically utilized to evaluate the generalization performance of the proposed method. The model was trained for 300 epochs with a batch size of 16 and a resolution of 640 × 640.
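As a hedged illustration of this training configuration (the Ultralytics interface and the dataset YAML file name are assumptions for the sketch, not details confirmed by the paper), the nano model could be trained roughly as:

```python
from ultralytics import YOLO  # assumed tooling; the paper only states that PyTorch 2.0.1 was used

# Hypothetical dataset config pointing at the 788/197 train/validation split.
model = YOLO("yolo11n.pt")  # nano variant used for real-time detection
model.train(data="german_pigs.yaml", epochs=300, batch=16, imgsz=640)
```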
The dataset involving 23 pigs presented challenges such as indistinguishable separation between pigs and the background due to flooring materials and object overlap caused by feeding troughs located at the top. Additionally, the top-right area of the image was darker than other regions, which could result in detection errors. In contrast, the single image of another Korean pig exhibited clear separation between objects and the background but was prone to errors caused by sunlight. The Belgian pig image faced challenges in the central area, where objects and the background could not be distinctly separated, similarly to the 23-pig dataset. Lastly, the Chinese pig image included a black pig, which was absent in the training dataset (German pig dataset), and objects with dark colors that closely resembled the background, both contributing to potential detection errors.
True Positive (TP): TP represents the number of instances where the model correctly detects existing objects. For example, if an image in a real farm environment contains 23 pigs, and the model accurately detects 20 out of these 23 pigs, the TP value would be 20. A high TP value indicates that the model demonstrates excellent detection performance.
False Positive (FP): FP represents the instances where the model incorrectly detects an object in locations where no object exists. For example, if the model mistakenly identifies an area without a pig as containing a pig, it counts as an FP. A lower FP value indicates higher reliability in the model’s detection performance.
False Negative (FN): FN refers to the number of instances where the model fails to detect existing objects. For example, if a pig is present but the model does not detect it, it counts as an FN. A lower FN value indicates that the model detects objects without missing them.
Average Precision (AP): AP50 refers to the Average Precision at an Intersection over Union (IoU) threshold of 50% and is a metric used to evaluate the overall performance of object detection models. IoU measures the overlap ratio between the detected bounding box and the ground truth bounding box, while AP represents the area under the Precision–Recall curve. AP50 indicates how well a model detects objects with at least 50% overlap between the predicted and actual boundaries (a worked IoU example follows these definitions).
Inference Time: Inference Time represents the time (in milliseconds) a model takes to process a single image. It is a key metric for evaluating the model’s real-time processing capability. For instance, if the Inference Time is 10 ms, the model can process 100 images per second.
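The sketch below makes the AP50 matching criterion concrete: it computes the IoU between a predicted and a ground-truth box (both in the common (x1, y1, x2, y2) convention, with illustrative coordinates) and labels the prediction as a TP only when the overlap reaches the 50% threshold:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative boxes: the prediction overlaps the ground truth with IoU ≈ 0.83, so it counts as a TP.
pred, gt = (10, 10, 50, 60), (12, 8, 48, 58)
print("TP" if iou(pred, gt) >= 0.5 else "FP")
```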
Table 2 presents the results of testing models trained on grayscale images when the test image types were grayscale, Depth Anything, and the proposed method (DOG). For real-time detection, the smallest model variants were employed, where a model name ending in n denotes nano and one ending in t denotes tiny. The results showed that testing with depth images improved AP50 by 3% to 9.5% compared to testing with the grayscale images of unseen data from Jochiwon. Additionally, the accuracy of all detection models improved compared to that of the baseline, demonstrating more stable detection performance in unseen environments when using depth images. This indicated that depth images can resolve difficulties in object detection when the foreground and background are hard to distinguish.
Comparing the test results of grayscale and DOG images revealed that, even without training on DOG images, detection of unseen data using DOG images improved AP50 by 2.7% to 6.4% over grayscale image detection. For test images with a resolution of 640 × 640, the Inference Time of the Depth Anything small model on a GPU was 193.7 ms. In contrast, the proposed DOG image transformation took 3.6 ms on a CPU, a roughly 53.8-fold increase in processing speed.
Furthermore, the results emphasized the practicality of the proposed method for real-time applications. Unlike Depth Anything, which relied on computationally expensive GPU-based processing, the proposed method achieved significant speed improvements by utilizing lightweight CPU-based operations. This efficiency made it particularly suitable for deployment in low-resource environments, such as farms, where high-performance computing infrastructure was not readily available.
Figure 7 illustrates the object detection results according to the test image types. The detection results of the object detection model varied based on the test image type. In the enlarged prediction results, green boxes represent TP, blue boxes represent FP, and yellow boxes represent FN. As shown, FP and FN occurred due to the ambiguity between the foreground and background in unseen environments, negatively impacting the accuracy of the object detection model. The Depth Anything and DOG Image detection results successfully detected all 23 pigs. However, errors still occurred in detecting the feeding troughs. These errors are attributed to the feeding troughs having a different shape from those in the training environment and having a depth similar to that of pigs. This issue can be resolved by setting the RoI as a red-highlighted area, as in the DOG Image with the RoI approach.
Figure 8 shows the results of applying the proposed method across various environments. Grayscale images demonstrated that detection results can be negatively affected when the brightness of objects and backgrounds is similar. While HSV-Saturation images effectively represented achromatic pigs with pixel values close to zero in various environments, they suffered from the drawback of losing texture information. The depth background image played a crucial role in generating DOG images. Although generating depth images for every frame using Depth Anything was not feasible for real-time processing, combining grayscale images and depth background images with equal weighting created an image similar to a depth image through simple image processing on a CPU. The Otsu algorithm was also applied to the generated depth background image to define the RoI region.
The RoI remained unchanged after initialization in the pigsty environment, where a fixed camera was used. This initial setup helped the object detection model identify the areas to detect, improving detection accuracy. The DOG images created from the above components generalized the foreground (pigs) across various environments while reducing the brightness values of the background. The images enabled a clear distinction between the foreground and background, and the DOG images enabled clearer object boundaries, with the green boxes representing the detected objects, resulting in more robust and precise detection even in challenging unseen environments.
The DOG images not only addressed varying lighting conditions effectively but also enhanced object boundaries, resulting in robust and precise detection performance. For example, in the first column, the gray image shows a heat lamp positioned in the upper left corner near the feeding trough, where the brightness of objects decreases as they move further away from the lamp. Despite this variation in lighting conditions within a single image, the DOG image effectively distinguishes the boundaries between the white flooring and objects while maintaining similar brightness levels among the objects. The second column depicts an environment with natural light, where pigs near the light source appear brighter, and those farther away exhibit darker brightness levels. Similarly to the first column, the DOG image equalized the brightness levels among the objects, even under varying lighting conditions within the same image. In the third column, the gray image presents a scenario where the brightness of the pigs and the floor are similar, making differentiation challenging. However, the DOG image clearly separates the objects from the background, demonstrating its effectiveness. Lastly, in the fourth column, the black pig image highlights pigs darker than the floor. The DOG image adjusted the brightness of the black pigs to a level comparable to the white pigs in other DOG images, further enhancing the distinction between objects and improving overall clarity.
5. Discussion
The results of this study demonstrated that the proposed method, which utilized foundation models during the initialization phase with top-view images as input, was capable of operating in real time on a CPU through simple image processing. The DOG images generated by the proposed method showed superior object detection performance compared to that of grayscale images in various pigpen environments, indicating their effectiveness in distinguishing between the foreground and background even in complex settings. However, in tilted-view environments, the performance of DOG images varied depending on the camera angle and perspective transformation method. In certain cases, distinguishing between the foreground and background became challenging, potentially reducing detection accuracy. These limitations highlighted the need for the further development of robust methodologies that could reliably operate in tilted-view scenarios.
Figure 9 presents the results of applying the proposed method to tilted-view scenarios. It shows the outcomes of applying the proposed method to a tilted-view input image after performing perspective transformation, with the green boxes representing the detected objects. In the input image, pigs closer to the camera appeared larger than those farther away. However, in the transformed image, a distortion was observed in which pigs farther from the camera appeared larger than those closer to it, an artifact of the applied perspective transformation. Additionally, incorrect results from the Otsu thresholding of the depth background image led to the generation of RoI images that obscured parts of the pigs. To address this issue, developing appropriate perspective transformation methods or designing a more robust approach that can reliably operate in tilted-view environments could be a topic for future research.
Table 3 presents the performance of different YOLO model versions when tested on unseen environment data. The models were trained on augmented training data, where DOG images were added to the training dataset to enhance robustness against complex and previously unseen conditions. The results demonstrated how the inclusion of DOG images improved the models’ generalization capability and detection accuracy in challenging environments. Models trained on both Gray and DOG images showed an overall improvement in AP50, ranging from 3.6% to 6.6%, compared to models trained on Gray images. This indicated that incorporating DOG images into the training process contributed to the better handling of challenging scenarios and improved overall detection performance.
However, there are two key challenges associated with augmenting the training data. First, while the proposed method performs effectively in top-view scenarios, it fails to achieve consistent results in tilted-view settings. Second, the initialization process must be performed on all training images, which significantly increases the overall training time.
Figure 10 illustrates the CPU and GPU memory usage during the initialization phase of DOG image generation. The initialization step represents the time required for the RoI Setting Module to operate on a single image. To measure this, the models were loaded prior to recording, and the stages of the initialization step were marked with a green line. Although this process was performed only once per pigpen, it took approximately 10 s. The models used were the SAM (sam2 hiera small), LaMa (big lama), and Depth Anything (Depth Anything V2 small). This initialization step was conducted under the same experimental conditions as those of the main experiments.
6. Conclusions
This study proposed a methodology utilizing DOG images and the Depth Anything model to achieve high object detection performance and cost efficiency even in unseen environments. The approach addresses the performance degradation issues of gray image-based detection models in complex backgrounds and unseen environmental conditions. Cost efficiency is significantly improved by generating depth information through Depth Anything without requiring additional depth sensors.
Experimental results demonstrated that DOG images achieved up to a 6.4% increase in AP50 compared to gray images, with a processing time of 3.6 ms on a CPU, approximately 53.8 times faster than the GPU-based Depth Anything’s depth image generation time of 193.7 ms. These findings confirmed that DOG images resolve foreground–background distinction challenges in complex lighting conditions and backgrounds, proving their applicability in real-time object detection systems.
Future research will focus on developing real-time video-based detection and tracking algorithms and enhancing generalization performance across diverse environments to expand the applicability of real-time monitoring systems in areas such as farm management and smart farming. Following approaches like [
19], it is anticipated that generating DOG images on a CPU and processing object detection models such as YOLO on a GPU in a pipeline could enable real-time video monitoring even in embedded board environments. Furthermore, if the proposed method, initially designed for top-view operation, incorporates camera position estimation and 3D modeling for perspective transformation to a top view for the RoI setting, it is also expected to be applicable in tilted-view environments.