1. Introduction
In recent years, object detection has played a crucial role in various applications such as autonomous driving, surveillance systems, and industrial automation. As a result, it has become a central focus of ongoing research in the fields of computer vision and artificial intelligence [1,2,3,4,5,6,7,8,9,10]. Object detection models increasingly rely on high-quality input images, making image quality factors such as illumination, resolution, and noise even more critical in determining detection accuracy [11]. These factors significantly impact the effectiveness of detection algorithms, as issues in illumination, resolution, and noise can distort object features and degrade overall performance [12,13].
Illumination is a crucial factor that defines the shape and features of objects in an image. Variations in illumination, such as low-light conditions or overexposure, can blur object features, reducing detection accuracy. For instance, surveillance systems face significant detection challenges due to lighting differences between day and night. Resolution also plays an important role, as low-resolution images cause objects to appear blurry, reducing the detectability of fine details and small or distant objects. In the case of autonomous vehicles, inadequate resolution can severely hinder detection of distant obstacles. Noise, often caused by image acquisition devices or environmental factors, distorts pixel values, making it challenging to differentiate between objects and backgrounds, especially in low-light environments [14,15,16,17].
While existing image enhancement methods often address individual aspects, such as illumination correction or resolution enhancement, these single-factor approaches may overlook the complex, interdependent nature of image quality factors [18,19]. Recent studies have attempted to tackle these interdependencies with multi-factor enhancement methods. For example, a dual-encoding Y-ResNet has been introduced to mitigate issues like lens flare, which disrupts object boundaries, while GAN-based approaches have been used to generate synthetic images with controlled defects, training models on more diverse data. These advancements indicate the ongoing need for adaptive image enhancement that addresses multiple quality factors to meet the demands of diverse environments [20,21].
This paper proposes a hierarchical image quality improvement process that dynamically prioritizes and adjusts the enhancement order of illumination, resolution, and noise based on their severity levels. The process evaluates each factor’s necessity for improvement through discriminators that analyze brightness, edge strength, and noise distribution, and applies enhancements with intensity levels that are dynamically adapted. Additionally, the process incorporates an iterative learning mechanism, which adjusts weights based on quality assessment results, allowing for continuous optimization across diverse image conditions.
The contributions of this study are as follows: (1) A dynamic prioritization mechanism that determines the improvement order of illumination, resolution, and noise based on calculated severity levels, addressing the complex interplay between these factors. (2) An iterative improvement framework with adaptive weight adjustments, enabling the process to fine-tune the importance of each factor according to its impact on overall image quality and detection accuracy. (3) Extensive validation through experiments, demonstrating enhanced detection performance, especially in challenging environments with varying illumination and noise levels.
This hierarchical approach aims to improve object detection performance by addressing the most severe quality issues first, leading to more robust detection results in real-world applications.
2. Literature Review
Improving object detection through image quality enhancement has long been an important field of research. Various techniques have been proposed to improve factors that degrade image quality, such as illumination, resolution, and noise. This section reviews major studies addressing the impact of these three factors on object detection performance and introduces recent advancements in multi-factor image enhancement techniques.
2.1. Illumination Correction for Object Detection
Illumination variation is one of the factors that most significantly affect object detection. Irregular lighting conditions such as insufficient illumination, shadows, or overexposure can greatly degrade the performance of detection systems. Various methods have been proposed to address this issue. Histogram Equalization (HE) and Adaptive Histogram Equalization (AHE) improve the contrast of an image by redistributing brightness levels, effectively correcting illumination variations [22,23]. However, these techniques often enhance contrast excessively, producing unnatural results [24]. Contrast Limited Adaptive Histogram Equalization (CLAHE) limits the amount of contrast enhancement in local areas, producing more natural and visually pleasing results and thus overcoming the drawbacks of the traditional methods [25]. Recently, deep learning-based illumination correction techniques have emerged and gained significant attention. For example, Retinex-based models separate the reflectance and illumination components of an image, correcting the illumination while preserving details [26,27]. Additionally, methods based on Generative Adversarial Networks (GANs) convert low-light images into bright images, thereby enhancing object detection performance [28,29]. However, these deep learning approaches focus primarily on illumination and are limited in their ability to simultaneously consider other factors such as resolution and noise [30].
2.2. Resolution Enhancement for Object Detection
Resolution directly affects the ability to detect and distinguish the fine details of objects. Low-resolution images lose important structural information, making it difficult for detection algorithms to extract meaningful features. Traditional resolution enhancement methods include bicubic interpolation, Lanczos resampling, and bilinear interpolation, which up-sample images based on neighboring pixel information [31,32,33]. However, these methods often fail to restore sharp edges and the essential high-frequency details of objects. Recent studies have focused on deep learning-based super-resolution techniques that generate high-resolution images from low-resolution ones. Techniques like SRCNN (Super-Resolution Convolutional Neural Network) and EDSR (Enhanced Deep Super-Resolution Network) have been applied to object detection, demonstrating high performance [34,35]. However, these models also tend to focus solely on resolution, often neglecting other quality issues such as noise and illumination.
2.3. Noise Reduction Techniques
Noise in images can distort pixel values and negatively affect object detection performance. Various noise reduction techniques have been developed, with several notable methods among them. Traditional techniques, such as Gaussian smoothing, median filtering, and Wiener filtering, reduce noise by averaging pixel values or removing high-frequency components [36]. However, these methods can lead to the loss of important details. More recent techniques, like Non-Local Means (NLM) filtering and Block Matching and 3D Filtering (BM3D), exploit self-similarity within the image to reduce noise while effectively preserving details [37,38]. Additionally, deep learning-based methods such as DnCNN (Denoising Convolutional Neural Network) and FFDNet learn complex noise patterns, demonstrating excellent noise reduction performance [39,40]. However, these techniques are likewise limited in simultaneously addressing other issues such as illumination and resolution.
2.4. Multi-Factor Image Enhancement
In real-world environments, issues like insufficient illumination, reduced resolution, and noise often occur simultaneously. To address this, multi-factor image enhancement techniques have emerged, aiming to resolve multiple quality issues at once. For instance, the multi-scale Retinex algorithm has gained attention for addressing contrast enhancement and illumination issues simultaneously [41]. Some studies have also attempted to integrate noise reduction techniques to achieve better performance [42]. Frameworks that improve both resolution and noise simultaneously have also been developed; these approaches, often based on GANs, upscale low-resolution images while reducing noise [43,44]. However, these methods still have limitations, such as the need for high-quality training images and the failure to dynamically consider the combined impact of each factor [45,46,47].
2.5. Recent Advances in Image Quality Assessment and Object Detection
In addition to traditional and multi-factor enhancement approaches, recent studies have proposed innovative methods for image quality assessment and object detection, offering new perspectives relevant to our work. For example, No-Reference Image Quality Assessment by Hallucinating Pristine Features [48] introduces a no-reference IQA method that hallucinates pristine features to evaluate degraded images without requiring original reference images. This technique could complement our approach by offering an alternative means of assessing quality, particularly in cases where original images are unavailable for comparison. Integrating this method could potentially enhance the assessment accuracy of noise and distortion in our framework.
Further, object detection models have seen significant advancements, such as the recent findings in DETRs Beat YOLOs on Real-time Object Detection [49]. This study highlights the strengths of DETR models over YOLO models in specific contexts, particularly in terms of accuracy and robustness. While our study utilizes YOLOv8 due to its established performance in real-time applications, comparing our method’s performance on DETR-based models could offer additional insights into the applicability of our hierarchical improvement process across different detection architectures.
Lastly, Pixel-inconsistency Modeling for Image Manipulation Localization [50] provides a technique to detect pixel-level inconsistencies for identifying manipulated content. Although this work primarily addresses image manipulation, the concept of pixel-level analysis aligns with our goal of detecting subtle artifacts introduced by noise and resolution adjustments. Incorporating these techniques could further refine our process, minimizing unintended artifacts during enhancement.
These studies underscore the rapid progress in image quality assessment and object detection, providing valuable reference points for our hierarchical improvement model. Our proposed method is positioned to extend these advances by dynamically addressing the simultaneous challenges posed by illumination, resolution, and noise, prioritizing improvement based on severity to achieve optimal image quality for object detection applications.
3. Environmental Factors and Clustering
To optimize object recognition performance, understanding the environmental factors that impact image quality is crucial. Each factor influences the visual quality of an image and, consequently, affects object recognition performance. This section explains the key environmental factors that impact image quality and discusses a conceptual clustering approach to group these factors systematically, providing a structured foundation for the image quality enhancement strategy.
3.1. Environmental Factors Affecting Image Quality
The main environmental factors that degrade image quality include illumination, resolution, noise, color distortion, contrast, blur, compression artifacts, and other factors like weather, viewing angle, and object size. Each factor uniquely affects the visual quality of an image and plays a critical role in determining the accuracy of object detection.
Illumination: Illumination affects the brightness and color of an image. Depending on the lighting conditions, the appearance of an object may vary, and in dark environments, the object may not be visible enough. Excessively bright or dark lighting can cause the loss of object details or create shadows, making object recognition difficult. Uneven illumination can cause specific areas to be overly bright or dark, impairing recognition performance.
Resolution: Resolution refers to the number of pixels in an image and indicates how well it can represent the details of the image, determining its sharpness and detail. Low resolution causes the image to appear blurry, making pixels noticeable, which hinders the representation of object details and lowers object recognition performance. Conversely, higher resolution results in a sharper image that can capture fine details.
Noise: Noise is unnecessary signals or information caused by imperfections in the image sensor, compression/transmission processes, or shooting environment. Noise degrades the image’s clarity and makes the object’s boundaries unclear, distorting the original image data and reducing object recognition accuracy. Especially in low-light environments, noise occurs more severely, further degrading image quality.
Color Distortion: Color distortion refers to a phenomenon where the original color tone of an image is damaged and displayed differently. This can occur due to various factors such as illumination, camera settings, lens quality, and compression processes. When colors are distorted, the performance of color-based object recognition is reduced. For example, errors can occur in classification tasks based on specific colors.
Contrast: Contrast refers to the difference between the brightest and darkest parts of an image. Low contrast reduces the difference between light and dark areas, making the image look flat and lacking in detail. This makes it difficult to distinguish between the object and the background, reducing object recognition performance. On the other hand, too much contrast can create excessive boundaries, causing the loss of important information in the image.
Blur: Blur is the phenomenon where an image appears blurry, which can occur for various reasons such as movement, focus mismatch, and lens quality issues. Severe blur makes the object’s boundaries and details unclear, reducing recognition performance. This is especially problematic in the recognition of small objects or in boundary-based recognition.
Compression Artifacts: Compression artifacts refer to the phenomenon where data loss occurs during the image compression process, damaging the original image. This mainly occurs in lossy compression formats such as JPEG. Compression artifacts can cause blocky artifacts or noise in the image, distorting the shape of the object and reducing recognition performance. Especially in highly compressed images, boundaries may become smeared, or color distortion may occur.
Distortion: Distortion refers to abnormal deformation that occurs due to lens asphericity, incorrect focus, uneven illumination, or image processing. It causes the shape of the object to appear differently than it actually is. When an object is distorted, the recognition system has difficulty perceiving the object’s actual form, leading to performance degradation, especially in shape-based classification.
Exposure: Exposure refers to the amount of light that the image sensor is exposed to. Too much exposure causes overexposure, while too little causes underexposure. Overexposure can result in areas that are too bright, losing detail in those pixels, while underexposure can turn dark areas black, losing detail. This can cause important features of the object to be missed, making information indistinguishable and reducing recognition performance.
Viewing Angle: Viewing angle refers to the angle at which the camera captures the object. Images captured from the front, side, or oblique angles can all look different. When the viewing angle changes, the shape, size, and even the identifiable features of the object can change. In particular, objects captured from angles not included in the training data are difficult to recognize.
Background Complexity: Background complexity refers to how complex and varied the background around the object is. A complex background makes it difficult to distinguish between the object and the background, reducing recognition performance. The problem is exacerbated when the color or texture of the object is similar to the background.
Object Position and Size: The position and size of the object in the image are important factors for object recognition. If the object is at the edge of the image or very small, it is difficult for the recognition system to accurately detect and classify the object. If the object is too large, some parts may be cut off, leading to the loss of important features.
Camera Settings: Camera settings include focus, shutter speed, and ISO sensitivity, which directly affect image quality. Incorrect camera settings can degrade image quality, negatively affecting object recognition performance. For example, if the focus is incorrect, the image will appear blurry, and too-high ISO sensitivity increases noise.
Object Characteristics: The characteristics of the object itself, such as reflectivity, transparency, color, and texture, affect recognition performance. Reflective objects can reflect light, distorting their shape, and transparent objects are difficult to distinguish from the background, making recognition difficult. A complex texture can make it difficult to distinguish the object’s boundaries.
Weather: Weather conditions affect the environment in which the image is captured. For example, rain, snow, fog, or sunlight can affect image quality. Rain or snow obstructs the view, fog reduces image sharpness, and strong sunlight can cause overexposure. All these factors can reduce object recognition performance.
Movement Speed: The movement speed of the object or the camera can cause blur in the image. Fast-moving objects produce motion blur, smearing object boundaries and reducing object recognition performance.
Occlusion: Occlusion refers to the phenomenon where an object is partially covered by another object, making only part of the object visible. When only part of an object is visible, it becomes difficult for the object recognition system to accurately recognize the entire object. Problems arise especially when important features are obscured.
Reflection: Reflection refers to the phenomenon where light or images of other objects around the object are reflected on the surface. Reflected light or images can distort the shape or color of the object, causing confusion in recognition performance. This issue can arise especially when the surface of glass or objects is shiny.
Specular Highlighting: Specular highlighting refers to the phenomenon where light is strongly reflected on the surface of a glossy object, causing specific areas to appear very bright. This can distort the color information or shape of the object. The recognition system may mistake these bright spots or areas for part of the object.
Contour Confusion: Contour confusion refers to the phenomenon where the object’s contours appear ambiguous or similar to the background, especially in complex backgrounds. When the boundary between the object and the background becomes unclear, it becomes difficult to recognize the object’s shape. This can lead to performance degradation, especially in boundary-based recognition algorithms.
Object Surface Texture: The surface texture of an object is an important factor in forming the object’s features in an image. If the surface texture is complex or irregular, the shape of the object may become unclear, making recognition difficult. Conversely, a simple texture is advantageous for object recognition.
These environmental factors each have a significant impact on image quality and, as a result, directly or indirectly affect the performance of deep learning-based object detection systems. Understanding and controlling these factors are key to achieving high detection performance.
3.2. Selection of Environmental Factors (Clustering)
Improving image quality for object detection requires a targeted approach to address relevant environmental factors based on their impact. Due to the interdependencies among these factors, clustering similar factors together can help in systematically addressing them and prioritizing those with the most substantial influence on recognition performance.
The clustering process here is conceptual rather than algorithmic, aiming to categorize environmental factors into five groups (or “clusters”) based on their functional impact on image quality and object recognition. This allows us to focus enhancement efforts on clusters with the highest influence on quality, specifically the Image Quality Factors cluster.
In this study, the factors are grouped into the following clusters (illustrated in Figure 1).
The first cluster, Image Quality Factors, includes the factors that most directly affect the visual quality of an image. Specifically, it includes illumination, resolution, noise, contrast, blur, compression artifacts, color distortion, exposure, distortion, specular highlighting, and object surface texture. These factors are crucial in determining the overall visual quality of an image and play a key role in the process by which an object recognition system processes images. Illumination adjusts the brightness and color of the image to clarify the shape and boundaries of objects, while resolution expresses the details of the image. Noise distorts the boundaries or details of objects. Contrast and color distortion affect the clarity and color representation of the image, and compression artifacts can degrade quality due to losses occurring during the image compression process. Specular highlighting and object surface texture define the visual characteristics of objects. These factors work together to determine the overall quality of an image.
This cluster is a group of factors that need to be directly improved to enhance image quality, and improving these factors can significantly improve object recognition performance.
The second cluster includes factors related to the physical environment and camera settings at the time of image capture. Specifically, it includes illumination, camera settings, exposure, viewing angle, weather, reflection, and specular highlighting. In this cluster, illumination determines the amount of light in the shooting environment, and camera settings include factors such as focus, shutter speed, and ISO sensitivity. These settings have a significant impact on image quality, and appropriate settings are necessary to prevent issues such as blur or overexposure. Weather and reflection represent the impact of the external environment on image quality, and weather conditions like strong sunlight, rain, or snow can significantly degrade image quality.
The factors in this cluster are mainly variables directly related to the shooting environment, and strategies are needed to minimize image quality degradation caused by environmental factors.
The third cluster consists of factors related to the composition of objects and the background within an image. It includes background complexity, object position and size, object characteristics, occlusion, contour confusion, object surface texture, and reflection. Background complexity is a crucial factor in distinguishing between the object and the background; the more complex the background, the lower the object recognition performance. The position and size of the object determine the difficulty of recognition depending on how the object is positioned in the image and its size. Small objects are difficult to detect, and large objects can be partially cut off, hindering recognition.
This cluster focuses on scene composition and the physical characteristics of objects, and improving these factors can contribute to more clearly distinguishing between the object and the background, thereby enhancing recognition performance.
The fourth cluster consists of factors that affect movement or dynamic situations within an image. It includes movement speed, blur, occlusion, viewing angle, and distortion. Movement speed refers to the speed at which the camera or object moves; fast movement causes blur, making object recognition difficult. Occlusion represents the impact on recognition performance when an object is obscured by another object. If an important part is obscured, the overall recognition performance of the object may be reduced.
This cluster plays an important role in developing strategies to improve object recognition in environments with a lot of movement or change.
Lastly, the fifth cluster focuses on how image quality is affected by external environments and shooting conditions. It includes weather, illumination, background complexity, color distortion, noise, and compression artifacts. Weather and illumination are variables caused by the external environment. Lighting conditions can vary depending on the weather, which has a significant impact on the quality of the image. Noise and compression artifacts are factors that can be caused by external conditions and shooting equipment, which can reduce the reliability of the image.
This cluster focuses on controlling and optimizing factors caused by the environment to improve shooting conditions.
By organizing factors into these clusters, we enable a targeted improvement approach. For instance, focusing on the Image Quality Factors cluster directly addresses visual quality, which can offset or amplify the effects of other factors, improving recognition performance in diverse conditions.
Importance of Adopting the Image Quality Factors Cluster
The Image Quality Factors cluster directly impacts visual quality and is thus prioritized for improvement. This cluster’s elements (illumination, resolution, noise, etc.) are the most visually apparent and foundational for clear and accurate object recognition. By focusing on this cluster, we can significantly enhance recognition performance while reducing the impact of other clustered factors, such as movement or environmental conditions.
4. Hierarchical Improvement Process Design for Environmental Factors
The hierarchical improvement proposed in this paper refers to the method of sequentially enhancing various factors according to their priority to improve image quality. The three main environmental factors—illumination, resolution, and noise—are each independently evaluated and then sequentially improved in a hierarchical structure that accounts for their interactions. Since illumination, resolution, and noise interact with each other and have low independence, it is crucial to clearly define the criteria for determining whether each factor needs improvement. This chapter explains the criteria for each factor and proposes specific methods to determine whether improvement is needed. On this basis, a process is designed to identify and improve environmental factors in the input image.
Figure 2 illustrates the hierarchical improvement process for environmental factors, representing the overall procedure for dynamically enhancing the quality of the original image. The process is divided into three main parts: “Environmental Factor Identification”, “Hierarchical Image Enhancement”, and “Quality Assessment and Weight Update”.
Figure 3 provides an overall flowchart of the improvement process, breaking down the content of the diagram into detailed steps. It covers the process from the beginning to the output of the final improved image. The improvement process starts with the analysis of the input image data. This data goes through the Environmental Factor Identification phase, where three discriminators (illumination, resolution, and noise) perform a characteristic analysis of each environmental factor.
In the Environmental Factor Identification phase, the Illumination Discriminator calculates the brightness deviation and contrast of the image to determine whether illumination improvement is needed. The Resolution Discriminator analyzes the image size (number of pixels), edge strength, and texture uniformity to assess the need for resolution enhancement. The Noise Discriminator calculates the noise distribution and high-frequency noise ratio to evaluate the necessity of noise reduction. The result of each discriminator extracts whether improvement is needed and its severity.
For factors deemed “necessary for improvement”, severity is calculated and used to set the priority and intensity of the improvement process. For instance, if the Illumination Discriminator determines that brightness or contrast enhancement is necessary, the severity of that factor is calculated, setting the illumination severity. Similarly, the severity of resolution and noise is calculated based on their respective discriminators. If “improvement is unnecessary”, no enhancement is performed for that factor, and the severity is set to 0.
The next step is Hierarchical Image Enhancement. In this phase, the improvement priority and intensity for each factor are set based on the necessity and severity obtained from the discriminators. The higher the severity, the higher the priority of the factor. The severity values are generally designed to range between 0 and 1 for most images, enabling a balanced response across a wide range of typical image conditions. However, in extreme cases—such as images with very low brightness, significant noise, or exceptionally low resolution—severity values may exceed this range, triggering the highest level of intensity for improvement. The intensity of improvement is divided into three levels as follows:
Intensity 1: [0 < severity < 0.3]
Intensity 2: [0.3 ≤ severity < 0.6]
Intensity 3: [0.6 ≤ severity]
This structured intensity division ensures that minor issues receive lighter processing, while severe degradation in extreme cases activates more intensive enhancement procedures, optimizing image quality across varying conditions.
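As a minimal sketch of this intensity division, the mapping from a severity score to an intensity level can be written as a simple threshold function (the function name is illustrative, not part of the process specification):

```python
def improvement_intensity(severity: float) -> int:
    """Map a severity score to the three intensity levels defined above."""
    if severity <= 0:
        return 0  # improvement unnecessary; the discriminator set severity to 0
    if severity < 0.3:
        return 1  # Intensity 1: light processing for minor issues
    if severity < 0.6:
        return 2  # Intensity 2: moderate processing
    return 3      # Intensity 3: strongest processing, covering extreme severities above 1
```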
Improvements are performed sequentially, starting with the factor with the highest priority, and the improvement process is applied in order to illumination, resolution, and noise. For example, if illumination has the highest severity among the three factors, illumination enhancement is performed first. The Illumination Enhancement phase optimizes the brightness and contrast of the image using illumination correction techniques. The Resolution Enhancement phase applies resolution correction algorithms to better represent image details. The Noise Reduction phase applies noise removal algorithms to reduce noise in the image.
The subsequent step involves Quality Assessment and Weight Update. In this phase, the Quality Assessment Module measures the degree of improvement after each enhancement step, and the weights are dynamically updated based on the results. Based on the quality assessment results, it is determined whether the improvement was successful, and if necessary, the learning parameter values are adjusted so that the dynamically changed weights are reflected in the next improvement pass. This enhances the efficiency of subsequent improvement passes and allows the optimal improvement strategy to be applied. If there are no more factors requiring improvement, or if the process exceeds the specified number of iterations, the final improved image is output and the process terminates.
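The overall control flow of Figures 2 and 3 can be condensed into the following sketch. The component functions are passed in as callables corresponding to the discriminators, enhancement stages, quality assessment, and weight update described in the rest of this chapter; their exact signatures are our assumption:

```python
def hierarchical_enhancement(img, discriminators, enhancers,
                             assess_quality, update_weights, max_iters=3):
    """One possible realization of the loop in Figures 2 and 3.

    discriminators: {factor: img -> severity}, 0 meaning no improvement needed
    enhancers:      {factor: (img, severity) -> enhanced image}
    assess_quality: (factor, before_img, after_img) -> assessment result
    update_weights: (factor, assessment) -> None
    """
    for _ in range(max_iters):
        # Environmental Factor Identification: severity for each factor.
        severities = {f: d(img) for f, d in discriminators.items()}
        pending = {f: s for f, s in severities.items() if s > 0}
        if not pending:
            break  # no factor requires improvement
        # Hierarchical Image Enhancement: highest severity first.
        for factor in sorted(pending, key=pending.get, reverse=True):
            before = img
            img = enhancers[factor](img, pending[factor])
            # Quality Assessment and Weight Update after each enhancement step.
            update_weights(factor, assess_quality(factor, before, img))
    return img
```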
4.1. Criteria for Improvement Necessity of Each Factor
In situations where the independence among environmental factors is low, it is crucial to establish clear criteria for accurately determining the need for improvement. Illumination, resolution, and noise are key factors that affect the pixel environment of the entire image. Although they do not directly cause the loss of physical information in the image, they can indirectly degrade object recognition performance. For this reason, it is essential to establish clear criteria for determining the necessity of improvement for each factor. This study proposes the following specific criteria to determine the necessity for improvement of each factor:
4.1.1. Criteria for Illumination Improvement Determination
Illumination has a significant impact on the overall brightness and contrast of an image, and improper illumination can reduce the accuracy of object recognition. To determine the need for improvement of the illumination factor, the following criteria are proposed:
The brightness deviation of an image is measured by calculating the average brightness of the entire image and then determining the deviation from the ideal brightness value (the mid-level of the 8-bit range, 128). A larger deviation indicates that the image is either too dark or too bright, which is considered a signal that improvement is needed. To calculate the brightness deviation, first compute the average brightness value of all pixels, and then find the absolute difference between this value and the mid-level (128). For example, if the average brightness is 100, the brightness deviation is “|100 − 128| = 28”. If this deviation value exceeds a certain threshold, it can be determined that illumination improvement is necessary. Equation (1) represents the formula for calculating brightness deviation, where μB is the mean brightness of the image and L is the number of brightness levels (256 for an 8-bit image, so the mid-level is L/2 = 128). Equation (2) is the formula for determining the necessity of brightness improvement. By using a user-defined brightness deviation threshold, Tb, we can consider the central concentration of the brightness histogram to determine the need for improvement.
Contrast is an indicator that represents the difference between the bright and dark areas of an image. By analyzing the overall contrast level of the image, we can determine whether there is an illumination imbalance. Low contrast may indicate improper illumination or excessive shadows. On the other hand, excessively high contrast can lead to the loss of image details. To evaluate the contrast level of an image, a specific threshold is set, and if the contrast deviates from this threshold, it is considered a signal that improvement is needed. Equation (3) is the formula for measuring contrast, where Imax and Imin represent the maximum and minimum brightness values of the image, respectively. Equation (4) is the formula for determining the necessity of contrast improvement. Using a user-defined contrast threshold, Tc, we can assess the need for improvement.
In the end, for the illumination factor, an “OR” gate is applied to the results of the brightness deviation and contrast improvement necessity determinations (True/False) to ultimately decide whether improvement is required.
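Assuming Equation (1) is the absolute difference between the mean brightness and the mid-level L/2 = 128, and Equation (3) the difference between the maximum and minimum brightness, the illumination discriminator can be sketched as follows (threshold defaults are the values later given in Section 4.3.1):

```python
import cv2
import numpy as np

def illumination_discriminator(img_bgr, t_b=64.0, t_c=50.0):
    """Illumination discriminator: brightness deviation and contrast,
    combined with an OR gate (Eqs. (1)-(4))."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    brightness_dev = abs(float(gray.mean()) - 128.0)   # Eq. (1)
    contrast = float(gray.max() - gray.min())          # Eq. (3)
    needs_brightness = brightness_dev > t_b            # Eq. (2)
    needs_contrast = contrast < t_c                    # Eq. (4), low-contrast case
    return (needs_brightness or needs_contrast), brightness_dev, contrast
```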
4.1.2. Criteria for Resolution Improvement Determination
Resolution plays a crucial role in representing the details of an image, and low-resolution images can negatively impact object recognition performance. The criteria for determining the necessity of resolution improvement are as follows:
Image size or pixel count is a direct indicator of resolution and is used to detect low-resolution images. If the horizontal or vertical resolution (pixel count) is lower than the set minimum resolution threshold, it is considered low resolution and requires improvement. For example, if the horizontal or vertical resolution of an image falls below 256 pixels, it is considered low resolution, signaling the need for upscaling. Equation (5) is the formula for measuring the size of the input image, where W and H represent the width and height of the image, respectively. Equation (6) is the formula for determining the necessity of image size improvement. Using a user-defined image size threshold, Ti, we can assess the need for improvement.
Edges in the image are detected to evaluate sharpness (strength). If edges appear indistinct and blurry, it can be determined that the resolution is low. To achieve this, edge detection algorithms like the Laplacian filter are used to quantify the sharpness of the image. A specific threshold is set, and if the sharpness value falls below this threshold, it is decided that sharpness enhancement is necessary. Equation (7) is the formula for measuring edge strength, where I represents the input image. Equation (8) is the formula for determining the necessity of edge strength improvement. Using a user-defined edge strength threshold, Te, we can assess the need for improvement.
Texture uniformity is an indicator that evaluates how consistently and evenly textures are distributed within an image. In cases of low resolution, the details of the texture may appear unclear or blurred, resulting in decreased texture uniformity. To assess this, the Local Binary Pattern (LBP) method is used to analyze the texture of the image. If the texture is found to be non-uniform and irregularly distributed, it is determined that resolution enhancement is necessary [51]. This serves as an important criterion for determining the need for resolution improvement and helps to express image details more clearly. Equation (9) is the formula for measuring texture uniformity. Equation (10) is the formula for determining the necessity of texture improvement. Using a user-defined texture uniformity threshold, Tt, we can assess the need for improvement.
In the end, for the resolution factor, an “AND” gate is applied to the results of the image size, edge strength, and texture improvement necessity determinations (True/False) to ultimately decide whether improvement is required. Specifically, the resolution factor applies an “AND” gate instead of an “OR” gate. This is to prevent indiscriminate image upscaling and to avoid overhead caused by a drastic increase in computational load during the improvement process.
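A corresponding sketch of the resolution discriminator is given below. The Laplacian-based edge strength follows the text directly; the LBP uniformity statistic is a stand-in, since Equation (9) is not reproduced here, and any LBP-based uniformity measure on a comparable scale could be substituted:

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def resolution_discriminator(img_bgr, t_i=4_000_000, t_e=15.0, t_t=15.0):
    """Resolution discriminator: image size, edge strength, and texture
    uniformity, combined with an AND gate (Eqs. (5)-(10))."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    size_low = (w * h) < t_i                                               # Eqs. (5)-(6)
    edge_strength = float(np.abs(cv2.Laplacian(gray, cv2.CV_64F)).mean())  # Eq. (7)
    edge_low = edge_strength < t_e                                         # Eq. (8)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp.ravel(), bins=np.arange(11))                # patterns 0..9
    hist = hist / hist.sum()
    texture_uniformity = 100.0 * float(hist.std())                         # stand-in for Eq. (9)
    texture_low = texture_uniformity < t_t                                 # Eq. (10)
    # AND gate: all three indicators must signal low resolution,
    # preventing indiscriminate upscaling.
    return size_low and edge_low and texture_low
```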
4.1.3. Criteria for Noise Improvement Determination
Noise is a major factor that degrades image quality, and a high level of noise can negatively impact object recognition performance. The criteria for determining the necessity of noise improvement are as follows:
To quantitatively evaluate noise distribution, the BRISQUE (Blind Referenceless Image Spatial Quality Evaluator) algorithm is used [52]. BRISQUE is an objective metric designed to evaluate the visual quality of an image without any reference to the original image. It is sensitive to various quality degradation factors, especially noise, making it effective for measuring image noise. This algorithm extracts features based on local contrast normalization and the Natural Scene Statistics (NSS) of the image and predicts an image quality score through a Support Vector Regression (SVR) model. It ultimately quantifies the degree of image distortion to produce a quality score. Generally, a higher BRISQUE score indicates lower image quality and a higher amount of noise. In this paper, the BRISQUE score is used as an indicator for measuring noise, and it is compared against a specific threshold. If the score exceeds this threshold, it is determined that noise removal is necessary. Equation (11) represents the BRISQUE score for measuring noise distribution, and Equation (12) is the formula for determining the necessity of noise improvement. Using a user-defined noise distribution threshold, Tn, we can assess the need for improvement.
In the frequency domain of the image, high-frequency noise components are detected to evaluate the level of noise. High-frequency components include not only the details of the image but also noise, and the amount of noise in the high-frequency region can be measured through frequency analysis. Using the Fourier transform, the frequency spectrum of the image is analyzed, and if an excessive amount of noise is detected in the high-frequency band, it is determined that noise improvement is needed [53]. In this case, if the detected noise exceeds the threshold, it is considered necessary for improvement. Equation (13) is the formula for measuring the high-frequency noise ratio based on the Fourier transform, where fc represents the cutoff frequency for the high-frequency band. Equation (14) is the formula for determining the necessity of high-frequency noise improvement. Using a user-defined high-frequency noise ratio threshold, Tf, we can assess the need for improvement.
In the end, for the noise factor, an “OR” gate is applied to the results of the noise distribution and high-frequency noise improvement necessity determinations (True/False) to ultimately decide whether improvement is required.
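The noise discriminator can be sketched in the same style. The BRISQUE score is obtained from an external implementation (e.g., opencv-contrib's quality module or the brisque package), passed in here as a callable; the cutoff fraction defining the high-frequency band is an assumed value:

```python
import cv2
import numpy as np

def high_frequency_ratio(gray, cutoff_frac=0.25):
    """Eq. (13): share of spectral energy beyond a cutoff radius fc in the
    2-D Fourier spectrum. cutoff_frac is an assumption."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float32)))
    power = np.abs(spectrum) ** 2
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    fc = cutoff_frac * min(h, w) / 2   # cutoff radius for the high-frequency band
    return float(power[radius > fc].sum() / power.sum())

def noise_discriminator(img_bgr, brisque_fn, t_n=30.0, t_f=0.1):
    """Noise discriminator: BRISQUE score OR high-frequency noise ratio
    (Eqs. (11)-(14)). brisque_fn is any callable returning a BRISQUE score."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    score = brisque_fn(img_bgr)              # Eq. (11)
    ratio = high_frequency_ratio(gray)       # Eq. (13)
    return (score > t_n) or (ratio > t_f)    # OR gate over Eqs. (12) and (14)
```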
Based on the above determination criteria, each discriminator performs a quantitative evaluation of the environmental factors to measure severity.
4.2. Environmental Factor Discriminators: Severity Calculation
Environmental factor discriminators evaluate factors that directly affect image quality, such as illumination, resolution, and noise, to determine if improvement is needed and to quantitatively calculate their severity. Severity is a key element that determines the priority and intensity of improvement tasks for each factor, making it an essential process for efficient image quality enhancement. This section provides a detailed explanation of the methods for calculating the severity of major factors like illumination, resolution, and noise.
4.2.1. Illumination Severity
Illumination severity analyzes the lighting conditions related to the overall brightness and contrast of an image to determine the necessity of improvement and calculate its severity. It directly impacts image visibility and object recognition performance, and appropriate illumination can enhance image quality.
Equation (15) is the formula for calculating the severity of the illumination factor. The severity of the illumination factor, SIllumination, is expressed as a weighted sum of brightness deviation and contrast. Here, αI and βI are the weights that reflect the importance of each element in the illumination factor. These weights are used to relatively evaluate the impact of brightness deviation and contrast on illumination severity. Through this weighted sum, the necessity for illumination improvement is quantified, and the priority and intensity of illumination enhancement tasks are determined based on the severity.
4.2.2. Resolution Severity
Resolution severity evaluates resolution-related elements such as image sharpness, size, and texture uniformity to determine whether resolution improvement is necessary and to calculate its severity. Resolution is a crucial factor in representing the details of an image and identifying objects.
Equation (16) is the formula for calculating the severity of the resolution factor. The severity of the resolution factor, SResolution, is expressed as a weighted sum of the analysis results of image size, edge strength, and texture uniformity. Here, αR, βR, and γR are the weights that reflect the importance of each element of the resolution factor. These weights are used to relatively evaluate the impact of image size, edge strength, and texture uniformity on resolution severity. Through this weighted sum, the necessity and severity of resolution improvement are quantified, and based on this value, the priority and intensity of the improvement tasks are determined.
4.2.3. Noise Severity
Noise severity evaluates the level of noise present in the image to measure the necessity and severity of improvement. Noise degrades the visual quality of the image and can negatively impact object detection.
Equation (17) is the formula for calculating the severity of the noise factor. The severity of the noise factor, SNoise, is expressed as a weighted sum of the analysis results of noise distribution (BRISQUE score) and high-frequency noise ratio. Here, αN and βN are the weights that reflect the importance of each element in the noise factor. These weights are used to relatively evaluate the impact of noise distribution and high-frequency noise ratio on severity. This weighted sum quantifies the severity of noise, and based on this value, the priority and intensity of the improvement tasks are determined.
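Equations (15)–(17) share the same weighted-sum form. The sketch below uses made-up indicator values already normalized to [0, 1]; the normalization of the raw measurements is an assumption, since the paper defines the weights but not the scaling of each term:

```python
def weighted_severity(terms, weights):
    """Weighted-sum form shared by Eqs. (15)-(17)."""
    return sum(w * t for w, t in zip(weights, terms))

# Illustrative normalized indicator values with the initial weights of Section 4.3.2.
s_illum = weighted_severity([0.4, 0.7], [0.5, 0.5])            # Eq. (15)
s_resol = weighted_severity([0.8, 0.5, 0.3], [0.4, 0.4, 0.2])  # Eq. (16)
s_noise = weighted_severity([0.6, 0.2], [0.6, 0.4])            # Eq. (17)
print(s_illum, s_resol, s_noise)  # approximately 0.55, 0.58, 0.44
```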
The Role of Weights in Severity Calculation Formulas
By using weights in the severity calculation formulas for each factor, the relative importance of each element affecting the severity can be reflected.
1. Reflecting the Importance of Factors: Not all factors may have the same level of importance. Weights adjust the impact of each element on the overall severity, helping to determine which element plays a more critical role. For instance, in the case of resolution, if edge strength is more important than texture uniformity, a higher weight can be assigned to βR.
2. Establishing a Flexible Improvement Strategy: By adjusting the weights, a flexible improvement strategy that can adapt to various image characteristics and situations can be developed. This helps identify which element needs more focused improvement in a specific environment. For example, in certain images, the proportion of darkness may be a more significant issue than brightness deviation in illumination. By adjusting the weights, these differences can be accounted for.
4.3. Image Improvement Strategy
4.3.1. Setting Thresholds for Each Element
Thresholds for each element are crucial criteria for determining the need for improvement of each factor. These thresholds are set based on general image quality assessments and empirical data and have a direct impact on the characteristics of the image and the accuracy of object detection. Setting appropriate thresholds is vital for enhancing the effectiveness of the improvement process, allowing it to adapt to various image environments.
For illumination, the following two elements were considered. First, the threshold Tb for brightness deviation is set to 64. This value indicates how much the overall brightness of the image deviates from the median value of 128. A value of 64 is chosen to detect noticeable illumination imbalances, and if the deviation exceeds this range, illumination improvement is deemed necessary. Secondly, the threshold Tc for contrast is set to 50. Contrast represents the difference between the bright and dark areas of an image, and 50 is set as a criterion to distinguish between images with sufficient contrast and those without. This value is empirically derived from various images, and if the contrast is below 50, illumination improvement is considered necessary.
For resolution, the threshold Ti for image size is set to 4,000,000 pixels. This corresponds to a resolution of approximately 2000 × 2000, and images below this resolution are more likely to lose the detailed features of objects, so this serves as the basis. The threshold Te for edge strength is set to 15, which is derived as the average edge strength calculated using the Laplacian filter. If the edge strength is below 15, it is determined that the image lacks sharpness, indicating a need for resolution enhancement. Lastly, the threshold Tt for texture uniformity is set to 15, based on an analysis using the Local Binary Pattern (LBP). This value is set as a criterion for determining whether the texture of the image is sufficiently expressed.
For noise, thresholds are set based on two elements. First, the threshold Tn for the BRISQUE score is set to 30. The BRISQUE score evaluates the overall quality of the image, with higher scores indicating lower quality. The value of 30 is chosen as a criterion that can objectively detect quality degradation in various images. Secondly, the threshold Tf for the high-frequency noise ratio is set to 0.1. High-frequency noise appears in the high-frequency components of an image and is measured through frequency analysis. This value indicates the proportion of high-frequency noise in the total frequency, and if it exceeds this value, noise improvement is considered necessary.
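For reference, these thresholds can be collected into a single configuration object (the dictionary and key names are ours):

```python
# Threshold configuration from Section 4.3.1; all values are user-adjustable.
THRESHOLDS = {
    "T_b": 64,         # brightness deviation from the mid-level 128
    "T_c": 50,         # minimum acceptable global contrast
    "T_i": 4_000_000,  # minimum pixel count (about 2000 x 2000)
    "T_e": 15,         # mean Laplacian edge strength
    "T_t": 15,         # LBP texture uniformity
    "T_n": 30,         # maximum acceptable BRISQUE score
    "T_f": 0.1,        # maximum acceptable high-frequency noise ratio
}
```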
4.3.2. Initial Weight Setting
Initial weights are set to reflect the relative importance of each element. These weights quantify the impact of each element on image quality and play a crucial role in enhancing the effectiveness of the improvement process. They are established based on various experiments and empirical data and can be adjusted depending on the situation.
In the illumination severity formula, the weights for Brightness Deviation and Contrast were set equally at 0.5. Brightness Deviation (αI = 0.5) indicates how much the brightness state of the image deviates from the median value. Since maintaining illumination balance is important, a high weight was assigned to this element. Contrast (βI = 0.5) represents the difference between the bright and dark areas of an image, and sufficient contrast helps in more clearly recognizing objects. Therefore, this element was also given an equal weight when considering illumination improvement.
For the resolution severity formula, weights were assigned to three elements. Image Size (αR = 0.4) was assigned a weight of 0.4 because the size of the image is important for representing the details of objects. This means that while resolution has a significant impact on image quality, it must be considered in balance with other elements. Edge Strength (βR = 0.4) reflects the sharpness and detail of the image and plays a crucial role in object identification, so it was also assigned a weight of 0.4. Texture Uniformity (γR = 0.2) is important for expressing the texture of the image, but it is relatively less important than the other elements, so a weight of 0.2 was set to control its influence on texture improvement.
In the noise severity formula, weights were set for the BRISQUE Score (αN = 0.6) and High-Frequency Noise (βN = 0.4). The BRISQUE Score evaluates the overall quality of the image, and images with high noise have higher scores. Thus, a weight of 0.6 was assigned to this element to increase sensitivity to noise. High-Frequency Noise, which appears in the high-frequency components of the image and can distort the details of the image, was assigned a weight of 0.4. This allows for the improvement process to consider various aspects of noise.
The initially set weights can be dynamically adjusted during the image improvement process and are designed to achieve optimal improvement results by applying them to various image environments.
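These starting values can likewise be expressed as a configuration that the update rules of Section 4.3.5 adjust between iterations (dictionary layout is ours):

```python
# Initial severity weights from Section 4.3.2.
INITIAL_WEIGHTS = {
    "illumination": {"alpha_I": 0.5, "beta_I": 0.5},                  # brightness dev., contrast
    "resolution":   {"alpha_R": 0.4, "beta_R": 0.4, "gamma_R": 0.2},  # size, edges, texture
    "noise":        {"alpha_N": 0.6, "beta_N": 0.4},                  # BRISQUE, high-frequency
}
```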
4.3.3. Setting Improvement Intensity and Image Enhancement
In the image enhancement stage, the improvement intensity is set according to the severity of each factor, and based on this, appropriate enhancement algorithms are applied. The intensity of improvement is flexibly adjusted depending on the severity, and various algorithms are employed to achieve the optimal improvement for each level of intensity. Although more efficient or cutting-edge algorithms exist, the specific algorithms for illumination, resolution, and noise improvement in this paper are chosen for the following reasons. First, the selected algorithms have relatively low computational complexity and fast processing speeds, making them suitable for real-time image enhancement applications. Second, these algorithms provide stability and reliability across various image environments and have been proven effective in many use cases. Third, since the parameters of the selected algorithms can be adjusted, they allow fine-tuned enhancements in response to different image conditions, enabling flexible improvement strategies.
For illumination enhancement, three algorithms are applied depending on the severity level. The first intensity level (severity < 0.3) uses Histogram Equalization (HE), which improves the overall contrast by equalizing the brightness distribution. This is a simple yet effective initial enhancement method, targeting images where the lighting is relatively evenly distributed without causing significant distortion in contrast. The second intensity level (0.3 ≤ severity < 0.6) applies gamma correction, which balances the bright and dark areas of the image by adjusting mid-tone brightness, enhancing contrast without significant detail loss. A gamma value of 0.5 is used to maintain the image’s details. Finally, for the third intensity level (severity ≥ 0.6), Contrast-Limited Adaptive Histogram Equalization (CLAHE) is used, which adjusts local contrast to suppress noise while improving overall lighting. CLAHE is particularly useful in images with severely unbalanced lighting, with a clip limit of 2.0 to prevent noise amplification and achieve natural enhancement.
In resolution improvement, three levels of algorithms are applied to enhance image sharpness and detail representation depending on the severity. The first intensity (severity < 0.3) uses linear interpolation, which enlarges the image by a (1.3×) factor in width and height, adding soft details with minimal computation. Linear interpolation is well-suited for real-time applications due to its speed and smooth results. For the second intensity (0.3 ≤ severity < 0.6), bicubic interpolation is applied to enlarge the image by a (1.6×) factor. Bicubic interpolation, using information from 16 surrounding pixels, produces smoother and more natural results than linear interpolation while maintaining some level of original detail. Lastly, for the third intensity (severity ≥ 0.6), B-spline interpolation is used to enlarge the image by a (2×) factor, providing even smoother results, especially in curved or edge areas, while maximizing detail preservation. This method offers high-quality enlargement but requires more computation, making it suitable for cases of high severity.
For noise reduction, three algorithms are applied based on the severity of the noise. The first intensity level (severity < 0.3) uses Gaussian blur with a (1 × 1) kernel to gently reduce fine noise by averaging neighboring pixels. This method is simple and fast, suitable for minimal noise. At the second intensity (0.3 ≤ severity < 0.6), the non-local means filter is applied, which effectively reduces noise while preserving texture and detail by referencing the entire image. A low filter strength of three is set to accommodate the iterative improvement process. Finally, for the third intensity (severity ≥ 0.6), a stronger non-local means filter with a strength of seven is used to aggressively mitigate severe noise that significantly hinders detail, focusing on restoring critical image information.
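The three enhancement stages can be sketched with standard OpenCV/SciPy calls. The paper does not prescribe specific library implementations, so the calls below are assumptions; in particular, a 3 × 3 Gaussian kernel is used at the first noise intensity, since a literal 1 × 1 kernel would leave the image unchanged:

```python
import cv2
import numpy as np
from scipy import ndimage

def enhance_illumination(img_bgr, severity):
    """Illumination enhancement by intensity level (Section 4.3.3),
    applied to the luma channel to preserve color."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    if severity < 0.3:                       # Intensity 1: histogram equalization
        ycrcb[..., 0] = cv2.equalizeHist(ycrcb[..., 0])
    elif severity < 0.6:                     # Intensity 2: gamma correction, gamma = 0.5
        lut = (((np.arange(256) / 255.0) ** 0.5) * 255).astype(np.uint8)
        ycrcb[..., 0] = cv2.LUT(ycrcb[..., 0], lut)
    else:                                    # Intensity 3: CLAHE with clip limit 2.0
        ycrcb[..., 0] = cv2.createCLAHE(clipLimit=2.0).apply(ycrcb[..., 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

def enhance_resolution(img_bgr, severity):
    """Resolution enhancement: 1.3x linear, 1.6x bicubic, or 2x spline upscaling."""
    if severity < 0.3:
        return cv2.resize(img_bgr, None, fx=1.3, fy=1.3, interpolation=cv2.INTER_LINEAR)
    if severity < 0.6:
        return cv2.resize(img_bgr, None, fx=1.6, fy=1.6, interpolation=cv2.INTER_CUBIC)
    # order=3 gives cubic B-spline interpolation in scipy.ndimage.
    return ndimage.zoom(img_bgr, (2, 2, 1), order=3)

def reduce_noise(img_bgr, severity):
    """Noise reduction: light Gaussian blur, then non-local means at strength 3 or 7."""
    if severity < 0.3:
        return cv2.GaussianBlur(img_bgr, (3, 3), 0)  # 3x3 kernel assumed (see above)
    h = 3 if severity < 0.6 else 7                   # NLM filter strength
    return cv2.fastNlMeansDenoisingColored(img_bgr, None, h, h, 7, 21)
```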
4.3.4. Image Quality Assessment
After image enhancement, it is necessary to evaluate the quality of the enhanced image to quantitatively measure the improvement effect and adjust the weights and learning rate based on this evaluation. Quality assessment is conducted by comparing the quality changes before and after enhancement in terms of illumination, resolution, and noise.
For illumination improvement, the changes in brightness deviation and contrast are evaluated. By measuring whether the brightness deviation has decreased and the contrast has improved after enhancement, the effectiveness of the illumination improvement is confirmed. This allows us to determine whether the illumination of the image has actually improved, and the results are reflected in the weight adjustment.
In resolution improvement, the changes in edge strength and texture uniformity before and after enhancement are evaluated. Since the goal of resolution improvement is to enhance the sharpness and detail representation of the image, we measure whether edge strength has been increased and texture uniformity has been improved to verify the effect. This helps assess how well the resolution improvement has preserved the image’s details.
For noise reduction, the BRISQUE score and high-frequency noise ratio are used to measure noise reduction before and after enhancement. If the BRISQUE score decreases and the high-frequency noise ratio is reduced, it indicates that noise reduction has been performed effectively. This confirms the positive impact of noise removal on the visual quality of the image.
The results obtained through quality assessment are used to monitor the effectiveness of the improvement tasks and are employed in subsequent steps such as learning parameter adjustment and weight update. Through this process, the enhancement tasks can be iteratively learned and evolved, enabling optimization of the improvement for the characteristics of each image.
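A minimal sketch of this before/after comparison, assuming the indicator names used by the discriminators above, is:

```python
def assess_improvement(before: dict, after: dict) -> dict:
    """Per-indicator improvement check (Section 4.3.4). Brightness deviation,
    BRISQUE score, and high-frequency ratio should decrease; contrast, edge
    strength, and texture uniformity should increase."""
    lower_is_better = {"brightness_dev", "brisque", "hf_ratio"}
    return {key: (after[key] < old if key in lower_is_better else after[key] > old)
            for key, old in before.items()}

# Example: both illumination indicators improved after an enhancement pass.
print(assess_improvement({"brightness_dev": 40.0, "contrast": 45.0},
                         {"brightness_dev": 12.0, "contrast": 90.0}))
# {'brightness_dev': True, 'contrast': True}
```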
4.3.5. Weight Update: Characteristic Analysis-Based Learning Parameter Adjustment
In this section, we propose a method for updating the weight values after quality assessment when image improvement tasks are repeated. This process dynamically adjusts the importance of each element and is designed to adapt to various image characteristics.
Equations (18)–(24) define the weight-update rules for brightness deviation, contrast, image size, edge strength, texture uniformity, noise distribution (BRISQUE score), and high-frequency noise ratio, respectively. All seven updates share a common form: for a factor x with weight wx, learning rate ηx, and threshold Tx,

wx ← wx − ηx·wx, if the metric for x satisfies its threshold Tx after enhancement (effective improvement),
wx ← wx + ηx·wx, otherwise (insufficient improvement).

Here, ηb and Tb denote the learning rate and threshold for brightness deviation (Equation (18)); ηc and Tc for contrast (Equation (19)); ηi and Ti for image size (Equation (20)); ηe and Te for edge strength (Equation (21)); ηt and Tt for texture uniformity (Equation (22)); ηn and Tn for the BRISQUE score (Equation (23)); and ηf and Tf for the high-frequency noise ratio (Equation (24)).
Weight updates are carried out after the image enhancement process, based on the quality assessment results. In this update process, weights and learning rates are adjusted within a specific range to keep them from becoming too large or too small, so that the learning parameters are updated at an appropriate pace. For example, if both brightness deviation and contrast improve after illumination enhancement, the weights αI and βI for these elements are decreased to lower their importance: the product of the current weight and the learning rate is subtracted from the weight, and the learning rate is decayed by multiplying it by 0.9.
If a quality indicator does not improve after the enhancement process, the weight for the corresponding element is increased by adding the product of the current weight and the learning rate to the existing weight, and the learning rate is increased by multiplying it by 1.5. In other words, if a specific element has been improved properly, its weight is reduced to lower its importance in subsequent enhancement passes; conversely, if the improvement is insufficient, its weight is increased to raise its importance.
During this adjustment process, weights and learning rates are clamped to fixed bounds: weights to the range [0.1, 1.0] and learning rates to the range [0.001, 0.05], preventing either from becoming excessively large or small.
This weight adjustment process continues until no factors require improvement or until the specified maximum number of iterations is reached. Through iterative improvement, the weight of each factor gradually approaches its optimal value, yielding a flexible process that adapts to diverse images: the weights change dynamically with the situation and, over repeated enhancement passes, converge toward values that produce better image quality. This dynamic weight adjustment plays a crucial role in deriving enhancement results tailored to various image conditions.
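A minimal sketch of this update rule for a single factor follows, with the bounds and multipliers taken from the text; the decrease rule mirrors the stated increase rule (subtracting rather than adding the weight–learning-rate product), which is an assumption on our part.

```python
import numpy as np

# Bounds stated in the text
W_MIN, W_MAX = 0.1, 1.0
LR_MIN, LR_MAX = 0.001, 0.05

def update_weight(weight: float, lr: float, improved: bool) -> tuple[float, float]:
    """Update one factor's weight and learning rate from its assessment result."""
    if improved:
        # Effective improvement: lower the factor's importance, decay the learning rate
        weight -= lr * weight
        lr *= 0.9
    else:
        # Insufficient improvement: raise the factor's importance, grow the learning rate
        weight += lr * weight
        lr *= 1.5
    return float(np.clip(weight, W_MIN, W_MAX)), float(np.clip(lr, LR_MIN, LR_MAX))
```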
Algorithm 1 summarizes the proposed hierarchical image quality improvement process, presenting the iterative improvement, quality assessment, and weight update steps described above in a concise, systematic form.
Algorithm 1: Hierarchical Image Quality Improvement Process
Input: Original Image
Output: Improved Image
1: Initialize weights and learning rates for illumination, resolution, and noise factors
2: Define threshold values for severity levels and improvement intensities
3: Repeat
// Step 1: Calculate Severity for Each Factor
4:   Calculate illumination severity:
5:     Brightness Deviation (compare with threshold)
6:     Contrast (compare with threshold)
7:   Calculate resolution severity:
8:     Image Size (compare with threshold)
9:     Edge Strength (compare with threshold)
10:     Texture Uniformity (compare with threshold)
11:   Calculate noise severity:
12:     BRISQUE Score (compare with threshold)
13:     High-Frequency Noise Ratio (compare with threshold)
// Step 2: Sort Factors by Severity and Determine Improvement Order
14:   Sort illumination, resolution, and noise factors by calculated severity in descending order
15:   Set improvement intensity based on severity levels
// Step 3: Apply Improvement Functions Based on Severity
16:   for each factor in order of severity do
17:     if factor is Illumination then
18:       Apply illumination improvement based on severity:
19:         Intensity 1: Histogram Equalization
20:         Intensity 2: Gamma Correction
21:         Intensity 3: CLAHE
22:     else if factor is Resolution then
23:       Apply resolution improvement based on severity:
24:         Intensity 1: Linear Interpolation
25:         Intensity 2: Bicubic Interpolation
26:         Intensity 3: B-Spline Interpolation
27:     else if factor is Noise then
28:       Apply noise reduction based on severity:
29:         Intensity 1: Gaussian Blur
30:         Intensity 2: Non-local Means Filter (low strength)
31:         Intensity 3: Non-local Means Filter (high strength)
32:     end if
33:   end for
// Step 4: Perform Quality Assessment for Each Factor
34:   Measure improvements in brightness deviation, contrast, edge strength, texture uniformity, BRISQUE score, and high-frequency noise ratio
// Step 5: Adjust Weights and Learning Rates Based on Quality Assessment
35:   Update weights for each factor based on improvement results:
36:     Increase weight if improvement is insufficient
37:     Decrease weight if improvement is effective
38:   Adjust learning rates based on weight updates
39: Until convergence or maximum iterations reached
40: Return Improved Image
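To make the control flow concrete, the following Python driver sketch mirrors Algorithm 1 under stated assumptions: it reuses the illustrative helpers sketched earlier (quality_metrics, upscale_by_severity, denoise_by_severity, update_weight), while compute_severity, improve_illumination, and assess_improvement are hypothetical placeholders standing in for routines the paper defines elsewhere.

```python
def hierarchical_improve(img, max_iters=10):
    """Driver loop mirroring Algorithm 1 (illustrative; severity routines assumed)."""
    weights = {"illumination": 0.5, "resolution": 0.5, "noise": 0.5}
    lrs = {k: 0.01 for k in weights}
    for _ in range(max_iters):
        before = quality_metrics(img)
        # Hypothetical: map each factor's metrics and weight to a severity score
        severities = {k: compute_severity(img, k, weights[k]) for k in weights}
        if all(s == 0 for s in severities.values()):
            break  # nothing left to improve
        # Improve the factors in descending order of severity
        for factor, sev in sorted(severities.items(), key=lambda kv: -kv[1]):
            if sev == 0:
                continue
            if factor == "illumination":
                img = improve_illumination(img, sev)  # hypothetical, Section 4.3.3
            elif factor == "resolution":
                img = upscale_by_severity(img, sev)
            else:
                img = denoise_by_severity(img, sev)
        after = quality_metrics(img)
        # Hypothetical: decide per factor whether its metrics improved
        for factor in weights:
            improved = assess_improvement(factor, before, after)
            weights[factor], lrs[factor] = update_weight(weights[factor], lrs[factor], improved)
    return img
```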
5. Experimental Results
In this section, the performance of the hierarchical improvement process is verified by applying it to various images and analyzing the results [54,55,56,57].
Figure 4 shows the result images after 10 iterations of the hierarchical improvement process.
(A) Tulips Image: For this image, the illumination was evenly distributed, so the illumination severity converged to 0, indicating that no lighting improvement was needed. For the resolution, however, the severity was high, leading to the application of the second intensity improvement (bicubic interpolation) once, followed by the third intensity improvement (B-spline interpolation) once, resulting in an increase in resolution. Initially, the noise severity was 0, so no improvement was applied, but due to the resolution enhancement process increasing the image size, the noise severity increased. After applying the first intensity improvement (Gaussian blur) once, the severity converged to 0.
(B) Rope Image: Similar to the Tulips image, the illumination severity converged to 0, as no lighting improvement was necessary. In terms of resolution, the initial severity was higher than the noise severity, so the third intensity improvement (B-spline interpolation) was applied first, leading to a reduction in severity and eventual convergence to 0. Noise severity increased significantly during the resolution enhancement process, but after applying the first intensity improvement (Gaussian blur) twice, the second intensity improvement (Non-local Means) twice, and the third intensity improvement (Stronger Non-local Means) once, the severity decreased and finally converged to 0.
(C) Apple Image: The initial noise level was low, so noise improvement was not deemed necessary. However, the illumination and resolution improvements caused an increase in noise severity. After applying the first intensity improvement (Gaussian blur) once, the noise severity decreased and converged to 0. Unlike images (A) and (B), image (C) shows an overall increase in brightness due to the illumination improvement compared to the original.
(D) Bike and (E) Car Images: These images are from low-light environments. For the Bike image, the severity was high for resolution, illumination, and noise, leading to hierarchical improvements for all three factors; the severities gradually decreased and converged to 0, and the improved image shows enhanced visibility of the Bike object. The Car image was taken at night, making the object difficult to recognize with the naked eye, and its initial illumination severity was extremely high due to the very low light level. The third intensity improvement (CLAHE) was applied 10 times for the illumination factor, reducing the severity, although it remained relatively high after improvement because of the extremely low initial lighting level. For resolution, the third intensity improvement (B-spline interpolation) was applied once initially. The noise severity increased during the lighting and resolution improvement processes, but after repeated noise improvements it stabilized below its initial level. The final improved image shows a dramatic increase in visibility compared to the original, allowing the Car object to be recognized by the naked eye. However, some of the object's original red color was lost during enhancement, indicating the need for additional correction measures, such as chromaticity preservation, to address color distortion in low-light image improvement.
Table 1 provides the results of the individual factors after applying the hierarchical improvement process. It confirms that the proposed process generally improves the numerical values across each factor.
Figure 5 and Figure 6 show the edge density difference for the resolution and noise factors between the original images and the results of the various algorithms. The clearer the edges in the difference images, the higher the edge density, indicating clearer boundaries. Examining the edge density values in Table 2 shows that the values obtained through the proposed process are higher than those of the other algorithms.
Moreover, when comparing PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure), which are key image quality metrics, the proposed hierarchical improvement process consistently ranks in the upper tier of performance [58,59].
Table 2. Resolution and noise factor improvement algorithm performance comparison table.

| Algorithms | PSNR (Fig. 5) | SSIM (Fig. 5) | Edge Density (Fig. 5, original: 0.02302) | PSNR (Fig. 6) | SSIM (Fig. 6) | Edge Density (Fig. 6, original: 0.00692) |
|---|---|---|---|---|---|---|
| Linear Interpolation [60] | 48.53 | 0.9987 | 0.02188 | 49.69 | 0.9982 | 0.00662 |
| Bicubic Interpolation [31] | 48.56 | 0.9981 | 0.02263 | 49.43 | 0.998 | 0.00676 |
| B-spline Interpolation [61] | 46.49 | 0.9975 | 0.02274 | 47.96 | 0.9964 | 0.00684 |
| Gaussian Blur (5 × 5) [62] | 43.87 | 0.9952 | 0.01811 | 49.57 | 0.9974 | 0.0058 |
| NLM [37] | 38.08 | 0.966 | 0.0209 | 38.65 | 0.997 | 0.00687 |
| S-NLM | 35.48 | 0.9353 | 0.0197 | 35.84 | 0.9502 | 0.00716 |
| WNNM [63] | 48.42 | 0.9993 | 0.02305 | 48.39 | 0.9991 | 0.00682 |
| BM3D [38] | 35.81 | 0.9449 | 0.02136 | 37.39 | 0.9711 | 0.0074 |
| Ours | 46.38 | 0.9957 | 0.03592 | 43.21 | 0.9922 | 0.00698 |
Figure 7, Figure 8 and Figure 9 illustrate the improvement results for the illumination factor using various algorithms. CLAHE, with the default clipLimit parameter value of 40, significantly increased brightness but introduced considerable noise and artifacts during enhancement. Similarly, the Retinex-based MSRCR (Multi-Scale Retinex with Color Restoration) and LIME algorithms improved brightness but generated noticeable noise and artifacts throughout the image. In contrast, the Jeon and Eom, GCP-MCS, and proposed methods produce cleaner images with less noise.
Reviewing the performance of the illumination enhancement algorithms in Table 3, the proposed process ranks among the top-performing methods alongside both traditional and recent algorithms.
Table 3. Illumination factor improvement algorithm performance comparison table.

| Algorithms | PSNR (Fig. 7) | SSIM (Fig. 7) | PSNR (Fig. 8) | SSIM (Fig. 8) | PSNR (Fig. 9) | SSIM (Fig. 9) |
|---|---|---|---|---|---|---|
| CLAHE [25] | 27.42 | 0.4867 | 28 | 0.6112 | 28.01 | 0.2957 |
| MSRCR [64] | 27.73 | 0.4087 | 27.82 | 0.5606 | 27.73 | 0.3887 |
| LIME [65] | 28.03 | 0.5729 | 27.83 | 0.5798 | 28.13 | 0.5741 |
| Jeon and Eom [66] | 28.58 | 0.5349 | 27.9 | 0.5475 | 27.97 | 0.5423 |
| GCP-MCS [67] | 27.71 | 0.5458 | 28.08 | 0.5886 | 29.1 | 0.5958 |
| Ours | 28.11 | 0.5475 | 28.06 | 0.6159 | 28.01 | 0.5612 |
The hierarchical improvement process proposed in this paper is ultimately designed to be applied to object detection tasks, with the aim of enhancing detection performance. To measure the improvement in object detection performance, the YOLOv8 “Medium” model [68] and the RT-DETR model [49], pretrained on the COCO (Common Objects in Context) dataset [69], were used. Figure 10, Figure 11, Figure 12 and Figure 13 show images comparing object detection performance.
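As a hedged illustration of this evaluation setup, both detectors are available through the ultralytics package; the weight file names below follow ultralytics conventions (and may differ per release), and the image path is a placeholder.

```python
# Minimal sketch assuming `pip install ultralytics`
from ultralytics import YOLO, RTDETR

yolo = YOLO("yolov8m.pt")        # YOLOv8 "Medium", COCO-pretrained
rtdetr = RTDETR("rtdetr-l.pt")   # RT-DETR, COCO-pretrained

for model in (yolo, rtdetr):
    results = model("improved_image.png")  # run detection on the enhanced image
    for box in results[0].boxes:
        cls_name = results[0].names[int(box.cls)]
        print(f"{cls_name}: confidence {float(box.conf):.2f}")
```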
Figure 10 depicts a high-illumination image. After applying the hierarchical improvement process, the detection accuracy of the YOLOv8 model for the three ‘Bottle’ instances improved by 0.12, 0.02, and 0.07, respectively, while the prediction accuracy of the RT-DETR model improved by 0.01, −0.01, and 0.02, respectively.
Figure 11, Figure 12 and Figure 13 depict images in low-light conditions, where the objects are difficult to distinguish with the naked eye. After applying the hierarchical improvement process, the detection accuracy for the ‘Bicycle’ object in Figure 11 improved by 0.04 with the YOLOv8 model and by 0.01 with the RT-DETR model; moreover, both models detected an additional ‘Bicycle’ object in the improved low-light image. In Figure 12, the detection accuracy for the two ‘Car’ objects improved by 0.02 and 0.03 with the YOLOv8 model and by 0.02 and 0.06 with the RT-DETR model. Figure 13 compares the detection and segmentation performance for the ‘Motorcycle’ object: after applying the hierarchical improvement process, the detection accuracy improved by 0.06 with the YOLOv8 model and by 0.05 with the RT-DETR model.
Finally, we compared the segmentation performance for the ‘Motorcycle’ object through additional experiments, as shown in Figure 14. After applying the hierarchical improvement process, the segmented area of the object is much larger than in the original image. For a quantitative comparison, the deep-learning-based feature-matching algorithm LoFTR (Detector-Free Local Feature Matching with Transformers) [70] was used to measure matching accuracy. The feature-matching accuracy for the segmentation area was 87.07% in the original image and 96.52% in the enhanced image, an increase of 9.45 percentage points.
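A hedged sketch of such a LoFTR comparison using the kornia implementation follows; the image paths, the 'outdoor' pretrained weights, and the 0.8 confidence cutoff are illustrative choices, not the paper's settings.

```python
# Minimal sketch assuming `pip install kornia opencv-python torch`
import cv2
import torch
import kornia.feature as KF

def to_tensor(path: str) -> torch.Tensor:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return torch.from_numpy(gray).float()[None, None] / 255.0  # (1, 1, H, W)

matcher = KF.LoFTR(pretrained="outdoor")
with torch.no_grad():
    out = matcher({"image0": to_tensor("original_seg.png"),    # placeholder paths
                   "image1": to_tensor("enhanced_seg.png")})
confident = out["confidence"] > 0.8  # keep high-confidence matches only
print(f"matches: {int(confident.sum())} / {len(out['confidence'])}")
```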
In conclusion, the hierarchical improvement process proposed in this paper enables quality enhancement across the illumination, resolution, and noise factors, and the enhanced images contribute to better object detection performance. However, the results also show that in some cases, such as shadow areas near object boundaries, local improvements were insufficient, potentially causing false detections or segmentation errors in object-related tasks. Additional improvement processes targeting such areas may therefore be necessary.
6. Conclusions
In this paper, we proposed a hierarchical improvement process targeting key image quality factors—illumination, resolution, and noise—to enhance object detection performance. The experimental results demonstrated that the proposed hierarchical process improved image quality metrics, such as PSNR and SSIM, across various images, and also yielded tangible improvements in object detection performance.
In experiments using the YOLOv8 model, the detection rate for the ‘Bottle’ object increased by an average of 7% in high-light environments, while in low-light environments the detection rates for the ‘Bicycle’ and ‘Car’ objects improved by 4% and an average of 2.5%, respectively, and the detection rate for the ‘Motorcycle’ object improved by 6%.
In experiments using the RT-DETR model, the detection rate for the ‘Bottle’ object increased by an average of 0.67% in high-light environments, while in low-light environments the detection rates for the ‘Bicycle’ and ‘Car’ objects improved by an average of 1% and 4%, respectively, and the detection rate for the ‘Motorcycle’ object improved by 5%.
Additionally, segmentation performance for the ‘Motorcycle’ object showed a 9.45-percentage-point improvement in matching accuracy on enhanced images, confirming that hierarchical image enhancement can contribute to better object recognition.
While the proposed process effectively improves environmental factors such as illumination imbalance, low resolution, and noise, limitations remain, such as false detections in shadowed areas and some degree of color distortion. These limitations are particularly pronounced in complex backgrounds and low-light conditions, where they can reduce object detection accuracy. To address these issues, future research should incorporate additional processes for shadow recognition and color correction, as well as explore methods to maintain high detection performance in complex backgrounds.
Moving forward, we plan to explore real-time optimization of the proposed hierarchical improvement process and validate its performance across diverse object detection models and datasets. Specifically, we aim to enhance processing speed for real-time applications, such as autonomous driving and surveillance systems, and to systematize responses to environmental factors. Through these efforts, we anticipate that this approach will further improve object detection performance across a wide range of real-world applications.