1. Introduction
Autonomous driving platforms relying on cameras, such as visible-light RGB (red-green-blue) or YUV sensors based on CMOS (complementary metal oxide semiconductor) technology, exhibit remarkable performance in object detection, recognition, and information provision [1,2]. However, a significant drawback arises in nighttime environments, where obtaining high-quality image data for object recognition becomes challenging due to the absence of ambient light [3].
To address this limitation in the automotive and defense industries, ongoing research is actively exploring the integration of night vision systems using infrared cameras. Commercial infrared cameras utilize the infrared wavelength range and are generally divided into three product groups according to the specific wavelength range they use: (1) SWIR (short-wave infrared, 0.9–1.7 μm), (2) MWIR (mid-wave infrared, 3–5 μm), and (3) LWIR (long-wave infrared, 8–14 μm) [4,5,6].
SWIR-based cameras operate on the principle that energy from a light source strikes an object and is reflected, and the detector visualizes the reflected energy. Therefore, in places without ambient light, such as underground parking lots or tunnels, SWIR-based cameras have the disadvantage of being unable to obtain images that users can utilize. For this reason, product size increases because a light source is essential for operation under various conditions and a cooled detector must be used. The cooled detector raises the price of the product, and the larger product size entails significant power consumption. Consequently, SWIR-based cameras are mainly used in the defense industry.
MWIR- and LWIR-based cameras are generally known to users as thermal-imaging cameras. MWIR-based cameras can acquire information on distant objects because their atmospheric transmittance is relatively high compared to cameras operating in other infrared wavelengths. However, like SWIR-based cameras, they must utilize a cooled detector, which increases product size and production cost and requires a large amount of power. On the other hand, LWIR-based cameras have the advantage of acquiring information across a wide temperature range because they can detect most of the thermal energy emitted by various targets. Additionally, LWIR-based cameras can use bolometer-type detectors, allowing either a cooled or uncooled detector to be chosen depending on the intended use of the product. This means that products can be manufactured with characteristics such as low power consumption, low cost, and miniaturization.
Although LWIR-based cameras are known to have relatively shorter detection ranges than MWIR-based cameras, they provide performance that satisfies most distance conditions for situational awareness according to various standards (e.g., ISO-26262 [7]) or user requirements [4]. Therefore, LWIR-based cameras can be utilized most widely to meet the various standards or requirements, such as cost, that apply depending on the application.
For this reason, ongoing academic research is actively exploring the integration of night vision systems using LWIR-based thermal-imaging cameras into autonomous vehicle platforms [8,9,10]. Especially in research and development (R&D) centers within the automotive industry, LWIR-based cameras are being developed for use as night vision systems among the various infrared wavelengths, as finished products must meet standards such as ISO-26262. LWIR-based thermal-imaging cameras typically employ one of two detector types: (1) cooled or (2) uncooled [11,12,13]. Cooled detectors provide high-quality image acquisition but are expensive to produce, large in size, and require significant power, so they are mainly used in applications such as defense. On the other hand, uncooled detectors, which are cheaper to produce, smaller in size, and require less power, are preferred in autonomous vehicle platforms within the automotive industry. However, since uncooled detectors do not have a mechanical cooler, noise removal and pixel value correction for temperature changes require additional processing of the collected raw data.
Essential pre-processing steps, such as non-uniformity correction (NUC) to address fixed pattern noise and temperature compensation (TC) to offset temperature-related pixel value variations, are crucial for resolving hardware-related issues [14]. Nevertheless, images obtained after the NUC and TC processes exhibit low dynamic range (LDR) characteristics, rendering them unsuitable for deep learning or machine learning-based object detection and recognition, essential components in autonomous vehicle platforms. To overcome this challenge, research is underway to develop contrast-enhancement techniques, specifically aiming to convert LDR images into high dynamic range (HDR) images after the NUC and TC processes.
Various histogram equalization (HE)-based methods exist for image contrast enhancement. Most commercially available products apply a global HE-based method after the NUC and TC processes to enhance image quality. This approach helps reduce production costs and ensures stability, meeting military standards (MilSpecs) or ISO-26262 requirements in the automotive or military industries. However, conventional HE methods, relying on a probability density function (PDF) and cumulative distribution function (CDF), can oversaturate results when histogram values are excessively concentrated, leading to issues such as a shifted average brightness level. Moreover, conventional HE-based methods solely present performance evaluations on data obtained from mass-produced products; there is a lack of experimental results on contrast enhancement using images computed exclusively with the NUC and TC processes. Additionally, most studies employing these methods only conduct experiments on driving scenes characterized by good image quality and favorable driving scenarios. Consequently, it is challenging to assert the suitability of an algorithm for infrared thermal-imaging cameras in autonomous driving platforms when the performance evaluation is confined to specific favorable driving conditions. Therefore, to comprehensively assess performance for deployment in autonomous driving platforms, it is imperative to conduct experiments in worst-case driving environments, including scenarios such as tunnels.
In this paper, we introduce a four-group-based HE method designed for contrast enhancement. Additionally, we present experimental results demonstrating the effectiveness of contrast enhancement using images after the NUC and TC processes, considering both best and worst driving scenarios. The primary objective of our proposed method is to exhibit contrast enhancement performance in both favorable and challenging driving conditions. Moreover, the comparison between the proposed method and conventional methods in both best and worst driving scenarios enables a comprehensive evaluation. In conclusion, the obtained images serve to determine the most suitable contrast-enhancement technique after the NUC and TC processes, demonstrating potential applicability in mass-produced products.
3. Proposed Method
When using NUC and TC, as detailed in Equations (1)–(4) and mentioned in Section 2, contrast enhancement is essential to provide meaningful images to users. Pixel values processed through NUC and TC are in a 14-bit format. If a contrast enhancement algorithm is executed on a high-performance processor in an embedded environment, the 14-bit image data produced through the NUC and TC algorithms can be used directly.
However, as mentioned in Section 1, to produce a finished product suitable for the automotive industry, components must comply with standard specifications such as ISO-26262. In other words, to produce an LWIR-based camera for automotive use, processors and memories must comply with ISO-26262 standard specifications. Additionally, Original Equipment Manufacturers (OEMs) require products like cameras to have low-power and low-production-cost characteristics.
For LWIR-based camera products to satisfy OEM requirements, they must use low-cost, low-power memory (e.g., LPDDR2 or LPDDR3) and low-cost processors (e.g., TI TDA3x) while meeting the ISO-26262 standard. In such an embedded platform environment, to ensure acceptable image quality for OEMs and other users, communication with various external components (e.g., Controller Area Network (CAN) or I2C) and cyber-security must be supported, and the NUC, TC, global contrast enhancement, and local contrast enhancement algorithms must all be operational.
Therefore, when both global and local contrast enhancement algorithms are utilized, the 14-bit format data obtained after performing NUC and TC are generally reduced to 8-bit. This reduction is necessary to meet various conditions such as processing speed and power consumption. Considering that a local contrast enhancement algorithm will be included in the future, this paper proposes a global contrast enhancement algorithm based on 8-bit image data.
3.1. Motivation
When acquiring an image in 14-bit format after performing the NUC and TC operations, pixel values are inevitably concentrated in a specific area due to the characteristics of the LWIR-based camera. Therefore, when downscaling the image data from 14-bit to 8-bit (after applying the automatic gain control described later), the low-temperature, medium-temperature (including room-temperature), and high-temperature areas can be more clearly distinguished. As a result, the temperature areas are more distinct in the 8-bit domain (space) than in the 14-bit domain (space).
From this perspective, the analysis results shown in Figure 1 were confirmed. Figure 1 presents the results of analyzing the values of each pixel after downscaling the image, calculated using NUC and TC, to 8-bit format. As shown in Figure 1, in a typical driving scenario with NUC and TC, the pixel values of the infrared-based thermal images fall within the low-temperature, medium-temperature, and high-temperature ranges of the histogram plot (low-temperature range: 0 to 63; medium-temperature range: 64 to 191; high-temperature range: 192 to 255).
The red annotation area in the thermal image represents the sky, and the pixel values were confirmed to be in the early 20s. For the green annotation located on the asphalt road, the pixel values were observed to be in the early 100s. Lastly, for the blue annotation located on the rear side of the vehicle, the pixel values fall within the range of 149 to 183. However, the exhaust pipe of the vehicle has values of 195 or higher, above those of the surrounding objects, due to its high temperature.
When viewed as a histogram plot, pixel values can be grouped into a total of four regions. First, the sky, which has a lower temperature than surrounding objects (without the sun), falls within the first group (region) with pixel values between 0 and 63. Second, vehicle-driving roads such as asphalt fall within the second group (region) between 64 and 127. Third, the vehicle falls within the third group (region) between 128 and 191. Lastly, parts expressing high temperatures, such as the exhaust pipe or the sun, belong to the fourth group (region) with pixel values of 192 or higher. In other words, in terms of histogram frequencies, unlike with CMOS-based cameras, the pixel values of the 8-bit infrared thermal image exhibit the characteristic of being clustered in specific groups (regions).
3.2. Algorithm
Based on the analysis results depicted in Figure 1, we propose a region-based HE method that utilizes clipping and distribution techniques with a dynamic clip limit. As illustrated in Figure 2 and the pseudo-code (Algorithm 1) for the proposed method, the operational process comprises five steps: (1) Automatic Gain Control (AGC) including Histogram Bin Calculation, (2) Histogram Group Division, (3) Histogram Clipping, (4) Excess Value Distribution, and (5) Output Value Mapping.
3.2.1. Automatic Gain Control (AGC) and Histogram Bin Calculation
In the AGC step, prior to Histogram Bin Calculation, the bit depth of the input pixel values in the computed image, obtained using the NUC and TC methods, is reduced from an N-bit format to an 8-bit format, as defined by Equations (5)–(7). Thereafter, in the Histogram Bin Calculation step, the frequencies of the pixel values across the entire 8-bit image are computed.
where I_in represents the input image computed by using the NUC and TC methods, whereas I_max and I_min denote the maximum and minimum pixel values of I_in, respectively. The variable k corresponds to the desired bit reduction from the input pixel value to the output pixel value, and I_AGC represents the output image with (N − k)-bit depth (8-bit in this paper).
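To make this step concrete, the following is a minimal NumPy sketch of AGC plus Histogram Bin Calculation under the notation above; the function name, the synthetic frame, and the rounding choice are illustrative assumptions rather than the authors' production implementation.

```python
import numpy as np

def agc_and_histogram(i_in: np.ndarray):
    """Downscale an N-bit NUC/TC image to 8-bit and count pixel frequencies.

    AGC mapping (cf. Equations (5)-(7)): I_AGC = 255 * (I_in - I_min) / (I_max - I_min),
    followed by a 256-bin histogram of the 8-bit result.
    """
    i_min = float(i_in.min())
    i_max = float(i_in.max())
    scale = max(i_max - i_min, 1.0)  # guard against a flat (uniform-temperature) frame
    i_agc = np.round(255.0 * (i_in.astype(np.float64) - i_min) / scale).astype(np.uint8)
    hist = np.bincount(i_agc.ravel(), minlength=256)  # frequency of each 8-bit value
    return i_agc, hist

# Example with a synthetic 14-bit frame whose values cluster, as LWIR data tends to.
rng = np.random.default_rng(0)
frame = rng.integers(6000, 9000, size=(480, 640))  # stands in for a NUC/TC output
i_agc, hist = agc_and_histogram(frame)
```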
Algorithm 1 Pseudo-Code for Proposed Contrast Enhancement Method Using Region-Based Histogram Equalization with Dynamic Clipping Technique

Input: I_in: input image with N-bit format. Output: I_out: contrast-enhanced output image.

1: < Automatic Gain Control (AGC) >
2: I_max ← max(I_in)
3: I_min ← min(I_in)
4: I_AGC ← 255 × (I_in − I_min) / (I_max − I_min)
5: < Histogram Bin Calculation >
6: for y ← 1 to h do
7:   for x ← 1 to w do
8:     H[I_AGC(y, x)] ← H[I_AGC(y, x)] + 1
9:   end for
10: end for
11: < Histogram Group Division >
12: G_1 ← H[0 … 63]
13: G_2 ← H[64 … 127]
14: G_3 ← H[128 … 191]
15: G_4 ← H[192 … 255]
16: < Histogram Clipping >
17: for i ← 1 to 4 do
18:   H_max,i ← max(G_i)
19:   H_min,i ← min(G_i)
20:   [G_i, E_i] ← Clip(G_i, H_max,i, H_min,i)
21: end for
22: < Excess Value Distribution >
23: H_clip ← [G_1, G_2, G_3, G_4]
24: E ← E_1 + E_2 + E_3 + E_4
25: H_dist ← H_clip + (E / 256)
26: < Output Value Mapping >
27: C ← CDF(H_dist)
28: C_norm ← C / (w × h)
29: for y ← 1 to h do
30:   for x ← 1 to w do
31:     I_out(y, x) ← min(255, 255 × C_norm[I_AGC(y, x)])
32:   end for
33: end for
Figure 2. Operation process of the proposed method.
3.2.2. Histogram Group Division
After computing the pixel value frequencies in the Histogram Bin Calculation step, the Histogram Group Division step involves dividing the histogram bin values into four groups, based on the analysis results presented in Figure 1 and defined by Equations (8) and (9).
where i represents the histogram region, with values ranging from 1 to 4, and n represents the histogram bin value. The first histogram region considers frequencies for bin values ranging from 0 to 63. The second histogram region includes frequencies for bin values from 64 to 127, whereas the third histogram region encompasses frequencies for bin values from 128 to 191. Finally, the fourth region comprises frequencies for bin values from 192 to 255. In other words, the histogram is subdivided into four regions, each covering a bin value range of 64.
As explained in the motivation subsection, each frame captured by the LWIR-based camera contains extensive temperature information. By applying AGC, the analysis can be performed within a limited domain (space), allowing the intervals to be distinguished based on the temperature range. For example, when the temperature decreases (e.g., from the sky to below zero), the pixel value approaches 0. Conversely, when the temperature increases (e.g., from a car engine or a fire), the pixel value approaches 255. The medium-temperature range is considerably wider. However, even within this intermediate range, a thermal inversion phenomenon is generally observed around the median value of the limited domain (space), depending on the external environment temperature (e.g., around the pixel value 127 in the 8-bit domain). Therefore, the intermediate temperature range is divided into two histogram regions. As a result, the proposed method divides the histogram into four regions, allowing the pixel values within each temperature region to be distinguished and utilized.
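As a small sketch of this division, assuming the 256-bin histogram `hist` produced by an AGC step as above (the helper name is hypothetical):

```python
import numpy as np

def divide_histogram(hist: np.ndarray) -> list[np.ndarray]:
    """Split a 256-bin histogram into the four fixed 64-bin temperature groups
    (0-63 low, 64-127 lower-medium, 128-191 upper-medium, 192-255 high)."""
    assert hist.shape == (256,)
    return [hist[i * 64:(i + 1) * 64] for i in range(4)]

# Example: sky, road, vehicle, and exhaust/sun pixels land in groups 1-4, respectively.
hist = np.zeros(256, dtype=np.int64)
hist[[20, 100, 160, 200]] = [5000, 9000, 3000, 300]
g1, g2, g3, g4 = divide_histogram(hist)
```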
3.2.3. Histogram Clipping
In the Histogram Clipping step, the frequency values of the histogram bins in each region are clipped using Equations (10)–(13).
where i represents the histogram region; H_max,i is the maximum histogram frequency value of region i; H_min,i is the minimum histogram frequency value of region i (the minimum non-zero value if a histogram exists in region i, and 0 if it does not); D_i is the difference between the maximum and minimum histogram frequency values; α is the weight factor for selecting the threshold of the clipping operation; and E_i is the excess value for each region. As shown in Equations (10) and (11), the difference between the maximum and minimum histogram frequency values is calculated for each region. Subsequently, the threshold value for the clipping operation is determined using the difference value D_i and the weight factor α, which ranges from 0 to 1.
After determining the threshold value, as shown in Equation (13), the histogram frequency values for each group are clipped. During the clipping operation, similar to the contrast-limited adaptive HE (CLAHE) method [28], the excess value is calculated using Equation (12). When a histogram frequency value is greater than the threshold value, it is adjusted to the threshold value; conversely, when it is less than the threshold value, it remains unchanged.
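The sketch below illustrates the clipping of one region under the threshold form implied by the text, T_i = H_min,i + α·(H_max,i − H_min,i); since Equations (10)–(13) are not reproduced here, treat the exact threshold expression as an assumption consistent with the description above.

```python
import numpy as np

def clip_region(group: np.ndarray, alpha: float = 0.5):
    """Clip one 64-bin histogram region with a dynamic clip limit.

    Assumed threshold: T_i = H_min,i + alpha * (H_max,i - H_min,i), where H_min,i
    is the minimum non-zero frequency (0 for an empty region). Returns the clipped
    region and the excess value E_i removed above the threshold.
    """
    h_max = float(group.max())
    nonzero = group[group > 0]
    h_min = float(nonzero.min()) if nonzero.size else 0.0
    threshold = h_min + alpha * (h_max - h_min)
    clipped = np.minimum(group.astype(np.float64), threshold)
    excess = float(group.sum() - clipped.sum())  # E_i, redistributed in the next step
    return clipped, excess
```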
3.2.4. Excess Value Distribution
In the Excess Value Distribution step, the excess value is distributed across each histogram region by using Equation (14).
As illustrated in Equation (14), the summed excess value is first divided by 256, and the resulting quotient is then added to the clipped histogram frequency values from bin 0 to bin 255. This distribution process helps prevent oversaturation of histogram frequency values at specific points during the CDF computation.
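Continuing the sketch, the redistribution itself is a single uniform addition over all 256 bins (fractional counts are acceptable here, because only the CDF of the result is used):

```python
import numpy as np

def distribute_excess(clipped_groups, excess_values):
    """Concatenate the four clipped 64-bin regions and spread the summed
    excess E uniformly across all 256 bins (cf. Equation (14))."""
    h_clip = np.concatenate(clipped_groups).astype(np.float64)
    e_total = float(sum(excess_values))
    return h_clip + e_total / 256.0
```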
3.2.5. Output Value Mapping
In the Output Value Mapping step, the output value is calculated using Equations (15)–(18).
where C is the value calculated by using the cumulative distribution function (CDF); C_norm is the CDF normalized by w and h, the width and height of the computed image obtained using the NUC and TC methods; C_sel is the CDF value selected using I_AGC; and I_out is the output image with improved contrast ratio. Using Equations (15) and (16), the CDF value is calculated. Subsequently, the CDF value is selected using the 8-bit computed input image obtained through the NUC and TC methods. Finally, normalization is performed, and the output value is calculated as the selected CDF value multiplied by 255. During the computation of the output value, it is fixed to 255 when the selected CDF value is greater than one, as the CDF value can exceed one due to the excess value distribution process.
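A matching sketch of the mapping step, assuming the distributed histogram `h_dist` and the 8-bit AGC image `i_agc` from the earlier sketches; the clamp at 255 follows the rule stated above.

```python
import numpy as np

def map_output(i_agc: np.ndarray, h_dist: np.ndarray) -> np.ndarray:
    """Map each 8-bit pixel through the normalized CDF (cf. Equations (15)-(18))."""
    h, w = i_agc.shape
    cdf = np.cumsum(h_dist)
    cdf_norm = cdf / float(w * h)          # normalize by the pixel count w*h
    out = 255.0 * cdf_norm[i_agc]          # select the CDF value per pixel
    return np.minimum(out, 255.0).astype(np.uint8)  # clamp, since the CDF may exceed 1
```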
4. Experimental Results
To evaluate the performance of the proposed method, it is crucial to set the parameter α, which determines the threshold value for Histogram Clipping in each histogram region. In Section 4, α was set to 0.5. This choice was made because if α approaches 1, the Histogram Clipping effect in each histogram region becomes very weak, resulting in no significant difference from traditional histogram equalization (THE). Conversely, if α approaches 0, the Histogram Clipping effect in each histogram region can be very strong. However, as per Equations (14)–(18), when pixel values are densely distributed in a specific histogram region, the contrast enhancement performance can be significantly reduced as α approaches 0. Additionally, there is a risk that the average pixel level of the image after contrast enhancement processing may decrease, leading to a notable reduction in image brightness. Therefore, in this paper, the experimental results obtained by setting α to 0.5, the median value between 0 and 1, were compared with those of the conventional methods.
4.1. Qualitative Comparison (Visual Comparison)
4.1.1. Best Driving Scenario
Figure 3 presents experimental results comparing the proposed method with various conventional contrast enhancement methods for visual comparison under the best driving scenario. First, Figure 3b–l shows the experimental results using histogram-based conventional contrast enhancement methods. Second, Figure 3m–r shows the experimental results using retinex-based conventional contrast enhancement methods. Third, Figure 3s–w shows the experimental results using other technique-based (e.g., de-haze) conventional contrast enhancement methods. Finally, Figure 3x shows the experimental result using the proposed histogram-based method. As depicted in Figure 3a, the downscaled 8-bit image obtained using NUC, TC, and AGC reveals objects such as crosswalks, vehicles, and apartments.
Among the histogram-based contrast enhancement methods, Figure 3c,f,i,j showcases the results when using BBHE (Brightness-Preserving Bi-Histogram Equalization) [29], RMSHE (Recursive Mean-Separate Histogram Equalization) [32], BHEPL (Bi-Histogram Equalization with a Plateau Limit) [35], and RLBHE (Range-Limited Bi-Histogram Equalization) [15]. These methods improve the contrast ratio compared to the input image, making detailed object components visible. However, the image clarity appears somewhat diminished, akin to a foggy appearance.
Conversely, Figure 3d,e,g displays the outcomes when utilizing DSIHE (Dualistic Sub-Image Histogram Equalization) [30], MMBEBHE (Minimum Mean Brightness Error Bi-Histogram Equalization) [31], and BPDHE (Brightness-Preserving Dynamic Histogram Equalization) [33]. These methods exhibit increased sharpness compared to the BBHE [29], RMSHE [32], BHEPL [35], and RLBHE [15] results. However, oversaturation of pixel values is observed in the trees and signs on the far left of the image, causing inaccuracies in object details and pixel values.
In contrast, Figure 3b,h,k,l,x demonstrates that applying THE, BPHEME (Brightness-Preserving Histogram Equalization with Maximum Entropy) [34], RG-CACHE [16], ROPE (Reflectance-Oriented Probabilistic Equalization) [17], and the proposed method yields better contrast enhancement performance in terms of contrast ratio and sharpness than the conventional methods. Detailed parts of the images enhanced using THE, BPHEME, RG-CACHE, ROPE, and the proposed method are clearly visible, surpassing the results obtained with the other histogram-based conventional methods.
Among the retinex-based contrast enhancement methods, as shown in Figure 3m, AMSR (Adaptive Multi-Scale Retinex) [18] exhibited low pixel-level values and poor contrast enhancement performance. As depicted in Figure 3r, LIME [24] showed an oversaturated experimental result compared to the other conventional and proposed methods. On the other hand, as shown in Figure 3n–q,s, NPE (Naturalness Preserved Enhancement) [19], SIRE (Simultaneous Illumination and Reflectance Estimation) [20], SRIE (Simultaneous Reflectance and Illumination Estimation) [23], MF (Multi-Scale Fusion) [21], and SRLLIE (Structure-Revealing Low-Light Image Enhancement) [22] exhibited better contrast enhancement performance than the other retinex-based methods. However, these methods generally showed relatively lower contrast enhancement performance than the histogram-based contrast enhancement methods.
Among the other technique-based contrast enhancement methods, the method proposed by Dong [25] showed good edge enhancement compared to the other methods. However, in terms of contrast enhancement, its pixel values are oversaturated compared to the other methods; in other words, its contrast enhancement performance is lower than that of the other contrast enhancement methods. As shown in Figure 3u–w, the remaining methods demonstrate better contrast-enhanced experimental results than the method proposed by Dong [25]. However, they exhibit lower contrast enhancement performance compared to the experimental results using the histogram-based contrast enhancement methods.
Table 2 and Figure 4 present the results of subjective evaluations based on blind tests of 5 min videos (equivalent to 300 frames) containing frames from the best driving scenario, conducted by nine individuals, including R&D engineers working in the automotive or military industries. Table 2 displays three items: (1) subjective scores for each individual, (2) average score, and (3) rank; Figure 4 illustrates graphs containing two items: (1) average score and (2) rank.
Subjective evaluation scores for each individual range from one to five points, with one point indicating the video consisting of the worst-quality frames and five points indicating the video consisting of the best-quality frames. The average score is calculated by summing the subjective scores for each method and dividing by the number of individuals (nine in Table 2). The rank value is determined by ranking the calculated average scores from top to bottom.
As evident in Table 2 and Figure 4, our method obtained an average score of 4.56 and ranked 6th when sorted from top to bottom, which places it within the top 30% (approximately within the 7th rank) overall. When analyzing solely within the histogram-based contrast enhancement method category, the proposed method is positioned in the 6th rank, indicating a medium average score. Compared to the retinex- and other technique-based contrast enhancement methods, the proposed method achieves a better rank than the conventional methods. In other words, the subjective evaluation, which visually assesses the images, confirms that histogram-based contrast enhancement methods are the most effective in the best driving scenario.
4.1.2. Worst Driving Scenario
Figure 5 presents experimental results comparing the conventional and proposed contrast enhancement methods for visual comparison under the worst driving scenario, located in a tunnel. The sequence of methods applied to compute the experimental result frames in Figure 5a–x is the same as in Figure 3. In the worst driving scenario, improving the contrast ratio of the 8-bit input image is crucial for detecting and recognizing objects on autonomous platforms. Enhanced contrast is essential for accurately recognizing the driving status.
Histogram-based contrast enhancement methods exhibit trends similar to the experimental results in the best driving scenario. In Figure 5e,g,i, when MMBEBHE, BPDHE, and BHEPL are applied, the shape of the vehicle in the tunnel is clearly visible, but the environment around the vehicle within the tunnel is not accurately represented. On the other hand, in Figure 5c,f,h,j, although the clarity in the vehicle region is relatively reduced, the contrast is improved to a level where the driving environment in the tunnel can be roughly judged. However, the overall visual impression is still dark due to the low pixel brightness levels observed in Figure 5f,h,j. In Figure 5c, the contrast-enhanced image has relatively high brightness pixel values compared to Figure 5f,h,j. However, there is a problem of oversaturation in the vehicle region, making it impossible to accurately analyze object characteristics, and there is low contrast enhancement in the background region where the driving environment can be identified. In Figure 5b,d,k,l, the contrast has been improved to the point where the driving environment within the tunnel can be accurately distinguished compared to the results of the other histogram-based contrast enhancement methods. However, the vehicle region is so oversaturated that the wheels and vehicle body cannot be visually distinguished, and the auxiliary lights in the tunnel also appear oversaturated.
In Figure 5x, the proposed method demonstrates a uniform improvement in overall image contrast across all areas. Based on the contrast-enhanced image using our proposed method, it is evident that the pixel brightness level maintains an appropriate value, indicating it is not oversaturated compared to the conventional histogram-based enhancement methods. When assessed by region, the contrast ratio has improved sufficiently to clearly identify the driving environment within the tunnel where the frame was captured. This indicates reasonably good performance among the histogram-based methods.
The experimental results for the retinex-based contrast enhancement methods are presented in Figure 5m–s. Among these methods, the SRLLIE [22] method, depicted in Figure 5s, exhibited poor contrast enhancement performance, making it difficult to identify vehicles and the driving environment. However, when utilizing AMSR [18], NPE [19], SIRE [20], SRIE [23], MF [21], and LIME [24], as shown in Figure 5m–r, relatively high contrast enhancement performance was observed. Among these six methods, Figure 5m,o,p, which represent the results of utilizing AMSR [18], SIRE [20], and SRIE [23], respectively, displayed sufficient contrast enhancement in both the object and background regions for recognizing the driving environment. However, since the brightness level of the contrast-enhanced images is generally low, post-processing techniques such as gamma correction may be considered to further improve visibility.
The experimental results in Figure 5u–w revealed poor contrast enhancement performance, similar to Figure 5s. However, when employing the method proposed by Dong [25], illustrated in Figure 5t, notable contrast enhancement performance with high edge preservation was observed. When comparing the experimental results in Figure 5r–t, it is important to note that the ranking of user-preferred images may vary based on subjective evaluation. Therefore, a blind test was conducted on the worst driving scenario to rank the images from top to bottom, similar to the methodology described in Table 2 and Figure 4.
Table 3 and Figure 6 display the subjective evaluation results conducted under blind conditions on a 5 min video comprising 300 frames depicting the worst driving scenario, similar to the experiments outlined in Table 2. Our method achieved an average score of 3.44 and ranked 5th when sorted from top to bottom. This places our method within the top 30% (approximately within the 7th rank) overall for this scenario. In the histogram-based contrast enhancement method category, our method secured the 1st rank, indicating that it received the highest average score among its peers. Compared with the retinex- and other technique-based contrast enhancement methods, it ranked 5th, placing it within the top 50% (approximately 6 out of 12).
In conclusion, it is evident that no single category of contrast enhancement methods demonstrates overwhelmingly superior performance in the worst driving scenario. A comparison of Table 2 and Table 3 reveals significant performance discrepancies among the conventional contrast enhancement methods in subjective visual evaluations across the best and worst driving scenarios. Conversely, our proposed method consistently ranks 6th and 5th in the best and worst driving scenarios, respectively. This consistency offers the advantage of providing users with contrast-enhanced images containing uniform information regardless of the driving conditions.
4.2. Quantitative Comparison
In the quantitative comparison, we assess various aspects of contrast enhancement using six metrics: (1) Enhancement Measure (EME), (2) Entropy, (3) Linear Fuzziness (LIF), (4) Lightness Order Error (LOE), (5) Structural Similarity (SS), and (6) Mean Processing Time (MPT). A higher EME value indicates a larger dynamic range within each pre-defined cell; higher entropy indicates greater information content in the image; and a higher SS value indicates greater structural similarity to the original image. Conversely, lower values of LIF and LOE signify better enhancement.
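For reference, below is a minimal sketch of two of these metrics under their commonly used definitions (Shannon entropy of the 8-bit histogram; EME as the mean per-cell log dynamic range); the exact variants and cell sizes used in this paper's evaluation may differ.

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of an 8-bit image."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def eme(img: np.ndarray, cell: int = 8) -> float:
    """Enhancement Measure: average 20*log10(max/min) over non-overlapping cells.

    The +1 offsets avoid division by zero in all-dark cells; this is one common
    convention, assumed here rather than taken from the paper.
    """
    h, w = img.shape
    vals = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            block = img[y:y + cell, x:x + cell].astype(np.float64)
            vals.append(20.0 * np.log10((block.max() + 1.0) / (block.min() + 1.0)))
    return float(np.mean(vals))
```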
4.2.1. Best Driving Scenario
Table 4 and Figure 7 present the experimental results of the performance evaluation using objective metrics for the best driving scenario frames illustrated in Figure 3. When utilizing the proposed method, the EME, entropy, LIF, LOE, and SS metrics are 6.8451, 6.4485, 0.4959, 15.8458, and 0.9043, respectively. For metrics where higher values indicate better performance, the method showing the highest EME is THE with a value of 9.5430, and the proposed method ranked 6th with a value of 6.8451, placing it in the top 30% (approximately within the 7th rank). When considering Table 2, which shows the subjective evaluation results, the EME ranks of the proposed method and THE are consistent with their average score-based ranks from the subjective evaluation.
Regarding entropy, a higher value generally indicates better performance. In the entropy metric, the proposed method is ranked 12th with a value of 6.4485, indicating moderate performance. In terms of the SS metric, the proposed method is ranked 16th with a value of 0.9043, indicating relatively low performance. However, as shown in Table 2, the entropy and SS values of the methods that received good results in the subjective evaluation (e.g., THE, RG-CACHE [16], ROPE [17], and the proposed method) are located at low ranks. In other words, as the contrast ratio is greatly improved, the better the perceived image quality, the lower the entropy value tends to be. This is because the LWIR-based thermal image obtained after the NUC and TC processes inherently has a low contrast ratio.
Conversely, for the LIF and LOE metrics, a low value indicates high performance. In terms of LIF, the proposed method ranks 12th with a value of 0.4959, indicating medium performance compared to the conventional contrast enhancement methods. Regarding LOE, the proposed method ranks 5th with a value of 15.8458, indicating high performance (within the top 25%) compared to the conventional contrast enhancement methods. The objective performance evaluation results of the proposed method, including LOE and LIF, were satisfactory. However, LOE and LIF also exhibit poor index values for conventional methods that received good average scores in the subjective evaluation. This suggests that the previously used objective indicators cannot be relied upon as a sole standard when evaluating contrast improvement results for LWIR-based thermal images computed after the NUC and TC processes in the best driving scenario.
4.2.2. Worst Driving Scenario
Table 5 and Figure 8 showcase the experimental results of the performance evaluation for the worst driving scenario frames depicted in Figure 5. In terms of EME, the proposed method obtained a rank of 14th, from top to bottom, with a value of 22.1756. However, it is essential to note that the EME computation relies on the minimum and maximum values per pre-defined cell. Consequently, in worst-case scenarios where the contrast ratio is enhanced, extreme brightness or darkness may skew the EME results. Thus, EME may not offer a fair comparison metric, as it can be influenced by factors such as image illuminance, especially in the experimental results of the worst driving scenario using LWIR-based thermal images.
Similarly, the SS metric assumes high structural visibility in the original image. However, in worst-case scenarios, the input image for contrast enhancement lacks clear structure due to the extremely low dynamic range caused by low infrared energy. Therefore, a lower SS value might conversely indicate better performance in such scenarios, because an image with improved contrast has a distinct structure, unlike the original input image, making it significantly different. Hence, a smaller SS value indicates better image quality. Based on this understanding, for the SS metric, the proposed method ranked 9th with a value of 0.6606, placing it in the top 40%. Therefore, the proposed method showed medium performance in the worst driving scenario.
The LIF, LOE, and entropy metrics are calculated using the original image, making them more reliable for understanding the overall driving environment. However, since the original image has low structure and poor dynamic range characteristics, these metrics may not be efficient for objectively evaluating contrast enhancement performance in the worst driving scenario. This is particularly evident when considering the experimental results presented in Table 3 and Table 5.
4.2.3. Processing Speed Performance
Table 6 and Figure 9 present the MPT and frames-per-second (FPS) metrics for both the proposed and conventional methods. These metrics were extracted through experiments conducted using MATLAB software (R2023a version) in a personal computer environment. The MPT values for the proposed and conventional methods were computed over 200 frames with a resolution of 640 × 480, obtained from the QuantumRed product of Hanwha Systems Company.
Among the histogram-based contrast enhancement methods, except for BPDHE [33], the proposed and conventional methods exhibited similar MPT performance. Converting MPT to FPS (FPS being the reciprocal of the per-frame processing time) yields performance ranging from approximately 10.7 to 12.8 FPS across these methods. The proposed method ranked third in terms of both the MPT and FPS indicators. However, these values fall short of the real-time performance benchmark of 30 FPS.
Among the retinex-based contrast enhancement methods, NPE [19], SIRE [20], SRIE [23], and SRLLIE [22] required a large amount of processing time, making real-time operation impossible. On the other hand, AMSR [18], MF [21], and LIME [24] required relatively less processing time than the other retinex-based contrast enhancement methods. Compared with the retinex-based contrast enhancement methods, our proposed method showed appropriate processing performance.
The other technique-based contrast enhancement methods exhibited fast processing speeds compared to both the histogram-based and retinex-based contrast enhancement methods. However, it is evident from the previous experimental results that methods such as those proposed in [27,36], which achieve real-time performance of 30 FPS or higher, exhibit poor contrast enhancement performance in both the best and worst driving scenarios.
In conclusion, the methods that demonstrate a certain level of performance in the previous experimental results do not achieve real-time processing speeds of more than 30 FPS on a personal computer. Furthermore, there is a risk of performance degradation when running these methods in embedded environments. Solutions for achieving real-time performance are discussed in Section 5.
5. Discussion
5.1. Industry Contribution
In Section 4, we visually compared two driving scenarios (best and worst case) and conducted a quantitative evaluation using six metrics. The application of the proposed method demonstrates significant improvements, particularly in terms of enhanced sharpness and contrast ratio, in both the best and worst driving scenarios. These findings suggest that the proposed method holds promise for potential use in mass-produced products.
This paper introduces a region-based histogram equalization algorithm with a dynamic clipping technique for enhancing N-bit original images following the NUC and TC processes, a topic not previously explored in the literature. By combining objective and subjective evaluations, our study provides comprehensive performance evaluation results. We anticipate that our findings will enable companies in the defense and electronics industries to implement stable methods for mass-producing products utilizing LWIR-based thermal cameras. However, whereas our proposed method shows promising performance, the quantitative evaluation metrics yielded mixed results, indicating the need for further investigation into their alignment with subjective evaluations by actual users.
5.2. Contrast Enhancement Performance
Figure 10 illustrates the performance disparity between the methods that excelled in the subjective evaluation of the best and worst driving scenarios. Notably, methods like THE, BPDHE, BPHEME, RG-CACHE, and ROPE, which performed well in the best scenario, exhibit significant rank differences ranging from 4 to 16 or more in the worst scenario. Conversely, LIME, NPE, and the method proposed by Dong, which showed effectiveness in the worst scenario, display rank differences of at least 13 to 19. Interestingly, DSIHE and our proposed method demonstrate minimal variation in ranking between the best and worst scenarios. Moreover, our method ranks equal or higher (6th and 5th) compared to DSIHE (6th and 7th), indicating superior performance and uniform image information provision on average. Therefore, for LWIR-based thermal-imaging cameras used in the defense and electronics industries, our proposed method emerges as a viable choice due to its ability to consistently deliver uniform information to users.
5.3. Processing Speed Performance with Production Cost
Regarding processing speed performance, neither the proposed method nor the conventional methods achieved real-time processing performance in a personal computer environment at a resolution of 640 × 480. The primary limitation arises from executing a contrast enhancement algorithm on a CPU, where only one frame of the input image can be stored in memory at a time, followed by subsequent calculations.
Considering the CPU-based operation mechanism, if all algorithms (including NUC, TC, and contrast enhancement) for thermal image processing are executed in an embedded environment, not only will latency increase, but FPS will also fall short. This deficiency presents a critical challenge as it fails to meet the low latency and processing speed requirement of 30 FPS or higher, typically demanded in the defense and automotive industries. This shortfall is particularly significant because both defense and automotive industries now demand resolutions higher than high definition (HD, 1280 × 720), alongside low latency and ultra-high FPS performance. Specifically, automotive applications necessitate up to 60 FPS for ADAS systems in high-speed driving environments, whereas defense systems require up to 100 FPS to counteract high-speed weapons.
Therefore, considering the pre-processing steps (NUC and TC) before the contrast enhancement algorithm, it becomes imperative to employ a thermal imaging processor equipped with accelerators optimized for these algorithms to meet the processing performance requirements across various resolutions in embedded environments. For the pre-processing accelerator, the RTL (Register Transfer Level) circuit can be designed with a fully pipelined architecture.
In the case of contrast enhancement, encompassing both the proposed and conventional algorithms, the RTL circuit may not support a fully pipelined architecture. However, by utilizing internal FIFO memory to store the input frame while simultaneously calculating the histogram, latency and processing time can be significantly reduced compared to the CPU operation mechanism. If such an optimized accelerator-based thermal imaging processor is applied to the product, the production cost of the final LWIR-based camera can be lowered because only optimized hardware resources (e.g., block random access memories (BRAMs), LUTs, and registers) are used.
6. Conclusions
In this paper, we introduced a histogram equalization-based contrast enhancement method employing a region-based clipping technique tailored for dedicated LWIR-based thermal image processing. To assess its performance, we conducted visual and quantitative evaluations comparing the proposed method with conventional approaches under both best and worst driving scenarios. In the visual evaluation, the proposed method clearly enhances contrast and clarity compared to the conventional methods. In the quantitative evaluations of image processing performance and processing speed, the proposed method consistently demonstrates above-average metric results compared to the conventional methods in both the best and worst driving scenarios. However, as discussed in Section 5, the objective evaluation metrics did not adequately reflect the proposed method's performance. Hence, future work will involve conducting experiments to gauge the discrepancy between the objective evaluation metrics and user perspectives, with input from a larger pool of test evaluators.
Considering processing speed, neither the proposed nor the conventional methods met real-time performance standards. Therefore, our forthcoming endeavors will concentrate on assessing and improving processing speed. This will entail developing an accelerator in the form of a contrast enhancement processor using a field-programmable gate array (FPGA), alongside exploring the development of an application-specific integrated circuit (ASIC). Additionally, we will focus on comparing and analyzing the performance of dedicated contrast enhancement processors for infrared-based thermal imaging.
Through the findings presented in this paper and future research experiments, we anticipate significant enhancements in the image quality of mass-produced LWIR-based thermal cameras for night vision systems.