1. Introduction
Temperature is a key indicator for identifying anomalies in both living and inert systems. Infrared thermography (IRT) is a mature and widely accepted non-contact temperature monitoring technique; it is used for the early detection of equipment failures and process anomalies in industrial operations [1], as well as in medical diagnosis [2].
Concerning medical applications of IRT, its use in diagnosing breast cancer, diabetes, neuropathy, and peripheral vascular disease has been highlighted [2]. Similarly, IRT has been successfully used in detecting skin cancer [3,4], monitoring skin burns [5], detecting problems associated with rheumatoid arthritis [6,7], detecting necrotizing enterocolitis [8], evaluating infectious diseases such as coronavirus disease 2019 (COVID-19), and detecting fever conditions in combination with algorithms based on artificial intelligence [9,10], among others.
Regarding the use of IRT for skin cancer detection, active (dynamic) IRT outperforms passive IRT [11]. Buzug et al. [12] showed differences in thermoregulation curves between healthy and cancerous skin areas. Çetingül and Herman [13] proposed a methodology that quantifies the difference between the thermal responses of healthy skin and melanoma. Di Carlo et al. [14] showed that active thermography, unlike dermoscopy, reveals a clear pattern that differentiates basal cell carcinoma (BCC) tumors from actinic keratosis (AK). Godoy et al. [3] proposed a standardized analysis of dynamic thermography to discriminate malignant from benign lesions; the study included more than 100 patients, who presented benign, BCC, squamous cell carcinoma (SCC), or malignant melanoma (MM) lesions. A more sophisticated analysis of dynamic thermography was proposed by Godoy et al. [4]; it combines the modeling of thermoregulation curves (TRCs) with a detection theory scheme, achieving a sensitivity and specificity of 99%. Magalhaes et al. [15] trained support vector machine classifiers to distinguish benign lesions (melanocytic nevi) from malignant melanoma lesions; this study explored features extracted from steady-state (passive) thermography and from dynamic thermography acquired at one frame per minute, and found that the steady-state features were more relevant to solving this problem. Magalhaes et al. [16] proposed a deep learning classifier for processing passive thermography, achieving an accuracy of 96.91% in discriminating between malignant and benign lesions. However, when the malignant class included MM, BCC, and SCC lesions, its performance declined considerably, highlighting the potential of active thermography to address the challenge of increased inter-class variability. Soto and Godoy [17] proposed key features and a scheme for skin cancer detection using active thermography and machine learning; a support vector machine (SVM) classifier with a radial basis function (RBF) kernel achieved an accuracy close to 85%. Unlike other studies, Bu et al. [18] used active thermography with warm excitation to propose a 3D model of heat evolution in skin tumors; the simulations revealed a dependency between tumor thickness and the maximum contrast parameter, which serves as a discriminant in tumor classification. Similarly, using a cold stimulus, Cardoso and Azevedo [19] proposed a 3D model to analyze breast tumor sizes, and their methodology considerably improved the contrast.
Despite decades of research and development of medical applications based on IRT, its widespread adoption has been limited mainly by the high cost of infrared (IR) cameras. According to Narayanamurthy et al. [20], IRT needs optimum instrumentation for recording purposes, given that it is strongly affected by external noise. Owing to the continuous development of electronic technology, new detector array structures and new semiconductor alloys are available. This has reduced the cost and size and increased the resolution and precision of IR devices [21,22]. The development of longwave infrared (LWIR) microbolometer technology has led to multipurpose, low-cost cameras compatible with smartphones [23], among several other applications.
Recently, there has been great interest in developing medical applications based on low-cost IR cameras, with tests conducted mainly to support the evaluation of diabetic foot conditions [24,25,26,27,28,29,30,31]. Other studies have evaluated skin burns [32] and assessed the healing progress of thoracic surgical incisions [33]. Villa et al. [28] characterized and compared the low-cost cameras Seek Thermal CompactPRO and Thermal Expert TE-Q1 Plus (TE-Q1), using the high-end camera INO IRXCAM-640 as a reference. Their findings suggest that the TE-Q1 is suitable for e-health applications, particularly for assessing diabetic foot ulcers, because its measured noise equivalent temperature difference (NETD) and its residual non-uniformity (RNU), validated at 25 °C, are comparable to those of the IRXCAM-640 camera.
To determine whether a camera is useful for a task, a dataset should be acquired with that camera, the data should be processed, and the resulting performance should be calculated. In medical problems, this involves requesting permission from the clinical unit, coordinating work teams, recruiting volunteers, and preparing informative documentation for them, among other things. Given that space and time are limited in a clinical environment, it is important to know the feasibility of using such equipment before starting any clinical trial. Moreover, most investigations are initially performed with high-quality IR cameras, so it is important to properly model the decrease in performance with lower-quality imagers. In this paper, we propose an IR video degradation model that degrades videos captured with high-quality cameras to simulate the performance of lower-quality cameras. Our case study evaluates the feasibility of using the Xenics Gobi-640, Opgal Therm-App, and Seek Thermal CompactPRO cameras for skin cancer detection through active thermography. This evaluation is based on patient data previously acquired with a higher-quality imager [4]. However, the proposed degradation model can be applied to any kind of infrared camera and any type of application, in both passive and active thermography.
4. Results
Two main experiments were conducted and are shown here. The first aims to validate the proposed degradation model, and the second evaluates the feasibility of using low-cost IR cameras in skin cancer detection using active thermography.
4.1. Validation of the Degradation Model
To validate the degradation model, videos were acquired simultaneously with the three cameras positioned over a lesion on a volunteer. The videos were captured according to the protocol described in Section 3.2.1, and each video was then registered using the registration algorithm described in Section 3.2.2.
To validate the model, the video acquired with the Xenics Gobi-640 camera was treated as the high-quality video, and the videos captured with the Opgal Therm-App and Seek Thermal CompactPRO cameras, which have lower quality and cost, were the ones to be mimicked. Following the methodology described in Section 3.4, the video captured with the Xenics camera was adapted to the noise characteristics of the other cameras. First, the frame rate and image dimensions were reduced to match those of the camera being modeled. Second, the PSF characteristics of each camera were applied to the reduced-size imagery. Third, characteristic spatial noise was added to the degraded imagery. Fourth, temporal noise specific to the camera being modeled was also added.
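The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the integer decimation factors, the 3 × 3 averaging kernel standing in for a measured PSF, and the purely Gaussian noise levels are all hypothetical, whereas the actual model uses the characteristics measured for each camera (Section 3.4).

```python
import random

def decimate(video, t_step, s_step):
    """Step 1: keep every t_step-th frame and average s_step x s_step pixel blocks."""
    out = []
    for frame in video[::t_step]:
        rows, cols = len(frame) // s_step, len(frame[0]) // s_step
        out.append([[sum(frame[r * s_step + i][c * s_step + j]
                         for i in range(s_step) for j in range(s_step)) / s_step ** 2
                     for c in range(cols)] for r in range(rows)])
    return out

def blur(frame, kernel):
    """Step 2: filter one frame with a small, symmetric PSF kernel (edges clamped)."""
    h, w, k = len(frame), len(frame[0]), len(kernel) // 2

    def px(r, c):
        return frame[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(kernel[i + k][j + k] * px(r + i, c + j)
                 for i in range(-k, k + 1) for j in range(-k, k + 1))
             for c in range(w)] for r in range(h)]

def degrade(video, t_step, s_step, kernel, sigma_spatial, sigma_temporal, seed=0):
    """Apply the four degradation steps to a video stored as video[t][row][col]."""
    rng = random.Random(seed)
    small = [blur(f, kernel) for f in decimate(video, t_step, s_step)]
    h, w = len(small[0]), len(small[0][0])
    # Step 3: spatial (fixed-pattern) noise, drawn once and reused in every frame.
    fpn = [[rng.gauss(0.0, sigma_spatial) for _ in range(w)] for _ in range(h)]
    # Step 4: temporal noise, drawn independently for each pixel and frame
    # (assumption: white noise only; the low-frequency part is omitted here).
    return [[[small[t][r][c] + fpn[r][c] + rng.gauss(0.0, sigma_temporal)
              for c in range(w)] for r in range(h)] for t in range(len(small))]

# Synthetic 10-frame, 8 x 8 video whose temperature rises 0.1 degrees per frame.
video = [[[30.0 + 0.1 * t for _ in range(8)] for _ in range(8)] for t in range(10)]
kernel = [[1 / 9] * 3 for _ in range(3)]
out = degrade(video, t_step=2, s_step=2, kernel=kernel,
              sigma_spatial=0.05, sigma_temporal=0.02)
print(len(out), len(out[0]), len(out[0][0]))  # prints "5 4 4"
```

Drawing the spatial noise once and the temporal noise per frame keeps the two contributions separate, which mirrors the order of the third and fourth steps.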
For the temporal noise, let $T(i,j,\cdot)$ be a TRC captured at position $(i,j)$, where $\cdot$ indicates that we are considering all time samples of the video. The temporal noise present in $T(i,j,\cdot)$ is $n(i,j,\cdot) = T(i,j,\cdot) - \hat{T}(i,j,\cdot)$, where $\hat{T}(i,j,\cdot)$ corresponds to the curve $T(i,j,\cdot)$ modeled as a double exponential, as defined in (1). Then, the low-frequency component is obtained by modeling $n(i,j,\cdot)$ as a Fourier series, as defined in Section 3.5.2, to obtain $n_{\mathrm{LF}}(i,j,\cdot)$. Meanwhile, the high-frequency component corresponds to white Gaussian noise, whose standard deviation is calculated over the residual noise, as described above.
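The decomposition of the temporal noise can be illustrated with the short sketch below. It assumes that the double-exponential fit of the TRC is already available (the fitting step itself is not shown), and it approximates the low-frequency component with a truncated discrete Fourier series. The function names, the number of harmonics, and the synthetic curve parameters are hypothetical choices made only for illustration.

```python
import cmath
import math
import random
import statistics

def split_temporal_noise(trc, fitted, n_harmonics=3):
    """Split the residual trc - fitted into a low-frequency Fourier part and the
    standard deviation of what remains. Requires n_harmonics < len(trc) / 2."""
    n = len(trc)
    noise = [y - f for y, f in zip(trc, fitted)]
    # Discrete Fourier coefficients of the residual for harmonics 0..n_harmonics.
    coeffs = [
        sum(noise[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n_harmonics + 1)
    ]
    # Truncated Fourier series; for a real signal each harmonic k > 0 counts twice.
    low = [
        coeffs[0].real
        + 2 * sum((coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
                  for k in range(1, n_harmonics + 1))
        for t in range(n)
    ]
    # Spread of the part not explained by the low-frequency series.
    sigma = statistics.pstdev([x - l for x, l in zip(noise, low)])
    return noise, low, sigma

def synthesize_noise(low, sigma, rng):
    """Draw one noise realization: low-frequency part plus white Gaussian noise."""
    return [l + rng.gauss(0.0, sigma) for l in low]

# Synthetic check: an illustrative double-exponential curve plus a slow oscillation.
n = 64
fitted = [30.0 - 3.0 * math.exp(-t / 8.0) - 2.0 * math.exp(-t / 30.0) for t in range(n)]
drift = [0.1 * math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
trc = [f + d for f, d in zip(fitted, drift)]
noise, low, sigma = split_temporal_noise(trc, fitted)
simulated = synthesize_noise(low, sigma, random.Random(0))
```

In this synthetic case, the slow oscillation is recovered by the low-frequency series and the remaining standard deviation is numerically zero, because no white noise was added to the input.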
Then, using the data cube acquired with camera c and the data cube from the high-quality camera degraded to mimic camera c, the Pearson correlation coefficient was calculated to estimate the similarity between the actual and simulated TRCs. For the Opgal Therm-App camera, the comparison included 6889 TRCs; for the Seek Thermal CompactPRO camera, it included 7031 TRCs. In both cases, the average correlation exceeded 0.9.
Because both averages exceeded 0.9, the correlation between the real and simulated curves was deemed very strong [50]; this indicates a high degree of similarity between the data collected with the actual camera and the data that mimicked it. Therefore, we consider the proposed degradation model to be valid within the scope of the TRC measurements. More evidence must still be collected to validate the model for other cameras and other case studies.
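The similarity measure used above can be written compactly. The sketch below computes the Pearson correlation coefficient for each pair of real and simulated TRCs and averages the result; the function names and the toy curves are hypothetical, and the pairing of curves by pixel position is assumed to follow from the registration step.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_trc_correlation(real_trcs, simulated_trcs):
    """Average the per-pixel correlation over all paired TRCs."""
    values = [pearson(r, s) for r, s in zip(real_trcs, simulated_trcs)]
    return sum(values) / len(values)

# Toy example with two pixel positions and four time samples each.
real = [[30.0, 30.8, 31.3, 31.6], [29.5, 30.1, 30.6, 30.8]]
simulated = [[30.1, 30.7, 31.4, 31.5], [29.4, 30.2, 30.5, 30.9]]
average_r = mean_trc_correlation(real, simulated)
```

Because the coefficient is invariant to offset and gain, it compares the shape of the curves rather than their absolute temperature values.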
A sample of the degradation model applied to the data cube acquired with the Xenics camera to mimic the Opgal camera is presented in Figure 7. Figure 7a shows a frame of the IR video acquired with the Xenics camera, and Figure 7d shows a TRC of that cube. Figure 7b shows a frame of the video acquired with the Opgal camera, and Figure 7e shows a TRC of that cube. Figure 7c shows a frame of the Xenics video after it was degraded to the characteristics of the Opgal camera according to the proposed model, and Figure 7f shows a TRC of the degraded cube.
Similarly, a sample of the degradation model applied to the same Xenics data cube to mimic the Seek camera is presented in Figure 8. Figure 8a shows a frame of the IR video acquired with the Xenics camera, and Figure 8d shows a TRC of that cube. Figure 8b shows a frame of the video acquired with the Seek camera, and Figure 8e shows a TRC of that cube. Figure 8c shows a frame of the Xenics video after it was degraded to the characteristics of the Seek camera according to the proposed model, and Figure 8f shows a TRC of the degraded cube.
The proposed method does not aim to mimic the TRCs exactly at every sample point, but to transfer all the noise characteristics from one camera to the other. Even though the TRCs are not visually identical, we observe that the low-frequency oscillations and high-frequency noise of the low-cost cameras are present in the modeled data (see Figure 8e,f, for example). Some of the differences that are not transferred are due to the non-uniformity correction (NUC) shutter that operates continuously in these cameras.
A larger graphic sample is presented in Appendix A, which shows the results of mimicking each camera on the four types of lesions studied in this work.
4.2. Feasibility Study of Using Low-Cost IR Cameras in Skin Cancer Detection
We now use the degraded data cubes to assess the performance of the skin cancer detection algorithm described in Section 3.3. In this way, we can estimate the performance one may obtain when low-cost infrared imagers are used for this task.
To understand the variability of the classifier performance, the classifier was trained on each of the modeled datasets using the bootstrap method, which involved sampling with replacement from the original dataset to generate a new dataset of the same size. The resampled dataset was then divided, allocating 80% of the data for training and 20% for testing. This process was repeated 2000 times. The results were previously reported by our group.
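The resampling procedure described above can be sketched as follows. The classifier here is a deliberately trivial threshold rule used only as a stand-in, and the function names, the one-dimensional feature, and the toy data are hypothetical; the detection algorithm evaluated in this work is the one described in Section 3.3.

```python
import random
import statistics

def bootstrap_accuracy(samples, train_fn, n_rounds=2000, train_frac=0.8, seed=0):
    """Repeat: resample with replacement, split 80/20, train, and score accuracy."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        # New dataset of the same size, drawn with replacement.
        boot = rng.choices(samples, k=len(samples))
        cut = int(train_frac * len(boot))
        train, test = boot[:cut], boot[cut:]
        predict = train_fn(train)
        hits = sum(predict(x) == y for x, y in test)
        scores.append(hits / len(test))
    return {"min": min(scores), "max": max(scores),
            "avg": statistics.mean(scores), "sd": statistics.stdev(scores)}

def threshold_classifier(train):
    """Placeholder classifier: threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        # Degenerate resample with a single class: predict that class.
        majority = 1 if len(pos) >= len(neg) else 0
        return lambda x: majority
    mean_pos, mean_neg = statistics.mean(pos), statistics.mean(neg)
    threshold = (mean_pos + mean_neg) / 2
    if mean_pos >= mean_neg:
        return lambda x: int(x >= threshold)
    return lambda x: int(x < threshold)

# Toy dataset of (feature, label) pairs with two well-separated classes.
samples = [(0.01 * i, 0) for i in range(20)] + [(1.0 + 0.01 * i, 1) for i in range(20)]
stats = bootstrap_accuracy(samples, threshold_classifier, n_rounds=200)
```

Since the split is made after resampling, as in the description above, the same case can appear in both the training and the test partitions of a given round.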
The algorithm performance with the original dataset acquired with the QmagiQ camera is shown in the second column of Table 2. The results obtained by degrading these videos to match the characteristics of the Xenics, Opgal, and Seek cameras are presented in the following columns of the same table. The reported indices include accuracy, true positive rate (TPR), true negative rate (TNR), and positive predictive value (PPV), with the minimum, maximum, average (AVG), and standard deviation (SD) values given for each.
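For reference, the four indices can be computed from the confusion counts of a single test split, as in the sketch below. The function name and the toy labels are hypothetical, with 1 denoting a malignant lesion and 0 a benign one; the sketch assumes that both classes are present and that at least one positive prediction is made, so that no denominator is zero.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, TPR (sensitivity), TNR (specificity), and PPV (precision)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
    }

# Toy split: 4 malignant and 4 benign cases, one error in each class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)  # every index equals 0.75 for this toy split
```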
As expected, the best performance is achieved with the QmagiQ camera, with an average accuracy of 87.29%, sensitivity (TPR) of 87.26%, specificity (TNR) of 87.15%, and precision (PPV) of 87.39%. The adaptations to the Xenics and Opgal cameras offer a performance similar to that of the QmagiQ camera. The Xenics camera reached an accuracy of 84.33% and a sensitivity of 83.03%, while the Opgal camera achieved an accuracy of 84.20% and a sensitivity of 83.23%. Based on these results, both the Xenics and Opgal cameras are suitable for skin cancer detection, as they achieve performance similar to that of the QmagiQ camera, with a difference of approximately 3 percentage points in accuracy.
The worst performance was observed when adapting to the characteristics of the Seek camera, with an average accuracy of 82.13%, sensitivity of 79.77%, and specificity of 83.74%. The sensitivity was approximately 7.5 percentage points lower than that of the QmagiQ camera (79.77% versus 87.26%). Consequently, we do not consider this camera suitable for the skin cancer application we are investigating.
Regarding the system’s performance in detecting the different types of skin cancer, the highest sensitivity was obtained in detecting SCC lesions, with values of 87.66%, 90.92%, 94.74%, and 95.94%, with the Seek, Opgal, Xenics, and QmagiQ cameras, respectively. Concerning BCC lesions, sensitivity values of 78.68%, 83.60%, 83.18%, and 87.23% were achieved with the Seek, Opgal, Xenics, and QmagiQ cameras, respectively. The worst performance was obtained in detecting MM lesions, with sensitivity values of 74.98%, 71.68%, 66.91%, and 76.42%, with the Seek, Opgal, Xenics, and QmagiQ cameras, respectively.
Table 3 presents the sensitivity and specificity of the proposed method using different cameras, along with the performance of highly trained dermatologists using naked-eye evaluation and dermoscopy. The results obtained by Magalhaes et al. [16] using passive thermography and deep learning are also presented. Dermoscopy significantly outperformed naked-eye evaluation, demonstrating very high performance. However, it is important to note that this level of performance is achieved by dermatologists with years of clinical training. In addition, the reported performances correspond to different detection problems. The passive thermography analysis method using deep learning proposed by Magalhaes et al. [16] outperforms dermatologists in this detection problem. However, when distinguishing between malignant and benign lesions (with the malignant class including MM, BCC, and SCC), the algorithm’s performance drops considerably. This is where the relevance of active thermography analysis becomes evident, achieving over 80% sensitivity and 85% specificity. This work demonstrates that it is possible to significantly reduce equipment costs while maintaining similar performance.
5. Discussion and Conclusions
In this work, a degradation model for IR videos is proposed and evaluated to mimic the performance of different cameras in medical applications under laboratory conditions. Based on this model, videos captured with a high-quality camera were degraded to the characteristics of three low-cost imagers, namely, the Xenics Gobi-640, Opgal Therm-App, and Seek Thermal CompactPRO cameras. These synthetic datasets were then used to evaluate the feasibility of using the modeled cameras with the skin cancer detection algorithm proposed by Soto and Godoy [17].
The proposed degradation model focuses on three key areas to accurately mimic the performance of any camera: temporal, spatial, and thermal resolution. It has been demonstrated that the model achieves a high level of similarity in the TRC, which is the most important aspect in applications that use active thermography. Moreover, a qualitative inspection shows that the model also reproduces the spatial texture of the images captured by the modeled cameras.
The proposed model may not fully transfer the characteristics of a lower-quality camera to images captured with a higher-quality camera. The main characteristics that the model does not consider are as follows:
Adjustment of the image size. The model does not adjust the size of high-quality images to match those captured with lower-quality cameras when the FPA of the lower-quality camera is larger. This adjustment was not considered because it requires an interpolation process, which may introduce noise that is not characteristic of the camera being simulated.
Shape and size of the detector. This feature is critical for determining the minimum size of detectable objects. However, since the size of skin lesions is significantly larger than the detector size, this characteristic was not considered. The size of a typical detector in microbolometer technology is approximately 20 μm; with the right optics, it would be possible to detect a 40 × 40 μm lesion.
Temporal noise introduced by the shutter. In response to temporal variations caused by changes in ambient temperature, the camera adjusts the offset and modifies the gain. However, incorporating this feature is problematic because the camera’s logic for applying these adjustments is unknown and cannot be determined.
Despite not considering the points mentioned above, the proposed model achieves a high similarity between the degraded videos and the videos actually captured with the modeled cameras. We anticipate that when these cameras are used in a clinical environment, their behavior will mirror what was described in this study. This is because the cameras were evaluated in one of the worst scenarios in terms of ambient temperature fluctuations. The air-conditioning unit aggressively controlled the room temperature, and when activated, it decreased the temperature by approximately 5 °C, which translated into a drift in the temperature measurements. However, we do not rule out the possibility that the camera measurements might be influenced by other issues that we have not yet considered.
We believe that this model can be useful to evaluate the use of an IR camera in different applications, without the need to spend time, space, and other resources generating a dataset with a camera that does not fit the requirements of the application. However, as we have shown, IR cameras are highly affected by the ambient temperature, so it is important to characterize the behavior of the camera in the environment in which it will be used so that the simulation is as close as possible to its real performance. Nevertheless, some nuances may not be captured by our model, and addressing them is the focus of our current research.
We showed that the model manages to simulate the behaviors of the cameras studied, achieving a high similarity of TRCs, with a Pearson correlation coefficient higher than 0.9. We believe that this model can be applied to most cameras based on microbolometer technology. However, this remains to be verified, especially for cameras with worse characteristics than those studied.
For our case study, we demonstrated that, within certain boundaries, the Xenics Gobi-640 and Opgal Therm-App are the most suitable IR cameras for skin cancer detection using active thermography. This is because their characteristics allow the evaluated algorithm to achieve performance similar to that obtained with the high-quality camera. Moreover, the NUC approach in these two cameras is less aggressive, allowing discontinuities to be easily corrected over time. Meanwhile, the Seek camera presents an average decline of approximately 5 percentage points in accuracy and 7 percentage points in TPR, so we do not consider it suitable for skin cancer detection using active thermography.
It is relevant to analyze the performance of the tool in detecting different types of skin cancer to observe its robustness in different scenarios. The performance of the system is outstanding when processing SCC-type lesions, reaching an average sensitivity between 87.66% and 95.94%; with BCC-type lesions, an average sensitivity between 78.68% and 87.23% is achieved. In contrast, when analyzing MM lesions, the average sensitivity is in the range of 66.91% to 76.42%. It is important to note that the dataset used is small, with only seven cases of MM, which implies that the test sets generated with the bootstrap technique contain at most two IR cubes of MM lesions. This is a disadvantage when training the detection models because the data do not represent the general behavior of this type of lesion; at evaluation time, it implies that when a single case is missed, the sensitivity drops to 50%. Because of this, it is very important to enlarge the dataset to make the system more robust.
This study was conducted using a dataset captured from the New Mexican population, primarily involving Caucasian and Hispanic individuals. Although we consider the dataset small, it is representative of this demographic. We strongly believe that the malignancy of the lesions is encoded in the thermal recovery of the skin, so we consider it important to enlarge the dataset to ensure it represents all types of populations. We are currently working to conduct a similar study in the Chilean population of the Biobío region. Conducting a clinical study involves many challenges, starting with the authorization of the clinical field, which must ensure the confidentiality of the patient’s data and that the patient is not exposed to any treatment that is harmful to his or her physical or mental health. For this study, we obtained informed consent from all subjects, and the clinical staff was responsible for anonymizing the data, ensuring that we could never link patient identities with their data. Another challenge is adjusting to the limited space and time available to capture the data in a real public health clinical setting. This was a main motivation for developing the video degradation model, which allowed us to select the right equipment for the research. The model reduced the time spent operating multiple devices simultaneously and improved our ability to work comfortably in constrained spaces.
Most algorithms attempt to detect skin cancer with the best available sensor; our research attempts to change this paradigm by evaluating the feasibility of these algorithms when applied to data acquired with low-cost sensors. The similarity of the results between the QmagiQ, Xenics, and Opgal cameras is due to the robustness of the detection algorithm. As evidenced by the results presented in Section 4.2 and Appendix B, more advanced classification techniques such as random forest, SVM, and XGBoost allow for similar performance in skin cancer detection when using high- or low-quality technology, as opposed to simpler algorithms, such as KNN. Thus, we can conclude that, thanks to advances in machine learning and feature extraction techniques, it is possible to use lower-quality technology. We hope to report better results soon; we are developing new statistical tools to extract the spatial thermal information of a process that is hidden within noisy TRCs. For this, the degradation model will play a key role in understanding how the actual data are acquired with low-cost imagers.
As with any other type of cancer, early detection is key for skin cancer patients. The tool presented in this work uses non-invasive, non-contact, and low-cost technology that can screen a suspicious lesion within a few minutes. The portability and low cost of our tool favor its rapid, widespread adoption, making it accessible in primary care centers, even in difficult-to-access villages, where it is difficult for a patient to be evaluated by a trained dermatologist. Our tool also aims to support specialists in deciding whether to perform a biopsy; with this, we hope to contribute to the optimization of resources by avoiding unnecessary biopsies. In this way, our tool contributes to public health by aiding in the early detection of skin cancer. It serves as an initial screening to refer patients to a dermatologist, supports the decision to perform a biopsy, and helps reduce the costs associated with more severe diseases caused by the late detection of cancerous lesions.