3.1. Spectral Range
Molecular spectroscopy across the near-infrared spectrum consists of multiple orders of overtone and combination transitions associated with vibrational modes for selected molecular bonds. The vibrational spectroscopy of water strongly influences the underlying spectral features observed in aqueous solutions owing to the high concentration of infrared-active O-H bonds. Hydrogen bonding modulates the absorptivity of the O-H bond, thereby resulting in a strong temperature sensitivity of the water spectrum over the near-infrared wavelengths. As noted above, numerous research groups have demonstrated the potential of measuring the temperature of aqueous solutions from an analysis of near-infrared spectra. Most notably, the principal water absorption bands across the near-infrared spectrum display a prominent blue-shift (a shift toward higher energy) with increasing temperature.
The work summarized in the Introduction section featured different regions of the near-infrared spectrum for temperature measurements. Andrade and co-workers [13] used spectra collected over the 600–1100 nm spectral range (16,700–9090 cm−1), corresponding to the second order of the water O-H stretching vibration. Temperature models presented by both Brown and Kakuta [12,14] relied on spectral information collected over the first overtone of the water O-H stretching vibration mode (ca. 1300–1620 nm or 7690–6170 cm−1). Small and his research team [15] constructed temperature calibration models based on near-infrared spectra collected over the range of 2130–2380 nm (4700–4200 cm−1). This spectral range sits between the fundamental water O-H stretch centered at 2780 nm (3600 cm−1) and the first combination of the O-H stretch and H-O-H bending modes centered at 1920 nm (5200 cm−1). The range used by Small’s group is an absorption minimum for aqueous solutions and features first-order combination bands associated with C-H, N-H, and O-H bonds in organic molecules. In combination, these studies illustrate how multiple spectral regions over the near-infrared spectrum are available for temperature measurements in aqueous environments.
In this work, absorbance spectra were collected over a set of proprietary wavelengths associated with the first overtone O-H stretch of water. This spectral band was selected on the basis of analytical sensitivity. The absorptivity of the first-order O-H water band is approximately ten-fold higher than that of the second-order O-H water band, thereby providing higher measurement sensitivity [16]. In addition, the shorter wavelengths of the first overtone spectral region offer deeper penetration depths in comparison to the longer wavelengths of the combination band. Deeper penetration depths enhance sensitivity because of longer effective aqueous path lengths.
3.2. Spectral Sensitivity to Temperature
Principal component analysis (PCA) was performed to assess the relationship between spectral variance and the temperature of the phantom matrix. Accordingly, a PCA calculation was performed on the full set of 700 tissue phantom spectra (seven phantoms × five temperatures × twenty spectra per temperature). The spectra used for this PCA calculation corresponded to the raw absorbance spectra collected for each phantom without any prior spectral pre-processing.
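The PCA step described above can be sketched in a few lines of Python; the spectra below are random stand-ins (the actual wavelength set is proprietary), so only the array shapes, not the values, reflect the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 700 x n_wavelengths raw absorbance matrix
# (7 phantoms x 5 temperatures x 20 spectra per temperature).
spectra = rng.normal(size=(700, 64))

# PCA via SVD of the column-mean-centered data matrix.
X = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

scores = U * s                    # principal component scores (700 x 64)
explained = s**2 / np.sum(s**2)   # fraction of variance per component
cum5 = explained[:5].sum()        # cumulative variance of the first five PCs
```

With the real dataset, `cum5` corresponds to the 99.87% figure quoted below, and score–temperature plots are produced by plotting columns of `scores` against the assigned temperatures.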
The five score–temperature plots presented in Figure 2 summarize the PCA results. For each score plot, the principal component scores are plotted relative to the temperature assigned to each spectrum. The plotted points are color-coded to distinguish each phantom, as specified in Table 1. These first five principal components cumulatively explain 99.87% of the spectral variance throughout this tissue phantom spectral dataset.
The plots presented in Figure 2 illustrate a strong relationship between spectral variance and the assigned phantom temperature. Although the scores plotted for the first principal component are relatively insensitive to sample temperature, significant correlations are evident between temperature and principal component scores for the second, third, fourth, and fifth principal components. Such correlations support the feasibility of multivariate calibration models for the measurement of phantom temperature.
Figure 2 also reveals several periods with extensive jumps in the principal component scores, particularly for Phantoms 1 (blue), 3 (yellow), 4 (purple), and 6 (cyan). This type of spectral variance is especially prominent in the second principal component score plots for Phantoms 3, 4, and 6, the third principal component plots for Phantoms 3 and 6, and the fourth principal component score plots for Phantoms 1 and 4. These variations correspond to sudden changes in the measured diffuse reflectance signal and are attributed to within-run instability of the interface between the sensing module and the phantom surface. Such instabilities constitute systematic errors, thereby justifying the exclusion of these data points as outliers from our analyses. Of the 100 data points collected for each phantom, nine, five, twenty, and fourteen outlier points were removed from Phantoms 1, 3, 4, and 6, respectively. Overall, the dataset size was reduced from 700 to 652 spectra, corresponding to 93.1% retention. The removed data points are identified in Figure S1, which shows the same five principal component score plots presented in Figure 2, but with a box encompassing each of the removed outlier data points.
The PCA was repeated after removing the outlier spectra, and the resulting score–temperature plots are presented in Figure 3. This second set of scores is similarly distributed to the one displayed in Figure 2 for the first, second, and third principal components. In contrast, major differences are noted for the fourth and fifth principal component plots in Figure 2 and Figure 3. For principal component 4, the score–temperature plots in Figure 3 have a negative slope compared to the moderate positive slopes observed in Figure 2. Figure 3 also illustrates that the PCA scores for the fifth principal component are insensitive to temperature.
The score–temperature data plotted in Figure 3 were fit to a linear function for each phantom. These fits provide a measure of the sensitivity of scores to temperature (slope), offsets between phantoms (y-intercepts), and goodness of fit within each phantom experiment (coefficient of determination, R2). Results for the various regression calculations are summarized in Table S1.
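The per-phantom line fits can be reproduced with a short least-squares helper; the temperature range, slope, and noise level below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fit_score_temperature(temps, scores):
    """Fit score = m*T + c and return slope, intercept, and R^2."""
    m, c = np.polyfit(temps, scores, 1)
    residuals = scores - (m * temps + c)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((scores - scores.mean()) ** 2)
    return m, c, 1.0 - ss_res / ss_tot

# Illustrative phantom run: scores rising linearly with temperature plus noise.
rng = np.random.default_rng(1)
T = np.linspace(33.0, 37.0, 100)
score = 0.8 * T - 25.0 + rng.normal(scale=0.05, size=T.size)
slope, intercept, r2 = fit_score_temperature(T, score)
```

Applying this helper to each phantom's score–temperature trace yields the slope, y-intercept, and R2 entries of the kind tabulated in Table S1.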
As expected, the regression slopes are low and the correlations (R2) between scores and temperature are poor for the first and fifth principal components. The largest slopes are observed for the second, third, and fourth principal components and, likewise, correlations between scores and temperature are strongest for these principal components. The ranges of R2 values across the seven phantoms are 0.87–0.99, 0.88–0.98, and 0.66–0.96 for principal components 2, 3, and 4, respectively.
Offsets are also evident in the score plots presented in Figure 3. Spectral offsets between phantoms are most apparent in the plots for the first, second, and third principal components. Likewise, the relative variances of the y-intercepts listed in Table S1 are higher for the first, second, and third principal components relative to the fourth and fifth. The relative variance values are 3.6, 0.6, 1.3, 0.007, and 0.03, respectively, for the five principal components represented in Figure 3. These values illustrate that the highest degree of offset variance is described by the first principal component.
Differences in the chemical composition of the phantoms affect their absorption and scattering properties and could contribute to the observed spectral offsets. To visualize these effects, PCA scores were plotted as a function of the concentrations of water, gelatin, and Intralipid for the prepared phantoms. These plots are presented in Figure 4, and Table S2 summarizes the results of linear regression analysis for each of the five principal components. In general, the observed R2 values are low (ranging from 0.002 to 0.21) for all plots except for the first principal component scores versus the concentrations of water and gelatin. The R2 values for these water and gelatin plots are 0.73 and 0.70, respectively, indicating a correlation between spectral offsets and the chemical composition of the phantom. For comparison, the R2 value is only 0.19 for the Intralipid plot for the first principal component. The similarities in R2 for water and gelatin are caused by a strong concentration correlation for these two major matrix constituents (R2 = 0.98). Of course, uncontrolled variations associated with the environment, instrumentation, and sample interface may also be potential contributors to the observed spectral offsets.
3.3. Partial Least Squares Analysis
PLS calibration models were examined for their ability to predict the temperature of the tissue phantoms based on spectra collected with the prototype solid-state laser spectrometer. Two modeling strategies were explored. The first involved splitting the full dataset into separate calibration and prediction subsets. The second used a leave-one-phantom-out cross-validation strategy. As implemented, both methods are designed to test the ability of the model to predict temperatures from spectral data collected outside the calibration period.
3.3.1. PLS Model from Calibration–Prediction Split
A single PLS calibration model was generated after splitting the dataset into calibration and prediction subsets. The spectral data collected for each phantom were separated into the spectra associated with the first three temperature steps and those associated with the final two temperature steps, as specified in Table 1. Spectra in the first grouping were used to construct the calibration model while those in the second grouping were used to judge prediction accuracy. This strategy aims to demonstrate the ability of the PLS model to predict temperatures accurately from spectra collected outside the calibration period across the seven phantoms. For a given phantom, the spectral features associated with temperature steps performed outside of the calibration period are represented by temperature steps associated with the other phantoms.
The full dataset used in this analysis corresponded to the 700 spectra collected across all seven phantoms, excluding the 48 outlier spectra described above. As a result, the calibration dataset was composed of 372 absorbance spectra and the prediction dataset was composed of 280 absorbance spectra. As noted in the Materials and Methods section, temperatures were assigned to each spectrum by linear interpolation between the two bracketing temperatures.
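The temperature-assignment step can be expressed directly with `numpy.interp`: each spectrum's temperature is linearly interpolated between the two reference readings that bracket its acquisition time. The times and temperatures below are illustrative, not the recorded values.

```python
import numpy as np

# Reference thermometer readings bracketing a temperature step (illustrative).
ref_times = np.array([0.0, 20.0])    # minutes
ref_temps = np.array([23.0, 24.0])   # deg C

# Acquisition times of three spectra collected between the two readings.
spectrum_times = np.array([5.0, 10.0, 15.0])
assigned = np.interp(spectrum_times, ref_times, ref_temps)
# assigned -> [23.25, 23.5, 23.75]
```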
The PLS calculation used mean-centered spectra by phantom. No additional pre-processing was attempted to improve performance. The only optimized parameter was the number of latent variables, or factors, used in the final PLS calibration model. A leave-one-phantom-out cross-validation strategy was used for this optimization. In this calculation, all spectra in the calibration dataset associated with a given phantom were removed and a PLS calibration model was developed using the retained calibration spectra along with their assigned temperatures. Temperatures were then predicted for each spectrum of the left-out phantom. This process was repeated until each phantom had been left out once and, when completed, the standard error of cross-validation (SECV) was calculated for the predicted temperatures associated with the left-out phantom spectra. The entire process was repeated for 1–10 latent variables and the optimum number of latent variables was determined based on Akaike’s information criterion, which is used here to avoid over- or under-fitting by providing a metric for assessing the balance between goodness of fit and model complexity [17].
The following equation was then used to calculate the predicted temperature for the left-out spectra:

T̂ = (a − ā)ᵀ b + T̄   (2)

where T̂ is the predicted temperature for a left-out spectrum, a is the left-out absorbance spectrum, ā is the average absorbance spectrum for the calibration dataset, b is the PLS regression vector, and T̄ is the average assigned temperature for the spectra in the calibration dataset.
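As a numerical check, this prediction step (the mean-centered spectrum projected onto the PLS regression vector, plus the mean calibration temperature) can be written out directly; all vectors and values below are made up for illustration, not taken from the instrument.

```python
import numpy as np

a = np.array([0.52, 0.61, 0.48])      # left-out absorbance spectrum (illustrative)
a_bar = np.array([0.50, 0.60, 0.50])  # mean calibration spectrum
b = np.array([10.0, -5.0, 2.0])       # PLS regression vector (illustrative)
T_bar = 35.0                          # mean calibration temperature, deg C

T_hat = (a - a_bar) @ b + T_bar       # predicted temperature
```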
Figure 5 shows the results from the optimized PLS model, which used three latent variables. In this figure, the predicted temperatures are plotted relative to the assigned temperatures for both the points used for calibration (red) and those used for prediction (yellow). In both cases, the calibration and prediction points align along the ideal unity line. Least squares linear fits provide a slope of 0.94 (±0.01), a y-intercept of 1.4 (±0.2) °C, and an R2 value of 0.95 for the 372 calibration data points. For the corresponding 280 prediction data points, a slope of 1.00 (±0.02), a y-intercept of −0.1 (±0.4) °C, and an R2 value of 0.94 were obtained.
Overall, the SECV is 0.26 °C, the standard error of prediction (SEP) is 0.22 °C, and the relative standard error of prediction (RSEP) is 0.94%, calculated according to Equations (3) and (4):

SECV, SEP = √[ Σᵢ (Tᵢ − T̂ᵢ)² / (n − h − 1) ]   (3)

RSEP = 100 × √[ Σᵢ (Tᵢ − T̂ᵢ)² / Σᵢ Tᵢ² ]   (4)

where Tᵢ is the reference temperature for sample i and T̂ᵢ is the corresponding estimated temperature produced by the calibration model. The n − h − 1 term corresponds to the degrees of freedom for n samples and a model based on h independent variables with an intercept term.
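These figures of merit can be computed as follows; the formulas are the standard chemometric forms implied by the surrounding definitions (a standard error with n − h − 1 degrees of freedom, and RSEP as a percentage of the root-sum-square reference values), not code from the paper.

```python
import numpy as np

def std_error(T_ref, T_est, h):
    """SECV/SEP-style standard error with n - h - 1 degrees of freedom,
    where h is the number of latent variables in the model."""
    n = len(T_ref)
    return np.sqrt(np.sum((T_ref - T_est) ** 2) / (n - h - 1))

def rsep(T_ref, T_est):
    """Relative standard error of prediction, in percent."""
    return 100.0 * np.sqrt(np.sum((T_ref - T_est) ** 2) / np.sum(T_ref ** 2))
```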
The method used here splits the spectral data into separate calibration and prediction subsets according to when the spectra were collected. By using spectra collected from the first three temperature steps to predict the temperatures for the last two temperature steps, the model must have the ability to make accurate temperature predictions beyond the calibration period. In other words, extrapolated predictions are required. Still, the calibration and prediction spectra were collected at nearly the same time, and the extrapolations need only be accurate for a relatively short period beyond collection of the calibration data. For context, the data collected for each phantom were obtained over a period of 100 min (20 min per temperature step × five temperature steps), with the calibration data collected over the first 60 min and the prediction data collected over the next 40 min.
3.3.2. Leave-One-Phantom-Out PLS Models
A more rigorous test of the method’s analytical utility is to predict temperatures for a set of phantom spectra completely omitted from the calibration process. Similar to the first method, extrapolated predictions are required, but here the prediction set represents a sample matrix not seen by the model, spanning the same temperature range as the calibration set. From the perspective of a wearable device, this scenario would represent a calibration model predicting data from a different user, whereas the first method would represent the prediction of a user’s data following a baseline period and the inclusion of that user’s data in the calibration model. For this purpose, a leave-one-phantom-out strategy was evaluated. Accordingly, all spectra for a given phantom were set aside for prediction purposes while all spectra associated with the remaining six phantoms were used to build the calibration function. By using this strategy, seven unique PLS calibration models were prepared and evaluated.
As before, the outlier spectra were omitted, and the same temperature assignments were used. All spectra used for calibration purposes were mean-centered by phantom prior to input into the PLS algorithm. No other pre-processing was performed. The optimum number of latent variables was determined by using the same leave-one-phantom-out cross-validation method described above, and predicted temperatures were obtained from Equation (2).
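The per-phantom mean-centering step is simple but easy to get wrong: each phantom's own mean spectrum, not the global mean, is subtracted. A sketch:

```python
import numpy as np

def mean_center_by_phantom(X, phantom_ids):
    """Subtract each phantom's own mean spectrum from that phantom's spectra."""
    Xc = X.astype(float).copy()
    for p in np.unique(phantom_ids):
        mask = phantom_ids == p
        Xc[mask] -= Xc[mask].mean(axis=0)
    return Xc
```

Centering each phantom about its own mean removes the between-phantom offsets described in Section 3.2, leaving the within-phantom temperature-dependent variation for the PLS model.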
Results for the leave-one-phantom-out process are presented in Figure 6 and Table 2. The series of temperature correlation plots in Figure 6 presents model outputs when using the optimum number of latent variables listed in Table 2. In each case, temperature predictions for the calibration points align well with the unity line and are consistent with the SECV values listed in Table 2, coupled with low bias and strong R2 values. Across these seven calibration models, the SECVs range from 0.20 to 0.26 °C.
Prediction results are mixed for these models. In the best cases, predicted temperatures match the assigned temperatures and the SEP values are similar to the corresponding SECVs. Specifically, the models obtained when Phantoms 5, 6, and 7 were held out have similar SEP and SECV values, and the R2 values indicate a strong correlation between the predicted and assigned temperatures. When Phantoms 2, 3, and 4 were held out, on the other hand, the SEP values were higher than the SECVs, but the correlation between the predicted and assigned temperatures remains strong, as indicated by R2 values between 0.96 and 0.97. Such findings suggest a prediction bias for these models, which is verified by the calculated bias values listed in Table 2.
The poorest performing model was obtained when the data from Phantom 1 were held out. In this case, the SEP is a factor of 3.25 higher than the SECV, the R2 value is 0.67, and the bias is −0.20 °C. Inspection of Figure 6A reveals a grouping of temperature predictions between assigned temperatures of 23 and 24 °C, where the trend in the predicted temperatures is nearly orthogonal to the unity line. Inspection of Figure 3 over this temperature range for Phantom 1 uncovers a series of spectral measurements that deviate from the trend, suggesting a systematic error corresponding to variations at the interface between the phantom and the sensing module. These points were not identified as outliers in the analysis described above and were therefore retained in this dataset.
Despite the observed biases and systematic deviations in temperature predictions, the cumulative results presented in Figure 6 and Table 2 strongly support the analytical utility of near-infrared spectroscopy to predict temperatures within these tissue-simulating phantoms. Our analysis illustrates strong correlations between predicted and assigned temperatures even when the chemical composition of the sample is not represented in the calibration dataset.
Measurement biases are evident from the leave-one-phantom-out analysis, particularly when Phantoms 2, 3, and 4 are left out. Both positive and negative biases were observed for the analysis of these three phantoms (see Table 2). Differences in the absorption and scattering properties of the phantom matrices could potentially contribute to spectral offsets, and thereby contribute to the observed temperature biases. If matrix composition were responsible here, a trend in the magnitude of the bias would be observed with respect to the amount of water in the phantom matrix. Such a trend is not observed, which suggests phantom composition is not responsible for the offsets. For this reason, it is likely that variations in the placement of the sensing module on top of the sample surface are principally responsible for the observed offsets.