Leveraging over twenty years of knowledge and refinements, the split window algorithm used in this work was initially proposed by Becker and Li, (1990) for the AVHRR instrument [
6]. Wan and Dozier, (1996) generalized the algorithm to enable its utility for other dual-band instruments with a final adjustment made by Wan, (2014) to improve its performance over bare soils for the MODIS instruments [
7,
8]. The final form of the split window algorithm used here is
where
2.1. Derivation of the b-Coefficients
The flowchart in
Figure 1 illustrates the training process that was performed to derive the b-coefficients shown in Equation (
1). The radiative transfer model, MODTRAN [
9], was used to simulate a representative range of environmental acquisition parameters. Atmospheric profiles from the Thermodynamic Initial Guess Retrieval (TIGR) database [
10] were used to characterize atmospheric effects; seven surface temperatures bracketing the temperature of the lowest layer of each atmospheric profile were used as input [
6,
7]; and spectral emissivities of natural materials were obtained from the MODIS UCSB emissivity library [
11].
The TIGR database is a climatological library of 2311 unique atmospheric profiles that were categorized from 80,000 radiosondes. The profiles are classified into air masses (i.e., Tropical, Mid-lat1, Mid-lat2, Polar1, Polar2) that are consistent with MODTRAN’s default atmospheres but provide a richer and more densified representation of potential atmospheric effects that may be observed from a spaceborne platform. Temperature, water vapor, and ozone data are delivered at 43 predefined pressure levels ranging from 1013 mb (millibars) to 0.0026 mb [
10].
Figure 2 shows plots of the 2311 atmospheric profiles provided in the TIGR database categorized by airmass. For comparison,
Figure 2 (bottom right) shows the default MODTRAN profiles.
To be consistent with previous efforts and to satisfy the assumption that split window is most appropriate to be used for surface temperature retrieval when the ground temperature is close to the air temperature [
6,
7], seven surface temperatures bracketing the temperature of the lowest layer of each atmospheric profile were used as input to the forward model. Surface temperatures between −10 °C <
t0 < 20 °C in 5 °C steps were used in this study, where
t0 is the temperature of the lowest atmospheric layer. While some specific applications may warrant an extension of the range used to train the model (e.g., studies of urban heat island), the traditional range used here is appropriate for natural materials.
The MODIS UCSB emissivity database was used to provide a representative range of natural materials as input to the forward model [
11]. With 74 unique soils and minerals, 28 unique types of vegetation, and 11 forms of water (including snow and ice), this database provides 113 unique spectral emissivities between 8 and 14 μm that can be used to train the model in Equation (
1). Note that man-made materials are included in the MODIS UCSB emissivity database but were excluded in this study as the TIRS instrument has a spatial resolution of 100 m and was designed for environmental applications.
Referring again to
Figure 1, all parameters described above were provided as input to a forward model that uses MODTRAN for the atmospheric radiative transfer process to generate at-sensor spectral radiance. At-sensor, band-effective radiance was calculated by sampling the simulated top-of-atmosphere spectra with the TIRS spectral response functions, and apparent temperatures (
Ti,
Tj) were determined by developing and utilizing a predefined look-up table that relates band-effective radiance to blackbody temperature for Bands 10 and 11 of TIRS. Finally, band-effective emissivities were calculated by sampling the 113 spectral emissivities with the TIRS spectral response functions.
Note from
Figure 1 that modeled data were filtered to only include scenarios where the relative humidity is less than 90%. This was performed to remain consistent with previous studies [
12] and to eliminate saturated atmospheric conditions, which represents a challenging scenario for ST retrieval. Future work will explore and include higher humidity cases as needed. Nevertheless, with all the components of Equation (
1) determined, ST was regressed against the independent variables to determine the b-coefficients that best fit the model in a least-squares sense.
Table 1 shows a comparison of the b-coefficients derived in this study versus the b-coefficients derived in Du et al. (2015) [
12], along with the residual retrieval error when these coefficients are applied to the simulated data. Note that although the same split window algorithm was used, the derived b-coefficients could be significantly different due to the desired application. For example, Du et al. incorporated man-made materials into their training process to enable the utility of split window applications over urban areas. The impact of this training methodology on environmental applications is discussed in
Section 3 and
Section 4.
2.2. Emissivity Estimation
Once the b-coefficients are derived, estimation of emissivity remains the one unknown in Equation (
1). To be consistent with the single channel methodology used to derive Landsat surface temperature products in the Collection 2 release, the algorithm used to estimate broadband emissivity in the existing single channel workflow was mirrored in this study but extended for the TIRS dual-band instrument. To summarize the existing workflow, ASTER emissivity products that spatially cover the Landsat scene of interest are ingested and a spectral adjustment is made to estimate the equivalent TIRS emissivities. The spectrally-adjusted emissivities are then modified based on observed in-scene conditions (e.g., emissivities may be modified if snow or vegetation is present in a scene).
The ASTER global emissivity dataset (ASTER-GED) v3 contains worldwide emissivity maps at 100 m spatial resolution. The dataset was compiled using clear-sky scenes acquired between 2000 and 2008. Emissivities were calculated with the temperature emissivity separation algorithm (TES) and the water vapor scaling (WVS) atmospheric correction algorithm, and are available for all five ASTER TIR bands centered at 8.3, 8.6, 9.1, 10.6, and 11.3 μm. The ASTER-GED has been validated to an absolute band error of 1% [
13].
To enable an adjustment of the ASTER emissivities to the spectral response of the TIRS bands, a linear relationship that relates ASTER-observed (Bands 13 and 14) to TIRS-estimated (Bands 10 and 11) emissivities was developed. Note that ASTER Bands 13 and 14 were used here as they have the most overlap (spectrally) with the TIRS bands. To develop this relationship, the 113 spectral emissivities from the MODIS UCSB emissivity database described in
Section 2.1 were used. Band-effective emissivities for Bands 10 and 11 of TIRS were regressed against the corresponding band-effective emissivities for Bands 13 and 14 of ASTER to derive the coefficients shown in Equations (
2) and (
3).
where,
Note that an estimation of the residual errors associated with these relationships can be made by applying Equations (
2) and (
3) to the band-effective ASTER data for the 113 emissivities. The residual errors between the estimated band-effective emissivities can then be compared to the actual band-effective emissivities (as modeled here). The standard deviations of the residual emissivities in this simulated context are 0.001 (0.1%) and 0.005 (0.5%) for Bands 10 and 11, respectively.
Since the ASTER emissivity database represents averages over a nine-year period, modifications were made to the spectrally adjusted emissivities based on observations made by the Operational Land Imager (OLI), the TIRS reflective band counterpart onboard Landsat 8. Specifically, per-pixel normalized difference vegetation indices (NDVI) and normalized difference snow indices (NDSI) were calculated with the OLI. NDSI was computed by dividing the difference in reflectance observed in the Landsat 8 green band (0.53–0.59 μm) and the shortwave infrared band (1.57–1.65 μm) by the sum of the two bands [
14]. To make the NDVI adjustment, bare soil locations were estimated when the ASTER NDVI data were less than 0.1, and the Landsat vegetation emissivities adjusted accordingly based on the Landsat calculated NDVI. Snow locations for NDSI were set to 0.9876 and 0.9724 respectively for Band 10 and Band 11, where the calculated NDSI was larger than 0.4. A comprehensive description of the adjustments can be found in Malakar et al. (2018) [
1].
2.3. Surface Reference Data Sources
Several sources of surface measurements were used as reference to validate the efficacy of the split window algorithm as trained here for Landsat’s TIRS instruments. Several land-based instrumented sites, including three sites from the SURFRAD [
15] network and one site from the Ameriflux [
16] network, were used in the assessment. Additionally, the National Oceanic and Atmospheric Administration (NOAA) buoy network [
17] and the NASA Jet Propulsion Laboratory (JPL) instrumented buoys [
18,
19] were used to provide reference data over water.
NOAA established the surface radiation budget observing network (SURFRAD) to provide accurate, high-quality broadband solar and thermal upwelled and downwelled irradiance to support climate research, satellite retrieval validation and modeling, and weather forecasting research [
15]. The current SURFRAD network consists of seven locations selected to represent diverse climates in the United States [
15]. Note that three sites were chosen for this initial analysis due to their high spatial uniformity across an extended region. The three sites consist of agricultural land (Goodwin, Mississippi, US), bare soil (Desert Rock, Nevada, US), and grassland with a high inter-annual variation of snow cover (Fort Peck, Montana, US).
Each SURFRAD site is equipped with two Eppley Precision Infrared Pyrgeometers (model PIR) to collect measurements of broadband (4–50 μm) thermal infrared irradiance. The PIR pyrgeometers have a field-of-view (FOV) of 180 degrees and measure longwave irradiance with an uncertainty of ∼1.5% [
20], which leads to a reported uncertainty of less than 1 K in the retrieved LST [
21]. One pyrgeometer is upward facing and the other is downward facing to measure downwelled atmospheric irradiance and upwelled surface-leaving irradiance, respectively. Data from 1998 to 2009 were collected every three minutes, and every minute thereafter. The data has a quality flag to indicate failed internal quality checks. A detailed description of the SURFRAD instrumentation at each site can be found at:
https://www.esrl.noaa.gov/gmd/grad/surfrad/overview.html.
FLUXNET is a vast global network of more than 800 sites for in-situ flux measurement. Regional networks contribute to the FLUXNET data, one of which is a group of sites across the Americas called AmeriFlux. There are hundreds of AmeriFlux sites, with 44 flagged as “core” sites. These core sights deliver timely data, receiving support from the AmeriFlux Management Project (AMP) to ensure high quality data collection at 30 min intervals. Since not all sites measure upwelled and downwelled thermal radiation, the core sights were filtered for spatial uniformity, activity between 2013 to 2018, and having a sufficient number of upwelled and downwelled infrared observations. Only one site passed these criteria; namely, the University of Michigan Biological Station (UMBS) [
22]. This site is located within a protected forest of mid-aged northern hardwoods, conifer understory, aspen and old growth hemlock. The UMBS AmeriFlux site is equipped with a CG4 pyrgeometer from Kipp and Zonen to measure broadband (4.5 to 42 μm) thermal irradiance. The CG4 pyrgeometer, similar to the SURFRAD instrumentation, has an FOV of 180 degrees with an instrument uncertainty of less than 3% [
20], and temperature uncertainty of ±0.02 K [
23].
To estimate the in-situ ST using SURFRAD and AmeriFlux networks, the Stefan–Boltzmann law is manipulated to derive the following relationship [
15]:
where
ϵ represents the broadband emissivity,
is the Stefan–Boltzmann constant, and
E is the measured irradiance
. Broadband emissivity can be retrieved from narrowband satellite emissivities via empirical relationships (Wang et al., 2005) [
24]. However, this approach uses a combination of broadband emissivity from 8 to 12 μm and 14 to 25 μm. The latter range is from an emissivity library containing only measurements of minerals and does not include data beyond 25 μm because of the strong atmospheric absorption and weak thermal signals. For these reasons, the average emissivity of TIRS Bands 10 and 11 that is estimated from image data (as described in
Section 2.2) was used in Equation (
4) for this analysis.
When used in conjunction with land-based measurements, water represents a desirable target for surface temperature validation, as its emissivity is spectrally stable and well-defined [
25]. NOAA operates a suite of worldwide instrumented buoys that collect, among other variables, water temperature. The data are freely available and delivered through their National Data Buoy Center website [
17]. Measurements from thirty-six buoys in the near-shore of the United States coastline were used as reference in this work, with a bulk to surface adjustment, since measurements are obtained at depth [
26]. Note that Zeng et al. (1999) estimate the uncertainty in skin temperature estimation to be approximately 0.35 K, which includes measurement uncertainty.
In addition to the NOAA sensor suite, NASA’s Jet Propulsion Laboratory’s (JPL) instrumented buoys located in Lake Tahoe, California and Salton Sea, California are attractive sources of reference data. Lake Tahoe is approximately 1900 m above sea level, and with average lake temperatures ranging from 5 to 25 °C throughout the year [
18], it is an attractive cold water target for surface temperature validation. Alternatively, Salton Sea is located in Southern California and is approximately 70 m below sea level. With lake temperatures exceeding 35 °C, it is an attractive warm water target [
19]. The JPL data used for the validation efforts presented here are made freely available by JPL [
18,
19].
Referring to
Table 2, over 1500 Landsat Level-1 Terrain-Corrected (L1T) TIRS scenes acquired between 2013 and 2018 were processed with the split window algorithm and the derived surface temperatures compared to reference measurements acquired from the various sites during the Landsat 8 overpass. For comparison, and to gauge the fidelity of the presented split window implementation, the same L1T scenes were processed to surface temperature using split window with the b-coefficients suggested by Du et al. (2015) [
12] and using the single channel methodology [
1] that will be delivered to users in Collection 2.
2.4. Geometric Considerations
An initial application of Equation (
1) to the TIRS (L1T) image data resulted in undesirable artifacts in the final surface temperature product; see an example of Lake Ontario, NY in
Figure 3. The ST product derived from the single channel method is shown on the left for visual reference, while the split window-derived surface temperature image is shown on the right. Clearly, ringing artifacts can be observed at sharp transitions (edges) in the data; e.g., along the Lake Ontario shoreline as shown in the zoom windows of
Figure 3. Note that the derived surface temperatures in the single channel method are roughly two degrees warmer than the temperature derived from the split window method. This discrepancy will be discussed further in
Section 3.
To understand the source of these artifacts, a brief background of the TIRS focal plane is required. Referring to
Figure 4, the TIRS focal plane array (FPA) consists of three staggered detector arrays to cover the 185 km cross-track FOV of the instrument at a ground sampling distance (GSD) of approximately 100 m. Spectral filters are placed on the FPA detectors to produce detector rows with the desired spectral band shapes (Band 10 centered at 10.9 μm and Band 11 centered at 12.0 μm). When imaging in the nominal pushbroom mode, image data are recorded from one row of detectors in each filtered region and an image interval of the Earth is assembled as the instrument travels in orbit. Although band-to-band registration is well-within the defined specification for TIRS [
5], the physical layout of the detector arrays along with the read-out sample timing in the along-track direction leads to an inherent misregistration between the Band 10 and Band 11 images. This amounts to an along-track offset of the instantaneous fields-of-view (IFOV) of the detectors in the two bands (note that the magnitude of the offset is much less than the size of the pixel). The TIRS 100-m image data is upsampled to 30 m data in the final step of the Landsat product generation process in order to match the spatial resolution of the OLI sensor. The process of upsampling exacerbates the misregistration offsets due to the fact that the along-track band offset is now a significant fraction of the 30 m pixel. When the differences between the band images are calculated as part of the split window algorithm, the along-track offsets become magnified in the product.
From a technical perspective, applying and delivering a split window-derived ST product at the nominal TIRS resolution (100 m) represents an ideal scenario to avoid artifacts introduced by the algorithm and the upsampling. However, achieving this solution would require a significant deviation from the existing EROS processing pipeline and would result in a product that differs in resolution from the other products (e.g., surface reflectance) being released in Collection 2. Alternative solutions that mitigate the spatial artifacts, yet preserve radiometric fidelity and the 30-meter resolution of the ST product, have been investigated.
To motivate a potentially desirable solution,
Figure 5 shows the contributions of each term in Equation (
1) to the final surface temperature product for the scene in
Figure 3. Columns 3 and 4 of this table were populated by calculating the scene-wide mean and standard deviation, respectively, of each term in Equation (
1). Accordingly, column 3 represents the average magnitude of each term’s contribution to the final ST product, while column 4 represents the spatial variability introduced by each term to the final ST product. Note from the values in columns 3 and 4 that the additive terms in Equation (
1) (highlighted in blue) contribute most of the overall magnitude and variability to the final ST product for the scene in
Figure 3. Conversely, note from the values in columns 3 and 4 that the difference terms from Equation (
1) (highlighted in gray) contribute significantly less information to the final product. Since the difference terms introduce the artifacts shown in
Figure 3, and their radiometric contribution to the final product is relatively small, a proposed solution to mitigate these artifacts is to apply a 5 × 5 smoothing filter to the Band 10 and 11 apparent temperature images for terms
b4 through
b7 in Equation (
1). Recall that the TIRS nominal ground sampling distance is approximately 100 m, but the calculated full width at half maximum (FWHM) of its point-spread function is approximately 200 m (see Wenny et al., 2015) [
27]. Therefore, averaging the upsampled 30 m data to 150 m will not significantly alter the image data collected by TIRS. Comparing the nominal standard deviations for the
b4 through
b7 terms (column 4: gray terms) to the 5 × 5 smoothed standard deviations, as suggested here (zoomed: gray terms), smoothing has a negligible impact (less than 0.1 K) on the scene-wide variability observed in the final proposed ST product for the scene in
Figure 3.
While smoothing the difference terms in Equation (
1) appears to have negligible impact on radiometric fidelity, its effect on mitigating the geometric artifacts in
Figure 3 is dramatic.
Figure 6 shows the single channel ST product (left), the nominal split window ST product (middle), and the proposed split window ST product (right). Note that the single channel product is presented here for reference, as it should not exhibit the artifacts described in this section. By visually inspecting the zoom windows in
Figure 6, the artifacts present in the nominal split window product (middle) are essentially removed with the proposed methodology (right).