1. Introduction
Accurate and continuous ocean surface wind data at regular grids spanning a large domain are highly useful, especially for coastal cities. Such datasets are important for understanding tropical cyclones (TCs), the most dreaded natural disasters [1, 2], as well as non-TC severe storms [3, 4, 5], which are usually accompanied by strong winds. In Hong Kong, a No. 8 wind signal is hoisted when a sustained wind speed is within the range of 63–117 km/h (17.5–32.5 m/s), at which point the entire city is shut down, leading to heavy economic losses. The horizontal structure of ocean surface winds for TCs is critical to the decision-making regarding whether a No. 8 wind signal should be enforced.
However, ocean surface wind observations suffer from poor spatial coverage, intermittent temporal frequency, and low accuracy under high wind conditions [6, 7]. As TCs are associated with strong winds, immense cloud coverage, and heavy precipitation, measuring ocean surface winds within TCs is extremely difficult using either ground-based or space-borne techniques.
Satellite-based wind measurements are valuable, although their coverage is generally limited. A wide range of satellite measurements [8], including those from radiometers and scatterometers such as the Advanced Microwave Scanning Radiometer-2 (AMSR2) [9], the Soil Moisture Active Passive (SMAP) [10], the advanced scatterometer (ASCAT) on the meteorological operational (MetOp) platform [11], and the Cyclone Global Navigation Satellite System (CYGNSS) [12], as well as various synthetic aperture radar (SAR) instruments (Radarsat-2, the RADARSAT Constellation Mission or RCM, and Sentinel-1) [13, 14], are available in regions affected by TCs. Additionally, buoy measurements that rely on wind sensors installed on moored ocean floaters [15] complement these satellite datasets.
Global ocean wind vector reanalysis datasets, such as the Cross-Calibrated Multiplatform (CCMP) ocean winds, were developed to provide global coverage on regular spatial grids and time intervals through data fusion techniques [7, 16]. Other notable reanalysis datasets include the ECMWF v5 Reanalysis (ERA5), the National Centers for Environmental Prediction (NCEP) Reanalysis, the Modern-Era Retrospective Analysis for Research and Applications (MERRA-2), and the Japanese 55-year Reanalysis, all of which have an ocean wind product. However, these wind datasets typically exhibit a low bias compared to the measurements obtained from satellites, aircraft, or ground-based instruments [17, 18]. In contrast, CCMP winds demonstrate a significantly better ability to capture strong winds than other reanalysis datasets, which makes them valuable for studying high-wind structures. However, low biases in CCMP remain a problem. Kern et al. [19] found that the 10-meter neutral stability winds from CCMP are biased low by 0–4 m/s compared to in situ measurements in the 0–18 m/s wind speed range at equatorial locations. Paiva et al. [20] reported that within the 40° longitude × 50° latitude region around the Brazilian coast, when wind speeds exceed 12 m/s, ERA5, CCMP, and ASCAT exhibit low biases relative to buoy measurements, with magnitudes of −2.0 m/s, −1.5 m/s, and −1.0 m/s, respectively. For wind speeds below 10 m/s, biases decrease to 0.5 m/s and reverse sign at 6–7 m/s. Xiang et al. [21] indicated that, relative to buoy measurements, the 10-meter ocean winds for ERA5, CCMP, and ASCAT-C are biased low by 0.46 m/s, 0.4 m/s, and 0.11–0.16 m/s, respectively; these biases, despite being small due to mixed latitudinal ranges and wind conditions, support the understanding that CCMP’s low bias is less severe than ERA5’s but generally worse than ASCAT’s. It is notable that these bias levels vary because they are affected by a variety of factors such as the data version, time period, region, and high- or low-wind conditions.
In this study, we are interested in high wind conditions, with a focus on TCs. We will compare CCMP with SAR-, radiometer-, and scatterometer-measured ocean wind speeds under high-wind conditions from February to October 2023 and develop a machine learning (ML) model for reconstructing the high winds in the core of the TCs. The ML model will reproduce ocean wind speeds on the same grids as CCMP. The modeled wind speeds should be drawn closer to SAR, since the latter is used as the true state. Using an ML approach to retrieve ocean wind speed has been explored before. For example, Lu et al. [22] utilized the spatiotemporal information from CYGNSS observations as features to predict ocean wind speed using a convolutional neural network-long short-term memory (CNN-LSTM) model, which captures spatial features and temporal dependencies, and assessed the results using ERA5 as the reference state. Zhang and Yin [23] applied an improved purely 2D CNN approach to predict spatiotemporal offshore wind speed characteristics from a large spatial-scale perspective using a reanalysis dataset, focusing on wind-based energy-harvesting hotspots in the North Atlantic. This latter approach also used reanalysis as the reference state to evaluate model predictions. The current study differs from these previous studies in several ways. First, we focus on the TC tile that changes its location along its track. Second, rather than using reanalysis as the reference state, we aim to correct it. Third, spatial and temporal variabilities are treated consistently using a random forest (RF) model, as a TC structure is well-organized and remains somewhat conserved over several hours during its life cycle. While CNN is a conventional approach for image processing, in a TC, we do not deal with a hierarchy of complex structures. Given a CCMP map, the RF regressor appears to be an effective approach for significantly correcting the low bias in CCMP, as demonstrated in the paper. The RF regressor employs an ensemble learning technique; during the training phase, it constructs a large number of decision trees, and its output is the mean of the predictions of the individual trees. This method tends to yield more accurate results, is less likely to overfit, and is more resilient to outliers compared to other regression approaches, such as a single decision tree. The pixel-wise CCMP and SAR ocean wind speeds are tabulated as pairs for the application of RF, in addition to other features. In this stage of the study, our goal in building the RF model is to identify the key features that shape the relationship between CCMP and the SAR-captured TCs, ultimately creating a sequence of more realistic wind speed maps along a given TC track at small time intervals (i.e., every 0.5 h).
4. A Machine Learning Model to Produce a Dataset Drawn Closer to SAR Within TCs
As previously stated, our objective in validating the CCMP ocean winds is to investigate the feasibility of using artificial intelligence (AI) to develop a comparable dataset to CCMP that can better represent high-wind structures. For our initial efforts, we have focused on SAR-measured winds to serve as our true state, as they provide the most detailed structure and the highest winds within a TC compared to other satellite datasets. However, we should keep in mind that SAR winds are not bias-free. For example, the Sentinel-1 Level-2 OCN wind product is biased against the weather buoy data by magnitudes ranging from −1.5 m/s to 0.5 m/s as the wind speed increases from zero to 16 m/s, with a sign reversal at 12 m/s [42]. We used CCMP and SAR coincident tile-pairs to train a machine learning model using the random forest (RF) regressor. This model is applied to generate wind fields in regions similar to SAR tiles where CCMP and SAR both contain a TC, aiming to unravel the relationship between CCMP and SAR winds.
The key hyperparameters of an RF model are (1) the number of features selected for each split; (2) the sample size for each tree; (3) the minimum number of observations in a terminal node; (4) the number of trees in the forest; and (5) the criteria for splitting nodes. Using fewer features as predictors can lead to suboptimal tree performance but will lead to more diverse and less correlated trees, allowing the model to better capture variables with moderate effects on the outcome. Nevertheless, the benefit of using the full set of features is to ensure that there is at least one strong variable among the candidates for splitting. This can enhance the overall performance of the model [43]. In this study, we adopted the latter approach by utilizing the regressor’s default option to efficiently achieve a favorable outcome. This decision was based on the necessity of including the strongest predictor, which is the CCMP wind speed, for the model to function effectively. The effect of sample size aligns with that of the number of features, although it may have a lesser impact. The node size parameter sets the minimum number of observations in a terminal node, affecting tree depth and splits. Decreasing the node size results in deeper trees with increased splits, which can lead to longer runtimes for large sample sizes. The number of trees is also crucial, as more trees generally enhance model performance. However, the most substantial performance improvement typically occurs within the initial 100 trees [44, 45]. We have used a node size of one and a forest size of 100 trees in the model training.
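The ensemble idea behind these settings, bootstrap resampling followed by averaging over trees, can be illustrated with a deliberately simplified sketch. It is not the model trained in this study: each tree is reduced to a single-split stump on one predictor (a CCMP-like wind speed), the function names are hypothetical, and the wind-speed pairs are made-up values. Only the forest size of 100 mirrors the setting quoted above.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    # Candidate thresholds exclude the largest x so the right branch is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # the bootstrap sample contained a single distinct x
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def fit_forest(xs, ys, n_trees=100, seed=0):
    """Train n_trees stumps, each on a bootstrap resample of the pairs."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict(forest, x):
    """Forest output = mean of the individual tree predictions."""
    return mean(ml if x <= t else mr for t, ml, mr in forest)

# Made-up pixel pairs (m/s): reanalysis-like speeds and higher reference speeds.
ccmp_like = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
sar_like = [11.0, 17.0, 24.0, 31.0, 39.0, 46.0]

forest = fit_forest(ccmp_like, sar_like, n_trees=100)
low, high = predict(forest, 10.0), predict(forest, 35.0)
print(f"prediction at 10 m/s: {low:.1f}, at 35 m/s: {high:.1f}")
```

A production model would use full-depth trees over all predictors; the stump version only shows why averaging many resampled trees smooths the step-like output of any single tree.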
The predictors chosen for this study include the CCMP wind speeds, distance from the eye center in longitude and latitude, i.e., Δlon and Δlat, Δlon and Δlat from the eye region, great-circle (GC) distance from TC eye center and eye region, for each of the two datasets, respectively. Regarding the distance to the eye region, the following is true: when the pixels are among the eye-region pixels, the distance from the eye region will be zero, otherwise it is the closest distance. These predictors are not entirely independent from each other. We came to use two of their subsets and then the full set of the predictors to build models, labelled Type-1, Type-2, and Type-1n2 models.
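As a rough illustration of how such geometric predictors could be computed for one pixel, the sketch below uses the haversine formula for the GC distance and takes the nearest eye-region pixel as the reference for the eye-region distances. The function names, the dictionary keys, the Earth radius value, and the sample coordinates are assumptions made for this example; the study's own implementation is not shown in the text. Longitude wrap-around at the date line is not handled.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def gc_distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points (degrees) via the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def pixel_predictors(lon, lat, eye_center, eye_region):
    """Geometric predictors of one pixel for one dataset (CCMP or SAR).

    eye_center: (lon, lat) of the eye center.
    eye_region: list of (lon, lat) pixels flagged as the eye region.
    """
    # Closest eye-region pixel; if the pixel itself is in the eye region,
    # all eye-region distances come out as zero.
    nearest = min(eye_region, key=lambda p: gc_distance_km(lon, lat, p[0], p[1]))
    return {
        "dlon_center": lon - eye_center[0],
        "dlat_center": lat - eye_center[1],
        "dlon_region": lon - nearest[0],
        "dlat_region": lat - nearest[1],
        "gc_center_km": gc_distance_km(lon, lat, eye_center[0], eye_center[1]),
        "gc_region_km": gc_distance_km(lon, lat, nearest[0], nearest[1]),
    }

# Hypothetical eye geometry on a 0.25-degree grid.
eye_center = (125.0, 20.0)
eye_region = [(125.0, 20.0), (125.25, 20.0), (125.0, 20.25)]

inside = pixel_predictors(125.25, 20.0, eye_center, eye_region)
outside = pixel_predictors(126.0, 20.0, eye_center, eye_region)
print(inside["gc_region_km"], round(outside["gc_region_km"], 1))
```

In this sketch the same function would be called twice per pixel, once with the CCMP eye geometry and once with the SAR eye geometry, to obtain the predictors for each of the two datasets.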
Figure 20 shows scatter plots comparing SAR data to model predictions, to illustrate the performance of the trained RF model. In the legends of Figure 20, the error represents the mean absolute difference across all pairs of relevant ocean pixels, irrespective of tiles or timeframes. The bias is the mean difference over all pixels used; the RMSE is the square root of the sum of the bias squared and the variance. The correlation coefficient is derived from all pixel pairs in arbitrary order, without considering spatial or temporal dependencies. Accuracy is determined as 100 minus the median percentage absolute error. For individual percentage absolute errors, the SAR measurement serves as the denominator instead of the mean of SAR and CCMP, as SAR measurements are considered the true state. The total set of TCs named above is divided into training and testing sets at a 75% versus 25% split. The latter is also a blind TC set that will be used to fully test the RF model’s potential to correct individual TC instances. When training the RF model with the training TC set, a similar 75% versus 25% split is applied, with 75% of pixel pairs used for model training.
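The statistics defined above can be written out compactly. The following is a minimal sketch under two assumptions not stated explicitly in the text: differences are taken as model minus SAR (so a low bias is negative), and the variance is the population variance of the differences. The function name and the four sample pairs are invented for illustration.

```python
from math import sqrt
from statistics import mean, median, pvariance

def scatter_stats(model, sar):
    """Pooled statistics over pixel pairs, with SAR treated as the true state."""
    diffs = [m - s for m, s in zip(model, sar)]  # assumed sign: model minus SAR
    bias = mean(diffs)
    error = mean(abs(d) for d in diffs)          # mean absolute difference
    # RMSE from its decomposition: sqrt(bias^2 + variance of the differences).
    rmse = sqrt(bias ** 2 + pvariance(diffs))
    # Pearson correlation over all pairs, independent of their order.
    mm, ms = mean(model), mean(sar)
    cov = sum((m - mm) * (s - ms) for m, s in zip(model, sar))
    corr = cov / sqrt(sum((m - mm) ** 2 for m in model) *
                      sum((s - ms) ** 2 for s in sar))
    # Percentage absolute error uses the SAR value as the denominator.
    pct_err = [100.0 * abs(d) / s for d, s in zip(diffs, sar)]
    accuracy = 100.0 - median(pct_err)
    return {"bias": bias, "error": error, "rmse": rmse,
            "corr": corr, "accuracy": accuracy}

# Made-up wind speeds in m/s.
sar = [20.0, 30.0, 40.0, 50.0]
model = [19.0, 31.0, 38.0, 50.0]
stats = scatter_stats(model, sar)
print({k: round(v, 3) for k, v in stats.items()})
```

Computing the RMSE through the decomposition gives the same value as the direct root of the mean squared difference, which is a quick consistency check on the reported bias and RMSE.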
The scatter plot for the training pairs (Figure 20a) shows zero bias, an absolute error and RMSE of less than 1.0 m/s, a correlation coefficient of 0.99, and 98% accuracy. These results demonstrate a strong agreement between the pairs in the training set, validating the effectiveness of the chosen predictors. However, overfitting is not ruled out, as hyperparameters have not yet been tuned.
When the model is applied to the 25% of pairs not used in model training (that still belong to the training TC set), as shown in Figure 20b, the agreement is also remarkable, with zero bias, a 1–2 m/s error, a correlation coefficient of 0.96, and 94% accuracy. This high level of agreement can be attributed in part to the fact that these pairs originate from the same TCs used for training, despite not being part of the training set of pixel pairs. This consistency may be due to the continuity of the morphology of a specific TC across different time frames.
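The distinction between the held-out pixel pairs and the blind TC set can be made concrete with a sketch of a two-level split: first by TC, then by pixel pair within the training TCs. The function name, the random seed, the rounding rule, and the toy data are assumptions for illustration; the text does not specify how the split was drawn.

```python
import random

def two_level_split(pairs_by_tc, frac_train=0.75, seed=0):
    """Split by TC first, then split the pixel pairs of the training TCs."""
    rng = random.Random(seed)
    names = sorted(pairs_by_tc)
    rng.shuffle(names)
    n_train_tc = round(frac_train * len(names))
    train_tcs, blind_tcs = names[:n_train_tc], names[n_train_tc:]

    # Pool all pixel pairs from the training TCs, ignoring tile and time.
    pool = [pair for tc in train_tcs for pair in pairs_by_tc[tc]]
    rng.shuffle(pool)
    n_fit = round(frac_train * len(pool))
    return {
        "train_tcs": train_tcs,
        "fit": pool[:n_fit],        # pairs used to train the model
        "holdout": pool[n_fit:],    # unseen pairs from TCs seen in training
        "blind": {tc: pairs_by_tc[tc] for tc in blind_tcs},  # unseen TCs
    }

# Toy input: 8 hypothetical TCs, each with 4 (CCMP, SAR) wind-speed pairs in m/s.
pairs_by_tc = {
    f"TC{i:02d}": [(20.0 + i, 25.0 + i + k) for k in range(4)]
    for i in range(8)
}
split = two_level_split(pairs_by_tc)
print(len(split["fit"]), len(split["holdout"]), sorted(split["blind"]))
```

Because the held-out pairs share TCs with the fitted pairs, they test interpolation within known storms, whereas the blind set tests transfer to storms the model has never seen.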
Upon testing the trained model on a blind set of TCs, which are the additional 25% of the TCs reserved for testing purposes, slight variations in results are observed between the three models trained with different predictor combinations. Figure 20c showcases the Type-1n2 model, with comprehensive details provided in Table 1 and Table 2. When utilizing all predictors in the Type-1n2 model, a bias of 0.3 m/s or lower, an absolute error close to 3 m/s, a correlation coefficient between 0.87 and 0.88, and an accuracy of 89% are achieved. The uncertainty in these results is twice as large as that seen in Figure 20a,b, which is expected.
Table 1 presents the statistical moments for all three models. While the results of all three models are notably similar, the Type-1n2 model exhibits the highest bias (−0.32 m/s relative to −0.18 m/s), indicating that the inclusion of all predictors may not necessarily lead to optimal results. Although these biases are minimal, they persist as low biases compared to SAR data, indicating substantial correction in CCMP data yet without a reversal in direction. Although the Type-1n2 model leads to a larger bias, it marginally reduces the absolute error and RMSE. Notably, Type-2 outperforms Type-1 in terms of both bias and absolute error.
As pointed out by Probst et al. [45], RF models generally provide good results, and tuning them leads to only a small performance gain. In our case, once the predictors have been decided, tuning the hyperparameters results in minimal differences in the indices shown in the scatter plots. In the Type-1n2 model, for example, rather than using all predictors in each split, we adjusted the number of features (used when a split occurs) to 3 and observed a slight improvement in the mean absolute error (by 0.03 m/s) while maintaining the same mean accuracy, bias, and correlation coefficient. Adopting the number of trees as 50, 100, and 200 also leads to a similar degree of differences. In summary, these differences are too small to be significant.
Once the RF models are established, we refocus on our primary goal, which is to apply the model to individual tiles rather than a mixture of pixels. To evaluate the model’s predictive performance on individual tiles, we apply the trained models to each TC tile individually within the blind set and generate histograms for various statistical moments. The accuracy, bias, correlation, and STD are then presented in the four panels of Figure 21.
Notably, results from all three RF models have improved the “accuracy” compared to CCMP, reducing the percentage absolute error by 4%. The accuracy histograms of the RF model results exhibit a single peak significantly skewed towards higher values (around 90%), contrasting with the CCMP and SAR results that have a double-peak with a maximum at 85%.
The RF-modeled results nearly fully corrected the low bias (2–3 m/s) in CCMP in a statistical sense, as shown in Figure 21b. In fact, a slight overcorrection has resulted in a minimal high bias (0.02–0.07 m/s) that is not statistically significant. Yet, the histogram peaks for all three models are skewed towards the positive side. This small bias, which is not in line with Table 1, indicates that a different sampling method can slightly alter the bias.
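One way to see how the sampling method can change the bias, even in sign, is to compare a bias pooled over all pixels with the median of per-tile biases when tiles contain different numbers of pixels. The sketch below uses two invented tiles and hypothetical function names; it illustrates the arithmetic only and does not reproduce the values in Table 1 or Figure 21b.

```python
from statistics import mean, median

def pooled_bias(tiles):
    """Mean of (model - SAR) over all pixels from all tiles mixed together."""
    return mean(m - s for tile in tiles for m, s in tile)

def per_tile_biases(tiles):
    """Mean of (model - SAR) computed separately for each tile."""
    return [mean(m - s for m, s in tile) for tile in tiles]

# Two made-up tiles of (model, SAR) wind speeds in m/s:
# a large tile that is biased low and a small tile that is biased high.
tiles = [
    [(29.0, 30.0)] * 4,   # four pixels, each 1 m/s low
    [(42.0, 40.0)],       # one pixel, 2 m/s high
]

pooled = pooled_bias(tiles)              # -0.4: dominated by the larger tile
tile_median = median(per_tile_biases(tiles))  # 0.5: each tile counts equally
print(pooled, tile_median)
```

Pixel pooling weights each tile by its pixel count, while tile-wise statistics weight every tile equally, so the two summaries need not agree.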
Previously, we have shown that TCs captured by CCMP and SAR generally exhibit drastically different morphologies. Their spatial correlation has a median coefficient of just 0.73 and a considerable spread across tiles, as shown in Figure 21c. In comparison, the results from the three models show much higher correlations consistently, with median values ranging from 0.85 to 0.87. This improvement is evident in the histograms, which consistently display a sharper single peak at higher correlation values, indicating a more robust alignment between the RF model predictions and the SAR data.
Figure 21d indicates the reduction of the STD by 15–20% (from 4.3 m/s to 3.7 m/s) in the results of all three RF models, suggesting that the difference between the RF modeled field and SAR becomes less fluctuating after applying the RF model to the CCMP fields, also suggesting a better agreement.
Figure 22 shows actual pairs of maps, in three rows, to demonstrate the results of the three models. Overall, in all three results, the wind magnitude and morphology are both drawn closer to the SAR retrieved winds. In many cases, the high wind region within the TC may still be weaker, but the correction is substantial, reaching 10–15 m/s.
Upon closer examination of Figure 22, it is observed that in Type-1, where only Δlon and Δlat are used as predictors, the structure displays features oriented along the longitudinal and latitudinal directions, which is particularly noticeable in column 3. On the contrary, in the Type-2 results (column 4), where the predictors include GC distance to the eye region and eye center, the structures tend to be circular in shape. These results align with the expectations based on the definitions of the predictors. Although Type-1 exhibits a generally less smooth structure, it captures the thick filamentary structure with greater clarity. It is worth noting that Type-1n2 does not show better structure than Type-1 or Type-2 even though it used all the predictors.
We next examine the relative importance of predictors in the RF model training, as listed in Table 2. Across all three RF models, the significance of CCMP wind speed stands out, as expected, ranging from 0.61 to 0.63. In Type-1, the importance is evenly distributed among its remaining predictors, with eye region-related predictors in both CCMP and SAR carrying more weight than eye center-related predictors. Type-2 shows a notable dominance of the distance-to-the-eye region of the SAR tile at 0.21, exceeding each of the other predictors that share the remaining importance. For Type-1n2, the importance distribution resembles that of Type-2, with additional predictors each contributing a 0.01 fraction of importance.
Table 2 indicates that the TC structural metrics in SAR served as significant predictors. However, these metrics are unavailable when CCMP and SAR data are not paired. Nevertheless, we have come to understand that achieving high accuracy in the AI model’s results will greatly benefit from incorporating certain structural metrics related to the SAR TCs as predictors. In future endeavors to establish an operational model, we will consider predicting TC structural metrics as a prerequisite and then proceed to apply the models established above.
5. Summary
High-wind structures in the CCMP ocean wind reanalysis dataset are compared with winds measured by satellite radiometers (or scatterometers), and SAR throughout February to October 2023, using commonly gridded maps with a resolution of 0.25° in both longitude and latitude, along with a 0.5-h window to assess bias, uncertainty, and spatial correlations. A global comparison is made between CCMP and radiometer- (or scatterometer-) measured winds, and then ETCs and TCs within 10° × 10° blocks are compared separately. The comparison between CCMP and SAR is focused on the TC episodes recorded in NOAA’s SOCD. The ultimate goal is to utilize an AI approach to create a new dataset in TC-affected regions to meet specific regional requirements, for which an attempt is made in this study, but more work is required.
The global comparisons were conducted first, which suggest that all four instruments (AMSR2, SMAP, ASCATB, and CYGNSS) exhibit statistically lower wind speeds relative to CCMP, with CYGNSS showing the largest difference (−2.7%) and ASCATB (−0.6%) closely resembling CCMP. The coincidences were searched within a 0.25° × 0.25° grid and 0.5-h intervals. The MPD histograms share very similar distributions among the four datasets, with a narrow spread characterized by an STD falling between 4 and 12%, indicating a fairly consistent MPD across the pairs of maps. The STDs of the pixel-by-pixel percent differences suggest that CYGNSS is the noisiest among the four datasets, whereas SMAP noisiness varies more drastically over different hourly maps. The histograms of the spatial correlation coefficients peak at a high correlation of 0.8–0.9 with a small spread of less than 0.3, indicating a very strong global spatial correlation between each radiometer (or scatterometer) dataset and CCMP.
When comparing TCs within 10° × 10° blocks between these datasets and CCMP, it is observed that both SMAP and AMSR2 exhibit stronger winds than CCMP, with MPD medians at 6.5% and 4.8%, respectively. In contrast, ASCATB shows a lower wind speed by −0.8%. These comparison results differ notably from the global comparison as TCs are present, particularly for SMAP and AMSR2, while ASCATB remains relatively consistent with the global comparison. SMAP’s MPD histogram displays the most variability across blocks.
The STD of the percent differences over pixels ranges from 17 to 20% with similar distribution between datasets, suggesting the same level of uncertainty. Fewer TCs were captured in ASCATB (12 versus 82 and 56 for AMSR2 and SMAP, respectively), and we observed less-organized TC structures in the ASCATB maps, likely due to the weaker winds. Despite the substantial differences in wind speed, there is a strong spatial correlation (0.8–0.9 with minimal variation) between TC structures captured by radiometers (or scatterometers) and CCMP, albeit weaker than in global comparisons. It is worth noting that during TCs, approximately 18% and 7% of block pairs exhibit relatively low spatial correlation (<0.6) for AMSR2 and SMAP, respectively, compared to 9% and 3% during ETCs.
When comparing ETCs within 10° × 10° blocks, we also observe stronger winds in AMSR2 and SMAP, albeit to a slightly lesser extent, at 5%. ASCATB once again shows a slight negative MPD of around −0.3%, but it is not statistically significant. The overall uncertainty levels are similar to those seen in TC comparisons. In this set of comparisons, there are more consistently high correlations with CCMP across the blocks. In summary, despite the presence of strong winds in ETCs, CCMP demonstrates better agreement with the individual satellite datasets in these cases than with those for TCs. This points to greater challenges in wind retrieval during TCs.
The comparison of TCs between SAR and CCMP, sampled at locations and time frames aligned with SAR tiles, reveals that SAR winds around TCs are approximately 9% higher than their CCMP counterparts. This difference is at the same level as those for radiometers (or scatterometers) that include TCs with statistical metrics calculated within 5° × 5° blocks. The median STD of pixel-by-pixel percent differences is at 21%, comparable to the uncertainties in the radiometer (or scatterometer) comparisons. The median spatial correlation between SAR tiles and coincident CCMP data stands at 0.74, with about 23% of pairs showing a correlation below 0.6, which also roughly echoes the radiometers’ (or scatterometers’) comparisons within 5° × 5° blocks.
Empirical approaches were created to determine the TC eye-center location, eye-region size, and the asymmetry relative to the eye center, to make comparison between SAR and CCMP. Throughout February to October 2023 and over all SAR tiles used, the histogram of eye center distances between SAR and CCMP displays a peak at 10 km, with the STD of 28 km suggesting a significant level of variability across tiles. The eye-region size measured in pixels is notably larger in CCMP compared to SAR, with median values of 21 and 7 pixels (0.125° × 0.125°), respectively. The SAR equivalent radii exhibit a bias towards higher values compared to CCMP, with a median MPD at around 15%. CCMP-observed TCs tend to be weaker in the core and less compact. The histograms of west–east and south–north asymmetries relative to the eye center indicate that only about half of the TC pairs exhibit aligned asymmetries in both west–east and south–north orientations. This suggests that achieving a detailed structural agreement between SAR and CCMP is challenging when TCs are present.
A machine-learning model using an RF regressor was developed to create improved maps of TCs that closely align with the SAR-measured TCs. When applied to a blind set of TCs that are captured coincidently in CCMP and SAR, the RF model results effectively corrected a low bias of about 2–3 m/s in CCMP. The correction was most significant (reaching 15 m/s) at the core region of a TC where CCMP showed the poorest agreement with satellite-measured winds.
Although the RF model was trained on pixel pairs, it performed reasonably well on individual tiles. The leading predictor, CCMP wind speed, accounted for a 60–63% share of the importance, while the SAR pixel distance to the eye region emerged as the second most important predictor, with an importance of 21%. The remaining predictors, including the pixel distance to the eye centers in both datasets, collectively contributed a share of 15–20% to the importance.
Immediate steps to improve the present RF model will involve tuning the hyperparameters and expanding the training dataset. In addition, a revision of the predictors, particularly those related to SAR TC structural metrics, will be crucial. We might turn to the International Best Track Archive for Climate Stewardship (IBTrACS) to gather structural metrics of TCs, including the eye center location and eye region size. Additionally, we could advance our efforts by developing more sophisticated ML models, such as CNN-LSTM, to generate more precise results that accurately capture both the spatial structure and temporal variability of a TC.
Given the paper’s limited scope, we did not include ground-based measurements from buoys, ships, or platforms. Typically, the coastal winds cannot be accurately retrieved by radiometers or SAR. Nonetheless, it is essential to either integrate the coastal region into our analysis or examine it in a separate study in the future, as it represents a significant component of any ocean wind dataset.