1. Introduction
The Changjiang Estuary is China’s largest estuary, with an annual runoff of more than 900 billion cubic meters, accounting for about 57% of the total for China’s estuaries [1]. The Changjiang Dilute Water (CDW), which carries terrestrial materials, is of great significance in transporting material to the west Pacific. The interaction of CDW with saline sea water has a profound impact on regional biogeochemical and physical processes. The spatial distribution and temporal variation of CDW, and the factors that control them, have long been hot topics but remain incompletely resolved [2,3]. One of the main problems is the lack of an effective index with long-term observations and sufficient spatial coverage and resolution. Although different parameters are used to trace the CDW, such as sea surface temperature (SST), suspended particulate matter (SPM), colored dissolved organic matter (CDOM) or even chlorophyll a (Chl-a), sea surface salinity (SSS) has proven to be one of the most effective indexes, providing direct insight into hydrodynamic processes and energy transfer in marginal seas, land–sea interactions, and the variation of marine ecology under the pressure of global climate change and human activity [4,5,6,7].
However, the acquisition of SSS still needs to be improved greatly to overcome the disadvantages of current observation technology. Generally, technologies for monitoring SSS include on-site observation from ships or buoys and remote sensing by satellite [7]. Traditionally, marine salinity monitoring has relied heavily on voyage surveys combined with fixed-station buoy observations. To overcome the disadvantages of on-site observation, namely low temporal and spatial resolution, limited spatial coverage and the inability to make dynamic synchronous observations, observation networks at different levels have been constructed globally and/or regionally [8]. Among them, Argo has become an important means of obtaining global ocean temperature and salinity data [9]. Argo has the advantage of automatically collecting and storing data and transmitting them through communication satellites. However, as a device that drifts with the currents, Argo is unable to collect data at a fixed station or even within a fixed area, and the spatial resolution of Argo data is low. Most importantly, Argo is only deployed in sea areas deeper than 2000 m to observe the vertical profile from the surface to the deep ocean [10]. Assimilation simulation based on 3D hydrodynamic models is a promising technology that can simulate the salinity distribution and its dynamics in a specific area well [11], but the scarcity of high-precision historical data remains an unresolved problem.
In recent years, the rapid development of satellite remote sensing technology has provided a new way to monitor SSS synchronously from space over a large sea area with an acceptable spatial resolution and temporal frequency. Currently, Soil Moisture and Ocean Salinity (SMOS), Aquarius and Soil Moisture Active Passive (SMAP) are the three main satellite missions used to monitor sea surface salinity, and have been successfully launched and are operating normally [12,13,14]. All three are equipped with L-band microwave radiometers to observe the sea surface brightness temperature. In particular, Aquarius and SMAP can perform sea surface roughness correction because they are equipped with scatterometers, which is conducive to obtaining SSS products with higher precision [15,16,17]. However, the capability of salinity satellites to observe coastal waters is greatly limited by large errors. Moreover, the spatial resolution of satellite-derived SSS is too low to identify the subtle interaction of currents in marginal seas. In addition, the short temporal span makes it difficult to capture temporal variation over a long period under the influence of human activity and global change.
To improve the spatial resolution and temporal span of satellite SSS data, this study aims to establish a complex SSS inversion model based on salinity satellite data combined with ocean color satellite data. Considering the low spatial resolution of Aquarius\SSS and the unavailability of SMOS\SSS in the sea areas around the Changjiang Estuary, SMAP\SSS data are chosen for model construction. The accuracy of SMAP\SSS in the study area has been verified against in situ measured SSS in previous work [7]. The Moderate Resolution Imaging Spectroradiometer (MODIS) was selected to provide ocean color products for this study. MODIS has been widely used in offshore investigations due to its comprehensive observation capabilities. It has the advantages of a long temporal span, open data access, high temporal and spatial resolution, and plentiful ocean color products including SST, Chl-a, primary productivity, SPM and CDOM [18]. Previous studies have shown that incorporating ocean color data, such as Chl-a, SPM and CDOM, can improve the accuracy of SSS inversion [19]. In this study, CDOM-related information is used because of the strong negative relationship between SSS and CDOM, which has been widely observed in coastal waters [20,21,22,23]. However, it should be noted that systematic errors can be introduced by the uncertainty of this relationship [24], and that the matched sampling points are too few to ensure the temporal and spatial applicability of the constructed model. Meanwhile, since SST can capture the temperature differences between riverine freshwater and oceanic saline water, especially in the upwelling region [23], the monthly average reflectance product combined with MODIS\SST is used as input, and the SMAP\SSS product is used as the true value. Considering the great difference in Changjiang River runoff between the flood and dry seasons, Random Forest, Particle Swarm Optimization Support Vector Regression (PSO-SVR) and Automatic Machine Learning (TPOT) are trained separately for each season. The accuracy of the inversion model is verified with the in situ measured test dataset. The model with the best performance is selected to invert SSS for the study area. In addition, the coastal area is filled with SSS inverted by empirical models constructed from in situ measured data, to compensate for the inability of salinity satellites to observe coastal waters with low salinity and high turbidity. The combined inversion models are applied to obtain SSS from 2003 to 2020 for the study area. The spatial expansion patterns and the temporal trend of CDW are then discussed. This study provides a way to obtain SSS in the study area with high spatial and temporal resolution. It enables the temporal variation of the SSS distribution to be observed over a longer period, which is conducive to a better understanding of changes in material transport and estuarine geochemical processes in the marginal seas of the west Pacific under intense human activity and global warming.
3. Methodology
Figure 2 shows the basic technical flow of this paper. First, an SSS inversion model based on machine learning is constructed for the part of the study area with salinity >30 psu; the MODIS\Rrs and MODIS\SST L3 monthly products are used as the input, and SMAP\SSS is used as the true value. The SSS inverted by the different machine learning models is evaluated against in situ measured SSS from NSO for the different seasons, and the model with the best performance is selected. Then, an empirical statistical model for sea areas with a salinity <30 psu is constructed from in situ measured data and validated. With the SSS inversion process established, the machine learning models combined with the empirical model are applied to obtain SSS for March and August, as representatives of the dry and flood seasons, from 2003 to 2020. Finally, a long-term analysis of the temporal-spatial variation trend of SSS in the study area is performed.
3.1. Data Preparation
First, MODIS\Rrs and MODIS\SST are resampled using the cubic convolution method to obtain data with the same spatial resolution as SMAP\SSS. This method fits a smooth curve through the centers of the 16 nearest input cells to determine the new value of each output cell. It is appropriate for continuous data and produces a raster that is geometrically less distorted than one obtained with nearest neighbor resampling, although the output raster may contain values outside the range of the input raster [37].
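As an illustration of the resampling step, the following is a minimal pure-Python sketch of cubic convolution interpolation over a 4 × 4 neighborhood. It assumes the widely used Keys kernel with a = −0.5 and edge clamping; the paper does not state which kernel parameter or software was used, and the function names `keys_kernel` and `cubic_convolution` are hypothetical.

```python
import math


def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, common choice."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def cubic_convolution(grid, row, col):
    """Interpolate grid (list of lists) at fractional (row, col) from the 16 nearest cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    r0, c0 = math.floor(row), math.floor(col)
    value = 0.0
    for i in range(r0 - 1, r0 + 3):
        wi = keys_kernel(row - i)
        ii = min(max(i, 0), n_rows - 1)  # clamp indices at the raster edges
        for j in range(c0 - 1, c0 + 3):
            wj = keys_kernel(col - j)
            jj = min(max(j, 0), n_cols - 1)
            value += wi * wj * grid[ii][jj]
    return value


# A linear ramp is reproduced exactly between cell centers.
ramp = [[float(c) for c in range(6)] for _ in range(6)]
ramp_value = cubic_convolution(ramp, 2.5, 2.5)

# Near a sharp step the result can fall outside the input range [0, 1].
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
step_value = cubic_convolution(step, 2.0, 1.5)
```

The step example shows the overshoot mentioned in the text: the input only contains 0 and 1, yet the interpolated value next to the step is slightly negative.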
Second, because the species and concentration of CDOM affect the reflectance from the blue to the red part of the spectrum, bands in the range of 350–700 nm are widely adopted for the construction of CDOM inversion models [38]. Here, five spectral bands (412, 443, 488, 547 and 667 nm) are used as single-band input variables. In addition, given the high accuracy of the SSS inversion models previously established for this sea area with simple regression methods, this paper adopts band ratios as inputs to enhance the accuracy of the model (Equations (1) and (2)) [39].
Rrs555/Rrs443 and Rrs555/Rrs488 were selected for the flood season, while Rrs555/Rrs469 and Rrs555/Rrs488 were selected for the dry season. Moreover, in some cases SST indicates the plume front even more clearly. Thus, MODIS\SST is adopted as an important variable that can capture the characteristics of CDW expansion. However, it should be kept in mind that SSS and SST do not always change synchronously.
To improve the SSS inversion accuracy, this paper establishes separate inversion models for the dry and flood seasons. Rrs412, Rrs443, Rrs488, Rrs547, Rrs667, Rrs555/Rrs488, Rrs555/Rrs443 and SST are the inputs in the flood season, whereas Rrs412, Rrs443, Rrs488, Rrs547, Rrs667, Rrs555/Rrs488, Rrs555/Rrs469 and SST are the inputs in the dry season.
Finally, the input data are normalized to reduce the inversion error that may be induced by the inconsistent dimensions of the inputs. The normalization formula is given in Equation (3):
$$S_{std} = \frac{S - S_{min}}{S_{max} - S_{min}} \quad (3)$$
where Sstd is the normalized value, S is the original value, and Smin and Smax are the minimum and maximum of the original data set, respectively.
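The min–max scaling implied by Smin and Smax can be sketched as follows. This is an illustrative helper rather than the authors' code; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] as (s - s_min) / (s_max - s_min)."""
    s_min, s_max = min(values), max(values)
    if s_max == s_min:
        raise ValueError("constant input cannot be min-max scaled")
    span = s_max - s_min
    return [(s - s_min) / span for s in values]


sst = [12.0, 15.0, 18.0, 24.0]      # hypothetical SST values (degrees Celsius)
rrs_ratio = [0.8, 1.1, 1.4, 2.0]    # hypothetical Rrs555/Rrs488 values
sst_scaled = min_max_normalize(sst)
ratio_scaled = min_max_normalize(rrs_ratio)
```

Each input is scaled independently, so SST in degrees Celsius and the dimensionless band ratios end up on the same 0 to 1 range. When a trained model is later applied to new images, the minimum and maximum from the training data would normally be reused so that the scaling stays consistent.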
Considering the lack of MODIS reflectance data in some months of the flood and dry seasons in the study area, this paper selects March and August from 2016 to 2018 as representatives of the dry and flood seasons, respectively, for the training data. A total of 2916 valid points were matched in the flood season and 2550 valid points in the dry season. The training and test sets are divided by stratified random sampling; that is, the data are classified along the salinity gradient, and samples are randomly assigned to the training set and the test set within each class, to prevent errors caused by excessive concentration in one salinity range. The training set and the test set account for 80% and 20% of the data, respectively.
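The stratified 80/20 split described above can be sketched as follows. The salinity bin width of 1 psu, the random seed and the function name `stratified_split` are assumptions made for illustration; the paper does not specify how the salinity classes were defined.

```python
import random


def stratified_split(samples, sss_key, bin_width=1.0, test_fraction=0.2, seed=0):
    """Split samples into train/test sets separately within each salinity bin."""
    rng = random.Random(seed)
    bins = {}
    for sample in samples:
        bins.setdefault(int(sss_key(sample) // bin_width), []).append(sample)
    train, test = [], []
    for key in sorted(bins):
        members = bins[key][:]
        rng.shuffle(members)                       # random selection inside the bin
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical matched samples: (SSS in psu, sample id)
samples = [(30.0 + k / 25.0, k) for k in range(100)]
train, test = stratified_split(samples, sss_key=lambda s: s[0])
```

Because the shuffle and the 20% cut are applied inside every salinity bin, the test set covers the whole salinity range instead of being dominated by the most common values.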
3.2. Machine Learning Methods
Three machine learning methods, namely Random Forest, PSO-SVR and Automatic Machine Learning (TPOT), are used to establish the seasonal SSS inversion models (one for the flood season and one for the dry season). The inversion accuracy of the three methods is compared, and the method with the highest accuracy is adopted.
3.2.1. Random Forest
Random Forest is a classification/regression method that uses decision trees as base learners to build a bagging ensemble and further introduces random attribute selection into the training of the decision trees [40]. Random Forest is powerful in modeling nonlinear relationships, and it can reduce over-fitting and improve resistance to noise. For regression, the final prediction is the mean of the predictions of all trees. Two important parameters are tuned during modeling: the number of decision trees, denoted Ntree, and the number of features considered at each split, denoted Mtry.
3.2.2. PSO-SVR
Support Vector Machine (SVM), first proposed by Cortes and Vapnik in 1995 [41], performs very well in solving small-sample, nonlinear and high-dimensional pattern recognition problems [42]. SVM is based on the Vapnik–Chervonenkis (VC) dimension theory of statistical learning and the structural risk minimization (SRM) principle. It seeks the best compromise between model complexity and learning ability based on limited sample information. SVM can be extended to regression problems, in which case it is referred to as Support Vector Regression (SVR). When predicting, SVR maps the input variables to a high-dimensional feature space and converts the original nonlinear model into a linear model in that feature space [43]. According to the principle of structural risk minimization [44], the regression fitting process of SVR is equivalent to an extremum optimization problem.
Among the available kernel functions, this paper tries the linear (linear), polynomial (poly) and Gaussian (RBF) kernels in order to select the optimal one. Three main choices may affect the accuracy of the model, namely the kernel function, the penalty factor C and the kernel function parameter g [45].
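For reference, the optimization problem behind SVR is usually written in the ε-insensitive form below, together with the RBF kernel. This is the standard textbook formulation and not necessarily the exact variant used in this study; the symbols w, b, ξ, ξ*, ε and φ are not defined in the text and are introduced here, while C and g are the penalty factor and kernel parameter named above.

```latex
\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)
\quad \text{s.t.}\quad
\begin{cases}
y_i - w^\top\phi(x_i) - b \le \varepsilon + \xi_i,\\
w^\top\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\\
\xi_i,\ \xi_i^* \ge 0,
\end{cases}
\qquad
K(x, x') = \exp\left(-g\,\lVert x - x'\rVert^2\right)
```

A larger C penalizes deviations beyond ε more heavily, and a larger g makes the RBF kernel narrower, which is why these two parameters dominate the quality of the fit.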
Particle Swarm Optimization (PSO) is a bio-inspired swarm intelligence algorithm. It has the advantages of high accuracy and rapid convergence. Candidate solutions to the optimization problem are called particles, and each particle has a position vector and a velocity vector. Assuming that there are M particles in the population, the initial position and velocity of each particle are generated randomly [46]. In this paper, the mean squared error (MSE) is used as the fitness function, and the PSO algorithm is used to optimize the parameters of SVR. As introduced above, the penalty factor C and the kernel function parameter g (gamma) are the main factors affecting the SVR fit.
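The particle update can be sketched with a basic global-best PSO, shown below. The inertia weight w and the acceleration coefficients c1 and c2 are assumed values (the paper does not report them), and `toy_mse` is a stand-in with a known minimum; in the setting described here the fitness would be the cross-validated MSE of an SVR trained with the candidate (C, g).

```python
import random


def pso_minimize(fitness, bounds, n_particles=40, max_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness(position) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions and velocities for every particle.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(max_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)  # stay in bounds
            val = fitness(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


def toy_mse(params):
    """Hypothetical fitness with its minimum at C = 10, g = 1."""
    c, g = params
    return (c - 10.0) ** 2 + (g - 1.0) ** 2


best, best_val = pso_minimize(toy_mse, bounds=[(0.1, 100.0), (0.01, 10.0)])
```

The defaults of 40 particles and 100 iterations mirror the pop and max_iter settings reported later in the paper; the search bounds for C and g are hypothetical.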
3.2.3. Automatic Machine Learning (TPOT)
Automatic Machine Learning (TPOT) is a model construction method that automatically selects algorithms and optimizes their parameters. TPOT is one of the classic frameworks of automatic machine learning and can generate pipelines with arbitrary tree structures. The algorithms at each node of the pipeline can be randomly combined and evolved through genetic programming to find the best-performing machine learning pipeline [47]. The parameter optimization of TPOT is also based on genetic programming, which draws on the principles of biological evolution and genetics in nature. It is a method that automatically generates and searches candidate programs at random. The genetic programming parameter settings in this study are shown in Table 2.
3.3. Construction of Empirical Statistical Model for Near-Shore Sea Waters
An empirical statistical model is constructed for SSS inversion in the near-shore sea area with a salinity lower than 30 psu. The in situ SSS comes from two NSO buoy stations located at the mouth of the Changjiang Estuary and the mouth of Hangzhou Bay, respectively. The salinity measured at the two stations varies from 20 to 31 psu. To compensate for the limited spatial coverage of the buoy time series, the MODIS\Rrs L2 product is used for spatio-temporal matching. A total of 40 valid data points are matched. Due to the limited number of samples, one empirical model is constructed for both the flood and dry seasons. Of the 40 samples, 30 are used to build the model, and the other 10 are used for validation.
Specifically, by screening the archived surface salinity data (0 m) of the No. 06 buoy station from 2014 to 2015 and of the three-anchor buoy comprehensive observation platform from 2018 to 2019, a total of 40 sets of measured data were obtained. The samples are matched within spatial and temporal windows of ±1 km and ±6 h. The Pearson correlation coefficient is used to analyze the correlation of each single band and band ratio with SSS, as shown in Figure 3.
The band ratio with the strongest correlation (Rrs667/Rrs488) is selected and then fitted with linear, exponential, logarithmic and power functions. The power regression model, which has the highest correlation coefficient, is used (Table 3).
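The two steps described above, screening by Pearson correlation and fitting a power function, can be sketched as follows. The power model is fitted here by ordinary least squares on log-transformed data, which is one common approach and an assumption about the fitting procedure. The synthetic coefficients (20 and −0.3) are made up for the demonstration and are not the coefficients of the paper's model.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (ln x, ln y); requires x, y > 0."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


# Synthetic example: SSS generated from a hypothetical power law of the band ratio.
ratio = [0.2, 0.4, 0.6, 0.8, 1.0]
sss = [20.0 * r ** -0.3 for r in ratio]
a, b = fit_power_law(ratio, sss)
r = pearson_r(ratio, sss)
```

With noise-free synthetic data the fit recovers the generating coefficients; with real matchups, the same functions would be applied to the 30 modeling samples and checked against the 10 held-out samples.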
3.4. Sen + MK Trend Analysis
The temporal trend of SSS is analyzed with Sen’s slope estimator combined with the Mann–Kendall method (Sen + MK). Sen’s slope estimator is a classic trend calculation method based on non-parametric statistics. Compared with least squares regression, it is less sensitive to extreme values and outliers, which significantly improves the accuracy and reliability of the estimate [48]. The formula for calculating the Sen slope is shown in Equation (4):
$$\beta = \mathrm{Median}\left(\frac{SSS_j - SSS_i}{j - i}\right), \quad \forall\, j > i \quad (4)$$
In the formula, β is the Sen slope, i and j are indices in the year sequence, SSS_i and SSS_j are the SSS values of a pixel in years i and j, and Median denotes taking the median.
The Sen slope method can effectively calculate the trend of SSS at every position in the study area, but it cannot determine the significance of the trend. Therefore, the Mann–Kendall method, a non-parametric test with strong resistance to noise, is introduced to test the significance. Moreover, the Sen + MK method does not require the sequence data to follow a specific distribution [49].
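For a single pixel, the Sen + MK procedure can be sketched as below. The Mann–Kendall statistic uses the usual normal approximation with a tie correction; the function names and the synthetic 18-year series are hypothetical, and the paper's own implementation may differ in detail.

```python
import math
from statistics import median


def sen_slope(series):
    """Median of all pairwise slopes (x_j - x_i) / (j - i) for j > i."""
    n = len(series)
    slopes = [(series[j] - series[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)


def mann_kendall_z(series):
    """Mann-Kendall Z statistic (normal approximation with tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)          # sign of the pairwise difference
    ties = {}
    for v in series:
        ties[v] = ties.get(v, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    var_s -= sum(t * (t - 1) * (2 * t + 5) for t in ties.values())
    var_s /= 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic August SSS of one pixel for 18 years, decreasing by 0.05 psu per year.
pixel_sss = [34.0 - 0.05 * k for k in range(18)]
slope = sen_slope(pixel_sss)
z = mann_kendall_z(pixel_sss)
p = two_sided_p(z)
```

In a per-pixel analysis, the sign of the slope gives the direction of change and |Z| is compared with a critical value such as 1.96 (95% confidence) to decide whether the trend is significant.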
4. Results and Analysis
4.1. Training Results of Machine Learning Models
SSS inversion models are constructed with the three machine learning methods, and their training results are compared. The model with the best accuracy is adopted to invert SSS.
4.1.1. Random Forest
For the model constructed by Random Forest, the Mean Absolute Error (MAE) and the coefficient of determination R² are used as evaluation indexes, and GridSearchCV is adopted to determine Ntree and Mtry. It is determined that Ntree = 250 and Mtry = 3 for August, and Ntree = 175 and Mtry = 4 for March. The feature importance output by the random forest models established for August and March is shown in Figure 3. With the band ratio inputs, the feature importance in both August and March is improved greatly, implying that incorporating CDOM optical properties is important for indicating SSS in the study area. In addition, it is worth noting that SST is more important in March, while Rrs667 is more important in August. Rrs667 is an important indicator of suspended solids, so its significance can be attributed to the huge amount of sediment carried by CDW during the flood season. In contrast, with the decrease of CDW runoff and the increase of the SST difference between terrestrial runoff and saline sea water in the dry season, SST gradually replaces Rrs667 and takes the leading role in the SSS inversion. The final accuracy on the test dataset is shown in Figure 4.
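The evaluation indexes used in this section can be computed as in the sketch below, which assumes that R2 denotes the coefficient of determination and also includes RMSE, reported later for the empirical model. The sample values are hypothetical and chosen so that the results are easy to check by hand.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured = [31.0, 32.0, 33.0, 34.0]   # hypothetical in situ SSS (psu)
inverted = [31.5, 31.5, 33.5, 33.5]   # hypothetical model output (psu)
scores = (mae(measured, inverted), rmse(measured, inverted), r2_score(measured, inverted))
```

MAE and RMSE are in psu, whereas R² is dimensionless and becomes unstable when the measured values span a very narrow range, which is one reason to prefer MAE for small, concentrated validation sets.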
4.1.2. PSO-SVR
The first step in constructing the SSS inversion models with the PSO-SVR method is to choose the kernel function. We used five-fold cross-validation on the training dataset to evaluate the performance of each kernel function (Table 4). The RBF kernel performs best in both August and March, and it has been proven effective in dealing with nonlinear problems. With the kernel fixed, only the penalty factor C and the kernel parameter g need to be tuned. The PSO algorithm is run with a population size of 40 (pop = 40), a maximum of 100 iterations (max_iter = 100) and MSE as the fitness function to optimize SVR. The optimal values are C = 5 and g = 0.96 for August, and C = 20 and g = 8.38 for March. The optimization converges noticeably more slowly for August than for March.
4.1.3. Automatic Machine Learning
MSE is used as the evaluation function, and the optimized machine learning pipelines for August and March are constructed according to the genetic programming parameter settings described in Table 2. This study uses 50 iterations and a population size of 50. Compared with Random Forest and PSO-SVR, TPOT runs significantly slower because it automatically searches over candidate pipelines. The accuracy on the test dataset is shown in Figure 5. Given sufficient iterations, the accuracy of the TPOT model is relatively high and stable.
4.2. Calibration of Inversion Results of Machine Learning Based Models
According to the verification results of the three machine learning methods on the test dataset, Random Forest was selected to invert SSS in the flood season, and TPOT was selected to invert SSS in the dry season. However, the inversion results of the two methods in regions with low SSS (<30 psu) are unsatisfactory. Possible reasons are as follows:
Because of the low spatial resolution of SMAP data, few training samples are available in coastal regions with low salinity.
The accuracy of SMAP\SSS products in near-shore waters with low salinity is particularly low.
The relationships between SSS and the MODIS reflectance and SST in coastal waters are unstable because of the uncertain relationships between SSS and optically active components such as SPM and CDOM.
Since the inversion model constructed in this paper was designed to observe SSS over a longer time span and at larger spatial scales, the original data involved in the inversion are all monthly average products, which means that each pixel has been averaged over time. Therefore, we speculate that the mapping relationship remains stable at the original resolution (4 km) of the MODIS monthly average data.
To verify the model’s accuracy, the measured SSS of the National Institute of Fundamental Studies (NIFS) was used for comparison. In order to maintain consistency with the data used in the modeling, the daily average Rrs and SST (4 km) of MODIS Aqua L3 products are used as input and matched with the measured SSS within spatial and temporal windows of ±4 km and ±12 h. A total of 19 points from 15 to 17 August are matched for the flood season, and a total of 13 points on 28 April are matched for the dry season (no measured SSS is available for March). The comparison between the inverted SSS and the measured SSS is shown in Figure 6. Because the SSS at the sampling points varies within a small range, MAE was selected to evaluate the inversion. The results indicate MAE = 0.503 psu in the flood season and MAE = 0.263 psu in the dry season. Although the number of sampling points is relatively small and the SSS varies within a narrow range, the model shows generalization ability at a high spatial resolution.
The monthly average MODIS products of March and August 2020 were used as inputs to the inversion model to obtain SSS. Figure 7 shows the inversion results and the SMAP\SSS of the same months for comparison. The inversion result for August is consistent with SMAP\SSS, and the model results show finer details around the CDW boundary. A comparison with the SSS measured near Jeju Island in 2020 (Figure 6) shows that the regional salinity fluctuates in the range of 28–30 psu in the SMAP data, whereas the inversion results give a salinity above 31 psu, which is highly consistent with the actual measurements. The inversion results in April are basically consistent with the salinity boundary of SMAP\SSS, but some low-value areas in the central part are obviously underestimated. On the whole, the inversion results are reasonably accurate and can be used for long-term observations, such as monitoring the extent of the CDW.
4.3. Near-Shore Empirical Statistical Model Validation
In this paper, the SSS inversion model for coastal waters with low salinity (<30 psu) is constructed with the empirical method described in Section 3.3. The correlation of each single band and band ratio with SSS was analyzed (Figure 3), and Rrs667/Rrs488 was found to have the strongest correlation with the measured SSS. Then, the correlation coefficients between Rrs667/Rrs488 and SSS for the linear, exponential, logarithmic and power forms were compared (Table 3), and the power form was finally selected as the SSS inversion model.
The formula is shown in Equation (5):
The accuracy of the model is validated with the in situ measured data, as shown in Figure 8. As a model constructed for a limited target area, it shows high accuracy.
4.4. SSS Inversion Process Based on Complex Models
This paper establishes an inversion process based on complex models. The procedure is shown in Figure 9. First, the optimal machine learning based model is used to invert the SSS in August and March. Then, the values >30 psu are retained, whereas the pixels with values <30 psu are replaced with new values calculated by the empirical model. Finally, the new inversion result is output as the SSS image for that month.
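The merging step of this procedure can be sketched as follows. The function `combine_sss` and the placeholder power-law model are hypothetical; in particular, the coefficients 25.0 and −0.1 are invented for the example and do not come from the paper. Pixels exactly at 30 psu are kept from the machine learning result, which is an assumption because the text only specifies >30 and <30 psu.

```python
def combine_sss(ml_sss, ratio, empirical_model, threshold=30.0):
    """Keep ML-based SSS at or above threshold; elsewhere use empirical_model(ratio)."""
    combined = []
    for sss_row, ratio_row in zip(ml_sss, ratio):
        out_row = []
        for sss, rr in zip(sss_row, ratio_row):
            if sss is None or rr is None:
                out_row.append(sss)              # missing input: keep the ML value (or None)
            elif sss >= threshold:
                out_row.append(sss)              # open-sea pixel: keep the ML result
            else:
                out_row.append(empirical_model(rr))  # low-salinity pixel: recompute
        combined.append(out_row)
    return combined


def placeholder_model(rr):
    """Hypothetical power-law model of the Rrs667/Rrs488 ratio (made-up coefficients)."""
    return 25.0 * rr ** -0.1


ml_sss = [[32.0, 28.0], [None, 30.0]]      # hypothetical ML inversion (psu), None = no data
ratio = [[0.5, 1.0], [0.7, 0.9]]           # hypothetical Rrs667/Rrs488 per pixel
combined = combine_sss(ml_sss, ratio, placeholder_model)
```

Only the pixel with an ML value of 28 psu is recomputed in this example; the others are passed through unchanged.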
To show the effect of SSS inversion with the complex models, the SSS inversion for August 2020 is taken as an example. The SSS inversion based on the complex models is shown in Figure 10a, whereas the inversion with the machine learning model alone is shown in Figure 10b. It can be seen that the SSS distribution in the Changjiang Estuary and the adjacent sea area is greatly improved with the complex models. In particular, the plume front of the CDW is clearly highlighted by the accurate delineation of low salinity.
To further evaluate the accuracy of the SSS inversion results with the complex models, this paper uses the monthly average data of the Copernicus Global Ocean Reanalysis for validation. In the correlation analysis between the Copernicus reanalysis data and the combined inversion results for August 2020, the total number of matching points was 44,899, the correlation coefficient was 0.763 and the average deviation was −0.86 psu. The SSS inversion results are therefore in good agreement with the Copernicus reanalysis data. The validation suggests that the complex models constructed in this study are effective for SSS observation in the study area.
5. Application: Temporal Variation of Spatial Expansion of CDW
With the complex models constructed in this study, the spatial distribution of SSS in the Changjiang Estuary and the adjoining sea area in March and August from 2003 to 2020 is inverted (as shown in Appendix A Figure A1 and Figure A2). The expansion pattern of the CDW in August is summarized, and the temporal trends of SSS in August and March are then analyzed.
5.1. Expansion Pattern of CDW in Summer
Generally, in summer, the salinity of CDW changes dramatically in the range of 22–29 psu, which is the range over which salinity fluctuates at the plume front of CDW and represents the core area of the CDW [50]. After carefully comparing the inverted SSS images for August from 2003 to 2020 (Figure 11, Appendix A Figure A1 and Figure A2), and referring to previous related studies [51], 27 psu is adopted as the threshold to define the boundary of CDW. The expansion patterns of the CDW in August are then summarized into three categories. The first is the northeast-oriented expansion pattern, which includes the northeast, north-northeast and east-northeast expansion subtypes. The other two are the multi-directional (isotropic) expansion pattern and the turn pattern, in which the CDW changes direction, namely the northeast-southeast expansion pattern. Among them, the northeast-oriented expansion pattern is the dominant pattern of CDW expansion.
In most years, the CDW is deflected northeastward by factors such as terrain, runoff, wind speed and circulation, and by their combined action. For example, an enhanced Taiwan Warm Current (TWC) or southeasterly wind makes the deflection of the CDW more pronounced, so that it expands northward. The multi-directional expansion pattern and the turn pattern are special cases caused by the particular marine dynamic conditions of the year. Multi-directional expansion means that the CDW expands in different directions at the same time. For example, the CDW expands toward the northeast and the southeast simultaneously in the inverted SSS image for 2015. The isotropic expansion pattern occurred in 2014 and 2019, when the low-SSS area was distributed along the coast without a dominant direction, exhibiting an arc shape. The turn expansion type was found in the inversion results for 2017: the CDW first expanded northeastward and then turned southward at around 125°E, making it the only expansion pattern in which the CDW changes direction.
5.2. Changing Trend of SSS from 2003 to 2020
The Sen + MK trend test is used to evaluate the significance of the interannual change of SSS in March and August from 2003 to 2020. The area of SSS change is calculated for the different confidence levels, as shown in Table 5.
The change rate of SSS calculated pixel by pixel for March and August from 2003 to 2020 is shown in Figure 12. Generally, the study area shows a slight SSS decrease in both March and August, exhibiting an overall downward trend of SSS in recent years. More than 70% of the sea area shows a decreasing trend in SSS in both August and March. The sea area with decreasing SSS is larger in the flood season than in the dry season. The sea area with a significant or slight decrease in SSS is mainly distributed in the outer parts of the Changjiang Estuary. The area with elevated SSS is located at the plume front of CDW and in the activity area of the TWC, which intrudes along the submarine valley off the Zhejiang coast. The decreasing trend of SSS at the plume front of CDW may indicate the strengthening of CDW, particularly in the flood season.
In contrast, the sea area with a slight increase in SSS in both August and March is mainly located in the coastal waters. In March, a large area in the northwest of the study area shows slightly increased SSS, and sea areas with significant SSS increases are mainly distributed along the coast south of Hangzhou Bay. In August, slightly increased SSS occurred near Jeju Island, and a very small patch with significantly elevated SSS was observed in the Changjiang Estuary.
6. Conclusions
This paper takes the Changjiang Estuary and the adjoining sea area (120°–130°E, 26°–36°N) as the research area. With the SMAP\SSS L3 monthly product as the true value and the MODIS\SST and MODIS\Rrs L3 monthly products as inputs, separate SSS inversion models for the flood season and the dry season are constructed by machine learning methods for the sea area with salinity >30 psu. An empirical model is constructed for near-shore waters with a salinity <30 psu, without seasonal discrimination, to compensate for the low accuracy of the machine learning inversion models there. By applying the combined SSS inversion models, SSS in the study area from 2003 to 2020 was inverted to observe the changes in the expansion area and pattern of CDW on the continental shelf of the East China Sea (ECS).
The paper reaches the following conclusions:
The combined inversion model achieved acceptable accuracy. In particular, for near-shore waters, the SSS inversion accuracy is significantly improved by combining the machine learning models with the empirical model. In sea areas with a salinity of 30–34 psu, where SSS is obtained with the machine learning models, the MAE is 0.503 psu in the flood season and 0.263 psu in the dry season. In addition, the validation accuracy of the near-shore empirical statistical model is R² = 0.8425 and RMSE = 1.29 psu. Moreover, the correlation coefficient between the inverted SSS and the Copernicus reanalysis SSS reaches 0.764, and the average deviation is −0.86 psu. This indicates that the accuracy of the combined model satisfies the requirements of large-scale dynamic observation. The model provides a new scheme for salinity observation with acceptable accuracy and enhanced spatial resolution in marginal seas, where salinity varies widely and high turbidity has always been a great challenge for observation from space.
With the application of the combined model, SSS is inverted for the study area. The results show that more than 70% of the sea area has a downward SSS trend in both the flood season and the dry season. In August in particular, there is a significant decrease in salinity, indicating an expansion of the CDW. The increasing expansion of low-salinity CDW may indicate a gradual strengthening of CDW or a weakening of the TWC, but this needs more investigation.
From 2003 to 2020, the expansion patterns of the CDW can be summarized into three categories: the northeast-oriented expansion type, the multi-directional expansion type and the northeast-southeast (turn) expansion type. Among them, the northeast-oriented pattern is the most common, and the other two types occur due to the special marine dynamic conditions of the year.
With the combined SSS inversion model constructed in this study, a time series of seasonal SSS is obtained, and the variation of the CDW expansion area and pattern from 2003 to 2020 is analyzed. The study suggests an obvious increase in the expansion of CDW. It is significant not only as a new scheme for observing the SSS of coastal waters from space, but also as a source of insight into the variation of ocean currents. More studies are necessary to further explore the internal mechanisms driving the land–sea interaction process under intense human activity and global warming.
However, this paper still has the following shortcomings, which suggest directions for future work:
(1) More input parameters should be tried. In the future, factors such as chlorophyll and suspended sediment can be added as inputs, so that the possible factors related to SSS are considered more comprehensively and the accuracy of the model is improved.
(2) The inversion model is applicable only to the study area and nearby sea areas, and its applicability elsewhere needs to be strengthened.
(3) The missing rate of the original MODIS reflectance data is high in winter and during periods of frequent cloud and fog, which leads to gaps in the SSS inversion results. Therefore, in future work, appropriate cloud and haze detection or data reconstruction methods could be used to recover missing data, which would greatly improve the feasibility of dynamic remote sensing observation of SSS.