1. Introduction
In the northwestern Mediterranean Sea, the northern portion of the cyclonic circulation is known as the Northern Current (NC) [1]. It flows roughly westward along the continental slope of the Gulf of Lion (GoL). This current is generally warm and oligotrophic. Occasionally, under specific wind conditions, the NC can penetrate onto the shelf, causing an intrusion event (see Figure 1). These events strongly impact the local biogeochemistry [2] and, consequently, primary production: during an intrusion, the shelf water, rich in nutrients from earlier upwellings, is replaced within 2–3 days by the warmer, oligotrophic NC water. By combining altimetric observations with high-resolution modelling, the relation of intrusion events to geostrophic currents and horizontal winds has been analyzed in depth, with the objective of developing a machine-learning algorithm for their detection from satellite altimetry.
Previous projects, including oceanographic cruises, have focused on the detection of NC intrusions [3,4]. These intrusions have also been investigated by Gatti [5] using high-resolution numerical modeling and the results of various oceanographic campaigns. The mechanisms driving the intrusions of the NC into the GoL have also been studied by Barrier et al. [6] (hereafter BPO16), combining in situ observations and high-resolution modeling. This study demonstrates that, in the eastern part of the GoL, current intrusions are forced mainly by strong winds blowing predominantly from either the East or the Northwest, each via a different physical process. Easterlies cause water to pile up along the coast of the GoL, which, through geostrophy, induces an alongshore current intruding onto the shelf. This type of intrusion is simultaneous with the wind forcing and independent of the stratification. Intrusions due to northwesterly winds, on the other hand, occur only under stratified conditions and are associated with upwelling currents near the Var coast (see Figure 1). In particular, a pressure force balances the Coriolis force associated with the deep onshore flow of the upwelling. When the wind calms down, the upwelling, and consequently the onshore flow and its associated Coriolis force, weaken. The remaining, now unbalanced, pressure force then drives the intrusion of the NC onto the GoL shelf about 1 day after the wind relaxation.
The resolution of satellite altimetry, both spatial and temporal, is adequate for monitoring sea surface height changes in the open ocean (e.g., [7]). However, coastal ocean dynamics are characterized by finer spatial and temporal scales of variability. In the last decade, coastal altimetry has benefited from major improvements (a recent review can be found in [8]). These include progress in radar technology (e.g., the use of synthetic aperture radar (SAR) and of the Ka-band), the development of waveform retracking algorithms focused on coastal areas, and the adoption of improved corrections (e.g., wet troposphere and tidal models), all better adapted to the coastal zone. Compared to previous studies, mostly based on sparse in situ observations or on high-resolution models, present-day coastal altimetry observations are particularly useful for monitoring coastal currents, thanks to their high temporal and spatial sampling and their long-term, coherent set of measurements. Some important applications of coastal altimetry techniques in the northern Mediterranean can be found in [9,10,11]. For example, [10] reported an intrusion event into the GoL detected by satellite altimetry. The present work, inspired by [10], improves the detection technique and extends the analysis to a long-term, multi-mission altimetry dataset coupled with in situ data and numerical simulations. Coastal current intrusions are a key process in coastal dynamics, but their ephemeral character makes them very hard to observe during oceanographic cruises. Moorings, if judiciously positioned, can provide time series long enough to observe intrusions, but these data are intrinsically limited in space. Therefore, their detection by satellite, not yet achieved in other works, represents an important step forward for their observation and study. The algorithm described in this paper has been specifically designed and tuned to monitor coastal current intrusions crossing isobaths in a specific region of the Mediterranean Sea. However, the methodology used in this work can easily be adapted to other types of oceanic physical processes and to other regions. Regions with scarce ground- or coast-based measurements can particularly benefit from the proposed method to monitor their coastal currents. To adapt this approach to other kinds of events or regions, one only needs a relatively large dataset of numerical simulations, and regional numerical models are operationally run in a growing number of regions. We want to highlight that the methodology described in this paper is not an alternative to numerical modeling, but can be used together with model forecasts to improve the monitoring of harmful current events, such as current intrusions into the GoL.
The intrusion detection algorithm proposed here is based on a random forest algorithm (RFA). This is a popular machine-learning technique, which has already been applied to altimetry data for estimating sea ice thickness [12], classifying sea ice type [13] and identifying oceanic eddies [14]. However, to our knowledge, RFA has never been applied to coastal currents.
2. Materials and Methods
2.1. JULIO Current Meter
Gatti [5] has identified an optimal location for monitoring the current intrusions occurring on the eastern side of the GoL. This site, located on the 100 m isobath at 5.255°E, 43.135°N, has been named Judicious Location for Intrusion Observation (JULIO). Accordingly, it has been proposed as a site in the framework of the Mediterranean Ocean Observing System for the Environment (MOOSE) observing system (http://www.moose-network.fr/). JULIO data have been acquired with a bottom-moored Acoustic Doppler Current Profiler (ADCP) (RDI Ocean Sentinel, 300 kHz) during three periods: 12 February–23 October 2012; 26 September–31 December 2013; and 17 July 2014–10 April 2015. These datasets provide measurements of the horizontal currents every 30 min, with a vertical resolution of 4 m between 15 and 92 m.
BPO16 defined a JULIO index (J_ind) for the identification of strong intrusions of the NC in the eastern part of the GoL. In BPO16, J_ind is the vertically averaged current, projected in a given direction (49.4°), daily averaged and standardized. We have made two small modifications to J_ind compared with the original BPO16 index, but the main calculation remains the same, as described below. The two differences (a change of the section angle ϑ, and the exclusion of the boundary layers before vertical averaging) result from a sensitivity study on the SYMPHONIE model simulations, which showed that these changes maximize the correlation with the flux through the 200 m isobath. First, the current has been projected perpendicularly to the JULIO section (at an angle ϑ = 57.6° from North in this paper, 49.4° in BPO16, both measured clockwise) from the zonal (U) and meridional (V) velocities measured at JULIO; the resulting projected velocity is denoted U_jul.
By convention, U_jul is positive toward the Northwest. Then, U_jul has been vertically averaged, excluding the superficial and deepest currents, i.e., averaging the current profile from −80 m to −1 m (the full profile was used in BPO16). The half-hourly time series has then been daily averaged. Finally, the resulting daily time series has been standardized, i.e., its temporal mean has been removed and the result divided by its standard deviation:

$$J_{ind}(d) = \frac{\overline{U_{jul}}(d) - \mu}{\sigma}$$

where $\overline{U_{jul}}(d)$ is the daily average, for day d, of the vertically averaged U_jul, and μ and σ are the temporal mean and standard deviation of this daily time series.
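The J_ind processing chain (projection, vertical average over a restricted depth range, daily average, standardization) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (a dictionary of half-hourly profiles per day) is hypothetical, the normal to the section is assumed to lie at azimuth ϑ − 90° so that positive values point northwest, and the population standard deviation is assumed for the standardization.

```python
import math
from statistics import mean, pstdev


def project_normal(u, v, theta_deg):
    """Project (u, v) onto the normal of a section oriented theta_deg clockwise from North.

    Assumption: the normal points 90 degrees counterclockwise of the section
    orientation (azimuth theta - 90), so positive values point roughly northwest.
    """
    az = math.radians(theta_deg - 90.0)
    return u * math.sin(az) + v * math.cos(az)


def jind(profiles, theta_deg=57.6, zmin=-80.0, zmax=-1.0):
    """Standardized daily index from half-hourly ADCP profiles.

    profiles: dict day -> list of half-hourly profiles; each profile is a list of
    (z, u, v) tuples with z in metres, negative downward (hypothetical layout).
    """
    daily = {}
    for day, day_profiles in sorted(profiles.items()):
        column_means = []
        for prof in day_profiles:
            # Vertical average of the projected current over the retained depth range only.
            vals = [project_normal(u, v, theta_deg) for z, u, v in prof if zmin <= z <= zmax]
            if vals:
                column_means.append(mean(vals))
        if column_means:
            daily[day] = mean(column_means)  # daily average of the depth-averaged U_jul
    values = list(daily.values())
    mu, sigma = mean(values), pstdev(values)  # population standard deviation (assumption)
    return {day: (val - mu) / sigma for day, val in daily.items()}
```

In this sketch, bins outside the retained depth range (e.g. a bin at −90 m) are simply ignored before averaging, mirroring the exclusion of the boundary layers.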
2.2. Altimetry X-TRACK Dataset
An altimetry dataset covering the period of the JULIO measurements and the region around the GoL has been compiled by collecting all available measurements from X-TRACK [15]. X-TRACK is a post-processing software, developed by the Center of Topography of the Ocean and Hydrosphere in Toulouse, France, dedicated to improving satellite altimetry in coastal areas. As shown by [15], the most effective improvements in the X-TRACK software are the efficient detection of outliers in the altimetric corrections and the better reconstruction of missing or rejected correction values. The authors paid special attention to the determination of the ionospheric, wet tropospheric and sea state bias corrections in coastal areas. Recently, X-TRACK has become a multi-mission altimetry product covering all coastal oceans [16]. Moreover, the current version of X-TRACK includes the Level 2 ALES (Adaptive Leading Edge Subwaveform) retracked product, as well as state-of-the-art altimetric corrections. Finally, a new high-resolution (20 Hz) Level 3 product has been made available for research purposes. In the present work, all Jason 2 (J2) and Saral/Altika (SA) tracks in the area 38.5°N–43.7°N, 1°E–8°E, close to the GoL, have been selected. All altimetry data are at 1 Hz resolution and have been used without any further noise-reduction filtering.
Table 1 shows some relevant characteristics of each selected track. The selected area is relatively large, and some of the selected tracks are relatively far from JULIO (in some cases more than 250 km). This choice was made to obtain sufficiently frequent temporal sampling from the two altimeters, whose repeat periods are 10 days (J2) and 35 days (SA).
2.3. SYMPHONIE Model Simulations
The 3-D primitive equation coastal ocean model SYMPHONIE, described in detail in [17,18], was implemented in a regional configuration [19]. This model is based on the hydrostatic assumption and the Boussinesq approximation. For a detailed description of the SYMPHONIE model, see [9] and the references therein.
The SYMPHONIE dataset is built from 10 years of daily simulations (09/01/2001–31/12/2011) over the northern Mediterranean Sea. The simulation domain is a regular grid rotated 31 degrees counterclockwise, with a horizontal resolution of about 3 km. The vertical resolution of the model varies with the bathymetry, as the vertical levels are defined in sigma coordinates.
In this work, the SYMPHONIE simulations have been used for several purposes. First, the simulated sea surface heights have been extracted along each selected altimetry track, and the modeled Sea Level Anomaly (SLA) has been calculated relative to the mean over the 10-year simulation. Second, following [5], the volume flux through the 200 m isobath in the eastern GoL (F_200) has been calculated by summing the fluxes of SYMPHONIE currents interpolated every 1 km along the curve shown in Figure 1. The currents have also been interpolated every 4 m in the vertical. At each interpolation point, the azimuth (az) of the direction normal to the curve has been calculated, together with the component of the current in that direction, i.e., the current normal to the 200 m isobath:

$$u_n = u \sin(az) + v \cos(az)$$

where u and v are the zonal and meridional components of the current. The flux is therefore the sum, over all interpolation points of the vertical section along the 200 m isobath, of the normal currents multiplied by the area of the corresponding section elements (1 km × 4 m ≈ 4000 m²). The normalized flux (F^n_200) is then calculated by subtracting the temporal mean of F_200 and dividing by its standard deviation, where both the mean and the standard deviation are calculated over the full 10-year SYMPHONIE simulation.
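A minimal Python sketch of the F_200 computation and its normalization is given below. It assumes a hypothetical layout (one entry per interpolation point along the isobath, holding the normal azimuth and the vertically interpolated velocities) and uniform 1 km × 4 m section elements; the sign of the flux simply follows whichever normal azimuth is supplied.

```python
import math
from statistics import mean, pstdev


def normal_component(u, v, az_deg):
    """Component of (u, v) along the direction with azimuth az_deg (clockwise from North)."""
    az = math.radians(az_deg)
    return u * math.sin(az) + v * math.cos(az)


def section_flux(points, dx=1000.0, dz=4.0):
    """Volume flux (m^3/s) through the section.

    points: list of (az_deg, [(u, v), ...]), one entry per interpolation point along
    the isobath (every dx metres), with velocities interpolated every dz metres
    (hypothetical layout).
    """
    cell_area = dx * dz  # about 4000 m^2 per section element
    total = 0.0
    for az_deg, column in points:
        for u, v in column:
            total += normal_component(u, v, az_deg) * cell_area
    return total


def normalize(series):
    """Standardize a flux time series with its own temporal mean and standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(f - mu) / sigma for f in series]
```

Applied to the daily model output, `section_flux` would give the F_200 time series, and `normalize` over the full simulation period would give F^n_200.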
2.5. Random Forests
The random forest algorithm (RFA), introduced by Breiman [20], is an ensemble learning method that combines the ideas of bootstrap aggregating (bagging) and the random subspace method to construct randomized decision trees with controlled variation.
In detail, a random forest applies a recursive partitioning method to a reference dataset (the learning dataset) to build classification and regression trees (decision trees), which predict continuous dependent variables (regression) or categorical dependent variables (classification).
Decision trees are represented by a set of questions (if-then logical split conditions on the input variables) that split the learning sample into smaller and smaller parts. The final result is a tree with decision nodes and leaf (terminal) nodes. A decision node has two or more branches, each representing values of the predictor variable tested. The leaf nodes represent the final choice (the response) for all instances that reach that node.
The bagging technique builds multiple trees from random bootstrap replicas of the learning dataset, which considerably improves prediction accuracy over a single decision tree. In regression, the final prediction of the random forest is the average of the predictions of the individual trees.
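Bagging with random feature selection can be illustrated with a toy ensemble of depth-1 regression trees (stumps). This is a deliberately simplified stand-in, not MATLAB TreeBagger: real random forests grow deep trees and draw a random feature subset at every split, whereas here each stump has a single split, so the per-tree and per-split feature draws coincide. The default of 100 trees mirrors the setting used in this work; the function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, features, min_leaf=1):
    """Depth-1 regression tree: keep the (feature, threshold) with the lowest residual sum of squares."""
    best = None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            if min(len(left), len(right)) < max(min_leaf, 1):
                continue  # reject splits that leave a leaf too small (or empty)
            ml, mr = mean(left), mean(right)
            rss = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or rss < best[0]:
                best = (rss, j, thr, ml, mr)
    if best is None:  # no admissible split: the tree predicts the node mean
        m = mean(y)
        return lambda row: m
    _, j, thr, ml, mr = best
    return lambda row: ml if row[j] <= thr else mr


def fit_bagged(X, y, n_trees=100, n_features=1, min_leaf=1, seed=0):
    """Bagged ensemble of stumps, each trained on a bootstrap replica and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap replica (sampling with replacement)
        feats = rng.sample(range(p), n_features)  # random subspace
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats, min_leaf))
    # Regression forest: average the predictions of the individual trees.
    return lambda row: mean(tree(row) for tree in trees)
```

Because each tree sees a different bootstrap sample, the individual stumps disagree, and averaging them smooths the prediction compared with any single tree.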
An important step in building a decision tree consists of determining the predictor variables that best split the training data and the splitting criteria at the decision nodes. To this end, the homogeneity of the data contained in each node is considered: if all data in a node have identical values, homogeneity is maximal (impurity is minimal). Therefore, the splitting procedure must maximize the reduction of impurity between the initial node (parent node) and the nodes resulting from the split (child nodes). Several criteria have been proposed for measuring impurity (entropy, Gini index, chi-square, residual sum of squares); in regression problems, the residual sum of squares is used:

$$i(t) = \frac{1}{N_t} \sum_{j \in t} \left( y_j - \bar{y}(t) \right)^2$$

where $y_j$ is the value of the response variable for sample j, $\bar{y}(t)$ is its mean over node t, and $N_t$ is the number of samples in node t.

The information gain is evaluated as the difference between the impurity values before and after the split:

$$\Delta i(t_p) = i(t_p) - P_l \, i(t_l) - P_r \, i(t_r)$$

where $t_r$ and $t_l$ are the right and left child nodes of the parent node $t_p$, and $P_l$ and $P_r$ are the probabilities of the left and right nodes (i.e., the fractions of the parent's samples sent to each child). Therefore, the splitting procedure selects the split that maximizes the impurity change $\Delta i(t_p)$.
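The split criterion can be checked numerically with the short Python sketch below, which scans candidate thresholds on a single predictor and keeps the one with the largest impurity decrease, using the node-averaged residual sum of squares as impurity. The function names are hypothetical and the sketch is only an illustration of the criterion, not the TreeBagger implementation.

```python
from statistics import mean


def node_impurity(y):
    """Mean squared deviation from the node mean (residual sum of squares divided by node size)."""
    m = mean(y)
    return sum((v - m) ** 2 for v in y) / len(y)


def split_gain(y, go_left):
    """Impurity decrease i(t_p) - P_l i(t_l) - P_r i(t_r) for a boolean split mask."""
    left = [v for v, g in zip(y, go_left) if g]
    right = [v for v, g in zip(y, go_left) if not g]
    if not left or not right:
        return 0.0
    p_l, p_r = len(left) / len(y), len(right) / len(y)
    return node_impurity(y) - p_l * node_impurity(left) - p_r * node_impurity(right)


def best_threshold(x, y):
    """Scan thresholds on one predictor (excluding the maximum, which leaves the right node empty)."""
    best_thr, best_gain = None, float("-inf")
    for thr in sorted(set(x))[:-1]:
        gain = split_gain(y, [xi <= thr for xi in x])
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain
```

With this node-averaged impurity, P_l i(t_l) equals the left residual sum of squares divided by the parent size, so maximizing the gain is equivalent to maximizing the plain reduction in residual sum of squares. For example, `best_threshold([1, 2, 3, 4], [0, 0, 10, 10])` selects the threshold 2, which separates the two groups perfectly.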
For many problems, the performance of random forests is very similar to that of other machine-learning techniques (e.g., boosting, neural networks), but random forests are simpler to train and tune. They are therefore popular and have been applied in various fields of science and technology, such as biology, meteorology, medicine and finance [21]; a review of random forest applications in remote sensing can be found in [22].
2.6. Training of the Algorithm
In this work, an algorithm based on random forests has been developed. It has been trained using simulated SLA and horizontal winds as inputs and the volume flux through the 200 m isobath (F_200) as the target. In particular, an independent random forest has been trained for each selected altimetry track (or track segment) listed in Table 1. The input variables for training the random forest algorithms are:
Modeled SLAs extracted from the SYMPHONIE simulations (see Section 2.3) at the coordinates corresponding to the satellite altimetry tracks.
Horizontal winds at 10 m above the surface from ECMWF ERA Interim (see Section 2.4), for the same day and for 1, 2, 3 and 4 days before the event.
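Assembling one training sample per day from these two input groups can be sketched as follows. The dictionary layouts and function names are hypothetical, and each day's wind is represented by a single (u10, v10) pair purely as a simplifying assumption; the text does not specify how the wind field is sampled spatially.

```python
def build_features(day, sla_track, wind, lags=(0, 1, 2, 3, 4)):
    """Input vector for one track and one day.

    sla_track: dict day -> list of modeled SLA values at the track points (hypothetical layout)
    wind: dict day -> (u10, v10) representative 10 m wind (hypothetical layout)
    """
    features = list(sla_track[day])
    for lag in lags:  # same day, then 1 to 4 days before
        u10, v10 = wind[day - lag]
        features.extend([u10, v10])
    return features


def build_training_set(days, sla_track, wind, flux, lags=(0, 1, 2, 3, 4)):
    """Pair each usable day's inputs with its target flux (flux: dict day -> value)."""
    X, y = [], []
    for d in days:
        if d not in sla_track or d not in flux or any(d - lag not in wind for lag in lags):
            continue  # skip days without the SLA, the target or the full wind history
        X.append(build_features(d, sla_track, wind, lags))
        y.append(flux[d])
    return X, y
```

Days are represented here as consecutive integers, so `day - lag` indexes the preceding days; one such training set would be built per track.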
The volume fluxes predicted by the RFAs (F_200) have been normalized (F^n_200) to facilitate comparison with J_ind. F^n_200 has been calculated using the mean and standard deviation over the full 10-year SYMPHONIE simulation, as defined in Section 2.3.
In this work, each RFA (one per track) has been configured with 100 regression trees and a minimum leaf size (i.e., the minimum number of observations per tree leaf, or terminal node) of 5. Both settings were chosen after sensitivity studies. Before reaching the optimal result, other sets of inputs and RFA structures were tested. In particular, input selection started from a model including the SLAs and the geostrophic currents along the track and the horizontal winds from the current day up to 10 days before the event; the inputs were then eliminated one by one until the performance of the RFA started to decrease. In terms of RFA structure, two types of RFA were tested: RFA for regression and RFA for classification, the latter being used to discriminate the “intrusion event” class from the “non-intrusion event” class. The optimal RFA selected is an RFA for regression, since all the sensitivity tests with the RFA for classification gave poorer results. All algorithms have been trained using the MATLAB 2017 TreeBagger class.
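The input-reduction procedure can be sketched as a greedy backward elimination. This is one possible reading of the procedure, stated as an assumption: at each step, the input whose removal harms the skill least is dropped, and the loop stops as soon as every possible removal degrades the skill. The `score` callable is hypothetical and would, in practice, retrain a forest on the given inputs and return a validation skill such as a correlation.

```python
def backward_elimination(inputs, score):
    """Greedy backward input selection (one reading of the procedure, an assumption).

    inputs: list of input names; score: callable(list_of_inputs) -> skill (higher is better).
    """
    selected = list(inputs)
    best = score(selected)
    while len(selected) > 1:
        # Try removing each remaining input and keep the least harmful removal.
        trials = [(score([k for k in selected if k != name]), name) for name in selected]
        trial_score, name = max(trials)
        if trial_score < best:
            break  # every removal degrades performance: stop
        selected.remove(name)
        best = trial_score
    return selected, best
```

Removals that leave the skill unchanged are accepted, which favors the more parsimonious input set.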
2.7. Testing the Algorithm
The RFA was tested against the three time series of current measurements at JULIO. When testing the RFAs, the input data are the 10 m horizontal winds from ECMWF ERA Interim and the SLAs measured by Jason 2 and Saral/Altika. To bring the altimetry SLA to the same order of magnitude as the SYMPHONIE SLA, the altimetry SLA has been adjusted: for a given track and position (x), the corrected Sea Level Anomaly (SLA_corr) is obtained from the corresponding SLA observed by altimetry (SLA_Obs) by correcting it for the difference between the simulated and observed temporal means and multiplying it by the ratio of the simulated to observed standard deviations, where the subscripts Obs and Sym stand for “observed” and “simulated”. Both the mean and the standard deviation of SLA_Obs have been calculated over the full available track record in the X-TRACK dataset. The algorithm has been tested against the JULIO measurements by comparing J_ind with F^n_200. It must be highlighted that, even though both J_ind and F^n_200 have been calculated for the detection of intrusions, some intrinsic differences exist between them. Moreover, each index has been normalized over its own full period of availability; although both periods are long, they do not coincide.
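The SLA adjustment at one track position can be sketched in Python as below. The exact form is an assumption: the sketch uses the standard mean-and-variance matching SLA_corr = mean_Sym + (SLA_Obs − mean_Obs) · std_Sym / std_Obs, under which the corrected series has the model's mean and standard deviation at that position. Applying the mean shift before the scaling instead would give a different result, so this should be checked against the original formula.

```python
from statistics import mean, pstdev


def correct_sla(sla_obs, obs_record, sym_record):
    """Rescale observed SLA values at one track position to the model statistics.

    Assumed form: SLA_corr = mean_Sym + (SLA_Obs - mean_Obs) * std_Sym / std_Obs.
    obs_record: full X-TRACK record at this position; sym_record: 10-year SYMPHONIE series.
    """
    m_obs, s_obs = mean(obs_record), pstdev(obs_record)
    m_sym, s_sym = mean(sym_record), pstdev(sym_record)
    return [m_sym + (v - m_obs) * s_sym / s_obs for v in sla_obs]
```

The function would be applied independently at each along-track position x, with statistics computed from that position's own records.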
4. Discussion
This work consists of the application of a machine learning technique to the detection of current intrusions into the Gulf of Lion.
The RFA for the detection of current intrusions is based on two main inputs: the Sea Level Anomalies from altimetry datasets and horizontal winds from a global meteorological model. It is well known that the geostrophic component of the current perpendicular to the track can be derived from altimetry data. Therefore, the information available from altimetry for the detection of current intrusions (i.e., the current anomaly) has two limitations: first, it accounts only for the geostrophic component of the current and, second, only in one direction (perpendicular to the satellite track), which may or may not coincide with (or even be similar to) the main direction of the current intrusion. Furthermore, strong winds are expected to affect the relative importance of the geostrophic and ageostrophic currents, increasing the ageostrophic component. This should be particularly true for northwesterly winds (see the discussion in BPO16). Nevertheless, a recent study combining ocean glider, coastal radar, vessel survey, in situ and numerical meteorological data concluded that the Northern Current is mostly zonal and in geostrophic balance, even at the surface [23]. Moreover, the altimetry data of greatest importance for training the algorithm are the SLAs very near the coast. Unfortunately, this is exactly where altimetry is most problematic, due to the potential land contamination of the radar waveform and, secondarily, of the radiometer signal. Fortunately, over the last 10 years altimetry has made great progress in the quality of measurements in the coastal region, thanks to the development of targeted retrackers and of corrections and data filtering specific to coastal areas. This progress opens new paths for several applications in coastal areas, and this work can be considered an example. The paramount importance of the X-TRACK data in this work must be highlighted, as every attempt to replicate these results with the Geophysical Data Records (GDRs) provided by space agencies was unsuccessful. The main reasons for this failure were the absence of altimetry data close to the coast and the high noise level in the immediately surrounding areas.
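Why altimetry constrains only the cross-track geostrophic component can be made concrete. From geostrophic balance, f k × u = −g∇η, so u = (g/f) k × ∇η; projecting onto the direction 90° to the left of the along-track coordinate s gives v⊥ = (g/f) ∂η/∂s, with f = 2Ω sin φ. The along-track component is not recoverable from a single track. The Python sketch below is a minimal illustration using a simple forward difference and a single representative latitude (both assumptions); it is not part of the method used in this work.

```python
import math

OMEGA = 7.2921e-5  # Earth's rotation rate (rad/s)
G = 9.81  # gravitational acceleration (m/s^2)


def cross_track_geostrophic(sla, dist, lat_deg):
    """Cross-track geostrophic velocity anomaly from along-track SLA.

    sla: SLA values (m) ordered along the track; dist: along-track distance (m).
    Returns velocities (m/s) at segment midpoints, positive to the left of the
    direction of increasing distance (Northern Hemisphere, f > 0).
    """
    f = 2.0 * OMEGA * math.sin(math.radians(lat_deg))
    return [G / f * (sla[i + 1] - sla[i]) / (dist[i + 1] - dist[i]) for i in range(len(sla) - 1)]
```

At 43°N, a 1 cm SLA rise over 10 km corresponds to a cross-track velocity anomaly of roughly 0.1 m/s. Because differentiation amplifies noise, noisy near-coast SLA values translate directly into noisy velocity estimates.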
The choice of the second input used by the RFA, the horizontal winds, is based on several physically robust studies on the dynamics of coastal circulation processes in the Gulf of Lion (see BPO16, [5,24] and the references therein) and on a long series of experimental oceanographic campaigns in this area. The importance, in the RFA, of the main horizontal wind direction over the previous 1–2 days confirms the results of BPO16, where two main mechanisms were identified for the generation of current intrusions, as detailed in the introduction.
It must be highlighted that some of the limitations of the algorithm are due to discrepancies between the training and testing datasets: the training dataset is based on daily simulations, while the testing dataset is based on instantaneous satellite altimetry observations. Moreover, the test relies on J_ind, a standardized mean current projected in a given direction and measured at a single point, while the algorithm predicts the volume flux through a relatively large surface. Some limitations are probably also related to the physics of the problem, as the altimetry-based sea level anomaly is directly linked to the geostrophic current anomaly, while the volume flux (or the mean current at JULIO) is related to both the geostrophic and the ageostrophic components of the current. Last but not least, the observations that play a major role in training the algorithm are those very near the coast, where the altimetric signal is strongly affected by the presence of land.
Despite the small geographical area to which this algorithm has been applied, the methodology is easily replicable in other areas of the world. The only requirement is a long time series of ocean model simulations.
In this study, the validation of the algorithm has benefited from the availability of high-quality measurements from the bottom-moored ADCP at the JULIO site, which has been fundamental for testing the algorithm. However, the choice of satellites usable for this study was limited by the period covered by the JULIO time series. In particular, only Jason 2 and Saral/Altika data have been used, whereas more recent measurements could be compared with an RFA tuned on more recent high-resolution SAR altimeters such as CryoSat-2 [25] and Sentinel 3 A/B [26], as well as the upcoming Sentinel 6 [27], and on wide-swath altimetry missions (SWOT [28]), which should allow the monitoring of current anomalies in two directions. Hence, this work opens the door to multiple interesting future applications.
5. Conclusions
This work shows the results of a random forest algorithm applied to SLA from coastal altimetry observations (X-TRACK). The algorithm has been trained on 10 years of simulations from the SYMPHONIE model to predict intrusions of the Northern Current in the eastern part of the GoL. These intrusions are quantified by the volume flux crossing the 200 m isobath in a well-defined area. The algorithm replicates the simulated volume flux very well: the correlation between the predicted and the reference flux is higher than 0.96, and the ARMSE is lower than 19% of the interquartile range. Sensitivity analysis in the training of the algorithm revealed the high importance of the horizontal wind forcing, especially 1–2 days before the event, as well as the contribution of the Sea Level Anomalies very near the coast.
As shown, the application of the algorithm to satellite-altimetry-based SLAs has given very satisfying results. A detailed comparison with in situ data from the JULIO mooring showed that 93% of the intrusions were correctly identified, with only a few false alarms. The capability of the algorithm to predict the volume flux weakens when the horizontal wind conditions change rapidly. Finally, it is important to highlight that the use of a dataset specifically developed for coastal areas was a necessary condition for the realization of this experiment.
This study has given very promising results, and further benefits are expected in the future from the new generation of high-resolution SAR altimeters such as CryoSat-2 [25] and Sentinel 3 [26], the forthcoming Sentinel 6 [27], the wide-swath altimetry missions (SWOT [28]) and the Ka/Ku-band Copernicus Polar Ice and Snow Topography Altimeter (CRISTAL) high-priority candidate mission [29], which should strongly improve our capability to monitor relatively small-scale coastal currents.