1. Introduction
When colossal gravity-driven mass flows enter a body of water, such as a sea, a lake, or a reservoir, they sometimes generate large waves. These events are particularly relevant in coastal areas and mountainous countries. Such waves occurred, for example, in Lituya Bay in 1958 [
1] and in Vajont, Italy, in 1963 [
2]. Predicting the characteristics of waves induced by subaerial landslides is of great importance for risk management in coastal areas [
3].
Researchers have conducted experiments using physical models that try to reproduce the physical processes of impulse waves generated by subaerial landslides. They have simplified water geometry by using 2D flumes or 3D basins and idealized the sliding masses as rigid blocks [
4,
5,
6,
7,
8], granular solids [
9,
10,
11,
12,
13,
14], or viscoplastic fluids [
15,
16]. Based on reliable experimental data, a number of empirical or semi-empirical equations have been established, either by combining regression techniques with dimensional analysis [
11,
17,
18,
19] or by a scaling analysis of governing equations [
20,
21]. Most equations to date have expressed wave characteristics as power functions of several slide parameters on impact, and some have occasionally involved an additive term [
22].
One significant issue has emerged from previous research: on many occasions, empirical equations have fit well with their own experimental data, but they then exhibited large deviations from the datasets obtained by other teams, especially when different slide materials were involved [
10,
15,
16,
23]. The performances of the different equations on a given dataset remain uncertain. This uncertainty reflects the limitations of empirical equations with a given functional form. Heller and Spinneken (2013) developed generic empirical equations for blocks of various shapes [
24]. They also discussed the data discrepancies between using blocks and granular slides. Actually, none of the existing empirical equations can account for all range of materials used in experiments. Applying empirical equations may be difficult when, for instance, the slide material involves different components. A typical example has been Tang et al. (2018), who conducted experiments using blocks, granular slides and mixture of block and granular slides [
25]. Taking the viscoplastic–granular mixture as an example, the representative parameters of these two materials are the yield stress and grain diameter, respectively. Due to the current lack of understanding about how these two materials affect the underlying physics of the slide–water interaction, integrating these two parameters into one equation might be problematic if we have presumed a functional form for that equation in advance.
Another key issue is that all the existing empirical equations express wave characteristics from the parameters relating to the sliding masses on impact; none use the parameters related to the initial stage (i.e., when the mass is still on the slope and starts moving). Putting the emphasis of the parameters on impact makes it easier to control the variables and to provide a quantitative analysis; however, for engineering applications, there is a need to predict wave characteristics before the sliding has occurred. For example, in May 2009, a slight slope failure occurred on the Guopu bank of the Laxiwa reservoir, in China. Based on monitoring data, a faulted rock mass with an approximate volume of
m
showed signs of general displacement [
26]. Although there is a very small probability, should the mass drop into the reservoir, it would generate large waves which may well destroy the nearby arch dam ([
27]). In this situation, estimating the characteristics of the potential waves from information on the potential landslide (which is still at rest on the slope) is more than warranted. To study the various physical processes from the initial impact to wave propagation, Heller et al. (2009) took a holistic approach based on a theoretical analysis and semi-empirical equations [
17]. For more complex landslide materials, providing physical constraints on the mathematical operators of prediction equations formulation of empirical equations becomes more challenging.
Using an approach that did not assume the functional form of the equation in advance and relied strictly on the data alone, would be preferable for dealing with both of the above issues. To overcome the limitations of empirical equations, the present study presents a data-driven method, known as an artificial neural network (ANN) method, which has been successfully employed in other fields to cope with complicated parameters in experimental data processing and to develop highly accurate predictive models [
28,
29,
30,
31,
32,
33]. In contrast to empirical equations, in which mathematical dependence was fixed in advance, the ANN method provides an approach in which both the explanatory and explained variables in the data ultimately define their internal relationship without any prior assumptions about the equation’s functional form or physical constraints. Moreover, the model can be easily calibrated when new data or parameters become available, which makes it powerful in solving complex problems [
34]. Panizzo et al. (2005) compared the ANN method and empirical equations on a simple case (that is, predicting wave characteristics from solid block parameters on impact). The ANN method’s predictive capacities were slightly better than those of empirical equations [
35]. To the best of our knowledge, no data-driven method has been used to deal with field data. The key advantage of data-driven methods, namely, their high adaptivity to solving complex problems and dealing with complex parameters, was not further investigated.
Using the ANN method, we (i) estimate the wave characteristics from the parameters of a subaerial mass at the initial stage, when it is at rest and starts moving down the slope, and (ii) predict the wave characteristics generated by different slide mass materials (specifically, viscoplastic slides, granular slides, and mixtures of them), all within one model. For each application, we refined the inputs, outputs and network structures of the model.
3. The Artificial Neural Network Method
The ANN method is inspired by how the human brain processes information, and it is constructed from interconnected processing elements called neurons [
38] (see
Figure 3). ANNs are receiving ever greater attention because of their ability to express complex functions in a flexible form. A typical ANN model consists of three main parts: learning rules, network architecture, and an activation function. The network structure is formed of several layers: one input layer, one output layer, and one or several hidden layers, with each layer containing several neurons. Each of the neurons in a layer is connected to neurons of the adjacent layers via coefficients called weightings.
From a mathematical perspective, the principle of neural networks involves the composition of non-linear functions. Starting with a linear model, considering a dataset z and a vector of inputs x, a linear model for the output can be constructed considering , where the weighting matrix W and the bias vector are obtained by solving an optimization problem that minimizes the overall difference between z and . This process is called model training. Such a simple model may lack the flexibility to represent complex functional mapping and, therefore, intermediate variables (layers) y are introduced: and , where is a user-specified activation function, like the hyperbolic tangent. The composition of several intermediate layers results in a neural network capable of efficiently representing arbitrarily complex function forms.
In this study, we selected a one-hidden-layer network, as an example, and adopted a back-propagation algorithm to train the network. The algorithm programming was developed using Matlab. Establishing an ANN model consists of three steps: (i) preparing the required data for training the network; (ii) evaluating neural networks with different structures and choosing the optimal one; and (iii) testing the neural network’s performance using data which have not been used previously for training the network.
The back-propagation artificial neural network algorithm (BP-ANN) consists of two paths: the feed-forwards and the feed-backwards paths. The feed-forwards path is expressed by Equations (
2) and (
3).
where
,
, and
represent the input, hidden, and output layers, respectively,
and
are the bias weights for setting the threshold values,
and
temporarily represent computing results before using the activation function, and
F is the activation function applied in the hidden and output layers. For the activation function, we chose the sigmoid function, which ranges between 0 and 1 (see Equation (
4)). The activation function is defined on each layer’s neurons and is applied to the sum of the weighted inputs and to each neuron’s bias to generate the neuron output.
Equation (
5) displays the residual function for residual back-propagation training.
where
is the predefined target value and
is the residual of each output node.
E is the residual between the expected and actual output values. We used a gradient-descent strategy to adjust the weightings, aiming to obtain a minimum
E. Equations (
6)–(
9) express the weightings between the hidden and output layers.
and hence
Therefore, the weighting adjustments in the hidden and output link
can be expressed by Equation (
8).
where
is the learning rate ranging between 0 and 1. With a lower learning rate, the network model will take longer time to converge. Conversely, a higher learning rate may lead to a widely oscillating network. In addition, maintaining a consistent learning rate across the model is preferable. The new weighting
is updated by Equation (
9), where
r is the number of iterations.
Similarly, the error gradient in the links between the input and hidden layers can be derived from the partial derivative with respect to
.
where
The new weighting dominates the link between the input layer and hidden layer,
, can be updated as:
All the input data were normalized in the range between 0 and 1 using the following equation:
where
X is the raw data and
Y is the normalized data. The initial parameter settings are shown in
Table 2.
4. Results
In
Section 4.1, we validate the ANN method by comparing its prediction accuracy against empirical equations, using the experimental data generated by the viscoplastic flow. In
Section 4.2, we predict the wave characteristics from the slide mass features at rest and as it started moving (stage I in
Figure 1). In
Section 4.3, we develop an ANN model which aims to cope with the parameters of a landslide with complex properties, specifically, a mixture of cohesive and granular slide mass materials.
Each model’s performance was evaluated by its coefficient of determination (
), mean square error (MSE), and its sum of squares due to error (SSE), which are expressed as follows:
where
is the number of series of experimental data,
and
are the predicted and observed data, respectively, and
is the average of observed data.
4.1. Model Validation
Most commonly used empirical equations to predict waves generated by landslides involve the following dimensional parameters:
Based on a dimensional analysis or a scale analysis, the scaled wave characteristics can be expressed as a function of several dimensionless groups:
where
X represents the scaled wave characteristics (e.g., the scaled maximum wave amplitude, wave height, wave length, wave period);
indicates the explanatory variables selected, where
N is the number of explanatory variables.
The predicting equations developed by Zitti et al. [
21] were the best fit with our experimental data (see Equation (
20)).
where
, and
is the slide mass Froude number,
is the scaled slide mass thickness, and
is the scaled impacted slide mass, where
B is the width of the flume.
The coefficients of explanatory variables
and
were acquired by fitting the experimental data based on a linear regression technique. The empirical equations of
and
for the present study were:
Using the same database and explanatory variables as Equation (
21), we modeled the experimental data using our ANN method. Thus, the three neurons in the input layer and the two neurons in the output layer were:
Of the 291 samples of Carbopol mass slides in the experimental database, 80% (233 samples) were selected as training data for model construction and 20% (58 samples) were saved as test data for model validation, providing an independent measure of ANN performance after training. Samples for each group were selected randomly.
We used a basic three-layer network structure, namely, one input layer, one hidden layer, and one output layer. To select the optimal number of neurons in the hidden layer, we set a random number of neurons and ran the program, determining their performance by
. Each run was repeated five times and
was calculated by eliminating the maximum and minimum coefficients of determination and averaging the results of the remaining three tests. As shown in
Figure 4, the
of both
and
reached their maximum values when the hidden layer contained six neurons. Thus, the optimum network for the present study was a three–six–two structure (input–hidden–output).
Model training was constrained by the following indicators: the maximum epoch number was initially set to 100; the objective MSE was set to
; the minimum gradient was set to
; and the maximum number of validation fails, which represents the number of successive iterations that the validation performance fails to decrease, was initially set to six. Training would stop once one of the indicators mentioned above reached its initial value; for instance, in the present study, training stopped when the number of validation fails reached 6.
Figure 5 illustrates the evolution of these indicators (i.e., gradient, validation fails, and MSE) at each epoch until the training is stopped.
In
Figure 5c, the MSEs of the training data and the test data were counted separately. The curves of the evolution of the MSE for these three data series were very close, indicating the model’s high level of adaptability. The best validation performance was an MSE = 0.00025337 at epoch 43, and the training terminated at epoch 48 as the number of validation fails reached six. The gradient = 0.0011736 at epoch 48.
Figure 6 displays a histogram of the residuals between the predicted
and the observed
. The probability density of the residuals approximately follows a Gaussian distribution.
Figure 7 displays the observed
and
versus the predicted data modeled using the ANN model and the empirical equations. The
of
and
of the test data in the ANN model were 0.9682 and 0.9479, respectively; the
of
and
of the test data predicted by the empirical equations were 0.9214 and 0.9062, respectively. The ANN model outperformed the best-fitting empirical equation. In addition, the
of
was always slightly higher than that of
, in both models, which may result from measurement errors in the experiments which have been defined in our previous publications [
15,
16].
4.2. Prediction of Wave Characteristics from Initial Slide Parameters
Previously, empirical or semi-empirical equations determined wave characteristics from the mass slide features on impact (illustrated as stage II in
Figure 1), and most equations were established in the form of the power-law equations of several dimensionless groups (see Equation (
20)). When we predict the wave characteristics from the slide features at stage I, it is difficult to provide physical constraints on the mathematical structure of predictive equations because of the complex physical mechanisms involved in the whole process. In this case, assuming a functional form for the prediction equation in advance might be problematic. Therefore, a data-driven approach that relies strictly on the data rather than on a fixed form equation is preferable, and the ANN method thus fits this requirement. The process involves the following parameters:
The slide mass’s rheological parameters include
,
K, and
n. Although they have little effect on the slide mass–water interaction and wave formation [
16], they have great effects on the slide mass flowing down the slope. The Pearson correlation coefficients between each pair of these three parameters were all above 0.9 (see
Table 3), indicating that all three parameters correlated highly. We therefore selected the yield stress
, namely the stress at which the material starts yielding, to represent the rheological parameters.
Figure 8 provides a first insight into how the wave characteristics depend on the rheological properties of the slide mass and on its parameters at the initial stage. It shows experimental data with the yield stress set at
= 41 Pa, 62 Pa, and 80 Pa. Overall, the maximum wave amplitude
increased with rising yield stress
and initial slide mass
, and decreased with slope length
.
and
are aspect ratios for the
l-axis to the
y-axis, and for the
s-axis to the
y-axis, respectively. The natural choice for defining the typical scale introduced by these ratios was to take the dimensions of the reservoir:
,
, and
. The Bingham number can be expressed as
, which is a dimensionless yield stress (relative to the viscous forces). We assumed that the viscoplastic flow reached a near-equilibrium regime, where viscous forces balanced gravity acceleration, and the velocity scale was then
. The Bingham number then became
(see [
40] for further information).
The dimensions involved in Equation (
23) are length [L], mass [M], and time [T]. We chose three scaling parameters: water density
, still-water depth
, and gravitational acceleration
g [
19]. Thus, the dimensionless form can be expressed as:
where
is the scaled free-water surface elevation. As in
Section 4.1, we selected the scaled maximum wave amplitude
and height
to represent the water surface elevation. As the slide mass density
and water density
were constant throughout our experiments,
can be eliminated. There were therefore five neurons in the input layer and two neurons in the output layer:
five inputs: , , , , and
two outputs: and
The modeling method used was the same as in
Section 4.1. First, based on the optimal number of hidden neurons determined, a five–ten–two network structure was developed; then, the experimental data were divided into training data and test data; finally, the ANN model was trained using the training data and validated using the test data. The
, MSE, and SSE of
were 0.8983, 0.00089, and 0.2591, respectively. The
, MSE, and SSE of
were 0.8497, 0.00295, and 0.8483, respectively. Because
, the present model is validated. Yet compared with the scenario that predicted wave characteristics from the slide mass parameters on impact, the prediction accuracy of the ANN method in the present scenario was lower. The more complicated the physical process is, the more information could be lost in prediction.
4.3. Waves Generated by Viscoplastic–Granular Mixtures
Most studies have mimicked landslides in the real world by using a single slide mass material, including granular slides, viscoplastic materials, or solid blocks. However, many landslides in the natural world are mixtures of granular and viscoplastic materials. In the present study, we conducted experiments using mixtures of polymer–water balls and Carbopol, with the percentage of Carbopol in volume varying symmetrically (0%, 20%, 50%, 80% and 100%).
Figure 9 shows raw images, captured by a high-speed camera, of Carbopol, polymer–water balls, and mixtures of them, entering the body of water. These represented landslides with different degrees of cohesion.
As shown in
Figure 10, larger waves are generated with higher proportions of Carbopol in the mixture, which implies that the slide mass material’s composition influenced wave generation. Here, to provide identical criteria for all slide mass materials, we quantified the slide mass properties using a universal dimensionless group named the
Impulse product parameter P, which was proposed by [
12]:
where
,
, and
denote the same parameters as in Equation (
20).
One issue which should be noted is that the properties of granular slides are usually represented by their grain diameters, whereas the rheological behavior of viscoplastic materials is commonly described using yield stress. It is difficult to integrate these two parameters into one equation in the form of a power-law equation. To overcome this limitation and provide a compatible model for these parameters, we applied the ANN method so as to avoid assuming the functional form of a prediction equation. Here, we predicted the wave characteristics from the mixture’s parameters on impact.
As highlighted above, the dimensionless parameters in modeling experiments with a single material commonly involve the slide Froude number , relative slide mass , and the relative slide thickness . To quantify the properties of mixed viscoplastic and granular slides, we introduced the following dimensionless groups: the Bingham number Bi, which represents the rheological properties of a cohesive material; the scaled diameter of the granular slide mass , where is the diameter of a granular particle; the volume ratio of the viscoplastic material in the mixture , where is the volume of the viscoplastic slide mass and is the volume of the granular slides; and the density ratio between the two materials , which is a constant in the present study.
Hence, the input layer contained six neurons
,
,
, Bi,
, and
, and the output layer contained again
and
. Using the same method presented in
Section 4.1, the number of hidden neurons was determined, and the network’s optimum structure was six–eight–two. The
, MSE, and SSE of
were 0.9325, 0.0072, and 0.2172, respectively. The
, MSE, and SSE of
were 0.9173, 0.00178, and 0.6154, respectively. As
of both
and
were greater than 0.8, the model can be considered as valid. The predicted
and
are illustrated against the experimental data in
Figure 11.
6. Conclusions
This study applied an artificial neural network (ANN) method—one of the most commonly used machine learning methods—to predict the characteristics of waves generated by gravity-driven slide masses. Laboratory experiments were conducted using a viscoplastic material (Carbopol), a granular material (polymer–water balls), and mixtures of them. After validating the ANN model by comparing its prediction accuracy with that of empirical equations, we applied the model to two scenarios: (i) predicting wave characteristics from the parameters of landslides initially at rest on the slope and (ii) integrating the parameters of different categories of slide mass material into one model, i.e., a Bingham number for the viscoplastic material and the grain diameter for the granular material. For each scenario, the inputs, outputs and network structures of the ANN model were refined. In the first scenario, the for the scaled maximum wave height and scaled maximum wave amplitude were 0.8983 and 0.8497, respectively, and in the second scenario, the for and were 0.9325 and 0.9173, respectively. As a purely data-driven method, this ANN method was easy to adapt when new parameters were included or fresh constraints occurred.