Article

Plastic Constitutive Training Method for Steel Based on a Recurrent Neural Network

College of Construction Engineering, Jilin University, Changchun 130021, China
* Author to whom correspondence should be addressed.
Buildings 2024, 14(10), 3279; https://doi.org/10.3390/buildings14103279
Submission received: 21 September 2024 / Revised: 13 October 2024 / Accepted: 15 October 2024 / Published: 16 October 2024
(This article belongs to the Special Issue Intelligent Design, Green Construction, and Innovation)

Abstract

A training method for deep learning plastic constitutive models of steel was studied based on recurrent neural network (RNN) models, with the aim of improving the efficiency with which such models are built and promoting their application in practical engineering. Two linear hardening constitutive datasets of steel were constructed using a Gaussian stochastic process. The RNN, long short-term memory (LSTM), and gated recurrent unit (GRU) architectures were used as training models. The effects of the data pre-processing method, neural network structure, and training method on model training were analyzed. The prediction ability of the models for sequences of different scales and the corresponding data demand were evaluated. The results show that LSTM and the GRU are more suitable than the original RNN for stress–strain prediction. The marginal benefit of stacking neural network layers gradually decreases, and the hysteresis curve can be accurately predicted by a two-layer network; the optimal structures of the two models are A50-100 and B150-150. The prediction accuracy of the models increased as the batch size decreased and the number of training batches increased, but the training time also increased significantly. The decay learning rate method balanced prediction accuracy and training time, and the optimal initial learning rate, batch size, and number of training batches were 0.001, 60, and 100, respectively. The deep learning plastic constitutive model based on the optimal parameters can accurately predict the hysteresis curve of steel, and the prediction abilities of the GRU are 6.13, 6.7, and 3.3 times those of LSTM for short, medium, and long sequences, respectively.

1. Introduction

In engineering structural design, a high-precision finite element method (FEM) is needed to ensure the reliability of steel structures [1,2,3]. An accurate material constitutive model is the basis of reliable simulation results. At present, elastic constitutive models are often used in the design of steel structures to achieve fast design and save computing resources, without considering the plastic stage. Elastic constitutive behavior differs from the real mechanical behavior of steel, so a design based on such calculations is inaccurate and overly conservative [4,5]. Using a plastic constitutive model in the FEM calculation of steel structures inevitably involves complex integration algorithms and iterative procedures, which are time-consuming and whose convergence is not guaranteed [6,7]. Therefore, determining how to improve the efficiency of structural calculation while using a plastic constitutive model is a key objective in the industry [8,9].
With the rapid development of artificial intelligence (AI) technology, this problem can be addressed. AI is essentially a neural network structure formed by using deep learning algorithms to fit and generalize from large amounts of existing data [10]. The structure of these neural networks is highly complex, allowing results to be generated rapidly as new data are input [11]. Accordingly, if a deep learning algorithm is used to learn the stress–strain behavior of steel in the plastic stage, a neural network can be trained to replace the existing plastic constitutive model, realizing an end-to-end mapping from an input strain sequence to an output stress sequence. This can solve the existing plastic constitutive model's problems of repeated iterations, slow calculation, and poor convergence [12,13]. Early work along these lines used artificial neural networks in place of material plasticity models, taking the strain increment and the stress and strain of the previous or first n incremental steps as inputs and the stress increment as output [14,15]. More recently, researchers have applied RNNs to predict the hysteresis curves of metallic materials under cyclic loading [16,17,18].
While using deep learning models to replace the plastic constitutive models of metallic materials is certainly efficient, their adoption in practice has been slow. This is largely because established research has been devoted mainly to algorithmic innovation, proposing models with higher training efficiency and more accurate prediction. Practical issues, such as the form of the neural network structure, the processing of raw data, and the training procedure, have been described but not studied in depth. This makes it difficult to apply deep learning frameworks to FEM computational problems. Detailed research on the parameter settings and training methods required to apply such models is needed to ensure their reliability. Engineers must be given parameter-setting recommendations to avoid the trial-and-error cost of multiple engineers repeatedly tuning parameters to solve the same problem. The prediction ability of different models also needs to be understood, to clarify their capacity for hysteresis curves of different lengths and the corresponding data demands. These topics have long been neglected by researchers in the field, hindering the development and application of deep learning plastic constitutive models.
Focusing on the above problems, this paper studies the plastic constitutive model training method for steel based on an RNN. Linear isotropic hardening and linear kinematic hardening models were used for numerical experiments. A method of randomly generating stress–strain sequence datasets is proposed, which strictly follows the principles of plasticity mechanics and can create larger datasets for neural network training than traditional mechanical tests. Three recurrent neural networks suitable for time-series prediction were used to train the steel plasticity constitutive model. The effects of the dataset and neural network model parameters on model training are discussed, and an optimal training method for neural networks is given for the hysteresis curve prediction problem. This study provides a more refined method of parameterization and training of RNNs for stress–strain sequence prediction, which avoids the trial-and-error cost of repeated parameterization for this type of problem.

2. Basic Principles

2.1. The Computational Procedure of the Constitutive Model in the FEM

The traditional plastic constitutive model of materials is based on phenomenological theory. The mechanical properties of materials are identified through material tests and described by a set of mathematical expressions that can reasonably express these characteristics. Taking the most basic metal plastic constitutive model as an example, the stress–strain curve (i.e., constitutive relation) is obtained by a uniaxial tensile test and a low-cycle tension–compression test [19,20,21].
The total strain $\varepsilon$ of the material can be divided into elastic strain $\varepsilon^{e}$ and plastic strain $\varepsilon^{p}$, as shown in Equations (1) and (2). The yield criterion determines whether the material yields, as shown in Equation (3). The flow criterion determines the flow direction on the yield surface, as shown in Equation (4). The yield stress is adjusted by the hardening criterion, and the internal variable is updated, as shown in Equations (5) and (6). The elastic–plastic model must also satisfy the required discrete and complementarity conditions [22], as shown in Equation (7).
$\varepsilon = \varepsilon^{e} + \varepsilon^{p}$ (1)
$\sigma = E\varepsilon^{e}$ (2)
$\Phi(\sigma, \sigma_y) = |\sigma| - \sigma_y$ (3)
$\dot{\varepsilon}^{p} = \dot{\gamma}\,\mathrm{sign}(\sigma)$ (4)
$\sigma_y = \sigma_y(\bar{\varepsilon}^{p})$ (5)
$\dot{\bar{\varepsilon}}^{p} = \dot{\gamma}$ (6)
$\Phi \le 0, \quad \dot{\gamma} \ge 0, \quad \dot{\gamma}\Phi = 0$ (7)
where $E$ is Young's modulus; $\sigma$ is the total stress; $\varepsilon$ is the total strain; $\Phi$ is the yield function; $\sigma_y$ is the yield stress; $\dot{\varepsilon}^{p}$ is the plastic strain rate; $\dot{\gamma}$ is the plastic multiplier; $\bar{\varepsilon}^{p}$ is the equivalent plastic strain; and $\mathrm{sign}(\cdot)$ is the sign function, with $\mathrm{sign}(a) = +1$ if $a \ge 0$ and $-1$ if $a < 0$.
In FEM calculations, the material constitutive model is needed when updating the stress and internal variable state and when calculating the tangent stiffness matrix. When the stress and internal variable state is updated, the stress $\sigma_{n+1}$ and internal variable $\alpha_{n+1}$ at $t_{n+1}$ are obtained from Equations (8) and (9) within an incremental time step $[t_n, t_{n+1}]$ and then used in the assembly of the internal force of the element, as shown in Equation (10). When the tangent stiffness matrix is calculated, the consistent tangent modulus is updated by Equation (11) and used in the next iteration of the FEM equilibrium equation, as shown in Equation (12).
$\sigma_{n+1} = \hat{\sigma}(\alpha_{n}, \varepsilon_{n+1})$ (8)
$\alpha_{n+1} = \hat{\alpha}(\alpha_{n}, \varepsilon_{n+1})$ (9)
$f_{e}^{\mathrm{int}} = \sum_{i=1}^{n_{\mathrm{gausp}}} j_{i} w_{i} B_{i}^{T} \sigma_{n+1}^{i}$ (10)
$D \equiv \dfrac{\partial\hat{\sigma}}{\partial\varepsilon_{n+1}}$ (11)
$K_{T}^{e} = \sum_{i=1}^{n_{\mathrm{gausp}}} w_{i} j_{i} B_{i}^{T} D_{i} B_{i}$ (12)
where $j_{i}$ is the Jacobian determinant at Gauss point $i$; $w_{i}$ is the Gaussian quadrature weight; $f_{e}^{\mathrm{int}}$ is the internal force of the element; $D_{i}$ is the consistent tangent modulus; $B_{i}$ is the strain–displacement matrix; and $K_{T}^{e}$ is the tangent stiffness matrix of the element.
The Euler discrete scheme is needed to solve the above problem. In an incremental time step $[t_n, t_{n+1}]$, given the elastic strain $\varepsilon_{n}^{e}$ and internal variable $\alpha_{n}$ at time $t_n$, and with the strain increment set as $\Delta\varepsilon$, the elastic strain $\varepsilon_{n+1}^{e}$ and internal variable $\alpha_{n+1}$ at time $t_{n+1}$ are solved, as shown in Equations (13) and (14). Then, the stress $\sigma_{n+1}$, hardening force $A_{n+1}$, and plastic strain $\varepsilon_{n+1}^{p}$ at time $t_{n+1}$ are obtained, as shown in Equations (15) and (16).
$\varepsilon_{n+1}^{e} = \varepsilon_{n}^{e} + \Delta\varepsilon - \Delta\gamma\, N(\sigma_{n+1}, A_{n+1})$ (13)
$\alpha_{n+1} = \alpha_{n} + \Delta\gamma\, N(\sigma_{n+1}, A_{n+1})$ (14)
$\sigma_{n+1} = \bar{\rho}\,\dfrac{\partial\psi}{\partial\varepsilon^{e}}\Big|_{n+1}, \quad A_{n+1} = \bar{\rho}\,\dfrac{\partial\psi}{\partial\alpha}\Big|_{n+1}$ (15)
$\varepsilon_{n+1}^{p} = \varepsilon_{n}^{p} + \Delta\varepsilon - \Delta\varepsilon^{e}$ (16)
where $\Delta\gamma$ is the incremental plastic multiplier; $N$ is the flow vector; $\psi$ is the free-energy potential; and $\bar{\rho}$ is the reference density.
Generally, the solution algorithm is the fully implicit elastic predictor/return-mapping algorithm. After the material enters the plastic stage, the Newton iteration method needs to be used for several iterations, and the calculation time is long. The plastic constitutive model of steel based on a neural network can bypass the complex iterative steps and directly map the stress sequence from the input strain sequence, thus greatly reducing the calculation time of numerical simulation. The basic process is divided into three steps: dataset production, data pre-processing, and model training.
The specific process is shown in Figure 1. The establishment method does not require users to master more prior knowledge and only needs to place the stress and strain data obtained by experiments or numerical simulation into the RNN model. The neural network that can master the constitutive relationship of steel can be used in structural calculation after training.
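To make the contrast with the iterative algorithm concrete, the following is a minimal sketch of a one-dimensional elastic predictor/return-mapping step for linear isotropic hardening; the function name and default parameters are illustrative rather than taken from the paper's code. In the one-dimensional case the plastic multiplier has a closed form, so no Newton iteration is needed; in the general multiaxial case, this is the step where the iterations arise.

import numpy as np

def return_mapping_1d(stress_n, ep_bar_n, dstrain, E=200000.0, sigma_y0=250.0, H=50000.0):
    """One incremental step of the 1D elastic predictor/return-mapping
    update with linear isotropic hardening (illustrative sketch)."""
    stress_trial = stress_n + E * dstrain                      # Elastic predictor
    f_trial = abs(stress_trial) - (sigma_y0 + H * ep_bar_n)    # Trial yield function, cf. Eq. (3)
    if f_trial <= 0.0:
        return stress_trial, ep_bar_n                          # Elastic step: trial state is admissible
    dgamma = f_trial / (E + H)                                 # Plastic multiplier (closed form in 1D)
    stress = stress_trial - E * dgamma * np.sign(stress_trial) # Return to the yield surface
    return stress, ep_bar_n + dgamma                           # Updated stress and equivalent plastic strain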

2.2. Linear Hardening Constitutive Model for Metals

To enable the trained RNN models to predict the complex mechanical behavior of steel under cyclic loading, two classical linear hardening constitutive models are selected: the linear isotropic hardening model and the linear kinematic hardening model [22]. The elastic parts of the two models are given by Equations (1) and (2); the plastic part adopts the von Mises yield criterion, Equation (17), and the associated flow rule, Equation (18). The two models differ in their hardening criteria. The isotropic hardening criterion keeps the center of the yield surface fixed while its radius expands with the accumulation of equivalent plastic strain, which adequately reflects the cyclic hardening behavior of steel; with a linear hardening function, it takes the form of Equation (19). The kinematic hardening criterion keeps the radius of the yield surface fixed but moves its center with the accumulation of internal variables; Equation (17) then becomes Equation (20), and the flow rule becomes Equation (21). This can reflect both cyclic hardening and the Bauschinger effect. The two-dimensional states of the two models are shown in Figure 2.
$\Phi(\boldsymbol{\sigma}, \sigma_y) = \bar{\sigma} - \sigma_y = \left(\tfrac{3}{2}\,\boldsymbol{s}:\boldsymbol{s}\right)^{1/2} - \sigma_y$ (17)
$\mathrm{d}\boldsymbol{\varepsilon}^{p} = \mathrm{d}\lambda\,\dfrac{\partial\Phi}{\partial\boldsymbol{\sigma}} = \tfrac{3}{2}\,\mathrm{d}\lambda\,\dfrac{\boldsymbol{s}}{\bar{\sigma}}$ (18)
$\sigma_y(\bar{\varepsilon}^{p}) = \sigma_y^{0} + H\bar{\varepsilon}^{p}$ (19)
$\Phi(\boldsymbol{\sigma}, \sigma_y) = \left(\tfrac{3}{2}\,(\boldsymbol{s}-\boldsymbol{x}):(\boldsymbol{s}-\boldsymbol{x})\right)^{1/2} - \sigma_y$ (20)
$\mathrm{d}\boldsymbol{\varepsilon}^{p} = \tfrac{3}{2}\,\mathrm{d}\lambda\,\dfrac{\boldsymbol{s}-\boldsymbol{x}}{J(\boldsymbol{s}-\boldsymbol{x})}$ (21)
where $\bar{\sigma}$ is the von Mises equivalent stress; $\boldsymbol{s}$ is the deviatoric stress tensor; $\sigma_y$ is the yield stress; $H$ is the slope of the linear hardening function; $\mathrm{d}\lambda$ is the plastic multiplier; $\boldsymbol{x}$ is the back-stress tensor of kinematic hardening; and $J(\cdot)$ denotes the von Mises norm.
The metal material used in this paper is Q235 steel, which has an elastic modulus $E$ of 200 GPa, a Poisson's ratio $\nu$ of 0.3, an initial yield stress $\sigma_y^{0}$ of 250 MPa, and a hardening parameter $H$ of 50,000 MPa.
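As an illustration of how the two hardening criteria differ in implementation, the following one-dimensional sketch (a hypothetical helper, not part of the paper's code) updates the back stress x instead of the yield radius; with linear kinematic hardening, the radius stays at the initial yield stress while x shifts with plastic flow, reproducing the Bauschinger effect.

import numpy as np

def return_mapping_1d_kinematic(stress_n, x_n, dstrain, E=200000.0, sigma_y0=250.0, H=50000.0):
    """1D return-mapping step with linear kinematic hardening
    (illustrative sketch using the Q235 parameters in MPa)."""
    stress_trial = stress_n + E * dstrain    # Elastic predictor
    rel_trial = stress_trial - x_n           # Relative stress measured from the back stress
    f_trial = abs(rel_trial) - sigma_y0      # Yield surface of fixed radius, cf. Eq. (20)
    if f_trial <= 0.0:
        return stress_trial, x_n             # Elastic step
    dgamma = f_trial / (E + H)               # Plastic multiplier
    direction = np.sign(rel_trial)           # Flow direction, cf. Eq. (21)
    stress = stress_trial - E * dgamma * direction
    x = x_n + H * dgamma * direction         # Back stress moves with plastic flow
    return stress, x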

3. Basic Structure and Characteristics of the Recurrent Neural Network

3.1. Original Recurrent Neural Network

RNNs are applied to the establishment of metal plastic constitutive models to account for the influence of historical data [23]. By introducing a memory $h$, an RNN associates the output $y_t$ at the current time with the information $h_{t-1}$ carried over from the previous time so that the network has a memory effect, as shown in Equations (22) and (23). In the forward propagation of an RNN, the input $x_t$ at each moment is fed into the network, the memory $h_t$ at that moment is updated by Equation (23) and passed to the next moment, and the output $y_t$ at moment $t$ is calculated at the same time, as shown in Figure 3.
$y_t = g(V h_t)$ (22)
$h_t = f(U x_t + W h_{t-1})$ (23)
where $y_t$ is the output value at time $t$; $g$ is the activation function from the hidden layer to the output layer; $f$ is the activation function from the input layer to the hidden layer; $V$ is the weight matrix from the hidden layer to the output layer; $U$ is the weight matrix from the input layer to the hidden layer; $W$ is the recurrent weight matrix of the memory; and $h_t$ is the memory at time $t$.
In this way, the RNN builds a certain short-term memory effect, and the output $y_t$ at time $t$ is related to the memory at the previous time $t-1$, so that Equation (22) can be expanded into the form of Equation (24). As can be seen from Equation (24), when the input sequence is long, the data far from time $t$ hardly affect the update of the weight matrix $W$ during backpropagation, resulting in vanishing or exploding gradients and the forgetting of distant historical inputs. As a result, the RNN has a poor long-term memory effect.
$y_t = g(V h_t) = g\big(V f(U x_t + W f(U x_{t-1} + W f(U x_{t-2} + W f(U x_{t-3} + \cdots))))\big)$ (24)
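The forward pass of Equations (22)–(24) can be written in a few lines of NumPy. The sketch below is illustrative: it assumes tanh for the hidden activation f and an identity output activation for g, which the text does not fix.

import numpy as np

def rnn_forward(x_seq, U, W, V, h0):
    """Vanilla RNN forward pass following Equations (22) and (23)."""
    h, outputs = h0, []
    for x_t in x_seq:                 # One input (e.g., one strain value) per time step
        h = np.tanh(U @ x_t + W @ h)  # Eq. (23): memory update
        outputs.append(V @ h)         # Eq. (22): output at time t
    return np.array(outputs), h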

3.2. Long Short-Term Memory

To compensate for the shortcomings of the RNN, researchers proposed an RNN variant with long-term memory properties, known as LSTM [24]. A dual-line parallel memory structure is formed by adding a state quantity $c$ that stores the part of the historical input needed for long-term memory, as shown in Figure 4. LSTM controls the long-term memory state $c$ through three gate structures: the forgetting gate, input gate, and output gate. The forgetting gate determines the proportion of the previous long-term memory state $c_{t-1}$ retained in the current long-term memory state $c_t$, as shown in Equation (25).
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (25)
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (26)
$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$ (27)
$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t$ (28)
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (29)
$h_t = o_t \cdot \tanh(c_t)$ (30)
where $f_t$ is the output value of the forgetting gate; $\tilde{c}_t$ is the information formed by combining the current input with historical information, generally called the candidate state; $i_t$ is the proportion of the candidate state admitted into the long-term memory state; and $o_t$ is the output value of the output gate. $W_f$, $W_i$, $W_c$, and $W_o$ are the weight matrices of the forgetting gate, input gate, memory state, and output gate, respectively. $b_f$, $b_i$, $b_c$, and $b_o$ are the corresponding bias terms. $\sigma$ and $\tanh$ are activation functions, whose specific forms are $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, respectively.
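For concreteness, a single LSTM step implementing Equations (25)–(30) can be sketched as follows; W and b are assumed to be dictionaries of gate weight matrices and bias vectors, which is an illustrative organization rather than the layout used by any particular framework.

import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Equations (25)-(30)."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])      # Forgetting gate, Eq. (25)
    i_t = sigmoid(W['i'] @ z + b['i'])      # Input gate, Eq. (26)
    c_tilde = np.tanh(W['c'] @ z + b['c'])  # Candidate state, Eq. (27)
    c_t = f_t * c_prev + i_t * c_tilde      # Long-term memory update, Eq. (28)
    o_t = sigmoid(W['o'] @ z + b['o'])      # Output gate, Eq. (29)
    h_t = o_t * np.tanh(c_t)                # Short-term memory output, Eq. (30)
    return h_t, c_t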

3.3. Gated Recurrent Unit

The structure of LSTM is complex, and the number of trainable parameters is large, while some of its gate components contribute little to the model's capacity. Therefore, researchers simplified it and proposed the GRU, which includes only two gate structures, the update gate and the reset gate [25], as shown in Figure 5. The GRU has one less state output than LSTM, so the number of training parameters is greatly reduced, making it a lightweight time-series neural network model. The memory update of the GRU is determined jointly by the update gate and the reset gate, as shown in Equations (31) and (32), and the update formulas of the update gate and the reset gate are shown in Equations (33) and (34).
$h_t = (1 - z_t) \cdot h_{t-1} + z_t \cdot \tilde{h}_t$ (31)
$\tilde{h}_t = \tanh(W \cdot [r_t \cdot h_{t-1}, x_t] + b_h)$ (32)
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$ (33)
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$ (34)
where $z_t$ and $r_t$ are the outputs of the update gate and reset gate, respectively; $\tilde{h}_t$ is the candidate state, which summarizes the current input and historical information; $W$, $W_z$, and $W_r$ are the weight matrices of the candidate state, update gate, and reset gate, respectively; and $b_h$, $b_z$, and $b_r$ are the corresponding bias terms.
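The corresponding GRU step, again with W and b as illustrative dictionaries of weights and biases, shows why the GRU trains fewer parameters: three weight matrices instead of LSTM's four, and no separate long-term state c.

import numpy as np

def gru_step(x_t, h_prev, W, b):
    """One GRU cell step following Equations (31)-(34)."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W['z'] @ z + b['z'])  # Update gate, Eq. (33)
    r_t = sigmoid(W['r'] @ z + b['r'])  # Reset gate, Eq. (34)
    h_tilde = np.tanh(W['h'] @ np.concatenate([r_t * h_prev, x_t]) + b['h'])  # Candidate state, Eq. (32)
    return (1.0 - z_t) * h_prev + z_t * h_tilde  # Memory update, Eq. (31)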
The three kinds of RNNs have their own characteristics and different applicability to different data. Therefore, this paper comprehensively discusses the performance of the three RNNs on different data, gives their respective ranges of application, and studies their training methods.

4. Data Generation and Model Training

4.1. Data Preparation

4.1.1. Data Generation

On the basis of the two linear hardening constitutive models described in Section 2.2, a Python script was written to randomly generate stress–strain sequences for the two constitutive models. The work was divided into two parts: generating a batch of random strain loading paths through a stochastic process, and feeding these paths into the theoretical models of Section 2.2 to calculate the stress paths. The process has four steps: (1) n data points with a Gaussian random distribution are generated; (2) 50 loading steps are interpolated between these points; (3) the generated data are scaled to limit the strain to a reasonable range [−0.03, 0.03]; and (4) the stress path is calculated using the theoretical formulas, Equations (17)–(21). Using this method, 283,500 random curves were generated with lengths ranging from 200 to 2000; the specific parameters are shown in Table 1. Of these, 141,750 curves were used as the training set, 70,875 as the validation set, and 70,875 as the test set. The code for generating the database is given in Appendix A.

4.1.2. Data Pre-Processing

In practice, the data used for training are rarely evenly distributed. If the data are not pre-processed before training, the gradient gaps between samples will be large, leading to poor training efficiency, gradient explosion, and gradient disappearance. The traditional standardization methods in deep learning are min–max standardization and mean–variance standardization, calculated as in Equations (35) and (36), respectively. However, the sequence data processed in this paper are strain and stress, which differ greatly in order of magnitude, and the above methods perform poorly, as shown in Figure 6. The stress and strain data obtained by mean–variance standardization still differ greatly in order of magnitude, so the deep learning model is unable to extract features, making subsequent training impossible. Although min–max standardization can scale the stress and strain data to the same order of magnitude, it maps all values onto [0, 1] and thus discards the sign of the data, whereas metallic materials inevitably experience compressive (negative) stress; this method therefore has a serious defect. Accordingly, this paper adopts the nonlinear reduction method for data pre-processing [26], calculated as in Equation (37). After processing, the stress and strain data are brought to the same order of magnitude while the negative parts of the data are preserved.
$\hat{x}_i = \dfrac{x_i - \min(X)}{\max(X) - \min(X)}$ (35)
$\hat{x}_i = \dfrac{x_i - \mu}{\sigma}$ (36)
$\hat{x}_i = \tanh\!\left(\dfrac{x_i}{X_{\mathrm{ref}}}\right)$ (37)
where $x_i$ is the input feature, $X = (x_1, x_2, x_3, \ldots, x_n)$; $\min(\cdot)$ and $\max(\cdot)$ are the minimum and maximum values; $\mu$ and $\sigma$ are the mean and standard deviation of $X$, respectively; $\hat{x}_i$ is the normalized feature; $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$; and $X_{\mathrm{ref}}$ is the nonlinear shrinkage coefficient, whose value is chosen to shrink the data to the same order of magnitude.
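The three pre-processing options can be compared directly. In this sketch, the strain and stress magnitudes and the shrinkage coefficients X_ref are illustrative values, since the paper does not report the X_ref values it used.

import numpy as np

strain = np.random.randn(1000) * 0.01  # Strains on the order of 1e-2
stress = 200000.0 * strain             # Elastic stresses on the order of 1e3 MPa (illustration)

# Min-max standardisation, Eq. (35): maps everything onto [0, 1], losing the sign
strain_mm = (strain - strain.min()) / (strain.max() - strain.min())

# Mean-variance standardisation, Eq. (36)
strain_mv = (strain - strain.mean()) / strain.std()

# Nonlinear reduction, Eq. (37): one X_ref per channel brings both channels
# to the same order of magnitude while preserving the sign of the data
X_ref_strain, X_ref_stress = 0.03, 400.0  # Illustrative shrinkage coefficients
strain_nl = np.tanh(strain / X_ref_strain)
stress_nl = np.tanh(stress / X_ref_stress)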

4.2. Model Training Method

The deep learning models in this study were trained on a laboratory workstation equipped with an Intel(R) Core(TM) i9-10980XE processor (Intel Corporation, Santa Clara, CA, USA) with 16 cores and an NVIDIA GeForce RTX 2070 SUPER graphics card (NVIDIA Corporation, Santa Clara, CA, USA) with 8 GB of video memory. The deep learning framework used is TensorFlow 2.0, which provides all the basic methods needed for deep learning training, is optimized in its calling methods, and is convenient to deploy and fast to train. Three kinds of RNNs (the original RNN, LSTM, and the GRU) were trained to predict the plastic cyclic mechanical response of steel. Stress–strain sequences of different lengths were generated as training data to study the predictive ability of the three models. Different network structures, data entry forms, and learning rate settings were used to study the influence of the model parameters on the training effect; the specific parameters are shown in Table 1. The Adam optimizer was selected, and the mean square error (MSE), calculated as in Equation (38), was selected as the model evaluation metric. A low MSE corresponds to a small gap between the predicted and true values and thus a better prediction accuracy.
$\mathrm{MSE} = \dfrac{1}{N}\sum_{n=1}^{N}\left(y_n - \hat{y}_n\right)^{2}$ (38)
where $N$ is the total number of samples, $y_n$ is the real value of the data, and $\hat{y}_n$ is the predicted value.

5. Influence Effect and Prediction Results of Model Parameters

5.1. Model Parameter Influence Effect

5.1.1. Influence of Recurrent Neural Network Structure

Stress–strain datasets generated randomly from the linear hardening constitutive model were used to study the effects of the model parameters. The performance of the neural network was evaluated in terms of the number of neurons and the number of layers. To isolate the effect of the network structure, the sequence length and dataset size were fixed at medium values: a sequence length of 1000 and a training set of 2000 groups. The data were trained with a fixed learning rate of 0.001 for 50 epochs, feeding 50 sets of data per batch. The effects of the number of neurons on model performance and training time in a single-layer network are shown in Figure 7. The influence of the number of hidden layers on model performance and training time is shown in Figure 8.
As can be seen from Figure 7, when only a single-layer neural network is used, the type of neural network essentially determines the model performance. The two networks that introduce a forgetting mechanism, LSTM and the GRU, achieve a higher model accuracy than the traditional RNN, and their training speed is more than 10 times that of the traditional RNN. However, increasing the number of neurons does not improve the performance of the model and greatly increases the training time. When the number of neurons exceeds 100, the accuracy gain approaches zero, while the training time almost doubles.
As can be seen from Figure 8, when multi-layer neural networks are introduced, the accuracy gap between the three kinds of neural networks widens further, with the GRU being the best, followed by LSTM and then the RNN. Adding depth initially brings a clear improvement in model accuracy, and all three models perform best with a two-layer network. However, stacking further layers does not improve the accuracy and leads to a significant increase in training time. Figure 8b shows that the training time of the traditional RNN is about 18 to 300 times that of the other two neural network structures. These findings indicate that the RNN is not suitable as a neural network model for medium-length sequences (sequence length of about 1000).
After studying the single-layer and multi-layer neural network models, we analyzed two-layer models with unequal numbers of neurons to further examine the influence of network topology on model training. The recurrent layers were combined in sequential and reverse orders to explore whether the model quality could be further improved. The topology is denoted Ai-j or Bi-j, where A and B represent LSTM and the GRU, respectively, i is the number of neurons in the first layer, and j is the number of neurons in the second layer, as shown in Figure 9.
As can be seen from Figure 9, no matter how the topology of the neural network changes, the model trained by the GRU is more accurate than that trained by LSTM. In addition, with the same number of layers and neurons, the training time of the GRU is generally more than 50 s shorter than that of LSTM, indicating that the GRU has an advantage for medium-scale stress–strain sequences.
A comparison among neural network models with different topologies showed that, for the GRU, the double-layer model with an equal number of neurons was more accurate. When the gap between the numbers of neurons in the hidden layers was too large, the prediction effect of the GRU was poor; for example, with 50 neurons in the first layer, the prediction accuracy decreased rapidly when the number of neurons in the second layer reached 200.
However, in LSTM, the structure with equal numbers of neurons is not optimal; a gradient in the number of neurons between adjacent hidden layers can improve the generalization ability of the model. Moreover, the sequentially arranged structure has the strongest generalization ability after training, but this effect decreases as the gap in the number of neurons between adjacent hidden layers grows, indicating that a properly arranged gradient in the network structure is appropriate; the order of arrangement does not affect the training time. However, when the difference in the number of neurons between the two layers is too large, the training time becomes excessive, the improvement weakens, and negative effects can even appear.
The arrangement of the number of neurons in the two layers also satisfies a general rule: a model whose second layer has an even multiple of the number of neurons in the first layer achieves a better prediction accuracy than one with an odd multiple, and the sequential arrangement is more conducive to improving model performance.
In summary, the optimal structure of the GRU is B150-150, an equal-neuron structure in which each hidden layer contains 150 neurons. The optimal structure of LSTM is A50-100, an unequal-neuron structure whose first and second layers contain 50 and 100 neurons, respectively.
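A two-layer Ai-j or Bi-j network of the kind compared above can be assembled in a few lines of TensorFlow; the builder function below is an illustrative helper, not code from the paper.

import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU, LSTM

def build_model(kind, i, j):
    """Build an Ai-j (LSTM) or Bi-j (GRU) two-layer recurrent network."""
    Cell = LSTM if kind == 'A' else GRU
    return tf.keras.Sequential([
        Cell(i, return_sequences=True),  # First recurrent layer with i neurons
        Cell(j, return_sequences=True),  # Second recurrent layer with j neurons
        Dense(1)                         # One stress value per time step
    ])

lstm_model = build_model('A', 50, 100)   # Optimal LSTM structure A50-100
gru_model = build_model('B', 150, 150)   # Optimal GRU structure B150-150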

5.1.2. Training Frequency and Training Batch

The training frequency and the number of training batches are two important parameters of model training. Because computer memory is limited, the dataset is usually fed to the model in several portions during deep learning training to save memory and improve training efficiency. The training frequency is controlled by the batch size, which in this model refers to the number of stress–strain sequences fed in per training step, while the training batch refers to the total number of times the dataset is trained. Here, the optimal structures of the two RNNs (LSTM: A50-100; GRU: B150-150) were used to train the medium-scale stress–strain sequences, and the two parameters were studied by varying the batch size and the number of training batches. The maximum number of training batches was set to 100, and the results are shown in Figure 10.
As can be seen from Figure 10, the MSE of the two models decreases continuously with the number of training batches, in three stages: a rapid decline segment, a transition segment, and a stable decline segment. The decline rate is initially fast, then slows, and gradually approaches zero. LSTM essentially enters the stable decline stage after 10 training batches, while the GRU does so after only 5, indicating that the GRU converges faster. In addition, the MSE descent curve of the GRU is smooth and its training process more stable, while that of LSTM keeps jumping in the stable descent stage, indicating a less stable gradient descent.
As the batch size increases, the training time decreases greatly, but the accuracy of the model also decreases. Overall, for the same batch size, the accuracy of the GRU is almost always higher than that of LSTM, although when the batch size exceeds 140, the accuracy of LSTM becomes similar to or even higher than that of the GRU. Considering training time and accuracy together, the model achieves high accuracy with a short training time when the batch size is 60. Therefore, 60 is the preferred batch size for the model.

5.1.3. Learning Rate of the Optimizer

The deep learning optimizer selected in this paper is the Adam optimizer, whose parameter update is calculated by Equation (39). The learning rate $lr$ in the formula is an important factor affecting the update of the model parameters. In this paper, the influence of this parameter is examined by comparing a constant learning rate with a decay learning rate, as shown in Figure 11.
$w_{t+1} = w_t - lr \cdot \dfrac{m_t/(1 - \beta_1^{t})}{\sqrt{V_t/(1 - \beta_2^{t})}}$ (39)
where $w_t$ and $w_{t+1}$ are the parameters before and after the update; $lr$ is the learning rate; $m_t$ and $V_t$ are the first-order and second-order momenta, respectively; and $\beta_1^{t}$ and $\beta_2^{t}$ are the correction coefficients of the first-order and second-order momenta, respectively.
As can be seen from Figure 11, the MSE training curve of the model with a small learning rate is relatively smooth, but it cannot achieve a sufficiently accurate prediction within 100 epochs. When the learning rate is large, the curve converges quickly but oscillates intensely, especially when the fixed learning rate is greater than 0.001; this phenomenon is significant for both LSTM and the GRU and shows that the trained network generalizes weakly, exhibiting spurious accuracy. The decay learning rate setting alleviates the oscillation of the MSE curve to a certain extent and achieves rapid convergence while ensuring the reliability of the model, because the learning rate gradually decreases as training progresses: a high learning rate is used in the early stage and a low learning rate in the later stage, which ensures the stability of the model.
Figure 12 shows the number of epochs at which the MSE first falls below 2.5 × 10−5 under the different learning rate schemes. It can be seen that the higher the initial learning rate of LSTM, the slower the convergence, while the opposite holds for the GRU. Using a decay learning rate enables the models to achieve the desired prediction accuracy in fewer iterations. Since the two RNN structures chosen are already optimal, the neural networks trained to accurately predict the stress–strain curves typically require no more than 70 iterations.
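In TensorFlow, the decay learning rate scheme can be realized by passing a schedule to the Adam optimizer. The sketch below uses the paper's initial learning rate of 0.001, batch size of 60, and 100 training batches; the exponential decay form and its decay_steps/decay_rate values are assumptions, as the paper does not state the exact decay schedule.

import tensorflow as tf
from tensorflow.keras.layers import Dense, GRU

model = tf.keras.Sequential([
    GRU(150, return_sequences=True),  # B150-150 structure
    GRU(150, return_sequences=True),
    Dense(1)
])
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,  # Optimal initial learning rate
    decay_steps=1000,             # Assumed steps between decays
    decay_rate=0.9)               # Assumed decay factor
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss='mean_squared_error')
# model.fit(x_train, y_train, batch_size=60, epochs=100)  # Preferred batch size and training batches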

5.2. Forecast Results

The optimal neural network structure, training frequency, and optimizer learning rate of LSTM and the GRU were obtained by analyzing the effects of the model parameters on a medium-scale dataset. The training ability and training speed of the original RNN on medium-scale datasets were no longer satisfactory; thus, this paper analyzes only the prediction results of LSTM and the GRU.
In this section, the conclusions of Section 5.1 were applied to train datasets of different sizes to analyze the predictive power of LSTM and the GRU. The structures of the two RNNs were A50-100 and B150-150, the batch size was 60, the initial learning rate of the Adam optimizer was 0.001 with the decay learning rate method enabled, and the number of training batches was 100. The stress–strain sequence lengths were divided into short (200/400/600), medium (1000/1200/1400), and long (1600/1800/2000). The dataset used in the analysis of model prediction ability was calculated with the linear isotropic hardening constitutive model, and the dataset used in the analysis of the model prediction results was based on the linear kinematic hardening constitutive model.

5.2.1. Model Prediction Ability

After the models were trained following the above steps, 10% of the samples in each group were selected as verification sets to verify the prediction results. During verification, we found that the size of the training set had a great influence on the predictive ability of the model. Taking a dataset with a stress–strain sequence length of 1000 as an example, the prediction results obtained by training on datasets of different sizes are shown in Figure 13.
As can be seen from Figure 13, the model trained by LSTM with few data lacks generalization power. When the size of the training set is less than 6000, the model can essentially predict the trend of the hysteresis curve, but its deviation from the real values is large. When the number of training samples reaches or exceeds 6000, the model shows a better prediction accuracy, although some mutation points remain in the prediction results; these mutations occur over a small range and are corrected quickly. The GRU has a smaller demand for data, predicting hysteresis curves with high accuracy once the number of samples reaches 2000, with no data point mutation as in LSTM. The prediction results of the two models for stress–strain sequences of different scales are shown in Tables 2 and 3, respectively. In the tables, √ indicates an accurate prediction, — indicates that the trend can be predicted with a small error, and × indicates that the trend cannot be predicted or is predicted with a large error.
Table 2 and Table 3 comprehensively reflect the ability of the two models to predict stress–strain curves of different scales from a qualitative point of view. For small-scale series (length less than or equal to 600), both models have a strong prediction ability, but LSTM cannot make accurate predictions when there are fewer than 6000 data points, while the GRU can make accurate predictions when the number of data points reaches 4000. For medium-scale sequences (1000 to 1400 in length), LSTM requires 8000 or more data points for an accurate prediction, while the GRU still requires only 4000. For large-scale sequences (length greater than or equal to 1600), the LSTM’s predictive power is weak, requiring a data volume of more than 10,000 to ensure basic accuracy, while the GRU still requires 4000.
The prediction ability of the two models for stress–strain sequences of different sizes under different training set sizes was further demonstrated quantitatively by calculating the MSE value predicted by each model, which can reflect the difference between the predicted value and the real value, as shown in Figure 14.
As can be seen from Figure 14, the model's prediction ability is positively correlated with the size of the training dataset and inversely correlated with the sequence length. When LSTM is used for training and the number of data points is less than 6000, increasing the amount of data rapidly reduces the MSE of the predicted values; beyond 6000, increasing the amount of data still reduces the MSE to a certain extent, but the curve flattens and the decline rate is low. This indicates that the data quantity cut-off point for LSTM in the stress–strain curve prediction problem is 6000, after which additional data bring only marginal improvement. When the GRU is used for training, the cut-off point is 4000, and the curve still has a clearly negative slope beyond 4000. Thus, although the GRU can accurately predict hysteresis curves once the number of data points reaches 4000, its accuracy can still improve with further data, and the training and prediction potential of the GRU model exceeds that of LSTM.
Figure 14 also shows the prediction abilities of the two models for sequences of different lengths. For LSTM, when the number of data points is less than 6000, predicting sequences of any length is essentially impossible; when the number reaches 6000, only short sequences can be predicted, and when it exceeds 8000, medium and long sequences can also be predicted. For the GRU, short sequences can be predicted once the number of data points reaches 2000, and medium and long sequences once it reaches 4000. Taking the mean MSE within each length class as the model's predictive ability for that class, the prediction abilities of the GRU are 6.13, 6.7, and 3.3 times those of LSTM for short, medium, and long sequences, respectively.

5.2.2. Model Prediction Effect

After the prediction abilities of the two models were analyzed, three long sequences were selected for analysis to show the prediction effect of the models more clearly and assess the difference between the predicted value and the real value. All the models used were GRUs, as shown in Figure 15.
Figure 15 shows that the optimal parameter settings and training methods summarized in this paper produce accurate predictions for the two commonly used constitutive models of steel. Driven by data, the deep learning model learns the features of the two constitutive models and quickly predicts the stress sequence from the input strain sequence. The total time of this process does not exceed 0.01 s, giving the method strong engineering application value.
In summary, the neural network constitutive model trained in this study can be applied to mechanical response prediction problems and has the potential to be integrated with the FEM. The computational frameworks of existing FEM programs are largely the same: driven by the displacement field, they assemble the global stiffness matrix and nodal force vector and solve the equilibrium equations iteratively. The local iteration at the material level involves updating the stress and the consistent elastic–plastic tangent modulus (i.e., the material Jacobian matrix), which is where the constitutive model comes into play. Customized constitutive models can generally be implemented through material subroutine interfaces, where the neural network model trained in this paper would replace the traditional constitutive model.
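As a sketch of this integration, a material-subroutine-style stress update could wrap the trained network as follows; the saved-model path, the X_ref coefficients, and the helper name are hypothetical.

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('trained_gru_constitutive')  # Hypothetical saved model

def surrogate_stress_update(strain_history, X_ref_strain=0.03, X_ref_stress=400.0):
    """Stand-in for a material-subroutine stress update: feed the
    nonlinearly reduced strain history to the network (Eq. (37)) and
    invert the reduction on the predicted stress sequence."""
    x = np.tanh(np.asarray(strain_history) / X_ref_strain)  # Pre-processing, Eq. (37)
    x = x.reshape(1, -1, 1)                                 # (batch, time steps, features)
    y = model.predict(x, verbose=0)[0, :, 0]                # Reduced stress prediction
    y = np.clip(y, -0.999999, 0.999999)                     # Keep arctanh finite
    return np.arctanh(y) * X_ref_stress                     # Undo the reduction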

6. Conclusions

In this paper, deep learning model training methods for steel hysteresis curve prediction have been studied. On the basis of the linear isotropic hardening constitutive model and linear kinematic hardening constitutive model, random stress–strain curves were generated as datasets, and three kinds of RNNs were trained as models. Various model parameters and model training methods were discussed, and the following conclusions were drawn:
(1)
During data generation and processing, the data pre-processing method seriously affects the accuracy of the model for the stress–strain curve prediction problem. Using the nonlinear reduction method to bring the data to the same order of magnitude helps ensure that gradient explosion and gradient disappearance do not occur during training. The method of randomly generating a strain sequence from a Gaussian random distribution and calculating the corresponding stress sequence is reasonable, and the generated curves reflect the corresponding constitutive models.
(2)
In terms of model type and structure, the original RNN is weak in stress–strain prediction, with a long training time and poor accuracy, whereas the GRU and LSTM are suitable for this kind of problem. The marginal effect of simply stacking neural network layers decreases: it fails to improve the model performance effectively and increases the training time. A two-layer structure suffices for such problems. The optimal structures of LSTM and the GRU are A50-100 and B150-150, respectively.
(3)
In terms of model parameters and training methods, the accuracy of the model increases as the batch size decreases and the number of training batches increases, but the training time also increases. The optimal batch size and number of training batches are 60 and 100, respectively. The influence of the learning rate was also discussed: setting an initial learning rate of 0.001 together with the decay learning rate method improves the accuracy of the model.
(4)
The prediction ability of the model is positively correlated with the size of the training set and negatively correlated with the length of the curve to be predicted. The GRU is less dependent on dataset size than LSTM, and both models are more accurate for short-sequence prediction. For short, medium, and long sequences, the prediction abilities of the GRU are 6.13, 6.7, and 3.3 times those of LSTM, respectively. Moreover, the proposed method learns the laws of the two linear hardening constitutive models well and has a good hysteresis curve prediction ability.
(5)
The method proposed in this paper is mainly applicable to predicting the linear hardening constitutive relationship of steel and can predict data series with lengths of up to 2000 entries. The trained constitutive neural network needs to be combined with a customized material interface before it can be applied in existing commercial or open-source numerical simulation software, which will be investigated in future research.

Author Contributions

Conceptualization, T.W.; methodology, Y.Y.; software, T.W. and H.L.; validation, Y.Y. and Z.W.; formal analysis, T.W.; investigation, H.L. and Z.W.; resources, Y.Y. and Z.W.; data curation, Y.Y.; writing—original draft preparation, T.W.; writing—review and editing, T.W.; visualization, H.L.; supervision, Z.W. and Y.Y.; project administration, Z.W.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Scientific and Technological Development Program, grant number 20240101130JC, and the National Natural Science Foundation of China, grant numbers 42372356 and 51991362.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Appendix A

(1) Random generation of stress–strain curves
### Import the NumPy, Matplotlib, and pandas libraries ###
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
### Step 1: Initialise parameters ###
E = 200000.0  # Modulus of elasticity (MPa)
v = 0.3  # Poisson's ratio
G = E / (2 * (1 + v))  # Shear modulus
lamme = v * E / (1 + v) / (1 - 2 * v)  # Lame constant
sigma0 = 250.0  # Initial yield stress (MPa)
H = 50000.0  # Hardening parameter (MPa)
stress = np.zeros((1, 1000))  # Stress initialisation
dstrain = np.zeros((1, 1000))  # Strain increment initialisation
estrain = np.zeros((1, 1000))  # Equivalent plastic strain initialisation
destrain = np.zeros((1, 1000))  # Equivalent plastic strain increment initialisation
yieldstress = np.zeros((1, 1000))  # Yield stress initialisation
### Step 2: Generate random strain data ###
detax2 = np.arange(0, 1000, 1)
df = pd.DataFrame()
df2 = pd.DataFrame()
df3 = pd.DataFrame()
for i in np.arange(0, 10000, 1):
    y1 = np.random.randn(11)  # Generate 11 Gaussian random points
    y1[0] = 0
    y1 = y1 * 0.02  # Scale the strain amplitude
    x1 = np.arange(0, 11, 1)
    x2 = np.arange(0, 10, 0.01)
    y2 = np.interp(x2, x1, y1)  # Interpolate loading steps between the random points
    df['%s' % i] = pd.Series(data=y2, index=detax2)
    for j in np.arange(1, 1000, 1):
        dstrain[0, j] = y2[j] - y2[j - 1]
    df2['%s' % i] = pd.Series(data=dstrain[0], index=detax2)
    ### Step 3: Calculate the stresses using the linear hardening constitutive model ###
    for k in range(1, 1000, 1):
        stress_try = stress[0, k - 1] + (2 * G + lamme) * dstrain[0, k]  # Trial (elastic predictor) stress
        yieldstress[0, k] = sigma0 + H * estrain[0, k - 1]  # Current yield stress
        yieldfunction = np.abs(stress_try) - yieldstress[0, k]  # Yield function
        if yieldfunction <= 0:  # Elastic step: no yielding
            stress[0, k] = stress_try
            destrain[0, k] = 0  # Equivalent plastic strain increment is 0
            estrain[0, k] = estrain[0, k - 1]
        else:  # Plastic step: return mapping
            destrain[0, k] = yieldfunction / (H + 3 * G)  # Equivalent plastic strain increment
            estrain[0, k] = estrain[0, k - 1] + destrain[0, k]  # Accumulated equivalent plastic strain
            stress[0, k] = stress_try - (2 * G + lamme) * destrain[0, k] * np.sign(stress_try)  # Return to the yield surface
    df3['%s' % i] = pd.Series(data=stress[0], index=detax2)
### Step 4: Write the randomly generated data to Excel tables ###
df.to_excel('Random strain data.xlsx', sheet_name='Random strain data')
df2.to_excel('Random strain incremental data.xlsx', sheet_name='Random strain incremental data')
df3.to_excel('Random stress data.xlsx', sheet_name='Random stress data')
(2) LSTM training code (using the TensorFlow framework)
### Import the TensorFlow, NumPy, and pandas libraries ###
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
### Step 1: Load the data ###
stress_data = pd.read_excel('./Random stress data.xlsx')
strain_data = pd.read_excel('./Random strain data.xlsx')
dstrain_data = pd.read_excel('./Random strain incremental data.xlsx')
stress_train = stress_data.iloc[0:1001, 0:800].values.transpose()  # Training-set stress data
strain_train = strain_data.iloc[0:1001, 0:800].values.transpose()  # Training-set strain data
dstrain_train = dstrain_data.iloc[0:1001, 0:800].values.transpose()  # Training-set strain increments
stress_test = stress_data.iloc[0:1001, 801:].values.transpose()  # Test-set stress data
strain_test = strain_data.iloc[0:1001, 801:].values.transpose()  # Test-set strain data
dstrain_test = dstrain_data.iloc[0:1001, 801:].values.transpose()  # Test-set strain increments
### Step 2: Standardisation ###
sc1 = MinMaxScaler(feature_range=(0, 1))
stress_train_scaled = sc1.fit_transform(stress_train)
stress_test_scaled = sc1.transform(stress_test)  # Reuse the training-set scaler on the test set
sc2 = MinMaxScaler(feature_range=(0, 1))
strain_train_scaled = sc2.fit_transform(strain_train)
strain_test_scaled = sc2.transform(strain_test)
### Step 3: Shuffle the training and test sets ###
np.random.seed(7)
np.random.shuffle(strain_train_scaled)
np.random.seed(7)
np.random.shuffle(stress_train_scaled)  # Identical seeds keep strain/stress pairs aligned
np.random.seed(7)
np.random.shuffle(strain_test_scaled)
np.random.seed(7)
np.random.shuffle(stress_test_scaled)
tf.random.set_seed(7)
x_train, y_train = np.array(strain_train_scaled), np.array(stress_train_scaled)
x_train = np.reshape(x_train, (len(x_train), len(x_train[0]), 1))
y_train = np.reshape(y_train, (len(y_train), len(y_train[0]), 1))
x_test, y_test = np.array(strain_test_scaled), np.array(stress_test_scaled)
x_test = np.reshape(x_test, (len(x_test), len(x_test[0]), 1))
y_test = np.reshape(y_test, (len(y_test), len(y_test[0]), 1))
### Step 4: Training ###
model = tf.keras.Sequential([
    LSTM(100, return_sequences=True),
    LSTM(100, return_sequences=True),
    Dense(1)
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss='mean_squared_error')  # Loss function: mean square error
checkpoint_save_path = './checkpoint/stress_strain.ckpt'
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
    save_weights_only=True, save_best_only=True, monitor='val_loss')
history = model.fit(x_train, y_train, batch_size=10, epochs=100,
                    validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])
### Step 5: Parameter extraction ###
model.summary()
with open('The trained neural network/weights.txt', 'w') as file:  # Close the file only after the loop
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')
loss = history.history['loss']
val_loss = history.history['val_loss']

References

  1. Wang, C.; Fan, J.S.; Xu, L.Y.; Nie, X. Cyclic Hardening and Softening Behavior of the Low Yield Point Steel: Implementation and Validation. Eng. Struct. 2020, 210, 110220.
  2. Wang, M.; Ke, X.G. Seismic Design of Widening Flange Connection with Fuses Based on Energy Dissipation. J. Constr. Steel Res. 2020, 170, 106076.
  3. Xu, L.Y.; Nie, X.; Fan, J.S. Cyclic Behaviour of Low-Yield-Point Steel Shear Panel Dampers. Eng. Struct. 2016, 126, 391–404.
  4. Spacone, E.; El-Tawil, S. Nonlinear Analysis of Steel-Concrete Composite Structures: State of the Art. J. Struct. Eng. 2004, 130, 159–168.
  5. Thai, H.T.; Nguyen, T.K.; Lee, S.; Patel, V.I.; Vo, T.P. Review of Nonlinear Analysis and Modeling of Steel and Composite Structures. Int. J. Struct. Stab. Dyn. 2020, 20, 2030003.
  6. Tang, Z.; Yang, X.; Liu, Q.; Pan, Y.; Kong, L.; Zhuge, H. Elastoplastic Hysteretic Behavior and Constitutive Models of In-Service Structural Steel Considering Fatigue-Induced Pre-Damages. Constr. Build. Mater. 2023, 392, 131912.
  7. Olivier, G.; Csillag, F.; Christoforidou, A.; Tromp, L.; Veltkamp, M.; Pavlovic, M. Feasibility of Bolted Connectors in Hybrid FRP-Steel Structures. Constr. Build. Mater. 2023, 383, 131100.
  8. Wang, Y.-Z.; Kanvinde, A.; Li, G.-Q.; Wang, Y.-B. A New Constitutive Model for High Strength Structural Steels. J. Constr. Steel Res. 2021, 182, 106646.
  9. Zhao, G.; Liu, J.; Meng, S.; Liu, C.; Wang, Q. Performance of Corrugated Steel Plate Flange Joint under Combined Compression and Bending: Experimental and Numerical Investigations. Constr. Build. Mater. 2023, 389, 131798.
  10. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.S.; Asari, V.K. A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics 2019, 8, 292.
  11. Zhang, W.G.; Li, H.R.; Li, Y.Q.; Liu, H.L.; Chen, Y.M.; Ding, X.M. Application of Deep Learning Algorithms in Geotechnical Engineering: A Short Critical Review. Artif. Intell. Rev. 2021, 54, 5633–5673.
  12. Liu, D.P.; Yang, H.; Elkhodary, K.I.; Tang, S.; Guo, X. Cyclic Softening in Nonlocal Shells-A Data-Driven Graph-Gradient Plasticity Approach. Extreme Mech. Lett. 2023, 60, 101995.
  13. Liu, D.P.; Yang, H.; Elkhodary, K.I.; Tang, S.; Liu, W.K.; Guo, X. Mechanistically Informed Data-Driven Modeling of Cyclic Plasticity via Artificial Neural Networks. Comput. Methods Appl. Mech. Eng. 2022, 393, 114766.
  14. Ghaboussi, J.; Sidarta, D.E. New Nested Adaptive Neural Networks (NANN) for Constitutive Modeling. Comput. Geotech. 1998, 22, 29–52.
  15. Ghaboussi, J.; Garrett, J.H.; Wu, X. Knowledge-Based Modeling of Material Behavior with Neural Networks. J. Eng. Mech. 1991, 117, 132–153.
  16. Mozaffar, M.; Bostanabad, R.; Chen, W.; Ehmann, K.; Cao, J.; Bessa, M.A. Deep Learning Predicts Path-Dependent Plasticity. Proc. Natl. Acad. Sci. USA 2019, 116, 26414–26420.
  17. Zopf, C.; Kaliske, M. Numerical Characterisation of Uncured Elastomers by a Neural Network Based Approach. Comput. Struct. 2017, 182, 504–525.
  18. Logarzo, H.J.; Capuano, G.; Rimoli, J.J. Smart Constitutive Laws: Inelastic Homogenization through Machine Learning. Comput. Methods Appl. Mech. Eng. 2021, 373, 113482.
  19. Zhu, H.; Zhang, C.; Chen, S.; Wu, J. A Modified Johnson-Cook Constitutive Model for Structural Steel after Cooling from High Temperature. Constr. Build. Mater. 2022, 340, 127746.
  20. Hai, L.; Ban, H.; Huang, C.; Shi, Y. Experimental Cyclic Behaviour and Constitutive Modelling of Hot-Rolled Titanium-Clad Bimetallic Steel. Constr. Build. Mater. 2022, 360, 129591.
  21. Xue, X.; Ding, Z.; Huang, L.; Hua, J.; Wang, N. Residual Monotonic Stress–Strain Property of Q690 High-Strength Steel: Experimental Investigation and Constitutive Model. Constr. Build. Mater. 2023, 392, 132010.
  22. De Souza Neto, E.A.; Peric, D.; Owen, D.R.J. Computational Methods for Plasticity: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2011.
  23. Georgiopoulos, M.; Li, C.; Kocak, T. Learning in the Feed-Forward Random Neural Network: A Critical Review. Perform. Eval. 2011, 68, 361–384.
  24. Yu, Y.; Si, X.S.; Hu, C.H.; Zhang, J.X. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019, 31, 1235–1270.
  25. Gao, S.; Huang, Y.F.; Zhang, S.; Han, J.C.; Wang, G.Q.; Zhang, M.X.; Lin, Q.S. Short-Term Runoff Prediction with GRU and LSTM Networks without Requiring Time Step Optimization during Sample Generation. J. Hydrol. 2020, 589, 125188.
  26. Wang, C.; Xu, L.Y.; Fan, J.S. A General Deep Learning Framework for History-Dependent Response Prediction Based on UA-Seq2Seq Model. Comput. Methods Appl. Mech. Eng. 2020, 372, 113357.
Figure 1. Traditional constitutive model construction and deep learning constitutive model construction flow.
Figure 2. Linear hardening constitutive model: (a) linear isotropic hardening constitutive; (b) linear kinematic hardening constitutive.
Figure 3. Original RNN structure.
Figure 4. LSTM network structure.
Figure 5. GRU network structure.
Figure 6. Comparison of data pre-processing methods.
Figure 7. The effect of the number of neurons on the model: (a) the effect on the model performance; (b) the effect on the training time.
Figure 8. Influence of hidden layers on model performance: (a) influence on model performance; (b) influence on training time.
Figure 9. Influence of neural network topology on model performance and training time: (a) influence on model performance; (b) influence on training time.
Figure 10. Effects of training frequency and training batches on the model: (a) LSTM; (b) GRU.
Figure 11. Influence of learning rate on the model: (a) LSTM; (b) GRU.
Figure 12. Influence of learning rate on the number of iterations.
Figure 13. Model prediction effects of different dataset sizes: (a–g) prediction curve of LSTM when the dataset size is 500–10,000; (h–n) prediction curve of GRU when the dataset size is 500–10,000.
Figure 14. Prediction capabilities of LSTM and GRU: (a) LSTM; (b) GRU.
Figure 15. Prediction effect of the model: (a–c) linear isotropic constitutive hardening; (d–f) linear kinematic constitutive hardening.
Table 1. Deep learning model parameters.

Parameter Class | Parameter Setting
Type of model | RNN/LSTM/GRU
Sequence length | 200/400/600/1000/1200/1400/1600/1800/2000
Training data size | 500/1000/2000/4000/6000/8000/10,000
Network layer number | 1/2/3/4/5
Number of neurons | 50/100/150/200/250/300
Learning rate | Constant learning rate/Decay learning rate
Training batch | 0–100
Batch size | 10/20/40/60/80/100/120/140/160/180/200
Table 2. Forecast results of LSTM.

Stress–Strain Scale | Training Set Size: 500 | 1000 | 2000 | 4000 | 6000 | 8000 | 10,000
200 | × | | | | | |
400 | × | × | | | | |
600 | × | × | × | | | |
1000 | × | × | × | | | |
1200 | × | × | × | | | |
1400 | × | × | × | | | |
1600 | × | × | × | | | |
1800 | × | × | × | × | | |
2000 | × | × | × | × | | |
Table 3. Forecast results of GRU.

Stress–Strain Scale | Training Set Size: 500 | 1000 | 2000 | 4000 | 6000 | 8000 | 10,000
200 | | | | | | |
400 | | | | | | |
600 | × | | | | | |
1000 | × | | | | | |
1200 | × | | | | | |
1400 | × | | | | | |
1600 | × | | | | | |
1800 | × | × | | | | |
2000 | × | × | | | | |