Development of Artificial Neural Networks for Prediction of Low-Alloy Steels’ Volume Fractions of Microstructure Constituents
By its nature, the prediction of steels' volume fractions of microstructure constituents is a regression, i.e., function approximation, problem. For such problems, ANNs are used in various research areas, including material property prediction, as versatile, nonlinear computational models inspired by biological neural systems. The proposed procedure for estimating the volume fractions of microstructure constituents of low-alloy steels using artificial neural networks, including data extraction and preparation, ANN model building and training, and analysis of ANN performance, is given in the flow chart in Figure 1 and explained in more detail in the following paragraphs.
ANNs consist of an input layer, one or more hidden layers, and an output layer. Input and output layers have one neuron per input and output variable, while the size and the number of hidden layer(s) can vary. In a fully connected ANN, all neurons are interconnected by weights. A fully connected multilayer perceptron with one hidden layer and all input variables relevant for this study (listed in Table 2) is shown in Figure 2.
In this study, i.e., for the prediction of steels' volume fractions of microstructure constituents, several two-layer multilayer perceptrons (MLPs) were developed to estimate the steels' microstructure, taking advantage of the MLPs' ability to model complex, nonlinear relationships in the data. According to [18], a two-layer MLP with a hyperbolic tangent transfer function in the hidden layer and a linear transfer function in the output layer is considered a universal approximator and is efficient in solving most regression problems. Because the dataset is relatively small, a single hidden layer, which is appropriate for avoiding overfitting and ensuring generalization with limited data, was chosen. However, the combination of tansig and linear activation functions did not prove adequate for the problem presented in this research. Two conditions should be met when estimating the volume fractions of microstructure constituents: first, the estimated values should be non-negative, and second, the sum of all outputs should equal 1. In addition to the tansig transfer function, the logsig function was therefore explored for the hidden layer, while in the output layer, in addition to the linear transfer function, the sigmoid and softmax transfer functions were considered. The final configuration uses the logsig transfer function in the hidden layer and softmax in the output layer. The outputs of the softmax transfer function can be interpreted as the probabilities associated with each class (or, here, the volume fraction of each phase); each output falls between 0 and 1, and the sum of the outputs equals 1 [18]. Although softmax is more commonly used in pattern recognition or classification problems, here it aligns with the nature of the research problem.
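As an illustration only (a minimal sketch, not the authors' code), such a two-layer MLP with a logsig hidden layer and a softmax output layer could be set up with MATLAB's shallow neural network functions as follows; the hidden layer size used here is a placeholder:

```matlab
% Minimal sketch of the described architecture (the hidden layer size of 10 is a placeholder).
net = fitnet(10, 'trainlm');            % two-layer feedforward network, Levenberg-Marquardt training
net.layers{1}.transferFcn = 'logsig';   % hidden layer: log-sigmoid transfer function
net.layers{2}.transferFcn = 'softmax';  % output layer: softmax (outputs in [0, 1], summing to 1)
net.performFcn = 'mse';                 % mean square error as the error function
```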
The backpropagation algorithm, which is often used for supervised learning with a multilayer perceptron, is also used in this research. The primary goal of artificial neural network training is to adjust the weights so that the error function is minimized; here, the mean square error, MSE, is chosen as the error function. Training begins with the initialization of the weights, followed by the propagation of input signals through the network from the input layer to the output layer (forward phase). The forward phase is followed by backpropagation (backward phase), in which error signals, calculated by comparing predicted (i.e., output) and target values, propagate from the output layer to the input layer. During this phase, the weights are updated iteratively to reduce the error. The process continues until a specified stopping criterion is met.
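For reference, with Q training examples and O outputs, this error function can be written in its standard form (generic notation, not quoted from the paper), and in the backward phase each weight w is adjusted against its gradient, e.g., in the basic gradient-descent form with learning rate α:

$$\mathrm{MSE} = \frac{1}{Q \cdot O}\sum_{q=1}^{Q}\sum_{k=1}^{O}\left(t_{k,q} - y_{k,q}\right)^{2}, \qquad w \leftarrow w - \alpha\,\frac{\partial\,\mathrm{MSE}}{\partial w}$$

where t and y denote target and predicted (output) values, respectively.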
Different combinations of input variables were considered and used for the development of the artificial neural networks. The first configuration of input variables included the main alloying elements; the austenitizing temperature, Ta; the austenitizing time, ta; the cooling time to 500 °C, t500; and the specific Jominy distance, Ed. The second configuration omits the specific Jominy distance, while in the third configuration the specific Jominy distance was used as an input variable (instead of the main alloying elements) along with the heat treatment parameters Ta, ta, and t500. Input variables for the individual configurations are listed in Table 3. Every MLP had three output variables: the volume fractions of ferrite-pearlite, bainite, and martensite. This was decided after an initial investigation in which the prediction of each microstructural constituent was performed with a separate ANN. In theory, three ANNs with one output each should yield the same result as one ANN with three outputs; however, this did not provide good results in this case, since the outputs are strongly interdependent.
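Purely as an illustration of the data layout (all variable names below are hypothetical, not taken from the paper), MATLAB's shallow-network functions expect inputs as an I-by-N matrix and targets as an O-by-N matrix, with one column per example; for Configuration No. 1 this could look as follows:

```matlab
% Hypothetical data assembly for Configuration No. 1 (all variable names are illustrative).
% Comp: rows with the contents of the main alloying elements (per Table 3);
% Ta, ta, t500, Ed: row vectors with the heat treatment parameters and specific Jominy distance.
X = [Comp; Ta; ta; t500; Ed];             % I-by-N input matrix (one column per example)
T = [fFP; fB; fM];                        % 3-by-N target matrix of volume fractions
assert(all(T(:) >= 0));                   % volume fractions must be non-negative
assert(all(abs(sum(T, 1) - 1) < 1e-6));   % each column must sum to 1
```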
For the development and testing of the artificial neural networks with all three configurations of input variables, 423 datasets for 24 steels were used (Supplementary Materials, Table S1). For the purpose of this research, the ANNs were developed using the computer software MATLAB R2022b [19].
ANN robustness is ensured by preventing overlearning and overfitting, as well as by evaluating the ANNs' performance on test data, i.e., data that were not used for ANN development.
Overlearning was prevented by combining the “growth method” for determining the number of neurons in the hidden layer with early stopping as a principle for improving generalization. Early stopping means that the weights are updated on the training dataset while the error function (mean square error, MSE) is calculated on the validation dataset. Training is stopped once the MSE on the validation dataset reaches a minimum and then increases for a predefined number of epochs.
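In MATLAB's shallow-network API this corresponds roughly to the following settings (a sketch; the number of validation failures shown is the toolbox default, not a value reported in the paper):

```matlab
% Sketch of the early-stopping settings (max_fail = 6 is the toolbox default).
net.trainParam.max_fail = 6;    % stop if the validation MSE rises for 6 consecutive epochs
[net, tr] = train(net, X, T);   % the returned weights correspond to the minimum validation MSE;
                                % tr.best_epoch records where that minimum occurred
```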
The size of the hidden layer, i.e., the maximum number of neurons in the hidden layer, H, for which the ANNs with different configurations of input variables were trained, is determined by the number of available training equations, Ntraineq, the number of input variables, I, and the number of output variables, O:

$$H = \left\lfloor \frac{N_{traineq} - O}{I + O + 1} \right\rfloor \qquad (1)$$
Limiting the maximum number of neurons in the hidden layer, H, is important for preventing overfitting. The Levenberg–Marquardt algorithm with early stopping was used for training the networks with the three different combinations of input variables (as listed in Table 3) and hidden layer sizes from one to H neurons (the “growth method”). According to [18], this learning algorithm with backpropagation appears to be among the fastest ANN training algorithms for moderate numbers of network parameters. The maximum number of neurons, H, was different for each configuration, in line with Equation (1). Each architecture was trained 10 times with random initial weights and data divisions. In total, 2410 networks were trained: 630 for Configuration No. 1, 680 for Configuration No. 2, and 1100 for Configuration No. 3.
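A minimal sketch of this training scheme (not the authors' script) is given below; H is taken from Equation (1) for the given configuration, and each architecture is retrained ten times with a fresh random initialization and data division:

```matlab
% "Growth method": train architectures with 1..H hidden neurons, 10 runs each.
results = cell(H, 10);
for h = 1:H
    for rep = 1:10
        net = fitnet(h, 'trainlm');             % Levenberg-Marquardt training
        net.layers{1}.transferFcn = 'logsig';
        net.layers{2}.transferFcn = 'softmax';
        net.divideFcn = 'dividerand';           % new random data division in every run
        % a freshly created network receives new random initial weights when trained
        [trainedNet, tr] = train(net, X, T);
        results{h, rep} = struct('net', trainedNet, 'tr', tr);
    end
end
```

Note that the reported totals of 630, 680, and 1100 trained networks, at ten runs per architecture, correspond to maximum hidden layer sizes of 63, 68, and 110 neurons for Configurations No. 1, 2, and 3, respectively.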
The initialization of weights determines the starting point of ANN training; repeating the training 10 times with random initializations increases the odds that at least one starting point is well chosen and that the MSE approaches the global minimum.
The most important hyperparameters explored for the development of ANNs in this research are given in Table 4. Hyperparameters that were selected for the final ANN configurations are underlined.
The Levenberg–Marquardt algorithm with early stopping requires the data to be divided into training, validation, and testing datasets. Using ten trainings per architecture also helps ensure that the random data division produces three subsets that each represent the entire population. The fractions of data assigned to training, validation, and testing are commonly set to 0.7/0.15/0.15, which was adopted here as well.
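With MATLAB's data division functions, these fractions are set as follows (toolbox syntax; the ratios are those stated above):

```matlab
net.divideFcn = 'dividerand';          % random division into training/validation/testing data
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
```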
If N is the total number of datasets used for the development of the ANNs, the number of training examples, Ntrain, is then:

$$N_{train} = 0.7 \cdot N \qquad (2)$$

while the number of training equations, Ntraineq, is:

$$N_{traineq} = N_{train} \cdot O \qquad (3)$$
The number of training equations, Ntraineq, was constant for all networks, regardless of the number of input variables and the architecture. The output variables were always the volume fractions of ferrite-pearlite, bainite, and martensite, which gives the number of outputs O = 3.
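As a worked example (assuming the 0.70 training fraction is rounded to whole examples), Equations (2) and (3) give, for N = 423 and O = 3:

$$N_{train} \approx 0.7 \cdot 423 \approx 296, \qquad N_{traineq} = 296 \cdot 3 = 888$$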
The number of unknown weights, Nw, in a fully connected MLP with one hidden layer is:

$$N_w = (I + 1) \cdot H + (H + 1) \cdot O \qquad (4)$$
Several limitations should be kept in mind when determining the size of the hidden layer. The number of weights, Nw, should be much smaller than the number of training equations, i.e., Nw << Ntraineq, or, in the extreme case, their difference, the number of degrees of freedom, Ndof, must be greater than zero:

$$N_{dof} = N_{traineq} - N_w > 0 \qquad (5)$$
Furthermore, the number of training equations, Ntraineq, should be 4–5 times greater than the number of unknown weights, Nw. For the worst-case scenario with the maximum number of input variables (10, Configuration No. 1), to fulfill these conditions, the maximum feasible number of neurons in the hidden layer, H, is between 12 and 15.
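For example, inserting I = 10, O = 3, and the value Ntraineq = 888 from the worked example above into Equation (4) gives:

$$N_w = (10 + 1)H + (H + 1)\cdot 3 = 14H + 3, \qquad 5\,N_w \le 888 \;\Rightarrow\; H \le 12, \qquad 4\,N_w \le 888 \;\Rightarrow\; H \le 15$$

which reproduces the stated range of 12 to 15 neurons.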
Since the original data were divided into training, validation, and testing subsets, the regression analysis between target and output values should be performed on each subset individually, as well as on the full dataset. If the ANN shows accurate fitting on the training subset but poor results on the validation and test subsets, this indicates overfitting. If the training and validation results are good but the testing results are poor, this could indicate extrapolation [18]. Since ANNs learn by example, they are reliable only when applied to data from the same distribution as the learning dataset.
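A sketch of how these per-subset checks could be performed with the toolbox's training record tr is given below (whether r and RMSE are computed per output variable or pooled over all three outputs is an implementation detail; the pooled form is shown only as an illustration):

```matlab
Y = net(X);                                         % network outputs for the whole dataset
r_all    = regression(T, Y, 'one');                 % overall correlation coefficient
rmse_all = sqrt(mean((T(:) - Y(:)).^2));            % overall RMSE
subsets  = {tr.trainInd, tr.valInd, tr.testInd};    % subset indices from the training record
r_sub    = zeros(1, 3);
rmse_sub = zeros(1, 3);
for k = 1:3
    idx         = subsets{k};
    r_sub(k)    = regression(T(:, idx), Y(:, idx), 'one');
    rmse_sub(k) = sqrt(mean((T(:, idx) - Y(:, idx)).^2, 'all'));
end
```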
The selection of the best-performing artificial neural network, for all three configurations and all trained architectures (hidden layer sizes up to H), was based on the value of the coefficient of correlation, r, and the value of the root mean square error, RMSE, for the whole dataset. The RMSE (the square root of the MSE) is a useful indicator of model accuracy, as it expresses the prediction errors of different models in the same unit as the variable to be predicted. The greater the r and the smaller the RMSE, the better the network's performance is considered to be. If several networks had similar results, the one with the smaller hidden layer was chosen. In accordance with the above-mentioned discrepancies that can occur between the training, validation, and testing subsets, the coefficients of correlation rtrain, rval, and rtest, as well as the root mean square errors RMSEtrain, RMSEval, and RMSEtest, were also analyzed for the respective subsets, so that a balanced prediction of all three microconstituents is ensured where possible. The results for the selected architectures of all three configurations are summarized in Table 5 and Table 6.
A comparison of the given metrics, overall and for the individual subsets, shows that they do not differ significantly. This is especially important for the training and testing values of the coefficient of correlation r and the root mean square error RMSE, i.e., rtrain versus rtest and RMSEtrain versus RMSEtest. Based on these indicators, it can be concluded that the design and selection of the ANNs ensured robustness and good generalization capability.