Article

Using Machine Learning for Aerostructure Surface Damage Digital Reconstruction

Aviation Services Research Centre, Room 241, Block X, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
*
Author to whom correspondence should be addressed.
Aerospace 2025, 12(1), 72; https://doi.org/10.3390/aerospace12010072
Submission received: 4 December 2024 / Revised: 6 January 2025 / Accepted: 8 January 2025 / Published: 20 January 2025
(This article belongs to the Special Issue Aircraft Structural Health Monitoring and Digital Twin)

Abstract

Aerostructure surface damage inspection is carried out over the whole life-cycle using legacy processes and record-keeping during maintenance. These inspection techniques record the detailed history of damage and repair; however, predicting the location of future damage on the aerostructure surface remains elusive. In this paper, we develop a novel simulation technique, based on the results of machine learning analysis, for predicting the reference location of future aerostructure surface damage. First, we use the support vector machine (SVM) and k-nearest neighbor (KNN) algorithms to analyze the damage on three B777-200 aircraft and find that the classification accuracy ranges from 75.1% to 86%. Then, we use the prediction result of a feedforward neural network (FNN) to simulate the damage structure and the mapping relationship to explore its reconstructive possibility. We show that aerostructure surface damage can be reconstructed by machine learning. Moreover, the aerostructure surface damage heat map obtained by reconstruction can be prepared for further image recognition analysis in the future.

1. Introduction

Aviation is a field with a high demand for security and reliability, and it faces the challenges of digital transformation, low-efficiency maintenance, intensive technology, and high operational production costs [1]. In aviation, using a central repository to collect the measured data usually leads to part of the useful, high-value data being lost, even though these data are valuable for optimizing maintenance [2,3]. The way in which one prioritizes useful data is vital for continued maintenance [4]. With the development of artificial intelligence (AI), using data interaction and fusion with AI platforms to analyze the maintenance processes of an aircraft has become a mainstream research topic [5,6]. Phanden [7] used finite element methods (FEMs), computer-aided engineering (CAE), and Monte Carlo simulation to simulate and replicate the time-based histories of 34 flights, although that result only predicts forthcoming maintenance requirements. In another work, Zaccaria and Stenfelt [8] simulated the relationship between fuel consumption and flight safety to predict failure, which reduced maintenance costs [9]. The great challenge in such work is the large amount of collected data, which needs adaptive models to fit it [10,11]. To solve the problem of interaction and fusion between data and model, Uzun and Demirezen [12] proposed using deep learning to build a data-driven model based on fuel flows. They used Quick Access Recorder (QAR) data from a B777-300ER that had logged over 1000 flights; the QAR data are used to estimate the aircraft fuel burn [13,14]. Their results showed that the aircraft-based model synchronizes with actual performance.
In addition to the above research, another main research field is the application of Digital Twins (DTs) to aero-engine maintenance and evaluation [15]. The operation of the aero-engine is very important for efficient aircraft performance. The health assessment of aero-engines is of great significance, especially in supporting maintenance decision making to ensure flight safety [16]. Liang and Huawei [17] used the MultiScale1DCNN model in health assessment tests of aero-engines and obtained an accuracy rate of about 96%, showing that the method can accurately reflect the health status of the aero-engine. Jinyue and Gang [18] developed a prototype system based on a vector loop and on-site data, which was verified on an aero-engine. Yufeng and Jun [19] proposed a novel DT method based on deep multimodal information; their results showed that the proposed DT models can improve fault diagnosis accuracy and reduce parameter prediction error [20,21].
However, previous research has faced a challenge in incorporating big data analysis into DT models [22], because the virtual environment outputs performance data that are instantly compared with the system models while, at the same time, data are modified, exchanged, estimated, and simulated [23]. The performance of aero-engines usually exhibits highly nonlinear physics; any small deviation of the shape leads to a high computational cost [24]. Therefore, it was necessary to develop a fast simulator [25]. McClellan [26] developed a low-dimensional computational model capable of accurate prediction using a large amount of data, which will have great value for the study of aero-engine behavior, in addition to saving time and cost on the test platform [27]. Minglan and Huawei [28] developed a kind of Implicit Digital Twin (IDT) to improve the predictive effect on engine maintenance. It can be seen that the maintenance of an aero-engine follows a development process from “post-event maintenance” to “preventive maintenance” and then to “predictive maintenance” [29]. In the future, “precise maintenance” will become the main direction of development [30]; its purpose is to reduce the cost of maintenance and operation whilst achieving and ensuring safety [31]. While the aero-engine is undoubtedly one of the most important components of an aircraft, we cannot ignore the equally important roles of other parts, such as the wings and fuselage, in flight safety [32]. Similarly, we can apply the prediction approach to aerostructure maintenance and inspection according to the previous research results.
In this work, we investigate the prediction and reconstruction of aerostructure surface damage reference locations by machine learning. We apply data encoding and simulated stepping to construct the data links and then test whether this kind of augmented data can be recognized by machine learning. In Experiment 1, we show that the preprocessed data can be well recognized by machine learning: in the SVM and KNN, the binary classification accuracy for the data labeled “Dent” versus the data labeled “Others” is 75.1% and 86%, respectively. We then use these classification methods to back test the third type of damage, the data labeled “Damage and Others”. Using the same models and methods, the classification accuracy is 74.9% and 84.3%, respectively. This result verifies that the augmented data can be correctly recognized by the classification model and, at the same time, prepared for damage prediction in the next step. In Experiment 2, we use a neural network model to predict the aerostructure surface damage reference location. We use an FNN to predict the 4616, 5387, and 5839 pieces of historical damage data from the whole life-cycle of the same three B777-200 aircraft as in Experiment 1, judging the accuracy of the result from the prediction value and the true value obtained by the FNN. Finally, we obtain an FNN prediction accuracy rate of more than 93%. In Experiment 3, we use the FNN prediction values to generate a point cluster, forming a point cloud, and then construct a three-dimensional (3D) damage simulation matrix [33]. Next, we use matrix merging and gradient descent to define the border of the aerostructure surface damage [34]. Finally, we map the 3D damage simulation matrix onto the aircraft models’ surface to complete the whole process of reconstruction.

2. Materials and Methods

This work mainly consists of three components: data preprocessing, AI analysis, and damage reconstruction (Figure 1). The AI analysis and damage reconstruction, carried out through Experiments 1, 2, and 3, form the core of the research. In data preprocessing, the raw data are preprocessed to fit the AI model. We mainly use data encoding, simple statistics, correlation coefficient analysis (Pearson correlation coefficient), and simulated stepping (data links) to augment the data. At the same time, we normalize the data and separate the training dataset (80%) and testing dataset (20%) for cross-validation. For AI analysis, an AI model framework is built for classification and prediction of the data. Firstly, we choose two machine learning classifier models (SVM and KNN) to recognize the different kinds of damage from the three B777-200 jets (Alpha 1, Charlie 2, and Bravo 3) and compare the models’ performances based on their classification accuracy (Experiment 1). We use those results to analyze and judge which machine learning model is more suitable for which kinds of data for learning and testing. The simple statistics show that for two of the three B777-200 jets (Alpha 1 and Charlie 2), damage data labeled “Dent” account for nearly 50% of all damage, which means the model should use binary classification methods to classify the data. Then, we verify the accuracy of the classification methods by back testing based on the damage data labeled “Damage”, which come from the third B777-200 jet (Bravo 3). Secondly, after completing the classification work, we choose a third machine learning model (FNN) for damage data prediction (Experiment 2). The aim of using the third machine learning model for data prediction is not only to test the general performance of the derived data, but also to verify the results of Experiment 1.
As there is a certain spatial structure logic relationship between the items in the raw data during encoding, we follow the deep-layer logical structure rule when augmenting data by simulated stepping. We use forward stepping to build one-way data links, and in these data links, each node can independently serve as a starting point for the next stepping. Once the data links have a direction, they can be used by the FNN, which has time-course features [35], to predict the derived data. The advantage is that during the learning process, the FNN can effectively classify and predict according to the data features.
In damage reconstruction, we use the predicted results obtained by the FNN to reconstruct the aerostructure surface damage. Firstly, we use the difference between the predicted value and the true value obtained by the FNN to construct a point cloud (Experiment 3) [36,37]. We then use the gradient descent method to define the border of the simulated damage matrix. Finally, through matrix merging, we combine the biases matrix with the core matrix to generate a 3D matrix. We map the 3D damage simulation matrix onto the aerostructure surface model (fly away models) and then complete the reconstruction.

2.1. Data Preprocessing and Analysis

2.1.1. Encoding Data

In the raw data, the historical maintenance records are composed of six items: Air Transport Association 100 (ATA 100), zone, aircraft LH/RH, parts, location, and damage type. The ATA mainly describes the fuselage, wings, nacelles, and stabilizers. The zone is mainly labeled fuselage lower, fuselage top, doors, left wing, right wing, and stabilizers. The ATA and zone are defined by the ATA 100 numbering system, which was published on 1 June 1956. The aircraft is mainly labeled LH and RH. The fuselage is mainly labeled AFT (aft), FWD (forward), fairing, upper, lower, LH, RH, vertical, horizontal lower, and horizontal upper. The parts of the wing are mainly labeled slat, flaps, etc. The damage type is mainly labeled crack, scratch, corrosion, dent, etc. For effective classification and prediction, the raw language needs to be transformed into a new language that can be understood by machine learning. We encoded each maintenance item and all the textual contents into numeric codes (Table 1).
In Table 1, a certain logical relationship exists between the independent items. It can also be seen that the division of aircraft spatial location becomes more and more detailed from item ATA to item location. We use these independent items to construct a set of vectors, in which the order is ATA, zone, aircraft, parts, and location, and the components are given by the numeric codes from each independent item. We define such a set of vectors as one epoch. Therefore, the raw maintenance information for the three B777-200 jets (Alpha 1, Charlie 2, and Bravo 3) was transformed into 4616, 5387, and 5839 pieces of epoch data, respectively. This also means that the new epoch dataset contains 15,842 pieces of maintenance information in total for all three aircraft (Figure 2). Each piece of epoch data contains details of the damage location and damage type. Figure 2 shows the detailed encoding process of the raw data from the historical damage records.
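As an illustration, the encoding step above can be sketched in Python. The value-to-code assignments and helper names below are hypothetical, not the actual codes of Table 1:

```python
# Illustrative sketch of the encoding step: each textual maintenance item is
# mapped to a numeric code, and one record becomes an "epoch" vector
# (ATA, zone, aircraft, parts, location). The code values here are
# hypothetical; the paper's actual codes are defined in Table 1.

def build_codebook(records, items):
    """Assign a numeric code to every distinct textual value of each item."""
    codebook = {}
    for item in items:
        values = sorted({r[item] for r in records})
        codebook[item] = {v: i + 1 for i, v in enumerate(values)}
    return codebook

def encode_epoch(record, codebook, items):
    """Transform one raw maintenance record into a numeric epoch vector."""
    return [codebook[item][record[item]] for item in items]

items = ["ata", "zone", "aircraft", "parts", "location"]
records = [
    {"ata": "53", "zone": "fuselage lower", "aircraft": "LH",
     "parts": "skin", "location": "FWD", "damage": "dent"},
    {"ata": "57", "zone": "left wing", "aircraft": "LH",
     "parts": "slat", "location": "upper", "damage": "scratch"},
]
codebook = build_codebook(records, items)
epochs = [encode_epoch(r, codebook, items) for r in records]
print(epochs)   # [[1, 1, 1, 1, 1], [2, 2, 1, 2, 2]]
```

Each output row is one epoch vector; the damage type stays alongside it as the classification target.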

2.1.2. Simple Statistical Analysis and Correlation Coefficient Analysis

Using simple statistics is a fast and effective analysis method to better understand the whole distribution of the data. Figure 3 shows the distribution of the different categories for each item. The chart’s horizontal axis represents descriptions such as dent, scratch, fore, aft, etc., in each category, and the vertical axis represents the count of those descriptions. In this work, according to the research purpose, the “damage type” is defined as the target of analysis.
Usually, the damage of “dent” is caused by a collision when loading the cargo, but it can also be due to contact with the trolley, etc. The magnitude of these dents generally depends on the collision range and intensity. The damage of “others” mainly includes those that occur on the aerostructure surface, such as crack, delamination, lightning strike, scratch, etc. Figure 3 shows, for Alpha 1 and Charlie 2, the kinds of damage data labeled with “dent” and “others”, with each accounting for almost 50% of all. This means that for Alpha 1 and Charlie 2, damage caused by collisions accounts for a high proportion of the total damage over the whole life-cycle. Therefore, the next research analysis should be focused on the damage type labeled “dent”. The distribution of data labeled “dent” accounts for nearly 50% of all, which means it is also more suitable for binary classification.
There is a relationship between the damage type and the location where it happens. Further analysis of the correlation between these two relevant factors is one of the main purposes of this work. In the previous work of raw data encoding, we discovered a progressive relationship between the independent items. This progressiveness shows a linear relationship, and the Pearson correlation coefficient can be used as a statistical indicator to measure the strength of the linear relationship between two continuous variables. Therefore, we use the Pearson correlation coefficient to measure the degree of correlation between the damage type variable and the location variables. For two variables $X$ and $Y$, whose observations are $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$, the Pearson correlation coefficient $r$ can be calculated using the following formula:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{1}$$
In Formula (1), $\bar{x}$ is the mean of variable $X$; $\bar{y}$ is the mean of variable $Y$; $(x_i - \bar{x})(y_i - \bar{y})$ is the product of the deviations of $X$ and $Y$ at the $i$-th observation; $(x_i - \bar{x})^2$ and $(y_i - \bar{y})^2$ are the squared deviations of $X$ and $Y$.
Figure 4 shows the results of the Pearson correlation coefficient analysis. The bottom row in Figure 4 shows the correlation between damage type and the other location items. The correlation coefficient lies within the range of −1 to 1; the closer its value is to −1 or 1, the higher the correlation between the actual item and the target item. When two items are identical, their correlation is 1 or −1. In general, the correlation parameter objectively represents the degree of correlation between the two in terms of the magnitude of the numerical value. We find that the correlations between damage type and location for Alpha 1, Charlie 2, and Bravo 3 have the highest absolute values in that row, reaching −0.13, 0.14, and −0.22. This result not only shows that the trigger of the damage has a relationship with the location, but also verifies that the hypothesis we proposed in the previous work is correct. It should be pointed out that the correlation coefficient value only objectively reflects the closeness of two variables and some possible internal connections between them. Therefore, according to the analysis results of Figure 4, the next step is to enhance the data robustness and the correlations between items [38].
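As a sketch, Formula (1) can be computed directly in a few lines; the encoded columns below are made-up toy values, not the aircraft data:

```python
import numpy as np

# Minimal sketch of Formula (1): the Pearson correlation coefficient between
# an encoded damage-type column and an encoded location column.
# The sample values are illustrative only.

def pearson_r(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

damage_type = [1, 2, 1, 3, 2, 1]   # hypothetical encoded damage types
location    = [4, 5, 4, 6, 5, 4]   # hypothetical encoded locations

r = pearson_r(damage_type, location)
print(round(r, 3))   # perfectly linear toy data -> 1.0
```

On real columns, values such as −0.13 or 0.14 indicate only a weak linear association, which motivates the data augmentation described next.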

2.1.3. Using Stepping to Simulate the Data Links to Augment Data

In the previous work, we encoded the raw data and obtained epochs containing the location and damage information. The results of the Pearson correlation coefficient analysis show that although there is a certain relationship between the two variables, its strength is not ideal. Usually, this is caused by factors of the data themselves, such as a lack of data quantity. Therefore, in this work, we use the simulated stepping method on the raw data and, at the same time, superimpose simulated data links to increase the data quantity. Finally, we achieve the purpose of data augmentation (Figure 5). Figure 5 shows the process of simulated stepping superimposed on the raw data and the formation of the simulated data links. On the simulated data links, all data transfer directions are one-way (feedforward transportation). Each item on these simulated data links can be defined as an independent starting node for forward data transportation. Using simulated stepping superimposed on the raw data leads to a kind of linear relationship between the nodes of the whole simulated data links. Therefore, when the simulated data links are unfolded, they form a straight chain, not a cyclic chain.
Firstly, we construct one raw basic data link, $\theta$, which was introduced in the previous data-encoding work. $\theta$ is a vector composed of a set of components; the defined items compose a row vector in which $a$ is ATA, $b$ is zone, $c$ is aircraft, $d$ is parts, and $e$ is location. The new row vector is as follows:
$$\theta = (a, b, c, d, e) \tag{2}$$
According to Formula (2), we use ATA as the first starting node to construct a new forward stepping based on $\theta$. We define this stepper as Step 6. During the period on ATA, Step 6 is
$${}^{i_1}S_a = \{\theta,\ \alpha_1\},\qquad \theta = (a, \dots, e),\quad \alpha_1 = a \cup b \tag{3}$$
In Formula (3), ${}^{i_1}S_a$ is the sum of the raw basic data link and the new stepper start node; $\alpha_1$ is the union set constituted by the ATA stepping into the zone; $a$ is the ATA component; $b$ is the zone component; $i$ is the cumulative count of stepping. Note that the first stepping with ATA is actually Step 6, because $\theta$ has already been executed by Step 5.
In Formula (3), based on the forward transmission direction of ATA nodes, we can infer Step 7. Therefore, moving from Step 6 to Step 7 is as follows:
$${}^{i_2}S_a = \{\theta,\ \alpha_1,\ \alpha_2\},\qquad \theta = (a, \dots, e),\quad \alpha_1 = a \cup b,\quad \alpha_2 = \alpha_1 \cup c \tag{4}$$
We simplify Formula (4) as follows:
$${}^{i_2}S_a = \{{}^{i_1}S_a,\ \alpha_2\},\qquad \alpha_2 = \alpha_1 \cup c \tag{5}$$
In Formulas (4) and (5), $\alpha_2$ is the union set superimposed by the ATA stepping into the aircraft; $c$ is the aircraft component; $i$ is the cumulative count of stepping.
In Formula (4), based on the forward transmission direction of ATA nodes, we can infer Step 8. Therefore, moving from Step 7 to Step 8 is as follows:
$${}^{i_3}S_a = \{\theta,\ \alpha_1,\ \alpha_2,\ \alpha_3\},\qquad \theta = (a, \dots, e),\quad \alpha_1 = a \cup b,\quad \alpha_2 = \alpha_1 \cup c,\quad \alpha_3 = \alpha_2 \cup d \tag{6}$$
We simplify Formula (6) as follows:
$${}^{i_3}S_a = \{{}^{i_2}S_a,\ \alpha_3\},\qquad \alpha_3 = \alpha_2 \cup d \tag{7}$$
In Formulas (6) and (7), $\alpha_3$ is the union set superimposed by the ATA stepping into the parts; $d$ is the parts component.
In Formula (7), based on the forward transmission direction of ATA nodes, we can infer Step 9. Therefore, moving from Step 8 to Step 9 is as follows:
$${}^{i_4}S_a = \{\theta,\ \alpha_1,\ \alpha_2,\ \alpha_3,\ \alpha_4\},\qquad \theta = (a, \dots, e),\quad \alpha_1 = a \cup b,\quad \alpha_2 = \alpha_1 \cup c,\quad \alpha_3 = \alpha_2 \cup d,\quad \alpha_4 = \alpha_3 \cup e \tag{8}$$
We simplify Formula (8) as follows:
$${}^{i_4}S_a = \{{}^{i_3}S_a,\ \alpha_4\},\qquad \alpha_4 = \alpha_3 \cup e \tag{9}$$
In Formulas (8) and (9), $\alpha_4$ is the union set superimposed by the ATA stepping into the location; $e$ is the location component.
After finishing the first ATA node stepping superimposition, we start a new round of the second zone node stepping superimposition. During the period on zone, Step 10 is
$${}^{i_1}S_b = \{\varphi,\ \beta_1\},\qquad \varphi = {}^{i_4}S_a,\quad \beta_1 = b \cup c \tag{10}$$
In Formula (10), $\beta_1$ is the union set superimposed by the zone stepping into the aircraft; $b$ is the zone component; $c$ is the aircraft component.
In Formula (10), based on the forward transmission direction of zone nodes, we can infer Step 11. Therefore, moving from Step 10 to Step 11 is as follows:
$${}^{i_2}S_b = \{\varphi,\ \beta_1,\ \beta_2\},\qquad \varphi = {}^{i_1}S_b,\quad \beta_1 = b \cup c,\quad \beta_2 = \beta_1 \cup d \tag{11}$$
We simplify Formula (11) as follows:
$${}^{i_2}S_b = \{{}^{i_1}S_b,\ \beta_2\},\qquad \beta_2 = \beta_1 \cup d \tag{12}$$
In Formulas (11) and (12), $\beta_2$ is the union set superimposed by the zone stepping into the parts; $d$ is the parts component.
In Formula (12), based on the forward transmission direction of zone nodes, we can infer Step 12. Therefore, moving from Step 11 to Step 12 is as follows:
$${}^{i_3}S_b = \{\varphi,\ \beta_1,\ \beta_2,\ \beta_3\},\qquad \varphi = {}^{i_1}S_b,\quad \beta_1 = b \cup c,\quad \beta_2 = \beta_1 \cup d,\quad \beta_3 = \beta_2 \cup e \tag{13}$$
We simplify Formula (13) as follows:
$${}^{i_3}S_b = \{{}^{i_2}S_b,\ \beta_3\},\qquad \beta_3 = \beta_2 \cup e \tag{14}$$
In Formulas (13) and (14), $\beta_3$ is the union set superimposed by the zone stepping into the location; $e$ is the location component.
Using aircraft as the stepping, we superimpose the third start node, which contains just two steppings. During the period on aircraft, Step 13 is
$${}^{i_1}S_c = \{\tau,\ \gamma_1\},\qquad \tau = {}^{i_3}S_b,\quad \gamma_1 = c \cup d \tag{15}$$
In Formula (15), $\gamma_1$ is the union set superimposed by the aircraft stepping into the parts; $c$ is the aircraft component; $d$ is the parts component.
In Formula (15), based on the forward transmission direction of the aircraft node, we can infer Step 14. Therefore, moving from Step 13 to Step 14 is as follows:
$${}^{i_2}S_c = \{\tau,\ \gamma_1,\ \gamma_2\},\qquad \tau = {}^{i_3}S_b,\quad \gamma_1 = c \cup d,\quad \gamma_2 = \gamma_1 \cup e \tag{16}$$
We simplify Formula (16) as follows:
$${}^{i_2}S_c = \{{}^{i_1}S_c,\ \gamma_2\},\qquad \gamma_2 = \gamma_1 \cup e \tag{17}$$
In Formulas (16) and (17), $\gamma_2$ is the union set superimposed by the aircraft stepping into the location; $e$ is the location component.
Then comes the last stepping superimposition, which is led by parts, with only one stepping. During the period on parts, Step 15 is
$${}^{i_1}S_d = \{\eta,\ \varepsilon\},\qquad \eta = {}^{i_2}S_c,\quad \varepsilon = d \cup e \tag{18}$$
Finally, we show all steppings in one set as follows:
$${}^{i_5}S_{a \to e} = {}^{i_1}S_d = \{a, \dots, \varepsilon\},\qquad \varepsilon = d \cup e \tag{19}$$
In Formulas (18) and (19), $\varepsilon$ is the union set superimposed by the parts stepping into the location; $e$ is the location component.
Figure 6 shows how the correlation values of the simulated data links change during the simulated stepping. It can be seen from Figure 6 that after the stepping superimposition, the raw data structure and information become more complex. The amount of data has increased, making it better suited for machine learning training and testing. The correlation coefficient values of Alpha 1, Charlie 2, and Bravo 3 are raised from 0.13, 0.22, and 0.14 to nearly 0.4, 0.5, and 0.6, with growth rates of 33%, 44%, and 23.3%. The results show that after the simulated stepping superimposition and simulated data links transfer, the raw data quantity and structure are expanded and enlarged linearly. Therefore, the effectively augmented raw data can be used for machine learning analysis in the next step.
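The stepping procedure of Formulas (2)–(19) can be sketched as follows, under our simplifying assumption that each step unions the current chain with the next item along the one-way link ATA → zone → aircraft → parts → location; the function and variable names are ours, not the paper's code:

```python
# Sketch of the simulated-stepping augmentation (Formulas (2)-(19)).
# Assumption: each step is the union of a start node's chain with the next
# item along the one-way link, and every node is an independent start point.

def simulated_stepping(theta):
    """Augment one epoch (a, b, c, d, e) into the full set of stepped links."""
    links = [tuple(theta)]               # the raw basic data link theta
    n = len(theta)
    for start in range(n - 1):           # each node is an independent start
        chain = {theta[start]}
        for nxt in range(start + 1, n):
            chain = chain | {theta[nxt]}     # forward, one-way union step
            links.append(tuple(sorted(chain)))
    return links

epoch = (1, 2, 3, 4, 5)    # encoded (ATA, zone, aircraft, parts, location)
links = simulated_stepping(epoch)
print(len(links))          # 1 raw link + 4 + 3 + 2 + 1 = 11 stepped links
```

The chain never steps backwards, matching the feedforward (straight, non-cyclic) structure of the data links.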

2.1.4. Normalization

To further enhance the data quality after data augmentation for machine learning analysis, we use normalization to preprocess the data. Usually, the purpose of normalization is to remove noise from the data [39] by cleaning and processing the raw data to improve its reliability. In addition, normalization can transform the data format to make it more suitable for modeling and analysis. Therefore, in this work, we used Z-score normalization (standardization) and min–max normalization (rescaling) for data preprocessing before classification [40]. After a comparison, we finally chose min–max normalization. The mathematical expression of the Z-score is
$$x_{norm} = \frac{x - u}{\sigma} \tag{20}$$
In Formula (20), $u$ is the mean of the data, $\sigma$ is the standard deviation of the data, and $x_{norm}$ is the normalized result of $x$. The mathematical expression of min–max normalization is
$$x_{norm} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{21}$$
In Formula (21), $x_{\min}$ is the minimum value of the data, $x_{\max}$ is the maximum value of the data, and $x_{norm}$ is the normalized result of $x$.
For the data prediction in Experiment 3, which also requires data preprocessing, the normalization formula is
$$x' = \frac{x - x_{mean}}{x_{\max}} \tag{22}$$
In Formula (22), $x'$ is the standardized value, $x$ is the original value, $x_{mean}$ is the mean of the variable, and $x_{\max}$ is the maximum value of the variable. For example, if $x$ represents the damage type variable, then $x_{mean}$ is the average of all damage type values (i.e., the average of the first column in the raw data) and $x_{\max}$ is the highest damage type value.
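The three normalizations of Formulas (20)–(22) can be sketched on a toy encoded column:

```python
import numpy as np

# The three normalizations of Formulas (20)-(22); the column values are
# illustrative, not aircraft data.

def z_score(x):                      # Formula (20)
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def min_max(x):                      # Formula (21)
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def mean_max(x):                     # Formula (22), used in Experiment 3
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.max()

col = [2, 4, 6, 8]
print(min_max(col))                  # [0.  0.333...  0.666...  1.]
```

Min–max maps the column into [0, 1], which is the variant finally chosen for classification preprocessing.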

2.1.5. Data Training and Cross-Validation

Before using a machine learning model for training and testing, we needed to separate the datasets for cross-validation. Firstly, we marked the datasets of Alpha 1 and Charlie 2 in the classification model and then used a supervised learning model to train the data. We classified data with the damage type “Dent” as one label and the rest of the data, with other damage types, as “Others”. All data needed to be marked (labeled) according to the different kinds of damage. To verify the reliability of the classification model, we used the classification results of Alpha 1 and Charlie 2 to back test Bravo 3. After marking the data, the next step was to divide them into a training set (80% of all data) and a testing set (20%), with all data cross-validated. The 80:20 ratio was chosen to avoid the same category of data appearing in a single dataset, which may lead to inaccurate classification results. After the data are separated, they can be classified according to the results of training and testing. The previous data augmentation work, using simulated stepping and simulated data links to enlarge and expand the data quantity, gives the data a linear relationship. Therefore, in this work, we mainly used the SVM and KNN models for classification. Figure 7 shows the data cutting, dividing, training, testing, and cross-validating process.
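A minimal sketch of the 80:20 split described above, on synthetic data (the shapes and labels are illustrative):

```python
import numpy as np

# Minimal 80/20 split of the kind described above; a real run would repeat
# the shuffle per cross-validation fold. Data are synthetic placeholders.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 epochs x 5 encoded items
y = rng.integers(0, 2, size=100)     # binary labels: "Dent" vs "Others"

idx = rng.permutation(len(X))        # shuffle so categories mix across sets
cut = int(0.8 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(len(X_train), len(X_test))     # 80 20
```

Shuffling before cutting is what prevents one damage category from ending up entirely in a single subset.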

2.2. Using Machine Learning for Classification (Experiment 1)

2.2.1. Kernel Functions of the SVM

After the simulated stepping, there is a linear relationship in the data, and binary classification is a kind of linear classification. Therefore, we chose the linear kernel as the kernel function of the SVM:
$$K(x, y) = x^{T} y \tag{23}$$
In Formula (23), $x$ and $y$ are input vectors. The advantage of the linear kernel function is that it is computationally simple and does not require complex matrix operations. However, the linear kernel function can only process linearly separable datasets and may not obtain good classification results on nonlinear datasets.
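As a minimal sketch, Formula (23) is simply the inner product of the two input vectors:

```python
import numpy as np

# Formula (23) as code: the linear kernel is the inner product of the two
# input vectors, which is why it is cheap but only handles linearly
# separable data.

def linear_kernel(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y)

print(linear_kernel([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32.0
```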

2.2.2. Similarity Function of KNN

KNN (k-nearest neighbor) is an instance-based, nongeneralizing learning algorithm. The similarity functions of KNN include the Euclidean distance, Manhattan distance, Minkowski distance, and cosine similarity. In this work, according to the cross-validation, we chose the Euclidean distance to analyze the data. The function expression of the Euclidean distance is
$$x = (x_1, x_2, \dots, x_n) \tag{24}$$
$$y = (y_1, y_2, \dots, y_n) \tag{25}$$
$$EuD(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \tag{26}$$
In Formulas (24)–(26), $x$ and $y$ are data points, $x_i$ and $y_i$ are the $i$-th attributes of the data, and $EuD(x, y)$ is the Euclidean distance. $k$ is a hyperparameter, which can be obtained by cross-validation.
In Figure 8, $k$ represents the radius of a circle. The $k$ value usually reflects the model’s complexity and whether the model will overfit (Figure 8). In Figure 9, we tested $k$ values in the range $k \in [2, 20]$, where the horizontal axis represents the $k$ value and the vertical axis represents the classification accuracy. The testing shows that when the hyperparameter takes the value $k = 5$, the whole model tends to fit well (Figure 9; shaded parts). Therefore, in this work, we use $k = 5$ to train the model and evaluate the accuracy of the KNN model on the training and prediction sets separately.
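A from-scratch sketch of the KNN classifier with the Euclidean distance of Formula (26) and the chosen hyperparameter $k = 5$; the toy training points are illustrative:

```python
import numpy as np

# From-scratch KNN with the Euclidean distance of Formula (26) and k = 5.
# The two toy clusters stand in for the "Dent" / "Others" labels.

def knn_predict(X_train, y_train, x, k=5):
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # EuD(x, y)
    nearest = np.argsort(dists)[:k]                     # k closest neighbors
    votes = y_train[nearest]
    return int(np.bincount(votes).argmax())             # majority vote

# Two separated clusters: label 0 near the origin, label 1 near (5, 5).
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                    [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))   # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))   # 1
```

With $k = 5$, one out-of-cluster neighbor is always outvoted by the four in-cluster ones, illustrating why a moderate $k$ resists noise without over-smoothing.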

2.3. Using the FNN for Damage Prediction (Experiment 2)

As we define the direction of the data links during data augmentation and make each epoch transfer forward, it is appropriate to use the FNN for data prediction in Experiment 2, because in an FNN each layer’s transmission direction is also forward (Figure 10). In this work, the FNN has 5 basic inputs, 1 output, and 3 hidden layers. In an FNN, the sigmoid function, also known as the logistic function, is usually used as the activation function for the hidden layers and the output layer. It is an S-shaped function that compresses the neuron output of the FNN into the range (0, 1). For one neuron, the mathematical expression of its sigmoid output is
$$y = S(\omega x + b) = \frac{1}{1 + e^{-(\omega x + b)}} \tag{27}$$
In Formula (27), $\omega$ is the weight, $x$ is the input value, $b$ is the bias value, and $e$ is the base of the natural logarithm, approximately 2.71828.
From the hidden layer to the output layer, we choose a regression function. We change the function from the hidden layer to the output layer to a regression function because we no longer use the neural network for classification but for prediction. A classification output is usually in the range [0, 1], which suits the sigmoid function, but our prediction result takes values over a wide range; for example, the predicted value of the damage may be 0.72 or 0.83, so we need a multiple regression function for the analysis. In effect, this is a process of array assignment: we map the original variables to nodes in the hidden layer and then use those nodes for multiple linear regression. The original variables are not used directly for multiple regression again, and this is also the meaning of the hidden layer.
$$y = w_1 h_1 + w_2 h_2 + b_3 \tag{28}$$
In Formula (28), $w_1$ is the weight from layer 1 to layer 2, $w_2$ is the weight from layer 2 to layer 3, $h_1$ is the forward communication learning from layer 1 to layer 2, $h_2$ is the forward communication learning from layer 2 to layer 3, and $b_3$ is the residual.
We set the learning rate to α = 0.01, iterate for 1000 epochs, and output the cost function value every 10 iterations, so as to obtain the loss value as a function of the number of iterations. The cost function is
$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 \tag{29}$$
In Formula (29), $m$ is the number of training examples, and the goal is for $\hat{y}^{(i)}$ to be close to $y^{(i)}$ for all training pairs $(x^{(i)}, y^{(i)})$. The learning rate α is 0.01.
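A compact numpy sketch of this training setup, under the assumption that the hidden activations are fixed and only the output layer of Formula (28) is fitted by gradient descent on the cost of Formula (29); all names are ours:

```python
import numpy as np

def train_output_layer(h, y, alpha=0.01, epochs=1000, log_every=10):
    """Fit y ~ w.h + b by gradient descent on the cost J of Formula (29).

    h: (m, k) hidden-layer activations; y: (m,) targets.
    Returns weights, bias, and the cost values logged every `log_every` epochs.
    """
    m, k = h.shape
    w = np.zeros(k)
    b = 0.0
    costs = []
    for epoch in range(epochs):
        y_hat = h @ w + b                  # linear output, Formula (28)
        err = y_hat - y
        J = (err ** 2).sum() / (2 * m)     # cost, Formula (29)
        if epoch % log_every == 0:
            costs.append(J)
        w -= alpha * (h.T @ err) / m       # gradient step on dJ/dw
        b -= alpha * err.mean()            # gradient step on dJ/db
    return w, b, costs
```

With `epochs=1000` and `log_every=10`, exactly 100 cost values are logged, matching the output schedule described above.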

2.4. Building Point Cloud and 3D Matrix (Experiment 3)

In Experiment 3, we combine point cloud methods with matrix merging to simulate and reconstruct the aerostructure surface damage locations. Point cloud data usually have 3D spatial features and can therefore be represented as matrices [41]. First, we take the prediction results obtained by the FNN and use the difference between the true value and the predicted value as the elements of the damage matrix. The mathematical expression is
$$\Delta p = V_t - V_p \tag{30}$$
In Formula (30), $V_t$ is the true value, $V_p$ is the predicted value, and $\Delta p$ is their difference. We then take the difference $\Delta p$ as the matrix element $a$:
$$a_1 = \Delta p_1, \; \ldots, \; a_n = \Delta p_n \tag{31}$$
According to Formula (31), we create a set of column vectors:
$$\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n-1} \\ a_n \end{pmatrix} = \begin{pmatrix} \Delta p_1 \\ \Delta p_2 \\ \Delta p_3 \\ \vdots \\ \Delta p_{n-1} \\ \Delta p_n \end{pmatrix} \tag{32}$$
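The difference and assignment steps of Formulas (30)–(32) amount to a few array operations; the numeric values below are purely illustrative:

```python
import numpy as np

# Formulas (30)-(32): element-wise difference between true and predicted
# values, assembled as a column vector of matrix elements a_i = delta_p_i.
v_true = np.array([0.80, 0.72, 0.95, 0.61])   # hypothetical true values
v_pred = np.array([0.78, 0.69, 0.91, 0.60])   # hypothetical FNN predictions
delta_p = v_true - v_pred                      # Formula (30): V_t - V_p
a = delta_p.reshape(-1, 1)                     # column vector of Formula (32)
```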
According to Formula (32), we use gradient descent to construct the matrix borders. The starting value is $\Delta p$, and the target value is $\lim_{n \to \infty} 0.4$; that is, as $n$ tends to infinity, the border value tends to 0.4, because this value is far beyond the limit that $\Delta p$ can reach and is therefore a safe value.
Then, we can obtain the left biases matrix as follows:
$$D_{L\,border} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 \end{pmatrix}$$
$$D_{L\,main} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \Delta p_3 & \cdots & \Delta p_{n-2} & \Delta p_{n-1} & \lim_{n\to\infty} 0.4 & \Delta p_n \end{pmatrix}$$
$$LD_{main+bord} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \Delta p_3 & \cdots & \Delta p_{n-2} & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix}$$
$$D_{L\,border} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \Delta p_1 & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix} \tag{33}$$
In Formula (33), $D_{L\,border}$ is the border matrix on both sides, which shows the outer extension region of the damage, and $D_{L\,main}$ is the core matrix, which shows the internal core region of the damage. $LD_{main+border}$ is the left biases matrix after merging. It is important to note that when two matrices are merged, the transition values must be continuous, so that the image obtained by matrix merging does not show topographic faults. The right biases matrix is as follows:
$$D_{R\,border} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \lim_{n\to\infty} 0.4 \end{pmatrix}$$
$$D_{R\,main} = \begin{pmatrix} \Delta p_1 & \lim_{n\to\infty} 0.4 & \Delta p_2 & \Delta p_3 & \cdots & \Delta p_{n-2} & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 \end{pmatrix}$$
$$RD_{main+bord} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \Delta p_3 & \cdots & \Delta p_{n-2} & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix}$$
$$D_{R\,border} = \begin{pmatrix} \Delta p_1 & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix} \tag{34}$$
In Formula (34), $D_{R\,border}$ is the border matrix on both sides, which shows the outer extension region of the damage, and $D_{R\,main}$ is the core matrix, which also shows the internal core region of the damage. $RD_{main+border}$ is the right biases matrix after matrix merging. From Formulas (33) and (34), the complete damage matrix can be derived. The whole damage matrix, formed by the left and right biases damage matrices, is as follows:
$$LD_{main+bord}\,RD_{main+bord} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \cdots & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix} \tag{35}$$
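A sketch of the border-merging idea of Formulas (33)–(35), assuming a border width of two entries and the constant border value 0.4; the helper name is ours:

```python
import numpy as np

BORDER = 0.4  # safe boundary value (lim as n -> infinity in Formulas (33)-(35))

def pad_with_border(delta_p, width=2, border=BORDER):
    """Merge the core delta_p vector with constant border entries on both
    sides, producing the whole damage row of Formula (35)."""
    delta_p = np.asarray(delta_p, dtype=float)
    pad = np.full(width, border)
    return np.concatenate([pad, delta_p, pad])
```

Because the border value is constant, the transition at the merge point is continuous, which is the "no topographic faults" condition noted above.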
Next, according to Formula (33), the left biases matrix $LD_{main+border}$ is transposed to $LD_{main+border}^{T}$:
$$LD_{main+bord} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \Delta p_3 & \cdots & \Delta p_{n-2} & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix}, \quad LD_{main+bord}^{T} = \begin{pmatrix} \lim_{n\to\infty} 0.4 \\ \lim_{n\to\infty} 0.4 \\ \Delta p_1 \\ \Delta p_2 \\ \Delta p_3 \\ \vdots \\ \Delta p_{n-2} \\ \Delta p_{n-1} \\ \Delta p_n \\ \lim_{n\to\infty} 0.4 \\ \lim_{n\to\infty} 0.4 \end{pmatrix} \tag{36}$$
According to Formula (34), the right biases matrix $RD_{main+border}$ is transposed to $RD_{main+border}^{T}$:
$$RD_{main+bord} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \Delta p_3 & \cdots & \Delta p_{n-2} & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix}, \quad RD_{main+bord}^{T} = \begin{pmatrix} \lim_{n\to\infty} 0.4 \\ \lim_{n\to\infty} 0.4 \\ \Delta p_1 \\ \Delta p_2 \\ \Delta p_3 \\ \vdots \\ \Delta p_{n-2} \\ \Delta p_{n-1} \\ \Delta p_n \\ \lim_{n\to\infty} 0.4 \\ \lim_{n\to\infty} 0.4 \end{pmatrix} \tag{37}$$
According to Formula (35), the whole damage matrix $LD_{main+border}\,RD_{main+border}$ is transposed to $\left( LD_{main+border}\,RD_{main+border} \right)^{T}$:
$$LD_{main+bord}\,RD_{main+bord} = \begin{pmatrix} \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 & \Delta p_1 & \Delta p_2 & \cdots & \Delta p_{n-1} & \Delta p_n & \lim_{n\to\infty} 0.4 & \lim_{n\to\infty} 0.4 \end{pmatrix}, \quad \left( LD_{main+bord}\,RD_{main+bord} \right)^{T} = \begin{pmatrix} \lim_{n\to\infty} 0.4 \\ \lim_{n\to\infty} 0.4 \\ \Delta p_1 \\ \Delta p_2 \\ \vdots \\ \Delta p_{n-1} \\ \Delta p_n \\ \lim_{n\to\infty} 0.4 \\ \lim_{n\to\infty} 0.4 \end{pmatrix} \tag{38}$$
The 3D matrices of Formulas (36)–(38) construct the point clouds. The next step is mapping the point cloud onto the aerostructure surface and then transforming it into heat map images (Figure 11). Figure 11 shows how the predicted values are used to construct the point clouds, which are then mapped onto the aerostructure surface. During the reconstruction process, we reconstruct the corresponding coordinates for the different parts of the aircraft to obtain more accurate mapping results. Finally, we integrate the simulated damage locations distributed over each part of the aircraft to obtain an aerostructure surface damage heat map image. These images are then ready for the further steps of convolutional image network recognition and Fourier-transform-based curved surface simulation.
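One plausible way to sketch the point-cloud construction from a merged damage row; the outer-product step is our simplification for illustration, not the authors' exact mapping:

```python
import numpy as np

def damage_point_cloud(merged_row):
    """Build a point cloud from a merged damage row and its transpose.

    The outer product of the row with its transpose (cf. Formulas (36)-(38))
    gives a 2D intensity grid; each grid cell becomes one (x, y, z) point,
    which can then be mapped to aircraft-part coordinates and rendered as a
    heat map. A simplified sketch under assumed conventions.
    """
    row = np.asarray(merged_row, dtype=float)
    grid = np.outer(row, row)                  # (n, n) intensity grid
    xs, ys = np.meshgrid(np.arange(row.size), np.arange(row.size),
                         indexing="ij")
    # one point per grid cell: (x index, y index, intensity)
    cloud = np.column_stack([xs.ravel(), ys.ravel(), grid.ravel()])
    return grid, cloud
```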

3. Results

In this work, three experiments were used to answer the following three questions: First, can the different kinds of damage to aerostructure surfaces be effectively recognized by machine learning? Second, when the damage reappears, can probability and location effectively be predicted according to the results of classification by machine learning? Third, can the machine learning predicted results be reconstructed by data visualization and 3D simulation?

3.1. Experiment 1: Machine Learning Recognition—Different Kinds of Damage

In order to verify the hypothesis, we chose two different classifier models to recognize one kind of damage that appeared with high frequency. According to the previous statistical analysis, for Alpha 1 and Charlie 2, the damage labeled "Dent" accounted for 50% of all recorded damage. This result shows that over the maintenance life-cycle of an aircraft, "Dent" damage, which is caused by collisions, is a common occurrence. Therefore, we experimented with distinguishing this kind of damage from the others using machine learning. But can the machine learning models also effectively distinguish other kinds of damage? To answer this question, we chose another aircraft, Bravo 3, in which the label "Damage" accounted for nearly 50% of records, for validation testing. The validation results showed that the machine learning classification is effective and can also distinguish other kinds of damage. Moreover, comparing the performance of different models against each other is another way to verify the reliability of the classification results. Usually, the choice of classifier depends on the data features. In this work, as the data acquired a linear relationship during preprocessing, the SVM is a good choice, because the SVM performs well on linear problems. At the same time, we chose the k-nearest neighbor (KNN) model for classification performance comparison and found that it not only classified the data effectively but also achieved a higher classification accuracy than the SVM. Therefore, with cross-validation of different models, performance comparison, and recognition of different kinds of damage, the results obtained are more rigorous.

3.1.1. Presentation of Different Models’ Classification Performance

Figure 12 shows the classification accuracy obtained from the different classification models and different kinds of damage. In Figure 12, the horizontal axis shows the number of data, and the vertical axis shows the classification accuracy. As the data grew continuously, the classification accuracy of machine learning also increased. Here, the increase in data quantity was mainly caused by the stepping superposition: when the forward stepping started, the data links led by each node were superimposed, which increased the data quantity and made the data structure more and more complex. The growth in data is directly proportional to the rise in classification accuracy, indicating that the machine learning models learned the features of the data and grasped its underlying rules during this period. In Figure 12, for Alpha 1 and Charlie 2, respectively, the classification accuracy ranges for the damage labeled "Dent" are SVM: 41.7–64.7% and 46–71.1% (growth rates 55% and 55%); KNN: 46.2–85.2% and 47.1–74.7% (growth rates 84% and 58%). The classification results obtained by different models for the same kind of damage are satisfactory. For Bravo 3, the classification accuracy range for the damage labeled "Damage" is SVM: 53.8–74.9% (growth rate 39%); KNN: 55.7–84.3% (growth rate 51%). The models obtain comparable classification results for another damage label. This preliminary result indicates that machine learning can be used for the classification of this kind of data. In Figure 12, the rise in classification accuracy shows a positive correlation with the growth in the number of data, which also indicates that during this period the machine learning is in a dynamic state and exhibits a linear relationship with the increasing data quantity. Therefore, in the next steps, we need to analyze these results.
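The growth rates quoted above follow from the relative growth (end − start) / start; a quick check:

```python
# Relative accuracy growth as quoted for Figure 12: (end - start) / start,
# expressed as a rounded percentage. Start/end values taken from the text.
def growth_rate(start, end):
    """Percentage growth of classification accuracy from start to end."""
    return round(100 * (end - start) / start)

alpha1_svm = growth_rate(41.7, 64.7)   # Alpha 1: 41.7% -> 64.7%
alpha1_knn = growth_rate(46.2, 85.2)   # Alpha 1: 46.2% -> 85.2%
bravo3_svm = growth_rate(53.8, 74.9)   # Bravo 3: 53.8% -> 74.9%
```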

3.1.2. Using Curve Fitting to Analyze the Mechanism of Damage Recognition

According to the preliminary analysis, the machine learning classification accuracy has a positive correlation with the growth in the number of data generated through simulated stepping superposition. To further explore the relationship between these two variables, we use curve fitting, specifically a logistic fit and a linear fit. The advantage of this approach is that it is simple to describe and explain [42]. At the same time, it can also be used to predict and analyze the future trend of changes in the classification results [43].
Figure 13 shows the logistic fit results. The equation is
$$y = A_2 + \frac{A_1 - A_2}{1 + (x/x_0)^{p}} \tag{39}$$
In Formula (39), $A_2$ is the maximum value and $A_1$ is the minimum value. From Figure 13, the R-square (COD) and adjusted R-square values are: Alpha 1, SVM 95% and KNN 98%; Charlie 2, SVM 97% and KNN 96%; Bravo 3, SVM 96% and KNN 99%. After fitting, the R-square (COD) and adjusted R-square values remain above 90%, which means the fit is successful. In addition, in the residual plots, the fitting results are consistent with the relationships between the regular residuals of the classification accuracy (Acc) and the independent variable, fitted Y value, count, and percentile (Figure 13). Figure 13 also shows the relationship between the fitted curve and the raw curve: after fitting, the new curve essentially covers the raw curve. According to the projected curve trend, during the early stage of increasing data quantity, the SVM and KNN learning rates are smooth. When the quantity of data reaches a certain threshold, the learning rate becomes very high, and as the data quantity continues to increase, the learning ability of the models tends to saturate. After the increase has finished, the classification accuracy obtained is the final result. It should be noted that as classification nears the end, the accuracy does not decrease, which means there is no information loss; at the same time, this indicates that the models' learning ability has improved. In Figure 13, the Acc regular residuals–percentile chart shows the fit relationship: the percentiles are linearly distributed and essentially lie on a straight line, growing positively with the regular residuals of Acc. While nonlinear curve fitting can fit the raw curve with the greatest accuracy, it also has certain limitations. Therefore, a comprehensive analysis cannot be limited to a single fitting method; different fitting methods need to be compared and analyzed before drawing conclusions.
In this work, in addition to the nonlinear curve fitting method, we also used a linear fitting method for analysis (Figure 14). Figure 14 shows the SVM and KNN classification accuracy results obtained by linear fitting for Alpha 1, Charlie 2, and Bravo 3. Unlike the nonlinear curve fit, the linear fit has a 95% confidence band and a 95% prediction band. From these bands, we can intuitively observe and analyze the accuracy of the classification results and their predicted spread trend; the bands are also used to verify the accuracy of the result. As shown in Figure 14, the core function of a linear fit is a straight line, expressed as a linear function:
$$y = a + bx \tag{40}$$
In Formula (40), $a$ is the intercept and $b$ is the gradient. The magnitude of the gradient $b$ is positively correlated with the learning rate of the model: when $b$ is larger, the model's learning rate is higher and its learning ability stronger. All raw curves and data points must be covered by the 95% confidence band and 95% prediction band. The width of the band reflects the degree of convergence of the distributed data points: a narrower band means higher convergence and smaller dispersion of the data, while a wider band means lower convergence and larger dispersion, i.e., higher data volatility and stronger variability in the accuracy of the classification results. In Figure 14, the linear fit line and the band zone cover all the data points. The linear fit successfully fits the raw data, which means the classification accuracy results also have a linear correlation: classification accuracy increases with the number of data, showing a positive correlation. In addition, in Figure 14, the fit relationship of the Acc regular residuals–percentile charts is consistent with Figure 13; they all show a linear positive correlation.
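A small sketch of the linear fit of Formula (40) on synthetic accuracy-versus-data values; the data below are assumed, not taken from Figure 14:

```python
import numpy as np

# Linear fit y = a + b*x (Formula (40)) of classification accuracy against
# the number of data; np.polyfit with degree 1 returns gradient b, then
# intercept a.
x = np.array([100.0, 200.0, 300.0, 400.0, 500.0])  # number of data (assumed)
y = 0.40 + 0.0005 * x                              # accuracy rising linearly
b, a = np.polyfit(x, y, 1)

# R-squared of the fit, analogous to the values reported for Figures 13-14
y_hat = a + b * x
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1.0 - ss_res / ss_tot
```

On exactly linear data the fit recovers the gradient and intercept and R-squared is essentially 1, mirroring the >90% fits reported above.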
The nonlinear curve fitting and linear fitting results reveal that the linear regression of the data again depends strongly on the number of data. We analyzed the classification accuracy results from the different models and reached the same conclusion. According to the fitting results, the learning and classification mechanisms of the SVM and KNN are positively correlated with the increase in data: as the data quantity gradually increased, the SVM and KNN learning rates also increased. During the primary stage, the learning rate was low; when the number of data reached a certain inflection point, the learning rate rose significantly; and finally, as the number of data continued to increase, the models' learning ability tended to saturate. Thus, we obtain a relatively stable classification accuracy.
Therefore, the final classification accuracies of Alpha 1, Charlie 2, and Bravo 3 are 65%, 71%, and 75% for the SVM and 85%, 75%, and 84% for KNN. In addition, Figure 13 and Figure 14 show that the KNN classification performance was higher than the SVM classification performance throughout the whole stepping process. Therefore, in the next step, we need to use rigorous statistics to validate and analyze the SVM and KNN classification performances.

3.1.3. Performance Validation of the Classifier

In order to test the classification performance of the different machine learning models, we used different kinds of damage for validation. We chose the SVM and KNN models for comparison. The classification accuracy of each model was obtained through testing, training, and cross-validation (see Section 2). According to the fitting analysis in the last section, there are some differences in classification performance between the SVM and KNN; therefore, the classification accuracy obtained is analyzed with statistical methods in this section. Figure 15 shows the classification results of all 15,842 epochs for Alpha 1, Charlie 2, and Bravo 3. In cross-validation, we chose a training-to-testing ratio of 80:20 to avoid having data with the same labels concentrated in one dataset. The classification accuracy results are presented as scatter distributions (mean ± 1 SD). Figure 15A shows the mean value and SD range of the classification accuracy obtained by the SVM model as learning and testing proceeded for each aircraft. For Alpha 1, the mean classification accuracy is between 0.55 and 0.6, and when the same learning and testing methods are applied to Charlie 2, the mean rises to 0.6–0.65. For Alpha 1 and Charlie 2, the classification target is mainly the damage data labeled "Dent". When classifying the damage data of Bravo 3, the classification target changed, and the final mean classification accuracy reached 0.65–0.7. We used the same method for training and testing with KNN (Figure 15B). The final classification accuracy shows that the mean value remains between roughly 0.65 and 0.8. Compared with the SVM mean of 0.55–0.7, KNN is about 0.1 higher in both maximum and minimum values.
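A minimal numpy sketch of the 80:20 shuffled split and a KNN majority vote, intended only to illustrate the evaluation step, not to reproduce the authors' pipeline; all names are ours:

```python
import numpy as np

def knn_accuracy(X, y, k=3, test_ratio=0.2, seed=0):
    """Shuffled 80:20 split (so samples with the same label are spread
    across train and test), then a minimal k-nearest-neighbor classifier;
    returns test accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    correct = 0
    for i in test_idx:
        dists = np.linalg.norm(X[train_idx] - X[i], axis=1)  # Euclidean
        votes = y[train_idx][np.argsort(dists)[:k]]          # k nearest labels
        correct += int(np.bincount(votes).argmax() == y[i])  # majority vote
    return correct / n_test
```

On well-separated synthetic clusters the held-out accuracy is perfect; on the real damage data it would land in the 0.65–0.8 band described above.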
Figure 16 shows the second set of statistical results for Alpha 1, Charlie 2, and Bravo 3. The classification accuracy obtained from the different models was analyzed using box plots. In the box plots, we define the upper quartile Q1 as the upper box edge and the lower quartile Q3 as the lower box edge; the box spans the 25–75% percentiles, and the whisker coefficient is 1.5 × IQR. The box width is 30%, and the wire cap length is 50. Thus, 50% of the data lie within the box, and the median Q2 is located in the part with the most scatter. In Figure 16A, during the first two classifications, the SVM box plot results show that the core of the box lies between Q3: 0.53 and Q1: 0.68; in the last classification, it lies between Q3: 0.61 and Q1: 0.74. The overall box distribution is consistent with Figure 15A. In Figure 16B, during the first two classifications, the KNN box plot results show that the core of the box lies between Q3: 0.56 and Q1: 0.84; in the last classification, between Q3: 0.59 and Q1: 0.84. The overall box distribution is consistent with Figure 15B. In Figure 16A,B, the SVM and KNN classification results are normal, with no outliers. In Figure 16C, to compare the two models' performances, we combine the SVM and KNN boxes in one chart. It can be seen that the span of the SVM box from Q3 to Q1 is small, meaning that the data distribution is more concentrated and convergent, while the span of the KNN box from Q3 to Q1 is large, meaning that its data distribution is discrete. However, as concluded from the mean value analysis of Figure 15, the KNN classification accuracy values are larger than those of the SVM. We therefore face a trade-off: the SVM classification accuracy shows a concentrated distribution and its results have high reliability, but its accuracy values are not as good as KNN's.
With KNN, the data distribution is discrete, but the corresponding classification accuracy values are better than the SVM's. The question is now whether the results of KNN are more reliable than those of the SVM; this should be verified by further statistical analysis [44]. The second statistical analysis therefore shows that while the SVM and KNN each have their own advantages, they also have their own shortcomings. Both models perform well, and the statistical analysis shows that their classification accuracy results each have their own strengths.
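The box-plot settings described above can be reproduced with percentiles and the 1.5 × IQR whisker rule; the accuracy values below are illustrative only, not taken from the figures:

```python
import numpy as np

# Box-plot statistics as configured for Figure 16: box edges at the 25th and
# 75th percentiles, whiskers at 1.5 x IQR, values outside the whiskers
# flagged as outliers.
acc = np.array([0.53, 0.56, 0.60, 0.62, 0.65, 0.68, 0.70, 0.74])
q25, q50, q75 = np.percentile(acc, [25, 50, 75])
iqr = q75 - q25
lower_whisker = q25 - 1.5 * iqr
upper_whisker = q75 + 1.5 * iqr
outliers = acc[(acc < lower_whisker) | (acc > upper_whisker)]
```

With no points beyond the whiskers, `outliers` is empty, matching the "no outliers" observation for Figure 16A,B.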
Figure 17 shows the statistical analyses of Alpha 1, Charlie 2, and Bravo 3 as violin plots. There are two purposes for using violin plots: first, the classification accuracy results of the two models can be compared intuitively; second, kernel density can be used to explore and infer the distribution range within which new classification accuracies may fall when the data quantity increases further. In Figure 17A,B, the widest border of the violin plot is 70%, the bandwidth is extended automatically, the statistics are mean ± 1 SD, and the scatter distribution is in an unsettled state. The whole violin distribution is kernel smoothed. In each chart, we use a semi-violin plot combined with a semi-scatter plot to better express the connection between the inferred kernel density and the scatter distribution density. In Figure 17A, the widest border ranges of the SVM violins are located at 0.6–0.7, 0.65–0.75, and 0.7–0.8, which means that the largest kernel density ranges are also located in these regions; further analysis shows that at these three locations, the corresponding scatter distribution is also the most concentrated. Therefore, we can use the mean values to infer the maximum kernel densities as 0.62, 0.66, and 0.73 (in the charts, the solid dot marks the overall mean value; the maximum kernel density zone is the location of the hollow dot above the solid dot). Similarly, in Figure 17B, the maximum kernel density ranges are 0.8–0.9, 0.7–0.8, and 0.8–0.9, with corresponding values of 0.84, 0.74, and 0.83. We then fuse the analysis results of the two models into one chart in Figure 17C for comparison, keeping the parameters unchanged and normalizing the violin curves so that the kernel density distributions are represented more smoothly and accurately. The kernel density locations are slightly adjusted after normalization of the distribution curves.
This may be caused by the function's judgement, because the overall mean and the scatter distribution both affect it. The new kernel density ranges are SVM: 0.5–0.65, 0.5–0.7, and 0.6–0.75; KNN: 0.55–0.9, 0.5–0.75, and 0.6–0.85. According to the new kernel density distribution ranges, the results of the SVM and KNN converge. Unlike the box plot analysis in Figure 16, the normalized kernel density distributions of the SVM and KNN tend toward equality, which means that within the new density distribution ranges, the dispersion and convergence of the data are similar. This is beneficial for KNN, because in the previous analysis the KNN data distribution was discrete, and this problem is resolved by kernel density normalization, usually at the cost of a portion of the classification accuracy value. In Figure 17C, however, the kernel density distribution range of the KNN classification accuracy is still higher than that of the SVM. Although the kernel density distribution is normalized, the KNN values do not decrease to the level of the SVM, which again shows that the classification results of KNN are considerably better than those of the SVM.
Figure 18 shows the analysis of all SVM and KNN classification accuracies using density plots. In Figure 18, the horizontal axes represent the SVM, the vertical axes represent KNN, and the color shade represents the data point density; the density parameter is defined by the discreteness of the scatter distribution of Acc. We used these two axes to build an enclosed quadrant, setting the lower left corner as the origin of the coordinate system. In the upper right corner of the quadrant, a large amount of high-density shading appears, which means that almost all of the SVM and KNN classification accuracies are concentrated in the high-value region and that the classification capabilities of the models are very strong. In this analysis, the classification accuracy results of Alpha 1 and Charlie 2 were not separated from Bravo 3, because in the previous statistical analyses the models' classification accuracy results for these two kinds of damage were similar; they can therefore be analyzed together. In the following receiver operating characteristic (ROC) curve analysis, we likewise no longer analyze these two different kinds of damage separately. In addition, since the ROC curve analysis is mainly used to judge whether the classification results are reliable, the merged analysis does not affect the corresponding results.
Figure 19 shows the classification accuracy results of the 15,842 epochs obtained by the SVM and KNN. In Figure 19, the horizontal axis represents 1 − specificity (the false positive rate), and the vertical axis represents sensitivity (the true positive rate). Within the quadrant enclosed by the coordinate axes, the closer the curve approaches the upper left corner, the closer the results are to the true values (Figure 19A,B). In order to represent the area under the curve (AUC) range more clearly, we use a ribbon radiation pattern for the plot (Figure 19C,D): the deeper the color band, the closer to the true value, so the radiation diffuses from the upper left corner. The ratio of positive state to negative state is 25:143, and the high value corresponding to the positive state is the test direction. The diagonal reference line runs through the image center, and the standard error with a confidence level of 95% is used. According to the ROC curve analysis, for the SVM, the AUC is 0.89, the standard error is 0.05, the asymptotic probability is <0.0001, the 95% LCL is 0.79, and the 95% UCL is 0.98. For KNN, the AUC is 0.88, the standard error is 0.08, the asymptotic probability is 6.73542 × 10−4, the 95% LCL is 0.72, and the 95% UCL is 1.03 (Table 2). Both AUCs exceed 0.5, and both curves lie above the diagonal reference line, biased toward the upper left corner of the quadrant. In addition, the AUCs of 0.89 and 0.88 are quite close to the high-accuracy standard of 0.9, indicating that the generalization ability and classification accuracy of these two models are strong, with good application value.
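The AUC values above can be understood through the rank-sum identity; a small self-contained sketch (ours, not the paper's tooling):

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity:
    AUC = P(score of a positive > score of a negative), ties counted as 1/2.
    Illustrates the AUC > 0.5 reliability check applied to Figure 19."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # pairwise comparison; fine for small positive:negative counts like 25:143
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

An AUC of 0.5 corresponds to the diagonal reference line (random guessing); values near 0.9, as reported for the SVM and KNN, indicate strong discrimination.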

3.2. Experiment 2: The AI Prediction Solution of Aviation

In Experiment 2, we chose a third machine learning model for the prediction of aerostructure surface damage. According to the results of Experiment 1, we found that a neural network model is more suitable for training and testing on this kind of data. During data preprocessing, we used the simulated stepping to give the data links a forward transmission capability; therefore, in Experiment 2, it is most appropriate to choose a neural network that also has a forward transmission capability for prediction. The feedforward neural network (FNN) has forward transitivity based on the time course: in each layer of the network, the information transmission direction is one-way forward. This feature is a great advantage in dealing with time- and event-related problems. The simulated stepping we designed has the same character: when the stepping transmission starts, each beat, corresponding to one clock, is transmitted forward, and this forward transmission of time is irreversible, just as past time never reappears in reverse. Therefore, in Experiment 2, we chose the FNN for aerostructure surface damage prediction.

3.2.1. Performance of Model Learning Ability—Loss Function

Figure 20 shows the loss values of the FNN cost function during the prediction of some parts of Alpha 1, Charlie 2, and Bravo 3. We chose a learning rate of α = 0.01 and iterated 1000 times over the 4616, 5387, and 5839 epochs of Alpha 1, Charlie 2, and Bravo 3, respectively, outputting the cost function value every 10 iterations. The loss values and iteration counts obtained are shown in Figure 20. The results show that in each test, the loss values decrease steadily, which means that the chosen learning rate is correct and that the model is constrained during learning, showing a linear downward trend; the information loss during model learning is within the normal range. Since every 10 iterations form one output, one unit on the horizontal axis in Figure 20 represents 10 iterations, so the 1000 iterations yield 100 outputs in total.

3.2.2. Prediction Results and Assignments

Figure 21 shows the prediction results of the FNN, comparing the predicted values with the true values. The horizontal axis shows the epochs of the different aircraft parts, and the vertical axis shows the codes of the different kinds of damage. When we obtained the FNN prediction results, we simultaneously assigned values (Table 3). We define the classification accuracy value from the relation between the predicted value and the true value (i.e., the p value divided by the T value, multiplied by 100%). Figure 21 shows the predicted results for a few aircraft parts; the region enclosed by the two green dotted lines contains the target prediction parts. When predicting for a given part, the distance between the true value (red triangle) and the predicted value (blue square) is small, with the two almost covering each other. However, the distances between the predicted and true values for the other parts are large, which means their predictions are not very consistent. It should be noted that the vertical axis value represents a new kind of damage label obtained by assignment from the FNN predictions. When the predicted value is closer to the neighboring target value, it is closer to the true value, and the model's predictions are better. Conversely, when the predicted value is far from the neighboring target value, the prediction correlates less strongly with the true value, and the model's prediction performance is poor. Therefore, when we make a prediction for a given aircraft part, the predicted value obtained should, in theory, be close to the neighboring target value, which is close to the true value. At the same time, we also obtain predictions for the other parts located far from the given part.
The predicted results for the other parts are usually not good, meaning they do not correlate well with the true values. However, special cases occasionally appear in which, outside the target prediction section, a high predicted value is also close to the true value. This phenomenon is mainly caused by each part of the aircraft being divided into LH and RH, because the left and right sides of the aircraft are symmetrical. For example, when we predict the damage on the upper left wing, the same high predicted value may also occur on the upper right wing. This is normal, because when the FNN makes a prediction for a certain part of the aircraft, the damage over the whole aircraft is predicted at the same time; compared with the other parts, the right wing has features most similar to the target left wing, so it is actually reasonable for the model to produce a high prediction there.
Table 3 quantifies the prediction results of Figure 21 using numerical values. The table covers all parts of Alpha 1, Charlie 2, and Bravo 3. Some entries are blank because those parts have no damage history; since the machine learning features are trained and tested on existing data, the model cannot make predictions for parts with no historical damage records. In Table 3, the predicted and true values of the FNN damage label are produced by the model's own calculations, whereas the classification labels are defined before training and testing. For most parts, the classification accuracy derived from the T and p values lies in the 80% to 90% range; LH-fuselage-AFT and LH-fuselage-FWD are lower, at 76.7% and 79.3%. The classification accuracies of LH-wings-lower, RH-wings-lower, and RH-wings-upper are very high, reaching 99.1%, 99.1%, and 99.3%. The differences in classification accuracy between parts are likely caused by differences in data quantity, which in turn reflect the actual number of damage occurrences. In practice, the wing area is larger than the fuselage area, so the wing is more easily damaged during flight; both the number of damage occurrences and the occurrence rate on the wing are much greater than on the fuselage.
Finally, in Experiment 2, we analyzed the results of using the FNN to predict the damage of one aircraft. The purpose of this experiment was to verify whether machine learning, based on historical records, can be used to predict aerostructure surface damage and ultimately serve prevention. The results of Experiment 2 are therefore important, acting as the bridge to the next stage of the work. In summary, the results of Experiment 2 show that machine learning can make effective predictions of aerostructure surface damage based on historical records and part locations. This result also suggests the possibility of using the predicted results obtained by machine learning to reconstruct aerostructure surface damage.

3.3. Experiment 3: The Simulation Solution of Aerostructure Surface Damage

The aerostructure surface damage simulation method can be summarized as follows: based on the predicted difference values, we build a 3D (three-dimensional) matrix, then simulate the damage and map it onto the surface of the aerostructure model. Essentially, the aerostructure surface simulation problem is a mapping problem. Considering changes in spatial location, a new coordinate position system must be reconstructed for each part to match the constructed simulated damage. In this work, we use reconstruction for aerostructure surface damage simulation and mapping. The problem can then be summarized simply as determining how to superimpose two image layers (Figure 22). The first image layer is a 3D damage simulation layer, built from the predicted difference values. The second image layer is an aerostructure surface location mapping layer, built from the Fly away mods. All coordinate position determinations and part divisions follow the main systems of Station lines (STA 0.0), Water lines (WL), and Butt lines (BL) used by Boeing and Airbus. Finally, by superimposing and merging these two layers, we obtain the new reconstruction layer.
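The two-layer superposition can be sketched as a blend of two rasters on a common grid. This is a minimal illustration only: the grid size, the array names, and the alpha-blend rule are our assumptions, since the paper does not specify the merging arithmetic.

```python
import numpy as np

# Assume both layers are rasterized onto a common STA/BL grid of shape (H, W).
H, W = 64, 128
damage_layer = np.zeros((H, W))           # 3D damage simulation layer (flattened)
damage_layer[20:30, 40:60] = 1.0          # a hypothetical simulated damage patch
mapping_layer = np.full((H, W), 0.5)      # surface location mapping layer (background)

alpha = 0.7                               # blend weight for the damage layer
reconstruction = alpha * damage_layer + (1 - alpha) * mapping_layer
```

Cells inside the damage patch dominate the merged layer, while the rest of the reconstruction retains the surface mapping background.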

3.3.1. Machine Learning Predicted Result—Guided 3D Damage Simulation

Usually, the textual descriptions in historical damage records capture the detailed repair information from the maintenance period and can be consulted by engineers for reference [45]. Machine learning has strong learning and prediction capabilities, which can quantify this maintenance damage information. However, it also has its own flaws: since its output is typically a numerical value, building an effective connection back to the raw information, so as to better express the internal relevance, is very important [46,47]. Therefore, considering that both numerical values and textual representations are abstract, in this experiment we propose to combine machine learning prediction results with data visualization for 3D reconstruction. The advantages of using machine learning prediction results to reconstruct aerostructure surface damage are as follows. Firstly, the predicted results can be represented more intuitively through 3D matrix data visualization, helping maintenance engineers and other users better understand their meaning [45]. Secondly, the 3D data image built from the predicted results can also be used in the future for higher-dimensional curved surface fitting. In this way, it compensates for the information transmission deficiencies of traditional historical damage records. We performed 3D damage simulation experiments with data from Alpha 1, Charlie 2, and Bravo 3, and defined the 3D damage simulation border based on the FNN predicted values. The aim is to express the predicted values intuitively in 3D graphics.
Figure 23 and Figure 24 show the results of using a 3D matrix to simulate the damage, based on the FNN prediction results for Alpha 1, Charlie 2, and Bravo 3. The 3D topographic maps show the simulated damage graphics for each part of all three aircraft. The simulated damage graphics are constructed from the FNN predicted difference values, generated by subtracting the FNN predicted values from the damage truth values at the different locations in Table 3. These difference values are the basic elements from which the point cloud is constructed. We found that the point cloud forming the new simulated damage graphics surrounds the predicted values: it is centered on the predicted values and decreases sequentially toward the periphery. This is because we used gradient descent to define the damage border.
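A patch of this kind, peaking at the predicted location and falling off toward the border, can be sketched as follows. The Gaussian falloff and all parameters are our assumptions for illustration; the paper only states that the point cloud decreases sequentially to the periphery.

```python
import numpy as np

def damage_patch(center_value: float, size: int = 21, sigma: float = 4.0) -> np.ndarray:
    """Simulated damage patch: peak at the center, decaying toward the border."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return center_value * np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

# center_value stands in for one predicted difference value from Table 3
patch = damage_patch(center_value=2.5)
```

The maximum sits at the central cell and every surrounding cell is bounded by it, which mirrors the topographic shape visible in Figures 23 and 24.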
In Figure 23 and Figure 24, we compressed the 3D simulated damage graphics. After compression, the raw 3D image becomes a flat layer that retains the 3D information [48,49,50], and is then further transformed into the compressed layer. The compressed layer is the 3D damage simulation layer that we need. The main purpose of compression is to convert the raw 3D structure into a 2D structure without losing the third-dimension information, which is still shown in color. This result makes it possible to present the machine learning predicted values directly as a visualization, and it plays an important role in the subsequent reconstruction process.
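One simple reading of this compression step is flattening a voxelized 3D volume into a 2D layer whose cell values carry the third-dimension information (rendered as color when plotted). Taking the maximum along the depth axis is one possible reduction; the volume below is hypothetical.

```python
import numpy as np

volume = np.zeros((16, 16, 8))      # (rows, cols, depth) damage voxels, illustrative
volume[5:8, 5:8, :4] = 1.0          # a small simulated damage region

# Collapse the depth axis; the 2D layer keeps the depth info as cell intensity,
# which becomes color when the layer is plotted.
compressed = volume.max(axis=2)
```

The result is a 2D array, i.e., the "flat layer with 3D information" described above, ready to be used as the damage simulation layer.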

3.3.2. Machine Learning Predicted Result—Guided Aerostructure Surface Damage Reconstruction

Figure 25 and Figure 26 show the results of aerostructure surface damage reconstruction based on machine learning prediction. Using the 3D damage simulation topographic map constructed from the FNN prediction results, we map the first image layer onto the second image layer. In each image layer, the horizontal and vertical axes of the reconstructed coordinate system are represented by two yellow lines. In the first image layer, a deeper red color indicates a smaller difference between the p value and the T value, a higher classification accuracy, and a greater probability that the damage will reappear. The damage simulation can therefore describe and explain the probability that damage may reappear at the same location in the future. The reconstructed location remains guided by the machine learning prediction results. According to the reconstruction results, damage on the wings usually occurs on the slats and flaps. These damage instances are usually located on the border of the wing, with dents easily caused by various collisions. On the fuselage, damage usually occurs at doors, windows, and the lower fuselage; these dents are usually caused by collisions during cargo handling, cart movement, and similar operations.
Figure 27 and Figure 28 show the heat map results of the simulated damage and reconstructed locations. The purposes of using a heat map are as follows: first, to improve the recognizability of the reconstruction results; second, to prepare for image recognition based on AI deep learning [51]; third, to support curved surface simulation by the Fourier transform. In Figure 27 and Figure 28, the damage simulations and location reconstructions for the parts of all three aircraft are transformed into heat maps. Independent analysis of each part shows that the heat map presentation of the target damage is good and basically consistent with the locations in the raw images.
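The heat map transformation can be sketched as normalizing the reconstructed layer to [0, 1] and mapping it through a color ramp. This is a hedged illustration only: the paper's figures presumably use a plotting library's colormap, and the simple blue-to-red ramp below is our choice.

```python
import numpy as np

def to_heatmap(layer: np.ndarray) -> np.ndarray:
    """Map a 2D layer to RGB: red rises with intensity, blue falls, green is zero."""
    lo, hi = layer.min(), layer.max()
    norm = (layer - lo) / (hi - lo) if hi > lo else np.zeros_like(layer)
    return np.stack([norm, np.zeros_like(norm), 1.0 - norm], axis=-1)

layer = np.array([[0.0, 0.5], [1.0, 0.25]])   # hypothetical reconstructed values
rgb = to_heatmap(layer)                        # shape (2, 2, 3), values in [0, 1]
```

The hottest cell becomes fully red and the coldest fully blue, giving the kind of immediately recognizable damage map shown in Figures 27 and 28.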

4. Discussion

The purposes of this study are as follows: first, to test whether aerostructure surface damage can be classified and predicted by machine learning; second, to determine whether the prediction results can be simulated and mapped by reconstruction. After analyzing the raw data features and exploring the classification and verification of the different kinds of damage, we obtained a 3D simulated damage matrix. Based on the prediction results of the machine learning model, reconstructing the aerostructure surface damage became possible. Our results reveal that machine learning predictions not only provide a quantified prediction value but can also be used to simulate and map the spatial location. To investigate the data features, we designed a series of experiments. These simulated stepping experiments revealed the correlation between the items in the data, as evidenced by the Pearson correlation coefficient. We also provided evidence that the data links based on simulated stepping exhibit feedforward transitivity and a linear mechanism. It is important to note that these experiments can support and strengthen the internal logical connections of the data during data augmentation. They reveal the specificity of the different classifications, across different aircraft and different kinds of damage, under machine learning. They confirmed the relationship between the FNN time course and the maintenance life-cycle, firmly linking times to events. Finally, they simulated and mapped the damage locations of these results, showing that machine learning prediction results can indeed be represented in other forms. In general, our three experiments support the exploration of aerostructure surface damage digital simulation through machine learning, which reconstructs the reference location of the predicted result from a 3D matrix of point cloud simulation.
The feedforward transmission stage of the data links reflects the spatial location of damage during forward stepping. It therefore appears that the nodes of the data links carry some kind of connection across the temporal and spatial dimensions. One piece of evidence supports this hypothesis: if the superposition of the forward stepping of new nodes increases the data quantity, the correlation between the data items is enhanced. However, this significantly increases the complexity of the data information and its construction. We find that data augmentation causes the correlation coefficient at each new starting node to increase linearly.
The aerostructure surface damage simulation is constructed from the difference between the p value and the T value. In our experiments, we did not directly use the p value as the element for constructing the 3D matrix. This is consistent with the mechanism by which the prediction result is obtained, namely by learning and testing with machine learning on existing data. Usually, the p value is obtained based on the T value. Although the p value could be chosen directly as an element for constructing the 3D damage simulation matrix, machine learning assigns a p value indiscriminately to all objects, and the predicted p values outside the target prediction region are of no use for this study.
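The design choice above can be made concrete: seeding the matrix with the difference T − p (rather than p alone) naturally down-weights cells outside the target prediction region, where p is assigned indiscriminately. All values and the weighting scheme below are hypothetical illustrations.

```python
import numpy as np

t_values = np.array([10.0, 12.0, 8.0])    # true values for three cells
p_values = np.array([9.0, 11.5, 3.0])     # last cell lies outside the target region

differences = t_values - p_values          # small difference = reliable cell
# One possible weighting: reliable cells (small |T - p|) carry more weight
weights = 1.0 / (1.0 + np.abs(differences))
```

Under this scheme, the out-of-target cell with the large difference contributes least to the simulation matrix, consistent with the argument above.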
From the overall experimental results, we found that several systems work independently of each other. For example, after obtaining the machine learning prediction results, the process of using them to simulate the 3D damage matrix required human calculation for assistance. In general, human intervention is necessary to control the direction the process takes, unless the initial population of data is disproportionately large. Although we have discovered the conversion mechanism, the calculation process is still quite complex. Therefore, only by solving the generalization problem between these independent systems can we reach a highly effective application state.
In conclusion, through the simulated stepping of data augmentation and the machine learning classification of different kinds of damage, we provide direct evidence that machine learning can predict aerostructure surface damage and guide the simulated mapping and location reconstruction.

5. Conclusions

Previous scientific studies have explored aircraft engine and blade damage repair and simulation based on DT [52,53,54,55,56,57,58,59]. However, there is no clear conclusion on the prevention of aerostructure surface damage. It is therefore important to investigate the mechanism by which aerostructure surface damage occurs and to predict the probability that it may reappear in the future [60,61]. Our study specifically examines the digital reconstruction of aerostructure surface damage; these digital reconstructions further explore the damage types and result prediction. Based on the experimental results, the conclusions are as follows: (1) Machine learning can effectively recognize the different kinds of descriptive damage data that have internal logical relationships. (2) Based on the machine learning recognition results, the same data can be effectively predicted. (3) Under the guidance of the machine learning prediction results, the 3D damage simulation matrix can be mapped onto the aerostructure surface model through reconstruction.
Compared with previous studies, the main innovations of this work are the following. Through encoding, the raw damage text description records were transformed into a digital language that could be understood, and successfully recognized and predicted, by machine learning. Through reconstruction, the machine learning prediction results were presented as heat map images overlaid on images of the aircraft, which can be easily and intuitively understood by aircraft maintenance engineers and other Maintenance, Repair, and Overhaul (MRO) users. In the future, with additional and more varied damage data, the number and variety of heat maps will also increase. The heat maps generated in this work can be used to build a damage prediction image gallery, which can in turn provide raw research material for image recognition based on AI deep learning. At the same time, by fusing the AI image recognition model with the machine learning prediction model, it is possible to build a large model in which the AI image recognition results enhance the robustness of the machine learning prediction model; as the prediction model's efficiency improves, the accuracy of the generated images is also upgraded, forming a closed inner loop inside the large model for self-training and testing. The reconstructed heat maps from this work also open the possibility of simulated nondestructive inspection in the future. We will use Fourier transforms of the 3D damage matrix for curved surface simulation to show the damage extent. In the future, the results of this work will provide basic and useful information for further research in the field of aerostructure surface damage.

Author Contributions

Y.W.: Writing—original draft, visualization, software, methodology, formal analysis. Y.X. and A.M.: Writing—review and editing, writing—original draft, visualization, validation, software, methodology, investigation, funding acquisition, formal analysis, conceptualization. H.P.T. and R.V.: Writing—review and editing, writing—original draft, visualization, validation, supervision, software, project administration, methodology, investigation, funding acquisition, formal analysis, conceptualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Innovation and Technology Commission (Project ITS/017/22FP)”.

Data Availability Statement

The data underlying this article cannot be shared publicly due to being proprietary data from an airline shared through an aircraft OEM.

Acknowledgments

The present work was supported by the Innovation and Technology Commission (Project ITS/017/22FP), Hong Kong Polytechnic University (Aviation Services Research Centre), The Boeing Company, Hong Kong Aircraft Engineering Co Ltd. (HAECO), and Hong Kong Aero Engine Services Ltd. (HAESL).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Xiong, M.; Wang, H. Digital twin applications in aviation industry: A review. Int. J. Adv. Manuf. Technol. 2022, 121, 5677–5692. [Google Scholar] [CrossRef]
  2. Hartwell, A.; Montana, F.; Jacobs, W.; Kadirkamanathan, V.; Ameri, N.; Mills, A.R. Distributed digital twins for health monitoring: Resource constrained aero-engine fleet management. Aeronaut. J. 2024, 128, 1556–1575. [Google Scholar] [CrossRef]
  3. Hofmann, S.; Duhovnikov, S.; Schupke, D. Massive data transfer from and to aircraft on ground: Feasibility and challenges. IEEE Aerosp. Electron. Syst. Mag. 2021, 36, 6–14. [Google Scholar] [CrossRef]
  4. Badea, V.; Zamfroiu, A.; Boncea, R. Big Data in the Aerospace Industry. Inform. Econ. 2018, 22, 17–24. [Google Scholar]
  5. Qi, Q.; Tuegel, E. Digital twin and big data towards smart manufacturing and industry 4.0: 360 degree comparison. IEEE Access 2018, 6, 3585–3593. [Google Scholar] [CrossRef]
  6. Mabkhot, M.; Al-Ahmari, A.; Salah, B.; Alkhalefah, H. Requirements of the smart factory system: A survey and perspective. Machines 2018, 6, 23. [Google Scholar] [CrossRef]
  7. Phanden, R.K.; Sharma, P.; Dubey, A. A review on simulation in digital twin for aerospace, manufacturing and robotics. Mater. Today Proc. 2020, 38, 174–178. [Google Scholar] [CrossRef]
  8. Zaccaria, V.; Stenfelt, M.; Aslanidou, I.; Kyprianidis, K.G. Fleet Monitoring and Diagnostics Framework based on Digital Twin of Aero-Engines. In Proceedings of the ASME Turbo Expo, Oslo, Norway, 11–15 June 2018. [Google Scholar] [CrossRef]
  9. Depold, H.R.; Siegel, J. Using diagnostics and prognostics to minimize the cost of ownership of gas turbines. In Proceedings of the ASME Turbo Expo 2006: Power for Land, Sea and Air, Barcelona, Spain, 8–11 May 2006. Paper number GT2006-91183. [Google Scholar]
  10. Gorinevsky, D.; Matthews, B.; Martin, R. Aircraft Anomaly Detection using Performance Models Trained on Fleet Data. In Proceedings of the 2012 Conference on Intelligent Data Understanding, Boulder, CO, USA, 24–26 October 2012. [Google Scholar] [CrossRef]
  11. Li, L.; Gariel, M.; Hansman, R.J.; Palacios, R. Anomaly detection in onboard-recorded flight data using cluster analysis. In Proceedings of the 30th IEEE/AIAA Digital Avionics Systems Conference (DASC), Seattle, WA, USA, 16–20 October 2011; pp. 4A4-1–4A4-11. [Google Scholar] [CrossRef]
  12. Uzun, M.; Demirezen, M.U.; Koyuncu, E.; Inalhan, G. Design of a Hybrid Digital-Twin Flight Performance Model Through Machine Learning. In Proceedings of the 2019 IEEE Aerospace Conference, Big Sky, MT, USA, 2–9 March 2019; pp. 1–14. [Google Scholar] [CrossRef]
  13. Khadilkar, H.; Balakrishnan, H. Estimation of aircraft taxi fuel burn using flight data recorder archives. Transp. Res. Part D Transp. Environ. 2012, 17, 532–537. [Google Scholar] [CrossRef]
  14. Chati, Y.S.; Balakrishnan, H. A Gaussian process regression approach to model aircraft engine fuel flow rate. In Proceedings of the 2017 ACM/IEEE 8th International Conference on Cyber-Physical Systems (ICCPS), Pittsburgh, PA, USA, 18–21 April 2017; pp. 131–140. [Google Scholar] [CrossRef]
  15. Li, L.; Aslam, S.; Wileman, A.; Perinpanayagam, S. Digital Twin in Aerospace Industry: A Gentle Introduction. IEEE Access 2022, 10, 9543–9562. [Google Scholar] [CrossRef]
  16. King, S.P.; Mills, A.R.; Kadirkamanathan, V. Equipment Health Monitoring in Complex Systems; Artech House: London, UK, 2017; ISBN 1630814970/9781630814977. [Google Scholar]
  17. Zhou, L.; Wang, H.; Xu, S. Aero-engine gas path system health assessment based on depth digital twin. Eng. Fail. Anal. 2022, 142, 106790. [Google Scholar] [CrossRef]
  18. Li, J.; Zhao, G.; Zhang, P.; Xu, M.; Cheng, H.; Han, P. A Digital Twin-based on-site quality assessment method for aero-engine assembly. J. Manuf. Syst. 2023, 71, 565–580. [Google Scholar] [CrossRef]
  19. Huang, Y.; Tao, J.; Sun, G.; Wu, T.; Yu, L.; Zhao, X. A novel digital twin approach based on deep multimodal information fusion for aero-engine fault diagnosis. Energy 2023, 270, 126894. [Google Scholar] [CrossRef]
  20. Sun, R.; Shi, L.; Yang, X.; Wang, Y.; Zhao, Q. A coupling diagnosis method of sensors faults in gas turbine control system. Energy 2020, 205, 117999. [Google Scholar] [CrossRef]
  21. Huang, Y.; Sun, G.; Tao, J.; Hu, Y.; Yuan, L. A modified fusion model-based/data-driven model for sensor fault diagnosis and performance degradation estimation of aero-engine. Meas. Sci. Technol. 2022, 33, 85105. [Google Scholar] [CrossRef]
  22. Tadepalli, N.V.R.S.; Koona, R. Gas turbine aero engine fault detection using Geo-TLSVM and digital twin with multimodal data analysis. Eng. Res. Express 2024, 6, 15523. [Google Scholar] [CrossRef]
  23. Stevens, R. Digital Twin for Spacecraft Concepts. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; pp. 1–7. [Google Scholar] [CrossRef]
  24. Alizadehsalehi, S.; Hadavi, A.; Huang, J.C. From BIM to extended reality in AEC industry. Autom. Constr. 2020, 116, 103254. [Google Scholar] [CrossRef]
  25. Shin, J.H.; Park, S.J.; Kim, M.A.; Lee, M.J.; Lim, S.C.; Cho, K.W. Development of a Digital Twin Pipeline for Interactive Scientific Simulation and Mixed Reality Visualization. IEEE Access 2023, 11, 100907–100918. [Google Scholar] [CrossRef]
  26. McClellan, A.; Lorenzetti, J.; Pavone, M.; Farhat, C. A physics-based digital twin for model predictive control of autonomous unmanned aerial vehicle landing. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 2022, 380, 20210204. [Google Scholar] [CrossRef]
  27. Fawke, A.J.; Saravanamuttoo, H.I.H. Digital computer simulation of the dynamic response of a twin-spool turbofan with mixed exhausts. Aeronaut. J. 1973, 77, 471–478. [Google Scholar] [CrossRef]
  28. Xiong, M.; Wang, H.; Fu, Q.; Xu, Y. Digital twin–driven aero-engine intelligent predictive maintenance. Int. J. Adv. Manuf. Technol. 2021, 114, 3751–3761. [Google Scholar] [CrossRef]
  29. Baur, M.; Albertelli, P.; Monno, M. A review of prognostics and health management of machine tools. Int. J. Adv. Manuf. Technol. 2020, 107, 2843–2863. [Google Scholar] [CrossRef]
  30. Lu, Y.; Liu, C.; Kevin, I.; Wang, K.; Huang, H.; Xu, X. Digital twin driven smart manufacturing: Connotation, reference model, applications and research issues. Robot. Comput.-Integr. Manuf. 2020, 61, 101837. [Google Scholar] [CrossRef]
  31. Daily, J.; Peterson, J. Predictive maintenance: How big data analysis can improve maintenance. In Supply Chain Integration Challenges in Commercial Aerospace; Richter, K., Walther, J., Eds.; Springer: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
  32. Lee, J.; Wang, H. New technologies for maintenance. In Complex System Maintenance Handbook; Springer Series in Reliability Engineering; Springer: London, UK, 2008. [Google Scholar] [CrossRef]
  33. Ghorbani, H.; Khameneifar, F. Construction of damage-free digital twin of damaged aero-engine blades for repair volume generation in remanufacturing. Robot. Comput.-Integr. Manuf. 2022, 77, 102335. [Google Scholar] [CrossRef]
  34. Xu, C.; Gui, X.; Zhao, Y. Digital Twin-Assisted Multiview Reconstruction Enhanced Domain Adaptation Graph Networks for Aero-Engine Gas Path Fault Diagnosis. IEEE Sens. J. 2024, 24, 21694–21705. [Google Scholar] [CrossRef]
  35. Engel, A.K.; Fries, P. Beta-band oscillations-signalling the status quo? Curr. Opin. Neurobiol. 2010, 20, 156–165. [Google Scholar] [CrossRef]
  36. Li, L.; Li, C.; Tang, Y.; Du, Y. An integrated approach of reverse engineering aided remanufacturing process for worn components. Robot. Comput.-Integr. Manuf. 2017, 48, 39–50. [Google Scholar] [CrossRef]
  37. Wilson, J.M.; Piya, C.; Shin, Y.C.; Zhao, F.; Ramani, K. Remanufacturing of turbine blades by laser direct deposition with its energy and environmental impact analysis. J. Clean. Prod. 2014, 80, 170–178. [Google Scholar] [CrossRef]
  38. Su, C.; Han, Y.; Tang, X.; Jiang, Q.; Wang, T.; He, Q. Knowledge-based digital twin system: Using a knowledge-driven approach for manufacturing process modeling. Comput. Ind. 2024, 159–160, 104101. [Google Scholar] [CrossRef]
  39. Su, C.; Jiang, X.; Huo, G.; Zou, Q.; Zheng, Z.; Feng, H.-Y. Accurate model construction of deformed aero-engine blades for remanufacturing. Int. J. Adv. Manuf. Technol. 2020, 106, 3239–3251. [Google Scholar] [CrossRef]
  40. Patro, S.; Sahu, K.K. Normalization: A preprocessing stage. arXiv 2015, arXiv:1503.06462. [Google Scholar] [CrossRef]
  41. Khameneifar, F.; Feng, H.-Y. Establishing a balanced neighborhood of discrete points for local quadric surface fitting. Comput. Aided Des. 2017, 84, 25–38. [Google Scholar] [CrossRef]
  42. Zheng, J.; Li, Z.; Chen, X. Worn area modeling for automating the repair of turbine blades. Int. J. Adv. Manuf. Technol. 2006, 29, 1062–1067. [Google Scholar] [CrossRef]
  43. Zhang, X.; Cui, W.; Liou, F. Voxel-Based Geometry Reconstruction for Repairing and Remanufacturing of Metallic Components Via Additive Manufacturing. Int. J. Precis. Eng. Manuf.-Green Technol. 2021, 8, 1663–1686. [Google Scholar] [CrossRef]
  44. Martinsson, J.; Panarotto, M.; Kokkolaras, M.; Isaksson, O. Exploring the potential of digital twin-driven design of aero-engine structures. In Proceedings of the Development of Efficient DIgital Product FAMily Design Platform to Increase Cost Efficiency-DIFAM, Gothenburg, Sweden, 16–20 August 2021; Volume 1, pp. 1521–1528. [Google Scholar] [CrossRef]
  45. Gao, J.; Chen, X.; Yilmaz, O.; Gindy, N. An integrated adaptive repair solution for complex aerospace components through geometry reconstruction. Int. J. Adv. Manuf. Technol. 2008, 36, 1170–1179. [Google Scholar] [CrossRef]
  46. Yilmaz, O.; Gindy, N.; Gao, J. A repair and overhaul methodology for aeroengine components. Robot. Comput.-Integr. Manuf. 2010, 26, 190–201. [Google Scholar] [CrossRef]
  47. Praniewicz, M.; Kurfess, T.; Saldana, C. An Adaptive Geometry Transformation and Repair Method for Hybrid Manufacturing. J. Manuf. Sci. Eng. 2019, 141, 0110061. [Google Scholar] [CrossRef]
  48. Yan, C.; Wan, W.; Huang, K.; Liu, L.; Lee, C.-H. A reconstruction strategy based on CSC registration for turbine blades repairing. Robot. Comput.-Integr. Manuf. 2020, 61, 101835. [Google Scholar] [CrossRef]
  49. Wu, B.; Zheng, H.; Wang, J.; Zhang, Y. Geometric model reconstruction and CNC machining for damaged blade repair. Int. J. Comput. Integr. Manuf. 2020, 33, 287–301. [Google Scholar] [CrossRef]
  50. Selvarajan, S.; Manoharan, H.; Shankar, A.; Khadidos, A.O.; Khadidos, A.O.; Galletta, A. PUDT: Plummeting uncertainties in digital twins for aerospace applications using deep learning algorithms. Future Gener. Comput. Syst. 2024, 153, 575–586. [Google Scholar] [CrossRef]
  51. Fotland, G.; Haskins, C.; Rølvåg, T. Trade study to select best alternative for cable and pulley simulation for cranes on offshore vessels. Syst. Eng. 2020, 23, 177–188. [Google Scholar] [CrossRef]
  52. Björnsson, B.; Borrebaeck, C.; Elander, N.; Gasslander, T.; Gawel, D.R.; Gustafsson, M.; Jörnsten, R.; Lee, E.J.; Li, X.; Lilja, S.; et al. Digital twins to personalize medicine. Genome Med. 2019, 12, 4. [Google Scholar] [CrossRef] [PubMed]
  53. Terkaj, W.; Gaboardi, P.; Trevisan, C.; Tolio, T.; Urgo, M. A digital factory platform for the design of roll shop plants. CIRP J. Manuf. Sci. Technol. 2019, 26, 88–93. [Google Scholar] [CrossRef]
  54. Coronado, P.D.U.; Lynn, R.; Louhichi, W.; Parto, M.; Wescoat, E.; Kurfess, T. Part data integration in the Shop Floor Digital Twin: Mobile and cloud technologies to enable a manufacturing execution system. J. Manuf. Syst. 2018, 48, 25–33. [Google Scholar] [CrossRef]
  55. Zhuang, C.; Liu, J.; Xiong, H. Digital twin-based smart production management and control framework for the complex product assembly shop-floor. Int. J. Adv. Manuf. Technol. 2018, 96, 1149–1163. [Google Scholar] [CrossRef]
  56. Bilberg, A.; Malik, A.A. Digital twin driven human–robot collaborative assembly. CIRP Ann. 2019, 68, 499–502. [Google Scholar] [CrossRef]
  57. Cai, Y.; Wang, Y.; Burnett, M. Using augmented reality to build digital twin for reconfigurable additive manufacturing system. J. Manuf. Syst. 2020, 56, 598–604. [Google Scholar] [CrossRef]
  58. Zheng, Y.; Yang, S.; Cheng, H. An application framework of digital twin and its case study. J. Ambient. Intell. Humaniz. Comput. 2019, 10, 1141–1153. [Google Scholar] [CrossRef]
  59. Coraddu, A.; Oneto, L.; Baldi, F.; Cipollini, F.; Atlar, M.; Savio, S. Data-driven ship digital twin for estimating the speed loss caused by the marine fouling. Ocean Eng. 2019, 186, 106063. [Google Scholar] [CrossRef]
  60. Grieves, M.; Vickers, J. Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems. In Transdisciplinary Perspectives on Complex Systems; Kahlen, F.J., Flumerfelt, S., Alves, A., Eds.; Springer: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
  61. Glaessgen Edward, H.; Stargel, D.S. The Digital Twin Paradigm for Future NASA and U.S. Air Force Vehicles. In Proceedings of the 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Honolulu, HI, USA, 23–26 April 2012; p. 1818. Available online: https://ntrs.nasa.gov/search.jsp?R=20120008178 (accessed on 3 December 2024).
Figure 1. Aerostructure surface damage digital reconstruction framework.
Figure 1. Aerostructure surface damage digital reconstruction framework.
Aerospace 12 00072 g001
Figure 2. Raw data encoding process.
Figure 2. Raw data encoding process.
Aerospace 12 00072 g002
Figure 3. Simple statistical analysis.
Figure 3. Simple statistical analysis.
Aerospace 12 00072 g003
Figure 4. Pearson correlation coefficient analysis.
Figure 4. Pearson correlation coefficient analysis.
Aerospace 12 00072 g004
Figure 5. Simulated stepping and simulated data links.
Figure 5. Simulated stepping and simulated data links.
Aerospace 12 00072 g005
Figure 6. The growth rate of correlation coefficients after simulated stepping superimposition.
Figure 6. The growth rate of correlation coefficients after simulated stepping superimposition.
Aerospace 12 00072 g006
Figure 7. The process of data training, testing, and classifying.
Figure 7. The process of data training, testing, and classifying.
Aerospace 12 00072 g007
Figure 8. The decision-making role of the k-value in KNN analysis.
Figure 9. The result of k-value parameter tuning.
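The decision role of the k-value (Figure 8) and its tuning (Figure 9) can be illustrated with a minimal majority-vote KNN in NumPy. The two-feature records and cluster positions below are hypothetical stand-ins for the encoded damage data, not the paper's dataset.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training records."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest records
    votes = np.bincount(y_train[nearest])         # count votes per class label
    return int(np.argmax(votes))

# Hypothetical 2-D encoded damage records: class 0 clusters near (0, 0),
# class 1 clusters near (5, 5).
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [4.8, 5.2], [5.2, 4.9]])
y = np.array([0, 0, 0, 1, 1, 1])

query = np.array([4.9, 5.0])
for k in (1, 3, 5):
    print(k, knn_predict(X, y, query, k))
```

Odd k avoids two-class ties; the tuning in Figure 9 amounts to sweeping k and comparing held-out classification accuracy.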
Figure 10. Feedforward neural network framework.
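The feedforward structure of Figure 10 reduces to a chain of matrix operations. This sketch uses a single sigmoid hidden layer with random illustrative weights; the paper's actual layer sizes and trained weights are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fnn_forward(x, W1, b1, W2, b2):
    """One hidden layer with sigmoid activation, linear output."""
    h = sigmoid(W1 @ x + b1)   # hidden activations
    return W2 @ h + b2         # predicted damage-location value

rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # one encoded damage record (4 features)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input -> 8 hidden units
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden -> 1 output
print(fnn_forward(x, W1, b1, W2, b2))
```

Training then iterates gradient updates on a loss over such forward passes, which is what the loss-vs-iteration curve of Figure 20 tracks.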
Figure 11. Point cloud building and mapping.
Figure 12. The classification results of the SVM and KNN for Alpha 1, Charlie 2, and Bravo 3.
Figure 13. The logistic fit result of the SVM and the FNN classification accuracy for Alpha 1, Charlie 2, and Bravo 3.
Figure 14. The linear fit results of the SVM and the FNN classification accuracy for Alpha 1, Charlie 2, and Bravo 3.
Figure 15. The mean value analysis of the SVM and KNN classification performances.
Figure 16. The box plot statistical analysis of the SVM and KNN for Alpha 1, Charlie 2, and Bravo 3.
Figure 17. The violin plot of the SVM and KNN for Alpha 1, Charlie 2, and Bravo 3.
Figure 18. The density analysis of classification results for the SVM and KNN.
Figure 19. ROC curve of the SVM and KNN.
Figure 20. The loss function value versus the number of iterations.
Figure 21. The prediction results of Alpha 1, Charlie 2, and Bravo 3.
Figure 22. The mapping process between the simulated 3D damage layer and the aerostructure surface location layer.
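The layer mapping of Figure 22 amounts to projecting predicted damage points onto a discretised surface. A sketch with a hypothetical point cloud on a unit wing patch, binned into a count grid that can serve as the heat-map layer of Figures 27 and 28:

```python
import numpy as np

# Hypothetical predicted damage points on an unwrapped wing surface
# (x along span, y along chord); not the paper's actual point cloud.
pts = np.array([[0.1, 0.2], [0.15, 0.25], [0.8, 0.9], [0.12, 0.22]])

# Map the point cloud onto a coarse surface grid: each cell counts the
# damage points that fall inside it, producing a heat-map layer.
heat, xedges, yedges = np.histogram2d(pts[:, 0], pts[:, 1],
                                      bins=4, range=[[0, 1], [0, 1]])
print(heat)
```

Colouring each cell by its count (or by a predicted damage intensity instead of a count) yields the thermal-style rendering used for the surface damage heat maps.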
Figure 23. The 3D damage simulation of the wings.
Figure 24. The 3D damage simulation of the fuselage and the stabilizer.
Figure 25. Wing and stabilizer surface damage reconstruction.
Figure 26. Fuselage surface damage reconstruction.
Figure 27. The surface damage thermal infrared imaging of the wing and the stabilizer.
Figure 28. The surface damage thermal infrared imaging of the fuselage.
Table 1. The raw textual content encoding.

ATA | Zone | Aircraft | Parts | Location | Damage Type
Fuselage = 53 | Fuselage lower = 100 | 0–9999 | LH = 1 | AFT = 1 | Others = 1
Doors = 52 | Fuselage top = 200 | | RH = 2 | FWD = 2 | Dent = 2
Wings = 57 | Doors = 800 | | | Fairing = 3 |
Nacelles = 54 | Left wing = 500 | | | Upper = 4 |
Stabilizers = 55 | Right wing = 600 | | | Lower = 5 |
 | Stabilizers = 300 | | | LH = 6 |
 | | | | RH = 7 |
 | | | | Vertical = 8 |
 | | | | Horizontal lower = 9 |
 | | | | Horizontal upper = 10 |
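Under one reading of Table 1's columns, a textual damage record maps to a numeric feature vector as follows. The sample records are hypothetical, and the aircraft-number field (0–9999) is omitted for brevity.

```python
# Encoding dictionaries transcribed from Table 1 (the raw textual content
# encoding).
ATA = {"Doors": 52, "Fuselage": 53, "Nacelles": 54, "Stabilizers": 55, "Wings": 57}
ZONE = {"Fuselage lower": 100, "Fuselage top": 200, "Stabilizers": 300,
        "Left wing": 500, "Right wing": 600, "Doors": 800}
PARTS = {"LH": 1, "RH": 2}
LOCATION = {"AFT": 1, "FWD": 2, "Fairing": 3, "Upper": 4, "Lower": 5,
            "LH": 6, "RH": 7, "Vertical": 8, "Horizontal lower": 9,
            "Horizontal upper": 10}
DAMAGE = {"Others": 1, "Dent": 2}

def encode(ata, zone, part, location, damage):
    """Turn one textual damage record into the numeric vector used for ML."""
    return [ATA[ata], ZONE[zone], PARTS[part], LOCATION[location], DAMAGE[damage]]

print(encode("Wings", "Left wing", "LH", "Upper", "Dent"))
# → [57, 500, 1, 4, 2]
```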
Table 2. ROC curve analysis results.

 | Area | Standard Error | Asymptotic Prob. | 95% LCL | 95% UCL
SVM | 0.88699 | 0.04975 | <0.0001 | 0.78949 | 0.9845
KNN | 0.87755 | 0.07862 | 6.73542 × 10⁻⁴ | 0.72347 | 1.03164
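The 95% limits in Table 2 are consistent with the normal approximation Area ± z·SE (z ≈ 1.96). A short sketch recovering them from the reported Area and Standard Error:

```python
from statistics import NormalDist

def auc_ci(area, se, level=0.95):
    """Normal-approximation confidence interval for an ROC area (AUC)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # ≈ 1.96 for a 95% interval
    return area - z * se, area + z * se

for name, area, se in [("SVM", 0.88699, 0.04975), ("KNN", 0.87755, 0.07862)]:
    lo, hi = auc_ci(area, se)
    print(f"{name}: [{lo:.5f}, {hi:.5f}]")
```

Note the KNN upper limit exceeds 1 because the symmetric normal interval is not clipped to the valid [0, 1] range of an AUC.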
Table 3. The predicted value and true value of the FNN.

Location | Alpha 1 (True / Predict / ACC) | Charlie 2 (True / Predict / ACC) | Bravo 3 (True / Predict / ACC)
LH-wings-upper | −0.08 / −0.07 / 0.90 | −0.15 / −0.19 / 0.79 | −0.26 / −0.31 / 0.84
LH-wings-lower | −0.08 / −0.08 / 0.99 | −0.15 / −0.19 / 0.82 | −0.26 / −0.27 / 0.95
RH-wings-upper | −0.08 / −0.10 / 0.77 | −0.15 / −0.20 / 0.76 | −0.26 / −0.26 / 0.99
RH-wings-lower | −0.08 / −0.06 / 0.80 | −0.15 / −0.15 / 0.99 | −0.26 / −0.27 / 0.96
LH-fuselage-FWD | −0.08 / −0.06 / 0.79 | −0.15 / −0.15 / 0.98 | −0.26 / −0.28 / 0.92
LH-fuselage-AFT | −0.08 / −0.06 / 0.77 | −0.15 / −0.16 / 0.98 | —
RH-fuselage-FWD | — | −0.15 / −0.16 / 0.98 | —
RH-fuselage-AFT | −0.08 / −0.07 / 0.89 | −0.15 / −0.12 / 0.81 | −0.26 / −0.28 / 0.94
RH-door-AFT | — | −0.15 / −0.17 / 0.89 | —
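One way to summarise Table 3 is the mean ACC per aircraft. A sketch using the Alpha 1 column as transcribed above (rows with no Alpha 1 entry are omitted):

```python
# ACC values for Alpha 1 transcribed from Table 3.
alpha1_acc = [0.90, 0.99, 0.77, 0.80, 0.79, 0.77, 0.89]

mean_acc = sum(alpha1_acc) / len(alpha1_acc)
print(f"Alpha 1 mean ACC: {mean_acc:.3f}")   # → 0.844
```

The same aggregation over the Charlie 2 and Bravo 3 columns gives the per-aircraft comparison behind the abstract's 75.1–86% range discussion.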
Share and Cite

Wu, Y.; Tang, H.P.; Mannion, A.; Voyle, R.; Xin, Y. Using Machine Learning for Aerostructure Surface Damage Digital Reconstruction. Aerospace 2025, 12, 72. https://doi.org/10.3390/aerospace12010072