1. Introduction
Electrical machines have been intensively used in recent years in the automotive industry for applications that require high efficiency and reliability, shifting designers’ focus to the development of robust, high-performance electrical machines. The design process of an electrical motor is a complex task that must deal with the non-linearity caused by iron saturation at high magnetic field strength and with the multi-physical nature of the investigated system, as well as with requirements that may come into conflict. Conventionally, the design follows a sequential process: first, the electromagnetic targets are satisfied, then the stress and thermal aspects are analysed, leaving the noise, vibration and harshness (NVH) characteristics to the end. This means that at the end of the design process, only limited changes can be made to improve the noise and vibration characteristics. At the same time, since the design process is a multi-physical problem in which different domains are involved, experts from distinct fields must interact and contribute to achieve a robust, optimal design. Therefore, the electrical machine development process is an iterative and computationally expensive one. Moreover, the multi-physical nature of motor characteristics analysis requires a synergy between 2D FEA for electromagnetic and loss analysis and 3D FEA for structural characteristics. Hence, the overall time is increased by the use of finite element (FE) analysis, which is time-consuming and memory-intensive, even when parallelization techniques are used.
Traditionally, the process used to design a high-performance electrical machine is multi-objective machine optimisation [
1,
2]. The optimal design is obtained by automatically varying the geometric parameters within predefined limits, for imposed objectives and constraints. The design space exploration is conducted using optimisation algorithms [
3], the designer having the freedom to choose the objectives, constraints and parameter discretisation, selected based on the manufacturing capabilities.
Various studies in the state of the art focus on finding the optimal design of electrical machines. For economic reasons (e.g., the high cost of rare-earth materials) and due to the need for high power densities, cost optimisation procedures have gained popularity in recent years, the machine being optimised to meet the requirements at the lowest cost [
4]. Different motor topologies, including permanent magnet and synchronous reluctance machines, are analysed in order to select the best design at the lowest cost in [
5], while in [
6], a permanent magnet synchronous machine (PMSM) is optimised to meet the performance and cost demands, with a focus on high-volume mass production and its constraints. Another optimisation objective is presented in [
7], where the torque ripple is optimised to obtain a high and smooth torque. Nevertheless, the computational cost of an optimisation loop can increase drastically when a large number of machine designs are analysed. This is caused by the FE-based simulations conducted to evaluate the performance of the machine designs. Despite their well-known accuracy, FE-based simulations may limit the optimisation process due to their high computational cost (simulation times may vary from several minutes to several hours or even days [
8]). To overcome the discussed issues, fast models can be developed using machine learning models [
9], reducing the computational burden in the design stage, as most of the computations are carried out in the model-building phase. At the same time, several processes can be brought earlier in the design cycle. This way, the system’s performance and sensitivities can be identified in the concept stage, and the designer can decide whether the desired targets are met.
Machine learning models used for design, optimisation, fault detection and robustness evaluation have been among the main research interests in the field of electrical machines over the years. Some works have already focused on generating machine learning models that allow the replacement of time-consuming FEA and reduce the computational time. In [
10], a statistical analysis that uses multiple correlation coefficients has been used to generate a fast model that is able to replace the FE model and reduce the computational effort, whereas in [
11], the same objective is accomplished by using an artificial neural network. Another approach that uses online learning and dynamic weighting methods is presented in [
12]. Moreover, in [
3], the focus is on analysing the effectiveness of using machine learning models of electrical machines that incorporate tolerance or sensitivity aspects in a multi-objective optimisation run. Further works focusing on machine-learning-assisted multi-objective optimisation are presented in [
13], and in [
14]. A recent work presents a data-driven structural model for electrical machine airborne vibration, intended to be used both in the design stage, for optimisation purposes, and in system-level simulations [
15]. In [
16], a multi-physics simulation workflow based on reduced-order models, used to predict the vibro-acoustic behaviour of electrical machines and to decrease the computation time, is presented. The influence of mass-production tolerances modelled at system level and the interaction between the uncertainties and the drive’s components, together with the fitted machine learning models, can be found in [
1]. Fast prediction of the electromagnetic torque and flux linkages used in system-level simulations is accomplished using a machine learning electromagnetic model based on artificial neural networks in [
17]. Machine learning models employed to predict sound pressure levels are developed in [
18,
19], where it is proven that the developed models can be considered as replacements of the FEA for future design and optimization problems of the same motor.
However, none of the above examples focus on the multi-physical characteristics of electrical machines, but only on the prediction of a single physics (e.g., electromagnetic, thermal or structural characteristics). In [
20], a data-aided, deep learning meta-model based on cross-section image processing is created to quickly predict the multi-physical characteristics (e.g., maximum torque, critical field strength, costs of active parts, sound power) and to accelerate the full optimisation process. The accuracy of this method is highly dependent on the chosen hyper-parameter settings and on the precision of the input data. Even if the hyper-parameters can be selected based on a sensitivity analysis, the accuracy still depends on the precision of the input data. The proposed image-based method performs close to the parameter-based method presented in this paper only for an increased pixel resolution of the training data, meaning that the method proposed in the cited work is memory-intensive.
This paper proposes the use of geometric parametric models to evaluate the multi-physical performance of electrical machines and to build a multi-attribute machine learning model. The aim is to characterise the development process of a multi-attribute machine learning model that is able to predict the multi-physical characteristics of electrical machines. The obtained machine learning model is capable of quickly and accurately estimating the multi-attribute performance from the geometrical parameters (which represent the input data). At the same time, the model is suitable for inclusion in optimisation routines, to reduce the process time, and in system-level simulations, for fast predictions. The main difference between the presented machine learning model and the ones found in the literature is the multi-physical character of the developed model. The model can accurately estimate the torque and back electromotive force, the motor losses and the natural frequencies of the mode-shapes. Moreover, this article presents a way to simplify the machine learning problem by using a harmonic model to predict the electromagnetic quantities, reducing the computational burden in the training phase.
For that, the full process of designing the electrical motor is brought into the early design stage, where the characteristics (electromagnetic quantities, motor losses, mode-shapes and natural frequencies) can be evaluated and predicted by the designer. Therefore, the designer can arrive at the best solution, identifying the system’s capabilities and sensitivities in the concept phase, without involving experts from different domains. The proposed workflow is presented in
Figure 1. The data needed to build the multi-attribute model are obtained by conducting both 2D electromagnetic and 3D structural FE analyses on a set of motor models generated by imposing multiple design parameters on the reference parametric model. After that, the resulting data are harnessed using machine learning algorithms, creating one machine learning model per physics involved. One harmonic model for the electromagnetic torque and back emf is created by applying a Fourier decomposition to the electromagnetic quantities, reducing the size and the complexity of the model without a significant loss in accuracy [
21]. Two further models are built: one for the motor losses and one to predict the natural frequencies of the mode-shapes. The most suitable machine learning model for each target is chosen by testing different machine learning algorithms applied to multiple dataset sizes. Based on their capability to predict the multi-physical characteristics of electrical motors, the most accurate machine learning model for each physics is selected.
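As an illustration of the harmonic-model idea, the sketch below shows how the amplitude and phase of selected torque (or back emf) harmonic orders could be extracted with a Fourier decomposition. This is a minimal NumPy sketch, not the paper’s implementation; the waveform, the sampling resolution and the retained orders are assumptions.

```python
import numpy as np

def harmonic_targets(waveform, orders=(0, 6, 12)):
    """Amplitude and phase of selected harmonic orders from one
    electrical period of a sampled waveform."""
    n = len(waveform)
    spectrum = np.fft.rfft(waveform) / n                 # one-sided FFT, scaled
    amps, phases = [], []
    for k in orders:
        c = spectrum[k] if k == 0 else 2 * spectrum[k]   # double the non-DC bins
        amps.append(np.abs(c))
        phases.append(np.angle(c))
    return np.array(amps), np.array(phases)

# Hypothetical torque waveform: one electrical period, 360 samples
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
torque = 50.0 + 3.0 * np.cos(6 * theta + 0.4) + 1.0 * np.cos(12 * theta - 0.2)
amps, phases = harmonic_targets(torque)   # amps ~ [50, 3, 1]
```

The harmonic amplitudes and phases obtained this way become the discrete regression targets, so the machine learning model never has to reproduce the full time-domain waveform.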
The paper is structured as follows: the process of building the stator parametric model, together with the most important design parameters and their intervals of variation, is presented in
Section 2. Afterwards, the multi-physical FE analyses, which include electromagnetic simulations, loss computations and structural modal analysis, performed on the parametric model for a set of imposed design variables, are described in
Section 3.
Section 4 is dedicated to the multi-attribute machine learning model. Here, the prediction and fitting capabilities of different machine learning algorithms are investigated. Moreover, the impact of the training sample size is discussed.
Section 5 deals with a comparison between machine learning and FEA computational cost. The final conclusions are drawn in
Section 6.
2. Stator Parametric Model
At the beginning of the development process, the designer has information about the stator core, but not about the rest of the components. Hence, for the structural analysis, only the stator core is modelled, leaving the housing and the rotor core aside. The stator is the main energy transfer path and has the most important influence on the NVH characteristics: it is excited by the electromagnetic field and transfers the energy to the exterior, causing airborne noise. Therefore, in order to characterise the transfer path, a three-dimensional model is created within Simcenter 3D. The 3D stator model is constructed based on the geometric dimensions and material specifications of the electrical motor under study. Afterwards, the stator model is parameterised with the same degrees-of-freedom (DOFs) as in the electromagnetic optimisation process [
6], since DOF consistency is needed for the multi-physics optimisation routine.
Figure 2 shows the cross-section and the DOFs of the parameterised model, where TWS represents the tooth width, YT is the yoke thickness, SOAng stands for the tooth tip angle, TGD is the tooth tip height, SO represents the slot opening, R is the stator inner radius, and the remaining symbol denotes the stator outer radius.
The parameterised model allows the generation of a set of feasible designs by performing a design of experiments using specific sampling techniques. The most important design parameters, TWS, YT, SOAng, TGD and SO, are varied within imposed boundaries, while the stator length and the stator inner and outer radii are kept constant. Each parameter is individually varied within imposed limits, and the design space is filled with the help of the Latin Hypercube Sampling (LHS) technique. The variation interval of the considered DOFs, with its lower (LB) and upper (UB) boundaries, is presented in
Table 1.
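A minimal sketch of such a design-of-experiments step, using SciPy’s quasi-Monte-Carlo module, is given below; the bound values are placeholders, not the ones from Table 1.

```python
import numpy as np
from scipy.stats import qmc

# Design parameters varied in the DOE (bounds are placeholders, not Table 1)
names = ["TWS", "YT", "SOAng", "TGD", "SO"]
lb = np.array([3.0, 8.0, 20.0, 0.5, 1.5])    # lower bounds (LB)
ub = np.array([6.0, 14.0, 40.0, 1.5, 3.0])   # upper bounds (UB)

sampler = qmc.LatinHypercube(d=len(names), seed=0)
unit_samples = sampler.random(n=1000)         # LHS samples in [0, 1)^d
designs = qmc.scale(unit_samples, lb, ub)     # rescaled to [LB, UB] per DOF
```

Each row of `designs` is then fed to the parametric model to generate one feasible stator geometry for the FE analyses.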
The variable DOFs are chosen based on their impact on the structural characteristics and electromagnetic performance of the machine. The yoke thickness is the main contributor to the stator vibration response due to its direct influence on the stator stiffness. By increasing the yoke thickness, the stiffness increases, shifting the natural frequency corresponding to each mode-shape to a higher value. In particular, mode 0 (the breathing mode) is strongly influenced by the variation of this parameter. At the same time, the yoke thickness, together with the tooth width, has a massive impact on the electromagnetic characteristics by influencing the saturation levels of the electrical machine. The tooth width can also influence the structural behaviour of the motor by creating local tooth-bending modes. On the other hand, the variation of the tooth tip height, slot opening and tooth tip angle mainly influences the electromagnetic quantities. The electromagnetic flux density harmonics are affected by this variation and, indirectly, so are the structural vibration characteristics, since the source of vibration of the electrical motor is influenced.
4. Multi-Attribute Machine Learning Model Selection
A machine learning model is constructed by harnessing a set of system responses (output values) obtained by imposing a set of predictor variables (input parameters). Therefore, defining the input–output behaviour characteristics is essential. Then, based on the type of the obtained datasets (i.e., discrete or time-series), a proper machine learning method is chosen to process the data, which are divided into training, validation and test samples. Following that, the model is trained using the training samples and its accuracy is tested. However, the accuracy of the developed machine learning model depends on the selected machine learning algorithm, as well as on the size of the dataset used for training the model.
In this paper, three independent machine learning models are developed, one for each analysed physics: one for the electromagnetic torque and back emf harmonics, one for the motor losses and one for the structural targets (modes and natural frequencies). For that, three types of machine learning methods used for regression are tested on the available datasets exported from the 2D electromagnetic and 3D structural analyses. The tested methods, support vector regression (SVR), gradient boosting regressor (GBR) and Gaussian process regressor (GPR), were selected due to their capacity to predict discrete datasets. Moreover, their accuracy is tested for four training dataset sizes: 250, 500, 750 and 1000 samples. The Python Scikit-Learn library [
28] was used to implement the discussed regression models. The procedure used to train, validate and test the machine learning models is the same for each applied method. The training process uses 70% of the available dataset, while the remaining 30% is used for testing.
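In Scikit-Learn, that split might look as follows; `X` and `y` stand in for the exported FE datasets and are filled with placeholder values here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 5))   # geometric parameters (placeholder values)
y = rng.uniform(size=(1000, 6))   # FE targets, e.g., natural frequencies (placeholder)

# 70% of the dataset for training, the remaining 30% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
```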
4.1. Support Vector Regression (SVR)
Support vector regression is a popular machine learning model used for both classification and regression on continuous datasets. SVR is a supervised learning algorithm that uses the same principles as the Support Vector Machine and is suitable for the prediction of discrete values [
29]. The algorithm searches for the best line that can fit the data. SVR allows the introduction of an error limit, or tolerance, and fits the data with a line or a hyperplane (in multiple dimensions). The SVR objective is to minimise the $\ell_2$ norm of the coefficient vector, $\lVert w \rVert^2$, while the maximum error for each training sample $(x_i, y_i)$ is constrained to a limited variation, $|y_i - w^{T}x_i - b| \leq \varepsilon$.
When using the SVR algorithm, one of the most important aspects to consider is the dataset size. SVR is not suitable for training on large datasets because the training time increases more than quadratically with the number of samples [30] and becomes too computationally expensive. For large datasets, defined as more than 10,000 samples, SVR becomes infeasible due to the increased training time; in such cases, linear SVR or the SGD regressor can be used as substitutes for the classical SVR algorithm [31]. However, as the maximum size of the dataset used in this work is 1000 samples, the classical SVR algorithm is applied. Aside from this, SVR was chosen for this work due to its advantages for small datasets: it is robust to outliers, it has an excellent generalisation capability and a high prediction accuracy, and it is easy to implement.
Nevertheless, another aspect that must be considered when applying SVR to datasets is the ratio between the number of features and the number of training samples. When the ratio is greater than one, meaning that the number of features for each data point exceeds the number of training samples, the SVR algorithm will underperform. To avoid this, the dimension of the features can be reduced by applying Principal Component Analysis (PCA) [32] to extract the first n principal components with the highest corresponding eigenvalues. However, PCA-based feature extraction can lead to errors due to the loss of information, especially if the relationship between the features and the data is highly nonlinear. Another method to reduce the dimension of the features is to apply a feature selection algorithm [33] that allows the selection of the features most relevant to building the model. However, the dataset under investigation does not require any feature dimensionality reduction, because the number of features is much smaller than the number of samples. For the tested datasets, the number of features is always constant: nine for the electromagnetic datasets (i.e., torque harmonic amplitudes and phases and back emf harmonic amplitudes), six for the structural dataset and seven for the motor losses dataset, while the number of samples varies from 250 to 1000.
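Although no dimensionality reduction is needed here, a minimal sketch of the PCA step described above is shown for completeness; the data shape and the number of retained components are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))      # hypothetical case: 50 features per sample

pca = PCA(n_components=9)           # keep the 9 leading principal components
X_reduced = pca.fit_transform(X)    # components are sorted by eigenvalue
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```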
Regarding the hyper-parameters chosen to train the models, they were kept constant and equal to C = 3.2, with the default ‘rbf’ (radial basis function) kernel, for all cases.
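A minimal Scikit-Learn sketch with these settings follows; the training data are placeholders, and the input standardisation step is an assumption rather than a detail stated above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(700, 9))   # nine features, as for the EM dataset
y_train = rng.uniform(size=700)        # one scalar target, e.g., the 6th TH amplitude

# SVR with the hyper-parameters stated above (C = 3.2, 'rbf' kernel);
# SVR fits one scalar target at a time, so each harmonic gets its own model
svr = make_pipeline(StandardScaler(), SVR(C=3.2, kernel="rbf"))
svr.fit(X_train, y_train)
```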
4.2. Gradient Boosting Regressor (GBR)
Gradient boosting is one of the most popular machine learning algorithms for discrete datasets [
34]. Compared with linear models, the tree-based methods Random Forest and Gradient Boosting are less impacted by outliers [
35]. Gradient Boosting is a robust machine learning algorithm that combines both the Gradient Descent algorithm and Boosting [
36]. Gradient Boosting tries to optimise the mean squared error (MSE) or the mean absolute error (MAE).
Among its advantages, one can identify its capability to deal with missing data and its ability to fit the nonlinear relationship between the data and the features. At the same time, it allows the optimisation of different loss functions. The method trains multiple decision trees in a sequential process, starting from a tree with a weak prediction and improving the overall model’s prediction capacity by adding another decision tree that corrects the errors of the first. Therefore, the new model has a better performance, given by the combination of the two decision trees [37]. After that, the error (also known as the residual) of the model’s prediction is evaluated. If the error is not satisfactory, a new tree is introduced to better fit the data and reduce the error. The process is repeated for an imposed number of iterations in order to minimise the error and obtain a better prediction.
One of the main drawbacks is that GBR minimises all errors, including the ones caused by outliers, which can lead to overfitting. However, to address the overfitting issue, different methods, such as regularisation, setting a maximum depth and early stopping, can be chosen [
38,
39]. Another disadvantage is that this method is almost impossible to scale up, because each decision tree is trained based on the previous one and it is hard to parallelise the process. For processes that need to be scaled up, a scalable end-to-end tree boosting system called XGBoost is widely used by data scientists. XGBoost reduces the run time and scales to billions of examples in distributed or memory-limited settings [
40].
Considering that the datasets under test are discrete, with a non-linear relationship between the data and the features, the GBR algorithm is a suitable method to train a machine learning model. For the four datasets under test, the overfitting issue was avoided by choosing a maximum depth of three and an early-stopping criterion that halts the training process when the validation score has not improved for 20 iterations. The loss function imposed for the training process is the squared error.
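These settings map directly onto Scikit-Learn’s `GradientBoostingRegressor`, as in the sketch below; the training data are placeholders, and the internal validation fraction is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(700, 7))   # seven features, as for the losses dataset
y_train = rng.uniform(size=700)

gbr = GradientBoostingRegressor(
    loss="squared_error",      # squared-error loss, as stated above
    max_depth=3,               # maximum tree depth of three
    n_iter_no_change=20,       # stop if the validation score stalls for 20 iterations
    validation_fraction=0.1,   # assumption: default internal validation split
    random_state=0,
)
gbr.fit(X_train, y_train)
```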
4.3. Gaussian Process Regressor (GPR)
The Gaussian process is a type of machine learning model with applicability in solving regression and probabilistic classification problems [
41]. For this type of model, the covariance is parameterised using a kernel function, the most popular being the constant kernel and the squared exponential kernel, known as the radial basis function. The kernels give the shape of the prior and the posterior of the Gaussian process. The main advantages of GPR are its capability to work well on small datasets and its ability to provide uncertainty measurements on the predictions [
42].
As presented in the literature available on normative modelling, GPR has been used to characterise subject heterogeneity [
43,
44]. Normative modelling describes a group of methods able to quantify the deviation of an individual from the expected value. GPR has the capability to model this heterogeneity, but it is either hard to estimate the aleatoric uncertainty accurately when the data are sparse, or unnecessary to model the conditional variance when the data are dense [
44].
For the prediction of unseen values, GPR is a remarkably powerful machine learning algorithm. Because GPR needs a reduced number of parameters to make predictions, compared to other machine learning algorithms, it can solve a wide range of problems, even when a small number of data samples is used [45]. GPR becomes inefficient when the number of features of the dataset exceeds a few dozen. However, by analysing the structure of the data samples under test, it can be observed that the number of features is much lower (i.e., the maximum number of features is nine, for the electromagnetic dataset), so the GPR method is suitable to be applied to the datasets under test. Another concern is that Gaussian processes are computationally expensive, and it becomes infeasible to apply them to large datasets. Nevertheless, for small datasets, such as the ones under test, GP regression is still computationally reasonable. Aside from this, GPR was chosen due to its flexibility in implementation, the user being able to add prior knowledge and information about the model (e.g., smoothness, sparsity, differentiability) by selecting different kernel functions. The kernel type used in the algorithm is the radial basis function (rbf) kernel.
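A minimal sketch with this kernel choice is given below; the training data are placeholders, and the length-scale initialisation is an assumption.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(250, 6))   # six features, as for the structural dataset
y_train = rng.uniform(size=250)        # one scalar target, e.g., a natural frequency

# RBF kernel as stated above; the initial length scale is a placeholder and is
# optimised during fitting
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), random_state=0)
gpr.fit(X_train, y_train)
mean, std = gpr.predict(rng.uniform(size=(5, 6)), return_std=True)  # prediction + uncertainty
```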
4.4. Performance Indicators for Prediction Accuracy Evaluation
The fitting capability of each regression model is analysed by means of two statistical indicators. The first one, R-squared, or the coefficient of determination, measures the variation explained by the regression model. The model fits the data well when the R-squared score is high, while very low scores indicate underfitting issues. The model has perfect predictions when the R-squared score is 1.
Considering $\hat{y}_i$ as the predicted value of the $i$-th sample and $y_i$ as the corresponding true value, for a number of $n$ fitted points with the mean value $\bar{y}$, the $R^2$ coefficient is defined as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$
The second metric used to evaluate the fitting capabilities of the machine learning models is the mean squared error (MSE). The MSE is a risk indicator, giving the average of the squared errors, i.e., the squared differences between the estimated values and the actual values:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

The model fits the data well when the MSE has values close to 0.
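Both indicators are available in Scikit-Learn’s metrics module, as the short sketch below shows with hypothetical values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([0.90, 1.10, 1.30, 0.80])   # hypothetical target values
y_pred = np.array([0.88, 1.12, 1.25, 0.85])   # hypothetical predictions

r2 = r2_score(y_true, y_pred)                 # 1.0 would mean a perfect fit
mse = mean_squared_error(y_true, y_pred)      # values close to 0 mean a good fit
```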
4.5. Evaluation of Machine Learning Models Fitting Capabilities
The machine learning models that are most suitable to predict the motor multi-physical characteristics (electromagnetic, losses and structural targets) are selected based on their capacity to accurately fit the target values at the lowest computational cost, which must be lower than that of the FEA. Since the training time is negligible, the most suitable machine learning model for each physics involved is chosen based on its accuracy and on the number of samples used for the training process, looking for the one that uses the smallest dataset. Usually, the accuracy of a machine learning model depends on the number of training samples; hence, the behaviour of each model was tested for different sample sizes (i.e., 250, 500, 750 and 1000 samples).
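A compact sketch of such a sweep over methods and training-set sizes is given below; the data are synthetic placeholders, so the printed scores are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_all = rng.uniform(size=(1000, 9))   # placeholder design parameters/features
y_all = X_all @ rng.uniform(size=9)   # placeholder scalar target

models = {
    "SVR": SVR(C=3.2, kernel="rbf"),
    "GBR": GradientBoostingRegressor(max_depth=3, random_state=0),
    "GPR": GaussianProcessRegressor(random_state=0),
}
for n in (250, 500, 750, 1000):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_all[:n], y_all[:n], test_size=0.30, random_state=42)
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        print(n, name, r2_score(y_te, pred), mean_squared_error(y_te, pred))
```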
The first values that were fitted were the torque and back emf harmonics. The ability of the tested regression methods to predict the most influential torque harmonic orders (i.e., DC TH, 6th TH and 12th TH) and back emf harmonic orders (i.e., 1st BmfH, 3rd BmfH and 11th BmfH) is presented in
Table 4. Here, the
R2 and the MSE scores are presented for each individual regression method (i.e., SVR, GBR, GPR) sequentially trained using 250, 500, 750 and 1000 samples.
By analysing the values, it can be seen that all tested methods show an ascending trend of the
R2 score and a reduction of the MSE when the number of samples is increased. In terms of machine learning models, the SVR model performs much better than the GBR model for all sample sizes, the GBR presenting a higher MSE and a lower
R2 than the SVR. At the same time, it can be observed that the GPR model fits the targets most accurately for every training set. Starting with 750 samples, the GPR presents excellent results, its
R2 score being higher than 0.93 and its MSE lower than 7%. In particular, for 750 samples, GPR succeeds in improving the scores for the 12th TH, the R2 increasing from 0.86 (for 250 samples) to 0.93 and the MSE decreasing from 13% (for 250 samples) to 7%. The same applies to the 11th BmfH, whose R2 value increases from 0.89 to 0.94 and whose MSE reduces from 11% to 1%. However, even if the process of generating 1000 samples with FEA is more computationally expensive, GPR performs at its best for 1000 samples, enhancing the fitting capabilities compared with the 750-sample case, especially for the third most influential torque and back emf harmonics. For this case, the obtained
R2 score is 0.95 and MSE is 5% for 12th TH, and the
R2 score is 0.99 and the MSE is 1% for the 11th BmfH.
Figure 7 shows the most influential torque and back emf harmonics plotted over their actual target (original) values, obtained from the GPR machine learning model trained with 1000 samples.
The capability of the tested regression methods to predict the modes and their corresponding natural frequencies is presented in
Table 5, where the
R2 and the MSE scores are presented for each individual regression method, SVR, GBR, GPR, sequentially trained using 250, 500, 750 and 1000 samples.
As can be observed, the machine learning models under test are able to predict very well even when they are trained using only 250 samples. By analysing the presented values, it can be seen that, except for the GPR method, which keeps its scores constant, the SVR and GBR show an improvement of the
R2 score and a reduction of the MSE when the number of samples is increased. Moreover, their performance starts to saturate beyond 750 samples. This behaviour is emphasised when the data size is increased from 750 to 1000 samples. Comparing the results, it is clear that the
R2 and the MSE are the same in both cases, and an expansion of the data size beyond 750 samples does not influence the accuracy of the developed models. In fact, the models developed for 750 samples perform well, their
R2 values being 0.99 for the SVR and GBR models and 1 for the GPR model. Regarding the MSE, its values are between 0% and 1% for SVR and GBR, while for GPR it is 0%. The GPR model fits the natural frequencies more accurately than SVR and GBR; even with only 250 samples, it is the best performer. The low number of training samples necessary for a good performance is due to the fact that the global modes and their natural frequencies are highly affected by the yoke and tooth thickness and are not as sensitive as the electromagnetic targets when the tooth tip angle, tooth tip height and slot opening are adjusted within the set range. Therefore, fewer designs are needed to build the data-driven model, and the structural characteristics obtained by imposing 250 sets of input geometrical parameters are sufficient to obtain a good generalisation capacity that allows the characterisation of a new design.
Figure 8 shows the model’s capability to approximate the motor structural characteristics. Specifically, the frequencies at which the first six mode-shapes appear are estimated with high accuracy by the GPR machine learning model trained with 250 samples.
The ability of the models to fit the losses targets can be assessed in
Table 6, where the scores for all types of motor losses are presented (i.e., total—Tot, winding losses—Wind, iron—Iron, stator back iron—SBI, rotor back iron—RBI and magnet—Mag). The results for
R2 and MSE show that SVR performs better than GBR for all dataset sizes. The performance of all three models improves as the size of the training data increases, the worst prediction being obtained at 250 samples for the stator tooth losses. Both SVR and GBR manage to raise the
R2 to 1 and 0.99, respectively, and to reduce the MSE to a value under 2% for the 1000-sample case, but the model most suitable to accurately fit the losses data is, also in this case, GPR. This algorithm manages to predict the losses, even with 250 samples, with a maximum MSE of 9%, reducing the MSE to 1% for the 500- and 750-sample cases and to 0% for the 1000-sample case. GPR perfectly fits the target data for the 1000-sample case, where the accuracy indicators show perfect results, the
R2 taking the unity value and the MSE being zero.
Figure 9 displays the predicted motor losses (total, winding, iron, stator back iron, stator teeth, rotor back iron and magnet) plotted over their actual target values (original), obtained from the GPR machine learning model trained with 1000 samples. The obtained values are specific to the described number of stator DOFs. For fewer DOFs, a reduced number of feasible designs is needed to build the data-driven model. Correspondingly, when the number of DOFs is increased, the number of analysed designs must be enlarged to keep the same accuracy.