1. Introduction
The rapid evolution of automotive technology, driven by increasingly stringent emissions regulations and the demand for higher fuel efficiency, has necessitated significant advancements in engine design and calibration. Modern internal combustion engines must meet a complex set of requirements, balancing performance, efficiency, and emissions within a highly constrained operational framework [1]. This balance is particularly challenging due to the diverse conditions under which engines must operate, from low-load urban driving to high-load highway cruising [2,3].
To address these challenges, engine design and calibration processes have evolved to incorporate a variety of advanced techniques and technologies. Among these, the integration of model-based design (MBD) and optimization algorithms stands out as a transformative approach. By leveraging detailed simulations and robust optimization frameworks, engineers can explore a vast design space more effectively than ever before, identifying optimal configurations that would be impractical to evaluate experimentally. Machine learning (ML) techniques are being increasingly introduced into all engine development phases to meet the following objectives.
- Model-Based Design: Utilizing high-fidelity engine models to simulate performance and emissions under various operating conditions, providing a detailed understanding of engine behavior [4,5,6];
- Optimization Algorithms: Implementing multi-objective optimization techniques to identify the best trade-offs between conflicting goals, such as those related to power output, fuel consumption, and pollutant emissions [7,8];
- Control Strategies: Developing and applying advanced control strategies, including feedback and feedforward mechanisms, to ensure optimal engine performance across the entire operating range [9,10];
- Data Analytics: Employing real-time data analytics to continuously refine and validate models, ensuring that the optimization process remains aligned with the real-world engine performance [11].
In the field of engine modeling, the integration of analytical functions with ML-based models marks a significant step forward in enhancing the predictive performance and operational efficiency. Traditional engine models, often rooted in complex physical laws and empirical data, face challenges in the identification and calibration of analytical functions, especially when the number of independent variables starts to increase [12]. Conversely, purely data-driven models, particularly those based on artificial neural networks (ANNs), excel in learning complex patterns from data but can struggle with extrapolation beyond their training domain and in maintaining consistency with physical laws [13].
This paper explores the development of hybrid approaches that combine the strengths of traditional analytical methods with modern ML techniques to model engine combustion and temperature indices more effectively [14,15,16]. By integrating analytical corrective functions into ML-based engine models, we aim to leverage the precision and robustness of physical models alongside the adaptability and learning capabilities of ANNs. This hybridization seeks to address key limitations in the current modeling approaches, such as feature extrapolation, parameter estimation, and the ability to capture physical dynamics [17,18].
The core objective of this study is to demonstrate that integrated models, which blend analytical functions with neural networks, achieve superior predictive accuracy and generalization performance compared to models relying solely on neural networks. The methodology presented in this work is based on the principle of superposition of effects: each input variable makes its own specific contribution to the combustion process. This principle has already been applied in the literature, and the authors have demonstrated its use to include the effects of inert species, such as water, on the combustion process [19,20]. The innovative contribution of this work is the extension of the previously presented methodology to complex physical problems with a larger number of independent variables. In other words, the main benefit of the proposed approach is the disaggregation of the complex system into simpler problems that each depend on a single independent variable. It can be summarized as follows:
- The multi-dimensional phenomenon is divided into simpler problems;
- Artificial neural networks are used to capture the effects of variables for which the operating range is well defined (such as the engine speed and load) and cannot be exceeded during the engine’s operation;
- Analytical functions are applied to describe and extend the trend of the output variable (such as combustion, knock, and exhaust gas temperature indices) with respect to those independent variables that could assume values that differ from the calibrated ones; this allows us to increase the extrapolation capabilities of a standard data-driven model;
- The contributions of the artificial neural network and of the corrective functions are then combined to calculate the final output value.
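The decomposition listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the additive combination of the contributions, the polynomial form of the corrective function, the stand-in base model, and every name and coefficient (`Correction`, `hybrid_predict`, `base_model`, `sa_ref`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Correction:
    """Analytical corrective function for one input (hypothetical polynomial form)."""
    reference: Callable[[float, float], float]  # calibrated value at (speed, load)
    coeffs: Sequence[float]  # c1, c2, ... applied to the deviation from the reference

    def __call__(self, speed: float, load: float, value: float) -> float:
        dev = value - self.reference(speed, load)
        # No constant term, so the correction vanishes at the calibrated value.
        return sum(c * dev ** (i + 1) for i, c in enumerate(self.coeffs))


def hybrid_predict(base_model: Callable[[float, float], float],
                   corrections: Dict[str, Correction],
                   speed: float, load: float,
                   inputs: Dict[str, float]) -> float:
    """Base (speed, load) prediction plus one additive correction per input."""
    output = base_model(speed, load)
    for name, correction in corrections.items():
        output += correction(speed, load, inputs[name])
    return output


def base_model(speed: float, load: float) -> float:
    """Stand-in for the trained ANN: index value at standard calibration (made up)."""
    return 8.0 + 0.05 * load - 0.02 * speed


def sa_ref(speed: float, load: float) -> float:
    """Hypothetical reference spark advance map."""
    return 20.0 - 0.1 * load


corrections = {"SA": Correction(reference=sa_ref, coeffs=(-1.0, 0.02))}

# At the calibrated spark advance the hybrid output equals the base prediction.
at_reference = hybrid_predict(base_model, corrections, 40.0, 50.0,
                              {"SA": sa_ref(40.0, 50.0)})
# A spark advance 4 units below the reference is handled by the analytical term only.
retarded = hybrid_predict(base_model, corrections, 40.0, 50.0,
                          {"SA": sa_ref(40.0, 50.0) - 4.0})
```

Because the corrective term has no constant coefficient in this sketch, the base model alone determines the output at standard calibration, and each additional input only contributes when it deviates from its reference value.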
Through this integrative approach, the enhanced models not only improve the accuracy and robustness of predictions but also maintain consistency with physical trends, even in scenarios where the input features extend beyond the values encountered during training. This capability preserves the accuracy of the model’s predictions in the case of anomalous engine operation, which also makes the models suitable for failure prediction. Moreover, the improvement does not impact the effort required for data collection, which relies on the standard mapping and calibration activity for a new engine. Additionally, because the analytical functions describe simple physical trends, fewer engine points are needed to define their calibratable coefficients than with a purely data-driven approach. A dedicated dataset has been collected for this work only because the neural networks also had to be trained and a training dataset suitable for a robust comparison between the two methods was needed. Once the hybrid approach is identified as the most robust for this application, a portion of the dataset collected for the engine mapping procedure can be used for model calibration.
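To illustrate why few points can suffice for an analytical corrective function, the following sketch fits a two-coefficient polynomial to a single five-point sweep by least squares and then evaluates it outside the swept range. The quadratic form without a constant term, the function name `fit_correction`, and the synthetic data are assumptions made for the example and are not taken from this work.

```python
from typing import Sequence, Tuple


def fit_correction(dev: Sequence[float], delta: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of delta ~ c1*dev + c2*dev**2 (no constant term).

    dev   : deviation of the swept input from its reference value
    delta : change of the output index with respect to the reference condition
    """
    s2 = sum(d ** 2 for d in dev)
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    sdy = sum(d * y for d, y in zip(dev, delta))
    sd2y = sum(d * d * y for d, y in zip(dev, delta))
    det = s2 * s4 - s3 * s3
    if det == 0:
        raise ValueError("sweep does not contain enough distinct deviations")
    # Cramer's rule on the 2x2 normal equations.
    c1 = (sdy * s4 - s3 * sd2y) / det
    c2 = (s2 * sd2y - s3 * sdy) / det
    return c1, c2


# Synthetic sweep: five points generated from known, made-up coefficients.
deviations = [-6.0, -3.0, 0.0, 3.0, 6.0]
observed = [-0.9 * d + 0.03 * d ** 2 for d in deviations]

c1, c2 = fit_correction(deviations, observed)

# Evaluate the fitted trend beyond the swept range (deviation of 10).
extrapolated = c1 * 10.0 + c2 * 10.0 ** 2
```

With noise-free synthetic data the fit recovers the generating coefficients; with measured data the same normal equations give the least-squares estimate.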
The analytical corrective functions play a critical role in ensuring that the output of the complete model adheres to the expected physical behavior, providing a safeguard against unrealistic predictions. This capability is especially crucial when the model is exposed to operating conditions that significantly deviate from the training data, thereby extending the practical applicability and reliability of the engine simulator. The availability of a robust and reliable engine simulator, with the capability to extrapolate the values of the combustion, knock, and exhaust gas temperature indices for values of the input features that differ from the calibrated ones, represents a useful tool for the development of new control strategies [20,21] and the offline adjustment of engine calibrations. Indeed, one of the main targets of such activity is to define a method that can be extended to other indices to develop a complete engine simulator for the prediction of combustion, knock, and exhaust gas temperature indices to achieve the offline design of new driving cycles or to implement new calibration sets.
In the first part of this work, the experimental campaign conducted to collect the data needed for the development of the analytical corrective functions is presented. The description of the complete engine simulator and the indices that have to be modeled are presented, highlighting the types of input variables that can be used to feed the models. The two main approaches considered for the development of the engine models are described and compared in particular for a specific combustion index (50% of the mass fraction burnt, MFB50). Such an index is generally used to develop the methodologies described in this work, and the most accurate approach is then applied to other indices. The methodology utilized to identify the analytical corrective functions used to enhance the performance of the ANNs is described. The accuracy of a purely ANN-based method is compared with that of the hybrid approach when the calibration of the models is performed with steady-state data, while validation is carried out with recordings of driving cycles and dynamic on-track profiles.
This study underscores the potential of hybrid modeling approaches to significantly extend the extrapolation capabilities of engine simulators, offering a more accurate and reliable tool for the prediction of engine performance across a wide range of operating conditions. While it is not discussed in this study, the application of the hybrid approach could also follow a reversed approach, where the main model for the reference conditions is represented by a look-up table (LUT) or an analytical function and the machine learning algorithm works to adjust the main output for dynamic conditions (for example, to include the effect of the sensor dynamics) or to adjust the first value when an additional influencing parameter deviates from the reference conditions. Variables that have a major impact on the combustion process are included as input features of the hybrid model. The engine speed and load, the spark advance, the target lambda value, the temperature of the air at the inlet of the intake valves, and the phases of the intake and exhaust valves are considered as input features, since such variables are also affected by the engine control unit (ECU) strategies that are initiated when particular environmental or vehicle conditions occur. In this way, a wide range of possible conditions that can affect the combustion process are predicted by the model.
In the second part, the development of the complete engine simulator based on the hybrid approach is described, and the results achieved by simulating different types of driving cycles and steady-state conditions are presented. Moreover, some proposals to further improve the accuracy and robustness of the data-driven models are presented in the last section of this work.
2. Experimental Setup
In this section, the engine used for the experimental campaign and the layout of the testing environment used to control such an engine and to collect data are introduced. Furthermore, a detailed description and critical considerations regarding the experimental campaign conducted are presented.
2.1. Testing Environment
The experimental setup is composed of a high-performance, gasoline-powered internal combustion engine, whose main characteristics are described in Table 1. It is installed in the engine test cell by connecting it to an eddy current passive dynamometer by Borghi & Saveri (Italy).
Each cylinder is equipped with a piezoelectric pressure transducer, in order to collect in-cylinder pressure data, with a sampling frequency of 100 kHz. The pressure sensor used is from Kistler, and its main features are reported in Table 2. Indicating signal conditioning and acquisition is carried out with MASTRO charge amplifiers and the OBI-M2 indicating system provided by Alma Automotive (Italy). The calculations of the MFB50, indicated mean effective pressure (IMEP), and maximum in-cylinder pressure (Pmax) combustion indices use low-pass-filtered signals, with a cut-off frequency of 3 kHz, and a windowed in-cylinder pressure trace, while the same signal is band-pass-filtered in order to calculate the maximum amplitude pressure oscillation (MAPO) index for the knock intensity.
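The two filtering paths described above can be illustrated on a synthetic pressure trace. The sketch below is only indicative: it uses an ideal (brick-wall) frequency mask on a plain DFT as a stand-in for the actual filters, whose type and order are not stated here; the 5 to 25 kHz knock band, the signal itself, and the names `dft` and `band_limited` are assumptions, and the whole synthetic record is treated as the analysis window.

```python
import cmath
import math
from typing import List


def _twiddles(n: int, sign: int) -> List[complex]:
    """Table of exp(sign * 2*pi*i * j / n) for j = 0..n-1."""
    return [cmath.exp(sign * 2j * math.pi * j / n) for j in range(n)]


def dft(x: List[float]) -> List[complex]:
    """Plain O(n^2) discrete Fourier transform (adequate for a short record)."""
    n = len(x)
    w = _twiddles(n, -1)
    return [sum(x[k] * w[(m * k) % n] for k in range(n)) for m in range(n)]


def band_limited(spectrum: List[complex], fs: float,
                 f_lo: float, f_hi: float) -> List[float]:
    """Zero every bin outside [f_lo, f_hi] and transform back to the time domain."""
    n = len(spectrum)
    w = _twiddles(n, +1)
    kept = [v if f_lo <= min(m, n - m) * fs / n <= f_hi else 0j
            for m, v in enumerate(spectrum)]
    return [sum(kept[m] * w[(m * k) % n] for m in range(n)).real / n
            for k in range(n)]


fs = 100_000.0  # sampling frequency stated in the text, Hz
n = 500         # 5 ms synthetic record (assumption)
pressure = [30.0
            + 20.0 * math.sin(2 * math.pi * 200.0 * k / fs)    # slow pressure evolution
            + 0.8 * math.sin(2 * math.pi * 8_000.0 * k / fs)   # knock-like oscillation
            for k in range(n)]

spectrum = dft(pressure)

# Low-pass path (3 kHz cut-off): basis for Pmax and the other combustion indices.
low_passed = band_limited(spectrum, fs, 0.0, 3_000.0)
p_max = max(low_passed)

# Band-pass path (assumed 5-25 kHz band): MAPO is the largest absolute oscillation.
oscillation = band_limited(spectrum, fs, 5_000.0, 25_000.0)
mapo = max(abs(v) for v in oscillation)
```

In this synthetic case the low-pass path isolates the slow component, so `p_max` is close to 50, while `mapo` is close to the 0.8 amplitude of the injected oscillation.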
The main temperatures of the system (air temperature inside intake manifold, exhaust gas temperature in exhaust runners and at turbine inlet section) are acquired with thermocouples connected to the test bench management system, whose characteristics are summarized in Table 3.
The thermocouple measurement chain is composed of a National Instruments (Austin, TX, USA) Compact-Rio 9024, with module 9213, which has a sample frequency of 100 Hz. The actual value of the temperature measured for each stationary engine point and condition is acquired after the end of the transient phase, to obtain the characteristic temperature at a given working condition.
The fuel used to feed the engine is gasoline with research octane number (RON) 98 for the whole experimental procedure, as it is not required that the engine model is sensitive to different fuel properties. Furthermore, all data available from the industrial partner of this project that can be used for validation, testing, and analysis are collected using the same type of gasoline. The thermodynamic conditions of the air inside the intake manifold are controlled thanks to the turbocharging system, for the air pressure, and to the water-to-water heat exchange system of the intercooler for the air temperature.
As seen in Figure 1, the engine is controlled by one engine control unit (ECU) per bank, each equipped with ETAS hardware and software tools to manage different working conditions and impose the actuations required by each specific test. Furthermore, real-time communication between the ETAS tools and the test bench acquisition system allows the collection of iso-frequency and phased data between the actuations and the data sensed by the engine test bench.
2.2. Experimental Campaign
To highlight the advantages of the hybrid approach and the reasons for choosing it, the following sections compare it with a purely ANN-based methodology. To perform a consistent comparison, a dataset obtained from standard calibration activities is used to train the purely ANN-based model (1200 points are used for this task), given its ability to manage a large amount of data. A portion of this dataset can also be used to perform model validation and testing. Meanwhile, to calibrate the whole hybrid model, specific custom-made tests are required to ensure a robust comparison between the two approaches, as mentioned earlier.
The experimental tests conducted have the main goal of building a consistent dataset that is able to successfully train the ANN and calibrate the analytical corrective functions.
The neural network must be capable of modeling the value of the MFB50 at different engine working points, with standard calibrations of the main actuations and reference inlet air conditions. Thus, it must be sensitive to the variations in the combustion phasing with different engine speeds and loads. For this purpose, a wide grid of engine operating points is collected to avoid the risk of extrapolation during the model’s inference. This goal is fulfilled by testing the engine up to its physical minimum and maximum values for both the engine speed and load, so that no maneuver or operating condition to be simulated can fall outside of these boundaries. As visible in Figure 2, which shows the steady-state data recorded for the ANN training, the operating points of the grid are closely spaced to ensure good interpolation capabilities, without being so dense as to introduce a risk of overfitting (a total of 89 engine points are collected). The engine speed and load are normalized with respect to the maximum value and then converted to percentages. This normalization process for the engine speed and load is followed throughout this paper.
Furthermore, dedicated tests are conducted to collect a dataset for the calibration of the analytical corrective functions. Since the core idea of the model is to exploit the superposition principle, each test consists of a sweep of one of the independent variables that act as inputs for the model: the spark advance (SA), lambda (λ), variable valve timing (VVT), and intake manifold temperature (Tans). For this reason, fewer engine points are needed than for the first dataset, as seen in Figure 3. The selection of these points is based on the goal of covering the whole engine speed and load map, while isolating the effect of each variable through its sweep. In particular, the engine map can be divided into four zones to be explored: low load and low speed, high load and low speed, low load and high speed, and high load and high speed. These cover both the aspirated and boosted zones of the engine’s working area, from low to high speeds. A further challenge is to keep the number of engine operating points small, so that the total number of tested points remains reasonable, given that several sweep tests are performed at each engine point. To ensure the robustness of the corrective functions, 19 engine points are used, with sweeps performed at each engine point for each independent variable, for a total of 615 collected points.
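The structure of such a campaign can be sketched as a one-variable-at-a-time test plan. The example below is illustrative only: the engine points, reference values, and sweep offsets are made up (the real values are confidential), a single reference set is used for all engine points for brevity, and `build_sweep_plan` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple


def build_sweep_plan(engine_points: Sequence[Tuple[float, float]],
                     reference: Dict[str, float],
                     sweeps: Dict[str, Sequence[float]]) -> List[Dict[str, object]]:
    """One row per test: a single input is offset, all others stay at reference."""
    plan: List[Dict[str, object]] = []
    for speed_pct, load_pct in engine_points:
        for variable, offsets in sweeps.items():
            for offset in offsets:
                setpoints = dict(reference)  # start from the standard calibration
                setpoints[variable] = reference[variable] + offset
                plan.append({"speed_pct": speed_pct, "load_pct": load_pct,
                             "swept": variable, **setpoints})
    return plan


# One representative point per zone of the map (normalized speed, load in %).
engine_points = [(20.0, 20.0), (20.0, 90.0), (90.0, 20.0), (90.0, 90.0)]

# Hypothetical reference values and sweep offsets for the four inputs.
reference = {"SA": 15.0, "lambda": 1.0, "VVT": 10.0, "Tans": 40.0}
sweeps = {
    "SA": [-6.0, -3.0, 3.0, 6.0],
    "lambda": [-0.125, 0.125],
    "VVT": [-10.0, 10.0],
    "Tans": [-10.0, 10.0, 20.0],
}

plan = build_sweep_plan(engine_points, reference, sweeps)
```

The total number of tests is the number of engine points multiplied by the number of sweep steps over all inputs, which is why the number of engine points has to stay small.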
An important characteristic of these tests is that, during the variation in each input, all other independent variables are kept as constant as possible, as visible in Figure 4 and Figure 5. An actuated variable is fixed at the same value as for the standard calibration, while a sensed parameter must remain close to its reference value. This is essential to isolate each input’s effect on the combustion phasing. The numerical values of the variables are omitted for confidentiality; however, it is important to highlight that the maximum accepted oscillation is ±5%, which is relevant especially for those parameters that are difficult to control (e.g., the intake manifold temperature).
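A check of this acceptance criterion on a recorded sweep could look like the sketch below. It assumes that the ±5% band is relative to the reference value of each variable (for a temperature, the result then depends on the unit used), and the function names, variable names, and logged values are hypothetical.

```python
from typing import Dict, List, Sequence


def within_tolerance(samples: Sequence[float], reference: float,
                     tol: float = 0.05) -> bool:
    """True if every sample stays within +/- tol (relative) of the reference."""
    limit = tol * abs(reference)
    return all(abs(s - reference) <= limit for s in samples)


def drifting_variables(log: Dict[str, Sequence[float]],
                       references: Dict[str, float],
                       swept: str, tol: float = 0.05) -> List[str]:
    """Names of the non-swept variables that leave the tolerance band."""
    return [name for name, samples in log.items()
            if name != swept and not within_tolerance(samples, references[name], tol)]


# Made-up log of a spark advance sweep at one engine point.
references = {"SA": 15.0, "lambda": 1.0, "Tans": 40.0}
log = {
    "SA": [9.0, 12.0, 15.0, 18.0, 21.0],        # swept on purpose
    "lambda": [1.00, 0.99, 1.01, 1.00, 1.02],   # stays within 5%
    "Tans": [40.0, 41.0, 41.5, 41.8, 43.0],     # last sample is 7.5% above reference
}

rejected = drifting_variables(log, references, swept="SA")
```

Here only the intake manifold temperature is flagged, so this sweep would have to be repeated or the affected point discarded.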
Additionally, some tests are carried out to build a proper validation dataset for the model. These include both steady-state engine points with a simultaneous change in multiple inputs, in order to stress the combined effect of all variables involved in the combustion process, and transient profiles replicated at the engine test bench, ranging from standard homologation cycles to on-track profiles.
Figure 6 shows an example of a standard homologation cycle on the left and a track test profile on the right side, represented both by the normalized driver pedal position and normalized engine speed. It is also important to highlight that, throughout the whole validation process, transient tests with both standard calibration and different control strategies are performed. This is crucial to fully explore the capability of the developed model to reproduce combustion metrics in transient conditions and non-conventional control strategies.
6. Conclusions and Future Work
This study highlights the significant advancements achieved through the integration of traditional analytical methods with modern machine learning techniques in engine modeling. The hybrid approach developed in this work is based on the effects superposition principle, and it combines the precision and robustness of physical models with the adaptability and learning capabilities of artificial neural networks. By incorporating analytical corrective functions into ML-based engine models, this study addresses key limitations of the current modeling approaches, such as feature extrapolation, parameter estimation, and the ability to capture physical dynamics. The main innovative contribution of this work is the extension of the hybrid approach to model complex physical problems, even with a larger number of independent variables. In particular, artificial neural networks are used to capture the effects of variables for which the operating range is well defined (such as the engine speed and load), while analytical functions are applied to describe and extend the trends of the output variables (such as the combustion, knock, and exhaust gas temperature indices) with respect to those independent variables that could assume values that differ from the calibrated ones. This allows us to increase the extrapolation capabilities of a standard data-driven model, and it does not have an impact on the effort required for the collection of the experimental data.
This experimental campaign and the development of a complete engine simulator demonstrate the superior predictive accuracy and generalization performance of hybrid models compared to purely data-driven methods. These enhanced models maintain consistency with physical trends, even under operating conditions that deviate significantly from the training data. This capability is crucial in ensuring the reliability and applicability of engine simulators in real-world scenarios, including under anomalous engine operations and failure predictions. It is demonstrated that such an approach can be effectively applied to combustion and knock modeling, but also to exhaust gas temperature estimation, through the implementation of a semi-physical thermocouple model. Moreover, an additional benefit of the proposed approach is the possibility to define the values of the calibratable coefficients with fewer engine points, with respect to a purely data-driven approach. This results in a reduction in the computational effort required for the calibration of the model and in the possibility to extend the data collected at the test bench via simulations, with a concrete reduction in costs.
On the other hand, tests for the calibration of the analytical functions have to be carried out by varying a single variable and keeping all other variables constant, so as to isolate the effect of each input feature on the calculated output. In some cases, this requires changes to the standard procedures of the campaign conducted to calibrate the main ECU control strategies. At the same time, as mentioned above, the number of engine points required for complete model calibration is smaller than that needed for a purely data-driven methodology.
Future research directions include further improving the accuracy and robustness of the data-driven models and exploring additional applications of the hybrid approach to model different engine parameters, such as functional indices. Moreover, the developed engine simulator will be applied for offline engine calibration activities and to design and pre-evaluate new durability tests and driving cycles.