Data-Driven Engine Health Monitoring with AI

Hornyák, Olivér

doi:10.3390/engproc2024079039

Open Access Proceeding Paper

Institute of Information Science, University of Miskolc, 3515 Miskolc, Hungary

Presented at the Sustainable Mobility and Transportation Symposium 2024, Győr, Hungary, 14–16 October 2024.

Eng. Proc. 2024, 79(1), 39; https://doi.org/10.3390/engproc2024079039

Published: 5 November 2024

(This article belongs to the Proceedings of The Sustainable Mobility and Transportation Symposium 2024)


Abstract

Artificial Intelligence (AI) can change how automotive engines are maintained and monitored in the future. This paper presents an AdaBoost-based model to monitor engine health with real-world data. Key engine metrics such as temperature, pressure, and other operational data can be analysed to detect early signs of potential problems. Machine learning algorithms can be used to identify patterns and anomalies that might be missed by traditional methods, thus leading to more accurate predictions of engine issues. Maintenance schedules can be optimised, resulting in reduced downtime and costs. This work highlights the importance of data-driven methods in the future of automotive maintenance and efficiency. The application of AI in this context also enhances safety by preventing unexpected engine failures. Importantly, AI-driven solutions are designed to be scalable, making them applicable to a wide range of vehicles and engine types. The automotive industry can move towards more proactive and efficient maintenance practices by exploiting the available data.

Keywords: machine learning; prediction; engine health data

1. Introduction

Artificial Intelligence (AI) is reshaping various industries, and automotive engine maintenance is no exception. With the continuous collection of sensor data, such as temperature, pressure, and other operational metrics [1], AI can enhance the monitoring and maintenance of automotive engines. Unlike traditional methods that rely heavily on scheduled maintenance, AI-driven approaches can detect patterns and anomalies that may indicate mechanical issues. This capability not only improves the accuracy of predicting engine failures but also allows for the optimisation of maintenance schedules, reducing unnecessary downtime and associated costs. AI applications in this context also contribute to environmental sustainability. Optimised engine performance leads to better fuel efficiency and reduced emissions, aligning with global efforts to minimise the automotive sector’s environmental footprint. As the automotive industry continues to evolve, the role of AI is significant.

2. Materials and Methods

Maintaining automotive engines effectively ensures vehicle reliability, safety, and durability. Recent advancements in technology, particularly in Artificial Intelligence, sensors, and data analytics, have significantly enhanced the methods and systems used for automotive maintenance. This section provides an overview of various aspects of automotive engine maintenance systems, illustrating the progress and challenges in this domain.

AI technologies are increasingly being used to predict engine failures before they occur, and extensive research has been conducted in this field. Paper [2] describes how machine learning (ML) algorithms can analyse large volumes of vehicle sensor data to predict engine component failures; these models learn from historical data to predict future events, such as potential failures. Paper [3] provides an overview of the application of neural networks to predict the lifespan of engine components based on operational data and past maintenance records. Deep learning, a subset of ML, is also helpful in identifying anomalies in sensor data that may indicate impending failures. In paper [4], Convolutional Neural Networks (CNNs) were used to detect unusual patterns in engine vibration data, a common precursor to mechanical issues. Reinforcement learning can be applied to optimise maintenance schedules based on the performance feedback of automotive engines [5]. Natural language processing (NLP) techniques can analyse maintenance records and service logs [6].

Integrating advanced sensors into automotive engines provides real-time data crucial for monitoring engine health. The Internet of Things (IoT) refers to a network of interconnected devices that can communicate with each other and exchange data over the Internet. IoT-based onboard sensor usage is presented in [7] for nitrogen oxide emissions estimation. Paper [8] reviews how IoT devices can send real-time diagnostics and performance data to cloud-based systems for analysis, facilitating proactive maintenance strategies.

Predictive maintenance is a significant strategy in automotive applications. The IoT-based maintenance [9] and lifecycle management of engine components can significantly extend the operational life of automotive engines. Paper [10] examines the methodologies used to assess the lifecycle of various engine components based on usage patterns and environmental factors. The economic impact of engine maintenance is also an important topic in the literature: papers [11,12] analyse the cost-effectiveness of different maintenance strategies in the automotive industry, including reactive, preventive, and predictive maintenance.

Emerging technologies such as AI, machine learning, and IoT will redefine the scope and effectiveness of maintenance systems. A future-oriented study [13] presents the next generation of maintenance systems powered by intelligent algorithms capable of self-diagnosis and automated repair functions. The concept of connected cars is reviewed in paper [14], presenting the possibilities and capabilities of hardware and software.

2.1. Research Goal

This paper will analyse an Engine Failures Dataset [15]. A structured approach extracts meaningful relationships and develops an AI-based predictive model. Figure 1 describes the necessary steps:

Data collection and semantic analysis: obtaining and exploring the dataset, reviewing its documentation, and performing initial data exploration to identify data types, missing values, and basic statistics.
Data cleaning and preprocessing: handling missing values through imputation, dealing with outliers by trimming, normalising or standardising the data if required, and conducting feature engineering to create new variables.
Exploratory data analysis: uncovering patterns, trends, and relationships within the data using visualisations such as histograms, box plots, and scatter plots.
Model development: creating predictive models with suitable machine learning algorithms, selecting promising models based on the data and the prediction task, and using cross-validation to tune the models and avoid overfitting.
Model evaluation: assessing the models’ performance using accuracy, precision, recall, and F1 score for classification problems.
Presentation: compiling the findings into a comprehensive report or presentation that outlines the methodology, findings, and recommendations.
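The cleaning and preprocessing step can be illustrated with a minimal sketch that uses only the Python standard library. The helper names (`impute_median`, `clip_outliers`, `standardise`) and the sample readings are hypothetical and are not taken from the paper. Outliers are clipped to the Tukey fences here rather than dropped, which is one possible reading of "trimming".

```python
from statistics import mean, median, pstdev, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def standardise(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical sensor readings with one missing value and one outlier.
raw = [521.7, 522.3, None, 522.9, 522.2, 540.0, 521.9, 522.4]
clean = standardise(clip_outliers(impute_median(raw)))
print([round(v, 3) for v in clean])
```

Whether standardisation is needed depends on the model: tree-based learners such as the one used later in this paper are insensitive to monotone rescaling of the features.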

2.2. Description of the Dataset

This paper investigates the engine time to failure dataset, which describes a study in which 100 different engines are continuously monitored from the start of use until they fail. The type of the engine is not defined, and the units of the data columns are not indicated. Each engine is run under certain conditions (possibly varying by factors such as operational intensity, environmental conditions, or maintenance schedules) until it experiences a breakdown. The dataset consists of the following columns:

ID: This column represents the unique identifier for each monitored engine or unit.
TTF: This stands for “time to failure”, which is the primary variable of interest, indicating the remaining operational time before an engine failure occurs.
s12, s14, s17: These columns represent various signals or sensor readings related to engine health. Each signal provides specific data points collected during engine operation.

Table 1 provides a brief overview of the first few rows of the dataset:

TTF shows a strong positive correlation with s12 (0.67) and a strong negative correlation with s17 (–0.61). The third signal, s14, has a moderate negative correlation with TTF (–0.31) and a moderate positive correlation with s17 (0.25). A strong negative correlation (–0.70) exists between s12 and s17. Figure 2 shows the correlation heatmap, and Table 2 gives the descriptive statistics for each column in the dataset.
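The coefficients quoted above are Pearson correlations. The sketch below shows how such a coefficient is computed, using only the five rows listed in Table 1. Because these rows come from a single engine over five consecutive cycles, the printed values are not expected to match the full-dataset figures. The function name `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# The five rows shown in Table 1 (engine ID 1).
columns = {
    "TTF": [191, 190, 189, 188, 187],
    "s12": [521.66, 522.28, 522.42, 522.86, 522.19],
    "s14": [8138.62, 8131.49, 8133.23, 8133.83, 8133.80],
    "s17": [392, 392, 390, 392, 393],
}

# Print the upper triangle of the correlation matrix.
names = list(columns)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"corr({a}, {b}) = {pearson(columns[a], columns[b]):+.2f}")
```

On the complete dataset the same computation would be applied to all 20,631 rows of each column pair.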

2.3. Key Observations on the Data

The following observations can be made. The dataset contains 20,631 entries in every column, indicating a complete dataset with no missing values. The mean and median values for signals s12, s14, and s17 are close, which suggests a roughly symmetric distribution around the centre for these measurements. The standard deviation of each sensor signal (s12, s14, s17) is small relative to its mean, indicating that most values lie close to the mean and that the sensor readings are consistent. The relatively narrow interquartile range of the sensors indicates that most data points lie close to the median, which is a reasonable basis for predictive modelling and reliability analysis.

2.4. Histograms

The histogram of s12 (Figure 3, Blue) illustrates the distribution of sensor s12 readings, which appear to be approximately normally distributed, with a slight asymmetry. The histogram of s14 (Figure 3, Red) also resembles a normal distribution but with a narrower range than s12, indicating less variability in sensor s14 readings. The histogram of s17 (Figure 3, Green) shows a somewhat normal distribution with a slight left skew, suggesting a concentration of values at the higher end of the range. Finally, the histogram of Time to Failure (Figure 3, Purple) depicts the distribution of time to failure across the dataset, displaying a right skew: most records have a comparatively short remaining time to failure, and a long tail of larger values comes from the longer-lived engines.
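The direction of skew read from a histogram can also be checked numerically. The following sketch computes the moment-based skewness coefficient, without bias correction, on small hypothetical samples that are not drawn from the dataset. A positive value corresponds to a right skew (long tail towards large values), and a negative value to a left skew.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    mu = mean(values)
    n = len(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples, chosen only to show the sign convention.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]
left_skewed = [-v for v in right_skewed]
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(f"{name:9s} skewness = {sample_skewness(sample):+.3f}")
```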

2.5. AdaBoost Classification Model

In this paper, an adaptive boosting (AdaBoost) algorithm was used. AdaBoost is a learning algorithm designed to improve the performance of weak classifiers by combining them into a strong classifier. It starts by assigning equal weights to all training samples. The algorithm then iteratively trains weak learners and adjusts the sample weights based on the classification results: misclassified samples have their weights increased, while correctly classified samples have theirs decreased. This ensures that subsequent learners focus more on the difficult cases. Each weak learner’s contribution to the final model is determined by its error rate, with a lower error rate resulting in a higher weight (alpha). The final model is a weighted sum of the predictions from all weak learners.

In the preprocessing step, the dataset was loaded, and the classification threshold was set to the median of the TTF values. Binary labels were created: samples with TTF values less than the median were labelled as 1 (high risk), and those with values greater than or equal to the median were labelled as 0 (low risk). The features used for classification were all three available sensor readings (s12, s14, s17).

The AdaBoost model was implemented in Python using the AdaBoostClassifier class from the scikit-learn library. It was configured with a DecisionTreeClassifier as the base estimator (max_depth = 3), 100 estimators (n_estimators = 100), a learning rate of 1.0 (learning_rate = 1.0), and a random state of 42 (random_state = 42) to ensure reproducibility. The experiment was conducted on a personal computer equipped with an Intel i7 processor and 16 GB of RAM. The data was split into training and testing sets, with 70% used for training and 30% for testing. After training, the model was evaluated on the test set: predictions were generated, and various performance metrics were calculated, including the confusion matrix, accuracy, precision, recall, and F1 score.
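A minimal sketch of the configuration described above is given below. It is not the author's code. The data is a synthetic stand-in generated with NumPy, so the scores it prints are unrelated to the results reported in this paper, and the seed used for the train/test split is an assumption because the paper does not state one.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dataset (assumption): three sensor-like
# features and a TTF column loosely related to the first and third.
rng = np.random.default_rng(0)
n_samples = 1000
X = rng.normal(size=(n_samples, 3))  # stand-ins for s12, s14, s17
ttf = 100 + 40 * X[:, 0] - 30 * X[:, 2] + rng.normal(scale=25, size=n_samples)

# Binary labels from the median TTF: 1 = high risk (below the median),
# 0 = low risk (at or above the median).
threshold = np.median(ttf)
y = (ttf < threshold).astype(int)

# 70/30 split; the split seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn releases (base_estimator vs. estimator).
model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=100,
    learning_rate=1.0,
    random_state=42,
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: actual, columns: predicted
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

To apply the same steps to the real data, `X` would hold the s12, s14, and s17 columns and `ttf` the TTF column of the dataset.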

3. Results and Discussions

The evaluation on the test data resulted in the following confusion matrix:

\begin{bmatrix} 2728 & 456 \\ 777 & 2229 \end{bmatrix}

Based on the confusion matrix, the following metrics were calculated:

Evaluation Metrics

The most essential numerical evaluation metrics are as follows:

Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, with “high risk” (label 1) as the positive class.

Accuracy, the proportion of correctly classified instances:

Accuracy = (TP + TN)/(TP + TN + FP + FN) = 0.8008077544426494 (1)

Precision, the proportion of true positive predictions out of the total predicted positives:

Precision = TP/(TP + FP) = 0.8301675977653631 (2)

Recall, the proportion of true positive predictions out of the actual positives:

Recall = TP/(TP + FN) = 0.7415169660678643 (3)

F1 Score, the harmonic mean of precision and recall:

F1 Score = 2 × (Precision × Recall)/(Precision + Recall) = 0.783342119135 (4)
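The four values can be reproduced from the confusion matrix by direct arithmetic. The sketch below assumes the matrix is laid out as [[TN, FP], [FN, TP]], with rows for the actual class and columns for the predicted class. This is the scikit-learn convention for labels 0 and 1, and it is the reading that is consistent with the reported figures.

```python
# Confusion matrix as reported; layout assumed to be [[TN, FP], [FN, TP]].
tn, fp = 2728, 456
fn, tp = 777, 2229

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  = {accuracy:.4f}")   # 0.8008
print(f"precision = {precision:.4f}")  # 0.8302
print(f"recall    = {recall:.4f}")     # 0.7415
print(f"f1        = {f1:.4f}")         # 0.7833
```

The four cells sum to 6190 samples, which corresponds to a 30% hold-out of the 20,631 rows.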

4. Conclusions

One of the main advantages of data-driven approaches, such as the one employed in this study, is that they do not require extensive prior knowledge of automotive engine mechanics. This allows the analysis to be conducted purely based on the data, irrespective of the specific details or understanding of what each data point represents. However, the success of these approaches relies on the reliability and accuracy of the sensor data used for model training and testing.

This paper has presented a theoretical approach. An AdaBoost classification model was applied to classify engine records as high or low risk of failure, relative to the median time to failure, based on sensor readings. The model achieved an accuracy of approximately 80.08%, which is a reasonably high level of performance for an AdaBoost model. The precision of 83.02% means that most of the records flagged as high risk are indeed high risk. The recall of 74.15% indicates that the model can identify the majority of “high risk” (class 1) instances. While this is a strong performance, it also shows room for improvement in catching more true positives. The F1 score, which balances precision and recall, is 78.33%. This suggests a good overall performance, balancing the identification of true positives and the minimisation of false positives.

It is important to note that this study is hypothetical. To validate the approach as an engineering solution, reliable data—including precise sensor accuracy, appropriate sampling rates, and detailed information on the interaction between the measured dimensions and the engine’s operational conditions—must be used. Without validation against real-world operational conditions and sensor data accuracy, the results remain a theoretical exercise.

The model’s predictions, if validated with reliable data, have the potential to enhance preventive maintenance schedules, reduce unexpected engine failures, and improve operational efficiency. By accurately predicting engine failures, resources can be better allocated to engines at a higher risk of failure, optimising maintenance efforts and costs.

Funding

The work described in this article was carried out as part of the 2020-1.1.2-PIACI-KFI-2020-00147 “OmegaSys—Lifetime planning and failure prediction decision support system for facility management services” project implemented with the support provided by the National Research, Development, and Innovation Fund of Hungary, financed under the 2020.1.1.2-PIACI KFI funding scheme.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset investigated in this study is available at https://www.kaggle.com/datasets/m0ntecarl0/engine-time-to-failure (accessed on 1 October 2024).

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Syafrudin, M.; Alfian, G.; Fitriyani, N.L.; Rhee, J. Performance analysis of IoT-based sensor, big data processing, and machine learning model for real-time monitoring system in automotive manufacturing. Sensors 2018, 18, 2946.
2. Theissler, A.; Pérez-Velázquez, J.; Kettelgerdes, M.; Elger, G. Predictive maintenance enabled by machine learning: Use cases and challenges in the automotive industry. Reliab. Eng. Syst. Saf. 2021, 215, 107864.
3. Arena, F.; Collotta, M.; Luca, L.; Ruggieri, M.; Termine, F.G. Predictive maintenance in the automotive sector: A literature review. Math. Comput. Appl. 2021, 27, 2.
4. Kumar, A.; Gandhi, C.P.; Zhou, Y.; Vashishtha, G.; Kumar, R.; Xiang, J. Improved CNN for the diagnosis of engine defects of 2-wheeler vehicle using wavelet synchro-squeezed transform (WSST). Knowl.-Based Syst. 2020, 208, 106453.
5. Du, G.; Zou, Y.; Zhang, X.; Liu, T.; Wu, J.; He, D. Deep reinforcement learning based energy management for a hybrid electric vehicle. Energy 2020, 201, 117591.
6. Pillai, A.S. Advancements in Natural Language Processing for Automotive Virtual Assistants Enhancing User Experience and Safety. J. Comput. Intell. Robot. 2023, 3, 27–36.
7. Tan, Y.; Henderick, P.; Yoon, S.; Herner, J.; Montes, T.; Boriboonsomsin, K.; Durbin, T.D. On-board sensor-based NOx emissions from heavy-duty diesel vehicles. Environ. Sci. Technol. 2019, 53, 5504–5511.
8. Rahim, M.A.; Rahman, M.A.; Rahman, M.M.; Asyhari, A.T.; Bhuiyan, M.Z.A.; Ramasamy, D. Evolution of IoT-enabled connectivity and applications in automotive industry: A review. Veh. Commun. 2021, 27, 100285.
9. Liyakat, K.S.S.; Liyakat, K.K.S. IoT in Electrical Vehicle: A Study. J. Control Instrum. Eng. 2023, 9, 15–21.
10. Del Pero, F.; Delogu, M.; Pierini, M. Life Cycle Assessment in the automotive sector: A comparative case study of Internal Combustion Engine (ICE) and electric car. Procedia Struct. Integr. 2018, 12, 521–537.
11. Pophaley, M.; Vyas, R.K. Choice criteria for maintenance strategy in automotive industries. Int. J. Manag. Sci. Eng. Manag. 2010, 5, 446–452.
12. Poór, P.; Ženíšek, D.; Basl, J. Historical overview of maintenance management strategies: Development from breakdown maintenance to predictive maintenance in accordance with four industrial revolutions. In Proceedings of the International Conference on Industrial Engineering and Operations Management, Pilsen, Czech Republic, 23–26 July 2019.
13. Ucar, A.; Karakose, M.; Kırımça, N. Artificial intelligence for predictive maintenance applications: Key components, trustworthiness, and future trends. Appl. Sci. 2024, 14, 898.
14. Coppola, R.; Morisio, M. Connected car: Technologies, issues, future trends. ACM Comput. Surv. (CSUR) 2016, 49, 1–36.
15. Engine Failure Dataset. Available online: https://www.kaggle.com/datasets/m0ntecarl0/engine-time-to-failure (accessed on 28 July 2024).

Figure 1. Steps to achieving the research goal.

Figure 2. Correlation heatmap.

Figure 3. Histogram of (a) s12 signal; (b) s14 signal; (c) s17 signal; (d) time to failure.

Table 1. First few rows of the dataset.

ID	TTF	s12	s14	s17
1	191	521.66	8138.62	392
1	190	522.28	8131.49	392
1	189	522.42	8133.23	390
1	188	522.86	8133.83	392
1	187	522.19	8133.80	393

The dataset contains 20,631 entries. The ID ranges from 1 to 100; this is the count of the engines. TTF, given in the number of cycles, ranges from 0 to 361, with a mean of approximately 108. One engine was broken at the time of the investigation (TTF = 0). The duration of the cycles was not indicated in the dataset.

Table 2. Descriptive statistics.

Statistics	ID	TTF	s12	s14	s17
Count	20,631	20,631	20,631	20,631	20,631
Mean	51.51	107.81	521.41	8143.75	393.21
Std Dev	29.23	68.88	0.74	19.08	1.55
Min	1.00	0.00	518.69	8099.94	388.00
Max	100.00	361.00	523.38	8293.72	400.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
