1. Introduction
Digital twin (DT) technology is a cutting-edge innovation with the potential to revolutionize various industries. A DT is a virtual replica of a physical object or system, continuously updated and improved through data-driven analysis and decision-making. The replica is composed of computational models that evolve over time, reflecting the structure, behavior, and environment of the physical object or system they represent [1,2]. Digital twin systems represent physical systems such as vehicles, buildings, or manufacturing processes, and are used to simulate the behavior and performance of the physical system and to predict its behavior under different conditions. This is useful for a variety of applications, such as maintenance planning, optimizing the operation of the physical system, or analyzing the impact of changes to the system's design or operation.
The development of an operational vehicle digital twin system for urban air mobility (UAM-ODT) comprises the following fundamental modules, as shown in Figure 1: (i) the neural digital twin dynamic engine (DTDE), (ii) the neural digital twin control engine (DTCE), (iii) the digital twin control frame (DTCF), and (iv) the digital twin cloud infrastructure (DTCI). The DTDE module creates a virtual replica of the aerodynamics of UAM vehicles using learning-based techniques. The DTCE module performs control tasks, such as robust, optimal, and adaptive control, to ensure the safety of the vehicle. Together, these two modules digitalize the dynamics and control of the vehicle so that its operations in the digital space are identical to those in the physical space. The DTCF module serves as a bridge between the digital twin and the physical twin of the vehicle. It can provide teleoperation services, fault-tolerant control, or traffic prediction and management, on the premise that if the dynamics and control of the physical vehicle are accurately captured in the digital space along with the digital environment (e.g., city, region, country), operations in the digital space can be effectively transferred to the physical space. The DTCI module is the common computing platform that hosts the entire UAM-ODT system, running constantly to create a virtual space of the real-world UAM physical infrastructure. Because of the stringent requirements for high availability of the digital twin system, the DTCI must handle any failure and maintain uninterrupted digital operations and services in the long run. In particular, a digital twin that runs day and night is subject to a phenomenon known as "software aging": the gradual deterioration of the performance and reliability of software over time, due to factors such as changes in the operating environment, errors and defects in the software, or the accumulation of degraded software state. A digital twin that runs continuously experiences software aging more quickly than one run only intermittently, which can make it less accurate and less reliable over time and degrade the quality of the predictions and decisions it makes.
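To make the division of responsibilities among these modules concrete, the sketch below models them as minimal Python interfaces. It is purely illustrative: the class names mirror the modules described above, but the toy forward-Euler dynamics and proportional controller are hypothetical stand-ins for the paper's learning-based engines, not the authors' implementation.

```python
class DTDE:
    """Neural digital twin dynamic engine: stands in for the learned vehicle dynamics."""

    def predict_state(self, state, control, dt):
        # Placeholder dynamics: a forward-Euler step driven by the control input.
        return [s + c * dt for s, c in zip(state, control)]


class DTCE:
    """Neural digital twin control engine: computes control commands toward a target."""

    def compute_control(self, state, target):
        # Placeholder proportional controller (gain 0.5 is an arbitrary choice).
        return [0.5 * (t - s) for s, t in zip(state, target)]


class DTCF:
    """Digital twin control frame: couples the dynamic and control engines."""

    def __init__(self, dtde, dtce):
        self.dtde = dtde
        self.dtce = dtce

    def step(self, state, target, dt=0.1):
        # One digital-space cycle: ask the control engine for a command,
        # then ask the dynamic engine for the resulting next state.
        control = self.dtce.compute_control(state, target)
        return self.dtde.predict_state(state, control, dt)
```

In this toy loop, repeatedly calling `step` drives the digital vehicle's state toward a target state, mimicking how the DTCF mediates between the control and dynamics engines in the digital space.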
In this work, we investigate the software aging problems in the digital twin cloud infrastructure, which is developed upon a Kubernetes-based cloud environment, using a cloud-in-the-loop simulation approach.
Software aging occurs in operating software systems, causing sudden failures such as crashes as well as continuous performance degradation; it can be circumvented by proactive strategies such as software rejuvenation to avoid abrupt system interruptions [3,4]. The relevance of this phenomenon is remarkable, considering that the demand for availability and reliability in service provision has increased in practically all areas as a condition for quality and competitiveness. Computing, health, security, financial, geolocation, and routing services are examples with high availability requirements. To meet such service demands without the unwanted effects of software aging, it is necessary to use an architecture capable of sustaining the service offer without the huge operational costs of employing several redundant servers with high computational power, which require human resources for their handling and management and incur higher energy costs. Virtual machines have been an alternative in such contexts because they provide the functionality of a physical server based on the same traditional computational architecture. It is thus possible to create several virtual machines on a single server, each running a different environment, allowing the execution of heterogeneous systems [5]. The scalability and flexibility of IT (Information Technology) can be increased through virtualization, in addition to generating significant savings in operational costs. IT administration also becomes easier to manage, with better availability, operability, performance, and greater workload mobility [6].
If the flight control software of an unmanned aerial vehicle (UAV) experiences software aging, it can become less accurate and less reliable, causing the UAV to behave in unexpected or unsafe ways. To address this, the flight control software must be periodically updated and maintained: installing patches and updates, fixing errors and defects, and re-tuning or re-calibrating the software to account for changes in the environment or in the UAV itself. Regular maintenance and updates help to ensure that the flight control software remains accurate and reliable over time, and help to prevent or mitigate the effects of software aging. In some cases, aging can make the flight control software unstable or unreliable, and the UAV may have to be taken out of service temporarily for maintenance or repairs, for example to replace or upgrade the flight control software or to make other changes that improve its performance and reliability. Periodic software maintenance, together with out-of-service repairs when necessary, helps to ensure that the UAV remains safe and reliable over time.
To create a digital twin, a mathematical model of the physical system is built from data about the system's behavior and performance. This model is then used to simulate the system's behavior under different conditions and to make predictions about its performance, so accurate and reliable software is essential for creating the model and simulating the system. Software aging, however, is a problem for digital twin systems: as the software used to create and simulate the digital twin ages, it becomes less accurate and less reliable, which can cause incorrect or inconsistent predictions and, in turn, incorrect or sub-optimal decisions and actions based on them. To address software aging in digital twin systems, the underlying software must be periodically updated and maintained: installing patches and updates, fixing errors and defects, and re-tuning or re-calibrating the software to account for changes in the environment or in the modeled system. Regular maintenance and updates help to ensure that the digital twin remains accurate and reliable over time, and help to prevent or mitigate the effects of software aging.
Digital twin systems can experience a variety of errors, depending on the specific characteristics of the system and the software being used. Some common types of errors that can occur in digital twin systems include:
Data errors: Digital twin systems are typically based on data about the behavior and performance of the physical system being modeled. If the data are incorrect or inconsistent, it can cause errors in the digital twin. For example, if the data contain missing or invalid values, or if the data are not properly pre-processed or cleaned, it can affect the accuracy and reliability of the digital twin.
Modeling errors: Digital twin systems are based on mathematical models of the physical system being modeled. If the model is incorrect or incomplete, it can cause errors in the digital twin. For example, if the model does not accurately represent the underlying physical principles or behaviors of the system, or if the model is not properly calibrated or validated, it can affect the accuracy and reliability of the digital twin.
Software errors: Digital twin systems are implemented using software, and software can contain errors or defects. If the software used to create or simulate the digital twin contains errors, it can cause the digital twin to behave in unexpected or incorrect ways. For example, if the software contains bugs or syntax errors, or if the software is not properly designed or implemented, it can affect the accuracy and reliability of the digital twin.
Overall, digital twin systems can experience a variety of errors, including data errors, modeling errors, and software errors. To address these errors and improve the accuracy and reliability of the digital twin, it is important to carefully collect and pre-process the data, to create accurate and well-calibrated models, and to use high-quality software that is free of errors and defects.
When using virtualization, it is possible to consolidate many servers onto a smaller number of hosts (physical servers), which saves physical space and reduces energy costs. However, once a virtual machine is initialized, an entire copy of the Operating System (OS) and its virtualized hardware must be loaded, consuming many system resources and making virtualization very expensive from a computational point of view [7]. Containerization mitigates the operational cost of traditional virtualization, as stated in [5], which addresses host-level virtualization, known as containers. This type of virtualization acts on top of the physical server and supports several independent systems: since the physical server already has an OS installed, there is no need to load all the host hardware or a copy of the OS. Container-based virtualization has recently gained much attention [8,9]. It allows an application to run efficiently in the most varied computing environments by encapsulating the application and its dependencies [10]. The author of [11] describes this virtualization technique as lightweight, since it significantly decreases workloads by sharing OS resources from the host. Containers provide an isolated environment for system resources such as processes, file systems, and networks at the host OS level, without having to run a Virtual Machine (VM) with its own OS on top of virtualized hardware. By sharing the same OS kernel, containers start much faster and use a small amount of system memory compared to booting an entire virtualized OS, as in VMs [10].
Kubernetes is a widely used tool for configuring, maintaining, and managing container-based solutions adopted in preference to VMs. This work therefore aims to evaluate the effects of software aging and the performance of Kubernetes under a high-stress load, characterized by creating replicas of pods to maintain service availability, in the Minikube and K3S environments. Furthermore, the aging problem of an unmanned vehicle refers to the degradation of the vehicle's performance over time due to factors such as wear and tear, corrosion, and obsolescence. As an unmanned vehicle ages, its components may become less reliable and less capable of performing their intended functions, affecting the vehicle's ability to operate safely and effectively. Aging can be particularly challenging for unmanned vehicles, which often operate in harsh or hostile environments and may be subjected to high levels of stress and strain. For example, an unmanned aerial vehicle (UAV) may experience high levels of vibration and air turbulence during flight, which wear out its components faster, while an unmanned underwater vehicle (UUV) may be exposed to corrosive saltwater that deteriorates its components over time. In this work, our focus is the investigation of a digital twin cloud infrastructure: a Kubernetes-based cloud environment is examined with respect to the software aging phenomenon when hosting the UAM-ODT with no downtime.
The study in this work extends the related research on software aging in virtualized environments through the following key contributions:
- a cloud-based simulation platform with provisioning for the development of UAM-ODT infrastructures;
- a methodology for the measurement and assessment of software aging in a container-based environment with a Kubernetes cluster in the digital twin cloud;
- comprehensive test-bed experiments and observations of software aging phenomena, along with software rejuvenation, in Kubernetes clusters based on Minikube and K3S environments.
Findings and impacts:
- The aging events found in the test-bed experiments indicate threats of system failures and performance degradation due to software aging symptoms. However, when those events occur depends on the characteristics and intensity of the workload that the system processes, as well as on the hardware and software specification of the Kubernetes system.
- If the system has more resources available, or a lighter workload than the one employed in this experiment, the aging phenomenon would be slower, and failures due to resource exhaustion would take longer to occur. This does not reduce the importance of evaluating software aging in such systems and of planning actions for its mitigation.
To the best of our knowledge, this work contributes to the practical implementation and maintenance of virtualized environments from the perspective of system dependability in digital twin computing infrastructures, in which a huge number of services run under a stringent requirement of continuity. The findings of this study further the comprehension of software aging phenomena in digital twin computing infrastructures developed on top of Kubernetes, a topic still at a very early stage of research on software aging for a high level of dependability and fault tolerance in digital twin computing infrastructures.
To facilitate the understanding of this work, the paper is organized as follows. Section 2 addresses the related works that inspired this study on software aging assessment; Section 3 presents the fundamental concepts and system design used in this work; Section 4 deals with the research methodology; Section 5 discusses the objective and planning of the experiments, covering the context in which they were produced, the selected tools, the variables involved, the scripts for reproduction, and the hardware used. The results are presented and discussed in Section 6, and Section 7 presents the concluding remarks arising from our research results.
2. Related Work
The work described in [10] analyzes the performance of running containers with services hosted on them, carrying out experiments that monitor system resources, including network, memory, disk, and CPU. The testbed environment consists of a manually deployed Kubernetes cluster used to carry out the evaluation, considering the Microsoft Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), or Amazon Elastic Container Service for Kubernetes (EKS).
The authors in [12] evaluated the memory utilization, network overhead, storage, and CPU of containers using Docker, comparing them with KVM hypervisors. Their experiments showed that the containers obtained, in the worst case, similar or superior performance compared to the VMs.
The work presented in [13] conducted a similar study, but compared the performance of containers by monitoring the number of requests an application server could handle relative to the same application deployed in a VM; the results showed that the VMs significantly outperformed the containers.
The research reported in [14] performed application experiments for HPC (high-performance computing), using benchmarking tools to evaluate memory, network, disk, and CPU performance in Linux Container (LXC)-related virtualization implementations, along with OpenVZ and Linux-VServer, showing that all containerized apps performed similarly to a native system.
The authors of [15,16] showed improvements related to performance isolation for MapReduce workloads. However, when evaluating disk workloads, LXC failed to fully isolate resources, the opposite behavior to that of hypervisor-based systems.
Through memory, network, and disk metrics, the authors of [17] evaluated the performance of LXC, Docker, and KVM, running many benchmarking tools to measure the performance of these components. They concluded that the overhead caused by container-based virtualization technologies could be considered negligible, with the performance cost being offset by the isolation and safety it provides.
Our main focus is on software aging investigation on a private cloud system hosting an operational digital twin of an eVTOL vehicle flying in a virtualized urban air mobility. Operational digital twins of vehicles in urban air mobility are digital representations of real-world vehicles that can be used for a variety of purposes. Some potential uses of operational digital twins in urban air mobility include:
Performance modeling and simulation: Operational digital twins can be used to model and simulate the performance of vehicles in urban air mobility systems, including their flight dynamics, propulsion systems, and control systems. This can help to optimize the design and operation of vehicles, to improve their performance and efficiency, and to identify potential issues or risks.
Fleet management and maintenance: Operational digital twins can be used to monitor the condition and performance of vehicles in real time, and to provide information about their current state and status. This can be used to support fleet management and maintenance operations, by providing timely and accurate data about the health and safety of vehicles, and by enabling proactive maintenance and repair.
Traffic management and control: Operational digital twins can be used to support traffic management and control in urban air mobility systems, by providing information about the location, orientation, and velocity of vehicles. This can help to coordinate the movement of vehicles, to avoid collisions and other hazards, and to optimize the flow of traffic in urban airspace.
Emergency response and rescue: Operational digital twins can be used to support emergency response and rescue operations in urban air mobility systems, by providing real-time information about the location and status of vehicles. This can help to quickly and accurately identify the location and condition of vehicles in distress, and to coordinate rescue and recovery efforts.
Operational digital twins of vehicles in urban air mobility can be used for a variety of purposes, including performance modeling and simulation, fleet management and maintenance, traffic management and control, and emergency response and rescue. Because of such constant operational services, the UAM-ODT cloud system inevitably suffers from software aging problems. In this study, we specifically investigate the software aging problems of a UAM-ODT cloud system based on a Kubernetes virtualization environment.
4. Methodology
The methodology adopted in this work follows the flow shown in Figure 7, which in summary is an experimental evaluation based on measuring system resource usage and performance in a container-based environment with a Kubernetes cluster. The evaluation was carried out in different scenarios using the Minikube or K3S tool to manage the cluster.
Kubernetes is an open-source platform for managing and orchestrating containerized applications. It allows multiple containers, such as those created with Docker, to be deployed and managed across a cluster of machines, and it provides many features and tools that help automate, scale, and manage applications and their dependencies. One of the main benefits of Kubernetes is that it simplifies and automates the process of deploying and managing applications in a distributed environment, saving time and effort and improving the reliability and scalability of applications. Additionally, Kubernetes provides features and tools for managing and monitoring applications, such as: (i) service discovery and load balancing: Kubernetes can automatically assign unique IP addresses to containers and automatically distribute incoming traffic across the containers in the cluster; (ii) configuration management: Kubernetes allows an application's configuration to be defined declaratively, using YAML files or other configuration formats, making it easier to manage and update the configuration of applications; (iii) health checking: Kubernetes can monitor the health of containers and applications and automatically restart or replace containers that are not functioning properly; (iv) self-healing: Kubernetes can automatically detect and recover from failures in an application, such as when a container crashes or a node in the cluster goes down. The benefits of Kubernetes therefore include improved automation, scalability, and reliability for applications, as well as a rich set of features and tools for managing and monitoring applications in a distributed environment [35].
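As a concrete illustration of the health-checking and self-healing features just described, the snippet below builds a minimal Kubernetes Deployment object, expressed as a Python dict as one would serialize to YAML or JSON before applying it with `kubectl`. The replica count (echoing the five Pods used later in this study) and the liveness-probe settings are illustrative assumptions, not the exact configuration used in the experiments.

```python
import json

# Illustrative Deployment sketch: an Nginx web server with several replicas
# that Kubernetes keeps healthy via a liveness probe (self-healing restarts).
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx-deployment"},
    "spec": {
        "replicas": 5,
        "selector": {"matchLabels": {"app": "nginx"}},
        "template": {
            "metadata": {"labels": {"app": "nginx"}},
            "spec": {
                "containers": [{
                    "name": "nginx",
                    "image": "nginx:latest",
                    "ports": [{"containerPort": 80}],
                    # Restart the container if it stops answering HTTP.
                    "livenessProbe": {
                        "httpGet": {"path": "/", "port": 80},
                        "periodSeconds": 10,
                    },
                }]
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

When such an object is applied, Kubernetes continuously reconciles the cluster toward the declared state: if a container fails its probe or a Pod disappears, a replacement is started automatically.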
In this research, the experiments were carried out using scripts developed for these scenarios to simulate the distribution of a service through a Kubernetes cluster that can be accessed externally over the Internet and receives a high-stress load of requests. We also developed scripts to monitor software aging metrics, such as CPU utilization, memory consumption, and disk utilization, among others, and to measure the performance of a service hosted in Kubernetes by checking the time of correctly fulfilled requests.
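A minimal sketch of such a stress-load script is shown below, assuming plain HTTP GET requests against the externally exposed service URL. The real workload generator, its request rate, and its concurrency are not reproduced here; this version simply records, for each request, the HTTP status and the time taken to fulfill it.

```python
import time
import urllib.request


def stress(url, n_requests, timeout=5.0):
    """Send n_requests GET requests to url, recording (status, latency) pairs.

    A failed or timed-out request is recorded with status None, so the
    caller can count correctly fulfilled requests versus failures.
    """
    results = []
    for _ in range(n_requests):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status = resp.status
        except Exception:
            status = None
        results.append((status, time.monotonic() - start))
    return results
```

A monitoring script can run alongside this loop, sampling resource metrics at fixed intervals while the cluster serves the request load.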
The Minikube and K3S environments adopted in this experimental evaluation are each composed of a cluster containing 5 Pods and 1 Service, which allows communication between the Pods. One of the Pods was configured as a Deployment of an Nginx web server, enabling tests of the performance of an application hosted in Kubernetes, responding to user requests from anywhere on the Internet.
CPU utilization, memory consumption, disk utilization, and total response time were some of the metrics used for this study, based on the metrics used in [10,14]. The results of these measures were captured by scripts developed for this purpose and, finally, evaluated through analysis of their behavior.
The proposed methodology can in fact be applied to a typical operational digital twin of heterogeneous UAM vehicles, including rotary aircraft such as drones or helicopters, fixed-wing aircraft, or hybrid aircraft such as eVTOL vehicles, since the UAM-ODT platform is designed to run on a private cloud computing system based on the cloud-in-the-loop simulation paradigm with heterogeneous digital twin modules of dynamics and controls, as shown in Figure 4.
6. Experimental Results
In this section, the results collected from the experiment are presented for both the Minikube and K3S environments, considering the metrics of CPU utilization, memory consumption, disk utilization, and, finally, the requests made to the service. Each metric is described in the following subsections. These are metrics for the continuity and performance of the UAM-ODT cloud infrastructure. The data were collected from the cloud infrastructure rather than from the vehicle, because we are investigating the software aging problems of a private cloud hosting 24/7/365 operational digital twin services for UAM management.
It is worth highlighting that the experiments' total time differs between Minikube and K3S due to a difference in the average time to restart the pods within the auto-scaling process. This information was also measured and is presented in Table 1, showing that this action executes fastest in the K3S environment, 25.4% faster than in Minikube, evidencing the improved auto-scaling efficiency of K3S compared to Minikube.
6.1. CPU Utilization
In the CPU utilization evaluation, data were collected for the following metrics: USR, the percentage of CPU used by the task during execution at the user level; SYS, the percentage of CPU used by the task during execution at the kernel level of the OS; WAIT, the percentage of CPU time the task spent waiting to be executed; and, finally, CPU_TOTAL, the total percentage of CPU time used by the task. These metrics were monitored by the Pidstat tool, which provides statistics reports for tasks on GNU/Linux systems.
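A small sketch of how such metrics can be extracted from `pidstat -u` output is shown below. The column layout follows recent sysstat versions (where a `%wait` column exists) and may differ on other versions, so the parser indexes columns by header name rather than by position; the sample line used to exercise it below is fabricated for illustration, not taken from the experiments.

```python
def parse_pidstat_cpu(header_line, sample_line):
    """Map one pidstat -u sample line to the named CPU metrics of this study.

    Uses the header line to locate columns, since the layout varies
    across sysstat versions. Returns USR, SYS, WAIT, and CPU_TOTAL
    as percentages (CPU_TOTAL can exceed 100% on multi-core usage).
    """
    columns = header_line.split()
    values = sample_line.split()
    row = dict(zip(columns, values))
    return {
        "USR": float(row["%usr"]),
        "SYS": float(row["%system"]),
        "WAIT": float(row["%wait"]),
        "CPU_TOTAL": float(row["%CPU"]),
    }
```

A monitoring loop would call `pidstat` periodically, feed each sample line through this parser, and append the resulting values to a time series for later analysis.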
Figure 10 shows a peak of 180% CPU_TOTAL during the initialization of the cluster, but an average slightly above 100% during the entire experiment in the Minikube environment. The graph also shows controlled behavior in Minikube with respect to the metric limits, since the limit is exceeded only when starting the environment. Note that utilization values higher than 100% in this context reflect the use of more than one processor core by the process.
Figure 11 shows a different behavior of K3S compared to Minikube regarding CPU utilization: unlike in Minikube, the CPU_TOTAL metric increases together with the USR metric over time, and this growth is interrupted only when the cluster termination is applied, which seems to act as a software rejuvenation technique in this situation. Nevertheless, during the entire experiment, CPU_TOTAL did not exceed 60%.
6.2. Disk-Related Metrics
In the evaluation of disk-related metrics, data were collected for the following metrics: READ, the number of kilobytes per second read by the task; WRITE, the number of kilobytes per second the task sent to be written to disk; and, finally, CANCELLED, the number of kilobytes per second whose disk writing was cancelled by the task, which can also occur when the task truncates a dirty page cache. All these metrics were monitored by the Pidstat tool in both Minikube and K3S.
In Figure 12, the WRITE and CANCELLED metrics are unchanged throughout the experiment, always remaining close to 0 KB/s. The READ metric, however, behaved distinctly: it held the same value throughout a single cycle of the cluster stress and grew linearly across cycles until the fourth execution cycle, when it was abruptly interrupted upon reaching about 4,000,000 KB/s due to the limiting factor of the Minikube environment. Such behavior may be indicative of the software aging phenomenon in this environment.
In Figure 13, the WRITE and CANCELLED metrics also remain close to 0 KB/s throughout the experiment, similar to the execution in the Minikube environment. The READ metric, however, behaves differently: it was not interrupted abruptly and grew linearly from one cycle to another until the end of the experiment in K3S. It is important to mention that K3S presented smaller values of bytes read per second than Minikube, which might have prevented the abrupt fall observed there.
6.3. Memory Usage
In the evaluation of memory consumption, data were collected for the following metrics: MEM_USED, the total memory used; MEM_FREE, the memory not being used; MEM_AVAILABLE, an estimate of how much memory is available to start new applications without swapping (it may include memory space being used for buffers or cache); MEM_SHARED, the memory mainly used by TMPFS, the file system that keeps all files in virtual memory; MEM_BUFFERS_CACHED, the sum of the memory buffers and cache; and SWAP_USED and SWAP_FREE, which represent respectively the used and free amounts of the virtual memory swap space, which allows the system to use part of the hard disk as physical memory. All these metrics were monitored using the "free" tool in both the Minikube and K3S environments.
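The sketch below shows how these metrics can be pulled out of the output of the `free` tool, assuming the procps-ng column names (total, used, free, shared, buff/cache, available). The layout can differ across `free` versions, so this parser, like the sample output used to exercise it, is an illustrative assumption rather than the exact monitoring script used in the study.

```python
def parse_free(output):
    """Parse `free -m` output into the memory metrics tracked in this study.

    The Mem: row carries six values and the Swap: row three (total,
    used, free); values are in mebibytes when `free -m` is used.
    """
    lines = [line.split() for line in output.strip().splitlines()]
    header = lines[0]                      # total used free shared buff/cache available
    mem = dict(zip(header, lines[1][1:]))  # skip the "Mem:" label
    swap = dict(zip(header, lines[2][1:])) # skip the "Swap:" label
    return {
        "MEM_USED": int(mem["used"]),
        "MEM_FREE": int(mem["free"]),
        "MEM_SHARED": int(mem["shared"]),
        "MEM_BUFFERS_CACHED": int(mem["buff/cache"]),
        "MEM_AVAILABLE": int(mem["available"]),
        "SWAP_USED": int(swap["used"]),
        "SWAP_FREE": int(swap["free"]),
    }
```

Sampling this parser's output at fixed intervals yields exactly the MEM_* and SWAP_* time series analyzed in the figures of this section.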
In the evaluation of memory utilization in Minikube, the MEM_USED metric in Figure 14 mirrors the MEM_AVAILABLE metric: while MEM_USED increases throughout the experiment, MEM_AVAILABLE decreases in an inversely proportional trend. MEM_USED shows a consumption increase of around 70% by the end of the experiment, even with rejuvenation applied (i.e., cluster termination and restart between cycles). Such an action drops the memory usage temporarily, but when the cluster is started again, the system returns to the memory usage level observed at the end of the previous cycle. The MEM_FREE metric drops by close to 48%, and the MEM_BUFFERS_CACHED metric drops by around 41%. The SWAP_USED metric also behaves inversely to the SWAP_FREE metric: SWAP_USED increases by 20% by the end of the experiment while SWAP_FREE drops by 11%. The MEM_SHARED metric behaves similarly in both Minikube and K3S, remaining regularly between 48 and 179 MB of consumption.
In the evaluation of memory utilization in K3S, the MEM_USED metric in Figure 15 showed behavior similar to that observed in Minikube. MEM_USED increases by around 61% by the end of the experiment, even with rejuvenation applied. The MEM_FREE metric decreases by close to 79%. MEM_BUFFERS_CACHED increases by around 12%, which differs from the behavior in Minikube. SWAP_USED increases by 8% by the end of the experiment and SWAP_FREE decreases by 8.5%.
For these memory consumption metrics, both in Minikube and in K3S, linear regression calculations on MEM_USED were performed to estimate the moment when the system would reach its upper limit for RAM usage, which in these cases is 8 GB. To confirm that estimate, we also computed the linear regression for MEM_FREE, which is another way to indicate the exhaustion of the resource, leading to system downtime and, consequently, the interruption of service provision. Similar regression estimates were carried out for the swap space usage.
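The extrapolation described above can be sketched as a small least-squares fit. The function below fits MEM_USED (or SWAP_USED) samples against time and solves for when the fitted line crosses the resource limit; the data points in the usage below are hypothetical illustrations, not the study's measurements.

```python
def hours_to_exhaustion(hours, used_mb, limit_mb):
    """Fit used = slope * t + intercept by least squares, then extrapolate.

    Returns the time (in the same unit as `hours`) at which the fitted
    line reaches limit_mb, or None if usage is not growing (slope <= 0).
    """
    n = len(hours)
    mean_t = sum(hours) / n
    mean_u = sum(used_mb) / n
    # Closed-form simple linear regression: slope = cov(t, u) / var(t).
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(hours, used_mb))
    var = sum((t - mean_t) ** 2 for t in hours)
    slope = cov / var
    intercept = mean_u - slope * mean_t
    if slope <= 0:
        return None
    return (limit_mb - intercept) / slope
```

For example, hypothetical samples growing 40 MB per hour from a 1000 MB baseline would hit an 8192 MB (8 GB) RAM limit after about 180 h, which is the kind of estimate Equations (1)-(4) provide for the measured series.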
The linear regression Equation (1) was obtained for the MEM_USED metric in the Minikube environment (MU_Minikube), shown in Figure 14. From this equation, it can be observed that, as a function of MU_Minikube, the 8 GB limit is reached after 170 h (i.e., 7 days and 2 h) of continuous execution of the workload used in the experiment. For the SWAP_USED metric, also shown in Figure 14 for the Minikube environment, the linear regression Equation (2) was obtained.
From Equation (2), it can be observed that the upper limit of the SWAP_USED metric in the Minikube environment (SU_Minikube), which in this case is 5.8 GB, is reached after approximately 551 h (i.e., 22 days and 23 h) of the experiment, at which point the resource would be completely exhausted.
For the MEM_USED metric of the K3S environment (MU_K3S), the linear regression Equation (3) was obtained, from which it can be observed that the 8 GB limit for the MEM_USED metric is reached after 187 h (i.e., 7 days and 19 h) of workload execution.
Finally, for the SWAP_USED metric of the K3S environment (SU_K3S), the linear regression Equation (4) was obtained, which indicates exhaustion of the resource, totaling 5.8 GB, after 603 h (i.e., 25 days and 3 h) of the workload performed in the experiment.
6.4. Evaluation and Discussions
When evaluating the results presented in Figure 10 and Figure 11, it can be seen that most of the CPU consumption occurs through the USR metric in the K3S environment, while the SYS metric accounts for the highest consumption in the Nginx environment. The growth of the USR metric in K3S recurred even after applying software rejuvenation every cycle, unlike the Nginx environment, which maintained stable CPU utilization metrics.
The results presented in Figure 12 and Figure 13 show similar behavior of the disk usage metrics in the K3S and Nginx environments, differing only in that, in Nginx, the READ metric is interrupted when it reaches 4,000,000 KB/s, returning in the fifth cycle with a total utilization close to 10%. In the K3S environment, the READ metric is not interrupted but grows linearly from one workload cycle to the next. In both scenarios, the READ metric generally presents linear growth, representing a greater need for disk reads in each cycle.
Figure 14 and Figure 15 show similar behavior of the memory consumption metrics in the Minikube and K3S environments, respectively. In both, there is linear growth of the MEM_USED and SWAP_USED metrics, and the opposite behavior of the MEM_AVAILABLE and SWAP_FREE metrics. Thus, with Equations (1)–(4) obtained from the linear regressions, it is possible to foresee the effects of software aging on memory consumption even after the application of a potential software rejuvenation action, that is, the cluster termination.
The results presented in this section constitute an observation of the software aging phenomenon in a private cloud system hosting a UAM-ODT platform for UAM management. It is crucial to emphasize that these findings point to the risks of system breakdowns and performance declines brought on by software aging. However, the timing of those events depends on the nature and volume of the workload the system must handle, in addition to the hardware and software configuration of the particular Kubernetes system. With more resources available, or a lighter workload than the one used in this experiment, the aging phenomena would be delayed and the failures caused by resource exhaustion would arrive later. This does not lessen the importance of assessing software aging in such systems and organizing countermeasures. Evaluating these scenarios with other software rejuvenation approaches and complementary software aging metrics is the most promising direction for future work.
Regarding how to avoid the observed software aging phenomenon in the UAM-ODT infrastructure, several strategies can be used to avoid or mitigate software aging. These include: (i) regularly updating and patching the software to fix bugs and security vulnerabilities; (ii) monitoring the performance of the software and identifying potential problems before they occur; (iii) implementing automation and management tools to help manage the software and its dependencies; (iv) using modular, microservice-based architectures to make it easier to update and maintain individual components of the system; and (v) using containerization technologies, such as Docker, to package the software and its dependencies into a self-contained environment that can be easily deployed and managed. These are just some examples of strategies for avoiding software aging in a cloud system. Currently, the adopted technique is monitoring the performance of the software and identifying potential problems before they occur. Further investigation into adopting software rejuvenation techniques in an optimal and automatic manner will be an interesting extension of research on the UAM-ODT system, in which the services for UAM management using ODT must run constantly with zero downtime.
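Strategy (ii), performance monitoring with early problem detection, can be turned into a simple proactive rejuvenation policy: trigger a cluster restart when the linear trend of MEM_USED is projected to reach the RAM limit within a safety horizon. The function below is a hypothetical sketch, not part of the paper's system; the limit, horizon, and sampling interval are illustrative assumptions.

```python
# Sketch: a proactive rejuvenation policy for a long-running cluster.
# Rejuvenation (cluster termination/restart) is triggered when the
# linear trend of MEM_USED is projected to hit the RAM limit within a
# safety horizon. All thresholds here are illustrative assumptions.

def should_rejuvenate(samples_mb, limit_mb=8192.0, horizon_h=24.0,
                      interval_h=1.0):
    """samples_mb: MEM_USED readings taken every `interval_h` hours.
    Returns True if projected exhaustion is within `horizon_h` hours."""
    n = len(samples_mb)
    if n < 2:
        return False  # not enough history to estimate a trend
    ts = [i * interval_h for i in range(n)]
    # Ordinary least squares for MEM_USED(t) = slope * t + intercept.
    mean_t = sum(ts) / n
    mean_y = sum(samples_mb) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, samples_mb))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    if slope <= 0:
        return False  # memory usage flat or shrinking: no aging trend
    intercept = mean_y - slope * mean_t
    hours_left = (limit_mb - intercept) / slope - ts[-1]
    return hours_left <= horizon_h

# Steep growth (~100 MB/h from 7000 MB): exhaustion is imminent.
growing = [7000 + 100 * i for i in range(12)]
# Flat usage: no trend, so the policy never fires.
flat = [1500.0] * 12
```

A scheduler could evaluate this predicate after each workload cycle and restart the cluster only when needed, instead of on a fixed calendar, which matters for a UAM-ODT deployment aiming at zero downtime.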