Article

RAP-Optimizer: Resource-Aware Predictive Model for Cost Optimization of Cloud AIaaS Applications

1 Google LLC, Sunnyvale, CA 94089, USA
2 Business Information Developer Consultant, Carelon Research, Celina, TX 75009, USA
3 Cloud Software Development Engineer and Technical Lead, Intel, Phoenix, AZ 85050, USA
4 Department of Software Engineering, Walmart Global Tech, Sunnyvale, CA 94086, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(22), 4462; https://doi.org/10.3390/electronics13224462
Submission received: 2 October 2024 / Revised: 3 November 2024 / Accepted: 7 November 2024 / Published: 14 November 2024

Abstract

Artificial Intelligence (AI) applications are growing rapidly, and more applications are joining the market competition. As a result, the AI-as-a-service (AIaaS) model is experiencing rapid growth. Many of these AIaaS-based applications are not properly optimized initially; once they start experiencing a large volume of traffic, different challenges start revealing themselves. One of these challenges is maintaining a profit margin for the sustainability of the AIaaS application-based business model, which depends on the proper utilization of computing resources. This paper introduces the resource-aware predictive (RAP) model for AIaaS cost optimization, called the RAP-Optimizer. It is developed by combining a deep neural network (DNN) with the simulated annealing optimization algorithm, and it is designed to reduce resource underutilization and minimize the number of active hosts in cloud environments. It dynamically allocates resources and handles API requests efficiently. The RAP-Optimizer reduces the number of active physical hosts by an average of 5 per day, leading to a 45% decrease in server costs. The impact of the RAP-Optimizer was observed over a 12-month period. The observational data show a significant improvement in resource utilization, effectively reducing operational costs from USD 2600 to USD 1250 per month. Furthermore, the RAP-Optimizer increases the profit margin by 179%, from USD 600 to USD 1675 per month. The inclusion of the dynamic dropout control (DDC) algorithm in the DNN training process mitigates overfitting, achieving a 97.48% validation accuracy and a validation loss of 2.82%. These results indicate that the RAP-Optimizer effectively enhances resource management and cost-efficiency in AIaaS applications, making it a valuable solution for modern cloud environments.

1. Introduction

The current cloud application trends demonstrate the rapid growth of AI-driven applications powered by AIaaS as the back end [1]. As a result, the demand for efficient resource utilization has become paramount [2]. Many innovative cloud-based services, particularly those adopting the AI-as-a-service (AIaaS) model, suffer from underutilization of resources, leading to escalating operational costs [3]. The initial assumptions about the resource limits of these applications change as the number of users and API requests increases, so service providers face the challenge of managing their physical and virtual resources effectively while maintaining the quality of service (QoS) [4]. The RAP-Optimizer presented in this paper addresses these issues by optimizing resource allocation and minimizing the number of active hosts in AIaaS environments. This system not only reduces server costs but also improves resource efficiency, leading to better profit margins and energy savings. The potential of the RAP-Optimizer lies in its ability to dynamically balance API request loads across cloud servers; in this way, it prevents the unnecessary activation of additional resources and ensures sustainable and scalable cloud operations.
The proposed RAP-Optimizer was developed by integrating a deep neural network (DNN) [5] with the simulated annealing algorithm [6] to create a robust framework for real-time resource management. The DNN predicts the optimal configuration for virtual machines (VMs) based on real-time data analysis, while the simulated annealing algorithm helps optimize resource allocation by minimizing the number of active hosts. Additionally, the system incorporates a dynamic dropout control (DDC) algorithm to mitigate overfitting issues during the model training phase. The RAP-Optimizer operates in a multi-stage workflow, beginning with resource analysis through the resource analyzer (RAN) algorithm, which identifies underutilized hosts and redistributes API requests to ensure optimal cloud resource usage. By consolidating workloads and deactivating idle hosts, the system can enhance energy efficiency while maintaining the agreed quality of service (QoS) for users. The key contributions of this paper are summarized as follows:
  • Dynamic resource optimization: A novel integration of DNN and simulated annealing for dynamically balancing API requests and resource utilization across active cloud hosts.
  • Cost reduction mechanism: Demonstrated significant server cost reduction through the RAP-Optimizer, leading to improved profit margins and reduced energy consumption.
  • Multi-stage optimization workflow: Introduction of a multi-stage workflow utilizing the RAN algorithm for comprehensive resource analysis, ensuring effective redistribution of workloads across physical and virtual machines.
  • Handling overfitting with DDC: An innovative dynamic dropout control (DDC) algorithm integrated into the DNN to overcome overfitting during model training and enhance prediction accuracy.
  • Revenue margin increase: The proposed system improved profit margins by 179% over 12 months, increasing the average profit margin from USD 600 to USD 1675.
The remainder of this paper is organized as follows. The literature review is presented in Section 2. Section 3 provides an in-depth problem analysis, identifying the challenges that led to the development of the RAP-Optimizer. Section 4 outlines the methodology, describing the integration of DNN, simulated annealing, and the RAN algorithm, along with the dataset preparation and feature normalization steps. Section 5 presents the experimental results and evaluations, including the performance analysis of the proposed system and its ability to optimize cloud resource usage. Section 6 discusses the limitations and potential future improvements for the RAP-Optimizer. Finally, Section 7 concludes the paper, summarizing the key findings and contributions.

2. Literature Review

Resource optimization in cloud computing is a widely researched topic, particularly in the context of infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS). However, there remains a significant gap in addressing optimization strategies specifically tailored for AI-as-a-Service (AIaaS) models, which are characterized by fluctuating workloads and high computational demands [7]. This section reviews recent studies on cloud resource optimization, workload balancing, and overfitting issues in deep learning models, identifying gaps that the proposed RAP-Optimizer aims to address.

2.1. Cloud Resource Optimization

According to the survey conducted by Mohammadzadeh et al. [8], resource optimization for traditional cloud services, such as virtual machine (VM) allocation and CPU/RAM resource distribution, is the predominant field of research in cloud resource optimization. AIaaS differs from traditional cloud services in that it requires real-time, scalable computation [9]. Furthermore, most solutions, like the hill-climbing (HC) algorithm, operate reactively and often activate new physical hosts without fully utilizing existing resources, resulting in higher operational costs [10]. The proposed RAP-Optimizer addresses this gap by integrating a DNN with simulated annealing [6] to dynamically allocate resources based on real-time workloads and optimize resource utilization.

2.2. Workload Balancing and API Request Handling

Numerous studies have examined workload balancing in cloud environments [11,12,13]. However, these approaches generally focus on static or semi-dynamic strategies that do not adapt quickly to the rapidly changing workload patterns seen in AIaaS platforms. For example, Kumar et al. [14] presented an approach for balancing workloads across cloud servers but did not consider the possibility of reducing the number of active hosts when resources are underutilized. Additionally, ref. [15] emphasized load distribution based on CPU and memory but did not consider network bandwidth and other critical factors such as disk I/O [16]. In contrast, the RAP-Optimizer efficiently reallocates API requests by utilizing fewer physical hosts while ensuring maximum CPU and memory utilization, thus overcoming the limitations of static workload balancing methods.

2.3. Overfitting in Deep Neural Networks

Handling overfitting in DNNs is an active field of research with numerous regularization techniques [17]. According to Alnagashi et al. [18], dropout is an effective way to mitigate the overfitting issue. A review conducted by Salehin et al. [19] on different dropout techniques reveals that most approaches use a fixed dropout rate. The literature shows that while dropout can be effective, a fixed rate is not adaptive to different layers of the network, leading to either underfitting or overfitting in complex applications [20]. This paper fills this gap by introducing the dynamic dropout control (DDC) algorithm, which dynamically adjusts the dropout rate layer by layer, reducing the overfitting issue without compromising the model’s predictive performance.

2.4. Energy-Efficient Cloud Systems

The optimization of energy consumption in cloud data centers has been explored in various studies [21,22,23]. The primary focus of these is to reduce energy consumption by consolidating VMs or activating power-saving modes on underutilized hosts [24]. However, the existing methods primarily focus on reducing the number of active physical hosts without considering the trade-off between resource optimization and maintaining service quality [25]. The proposed RAP-Optimizer not only reduces the number of active hosts by an average of five per day, but it also improves resource utilization, resulting in substantial energy savings without degrading the quality of service (QoS).

2.5. Revenue Impact and Cost Optimization

Revenue impact and corresponding cost optimization is an under-explored field of research for AIaaS [26]. Multiple studies revolve around traditional cloud services [27,28,29]. However, these studies do not account for the dynamic and unpredictable nature of AIaaS platforms, where the operational cost can escalate quickly due to inefficient resource management [30]. The RAP-Optimizer directly addresses this gap by significantly reducing server costs—by 45% on average—while maintaining consistent service delivery. This allows AIaaS platforms to stabilize their profit margins over time, which is often overlooked in traditional cloud optimization research.

2.6. Comparison with Existing Technologies

Current cloud resource optimization techniques, including conventional autoscaling [31] and rule-based allocation models [32], offer reliable performance in predictable workloads but struggle in complex AI-as-a-service (AIaaS) applications where demand can be highly variable and unpredictable. Many approaches activate additional resources reactively without assessing overall resource utilization, leading to increased operational costs and underutilized resources [33].
In contrast, the proposed RAP-Optimizer is uniquely equipped to handle such dynamic conditions by integrating a deep neural network (DNN) for predictive VM configuration alongside the simulated annealing algorithm, which efficiently minimizes active host usage. Additionally, the dynamic dropout control (DDC) mechanism prevents overfitting during model training, ensuring accurate, real-time predictions even as workloads fluctuate. This combination allows RAP-Optimizer to consolidate workloads onto fewer hosts, dynamically redistribute API requests, and deactivate idle hosts, achieving up to 45% reduction in server costs, as shown in our experimental analysis. These attributes demonstrate RAP-Optimizer’s ability to maintain high resource utilization and reduce costs effectively in challenging AIaaS scenarios. The summary of the comparison of the proposed system with existing technology is listed in Table 1.

3. Problem Analysis and Objective

This paper developed a solution for an AI-driven image enhancement web application. It follows the AI-as-a-service model with a pay-as-you-go billing method. The initial margin between server cost and the overall subscription fee of the application was comfortable, as illustrated in Figure 1, which shows the relationship among server cost, return, and revenue margin over a 12-month period. However, as the application grew in popularity, the margin narrowed, and the application’s revenue flow started impacting the business model’s overall sustainability. The reason behind this fall in revenue is the increase in the number of active hosts. Further investigation shows that the computing resources in the active hosts are not fully utilized. Although there is scope for handling more API calls with the existing hosts, the system activates new hosts, and the operational cost increases as a result. The challenge is identifying the under-utilized host servers, allocating the new requests to these hosts, and reducing the number of active hosts. In this way, the operational cost can be minimized, keeping the profit margin at a sustainable level. The proposed methodology was developed to achieve this objective.

4. Methodology

The overview of the proposed methodology is illustrated in Figure 2. It starts with dataset processing, constructed from the experiment’s AIaaS application log files. Later, a well-optimized DNN architecture is developed to predict the appropriate configuration for a service request. After that, an innovative approach of reducing the overfitting effect and shifting to balance learning is incorporated into the network during the training process. The trained DNN was used to develop the RAP-Optimizer, which consists of a resource analysis module, resource space landscape, and a novel deep-annealing algorithm.

4.1. Application-Specific Adaptability and Resource Requirements

The RAP-Optimizer is designed to dynamically adapt to the characteristics of the AIaaS application it manages, with specific consideration given to image size, processing complexity, and frequency of API requests. In this study, the AIaaS application performs image enhancement, which demands substantial processing power depending on the size and quality of the images processed. For example, larger images or those requiring high-resolution outputs necessitate more virtual CPU cores and memory allocation due to increased computational demands.
The system determines the resource allocation based on these application-specific parameters. For instance, when processing high-resolution images, the DNN model predicts a configuration with additional CPU and memory resources for the corresponding VMs. This ensures the required computational power is available without over-provisioning resources. Additionally, the RAP-Optimizer monitors ongoing resource utilization to adjust allocations as image processing demands fluctuate, preventing unnecessary host activations and optimizing the allocation of VMs based on workload intensity.

4.2. Dataset Preparation

This study uses a unique dataset prepared from the AIaaS application log, and a sample of the dataset is presented in Table 2. The application maintains three types of log files: the activity log (Log_ac), the system log (Log_st), and the application log (Log_ap). The relationships among these logs are illustrated in Figure 3. The Log_ac keeps track of user interaction, Log_st is responsible for keeping track of computational resource usage, and Log_ap is dedicated to application-related data. All data from these logs are converted into comma-separated value (CSV) format [34]. After that, the instances are categorized into five classes, as presented in Table 2. The Pearson correlation coefficient (PCC) score, calculated using Equation (1), was used to identify the relevant features that have a strong correlation with the five classes [35].
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
In Equation (1), X_i represents the individual feature values, Y_i stands for the individual target variable, X̄ denotes the mean of the features, Ȳ denotes the mean of the target variable, and n denotes the total number of data points. It generates the linear correlation score (r) between X and Y with a range from −1 to +1, where the former represents a perfect negative correlation and the latter denotes a perfect positive correlation.
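For illustration, the following Python sketch shows how such a PCC-based feature screening could be performed with pandas. The column names, the integer encoding of the target classes, and the selection cutoff of |r| ≥ 0.3 are assumptions made for the sketch, not values reported in this study.

```python
# Illustrative PCC-based feature screening (Equation (1)).
# Column names, target encoding, and the 0.3 cutoff are assumed.
import pandas as pd

df = pd.read_csv("aiaas_logs.csv")  # hypothetical merged log export
target = df["cloud_configuration"].astype("category").cat.codes

features = ["peak_frequency", "active_time_hours", "api_initiation_count",
            "service_requests", "virtual_cpu", "virtual_ram_gb",
            "virtual_disk_gb", "energy_usage_wh"]
scores = {col: df[col].corr(target, method="pearson") for col in features}

# Keep features whose |r| suggests a meaningful linear relationship.
selected = [f for f, r in scores.items() if abs(r) >= 0.3]
print(sorted(scores.items(), key=lambda kv: -abs(kv[1])))
```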

4.2.1. Dataset Description

The dataset contains records of the experimental application for 365 days. After cleaning, there are 142,784 instances in the dataset. The feature variables are peak frequency, active time (hours), API initiation count, service requests, virtual CPU, virtual RAM (GB), virtual disk (GB), energy usage (Wh), and cloud configuration. Except for the target variable, all features are numerical. The target variable is categorical, with five categories.

4.2.2. Dataset Cleaning

The CSV file constructed from Log_ac, Log_st, and Log_ap contains numerous incomplete rows. These rows were created for multiple reasons, including incomplete service requests, API initialization failures, and network issues. In addition, there are multiple duplicate rows. The uncleaned dataset has 163,150 instances. This dataset was cleaned by following the mathematical principle defined in Equation (2).
C^* = \{\, y \in C \mid \forall z_j \in Z,\; y_j \neq \text{null} \,\}
In Equation (2), C represents the uncleaned dataset. It contains n records, where C = {y_1, y_2, …, y_n} and each y_i corresponds to the i-th observation vector across all variables Z = {z_1, z_2, …, z_p}. The cleaned dataset C^* includes only those observations from C for which no variable in the observation vector is missing, denoted by “null”. Additionally, outliers from C, determined using the mean μ_c and standard deviation σ_c, are eliminated based on the rule in Equation (3) [36]. A data point y ∈ C is classified as an outlier if it satisfies y < μ_c − b·σ_c or y > μ_c + b·σ_c, where b is a constant set to 3 according to the empirical 68-95-99.7 rule. The result is a dataset C^* with no outliers [37].
C^* = \{\, y \in C \mid \mu_c - b \cdot \sigma_c \le y \le \mu_c + b \cdot \sigma_c \,\}
To clarify, in this context, C consistently denotes the uncleaned dataset with missing values and outliers, while C^* represents the cleaned dataset that is devoid of any missing values or outliers. This distinction between C and C^* helps to clearly explain the process of dataset preparation for the training phase. Although this cleaning process reduces the number of entries, the final count of 142,784 instances after cleaning is sufficient to train a deep neural network (DNN) to classify the target variable using the input features.
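A minimal pandas sketch of the cleaning rules in Equations (2) and (3) is shown below; the function signature and the column list passed to it are assumptions made for illustration.

```python
# Sketch of the cleaning rules in Equations (2) and (3): drop rows with
# missing values and duplicates, then remove rows outside mu ± b*sigma
# for every numeric feature (b = 3, per the 68-95-99.7 rule).
import pandas as pd

def clean(df: pd.DataFrame, numeric_cols, b: float = 3.0) -> pd.DataFrame:
    df = df.dropna().drop_duplicates()            # Equation (2): no nulls
    mask = pd.Series(True, index=df.index)
    for col in numeric_cols:
        mu, sigma = df[col].mean(), df[col].std()
        mask &= df[col].between(mu - b * sigma, mu + b * sigma)  # Equation (3)
    return df[mask]
```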

4.2.3. Feature Normalization

The feature variables exhibit a wide range of variations in the dataset, so feature normalization is essential [38]. Z-score normalization was chosen to normalize the features because it effectively handles data with varying scales and distributions. Figure 4 illustrates the difference in data range before and after performing the normalization. The Z-score normalization process for a feature x i is defined in Equation (4) [39].
x_i' = \frac{x_i - \mu_x}{\sigma_x}
In Equation (4), x_i denotes the original feature value, μ_x denotes the mean of the feature x, σ_x denotes the standard deviation of the feature x, and x_i' denotes the normalized feature value. Here, i ranges from 1 to 142,784. After applying Z-score normalization, the feature values are transformed such that each feature has a mean of zero and a standard deviation of one, ensuring all features contribute equally during model training [40].
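Continuing the pandas sketch above, Z-score normalization (Equation (4)) can be applied column-wise as follows; this is an illustrative implementation rather than the exact pipeline used in the experiments.

```python
# Z-score normalization (Equation (4)), applied per feature column.
def z_score(df, numeric_cols):
    out = df.copy()
    for col in numeric_cols:
        out[col] = (df[col] - df[col].mean()) / df[col].std()
    return out
```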

4.2.4. Dataset Splitting

Ullah et al. [41] conducted a systematic review of machine learning (ML) applications, showing that most state-of-the-art approaches use a 70:15:15 dataset splitting ratio for training, testing, and validation datasets, respectively. The same ratio was used in this study. There are a total of 142,784 instances in the cleaned dataset. At the 70:15:15 ratio, there are 99,948 instances for training, 21,417 instances for testing, and the same number for validation.
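The 70:15:15 split can be reproduced, for example, with two successive scikit-learn splits, as sketched below; here clean_df is assumed to be the cleaned dataset from the earlier sketch, and the stratification column and random seed are illustrative assumptions.

```python
# Sketch: 70/15/15 train/test/validation split of the cleaned dataset.
from sklearn.model_selection import train_test_split

train_df, rest_df = train_test_split(
    clean_df, test_size=0.30,                      # 70% train
    stratify=clean_df["cloud_configuration"], random_state=42)
test_df, val_df = train_test_split(
    rest_df, test_size=0.50,                       # 15% test, 15% validation
    stratify=rest_df["cloud_configuration"], random_state=42)
# 142,784 rows -> roughly 99,948 / 21,417 / 21,417 instances
```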

4.2.5. Dataset Characteristics and Generalizability

The dataset used in this study originates from an AI-driven image enhancement web application operating under the AI-as-a-service (AIaaS) model, so it represents real-world characteristics. The application from which the data were collected processes a high volume of real-time API requests, and resource demands fluctuate depending on the type of processing; user activity and workload intensity have a significant impact on this fluctuation. The real-world workload patterns reflected in the dataset cover both regular and peak activities, which are essential for rapid resource scaling. The dataset contains the consumption history of computing resources, including CPU, memory, and network resource allocation, which is typical of many AIaaS applications. Resource utilization needs are also diverse, with some processes intermittently requiring high computational power. These characteristics are common across various AIaaS applications and are essential for image enhancement services in particular. Considering all of this, the dataset prepared and utilized in this experiment supports generalization when applied to AIaaS application resource optimization.

4.3. Network Architecture

A deep neural network (DNN) was designed as the resource predictor of the proposed RAP-Optimizer. It is a six-layer, fully connected network, illustrated in Figure 5. There are a total of four hidden layers; including the input and output layers, the overall network has six layers. The input layer accepts an 8-dimensional feature vector defined as [x_1, x_2, …, x_8]. The hidden layers in Figure 5 are denoted by h_1, h_2, h_3, and h_4. Each hidden layer has 32 neurons, which allows the network to capture complex representations of the data. The network architecture was carefully designed to predict the cloud configuration requirements when AIaaS applications are initiated. The optimal dropout rate plays a crucial role in ensuring the reliable performance of the network, which is why an innovative dynamic dropout control (DDC) algorithm was developed in this study. It was designed and integrated with the network to adjust the dropout rate of each layer dynamically for optimal unbiased performance. The working principle of the input layer is expressed by Equation (5); the input layer simply transposes the input vector [42].
\xi = [\xi_1, \xi_2, \ldots, \xi_\eta]^T
The input layer, after transposing the feature vector, passes it to the hidden layers. The input vector, expressed as χ^[l−1], is processed by the hidden layers using a weight matrix Ω^[l], a bias vector ζ^[l], and the rectified linear unit (ReLU) activation function, denoted as φ(·) and described in Equation (7). For layer l, the transformation is formulated in Equation (6), where l = 1, 2, …, 5, and the dimensions of the weight matrices and bias vectors are Ω^[l] ∈ R^{32×8} and ζ^[l] ∈ R^{32}, respectively [43].
\chi^{[l]} = \varphi\left(\Omega^{[l]} \chi^{[l-1]} + \zeta^{[l]}\right)
\varphi(\xi) = \begin{cases} \xi & \text{if } \xi > 0, \\ 0 & \text{otherwise.} \end{cases}
The dataset has five target classes, and the output layer of the network was prepared to predict those classes; that is why it has five nodes, each representing one of the five target classes. The activation function used in this layer is the Softmax function, defined in Equation (8). The only role of this activation function is to convert the raw output values into probabilities. Given an input ξ, the logits ς_i are obtained through a linear transformation, followed by the Softmax function to compute the probabilities υ_i for each class i, where i = 1, …, 5 [44].
\upsilon_i = \frac{\exp(\varsigma_i)}{\sum_{j=1}^{5} \exp(\varsigma_j)}
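The following PyTorch sketch illustrates one way to realize this six-layer architecture. The framework choice, the dropout placement, and the initial dropout rate are assumptions for illustration; Softmax (Equation (8)) is folded into the loss during training and applied explicitly only at inference.

```python
# Illustrative PyTorch realization of the described network: 8 input
# features, four hidden layers of 32 ReLU units with dropout, 5 outputs.
import torch.nn as nn

class ResourcePredictor(nn.Module):
    def __init__(self, p_drop: float = 0.2):
        super().__init__()
        layers, width_in = [], 8
        for _ in range(4):                         # hidden layers h1..h4
            layers += [nn.Linear(width_in, 32), nn.ReLU(), nn.Dropout(p_drop)]
            width_in = 32
        layers.append(nn.Linear(32, 5))            # logits for the 5 classes
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Raw logits; nn.CrossEntropyLoss applies Softmax (Equation (8)).
        return self.net(x)
```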

Optimal VM Configuration Predicted by DNN

The main purpose of the DNN designed in this section is to work as a VM configuration predictor. It is integrated with the RAP-Optimizer, and it predicts the optimal configuration for virtual machines (VMs) by analyzing real-time data on system load and resource availability. Multiple parameters participate in the prediction. These parameters are CPU cores, memory, storage, and expected API request load. The DNN was trained on these parameters as well. As a result, it is capable of predicting the VM configurations for a certain combination of these parameters. The primary goal of the DNN is to predict the configuration to minimize idle time and maximize resource utilization on active hosts. By accurately forecasting the configuration needed for each VM, the DNN prevents unnecessary resource allocation. As a result, each VM is provisioned with the optimal amount of resources for its workload.

4.4. Training the Network

The network is trained using the backpropagation algorithm [45]. During the forward pass, input data ξ propagate through the network, and each layer l calculates its output χ^[l] as a function of the input from the previous layer χ^[l−1], employing its weight matrix Ω^[l], bias vector ζ^[l], and activation function σ^[l]. This process continues through all layers until the output prediction υ̂ is made. The prediction is then compared to the true labels υ using a loss function, defined in Equation (9), which measures the difference between predicted and actual values [46]. In this equation, Γ represents the number of output classes, υ_i denotes the true label, and υ̂_i denotes the predicted probability for class i [47].
L(\upsilon, \hat{\upsilon}) = -\sum_{i=1}^{\Gamma} \upsilon_i \log(\hat{\upsilon}_i)

4.4.1. Learning Algorithm

The adaptive moment estimation (ADAM) optimizer is used to update the weights of the network’s hidden nodes. Initially, the moment vectors μ_0 and ν_0 are set to zero, and the time step τ is initialized to zero. The learning rate is represented by α, and β_1 and β_2 are the decay rates for the moment estimates, initialized to 0.90. At each time step τ, the gradients ∇_θ J(θ) with respect to the parameters θ are calculated. The updates for μ_τ and ν_τ are defined by Equations (10) and (11) [48].
\mu_\tau = \beta_1 \mu_{\tau-1} + (1 - \beta_1)\, \nabla_\theta J(\theta)
\nu_\tau = \beta_2 \nu_{\tau-1} + (1 - \beta_2)\, \left(\nabla_\theta J(\theta)\right)^2
The bias-corrected estimates for μ τ and ν τ are given by Equations (12) and (13).
\hat{\mu}_\tau = \frac{\mu_\tau}{1 - \beta_1^\tau}
\hat{\nu}_\tau = \frac{\nu_\tau}{1 - \beta_2^\tau}
Finally, the weights ω are updated using Equation (14), where ϵ is a small constant for numerical stability.
\omega = \omega - \alpha\, \frac{\hat{\mu}_\tau}{\sqrt{\hat{\nu}_\tau} + \epsilon}
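A compact NumPy sketch of a single ADAM step, following Equations (10)-(14) with the decay rates stated above, is given below; the learning rate and epsilon are illustrative defaults.

```python
# One ADAM update step (Equations (10)-(14)); beta1 = beta2 = 0.90 as above.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.90, b2=0.90, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                   # Equation (10)
    v = b2 * v + (1 - b2) * grad ** 2              # Equation (11)
    m_hat = m / (1 - b1 ** t)                      # Equation (12)
    v_hat = v / (1 - b2 ** t)                      # Equation (13)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)    # Equation (14)
    return w, m, v
```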

4.4.2. Learning Curve

The training dataset contains 99,948 instances. To enhance training efficiency, mini-batches of size 64 are utilized. The learning curve, depicting the progress of both accuracy and loss for training and validation, is presented in Figure 6. It showcases the network’s performance over 50 epochs with 111,550 iterations, taking approximately 8 h and 27 min to complete. The validation accuracy achieved is 97.48%, with a validation loss of 2.82%, while the training accuracy and loss are 95.15% and 4.85%, respectively.

4.4.3. DDC Algorithm

During the development phase of the RAP-Optimizer, it was observed that the network overfits after training. The number of hidden layers and the number of nodes in the existing layers were reduced to minimize overfitting; after this modification, the network exhibited underfitting characteristics. Table 3 shows the different configurations explored during development and the characteristics of the network along with other parameters. To ensure balanced fitting, the innovative DDC algorithm, presented as Algorithm 1, was developed and integrated into the network. It gradually discovers the optimal dropout rate for the different hidden layers and balances an overfitting network.
The initial dropout rate p_0 and the threshold δ were chosen based on empirical testing. Starting with a moderate value for p_0, set to 0.2, provided a balanced starting point that allowed sufficient dropout without a sharp reduction in performance. This value was selected after trials indicated that lower dropout rates (e.g., below 0.1) had limited impact on overfitting, while higher rates (e.g., above 0.3) led to underfitting. The threshold δ, which determines the acceptable accuracy gap between training and validation, was set based on performance analysis; a value of 0.02 (or 2%) was found effective in preventing overfitting without sacrificing validation performance. These parameters are adjustable to account for specific data characteristics and can be tuned further as required by different workloads or applications.
Algorithm 1 Dynamic dropout control (DDC) algorithm
1: Input: Trained network N, training set X_train, validation set X_val, initial dropout rate p_0, number of layers L, dropout increment Δp, threshold δ
2: Output: Modified network with minimized overfitting
3: Initialize p ← p_0
4: Compute initial accuracies Acc_train, Acc_val
5: Δ ← Acc_train − Acc_val
6: while Δ > δ do
7:     for l = 1 to L do
8:         if first iteration then
9:             h^[l] ← Dropout(h^[l], p)
10:        else
11:            Increase dropout: p ← p + Δp
12:            h^[l] ← Dropout(h^[l], p)
13:        end if
14:    end for
15:    Recompute Acc_train, Acc_val
16:    Update Δ ← Acc_train − Acc_val
17: end while
18: Return N with reduced overfitting
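A Python sketch of how Algorithm 1 could be applied to the PyTorch model sketched in Section 4.3 is shown below. For brevity it raises a single shared dropout rate for all nn.Dropout modules, whereas Algorithm 1 adjusts the rate layer by layer; the evaluate callback and the p_max guard are assumptions added for the sketch.

```python
# Sketch of the DDC loop (Algorithm 1): raise dropout until the
# train/validation accuracy gap falls below the threshold delta.
import torch.nn as nn

def dynamic_dropout_control(model, evaluate, p0=0.2, dp=0.05,
                            delta=0.02, p_max=0.5):
    # `evaluate` is an assumed callback that retrains/fine-tunes the model
    # with its current dropout rates and returns (train_acc, val_acc).
    p = p0
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = p
    train_acc, val_acc = evaluate(model)
    while (train_acc - val_acc) > delta and p < p_max:
        p += dp                                    # increase dropout rate
        for m in model.modules():
            if isinstance(m, nn.Dropout):
                m.p = p
        train_acc, val_acc = evaluate(model)
    return model
```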

4.4.4. Achieving Optimal Unbiased Performance with DDC

The dynamic dropout control (DDC) algorithm is designed to maintain an optimal unbiased performance by dynamically adjusting dropout rates to mitigate overfitting or underfitting. Unbiased performance is defined by balanced training and validation accuracy, where the gap between these accuracies remains within 2–3%. Overfitting is indicated when training accuracy significantly exceeds validation accuracy, while underfitting is evident when both accuracies are low. The DDC algorithm progressively adjusts dropout rates for each layer, reducing the gap between training and validation accuracy and achieving a stable performance level where neither overfitting nor underfitting dominates.

4.4.5. VM Resource Requirements for AIaaS Workloads

The AIaaS application’s resource indicators are determined by the processing demands of each incoming task. For image enhancement applications, the system assesses metrics such as image size (in pixels) and required processing time to establish optimal VM configurations. VMs with varying CPU, memory, and storage allocations are configured to meet these processing requirements efficiently. For instance, tasks involving high-resolution images may require up to 8 CPU cores and 64 GB of memory, while lower-resolution tasks may only require 2 cores and 16 GB of memory. This dynamic configuration ensures that the RAP-Optimizer adapts VM allocations according to workload intensity, promoting efficient resource usage even as demand fluctuates.
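As a concrete illustration of this mapping, the sketch below translates image size into a VM configuration tier; the pixel thresholds and the middle tier are hypothetical, and only the 8-core/64 GB and 2-core/16 GB tiers come from the description above.

```python
# Hypothetical mapping from task size to VM configuration tier.
def vm_config_for(image_pixels: int) -> dict:
    if image_pixels > 8_000_000:        # e.g., high-resolution (> 8 MP)
        return {"vcpu": 8, "ram_gb": 64}
    if image_pixels > 2_000_000:        # mid-sized tasks (assumed tier)
        return {"vcpu": 4, "ram_gb": 32}
    return {"vcpu": 2, "ram_gb": 16}    # low-resolution tasks
```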

4.5. RAP-Optimizer

The proposed RAP-Optimizer is an innovative approach that starts with resource analysis. After that, the resource space landscape is generated to evaluate the resource distribution. Finally, it combines the DNN with simulated annealing optimization to optimize cloud resources without compromising service quality.

4.5.1. Resource Analysis

The RAP-Optimizer starts with resource analysis, performed using the resource analyzer (RAN) algorithm, presented as Algorithm 2 and developed in this study. The RAN algorithm requires access and scanning permission for both virtual machines (VMs) and their physical hosts. Initially, it scans the entire system, identifies the active VMs through their unique IDs (VIDs), retrieves their current configurations, and fetches their resource consumption history. The resource consumption history is stored in the VM log, which is fetched by the res_hist(vm) function. After that, it explores the physical hosts supporting the VMs and performs similar operations on them. Finally, it scans the idle physical hosts. The RAN algorithm maintains a resource status table (RT), which is frequently updated. Based on the resource consumption, it generates a resource space (RS) represented by bar graphs.
Algorithm 2 Resource analyzer (RAN) algorithm
1: procedure RAN_Optimizer
2:     Input: Access permissions for VMs and physical hosts
3:     Output: RS, updated RT
4:     Initialize: RT ← Init(RoutingTable)
5:     Scan system and fetch VMs and Hosts
6:     for vm ∈ VMs do                        ▹ For each VM
7:         Identify active VMs: vm ← ID(vm)
8:         Get VM configurations: cfg(vm)
9:         Fetch resource history: res_hist(vm)
10:    end for
11:    for host ∈ Hosts do                    ▹ For each host
12:        Get host configurations: cfg(host)
13:        Fetch resource history: res_hist(host)
14:    end for
15:    Scan idle hosts: idle_hosts ← Scan(Hosts)
16:    Update: RT ← Update(cfg, res_hist)
17:    Generate: RS(VID, Resource)
18:    return RS, RT                          ▹ Return both Resource Space and Routing Table
19: end procedure
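The following Python sketch mirrors the RAN pass in Algorithm 2. The cloud client object and its methods (list_vms(), list_hosts(), config(), resource_history(), is_idle()) are placeholders for whatever data-center API is available, not a specific provider interface.

```python
# Sketch of the RAN pass (Algorithm 2) over a placeholder cloud client.
def run_ran(cloud):
    resource_table = {}
    for vm in cloud.list_vms():                       # active VMs
        resource_table[vm.vid] = {
            "config": cloud.config(vm),
            "history": cloud.resource_history(vm),
            "host": vm.host_id,
        }
    host_table = {
        h.hid: {"config": cloud.config(h),
                "history": cloud.resource_history(h),
                "idle": cloud.is_idle(h)}
        for h in cloud.list_hosts()
    }
    # Resource space: latest utilization snapshot per host for the optimizer.
    resource_space = {hid: (info["history"][-1] if info["history"] else None)
                      for hid, info in host_table.items()}
    return resource_space, resource_table
```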

4.5.2. Resource Space Landscape (RSL)

The RAN algorithm returns the RS and RT, where the VM configurations, current resource consumption rates, IDs of the VMs (VIDs), corresponding physical hosts, and other relevant data are available. These data are used to prepare the RSL, showing the active VMs in the physical hosts. The RSL of a random instance is illustrated in Figure 7. The data center offering the AIaaS consists of 16 physical hosts in each server rack, and the experimental environment has 3 server racks. According to the organization’s policy, a maximum of 10 VMs are allowed concurrently on a single host to maintain the terms of service (TOS). Each physical host is powered by a 10-core CPU with 128 GB of primary memory.

4.5.3. Deep-Annealing Algorithm

The RAP-Optimizer combines the DNN presented in Section 4.3 with the simulated annealing algorithm; the combined procedure is named deep-annealing and presented in Algorithm 3. The process starts by connecting to the AIaaS Data Center (AIDC) with permission to access the status of all physical and virtual devices. It applies the RAN algorithm to retrieve the resource space (RS) and resource table (RT), which include the VM configurations, current resource usage, and corresponding physical hosts. These outputs provide a comprehensive view of the system’s resource space (RSS). After that, the deep-annealing algorithm utilizes the DNN to predict the VM configuration and respond to the request for subsequent AI services. The deep-annealing algorithm identifies hosts for VM deployment. It explores potential hosts iteratively, evaluating the cost and objective (obj) functions to select a suitable host. The cost function seeks the lowest possible value, whereas the role of the obj function is to find the highest possible value on the resource space. The process begins at a high-temperature state, allowing probabilistic acceptance of less optimal solutions to escape local minima. The HasCapacity() function is used to check whether the current host has sufficient resources for the new VM whose configuration was predicted by the DNN. When sufficient resources are available, the CreateInstance() function is used to deploy the VM. It may happen that the current host does not have enough resources; in this case, the algorithm uses the NextHost() function, which draws from a probability distribution influenced by the current temperature. This process allows a broader exploration of the available configurations.
The proposed method, the RAP-Optimizer, is called periodically at an interval of 15 min. Every time it is called, it monitors the resources and optimizes them. The user pattern shows that the average active period of the users is 15 min, which is why this interval was chosen. This interval ensures that the proposed system maintains real-time adaptation to changing workloads while avoiding excessive computational overhead. Each time the RAP-Optimizer is initialized, it may call the deep-annealing method, depending on the need for resource redistribution. The RAP-Optimizer performs an initial resource analysis, which identifies the underutilized or overloaded hosts. When resource optimization is necessary, the RAP-Optimizer invokes the deep-annealing method, which adjusts the allocation of virtual machines across hosts. As a result, the system minimizes the number of active hosts while maintaining performance requirements.
Algorithm 3 Deep-annealing algorithm
1: AIDC ← Connect(credentials)
2: (RS, RT) ← RAN_Analyzer(AIDC)
3: Ĉ_VM ← DNN(RS, RT)
4: RSS ← RetrieveResourceSpace(RS, RT)
5: Initialize temperature T
6: H_start ← SimulatedAnnealing(RSS, obj, cost, T)
7: while ¬InstanceCreated and T > T_min do
8:     if HasCapacity(H_start, Ĉ_VM) then
9:         CreateInstance(H_start, VM)
10:        InstanceCreated ← True
11:    else
12:        H_start ← NextHost(H_start, RSS, T)
13:        Decrease temperature T ← αT
14:    end if
15: end while
16: for vm ∈ AIDC do
17:    if CanMigrate(vm, H_start) then
18:        Migrate(vm, H_start)
19:    end if
20: end for
If the initially selected host lacks sufficient resources for the VM deployment, NextHost() identifies alternative hosts based on a probability distribution influenced by the current temperature T. This ensures the search process avoids local optima and explores a wider range of configurations. As the temperature gradually decreases with each iteration by a factor α, the exploration process narrows down to focus on optimal hosts for VM deployment. Once a suitable host is found and the VM instance is deployed, the algorithm evaluates potential migrations for existing VMs to optimize overall resource usage. The function CanMigrate() checks whether a VM can be moved to a host that offers better resource efficiency, and if migration is feasible, the Migrate() function performs the transfer. The CanMigrate() function evaluates both the feasibility of migrating a VM and the suitability of potential target hosts. This function checks migration feasibility based on the current resource utilization, performance requirements, and network stability. If a VM’s migration is feasible without disrupting service quality, the function proceeds to identify a suitable host. The target host is selected based on its available CPU, memory, and network capacity, ensuring it meets the requirements of the migrating VM. Additionally, the CanMigrate() function prioritizes hosts with higher resource availability and lower current workloads to maximize efficiency and maintain balanced resource utilization across the data center. The Migrate() function initiates VM migration to optimize resource allocation. To ensure system stability and avoid overloading resources, a limit is applied to the number of simultaneous VM migrations. This cap allows the RAP-Optimizer to manage migrations efficiently, preventing potential performance degradation caused by excessive concurrent migrations. This process ultimately aims to consolidate workloads, improve resource utilization, and reduce the operational costs associated with idle hosts.
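To make the control flow concrete, the condensed Python sketch below captures the annealing loop of Algorithm 3 for placing one DNN-predicted VM. The host objects, the has_capacity and deploy callbacks, the load attribute, and the temperature schedule values are assumptions made for illustration.

```python
# Condensed sketch of Algorithm 3: simulated-annealing host selection
# for a VM whose configuration was predicted by the DNN.
import math
import random

def deep_annealing_place(hosts, vm_cfg, has_capacity, deploy,
                         T=1.0, T_min=0.01, alpha=0.9):
    current = random.choice(hosts)
    while T > T_min:
        if has_capacity(current, vm_cfg):
            deploy(current, vm_cfg)                # CreateInstance()
            return current
        # NextHost(): propose another host; accept a more loaded candidate
        # with a temperature-dependent probability to escape local optima.
        candidate = random.choice(hosts)
        delta = candidate.load - current.load
        if delta < 0 or random.random() < math.exp(-delta / T):
            current = candidate
        T *= alpha                                 # cool down
    return None                                    # no feasible placement
```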

4.5.4. Criteria for Optimal Resource Usage

The RAP-Optimizer establishes optimal cloud resource usage by evaluating key metrics such as CPU, memory, and bandwidth utilization across active hosts. Optimal usage is achieved when resources are maximally utilized without exceeding predefined thresholds that could lead to performance degradation or resource contention. Specifically, CPU and memory utilization levels above 85% but below 95% are considered optimal, ensuring sufficient headroom for unexpected loads while maximizing efficiency. Bandwidth utilization is similarly monitored, with an upper threshold set to prevent network bottlenecks. The RAP-Optimizer dynamically manages these resources to maintain this balance, activating additional hosts only when existing hosts reach their optimal utilization limits.
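An illustrative activation rule following this policy is sketched below; the attribute names and the decision to require every active host to reach the upper bound are assumptions.

```python
# Illustrative host-activation rule for the 85-95% utilization band:
# activate an additional host only when every active host has reached
# the upper edge of the optimal band on CPU or memory.
def needs_new_host(active_hosts, upper=0.95):
    return all(h.cpu_util >= upper or h.mem_util >= upper
               for h in active_hosts)
```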

4.6. Mobility-Aware Resource Allocation

To optimize routing costs and enhance the efficiency of RAP-Optimizer in resource-limited networks, we introduce a mobility-aware component to the resource allocation process. This enhancement considers not only the current utility of host servers but also their physical proximity to service requesters. Inspired by recent advancements in mobility-aware routing, as discussed in [49], this modification enables RAP-Optimizer to prioritize host servers that are geographically closer to service requesters, thereby reducing the overall routing costs.
Specifically, this approach calculates the distance-based routing cost R_c using Equation (15), where d_ij represents the distance between the service requester i and the host server j, and w is a weighting factor that adjusts the impact of distance on the overall resource allocation cost:
R_c = w \times d_{ij}
During the request allocation phase, the RAP-Optimizer now evaluates both resource availability and routing cost, selecting host servers with low utilization that are also in close proximity to the requester, thereby minimizing latency and operational costs. This adjustment ensures that the RAP-Optimizer remains effective even under resource constraints, maintaining performance gains while addressing network distance considerations.
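A minimal sketch of this mobility-aware selection step is given below, combining host utilization with the routing cost of Equation (15); the scoring function, the weight value, and the host attributes are illustrative assumptions.

```python
# Sketch: pick the host with the best combination of low utilization and
# low distance-based routing cost (Equation (15)).
def select_host(hosts, requester_location, distance, w=0.5):
    def score(host):
        routing_cost = w * distance(requester_location, host.location)  # Eq. (15)
        return host.utilization + routing_cost   # favor low-load, nearby hosts
    return min(hosts, key=score)
```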

5. Experimental Result and Evaluation

5.1. Evaluation Metrics

The overall performance of the RAP-Optimizer depends on the prediction quality of the DNN developed in this study. The performance of this DNN was evaluated using accuracy, precision, recall, and F1 score. These are the most frequently used evaluation metrics for machine learning approaches [50]. These metrics are defined in Equations (16), (17), (18), and (19), respectively, which are dependent on true positive (TP), true negative (TN), false positive (FP), and false negative (FN). This study retrieves these values from the confusion matrix illustrated in Figure 8.
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\text{Precision} = \frac{TP}{TP + FP}
\text{Recall} = \frac{TP}{TP + FN}
F_1\ \text{Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}
Apart from ML performance evaluation metrics, the RAP-Optimizer performance was evaluated based on the capability of reducing the number of active hosts, improving API request management, and resource optimization [11,51]. Moreover, the overall objective achievement was used as an evaluation metric as well.
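For reference, the sketch below derives these metrics from a multi-class confusion matrix; overall accuracy is computed as the trace over the total count, the multi-class counterpart of Equation (16).

```python
# Per-class precision/recall/F1 and overall accuracy from a confusion
# matrix (rows = true class, columns = predicted class).
import numpy as np

def classification_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    precision = tp / (tp + fp)                          # Equation (17)
    recall = tp / (tp + fn)                             # Equation (18)
    f1 = 2 * precision * recall / (precision + recall)  # Equation (19)
    accuracy = tp.sum() / cm.sum()                      # cf. Equation (16)
    return accuracy, precision, recall, f1
```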

5.2. Confusion Matrix Analysis

The confusion matrix analysis reveals the performance of the classification model across the five target classes: Basic, Standard, Intermediate, Advanced, and Premium, with an overall accuracy of 96.1%. The precision values vary slightly among the classes, with Basic and Premium achieving the highest precision at 97.7%, followed by Intermediate at 97.4%, Advanced at 97.3%, and Standard at 96.9%. Recall values indicate how well the model identifies each class, with Premium leading at 98.0%, followed by Intermediate at 97.4%, Basic at 97.3%, and Standard and Advanced at 97.2%. The F1 scores, which balance precision and recall, are consistent across the classes, with Basic at 97.5%, Standard at 97.0%, Intermediate at 97.4%, Advanced at 97.3%, and Premium at 97.8%. These metrics suggest that the model performs well overall, with only slight variations between classes in how accurately they are classified and identified.

5.3. K-Fold Cross-Validation

K-fold cross-validation is a robust method used to evaluate the performance of machine learning models by splitting the dataset into multiple folds. In this analysis, six folds were used, as shown in Table 4, with slight variations observed across each fold for metrics such as accuracy, precision, recall, and F1 score. The average values were calculated across all folds to provide a reliable overall assessment. Accuracy remained consistent, ranging from 96.0% to 96.4%, while precision ranged between 97.3% and 97.7%. Recall and the F1 score also showed slight variations, with recall values between 96.8% and 97.6% and F1 scores between 97.0% and 97.3%. The spider chart in Figure 9 visually represents the performance across the metrics, emphasizing the stability and effectiveness of the model across the different folds, providing confidence in its generalizability.

5.4. Active Physical Host

Table 5 presents a comparative analysis of the number of active hosts per 24 h before and after the implementation of the deep-annealing algorithm over a twelve-week period. The results presented in Table 5 correspond to the input activity measured in terms of API requests and application load handled by the AIaaS platform. Specifically, these results reflect the distribution and handling of a standardized set of API calls generated by the applications hosted on VMs. Each VM instance processed a varying number of requests depending on its allocated resources and current workload, and the table captures the system’s performance in terms of resource utilization and API handling efficiency both before and after applying the RAP-Optimizer. The input activities were kept consistent across evaluations to ensure a fair comparison between the baseline and optimized states. Before the introduction of the algorithm, the average number of active hosts per day was recorded at 33. With the deployment of the deep-annealing algorithm, this average was reduced to 28 active hosts per day, signifying an average reduction of 5 hosts. This demonstrates the efficacy of the deep-annealing algorithm in optimizing resource utilization within the data center environment.
A closer examination, illustrated in Figure 10, reveals a fluctuating yet consistently positive impact of the deep-annealing algorithm across the weeks. The most notable improvement was observed in the second week, where the number of active hosts was reduced by 7, from 29 without the algorithm to 22 with it. The least improvement was seen in weeks 3 and 4, where a reduction of 4 hosts was achieved. Weeks 1, 5, 6, 9, and 11 saw a reduction of 6 hosts, while weeks 7, 8, 10, and 12 observed a decrease of 5 hosts. These results underscore the capacity of the deep-annealing algorithm to effectively manage and reduce the number of active hosts required to support the operational demands of a SaaS application data center.

5.5. Request Optimization

Table 6 presents the resource utilization and API request handling before and after implementing the proposed RAP-Optimizer. Initially, all physical hosts were active, handling a random number of API requests, with some hosts underutilized in terms of CPU and RAM. The number of API requests generated follows a Poisson distribution, commonly used to model request rates in cloud environments, with a mean parameter λ = 15, which represents the average number of API requests per host per time unit. This distribution was chosen to simulate typical variations in API traffic experienced in the experimental cloud-based AIaaS environment [52]. After applying the RAP-Optimizer, the system efficiently allocated API requests to fewer hosts, maximizing resource utilization before activating additional hosts. As a result, five physical hosts were placed into idle mode, conserving resources and reducing energy consumption. The remaining hosts handled the same workload but at higher efficiency, utilizing their CPU and RAM to near full capacity. Before the method was implemented, many hosts processed fewer API requests, with CPU cores and RAM underutilized. For example, hosts 4, 6, and 10 handled only a fraction of their capacity, running only 3 to 7 API requests, leaving a significant portion of resources idle. After applying the deep-annealing method, these API requests were consolidated onto hosts with higher resource availability, allowing some physical hosts to transition into idle mode. This approach optimized resource utilization across the active hosts while reducing operational overhead. The results, as shown in Table 6, demonstrate that the proposed system can handle the same number of API requests using fewer physical hosts, which leads to significant resource savings and improved data center efficiency.
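The workload model can be reproduced with a short NumPy sketch, shown below; the host count of 48 follows the 3-rack, 16-host layout described in Section 4.5.2, and the random seed is arbitrary.

```python
# Simulated per-host API request arrivals, Poisson with lambda = 15.
import numpy as np

rng = np.random.default_rng(seed=7)
requests_per_host = rng.poisson(lam=15, size=48)   # 48 hosts = 3 racks x 16
print(requests_per_host.mean(), requests_per_host.max())
```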

5.6. Resource Optimization

The RAP-Optimizer demonstrates significant resource optimization potential. Figure 11 shows the before and after effect comparison. Initially, before implementing the RAP-Optimizer, the server costs steadily increased from USD 200 to USD 2600 over the 12 months, as the system inefficiently activated additional hosts to handle the growing number of API requests. This led to underutilized resources and inflated operational costs. After applying the proposed RAP-Optimizer, the server cost was reduced each month, stabilizing at around USD 1250 by the 12th month. This reduction is attributed to the system’s ability to consolidate API requests to fully utilize existing active hosts before activating new ones. As a result, unnecessary activation of hosts was minimized, leading to significant cost savings.
Despite the optimization, the application’s return remained consistent, showing that user demand and subscription revenues were unaffected by the changes. The results also show a clear increase in the revenue margin after optimization: from an initial margin of USD 300 in the first month, the margin grew to USD 1750 by the 12th month. This improvement demonstrates the effectiveness of the proposed system in reducing operational costs while maintaining consistent service delivery, thus ensuring the profitability and sustainability of the AI-driven image enhancement web application over time. By efficiently managing resources and reducing the number of active hosts, the system successfully increases profit margins, even as server usage grows.

5.7. Objective Achievement

The primary objective of this study is to increase the profit margin. After implementing the proposed RAP-Optimizer, server cost, return, and revenue margin were observed for 12 months. During this observation period, the existing approach and the system with RAP-Optimizer were active simultaneously. The number of users allowed to use the optimized system was kept similar to the existing system for fair comparison. The data observed over the 12-month period are listed in Table 7. After using the RAP-Optimizer, on average, the server cost was reduced by approximately 45%, dropping from an initial USD 2600 to a stable USD 1250 by the 12th month. This resulted in an average monthly cost reduction of USD 1150. In parallel, the profit margin increased from an average of USD 600 per month before optimization to USD 1675 after optimization, reflecting a 179% increase in profit over the 12-month period. Additionally, the reduction in the number of active hosts, as shown in Table 6, led to more efficient resource utilization, ensuring that the same number of API requests were handled with fewer servers, further driving operational cost savings and profit maximization.

5.8. Case Study Analysis

To highlight the RAP-Optimizer’s performance, we conducted a case study comparing its efficiency with traditional methods, such as Kubernetes’ Horizontal Pod Autoscaler (HPA) [53] and Amazon Web Services (AWS) Auto Scaling [54], within high-demand periods for a large-scale AI image processing application. During peak loads, RAP-Optimizer reduced the number of active physical hosts by 20% in comparison to Kubernetes HPA and AWS Auto Scaling, while ensuring sub-100 ms response times. Additionally, RAP-Optimizer achieved a 28% reduction in server costs over AWS Auto Scaling, attributed to its predictive configuration adjustment model, which proactively manages resources to match API request patterns. The RAP-Optimizer’s approach reduced latency by an average of 15%, leveraging real-time workload analysis and minimizing idle resource allocations, outperforming existing methods like the Google Kubernetes Engine’s (GKE) default resource scaling by dynamically consolidating workloads [55]. These results demonstrate the RAP-Optimizer’s unique capability to handle dynamic and unpredictable workloads typical of AIaaS environments, achieving operational savings, enhanced resource utilization, and improved performance beyond what is currently possible with industry-standard tools.

5.9. Comparative Performance Evaluation

The performance of the proposed RAP-Optimizer was compared with CEDULE+, which is considered a state-of-the-art method [56]. This quantitative performance analysis establishes a benchmark for gauging the level of improvement the proposed RAP-Optimizer can introduce. The experimental data are presented in Table 8. This comparison summarizes the quantitative results, comparing the RAP-Optimizer with CEDULE+ across multiple key performance metrics: cost savings, reduction in active hosts, prediction accuracy, and response time. Both systems were deployed under similar conditions so that the performance data remained unbiased. The experiment was conducted in a dynamic workload environment representative of typical AIaaS scenarios.
The experimental data in Table 8 show that the proposed RAP-Optimizer saves 13% of costs compared to CEDULE+. The main contributor to this improvement is the effective consolidation of API requests onto fewer active hosts offered by the RAP-Optimizer. This method reduces the overall energy consumption, which in turn reduces cost. Moreover, the experimental data show that the proposed RAP-Optimizer reduces the number of active hosts by an additional 10%. From the response-time perspective, the RAP-Optimizer is more responsive than CEDULE+. Overall, the proposed RAP-Optimizer demonstrates better performance in the experimental environment.

5.10. Mobility-Aware Performance Comparison

The mobility-aware routing enhancement was integrated into the proposed RAP-Optimizer to ensure network resource optimization. In this approach, the distance between the host servers and the service requesters is considered, since reducing the routing cost is another way of optimizing AIaaS resources [57]. The performance of the mobility-aware approach is listed in Table 9, which provides a summary of the experimental results under similar conditions. The key metrics of this analysis are cost savings, reduction in active hosts, prediction accuracy, response time, and routing cost. The experimental data show that the mobility-aware RAP-Optimizer improves performance: it reduces the response time and lowers the routing cost. To be precise, according to the experimental data, the distance-aware approach achieved a 15% reduction in routing costs and an 18% improvement in response time compared to the standard RAP-Optimizer.
The experimental results presented in Table 9 indicate that the mobility-aware RAP-Optimizer dynamically routes requests to nearby host servers to minimize network overhead. That is why the network resources are optimized when this enhancement is incorporated into the proposed RAP-Optimizer. This enhanced capability helps maintain a high level of efficiency, particularly for AI-driven applications with unpredictable and geographically diverse workloads.

6. Limitations and Future Scope

While the proposed RAP-Optimizer has shown promising results in reducing operational costs and optimizing resource utilization, there are several areas where the approach can be further improved. This section outlines the key limitations of the current system and proposes future directions to enhance its functionality, scalability, and efficiency in handling cloud-based AI services.

6.1. Resource Utilization Focus

One limitation of the proposed RAP-Optimizer is that it primarily focuses on optimizing CPU and memory utilization, while other critical factors, such as network bandwidth, disk I/O, and latency, are not fully considered. These elements influence overall system performance, and their impact becomes more pronounced as traffic volume increases. One way to overcome this limitation is to expand the dataset features to incorporate these elements; the scope of this extension will be explored in a subsequent version of this study.
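Purely as an illustration of the proposed feature expansion, the sketch below appends hypothetical bandwidth, disk I/O, and latency columns to an existing numeric feature table and reuses Z-score normalization so that the new columns share a common scale. The column names are invented for this example and are not part of the study’s dataset.

import pandas as pd

def extend_and_normalize(base: pd.DataFrame, extra: pd.DataFrame) -> pd.DataFrame:
    # Append the additional utilization signals to the existing feature table
    # (hypothetical column names) and apply Z-score normalization column-wise.
    merged = base.join(extra[["net_bandwidth_mbps", "disk_io_iops", "latency_ms"]])
    return (merged - merged.mean()) / merged.std()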

6.2. Uniform API Request Handling

The AIaaS application used in this experiment offers image processing services only; consequently, the cloud servers have no configuration variations. Applications that offer multi-modal services, such as audio and video processing, depend on multiple hardware configurations at the back end. Selecting the appropriate hardware to which an API call should be forwarded therefore requires an additional logical layer, which the proposed RAP-Optimizer does not incorporate. This is another limitation of the approach. A potential solution is to categorize API requests based on their nature, which will be explored in future work on this method.
The RAP-Optimizer assumes that all API requests can be distributed uniformly across available hosts, without accounting for hardware differences or the geographical location of hosts, both of which may affect response times and performance. Addressing this limitation could involve enhancing the system to factor in host-specific attributes, such as hardware capabilities and geographic proximity, allowing smarter resource allocation that reduces latency and improves service quality.
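One possible shape of the missing logical layer is sketched below: requests are first categorized, then matched to hosts by hardware capability and region before best-fit packing. The request categories, the HostProfile fields, and the GPU requirement table are hypothetical assumptions for illustration, not part of the evaluated system.

from dataclasses import dataclass

@dataclass
class HostProfile:
    host_id: int
    has_gpu: bool
    region: str
    free_cores: int

# Hypothetical categories for a multi-modal AIaaS back end.
NEEDS_GPU = {"image": False, "audio": False, "video": True}

def pick_host(category: str, client_region: str, hosts: list) -> HostProfile:
    # Filter by hardware requirement, prefer hosts in the requester's region,
    # then use best-fit packing on the remaining capacity.
    needs_gpu = NEEDS_GPU.get(category, False)
    eligible = [h for h in hosts if h.free_cores > 0 and (h.has_gpu or not needs_gpu)]
    if not eligible:
        raise RuntimeError("no eligible host for this request category")
    local = [h for h in eligible if h.region == client_region]
    pool = local or eligible
    return min(pool, key=lambda h: h.free_cores)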

6.3. Predictive Optimization Approach

The current version of the proposed RAP-Optimizer follows a predictive approach, so its overall effectiveness depends on the accuracy of the prediction. A major weakness of this design is that when a configuration is predicted incorrectly, the RAP-Optimizer cannot handle the resulting workload properly. One solution is to incorporate the false positive rate into the decision-making process; the potential of this solution will be explored in the future scope of this study.
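A simple way to guard against misprediction, sketched below under assumed values, is to accept the DNN’s predicted configuration only when its softmax confidence clears a threshold and otherwise fall back to a conservative class. The threshold of 0.8 and the choice of Premium as the fallback are illustrative assumptions; only the five class names come from the dataset description in this paper.

import numpy as np

CLASSES = ["Basic", "Standard", "Intermediate", "Advanced", "Premium"]

def choose_configuration(softmax_probs, threshold=0.8, fallback="Premium"):
    # Trust the DNN only when its top softmax probability clears the threshold;
    # otherwise fall back to a conservative, over-provisioned configuration.
    probs = np.asarray(softmax_probs)
    top = int(np.argmax(probs))
    return CLASSES[top] if probs[top] >= threshold else fallback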

6.4. Single-Cloud Focus

Multi-cloud and hybrid architectures offer more flexibility and a more cost-effective structure. However, the AIaaS application studied here runs on a single-cloud architecture, so the entire experiment was conducted on it. The findings are therefore relevant, and the proposed approach is suitable, for single-cloud architectures only, which is one of the weaknesses of the proposed system. Nevertheless, a large number of AIaaS applications run on single-cloud architectures, which keeps the proposed approach feasible and effective. A subsequent version of this work will focus on multi-cloud and hybrid-cloud architectures.

7. Conclusions

The RAP-Optimizer presented in this paper demonstrates an effective way to overcome the challenges of resource utilization and cost optimization in AIaaS applications. Although it focuses on an image enhancement application only, this study provides a blueprint for optimization techniques applicable to similar applications. It also demonstrates an innovative application of the combination of a DNN and the simulated annealing algorithm to cloud resource optimization. The experimental results show significant reductions in the number of active physical hosts and in server costs without compromising service quality. Over a twelve-month observation period, the RAP-Optimizer achieved an average reduction of 45% in server costs and a 179% increase in profit margins, clearly highlighting the practical benefits of the system. The DNN designed and trained in this study predicts the resource configuration and maintains high classification accuracy across all five service classes; the confusion matrix shows a validation accuracy of 97.48%. K-fold cross-validation confirmed that the prediction performance remained consistent across all folds, indicating that the DNN is accurate, well trained, and generalizes well. Moreover, the introduction of the dynamic dropout control (DDC) algorithm helped mitigate overfitting. Combining the simulated annealing algorithm with the DNN unlocked a new dimension of AIaaS cloud resource optimization in which cloud resources are managed dynamically. The optimizer effectively reduces the number of active hosts, maximizing the utilization of existing resources without activating new physical hosts and thereby saving a significant amount of resources and energy. This efficiency translates directly into improved sustainability and cost-effectiveness for AIaaS applications.
The innovative design, effective performance, and real-world impact on profit margins in AIaaS application businesses demonstrate the potential of the proposed RAP-Optimizer. Despite these advantages, the RAP-Optimizer has certain limitations: it considers only CPU and memory utilization, it lacks heterogeneous API request management, and the experiments were conducted in a single-cloud environment. In conclusion, the RAP-Optimizer provides a scalable, efficient, and cost-effective solution for resource management in cloud-based AI services, offering substantial improvements in operational efficiency, energy savings, and profit margins. As AI-driven applications continue to grow in scale and complexity, solutions like the RAP-Optimizer will play a crucial role in ensuring that cloud resources are utilized to their fullest potential, paving the way for more sustainable and profitable cloud-based business models.

Author Contributions

Conceptualization, K.S. and S.A.; methodology, R.A. and A.V.; software, K.S. and A.V.; validation, K.S., R.A. and S.A.; formal analysis, K.S. and R.A.; investigation, R.A. and A.V.; resources, S.A.; data curation, R.A.; writing—original draft preparation, K.S. and R.A.; writing—review and editing, A.V. and S.A.; visualization, K.S. and S.A.; supervision, S.A.; project administration, K.S. and R.A.; funding acquisition, S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data utilized in this study were generated from an AIaaS application and are considered confidential. However, the data can be made available upon reasonable request. Requesters must agree to the terms and conditions, which stipulate that the data cannot be used for any commercial or research purposes without prior permission from the owner.

Conflicts of Interest

Author Kaushik Sathupadi was employed by Google LLC, author Ramya Avula was employed by Carelon Research, author Arunkumar Velayutham was employed by Intel, and author Sandesh Achar was employed by Walmart Global Tech. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Deng, S.; Zhao, H.; Huang, B.; Zhang, C.; Chen, F.; Deng, Y.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Cloud-native computing: A survey from the perspective of services. Proc. IEEE 2024, 112, 12–46. [Google Scholar] [CrossRef]
  2. Tuli, S.; Mirhakimi, F.; Pallewatta, S.; Zawad, S.; Casale, G.; Javadi, B.; Yan, F.; Buyya, R.; Jennings, N.R. AI augmented Edge and Fog computing: Trends and challenges. J. Netw. Comput. Appl. 2023, 216, 103648. [Google Scholar] [CrossRef]
  3. Badshah, A.; Ghani, A.; Siddiqui, I.F.; Daud, A.; Zubair, M.; Mehmood, Z. Orchestrating model to improve utilization of IaaS environment for sustainable revenue. Sustain. Energy Technol. Assess. 2023, 57, 103228. [Google Scholar] [CrossRef]
  4. Horchulhack, P.; Viegas, E.K.; Santin, A.O.; Ramos, F.V.; Tedeschi, P. Detection of quality of service degradation on multi-tenant containerized services. J. Netw. Comput. Appl. 2024, 224, 103839. [Google Scholar] [CrossRef]
  5. Miikkulainen, R.; Liang, J.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Shahrzad, H.; Navruzyan, A.; Duffy, N.; et al. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing; Elsevier: Amsterdam, The Netherlands, 2024; pp. 269–287. [Google Scholar]
  6. Pardalos, P.M.; Mavridou, T.D. Simulated annealing. In Encyclopedia of Optimization; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–3. [Google Scholar]
  7. Zhou, G.; Tian, W.; Buyya, R.; Xue, R.; Song, L. Deep reinforcement learning-based methods for resource scheduling in cloud computing: A review and future directions. Artif. Intell. Rev. 2024, 57, 124. [Google Scholar] [CrossRef]
  8. Mohammadzadeh, A.; Chhabra, A.; Mirjalili, S.; Faraji, A. Use of whale optimization algorithm and its variants for cloud task scheduling: A review. In Handbook of Whale Optimization Algorithm; Elsevier: Amsterdam, The Netherlands, 2024; pp. 47–68. [Google Scholar]
  9. Musabimana, B.B.; Bucaioni, A. Integrating AIaaS into Existing Systems: The Gokind Experience. In Proceedings of the International Conference on Information Technology-New Generations; Springer: Berlin/Heidelberg, Germany, 2024; pp. 417–426. [Google Scholar]
  10. Kurian, A.M.; Onuorah, M.J.; Ammari, H.M. Optimizing Coverage in Wireless Sensor Networks: A Binary Ant Colony Algorithm with Hill Climbing. Appl. Sci. 2024, 14, 960. [Google Scholar] [CrossRef]
  11. Faruqui, N.; Yousuf, M.A.; Kateb, F.A.; Hamid, M.A.; Monowar, M.M. Healthcare As a Service (HAAS): CNN-based cloud computing model for ubiquitous access to lung cancer diagnosis. Heliyon 2023, 9, e21520. [Google Scholar] [CrossRef]
  12. Hossen, R.; Whaiduzzaman, M.; Uddin, M.N.; Islam, M.J.; Faruqui, N.; Barros, A.; Sookhak, M.; Mahi, M.J.N. Bdps: An efficient spark-based big data processing scheme for cloud fog-iot orchestration. Information 2021, 12, 517. [Google Scholar] [CrossRef]
  13. Achar, S.; Faruqui, N.; Bodepudi, A.; Reddy, M. Confimizer: A novel algorithm to optimize cloud resource by confidentiality-cost trade-off using bilstm network. IEEE Access 2023, 11, 89205–89217. [Google Scholar] [CrossRef]
  14. Kumar, P.; Kumar, R. Issues and challenges of load balancing techniques in cloud computing: A survey. ACM Comput. Surv. (CSUR) 2019, 51, 1–35. [Google Scholar] [CrossRef]
  15. Xu, W.; Jang-Jaccard, J.; Singh, A.; Wei, Y.; Sabrina, F. Improving performance of autoencoder-based network anomaly detection on nsl-kdd dataset. IEEE Access 2021, 9, 140136–140146. [Google Scholar] [CrossRef]
  16. Shi, J.; Fu, K.; Wang, J.; Chen, Q.; Zeng, D.; Guo, M. Adaptive QoS-aware Microservice Deployment with Excessive Loads via Intra-and Inter-Datacenter Scheduling. IEEE Trans. Parallel Distrib. Syst. 2024, 35, 1565–1582. [Google Scholar] [CrossRef]
  17. Vuillod, B.; Zani, M.; Hallo, L.; Montemurro, M. Handling noise and overfitting in surrogate models based on non-uniform rational basis spline entities. Comput. Methods Appl. Mech. Eng. 2024, 425, 116913. [Google Scholar] [CrossRef]
  18. Alnagashi, F.A.K.Q.; Rahim, N.A.; Shukor, S.A.A.; Hamid, M.H.A. Mitigating Overfitting in Extreme Learning Machine Classifier Through Dropout Regularization. Appl. Math. Comput. Intell. (AMCI) 2024, 13, 26–35. [Google Scholar]
  19. Salehin, I.; Kang, D.K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics 2023, 12, 3106. [Google Scholar] [CrossRef]
  20. Zhang, Z.; Xu, Z.Q.J. Implicit regularization of dropout. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4206–4217. [Google Scholar] [CrossRef]
  21. Poobalan, A.; Sangeetha, S.; Shanthakumar, P. Performance Optimization and Energy Minimization of Cloud Data Center Using Optimal Switching and Load Distribution Model. Sustain. Comput. Inform. Syst. 2024, 43, 101013. [Google Scholar]
  22. Buyya, R.; Ilager, S.; Arroba, P. Energy-efficiency and sustainability in new generation cloud computing: A vision and directions for integrated management of data centre resources and workloads. Softw. Pract. Exp. 2024, 54, 24–38. [Google Scholar] [CrossRef]
  23. Katal, A.; Choudhury, T.; Dahiya, S. Energy optimized container placement for cloud data centers: A meta-heuristic approach. J. Supercomput. 2024, 80, 98–140. [Google Scholar] [CrossRef]
  24. Mongia, V. EMaC: Dynamic VM Consolidation Framework for Energy-Efficiency and Multi-metric SLA Compliance in Cloud Data Centers. SN Comput. Sci. 2024, 5, 643. [Google Scholar] [CrossRef]
  25. Rajagopalan, A.; Swaminathan, D.; Bajaj, M.; Damaj, I.; Rathore, R.S.; Singh, A.R.; Blazek, V.; Prokop, L. Empowering power distribution: Unleashing the synergy of IoT and cloud computing for sustainable and efficient energy systems. Results Eng. 2024, 21, 101949. [Google Scholar] [CrossRef]
  26. Sun, Y.; Wang, Z.J.; Deveci, M.; Chen, Z.S. Optimal releasing strategy of enterprise software firms facing the competition from cloud providers. Expert Syst. Appl. 2024, 236, 121264. [Google Scholar] [CrossRef]
  27. Khan, A.Q.; Matskin, M.; Prodan, R.; Bussler, C.; Roman, D.; Soylu, A. Cloud storage cost: A taxonomy and survey. World Wide Web 2024, 27, 36. [Google Scholar] [CrossRef]
  28. Nezafat Tabalvandani, M.A.; Hosseini Shirvani, M.; Motameni, H. Reliability-aware web service composition with cost minimization perspective: A multi-objective particle swarm optimization model in multi-cloud scenarios. Soft Comput. 2024, 28, 5173–5196. [Google Scholar] [CrossRef]
  29. Chi, Y.; Dai, W.; Fan, Y.; Ruan, J.; Hwang, K.; Cai, W. Total cost ownership optimization of private clouds: A rack minimization perspective. Wirel. Netw. 2024, 30, 3855–3869. [Google Scholar] [CrossRef]
  30. Moreira, L.F.R.; Moreira, R.; Travençolo, B.A.N.; Backes, A.R. An Artificial Intelligence-as-a-Service Architecture for deep learning model embodiment on low-cost devices: A case study of COVID-19 diagnosis. Appl. Soft Comput. 2023, 134, 110014. [Google Scholar] [CrossRef]
  31. Simic, V.; Stojanovic, B.; Ivanovic, M. Optimizing the performance of optimization in the cloud environment–An intelligent auto-scaling approach. Future Gener. Comput. Syst. 2019, 101, 909–920. [Google Scholar] [CrossRef]
  32. Chen, X.; Wang, H.; Ma, Y.; Zheng, X.; Guo, L. Self-adaptive resource allocation for cloud-based software services based on iterative QoS prediction model. Future Gener. Comput. Syst. 2020, 105, 287–296. [Google Scholar] [CrossRef]
  33. Kirti, M.; Maurya, A.K.; Yadav, R.S. Fault-tolerance approaches for distributed and cloud computing environments: A systematic review, taxonomy and future directions. Concurr. Comput. Pract. Exp. 2024, 36, e8081. [Google Scholar] [CrossRef]
  34. Debinski, M.; Breitinger, F.; Mohan, P. Timeline2GUI: A Log2Timeline CSV parser and training scenarios. Digit. Investig. 2019, 28, 34–43. [Google Scholar] [CrossRef]
  35. Jayaweera, C.; Aziz, N. Reliability of principal component analysis and Pearson correlation coefficient, for application in artificial neural network model development, for water treatment plants. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2018; Volume 458, p. 012076. [Google Scholar]
  36. Faruqui, N.; Yousuf, M.A.; Chakraborty, P.; Hossain, M.S. Innovative automation algorithm in micro-multinational data-entry industry. In Cyber Security and Computer Science: Proceedings of the Second EAI International Conference, ICONCS 2020, Dhaka, Bangladesh, 15–16 February 2020; Proceedings 2; Springer: Berlin/Heidelberg, Germany, 2020; pp. 680–692. [Google Scholar]
  37. Racherla, S.; Sripathi, P.; Faruqui, N.; Kabir, M.A.; Whaiduzzaman, M.; Shah, S.A. Deep-IDS: A Real-Time Intrusion Detector for IoT Nodes Using Deep Learning. IEEE Access 2024, 12, 63584–63597. [Google Scholar] [CrossRef]
  38. Demircioğlu, A. The effect of feature normalization methods in radiomics. Insights Imaging 2024, 15, 2. [Google Scholar] [CrossRef] [PubMed]
  39. Geem, D.; Hercules, D.; Pelia, R.S.; Venkateswaran, S.; Griffiths, A.; Noe, J.D.; Dotson, J.L.; Snapper, S.; Rabizadeh, S.; Rosh, J.R.; et al. Progression of Pediatric Crohn’s Disease Is Associated With Anti–Tumor Necrosis Factor Timing and Body Mass Index Z-Score Normalization. Clin. Gastroenterol. Hepatol. 2024, 22, 368–376. [Google Scholar] [CrossRef] [PubMed]
  40. Trivedi, S.; Patel, N.; Faruqui, N. NDNN based U-Net: An Innovative 3D Brain Tumor Segmentation Method. In Proceedings of the 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 26–29 October 2022; pp. 0538–0546. [Google Scholar]
  41. Ullah, U.; Garcia-Zapirain, B. Quantum machine learning revolution in healthcare: A systematic review of emerging perspectives and applications. IEEE Access 2024, 12, 11423–11450. [Google Scholar] [CrossRef]
  42. Faruqui, N.; Yousuf, M.A.; Whaiduzzaman, M.; Azad, A.; Alyami, S.A.; Liò, P.; Kabir, M.A.; Moni, M.A. SafetyMed: A novel IoMT intrusion detection system using CNN-LSTM hybridization. Electronics 2023, 12, 3541. [Google Scholar] [CrossRef]
  43. Shahiwala, A.F.; Qawoogha, S.S.; Faruqui, N. Designing optimum drug delivery systems using machine learning approaches: A prototype study of niosomes. AAPS PharmSciTech 2023, 24, 94. [Google Scholar] [CrossRef]
  44. Faruqui, N.; Yousuf, M.A.; Whaiduzzaman, M.; Azad, A.; Barros, A.; Moni, M.A. LungNet: A hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data. Comput. Biol. Med. 2021, 139, 104961. [Google Scholar] [CrossRef]
  45. Wang, L.; Ye, W.; Zhu, Y.; Yang, F.; Zhou, Y. Optimal parameters selection of back propagation algorithm in the feedforward neural network. Eng. Anal. Bound. Elem. 2023, 151, 575–596. [Google Scholar] [CrossRef]
  46. Xie, G.; Lai, J. An interpretation of forward-propagation and back-propagation of dnn. In Pattern Recognition and Computer Vision: Proceedings of the First Chinese Conference, PRCV 2018, Guangzhou, China, 23–26 November 2018, Proceedings, Part II 1; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–15. [Google Scholar]
  47. Cong, S.; Zhou, Y. A review of convolutional neural network architectures and their optimizations. Artif. Intell. Rev. 2023, 56, 1905–1969. [Google Scholar] [CrossRef]
  48. Paula, L.P.O.; Faruqui, N.; Mahmud, I.; Whaiduzzaman, M.; Hawkinson, E.C.; Trivedi, S. A novel front door security (FDS) algorithm using GoogleNet-BiLSTM hybridization. IEEE Access 2023, 11, 19122–19134. [Google Scholar] [CrossRef]
  49. Cao, Y.; Maghsudi, S.; Ohtsuki, T.; Quek, T.Q. Mobility-aware routing and caching in small cell networks using federated learning. IEEE Trans. Commun. 2023, 72, 815–829. [Google Scholar] [CrossRef]
  50. Hossain, M.E.; Faruqui, N.; Mahmud, I.; Jan, T.; Whaiduzzaman, M.; Barros, A. DPMS: Data-Driven Promotional Management System of Universities Using Deep Learning on Social Media. Appl. Sci. 2023, 13, 12300. [Google Scholar] [CrossRef]
  51. Kaur, H.; Anand, A. Review and analysis of secure energy efficient resource optimization approaches for virtual machine migration in cloud computing. Meas. Sens. 2022, 24, 100504. [Google Scholar] [CrossRef]
  52. Pavlik, J.; Sobeslav, V.; Horalek, J. Statistics and analysis of service availability in cloud computing. In Proceedings of the 18th International Database Engineering & Applications Symposium, Porto, Portugal, 7–9 July 2014; pp. 310–313. [Google Scholar]
  53. Augustyn, D.R.; Wyciślik, Ł.; Sojka, M. Tuning a Kubernetes Horizontal Pod Autoscaler for Meeting Performance and Load Demands in Cloud Deployments. Appl. Sci. 2024, 14, 646. [Google Scholar] [CrossRef]
  54. Nanthini, N.; Prabha, P.S.; Vidhyasri, R.; Anand, V.V. Fault Tolerance Using AutoScaling in Amazon Web Services. In Proceedings of the 2024 International Conference on Computing and Data Science (ICCDS), Chennai, India, 26–27 April 2024; pp. 1–6. [Google Scholar]
  55. Ali, B.; Golec, M.; Singh Gill, S.; Cuadrado, F.; Uhlig, S. ProKube: Proactive Kubernetes Orchestrator for Inference in Heterogeneous Edge Computing. Int. J. Netw. Manag. 2024, e2298. [Google Scholar] [CrossRef]
  56. Pinciroli, R.; Ali, A.; Yan, F.; Smirni, E. Cedule+: Resource management for burstable cloud instances using predictive analytics. IEEE Trans. Netw. Serv. Manag. 2020, 18, 945–957. [Google Scholar] [CrossRef]
  57. Li, S.; Zhou, Y.; Jiao, L.; Yan, X.; Wang, X.; Lyu, M.R.T. Towards operational cost minimization in hybrid clouds for dynamic resource provisioning with delay-aware optimization. IEEE Trans. Serv. Comput. 2015, 8, 398–409. [Google Scholar] [CrossRef]
Figure 1. The relationship among server costs, return on investment, and revenue margin.
Figure 2. The methodological overview of the proposed RAP-Optimizer.
Figure 3. The overlapping features from three different log files.
Figure 4. The feature variable ranges before and after performing the Z-score normalization.
Figure 5. The network architecture of the 6-layer deep fully connected neural network.
Figure 6. The learning progress in terms of training accuracy, validation accuracy, training loss, and validation loss.
Figure 7. The state space landscape shows the number of active VMs running on hosts.
Figure 8. The confusion matrix obtained from the test dataset with 21,417 instances.
Figure 9. The resource configuration prediction performance analysis using k-fold cross-validation.
Figure 10. The comparison of the number of active hosts before and after using the deep-annealing algorithm.
Figure 11. The cost optimization before and after using the proposed method.
Table 1. Comparison of RAP-Optimizer with existing resource optimization methods.

Optimization Model | Advantages | Limitations
Hill-Climbing (HC) Algorithm | Simple and effective for static resource allocation. | Operates reactively; does not maximize existing resource utilization; high cost in dynamic workloads.
CEDULE+ | Predictive analytics for burstable instances; good for CPU optimization. | Limited in API request reallocation; moderate cost savings; does not address DNN overfitting issues.
Conventional Autoscaling | Reliable for predictable workloads; quick resource activation. | Activates new hosts frequently without utilization checks; costly in AIaaS with variable workloads.
RAP-Optimizer (Proposed) | Predictive DNN with dynamic dropout control; proactive host deactivation; 45% cost savings. | Initial setup complexity; requires training data; potential cost without mobility-aware routing.
Table 2. A modified sample of the dataset with all target variables.

Peak Frequency | Active Time (hours) | API Initiation Count | Service Requests | vCPU | vRAM (GB) | vDisk (GB) | Energy Usage (Wh) | Cloud Configuration
2 | 0.25 | 12 | 1500 | 2 | 1.5 | 0.6 | 10 | Basic
4 | 0.45 | 30 | 1800 | 4 | 2.5 | 1 | 40 | Standard
7 | 1.5 | 220 | 6200 | 5 | 4 | 2.5 | 50 | Intermediate
8 | 3.2 | 340 | 8300 | 7 | 6 | 3.5 | 75 | Advanced
11 | 5.8 | 550 | 11,200 | 9 | 8 | 5 | 95 | Premium
Table 3. Impacts of different network configurations on overfitting and underfitting characteristics in the RAP-Optimizer.

Network Configuration | Number of Hidden Layers | Neurons per Layer | Characteristics | Observed Behavior
Initial Configuration | 6 | 32 | Overfitting | High training accuracy, low validation accuracy.
Modified Configuration 1 | 5 | 32 | Overfitting | Overfitting persists; validation accuracy improves slightly but remains significantly lower than training accuracy.
Modified Configuration 2 | 4 | 32 | Overfitting | Moderate overfitting; slight improvement in validation performance, but the gap remains.
Modified Configuration 3 | 3 | 16 | Overfitting | Reduced overfitting, but validation accuracy still does not match training accuracy.
Modified Configuration 4 | 4 | 8 | Underfitting | Model starts underfitting; both training and validation accuracy are low.
Modified Configuration 5 | 4 | 4 | Underfitting | Significant underfitting; both accuracies remain low, model complexity too reduced.
Modified Configuration 6 | 2 | 16 | Underfitting | Underfitting persists; accuracy too low for both training and validation.
Modified Configuration 7 | 1 | 32 | Underfitting | Severe underfitting; network too shallow to capture complex patterns.
Table 4. Performance evaluation using K-fold cross-validation.

Metric | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Average
Accuracy | 0.962 | 0.961 | 0.964 | 0.960 | 0.963 | 0.961 | 0.9618
Precision | 0.975 | 0.976 | 0.973 | 0.977 | 0.974 | 0.976 | 0.9751
Recall | 0.970 | 0.969 | 0.971 | 0.968 | 0.970 | 0.969 | 0.9695
F1 score | 0.972 | 0.971 | 0.973 | 0.970 | 0.972 | 0.971 | 0.9715
Table 5. The number of active hosts per 24 h before and after using the deep-annealing algorithm.

Week | Active Hosts per 24 h (Without Deep-Annealing) | Active Hosts per 24 h (With Deep-Annealing) | Reduction
1 | 36 | 30 | 6
2 | 29 | 22 | 7
3 | 37 | 33 | 4
4 | 31 | 27 | 4
5 | 35 | 29 | 6
6 | 33 | 27 | 6
7 | 39 | 34 | 5
8 | 30 | 25 | 5
9 | 38 | 32 | 6
10 | 32 | 27 | 5
11 | 34 | 28 | 6
12 | 29 | 24 | 5
Average | 33 | 28 | 5
Table 6. API request handling and resource optimization before and after using the deep-annealing algorithm.

Host | CPU (Cores) | RAM (GB) | Requests Processed (Before) | CPU Cores Used (Before) | Requests Processed (After) | CPU Cores Used (After)
1 | 10 | 128 | 12 | 8 | 18 | 10
2 | 10 | 128 | 15 | 9 | 20 | 9
3 | 10 | 128 | 16 | 10 | 22 | 10
4 | 10 | 128 | 7 | 5 | Handled by Hosts 1–3 | Idle Mode
5 | 10 | 128 | 14 | 9 | 19 | 10
6 | 10 | 128 | 6 | 4 | Handled by Hosts 1–3 | Idle Mode
7 | 10 | 128 | 9 | 7 | 18 | 9
8 | 10 | 128 | 13 | 8 | 20 | 9
9 | 10 | 128 | 10 | 6 | 16 | 9
10 | 10 | 128 | 5 | 3 | Handled by Hosts 5–9 | Idle Mode
11 | 10 | 128 | 8 | 6 | Handled by Hosts 5–9 | Idle Mode
12 | 10 | 128 | 4 | 3 | Handled by Hosts 5–9 | Idle Mode
Table 7. Numerical analysis of the objective achievement.

Month | Server Cost (USD, Before) | Server Cost (USD, After) | Return (USD, Before) | Return (USD, After) | Revenue Margin (USD, Before) | Revenue Margin (USD, After)
1 | 200 | 150 | 500 | 500 | 300 | 350
2 | 500 | 400 | 1200 | 1200 | 700 | 800
3 | 800 | 650 | 1800 | 1900 | 1000 | 1250
4 | 1100 | 800 | 2200 | 2300 | 1100 | 1500
5 | 1500 | 900 | 2500 | 2600 | 1000 | 1700
6 | 1800 | 1000 | 2600 | 2700 | 800 | 1700
7 | 2000 | 1100 | 2700 | 2800 | 700 | 1700
8 | 2100 | 1150 | 2750 | 2800 | 650 | 1650
9 | 2200 | 1200 | 2800 | 2850 | 600 | 1650
10 | 2400 | 1200 | 2900 | 2900 | 500 | 1700
11 | 2500 | 1250 | 2900 | 2950 | 400 | 1700
12 | 2600 | 1250 | 3000 | 3000 | 400 | 1750
Table 8. Comparative analysis of RAP-Optimizer and CEDULE+.

Metric | RAP-Optimizer | CEDULE+ | Percentage Improvement
Cost Savings (%) | 45% | 32% | +13%
Active Host Reduction (%) | 40% | 30% | +10%
Prediction Accuracy (%) | 96.1% | 92.5% | +3.6%
Response Time (ms) | 85 ms | 120 ms | −29.2%
Table 9. Performance comparison of RAP-Optimizer with and without the mobility-aware component.

Metric | Standard RAP-Optimizer | With Mobility-Aware Component | Percentage Improvement
Cost Savings (%) | 45% | 50% | +5%
Active Host Reduction (%) | 40% | 42% | +2%
Prediction Accuracy (%) | 96.1% | 96.3% | +0.2%
Response Time (ms) | 85 ms | 70 ms | +18%
Routing Cost Reduction (%) | - | 15% | -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
