Article

RAP-Optimizer: Resource-Aware Predictive Model for Cost Optimization of Cloud AIaaS Applications

1 Google LLC, Sunnyvale, CA 94089, USA
2 Business Information Developer Consultant, Carelon Research, Celina, TX 75009, USA
3 Cloud Software Development Engineer and Technical Lead, Intel, Phoenix, AZ 85050, USA
4 Department of Software Engineering, Walmart Global Tech, Sunnyvale, CA 94086, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(22), 4462; https://doi.org/10.3390/electronics13224462
Submission received: 2 October 2024 / Revised: 3 November 2024 / Accepted: 7 November 2024 / Published: 14 November 2024

Abstract

Artificial Intelligence (AI) applications are growing rapidly, and more applications are joining the market competition. As a result, the AI-as-a-service (AIaaS) model is experiencing rapid growth. Many of these AIaaS-based applications are not properly optimized initially; once they start experiencing a large volume of traffic, different challenges start revealing themselves. One of these challenges is maintaining a profit margin for the sustainability of the AIaaS application-based business model, which depends on the proper utilization of computing resources. This paper introduces the resource-aware predictive (RAP) model for AIaaS cost optimization, called the RAP-Optimizer. It is developed by combining a deep neural network (DNN) with the simulated annealing optimization algorithm, and it is designed to reduce resource underutilization and minimize the number of active hosts in cloud environments. It dynamically allocates resources and handles API requests efficiently. The RAP-Optimizer reduces the number of active physical hosts by an average of 5 per day, leading to a 45% decrease in server costs. The impact of the RAP-Optimizer was observed over a 12-month period. The observational data show a significant improvement in resource utilization, effectively reducing operational costs from USD 2600 to USD 1250 per month. Furthermore, the RAP-Optimizer increases the profit margin by 179%, from USD 600 to USD 1675 per month. The inclusion of the dynamic dropout control (DDC) algorithm in the DNN training process mitigates overfitting, achieving a 97.48% validation accuracy and a validation loss of 2.82%. These results indicate that the RAP-Optimizer effectively enhances resource management and cost-efficiency in AIaaS applications, making it a valuable solution for modern cloud environments.

1. Introduction

The current cloud application trends demonstrate the rapid growth of AI-driven applications powered by AIaaS as the back end [1]. As a result, the demand for efficient resource utilization has become paramount [2]. Many innovative cloud-based services, particularly those adopting the AI-as-a-service (AIaaS) model, suffer from underutilization of resources, leading to escalating operational costs [3]. The initial assumptions about the resource limits of these applications change as the number of users and API requests increases, so service providers face the challenge of managing their physical and virtual resources effectively while maintaining the quality of service (QoS) [4]. The RAP-Optimizer presented in this paper addresses these issues by optimizing resource allocation and minimizing the number of active hosts in AIaaS environments. This system not only reduces server costs but also improves resource efficiency, leading to better profit margins and energy savings. The potential of the RAP-Optimizer lies in its ability to dynamically balance API request loads across cloud servers; in this way, it prevents the unnecessary activation of additional resources and ensures sustainable and scalable cloud operations.
The proposed RAP-Optimizer was developed by integrating a deep neural network (DNN) [5] with the simulated annealing algorithm [6] to create a robust framework for real-time resource management. The DNN predicts the optimal configuration for virtual machines (VMs) based on real-time data analysis, while the simulated annealing algorithm helps optimize resource allocation by minimizing the number of active hosts. Additionally, the system incorporates a dynamic dropout control (DDC) algorithm to mitigate overfitting issues during the model training phase. The RAP-Optimizer operates in a multi-stage workflow, beginning with resource analysis through the resource analyzer (RAN) algorithm, which identifies underutilized hosts and redistributes API requests to ensure optimal cloud resource usage. By consolidating workloads and deactivating idle hosts, the system can enhance energy efficiency while maintaining the agreed quality of service (QoS) for users. The key contributions of this paper are summarized as follows:
  • Dynamic resource optimization: A novel integration of DNN and simulated annealing for dynamically balancing API requests and resource utilization across active cloud hosts.
  • Cost reduction mechanism: Demonstrated significant server cost reduction through the RAP-Optimizer, leading to improved profit margins and reduced energy consumption.
  • Multi-stage optimization workflow: Introduction of a multi-stage workflow utilizing the RAN algorithm for comprehensive resource analysis, ensuring effective redistribution of workloads across physical and virtual machines.
  • Handling overfitting with DDC: An innovative dynamic dropout control (DDC) algorithm integrated into the DNN to overcome overfitting during model training and enhance prediction accuracy.
  • Revenue margin increase: The proposed system improved profit margins by 179% over 12 months, increasing the average profit margin from USD 600 to USD 1675.
The remainder of this paper is organized as follows. The literature review is presented in Section 2. Section 3 provides an in-depth problem analysis, identifying the challenges that led to the development of the RAP-Optimizer. Section 4 outlines the methodology, describing the integration of DNN, simulated annealing, and the RAN algorithm, along with the dataset preparation and feature normalization steps. Section 5 presents the experimental results and evaluations, including the performance analysis of the proposed system and its ability to optimize cloud resource usage. Section 6 discusses the limitations and potential future improvements for the RAP-Optimizer. Finally, Section 7 concludes the paper, summarizing the key findings and contributions.

2. Literature Review

Resource optimization in cloud computing is a widely researched topic, particularly in the context of infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS). However, there remains a significant gap in addressing optimization strategies specifically tailored for AI-as-a-Service (AIaaS) models, which are characterized by fluctuating workloads and high computational demands [7]. This section reviews recent studies on cloud resource optimization, workload balancing, and overfitting issues in deep learning models, identifying gaps that the proposed RAP-Optimizer aims to address.

2.1. Cloud Resource Optimization

According to the survey conducted by Mohammadzadeh et al. [8], resource optimization for traditional cloud services, such as virtual machine (VM) allocation and CPU/RAM resource distribution, is the predominant field of research in cloud resource optimization. AIaaS differs from traditional cloud services in that it requires real-time, scalable computation [9]. Furthermore, most solutions, like the hill-climbing (HC) algorithm, operate reactively and often activate new physical hosts without fully utilizing existing resources, resulting in higher operational costs [10]. The proposed RAP-Optimizer addresses this gap by integrating a DNN with simulated annealing [6] to dynamically allocate resources based on real-time workloads and optimize resource utilization.

2.2. Workload Balancing and API Request Handling

Numerous studies have examined workload balancing in cloud environments [11,12,13]. However, these approaches generally focus on static or semi-dynamic strategies that do not adapt quickly to the rapidly changing workload patterns seen in AIaaS platforms. For example, Kumar et al. [14] presented an approach for balancing workloads across cloud servers but did not consider the possibility of reducing the number of active hosts when resources are underutilized. Additionally, ref. [15] emphasized load distribution based on CPU and memory but did not consider network bandwidth and other critical factors such as disk I/O [16]. In contrast, the RAP-Optimizer efficiently reallocates API requests by utilizing fewer physical hosts while ensuring maximum CPU and memory utilization, thus overcoming the limitations of static workload balancing methods.

2.3. Overfitting in Deep Neural Networks

Handling overfitting in DNNs is an active field of research with numerous regularization techniques [17]. According to Alnagashi et al. [18], dropout is an effective way to mitigate the overfitting issue. A review conducted by Salehin et al. [19] on different dropout techniques reveals that most approaches use a fixed dropout rate. The literature shows that while dropout can be effective, a fixed rate is not adaptive to different layers of the network, leading to either underfitting or overfitting in complex applications [20]. This paper fills this gap by introducing the dynamic dropout control (DDC) algorithm, which dynamically adjusts the dropout rate layer by layer, reducing the overfitting issue without compromising the model’s predictive performance.

2.4. Energy-Efficient Cloud Systems

The optimization of energy consumption in cloud data centers has been explored in various studies [21,22,23]. The primary focus of these is to reduce energy consumption by consolidating VMs or activating power-saving modes on underutilized hosts [24]. However, the existing methods primarily focus on reducing the number of active physical hosts without considering the trade-off between resource optimization and maintaining service quality [25]. The proposed RAP-Optimizer not only reduces the number of active hosts by an average of five per day, but it also improves resource utilization, resulting in substantial energy savings without degrading the quality of service (QoS).

2.5. Revenue Impact and Cost Optimization

Revenue impact and corresponding cost optimization is an under-explored field of research for AIaaS [26]. Multiple studies revolve around traditional cloud services [27,28,29]. However, these studies do not account for the dynamic and unpredictable nature of AIaaS platforms, where the operational cost can escalate quickly due to inefficient resource management [30]. The RAP-Optimizer directly addresses this gap by significantly reducing server costs—by 45% on average—while maintaining consistent service delivery. This allows AIaaS platforms to stabilize their profit margins over time, which is often overlooked in traditional cloud optimization research.

2.6. Comparison with Existing Technologies

Current cloud resource optimization techniques, including conventional autoscaling [31] and rule-based allocation models [32], offer reliable performance in predictable workloads but struggle in complex AI-as-a-service (AIaaS) applications where demand can be highly variable and unpredictable. Many approaches activate additional resources reactively without assessing overall resource utilization, leading to increased operational costs and underutilized resources [33].
In contrast, the proposed RAP-Optimizer is uniquely equipped to handle such dynamic conditions by integrating a deep neural network (DNN) for predictive VM configuration alongside the simulated annealing algorithm, which efficiently minimizes active host usage. Additionally, the dynamic dropout control (DDC) mechanism prevents overfitting during model training, ensuring accurate, real-time predictions even as workloads fluctuate. This combination allows RAP-Optimizer to consolidate workloads onto fewer hosts, dynamically redistribute API requests, and deactivate idle hosts, achieving up to 45% reduction in server costs, as shown in our experimental analysis. These attributes demonstrate RAP-Optimizer’s ability to maintain high resource utilization and reduce costs effectively in challenging AIaaS scenarios. The summary of the comparison of the proposed system with existing technology is listed in Table 1.

3. Problem Analysis and Objective

This paper developed a solution for an AI-driven image enhancement web application. It follows the AI-as-a-service model with a pay-as-you-go billing method. The initial margin between server cost and the overall subscription fee of the application was comfortable, as illustrated in Figure 1, which shows the relationship among server cost, return, and revenue margin over a 12-month period. However, as the application grew in popularity, the margin narrowed, and the application’s revenue flow started impacting the business model’s overall sustainability. The reason behind this fall in revenue is the increase in the number of active hosts. Further investigation shows that the computing resources in the active hosts are not fully utilized. Although there is scope for handling more API calls with the existing hosts, the system activates new hosts, and the operational cost increases as a result. The challenge is identifying the under-utilized host servers, allocating the new requests to these hosts, and reducing the number of active hosts. In this way, the operational cost can be minimized, keeping the profit margin at a sustainable level. The proposed methodology was developed to achieve this objective.

4. Methodology

The overview of the proposed methodology is illustrated in Figure 2. It starts with dataset processing, constructed from the experiment’s AIaaS application log files. Later, a well-optimized DNN architecture is developed to predict the appropriate configuration for a service request. After that, an innovative approach of reducing the overfitting effect and shifting to balance learning is incorporated into the network during the training process. The trained DNN was used to develop the RAP-Optimizer, which consists of a resource analysis module, resource space landscape, and a novel deep-annealing algorithm.

4.1. Application-Specific Adaptability and Resource Requirements

The RAP-Optimizer is designed to dynamically adapt to the characteristics of the AIaaS application it manages, with specific consideration given to image size, processing complexity, and frequency of API requests. In this study, the AIaaS application performs image enhancement, which demands substantial processing power depending on the size and quality of the images processed. For example, larger images or those requiring high-resolution outputs necessitate more virtual CPU cores and memory allocation due to increased computational demands.
The system determines the resource allocation based on these application-specific parameters. For instance, when processing high-resolution images, the DNN model predicts a configuration with additional CPU and memory resources for the corresponding VMs. This ensures the required computational power is available without over-provisioning resources. Additionally, the RAP-Optimizer monitors ongoing resource utilization to adjust allocations as image processing demands fluctuate, preventing unnecessary host activations and optimizing the allocation of VMs based on workload intensity.

4.2. Dataset Preparation

This study uses a unique dataset prepared from the AIaaS application log, and a sample of the dataset is presented in Table 2. The application maintains three types of log files: the activity log (Log_ac), the system log (Log_st), and the application log (Log_ap). The relationships among these logs are illustrated in Figure 3. The Log_ac keeps track of user interaction, Log_st is responsible for keeping track of computational resource usage, and Log_ap is dedicated to application-related data. All data from these logs are converted into comma-separated value (CSV) format [34]. After that, the instances are categorized into five classes, as presented in Table 2. The Pearson correlation coefficient (PCC) score, calculated using Equation (1), was used to identify the relevant features that have a strong correlation with the five classes [35].
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
In Equation (1), X_i represents the individual feature values, Y_i stands for the individual target variable, X̄ denotes the mean of the features, Ȳ denotes the mean of the target variable, and n denotes the total number of data points. It generates the linear correlation score (r) between X and Y with a range from −1 to +1, where the former represents a perfect negative correlation and the latter denotes a perfect positive correlation.
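For illustration, the following Python sketch shows how such a PCC-based feature screening could be performed with pandas. The column names, the integer encoding of the target classes, and the selection cutoff of |r| ≥ 0.3 are assumptions made for the sketch, not values reported in this study.

```python
# Illustrative PCC-based feature screening (Equation (1)).
# Column names, target encoding, and the 0.3 cutoff are assumed.
import pandas as pd

df = pd.read_csv("aiaas_logs.csv")  # hypothetical merged log export
target = df["cloud_configuration"].astype("category").cat.codes

features = ["peak_frequency", "active_time_hours", "api_initiation_count",
            "service_requests", "virtual_cpu", "virtual_ram_gb",
            "virtual_disk_gb", "energy_usage_wh"]
scores = {col: df[col].corr(target, method="pearson") for col in features}

# Keep features whose |r| suggests a meaningful linear relationship.
selected = [f for f, r in scores.items() if abs(r) >= 0.3]
print(sorted(scores.items(), key=lambda kv: -abs(kv[1])))
```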

4.2.1. Dataset Description

The dataset contains records of the experimental application for 365 days. After cleaning, there are 142,784 instances in the dataset. The feature variables are peak frequency, active time (hours), API initiation count, service requests, virtual CPU, virtual RAM (GB), virtual disk (GB), energy usage (Wh), and cloud configuration. Except for the target variable, all features are numerical. The target variable is categorical, with five categories.

4.2.2. Dataset Cleaning

The CSV file constructed from Log_ac, Log_st, and Log_ap contains numerous incomplete rows. These rows were created for multiple reasons, including incomplete service requests, API initialization failures, and network issues. In addition, there are multiple duplicate rows. The uncleaned dataset has 163,150 instances. This dataset was cleaned by following the mathematical principle defined in Equation (2).
C^* = \{\, y \in C \mid \forall z_j \in Z,\; y_j \neq \text{null} \,\}
In Equation (2), C represents the uncleaned dataset. It contains n records, where C = {y_1, y_2, …, y_n} and each y_i corresponds to the i-th observation vector across all variables Z = {z_1, z_2, …, z_p}. The cleaned dataset C^* includes only those observations from C for which no variable in the observation vector is missing, denoted by “null”. Additionally, outliers from C, determined using the mean μ_c and standard deviation σ_c, are eliminated based on the rule in Equation (3) [36]. A data point y ∈ C is classified as an outlier if it satisfies y < μ_c − b·σ_c or y > μ_c + b·σ_c, where b is a constant set to 3 according to the empirical 68-95-99.7 rule. The result is a dataset C^* with no outliers [37].
C^* = \{\, y \in C \mid \mu_c - b \cdot \sigma_c \le y \le \mu_c + b \cdot \sigma_c \,\}
To clarify, in this context, C consistently denotes the uncleaned dataset with missing values and outliers, while C^* represents the cleaned dataset that is devoid of any missing values or outliers. This distinction between C and C^* helps to clearly explain the process of dataset preparation for the training phase. Although this cleaning process reduces the number of entries, the final count of 142,784 instances after cleaning is sufficient to train a deep neural network (DNN) to classify the target variable using the input features.
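A minimal pandas sketch of the cleaning rules in Equations (2) and (3) is shown below; the function signature and the column list passed to it are assumptions made for illustration.

```python
# Sketch of the cleaning rules in Equations (2) and (3): drop rows with
# missing values and duplicates, then remove rows outside mu ± b*sigma
# for every numeric feature (b = 3, per the 68-95-99.7 rule).
import pandas as pd

def clean(df: pd.DataFrame, numeric_cols, b: float = 3.0) -> pd.DataFrame:
    df = df.dropna().drop_duplicates()            # Equation (2): no nulls
    mask = pd.Series(True, index=df.index)
    for col in numeric_cols:
        mu, sigma = df[col].mean(), df[col].std()
        mask &= df[col].between(mu - b * sigma, mu + b * sigma)  # Equation (3)
    return df[mask]
```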

4.2.3. Feature Normalization

The feature variables exhibit a wide range of variations in the dataset, so feature normalization is essential [38]. Z-score normalization was chosen to normalize the features because it effectively handles data with varying scales and distributions. Figure 4 illustrates the difference in data range before and after performing the normalization. The Z-score normalization process for a feature x i is defined in Equation (4) [39].
x_i' = \frac{x_i - \mu_x}{\sigma_x}
In Equation (4), x_i denotes the original feature value, μ_x denotes the mean of the feature x, σ_x denotes the standard deviation of the feature x, and x_i' denotes the normalized feature value. Here, i ranges from 1 to 142,784. After applying Z-score normalization, the feature values are transformed such that each feature has a mean of zero and a standard deviation of one, ensuring all features contribute equally during model training [40].
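Continuing the pandas sketch above, Z-score normalization (Equation (4)) can be applied column-wise as follows; this is an illustrative implementation rather than the exact pipeline used in the experiments.

```python
# Z-score normalization (Equation (4)), applied per feature column.
def z_score(df, numeric_cols):
    out = df.copy()
    for col in numeric_cols:
        out[col] = (df[col] - df[col].mean()) / df[col].std()
    return out
```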

4.2.4. Dataset Splitting

Ullah et al. [41] conducted a systematic review of machine learning (ML) applications, showing that most state-of-the-art approaches use a 70:15:15 dataset splitting ratio for training, testing, and validation datasets, respectively. The same ratio was used in this study. There are a total of 142,784 instances in the cleaned dataset. At the 70:15:15 ratio, there are 99,948 instances for training, 21,417 instances for testing, and the same number for validation.
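The 70:15:15 split can be reproduced, for example, with two successive scikit-learn splits, as sketched below; here clean_df is assumed to be the cleaned dataset from the earlier sketch, and the stratification column and random seed are illustrative assumptions.

```python
# Sketch: 70/15/15 train/test/validation split of the cleaned dataset.
from sklearn.model_selection import train_test_split

train_df, rest_df = train_test_split(
    clean_df, test_size=0.30,                      # 70% train
    stratify=clean_df["cloud_configuration"], random_state=42)
test_df, val_df = train_test_split(
    rest_df, test_size=0.50,                       # 15% test, 15% validation
    stratify=rest_df["cloud_configuration"], random_state=42)
# 142,784 rows -> roughly 99,948 / 21,417 / 21,417 instances
```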

4.2.5. Dataset Characteristics and Generalizability

The dataset used in this study originates from an AI-driven image enhancement web application operating under the AI-as-a-service (AIaaS) model, so it represents real-world characteristics. The application from which the data were collected processes a high volume of real-time API requests, and resource demands fluctuate depending on the type of processing; user activity and workload intensity have a significant impact on this fluctuation. The real-world workload patterns reflected in the dataset cover both regular and peak activities, which are essential for rapid resource scaling. The dataset contains the consumption history of computing resources, including CPU, memory, and network resource allocation, which is typical of many AIaaS applications. Resource utilization needs are also diverse, with some processes intermittently requiring high computational power. These characteristics are common across various AIaaS applications and are essential for image enhancement services in particular. Considering all of this, the dataset prepared and utilized in this experiment supports generalization when applied to AIaaS application resource optimization.

4.3. Network Architecture

A deep neural network (DNN) was designed as the resource predictor of the proposed RAP-Optimizer. It is a six-layer, fully connected network, illustrated in Figure 5. There are a total of four hidden layers; including the input and output layers, the overall network has six layers. The input layer accepts an 8-dimensional feature vector defined as [x_1, x_2, …, x_8]. The hidden layers in Figure 5 are denoted by h_1, h_2, h_3, and h_4. Each hidden layer has 32 neurons, which allows the network to capture complex representations of the data. The network architecture was carefully designed to predict the cloud configuration requirements when AIaaS applications are initiated. The optimal dropout rate plays a crucial role in ensuring the reliable performance of the network, which is why an innovative dynamic dropout control (DDC) algorithm was developed in this study. It was designed and integrated with the network to adjust the dropout rate of each layer dynamically for optimal unbiased performance. The working principle of the input layer is expressed by Equation (5); the input layer simply transposes the input vector [42].
\xi = [\xi_1, \xi_2, \ldots, \xi_\eta]^T
The input layer, after transposing the feature vector, passes it to the hidden layers. The input vector, expressed as χ^[l−1], is processed by the hidden layers using a weight matrix Ω^[l], a bias vector ζ^[l], and the rectified linear unit (ReLU) activation function, denoted as φ(·) and described in Equation (7). For layer l, the transformation is formulated in Equation (6), where l = 1, 2, …, 5, and the dimensions of the weight matrices and bias vectors are Ω^[l] ∈ R^{32×8} and ζ^[l] ∈ R^{32}, respectively [43].
\chi^{[l]} = \varphi\left(\Omega^{[l]} \chi^{[l-1]} + \zeta^{[l]}\right)
\varphi(\xi) = \begin{cases} \xi & \text{if } \xi > 0, \\ 0 & \text{otherwise.} \end{cases}
The dataset has five target classes, and the output layer of the network was prepared to predict those classes; that is why it has five nodes, each representing one of the five target classes. The activation function used in this layer is the Softmax function, defined in Equation (8). The only role of this activation function is to convert the raw output values into probabilities. Given an input ξ, the logits ς_i are obtained through a linear transformation, followed by the Softmax function to compute the probabilities υ_i for each class i, where i = 1, …, 5 [44].
\upsilon_i = \frac{\exp(\varsigma_i)}{\sum_{j=1}^{5} \exp(\varsigma_j)}
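The following PyTorch sketch illustrates one way to realize this six-layer architecture. The framework choice, the dropout placement, and the initial dropout rate are assumptions for illustration; Softmax (Equation (8)) is folded into the loss during training and applied explicitly only at inference.

```python
# Illustrative PyTorch realization of the described network: 8 input
# features, four hidden layers of 32 ReLU units with dropout, 5 outputs.
import torch.nn as nn

class ResourcePredictor(nn.Module):
    def __init__(self, p_drop: float = 0.2):
        super().__init__()
        layers, width_in = [], 8
        for _ in range(4):                         # hidden layers h1..h4
            layers += [nn.Linear(width_in, 32), nn.ReLU(), nn.Dropout(p_drop)]
            width_in = 32
        layers.append(nn.Linear(32, 5))            # logits for the 5 classes
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Raw logits; nn.CrossEntropyLoss applies Softmax (Equation (8)).
        return self.net(x)
```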

Optimal VM Configuration Predicted by DNN

The main purpose of the DNN designed in this section is to work as a VM configuration predictor. It is integrated with the RAP-Optimizer, and it predicts the optimal configuration for virtual machines (VMs) by analyzing real-time data on system load and resource availability. Multiple parameters participate in the prediction. These parameters are CPU cores, memory, storage, and expected API request load. The DNN was trained on these parameters as well. As a result, it is capable of predicting the VM configurations for a certain combination of these parameters. The primary goal of the DNN is to predict the configuration to minimize idle time and maximize resource utilization on active hosts. By accurately forecasting the configuration needed for each VM, the DNN prevents unnecessary resource allocation. As a result, each VM is provisioned with the optimal amount of resources for its workload.

4.4. Training the Network

The network is trained using the backpropagation algorithm [45]. During the forward pass, input data ξ propagate through the network, and each layer l calculates its output χ^[l] as a function of the input from the previous layer χ^[l−1], employing its weight matrix Ω^[l], bias vector ζ^[l], and activation function σ^[l]. This process continues through all layers until the output prediction υ̂ is made. The prediction is then compared to the true labels υ using a loss function, defined in Equation (9), which measures the difference between predicted and actual values [46]. In this equation, Γ represents the number of output classes, υ_i denotes the true label, and υ̂_i denotes the predicted probability for class i [47].
L(\upsilon, \hat{\upsilon}) = -\sum_{i=1}^{\Gamma} \upsilon_i \log(\hat{\upsilon}_i)

4.4.1. Learning Algorithm

The adaptive moment estimation (ADAM) optimizer is used to update the weights of the network’s hidden nodes. Initially, the moment vectors μ_0 and ν_0 are set to zero, and the time step τ is initialized to zero. The learning rate is represented by α, and β_1 and β_2 are the decay rates for the moment estimates, initialized to 0.90. At each time step τ, the gradients ∇_θ J(θ) with respect to the parameters θ are calculated. The updates for μ_τ and ν_τ are defined by Equations (10) and (11) [48].
\mu_\tau = \beta_1 \mu_{\tau-1} + (1 - \beta_1)\, \nabla_\theta J(\theta)
\nu_\tau = \beta_2 \nu_{\tau-1} + (1 - \beta_2)\, \left(\nabla_\theta J(\theta)\right)^2
The bias-corrected estimates for μ τ and ν τ are given by Equations (12) and (13).
\hat{\mu}_\tau = \frac{\mu_\tau}{1 - \beta_1^\tau}
\hat{\nu}_\tau = \frac{\nu_\tau}{1 - \beta_2^\tau}
Finally, the weights ω are updated using Equation (14), where ϵ is a small constant for numerical stability.
\omega = \omega - \alpha\, \frac{\hat{\mu}_\tau}{\sqrt{\hat{\nu}_\tau} + \epsilon}
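A compact NumPy sketch of a single ADAM step, following Equations (10)-(14) with the decay rates stated above, is given below; the learning rate and epsilon are illustrative defaults.

```python
# One ADAM update step (Equations (10)-(14)); beta1 = beta2 = 0.90 as above.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.90, b2=0.90, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                   # Equation (10)
    v = b2 * v + (1 - b2) * grad ** 2              # Equation (11)
    m_hat = m / (1 - b1 ** t)                      # Equation (12)
    v_hat = v / (1 - b2 ** t)                      # Equation (13)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)    # Equation (14)
    return w, m, v
```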

4.4.2. Learning Curve

The training dataset contains 99,948 instances. To enhance training efficiency, mini-batches of size 64 are utilized. The learning curve, depicting the progress of both accuracy and loss for training and validation, is presented in Figure 6. It showcases the network’s performance over 50 epochs with 111,550 iterations, taking approximately 8 h and 27 min to complete. The validation accuracy achieved is 97.48%, with a validation loss of 2.82%, while the training accuracy and loss are 95.15% and 4.85%, respectively.

4.4.3. DDC Algorithm

During the development phase of the RAP-Optimizer, it was observed that the network overfits after training. The number of hidden layers and the number of nodes in the existing layers were reduced to minimize overfitting; after this modification, the network exhibited underfitting characteristics. Table 3 shows the different configurations explored during development and the characteristics of the network along with other parameters. To ensure balanced fitting, the innovative DDC algorithm, presented as Algorithm 1, was developed and integrated into the network. It gradually discovers the optimal dropout rate for the different hidden layers and balances an overfitting network.
The initial dropout rate p_0 and the threshold δ were chosen based on empirical testing. Starting with a moderate value for p_0, set to 0.2, provided a balanced starting point that allowed sufficient dropout without a sharp reduction in performance. This value was selected after trials indicated that lower dropout rates (e.g., below 0.1) had limited impact on overfitting, while higher rates (e.g., above 0.3) led to underfitting. The threshold δ, which determines the acceptable accuracy gap between training and validation, was set based on performance analysis; a value of 0.02 (or 2%) was found effective in preventing overfitting without sacrificing validation performance. These parameters are adjustable to account for specific data characteristics and can be tuned further as required by different workloads or applications.
Algorithm 1 Dynamic dropout control (DDC) algorithm
1: Input: Trained network N, training set X_train, validation set X_val, initial dropout rate p_0, number of layers L, dropout increment Δp, threshold δ
2: Output: Modified network with minimized overfitting
3: Initialize p ← p_0
4: Compute initial accuracies Acc_train, Acc_val
5: Δ ← Acc_train − Acc_val
6: while Δ > δ do
7:     for l = 1 to L do
8:         if first iteration then
9:             h^[l] ← Dropout(h^[l], p)
10:        else
11:            Increase dropout: p ← p + Δp
12:            h^[l] ← Dropout(h^[l], p)
13:        end if
14:    end for
15:    Recompute Acc_train, Acc_val
16:    Update Δ ← Acc_train − Acc_val
17: end while
18: Return N with reduced overfitting
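A Python sketch of how Algorithm 1 could be applied to the PyTorch model sketched in Section 4.3 is shown below. For brevity it raises a single shared dropout rate for all nn.Dropout modules, whereas Algorithm 1 adjusts the rate layer by layer; the evaluate callback and the p_max guard are assumptions added for the sketch.

```python
# Sketch of the DDC loop (Algorithm 1): raise dropout until the
# train/validation accuracy gap falls below the threshold delta.
import torch.nn as nn

def dynamic_dropout_control(model, evaluate, p0=0.2, dp=0.05,
                            delta=0.02, p_max=0.5):
    # `evaluate` is an assumed callback that retrains/fine-tunes the model
    # with its current dropout rates and returns (train_acc, val_acc).
    p = p0
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = p
    train_acc, val_acc = evaluate(model)
    while (train_acc - val_acc) > delta and p < p_max:
        p += dp                                    # increase dropout rate
        for m in model.modules():
            if isinstance(m, nn.Dropout):
                m.p = p
        train_acc, val_acc = evaluate(model)
    return model
```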

4.4.4. Achieving Optimal Unbiased Performance with DDC

The dynamic dropout control (DDC) algorithm is designed to maintain an optimal unbiased performance by dynamically adjusting dropout rates to mitigate overfitting or underfitting. Unbiased performance is defined by balanced training and validation accuracy, where the gap between these accuracies remains within 2–3%. Overfitting is indicated when training accuracy significantly exceeds validation accuracy, while underfitting is evident when both accuracies are low. The DDC algorithm progressively adjusts dropout rates for each layer, reducing the gap between training and validation accuracy and achieving a stable performance level where neither overfitting nor underfitting dominates.

4.4.5. VM Resource Requirements for AIaaS Workloads

The AIaaS application’s resource indicators are determined by the processing demands of each incoming task. For image enhancement applications, the system assesses metrics such as image size (in pixels) and required processing time to establish optimal VM configurations. VMs with varying CPU, memory, and storage allocations are configured to meet these processing requirements efficiently. For instance, tasks involving high-resolution images may require up to 8 CPU cores and 64 GB of memory, while lower-resolution tasks may only require 2 cores and 16 GB of memory. This dynamic configuration ensures that the RAP-Optimizer adapts VM allocations according to workload intensity, promoting efficient resource usage even as demand fluctuates.
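As a concrete illustration of this mapping, the sketch below translates image size into a VM configuration tier; the pixel thresholds and the middle tier are hypothetical, and only the 8-core/64 GB and 2-core/16 GB tiers come from the description above.

```python
# Hypothetical mapping from task size to VM configuration tier.
def vm_config_for(image_pixels: int) -> dict:
    if image_pixels > 8_000_000:        # e.g., high-resolution (> 8 MP)
        return {"vcpu": 8, "ram_gb": 64}
    if image_pixels > 2_000_000:        # mid-sized tasks (assumed tier)
        return {"vcpu": 4, "ram_gb": 32}
    return {"vcpu": 2, "ram_gb": 16}    # low-resolution tasks
```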

4.5. RAP-Optimizer

The proposed RAP-Optimizer is an innovative approach that starts with resource analysis. After that, the resource space landscape is generated to evaluate the resource distribution. Finally, it combines the DNN with simulated annealing optimization to optimize cloud resources without compromising service quality.

4.5.1. Resource Analysis

The RAP-Optimizer starts with resource analysis, performed using the resource analyzer (RAN) algorithm, presented as Algorithm 2 and developed in this study. The RAN algorithm requires access and scanning permission for both virtual machines (VMs) and their physical hosts. Initially, it scans the entire system, identifies the active VMs through their unique IDs (VIDs), retrieves their current configurations, and fetches their resource consumption history. The resource consumption history is stored in the VM log, which is fetched by the res_hist(vm) function. After that, it explores the physical hosts supporting the VMs and performs similar operations on them. Finally, it scans the idle physical hosts. The RAN algorithm maintains a resource status table (RT), which is frequently updated. Based on the resource consumption, it generates a resource space (RS) represented by bar graphs.
Algorithm 2 Resource analyzer (RAN) algorithm
1: procedure RAN_Optimizer
2:     Input: Access permissions for VMs and physical hosts
3:     Output: RS, updated RT
4:     Initialize: RT ← Init(RoutingTable)
5:     Scan system and fetch VMs and Hosts
6:     for vm ∈ VMs do                        ▹ For each VM
7:         Identify active VMs: vm ← ID(vm)
8:         Get VM configurations: cfg(vm)
9:         Fetch resource history: res_hist(vm)
10:    end for
11:    for host ∈ Hosts do                    ▹ For each host
12:        Get host configurations: cfg(host)
13:        Fetch resource history: res_hist(host)
14:    end for
15:    Scan idle hosts: idle_hosts ← Scan(Hosts)
16:    Update: RT ← Update(cfg, res_hist)
17:    Generate: RS(VID, Resource)
18:    return RS, RT                          ▹ Return both Resource Space and Routing Table
19: end procedure
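The following Python sketch mirrors the RAN pass in Algorithm 2. The cloud client object and its methods (list_vms(), list_hosts(), config(), resource_history(), is_idle()) are placeholders for whatever data-center API is available, not a specific provider interface.

```python
# Sketch of the RAN pass (Algorithm 2) over a placeholder cloud client.
def run_ran(cloud):
    resource_table = {}
    for vm in cloud.list_vms():                       # active VMs
        resource_table[vm.vid] = {
            "config": cloud.config(vm),
            "history": cloud.resource_history(vm),
            "host": vm.host_id,
        }
    host_table = {
        h.hid: {"config": cloud.config(h),
                "history": cloud.resource_history(h),
                "idle": cloud.is_idle(h)}
        for h in cloud.list_hosts()
    }
    # Resource space: latest utilization snapshot per host for the optimizer.
    resource_space = {hid: (info["history"][-1] if info["history"] else None)
                      for hid, info in host_table.items()}
    return resource_space, resource_table
```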

4.5.2. Resource Space Landscape (RSL)

The RAN algorithm returns the RS and RT, where the VM configurations, current resource consumption rates, IDs of the VMs (VIDs), corresponding physical hosts, and other relevant data are available. These data are used to prepare the RSL, showing the active VMs in the physical hosts. The RSL of a random instance is illustrated in Figure 7. The data center offering the AIaaS consists of 16 physical hosts in each server rack, and the experimental environment has 3 server racks. According to the organization’s policy, a maximum of 10 VMs are allowed concurrently on a single host to maintain the terms of service (TOS). Each physical host is powered by a 10-core CPU with 128 GB of primary memory.

4.5.3. Deep-Annealing Algorithm

The RAP-Optimizer combines the DNN presented in Section 4.3 with the simulated annealing algorithm; the combined procedure is named deep-annealing and presented in Algorithm 3. The process starts by connecting to the AIaaS Data Center (AIDC) with permission to access the status of all physical and virtual devices. It applies the RAN algorithm to retrieve the resource space (RS) and resource table (RT), which include the VM configurations, current resource usage, and corresponding physical hosts. These outputs provide a comprehensive view of the system’s resource space (RSS). After that, the deep-annealing algorithm utilizes the DNN to predict the VM configuration and respond to the request for subsequent AI services. The deep-annealing algorithm identifies hosts for VM deployment. It explores potential hosts iteratively, evaluating the cost and objective (obj) functions to select a suitable host. The cost function seeks the lowest possible value, whereas the role of the obj function is to find the highest possible value on the resource space. The process begins at a high-temperature state, allowing probabilistic acceptance of less optimal solutions to escape local minima. The HasCapacity() function is used to check whether the current host has sufficient resources for the new VM whose configuration was predicted by the DNN. When sufficient resources are available, the CreateInstance() function is used to deploy the VM. It may happen that the current host does not have enough resources; in this case, the algorithm uses the NextHost() function, which draws from a probability distribution influenced by the current temperature. This process allows a broader exploration of the available configurations.
The proposed method, the RAP-Optimizer, is called periodically at an interval of 15 min. Every time it is called, it monitors the resources and optimizes them. The user pattern shows that the average active period of the users is 15 min, which is why this interval was chosen. This interval ensures that the proposed system maintains real-time adaptation to changing workloads while avoiding excessive computational overhead. Each time the RAP-Optimizer is initialized, it may call the deep-annealing method, depending on the need for resource redistribution. The RAP-Optimizer performs an initial resource analysis, which identifies the underutilized or overloaded hosts. When resource optimization is necessary, the RAP-Optimizer invokes the deep-annealing method, which adjusts the allocation of virtual machines across hosts. As a result, the system minimizes the number of active hosts while maintaining performance requirements.
Algorithm 3 Deep-annealing algorithm
1: AIDC ← Connect(credentials)
2: (RS, RT) ← RAN_Analyzer(AIDC)
3: Ĉ_VM ← DNN(RS, RT)
4: RSS ← RetrieveResourceSpace(RS, RT)
5: Initialize temperature T
6: H_start ← SimulatedAnnealing(RSS, obj, cost, T)
7: while ¬InstanceCreated and T > T_min do
8:     if HasCapacity(H_start, Ĉ_VM) then
9:         CreateInstance(H_start, VM)
10:        InstanceCreated ← True
11:    else
12:        H_start ← NextHost(H_start, RSS, T)
13:        Decrease temperature T ← αT
14:    end if
15: end while
16: for vm ∈ AIDC do
17:    if CanMigrate(vm, H_start) then
18:        Migrate(vm, H_start)
19:    end if
20: end for
If the initially selected host lacks sufficient resources for the VM deployment, NextHost() identifies alternative hosts based on a probability distribution influenced by the current temperature T. This ensures the search process avoids local optima and explores a wider range of configurations. As the temperature gradually decreases with each iteration by a factor α, the exploration process narrows down to focus on optimal hosts for VM deployment. Once a suitable host is found and the VM instance is deployed, the algorithm evaluates potential migrations for existing VMs to optimize overall resource usage. The function CanMigrate() checks whether a VM can be moved to a host that offers better resource efficiency, and if migration is feasible, the Migrate() function performs the transfer. The CanMigrate() function evaluates both the feasibility of migrating a VM and the suitability of potential target hosts. This function checks migration feasibility based on the current resource utilization, performance requirements, and network stability. If a VM’s migration is feasible without disrupting service quality, the function proceeds to identify a suitable host. The target host is selected based on its available CPU, memory, and network capacity, ensuring it meets the requirements of the migrating VM. Additionally, the CanMigrate() function prioritizes hosts with higher resource availability and lower current workloads to maximize efficiency and maintain balanced resource utilization across the data center. The Migrate() function initiates VM migration to optimize resource allocation. To ensure system stability and avoid overloading resources, a limit is applied to the number of simultaneous VM migrations. This cap allows the RAP-Optimizer to manage migrations efficiently, preventing potential performance degradation caused by excessive concurrent migrations. This process ultimately aims to consolidate workloads, improve resource utilization, and reduce the operational costs associated with idle hosts.
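To make the control flow concrete, the condensed Python sketch below captures the annealing loop of Algorithm 3 for placing one DNN-predicted VM. The host objects, the has_capacity and deploy callbacks, the load attribute, and the temperature schedule values are assumptions made for illustration.

```python
# Condensed sketch of Algorithm 3: simulated-annealing host selection
# for a VM whose configuration was predicted by the DNN.
import math
import random

def deep_annealing_place(hosts, vm_cfg, has_capacity, deploy,
                         T=1.0, T_min=0.01, alpha=0.9):
    current = random.choice(hosts)
    while T > T_min:
        if has_capacity(current, vm_cfg):
            deploy(current, vm_cfg)                # CreateInstance()
            return current
        # NextHost(): propose another host; accept a more loaded candidate
        # with a temperature-dependent probability to escape local optima.
        candidate = random.choice(hosts)
        delta = candidate.load - current.load
        if delta < 0 or random.random() < math.exp(-delta / T):
            current = candidate
        T *= alpha                                 # cool down
    return None                                    # no feasible placement
```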

4.5.4. Criteria for Optimal Resource Usage

The RAP-Optimizer establishes optimal cloud resource usage by evaluating key metrics such as CPU, memory, and bandwidth utilization across active hosts. Optimal usage is achieved when resources are maximally utilized without exceeding predefined thresholds that could lead to performance degradation or resource contention. Specifically, CPU and memory utilization levels above 85% but below 95% are considered optimal, ensuring sufficient headroom for unexpected loads while maximizing efficiency. Bandwidth utilization is similarly monitored, with an upper threshold set to prevent network bottlenecks. The RAP-Optimizer dynamically manages these resources to maintain this balance, activating additional hosts only when existing hosts reach their optimal utilization limits.
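An illustrative activation rule following this policy is sketched below; the attribute names and the decision to require every active host to reach the upper bound are assumptions.

```python
# Illustrative host-activation rule for the 85-95% utilization band:
# activate an additional host only when every active host has reached
# the upper edge of the optimal band on CPU or memory.
def needs_new_host(active_hosts, upper=0.95):
    return all(h.cpu_util >= upper or h.mem_util >= upper
               for h in active_hosts)
```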

4.6. Mobility-Aware Resource Allocation

To optimize routing costs and enhance the efficiency of RAP-Optimizer in resource-limited networks, we introduce a mobility-aware component to the resource allocation process. This enhancement considers not only the current utility of host servers but also their physical proximity to service requesters. Inspired by recent advancements in mobility-aware routing, as discussed in [49], this modification enables RAP-Optimizer to prioritize host servers that are geographically closer to service requesters, thereby reducing the overall routing costs.
Specifically, this approach calculates the distance-based routing cost R_c using Equation (15), where d_ij represents the distance between the service requester i and the host server j, and w is a weighting factor that adjusts the impact of distance on the overall resource allocation cost:
R_c = w \times d_{ij}
During the request allocation phase, the RAP-Optimizer now evaluates both resource availability and routing cost, selecting host servers with low utilization that are also in close proximity to the requester, thereby minimizing latency and operational costs. This adjustment ensures that the RAP-Optimizer remains effective even under resource constraints, maintaining performance gains while addressing network distance considerations.
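A minimal sketch of this mobility-aware selection step is given below, combining host utilization with the routing cost of Equation (15); the scoring function, the weight value, and the host attributes are illustrative assumptions.

```python
# Sketch: pick the host with the best combination of low utilization and
# low distance-based routing cost (Equation (15)).
def select_host(hosts, requester_location, distance, w=0.5):
    def score(host):
        routing_cost = w * distance(requester_location, host.location)  # Eq. (15)
        return host.utilization + routing_cost   # favor low-load, nearby hosts
    return min(hosts, key=score)
```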

5. Experimental Result and Evaluation

5.1. Evaluation Metrics

The overall performance of the RAP-Optimizer depends on the prediction quality of the DNN developed in this study. The performance of this DNN was evaluated using accuracy, precision, recall, and F1 score. These are the most frequently used evaluation metrics for machine learning approaches [50]. These metrics are defined in Equations (16), (17), (18), and (19), respectively, which are dependent on true positive (TP), true negative (TN), false positive (FP), and false negative (FN). This study retrieves these values from the confusion matrix illustrated in Figure 8.
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\text{Precision} = \frac{TP}{TP + FP}
\text{Recall} = \frac{TP}{TP + FN}
F_1\ \text{Score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}
Apart from ML performance evaluation metrics, the RAP-Optimizer performance was evaluated based on the capability of reducing the number of active hosts, improving API request management, and resource optimization [11,51]. Moreover, the overall objective achievement was used as an evaluation metric as well.
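For reference, the sketch below derives these metrics from a multi-class confusion matrix; overall accuracy is computed as the trace over the total count, the multi-class counterpart of Equation (16).

```python
# Per-class precision/recall/F1 and overall accuracy from a confusion
# matrix (rows = true class, columns = predicted class).
import numpy as np

def classification_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    precision = tp / (tp + fp)                          # Equation (17)
    recall = tp / (tp + fn)                             # Equation (18)
    f1 = 2 * precision * recall / (precision + recall)  # Equation (19)
    accuracy = tp.sum() / cm.sum()                      # cf. Equation (16)
    return accuracy, precision, recall, f1
```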

5.2. Confusion Matrix Analysis

The confusion matrix analysis reveals the performance of the classification model across the five target classes: Basic, Standard, Intermediate, Advanced, and Premium, with an overall accuracy of 96.1%. The precision values vary slightly among the classes, with Basic and Premium achieving the highest precision at 97.7%, followed by Intermediate at 97.4%, Advanced at 97.3%, and Standard at 96.9%. Recall values indicate how well the model identifies each class, with Premium leading at 98.0%, followed by Intermediate at 97.4%, Basic at 97.3%, and Standard and Advanced at 97.2%. The F1 scores, which balance precision and recall, are consistent across the classes, with Basic at 97.5%, Standard at 97.0%, Intermediate at 97.4%, Advanced at 97.3%, and Premium at 97.8%. These metrics suggest that the model performs well overall, with only slight variations between classes in how accurately they are classified and identified.

5.3. K-Fold Cross-Validation

K-fold cross-validation is a robust method used to evaluate the performance of machine learning models by splitting the dataset into multiple folds. In this analysis, six folds were used, as shown in Table 4, with slight variations observed across each fold for metrics such as accuracy, precision, recall, and F1 score. The average values were calculated across all folds to provide a reliable overall assessment. Accuracy remained consistent, ranging from 96.0% to 96.4%, while precision ranged between 97.3% and 97.7%. Recall and the F1 score also showed slight variations, with recall values between 96.8% and 97.6% and F1 scores between 97.0% and 97.3%. The spider chart in Figure 9 visually represents the performance across the metrics, emphasizing the stability and effectiveness of the model across the different folds, providing confidence in its generalizability.

5.4. Active Physical Host

Table 5 presents a comparative analysis of the number of active hosts per 24 h before and after the implementation of the deep-annealing algorithm over a twelve-week period. The results presented in Table 5 correspond to the input activity measured in terms of API requests and application load handled by the AIaaS platform. Specifically, these results reflect the distribution and handling of a standardized set of API calls generated by the applications hosted on VMs. Each VM instance processed a varying number of requests depending on its allocated resources and current workload, and the table captures the system’s performance in terms of resource utilization and API handling efficiency both before and after applying the RAP-Optimizer. The input activities were kept consistent across evaluations to ensure a fair comparison between the baseline and optimized states. Before the introduction of the algorithm, the average number of active hosts per day was recorded at 33. With the deployment of the deep-annealing algorithm, this average was reduced to 28 active hosts per day, signifying an average reduction of 5 hosts. This demonstrates the efficacy of the deep-annealing algorithm in optimizing resource utilization within the data center environment.
A closer examination, illustrated in Figure 10, reveals a fluctuating yet consistently positive impact of the deep-annealing algorithm across the weeks. The most notable improvement was observed in the second week, where the number of active hosts was reduced by 7, from 29 without the algorithm to 22 with it. The least improvement was seen in weeks 3 and 4, where a reduction of 4 hosts was achieved. Weeks 1, 5, 6, 9, and 11 saw a reduction of 6 hosts, while weeks 7, 8, 10, and 12 observed a decrease of 5 hosts. These results underscore the capacity of the deep-annealing algorithm to effectively manage and reduce the number of active hosts required to support the operational demands of a SaaS application data center.

5.5. Request Optimization

Table 6 presents the resource utilization and API request handling before and after implementing the proposed RAP-Optimizer. Initially, all physical hosts were active, handling a random number of API requests, with some hosts underutilized in terms of CPU and RAM. The number of API requests generated follows a Poisson distribution, commonly used to model request rates in cloud environments, with a mean parameter λ = 15, which represents the average number of API requests per host per time unit. This distribution was chosen to simulate typical variations in API traffic experienced in the experimental cloud-based AIaaS environment [52]. After applying the RAP-Optimizer, the system efficiently allocated API requests to fewer hosts, maximizing resource utilization before activating additional hosts. As a result, five physical hosts were placed into idle mode, conserving resources and reducing energy consumption. The remaining hosts handled the same workload but at higher efficiency, utilizing their CPU and RAM to near full capacity. Before the method was implemented, many hosts processed fewer API requests, with CPU cores and RAM underutilized. For example, hosts 4, 6, and 10 handled only a fraction of their capacity, running only 3 to 7 API requests, leaving a significant portion of resources idle. After applying the deep-annealing method, these API requests were consolidated onto hosts with higher resource availability, allowing some physical hosts to transition into idle mode. This approach optimized resource utilization across the active hosts while reducing operational overhead. The results, as shown in Table 6, demonstrate that the proposed system can handle the same number of API requests using fewer physical hosts, which leads to significant resource savings and improved data center efficiency.
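The workload model can be reproduced with a short NumPy sketch, shown below; the host count of 48 follows the 3-rack, 16-host layout described in Section 4.5.2, and the random seed is arbitrary.

```python
# Simulated per-host API request arrivals, Poisson with lambda = 15.
import numpy as np

rng = np.random.default_rng(seed=7)
requests_per_host = rng.poisson(lam=15, size=48)   # 48 hosts = 3 racks x 16
print(requests_per_host.mean(), requests_per_host.max())
```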

5.6. Resource Optimization

The RAP-Optimizer demonstrates significant resource optimization potential. Figure 11 shows the before and after effect comparison. Initially, before implementing the RAP-Optimizer, the server costs steadily increased from USD 200 to USD 2600 over the 12 months, as the system inefficiently activated additional hosts to handle the growing number of API requests. This led to underutilized resources and inflated operational costs. After applying the proposed RAP-Optimizer, the server cost was reduced each month, stabilizing at around USD 1250 by the 12th month. This reduction is attributed to the system’s ability to consolidate API requests to fully utilize existing active hosts before activating new ones. As a result, unnecessary activation of hosts was minimized, leading to significant cost savings.
Despite the optimization, the application’s return remained consistent, showing that user demand and subscription revenues were unaffected by the changes. The results also show a clear increase in the revenue margin after optimization: from an initial margin of USD 300 in the first month, the margin grew to USD 1750 by the 12th month. This improvement demonstrates the effectiveness of the proposed system in reducing operational costs while maintaining consistent service delivery, thus ensuring the profitability and sustainability of the AI-driven image enhancement web application over time. By efficiently managing resources and reducing the number of active hosts, the system successfully increases profit margins, even as server usage grows.

5.7. Objective Achievement

The primary objective of this study is to increase the profit margin. After implementing the proposed RAP-Optimizer, server cost, return, and revenue margin were observed for 12 months. During this observation period, the existing approach and the system with RAP-Optimizer were active simultaneously. The number of users allowed to use the optimized system was kept similar to the existing system for fair comparison. The data observed over the 12-month period are listed in Table 7. After using the RAP-Optimizer, on average, the server cost was reduced by approximately 45%, dropping from an initial USD 2600 to a stable USD 1250 by the 12th month. This resulted in an average monthly cost reduction of USD 1150. In parallel, the profit margin increased from an average of USD 600 per month before optimization to USD 1675 after optimization, reflecting a 179% increase in profit over the 12-month period. Additionally, the reduction in the number of active hosts, as shown in Table 6, led to more efficient resource utilization, ensuring that the same number of API requests were handled with fewer servers, further driving operational cost savings and profit maximization.

5.8. Case Study Analysis

To highlight the RAP-Optimizer’s performance, we conducted a case study comparing its efficiency with traditional methods, such as Kubernetes’ Horizontal Pod Autoscaler (HPA) [53] and Amazon Web Services (AWS) Auto Scaling [54], within high-demand periods for a large-scale AI image processing application. During peak loads, RAP-Optimizer reduced the number of active physical hosts by 20% in comparison to Kubernetes HPA and AWS Auto Scaling, while ensuring sub-100 ms response times. Additionally, RAP-Optimizer achieved a 28% reduction in server costs over AWS Auto Scaling, attributed to its predictive configuration adjustment model, which proactively manages resources to match API request patterns. The RAP-Optimizer’s approach reduced latency by an average of 15%, leveraging real-time workload analysis and minimizing idle resource allocations, outperforming existing methods like the Google Kubernetes Engine’s (GKE) default resource scaling by dynamically consolidating workloads [55]. These results demonstrate the RAP-Optimizer’s unique capability to handle dynamic and unpredictable workloads typical of AIaaS environments, achieving operational savings, enhanced resource utilization, and improved performance beyond what is currently possible with industry-standard tools.

5.9. Comparative Performance Evaluation

The performance of the proposed RAP-Optimizer was compared with CEDULE+, which is considered a state-of-the-art method [56]. This quantitative performance analysis establishes a benchmark for gauging the level of improvement the proposed RAP-Optimizer can introduce. The experimental data are presented in Table 8. This comparison summarizes the quantitative results, comparing the RAP-Optimizer with CEDULE+ across multiple key performance metrics: cost savings, reduction in active hosts, prediction accuracy, and response time. Both systems were deployed under similar conditions so that the performance data remained unbiased. The experiment was conducted in a dynamic workload environment representative of typical AIaaS scenarios.
The experimental data in Table 8 show that the proposed RAP-Optimizer saves 13% of costs compared to CEDULE+. The main contributor to this improvement is the effective consolidation of API requests onto fewer active hosts offered by the RAP-Optimizer. This method reduces the overall energy consumption, which in turn reduces cost. Moreover, the experimental data show that the proposed RAP-Optimizer reduces the number of active hosts by an additional 10%. From the response-time perspective, the RAP-Optimizer is more responsive than CEDULE+. Overall, the proposed RAP-Optimizer demonstrates better performance in the experimental environment.

5.10. Mobility-Aware Performance Comparison

The mobility-aware routing enhancement was integrated into the proposed RAP-Optimizer to ensure network resource optimization. In this approach, the distance between the host servers and the service requesters is considered, since reducing the routing cost is another way of optimizing AIaaS resources [57]. The performance of the mobility-aware approach is listed in Table 9, which provides a summary of the experimental results under similar conditions. The key metrics of this analysis are cost savings, reduction in active hosts, prediction accuracy, response time, and routing cost. The experimental data show that the mobility-aware RAP-Optimizer improves performance: it reduces the response time and lowers the routing cost. To be precise, according to the experimental data, the distance-aware approach achieved a 15% reduction in routing costs and an 18% improvement in response time compared to the standard RAP-Optimizer.
The experimental results presented in Table 9 indicate that the mobility-aware RAP-Optimizer dynamically routes requests to nearby host servers to minimize network overhead. That is why the network resources are optimized when this enhancement is incorporated into the proposed RAP-Optimizer. This enhanced capability helps maintain a high level of efficiency, particularly for AI-driven applications with unpredictable and geographically diverse workloads.

6. Limitations and Future Scope

While the proposed RAP-Optimizer has shown promising results in reducing operational costs and optimizing resource utilization, there are several areas where the approach can be further improved. This section outlines the key limitations of the current system and proposes future directions to enhance its functionality, scalability, and efficiency in handling cloud-based AI services.

6.1. Resource Utilization Focus

One limitation of the proposed RAP-Optimizer is that it primarily focuses on optimizing CPU and memory utilization, while other critical factors, such as network bandwidth, disk I/O, and latency, are not fully considered. These elements influence overall system performance, and their impact becomes more pronounced as traffic volume increases. One way to overcome this limitation is to expand the dataset features to incorporate these elements; the scope of this extension will be explored in a subsequent version of this study.
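Purely as an illustration of the proposed feature expansion, the sketch below appends hypothetical bandwidth, disk I/O, and latency columns to an existing numeric feature table and reuses Z-score normalization so that the new columns share a common scale. The column names are invented for this example and are not part of the study’s dataset.

import pandas as pd

def extend_and_normalize(base: pd.DataFrame, extra: pd.DataFrame) -> pd.DataFrame:
    # Append the additional utilization signals to the existing feature table
    # (hypothetical column names) and apply Z-score normalization column-wise.
    merged = base.join(extra[["net_bandwidth_mbps", "disk_io_iops", "latency_ms"]])
    return (merged - merged.mean()) / merged.std()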

6.2. Uniform API Request Handling

The AIaaS application used in this experiment offers image processing services only; consequently, the cloud servers have no configuration variations. Applications that offer multi-modal services, such as audio and video processing, depend on multiple hardware configurations at the back end. Selecting the appropriate hardware to which an API call should be forwarded therefore requires an additional logical layer, which the proposed RAP-Optimizer does not incorporate. This is another limitation of the approach. A potential solution is to categorize API requests based on their nature, which will be explored in future work on this method.
The RAP-Optimizer assumes that all API requests can be distributed uniformly across available hosts, without accounting for hardware differences or the geographical location of hosts, both of which may affect response times and performance. Addressing this limitation could involve enhancing the system to factor in host-specific attributes, such as hardware capabilities and geographic proximity, allowing smarter resource allocation that reduces latency and improves service quality.
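One possible shape of the missing logical layer is sketched below: requests are first categorized, then matched to hosts by hardware capability and region before best-fit packing. The request categories, the HostProfile fields, and the GPU requirement table are hypothetical assumptions for illustration, not part of the evaluated system.

from dataclasses import dataclass

@dataclass
class HostProfile:
    host_id: int
    has_gpu: bool
    region: str
    free_cores: int

# Hypothetical categories for a multi-modal AIaaS back end.
NEEDS_GPU = {"image": False, "audio": False, "video": True}

def pick_host(category: str, client_region: str, hosts: list) -> HostProfile:
    # Filter by hardware requirement, prefer hosts in the requester's region,
    # then use best-fit packing on the remaining capacity.
    needs_gpu = NEEDS_GPU.get(category, False)
    eligible = [h for h in hosts if h.free_cores > 0 and (h.has_gpu or not needs_gpu)]
    if not eligible:
        raise RuntimeError("no eligible host for this request category")
    local = [h for h in eligible if h.region == client_region]
    pool = local or eligible
    return min(pool, key=lambda h: h.free_cores)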

6.3. Predictive Optimization Approach

The current version of the proposed RAP-Optimizer follows a predictive approach, so its overall effectiveness depends on the accuracy of the prediction. A major weakness of this design is that when a configuration is predicted incorrectly, the RAP-Optimizer cannot handle the resulting workload properly. One solution is to incorporate the false positive rate into the decision-making process; the potential of this solution will be explored in the future scope of this study.
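A simple way to guard against misprediction, sketched below under assumed values, is to accept the DNN’s predicted configuration only when its softmax confidence clears a threshold and otherwise fall back to a conservative class. The threshold of 0.8 and the choice of Premium as the fallback are illustrative assumptions; only the five class names come from the dataset description in this paper.

import numpy as np

CLASSES = ["Basic", "Standard", "Intermediate", "Advanced", "Premium"]

def choose_configuration(softmax_probs, threshold=0.8, fallback="Premium"):
    # Trust the DNN only when its top softmax probability clears the threshold;
    # otherwise fall back to a conservative, over-provisioned configuration.
    probs = np.asarray(softmax_probs)
    top = int(np.argmax(probs))
    return CLASSES[top] if probs[top] >= threshold else fallback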

6.4. Single-Cloud Focus

Multi-cloud and hybrid architectures offer more flexibility and a more cost-effective structure. However, the AIaaS application studied here runs on a single-cloud architecture, so the entire experiment was conducted on it. The findings are therefore relevant, and the proposed approach is suitable, for single-cloud architectures only, which is one of the weaknesses of the proposed system. Nevertheless, a large number of AIaaS applications run on single-cloud architectures, which keeps the proposed approach feasible and effective. A subsequent version of this work will focus on multi-cloud and hybrid-cloud architectures.

7. Conclusions

The RAP-Optimizer presented in this paper demonstrates an effective way to overcome the challenges of resource utilization and cost optimization in AIaaS applications. Although it focuses on an image enhancement application only, this study provides a blueprint for optimization techniques applicable to similar applications. It also demonstrates an innovative application of the combination of a DNN and the simulated annealing algorithm to cloud resource optimization. The experimental results show significant reductions in the number of active physical hosts and in server costs without compromising service quality. Over a twelve-month observation period, the RAP-Optimizer achieved an average reduction of 45% in server costs and a 179% increase in profit margins, clearly highlighting the practical benefits of the system. The DNN designed and trained in this study predicts the resource configuration and maintains high classification accuracy across all five service classes; the confusion matrix shows a validation accuracy of 97.48%. K-fold cross-validation confirmed that the prediction performance remained consistent across all folds, indicating that the DNN is accurate, well trained, and generalizes well. Moreover, the introduction of the dynamic dropout control (DDC) algorithm helped mitigate overfitting. Combining the simulated annealing algorithm with the DNN unlocked a new dimension of AIaaS cloud resource optimization in which cloud resources are managed dynamically. The optimizer effectively reduces the number of active hosts, maximizing the utilization of existing resources without activating new physical hosts and thereby saving a significant amount of resources and energy. This efficiency translates directly into improved sustainability and cost-effectiveness for AIaaS applications.
The innovative design, effective performance, and real-world impact on profit margins in AIaaS application businesses demonstrate the potential of the proposed RAP-Optimizer. Despite these advantages, the RAP-Optimizer has certain limitations: it considers only CPU and memory utilization, it lacks heterogeneous API request management, and the experiments were conducted in a single-cloud environment. In conclusion, the RAP-Optimizer provides a scalable, efficient, and cost-effective solution for resource management in cloud-based AI services, offering substantial improvements in operational efficiency, energy savings, and profit margins. As AI-driven applications continue to grow in scale and complexity, solutions like the RAP-Optimizer will play a crucial role in ensuring that cloud resources are utilized to their fullest potential, paving the way for more sustainable and profitable cloud-based business models.

Author Contributions

Conceptualization, K.S. and S.A.; methodology, R.A. and A.V.; software, K.S. and A.V.; validation, K.S., R.A. and S.A.; formal analysis, K.S. and R.A.; investigation, R.A. and A.V.; resources, S.A.; data curation, R.A.; writing—original draft preparation, K.S. and R.A.; writing—review and editing, A.V. and S.A.; visualization, K.S. and S.A.; supervision, S.A.; project administration, K.S. and R.A.; funding acquisition, S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data utilized in this study were generated from an AIaaS application and are considered confidential. However, the data can be made available upon reasonable request. Requesters must agree to the terms and conditions, which stipulate that the data cannot be used for any commercial or research purposes without prior permission from the owner.

Conflicts of Interest

Author Kaushik Sathupadi was employed by Google LLC, author Ramya Avula was employed by Carelon Research, author Arunkumar Velayutham was employed by Intel, and author Sandesh Achar was employed by Walmart Global Tech. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Deng, S.; Zhao, H.; Huang, B.; Zhang, C.; Chen, F.; Deng, Y.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Cloud-native computing: A survey from the perspective of services. Proc. IEEE 2024, 112, 12–46. [Google Scholar] [CrossRef]
  2. Tuli, S.; Mirhakimi, F.; Pallewatta, S.; Zawad, S.; Casale, G.; Javadi, B.; Yan, F.; Buyya, R.; Jennings, N.R. AI augmented Edge and Fog computing: Trends and challenges. J. Netw. Comput. Appl. 2023, 216, 103648. [Google Scholar] [CrossRef]
  3. Badshah, A.; Ghani, A.; Siddiqui, I.F.; Daud, A.; Zubair, M.; Mehmood, Z. Orchestrating model to improve utilization of IaaS environment for sustainable revenue. Sustain. Energy Technol. Assess. 2023, 57, 103228. [Google Scholar] [CrossRef]
  4. Horchulhack, P.; Viegas, E.K.; Santin, A.O.; Ramos, F.V.; Tedeschi, P. Detection of quality of service degradation on multi-tenant containerized services. J. Netw. Comput. Appl. 2024, 224, 103839. [Google Scholar] [CrossRef]
  5. Miikkulainen, R.; Liang, J.; Meyerson, E.; Rawal, A.; Fink, D.; Francon, O.; Raju, B.; Shahrzad, H.; Navruzyan, A.; Duffy, N.; et al. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing; Elsevier: Amsterdam, The Netherlands, 2024; pp. 269–287. [Google Scholar]
  6. Pardalos, P.M.; Mavridou, T.D. Simulated annealing. In Encyclopedia of Optimization; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–3. [Google Scholar]
  7. Zhou, G.; Tian, W.; Buyya, R.; Xue, R.; Song, L. Deep reinforcement learning-based methods for resource scheduling in cloud computing: A review and future directions. Artif. Intell. Rev. 2024, 57, 124. [Google Scholar] [CrossRef]
  8. Mohammadzadeh, A.; Chhabra, A.; Mirjalili, S.; Faraji, A. Use of whale optimization algorithm and its variants for cloud task scheduling: A review. In Handbook of Whale Optimization Algorithm; Elsevier: Amsterdam, The Netherlands, 2024; pp. 47–68. [Google Scholar]
  9. Musabimana, B.B.; Bucaioni, A. Integrating AIaaS into Existing Systems: The Gokind Experience. In Proceedings of the International Conference on Information Technology-New Generations; Springer: Berlin/Heidelberg, Germany, 2024; pp. 417–426. [Google Scholar]
  10. Kurian, A.M.; Onuorah, M.J.; Ammari, H.M. Optimizing Coverage in Wireless Sensor Networks: A Binary Ant Colony Algorithm with Hill Climbing. Appl. Sci. 2024, 14, 960. [Google Scholar] [CrossRef]
  11. Faruqui, N.; Yousuf, M.A.; Kateb, F.A.; Hamid, M.A.; Monowar, M.M. Healthcare As a Service (HAAS): CNN-based cloud computing model for ubiquitous access to lung cancer diagnosis. Heliyon 2023, 9, e21520. [Google Scholar] [CrossRef]
  12. Hossen, R.; Whaiduzzaman, M.; Uddin, M.N.; Islam, M.J.; Faruqui, N.; Barros, A.; Sookhak, M.; Mahi, M.J.N. Bdps: An efficient spark-based big data processing scheme for cloud fog-iot orchestration. Information 2021, 12, 517. [Google Scholar] [CrossRef]
  13. Achar, S.; Faruqui, N.; Bodepudi, A.; Reddy, M. Confimizer: A novel algorithm to optimize cloud resource by confidentiality-cost trade-off using bilstm network. IEEE Access 2023, 11, 89205–89217. [Google Scholar] [CrossRef]
  14. Kumar, P.; Kumar, R. Issues and challenges of load balancing techniques in cloud computing: A survey. ACM Comput. Surv. (CSUR) 2019, 51, 1–35. [Google Scholar] [CrossRef]
  15. Xu, W.; Jang-Jaccard, J.; Singh, A.; Wei, Y.; Sabrina, F. Improving performance of autoencoder-based network anomaly detection on nsl-kdd dataset. IEEE Access 2021, 9, 140136–140146. [Google Scholar] [CrossRef]
  16. Shi, J.; Fu, K.; Wang, J.; Chen, Q.; Zeng, D.; Guo, M. Adaptive QoS-aware Microservice Deployment with Excessive Loads via Intra-and Inter-Datacenter Scheduling. IEEE Trans. Parallel Distrib. Syst. 2024, 35, 1565–1582. [Google Scholar] [CrossRef]
  17. Vuillod, B.; Zani, M.; Hallo, L.; Montemurro, M. Handling noise and overfitting in surrogate models based on non-uniform rational basis spline entities. Comput. Methods Appl. Mech. Eng. 2024, 425, 116913. [Google Scholar] [CrossRef]
  18. Alnagashi, F.A.K.Q.; Rahim, N.A.; Shukor, S.A.A.; Hamid, M.H.A. Mitigating Overfitting in Extreme Learning Machine Classifier Through Dropout Regularization. Appl. Math. Comput. Intell. (AMCI) 2024, 13, 26–35. [Google Scholar]
  19. Salehin, I.; Kang, D.K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics 2023, 12, 3106. [Google Scholar] [CrossRef]
  20. Zhang, Z.; Xu, Z.Q.J. Implicit regularization of dropout. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4206–4217. [Google Scholar] [CrossRef]
  21. Poobalan, A.; Sangeetha, S.; Shanthakumar, P. Performance Optimization and Energy Minimization of Cloud Data Center Using Optimal Switching and Load Distribution Model. Sustain. Comput. Inform. Syst. 2024, 43, 101013. [Google Scholar]
  22. Buyya, R.; Ilager, S.; Arroba, P. Energy-efficiency and sustainability in new generation cloud computing: A vision and directions for integrated management of data centre resources and workloads. Softw. Pract. Exp. 2024, 54, 24–38. [Google Scholar] [CrossRef]
  23. Katal, A.; Choudhury, T.; Dahiya, S. Energy optimized container placement for cloud data centers: A meta-heuristic approach. J. Supercomput. 2024, 80, 98–140. [Google Scholar] [CrossRef]
  24. Mongia, V. EMaC: Dynamic VM Consolidation Framework for Energy-Efficiency and Multi-metric SLA Compliance in Cloud Data Centers. SN Comput. Sci. 2024, 5, 643. [Google Scholar] [CrossRef]
  25. Rajagopalan, A.; Swaminathan, D.; Bajaj, M.; Damaj, I.; Rathore, R.S.; Singh, A.R.; Blazek, V.; Prokop, L. Empowering power distribution: Unleashing the synergy of IoT and cloud computing for sustainable and efficient energy systems. Results Eng. 2024, 21, 101949. [Google Scholar] [CrossRef]
  26. Sun, Y.; Wang, Z.J.; Deveci, M.; Chen, Z.S. Optimal releasing strategy of enterprise software firms facing the competition from cloud providers. Expert Syst. Appl. 2024, 236, 121264. [Google Scholar] [CrossRef]
  27. Khan, A.Q.; Matskin, M.; Prodan, R.; Bussler, C.; Roman, D.; Soylu, A. Cloud storage cost: A taxonomy and survey. World Wide Web 2024, 27, 36. [Google Scholar] [CrossRef]
  28. Nezafat Tabalvandani, M.A.; Hosseini Shirvani, M.; Motameni, H. Reliability-aware web service composition with cost minimization perspective: A multi-objective particle swarm optimization model in multi-cloud scenarios. Soft Comput. 2024, 28, 5173–5196. [Google Scholar] [CrossRef]
  29. Chi, Y.; Dai, W.; Fan, Y.; Ruan, J.; Hwang, K.; Cai, W. Total cost ownership optimization of private clouds: A rack minimization perspective. Wirel. Netw. 2024, 30, 3855–3869. [Google Scholar] [CrossRef]
  30. Moreira, L.F.R.; Moreira, R.; Travençolo, B.A.N.; Backes, A.R. An Artificial Intelligence-as-a-Service Architecture for deep learning model embodiment on low-cost devices: A case study of COVID-19 diagnosis. Appl. Soft Comput. 2023, 134, 110014. [Google Scholar] [CrossRef]
  31. Simic, V.; Stojanovic, B.; Ivanovic, M. Optimizing the performance of optimization in the cloud environment–An intelligent auto-scaling approach. Future Gener. Comput. Syst. 2019, 101, 909–920. [Google Scholar] [CrossRef]
  32. Chen, X.; Wang, H.; Ma, Y.; Zheng, X.; Guo, L. Self-adaptive resource allocation for cloud-based software services based on iterative QoS prediction model. Future Gener. Comput. Syst. 2020, 105, 287–296. [Google Scholar] [CrossRef]
  33. Kirti, M.; Maurya, A.K.; Yadav, R.S. Fault-tolerance approaches for distributed and cloud computing environments: A systematic review, taxonomy and future directions. Concurr. Comput. Pract. Exp. 2024, 36, e8081. [Google Scholar] [CrossRef]
  34. Debinski, M.; Breitinger, F.; Mohan, P. Timeline2GUI: A Log2Timeline CSV parser and training scenarios. Digit. Investig. 2019, 28, 34–43. [Google Scholar] [CrossRef]
  35. Jayaweera, C.; Aziz, N. Reliability of principal component analysis and Pearson correlation coefficient, for application in artificial neural network model development, for water treatment plants. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2018; Volume 458, p. 012076. [Google Scholar]
  36. Faruqui, N.; Yousuf, M.A.; Chakraborty, P.; Hossain, M.S. Innovative automation algorithm in micro-multinational data-entry industry. In Cyber Security and Computer Science: Proceedings of the Second EAI International Conference, ICONCS 2020, Dhaka, Bangladesh, 15–16 February 2020; Proceedings 2; Springer: Berlin/Heidelberg, Germany, 2020; pp. 680–692. [Google Scholar]
  37. Racherla, S.; Sripathi, P.; Faruqui, N.; Kabir, M.A.; Whaiduzzaman, M.; Shah, S.A. Deep-IDS: A Real-Time Intrusion Detector for IoT Nodes Using Deep Learning. IEEE Access 2024, 12, 63584–63597. [Google Scholar] [CrossRef]
  38. Demircioğlu, A. The effect of feature normalization methods in radiomics. Insights Imaging 2024, 15, 2. [Google Scholar] [CrossRef] [PubMed]
  39. Geem, D.; Hercules, D.; Pelia, R.S.; Venkateswaran, S.; Griffiths, A.; Noe, J.D.; Dotson, J.L.; Snapper, S.; Rabizadeh, S.; Rosh, J.R.; et al. Progression of Pediatric Crohn’s Disease Is Associated With Anti–Tumor Necrosis Factor Timing and Body Mass Index Z-Score Normalization. Clin. Gastroenterol. Hepatol. 2024, 22, 368–376. [Google Scholar] [CrossRef] [PubMed]
  40. Trivedi, S.; Patel, N.; Faruqui, N. NDNN based U-Net: An Innovative 3D Brain Tumor Segmentation Method. In Proceedings of the 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 26–29 October 2022; pp. 0538–0546. [Google Scholar]
  41. Ullah, U.; Garcia-Zapirain, B. Quantum machine learning revolution in healthcare: A systematic review of emerging perspectives and applications. IEEE Access 2024, 12, 11423–11450. [Google Scholar] [CrossRef]
  42. Faruqui, N.; Yousuf, M.A.; Whaiduzzaman, M.; Azad, A.; Alyami, S.A.; Liò, P.; Kabir, M.A.; Moni, M.A. SafetyMed: A novel IoMT intrusion detection system using CNN-LSTM hybridization. Electronics 2023, 12, 3541. [Google Scholar] [CrossRef]
  43. Shahiwala, A.F.; Qawoogha, S.S.; Faruqui, N. Designing optimum drug delivery systems using machine learning approaches: A prototype study of niosomes. AAPS PharmSciTech 2023, 24, 94. [Google Scholar] [CrossRef]
  44. Faruqui, N.; Yousuf, M.A.; Whaiduzzaman, M.; Azad, A.; Barros, A.; Moni, M.A. LungNet: A hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data. Comput. Biol. Med. 2021, 139, 104961. [Google Scholar] [CrossRef]
  45. Wang, L.; Ye, W.; Zhu, Y.; Yang, F.; Zhou, Y. Optimal parameters selection of back propagation algorithm in the feedforward neural network. Eng. Anal. Bound. Elem. 2023, 151, 575–596. [Google Scholar] [CrossRef]
  46. Xie, G.; Lai, J. An interpretation of forward-propagation and back-propagation of dnn. In Pattern Recognition and Computer Vision: Proceedings of the First Chinese Conference, PRCV 2018, Guangzhou, China, 23–26 November 2018, Proceedings, Part II 1; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–15. [Google Scholar]
  47. Cong, S.; Zhou, Y. A review of convolutional neural network architectures and their optimizations. Artif. Intell. Rev. 2023, 56, 1905–1969. [Google Scholar] [CrossRef]
  48. Paula, L.P.O.; Faruqui, N.; Mahmud, I.; Whaiduzzaman, M.; Hawkinson, E.C.; Trivedi, S. A novel front door security (FDS) algorithm using GoogleNet-BiLSTM hybridization. IEEE Access 2023, 11, 19122–19134. [Google Scholar] [CrossRef]
  49. Cao, Y.; Maghsudi, S.; Ohtsuki, T.; Quek, T.Q. Mobility-aware routing and caching in small cell networks using federated learning. IEEE Trans. Commun. 2023, 72, 815–829. [Google Scholar] [CrossRef]
  50. Hossain, M.E.; Faruqui, N.; Mahmud, I.; Jan, T.; Whaiduzzaman, M.; Barros, A. DPMS: Data-Driven Promotional Management System of Universities Using Deep Learning on Social Media. Appl. Sci. 2023, 13, 12300. [Google Scholar] [CrossRef]
  51. Kaur, H.; Anand, A. Review and analysis of secure energy efficient resource optimization approaches for virtual machine migration in cloud computing. Meas. Sens. 2022, 24, 100504. [Google Scholar] [CrossRef]
  52. Pavlik, J.; Sobeslav, V.; Horalek, J. Statistics and analysis of service availability in cloud computing. In Proceedings of the 18th International Database Engineering & Applications Symposium, Porto, Portugal, 7–9 July 2014; pp. 310–313. [Google Scholar]
  53. Augustyn, D.R.; Wyciślik, Ł.; Sojka, M. Tuning a Kubernetes Horizontal Pod Autoscaler for Meeting Performance and Load Demands in Cloud Deployments. Appl. Sci. 2024, 14, 646. [Google Scholar] [CrossRef]
  54. Nanthini, N.; Prabha, P.S.; Vidhyasri, R.; Anand, V.V. Fault Tolerance Using AutoScaling in Amazon Web Services. In Proceedings of the 2024 International Conference on Computing and Data Science (ICCDS), Chennai, India, 26–27 April 2024; pp. 1–6. [Google Scholar]
  55. Ali, B.; Golec, M.; Singh Gill, S.; Cuadrado, F.; Uhlig, S. ProKube: Proactive Kubernetes Orchestrator for Inference in Heterogeneous Edge Computing. Int. J. Netw. Manag. 2024, e2298. [Google Scholar] [CrossRef]
  56. Pinciroli, R.; Ali, A.; Yan, F.; Smirni, E. Cedule+: Resource management for burstable cloud instances using predictive analytics. IEEE Trans. Netw. Serv. Manag. 2020, 18, 945–957. [Google Scholar] [CrossRef]
  57. Li, S.; Zhou, Y.; Jiao, L.; Yan, X.; Wang, X.; Lyu, M.R.T. Towards operational cost minimization in hybrid clouds for dynamic resource provisioning with delay-aware optimization. IEEE Trans. Serv. Comput. 2015, 8, 398–409. [Google Scholar] [CrossRef]
Figure 1. The relationship among server costs, return on investment, and revenue margin.
Figure 2. The methodological overview of the proposed RAP-Optimizer.
Figure 3. The overlapping features from three different log files.
Figure 4. The feature variable ranges before and after performing the Z-score normalization.
Figure 5. The network architecture of the 6-layer deep fully connected neural network.
Figure 6. The learning progress in terms of training accuracy, validation accuracy, training loss, and validation loss.
Figure 7. The state space landscape shows the number of active VMs running on hosts.
Figure 8. The confusion matrix obtained from the test dataset with 21,417 instances.
Figure 9. The resource configuration prediction performance analysis using k-fold cross-validation.
Figure 10. The comparison of the number of active hosts before and after using the deep-annealing algorithm.
Figure 11. The cost optimization before and after using the proposed method.
Table 1. Comparison of RAP-Optimizer with existing resource optimization methods.

Optimization Model | Advantages | Limitations
Hill-Climbing (HC) Algorithm | Simple and effective for static resource allocation. | Operates reactively; does not maximize existing resource utilization; high cost in dynamic workloads.
CEDULE+ | Predictive analytics for burstable instances; good for CPU optimization. | Limited in API request reallocation; moderate cost savings; does not address DNN overfitting issues.
Conventional Autoscaling | Reliable for predictable workloads; quick resource activation. | Activates new hosts frequently without utilization checks; costly in AIaaS with variable workloads.
RAP-Optimizer (Proposed) | Predictive DNN with dynamic dropout control; proactive host deactivation; 45% cost savings. | Initial setup complexity; requires training data; potential cost without mobility-aware routing.
Table 2. A modified sample of the dataset with all target variables.

Peak Frequency | Active Time (hours) | API Initiation Count | Service Requests | vCPU | vRAM (GB) | vDisk (GB) | Energy Usage (Wh) | Cloud Configuration
2 | 0.25 | 12 | 1500 | 2 | 1.5 | 0.6 | 10 | Basic
4 | 0.45 | 30 | 1800 | 4 | 2.5 | 1 | 40 | Standard
7 | 1.5 | 220 | 6200 | 5 | 4 | 2.5 | 50 | Intermediate
8 | 3.2 | 340 | 8300 | 7 | 6 | 3.5 | 75 | Advanced
11 | 5.8 | 550 | 11,200 | 9 | 8 | 5 | 95 | Premium
Table 3. Impacts of different network configurations on overfitting and underfitting characteristics in the RAP-Optimizer.

Network Configuration | Number of Hidden Layers | Neurons per Layer | Characteristics | Observed Behavior
Initial Configuration | 6 | 32 | Overfitting | High training accuracy, low validation accuracy.
Modified Configuration 1 | 5 | 32 | Overfitting | Overfitting persists; validation accuracy improves slightly but remains significantly lower than training accuracy.
Modified Configuration 2 | 4 | 32 | Overfitting | Moderate overfitting; slight improvement in validation performance, but the gap remains.
Modified Configuration 3 | 3 | 16 | Overfitting | Reduced overfitting, but validation accuracy still does not match training accuracy.
Modified Configuration 4 | 4 | 8 | Underfitting | Model starts underfitting; both training and validation accuracy are low.
Modified Configuration 5 | 4 | 4 | Underfitting | Significant underfitting; both accuracies remain low, model complexity too reduced.
Modified Configuration 6 | 2 | 16 | Underfitting | Underfitting persists; accuracy too low for both training and validation.
Modified Configuration 7 | 1 | 32 | Underfitting | Severe underfitting; network too shallow to capture complex patterns.
Table 4. Performance evaluation using K-fold cross-validation.

Metric | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Average
Accuracy | 0.962 | 0.961 | 0.964 | 0.960 | 0.963 | 0.961 | 0.9618
Precision | 0.975 | 0.976 | 0.973 | 0.977 | 0.974 | 0.976 | 0.9751
Recall | 0.970 | 0.969 | 0.971 | 0.968 | 0.970 | 0.969 | 0.9695
F1 score | 0.972 | 0.971 | 0.973 | 0.970 | 0.972 | 0.971 | 0.9715
Table 5. The number of active hosts per 24 h before and after using the deep-annealing algorithm.

Week | Active Hosts per 24 h (Without Deep-Annealing) | Active Hosts per 24 h (With Deep-Annealing) | Reduction
1 | 36 | 30 | 6
2 | 29 | 22 | 7
3 | 37 | 33 | 4
4 | 31 | 27 | 4
5 | 35 | 29 | 6
6 | 33 | 27 | 6
7 | 39 | 34 | 5
8 | 30 | 25 | 5
9 | 38 | 32 | 6
10 | 32 | 27 | 5
11 | 34 | 28 | 6
12 | 29 | 24 | 5
Average | 33 | 28 | 5
Table 6. API request handling and resource optimization before and after using the deep-annealing algorithm.

Host | CPU (Cores) | RAM (GB) | Requests Processed (Before) | CPU Cores Used (Before) | Requests Processed (After) | CPU Cores Used (After)
1 | 10 | 128 | 12 | 8 | 18 | 10
2 | 10 | 128 | 15 | 9 | 20 | 9
3 | 10 | 128 | 16 | 10 | 22 | 10
4 | 10 | 128 | 7 | 5 | Handled by Hosts 1–3 | Idle Mode
5 | 10 | 128 | 14 | 9 | 19 | 10
6 | 10 | 128 | 6 | 4 | Handled by Hosts 1–3 | Idle Mode
7 | 10 | 128 | 9 | 7 | 18 | 9
8 | 10 | 128 | 13 | 8 | 20 | 9
9 | 10 | 128 | 10 | 6 | 16 | 9
10 | 10 | 128 | 5 | 3 | Handled by Hosts 5–9 | Idle Mode
11 | 10 | 128 | 8 | 6 | Handled by Hosts 5–9 | Idle Mode
12 | 10 | 128 | 4 | 3 | Handled by Hosts 5–9 | Idle Mode
Table 7. Numerical analysis of the objective achievement.

Month | Server Cost (USD, Before) | Server Cost (USD, After) | Return (USD, Before) | Return (USD, After) | Revenue Margin (USD, Before) | Revenue Margin (USD, After)
1 | 200 | 150 | 500 | 500 | 300 | 350
2 | 500 | 400 | 1200 | 1200 | 700 | 800
3 | 800 | 650 | 1800 | 1900 | 1000 | 1250
4 | 1100 | 800 | 2200 | 2300 | 1100 | 1500
5 | 1500 | 900 | 2500 | 2600 | 1000 | 1700
6 | 1800 | 1000 | 2600 | 2700 | 800 | 1700
7 | 2000 | 1100 | 2700 | 2800 | 700 | 1700
8 | 2100 | 1150 | 2750 | 2800 | 650 | 1650
9 | 2200 | 1200 | 2800 | 2850 | 600 | 1650
10 | 2400 | 1200 | 2900 | 2900 | 500 | 1700
11 | 2500 | 1250 | 2900 | 2950 | 400 | 1700
12 | 2600 | 1250 | 3000 | 3000 | 400 | 1750
Table 8. Comparative analysis of RAP-Optimizer and CEDULE+.

Metric | RAP-Optimizer | CEDULE+ | Percentage Improvement
Cost Savings (%) | 45% | 32% | +13%
Active Host Reduction (%) | 40% | 30% | +10%
Prediction Accuracy (%) | 96.1% | 92.5% | +3.6%
Response Time (ms) | 85 ms | 120 ms | −29.2%
Table 9. Performance comparison of RAP-Optimizer with and without the mobility-aware component.

Metric | Standard RAP-Optimizer | With Mobility-Aware Component | Percentage Improvement
Cost Savings (%) | 45% | 50% | +5%
Active Host Reduction (%) | 40% | 42% | +2%
Prediction Accuracy (%) | 96.1% | 96.3% | +0.2%
Response Time (ms) | 85 ms | 70 ms | +18%
Routing Cost Reduction (%) | - | 15% | -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
