1. Introduction
Executing queries in relational database applications with large amounts of data can take significant time. Database management systems (DBMSs) offer various mechanisms to reduce query execution time; one such mechanism is the creation and management of indexes. Creating column indexes reduces the time required to search and retrieve data. However, the index selection problem, i.e., finding an optimal set of indexes (an optimal index configuration) for a given set of database tables, is NP-hard [1,2]. The problem becomes even harder for large-scale database applications, and frequent deletions, updates, and insertions introduce further complexity. Indexes are managed by the database administrator (DBA), who uses knowledge of the query workload to create an efficient index configuration. As the query workload changes, the DBA must re-evaluate that configuration. To reduce the burden on the DBA, various algorithms have been proposed to automate index tuning on classical computers. Some use supervised machine learning to learn, from training data, which indexes have been used and how queries have performed in the past, and then predict which indexes should be created for a new query workload. Since training data are often difficult to obtain, other index tuning algorithms rely on reinforcement learning, which does not depend on training data and learns as it goes [3,4,5,6,7,8,9,10,11,12,13,14,15,16].
Quantum computing is an emerging technology that transforms the way information is processed, offering significant potential advantages over classical systems by exploiting quantum principles such as superposition and entanglement. This has been demonstrated by Shor's algorithm [17], which factors integers in polynomial time, and by the quantum search algorithm proposed by Lov Grover [18], which searches unstructured data with complexity O(√N), where N is the number of elements in the search space. Additionally, quantum computing has been investigated with promising results in several areas of database management, including query optimization and transaction scheduling [19], which are two other NP-hard problems.
However, an analysis of the current state of the art reveals no studies that implement quantum reinforcement learning strategies, with experimental results, for automating index tuning. To address this gap, this paper implements an existing index tuning algorithm for classical computers and proposes a quantum–classical (hybrid) version, called Quantum Reinforcement Learning for database Index Tuning (QRLIT), that employs the capabilities of Grover's search. The primary objective is to compare the performance of the hybrid algorithm against its classical counterpart.
The implemented classical algorithm [3,20] employs a machine learning technique called reinforcement learning [21]. It is composed of two principal elements: an agent and an environment. The agent learns to make decisions through trial-and-error interactions with the environment. The classical index tuning algorithm employs a technique called Epsilon-greedy [21] to balance the agent's choice between exploring and following its learned policy (exploiting). The proposed hybrid model replaces this technique with Grover's search algorithm, which enables a probabilistic approach and a natural balancing of the exploration–exploitation duality through the manipulation of the number of Grover iterations.
This paper contributes a novel algorithm that combines quantum computing with reinforcement learning to automate the process of database index tuning. Furthermore, a series of experiments demonstrate the advantages of using quantum computing over traditional systems. The results obtained indicate that QRLIT converges faster to an optimal policy and is able to produce a higher reward in terms of queries processed per hour than its classical counterpart.
The rest of the paper is organized as follows: Section 2 provides some background information. Section 3 presents an overview of the state of the art, and the classical index tuning algorithm with its quantum counterpart implementation is described in Section 4. The experimental environment and the results obtained from running both algorithms are detailed in Section 5. Finally, Section 6 concludes the paper and proposes directions for future work.
3. Related Work
This section presents an overview of the current state of the art for classical index tuning algorithms that employ reinforcement learning and quantum index tuning algorithms. It concludes with a detailed description of the selected classical algorithm to be converted into a quantum version.
3.1. Classical Index Tuning Algorithms Using Reinforcement Learning
In recent years, there have been significant contributions in the domain of index tuning, which plays a fundamental role in the efficiency of database searches.
In [25], the authors synthesize the current state of the art on this subject, referring to various index optimization methods, including methods that use reinforcement learning. The method COREIL [4] uses policy iteration as its algorithm, while SMARTIX [3] uses linear Q-learning. As an evolution of SMARTIX, the authors in [20] present an approach that corrects the implementation of the TPC-H benchmark [26], in which the execution order of the queries was processed incorrectly, as it differed from that specified in the TPC-H documentation [27]. The methods NoDBA [5], Lan's DQN [6], DRLindex [7], MANTIS [8], and DRLISA [9] implement deep Q-networks (DQN); Welborn's index advisor [10] uses the Sinkhorn Policy Gradient, while SWIRL [11] uses proximal policy optimization (PPO). BAIT [12] and AutoIndex [13] adopt Monte Carlo tree search (MCTS), and Lai's PPO-MC [14] uses proximal policy optimization–Monte Carlo (PPO-MC). Finally, DBABandit [15] and HMAB [16] use a technique called the multi-armed bandit (MAB).
The methods presented use a variety of approaches to automate index tuning; however, they were all designed for classical computers.
3.2. Quantum Algorithms for Index Tuning
There is little research in the area of quantum index tuning. The article [2] proposes the conversion of the classical algorithm DINA (deep reinforcement divergent index advisor) [28] into a quantum version. However, the paper is at an early stage and provides neither quantum implementation details nor experimental results.
Besides reinforcement learning, other techniques have been applied to automate index tuning. The paper [29] leverages the capabilities of quantum annealers by proposing novel techniques to map database indexes onto the qubits of the quantum annealer. One technique uses the qubits more efficiently, reducing the asymptotic qubit growth from quadratic to linear by incorporating additional auxiliary variables. The second technique is embedded within the transformation function, where efficiency is achieved through extensive pre-processing before run time; this technique generates a library of embedding templates that cover a subset of index selection problem instances.
The paper in [30] proposes SQIA, a quantum–classical (hybrid) index advisor that delivers optimal solutions with high probability by using a novel Grover search-based approach. This approach implements an efficient quantum Oracle for the Grover search algorithm, which loads the problem data into the qubit phases. In other words, this technique loads and encodes the storage costs, benefits, and constraints.
The present literature review reveals that, apart from the vision paper proposing a quantum counterpart of DINA [2], there is currently no quantum counterpart implementation with experimental results for index advising using reinforcement learning.
3.3. Classical SMARTIX Algorithm
The SMARTIX experiments presented by the authors of [3] demonstrated a good balance between the disk space used by its index configuration and the performance metric it achieves, which led to the selection of its evolution [20] as the foundation for the development of a quantum version in our work. As the authors of [20] have made the source code publicly available on GitHub [31], our work builds on that code, with the adaptations required to accommodate the quantum algorithm while preserving the original characteristics. For the environment, they use the scalable database benchmark TPC-H [26], which allows for the generation of data for a predefined group of database tables and the construction of 22 query instances according to 22 query templates.
The TPC-H benchmark schema includes eight database tables, each with a distinct set of attributes. In total, 45 of these attributes are available for indexing. Each indexable attribute has two possible actions (CREATE or DROP), which yields an action space with a total of 90 actions, each encoded as a decimal value in the interval [0, 89].
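As an illustration, the mapping between indexable attributes and the integer-encoded action space can be sketched as follows (a minimal example; the attribute numbering and function names are hypothetical, not taken from the SMARTIX code):

```python
# Hypothetical sketch of the action encoding described above:
# 45 indexable attributes x 2 operations (CREATE, DROP) = 90 actions in [0, 89].
NUM_INDEXABLE_ATTRIBUTES = 45
OPERATIONS = ("CREATE", "DROP")

def encode_action(attribute_id: int, operation: str) -> int:
    """Map (attribute, operation) to a decimal action id in [0, 89]."""
    assert 0 <= attribute_id < NUM_INDEXABLE_ATTRIBUTES
    return attribute_id * len(OPERATIONS) + OPERATIONS.index(operation)

def decode_action(action_id: int) -> tuple[int, str]:
    """Inverse mapping: recover (attribute, operation) from an action id."""
    assert 0 <= action_id < NUM_INDEXABLE_ATTRIBUTES * len(OPERATIONS)
    return action_id // len(OPERATIONS), OPERATIONS[action_id % len(OPERATIONS)]

# Example: action 0 = CREATE an index on attribute 0; action 89 = DROP the index on attribute 44.
```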
The reward is defined by a TPC-H performance metric, expressed in queries per hour (QphH) (Equation (17)), which is composed of two other metrics, namely Power and Throughput. The Power metric is designed to measure the computing speed of simple queries (Equation (16)). The Throughput metric measures the capacity to process the maximum number of queries in the shortest time using parallelism mechanisms (Equation (15)).
The equations are composed of several elements. The quantity 3600 represents the number of seconds per hour, while QI(i, 0) denotes the execution time of query Qi. RI(j, 0) denotes the execution time of the refresh function RFj, which is responsible for inserting and removing records from the database. S represents the number of query streams executed, SF the scale factor of the database, Ts the total time needed to run the throughput test for the S streams, and finally, Size represents the size of the database.
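For reference, the standard TPC-H definitions of these metrics are reproduced below from the benchmark specification [27]; the paper's Equations (15)–(17) are understood to follow this form (the mapping to the equation numbers is an assumption).

```latex
% Standard TPC-H metric definitions (per the TPC-H specification);
% the paper's Equations (15)-(17) are understood to follow this form.
\begin{align*}
\mathrm{Power@Size} &= \frac{3600 \cdot SF}
  {\sqrt[24]{\prod_{i=1}^{22} QI(i,0)\,\cdot\,\prod_{j=1}^{2} RI(j,0)}} \\[4pt]
\mathrm{Throughput@Size} &= \frac{S \cdot 22 \cdot 3600}{T_s} \cdot SF \\[4pt]
\mathrm{QphH@Size} &= \sqrt{\mathrm{Power@Size} \times \mathrm{Throughput@Size}}
\end{align*}
```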
To avoid the limitations of a tabular policy, SMARTIX uses a variant of Q-learning with linear function approximation as its reinforcement learning algorithm [32]. This policy is represented by a set of weights, collectively referred to as a feature vector. A feature is defined as an element of the state space or the action space, so the vector has a total of 135 weights, with an additional weight corresponding to a bias.
To calculate the Q-value, Equation (18) is used, in which each weight is multiplied by the value of its corresponding feature according to the current state of the environment and the products are summed. During the learning process, the agent's policy must be updated; for this, the algorithm uses the temporal difference strategy with gradient descent (Equation (19)).
The SMARTIX algorithm works as follows: first, the feature vector is populated with random values, and the replay memory is set to an empty state. Second, a cycle is initiated based on a predefined number of episodes. In each episode, the database is set to an initial state s. Subsequently, a sequence of steps is initiated. In each step, the algorithm determines the action a to be executed in the environment using the Epsilon-greedy strategy and executes that action. The environment then moves to a new state s′ and returns the reward (QphH) along with s′. With the reward obtained, the algorithm updates the feature vector. The algorithm then stores the experience, selects a mini-batch of experiences, and runs a replay on these data. Finally, the new state s′ becomes the current state s, and the sequence of steps is repeated until the episodes reach the end.
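A compact sketch of this classical step is given below, assuming Equation (18) has the linear form Q(s, a) = w · f(s, a) and Equation (19) is the standard gradient temporal-difference update; the feature layout, hyperparameter values, and function names are illustrative, not taken from the SMARTIX source.

```python
import random
import numpy as np

# Minimal sketch of the classical loop described above, assuming
#   Eq. (18):  Q(s, a) = w . f(s, a)
#   Eq. (19):  w <- w + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * f(s, a)
NUM_STATE_FEATURES = 45     # one binary feature per indexable attribute
NUM_ACTIONS = 90
NUM_FEATURES = NUM_STATE_FEATURES + NUM_ACTIONS + 1   # 135 weights + bias

def features(state: np.ndarray, action: int) -> np.ndarray:
    """Concatenate state features, a one-hot action encoding, and a bias term."""
    action_one_hot = np.zeros(NUM_ACTIONS)
    action_one_hot[action] = 1.0
    return np.concatenate([state, action_one_hot, [1.0]])

def q_value(weights: np.ndarray, state: np.ndarray, action: int) -> float:
    return float(weights @ features(state, action))   # assumed form of Equation (18)

def select_action_epsilon_greedy(weights, state, epsilon: float) -> int:
    """Explore with probability epsilon, otherwise exploit the learned policy."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    return max(range(NUM_ACTIONS), key=lambda a: q_value(weights, state, a))

def td_update(weights, state, action, reward, next_state, alpha=0.01, gamma=0.9):
    """Gradient-descent temporal-difference update of the feature weights."""
    target = reward + gamma * max(q_value(weights, next_state, a) for a in range(NUM_ACTIONS))
    td_error = target - q_value(weights, state, action)
    return weights + alpha * td_error * features(state, action), td_error
```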
4. QRLIT: Quantum–Classical Implementation of Classical SMARTIX Algorithm
This section describes the implementation of our QRLIT algorithm, a hybrid quantum–classical version of SMARTIX. Initially, we present a method for combining quantum computing (QC) with reinforcement learning (RL), which serves as the basis for the development of QRLIT. Then, we provide a description of the process used to identify the components that were converted. Finally, we demonstrate and compute the fundamental values necessary for the construction of the quantum circuit and conclude with the QRLIT flow diagram and pseudocode.
Quantum reinforcement learning (QRL) is a method that combines the capabilities of QC and RL. Like its classical counterpart, quantum reinforcement learning includes a policy, a state space, an action space, and a reward function, but it is inspired by the superposition principle and quantum parallelism [33]. Based on the QRL algorithm proposed in [33], the authors of [34] propose an algorithm called quantum Q-learning (QQRL) that stores the policy in a superposition state and uses Grover's algorithm as a strategy to amplify the probability amplitude of the best action according to the learned policy. Grover's algorithm exploits the natural behavior of superposition states and offers a good balance between exploration and exploitation. This balance is achieved by controlling the number of Grover iterations L throughout the agent's learning process. In other words, as the agent learns, the number of iterations increases and the capacity to explore decreases, until reaching the number of iterations Lmax (as defined in Equation (14)) that maximizes the probability of measuring the "good" action. The number of iterations L is determined by the formula in Equation (20) from [34], where k represents a rate that controls the proportion of the policy and reward contributions and Lmax denotes the maximum number of possible iterations.
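Equation (20) itself is not reproduced in this excerpt; one plausible reading, consistent with the description of k weighting the policy and reward contributions and with the QRL formulation that [34] builds on, is the following (an assumption, not a quotation from [34]):

```latex
% A plausible form of Equation (20) (an assumption, not quoted from [34]):
% the Grover iteration count grows with the reward and the learned Q-value,
% capped at the maximum useful number of iterations L_max.
L = \min\!\left( \left\lfloor k \left( r + \max_{a'} Q(s', a') \right) \right\rfloor,\; L_{\max} \right)
```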
Our QRLIT implementation is based on the QQRL algorithm. Accordingly, the Epsilon-greedy procedure is replaced by Grover's search algorithm, while the remaining elements stay in the classical system. With Grover's algorithm in QRLIT, we can not only determine the actions to be executed in the environment but also naturally balance the agent's duality between exploration and exploitation. As previously outlined in the Background Section, Grover's algorithm contains three distinct layers, namely State Preparation, Oracle, and Amplitude Amplification. In the State Preparation layer, we initiate the policy of the agent in a superposition state, and in the Oracle, we encode the action with the highest Q-value in the current state of the environment. As the last layer, we implement the Amplitude Amplification, which amplifies the probability amplitude of measuring the action encoded in the Oracle.
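A minimal Qiskit sketch of these three layers for a single marked action is shown below. It is an illustrative construction over the 7-qubit action register derived in the next paragraph, not the authors' source code, and the function name grover_circuit is hypothetical.

```python
from qiskit import QuantumCircuit

def grover_circuit(target_action: int, num_qubits: int = 7, iterations: int = 1) -> QuantumCircuit:
    """Sketch of the three Grover layers used for action selection.
    `target_action` is the index of the action with the highest Q-value."""
    target_bits = format(target_action, f"0{num_qubits}b")
    qc = QuantumCircuit(num_qubits, num_qubits)

    # State Preparation: uniform superposition over all 2^n candidate actions.
    qc.h(range(num_qubits))

    for _ in range(iterations):
        # Oracle: flip the phase of the basis state encoding the target action.
        for i, bit in enumerate(reversed(target_bits)):   # qubit 0 holds the least significant bit
            if bit == "0":
                qc.x(i)
        qc.h(num_qubits - 1)
        qc.mcx(list(range(num_qubits - 1)), num_qubits - 1)   # multi-controlled Z via H-MCX-H
        qc.h(num_qubits - 1)
        for i, bit in enumerate(reversed(target_bits)):
            if bit == "0":
                qc.x(i)

        # Amplitude Amplification: standard diffuser (inversion about the mean).
        qc.h(range(num_qubits))
        qc.x(range(num_qubits))
        qc.h(num_qubits - 1)
        qc.mcx(list(range(num_qubits - 1)), num_qubits - 1)
        qc.h(num_qubits - 1)
        qc.x(range(num_qubits))
        qc.h(range(num_qubits))

    qc.measure(range(num_qubits), range(num_qubits))
    return qc
```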
To run and build Grover’s algorithm, it is crucial to identify how many qubits are required in the quantum register to encode the actions. We calculate the number of qubits by using the formula
, presented by the authors of the paper [
33]. In this formula,
represents the size of the action space, while
denotes the number of qubits required to encode an action. We apply and solve the formula for a space of 90 actions and round off the excess, so that
is equal to seven (Equation (21)). We then define the maximum number of Grover iterations,
. As the number of qubits is already calculated,
is equal to 128 and therefore,
equals eight (Equation (22)).
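The corresponding calculations work out as follows (the expression for Lmax assumes Equation (22) uses the standard Grover optimum of roughly (π/4)√N iterations):

```latex
% Worked values for the 90-action space (assuming the standard Grover optimum for L_max):
m = \lceil \log_2 |A| \rceil = \lceil \log_2 90 \rceil = 7, \qquad
N = 2^m = 128, \qquad
L_{\max} = \left\lfloor \tfrac{\pi}{4}\sqrt{N} \right\rfloor
         = \left\lfloor \tfrac{\pi}{4}\sqrt{128} \right\rfloor = 8
```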
As identified before, to create QRLIT, we replace the Epsilon-greedy strategy in the classical algorithm with Grover's search (line 6 in Algorithm 1).
Figure 2 illustrates the interactions between the principal components of QRLIT. The agent component initiates the first interaction by executing Grover's algorithm, which returns an action a in binary code. The action is converted to a decimal value and executed in the environment (line 7 in Algorithm 1). The environment then processes the action and enters a new state s′. In this new state, the benchmark is prepared by creating the necessary query instances using QGen to run the power and throughput tests. Once the benchmark has been executed, the reward r and the new state s′ of the environment are returned to the agent. With these two values, the agent calculates the number of Grover iterations L, selects the action of the new state that has the highest Q-value, and sends these values to the operation that builds the Grover circuit (line 8 in Algorithm 1). The quantum circuit is then constructed with all seven qubits of the register initialized in the state |0⟩. Our QRLIT proposal provides a natural balance between exploration and exploitation, allowing for more effective learning; as the agent learns and adjusts its policy, the exploration rate decreases (Equation (20)). Furthermore, given the properties of quantum parallelism and superposition states in Grover's algorithm, this proposal offers another advantage: it is able to find an action faster (complexity of O(√N)) (line 6 in Algorithm 1) than its classical counterpart (complexity of O(N)).
Algorithm 1 QRLIT algorithm with Grover's search, function approximation, and experience replay. Adapted from [3,34].
1: Initialize the feature vector (weights) with random values
2: Empty initialization of replay memory D
3: for each episode do
4:   s ← DB initial index configuration mapped as initial state
5:   for each step of episode do
6:     a ← action selected by Grover's search over the action register
7:     Execute a in the environment; observe reward r and new state s′
8:     Compute the number of Grover iterations L and the best action for s′; build the Grover circuit
9:     for each feature weight do
10:      Update the weight according to Equation (19)
11:    end for
12:    Store experience (s, a, r, s′) in D
13:    Sample a mini-batch of experiences from D
14:    Perform experience replay using the sampled data
15:    s ← s′
16:  end for
17: end for
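Continuing the earlier Qiskit sketch, the bitstring measured from the Grover circuit would be decoded back into a decimal action (lines 6–7 in Algorithm 1) roughly as follows. This is again an illustrative sketch; how QRLIT treats measurement outcomes above 89, outside the 90 valid actions, is not stated in the text, so the re-sampling shown here is an assumption.

```python
# Illustrative sketch: execute the (hypothetical) grover_circuit from the earlier
# example on Qiskit's Aer simulator and decode the measurement into an action id.
from qiskit import transpile
from qiskit_aer import AerSimulator

NUM_ACTIONS = 90          # valid actions occupy [0, 89] out of the 2^7 = 128 basis states

def sample_action(target_action: int, iterations: int) -> int:
    simulator = AerSimulator()
    qc = grover_circuit(target_action, num_qubits=7, iterations=iterations)
    while True:
        counts = simulator.run(transpile(qc, simulator), shots=1).result().get_counts()
        bitstring = next(iter(counts))          # single-shot outcome, e.g. '0101101'
        action = int(bitstring, 2)              # binary -> decimal action id
        if action < NUM_ACTIONS:                # assumption: re-sample invalid outcomes
            return action
```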
5. Performance Evaluation
This section presents the experiments conducted on the classical algorithm SMARTIX and its quantum–classical version QRLIT (source code in [35]), as well as the analyses performed on the results. It is organized into three subsections: Section 5.1 provides a detailed description of the environment used to execute the experiments, and Section 5.2 and Section 5.3 present the experimental results and their analysis.
5.1. Experimental Model
All experiments were conducted in a Docker container running Ubuntu 22.04 on a 2021 MacBook Pro equipped with 16 GB of RAM, 1 TB of disk space, and an Apple M1 Pro CPU with 10 cores. MySQL was used as the DBMS, on which the TPC-H benchmark was implemented, while a simulator provided by the Qiskit SDK was used to build and execute the quantum algorithm.
Additionally, in accordance with the TPC-H benchmark specification [27], 22 query instances were executed for the Power metric and 44 for the Throughput metric with two parallel streams (22 queries per stream). This resulted in a total of 66 query instances being executed at each time step. The queries were generated with a tool provided by the TPC-H benchmark, designated QGen.
The experiments were carried out according to the parameter settings outlined in Table 1. The first parameter setting corresponds to the tests conducted in Section 5.2 to study the overall performance of the algorithms when the database size is fixed at 10 MB, while the second corresponds to Section 5.3 to study the impact of database sizes of 10 MB, 20 MB, 30 MB, 40 MB, 70 MB, and 100 MB on the performance of the algorithms. In the second configuration, the number of episodes was reduced to 25 in order to shorten the time required to execute the experiments.
5.2. Overall Performance
In this section, experiments were conducted to study the overall performance of the two algorithms when the database size is fixed at 10 MB. This study is based on the following metrics: number of queries processed per hour, episode execution time, temporal difference error, and number of Grover iterations. The first metric reflects the quality of the algorithms in terms of their ability to identify a policy that maximizes the cumulative reward (queries per hour) over time. The episode execution time metric measures how quickly the algorithms execute an episode. The temporal difference error (TD Error) (Equation (1)) metric demonstrates the algorithm's convergence to an optimal policy; in other words, the closer the values are to 0, the better the policy. Finally, the Grover iterations metric measures the relation between exploration and exploitation; the lower the value, the higher the rate of exploration relative to exploitation. This metric allows for the analysis of the agent's exploration capacity, which is directly tied to the number of iterations.
The results obtained for each metric in each of the 50 episodes for the two algorithms are shown in Figure 3, Figure 4, Figure 5 and Figure 6. The average results of each metric over the 50 episodes for the two algorithms are summarized in Table 2. The analysis of Figure 3 and Table 2 reveals that, on average, the hybrid algorithm processes 0.61% more queries per hour than its classical counterpart.
Moreover, the analysis of Figure 4 shows that the hybrid algorithm converges much faster to a low temporal difference error, exhibiting more stable learning and a temporal difference error trajectory closer to 0 (Table 2).
To find a balance between exploring and exploiting, the classical algorithm implements a strategy known as Epsilon-greedy. This strategy uses an exploration rate epsilon (ε), which decreases with an exploration discount factor of 0.1 at the end of each episode. Therefore, with this algorithm, the agent explores with probability ε or follows the learned policy (exploits) with probability 1 − ε. In the case of the quantum–classical algorithm, as the agent learns and adjusts its policy, the number of Grover iterations increases, consequently reducing the exploration probability (Equation (20)) (Figure 5).
The quantum–classical algorithm provides a better index recommendation, resulting in a higher number of queries processed per hour than the classical algorithm, because as its agent refines the policy through learning, the exploration rate decreases. This leads to fewer unnecessary explorations and allows for more effective learning. However, the quantum–classical algorithm takes on average 21.67% more time to complete an episode than its classical counterpart (Figure 6). This discrepancy is due to the additional computational overhead of creating and executing the quantum circuit at each time step.
Figure 7 shows the time required to build and execute Grover’s algorithm.
5.3. Impact of Database Sizes
The purpose of this section is to examine the behavior of the algorithms across a range of database sizes, specifically 10 MB, 20 MB, 30 MB, 40 MB, 70 MB, and 100 MB. The metrics employed in this analysis include the average number of queries processed per hour of the 25 episodes for each database size, the number of Grover iterations, and the temporal difference error of each algorithm.
The results obtained for each metric and each database size for the two algorithms are shown in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12. The average results of each metric over the database sizes for the two algorithms are summarized in Table 3. The analysis across database sizes also indicates the superiority of the hybrid algorithm. The results in Table 3 and Figure 8 show that, on average, the hybrid algorithm processes 2.49% more queries per hour than its classical counterpart and displays a temporal difference error trajectory closer to 0 (Figure 9). This trajectory is more evident in this analysis because the number of episodes is reduced by half, highlighting the importance of faster convergence.
Furthermore, it can also be observed that the number of queries processed per hour decreases as the database size increases. This indicates that the size of the database affects the QphH generated, in other words, the reward. Consequently, according to Equation (20), which calculates the number of Grover iterations, and since the Q-values are directly related to the reward, they will also have smaller values. Thus, the smaller the policy and reward contribution, the smaller the number of iterations (Figure 10), which increases the exploration rate (Figure 11). Excessive exploration causes the agent not to follow the learned policy, resulting in a mostly random index configuration as the database size increases.
In conclusion, besides the average superiority of the quantum–classical algorithm, the results in Figure 10 also demonstrate the need to adjust the parameter k, which regulates the reward and policy contributions to the number of Grover iterations. In this case, as the reward value decreases, it is necessary to increase the value of k to reduce the exploration rate.
6. Conclusions and Future Work
This work presents the implementation of QRLIT, a hybrid quantum–classical version of SMARTIX [3]. QRLIT demonstrated better performance than its classical counterpart in terms of the number of queries processed per hour and a faster convergence to an optimal policy. By controlling the number of Grover iterations through the reward and the agent's policy, the exploration rate decreases as the agent refines its policy through learning, allowing a temporal difference error convergence closer to zero and more effective learning than its classical counterpart. However, as the value of k controls the contribution of the reward and policy to the number of Grover iterations, the increase in database size reveals the necessity of adjusting this parameter manually to balance the exploration rate. This manual adjustment in an automatic system is a limitation, because the reward (QphH) varies not only with the size of the database but also with the quality and capacity of the machine's hardware.
As future work, we intend to analyze the behavior of the algorithms on databases of larger sizes and with more queries. It would also be important to investigate their performance on distributed database systems. Finally, evaluating the execution of the quantum–classical algorithm on a real quantum computer is another direction for future research.