1. Introduction
Machine learning (ML) is the subfield of artificial intelligence (AI) that entails programming computers to learn from data [1]. Neural networks are among the most widely used ML models, offering clear advantages including adaptive learning, self-organization, real-time operation, parallelism and fault tolerance [2]. Deep learning extends neural networks with a greater number of hidden layers, and this technique is well suited to learning from large amounts of data, often referred to as big data [3]. Over the years, many powerful deep neural network (DNN) models have been proposed and shown to exhibit exceptional performance in a variety of scenarios. Examples of applications where deep learning outperforms other state-of-the-art ML algorithms include speech recognition [4,5], computer vision [6,7], natural language processing [8], cybersecurity [9] and healthcare [10,11], to name a few.
The primary objective when designing DNN architectures is to optimize the network efficiently so that it achieves better training accuracy (how well the model learns from the data) and validation accuracy (the model’s performance on unseen data) [12,13,14]. Another important performance metric for DNNs is generalization (how well the model adapts to new, unseen data) [15], measured as the difference between the training and validation accuracy, provided that the dataset is large enough to avoid underfitting (low training accuracy and low validation accuracy) and overfitting (high training accuracy and low validation accuracy). Setting underfitting and overfitting aside, the generalization error can be defined as in Equation (1):

E_{gen} = Acc_{train} − Acc_{val},   (1)

where Acc_{train} and Acc_{val} denote the training and validation accuracy, respectively.
The smaller the difference between the training and validation accuracy, the better the generalization. If the generalization error of a model is high, the model is not actually learning, but rather memorizing the data. Consequently, the model will fail to make reliable predictions on data it has not seen before. The convergence time of a neural network (the time the model takes to reach optimal performance) is another important performance metric and should remain practically reasonable when dealing with real-world datasets.
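As a brief numerical illustration of Equation (1), with purely hypothetical accuracy values:

```python
# Hypothetical accuracies used only to illustrate the generalization error of Equation (1).
train_acc, val_acc = 0.97, 0.91
generalization_error = train_acc - val_acc
print(f"{generalization_error:.2f}")  # 0.06 -- the smaller this gap, the better the generalization
```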
In recent years, research in quantum computing has advanced considerably, motivated mainly by its potential to outperform classical computation for certain tasks. Indeed, regarding quantum supremacy, refs. [16,17,18] recently provided practical evidence of the computational advantage of quantum over classical computers. These successful experimental demonstrations of quantum supremacy motivated the research community to explore the extent to which quantum computing can improve ML, a field now termed quantum machine learning (QML). QML has become an active research topic, and various ML algorithms are being developed in the quantum realm. The primary purpose of QML is to explore and analyze the possible advantages quantum computation might offer to ML compared to classical ML algorithms.
Universal fault-tolerant quantum computers are anticipated to significantly enhance the performance of machine learning algorithms. Although a great deal of effort has been devoted to building a universal fault-tolerant quantum computer over the last decade, and various important advances and milestones have been achieved, such a machine is still not expected in the near future. However, small-scale quantum computers with limited numbers of qubits and limited resilience to noise have already been developed [19,20]. These small-scale devices fall into the noisy intermediate-scale quantum (NISQ) regime, and such quantum devices have not yet been proven to enhance the performance of machine learning algorithms. Nevertheless, significant progress has been made in QML [21,22,23,24], particularly in quantum neural networks (QNNs) [25,26,27,28,29,30,31,32], for various applications including image generation [33,34,35] and data classification [23]. QNNs are also being explored in terms of their trainability and generalization [36,37,38,39,40,41,42]. For example, a recent work [39] investigated QNNs for NISQ devices and analyzed how a well-designed QNN can outperform classical neural networks in terms of expressibility (the types of functions a neural network architecture can fit). Similarly, in [43], QNNs have been designed and analyzed on real-world datasets including MNIST [44]. Moreover, claims have been made in the literature proposing an advantage of QNNs over classical NNs [43].
The main objective of quantum machine learning algorithms is to achieve better trainability and generalization with a reasonable model convergence time, at least compared with classical machine learning algorithms. However, in the NISQ era, building quantum machine learning models that outperform their classical counterparts is challenging because of two fundamental problems: (1) the unavailability of native quantum datasets and (2) the unavailability of quantum random access memory (QRAM) and sufficiently powerful quantum processors for storing and handling big data. While this limits the progress of developing standalone, sufficiently powerful QNNs, it has motivated a hybrid quantum–classical approach [26], which is now widely used to pursue a reasonable quantum advantage in neural networks. Hybrid quantum–classical neural networks (HQCNNs) follow the same architecture as QNNs, as shown in Figure 1, while including classical input and output layers. The input layers reduce the dimension of the input data before it is encoded into the quantum circuit, while the output layer classically post-processes the measurement results of the quantum circuit. Furthermore, HQCNNs can be simulated on NISQ platforms and use variational quantum circuits owing to their robustness against noise on NISQ devices [45,46,47,48]. We discuss HQCNNs and variational quantum circuits further in Section 2.1 and Section 2.4.
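To make this architecture concrete, the following is a minimal sketch of such a hybrid pipeline, assuming PennyLane with the TensorFlow/Keras interface; the layer sizes, encoding, gate choices and measurements shown here are illustrative assumptions rather than the exact configuration used in this paper.

```python
# Minimal sketch of a hybrid quantum-classical pipeline: classical input layer,
# quantum circuit, classical output layer (all choices below are illustrative).
import pennylane as qml
import tensorflow as tf

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def quantum_circuit(inputs, weights):
    # Encode the (already down-sized) classical features into qubit rotations.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    # Trainable single-qubit rotations (a placeholder variational block).
    for i in range(n_qubits):
        qml.RY(weights[i], wires=i)
    # The measurement results are post-processed by the classical output layer.
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

weight_shapes = {"weights": (n_qubits,)}
quantum_layer = qml.qnn.KerasLayer(quantum_circuit, weight_shapes, output_dim=n_qubits)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_qubits, activation="relu", input_shape=(784,)),  # classical input layer
    quantum_layer,                                                           # quantum circuit
    tf.keras.layers.Dense(10, activation="softmax"),                         # classical output layer
])
```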
Some studies have claimed that QNNs can surpass classical DNNs for particular learning tasks, such as the discrete logarithm problem [38] and quantum synthetic data classification [23]. However, a relatively recent work [49] criticized these claims and discussed the barren plateau problem (the problem of vanishing gradients) in QNNs, which limits the applicability of QNNs to large-scale real-world problems. This limitation makes it unclear whether QNNs can provide any advantage over their classical counterparts. Recently, however, there have been efforts to understand and overcome the issue of barren plateaus in QNNs [50,51], opening the door to real-world applications of QNNs.
There is still no standard methodology for designing quantum circuits for QNNs, which we speculate is one reason behind the mixed opinions on the quantum advantage of QNNs. Therefore, it is crucial to develop a systematic approach to designing quantum circuits for QNNs rather than relying on heuristics.
1.1. Contribution
Recognizing that the quantum part of HQCNNs is largely unexplored, in this paper we perform a comprehensive Design Space Exploration (DSE) of quantum circuit construction in HQCNNs, using image classification problems to illustrate the process. We also demonstrate a practical quantum advantage of HQCNNs over purely classical networks in terms of computational efficiency at comparable accuracy. We evaluate various commonly used parameterized quantum gates, particularly in the context of QNNs, with different data encoding techniques (specifically, amplitude and angle encoding), which yields the best set of quantum gates for each encoding technique. We also develop different variants of HQCNNs with a maximum of four qubits (for a fair comparison with the classical counterparts), progressing from simpler to more sophisticated quantum circuits. The primary objective of this exercise is to reduce the trial-and-error effort required to design quantum circuits for HQCNNs.
1.2. Organization
The rest of the paper is organized as follows: Section 2 contains the necessary background and terminology used in this paper. Our proposed HQCNN variants are introduced in Section 3. The evaluation results and a comparison of all the hybrid and classical counterparts are presented in Section 4 and Section 5, respectively. Finally, Section 7 concludes the paper.
3. Methodology
The methodology for designing efficient quantum circuits in hybrid quantum–classical neural networks (HQCNNs) is not well defined in the literature, which might be one of the important reasons behind the mixed opinions regarding the potential advantage of quantum computation in HQCNNs. In this paper, we develop different variants of HQCNNs and perform a comprehensive analysis of how various encoding techniques and quantum gates/circuits affect the performance of these hybrid networks. This exercise reduces the trial effort required to select quantum gates (for quantum circuit design) and encoding techniques (for encoding the data into the quantum system) in HQCNNs for a specific application. Furthermore, we compare the hybrid networks with their classical counterparts to determine whether the quantum layers introduce any advantage. In particular, we train the same model on both D103 and D204 to observe and compare the convergence rate and whether there is any computational advantage in the hybrid case. We also compare the models in terms of overall accuracy and the generalization error improvement rate.
For all the models (hybrid and classical), we use the Adam optimizer with an initial learning rate of 0.01. Moreover, the learning rate is scheduled for better training, and we use the early stopping method in Keras to avoid overfitting. The maximum number of training epochs is set to 100; however, if there is no improvement in the validation loss for three consecutive epochs, the learning rate scheduler reduces the learning rate by a factor of 0.1, and the new learning rate is calculated as shown in Equation (7). If there is no improvement in the validation loss for four consecutive epochs, the training is stopped to avoid overfitting.
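For reference, this training setup can be sketched with standard Keras callbacks as follows; the `model` and the data arrays are assumed to be defined elsewhere, and the loss function shown is an illustrative assumption.

```python
# Sketch of the training configuration described above, assuming a Keras `model`
# and training/validation arrays (x_train, y_train, x_val, y_val) defined elsewhere.
import tensorflow as tf

callbacks = [
    # Reduce the learning rate by a factor of 0.1 if the validation loss does not
    # improve for three consecutive epochs (cf. Equation (7)).
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
    # Stop training if the validation loss does not improve for four consecutive epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4),
]

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",  # illustrative loss choice
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100, batch_size=32,
          callbacks=callbacks)
```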
The hybrid architectures used in our experiments consist of two input layers. The first layer encodes the complete set of input features, and the following classical layer consists of four neurons, downsizing the feature vector before it is encoded into the quantum circuit. The last classical layer of all the hybrid models consists of 10 neurons, because the dataset we use (MNIST) has 10 output classes. The qubit measurements are performed in the same eigen-basis for all the implemented HQCNN variants, and for angle encoding, the same rotation gate is used to encode features into qubit rotations across all hybrid networks. In the following sections, we discuss all our HQCNN variants and the two classical counterparts in detail. It is important to note that we keep the number of qubits the same in all the HQCNN variants to allow a fair comparison between them.
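The two encoding options referred to above can be sketched with PennyLane templates as follows; the four-qubit setting matches our variants, while the choice of rotation axis and the padding/normalization details are illustrative assumptions.

```python
# Sketch of the two data-encoding techniques compared in this work, assuming PennyLane.
# Both helpers are intended to be called inside a QNode acting on four qubits.
import pennylane as qml

n_qubits = 4

def encode_angle(features):
    # Angle encoding: one feature per qubit, written into a single-qubit rotation angle.
    qml.AngleEmbedding(features, wires=range(n_qubits))

def encode_amplitude(features):
    # Amplitude encoding: up to 2**n_qubits features stored in the state amplitudes,
    # padded and normalized so that they form a valid quantum state.
    qml.AmplitudeEmbedding(features, wires=range(n_qubits), normalize=True, pad_with=0.0)
```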
3.1. Hybrid Quantum–Classical Neural Network—Variant 1
The first variant of the implemented hybrid neural network (HQCNNv1) is relatively simple, with four qubits and only single-qubit rotation gates, followed by qubit measurements. The main purpose of HQCNNv1 is to (1) select the best batch sizes for the input dataset being fed to the circuit and (2) select the best parametrized rotation gates for both data encoding techniques. Once the best parameters are selected, the remaining hybrid network architectures are evaluated using these best batch sizes and gates. The complete architecture of HQCNNv1 is depicted in Figure 3.
3.2. Hybrid Quantum–Classical Neural Network—Variant 2
Similar to HQCNNv1, the second variant of the hybrid network (HQCNNv2) also consists of four qubits. However, in HQCNNv2, instead of four single-qubit layers, two two-qubit layers are used to introduce entanglement, one of the most commonly used quantum mechanical properties in quantum computation (we use the CNOT gate to create the entanglement). The HQCNNv2 architecture uses the best parameters (batch size and parametrized rotation gates), which we select based on the empirical results obtained for HQCNNv1 (Section 4.1). The complete architecture of HQCNNv2 is shown in Figure 4.
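As an illustration, the HQCNNv2 circuit can be sketched as follows, assuming PennyLane; the pairing of qubits (0,1) and (2,3) by CNOTs and the placeholder RY rotations are our illustrative reading of Figure 4, not a verbatim reproduction.

```python
# Illustrative sketch of HQCNNv2: an encoding step, parameterized single-qubit rotations,
# and two two-qubit entangling layers realized with CNOT gates (pairing assumed).
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def hqcnn_v2_circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))   # or amplitude encoding
    for i in range(n_qubits):
        qml.RY(weights[i], wires=i)                      # placeholder rotation gate
    qml.CNOT(wires=[0, 1])                               # entangle the first qubit pair
    qml.CNOT(wires=[2, 3])                               # entangle the second qubit pair
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]
```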
3.3. Hybrid Quantum–Classical Neural Network—Variant 3
In this section, we develop a more complex quantum circuit by entangling all four qubits. We analyze the performance of the third variant of the hybrid neural network (HQCNNv3) for both amplitude and angle encoding, with the best batch sizes and best rotation gates for each encoding (as discussed in Section 4.3). The schematic of HQCNNv3 is shown in Figure 5.
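The corresponding entangling pattern of HQCNNv3 can be sketched as a drop-in replacement for the paired CNOTs of HQCNNv2; the CNOT chain below, which touches all four qubits, is one plausible reading of Figure 5 rather than an exact reproduction.

```python
# Illustrative entangling block for HQCNNv3: a chain of CNOTs spanning all four qubits,
# intended to be placed after the parameterized rotation layer inside the QNode.
import pennylane as qml

def entangle_all(n_qubits=4):
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
```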
3.4. Classical Counterpart for Hybrid Networks
In order to compare the results and analyze the potential advantages of quantum layers in the network, we develop purely classical models corresponding to our hybrid networks. The classical counterpart has two variants: (1) simply omitting the quantum part from Figure 3, as shown in Figure 6a, which we call CVa, and (2) replacing the four-qubit quantum layer in Figure 3 with a classical layer of four neurons, as shown in Figure 6b, which we call CVb. For a fair comparison with the hybrid models, we make this classical layer a one-to-one connected layer rather than a fully connected layer, since the quantum layer in the hybrid network is not fully connected.
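The one-to-one connected layer used in CVb can be sketched as a small custom Keras layer in which each of the four units is wired to exactly one input; the weight initializers and the absence of an activation here are illustrative assumptions.

```python
# Sketch of a one-to-one connected layer: one weight and one bias per input feature,
# in contrast to a fully connected Dense layer.
import tensorflow as tf

class OneToOne(tf.keras.layers.Layer):
    def build(self, input_shape):
        units = int(input_shape[-1])
        self.w = self.add_weight(shape=(units,), initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        # Element-wise weighting mirrors the per-qubit structure of the quantum layer it replaces.
        return inputs * self.w + self.b
```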
4. Results and Discussion
In this section, we report and discuss the results for all three variants of hybrid networks and two classical counterpart models for the hybrid networks.
4.1. Results—HQCNNv1
4.1.1. Small Dataset—D103
First, HQCNNv1 was trained on D103. The following steps were performed to extract the best rotation gates and batch sizes for the small dataset: the best batch size was identified first, and the best rotation gate for each encoding technique was then selected using those batch sizes.
The training results for the extraction of the best batch size are presented in Table 1.
After the first step, the best batch sizes for D103 were found to be 8 and 32 with respect to accuracy, generalization and convergence time. Although the convergence time for a batch size of 64 is significantly lower than that of the other batch sizes, it falls short in terms of overall accuracy and generalization error. The same experiment was then repeated, only for the best batch sizes, with the other rotation gates in order to extract the best rotation gate for amplitude encoding. The training results are presented in Table 2.
Based on the experimental results shown in Table 1 and Table 2, we can conclude that, in terms of model accuracy, all four rotation gates have comparable performance, but outperforms and in terms of model convergence time. Although is slightly better in terms of overall accuracy, its convergence time is significantly higher than . Moreover, while using , the model converges faster, but it has reasonably low accuracy compared to the other gates, particularly . As the performs reasonably well with amplitude encoding, we use whenever we use the amplitude encoding technique in the next variants of hybrid networks.
Now, we test the same set of rotation gates for the angle encoding technique. We already have the best batch sizes for D103; therefore, we only train the model for the best batch sizes. The training results of HQCNNv1 with angle encoding are shown in Table 3.

Based on the results in Table 3, we observe that performs well with angle encoding in terms of accuracy, generalization and convergence time. Although has a relatively smaller generalization error, not only is the overall accuracy lower, but it also takes slightly more time to converge than . Hence, is found to be the best when the data are encoded using the angle encoding technique. Therefore, we use whenever we use angle encoding in the next variants of hybrid networks.
4.1.2. Large Dataset—D204
Now, we train the same model (HQCNNv1) on D204 to analyze the change in model behavior with respect to accuracy, generalization error and convergence time. This exercise provides grounds for a computational and expressibility (generalization ability) comparison of classical and hybrid networks, by comparing the convergence rate and generalization error of both models, respectively. We use batch sizes of 8 and 32 (the same as for D103) for a fair comparison. We train the model with both amplitude and angle encoding, using the corresponding best rotation gates, and analyze whether there is any improvement in model performance. We observe that increasing the dataset size results in almost the same accuracy with better generalization and an expected increase in model convergence time. The results are shown in Table 4.
The results of HQCNNv1 for both amplitude and angle encoding with the corresponding best parameters are visualized in Figure 7. We observe that, analogous to classical machine learning, the hybrid quantum–classical approach tends to perform better with more training and testing data. The accuracy and generalization of the hybrid model improve with an increase in the dataset size, as shown in Figure 7a,b, whereas the convergence time increases (Figure 7c), which is expected since there are more data on which to train the model. For this first variant, angle encoding performed slightly better in terms of accuracy, whereas amplitude encoding performed better in terms of generalization and convergence time.
4.2. Results—HQCNNv2
HQCNNv2, as shown in Figure 4, was implemented using the best parameters selected while experimenting with HQCNNv1. Similar to HQCNNv1, HQCNNv2 was trained on both D103 and D204 with the best batch sizes and best rotation gates for both amplitude and angle encoding. The training results for HQCNNv2 are presented in Table 5 and visualized in Figure 8.
As with HQCNNv1, we observe that in HQCNNv2 the accuracy and generalization error tend to improve when the dataset size is increased, while the model convergence time increases. Furthermore, both amplitude and angle encoding show comparable performance with respect to overall accuracy. In addition, amplitude encoding performed better in terms of generalization and convergence time for both D103 and D204.
4.3. Results—HQCNNv3
The training experiment for HQCNNv3 followed the same procedure as for HQCNNv1 and HQCNNv2. The tabular and graphical representations of the results obtained for both D103 and D204 with HQCNNv3 are shown in Table 6 and Figure 9, respectively.
Similar to HQCNNv1 and HQCNNv2, when the amount of data was increased, the overall accuracy and generalization of the model improved in HQCNNv3 as well, while the model took more time to converge on the larger dataset. Furthermore, for the smaller batch size (8), angle encoding performed worst in HQCNNv3, with almost no learning at all (lowest accuracy). Amplitude encoding exhibited better performance with respect to all three performance metrics (accuracy, generalization and convergence time) in HQCNNv3.
4.4. Results—CVa and CVb
We trained both classical models (CVa and CVb) on both D103 and D204 for the best batch sizes of 8 and 32. The training results for both classical models are summarized in Table 7 and visualized in Figure 10.
For D103, the classical model CVa performs slightly better than CVb in terms of convergence time, with almost the same performance in terms of accuracy and generalization. This is because CVb includes one extra hidden layer and hence tends to converge more slowly (particularly for a smaller batch size). For D204, both CVa and CVb have comparable accuracy. However, CVa generalizes slightly better because of its simpler architecture compared to CVb, which tends to overfit.
The overall performance of CVa and CVb is significantly better than that of HQCNNv1, particularly in terms of convergence time. This is mainly because of the classical nature of the data, which can be easily fed into the classical network. In contrast, in hybrid networks, the classical data features must be encoded into the quantum space, which is time-consuming and generally considered a bottleneck in hybrid quantum–classical algorithms. In order to demonstrate the potential quantum advantage of hybrid networks over purely classical networks, we consider the rate at which the underlying model's performance improves or deteriorates in terms of accuracy, generalization error and convergence time when training the networks on both D103 and D204.
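One plausible way to quantify such an improvement or deterioration rate is the relative change of a metric when moving from D103 to D204; the exact formula used in this work is not reproduced here, so the following is only an illustrative definition.

```python
# Illustrative definition of the improvement/deterioration rate between the two datasets.
def relative_change(metric_d103: float, metric_d204: float) -> float:
    return (metric_d204 - metric_d103) / metric_d103

# Example: accuracy of 0.90 on D103 and 0.93 on D204 gives roughly a +3.3% improvement rate.
print(f"{relative_change(0.90, 0.93):+.3f}")
```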
7. Conclusions
Quantum neural networks with large quantum data and fault-tolerant quantum devices can potentially outperform classical neural networks. However, the unavailability of large-scale universal fault-tolerant quantum computers and sufficiently large quantum datasets limits their practical relevance. On the other hand, NISQ devices have already been developed and have been demonstrated to outperform classical computers for certain tasks. Hybrid quantum–classical neural networks (HQCNNs) are largely being explored for NISQ devices, where a small portion of the neural network is designed in quantum space—typically the hidden layers, sandwiched between classical input and output layers. Using classical data, HQCNNs attempt to leverage the quantum advantage in neural networks. However, it is still not proven that HQCNNs have an advantage over classical NNs (particularly for classical data).
Recognizing the lack of a standardized methodology for designing quantum layers in HQCNNs, in this work we propose a systematic methodology to construct these quantum layers. Such quantum layers are typically constructed using variational or parametrized quantum circuits. However, before running the quantum layer, the classical data need to be encoded, which is the most important step in HQCNNs and is often considered the performance bottleneck. In this paper, we use two of the most commonly used encoding techniques, namely amplitude and angle encoding.
We propose three variants of HQCNNs. HQCNNv1 consists of four single-qubit layers and is tested with all commonly used parametrized gates and both encodings. We conclude that, with respect to our performance metrics (accuracy, generalization and convergence time), one rotation gate performs best with amplitude encoding, whereas a different gate is best suited to angle encoding. HQCNNv2 and HQCNNv3 then introduce moderate and more extensive entanglement, respectively.
We then compare all three hybrid variants, considering the overall rate at which each performance metric improves or deteriorates when the amount of data is increased, since we believe that quantum advantages become clearer with larger amounts of data.
The comparison of the hybrid variants does not lead to a single winner that is optimal for all scenarios and applications. Hence, the choice of variant is highly application-dependent. An overview of variant selection preferences for some applications is presented in Table 10. For instance, healthcare applications usually require higher accuracy, and thus HQCNNv1 would be the more appropriate choice. Similarly, applications such as recommendation systems can tolerate relatively low accuracy in exchange for faster model convergence, in which case HQCNNv3 would be the preferred variant. Finally, both HQCNNv2 and HQCNNv3 can be desirable choices for applications in which accuracy and convergence need to be balanced, such as facial and speech recognition.
When evaluating the encoding techniques, we conclude that amplitude encoding is significantly faster than angle encoding, mainly because it can encode 2^n features into n qubits, whereas angle encoding is slightly better in terms of overall accuracy, particularly for simple quantum circuits. Hence, amplitude encoding is recommended for applications where convergence time is the most important metric, and angle encoding where accuracy matters most.
We also compare our hybrid variants with two distinct classical counterparts. We observed that the accuracy improvement rate and model convergence are better in all hybrid variants for the majority of the experiments; hence, it is reasonable to conclude that, as the amount of data increases, the potential quantum advantages begin to improve the performance of HQCNNs relative to purely classical NNs. Although the classical models generalize slightly better than the hybrid variants, the generalization improvement rate of the hybrid variants remains quite comparable to that of the classical models.