1. Introduction
Due to unavoidable coupling to the noisy environment, the capability to precisely characterize quantum features in a large Hilbert space is needed. In general, the object of reconstruction is not the quantum state itself but the corresponding density matrix, since degradation transforms the target quantum state into a mixed state. For continuous variables with infinite dimensions, quantum state tomography (QST) based on quantum homodyne measurements has provided a useful tool for reconstructing quantum states [1,2]. Nowadays, QST has been successfully implemented as a crucial diagnostic toolbox for many quantum systems, including quantum optics [3,4], ultracold atoms [5,6], trapped ions [7,8], and superconducting circuit-QED devices [9].
By estimating the probability distribution closest to the data, the maximum likelihood estimation (MLE) method is one of the most popular approaches for reconstructing arbitrary quantum states [10]. However, MLE suffers from an overestimation problem, as the amount of measurement data required to reconstruct the quantum state increases exponentially with the number of modes involved. To overcome this overestimation, several alternative algorithms that impose physical restrictions on the state in question have been proposed, such as permutationally invariant tomography [11], quantum compressed sensing [12], tensor networks [13,14], generative models [15], and restricted Boltzmann machines [16]. Alternatively, with the capability to fit arbitrarily complicated symmetries using a limited number of parameters, machine-learning (ML) enhanced QST has been implemented experimentally, demonstrating fast, robust, and precise QST for continuous variables [16,17,18,19].
However, when dealing with continuous variables, even after truncating the Hilbert space to a finite dimension, a very large amount of data is still needed to reconstruct a truncated density matrix. In this work, instead of training the machine on the reconstruction model, we develop a characteristic model-based ML-QST that skips training on the truncated density matrix. Such a characteristic model-based ML-QST avoids the problem of dealing with a large Hilbert space while keeping high-precision feature extraction. With prior knowledge of the experimentally measured data generated from the balanced homodyne detectors, the direct parameter estimations, including the average photon numbers of the pure squeezed states, squeezed thermal states, and thermal reservoirs, agree with those acquired from the reconstruction model. Compared to the empirical fitting curves obtained from the covariance matrix, our characteristic model-based ML-QST also reveals all the degradation information in quantum noise squeezed states, indicating the loss and phase noise in the measured anti-squeezing. With the ability to monitor quantum states instantly, as well as to make feedback control possible, our experimental implementation provides a crucial diagnostic toolbox for all possible applications of squeezed states. Based on the direct parameter estimations from this ML-QST, applications to advanced gravitational wave detectors, quantum metrology, macroscopic quantum state generation, and quantum information processing can be readily realized.
The paper is organized as follows: in Section 2, we introduce the supervised machine learning-enhanced quantum state tomography based on the convolutional neural network (CNN). Then, the implementations of the reconstruction model and characteristic model are illustrated in Section 2.1 and Section 2.2, respectively. The comparisons of the predicted average photon numbers, as well as of the squeezing versus anti-squeezing curves against the experimental fittings, are demonstrated in Section 3, validating the feature extraction from our direct parameter estimations. Finally, we summarize this work with some perspectives in Section 4.
2. Supervised Machine Learning-Enhanced Quantum State Tomography
When applying MLE to reconstruct the target quantum state, data acquisition is performed with balanced homodyne detectors based on the covariance method or nullifiers [20,21,22]. However, in order to estimate the probability distribution function in different quadratures, at least three measurements must be performed at a fixed local oscillator (LO) phase. To reduce unwanted uncertainty in the fixed quadrature, precise phase locking of the LO phase is therefore also needed. However, in homodyne experiments, the repeatability of the piezoelectric transducer (PZT) drifts owing to airflow and temperature differences, introducing additional (phase) noise into the measurement system. Moreover, the validity of this method relies on the Gaussian properties of the reconstructed states [23]. Furthermore, information about unmeasured LO phases is missing due to the limitations of the selected measurements.
Instead, by scanning the LO phase $\theta$ from 0 to $2\pi$, referred to as a single-scan measurement of the quadrature sequence data $\{x_{\theta_i}\}$, our homodyne measurements contain all the information at different LO phases [24]. Intrinsically, the phase noise is automatically accounted for in our ML-QST [19]. A fast QST is possible with such a single-scan measurement by simply varying the LO phase. Here, the quadrature sequence data $\{x_{\theta_i}\}$ share a similarity with a sound (voice) pattern in a time series [25]. With prior knowledge of the squeezed states, a supervised ML with a CNN configuration is introduced in this work.
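As a concrete picture of such a single-scan measurement, the following sketch simulates a quadrature sequence for a squeezed vacuum whose variance depends on the swept LO phase. The parameter values and the vacuum-variance normalization (set to 1) are illustrative assumptions, not the experimental settings:

```python
import numpy as np

def single_scan_quadratures(r, theta_s=0.0, n_points=4096, seed=0):
    """Simulate a single-scan homodyne measurement: the LO phase is swept
    from 0 to 2*pi and one quadrature sample is drawn at each phase.
    r is the squeezing ratio and theta_s the squeezing angle."""
    rng = np.random.default_rng(seed)
    lo_phase = np.linspace(0.0, 2.0 * np.pi, n_points)
    # quadrature variance of a squeezed vacuum at LO angle theta
    var = np.exp(-2 * r) * np.cos(lo_phase - theta_s) ** 2 \
        + np.exp(+2 * r) * np.sin(lo_phase - theta_s) ** 2
    return lo_phase, rng.normal(0.0, np.sqrt(var))

phases, x = single_scan_quadratures(r=1.0)
```

The resulting sequence, noisy but phase-periodic, is exactly the kind of one-dimensional pattern the CNN below takes as input.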
As illustrated in Figure 1, by feeding noisy quadrature-sequence data acquired by quantum homodyne tomography into 17 convolutional layers, we take advantage of the good generalizability of CNNs [26]. In our one-dimensional (1D)-CNN kernel, five convolution blocks are used, each of which contains two convolution layers (filters) of different sizes. In order to tackle the gradient vanishing problem, which commonly happens in deep CNNs as the number of convolution layers increases, shortcuts are also introduced among the convolution blocks [27]. Then, after flattening the 1D-CNN kernel, we either apply extra fully connected layers to reconstruct the truncated density matrix (coined as the reconstruction model) or predict physical parameters directly (coined as the characteristic model). Below, the details of and differences between the reconstruction model and the characteristic model are described.
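A minimal numpy sketch of one such convolution block with a shortcut is given below. It is single-channel with hand-picked filters for illustration only; the actual kernel uses learned multi-channel filters:

```python
import numpy as np

def conv1d(x, w):
    """'Same'-padded single-channel 1D convolution via np.convolve."""
    return np.convolve(x, w, mode="same")

def residual_block(x, w1, w2):
    """One convolution block with a shortcut, in the spirit of the 1D-CNN
    kernel described in the text: two convolution layers (filters) plus an
    identity shortcut that mitigates vanishing gradients."""
    h = np.maximum(conv1d(x, w1), 0.0)   # first conv + ReLU
    h = conv1d(h, w2)                    # second conv
    return np.maximum(h + x, 0.0)        # add shortcut, then ReLU

x = np.sin(np.linspace(0, 8 * np.pi, 256))   # toy quadrature-like input
w1 = np.array([0.25, 0.5, 0.25])             # smoothing filter (illustrative)
w2 = np.array([-1.0, 2.0, -1.0])             # edge-enhancing filter (illustrative)
y = residual_block(x, w1, w2)
```

The shortcut addition `h + x` is what lets gradients flow past the convolution layers in a deep stack.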
2.1. Reconstruction Model
The target of implementing the reconstruction model is to predict the truncated density matrix. In the quantum noise squeezing experiments, we have three families of possible states, i.e., pure squeezed states, squeezed thermal states, and thermal states [28,29,30,31]. These three families can be described uniformly by a generic formula for squeezed thermal states:
$$\rho(r, \theta, \bar{n}) = \hat{S}(\xi)\,\rho_{\mathrm{th}}(\bar{n})\,\hat{S}^{\dagger}(\xi). \tag{1}$$
As shown in Equation (1), we have three characteristic parameters, $r$, $\theta$, and $\bar{n}$, corresponding to the squeezing ratio, squeezing angle, and average photon number, respectively. Here, $\hat{S}(\xi)$ denotes the squeezing transformation, with $\xi = r\,e^{i\theta}$; $\hat{S}(\xi) = \exp[(\xi^{*}\hat{a}^{2} - \xi\,\hat{a}^{\dagger 2})/2]$ and $\rho_{\mathrm{th}}(\bar{n}) = \sum_{n=0}^{\infty} \frac{\bar{n}^{n}}{(1+\bar{n})^{n+1}}\,|n\rangle\langle n|$.
One can see that when $r = 0$, Equation (1) describes the thermal states with average photon number $\bar{n}$, reflecting the corresponding temperature of the thermal reservoir, i.e., $\bar{n} = [\exp(\hbar\omega/k_{B}T) - 1]^{-1}$. However, when $\bar{n} = 0$, Equation (1) gives the pure squeezed vacuum state, characterized by its squeezing ratio $r$ and squeezing angle $\theta$. In training the machine, uniform sampling over the physical parameters $(r, \theta, \bar{n})$ is applied to generate the simulated quadrature sequences.
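The thermal-state relation between $\bar{n}$ and the reservoir temperature is the Bose-Einstein occupation, which can be checked numerically. The frequency and temperature values below are illustrative, not taken from the experiment:

```python
import math

# Bose-Einstein occupation: nbar = 1 / (exp(h*nu / (kB*T)) - 1)
H = 6.62607015e-34   # Planck constant (J*s)
KB = 1.380649e-23    # Boltzmann constant (J/K)

def nbar_thermal(nu_hz, temp_k):
    """Average photon number of a thermal reservoir at temperature temp_k."""
    return 1.0 / math.expm1(H * nu_hz / (KB * temp_k))

# At optical frequencies (1064 nm corresponds to ~282 THz), a
# room-temperature reservoir contributes a vanishingly small nbar.
nu_optical = 299792458.0 / 1064e-9
nbar_room = nbar_thermal(nu_optical, 300.0)
```

This is why, at optical wavelengths, a non-negligible $\bar{n}$ in the reconstructed state signals degradation mechanisms rather than genuine blackbody occupation.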
The task of our reconstruction model can be formulated as mapping the quadrature data to a truncated density matrix, i.e., learning an estimation function $f: x \mapsto \rho \in \mathbb{C}^{m \times m}$. Here, $m$ denotes the dimension of our truncated Hilbert space in the number-state basis. To avoid non-physical states, we impose the positive semi-definite constraint on the predicted density matrix: an auxiliary (lower triangular) matrix $T$ is introduced, and the predicted density matrix is generated through the Cholesky decomposition, i.e., $\rho = T T^{\dagger}/\mathrm{Tr}(T T^{\dagger})$. The training set for the quadrature data $x$ is the set formed by $\{(x^{(i)}, \rho^{(i)})\}_{i=1}^{N}$, where $N$ is the size of the training set, and $n = 4096$ is chosen as the number of sampling data points in a quadrature sequence. Our target is to train the machine to learn the function $f$, which maps $x$ to $\rho$. This estimation function can be approximated by a deep neural network parametrized by trainable weight variables $W^{(l)}$, with $l$ corresponding to the $l$-th layer of the deep neural network, i.e., $f(x; \{W^{(l)}\})$. The training process minimizes the mean squared error (MSE), while the optimizer used for training is Adam, a well-adopted optimization method for finding the minimum of the cost function of a neural network.
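The positive semi-definite construction described above can be sketched as follows; the dimension and the random stand-in for the network's flattened output are illustrative:

```python
import numpy as np

def density_from_cholesky(raw, m):
    """Map an unconstrained real vector of length m*m (standing in for the
    network's flattened output) to a valid density matrix: build a lower
    triangular complex T, then rho = T T^dagger / Tr(T T^dagger), which is
    positive semi-definite with unit trace by construction."""
    t = np.zeros((m, m), dtype=complex)
    iu = np.tril_indices(m, k=-1)
    n_off = len(iu[0])
    t[np.diag_indices(m)] = raw[:m]                        # real diagonal
    t[iu] = raw[m:m + n_off] + 1j * raw[m + n_off:m + 2 * n_off]
    rho = t @ t.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
m = 4                                     # toy truncation dimension
rho = density_from_cholesky(rng.normal(size=m * m), m)
```

Because positivity and unit trace hold for any input vector, the network can be trained without explicit physicality penalties in the loss.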
We take the batch size as 32 in the training process. In this setting, the network is trained for 70 epochs to drive down the loss (MSE). Moreover, normalization is applied during the training process in order to ensure that the trace of the output density matrix is kept at 1. Furthermore, to improve the performance in feature extraction and to reduce the number of parameters, dense connections are also introduced in our 1D-CNN kernel [32]. This makes our 1D-CNN model more efficient and lightweight. Finally, as shown schematically in Figure 1, after flattening, the predicted matrices are used to reconstruct the truncated density matrices.
By considering our quantum optics experiments with a maximum squeezing level up to 10 dB and a maximum anti-squeezing level up to 20 dB, we truncate the photon-number basis so that the summed probability remains close to unity. More than one million datasets (exactly, 1,200,000) are fed into our machine, covering all possible combinations of pure squeezed states, squeezed thermal states, and thermal states over a variety of squeezing levels, quadrature angles, and reservoir temperatures. All training is carried out with the Python package tensorflow.keras on a GPU (Nvidia Titan RTX) and typically finishes in less than one hour; on a standard GPU server, a single prediction with our well-trained machine learning-enhanced QST takes an average time at the millisecond level (averaged over 100 runs).
Regarding the hyper-parameters (filter sizes of each layer), in Table 1 we provide information about the architecture and parameters used in our 1D-CNN kernel. The parameters in this table correspond to the kernel size $k$ and channel length $n$ of each layer. Here, the size of our density matrix is $m \times m$. There are also convolutions in the shortcuts with dense connections [32].
2.2. Characteristic Model
In general, supervised ML performs a regression task, here predicting a truncated density matrix for quantum state tomography. However, as shown in Equation (1), the target mixed state is just a combination drawn from three families composed of pure squeezed states, squeezed thermal states, and thermal states. These physical states can be described by a few simple physical parameters. Therefore, in addition to reconstructing the density matrix, one can also train a machine to predict the parameters directly, coined as the characteristic model.
In the quantum noise squeezing experiments, the parameter set defined by $(r, \theta, \bar{n})$ should provide enough information about the output measurements, which are the measured squeezing level (SQZ) and anti-squeezing level (ASQZ). This characteristic model helps us avoid the problems that occur when dealing with a high-dimensional Hilbert space. Compared to the reconstruction model, the task of our supervised estimation is now mapping the estimated function to the physical parameters directly, i.e., $f: x \mapsto (r, \theta, \bar{n})$.
As marked in Figure 1 with the shadowed background, we can directly generate these three physical parameters without any additional fully connected layers. In this characteristic model, after the convolution kernel completes the feature extraction, we do not need to apply the fully connected layers, but just perform a linear transformation to predict the characteristic values of the quantum states. As in the reconstruction model, we take the batch size as 32 in the training process; with this setting, the network is trained for 30 epochs to keep the error (MSE) sufficiently small.
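The output stage of the characteristic model can be sketched as a single linear map on the flattened CNN features. The feature size and the random weights below are placeholders standing in for trained values:

```python
import numpy as np

def characteristic_head(features, w, b):
    """Characteristic model output stage: instead of fully connected
    layers, a single linear transformation maps the flattened 1D-CNN
    features to the three physical parameters (r, theta, nbar)."""
    return features @ w + b

rng = np.random.default_rng(0)
features = rng.normal(size=128)          # toy flattened CNN features
w = rng.normal(size=(128, 3)) * 0.01     # placeholder trained weights
b = np.zeros(3)                          # placeholder trained bias
r_pred, theta_pred, nbar_pred = characteristic_head(features, w, b)
```

With only three outputs, this head has far fewer parameters than the dense layers needed to emit an $m \times m$ matrix, which is the source of the model-size advantage discussed below.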
The advantages of applying this characteristic model come from the absence of any post-processing. Of course, one can also calculate these physical parameters from the reconstructed density matrix. However, as fewer model parameters (a smaller architecture) are involved, we also avoid the possible errors caused by the truncation of the density matrix. In the following, we demonstrate the implementation of this characteristic model-based ML in the laboratory by directly and quickly inferring the values of $(r, \theta, \bar{n})$ in the quantum noise squeezing experiments.
3. Comparison between the Reconstruction and Characteristic Models
As reported in Ref. [19], our quantum noise squeezed states are generated through an optical parametric oscillator cavity with a periodically poled KTiOPO$_4$ (PPKTP) crystal inside. Experimentally, operated below threshold at the wavelength of 1064 nm, the quantum homodyne tomography is performed by collecting the quadrature sequence with a spectrum analyzer at a fixed sideband frequency in the MHz range, with 100,001 data points, 100 kHz RBW (resolution bandwidth), and 100 Hz VBW (video bandwidth). The phase of the LO is scanned with a 1 Hz triangle waveform. As the pump power increases to 70 mW, the corresponding noise levels for squeezing (SQZ) and anti-squeezing (ASQZ) in decibels (dB) are measured. In training the reconstruction model, a uniform distribution is used to sample the value of the LO angle. Here, we feed 4096 sampling points from the experimental datasets (5,000,000 data points). Our well-trained reconstruction model-based ML-QST has demonstrated its advantage in keeping the fidelity of the predicted density matrix high [19].
Now, to verify the physical parameter estimation with the characteristic model, in Figure 2, we compare the predicted average photon number, as a function of pump power, between (a) the characteristic model and (b) the reconstruction model. As the pump power increases, the characteristic and reconstruction models agree well in predicting the three curves of average photon numbers for the measured data, the pure squeezed state, and the non-pure components, denoted as (para est) for the parameter estimation and (dmtx) for the density matrix in Figure 2a and Figure 2b, respectively. Besides the monotonic increase of these three curves, both models also reveal the cross-over between the pure squeezed states and the non-pure components, shown in blue and green. This cross-over indicates that the non-pure components become dominant at higher pump power, which degrades the quantum noise, resulting in ASQZ being larger in magnitude than SQZ.
We want to remark that, unlike the reconstruction model, the characteristic model predicts the physical parameters directly without any post-data processing. In the reconstruction model, by contrast, a singular value decomposition is first applied to the predicted density matrix (dmtx); only with the resulting coefficient for the pure squeezed state can the weighting of the non-pure components be known. However, as one can see in Figure 2, when the pump power is larger than 40 mW, a larger discrepancy between the characteristic and reconstruction models appears. It is known that when the pump power increases, many additional effects may cause degradation. Without any prior knowledge of these additional effects, such as heating in the crystals, a shift of the resonance frequency, and/or other nonlinear mechanisms, the parameter estimations (para est) over-estimate the predicted average photon numbers, yielding larger values than the predicted density matrix (dmtx).
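The post-processing step used with the reconstruction model can be sketched as follows. For a Hermitian density matrix, the singular values coincide with the eigenvalues, so an eigendecomposition suffices; the toy mixed state below is illustrative:

```python
import numpy as np

def purity_split(rho):
    """Decompose a predicted density matrix into its eigenbasis and read
    off the weight of the dominant (pure) component versus the remaining
    non-pure components."""
    evals = np.linalg.eigvalsh(rho)   # eigenvalues in ascending order
    c0 = evals[-1]                    # dominant pure-state weight
    return c0, 1.0 - c0               # (pure, non-pure) weighting

# toy mixed state: 80% of one pure state plus 20% maximally mixed (m = 4)
m = 4
pure = np.zeros((m, m)); pure[0, 0] = 1.0
rho = 0.8 * pure + 0.2 * np.eye(m) / m
c0, rest = purity_split(rho)
```

The characteristic model skips this entire step, which is exactly the post-processing advantage noted above.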
As a crucial diagnostic toolbox for practical applications, we also compare our ML-QST, with both the reconstruction and characteristic models, against the experimental fitting curves for the degradation of squeezed states. In the experiments, the degradation of quantum noise squeezing is typically described by the squeezing versus anti-squeezing curve, as shown in Figure 3.
For the ideal case, without any degradation, the squeezing and anti-squeezing levels should be the same, located along the black line in Figure 3. However, the measured squeezing level is limited by the phase noise and loss mechanisms coupled to the environment and the surrounding vacuum. Empirically, to estimate the loss and phase noise, not a single set of quadrature data but a series of sets of quadrature data must be measured in order to obtain accurate fitting parameters for the EXP-fitting (covariance fitting). The measured squeezing ($V_{-}$) and anti-squeezing ($V_{+}$) levels can be modeled by taking the optical loss (denoted as $L$) and the phase noise (denoted as $\tilde{\theta}$) into account:
$$V_{\mp} = (1 - L)\left[S_{\mp}\cos^{2}\tilde{\theta} + S_{\pm}\sin^{2}\tilde{\theta}\right] + L, \tag{2}$$
where $S_{-}$ and $S_{+}$ are the squeezing and anti-squeezing levels in the ideal case, respectively. As shown in
Figure 3, the optimal fitting curve obtained by orthogonal distance regression is depicted in green, along with the corresponding standard deviation (one-sigma variance) shown by the shaded region. As shown in Figure 3, an accurate EXP-fitting can only be obtained by performing measurements at many (in our illustration, 12) different pump power levels.
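The loss-plus-phase-noise degradation model above can be sketched numerically. The input values (10 dB ideal squeezing, 10% loss, 30 mrad phase noise) are illustrative assumptions, not fitted experimental parameters:

```python
import numpy as np

def measured_levels_db(sqz_ideal_db, loss, phase_noise_rad):
    """Degradation model: the ideal squeezed/anti-squeezed variances mix
    through the phase fluctuation, and the optical loss L admixes vacuum.
    Variances are relative to vacuum = 1."""
    s = 10 ** (-sqz_ideal_db / 10.0)   # ideal squeezed variance (< 1)
    a = 10 ** (+sqz_ideal_db / 10.0)   # ideal anti-squeezed variance (> 1)
    c2 = np.cos(phase_noise_rad) ** 2
    s2 = np.sin(phase_noise_rad) ** 2
    v_sqz = (1 - loss) * (s * c2 + a * s2) + loss
    v_asqz = (1 - loss) * (a * c2 + s * s2) + loss
    return 10 * np.log10(v_sqz), 10 * np.log10(v_asqz)

sqz_db, asqz_db = measured_levels_db(10.0, 0.10, 0.030)
```

Note the asymmetry this model produces: phase noise mixes the large anti-squeezed variance into the squeezed quadrature, so the measured squeezing degrades much faster than the measured anti-squeezing.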
Moreover, the success of the EXP-fitting relies on the common belief that the loss and phase noise can be estimated as long as the system is stable. Nevertheless, as we illustrate, such a common belief is only valid at a low degree of squeezing (less than 5 dB). On the contrary, when the pump power increases, many additional effects occur, such as a shift of the resonance frequency due to heating in the crystals and/or other nonlinear mechanisms, resulting in an increase in loss [19].
In contrast, with only a single-scan measurement, our ML-QST based on the reconstruction model (dmtx) and the characteristic model (para est) both agree with the experimental data, depicted by the blue-dashed and red-dotted curves, respectively, in our figures. The curves shown in Figure 3 clearly demonstrate that our well-trained ML-QST can extract the degradation information in quantum states not only very precisely, but also very quickly. Compared to the time-consuming MLE, our methodology paves the road toward real-time and online QST [33,34]. For example, this machine learning-enhanced QST has also been applied to the reconstruction of the Wigner current [33], which can be achieved with this methodology in a more efficient manner than with traditional methods.