1. Introduction
Artifacts in astronomical images from ground-based telescopes are mainly caused by the turbulent nature of the atmosphere. Nowadays, large ground-based telescopes integrate adaptive optics (AO) systems to improve image quality, and AO has become an indispensable technique for astronomical observation. The main objective of AO is to correct the aberrations present in the received light [1,2,3].
Along the whole path from the light source to the telescope, most of the aberrations are introduced by atmospheric turbulence. AO systems correct them by recovering, at each moment, the wavefront that describes the aberrations and using this information to remove most of the defects in the received light. In this work, the celestial body under study is the Sun. The differences between nocturnal and diurnal observations are subtle. Although the goal in both cases is a similar correction, correction methods must cope with the differences in distance and in the size of the observed object.
Atmospheric turbulence is a random phenomenon that changes constantly. AO systems must therefore compute the corrections as fast as possible, ideally in real time.
The new solar telescopes currently under development have larger pupil diameters and a wider field of view (FOV). For AO systems, this means more detailed observations and more data to process, still in real time. Improved and better-optimized AO systems are therefore needed to obtain the maximum performance from these telescopes.
Telescopes can be classified into two large groups: those for night-time observations and those for daytime (solar) observations. AO systems currently work in both, but they are more developed in the nocturnal case. There, AO is part of every large telescope and is essential for its operation [4,5,6]. In the last 15 years, Solar AO has performed well with the simplest daytime configurations, such as Single-Conjugate Adaptive Optics (SCAO), and SCAO reconstructors are already implemented in some real telescopes. However, the new-generation telescopes still under construction will collect much larger amounts of data. These telescopes require more complex setups such as Ground-Layer Adaptive Optics (GLAO) or Multi-Conjugate Adaptive Optics (MCAO). These configurations exist for both nocturnal and diurnal AO, but in the diurnal case the technology is still being developed.
AO systems take the measurements from the wavefront sensors (WFSs) installed in the telescope and compute corrections for the measured aberrations. The deformable mirrors (DMs) then apply these corrections. Both components are described in the next section. Several algorithms have been developed to compute a correction from the WFS data, especially for the nocturnal case [1]. The main goal of this work is to use Artificial Neural Networks (ANNs) to tackle the reconstruction problem in Solar AO, in particular in the SCAO configuration.
Artificial Intelligence (AI) methods have performed well in several fields, such as language processing and image classification. In science, they have been used to improve the numerical approximations employed to simplify complex physical systems [7]. In AO, numerous ANN models have been used as reconstructors for nocturnal observations with good results. In recent years, they have also started to be tested as solar reconstructors, with promising results.
However, these promising results were limited to the regions of the Sun on which the neural network models had been trained. When applied to a region of the solar surface unknown to the network, the models cannot interpret the data correctly. They confuse the effects of turbulence with unfamiliar solar features such as sunspots, which results in very low-quality reconstructions [8,9,10]. This work aims to solve this problem by proposing a new neural network model capable of producing good-quality reconstructions of atmospheric turbulence in unknown regions.
Throughout this work, a first approach to a new ANN-based reconstructor for Solar AO is presented. Section 2 describes the materials and methods, with subsections on Solar AO, artificial neural networks, the simulation platform used to generate the data, and the experimental setup. Section 3 presents the results, which are discussed in Section 4. Finally, Section 5 contains the main conclusions of this work.
2. Materials and Methods
In this section, the main concepts of Adaptive Optics (AO) and Artificial Neural Networks (ANNs) are briefly introduced. The third subsection presents the Durham Adaptive optics Simulation Platform (DASP), the computational tool used for the simulations needed in this work. The fourth subsection describes the experimental setup.
2.1. Solar Adaptive Optics
AO is a fundamental technique in current ground-based telescopes to improve the quality of the images received from celestial bodies. It is used in both nocturnal and diurnal observations. Although the two cases are similar in many ways, each must deal with its own specific difficulties [1].
Atmospheric turbulence aberrates the wavefronts of the light received from celestial bodies. On the way from the source to the atmosphere, the wavefronts suffer practically no aberrations, so they would reach the telescope approximately flat. However, the properties of the atmosphere distort them completely along their path through it.
The atmosphere can be modeled as a set of air layers with their own dynamics, moving in different directions and with different humidity, density, temperature, etc. Each atmospheric layer therefore has its own refractive index, so the effect can be roughly understood as the incoming light passing through a set of lenses. Furthermore, the layers move continuously relative to one another, which makes atmospheric turbulence a random, constantly changing phenomenon.
The Kolmogorov statistical model is one of the most widely used models to represent the behavior of atmospheric turbulence [11]. In short, this model describes turbulence as an energy cascade. Energy, ultimately of solar origin, is injected at the largest scales, where the largest turbulent structures form, and is passed on to progressively smaller structures. The large structures evolve more slowly, which makes them easier to reconstruct. In practice, the turbulent atmosphere is represented as a set of layers at different heights. Each layer is characterized by a set of parameters, of which the two main ones are the height of the layer and the intensity of the turbulence. The latter is represented by Fried's coherence length ($r_0$), measured in cm. Physically, $r_0$ is the diameter of a telescope pupil that, in the absence of turbulence, would offer the same resolving power as a large telescope observing through that atmosphere [1,12]. Its value is obtained from the structure function of the refractive index, which is calculated from the refractive index at two points separated by a distance $\mathbf{r}$:

$$D_n(\mathbf{r}) = \left\langle \left[ n(\mathbf{x}) - n(\mathbf{x} + \mathbf{r}) \right]^2 \right\rangle$$

where $n(\mathbf{x})$ and $n(\mathbf{x} + \mathbf{r})$ are the refractive index values at each point. Certain simplifications of this expression lead to a simpler structure function. Under inertial-range conditions, these simplifications are:

- local homogeneity: the statistics depend only on $\mathbf{r}$;
- local isotropy: the statistics depend only on the modulus $r = |\mathbf{r}|$;
- incompressibility of the turbulence: $\nabla \cdot \mathbf{v} = 0$.

With these conditions, the structure function of the refractive index simplifies to:

$$D_n(r) = C_n^2\, r^{2/3}$$

$C_n^2(h)$ represents the profile of the atmospheric turbulence and gives the strength of the turbulence at height $h$. From this parameter, Fried's parameter is defined as:

$$r_0 = \left[ 0.423\, k^2 \sec\gamma \int_{\mathrm{path}} C_n^2(h)\, dh \right]^{-3/5}$$

where the integral is taken along the optical path followed by the light, $k = 2\pi/\lambda$ is the wavenumber, and $\sec\gamma$ is a correction factor for the zenith angle $\gamma$ of observation. This factor accounts for the change, with the angle of observation, in the length of the path through the atmosphere along the line of sight. Because of the negative exponent, the turbulence becomes more intense as $r_0$ decreases, which is an important property of this parameter. Both $r_0$ and the layer height are used later to characterize the turbulence in each case.
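The following Python sketch illustrates the definition of Fried's parameter, $r_0 = [0.423\, k^2 \sec\gamma \int C_n^2(h)\, dh]^{-3/5}$ with $k = 2\pi/\lambda$, for a layered atmosphere. It replaces the path integral with a sum of per-layer $C_n^2\,dh$ values. The function names, the 500 nm default wavelength, and this layer representation are assumptions for illustration only; they are not taken from DASP or from the authors' code.

```python
import numpy as np

def fried_parameter(layer_cn2dh, wavelength_m=500e-9, zenith_deg=0.0):
    """Fried parameter r0 (m) of a layered atmosphere.

    layer_cn2dh: integrated strength Cn^2 * dh of each layer (m^(1/3)); their sum
    replaces the path integral of the continuous definition.
    """
    k = 2.0 * np.pi / wavelength_m                     # optical wavenumber (rad/m)
    sec_gamma = 1.0 / np.cos(np.radians(zenith_deg))   # zenith-angle (airmass) factor
    total = float(np.sum(layer_cn2dh))
    return (0.423 * k**2 * sec_gamma * total) ** (-3.0 / 5.0)

def cn2dh_for_r0(r0_m, wavelength_m=500e-9, zenith_deg=0.0):
    """Inverse relation: total integrated Cn^2 that yields a given r0."""
    k = 2.0 * np.pi / wavelength_m
    sec_gamma = 1.0 / np.cos(np.radians(zenith_deg))
    return r0_m ** (-5.0 / 3.0) / (0.423 * k**2 * sec_gamma)

# Hypothetical example: a single layer giving r0 = 10 cm at 500 nm toward the zenith
j_total = cn2dh_for_r0(0.10)
print(fried_parameter([j_total]))                   # approximately 0.10
print(fried_parameter([j_total], zenith_deg=60.0))  # smaller r0: longer path through the turbulence
```

Splitting the same total strength across several layers leaves $r_0$ unchanged, because only the integral enters the definition. Scaling the total strength by a factor $s$ changes $r_0$ by $s^{-3/5}$.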
To reconstruct the turbulence, Zernike polynomials are used in this work. They form a sequence of polynomials that are orthogonal on the unit disk. They are widely used in optics because they factorize simply into radial and azimuthal functions [13,14]. The turbulence at each moment is described by the coefficients of the polynomials in the sequence: the higher the order reached, the more accurate the reconstruction.
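The sketch below shows how a phase map can be built from a list of Zernike coefficients. It assumes Noll's ordering and normalization, a common convention, although the text does not state which one was used. It is an illustrative NumPy re-implementation, not the AOtools code used by the authors, and `phase_from_zernikes` and its `first_j` parameter are hypothetical names.

```python
import math
import numpy as np

def noll_to_nm(j):
    """Noll index j (1 = piston) -> radial order n and signed azimuthal frequency m."""
    n = int((-1.0 + math.sqrt(8 * (j - 1) + 1)) / 2.0)
    p = j - n * (n + 1) // 2
    k = n % 2
    m = ((p + k) // 2) * 2 - k
    if m != 0 and j % 2 == 1:
        m = -m                      # odd j -> sine term, even j -> cosine term
    return n, m

def zernike_radial(n, m, rho):
    """Radial polynomial R_n^|m|(rho)."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * math.factorial(n - s)
             / (math.factorial(s)
                * math.factorial((n + m) // 2 - s)
                * math.factorial((n - m) // 2 - s)))
        out += c * rho ** (n - 2 * s)
    return out

def zernike_noll(j, rho, theta):
    """Noll-normalized Z_j (unit RMS over the unit disk)."""
    n, m = noll_to_nm(j)
    radial = zernike_radial(n, m, rho)
    if m == 0:
        return math.sqrt(n + 1) * radial
    angular = np.cos(m * theta) if m > 0 else np.sin(-m * theta)
    return math.sqrt(2 * (n + 1)) * radial * angular

def phase_from_zernikes(coeffs, size, first_j=1):
    """Phase on a size x size grid: sum of coeffs[i] * Z_(first_j + i); zero outside the pupil."""
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    phase = np.zeros((size, size))
    for offset, a in enumerate(coeffs):
        phase += a * zernike_noll(first_j + offset, rho, theta)
    return phase * (rho <= 1.0)

# Hypothetical use: 150 coefficients starting at Noll index 4 (i.e., after piston, tip, tilt)
phase = phase_from_zernikes(np.random.default_rng(0).normal(size=150) * 0.1, 128, first_j=4)
```

With `first_j=4`, the first coefficient multiplies defocus, which is one way of skipping piston, tip, and tilt. Whether the 150 coefficients used in this work start exactly there is an assumption.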
Among the components of an AO system, the Shack–Hartmann (SH) WFS is the instrument most widely used in telescopes to measure the shape of the incoming wavefronts [1,15]. In this work, it is also the source of the ANN inputs.

The SH consists of an array of identical lenslets that divide the incoming wavefront into a set of subapertures. There is one subaperture per lenslet, and each subaperture receives an image of the same region of the Sun. The images in different subapertures show subtle differences despite coming from the same area of the Sun. These differences are caused by the atmospheric turbulence present at each moment. They are modeled mathematically as displacements in the position variables, which are later used to estimate the corrections.

The SH cross-correlates the subaperture images, typically against a reference subaperture image, to obtain the center of gravity of each one, which is called a centroid. In an ideal scenario without turbulence, each centroid would lie at the center of its subaperture. In practice, the aberrations shift the images and each centroid is displaced accordingly. The centroid positions are the inputs that the ANN uses to reconstruct the atmospheric turbulence.
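The correlation-based centroiding described above can be sketched for a single subaperture as follows. The sketch assumes square subaperture images and uses a circular FFT cross-correlation against a reference image. It then refines the correlation peak with a 3 × 3 center of gravity. Real solar SH pipelines, including DASP's, add windowing, apodization, and other refinements that are omitted here. All names are hypothetical.

```python
import numpy as np

def cross_correlation(ref, img):
    """Circular cross-correlation of two equal-size subaperture images via FFT."""
    fr = np.fft.fft2(ref - ref.mean())
    fi = np.fft.fft2(img - img.mean())
    return np.real(np.fft.ifft2(fi * np.conj(fr)))

def subaperture_shift(ref, img):
    """Estimate the (dy, dx) shift of img relative to ref from the correlation peak.

    The peak is refined with a center of gravity over its 3x3 neighbourhood,
    i.e. the 'centroid' of the correlation.
    """
    corr = np.fft.fftshift(cross_correlation(ref, img))
    ny, nx = corr.shape
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    ys = np.arange(py - 1, py + 2) % ny
    xs = np.arange(px - 1, px + 2) % nx
    win = corr[np.ix_(ys, xs)]
    win = win - win.min()                         # non-negative weights
    offs = np.array([-1.0, 0.0, 1.0])
    total = win.sum()
    cy = py + (win.sum(axis=1) @ offs) / total
    cx = px + (win.sum(axis=0) @ offs) / total
    return cy - ny // 2, cx - nx // 2             # zero shift sits at the center after fftshift

# Usage: a random patch standing in for solar granulation, shifted by (2, -3) pixels
rng = np.random.default_rng(0)
ref = rng.random((16, 16))
img = np.roll(ref, shift=(2, -3), axis=(0, 1))
print(subaperture_shift(ref, img))
```

Repeating this for every valid subaperture gives one (dy, dx) pair per subaperture, which is the kind of slope data the ANN receives.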
Solar SCAO (Single-Conjugate Adaptive Optics)
In this work, the SCAO configuration is used to evaluate the phases recovered by the reconstruction algorithm. It uses a single SH working on-axis, measuring a given region of the Sun. Its main objective is to reconstruct the ground turbulence layer (GL), the one closest to the telescope aperture, in the direction of observation. The GL is usually the most problematic layer [16]. It is typically formed by thin, successive sheets of turbulence that vary independently, with a much shorter coherence time than the higher layers. As a result, the turbulence in one direction can be completely different from that in another.
SCAO systems also include a DM working in closed loop with the SH. This means that the DM corrects the light before the SH measures it. The SH therefore measures the wavefront already corrected with the command computed at the previous instant, with most of the aberrations removed.
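A closed loop of this kind is often driven by a simple integrator. The following sketch assumes a linear, noise-free SH described by a hypothetical interaction matrix `G` acting on modal coefficients. It also assumes a least-squares reconstructor (the pseudo-inverse of `G`) and static turbulence. It only illustrates why the SH sees a residual that shrinks over the iterations. It is not the controller used in this work, which the text does not specify.

```python
import numpy as np

def closed_loop(phi_turb, G, gain=0.5, n_iter=30):
    """Integrator loop against static turbulence; returns the residual norm per iteration.

    phi_turb : turbulence as modal coefficients (hypothetical modal basis, no piston)
    G        : hypothetical interaction matrix, residual modes -> SH slopes
    """
    R = np.linalg.pinv(G)                  # least-squares reconstructor
    dm = np.zeros_like(phi_turb)           # DM shape, expressed in the same modes
    history = []
    for _ in range(n_iter):
        residual = phi_turb - dm           # light already corrected by the DM
        slopes = G @ residual              # the SH only sees the residual
        dm = dm + gain * (R @ slopes)      # integrate the reconstructed correction
        history.append(float(np.linalg.norm(phi_turb - dm)))
    return history

rng = np.random.default_rng(0)
G = rng.normal(size=(354, 150))            # 354 slopes and 150 modes, sizes taken from the text
phi = rng.normal(size=150)
hist = closed_loop(phi, G)
print(hist[0] / np.linalg.norm(phi))       # about 0.5 with gain 0.5
```

With a full-rank `G` and no noise, each iteration removes a fraction `gain` of the remaining residual. Real loops must also handle noise, time-varying turbulence, and latency.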
2.2. Artificial Neural Networks (ANNs)
Artificial neural networks consist of a set of interconnected processing units, called neurons, that try to mimic the behavior of biological neural networks. The simplest ANN architecture, and the one selected for this work, is the Multi-Layer Perceptron (MLP). MLPs have shown good results as AO reconstructors for night-time observations, for instance in CARMEN, the first ANN-based AO reconstructor [17,18].
The neurons are arranged in layers, and each neuron is connected to the neurons of the following layer through connections regulated by weights [19,20]. The last layer is the output layer, which provides the final outputs of the ANN; these can be real numbers, a vector, a category, etc. In this work, the final output is the first 150 Zernike polynomial coefficients.
Each neuron applies a mathematical function, called the activation function, to the data it receives and transmits the result to the neurons of the next layer. Several functions can be used as activation functions, and the choice depends on which best suits the problem to be modeled. In the simplest case, a neuron receiving a single input value $x$ produces an output $x'$ as follows:

$$x' = \sigma(w x + b)$$

where $\sigma$ is the activation function, $b$ is the bias, and $w$ is the weight of the corresponding connection. The feedforward process determines how the information is passed between layers. For layer $l$, with weight matrix $W^{(l)}$ and bias vector $\mathbf{b}^{(l)}$:

$$\mathbf{a}^{(l)} = \sigma\left( W^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)} \right)$$

where $\mathbf{a}^{(0)}$ is the input vector.
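The sketch below implements the feedforward rule $\mathbf{a}^{(l)} = \sigma(W^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)})$ in NumPy. The MLP maps the 354 SH slopes to 150 Zernike coefficients, the input and output sizes given in the experimental setup of this work. The hidden-layer widths, the ReLU activation, the He initialization, and the linear output layer are assumptions. The text only states that the network has two hidden layers.

```python
import numpy as np

N_SLOPES = 354    # SH slopes per sample (from the experimental setup)
N_ZERNIKE = 150   # Zernike coefficients per sample (from the experimental setup)

def relu(z):
    return np.maximum(z, 0.0)

def init_mlp(layer_sizes, seed=0):
    """Random (He) weights and zero biases for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
        params.append((W, np.zeros(n_out)))
    return params

def feedforward(params, x, activation=relu):
    """a^(l) = sigma(W^(l) a^(l-1) + b^(l)); the last layer is left linear for regression."""
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b
        a = z if i == len(params) - 1 else activation(z)
    return a

# Two hidden layers of hypothetical width 512
params = init_mlp([N_SLOPES, 512, 512, N_ZERNIKE])
slopes = np.random.default_rng(1).normal(size=N_SLOPES)
coeffs = feedforward(params, slopes)
print(coeffs.shape)   # (150,)
```

A linear output layer is a common choice when the targets are unbounded real values such as Zernike coefficients.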
The main characteristic of neural networks is that they can learn from data during the training process. This procedure consists of feeding the ANN with sets of data whose ideal output is known. The network compares the ideal result with its own output and computes the error. The weights of the connections between neurons are first initialized randomly. Then, to minimize the output error, the error is backpropagated through the layers of the network and the weights are updated with a gradient descent algorithm [7].

This procedure, known as backpropagation, builds on classical approaches to mathematical optimization. A loss function estimates how far the network response is from the desired output and, therefore, how the weights should be tuned. Backpropagation searches for the direction of steepest descent on the high-dimensional surface defined by this loss function. Gradient descent gives this direction but not the length of the step to take along it, so numerous iterations must be performed. Several variants, such as Nesterov's accelerated gradient or Adagrad, have been developed to choose the step more effectively. After each weight update, the process is repeated. Each complete pass of the training dataset through the network is called an epoch, and the number of epochs characterizes each training process.
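The training loop described above can be sketched for a one-hidden-layer network with a tanh activation and a mean-squared-error loss. Backpropagation (the chain rule, layer by layer) gives the gradients. Plain gradient descent then applies them with a fixed step, or learning rate. Each iteration uses the whole toy dataset, so it corresponds to one epoch. The network size, activation, learning rate, and data are all illustrative assumptions.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Randomly initialized weights (scaled by fan-in) and zero biases."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def forward(p, X):
    """Batch forward pass: tanh hidden layer, linear output layer."""
    H = np.tanh(X @ p["W1"].T + p["b1"])
    Y = H @ p["W2"].T + p["b2"]
    return Y, H

def mse(Y, T):
    return float(np.mean((Y - T) ** 2))

def backprop(p, X, T):
    """Gradients of the MSE loss with respect to every parameter (chain rule)."""
    Y, H = forward(p, X)
    dY = 2.0 * (Y - T) / Y.size              # dL/dY for a mean over all entries
    grads = {"W2": dY.T @ H, "b2": dY.sum(axis=0)}
    dZ1 = (dY @ p["W2"]) * (1.0 - H ** 2)    # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    grads["W1"] = dZ1.T @ X
    grads["b1"] = dZ1.sum(axis=0)
    return grads

def train(X, T, n_hidden=16, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent: each iteration is one epoch over the data."""
    rng = np.random.default_rng(seed)
    p = init_params(X.shape[1], n_hidden, T.shape[1], rng)
    history = []
    for _ in range(epochs):
        grads = backprop(p, X, T)
        for name in p:
            p[name] -= lr * grads[name]       # fixed step along the negative gradient
        history.append(mse(forward(p, X)[0], T))
    return p, history

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
T = 0.5 * X @ rng.normal(size=(8, 4))
params, history = train(X, T)
print(history[0], history[-1])                # the loss decreases over the epochs
```

The fixed learning rate is exactly the "step length" that plain gradient descent leaves open. Variants such as Nesterov or Adagrad, and the Adam optimizer used later in this work, adapt it.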
In this case, a Multi-Layer Perceptron (MLP) ANN is used, one of the simplest architectures of ANNs.
2.3. Durham Adaptive Optics Simulation Platform (DASP)
DASP [21] is the simulation platform chosen to generate the data used to train and test the neural networks. DASP models different kinds of AO systems for both nocturnal and diurnal observations and lets the user set all the main parameters of the simulation. These include the properties of the atmospheric turbulence ($r_0$, the number of layers and their heights, and the wind speed and direction). They also include the number of SHs and their main parameters (such as the number of subapertures and the number of pixels per subaperture), and the number of deformable mirrors. For diurnal observations, the platform uses a wide-field image of the Sun as the scientific object of the observation.
The DASP simulator is very useful when working with neural networks for AO because it can save many intermediate quantities of the simulation. Among the most useful are the phase profiles of the turbulence layers, the images received by each SH subaperture, and the cross-correlations computed by the SH. It can also save the centroid slopes and the Zernike coefficients of the turbulence phase.
DASP generates the turbulence according to von Kármán statistics. All the simulation parameters are controlled through a parameter file, in which the user sets the main characteristics of the simulation and the data to be saved. In this work, the centroid slopes and the Zernike coefficients are saved, and the whole dataset used for training, validation, and testing is simulated with this platform.
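DASP's own turbulence generator is not reproduced here. As a generic illustration of von Kármán statistics, the sketch below uses the standard FFT method. Complex Gaussian noise is filtered by the von Kármán phase power spectrum $\Phi(f) = 0.023\, r_0^{-5/3} (f^2 + 1/L_0^2)^{-11/6}$ and transformed back to the spatial domain. The grid size, pixel scale, and outer scale $L_0$ are hypothetical. This simple version also under-represents the lowest spatial frequencies, since no subharmonics are added.

```python
import numpy as np

def von_karman_phase_screen(n, dx, r0, L0=25.0, seed=None):
    """Square phase screen (radians) with a von Karman spectrum, FFT method.

    n  : pixels per side
    dx : pixel size (m)
    r0 : Fried parameter (m) at the wavelength the phase refers to
    L0 : outer scale (m); hypothetical default
    """
    rng = np.random.default_rng(seed)
    df = 1.0 / (n * dx)                              # frequency grid spacing (1/m)
    f = np.fft.fftfreq(n, d=dx)
    fx, fy = np.meshgrid(f, f)
    psd = 0.023 * r0 ** (-5.0 / 3.0) * (fx**2 + fy**2 + 1.0 / L0**2) ** (-11.0 / 6.0)
    psd[0, 0] = 0.0                                  # drop the piston (mean) term
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    coeffs = noise * np.sqrt(psd) * df
    # ifft2 includes a 1/n^2 factor; multiplying by n^2 leaves a plain sum over frequencies
    return np.real(np.fft.ifft2(coeffs)) * n * n

screen = von_karman_phase_screen(64, 0.01, r0=0.10, seed=3)
print(screen.shape, screen.std())
```

The amplitude scales as $r_0^{-5/6}$. With the same seed, halving $r_0$ therefore multiplies the whole screen by $2^{5/6}$, which is a quick consistency check of the spectrum.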
DASP is still under development and is continuously improved at Durham University. It is not yet frequently used in adaptive optics studies, partly because it is still not very intuitive to use. However, the research group's previous experience with the platform makes it an ideal data generator for the experiments carried out here.
2.4. Experimental Setup
For this research, an SCAO system with only 1 SH was simulated. In order to determine whether the neural network was able to generalize, it was necessary to perform simulations over different regions of the solar surface. In addition, it was necessary that each of the regions had nothing in common with the previous one, so the network could not use any of the information learned. For this purpose, each of the chosen regions was separated from the previous one by a minimum distance of 10 arcseconds. For each of the simulated samples, the slopes measured by the SH were saved together with the corresponding first 150 Zernike coefficients representing the atmospheric turbulence at that time. The SH simulated had 15 × 15 subapertures.
Two experiments were carried out to determine the viability of the reconstructor. In the first, the training dataset consisted of 195,000 samples from 4 different regions of the Sun. The height of the turbulent layer varied from 0 to 15,000 m, and $r_0$ varied from 7 cm to 15 cm in steps of 1 cm.
The size of the saved data depends on the chosen SH, which, as mentioned above, had 15 × 15 subapertures. Each subaperture measures two slopes, one in each direction. The corner subapertures are discarded because the pupil is circular and they receive no light. This gives a total of 354 slopes per sample. The outputs given to the ANN were the first 150 Zernike coefficients describing the atmospheric turbulence at each moment, excluding piston, tip, and tilt (Figure 1).
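The count of 354 slopes can be checked with a simple geometric rule. The sketch below assumes that a subaperture is valid when its center lies inside a circular pupil spanning the 15 × 15 grid, with no central obstruction. DASP uses its own illumination criterion, so this rule is an assumption. Under it, 177 subapertures remain, i.e., 354 slopes. The ordering of the input vector (all x slopes, then all y slopes) is also hypothetical.

```python
import numpy as np

def valid_subapertures(n_sub):
    """Boolean n_sub x n_sub mask of subapertures whose center lies inside the pupil.

    Assumes a circular pupil of diameter n_sub subapertures and no central obstruction.
    """
    c = np.arange(n_sub) - (n_sub - 1) / 2.0        # subaperture centers, pupil at the origin
    x, y = np.meshgrid(c, c)
    return x**2 + y**2 <= (n_sub / 2.0) ** 2

def slopes_to_input(sx, sy, mask):
    """Flatten the x and y slopes of the valid subapertures into one ANN input vector."""
    return np.concatenate([sx[mask], sy[mask]])

mask = valid_subapertures(15)
print(mask.sum())                                   # 177 valid subapertures
sx = np.zeros((15, 15))
sy = np.zeros((15, 15))
vec = slopes_to_input(sx, sy, mask)
print(vec.shape)                                    # (354,)
```

Whatever ordering is chosen, it must be the same for training and inference, since the MLP treats each input position as a fixed feature.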
The first test dataset had the same structure as the training one, but with only 6000 samples and the turbulent layer height varying from 0 to 6000 m. It was simulated over 10 new regions of the Sun, completely unknown to the ANN and different from the 4 regions used in the training dataset.
The neural network model was chosen by means of a genetic algorithm over the hyperparameter space, in which a large number of models were tested to find the best-performing one. The same process was applied, in a more reduced form, to the general topology of the network (number of layers, neurons per layer, etc.). The search started from MLP models that had performed satisfactorily in previous experiments by the research group [22]. The ANN used in this research was an MLP with 2 hidden layers and an output layer of 150 neurons, one for each Zernike coefficient. The network was optimized with the Adam algorithm [23], using the mean squared error as the loss function.
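For reference, a minimal NumPy sketch of the Adam update rule combined with a mean-squared-error loss is shown below. To keep it short, it fits a linear map from synthetic "slopes" to synthetic "coefficients" with reduced dimensions instead of the MLP of this work. The learning rate, number of steps, and data are illustrative assumptions.

```python
import numpy as np

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the moment estimates and the step count."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2     # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias corrections
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

def mse_and_grad(W, X, Y):
    """MSE of the linear prediction X @ W against Y, and its gradient w.r.t. W."""
    err = X @ W - Y
    return float(np.mean(err ** 2)), 2.0 * X.T @ err / err.size

# Toy fit with synthetic data (dimensions shrunk for speed)
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))
W_true = rng.normal(size=(20, 5))
Y = X @ W_true
W = np.zeros((20, 5))
state = {"t": 0, "m": np.zeros_like(W), "v": np.zeros_like(W)}
initial_loss, _ = mse_and_grad(W, X, Y)
for _ in range(3000):
    _, grad = mse_and_grad(W, X, Y)
    W = adam_step(W, grad, state, lr=1e-2)
final_loss, _ = mse_and_grad(W, X, Y)
print(initial_loss, final_loss)
```

Adam rescales each parameter's step by a running estimate of its gradient magnitude. This is why its first step has a size close to the learning rate regardless of the gradient scale.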
After this first experiment, a larger one was designed to obtain better results by increasing the number of samples available for training. In this second case, the training dataset consisted of 400,000 samples from 8 new regions of the Sun. The height and $r_0$ of the simulated turbulent layer varied in the same way as in the previous experiment. Testing used three new datasets simulated over 10 new regions of the solar surface, the same regions for all three datasets. The value of $r_0$ was fixed for each dataset, at 8, 10, and 12 cm. The purpose of these three datasets was to test whether the intensity of atmospheric turbulence affects the ability of the neural network to generalize.
3. Results
The results are given in terms of residual wavefront error (WFE). The residual WFE measures the similarity between two images as the RMSE of their pixel-by-pixel difference. If both images are identical, the RMSE is zero; the higher the value, the greater the differences between the images.
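The residual WFE is calculated as follows:

$$\mathrm{WFE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \phi_i - \hat{\phi}_i \right)^2}$$

where $\phi_i$ are the pixels of the original image, $\hat{\phi}_i$ are those of the reconstructed one, and $N$ is the total number of pixels in each image.

The residual WFE is often given in radians, in which case it is calculated as:

$$\mathrm{WFE}_{\mathrm{rad}} = \frac{2\pi}{\lambda}\, \mathrm{WFE}$$

where $\lambda$ is the wavelength of the incoming light.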
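The residual WFE is the RMS of the pixel-wise difference between the original and reconstructed phase maps, and its conversion to radians uses the factor $2\pi/\lambda$. Both can be sketched as follows. The optional pupil mask, which restricts the RMS to illuminated pixels, is an assumption: the text computes the RMS over all $N$ pixels of the images. The phase maps are assumed to be in nm.

```python
import numpy as np

def residual_wfe(original, reconstructed, mask=None):
    """RMS of the pixel-wise difference between two phase maps (same units as inputs)."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    if mask is not None:
        diff = diff[mask]               # e.g. keep only pixels inside the pupil
    return float(np.sqrt(np.mean(diff ** 2)))

def wfe_to_radians(wfe_nm, wavelength_nm):
    """Convert an optical-path WFE (nm) into phase (rad): 2*pi/lambda * WFE."""
    return 2.0 * np.pi * wfe_nm / wavelength_nm

# Usage with synthetic maps: the error is the RMS of the added perturbation
rng = np.random.default_rng(0)
original = rng.normal(scale=300.0, size=(64, 64))                   # nm
reconstructed = original + rng.normal(scale=50.0, size=(64, 64))    # nm
print(residual_wfe(original, reconstructed))
```

Passing the same mask for every sample keeps the error comparable across samples. Otherwise, unilluminated zero-valued pixels would dilute the RMS.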
To compute the residual WFE, images of the turbulence phase are needed. The outputs of the neural network are Zernike coefficients, so the phase profile must be reconstructed before computing the error. This is done with the phaseFromZernikes function of the AOtools Python package [18]. The function builds the image of the phase profile from a list of Zernike coefficients (the more coefficients, the more accurate the reconstruction) and the desired output image size. The size only sets the number of pixels; the phase represented is the same.
Table 1 shows the results of the first experiment over the 6000 test samples. Figure 2 shows a randomly selected example from these samples to illustrate the quality of the reconstructions. The image on the left is the original phase, and the one on the right is obtained from the coefficients predicted by the neural network.
Table 2 shows the results obtained for the second experiment, which differs from the first in the training process: it uses more samples and more regions of the Sun.
Figure 3 and Figure 4 show some examples for visual comparison, each with its corresponding residual WFE. In all of them, the original phase is on the left and the reconstructed one on the right.
4. Discussion
The results show that the MLP is a type of ANN capable of performing reconstructions for Solar AO, with acceptable results compared with traditional reconstruction algorithms such as Least Squares [19]. The main result is that the MLP can generalize and reconstruct in completely unknown solar regions on which it has not been trained. This is the principal difference from previous research by our group, in which ANNs showed good results but always worked on regions of the Sun seen during training.
As expected, the comparison between the two experiments shows that the quality of the reconstructions improves as the number of training samples increases. More samples and, above all, more training regions expose the network to a greater variety of data, which makes it easier for it to generalize.
The second experiment also shows whether the quality of the reconstructions depends on the intensity of the atmospheric turbulence. According to the results, it does not. The network behaves consistently in all cases, with a relative residual error of 28% for $r_0$ = 8 and 12 cm and 26% for $r_0$ = 10 cm. These values are practically the same in all situations, which is a clear advantage. The reconstructor generalizes, and its results are not affected by the intensity of the atmospheric turbulence at the time of observation.
Checking the residual WFE values is not enough to determine whether the reconstructions are accurate; a visual inspection is also necessary. If both turbulent phases (original and reconstructed) are similar and the wavefront error is low, the reconstruction can be considered accurate. This is checked next with Figure 3 and Figure 4.
Figure 3 is discussed first, with the original phase from the simulator on the left and the phase reconstructed by the MLP on the right. This reconstruction corresponds to an $r_0$ value of 8 cm, i.e., one of the most intense turbulence conditions on which the MLPs were tested. At first glance, the areas with the greatest phase shift (the darkest ones) lie in approximately the same regions of both images, the lower and center-right areas. The lightest areas are clearly found in the upper left region of both images. However, the residual WFE of this reconstruction is higher than the mean value over all samples, implying a less accurate reconstruction. This can also be seen visually. Although the areas of maximum and minimum intensity are in approximately the same regions, their shapes differ between the two images. In the darker region, in particular, the shapes in the original and reconstructed phases are similar but not the same.
The opposite holds for Figure 4, a sample with a higher $r_0$ of 12 cm, which implies a much lower atmospheric turbulence intensity than in the previous case. In addition, the residual WFE of this reconstruction is only 110.45 nm, lower than the mean value over all samples, which suggests a good reconstruction. Visual inspection confirms that the two phases are very similar. As in Figure 3, the regions of higher and lower intensity coincide in both images: the former lie in the lower part, with a darker tone, and the latter in the upper part. The main difference from Figure 3 is that here both regions also have very similar shapes and contours. This is especially visible in the small intense regions scattered across the image, which appear in both cases in the same positions and with approximately the same size and shape. Therefore, this can be considered a good reconstruction.
When making visual comparisons, it is also important to remember that the network output is being compared with the original turbulence image. The original turbulence, extracted directly from the simulator, can be expected to contain much finer detail, regardless of the quality of the reconstructor. As mentioned in Section 2.4, the network output consists of 150 Zernike coefficients, so the reconstructed image is generated from an approximation with only 150 values. The original image is therefore compared with a 150-term approximation that also includes the prediction error of the network. For this reason, the turbulent structures across the pupil appear less sharply defined in the reconstruction than in the original image.
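Noll's large-J approximation for Kolmogorov turbulence can estimate the loss of detail caused by truncating the phase to a finite number of Zernike terms. It gives the residual phase variance after removing the first J modes as $\Delta_J \approx 0.2944\, J^{-\sqrt{3}/2} (D/r_0)^{5/3}$ rad². The sketch below evaluates it. The telescope diameter D is not given in the text, so the value of D/r0 is hypothetical. J = 153 assumes that the 150 predicted coefficients follow piston, tip, and tilt. This estimates only the truncation error of the basis, not the network's own prediction error.

```python
import numpy as np

def noll_residual_variance(J, D_over_r0):
    """Approximate residual phase variance (rad^2) left after removing the first J
    Zernike modes of Kolmogorov turbulence (Noll's large-J approximation)."""
    return 0.2944 * J ** (-np.sqrt(3.0) / 2.0) * D_over_r0 ** (5.0 / 3.0)

D_over_r0 = 10.0                      # hypothetical, e.g. D = 1 m and r0 = 10 cm
for J in (21, 50, 153):
    rms_rad = np.sqrt(noll_residual_variance(J, D_over_r0))
    print(f"J = {J:3d}: residual RMS ~ {rms_rad:.2f} rad")
```

The residual falls slowly with J (roughly as $J^{-0.87}$ in variance). Even a perfect 150-term reconstruction would therefore miss part of the fine structure visible in the simulated phase.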
In other studies by our research group [20,21], ANN-based reconstructors for solar adaptive optics were developed with promising results, but they only worked on regions of the Sun on which they had been previously trained. Those studies obtained relative residual WFE values between 25 and 35%, depending on the case. The present system therefore generalizes while keeping the error within the previously obtained range. This represents a great advance in the usefulness of the system in a real case.
5. Conclusions
In this research, a new ANN-based reconstructor for Solar AO is proposed. Its main advantage over other AI-based models is its ability to generalize to different regions of the Sun. It obtains reconstructions of atmospheric turbulence with low error even on regions of the solar surface on which it has not been trained. The proposed method relies on numerical approximations performed by MLPs capable of predicting the phase of the turbulence profile. Previous ANN-based reconstructors, unlike this new model, performed high-quality reconstructions only in areas on which they had been trained. All the results shown in this research were obtained with simulated data, for both training and testing.
For the training process of the ANN, two different scenarios are considered, the first one with 195,000 samples from 4 regions of the Sun and the second one with 400,000 samples from 8 different areas. Increasing the number of known regions and samples is shown to improve the quality of the reconstructions.
When tested on unknown regions of the Sun, the system produces high-quality reconstructions. Its relative residual WFE values are similar to those of other ANN-based reconstructors and always below 30%. Furthermore, the system is very robust to changes in the intensity of atmospheric turbulence, obtaining approximately the same relative residual WFE with an $r_0$ of 8 cm or 12 cm.
A visual analysis confirms that the reconstructions are very similar to the original phases simulated by the DASP platform. Two cases were examined: a reconstruction with a residual error above the mean value, where the network performs worse than average, and one below it, where the network performs better than average. In both cases, the reconstructed images are very similar to the expected ones, with the areas of higher and lower turbulence located in the same regions of the image.
This is a breakthrough for ANN-based Solar AO reconstructors. All the previous ones presented by our research group obtained good-quality reconstructions but could not generalize to unknown regions of the Sun. To obtain good reconstructions, they had to be trained beforehand with images of the region where they were to be applied.
For all these reasons, MLPs are a possible alternative for Solar SCAO systems, especially when a high degree of generalization to different regions of the Sun is required.
Possible future lines of research include testing these systems with real observational data instead of simulated data, to check whether the ANNs respond similarly. Another line is developing MLP-based reconstructors for more complex adaptive optics configurations, such as GLAO or MCAO.