Article

Evaluating the New Automatic Method for the Analysis of Absorption Spectra Using Synthetic Spectra

by Matthew B. Bainbridge 1,* and John K. Webb 2
1 Department of Physics and Astronomy, University of Leicester, University Road, Leicester LE1 7RH, UK
2 School of Physics, University of New South Wales, Sydney, NSW 2052, Australia
* Author to whom correspondence should be addressed.
Universe 2017, 3(2), 34; https://doi.org/10.3390/universe3020034
Submission received: 1 February 2017 / Revised: 11 March 2017 / Accepted: 29 March 2017 / Published: 5 April 2017
(This article belongs to the Special Issue Varying Constants and Fundamental Cosmology)

Abstract

We recently presented a new “artificial intelligence” method for the analysis of high-resolution absorption spectra (Bainbridge and Webb, Mon. Not. R. Astron. Soc. 2017, doi:10.1093/mnras/stx179). This new method unifies three established numerical methods: a genetic algorithm (GVPFIT); non-linear least-squares optimisation with parameter constraints (VPFIT); and Bayesian Model Averaging (BMA). In this work, we investigate the performance of GVPFIT and BMA over a broad range of velocity structures using synthetic spectra. We found that this new method recovers the velocity structures of the absorption systems and accurately estimates variation in the fine structure constant. Studies such as this one are required to evaluate this new method before it can be applied to the analysis of large sets of absorption spectra. This is the first time that a sample of synthetic spectra has been utilised to investigate the analysis of absorption spectra. Probing the variation of nature’s fundamental constants (such as the fine structure constant), through the analysis of absorption spectra, is one of the most direct ways of testing the universality of physical laws. This “artificial intelligence” method provides a way to avoid the main limiting factor, i.e., human interaction, in the analysis of absorption spectra.

1. Introduction

Probing the variation of nature’s fundamental constants (such as the fine structure constant, α), through the analysis of absorption spectra, is one of the most direct ways of testing the universality of physical laws. Interactive methods for analysing high-resolution quasar spectra of heavy element absorption systems are complex and require considerable expertise. Recently, we presented a new “artificial intelligence” method for the analysis of high-resolution absorption spectra [1]. Our new method unifies three established numerical methods: a genetic algorithm (GVPFIT); non-linear least-squares optimisation with parameter constraints (VPFIT); and Bayesian Model Averaging (BMA).
This method requires evaluation before being applied to the analysis of large sets of absorption spectra. In particular, it is unknown how the accuracy of GVPFIT and BMA is affected by the complexity of an absorption system's velocity structure. We investigate the performance of GVPFIT and BMA over a broad range of velocity structure complexities using synthetic spectra. This is the first time a sample of synthetic spectra has been used to investigate how we analyse quasar absorption spectra. Using synthetic spectra, we can provide stringent tests of the modelling process. When analysing real spectral data, one cannot uniquely determine the velocity structure of the absorbing cloud, and the true physical parameters are unknown. In contrast, with synthetic spectra, the underlying (real) velocity structure and input parameters are known exactly. By directly comparing our models, parameter estimates, and statistical uncertainties with the underlying (real) velocity structures and input values, we can establish the stability, precision and accuracy of our approach over a broad range of complexity levels in the velocity structure. Such an investigation was previously infeasible due to the time-consuming nature of the interactive method of absorption spectra analysis.

2. Method

We previously applied GVPFIT and BMA to the analysis of a high signal-to-noise, high spectral resolution and complex absorption system at z_abs = 1.839 towards J110325-264515 [1]. In this analysis, GVPFIT was iterated for over 80 generations and generated a large database of candidate models over a broad range of model complexity. From this large database, we selected 37 models, each corresponding to the minimum-AICc model with 1 through 37 velocity components, where AICc is the Akaike Information Criterion corrected for small sample size (see [2,3] and Equation (1)). We go up to 37 because a 37-component model corresponded to the minimum-AICc model for the real data.
For each of these 37 models, we used the Voigt profile parameters to generate a synthetic spectrum, with Δα/α set to zero. The appropriate VPFIT (see Footnote 1) output was applied to generate the synthetic models, and hence we convolved the synthetic spectra using the same instrumental profile as the real spectra. Using the actual error arrays from the real spectra, we assigned a Gaussian standard deviation to each pixel and used the Box–Muller transform approach [4] to add noise to the synthetic spectra. The real spectra have multiple observations at different epochs and instrumental settings (see [1] Table 2); we generated synthetic spectra corresponding to all of these for each of the selected 37 models. Thus, the synthetic spectra emulate the characteristics of the real spectra of this absorption system.
We then treated the synthetic spectra described above as if they were real spectra, following the procedure described in [1]. To each spectrum, we applied GVPFIT, generating a large set of models. The synthetic spectra were both created and fitted using turbulent b-parameters and the same atomic data. We then estimated Δα/α using BMA, with AICc providing the relative likelihood used to weight the contribution of each model. In this method, AICc provides a measure of the relative quality of a model, based on a balance of goodness-of-fit (chi-squared) against the complexity (number of components compared to the number of data points) of each model. We define AICc in the normal way [2,3]:
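The noise step described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: the function names `box_muller` and `add_noise`, the fixed seed, and the toy flux and error arrays are all assumptions made for the example. It draws one standard normal deviate per pixel with the Box–Muller transform and scales it by that pixel's standard deviation taken from an error array.

```python
import math
import random


def box_muller(rng: random.Random) -> float:
    """Return one standard normal deviate via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def add_noise(model_flux, sigma, seed=0):
    """Perturb each pixel of a noise-free model by a Gaussian deviate of width sigma[i]."""
    if len(model_flux) != len(sigma):
        raise ValueError("flux and error arrays must have the same length")
    rng = random.Random(seed)
    return [f + s * box_muller(rng) for f, s in zip(model_flux, sigma)]


# Hypothetical toy data: a flat continuum with a single absorption dip.
model_flux = [1.0, 0.9, 0.4, 0.9, 1.0]
sigma = [0.02, 0.02, 0.03, 0.02, 0.02]
noisy_flux = add_noise(model_flux, sigma, seed=42)
```

The sketch treats every pixel as independent and Gaussian, which matches the simplification stated in the text rather than the behaviour of real detector noise.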
$$\mathrm{AICc}_j = \chi_j^2 + 2k + \frac{2k(k+1)}{n-k-1} \tag{1}$$
where k is the number of free parameters and n is the number of data points. Statistical uncertainties are determined from the diagonal terms of the covariance matrix at the best-fitting solution.
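As a small worked illustration of Equation (1), the corrected criterion is the chi-squared of the model plus 2k plus the small-sample term 2k(k+1)/(n−k−1). The function name `aicc` and the numbers in the usage line are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def aicc(chi2: float, k: int, n: int) -> float:
    """AICc = chi2 + 2k + 2k(k+1)/(n-k-1), for k free parameters and n data points."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return chi2 + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical example: chi2 = 100 with 4 free parameters and 105 data points.
# The correction term is 2*4*5/100 = 0.4, so AICc = 100 + 8 + 0.4.
example_aicc = aicc(100.0, 4, 105)
```

The correction term vanishes as n grows relative to k, so for large data sets AICc approaches the uncorrected criterion.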

3. Results

The new “artificial intelligence” method, GVPFIT and BMA, results in excellent fits to the synthetic spectra. As an example, Figure 1 and Figure 2 illustrate the BMA model for the most complex synthetic spectrum we analysed, with 37 underlying (real) velocity components. These figures show the residuals are well behaved and there are no discrepancies between the data and the model. The BMA model is determined by summing over all models for each pixel in the data, with the contribution of each model being weighted by its relative likelihood using AICc (using Equations (7) and (13) from [1]):
$$\omega(\mathrm{AICc}_j) = \frac{L(\mathrm{AICc}_j)}{\sum_{l=1}^{S} L(\mathrm{AICc}_l)} = \frac{e^{-\mathrm{AICc}_j/2}}{\sum_{l=1}^{S} e^{-\mathrm{AICc}_l/2}} \tag{2}$$
such that ω(AICc_j) is the weight of model j and S is the number of models.
Similarly, the relative likelihood of velocity components at each pixel is determined by summing the probability density function of each redshift parameter from each component in all models, weighted by relative likelihood using AICc (Equation (2)).
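The weighting in Equation (2) can be sketched as below. This is an illustrative sketch only: `akaike_weights` and `bma_estimate` are hypothetical names, the input values are invented, and the sketch averages only a point estimate, whereas the full BMA treatment in [1] is more involved. Subtracting the minimum AICc before exponentiating is a numerical-stability choice made here; it cancels in the ratio and leaves the weights unchanged.

```python
import math


def akaike_weights(aicc_values):
    """Relative-likelihood weights exp(-AICc/2), normalised to sum to one."""
    best = min(aicc_values)
    # Shifting by the minimum avoids underflow and does not alter the ratios.
    rel = [math.exp(-(a - best) / 2.0) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def bma_estimate(values, aicc_values):
    """Model-averaged point estimate: sum over models of weight * value."""
    weights = akaike_weights(aicc_values)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical example: three candidate models and their estimates
# (in units of 1e-6) for the quantity being averaged.
weights = akaike_weights([100.0, 102.0, 110.0])
averaged = bma_estimate([0.1, -0.2, 1.5], [100.0, 102.0, 110.0])
```

Models with AICc far above the minimum receive exponentially small weight, so the average is dominated by the few best-supported models.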
For the most complex synthetic spectrum, the 37 underlying (real) velocity components represent a total of 148 Voigt profile parameters, with each component contributing four Voigt profile parameters: FeII and MgII column densities, redshift and Doppler broadening b-parameter. When we compared the minimum-AICc model to the underlying (real) model, we found that 136 parameters, or 91.9%, were identified. GVPFIT failed to identify three velocity components, and inaccurately estimated a further three velocity components (discrepancies of >3σ in at least one Voigt profile parameter, using the statistical uncertainties determined from the diagonal terms of the covariance matrix at the best-fitting solution). This is illustrated in Figure 3. The missing components are among the weakest components in the underlying model and are surrounded by stronger components, while the inaccurately estimated components are weak compared to the surrounding velocity components and occur in regions of dense absorption. This trend is repeated throughout the entire set of 37 synthetic spectra, with GVPFIT identifying 653, or 92.9%, of the 703 underlying (real) velocity components. Additionally, four spurious (extra) weak velocity components were introduced in the GVPFIT process that were not present in the original models.
GVPFIT recovered the underlying (real) Δα/α for the synthetic spectra in our sample. Figure 4 illustrates the Δα/α estimates of all models generated by GVPFIT for each of the synthetic spectra with 34, 35, 36 and 37 underlying velocity components. A clear plateau is seen at Δα/α = 0, the underlying (real) value. At lower generations, i.e., when the models are under-fit, we see conspicuous departures from zero.
Figure 5 plots the BMA estimates of Δα/α for the sample of 37 synthetic spectra. The inverse-variance weighted mean is Δα/α = (0.04 ± 0.20) × 10⁻⁶. This is consistent with zero, as expected given that the underlying (real) value of Δα/α is zero for these synthetic spectra, and hence we found no evidence of a systematic bias. Figure 5 also shows that the statistical uncertainties grow as the absorption system complexity increases, as would be expected, and this is consistent with absorption systems with similar quality spectral data and numbers of components from previous analyses [5,6,7,8]. For example, the statistical uncertainty from the analysis of the (real) spectral data for this system in King et al. (2012) [8] is 4.0 × 10⁻⁶ (with 14 velocity components and using less spectral data and different transitions) and in Bainbridge and Webb (2017) [1] is 2.9 × 10⁻⁶ (with 37 velocity components).
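The bookkeeping in this paragraph can be checked with a short sketch. The helper `component_inaccurate` is a hypothetical name, and it encodes only the stated criterion (a discrepancy of more than 3σ in at least one parameter of a component); how components in a fitted model are matched to underlying components is not specified here and is left out. The counts are those quoted in the text.

```python
def component_inaccurate(estimates, truths, sigmas, threshold=3.0):
    """True if any parameter differs from its input value by more than threshold * sigma."""
    return any(
        abs(e - t) > threshold * s
        for e, t, s in zip(estimates, truths, sigmas)
    )


# Counts quoted in the text.
params_most_complex = 37 * 4                 # four Voigt parameters per component
components_all_spectra = sum(range(1, 38))   # spectra with 1, 2, ..., 37 components

percent_params_recovered = round(100.0 * 136 / params_most_complex, 1)
percent_components_recovered = round(100.0 * 653 / components_all_spectra, 1)
```

The total of 703 components follows from the sample design: one synthetic spectrum for each component count from 1 to 37.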
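For reference, an inverse-variance weighted mean of the kind quoted above can be computed as in the following sketch. The function name `weighted_mean` and the input values are hypothetical; the per-spectrum estimates from this work are not reproduced here. Each value is weighted by 1/σ², and the uncertainty on the mean is the square root of the reciprocal of the summed weights.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    if len(values) != len(sigmas):
        raise ValueError("values and sigmas must have the same length")
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical estimates and uncertainties, in units of 1e-6.
mean, err = weighted_mean([0.0, 10.0], [1.0, 3.0])
```

Because precise measurements carry larger weights, the simpler absorption systems, which have smaller uncertainties, dominate a mean of this form.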

4. Discussion

We found that the method described in Bainbridge and Webb [1], GVPFIT and BMA, recovers the velocity structures of absorption systems and accurately estimates Δα/α over a broad range of velocity structures.
GVPFIT recovered almost all the underlying (real) Voigt profile parameters from the synthetic spectra (see Figure 3). The velocity components that GVPFIT missed or inaccurately estimated are weak and occur in locations of dense absorption. We believe that it is unlikely that a human interactively fitting this set of synthetic spectra would perform better than GVPFIT.
Figure 4 shows interesting characteristics in the evolution of Δα/α, similar to those seen by Bainbridge and Webb [1] in the real spectra of the z_abs = 1.839 absorption system towards J110325-264515. There appears to be an underlying linear trend in the evolution of Δα/α, with occasional conspicuous departures (see Figure 4). These conspicuous departures exhibit a dramatic shift in Δα/α over a small change in complexity. Previous interactive methods, relying on a single “best-fit” model, lack this broad picture of how Δα/α evolves with velocity structure and may lead to a spurious estimate of Δα/α.
These results also highlight the importance of having an accurate spectral error estimate. The spectral error estimate heavily influences the statistics of the fitting process, as an incorrect spectral error can artificially increase or decrease the chi-squared “goodness-of-fit” statistic for a model and influence AICc or any similar statistical criterion. This can lead to incorrectly estimating the number of components that are required to adequately fit the data and, as we have shown, can have a large impact on the final estimate of Δα/α.
Future work will increase the sample size, include a more diverse set of velocity structures and refine the method used to generate noise for the synthetic spectra. The sample of synthetic spectra used in this paper is small. Ideally, a study of this type would consist of thousands of synthetic spectra, and the automated nature of the new “artificial intelligence” method lends itself to analysing large samples. A larger sample will allow us to increase the precision of our analysis by reducing the uncertainty on our weighted mean and to probe for any smaller systematic bias. For example, a similar sample consisting of 1000 synthetic spectra should allow us to constrain the weighted mean to below 1 × 10⁻⁸. In addition, we would like to include synthetic spectra based on a broader range of real absorption systems, to show that this method is generalizable to a larger range of velocity structures, data qualities and combinations of species.
In addition, we expect that a more refined analysis will allow us to optimise our approach. In this work, we generate Gaussian noise using the error array from (real) spectral data. However, in spectral data, the noise is not of Gaussian nature near zero flux and the noise in adjacent pixels is not independent. At the current level of precision with which α is being probed, these effects may become important.
However, we believe that this work is an important contribution, giving initial indications that this new method is accurate and unbiased. The size of this sample is adequate to show there is no evidence of bias in Δα/α, when using our method, at the 2 × 10⁻⁷ level under ideal circumstances (correct spectral error, high signal-to-noise and high resolution). This level of precision is two orders of magnitude smaller than the systematic uncertainty estimated in previous analyses of real spectral data (for example, in [8] σ_rand is approximately 0.9 × 10⁻⁵, using almost eight times the number of absorption systems). In addition, although GVPFIT is automated, the analysis still requires time and computing resources that could otherwise be used to analyse (real) spectral data instead of synthetic spectra. Future work will extend these results, consider non-ideal circumstances and apply this approach to (real) spectral data.
Studies such as this one are required to test the new method of GVPFIT and BMA before it is applied to the analysis of large sets of data. This is the first time that synthetic spectra have been utilised to evaluate how we analyse absorption spectra. One of the main limiting factors in the use of absorption spectra to probe fundamental physics is the human interaction required during the interactive modelling process. This human interaction involves many complex decisions, requires considerable expertise and can be very time-consuming for even a single moderately complex absorption system, such as a typical damped Lyman-α absorption system (such as [9]). Furthermore, the end result can be somewhat unreliable, with the literature providing many examples of fits to absorption systems which are clearly inadequate. Much time is devoted to echelle spectroscopy of quasars on large optical telescopes, and a considerable number of spectra in telescope archives remain unpublished or have only partially been analysed, representing a great deal of valuable scientific information. With new instruments constantly being developed, such as ESPRESSO, the quality and quantity of available quasar echelle spectra are only going to increase.
Since the new method presented in Bainbridge and Webb (2017) [1] removes the previously required human interaction, we can begin to analyse the ever-increasing number of quasar echelle spectra more efficiently and undertake projects that were previously unrealistic. One example of this is modelling both thermally and turbulently broadened models for each absorption system independently, allowing a more reliable comparison between models and data. The development and testing of this new “artificial intelligence” method (GVPFIT and BMA) are key to moving past the limiting factor of human interaction and open the way for projects that were previously unrealistic.

Acknowledgments

This research used the ALICE High Performance Computing Facility at the University of Leicester.

Author Contributions

M.B.B. conceived, designed and performed the experiment, analyzed the data, and wrote the paper. J.K.W. contributed to the design of the project and the writing of the paper. M.B.B. and J.K.W. invented GVPFIT and first demonstrated the application and advantages of this method. All authors commented on the manuscript at all stages and approved the final version to be published.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GVPFIT: Genetic Voigt Profile FITting software
VPFIT: Voigt Profile FITting software
BMA: Bayesian Model Averaging
AICc: Akaike Information Criterion corrected for small sample size

References

  1. Bainbridge, M.B.; Webb, J.K. Artificial intelligence applied to the automatic analysis of absorption spectra. Objective measurement of the fine structure constant. Mon. Not. R. Astron. Soc. 2017.
  2. Akaike, H. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory; Petrov, B.N., Csaki, F., Eds.; Akademia Kiado: Budapest, Hungary, 1973; pp. 267–281.
  3. Hurvich, C.M.; Tsai, C.L. Regression and time series model selection in small samples. Biometrika 1989, 76, 297–307.
  4. Box, G.E.P.; Muller, M.E. A note on the Generation of Random Normal Deviates. Ann. Math. Stat. 1958, 29, 610–611.
  5. Webb, J.K.; Flambaum, V.V.; Churchill, C.W.; Drinkwater, M.J.; Barrow, J.D. Search for Time Variation of the Fine Structure Constant. Phys. Rev. Lett. 1999, 82, 884–887.
  6. Murphy, M.T.; Flambaum, V.V.; Webb, J.K.; Dzuba, V.A.; Prochaska, J.X.; Wolfe, A.M. Constraining Variations in the Fine-Structure Constant, Quark Masses and the Strong Interaction. In Astrophysics, Clocks and Fundamental Constants; Karshenboim, S.G., Peik, E., Eds.; Lecture Notes in Physics; Springer: Berlin, Germany, 2004; Volume 648, pp. 131–150.
  7. Webb, J.K.; Murphy, M.T.; Flambaum, V.V.; Dzuba, V.A.; Barrow, J.D.; Churchill, C.W.; Prochaska, J.X.; Wolfe, A.M. Further Evidence for Cosmological Evolution of the Fine Structure Constant. Phys. Rev. Lett. 2001, 87, 091301.
  8. King, J.A.; Webb, J.K.; Murphy, M.T.; Flambaum, V.V.; Carswell, R.F.; Bainbridge, M.B.; Wilczynska, M.R.; Koch, F.E. Spatial variation in the fine-structure constant—New results from VLT/UVES. Mon. Not. R. Astron. Soc. 2012, 422, 3370–3414.
  9. Riemer-Sørensen, S.; Webb, J.K.; Crighton, N.; Dumont, V.; Ali, K.; Kotuš, S.; Bainbridge, M.; Murphy, M.T.; Carswell, R. A robust deuterium abundance; re-measurement of the z = 3.256 absorption system towards the quasar PKS 1937-101. Mon. Not. R. Astron. Soc. 2015, 447, 2925–2936.
Footnote 1. R. F. Carswell and J. K. Webb, 2015, http://www.ast.cam.ac.uk/~rfc/vpfit.html.
Figure 1. Comparison of the Bayesian Model Averaging (BMA) model and the data for the most complex synthetic spectrum (37 velocity components) in our sample. Above the figure is a grayscale plot showing the relative likelihood of velocity components at each pixel (the scale is shown at the top). Each row in the figure shows two panels: the top panel compares the residuals (black points) to the 1σ expected fluctuations (horizontal blue lines). The bottom panel compares the BMA model (smooth red line) to the data (black). The BMA model is determined by summing over all models, weighted by the relative likelihood of each model. In this figure, we show the first 10 data regions contributing to the Δα/α analysis. Figure 2 shows the remaining 8 data regions.
Figure 2. Comparison of the BMA model and the data for the most complex synthetic spectrum (37 velocity components) in our sample. Above the figure is a grayscale plot showing the relative likelihood of velocity components at each pixel (the scale is shown at the top). Each row in the figure shows two panels: the top panel compares the residuals (black points) to the 1σ expected fluctuations (horizontal blue lines). The bottom panel compares the BMA model (smooth red line) to the data (black). The BMA model is determined by summing over all models, weighted by the relative likelihood of each model. Figure 1 shows the first 10 data regions contributing to the Δα/α analysis. Here we show the remaining 8 data regions.
Figure 3. Comparison of the minimum-AICc model and underlying (real) model for the most complex synthetic spectrum (37 velocity components) in our sample. The synthetic spectrum (black) is shown against various models (red). Panel (a): the underlying (real) model. Panel (b): the minimum-AICc model selected from the set of all models generated by GVPFIT. Panel (c): the three underlying velocity components not present in the minimum-AICc model. Panel (d): the three underlying velocity components that were inaccurately estimated by the minimum-AICc model (discrepancies of >3σ in at least one Voigt profile parameter, using the statistical uncertainties determined from the diagonal terms of the covariance matrix at the best-fitting solution). The central position of each velocity component is indicated by a tick (gray vertical line).
Figure 4. Δα/α vs generation for the four synthetic spectra with the most complex velocity structures in our sample. The underlying (real) Δα/α is set to zero in all cases. Each panel shows the Δα/α estimates of the models generated during the analysis of each synthetic spectrum. The underlying (real) models used to generate the synthetic spectra have 34, 35, 36 and 37 velocity components for Panels (a), (b), (c) and (d), respectively. At increasing generation number, the models become increasingly complex (as greater numbers of velocity components are identified by GVPFIT).
Figure 5. Δα/α vs number of underlying (real) velocity components for our sample of synthetic spectra. We created 37 synthetic spectra whose parameters were drawn from the minimum-AICc models for the z_abs = 1.839 system towards J110325-264515 (analysed in [1]) for models that contained 1, 2, 3, ..., 37 velocity components. Δα/α is set to zero in these simulations. We applied Bayesian Model Averaging, BMA, to the large set of models generated by GVPFIT to estimate Δα/α for each of the 37 synthetic spectra. The statistical uncertainties are determined from the diagonal terms of the covariance matrix at the best-fitting solution.
