Towards Improved Turbomachinery Measurements: A Comprehensive Analysis of Gaussian Process Modeling for a Data-Driven Bayesian Hybrid Measurement Technique

Cruz, Gonçalo G.; Ottavy, Xavier; Fontaneto, Fabrizio

doi:10.3390/ijtpp9030028


by Gonçalo G. Cruz ^1,2,*, Xavier Ottavy ^2 and Fabrizio Fontaneto ^1

^1 von Karman Institute for Fluid Dynamics, 1640 Sint-Genesius-Rode, Belgium

^2 Ecole Centrale de Lyon, Univ. Lyon, CNRS, Univ. Claude Bernard Lyon 1, INSA Lyon, LMFA, UMR5509, 69130 Ecully, France

^* Author to whom correspondence should be addressed.

This paper is an extended version of our paper published in Journal of Physics: Conference Series, Volume 2511, XXVI Biennial Symposium on Measuring Techniques in Turbomachinery (MTT2622), Pisa, Italy, 28–30 September 2022.

Int. J. Turbomach. Propuls. Power 2024, 9(3), 28; https://doi.org/10.3390/ijtpp9030028

Submission received: 31 December 2023 / Revised: 3 May 2024 / Accepted: 14 June 2024 / Published: 1 August 2024

(This article belongs to the Special Issue Selected Papers from the XXVI Biennial Symposium on Measuring Techniques in Turbomachinery)


Abstract

A cost-effective solution to address the challenges posed by sensitive instrumentation in next-gen turbomachinery components is to reduce the number of measurement samples required to assess complex flows. This study investigates Gaussian Process (GP) modeling approaches within the framework of a data-driven hybrid measurement technique for turbomachinery applications. Three modeling approaches (Baseline GP, CFD to Experiments GP, and Multi-Fidelity GP) are evaluated, focusing on their performance in predicting mean flow characteristics and associated uncertainties on a low aspect ratio axial compressor stage representative of the last stage of a high-pressure compressor. The Baseline GP demonstrates robust accuracy, while the integration of CFD data in the CFD to Experiments GP introduces complexities and larger errors. The Multi-Fidelity GP, leveraging both CFD and experimental data, emerges as a promising solution, exhibiting enhanced accuracy in critical flow features. A sensitivity analysis underscores its stability and accuracy, even with reduced measurements. The Multi-Fidelity GP, therefore, stands as a reliable data fusion method for the proposed hybrid measurement technique, offering a potential reduction in instrumentation effort and testing times.

Keywords:

machine learning; Gaussian process; data fusion; Bayesian inference; uncertainty quantification; measurement technique; axial compressor; instrumentation; computational fluid dynamics; turbomachinery

1. Introduction

In the pursuit of enhancing the performance and efficiency of gas turbine engines operating with a Joule–Brayton cycle, the next generation of engines demands higher overall pressure ratios. Figure 1 shows this trend through the evolution of engine pressure ratio over time. Current flying engines from major manufacturers are indicated with crosses in the figure, illustrating the ongoing efforts in this direction.

This drive toward higher overall pressure ratios has led to the design of compressors with reduced blade heights, resulting in a more compact core. Increasing the overall pressure ratio at the compressor outlet requires a reduction in the channel core dimensions because the Mach number at the combustion chamber inlet must remain fixed, and this reduction in turn decreases the compressor blade height. This compactness poses challenges for traditional measurement devices, which cannot be miniaturized at the same pace and therefore cause increased blockage and variations in the local flow field.

The quest for enhancing the performance and efficiency of gas turbine engines demands innovative measurement techniques to overcome challenges posed by increased pressure ratios and compact core designs.

Efforts to address these challenges through machine learning data-driven approaches, particularly in Computational Fluid Dynamics (CFD) modeling, have shown promise [1]. However, their application to real, noisy experimental measurements remains limited.

Traditional optical techniques and high-speed particle image velocimetry, while promising, face challenges in industrial and engine testing environments. Data-driven methods offer an alternative by requiring fewer measurements and less instrumentation, minimizing the instrumentation blockage effect without compromising accuracy.

Venturi et al. [2] used DNS data and Bertream et al. [3] used a fusion of CFD with experimental data to reconstruct flow fields with proper orthogonal decomposition, while Lou et al. [4] and Seshadri et al. [5] demonstrated successful flow reconstruction and optimal sensor placement in the turbomachinery field with optimization and Bayesian methodologies, respectively.

In response to current machine learning trends [6] and recognizing the need to mitigate instrumentation impact and minimize data acquisition times, Gaussian Processes (GPs), also known as Kriging, have emerged as a key algorithm for real data because their Bayesian formulation naturally handles uncertainty. GPs are a non-parametric Bayesian method that predicts the mean and covariance of an unknown response, providing straightforward uncertainty propagation. GPs are typically used in the fluid mechanics literature as surrogate models, for example, to replace expensive CFD codes in blade design studies [7].

This paper focuses on evaluating and comparing various Gaussian Process (GP) modeling approaches within a proposed hybrid measurement technique, aiming to mitigate instrumentation impact and minimize testing duration. The schematic representation of this hybrid methodology is illustrated in Figure 2, depicting a process that seeks comprehensive information through localized experimental measurements, requiring significantly fewer samples.

The methodology begins by selecting an operating point (OP) and defining the essential boundary conditions for a CFD RANS simulation. The resulting flow field quantities across the turbomachinery domain guide a Design of Experiments (DoEs) framework to select optimal experimental acquisition locations. Experimental measurements acquired at these locations provide the basis for the subsequent GP modeling approaches, which will be systematically compared.

This paper seeks to evaluate the efficacy of different GP modeling approaches at achieving accurate flow assessments with a reduced number of measurement samples. The trained GP model allows for the inference of the flow field across the entire spatial domain, providing accurate assessments comparable to physical measurements, with the Bayesian framework facilitating uncertainty quantification.

The paper is organized as follows: Section 2 presents the H25 axial compressor test case used in this work, together with the experimental and numerical tools. Section 3 gives an overview of the GP modeling approaches we propose. Section 4 shows the mean flow results and a comparison study between the modeling approaches. Finally, the work concludes and discusses future implementations of the proposed measurement technique.

2. Test Case—H25 Axial Compressor

This work focuses on the H25 compressor stage, which is a low aspect ratio research stage developed under the LEMCOTEC project. The stage is intended to evaluate the impact of core size on the performance and flow field of the stage of a high-speed axial compressor for aircraft gas turbine applications. The stage is highly three-dimensional and features a reduced blade height, from which it derives its name, with the blade height measuring a mere 25 mm.

The configuration of the H25 axial compressor, characterized by its reduced blade height, makes it an ideal candidate for assessing the methodology. It aligns with the research goal of improving experimental flow assessment methods for next-generation compressors, which become more sensitive to instrumentation as their size decreases and probe miniaturization cannot keep pace.

More importantly, data availability played a major role in the selection of this test case. The H25 axial compressor has been extensively and comprehensively tested experimentally in [8], providing access to, and in-depth knowledge of, the flow. These data include a wide array of measurements of the compressor flow characteristics, which are described below.

2.1. Experimental Setup

The experimental data were acquired in the R4 test rig of the von Karman Institute for Fluid Dynamics (VKI). The layout of this facility is depicted in Figure 3. This facility incorporates an inlet plenum equipped with a heat exchanger and honeycomb (1), which regulate the total temperature and flow homogeneity. The flow is then channeled to the H25 test section (2), via a smooth convergent bell-mouth. After passing through the test section, the compressed flow is released into a large collector (3). A return duct (4) connects this collector back to the inlet plenum, forming a closed-loop system. The facility includes a precision throttle valve (5) that enables the accurate control of mass flow.

The R4 facility operates in a closed-loop configuration, providing the flexibility to independently adjust key parameters, including the Reynolds number (Re), rotational speed, and throttling transients. All of the experimental data used for this work were obtained under atmospheric conditions.

A meridional view of the H25 test section, labeled as point (2) in Figure 3, is presented in Figure 4. In this view, the rotating components are highlighted in red, while the stator row is marked in green. The instrumentation is axially distributed along four measurement planes (MP):

MP0: Located after the convergent inlet bell-mouth, seven rotor chords upstream of the rotor.

MP1: One chord upstream of the rotor.

MP2: In the inter-row region between rotor and stator.

MP4: At the stage outlet, one chord downstream of the stator blades.

The overall performance, particularly the pressure ratio and efficiency, is assessed using combined pressure-total temperature rakes deployed at MP0 and MP4. Additionally, the radial traversing of probes is possible in all measurement planes.

At MP4, a motor was installed to allow for the pitch-wise traversing of the probes over one and a half stator pitches. This setup enabled the sampling of a radial-azimuthal two-dimensional map of the flow field. The acquisition process for this outlet flow cartography, while essential for evaluating the flow inside the compressor, was the most time-consuming part of the experimental campaign, taking approximately 3 h to collect data points from 950 locations, distributed in a matrix of 25 radial points over 38 pitch-wise points, featuring refinement near the walls in the radial direction for improved gradient evaluation and uniform discretization in the tangential direction. For this reason, and given the critical role of MP4 in assessing the compressor’s performance, this experimental acquisition of the total pressure ratio flow field (P) at the machine’s design point serves as the test case for applying the proposed GP modeling approaches.

For this test, a custom-designed miniaturized three-hole probe was utilized to measure flow total pressure, angle, and Mach number. The systematic uncertainty of the miniaturized pressure probe is determined by the pressure scanner used. In this case, it amounts to ±0.08% of the full span. At nominal speed, the total pressure ratio uncertainty budget was estimated to be around 1%.

The probes were installed with a standard mechanical safety margin of 0.02 mm to ensure a gap between the probe and hub/casing walls. Additionally, the radial size of the probe head was added to this value when determining the accessible span for each probe. This led to a 2% span end-wall proximity.

2.2. Numerical Simulations—CFD RANS

A CFD numerical model is used to integrate highly discretized flow data with the undersampled experiment in the modeling approaches described in Section 3, so the domain matching between CFD and experiments is critical. The simulation domain extends from the experimental MP0 to a location two rotor chords downstream of MP4, which avoids issues related to outlet boundary conditions at MP4.

The numerical domain retains technological features like blade fillets in rotor and stator sections but excludes experimental cavities. Assuming periodicity, only one blade passage is simulated, utilizing a mixing plane approach for conservative pitchwise coupling.

Mesh generation employs a structured grid with Autogrid5, adopting an O-4H topology for blade rows and OH topology in the rotor tip gap. The non-dimensional wall distance ($y^{+}$) is maintained below 2.

Figure 5 illustrates the meridional and blade-to-blade views of the computational grid. A multi-block structured grid approach results in three mesh levels, with grid convergence analysis on 0.6, 4.55, and 19.45 million cells, selecting the 4.55 million cells mesh for the Reynolds-Averaged Navier–Stokes (RANS) data.

NUMECA FINE Turbo 12.1 solves fully turbulent compressible steady RANS equations using the k-ω SST turbulence model [9]. Operating at the compressor’s design point, the inlet imposes experimentally measured pitchwise averaged total pressure and total temperature profiles at MP0. At the outlet, absolute mass flow is imposed, with an adiabatic assumption for all domain walls.

3. Gaussian Process Modeling

The objective of this work is to retrieve missing or unavailable data using a limited set of experimental measurements through a Bayesian mathematical framework. This can be formulated as a regression problem:

y = f (X) + η

(1)

where $y$ represents the vector of observed output variables, $η$ is the associated noise, $X$ is the matrix of input variables, and $f(X)$ denotes the underlying regression function that maps inputs to outputs.

In the context of this work, the available experimental flow field measurements $y$ at measurement locations $X$ are used to build a regression model $f(X)$, enabling the estimation of unmeasured flow field data points $y_{*}$ at locations $X_{*}$ where probe measurements are unavailable. This regression model is based on Gaussian Processes.

Gaussian processes (GPs) are a non-parametric method that incorporates probabilistic distributions directly into the regression model $f(X)$, constructing “Gaussian distributions over functions” [10] rather than over a fixed number of parameters. This characteristic makes GPs a powerful probabilistic tool, especially for regression problems as formulated in Equation (1), where unknown data $y_{*}$ are predicted from observed data $y$ that include some level of measurement error or uncertainty.

A GP defines a multivariate Gaussian prior distribution over functions for every point in the input space $X$:

p (f ∣ X) \sim N (f ∣ μ, K)

(2)

where $f(X)$ is a multivariate random variable $f(X) = [f(x_{1}), f(x_{2}), \dots, f(x_{N})]$ with a mean function $μ$ and a covariance matrix $K(x_{i}, x_{j})$ that is defined with a positive definite kernel function. Due to the flexibility of GPs in modeling, the mean is assumed to be zero, $μ = 0$, and the GP distribution over $f(X)$ is fully defined by $K$. Therefore, the choice of the kernel function $K$ is a key modeling decision in the GP framework.

Multiple kernel function families exist in the literature for different function modeling behaviors (e.g., smoothness, stationarity), making this choice a way to encode prior knowledge of the problem. In the present work, the base kernel function selected is a stationary anisotropic Matérn-5/2 kernel, which is given by:

K (x_{i}, x_{j}) = σ_{f}^{2} \left(1 + \frac{\sqrt{5 {(x_{i} - x_{j})}^{T} (x_{i} - x_{j})}}{ℓ} + \frac{5 {(x_{i} - x_{j})}^{T} (x_{i} - x_{j})}{3 ℓ^{2}}\right) \exp \left(- \frac{\sqrt{5 {(x_{i} - x_{j})}^{T} (x_{i} - x_{j})}}{ℓ}\right)

(3)

where ℓ is the length scale for a single input dimension and $σ_{f}$ acts as a scaling factor. These are usually called hyperparameters $θ$, and ensuring their correct values plays a pivotal role in GPs. The length scales ℓ control the smoothness of the function by directly quantifying how much a single data point influences the space around it. The scaling factor $σ_{f}$ controls the vertical scale of the function around its mean, influencing the uncertainty of the model.
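As an illustration of Equation (3), the following minimal NumPy sketch evaluates an anisotropic Matérn-5/2 kernel. It assumes one length scale per input dimension, applied by rescaling each coordinate before the distance is computed; the function name `matern52`, the sample locations, and the hyperparameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def matern52(xa, xb, lengthscales, sigma_f):
    """Anisotropic Matern-5/2 kernel between two sets of points.

    xa: (Na, D) array, xb: (Nb, D) array,
    lengthscales: (D,) array with one length scale per input dimension,
    sigma_f: scalar scaling factor.
    """
    # Rescale every dimension by its own length scale (assumed anisotropy).
    xa = np.atleast_2d(xa) / lengthscales
    xb = np.atleast_2d(xb) / lengthscales
    diff = xa[:, None, :] - xb[None, :, :]
    r = np.sqrt(np.sum(diff**2, axis=-1))      # scaled distance |x_i - x_j| / l
    s = np.sqrt(5.0) * r
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

# Hypothetical normalized (r, theta) locations and hyperparameters.
X = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.7]])
K = matern52(X, X, lengthscales=np.array([0.3, 0.1]), sigma_f=1.5)
```

A shorter length scale in one direction makes the correlation decay faster along that direction, which is how the kernel can treat the radial and pitchwise coordinates differently.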

Given the above assumptions and available noisy observations $y_{i} = f(x_{i}) + η_{i}$, where the noise $η$ follows an independent, identically distributed Gaussian distribution with zero mean and fixed variance, $η \sim N(0, σ_{y}^{2} I)$, the log marginal likelihood can be derived as:

\begin{matrix} log p (y ∣ X, θ) = - \frac{1}{2} log (det (K + σ_{y}^{2} I)) \\ - \frac{1}{2} {(y - μ)}^{⊤} {(K + σ_{y}^{2} I)}^{- 1} (y - μ) - \frac{N}{2} log (2 π) \end{matrix}

(4)

where the first term is a model complexity term, defined by the selection of the kernel function. The second term is a likelihood data-fit term showing the Bayesian weighting of the prior with the observed data. The third term is a constant term. Based on the available data, the log marginal likelihood can be maximized to obtain the optimal hyperparameters mentioned above.
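The three terms of Equation (4) can be evaluated as in the sketch below. It assumes the zero mean ($μ = 0$) stated earlier and uses a Cholesky factorization instead of an explicit inverse and determinant, which is a common numerical choice rather than something the paper specifies; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def log_marginal_likelihood(K, y, sigma_y):
    """Log marginal likelihood of Equation (4) for a zero-mean GP."""
    N = y.shape[0]
    Ky = K + sigma_y**2 * np.eye(N)
    L = np.linalg.cholesky(Ky)                      # Ky = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # Ky^{-1} y
    complexity = -np.sum(np.log(np.diag(L)))        # -1/2 log det(Ky)
    data_fit = -0.5 * y @ alpha                     # -1/2 y^T Ky^{-1} y
    constant = -0.5 * N * np.log(2.0 * np.pi)
    return complexity + data_fit + constant

# Toy prior covariance and observations (hypothetical values).
K_toy = 2.0 * np.eye(3)
y_toy = np.array([1.0, 2.0, 3.0])
lml = log_marginal_likelihood(K_toy, y_toy, sigma_y=1.0)
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer that searches over the kernel hyperparameters and the noise level.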

The inherent noise of the available data $σ_{y}$ is taken as a hyperparameter to infer along with the kernel parameters. This choice allows the model to determine the optimal level of noise during hyperparameter training. It enhances model flexibility by enabling adaptation to varying noise levels, and it indicates how noisy the model considers the data to be.

Predictions of the unknown states $y_{*}$ can be obtained by conditioning the prior distribution of Equation (2) on the observed data and evaluating it at the input locations $X_{*}$, giving a predictive posterior distribution:

p (y_{*} ∣ X_{*}, X, y) = N (y_{*} ∣ μ_{*}, Σ_{*})

(5)

where $μ_{*}$ represents the predicted mean values for new input locations $X_{*}$, and $Σ_{*}$ is the covariance matrix associated with the predicted values, computed, respectively, as:

μ_{*} = K_{*} {(K + σ_{y}^{2} I)}^{- 1} y

(6)

Σ_{*} = K_{* *} - K_{*} {(K + σ_{y}^{2} I)}^{- 1} K_{*}^{T}

(7)
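A minimal sketch of Equations (6) and (7) is given below. It reads $K_{*}$ as the kernel evaluated between new and training locations and $K_{**}$ as the kernel between new locations, which is the standard interpretation but is not spelled out in the text. The one-dimensional test function, the helper names, and the hyperparameter values are hypothetical, and the hyperparameters are fixed by hand rather than learned.

```python
import numpy as np

def matern52(xa, xb, ell, sigma_f):
    """Matern-5/2 kernel for 1-D inputs with a single length scale."""
    r = np.abs(xa[:, None] - xb[None, :])
    s = np.sqrt(5.0) * r / ell
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_predict(x, y, x_star, ell, sigma_f, sigma_y):
    """Posterior mean (Eq. 6) and covariance (Eq. 7) of a zero-mean GP."""
    K = matern52(x, x, ell, sigma_f)
    K_star = matern52(x_star, x, ell, sigma_f)            # new vs. training
    K_star_star = matern52(x_star, x_star, ell, sigma_f)  # new vs. new
    L = np.linalg.cholesky(K + sigma_y**2 * np.eye(len(x)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu_star = K_star @ alpha
    V = np.linalg.solve(L, K_star.T)
    Sigma_star = K_star_star - V.T @ V
    return mu_star, Sigma_star

# Hypothetical 1-D data standing in for sparse probe measurements.
x = np.linspace(0.0, 1.0, 5)
y = np.sin(2.0 * np.pi * x)
x_star = np.linspace(0.0, 1.0, 9)
mu_star, Sigma_star = gp_predict(x, y, x_star, ell=0.3, sigma_f=1.0, sigma_y=1e-3)
# Pointwise predictive standard deviation (clipped against round-off).
std_star = np.sqrt(np.clip(np.diag(Sigma_star), 0.0, None))
```

The diagonal of $Σ_{*}$ gives the pointwise predictive variance, which is small at the sampled locations and grows between them.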

3.1. Baseline GP Approach

The conventional single-fidelity approach to Gaussian Process modeling, here taken as a baseline, involves using a set of domain inputs, $X = (r, θ)$, where $x_{i} = (r, θ)_{i}$ is a single location where a noisy experimental measurement $y_{i}$ is taken. The unknown values, $y_{*}$, are the equivalent quantity of interest evaluated at non-sampled locations $X_{*} = (r_{*}, θ_{*})$.

f : X = [\begin{matrix} r_{1} & θ_{1} \\ ⋮ & ⋮ \\ r_{N} & θ_{N} \end{matrix}] \in R^{2} ⟼ y = {[\begin{matrix} P_{1} \\ ⋮ \\ P_{N} \end{matrix}]}_{E x p} \in R

(8)

This method provides a good approximation of unknown values close to input locations $x$ where data are available but tends to become less accurate as the inference moves away from these locations, for example, in regions where few or no measurements are available, such as the boundary layer region of the flow.

3.2. CFD to Experiments Approach

The approach proposed in [11] involves coupling numerical data with experimental measurements in GPs. Instead of mapping the physical input domain $x = (r, θ)$, the flow fields of J RANS simulations (at various operating conditions of the machine) are used as inputs $x$. Using the case of total pressure at the compressor outlet, one input $x_{i}$ is a J-dimensional vector that contains all the J RANS total pressure results evaluated at a grid point $i = (r, θ)$ where one experimental measurement $y_{i}$ is available. The GP functional mapping of this work is represented in matrix format in Equation (9) for the total pressure at the outlet of the compressor.

f : X = {[\begin{matrix} P_{(1, 1)} & \dots & P_{(1, J)} \\ ⋮ & ⋱ & ⋮ \\ P_{(N, 1)} & \dots & P_{(N, J)} \end{matrix}]}_{CFD} \in R^{J} ⟼ y = {[\begin{matrix} P_{1} \\ ⋮ \\ P_{N} \end{matrix}]}_{Exp} \in R

(9)

where the matrix $X_{CFD}$ represents the inputs for the GP model when incorporating CFD data. Each row of this matrix corresponds to a location $(r, θ)$ in the physical domain, and its columns hold the results of the J RANS simulations at that location. Specifically, $P_{(i, j)}$ denotes the total pressure obtained from the j-th RANS simulation at the i-th grid point $(r, θ)$.
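The input matrix of Equation (9) could be assembled as in the following sketch. It assumes that the J RANS total pressure fields have already been interpolated onto a common radial by pitchwise grid and that measurement locations are given as grid indices; the function name, the value J = 4, and the random placeholder fields are hypothetical and do not represent the actual simulations.

```python
import numpy as np

def build_cfd_inputs(cfd_fields, probe_idx):
    """Stack J simulated fields into an (N, J) GP input matrix.

    cfd_fields: (J, n_r, n_theta) array, one total pressure field per simulation.
    probe_idx:  (N, 2) integer array of (radial, pitchwise) grid indices
                where an experimental measurement exists.
    """
    i_r, i_t = probe_idx[:, 0], probe_idx[:, 1]
    # Row i holds the J simulated values at measurement location i.
    return cfd_fields[:, i_r, i_t].T

rng = np.random.default_rng(0)
J, n_r, n_theta = 4, 25, 38                      # J = 4 is an assumed value
cfd_fields = 1.0 + 0.1 * rng.random((J, n_r, n_theta))  # placeholder, not real CFD
probe_idx = np.array([[0, 0], [12, 19], [24, 37]])
X_cfd = build_cfd_inputs(cfd_fields, probe_idx)  # shape (N, J) = (3, 4)
```

With this layout, an anisotropic kernel assigns one length scale to each column, that is, to each simulation.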

The use of an anisotropic kernel in the GP model plays a crucial role in this context. An anisotropic kernel allows the model to adapt its lengthscales independently for each input dimension. In the case of multiple CFD simulations, this flexibility is essential for balancing lengthscales between different simulations. It enables the GP model to account for variations in the physical domain along the radial and pitch directions separately, ensuring a more accurate representation of the complex flow field. This adaptability is particularly beneficial when dealing with simulations under different operating conditions as it allows the GP model to capture variations in the flow characteristics across the entire parameter space.

3.3. Multi-Fidelity Approach

The last approach, previously applied to fluid mechanics by multiple authors [12,13], leverages the auto-regressive formulation proposed by O’Hagan [14] for Multi-Fidelity GP modeling. The formulation assumes a linear dependency between fidelity levels. In a general formulation, it assigns a GP prior to each fidelity model $t$, where the higher fidelity model $f_{t}$ is expressed as the lower fidelity model $f_{t-1}$ multiplied by a scaling factor $ρ_{t-1}(x)$, plus a bias function $δ_{t}(x)$, which is itself a GP, $δ_{t}(x) \sim N(μ_{δ}, K_{t})$.

f_{t} (x) = ρ_{t - 1} (x) f_{t - 1} (x) + δ_{t} (x)

(10)

The scaling factor $ρ_{t-1}(x)$ weights the cross-correlation between fidelity levels and is considered a hyperparameter to be learned in the inference process. The above formulation ensures that at each fidelity level $t$, the conditional distribution of the GP $f_{t}(x)$ is influenced by $f_{t-1}(x)$ through the Markov property, $Cov\{f_{t}(x), f_{t-1}(x') ∣ f_{t-1}(x)\} = 0, \forall x \neq x'$, which means that, assuming $f_{t-1}(x)$ is known, no more can be learned about $f_{t}(x)$ from any other lower fidelity model output $f_{t-1}(x')$, for $x \neq x'$.

In the present case, which is the default in this work, only two levels of fidelity are available: the high-fidelity experiments and the low-fidelity CFD. Assuming in addition a constant scaling factor $ρ$, the formulation of Equation (10) simplifies to:

f_{E x p} (x) = ρ f_{C F D} (x) + δ (x)

(11)

where $f_{Exp}(x)$ and $f_{CFD}(x)$ represent the high- and low-fidelity datasets, respectively, and $δ(x)$ is a GP, $δ(x) \sim N(μ_{δ}, K_{bias})$. The prior GP model of the formulation of Equation (11) can be expressed as:

[\begin{matrix} f_{C F D} (x) \\ f_{E x p} (x) \end{matrix}] \sim N ([\begin{matrix} μ_{C F D} \\ μ_{E x p} \end{matrix}], [\begin{matrix} K_{C F D} & ρ K_{C F D} \\ ρ K_{C F D} & ρ^{2} K_{C F D} + K_{b i a s} \end{matrix}])

(12)

where $μ_{CFD}$ and $μ_{Exp}$ are the mean functions, $K_{CFD}$ is the covariance matrix associated with the low-fidelity data, $ρ K_{CFD}$ is the cross-correlation matrix between fidelities, and $ρ^{2} K_{CFD} + K_{bias}$ is the covariance matrix associated with the high-fidelity data.
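The blocks of Equation (12) follow from Equation (11) in a few lines. The short derivation below assumes, as is standard for this auto-regressive formulation but not stated explicitly in the text, that the bias $δ(x)$ is independent of $f_{CFD}(x)$.

```latex
\begin{align}
\mathrm{Cov}\left[f_{CFD}(x), f_{Exp}(x')\right]
  &= \mathrm{Cov}\left[f_{CFD}(x), \rho f_{CFD}(x') + \delta(x')\right]
   = \rho K_{CFD}(x, x') \\
\mathrm{Cov}\left[f_{Exp}(x), f_{Exp}(x')\right]
  &= \rho^{2} K_{CFD}(x, x') + K_{bias}(x, x') \\
\mu_{Exp} &= \rho \mu_{CFD} + \mu_{\delta}
\end{align}
```

The cross terms between $f_{CFD}$ and $δ$ vanish under the independence assumption, which is why only $K_{CFD}$ and $K_{bias}$ appear.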

The recursive approach of Le Gratiet [15] decouples the inference of the fidelities into standard GP inference. The key step of the formulation is to infer the posterior of the low-fidelity model $\tilde{f}_{CFD}(x)$ independently and use it in place of the prior $f_{CFD}(x)$ in Equation (11). With this formulation, the multi-fidelity GP posterior distribution for the high fidelity, $p(f_{Exp} ∣ X_{Exp}, y_{Exp}, f_{CFD})$, is fully defined, and predictions are made with the following mean and covariance:

μ_{E x p, *} (x_{*}) = ρ μ_{C F D} (x_{*}) + μ_{δ} + K_{E x p, *} {(K_{E x p} + σ_{y, E x p}^{2} I)}^{- 1} [y_{E x p} - ρ μ_{C F D} (x_{E x p}) - μ_{δ}]

(13)

Σ_{E x p, *} (x_{*}) = ρ^{2} Σ_{C F D} (x_{*}) + K_{E x p, * *} - K_{E x p, *} {(K_{E x p} + σ_{y, E x p}^{2} I)}^{- 1} K_{E x p, *}^{T}

(14)
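The two-step structure behind Equations (13) and (14) can be sketched as follows: a first GP is fitted to the dense low-fidelity data, and a second GP is fitted to the residual between the sparse high-fidelity data and the scaled low-fidelity posterior mean. This is a one-dimensional toy under strong assumptions: $ρ$, $μ_{δ}$, and all kernel hyperparameters are fixed by hand instead of being learned, and the synthetic functions standing in for CFD and experiments are hypothetical.

```python
import numpy as np

def matern52(xa, xb, ell, sigma_f):
    """Matern-5/2 kernel for 1-D inputs with a single length scale."""
    r = np.abs(xa[:, None] - xb[None, :])
    s = np.sqrt(5.0) * r / ell
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(x, y, x_new, ell, sigma_f, sigma_y):
    """Standard zero-mean GP posterior mean and covariance."""
    K = matern52(x, x, ell, sigma_f) + sigma_y**2 * np.eye(len(x))
    K_s = matern52(x_new, x, ell, sigma_f)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, matern52(x_new, x_new, ell, sigma_f) - V.T @ V

def multi_fidelity_predict(x_cfd, y_cfd, x_exp, y_exp, x_new,
                           rho, mu_delta, hp_cfd, hp_exp):
    # Step 1: low-fidelity posterior at the probe and prediction locations.
    mu_cfd_exp, _ = gp_posterior(x_cfd, y_cfd, x_exp, *hp_cfd)
    mu_cfd_new, cov_cfd_new = gp_posterior(x_cfd, y_cfd, x_new, *hp_cfd)
    # Step 2: GP on the residual y_Exp - rho * mu_CFD(x_Exp) - mu_delta.
    residual = y_exp - rho * mu_cfd_exp - mu_delta
    mu_res, cov_res = gp_posterior(x_exp, residual, x_new, *hp_exp)
    mu = rho * mu_cfd_new + mu_delta + mu_res      # Equation (13)
    cov = rho**2 * cov_cfd_new + cov_res           # Equation (14)
    return mu, cov

# Synthetic stand-ins: dense "CFD" and four sparse "experimental" points.
x_cfd = np.linspace(0.0, 1.0, 21)
y_cfd = np.sin(2.0 * np.pi * x_cfd)
x_exp = np.array([0.1, 0.35, 0.6, 0.9])
y_exp = 1.2 * np.sin(2.0 * np.pi * x_exp) + 0.3 * x_exp
x_new = np.linspace(0.0, 1.0, 11)
mu_mf, cov_mf = multi_fidelity_predict(
    x_cfd, y_cfd, x_exp, y_exp, x_new, rho=1.2, mu_delta=0.0,
    hp_cfd=(0.3, 1.0, 1e-3), hp_exp=(0.5, 0.5, 1e-3))
truth = 1.2 * np.sin(2.0 * np.pi * x_new) + 0.3 * x_new
max_err = np.max(np.abs(mu_mf - truth))
```

In this toy case the low-fidelity GP carries the shape of the signal, so the residual GP only has to learn a smooth correction from four points.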

The proposed approach offers several benefits over the traditional GP regression approach. Because CFD is used as low-fidelity data in the model, its high data density across the full domain enables the prediction of the unknown flow values $y_{Exp, *}$ at non-sampled domain locations $x_{*} = (r, θ)$ that the experimental probe cannot reach. This makes it possible to infer the flow in regions where few or no measurements are available, such as the hub boundary layer of the H25 test case.

In summary, three modeling approaches of increasing complexity (single-fidelity experiments, single-fidelity CFD to experiments, and multi-fidelity modeling) were presented for flow field reconstruction from undersampled measurements. The subsequent sections analyze how accurately each approach reconstructs the flow field, especially in challenging scenarios with limited measurements, and discuss each approach’s strengths, limitations, and practical implications.

4. Results

This section presents a comprehensive analysis of various GP modeling approaches, with the goal of evaluating their performance within a data-driven hybrid measurement technique presented in Figure 2. An initial investigation involves a systematic comparison of all GP modeling approaches, utilizing a consistent measurement grid. This ensures a consistent evaluation of mean flow inference and its associated uncertainty across different approaches.

The data-driven hybrid measurement technique integrates a Design of Experiments (DoEs) phase, currently in development and not presented in this work. A preliminary application of this DoEs phase revealed that the optimal number of measurements is around one-third (33%) of the complete experimental reference test dataset.

The complete 950 experiment points dataset is divided into two distinct subsets: the training subset and the validation subset. The training subset corresponds to an optimal 33% measurement grid and is utilized in the optimization process, maximizing the marginal log-likelihood of Equation (4), enabling hyper-parameter learning and flow reconstruction. The validation subset, corresponding to the remaining 67% measurements, serves as the ground truth and is compared with the predicted flow reconstruction to evaluate the performance of the GP modeling approach.
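The split described above can be illustrated with index arrays over the 25 by 38 measurement grid. Because the DoE-based selection is not presented in this work, the sketch below substitutes a random one-third selection purely as a placeholder; the seed and variable names are hypothetical.

```python
import numpy as np

n_radial, n_pitch = 25, 38
n_total = n_radial * n_pitch          # 950 measurement locations
n_train = n_total // 3                # about one-third of the dataset

# Placeholder selection: a random permutation stands in for the DoE grid.
rng = np.random.default_rng(0)
perm = rng.permutation(n_total)
train_idx = np.sort(perm[:n_train])   # used for hyperparameter learning
val_idx = np.sort(perm[n_train:])     # held out as ground truth
```

The training indices select the measurements used to maximize Equation (4), and the validation indices select those compared against the reconstruction.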

To assess the different GP modeling approaches’ performance in predicting flow characteristics, the following fundamental metrics are assessed:

Root Mean Squared Error (RMSE): RMSE is a standard error metric that quantifies prediction accuracy. It calculates the square root of the average squared differences between predicted and actual values.

RMSE = \sqrt{\frac{1}{N_{v a l}} \sum_{i = 1}^{N_{v a l}} {(y_{*}^{i} - y_{v a l}^{i})}^{2}}

(15)

where $y_{*}^{i}$ represents a predicted value and $y_{val}^{i}$ represents the corresponding ground truth validation measurement.

Maximum Absolute Error (MaxAE): MaxAE identifies the most significant prediction error, enabling a direct comparison with experimental measurement uncertainty. A notable difference between MaxAE and RMSE indicates problematic model performance in specific flow regions.

MaxAE = \max_{1 \leq i \leq N_{val}} |y_{*}^{i} - y_{val}^{i}|

(16)
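Both metrics of Equations (15) and (16) reduce to a few NumPy operations, as in the sketch below. The function names and the sample values are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def rmse(y_pred, y_val):
    """Root mean squared error, Equation (15)."""
    err = np.asarray(y_pred) - np.asarray(y_val)
    return float(np.sqrt(np.mean(err**2)))

def max_ae(y_pred, y_val):
    """Maximum absolute error, Equation (16)."""
    err = np.asarray(y_pred) - np.asarray(y_val)
    return float(np.max(np.abs(err)))

# Hypothetical values: two exact predictions and one that is off by 2.
y_pred = np.array([1.0, 2.0, 3.0])
y_val = np.array([1.0, 2.0, 5.0])
print(rmse(y_pred, y_val), max_ae(y_pred, y_val))
```

Here the single large error dominates MaxAE (2.0) while RMSE is smaller (the square root of 4/3), which is the kind of gap that points to a localized problem region.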

4.1. Baseline GP

This section presents a comprehensive analysis of the Baseline GP model within the context of the data-driven hybrid measurement technique, serving as a reference for subsequent comparisons with more advanced GP models. The baseline approach adopts a standard single-fidelity GP, associating each outlet pressure measurement $y_{i}$ with its domain location $x_{i} = (r, θ)_{i}$.

The systematic application of the baseline GP model on a consistent measurement grid provides insights into mean flow inference and its associated uncertainty. Trained on a carefully selected 33% subset of the complete experimental dataset, the model demonstrates its capability to reconstruct the flow field across the entire spatial domain.

Figure 6 illustrates the comparison between the predicted total pressure flow values ($y_{*}$), on the right, and the ground truth validation data ($y_{val}$). The selected experimental grid is represented with white crosses.

The Baseline GP model shows a smooth flow field prediction, revealing a generally consistent match between predicted and actual flows. It accurately captures high-pressure regions and secondary flow structures in terms of both shape and location. Notably, the model successfully captures the radial contouring of the strong gradient wake and other secondary flows, such as the boundary layer hub region (A) and losses emanating from the rotor tip vortex in the casing (B). These are correctly predicted in terms of size and magnitude.

However, there are notable differences, particularly in predicting the minimum pressure value in the wake. The GP model struggles to infer the core of the low-pressure wake region (C), where higher losses are expected. Another observation is the transition between the hub secondary flow structure and the blade wake region, which appears more pointwise in the prediction but spans a larger pitchwise extent in the experimental reference.

The Bayesian modeling enables direct uncertainty propagation, as illustrated in Figure 7. This serves a dual purpose—showcasing the GP framework capability to predict the mean flow and its associated uncertainty, and providing insights into areas with potentially less certain predictions.

The model predicted uncertainty tends to increase rapidly outside the studied domain, which is particularly evident around the pitch close to zero. Furthermore, the model exhibits higher uncertainty in regions where no experimental measurements are available, as expected. The highest uncertainty within the domain of interest is observed in the strong gradient wake region, primarily around a radial span of 0.4, corresponding to the location of the wake core highlighted above. This increased uncertainty in regions of the wake and transition zones indicates that the model does not overstate its confidence in inherently challenging-to-predict flow regions. Importantly, even in regions with “high uncertainty”, the predicted uncertainty remains within the bounds of the full experimental test’s uncertainty. In other parts of the predicted flow, the uncertainty is comparatively lower.

The performance metrics presented in Table 1 provide a quantitative assessment of the Baseline GP model. Despite specific challenges in certain flow regions, the model demonstrates strong predictive capabilities, with both RMSE and MaxAE values at magnitudes of $1 × 10^{-3}$ or lower. Notably, the MaxAE discrepancy, observed in the wake core, is still within a reasonable range. Importantly, both errors are smaller than the experimental uncertainty, suggesting that the Baseline GP model provides predictions aligning well with the expected variability in experimental measurements. This is a positive indicator of the model’s robustness in practical applications.

In summary, the Baseline GP model, while encountering specific challenges in certain flow regions, serves as a strong foundation for the subsequent comparison with other modeling approaches. The performance metrics validate its capability to predict the flow field with high accuracy and provide a realistic representation of uncertainty.

4.2. CFD to Experiments

This subsection presents the CFD to Experiments GP modeling approach, as proposed in [11]. Unlike the Baseline GP model, this approach introduces an innovative coupling of numerical data with experimental measurements within the GP framework.

In this approach, a total of $J$ RANS simulations are executed, each corresponding to a different operating condition of the turbomachinery. These simulations provide comprehensive flow field data across the entire domain, effectively capturing the system’s behavior under different scenarios. Instead of directly mapping the physical input domain, each input $x_{i}$ becomes a $J$-dimensional vector containing all $J$ RANS total pressure results evaluated at a specific grid point $i = (r, \theta)$. This strategic choice aligns with the case of total pressure at the compressor outlet, where each experimental measurement $y_{i}$ is available.
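The construction of the inputs described above can be sketched as follows. The number of simulations, the grid size, the number of measured points, and the random placeholder fields are hypothetical; in practice the array would hold the RANS total pressure interpolated onto the measurement grid.

```python
import numpy as np

rng = np.random.default_rng(0)

J = 5                   # number of RANS simulations (hypothetical)
n_r, n_theta = 10, 12   # radial and pitchwise grid sizes (hypothetical)

# Placeholder for the J RANS total pressure fields on the (r, theta) grid
cfd_fields = rng.normal(loc=1.0, scale=0.01, size=(J, n_r, n_theta))

# Each grid point i = (r, theta) becomes one row holding its J CFD values
X_all = cfd_fields.reshape(J, -1).T          # shape (n_r * n_theta, J)

# The GP is trained only at the grid points where a measurement y_i exists
measured = rng.choice(n_r * n_theta, size=40, replace=False)
X_train = X_all[measured]                    # shape (40, J)
```

With this layout the GP no longer sees the coordinates $(r, \theta)$ directly: two grid points are considered close when their CFD total pressure values are similar across all $J$ simulations.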

Figure 8 visually compares the predicted total pressure flow values ($y_{*}$) using the CFD to Experiments GP approach (on the right) with the ground truth validation data ($y_{val}$) on the left.

The qualitative analysis of the CFD to Experiments GP model reveals distinctive features in the flow patterns. The model successfully captures key aspects, such as high-pressure regions and the wake. However, notable differences compared to the Baseline GP model are observed. Unlike the Baseline approach, the flow in this case exhibits sharper pointwise gradients in the radial contouring of the wake. Additionally, flow discrepancies appear larger, with the wake core not being correctly represented and its shape not aligning with the experimental reference (C).

In the tip region of the flow (B), the GP prediction displays a different flow feature shape, with the tip flow occupying a larger span and pitch area in the blade wake region and having a different direction. This results in an overprediction of the pressure losses associated with this secondary flow.

Another effect of using CFD is the presence of a passage vortex signature in the hub region around pitch 0.5 in the predicted flow. This passage vortex is clearly visible in all CFD simulations at this span height, but it is not evident in the experimental reference (D). Although information is extracted from the CFD, it might be incompatible with the experimental reference, hindering the flow assessment capabilities of the GP modeling approach.

Figure 9 extends the qualitative examination by illustrating the uncertainty in the predicted pressure ratio ($\pm 2\sigma$). This provides insights into the areas where the CFD to Experiments GP approach exhibits higher uncertainty.

Overall, the uncertainty levels are equivalent to or higher than in the Baseline GP case, with both the lowest and highest predicted uncertainty values being double those of the Baseline GP. However, the regions of higher uncertainty differ. Unlike the Baseline GP, where the highest uncertainty was at the domain limits, in the CFD to Experiments approach the regions of higher uncertainty are the blade wake and the boundary layers at both hub and tip, where the mean flow differences were highlighted above. Notably, the wake radial gradient contouring is clearly detected as a region of high uncertainty. This is believed to occur because the different CFD simulations, at different operating points, have different wakes. Similarly, the passage vortex region, not present in the experiments but present in the CFD, shows high uncertainty as well.

Table 2 presents the performance metrics of the CFD to Experiments Gaussian Process (GP) modeling approach and compares them with the Baseline GP using 33% of the measurements.

The increase in both RMSE and MaxAE for the CFD to Experiments modeling highlights a less accurate representation of the flow field compared to the Baseline GP. Specifically, the CFD to Experiments model exhibits larger discrepancies in predicting total pressure values across the spatial domain, particularly in some important flow regions, with the MaxAE being one order of magnitude higher. While the Baseline GP demonstrated robust performance, the integration of CFD data in this current framework seems to introduce complexities leading to larger errors.

Differences observed in the CFD to Experiments model, such as sharper gradients and discrepancies in specific flow features, are reflected in the quantitative metrics. Utilizing CFD data, despite capturing certain flow characteristics, introduces uncertainties and discrepancies contributing to higher error metrics. Challenges in integrating CFD data with experimental measurements are evident in the notable increase in errors.

While being a novel approach, the present integration of CFD data introduces challenges, resulting in higher prediction errors. The comparison with the Baseline GP emphasizes the importance of carefully evaluating the impact of additional data sources on overall model performance. Looking ahead, the subsequent subsection explores the Multi-Fidelity GP model as an alternative, aiming to address the challenges highlighted in the CFD to Experiments modeling approach.

4.3. Multi-Fidelity GP

This section presents the Multi-Fidelity GP (MFGP) modeling approach, building upon the framework proposed in [12]. The MFGP model integrates CFD data as low-fidelity information and experimental measurements as high-fidelity data to map the input domain space $x_{i} = (r, \theta)_{i}$ to pressure measurements $y_{i}$. This approach adds CFD information and complexity to the Baseline GP, aiming to enhance the accuracy of flow field predictions.

Figure 10 visually compares the Multi-Fidelity GP model predicted total pressure flow values ($y_{*}$) with the ground truth validation data ($y_{val}$). The model demonstrates a visually seamless prediction, accurately capturing high-pressure regions and secondary flow structures (A) and (B). A distinctive enhancement is observed in the prediction of the wake lower pressure region (C), signifying improved accuracy in areas with higher losses. Additionally, the transition between the hub secondary flow structure and the blade wake region is more accurately represented, addressing a previous pointwise discrepancy in the Baseline GP.

Another characteristic of the Multi-Fidelity GP model is its ability to smooth the flow, providing a super-resolution effect. This inherent smoothing facilitates flow assessment with reduced experimental measurements, indicating potential benefits in terms of testing time and instrumentation effort.

Figure 11 shows the uncertainty of the Multi-Fidelity GP model, demonstrating a close resemblance to the Baseline GP. The model uncertainty pattern aligns with expectations, showing a rapid increase beyond the studied domain and higher uncertainty in regions without experimental measurements.

Contrary to initial expectations, the contribution of CFD uncertainty to the overall uncertainty is minimal. The scaling factor hyperparameter $\rho$, reflecting the relationship between CFD and experiments, is below 0.2, indicating a low contribution. This result is justified by a substantial discrepancy between RANS simulations and experiments, emphasizing the MFGP model’s adaptability when incorporating and propagating uncertainties in the presence of challenging data sources.
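As a hedged illustration of why a small scaling factor limits the CFD contribution, assume the autoregressive two-level form of Kennedy and O’Hagan in its recursive variant (Le Gratiet and Garnier), both listed in the references, with $\rho$ treated as a constant point estimate. The symbols $f_{\mathrm{CFD}}$, $f_{\mathrm{exp}}$, $\delta$, $\sigma_{\mathrm{CFD}}$ and $\sigma_{\delta}$ are introduced here for illustration only and may differ from the formulation actually implemented:

$$
f_{\mathrm{exp}}(x) = \rho\, f_{\mathrm{CFD}}(x) + \delta(x), \qquad
\sigma_{\mathrm{MF}}^{2}(x) = \rho^{2}\, \sigma_{\mathrm{CFD}}^{2}(x) + \sigma_{\delta}^{2}(x)
$$

Under this assumption, $\rho < 0.2$ gives $\rho^{2} < 0.04$, so the low-fidelity variance enters the predictive variance with a weight below 4%, and the discrepancy term $\delta$ carries most of the uncertainty.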

Table 3 provides a quantitative assessment of the Multi-Fidelity GP modeling approach, comparing its performance metrics with the Baseline GP.

The RMSE for the Multi-Fidelity GP is $4.5 \times 10^{-4}$, a slight improvement of −4.3% compared to the Baseline GP. This reduction in RMSE indicates that the MFGP model provides a more accurate overall prediction of total pressure values, aligning closely with the experimental reference. For the MaxAE, the Multi-Fidelity GP model achieves a value of $1.6 \times 10^{-3}$, a −5.9% change relative to the Baseline GP. The negative percentage change implies a reduction in the maximum prediction error.
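The relation between the quoted values and percentage changes can be checked with a short sketch. Only the MFGP values and the percentages come from the text; the baseline values printed below are back-computed from them and are therefore approximate, since the quoted figures are rounded. The function names are hypothetical.

```python
def percent_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

def implied_baseline(new, pct):
    """Baseline value implied by a new value and its percentage change."""
    return new / (1.0 + pct / 100.0)

# Values quoted in the text for the Multi-Fidelity GP
rmse_mfgp, rmse_pct = 4.5e-4, -4.3
maxae_mfgp, maxae_pct = 1.6e-3, -5.9

# Back-computed (approximate) Baseline GP values, not read from Table 3
rmse_base = implied_baseline(rmse_mfgp, rmse_pct)
maxae_base = implied_baseline(maxae_mfgp, maxae_pct)

print(f"implied baseline RMSE  ~ {rmse_base:.2e}")
print(f"implied baseline MaxAE ~ {maxae_base:.2e}")
```

A negative percentage change corresponds to an error lower than the baseline, which is the sign convention used in the comparison above.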

The improvements observed in both RMSE and MaxAE show that the Multi-Fidelity GP model captures flow field characteristics better than the Baseline GP. The model’s ability to better represent the wake core lower pressure region and refine the transition between flow structures contributes to these improvements.

Overall, the Multi-Fidelity GP approach demonstrates promising results, providing a more accurate and detailed representation of the flow field. The negative percentage changes in error metrics emphasize the advancement achieved by integrating CFD data as a supplementary information source in the present modeling framework. The Multi-Fidelity GP model’s effectiveness at mitigating the limitations observed in the Baseline GP positions it as a valuable alternative for comprehensive flow field predictions in practical applications, which makes it the most suitable data fusion approach among those evaluated for the proposed data-driven hybrid measurement technique in development.

4.4. Sensitivity Analysis

In this final segment of the results section, a comprehensive sensitivity analysis is conducted to evaluate the robustness and performance of the three GP modeling approaches (Baseline GP, CFD to Experiments, and Multi-Fidelity GP) under varying levels of measurement subsampling and different measurement grids.

The sensitivity analysis evaluates subsampling percentages of 10%, 20%, 33%, and 50%, corresponding to absolute measurement points of 95, 190, 313, and 475, respectively. This systematic exploration aims to assess the models’ sensitivity to the quantity and distribution of experimental measurements, providing insights into their adaptability and performance across different measurement scenarios and grids.
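The subsampling step can be sketched as below. The full grid size of 950 points is an assumption inferred from the quoted counts (95 points at 10%), and rounding down is assumed because it reproduces 313 points at 33%; the function name and the random seed are hypothetical.

```python
import numpy as np

N_TOTAL = 950  # assumed full-grid size, inferred from 95 points at 10%

def subsample_indices(n_total, percent, rng):
    """Draw a random measurement grid keeping `percent` % of the points."""
    n_keep = (percent * n_total) // 100  # integer floor (assumed rounding rule)
    return rng.choice(n_total, size=n_keep, replace=False)

rng = np.random.default_rng(42)
counts = {p: len(subsample_indices(N_TOTAL, p, rng)) for p in (10, 20, 33, 50)}
print(counts)
```

Repeating the draw with different seeds yields the different randomly generated measurement grids over which the error statistics are collected.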

Figure 12 presents the sensitivity of each modeling approach depicted through a series of RMSE box plots for different measurement subsampling percentages (10%, 20%, 33%, and 50%) and across different randomly generated measurement grids. The box plots provide a visual representation of the distribution of RMSE values, allowing for a comparative analysis of model performance under diverse measurement conditions. In these box plots, the whiskers span from the lowest to the highest values. The box itself represents the first and third quartiles, with the median shown as a line.
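The quantities shown in each box plot, as described above, can be obtained with a sketch like the following. The RMSE values are synthetic placeholders standing in for the results over several random grids, and the function name is hypothetical.

```python
import numpy as np

def box_plot_summary(rmse_values):
    """Whisker ends (min, max), quartiles, and median of a set of RMSE values."""
    v = np.asarray(rmse_values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    return {
        "min": float(v.min()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(v.max()),
    }

# Synthetic RMSE values for five random measurement grids (placeholders)
rmse_per_grid = [4.0e-4, 4.5e-4, 5.0e-4, 5.5e-4, 6.0e-4]
summary = box_plot_summary(rmse_per_grid)
spread = summary["max"] - summary["min"]  # whisker-to-whisker spread
```

The whisker-to-whisker spread is the quantity that reflects sensitivity to the choice of measurement grid at a fixed subsampling percentage.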

For all the modeling approaches, an increase in the number of measurements leads to a reduction in RMSE. This reduction is particularly pronounced for the Baseline GP and the Multi-Fidelity GP. This behavior is due to the convergence of the hyperparameters, as discussed earlier: once a sufficient number of measurements is used, the models effectively learn the correct hyperparameters, and additional data do not significantly improve the modeling.

The results clearly indicate that the CFD to Experiments GP consistently performs worse than both the Baseline GP and Multi-Fidelity GP across all subsampling percentages. The box plots reveal higher median RMSE values for the CFD to Experiments GP, indicating a less accurate flow prediction compared to the other two models.

In contrast, the Baseline GP and Multi-Fidelity GP exhibit comparable performance. The box plots for both models show similar medians, suggesting a consistent level of accuracy across different subsampling percentages. However, the Multi-Fidelity GP stands out as a more robust option, as evidenced by its lower box plot spread and fewer outliers. Since the spread between the box plot whiskers can be primarily attributed to the influence of the randomly selected experimental measurement grids, this indicates that the Multi-Fidelity GP is less sensitive to variations in the experimental grid and maintains a stable performance even with reduced measurement data. The outliers correspond to randomly generated grids where the measurement points are clustered in regions of low gradients. In these cases, the Multi-Fidelity GP can still rely on the gradients from the CFD, while the Baseline GP model cannot due to lack of information.

These findings highlight the advantages of the Multi-Fidelity GP model, showcasing its robustness and ability to provide accurate predictions with a reduced number of experimental measurements for the proposed data-driven Bayesian hybrid measurement technique in development.

5. Conclusions

In this work, three Gaussian Process (GP) modeling approaches—Baseline GP, CFD to Experiments GP, and Multi-Fidelity GP—are compared in the context of a data-driven hybrid measurement technique for turbomachinery applications. The goal was to evaluate their effectiveness in predicting mean flow characteristics and associated uncertainties, with a particular focus on informing the development of a hybrid measurement approach.

The Baseline GP served as a robust foundation, showcasing accurate predictions of the flow field and providing realistic uncertainty estimates. However, the novel integration of CFD data in the CFD to Experiments GP introduced complexities, resulting in larger prediction errors and uncertainties and highlighting the need for careful consideration of additional data sources.

The Multi-Fidelity GP approach integrates CFD data as low-fidelity data and experimental measurements as high-fidelity data. It demonstrated enhanced accuracy in predicting critical flow features, such as the wake core and hub and casing regions.

The sensitivity analysis demonstrated the robustness of the Multi-Fidelity GP, even with a reduced number of experimental measurements whose locations might not be optimal. This is relevant in practical applications where resource constraints may limit the number of measurements. The Multi-Fidelity GP is, therefore, a reliable data fusion method for the proposed hybrid measurement technique, offering the potential to streamline instrumentation efforts and reduce testing time.

This study lays the groundwork for further investigation into the scalability and practical implementation of these GP modeling approaches in real-world turbomachinery testing environments. Ongoing research is focusing on predicting the optimal number and location of measurement points based on CFD information. This method will then be experimentally validated in an industrial test case, to evaluate its feasibility and impact on reducing instrumentation efforts and testing time.

Author Contributions

Conceptualization, F.F. and X.O.; methodology, G.G.C.; software, G.G.C.; validation, G.G.C.; formal analysis, G.G.C.; investigation, G.G.C., F.F. and X.O.; resources, F.F. and X.O.; writing—original draft preparation, G.G.C.; writing—review and editing, F.F. and X.O.; visualization, G.G.C.; supervision, F.F. and X.O.; project administration, F.F. and X.O.; and funding acquisition, F.F. and X.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The design of the considered test article is the property of Safran Aircraft Engines. As such, the sharing of geometrical data and absolute performance parameters is either restricted or not allowed.

Acknowledgments

The authors would like to express their gratitude to Cedric Babin for the use of the H25 compressor stage data used for the present study, and Cadence for granting the use of the simulation software.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CFD	Computational Fluid Dynamics
DoEs	Design of Experiments
GP	Gaussian Process
RMSE	Root Mean Squared Error
MaxAE	Maximum Absolute Error

References

Hammond, J.; Pepper, N.; Montomoli, F.; Michelassi, V. Machine Learning Methods in CFD for Turbomachinery: A Review. Int. J. Turbomach. Propuls. Power 2022, 7, 16.
Venturi, D.; Karniadakis, G.E. Gappy data and reconstruction procedures for flow past a cylinder. J. Fluid Mech. 2004, 519, 315–336.
Bertram, A.; Bekemeyer, P.; Held, M. Fusing distributed aerodynamic data using Bayesian Gappy Proper Orthogonal Decomposition. In Proceedings of the AIAA Aviation and Aeronautics Forum and Exposition, AIAA AVIATION Forum 2021, Online, 2–6 August 2021; American Institute of Aeronautics and Astronautics: Reston, VA, USA, 2021.
Lou, F.; Key, N.L. Reconstructing Compressor Non-Uniform Circumferential Flow Field From Spatially Undersampled Data—Part 1: Methodology and Sensitivity Analysis. J. Turbomach. 2021, 143, 081002.
Seshadri, P.; Duncan, A.B.; Thorne, G.; Parks, G.; Vazquez Diaz, R.; Girolami, M. Bayesian assessments of aeroengine performance with transfer learning. Data-Centric Eng. 2022, 3, e29.
Brunton, S.L.; Noack, B.R.; Koumoutsakos, P. Machine Learning for Fluid Mechanics. Annu. Rev. Fluid Mech. 2020, 52, 477–508.
Aissa, M.H.; Verstraete, T. Metamodel-assisted multidisciplinary design optimization of a radial compressor. Int. J. Turbomach. Propuls. Power 2019, 4, 35.
Babin, C. Impact of Shrouded Stator Cavity Flow on Axial Compressor Performance and Stability. Ph.D. Thesis, Université de Lyon, Lyon, France, 2022.
Menter, F.R. Two-equation eddy-viscosity turbulence models for engineering applications. AIAA J. 1994, 32, 1598–1605.
Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2005; Volume 7, pp. 32–46.
Cruz, G.G.; Ottavy, X.; Fontaneto, F. Improvement of Measurement Accuracy Using Bayesian Inference—Reduction of Instrumentation Effort in an Axial Compressor. J. Phys. Conf. Ser. 2023, 2511, 012002.
Cruz, G.G.; Babin, C.; Ottavy, X.; Fontaneto, F. A Bayesian Data Driven Multi-Fidelity Modelling Approach for Experimental Under-Sampled Flow Reconstruction. In Volume 13A: Turbomachinery—Axial Flow Fan and Compressor Aerodynamics; American Society of Mechanical Engineers: New York, NY, USA, 2023; Volume 87080, p. V13AT29A043.
Babaee, H.; Perdikaris, P.; Chryssostomidis, C.; Karniadakis, G.E. Multi-fidelity modelling of mixed convection based on experimental correlations and numerical simulations. J. Fluid Mech. 2016, 809, 895–917.
Kennedy, M.; O’Hagan, A. Predicting the output from a complex computer code when fast approximations are available. Biometrika 2000, 87, 1–13.
Le Gratiet, L.; Garnier, J. Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. Int. J. Uncertain. Quantif. 2014, 4, 365–386.

Figure 1. Overall pressure ratio evolution in engine trends over the years, with crosses representing the current flying engines.

Figure 2. Proposed data-driven Bayesian hybrid measurement technique schematic for a complete accurate flow assessment and uncertainty quantification with a reduction in instrumentation usage and testing time.

Figure 3. VKI R4 facility layout for axial compressor studies.

Figure 4. H25 compressor stage experimental test section meridional view. Measurement planes in blue. Rotating machinery in red. Stator row in green.

Figure 5. H25 compressor stage numerical domain meridional view and blade-to-blade zoom cut (distorted blade geometry).

Figure 6. Baseline GP total pressure prediction (right) against fully experimental reference (left), with 33% measurements used, represented by the white marks. A, B and C represent critical flow features.