1. Introduction
The reactive flow in porous media plays an important role in many industrial, environmental, and biomedical applications. Since the reactions occur at the pore scale, 3D pore-scale simulations are very important. At the same time, pore-scale measurements are difficult or impossible, and usually some averaged quantities are measured instead. One such measurable quantity of interest is the breakthrough curve, i.e., the time-dependent integral of the species concentration over the outlet. Breakthrough curves can be computed by numerically solving a reaction–advection–diffusion equation on a porous medium. However, this usually leads to very high computational costs, which may be prohibitive in a multi-query application, e.g., the optimization of the geometry of the porous medium. In that case, a surrogate for the full order model (FOM) is required. A promising way of obtaining an adequate surrogate model is the use of machine learning (ML) techniques. ML techniques are nowadays widespread and employed for a variety of tasks, including the estimation of damage for buried pipelines [1] or automatic vision-based sewer inspection [2]. ML algorithms have been developed and tested for porous media flows of varying complexity. The literature in this area is very rich, and we cite some exemplary papers simply to position our research. Numerous papers discuss predicting permeability from microscale geometry; see, for example, [3,4,5] and the references therein. At the next level of complexity, e.g., in [6], the capability of deep learning techniques has been demonstrated for predicting the velocity field from the morphology of a porous medium. On the same topic, an improvement is achieved by incorporating coarse velocities in the learning process; see [7]. In the last decade, the application of ML techniques to the simulation of passive and reactive transport has grown rapidly. One direction is using ML to predict reaction rates when these are very expensive to compute for complex reactions. This can be implemented without taking the geometry into account, see, e.g., [8,9,10], or by predicting the rate from structural features, see, e.g., [4,11,12], to name just a few. Recently, the potential of using machine learning models as surrogates for predicting breakthrough curves for varying physical parameters, i.e., Damköhler and Péclet numbers, on a fixed porous medium geometry has been reported [13].
In the current paper, we address the task of predicting breakthrough curves for varying geometries of the porous medium at fixed Damköhler and Péclet numbers. Our work is in the same direction as that of [4,11], but there are essential differences. In [11], ML addresses the impact of structural features on the effective reaction rate to overcome the limitation of the well-mixing assumption; pore-scale reactive flow in an inert skeleton is considered there. In [4], pore-scale colloid transport is considered as part of a filtration problem, and steady-state problems are solved. In our case, the pore-scale geometry is in fact a two-scale medium, as the active washcoat particles are nanoporous. We consider the diffusion of the species within the washcoat, where the reaction occurs. Furthermore, while the aforementioned papers discuss different neural network algorithms, we consider a kernel-based method, namely VKOGA. To the best of our knowledge, such studies have not yet been discussed in the literature.
Because of the computationally demanding simulation of the FOM, we are limited to a scarce-data regime, as we can only afford to compute a few samples. Furthermore, the input dimension (the number of elements of the discretization of the porous medium) and the output dimension (the number of time steps used during the FOM simulation) are very high. This yields a challenging task for machine learning techniques [14]. For this purpose, greedy kernel approximation schemes [15,16] have been shown to be efficient. These meshless approximation techniques can be combined with deep learning techniques to yield two-layered kernels [17,18], which have already been successfully applied for certified and adaptive surrogate modeling of heat conduction [18], as well as surrogate modeling of chemical kinetics [8].
Our work is organized as follows: In Section 2, we introduce the underlying equations of the 3D porous medium reaction–advection–diffusion model. In Section 3, we give an overview of greedy kernel methods and two-layered kernels. The numerical experiments are presented in Section 4, and we conclude our work in Section 5.
3. Kernel Methods
Kernel methods [22] comprise versatile tools for scattered data approximation, revolving around the notion of a symmetric kernel $k: \Omega \times \Omega \to \mathbb{R}$ on a nonempty set $\Omega$. An important type of kernel is given by strictly positive definite kernels, i.e., kernels such that the so-called kernel matrix $K := (k(x_i, x_j))_{i,j=1}^{N} \in \mathbb{R}^{N \times N}$ is positive definite for any choice of pairwise distinct points $x_1, \dots, x_N \in \Omega$, $N \in \mathbb{N}$. In the context of machine learning, the set $X_N := \{x_1, \dots, x_N\}$ is often referred to as the training set. In the following, we focus on the popular class of radial basis function kernels on $\Omega \subseteq \mathbb{R}^d$, which can be expressed as $k(x, y) = \varphi(\|x - y\|)$ using a radial basis function $\varphi: [0, \infty) \to \mathbb{R}$. Popular instances of such kernels are given by the following:

$$\text{Gaussian:} \quad k(x, y) = \exp(-\varepsilon^2 \|x - y\|^2),$$
$$\text{Matérn 1:} \quad k(x, y) = (1 + \varepsilon \|x - y\|) \exp(-\varepsilon \|x - y\|),$$

which are also used later in Section 4. As we do not know the regularity of the target function, we apply both infinitely smooth kernels (Gaussian) and finitely smooth kernels (Matérn) in order to cover various possible regularities of the target function. The parameter $\varepsilon > 0$ is the so-called shape parameter, which allows us to tune the width of these kernels.
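To make these definitions concrete, the following minimal sketch implements the Gaussian and Matérn 1 kernels and assembles a kernel matrix in NumPy. The function names and the vectorized evaluation over point sets are our own illustration and not part of any specific package.

```python
import numpy as np

def gaussian_kernel(X, Y, ep=1.0):
    """Gaussian kernel k(x, y) = exp(-ep^2 ||x - y||^2), evaluated pairwise."""
    # Pairwise squared Euclidean distances between the rows of X and Y.
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-ep ** 2 * sq_dists)

def matern1_kernel(X, Y, ep=1.0):
    """Matérn 1 kernel k(x, y) = (1 + ep ||x - y||) exp(-ep ||x - y||)."""
    dists = np.sqrt(np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1))
    return (1.0 + ep * dists) * np.exp(-ep * dists)

# The kernel matrix of pairwise distinct points is symmetric and positive
# definite for both kernels; ep tunes the kernel width.
X_train = np.random.rand(5, 3)
K = gaussian_kernel(X_train, X_train, ep=2.0)
```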
Given not only the input data $X_N$ but also the corresponding (possibly vector-valued) target values $y_i := f(x_i) \in \mathbb{R}^m$, $i = 1, \dots, N$, a well-known representer theorem states that the optimal least-squares kernel approximant is given by

$$s(x) = \sum_{j=1}^{N} \alpha_j k(x, x_j) \qquad (4)$$

with coefficients $\alpha_j \in \mathbb{R}^m$. These coefficients can be computed directly by solving the regularized linear equation system $(K + \lambda I_N) A = Y$, where $A \in \mathbb{R}^{N \times m}$ and $Y \in \mathbb{R}^{N \times m}$ collect the coefficients and target values in arrays. The regularization parameter $\lambda \geq 0$ allows us to balance robustness to outliers/noise against approximation accuracy in the training points. The value $\lambda = 0$ corresponds to kernel interpolation; in this case, the assumed strict positive definiteness of the kernel still ensures the solvability of the system.
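A minimal sketch of this regularized least-squares fit is given below, reusing the kernel functions from the previous sketch; setting lam to zero recovers kernel interpolation. All names are our own illustration.

```python
import numpy as np

def fit_kernel_model(K, Y, lam=1e-8):
    """Solve the regularized system (K + lam I) A = Y for the coefficients."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), Y)

def evaluate_kernel_model(kernel, X_train, A, X_new, ep=1.0):
    """Evaluate s(x) = sum_j alpha_j k(x, x_j) at the new points."""
    return kernel(X_new, X_train, ep) @ A

# Example with vector-valued targets (m = 4 output components per point).
X_train = np.random.rand(20, 3)
Y_train = np.random.rand(20, 4)
K = gaussian_kernel(X_train, X_train, ep=2.0)
A = fit_kernel_model(K, Y_train, lam=1e-8)
Y_pred = evaluate_kernel_model(gaussian_kernel, X_train, A, np.random.rand(5, 3), ep=2.0)
```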
Greedy kernel approximation: In order to obtain a sparse kernel model, one strives for a smaller expansion size $n \ll N$ within Equation (4). For this purpose, greedy algorithms can be leveraged, which select a suitable subset of centers $X_n \subseteq X_N$. They start with $X_0 := \emptyset$ and iteratively add points from $X_N$, i.e., $X_{i+1} := X_i \cup \{x_{i+1}\}$, according to some selection criterion. While several selection criteria are in use in the literature, we focus on the f-greedy selection criterion, which reads

$$x_{i+1} := \arg\max_{x \in X_N \setminus X_i} |f(x) - s_i(x)|,$$

where $s_i$ denotes the kernel model built on the centers $X_i$. This residual-based selection criterion incorporates the data point with the largest error, i.e., it directly aims at minimizing the maximal absolute error of the kernel model. An efficient implementation of such greedy kernel algorithms is provided by the VKOGA package [15], and a comprehensive analysis of the convergence of such greedy kernel algorithms was provided in [16]. Because of the greedy selection strategy, VKOGA often compares favorably with other kernel-based strategies. Other methods for obtaining sparse kernel-based models are sparse Gaussian processes [23,24], support vector regression (SVR) [25] and the reduced support vector set approach [25]. The computational efficiency of VKOGA kernel models is shown, for example, in [26], where a comparison of VKOGA with support vector machines revealed that VKOGA results in sparser models and, hence, less computational time for training. Similarly, less computational time for prediction was observed in [13], where Gaussian processes and VKOGA were compared. In [27], VKOGA and SVR were compared with other ML techniques in a turbulent flow prediction scenario; the study revealed that these kernel methods gave the best performance regarding the quality of the approximated discrete-time dynamics.
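The sketch below illustrates the f-greedy selection with a naive refit in every iteration, assuming the kernel and fitting helpers from the previous sketches. The VKOGA package [15] realizes the same selection far more efficiently via Newton basis updates; this loop is purely illustrative.

```python
import numpy as np

def f_greedy_centers(kernel, X, Y, n_centers, ep=1.0, lam=0.0):
    """Naive f-greedy: repeatedly add the data point with the largest residual."""
    # With no centers, the model is zero, so the residual equals the target.
    selected = [int(np.argmax(np.linalg.norm(Y, axis=1)))]
    for _ in range(n_centers - 1):
        Xc = X[selected]
        Kcc = kernel(Xc, Xc, ep) + lam * np.eye(len(selected))
        A = np.linalg.solve(Kcc, Y[selected])      # coefficients of s_i
        residual = Y - kernel(X, Xc, ep) @ A       # f(x) - s_i(x) on all data
        errors = np.linalg.norm(residual, axis=1)
        errors[selected] = -np.inf                 # never re-select a center
        selected.append(int(np.argmax(errors)))
    return selected
```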
Two-layered kernels: In order to incorporate feature learning into kernel models, we make use of two-layered kernels [17,18]: These make use of a base kernel $k$, as given above, and combine it with a matrix $A \in \mathbb{R}^{d \times d}$, such that the two-layered kernel is given by

$$k_A(x, y) := k(Ax, Ay).$$

With this, the two-layered kernel can be optimized for the given data by optimizing the first-layer matrix $A$. Thus, this two-layered structure may considerably improve the performance of the kernel model, especially for medium- to high-dimensional input data, where an equal importance of all features is highly unlikely. The strength of two-layered kernel models and their superior performance over shallow kernel models has been observed, for example, when applied to heat conduction [18] or when used as a surrogate model for chemical kinetics [8]. We investigate whether this behavior can also be observed for our current problem. For the optimization of the matrix $A$, we employ the gradient-descent-based mini-batch optimization proposed in [17] and extended to vector-valued target values in [18]. The overall idea is to leverage an efficiently computable leave-one-out cross-validation (LOOCV) error loss, which makes the kernel generalize well to unseen data. In particular, this cross-validation error loss makes use of both input and target values, i.e., it is a supervised optimization.
The matrix $A$ makes the two-layered kernel even more interpretable: within the singular value decomposition of $A$, the right singular vectors corresponding to large singular values represent the more important feature directions within the data set, while those corresponding to small singular values represent the less important ones.
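A sketch of the two-layered kernel evaluation and of this SVD-based feature ranking follows; the LOOCV-driven mini-batch optimization of $A$ from [17,18] is not reproduced here, and the diagonal example matrix is purely illustrative.

```python
import numpy as np

def two_layered_kernel(base_kernel, A, X, Y, ep=1.0):
    """Two-layered kernel k_A(x, y) = k(Ax, Ay) with first-layer matrix A."""
    return base_kernel(X @ A.T, Y @ A.T, ep)

def feature_directions(A):
    """Singular values and right singular vectors of A; large singular values
    mark the more important feature directions in the data set."""
    _, S, Vt = np.linalg.svd(A)
    return S, Vt  # rows of Vt, ordered from most to least important

# Example: a first layer that strongly emphasizes the first input feature.
A = np.diag([10.0, 1.0, 0.1])
X = np.random.rand(8, 3)
K_A = two_layered_kernel(gaussian_kernel, A, X, X, ep=1.0)
S, Vt = feature_directions(A)
```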
In the following Section 4, we use the notion "single-layered kernel" to refer to the standard kernel $k$, while we use the notion "two-layered kernel" to refer to (optimized) kernels $k_A$.
4. Numerical Experiments
In this section, we present the numerical experiments. The goal is to approximate breakthrough curves, Equation (2), from voxel data that characterize the geometry of the porous medium. As explained in Section 2, the voxel data are described by a vector $z \in \mathbb{R}^{d_{\mathrm{vox}}}$, whose dimension $d_{\mathrm{vox}}$ is fixed by the voxel resolution chosen in our numerical experiments. We discretize a breakthrough curve $g$ on an equidistant temporal grid $t_1, \dots, t_{N_t}$ as the vector $\mathbf{g} := (g(t_i))_{i=1}^{N_t}$ with $N_t$ time steps.
We compare two different kernel-based models for learning the breakthrough curves (see Figure 2), in which either morphological features (MF) prescribed by expert knowledge (see Appendix A) are extracted or principal component analysis (PCA) features are learned from the data. The kernel function $k$ is chosen to be either a shallow one-layered kernel or a two-layered kernel. Within the scope of our paper, we refer to these models as MF-1L-kernel and MF-2L-kernel or PCA-1L-kernel and PCA-2L-kernel, depending on the depth of the kernel (one-layered/two-layered) and on whether we extract morphological features or PCA features.
Both types of models (MF) and (PCA) are based on a feature map $\phi: \mathbb{R}^{d_{\mathrm{vox}}} \to \mathbb{R}^q$ with $q$ features and a kernel function $k: \mathbb{R}^q \times \mathbb{R}^q \to \mathbb{R}$, such that the resulting kernel models using the single-layered or the two-layered kernels are given as

$$s(z) = \sum_{j=1}^{n} \alpha_j k(\phi(z), \phi(z_j)) \qquad (6)$$

with coefficients $\alpha_j \in \mathbb{R}^{N_t}$ and centers $\phi(z_j)$ for $j = 1, \dots, n$. The expansion size $n$ is fixed across all experiments: Choosing smaller values for $n$ would worsen the results presented in the next subsections considerably, while increasing $n$ would only slightly influence the approximation quality on the test set.
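Putting the pieces together, a model of the form of Equation (6) is simply the kernel regression from Section 3 applied to feature vectors. The following sketch composes a generic feature map with the regularized fit; for simplicity, all training geometries serve as centers here, whereas our experiments use the greedy selection of $n$ centers, and all names are our own illustration.

```python
import numpy as np

def train_surrogate(feature_map, kernel, Z_train, G_train, ep=1.0, lam=1e-8):
    """Fit s(z) = sum_j alpha_j k(phi(z), phi(z_j)) on training geometries."""
    Phi = np.stack([feature_map(z) for z in Z_train])  # N_train x q features
    K = kernel(Phi, Phi, ep) + lam * np.eye(len(Phi))
    A = np.linalg.solve(K, G_train)                    # N_train x N_t coefficients
    return Phi, A

def predict_breakthrough(feature_map, kernel, Phi, A, z_new, ep=1.0):
    """Predict the discretized breakthrough curve of an unseen geometry."""
    phi_new = feature_map(z_new)[None, :]
    return (kernel(phi_new, Phi, ep) @ A)[0]           # vector of N_t time steps
```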
All of these models involve hyper-parameters, which are listed in Table 1. Suitable values for the hyper-parameters are determined via an LOOCV on the training data set. Note that in the cases where two-layered kernels are applied, no LOOCV for the shape parameter is performed; instead, the matrix $A$ is optimized using the procedure described in the previous section. The kernel function and the regularization parameter are determined via LOOCV in both cases.
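A minimal sketch of such a hyper-parameter selection is given below, assuming the kernels defined earlier; the actual parameter grids are those of Table 1, so the values here are placeholders.

```python
import numpy as np
from itertools import product

def loocv_error(kernel, X, Y, ep, lam):
    """Mean relative leave-one-out error of the regularized kernel model."""
    errs = []
    for i in range(len(X)):
        keep = np.delete(np.arange(len(X)), i)
        K = kernel(X[keep], X[keep], ep) + lam * np.eye(len(keep))
        A = np.linalg.solve(K, Y[keep])
        pred = kernel(X[i:i + 1], X[keep], ep) @ A
        errs.append(np.linalg.norm(pred - Y[i]) / np.linalg.norm(Y[i]))
    return np.mean(errs)

def select_hyperparameters(X, Y):
    """Grid search over kernel function, shape and regularization parameters."""
    grid = product([gaussian_kernel, matern1_kernel],  # kernel functions
                   [0.1, 1.0, 10.0],                   # shape parameters (placeholders)
                   [0.0, 1e-8, 1e-4])                  # regularization (placeholders)
    return min(grid, key=lambda p: loocv_error(p[0], X, Y, p[1], p[2]))
```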
In the following, we discuss the generation of the training and test data (Section 4.1), the training and results of the (MF) models based on morphological features (Section 4.2) and of the (PCA) models based on PCA features (Section 4.3).
4.1. Training and Test Data
In total, we consider $N_s$ samples $(z_i, \mathbf{g}_i)$ of voxel data $z_i$ and the corresponding time-discretized breakthrough curves $\mathbf{g}_i$ (see Figure 3). The breakthrough curves are obtained by solving Equations (1) and (3) based on the geometry described by the corresponding voxel data. The average computational cost of solving Equations (1) and (3) for one sample is 7514 wall-clock seconds (wcs) with a standard deviation of 286 wcs on a workstation with two Intel Xeon Gold 6240R CPUs (48 cores in total).
We use an approximate 80–20% training–test split, which results in $N_{\mathrm{train}}$ training samples and $N_{\mathrm{test}}$ test samples with $N_{\mathrm{train}} + N_{\mathrm{test}} = N_s$. To demonstrate robustness with respect to the choice of the training–test split, we consider three different random splits, where the same three random splits are used to measure the performance of each model. As an error measure, we use a relative error on the test set.
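Since the error measure is only described verbally here, the following sketch assumes a mean relative Euclidean error over the test curves; the exact norm and normalization are our assumption.

```python
import numpy as np

def relative_test_error(G_true, G_pred):
    """Mean relative Euclidean error over the test breakthrough curves.

    G_true, G_pred: arrays of shape (N_test, N_t), one discretized
    breakthrough curve per row.
    """
    per_sample = (np.linalg.norm(G_pred - G_true, axis=1)
                  / np.linalg.norm(G_true, axis=1))
    return per_sample.mean()
```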
4.2. Kernel Models on Morphological Features
Because of the high dimensionality of the input space, it is necessary to apply some reduction technique before training the machine learning model. In this section, we extract $q = 6$ morphological geometry features that describe the porous medium: the porosity, the washcoat volume fraction, the volume of the free pores, the surface area of the free pores, the integral of mean curvature of the free pores and the integral of total curvature of the free pores. For further information on how they are computed, see Appendix A. Thus, the machine learning model is obtained by setting $\phi = \phi_{\mathrm{MF}}$ in Equation (6), where $\phi_{\mathrm{MF}}$ denotes the mapping that computes the morphological features from a given geometry.
For the first experiment, we use shallow kernels and present the approximated breakthrough curves in Figure 4. We observe that, except for the red outlier curve in the bottom diagram, all curves are well approximated. This corresponds to the relative test errors presented in Table 2, where we see that a considerably smaller relative error is achieved for the first two data splits than for the third split. We further observe from Table 2 that for the first two data splits, exactly the same hyper-parameters (kernel function, shape parameter and regularization parameter) are chosen. However, they are chosen differently for the third data split, which may be due to the red outlier curve being part of the test set in that split.
For the next experiment, we include the two-layer optimization of the kernel model and present the approximated breakthrough curves in Figure 5. We observe that a similar approximation quality can be achieved using two-layered kernels: All breakthrough curves except for the red outlier curve in the bottom diagram are well approximated. The relative test error (see Table 3) is slightly improved for the third data split; however, for the first and second splits, it becomes slightly worse. We observe that, similar to the experiment with shallow kernel models, the Matérn 1 kernel is selected for each data split and that a smaller regularization parameter is selected for the third split than for the first and second ones. To compare the feature transformations learned by the first layer, we present the first-layer matrices $A$ in Table 3. For easier comparison, the matrices are visualized using the color map at the bottom of the table. We observe that the matrices look very similar for each data split.
4.3. Kernel Models on PCA Features
Next, we use PCA to define a feature map $\phi_{\mathrm{PCA}}$. In order to be comparable with the previous experiment, we choose the number of features as $q = 6$. Following the idea of PCA [28], we choose the PCA feature map based on the first $q$ left-singular vectors $u_1, \dots, u_q$ of the matrix $Z := (z_1, \dots, z_{N_{\mathrm{train}}}) \in \mathbb{R}^{d_{\mathrm{vox}} \times N_{\mathrm{train}}}$ of training voxel data via the SVD $Z = U \Sigma V^\top$ and set $\phi = \phi_{\mathrm{PCA}}$ with $\phi_{\mathrm{PCA}}(z) := (u_1, \dots, u_q)^\top z$ in Equation (6) (technical note: the voxel data are saved as a boolean array; to compute the SVD, we convert these data to floating point numbers, which is why we write $z \in \mathbb{R}^{d_{\mathrm{vox}}}$).
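A sketch of this PCA feature extraction via a thin SVD of the float-converted training voxel matrix follows; whether the data are mean-centered before the SVD is not specified above, so the uncentered variant is an assumption.

```python
import numpy as np

def fit_pca_basis(Z_train_bool, q=6):
    """First q left-singular vectors of the training voxel data.

    Z_train_bool: boolean array of shape (d_vox, N_train), one voxel
    geometry per column, converted to float for the SVD.
    """
    Z = Z_train_bool.astype(np.float64)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)  # thin SVD
    return U[:, :q]                                  # d_vox x q basis U_q

def pca_feature_map(U_q, z_bool):
    """phi_PCA(z) = U_q^T z: project a geometry onto the first q PCA modes."""
    return U_q.T @ z_bool.astype(np.float64)
```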
In Figure 6, we present the predictions of the breakthrough curves by the PCA-1L-kernel model on the test set for the three different randomized training–test splits. We observe that many of the breakthrough curves are well approximated, and a mean relative error of approximately 0.5–1.3% (see Table 4) can be achieved. However, some breakthrough curves are badly approximated, especially the two blue curves for the first data split. We observe that three different kernels, three different shape parameters and three different regularization parameters are chosen across the three data splits. We further observe that $q = 6$ is a suitable choice in the sense that a small test error is achieved for this setting compared to larger and smaller values of $q$.
For the next experiment, we consider the PCA-2L-kernel model. The results of the experiments analogous to the previous paragraph are presented in Figure 7. We again chose $q = 6$ to be consistent with the previous experiments, although slightly better results could have been achieved with a different number of PCA features.
We observe that the results of the shallow kernel models are considerably improved by using two-layered kernels. The relative errors in Table 5 are reduced to 0.08–0.7%, and we no longer observe badly approximated curves for the first data split. Compared to the MF-2L-kernel experiments from the previous section, we also observe that the approximation qualities are quite similar. This means that the PCA features transformed by the first layer of the two-layered kernel are able to describe the geometry almost as well as the morphological features. This demonstrates the strength of two-layered kernels, as the PCA feature extraction is purely data-based, and no expert knowledge had to be used.
We further observe from Table 5 that, again, similar hyper-parameters are selected for each data split. For example, the same kernel (Matérn 1) was selected for the first and third data splits. Moreover, we see similarities in the first-layer matrices $A$ of the kernels: For all three matrices, the entries in the upper left $3 \times 3$ block are considerably larger than the other entries, which means that the first three PCA modes are the important ones. Compared to Table 3, there are larger differences between the first layers. This can be explained by the fact that, in contrast to the MF feature map, the data-based PCA feature map differs between the three data splits, since the linear mappings defined by the PCA modes are different in each split.
Lastly, we compare the mean errors averaged over the three different training–test splits in Table 6. We observe that the models based on morphological feature extraction work very well in combination with both one-layered and two-layered kernels. In contrast, combining one-layered kernels with PCA feature extraction yields an average error that is almost one order of magnitude larger. This error is considerably reduced by applying two-layered kernels, which achieve almost the same accuracy as with the morphological feature extraction.
4.4. Runtime Discussion
In this subsection, we compare the runtimes of the different methods in Table 7. We report the time needed for the computation of the feature map plus the feature extraction time on the test set (first column), the time needed for the LOOCV, the training time of the final model and the evaluation time on the test set. Note that the full order model solves and the morphological feature extraction are performed on a different machine (the workstation with two Intel Xeon Gold 6240R CPUs, 48 cores in total) than the PCA feature extraction and the kernel model training (performed on a computer with 64 GB RAM and a 13th Gen Intel i7-13700K processor) due to licensing limitations. We observe that the LOOCV for the two-layered models is far more expensive than for the shallow models, which is due to the optimization of the first layer. However, it is still considerably cheaper (by a factor of approximately 30 for the MF-2L-kernel and 7 for the PCA-2L-kernel) than the generation of a single training sample (7514 wcs). Furthermore, the LOOCV and the training of the final model are less expensive for the MF-kernel models than for the PCA-kernel models; however, the computation of the PCA feature map is less expensive than the computation of the morphological features. Moreover, the evaluation time of the PCA-kernel models is larger than that of the MF-kernel models. Since the computation of the morphological features, i.e., the evaluation of $\phi_{\mathrm{MF}}$, takes approximately 2 wcs per sample, the MF approaches nevertheless have a larger overall evaluation time (summing the evaluation times of the MF extraction and the MF-1L-/MF-2L-kernel models). Compared to the FOM, this results in a speed-up of almost six orders of magnitude for the PCA kernel models and three orders of magnitude for the MF kernel models. In both cases, the evaluation times are considerably lower than the computation time of a single training sample.
5. Conclusions and Outlook
In this work, we demonstrated how breakthrough curves can be predicted from the geometry of a 3D porous medium. We presented two approaches for treating the high dimension of the input data: For the first approach, we computed morphological features of the geometries and learned a mapping from these features to the breakthrough curves. For the second approach, we computed PCA features of the geometry data set and learned a mapping from these PCA features to the breakthrough curves.
We observed that the MF approach worked well in combination with one-layered kernels and that both approaches worked well in combination with two-layered kernels. This is compelling, as the morphological features used in this study are well-known informative and predictive quantities for porous media, whereas the PCA feature extraction is purely data-based. Thus, the results indicate the strength of two-layered kernels, as they automatically adapt to the relevant features. We further observed that the scarce-data regime did not prevent these approaches from accurately predicting the high-dimensional outputs.
Future work will focus on how ideas from convolutional neural networks can be combined with the framework of deep multi-layered kernels. Moreover, we will investigate whether data augmentation (rotating/reflecting geometry samples without changing the breakthrough curves) can further improve the PCA-feature approach while avoiding the computation of expensive FOM solutions.