Phenomenological Modeling of Antibody Response from Vaccine Strain Composition

Ovchinnikov, Victor; Karplus, Martin

doi:10.3390/antib14010006

Open AccessArticle

Phenomenological Modeling of Antibody Response from Vaccine Strain Composition

by

Victor Ovchinnikov

^1,* and

Martin Karplus

^1,2

¹

Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA

²

Laboratoire de Chimie Biophysique, ISIS, Université de Strasbourg, 67000 Strasbourg, France

^*

Author to whom correspondence should be addressed.

Antibodies 2025, 14(1), 6; https://doi.org/10.3390/antib14010006

Submission received: 30 November 2024 / Revised: 11 January 2025 / Accepted: 14 January 2025 / Published: 16 January 2025

Download

Browse Figures

Versions Notes

Abstract

:

The elicitation of broadly neutralizing antibodies (bnAbs) is a major goal of vaccine design for highly mutable pathogens, such as influenza, HIV, and coronavirus. Although many rational vaccine design strategies for eliciting bnAbs have been devised, their efficacies need to be evaluated in preclinical animal models and in clinical trials. To improve outcomes for such vaccines, it would be useful to develop methods that can predict vaccine efficacies against arbitrary pathogen variants. As a step in this direction, here, we describe a simple biologically motivated model of antibody reactivity elicited by nanoparticle-based vaccines using only antigen amino acid sequences, parametrized with a small sample of experimental antibody binding data from influenza or SARS-CoV-2 nanoparticle vaccinations. Results: The model is able to recapitulate the experimental data to within experimental uncertainty, is relatively insensitive to the choice of the parametrization/training set, and provides qualitative predictions about the antigenic epitopes exploited by the vaccine, which are testable by experiment. For the mosaic nanoparticle vaccines considered here, model results suggest indirectly that the sera obtained from vaccinated mice contain bnAbs, rather than simply different strain-specific Abs. Although the present model was motivated by nanoparticle vaccines, we also apply it to a mutlivalent mRNA flu vaccination study, and demonstrate good recapitulation of experimental results. This suggests that the model formalism is, in principle, sufficiently flexible to accommodate different vaccination strategies. Finally, we show how the model could be used to rank the efficacies of vaccines with different antigen compositions. Conclusions: Overall, this study suggests that simple models of vaccine efficacy parametrized with modest amounts of experimental data could be used to compare the effectiveness of designed vaccines.

Keywords:

IgG; vaccination; simulation; influenza; hemagglutinin; coronavirus

Graphical Abstract

1. Introduction

Infections by highly mutable viruses cause high mortality and morbidity around the world. For example, according to the CDC, between the years 2010–2014, influenza-associated deaths in the United States alone ranged from 12,000 (winter of 2011–2012) to 56,000 (winter of 2012–2013). The currently available flu vaccines need to be reformulated yearly because the high mutation rate of the circulating flu strains continuously renders preexisting immunity obsolete. Despite the yearly adjustments, the long time lag in producing the vaccine relative to the high rate of antigenic drift can result in a low level of protective immunity (10–60%) [1]. For example, in 2022, the seasonal influenza vaccine efficacy was estimated at 16% [2], significantly below the approximate threshold needed for herd immunity of 50% (assuming a basic viral reproduction number

R_{0}

[3] of 2).

The COVID-19 pandemic resulted in rapid development of highly effective vaccines [4], with effectiveness of ≥90% against the original (Wuhan) strain [5,6]. However, mutations in the coronavirus spike protein [7], compounded by high transmission rates [3] and selective evolutionary pressure, exerted by vaccine-induced antibodies, caused the emergence of viral “escape” variants, against which the antibodies induced by the standard prime-boost vaccination regimen had much reduced efficacy, e.g., 67–70% against, e.g., the Omicron variant [8]. Such levels of efficacy are lower than the 80% requirement for herd immunity based on a conservative

R_{0}

estimate of 5 [9].

The situation is more dire with HIV, which causes about 680,000 yearly deaths worldwide [10], and no effective vaccine is available, despite ongoing efforts [11].

The above statistics demonstrate that universal, ideally permanent, vaccines for highly mutable viral pathogens are desirable. One strategy for developing such vaccines focuses on the elicitation of broadly neutralizing antibodies (bnAbs), i.e., the Abs that are able to neutralize a broad range of antigen variants [12,13], in contrast to strain-specific antibodies, which typically dominate the adaptive immune response [14].

Universal vaccine candidates designed to elicit broad protection typically aim to present multiple epitopes to the immune system in the form of chimeric antigens harboring a patchwork of different epitopes [15], in the form of optimized antigenic cocktails [16,17], or as co-display of different antigens on nanoparticles [18,19,20,21].

An important step toward robust rational design of such vaccines is being able to predict properties of the antibody response to vaccination. Such methods could be used in comparisons of the modeled outcomes to different vaccination cocktails or regimens, to select those that predict the highest antibody breadth or potency.

Many models exist that simulate affinity maturation (AM) inside germinal centers using concepts from biology and physics (reviewed in Ref. [22]). In these models, Darwinian evolution, driven by random mutations and the selection of B-cell receptors (BCRs) on the basis of their affinity for antigen, generates high-affinity antibodies. However, most of the AM models are too approximate or coarse-grained to be able to evaluate specific vaccines [23,24] and thus provide only general design guidelines (e.g., regarding efficacy of single-antigen vs. cocktail vaccination [25,26], or regarding the effects of binding valency or antigen concentration [24]). Although advances in computational methods and speed are enabling modeling of BCR/antigen interactions in all-atom detail [27], such models are still too expensive to be used for generating meaningful statistics.

Here, we propose a simple biologically motivated phenomenological model for reactivity to antigen variants of post-vaccination sera elicited by nanoparticle vaccines with prescribed antigenic sequences. The key assumptions in the models are that (i) antibody reactivity is higher when the test strain—that against which reactivity is measured—is similar to the strains in the vaccine, and (ii) different amino acid residues make different contributions to the overall binding titer; the contributions are represented in the form of residue weights, optimized by fitting to experimental vaccination data for influenza [20] or SARS-CoV-2 [21]. Residues with high weights can be interpreted as being involved in antibody binding, which can be tested by experimental mutagenesis to assess model accuracy. In addition to parametrization using experimental data, the model requires only the antigenic sequences, so that structural information, which could be difficult or time-consuming to obtain, is not needed. We conclude with an example of how the model could be used to compare the efficacies of different antigen vaccination cocktails.

2. Materials and Methods

2.1. Model Overview

Our modeling approach is motivated by the recent promise of nanoparticle vaccines in eliciting cross-reactive immunity [18,19,21,28]. We developed the model to be especially predictive if the elicited immunity is composed of cross-reactive antibodies, rather than only of multiple strain-specific antibodies. More specifically, if a particular B-cell receptor (BCR) has significant affinities for multiple strains in the cocktail via the same general epitope or region on the antigen, we assume that the overall selection pressure acting on the corresponding B-cell during affinity maturation (AM) will be the average of the individual (strain-specific) selection pressures. As the main result of AM is to increase the binding affinity of BCRs to their cognate antigen(s) by improving the structural complementarity of the BCR paratope to the epitope via interfacial residue composition matching [27], we assume that the effective evolutionary driving force is directed toward the epitope with the “average” composition. As the structures of the antigens are generally unknown, we model the effect of the average structure through the average sequence, denoted above by

\bar{V}

. If most of the Abs elicited by vaccination arise from such BCRs, then we would expect the model to recapitulate the experimental data [20,21]. We would also expect such Abs to be cross-reactive in the antigenic “subspace” spanned by the antigens that are present in the vaccine. We note that the AM process assumed above is similar to the idea of “frustration” [23,29], whereby different co-administered antigens exert conflicting selection pressures on BCR evolution, resulting in some averaged or compromise driving force. We caution that the above justification for the model is somewhat limited by our use of a single set of residue weights

W

. Although the model is generalizable to multiple weight vectors, which could be interpreted as, e.g., quantifying the contributions of distinct antigen epitopes, more complex models would require larger experimental datasets, and are left to future studies. Possible modifications to the modeling approach are discussed in the Supporting Material.

2.2. Sequence Coordinate Notation

Let

\hat{S} \in M_{N_{s} \times N_{r}}

denote a multiple sequence alignment (MSA) matrix of

N_{s}

antigen sequences, with each row

S_{i}

of

\hat{S}

corresponding to a unique sequence i of length

N_{r}

, possibly with gaps to facilitate the MSA.

\hat{S}

is represented by the standard 20-amino-acid alphabet (A) and the gap symbol (

G \equiv^{'} -^{'}

).

To be able to perform standard mathematical operations on

\hat{S}

, we embed

\hat{S}

in an n-dimensional (

n D

) vector space, as follows. To each residue

s_{i j}

at position j in

S_{i}

, we assign a vector

x_{i j} \in R_{n} \equiv x (s_{i j})

(see Figure 1). A number of such embeddings (also called encodings [30]) can be found in the literature [30,31,32,33], which have been used in machine learning models of sequence–activity relationships [33,34]. We note that the inverse map

s (x)

for arbitrary

x

is not needed in our study, because we do not create new sequences. However, such an inverse could be defined as a suitable projection onto the set of allowed residue coordinates, e.g.,

s (x) = arg min_{s \in A \cup G} ∥ x - x (s) ∥,

(1)

as conducted in Ref. [34].

Previous studies have found that the choice of embedding can have significant effects on model quality [30]. In the present calculations, we found that the simple 3D embedding model of Grantham [31], which is based on amino acid physico-chemical properties, produces model fits that approximate IgG data to within the experimental error, with high Pearson correlation coefficients (≃0.8–0.9) (see Results). The Grantham model uses three amino acid properties that show strong individual correlations with evolutionary residue exchangeability, (1) the atomic mass ratio of noncarbon to carbon atoms in the side chain, (2) residue polarity, and (3) molecular volume. Euclidean distances between amino acids computed by treating the normalized numerical values for each residue as orthogonal components show a Spearman rank (anti) correlation of −0.765 with residue substitution frequencies [31]. (The correlation is negative because residues that show greater exchangeability are modeled to have a smaller separation distance.) For completeness, we also tried a higher-dimensional (5D) embedding as per Atchley et al. [32]. These authors used a more sophisticated approach, first manually selecting 54 out of 494 amino acid attributes, and applying factor analysis to compute five factors as linear combinations of the original attributes. The factor scores are analogous to displacements along principal components in principal component analysis (PCA). In our calculations, the Atchley encoding gave slightly inferior agreement between modeled and experiment IgG (see Results). For simplicity, all gap positions in

\hat{S}

are assigned the zero vector; i.e., we did not attempt to optimize the MSA representation of amino acid deletions. In the remainder of this section, we use the 3D embedded coordinates

x_{i j}^{k}

with k = 1,2,3 to represent antigen residues in the MSA.

2.3. Distance to Average Strain IgG Model

Let

\hat{X}

be the embedded sequence alignment as described above. We assume that a subset of v strains

V \equiv {i_{1}, i_{2}, . . . i_{v}}

from

\hat{S}

has been chosen as the vaccination cocktail

\hat{V}

, and let

\bar{V}

be the average cocktail sequence. We use a simple average, but a weighted average could also be used, e.g., if the strains in the cocktail had different concentrations. We also note that additional (as yet unknown) strains can readily be added to the alignment a posteriori, i.e., after the model has been parametrized.

For any (test) sequence

T \in \hat{X}

, the IgG titer against

T

elicited by the cocktail

\hat{V}

is modeled as

I g G^{M} (\hat{V}, T) \equiv f (d (\bar{V}, T; W)),

(2)

where d is the weighted Euclidean distance between

\bar{V}

and

T

with weights

W \in M_{1 \times N_{r}}

(specified below), i.e.,

\begin{matrix} d (\bar{V}, T; W) & = {\{\sum_{j = 1}^{N_{r}} w_{j}^{2} {∥ t_{j} - {\bar{v}}_{j} ∥}^{2}\}}^{1 / 2} \\ = {\{\sum_{j = 1}^{N_{r}} w_{j}^{2} \sum_{k = 1}^{n} {(t_{j}^{k} - {\bar{v}}_{j}^{k})}^{2}\}}^{1 / 2} \end{matrix}

(3)

and

f (d; α, β) = \frac{1}{α + d^{β}}

(4)

is the sequence similarity function, with parameters

α

and

β

to be optimized below. Equation (2) assumes that IgG titers are determined by the similarity between the test strain and the average vaccination strain in the cocktail. The residue weights

W

, as well as

α

and

β

, are optimized by fitting to experimental IgG titers, as will be described at the end of this section. The reason for the rather ad hoc functional form of f is that it is one of the simplest ansätze that embodies an inverse relationship between distance and similarity, with sufficient flexibility for fitting through the coefficients

α

and

β

. While alternative functional forms are possible, e.g.,

f = α exp (- β d)

, and could be explored in the future, the fact that the majority of our modeled titers are within experimental uncertainty (see Results) suggests that the present choice is sufficient.

2.4. Model Parameter Optimization

Fitting the model to experimental data involves optimization of the parameters

α

,

β

and the residue weights

W

. Since the 2D parameter landscape of (

α

,

β

) was discovered to have multiple local minima, we performed a 2D scan of the parameter space to find optimal values (see Figure 2A). However, optimization of residue weights

W

in this fashion would be infeasible because of the large number of residues (∼550 for the influenza hemagglutinin MSA). Therefore, the weights were optimized using the steepest gradient descent, as follows.

Let

I g G^{E} (\hat{V}, T)

denote the experimental ELISA titer against test antigen

T

obtained using sera from animals vaccinated with cocktail

\hat{V}

[20,21], and let

δ I g G^{E} (\hat{V}, T)

denote the corresponding experimental uncertainty. The weighted squared error (WSE) of the antigen test set

\hat{T}

is defined as

ϵ_{\hat{V}, \hat{T}}^{2} = \sum_{T \in \hat{T}} {[\frac{I g G^{M} (\hat{V}, T) - I g G^{E} (\hat{V}, T)}{δ I g G^{E} (\hat{V}, T)}]}^{2},

(5)

where the dependence on

W

enters via Equation (2), and the subscripts on

ϵ^{2}

make explicit the dependence on

\hat{V}

and

\hat{T}

, which will be omitted below for brevity. Starting from uniform initial weights

W

= 0.1, and values for (

α

,

β

) corresponding to the current position in the 2D scan (discussed further below), the weights are updated according to

\begin{matrix} W^{*} = W^{n} - τ \nabla_{W} ϵ^{2} + D \nabla^{2} W^{n}, \\ W^{n + 1} = max {W^{*}, 0}, \end{matrix}

(6)

where

τ

is the steepest descent step,

\nabla_{W} (\cdot)

denotes gradients with respect to

W

,

\nabla^{2} W^{n}

is the discrete analog of the 1D Laplacian of

W^{n}

with respect to the residue position in the MSA, and D is an empirically selected diffusion constant. In the above, the diffusion term is added as a type of regularization, to enforce smoothness of

W

as a function of residue position. This approach reduces overfitting to experimental data, and also has a physical motivation: we can interpret the residues with high weights as those that contribute to antibody binding. The diffusion regularization term ensures that residues that are close to each other do not make vastly different contributions to binding. The sensitivity of the fits to the value of D is relatively minor, as will be shown below. The discrete Laplacian in our computer implementation of Equation (6) is

\nabla^{2} w_{i}^{n} \equiv \{\begin{matrix} w_{i - 1}^{n} - 2 w_{i}^{n} + w_{i + 1}^{n}, & i = 2, \dots, N_{r} - 1 \\ 0, & i = 1, N_{r}, \end{matrix}

(7)

which is equivalent to the finite difference of the second derivative computed to second order. The last line of Equation (6) ensures that

W

are non-negative at every step.

The expression for the gradient

\nabla_{W} ϵ^{2}

is obtained from Equations (2)–(4), i.e.,

\begin{matrix} \frac{\partial ϵ^{2}}{\partial w_{k}} & = \sum_{T \in \hat{T}} [\frac{I g G^{M} (\hat{V}, T) - I g G^{E} (\hat{V}, T)}{δ I g G^{E} {(\hat{V}, T)}^{2}}] \\ \times 4 f^{'} (d^{2}) w_{k} ∥ {\bar{v}}_{k} - t_{k} ∥, \end{matrix}

(8)

where we wrote f as a function of

d^{2}

rather than d to avoid the singularity of

\nabla_{W} d

at d = 0.

The above error minimization equations are written under the assumption of a particular vaccination cocktail

\hat{V}

. However, we optimized a single set of weights for all of the vaccination cocktails in Ref. [21] or Ref. [20] simultaneously. In that case, the total squared error (and its gradients) are the sums of those corresponding to the individual vaccinations.

The results of optimizing the similarity function parameters (

α

,

β

) in Equation (4) are illustrated in Figure 2 for fitting to influenza data [20]. Figure 2A shows a 2D discrete scan of the parameter space

(α, β) \in {0, 0.1, 0.2, \dots, 2} \times {0, 0.2, 0.4, \dots, 8}

. For each grid point, the weights

W

were reoptimized according to Equation (6) starting from

W^{0}

= 0.1, and iterating for

n_{m a x}

= 3000 steps with

τ

= 0.03, diffusion constant D = 0.4, and the mean squared error of the fit computed from Equation (5) and averaged. The scan shows a broad, irregularly shaped minimum for

α ≃

1,

β ≳

3); however, for

β ≳

5, the optimization landscape shows oscillatory behavior, suggesting that the model becomes sensitive to parameters, or that the gradient descent step could be too large in this region. However, the 1D plot through the minimum at each

β

value (Figure 2B) shows that the global minimum lies within the region

β \in (3, 5)

, and suggests that further parameter space optimization or exploration are not necessary. We repeated the optimization for multiple values of D (Figure 2B) and observed that the error minimum and location are not very sensitive to the choice of D; however, for

D ≳ 0.5

, we observed increased ruggedness, as can be seen for D = 0.5 (black line in Figure 2B). For the model calculations in Results, we used the value D = 0.49, which results in a relatively smooth distribution of the residue weights, as shown in the Results (e.g., Figure S4). We also noticed that the weights optimization could be performed with fewer iterations and at higher

τ

, with values up to 0.04, if the optimized weights were taken to be those that minimize the error over the entire optimization history (not just the final

W^{n}

in Equation (6)), i.e.,

W_{o p t} = arg min_{0 \leq n \leq n_{m a x}} ϵ^{2} (W^{n}) .

(9)

Parameter optimization for coronavirus data was very similar. The number of weights

W

to optimize was equal to the number of residue positions in the underlying MSA, which are 549 and 273 for influenza and coronavirus, respectively, subject to the constraints of non-negativity and diffusional regularization described above.

The parameters used to obtain the data in Results are given in Table 1. Calculations were performed on x86_64 AMD computers running Linux using Matlab [35] and GNU Octave [36], which is an open source program syntactically compatible with Matlab. A typical calculation to computed optimized weights and titer fits took about a minute on a single processor. Discrete scans of the (

α

,

β

) landscape required 800–1000 such calculations, taking about 14–16 h. Molecular structures for the visualization of weights were rendered using Visual Molecular Dynamics [37].

2.5. Principal Component Analysis

To visualize the landscape of influenza hemagglutinin (HA) sequences, and to aid in the selection of hypothetical vaccine strains, we used principal component analysis (PCA) in embedded coordinates

\hat{X}

defined above. Briefly, sequences of avian, swine, and human influenza type A HA proteins spanning the years 1918–2019 and subtypes 1–18 were downloaded from the NIH influenza research database [38], clustered, and sampled so that no two retained sequences were more than 97% identical. The sequences were multiply aligned, and embedded in the 5D vector space of Atchley et al. [32], which was found to give a good visual separation of the HA subtypes in the 2D landscape (see Results).

The covariance matrix of sequence differences was computed as

C_{j k}^{p q} = 〈 (x_{i j}^{p} - {〈 x_{i j}^{p} 〉}_{i}) 〈 {(x_{i k}^{q} - {〈 x_{i k}^{q} 〉}_{i} 〉}_{i}

(10)

where the angle brackets represent averages over sequences (rows in the MSA), which are indexed by i; j and k denote residue positions, and p and q denote embedded coordinate components.

The covariance matrix was diagonalized to yield the diagonal eigenvalue (EV) matrix

Λ

and eigenvectors (principal components; PCs) U,

C = U Λ U^{- 1} .

(11)

The coordinate projection of any sequence s onto any PC vector, e.g.,

{P C}_{l}^{q}

is computed as

P C_{l}^{q} = \sum_{j = 1}^{N_{r e s}} \sum_{p = 1}^{5} U_{l j}^{q} (x {(s)}_{j}^{p} - {〈 x_{i j}^{p} 〉}_{i}),

(12)

where

x (s)

are the embedded coordinates of s after it has been added to the MSA. The double index on

P C_{l}^{q}

is retained for notational convenience; in practice, we sort the PCs in the order of decreasing EVs and plot projections onto the two PCs corresponding to the highest EVs. The PCA was implemented in Matlab [35].

3. Results

First, we show that the model described in Methods reproduces measurements of post-vaccination mouse sera binding to panels of influenza [20] and SARS-Cov-2 [21] antigens to within experimental error. The virus strains and vaccine compositions are given in Table 2 and Table 3, for flu and coronavirus, respectively. Alignments of the antigen sequences and pairwise antigen similarity matrices are provided in supporting text, and in Figures S10 and S11.

Our focus is on mosaic nanoparticles (nps)—those that display the same protein antigen but from different strains—because such vaccines appear to elicit sera enriched in bnAbs [18,19,21,28] (see Figure 3 for an illustration of the nanoparticles used in the vaccinations). Fits to the titers for mixtures of homogeneous influenza nps are shown in the Supporting Material. Model performance is summarized in Table 4 and Table 5 for influenza and coronavirus, respectively.

3.1. Influenza Results

In Figure 4 we show the model results parametrized and tested using the the influenza vaccination data. Panels A and B show the optimized fits of the corresponding model to the IgG titers measured as area-under-curve (AUC) 21 days after vaccinating mice with mosaic nps [20].

Overall, the model shows a good fit to the average IgG titers (rmse = 0.09,

c_{P}

= 0.93,

c_{S}

= 0.94), where rmse is the root-mean-squared error, and

c_{P}

and

c_{S}

are the Pearson correlation and Spearman rank coefficients, respectively. Notably, the rmse of the corresponding experimental IgG error was 0.23, which was higher than that for the fit. The experimental error bars correspond to the standard deviations (

σ

) of the titer measurements; because there were four measurements of each titer, they also correspond to 2

σ

of the mean IgG, or the 95% confidence range. Panels A and B in Figure 4 show that all of the modeled titers for both models are within the corresponding error range.

To evaluate the predictive ability of the model, we performed the fitting procedure using three strategies. First, we carried out the fitting (or training) procedure with half of the strains (five), and used the remaining five strains to evaluate the fits. To eliminate the training/test set selection bias, we considered all

(\binom{10}{5})

= 252 possible strain partitions. Panel C of Figure 4 shows scatter plots of the Pearson and Spearman correlations computed over the training vs. test sets, and the correlation statistics are summarized in Table 4 (section I). The correlation for the training set (

c_{P} \sim

0.91) was higher than that for the test set (

c_{P} \sim

0.84). This was expected, because the training (but not the test) set was used in the optimization/fitting. However, because the correlations were rather high even for the test set, it appears that overfitting to the training data was minor. Further, comparing the distributions of the correlation coefficients in terms of the

σ

, the differences were not significant even to within 1

σ

or 68% confidence.

Second, we performed the fitting procedure using data from two of the four vaccine titers, corresponding to

(\binom{4}{2})

= 6 possible partitions. The results are summarized in Table 4 (section II). They are similar to those for the previous case, with the main difference being higher statistical errors in the correlation coefficients, as can also be seen from scatter plots of the Pearson and Spearman correlations in Figure S5.

In the third check, we removed the titers corresponding to vaccine V8 from the experimental dataset, and trained the models using 70% of the remaining data (i.e., 7 out of 10 strains). Taking 70%, rather than 50%, of the remaining data implies that the fraction of the training set relative to the size of the entire experimental sample is 0.7 × 0.75 = 0.525, which is roughly consistent with the 50% used in the first two cases above. The V8 vaccine titers were chosen as a "holdout" set because the V8 vaccine has some component strains that were not present in each of the remaining vaccines (Table 2), making it a nontrivial test case of a possible novel vaccine. Overall, the correlation coefficients were comparable to those from the previous two tests, the main difference being somewhat higher p-values, indicating lower statistical significance (see column 6 in Table 4). On average, the p-values correspond to greater than 95% significance, but the model parametrized using the Atchley encoding produced instances of particularly low statistical significance (the highest p-value of 0.77, which is why we favor using the Grantham encoding). The low-significance values were present despite the high Pearson correlation coefficient for the training set (≥0.9), suggesting that, in these cases, the models were overfit to the training data. To generate a single computational prediction for each of the measurements in the holdout set without model selection bias, the computed titers from all of the models corresponding to the different train/test partitions were averaged prior to computing correlations in Table 4 (holdout columns). These predictions generally produced somewhat lower correlation coefficients, in the range 0.63–0.89 with p-values ≤ 0.053 (see columns 7 and 8 in Table 4), still indicating satisfactory performance. Scatter plots corresponding to Table 4 section III are provided in Figure S8.

Overall, from Figure 4 panels A-C and Table 4, we can conclude that the model recapitulates the IgG titer data from the influenza nanoparticle vaccinations of Cohen et al. [20] with reasonable accuracy; the model gave better results for the mosaic vaccine dataset with the Grantham [31] encoding.

As described in Methods, a feature of the present model is that the residue weights

W

are interpretable as reflecting contributions to antibody binding. For example, we hypothesize that residues with high weights are involved in antibody/antigen binding. If true, the present models could aid in the identification of antigenic epitopes. To investigate this possibility, in panel D of Figure 4 we show the structure of an H1 hemagglutinin (HA) monomer [39] colored by residue weights; the weights are also plotted as a function of residue position in Figure S4. In the figures, the highest weights correspond to the upper portion of the HA stem, which is known to be targeted by several bnAbs [12]. In addition, the figure shows a residue patch along the top inner portion of the HA head, which is in the vicinity of the occluded epitope found by Watanabe et al. [40]. However, we note that the portion of the HA stem corresponding to the HA2 subunit does not contribute to binding in the above models, as the residue weights are zero in that region (see Figure S4A). Thus, the model may be too simple or approximate to identify all relevant binding epitopes. We note that, to actually map the epitopes to which the vaccination sera react would require antibody competition or epitope mutagenesis experiments. As the IgG model is nonlinear, we cannot directly separate the total IgG titer into relative contributions from the HA head and the HA stem. However, we can alternately set the residue weights in the head or the stem region to zero, renormalize the weights (to approximately preserve the total titer), and recompute the modeled titers. The results are shown in Figure 4A, with unfilled circles and squares corresponding to head-only and stem-only weights. The modeled values are generally in worse agreement with the experiment, demonstrating that residue weights in both the head and the stem region are important. The overall rmse between the modeled and experimental results in both cases increased by ∼50%, being somewhat larger in the head-only case.

3.2. Coronavirus Results

Figure 5 shows the model results parametrized and tested using coronavirus vaccination IgG titers measured as AUC 14 days after vaccination with mosaic nps [21]. The main difference between these data and the influenza data is that model produces a somewhat worse fit (rmse = 0.19,

c_{P}

= 0.81,

c_{S}

= 0.80), which can be seen in panels A and B of Figure 5. Nonetheless, most of the modeled values in Figure 5A,B are within the experimental error bar. The experimental rmse in the titers, with ten replicates per value, was

0.36

[21]; the 95% confidence limit for the average value would correspond to a reduction in the error bar by a factor of only

2 / \sqrt{10} ≃ 0.63

, giving 0.23, which is comparable to the rmse for the model. Quantitatively, we evaluated the model performance using three strategies, as described for influenza. First, we compared the distributions of the correlations obtained when the titer data were split into a training set with five antigen strains, and the remaining four strains were used for testing. There were

(\binom{9}{5})

=126 such set pairs, corresponding to the scatter plot in Figure 5 panel C.

As before, the correlations for the training sets (

c_{P} \sim

0.86 and

c_{S} \sim

0.84) were higher than those for the test set (

c_{P} \sim

0.59 and

c_{S} \sim

0.56, respectively). These differences were somewhat higher that those for the influenza data, suggesting that there was more overfitting to the training set for the coronavirus data. Although, on the average, the p-value for the Pearson correlation coefficient was 0.039, the maximum value was 0.25, indicating that some correlation values were not very significant (see Table 5 section I). This was caused, in part, by the small size of the dataset, which had only 36 IgG values in total (Figure 5).

Next, we split the experimental data along vaccine groups, choosing two groups for training and two for testing, for a total of six possible splits, as was performed for the influenza data. The results were similar to the first test above (

c_{P}^{t e s t} \sim

0.65), and the maximum p-value was somewhat lower (0.038; Table 5 section II). The correlation scatter plots are shown in Figure S6.

Finally, we evaluated the model on three-way data splits, in which all IgG titers corresponding to vaccine V8 were set aside as a holdout set, and two-thirds of the remaining data (six out of nine antigen strains) were used for training. Using two-thirds corresponded to 50% of the total data, consistent with the above two checks and with the testing procedure for influenza. Applying the models to the holdout set and averaging the computed titers over the models, as performed above for influenza, resulted in the Pearson correlation of 0.8 (p-values ≤ 0.026; Table 5 section III). This result appears satisfactory, especially given the small amount of training data. The correlation scatter plots for this case are shown in Figure S9.

Although the somewhat greater degree of overfitting mentioned above suggests that the model does not perform as well as for the influenza np dataset (compare Figure 4 and Figure 5’s panels C), the

c^{t r a i n}

vs.

c^{t e s t}

scatter plots show that there are several model instances in which the correlation on both the training and test sets was around ∼0.8. Moreover, the fact that the average model titers over all of the trained models applied to the holdout set correlate with the experiment to ∼0.8 indicate that, even with the small size of the dataset, it is possible to obtain a satisfactory model.

It is noteworthy that a large discrepancy between the model and experiment was found for the IgG titers obtained after vaccination with only the SARS-Cov-2 receptor-binding domain (RBD) antigen, measured against the same antigen (Figure 5 panel A, first bar from the left). As the test strain and the vaccine strain are identical, one would expect to see strong binding between the vaccination sera and the antigen. However, some serum samples showed no detectable binding in the experiment [21], which caused the high error bar and the lower average titer, compared with vaccines V4A (green bars) and V8 (blue bars) (see Table 3 and Figure 5).

This result seems at odds with the main assumption made in our model, i.e., that sera reactivity increases with the similarity of the test strain to the vaccination strain(s) (Equation (2)), and, in part, explains the worse fit and prediction quality. Other strains for which the discrepancies between the model and experimental IgG values are large are RS4081, SHC014, and YUN11, especially when measured with respect to the sera of vaccine V4B (orange bars in Figure 5). However, none of those strains are components of V4B (see Table 3). Therefore, the titer differences were, in principle, justifiable as modeling errors due to strain mismatch, rather than only due to non-binding experimental samples. In contrast, for the SARS-2 case, the match between the vaccine and test strains was 100%, which is why a low titer appeared unphysical. Although it might be tempting to consider the non-binding samples as erroneous or as outliers, we do not have sufficient evidence to do so; e.g., multiple different vaccination sera samples had very low titers, which cause high error bars [21]. Unfortunately, there are no experimental data that could provide a mechanistic explanation for the low binding titers to the vaccinating antigen in the SARS-2 group. It seems possible that different animals developed sufficiently different antibody profiles to the vaccinating antigen to explain the differences, but discriminating between a multi-epitope model and the present model would require an experimental dataset with higher precision.

We also investigated whether the anomalous behavior for SARS-2 described above could be responsible for the larger difference between the correlations computed over the training vs. test sets than was observed for influenza (compare panel C of Figure 4 with panel C of Figure 5), i.e., overfitting to the training set. To achieve this, we found the

(\binom{8}{4})

= 70 of the

(\binom{9}{5})

= 126 training/test partitions that contain the SARS-2 strain in the training set and recomputed the average in the correlations using only those strains. Figure 5C (blue symbols; see figure caption) shows that including the SARS-2 titers in the training data slightly lowered

c_{P}

for the training sets and increased it for the test set, which indicates reduced overfitting to training data. We interpret this finding as an additional indication that the IgG titers measured against the SARS-2 strain behave differently from the other titers (e.g., that the sequence similarity is not the main titer determinant). It can also be interpreted as underscoring the importance of a diverse training set in reducing overfitting.

Finally, in Figure 5D, we colored the structure of the SARS-2 RBD by residue weights from the model; the weights are also plotted as a function of residue position in Figure S4B. The residues that make the highest contributions to the antibody binding are not at the tip of the RBD, which contacts the angiotensin (ACE2) receptor prior to target cell entry, but near the middle region. According to the X-ray and Cryo-EM structures of Barnes et al. [42], this part of the RBD contains epitopes which bind class 3 neutralizing antibodies, such as Regeneron REGN10987 [43] bnAb. Thus, the weights suggest that the vaccines elicit antibodies that target the more conserved parts of the RBD, in accord with the vaccine design [21]. However, as is the case for influenza, epitope mapping experiments using, e.g., competitive binding or mutational analysis, would be needed to test the epitope predictions.

3.3. mRNA Influenza Vaccine

Although we developed our "average-strain" model to interpret vaccination data obtained from mosaic nanoparticle vaccines, which are hypothesized to improve elicitation of bnAbs via antibody bivalency [18,19,21,28], it is also useful to test the applicability of the model to other vaccine formulations. Further, Ives et al. [17] that the adaptive response can be targeted toward conserved immunosubdominant HA epitopes by manipulating antigen concentration in a cocktail of soluble HA ectodomains. Thus, we reparametrized the model using the mRNA vaccination data of Arevalo et al. [44], who vaccinated mice with a cocktail of mRNA lipid nanopartices encoding 20 different HAs corresponding to all known influenza subtypes. As controls, the authors also vaccinated mice with mRNA containing only H1 or H3 hemagglutinins. The model results are compared with the experiment in Figure 6, and summarized in Table 6. The strains composing the vaccines are listed in Table S1. The best model fit to the experiment, which uses all available titers, had an rmse of 0.09, twice the rms experimental error bar of 0.045. Qualitatively, it can be seen from Figure 6A that some of the model values are outside the experimental error bar. The difference was particularly large for the 20 HA vaccine titers against the two type B HA strains (Figure 6A, left panel). This is not very surprising, because the vaccination cocktail contained 18 type A strains, but only 2 B strains. Therefore, the "average" strain in the model was heavily biased toward type A and away from type B, which in part explains the low titers. Further, the model appeared to "compensate" for the low B titers by somewhat overestimating titers to type A HAs, which was especially visible in Figure 6A (middle panel). Given that the experimental titers against type B HAs in the 20 HA sera were generally as high as those for the type A strains, the discrepancy suggests that the average strain model assumed here may not be appropriate for cocktails composed of very dissimilar strains, such as those from type A and B hemagglutinins. As a test, we recomputed the best fit to experimental IgGs without any of the experimental titers against the B strains, and found the abovementioned overestimation of titers against type A strains to be somewhat reduced (see Figure S3). This appears to confirm that the discrepancy was caused, in part, by modeling a cocktail of antigens that were too dissimilar. Notably, this was not a rigorous test, because, in the refitting, we simply ignored data for the two B antigens, which were present in the 20 HA vaccine that elicited the actual post-vaccination sera, i.e., we do not know how the model would perform if the cocktail did not contain the B antigens. Although we currently do not have a quantitative approach to determine the minimum appropriate similarity between different strains in a cocktail, on the basis of pairwise sequence alignment scores computed for the antigenic cocktails considered here, we suggest a minimum cutoff of 30% (using BLOSUM50 scoring). This suggestion arises from the observation that all pairwise similarities computed here are above 35%, except those involving an influenza type B strain and and a type A strain, which are at or below 20% (see Figures S10–S12).

Despite the abovementioned errors, the model values generally correlate well with the experiments, producing

c_{P}

and

c_{S}

coefficients at or above ∼0.9 (Figure 6A–C). A notable exception to this observation can be seen in Table 6 section II, in which we split the data into training/test sets along vaccine definitions. Specifically, we trained the model using two of the three available vaccine titer sets, and used the remaining one for testing. If the training test only includes vaccines H1 and H3, the test on the remaining (20 HA) dataset produces no correlation to experiment (see Figure S7). However, this should be a surprise, because good models can only be expected to recapitulate data that are well represented in the training set.

Finally, in Figure 6D, we colored the structure of an HA spike using the residue weights determined from the model fit. It can be seen that the weights are significant for the stem region, as well as for the head, qualitatively, but not quantitatively, similar to the nanoparticle vaccine case. A comparison between the np-vaccine and mRNA-vaccine weights is shown in Figure S4. Overall, the np-vaccine weights seemed to be more concentrated in fewer regions/epitopes, whereas the mRNA-vaccine weights were more spread out, perhaps suggesting a more heterogeneous population of Abs.

3.4. Application to Vaccine Efficacy Prediction

Our final result is an example application of the model to a comparison of several hypothetical vaccines with different antigen compositions. For model training, we used the np influenza data [20] described above in Section 3.1, which provide the model parameters

α

and

β

(see also Table 1, top row), and residue weights

W

. We chose four np vaccine compositions, which are given in Table S2, H1 homotypic, H3 homotypic, H1 and H3 mosaic, and 11 HA mosaic. The strains for the 11 HA mosaic vaccine were chosen to provide broad antigenic coverage within the influenza landscape, illustrated using principal component analysis (PCA) in Figure 7. For the convenience of a future experimental investigation, the chosen HA sequences all have available crystal structures, which would facilitate structural analysis or antigen engineering. These strains correspond to the set

\hat{V}

in Section 2.3, and are listed in Table S2. For the test set (

T \in \hat{T}

), we chose a 36-antigen panel of Ives et al. [17], spanning multiple influenza subtypes over the years 1918–2014 (Table S2). The vaccination and test sets were added to the original MSA (

\hat{X}

) for the model, which associates the model residue weights with the new sequences. The model IgG titers were computed from Equations (2)–(4), and the results are compared in Figure 8. The figure illustrates several expected outcomes, e.g., that the H1 homotypic vaccine does not elicit significant Ab titers against group II HAs H3 and H7. Likewise, the H3 homotypic vaccine elicits predominantly anti-H3 titers. The 11 HA cocktail vaccine appears to elicit the greatest breadth, but at the expense of a lower anti-H3 response, relative to the H1 and H3 combination. These model predictions would benefit from future experimental validation. Since the predictions were generated with the np influenza parametrization, the experimental validation procedure should follow the mouse vaccination protocol of Cohen et al. [20]. Briefly, four groups of 4–6-week-old female Balb/c mice, with four or more mice per group, could be vaccinated with the nanoparticle vaccines proposed above; two additional groups could serve as positive (e.g., seasonal quadrivalent vaccine) and negative (e.g., PBS solution) controls. The mice would be primed on day 0 and boosted on day 14 via intraperitoneal injections of 20

μ

g of antigen in 200

μ

L of 50% v/v of Sigma Adjuvant System. ELISA assays would be carried out on sera collected on day 21, using the antigen test panel in Table S2, and the determined area-under-curve (AUC) signals correlated with the present predictions.

4. Concluding Discussion

Viral escape poses a major challenge for the development of vaccines for highly mutable pathogens such as HIV, influenza, and coronavirus, because rapidly emerging pathogen variants render host immunity raised by administered vaccines obsolete. Thus, it is important to design vaccines that elicit broadly reactive sera able to counteract a wide variety of pathogen mutants, especially those not present in the vaccines. Toward this goal, we parametrized a model of reactivity of the sera obtained from vaccinated naïve mice with influenza or coronavirus nanoparticle vaccines [20,21]. The model was able to recapitulate the experimental Ab titers to within experimental uncertainty, and produced Pearson and Spearman correlation coefficients in the range 0.6–0.9, depending on the dataset used. Notably, the model parametrization procedure required only a modest amount of data (e.g., 20 titer values used for parametrization/training and 16–20 for testing). However, additional experimental data would improve model quality, e.g., by reducing experimental uncertainty in the average titers (using more test subjects), by incorporating more strains into vaccine cocktails or testing panels, or by allowing the use of more complex models with additional parameters, such as multiple residue weights to represent different epitopes. In principle, sophisticated neural networks could be trained to model vaccine efficacy, such as those used to predict properties of small molecules [45,46], or antibody structures [47]. However, such models typically require very large amounts of data (thousands to millions of data points) to achieve high accuracy. Such data may be costly, even impossible, to obtain in vaccination experiments and would probably not be transferable between vaccine types or vaccination protocols (a limitation also shared by our approach). Although the present model yields good results with few data points, using it required significant human input, e.g., choice of the functional form of the similarity function (Equation (4)) or amino acid encoding. For example, using the Atchley et al. [32] encoding for the influenza vaccines yields a somewhat lower quality fit to the vaccine titers, with several model points outside of the experimental error bar (see Supplementary Figure S1B), and the distribution of residue weights was high for a portion of the hemagglutinin stem that is expected not to be exposed to antibodies in a trimer (Figure S1D). For these reasons, and with the additional justification that the Grantham [31] encoding appears to be more grounded in the physical properties of amino acids, we chose the Grantham encoding, rather than the more empirical fit of Atchley et al. [32].

The present model was developed to interpret vaccination data obtained from mosaic nanoparticle vaccines. The main reason for this choice is that such vaccines have been shown to direct the immune response to more conserved immunosubdominant epitopes [18,19,21,28], which are associated with eliciting broadly neutralizing antibodies (bnAbs) [40,48]. The underlying reason for the greater breadth associated with such vaccines appears to be antibody avidity (see Figure 3C). Specifically, because antibodies and their membrane-bound counterparts—BCRs—have two fragments for antigen-binding (Fabs), and because antigen binding translates into a proliferation signal for the corresponding B cells during antibody evolution [49], mosaic nanoparticle vaccines are expected to elicit antibodies directed to more conserved parts on the antigen [18].

Since the model assumes that the maturing antibodies see an “average” strain, the encouraging results obtained here suggest that the vaccination sera contain bnAbs that are cross-reactive with multiple antigen strains. However, to conclude this with confidence would likely require evaluating fits to alternative models (see, e.g., the Supporting Material), and showing that the present model produces superior results. This may not be possible with the presently available experimental datasets, which contain large statistical errors.

The ability to predict vaccine efficacy against an arbitrary pathogenic variant is clearly useful. However, because the space of possible amino acid mutations is astronomical (even if not all 20 natural amino acids are allowed at all sequence positions), it is generally not possible to predict which specific strains will emerge in the future. Nevertheless, the effects of viral mutations can be partially characterized by high-throughput mutagenesis experiments [7] to construct viral fitness landscapes, from which the most fit mutant sequences can be tested using models such as those presented here. Alternatively/additionally, one could use such models to maximize antibody breadth over a space of candidate vaccine designs (see e.g., Ref. [50]) relative to an established panel of antigen variants [51].

Although the results described here are encouraging, it is important to discuss the limitations of the modeling approach. First, the model parametrizations described here are very unlikely to be directly transferable between different vaccine formulations because of a multitude of complex factors that determine the overall immune response, which are not explicitly represented in the models. Examples of such factors are nanoparticle vs. soluble ectodomain presentation, route of administration, choice of adjuvant, or choice of boosting protocol. This can already be seen here by comparing the model parameters obtained from fitting the model to nanoparticle vs. mRNA vaccination data (Figure 4 and Figure 6, respectively). We expect the models to be somewhat general only within a particular class of vaccine formulation, i.e., after the factors such as those mentioned above are decided upon and fixed, so that the remaining differences between vaccines are dominated by antigen sequence variability. Even in these restricted scenarios, the models will likely need to be reparametrized. Fortunately, this is a straightforward task, given the simplicity of Equations (2)–(4), and is not expected to require excessive experimentation. Further, if sufficient data are available for training, model sophistication can readily be improved. For example, an assumption that is made in our modeling is that a single weight array

W

can adequately describe the binding between the antibodies and the antigens. This assumption implies that all of the modeled vaccines are sufficiently similar so as to elicit antibodies to the same antigen epitopes, which is not true, in general. Thus, a simple generalization of the present model is to include multiple weight arrays, which can be achieved, e.g., by replacing overall similarity function by a sum of constituent functions, e.g., for two weight arrays,

W_{1}

and

W_{2}

,

f_{t o t a l} = f (d_{1}; α_{1}, β_{1}) + f (d_{2}; α_{2}, β_{2}),

(13)

where

d_{i}

is as in Equation (3), with

W_{i}

in place of

W

. The fitting/training of such models would be very similar to the procedure described here. Although an advantage of the present modeling is that structural information about neither the antigens nor the antibodies is required, recent advances in structure prediction methods, e.g., AlphaFold3 [52], RoseTTAFold [53], ProtTrans [54], or ESM-2 [55], could be used to predict antigen structures in solution or in complex with antibodies. Such structures could provide information on epitope accessibility or immunogenicity, which could be incorporated into a future version of the model via additional or alternative epitope weights.

Finally, we do not explicitly take into account any preexisting immunity, or original antigenic sin (OAS), which is known to skew antibody responses toward previously encountered strains [56]. In principle, OAS could be incorporated in the model by including pre-vaccination titers to determine the direction of the skewness in antigen-experienced subjects.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/antib14010006/s1. Supporting text discussing alternative models, sequence alignments of antigenic sequences, Figures S1–S12 and Tables S1 and S2 accompany this article.

Author Contributions

Conceptualization, V.O. and M.K.; methodology, V.O. and M.K.; software, V.O.; validation, V.O. and M.K.; formal analysis, V.O. and M.K.; investigation, V.O. and M.K.; resources, V.O. and M.K.; data curation, V.O. and M.K.; writing—original draft preparation, V.O. and M.K.; writing—review and editing, V.O. and M.K.; visualization, V.O. and M.K.; supervision, V.O. and M.K.; project administration, V.O. and M.K.; funding acquisition, V.O. and M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Bill & Melinda Gates Foundation and Flu Lab grant opportunity OPP1214161.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The computer code used for the model in this paper, including the experimental dataset used for training, and partial results, can be found online at https://github.com/ovchinnv/npvaccine-titer-model2022.git, accessed on 29 November 2024.

Acknowledgments

The authors are grateful to Simone Conti and Aravinda Munasinghe for helpful discussions, to Alex Cohen and Pamela Björkman for providing antigen binding data, and to the Bill & Melinda Gates Foundation and Flu Lab for financial support under grant opportunity OPP1214161. The findings and conclusions contained within are those of the authors and do not necessarily reflect positions or policies of the Bill & Melinda Gates Foundation. Computer resources were provided by the National Energy Resource Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and by Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Erbelding, E.; Post, D.; Stemmy, E.; Roberts, P.; Augustine, A.; Ferguson, S.; Paules, C.; Graham, B.; Fauci, A. A Universal Influenza Vaccine: The Strategic Plan for the National Institute of Allergy and Infectious Diseases. J. Infect. Dis. 2018, 218, 347–354. [Google Scholar] [CrossRef] [PubMed]
Chung, J.R.; Kim, S.S.; Kondor, R.J.; Smith, C.; Budd, A.P.; Tartof, S.Y.; Florea, A.; Talbot, H.K.; Grijalva, C.G.; Wernli, K.J. Interim Estimates of 2021–22 Seasonal Influenza Vaccine Effectiveness–United States, February 2022. MMWR Morb. Mortal. Wkly. Rep. 2022, 71, 365–370. [Google Scholar] [CrossRef]
Achaiah, N.C.; Subbarajasetty, S.B.; Shetty, R.M. R0 and Re of COVID-19: Can We Predict When the Pandemic Outbreak Will Be Contained? Indian J. Crit. Care Med. 2020, 24, 1125–1127. [Google Scholar] [CrossRef]
Subbarao, K. The Success of SARS-CoV-2 Vaccines and Challenges Ahead. Cell Host Microbe 2021, 29, 1111–1123. [Google Scholar] [CrossRef]
Corbett, K.S.; Edwards, D.K.; Leist, S.R.; Abiona, O.M.; Boyoglu-Barnum, S.; Gillespie, R.A.; Himansu, S.; Schäfer, A.; Ziwawo, C.T.; DiPiazza, A.T.; et al. SARS-CoV-2 MRNa Vaccine Design Enabled by Prototype Pathogen Preparedness. Nature 2020, 586, 567–571. [Google Scholar] [CrossRef]
Arunachalam, P.S.; Scott, M.K.; Hagan, T.; Li, C.; Feng, Y.; Wimmers, F.; Grigoryan, L.; Trisal, M.; Edara, V.V.; Lai, L.; et al. Systems Vaccinology of the BNT162b2 MRNa Vaccine in Humans. Nature 2021, 596, 410–416. [Google Scholar] [CrossRef]
Greaney, A.J.; Starr, T.N.; Gilchuk, P.; Zost, S.J.; Binshtein, E.; Loes, A.N.; Hilton, S.K.; Huddleston, J.; Eguia, R.; Crawford, K.H.; et al. Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain That Escape Antibody Recognition. Cell Host Microbe 2021, 29, 44–57.e9. [Google Scholar] [CrossRef]
Andrews, N.; Stowe, J.; Kirsebom, F.; Toffa, S.; Rickeard, T.; Gallagher, E.; Gower, C.; Kall, M.; Groves, N.; O’Connell, A.M.; et al. COVID-19 Vaccine Effectiveness Against the Omicron (B.1.1.529) Variant. N. Engl. J. Med. 2022, 386, 1532–1546. [Google Scholar] [CrossRef]
Liu, Y.; Rocklöv, J. The Reproductive Number of the Delta Variant of SARS-CoV-2 Is Far Higher Compared to the Ancestral SARS-CoV-2 Virus. J. Travel Med. 2021, 28, taab124. [Google Scholar] [CrossRef]
HIV AIDS Asia Pacific Research Statistical Data Information Resources AIDS Data Hub. UNAIDS Data Reference 2021. Available online: https://www.aidsdatahub.org/resource/unaids-data-2021 (accessed on 29 November 2024).
Burton, D.R.; Desrosiers, R.C.; Doms, R.W.; Feinberg, M.B.; Gallo, R.C.; Hahn, B.; Hoxie, J.A.; Hunter, E.; Korber, B.; Landay, A.; et al. A Sound Rationale Needed for Phase III HIV-1 Vaccine Trials. Science 2004, 303, 316. [Google Scholar] [CrossRef]
Corti, D.; Cameroni, E.; Guarino, B.; Kallewaard, N.; Zhu, Q.; Lanzavecchia, A. Tackling Influenza with Broadly Neutralizing Antibodies. Curr. Opin. Virol. 2017, 24, 60–69. [Google Scholar] [CrossRef] [PubMed]
Mu, Z.; Haynes, B.; Cain, D. Strategies for Eliciting Multiple Lineages of Broadly Neutralizing Antibodies to HIV by Vaccination. Curr. Opin. Virol. 2021, 51, 172–178. [Google Scholar] [CrossRef] [PubMed]
Ellebedy, A.; Krammer, F.; Li, G.; Miller, M.; Chiu, C.; Wrammert, J.; Chang, C.; Davis, C.; McCausland, M.; Elbein, R.; et al. Induction of Broadly Cross-Reactive Antibody Responses to the Influenza Ha Stem Region Following H5N1 Vaccination in Humans. Proc. Natl. Acad. Sci. USA 2014, 111, 13133–13138. [Google Scholar] [CrossRef]
Carter, D.; Darby, C.; Lefoley, B.; Crevar, C.; Alefantis, T.; Oomen, R.; Anderson, S.; Strugnell, T.; Cortés-Garcia, G.; Vogel, T.; et al. Design and Characterization of a Computationally Optimized Broadly Reactive Hemagglutinin Vaccine for H1N1 Influenza Viruses. J. Virol. 2016, 90, 4720–4734. [Google Scholar] [CrossRef]
Shaffer, J.; Moore, P.; Kardar, M.; Chakraborty, A. Optimal Immunization Cocktails Can Promote Induction of Broadly Neutralizing Abs Against Highly Mutable Pathogens. Proc. Natl. Acad. Sci. USA 2016, 113, E7039–E7048. [Google Scholar] [CrossRef]
Ives, S.; Bürckert, J.P.; Pettus, C.; Peñate, K.; Hovde, R.; Bayless, N.; Shanghavi, D.; Glanville, K.; Calgua, E.; Youssef, S.; et al. A General Solution to Broad-Spectrum Vaccine Design for Rapidly Mutating Viruses. Res. Sq. 2021. submitted. [Google Scholar] [CrossRef]
Kanekiyo, M.; Joyce, M.; Gillespie, R.; Gallagher, J.; Andrews, S.; Yassine, H.; Wheatley, A.; Fisher, B.; Ambrozak, D.; Creanga, A.; et al. Mosaic Nanoparticle Display of Diverse Influenza Virus Hemagglutinins Elicits Broad B Cell Responses. Nat. Immunol. 2019, 20, 362–372. [Google Scholar] [CrossRef]
Boyoglu-Barnum, S.; Hutchinson, G.; Boyington, J.; Moin, S.; Gillespie, R.; Tsybovsky, Y.; Stephens, T.; Vaile, J.; Lederhofer, J.; Corbett, K.; et al. Glycan Repositioning of Influenza Hemagglutinin Stem Facilitates the Elicitation of Protective Cross-Group Antibody Responses. Nat. Commun. 2020, 11, 791. [Google Scholar] [CrossRef]
Cohen, A.; Yang, Z.; Gnanapragasam, P.; Ou, S.; Dam, K.; Wang, H.; Bjorkman, P. Construction, Characterization, and Immunization of Nanoparticles That Display a Diverse Array of Influenza Ha Trimers. PLoS ONE 2021, 16, e0247963. [Google Scholar] [CrossRef]
Cohen, A.; Gnanapragasam, P.; Lee, Y.; Hoffman, P.; Ou, S.; Kakutani, L.; Keeffe, J.; Wu, H.; Howarth, M.; West, A.; et al. Mosaic Nanoparticles Elicit Cross-Reactive Immune Responses to Zoonotic Coronaviruses in Mice. Science 2021, 371, 735–741. [Google Scholar] [CrossRef]
Robert, P.; Marschall, A.; Meyer-Hermann, M. Induction of Broadly Neutralizing Antibodies in Germinal Centre Simulations. Curr. Opin. Biotechnol. 2018, 51, 137–145. [Google Scholar] [CrossRef]
Wang, S.; Mata-Fink, J.; Kriegsman, B.; Hanson, M.; Irvine, D.; Eisen, H.; Burton, D.; Wittrup, K.; Kardar, M.; Chakraborty, A. Manipulating the Selection Forces During Affinity Maturation to Generate Cross-Reactive HIV Antibodies. Cell 2015, 160, 785–797. [Google Scholar] [CrossRef] [PubMed]
Ovchinnikov, V.; Karplus, M. A Coarse-Grained Model of Affinity Maturation Indicates the Importance of B-Cell Receptor Avidity in Epitope Subdominance. Front. Immunol. 2022, 13, 816634. [Google Scholar] [CrossRef]
Wang, S. Optimal Sequential Immunization Can Focus Antibody Responses Against Diversity Loss and Distraction. PLoS Comput. Biol. 2017, 13, e1005336. [Google Scholar] [CrossRef]
Robert, P.; Arulraj, T.; Meyer-Hermann, M. Ymir: A 3D Structural Affinity Model for Multi-Epitope Vaccine Simulations. iScience 2021, 24, 102979. [Google Scholar] [CrossRef]
Conti, S.; Ovchinnikov, V.; Chakraborty, A.; Karplus, M.; Sprenger, K. Multiscale Affinity Maturation Simulations to Elicit Broadly Neutralizing Antibodies Against HIV. PLoS Comput. Biol. 2022, 18, e1009391. [Google Scholar] [CrossRef]
Corbett, K.; Moin, S.; Yassine, H.; Cagigi, A.; Kanekiyo, M.; Boyoglu-Barnum, S.; Myers, S.; Tsybovsky, Y.; Wheatley, A.; Schramm, C.; et al. Design of Nanoparticulate Group 2 Influenza Virus Hemagglutinin Stem Antigens That Activate Unmutated Ancestor B Cell Receptors of Broadly Neutralizing Antibody Lineages. mBio 2019, 10, 10-1128. [Google Scholar] [CrossRef]
Ovchinnikov, V.; Louveau, J.; Barton, J.; Karplus, M.; Chakraborty, A. Role of Framework Mutations and Antibody Flexibility in the Evolution of Broadly Neutralizing Antibodies. eLife 2018, 7, e33038. [Google Scholar] [CrossRef] [PubMed]
Nanni, L.; Lumini, A. A New Encoding Technique for Peptide Classification. Expert Syst. Appl. 2011, 38, 3185–3191. [Google Scholar] [CrossRef]
Grantham, R. Amino Acid Difference Formula to Help Explain Protein Evolution. Science 1974, 185, 862–864. [Google Scholar] [CrossRef]
Atchley, W.; Zhao, J.; Fernandes, A.; Drüke, T. Solving the Protein Sequence Metric Problem. Proc. Natl. Acad. Sci. USA 2005, 102, 6395–6400. [Google Scholar] [CrossRef] [PubMed]
Mei, H.; Liao, Z.; Zhou, Y.; Li, S. A New Set of Amino Acid Descriptors and Its Application in Peptide QSARs. Biopolymers 2005, 80, 775–786. [Google Scholar] [CrossRef] [PubMed]
Liu, G.; Zeng, H.; Mueller, J.; Carter, B.; Wang, Z.; Schilz, J.; Horny, G.; Birnbaum, M.; Ewert, S.; Gifford, D. Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning. Bioinformatics 2020, 36, 2126–2133. [Google Scholar] [CrossRef] [PubMed]
MATLAB. Version 7.10.0 (R2010a); The MathWorks Inc.: Natick, MA, USA, 2010. [Google Scholar]
Eaton, J.W.; Bateman, D.; Hauberg, S.; Wehbring, R. GNU Octave Version 6.3.0 Manual: A High-Level Interactive Language for Numerical Computations. 2020. Available online: https://docs.octave.org/octave-6.3.0.pdf (accessed on 29 November 2024).
Humphrey, W.; Dalke, A.; Schulten, K. VMD—Visual Molecular Dynamics. J. Mol. Graph. 1996, 14, 33–38. [Google Scholar] [CrossRef]
Bao, Y.; Bolotov, P.; Dernovoy, D.; Kiryutin, B.; Zaslavsky, L.; Tatusova, T.; Ostell, J.; Lipman, D. The Influenza Virus Resource at the National Center for Biotechnology Information. J. Virol. 2008, 82, 596–601. [Google Scholar] [CrossRef]
Xu, R.; Ekiert, D.; Krause, J.; Hai, R.; Crowe, J.; Wilson, I. Structural Basis of Preexisting Immunity to the 2009 H1N1 Pandemic Influenza Virus. Science 2010, 328, 357–360. [Google Scholar] [CrossRef]
Watanabe, A.; McCarthy, K.; Kuraoka, M.; Schmidt, A.; Adachi, Y.; Onodera, T.; Tonouchi, K.; Caradonna, T.; Bajic, G.; Song, S.; et al. Antibodies to a Conserved Influenza Head Interface Epitope Protect by an IgG Subtype-Dependent Mechanism. Cell 2019, 177, 1124–1135.e16. [Google Scholar] [CrossRef]
Walls, A.C.; Park, Y.J.; Tortorici, M.A.; Wall, A.; McGuire, A.T.; Veesler, D. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 2020, 181, 281–292.e6. [Google Scholar] [CrossRef]
Barnes, C.; West, A.; Huey-Tubman, K.; Hoffmann, M.; Sharaf, N.; Hoffman, P.; Koranda, N.; Gristick, H.; Gaebler, C.; Muecksch, F.; et al. Structures of Human Antibodies Bound to SARS-CoV-2 Spike Reveal Common Epitopes and Recurrent Features of Antibodies. Cell 2020, 182, 828–842. [Google Scholar] [CrossRef]
Hansen, J.; Baum, A.; Pascal, K.; Russo, V.; Giordano, S.; Wloga, E.; Fulton, B.; Yan, Y.; Koon, K.; Patel, K.; et al. Studies in Humanized Mice and Convalescent Humans Yield a SARS-CoV-2 Antibody Cocktail. Science 2020, 369, 1010–1014. [Google Scholar] [CrossRef]
Arevalo, C.; Bolton, M.; Le Sage, V.; Ye, N.; Furey, C.; Muramatsu, H.; Alameh, M.; Pardi, N.; Drapeau, E.; Parkhouse, K.; et al. A Multivalent Nucleoside-Modified MRNa Vaccine Against All Known Influenza Virus Subtypes. Science 2022, 378, 899–904. [Google Scholar] [CrossRef] [PubMed]
Jiménezm, J.; Škalič, M.; Martínez-Rosell, G.; De Fabritiis, G. K_DEEP: Protein–Ligand Binding Affinity Prediction Via 3D-Convolutional Neural Networks. J. Chem. Inf. Model. 2018, 58, 287–296. [Google Scholar] [CrossRef] [PubMed]
Feinberg, E.; Sur, D.; Wu, Z.; Husic, B.; Mai, H.; Li, Y.; Sun, S.; Yang, J.; Ramsundar, B.; Pande, V. PotentialNet for Molecular Property Prediction. ACS Cent. Sci. 2018, 4, 1520–1530. [Google Scholar] [CrossRef] [PubMed]
Ruffolo, J.; Sulam, J.; Gray, J. Antibody Structure Prediction Using Interpretable Deep Learning. Patterns 2022, 3, 100406. [Google Scholar] [CrossRef]
Nachbagauer, R.; Choi, A.; Hirsh, A.; Margine, I.; Iida, S.; Barrera, A.; Ferres, M.; Albrecht, R.; García-Sastre, A.; Bouvier, N.; et al. Defining the Antibody Cross-Reactome Directed Against the Influenza Virus Surface Glycoproteins. Nat. Immunol. 2017, 18, 464–473. [Google Scholar] [CrossRef]
Murphy, K. Janeway’s Immunobiology, 8th ed.; Garland Science: New York, NY, USA, 2012. [Google Scholar]
Conti, S.; Kaczorowski, K.; Song, G.; Porter, K.; Andrabi, R.; Burton, D.; Chakraborty, A.; Karplus, M. Design of Immunogens to Elicit Broadly Neutralizing Antibodies Against HIV Targeting the CD4 Binding Site. Proc. Natl. Acad. Sci. USA 2021, 118, e2018338118. [Google Scholar] [CrossRef]
Seaman, M.; Janes, H.; Hawkins, N.; Grandpre, L.; Devoy, C.; Giri, A.; Coffey, R.; Harris, L.; Wood, B.; Daniels, M.; et al. Tiered Categorization of a Diverse Panel of HIV-1 Env Pseudoviruses for Assessment of Neutralizing Antibodies. J. Virol. 2010, 84, 1439–1452. [Google Scholar] [CrossRef]
Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.; Bambrick, J.; et al. Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef]
Krishna, R.; Wang, J.; Ahern, W.; Sturmfels, P.; Venkatesh, P.; Kalvet, I.; Lee, G.; Morey-Burrows, F.; Anishchenko, I.; Humphreys, I.; et al. Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom. Science 2024, 384, eadl2528. [Google Scholar] [CrossRef]
Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112–7127. [Google Scholar] [CrossRef]
Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef]
Arevalo, C.; Le Sage, V.; Bolton, M.; Eilola, T.; Jones, J.; Kormuth, K.; Nturibi, E.; Balmaseda, A.; Gordon, A.; Lakdawala, S.; et al. Original Antigenic Sin Priming of Influenza Virus Hemagglutinin Stalk Antibodies. Proc. Natl. Acad. Sci. USA 2020, 117, 17221–17227. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Illustration of the amino acid encoding procedure [31]. (A): Multiple alignment

\hat{S}

of

N_{s}

sequences, each having

N_{r}

residues, including gaps, with

s_{i j} \in {A, C, D, E, F, G, H, I, K, L, M, N,

P, Q, R, S, T, V, W, Y,^{'} -^{'}}

. (B): Embedded alignment

\hat{X}

; each residue type in

\hat{S}

shown in A is associated with a 3-dimensional real-valued vector

x_{i j}

=

{x_{i j}^{1}, x_{i j}^{2}, x_{i j}^{3}}

, which is interpreted as Cartesian coordinates of the residue.

Figure 1. Illustration of the amino acid encoding procedure [31]. (A): Multiple alignment

\hat{S}

of

N_{s}

sequences, each having

N_{r}

residues, including gaps, with

s_{i j} \in {A, C, D, E, F, G, H, I, K, L, M, N,

P, Q, R, S, T, V, W, Y,^{'} -^{'}}

. (B): Embedded alignment

\hat{X}

; each residue type in

\hat{S}

shown in A is associated with a 3-dimensional real-valued vector

x_{i j}

=

{x_{i j}^{1}, x_{i j}^{2}, x_{i j}^{3}}

, which is interpreted as Cartesian coordinates of the residue.

Figure 2. Parameter optimization (

α

,

β

) in the strain similarity function Equation (4) applied to influenza data [20] using a diffusion constant D = 0.4. (A): A 2D scan of the (

α

,

β

) landscape; the white line corresponds to a cubic spline curve through the minimum values of the mean squared error (MSE) over the range of

α

at each

β

; (B): MSE corresponding to the white line in A plotted in 1D for several values of the diffusion constant D.

Figure 2. Parameter optimization (

α

,

β

) in the strain similarity function Equation (4) applied to influenza data [20] using a diffusion constant D = 0.4. (A): A 2D scan of the (

α

,

β

) landscape; the white line corresponds to a cubic spline curve through the minimum values of the mean squared error (MSE) over the range of

α

at each

β

; (B): MSE corresponding to the white line in A plotted in 1D for several values of the diffusion constant D.

Figure 3. Illustration of the nanoparticles used in the vaccinations modeled here [20,21]. The nanoparticles are drawn in black, and the antigens are in color, with different colors indicating different strains. (A): Nanoparticles corresponding to the mosaic vaccines V1, V2, V4, and V8 in Table 2. (B): Mosaic vaccine (top) vs. nanoparticle mixture vaccine (bottom). (C): Hypothetical elicitation of strain-specific Abs (blue Ab binding to blue antigens) vs. cross-reactive bnAbs (purple Ab binding to blue and red antigens) by the mosaic vaccine via different Fabs.

Figure 4. Comparison of model results with the experimental IgG titers for influenza [20]. (A,B): best fit to experiment; in (A), the colors correspond to experimental data for vaccines in Table 2 (red ■ V1, green ■ V2, orange ■ V4, blue ■ V8), and the black outer bars are model values. The black unfilled circles and squares correspond to model titers computed after setting to zero the residue weights in the HA stem and HA head, respectively (see text). (C): Comparison of 252 possible fits in which 5 strains were used for fitting (training) and 5 for testing (red ∘:

C_{P}

, green ▿:

C_{S}

). (D): Influenza hemagglutinin (PDB ID: 3LZG [39]) monomer colored by model weights using the color map blue-gray-red (corresponding to low-medium-high).

Figure 4. Comparison of model results with the experimental IgG titers for influenza [20]. (A,B): best fit to experiment; in (A), the colors correspond to experimental data for vaccines in Table 2 (red ■ V1, green ■ V2, orange ■ V4, blue ■ V8), and the black outer bars are model values. The black unfilled circles and squares correspond to model titers computed after setting to zero the residue weights in the HA stem and HA head, respectively (see text). (C): Comparison of 252 possible fits in which 5 strains were used for fitting (training) and 5 for testing (red ∘:

C_{P}

, green ▿:

C_{S}

). (D): Influenza hemagglutinin (PDB ID: 3LZG [39]) monomer colored by model weights using the color map blue-gray-red (corresponding to low-medium-high).

Figure 5. Comparison of Model 1 results with the experimental IgG titers for coronavirus [21]. (A,B): Best fit to experiment; in (A), the colors correspond to experimental data for vaccines in Table 3 (red ■ V1, green ■ V4A, orange ■ V4B, blue ■ V8), and the black outer bars are model values. (C): Comparison of 126 possible fits in which 5 strains were used for fitting (training) and 4 for testing (red ∘:

C_{P}

, green ▿:

C_{S}

, blue ×:

C_{P}

for training sets that include SARS-2). (D): SARS-CoV-2 receptor binding domain (RBD), colored by model weights using the color map blue-gray-red (corresponding to low-medium-high); PDB ID: 6VXX [41].

Figure 5. Comparison of Model 1 results with the experimental IgG titers for coronavirus [21]. (A,B): Best fit to experiment; in (A), the colors correspond to experimental data for vaccines in Table 3 (red ■ V1, green ■ V4A, orange ■ V4B, blue ■ V8), and the black outer bars are model values. (C): Comparison of 126 possible fits in which 5 strains were used for fitting (training) and 4 for testing (red ∘:

C_{P}

, green ▿:

C_{S}

, blue ×:

C_{P}

for training sets that include SARS-2). (D): SARS-CoV-2 receptor binding domain (RBD), colored by model weights using the color map blue-gray-red (corresponding to low-medium-high); PDB ID: 6VXX [41].

Figure 6. Comparison of model results with the experimental IgG titers for mRNA influenza vaccination [44]. (A,B): Best fit to experiment; in (A), the subplot title indicates the vaccine, the titers are shown for the same 20 antigens (see Table S1). The experimental titers are shown with error bars in blue, and the model results are shown as red circles. The vertical dashed lines separate HA group 1, HA group 2, and HB antigens (going from left to right); in (B), red circles, green triangles, and blue squares correspond to 20-HA, H1, and H3 vaccinations, and to the panels in A (left to right). (C): Comparison of 1000 training/testing splits in which 10 strains were randomly chosen for fitting (training) and the remaining 10 for testing (red ∘:

C_{P}

, green ▿:

C_{S}

). (D): Influenza hemagglutinin (PDB ID: 3LZG [39]) monomer colored by model weights using the color map blue-gray-red (corresponding to low-medium-high).

Figure 6. Comparison of model results with the experimental IgG titers for mRNA influenza vaccination [44]. (A,B): Best fit to experiment; in (A), the subplot title indicates the vaccine, the titers are shown for the same 20 antigens (see Table S1). The experimental titers are shown with error bars in blue, and the model results are shown as red circles. The vertical dashed lines separate HA group 1, HA group 2, and HB antigens (going from left to right); in (B), red circles, green triangles, and blue squares correspond to 20-HA, H1, and H3 vaccinations, and to the panels in A (left to right). (C): Comparison of 1000 training/testing splits in which 10 strains were randomly chosen for fitting (training) and the remaining 10 for testing (red ∘:

C_{P}

, green ▿:

C_{S}

). (D): Influenza hemagglutinin (PDB ID: 3LZG [39]) monomer colored by model weights using the color map blue-gray-red (corresponding to low-medium-high).

Figure 7. Two–dimensional projections of influenza hemagglutinin sequences onto principal components. Colors: projections of avian, swine, and human influenza type A spike protein sequences spanning the years 1918–2019 and subtypes 1–18, which were downloaded from the NIH influenza research database [38]; the sequences were clustered to 97% identity. Principal component analysis (PCA) was performed in MATLAB [35], as described in Methods; black bullets: projections of 11 antigens with solved Xray crystal structures (labeled with PDB codes). The strains corresponding to the PDB IDs and their sequence accession numbers are listed in Table S2.

Figure 8. Comparison of predicted IgG titers for four hypothetical np vaccines using data of Cohen et al. [20]. Cocktail refers to the mixture of 11 HA antigens, as given in Table S2.

Table 1. Values of model parameters defined in Methods.

	$α$	$β$	$τ$	D	$n_{\max}^{†}$
Influenza-np	1	4	0.015–0.03	0.49	3000
Coronavirus-np	1	5	0.02–0.04	0.3	3000
Influenza-mRNA	1.4	9	0.01	0.2	4000

^† Number of gradient descent iterations in Equation (6).

Table 2. Influenza strains used in the experiments [20] and in the present modeling. V1, V2, V4, and V8 refer to the vaccination cocktails.

ABBREVIATION	STRAIN NAME	ACCESSION CODE	V1	V2	V4	V8
C09	A/California/04/2009(H1N1)	ACS45035	✓	✓	✓	✓
V04	A/Viet Nam/1203/2004	AAW80717.1	–	–	✓	✓
J57	A/Japan/305/1957(H2N2)	AAA43185.1	–	–	–	✓
HK99	A/guinea fowl/Hong Kong/WF10/99	AAO46082.1	–	–	–	✓
AI68	A/Aichi/2/1968	AAA43178.1	–	✓	✓	✓
SH13	A/Shanghai/1/2013	EPI439486	–	–	✓	✓
JX13	A/Jiangxi/IPB13/2013(H10N8)	AHK10761	–	–	–	✓
HB09	A/swine/HuBei/06/2009(H4N1)	AFV33926	–	–	–	✓
P10	A/flat-faced bat/Peru/033/2010	AGX84934.1	–	–	–	–
WA79	A/wedge-tailed shearwater/Western Australia/2576/1979	ABB88138.1	–	–	–	–

Table 3. Coronavirus strains used in the experiments [21] and in the present modeling.

NAME	ACCESSION CODE	HOST	V1	V4A	V4B	V8
SARS-2	MN985325.1	human	✓	✓	–	✓
RaTG13	QHR63300	bat	–	✓	–	✓
SHC014	KC881005	bat	–	✓	–	✓
Rs4081	KY417143	bat	–	✓	–	✓
Pang17	QIA48632	pangolin	–	–	✓	✓
RmYN02	EPI_ISL_412977	bat	–	–	✓	✓
Rf1	DQ412042	bat	–	–	✓	✓
WIV1	KF367457	bat	–	–	✓	✓
SARS	AAP13441.1	human	–	–	–	–
Yun11	JX993988	bat	–	–	–	–
BM-4831	NC_014470	bat	–	–	–	–
BtKY72	KY352407	bat	–	–	–	–

V1, V4a, V4b, and V8 refer to the vaccination cocktails.

Table 4. Summary of model performance using influenza nanoparticle vaccination data.

Enc-Type	$c_{P}^{t r n}$ (m ± sd)	$c_{P}^{t s t}$ (m ± sd)	$c_{S}^{t r n}$ (m ± sd)	$c_{S}^{t s t}$ (m ± sd)	$p_{v a l}^{c_{P}, t s t}$ (min/max/avg)	$c_{P}^{h l d} (p_{v a l})$	$c_{S}^{h l d} (p_{v a l})$
I. Train/test split by antigen
Gr-mosaic	0.91 ± 0.03	0.84 ± 0.05	0.92 ± 0.03	0.85 ± 0.05	$2.9 \times 10^{- 9} / 3.1 \times 10^{- 3} / 4.8 \times 10^{- 5}$	N/A	N/A
At-mosaic	0.91 ± 0.03	0.74 ± 0.13	0.91 ± 0.03	0.78 ± 0.09	$4.4 \times 10^{- 9} / 8.4 \times 10^{- 2} / 6.7 \times 10^{- 3}$	N/A	N/A
Gr-np-mix	0.77 ± 0.06	0.66 ± 0.11	0.73 ± 0.07	0.65 ± 0.15	$3.1 \times 10^{- 6} / 1.3 \times 10^{- 1} / 9.3 \times 10^{- 3}$	N/A	N/A
II. Train/test split by vaccine
Gr-mosaic	0.92 ± 0.04	0.84 ± 0.10	0.92 ± 0.03	0.86 ± 0.07	$3.4 \times 10^{- 9} / 2.0 \times 10^{- 3} / 3.4 \times 10^{- 4}$	N/A	N/A
At-mosaic	0.93 ± 0.03	0.77 ± 0.11	0.90 ± 0.03	0.78 ± 0.11	$9.7 \times 10^{- 7} / 3.6 \times 10^{- 3} / 1.0 \times 10^{- 3}$	N/A	N/A
Gr-np-mix	0.85 ± 0.20	0.71 ± 0.19	0.78 ± 0.19	0.70 ± 0.19	$1.8 \times 10^{- 10} / 9.6 \times 10^{- 2} / 1.6 \times 10^{- 2}$	N/A	N/A
III. With a “holdout” set (V8)
Gr-mosaic	0.92 ± 0.02	0.74 ± 0.07	0.92 ± 0.04	0.70 ± 0.09	$1.6 \times 10^{- 3} / 1.2 \times 10^{- 1} / 1.9 \times 10^{- 2}$	0.79 ( $6.9 \times 10^{- 3}$ )	0.73 ( $2.1 \times 10^{- 2}$ )
At-mosaic	0.92 ± 0.02	0.71 ± 0.11	0.88 ± 0.05	0.69 ± 0.13	$1.4 \times 10^{- 4} / 7.7 \times 10^{- 1} / 3.7 \times 10^{- 2}$	0.77 ( $9.4 \times 10^{- 3}$ )	0.71 ( $2.8 \times 10^{- 2}$ )
Gr-np-mix	0.90 ± 0.04	0.62 ± 0.09	0.66 ± 0.12	0.53 ± 0.14	$9.8 \times 10^{- 3} / 3.4 \times 10^{- 1} / 7.0 \times 10^{- 2}$	0.70 ( $2.5 \times 10^{- 2}$ )	0.63 ( $5.3 \times 10^{- 2}$ )

Abbreviations Gr and At refer to the Grantham [31] and Atchley [32] encodings (Enc), respectively; type refers to the nanoparticle type which can be mosaic or np-mix (see Figure 3B) [20]; subscripts trn and tst refer to training and test sets, respectively. p-values (

p_{v a l}

) are reported for the Pearson correlation coefficient computed over the test set as triplets, indicating the minimum, maximum, and average values. For testing on the holdout set, the performance of the models is indicated by Pearson (

c_{P}^{h l d}

) and Spearman (

c_{S}^{h l d}

) correlation coefficients with the corresponding p-values in parentheses; N/A indicates that a holdout set was not chosen.

Table 5. Summary of model performance using coronavirus vaccination data. The quantities listed are the same as those described for influenza in Table 4. The model was used with the Grantham encoding.

$c_{P}^{t r n}$ (m ± sd)	$c_{P}^{t s t}$ (m ± sd)	$c_{S}^{t r n}$ (m ± sd)	$c_{S}^{t s t}$ (m ± sd)	$p_{v a l}^{c_{P}, t s t}$ (min/max/avg)	$c_{P}^{h l d} (p_{v a l})$	$c_{S}^{h l d} (p_{v a l})$
I. Train/test split by antigen
0.86 ± 0.04	0.59 ± 0.13	0.84 ± 0.06	0.56 ± 0.13	$9.6 \times 10^{- 5} / 2.5 \times 10^{- 1} / 3.9 \times 10^{- 2}$	N/A	N/A
II. Train/test split by vaccine
0.89 ± 0.03	0.65 ± 0.16	0.84 ± 0.12	0.64 ± 0.16	$1.8 \times 10^{- 6} / 3.8 \times 10^{- 2} / 1.5 \times 10^{- 2}$	N/A	N/A
III. With a “holdout” set (V8)
0.87 ± 0.04	0.68 ± 0.12	0.86 ± 0.04	0.63 ± 0.15	$9.1 \times 10^{- 4} / 6.8 \times 10^{- 1} / 6.5 \times 10^{- 2}$	0.80 ( $9.0 \times 10^{- 3}$ )	0.83 ( $8.3 \times 10^{- 3}$ )

Table 6. Summary of model performance using influenza mRNA vaccination data [44]. The quantities listed are the same as those described for nanoparticle influenza in Table 4. The model was used with the Grantham encoding. For the train/test split by antigen, 1000 antigen panels were generated, in which 10 out of 20 antigens were chosen for training; the remaining antigens were used for testing. For the train/test split by the vaccine, we used data for two out of three vaccine titer sets for training, and the remaining one, for testing.

$c_{P}^{t r n}$ (m ± sd)	$c_{P}^{t s t}$ (m ± sd)	$c_{S}^{t r n}$ (m ± sd)	$c_{S}^{t s t}$ (m ± sd)	$p_{v a l}^{c_{P}, t s t}$ (min/max/avg)
I. Train/test split by antigen
0.95 ± 0.02	0.89 ± 0.06	0.89 ± 0.03	0.86 ± 0.05	$1.0 \times 10^{- 24} / 5.8 \times 10^{- 11} / 2.6 \times 10^{- 13}$
II. Train/test split by vaccine
0.96 ± 0.03	0.58 ± 0.5	0.83 ± 0.05	0.52 ± 0.5	$3.7 \times 10^{- 30} / 5.7 \times 10^{- 17} / 1.9 \times 10^{- 17}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ovchinnikov, V.; Karplus, M. Phenomenological Modeling of Antibody Response from Vaccine Strain Composition. Antibodies 2025, 14, 6. https://doi.org/10.3390/antib14010006

AMA Style

Ovchinnikov V, Karplus M. Phenomenological Modeling of Antibody Response from Vaccine Strain Composition. Antibodies. 2025; 14(1):6. https://doi.org/10.3390/antib14010006

Chicago/Turabian Style

Ovchinnikov, Victor, and Martin Karplus. 2025. "Phenomenological Modeling of Antibody Response from Vaccine Strain Composition" Antibodies 14, no. 1: 6. https://doi.org/10.3390/antib14010006

APA Style

Ovchinnikov, V., & Karplus, M. (2025). Phenomenological Modeling of Antibody Response from Vaccine Strain Composition. Antibodies, 14(1), 6. https://doi.org/10.3390/antib14010006

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Phenomenological Modeling of Antibody Response from Vaccine Strain Composition

Abstract

1. Introduction

2. Materials and Methods

2.1. Model Overview

2.2. Sequence Coordinate Notation

2.3. Distance to Average Strain IgG Model

2.4. Model Parameter Optimization

2.5. Principal Component Analysis

3. Results

3.1. Influenza Results

3.2. Coronavirus Results

3.3. mRNA Influenza Vaccine

3.4. Application to Vaccine Efficacy Prediction

4. Concluding Discussion

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI