1. Introduction
Antibodies are promising therapeutics for a variety of ailments, but the large size of antibody molecules means that the total mass of protein that must often be administered to achieve a therapeutic effect can be very high. In most cases, the preferred method of delivery is subcutaneous injection, which imposes strict limitations on the delivery volume. At the high concentrations (e.g., ≥100 mg/mL) required to achieve injection volumes compatible with subcutaneous administration, many antibody solutions become too viscous for traditional subcutaneous injections or pose challenges for standard manufacturing operations [
1,
2,
3,
4]. In some cases, this high viscosity can be alleviated in formulation but, often, the only option is to reduce the protein concentration and switch to intravenous delivery, which incurs a much higher cost and less patient convenience than subcutaneous administration. Since formulation efforts to modify viscosity can be challenging and costly, it would be beneficial to select less viscous molecules in screening. However, viscosity problems are often not apparent until late in the development pipeline when it is possible to generate the high-concentration solutions necessary for rheological measurements. Therefore, there is a strong need for methods to predict viscosity behavior from the dilute solutions available in the early stages of development. However, such predictions are inherently difficult because the two-body interactions occurring in dilute solutions are fundamentally different from the many-body interactions that are responsible for the nonlinear increase in viscosity observed at high concentrations.
Many previous methods have been developed to predict antibody viscosity from molecular properties using database methods. One recent study [
5] identified from a set of 59 therapeutic monoclonal antibodies (mAbs) that viscosity and opalescence could be predicted through measurement of weak self-interactions in dilute solution (second virial coefficient,
B2, or dynamic light scattering parameter,
kD). By contrast, in previous experimental work [
6] using a similar size sample set, it was observed that mAb viscosities increase strongly with hydrophobicity and charge dipole distribution but decrease with net charge. Along these lines, computational models that employ spatial charge mapping (SCM) [
7] calculate the charge on 3D homology models, resulting in an SCM score that can aid in identifying potentially viscous mAbs early in development. More recently, approaches that utilize machine learning have also been used to find the molecular properties responsible for high viscosity. One such study [
8] found that the overall net charge and the amino acid composition in the mAb variable region (Fv) are key properties that influence viscosity. Combined, these approaches and others have identified key characteristics such as surface electrostatics,
kD,
B2, and hydrophobicity [
4,
6,
7,
9,
10,
11]. While powerful, these studies are often limited in scope and may use curated mAb sample sets, which can hamper broader predictive abilities.
Another approach to predicting protein viscosity is to mimic the many-body environment in silico. This approach seeks to model the various weak protein–protein interactions that ultimately lead to macroscopic viscosity challenges [
12,
13]. Due to mAb size and complexity as well as computational cost, such methods often require extensive coarse graining (CG) of the molecules to achieve the large systems needed to assess viscosity. In early studies, Li et al. [
14] used a CG approach and observed that the net charge, ξ-potential, and pI of Fv regions correlated with high-viscosity mAb solutions. Similarly, Buck et al. [
12] found that transient antibody networks are formed due to domain–domain electrostatic complementarities. They propose that these interactions are most likely the origin of high-concentration viscosity for mAb solutions. In work by Calero-Rubio et al. [
15], they combined experimental light-scattering data to measure second osmotic virial coefficients (B
22) in dilute conditions with CG modeling to predict high-concentration viscosity behavior. Interestingly, their studies revealed that the addition of sucrose, a common excipient in drug product development, induces stronger repulsions between protein molecules via the accumulation of sucrose around the protein surface. Key limitations of CG computational approaches are the poor representation of short-range interactions, the lack of model flexibility, and the lack of atomistic detail in CG modeling. Additionally, computational costs/complexity as well as expert user interpretation can limit the utility of this approach.
Physics-based models have also been applied as a method to understand antibody viscosity. In principle, these methods have broader applicability because the results can be extended to conditions outside the training set. They have also shown the potential to bridge disparate length scales. This is carried out by accounting for amino acid-scale differences between molecules via their effect on parameters within a solution model described by either reptation theory or Wertheim theory [
16,
17,
18]. A key insight from these methods is that the primary driver of viscosity comes from the entanglement of elongated complexes, an observation that is consistent with the simulations of Buck et al. [
12].
In this manuscript, we describe an approach in which physics-based theory is used to predict many-body solution characteristics from experiments that are sensitive to two-body interactions. We achieve predictive success comparable to other methods in the literature; however, because our approach is based on physical models, it provides testable explanations that allow for future refinement.
2. Materials and Methods
2.1. Reptation Model
Our theory for antibody viscosity is based on the observation that the differences between antibodies are mostly localized to the complementarity-determining region (CDR). Therefore, the interactions driving viscosity should originate from CDR-mediated association. This simple observation has important implications for the morphology of antibody clusters. This is because the two CDRs on each molecule limit the number of interaction partners a given molecule has within an antibody cluster. For example, if the antibodies interact via CDR-CDR interactions, the resulting clusters will be linear chains of antibodies. We refer to this as the head-to-head (HH) model [
16]. Interactions between a CDR and another patch of the molecule increases the interaction valence of each molecule to three or four, which can result in branched chains. This is referred to as the head-to-tail (HT) model [
17]. Importantly, in each case, the limited valence arising from CDR-mediated binding will favor the formation of elongated structures. This should be contrasted with compact aggregates, which require interactions that are not directionally specified (for example, in colloidal instability).
The average assembly size can be evaluated for both the HH and HT models. For the HT model, the result is [
17]
where
K is the association equilibrium constant for the formation of antibody dimers by HH contacts, which is defined by
, where
C2 and
C1 are dimer and monomer concentration, respectively, and
C is the total antibody concentration.
For the HT model, the average size is given by
where
S is the association constant for HT dimers. The HH and HT models are difficult to distinguish in the limits of low concentration and/or low affinity when branched structures are rare. Therefore, for the moment, we focus on the HH model and revisit this simplification later in the manuscript.
The existence of elongated structures has important consequences for the solution dynamics because they occupy a much larger effective volume than the substituent molecules and are prone to entanglement. Entanglement prevents elongated structures from diffusing laterally, so nearly all diffusion occurs through snake-like motion along the elongated direction [
19]. This situation is described by reptation theory, which was originally conceived for long, flexible polymers but is also used to describe solutions of rodlike particles. In reptation theory, the solution viscosity is given by [
16,
17,
19,
20,
21]
where
L is the length of the elongated structures expressed in units of a monomer present at total concentration
C.
ν = 3/5 is the Flory exponent, which describes the effective volume occupied by the elongated structure.
Inserting our expression for the average complex size (Equation (1)) into Equation (3), we arrive at an expression for the viscosity of an antibody solution.
where
A is a proportionality constant with the value 5.4×10
−8cP(mg/mL)
3.75 [
16]. This constant describes lateral interactions between elongated antibody complexes. Therefore, it will be constant within a given antibody scaffold, but it can change due to disulfides within the hinge region that modify molecular flexibility or under solution conditions that promote non-specific interactions.
Equation (4) describes complex, many-body interactions driving antibody viscosity in terms of a parameter, K, that describes two-body interactions. This means that if K can be measured through dilute solution methods, we can predict the viscosity at high concentration. However, Equation (4) relies on several assumptions including the following: (1) HT interactions are ignored, (2) the dimerization physics at high concentration is approximately the same as that at low concentration, and (3) the “slithering” dynamics captured in the “A” parameter are constant. We expect that each of these assumptions will be violated in certain cases, but these cases can be experimentally identified and used to further refine the method.
2.2. Experimental Methods
A data set of 89 mAbs was generated by Amgen [
22]. This data set consists of AC-SINS plasmon shift (Δ
λ), AC-SINS diffusion coefficient (
Dnp), dynamic light scattering interaction parameter (
kD), viscosity at
~70 mg/mL and
~150 mg/mL at 25 °C. Mock et al. [
22] describes the experimental procedures to generate the data set including buffer conditions, coupling protocol to the nanoparticles, etc. Out of these 89 mAbs, 63 molecules were randomly selected as a training set, 17 molecules were randomly selected as a validation set, and the remaining 9 molecules were used as a test set. By fitting Equation (4) to these viscosity values, we obtained dimerization affinity parameter,
K, for each molecule in the sample set.
We employed two methods to assess the two-body interaction parameter, K.
AC-SINS (Affinity-Capture Self-Interaction Nanoparticle Spectroscopy) is an assay in which antibodies are attached to gold nanoparticles [
23,
24,
25,
26]. The resulting nanoparticles will then associate via antibody–antibody (Ab-Ab) interactions, which can be detected as either a reduction in the nanoparticle diffusivity or a shift in the plasmon wavelength. AC-SINS has advantages where, by clustering antibodies on the surface of the nanoparticle, it mimics the high-concentration environment and provides an amplification effect to increase the sensitivity to weak interactions. However, the method by which antibodies are tethered to the nanoparticle will prevent certain regions of the antibody from interacting with other antibodies. Therefore, we expect that AC-SINS will be insensitive to head-to-tail interactions. In addition, after the antibody capture protocol, there remains a significant concentration of soluble test antibody present in solution. This may result in three-body “bridging” interactions that complicate interpretation.
DLS (dynamic light scattering) can be used to measure the concentration-dependent change in antibody diffusivity [
27,
28]. Attractive interactions will result in a decrease in diffusion while repulsive interactions will increase diffusion. Since the antibodies are untethered in this experiment, DLS is sensitive to the HT interactions that are missing in AC-SINS. The observed diffusivity change is the result of a statistical average of all interactions. This can also complicate analysis if repulsive interactions between certain parts of the molecule (which minimally contribute to the viscosity profile) prevent the formation of attractive interactions between regions of the molecule that strongly influence viscosity.
Despite these limitations, we expect that both AC-SINS and DLS measurement will be most strongly affected by the CDR-dependent interactions that drive viscosity.
2.3. Description of Algorithm
Equation (4) describes the concentration-dependent viscosity of an antibody solution as a function of a single parameter, K, which describes the first dimerization event in the formation of elongated antibody complexes. Thus, we can predict the high-concentration viscosity if we can estimate K using methods such as those described in the previous section.
The major pitfall to this approach is that it implicitly assumes that the variation in dilute solution interactions is dominated by a single binding mode that is also responsible for the formation of elongated complexes at high concentration. The presence of multiple interaction modes will complicate the correlation between the dilute properties and
K, particularly if there is an interaction that results in the formation of compact assemblies, which will minimally contribute to the viscosity. Another complication is that some protein–protein interactions, such as electrostatic interactions [
29,
30,
31], can have many-body contributions, causing them to qualitatively differ between the dense and dilute environments. Additionally, Equation (4) assumes that the major relaxation mechanism in the concentrated state is the reptation diffusion of antibody complexes. However, if the complexes have a branched structure, rather than linear, then reptation will be strongly suppressed. As a result, antibodies that interact head-to-tail will have qualitatively different behaviors than head-to-head antibodies above the concentration where branching becomes significant [
17].
The effect of these complicating factors can be seen in
Figure 1, which shows the correlation between the diffusion of AC-SINS nanoparticles, the
kD parameter, and the
K values obtained from fitting the viscosity profile to Equation (4). Some of the
K values in
Figure 1 are very small, and in a few cases (training set molecule ID 17 (Tr
17), Tr
26, Tr
46, and Tr
51) are even negative. These non-physical values are indicative that these molecules are not experiencing entangled dynamics and, therefore, Equation (4) is not applicable. Instead, a non-entangled theory accounting for colloidal collisions should be used [
17]. These four molecules are excluded from our fits. Both
Dnp and
kD show a strong correlation between the dilute measurements and
K, along with a significant number of outliers. Upon closer inspection, many of these outliers can be understood. Molecules above the trend line in
Figure 1A,B are those whose viscosity is greater than what is expected from the AC-SINS experiment. This behavior is consistent with HT binding between antibodies, which will cause the AC-SINS experiment to predict anomalously low viscosity for two reasons. The first reason is that one of the interaction sites responsible for the formation of elongated complexes may be sterically inhibited by the tether of the antibody to the nanoparticle. This explanation is supported by the fact that many of these molecules fall near the trend line in the DLS experiment (
Figure 1C) in which the entire molecule is accessible for interactions. Further support comes from the fact that these molecules also deviate from the
Dnp vs.
kD trend line, consistent with the nanoparticle interfering with attracting interaction (
Figure 2). The second reason is that strong HT binding will lead to the formation of branched antibody complexes [
17] that are unable to relax through the reptation mechanism described by Equation (4) and, therefore, will have much a higher viscosity than would be expected from linear complexes. This is consistent with the fact that highly viscous molecules appear in both the AC-SINS and DLS experiments (
Figure 1A,C).
Further inspection of
Figure 1A shows that the correlation between viscosity and nanoparticle diffusion breaks down for diffusion rates less than 1 μm
2/s. It is likely that this is because the Stokes radius
grows less strongly as the number of particles in a cluster,
N, grows larger.
is the fractal dimension of a cluster of nanoparticles, which will lie between the values
for compact clusters and
for diffusion-limited aggregation [
32,
33]. Therefore, nanoparticle diffusion is expected to have poor sensitivity for strongly interacting antibodies, due to the large clusters found in these cases.
To avoid the poor sensitivity in the strong interaction regime, we limit our analysis to antibodies with
Dnp > 1 μm
2/s. We also exclude the three molecules identified as suspected HT binders. The best fit line for the remaining molecules is
For the DLS experiments, we omit two sets of outliers before obtaining the best-fit line. The first is a single molecule with a
kD = 84 mL/g, which is inconsistent with the expected range for antibodies, suggesting an experimental error [
3,
34]. The second set is molecules with
K > 0.01 mL/mg, which, as we explain below, are unlikely to be described by Equation (4). The greater number of molecules included in the
kD fit is because we expect that
kD is sensitive in cases where the nanoparticle will distort the results. The best fit line for the included molecules is
To summarize, our algorithm predicts the viscosity using either Dnp or kD values measured in dilute solution. These values can be input into the best fit lines given by Equations (5) or (6) to estimate the association parameter, K, from AC-SINS and DLS experiments, respectively. The K parameter can then be used in Equation (4) to predict the viscosity at the high concentrations where antibodies are in the entangled regime. This prediction will be most reliable for weak/moderate interacting antibodies that lie within the sensitive range of the AC-SINS experiment.
3. Results and Discussions
Figure 3 compares the predicted viscosity curves for the validation set of 17 molecules to the measured viscosity at two concentrations. The correlation between the predicted and measured viscosities is comparable to the training set (
Figure 1). Specifically, there are two molecules (validation set molecule ID 1 (V
1) and V
15) in which the DLS prediction is good, but the nanoparticle diffusion (
Dnp) under-predicts the viscosity. Since this is the expected result for HT binders, we predict that structural analysis would reveal an interaction between the CDR and a location occluded by nanoparticle and/or capture antibody. There are another three molecules (V
2, V
7, and V
9) in which both methods under-predict the viscosity. These molecules lie below the 1 μm
2/s threshold where the correlation between nanoparticle diffusion (
Dnp) and viscosity is observed to break down (
Figure 1). More experiments will be required to determine if this breakdown can be rectified by refining the experimental protocol to account for the long equilibration times expected of strongly binding molecules or if it is indicative of a different physical regime (for example, reptative and non-reptative relaxation for HH and HT binders, respectively).
We further evaluate the performance of our model using test data of nine additional molecules. Both ACSINS and DLS experiments measure the viscosity at 4 or 5 discrete point concentrations ranging from 70 mg/mL to 250 mg/mL, which allow us to evaluate the accuracy of the model-predicted concentration–viscosity curve.
In the log–log representation of the actual data (
Figure 4), it becomes evident that viscosities above 100 mg/mL and below 100 mg/mL manifest distinct power regimes, a phenomenon previously documented by Schmit et al. [
16]. The observed poor agreement at concentrations below 100 mg/mL is attributed to the molecules being excessively dilute, thus failing to exhibit the entangled dynamics described by Equation (4). Overall experimental viscosities agree with the predicted viscosity curve shown in
Figure 4. For test set molecules ID 2 (Te
2) and ID 4 (Te
4), both DLS and ACSINS algorithms underpredict the viscosity. As we observed in the training and validation sets, both outliers have
Dnp less than 1 μm
2/s.
Our model highlights the need to better understand the physics of antibody viscosity and effects that lead to the scatter in our fits (
Figure 1). Of these effects, the most important is head-to-tail binding. HT binding is difficult to assess through AC-SINS because the nanoparticle and/or capture antibody can sterically inhibit, or even occlude the binding sites responsible for increasing viscosity. This difficulty mitigates the sensitivity advantage of AC-SINS over DLS. While the occlusion of binding sites could be avoided using different ligation techniques, a larger uncertainty comes from the lack of reptation as a relaxation mechanism for branched antibody complexes. Resolving this uncertainty will require a comparison of viscosity curves of molecules for which the intermolecular association sites have been identified through techniques such as Hydrogen–Deuterium Exchange [
35,
36]. Another possible source of outliers is antibody association sites that are not compatible with the formation of elongated structures. Such a site would contribute to lower diffusion in AC-SINS and DLS experiments but would not lead to the entanglements that are the primary contributor to increased viscosity. Finally, a central weakness of dilute solution measurements in predicting high-concentration behavior is the possibility that many-body effects can invalidate the assumption of pairwise interactions. Such non-pairwise interactions are especially prevalent for charged molecules at high concentration [
31,
37]. The dense layer of proteins at the surface of AC-SINS nanoparticles may capture some many-body effects, although the geometry dependence of electrostatic screening interactions [
29] makes it unclear how much the nanoparticle surface can recapitulate a high-concentration solution.
To gain further insight into the relationship between
Dnp,
kD, and viscosity, we generated a 3D plot of all molecules in the training set (
Figure 5). Inspection of this plot suggests that our proposed cutoff for HT binders (
Figure 2) may have been too conservative. A revised cutoff line (black dashed line) divides the set into a region above the line where all three quantities are well correlated and a region below the line containing three sets of outliers. The first set of outliers is the previously identified HT molecules (red in
Figure 2 and
Figure 5). The second set is the high-viscosity outliers that extend off the scale of the plot. These unphysically large affinity constants are likely to be the result of fitting large, branched HT complexes, which cannot undergo reptation, to the reptation model. Third, the revised cutoff line separates a cluster of molecules at about
kD = 20 mL/g that deviates from the main group. Again, these molecules have a
Dnp greater than we would expect for the observed
kD and viscosity. But the overall affinity is too weak for these to show up as outliers in our original analysis.
Next, we evaluate whether this model can be used to assist the de-selection of viscous antibody molecules at the early candidate screening stage. In
Table 1, both experimental and predicted viscosity values (based on
Dnp and
kD) at ~150 mg/mL and experimental viscosity are listed for the 17 test molecules.
Here, we propose a binary classification system to categorize these mAbs using a cutoff value of 20 cP. Color coding was then applied to differentiate high- (>20 cP) vs. low- (≤20 cP) viscosity molecules as measured at 150 mg/mL and experimental concentrations. This cutoff is somewhat complicated by the fact that the experimental concentration often deviated from this value, which is significant for some borderline cases. Out of the 17 test molecules in
Table 1, the viscosity models flagged 3 molecules (V
1, V
2, and V
15), all 3 of which were experimentally found to be viscous. Three additional molecules (V
7, V
9, and V
11) were experimentally found to lie above the viscosity threshold. In the case of molecule V
11, the predicted viscosity is actually very close to the actual viscosity if we evaluate the viscosity at the same concentration of the experimental measurement. The remaining two mAbs (V
7 and V
9) are the two identified as outliers in (
Figure 3 and related discussion).
The same comparison was then conducted on a test set of nine molecules in
Table 2. Since viscosity was collected at multiple concentrations for these samples, the viscosity concentration curve was derived. Viscosities at exactly 150 mg/mL were calculated and compared with their corresponding predictions. Predictions suggest that all molecules have low viscosity, and 7 of the 9 molecules did have low experimental viscosity. However, two molecules (Te
2 and Te
4) were found to have high viscosity. This again is because the model has limited predictive power for the high-viscosity regime. We have incorporated a correlation graph into the
Supplementary Materials (Figure S1) to provide a more comprehensive evaluation of both the validation data set (
Table 1) and the test data set (
Table 2). Overall, we can conclude that (1) the model can effectively flag the viscous molecules and (2) the model still faces challenge with predictions for viscosity above 40 cP.