Data Augmentation to Support Biopharmaceutical Process Development through Digital Models—A Proof of Concept
Abstract
1. Introduction
2. Materials and Methods
2.1. Methodological Procedure
- Step 1: Experimental campaign on the mAb production process. Batch data are obtained from experiments performed at the development scale of the process under study, as far as the available resources allow. In this work, we consider a simulated process for the production of mAbs at the shake-flask scale (Section 2.2);
- Step 2: In silico batch generation from a digital model. The data from the real batches are used, through a digital model of the process, to drive the generation of in silico batches that span a wider variety of behaviors. Two alternative modeling strategies are adopted in this work: a first principles digital model (Section 2.3) and a hybrid digital model (Section 2.4);
- Step 3: Multivariate data-based modeling. All the available data (both the process batches and the batches generated in silico) are fed to a data-based model to support process development and scale-up. In this work, a multivariate latent variable model regresses the process and in silico batches to estimate a CQA, namely the mAb titer at harvest (Section 2.5). The multivariate model thus exploits the data of a few process batches together with the additional process knowledge extracted from the in silico batches, and estimates the cell behavior of new samples from the time trajectories of the culture variables. Such estimations, especially when the batches show biological variability, are not feasible with the digital models of the process alone: these can only estimate the culture variable trajectories when the inputs (i.e., process initial conditions, feed composition, and scheduling) are manipulated, given the biological characteristics already hardcoded in the digital model parameters.
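Steps 2 and 3 can be illustrated with a minimal sketch, written under loud assumptions. The function `toy_batch` is a hypothetical stand-in for both the process data and the digital model, the relation between growth rate and titer is invented, and the regression is a generic single-response PLS fitted with the NIPALS algorithm on batch-wise unfolded trajectories. It is not claimed to reproduce the model of Section 2.5.

```python
import math

def toy_batch(growth_rate, n_points=4):
    """Hypothetical batch: [viable cells, glucose] at each sampling time."""
    return [[math.exp(growth_rate * k), 10.0 - 10.0 * growth_rate * k]
            for k in range(n_points)]

def unfold(batches):
    """Batch-wise unfolding: one row per batch, columns ordered time-major."""
    return [[value for time_point in batch for value in time_point]
            for batch in batches]

def column_stats(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    stds = []
    for mean, col in zip(means, zip(*rows)):
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # constant columns keep unit scale
    return means, stds

def fit_pls1(X, y, n_components):
    """PLS1 by NIPALS on autoscaled X and mean-centred y."""
    x_mean, x_std = column_stats(X)
    y_mean = sum(y) / len(y)
    E = [[(v - m) / s for v, m, s in zip(row, x_mean, x_std)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        w = [sum(row[j] * fi for row, fi in zip(E, f)) for j in range(len(E[0]))]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in E]    # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(E, t)) / tt      # X loadings
             for j in range(len(w))]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt            # y loading
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "components": components}

def predict_pls1(model, x_row):
    e = [(v - m) / s for v, m, s in zip(x_row, model["x_mean"], model["x_std"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]  # deflate before next component
    return y_hat

# Few "process" batches plus in silico batches covering a wider range (toy values).
process_rates = [0.10, 0.12, 0.14]
in_silico_rates = [0.06, 0.08, 0.16, 0.18, 0.20]
rates = process_rates + in_silico_rates
X = unfold([toy_batch(r) for r in rates])
y = [100.0 * r for r in rates]  # invented titer at harvest
model = fit_pls1(X, y, n_components=2)
estimate = predict_pls1(model, unfold([toy_batch(0.15)])[0])
print(f"estimated titer: {estimate:.2f} (toy value used to generate it: 15.00)")
```

The new batch lies outside the range spanned by the three toy process batches but inside the range covered once the in silico batches are stacked with them, which is the situation data augmentation is meant to address.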
2.2. Process for the Production of Monoclonal Antibodies
2.3. Modeling Strategy 1: First Principles Digital Model
2.3.1. In Silico Batch Generation through First Principles Digital Model
2.4. Modeling Strategy 2: Hybrid Digital Model
2.4.1. In Silico Batch Generation through Hybrid Digital Model
2.5. Multivariate Predictive Modeling
3. Results and Discussion
3.1. Monoclonal Antibodies Titer Estimation
3.1.1. Titer Estimation Performance and Sensitivity to the Available Number of Process Calibration Batches
3.1.2. Effect of Data Augmentation on the Estimation Performance
3.2. Process Understanding for mAbs Titer Estimation
3.2.1. Process Understanding with Process Batches Only
3.2.2. Process Understanding Supported by FPDM In Silico Data Augmentation
3.2.3. Process Understanding Supported by HDM In Silico Data Augmentation
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Parameter | Kontoravdi et al. [30] (Mean) | Standard Deviation |
---|---|---|
μmax (h⁻¹) | 0.05800 | 0.0068 |
μd,max (h⁻¹) | 0.03000 | 0.0025 |
kglc (mM) | 0.75000 | - |
kgln (mM) | 0.07500 | - |
ki,lac (mM) | 171.76000 | - |
ki,amm (mM) | 28.48000 | - |
Kd,amm (mM) | 1.76000 | 0.4253 |
N (-) | 2.00000 | - |
Yx,glc (cell/mmol) | 2.60 × 10⁸ | 3.10 × 10⁷ |
mglc (mmol/cell h) | 4.9 × 10⁻¹⁴ | - |
Yx,gln (cell/mmol) | 8.00 × 10⁸ | 1.6 × 10⁸ |
α1 (mmol L/cell h) | 3.4 × 10⁻¹³ | - |
α2 (mM) | 4.00000 | - |
Ylac,glc (mmol/mmol) | 2.00000 | - |
Yamm,gln (mmol/mmol) | 0.45000 | 0.0825 |
kd,gln (h⁻¹) | 9.6 × 10⁻³ | 0.0030 |
NH (gene/cell) | 100.00000 | - |
SH (mRNA/gene h) | 3000.00000 | - |
K (h⁻¹) | 0.10000 | - |
NL (gene/cell) | 100.00000 | - |
SL (mRNA/gene h) | 4500.00000 | - |
KA (cell/molecule L) | 1.0 × 10⁻⁶ | - |
TH (chain/mRNA h) | 17.00000 | - |
TL (chain/mRNA h) | 11.50000 | - |
KER (h⁻¹) | 0.69000 | - |
ε1 (-) | 0.99500 | 0.1492 |
KG (h⁻¹) | 0.14000 | - |
γ1 (-) | 0.10000 | - |
γ2 (h) | 2.0000 | 0.333 |
ε2 (-) | 1.0000 | 0.15 |
λmAb (g/mol) | 2.5 × 10⁻¹⁶ | - |
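A table of means and standard deviations such as the one above lends itself to Monte Carlo generation of in silico batches. The sketch below is only an illustration under stated assumptions: parameters are drawn from independent normal distributions with non-positive draws rejected, parameters listed without a standard deviation are kept fixed, and the model is a deliberately reduced growth-on-glucose balance with a constant death rate rather than the full model of [30]. The initial conditions, the batch duration, and all function names are hypothetical.

```python
import random

# (mean, standard deviation) from the table above; None = no standard deviation reported.
PARAMETERS = {
    "mu_max": (0.058, 0.0068),     # h^-1
    "mu_d_max": (0.030, 0.0025),   # h^-1
    "k_glc": (0.75, None),         # mM
    "Y_x_glc": (2.60e8, 3.10e7),   # cell/mmol
    "m_glc": (4.9e-14, None),      # mmol/(cell h)
}

def sample_parameters(rng):
    """Draw one parameter set (assumption: independent normal distributions)."""
    sample = {}
    for name, (mean, sd) in PARAMETERS.items():
        value = mean if sd is None else rng.gauss(mean, sd)
        while value <= 0:  # assumption: reject non-physical draws
            value = rng.gauss(mean, sd)
        sample[name] = value
    return sample

def simulate_batch(p, x0=2.0e8, glc0=25.0, hours=120.0, dt=0.1):
    """Explicit Euler integration of a reduced model (cells/L, mM); daily samples."""
    x, glc, trajectory = x0, glc0, []
    steps_per_day = int(round(24 / dt))
    for step in range(int(round(hours / dt)) + 1):
        if step % steps_per_day == 0:
            trajectory.append((step * dt, x, glc))
        mu = p["mu_max"] * glc / (p["k_glc"] + glc)     # Monod growth on glucose
        uptake = (mu / p["Y_x_glc"] + p["m_glc"]) * x   # growth plus maintenance
        x += dt * (mu - p["mu_d_max"]) * x              # constant death rate (simplification)
        glc = max(glc - dt * uptake, 0.0)               # glucose cannot go negative
    return trajectory

rng = random.Random(0)  # fixed seed so the generated batches are reproducible
in_silico_batches = [simulate_batch(sample_parameters(rng)) for _ in range(20)]
for time, cells, glucose in in_silico_batches[0]:
    print(f"t = {time:5.1f} h  Xv = {cells:.3e} cell/L  glc = {glucose:6.2f} mM")
```

Each simulated trajectory plays the role of one in silico batch; in practice the sampled trajectories would be screened for plausibility before being passed to the data-based model.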
Appendix B
Variable | Reference | Minimum | Maximum |
---|---|---|---|
μg,max (h⁻¹) | 0.073 | 0.058 | 0.090 |
km,glc (mM) | 0.010 | - | - |
αx (10⁵ cell/mmol) | 44,704.000 | - | - |
kd,max (h⁻¹) | 0.020 | 0.015 | 0.041 |
kd,μ (h⁻¹) | 0.635 | - | - |
Yx,glc (10⁵ cell/mmol) | 65,341.000 | 47,700.000 | 80,700.000 |
Ylac,glc (mmol/mmol) | 1.700 | - | - |
Yx,lac (10⁵ cell/mmol) | 182,050.000 | - | - |
km,lac (mM) | 3.908 | - | - |
Ymab,glc (mg/mmol) | 150.000 | 100.000 | 180.000 |
kgln (mM) | 0.020 | 0.020 | 0.050 |
Yx,gln (10⁵ cell/mmol) | 8000.000 | 7000.000 | 11,000.000 |
Kglc (-) | 0.200 | - | - |
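When parameters are specified by a minimum and a maximum rather than by a distribution, one common way to cover the ranges with few simulated batches is Latin hypercube sampling. The sketch below assumes that the ranges in the table above delimit the values to be explored and that the parameters vary independently; neither assumption is confirmed by the table itself, and the function name `latin_hypercube` is hypothetical.

```python
import random

# (minimum, maximum) for the parameters that report a range in the table above.
RANGES = {
    "mu_g_max": (0.058, 0.090),      # h^-1
    "k_d_max": (0.015, 0.041),       # h^-1
    "Y_x_glc": (47700.0, 80700.0),   # 10^5 cell/mmol
    "Y_mab_glc": (100.0, 180.0),     # mg/mmol
    "k_gln": (0.020, 0.050),         # mM
    "Y_x_gln": (7000.0, 11000.0),    # 10^5 cell/mmol
}

def latin_hypercube(ranges, n_samples, rng):
    """One draw per equal-width stratum of each range, paired at random across parameters."""
    columns = {}
    for name, (low, high) in ranges.items():
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)  # decouple the stratum order between parameters
        columns[name] = [low + u * (high - low) for u in strata]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]

rng = random.Random(42)  # fixed seed for reproducibility
parameter_sets = latin_hypercube(RANGES, n_samples=10, rng=rng)
for parameter_set in parameter_sets[:3]:
    print({name: round(value, 4) for name, value in parameter_set.items()})
```

Compared with plain uniform sampling, the stratification guarantees that every tenth of each range is visited once, which matters when only a handful of batches can be simulated.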
Variable | Training (Mean) | Standard Deviation |
---|---|---|
μmax,VC | 2.00 | 0.13 |
μmax,glucose | 8.00 | 0.27 |
μmax,glutamine | 3.00 | 0.10 |
μmax,lactate | 8.00 | 0.53 |
μmax,ammonia | 2.00 | 0.13 |
μmax,mAb | 2.00 | 0.13 |
References
- Tripathi, N.K.; Shrivastava, A. Recent Developments in Bioprocessing of Recombinant Proteins: Expression Hosts and Process Development. Front. Bioeng. Biotechnol. 2019, 7, 420.
- Walsh, G. Biopharmaceutical benchmarks 2018. Nat. Biotechnol. 2018, 36, 1136–1145.
- Yang, O.; Qadan, M.; Ierapetritou, M. Economic Analysis of Batch and Continuous Biopharmaceutical Antibody Production: A Review. J. Pharm. Innov. 2020, 15, 182–200.
- Li, F.; Vijayasankaran, N.; Shen, A.; Kiss, R.; Amanullah, A. Cell culture processes for monoclonal antibody production. MAbs 2010, 2, 466–479.
- Farid, S.S.; Baron, M.; Stamatis, C.; Nie, W.; Coffman, J. Benchmarking biopharmaceutical process development and manufacturing cost contributions to R&D. MAbs 2020, 12, 1754999.
- EFPIA. The Pharmaceutical Industry in Figures: Key Data 2021. Available online: https://www.efpia.eu/publications/downloads/efpia/the-pharmaceutical-industry-in-figures-2021/ (accessed on 28 July 2022).
- Rameez, S.; Mostafa, S.S.; Miller, C.; Shukla, A.A. High-throughput miniaturized bioreactors for cell culture process development: Reproducibility, scalability, and control. Biotechnol. Prog. 2014, 30, 718–727.
- Clarke, C.; Doolan, P.; Barron, N.; Meleady, P.; O’Sullivan, F.; Gammell, P.; Melville, M.; Leonard, M.; Clynes, M. Predicting cell-specific productivity from CHO gene expression. J. Biotechnol. 2011, 151, 159–165.
- Barberi, G.; Benedetti, A.; Diaz-Fernandez, P.; Sévin, D.C.; Vappiani, J.; Finka, G.; Bezzo, F.; Barolo, M.; Facco, P. Integrating metabolome dynamics and process data to guide cell line selection in biopharmaceutical process development. Metab. Eng. 2022, 72, 353–364.
- Facco, P.; Zomer, S.; Rowland-Jones, R.C.; Marsh, D.; Diaz-Fernandez, P.; Finka, G.; Bezzo, F.; Barolo, M. Using data analytics to accelerate biopharmaceutical process scale-up. Biochem. Eng. J. 2020, 164, 107791.
- Ahuja, S.; Jain, S.; Ram, K. Application of multivariate analysis and mass transfer principles for refinement of a 3-L bioreactor scale-down model-when shake flasks mimic 15,000-L bioreactors better. Biotechnol. Prog. 2015, 31, 1370–1380.
- Goldrick, S.; Holmes, W.; Bond, N.J.; Lewis, G.; Kuiper, M.; Turner, R.; Farid, S.S. Advanced multivariate data analysis to determine the root cause of trisulfide bond formation in a novel antibody–peptide fusion. Biotechnol. Bioeng. 2017, 114, 2222–2234.
- Sokolov, M.; Ritscher, J.; MacKinnon, N.; Bielser, J.-M.; Brühlmann, D.; Rothenhäusler, D.; Thanei, G.; Soos, M.; Stettler, M.; Souquet, J.; et al. Robust factor selection in early cell culture process development for the production of a biosimilar monoclonal antibody. Biotechnol. Prog. 2017, 33, 181–191.
- Kotidis, P.; Kontoravdi, C. Harnessing the potential of artificial neural networks for predicting protein glycosylation. Metab. Eng. Commun. 2020, 10, e00131.
- Kjeldahl, K.; Bro, R. Some common misunderstandings in chemometrics. J. Chemom. 2010, 24, 558–564.
- Tulsyan, A.; Garvin, C.; Ündey, C. Industrial batch process monitoring with limited data. J. Process Control 2019, 77, 114–133.
- Mercier, S.M.; Diepenbroek, B.; Wijffels, R.H.; Streefland, M. Multivariate PAT solutions for biopharmaceutical cultivation: Current progress and limitations. Trends Biotechnol. 2014, 32, 329–336.
- Maharana, K.; Mondal, S.; Nemade, B. A Review: Data Pre-Processing and Data Augmentation Techniques. Glob. Transit. Proc. 2022, 3, 91–99.
- Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
- Rato, T.J.; Delgado, P.; Martins, C.; Reis, M.S. First Principles Statistical Process Monitoring of High-Dimensional Industrial Microelectronics Assembly Processes. Processes 2020, 8, 1520.
- Chen, Z.S.; Zhu, B.; He, Y.L.; Yu, L.A. A PSO based virtual sample generation method for small sample sets: Applications to regression datasets. Eng. Appl. Artif. Intell. 2017, 59, 236–243.
- Lee, S.S. Noisy replication in skewed binary classification. Comput. Stat. Data Anal. 2000, 34, 165–191.
- Xie, Q.; Dai, Z.; Hovy, E.; Luong, M.; Le, Q.V. Unsupervised Data Augmentation for Consistency Training. Adv. Neural Inf. Process. Syst. 2020, 33, 6256–6268.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- O’Brien, C.M.; Zhang, Q.; Daoutidis, P.; Hu, W.S. A hybrid mechanistic-empirical model for in silico mammalian cell bioprocess simulation. Metab. Eng. 2021, 66, 31–40.
- Tulsyan, A.; Garvin, C.; Ündey, C. Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems. Biotechnol. Bioeng. 2018, 115, 1915–1924.
- Marouf, M.; Machart, P.; Bansal, V.; Kilian, C.; Magruder, D.S.; Krebs, C.F.; Bonn, S. Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks. Nat. Commun. 2020, 11, 166.
- Jimenez del Val, I.; Fan, Y.; Weilguny, D. Dynamics of immature mAb glycoform secretion during CHO cell culture: An integrated modelling framework. Biotechnol. J. 2016, 11, 610–623.
- Narayanan, H.; Sokolov, M.; Morbidelli, M.; Butté, A. A new generation of predictive models: The added value of hybrid models for manufacturing processes of therapeutic proteins. Biotechnol. Bioeng. 2019, 116, 2540–2549.
- Kontoravdi, C.; Pistikopoulos, E.N.; Mantalaris, A. Systematic development of predictive mathematical models for animal cell cultures. Comput. Chem. Eng. 2010, 34, 1192–1198.
- Oliveira, R. Combining first principles modelling and artificial neural networks: A general framework. Comput. Chem. Eng. 2004, 28, 755–766.
- Teixeira, A.; Cunha, A.E.; Clemente, J.J.; Moreira, J.L.; Cruz, H.J.; Alves, P.M.; Carrondo, M.J.T.; Oliveira, R. Modelling and optimization of a recombinant BHK-21 cultivation process using hybrid grey-box systems. J. Biotechnol. 2005, 118, 290–303.
- Von Stosch, M.; Oliveira, R.; Peres, J.; Feyo de Azevedo, S. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Comput. Chem. Eng. 2014, 60, 86–101.
- Yang, S.; Navarathna, P.; Ghosh, S.; Bequette, B.W. Hybrid Modeling in the Era of Smart Manufacturing. Comput. Chem. Eng. 2020, 140, 106874.
- Sansana, J.; Joswiak, M.N.; Castillo, I.; Wang, Z.; Rendall, R.; Chiang, L.H.; Reis, M.S. Recent trends on hybrid modeling for Industry 4.0. Comput. Chem. Eng. 2021, 151, 107365.
- Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386–408.
- Teixeira, A.P.; Alves, C.; Alves, P.M.; Carrondo, M.J.T.; Oliveira, R. Hybrid elementary flux analysis/nonparametric modeling: Application for bioprocess control. BMC Bioinform. 2007, 8, 30.
- Yang, A.; Martin, E.; Morris, J. Identification of semi-parametric hybrid process models. Comput. Chem. Eng. 2011, 35, 63–70.
- Kingma, D.P.; Ba, J.L. ADAM: A method for stochastic optimization. arXiv 2015, arXiv:1412.6980.
- Narayanan, H.; Luna, M.; Sokolov, M.; Arosio, P.; Butté, A.; Morbidelli, M. Hybrid Models Based on Machine Learning and an Increasing Degree of Process Knowledge: Application to Capture Chromatographic Step. Ind. Eng. Chem. Res. 2021, 60, 10466–10478.
- Nomikos, P.; MacGregor, J.F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97–108.
- Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
- Valle, S.; Li, W.; Qin, S.J. Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods. Ind. Eng. Chem. Res. 1999, 38, 4389–4401.
- Mehmood, T.; Liland, K.H.; Snipen, L.; Sæbø, S. A review of variable selection methods in Partial Least Squares Regression. Chemom. Intell. Lab. Syst. 2012, 118, 62–69.
- Eriksson, L.; Johansson, E.; Kettaneh-Wold, N.; Trygg, J.; Wikström, C.; Wold, S. Multi-and Megavariate Data Analysis; Umetrics AB: Umeå, Sweden, 2006.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Botton, A.; Barberi, G.; Facco, P. Data Augmentation to Support Biopharmaceutical Process Development through Digital Models—A Proof of Concept. Processes 2022, 10, 1796. https://doi.org/10.3390/pr10091796