A Joint Batch Correction and Adaptive Clustering Method of Single-Cell Transcriptomic Data
Abstract
1. Introduction
2. Materials and Methods
2.1. The Architecture of DACAL
2.1.1. The AE Module
2.1.2. The DPMM Module
2.1.3. The Adversarial Module
2.1.4. Loss Function
Algorithm 1 The DACAL training algorithm
Input: A scRNA-seq dataset
Output: The cluster label
1: for do:
2:   Sample a mini-batch from the dataset, where
3:   for Step do:
4:     and update with loss
7:   end for
8:   and with loss
9:   if then:
10:    Freeze and update with loss
11:  end if
12: end for
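The control flow of Algorithm 1 can be sketched in plain Python. This is only an illustration of the loop structure that survives in the listing: an outer loop, a sampled mini-batch, an inner loop of repeated updates, one further update, and a conditional update. The symbols naming which module is frozen or updated, the losses, and the loop bounds are not preserved in the listing, so every name below (`inner_update`, `joint_update`, `conditional_update`, `condition`, and all numeric settings) is a hypothetical placeholder, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence

Batch = List[float]


def alternating_training(
    dataset: Sequence[float],
    n_epochs: int,
    batch_size: int,
    inner_steps: int,
    inner_update: Callable[[Batch], None],
    joint_update: Callable[[Batch], None],
    conditional_update: Callable[[Batch], None],
    condition: Callable[[int], bool],
    seed: int = 0,
) -> List[str]:
    """Run the loop skeleton of Algorithm 1 and return the order of updates.

    All callables are hypothetical stand-ins for the update steps whose
    targets and losses are missing from the listing.
    """
    rng = random.Random(seed)
    schedule: List[str] = []
    for epoch in range(n_epochs):                    # line 1: outer loop
        batch = rng.sample(list(dataset), batch_size)  # line 2: mini-batch
        for _ in range(inner_steps):                 # lines 3-7: inner loop
            inner_update(batch)
            schedule.append("inner")
        joint_update(batch)                          # line 8: further update
        schedule.append("joint")
        if condition(epoch):                         # lines 9-11: conditional
            conditional_update(batch)
            schedule.append("conditional")
    return schedule


# Toy run: the update functions only count how often they are called.
calls = {"inner": 0, "joint": 0, "conditional": 0}


def make_counter(name: str) -> Callable[[Batch], None]:
    def update(batch: Batch) -> None:
        calls[name] += 1
    return update


schedule = alternating_training(
    dataset=[float(i) for i in range(20)],
    n_epochs=3,
    batch_size=4,
    inner_steps=2,
    inner_update=make_counter("inner"),
    joint_update=make_counter("joint"),
    conditional_update=make_counter("conditional"),
    condition=lambda epoch: epoch >= 1,  # assumed trigger, for illustration only
)
print(schedule)
```

In a real model, each placeholder would compute its loss on the mini-batch and take one optimizer step on the parameters that are not frozen. The returned schedule makes the alternation easy to inspect before any model code is attached.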
2.2. Training the DACAL Model
2.3. Datasets and Preprocessing
2.4. Comparing Methods and Evaluation Metrics
3. Results
3.1. DACAL Is Robust to Hyperparameter Changes on scRNA-seq Data
3.2. DACAL Can Jointly Remove Batch Effect and Cluster Adaptively on scRNA-seq Data
3.3. DACAL Can Provide Fine-Grained Clusters on scRNA-seq Data with Batch Effect
3.4. Comparisons of Runtime and Memory Usage of DACAL and Other Methods
4. Discussion
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
An, S.; Shi, J.; Liu, R.; Wang, J.; Hu, S.; Dong, G.; Ying, X.; He, Z. A Joint Batch Correction and Adaptive Clustering Method of Single-Cell Transcriptomic Data. Mathematics 2023, 11, 4901. https://doi.org/10.3390/math11244901