Integrating Multiple Single-Cell RNA Sequencing Datasets Using Adversarial Autoencoders
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Preparation and Data Preprocessing
2.2. Determining the Anchor Batch
- (1) Similar to IMGG, an intermediate batch is established as the anchor batch using the balanced mode.
- (2) The batch with the largest standard deviation is selected as the anchor batch. A larger standard deviation indicates greater variability among the cells in the batch, so the batch may cover more cell types.
- (3) The user can manually choose a batch as the anchor batch.
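The selection rules (2) and (3) above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_anchor_batch`, the mode labels `"max_std"` and `"custom"`, and the use of a single standard deviation over all entries of a batch's expression matrix are assumptions made for the example. Plain Python lists stand in for the expression matrices so the sketch has no dependencies; rule (1) is omitted because its details are not given here.

```python
import random
import statistics


def batch_std(matrix):
    """Population standard deviation over all entries of a cells x genes matrix."""
    values = [value for row in matrix for value in row]
    return statistics.pstdev(values)


def select_anchor_batch(batches, mode="max_std", custom=None):
    """Return the name of the anchor batch.

    batches: dict mapping batch name -> list of rows (cells x genes).
    mode:    "max_std" picks the batch with the largest standard deviation
             (assumed here to be one scalar per batch);
             "custom" returns the batch named by the user.
    """
    if mode == "custom":
        if custom not in batches:
            raise KeyError(f"unknown batch: {custom!r}")
        return custom
    if mode == "max_std":
        stds = {name: batch_std(matrix) for name, matrix in batches.items()}
        return max(stds, key=stds.get)
    raise ValueError(f"unsupported mode: {mode!r}")


def make_batch(rng, n_cells, n_genes, sigma):
    """Toy batch: Gaussian values with spread sigma (synthetic, for illustration)."""
    return [[rng.gauss(0.0, sigma) for _ in range(n_genes)] for _ in range(n_cells)]


rng = random.Random(0)
batches = {
    "batch_a": make_batch(rng, 200, 20, sigma=1.0),
    "batch_b": make_batch(rng, 150, 20, sigma=3.0),
    "batch_c": make_batch(rng, 300, 20, sigma=2.0),
}

print(select_anchor_batch(batches))                                   # rule (2)
print(select_anchor_batch(batches, mode="custom", custom="batch_a"))  # rule (3)
```

In this toy data the batch generated with the widest spread is chosen under rule (2), while rule (3) simply returns the user's choice after checking that the batch exists.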
2.3. Correcting the Batch Effect Using an Adversarial Autoencoder
2.3.1. Adversarial Autoencoder Network
2.3.2. Loss Functions
2.3.3. Hyperparameters
2.4. Comparison Methods
2.5. Evaluation Metrics
3. Results
3.1. IMAAE Performance for the Closed Set Scenarios
3.2. IMAAE Performance for the Partial Set Scenarios
3.3. IMAAE Performance for the Open Set Scenarios
3.4. IMAAE Performance on Low-Dimensional Data and Gene Expression Data
3.5. Running Time Comparison
3.6. Additional Experiment 1
3.7. Additional Experiment 2
3.8. Additional Experiment 3
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Consortium, T.M. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 2018, 562, 367–372.
- Svensson, V.; Vento-Tormo, R.; Teichmann, S.A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 2018, 13, 599–604.
- Wang, X.; Liu, J.; Zhang, C.; Wang, S. SSGraphCPI: A Novel Model for Predicting Compound-Protein Interactions Based on Deep Learning. Int. J. Mol. Sci. 2022, 23, 3780.
- Wang, X.; Zhang, Z.; Zhang, C.; Meng, X.; Shi, X.; Qu, P. TransPhos: A Deep-Learning Model for General Phosphorylation Site Prediction Based on Transformer-Encoder Architecture. Int. J. Mol. Sci. 2022, 23, 4263.
- Rozenblatt-Rosen, O.; Stubbington, M.J.; Regev, A.; Teichmann, S.A. The Human Cell Atlas: From vision to reality. Nature 2017, 550, 451–453.
- Hon, C.-C.; Shin, J.W.; Carninci, P.; Stubbington, M.J. The Human Cell Atlas: Technical approaches and challenges. Brief. Funct. Genom. 2018, 17, 283–294.
- Tung, P.-Y.; Blischak, J.D.; Hsiao, C.J.; Knowles, D.A.; Burnett, J.E.; Pritchard, J.K.; Gilad, Y. Batch effects and the effective design of single-cell gene expression studies. Sci. Rep. 2017, 7, 39921.
- Tran, H.T.N.; Ang, K.S.; Chevrier, M.; Zhang, X.; Lee, N.Y.S.; Goh, M.; Chen, J. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 2020, 21, 12.
- Haghverdi, L.; Lun, A.T.; Morgan, M.D.; Marioni, J.C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 2018, 36, 421–427.
- Polański, K.; Young, M.D.; Miao, Z.; Meyer, K.B.; Teichmann, S.A.; Park, J.-E. BBKNN: Fast batch alignment of single cell transcriptomes. Bioinformatics 2020, 36, 964–965.
- Hie, B.; Bryson, B.; Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 2019, 37, 685–691.
- Korsunsky, I.; Millard, N.; Fan, J.; Slowikowski, K.; Zhang, F.; Wei, K.; Baglaenko, Y.; Brenner, M.; Loh, P.-R.; Raychaudhuri, S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 2019, 16, 1289–1296.
- Li, X.; Wang, K.; Lyu, Y.; Pan, H.; Zhang, J.; Stambolian, D.; Susztak, K.; Reilly, M.P.; Hu, G.; Li, M. Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis. Nat. Commun. 2020, 11, 2338.
- Wang, C.; Gao, Y.L.; Liu, J.X.; Kong, X.Z.; Zheng, C.H. Single-cell RNA sequencing data clustering by low-rank subspace ensemble framework. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 1154–1164.
- Zhang, W.; Li, Y.Y.; Zou, X.F. SCCLRR: A Robust Computational Method for Accurate Clustering Single Cell RNA-Seq Data. IEEE J. Biomed. Health Inform. 2021, 25, 247–256.
- Riva, S.G.; Cazzaniga, P.; Tangherloni, A. Integration of Multiple scRNA-Seq Datasets on the Autoencoder Latent Space. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 2155–2162.
- Shao, X.; Liao, J.; Lu, X.; Xue, R.; Ai, N.; Fan, X. scCATCH: Automatic annotation on cell types of clusters from single-cell RNA sequencing data. iScience 2020, 23, 100882.
- Cao, Y.; Wang, X.; Peng, G. SCSA: A cell type annotation tool for single-cell RNA-seq data. Front. Genet. 2020, 11, 490.
- Shao, X.; Yang, H.; Zhuang, X.; Liao, J.; Yang, P.; Cheng, J.; Lu, X.; Chen, H.; Fan, X. scDeepSort: A pre-trained cell-type annotation method for single-cell transcriptomics using deep learning with a weighted graph neural network. Nucleic Acids Res. 2021, 49, e122.
- Wang, D.; Hou, S.; Zhang, L.; Wang, X.; Liu, B.; Zhang, Z. iMAP: Integration of multiple single-cell datasets by adversarial paired transfer networks. Genome Biol. 2021, 22, 63.
- Xiong, L.; Tian, K.; Li, Y.; Zhang, Q. Construction of continuously expandable single-cell atlases through integration of heterogeneous datasets in a generalized cell-embedding space. bioRxiv 2021.
- Lotfollahi, M.; Wolf, F.A.; Theis, F.J. scGen predicts single-cell perturbation responses. Nat. Methods 2019, 16, 715–721.
- Zheng, G.X.; Terry, J.M.; Belgrader, P.; Ryvkin, P.; Bent, Z.W.; Wilson, R.; Ziraldo, S.B.; Wheeler, T.D.; McDermott, G.P.; Zhu, J. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 2017, 8, 14049.
- Wang, Y.J.; Schug, J.; Won, K.-J.; Liu, C.; Naji, A.; Avrahami, D.; Golson, M.L.; Kaestner, K.H. Single-cell transcriptomics of the human endocrine pancreas. Diabetes 2016, 65, 3028–3038.
- Baron, M.; Veres, A.; Wolock, S.L.; Faust, A.L.; Gaujoux, R.; Vetere, A.; Ryu, J.H.; Wagner, B.K.; Shen-Orr, S.S.; Klein, A.M. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 2016, 3, 346–360.e4.
- Lawlor, N.; George, J.; Bolisetty, M.; Kursawe, R.; Sun, L.; Sivakamasundari, V.; Kycia, I.; Robson, P.; Stitzel, M.L. Single-cell transcriptomes identify human islet cell signatures and reveal cell-type–specific expression changes in type 2 diabetes. Genome Res. 2017, 27, 208–222.
- Muraro, M.J.; Dharmadhikari, G.; Grün, D.; Groen, N.; Dielen, T.; Jansen, E.; van Gurp, L.; Engelse, M.A.; Carlotti, F.; de Koning, E.J. A single-cell transcriptome atlas of the human pancreas. Cell Syst. 2016, 3, 385–394.e3.
- Grün, D.; Muraro, M.J.; Boisset, J.-C.; Wiebrands, K.; Lyubimova, A.; Dharmadhikari, G.; van den Born, M.; van Es, J.; Jansen, E.; Clevers, H. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 2016, 19, 266–277.
- Wang, X.; Zhang, C.; Zhang, Y.; Meng, X.; Zhang, Z.; Shi, X.; Song, T. IMGG: Integrating Multiple Single-Cell Datasets through Connected Graphs and Generative Adversarial Networks. Int. J. Mol. Sci. 2022, 23, 2082.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Agarap, A.F. Deep learning using rectified linear units (ReLU). arXiv 2018, arXiv:1803.08375.
- Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
- Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218.
- McDaid, A.F.; Greene, D.; Hurley, N. Normalized mutual information to evaluate overlapping community finding algorithms. arXiv 2011, arXiv:1110.2515.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
- van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Data | Method | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Pancreas | IMAAE | −0.1798 | 0.6930 | 0.8731 | 0.0091 | 0.9064 | 0.9468 | 0.0295 | 0.9183 | 0.9437 |
Pancreas | scGen | −0.1293 | 0.3339 | 0.5154 | −0.0019 | 0.7491 | 0.8572 | 0.0302 | 0.8359 | 0.8979 |
Pancreas | MNN | −0.1055 | 0.1786 | 0.3075 | 0.0095 | 0.5468 | 0.7046 | 0.0310 | 0.7667 | 0.8561 |
Pancreas | iMAP | −0.0630 | 0.2087 | 0.3488 | −0.0015 | 0.8659 | 0.9288 | 0.0296 | 0.8259 | 0.8923 |
Pancreas | SCALEX | −0.0442 | 0.2879 | 0.4514 | 0.0083 | 0.5870 | 0.7375 | 0.0182 | 0.7424 | 0.8455 |
PBMC | IMAAE | 0.0084 | 0.3405 | 0.5069 | 0.0095 | 0.8172 | 0.8955 | 0.0059 | 0.8842 | 0.9359 |
PBMC | scGen | 0.0092 | 0.3449 | 0.5116 | 0.0094 | 0.7031 | 0.8224 | 0.0059 | 0.8436 | 0.9127 |
PBMC | MNN | 0.0140 | 0.1912 | 0.3203 | 0.0100 | 0.6280 | 0.7685 | 0.0063 | 0.7678 | 0.8662 |
PBMC | iMAP | 0.0068 | 0.1732 | 0.2949 | 0.0098 | 0.5621 | 0.7171 | 0.0062 | 0.7321 | 0.8431 |
PBMC | SCALEX | 0.0065 | 0.2385 | 0.3846 | 0.0085 | 0.5343 | 0.6944 | 0.0052 | 0.7285 | 0.8411 |
PBMC subset2 | IMAAE | 0.0094 | 0.4378 | 0.6072 | 0.0094 | 0.7946 | 0.8818 | 0.0059 | 0.8655 | 0.9253 |
PBMC subset2 | scGen | 0.0180 | 0.4181 | 0.5865 | 0.0115 | 0.8094 | 0.8900 | 0.0053 | 0.8477 | 0.9153 |
PBMC subset2 | MNN | 0.0154 | 0.1954 | 0.3261 | 0.0131 | 0.6796 | 0.8049 | 0.0061 | 0.7794 | 0.8737 |
PBMC subset2 | iMAP | 0.0094 | 0.1466 | 0.2554 | 0.0148 | 0.5967 | 0.7432 | 0.0075 | 0.7022 | 0.8225 |
PBMC subset2 | SCALEX | 0.0438 | 0.2075 | 0.3409 | 0.0060 | 0.6185 | 0.7625 | 0.0018 | 0.6810 | 0.8096 |
Data | Method | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Pancreas | IMAAE (Mean) | −0.1798 | 0.6930 | 0.8731 | 0.0091 | 0.9064 | 0.9468 | 0.0295 | 0.9183 | 0.9437 |
Pancreas | IMAAE (Max. Std) | −0.1573 | 0.4617 | 0.6601 | −0.0023 | 0.7794 | 0.8769 | 0.0302 | 0.8619 | 0.9127 |
Pancreas | IMAAE (Custom) | −0.1618 | 0.6087 | 0.7989 | 0.0100 | 0.8988 | 0.9422 | 0.0294 | 0.8811 | 0.9237 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, X.; Zhang, C.; Wang, L.; Zheng, P. Integrating Multiple Single-Cell RNA Sequencing Datasets Using Adversarial Autoencoders. Int. J. Mol. Sci. 2023, 24, 5502. https://doi.org/10.3390/ijms24065502