Enhancing Personalized Mental Health Support Through Artificial Intelligence: Advances in Speech and Text Analysis Within Online Therapy Platforms
Abstract
1. Introduction
2. Background
3. Materials and Methods
3.1. Materials
3.1.1. Data Preprocessing
3.1.2. Dataset
3.1.3. Archetypal Selection and Training
3.1.4. Language Modeling
3.1.5. Model Architecture
3.2. Tasks and Design
3.2.1. Conceptual Foundation
- Natural language understanding (NLU): This facet of NLP converts user input into a structured representation that algorithms can interpret, identifying the intent behind a given text and the entities it mentions [52].
- Natural language generation (NLG): In contrast to NLU, NLG formulates coherent natural-language responses from the machine's structured understanding of the input [53].
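To make the NLU/NLG split concrete, the sketch below first maps a user utterance to a structured parse made of an intent and its entities. It then renders a reply from that parse. This is a toy illustration under stated assumptions. The keyword rules, the time pattern, the `parse`/`generate` function names and the French reply templates are all hypothetical, and they stand in for the trained models described in this paper. Only the intent names `greet`, `prendreRdv` and `out_of_scope` and the `time` entity are taken from the intent table.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Parse:
    """Structured NLU output: an intent label plus extracted entities."""
    intent: str
    entities: dict = field(default_factory=dict)


# Hypothetical keyword rules standing in for a trained intent classifier.
INTENT_KEYWORDS = {
    "greet": {"bonjour", "salut", "bonsoir"},
    "prendreRdv": {"rendez-vous", "rdv"},
}

# Hypothetical pattern for a clock time such as "14h" or "9h30".
TIME_PATTERN = re.compile(r"\b(\d{1,2})h(\d{2})?\b")

# Hypothetical reply templates keyed by intent (the NLG side).
TEMPLATES = {
    "greet": "Bonjour ! Comment puis-je vous aider ?",
    "prendreRdv": "Très bien, je note un rendez-vous à {time}.",
    "out_of_scope": "Désolé, je n'ai pas compris.",
}


def parse(text: str) -> Parse:
    """NLU step: turn free text into an intent and a dictionary of entities."""
    lowered = text.lower()
    tokens = set(re.findall(r"[\w-]+", lowered))
    intent = "out_of_scope"
    for name, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            intent = name
            break
    entities = {}
    match = TIME_PATTERN.search(lowered)
    if match:
        entities["time"] = match.group(0)
    return Parse(intent, entities)


def generate(result: Parse) -> str:
    """NLG step: render a natural-language reply from the structured parse."""
    if result.intent == "prendreRdv" and "time" not in result.entities:
        # Missing information: ask for it instead of filling the template.
        return "À quelle heure souhaitez-vous votre rendez-vous ?"
    return TEMPLATES[result.intent].format(**result.entities)
```

For example, `parse("Je voudrais un rendez-vous à 14h")` returns `Parse(intent='prendreRdv', entities={'time': '14h'})`. Passing that result to `generate` yields "Très bien, je note un rendez-vous à 14h." The design point is the interface between the two steps rather than the toy rules. NLG never sees raw text, only the intent and entities that NLU produced.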
3.2.2. User Interaction Dynamics
3.2.3. Rasa Architectural Components
- NLU pipeline: Responsible for intent classification, entity extraction, and response selection [60]. It passes user inputs through a trained model to recognize the user's intent and extract the relevant entities.
- Dialogue management: Predicts the next action the assistant should take, based on the current conversation context [61].
- Tracker stores, event brokers, model storage, and lock stores: Together, these components persist the conversation history, forward events to external services, store trained models, and ensure that the messages within a conversation are processed in order.
- domain.yml: A central configuration file that defines all the elements that the assistant can understand and produce. It includes the following:
  1. Responses: The set of utterances the assistant can use in response to user inputs.
  2. Intents: The classifications of user inputs that help the assistant interpret the user’s intentions.
  3. Slots: Variables that store information throughout the conversation, maintaining context and state.
  4. Entities: Information extracted from user inputs that can be used to personalize interactions.
  5. Forms and actions: These enable the assistant to perform tasks and carry out dynamic conversations based on the dialogue flow.
- config.yml: Specifies the machine learning model configurations, guiding the natural language understanding and dialogue management processes.
- data directory: Contains the training data from which the assistant learns its understanding and dialogue behavior. It holds nlu.yml for intent and entity examples, stories.yml for example conversational paths, and rules.yml for fixed dialogue rules.
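The division of labor among domain.yml and the data files can be illustrated by checking that training stories and forms only use what the domain declares. Rasa's own `rasa data validate` command performs checks of this kind. The sketch below is a simplified stand-in. It mirrors the domain sections as a Python dictionary, whereas a real project would parse them from the YAML files. The response names, the slot mapping, the form `rdv_form`, the action `action_confirm_rdv` and the story are all hypothetical. The intents and entities come from the intent table.

```python
"""Simplified consistency checks between a Rasa-style domain and training data.

All names are hypothetical; a real project would load domain.yml and stories.yml.
"""

# Hypothetical domain mirroring the sections of domain.yml.
DOMAIN = {
    "intents": ["greet", "prendreRdv", "affirm", "deny", "goodbye"],
    "entities": ["time", "age"],
    "slots": {
        "time": {"type": "text", "mappings": [{"type": "from_entity", "entity": "time"}]},
    },
    "responses": {
        "utter_greet": [{"text": "Bonjour ! Comment puis-je vous aider ?"}],
        "utter_ask_time": [{"text": "À quelle heure souhaitez-vous venir ?"}],
        "utter_goodbye": [{"text": "Au revoir !"}],
    },
    "forms": {"rdv_form": {"required_slots": ["time"]}},
    "actions": ["action_confirm_rdv"],
}

# Hypothetical story in the spirit of stories.yml: user intents and bot actions.
STORY = [
    ("intent", "greet"),
    ("action", "utter_greet"),
    ("intent", "prendreRdv"),
    ("action", "rdv_form"),
    ("action", "action_confirm_rdv"),
    ("intent", "goodbye"),
    ("action", "utter_goodbye"),
]


def undeclared_steps(domain: dict, story: list) -> list:
    """Return story steps whose intent or action is missing from the domain."""
    known_actions = set(domain["responses"]) | set(domain["forms"]) | set(domain["actions"])
    problems = []
    for kind, name in story:
        if kind == "intent" and name not in domain["intents"]:
            problems.append((kind, name))
        elif kind == "action" and name not in known_actions:
            problems.append((kind, name))
    return problems


def form_problems(domain: dict) -> list:
    """Each required form slot needs a slot declaration and an utter_ask_* response.

    Simplified: Rasa can also ask for a slot through a custom action_ask_* action,
    which this check ignores.
    """
    problems = []
    for form_name, form in domain["forms"].items():
        for slot in form["required_slots"]:
            if slot not in domain["slots"]:
                problems.append(f"{form_name}: slot '{slot}' is not declared")
            asks = {f"utter_ask_{form_name}_{slot}", f"utter_ask_{slot}"}
            if not asks & set(domain["responses"]):
                problems.append(f"{form_name}: no utter_ask response for '{slot}'")
    return problems
```

With the data above, both checks return empty lists. Adding a story step such as `("intent", "annulerRdv")` without declaring that intent would be reported. So would adding `age` to the form's required slots without a matching slot and `utter_ask_age` response.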
3.2.4. Data Preparation and Model Implementation
- (a) Conversational Design and Objective Identification
- (b) Data Acquisition and Conversation Simulation
- (c) NLU Pipeline and Language Model Choices
- (d) Text Tokenization and Featurization
- (e) Part-of-Speech Tagging and Intention Classification
- (f) Intent Definitions and Training Data
- (g) Dialogue Management
- (h) Forms in Conversations
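Steps (d) to (f) can be illustrated with a deliberately simplified pipeline with three parts. Tokenization splits text into words, featurization turns the words into bag-of-words counts, and a nearest-centroid intent classifier scores intents by cosine similarity. This is a hedged stand-in, not the DIET classifier used in the Rasa pipeline. The few French training examples are hypothetical, and so is the 0.3 fallback threshold, which plays a role analogous to Rasa's FallbackClassifier threshold. Only the intent names come from the intent table.

```python
import math
import re
from collections import Counter

# Hypothetical training examples (the real nlu.yml data is larger).
TRAINING = {
    "prendreRdv": ["je veux prendre un rendez-vous", "je voudrais un rendez-vous demain"],
    "annulerRdv": ["je veux annuler mon rendez-vous", "annuler le rendez-vous svp"],
    "greet": ["bonjour", "salut"],
}


def tokenize(text: str) -> list:
    """Lowercase and keep runs of letters, digits, underscores, or hyphens."""
    return re.findall(r"[\w-]+", text.lower())


def featurize(tokens: list) -> Counter:
    """Bag-of-words featurization: token -> count."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(data: dict) -> dict:
    """One summed count vector per intent (same direction as the mean, so cosine is unaffected)."""
    centroids = {}
    for intent, examples in data.items():
        centroid = Counter()
        for example in examples:
            centroid.update(featurize(tokenize(example)))
        centroids[intent] = centroid
    return centroids


def classify(text: str, centroids: dict, threshold: float = 0.3) -> tuple:
    """Return (intent, score); fall back to out_of_scope when the best score is too low."""
    vector = featurize(tokenize(text))
    scores = {intent: cosine(vector, c) for intent, c in centroids.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return "out_of_scope", scores[best]
    return best, scores[best]
```

With `centroids = train(TRAINING)`, the utterance "je souhaite annuler mon rendez-vous" is assigned to `annulerRdv`. The words "annuler" and "mon" outweigh the shared "je" and "rendez-vous". An utterance with no known words falls back to `out_of_scope`.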
3.2.5. Data Management and System Architecture
3.3. Analysis
3.3.1. Evaluation of ASR Performance
3.3.2. NLP System Evaluation
3.3.3. User Feedback and System Refinement
3.3.4. Deployment and Database Integration
3.3.5. Ethical Considerations and Data Privacy
4. Results
4.1. ASR System Performance
4.2. NLP System Evaluation
4.3. Error Analysis and Model Confidence
4.3.1. Model Confidence and Operational Accuracy Metrics
4.3.2. Distribution of Confidence Scores
4.3.3. Summary of Operational Performance Metrics for Voice Assistant
4.4. System Integration and Deployment
4.4.1. Enhancements in Natural Language Understanding (NLU)
4.4.2. Advancements in Dialogue Management
4.4.3. Refinements in Natural Language Generation (NLG)
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AI (artificial intelligence) | The simulation of human intelligence processes by machines, particularly computer systems, including learning, reasoning, and self-correction. |
ASR (automatic speech recognition) | Technology that enables a computer to recognize spoken language and transcribe it into text. |
CBT (cognitive behavioral therapy) | A psychosocial intervention that aims to improve mental health by challenging and changing unhelpful cognitive distortions and behaviors. |
CTC (Connectionist Temporal Classification) | A loss function for training sequence models when the alignment between input and output sequences is unknown, widely used in ASR systems. |
DIET (Dual Intent and Entity Transformer) | A transformer-based model in the Rasa framework that jointly performs intent classification and entity extraction on user inputs. |
LM (language model) | A statistical or deep learning-based model that estimates the probability of a word sequence in a given language. In ASR and NLP systems, it improves accuracy by predicting likely next words or correcting word choices from context. |
MCV (Mozilla Common Voice) | A crowd-sourced dataset developed by Mozilla to support speech recognition research, containing diverse voice recordings in multiple languages, including extensive representation of French dialects and accents. |
NLP (natural language processing) | A subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human (natural) languages. |
NLU (natural language understanding) | A subset of NLP focusing on machine reading comprehension, converting user inputs into a structured format that algorithms can interpret. |
NLG (natural language generation) | The process of producing coherent, natural language text from a machine’s internal representation of information. |
WER (word error rate) | A common metric for ASR performance, computed by aligning the recognized text with a reference transcript and counting word substitutions, deletions, and insertions relative to the number of reference words. |
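WER is the word-level edit distance between the ASR hypothesis and the reference, divided by the number of reference words: WER = (S + D + I) / N. Here S, D and I count substitutions, deletions and insertions, and N is the number of words in the reference. The sketch below computes this with the standard dynamic-programming (Levenshtein) table over words. Lowercasing and whitespace tokenization are simplifying assumptions. Reported WER figures depend on the exact text normalization used, such as punctuation and apostrophe handling, which this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("je veux un rendez-vous", "je veut un rendez-vous")` is 0.25, one substitution over four reference words. Because insertions are counted, WER can exceed 1: "bonjour" recognized as "bonjour à tous" gives 2.0.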
References
- Dahl, G.E.; Yu, D.; Deng, L.; Acero, A. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Trans. Audio Speech Lang. Process. 2012, 20, 30–42.
- Moshe, L.; Terhorst, Y.; Cuijpers, P.; Cristea, I.; Pulkki-Råback, L.; Sander, L. Three decades of internet- and computer-based interventions for the treatment of depression: Protocol for a systematic review and meta-analysis. JMIR Res. Protoc. 2020, 9, e14860.
- Andrews, G.; Cuijpers, P.; Craske, M.G.; McEvoy, P.; Titov, N. Computer therapy for the anxiety and depressive disorders is effective, acceptable and practical health care: A meta-analysis. PLoS ONE 2010, 5, e13196.
- Laranjo, L.; Dunn, A.G.; Tong, H.L.; Kocaballi, A.B.; Chen, J.; Bashir, R.; Surian, D.; Gallego, B.; Magrabi, F.; Lau, A.Y.; et al. Conversational agents in healthcare: A systematic review. J. Am. Med. Inform. Assoc. 2018, 25, 1248–1258.
- Fitzpatrick, K.K.; Darcy, A.; Vierhile, M. Delivering Cognitive Behavior Therapy to Young Adults with Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment. Health 2017, 4, e19.
- Fiske, A.; Henningsen, P.; Buyx, A. Your Robot Therapist Will See You Now: Ethical Implications of Embodied Artificial Intelligence in Psychiatry, Psychology, and Psychotherapy. J. Med. Internet Res. 2019, 21, e13216.
- Parsons, C.E.; Purves, K.L.; Davies, M.R.; Mundy, J.; Bristow, S.; Eley, T.C.; Breen, G.; Hirsch, C.R.; Young, K.S. Seeking help for mental health during the COVID-19 pandemic: A longitudinal analysis of adults’ experiences with digital technologies and services. PLoS Digit. Health 2023, 2, e0000402.
- Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014.
- Amodei, D.; Ananthanarayanan, S.; Anubhai, R.; Bai, J.; Battenberg, E.; Case, C.; Casper, J.; Catanzaro, B.; Cheng, Q.; Chen, G.; et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. In Proceedings of the International Conference on Machine Learning (PMLR), 2016; pp. 173–182. Available online: http://proceedings.mlr.press/v48/amodei16.html (accessed on 1 November 2023).
- Vinyals, O.; Le, Q. A Neural Conversational Model. arXiv 2015, arXiv:1506.05869.
- Mancone, S.; Diotaiuti, P.; Valente, G.; Corrado, S.; Bellizzi, F.; Vilarino, G.T.; Andrade, A. The Use of Voice Assistant for Psychological Assessment Elicits Empathy and Engagement While Maintaining Good Psychometric Properties. Behav. Sci. 2023, 13, 550.
- Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56.
- Jelinek, F. Statistical Methods for Speech Recognition; MIT Press: Cambridge, MA, USA, 1998. Available online: https://mitpress.mit.edu/9780262546607/statistical-methods-for-speech-recognition/ (accessed on 9 November 2023).
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.-R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97.
- Shawar, B.A.; Atwell, E. Chatbots: Are they really useful? J. Lang. Technol. Comput. Linguist. 2007, 22, 29–49.
- Smith, A.C.; Thomas, E.; Snoswell, C.L.; Haydon, H.; Mehrotra, A.; Clemensen, J.; Caffery, L.J. Telehealth for global emergencies: Implications for coronavirus disease 2019 (COVID-19). J. Telemed. Telecare 2020, 26, 309–313.
- Greenhalgh, T.; Wherton, J.; Shaw, S.; Morrison, C. Video consultations for COVID-19. BMJ 2020, 368, m998. Available online: https://www.bmj.com/content/368/bmj.m998 (accessed on 1 November 2023).
- Mann, D.M.; Chen, J.; Chunara, R.; Testa, P.A.; Nov, O. COVID-19 transforms health care through telemedicine: Evidence from the field. J. Am. Med. Inform. Assoc. 2020, 27, 1132–1135.
- Maier, A.; Haderlein, T.; Eysholdt, U.; Rosanowski, F.; Batliner, A.; Schuster, M.; Nöth, E. PEAKS–A system for the automatic evaluation of voice and speech disorders. Speech Commun. 2009, 51, 425–437.
- Bickmore, T.W.; Pfeifer, L.M.; Paasche-Orlow, M.K. Using computer agents to explain medical documents to patients with low health literacy. Patient Educ. Couns. 2009, 75, 315–320.
- Turakhia, M.P.; Desai, M.; Hedlin, H.; Rajmane, A.; Talati, N.; Ferris, T.; Desai, S.; Nag, D.; Patel, M.; Kowey, P.; et al. Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smart-watch: The Apple Heart Study. Am. Heart J. 2019, 207, 66–75.
- Woebot. Available online: https://woebothealth.com/ (accessed on 1 November 2023).
- Vaidyam, A.N.; Wisniewski, H.; Halamka, J.D.; Keshavan, M.S.; Torous, J.B. Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Can. J. Psychiatry 2019, 64, 456–464.
- Inkster, B.; Sarda, S.; Subramanian, V. An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation mixed-methods study. JMIR mHealth uHealth 2018, 6, e12106.
- Schachter, S.; Singer, J. Cognitive, social, and physiological determinants of emotional state. Psychol. Rev. 1962, 69, 379.
- Lucas, G.M.; Gratch, J.; King, A.; Morency, L.P. It’s only a computer: Virtual humans increase willingness to disclose. Comput. Hum. Behav. 2014, 37, 94–100.
- Bickmore, T.W.; Picard, R.W. Establishing and maintaining long-term human-computer relationships. ACM Trans. Comput. Hum. Interact. 2005, 12, 293–327.
- Aubourg, T.; Demongeot, J.; Renard, F.; Provost, H.; Vuillerme, N. Association between social asymmetry and depression in older adults. A phone Call Detail Records analysis. Sci. Rep. 2019, 9, 13524.
- Graham, S.; Depp, C.; Lee, E.E.; Nebeker, C.; Tu, X.; Kim, H.-C.; Jeste, D.V. Artificial Intelligence for Mental Health and Mental Illnesses: An Overview. Curr. Psychiatry Rep. 2019, 21, 116.
- Javed, A.R.; Saadia, A.; Mughal, H.; Gadekallu, T.R.; Rizwan, M.; Maddikunta, P.K.R.; Mahmud, M.; Liyanage, M.; Hussain, A. Artificial Intelligence for Cognitive Health Assessment: State-of-the-Art, Open Challenges and Future Directions. Cogn. Comput. 2023, 15, 1767–1812.
- Schulte-Frankenfeld, P.M.; Trautwein, F.M. App-based mindfulness meditation reduces perceived stress and improves self-regulation in working university students: A randomised controlled trial. Appl. Psychol. Health Well-Being 2022, 14, 1151–1171.
- Denecke, K.; Abd-Alrazaq, A.; Househ, M. Artificial intelligence for chatbots in mental health: Opportunities and challenges. In Multiple Perspectives on Artificial Intelligence in Healthcare: Opportunities and Challenges; Househ, M., Borycki, E., Kushniruk, A., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 115–128.
- Haque, M.D.R.; Rubya, S. An overview of chatbot-based mobile mental health apps: Insights from app description and user reviews. JMIR mHealth uHealth 2023, 11, e44838.
- Klöppel, S.; Kotschi, M.; Peter, J.; Egger, K.; Hausner, L.; Frölich, L.; Förster, A.; Heimbach, B.; Normann, C.; et al.; for the Alzheimer’s Disease Neuroimaging Initiative. Separating symptomatic Alzheimer’s disease from depression based on structural MRI. J. Alzheimer’s Dis. 2018, 63, 353–363.
- Straw, I.; Callison-Burch, C. Artificial Intelligence in mental health and the biases of language based models. PLoS ONE 2020, 15, e0240376.
- Anmella, G.; Sanabra, M.; Primé-Tous, M.; Segú, X.; Cavero, M.; Morilla, I.; Grande, I.; Ruiz, V.; Mas, A.; Martin-Villalba, I.; et al. Vickybot, a chatbot for anxiety-depressive symptoms and work-related burnout in primary care and health care professionals: Development, feasibility, and potential effectiveness studies. J. Med. Internet Res. 2023, 25, e43293.
- Ghatak, S.; Hrithik, P.; Debmitra, G. Voicebot For Mental Disease Prediction and Treatment Recommendation Using Machine Learning. TechRxiv 2023.
- Davis, S.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 357–366.
- van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A Generative Model for Raw Audio. arXiv 2016, arXiv:1609.03499.
- Mozilla. Common Voice: French Dataset. Available online: https://commonvoice.mozilla.org/fr/datasets (accessed on 6 November 2024).
- Fadel, W.; Araf, I.; Bouchentouf, T.; Buvet, P.A.; Bourzeix, F.; Bourja, O. Which French speech recognition system for assistant robots? In Proceedings of the 2nd International Conference on Innovative Research in Applied Science, Engineering & Technology (IRASET), Meknes, Morocco, 3–4 March 2022; IEEE Press: New York, NY, USA, 2022; pp. 1–5.
- Panayotov, V.; Chen, G.; Povey, D.; Khudanpur, S. Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; IEEE Press: New York, NY, USA, 2015; pp. 5206–5210. Available online: https://ieeexplore.ieee.org/abstract/document/7178964/ (accessed on 1 November 2023).
- Kuchaiev, O.; Li, J.; Nguyen, H.; Hrinchuk, O.; Leary, R.; Ginsburg, B.; Kriman, S.; Beliaev, S.; Lavrukhin, V.; Cook, J.; et al. NeMo: A toolkit for building AI applications using Neural Modules. arXiv 2019, arXiv:1909.09577.
- NVIDIA. STT_FR_QuartzNet15x5. NVIDIA NeMo Model Catalog. Available online: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_quartznet15x5 (accessed on 6 November 2024).
- Majumdar, S.; Balam, J.; Hrinchuk, O.; Lavrukhin, V.; Noroozi, V.; Ginsburg, B. Citrinet: Closing the gap between non-autoregressive and autoregressive end-to-end models for automatic speech recognition. arXiv 2021, arXiv:2104.01721.
- Huang, Y.; Ye, G.; Li, L.; Gong, Y. Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need. In Proceedings of the Interspeech, Brno, Czechia, 30 August–3 September 2021; pp. 1309–1313.
- Graves, A.; Fernández, S.; Gomez, F.; Schmidhuber, J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), Pittsburgh, PA, USA, 25–29 June 2006; ACM Press: New York, NY, USA, 2006; pp. 369–376.
- Sharma, R.K.; Joshi, M. An analytical study and review of open source chatbot framework, rasa. Int. J. Eng. Res. 2020, 9, 1011–1014.
- Heafield, K. KenLM: Faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, Edinburgh, UK, 30–31 July 2011; pp. 187–197. Available online: https://aclanthology.org/W11-2123.pdf (accessed on 1 November 2023).
- Chen, S.F.; Goodman, J. An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 1999, 13, 359–394.
- Ardila, R.; Branson, M.; Davis, K.; Henretty, M.; Kohler, M.; Meyer, J.; Morais, R.; Saunders, L.; Tyers, F.M.; Weber, G. Common Voice: A Massively-Multilingual Speech Corpus. arXiv 2020, arXiv:1912.06670.
- Hirschberg, J.; Manning, C.D. Advances in natural language processing. Science 2015, 349, 261–266.
- Reiter, E.; Dale, R. Building applied natural language generation systems. Nat. Lang. Eng. 1997, 3, 57–87.
- Dhiman, D.B. Artificial Intelligence and Voice Assistant in Media Studies: A Critical Review. SSRN 2022. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4250795 (accessed on 2 November 2023).
- Dinesh, R.S.; Surendran, R.; Kathirvelan, D.; Logesh, V. Artificial Intelligence based Vision and Voice Assistant. In Proceedings of the 2022 International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 16–18 March 2022; IEEE Press: New York, NY, USA, 2022; pp. 1478–1483. Available online: https://ieeexplore.ieee.org/abstract/document/9751819/ (accessed on 2 November 2023).
- Gupta, J.N.; Forgionne, G.A.; Mora, M. Intelligent Decision-Making Support Systems: Foundations, Applications and Challenges; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007.
- Kadali, B.; Prasad, N.; Kudav, P.; Deshpande, M. Home Automation Using Chatbot and Voice Assistant. In ITM Web of Conferences; EDP Sciences, 2020; p. 01002. Available online: https://www.itm-conferences.org/articles/itmconf/abs/2020/02/itmconf_icacc2020_01002/itmconf_icacc2020_01002.html (accessed on 2 November 2023).
- Patel, D.; Msosa, Y.J.; Wang, T.; Mustafa, O.G.; Gee, S.; Williams, J.; Roberts, A.; Dobson, R.J.; Gaughran, F. An implementation framework and a feasibility evaluation of a clinical decision support system for diabetes management in secondary mental healthcare using CogStack. BMC Med. Inform. Decis. Mak. 2022, 22, 100.
- Chen, H.; Liu, X.; Yin, D.; Tang, J. A Survey on Dialogue Systems: Recent Advances and New Frontiers. SIGKDD Explor. Newsl. 2017, 19, 25–35.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; 2017; Volume 30. Available online: https://proceedings.neurips.cc/paper/7181-attention-is-all-you-need (accessed on 2 November 2023).
- Serban, I.; Sordoni, A.; Bengio, Y.; Courville, A.; Pineau, J. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. Available online: https://ojs.aaai.org/index.php/AAAI/article/view/9883 (accessed on 2 November 2023).
- Cassell, J.; Bickmore, T.; Billinghurst, M.; Campbell, L.; Chang, K.; Vilhjálmsson, H.; Yan, H. Embodiment in conversational interfaces: Rea. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: The CHI Is the Limit (CHI ’99), Pittsburgh, PA, USA, 15–20 May 1999; ACM Press: New York, NY, USA, 1999; pp. 520–527.
- Følstad, A.; Taylor, C. Investigating the user experience of customer service chatbot interaction: A framework for qualitative analysis of chatbot dialogues. Qual. User Exp. 2021, 6, 6.
- Delorme, J.; Charvet, V.; Wartelle, M.; Lion, F.; Thuillier, B.; Mercier, S.; Soria, J.-C.; Azoulay, M.; Besse, B.; Massard, C.; et al. Natural Language Processing for Patient Selection in Phase I or II Oncology Clinical Trials. JCO Clin. Cancer Inform. 2021, 5, 709–718.
- Explosion AI. spaCy French Language Models. Available online: https://spacy.io/models/fr (accessed on 6 November 2024).
- Vincent, M.; Douillet, M.; Lerner, I.; Neuraz, A.; Burgun, A.; Garcelon, N. Using deep learning to improve phenotyping from clinical reports. Stud. Health Technol. Inform. 2022, 290, 282–286.
- Honnibal, M.; Montani, I. spaCy 2: Natural Language Understanding with Bloom Embeddings, Convolutional Neural Networks and Incremental Parsing. In Proceedings of the Association for Computational Linguistics (ACL), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 688–697.
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python; O’Reilly Media: Sebastopol, CA, USA, 2009. Available online: https://www.oreilly.com/library/view/natural-language-processing/9780596803346/ (accessed on 9 November 2023).
- Bocklisch, T.; Faulkner, J.; Pawlowski, N.; Nichol, A. Rasa: Open Source Language Understanding and Dialogue Management. arXiv 2017, arXiv:1712.05181.
- Gaur, G.; Moh, M.; Zhang, L.; Lin, H. The effects of automatic speech recognition quality on human transcription latency. In Proceedings of the 2016 Conference of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016.
- Morris, A.C.; Maier, V.; Green, P. From WER and RIL to MER and WIL: Improved evaluation measures for connected speech recognition. In Proceedings of the Eighth International Conference on Spoken Language Processing, Jeju Island, Republic of Korea, 4–8 October 2004.
- Grinberg, M. Flask Web Development: Developing Web Applications with Python; O’Reilly Media Inc.: Sebastopol, CA, USA, 2018.
- Guazzaroni, G. Virtual and Augmented Reality in Mental Health Treatment; IGI Global: Hershey, PA, USA, 2018.
- Wrzesien, M.; Burkhardt, J.M.; Raya, M.A.; Botella, C. Mixing psychology and HCI in evaluation of augmented reality mental health technology. In Proceedings of the CHI ’11 Extended Abstracts on Human Factors in Computing Systems, Vancouver, BC, Canada, 7–12 May 2011; ACM Press: New York, NY, USA, 2011; pp. 2119–2124.
- Le Glaz, A.; Haralambous, Y.; Kim-Dufor, D.H.; Lenca, P.; Billot, R.; Ryan, T.C.; Marsh, J.; DeVylder, J.; Walter, M.; Berrouiguet, S.; et al. Machine Learning and Natural Language Processing in Mental Health: Systematic Review. J. Med. Internet Res. 2021, 23, e15708.
- Niculescu, A.; van Dijk, B.; Nijholt, A.; Li, H.; See, S.L. Making social robots more attractive: The effects of voice pitch, humor and empathy. Int. J. Soc. Robot. 2013, 5, 171–191.
- Funk, B.; Sadeh-Sharvit, S.; Fitzsimmons-Craft, E.E.; Trockel, M.T.; Monterubio, G.E.; Goel, N.J.; Balantekin, K.N.; Eichen, D.M.; Flatt, R.E.; Firebaugh, M.-L.; et al. A Framework for Applying Natural Language Processing in Digital Health Interventions. J. Med. Internet Res. 2020, 22, e13855.
- Abd-Alrazaq, A.; AlSaad, R.; Aziz, S.; Ahmed, A.; Denecke, K.; Househ, M.; Farooq, F.; Sheikh, J. Wearable artificial intelligence for anxiety and depression: Scoping review. J. Med. Internet Res. 2023, 25, e42672.
- Wadle, L.M.; Ebner-Priemer, U.W.; Foo, J.C.; Yamamoto, Y.; Streit, F.; Witt, S.H.; Frank, J.; Zillich, L.; Limberger, M.F.; Ablimit, A.; et al. Speech Features as Predictors of Momentary Depression Severity in Patients With Depressive Disorder Undergoing Sleep Deprivation Therapy: Ambulatory Assessment Pilot Study. JMIR Ment. Health 2024, 11, e49222.
Intent | Definition | Entities | Training Data Count |
---|---|---|---|
goodbye | User wishes to say farewell | - | 8 |
greet | User greets the assistant | - | 8 |
affirm | User confirms something | - | 9 |
deny | User refuses or denies something | - | 4 |
informApp | User seeks information about the application | - | 14 |
informPacks | User inquires about the application’s packages | - | 17 |
bot_challenge | User asks if they are speaking to a bot or a human | - | 3 |
prendreRdv | User requests an appointment | time | 41 |
changerRdv | User requests to change their appointment | time | 14 |
annulerRdv | User requests to cancel their appointment | - | 18 |
raterRdv | User missed their appointment | - | 6 |
informerRdv | User inquires about confirmed appointments | - | 11 |
info_date | User asks for the date of their appointment | time | 9 |
IdK | User responds with ‘I don’t know’ | - | 3 |
ageUser | User provides their age | age | 9 |
raisonEmotion | User explains the reason behind an undesirable emotion | - | 4 |
entenduApp | User says how they heard about the service | - | 9 |
emotion_therapy | User explains why they need therapy | - | 16 |
gerer_sentiment | User describes how they manage their feelings | - | 6 |
out_of_scope | Input that falls outside the topics the assistant was designed to handle | - | 6 |
Intent/Entity | Precision | Recall | F1-Score | Support | Confused with |
---|---|---|---|---|---|
time | 42.86% | 33.33% | 37.50% | 9 | - |
age | 0.00% | 0.00% | 0.00% | 5 | - |
gerer_sentiment | 50.00% | 33.33% | 40.00% | 6 | emotion_therapy (2), out_of_scope (1) |
informerRdv | 43.75% | 63.64% | 51.85% | 11 | raterRdv (1), annulerRdv (1) |
prendreRdv | 90.24% | 90.24% | 90.24% | 41 | informerRdv (2), out_of_scope (1) |
emotion_therapy | 60.00% | 56.25% | 58.06% | 16 | entenduApp (2), raterRdv (1) |
goodbye | 23.08% | 37.50% | 28.57% | 8 | greet (3), informerRdv (1) |
raterRdv | 42.86% | 50.00% | 46.15% | 6 | emotion_therapy (2), changerRdv (1) |
greet | 20.00% | 12.50% | 15.38% | 8 | goodbye (4), affirm (2) |
deny | 27.27% | 42.86% | 33.33% | 7 | goodbye (2), emotion_therapy (1) |
info_date | 66.67% | 22.22% | 33.33% | 9 | changerRdv (2), raterRdv (1) |
informPacks | 76.19% | 94.12% | 84.21% | 17 | informApp (1) |
affirm | 20.00% | 20.00% | 20.00% | 10 | goodbye (3), deny (3) |
informApp | 73.33% | 78.57% | 75.86% | 14 | informPacks (3) |
out_of_scope | 60.00% | 50.00% | 54.55% | 6 | informerRdv (1), prendreRdv (1) |
annulerRdv | 89.47% | 94.44% | 91.89% | 18 | deny (1) |
entenduApp | 33.33% | 11.11% | 16.67% | 9 | affirm (4), informApp (2), emotion_therapy (1) |
changerRdv | 64.29% | 64.29% | 64.29% | 14 | informerRdv (3), info_date (1) |
ageUser | 100.00% | 77.78% | 87.50% | 9 | affirm (1), deny (1) |
Overall (weighted avg.) | 64.24% | 63.64% | 62.75% | 209 | - |
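Each row of the per-intent results table follows from three counts per label: true positives (TP), false positives (FP) and false negatives (FN). Precision is TP/(TP+FP) and recall is TP/(TP+FN). F1 is their harmonic mean, 2PR/(P+R), which simplifies to 2TP/(2TP+FP+FN). For example, the `time` entity row is consistent with TP = 3, FP = 4 and FN = 6, since 3/7 = 42.86% and 3/9 = 33.33% with support TP + FN = 9. These counts are back-derived from the reported percentages, not taken from the authors' raw evaluation output.

```python
def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 (as fractions) from per-label counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall, written directly in terms of the counts.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Counts back-derived from the `time` row (support = TP + FN = 9).
p, r, f1 = prf(tp=3, fp=4, fn=6)
print(f"{p:.2%} {r:.2%} {f1:.2%}")  # 42.86% 33.33% 37.50%
```

The same function explains the `age` row. When none of the five `age` samples are recovered (TP = 0), precision, recall and F1 are all 0%, whatever the number of false positives.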
Metric | Value | Interpretation |
---|---|---|
Accuracy | 63.64% | Overall percentage of test samples whose intent was predicted correctly. |
Precision (macro avg) | 55.33% | Unweighted mean of per-intent precision (the share of predictions for an intent that are correct); every intent counts equally, regardless of its support. |
Recall (macro avg) | 52.87% | Unweighted mean of per-intent recall (the share of an intent's test samples that the model recognizes). |
F1-score (macro avg) | 52.45% | Unweighted mean of the per-intent F1-scores, each being the harmonic mean of precision and recall. |
Support (macro avg) | 209 | Total number of test samples over which the metrics above are computed. |
Precision (weighted avg) | 64.24% | Per-intent precision averaged with weights proportional to each intent's support. |
Recall (weighted avg) | 63.64% | Per-intent recall weighted by support; for single-label classification, this equals accuracy. |
F1-score (weighted avg) | 62.75% | Per-intent F1-score weighted by support, so frequent intents such as prendreRdv influence it more than rare ones. |
Support (weighted avg) | 209 | Total number of test samples considered for the weighted averages. |
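The gap between the macro and weighted averages reflects class imbalance. Macro averaging gives each evaluated intent equal weight. Weighted averaging weights each intent by its support, so well-populated, well-recognized intents such as prendreRdv (41 samples, 90.24% precision) pull the weighted figure up. The sketch below recomputes both averages from the rounded per-intent values in the per-intent results table. The two entity rows, `time` and `age`, are left out because the 17 intent supports alone sum to 209. Small differences in the last digit are expected from rounding.

```python
# (precision %, recall %, support) per intent, copied from the per-intent results table.
ROWS = {
    "gerer_sentiment": (50.00, 33.33, 6),
    "informerRdv": (43.75, 63.64, 11),
    "prendreRdv": (90.24, 90.24, 41),
    "emotion_therapy": (60.00, 56.25, 16),
    "goodbye": (23.08, 37.50, 8),
    "raterRdv": (42.86, 50.00, 6),
    "greet": (20.00, 12.50, 8),
    "deny": (27.27, 42.86, 7),
    "info_date": (66.67, 22.22, 9),
    "informPacks": (76.19, 94.12, 17),
    "affirm": (20.00, 20.00, 10),
    "informApp": (73.33, 78.57, 14),
    "out_of_scope": (60.00, 50.00, 6),
    "annulerRdv": (89.47, 94.44, 18),
    "entenduApp": (33.33, 11.11, 9),
    "changerRdv": (64.29, 64.29, 14),
    "ageUser": (100.00, 77.78, 9),
}


def macro(column: int) -> float:
    """Unweighted mean over intents: every intent counts equally."""
    return sum(row[column] for row in ROWS.values()) / len(ROWS)


def weighted(column: int) -> float:
    """Support-weighted mean: intents with more test samples count more."""
    total = sum(row[2] for row in ROWS.values())
    return sum(row[column] * row[2] for row in ROWS.values()) / total


print(f"precision: macro {macro(0):.2f}, weighted {weighted(0):.2f}")
print(f"recall:    macro {macro(1):.2f}, weighted {weighted(1):.2f}")
```

This yields about 55.32 and 64.24 for precision and 52.87 and 63.64 for recall. These values match the summary table to within rounding. The weighted recall coincides with the accuracy, because support-weighting per-intent recall amounts to dividing the total number of correct predictions by 209.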
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jelassi, M.; Matteli, K.; Ben Khalfallah, H.; Demongeot, J. Enhancing Personalized Mental Health Support Through Artificial Intelligence: Advances in Speech and Text Analysis Within Online Therapy Platforms. Information 2024, 15, 813. https://doi.org/10.3390/info15120813