Text-Mining-Based Non-Face-to-Face Counseling Data Classification and Management System
Abstract
1. Introduction
2. Related Works (Text Mining)
3. Proposed Scheme
3.1. System Configuration
3.2. System Process
- Data Collection: In the first stage, the counseling data entered by the client are transmitted to the system server via the client server.
- Data Purification: In the second stage, the counseling data are purified to remove unnecessary elements. The data are tokenized into individual words, and stopword removal deletes unimportant words, leaving only the core text required for analysis. The tokens are then vectorized with a Word2Vec word-embedding model so that each word is represented by a semantic vector.
- TF-IDF Weight Calculation: In the third stage, the purified data are used to calculate the weight of each word using TF-IDF. TF-IDF evaluates the significance of words, assigning lower weights to frequently occurring but semantically insignificant words. Each word's embedding vector is then multiplied by its TF-IDF weight so that the meanings of important words are emphasized.
- Analysis and Classification: In the fourth stage, the weighted data are analyzed and classified. Similar words are clustered, and the resulting clusters help identify the primary topic of the counseling data.
- Code Assignment: In the fifth stage, important words within each cluster are grouped, and the resulting values are output based on their order in each group. A code value is then derived by comparing the clustered data with a predefined word dictionary. This code is compared with the classification codes stored in the database, and a final code is assigned to the counseling data. The newly assigned code and the corresponding data are stored in the database for future analysis.
- Result Provision: In the final stage, data with the same code value as the newly assigned code are retrieved from the database and presented to the counselor. This completes the system’s overall process, allowing the counselor to use the provided information to offer appropriate counseling solutions to the client.
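The purification and weighting stages above can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the two-dimensional `TOY_EMBEDDINGS` (a stand-in for vectors from a trained Word2Vec model), the sample sentences, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical stopword list and toy 2-D embeddings (assumptions); a real
# system would use a full stopword list and a trained Word2Vec model.
STOPWORDS = {"i", "my", "me", "the", "a", "and", "with", "is", "to", "at", "about", "had"}
TOY_EMBEDDINGS = {
    "friend": [0.9, 0.1],
    "fight": [0.8, 0.3],
    "mom": [0.1, 0.9],
    "nagging": [0.2, 0.8],
    "school": [0.5, 0.5],
}


def purify(text):
    """Stage 2: lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,!?\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def tf_idf(docs):
    """Stage 3: TF-IDF per document, with tf = count / length and a smoothed idf."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return weights


def weighted_vectors(doc_weights, embeddings):
    """Scale each word's embedding by its TF-IDF weight; skip unknown words."""
    return {
        word: [weight * x for x in embeddings[word]]
        for word, weight in doc_weights.items()
        if word in embeddings
    }


raw = [
    "I had a fight with my friend at school.",
    "My mom is nagging me about school.",
]
docs = [purify(text) for text in raw]
weights = tf_idf(docs)
vectors = [weighted_vectors(w, TOY_EMBEDDINGS) for w in weights]
```

In this toy corpus, "school" appears in both sentences and therefore receives a lower weight than "fight" or "mom", so its scaled vector contributes less to any later clustering step.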
3.3. Dataset
Data Collection and Preprocessing Tasks
- Data Count and Proportion: Each category reflects clients’ main areas of interest, with “Friends” representing the largest portion at 24.3% of the total data, followed by “Personality” and “Appearance” with similar proportions, and “Family” and “Others” trailing. This distribution highlights the predominant topics clients discuss during counseling.
- Main Topics: Key counseling topics within each category were identified to capture representative characteristics. For instance, in the “Friends” category, frequent topics include friendship, conflict, and communication, while the “Appearance” category centers on themes such as appearance assessment, body shape, facial features, and insecurities. In the “Others” category, topics mainly involve academics and school-related concerns.
- Conversation Length: Average, maximum, and minimum conversation lengths were analyzed by category, quantitatively reflecting the complexity of discussions. The “Friends” category has an average conversation length of about 180 words, with some dialogues extending to 400 words, indicating more in-depth discussions around relationships and conflicts. In contrast, the “Family” and “Others” categories have shorter average conversation lengths of 155 and 150 words, respectively, which may suggest more straightforward discussions on family and academic topics.
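As a quick check on the figures above, the following sketch recomputes the category proportions and a count-weighted mean conversation length from the per-category counts and averages reported in the dataset table. The variable names are illustrative, and weighting the mean by data count is an assumption about how the overall average was obtained.

```python
# Per-category counts and average conversation lengths from the dataset table.
counts = {"Friend": 185, "Personality": 177, "Appearance": 176, "Family": 116, "Others": 107}
avg_length = {"Friend": 180, "Personality": 170, "Appearance": 160, "Family": 155, "Others": 150}

total = sum(counts.values())

# Share of each category as a percentage of all records, to one decimal place.
proportions = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Mean conversation length weighted by the number of records per category
# (assumption: the overall average is a count-weighted mean).
weighted_avg = sum(counts[name] * avg_length[name] for name in counts) / total

for name in counts:
    print(f"{name}: {counts[name]} records, {proportions[name]}%")
print(f"Total: {total} records, weighted average length {weighted_avg:.1f} words")
```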
4. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Category | Friend | Personality | Appearance | Family | Others | Total |
---|---|---|---|---|---|---|
Data Count | 185 | 177 | 176 | 116 | 107 | 761 |
Data Proportion | 24.3% | 23.3% | 23.1% | 15.2% | 14.1% | 100% |
Key Topics | Friendship, Conflict, Communication, … | Self-esteem, Confidence, Stress, Self-doubt, … | Body, Face, Complex, Voice, … | Nagging, Conversation, Relationship, Brothers and Sisters, … | Exam, Career, Grades, … | - |
Avg. Conversation Length (words) | 180 | 170 | 160 | 155 | 150 | 165 |
Max Conversation Length (words) | 400 | 390 | 370 | 360 | 350 | 400 |
Min Conversation Length (words) | 60 | 55 | 53 | 52 | 50 | 50 |
No. | Word | TF-IDF | No. | Word | TF-IDF | No. | Word | TF-IDF |
---|---|---|---|---|---|---|---|---|
1 | courage | 0.6051 | 16 | body | 0.3306 | 31 | annoyance | 0.2662 |
2 | running buddy | 0.5304 | 17 | failure | 0.3239 | 32 | complex | 0.2773 |
3 | violence | 0.7657 | 18 | rumor | 0.3237 | 33 | growth | 0.3237 |
4 | teacher | 0.6949 | 19 | misunderstanding | 0.3767 | 34 | laugh | 0.2698 |
5 | friend | 1.0000 | 20 | gossip | 0.3986 | 35 | lie | 0.2769 |
6 | trust | 0.8146 | 21 | grade | 0.3864 | 36 | likeable | 0.6779 |
7 | unfair | 0.6136 | 22 | study | 0.3696 | 37 | pin money | 0.3842 |
8 | homosexuality | 0.5879 | 23 | worry | 0.2920 | 38 | friendship | 0.8489 |
9 | bust | 0.3210 | 24 | face | 0.4709 | 39 | loneliness | 0.9697 |
10 | give up | 0.3209 | 25 | school | 0.4051 | 40 | club | 0.5349 |
11 | pride | 0.5773 | 26 | voice | 0.3364 | 41 | hatred | 0.5668 |
12 | self-doubt | 0.5712 | 27 | mom | 0.3847 | 42 | argument | 0.3498 |
13 | self-esteem | 0.4013 | 28 | protagonist | 0.3956 | 43 | fight | 0.3427 |
14 | appearance | 0.3596 | 29 | letter | 0.3848 | 44 | bullying | 0.2959 |
15 | looks | 0.3573 | 30 | work out | 0.3457 | 45 | ignore | 0.7299 |
····· |
Rank | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
---|---|---|---|---|---|
Word 1 | school | friend | personality | appearance | family |
Word 2 | club | interest | annoyance | armpit | mom |
Word 3 | study | running buddy | hatred | banter | older sister |
Word 4 | teacher | misunderstanding | experience | bust | dad |
Word 5 | pal | gossip | effort | height | nagging |
Word 6 | subjects | friendship | disobey | fashion | relationship |
Word 7 | violence | secret | affirmation | body | lie |
Word 8 | tattle | trust | talkative | gait | conversation |
Word 9 | cigarette | rumor | introvert | face | yearning |
Word 10 | grade | homosexuality | shy | complex | answer |
Category No. | Category Name | Category Keyword |
---|---|---|
01 | School | club pal violence ··· |
02 | Friend | running buddy gossip friendship ··· |
03 | Personality | affirmation annoyance shy ··· |
04 | Appearance | body complex armpit ··· |
05 | Family | nagging conversation relationship ··· |
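The code-assignment step, in which clustered words are compared with a predefined word dictionary, might be sketched as follows. The dictionary holds only the keywords listed in the table above (the elided "···" entries are not reconstructed), and choosing the category with the largest keyword overlap is an assumption; the source does not specify the comparison rule. The name `assign_code` is hypothetical.

```python
# Category dictionary built from the listed keywords only; the elided
# ("···") keywords in the table are deliberately left out.
CATEGORY_DICTIONARY = {
    "01": ("School", {"club", "pal", "violence"}),
    "02": ("Friend", {"running buddy", "gossip", "friendship"}),
    "03": ("Personality", {"affirmation", "annoyance", "shy"}),
    "04": ("Appearance", {"body", "complex", "armpit"}),
    "05": ("Family", {"nagging", "conversation", "relationship"}),
}


def assign_code(cluster_words, dictionary=CATEGORY_DICTIONARY):
    """Return (code, name, overlap) for the category sharing the most keywords
    with the cluster, or None if no keyword matches.

    Assumption: overlap counting is used as the comparison rule; ties go to
    the lowest category number.
    """
    words = set(cluster_words)
    best = None
    for code, (name, keywords) in sorted(dictionary.items()):
        overlap = len(keywords & words)
        if overlap > 0 and (best is None or overlap > best[2]):
            best = (code, name, overlap)
    return best


result = assign_code(["mom", "nagging", "relationship", "lie"])
print(result)  # ('05', 'Family', 2)
```

Records stored with the same code can then be looked up by that code when results are provided to the counselor.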
Actual / Predicted | School | Friend | Personality | Appearance | Family |
---|---|---|---|---|---|
School | 44 | 3 | 1 | 2 | 0 |
Friend | 4 | 40 | 2 | 3 | 1 |
Personality | 2 | 4 | 40 | 1 | 3 |
Appearance | 1 | 4 | 2 | 49 | 0 |
Family | 0 | 2 | 4 | 2 | 42 |
Category | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
School | 0.88 | 0.86 | 0.88 | 0.87 |
Friend | 0.80 | 0.75 | 0.80 | 0.77 |
Personality | 0.80 | 0.82 | 0.80 | 0.81 |
Appearance | 0.98 | 0.86 | 0.87 | 0.86 |
Family | 0.84 | 0.91 | 0.84 | 0.87 |
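For readers who wish to relate the confusion matrix to the per-category metrics, the sketch below derives precision, recall, and F1-score from the matrix counts. It assumes that rows are actual categories and columns are predicted categories; the function name `per_class_metrics` is illustrative, and recomputed values can differ slightly from the reported ones, for example through rounding.

```python
LABELS = ["School", "Friend", "Personality", "Appearance", "Family"]

# Confusion matrix counts; rows are assumed to be actual categories and
# columns predicted categories.
CONFUSION = [
    [44, 3, 1, 2, 0],
    [4, 40, 2, 3, 1],
    [2, 4, 40, 1, 3],
    [1, 4, 2, 49, 0],
    [0, 2, 4, 2, 42],
]


def per_class_metrics(matrix, labels):
    """Compute precision, recall, and F1 for each class of a square confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                   # row total
        predicted = sum(row[i] for row in matrix)  # column total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)

# Overall accuracy: correctly classified records over all records.
correct = sum(CONFUSION[i][i] for i in range(len(LABELS)))
overall_accuracy = correct / sum(sum(row) for row in CONFUSION)

for label in LABELS:
    m = metrics[label]
    print(f"{label}: precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}")
print(f"Overall accuracy: {overall_accuracy:.2f}")
```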
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, W.; Oh, S.; Park, S. Text-Mining-Based Non-Face-to-Face Counseling Data Classification and Management System. Appl. Sci. 2024, 14, 10747. https://doi.org/10.3390/app142210747