Towards Knowledge-Based Tourism Chinese Question Answering System
Abstract
1. Introduction
2. Related Works
3. Approach
3.1. Tourism Knowledge Graph Construction
- The crawler initiator creates a crawler task and adds it to the task queue.
- The crawler manager periodically polls the crawler task queue for pending tasks and assigns each one to a crawler running in a new thread.
- The crawler sends an HTTP request to the URL described in the task and passes the response to the result processor connector. It then waits for the crawler manager to assign it another task.
- After receiving the result, the connector of the crawler result processor starts a processing thread and hands the result to the corresponding result processor.
- The crawler result processor processes the returned data (it extracts the desired content and saves it to the target file). To crawl a deeper URL, a new crawler task can be created during processing and added to the crawler task queue.
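The workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CrawlTask`, `CrawlerManager`, `http_fetch`, and the `process` callback are hypothetical, the manager drains the queue in rounds rather than on a timer, and fetching and result processing share one thread per task instead of using a separate processing thread.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List
from urllib.request import urlopen


@dataclass(frozen=True)
class CrawlTask:
    """One unit of work: a URL plus how deep it sits below the seed page."""
    url: str
    depth: int = 0


def http_fetch(url: str) -> str:
    """Crawler step: send an HTTP GET request and return the decoded body."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class CrawlerManager:
    """Drains the task queue in rounds and runs each task in its own thread."""

    def __init__(self, fetch: Callable[[str], str],
                 process: Callable[[CrawlTask, str], List[CrawlTask]]) -> None:
        self.tasks = queue.Queue()
        self.fetch = fetch          # e.g. http_fetch, or a stub in tests
        self.process = process      # saves content, returns follow-up tasks
        self.failed: List[str] = []

    def submit(self, task: CrawlTask) -> None:
        """Crawler initiator: add a task to the queue."""
        self.tasks.put(task)

    def _drain(self) -> List[CrawlTask]:
        """Take every task currently waiting in the queue."""
        batch = []
        while True:
            try:
                batch.append(self.tasks.get_nowait())
            except queue.Empty:
                return batch

    def _crawl(self, task: CrawlTask) -> None:
        """Fetch one URL, hand the result to the processor, queue follow-ups."""
        try:
            body = self.fetch(task.url)
        except Exception:
            self.failed.append(task.url)  # record the failure and move on
            return
        for follow_up in self.process(task, body):
            self.submit(follow_up)

    def run(self) -> None:
        """Repeat rounds until a round finds the queue empty."""
        while True:
            batch = self._drain()
            if not batch:
                return
            threads = [threading.Thread(target=self._crawl, args=(task,))
                       for task in batch]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()


if __name__ == "__main__":
    # In-memory stand-in for real pages, so the demo needs no network access.
    pages = {"seed": "spot-a spot-b", "spot-a": "", "spot-b": ""}
    saved = {}

    def process(task: CrawlTask, body: str) -> List[CrawlTask]:
        saved[task.url] = body
        if task.depth >= 1:
            return []
        return [CrawlTask(link, task.depth + 1) for link in body.split()]

    manager = CrawlerManager(fetch=pages.__getitem__, process=process)
    manager.submit(CrawlTask("seed"))
    manager.run()
    print(sorted(saved))
```

Passing `http_fetch` as the `fetch` argument would crawl real URLs. The `process` callback is where content matching and saving would go, and returning new `CrawlTask` objects from it corresponds to creating deeper crawler tasks during processing.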
- After all the information relating to Zhejiang tourism has been collected, it must be screened, integrated, and regularized using the following measures:
- Screening: Screening is carried out on two aspects. The first is the selection of attribute entries. Based on our survey of various tourism websites and of the popularity of user queries, we selected the following attributes of scenic spots: scenic spot name, scenic spot category, grade, ticket price, opening time, suitable season for visiting, location, transportation, description, evaluation, contact number, strategy, and the relationships between scenic spots, including the smaller scenic spots they contain. The second aspect is the choice of scenic spots. We ignore scenic spots that have too little information or insufficient popularity; are abandoned; have valuable information for fewer than three of the above attributes; or have attributes that are unclear or too sparse.
- Integration: We first identify the names of the scenic spots we need and crawl the information on these spots from various travel websites. Then, we take the intersection of the information from the different websites as a temporary result, review the description on each website, and add any necessary and valuable information. For missing information, we conduct a secondary retrieval and fill in the information if the scenic spot is popular. If the number of searches for the scenic spot is less than 200, we ignore it. In this way, the information integration is basically completed. If a scenic spot's name is unreasonable or only colloquially defined, the spot is deleted. This filtering is performed manually, and spots are deleted one by one.
- Regularization: The collected scenic spot attribute information is organized into a table. First, regular expressions are used to normalize the information. We then manually read the description of each item to judge whether it is reasonable and fluent. We make slight modifications to unreasonable or non-fluent information, generally at the same time as integration. In most cases, the information provided by the websites is relatively standardized, so most of it does not need to be modified. However, the transportation, description, and evaluation information differs between websites, and the wording may be colloquial in some places. Thus, some of this information must be replaced or deleted.
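The numeric screening rules and the regular-expression normalization can be illustrated with a short Python sketch. Only the two thresholds (at least three informative attributes, at least 200 searches) come from the text. Everything else is an assumption made for illustration: the attribute key names, the helper names `normalize`, `informative_count`, `keep_spot`, and `regularize`, the specific patterns (whitespace collapsing and dash unification in numeric ranges), and the choice to combine both thresholds into one predicate. The source does not say which regular expressions were used or how search counts were obtained.

```python
import re
from typing import Dict

# Attribute entries selected in the screening step (key names are illustrative).
ATTRIBUTES = (
    "name", "category", "grade", "ticket_price", "opening_time",
    "suitable_season", "location", "transportation", "description",
    "evaluation", "contact_number", "strategy", "related_spots",
)
MIN_ATTRIBUTES = 3   # spots with fewer informative attributes are ignored
MIN_SEARCHES = 200   # spots searched fewer than 200 times are ignored

# Assumed normalization patterns; the paper does not list the real ones.
_WHITESPACE = re.compile(r"\s+")
_RANGE_DASH = re.compile(r"(?<=\d)\s*[-~\u2014\uFF0D]\s*(?=\d)")


def normalize(value: str) -> str:
    """Collapse whitespace and unify dash variants between digits to an en dash."""
    collapsed = _WHITESPACE.sub(" ", value).strip()
    return _RANGE_DASH.sub("\u2013", collapsed)


def informative_count(spot: Dict[str, str]) -> int:
    """Count the selected attributes that carry a non-empty value."""
    return sum(1 for key in ATTRIBUTES if normalize(spot.get(key, "")))


def keep_spot(spot: Dict[str, str], search_count: int) -> bool:
    """Apply both numeric thresholds from the screening and integration steps."""
    return (informative_count(spot) >= MIN_ATTRIBUTES
            and search_count >= MIN_SEARCHES)


def regularize(spot: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the spot with every selected attribute normalized."""
    return {key: normalize(spot[key]) for key in ATTRIBUTES if key in spot}
```

A record such as `{"name": "灵隐寺", "grade": "5A", "ticket_price": "75元一位", "opening_time": "全年07:00 - 18:15"}` passes `keep_spot` when its search count is at least 200, and `regularize` rewrites its opening time with an en dash between the two times. The dash rule would also rewrite hyphens inside phone numbers, so a real pipeline would likely apply it per attribute. The manual checks described above (abandoned spots, unreasonable names, colloquial wording) are outside the scope of this sketch.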
3.2. Intention Recognition
3.3. Answer Generation
4. Experiments
4.1. Dataset and Experiment Setup
4.1.1. Tourism Knowledge Graph
4.1.2. Evaluation Dataset
4.2. Parameter Setting
4.3. Evaluation Criterion
4.3.1. Classified Evaluation Criterion
4.3.2. QA Evaluation Criterion
4.4. Performance Comparisons and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Website Name | Scale |
---|---|
Ctrip | 2976 |
Qunar | 3010 |
Cncn | 860 |
Spot Name | Type | Level | Ticket | Opening Hours | Suitable Season |
---|---|---|---|---|---|
西湖 (West Lake) | 自然景观 (natural landscape) | 5A | 免费开放, 景区内一些项目另外收费. (It is free, but some items in the scenic spot are charged separately.) | 全天开放 (Open all day) | 3–5月, 9–11月 (March to May, September to November) |
灵隐寺 (Lingyin Temple) | 寺庙 (temple) | 5A | 75元一位 (75 yuan per person) | 全年07:00–18:15 (07:00–18:15 throughout the year) | 四季皆宜 (All seasons) |
西溪国家湿地公园 (Xixi National Wetland Park) | 自然景观 (natural landscape) | 5A | 80元一位 (80 yuan per person) | 4月1日–10月7日07:30–18:30 10月8日-次年3月31日 08:00–17:30 (07:30–18:30 from 1 April to 7 October, 08:00–17:30 from 8 October to 31 March of the next year) | 春秋最佳 (Best in Spring and Autumn) |
Question Type | Scale | Example |
---|---|---|
Complex question | 1000 | 千岛湖豪华游船和普通游船有什么区别? (What is the difference between a luxury cruise ship and an ordinary cruise ship on Qiandao Lake?) 萧山极乐寺的签文准吗? (Are the fortune slips at Xiaoshan Jile Temple accurate?) 龙门古镇和大奇山国家森林公园附近有什么美食推荐? (What food is recommended near Longmen Ancient Town and Daqishan National Forest Park?) |
Simple question | 2635 | 西湖的开放时间是什么时候? (When is West Lake open?) 杭州动物园的门票多少钱一张? (How much is a ticket to Hangzhou Zoo?) 西溪国家湿地公园在哪里? (Where is Xixi National Wetland Park?) |
Dataset Name | Simple Question Scale | Complex Question Scale | Total |
---|---|---|---|
Classified training data | 2108 | 800 | 2908 |
Classified testing data | 527 | 200 | 727 |
Classification Name | Simple Question Intention Accuracy | Complex Question Intention Accuracy | Total Intention Accuracy |
---|---|---|---|
Our Classification | 94% | 52.7% | 82.4% |
Bayesian classification | 93% | 31.5% | 75.8% |
System Name | Simple Question Intention Accuracy | Complex Question Intention Accuracy | Total Intention Accuracy |
---|---|---|---|
Our System | 96% | 56.3% | 85.6% |
TravelQA | 94% | 43.7% | 79.9% |
Movie-QA-System (Reappearance) | 88.6% | 36.1% | 73.9% |
System Name | Simple Question Accuracy | Simple Question Smoothness | Complex Question Accuracy | Complex Question Smoothness | Total Accuracy |
---|---|---|---|---|---|
Our System | 88.2% | 85.5% | 49.2% | 78.7% | 65.3% |
TravelQA | 82.4% | 83.7% | 30.1% | 76.5% | 56.3% |
Movie-QA-System (Reappearance) | 86.3% | 84.2% | 32.5% | 76.7% | 59.5% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, J.; Luo, Z.; Huang, H.; Ding, Z. Towards Knowledge-Based Tourism Chinese Question Answering System. Mathematics 2022, 10, 664. https://doi.org/10.3390/math10040664