CPEQA: A Large Language Model Based Knowledge Base Retrieval System for Chinese Confidentiality Knowledge Question Answering
Abstract
1. Introduction
- Combining knowledge base retrieval techniques with LLMs, we designed CPEQA, an intelligent question-answering system for Chinese confidentiality publicity and education. To help CPEQA better understand questions, we used a topic-word generation technique that embeds keywords, such as the relevant provisions of confidentiality law and the confidentiality level, into the vectorization of the user's question. Experimental results show that this design significantly improves semantic search accuracy.
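The keyword-embedding idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: `embed` is a toy stand-in for a real sentence encoder (e.g., Sentence-BERT [11]), and the blending weight `alpha` is a hypothetical parameter.

```python
def embed(text):
    # Stand-in for a real sentence encoder; here a normalized
    # character-frequency vector, for illustration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def keyword_augmented_embedding(question, keywords, alpha=0.7):
    """Blend the question vector with the mean of the keyword
    vectors, so domain terms (law provisions, confidentiality
    levels, ...) pull the query toward the right region."""
    q = embed(question)
    if not keywords:
        return q
    k_vecs = [embed(k) for k in keywords]
    k_mean = [sum(col) / len(k_vecs) for col in zip(*k_vecs)]
    return [alpha * a + (1 - alpha) * b for a, b in zip(q, k_mean)]
```

In the real system the blended vector, rather than the plain question vector, is what gets compared against the knowledge base.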
- Similarity calculation over an unstructured text corpus is another highlight of our work. We used a pretrained model [11] to vectorize the keywords and computed their similarity against the existing word vectors in the FAISS database. This process mitigates the problem of inaccurate similarity scores caused by the difference in length between the question text and the retrieved answers.
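The similarity lookup can be illustrated with a brute-force cosine search; this is a sketch only — the actual system uses a FAISS index, and the vectors and `k` below are hypothetical.

```python
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def search_similar(query_vec, index_vecs, k=5):
    """Return the indices of the k most similar stored vectors,
    mimicking the top-k nearest-neighbour search FAISS performs."""
    scored = [(cosine_similarity(query_vec, v), i)
              for i, v in enumerate(index_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```

Because cosine similarity normalizes by vector length, a short keyword vector can still score highly against a long answer vector, which is the length-mismatch issue the paper highlights.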
- We also integrated conventional database retrieval techniques and LLMs into CPEQA's database query subsystem, enabling real-time queries and data analysis. Building on the existing system architecture, we extracted the annotation set of the table-creation (DDL) statements, vectorized it, and stored it in the knowledge base alongside the statements themselves. During similarity computation, we compared users' questions with the annotation set to locate the corresponding table-creation statements, from which we derived prompts. These prompts were then fed to the LLMs to generate the SQL queries needed to retrieve the relevant data from the database. Experimental results demonstrate that our method performs well on both single-table and multi-table query tasks.
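The annotation-to-SQL pipeline can be sketched end to end. Everything below is illustrative: the example tables, the token-overlap matcher (the real system compares dense vectors), the prompt wording, and `llm_generate_sql` (a stub for an actual LLM call) are all assumptions.

```python
# Hypothetical knowledge base: annotation -> table-creation (DDL) statement.
KNOWLEDGE_BASE = {
    "personnel secrecy level and department": (
        "CREATE TABLE personnel (name TEXT, department TEXT, secrecy_level TEXT);"
    ),
    "document classification and review date": (
        "CREATE TABLE documents (doc_id TEXT, classification TEXT, review_date DATE);"
    ),
}

def match_annotation(question):
    """Toy matcher: pick the annotation with the largest word overlap.
    The real system matches dense vectors, not raw tokens."""
    q_words = set(question.lower().split())
    return max(KNOWLEDGE_BASE, key=lambda ann: len(q_words & set(ann.split())))

def construct_prompt(question, ddl):
    return (f"Given the table definition:\n{ddl}\n"
            f"Write a SQL query answering: {question}")

def llm_generate_sql(prompt):
    # Stub standing in for the LLM call; returns a canned query.
    return "SELECT name FROM personnel WHERE secrecy_level = 'high';"

def answer_data_question(question):
    ddl = KNOWLEDGE_BASE[match_annotation(question)]
    return llm_generate_sql(construct_prompt(question, ddl))
```

The key design choice mirrored here is that similarity is computed against the human-readable annotations, not the DDL text itself, since annotations are phrased much closer to natural-language questions.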
2. Related Work
2.1. Knowledge Base Question Answering System
2.2. LLMs-Based Question Answering System
2.3. LLMs-Based Database Query
3. Preliminaries on Large Language Models
3.1. Large Language Model
3.2. Prompt Engineering
4. Methodology
4.1. Question Answering System Architecture and Details
Algorithm 1 LLM-Based Question Answering System Architecture
Input: unstructured_doc, structured_QA_pairs. Output: final_answer.
1: Knowledge Base Construction Process
2: • Pre-process the unstructured text
3: • Vectorize the text chunks and store the vectors
4: • Store the structured QA pairs in the database
5: Knowledge Base Question Answering
6: • Initialization
7: • Input the question and vectorize it: q_vec = model.encode(Q)
8: • Similarity comparison and candidate-text acquisition: candidate_texts = index.search_similar(q_vec, k=5)
9: • Construct a prompt from each candidate text ("Do you know about …")
10: • Generate and output the answers: answer = model.generate_answer(prompt)
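The answering phase of Algorithm 1 can be sketched in Python. This is a toy illustration: `generate` is a stub for the model call, and the template only paraphrases the "Do you know about …" pattern named in the algorithm.

```python
def construct_prompts(question, candidate_texts):
    """One prompt per retrieved candidate, following the
    'Do you know about ...' template in Algorithm 1."""
    return [f"Do you know about: {text}\nQuestion: {question}"
            for text in candidate_texts]

def generate_answers(prompts, generate):
    # `generate` stands in for model.generate_answer in Algorithm 1.
    return [generate(p) for p in prompts]
```

A usage sketch: `generate_answers(construct_prompts(q, top5), llm_call)` yields one candidate answer per retrieved passage, which the system then prints alongside the passage.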
4.2. Database Query Architecture and Details
Algorithm 2 LLM-Based Database Query System Architecture
Input: unstructured_doc. Output: data.
1: Knowledge Base Construction
2: • Extract the annotations of the table-creation (DDL) statements: annotation = extract_annotation(unstructured_doc)
3: • Split the annotations by word: words = split_annotation(annotation)
4: • Vectorize the annotations and DDL statements and store them in the database: create_vector(…)
5: Data Query Process
6: • Vector similarity matching between the question and the annotations: index = matching(…)
7: • Acquire the corresponding DDL via the matched annotation index
8: • Construct the prompt: prompt = construct_prompt(…)
9: • Generate the SQL with the LLM and query the database: sql = generate_sql(prompt); data = query_database(sql)
5. Experiments
5.1. Implementation Details
- ROUGE-L [66] measures recall: how much of the reference sentence appears in the prediction, computed from Longest Common Subsequence (LCS) statistics.
- BLEU [67] measures precision: how much of the prediction appears in the reference sentence. BLEU-1 (B1), BLEU-2 (B2), BLEU-3 (B3), and BLEU-4 (B4) use 1-grams through 4-grams, respectively.
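Both metrics can be computed with a short script. This is a simplified sketch, not the evaluation code used in the paper: BLEU is shown only as clipped unigram precision (no brevity penalty or higher-order n-grams), and ROUGE-L only as LCS recall.

```python
from collections import Counter

def bleu1(prediction, reference):
    """Clipped unigram precision: fraction of predicted words
    that also occur in the reference."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return overlap / len(pred)

def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_recall(prediction, reference):
    """LCS-based recall: how much of the reference is covered
    by the prediction, in order."""
    pred, ref = prediction.split(), reference.split()
    return lcs_length(pred, ref) / len(ref) if ref else 0.0
```

For reported numbers, the standard implementations of these metrics should be used; the sketch only makes the precision-vs-recall distinction between the two metrics concrete.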
5.2. LLMs-Based Question Answering System
5.2.1. Demonstration of Question Answering System
5.2.2. Case Study Process
5.2.3. Influence of Generated Keywords
5.2.4. The Performance of the CPEQA Question Answering System
5.3. LLMs-Based Database Query
5.3.1. Demonstration of Database Query Interface
5.3.2. Database Query System Performance
6. Discussion and Future Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using siamese BERT-networks. arXiv 2019, arXiv:1908.10084.
- Fellbaum, C. WordNet: An Electronic Lexical Database; MIT Press: Cambridge, MA, USA, 1998.
- Kruse, P.M.; Naujoks, A.; Rösner, D.; Kunze, M. Clever search: A WordNet based wrapper for internet search engines. arXiv 2005, arXiv:cs/0501086.
- Rocha, C.; Schwabe, D.; Aragao, M.P. A hybrid approach for searching in the semantic web. In Proceedings of the 13th International Conference on World Wide Web, New York, NY, USA, 17–20 May 2004; pp. 374–383.
- Airio, E.; Järvelin, K.; Saatsi, P.; Kekäläinen, J.; Suomela, S. CIRI—an ontology-based query interface for text retrieval. In Proceedings of the 11th Finnish Artificial Intelligence Conference, Vantaa, Finland, 2–3 September 2004.
- Wang, H.; Shu, K. Explainable claim verification via knowledge-grounded reasoning with large language models. arXiv 2023, arXiv:2310.05253.
- Petroni, F.; Rocktäschel, T.; Lewis, P.; Bakhtin, A.; Wu, Y.; Miller, A.H.; Riedel, S. Language models as knowledge bases? arXiv 2019, arXiv:1909.01066.
- Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.; Madotto, A.; Fung, P. Survey of hallucination in natural language generation. ACM Comput. Surv. 2023, 55, 248:1–248:38.
- Bang, Y.; Cahyawijaya, S.; Lee, N.; Dai, W.; Su, D.; Wilie, B.; Lovenia, H.; Ji, Z.; Yu, T.; Chung, W.; et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv 2023, arXiv:2302.04023.
- Cheung, A.; Kamil, S.; Solar-Lezama, A. Bridging the gap between general-purpose and domain-specific compilers with synthesis. In Proceedings of the 1st Summit on Advances in Programming Languages (SNAPL 2015), Asilomar, CA, USA, 3–6 May 2015; Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
- Luo, K.; Lin, F.; Luo, X.; Zhu, K. Knowledge base question answering via encoding of complex query graphs. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2185–2194.
- Zhu, S.; Cheng, X.; Su, S. Knowledge-based question answering by tree-to-sequence learning. Neurocomputing 2020, 372, 64–72.
- Miller, A.; Fisch, A.; Dodge, J.; Karimi, A.H.; Bordes, A.; Weston, J. Key-value memory networks for directly reading documents. arXiv 2016, arXiv:1606.03126.
- Luo, H.; Tang, Z.; Peng, S.; Guo, Y.; Zhang, W.; Ma, C.; Dong, G.; Song, M.; Lin, W.; Zhu, Y.; et al. ChatKBQA: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models. arXiv 2023, arXiv:2310.08975.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26.
- Ram, P.; Gray, A.G. Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 931–939.
- Aumüller, M.; Bernhardsson, E.; Faithfull, A. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 2020, 87, 101374.
- Wu, C.; Zhang, X.; Zhang, Y.; Wang, Y.; Xie, W. PMC-LLaMA: Further finetuning LLaMA on medical papers. arXiv 2023, arXiv:2304.14454.
- Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large language models encode clinical knowledge. Nature 2023, 620, 172–180.
- Yunxiang, L.; Zihan, L.; Kai, Z.; Ruilong, D.; You, Z. ChatDoctor: A medical chat model fine-tuned on LLaMA model using medical domain knowledge. arXiv 2023, arXiv:2303.14070.
- Zhang, S.; Sun, Y. Automatically synthesizing SQL queries from input-output examples. In Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), Silicon Valley, CA, USA, 11–15 November 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 224–234.
- Li, H.; Chan, C.Y.; Maier, D. Query from examples: An iterative, data-driven approach to query construction. Proc. VLDB Endow. 2015, 8, 2158–2169.
- Wang, C.; Cheung, A.; Bodik, R. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, Barcelona, Spain, 18–23 June 2017; pp. 452–466.
- Thakkar, A.; Naik, A.; Sands, N.; Alur, R.; Naik, M.; Raghothaman, M. Example-guided synthesis of relational queries. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual, 20–25 June 2021; pp. 1110–1125.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Chen, M.; Tworek, J.; Jun, H.; Yuan, Q.; Pinto, H.P.D.O.; Kaplan, J.; Edwards, H.; Burda, Y.; Joseph, N.; Brockman, G.; et al. Evaluating large language models trained on code. arXiv 2021, arXiv:2107.03374.
- Shanahan, M. Talking about large language models. Commun. ACM 2024, 67, 68–79.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. PaLM: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
- Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv 2020, arXiv:2003.10555.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018, Preprint. Available online: https://api.semanticscholar.org/CorpusID:49313245 (accessed on 20 October 2024).
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, arXiv:1910.13461.
- Nakano, R.; Hilton, J.; Balaji, S.; Wu, J.; Ouyang, L.; Kim, C.; Hesse, C.; Jain, S.; Kosaraju, V.; Saunders, W.; et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv 2021, arXiv:2112.09332.
- Anil, R.; Dai, A.M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. PaLM 2 technical report. arXiv 2023, arXiv:2305.10403.
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288.
- Wen, Y.; Wang, Z.; Sun, J. MindMap: Knowledge graph prompting sparks graph of thoughts in large language models. arXiv 2023, arXiv:2308.09729.
- Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35.
- Tonmoy, S.; Zaman, S.; Jain, V.; Rani, A.; Rawte, V.; Chadha, A.; Das, A. A comprehensive survey of hallucination mitigation techniques in large language models. arXiv 2024, arXiv:2401.01313.
- Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying large language models and knowledge graphs: A roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599.
- Liu, J.; Liu, A.; Lu, X.; Welleck, S.; West, P.; Bras, R.L.; Choi, Y.; Hajishirzi, H. Generated knowledge prompting for commonsense reasoning. arXiv 2021, arXiv:2110.08387.
- Dong, Q.; Li, L.; Dai, D.; Zheng, C.; Wu, Z.; Chang, B.; Sun, X.; Xu, J.; Sui, Z. A survey on in-context learning. arXiv 2022, arXiv:2301.00234.
- Zhou, Y.; Muresanu, A.I.; Han, Z.; Paster, K.; Pitis, S.; Chan, H.; Ba, J. Large language models are human-level prompt engineers. arXiv 2022, arXiv:2211.01910.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. arXiv 2022, arXiv:2210.03629.
- Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Chowdhery, A.; Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv 2022, arXiv:2203.11171.
- Zhang, Z.; Zhang, A.; Li, M.; Smola, A. Automatic chain of thought prompting in large language models. arXiv 2022, arXiv:2210.03493.
- Wang, Z.; Zhang, H.; Li, C.L.; Eisenschlos, J.M.; Perot, V.; Wang, Z.; Miculicich, L.; Fujii, Y.; Shang, J.; Lee, C.Y.; et al. Chain-of-table: Evolving tables in the reasoning chain for table understanding. arXiv 2024, arXiv:2401.04398.
- Hu, H.; Lu, H.; Zhang, H.; Song, Y.Z.; Lam, W.; Zhang, Y. Chain-of-symbol prompting elicits planning in large language models. arXiv 2023, arXiv:2305.10276.
- Zhao, X.; Li, M.; Lu, W.; Weber, C.; Lee, J.H.; Chu, K.; Wermter, S. Enhancing zero-shot chain-of-thought reasoning in large language models through logic. arXiv 2023, arXiv:2309.13339.
- Dhuliawala, S.; Komeili, M.; Xu, J.; Raileanu, R.; Li, X.; Celikyilmaz, A.; Weston, J. Chain-of-verification reduces hallucination in large language models. arXiv 2023, arXiv:2309.11495.
- Yu, W.; Zhang, H.; Pan, X.; Ma, K.; Wang, H.; Yu, D. Chain-of-note: Enhancing robustness in retrieval-augmented language models. arXiv 2023, arXiv:2311.09210.
- Li, X.; Zhao, R.; Chia, Y.K.; Ding, B.; Joty, S.; Poria, S.; Bing, L. Chain-of-knowledge: Grounding large language models via dynamic knowledge adapting over heterogeneous sources. arXiv 2023, arXiv:2305.13269.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
- Li, J.; Li, G.; Li, Y.; Jin, Z. Structured chain-of-thought prompting for code generation. arXiv 2023, arXiv:2305.06599.
- Chen, W.; Ma, X.; Wang, X.; Cohen, W.W. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv 2022, arXiv:2211.12588.
- Nye, M.; Andreassen, A.J.; Gur-Ari, G.; Michalewski, H.; Austin, J.; Bieber, D.; Dohan, D.; Lewkowycz, A.; Bosma, M.; Luan, D.; et al. Show your work: Scratchpads for intermediate computation with language models. arXiv 2021, arXiv:2112.00114.
- Diao, S.; Wang, P.; Lin, Y.; Zhang, T. Active prompting with chain-of-thought for large language models. arXiv 2023, arXiv:2302.12246.
- Paranjape, B.; Lundberg, S.; Singh, S.; Hajishirzi, H.; Zettlemoyer, L.; Ribeiro, M.T. ART: Automatic multi-step reasoning and tool-use for large language models. arXiv 2023, arXiv:2303.09014.
- Yang, C.; Wang, X.; Lu, Y.; Liu, H.; Le, Q.V.; Zhou, D.; Chen, X. Large language models as optimizers. arXiv 2023, arXiv:2309.03409.
- Lin, C.Y. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004; pp. 74–81.
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; pp. 311–318.
- Wang, J.; Hu, X.; Hou, W.; Chen, H.; Zheng, R.; Wang, Y.; Yang, L.; Huang, H.; Ye, W.; Geng, X.; et al. On the robustness of ChatGPT: An adversarial and out-of-distribution perspective. arXiv 2023, arXiv:2302.12095.
- Zhu, K.; Wang, J.; Zhou, J.; Wang, Z.; Chen, H.; Wang, Y.; Yang, L.; Ye, W.; Zhang, Y.; Gong, N.Z.; et al. PromptBench: Towards evaluating the robustness of large language models on adversarial prompts. arXiv 2023, arXiv:2306.04528.
| Type | Model Name | Release Time | Training Dataset |
|---|---|---|---|
| Encoder-Only | BERT [11] | 2018 | BookCorpus, Wikipedia |
| Encoder-Only | ALBERT | 2019 | BookCorpus, Wikipedia |
| Encoder-Only | XLNet | 2019 | BookCorpus, Wikipedia |
| Decoder-Only | GPT-1 [36] | 2018 | BookCorpus |
| Decoder-Only | GPT-2 [37] | 2019 | Reddit outbound |
| Encoder–Decoder | T5 (Base) | 2019 | Common Crawl |
| Encoder–Decoder | mT5 (Base) | 2020 | New Common Crawl-based dataset |
| Encoder–Decoder | BART (Base) [38] | 2019 | Corrupting text |
| GPT Family | GPT-3 [27] | 2020 | Common Crawl, WebText2 |
| GPT Family | GPT-4 [32] | 2023 | − |
| GPT Family | WebGPT [39] | 2021 | ELI5 |
| PaLM Family | PaLM [33] | 2022 | GitHub Code, Web documents |
| PaLM Family | PaLM-2 [40] | 2023 | Web documents |
| PaLM Family | Med-PaLM | 2022 | HealthSearchQA |
| LLaMA Family | LLaMA 1 | 2023 | Online Sources |
| LLaMA Family | LLaMA 2 [41] | 2023 | Online Sources |
| LLaMA Family | LongLLaMA | 2023 | − |
| LLaMA Family | Koala | 2023 | − |
| LLaMA Family | Alpaca | 2023 | GPT-3.5 |
| Application | Prompt Technique | LLMs | Dataset | Metrics |
|---|---|---|---|---|
| Reasoning and Logic | Self-Consistency [51] | PaLM | GSM8K | Precision |
| Reasoning and Logic | CoT [49] | GPT-4 | Game of 24 | Success Rate |
| Reasoning and Logic | Auto-CoT [52] | Llama 2-70B | GSM8K | Precision |
| Reasoning and Logic | Chain of Table [53] | T5-large | GSM8K | Rouge |
| Reasoning and Logic | CoS [54] | GPT-3.5 | TabFact | BLEU, Rouge |
| Reasoning and Logic | LogiCoT [55] | GPT-3 | Arithmetic | Precision |
| Reduce Hallucination | CoVe [56] | Llama 65B | Wikidata | Precision |
| Reduce Hallucination | ReAct [50] | PaLM-540B | HotpotQA | Precision |
| Reduce Hallucination | RAG [16] | RAG-Token | MSMARCO | Rouge, BLEU |
| Reduce Hallucination | CoN [57] | Llama 2 | TriviaQA | F1 Score |
| Reduce Hallucination | CoK [58] | GPT-3.5 | MMLU Physics and Biology | Precision |
| New Tasks without Training Data | Zero-shot [59] | GPT-2 | Arithmetic, Symbolic | Rouge |
| New Tasks without Training Data | Few-shot [27] | GPT-3 | NaturalQS, WebQS | Precision |
| Code Generation and Execution | SCoT [60] | ChatGPT, Codex | HumanEval, MBPP, MBCPP | pass@k |
| Code Generation and Execution | PoT [61] | GPT-3.5-turbo | GSM8K, FinQA | Exact Match (EM) Score |
| Code Generation and Execution | CoC [58] | text-davinci-003, GPT-3.5-Turbo | BIG-Bench Hard | Precision |
| Code Generation and Execution | Scratchpad Prompting [62] | GPT-3 | MBPP | Precision |
| User Interaction | Active Prompt [63] | text-davinci-003 | Arithmetic, Symbolic | Self-confidence |
| Fine-Tuning and Optimization | APE [48] | text-davinci-002 | BBII, TruthfulQA | Log Probability, Execution Accuracy |
| Knowledge-Based Reasoning and Generation | ART [64] | GPT-3 (175B) | BigBench, MMLU | Precision |
| Optimization and Efficiency | OPRO [65] | PaLM 2-L-IT | GSM8K, BIG-Bench Hard | Precision |
| | Top1_acc | Top5_acc | Top10_acc |
|---|---|---|---|
| No Keyword Embedding | 0.634 | 0.768 | 0.856 |
| Keyword Embedding | 0.743 | 0.844 | 0.907 |
| | 500 | 1000 | 2000 |
|---|---|---|---|
| BLEU-4 | 0.845 | 0.811 | 0.825 |
| ROUGE-1 | 0.752 | 0.734 | 0.726 |
| ROUGE-L | 0.754 | 0.731 | 0.738 |
| | 500 | 1000 | 2000 |
|---|---|---|---|
| BLEU-4 | 0.997 | 0.989 | 0.952 |
| ROUGE-1 | 0.996 | 0.983 | 0.916 |
| ROUGE-L | 0.994 | 0.981 | 0.891 |
| | 500 | 1000 | 2000 |
|---|---|---|---|
| BLEU-4 | 0.831 | 0.801 | 0.795 |
| ROUGE-1 | 0.732 | 0.726 | 0.700 |
| ROUGE-L | 0.744 | 0.730 | 0.735 |
| | 100 | 500 | 1000 |
|---|---|---|---|
| Precision | 0.948 | 0.927 | 0.933 |
| Recall Rate | 0.912 | 0.872 | 0.895 |
| F1-Score | 0.929 | 0.897 | 0.914 |
| | 100 | 500 | 1000 |
|---|---|---|---|
| Precision | 0.881 | 0.872 | 0.865 |
| Recall Rate | 0.853 | 0.824 | 0.831 |
| F1-Score | 0.867 | 0.847 | 0.848 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cao, J.; Cao, J. CPEQA: A Large Language Model Based Knowledge Base Retrieval System for Chinese Confidentiality Knowledge Question Answering. Electronics 2024, 13, 4195. https://doi.org/10.3390/electronics13214195