Application of Large Language Models and Assessment of Their Ship-Handling Theory Knowledge and Skills for Connected Maritime Autonomous Surface Ships
Abstract
1. Introduction
2. Existing Work
2.1. Autonomous Ship Navigation System
2.2. LLM-Based Autonomous Driving
2.3. Evaluating LLMs with Multiple-Choice Questions
3. System Framework of LLM-Assisted Navigation for MASSs
4. Research Methodology
4.1. OOW Theory Examination
4.2. Test Datasets
4.3. Prompt Design
4.3.1. Instructing the LLMs to Role-Play and Demonstrate Specific Skills
4.3.2. Providing Example MCQs and Answers
4.3.3. Designing Structured Prompts
4.4. LLMs Used in Theory Test
5. Experiments
5.1. Experimental Settings
5.2. Experimental Results and Discussions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ma, S. Economics of Maritime Business; Routledge: London, UK, 2020. [Google Scholar]
- UNCTAD. Review of Maritime Transport 2023, 2023rd ed.; United Nations: San Francisco, CA, USA, 2023. [Google Scholar]
- OECD. Impacts of Russia’s War of Aggression against Ukraine on the Shipping and Shipbuilding Markets; OECD: Paris, France, 2023. [Google Scholar]
- de Vos, J.; Hekkenberg, R.G.; Banda, O.A.V. The impact of autonomous ships on safety at sea—A statistical analysis. Reliab. Eng. Syst. Saf. 2021, 210, 107558. [Google Scholar] [CrossRef]
- StraitsResearch. Global Autonomous Ships Market to Expand at a CAGR of 6.81% by 2031. 2024. Available online: https://straitsresearch.com/press-release/global-autonomous-ships-market-outlook (accessed on 29 July 2024).
- Fenton, A.J.; Chapsos, I. Ships without crews: IMO and UK responses to cybersecurity, technology, law and regulation of maritime autonomous surface ships (MASS). Front. Comput. Sci. 2023, 5, 1151188. [Google Scholar] [CrossRef]
- Thombre, S.; Zhao, Z.; Ramm-Schmidt, H.; García, J.M.V.; Malkamäki, T.; Nikolskiy, S.; Hammarberg, T.; Nuortie, H.; Bhuiyan, M.Z.H.; Särkkä, S.; et al. Sensors and AI techniques for situational awareness in autonomous ships: A review. IEEE Trans. Intell. Transp. Syst. 2020, 23, 64–83. [Google Scholar] [CrossRef]
- Qiao, Y.; Yin, J.; Wang, W.; Duarte, F.; Yang, J.; Ratti, C. Survey of Deep Learning for Autonomous Surface Vehicles in Marine Environments. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3678–3701. [Google Scholar] [CrossRef]
- Issa, M.; Ilinca, A.; Ibrahim, H.; Rizk, P. Maritime autonomous surface ships: Problems and challenges facing the regulatory process. Sustainability 2022, 14, 15630. [Google Scholar] [CrossRef]
- Wright, R.G. Intelligent autonomous ship navigation using multi-sensor modalities. Transnav Int. J. Mar. Navig. Saf. Sea Transp. 2019, 13, 503–510. [Google Scholar] [CrossRef]
- Han, J.; Cho, Y.; Kim, J.; Kim, J.; Son, N.S.; Kim, S.Y. Autonomous collision detection and avoidance for ARAGON USV: Development and field tests. J. Field Robot. 2020, 37, 987–1002. [Google Scholar] [CrossRef]
- Sha, H.; Mu, Y.; Jiang, Y.; Chen, L.; Xu, C.; Luo, P.; Li, S.E.; Tomizuka, M.; Zhan, W.; Ding, M. Languagempc: Large language models as decision makers for autonomous driving. arXiv 2023, arXiv:2310.03026. [Google Scholar]
- Fu, D.; Li, X.; Wen, L.; Dou, M.; Cai, P.; Shi, B.; Qiao, Y. Drive like a human: Rethinking autonomous driving with large language models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024; pp. 910–919. [Google Scholar]
- Ye, J.; Chen, X.; Xu, N.; Zu, C.; Shao, Z.; Liu, S.; Cui, Y.; Zhou, Z.; Gong, C.; Shen, Y.; et al. A comprehensive capability analysis of gpt-3 and gpt-3.5 series models. arXiv 2023, arXiv:2303.10420. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Tang, Z.; Shen, K.; Kejriwal, M. An Evaluation of Estimative Uncertainty in Large Language Models. arXiv 2024, arXiv:2405.15185. [Google Scholar]
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609. [Google Scholar]
- Villa, J.; Aaltonen, J.; Koskinen, K.T. Path-following with lidar-based obstacle avoidance of an unmanned surface vehicle in harbor conditions. IEEE/ASME Trans. Mechatron. 2020, 25, 1812–1820. [Google Scholar] [CrossRef]
- Cockcroft, A.N.; Lameijer, J.N.F. Guide to the Collision Avoidance Rules; Elsevier: Amsterdam, The Netherlands, 2003. [Google Scholar]
- Kufoalor, D.K.M.; Johansen, T.A.; Brekke, E.F.; Hepsø, A.; Trnka, K. Autonomous maritime collision avoidance: Field verification of autonomous surface vehicle behavior in challenging scenarios. J. Field Robot. 2020, 37, 387–403. [Google Scholar] [CrossRef]
- Kim, J.; Lee, C.; Chung, D.; Cho, Y.; Kim, J.; Jang, W.; Park, S. Field experiment of autonomous ship navigation in canal and surrounding nearshore environments. J. Field Robot. 2024, 41, 470–489. [Google Scholar] [CrossRef]
- Cui, C.; Ma, Y.; Cao, X.; Ye, W.; Wang, Z. Receive, Reason, and React: Drive as You Say, With Large Language Models in Autonomous Vehicles. IEEE Intell. Transp. Syst. Mag. 2024, 4, 81–94. [Google Scholar] [CrossRef]
- Duan, Y.; Zhang, Q.; Xu, R. Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs. arXiv 2024, arXiv:2404.04869. [Google Scholar]
- Huang, S.; Zhao, X.; Wei, D.; Song, X.; Sun, Y. Chatbot and Fatigued Driver: Exploring the Use of LLM-Based Voice Assistants for Driving Fatigue. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–8. [Google Scholar]
- Li, W.; Li, L.; Xiang, T.; Liu, X.; Deng, W.; Garcia, N. Can multiple-choice questions really be useful in detecting the abilities of LLMs? arXiv 2024, arXiv:2403.17752. [Google Scholar]
- Zhang, Z.; Xu, L.; Jiang, Z.; Hao, H.; Wang, R. Multiple-Choice Questions are Efficient and Robust LLM Evaluators. arXiv 2024, arXiv:2405.11966. [Google Scholar]
- Zhang, Z.; Lei, L.; Wu, L.; Sun, R.; Huang, Y.; Long, C.; Liu, X.; Lei, X.; Tang, J.; Huang, M. Safetybench: Evaluating the safety of large language models with multiple choice questions. arXiv 2023, arXiv:2309.07045. [Google Scholar]
- Huang, Y.; Bai, Y.; Zhu, Z.; Zhang, J.; Zhang, J.; Su, T.; Liu, J.; Lv, C.; Zhang, Y.; Fu, Y.; et al. C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models. In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: New York, NY, USA, 2023; Volume 36, pp. 62991–63010. [Google Scholar]
- Wu, S.; Koo, M.; Blum, L.; Black, A.; Kao, L.; Fei, Z.; Scalzo, F.; Kurtz, I. Benchmarking Open-Source Large Language Models, GPT-4 and Claude 2 on Multiple-Choice Questions in Nephrology. NEJM AI 2024, 1, AIdbp2300092. [Google Scholar] [CrossRef]
- Dao, X.Q.; Le, N.B.; Ngo, B.B.; Phan, X.D. LLMs’ Capabilities at the High School Level in Chemistry: Cases of ChatGPT and Microsoft Bing Chat. ChemRxiv 2023. [Google Scholar] [CrossRef]
- Sadek, A. The Standards of Training, Certification and Watchkeeping for Seafarers (STCW) Convention 1978. In The International Maritime Organisation; Routledge: London, UK, 2024; pp. 194–213. [Google Scholar]
- Wang, W.; Lv, Q.; Yu, W.; Hong, W.; Qi, J.; Wang, Y.; Ji, J.; Yang, Z.; Zhao, L.; Song, X.; et al. CogVLM: Visual Expert for Pretrained Language Models. arXiv 2023, arXiv:2311.03079. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar]
- Team, G.; Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupatiraju, S.; Pathak, S.; Sifre, L.; Rivière, M.; Kale, M.S.; Love, J.; et al. Gemma: Open models based on gemini research and technology. arXiv 2024, arXiv:2403.08295. [Google Scholar]
- AI@Meta. Llama 3 Model Card. 2024. Available online: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ (accessed on 29 July 2024).
Model Name | Input Price ($/1k Tokens) | Output Price ($/1k Tokens) | Model Size | Version | Creator
---|---|---|---|---|---
Qwen-turbo | 0.0145 | 0.0435 | undisclosed | \ | Alibaba Cloud
ERNIE-4.0-8k | 0.871 | 0.871 | undisclosed | 0329 | Baidu
GPT-3.5-turbo | 0.0005 | 0.0015 | undisclosed | \ | OpenAI
GPT-4 | 0.03 | 0.06 | undisclosed | \ | OpenAI
GPT-4o | 0.005 | 0.015 | undisclosed | \ | OpenAI
GLM-3-turbo | Open Source | Open Source | undisclosed | \ | Tsinghua and Zhipu
GLM-4-Air | Open Source | Open Source | undisclosed | \ | Tsinghua and Zhipu
GLM-4 | Open Source | Open Source | 9B | 0520 | Tsinghua and Zhipu
Qianfan-Chinese-Llama-2-7B | 0.029 | 0.029 | 7B | \ | Qianfan
Qianfan-Chinese-Llama-2-13B | 0.044 | 0.044 | 13B | v1 | Qianfan
Qianfan-Chinese-Llama-2-70B | 0.254 | 0.254 | 70B | \ | Qianfan
Meta-Llama-3-8B | Open Source | Open Source | 8B | Instruct | Meta AI
Meta-Llama-3-70B | Open Source | Open Source | 70B | Instruct | Meta AI
Gemma-7B-it | Open Source | Open Source | 7B | Instruct | Google
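The per-1k-token prices above, combined with the token counts reported in the results tables below, give a rough estimate of what one full evaluation run costs on the API-served models. The following minimal Python sketch is illustrative only and is not part of the paper's code; the helper name run_cost_usd is hypothetical.

```python
# Hypothetical helper (not from the paper): estimate the API cost of one test
# run from per-1k-token prices and the token counts reported in the results tables.
def run_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost = (tokens / 1000) * price per 1k tokens, summed over input and output."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# Example: GPT-4o on the first test set used 292,675 input and 706 output
# tokens at $0.005 / $0.015 per 1k tokens, i.e., roughly $1.47 for the run.
print(f"${run_cost_usd(292_675, 706, 0.005, 0.015):.2f}")
```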
Model Name | Temperature | Top_p | # Max Output Tokens |
---|---|---|---|
Qwen-turbo | 0.5 | 0.7 | 100 |
ERNIE-4.0-8k | 0.5 | 0.7 | 100 |
GPT-3.5-turbo | 0 | 1 | 100 |
GPT-4 | 0 | 1 | 100
GPT-4o | 0 | 1 | 100 |
GLM-3-turbo | 0.5 | 0.7 | 100 |
GLM-4-Air | 0.5 | 0.7 | 100 |
GLM-4 | 0.5 | 0.7 | 100 |
Qianfan-Chinese-Llama-2-7B | 0.5 | 0.7 | 100 |
Qianfan-Chinese-Llama-2-13B | 0.5 | 0.7 | 100 |
Qianfan-Chinese-Llama-2-70B | 0.5 | 0.7 | 100 |
Meta-Llama-3-8B | 0.5 | 0.7 | 100 |
Meta-Llama-3-70B | 0.5 | 0.7 | 100 |
Gemma-7B-it | 0.5 | 0.7 | 100 |
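To make the generation settings above concrete, the sketch below shows how they might be passed to one of the API-served models (GPT-3.5-turbo via the OpenAI Python client), together with a role-play system prompt and a structured multiple-choice question in the spirit of Sections 4.3.1–4.3.3. The prompt wording and the example question are assumptions for illustration and do not reproduce the paper's exact prompts.

```python
# Illustrative sketch: query an API-served model with the table's settings.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_prompt = (
    "You are an experienced officer of the watch (OOW). Answer the following "
    "multiple-choice question on ship handling. Reply with the option letter only."
)
question = (
    "Under the COLREGs, in a crossing situation between two power-driven vessels, "
    "which vessel shall keep out of the way?\n"
    "A. The vessel which has the other on her own starboard side\n"
    "B. The vessel which has the other on her own port side\n"
    "C. Both vessels\n"
    "D. Neither vessel\n"
    "Answer:"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ],
    temperature=0,    # generation settings taken from the table above
    top_p=1,
    max_tokens=100,
)
print(response.choices[0].message.content)  # expected: "A" (COLREGs Rule 15)
```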
Model | # Ques. | # Corr. | Acc. | Time (s) | # Input Tokens | # Output Tokens
---|---|---|---|---|---|---
Qwen-turbo | 706 | 423 | 59.92% | 636.6 | 247,105 | 739 |
ERNIE-4.0-8k | 706 | 412 | 58.36% | 2921.65 | 214,026 | 6925 |
GPT-3.5-turbo | 706 | 316 | 44.76% | 415.86 | 415,338 | 711 |
GPT-4 | 706 | 389 | 55.10% | 529.42 | 415,338 | 733
GPT-4o | 706 | 429 | 60.76% | 339.01 | 292,675 | 706 |
GLM-3-turbo | 706 | 340 | 48.16% | 940.27 | 239,117 | 2134 |
GLM-4-Air | 706 | 352 | 49.86% | 995.76 | 230,267 | 2122 |
GLM-4 | 706 | 371 | 52.55% | 1110.49 | 230,267 | 2133 |
Qianfan-Chinese-Llama-2-7B | 706 | 273 | 38.67% | 2520.14 | 230,108 | 2010
Qianfan-Chinese-Llama-2-13B | 706 | 317 | 44.90% | 6994.99 | 230,108 | 111,678 |
Qianfan-Chinese-Llama-2-70B | 706 | 398 | 56.37% | 5510.65 | 230,108 | 93,757 |
Meta-Llama-3-8B | 706 | 283 | 40.08% | 2235.81 | 230,108 | 709 |
Meta-Llama-3-70B | 706 | 313 | 44.33% | 9015.19 | 230,108 | 116,630 |
Gemma-7B-it | 706 | 282 | 39.94% | 2779.97 | 230,108 | 30,907 |
Model | # Ques. | # Corr. | Acc. | Time (s) | # Input Tokens | # Output Tokens
---|---|---|---|---|---|---
Qwen-turbo | 706 | 333 | 47.17% | 631.79 | 232,279 | 734 |
ERNIE-4.0-8k | 706 | 398 | 56.37% | 3129.01 | 218,968 | 8784 |
GPT-3.5-turbo | 706 | 315 | 44.62% | 442.89 | 260,724 | 713 |
GPT-4 | 706 | 380 | 53.82% | 498.75 | 260,724 | 799
GPT-4o | 706 | 435 | 61.61% | 329.97 | 237,607 | 706 |
GLM-3-turbo | 706 | 336 | 47.59% | 1519.96 | 232,763 | 2375 |
GLM-4-Air | 706 | 356 | 50.42% | 1387.82 | 223,913 | 2116 |
GLM-4 | 706 | 367 | 51.98% | 1391.58 | 223,913 | 2365 |
Qianfan-Chinese-Llama-2-7B | 706 | 312 | 44.19% | 1980.27 | 224,460 | 4461 |
Qianfan-Chinese-Llama-2-13B | 706 | 337 | 47.73% | 4285.56 | 224,460 | 113,071 |
Qianfan-Chinese-Llama-2-70B | 706 | 396 | 56.09% | 5289.31 | 224,460 | 93,379 |
Meta-Llama-3-8B | 706 | 289 | 40.93% | 1843.25 | 224,460 | 706 |
Meta-Llama-3-70B | 706 | 359 | 50.85% | 8680.55 | 224,460 | 116,637 |
Gemma-7B-it | 706 | 294 | 41.64% | 2803.91 | 224,460 | 31,128
Model | # Ques. | # Corr. | Acc. | Time (s) | # Input Tokens | # Output Tokens
---|---|---|---|---|---|---
Qwen-turbo | 814 | 451 | 55.41% | 767.11 | 244,152 | 845 |
ERNIE-4.0-8k | 814 | 549 | 67.44% | 3788.96 | 237,234 | 6011 |
GPT-3.5-turbo | 814 | 467 | 57.37% | 352.25 | 243,151 | 832 |
GPT-4 | 814 | 613 | 75.31% | 547.76 | 243,151 | 814
GPT-4o | 814 | 700 | 86.00% | 366.41 | 243,703 | 814 |
GLM-3-turbo | 814 | 476 | 58.48% | 1533.61 | 252,842 | 2481 |
GLM-4-Air | 814 | 531 | 65.23% | 1250.09 | 239,865 | 2442 |
GLM-4 | 814 | 553 | 67.94% | 1523.23 | 239,878 | 2443 |
Qianfan-Chinese-Llama-2-7B | 814 | 341 | 41.89% | 2364.15 | 242,846 | 2709 |
Qianfan-Chinese-Llama-2-13B | 814 | 393 | 48.28% | 4775.56 | 242,846 | 123,678 |
Qianfan-Chinese-Llama-2-70B | 814 | 486 | 59.71% | 5511.66 | 242,846 | 93,846 |
Meta-Llama-3-8B | 814 | 407 | 50.00% | 2475.05 | 242,846 | 814 |
Meta-Llama-3-70B | 814 | 542 | 66.58% | 8699.69 | 242,846 | 113,623 |
Gemma-7B-it | 814 | 361 | 44.35% | 3588.27 | 242,846 | 32,208 |
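For reference, the accuracy figures in the three tables above follow directly from the correct-answer counts: Acc. = # Corr. / # Ques. A minimal scoring sketch, with hypothetical helper names that are not taken from the paper, might look like this.

```python
# Illustrative scoring sketch: extract the chosen option letter from each model
# reply and compare it with the answer key.
import re

def extract_choice(reply: str) -> str | None:
    """Return the first standalone option letter A-D found in a model reply."""
    match = re.search(r"\b([A-D])\b", reply.strip().upper())
    return match.group(1) if match else None

def accuracy(replies: list[str], answer_key: list[str]) -> float:
    """Fraction of replies whose extracted option matches the answer key."""
    correct = sum(1 for reply, key in zip(replies, answer_key)
                  if extract_choice(reply) == key.upper())
    return correct / len(answer_key)

# Example: 423 correct answers out of 706 questions gives 59.92%, matching
# the Qwen-turbo row of the first results table.
print(f"{423 / 706:.2%}")
```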
Cite as: Pei, D.; He, J.; Liu, K.; Chen, M.; Zhang, S. Application of Large Language Models and Assessment of Their Ship-Handling Theory Knowledge and Skills for Connected Maritime Autonomous Surface Ships. Mathematics 2024, 12, 2381. https://doi.org/10.3390/math12152381