Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations
Abstract
1. Introduction
- A Novel Recipe Recommendation System: We introduce Pic2Plate, a recipe recommendation framework that combines a vision-language model (VLM) with retrieval-augmented generation (RAG). Pic2Plate takes ingredient images and user preferences as input and recommends recipes that match both.
- Enhanced RAG with Online Retrieval: We extend the RAG framework with an online retrieval mechanism that lets the system update its recipe database as new data becomes available, so that recommendations stay current over time.
- Evaluation of Pic2Plate’s Performance: We evaluate Pic2Plate on two axes: ingredient detection accuracy in the image segmentation module, and recommendation accuracy, that is, how well the recommended recipes match the user's preferences and ingredient inputs. The image segmentation module was tested on a mix of images captured with smartphone cameras and images sourced from the internet, to check robustness across diverse input types.
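The online retrieval idea in the second contribution can be illustrated with a toy example. The sketch below is not the Pic2Plate implementation: the `Recipe` and `RecipeStore` classes, the Jaccard-overlap scoring, the `min_score` threshold, and the `fetch_online` callback are all hypothetical names and choices made for illustration. It assumes that the system first searches a local recipe store and, when nothing matches well enough, pulls new recipes from an online source and adds them to the store for later queries.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    title: str
    ingredients: frozenset


def overlap(query, recipe):
    """Jaccard similarity between the query ingredients and a recipe's ingredients."""
    union = query | recipe.ingredients
    return len(query & recipe.ingredients) / len(union) if union else 0.0


@dataclass
class RecipeStore:
    recipes: list = field(default_factory=list)

    def search(self, query, k=3, min_score=0.5):
        """Return up to k recipes whose overlap with the query is at least min_score."""
        scored = sorted(
            ((overlap(query, r), r) for r in self.recipes),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [r for score, r in scored[:k] if score >= min_score]

    def add(self, new_recipes):
        """Add recipes whose titles are not already in the store."""
        known = {r.title for r in self.recipes}
        self.recipes.extend(r for r in new_recipes if r.title not in known)


def retrieve(store, query, fetch_online, k=3):
    """Search locally first; on a miss, fetch online, update the store, and retry."""
    hits = store.search(query, k)
    if not hits:
        store.add(fetch_online(query))  # hypothetical online source
        hits = store.search(query, k)
    return hits


# Toy usage: the local store has no good match, so the online fallback is used.
store = RecipeStore([Recipe("tomato soup", frozenset({"tomato", "onion", "garlic"}))])


def fake_fetch(query):
    # Stand-in for a real web search; returns a fixed recipe for the demo.
    return [Recipe("egg fried rice", frozenset({"egg", "rice", "scallion"}))]


hits = retrieve(store, frozenset({"egg", "rice"}), fake_fetch)
print([r.title for r in hits])
```

A real system would likely replace the set-overlap score with embedding similarity and would also condition on user preferences; the sketch only shows the search, fallback, and update flow.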
2. Related Work
2.1. Vision-Language Models
2.2. Recipe Recommendation System (Textual/Tabular Dataset to Recipe Recommendation)
3. Methodology
3.1. Image to Text Conversion
3.2. Retrieval-Augmented Generation System
Algorithm 1. Pic2Plate recipe recommendation process.
Algorithm 2. Pic2Plate retrieval process.
4. Evaluation
4.1. Ingredient Detection Performance
4.2. Recipe Relevance Measurement
- The number of “Best Votes” contributes to wins for the selected model.
- The number of “Worst Votes” contributes to losses for the selected model.
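One common way to turn such win and loss counts into a ranking is a Bradley-Terry paired-comparison model. The text here does not spell out the scoring procedure, so the sketch below is an assumption rather than the authors' method. It assumes each survey response names one best and one worst model among the candidates, converts each response into pairwise outcomes, and fits strengths with a standard minorization-maximization (MM) update. The model names `X`, `Y`, `Z` and the vote counts are made-up toy data, not survey results.

```python
from collections import defaultdict


def votes_to_pairwise(responses, models):
    """Convert (best, worst) votes into pairwise win counts.

    The best model beats every other model; every model that is neither
    best nor worst beats the worst model.
    """
    wins = defaultdict(int)
    for best, worst in responses:
        for m in models:
            if m != best:
                wins[(best, m)] += 1
            if m != best and m != worst:
                wins[(m, worst)] += 1
    return wins


def fit_bradley_terry(wins, models, iters=200):
    """Estimate Bradley-Terry strengths with the MM update, normalized to sum to 1."""
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            total_wins = sum(wins.get((i, j), 0) for j in models if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = total_wins / denom if denom > 0 else strength[i]
        norm = sum(updated.values())
        strength = {m: v / norm for m, v in updated.items()}
    return strength


# Toy data only: each tuple is (best vote, worst vote) from one respondent.
models = ["X", "Y", "Z"]
responses = [("X", "Y")] * 6 + [("Z", "Y")] * 2 + [("X", "Z")] + [("Y", "X")]

wins = votes_to_pairwise(responses, models)
strength = fit_bradley_terry(wins, models)
for name in sorted(models, key=strength.get, reverse=True):
    print(name, round(strength[name], 3))
```

Under this model, the estimated probability that model i is preferred over model j is strength[i] / (strength[i] + strength[j]), so the fitted values give both a ranking and pairwise preference estimates.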
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Survey Question Examples
Appendix B. The Ingredients Recognition Test Cases
Appendix C. Recipe Recommendation Case for Model A (Llama + RAG)
Appendix C.1. Recipe Recommendation Case 1
Appendix C.2. Recipe Recommendation Case 2
Appendix C.3. Recipe Recommendation Case 3
Appendix C.4. Recipe Recommendation Case 4
Appendix C.5. Recipe Recommendation Case 5
Appendix D. Recipe Recommendation Case for Model B (Llama)
Appendix D.1. Recipe Recommendation Case 1
Appendix D.2. Recipe Recommendation Case 2
Appendix D.3. Recipe Recommendation Case 3
Appendix D.4. Recipe Recommendation Case 4
Appendix D.5. Recipe Recommendation Case 5
Appendix E. Recipe Recommendation Case for Model C (GPT-4o + RAG)
Appendix E.1. Recipe Recommendation Case 1
Appendix E.2. Recipe Recommendation Case 2
Appendix E.3. Recipe Recommendation Case 3
Appendix E.4. Recipe Recommendation Case 4
Appendix E.5. Recipe Recommendation Case 5
Mobile Device | Camera Sensor | Pixel Size (µm) | Resolution (MP) | Focal Length (mm) | Aperture (f-number) |
---|---|---|---|---|---|
Smartphone 1 | Sony | 1.9 | 12 | 26 | 1.5 |
Smartphone 2 | Sony IMX766V | 1.0 | 50 | 10 | 1.8 |
Smartphone 3 | Samsung S5KGN3 | 1.0 | 50 | 23 | 1.8 |
Smartphone 4 | Samsung S5K2LD | 1.8 | 12 | 24 | 1.8 |
Tablet 1 | - | 1.8 | 13 | - | 1.8 |
Tablet 2 | - | 1.0 | 13 | 26 | 2.0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Soekamto, Y.S.; Lim, A.; Limanjaya, L.C.; Purwanto, Y.K.; Lee, S.-H.; Kang, D.-K. Pic2Plate: A Vision-Language and Retrieval-Augmented Framework for Personalized Recipe Recommendations. Sensors 2025, 25, 449. https://doi.org/10.3390/s25020449