Enhancing Environmental Control in Broiler Production: Retrieval-Augmented Generation for Improved Decision-Making with Large Language Models

Leite, Marcus Vinicius; Abe, Jair Minoro; Souza, Marcos Leandro Hoffmann; de Alencar Nääs, Irenilza

doi:10.3390/agriengineering7010012


by Marcus Vinicius Leite ¹, Jair Minoro Abe ¹, Marcos Leandro Hoffmann Souza ² and Irenilza de Alencar Nääs ¹,*

¹ Graduate Program in Production Engineering, Paulista University, Rua Dr. Bacelar 1212, São Paulo 04026-002, SP, Brazil

² Computer Science, Universidade do Vale do Rio dos Sinos, Av. Unisinos, 950, São Leopoldo 93022-750, RS, Brazil

* Author to whom correspondence should be addressed.

AgriEngineering 2025, 7(1), 12; https://doi.org/10.3390/agriengineering7010012

Submission received: 30 November 2024 / Revised: 20 December 2024 / Accepted: 3 January 2025 / Published: 8 January 2025

(This article belongs to the Special Issue The Future of Artificial Intelligence in Agriculture)


Abstract:

The growing global demand for animal protein, particularly chicken meat, challenges poultry farming to adapt production systems through the adoption of digital technologies. Among the promising advances in artificial intelligence (AI), large language models (LLMs) hold potential to enhance decision-making in broiler production by supporting environmental control through the interpretation of climatic data, the generation of reports to optimize conditions, guidance on ventilation adjustments, recommendations for thermal management, assistance in air quality monitoring, and the translation of simulation results into actionable suggestions to improve bird welfare. For this purpose, the key limitations of LLMs in terms of transparency, accuracy, precision, and relevance must be effectively addressed. This study investigates the impact of retrieval-augmented generation (RAG) on improving LLM precision and relevance for environmental control in broiler production. Experiments with the OpenAI GPT-4o model and semantic similarity analysis were used to evaluate response quality with and without RAG. The results confirmed the approach’s effectiveness while identifying areas for improvement. A paired t-test revealed significantly higher similarity scores with RAG, demonstrating its impact on response quality. This study contributes to the field by advancing RAG-enhanced LLMs for environmental control, addresses market demands by demonstrating how AI improves decision-making for productivity and animal welfare, and benefits society by providing small-scale producers with cost-effective and accessible solutions for actionable insights.

Keywords:

retrieval-augmented generation (RAG); GPT; large language model (LLM); smart poultry farming; precision livestock farming


1. Introduction

Global demand for animal protein, particularly poultry, has dramatically increased due to economic growth and changing dietary preferences, especially in developing regions across Asia and South America. As incomes rise, diets shift toward more frequent animal product consumption, a trend expected to drive a 70% increase in demand by 2050 [1,2,3]. This growth places immense pressure on the livestock sector to scale production while maintaining quality and safety standards. In response, the poultry industry has adopted intensive farming practices, where high-density production is essential for meeting demand. Intensive poultry farming now accounts for approximately 92% of global poultry production, with countries like Brazil exemplifying this shift through large-scale, integrated systems involving over 50,000 producers who follow strict health and safety standards regulated by large corporations [1,2,3,4]. While necessary, this transformation to high-density production introduces new challenges in managing animal health, welfare, and environmental conditions. High-density systems require strict biosecurity and environmental management to prevent disease, maintain productivity, and comply with complex animal welfare standards. Effective environmental control is vital for optimizing growth and feed conversion efficiency while ensuring sustainable production practices. To address these challenges, the poultry industry has increasingly turned to smart poultry farming, a subset of precision livestock farming (PLF) that uses digital technologies to automate and enhance monitoring and management processes. This approach meets modern poultry farming’s rigorous demands by improving productive efficiency, promoting animal welfare, and supporting regulatory compliance [1,3,5,6,7,8].

In smart poultry farming, PLF systems typically perform three main functions: detection and monitoring, data analysis, and decision-making. Detection and monitoring technologies, such as IoT-based environmental sensors, gather extensive data on critical parameters within the poultry house environment, including temperature, humidity, air speed, and gas concentrations. While advancements in detection and monitoring are considerable, the sheer volume of data produced presents significant challenges in the stages of analysis and decision-making. Transforming this raw data into actionable insights that producers can use for real-time adjustments is complex, often requiring advanced analytical tools and expertise that may not be readily accessible to producers [7,8,9,10,11].

Previous work has demonstrated significant advancements in integrating machine learning for real-time health and welfare monitoring in poultry farms, highlighting the critical role of data-driven insights in precision livestock management. For example, computer vision systems have been employed to monitor and predict broiler behaviors and recognize stress-related conditions using technologies such as convolutional neural networks (CNNs) and deep reinforcement learning. Additionally, AI techniques like neural networks and support vector machines (SVMs) have been applied to analyze poultry vocalizations and behaviors, achieving high accuracy in classifying activities and detecting health or welfare issues [12,13,14]. Despite illustrating the potential of machine learning to address key challenges in poultry farming, these efforts have focused mainly on isolated AI applications with limited exploration of contextual knowledge to enhance decision-making. Consequently, they fall short of adequately addressing the complexities of analysis and decision-making stages, which often require sophisticated tools and expertise that are not readily accessible to producers [14].

The current approach to addressing these challenges relies on expert consultants who analyze data, respond to producers’ inquiries, provide guidance, and support decision-making. While these specialists play an important role in interpreting complex datasets and delivering tailored recommendations, their services are often expensive, limiting accessibility for small-scale producers. Moreover, consultancy typically relies on historical data, resulting in outdated insights that reduce the efficiency and effectiveness of recommendations, particularly in the fast-paced environment of poultry farming [10,11,12,13,14,15]. These limitations underscore the need for innovative solutions to enable real-time, cost-effective, and accessible decision-making support for producers.

To address these challenges, large language models (LLMs) are a promising technology with significant potential to generate insights from extensive textual data, leveraging deep learning and natural language processing (NLP) techniques. Built on the Transformer architecture, LLMs—such as OpenAI’s GPT, Google’s BERT, and Meta’s LLaMA—incorporate attention mechanisms that enable efficient contextual understanding of complex linguistic datasets [16,17,18]. These capabilities allow LLMs to summarize documents, generate reports, and answer questions, transforming raw data into intuitive insights supporting decision-making across healthcare, education, finance, agriculture, media, and scientific research [19,20,21,22].

In broiler production, LLMs can support producers in environmental control activities such as monitoring climatic variables by analyzing and interpreting real-time sensor data (e.g., temperature, humidity, and gases) and producing automated reports with recommendations. They can also assist in ventilation control by diagnosing problems, interpreting sensor data, and suggesting possible causes based on known patterns. They can act as a configuration assistant that guides producers in natural language on adjusting fans and evaporative panels based on environmental data and best practices. Additionally, LLMs can aid in heat and cold management by providing suggestions and recommendations to optimize thermal management (e.g., adjusting curtains or fan intensity) using analyses of environmental conditions and historical data. They can support air quality analysis by interpreting gas concentration readings, issuing alerts for unsafe levels, explaining how gas levels impact bird health and performance, and providing solutions based on best practices, as well as generating automated reports on air quality, linking ammonia or carbon dioxide levels to environmental conditions. Finally, LLMs can assist producers in operational adjustments by interpreting and translating simulation results and offering detailed suggestions on priority actions to improve environmental conditions.

Despite their promise, LLMs face substantial limitations that hinder their applicability in critical domains like environmental management in broiler farming. Their “black box” nature limits transparency, reducing user trust, particularly where verifiable justification is required. Additionally, LLMs often lack contextual specificity, generating generalized responses that fail to address the unique conditions of production sites. This limitation becomes more pronounced in rapidly evolving fields, as models trained on static datasets may provide outdated or irrelevant information. Finally, another critical concern is the generation of hallucinations, where LLMs produce plausible but incorrect responses, further undermining reliability [20,22,23].

To mitigate these shortcomings, techniques such as retrieval-augmented generation (RAG) and prompt engineering have been proposed [24,25]. In the RAG approach, a ‘smart retriever’ combines data gathered from external knowledge bases with the user’s query to create enriched input (Figure 1). The LLM then leverages this input to generate accurate, context-sensitive responses [26,27,28]. RAG thus enhances LLMs by integrating a retrieval mechanism that provides relevant and up-to-date information during response generation. This approach aims to improve transparency, mitigate the ‘black box’ limitations of traditional models, and address key challenges such as outdated training data, hallucinations, and limited accuracy [26,27,28,29,30,31,32,33].

However, RAG performance relies on the quality and relevance of the retrieved data, and maintaining updated databases remains resource-intensive. Scalability is another challenge, as computational costs and response times can limit real-time applications despite ongoing advancements in storage and retrieval mechanisms to improve their efficiency and reliability [26,30].

Given these considerations, this study employs an experimental design to investigate whether the RAG technique enhances the precision and relevance of LLM-generated responses for environmental control in broiler poultry farming. We hypothesize that RAG will improve LLM performance by providing contextual information, enabling producers to make informed decisions. The research framework ensures reliable conclusions through controlled variables and reproducible methodologies.

This study contributes to science by advancing RAG-enhanced LLMs for environmental control, demonstrating how AI improves the precision and reliability of decision-making in critical applications such as poultry farming. It addresses market demands by providing a practical framework that supports the poultry industry in enhancing productivity, animal welfare, and regulatory compliance. Additionally, it benefits society by bridging the knowledge gap for small-scale producers, offering cost-effective, accessible, and actionable insights to improve operations without requiring expert consultancy or advanced technical expertise.

2. Materials and Methods

This study employs an experimental design to assess the impact of the RAG technique on the quality of responses generated by an LLM. By introducing RAG as a variable, the experiment measures its effect on the semantic similarity index, quantifying the responses’ accuracy and relevance. Responses with and without RAG are compared under controlled conditions to evaluate how contextual augmentation influences semantic alignment.

To achieve this, the methodology is structured into three main phases (Figure 2): database creation, experimental execution, and comparative analysis. These phases evaluate the effectiveness of RAG in improving the precision and relevance of LLM-generated responses within the domain of environmental control in broiler poultry farming.

The structured methodology, including tools, datasets, and computational frameworks, supports the replication of this experiment.

2.1. Technologies

We selected Python version 3.13 for this experiment due to its versatility, extensive library ecosystem, and strong support for NLP and machine learning applications. LlamaIndex and LangChain were adopted to implement the RAG and non-RAG models, while FAISS was used to store, index, and retrieve high-dimensional vectors efficiently, enabling fast and accurate retrieval of relevant information to enhance the RAG process. Additionally, we used OpenAI’s GPT-4o model, recognized for its state-of-the-art natural language understanding and generation capabilities and its compatibility with RAG frameworks to improve response accuracy [19].

2.2. Database Creation

The database organizes and stores all the information needed to execute the experiment phases and evaluate RAG’s effectiveness in improving response accuracy.

2.2.1. Selection of Articles

Initially, we conducted a targeted literature review to identify recent studies on environmental control in broiler poultry farming. We used the following query string in the Scopus and Web of Science databases to identify relevant studies on environmental control in broiler production:

TITLE-ABS-KEY (“broiler chickens” OR “broiler production” OR “broiler houses” OR “poultry house” OR “broiler chicken barns”) AND (TITLE-ABS-KEY (“environmental control” OR “environmental management” OR “climate control” OR “air quality”) AND TITLE-ABS-KEY (“ventilation” OR “temperature” OR “humidity” OR “gas emissions” OR “NH₃” OR “ammonia” OR “CO₂” OR “carbon dioxide” OR “CO” OR “carbon monoxide” OR “H₂S” OR “hydrogen sulfide”) AND TITLE-ABS-KEY (“welfare” OR “comfort” OR “performance” OR “productivity”)) AND PUBYEAR > 2019 AND PUBYEAR < 2023 AND (LIMIT-TO (SUBJAREA, “AGRI”) OR LIMIT-TO (SUBJAREA, “VETE”) OR LIMIT-TO (SUBJAREA, “ENVI”)) AND (LIMIT-TO (LANGUAGE, “English”))

This query is structured to capture articles focused on broiler chicken production (including terms like “broiler houses” and “poultry house”) with a strong emphasis on environmental control aspects (e.g., ventilation, temperature, humidity, and gas emissions). It also targets research on animal welfare, performance, and productivity, reflecting the multi-dimensional impacts of environmental factors on broiler farming. The search is limited to studies published between 2020 and 2023 and to three relevant subject areas: agriculture, veterinary science, and environmental science.

The cut-off date was set because the GPT-4o model was trained with data up to 2023 [16]. This limitation was intentionally established to avoid creating a scenario where the RAG approach would naturally perform better, as the LLM would not have access to articles published beyond 2023. By excluding more recent materials, the comparison ensures a fair evaluation of RAG’s effectiveness.

We selected a sample of the ten most-cited articles from the search results. This citation-based sampling approach ensured that our dataset included high-impact studies widely recognized within the field, likely to offer comprehensive and relevant insights into environmental control in broiler production.

Table 1 lists the selected articles, which form our dataset and serve as the foundation for evaluating the effectiveness of RAG in enhancing response accuracy.

2.2.2. Database Construction

We created a structured database based on scientific articles about environmental control in broiler chicken farming. This database stored all the necessary information to evaluate the effectiveness of RAG in improving response accuracy (Table 2).

2.2.3. Prompting and Data Generation

The questions to be answered by the LLM, both with and without the RAG technique, were generated from each of the selected articles. For this purpose, we used OpenAI’s ChatGPT-4o, which is based on the GPT-4o model.

We developed a set of tailored prompts to generate questions from each article. The primary prompt outlined the task’s objective, specifying that the questions be generated from scientific articles according to specific criteria. First, they should be relevant to broiler producers, addressing genuine concerns, challenges, or curiosities related to environmental control in poultry farming. Second, the questions should align with the page content, ensuring they can be answered using the information explicitly or implicitly available on the specific page. Additionally, the questions should be clear, simple, concise, and free of overly technical jargon while maintaining accuracy and professionalism. A practical focus should also be emphasized, prioritizing actionable information producers can use to improve farm management or decision-making. To ensure broad applicability, the questions should avoid references to paper-specific terminology, methodologies, or findings, instead generalizing the content into a form relatable to the producer’s context. Finally, the questions should be designed to encourage meaningful responses through open-ended exploration or contextualizing specific issues. Adhering to these guidelines ensured that the generated questions were realistic, practical, and aligned with the intended purpose of evaluating LLM performance in addressing practical queries.

The article input prompt provided the scientific article in PDF format. Finally, the iterative question-generation prompt was repeated for each page, producing one question per page of each article.

An initial set of 133 questions was generated by these prompts and was saved in the experiment database. The results reflect the state of the Scopus and Web of Science databases as of October 2024, when the searches were executed.

2.2.4. Alignment Check of Generated Questions

We checked the adherence of each generated question to the established guidelines to ensure that it aligned with the intended purpose of evaluating the LLM’s performance in dealing with practical questions. At the end of this process, a set of 100 questions was selected for the experiments.

2.3. Experimental Execution: Response Generation with and Without RAG

To evaluate the performance of the GPT-4o model in terms of responding to the 100 queries related to environmental control in broiler poultry farming, we developed a Python-based program using advanced NLP techniques and the RAG framework (Figure 3). The program, built on the LangChain framework, facilitated the comparison of responses generated with and without RAG to analyze the impact of contextual information on response accuracy.

The evaluation involved generating responses for each query under two distinct conditions. In the first condition, without RAG, the model was provided with only the question and minimal guidance (basic prompt). In the second condition, with RAG, the model received both the question and the full-text page containing relevant contextual information (contextual prompt).
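The two conditions can be sketched as two prompt builders. This is a minimal illustration only: the function names, the instruction wording, and the example inputs are assumptions made here, since the study's actual prompt texts are not reproduced in this section.

```python
def build_basic_prompt(question: str) -> str:
    """Condition 1 (without RAG): the question plus minimal guidance."""
    return (
        "You are assisting a broiler producer with environmental control.\n"
        f"Question: {question}\n"
        "Answer:"
    )


def build_contextual_prompt(question: str, page_text: str) -> str:
    """Condition 2 (with RAG): the question plus the relevant page text."""
    return (
        "You are assisting a broiler producer with environmental control.\n"
        "Use the context below when answering.\n"
        f"Context:\n{page_text}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical inputs, not taken from the study's dataset.
question = "How does ventilation affect ammonia levels in a broiler house?"
page_text = "(text of the article page the question was generated from)"

basic = build_basic_prompt(question)
contextual = build_contextual_prompt(question, page_text)
print(basic)
print(contextual)
```

The only difference between the two prompts is the added context block, which is what lets the comparison isolate the effect of the retrieved text.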

To implement the system, we used a two-step pipeline. First, we created a vector store using the FAISS library, where documents were converted into dense embeddings via OpenAI’s embedding model to enable efficient similarity-based retrieval. Second, we constructed a conversational chain by integrating the GPT-4o model with the retrieval mechanism and a memory module. The retrieval system fetched relevant documents from the vector store based on user queries, while the memory module preserved dialog context across multiple interactions.
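The retrieval step of this pipeline can be illustrated with a dependency-free sketch. It deliberately swaps in simpler parts: a word-count vector stands in for OpenAI's dense embeddings, and a linear scan stands in for the FAISS index. The class `ToyVectorStore`, the helper names, and the sample page texts are all hypothetical, and the memory module is not modeled.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Stores (text, vector) pairs and returns the k most similar texts."""

    def __init__(self, documents: list[str]) -> None:
        self.entries = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [doc for doc, _ in ranked[:k]]


# Hypothetical page texts, invented for illustration only.
pages = [
    "Ventilation rate influences ammonia concentration inside the house.",
    "Litter moisture is affected by drinker management.",
    "Heat stress reduces feed intake in broilers.",
]
store = ToyVectorStore(pages)
top = store.retrieve("How does ventilation change ammonia concentration?", k=1)
print(top[0])
```

In a setup like the one described, the retrieved text would then be passed to the model together with the question; a dense embedding model would also match paraphrases that share no words, which this word-count stand-in cannot do.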

The generated responses were stored in the database for further analysis. Each response was recorded in one of two dedicated columns: response_without_RAG and response_with_RAG. This structured approach facilitated a direct comparison between the two types of responses, allowing subsequent steps to assess their similarity.

2.4. Comparative Analysis: Semantic Similarity Evaluation

In NLP, evaluating semantic similarity is crucial for information retrieval, question-answering, and summarization tasks. Semantic similarity measures how closely two textual elements convey related meanings, focusing on ideas rather than exact word matches [32]. Techniques for measuring it have evolved from basic methods like Jaccard and cosine similarity to word embeddings such as Word2Vec, GloVe, and FastText, which represent words in vector spaces but lack sentence-level context. Transformer models like BERT and RoBERTa improved contextual understanding but at high computational costs. Sentence-BERT (SBERT) addressed this by combining BERT’s contextual capabilities with efficiency, producing sentence-level embeddings optimized for semantic similarity tasks [33].

2.4.1. Semantic Similarity Assessment

We used semantic similarity as a metric to evaluate the responses generated by the model under RAG and non-RAG conditions. We employed the SentenceTransformer model to compute semantic similarity, specifically the pre-trained paraphrase-multilingual-MiniLM-L12-v2, designed to generate sentence embeddings. This model was selected for its ability to work across multiple languages, including English, and its effectiveness in representing the semantic meaning of sentences.

This approach allowed us to quantitatively assess how closely the responses generated by the model align with the expected answers. By comparing cosine similarity scores across RAG and non-RAG responses, we could evaluate the impact of contextual retrieval on the model’s ability to generate semantically accurate answers. This step was critical for determining the effectiveness of RAG in improving the relevance and accuracy of responses in the domain of environmental control in broiler poultry farming.

The process involved three main steps. First, each sentence was passed through the model’s encode method, which converted the text into a dense numerical vector, or embedding, that captures its semantic meaning. Second, the resulting embeddings were transformed into tensors, a data structure optimized for mathematical operations and essential for performing similarity computations. Finally, the semantic similarity between two sentences was calculated using the cosine similarity index, a standard metric for comparing high-dimensional vectors such as sentence embeddings. This index measures the cosine of the angle between two vectors, with values ranging from −1 to 1, where 1 indicates semantically identical sentences, 0 indicates no semantic similarity, and −1 indicates semantic opposition (a rare outcome in such tasks) [32,33].
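The final step, the cosine similarity index, can be written out directly. The sketch below uses small hand-written vectors in place of real sentence embeddings, so it illustrates only the metric itself, not the embedding model; the function name is chosen here for illustration.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 2-D vectors standing in for sentence embeddings.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])         # parallel vectors
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # orthogonal vectors
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])   # opposite direction
print(same, unrelated, opposite)
```

The three cases correspond to the values 1, 0, and −1 described above (up to floating-point rounding). Because the index depends only on direction, scaling a vector does not change the score.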

For the present study, we established three ranges to categorize performance based on similarity scores. Responses with a similarity score between 0.0 and 0.6 were classified as low-similarity. These responses exhibited minimal alignment with the original text, with key concepts either missing, imprecisely represented, or significantly divergent from the source content. Linguistic and structural elements also showed notable deviations. Responses with a score between 0.6 and 0.8 were categorized as moderate-similarity. These responses partially aligned with the original text, reflecting some key concepts but with notable omissions or inaccuracies. Variations in language and structure were evident, and extraneous information not present in the original text could be included. Finally, responses with a similarity score between 0.8 and 1.0 were classified as high-similarity. These responses closely mirrored the original text, accurately capturing primary ideas and concepts. Language and structural organization were either highly consistent with the source material or appropriately adapted while maintaining fidelity without introducing significant inaccuracies or unrelated information.
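These categories translate into a simple lookup. The text does not say which category a score of exactly 0.6 or 0.8 belongs to, so the sketch below assumes that boundary values go to the higher category and that any negative score counts as low; both choices are assumptions made here.

```python
def similarity_category(score: float) -> str:
    """Map a similarity score to the low / moderate / high categories.

    Assumption: boundary values (0.6 and 0.8) are assigned to the higher
    category, and scores below 0.0 are treated as low.
    """
    if score >= 0.8:
        return "high"
    if score >= 0.6:
        return "moderate"
    return "low"


# Hypothetical scores, for illustration only.
for score in (0.35, 0.60, 0.79, 0.80, 0.95):
    print(score, similarity_category(score))
```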

2.4.2. Statistical Analysis

To evaluate the impact of the RAG technique on response quality, we analyzed the semantic similarity scores of response sets with and without RAG. Descriptive statistics summarized overall performance and variability. A line plot illustrated trends in similarity scores across all questions, with performance ranges visually represented. A histogram was generated to compare the frequency distributions of similarity scores, and a difference plot highlighted the degree of improvement across questions. Finally, boxplots were used to illustrate the distributions of similarity scores for visual comparison.

We also compared the statistical results between the two conditions and calculated the difference between scores with and without RAG to quantify the degree of improvement. Additionally, we determined the proportion of cases where RAG enhanced similarity scores. A paired t-test was conducted to assess the statistical significance of the observed differences. Together, these analyses provided a comprehensive evaluation of the RAG technique’s effectiveness in improving the semantic relevance of the generated responses.
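The paired comparison can be sketched with the standard library alone. The sketch computes the per-question differences, the share of questions where RAG scored higher, and the paired t statistic, t = mean(d) / (sd(d) / sqrt(n)). The function name and the sample scores are invented for illustration. The p-value is omitted because it requires the Student's t distribution with n − 1 degrees of freedom, which the standard library does not provide.

```python
import math
import statistics


def paired_summary(with_rag: list[float], without_rag: list[float]) -> dict:
    """Differences, share of improved cases, and paired t statistic."""
    if len(with_rag) != len(without_rag) or len(with_rag) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Difference per question: score with RAG minus score without RAG.
    diffs = [w - wo for w, wo in zip(with_rag, without_rag)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0.0:
        raise ValueError("t statistic is undefined when all differences are equal")
    t_stat = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return {
        "differences": diffs,
        "share_improved": sum(d > 0 for d in diffs) / n,
        "t": t_stat,
        "degrees_of_freedom": n - 1,
    }


# Invented scores for four questions; not the study's data.
scores_with = [0.80, 0.70, 0.90, 0.60]
scores_without = [0.70, 0.50, 0.60, 0.65]
summary = paired_summary(scores_with, scores_without)
print(summary["share_improved"], summary["t"], summary["degrees_of_freedom"])
```

The sign of t depends only on the order of subtraction: swapping the two arguments flips the sign without changing the magnitude. Where SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the p-value for the same kind of paired data.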

3. Results

3.1. Descriptive Analysis

The descriptive analysis confirms that the RAG technique improved performance (Table 3). The increase in the median similarity index indicates that RAG improved most responses in this dataset. Specifically, compared with responses generated without RAG, the similarity index with RAG was 13.45% higher in the mean and 12.68% higher in the median, demonstrating a consistent enhancement in semantic similarity when RAG is applied.

The higher standard deviation for similarity indices without RAG reflects greater variability in the results, suggesting that the model’s performance is less consistent when relying solely on internal knowledge. Conversely, the lower standard deviation for similarity indices with RAG indicates that the technique both improves the average similarity scores and stabilizes the results by 4.68%. RAG appears to act as a normalizing factor, reducing variability in the similarity indices and providing more consistent, semantically aligned responses. This consistency supports the hypothesis that RAG enhances the generated responses’ quality and reliability.
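Percentage comparisons of this kind follow from a relative-change formula, (new − old) / old × 100, applied to each descriptive statistic. The sketch below uses invented scores, since the underlying values are not reproduced in this section, and it assumes the sample standard deviation is the one intended; the function names are chosen here for illustration.

```python
import statistics


def relative_change_pct(new: float, old: float) -> float:
    """Percentage change of `new` relative to the baseline `old`."""
    if old == 0:
        raise ValueError("relative change is undefined when the baseline is 0")
    return (new - old) / old * 100.0


def describe(scores: list[float]) -> dict:
    """Mean, median, and sample standard deviation of a score list."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std": statistics.stdev(scores),
    }


# Invented similarity scores; not the study's data.
without_rag = [0.50, 0.60, 0.70]
with_rag = [0.60, 0.66, 0.72]

baseline = describe(without_rag)
augmented = describe(with_rag)
changes = {
    name: relative_change_pct(augmented[name], baseline[name])
    for name in baseline
}
print(changes)
```

A positive change in the mean or median indicates higher similarity with RAG, while a negative change in the standard deviation indicates less spread between responses.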

3.2. Statistical Validation of RAG’s Effectiveness

The paired t-test revealed a statistically significant difference between the similarity indices of responses generated with and without RAG (t = −7.610, p-value = 1.63 × 10⁻¹¹). The negative t-value, which is consistent with the differences having been computed as scores without RAG minus scores with RAG, indicates that, on average, similarity scores with RAG were significantly higher than those without RAG. The extremely small p-value shows that a difference of this size would be very unlikely if RAG had no effect, allowing us to confidently reject the null hypothesis that there is no difference between the two conditions.

These results confirm that the observed improvement in similarity indices with RAG is not a random fluctuation but an actual effect. Combined with the descriptive statistics and visual analyses, these findings reinforce the conclusion that RAG consistently enhances semantic alignment in generated responses, further validating its effectiveness as a retrieval-based augmentation technique.

3.3. Similarity Comparison

Figure 4 illustrates the similarity index for responses generated with and without the RAG technique across all questions.

Figure 4 shows the overall positive impact of RAG, with responses using RAG generally achieving higher similarity scores than those without it. Responses without RAG predominantly fall into the low-similarity range, while those with RAG are more frequently distributed across the moderate- and high-similarity ranges, reflecting improved alignment with the source content. However, RAG’s performance is not consistently excellent. Around 10% of RAG-generated responses fall in the low-similarity range, and only about 30% reach the high-similarity range. The majority, approximately 60%, fall within the moderate-similarity range, indicating partial alignment with some omissions or inaccuracies. Additionally, in 12% of cases, responses without RAG outperformed those with RAG, suggesting that RAG is not always the optimal solution. These results highlight RAG’s potential and limitations, emphasizing the need for further refinement to achieve more consistent, high-quality performance.

3.4. Impact of RAG

The bar graph (Figure 5) indicates the positive impact of RAG by presenting the differences between similarity indices with and without RAG (i.e., the similarity index with RAG minus the index without RAG), with 88% of the differences being positive, confirming RAG’s effectiveness in improving semantic similarity. This result aligns with earlier analyses, such as the boxplot and histogram, which showed higher medians and a concentration of RAG responses in higher similarity ranges. The 12% of negative differences indicate cases where RAG underperformed, likely due to specific question or reference text characteristics, but these losses had smaller magnitudes, minimizing their overall impact. Positive differences often exceeded 0.4, highlighting significant gains for specific responses, while the majority of bars above the reference line (y = 0) underscore the consistency of RAG in enhancing semantic alignment. These findings confirm that RAG improves response quality and stability across most cases while identifying areas for further refinement to address the few cases where RAG was less effective.

3.5. Distribution

The histogram in Figure 6 provides a detailed comparison of the similarity indices for responses generated with and without RAG, revealing distinct distribution patterns. Responses with RAG are more concentrated in higher similarity ranges, with a tighter distribution and reduced variability, indicating a more consistent performance. In contrast, responses without RAG are more dispersed, with a higher concentration in the intermediate range and limited representation above 0.8, reflecting greater variability and lower alignment with the source content. This aligns with the descriptive analysis, which showed higher mean and median similarity indices for RAG responses than non-RAG responses. Additionally, the similarity comparison graph and the histogram highlight RAG’s stabilizing effect, shifting the distribution toward higher values.

However, limitations remain evident: approximately 10% of RAG responses fall below 0.6, only 30% achieve high similarity (0.8–1.0), and in around 12% of cases the non-RAG response outperforms the RAG response. These findings confirm that while RAG improves response quality and consistency overall, its effectiveness varies, leaving room for further refinement.

The boxplot in Figure 7 highlights differences in the distribution of similarity indices for responses generated with and without the RAG technique. The median similarity index for responses with RAG is visibly higher than that for responses without RAG, indicating that RAG responses typically achieve better alignment with the source content. This finding is consistent with the descriptive analysis, which showed higher mean and median values for RAG responses than for non-RAG responses.

The smaller interquartile range (IQR) for RAG responses shows reduced variability among the central 50% of scores, indicating a more consistent performance. In contrast, the larger IQR for responses without RAG reflects greater dispersion and inconsistency, as also seen in the similarity comparison graph and histogram, where non-RAG responses showed broader variability and a higher concentration of lower similarity scores. Additionally, while responses with RAG exhibit more outliers, these are concentrated in higher similarity ranges, suggesting occasional high-performing cases. Conversely, the outliers for responses without RAG occur in much lower similarity ranges, underscoring poor alignment and substantially lower response quality.
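For readers who want to reproduce the boxplot quantities, the following sketch computes the quartiles, the IQR, and the outliers. It assumes the common Tukey convention (points beyond 1.5 × IQR from the quartiles are outliers) and linearly interpolated quartiles; the source does not state which convention the figure used, and the scores are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_and_outliers(scores: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the interquartile range and the Tukey-rule outliers."""
    # method="inclusive" interpolates linearly between data points.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [s for s in scores if s < lower_fence or s > upper_fence]
    return iqr, outliers


# Hypothetical similarity indices (NOT the study's data).
scores = [0.30, 0.62, 0.66, 0.70, 0.72, 0.75, 0.78, 0.80, 0.84]
iqr, outliers = iqr_and_outliers(scores)
print(iqr, outliers)
```

Running the same function on each group (with and without RAG) gives the two IQRs and outlier sets that a boxplot such as Figure 7 summarizes.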

The histogram also supports this observation, showing a denser concentration of RAG responses in higher similarity ranges and non-RAG responses more scattered in intermediate ranges. These results reinforce the conclusion that RAG improves both the average similarity scores and the consistency of responses, although occasional limitations remain.

The results indicate the effectiveness of the RAG technique in enhancing the semantic similarity of generated responses. Across all analyses, RAG outperformed the non-RAG approach, as evidenced by higher mean and median similarity indices and reduced variability, with the standard deviation decreasing from 0.1660 to 0.1192 (Table 3). The paired t-test confirmed that these improvements were statistically significant, indicating that the observed gains are unlikely to be due to chance. The graph analysis highlighted the concentration of RAG responses in higher similarity ranges, demonstrating improved accuracy and greater consistency. It further revealed that 88% of cases showed positive gains with RAG, with gains exceeding 0.4 in some cases.
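The paired t-test compares the two scores obtained for the same question. A minimal sketch of the test statistic, using only the Python standard library, is shown below; the function name and the data are hypothetical, and the p-value step is omitted because it requires the t distribution (in practice a routine such as `scipy.stats.ttest_rel` returns both the statistic and the p-value).

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(with_rag: Sequence[float],
                       without_rag: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(with_rag) != len(without_rag):
        raise ValueError("samples must be paired (equal length)")
    diffs = [w - wo for w, wo in zip(with_rag, without_rag)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical paired similarity indices (NOT the study's data).
with_rag = [0.80, 0.75, 0.90, 0.70]
without_rag = [0.60, 0.65, 0.70, 0.72]
t_stat, dof = paired_t_statistic(with_rag, without_rag)
print(t_stat, dof)
```

A large positive t with the given degrees of freedom corresponds to a small p-value, which is the sense in which the gains are "unlikely to be due to chance".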

4. Discussion

In the present study, the results demonstrated the potential of RAG to enhance the accuracy and relevance of LLM-generated responses in environmental control for broiler poultry farming. The findings supported our hypothesis that RAG would improve LLM performance by providing contextual information to support informed decision-making. Integrating RAG with LLMs improved contextual alignment, as evidenced by a 0.1345 absolute increase in the mean similarity score (from 0.6369 to 0.7713, Table 3), a gain of 13.45 percentage points on the similarity scale. Since PLF in broiler production relies on detection, monitoring, analysis, and decision-making to ensure productivity and welfare [2,3], these enhancements are crucial for addressing complex environmental data and supporting real-time decision-making [3,8].
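The size of the gain in mean similarity can be expressed in absolute or relative terms, and the two readings differ. The short calculation below uses the means reported in Table 3; the variable names are illustrative only.

```python
# Mean similarity indices as reported in Table 3.
mean_without_rag = 0.6369
mean_with_rag = 0.7713

# Absolute gain: difference on the similarity scale itself.
absolute_gain = mean_with_rag - mean_without_rag

# Relative gain: the same difference as a fraction of the non-RAG baseline.
relative_gain = absolute_gain / mean_without_rag

print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain:.1%}")
```

The absolute gain is about 0.134 (the value Table 3 reports as ~0.1345), whereas the relative gain over the non-RAG baseline is roughly 21%.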

Despite these encouraging results, some limitations were observed. In 12% of cases, responses without RAG outperformed those with RAG, indicating that retrieval quality and relevance can occasionally hinder performance. This highlights the need to enhance retrieval mechanisms and ensure external knowledge bases remain current and comprehensive [25,26]. Computational costs and scalability are also important challenges, particularly for high-density poultry production, where real-time decision-making demands both efficiency and reliability [11].
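One way to picture how retrieval quality affects the outcome is a toy retriever that discards weakly related passages before they reach the model. The sketch below is purely illustrative and is not the pipeline used in this study: it ranks passages by cosine similarity of bag-of-words vectors, and the function names, the `min_score` threshold, and the example passages are all assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, passages: Sequence[str],
             min_score: float = 0.2, top_k: int = 2) -> List[Tuple[float, str]]:
    """Return up to top_k passages scoring at least min_score, best first."""
    q_vec = Counter(question.lower().split())
    scored = [(cosine_similarity(q_vec, Counter(p.lower().split())), p)
              for p in passages]
    # Dropping low-scoring passages avoids feeding irrelevant context.
    kept = [sp for sp in scored if sp[0] >= min_score]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return kept[:top_k]


# Hypothetical passages and question (illustration only).
passages = [
    "ammonia concentration rises when ventilation rate is low",
    "broiler feed conversion depends on diet composition",
]
results = retrieve("how does ventilation affect ammonia concentration", passages)
print(results)
```

With a threshold of this kind, an unrelated passage is left out of the prompt instead of diluting it, which is one possible mitigation for the cases where retrieval hindered performance. Production systems typically use dense embeddings rather than word counts.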

The findings also emphasize the practicality of RAG for small-scale producers, offering a cost-effective alternative to traditional consultancy services. By retrieving and integrating relevant knowledge, RAG-equipped LLMs ensure more dynamic and tailored support, making real-time environmental control adjustments feasible even in resource-limited settings [26,27,28,29,30].

From a practical perspective, the enhanced performance of RAG has clear implications for the poultry industry. Its ability to deliver consistent, context-sensitive insights into environmental variables, such as temperature, humidity, and ammonia levels, supports producers in maintaining optimal conditions for animal welfare and productivity. Furthermore, RAG addresses key industry challenges by analyzing sensor data, generating reports, optimizing ventilation, managing thermal conditions, and monitoring air quality, providing actionable insights that improve productivity and bird welfare.

The controlled variables and reproducible methodology used in this study helped isolate the impact of RAG on response quality, strengthening the validation of the hypothesis. The improvement in semantic similarity demonstrates that integrating relevant external knowledge into LLM workflows leads to more precise and context-sensitive outputs. These findings underscore the effectiveness of RAG in overcoming common limitations of LLMs, such as generalized responses and reliance on static training data [2,3,16,17,18].

In conclusion, the results confirm our initial hypothesis, demonstrating that RAG improves LLM performance by providing contextual information. While some areas require refinement, the observed improvements validate the viability of RAG as a transformative tool for modern poultry farming.

5. Conclusions

The present study demonstrates the effectiveness of integrating RAG with LLMs to enhance decision-making and improve environmental control in broiler poultry farming. The findings highlight RAG’s capacity to improve the semantic accuracy and contextual relevance of LLM responses, making it a promising approach for addressing the complex challenges of high-density poultry production systems. The statistical and semantic analyses confirmed that RAG reduces variability and enhances LLM response consistency, enabling producers to make data-driven adjustments informed by LLM answers and analysis, optimizing animal welfare and productivity.

Despite these promising results, there is still room for improvement. Challenges in managing retrieval quality, addressing inconsistencies in retrieved information, and ensuring scalability underscore the need for continued refinement of RAG frameworks.

Beyond its immediate benefits, integrating RAG into LLM workflows offers a scalable and cost-effective solution for supporting small-scale producers who often lack access to expert consultancy. By bridging the gap between raw sensor data and actionable insights, RAG-equipped LLMs demonstrate significant potential to transform environmental management practices, fostering sustainability and regulatory compliance in broiler farming.

Future research should focus on enhancing the precision and reliability of RAG by addressing uncertainties in retrieved information and incorporating non-classical logics, such as fuzzy and paraconsistent logic, to handle variability and ambiguity in data. Exploring alternative metrics, such as perplexity, precision, and factual accuracy, could provide deeper insights into RAG performance. Comparative studies with other LLMs, including models from different providers, would also establish valuable benchmarks for scalability and generalizability. Another promising avenue lies in integrating sensor data from poultry houses with RAG-enabled knowledge bases, generating real-time, context-sensitive insights to further advance precision livestock farming. Such developments would solidify RAG’s role as a transformative tool for improving environmental control and decision-making in modern agriculture.

Author Contributions

Conceptualization, M.V.L., M.L.H.S., I.d.A.N. and J.M.A.; methodology, M.V.L. and M.L.H.S.; software, M.V.L. and M.L.H.S.; validation, M.V.L., M.L.H.S. and I.d.A.N.; formal analysis, M.V.L. and M.L.H.S.; investigation, M.V.L. and M.L.H.S.; resources, M.V.L. and M.L.H.S.; data curation, M.V.L. and M.L.H.S.; writing—original draft preparation, M.V.L. and M.L.H.S.; writing—review and editing, M.V.L., M.L.H.S., I.d.A.N. and J.M.A.; visualization, I.d.A.N.; supervision, I.d.A.N. and J.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data will be available upon request.

Acknowledgments

The authors thank the Coordination for the Improvement of Higher Education Personnel (CAPES) for the master’s scholarship of the first author and the National Council for Scientific and Technological Development (CNPQ) for the PQ scholarship of the fourth author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Mottet, A.; Tempio, G. Global poultry production: Current state and future outlook and challenges. J. World’s Poult. Sci. 2017, 73, 245–256.
2. Berckmans, D. Automatic on-line monitoring of animal health and welfare by Precision Livestock Farming. In Proceedings of the European Forum Livestock Housing for the Future, Lille, France, 22–23 October 2009.
3. Berckmans, D. General introduction to precision livestock farming. Anim. Front. 2017, 7, 6–11.
4. ABPA. Annual Report 2024. Brazilian Association of Animal Protein, 2024. Available online: https://abpa-br.org/wp-content/uploads/2024/04/RA_2024_ABPA_ingles_avicultura.pdf (accessed on 20 August 2024).
5. Gržinić, G.; Piotrowicz-Cieślak, A.; Klimkowicz-Pawlas, A.; Górny, R.L.; Ławniczek-Wałczyk, A.; Piechowicz, L.; Olkowska, E.; Potrykus, M.; Tankiewicz, M.; Krupka, M.; et al. Intensive poultry farming: A review of the impact on the environment and human health. Sci. Total Environ. 2020, 858, 160014.
6. Hafez, H.M.; Attia, Y.A. Challenges to the Poultry Industry: Current Perspectives and Strategic Future After the COVID-19 Outbreak. Front. Vet. Sci. 2020, 7, 516.
7. Astill, J.; Dara, R.A.; Fraser, E.D.; Roberts, B.; Sharif, S. Smart poultry management: Smart sensors, big data, and the internet of things. Comput. Electron. Agric. 2020, 170, 105291.
8. Zheng, H.; Zhang, T.; Fang, C.; Zeng, J.; Yang, X. Design and Implementation of Poultry Farming Information Management System Based on Cloud Database. Animals 2021, 11, 900.
9. Dewanto, P.; Munadi, M.; Tauviqirrahman, M. Development of an automatic broiler feeding system using PLC and HMI for closed house system. ASRJETS 2019, 58, 139–149. Available online: https://core.ac.uk/download/pdf/235050874.pdf (accessed on 20 August 2024).
10. Porter, M.E.; Heppelmann, J.E. How smart, connected products are transforming competition. Harv. Bus. Rev. 2014, 92, 64–88.
11. Lashari, M.H.; Memon, A.A.; Shah, S.A.A.; Nenwani, K.; Shafqat, F. IoT Based Poultry Environment Monitoring System. In Proceedings of the IEEE International Conference on Internet of Things and Intelligence System (IOTAIS), Bali, Indonesia, 3–8 November 2018; Available online: https://ieeexplore-ieee-org.ez346.periodicos.capes.gov.br/document/8600837 (accessed on 29 July 2024).
12. Halachmi, I.; Guarino, M.; Bewley, J.; Pastell, M. Smart Animal Agriculture: Application of Real-Time Sensors to Improve Animal Well-Being and Production. Annu. Rev. Anim. Biosci. 2019, 7, 403–425.
13. Raikov, A.; Abrosimov, V. Artificial Intelligence and Robots in Agriculture. In Proceedings of the 2022 15th International Conference Management of Large-Scale System Development (MLSD), Moscow, Russia, 26–28 September 2022; pp. 1–5.
14. Ojo, R.O.; Ajayi, A.O.; Owolabi, H.A.; Oyedele, L.O.; Akanbi, L.A. Internet of Things and Machine Learning techniques in poultry health and welfare management: A systematic literature review. Comput. Electron. Agric. 2022, 200, 107266.
15. García, M.; Martínez, J. Application of Internet of Things (IoT) in Animal Production Systems. J. Anim. Sci. Technol. 2019, 10, 123–134.
16. Vaswani, A. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
17. Google AI. Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Available online: https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html (accessed on 17 November 2024).
18. Meta AI. LLaMA: Open and Efficient Foundation Language Models. Available online: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ (accessed on 17 November 2024).
19. OpenAI. Introducing ChatGPT. Available online: https://openai.com/blog/chatgpt/ (accessed on 17 November 2024).
20. Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608.
21. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
22. Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. A survey of hallucination in natural language generation. ACM Comput. Surv. 2022, 55, 1–38.
23. Metze, K.; Morandin-Reis, R.C.; Lorand-Metze, I.; Florindo, J.B. Bibliographic Research with ChatGPT may be Misleading: The Problem of Hallucination. J. Pediatr. Surg. 2024, 59, 158.
24. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2022, arXiv:2201.11903.
25. Ott, S.; Hebenstreit, K.; Liévin, V.; Hother, C.E.; Moradi, M.; Mayrhauser, M.; Praas, R.; Winther, O.; Samwald, M. ThoughtSource: A central hub for large language model reasoning data. Sci. Data 2023, 10, 528.
26. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.T.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2020, arXiv:2005.11401.
27. Li, H.; Su, Y.; Cai, D.; Wang, Y.; Liu, L. A Survey on Retrieval-Augmented Text Generation. arXiv 2022, arXiv:2202.01110.
28. Karpatne, A.; Jia, X.; Kumar, V. Knowledge-guided machine learning: Current Trends and Future Prospects. arXiv 2024, arXiv:2403.15989.
29. Izacard, G.; Grave, E. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. arXiv 2020, arXiv:2007.01282.
30. Petroni, F.; Piktus, A.; Fan, A.; Lewis, P.; Yazdani, M.; De Cao, N.; Thorne, J.; Jernite, Y.; Karpukhin, V.; Maillard, J.; et al. KILT: A Benchmark for Knowledge Intensive Language Tasks. arXiv 2020, arXiv:2009.02252.
31. Guo, J.; Fan, Y.; Pang, L.; Yang, L.; Ai, Q.; Zamani, H.; Wu, C.; Croft, W.B.; Cheng, X. A deep dive into neural ranking models for information retrieval. Inf. Process. Manag. 2022, 57, 102067.
32. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
33. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv 2019, arXiv:1908.10084.
34. Bist, R.B.; Subedi, S.; Chai, L.; Yang, X. Ammonia emissions, impacts, and mitigation strategies for poultry production: A critical review. J. Environ. Manag. 2023, 328, 116919.
35. Bloch, V.; Barchilon, N.; Halachmi, I.; Druyan, S. Automatic broiler temperature measuring by thermal camera. Biosyst. Eng. 2020, 199, 127–134.
36. Costantino, A.; Fabrizio, E.; Villagrá, A.; Estellés, F.; Calvet, S. The reduction of gas concentrations in broiler houses through ventilation: Assessment of the thermal and electrical energy consumption. Biosyst. Eng. 2020, 199, 135–148.
37. Kalus, K.; Konkol, D.; Korczyński, M.; Koziel, J.A.; Opaliński, S. Effect of biochar diet supplementation on chicken broilers performance, NH₃ and odor emissions and meat consumer acceptance. Animals 2020, 10, 1539.
38. Babadi, K.A.; Khorasanizadeh, H.; Aghaei, A. CFD modeling of air flow, humidity, CO₂ and NH₃ distributions in a caged laying hen house with tunnel ventilation system. Comput. Electron. Agric. 2022, 193, 106677.
39. Li, D.; Tong, Q.; Shi, Z.; Zheng, W.; Wang, Y.; Li, B.; Yan, G. Effects of Cold Stress and Ammonia Concentration on Productive Performance and Egg Quality Traits of Laying Hens. Animals 2020, 10, 2252.
40. Al Assaad, D.K.; Orabi, M.S.; Ghaddar, N.K.; Ghali, K.F.; Salam, D.A.; Ouahrani, D.; Farran, M.T.; Habib, R.R. A sustainable localised air distribution system for enhancing thermal environment and indoor air quality of poultry house for semiarid region. Biosyst. Eng. 2021, 203, 70–92.
41. Soliman, E.S.; Hassan, R.A. Influence of housing floor on air quality, growth traits, and immunity in broiler chicken farms. Adv. Anim. Vet. Sci. 2020, 8, 997–1008.
42. Wen, P.; Li, L.; Xue, H.; Jia, Y.; Gao, L.; Li, R.; Huo, L. Comprehensive evaluation method of the poultry house indoor environment based on gray relation analysis and Analytic Hierarchy Process. Poult. Sci. 2022, 101, 101587.

Figure 1. Schematic of the RAG Process Flow. Source: the authors.

Figure 2. Schematic flow of the research approach. Source: the authors.

Figure 3. Screenshot of the user interface of the GPT-4o and RAG implementation. Source: the authors.

Figure 4. Similarity index comparison, without RAG vs. with RAG. Source: the authors.

Figure 5. Differences between the similarity rate with RAG and without RAG. Source: the authors.

Figure 6. Differences between the frequencies of similarity indices with RAG and without RAG. Source: the authors.

Figure 7. Boxplot comparison of the distributions considering the similarity indices with and without RAG. Source: the authors.

Table 1. Selected sources, article identification (DOI), and the number of citations received.

Source	Digital Object Identifier (DOI)	Cited by
Bist et al. [34]	10.1016/j.jenvman.2022.116919	52
Bloch et al. [35]	10.1016/j.biosystemseng.2019.08.011	28
Costantino et al. [36]	10.1016/j.biosystemseng.2020.01.002	25
Kalus et al. [37]	10.3390/ani10091539	24
Ahmadi Babadi et al. [38]	10.1016/j.compag.2021.106677	21
Li et al. [39]	10.3390/ani10122252	21
Al Assaad et al. [40]	10.1016/j.biosystemseng.2021.01.002	18
Soliman and Hassan [41]	10.17582/JOURNAL.AAVS/2020/8.9.997.1008	17
Wen et al. [42]	10.1016/j.psj.2021.101587	15

Source: the authors.

Table 2. Database columns.

Column	Description
Article Citation	Identifies the source of the article.
Page Number	Specifies the page from which the question was derived.
Full Text of the Page	Contains the full text of the page used to generate the question.
Question	Contains the question derived from the specified page.
Correct Answer	Provides an interpretive response that captures the implied meaning within the context of the text.
Original Text Answering the Question	Extracts the direct answer as stated in the article.

Table 3. Measures of central tendency.

Measures	Similarity Without RAG	Similarity with RAG	Difference
Mean	0.6369	0.7713	~0.1345
Median	0.6569	0.7836	~0.1268
Standard deviation	0.1660	0.1192	~−0.0468

Source: the authors.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
