Article

Literacy Deep Reinforcement Learning-Based Federated Digital Twin Scheduling for the Software-Defined Factory

Future Convergence Engineering Major, Department of Computer Science and Engineering, Korea University of Technology and Education, Cheonan 31253, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2024, 13(22), 4452; https://doi.org/10.3390/electronics13224452
Submission received: 15 October 2024 / Revised: 8 November 2024 / Accepted: 11 November 2024 / Published: 13 November 2024
(This article belongs to the Special Issue Metaverse and Digital Twins, 2nd Edition)

Abstract
As user requirements become increasingly complex, the demand for product personalization is growing, but traditional hardware-centric production relies on fixed procedures that lack the flexibility to support diverse requirements. Although bespoke manufacturing has been introduced, it provides users with only a few standardized options, limiting its ability to meet a wide range of needs. To address this issue, a new manufacturing concept called the software-defined factory has emerged: an autonomous manufacturing system that provides reconfigurable manufacturing services to produce tailored products. Reinforcement learning has been suggested for flexible scheduling to satisfy user requirements, but fixed rule-based methods struggle to accommodate conflicting needs. This study proposes a novel federated digital twin scheduling method that combines large language models and deep reinforcement learning algorithms to meet diverse user requirements in the software-defined factory. The large language model-based literacy module analyzes requirements expressed in natural language and assigns weights to digital twin attributes so that the most relevant KPIs guide scheduling decisions. The deep reinforcement learning-based scheduling module optimizes scheduling by selecting the job and machine with the maximum reward. Different types of user requirements, such as reducing manufacturing costs and improving productivity, are input and evaluated by comparing the proposed method against flow-shop scheduling and reinforcement learning-based job-shop scheduling. Experimental results indicate that for requirement case 1 (manufacturing cost), the proposed method outperforms flow-shop scheduling by up to 14.9% and job-shop scheduling by up to 5.6%. For requirement case 2 (productivity), it exceeds the flow-shop method by up to 13.4% and the job-shop baseline by up to 7.2%. These results confirm that the proposed literacy DRL scheduling can handle the individual characteristics of requirements.

1. Introduction

The paradigm of the manufacturing industry that began with craft production has evolved from mass production to mass customization [1]. Craft production can be tailored to individual orders, but it relies heavily on skilled craftsmen and manual labor. Because this method requires specialized skills and experience, its low efficiency and high cost make mass production difficult. Mass production, by contrast, uses standardized processes to produce products of consistent quality on a large scale [2]. In 1913, Henry Ford introduced the world’s first moving assembly line, a revolutionary production method widely regarded as the turning point that made mass production possible [3]. Mass customization, in turn, enables the mass production of goods that meet consumer needs by letting customers configure products through a set of predefined options. However, as personalized needs increase, the manufacturing industry is expected to shift beyond mass customization to mass personalization [4]. Like mass customization, this approach aims to provide products that fit the user’s demands. Whereas bespoke production offers a predetermined set of standardized options to choose from, mass personalization enables the production of tailored products that fully meet individual requirements [5].
The manufacturing paradigm is shifting from mass production to customized production, and more flexible production systems are required to cope with this shift. Research on robot-based production and inspection systems has been a primary focus in responding to these changes, with the aim of improving flexibility and efficiency. For example, Ulrich et al. [6] presented a method to shorten inspection cycles through path-planning optimization in an inspection system. Kim et al. [7] addressed the optimal scheduling of cluster tools using dual-arm robots and proposed a flexible scheduling approach to improve processing efficiency. In addition, Foumani et al. [8] proposed a scheduling approach that maximizes throughput by utilizing a stop–resume processing mode while a multifunctional robot serves a two-machine robotic cell. Such approaches are essential for flexible production in complex manufacturing environments and can contribute to increased efficiency. However, these studies still rely on fixed rules and predefined scenarios, which limit their ability to meet the specialized needs of individual users.
Transitioning to a new manufacturing paradigm requires the integration of advanced ICT technologies such as digital twins and the metaverse. The metaverse is a service that blurs the boundary between the real and virtual worlds, thereby expanding our scope of activities [9]. In restricted situations, such as the COVID-19 pandemic, the metaverse was extensively utilized in service platforms such as gaming and social networks. The industrial metaverse applies this concept to industrial settings, enhancing efficiency through the interaction of virtual and physical objects. It extends not only the assets of the manufacturing site but also the manufacturing space and processes into the virtual realm, supporting operational optimization through real-time monitoring and analysis. In this context, digital twins (DTs) play a crucial role by serving as a data conduit between the metaverse and the real world, facilitating seamless integration and interaction.
A federated digital twin (fDT) is a set of interconnected homogeneous or heterogeneous digital twins whose combination enhances the capabilities of the individual twins, enabling the assessment, prediction, and optimization of specific service states. A federated digital twin can assemble the digital twins required by a service and optimize that service through scheduling among them. One example of a federated digital twin is the optimal manufacturing process of the software-defined factory, which consists of a large number of reconfigurable manufacturing robots that can adapt to dynamic requirements. The individual capabilities and deployment order of the manufacturing robots can change to meet new requirements, such as a change in product design or the addition of features. Process optimization is achieved by configuring and scheduling robot DTs to perform specific functions in an optimal sequence.
Traditional factories, which can be described as hardware-defined factories, rely on fixed hardware to perform static manufacturing functions. This rigidity limits the adaptability of the production processes required by the future manufacturing paradigm and makes it difficult to respond to the individual needs of users. To address this need, the software-defined factory (SDF) is proposed, as shown in Figure 1. The SDF represents an advanced form of the smart factory, providing an intelligent production system that can offer customized manufacturing services by reconfiguring software-defined manufacturing functions to meet user needs. This allows for the intelligent reconfiguration of equipment functions based on user requirements, resulting in a flexibility that traditional static factories cannot provide. This shift in manufacturing paradigms is essential for manufacturers to maintain competitiveness and meet customer expectations, with the SDF playing a core role in this process. In particular, the SDF is expected to establish itself as an innovative platform that maximizes production efficiency and effectively responds to user requirements through data-driven decision-making and flexible scheduling.
The scheduling process of digital twins is critical in fDTs, which provide services through a federation of individual digital twins [10]. In recent research, reinforcement learning (RL) has been widely used to deal with the many variables and exceptions that occur in the real world [11]. However, RL-based methods often rely on fixed rules and predefined scenarios, making it difficult to respond dynamically to the specialized needs and situations of different users. Furthermore, the complexity of data relationships in industrial metaverses, with their intricate structure of physical and virtual resources, requires new scheduling approaches to manage and analyze them efficiently. In particular, adaptive scheduling techniques that can respond instantly to dynamic changes in data and user demands are essential. Large language models (LLMs) can effectively address the limitations of RL-based scheduling, which often struggles with fixed rules. LLMs excel at understanding and processing the semantics of natural language, allowing them to derive optimal decisions from complex data inputs. This capability enables an analysis of intricate data relationships and facilitates customized scheduling that accounts for various KPIs. In doing so, LLMs enhance the flexibility and adaptability of the manufacturing process, optimizing interactions between digital twins and supporting decision-making.
In this paper, we propose a federated digital twin scheduling method using LLMs and deep reinforcement learning (DRL). The literacy module of this study uses an LLM to analyze user requirements and weight the data used for scheduling. These data are then used by a DRL-based scheduling module to perform user-centric scheduling that can respond flexibly to the requirements. The contributions of this research are as follows. First, we propose a software-defined factory that enables the realization of the mass personalization paradigm through the intelligent reconfiguration of software-defined manufacturing functions. Second, we implement an LLM-based literacy module for user requirements analysis to enable an in-depth understanding of requirements. Third, we develop a DRL-based scheduling module with dynamic rewards tailored to the characteristics of the scheduling data to support optimized scheduling in real-world environments.
In Section 2, we define the industrial metaverse through a comparison with common metaverses and introduce the software-defined factory, a future manufacturing plant built on the industrial metaverse. We then discuss the importance of scheduling problems with federated digital twins and review research on existing scheduling methods based on reinforcement learning. Section 2 concludes with a generational overview of the evolution of LLMs. In Section 3, we propose the literacy DRL method for understanding and analyzing different user requirements and describe how it works. Section 4 describes the experimental environment and analyzes the results. Finally, Section 5 concludes this paper with future work.

2. Background

2.1. The Software-Defined Factory

The software-defined factory (SDF) is an ideal factory capable of providing autonomous solutions tailored to user requirements. It consists of a physical space responsible for actual production and an industrial metaverse that constitutes the virtual environment. Within the industrial metaverse, users can directly define and adjust the design, functionality, and performance of products, thereby revolutionizing the manufacturing process. In this context, user requirements are converted into executable parameters within the system. Unlike traditional fixed hardware, the SDF employs open hardware and supports the flexible reconfiguration of operating systems (OSs), middleware, and applications (APPs) through open APIs. This approach enables factories to seamlessly adapt their manufacturing capabilities to produce customized products for specific user needs. In addition, the Intelligent Manufacturing Operations System (I-MOS) offered by the SDF leverages real-time data analytics and predictive capabilities to enable more flexible and tailored manufacturing operations. I-MOS monitors every step of the production process in real time, enabling predictive maintenance to prevent equipment breakdowns, smart quality control to automatically resolve issues, and just-in-time production planning to optimize inventory. I-MOS can also quickly adapt to changing customer demands through personalized production and optimize resource allocation through intelligent resource management.
Unlike traditional fixed hardware structures, open assets serve as a pivotal enabler in enhancing the flexibility and adaptability of modern manufacturing systems. By allowing the seamless application of various software functions, open assets introduce a dynamic approach to system configuration, far surpassing the rigid constraints of conventional hardware setups. This flexibility is achieved through the use of standardized interfaces and protocols, which enable the smooth integration of the diverse equipment and software essential for the various stages of the manufacturing process. Figure 2 illustrates a scenario in which robotic arms are reconfigured to perform different functions according to requirements. Components can be added, replaced, or updated to reconfigure the system as needed in response to user requirements or factory conditions. For example, an SDF equipped with a robotic arm for product assembly can respond to an increase in order volume by updating the motor module. This adaptability is further enhanced by adding or removing physical components such as sensors, actuators, controllers, and end-effectors through the standardized interfaces of open assets. Standardized interfaces and modularization also make the system easy to scale as the number of jobs and machines grows, for example, by redeploying idle resources in response to equipment breakdowns or maintenance needs, allowing it to operate reliably even in large factories. This allows the SDF to respond quickly to changes in production demand and improve overall productivity.

2.2. Scheduling

Scheduling is the process of planning and coordinating tasks or activities according to goals [12]. In the manufacturing industry, scheduling refers to the management of production schedules, such as machine assignments and execution times, with the goal of increasing productivity or minimizing manufacturing costs. Scheduling can be categorized into two main methods depending on the manufacturing environment. The first is flow-shop scheduling, which is suitable for environments where the same product is produced repeatedly and all production tasks are processed in a fixed sequence. It has the advantage of maintaining an efficient production flow and is easy to manage because the sequence of tasks is clearly defined. The second is job-shop scheduling, where production tasks are processed by different machines or processes without following a fixed process sequence [11]. This provides the flexibility to customize products according to the individual needs of consumers. Therefore, job-shop scheduling is suitable for future factory environments that produce a wide variety of products, such as software-defined factories.
The job-shop scheduling problem (JSSP) is an optimization problem that allocates jobs to the machines capable of processing them. Each job is composed of operations that must be executed in a given order. Two important assumptions in the JSSP are that a machine cannot process more than one operation simultaneously and that the processing time of an operation does not change while it is running [13]. Algorithms for solving JSSPs are categorized into exact algorithms, heuristic algorithms, and meta-heuristic algorithms. Exact algorithms guarantee globally optimal results but require high computing power and memory cost. Heuristic algorithms provide approximate solutions instead of optimal ones, making them relatively fast and efficient, but they are designed for specific problems and lack generality. To overcome these limitations, meta-heuristic algorithms, which combine or modify multiple heuristic methods, are often used. In particular, reinforcement learning algorithms that can interact with the environment are frequently applied to real-world environments with complex variables [14,15].
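To make the problem structure concrete, the following is a minimal sketch of a JSSP instance and a naive feasible schedule that respects the two assumptions above. The instance data are illustrative, not drawn from this paper.

```python
# A toy JSSP instance: each job is an ordered list of (machine, processing_time)
# operations that must run in sequence.
jobs = [
    [(0, 3), (1, 2), (2, 2)],  # job 0: machine 0 for 3 units, then machine 1, ...
    [(0, 2), (2, 1), (1, 4)],  # job 1
    [(1, 4), (2, 3)],          # job 2
]

def naive_schedule(jobs):
    """Schedule jobs one after another while respecting the two JSSP assumptions:
    one operation per machine at a time, and fixed processing times.
    This is a deliberately weak baseline, not an optimizer."""
    machine_free = {}            # machine id -> time the machine becomes free
    job_free = [0] * len(jobs)   # job id -> completion time of its last operation
    schedule = []
    for j, ops in enumerate(jobs):
        for m, p in ops:
            start = max(job_free[j], machine_free.get(m, 0))
            job_free[j] = machine_free[m] = start + p
            schedule.append((j, m, start, start + p))
    return schedule, max(job_free)

schedule, makespan = naive_schedule(jobs)
print(f"makespan = {makespan}")  # the quantity most JSSP objectives minimize
```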

2.3. Large Language Models

The first generation of generative AI relied on rule-based systems and statistical models to generate simple patterns. The second generation introduced deep learning methods such as recurrent neural networks (RNNs), enabling the better handling of sequential data and improving tasks such as language modeling and text generation. The third generation of generative AI utilized transformer architectures to learn more complex patterns from large amounts of data. GPT, released by OpenAI in 2018, has shown promise in a variety of natural language processing applications, including word inference, classification, Q&A, and machine translation. GPT achieves its performance by pre-training on large amounts of unlabeled natural language data and then fine-tuning on labeled data for specific tasks. GPT, which uses only the transformer’s decoder, learns to predict the next word in a given text and performs well at generating sentences. Google’s BERT (Bidirectional Encoder Representations from Transformers), on the other hand, is a bidirectional language model that uses only the transformer’s encoder and is trained by masking words in a sentence and then predicting them. This allows the model to understand the meaning and context of words more accurately by considering the sentence as a whole [16]. Both models are based on the transformer architecture and represent important advances in natural language processing. The development history of these language models is summarized in Figure 3. The success of these models has spurred the emergence of LLMs with ever-increasing numbers of parameters and amounts of training data [17]. At the same time, language models with fewer parameters, such as Meta’s LLaMA, have also been studied extensively. LLaMA has about 65 billion parameters, which is relatively small compared to other LLMs, yet it demonstrates that strong performance can be achieved with less computing power by increasing training efficiency with high-quality data [18].
Recent research has shown advances and applications of LLMs in various industries. In Tian et al. [19], LLMs improved the system design, model integration, policy verification, and development processes of autonomous driving systems. Li et al. [20] proposed a novel framework integrating LLMs and knowledge graphs to enhance fault diagnosis in the aviation assembly industry, demonstrating industrial applicability through knowledge enhancement, subgraph generation and discovery, and high diagnostic accuracy. In addition, various studies have addressed human–robot collaboration (HRC). Wang et al. [21] applied LLMs in a manufacturing environment to enhance HRC by enabling natural language understanding, dynamic task planning, and optimized tool discovery. The system proposed by Gkournelos et al. [22] utilizes LLMs to process commands in natural language through a natural language interface, moving towards seamless HRC. Xia et al. [23] proposed an intelligent agent that generates task schedules based on human input and integrates them with a digital twin to adapt in real time, enabling seamless HRC in smart manufacturing. Fan et al. [24] utilized LLMs to enhance industrial robots by enabling them to interpret natural language commands, generate accurate motion paths, and autonomously execute manufacturing tasks.

3. Proposed Methods

3.1. Literacy DRL-Based Federated Digital Twin Scheduling

Traditional RL-based scheduling relies on fixed rules that make it difficult to respond flexibly to user needs. Literacy DRL-based federated digital twin scheduling, by contrast, is a user-centric method that uses reinforcement learning to schedule tasks based on user requirements analyzed by an LLM-based literacy module. Figure 4 is a simple schematic illustration of how literacy DRL operates in a future manufacturing plant such as the SDF. User requirements, including product specifications, features, designs, and costs, are analyzed by the literacy module. The literacy module (Figure 5) understands the user requirements expressed in natural language and allows the most relevant digital twin attributes to be prioritized in the scheduling process. While traditional scheduling is inflexible due to fixed rules, the proposed method is based on analyzing user requirements, which can be expected to improve the flexibility and efficiency of the production process. Literacy DRL scheduling also offers the possibility of scaling flexibly to production environments of different sizes. To achieve this, the system dynamically adjusts its scheduling approach based on resource availability and job demand. If production requirements change unexpectedly, such as during an equipment failure, the framework can reevaluate task prioritization and resource allocation to meet the new demand effectively. This adaptability enables robust performance across a wide range of manufacturing sizes and operational scenarios and demonstrates how flexibly dynamic manufacturing conditions are handled. The digital twin attributes are fed into a graph neural network (GNN) along with the state of the machines and operations to be scheduled, as shown in Figure 6. Traditional AI, which requires structured inputs, is difficult to apply to scheduling problems with the high complexity and unstructured characteristics found in the real world. However, GNNs are powerful tools for modeling complex interactions and relationships using a graph structure of nodes and edges, which can be used to extract unstructured data into a structured set of features [25]. For example, machines and operations are represented as nodes in a graph, and the interactions and dependencies between them are represented by edges.
The reinforcement learning policy network in Figure 5 takes the feature sets of the machines and operations as input and generates the various actions that the agent can perform. It then performs scheduling by selecting the action best suited to the environment from among these actions. After executing the selected action, the agent receives a reward from the environment and learns to maximize this reward. During this process, the agent adapts to different situations and conditions and gradually improves its performance through iterative training. The end result is flexible and efficient scheduling that effectively reflects user requirements.
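As a rough illustration of this action-selection step, the following sketch scores each feasible (operation, machine) pair from GNN-style embeddings and samples an action. The dimensions, layer sizes, and masking scheme are assumptions for illustration, not the authors’ architecture.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Scores every (operation, machine) pair and returns an action distribution."""
    def __init__(self, op_dim=8, machine_dim=8, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(op_dim + machine_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, op_feats, machine_feats, feasible_mask):
        # op_feats: (O, op_dim); machine_feats: (M, machine_dim)
        O, M = op_feats.size(0), machine_feats.size(0)
        pairs = torch.cat(
            [op_feats.unsqueeze(1).expand(O, M, -1),
             machine_feats.unsqueeze(0).expand(O, M, -1)], dim=-1)
        scores = self.scorer(pairs).squeeze(-1)                 # (O, M)
        scores = scores.masked_fill(~feasible_mask, float("-inf"))
        return torch.distributions.Categorical(logits=scores.flatten())

policy = PolicyNetwork()
op_feats = torch.randn(4, 8)        # stand-ins for GNN embeddings of 4 operations
machine_feats = torch.randn(3, 8)   # stand-ins for GNN embeddings of 3 machines
mask = torch.ones(4, 3, dtype=torch.bool)  # which pairs are currently feasible
dist = policy(op_feats, machine_feats, mask)
action = dist.sample()
op_idx, m_idx = divmod(action.item(), 3)   # decode the chosen (operation, machine)
```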

3.2. LLM-Based User Requirements Analysis Module

Key performance indicators (KPIs) are used to effectively analyze user requirements. KPIs allow organizations to set important performance metrics for achieving their goals and to continuously evaluate and adjust their progress. They foster collaboration across teams and serve as an important tool for driving performance improvement through data-driven decision-making. In the manufacturing industry, manufacturing cost and productivity are key factors in maximizing cost efficiency and staying competitive. Lowering manufacturing costs allows resources to be used more efficiently, while increasing productivity allows products to be produced more efficiently, enabling sustainable growth [26]. An LLM is used to understand the meaning of requirements more deeply and select the most relevant KPIs. LLMs perform well at understanding general words and sentences but have difficulty with the specialized terms and expressions used in specific domains. This is where fine-tuning can improve the model’s performance [27]. The fine-tuned language model in the literacy module in Figure 5 gains a better understanding of the terms and expressions used in a particular domain, which enables it to assess the relevance of user requirements to KPIs more accurately.
The literacy module performs an analysis that takes into account the structure and meaning of sentences rather than relying on simple keyword matching. Cosine similarity measures similarity using the angle between two vectors without considering their magnitudes, which makes it possible to quantitatively evaluate the relevance of user requirements to KPIs. A user requirement is compared not simply to the KPI terms but to the sentence that defines each KPI. A high similarity score indicates a high degree of relevance between the requirement and the KPI and is used to select the most appropriate KPI. Because this method compares similarity by vectorizing context and meaning, it improves the accuracy of requirements analysis over traditional methods such as word matching and frequency analysis.
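The following is a minimal sketch of this KPI-matching step, with the KPI definition sentences taken from Table 1. The embedding model name is a placeholder; the paper uses a domain fine-tuned BERT model instead.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder embedding model, standing in for the fine-tuned BERT of Figure 5.
model = SentenceTransformer("all-MiniLM-L6-v2")

# KPI definition sentences from Table 1.
kpi_definitions = {
    "manufacturing cost": "All costs incurred during manufacturing.",
    "productivity": "Amount of product produced in a unit of time.",
}

def score_requirement(requirement):
    """Return the cosine similarity between a requirement and each KPI definition."""
    req_vec = model.encode([requirement])
    return {
        kpi: float(cosine_similarity(req_vec, model.encode([definition]))[0, 0])
        for kpi, definition in kpi_definitions.items()
    }

# The KPI with the highest score guides the downstream attribute weighting.
print(score_requirement("Reduce energy consumption to lower production costs."))
```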

3.3. Assigning Digital Twin Attribute Data Weights

Digital twin attribute data consist of metadata, which describe the structure, format, and semantics of the data, and actual data, which include status information, measurements, and more. For user-centered scheduling, digital twin attribute data are selected based on the KPIs and comprehensively reflect the data required for scheduling. The selected attribute data are then weighted according to how strongly they must be considered during scheduling. For example, attribute data that are closely related to the selected KPI are assigned a higher weight and treated as more important in the scheduling process. The data are fed into the GNN along with the machine and operation feature data to be reflected in the agent’s action set. As a result, digital twin attribute data weighted by user requirements enable flexible, user-centric scheduling.
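A small sketch of this weighting step follows; the attribute values and weights are hypothetical, chosen only to illustrate how KPI-derived weights scale the features before they reach the GNN.

```python
import numpy as np

# Hypothetical attribute matrix: one row per machine, one column per digital
# twin attribute (e.g., run time, energy consumption, utilization).
attributes = np.array([
    [12.0, 3.5, 0.81],   # machine 0
    [ 9.5, 4.1, 0.76],   # machine 1
])

# Illustrative KPI-derived weights from the literacy module: attributes closely
# related to the selected KPI (here, manufacturing cost) receive higher weights.
kpi_weights = np.array([0.5, 0.4, 0.1])

# The weighted features are fed into the GNN alongside machine/operation state.
weighted_features = attributes * kpi_weights
print(weighted_features)
```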

3.4. The Reinforcement Learning-Based Scheduling Module

Graphs have the advantage of modeling complex relationships. In scheduling, the operations of each job are represented as nodes; directed conjunctive arcs represent the precedence between two operations of the same job, and undirected disjunctive edges connect operations that can be performed on the same machine. A scheduling problem is then an optimization problem that determines the order in which a given set of jobs is processed, which corresponds to orienting the undirected edges to turn the graph into a directed one [28]. In traditional formulations, the graph contains operation nodes and machine nodes, with processing times attached to the edges between them. The GNN generates feature embeddings for each node, which are then fed into the policy network. This allows the policy network to make informed decisions based on a comprehensive understanding of the scheduling state rather than relying solely on local information. Using GNNs improves the ability of the model to generalize across different problem sizes and configurations, resulting in better scheduling performance.
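As a small illustration of this encoding, the following sketch builds the two edge types for a toy two-job instance using networkx; it is an assumption-laden teaching example, not the authors’ implementation.

```python
import networkx as nx

# Operation nodes keyed by (job, step), each annotated with its machine.
ops = {("J1", 0): "M1", ("J1", 1): "M2", ("J2", 0): "M2", ("J2", 1): "M1"}

# Conjunctive arcs: fixed precedence inside each job (directed graph).
G = nx.DiGraph()
for op, machine in ops.items():
    G.add_node(op, machine=machine)
G.add_edge(("J1", 0), ("J1", 1))
G.add_edge(("J2", 0), ("J2", 1))

# Disjunctive edges: operations competing for the same machine (undirected);
# solving the scheduling problem amounts to orienting each of these edges.
D = nx.Graph()
for a, ma in ops.items():
    for b, mb in ops.items():
        if a < b and ma == mb:
            D.add_edge(a, b)

print(list(D.edges()))  # [(('J1', 0), ('J2', 1)), (('J1', 1), ('J2', 0))]
```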
In this study, we applied digital twin attributes extracted from the analysis of user requirements instead of the processing times considered in existing methods. This approach can reflect more complex requirements than traditional scheduling, which considers only processing time. For example, an optimal schedule can be generated by considering a variety of factors, such as machine health or performance data and the priority of each operation. This does not mean simply placing jobs in chronological order but rather deriving an optimal sequence of jobs that reflects the complex variables of a real-world production environment. The result is a scheduling technique that is more flexible than traditional time-only methods and can effectively reflect the various variables and conditions that occur in real-world manufacturing environments.

3.5. Dynamic Environments Based on Large Language Models

An environment in reinforcement learning, which consists of states, actions, rewards, and state transitions, interacts with an agent and returns rewards in response to actions. The reward provided by the environment is feedback on actions and is an important factor in helping the agent achieve its goal. An LLM that generates appropriate responses based on user input exhibits an interaction pattern similar to that of a reinforcement learning environment. As such, LLMs can play the role of the environment in reinforcement learning by interpreting states and rewarding agents for their behavior. LLM-based environments provide the flexibility to dynamically adjust the reward function as requirements or the environment change. This dynamic adjustment allows the LLM to reflect changes in user requirements in real time and generate appropriate feedback based on them, helping agents make optimal decisions. This approach can produce a reward structure that is more effective than a traditional fixed environment, further improving the learning efficiency of the agent.
The digital twin attributes that affect scheduling are of different natures. For example, for manufacturing cost data, smaller values indicate better performance, whereas for throughput data, larger values indicate better performance. Different data characteristics therefore require different reward structures. An LLM can help agents learn optimal behavior by understanding these data characteristics and dynamically applying the appropriate reward function for each situation. The user-based data selected by the literacy module are applied to an LLM-based reinforcement learning environment, which applies a reward structure that matches the characteristics of the data. This ensures that the agent is rewarded optimally according to user requirements and learns the actions that perform well in various production situations. The dynamic reinforcement learning environment allows the agent to adapt flexibly to complex and changing environments; it is far more flexible than a simple rule-based system and can adapt to complex real-time changes. As a result, agents can continuously improve their performance in different production scenarios and make more efficient and accurate scheduling decisions.
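As an illustration, the following sketch shows one way such a KPI-dependent reward could be structured; the specific mapping is an assumption for illustration, since in the proposed method the LLM-based environment generates the reward.

```python
# Reward rules keyed by KPI: the sign is chosen so that improvement is always
# positive, whether the attribute should be minimized (cost) or maximized
# (throughput). This mapping is hypothetical, not the authors' reward function.
REWARD_RULES = {
    "manufacturing_cost": lambda prev, curr: prev - curr,  # lower cost -> positive reward
    "productivity":       lambda prev, curr: curr - prev,  # higher output -> positive reward
}

def compute_reward(kpi, prev_value, curr_value):
    """Reward the agent according to the characteristic of the selected KPI."""
    return REWARD_RULES[kpi](prev_value, curr_value)

print(compute_reward("manufacturing_cost", prev_value=120.0, curr_value=113.0))  # 7.0
print(compute_reward("productivity", prev_value=650.0, curr_value=696.0))        # 46.0
```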

4. Evaluation

We assessed the effectiveness of the proposed scheduling algorithm by comparing it with the DRL-based job-shop scheduling algorithm and the FIFO algorithm used for flow-shop scheduling [25]. Table 1 summarizes the KPIs used in the literacy module, detailing the type, definition, digital twin attributes, and example requirements for each KPI. Manufacturing cost (KPI 1) and productivity (KPI 2) are central to the scheduling process, where manufacturing cost attributes like machine runtime and energy consumption enable cost-efficient scheduling, while productivity attributes such as facility utilization and energy efficiency support high output rates to meet the market demand. We found that our proposed algorithm works better under different conditions than traditional algorithms with fixed rules. We also compared the scheduling performance in diverse environments through scenarios with different numbers of jobs and machines and found that the effectiveness of our proposed method is more pronounced in scenarios with a large number of jobs and machines.

4.1. Experiment Setup

This experiment was conducted on a computer with an Intel Core i9-11900K CPU and an NVIDIA RTX 3090 GPU, running Ubuntu 20.04 with Python 3.7. The literacy module was built on a BERT model, which has strengths in contextual understanding and natural language processing. The BERT model utilizes a transformer structure similar to GPT, but it is characterized by its ability to understand context in both directions and learn the meaning of words in a sophisticated way. This allowed the literacy module to accurately interpret and analyze the various requirements. For performance evaluation, we utilized generative AI to generate 100 virtual user requirements and applied them as experimental data to verify the processing capability of the module. The scheduling module was optimized using the Proximal Policy Optimization (PPO) algorithm. This algorithm is based on an actor–critic structure and learns optimal behavior while limiting policy variation, which makes it effective for complex scheduling problems. The hyperparameters of reinforcement learning were tuned to maximize performance, and the main hyperparameters are shown in Table 2.
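For reference, the following sketch collects the Table 2 settings into a configuration and wires up the optimizer accordingly; the discount factor is read here as 1.0, and the linear layer is only a stand-in for the GNN policy network of Section 3.

```python
import torch

# PPO training configuration assembled from Table 2.
ppo_config = {
    "learning_rate": 2e-4,       # Table 2: learning rate 0.0002
    "adam_betas": (0.9, 0.999),  # Table 2: Adam optimizer betas
    "gamma": 1.0,                # Table 2: discount factor (read as 1.0)
    "update_epochs": 3,          # Table 2: PPO epochs per policy update
    "max_iterations": 1000,      # Table 2: maximum training iterations
    "minibatch_size": 512,       # Table 2: mini batch size
    "parallel_envs": 20,         # Table 2: parallel iterations
}

policy = torch.nn.Linear(16, 1)  # stand-in for the GNN + policy network
optimizer = torch.optim.Adam(
    policy.parameters(),
    lr=ppo_config["learning_rate"],
    betas=ppo_config["adam_betas"],
)
```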

4.2. Literacy Module Evaluation

The first requirement case involved various user requirements related to manufacturing costs. In total, 100 requirements were entered into the literacy module to calculate their relevance scores with respect to the KPIs. As explained earlier, manufacturing cost and productivity were defined as the KPIs, and the relevance of the requirements to these KPIs was scored using cosine similarity. In this case, the average relevance scores for KPI 1 and KPI 2 were 0.87 and 0.59, respectively. Since the user requirements in this case relate to manufacturing costs, the relevance score for the manufacturing cost KPI is 0.29 higher than that for the productivity KPI. For the second requirement case, which involved various user requirements related to productivity, we similarly input 100 requirements into the literacy module. In this case, the average relevance scores for KPI 1 and KPI 2 were 0.51 and 0.86, respectively, meaning the relevance score of productivity-related requirements to the productivity KPI was 0.35 higher than that to the manufacturing cost KPI. The average relevance scores for each requirement case are summarized in Table 3, while Figure 7 provides a box plot for a visual comparison of these relevance scores across the requirement cases.

4.3. Scheduling Module Evaluation

The scheduling module was evaluated by entering scheduling data weighted differently according to the relevant KPIs identified by the literacy module. For requirement case 1, which is cost-related, the evaluation focused on schedules that minimize manufacturing cost, where smaller attribute values indicate better performance. Conversely, for requirement case 2, which is productivity-related, the focus was on schedules that maximize productivity, where larger values indicate better performance. To train the reinforcement learning model, we performed training for a total of 1000 epochs, with three weight updates per epoch. During the training process, we also improved data quality by tuning various hyperparameters and removing outliers to improve the performance of the model.
Figure 8 and Figure 9 illustrate the average performance of each scheduling algorithm across scenarios with varying numbers of jobs and machines, represented as box plots. FIFO is shown in red, DRL in orange, and literacy DRL in blue. Figure 8 compares manufacturing costs in each scenario and shows that literacy DRL is effective in reducing manufacturing costs, consistently lowering the median manufacturing costs compared to FIFO and DRL. The cost savings of literacy DRL are more pronounced in scenarios with a higher number of jobs and machines (e.g., Scenarios 5 and 6), and the narrow interquartile range within the box indicates a stable and consistent ability to manage manufacturing costs. Figure 9 compares productivity in each scenario and shows that literacy DRL achieves higher median productivity values compared to FIFO and DRL. Especially in complex scenarios such as Scenarios 4–6, literacy DRL maintains a consistently high productivity range, with less variability in the blue box plot, indicating good adaptability to different production scales. These results suggest that literacy DRL provides a more efficient and scalable approach to manufacturing cost control and productivity improvement in dynamic production environments.
Table 4 and Table 5 compare the average performance improvement metrics for each algorithm. For requirement case 1 (cost reduction), literacy DRL demonstrates a performance improvement over FIFO, ranging from a minimum of 5.9% to a maximum of 14.9%. Compared to the DRL method, literacy DRL shows an improvement of 2% to 5.6%. For requirement case 2 (productivity), literacy DRL achieves an improvement over FIFO between 6.1% and 13.4% and an improvement over DRL from 2.6% to 7.6%. These results highlight that literacy DRL is particularly effective for flexible manufacturing environments, such as the SDF, where it adapts to various user requirements. Additionally, the experimental outcomes across different scenarios indicate that literacy DRL is especially beneficial in large-scale settings with a high number of jobs and machines, providing both cost control and productivity enhancements.

5. Conclusions

The software-defined factory (SDF) has emerged to produce the personalized products demanded by the evolving manufacturing industry. It represents an autonomous production system that delivers customized manufacturing services through the digital transformation of manufacturing functions and processes. This enhances competitiveness through a production flexibility that is difficult to achieve in traditional factories, where manufacturing software is tightly coupled with hardware. The industrial metaverse of the SDF allows users to create tailored products that reflect their specific requirements for performance, design, and characteristics. The manufacturing services for production are provided through the reconfiguration of open assets. Open assets are hardware or software components that can easily be added, replaced, or updated to meet requirements in the SDF. This adaptability allows them to remain flexible rather than tied to a fixed configuration, thereby maximizing the efficiency of the production process. In addition, open assets can accommodate hardware adjustments as necessary.
To produce personalized products, it is essential to have manufacturing planning that dynamically satisfies user requirements. However, current fixed, single-rule methods struggle to accommodate diverse needs and varying characteristics. To address this problem, this study introduces literacy DRL-based federated digital twin scheduling, which combines LLMs and DRL. The LLM-based literacy module analyzes requirements in natural language and translates them into scheduling factors by weighting digital twin attributes. In addition to fine-tuning, the literacy module could further enhance domain expertise and understanding through retrieval-augmented generation (RAG) to better reflect user requirements. The scheduling module utilizes DRL to perform optimal scheduling by selecting the job–machine pairs that receive the best rewards in the environment. We also demonstrated that the proposed method responds to various requirements more flexibly than traditional DRL-based scheduling methods by incorporating diverse requirements such as manufacturing cost and manufacturing productivity. Additionally, the LLM-based DRL environment can set the reward function according to the characteristics of the data. The proposed method can be effectively applied to large-scale and diverse tasks, such as those in the SDF, where the relationship between tasks and requirements is very complex.
While the proposed method has demonstrated greater flexibility in addressing different requirements than DRL-based scheduling approaches that consider factors such as manufacturing costs and productivity, it is important to recognize its limitations in real-world applications. The computational overhead associated with LLM-based systems may lead to challenges regarding both performance and scalability in dynamic manufacturing environments. This could make them particularly difficult to apply in scenarios where rapid decision-making is critical. Integrating such systems into existing production systems is also expected to require significant effort. Future research will focus on developing strategies to facilitate the effective integration of LLMs with existing resources and to improve computational efficiency. This could include methods for making AI models lightweight to reduce computational demand, or novel algorithms that reduce the computational resource burden while maintaining LLM performance [29,30]. By addressing these challenges, we expect that the techniques proposed in this paper will not only contribute to the hyper-personalized manufacturing paradigm but also lay the foundation for future innovations in the manufacturing industry.

Author Contributions

Conceptualization, J.A.; Methodology, J.A.; Software, J.A.; Validation, J.A.; Formal analysis, J.A.; Writing—original draft, J.A.; Writing—review & editing, J.A., S.Y., J.-W.K. and W.-T.K.; Supervision, W.-T.K.; Project administration, W.-T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean Government (MSIT) (No. RS-2024-00355259, Metaverse Standards Research Laboratory).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Koren, Y. The Global Manufacturing Revolution: Product-Process-Business Integration and Reconfigurable Systems; John Wiley & Sons: New York, NY, USA, 2010. [Google Scholar]
  2. Hu, S.J. Evolving paradigms of manufacturing: From mass production to mass customization and personalization. Procedia CIRP 2013, 7, 3–8. [Google Scholar] [CrossRef]
  3. Hughes, C. The assembly line at Ford and transportation platforms: A historical comparison of labour process reorganisation. In New Technology, Work and Employment; Wiley Online Library: New York, NY, USA, 2024. [Google Scholar]
  4. Qin, Z.; Lu, Y. Self-organizing manufacturing network: A paradigm towards smart manufacturing in mass personalization. J. Manuf. Syst. 2021, 60, 35–47. [Google Scholar] [CrossRef]
  5. Wang, Y.; Ma, H.S.; Yang, J.H.; Wang, K.S. Industry 4.0: A way from mass customization to mass personalization production. Adv. Manuf. 2017, 5, 311–320. [Google Scholar] [CrossRef]
  6. Ulrich, M.; Lux, G.; Jürgensen, L.; Reinhart, G. Automated and cycle time optimized path planning for robot-based inspection systems. Procedia CIRP 2016, 44, 377–382. [Google Scholar] [CrossRef]
  7. Kim, D.K.; Kim, H.J.; Lee, T.E. Optimal scheduling for sequentially connected cluster tools with dual-armed robots and a single input and output module. Int. J. Prod. Res. 2017, 55, 3092–3109. [Google Scholar] [CrossRef]
  8. Foumani, M.; Gunawan, I.; Smith-Miles, K. Increasing throughput for a class of two-machine robotic cells served by a multifunction robot. IEEE Trans. Autom. Sci. Eng. 2015, 14, 1150–1159. [Google Scholar] [CrossRef]
  9. Ritterbusch, G.D.; Teichmann, M.R. Defining the metaverse: A systematic literature review. IEEE Access 2023, 11, 12368–12377. [Google Scholar] [CrossRef]
  10. Onaji, I.; Tiwari, D.; Soulatiantork, P.; Song, B.; Tiwari, A. Digital twin in manufacturing: Conceptual framework and case studies. Int. J. Comput. Integr. Manuf. 2022, 35, 831–858. [Google Scholar] [CrossRef]
  11. Wang, L.; Pan, Z.; Wang, J. A review of reinforcement learning based intelligent optimization for manufacturing scheduling. Complex Syst. Model. Simul. 2021, 1, 257–270. [Google Scholar] [CrossRef]
  12. Dauzère-Pérès, S.; Ding, J.; Shen, L.; Tamssaouet, K. The flexible job shop scheduling problem: A review. Eur. J. Oper. Res. 2024, 314, 409–432. [Google Scholar] [CrossRef]
  13. Fang, Y.; Peng, C.; Lou, P.; Zhou, Z.; Hu, J.; Yan, J. Digital-twin-based job shop scheduling toward smart manufacturing. IEEE Trans. Ind. Inform. 2019, 15, 6425–6435. [Google Scholar] [CrossRef]
  14. Liu, C.L.; Chang, C.C.; Tseng, C.J. Actor-critic deep reinforcement learning for solving job shop scheduling problems. IEEE Access 2020, 8, 71752–71762. [Google Scholar] [CrossRef]
  15. Chen, R.; Yang, B.; Li, S.; Wang, S. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput. Ind. Eng. 2020, 149, 106778. [Google Scholar] [CrossRef]
  16. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  17. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
  18. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
  19. Tian, Y.; Li, X.; Zhang, H.; Zhao, C.; Li, B.; Wang, X.; Wang, F.Y. VistaGPT: Generative parallel transformers for vehicles with intelligent systems for transport automation. IEEE Trans. Intell. Veh. 2023, 8, 4198–4207. [Google Scholar] [CrossRef]
  20. Li, P.; Lu, Q.; Zhao, X.; Tao, B. Joint knowledge graph and large language model for fault diagnosis and its application in aviation assembly. IEEE Trans. Ind. Inform. 2024, 20, 8160–8169. [Google Scholar] [CrossRef]
  21. Wang, T.; Fan, J.; Zheng, P. An LLM-based vision and language cobot navigation approach for Human-centric Smart Manufacturing. J. Manuf. Syst. 2024, 75, 299–305. [Google Scholar] [CrossRef]
  22. Gkournelos, C.; Konstantinou, C.; Makris, S. An LLM-based approach for enabling seamless Human-Robot collaboration in assembly. CIRP Ann. 2024, 73, 9–12. [Google Scholar] [CrossRef]
  23. Xia, Y.; Shenoy, M.; Jazdi, N.; Weyrich, M. Towards autonomous system: Flexible modular production system enhanced with large language model agents. In Proceedings of the 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA), Sinaia, Romania, 12–15 September 2023; IEEE: New York, NY, USA, 2023; pp. 1–8. [Google Scholar]
  24. Fan, H.; Liu, X.; Fuh, J.Y.H.; Lu, W.F.; Li, B. Embodied intelligence in manufacturing: Leveraging large language models for autonomous industrial robotics. J. Intell. Manuf. 2024. [Google Scholar] [CrossRef]
  25. Song, W.; Chen, X.; Li, Q.; Cao, Z. Flexible job-shop scheduling via graph neural network and deep reinforcement learning. IEEE Trans. Ind. Inform. 2022, 19, 1600–1610. [Google Scholar] [CrossRef]
  26. Contini, G.; Peruzzini, M. Sustainability and industry 4.0: Definition of a set of key performance indicators for manufacturing companies. Sustainability 2022, 14, 11004. [Google Scholar] [CrossRef]
  27. Ziegler, D.M.; Stiennon, N.; Wu, J.; Brown, T.B.; Radford, A.; Amodei, D.; Christiano, P.; Irving, G. Fine-tuning language models from human preferences. arXiv 2019, arXiv:1909.08593. [Google Scholar]
  28. Lei, K.; Guo, P.; Zhao, W.; Wang, Y.; Qian, L.; Meng, X.; Tang, L. A multi-action deep reinforcement learning framework for flexible Job-shop scheduling problem. Expert Syst. Appl. 2022, 205, 117796. [Google Scholar] [CrossRef]
  29. Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
  30. Gu, Y.; Dong, L.; Wei, F.; Huang, M. MiniLLM: Knowledge distillation of large language models. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
Figure 1. Concept of the software-defined factory.
Figure 2. Scenarios for the software-defined factory.
Figure 3. History of language model development.
Figure 4. Literacy DRL-based scheduling behavior in the software-defined factory. Different colors represent types of digital twin attributes. Labels like “R” (Resource), “H” (Hardware Module), and “App” (Application) show the organization of different resources, hardware, and applications within the digital twin framework.
Figure 5. Literacy DRL-based federated digital twin scheduling training steps. The colored action sets represent those selected through reinforcement learning during the training process, with the gray action sets indicating additional resources now being scheduled as part of the ongoing planning.
Figure 6. Data flow mechanism of literacy module in scheduling.
Figure 7. Requirement cases—KPIs relevance score comparison.
Figure 8. Comparison of manufacturing costs for each scenario.
Figure 9. Comparison of productivity for each scenario.
Table 1. KPI settings for the literacy module.

Type | Definition | DT Attributes | Example Requirement
KPI 1 (Manufacturing cost) | All costs incurred during manufacturing. | Machine run time, energy consumption, etc. | The goal is to develop and invest in innovative new products to expand our market presence.
KPI 2 (Productivity) | Amount of product produced in a unit of time. | Facility utilization, energy efficiency, etc. | The goal is to meet market demand and drive sales growth.
Table 2. Hyperparameters for the scheduling module.

Hyperparameter | Value
Learning Rate | 0.0002
Adam Optimizer Betas | [0.9, 0.999]
Discount Factor | 1.0
Number of Epochs | 3
Maximum Iterations | 1000
Mini Batch Size | 512
Parallel Iterations | 20
Table 3. Average relevance score by requirement case. KPI 1 was defined as manufacturing cost and KPI 2 as manufacturing productivity.

Case | KPI 1 | KPI 2
Requirements Case 1 | 0.87 | 0.59
Requirements Case 2 | 0.51 | 0.86
Table 4. Comparison of average performance for Requirement Case 1 (cost reduction) across scenarios with different numbers of jobs (J) and machines (M).

Requirements Case 1 (Cost Reduction) | Scenario 1 (J: 10, M: 5) | Scenario 2 (J: 15, M: 5) | Scenario 3 (J: 15, M: 10) | Scenario 4 (J: 20, M: 10) | Scenario 5 (J: 20, M: 15) | Scenario 6 (J: 25, M: 15)
FIFO | 55.11 | 75.61 | 86.11 | 103.19 | 131.62 | 149.1
DRL | 52.91 | 70.18 | 79.99 | 93.68 | 119.05 | 134.45
Literacy DRL | 51.84 | 68.11 | 77.25 | 89.61 | 113.36 | 126.8
Improvement | 2.02% | 2.94% | 3.42% | 4.34% | 4.77% | 5.67%
Table 5. Comparison of average performance for Requirement Case 2 (productivity) across scenarios with different numbers of jobs (J) and machines (M).

Requirements Case 2 (Productivity) | Scenario 1 (J: 10, M: 5) | Scenario 2 (J: 15, M: 5) | Scenario 3 (J: 15, M: 10) | Scenario 4 (J: 20, M: 10) | Scenario 5 (J: 20, M: 15) | Scenario 6 (J: 25, M: 15)
FIFO | 653.62 | 748.2 | 1031.8 | 1234.4 | 1399.8 | 1599.8
DRL | 677.6 | 785.15 | 1085.6 | 1319.8 | 1486.13 | 1713.38
Literacy DRL | 696.31 | 811.95 | 1137.17 | 1399.43 | 1584.64 | 1848.28
Improvement | 2.68% | 3.31% | 4.75% | 5.69% | 6.21% | 7.29%

