Towards DevOps for Cyber-Physical Systems (CPSs): Resilient Self-Adaptive Software for Sustainable Human-Centric Smart CPS Facilitated by Digital Twins
Abstract
1. Introduction and Background
1.1. Toward Resilient, Sustainable, and Human-Centric Smart CPSs: The Potential of DevOps
- Learn. DevOps significantly emphasizes comprehensive system monitoring and data collection [25], which facilitates knowledge acquisition and understanding of system behavior and aging. This understanding allows designers and operators to optimize the system towards enhanced value delivery.
- Train. DevOps promotes the convergence of engineering and operation through continuous experimentation, which enables engineers (e.g., developers, operators, and commissioning workers) to test, compare, and evaluate system changes remotely in the real operational environment before their actual (remote) deployment to production [36]. In addition, continuous experimentation can be used for the remote training of people in real operational environments instead of using simulations only.
- Value. DevOps facilitates automating engineering tasks such as software deployment and testing during multiple lifecycle phases, including development, commissioning, and maintenance [23]. Hence, DevOps can free engineers from laborious, error-prone, and time-consuming routine tasks so that they can focus on more creative and value-added activities.
- Collaborate. As stated, we use DevOps to optimize value delivery: it streamlines human-driven system evolution by effectively supporting system change and change processes at development time and runtime. However, changing a system during runtime is risky, particularly when adapting safety and mission-critical systems. Hence, our proposed approach also supports machine-driven system adaptation to provide self-protection mechanisms that ensure system safety and integrity during adaptation. These self-adaptation mechanisms can also enhance human–machine collaboration by continuously adapting machine behavior to the needs of individual humans in real time.
1.2. Design Space for the Structured Engineering of Resilient and Sustainable Smart CPSs
- The various types of uncertainty, covering also the risks identified during our stakeholder analysis described above. Table 2 provides examples per uncertainty type.
- The explicit representations of the resources on which the change processes work: (a) the architecture description and the system implementation for evolution and (b) the runtime model and running system for adaptation.
Type | Domain * | Description |
---|---|---|
Goal uncertainty | Evolution | The goals/requirements of the system may be subject to uncertainties, e.g., user requirements, communication protocols, workflows, and processes may dynamically change in ways that are difficult to predict. |
Environmental uncertainty | Adaptation | The context of the system may be subject to uncertainties, e.g., the availability of resources may dynamically change in ways that are difficult to predict. |
Model uncertainty | Adaptation | The runtime architecture model may be subject to uncertainties (i.e., model uncertainty [41]), e.g., the model may only provide inaccurate predictions of an algorithm’s resource usage in time and memory consumption. |
Adaptation uncertainty | Adaptation | The adaptation itself may be subject to uncertainties, e.g., the analysis of a real situation may result in the selection of an inaccurate execution plan. |
Monitoring uncertainty | Both | Monitoring may be subject to uncertainties, e.g., measuring physical quantities such as distance and pressure may be subject to noise. The cyclic update of measurement values obtained from external services may be subject to unexpected delays and jitter. |
Change enactment uncertainty | Both | Change enactment may be subject to uncertainty, e.g., the time to change a service may differ from the expected time. Applied changes may have unexpected side effects, e.g., due to high CPU or memory contention. |
- Primary assets (PAs) denote the processes, machinery, and heavy assets that implement the intended CPSs’ functionality by affecting the physical world. Examples of processes are power generation, power distribution, and goods manufacturing. Examples of machinery and heavy assets are turbines, generators, pumps, presses, and transportation systems.
- Secondary assets (SAs) denote the information processing and communication technology (ICT) infrastructure required to enhance the primary assets with computing capabilities. This comprises all hardware, software, and network elements required to control the primary assets. Examples of hardware are sensors, actuators, fieldbus systems, and control units (also denoted as programmable logic controllers (PLCs)). Examples of software and networks comprise all software services and data processed and transmitted via ICT infrastructure. Where appropriate, we distinguish ICT infrastructure between Information Technology (IT) and Operation Technology (OT). In principle, IT is designed for the consumer market. In contrast, OT is hardened for safety [16,18], which is an essential system property in critical industrial environments where failures of OT may lead to severe incidents that can cause significant financial losses and harm to humans and the environment. Section 2 provides further details on IT and OT.
- Real-time closed-loop control of primary assets denotes the most fundamental CPS capability and is realized through the computing and communication capabilities provided by the secondary assets. Our case study investigates a specific CPS class required to provide hard real-time sub-millisecond closed-loop control on resource-constrained embedded devices for critical infrastructure control. A domain-specific example is the excitation control system responsible for maintaining the voltage and frequency of the generator within acceptable limits, which is critical for the stable and reliable feed-in to the power distribution system.
- The Digital Twin is becoming a de facto standard element to enrich CPS asset capabilities and the smartness of CPS. Nevertheless, it is a novel paradigm in its early stages. So far, research has not agreed on a common definition [42], and with the increasing research on DT, new application areas and definitions emerge such as the one provided in Section 2.1. In industry, even pioneer companies are still in the initial adoption phases [5,19]. Hence, we decided to add Digital Twin to the smart CPS design space and placed the Digital Twin technology element outside the circle of the most fundamental smart CPS components.
- Adaptability denotes the ability of a system to adjust its properties to new conditions. The sources of such new conditions are represented in the AdEpS model through uncertainty—see [9] for a detailed explanation. Table 1 summarizes and Section 9 discusses how this work addresses the uncertainty types highlighted in Figure 1b. Adaptability is realized through two principal properties of self-adaptive systems: parameter-based and architecture-based adaptation. Parameter-based adaptation is the ability of a software system to adapt internal variables to, e.g., optimize the parameters of a control function or communication protocol at runtime. In contrast, architecture-based adaptation is the ability of a software system to change its structure to, e.g., update software components and the way they are composed and interact. Thereby, architecture-based adaptation supports the evolution of software systems to cope with novelty and unanticipated change (i.e., design/goal uncertainty).
- Context- and Self-Aware Ability: Self-awareness means the system knows its own states and behaviors. Context-awareness means that the system is aware of its context, i.e., its operational environment. Both properties are based on self- and context-monitoring, which reflects what properties are monitored [56]. Context-awareness is required to obtain awareness of environmental uncertainty. On the other hand, self-awareness is required to address change-enactment uncertainty.
- Manageability in our case study denotes the ability of the system to make change-management capabilities (i.e., the functions necessary for system adaptation and evolution) easily accessible for humans and machines throughout all system lifecycle phases.
1.3. Research Agenda and Methodology
- RQ1
- Context- and Self-Aware Ability: How to design and integrate Digital Twins into software systems so that they provide a high-fidelity (i.e., real-time sub-millisecond) reflection of their CPS operational context in physical space and cyberspace that is available for autonomous (i.e., context-aware and self-aware) decision-making on resource-constrained embedded control units?
- RQ2
- Adaptability: How to design control software for resource-constrained embedded control units that supports parameter-based and architecture-based adaptation without causing system downtime and interruptions during the simultaneous execution of sub-millisecond real-time closed-loop control services?
- RQ3
- Manageability: How to automate and orchestrate distributed adaptations in the CPS physical space and cyberspace to support practitioners (i.e., development, operation, maintenance, and commissioning engineers) in mastering software-related complexities during CPS development and operations (i.e., monitoring, deployment, commissioning, control/operation, and maintenance)?
2. State-of-the-Art Analysis of Digital Twins in Modern Industrial Environments
2.1. Definition and Fundamental Concepts of Digital Twins
“A Digital Twin is a virtual representation of its physical counterpart. Its components provide the basis for a simulation or are simulation models themselves. The Digital Twin has an automated bidirectional data connection with the represented physical counterpart. This connection may span across several life phases of the system”.
“A Digital Twin is a virtual representation of a tangible or intangible object. This concept involves cloning a real object into a software counterpart, the logical object. The logical object bidirectionally reflects the essential properties and characteristics of the real object as required by the specific operational context and lifecycle phase (i.e., the Digital Twin’s intended use). The logical object shall provide a service-oriented interface for seamless interaction and composition with other logical objects and software services and to facilitate the real and logical objects’ cooperation and co-evolution to cope with changing system goals”.
“Services are self-describing, open components that support rapid, low-cost composition of distributed applications. Services are offered by service providers—organizations that procure the service implementations, supply their service descriptions, and provide related technical and business support. Since services may be offered by different enterprises and communicate over the Internet, they provide a distributed computing infrastructure for both intra- and cross-enterprise application integration and collaboration.
Service descriptions are used to advertise the service capabilities, interface, behavior, and quality. Publication of such information about available services provides the necessary means for discovery, selection, binding, and composition of services. In particular, the service capability description states the conceptual purpose and expected results of the service (by using terms or concepts defined in an application-specific taxonomy). The service interface description publishes the service signature (its input/output/error parameters and message types). The (expected) behavior of a service during its execution is described by its service behavior description (for example, as a workflow process). Finally, the Quality of Service (QoS) description publishes important functional and nonfunctional service quality attributes, such as service metering and cost, performance metrics (response time, for instance), security attributes, (transactional) integrity, reliability, scalability, and availability. Service clients (end-user organizations that use some service) and service aggregators (organizations that consolidate multiple services into a new, single service offering) utilize service descriptions to achieve their objectives”.
2.2. Challenges
2.2.1. The Challenge of IT and OT Convergence
2.2.2. The Challenge of Runtime and Development Time Convergence
2.3. Modern Industrial Environments
2.4. Modern Industrial Software Architectures
3. Requirements for DT-Enhanced Embedded Control Software
- R1
- Primary Asset Closed-Loop Control: The SWF shall support sub-millisecond real-time closed-loop control of primary assets.
- R2
- Primary Asset Monitoring Fidelity: The SWF shall support logical objects that reflect primary asset states at a sub-millisecond time scale.
- R3
- Secondary Asset Monitoring Fidelity: The SWF shall support logical objects that reflect secondary asset states at a sub-millisecond time scale.
- R4
- Secondary Asset Closed-Loop Control: The SWF shall support sub-second parameter-based self-adaptation of primary and secondary asset parameters.
- R5
- Secondary Asset Management: The SWF shall support architecture-based self-adaptation of both mission-critical and non-critical control unit services.
- R6
- Servitization: The SWF shall provide interfaces that allow access to the primary and secondary asset logical objects’ monitoring and adaptation services for integration and use at all CPS layers.
4. Design of a DT-Enhanced Self-Adaptive Software Framework for Mission-Critical Industrial Process Control
4.1. Software Architecture of the DT-Enhanced Control Services
- All microservices run as native processes hosted by the control unit.
- All microservices are hosted within a container (similar to the containerized DT) that runs on the control unit.
- A mixed configuration where some microservices run as native processes and others within a container environment.
- Practitioners can dynamically interact with the system using a well-defined command-line interface.
- Practitioners can use scripts to automate tasks that can be verified beforehand, lowering the risk of errors.
- Practitioners can use scripts to define context-aware tasks for autonomous execution by, e.g., transferring script execution to the SA control service.
4.2. DT-Enhanced Context-Aware Self-Adaptation
- 1.
- The PA and SA control services are responsible for the autonomous monitoring and control of their associated PAs and SAs under normal operating conditions (R1, R2, R3, R4);
- 2.
- The PA and SA control services (or an additional local management service) are responsible for managing the autonomous adaptation and self-adaptation of system properties in reaction to uncertainties (see Figure 1b) in physical space and cyberspace (R1, R2, R3, R4, R6); and
- 3.
- The PA and SA control services co-operate with external managing systems to ensure the reliable (i.e., self-protected) and context-aware orchestration of system adaptations across CPS software and system layers (R1, R2, R3, R4, R5, R6).
4.3. The Secondary Asset Digital Twin for Enhanced Management and Control
- 1.
- The input processing stage implements the subscriber pattern. The subscriber reads and receives messages from the network layers and performs routing and traffic shaping to optimize the data flow through the pipeline stages.
- 2.
- The primary function-processing stage implements the intended system functionality, which is, in the first place, the autonomous control of the PAs. To that aim, the PA control service processes the received data and updates the software representation (i.e., the logical object) of the PA. The SA control service operates similarly alongside the PA control service. The states of both the PAs and SAs are stored in their respective storage elements. Read and write access to these storage elements is implemented using highly efficient (wait-free) read–acquire and write–release operations, which ensures that both services can easily and efficiently access information about their operational context in physical and cyberspace via the PA and SA storage, respectively. Read and write access to the PA and SA storage elements can be considered deterministic if caching effects are negligible.
- 3.
- The output processing stage implements the publisher pattern. It reads data from the PA and SA storage elements to create and publish messages to registered subscribers.
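The three stages above can be sketched as one cyclic loop. The following Python sketch is illustrative only: the names `Storage` and `run_cycle`, the message layout, and the per-cycle input limit `max_in` are assumptions that do not appear in the framework, and a plain dictionary stands in for the wait-free storage elements.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Stand-in for a PA or SA storage element (hypothetical; not wait-free)."""
    state: dict = field(default_factory=dict)


def run_cycle(inbox: deque, pa: Storage, sa: Storage, max_in: int) -> list:
    """Execute one pipeline cycle: input, primary function, output."""
    # Stage 1 (subscriber): read at most max_in messages; the rest stay queued.
    batch = [inbox.popleft() for _ in range(min(max_in, len(inbox)))]

    # Stage 2 (primary function): update the logical objects of the assets.
    for msg in batch:
        target = pa if msg["asset"] == "PA" else sa
        target.state[msg["key"]] = msg["value"]

    # Stage 3 (publisher): read both storage elements and build output messages.
    return [
        {"asset": name, "key": key, "value": value}
        for name, store in (("PA", pa), ("SA", sa))
        for key, value in sorted(store.state.items())
    ]


inbox = deque([
    {"asset": "PA", "key": "voltage", "value": 10.5},
    {"asset": "SA", "key": "cpu_load", "value": 0.4},
    {"asset": "PA", "key": "voltage", "value": 10.7},
])
pa, sa = Storage(), Storage()
published = run_cycle(inbox, pa, sa, max_in=2)
```

In this sketch, the third message stays queued for the next cycle because the input stage is limited to two messages per cycle, which mirrors the idea of a bounded input portion.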
4.4. The Timing Model of the Cyclic Real-Time Processing Pipeline
5. Service-Oriented Instrumentation and Self-Adaptation of the DT-Enhanced Control Service
5.1. Experiment Setup: Hardware Setup and Software Configuration of the DT-Enhanced Control Service
5.2. Experimental Execution: Instrumentation and Self-Adaptation
6. Experiment 1: Twinning Fidelity and Real-Time Control Characteristics
6.1. Experimental Design
- Service cycle time;
- Cycle portions per pipeline state and sub-stage;
- The number of messages received, to be processed, and to be sent per cycle;
- The size per received, processed, and sent message.
6.2. Results and Analysis
6.2.1. Timing and Cycle Usage Characteristics
- Columns and legend: All figures are structured into six columns, each representing one of the six possible configurations of factor1. Each column’s legend and heading indicate the configuration shown in that column. The PF input portion represents factor1. Equations (1) and (2) define the dependencies between the PF input, function, and output sub-portions. Each column is structured into a top and a bottom diagram. The bottom diagram shows the discrete measurement values, whereas the top diagram shows the distribution of these measurements as a histogram.
- Y-Axis: Factor2 is plotted along the y-axis and shows the configured number of messages that shall be processed per cycle.
- X-Axis: The dependent variable under investigation is plotted along the x-axis. Each plotted x-axis sample represents a measurement result calculated in the twinning state of a single execution cycle. In other words, no moving-average algorithm is applied.
- Annotations: Each figure is annotated with vertical lines or boxes to indicate relevant properties of the reference model.
6.2.2. Data Flow Characteristics
- Figure A1 shows the number of received messages per cycle.
- Figure A3 shows the number of messages sent per cycle.
- Figure A1 shows the number of messages dropped during the send (i.e., output) step. Such a drop can occur due to backpressure at the message destination and due to routing decisions. The experimental results in Section 7.2 illustrate how the backpressure propagates through the system.
6.3. Interpretation
- Diagram (a)
- This diagram shows the maximum number of messages that can be fully processed per cycle. This figure is derived from Figure 20 and shows the dependency between the factors under investigation. Factor1 is plotted along the x-axis. Factor2 is plotted along the y-axis.
- Diagram (b)
- This diagram shows the corresponding effective throughput ratios, i.e., the amount of message payload that can be processed per second.
- Diagram (c)
- This diagram shows the corresponding throughput ratios, taking into account the overhead introduced by the implemented protocol header. The protocol header carries the information required for functionalities such as error handling, twinning, routing, and traffic shaping. The protocol header in the tested implementation is not optimized for size and comprises 250 bytes. A first investigation showed that the header size could be reduced relatively easily to below 70 bytes if all twinning features of the framework are activated. If only the most essential features are activated, the header size can be reduced to below 30 bytes. A further reduction in the header size requires a more elaborate investigation.
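To illustrate why the header size matters for the effective throughput, the following sketch computes the share of each message that is payload for the three header sizes mentioned above. The payload size of 250 bytes is a hypothetical value chosen only for illustration, and 70 and 30 bytes are used as upper bounds for the reduced headers.

```python
def payload_fraction(payload_bytes: int, header_bytes: int) -> float:
    """Fraction of each transmitted message that is payload."""
    return payload_bytes / (payload_bytes + header_bytes)


PAYLOAD_BYTES = 250  # hypothetical payload size, not taken from the experiments

# Header sizes from the text: tested implementation, all twinning features,
# and only the most essential features (the latter two are upper bounds).
fractions = {
    header: payload_fraction(PAYLOAD_BYTES, header)
    for header in (250, 70, 30)
}

for header, fraction in fractions.items():
    print(f"header={header:3d} B -> payload share {fraction:.1%}")
```

Under this assumed payload size, shrinking the header from 250 to 30 bytes raises the payload share from one half to almost nine tenths of each message.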
- (UC-1)
- A use case supporting low message-throughput ratios while providing a large cycle portion for the execution of the intended (i.e., primary) functionality. The experiment evaluated such a use case with the input and output portion set to 5%, which reserves 60% of the cycle for the execution of the primary function. In this configuration, two twinning messages and one general message are processed.
- (UC-2)
- A use case supporting high message throughput ratios while providing a smaller cycle portion for the execution of the intended (i.e., primary) functionality. The experiment evaluated such a use case with the input and output portion set to 30%, which reserves 10% of the cycle for primary function execution. In this configuration, 2 twinning messages and 18 general messages are processed.
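Both use cases are consistent with a simple linear cycle budget in which the input, function, and output portions of the primary function service add up to 70% of the cycle. This total is inferred here from the two reported configurations only; it is not taken from Equations (1) and (2) and should be read as an assumption. The function name `function_portion` is likewise hypothetical.

```python
PF_TOTAL_PERCENT = 70  # assumption inferred from UC-1 and UC-2, not from the source equations


def function_portion(io_portion_percent: int) -> int:
    """Cycle share left for the primary function when the input and the
    output portion each use io_portion_percent of the cycle."""
    remaining = PF_TOTAL_PERCENT - 2 * io_portion_percent
    if remaining < 0:
        raise ValueError("input and output portions exceed the assumed budget")
    return remaining


uc1 = function_portion(5)   # UC-1: input and output portion set to 5%
uc2 = function_portion(30)  # UC-2: input and output portion set to 30%
```

The sketch makes the trade-off explicit: every percentage point added to the input and output portions removes two percentage points from the primary function.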
7. Experiment 2: Context-Aware Self-Adaptation Characteristics
7.1. Experimental Design
- 1.
- Monitor: At the end of every cycle, all services (i.e., the data sources (SRCs)) twin their state into their corresponding queues. The CLT subsequently fetches these states. The latency between twinning and fetching is denoted as the M latency.
- 2.
- Analyze and Plan: The CLT checks the trigger condition, which is activated if the backpressure counter of the RT PF service is greater than 500 for more than 2 ms. The latency between the state fetching and the positive evaluation of the trigger condition is denoted as the AP latency.
- 3.
- Execute: The countermeasure shall be executed on a positive trigger condition. In this experiment, the countermeasure shall reduce the message throughput to reduce the service load. Therefore, the CLT reconfigures the routing mechanisms of the RT PS input service to drop all messages that are not classified as critical. In other words, best-effort traffic routing is deactivated, which is achieved via updating the message delivery state of the RT PF service. The service reconfiguration time is denoted as the E latency and is equivalent to the execution time of the update command. An update command consists of several sub-commands to immediately check the system integrity by evaluating if the requested command could be successfully executed.
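The trigger condition of the analyze and plan step can be sketched as a threshold check with a hold time. The class name `SustainedTrigger`, its method, and the sample values below are hypothetical; only the threshold of 500 and the duration of 2 ms come from the experiment description.

```python
from typing import Optional


class SustainedTrigger:
    """Fires when a counter stays above a threshold for longer than hold_ms."""

    def __init__(self, threshold: int = 500, hold_ms: float = 2.0) -> None:
        self.threshold = threshold
        self.hold_ms = hold_ms
        # Time at which the counter first exceeded the threshold, if it still does.
        self._since_ms: Optional[float] = None

    def update(self, now_ms: float, backpressure: int) -> bool:
        if backpressure <= self.threshold:
            self._since_ms = None  # condition broken, restart the hold timer
            return False
        if self._since_ms is None:
            self._since_ms = now_ms
        return (now_ms - self._since_ms) > self.hold_ms


trigger = SustainedTrigger()
# (timestamp in ms, backpressure counter) pairs; made-up sample data.
samples = [(0.0, 400), (0.5, 600), (1.5, 650), (2.5, 700), (3.0, 800), (3.5, 300)]
fired = [trigger.update(t, bp) for t, bp in samples]
```

With these samples, the trigger fires only at 3.0 ms, i.e., once the counter has exceeded 500 for more than 2 ms, and it resets as soon as the counter drops below the threshold again.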
7.2. Results and Analysis
- Diagram (a)
- This diagram shows the time series data of the RT PS input service that acts as a data generator. The service simulates a continuous message burst and sends all messages to the RT PF service.
- Diagram (b)
- This diagram shows the time series data of the RT PF service that processes the received data and forwards them to the RT PS output service.
- Diagram (c)
- This diagram shows the time series data of the RT PS output service that processes the received data and simulates data forwarding.
- x-axis: Execution time in seconds.
- Primary y-axis: The backpressure is plotted along the primary y-axis on the left. The primary y-axis scaling is different between the plots.
- Secondary y-axis: The number of messages sent per cycle is plotted along the secondary y-axis on the right side. The secondary y-axis scaling is synchronized between the plots.
- (a)
- If the received backpressure reaches its limit, then messages received via the network are rejected, which results in an increase in backpressure at the message sender.
- (b)
- If the input backpressure reaches its limit, the input stage can no longer forward messages, which propagates backward and immediately results in an increasing received backpressure.
- (c)
- In contrast, output backpressure does not propagate backward to the primary function stage because the primary function does not maintain a queue. Instead, the output stage must actively pull data from the storage elements of the primary function stage. Nonetheless, output backpressure may propagate to the input stage if messages are directly routed to the output stage. This is indicated by the data flow between the subscriber and publisher services in Figure 8.
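Cases (a) and (b) can be reproduced with two bounded queues in series. The following simulation is a simplified sketch: the queue limits, the message rates, and the names `BoundedQueue` and `simulate` are assumptions chosen for illustration and do not reflect the framework's actual configuration.

```python
from collections import deque


class BoundedQueue:
    def __init__(self, limit: int) -> None:
        self.items: deque = deque()
        self.limit = limit

    def offer(self, item) -> bool:
        """Enqueue item; return False (rejected) when the queue is at its limit."""
        if len(self.items) >= self.limit:
            return False
        self.items.append(item)
        return True


def simulate(cycles, sent_per_cycle, consumed_per_cycle, received_limit, input_limit):
    received = BoundedQueue(received_limit)  # messages received via the network
    inputs = BoundedQueue(input_limit)       # messages waiting for the primary function
    sender_backpressure = 0
    for _ in range(cycles):
        # (a) Rejected messages increase the backpressure at the sender.
        for message in range(sent_per_cycle):
            if not received.offer(message):
                sender_backpressure += 1
        # (b) The input stage forwards only while the input queue has room.
        while received.items and inputs.offer(received.items[0]):
            received.items.popleft()
        # The primary function consumes a fixed number of messages per cycle.
        for _ in range(min(consumed_per_cycle, len(inputs.items))):
            inputs.items.popleft()
    return sender_backpressure, len(received.items), len(inputs.items)


overloaded = simulate(cycles=5, sent_per_cycle=3, consumed_per_cycle=1,
                      received_limit=4, input_limit=2)
balanced = simulate(cycles=5, sent_per_cycle=1, consumed_per_cycle=1,
                    received_limit=4, input_limit=2)
```

In the overloaded run, the input queue fills first, then the received queue, and only then does the sender start to see rejections, which is the backward propagation described above.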
- 1.
- M: The latency corresponds to the monitoring step and denotes the latency between the twinning at the service side (i.e., the source) and the processing of the twinned data by the CLT.
- 2.
- AP: The latency corresponds to the analyze–plan steps and denotes the time the CLT requires to evaluate the trigger condition shown in Figure 22.
- 3.
- E: The latency corresponds to the execute step and denotes the time to execute the series of commands and invariant checks required to reconfigure the message routing of the primary function service.
- 4.
- MAP: The latency shows the accumulated latency of the monitor, analyze, and plan steps.
- 5.
- MAPE: The latency shows the accumulated latency of the entire MAPE loop.
7.3. Interpretation
8. Experiment 3: Service Update and Reconfiguration Characteristics
8.1. Experimental Design
- 1.
- In the first phase, the RT PF B service is deployed and configured. This phase consists of steps to . The experiment is designed to evaluate the adaptation capabilities of the SWF. In order to avoid any blurring effects, the PF B service remains unchanged, unlike in an actual A/B testing situation. Instead, PF B is a replica of PF A with the same complexity, execution time, inputs, and outputs as defined by the constant factors shown in Table 5.
- 2.
- In the second phase, the RT PF B service is linked to the RT PS input and output services, resulting in the side-by-side operation of services A and B. Note that the side-by-side operation represents the observation phase to compare the service behavior in an A/B testing scenario. At the end of the second phase, both services receive the same input messages and send their output messages to the RT PS output service. This phase consists of steps and .
- 3.
- Notes on the state transfer: If implemented, the state transfer between A and B would be executed next, resulting in service B taking over the responsibilities of service A. As noted, the evaluation of the state transfer is a subject of future work, but we outline its underlying mechanism and expected timing characteristics. In principle, the state transfer is similar to updating the publisher and subscriber configurations (see steps and ) with one extra step. Hence, the time required for a state transfer is comparable to steps and . In particular, the state transfer consists of the following steps. First, the manager creates a new shared-memory logical object, which is denoted logical object B. Logical object B, at first, is a copy of the PF A service’s configuration and data elements (i.e., logical object B is a copy of logical object A). In the second step, the manager appends all elements required by the PF B service, such as variables, inputs, and outputs, to logical object B. Third, the publisher service is configured to perform a memory copy at the end of each cycle to transfer the state from logical object A to logical object B. The copy operation ensures that logical object B seamlessly obtains all the state information from logical object A. At this stage, service A is still the active service executing asset control. The publisher service is responsible for ensuring system integrity during side-by-side operations. To that aim, the publisher is configured (a) to perform the necessary twinning for A/B testing and (b) not to forward output signals of service B that interfere with service A’s output signals. The engineering team is responsible for creating an automation script that properly configures all relevant services, such as logical object B and the publisher. The twinning data are used for the (manual or automated) validation of service B behavior.
If service B operates as intended, the publisher is reconfigured to block all signals of service A and simultaneously forward all signals of service B. At this stage, service B seamlessly becomes the active service and service A becomes the passive service. If the system operates as intended, the manager can terminate service A. Otherwise, the manager service or an operator can trigger a rollback by reconfiguring the publisher to its old state and terminating service B. Additional details about the described sequence can be found in our previous works [6,31].
- 4.
- In the last phase, the RT PF A service is terminated if the RT PF B service operates as intended, which leaves the system in the desired post-update state.
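The publisher-based switchover and rollback described in the notes on the state transfer can be sketched as follows. Since the state transfer is not implemented in the evaluated framework, this is a purely hypothetical illustration: the `Publisher` class, the `switch_over` function, and the validation callbacks are assumed names, and the per-cycle memory copy between the logical objects is omitted.

```python
class Publisher:
    """Forwards the output signal of the active service only and twins the
    outputs of all running services for A/B comparison (hypothetical sketch)."""

    def __init__(self) -> None:
        self.active = "A"
        self.twin_log = []

    def publish(self, outputs: dict):
        # (a) Twin the outputs of both services for later validation.
        self.twin_log.append(dict(outputs))
        # (b) Forward only the active service's signal; block the passive one.
        return outputs.get(self.active)


def switch_over(publisher, b_behaves_as_intended, system_ok_after_switch) -> str:
    """Make B the active service if it validates; roll back on failure."""
    if not b_behaves_as_intended(publisher.twin_log):
        return publisher.active      # A stays active, B stays passive
    publisher.active = "B"           # block A's signals, forward B's signals
    if not system_ok_after_switch():
        publisher.active = "A"       # rollback to the old publisher state
    return publisher.active


def outputs_match(twin_log) -> bool:
    """Example validation: B must have reproduced A's output in every cycle."""
    return all(entry["A"] == entry["B"] for entry in twin_log)


pub = Publisher()
forwarded_before = pub.publish({"A": 1.0, "B": 1.0})
active_after = switch_over(pub, outputs_match, lambda: True)
forwarded_after = pub.publish({"A": 1.0, "B": 2.0})
```

Only after a successful switchover would the manager terminate service A; in the rollback path, service B would be terminated instead.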
8.2. Results and Analysis
- Diagram (a)
- This diagram shows the time series data of the RT PS input service that acts as a data generator and sends messages to the RT PF service.
- Diagram (b)
- This diagram shows the time series data of the RT PF service A that processes the received data and forwards them to the RT PS output service.
- Diagram (c)
- This diagram shows the time series data of the RT PF service B that implements the same functional behavior as service A. Service A shall be replaced with service B.
- Diagram (d)
- This diagram shows the time series data of the RT PS output service that processes the received data from A and B and simulates data forwarding.
- Primary y-axis: The cycle usage is plotted along the primary y-axis on the left.
- Secondary y-axis: The number of messages received per cycle is plotted along the secondary y-axis on the right side.
- Annotations: Each colored area represents the execution time of its corresponding process step, as indicated by the alternating labels at the top and bottom left side of each colored area.
- The cycle usage of the RT PS input service increases slightly because of the additional message that is sent to service B.
- The cycle usage of service B increases by about 8% once the link between the RT PS input service and service B is established because service B starts to process the data received.
- Once the link between the services is established, the cycle usage of service A slightly increases.
- The newly established link does not affect the RT PS output service.
- Step does not affect service A or the RT PS input service.
- Once the link between the services is established, the cycle usage of service B decreases.
- After the successful subscription of the RT PS output service, the output service receives two messages, i.e., one message from each service.
8.3. Interpretation
9. Discussion
9.1. Research Agenda: Coverage of Research Questions and Requirements
9.2. Design Space: Self-Adaptive Software Models to Realize DevOps for Smart CPS
- Reliable remote asset monitoring is supported by the logical objects’ monitoring capabilities. Their service-oriented nature enables the dynamic subscription to asset information at runtime, which supports the dynamic composition of monitoring and alerting mechanisms.
- Reliable remote asset optimization relies on asset monitoring to drive, e.g., a machine-learning model that predicts the optimal operational parameters. The machine-driven parameter-based adaptation of CPS services, i.e., their logical objects, enables the adjustment of the runtime parameters based on the predicted parameter set without causing system downtime.
- Reliable remote asset commissioning and maintenance is driven by monitoring, self-protection, and architecture-based adaptation mechanisms. During commissioning and maintenance work, engineers typically reconfigure the underlying software system. This involves parametric changes (e.g., update of control parameters) and architectural changes (e.g., new input signal, software update, new communication link). We anticipate that logical objects can be used in future scenarios to pre-validate such changes in test environments. However, due to model uncertainty [41], testing (and simulation) can only be guaranteed to cover some aspects of the operational CPS and its environment. Hence, any reconfiguration and deployment of software and hardware in an operational environment is accompanied by change-enactment uncertainties and environmental uncertainties. In order to avoid unforeseen disruptions, it is of utmost importance to ensure the reliable execution of any adaptation to the CPS through self-protection mechanisms. In addition to self-protection, our approach also provides capabilities to mitigate model uncertainty through runtime experimentation (i.e., CE). The support for CE allows engineers to validate changes in the real operational environment before their actual deployment, which brings two benefits. First, it can reduce unexpected interruptions and periods of limited service quality. Second, engineers can use the data obtained during runtime experimentation to update their models so that they better reflect the observations, effectively mitigating model uncertainty. In addition, interruption-free CE fosters the vision of self-evolving computing systems [15] that rely on runtime experiments to validate system changes proposed by their machine-driven evolutionary engine.
9.3. Summary
10. Conclusions and Future Work
- Reliable and dynamically reconfigurable remote monitoring;
- Reliable, interactive, and automated remote commissioning and maintenance;
- Reliable, interactive, and automated software deployment; and
- Reliable, interactive, and automated runtime experiments (i.e., A/B testing) to validate changes in the operational environment before their deployment to production.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
API | application programming interface |
BE | best-effort |
CLT | command line tool |
CPS | Cyber-Physical System |
CU | control unit |
DevOps | Development and Operations |
DT | Digital Twin |
FIFO | first-in–first-out |
IIoT | Industrial Internet of Things |
IoT | Internet of Things |
IPS2 | Industrial Product-Service System |
IT | Information Technology |
LO | logical object |
MAPE-K | monitor-analyze-plan-execute-knowledge |
OLS | ordinary least squares |
OT | Operational Technology |
PA | primary asset |
PF | primary function |
PFP | primary function portion |
PS | protocol stack |
R | requirement |
RE | reliable |
RQ | research question |
RT | real time |
RO | real object |
SA | secondary asset |
SCADA | supervisory control and data acquisition |
SSH | Secure Shell |
SWF | software framework |
UC | use case |
Appendix A. Experiment 1
Appendix B. Screenshots
References
- Baheti, R.; Gill, H. Cyber-physical systems. Impact Control. Technol. 2011, 12, 161–166. [Google Scholar]
- Jazdi, N. Cyber physical systems in the context of Industry 4.0. In Proceedings of the 2014 IEEE International Conference on Automation, Quality and Testing, Robotics, Cluj-Napoca, Romania, 22–24 May 2014; pp. 1–4. [Google Scholar]
- Aheleroff, S.; Mostashiri, N.; Xu, X.; Zhong, R.Y. Mass personalisation as a service in industry 4.0: A resilient response case study. Adv. Eng. Inform. 2021, 50, 101438. [Google Scholar] [CrossRef]
- Meier, H.; Roy, R.; Seliger, G. Industrial Product-Service Systems—IPS 2. CIRP Ann. 2010, 59, 607–627. [Google Scholar] [CrossRef]
- Brissaud, D.; Sakao, T.; Riel, A.; Erkoyuncu, J.A. Designing value-driven solutions: The evolution of industrial product-service systems. CIRP Ann. 2022, 71, 553–575. [Google Scholar] [CrossRef]
- Dobaj, J.; Riel, A.; Macher, G.; Egretzberger, M. A Method for Deriving Technical Requirements of Digital Twins as Industrial Product-Service System Enablers. In Systems, Software and Services Process Improvement; Yilmaz, M., Clarke, P., Messnarz, R., Wöran, B., Eds.; Springer International Publishing: Cham, Switzerland, 2022; Volume 1646, Communications in Computer and Information Science; pp. 378–392. [Google Scholar] [CrossRef]
- Römer, K.; Mattern, F. Towards a unified view on space and time in sensor networks. Comput. Commun. 2022, 28, 1484–1497. [Google Scholar] [CrossRef]
- Weyns, D.; Andersson, J.; Caporuscio, M.; Flammini, F.; Kerren, A.; Löwe, W. A research agenda for smarter cyber-physical systems. J. Integr. Des. Process. Sci. 2021, 25, 27–47. [Google Scholar] [CrossRef]
- Weyns, D.; Caporuscio, M.; Vogel, B.; Kurti, A. Design for sustainability = runtime adaptation ∪ evolution. In Proceedings of the 2015 European Conference on Software Architecture Workshops, Dubrovnik/Cavtat, Croatia, 7–11 September 2015; pp. 1–7. [Google Scholar]
- Becker, C.; Chitchyan, R.; Duboc, L.; Easterbrook, S.; Mahaux, M.; Penzenstadler, B.; Rodriguez-Navas, G.; Salinesi, C.; Seyff, N.; Venters, C.; et al. The Karlskrona manifesto for sustainability design. arXiv 2014, arXiv:1410.6968. [Google Scholar]
- Taing, N.; Wutzler, M.; Springer, T.; Cardozo, N.; Schill, A. Consistent unanticipated adaptation for context-dependent applications. In Proceedings of the 8th ACM International Workshop on Context-Oriented Programming, Rome, Italy, 17–22 July 2016; pp. 33–38. [Google Scholar]
- Grieves, M.; Vickers, J. Digital Twin: Mitigating Unpredictable, Undesirable Emergent Behavior in Complex Systems. In Transdisciplinary Perspectives on Complex Systems; Kahlen, F.J., Flumerfelt, S., Alves, A., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 85–113. [Google Scholar] [CrossRef]
- Pahl, C.; Jamshidi, P.; Weyns, D. Cloud architecture continuity: Change models and change rules for sustainable cloud software architectures. J. Softw. Evol. Process. 2017, 29, e1849. [Google Scholar] [CrossRef]
- Tavčar, J.; Horvath, I. A review of the principles of designing smart cyber-physical systems for run-time adaptation: Learned lessons and open issues. IEEE Trans. Syst. Man Cybern. Syst. 2018, 49, 145–158. [Google Scholar] [CrossRef]
- Weyns, D.; Bäck, T.; Vidal, R.; Yao, X.; Belbachir, A.N. The vision of self-evolving computing systems. J. Integr. Des. Process. Sci. 2022, 26, 351–367. [Google Scholar] [CrossRef]
- Riel, A.; Kreiner, C.; Macher, G.; Messnarz, R. Integrated design for tackling safety and security challenges of smart products and digital manufacturing. CIRP Ann. 2017, 66, 177–180. [Google Scholar] [CrossRef]
- Dobaj, J.; Schuss, M.; Krisper, M.; Boano, C.A.; Macher, G. Dependable mesh networking patterns. In Proceedings of the 24th European Conference on Pattern Languages of Programs, Irsee, Germany, 3–7 July 2019; Boldt, T., Ed.; ACM: New York, NY, USA, 2019; pp. 1–14. [Google Scholar] [CrossRef]
- Avizienis, A.; Laprie, J.C.; Randell, B.; Landwehr, C. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secur. Comput. 2004, 1, 11–33. [Google Scholar] [CrossRef]
- Siqueira, F.; Davis, J.G. Service Computing for Industry 4.0: State of the Art, Challenges, and Research Opportunities. ACM Comput. Surv. 2022, 54, 1–38. [Google Scholar] [CrossRef]
- McManus, H.; Hastings, D. A framework for understanding uncertainty and its mitigation and exploitation in complex systems. IEEE Eng. Manag. Rev. 2006, 34, 81. [Google Scholar] [CrossRef]
- Dobaj, J.; Iber, J.; Krisper, M.; Kreiner, C. A Microservice Architecture for the Industrial Internet-Of-Things. In Proceedings of the 23rd European Conference on Pattern Languages of Programs, Irsee, Germany, 4–8 July 2018; ACM: New York, NY, USA, 2018; pp. 1–15. [Google Scholar] [CrossRef]
- Qu, M.; Yu, S.; Chen, D.; Chu, J.; Tian, B. State-of-the-art of design, evaluation, and operation methodologies in product service systems. Comput. Ind. 2016, 77, 1–14. [Google Scholar] [CrossRef]
- Humble, J.; Molesky, J. Why Enterprises Must Adopt Devops to Enable Continuous Delivery. Cut. IT J. 2011, 24, 6–12. [Google Scholar]
- Ebert, C.; Gallardo, G.; Hernantes, J.; Serrano, N. DevOps. IEEE Softw. 2016, 33, 94–100. [Google Scholar] [CrossRef]
- Leite, L.; Rocha, C.; Kon, F.; Milojicic, D.; Meirelles, P. A Survey of DevOps Concepts and Challenges. ACM Comput. Surv. 2020, 52, 1–35. [Google Scholar] [CrossRef]
- Damjanovic-Behrendt, V.; Behrendt, W. An open source approach to the design and implementation of Digital Twins for Smart Manufacturing. Int. J. Comput. Integr. Manuf. 2019, 32, 366–384. [Google Scholar] [CrossRef]
- Minerva, R.; Lee, G.M.; Crespi, N. Digital Twin in the IoT Context: A Survey on Technical Features, Scenarios, and Architectural Models. Proc. IEEE 2020, 108, 1785–1824. [Google Scholar] [CrossRef]
- Wang, Z.; Gupta, R.; Han, K.; Wang, H.; Ganlath, A.; Ammar, N.; Tiwari, P. Mobility Digital Twin: Concept, Architecture, Case Study, and Future Challenges. IEEE Internet Things J. 2022, 9, 17452–17467. [Google Scholar] [CrossRef]
- Bellavista, P.; Bicocchi, N.; Fogli, M.; Giannelli, C.; Mamei, M.; Picone, M. Requirements and design patterns for adaptive, autonomous, and context-aware digital twins in industry 4.0 digital factories. Comput. Ind. 2023, 149, 103918. [Google Scholar] [CrossRef]
- Lwakatare, L.E.; Karvonen, T.; Sauvola, T.; Kuvaja, P.; Olsson, H.H.; Bosch, J.; Oivo, M. Towards DevOps in the embedded systems domain: Why is it so hard? In Proceedings of the 2016 49th Hawaii International Conference on System Sciences (HICSS), Koloa, HI, USA, 5–8 January 2016; pp. 5437–5446. [Google Scholar]
- Dobaj, J.; Riel, A.; Krug, T.; Seidl, M.; Macher, G.; Egretzberger, M. Towards digital twin-enabled DevOps for CPS providing architecture-based service adaptation & verification at runtime. In Proceedings of the 17th Symposium on Software Engineering for Adaptive and Self-Managing Systems, Pittsburgh, PA, USA, 18–23 May 2022; Schmerl, B., Maggio, M., Cámara, J., Eds.; ACM: New York, NY, USA, 2022; pp. 132–143. [Google Scholar] [CrossRef]
- Directorate-General for Research and Innovation. Industry 5.0—Towards a Sustainable, Human-Centric and Resilient European Industry. Available online: https://research-and-innovation.ec.europa.eu/knowledge-publications-tools-and-data/publications/all-publications/industry-50-towards-sustainable-human-centric-and-resilient-european-industry_en (accessed on 1 October 2023).
- Aheleroff, S.; Huang, H.; Xu, X.; Zhong, R.Y. Toward sustainability and resilience with Industry 4.0 and Industry 5.0. Front. Manuf. Technol. 2022, 2, 951643. [Google Scholar] [CrossRef]
- Xu, X.; Lu, Y.; Vogel-Heuser, B.; Wang, L. Industry 4.0 and Industry 5.0—Inception, conception and perception. J. Manuf. Syst. 2021, 61, 530–535. [Google Scholar] [CrossRef]
- Zizic, M.C.; Mladineo, M.; Gjeldum, N.; Celent, L. From industry 4.0 towards industry 5.0: A review and analysis of paradigm shift for the people, organization and technology. Energies 2022, 15, 5221. [Google Scholar] [CrossRef]
- Quin, F.; Weyns, D.; Galster, M.; Silva, C.C. A/B Testing: A Systematic Literature Review. arXiv 2023, arXiv:2308.04929. [Google Scholar]
- Haberfellner, R.; de Weck, O.; Fricke, E.; Vössner, S. Process Models: Systems Engineering and Others. In Systems Engineering: Fundamentals and Applications; Springer International Publishing: Cham, Switzerland, 2019; pp. 27–98. [Google Scholar] [CrossRef]
- Dobaj, J.; Krisper, M.; Macher, G. Towards Cyber-Physical Infrastructure as-a-Service (CPIaaS) in the Era of Industry 4.0. In Systems, Software and Services Process Improvement; Walker, A., O’Connor, R.V., Messnarz, R., Eds.; Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 1060, pp. 310–321. [Google Scholar] [CrossRef]
- Weyns, D.; Schmerl, B.; Grassi, V.; Malek, S.; Mirandola, R.; Prehofer, C.; Wuttke, J.; Andersson, J.; Giese, H.; Göschka, K.M. On Patterns for Decentralized Control in Self-Adaptive Systems. In Software Engineering for Self-Adaptive Systems II; de Lemos, R., Giese, H., Müller, H.A., Shaw, M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7475, pp. 76–107. [Google Scholar] [CrossRef]
- La Iglesia, D.G.D.; Weyns, D. MAPE-K Formal Templates to Rigorously Design Behaviors for Self-Adaptive Systems. ACM Trans. Auton. Adapt. Syst. 2015, 10, 1–31. [Google Scholar] [CrossRef]
- Schneider, G.F.; Wicaksono, H.; Ovtcharova, J. Virtual engineering of cyber-physical automation systems: The case of control logic. Adv. Eng. Inform. 2019, 39, 127–143. [Google Scholar] [CrossRef]
- Kuehner, K.J.; Scheer, R.; Strassburger, S. Digital twin: Finding common ground—A meta-review. Procedia CIRP 2021, 104, 1227–1232. [Google Scholar] [CrossRef]
- Tao, F.; Zhang, H.; Liu, A.; Nee, A.Y. Digital twin in industry: State-of-the-art. IEEE Trans. Ind. Inform. 2018, 15, 2405–2415. [Google Scholar] [CrossRef]
- Cimino, C.; Negri, E.; Fumagalli, L. Review of digital twin applications in manufacturing. Comput. Ind. 2019, 113, 103130. [Google Scholar] [CrossRef]
- Bayer, B.; Dalmau Diaz, R.; Melcher, M.; Striedner, G.; Duerkop, M. Digital twin application for model-based doe to rapidly identify ideal process conditions for space-time yield optimization. Processes 2021, 9, 1109. [Google Scholar] [CrossRef]
- Dobaj, J.; Macher, G.; Ekert, D.; Riel, A.; Messnarz, R. Towards a security–driven automotive development lifecycle. J. Softw. Evol. Process. 2021. [Google Scholar] [CrossRef]
- Malik, P.K.; Sharma, R.; Singh, R.; Gehlot, A.; Satapathy, S.C.; Alnumay, W.S.; Pelusi, D.; Ghosh, U.; Nayak, J. Industrial Internet of Things and its applications in industry 4.0: State of the art. Comput. Commun. 2021, 166, 125–139. [Google Scholar] [CrossRef]
- Silvestri, L.; Forcina, A.; Introna, V.; Santolamazza, A.; Cesarotti, V. Maintenance transformation through Industry 4.0 technologies: A systematic literature review. Comput. Ind. 2020, 123, 103335. [Google Scholar] [CrossRef]
- Leng, J.; Zhou, M.; Xiao, Y.; Zhang, H.; Liu, Q.; Shen, W.; Su, Q.; Li, L. Digital twins-based remote semi-physical commissioning of flow-type smart manufacturing systems. J. Clean. Prod. 2021, 306, 127278. [Google Scholar] [CrossRef]
- Mizutani, I.; Ramanathan, G.; Mayer, S. Semantic data integration with DevOps to support engineering process of intelligent building automation systems. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys), Coimbra, Portugal, 17–18 November 2021; pp. 294–297. [Google Scholar] [CrossRef]
- Mizutani, I.; Ramanathan, G.; Mayer, S. Integrating Multi-Disciplinary Offline and Online Engineering in Industrial Cyber-Physical Systems through DevOps. In Proceedings of the 11th International Conference on the Internet of Things (IOT), St. Gallen, Switzerland, 8–12 November 2021; pp. 40–47. [Google Scholar] [CrossRef]
- Humble, J.; Farley, D. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation; Pearson Education: London, UK, 2010. [Google Scholar]
- Rodríguez, P.; Haghighatkhah, A.; Lwakatare, L.E.; Teppola, S.; Suomalainen, T.; Eskeli, J.; Karvonen, T.; Kuvaja, P.; Verner, J.M.; Oivo, M. Continuous deployment of software intensive products and services: A systematic mapping study. J. Syst. Softw. 2017, 123, 263–291. [Google Scholar] [CrossRef]
- Yaman, S.G.; Munezero, M.; Münch, J.; Fagerholm, F.; Syd, O.; Aaltola, M.; Palmu, C.; Männistö, T. Introducing continuous experimentation in large software-intensive product and service organisations. J. Syst. Softw. 2017, 133, 195–211. [Google Scholar] [CrossRef]
- Boschert, S.; Heinrich, C.; Rosen, R. Next generation digital twin. In Proceedings of the TMCE 2018, Las Palmas de Gran Canaria, Spain, 7–11 May 2018; Volume 2018, pp. 7–11. [Google Scholar]
- Salehie, M.; Tahvildari, L. Self-adaptive software: Landscape and research challenges. ACM Trans. Auton. Adapt. Syst. TAAS 2009, 4, 1–42. [Google Scholar] [CrossRef]
- van Solingen, R.; Basili, V.; Caldiera, G.; Rombach, H.D. Goal Question Metric (GQM) Approach. In Encyclopedia of Software Engineering; Marciniak, J.J., Ed.; Wiley: New York, NY, USA, 2002. [Google Scholar] [CrossRef]
- Wohlin, C.; Runeson, P.; Höst, M.; Ohlsson, M.C.; Regnell, B.; Wesslén, A. Experimentation in Software Engineering; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
- Gerostathopoulos, I.; Vogel, T.; Weyns, D.; Lago, P. How do we Evaluate Self-adaptive Software Systems?: A Ten-Year Perspective of SEAMS. In Proceedings of the 2021 International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Madrid, Spain, 18–24 May 2021; pp. 59–70. [Google Scholar] [CrossRef]
- Grieves, M. Digital twin: Manufacturing excellence through virtual factory replication. White Pap. 2014, 1, 1–7. [Google Scholar]
- Kritzinger, W.; Karner, M.; Traar, G.; Henjes, J.; Sihn, W. Digital Twin in manufacturing: A categorical literature review and classification. IFAC-PapersOnLine 2018, 51, 1016–1022. [Google Scholar] [CrossRef]
- Singh, M.; Fuenmayor, E.; Hinchy, E.P.; Qiao, Y.; Murray, N.; Devine, D. Digital twin: Origin to future. Appl. Syst. Innov. 2021, 4, 36. [Google Scholar] [CrossRef]
- Papazoglou, M.P.; Georgakopoulos, D. Service-oriented computing. Commun. ACM 2003, 46, 25–28. [Google Scholar] [CrossRef]
- Beugnard, A. A software engineering perspective on digital twin: Many candidates, none elected. In Proceedings of the DigitalTwin 2023, Gif-sur-Yvette, France, 11–13 October 2023. [Google Scholar]
- Murray, G.; Johnstone, M.N.; Valli, C. The convergence of IT and OT in critical infrastructure. In Proceedings of the 15th Australian Information Security Management Conference, Perth, Australia, 5–6 December 2017; pp. 149–155. [Google Scholar] [CrossRef]
- Ehie, I.C.; Chilton, M.A. Understanding the influence of IT/OT Convergence on the adoption of Internet of Things (IoT) in manufacturing organizations: An empirical investigation. Comput. Ind. 2020, 115, 103166. [Google Scholar] [CrossRef]
- Giannelli, C.; Picone, M. Editorial “Industrial IoT as IT and OT Convergence: Challenges and Opportunities”. IoT 2022, 3, 259–261. [Google Scholar] [CrossRef]
- Tian, S.; Hu, Y. The role of opc ua tsn in it and ot convergence. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 2272–2276. [Google Scholar]
- Patera, L.; Garbugli, A.; Bujari, A.; Scotece, D.; Corradi, A. A layered middleware for ot/it convergence to empower industry 5.0 applications. Sensors 2021, 22, 190. [Google Scholar] [CrossRef]
- Joshi, R.; Didier, P.; Holmberg, C.; Jimenez, J.; Carey, T. The Industrial Internet of Things Connectivity Framework. Industry IoT Consortium 2022. Available online: https://www.iiconsortium.org/iicf/ (accessed on 18 October 2023).
- Baron, C.; Louis, V. Towards a continuous certification of safety-critical avionics software. Comput. Ind. 2021, 125, 103382. [Google Scholar] [CrossRef]
- Combemale, B.; Wimmer, M. Towards a Model-Based DevOps for Cyber-Physical Systems. In Software Engineering Aspects of Continuous Development and New Paradigms of Software Production and Deployment; Bruel, J.M., Mazzara, M., Meyer, B., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12055, pp. 84–94. [Google Scholar] [CrossRef]
- Hugues, J.; Hristosov, A.; Hudak, J.J.; Yankel, J. TwinOps—DevOps meets model-based engineering and digital twins for the engineering of CPS. In Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, Virtual Event, Canada, 16–23 October 2020; Guerra, E., Iovino, L., Eds.; ACM: New York, NY, USA, 2020; pp. 1–5. [Google Scholar] [CrossRef]
- Ugarte Querejeta, M.; Etxeberria, L.; Sagardui, G. Towards a DevOps Approach in Cyber Physical Production Systems Using Digital Twins. In Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops; Casimiro, A., Ortmeier, F., Schoitsch, E., Bitsch, F., Ferreira, P., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12235, pp. 205–216. [Google Scholar] [CrossRef]
- Hasselbring, W.; Henning, S.; Latte, B.; Mobius, A.; Richter, T.; Schalk, S.; Wojcieszak, M. Industrial DevOps. In Proceedings of the 2019 IEEE International Conference on Software Architecture Companion (ICSA-C), Hamburg, Germany, 25–26 March 2019; pp. 123–126. [Google Scholar] [CrossRef]
- Kostromin, R.; Feoktistov, A. Agent-Based DevOps of Software and Hardware Resources for Digital Twins of Infrastructural Objects. In Proceedings of the 4th International Conference on Future Networks and Distributed Systems (ICFNDS); ACM: New York, NY, USA, 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Mertens, J.; Denil, J. The Digital Twin as a Common Knowledge Base in DevOps to Support Continuous System Evolution. In Computer Safety, Reliability, and Security. SAFECOMP 2021 Workshops; Habli, I., Sujan, M., Gerasimou, S., Schoitsch, E., Bitsch, F., Eds.; Springer International Publishing: Cham, Switzerland, 2021; Volume 12853, Lecture Notes in Computer Science; pp. 158–170. [Google Scholar] [CrossRef]
- Meissner, H.; Ilsen, R.; Aurich, J.C. Analysis of Control Architectures in the Context of Industry 4.0. Procedia CIRP 2017, 62, 165–169. [Google Scholar] [CrossRef]
- DesRuisseaux, D. Practical overview of implementing IEC 62443 security levels in industrial control applications. Schneider Electric 2018. Available online: https://www.se.com/uk/en/download/document/998-20186845/ (accessed on 13 November 2020).
- Sharpe, R.; van Lopik, K.; Neal, A.; Goodall, P.; Conway, P.P.; West, A.A. An industrial evaluation of an Industry 4.0 reference architecture demonstrating the need for the inclusion of security and human components. Comput. Ind. 2019, 108, 37–44. [Google Scholar] [CrossRef]
- Fogli, M.; Giannelli, C.; Stefanelli, C. Edge-powered in-network processing for content-based message management in software-defined industrial networks. In Proceedings of the ICC 2022-IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 1438–1443. [Google Scholar]
- Fogli, M.; Giannelli, C.; Stefanelli, C. Joint Orchestration of Content-Based Message Management and Traffic Flow Steering in Industrial Backbones. In Proceedings of the 2022 IEEE 23rd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Belfast, UK, 14–17 June 2022; pp. 325–330. [Google Scholar]
- Redelinghuys, A.J.H.; Basson, A.H.; Kruger, K. A six-layer architecture for the digital twin: A manufacturing case study implementation. J. Intell. Manuf. 2020, 31, 1383–1402. [Google Scholar] [CrossRef]
- Aheleroff, S.; Xu, X.; Zhong, R.Y.; Lu, Y. Digital Twin as a Service (DTaaS) in Industry 4.0: An Architecture Reference Model. Adv. Eng. Inform. 2021, 47, 101225. [Google Scholar] [CrossRef]
- European Commission: Smart Grid Coordination Group. Smart Grid Reference Architecture. Available online: https://energy.ec.europa.eu/publications/smart-grid-reference-architecture_en (accessed on 23 May 2020).
- Plattform Industry 4.0. The Reference Architectural Model Industrie 4.0 (RAMI 4.0)—An Introduction. Available online: https://www.plattform-i40.de/IP/Redaktion/EN/Downloads/Publikation/rami40-an-introduction.html (accessed on 23 May 2020).
- Shi, W.; Dustdar, S. The promise of edge computing. Computer 2016, 49, 78–81. [Google Scholar] [CrossRef]
- Lo Bello, L.; Steiner, W. A Perspective on IEEE Time-Sensitive Networking for Industrial Communication and Automation Systems. Proc. IEEE 2019, 107, 1094–1120. [Google Scholar] [CrossRef]
- UP – Bridge the Gap. UP Core Plus Specifications. Available online: https://up-board.org/upcoreplus/specifications/ (accessed on 23 July 2023).
- Canonical. Real-Time Ubuntu Is Now Generally Available. Canonical 2/14/2023. Available online: https://canonical.com/blog/real-time-ubuntu-is-now-generally-available#:~:text=14%20February%202023%2C%20London%3A%20Canonical,guarantee%20within%20a%20specified%20deadline (accessed on 23 July 2023).
- McKinley, P.K.; Sadjadi, S.M.; Kasten, E.P.; Cheng, B.H. Composing adaptive software. Computer 2004, 37, 56–64. [Google Scholar] [CrossRef]
- Artac, M.; Borovssak, T.; Di Nitto, E.; Guerriero, M.; Tamburri, D.A. DevOps: Introducing infrastructure-as-code. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), Buenos Aires, Argentina, 20–28 May 2017; pp. 497–498. [Google Scholar]
- Hüttermann, M. Infrastructure as code. In DevOps for Developers; Springer: Berlin/Heidelberg, Germany, 2012; pp. 135–156. [Google Scholar]
- Krug, T.; Dobaj, J.; Macher, G. Enforcing Network Safety-Margins in Industrial Process Control Using MACD Indicators. In Systems, Software and Services Process Improvement; Yilmaz, M., Clarke, P., Messnarz, R., Wöran, B., Eds.; Communications in Computer and Information Science; Springer International Publishing: Cham, Switzerland, 2022; Volume 1646, pp. 401–413. [Google Scholar] [CrossRef]

Type | Label | Section | Description |
---|---|---|---|
Method | | Section 1.2 | Design space for the structured engineering of sustainable and smart CPSs |
Agenda | | Section 1.3 | Research agenda and methodology of the presented industrial case study |
Analysis | | Section 2 | State-of-the-art analysis of Digital Twins in modern industrial environments |
Requirements | | Section 3 | Requirements for Digital Twin-enhanced embedded control software |
Design Model | D1 | Section 4 | DT-enhanced control service for resource-constrained embedded devices |
Design Model | D2 | Section 4 | Microservice-based design for the orchestration of native embedded services |
Design Model | D3 | Section 4 | Secondary asset DT model to manage and reflect the ICT properties and states of the CPSs |
Servitization | | Section 5 | Description of the experiment setup and the practical implementation of the CPS asset servitization |
Empirical Experiments | E1 | Section 6 | Sub-millisecond real-time twinning for context- and self-monitoring of CPS asset properties |
Empirical Experiments | E1 | Section 6 | Sub-millisecond real-time closed-loop control of CPS assets |
Empirical Experiments | E2 | Section 7 | Self-protection: Zero-downtime parameter-based self-adaptation of CPS assets to mitigate environmental and change-enactment uncertainty during CD and CE |
Empirical Experiments | E3 | Section 8 | Zero-downtime architecture- and parameter-based asset adaptation to enable CD and CE in CPS |
Demonstrated DevOps Capabilities to facilitate CD and CE | C1 | Section 6, Section 7 and Section 8 | Reliable and reconfigurable remote monitoring of CPS assets and services (see E1, E2, E3) |
Demonstrated DevOps Capabilities to facilitate CD and CE | C2 | Section 7 | Reliable parameter-based remote adaptation of CPS services and communication properties (see E2) |
Demonstrated DevOps Capabilities to facilitate CD and CE | C3 | Section 8 | Reliable remote software deployment and updating (i.e., architecture-based adaptation of CPS services and communication links) (see E3) |
Demonstrated DevOps Capabilities to facilitate CD and CE | C4 | Section 8 | Reliable runtime experiments (i.e., A/B testing) to test architecture and parameter changes (see E3) |
Discussion of Supported Industrial Use Cases | U1 | Section 9 | Reliable remote asset monitoring (enabler: C1) |
Discussion of Supported Industrial Use Cases | U2 | Section 9 | Reliable remote asset optimization (enabler: C1, C2) |
Discussion of Supported Industrial Use Cases | U3 | Section 9 | Reliable remote asset commissioning (enabler: C1, C2, C3, C4) |
Discussion of Supported Industrial Use Cases | U3 | Section 9 | Reliable remote asset maintenance (enabler: C1, C2, C3, C4) |
Conclusion | | Section 10 | Conclusion and future work |

Type | Name | Value |
---|---|---|
Constants: Timing | Cycle time | 1 ms |
Constants: Timing | Management portion | 15% |
Constants: Timing | Primary function portion | 70% |
Constants: Timing | Twinning portion | 15% |
Constants: Data flow | PA message payload size | 1500 Byte |
Constants: Data flow | Number of twinned SA states per cycle | 317 |
Constants: Data flow | Size per twinned SA state | 4 Byte |
Constants: Data flow | Number of twinning messages per cycle | 2 |
Factor1: Timing | PF input portion | % |
Factor1: Timing | PF function portion | % |
Factor1: Timing | PF output portion | % |
Factor2: Data flow | Number of PA messages per cycle | |
Dependent variables | All constants, factors, and cycle portions | All dependent variables are measured per cycle leveraging the SA state twinning mechanism. |
Dependent variables | Overall cycle usage | |
Dependent variables | Jitter (in percentage of the cycle time) | |
Dependent variables | Backpressure | |
Dependent variables | Various other dataflow statistics | |
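
As a rough illustration of what the timing and data flow constants above imply, the following sketch converts the cycle portions into microsecond budgets and computes the twinned data volume per cycle. The function name `budget_us` is a hypothetical helper, and the even split of twinned states across the two twinning messages is an assumption that the table does not state.

```python
CYCLE_TIME_US = 1000  # 1 ms cycle time, expressed in microseconds

# Cycle portions in percent, as listed in the table.
PORTIONS = {"management": 15, "primary_function": 70, "twinning": 15}

# Twinning data flow constants from the table.
SA_STATES_PER_CYCLE = 317
BYTES_PER_SA_STATE = 4
TWINNING_MESSAGES_PER_CYCLE = 2


def budget_us(portions: dict, cycle_time_us: int) -> dict:
    """Convert percentage portions of a cycle into microsecond budgets."""
    if sum(portions.values()) != 100:
        raise ValueError("cycle portions must add up to 100%")
    return {name: cycle_time_us * pct // 100 for name, pct in portions.items()}


budgets = budget_us(PORTIONS, CYCLE_TIME_US)

# Twinned secondary-asset data produced in every cycle.
twinned_bytes = SA_STATES_PER_CYCLE * BYTES_PER_SA_STATE

# Assumption: the states are distributed evenly over the twinning messages.
bytes_per_twinning_message = twinned_bytes / TWINNING_MESSAGES_PER_CYCLE
```

Under these constants, the twinning portion has 150 µs per cycle to transfer 1268 bytes of secondary asset state.
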

Type | Name | Value |
---|---|---|
Constants: Timing | Cycle time | 1 ms |
Constants: Timing | Management portion | 15% |
Constants: Timing | Primary function portion | 70% |
Constants: Timing | Twinning portion | 15% |
Constants: Timing | PF input portion | 20% |
Constants: Timing | PF function portion | 30% |
Constants: Timing | PF output portion | 20% |
Constants: Data flow | PA message payload size | 1500 Byte |
Constants: Data flow | Number of twinned SA states per cycle | 317 |
Constants: Data flow | Size per twinned SA state | 4 Byte |
Constants: Data flow | Number of twinning messages per cycle | 2 |
Constants: Data flow | Number of critical PA messages per cycle | 4 |
Constants: Data flow | Number of best-effort PA messages per cycle | 30 |
Constants: CLT trigger condition | if (backpressure[RT PS Input] > 50) for more than 3 ms, then activate countermeasure | |
Factor: Fault-injection | Period between the activation of best-effort message bursts | s |
Dependent variables | MAPE loop latencies | |
Dependent variables | Messages sent per cycle | |
Dependent variables | Backpressure per service | |
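
The CLT trigger condition in the table above can be read as a debounced threshold check. The following minimal sketch assumes one backpressure sample per 1 ms cycle and interprets "more than 3 ms" as more than three consecutive cycles above the limit. The function `trigger_cycles` and the sample values are hypothetical and do not reproduce the CLT implementation.

```python
from typing import Iterable, Iterator

THRESHOLD = 50    # backpressure limit from the trigger condition
HOLD_CYCLES = 3   # "more than 3 ms" at a cycle time of 1 ms


def trigger_cycles(
    samples: Iterable[int],
    threshold: int = THRESHOLD,
    hold_cycles: int = HOLD_CYCLES,
) -> Iterator[int]:
    """Yield the index of each cycle in which the countermeasure would be activated."""
    consecutive = 0
    for cycle, backpressure in enumerate(samples):
        # Count consecutive cycles above the limit; any sample at or below resets the count.
        consecutive = consecutive + 1 if backpressure > threshold else 0
        # Fire exactly once per burst, when the condition has held for more than hold_cycles.
        if consecutive == hold_cycles + 1:
            yield cycle


# Illustrative per-cycle backpressure readings (made-up values).
samples = [10, 60, 70, 40, 55, 60, 65, 80, 90, 20]
fired = list(trigger_cycles(samples))
```

The short spike at cycles 1 and 2 is ignored, while the longer burst starting at cycle 4 activates the countermeasure once.
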

Type | Name | Value |
---|---|---|
Constants: Timing | Cycle time | 1 ms |
Constants: Timing | Management portion | 15% |
Constants: Timing | Primary function portion | 70% |
Constants: Timing | Twinning portion | 15% |
Constants: Timing | PF input portion | 20% |
Constants: Timing | PF function portion | 30% |
Constants: Timing | PF output portion | 20% |
Constants: Data flow | PA message payload size | 1500 Byte |
Constants: Data flow | Number of twinned SA states per cycle | 317 |
Constants: Data flow | Size per twinned SA state | 4 Byte |
Constants: Data flow | Number of twinning messages per cycle | 2 |
Constants: Data flow | Number of PA messages per cycle | 1 |
Factor: A/B deployment process | Phase 1: Deploy PF B (PF B is a replica of PF A with the same complexity, execution time, inputs, and outputs, as defined by the constant factors above). | |
Factor: A/B deployment process | Phase 2: Run PF B alongside PF A | |
Factor: A/B deployment process | Phase 3: Terminate PF A | |
Dependent variables | Cycle usage | |
Dependent variables | Messages received per cycle | |
Dependent variables | Execution time per update and reconfiguration step | |
Step | Name | Duration | Description |
---|---|---|---|
s[01] | Observe | 4.98 s | Measure system behavior before the start of the update process. |
s[02] | Start B * | 6.51 s | CLT starts service B. |
s[03] | Connect to B | 0.54 s | CLT establishes a management connection to service B. |
s[04] | Configure B | 2.85 s | CLT configures the timing and cycle portions of service B. |
s[05] | Setup twinning of B | 0.84 s | CLT establishes the monitoring (i.e., twinning). |
s[06] | Observe | 5.96 s | Measure system behavior before linking B to the input and output services. |
s[07] | Link Input to B | 6.34 s | CLT registers service B as subscriber to all messages published by the RT PS input service. Therefore, the CLT must update the configurations of both services, requiring the execution of several reconfiguration commands. The CLT uses twinning information to validate service invariants to ensure system integrity after each reconfiguration. |
s[08] | Link B to Output | 6.46 s | Similar to s[07], but the RT PS Output service subscribes to all published messages of B. |
s[09] | Observe | 5.17 s | Measure system behavior when B is running side-by-side with A. |
s[10] | Terminate A** | 0.50 s | CLT terminates service A, which deregisters itself from all subscriptions. |
s[11] | Observe | 5.37 s | Measure system behavior after update process completion. |
Sum | All steps | 45.2 s | |
Sum | Observe steps only | 21.2 s | |
Sum | No observe steps | 24.3 s | |
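
The linking and termination steps in the table above can be sketched as operations on a publish/subscribe registry. The `Bus` class below is a hypothetical, in-memory stand-in; its method names, the service names (`input`, `pf_a`, `pf_b`, `output`), and the single invariant it checks are assumptions for illustration and do not reproduce the CLT or the framework's actual commands.

```python
from typing import Dict, Set


class Bus:
    """Hypothetical publish/subscribe registry: publisher name -> subscriber names."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self.subscribers: Dict[str, Set[str]] = {}

    def start(self, service: str) -> None:
        self.running.add(service)
        self.subscribers.setdefault(service, set())

    def link(self, publisher: str, subscriber: str) -> None:
        self.subscribers[publisher].add(subscriber)
        self.check_invariants()  # validate after each reconfiguration

    def terminate(self, service: str) -> None:
        self.running.discard(service)
        self.subscribers.pop(service, None)
        for subs in self.subscribers.values():
            subs.discard(service)  # the service deregisters from all subscriptions
        self.check_invariants()

    def check_invariants(self) -> None:
        # Invariant (assumed): no publisher may reference a service that is not running.
        for publisher, subs in self.subscribers.items():
            dangling = subs - self.running
            if dangling:
                raise RuntimeError(f"{publisher} has dangling subscribers: {sorted(dangling)}")


# Initial system: input -> pf_a -> output.
bus = Bus()
for name in ("input", "pf_a", "output"):
    bus.start(name)
bus.link("input", "pf_a")
bus.link("pf_a", "output")

# Update process: start B, link it, run side by side, then terminate A.
bus.start("pf_b")
bus.link("input", "pf_b")    # B subscribes to the input service
bus.link("pf_b", "output")   # the output service subscribes to B
bus.terminate("pf_a")        # A leaves; its subscriptions are removed
```

Checking the invariant after every reconfiguration command, rather than once at the end, is what allows a faulty step to be detected before the next one is enacted.
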
Service | Comparison |
---|---|
RT PS Input | The update process does not affect the number of messages to be processed in each cycle. The cycle usage in the first phase after the update is equal to the cycle usage before the update. However, after approximately one second, the cycle usage increases by about 5%. A subsequent investigation has shown a software bug in the subscriber deregistration sequence. As a consequence, the RT PS input service executes several retries to send data to the no-longer-running RT PF A service. This observation does not invalidate the experiment results. Instead, it highlights the robustness of the implementation, showing that individual service failures do not propagate through the system but are observable via the twinning mechanisms. |
RT PF A vs. B | The update process does not affect the cycle usage and the number of messages to be processed per cycle. |
RT PS Output | The update process does not affect the cycle usage and the number of messages to be processed per cycle. |
Requirement | RQ | Experiment | Properties |
---|---|---|---|
R1 Primary Asset Closed-Loop Control | RQ2 | Exp.1 (Section 6) | Figure 13 and Figure 14 confirm that the closed-loop cycle-time remains below 1 ms. |
R2 Primary Asset Monitoring Fidelity | RQ1 | Exp.1 (Section 6) | Figure 19, Figure 20 and Figure 21 confirm that the logical objects support the sub-millisecond twinning of primary asset data. |
R3 Secondary Asset Monitoring Fidelity | RQ1 | Exp.1 (Section 6) | Figure 19, Figure 20 and Figure 21 confirm that the logical objects support the sub-millisecond twinning of secondary asset data. |
R4 Secondary Asset Closed-Loop Control | RQ2 | Exp.2 (Section 7) | Figure 23 and Figure 24 illustrate that the logical objects support context- and self-aware parameter-based self-adaptation capabilities to provide self-protection mechanisms for mitigating environmental and change-enactment uncertainties. Figure 25 shows that the MAPE loop worst-case execution time for reliable (i.e., continuous self-monitoring and invariant checking) parameter-based self-adaptation is below 70 ms. |
R5 Secondary Asset Management | RQ2, RQ3 | Exp.3 (Section 8) | Figure 27 demonstrates how parameter-based and architecture-based adaptation can be combined with the logical objects’ capabilities to monitor and orchestrate service deployment and runtime experimentation. This shows that the software framework effectively supports CD and CE for CPS at the embedded system layer. |
R6 Servitization | RQ3 | Section 5 Exp. Setup and Instrumentation | Figure 10 shows that the experimental setup provides dedicated interfaces for local and remote monitoring, orchestration, automation, control, and adaptation. The effectiveness and service-oriented nature of these interfaces are demonstrated in all experiments. |
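
The parameter-based self-adaptation summarized for R4 follows the monitor-analyze-plan-execute (MAPE) pattern with invariant checking and a bounded worst-case execution time. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the class name `MapeLoop`, the callbacks, and the example invariant are hypothetical, and only the 70 ms budget is taken from the table above. Timings measured in Python are not representative of an embedded real-time target.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Readings = Dict[str, float]


@dataclass
class MapeLoop:
    """Hypothetical MAPE loop that falls back to safe parameters on invariant violation."""
    read_sensors: Callable[[], Readings]            # Monitor source (assumed interface)
    apply_parameters: Callable[[Readings], None]    # Execute sink (assumed interface)
    invariants: List[Callable[[Readings], bool]]    # Each returns True while satisfied
    safe_parameters: Readings                       # Fallback parameter set
    budget_s: float = 0.070                         # 70 ms bound reported for Exp.2
    knowledge: Readings = field(default_factory=dict)
    worst_case_s: float = 0.0

    def step(self) -> bool:
        """Run one MAPE iteration; return True if an invariant was violated."""
        start = time.perf_counter()
        self.knowledge.update(self.read_sensors())                 # Monitor
        violated = not all(inv(self.knowledge) for inv in self.invariants)  # Analyze
        if violated:
            plan = dict(self.safe_parameters)                      # Plan
            self.apply_parameters(plan)                            # Execute
        elapsed = time.perf_counter() - start
        self.worst_case_s = max(self.worst_case_s, elapsed)        # Track observed worst case
        return violated

    def within_budget(self) -> bool:
        """True while the observed worst-case iteration time stays within the budget."""
        return self.worst_case_s <= self.budget_s
```

In this sketch the plan step is deliberately trivial: any violated invariant selects the predefined safe parameter set, which mirrors the self-protection idea without modeling the framework's actual adaptation policies.
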
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dobaj, J.; Riel, A.; Macher, G.; Egretzberger, M. Towards DevOps for Cyber-Physical Systems (CPSs): Resilient Self-Adaptive Software for Sustainable Human-Centric Smart CPS Facilitated by Digital Twins. Machines 2023, 11, 973. https://doi.org/10.3390/machines11100973