Article

Computationally Efficient Inference via Time-Aware Modular Control Systems

Faculty of Information and Communication Technology, Wroclaw University of Science and Technology, 50-370 Wroclaw, Poland
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(22), 4416; https://doi.org/10.3390/electronics13224416
Submission received: 8 September 2024 / Revised: 6 November 2024 / Accepted: 6 November 2024 / Published: 11 November 2024

Abstract
Control in multi-agent decision-making systems is an important issue with a wide variety of existing approaches. In this work, we offer a new comprehensive framework for distributed control. The main contributions of this paper are summarized as follows. First, we propose PHIMEC (physics-informed meta control)—an architecture for learning optimal control by employing a physics-informed neural network when the state space is too large for reward-based learning. Second, we offer a way to leverage impulse response as a tool for system modeling and control. We propose IMPULSTM, a novel approach for incorporating time awareness into recurrent neural networks designed to accommodate irregular sampling rates in the signal. Third, we propose DIMAS, a modular approach to increasing computational efficiency in distributed control systems via domain-knowledge integration. We analyze the performance of the first two contributions on a set of corresponding benchmarks and then showcase their combined performance as a domain-informed distributed control system. The proposed approaches show satisfactory performance both individually in their respective applications and as a connected system.

1. Introduction

1.1. Preface

Multi-agent decision-making systems are a common concept in modern society, having found applications in industrial engineering [1], finance [2], traffic control [3], and optimal decision-making overall [4].
In this work, we will focus on real-time multi-agent systems (RTMAS) [5].
Profound expert knowledge is usually key to their design. However, this approach cannot leverage the abundant data resources that keep growing with the rising importance of computing and Industry 4.0.
Neural networks are known to drastically outperform handcrafted controllers in both control performance and their ability to leverage data. However, their usage is still widely restricted due to problems with instability [6], lack of interpretability, and limited possibilities for integration with the expert knowledge domain [7].
Also, as computation becomes an increasingly significant expense in intelligent autonomous systems, inference cost has become an issue in its own right, giving rise to a growing field of research on adaptive inference. There is a wide range of perspectives on both issues, ranging from integrating physical knowledge into the way black-box models operate [8] to imitating the behavior of the human brain [9] to increase the expressive power of neural networks.
In this work, we will be handling the questions of how to incorporate the underlying knowledge of spatio-temporal relations between agents and how to improve the usability of sequential models in varying conditions.
This article is organized into four main sections. In Section 1, we list the key contributions and offer a summary of the contents. In Section 2, related research on each of the topics we touch upon is outlined. In Section 3, we directly describe our approach to each issue. Section 4 is devoted to an empirical analysis of the proposals.

1.2. Contributions

  • PHIMEC
    novel framework for neural control, which addresses data-leveraging efficiency
    two variants in terms of design—recurrent and fully connected
    two variations in terms of learning federation
    showcases its performance in a real-world control application
  • IMPULSTM
    integrates temporal knowledge into an LSTM design
    addresses a lack of convexity of the loss landscape for recurrent neural networks to enhance generalization abilities
    addresses the memory efficiency of LSTM design
    provides a formal analysis of reduced expressiveness from our changes
    proposes a human-in-the-loop way to control how networks handle situations with respect to temporal data
  • DIMAS
    proposes a modular-design framework for domain-informed distributed control systems
    combines physical awareness with temporal insight for the sake of full domain-knowledge integration
    addresses the issue of wasted compute in RTMAS systems in an intuitive way

2. Related Research

2.1. Data-Efficient Neural Control

Data-driven control is one of the most common applications of artificial intelligence.
Deep reinforcement learning (DRL) is well known for the successful handling of complex cases in control and overall decision-making tasks with considerable precision. By leveraging large amounts of data from sensors and simulations, it can effectively resolve highly complex tasks both in industry and research.
In DRL, efficient data leverage is an especially important issue due to the reward-based learning perspective demanding large amounts of simulation data.
As a key inspiration for PHIMEC, we consider the work of Sun et al. [10], where authors propose a way to search for optimal parametric design by integrating the optimal decision-making approach and learning the problem representation via gray-box surrogate modeling.
Another perspective we noted is the more usual design of finding the optimal parameters of a decision-making agent for real-time control in advance [11].
Fast simulation is a core element in our approach, and for that purpose, we employ the RK-PINN [12], although neither usage of that specific model nor the usage of physics-informed neural networks is required.
The representation learner in PHIMEC is restricted only to the family of models whose computational graph allows for the differentiation of task-relevant control criteria with respect to the input. A quite similar integration of physical constraints into the unsupervised learning process of autoencoders has been shown in the work by Erichson et al. [13].
The task of learning the representation of the decision-making system remains unchanged, irrespective of the specific design decisions. This aspect was also heavily influenced by Autoencoder [14,15] models.
Elements of classical reward-based learning remain present. Thus, a set of benchmarks was determined from the most prominent works from that perspective. These works include RDPG [16], DDPG [17], SAC [18], TD3 [19], and RTD3 [20].

2.2. Temporal Awareness

Introduced by Hochreiter and Schmidhuber in 1997 [21], Long Short-Term Memory (LSTM) neural networks have since become an irreplaceable tool in sequential decision-making.
By mitigating the problem of vanishing gradients [22], they have allowed recurrent neural networks to become much deeper and to process far longer sequences, which will be of relevance to the DIMAS architecture.
Although classical recurrent architectures have become less of a hot topic in recent years following the introduction of the Transformer [23,24], they still have many applications, especially in cases where the recurrent structure allows the integration of expert knowledge into the model [25].
Our second contribution is strongly influenced by handcrafted interpretable recurrent algorithms [26] and autoregressive models [27], and the possibilities they create for human-in-the-loop system design.
The studies on incorporating the spacing between events [28] while decreasing the number of both parameters and nonlinear elements have also been of inspiration, as memory requirements are critical for practical applications. The transfer of expert knowledge into a black-box model with the goal of gray-box modeling [29] has been a topic of active research for a couple of decades. There is a wide range of approaches, such as weight transfer from pretrained models [30] or treating signal feature extraction as part of a model, as in the study by Branco et al. [31] where wavelet filters are combined with LSTM units.
Frequential awareness [32] has been applied and incorporated into sequence modeling tasks in a wide range of applications, such as software vulnerability detection [33] or inductive bias analysis of RNNs themselves [34].
The work most similar to ours is the subset of time-aware recurrent neural networks in which the authors either use the attention mechanism to avoid the one-sided flow of temporal information and mimic the behavior of the Transformer [35] or focus on the duration of events [36]. Although these approaches exploit the same traits of recurrent models, our proposal achieves higher performance.

2.3. Modular Neural Systems

Modular neural systems have been widely explored for distributed control across domains such as robotics, multi-agent coordination, and complex adaptive systems, given their potential for task specialization, fault tolerance, and scalability in dynamic environments.
Modular architectures like neural sub-networks or ensembles have enabled focused processing that can efficiently coordinate across decentralized agents while maintaining robustness under varying conditions [37,38,39].
In particular, modular structures are effective in dynamically changing contexts, where isolated component failure is less likely to destabilize the entire system [40].
Our work specifically targets modular graph-processing, building on the versatility of graph neural networks (GNNs) to handle relational and distributed control tasks [41].
Modular GNNs provide the ability to localize computation across graph nodes, a concept increasingly leveraged for adaptable distributed control systems [42].
A similar approach is seen in neural module networks by Andreas et al. [43], which enables task-specific processing through dynamically assembled modules.
Inspired by these designs, our proposed architecture, DIMAS, arranges distinct modules to handle control tasks independently yet cooperatively, guided by specific priority rules and interaction protocols.
Additionally, our integration of convolution and recurrence layers within a single modular structure takes inspiration from Yang et al. [44], who applied a pyramid-based approach to structured modularity in neural networks. Though originally designed for different applications, their approach has influenced our own in terms of hierarchical processing and addressing task-specific dependencies in modular architecture.
Furthermore, prioritization modules, frequently used in fields like energy management for efficient resource allocation [45,46], inform our modular-design approach to hierarchical task management within a distributed control setting.
Finally, integrating physical awareness as an independent learning module for control policies has shown effectiveness in enabling adaptive and context-aware systems, a concept echoed in recent work on physics-informed reinforcement learning [47].
Our work extends these ideas, using modular graph-based architectures to integrate task-specific and environmental awareness into distributed control frameworks.

3. Proposals

In this section, we will present the key components of DIMAS and analyze their respective capabilities as well as the limitations regarding their interpretability.

3.1. PHIMEC

Figure 1 presents the generalized architecture of the proposed meta-learning framework: green denotes the architectural elements that do not require training, such as objective function computation for sampling processes or the concatenation of state, control, and time. Blue denotes the control agent and the physics representation learner, whose role is that of the RK-PINN [12].
Assumption:
$$\forall\, U(s,a)\ \exists\, s_0, s_1, \ldots, s_T,\ (a^*, s).$$
where
$(a^*, s)$ is the unique optimal solution for the Bellman equation of the form
$$V^*(s) = \max_a \left[ U(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]$$
$V^*(s)$ represents the optimal value function for state $s$
$\gamma$ is the discount factor
$P(s' \mid s, a)$ is the transition probability from state $s$ to $s'$, given action $a$
$U(s, a)$ is some utility function
$s_0, s_1, s_2, \ldots, s_T$ is a state sequence on time horizon $T$.
This can also be paraphrased as a weaker notion of global controllability of the system: we do not assert that any trajectory is achievable via control, only that at least some combinations of initial and final states are.
We do not assume that the aforementioned Bellman equation has to be solvable analytically, nor do we claim the uniqueness of utility functions; we allow for the existence of some additional regularizing term, possibly defined by the learning mechanics, that is able to introduce uniqueness.
Even making this limited claim may seem uncertain without formal proof. However, in practice, there is usually no need to add extra conditions to ensure uniqueness, as solutions are often unique without them. Evidence for this is the effectiveness of deterministic agents, which achieve a level of performance that we deem truly optimal for systems that allow analytical inspection.
The overall perspective of learning optimal control via a physics-informed neural network is anything but new, as the data efficiency of control agents is a crucial issue in the field.
However, the problem with current research on the topic, such as the work by Mowlavi et al. [48], is that although the framework wins by a margin compared to fully black-box approaches at the time of learning, the advantage disappears at the time of inference, as the need to solve complex optimization problems in real-time settings still stands, thus either losing the usual speed advantage of neural networks or decreasing its prominence. PHIMEC belongs to a hybrid family, operating in a physics-informed approach during training and then going fully black-box in operational settings.
The architecture is very simple. The overall system employed, as shown in the diagram, can be perceived as a residual encoder. The model is separated into two parts—the controller and the representation learner. We consider two main schematics of training—joint and separate.
During joint training, from the beginning, the input, which consists only of information about the environment, such as sensor data, first goes into the input layer of the controller. Then, via a residual connection, it is concatenated with the output of the controller and used as an input to the representation learner.
What differentiates our structure from the typical representation learners, such as autoencoders, is that although embeddings are usually learned in the first half of the NN, PHIMEC is somewhat inverted because the flow begins from the decoder and then goes into the encoder, which tries to learn the representation of the code.
Separate training is split into three phases. First, the RK-PINN is trained in an unsupervised manner to learn the characteristics of the controlled system. Second, the computational graph of the optimizer is added to it as a head.
Then, we sample possible control outputs via a user-defined data-generation process and, obtaining their gradients via a representation learner, train the controller in a supervised manner (except that the gradient of the output is not computed via the difference between the control output and some perfect result, but is instead straightforwardly obtained from meta-training PINN).
The gradient of standard feedforward agent weights during either a joint or separate training process can be expressed as follows:
$$\frac{\partial L}{\partial W_A} = \sum_{t=0}^{T} \frac{\partial L}{\partial s_f} \cdot \frac{\partial s_f}{\partial a_t} \cdot \frac{\partial a_t}{\partial W_A}$$
where
$W_A$ is a controller weight
$a_t$ is some action corresponding to some state in a batch of size $T$
$s_f$ is the reached state of the system
$L$ is the objective function, computed according to $s_f$, which can be either maximized or minimized
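To make this gradient flow concrete, the following is a minimal PyTorch sketch of the separate training scheme, where the controller is trained by backpropagating the objective through a frozen representation learner. The network sizes, the quadratic placeholder objective, and the generic MLP standing in for the RK-PINN are all illustrative assumptions, not the configuration used in our experiments.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the real architectures are task-specific.
controller = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))
pinn = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 8))  # stand-in for the RK-PINN

for p in pinn.parameters():
    p.requires_grad_(False)  # the representation learner is frozen during separate training

def objective(s_f: torch.Tensor) -> torch.Tensor:
    # Placeholder objective L(s_f); in practice this encodes the control criterion.
    return (s_f ** 2).sum()

states = torch.randn(32, 8)                       # states from a user-defined sampling process
actions = controller(states)                      # a_t = controller(s_t)
s_f = pinn(torch.cat([states, actions], dim=-1))  # predicted reached state s_f
loss = objective(s_f)
loss.backward()  # dL/dW_A flows through the frozen PINN into the controller weights
```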
If the recurrent model is used as a controller, the computation of the gradient is only slightly different to account for backpropagation through time. However, it restrains data generation, or the state-sampling process, exchanging completely random batches for time horizons.
Although any changes for PINN architecture in recurrent cases are not required, we still consider the possibility of using RNN as a representation learner, following frameworks such as the work of Zheng et al. [49], which could allow for joint training in the case of a recurrent agent.
$$\frac{\partial L}{\partial W_A} = \sum_{t=0}^{T} \frac{\partial L}{\partial s_f} \cdot \frac{\partial s_f}{\partial a_t} \cdot \frac{\partial a_t}{\partial h_t} \cdot \frac{\partial h_t}{\partial W_A}$$
Although joint and separate training may seem similar, they possess widely different empirical traits.
This separate approach allows for more efficient allocation of computational resources, e.g., in cases where a pretrained representation learner is already available and we want to train small, effective controllers for specific settings via few-shot learning, or, conversely, to finetune the PINN according to some changes in the environment.
On the other hand, empirical analysis of the low-rank decomposition of weight parameters for simple tasks, such as pendulum stabilization with small controllers and trainers, has shown joint PHIMEC to be more robust to the initialization of weights, showing higher silhouette scores. If we have lots of data available, and there is no interest in dynamic changes in behavioral or environmental specifics, it can outperform a separate training process with a rigorous specification of the parameters.
However, it is also more sensitive to the choice of architectural parameters of both networks and to the characteristics chosen for the training optimization algorithm, due to the increased risk of exploding or vanishing gradients propagating through both networks at the same time in the absence of unsupervised pretraining.
Thus, given its theoretical appeal and the increased convexity of its loss function, we consider the separate training scheme to be a more viable structure for industrial settings and for the case of recurrent agents, which are prone to suffering from gradient instability due to the mechanics of backpropagation through time anyway.
We present both RNN and MLP versions of control agents, using joint training in both cases.

3.2. IMPULSTM

3.2.1. Methodology

IMPULSTM is a variation of LSTM [21], which aims at being both expressive enough to learn complicated patterns in the signal being processed and robust enough to allow for long sequence processing.
Our proposed change both achieves this result and allows knowledge about changes in the signal impulse response to be transferred into the model, enabling the user to account for them without retraining or finetuning the model, as that is known to distort the feature-extracting process and to decrease out-of-distribution performance. However, it comes at the price of losing expressive ability compared to the classical LSTM, thus restricting this approach to low-complexity, long-context tasks.
Here, we present the set of update equations for the LSTM cell with the proposed architectural change:
$$\begin{aligned} i_t &= \sigma(W_{ii} \cdot x_t + W_{hi} \cdot h_{t-1}) \\ f_t &= \sigma(W_{if} \cdot x_t + W_{hf} \cdot h_{t-1}) \\ o_t &= W_{io} \cdot x_t + W_{ho} \cdot h_{t-1} \\ g_t &= \tanh(W_{ig} \cdot x_t + W_{hg} \cdot h_{t-1}) \\ c_t &= f_t \odot c_{t-1} + i_t \odot g_t \end{aligned}$$
The step-by-step algorithm for updating the weights of the output gate according to changes in the sampling rate is as follows:
$$m = |w_x| + |w_h|$$
$$a = |w_x| / m$$
$$\mathrm{ratio} = f / f'$$
$$a' = 1 - (1 - a)^{\mathrm{ratio}}$$
$$w_x' = w_x \cdot \frac{a'}{a}, \qquad w_h' = w_h \cdot \frac{1 - a'}{1 - a}$$
There, $w_x$ and $w_h$ are the initial input and hidden-state weight parameters, while $w_x'$ and $w_h'$ are the irregular ones; $f$ is the initial sampling rate, and $a'$ and $f'$ are the "irregular" parameter and sampling rate, respectively. The absolute values account for the highly probable chance of the weights having different signs or both being negative, while the lack of bias allows for a sure preservation of magnitude in the layer output, removing the need to update the linear layer at the output end of the network.
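A minimal NumPy transcription of the rescaling steps above may look as follows; the function name and the element-wise treatment of weight pairs are our own conventions, and the exponent follows the reconstruction $\mathrm{ratio} = f / f'$ given above.

```python
import numpy as np

def rescale_output_gate(w_x: np.ndarray, w_h: np.ndarray,
                        f_initial: float, f_irregular: float):
    """Adjust output-gate weights for a changed sampling rate, following the
    exponential-smoothing analogy above. Assumes nonzero weights so that the
    magnitude shares a and 1 - a stay away from 0 and 1."""
    m = np.abs(w_x) + np.abs(w_h)              # m = |w_x| + |w_h|
    a = np.abs(w_x) / m                        # a = |w_x| / m
    ratio = f_initial / f_irregular            # ratio = f / f' (as reconstructed)
    a_new = 1.0 - (1.0 - a) ** ratio           # a' = 1 - (1 - a)^ratio
    w_x_new = w_x * (a_new / a)                # w_x' = w_x (a' / a)
    w_h_new = w_h * (1.0 - a_new) / (1.0 - a)  # w_h' = w_h (1 - a') / (1 - a)
    return w_x_new, w_h_new
```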
This algorithm mimics the parametric changes in exponential smoothing. This particular connection was inspired by the work of Woo et al. on the introduction of ARMAX mechanics into transformers [50]. Also, some key changes in the architecture are added.
First, we remove the activation function from the output gate; second, we remove the biases altogether, with the goal of responding to changes in impulse response more efficiently: the whole variation is now offset-independent, allowing for better retention of scale between the input and output of each consecutive layer.
Interestingly, this linearization also shows positive results on its own, even without any manipulation of frequency, although the effect is less visible: making the network relatively narrower relaxes the loss landscape, increasing learning efficiency while still allowing for the required degree of expressiveness and a negative Lyapunov spectrum of gradient propagation back in time via the other gates.
This is not totally unexpected, as simple linear models of signal lags have been shown sometimes to outperform even state-of-the-art architectures on the tasks of their respective interest, such as time-series forecasting in the case of ARMA models, the behavior of which is essentially imitated in IMPULSTM.

3.2.2. Scaling of Architecture Changes

Since the reduction in computation is an important indirect effect of our architectural changes, occurring irrespective of whether the aforementioned algorithm is used, we provide some clarification on its behavior depending on the specifics of the implementation and the architecture of a given DNN.
The scaling of computational requirement reduction is given by the following equation:
$$\text{Compute scaling} = \frac{w (S - 1)}{4S + T + d + H}$$
where:
- $w$ is the width of the layer,
- $S$ is the ratio between the computation for a sigmoid activation and the chosen nonlinear activation,
- $T$ is the computation required for a tanh activation,
- $d$ is the computational cost of a dot product operation, and
- $H$ is the computational cost of a Hadamard product.
We separate the computational costs for the Hadamard product (H) and the dot product (d) to address potential differences in parallelization strategies. This formula represents the contribution of each component to the overall computational requirements. The ratio R, which measures the compute/memory requirements of a nonlinear versus linear neuron, can be factored into these terms.
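As a quick illustration, the scaling equation above (as reconstructed here) can be evaluated numerically; the cost values below are arbitrary placeholders in common units, not measured figures.

```python
def compute_scaling(w: float, S: float, T: float, d: float, H: float) -> float:
    """Relative compute reduction per the scaling equation above; all costs
    are expressed in the same arbitrary units (e.g., FLOPs per element)."""
    return w * (S - 1.0) / (4.0 * S + T + d + H)

# Illustrative placeholder costs: sigmoid twice the cost of its replacement, etc.
print(compute_scaling(w=64, S=2.0, T=2.0, d=1.0, H=1.0))
```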
The structural components of the LSTM remain unchanged except for the output gate. Thus, we primarily account for expressiveness loss in this part of the network. In a standard LSTM, the output gate maps the domain x ( t ) , h ( t ) to the codomain x ( t + 1 ) , h ( t + 1 ) via a linear product involving x ( t ) , h ( t ) , c ( t ) , and a sigmoid activation function:
$$\text{Sigmoid}\left( W \cdot [x(t), h(t)] \right).$$
In terms of theoretical expressiveness, this output gate can be seen as an additional nonlinear layer in a feedforward neural network. It processes the representation from the rest of the LSTM structure before producing the final output.
In IMPULSTM, this additional nonlinear layer is removed, resulting in a reduction in both expressiveness and vanishing/exploding gradient—corresponding to the removal of L MLP layers from a neural network—and thus, the scaling of that change can be explained in the same way as the scaling of an MLP with depth and width.
This is because the change affects only the formal bounds: the empirical ease of the training process scales with the Lipschitz bound of the loss-function hypersurface as the depth of the network grows.

3.3. DIMAS

In this section, we present the third and final contribution, which encapsulates the two previous contributions and fully leverages them. Here, we present the general architecture of DIMAS.
As illustrated in the diagram in Figure 2, the structure is organized into three layers, with the priority manager at the core serving as the only non-interchangeable component of the system.
The integration of domain knowledge regarding the spatio-temporal characteristics of the analyzed system is fundamental to the gray-box information-processing system that incorporates physical principles, a concept explored in various works on the topic.
Addressing such problems while managing other issues in a scalable manner necessitates the use of highly complex structures. A balance must be struck between efficacy and scalability.
Typically, a complex solution is proposed wherein the individual modules are interdependent, making it challenging to interpret the effects of changes in any module due to the lack of discrete, standalone architecture elements.
In our analysis, we have prioritized designing each module in a way that allows for easy interchangeability and complete separation of their end-to-end preparation processes.
This property stems not from traits such as scalability with respect to some measurable parameter of a specific architecture, but from the intuitive separation between the integration of specific data about individual elements and the state of the system overall.
DIMAS is rigid only in its general structure, not in its specific components.
Although DIMAS is a versatile framework and can be applied even when considering control agents individually, we believe its most effective application lies in leveraging the interrelationships between these agents to optimize the system as a whole.
Therefore, we represent the system as a graph G, which is defined as follows:
G = ( V , E )
where V is the set of vertices and E is the set of edges.
V would contain the separate control entities, and E would mean some numerical representation of the relationships, e.g., the distance between them.
The control quality criterion C over a time horizon from 0 to T is defined as:
$$C = \sum_{t=0}^{T} \sum_{i \in V} L\left( x_i(t), u_i(t) \right)$$
where:
  • $x_i(t)$ is the last measured state of node $i$ at time $t$.
  • $u_i(t)$ is the amount of control applied to node $i$ at time $t$.
  • $L$ is a function that depends on $x_i(t)$ and $u_i(t)$.
The function L ( x i ( t ) , u i ( t ) ) represents the control quality at each node i and each time t, incorporating both the state and the control applied. In the case that the control quality can only be measured by assessing the system as a whole, L corresponds to the impact of individual nodes on overall quality.
Thus, the overall control quality criterion C is the sum of the control qualities over all nodes and all time steps.
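A minimal sketch of the criterion $C$ over a graph may look as follows, assuming a hypothetical quadratic per-node loss $L(x, u) = x^2 + u^2$ and scalar states and controls; the actual choice of $L$ is task-specific.

```python
import networkx as nx

def control_quality(G: nx.Graph, x: dict, u: dict, T: int) -> float:
    """Sum the per-node loss over all nodes and timesteps 0..T.
    x[t][i], u[t][i]: last measured state and applied control of node i at time t."""
    return sum(x[t][i] ** 2 + u[t][i] ** 2 for t in range(T + 1) for i in G.nodes)

G = nx.path_graph(3)  # three agents in a line; edges could carry, e.g., distances
x = {t: {i: 0.1 * t for i in G.nodes} for t in range(3)}
u = {t: {i: 0.05 for i in G.nodes} for t in range(3)}
print(control_quality(G, x, u, T=2))
```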

3.3.1. Priority Manager (PM)

Described earlier as the core of DIMAS, the PM comprises the logic for deciding, at each timestep, whether to gather the most recent data from each agent and involve it in the overall control.
Although it may not seem that important for the regulator design itself, its goal is to reduce the compute expenditure and provide the possibility of adaptive downscaling, which is a usual problem with any system involving neural networks, especially presently, with the increase in popularity of large language models [24]. It may be stated as a decision-tree model operating on individual nodes or as a resource-allocation algorithm operating on groups of nodes at the same time.
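As one hypothetical realization of the per-node decision logic, the PM can be sketched as a simple urgency-threshold selector; an actual deployment may equally use a decision tree or a group-level resource-allocation algorithm, as noted above.

```python
def select_nodes_to_update(urgency: dict, budget: int) -> set:
    """Pick at most `budget` nodes with the highest urgency scores;
    the remaining nodes reuse their previous control signal this timestep."""
    ranked = sorted(urgency, key=urgency.get, reverse=True)
    return set(ranked[:budget])

print(select_nodes_to_update({"a": 0.9, "b": 0.2, "c": 0.7}, budget=2))  # {'a', 'c'}
```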

3.3.2. Urgency Decoder (UD)

In summary, the primary function of the UD is to analyze the regulated system and aggregate information from various parts or individual elements. We have positioned it at the top of the graphical model, as its behavior is most easily visualized in typical use cases. The choice to separate it was made for two reasons.
First, it makes parallelization more intuitive. Figure 3 shows a flow diagram with an example schema, where k and d are the absolute differences in processing time for each structural element; for the sake of optimal scheduling, k should equal d.
Second, whether it is in industrial engineering [51] or social sciences [52], graph signal processing (GSP) is a powerful concept with a large underlying research foundation.
Separating UD into a different entity from PM allows us to access a wider range of GSP techniques, as now, UD is only connected to PM via a requirement to present a graph representation, where each node possesses its own urgency-related representation of the state.

3.4. Scalability

In this subsection, we will formally clarify the conceptual limitations of the interpretability of individual components within the system.
Additionally, we will define how system performance depends on the specific characteristics of the controlled system, outlining the relationship between these traits and the efficiency of the modular elements.

3.4.1. Structural

As DIMAS is a modular system, its scalability with respect to task complexity is primarily determined by the configuration of its individual components.
From a computational perspective, DIMAS scales similarly to its non-modular counterpart.
Since it is based on a graph neural network architecture, the system exhibits linear scalability relative to the number of controlled agents.
However, the incorporation of PHIMEC and IMPULSTM significantly impacts system efficiency. We will analyze each component individually to highlight the specific contributions and improvements they offer.

3.4.2. PHIMEC

As PHIMEC is essentially used as a combination of two neural networks, its efficiency traits are pretty generic and may either be described by the universal approximation theorem [53] or constraining it to the specific subset of neural systems [54,55].
In addition to the empirical improvements in efficiency, which will be demonstrated in Section 4, this approach introduces a conceptual advancement in the analysis of system performance.
By leveraging the actual residuals of the controlled physical system, the evolution of the controller becomes independent of both its own architecture and that of the teacher network.
This decoupling enables a more robust and architecture-agnostic evaluation of control efficiency, enhancing both the flexibility and generalizability of the system.
For instance, the approach outlined in the work by Yang et al. [56] can be utilized, offering stronger approximation bounds compared to the system-dependent regret bounds typically used in formal analyses of reinforcement-learning algorithms.
This shift allows for more rigorous performance guarantees and a broader application of the controller across various systems.

3.4.3. IMPULSTM

As the aggregation function specifically is exchanged for IMPULSTM, the benefit of using IMPULSTM over some counterpart neural network can be seen in two major ways:
First, a benefit can be observed with respect to the number of trainable parameters, which we addressed in Section 3.2.2.
Second, there is a difference in expressiveness bounds.
Let us define the generic difference in efficiency between the counterpart and IMPULSTM on some systems whose behavior is described via PDE.
Consider a system in multiple dimensions (spatial variables $x_1, x_2, \ldots, x_n$) and time $t$. The general dynamics can take the form:
$$\frac{dx(t)}{dt} = f(x(t), u(t), t)$$
Here:
- $x(t)$ is the state of the system at time $t$,
- $u(t)$ is the input or control variable,
- $f(x(t), u(t), t)$ describes how the state evolves over time.
In this case, the differential equation shows that the rate of change in the system's state $x(t)$ depends on the current state $x(t)$, the input $u(t)$, and possibly the time $t$ itself.
For this system, the advantage of using IMPULSTM instead of some counterpart neural network C may be separated into two metrics for simplicity.
First, there is the advantage of using basic LSTM instead of C.
Due to it being a widely explored topic, we will refer to the works of Oukhouya et al. [57] and Niu et al. [58].
Second, there is the benefit of using IMPULSTM instead of LSTM.
This advantage can be described as:
$$B = \frac{\Delta T}{\Delta E}$$
where:
$$\Delta E = \frac{E_{LSTM} - E_{IMPULSTM}}{E_{LSTM}}$$
$$\Delta T = T_{IMPULSTM} - T_{LSTM}$$
Here, we denote the relative difference in theoretical expressiveness due to using IMPULSTM instead of LSTM and the improvement in handling temporal irregularity, respectively, with:
- $E_{LSTM}$ denoting the expressiveness of the standard LSTM,
- $E_{IMPULSTM}$ denoting the expressiveness of the IMPULSTM,
- $T_{LSTM}$ and $T_{IMPULSTM}$ denoting the temporal handling abilities of LSTM and IMPULSTM, respectively, with respect to irregularly sampled data.
Thus, B quantifies how much benefit is gained in handling irregular temporal data, normalized by the theoretical loss in expressiveness.
A higher value of B would indicate that the gain in managing irregular time sequences outweighs the loss in expressiveness.
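For illustration, B can be computed directly from these definitions; the expressiveness and temporal-handling scores below are placeholder values standing in for task-specific measurements.

```python
def benefit(E_lstm: float, E_impulstm: float, T_lstm: float, T_impulstm: float) -> float:
    """B = dT / dE from the definitions above; assumes E_lstm > E_impulstm."""
    dE = (E_lstm - E_impulstm) / E_lstm   # relative expressiveness loss
    dT = T_impulstm - T_lstm              # gain in temporal handling
    return dT / dE

print(benefit(E_lstm=1.0, E_impulstm=0.9, T_lstm=0.5, T_impulstm=0.8))  # 3.0
```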
Let us consider a more specific case, considering a linear system:
$$\dot{x}(t) = A x(t) + B u(t)$$
$$y(t) = C x(t) + D u(t)$$
where:
- $x(t) \in \mathbb{R}^n$ is the state vector of the system at time $t$,
- $u(t) \in \mathbb{R}^m$ is the input vector (control input),
- $y(t) \in \mathbb{R}^p$ is the output vector,
- $A \in \mathbb{R}^{n \times n}$ is the system matrix that defines the system's dynamics,
- $B \in \mathbb{R}^{n \times m}$ is the input matrix,
- $C \in \mathbb{R}^{p \times n}$ is the output matrix, and
- $D \in \mathbb{R}^{p \times m}$ is the feedthrough (or direct transmission) matrix.
The linearity with respect to the mechanics can be assessed from the individual agent behavior and extended to the MAS via perspective such as the one in the work by Qin et al. [59].
We define temporal irregularity as the difference between the true state of the system at the most recent timestep and the state implied by the last available measurement when the sampling times are irregular.
Let us define this more precisely:
1. True State at the Current Timestep:
- Let $x(t_k)$ be the true state of the system at the current timestep $t_k$.
- The system evolves from the previous timestep $t_{k-1}$ according to the system dynamics:
$$x(t_k) = \Phi(t_k, t_{k-1})\, x(t_{k-1}) + \int_{t_{k-1}}^{t_k} \Phi(t_k, \tau)\, B u(\tau)\, d\tau$$
where $\Phi(t_k, t_{k-1})$ is the state-transition matrix.
2. Last Measurement:
- If $y(t_{k-1}) = C x(t_{k-1})$ is the last measurement at time $t_{k-1}$, the neural network typically relies on the measurement $y(t_{k-1})$ and the known input $u(t_{k-1})$ to estimate the state at the next timestep $t_k$.
3. Temporal Irregularity:
- Temporal irregularity can then be defined as the deviation between the true state $x(t_k)$ and the estimate based on the last available measurement at $t_{k-1}$, assuming regular evolution of the system:
$$\Delta x(t_k) = x(t_k) - \hat{x}(t_k \mid t_{k-1})$$
where $\hat{x}(t_k \mid t_{k-1})$ is the estimated state at $t_k$, based on the measurement $y(t_{k-1})$ and the system dynamics over the time interval $(t_{k-1}, t_k)$.
This deviation Δ x ( t k ) represents how much the system’s actual state differs from the predicted state due to the irregularity in sampling times. Larger gaps between sampling times (i.e., greater temporal irregularity) generally lead to larger deviations, increasing the difficulty for the neural network to model the system accurately.
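The following is a small numerical sketch of this deviation for an autonomous linear system (taking $u = 0$ for brevity), where the state-transition matrix is $\Phi(t_k, t_{k-1}) = e^{A \Delta t_k}$ and a naive estimator simply holds the last measurement; the system matrix is illustrative.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative system matrix of a damped oscillator; u = 0 for brevity.
A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])
x_prev = np.array([1.0, 0.0])  # state at the last measurement, x(t_{k-1})

for dt in (0.1, 0.5, 2.0):                  # increasingly large sampling gaps
    Phi = expm(A * dt)                      # state-transition matrix Phi(t_k, t_{k-1})
    x_true = Phi @ x_prev                   # true state x(t_k) under the dynamics
    x_naive = x_prev                        # estimate that holds the last measurement
    dev = np.linalg.norm(x_true - x_naive)  # ||Delta x(t_k)||
    print(f"dt = {dt}: deviation = {dev:.3f}")
```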
To define the cumulative advantage of using IMPULSTM over LSTM, we focus on how well each model represents the current state in the presence of temporal irregularity. Since the measurements of the system’s state still come externally (not recursively predicted by the model), we refrain from considering the long-term drift that would occur if the model were completely self-reliant and lacked temporal awareness. This is akin to the discretization errors seen in classical partial differential equation (PDE) solvers.
Let us formalize the cumulative advantage step-by-step:
1. State Representation and Isomorphism
We assume the existence of an isomorphism between the true state of the system in its original state space and the latent representation used by the IMPULSTM model. Let:
- $x(t_k) \in \mathbb{R}^n$ represent the true state of the system at time $t_k$,
- $\hat{x}_{IMP}(t_k) \in \mathbb{R}^n$ be the state representation in the latent space of the IMPULSTM model at time $t_k$,
- $\hat{x}_{LSTM}(t_k) \in \mathbb{R}^n$ be the state representation of the LSTM model at time $t_k$.
We assume there exists a mapping, an isomorphism $I: \mathbb{R}^n \to \mathbb{R}^n$, such that:
$$x(t_k) = I(\hat{x}_{IMP}(t_k))$$
This isomorphism implies that the latent representation of either the LSTM or IMPULSTM model can be mapped to the true state $x(t_k)$ in the original space without losing any information.
2. Deviation due to Temporal Irregularity
For a system with irregular sampling times, let t k 1 be the time of the last measurement and t k the current time. The actual state x ( t k ) evolves according to the system’s dynamics, as discussed earlier:
$$x(t_k) = \Phi(t_k, t_{k-1})\, x(t_{k-1}) + \int_{t_{k-1}}^{t_k} \Phi(t_k, \tau)\, B u(\tau)\, d\tau$$
where Φ ( t k , t k 1 ) is the state-transition matrix.
In contrast, the LSTM or IMPULSTM models, due to irregular sampling, may predict different latent states x ^ L S T M ( t k ) and x ^ I M P ( t k ) .
3. Cumulative Advantage Definition
We define the advantage of IMPULSTM as the cumulative sum of differences between the actual state $x(t_k)$ and the state represented by the IMPULSTM model $\hat{x}_{IMP}(t_k)$, compared to the difference between the actual state and the LSTM model state $\hat{x}_{LSTM}(t_k)$.
Let the temporal error at each timestep be:
$$\epsilon_{IMP}(t_k) = \left\| x(t_k) - I(\hat{x}_{IMP}(t_k)) \right\|$$
$$\epsilon_{LSTM}(t_k) = \left\| x(t_k) - I(\hat{x}_{LSTM}(t_k)) \right\|$$
The cumulative advantage of IMPULSTM over LSTM across a time period T is then defined as:
$$A_{IMP} = \sum_{k=1}^{N} \left[ \epsilon_{LSTM}(t_k) - \epsilon_{IMP}(t_k) \right]$$
where N is the number of sampled timesteps during the time period T. The advantage A I M P represents the cumulative reduction in temporal error.
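Computing $A_{IMP}$ from per-step errors is then a direct summation; the error sequences below are synthetic stand-ins rather than measured values.

```python
import numpy as np

def cumulative_advantage(eps_lstm: np.ndarray, eps_imp: np.ndarray) -> float:
    """A_IMP = sum over timesteps of (eps_LSTM - eps_IMP), per the definition above."""
    return float(np.sum(eps_lstm - eps_imp))

eps_lstm = np.array([0.30, 0.45, 0.60])  # per-timestep errors of the baseline LSTM
eps_imp = np.array([0.10, 0.15, 0.20])   # per-timestep errors of IMPULSTM
print(cumulative_advantage(eps_lstm, eps_imp))  # 0.9
```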
Now, let us explain how the advantage scales with some specific characteristics of the system, which are usually accessible in distributed control cases, such as the one to which we apply DIMAS.
1. Lipschitz Continuity and Temporal Variation
Let us assume that the dynamic system governing the evolution of the state x ( t ) is Lipschitz continuous.
This implies there exists a Lipschitz constant L such that for any two time instants t k 1 and t k :
$$\| x(t_k) - x(t_{k-1}) \| \leq L \cdot | t_k - t_{k-1} |$$
The Lipschitz constant L characterizes how sensitive the system’s state is to changes in time.
A higher value of L indicates that small changes in time lead to large variations in the system state.
2. Error Scaling with Lipschitz Constant and Interval Variation
The cumulative error between the predicted state by the model (IMPULSTM or LSTM) and the true state x ( t k ) grows as a function of both the Lipschitz constant and the variation in the time intervals.
Let the time intervals between measurements be denoted as $\Delta t_k = t_k - t_{k-1}$. The error at each timestep depends on both the Lipschitz constant $L$ and the size of the interval $\Delta t_k$.
The error for a single timestep due to temporal irregularity can be expressed as:
$$\epsilon_{IMP}(t_k) \leq L \cdot \Delta t_k + \text{model error}$$
$$\epsilon_{LSTM}(t_k) \leq L \cdot \Delta t_k + \text{model error}$$
The model error includes inaccuracies due to the neural network’s inability to perfectly capture the system dynamics.
However, the temporal error specifically grows with the interval Δ t k and the Lipschitz constant L, as the state varies more over larger time gaps.
3. Cumulative Error Over Time
The total cumulative error over a time horizon T (from time $t_0$ to $t_N$) scales with both the Lipschitz constant and the sum of interval variations. For N timesteps, the cumulative error for each model can be written as:
$$E_{IMP} = \sum_{k=1}^{N} \epsilon_{IMP}(t_k) \leq \sum_{k=1}^{N} \left( L \cdot \Delta t_k + \text{model error} \right)$$
$$E_{LSTM} = \sum_{k=1}^{N} \epsilon_{LSTM}(t_k) \leq \sum_{k=1}^{N} \left( L \cdot \Delta t_k + \text{model error} \right)$$
4. Effect of Interval Variations
If the intervals Δ t k vary widely, the total cumulative error grows faster.
Specifically, irregularly sampled data (with higher variance in Δ t k ) leads to greater prediction errors, especially for systems with high Lipschitz constants.
Thus, for two models that have the same Lipschitz constant L, the total cumulative error over time depends on the distribution and variation of the intervals Δ t k . Larger or more irregular intervals increase the error because the system’s state changes more between measurements, and the model must make predictions over larger gaps in time.
5. Cumulative Advantage Incorporating Lipschitz Scaling
We now modify the cumulative advantage equation from earlier to incorporate the effect of the Lipschitz constant and interval variation. The cumulative advantage of IMPULSTM over LSTM becomes:
$$A_{IMP} = \sum_{k=1}^{N} \left[ \epsilon_{LSTM}(t_k) - \epsilon_{IMP}(t_k) \right]$$
Substituting the error bounds for each model:
$$A_{IMP} = \sum_{k=1}^{N} \left[ \left( L \cdot \Delta t_k + \text{model error}_{LSTM} \right) - \left( L \cdot \Delta t_k + \text{model error}_{IMP} \right) \right]$$
$$A_{IMP} = \sum_{k=1}^{N} \left( \text{model error}_{LSTM} - \text{model error}_{IMP} \right)$$
Thus, the advantage primarily comes from the difference in model errors, while the Lipschitz term L · Δ t k scales both errors similarly for both models.
The greater the interval variation Δ t k , the more pronounced the error differences will be between the models, especially in systems with high Lipschitz constants.

4. Experiments

This section is dedicated to the empirical analysis of the performance of both the individual components of the proposed method and the modular system as a whole.
This analysis was conducted on an Nvidia GeForce RTX 3060 graphics card, using PyTorch under Python 3.9 for the graph-processing and learning-process implementation. It is important to note the potential risks associated with the intuitive scalability of graph neural networks (GNNs): while GNNs scale linearly with the number of nodes in terms of computation, they also introduce challenges in handling communication overhead, and the latter greatly slowed down the analysis.

4.1. Results PHIMEC

4.1.1. Control Problem

Although PHIMEC is a generic control learning framework, we have chosen to focus on a specific engineering application to showcase its performance in comparison to some benchmark reinforcement-learning algorithms. In our study, we focus on devising an efficient real-time optimal control mechanism for regulating the wing flaps of a flexible-wing aircraft with a fixed structure.
The lift and drag coefficients of a wing are commonly expressed as functions of the angle of attack ( α ), denoted as L ( α ) for lift and D ( α ) for drag.
Even when the flaps remain stationary, the lift and drag coefficients can vary with the angle of attack, influenced by both the wing and flap geometry. In our analysis, we consider the fixed position of the flaps and account for their impact within the lift and drag coefficients.
Let $\Theta = [\theta_1, \theta_2, \ldots, \theta_N]$ denote the vector of flap angles, where $N$ is the number of flaps. We define the objective function as follows:
$$\mathcal{L}(\Theta) = \frac{F_L}{F_D} - \lambda \left( F_L - F_{L_{\min}} \right)$$
where:
- $\lambda$ is the Lagrange multiplier associated with the lift-force constraint, which we apply to simplify the usage of gradient descent-based algorithms for the training of the controller. It also serves as an additional regularization term for the control agent, increasing the uniqueness of the solution for the training process.
- $F_{L_{\min}}$ is the minimum acceptable lift force, which depends on the characteristics of the aircraft structure itself and the other characteristics of the use case.
- $F_L$ and $F_D$ are the lift and drag forces, stated as functions of the flap angles $\theta_i$ and environmental parameters such as axial force.
The Lagrange multiplier λ introduces the lift-force constraint into the optimization problem, ensuring that the lift force remains above the specified threshold F L min , which is stated in the last constraint.
By maximizing the objective function L ( Θ ) , we aim to find the optimal control for the given moment via maximizing the lift-to-drag ratio while satisfying the lift-force constraint.
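A hedged sketch of how this objective can drive gradient-based training of the controller is given below; the lift and drag models are simple placeholders, whereas in PHIMEC these quantities would come from the trained representation learner.

```python
import torch

def F_L(theta: torch.Tensor) -> torch.Tensor:
    return 1.0 + torch.sin(theta).sum()   # placeholder lift model, not an aerodynamic one

def F_D(theta: torch.Tensor) -> torch.Tensor:
    return 0.5 + (theta ** 2).sum()       # placeholder drag model

lam, F_L_min = 0.1, 1.0                   # multiplier and lift threshold (illustrative)
theta = torch.zeros(4, requires_grad=True)  # four flap angles

# Objective L(Theta) per the equation above; gradient ascent on theta maximizes it.
objective = F_L(theta) / F_D(theta) - lam * (F_L(theta) - F_L_min)
objective.backward()                       # gradient of L with respect to the flap angles
print(theta.grad)
```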

4.1.2. Empirical Analysis

In this section, we present the comparative results of our algorithm against other reinforcement-learning algorithms that employ the policy gradient. As points of comparison, we use RDPG [16], DDPG [17], SAC [18], TD3 [19], and RTD3 [20].
The choice was based on selecting a range of both recurrent and instance-oriented schemas, choosing those that involve policy gradient. The main point of our approach is an increase in efficiency for data incorporation. Thus, we will be comparing networks of the same size but using different amounts of data for the sake of experimental computation costs. However, this can be easily extrapolated into the perspective of the same data amounts.
For every algorithm, the actors and critics are feedforward MLPs, ranging from 2-layer perceptrons with 256 hidden units to 8-layer perceptrons with a hidden dimension of 64, with Sigmoid as the intermediate activation. For recurrent agents, the layers are exchanged for LSTM cells with otherwise no change in structure.
The architectural choice was based on the architectures that were presented in the respective articles of benchmark RL algorithms. Also, there was a critical increase in the complexity of the learning process for off-policy algorithms with growing depth. Thus, the decision was made in favor of safer parameters.
In our algorithm, the role of the critic is given to the PINN; we use PINNs of the same size as the other critics. For the meta-trainer, however, depth is the main focus, so we use solely the 8-layer fully connected network with 64 hidden units.
Table 1 and Table 2 present the resulting average lift-to-drag values, in terms of the final objective function value, for noise magnitudes ranging from 0.1 percent to 15 percent of the original sensor signal. For the first 1000 steps for non-recurrent algorithms and 10,000 for recurrent ones, the agents take random steps; we do not display these.
We present a grouped-by-noise histogram of the results for both versions.
Consistent outperformance can be seen in Figure 4, although our method loses to SAC and the recurrent version of TD3 at the 15 percent noise level.

4.2. Results IMPULSTM

4.2.1. Performance Benchmarks

We have selected eight main benchmarks.
Four time-series forecasting benchmarks were utilized, each comprising noisy signals with irregular patterns. The emphasis was on the diversity of the test suite, encompassing datasets of varying sizes and sequences with distinct characteristics.
To replicate irregular sampling rates, a process was implemented where 50 percent of entries in the time-series forecasting applications were artificially skipped. Subsequently, missing entries were either appended or existing ones were removed. Although the option of further reducing the initial sampling rate was considered, it was deemed insufficient as comparative statistics showed negligible differences.
In the realm of meta-sequential decision-making, the focal application was the meta-optimization of neural networks employing dropout. The number of optimizer iterations, wherein an optimized parameter was dropped out, was perceived as analogous to the time since the last event. Recurrent meta-learners were trained on architectures akin to the test applications, excluding dropout. Specifically, a two-layer architecture with 20 hidden units was utilized, mirroring the approach outlined by Andrychowicz et al. [60].

4.2.2. Empirical Analysis

We trained our models with Adam for 500 epochs with full sequence rollout, in order to test the ability to learn stably on very long sequences, using the networks with the best test loss as the points of reference.
Optimal hyperparameters for each architecture, such as learning rate and degree of regularization, have been found using ALMI-PSO [61], as it was essentially created for the efficient search of such hyperparameters in non-differentiable spaces.
Table 3 shows the datasets used, their task-specific traits, and how they are denoted in the manuscript. In the tables below, we show the relative difference (in percent) in validation MAPE between the proposed architecture and benchmark models of the same architecture with 4 hidden layers for each task. Both IMPULSTM and the reference models are 64 hidden units wide.
Table 4 and Table 5 show the results on forecasting applications, where we observe a significant increase in the performance gap between the baseline model and our proposed method.
This trend provides empirical support for the hypothesis that signal propagation through the network layers is more efficient in our approach.
However, in meta-optimization tasks, while IMPULSTM still demonstrates some improvement, the performance enhancement is less pronounced. The difference in outperformance is shown more clearly in Figure 5.
In certain cases, it even underperforms compared to the baseline LSTM. This observation underscores another important finding.
Convolutional networks are well known for their ability to learn effectively through layer separation, treating each kernel as a mini-network.
Previous studies have shown that these networks can even be initialized with a series of manually crafted image-processing filters.
We attribute the limited improvement of time-aware techniques over standard LSTM to the strong disconnection between layers, which reduces the benefits of temporal awareness in these cases.
For IMPULSTM, its primary advantage lies in scalability and adaptability to irregular time intervals.
However, this advantage does not fully materialize in tasks that can be effectively addressed by shallow networks or handcrafted optimization techniques.
Moreover, the nature of meta-optimization tasks further contributes to IMPULSTM’s suboptimal performance relative to other time-aware models.
While IMPULSTM excels in utilizing time-domain information, making it highly effective for modeling physical systems with coherent temporal dynamics, the time domain in meta-optimization tasks is often artificially constructed, lacking inherent temporal structure for the model to exploit.
This can be seen especially clearly in the chart, where other time-aware models also underperform relative to a basic LSTM.

4.3. Results DIMAS

4.3.1. Formation Control Task

To effectively evaluate DIMAS, we will focus on the formation control task, similar to the analysis conducted by Jiang et al. [62].
Although we will utilize different models for the analysis, the objective remains consistent: to achieve the desired characteristics from a formation of multiple objects, each governed by a corresponding control agent. Specifically, we aim for these objects to adhere to a net formation composed of triangles.
We have considered two primary approaches for this evaluation.
The first approach involves working directly with the proximity graph, ensuring that the edges conform to a specific pattern.
However, we have opted for an alternative approach. While directly handling the edges might potentially enhance loss convexity—an aspect we have not analyzed in detail—we focus instead on the positions and movement characteristics of our robots as numerical representations of nodes.
To enforce a triangular formation among the robots, we utilize the potential function U i j as a criterion for control quality. This function is defined as follows:
$$U_{ij} = \frac{1}{2} \left( \| p_i - p_j \| - d^* \right)^2, \quad \{i, j\} \in E$$
where:
  • $\| p_i - p_j \|$ is the Euclidean distance between robots $i$ and $j$.
  • $d^*$ is the prescribed inter-robot distance.
The potential function U i j takes its minimum value when the distance between each pair of robots i and j equals d * .
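A direct implementation of this potential is shown below; the positions and the prescribed distance are illustrative.

```python
import numpy as np

def potential(p_i: np.ndarray, p_j: np.ndarray, d_star: float) -> float:
    """Pairwise potential U_ij: zero exactly when ||p_i - p_j|| equals d*."""
    return 0.5 * (np.linalg.norm(p_i - p_j) - d_star) ** 2

p_i, p_j = np.array([0.0, 0.0]), np.array([1.0, 0.0])
print(potential(p_i, p_j, d_star=1.0))  # 0.0, the formation target is met
```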

4.3.2. Graph Neural Networks

The introduction of graph neural networks (GNNs) [63] enabled the exploitation of complex relationships between objects in a system via black-box machine learning.
In this work, we use one of their architectures, namely the Message-Passing Graph Neural Network, for the sake of preserving the intuitiveness of operating on individual nodes.
Message-Passing Graph Neural Networks (MPGNNs) process graphs through three main steps: Aggregate, Update, and Readout.
  • Aggregate:
    $$m_v^{(k+1)} = \text{AGGREGATE}^{(k)} \left( \left\{ h_u^{(k)} : u \in N(v) \right\} \right)$$
    where $m_v^{(k+1)}$ is the aggregated message for node $v$ at layer $k+1$, $N(v)$ denotes the set of neighbors of node $v$, and $\text{AGGREGATE}^{(k)}$ is a function that aggregates the messages from the neighbors.
  • Update:
    $$h_v^{(k+1)} = \text{UPDATE}^{(k)} \left( h_v^{(k)}, m_v^{(k+1)} \right)$$
    where $h_v^{(k+1)}$ is the updated hidden state of node $v$ at layer $k+1$, and $\text{UPDATE}^{(k)}$ is a function that updates the node's hidden state based on the current hidden state and the aggregated message.
  • Readout:
    $$y = \text{READOUT} \left( \left\{ h_v^{(K)} : v \in V \right\} \right)$$
    where $y$ is the output for the entire graph, $h_v^{(K)}$ is the hidden state of node $v$ at the final layer $K$, and READOUT is a function that combines the hidden states of all nodes to produce the graph-level output.
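A minimal message-passing layer in the Aggregate/Update/Readout pattern above might look as follows, with mean aggregation and an MLP update standing in for the IMPULSTM aggregator and the PHIMEC-trained update network used in DIMAS; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)   # node degrees
        m = adj @ h / deg                                   # Aggregate: mean over neighbors
        return self.update(torch.cat([h, m], dim=-1))       # Update: combine h and m

h = torch.randn(5, 8)                   # hidden states of 5 nodes
adj = (torch.rand(5, 5) > 0.5).float()  # random adjacency matrix (illustrative)
layer = MessagePassingLayer(8)
y = layer(h, adj).mean(dim=0)           # Readout: mean over all node states
print(y.shape)                          # torch.Size([8])
```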

4.3.3. IMPULSTM and PHIMEC

In our approach to conserving computational resources during control, we introduce a parameterization of both the Aggregate and Update components of the network.
This parameterization is a departure from standard practice, where the Aggregate function is typically implemented as a simple pooling operation, such as summation or averaging.
In contrast, our methodology employs an IMPULSTM network for the Aggregate component, allowing for more sophisticated temporal awareness, while the Update component is powered by a network trained with PHIMEC, ensuring precise updates based on the physical model of the system.

4.3.4. Results

In this section, we present the results of the numerical experiments.
For the evaluation, we have selected success rate percentages as the primary metric.
The data are organized such that each row represents the success counts for a specific number of robots, ranging from 1 to 9.
Conversely, each column corresponds to the number of controlees omitted in each iteration.
We present both tabular (Table 6, Table 7, Table 8, Table 9 and Table 10) and graphical (Figure 6) representations of our results. Under fully controlled conditions, where no computational savings are implemented, DIMAS exhibits a relative decrease in performance.
However, as computational resources are progressively reduced, a clear linear trend emerges, demonstrating improved efficiency relative to the amount of computation preserved.

5. Conclusions

In this study, we have introduced several key contributions: a novel methodology for integrating temporal awareness into recurrent neural networks, an end-to-end learning framework for physics-informed neural control, and a strategy for leveraging domain knowledge about the current state of individual agents to minimize computational demands.
We have conducted a comprehensive analysis of each contribution individually, as well as their combined effect on system performance. Additionally, we have explored the individual impact of each contribution on the modular system.
The proposed integration of these two contributions demonstrates scalability in system size due to the linear scalability of graph neural networks and design flexibility, owing to the intuitive interchangeability of individual components, which do not exhibit critical interdependencies.

Author Contributions

D.S. and H.Z. have contributed equally to different aspects of this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DIMAS: Domains Integration for Multi-Agent Systems
PHIMEC: Physics-Informed Meta Control
GNN: Graph Neural Network
LSTM: Long Short-Term Memory
GSP: Graph Signal Processing
UD: Urgency Decoder
PM: Priority Manager

References

  1. Plappert, S.; Gembarski, P.; Lachmayer, R. Multi-Agent Systems in Mechanical Engineering: A Review. In Smart Innovation, Systems and Technologies; Springer: Singapore, 2021; pp. 193–203. [Google Scholar] [CrossRef]
  2. Černevičienė, J.; Kabašinskas, A. Review of Multi-Criteria Decision-Making Methods in Finance Using Explainable Artificial Intelligence. Front. Artif. Intell. 2022, 5, 827584. [Google Scholar] [CrossRef] [PubMed]
  3. Bazzan, A.L.C.; Klügl, F. A review on agent-based technology for traffic and transportation. Knowl. Eng. Rev. 2014, 29, 375–403. [Google Scholar] [CrossRef]
  4. Rizk, Y.; Awad, M.; Tunstel, E. Decision Making in Multi-Agent Systems: A Survey. IEEE Trans. Cogn. Dev. Syst. 2018, 10, 514–529. [Google Scholar] [CrossRef]
  5. Maestre, I.M.; Sánchez Prieto, S.; Velasco Pérez, J.R. Sistemas Multiagente de Tiempo Real [Real-Time Multi-Agent Systems]. 2005. Available online: https://api.semanticscholar.org/CorpusID:170300785 (accessed on 7 September 2024).
  6. He, X. Building Safe and Stable DNN Controllers using Deep Reinforcement Learning and Deep Imitation Learning. In Proceedings of the 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), Guangzhou, China, 5–9 December 2022; pp. 775–784. [Google Scholar] [CrossRef]
  7. Deng, C.; Ji, X.; Rainey, C.; Zhang, J.; Lu, W. Integrating Machine Learning with Human Knowledge. iScience 2020, 23, 101656. [Google Scholar] [CrossRef] [PubMed]
  8. Baty, H. A hands-on introduction to Physics-Informed Neural Networks for solving partial differential equations with benchmark tests taken from astrophysics and plasma physics. arXiv 2024, arXiv:2403.00599. [Google Scholar]
  9. Malcolm, K.; Casco-Rodriguez, J. A Comprehensive Review of Spiking Neural Networks: Interpretation, Optimization, Efficiency, and Best Practices. arXiv 2023, arXiv:2303.10780. [Google Scholar]
  10. Sun, Y.; Sengupta, U.; Juniper, M. Physics-informed deep learning for simultaneous surrogate modeling and PDE-constrained optimization of an airfoil geometry. Comput. Methods Appl. Mech. Eng. 2023, 411, 116042. [Google Scholar] [CrossRef]
  11. Nguyen, N.T.; Cramer, N.B.; Hashemi, K.E.; Ting, E.; Drew, M.; Wise, R.; Boskovic, J.; Precup, N.; Mundt, T.; Livne, E. Real-Time Adaptive Drag Minimization Wind Tunnel Investigation of a Flexible Wing with Variable Camber Continuous Trailing Edge Flap System. In Proceedings of the AIAA Aviation 2019 Forum, Dallas, TX, USA, 17–21 June 2019. [Google Scholar] [CrossRef]
  12. Stiasny, J.; Chevalier, S.; Chatzivasileiadis, S. Learning without Data: Physics-Informed Neural Networks for Fast Time-Domain Simulation. arXiv 2021, arXiv:2106.15987. [Google Scholar]
  13. Erichson, N.B.; Muehlebach, M.; Mahoney, M.W. Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction. arXiv 2019, arXiv:1905.10866. [Google Scholar]
  14. Chen, S.; Guo, W. Auto-Encoders in Deep Learning—A Review with New Perspectives. Mathematics 2023, 11, 1777. [Google Scholar] [CrossRef]
  15. Bank, D.; Koenigstein, N.; Giryes, R. Autoencoders. arXiv 2021, arXiv:2003.05991. [Google Scholar]
  16. Song, D.R.; Yang, C.; McGreavy, C.; Li, Z. Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge. In Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 18–21 November 2018. [Google Scholar] [CrossRef]
  17. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2019, arXiv:1509.02971. [Google Scholar]
  18. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290. [Google Scholar]
  19. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. arXiv 2018, arXiv:1802.09477. [Google Scholar]
  20. Hou, Y.; Hong, H.; Sun, Z.; Xu, D.; Zeng, Z. The Control Method of Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism to Multi-DOF Manipulator. Electronics 2021, 10, 870. [Google Scholar] [CrossRef]
  21. Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  22. Philipp, G.; Song, D.; Carbonell, J.G. The exploding gradient problem demystified—Definition, prevalence, impact, origin, tradeoffs, and solutions. arXiv 2018, arXiv:1712.05577. [Google Scholar]
  23. Zhao, Z.; Ding, X.; Prakash, B.A. PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks. arXiv 2024, arXiv:2307.11833. [Google Scholar]
  24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
  25. Guo, T.; Lin, T.; Antulov-Fantulin, N. Exploring Interpretable LSTM Neural Networks over Multi-Variable Data. arXiv 2019, arXiv:1905.12034. [Google Scholar]
  26. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
  27. Liu, Z.; Zhu, Z.; Gao, J.; Xu, C. Forecast Methods for Time Series Data: A Survey. IEEE Access 2021, 9, 91896–91912. [Google Scholar] [CrossRef]
  28. Smedt, J.D.; Yeshchenko, A.; Polyvyanyy, A.; Weerdt, J.D.; Mendling, J. Process Model Forecasting Using Time Series Analysis of Event Sequence Data. arXiv 2021, arXiv:2105.01092. [Google Scholar]
  29. Yu, X.; Ren, Z.; Liu, P.; Imsland, L.; Georges, L. Comparison of time-invariant and adaptive linear grey-box models for model predictive control of residential buildings. Build. Environ. 2024, 254, 111391. [Google Scholar] [CrossRef]
  30. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. arXiv 2018, arXiv:1808.01974. [Google Scholar]
  31. Branco, N.W.; Cavalca, M.S.M.; Stefenon, S.F.; Leithardt, V.R.Q. Wavelet LSTM for Fault Forecasting in Electrical Power Grids. Sensors 2022, 22, 8323. [Google Scholar] [CrossRef]
  32. Zhang, W.; Yang, D.; Cheung, C.Y.; Chen, H. Frequency-Aware Inverse-Consistent Deep Learning for OCT-Angiogram Super-Resolution. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2022, Singapore, 18–22 September 2022; Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S., Eds.; Springer: Cham, Switzerland, 2022; pp. 645–655. [Google Scholar]
  33. Cao, D.; Huang, J.; Zhang, X.; Liu, X. FTCLNet: Convolutional LSTM with Fourier Transform for Vulnerability Detection. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; pp. 539–546. [Google Scholar] [CrossRef]
  34. Ishii, T.; Ueda, R.; Miyao, Y. Empirical Analysis of the Inductive Bias of Recurrent Neural Networks by Discrete Fourier Transform of Output Sequences. arXiv 2023, arXiv:2305.09178. [Google Scholar]
  35. Nguyen, A.; Chatterjee, S.; Weinzierl, S.; Schwinn, L.; Matzner, M.; Eskofier, B. Time Matters: Time-Aware LSTMs for Predictive Business Process Monitoring. arXiv 2020, arXiv:2010.00889. [Google Scholar]
  36. Baytas, I.M.; Xiao, C.; Zhang, X.; Wang, F.; Jain, A.K.; Zhou, J. Patient Subtyping via Time-Aware LSTM Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, New York, NY, USA, 13–17 August 2017; pp. 65–74. [Google Scholar] [CrossRef]
  37. Fuengfusin, N.; Tamukoh, H. Network with Sub-networks: Layer-wise Detachable Neural Network. J. Robot. Netw. Artif. Life 2020, 7, 240. [Google Scholar] [CrossRef]
  38. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
  39. Yang, Z.; Li, L.; Xu, X.; Kailkhura, B.; Xie, T.; Li, B. On the Certified Robustness for Ensemble Models and Beyond. arXiv 2022, arXiv:2107.10873. [Google Scholar]
  40. Zhong, Y.; Ta, Q.T.; Luo, T.; Zhang, F.; Khoo, S.C. Scalable and Modular Robustness Analysis of Deep Neural Networks. arXiv 2021, arXiv:2108.11651. [Google Scholar]
  41. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  42. Kortvelesy, R.; Prorok, A. ModGNN: Expert Policy Approximation in Multi-Agent Systems with a Modular Graph Neural Network Architecture. arXiv 2023, arXiv:2103.13446. [Google Scholar]
  43. Andreas, J.; Rohrbach, M.; Darrell, T.; Klein, D. Neural Module Networks. arXiv 2017, arXiv:1511.02799. [Google Scholar]
  44. Yang, R.; Singh, S.K.; Tavakkoli, M.; Amiri, N.; Yang, Y.; Karami, M.A.; Rai, R. CNN-LSTM deep learning architecture for computer vision-based modal frequency detection. Mech. Syst. Signal Process. 2020, 144, 106885. [Google Scholar] [CrossRef]
  45. Langston, J.; Ravindra, H.; Steurer, M.; Fikse, T.; Schegan, C.; Borraccini, J. Priority-Based Management of Energy Resources During Power-Constrained Operation of Shipboard Power System. In Proceedings of the 2021 IEEE Electric Ship Technologies Symposium (ESTS), Arlington, VA, USA, 3–6 August 2021; pp. 1–9. [Google Scholar] [CrossRef]
  46. Faxas-Guzmán, J.; García-Valverde, R.; Serrano-Luján, L.; Urbina, A. Priority load control algorithm for optimal energy management in stand-alone photovoltaic systems. Renew. Energy 2014, 68, 156–162. [Google Scholar] [CrossRef]
  47. Sebastian, E.; Duong, T.; Atanasov, N.; Montijano, E.; Sagues, C. Physics-Informed Multi-Agent Reinforcement Learning for Distributed Multi-Robot Problems. arXiv 2024, arXiv:2401.00212. [Google Scholar]
  48. Mowlavi, S.; Nabi, S. Optimal control of PDEs using physics-informed neural networks. arXiv 2022, arXiv:2111.09880. [Google Scholar] [CrossRef]
  49. Zheng, Y.; Hu, C.; Wang, X.; Wu, Z. Physics-informed recurrent neural network modeling for predictive control of nonlinear processes. J. Process Control 2023, 128, 103005. [Google Scholar] [CrossRef]
  50. Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. ETSformer: Exponential Smoothing Transformers for Time-series Forecasting. arXiv 2022, arXiv:2202.01381. [Google Scholar]
  51. Jawad, M.; Dhawale, C.; Ramli, A.A.B.; Mahdin, H. Adoption of knowledge-graph best development practices for scalable and optimized manufacturing processes. MethodsX 2023, 10, 102124. [Google Scholar] [CrossRef]
  52. Xiao, H.; Ordozgoiti, B.; Gionis, A. Searching for polarization in signed graphs: A local spectral approach. In Proceedings of the Web Conference 2020, WWW ’20, Taipei, Taiwan, 20–24 April 2020. [Google Scholar] [CrossRef]
  53. Augustine, M.T. A Survey on Universal Approximation Theorems. arXiv 2024, arXiv:2407.12895. [Google Scholar]
  54. Wang, S.; Li, Z.; Li, Q. Inverse Approximation Theory for Nonlinear Recurrent Neural Networks. arXiv 2024, arXiv:2305.19190. [Google Scholar]
  55. Zhu, R.; Lin, B.; Tang, H. Bounding The Number of Linear Regions in Local Area for Neural Networks with ReLU Activations. arXiv 2020, arXiv:2007.06803. [Google Scholar]
  56. Yang, Y.; Wang, T.; Woolard, J.P.; Xiang, W. Guaranteed approximation error estimation of neural networks and model modification. Neural Netw. 2022, 151, 61–69. [Google Scholar] [CrossRef] [PubMed]
  57. Oukhouya, H.; El Himdi, K. Comparing Machine Learning Methods—SVR, XGBoost, LSTM, and MLP— For Forecasting the Moroccan Stock Market. Comput. Sci. Math. Forum 2023, 7, 39. [Google Scholar] [CrossRef]
  58. Niu, K.; Zhou, M.; Abdallah, C.T.; Hayajneh, M. Deep transfer learning for system identification using long short-term memory neural networks. arXiv 2022, arXiv:2204.03125. [Google Scholar]
  59. Qin, J.; Yu, C.B. Exponential consensus of general linear multi-agent systems under directed dynamic topology. Automatica 2014, 50, 2327–2333. [Google Scholar] [CrossRef]
  60. Andrychowicz, M.; Denil, M.; Gomez, S.; Hoffman, M.W.; Pfau, D.; Schaul, T.; Shillingford, B.; de Freitas, N. Learning to learn by gradient descent by gradient descent. arXiv 2016, arXiv:1606.04474. [Google Scholar]
  61. Shchyrba, D.; Paniczek, I. Adaptively Learning Memory Incorporating PSO. arXiv 2024, arXiv:2402.11679. [Google Scholar]
  62. Jiang, C.; Huang, X.; Guo, Y. End-to-end decentralized formation control using a graph neural network-based learning method. Front. Robot. AI 2023, 10, 1285412. [Google Scholar] [CrossRef] [PubMed]
  63. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Architecture of PHIMEC.
Figure 2. Architecture of PHIMEC.
Figure 3. Architecture of DIMAS.
Figure 4. Relative improvement visualization: non-recurrent and recurrent versions, respectively.
Figure 5. Relative improvement visualization: forecasting and optimization tasks correspondingly.
Figure 6. Relative improvement visualization: 5 percent and 10 percent tolerance rates.
Table 1. Results for non-recurrent variants.
Noise Percentage | PHIMEC | DDPG | SAC | TD3
0.01 | 20.5 | 19.8 | 19.7 | 19.9
0.05 | 20.2 | 19.3 | 19.2 | 19.2
0.10 | 19.2 | 18.4 | 18.4 | 18.5
0.15 | 18.6 | 17.8 | 19.2 | 18.1
Table 2. Results for the recurrent variants (R-PHIMEC).
Noise Percentage | R-PHIMEC | RDPG | R-SAC | RTD3
0.01 | 21.3 | 20.6 | 20.3 | 20.1
0.05 | 20.8 | 19.5 | 19.7 | 19.6
0.10 | 20.2 | 18.8 | 19.2 | 19.0
0.15 | 18.6 | 18.0 | 18.9 | 18.7
Table 3. List of evaluation use cases.
Task | Traits | Reference
UCI Electricity Load Diagrams Dataset | High spectral flatness | F1
UCI PEM-SF Traffic Dataset | Sharp peaks | F2
UCI Air Quality Dataset | Generic benchmark | F3
Electricity Transformer Temperature | Spectral flatness and long-term dependencies | F4
Cifar-10 | Convolutional network | T1
MNIST with RELU | Unfamiliar activation function | T2
LSTM for two-tank cascade system identification | Spurious valleys | T3
Random quadratic functions | Simple task | T4
Table 4. Results on the forecasting tasks.
Task | T-LSTM | T-RNN | LSTM
F1 | 25 | 27 | 38
F2 | 24 | 23 | 31
F3 | 25 | 26 | 33
F4 | 31 | 30 | 42
Table 5. Results on the optimization tasks.
Task | T-LSTM | T-RNN | LSTM
T1 | 5 | 2 | −2
T2 | 3 | 2 | −1
T3 | 3 | 7 | 3
T4 | 4 | 8 | 4
Table 6. Input and output sizes of the respective layers of our controller.
Layer | Input | Output
IMPULSTM | 128 | 96
PHIMEC 1 | 256 | 128
PHIMEC 2 | 256 | 2
Table 7. Success rate for MP-GNN with a 5 percent tolerance rate.
Number of Robots | 0 | 1 | 2 | 3 | 4 | 5
4 | 100 | 91 | 83 | 67 | N/A | N/A
5 | 97 | 92 | 79 | 71 | 65 | N/A
6 | 93 | 84 | 71 | 67 | 53 | 47
7 | 87 | 85 | 68 | 61 | 49 | 44
8 | 73 | 77 | 60 | 54 | 38 | 40
9 | 66 | 60 | 52 | 47 | 37 | 35
Table 8. Success rate for DIMAS with a 5 percent tolerance rate.
Number of Robots | 0 | 1 | 2 | 3 | 4 | 5
4 | 100.0 | 94.0 | 88.0 | 74.0 | N/A | N/A
5 | 96.0 | 94.0 | 83.0 | 77.0 | 72.0 | N/A
6 | 91.0 | 84.0 | 74.0 | 71.0 | 58.0 | 53.0
7 | 84.0 | 84.0 | 69.0 | 64.0 | 53.0 | 49.0
8 | 69.0 | 75.0 | 60.0 | 56.0 | 41.0 | 44.0
9 | 62.0 | 58.0 | 51.0 | 48.0 | 39.0 | 38.0
Table 9. Success rate for MP-GNN with a 10 percent tolerance rate.
Number of Robots | 0 | 1 | 2 | 3 | 4 | 5
4 | 100 | 100 | 92 | 74 | N/A | N/A
5 | 100 | 100 | 87 | 78 | 72 | N/A
6 | 99 | 93 | 78 | 74 | 59 | 52
7 | 96 | 94 | 75 | 68 | 54 | 49
8 | 81 | 85 | 66 | 60 | 42 | 44
9 | 73 | 66 | 58 | 52 | 41 | 39
Table 10. Adjusted success rate for DIMAS with a 10 percent tolerance rate.
Number of Robots | 0 | 1 | 2 | 3 | 4 | 5
4 | 100.0 | 103.0 | 98.0 | 81.0 | 0.0 | 0.0
5 | 99.0 | 102.0 | 91.0 | 84.0 | 80.0 | 0.0
6 | 97.0 | 93.0 | 81.0 | 79.0 | 65.0 | 59.0
7 | 92.0 | 93.0 | 76.0 | 71.0 | 58.0 | 54.0
8 | 77.0 | 83.0 | 66.0 | 62.0 | 45.0 | 48.0
9 | 68.0 | 63.0 | 57.0 | 53.0 | 43.0 | 42.0
