1. Introduction
Industry has entered a period of profound change in which digital technologies are a key enabler for designing the cyber–physical production systems of the future. The convergence of the virtual world of the Internet and information technology (IT) with the real world of industrial installations and operational technology (OT) is the central challenge for the Factory of the Future (Industry 4.0). Modern information and communication technologies appear to be a way to increase productivity, quality and flexibility within industry. The challenges are numerous: connectivity and interoperability, digital twins, decentralization, human-centered automation, and more. These systems are expected to enable new automation paradigms and to improve plant operations in terms of increased facility effectiveness. However, this can be accomplished only if innovative methodologies and tools are made available to the control engineer. One of them concerns program verification tools for programmable logic controllers (PLCs). PLCs, with their sequential and cyclic operation specified by the IEC 61131 standards, are widely used today for the control of automated installations [1]. Article [2] shows that the programming model for PLCs needs to evolve to support the increasing complexity and more extensive network integration demanded by Industry 4.0, but that PLCs will remain key components of Industry 4.0. Furthermore, refs. [3,4,5] give a deep description of risk management in a factory where PLCs work together. In previous works, we proposed original methodological approaches to bridge the gap between formal approaches in academia and industrial applications by proposing an original workflow for automation studies [6,7]. This workflow is based on an automatic generation process of deliverables (PLC programs, recipe book...). However, the problem of PLC code verification remains a scientific and technical issue, and the work in this paper is thus complementary to the abovementioned approaches. The validation stage is generally based on factory acceptance testing (FAT) carried out in the supplier's test facilities and site acceptance testing (SAT) carried out at the customer's site. The commissioning of controllers remains a critical step today and for Industry 4.0, as it is time-consuming and stressful for the staff, and correcting errors at this stage of a project is particularly costly. Verification and "virtual commissioning" of installations are now scientific and methodological obstacles on the road to Industry 4.0. Today, automation engineers use few or no code verification tools from the academic world, formal or not, to facilitate the validation and commissioning stage [8]. The reason these practices are not used in the industrial world is mainly related to their usability: they are considered too complex, given the models, formalisms and methods involved, and often disconnected from the real needs of automation specialists.
Our goal is to propose a set of methods for checking properties common to every correctly programmed PLC code during development and prior to the implementation of the facility. In order to automatically find faults before implementation, we restrict ourselves to indicating the presence of certain errors that will almost certainly induce problems during code execution. Errors are sought on the variables that compose the code, which avoids the demanding task of formalising the code. We focus on extracting accessible states, non-accessible states, deadlocks, and fugitive states from the structured text (ST), a language also defined in IEC 61131-3 and implemented on PLCs. Usually, a PLC program is a cyclic sequence that includes only a few variables in deadlock, which makes their systematic search relevant. The same holds for fugitive states: we expect a PLC to react to external events, whereas a fugitive state is a change in the variables of the PLC without any external event, which can trigger fugitive actions that are, most of the time, a problem. From the accessible and non-accessible states, a competition matrix can be produced that gives further information about the code. To illustrate the relevance of the methods developed, we apply them to different examples in order to test the limits of the considered approaches.
Software exists that can perform a formal proof of the code; examples include UPPAAL [9], NuSMV [10] or Supremica [11]. However, it is necessary to convert the code implemented on the PLC to obtain an overview of the reliability of the implemented code, as noted in [12,13,14]. To remove this source of error, we propose a method similar to [15] or [16], which involves verifying properties from a specification. To the best of our knowledge, no other study examines this specific application context. This method is much more computationally expensive. The error search performed by UPPAAL, as noted in [17], keeps the information necessary for its completion in RAM. The amount of information needed to ensure the reliability of a code can be very large, so an effective method is required for storing and retrieving this information once generated. To solve this problem, we propose organizing the information in a database by converting the logical code into a relational database, similar to [18,19,20,21]. This optimizes the storage space and allows fast retrieval of information. We also propose searching for errors in the PLC code through queries on the database: the error search is no longer an exploration of a graph in a PLC but a query on a database. This method is particularly time-consuming because the information is stored on a computer's hard drive. Nevertheless, this search for errors can be distributed among several computers following the method detailed in [22], which uses fast communication between FPGAs (field-programmable gate arrays). However, such a precise study of complex code is often not necessary during code development, where it is more convenient to obtain information quickly. Text understanding using neural networks is a fast-growing field. Although very standardized, this operation is not simple: unlike functions to be reproduced or images to be analyzed, the texts we want to analyze are not all the same size, and there is no simple way to force a text into a fixed size as there is for an image. It is thus necessary to analyze a text as a sequence of words; for that, we can use recurrent neural networks as detailed in [23,24]. Neural networks composed of layers of long short-term memory (LSTM) cells, as noted in [25], are well suited to this task, as shown by [26]. It is therefore natural to try to find properties of computer program codes with LSTMs, as noted in [27,28]. By analyzing the code with a recurrent neural network, we can obtain results similar to those of the formal methods.
The rest of this paper is organized as follows. Section 2 presents the framework of the methods we use as well as some notations for the codes used as examples. Section 3 shows how the ST code of a PLC can be used to generate a graph in the UPPAAL format; this method is efficient, but only small verifications can be carried out because UPPAAL performs its search without persistent storage. This issue is addressed in Section 4, where we explain how a database can be used to store and retrieve information in order to check larger examples. Section 5 explains how to go even further by using recurrent neural networks that read the code to find errors without exploring the marking graph. The conclusion of the paper is given in Section 6.
5. Code Analysis by a Recursive Neural Network
5.1. Principle
In this section, we propose an exploratory approach based on recurrent neural networks for the automatic verification of PLC code, the idea being to browse only the code rather than the graph of reachable markings. The difficulty of implementing this method lies in the generation of a large database of checked model codes, which must be representative of those to which we wish to apply the neural network. We limit our experiments to codes with few variables, represented by a list of Boolean equations, so that we can validate the use of neural networks in a case where the generation of the database is not restrictive.
In order to process the codes with neural networks, we first need to convert them into a sequence of number vectors. We chose to put the codes in an elementary form by decomposing all the Boolean equations into a sequence of simple operations, so that each line readable by the neural network is a single assignment of one variable from an expression containing at most one logical operator. Such lines can be described by a regular expression in which the variable names, the alternation operator | and the operator ? (denoting the presence of zero or one occurrence of the preceding expression) appear.
An example of code corresponding to this description is used in Figure 10. This format allows us to convert each line of the elementary code into a vector of fixed size determined by n_i and n_o, with n_i and n_o denoting, respectively, the numbers of input and output variables of the codes. Line-to-vector conversions are given in Table 3. We consider the internal variables as output variables because they play the same role in the construction of the graph of reachable markings. The first n_o coordinates of the vector encode the output variable to which a new value will be assigned: they are all zero except for the coordinate corresponding to the variable number. The next coordinates correspond to the variables appearing on the right-hand side of the assignment, and the last three encode the logical operator of the line.
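As an illustration of this line-to-vector conversion, we give a minimal sketch in Python below; the one-hot/multi-hot layout and the operator set {AND, OR, NOT} are assumptions made for illustration and may not match the exact layout of Table 3.

```python
import numpy as np

# Illustrative operator set; the real encoding may use a different one.
OPERATORS = ["AND", "OR", "NOT"]

def encode_line(line, in_vars, out_vars):
    """Convert one elementary ST line, e.g. 'y1 := x0 AND y2;', into a number vector:
    one-hot over the assigned output variable, multi-hot over the operand variables,
    and one-hot over the logical operator of the line."""
    all_vars = in_vars + out_vars                       # operands may be inputs or outputs
    target, expr = [s.strip() for s in line.rstrip(";").split(":=")]
    vec = np.zeros(len(out_vars) + len(all_vars) + len(OPERATORS))
    vec[out_vars.index(target)] = 1.0                   # assigned output variable
    for tok in expr.split():
        if tok in OPERATORS:                            # logical operator of the line
            vec[len(out_vars) + len(all_vars) + OPERATORS.index(tok)] = 1.0
        else:                                           # operand variable
            vec[len(out_vars) + all_vars.index(tok)] = 1.0
    return vec

# Example: encode_line("y1 := x0 AND y2;", ["x0", "x1"], ["y0", "y1", "y2"])
```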
Now that we have a method to transform our ST codes into a sequence of number vectors, we can build binary classifiers based on a sequential pass over the code using recurrent neural networks, and more particularly LSTM networks [25], which are widely used for sequential data analysis. Examples of their use for source code processing problems can be found in the scientific literature, for example in [23,27,28] for the detection of different types of programming errors. We jointly train an LSTM network and a multilayer perceptron to classify ST codes. The vectors corresponding to each line of code are given sequentially as input to the LSTM cell to update its memory vectors, and the last short-term memory is then decoded by the multilayer perceptron to create the prediction vector, whose components between zero and one can be interpreted as probabilities that the corresponding variable is involved in a fugitivity or deadlock problem in the ST code (Figure 11). During training, the parameters of the LSTM network and the perceptron are optimized to minimize the cross-entropy between the objective and the prediction, which is the error function commonly used for binary classification problems. We could also train another LSTM network to classify the codes but this time browsing them in the opposite direction. This would give more importance to the first lines of the codes, which would be the last ones analyzed by the LSTM network and would thus have the most influence on the output memory at the beginning of training. This is the idea behind the classical bi-LSTM architectures (bidirectional LSTM networks), in which a forward LSTM and a backward LSTM browse the code independently and the multilayer perceptron (MLP) performing the classification takes as input the concatenation of the final memories of these two LSTMs.
The LSTM networks browse the code sequentially; they are given the input lines in the order of their appearance in the code. This way of browsing the code is not optimal, because a code can contain independent blocks of instructions whose order can be changed without affecting the output of the algorithm. The idea of our approach is to inscribe this invariance by permutation directly in the structure of the recurrent network. The idea of constructing LSTM networks that traverse a text in a non-sequential way is not new: in [26], LSTM networks based on the traversal of a tree rather than a sequence are presented. In our case, however, the structure adapted to code traversal is not a tree but a directed graph.

We can summarize the dependencies of the lines on one another in a directed graph that we call the causality graph. Each node of this graph corresponds to a line of code, and there is an edge from one node to another if and only if the line corresponding to the arrival node involves a variable whose last assignment took place on the line corresponding to the departure node. We add a start node and an end node to this graph: the start node is linked to the nodes having no other antecedents in the graph (which correspond to the lines of code that can be executed first because they do not depend on any other), and, analogously, the end node has as antecedents all the nodes having no other successors (i.e., the nodes corresponding to the lines that can be executed last). To illustrate this, we use an example of arbitrary code and the associated causality graph (Figure 10). The start and end nodes are numbered 0 and −1, respectively. Line 9 uses a variable that is modified for the last time in Line 2; it could therefore be executed at any position in the code as long as it is executed after Line 2 and before Line 10, since the latter depends directly on it. Similarly, Line 3 must be executed after Line 2, which itself must be executed after Line 1, but Lines 4, 5, 9 and 10 are not in the same branch as Line 3 and can be executed either before or after it. There is thus a multitude of codes associated with the same causality graph (up to the numbering of the nodes).
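A minimal sketch of how such a causality graph can be built is given below, assuming each elementary line is represented by its assigned variable and the list of variables it reads (only read-after-write dependencies are considered).

```python
def causality_graph(lines):
    """lines: list of (target, operands) pairs, one per elementary code line.
    Returns a dict mapping each node to the list of its successors; node 0 is the
    start node, node -1 the end node, node i the i-th line (1-based numbering)."""
    edges = {0: [], -1: []}
    last_assign = {}                            # variable -> line of its last assignment
    for i, (target, operands) in enumerate(lines, start=1):
        edges[i] = []
        preds = {last_assign[v] for v in operands if v in last_assign}
        for p in preds or {0}:                  # lines with no antecedent hang off the start node
            edges[p].append(i)
        last_assign[target] = i
    for i in range(1, len(lines) + 1):
        if not edges[i]:                        # lines with no successor lead to the end node
            edges[i].append(-1)
    return edges

# Example: causality_graph([("y0", ["x0"]), ("y1", ["y0", "x1"]), ("y2", ["x0"])])
# -> {0: [1, 3], -1: [], 1: [2], 2: [-1], 3: [-1]}
```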
We define a DGLSTM (DiGraph long short-term memory) network as a network composed of an LSTM cell that is recurrently evaluated by following a given directed graph, which must satisfy the following properties:
It must have n + 2 nodes if the input sequences are of size n (one node per line, plus the start and end nodes);
It must have a single node without antecedents as well as a single node without successors;
It must not contain any directed cycle.
When a node has several antecedents in the graph, the memory used by the LSTM cell when evaluating the corresponding input is the average of the memories computed at the output of the LSTM cell when evaluating the inputs corresponding to the antecedents of that node. The multilayer perceptron trained jointly with the DGLSTM network to make the prediction takes as input the average of the memories entering the end node. At each evaluation of a DGLSTM network, we must therefore provide not only the input sequence but also the causality graph of the corresponding code (Figure 12). In a similar way to bi-LSTM, we can define bi-DGLSTM architectures, for which the causality graph is browsed in both directions independently by two DGLSTM networks.
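A minimal sketch of this evaluation scheme is given below, assuming the line nodes are visited in a topological order of the causality graph and that both LSTM memories (short- and long-term) are averaged over a node's antecedents; the class and argument names are illustrative, not those of our implementation.

```python
import torch
import torch.nn as nn

class DGLSTM(nn.Module):
    """Sketch: evaluates a single LSTM cell along a directed acyclic causality graph."""
    def __init__(self, line_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(line_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def forward(self, lines, topo_order, preds):
        """lines: tensor (n_lines, line_dim); topo_order: line indices (1-based) in a
        topological order; preds[i]: antecedent line indices of line i (0 = start node)."""
        h = {0: torch.zeros(1, self.hidden_dim)}   # the start node carries zero memories
        c = {0: torch.zeros(1, self.hidden_dim)}
        for i in topo_order:
            ants = preds[i] or [0]
            h_in = torch.mean(torch.stack([h[a] for a in ants]), dim=0)
            c_in = torch.mean(torch.stack([c[a] for a in ants]), dim=0)
            h[i], c[i] = self.cell(lines[i - 1].unsqueeze(0), (h_in, c_in))
        # the classifier input is the average of the memories entering the end node,
        # i.e., of the memories of the nodes that precede no other line
        finals = [h[i] for i in topo_order if all(i not in preds[j] for j in topo_order)]
        return torch.mean(torch.stack(finals), dim=0)

# The topological order and the preds mapping can be derived from the causality graph
# built above; the returned memory is then decoded by the multilayer perceptron.
```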
5.2. Experiments
We first generated a large number of random codes, for which we set the maximum numbers of input and output variables to 3 and 10, respectively. The lines of the ST codes are generated randomly and independently; there are between 5 and 15 per code. We used 250,000 codes for training and 50,000 codes for the validation at each epoch. We used 50,000 other codes that had never been presented to the different neural networks to perform our statistical analysis. The fact that the lines are generated independently of each other means that there are many unnecessary lines of code: for example, it is common to assign a new value to a variable without having used it since its last assignment, but there are also many less trivial cases of unnecessary lines that can hardly be corrected. We do not really know to what extent the code used in practice on PLCs may contain unnecessary lines or unnecessarily complex portions of code. In any case, we have chosen to keep the generation of codes by independent lines for the moment, hoping that the "realistic" codes are not under-represented in the set of codes that can be generated. The ideal solution would be to simplify the codes before they are processed by the LSTM or DGLSTM networks, but this would require considerable work. This theme of code simplification has been evoked, notably in [20].
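A minimal sketch of this independent, line-by-line generation is given below; the variable naming scheme and the operator set are illustrative assumptions rather than our exact generator.

```python
import random

OPERATORS = ["AND", "OR", "NOT"]      # illustrative operator set

def random_code(n_in=3, n_out=10, min_lines=5, max_lines=15):
    """Generate one random elementary ST code as a list of text lines,
    each line being drawn independently of the others."""
    in_vars = [f"x{i}" for i in range(random.randint(1, n_in))]
    out_vars = [f"y{i}" for i in range(random.randint(1, n_out))]
    variables = in_vars + out_vars
    lines = []
    for _ in range(random.randint(min_lines, max_lines)):
        target = random.choice(out_vars)
        op = random.choice(OPERATORS + [None])        # None: plain copy of a variable
        if op == "NOT":
            rhs = f"NOT {random.choice(variables)}"
        elif op is None:
            rhs = random.choice(variables)
        else:
            a, b = random.choices(variables, k=2)
            rhs = f"{a} {op} {b}"
        lines.append(f"{target} := {rhs};")
    return lines
```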
We trained five different types of models:
LSTM networks based on a word encoding;
LSTM networks based on a line encoding;
Bi-LSTM networks based on a line encoding;
DGLSTM networks based on a line encoding;
Bi-DGLSTM networks based on a line encoding.
All models were trained on the same amount of data and with batches of 128 codes. We used the Adam algorithm [31] for training; RMSprop and SGD were tested without much success. We also used a strategy to reduce the learning rate when the error stopped decreasing, and we stopped training when the average error on the validation data stopped decreasing due to overfitting. We used a Bayesian optimization algorithm from the optuna library to find the best set of hyperparameters for each of the five types of models (a sketch of such a search is given after the following list). The error on the validation data at the end of training was the criterion to minimize. The hyperparameters in question are the same for all five types of models:
Size of the memory of the LSTM cells;
Number of LSTM cells composed during the evaluation of a single input vector;
Number of layers and number of neurons of the multilayer perceptron;
Dropout rate used in the dropout layers.
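A minimal sketch of such a search with optuna is given below; the parameter ranges are illustrative, and build_model and train_and_validate stand for hypothetical helpers wrapping the model construction and the training loop described above.

```python
import optuna

def objective(trial):
    # Hypothetical search space mirroring the hyperparameters listed above.
    hidden_dim = trial.suggest_int("hidden_dim", 16, 256, log=True)
    n_lstm_layers = trial.suggest_int("n_lstm_layers", 1, 4)
    n_mlp_layers = trial.suggest_int("n_mlp_layers", 1, 3)
    mlp_width = trial.suggest_int("mlp_width", 16, 256, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)

    # build_model / train_and_validate are hypothetical helpers: the latter returns
    # the final cross-entropy on the validation codes, which is the value to minimize.
    model = build_model(hidden_dim, n_lstm_layers, n_mlp_layers, mlp_width, dropout)
    return train_and_validate(model)

# optuna's default TPE sampler performs the Bayesian-style optimization.
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```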
The error on the validation base decreases faster for DGLSTM and bi-DGLSTM than for the other models: a few epochs are usually enough to reach a plateau in the error, whereas several tens are needed for the others. This plateau generally corresponds to a lower error than for the other models. For both (bi-)LSTMs and (bi-)DGLSTMs, the deepest networks give the best results, but at the cost of a more difficult training that seems very sensitive to the initialization weights. Sometimes the error does not decrease at all even after a large number of epochs, whereas it sometimes decreases from the first epochs for identical hyperparameters. Thus, the best results we obtained partly result from a fortunate initialization.
In addition to the cross-entropy, four other error measures are defined which are only used to analyze the results: the percentages of false positives and false negatives for deadlock and for fugitivity (Figure 13). It seems that deadlock is more difficult to predict than fugitivity. We ran the hyperparameter optimization routine for the same amount of time for all five model types; the models with the slowest evaluation could therefore be trained a smaller number of times than the others. The (bi-)DGLSTM networks take by far the longest to evaluate and are therefore the most disadvantaged by this choice, which does not prevent them from obtaining the best results, thus demonstrating the relevance of the non-sequential browsing of the ST codes.
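A minimal sketch of these four measures, assuming a 0.5 decision threshold and prediction/target arrays whose first half of columns concerns deadlock and second half fugitivity, is given below.

```python
import numpy as np

def error_rates(pred, target, threshold=0.5):
    """pred, target: arrays of shape (n_codes, 2 * n_vars); the first n_vars columns
    concern deadlock, the last n_vars fugitivity (an illustrative convention).
    Returns false-positive and false-negative percentages for each property."""
    binary = pred >= threshold
    target = target.astype(bool)
    half = pred.shape[1] // 2
    rates = {}
    for name, sl in (("deadlock", slice(0, half)), ("fugitivity", slice(half, None))):
        p, t = binary[:, sl], target[:, sl]
        # standard rate normalization (assumed): FP among negatives, FN among positives
        rates[name + "_false_positives"] = 100.0 * np.sum(p & ~t) / max(np.sum(~t), 1)
        rates[name + "_false_negatives"] = 100.0 * np.sum(~p & t) / max(np.sum(t), 1)
    return rates
```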
We have demonstrated the ability of neural networks to make correct predictions for the majority of randomly generated codes, but this does not guarantee that these predictions will always be mostly correct for real codes used in practice. We therefore evaluated our best bi-DGLSTM network on the three-lamp example (Table 4). Note that the criterion for selecting the best network is the prediction error on the validation base composed of random codes; the network was therefore not selected to be the best on this particular example, which would have led to artificially good results. The network succeeds almost perfectly in predicting fugitivity, with only one minor error for the second lamp. As observed on average over all random codes, the prediction of deadlock is much worse.
5.3. Discussion
The use of neural networks for model checking seems promising. A full-fledged study would be needed to improve the performance of the networks and to extend their use to codes with larger numbers of variables, which would require more attention to parsimonious database generation. In particular, we did not perform any data augmentation: from a single code verified with UPPAAL, it would have been possible to create many equivalent codes that do not necessarily have the same causality graphs, for example by permuting the names of some variables. Finally, it would be interesting to understand which codes cause problems for the neural networks. The use of the causality graph to browse the code in a non-sequential way seems relevant and is not restricted to the context of our study.
Table 5 presents, as a synthesis, the results obtained with the three methods proposed in the paper. From these first results, we can imagine the following PLC code testing methodology for the future. Once a compilable code is obtained, it is best to start with an algorithm based on syntactic analysis by neural networks, which provides a quick opinion on the provided code. This operation detects faults as soon as possible and allows the engineer to correct them before using more resource-intensive methods. Then, UPPAAL or the database approach, depending on the size of the code, can be used to certify whether the errors defined in this article are present in the code. The key point of our proposed method is to find simple errors in a code, which are often programming errors, without a priori knowledge. Thus, all the codes implemented in a factory should go through these verification steps to eliminate simple faults, which would considerably reduce the time needed to analyze the behavior of the PLC against the driven machine and its environment.
6. Conclusions
In this article, we have discussed several methods that could allow Factory 4.0 engineers to test generic properties of PLC programs in a simple way. To do this, we explored different methods that could be used together in the future for testing PLC programs of different sizes and complexities. Moreover, we believe that this work could be used as a basis for more complex database implementations on supercomputers and for training neural networks on a larger syntax. Indeed, the exploration of the accessible marking graph can be easily parallelized, and access to the database can be parallelized as well. Work on an implementation closer to the internal functioning of a database would certainly make it possible to treat much more complex cases in record time. Furthermore, neural networks are able to translate and summarize text thanks to attention mechanisms, which could certainly give excellent results on code verification. At the moment, our algorithm is not able to cover all the codes implemented on the various controllers used in industry: some complex semantics that induce a strong combinatorial explosion cannot be present in the verified codes. We hope to extend our work to the whole set of codes by implementing other ingenious solutions allowing us to treat the various combinatorial problems.
The final goal of this study is to support humans in designing reliable PLC code. In this first study, we did not submit our methods to practitioners unfamiliar with our research subject. One of the objectives of our future work will be to submit the proposed tools to personnel working on the realization of PLC code in an industrial environment.