Article

A Bayesian Argumentation Framework for Distributed Fault Diagnosis in Telecommunication Networks

by Álvaro Carrera 1,*,†, Eduardo Alonso 2 and Carlos A. Iglesias 1
1 Departamento de Ingeniería de Sistemas Telemáticos, Universidad Politécnica de Madrid, 28040 Madrid, Spain
2 Department of Computer Science, City University London, London EC1V 0HB, UK
* Author to whom correspondence should be addressed.
† Current address: Avenida Complutense, 30, 28040 Madrid, Spain.
Sensors 2019, 19(15), 3408; https://doi.org/10.3390/s19153408
Submission received: 27 June 2019 / Revised: 30 July 2019 / Accepted: 1 August 2019 / Published: 3 August 2019

Abstract:
Traditionally, fault diagnosis in telecommunication network management is carried out by humans who use software support systems. The phenomenal growth of telecommunication networks has nonetheless triggered interest in more autonomous approaches, capable of coping with emergent challenges such as the need to diagnose faults' root causes under uncertainty in geographically-distributed environments, with restrictions on data privacy. In this paper, we present a framework for distributed fault diagnosis under uncertainty based on an argumentation framework for multi-agent systems. In our approach, agents collaborate to reach conclusions by arguing in unpredictable scenarios. The observations collected from the network are used to infer possible fault root causes using Bayesian networks as causal models for the diagnosis process. Hypotheses about those fault root causes are discussed by agents in an argumentative dialogue to achieve a reliable conclusion. During that dialogue, agents handle the uncertainty of the diagnosis process while preserving data privacy among them. The proposed approach is compared against existing alternatives using benchmark multi-domain datasets. Moreover, we include data collected from a previous fault diagnosis system running in a telecommunication network for one and a half years. Results show that the proposed approach is suitable for the motivational scenario.

1. Introduction

Telecommunication companies have seen an exponential increase in their activity in the last few decades [1]. As a consequence, telecommunication networks have been continuously growing in size, heterogeneity and complexity. The current Internet is based on the premise of a simple network service used to interconnect end systems where relatively intelligent services are running. That simplicity has allowed massive growth of the network since the beginning of the primitive Internet [2]. However, the management approach followed by network operators in the current Internet is obstructing its evolution. Furthermore, network management is challenging for next-generation networks [3]. The future Internet needs to optimise the use of its resources continuously and recover from problems, faults or attacks transparently for the network operator and without any impact on the services running over it [4]. Thus, future networks need to be more intelligent and adaptive than the current ones, and their management systems need to be embedded in the network itself, instead of being external systems [5].
In those next-generation networks, many different actors will interact dynamically to offer reliable end-to-end services. That diversity of actors (users, sensors, devices or content providers) will make network operation and management very hard for the traditional network management approach [6]. Services deployed on top of those next-generation networks will be considered one of those actors and will have to cooperate autonomously with other actors to obtain the expected result [7]. That dynamic, autonomous and complex cooperation among actors is a crucial requirement for flexible and efficient networks [8]. Moreover, the complexity of the future Internet will bring a high level of uncertainty to management tasks [9]. However, that uncertainty is not only an issue for the future Internet; the current Internet already deals with it, as shown by Clark et al. [10]. They estimated that the current Internet is over-dimensioned by a factor of 400% to ensure its performance under almost any conditions. This means that the strategy of the current Internet is to over-size the network to ensure its availability. However, maintaining this strategy in the future Internet would be very inefficient and costly. Indeed, the cost of network management and support has increased drastically in recent years, amounting to around 200 billion dollars, due to the complexity of network technologies that require more highly-skilled engineers and administrators [11]. Therefore, managing the uncertainty that comes with complex networks is an essential requirement for any management system of the future Internet [12].
To deal with that complexity, autonomic approaches have been proposed both for computing [13] and networking [14,15,16]. This trend tries to achieve self-management capabilities with a Monitor-Analyse-Plan-Execute (MAPE) control loop [13] implemented by autonomic managers. Those autonomic managers must perform different management tasks, such as self-configuring, self-healing, self-optimising and self-protecting. However, a single isolated autonomic manager can achieve autonomic behaviour only for the resources it manages, which can lead to scalability problems. To avoid those problems, many managers must be coordinated to obtain global autonomic management of the network. That coordination is another critical challenge for a real autonomic network management approach.
Summarising, autonomic approaches require innovative aspects and mechanisms to enable the desired self-capabilities to govern an integrated behaviour of the future Internet [17]. Those mechanisms are based on the usage of the specific domain knowledge of network engineering, taking into consideration the dynamism and complexity of the supervised systems. The European Telecommunications Standards Institute (ETSI) supports this autonomic approach with a generic reference model for autonomic networking named Generic Autonomic Network Architecture (GANA) [18], which defines a set of desired properties for those autonomic systems. Those desired properties are automation, awareness, adaptiveness, stability, scalability, robustness, security, switchability and federation. Achieving all of them in an autonomic management system is challenging for network operators following the traditional management approach. Thus, Laurent et al. [18] defined some enabling concepts and mechanisms for further research to achieve those desired properties in the management systems of the autonomic future Internet. This autonomic approach is supported by the Internet Research Task Force (IRTF). Behringer et al. [19] described the design goals of autonomic networking in the Request For Comments (RFC) 7575, and Jiang et al. [20] analysed the gap for autonomic networking by reviewing the current status of the autonomic aspects of current networks. Among others, they identified troubleshooting and recovery as some of the non-autonomic behaviours of the current Internet in RFC 7576, which motivated us to develop this work in the field of autonomic fault diagnosis of telecommunication networks.
In conclusion, due to the increasing complexity, heterogeneity and consequent high level of uncertainty in telecommunication networks, autonomic fault management is an exciting research field for network operator companies and research institutions [21,22,23]. Accordingly, our motivation when preparing this paper was to improve the current situation concerning the challenges mentioned about the autonomic future Internet. In particular, we are motivated by the fact that there is still a lack of solutions for autonomic fault diagnosis mechanisms.
In this paper, we present an innovative approach for distributed fault diagnosis based on a Multi-Agent System (MAS), which applies argumentation techniques to reach agreements among agents (i.e., autonomic managers). During the argumentation process, agents use Bayesian reasoning to handle the uncertainty inherent to fault diagnosis tasks of complex systems. Moreover, data privacy restrictions and their distributed nature are also considered, enabling the application of the proposed method in federated network domains.
Thus, the main contributions of the presented work are (i) an argumentation framework for fault diagnosis based on Bayesian reasoning and (ii) a coordination protocol to apply that framework in a Multi-Agent System (MAS) for distributed fault diagnosis in federated network domains. This work was based on Chapter 4 of the first author’s Ph.D. thesis [24].
The rest of this paper is structured as follows. Firstly, Section 2 presents some previous work, which has been used as the basis of the method presented in this paper, and discusses other related works in the research field of distributed fault diagnosis for network management. Section 3 presents the multi-agent architecture proposed to carry out the cross-domain diagnostic process. The proposed Bayesian Argumentation Framework (BAF) is defined in Section 4, and the protocol to apply it in an MAS is presented in Section 5. Next, in Section 6, experimental results are presented, validated and discussed. Finally, Section 7 presents some concluding remarks and a brief discussion on future work.

2. Background

The method proposed in this paper is based on a previous work, which consisted of an MAS for fault diagnosis deployed in a real telecommunication network [25]. In that MAS, the diagnosis process starts with a request made by a human operator, which offers the first evidence of a fault. Then, the system executes a set of tests to collect other relevant information from the network and other third-party systems. Finally, the system infers the most probable root cause of the fault and shows the result to network operators. The diagnosis system was evaluated in a real-life scenario for a specific service provided by Telefónica O2 Czech Republic. The system performance was measured with several Key Performance Indicators (KPIs), which show the acceptance of the diagnosis system by human operators and the reduction of the average incident solution time.
Following the success of the system initially applied for fault diagnosis of one specific service, the system was adopted and deployed in parallel for other services and networks. Then, network operators considered that it would be beneficial if those isolated diagnosis systems could share some relevant information and diagnose collaboratively to solve faults. That feature presents some challenges for the previous MAS architecture due to scalability issues. Specifically, those issues were focused on the reasoning technique applied for uncertainty management, which is one of the key features that made the system applicable to that real-life scenario. That reasoning process under uncertainty was performed with a Bayesian inference engine applying a causal model of faults, which relates the causes of faults and their symptoms. However, as that reasoning process was centralised in a single agent, the complexity of that Bayesian model increased when new systems and services had to be diagnosed by the MAS, which reduced the system's robustness and made the causal model more costly to maintain.
The Bayesian reasoning technique is widely used for fault diagnosis in the literature [26,27] for different kinds of networks, such as high-speed rail networks [28,29], wireless sensor networks [30,31], or optical networks [32,33]. Consequently, some distributed reasoning techniques were explored to solve those scalability issues. Our first attempt was to apply one of the existing techniques for distributed Bayesian inference [34], such as Distributed Perception Network (DPN) [35] or Multiple Sectioned Bayesian Network (MSBN) [36,37]. However, some requirements for applying those techniques in the considered scenario, i.e., a dynamic and complex environment such as a telecommunication network, made them not fully compatible with the deployed MAS architecture. Among other restrictions, the requirement of combining knowledge from different experts or domains is not directly covered by these techniques without generating one centralised model and splitting it later into partial models. This issue reduced the scalability of the final solution. Then, we explored other alternatives, which would offer an extra degree of flexibility and would be compatible with the desired features of the future Internet exposed by [18], paying particular attention to network federation, which would allow different autonomic managers to collaborate when required. In our case, we would like to have several agents able to perform a distributed diagnosis process handling the uncertainty inherent in any diagnostic task under conditions such as data privacy and access restrictions, critical aspects in federated domains. Thus, the main requirement for the desired technique is that the MAS must be able to perform distributed reasoning under uncertainty with access restrictions to some crucial information for the fault diagnosis process.
Therefore, we explored other possibilities to add the desired capability to the system. Concerning distributed classification tasks based on MAS, Modi and Shen [38] proposed an approach in which each agent only receives a subset of the attributes of the classification domain. That means each agent had a subset of attributes, and all agents knew all dataset instances during the training phase of the classifier. This feature could become a scalability issue when the number of instances increases. In contrast, in the PISA framework proposed by Wardeh et al. [39], data were instead distributed among the agents, i.e., each agent had its private local dataset. For our case study, we would need a combination of both approaches, adding flexibility to allow an agent to have any quantity or subset of variables (i.e., attributes) of the diagnosis causal model. On the one hand, the main issue in a multi-agent distributed task is not the algorithms themselves, but the most appropriate mechanism to allow agents to collaborate, as noted by Gorodetsky et al. [40]. In this respect, the argumentation technique applied in the PISA framework [39] provides a satisfactory collaboration mechanism. On the other hand, our diagnosis system deals with uncertain information and would need agents able to discuss those uncertain variables given a complete probability distribution over any unknown attribute of the diagnosis case using the soft-evidence technique [41]. The PISA framework uses absolute values for attributes, which would be an issue in complex and dynamic environments, such as our motivational scenario.
Exploring other argumentative techniques, we found that Dung [42] laid the basis of the mainstream contemporary work. On that basis, Bondarenko et al. [43] proposed the Assumption-based Argumentation Framework (AAF), which extends Dung's framework and offers more flexibility to generate and process arguments, including assumptions in the problem. However, these theoretical frameworks do not deal with uncertainty or probabilistic statements. In contrast, other recent works explore the application of probabilistic argumentation frameworks [44,45,46] for modelling uncertain logical arguments [47], extracting them from Bayesian networks [48,49]. With respect to the application of probabilistic reasoning in argumentation processes, we can find in the literature many works ranging from a mathematical point of view, such as the application of a probabilistic approach to modelling uncertain logical arguments [47,50] and direct translations from the Bayesian approach to argumentation [51], through a philosophical point of view, such as a Bayesian perspective on testimonies and arguments [52,53], to a legal point of view, such as an argumentation supporting tool [54]. Following these approaches, we propose to extend the previous diagnosis system [25], which applied Bayesian reasoning to infer fault root causes, with an argumentative capability. That capability allows agents to dialogue about diagnosis cases, keeping coherence in the distributed reasoning process under the restrictive conditions mentioned previously for federated domains, such as data privacy or access restrictions.

3. Multi-Agent Architecture for Distributed Fault Diagnosis

This section presents the proposed multi-agent architecture for fault diagnosis in federated domains. In this scenario, every agent manages a specific network domain and is responsible for monitoring and diagnosing its faults. However, in a federated scenario, some of those faults involve different domains. In those cases, agents use the argumentation framework, proposed in Section 4, to carry out a dialogue during the diagnosis process sharing the minimum required information to perform a distributed diagnosis process. Finally, to coordinate that argumentation, they apply the protocol proposed in Section 5.
In conclusion, we are considering a distributed environment where agents have their partial view of the global problem and cooperate using argumentation techniques to achieve reliable conclusions for a fault diagnosis task in a telecommunication network management scenario. This scenario is schematically depicted in Figure 1, where different network domains are federated to ensure that agents can carry out the inter-domain fault diagnosis process.
The proposed agent architecture, named the Bayesian Argumentative Agent, extends the agent architecture presented in previous works [24,25], labelled as Bayesian Agent in the figure. As the Bayesian Argumentation Framework and the Coordination Protocol are detailed in Section 4 and Section 5, respectively, the following paragraphs present a brief summary of the Bayesian Agent architecture.
The aim of the Bayesian Agent architecture is to monitor and diagnose a specific domain of a telecommunication network. These agents use Bayesian models [55], which provide the capability of modelling causal relations to represent possible faults with their symptoms. The agent collects data in real-time, performs a diagnostic process and offers as output a set of the most probable fault root causes with associated probabilities of occurrence. One of the main features of this agent architecture is the capacity to deal with the uncertainty of complex systems. Another exciting feature of those Bayesian models is that agents can learn by themselves from their experience or based on given knowledge using machine learning algorithms. Thus, every agent has its own Bayesian Network (BN), which synthesises its knowledge to infer possible fault root causes based on variables observed from the environment, in our case a telecommunication network. A BN is a model that represents the relations among the variables involved in the diagnosis process and the possible root causes via a Directed Acyclic Graph (DAG). Therefore, the diagnosis problem domain is described by a set of variables, each with a set of states in which it can be, and is represented as a causal model; in our case, a Bayesian network. The agent applies this model to discriminate the most probable fault hypotheses and offers them as the conclusion of the fault diagnosis process.
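To illustrate this reasoning step, the following minimal sketch builds a toy two-variable causal model and infers the probability of a fault root cause given an observed symptom. It uses the pgmpy library purely for illustration (the deployed system relies on the SMILE engine), and the variable names are hypothetical.

```python
# Minimal sketch of a Bayesian Agent's causal reasoning step on a toy model.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Causal model: a fault root cause ("link_down") causes a symptom ("packet_loss").
model = BayesianNetwork([("link_down", "packet_loss")])
model.add_cpds(
    TabularCPD("link_down", 2, [[0.95], [0.05]]),            # prior P(link_down)
    TabularCPD("packet_loss", 2,
               [[0.9, 0.1],                                   # P(packet_loss | link_down)
                [0.1, 0.9]],
               evidence=["link_down"], evidence_card=[2]),
)

# An observation collected from the network becomes evidence for the inference.
engine = VariableElimination(model)
posterior = engine.query(variables=["link_down"], evidence={"packet_loss": 1})
print(posterior)  # probability of the fault root cause given the observed symptom
```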
In this work, we are dealing with a distributed scenario with federated domains. We consider that agents have a partial perception of the environment, in this case, a portion of the telecommunication network. That means some information cannot be accessed directly due to technical issues or privacy restrictions, which generates uncertain situations. Furthermore, data privacy is a critical aspect in network management, which may involve anything from legal clauses, such as end-user privacy, to business interests, such as cross-domain actions involving different telecommunication operators. Therefore, agents must be able to work under uncertainty and to cooperate while keeping data privacy to diagnose faults in a cross-domain scenario. However, every agent has different experience, knowledge and a partial view of the global problem. Thus, different conclusions (or fault root causes) can be inferred. To achieve agreements for a specific fault case in those cross-domain diagnostic processes, we propose the Bayesian Argumentation Framework (BAF), defined in Section 4, and the Coordination Protocol, exposed in Section 5.

4. Bayesian Argumentation Framework

This section proposes a Bayesian argumentation framework to discriminate the most probable cause of a fault during a distributed fault diagnosis process in federated domains. We considered that every agent manages its domain and has a partial view of the global problem. This ability to divide the global problem into domains, combined with coordination mechanisms, ensures scalability for large-scale systems [56]. Therefore, the coordination mechanism provided by this argumentation framework is required to ensure the scalability of the multi-agent architecture presented in Section 3. As mentioned previously, the model applied during the hypothesis discrimination phase to reason under uncertainty is the Causal Model. This model is used to update the hypothesis set every time new observations are collected from the network. The information used as input and output of the Causal Model is used in this argumentation framework to build arguments, keeping the uncertainty management capability offered by the model. Thus, the argumentation framework exposed in this section requires that every agent has a Causal Model to build and process arguments.
All in all, the definition of the argumentation framework is presented in Section 4.1. The possible relations that can exist between arguments of this framework are exposed in Section 4.2.

4.1. Framework Definition

The proposed argumentation framework relies on the idea of probabilistic statements built using a Causal Model. That model is composed of a set of variables and their conditional probabilistic dependencies, as explained in Section 3. Accordingly, we consider that the problem domain for this argumentation framework is described by a set of variables V = {v_1, …, v_n} and a set of states S = {s_1, …, s_m} in which the variables can be. Each variable v_i ∈ V can be in a state s_j ∈ S with a given probability. The set of states a variable v_i can be in is denoted by S_{v_i} ⊆ S and is defined as the variable state set. We define two types of variables: observations, obs, and fault root causes, frc, which compose the set V = obs ∪ frc. Those observations and fault root causes are modelled as variables of the agent's Causal Model, which allows the agent to infer the probability that a variable is in a given state. That probability represents the agent's degree of certainty about the state of a given variable, which is the crucial concept to handle the uncertainty of the diagnosis process. In this argumentation framework, we denote that probability as p(i,j) = Pr(v_i = s_j) ∈ [0, 1], where s_j ∈ S_{v_i}. To condense the probabilities of all states of a given variable v_i, we define a set of probabilities on that variable as a statement st_{v_i}. Formally,
Definition 1.
A statement st_{v_i} is a pair ⟨v_i, D⟩, where v_i ∈ V and D is a set of probabilities p(i,j), which represent the probability of the variable v_i being in the state s_j.
A statement st_{v_i} on a variable v_i is coherent if and only if its probabilities p(i,j) sum to one. That means that a statement is coherent if it represents a probability distribution over the possible states of the variable v_i. Formally,
Definition 2.
A statement st_{v_i} is coherent ⟺ Σ_{p(i,j) ∈ D} p(i,j) = 1. Otherwise, st_{v_i} is incoherent.
We define three different types of statements: evidence, assumption and proposal. On the one hand, evidence is based on an observation collected from the network and represents that a variable is in a specific state. As observed directly from the network, we considered that information is certain and cannot be discussed. Formally,
Definition 3.
Given a coherent statement st_{v_i} = ⟨v_i, D⟩, st_{v_i} is evidence ⟺ v_i ∈ obs ∧ ∃ p(i,j) ∈ D : p(i,j) = 1.
On the other hand, an assumption represents an unobserved variable. That means the agent cannot gather that information for any reason, such as technical issues or privacy restrictions. Then, an agent can infer this assumption based on the knowledge synthesised in its Causal Model. As an assumption is based on background knowledge and is not certain information, this type of statement can be discussed among agents to clarify the state of the variable, as explained in the following sections. Formally,
Definition 4.
Given a coherent statement st_{v_i} = ⟨v_i, D⟩, st_{v_i} is an assumption ⟺ v_i ∈ obs ∧ ∄ p(i,j) ∈ D : p(i,j) = 1.
Finally, a proposal represents a hypothesis for the states of a specific variable, which can be a conclusion of the possible fault root cause or a possible clarification for an assumption. Formally,
Definition 5.
Given a coherent statement st_{v_i} = ⟨v_i, D⟩, st_{v_i} is a proposal ⟺ v_i ∈ V ∧ ∄ p(i,j) ∈ D : p(i,j) = 1.
To summarise, statements about different variables in the domain are grouped into a set of statements to conform arguments. Based on the three types of statements, we define an argument as a triplet of sets of statements: one set for certain information, another set for uncertain information and the last one for proposing conclusions or clarifications. Formally,
Definition 6.
An argument arg is a triplet ⟨E, A, P⟩, where E is the evidence set of arg, A is the assumption set of arg and P is the proposal set of arg.
In conclusion, this argumentation framework defines three different types of statements, which represent different types of knowledge. Arguments are built as a triplet of sets of statements: evidence set, assumption set and proposal set.
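As a concrete illustration of Definitions 1–6, the following Python sketch encodes statements and arguments with simple checks for coherence, evidence, assumptions and proposals. The representation (a dict of state probabilities) and the example variable sets are assumptions made for illustration only, not part of the published framework.

```python
from dataclasses import dataclass, field

OBS = {"packet_loss", "high_latency"}    # example observation variables (hypothetical)
FRC = {"link_down", "misconfiguration"}  # example fault-root-cause variables (hypothetical)

@dataclass
class Statement:
    variable: str
    probs: dict                      # state -> p(i, j), same state ordering for all agents

    def is_coherent(self, eps=1e-9):
        # Definition 2: the probabilities form a distribution.
        return abs(sum(self.probs.values()) - 1.0) < eps

    def is_evidence(self):
        # Definition 3: an observed variable known with certainty.
        return self.variable in OBS and any(p == 1.0 for p in self.probs.values())

    def is_assumption(self):
        # Definition 4: an observation variable that could not be observed.
        return self.variable in OBS and not any(p == 1.0 for p in self.probs.values())

    def is_proposal(self):
        # Definition 5 (reconstructed reading): an uncertain conclusion or clarification.
        return not any(p == 1.0 for p in self.probs.values())

@dataclass
class Argument:
    # Definition 6: a triplet of evidence, assumption and proposal sets.
    evidence: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    proposals: list = field(default_factory=list)
```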

4.2. Relations between Arguments

The framework defined in the previous section was proposed to perform hypothesis discrimination tasks among sets of agents in distributed fault diagnosis processes. Thus, agents have to generate and evaluate arguments to try to finish the process with the most reliable diagnosis conclusion. That evaluation process is based on the relations between every pair of arguments as explained below.
To explain the relations between arguments, we note that a pair of agents, Ag_i and Ag_j, can agree or disagree, because they have different background knowledge and different views of the global problem when they are diagnosing in federated domains. Hence, if Ag_i generates an argument, arg_i, and Ag_j generates another as a response, arg_j, there can be two main types of relation between those arguments: a support relation, if both agents agree, or an attack relation, if not. Moreover, there are different types of attacks. However, before starting with the definition of those attack types, we must define the relations of similarity and preferability between two statements, α and β, generated by two different agents about a specific variable. Similarity is used to check the agreement between agents by measuring how similar both statements are. If the statements are not similar, we say agents disagree. Then, preferability is used to choose one and discard the other, i.e., to know which of the two statements is preferred over the other. These concepts of similarity and preferability are explained below, in Section 4.2.1 and Section 4.2.2, respectively. Finally, the types of attacks between arguments are exposed in Section 4.2.3.

4.2.1. Similarity

We define similarity between statements as a measure of equivalence between them. If two statements are similar enough, we say they are equivalent for the fault diagnosis task. To measure the similarity between two statements, we process those statements as probability distributions that represent the possible states of the variable v_i, as defined in Section 4. Thus, similarity is used to know whether two agents agree or disagree about the state of a specific variable, i.e., whether their statements are similar enough or not. Strictly, two statements are equal if both have equal probabilities p(i,j) for every state of a variable. As agents have their own private causal models, it is improbable that two statements from different agents have equal probabilities. For that reason, the definition of similarity between statements includes some permissiveness so that agreement can be found with more flexibility, which reduces the number of arguments needed to achieve a reliable conclusion. Moreover, for our fault diagnosis task, we do not need strict equality between statements. Two similar statements are a tolerable agreement between agents to continue with the argumentation process.
Therefore, to measure the similarity of two statements α, β about the same variable v_i, we need to apply a distance function, Δ, to get a numeric measure, Δ(α, β) ∈ ℝ, of how similar the two statements are and thus to know whether the agents agree or disagree. This similarity can be measured using different distance metrics, such as the Euclidean distance, Hellinger distance, Kullback–Leibler distance, J-divergence distance or Cumulative Distribution Function (CDF) distance. For a review of distance metrics between probability distributions, please refer to the work of Koiter [57]. For our fault diagnosis field, we picked the Hellinger distance [58], which offers the following exciting features. Firstly, it can be normalised to bound the metric in [0, 1], which simplifies its processing in contrast with other unbounded metrics, such as the Kullback–Leibler distance or J-divergence. Secondly, it does not require any order sequence among the states of a variable, in contrast with the CDF distance, which is targeted towards ordinal distributions. Thirdly, it is symmetric, in contrast with others, such as the Kullback–Leibler distance. That symmetry is an interesting feature because similarity must be a symmetric measure, so no order between statements is required to measure the distance between them. Finally, it is more sensitive near zero and one, in contrast with the Euclidean distance. That sensitivity is a desirable feature because probabilities near those values in a statement represent that an agent is almost sure that a variable is (p(i,j) ≈ 1) or is not (p(i,j) ≈ 0) in a given state. This feature is suitable because a statement that is more certain about the state of a variable should need to be closer to another statement to be considered similar than a less certain or confident one.
Then, with the normalised Hellinger distance [58], shown in Definition 7, chosen to measure the similarity between two statements, Δ(α, β) ∈ ℝ, we define a threshold th ∈ [0, 1] to establish the maximum distance between two statements for them to be classified as similar enough. Therefore, two statements α, β about the same variable v_i are similar enough if the distance between them is below the threshold, Δ(α, β) < th.
Definition 7.
Given two discrete probability distributions P = (p_1, …, p_k) and Q = (q_1, …, q_k), their normalised Hellinger distance is defined as:

H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}
Based on this definition, a threshold value near zero would imply a strict behaviour, because agents would only agree when the distance between their statements is very small. That behaviour would increase the number of arguments needed to reach a conclusion. In contrast, a threshold value near one would entail a permissive behaviour, as agents would almost always agree, which would reduce the duration of the argumentation. However, no real convergence of beliefs would be achieved. Accordingly, a threshold value between the two bounds should be adjusted depending on the preference between these two behaviours. In the same sense, a value above 0.5 would make no sense for reaching agreement at the end of the argumentation, because it would let the agents' beliefs diverge instead of converging to a common conclusion. Thus, the threshold value should be between zero and 0.5 to foster agreements, 0 < th < 0.5.
Finally, we formally define similarity as follows:
Definition 8.
Given two coherent statements α, β about the same variable v_i, a distance function Δ and a threshold th, α is similar to β (and vice versa) ⟺ Δ(α, β) < th. Otherwise, they are not similar.
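For illustration, the following is a minimal sketch of Definitions 7 and 8: the normalised Hellinger distance between two statements' probability distributions and the threshold-based similarity test. The threshold value used here is an arbitrary example.

```python
import math

def hellinger(p, q):
    """Normalised Hellinger distance between two discrete distributions (Definition 7)."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

def similar(alpha, beta, th=0.2):
    """Definition 8: two statements are similar iff their distance is below the threshold."""
    return hellinger(alpha, beta) < th

# Two agents' statements about the same three-state variable:
alpha = [0.7, 0.2, 0.1]
beta  = [0.6, 0.3, 0.1]
print(hellinger(alpha, beta), similar(alpha, beta))   # small distance -> agents agree
```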

4.2.2. Preferability

After presenting the concept of similarity, we define preferability between two statements as an order of preference between them. As the goal of the system is to diagnose fault root causes, a statement that contains more reliable information about a variable is preferred against others. As mentioned above, when two statements are not similar enough, i.e., they are not equivalent, we can define an order of preference between them.
As mentioned previously, a statement is composed of a set of probabilities p(i,j) that a variable is in a given state. Those probabilities represent the agent's degree of certainty about the state of a given variable. That means a probability p(i,j) ≈ 1 represents that the agent is almost sure the variable v_i is in the state s_j. Conversely, a probability p(i,j) ≈ 0 means the agent is almost sure the variable is not in that state. For the fault diagnosis task, that certainty is more valuable than less certain probabilities, such as p(i,j) ≈ 0.5. Thus, we prefer the statement that represents the highest level of certainty to get more confident conclusions. Notice that this preference is valid because we are considering that all agents have the common goal of diagnosing faults, and they cooperate to achieve it. In competitive environments, every agent could have different preferences, and this decision could be made based on different criteria. However, we consider only collaborative behaviours in this work. Then, we formally define preferability as follows:
Definition 9.
Given two coherent statements α, β about the same variable v_i with their respective sets of probabilities D_α, D_β, α is preferred to β ⟺ ∃ p(i,j) ∈ D_α such that ∀ p(i,k) ∈ D_β, p(i,j) > p(i,k).
Finally, the orderability of statements provided by this preferability property is an interesting feature to solve conflicts and to choose the preferable statement of a set. This property is used in the conflict resolution strategies exposed in Section 5.3.
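A minimal sketch of Definition 9, assuming statements are represented as probability lists as in the previous sketches: the statement whose most certain state carries the higher probability is preferred.

```python
def preferred(d_alpha, d_beta):
    """Definition 9: alpha is preferred to beta iff some probability in alpha exceeds
    every probability in beta, i.e., max(d_alpha) > max(d_beta)."""
    return max(d_alpha) > max(d_beta)

# The first agent is almost sure of one state; the second one is not.
print(preferred([0.9, 0.05, 0.05], [0.5, 0.3, 0.2]))   # True: the more certain statement wins
```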

4.2.3. Types of Attacks

At this point, we have defined two key concepts: similarity and preferability. Now, we define the different types of attack relations that can exist between two arguments, arg_j and arg_i, based on those key concepts. We define three different attack types, depending on the type of statement the agents disagree on: discovery, clarification and contrariness.
Definition 10.
Given two arguments arg_i and arg_j, generated by agents Ag_i and Ag_j, respectively, if arg_j contains any new evidence, st_{v_i} with v_i ∈ obs, about the diagnosis in progress that is not in arg_i, we define that arg_j is a discovery for arg_i.
If arg_j is a discovery for arg_i, arg_i is discarded, and agent Ag_i should generate a new argument including the new evidence. The discovery is the most basic attack type because it modifies the ground of the reasoning process. Hence, if the ground (the evidence) changes, the output of the inference process (the conclusions) could change as well.
Definition 11.
Given two arguments arg_i and arg_j, generated by agents Ag_i and Ag_j, respectively, when arg_j contains a proposal about a variable, st_{v_i} with v_i ∈ obs, and arg_i contains an assumption about that variable, if both statements are not similar and the proposal is preferred to the assumption, we define that arg_j is a clarification for arg_i.
If arg_j is a clarification for arg_i, arg_i is discarded, and agent Ag_i should accept the proposal and generate a new argument including it. A clarification attack tries to offer more certain information about the unknown variable represented in the assumption.
Definition 12.
Given two arguments arg_i and arg_j, generated by agents Ag_i and Ag_j, respectively, when arg_j contains a proposal with a possible conclusion of the diagnosis process, st_{v_i} with v_i ∈ frc, and arg_i contains another possible conclusion, st_{v_j} with v_j ∈ frc, if both conclusions are not similar enough, we define that arg_j is a contrariness for arg_i, and vice versa.
As no statement is discarded in this attack type, both agents stop arguing until new discoveries or clarifications appear during the argumentation process. This type of attack is resolved globally at the end of the argumentation process of the coordination protocol, as shown in Section 5.
Finally, we define that an argument supports another one if no attack relation exists between them. If a support relation exists among all pairs of non-discarded arguments, a global agreement has been achieved. However, as every agent has its domain knowledge synthesised in the Causal Model, we cannot ensure a global agreement for any argumentation. Thus, a conflict resolution mechanism must be applied if required.
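Putting the previous definitions together, a hypothetical helper could classify the relation of one argument with respect to another as follows. It builds on the Statement/Argument, similar and preferred sketches above, and it simplifies Definition 12 by treating proposals about different fault-root-cause variables as conflicting conclusions; it is a sketch, not the authors' implementation.

```python
def relation(arg_i, arg_j, th=0.2):
    """Classify arg_j's relation to arg_i (Definitions 10-12).
    Returns 'discovery', 'clarification', 'contrariness' or 'support'."""
    known = {e.variable for e in arg_i.evidence}
    # Definition 10: arg_j brings evidence about an observation arg_i did not have.
    if any(e.variable not in known for e in arg_j.evidence):
        return "discovery"
    # Definition 11: a preferred, non-similar proposal of arg_j clarifies an assumption of arg_i.
    for prop in arg_j.proposals:
        for assume in arg_i.assumptions:
            if (prop.variable == assume.variable
                    and not similar(list(prop.probs.values()), list(assume.probs.values()), th)
                    and preferred(list(prop.probs.values()), list(assume.probs.values()))):
                return "clarification"
    # Definition 12 (simplified): conflicting conclusions about fault root causes.
    for p_j in arg_j.proposals:
        for p_i in arg_i.proposals:
            if p_j.variable in FRC and p_i.variable in FRC:
                if p_j.variable != p_i.variable or not similar(
                        list(p_j.probs.values()), list(p_i.probs.values()), th):
                    return "contrariness"
    return "support"
```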

5. Coordination Protocol

This section proposes a coordination protocol for distributed autonomic fault diagnosis in federated domains based on the argumentation framework proposed in the previous section. In this protocol, there are two different agent roles, namely the Argumentative role and the Manager role. An argumentative agent is responsible for generating and processing arguments. A manager agent is responsible for several tasks: (i) to establish a coalition of argumentative agents to argue; (ii) to decide when an argumentation process has finished; and (iii) to extract the conclusion of the argumentation process.
The proposed protocol has three different phases, as summarised in Figure 2. The initial phase for the formation of a group of agents capable of reaching a reliable conclusion for a specific diagnosis case is called the Coalition Formation Phase, as explained in Section 5.1. After the argumentation coalition is established and every agent knows the rest of the constituents, the Argumentation Phase starts, as exposed in Section 5.2. Finally, when a manager agent decides argumentation is finished, all non-discarded arguments are analysed to extract a conclusion during the Conclusion Phase, as shown in Section 5.3. For the sake of brevity, a detailed working example of a distributed fault diagnosis is not included in the paper, but it can be found in the Supplementary Material of this article. For the sake of clarity, diagrams included in the following subsections follow the Business Process Model and Notation (BPMN) 2.0 standard specification [59].

5.1. Coalition Formation Phase

This phase starts when an Argumentative agent (Initiator agent) initiates a new process to diagnose an anomaly or a symptom detected in the supervised network elements. This Argumentative agent sends a Coalition Formation Request message to the Manager agent. This message includes data to identify the problem domain. Then, the Manager agent broadcasts the message to the rest of the Argumentative agents as a Coalition Invitation message. When an agent receives that invitation, it decides whether to join the coalition or not. If it decides to join, it must respond to the invitation. Otherwise, it ignores it. This decision is based on the local private knowledge of the agent. In other words, if the agent can offer any relevant information about the problem domain, it would accept the invitation. Otherwise, it would not.
A period of time is specified as the deadline to respond to the invitation to avoid deadlocks while the Manager agent is waiting for responses from the Argumentative agents. In this way, the Coalition Invitation message can be broadcast without requiring a response from every agent. After the deadline, the Manager agent establishes the coalition with all agents that accepted the invitation and the Initiator agent. Finally, it broadcasts a Coalition Established message with the complete list of agents that joined the coalition. With this last message, the Coalition Formation Phase is finished. Figure 3 shows a diagram of this phase.

5.2. Argumentation Phase

This phase starts when the Coalition Established message, sent in the previous phase, is received by the Initiator Agent who broadcasts the initial argument to the coalition. After this step, this agent acts as any other Argumentative agent. Later, every agent in the coalition receives that initial argument and analyses it to find any attack relation following the reasoning process exposed in Section 4.2. After an argument is processed, the options of an argumentative agent are the following: (i) If a discovery or a clarification relation is found, the agent generates a new updated argument. (ii) Alternatively, if a contrariness relation is found, it looks for new information from the environment to add it to the argumentation process. (iii) Finally, if the received argument supports the agent beliefs, it can wait until another argument is received.
During this Argumentation Phase, an agent can receive a message that contains an argument while it is processing or generating another one received previously. In that case, the agent should analyse arguments in reception order. Afterwards, it would generate only one argument as the response when all incoming arguments have been processed. This strategy reduces the number of arguments generated during the argumentation dialogue and makes the process require less messaging and, consequently, less computational resources.
As shown in Figure 4, agents broadcast any generated argument to all coalition members, including the Manager agent. In this way, we ensure that every agent receives all arguments and can attack or support any of them during the argumentation dialogue. Every time the Manager agent receives an argument, it restarts a timer that is used to know when the argumentation is finished. When all agents remain silent for longer than a silence time-out, the Manager decides this phase is done and starts the Conclusion Phase. In other words, the Argumentation Phase continues until every agent has proposed its conclusions for the diagnosis case under consideration and no new argument arrives that makes an agent change those conclusions. It is important to remark that Argumentative agents must be able to process and generate arguments in less time than the silence time-out, to ensure the Manager agent does not finish this phase prematurely. Notice that the end of the Argumentation Phase can be delayed depending on the number of agents and their configuration. That delay is caused by the impact of the similarity threshold on the agents' permissiveness, as shown in Section 4.2. To avoid this phase being too time-consuming, the Manager agent can be configured to allow the coalition member agents to argue for a limited time, after which the Conclusion Phase starts in any case, even if agents continue arguing. At that deadline, the Manager agent sends a message to the coalition members to notify them that the argumentation process has finished.
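The Manager's end-of-phase logic can be pictured with the following sketch, where receive_argument is an assumed message-queue helper and the time-out and deadline values are arbitrary examples rather than values prescribed by the protocol.

```python
import time

def manage_argumentation(receive_argument, silence_timeout=5.0, deadline=60.0):
    """Hypothetical sketch of the Manager's loop during the Argumentation Phase:
    the phase ends when no argument arrives within `silence_timeout` seconds,
    or when the overall `deadline` is reached even if agents keep arguing."""
    start = time.time()
    last_argument_at = start
    while True:
        arg = receive_argument(timeout=1.0)      # poll the message queue (assumed helper)
        now = time.time()
        if arg is not None:
            last_argument_at = now               # any new argument restarts the silence timer
        if now - last_argument_at > silence_timeout or now - start > deadline:
            return "start_conclusion_phase"
```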

5.3. Conclusion Phase

After the Argumentation Phase is done, the Manager agent must process the arguments that have not been discarded due to attacks and extract a conclusion, as shown in Figure 5. We name that set of non-discarded arguments the candidate arguments set. Based on the attack types of the argumentation framework proposed in Section 4, only one type of attack relation can exist among arguments of this set: the contrariness relation. That is, every argument of the set contains a proposal about a possible fault root cause. If some contrariness is found between two arguments, we say that a conflict is found. If any conflict is found while analysing the candidate arguments set, different criteria can be applied to resolve those conflicts and select a conclusion. Contrarily, if no conflict is found, the conclusion all agents agree on is picked as the final conclusion. We propose three different conflict resolution strategies for this final phase (a code sketch follows the list).
  • Most Popular Conclusion: This strategy picks as the final conclusion the most popular one in the candidate arguments set.
  • Most Confident Conclusion: This strategy picks as the final conclusion the one with the highest confidence in the candidate arguments set.
  • Weighted Conclusion: This strategy calculates the average confidence value among all arguments with the same conclusion and picks as the final conclusion the one with the highest average confidence value.
Finally, the Manager agent sends a message to all Argumentative Agents to notify them that the argumentation is finished, including the conclusion of the distributed diagnosis process.
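The three conflict resolution strategies could be sketched as follows, assuming each candidate conclusion is represented as a (fault root cause, confidence) pair; this representation is hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def most_popular(conclusions):
    """Pick the conclusion proposed by the largest number of candidate arguments."""
    return Counter(cause for cause, _ in conclusions).most_common(1)[0][0]

def most_confident(conclusions):
    """Pick the conclusion with the single highest confidence value."""
    return max(conclusions, key=lambda c: c[1])[0]

def weighted(conclusions):
    """Pick the conclusion with the highest average confidence over its arguments."""
    groups = defaultdict(list)
    for cause, conf in conclusions:
        groups[cause].append(conf)
    return max(groups, key=lambda cause: sum(groups[cause]) / len(groups[cause]))

# Candidate arguments set, e.g. three agents proposing fault root causes:
candidates = [("link_down", 0.9), ("link_down", 0.6), ("misconfiguration", 0.8)]
print(most_popular(candidates), most_confident(candidates), weighted(candidates))
```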

6. Results and Discussion

This section presents a report of the experiments performed to assess the validity of the proposed argumentation framework for distributed fault diagnosis. The three main tasks of any fault diagnosis process [60] are symptom detection, hypothesis generation and hypothesis discrimination.
While the previous work [25] addressed the evaluation of the three tasks, this work focuses on evaluating distributed hypothesis discrimination. This task can be considered a classification task that determines the probability of each possible cause being the fault root cause. Thus, we have evaluated the argumentation as a distributed multi-class classification technique. In this way, the evaluation compares the proposed approach against other classification methods on a number of datasets.
The rest of this section is structured as follows. Firstly, Section 6.1 presents the experimentation framework developed to carry out this evaluation process. Section 6.2 summarises the datasets used in the experiments. Finally, Section 6.3 shows the results, and Section 6.4 discusses them.

6.1. Experimentation Framework

To provide an empirical assessment of the application of the proposed argumentative framework in the context of standard classification problems, a set of experiments was conducted to compare the results of the proposed technique with other traditional classification techniques. The traditional techniques considered were decision trees (such as J48, the Logical Analysis of Data (LAD) Tree (LADTree) or Pruning Rule-based Classification (PART)); support vector machines (such as Sequential Minimal Optimization (SMO)); simple probabilistic classifiers (such as NBTree); and probabilistic graphs (such as BayesSearch). Most of the considered techniques are available in the WEKA library (Weka Website: http://www.cs.waikato.ac.nz/ml/weka/). In addition, the SMILE library (SMILE Website: http://genie.sis.pitt.edu/ or http://www.bayesfusion.com/) was used because it provides some algorithms not available in WEKA.
Contrary to the mentioned traditional centralised approaches, the proposed argumentative solution requires more than a data mining library to be executed. We developed an experimentation framework that offers an environment to execute agents under federated-domain conditions, such as access restriction situations and different background knowledge for every agent. This framework is available as an open-source tool, named the Bayesian ARgumentative Multi-Agent System (BARMAS) framework (GitHub public repository: https://github.com/gsi-upm/BARMAS). This framework uses the SMILE library as the Bayesian inference and learning engine to enable Argumentative Agents to reason with their Causal Models, and the MASON simulation framework (MASON Website: http://cs.gmu.edu/~eclab/projects/mason/) as the agent platform.
Notice that the considered traditional classification techniques follow a centralised approach, while the proposed argumentative framework was designed as a distributed solution from the beginning. That is a crucial feature for the application of this work in the motivational scenario, i.e., a distributed fault diagnosis system for federated telecommunication networks. However, it is interesting to compare the results of those techniques for contexts where a centralised one could replace a distributed approach.
Summarising, Table 1 shows the classification techniques considered in the experiments and the respective software libraries used to execute them.
The validation process was carried out with a cross-validation technique with a 10-fold configuration. For the validation of the traditional centralised approaches, 90% of the data were used for training and 10% for testing in 10 different iterations, while for the BARMAS framework, the training data were divided into as many sets as argumentative agents running in the experiment. Therefore, each agent had only a portion of the total training dataset, providing different background knowledge for every agent. For instance, in an experiment with two argumentative agents, each one had only 45% of the original dataset for training (the other 10% was used for testing); for three agents, 30%; for four agents, 22.5%, etc. To learn from training data, each argumentative agent performed a training process using the BayesSearch technique to synthesise the agent's background knowledge in a Causal Model. To reproduce access restriction conditions, the set of variables was divided into as many subsets as Argumentative agents, V_1 ∪ … ∪ V_n = V \ frc (excluding the classification target variables frc). Each agent can access only one of those subsets to reproduce the partial view of the global problem in federated domains.
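As an illustration of this setup, the following sketch splits one training fold among n argumentative agents, giving each agent a disjoint subset of instances (different background knowledge) and a disjoint subset of the non-target attributes (access restrictions). It is a simplified stand-in for the BARMAS experimentation framework, not its actual code.

```python
import random

def partition_for_agents(train_instances, attributes, n_agents, seed=0):
    """Split a training fold into n disjoint instance subsets and the attribute set
    into n disjoint subsets, excluding the classification target. Hypothetical sketch."""
    rng = random.Random(seed)
    instances = train_instances[:]
    rng.shuffle(instances)
    attrs = attributes[:]
    rng.shuffle(attrs)
    instance_chunks = [instances[i::n_agents] for i in range(n_agents)]
    attribute_chunks = [attrs[i::n_agents] for i in range(n_agents)]
    return list(zip(instance_chunks, attribute_chunks))

# With a 10-fold split, each of 2 agents trains on roughly 45% of the original dataset
# and can only access its own attribute subset.
```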
In addition to the argumentative agents mentioned previously, some extra agents were used in the experiments to reproduce the conditions of the real-life scenario that motivated this work. Firstly, a Generator Agent was included in the experiments to generate diagnosis cases. In other words, it simulated the symptom detection task and triggered the distributed classification process. A Manager Agent was included to control the argumentation process, as explained in Section 5. That Manager agent was configured to follow the Most Confident Conclusion strategy, proposed in Section 5.3, as the conflict resolution strategy. The silence time-out defined in Section 5.2 was configured to ensure that every agent could generate at least one argument. An Evaluator Agent was included to evaluate the conclusion of the argumentation process. This agent was notified by the Manager agent when the argumentation process was finished to check the correctness of the classification conclusion. Finally, all Argumentative Agents were configured with a threshold value equal to 0.2 (th = 0.2) to measure the similarity between statements, as explained in Section 4. In conclusion, the following agents were executed in all experiments: Generator, Evaluator, Manager, and a set of Argumentative agents.

6.2. Datasets

Several public datasets were used for the evaluation, as well as the private one extracted from the case study presented in the previous work [25], which contains fault diagnosis data of a real-life telecommunication network running for one and a half years. Those datasets were used to measure the accuracy of the proposed approach for the multi-class classification problem. Public datasets were collected from the UCI (UCI Repository Website: http://archive.ics.uci.edu/ml/datasets.html) and KEEL (KEEL Repository Website: http://sci2s.ugr.es/keel/datasets.php) repositories and differ meaningfully in their characteristics with respect to the number of classes, number of instances and number of attributes. An overview of the complexity of the considered datasets is shown in Table 2.

6.3. Results

To measure the accuracy of the considered classification techniques, we present the Error Rate (ER) values obtained for every dataset under different uncertainty levels. That uncertainty was generated by hiding some variables of the datasets to reproduce a crucial aspect of the motivational problem, i.e., uncertainty during fault diagnosis of telecommunication networks. Uncertainty was reproduced in the experiments using missing attributes for the classification algorithms. Three configurations were used to generate different uncertainty levels: no missing attributes (Section 6.3.1), 25% missing attributes (Section 6.3.2) and 50% missing attributes (Section 6.3.3). We considered uncertainty levels above 50% to be quite improbable in real-life scenarios and likely to result in unreliable conclusions. The following tables show the results of the considered traditional classification algorithms (BayesSearch, J48, LADTree, NBTree, PART and SMO) and the proposed argumentative technique (BARMAS) with different numbers of agents involved in the argumentation process (2, 3 and 4 agents). Notice that the results of different tables should not be compared with each other, because the results with no uncertainty were, broadly speaking, better than the results with uncertainty.
Furthermore, to compare the different classifiers in this work, statistical tests were applied to the obtained results; concretely, the Friedman test proposed in [61], which is well suited to the comparison of several classifiers on multiple datasets. This test is based on the rank of each algorithm on each dataset, where the best performing algorithm gets a rank score of one, the second-best a score of two, etc. After joining the scores for every dataset, the test ranks all classifiers, the best classifier being the one with the lowest score. This test was applied in the three considered scenarios with different uncertainty levels, and additional tests were performed to evaluate the classifiers under any uncertainty conditions (i.e., testing results for both 25% and 50% in a single Friedman test), shown in Section 6.3.4. Results of these Friedman tests were calculated with an alpha value equal to 0.05 (α = 0.05).
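For reference, this kind of comparison can be reproduced with the Friedman test implementation in SciPy, passing each classifier's error rates across the same set of datasets; the values below are placeholders, not the paper's results.

```python
from scipy.stats import friedmanchisquare

# Error rates per classifier across the same set of datasets (placeholder values).
barmas      = [0.02, 0.05, 0.10, 0.08]
bayessearch = [0.03, 0.06, 0.12, 0.09]
j48         = [0.26, 0.07, 0.15, 0.11]

stat, p_value = friedmanchisquare(barmas, bayessearch, j48)
print(stat, p_value)   # reject the null hypothesis of equal performance if p < 0.05
```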

6.3.1. Experimentation Scenario: No Uncertainty

Table 3 shows the results with no missing attributes. Notice that values were truncated, so values equal to 0.00 do not mean a perfect classification; they represent values between 0.00 and 0.01. With no missing attributes, Argumentative agents make no assumptions during the Argumentation Phase because all variables are known with certainty. By analysing these results, we observed that, even with data privacy restrictions and a fully-distributed approach that reduces the number of training instances per agent, BARMAS is close to the traditional techniques. This finding suggests that, generally speaking, the use of BARMAS as a distributed approach produces similar results to other centralised alternatives in non-uncertain situations.
However, the Friedman test showed that, under no uncertainty conditions, the BARMAS approach only statistically improved upon the BayesSearch algorithm, which indicates that in centralised and non-uncertain scenarios, traditional approaches had slightly better results.

6.3.2. Experimentation Scenario: Moderate Uncertainty

This section exposes the results after adding moderate uncertainty to the experiments, i.e., removing 25% of the attributes of the classification case as unknown information (missing attributes). That uncertainty allowed BARMAS agents to discuss and make assumptions, which was the motivational requirement for this work.
Closer inspection of Table 4 revealed that the difference between columns became significant in uncertain situations for some datasets. For example, focusing on the Mushroom row of Table 4, we observed that BARMAS (0.01–0.02), BayesSearch (0.03) and NBTree (0.04) (notice that all of them are Bayesian approaches) had quite low ER compared with other alternatives, such as J48 (0.26), LADTree (0.52), PART (0.23) or SMO (0.23). Another example can be observed in the Zoo row of the same table. In contrast, all compared alternatives presented similar results in the remaining datasets.
Under moderate uncertainty conditions, the Friedman test showed that the BARMAS approach was one of the alternatives with better results in the ranking.

6.3.3. Experimentation Scenario: Strong Uncertainty

This section shows the considered alternatives under strong uncertainty conditions (50% missing attributes). From Table 5, it is apparent that BARMAS had a lower (or at least equal) error rate than all other alternatives, which attests to the accuracy of BARMAS in situations with high uncertainty, making it preferable to the other alternatives. Furthermore, it offers the flexibility to be applied in distributed environments with private knowledge, as mentioned previously. As the data in Table 5 show, the results for the Zoo and Mushroom datasets presented, again, a significant improvement using Bayesian approaches, with differences of up to 0.49 in the ER values, as can be seen between BARMAS with four agents (0.03) and LADTree (0.52) for the Mushroom dataset. Contrarily, comparable values are observed (with equal uncertainty levels) in the other datasets, such as Solar Flare, Marketing, Nursery, Chess or Network.
Moreover, the Friedman test ranked the BARMAS approach as the best alternative among the considered classifiers under strong uncertainty conditions, which makes it suitable for the motivational scenario: distributed fault diagnosis in complex and dynamic environments, such as telecommunication networks.

6.3.4. Experimentation Scenario: Average Uncertainty

This section presents the scores obtained by applying the Friedman test to the results of Table 4 and Table 5 together. The objective of this test is to assess the validity of the proposed BARMAS approach under varying or undetermined uncertainty conditions, i.e., it offers a single ranking of the classifiers across different uncertainty levels. As shown in Table 6, the benefits of the BARMAS approach under uncertainty are immediately visible.

6.4. Discussion

In conclusion, one of the most important findings of the analysed results was the robustness of the proposed approach against uncertainty, which can be observed by comparing the same dataset across Table 3 and Table 5. For example, at one end, the experiment with four BARMAS agents showed only a 0.02 difference between the situation with no uncertainty (ER = 0.01) and the one with 50% missing attributes (ER = 0.03) for the Mushroom dataset. In contrast, at the other end, LADTree showed a difference of 0.52 (ER = 0.00 with no uncertainty and ER = 0.52 with 50% missing attributes).
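As a small worked example of this robustness comparison, the snippet below computes the error-rate degradation between Table 3 and Table 5 for the Mushroom dataset; the values are copied from those tables, while the dictionary layout is our own.

```python
# Error-rate degradation between the no-uncertainty results (Table 3) and the
# 50%-missing results (Table 5) for the Mushroom dataset. A smaller gap means
# higher robustness to uncertainty.
er_full_evidence = {"BARMAS (4 agents)": 0.01, "LADTree": 0.00}
er_half_missing = {"BARMAS (4 agents)": 0.03, "LADTree": 0.52}

for name in er_full_evidence:
    gap = er_half_missing[name] - er_full_evidence[name]
    print(f"{name}: ER increases by {gap:.2f} with 50% missing attributes")
```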
Moreover, the Friedman test results showed the improvement in accuracy under strong uncertainty conditions, as shown in Figure 6, where only the best-scoring classifiers are highlighted. Thus, the experimental results attest that BARMAS provides a suitable mechanism to perform distributed fault diagnosis under uncertainty in federated domains.

7. Conclusions and Future Work

This paper presented an argumentation framework based on Bayesian reasoning for distributed fault diagnosis in telecommunication networks, together with a protocol to apply that framework in a distributed MAS over federated domains. We considered agents with different partial views of the global scenario and data privacy restrictions in those domains. Hence, the presented approach proposed an MAS with argumentation capabilities based on Bayesian reasoning for agents with local, private datasets. Two agent types were considered in the proposed protocol: Argumentative and Manager agents. During an argumentation, the protocol allows a set of Argumentative agents to discuss the causes of a detected anomaly, such as a fault in a telecommunication network that must be diagnosed. Those agents exchange arguments containing information about the diagnosis case until a Manager agent decides that the argumentation is finished and extracts the conclusion, i.e., the root cause of the detected problem.
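For readers who prefer pseudocode, the sketch below summarises the control flow of this two-role protocol; all class and method names are hypothetical, and it is not the authors' implementation, only an illustration of the interaction between the roles.

```python
# Simplified sketch of the two-role protocol summarised above.
# All class and method names are hypothetical.
def diagnose(manager, argumentative_agents, diagnosis_case):
    """Run one argumentation over a detected anomaly and return its root cause."""
    # Each Argumentative agent proposes an initial hypothesis from its local,
    # private Bayesian model and the (possibly incomplete) observations.
    arguments = [agent.propose_hypothesis(diagnosis_case) for agent in argumentative_agents]

    while not manager.argumentation_finished(arguments):
        new_arguments = []
        for agent in argumentative_agents:
            # Agents evaluate the arguments exchanged so far and may attack or
            # support them, making assumptions about unknown variables.
            new_arguments.extend(agent.evaluate(arguments, diagnosis_case))
        arguments.extend(new_arguments)

    # The Manager agent extracts the agreed conclusion, i.e., the fault root cause.
    return manager.extract_conclusion(arguments)
```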
The proposed method was evaluated with a set of experiments based on empirical data. The obtained results supported its validity as a distributed hypothesis discrimination mechanism for a fault diagnosis system. We can highlight a set of features relevant to our motivational scenario, i.e., fault diagnosis in telecommunication networks: (i) it performs distributed hypothesis discrimination coherently and with high robustness against uncertainty; (ii) it keeps knowledge private among agents; (iii) it provides a cooperation mechanism for conflict resolution; and (iv) it can be deployed in dynamic and complex environments, since agents create temporary coalitions at execution time as required.
Since the proposed approach has been validated as a distributed hypothesis discrimination mechanism, several paths can be explored as future work. An interesting feature that we plan to explore is the use of trust mechanisms during the argumentation process. With reputation, agents can decide whether the information received from another agent is more or less reliable than their own beliefs. For instance, if an agent has little experience or is usually wrong, its arguments are less reliable than those sent by expert agents that proposed correct arguments in past cases. Furthermore, this feature could be complemented with feedback mechanisms to check, after the argumentation process, whether a statement was true or false, i.e., correct or incorrect. With that feedback, the reputation of any agent could be adjusted at execution time.
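As a rough illustration of this future direction (entirely our assumption, not part of the evaluated system), a simple Beta-style reputation model could turn post-diagnosis feedback into a reliability score per agent:

```python
# Hypothetical Beta-style reputation model: correct conclusions increase alpha,
# incorrect ones increase beta; the score is the expected reliability.
class Reputation:
    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha = alpha  # evidence of correct past arguments
        self.beta = beta    # evidence of incorrect past arguments

    def update(self, was_correct: bool) -> None:
        if was_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def score(self) -> float:
        """Expected reliability of the agent's arguments, in [0, 1]."""
        return self.alpha / (self.alpha + self.beta)

rep = Reputation()
for outcome in [True, True, False, True]:  # feedback after four diagnoses
    rep.update(outcome)
print(f"reliability = {rep.score():.2f}")  # 0.67 with the default prior
```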
Finally, we plan to perform a new set of experiments to look for a set of rules that define the optimal value of the threshold parameter based on the context. Thus, agents could show self-adaptive behaviour depending on the number of agents in the argumentation coalition, the level of uncertainty or the trust they place in the remaining agents, in order to reach the optimal criterion to accept or reject a received argument.

Supplementary Materials

The following are available online at https://www.mdpi.com/1424-8220/19/15/3408/s1, Worked Example of the proposed Bayesian Argumentation Framework.

Author Contributions

Conceptualization, Á.C., E.A. and C.A.I.; data curation, Á.C.; formal analysis, Á.C. and E.A.; funding acquisition, Á.C. and C.A.I.; investigation, C.A.I.; methodology, Á.C.; project administration, C.A.I.; resources, Á.C. and C.A.I.; software, Á.C.; supervision, C.A.I.; validation, Á.C.; visualization, Á.C.; writing, original draft, Á.C.; writing, review and editing, Á.C., E.A. and C.A.I.

Funding

This research work was supported by the Spanish Ministry of Economy and Competitiveness under the R&D project SEMOLA (TEC2015-68284-R) and by UPM grant for Short Stays (EE.BB.2013).

Acknowledgments

The authors want to acknowledge the cooperation of Telefónica O2 Czech Republic in providing a dataset with real values from their networks. We also acknowledge the use of an academic license of the SMILE Engine (https://www.bayesfusion.com/) for our experimentation.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AAF      Assumption-based Argumentation Framework
MAS      Multi-Agent System
BN       Bayesian Network
FIPA     Foundation for Intelligent Physical Agents
AOSE     Agent-Oriented Software Engineering
BAF      Bayesian Argumentation Framework
DAG      Directed Acyclic Graph
CDF      Cumulative Distribution Function
BPMN     Business Process Model and Notation
BARMAS   Bayesian ARgumentative Multi-Agent System
ER       Error Rate
MAPE     Monitor-Analyse-Plan-Execute
ETSI     European Telecommunications Standards Institute
GANA     Generic Autonomic Network Architecture
IRTF     Internet Research Task Force
RFC      Request For Comments
KPI      Key Performance Indicator
DPN      Distributed Perception Network
MSBN     Multiple Sectioned Bayesian Network
VPN      Virtual Private Network
UPM      Universidad Politécnica de Madrid

References

  1. Cetinkaya, E.K.; Broyles, D.; Dandekar, A.; Srinivasan, S.; Sterbenz, J. Modelling communication network challenges for Future Internet resilience, survivability, and disruption tolerance: A simulation-based approach. Telecommun. Syst. 2013, 52, 751–766. [Google Scholar] [CrossRef]
  2. Evans, D. The Internet of Things: How the Next Evolution of the Internet is Changing Everything. CISCO White Pap. 2011, 1, 1–11. Available online: https://www.cisco.com/c/dam/en_us/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf (accessed on 2 April 2019).
  3. Plevyak, T.; Sahin, V. Next Generation Telecommunications Networks, Services, and Management; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 15. [Google Scholar]
  4. Charalambides, M.; Pavlou, G.; Flegkas, P.; Wang, N.; Tuncer, D. Managing the future internet through intelligent in-network substrates. Netw. IEEE 2011, 25, 34–40. [Google Scholar] [CrossRef] [Green Version]
  5. Jennings, B.; van der Meer, S.; Balasubramaniam, S.; Botvich, D.; ó Foghlú, M.; Donnelly, W.; Strassner, J. Towards autonomic management of communications networks. Commun. Mag. IEEE 2007, 45, 112–121. [Google Scholar] [CrossRef]
  6. Galis, A.; Abramowicz, H.; Brunner, M.; Raz, D.; Chemouil, P.; Butler, J.; Polychronopoulos, C.; Clayman, S.; De Meer, H.; Coupaye, T. Management and service-aware networking architectures (MANA) for future Internet—Position paper: System functions, capabilities and requirements. In Proceedings of the 2009 Fourth International Conference on Communications and Networking in China, Xi’an, China, 26–28 August 2009; pp. 1–13. [Google Scholar]
  7. Tselentis, G.; Galis, A. Towards the Future Internet: Emerging Trends from European Research; IOS Press: Amsterdam, The Netherlands, 2010. [Google Scholar]
  8. Müller, P. Future Internet Design Principles; Technical Report; European Commission—Information Society and Media: Brussels, Belgium, 2012.
  9. Guckenheimer, J.; Ottino, J.M. Foundations for Complex Systems Research in the Physical Sciences and Engineering. Report from an NSF Workshop. 2008. Available online: http://pi.math.cornell.edu/~gucken/PDF/nsf_complex_systems.pdf (accessed on 2 April 2019).
  10. Clark, D.; Shenker, S.; Falk, A. GENI Research Plan (Version 4.5); GENI Research Coordination Working Group and GENI Planning Group, 2007. Available online: https://groups.geni.net/geni/raw-attachment/wiki/OldGPGDesignDocuments/GDD-06-28.pdf (accessed on 2 April 2019).
  11. Agoulmine, N. Chapter 1—Introduction to Autonomic Concepts Applied to Future Self-Managed Networks. In Autonomic Network Management Principles; Academic Press: Oxford, UK, 2011; pp. 1–26. [Google Scholar]
  12. Pras, A.; Schönwälder, J.; Burgess, M.; Festor, O.; Perez, G.M.; Stadler, R.; Stiller, B. Key research challenges in network management. Commun. Mag. IEEE 2007, 45, 104–110. [Google Scholar] [CrossRef] [Green Version]
  13. Kephart, J.; Kephart, J.; Chess, D.; Boutilier, C.; Das, R.; Kephart, J.O.; Walsh, W.E. An architectural blueprint for autonomic computing. In IBM White Paper; IBM Corporation: Hawthorne, NY, USA, 2003; pp. 2–10. [Google Scholar]
  14. Strassner, J.; Agoulmine, N.; Lehtihet, E. FOCALE: A novel autonomic networking architecture. Int. Trans. Syst. Sci. Appl. 2007, 3, 67–79. [Google Scholar]
  15. Tschudin, C.; Jelger, C. An Autonomic Network Architecture Research Project. Praxis der Informationsverarbeitung und Kommunikation 2007, 30, 26–31. [Google Scholar] [CrossRef]
  16. Wang, Y.; Zhu, K.; Sun, M.; Deng, Y. An Ensemble Learning Approach for Fault Diagnosis in Self-organizing Heterogeneous Networks. IEEE Access 2019. [Google Scholar] [CrossRef]
  17. Bouabene, G.; Jelger, C.; Tschudin, C.; Schmid, S.; Keller, A.; May, M. The autonomic network architecture (ANA). Sel. Areas Commun. IEEE J. 2010, 28, 4–14. [Google Scholar] [CrossRef]
  18. Laurent, C. Autonomic Network Engineering for the Self-Managing Future Internet (AFI); Generic Autonomic Network Architecture (An Architectural Reference Model for Autonomic Networking, Cognitive Networking and Self-Management). Available online: https://www.etsi.org/deliver/etsi_gs/AFI/001_099/002/01.01.01_60/gs_afi002v010101p.pdf (accessed on 2 April 2019).
  19. Behringer, M.; Pritikin, M.; Bjarnason, S.; Clemm, A.; Carpenter, B.; Jiang, S.; Ciavaglia, L. Autonomic Networking: Definitions and Design Goals; Technical Report. Available online: https://tools.ietf.org/html/rfc7575 (accessed on 2 April 2019).
  20. Jiang, S.; Carpenter, B.; Behringer, M. General Gap Analysis for Autonomic Networking; Technical Report. Available online: https://tools.ietf.org/html/rfc7576 (accessed on 2 April 2019).
  21. Sorrentino, M.; Bruno, M.; Trifirò, A.; Rizzo, G. A Novel Energy Efficiency Metric for Model-Based Fault Diagnosis of Telecommunication Central Offices. Energy Procedia 2019, 158, 3901–3907. [Google Scholar] [CrossRef]
  22. Zafar, A.; Akbar, A.H.; Wajid, B.; Akram, B.A.; Irfan, T. AMNA: Probe Agent Based Inter-Process Dependency Model for Wireless Sensor Network’s Fault DiAgnosis. In Proceedings of the International Telecommunications Conference, Waikoloa Village, HI, USA, 9–13 December 2019; Boyaci, A., Ekti, A.R., Aydin, M.A., Yarkan, S., Eds.; Springer: Singapore, 2019; pp. 125–135. [Google Scholar]
  23. Grastien, A.; Zanella, M. Discrete-Event Systems Fault Diagnosis. In Fault Diagnosis of Dynamic Systems: Quantitative and Qualitative Approaches; Springer International Publishing: Cham, Switzerland, 2019; pp. 197–234. [Google Scholar] [CrossRef]
  24. Carrera Barroso, A. Application of Agent Technology for Fault Diagnosis of Telecommunication Networks. Ph.D. Thesis, E.T.S.I. Telecomunicación (UPM), Madrid, Spain, 2016. [Google Scholar]
  25. Carrera, A.; Iglesias, C.A.; García-Algarra, J.; Kolařík, D. A real-life application of multi-agent systems for fault diagnosis in the provision of an Internet business service. J. Netw. Comput. Appl. 2014, 37, 146–154. [Google Scholar] [CrossRef]
  26. Cai, B.; Liu, Y.; Hu, J.; Liu, Z.; Wu, S.; Ji, R. Bayesian Networks in Fault Diagnosis; World Scientific: Singapore, 2018. [Google Scholar] [CrossRef]
  27. Chen, S.; Gao, L.; Liao, G. MBAN-MLC: A multi-label classification method and its application in automating fault diagnosis. Int. J. Internet Manuf. Serv. 2018, 5, 350–364. [Google Scholar] [CrossRef]
  28. Li, Y.; Liu, J. A Bayesian Network Approach for Imbalanced Fault Detection in High Speed Rail Systems. In Proceedings of the 2018 IEEE International Conference on Prognostics and Health Management (ICPHM), Seattle, WA, USA, 11–13 June 2018; pp. 1–7. [Google Scholar] [CrossRef]
  29. Lukasik, Z.; Nowakowski, W.; Ciszewski, T.; Freimane, J. A fault diagnostic methodology for railway automatics systems. Procedia Comput. Sci. 2019, 149, 159–166. [Google Scholar] [CrossRef]
  30. Zhang, Z.; Mehmood, A.; Shu, L.; Huo, Z.; Zhang, Y.; Mukherjee, M. A Survey on Fault Diagnosis in Wireless Sensor Networks. IEEE Access 2018, 6, 11349–11364. [Google Scholar] [CrossRef]
  31. Sorrentino, M.; Bruno, M.; Trifirò, A.; Rizzo, G. An innovative energy efficiency metric for data analytics and diagnostics in telecommunication applications. Appl. Energy 2019, 242, 1539–1548. [Google Scholar] [CrossRef]
  32. Mata, J.; de Miguel, I.; Durán, R.J.; Merayo, N.; Singh, S.K.; Jukan, A.; Chamania, M. Artificial intelligence (AI) methods in optical networks: A comprehensive survey. Opt. Switch. Netw. 2018, 28, 43–57. [Google Scholar] [CrossRef]
  33. Velasco, L.; Rafique, D. Fault Management Based on Machine Learning. In Proceedings of the 2019 Optical Fiber Communications Conference and Exhibition (OFC), San Diego, CA, USA, 3–7 March 2019; pp. 1–3. [Google Scholar]
  34. Zhang, D. Multi-Agent Based Control of Large-Scale Complex Systems Employing Distributed Dynamic Inference Engine. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2010. [Google Scholar]
  35. Pavlin, G.; Oude, P.D.; Maris, M.; Hood, T. Distributed Perception Networks: An Architecture for Information Fusion Systems Based on Causal Probabilistic Models. In Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Heidelberg, Germany, 3–6 September 2006; pp. 303–310. [Google Scholar] [CrossRef]
  36. Xiang, Y.; Poole, D.; Beddoes, M.P. Multiply Sectioned Bayesian Networks and Junction Forests for Large Knowledge-Based Systems. Comput. Intell. 1993, 9, 171–220. [Google Scholar] [CrossRef]
  37. Oña García, A.L.; Sucar, L.E.; Morales, E.F. A Distributed Probabilistic Model for Fault Diagnosis. In Advances in Artificial Intelligence—IBERAMIA 2018; Simari, G.R., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J.A., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 42–53. [Google Scholar]
  38. Modi, P.J.; Shen, W.M. Collaborative multiagent learning for classification tasks. In Proceedings of the Fifth International Conference on Autonomous Agents, Montreal, QC, Canada, 28 May–1 June 2001; pp. 37–38. [Google Scholar]
  39. Wardeh, M.; Coenen, F.; Bench-Capon, T. Multi-agent based classification using argumentation from experience. Auton. Agents Multi Agent Syst. 2012, 25, 447–474. [Google Scholar] [CrossRef]
  40. Gorodetsky, V.; Karsaeyv, O.; Samoilov, V. Multi-agent technology for distributed data mining and classification. In Proceedings of the IEEE/WIC International Conference on Intelligent Agent Technology, Halifax, NS, Canada, 13–17 October 2003; pp. 438–441. [Google Scholar]
  41. Pan, R.; Peng, Y.; Ding, Z. Belief Update in Bayesian Networks Using Uncertain Evidence. In Proceedings of the 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’06), Arlington, VA, USA, 13–15 November 2006; pp. 441–444. [Google Scholar]
  42. Dung, P.M. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artif. Intell. 1995, 77, 321–357. [Google Scholar] [CrossRef] [Green Version]
  43. Bondarenko, A.; Toni, F.; Kowalski, R.A. An Assumption-Based Framework for Non-Monotonic Reasoning. In Second International Workshop on Logic Programming and Non-Monotonic Reasoning; MIT Press: Cambridge, MA, USA, 1993; pp. 171–189. [Google Scholar]
  44. Li, H.; Oren, N.; Norman, T.J. Probabilistic Argumentation Frameworks. In Theory and Applications of Formal Argumentation: First International Workshop, TAFA 2011, Barcelona, Spain, 16–17 July 2011; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–16. [Google Scholar]
  45. Rienstra, T.; Thimm, M.; Liao, B.; van der Torre, L. Probabilistic abstract argumentation based on scc decomposability. In Proceedings of the Sixteenth International Conference on Principles of Knowledge Representation and Reasoning, Tempe, AZ, USA, 30 October–2 November 2018. [Google Scholar]
  46. Prakken, H. Probabilistic strength of arguments with structure. In Proceedings of the Sixteenth International Conference on Principles of Knowledge Representation and Reasoning, Tempe, AZ, USA, 30 October–2 November 2018. [Google Scholar]
  47. Hunter, A. A probabilistic approach to modelling uncertain logical arguments. Int. J. Approx. Reason. 2013, 54, 47–81. [Google Scholar] [CrossRef]
  48. Keppens, J. Argument diagram extraction from evidential Bayesian networks. Artif. Intell. Law 2012, 20, 109–143. [Google Scholar] [CrossRef]
  49. Eva, B.; Hartmann, S. Bayesian argumentation and the value of logical validity. Psychol. Rev. 2018, 125, 806. [Google Scholar] [CrossRef] [PubMed]
  50. Riveret, R.; Baroni, P.; Gao, Y.; Governatori, G.; Rotolo, A.; Sartor, G. A labelling framework for probabilistic argumentation. Ann. Math. Artif. Intell. 2018, 83, 21–71. [Google Scholar] [CrossRef] [Green Version]
  51. Zenker, F. Bayesian Argumentation: The Practical Side of Probability. In Bayesian Argumentation; Springer: Dordrecht, The Netherlands, 2013; Volume 362, pp. 1–11. [Google Scholar]
  52. Hahn, U.; Oaksford, M.; Harris, A. Testimony and Argument: A Bayesian Perspective. In Bayesian Argumentation; Springer: Dordrecht, The Netherlands, 2013; Volume 362, pp. 15–38. [Google Scholar]
  53. Eva, B.; Hartmann, S. Supplemental Material for Bayesian Argumentation and the Value of Logical Validity. Psychol. Rev. 2018, 125, 806–821. [Google Scholar] [CrossRef] [PubMed]
  54. Prakken, H. A new use case for argumentation support tools: Supporting discussions of Bayesian analyses of complex criminal cases. Artif. Intell. Law 2018, 1–23. [Google Scholar] [CrossRef]
  55. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann: Burlington, MA, USA, 1988. [Google Scholar]
  56. Cerf, V. Abstraction, Federation, and Scalability. Internet Comput. IEEE 2013, 17, 96-c3. [Google Scholar] [CrossRef]
  57. Koiter, J.R. Visualizing Inference in Bayesian Networks. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2006. [Google Scholar]
  58. Nikulin, M. Hellinger distance. In Encyclopaedia of Mathematics; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2002. [Google Scholar]
  59. Allweyer, T. BPMN 2.0: Introduction to the Standard for Business Process Modeling; BoD–Books on Demand: Norderstedt, Germany, 2016. [Google Scholar]
  60. Benjamins, R. Problem-Solving Methods for Diagnosis and their Role. Int. J. Expert Syst. Res. Appl. 1995, 8, 93–120. [Google Scholar]
  61. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
Figure 1. Overview of Bayesian Argumentative agents in federated domains.
Figure 2. Phases of the Coordination Protocol.
Figure 3. Coalition Formation Phase.
Figure 4. Argumentation Phase.
Figure 5. Conclusion Phase.
Figure 6. Friedman test scores for the best classifiers by uncertainty level.
Table 1. Summary of the considered classification techniques.

Classification Technique                     Acronym       Software Library
J48 - implementation of C4.5 algorithm       J48           WEKA
Logical Analysis of Data (LAD) Tree          LADTree       WEKA
Pruning Rule-based Classification Tree       PART          WEKA
Sequential Minimal Optimization              SMO           WEKA
Naive Bayes Tree                             NBTree        WEKA
Bayesian Search                              BayesSearch   SMILE
Bayesian ARgumentative Multi-Agent System    BARMAS        SMILE + MASON
Table 2. Datasets' summary.

Dataset                                                                          # of Instances   # of Classes   # of Attributes
Network (Private dataset with Telefónica O2 Czech Republic rights.)              1183             15             27
Zoo (http://sci2s.ugr.es/keel/dataset.php?cod=69)                                101              7              16
Solar Flare (http://sci2s.ugr.es/keel/dataset.php?cod=98)                        1066             6              11
Marketing (http://sci2s.ugr.es/keel/dataset.php?cod=163)                         8933             9              13
Nursery (http://sci2s.ugr.es/keel/dataset.php?cod=103)                           12690            5              9
Mushroom (http://archive.ics.uci.edu/ml/datasets/Mushroom)                       8124             2              22
Chess (http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29)    28056            18             6
Table 3. Results without uncertainty (all available data). Best results are marked in bold. Columns BARMAS-2, BARMAS-3 and BARMAS-4 denote BARMAS with two, three and four Argumentative agents, respectively.

Dataset         BARMAS-2   BARMAS-3   BARMAS-4   BayesSearch   J48    LADTree   NBTree   PART   SMO
Network         0.16       0.16       0.16       0.18          0.13   0.14      0.14     0.14   0.12
Zoo             0.00       0.03       0.00       0.02          0.02   0.01      0.01     0.02   0.01
Solar Flare     0.29       0.26       0.26       0.36          0.26   0.27      0.26     0.29   0.26
Marketing       0.71       0.70       0.70       0.71          0.70   0.67      0.68     0.70   0.66
Nursery         0.07       0.08       0.09       0.07          0.01   0.08      0.03     0.01   0.07
Mushroom        0.01       0.00       0.01       0.01          0.00   0.00      0.00     0.00   0.00
Chess           0.50       0.62       0.64       0.61          0.42   0.69      0.39     0.46   0.56
Friedman Test   5.93       6.07       6.00       7.50          3.50   5.14      3.07     4.57   3.21
Ranking         6          8          7          9             3      5         1        4      2
Table 4. Results with 25% missing attributes. Best results are marked in bold.

Dataset         BARMAS-2   BARMAS-3   BARMAS-4   BayesSearch   J48    LADTree   NBTree   PART   SMO
Network         0.16       0.17       0.16       0.17          0.14   0.14      0.15     0.15   0.18
Zoo             0.09       0.12       0.11       0.11          0.43   0.40      0.16     0.43   0.41
Solar Flare     0.58       0.56       0.60       0.62          0.68   0.67      0.58     0.68   0.79
Marketing       0.70       0.70       0.70       0.71          0.70   0.73      0.67     0.70   0.69
Nursery         0.25       0.25       0.26       0.25          0.24   0.27      0.24     0.26   0.32
Mushroom        0.01       0.01       0.01       0.03          0.26   0.52      0.04     0.23   0.23
Chess           0.64       0.73       0.77       0.68          0.62   0.86      0.64     0.68   0.75
Friedman Test   3.21       4.21       4.79       5.07          4.71   6.93      3.00     6.00   7.07
Ranking         2          3          5          6             4      8         1        7      9
Table 5. Results with 50% missing attributes. Best results are marked in bold.

Dataset         BARMAS-2   BARMAS-3   BARMAS-4   BayesSearch   J48    LADTree   NBTree   PART   SMO
Network         0.22       0.22       0.21       0.24          0.23   0.23      0.28     0.23   0.30
Zoo             0.13       0.18       0.19       0.16          0.51   0.59      0.28     0.53   0.60
Solar Flare     0.63       0.61       0.61       0.65          0.69   0.69      0.63     0.69   0.80
Marketing       0.72       0.72       0.71       0.72          0.72   0.89      0.73     0.72   0.83
Nursery         0.27       0.27       0.27       0.27          0.27   0.27      0.27     0.28   0.32
Mushroom        0.04       0.04       0.03       0.05          0.32   0.52      0.22     0.41   0.21
Chess           0.72       0.74       0.77       0.74          0.72   0.87      0.73     0.74   0.78
Friedman Test   2.71       3.21       2.79       4.43          4.93   7.29      5.21     6.29   8.14
Ranking         1          3          2          4             5      8         6        7      9
Table 6. Friedman test scores with average uncertainty levels. Best results are marked in bold.

                BARMAS-2   BARMAS-3   BARMAS-4   BayesSearch   J48    LADTree   NBTree   PART   SMO
Friedman Test   2.96       3.71       3.78       4.75          4.82   7.10      4.10     6.14   7.61
Ranking         1          2          3          5             6      8         4        7      9
