4.1. Ontology Building
The ontology development in our work is carried out in several rounds of correction and optimization. In each round, the development follows the method proposed by Natalya F. Noy et al. [16].
Figure 3 summarizes the ontology we built, including entity types and semantic relations. The whole flight control system includes five kinds of entities: Equipment, Computer, System, Power, and Function (Figure 3a). The Equipment class represents physical units used in the system, such as "elevator", "rudder", "flap", etc. The Computer class includes computers leveraged in flight control for calculations and issuing commands, such as "slat flap control computer", "elevator aileron computer", "sec", etc. The System class represents different kinds of subsystems, such as "centralized fault display system", "mechanical trim system", and "efcs". The Power class represents power sources, e.g., "electrical motor", "hydraulic actuator", etc. Finally, the Function class represents various functions that can be achieved in the flight control system, e.g., "turn coordination", "yaw damping", "aileron droop", "pitch control", "roll control", and "yaw control". Based on the five entity classes, we define relations according to the semantics between different entities.
Figure 3b leverages directed lines to depict all possible relations that appear between pairs of entity classes. The defined relations are consistOf, control, achieve, sendMsgTo, connectTo, locateOn, drive, and acronym. ConsistOf expresses that one unit is composed of several other units. Control represents one unit controlling multiple other units; for example, a dedicated computer controls some mechanical components. Achieve expresses that one component achieves a kind of function. SendMsgTo defines one component sending commands or messages to other components. LocateOn reflects a positional relationship between components. Drive represents a power source driving a unit. Finally, acronym is used to express an entity's acronym.
4.3. Named Entity Recognition and Relation Extraction
In this section, we elaborate on our approach to named entity recognition and relation extraction. As the flight control system is domain-specific, few prior experiences are available for reference. We adopt the basic pipeline paradigm as a first milestone.
By carefully studying the annotated text, we note that different entity classes have different features. The entity class Power is special: its entities always appear in regular forms that can be described by specific rules. Unfortunately, not all entity classes show such regularity. Hence, we adopt a hybrid method that first leverages a rule-based algorithm to recognize the entities belonging to Power and then a learning-based algorithm to recognize the remaining entities.
The rule-based entity recognition method is designed based on regular expressions, which are logical formulas for string operations. A regular expression uses predefined characters and their combinations to form a pattern string. The pattern string acts as a filter used to retrieve or replace the text conforming to the pattern.
Figure 5 is the pseudo-code of the rule-based entity recognition algorithm dedicated to the Power class. For each input sentence, we first obtain the POS (part of speech) of each token (line 3) by NLTK (natural language toolkit) [42], a well-known toolkit for natural language processing. We then leverage regular expressions to identify all noun phrases (line 4). The pattern appearing in line 4 is defined as <NN.*|JJ>*<NN.*> ∨ <JJ>*<CC><JJ>*<NN.*>, where "NN" means a noun, "." represents any single character (except line-break characters), "*" means the token or token sequence in <> before "*" can appear multiple times, "|" expresses logical or, "JJ" means an adjective, and "CC" refers to a conjunction. The first regular expression handles the case in which a sequence of nouns or adjectives describes a key noun phrase, while the second one handles the scenario in which two sequences of adjectives concatenated by a conjunction describe a key noun phrase. The two rules are alternatives, i.e., matching either one of them is enough. Finally, we apply a simple suffix match on "motor" or "actuator" to decide whether a noun phrase belongs to the Power class (lines 6–9). Let us consider the sentence "One valve block is given for each hydraulic motor" as an example. The set of noun phrases returned from line 4 is {"valve block", "hydraulic motor"}. Since "hydraulic motor" ends with "motor", it is an entity of Power.
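As a concrete illustration, the noun-phrase rule and the suffix check can be sketched in plain Python over a POS-tagged sentence. This is a simplification of the algorithm in Figure 5: the tag sequence is encoded as a string and matched with `re`, whereas the paper applies the pattern via NLTK's machinery.

```python
import re

# Sketch of the rule-based Power-entity recognizer (Figure 5).
# Assumes the sentence is already POS-tagged (e.g., by nltk.pos_tag);
# the paper's pattern <NN.*|JJ>*<NN.*> v <JJ>*<CC><JJ>*<NN.*>
# is applied over a string encoding of the tag sequence.
PATTERN = re.compile(
    r"(?:<(?:NN[^>]*|JJ)>)*<NN[^>]*>"       # rule 1: (NN|JJ)* NN
    r"|(?:<JJ>)*<CC>(?:<JJ>)*<NN[^>]*>"     # rule 2: JJ* CC JJ* NN
)

def noun_phrases(tagged):
    tag_str = "".join(f"<{tag}>" for _, tag in tagged)
    phrases = []
    for m in PATTERN.finditer(tag_str):
        # map character offsets back to token indices
        start = tag_str[:m.start()].count("<")
        end = start + m.group().count("<")
        phrases.append(" ".join(w for w, _ in tagged[start:end]))
    return phrases

def power_entities(tagged):
    # lines 6-9 of Figure 5: keep phrases ending in "motor"/"actuator"
    return [np for np in noun_phrases(tagged)
            if np.endswith("motor") or np.endswith("actuator")]

tagged = [("One", "CD"), ("valve", "NN"), ("block", "NN"),
          ("is", "VBZ"), ("given", "VBN"), ("for", "IN"),
          ("each", "DT"), ("hydraulic", "JJ"), ("motor", "NN")]
print(noun_phrases(tagged))    # ['valve block', 'hydraulic motor']
print(power_entities(tagged))  # ['hydraulic motor']
```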
The entity classes beyond Power are not easy to recognize with manually defined rules, so we propose a machine-learning-based algorithm to handle them. Entity recognition is now modeled as a sequence annotation problem. Let S = (w_1, w_2, …, w_n) express a sentence, where each w_i is a token (word). Our target is to build a model f such that f(S) = Y' ≈ Y, where Y' is the annotation sequence output by the model f and Y is the real annotation sequence.
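The sequence annotation view can be illustrated with a toy labeled sentence. The BIO label scheme below is our assumption for illustration; the paper only states that each token receives an annotation.

```python
# Toy illustration of entity recognition as sequence annotation.
# Each token of S is paired with a label in Y; the BIO scheme
# (B- begins an entity, I- continues it, O is outside) is assumed.
S = ["the", "slat", "flap", "control", "computer", "issues", "commands"]
Y = ["O", "B-Computer", "I-Computer", "I-Computer", "I-Computer", "O", "O"]

def decode_entities(tokens, labels):
    # Recover (entity_text, entity_class) pairs from a label sequence
    entities, current, cls = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                entities.append((" ".join(current), cls))
            current, cls = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), cls))
            current, cls = [], None
    if current:
        entities.append((" ".join(current), cls))
    return entities

print(decode_entities(S, Y))  # [('slat flap control computer', 'Computer')]
```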
The entity recognition model f we designed is based on a popular strategy combining a Transformer encoder with a CRF (conditional random field) layer [24,43]. Figure 6 provides an overview of the recognition model. The Transformer encoder includes a position encoding layer and an encoder that mainly consists of multi-head self-attention and a feed-forward fully connected layer. Meanwhile, residual connections and normalization are used to connect the outputs of different hidden layers. After obtaining the output of the Transformer encoder, we leverage a linear layer and a CRF layer to output the sequence of predicted labels.
For each sentence, the Transformer takes its embedding as input, which is a tensor consisting of the embedding of every token. To incorporate the characteristics of the flight control system, we carefully customize the word embedding by adopting information from POS, domain-specific phrases, and whether a token is a high-frequency word. Specifically, there are five steps:
Step (1): The model word2vec [44] is used to obtain an embedding for every token w_i. This process is represented by Equation (1), where e_i^w is the embedding obtained.
Step (2): As entities are all noun phrases, we take POS into the embedding. For a sentence S, we use the toolkit NLTK to obtain a corresponding sequence of POS tags P = (p_1, p_2, …, p_n). The embedding related to POS is represented by Equation (2), where e_i^p is the output embedding.
Step (3): We consider domain-specific phrases and add such information into the embedding. Let D include the domain-specific phrases in the dictionary of terms. For a token w_i, if w_i matches one domain-specific phrase d ∈ D, then e_i^d obtained from Equation (3) is the embedding of the domain-specific phrase. If w_i is not a domain-specific phrase, we leverage a default embedding.
Step (4): We leverage the information of whether w_i is a high-frequency word. If w_i appears with high frequency, we denote it as "True"; otherwise, we denote it as "False". For a sentence S, we can obtain a corresponding sequence F = (f_1, f_2, …, f_n), where each f_i ∈ {True, False}. For each token w_i, the embedding e_i^f obtained from Equation (4) refers to the high-frequency information.
Step (5): Finally, all the partial embedding vectors are concatenated together as a complete embedding for the word w_i (Equation (5)).
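The five steps reduce to a concatenation of four partial vectors per token. The sketch below uses toy dimensions and hand-picked values purely for illustration; the real embeddings come from Equations (1)–(4).

```python
# Hypothetical sketch of Step (5): concatenating the four partial
# embeddings of a token into one feature vector.
# All dimensions and values below are illustrative assumptions.

def concat_embeddings(e_word, e_pos, e_domain, e_freq):
    # e_i = [e_i^w ; e_i^p ; e_i^d ; e_i^f]
    return e_word + e_pos + e_domain + e_freq

# Toy lookups standing in for Equations (1)-(4)
e_word   = [0.2, -0.1, 0.5]   # word2vec embedding (Eq. 1)
e_pos    = [1.0, 0.0]         # POS embedding, e.g., for "NN" (Eq. 2)
e_domain = [0.0, 1.0]         # domain-phrase embedding (Eq. 3)
e_freq   = [1.0]              # high-frequency flag embedding (Eq. 4)

e = concat_embeddings(e_word, e_pos, e_domain, e_freq)
print(len(e))  # 8-dimensional token embedding
```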
For a sentence S, the embedding of each token is organized as a tensor, which is the input of the Transformer encoder. The linear and CRF layers are used for label prediction. The linear operation performs a dimension transformation whose output is a sequence of vectors (v_1, v_2, …, v_n) corresponding to the n tokens of sentence S. The CRF takes charge of scoring each candidate label sequence. For example, consider a candidate label sequence y = (y_0, y_1, …, y_n, y_{n+1}), where y_0 and y_{n+1} are two auxiliary labels representing, respectively, the starting and ending tags of the sentence. The score of a candidate label sequence is obtained by Equation (6), where P_{i,y_i} is the probability that w_i's label is y_i, and T is a probability transition matrix whose dimension covers the label set together with the two auxiliary tags.
Finally, a Softmax over all possible label sequences yields the probability (Equation (7)). The candidate label sequence with the highest probability is selected as the output.
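With toy emission and transition scores, the scoring of Equation (6) and the Softmax of Equation (7) can be reproduced by enumerating all candidate label sequences. All numbers and the two-label tag set are assumptions for illustration; exhaustive enumeration is only feasible for tiny label sets, and real CRF implementations use dynamic programming instead.

```python
import math
from itertools import product

LABELS = ["O", "B-Power"]      # toy label set (assumed)
START, END = "<s>", "</s>"     # auxiliary start/end tags

# Toy emission scores P[i][label] for a 2-token sentence
P = [{"O": 0.7, "B-Power": 0.3},
     {"O": 0.2, "B-Power": 0.8}]

# Toy transition scores T[(from, to)], covering START and END
T = {(START, "O"): 0.5, (START, "B-Power"): 0.5,
     ("O", "O"): 0.4, ("O", "B-Power"): 0.6,
     ("B-Power", "O"): 0.7, ("B-Power", "B-Power"): 0.3,
     ("O", END): 0.5, ("B-Power", END): 0.5}

def score(seq):
    # Eq. (6): sum of emission scores plus transition scores,
    # with the auxiliary start/end tags padding the sequence
    padded = [START] + list(seq) + [END]
    s = sum(P[i][y] for i, y in enumerate(seq))
    s += sum(T[(a, b)] for a, b in zip(padded, padded[1:]))
    return s

# Eq. (7): Softmax over all candidate label sequences
cands = list(product(LABELS, repeat=len(P)))
exps = {c: math.exp(score(c)) for c in cands}
Z = sum(exps.values())
probs = {c: e / Z for c, e in exps.items()}
best = max(probs, key=probs.get)
print(best)  # ('O', 'B-Power')
```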
Relation extraction is the task of predicting the relation between two entities. Given the importance of global semantic information, we adopt a Bi-LSTM + self-attention mechanism [32] to perform relation extraction. The attention mechanism adopted here aims at effectively capturing key information. The schematic diagram of relation extraction is depicted in Figure 7.
The input is a sentence S together with two entities e_h and e_t. The model predicts whether there exists in the sentence S a pre-defined relation from e_h to e_t, i.e., for such a relation, e_h and e_t are, respectively, the head and tail entities. Different from [32], for each token w_i in S, we obtain its embedding by concatenating e_i^w, e_i^p, e_i^h, and e_i^t. e_i^w and e_i^p are, respectively, from Equations (1) and (2). e_i^h and e_i^t are relative position embeddings obtained by Equations (8) and (9), based on the positions of w_i relative to the given head and tail entities, respectively. For a sentence S, we can thus obtain two relative position sequences. For example, suppose a sentence S is "the rudder does the yaw control". The head and tail entities here are, respectively, "yaw control" and "rudder". There are six tokens in S; taking each token's signed distance to the entity span (0 inside the span), the position sequence relative to "yaw control" is (−4, −3, −2, −1, 0, 0), and the position sequence relative to "rudder" is (−1, 0, 1, 2, 3, 4). The embeddings are then put into the Bi-LSTM with the self-attention model to extract features. Finally, a Softmax layer generates the output.
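The relative position sequences of the example can be computed as follows. The convention of signed distance to the nearest token of the entity span (0 inside the span) is an assumption; the paper does not spell out the exact offset scheme of Equations (8) and (9).

```python
# Sketch of the relative-position features behind Eqs. (8)-(9):
# each token's signed distance to the head/tail entity span,
# 0 for tokens inside the span (convention assumed).

def relative_positions(tokens, entity):
    span = len(entity)
    # locate the entity's token span in the sentence
    start = next(i for i in range(len(tokens))
                 if tokens[i:i + span] == entity)
    end = start + span - 1
    return [0 if start <= i <= end
            else (i - start if i < start else i - end)
            for i in range(len(tokens))]

S = "the rudder does the yaw control".split()
head = ["yaw", "control"]   # head entity
tail = ["rudder"]           # tail entity
print(relative_positions(S, head))  # [-4, -3, -2, -1, 0, 0]
print(relative_positions(S, tail))  # [-1, 0, 1, 2, 3, 4]
```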
Note that we do not simply use the above model to directly perform relation extraction, because we confront a non-negligible sample imbalance problem. For example, in the sentence "The left blue and right green actuators are controlled by ELAC 1, and the other two actuators by ELAC 2", there are four named entities: "left blue and right green actuators", "ELAC 1", "actuators", and "ELAC 2". Among the four entities, there are 12 ordered tuples. However, only two of them have a specific semantic relation: 〈"ELAC 1", control, "left blue and right green actuators"〉 and 〈"ELAC 2", control, "actuators"〉. For the other 10 tuples, no relation exists. We take a non-relation as a negative sample. Clearly, the number of negative samples dominates the dataset. In order to relieve the imbalance problem, we present a two-stage relation extraction process leveraging the model in Figure 7. During the first stage, we train a particular relation extraction model that only takes charge of a binary classification: deciding whether there exists a defined relation between two given entities. During the second stage, we train another model to differentiate the concrete relation types. Note that in the second stage it is not necessary to consider the empty relation again, such that the imbalance problem is relieved.
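The two-stage process can be sketched as follows. The stand-in classifiers below are toy callables defined only for this example; in the paper, both stages are instances of the Bi-LSTM + self-attention model of Figure 7.

```python
# Hedged sketch of the two-stage relation extraction pipeline.
# stage1: binary classifier -- does any defined relation hold?
# stage2: multi-class classifier over the concrete relation types,
#         applied only to pairs that pass stage 1.

def extract_relation(sentence, head, tail, stage1, stage2):
    if not stage1(sentence, head, tail):   # filter negative pairs
        return None                        # no relation
    return stage2(sentence, head, tail)    # concrete relation type

# Toy stand-in models built from the paper's worked example
known = {("ELAC 1", "left blue and right green actuators"): "control",
         ("ELAC 2", "actuators"): "control"}
stage1 = lambda s, h, t: (h, t) in known
stage2 = lambda s, h, t: known[(h, t)]

s = ("The left blue and right green actuators are controlled by "
     "ELAC 1, and the other two actuators by ELAC 2")
print(extract_relation(s, "ELAC 1", "left blue and right green actuators",
                       stage1, stage2))  # control
print(extract_relation(s, "ELAC 1", "ELAC 2", stage1, stage2))  # None
```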