3.1.2. Method to Minimize Depth
Assume T is a nonempty decision table containing features. The values may be computed using the DAG . Let’s say t is . We compute the value for each vertex of to determine . It is easier for us to take into account not only subtables corresponding to vertices of , but also (empty subtables of T), as well as (subtables that contain only a single row r of T but are not considered vertices of ). The work of our method starts with terminal vertices of (vertices without exiting arcs) and these unique subtables. Then we continue moving up to the table T one step at a time.
Assume is equal to for a row r of T or a terminal vertex of . The decision tree for is the one that has a single vertex tagged by a common decision for , which means that . If , the decision tree for will be only one vertex labeled with 0 and in this case, .
Assume is a nonterminal vertex of such that is already known for each child of . With the help of this information, we can determine the decision tree’s minimum depth for . Regarding the root of this tree, we can have the following scenarios:
A feature from is used to mark the root; the decision tree’s minimum depth is indicated by the symbol .
a hypothesis over T is used to mark the root; the decision tree’s minimum depth is indicated by the symbol .
a proper hypothesis over T is used to mark the root; the decision tree’s minimum depth is indicated by the symbol .
The set is not empty because is nondegenerate. Here are three methods for calculating the values , , and , respectively.
Consider a decision tree’s root labeled with
where the
is a decision tree for
. For each
, an edge exits the root and enters a vertex,
. On this edge, the equation system
is labeled. The vertex
is the root of a decision tree of form
t for
, and its depth is
. Clearly
Because
for any
,
For each , it is evident that the subtable is ’s child in the , therefore ’s value is known.
With the root labeled with the feature and decision trees of form t used for the subtables corresponding to the root’s children, it can be shown that is the minimum depth of a decision tree for .
The fact that there is a
b in
for each such feature and that
implies that we cannot build an optimal decision tree for
means that we should not consider these features. Thus, the depth, in this case, will be
The procedure of Computation. Create the collection of features
. Calculate the value
for each feature
using (
1). Determine
with reference to (
2).
Note that, this procedure’s time complexity is polynomial based on table T.
We consider only admissible hypothesis over T. Assume is such a hypothesis. It is admissible for and a feature in if is not the same as for any . It is not admissible for and a feature iff and . Such a hypothesis is referred to as admissible for if it is admissible for both and any feature .
Assume a decision tree’s root is tagged by the hypothesis
(admissible for
) where the
is a decision tree for
. The collection of responses to the question associated with
H is
. There must be an edge that departs from
’s root and reaches a vertex
for each
S in
. The equation system
S is written on this edge. The decision tree of type
t for
has this
as its root, and its depth is
. Clearly
Note that,
if
or
for any row
r of
T.
is equal to empty set whenever
because
H is admissible for
. It is evident that
for any feature
and any
where
. In this case,
. Hence
Obviously for any and any , is a child of in . Therefore, is already known.
With the root labeled by the hypothesis H and decision trees of form t used for the subtables corresponding to the root’s children, it can be shown that is the minimum depth of a decision tree for .
Not admissible for hypotheses should not be taken into consideration.
The procedure of Computation. Create the hypothesis:
for
. Assume that
in
. The sole number in the set
is then equal to
. Let’s say
. If so,
where
is the lowest number from
. There is no doubt that
is admissible for
. Calculate the value of
using (
3). It is clear from a quick look at (
3) that
.
Note that, this procedure’s time complexity is polynomial based on table T.
The procedure of Computation. Create a proper hypothesis
corresponding to every row
and determine if this proper hypothesis is admissible for
. Calculate the value of
using (
3).
is equal to the minimum of the obtained numbers.
Note that this procedure’s time complexity is polynomial based on table T.
Now, an algorithm (Algorithm 2) is described below that determines . It is the minimum depth of a decision tree with the provided form t (where, ) for the table T. As the algorithm executes, we learn for each vertex in .
Algorithm 2: (computation of ). |
Input: The DAG for a nonempty table T. |
Output: The value of the depth . |
The algorithm will end if all vertices of the DAG have numbers associated to them and return the number associated to the vertex T. In the absence of such a case, we select a vertex that does not have an assigned number. This vertex can be a terminal vertex of or a nonterminal vertex of where every child contains the numbers that are assigned to it. If ( is a terminal vertex) { assign the value to and move on to step 1. } else { perform the following based on the value t: If ( ) { calculate the value and set . } else if ( ) { calculate the value and set . } else if ( ) { calculate the value and and set . } else if ( ) { calculate the value and set . } else if ( ) { calculate the value and and set . } } Move on to step 1.
|