In this section, we propose to enhance the partitioning strategy of NoNN in two aspects to make it more resilient to communication failures.
4.1. Averaging
Although it has been argued that NoNN itself is resilient to communication failures, it is only so under limited information loss. It was observed in [12] that the inference accuracy starts to decrease significantly once more than three student networks become unavailable. This is attributed to the monolithic one-to-one relationship between the parts and student networks in NoNN. That is, an important part is more likely to be missing if it is reflected in only one student network. As a remedy to this problem, we propose to use parts in a more redundant way when training student networks. Therefore, in the averaging technique, we keep the number of parts ($M$) less than or equal to half of the number of student networks ($S$) to be derived, i.e., $M \le S/2$. Then, the $i$th student network (among $S$) is trained by the $j$th part (among $M$), where $j = i \bmod M$.
In the example illustrated in Figure 6, part 0 is used to train student networks 0 and 2, while part 1 is used for the others. In this way, each part is used to derive at least two student networks.
If a student network is trained from a part with $F$ filters, its inference result is represented as an $F \times 1$ vector. When merging the partial results to conclude the final inference result, we collect the results from the student networks that are trained from the same part and output the average of them, as illustrated in Figure 6. In this example, the results from student networks 0 and 2 are combined by averaging into an $F_0 \times 1$ vector, and another $F_1 \times 1$ vector is obtained in the same way from student networks 1 and 3. At last, the final inference result is represented as a concatenation of them, i.e., an $(F_0 + F_1) \times 1$ vector.
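As a minimal sketch of this merging scheme (our own illustration; the function names, array shapes, and the use of NumPy are assumptions, not the paper's implementation), the part assignment and the averaging-based merge can be written as follows:

```python
import numpy as np

def assign_parts(S, M):
    """Assign the i-th of S student networks to part j = i mod M."""
    return [i % M for i in range(S)]

def merge_outputs(outputs, assignment, M):
    """Average the outputs of students that share a part, then concatenate.

    outputs    : list of 1-D arrays; outputs[i] is student i's result,
                 or None if that device was unreachable.
    assignment : assignment[i] is the part index of student i.
    M          : number of parts.
    """
    merged = []
    for j in range(M):
        # All reachable students trained on part j contribute to the average.
        group = [o for o, p in zip(outputs, assignment)
                 if p == j and o is not None]
        merged.append(np.mean(group, axis=0))
    return np.concatenate(merged)

# Example with S = 4 students and M = 2 parts, as in Figure 6.
assignment = assign_parts(4, 2)      # [0, 1, 0, 1]
outputs = [np.random.rand(30) for _ in range(4)]
outputs[2] = None                    # student 2 lost; part 0 still covered by student 0
result = merge_outputs(outputs, assignment, 2)   # a (30 + 30)-element vector
```

Because every part is covered by at least two students, losing one device leaves the average over the remaining students of that part, instead of leaving the corresponding slice of the final vector empty.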
4.2. Eigenvector-Based Partitioning
Note in Equation (7) that the correlation between filters in fconv is significantly compromised in NoNN. A similar partitioning problem exists in the analysis of social networks and, in that domain, Newman [19] proposed a partitioning approach that groups the vertices with strong correlation into the same group. It partitions a given graph into two exclusive subsets and, in order to quantitatively evaluate the partitioning, a quality factor $Q$ is defined as follows for a graph with $N$ vertices:

$$Q = \frac{1}{4m} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j, \qquad (8)$$

where $A_{ij}$ is the edge weight between the $i$th and $j$th vertices, $k_i$ is the degree of the $i$th vertex, $m$ is the total number of edges, and $\mathbf{s} = [s_1, \ldots, s_N]^T$ with $s_i \in \{+1, -1\}$ denotes the binary partitioning decision for all $N$ vertices in the graph. Thus, if the $i$th and $j$th vertices are partitioned into the same group, we have $s_i s_j = 1$, while $s_i s_j$ becomes $-1$ if the two vertices are separated. Then, the quality factor $Q$ can be rewritten as follows with $B$ being an $N \times N$ matrix ($B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$), based on Eigenvector decomposition (an Eigenvector of a $z \times z$ square matrix $Y$ is defined as a nonzero $z \times 1$ vector $x$ that satisfies $Yx = \lambda x$, where the scalar value $\lambda$ is called an Eigenvalue; there can exist at most $z$ pairs of Eigenvector $x$ and Eigenvalue $\lambda$ for a $z \times z$ matrix):

$$Q = \frac{1}{4m} \mathbf{s}^T B \mathbf{s} \qquad (9)$$
$$Q = \frac{1}{4m} \sum_{i=1}^{N} \beta_i \left( \mathbf{u}_i^T \mathbf{s} \right)^2, \qquad (10)$$

where $\mathbf{u}_i$ and $\beta_i$ denote the $i$th Eigenvector and Eigenvalue of $B$, respectively, with $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_N$.
It has been shown in [19] that maximizing the quality factor $Q$ in Equation (9) is equivalent to choosing $\mathbf{s}$ along the Eigenvector that corresponds to the largest Eigenvalue of $B$ in Equation (10).
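As a quick numerical sanity check of this derivation (our own illustration, not material from [19]), the three forms of $Q$ in Equations (8)-(10) agree on a toy graph:

```python
import numpy as np

# A 4-vertex graph: a triangle {0, 1, 2} with a pendant vertex 3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
k = A.sum(axis=1)                  # vertex degrees k_i
m = k.sum() / 2.0                  # total number of edges m
B = A - np.outer(k, k) / (2 * m)   # modularity matrix B
s = np.array([1, 1, 1, -1])        # one binary partitioning decision

Q_sum = (1 / (4 * m)) * np.sum(B * np.outer(s, s))        # Equation (8)
Q_mat = (1 / (4 * m)) * (s @ B @ s)                       # Equation (9)
vals, vecs = np.linalg.eigh(B)                            # Eigenpairs of B
Q_eig = (1 / (4 * m)) * np.sum(vals * (vecs.T @ s) ** 2)  # Equation (10)
assert np.allclose([Q_sum, Q_mat], Q_eig)                 # all three forms agree
```

Maximizing $Q$ over all $2^N$ choices of $\mathbf{s}$ is intractable, which is why [19] relaxes the problem and splits the vertices by the signs of the leading Eigenvector's entries.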
In the proposed partitioning approach, we borrow the basic Eigenvector-based partitioning presented in [19]. However, this Eigenvector-based decomposition has two limitations when applied as-is to the target partitioning problem of distributed NNs. First, it partitions the vertices into only two subsets, whereas the partitioning for distributed NNs should generalize to an arbitrary number of devices. Second, the sizes of the two partitioned subsets may differ significantly; the Eigenvector-based decomposition takes only the connectivity between the vertices into consideration in maximizing $Q$, which can result in considerably unbalanced sizes between the partitioned subsets. In order to tackle these issues, we propose to apply the basic partitioning algorithm recursively and adaptively, as will be discussed in what follows.
Algorithm 1 delineates how we revise the basic Eigenvector-based partitioning that partitions a given filter group into two. The modified partitioning procedure, E-PARTITION, takes three arguments as input: a set of filters to be partitioned ($F$), an adjacency matrix ($A$), and the desired minimum number of filters for a single part ($N_{min}$). As the Eigenvector-based partitioning divides the original group into two subgroups, a partitioning result is characterized by the following: the two divided sets of filters ($F_0$ and $F_1$), the corresponding adjacency matrices for them ($A_0$ and $A_1$), and the quality factor of the partitioning (Eigenvalue) $\beta$. Therefore, the return value of E-PARTITION is a tuple $(F_0, F_1, A_0, A_1, \beta)$. The working flow of the proposed algorithm is as follows.
Algorithm 1 Proposed modified Eigenvector-based partitioning. |
1: | $F$: a set of $N$ filters to be partitioned (input argument) | |
2: | $A$: $N \times N$ adjacency matrix (input argument) | |
3: | $N_{min}$: desired minimum number of filters in a part (input argument) | |
4: | $F_0, F_1$: two partitioned filter subsets (return value) | |
5: | $A_0, A_1$: two adjacency matrices for $F_0, F_1$ (return value) | |
6: | $\beta$: quality factor of the resultant partitioning, i.e., Eigenvalue (return value) | |
7: | | |
8: | procedure E-PARTITION($F$, $A$, $N_{min}$) | |
9: | $F_0 \leftarrow \emptyset$, $F_1 \leftarrow \emptyset$; | ▹ Initialization of return values |
10: | $A_0 \leftarrow \emptyset$, $A_1 \leftarrow \emptyset$; | |
11: | $\beta \leftarrow -\infty$; | |
12: | | |
13: | if $|F| < 2 \times N_{min}$ then | ▹ If too small to be partitioned into two subgroups |
14: | return $(F_0, F_1, A_0, A_1, \beta)$; | ▹ Not valid |
15: | end if | |
16: | | |
17: | $\Lambda \leftarrow$ EigenSearch($A$); | ▹ Ordered list of Eigenvectors/values |
18: | for $i$ in 0 to $|\Lambda| - 1$ do | ▹ For all possible Eigenvectors |
19: | $(x, \lambda) \leftarrow \Lambda[i]$; | |
20: | $F_0 \leftarrow \emptyset$; | |
21: | $F_1 \leftarrow \emptyset$; | |
22: | $N \leftarrow$ number of rows in $A$; | ▹ $A$ is an $N \times N$ matrix |
23: | for $j$ in 0 to $N - 1$ do | |
24: | if $x[j] > 0$ then | |
25: | $F_0 \leftarrow F_0 \cup \{F[j]\}$; | |
26: | else | |
27: | $F_1 \leftarrow F_1 \cup \{F[j]\}$; | |
28: | end if | |
29: | end for | |
30: | if min($|F_0|$, $|F_1|$) $\ge N_{min}$ then | ▹ Check if valid |
31: | $\beta \leftarrow \lambda$; | |
32: | break; | |
33: | end if | |
34: | end for | |
35: | | |
36: | for $i$ in 0 to 1 do | ▹ For the two partitioned subsets, $F_0$ and $F_1$ |
37: | for $j$ in 0 to $|F_i| - 1$ do | |
38: | $f_j \leftarrow$ the $j$th element of $F_i$; | |
39: | $r \leftarrow$ the $f_j$th row of $A$; | ▹ Calculate $A_i$'s $j$th row |
40: | for $k$ in 0 to $|F_i| - 1$ do | |
41: | $f_k \leftarrow$ the $k$th element of $F_i$; | |
42: | $A_i[j][k] \leftarrow r[f_k]$; | ▹ Calculate $A_i$ |
43: | end for | |
44: | end for | |
45: | end for | |
46: | | |
47: | return $(F_0, F_1, A_0, A_1, \beta)$; | ▹ Return the best possible partitioning result |
48: | end procedure | |
After initializing the return values (lines 9–11 of Algorithm 1), it first checks whether the input filter group is big enough to be partitioned into two with respect to the desired size, i.e., the minimum number of filters in a partitioned group (lines 13–15 of Algorithm 1). If the size of the input set, $|F|$, is less than $2 \times N_{min}$, the procedure immediately returns and no partitioning is performed. Otherwise, all possible Eigenvector and Eigenvalue pairs are calculated and stored in $\Lambda$ in a descending order of their Eigenvalues by invoking EigenSearch (line 17 of Algorithm 1). For example, after line 17 of Algorithm 1, $\Lambda[0] = (x_0, \lambda_0)$ holds the biggest possible Eigenvalue and its corresponding Eigenvector for the matrix $A$. Then, in lines 18–34 of Algorithm 1, it examines all possible Eigenvector candidates in a descending order of their Eigenvalues to see if the partitioning derived from the current Eigenvector satisfies the size requirement enforced by $N_{min}$. To be more specific, given a candidate Eigenvector, it tries to partition $F$ into two subsets according to the signs of the Eigenvector's elements (lines 23–29 of Algorithm 1). If both of the partitioned subsets have at least $N_{min}$ filters, the partitioning is deemed valid and the search over the remaining candidates stops (lines 30–33 of Algorithm 1). Otherwise, it continues to search for a valid partitioning from the next available Eigenvector/Eigenvalue pair in $\Lambda$. After finding a valid partitioning, it is necessary to build the new partitioned subgroups and their corresponding adjacency matrices; this is illustrated in lines 36–45 of Algorithm 1. At last, the procedure returns the found partitioning, characterized by $F_0$, $F_1$, $A_0$, and $A_1$, as well as the quality factor of the resultant partitioning, $\beta$.
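A compact Python rendering of E-PARTITION might look as follows. This is a sketch under our own naming, not the authors' implementation: `eigen_search` stands in for EigenSearch and, following [19], extracts the Eigenpairs of the modularity matrix $B$ built from $A$; we also deviate slightly on the invalid-return path by keeping an unsplittable group whole so that its filters survive the later pruning step.

```python
import numpy as np

def eigen_search(A):
    """Eigenpairs of the modularity matrix B derived from A,
    sorted in descending order of Eigenvalue (cf. line 17)."""
    k = A.sum(axis=1)
    m = max(k.sum() / 2.0, 1e-12)          # guard against an all-zero A
    B = A - np.outer(k, k) / (2.0 * m)     # B_ij = A_ij - k_i k_j / (2m)
    vals, vecs = np.linalg.eigh(B)         # ascending for symmetric matrices
    order = np.argsort(vals)[::-1]         # re-sort descending
    return [(vecs[:, i], vals[i]) for i in order]

def e_partition(F, A, n_min):
    """Split filter set F into two subsets of at least n_min filters each.
    Returns (F0, F1, A0, A1, beta); beta = -inf marks an invalid result."""
    if len(F) >= 2 * n_min:                          # lines 13-15
        for x, lam in eigen_search(A):               # lines 18-34
            idx0 = [j for j in range(len(F)) if x[j] > 0]
            idx1 = [j for j in range(len(F)) if x[j] <= 0]
            if min(len(idx0), len(idx1)) >= n_min:   # lines 30-33: valid split
                F0 = [F[j] for j in idx0]
                F1 = [F[j] for j in idx1]
                A0 = A[np.ix_(idx0, idx0)]           # lines 36-45: sub-adjacency
                A1 = A[np.ix_(idx1, idx1)]
                return F0, F1, A0, A1, lam
    return list(F), [], A, None, -np.inf             # no valid partitioning
```

Extracting the sub-adjacency matrices with `np.ix_` is the vectorized counterpart of the element-by-element copy in lines 36–45.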
Note that E-PARTITION shown in Algorithm 1 only partitions the given group into two subsets. In order to partition a given filter group into an arbitrary number of subgroups, we apply E-PARTITION in an adaptive and iterative way to obtain $M$ parts, each of which consists of $N_{min}$ filters, as described in Algorithm 2. At the beginning, a sorted list (SortedList), which maintains partitioning candidates in a descending order of $\beta$, is initialized with the initial partitioning of $f_{conv}$ of the teacher network (lines 1–3 of Algorithm 2). Until the number of partitions reaches the desired number, $M$, it performs the following iteratively: (1) it takes the candidate partitioning with the biggest $\beta$ from SortedList (line 5 of Algorithm 2); (2) it performs the partitioning if it is a valid one (lines 6–8 of Algorithm 2); (3) the two newly created groups are inserted back into SortedList to be considered for further partitioning in the successive iterations (lines 9–10 of Algorithm 2); (4) then, it increases the number of parts by one, as one part is removed and two new ones are created (line 11 of Algorithm 2). After line 12 of Algorithm 2, we have $M$ subsets of filters in SortedList, each of which includes at least $N_{min}$ filters. Thus, in lines 14–22 of Algorithm 2, we remove less important filters from each part to make the sizes of all parts consistent. In doing so, the filters assigned to each group are sorted in a descending order of average activation value (line 16 of Algorithm 2) and only the first $N_{min}$ filters remain in the part (lines 18–20 of Algorithm 2). Finally, we obtain $M$ parts stored in Parts, and each part has exactly $N_{min}$ filters in it.
Figure 7 shows how the proposed distributed NN partitioning technique works for the same partitioning example with four student networks illustrated in Figure 5. Unlike NoNN, we are able to adjust the desired part size in the proposed distributed NN partitioning method; in this example, each part is expected to have 30 filters, i.e., $N_{min} = 30$. Firstly, the adjacency matrix $A$ of the 256 filters in $f_{conv}$ is derived by the AH rule, Equation (6). By invoking E-PARTITION with $F$ (the set of 256 filters), $A$, and $N_{min} = 30$, $F$ is firstly partitioned into two subgroups $F_0$ and $F_1$ as shown in the figure. In the two following iterations of the while loop in Algorithm 2, they are partitioned into four groups: $F_{00}$ with 50 filters, $F_{01}$ with 60 filters, $F_{10}$ with 79 filters, and $F_{11}$ with 67 filters. As each part is expected to have only $N_{min} = 30$ filters, a total of $256 - 4 \times 30 = 136$ filters are eliminated by the last for loop of Algorithm 2.
Algorithm 2 Proposed NN Partitioning using E-PARTITION. |
1: | SortedList $\leftarrow \emptyset$; | |
2: | SortedList.push(E-PARTITION($f_{conv}$, $A$, $N_{min}$)); | ▹ Initial partitioning |
3: | numParts $\leftarrow$ 1; | |
4: | while numParts $< M$ do | ▹ Until the number of parts becomes $M$ |
5: | $(F_0, F_1, A_0, A_1, \beta) \leftarrow$ SortedList.pop(); | ▹ Take the candidate partitioning with the biggest $\beta$ |
6: | if $\beta = -\infty$ then | ▹ No available partitioning |
7: | break; | |
8: | end if | |
9: | SortedList.push(E-PARTITION($F_0$, $A_0$, $N_{min}$)); | |
10: | SortedList.push(E-PARTITION($F_1$, $A_1$, $N_{min}$)); | |
11: | numParts $\leftarrow$ numParts + 1; | ▹ One part eliminated; two new ones created |
12: | end while | |
13: | | ▹ Now, SortedList has $M$ elements |
14: | Parts $\leftarrow \emptyset$; | |
15: | for each $(F_0, F_1, A_0, A_1, \beta)$ in SortedList do | ▹ Prune out useless filters |
16: | $L \leftarrow$ all filters in $F_0 \cup F_1$ sorted in a descending order of average activation value; | |
17: | $P \leftarrow \emptyset$; | |
18: | for $i$ in 0 to $N_{min} - 1$ do | |
19: | $P \leftarrow P \cup$ {$L$.pop()}; | |
20: | end for | |
21: | Parts.push($P$); | |
22: | end for | |
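The overall flow of Algorithm 2 can be sketched in Python as follows, reusing the hypothetical `e_partition` above. This is our own rendering, not the authors' code; `activation` stands for the per-filter average activation values, which would come from profiling the teacher network, and a heap plays the role of the sorted list.

```python
import heapq
import numpy as np

def partition_filters(F, A, M, n_min, activation):
    """Partition filter set F into M parts of exactly n_min filters each
    (a sketch of Algorithm 2)."""
    heap, counter = [], 0                # max-heap on beta via negated keys

    def push(result):
        nonlocal counter
        heapq.heappush(heap, (-result[4], counter, result))
        counter += 1                     # tie-breaker; avoids comparing lists

    push(e_partition(F, A, n_min))       # initial partitioning (lines 1-3)
    num_parts = 1
    while num_parts < M:                 # lines 4-12
        item = heapq.heappop(heap)       # candidate with the biggest beta
        F0, F1, A0, A1, beta = item[2]
        if beta == -np.inf:              # no available partitioning
            heapq.heappush(heap, item)   # keep the group whole; stop splitting
            break
        push(e_partition(F0, A0, n_min))
        push(e_partition(F1, A1, n_min))
        num_parts += 1                   # one part replaced by two

    # Lines 14-22: keep only the n_min most active filters of each part.
    parts = []
    for _, _, (F0, F1, _, _, _) in heap:
        ranked = sorted(F0 + F1, key=lambda f: activation[f], reverse=True)
        parts.append(ranked[:n_min])
    return parts
```

With the Figure 7 setting ($M = 4$, $N_{min} = 30$, the 256 filters of $f_{conv}$), a call such as `partition_filters(list(range(256)), A, 4, 30, activation)` would return four parts of 30 filters each; in the paper's example, the intermediate groups had 50, 60, 79, and 67 filters, so 136 filters are pruned in the final step.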