1. Introduction
Practically, any digital system includes various sequential blocks [
1,
2,
3]. To specify the behaviour of a sequential block, it is necessary to use some formal model. In many cases, the behaviour of sequential blocks is specified by Mealy finite state machines (FSMs) [
4,
5,
6,
7]. Three main characteristics determine the quality of an FSM circuit: the chip area occupied by the circuit, performance (either minimum propagation time or maximum operating frequency), and the consumption of power. These characteristics are strongly interconnected [
8]. As a rule, the occupied chip area significantly affects other characteristics of an FSM circuit [
9,
10]. Various methods of structural decomposition can be used for optimizing the size of the occupied chip area [
11]. These methods have one serious drawback: they lead to multi-level FSM circuits with a significant decrease in performance compared with equivalent single-level circuits.
If a multi-level FSM circuit does not provide the required operating frequency, then it is necessary to reduce the number of levels. This reducing must be performed in a way that increases as little as possible the occupied chip area. One of these methods is proposed in [
12]. This is a method of twofold state encoding. The method is aimed at Mealy FSMs implemented with field programmable gate arrays (FPGAs) [
13,
14,
15].
Now, FPGAs are widely used for implementing various digital systems [
16,
17,
18]. Due to this, we chose the FPGA-based Mealy FSMs as a research object. Our current article considers Mealy FSM circuits implemented using three main components of FPGA chips. These components are look-up table (LUT) elements, programmable flip-flops and interconnections of FPGAs. We focus our current research on solutions of Xilinx which is the largest manufacturer of FPGA chips [
19,
20]. So, we consider ways of increasing the performance of LUT-based Mealy FSMs.
A LUT is a logic block having
inputs and a single output [
14,
21]. A single LUT may implement an arbitrary Boolean function having up to
arguments. Unfortunately, the value of
is very small [
20]. If a Boolean function depends on more than
variables, then it is necessary to apply methods of functional decomposition of this function [
19,
22]. It is known, the applying functional decomposition leads to multi-level FSM circuits with “spaghetti-type” interconnections [
23,
24].
FSM circuits are represented by systems of Boolean functions (SBFs). To implement an LUT-based FSM circuit, it is necessary to transform an initial SBF into a network of LUTs of a particular FPGA chip. This is a step of technology mapping [
25,
26]. The outcome of technology mapping tremendously affects all main characteristics of FSM circuits [
20,
27,
28]. To implement an LUT-based FSM circuit, it is necessary to use LUTs, programmable flip-flops, programmable interconnections, circuits of synchronization, and input-output blocks.
The article [
26] notes that time delays of the interconnection system are starting to play a major role in comparison with LUT delays. Furthermore, more than 70% of the power dissipation is due to the interconnections [
29]. So, the optimization of interconnections leads to increasing the performance and reducing the power consumption of LUT-based FSM circuits. This can be done, for example, using the twofold state assignment [
12].
Obviously, increasing the number of LUT inputs leads to a decrease in both the number of LUTs and their levels in FSM circuits. However, as shown in [
30,
31], for the foreseeable future, it is very difficult to expect an increase in the number of LUT inputs. Basically, modern LUTs have no more than 6 inputs [
14,
21]. An increase in the number of inputs leads to an imbalance for the main characteristics of an LUT circuit. However, this state of affairs leads to an imbalance between an increasing number of arguments in SBFs representing FSM circuits and a fairly small number of LUT inputs. However, the increasing complexity of modern digital systems is accompanied by an increase in the number of arguments in functions representing FSM circuits. This imbalance is the source of the need to improve synthesis methods of LUT-based FSMs.
Our current article is devoted to synthesis of multi-level LUT-based circuits of Mealy FSMs obtained using the method of twofold state assignment [
12]. The twofold state assignment allows improving characteristics of LUT-based Mealy FSMs compared with their counterparts based on functional decomposition [
32]. However, applying twofold state assignment leads to FSM circuits with three levels of LUTs. These circuits are significantly slower than their counterparts with fewer logic levels. If performance is the dominant quality factor, then the number of levels in FSM circuit should be reduced. It is extremely important that the increase in operating frequency is accompanied by as small an increase in the number of LUTs as possible (compared with the original three-level FSM circuit).
Recently, we have published works [
11,
12,
32] where a method of twofold state assignment have been proposed. In these works, each FSM state has two codes. The first of them represents a state as an element of the set of FSM states. The second code is an extended state code (ESC). The ESC represents a state as an element of some class of partition of the set of states. This approach requires using an additional block generating ESCs. This block introduces an additional level of logic to the FSM circuit. In our current article, we propose a method which allows eliminating this block. The elimination of this additional block is the main feature that distinguishes our current work from works [
11,
12,
32]. This approach leads to an increase in the value of FSM operating frequency compared with FSMs based on methods [
11,
12,
32].
The main contribution of this paper is a novel design method aimed at increasing the operating frequency of LUT-based Mealy FSMs with twofold state assignment. The main idea of the proposed approach is to use only extended state codes. This approach leads to a completely new structural diagram of the LUT-based Mealy FSM. This diagram does not include a block transforming binary state codes into extended state codes. Our current research shows that this approach leads to FSM circuits having practically the same amount of LUTs as FSM circuits based on the twofold state assignment. The experimental results show that this method allows increasing the maximum operating frequency of LUT-based FSMs compared with equivalent FSMs obtained using some known methods of FSM design.
The further text of the article includes six sections. The second section shows the background of single-level LUT-based Mealy FSMs. The third section discusses the state-of-the-art in the synthesis of LUT-based FSMs. The fourth section represents the main idea of the proposed method. The fifth section includes an example of FSM synthesis using the extended state codes. The results of experiments are shown and discussed in the sixth section of the article. A short conclusion ends the article.
2. Single-Level LUT-Based Mealy FSMs
FPGAs manufactured by Xilinx include a lot of configurable logic blocks (CLB) [
1,
15], such as embedded memory blocks, digital signal processors, even microprocessors. The CLBs are connected using a programmable routing matrix [
33]. In this paper, we consider CLBs including LUTs, multiplexers and programmable flip-flops. A LUT has
inputs and a single output. Networks of LUTs implement systems of Boolean functions representing an FSM circuit.
A LUT can implement an arbitrary Boolean function which depends on up to
arguments. A combinational output of LUT may be connected with a flip-flop. Usually, D flip-flops are used to organize an FSM state register (SRG) [
34,
35]. The value of a function may be loaded into a flip-flop using the pulse of synchronization
Clk. It is possible to make output of flip-flop equal to zero using the pulse
Reset. To select the type of a CLB output (combinational or registered), a programmable multiplexer is used.
In practical digital systems, an SBF representing an FSM circuit may depend on up to 50–70 literals [
4,
33]. At the same time, modern LUTs have no more than 6 inputs. So, there is a contradiction between a large number of arguments and a very small number of LUT inputs. This leads to the need for functional decomposition (FD) of SBFs representing an FSM circuit [
36,
37]. However, FD-based circuits have a lot of logic levels and complex systems of “spaghetti-type” interconnections [
38,
39].
A Mealy FSM is represented by a six-component vector
[
4,
33]. These components are the following:
is a set of inputs,
is a set of outputs,
is a set of internal states, a function of transitions
, a function of outputs
, and an initial state
. An FSM can be represented using various tools, such as state transition graphs [
4], binary decision diagrams [
40,
41], and-inverter graphs [
25,
26], graph-schemes of algorithms [
33], state transition tables (STTs) [
4].
Very often, STGs represent FSMs. An STG is a directed graph which vertices correspond to FSM states and edges correspond to interstate transitions [
4]. Each edge is directed from the current state
to the state of transition
. There are
H edges in an STG. The
h-th edge is marked by the pair
, where
is a conjunction of inputs causing a transition
,
is a collection of outputs generated during the transition
. Consider a Mealy FSM
represented by the STG shown in
Figure 1.
The following characteristics of FSM
can be found from
Figure 1:
inputs,
outputs,
states, and
interstate transitions. Furthermore, the STG (
Figure 1) defines functions of transitions and outputs of FSM
.
An STG can be transformed into an STT. Each edge of an STG corresponds to a single row of an STT. For example, FSM
can be represented by its STT (
Table 1) obtained from the STG (
Figure 1).
There are five columns in an STT [
4]. The first four of them are the following: a current state
; a state of transition
; a conjunction of inputs
; a collection of outputs
. The column
h includes the numbers of interstate transitions (
). For example, the following functions are determined by the fourth row of
Table 1:
and
. For a row with the unconditional transition, there is
. In
Table 1, this is the row 12.
Let us encode FSM states
by binary codes
having
bits where
To encode states, we use state variables from the set
. These variables are kept into the SRG. To change the contents of SRG, the special input memory functions (IMFs) are used. They form a set
. Now, we can transform an STT into a direct structure table (DST). A DST is an extension of the STT with the three following columns:
,
, and
[
1]. The first of them contains a code of a current state, the second includes a code of a state of transition, and there is a collection of IMFs in the third of them. This collection includes IMFs equal to 1 to load the code
into SRG.
Using a DST, we can derive two following SBFs representing an FSM circuit:
The SBF (
2) represents the function of transitions, the SBF (
3) represents the function of outputs. Using the terminology from [
42], we can state that SBFs (
2) and (
3) represent a structural diagram of
P Mealy FSM (
Figure 2).
In
P Mealy FSMs, the block of transition logic is determined by SBF (
2), the block of output logic is specified by SBF (
3). The inputs of register SRG are connected with outputs of the block of transition logic. In each cycle of FSM operation, the SRG contains a current state code. The pulse of synchronization
Clk allows changing the contents of SRG. To load the zero code of the initial state
into SRG, it is necessary to generate the pulse
Reset. We discuss a case when both blocks are implemented using LUT-based CLBs. In this case, the flip-flops of SRG are distributed among LUTs of the block of transition logic.
The analysis of systems (
2) and (
3) shows that both input memory functions and FSM outputs depend on state variables and FSM inputs. This peculiarity of Mealy FSMs is used for optimizing LUT-based FSM circuits [
11].
3. Optimizing Circuits of FPGA-Based Mealy FSMs
There are a significant number of methods for synthesis of LUT-based FSMs [
11,
12,
23,
24,
26,
32,
36,
37,
38,
41,
43,
44]. It is quite possible that the quality of FSM circuits obtained by different synthesis methods will differ significantly. As a rule, the quality of an FSM circuit is determined by a combination of three main characteristics. These characteristics are the following: (1) chip resources used for implementing the circuit; (2) the maximum performance and (3) the power consumption [
45,
46]. In the case of LUT-based FSMs, the following chip resources are necessary: (1) LUTs; (2) programmable flip-flops; (3) programmable interconnections; (4) the circuit of synchronization and (5) the programmable input-outputs of a chip. Obviously, the best FSM circuit consumes the minimum amount of chip resources, has the maximum possible operating frequency, and consumes the minimum power.
To improve the quality of an FSM circuit, it is necessary to solve some optimization problems [
47,
48]. In this article, we propose a method for improving the performance (the maximum operating frequency) of LUT-based Mealy FSMs.
The functions (
2) and (
3) are represented as sum-of-products (SOPs) [
4]. These SOPs include product terms
corresponding to rows of a DST. A term
is determined as the following conjunction:
In (
4), the symbol
stands for a conjunction of state variables corresponding to the code of a current state
from the
h-th row of the DST.
Each function
depends on
literals, where a literal is either direct or compliment value of a Boolean variable [
4]. Consider the following condition
If condition (
5) takes place, then there is exactly a single LUT in the circuit implementing the function
. If condition (
5) is violated, then the corresponding LUT-based circuit is a multi-level one.
To design multi-level LUT-based FSM circuits, various methods of functional decomposition (FD) may be used [
22]. During the process of FD, an initial SOP is broken down by partial SOPs corresponding to some additional functions. This process is terminated when each partial SOP includes no more than
literals. Different partial SOPs of a function
may include the same inputs
or/and the same state variables
. Thus, there is a duplication of literals in different partial SOPs of the original SOP. This phenomenon leads to a significant complication of the interconnection system. In turn, this not only complicates the placement and tracing process, but also reduces performance and increases power consumption compared with an equivalent single-level circuit [
11,
29].
If condition (
5) is true for all functions representing an FSM circuit, then the number of LUTs in the circuit is equal to
. This is the best possible LUT count. If condition (
5) is violated, then the LUT count is equal to
, where
is the number of additional functions different from
.
To optimize FSM circuits, it is necessary to reduce the value of
. The importance of this problem has led to the development of a significant number of methods of FD [
22]. Various algorithms of FD are included into CAD tools aimed in implementation of FPGA-based digital systems.
The values of
could be reduced due to proper state assignment [
4]. For example, it is possible to represent a state using only a single state variable. This is achieved by the one-hot state assignment, when
[
49]. The one-hot state assignment requires using the SRG with
M bits. However, this is not a problem because modern FPGAs include a lot of programmable flip-flops. Due to this, this approach is very popular in FPGA-based design. For example, this method is used in the academic CAD system ABC by Berkeley [
50,
51]. It is also used in industrial CAD packages such as, for example, Vivado of Xilinx [
52] and Quartus of Intel (Altera) [
53].
The Equation (
1) determines so called maximum binary state codes [
4]. The negative effect of one-hot state assignment is an increase in the number of IMFs compared with their minimum possible number (
1). However, these IMFs are much simpler than in the case of maximum binary state assignment [
1]. These approaches have been compared, for example, in [
54]. The research [
54] shows that using one-hot codes improves FSM characteristics if there is
. However, the number of state variables is not the only factor influencing the circuit characteristics. The limited number of LUT inputs increases the effect of the number of FSM inputs on the characteristics of LUT-based FSM circuits [
1]. It is shown in [
35] that the one-hot state assignment produces worse FSM circuits if the number of FSM inputs exceeds 10.
So, in one case the best results are produced by the method of maximum binary state assignment, and in the other case it is better to use the one-hot state codes. Thus, it is necessary to check which method will give the best results for a particular FSM. Due to this, we have compared the FSM circuits produced by our proposed approach with FSM circuits produced by three other methods of state assignment. As a base for comparison, we chose the algorithm JEDI [
9,
55], the methods of binary state assignment Auto and the One-hot state assignment of Vivado [
52] by Xilinx [
15]. We chose Vivado because it aims in Xilinx FPGA chips. We chose JEDI because it is one of the best maximum binary state assignment methods [
5].
It is possible to encode states in a way reducing the power consumption [
56]. The majority of such methods are based on reducing the switch activity of an FSM circuit [
48,
57]. To do it, it is necessary to minimize the Hamming distance for codes of states with the maximum amount of transitions [
57]. However, our research [
12,
38,
39] shows that the power consumption can be reduced due to the reducing the number of interconnections inside an FSM circuit. To reduce the number of interconnections, it is necessary to minimize the numbers of arguments in SBFs (
2) and (
3) [
4]. This can be done using various methods of state assignment.
The structural decomposition is an efficient way of reducing LUT counts in Mealy FSMs logic circuits [
11]. The main idea of these methods is the elimination of direct connection between FSM inputs
and state variables
, on the one hand, and outputs
and IMFs
, on the other hand. This is connected with introducing additional functions forming a set
F having
elements. The functions
depend on FSM inputs and state variables. In turn, FSM outputs and IMFs use these additional functions as arguments creating literals of corresponding SOPs. To optimize LUT count due to applying the methods of structural decomposition (SD), the following conditions should take places [
11]:
All known methods of SD are based on conditions (
6) and (
7). These methods are analysed, for example, in [
11]. If condition (
5) is violated for some functions
, then joint application of FD- and SD-based decomposition methods is necessary [
12].
One of the SD-based methods is a method of twofold state assignment [
12,
32]. Let us analyse this method. The method is based on construction a partition
of the set of Mealy FSM states. Each class
includes
states. The maximum binary codes
are used for encoding states as elements of some class
. There are
bits in codes
of states
, where
To encode states
, the variables
are used. The sets
form a set
having
elements:
Each class determines three sets. The set includes inputs causing transitions from states . There are elements in the set . The set includes outputs generating during the transitions from states . The set includes IMFs equal to 1 during the transitions from states .
This method can be used if the following condition takes place:
In this case, it is possible to use the model of
Mealy FSM (
Figure 3).
A class
corresponds to a
implementing SBFs
The circuit of each
is implemented with LUTs having
inputs. The functions (
11) and (
12) represent the partial SOPs of FSM input memory functions and outputs. The
creates final values of functions from the set
. To do it, each LUT of this block implements disjunctions having up to
K inputs:
The outputs of LUTs producing functions (
13) are connected with inputs of D flip-flops. This explains the existence of pulses
and
as inputs of
. So, this block produces outputs
and state variables
.
To create SBFs (
11) and (
12), it is necessary to have state variables
. These variables are generated by the
. This block transforms state variables
and generates the following SBF:
As follows from [
12], the circuits of
FSMs require fewer LUTs than the circuits of equivalent
P Mealy FSMs. If condition (
10) takes place, circuits of
FSMs also have fewer levels of logic. Due to this, they are faster than the circuits of equivalent
P Mealy FSMs.
As follows from
Figure 3, both inputs
and state variables
enter only LUTs of the first level of FSM circuit. The partial functions (
11) and (
12) enter only LUTs of the second level of FSM circuit. At last, the state variables
enter only LUTs of the
creating the third level of logic. Due to this, the circuits of
FSMs have regular systems of interconnections. This distinguishes them from the circuits of FD-based
P FSMs having complex systems of “spaghetti-type” interconnections. Our research [
11,
32] shows that the circuits of
FSMs consume less power than the circuits of equivalent
P Mealy FSMs.
The analysis of
Figure 3 shows that
FSMs have a drawback. Namely, they include
in the path leading from inputs
to state variables
. The conversion (
15) takes some time, which is added to the FSM cycle time. In this article, we propose a way of eliminating the
from FSM circuit. It allows reducing the cycle time and, therefore, increasing the value of maximum operating frequency of a resulting FSM circuit.
4. Main Idea of the Proposed Method
Assume that we have constructed a partition
for some Mealy FSM. To eliminate the converter of state codes
into state codes
, we propose to use extended state codes
. For a state
, the extended state code determines the state as an element of both sets
S and
. The number of bits in
is equal to
determined by (
9). To encode states by ESCs, we use state variables
, where
. If
, then only variables
can differ from zero.
Using only ESCs leads to
Mealy FSMs. There is a structural diagram of
Mealy FSM shown in
Figure 4.
In
Mealy FSMs, the blocks of the first level implement partial functions (
11) and (
12). These functions enter inputs of LUTs from
. As in the case of
FSMs, the functions (
12) are transformed into FSM outputs. However, the functions (
11) are transformed directly into state variables
. Due to this, there is no need in the
used in the case of
Mealy FSM (
Figure 3).
By removing the
, the three-level FSM circuit turns into a two-level one. So, we can expect that
Mealy FSMs have better performance than equivalent
Mealy FSMs. Results of our experimental studies, shown in
Section 6, have confirmed this assumption.
In this article, we propose a synthesis method for
Mealy FSMs. The method aims in LUT-based FSMs where an LUT has
inputs. We assume that an FSM is represented by its state transition table. Furthermore, we assume that there exists a partition
satisfying the condition (
10). There are the following steps in the proposed synthesis method:
Constructing the partition with the minimum cardinality number K.
Encoding of FSM states by extended state codes .
Creating DST of Mealy FSM.
Creating tables representing –.
Creating table representing .
Constructing SBFs representing blocks of FSM circuit.
Implementing Mealy FSM circuit with particular LUTs and other resources of FPGA chip.
To create the partition
, we can use the methods from [
11,
32]. These methods try to minimize the number of LUTs in the resulting FSM circuit. Firstly, the number of shared outputs in the sets
should be minimum one. This reduces the number of LUTs implementing partial outputs in the circuit of
. Furthermore, this allows to reduce the number of interconnections among
and LUTs of
. The circuit of block
is guaranteed to have a single level of logic if the following condition is true:
If condition (
16) is violated, then the circuit of
is still a single-level one, if each partial output is generated by no more than
blocks of the first level of logic. Due to this, it makes sense to minimize the appearance of shared FSM outputs in different sets
.
Each class
is characterised by a set
including states of transitions from the states
. The methods [
11,
32] minimize the appearance of shared states of transition in different classes of the partition
. This allows minimizing the number of partial input memory functions generated by a particular block of logic. In turn, this minimizes the numbers of LUTs in circuits of all blocks of
FSM circuit. In the case of
FSM, we still use this method.
The difference in the organization of
and
FSMs leads to a change in the method of state assignment. In [
32], state are encoded in a way minimizing the number of LUTs in the circuit of
. There is no
in
FSMs. We propose to encode states in a way minimizing the number of LUTs generating functions (
11). To do it, we encode the states
in a way maximizing the number of zeros in the same bits of codes for different states of transition within each set
.
Consider the following example. There is a set . If these states have the codes , , then all five partial input memory functions are generated by . To generate them, it is necessary to use five LUTs. If there are codes = 00001 and = 00011, then only two partial input memory functions should be generated. To do it, only 2 LUTs are necessary. We use this approach to encode states of Mealy FSMs.
5. Example of Synthesis
If a model of
FSM is used to implement the circuit of some FSM
, then we denote this as
. In this section, we discuss an example of synthesis of
Mealy FSM starting from STT (
Table 1). To implement the circuit of Mealy FSM
, we use LUTs having
.
Step 1. Using the approach in [
32] gives the partition
of the set of states. It includes the classes
and
. There is
in the discussed case. Using (
8) gives
. As follows from the analysis of
Table 1, the following sets exist for each class
:
,
,
for
,
,
for
. So, the following relation takes place:
. Because of
, the condition (
10) is true for each class of the partition
.
As we can see, there are no identical elements in the sets and (. The fewer identical elements in different sets , the fewer connections between the sources of FSM inputs and LUTs of the first level of logic. In our particular case, the absolute minimum of zero is reached.
There is only a single common output in sets and . It means that only a single LUT of is necessary to generate FSM outputs. All other outputs are generated by LUTs from corresponding blocks of the first level of logic circuit. In general, the fewer identical elements in different sets , the smaller the number of LUTs in the second level of logic.
Step 2. There is
. From (
9), we can get
. This gives the set
. We can create the following sets for the state encoding:
and
. If
, then
in the code
. There is
in the code
for the states
. Furthermore, we should maximize the number of zeros in the same bits of codes for different states of transition within each set
.
In our example, it is possible to encode only states
in such a way. The outcome of state assignment is shown in
Figure 5.
As follows from the Karnaugh map (
Figure 5), there is
in extended codes of states
. It means that only three LUTs are used for generating input memory functions in
. Furthermore, we can see that variables
and
are insignificant for conjunctions
for the codes of states
. The same is true for the variables
and
and the codes of states
. This allows diminishing the number of literals in terms (
4) up to
. For example, there is
. These minimized conjunctions are used in SOPs of functions (
11) and (
12).
Step 3. The transition from an STT to a DST is executed in the trivial way. The columns
,
,
,
,
h contain the same symbols in both tables. The state codes are taken from a corresponding Karnaugh map. In the discussed example, the codes are taken from
Figure 5. There is a symbol
in the row h of the column
, if there is 1 in the code of state of transition
in this row. In the discussed case, the DST of FSM
is represented by
Table 2. This table includes
rows.
Step 4. To construct a table of
, it is necessary to select the rows of a DST with transitions from the states
. Obviously, this is executed in a trivial way. Applying this approach to
Table 2 gives two tables.
Table 3 represents the
;
Table 4 represents the
. There are 9 rows in
Table 3 and 6 rows in
Table 4. So, together, these tables include all 15 rows of the original DST (
Table 2).
Step 5. The table of includes two columns. The column Function includes all functions from sets of input memory functions and FSM outputs. The column Block is divided by K sub-columns . The sub-column corresponds to . If a function is generated by , then there is 1 at the intersection of the row and the column .
In the discussed case, there are
rows in the table of
. This block is represented by
Table 5.
There is a transparent connection between
Table 3 and
Table 4, on the one hand, and
Table 5, on the other hand. For example, there is
in
Table 4. So, there is 0 at the intersection of the row
and the column
of
Table 5. Next, there is
in
Table 3. So, there is 1 at the intersection of the row
and the column
of
Table 5. All other rows of
Table 5 are filled on the base of a similar analysis.
Step 6. Systems (
11) and (
12) are extracted from tables of
–
. The following SBFs can be derived from
Table 3 (after minimization):
The following systems are derived from
Table 4:
The following SBF is derived from
Table 5:
Systems (
17)–(
21) represent logic circuit of FSM
. Let us analyse the LUT counts for each level of logic. To do it, we should analyse systems (
17)–(
21).
As follows from (
17) and (
18), there are 9 LUTs in the circuit of
. The LUT count is determined by the number of equations in (
17) and (
18). From SBFs (
19) and (
20) follows that there are 6 LUTs in the circuit of
.
If some equation of SBF (
21) has more than a single product term, then the corresponding LUT is included into
. Otherwise, a corresponding function is generated by some of block of the first level of logic. So, there are 4 LUTs in the circuit of
.
So, there are
LUTs in the circuit of FSM
. The pulses
and
enter CLBs generating functions
. The circuit is shown in
Figure 6.
Step 7. To implement the circuit of FSM
, it is necessary to use very complicated methods of technology mapping [
26]. This can be done using, for example, the CAD tool Vivado by Xilinx [
52]. This package solves the problems of mapping, placement, routing, testing, finding characteristics of an FSM circuit (such as the LUT count, number of CLBs, number of flip-flops, maximum operating frequency, power consumption). We do not show the results of implementation for this particular example. In the next section, we use Vivado to investigate the efficiency of our method compared with some other design methods.
6. Experimental Results
The results of experiments are shown in this section. To conduct experiments, we use benchmark FSMs from the library [
58]. The library includes 48 benchmarks represented in the format KISS2. These benchmarks have a wide range of basic characteristics (numbers of states, inputs, and outputs). The characteristics of these benchmarks can be found in many articles and books, for example, in [
11,
27,
37]. These benchmarks are used very often by different researchers to compare area and time characteristics of FSMs obtained using different design methods. The characteristics of the benchmarks are shown in
Table 6.
To conduct the experiments, we used a personal computer with the following characteristics: CPU: Intel Core i7 6700 K
[email protected] GHz, Memory: 16 GB RAM 2400 MHz CL15. As a platform for implementing FSM circuits we used the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [
59]. The FPGA chip of this platform includes LUTs with 6 inputs. We used CAD tool Vivado v2019.1 (64-bit) [
52] to execute the technology mapping. The results of experiments are taken from reports produced by Vivado. As the source information for the CAD tool, we used VHDL-based FSM models obtained by the transformation of files in KISS2 format into VHDL codes. The transformation is executed by the CAD tool K2F [
11].
We compared area (the LUT count) and time (the maximum operating frequency) characteristics of FSMs based on five different approaches. Three of them are
P Mealy FSMs based on: (1) Auto of Vivado (it uses binary state codes); (2) One-hot of Vivado; (3) JEDI. The fourth objects for comparison are
-based FSMs [
12,
32]. We compared these four FSMs with our approach.
It is known [
11] that area and time characteristics of LUT-based FSM circuits depend strongly on the relation between numbers of inputs (L) and state variables (
), on the one hand, and the number of LUT inputs
, on the other hand. Due to this, we have divided the benchmarks into five following classes. The benchmarks belong to class of trivial FSMs (class 0) if
. The benchmarks belong to class of simple FSMs (class 1) if
. The benchmarks belong to class of average FSMs (class 2) if
. The benchmarks belong to class of big FSMs (class 3) if
. The benchmarks belong to class of very big FSMs (class 4) if
. As research [
39] shows, the larger the class number, the bigger the gain from using methods of structural decomposition.
The class 0 includes the benchmarks bbtas, dk17, dk27, dk512, ex3, ex5, lion, lion9, mc, modulo12, and shiftreg. The class1 contains the benchmarks bbara, bbsse, beecount, cse, dk14, dk15, dk16, donfile, ex2, ex4, ex6, ex7, keyb, mark1, opus, s27, s386, s8, and sse. The class 2 consists of the benchmarks ex1, kirkman, planet, planet1, pma, s1, s1488, s1494, s1a, s208, styr, and tma. The class 3 includes the benchmark sand. At last, the benchmarks s420, s510, s820, and s832 create the class 4.
The results of experiments are shown in
Table 7,
Table 8,
Table 9,
Table 10,
Table 11,
Table 12,
Table 13,
Table 14 and
Table 15. There is the same organization of
Table 7,
Table 9,
Table 11,
Table 13,
Table 14 and
Table 15. We marked the table columns by the names of investigated methods. The table rows contain the names of benchmarks. Within each table, the results for the same class are shown in adjacent rows. There are results of summation of values from columns in the row “Total”. The row “Percentage” includes the percentage of summarized characteristics of FSM circuits produced by other methods, respectively, to
-based FSMs. To point that the model of
P FSM is used for methods Auto, One-hot, and JEDI, we name these methods as P-Auto, P-One-hot, and P-JEDI.
Table 8,
Table 10 and
Table 12 show summarized experimental results for different classes of benchmark FSMs. Let us analyse the experimental results taken from reports produced by Vivado.
As follows from
Table 7, the
–based FSMs consume the minimum amount of LUTs compared with other investigated approaches. The
-based FSMs require 7.7% more LUTs than equivalent
–based FSMs. However, all other FSMs require more LUTs than it is for our approach. Our approach consumes fewer LUTs than it is for P-Auto (25.47% of gain), P-One-hot (46.1% of gain) and P-JEDI-based FSMs (6.67% of gain).
To show a dependence of the gain in LUTs on the class of benchmarks, we have created
Table 8. It shows the gain for classes 0, 1 and 2–4.
As follows from
Table 8, the
–based FSMs have worse results than equivalent P-Auto FSMs (12.68% of loss on the number of LUTs), P-JEDI FSMs (14.08% of loss on the number of LUTs) and
-based FSMs (12.68% of loss on the number of LUTs). So, for the class 0, the P-JEDI based FSMs have the minimum amount of LUTs. For the class 1, our approach loses out relative to the other two approaches (11.01% to
–based FSMs and 1.46% to P-JEDI-based FSMs). However, our approach produces more economical circuits than it is for P-Auto (18.71% of gain) and P-One-hot (53.51% of gain). The
–based FSMs only lose to the
-based FSMs (6.23%). So, for the classes 2–4, our approach defeats P-Auto FSMs (30.35% of gain), P-One-hot FSMs (45.23% of gain), and P-JEDI FSMs (6.13% of gain). It means that our approach allows obtaining FSM circuits with a number of LUTs comparable to this number for equivalent P-JEDI- and
-based FSMs. At the same time, the loss of our approach decreases as the complexity of FSMs increases: the larger the class number, the smaller our loss relative to the best solutions (P-JEDI- and
-based FSMs).
We thought that our approach would allow us to obtain FSM circuits with higher performance than it is for circuits based on models of either
P or
FSMs. To test this assumption, we have created
Table 9. It includes values of maximum operating frequency measured in megahertz.
As follows from
Table 9, the
–based FSMs have the higher values of maximum operating frequency compared with other investigated FSMs. Our approach provides the following gain: (1) 25.49% compared with P-Auto-based FSMs; (2) 26.09% compared with P-One-hot-based FSMs; (3) 10.06% compared with P-JEDI-based FSMs; (4) 15.9% compared with
-based FSMs. Our research has shown that the frequency gain depends on the class to which an FSM belongs. This conclusion is supported by data from
Table 10.
As follows from
Table 10,
–based FSMs have the same operating frequency as equivalent
FSMs from the class 0. We can explain this phenomenon by the fact that there is no code transformer al
FSMs. Furthermore, P-JEDI-based FSMs have slightly higher frequency (0.84%). However, our approach gives a slight advantage over P-Auto-based FSMs (0.07%) and P-One-hot-based FSMs (2.51%). It follows from
Table 10 that for trivial automata, the method of organizing the FSM circuit is practically irrelevant. The difference in frequency depends mainly on the state encoding method.
Starting from simple FSMs (the class 1), our approach allows producing the fastest circuits. There is the following gain in maximum operating frequency: (1) 24.68% compared with P-Auto-based FSMs; (2) 24.77% compared with P-One-hot-based FSMs; (3) 19.33% compared with P-JEDI-based FSMs; (4) 15.68% compared with -based FSMs. The gain is even greater for FSMs of classes 2–4. This is the following: (1) 39.94% compared with P-Auto-based FSMs; (2) 40.1% compared with P-One-hot-based FSMs; (3) 32.02% compared with P-JEDI-based FSMs; (4) 24.64% compared with -based FSMs. As we can see, starting from simple FSMs, the difference in frequency depends mainly on the architecture of FSM.
When comparing different variants of the FSM circuit implementation, an integral estimate is often used, which is equal to the product of the chip area occupied by a circuit by the performance [
45]. In the case of LUT-based FSMs, the circuit quality is estimated by the product of the LUT count by the minimum cycle time [
45]. Obviously, the time of cycle is inversely to the operating frequency. The lower the value of this product, the better is the quality of the corresponding FSM circuit.
The results of the comparison relative to this estimate are shown in
Table 11. The time of cycle is represented in nanoseconds. So, the data inside
Table 11 are represented as “number of LUTs × nsecs”.
As follows from
Table 11, our approach leads to FSM circuits with better integral estimates than it is for other investigated methods. There is the following gain in the integral estimates: (1) 101.63% compared with P-Auto-based FSMs; (2) 133.62% compared with P-One-hot-based FSMs; (3) 45.54% compared with P-JEDI-based FSMs; (4) 16.89% compared with
-based FSMs. This is connected with the fact that our approach produces two-level FSM circuits. At the same time, a circuit for each function for any level of logic requires only a single LUT.
To check the dependence of the integral estimate on the value of
, we split
Table 11 and built
Table 12, which shows integral estimates for FSMs from classes 0, 1 and 2–4.
In the case of class 0 (
Table 12), the circuits of
–based FSMs have the worst values of integral estimates compared with the circuits produced by P-Auto, P-JEDI, and
FSMs. There is the following loss: (1) 10.78% compared with P-Auto; (2) 12.78% compared with P-JEDI and (3) 12.41% compared with
FSMs. So, the loss is approximately the same relative to these three models. However,
–based FSMs have a gain of 27.07% compared with LUT-based P-One-hot FSMs.
Starting from simple FSMs (the class 1), our approach allows producing the circuits with significantly better integral estimates as their counterparts. There is the following gain in values of area-time products: (1) 63.38% compared with P-Auto-based FSMs; (2) 110.11% compared with P-One-hot-based FSMs; (3) 24.27% compared with P-JEDI-based FSMs; (4) 5.01% compared with -based FSMs. The gain is even greater for FSMs of classes 2–4. This is the following: (1) 124.33% compared with P-Auto-based FSMs; (2) 150.66% compared with P-One-hot-based FSMs; (3) 58.24% compared with P-JEDI-based FSMs; (4) 23.48% compared with -based FSMs. As we can see, starting from simple FSMs, the difference in the values of area-time products depends mainly on the architecture of FSM. Due to this, the circuits based on FSMs also benefit in comparison to P-Auto, P-One-hot, and P-JEDI FSMs.
As follows from results of experiments, using the model of FSMs allows obtaining FSM circuits with higher operating frequency and less values of integral estimates than they are for other investigated models. Winning starts already from simple FSMs for whom the following relation takes place: . The gain from using our method increases as the difference between the number of FSM inputs and state variables, on one side, and the number of inputs of LUT, on the other side, increases.
In our previous papers [
12,
32,
38,
39], we have proposed various methods for improving characteristics of LUT-based Mealy FSMs. All these methods lead to three-level FSM circuits. In [
12], there is proposed a synthesis method based on the twofold state assignment and one-hot encoding of outputs. In [
32], there is proposed a synthesis method based on the twofold state assignment and encoding of collections of outputs. These methods are also shown in our book [
11]. In [
38], there is proposed a synthesis method based on the replacement of FSM inputs and encoding of collections of outputs. In [
39], there is proposed a synthesis method based on the transformation of codes of collections of outputs into state codes.
To compare our new results with results [
12,
32,
38,
39], we have created three additional tables.
Table 13 includes the experimental results for the number of LUTs.
Table 14 shows the results for the maximum operating frequency.
Table 15 contains the products of LUT counts by propagation times.
As follows from
Table 13, the circuits of [
38]-based FSMs use a minimal number of LUTs compared to other investigated methods. The
-based FSMs require 7.7% more LUTs than equivalent [
12]–based FSMs, 1.32% more LUTs than [
32]–based FSMs, and 8.54% more LUTs than [
38]-based FSMs. Our approach consumes fewer LUTs than it is [
39]-based FSMs (0.49% of gain).
As follows from
Table 14, our approach allows obtaining FSM circuits with the highest operating frequency. There is the following gain in operating frequency: (1) 15.9% compared with [
12]–based FSMs; (2) 26.43% compared with [
32]-based FSMs; (3) 27.81% compared with [
38]-based FSMs; (4) 17.44% compared with [
39]-based FSMs.
As follows from
Table 15, the circuits of
-based FSMs have the best values of area-time products. There is the following gain: (1) 16.89% compared with [
12]–based FSMs; (2) 47.89% compared with [
32]-based FSMs; (3) 40.87% compared with [
38]-based FSMs; (4) 29.88% compared with [
39]-based FSMs.
We have proposed the method based on extended state codes to improve the time characteristics of FSMs. Note that the gain in frequency is accompanied by a slight increase in the number of LUTs compared with equivalent FSMs. We think that our approach can be used instead of FSMs if the performance is the main criterion for the optimality of LUT-based FSM circuits.