The controller interface system ensures precise scheduling of input-output operations while relaxing the timing requirements for software processing. The following subsections give architecture details and discuss the implementation efficiency of the processing methods.
4.2. The Windings Current Sensing Unit
The winding current is measured in every period of the PWM inverter operation along with the measurement of the motor shaft position. Those values must be obtained in a correlated manner to assure correct computations of
id and
iq current values for the control algorithm. The switching transistors of the inverter are a source of significant disturbances that influence the momentary value of the current. Even though passive filtering circuits are implemented in the power lines of the inverter, the measured current signals are strongly distorted by the switching activity of the transistors. Exemplary oscillograms of the current sensor outputs are shown in Figure 5. The directly sampled signal is shown in case A. It contains significant noise components from the working inverter. The noise can be eliminated through averaging, as case B illustrates. The square waveform at the bottom is the most significant bit of the angle converter, used for synchronization purposes.
Using a compensation analog-to-digital converter, a momentary value of the signal is captured and converted to a number. For signals with periodic noise, this method introduces a high measurement error. Using an analog filter in the input signal path introduces a phase shift that results in incorrect computation of the static coordinate system values of
id and
iq in reference to the shaft angle. To get a usable signal, appropriate filtering must be applied: the measurement must be averaged over the period of inverter operation. This can be implemented by collecting an equally spaced sequence of samples over the period of inverter operation:
$i_w = \sum_{j=0}^{n-1} i\left(t_0 + j\,\frac{T_{INV}}{n}\right)$ (2)
where: iw is the sum of n winding current samples i(…) over the inverter operation period, TINV is the inverter period, and t0 is the sampling start time. The value iw is proportional to the average signal with a factor of n. Implementing the averaging measurement unit requires synchronizing the sampling process with the inverter operation.
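The effect of summing n equally spaced samples over exactly one inverter period can be sketched in a few lines. This is only an illustration under stated assumptions: the disturbance model (ripple at the inverter frequency and its third harmonic), the 2 A current level, and all function names are hypothetical, while the 125 µs period and 64 samples come from the text.

```python
import math

def synchronous_sum(signal, t0, t_inv, n):
    """Sum n equally spaced samples of signal(t) taken over one inverter period."""
    return sum(signal(t0 + j * t_inv / n) for j in range(n))

T_INV = 125e-6   # inverter period from the text (125 us)
N = 64           # number of samples per period from the text
I_DC = 2.0       # hypothetical winding current level in amperes

def noisy_current(t):
    # Hypothetical disturbance: ripple at the inverter frequency and its 3rd harmonic.
    w = 2.0 * math.pi / T_INV
    return I_DC + 0.8 * math.sin(w * t + 0.3) + 0.4 * math.sin(3.0 * w * t + 1.1)

i_w = synchronous_sum(noisy_current, t0=10e-6, t_inv=T_INV, n=N)
i_avg = i_w / N  # i_w is n times the average value
```

Because the sampling window spans a whole number of disturbance periods, the periodic components cancel in the sum and `i_avg` recovers the underlying current level, whereas a single momentary sample such as `noisy_current(10e-6)` does not.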
The ZYNQ device is equipped with a compensation analog-to-digital converter (XADC) [19]. It can achieve a sampling rate of 1 MS/s. The exact rate is slightly lower at a 100 MHz clock because a conversion takes 104 clock cycles. The converter performance is sufficient to take 64 samples over the 125 µs basic inverter period. The block diagram of the averaging integrated converter is shown in Figure 6. The XADC control is completely synchronized with the inverter operation. A set of samples is collected upon a request pulse on the RQ signal. The values of the collected signal samples are accumulated from the beginning of the measurement cycle, as shown by (2).
The measurement timing waveforms are shown in Figure 7. The implementation utilizes two ADC channels that are alternately selected by the control unit to ensure a uniform distribution of measurement points in both current measurement channels. The distance between samples is 192 clock cycles. In total, 32 samples of the input current are collected for each channel during one period of operation.
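The timing budget can be checked with a short calculation. It assumes that the 192-cycle distance applies between consecutive conversions across both channels (2 × 32 = 64 conversions per period); this reading is an assumption, not something the text states explicitly.

```latex
\begin{aligned}
T_{INV} \cdot f_{clk} &= 125\,\mu\mathrm{s} \cdot 100\,\mathrm{MHz} = 12500 \text{ clock cycles},\\
64 \cdot 192 &= 12288 \le 12500,\\
192 &\ge 104 \text{ (clock cycles per conversion)}.
\end{aligned}
```

Under this assumption, the 64 conversions fit within one inverter period and each conversion has enough time to complete before the next one starts.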
The winding currents obtained using the proposed averaging method and the acquired shaft angle are shown in Figure 8. Comparing these winding currents with the oscillograms in Figure 5, the elimination of switching noise can be observed. The method can be compared to dual-integration voltmeters, where the input signal is measured over the period (or an integer multiple of it) of the disturbing signal. This eliminates the noise coming from the switching of the inverter’s IGBTs.
4.3. The Vector System Processing Units S2R and R2S
The vector processing system consists of two transformation modules. The input path performs a transformation of the rotating input system of windings currents [
ia,
ib,
ic] to static vectors [
id,
iq]. Initially, the tri-phase system is converted into two vectors [
iα,
iβ]:
Next, the rotation of the [
iα,
iβ] with the angle α of shaft position is calculated to project vectors on [
id,
iq]:
The inverse transformation is based on computing tri-phase system voltages [
Va,
Vb,
Vc] from [
Vd,
Vq] and the current motor shaft position α. First, voltage vectors from the static system are projected to rotating vectors [
Vα,
Vβ]:
Next, the rotating vectors are projected to the individual driving vectors of the inverter:
Post-processing of the vectors is also required. It shifts each vector from the signed range to the unsigned range of the PWM space and saturates the value to the range of correct PWM operation:
The theoretical background outlined above allows moving to the implementation stage. The essential part of both transformations is the rotation of the vectors in two-dimensional space, (4) and (5). This operation requires computing the sine and cosine of the angle α. This task is complex and is usually implemented using floating-point computation (e.g., the C math library) [20], which introduces significant complexity and results in a long computation time. The coordinate rotation procedure should instead be matched to the resolution of the angle measurement in the system, which is represented by a 12-bit value. The precision offered by the standard 32-bit floating-point representation (IEEE 754) with its 24-bit mantissa leaves a 12-bit margin of computation precision.
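As a floating-point reference for the two transformation paths, the following minimal sketch uses the common amplitude-invariant Clarke transform and the usual Park rotation sign convention. Both conventions are assumptions: the scaling and signs of equations (3)–(7) may differ. The function names and the 12-bit PWM width in `to_pwm` are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)

def r2s(ia, ib, ic, alpha):
    """Winding currents [ia, ib, ic] -> [id, iq] for shaft angle alpha (radians)."""
    # Three-phase to two-vector conversion (amplitude-invariant scaling assumed).
    i_alpha = (2.0 * ia - ib - ic) / 3.0
    i_beta = (ib - ic) / SQRT3
    # Rotation by the shaft angle.
    c, s = math.cos(alpha), math.sin(alpha)
    return c * i_alpha + s * i_beta, -s * i_alpha + c * i_beta

def s2r(vd, vq, alpha):
    """[Vd, Vq] -> three-phase voltages [Va, Vb, Vc] for shaft angle alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    v_alpha = c * vd - s * vq
    v_beta = s * vd + c * vq
    va = v_alpha
    vb = -0.5 * v_alpha + 0.5 * SQRT3 * v_beta
    vc = -0.5 * v_alpha - 0.5 * SQRT3 * v_beta
    return va, vb, vc

def to_pwm(v, bits=12):
    """Shift a signed value into the unsigned PWM range and saturate it."""
    half = 1 << (bits - 1)
    return max(0, min((1 << bits) - 1, int(round(v)) + half))
```

With these conventions, `r2s(*s2r(vd, vq, alpha), alpha)` returns the original pair, which is a convenient self-check for any fixed-point or hardware implementation of the same computations.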
Vector rotation can be computed as a complete problem using the CORDIC approach [21]. The CORDIC algorithm is iterative and compensation-based: step-by-step computations quickly converge to the desired angle value. In general, the computation can be written as a sequence of vector rotations by angles whose tangent is a power of 2 with a non-positive integer exponent {0, −1, −2, …}. The direction of each rotation is chosen based on the sign of the angle value from the previous computation step:
Writing out the rotation matrix and dividing it by the cosine of the rotation angle, we get:
Assuming that in each step the tangent value is selected as the ki factor, the above equation can be written as follows:
The CORDIC method uses only addition and division by an integer power of two. In digital systems using binary numbers, division by a power of two is implemented as a shift-right operation. The final result requires scaling by a factor that depends on the number of steps:
The rotation of a point by a given angle is calculated using conditionally selected addition or subtraction and a shift operation, as shown in (8)–(11). In the considered implementation, the final scaling can be omitted because the values are used in a control loop with negative feedback. The scaling multiplication can be treated as a gain factor of the current sensor path or of the inverter voltage selection.
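The shift-and-add rotation can be sketched in integer arithmetic as follows. This is a generic rotation-mode CORDIC, not the authors' hardware description; the function names, the 16 fractional bits used for the angle, and the default of 12 steps are assumptions made for illustration.

```python
import math

def cordic_rotate(x, y, angle, n=12, frac_bits=16):
    """Rotate the integer vector (x, y) by angle (radians) in n CORDIC steps.

    Converges for |angle| up to roughly 1.74 rad. The result is NOT
    rescaled: it is larger than the true rotation by cordic_gain(n).
    """
    scale = 1 << frac_bits
    # Elementary angles atan(2**-i) in fixed point.
    atan_table = [int(round(math.atan(2.0 ** -i) * scale)) for i in range(n)]
    z = int(round(angle * scale))   # residual angle still to be rotated
    for i in range(n):
        if z >= 0:                  # rotate in the positive direction
            x, y = x - (y >> i), y + (x >> i)
            z -= atan_table[i]
        else:                       # rotate in the negative direction
            x, y = x + (y >> i), y - (x >> i)
            z += atan_table[i]
    return x, y

def cordic_gain(n=12):
    """Accumulated magnitude gain after n steps (about 1.6468 for n = 12)."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))
```

Each step needs only two shifts and three additions or subtractions. The constant gain returned by `cordic_gain` is the factor that, as noted above, can be absorbed into the gain of a closed control loop instead of being corrected by a multiplication.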
The computation procedure maps well onto a systolic array implementation, whose block diagram is shown in Figure 9. The computations can also be implemented as an iterative structure, which reduces the hardware resource requirements. The reduction can be roughly estimated as a factor of about n (the number of stages) relative to the systolic implementation, but some specific problems should be analyzed. The iterative implementation requires programmable shifters for xi and yi. A barrel shifter utilizing analogue switches [22] is implemented in ASICs. In FPGA devices, the multiplexer-based structure consumes a large number of programmable resources. An alternative is to use a combinational multiplier block for shifting. The implementation with the lowest resource count is a shift register with parallel load, but this concept significantly extends the computation time: the additional clock cycles required for the shift operations are the sum of the arithmetic sequence from 1 to n with a step of 1.
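The cost of the shift-register variant can be made concrete with the closed form of that arithmetic sequence. The value n = 12 used here is taken from the iteration count required for the 12-bit angle sensor.

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad n = 12 \;\Rightarrow\; \frac{12 \cdot 13}{2} = 78 \text{ additional clock cycles}.
```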
The systolic structure can be implemented as a pure combinational path (eliminating the intermediate registers in Figure 9) or as a pipelined structure. The combinational implementation requires wait cycles between passing the arguments and obtaining the result, while the pipelined implementation can achieve a high clock frequency and an average computation performance of one result per clock cycle.
It is essential to determine the number of steps required to compute the results with satisfactory precision. In the considered case, the position sensor delivers an angle with a precision of 12 bits [15]. The number of iterations depends on the precision of the angle determination and can be calculated from the following inequality:
where: k is the number of bits of the angle sensor and n is the number of iterations of the CORDIC method needed to compute the results with the required precision. Solving the inequality for the 12-bit shaft angle sensor, the number of iterations should be no less than 12. This sets the unit’s requirements at 12 systolic layers and a computation time of 13 clock cycles (including the input capture cycle) for the pipelined architecture.
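One criterion that reproduces this result can be sketched as follows. It is an assumption made for illustration and may not be the exact inequality used in the source: the smallest elementary rotation angle, arctan(2^−(n−1)), must not exceed half of the least significant bit of the k-bit angle, π/2^k. The function name is hypothetical.

```python
import math

def min_cordic_iterations(k):
    """Smallest n with atan(2**-(n-1)) <= pi / 2**k (assumed criterion)."""
    half_lsb = math.pi / (1 << k)   # half of one angle step, 2*pi / 2**k
    n = 1
    while math.atan(2.0 ** -(n - 1)) > half_lsb:
        n += 1
    return n

n_required = min_cordic_iterations(12)
```

Under this criterion, the required number of iterations equals the number of angle bits for typical sensor resolutions, which agrees with the 12 iterations stated for the 12-bit sensor.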
The CORDIC unit (as shown in Figure 9) implements vector rotation only. Additional computations must be implemented separately. Since the Xilinx Virtex-II Pro (high-end FPGA family) and Spartan-3 (budget FPGA family), multipliers have been available in FPGA structures as dedicated hardware blocks. This saves a lot of logic resources and gives significantly better performance when multiplication is performed. For this reason, a direct implementation of vector rotation (in the case of either (4) or (5)) can easily be embedded into the preprocessing of winding currents and the postprocessing of inverter voltages (PWM duty factors).
The essential problem is the fast computation of the sine and cosine of the shaft position angle. The method should compute the results quickly and with the precision required by the 12-bit angle encoding. The only way to get the result as quickly as possible is to use the lookup table approach. The implementation requires storing three functions: {sin(φ), cos(φ), and −sin(φ)}. This raises an implementation question: do all three functions have to be implemented separately?
The first feature exploited in the implementation is function symmetry. It is enough to store only the sin(φ) samples in the closed range [0, π/2]. The remaining part of the function is restored by symmetry. When the period of the function is divided into N equally spaced intervals, the lookup table must store N/4 + 1 samples. This requires a nonstandard implementation of the memory module with an odd number of cells and specific decoding circuitry.
The problem is overcome in the proposed implementation shown in Figure 10 by storing the sine function samples in the range [0, π) (case B, black line). This reduces the memory requirements only by half but eliminates the additional hardware necessary for accessing an odd number of memory cells. Case A of
Figure 10 illustrates the implementation of a universal sine and cosine function generator. The hatched rectangles denote the registers in the signal paths. The implementation with the required angle resolution is possible using the RAMB36 module [
23], which can hold 2048 18-bit words. The output of the lookup table is passed to an adder that, depending on the value of the MSB of the angle, either complements or simply passes the content of the memory cell. To balance the registered delay of the memory block, a register is placed on the A11 line to synchronize it with the data arriving from the memory. To use the same lookup table for obtaining both the sine and cosine functions, the indexed addressing concept is used. The function selector (sel input) is treated as a two-bit offset added to the high-order bits of the angle
φ. This two-bit vector, when considered as a natural number, introduces an angle shift inside the address word:
$A = (\varphi + sel \cdot 2^{10}) \bmod 2^{12}$.
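The half-period table with the selector offset can be modeled in software as follows. The table depth (2048 words) and the 12-bit angle come from the text. The amplitude scaling to an 18-bit signed word, the selector encoding, and all names are assumptions made for illustration.

```python
import math

ANGLE_BITS = 12                        # shaft angle resolution from the text
TABLE_SIZE = 1 << (ANGLE_BITS - 1)     # 2048 words covering [0, pi)
AMPLITUDE = (1 << 17) - 1              # assumed scaling for an 18-bit signed word

# Sine samples for the first half of the period only.
SINE_TABLE = [
    round(AMPLITUDE * math.sin(2.0 * math.pi * i / (1 << ANGLE_BITS)))
    for i in range(TABLE_SIZE)
]

SEL_SIN, SEL_COS, SEL_NEG_SIN = 0, 1, 2   # hypothetical selector encoding

def lookup(angle, sel):
    """Return sin, cos, or -sin of a 12-bit angle, scaled by AMPLITUDE."""
    # The selector is added to the two high-order bits: a quarter-turn per step.
    addr = (angle + (sel << (ANGLE_BITS - 2))) & ((1 << ANGLE_BITS) - 1)
    value = SINE_TABLE[addr & (TABLE_SIZE - 1)]
    # The MSB of the shifted angle selects the complemented (negated) output.
    return -value if addr & TABLE_SIZE else value
```

For example, `lookup(0, SEL_COS)` returns `AMPLITUDE` and `lookup(1024, SEL_NEG_SIN)` returns `-AMPLITUDE`. A single memory therefore serves all three functions at the cost of one small adder on the address and one conditional negation on the data.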
This simple and fast function generator is only a part of the rotation computation unit. The computation system must implement equations (3)–(4) or (5)–(7), or both at once, enabling separate calculations. Once the function generator is available, the remaining calculations must be implemented to create the complete processing solution. In the following, the implementation of the rotating-to-static (R2S) computation core is illustrated. The processing problem is illustrated using the data flow graph shown in case A of Figure 11. It can be observed that a similar sum of products is computed three times. To achieve a high clock frequency, the implementation should shorten the combinational paths and separate them with registers. Such an architecture enables pipelined (overlapped) operation. Based on the authors’ earlier experience and developed mapping methodologies [7], the developed data path block diagram is shown in case B of Figure 11. Modern FPGAs implement a DSP block that integrates a multiplier followed by an adder. In the case of Xilinx’s ZYNQ family, the DSP block is named DSP48E1 [24]. Such a structure is perfectly suited to implementing multiply-and-accumulate (or pipelined add) operations. Fitting the computations into the DSP48E1 block reduces the requirements for general-purpose logic resources and yields a processing system with short propagation delays, which makes it possible to increase the clock frequency.
It is essential to illustrate the performance and logic requirements of the proposed processing system in reference to other possible implementations that justify the selection. It is also important to compare the hardware implementations with pure software-based computations.
Table 2 gathers the performance factors of the different implementations. The first two rows cover software implementations of the rotation running on an ARM Cortex-A9 at 600 MHz, compiled with speed optimization [20]. The first one (Trig. Rotation) uses the math library and a floating-point implementation. Even though the ARM core supports floating-point instructions, computing a single rotation requires 3.83 µs. An improvement is observed for the CORDIC sw (sw abbreviates software) implementation, which uses integers for number representation. In this implementation, the computation time is reduced to 37.7% of the initial time, i.e., to 1.445 µs.
The next five implementations are dedicated to the programmable hardware platform. It is assumed that the hardware platform is clocked at 100 MHz. First, the CORDIC implementation is examined. Its benefit is that the rotation results for both coordinates are obtained in a single computation process. The first implementation uses the combinational structure. Registers are placed only on the inputs and outputs of the processing system, enabling free data flow through the structure. In this case, the rotation is computed in 30 ns (3 clock cycles). The long propagation delay of the circuit path (14.8 ns) requires two clock cycles to completely propagate the data. This is reflected in the maximal clock frequency of 67.8 MHz. The CORDIC pipe implementation shows how the performance changes when all stages are separated with registers. The computations take 130 ns (13 clock cycles). It should be noted that the pipelined structure offers a maximal average throughput of one result per clock cycle. In the considered case, the deep pipelined structure (12 stages) is not required because only one computation is done per sample.
The next implementation is based on computing the sine and cosine using the lookup table approach. The performance of the sine and cosine computation unit based on RAMB memory used as a lookup table is shown first to give a reference point. Its resource requirements are minimal: compared with the CORDIC implementations, the LUT requirement is reduced 33.7 times (from 641 to 19). The lookup-table-based architecture offers an extremely high clock frequency of more than 600 MHz and pipelined operation. The pipelined architecture completes the computation of the sine and cosine functions in 6 clock cycles, which improves the performance of the R2S and S2R units built on it. Finally, the coordinate system computation units are presented. Their implementation uses the lookup-table-based sine and cosine computation unit. The data flow diagram shown in Figure 11 perfectly fits the computing capabilities of the DSP48E1 unit. It should be noted that such a unit is located next to the RAMB36 block. The close location of the components reduces the length of the programmable connections, allowing for high-frequency operation. The total computation time of R2S is 12 clock cycles, while the reverse computation in the S2R block takes 14 clock cycles. This is comparable to the computation time of the CORDIC pipe implementation. There is a wide margin between the maximal and the operating frequency (100 MHz requested; the worst achieved case is 266.5 MHz for S2R). The peripheral frequency can be set to 250 MHz, which is the maximal frequency of the PLL for peripheral clocking [11], while still meeting the timing requirements. The comparison with the CORDIC implementation illustrates the reduction in resource requirements and the architecture fitting. The R2S unit requires about 8.4 times fewer general-purpose logic resources, while the ratio for S2R is about 4.42. It should be noted that the CORDIC algorithm is only a part of the computations to be implemented; the remaining transformations between the three-phase and two-vector systems require additional general-purpose resources and computation time.