An Efficient Hardware Implementation for Complex Square Root Calculation Using a PWL Method
Abstract
1. Introduction
- This is the first study to apply a PWL method to the computation of the square root of a complex number, with the aim of achieving low latency.
- The complex square root computation is decomposed into several substeps involving three real square root functions. A software-based segmentor approximates each of these real square root functions with the fewest possible segments that satisfy a predefined fractional bit width and a predefined MAE.
- The bit width of the multipliers is reduced to match the fractional bit width of the slope defined in the segmentor, which saves hardware. In addition, the multipliers use a two-stage pipelined architecture to shorten the critical path.
- Owing to the state-of-the-art PWL method and a formula with a simple computational flow, our design achieves a significantly lower delay. In addition, because the front part of the circuit is shared between the real-part and imaginary-part computations, the proposed architecture has a clear advantage in hardware overhead.
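The second and fourth contributions above rest on two ideas: the complex square root reduces to three real square roots, and the first of them can be shared by the real and imaginary paths. This section does not reproduce the paper's exact formula, so the sketch below assumes the standard identity √(a + jb) = √((|z| + a)/2) + j·sgn(b)·√((|z| − a)/2), with |z| = √(a² + b²). The function name `complex_sqrt_3roots` and its `real_sqrt` parameter are hypothetical; `math.sqrt` stands in for the three PWL units.

```python
import math


def complex_sqrt_3roots(a: float, b: float,
                        real_sqrt=math.sqrt) -> tuple[float, float]:
    """Principal square root of a + jb built from three real square roots.

    `real_sqrt` (hypothetical parameter) stands in for a hardware
    real-square-root unit, e.g. a PWL approximator; by default it is the
    exact math.sqrt.
    """
    # Front end shared by both output paths: the modulus |z| is computed once.
    modulus = real_sqrt(a * a + b * b)
    # Real part: sqrt((|z| + a) / 2). max() guards against small negative
    # arguments, which an approximate real_sqrt could produce.
    re = real_sqrt(max(modulus + a, 0.0) / 2.0)
    # Imaginary part: sqrt((|z| - a) / 2), carrying the sign of b.
    im = math.copysign(real_sqrt(max(modulus - a, 0.0) / 2.0), b)
    return re, im


if __name__ == "__main__":
    print(complex_sqrt_3roots(3.0, 4.0))  # (2.0, 1.0)
```

Passing a PWL approximation as `real_sqrt` would let the same skeleton estimate the error of the combined datapath in software. Note that (|z| + a) suffers cancellation when a < 0 and |b| is small, which any fixed-point implementation of this identity also has to account for.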
2. Theoretical Background
2.1. PWL Method
2.2. Precision Criteria
3. Proposed Methodology
3.1. Optimized Segmentor for Computing the Real Square Root
3.2. Proposed Segmentor for Computing the Complex Square Root
3.3. Calculation of
Algorithm 1: Proposed segmentor.
Input: , , , Output:
/* Simulation of Hardware Circuit: */
1 ; Traverse all values of i and j
2
3 ; Traverse all values of i and j
4 ;
5 ;
6 Traverse all values of i and j
7 ;
8 ; Traverse all values of i and j
9 ;
10 ; Traverse all values of i and j
11 ;
/* Calculation of : */
12 ; Traverse all values of i and j
13 ;
14 ;
15 ;
16 ;
17
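Algorithm 1 survives only as an outline here, so the following is not the paper's segmentor. It is a minimal sketch, under stated assumptions, of the kind of search the contributions describe: linear segments for a real square root, the slope rounded to `kw` fractional bits, and each segment extended as far as a predefined MAE allows, using a bisection window with a left and a right edge (the roles named in the Abbreviations). The names `fit_segment` and `segment`, the sampled error check, the secant slope, and the unquantized intercept are all assumptions for illustration.

```python
import math
from typing import Callable

Segment = tuple[float, float, float, float]  # (x_start, x_end, slope k, intercept b)


def fit_segment(f: Callable[[float], float], x0: float, x1: float,
                kw: int, samples: int = 64) -> tuple[float, float, float]:
    """Fit y = k*x + b on [x0, x1], with k rounded to kw fractional bits.

    For a concave function such as sqrt, the endpoint secant is the minimax
    slope before quantization. The intercept b then centres the residual band,
    so the returned error is the sampled maximum of |f(x) - (k*x + b)|.
    """
    scale = 1 << kw
    k = round((f(x1) - f(x0)) / (x1 - x0) * scale) / scale
    xs = [x0 + (x1 - x0) * i / samples for i in range(samples + 1)]
    residuals = [f(x) - k * x for x in xs]
    r_min, r_max = min(residuals), max(residuals)
    b = (r_min + r_max) / 2.0
    return k, b, (r_max - r_min) / 2.0


def segment(f: Callable[[float], float], lo: float, hi: float,
            mae: float, kw: int, tol: float = 1e-6) -> list[Segment]:
    """Greedily cover [lo, hi] with segments, each extended as far as mae allows.

    For each segment start, a bisection window [left, right] narrows down the
    farthest end point whose quantized-slope fit still meets `mae`.
    Assumes feasibility is (roughly) monotone in the segment length.
    """
    segments: list[Segment] = []
    start = lo
    while True:
        k, b, err = fit_segment(f, start, hi, kw)
        if err <= mae:  # the rest of the interval fits in one segment
            segments.append((start, hi, k, b))
            return segments
        left, right = start, hi  # left: feasible end, right: infeasible end
        while right - left > tol:
            mid = (left + right) / 2.0
            if fit_segment(f, start, mid, kw)[2] <= mae:
                left = mid
            else:
                right = mid
        if left == start:
            raise ValueError("mae is too tight for the chosen slope width kw")
        k, b, _ = fit_segment(f, start, left, kw)
        segments.append((start, left, k, b))
        start = left
```

For example, `segment(math.sqrt, 1.0, 4.0, mae=1e-3, kw=8)` covers an assumed input range of [1, 4] and returns (x_start, x_end, k, b) tuples. A closer match to the paper would also quantize the intercept and intermediate data to their own fractional bit width and, as the outline of Algorithm 1 suggests, measure the error by simulating the hardware circuit over all input values rather than by sampling.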
3.4. Parameter Selection
4. Hardware Implementation and Comparison
4.1. Implementation Results and Comparison
4.2. Details of Hardware Implementation
5. Conclusions, Limitations, and Future Research
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
PWL | Piecewise linear |
FPGA | Field-programmable gate array |
CORDIC | Coordinate rotation digital computer |
VLSI | Very-large-scale integration |
PLAC | Piecewise linear approximation computation |
CVNN | Complex-valued neural network |
MAE | Maximum absolute error |
| | Predefined MAE for the segmentation of a real square root function |
| | MAE of the complex square root computation |
| | Predefined MAE for the segmentation of a complex square root function |
| | Left edge of the bisection window |
| | Right edge of the bisection window |
| | of the circuit |
MAEreal | MAE of the circuit for computing the real part |
MAEimg | MAE of the circuit for computing the imaginary part |
AAE | Average absolute error |
AAEreal | AAE of the circuit for computing the real part |
AAEimg | AAE of the circuit for computing the imaginary part |
| | Fractional bit width of the slope |
| | Fractional bit width of the other intermediate data except the slope |
References
- Bindel, D.; Demmel, J.; Kahan, W.; Marques, O. On computing Givens rotations reliably and efficiently. ACM Trans. Math. Softw. (TOMS) 2002, 28, 206–238.
- Sima, M.; Senthilvelan, M.; Iancu, D.; Glossner, J.; Moudgill, M.; Schulte, M. Software solutions for converting a MIMO-OFDM channel into multiple SISO-OFDM channels. In Proceedings of the Third IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2007), New York, NY, USA, 8–10 October 2007; p. 9.
- Mitroy, J.; Ivallov, I. Quantum defect theory for the study of hadronic atoms. J. Phys. G Nucl. Part. Phys. 2001, 27, 1421.
- Salo, J.; Fagerholm, J.; Friberg, A.T.; Salomaa, M. Unified description of nondiffracting X and Y waves. Phys. Rev. E 2000, 62, 4261.
- Ercegovac, M.D.; Muller, J.M. Complex square root with operand prescaling. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2007, 49, 19–30.
- Wang, D.; Ercegovac, M.D. A design of complex square root for FPGA implementation. In Proceedings of the Mathematics for Signal and Information Processing, International Society for Optics and Photonics, Minneapolis, MN, USA, 17–19 May 2009; Volume 7444, p. 74440L.
- Wang, D.; Ercegovac, M.D. A Radix-16 Combined Complex Division/Square Root Unit with Operand Prescaling. IEEE Trans. Comput. 2012, 61, 1243–1255.
- Wang, D.; Ercegovac, M.D.; Zheng, N. Design of High-Throughput Fixed-Point Complex Reciprocal/Square-Root Unit. IEEE Trans. Circ. Syst. II Express Briefs 2010, 57, 627–631.
- Mopuri, S.; Acharyya, A. Low-complexity methodology for complex square-root computation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25, 3255–3259.
- Yang, B.; Wang, D.; Liu, L. Complex division and square-root using CORDIC. In Proceedings of the 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, China, 21–23 April 2012; pp. 2464–2468.
- Mopuri, S.; Acharyya, A. Low-Complexity and High-Speed Architecture Design Methodology for Complex Square Root. Circ. Syst. Signal Process. 2021, 40, 5759–5772.
- Sun, H.; Luo, Y.; Ha, Y.; Shi, Y.; Gao, Y.; Shen, Q.; Pan, H. A Universal Method of Linear Approximation With Controllable Error for the Efficient Implementation of Transcendental Functions. IEEE Trans. Circ. Syst. I Regul. Pap. 2020, 67, 177–188.
- Dong, H.; Wang, M.; Luo, Y.; Zheng, M.; An, M.; Ha, Y.; Pan, H. PLAC: Piecewise Linear Approximation Computation for All Nonlinear Unary Functions. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2020, 28, 2014–2027.
- Lyu, F.; Xu, X.; Wang, Y.; Luo, Y.; Wang, Y.; Pan, H. Ultralow-Latency VLSI Architecture Based on a Linear Approximation Method for Computing Nth Roots of Floating-Point Numbers. IEEE Trans. Circ. Syst. I Regul. Pap. 2021, 68, 715–727.
- Yeats, E.C.; Chen, Y.; Li, H. Improving Gradient Regularization using Complex-Valued Neural Networks. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11953–11963.
- Lyu, F.; Mao, Z.; Zhang, J.; Wang, Y.; Luo, Y. PWL-Based Architecture for the Logarithmic Computation of Floating-Point Numbers. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2021, 29, 1470–1474.
- Luo, Y.; Wang, Y.; Sun, H.; Zha, Y.; Wang, Z.; Pan, H. CORDIC-Based Architecture for Computing Nth Root and Its Implementation. IEEE Trans. Circ. Syst. I Regul. Pap. 2018, 65, 4183–4195.
- An, M.; Luo, Y.; Zheng, M.; Wang, Y.; Dong, H.; Wang, Z.; Peng, C.; Pan, H. Piecewise Parabolic Approximate Computation Based on an Error-Flattened Segmenter and a Novel Quantizer. Electronics 2021, 10, 2704.
Design | Total Number of Segments | Number of Iterations | qw | MAEreal | AAEreal | MAEimg | AAEimg |
---|---|---|---|---|---|---|---|
Proposed | 52 | – | 12 | ||||
[8] Interpolation | – | – | 11 | ||||
−29.38% | – | −100% | −29.92% | −19.51% | −23.65% | −18.90% | |
[10] CORDIC | – | 34 | 14 | ||||
−59.62% | −70.48% | −0% | −41.60% | −21.66% | −51.66% | −45.82% | |
[11] CORDIC | – | 35 | 14 | ||||
−56.32% | −65.23% | −100% | −13.27% | −2.58% | −27.12% | −8.28% | |
[17] CORDIC | – | 36 | 13 | ||||
−56.16% | −62.53% | −100% | −12.40% | −4.00% | −25.43% | −18.65% |
Design | Area (µm²) | Delay (ns) | Power (mW) | ADP (pJ·µm²) |
---|---|---|---|---|
Proposed | 9,451 | 11 | 2.72 | 282,773.92 |
[8] Interpolation | 26,409 | 13.2 | 7.79 | 2,715,584.65 |
Proposed vs. [8] | −64.21% | −16.67% | −65.08% | −89.59% |
[10] CORDIC | 46,773 | 34 | 7.76 | 12,340,588.32 |
Proposed vs. [10] | −79.79% | −67.65% | −64.95% | −97.71% |
[11] CORDIC | 51,402 | 27 | 8.90 | 12,351,900.60 |
Proposed vs. [11] | −81.61% | −59.26% | −69.44% | −97.71% |
[17] CORDIC | 39,165.48 | 29 | 6.88 | 7,814,296.57 |
Proposed vs. [17] | −75.87% | −62.07% | −60.47% | −96.38% |
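The ADP column in the table above is the product of area, delay, and power (µm² × ns × mW, where ns × mW = pJ), and each percentage row is the proposed design's relative change with respect to the listed reference, (proposed − reference)/reference. The sketch below recomputes these figures from the Area, Delay, and Power columns as a consistency check; `adp` and `rel_change` are hypothetical helper names.

```python
def adp(area_um2: float, delay_ns: float, power_mw: float) -> float:
    """Area x delay x power; ns x mW = pJ, so the unit is pJ*um^2."""
    return area_um2 * delay_ns * power_mw


def rel_change(proposed: float, reference: float) -> float:
    """Relative change of the proposed design w.r.t. a reference, in percent."""
    return (proposed - reference) / reference * 100.0


# (area in um^2, delay in ns, power in mW), copied from the table
designs = {
    "Proposed": (9451, 11, 2.72),
    "[8] Interpolation": (26409, 13.2, 7.79),
    "[10] CORDIC": (46773, 34, 7.76),
    "[11] CORDIC": (51402, 27, 8.90),
    "[17] CORDIC": (39165.48, 29, 6.88),
}

base = adp(*designs["Proposed"])
for name, metrics in designs.items():
    if name == "Proposed":
        continue
    print(f"Proposed vs. {name}: ADP {rel_change(base, adp(*metrics)):+.2f}%")
```

Running it prints relative ADP changes of −89.59%, −97.71%, −97.71%, and −96.38%, matching the table; applying `rel_change` to a single column reproduces the other percentages in the same way, for example −64.21% for area against [8].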
Design | LUTs | Registers | DSP | Delay (ns) | Power (W) |
---|---|---|---|---|---|
Proposed | 577 | 420 | 0 | 66 | 0.141 |
[8] Interpolation | 817 | 333 | 8 | 92.4 | 0.303 |
Proposed vs. [8] | −29.38% | – | −100% | −28.57% | −53.47% |
[10] CORDIC | 1429 | 1423 | 0 | 180.2 | 0.212 |
Proposed vs. [10] | −59.62% | −70.48% | −0% | −63.37% | −33.49% |
[11] CORDIC | 1321 | 1208 | 3 | 167.4 | 0.197 |
Proposed vs. [11] | −56.32% | −65.23% | −100% | −60.57% | −28.43% |
[17] CORDIC | 1316 | 1121 | 5 | 136.3 | 0.248 |
Proposed vs. [17] | −56.16% | −62.53% | −100% | −51.58% | −43.15% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Y.; Liang, X.; Xu, W.; Han, C.; Lyu, F.; Luo, Y.; Li, Y. An Efficient Hardware Implementation for Complex Square Root Calculation Using a PWL Method. Electronics 2023, 12, 3012. https://doi.org/10.3390/electronics12143012