Low-Latency Hardware Implementation of High-Precision Hyperbolic Functions Sinhx and Coshx Based on Improved CORDIC Algorithm
Abstract
1. Introduction
2. Mathematical Background
2.1. Basic CORDIC Algorithm
$$Y_{i+1} = Y_i + \sigma_i 2^{-i} X_i$$
$$Z_{i+1} = Z_i - \sigma_i \alpha_i$$
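The recurrences above can be exercised with a short floating-point sketch of rotation-mode hyperbolic CORDIC. Several details here are assumptions rather than statements from this section: the X update is taken in the standard hyperbolic form X_{i+1} = X_i + σ_i 2^{-i} Y_i, the elementary angle is α_i = atanh(2^{-i}), σ_i follows the sign of Z_i, iterations start at i = 1, and indices 4, 13, 40, ... are repeated as in the usual convergence fix. The function name `hyperbolic_cordic` is illustrative.

```python
import math


def hyperbolic_cordic(theta, n_iter=40):
    """Rotation-mode hyperbolic CORDIC sketch: returns (cosh(theta), sinh(theta)).

    Valid only for |theta| inside the basic range of convergence (about 1.118).
    """
    # Build the iteration index sequence 1, 2, 3, 4, 4, 5, ..., 13, 13, ...
    # (repeating 4, 13, 40, ... is an assumption taken from standard practice).
    indices = []
    i, next_repeat = 1, 4
    while len(indices) < n_iter:
        indices.append(i)
        if i == next_repeat and len(indices) < n_iter:
            indices.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1

    # Scale factor accumulated by the chosen iterations.
    gain = 1.0
    for k in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * k))

    # Pre-scaling X removes the gain from the final result.
    x, y, z = 1.0 / gain, 0.0, theta
    for k in indices:
        sigma = 1.0 if z >= 0 else -1.0
        t = 2.0 ** (-k)
        x, y = x + sigma * t * y, y + sigma * t * x  # X (assumed form) and Y updates
        z -= sigma * math.atanh(t)                   # Z update
    return x, y


cosh_val, sinh_val = hyperbolic_cordic(0.5)
```

With 40 iterations the results agree with `math.cosh` and `math.sinh` to roughly the size of the last elementary angle; a hardware design would use fixed-point shifts and a stored angle table instead of `math.atanh`.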
2.2. Computation of Functions Sinhx and Coshx with CORDIC
2.3. Range of Convergence for Basic Hyperbolic CORDIC Algorithm
2.4. Another Computation of Functions Sinhx and Coshx
3. Quadruple-Step-Ahead Hyperbolic CORDIC Architecture
3.1. Improvement of Basic CORDIC Algorithm
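$$
\begin{aligned}
X_{i+4} ={}& X_i \Big\{ 1 + 2^{-(4i+6)}\,\sigma_{i+3}\sigma_{i+2}\sigma_{i+1}\sigma_i \\
&\quad + 2^{-(2i+5)} \big[ 16\sigma_{i+1}\sigma_i + 8\sigma_{i+2}\sigma_i + 4\sigma_{i+2}\sigma_{i+1} + 4\sigma_{i+3}\sigma_i + 2\sigma_{i+3}\sigma_{i+1} + \sigma_{i+3}\sigma_{i+2} \big] \Big\} \\
&+ Y_i \Big\{ 2^{-(i+3)} \big[ 8\sigma_i + 4\sigma_{i+1} + 2\sigma_{i+2} + \sigma_{i+3} \big] \\
&\quad + 2^{-(3i+6)} \big[ 8\sigma_{i+2}\sigma_{i+1}\sigma_i + 4\sigma_{i+3}\sigma_{i+1}\sigma_i + 2\sigma_{i+3}\sigma_{i+2}\sigma_i + \sigma_{i+3}\sigma_{i+2}\sigma_{i+1} \big] \Big\}
\end{aligned}
$$

$$
\begin{aligned}
Y_{i+4} ={}& Y_i \Big\{ 1 + 2^{-(4i+6)}\,\sigma_{i+3}\sigma_{i+2}\sigma_{i+1}\sigma_i \\
&\quad + 2^{-(2i+5)} \big[ 16\sigma_{i+1}\sigma_i + 8\sigma_{i+2}\sigma_i + 4\sigma_{i+2}\sigma_{i+1} + 4\sigma_{i+3}\sigma_i + 2\sigma_{i+3}\sigma_{i+1} + \sigma_{i+3}\sigma_{i+2} \big] \Big\} \\
&+ X_i \Big\{ 2^{-(i+3)} \big[ 8\sigma_i + 4\sigma_{i+1} + 2\sigma_{i+2} + \sigma_{i+3} \big] \\
&\quad + 2^{-(3i+6)} \big[ 8\sigma_{i+2}\sigma_{i+1}\sigma_i + 4\sigma_{i+3}\sigma_{i+1}\sigma_i + 2\sigma_{i+3}\sigma_{i+2}\sigma_i + \sigma_{i+3}\sigma_{i+2}\sigma_{i+1} \big] \Big\}
\end{aligned}
$$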
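As a sanity check on the four-step-ahead expressions, the following sketch compares one merged update against four consecutive single hyperbolic iterations using exact rational arithmetic. The single-step X update X_{k+1} = X_k + σ_k 2^{-k} Y_k is assumed (standard hyperbolic form), and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def four_single_steps(x, y, i, s):
    """Apply iterations i, i+1, i+2, i+3 one after another."""
    for k in range(4):
        t = Fraction(1, 2 ** (i + k))
        x, y = x + s[k] * t * y, y + s[k] * t * x
    return x, y


def one_merged_step(x, y, i, s):
    """Apply the same four iterations as a single merged update."""
    s0, s1, s2, s3 = s
    # Coefficient of X_i in X_{i+4} (and of Y_i in Y_{i+4}).
    same = (1
            + Fraction(s3 * s2 * s1 * s0, 2 ** (4 * i + 6))
            + Fraction(16 * s1 * s0 + 8 * s2 * s0 + 4 * s2 * s1
                       + 4 * s3 * s0 + 2 * s3 * s1 + s3 * s2, 2 ** (2 * i + 5)))
    # Coefficient of Y_i in X_{i+4} (and of X_i in Y_{i+4}).
    cross = (Fraction(8 * s0 + 4 * s1 + 2 * s2 + s3, 2 ** (i + 3))
             + Fraction(8 * s2 * s1 * s0 + 4 * s3 * s1 * s0
                        + 2 * s3 * s2 * s0 + s3 * s2 * s1, 2 ** (3 * i + 6)))
    return x * same + y * cross, y * same + x * cross


x0, y0 = Fraction(3, 7), Fraction(-2, 5)
all_match = all(
    four_single_steps(x0, y0, i, s) == one_merged_step(x0, y0, i, s)
    for i in range(1, 6)
    for s in product((-1, 1), repeat=4)
)
```

Because `Fraction` is exact, `all_match` being true means the merged coefficients are algebraically identical to the product of the four single-step updates for every sign pattern tested, not merely close in floating point.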
3.2. General Architecture of QH-CORDIC
3.3. ROC of QH-CORDIC for Exponential Function
3.4. Validity of Computing Exponential Function with QH-CORDIC
3.5. Simplified Computing of B in Formula (16) or (19)
4. Hardware Implementation of Hyperbolic Functions Sinhx and Coshx with QH-CORDIC
5. Implementation and Comparisons
5.1. Functional Verification
5.2. FPGA Implementation Analysis
5.3. ASIC Implementation Performance
5.4. Related Works and Comparisons
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Muller, J.M. Elementary Functions: Algorithms and Implementations, 2nd ed.; Birkhäuser: Basel, Switzerland, 2006.
2. Parhami, B. Computer Arithmetic: Algorithms and Hardware Designs; Oxford University Press: Oxford, UK, 1999.
3. Saha, A.; Kumar, K.G.; Ghosh, A.; Naskar, M.K. Area efficient architecture of Hyperbolic functions for high frequency applications. In Proceedings of the 2017 International Conference on Circuits, Controls, and Communications (CCUBE), Bangalore, India, 15–16 December 2017; pp. 139–142.
4. Tang, P.T.P. Table-lookup Algorithms for Elementary Functions and Their Error Analysis. In Proceedings of the 10th IEEE Symposium on Computer Arithmetic, Grenoble, France, 26–28 June 1991; pp. 232–236.
5. Saint-Geniès, H.d.L.; Defour, D.; Revy, G. Exact Lookup Tables for the Evaluation of Trigonometric and Hyperbolic Functions. IEEE Trans. Comput. 2017, 66, 2058–2071.
6. Koren, I.; Zinaty, O. Evaluating Elementary Functions in a Numerical Coprocessor Based on Rational Approximations. IEEE Trans. Comput. 1990, 39, 1030–1037.
7. Schulte, M.J.; Swartzlander, E.E. Hardware Design for Exactly Rounded Elementary Functions. IEEE Trans. Comput. 1994, 43, 964–973.
8. Volder, J.E. The CORDIC Trigonometric Computing Technique. IEEE Trans. Electron. Comput. 1959, EC-8, 330–334.
9. Boudabous, A.; Ghozzi, F.; Kharrat, M.W.; Masmoudi, N. Implementation of hyperbolic functions using CORDIC algorithm. In Proceedings of the 16th International Conference on Microelectronics, Tunis, Tunisia, 6–8 December 2004; pp. 738–741.
10. Vazquez, Á.; Villalba, J.; Antelo, E. Computation of Decimal Transcendental Functions Using the CORDIC Algorithm. In Proceedings of the 2009 19th IEEE Symposium on Computer Arithmetic, Portland, OR, USA, 8–10 June 2009; pp. 179–186.
11. Ross, D.-M.; Miller, S.; Mihai, S. Exploration of sign precomputation-based CORDIC in reconfigurable systems. In Proceedings of the 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 6–9 November 2011; pp. 2186–2191.
12. Kuhlmann, M.; Parhi, K.K. P-CORDIC: A Precomputation Based Rotation CORDIC Algorithm. EURASIP J. Adv. Signal Process. 2002, 2002, 936–943.
13. Srikanthan, T.; Gisuthan, B. A novel technique for eliminating iterative based computation of polarity of micro-rotations in CORDIC based sine-cosine generators. Microprocess. Microsyst. 2002, 26, 243–252.
14. Gisuthan, B.; Srikanthan, T. FLAT CORDIC: A Unified Architecture for High-Speed Generation of Trigonometric and Hyperbolic Functions. In Proceedings of the 43rd Midwest Symposium on Circuits and Systems (MWSCAS 2000), Lansing, MI, USA, 8–11 August 2000; pp. 1414–1417.
15. Juang, T.-B.; Hsiao, S.-F.; Tsai, M.-Y. Para-CORDIC: Parallel CORDIC Rotation Algorithm. IEEE Trans. Circuits Syst. I Regul. Pap. 2004, 51, 1515–1524.
16. Gaines, B.R. Stochastic Computing. In Proceedings of the American Federation of Information Processing Societies Spring Joint Computer Conf, Atlantic City, NJ, USA, 18–20 April 1967.
17. Parhi, K.; Liu, Y. Computing Arithmetic Functions Using Stochastic Logic by Series Expansion. IEEE Trans. Emerg. Top. Comput. 2016, 7, 1–13.
18. Brown, B.D.; Card, H.C. Stochastic Neural Computation I: Computational Elements. IEEE Trans. Comput. 2001, 50, 891–905.
19. Liu, Y.; Parhi, K.K. Computing hyperbolic tangent and sigmoid functions using stochastic logic. In Proceedings of the 2016 50th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 6–9 November 2016; pp. 1580–1585.
20. Luong, T.; Nguyen, V.; Nguyen, A.; Popovici, E. Efficient Architectures and Implementation of Arithmetic Functions Approximation Based Stochastic Computing. In Proceedings of the 2019 IEEE 30th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), New York, NY, USA, 15–17 July 2019; pp. 281–287.
21. Walther, J.S. A Unified Algorithm for Elementary Functions. In Proceedings of the AFIPS Spring Joint Computer Conference, New York, NY, USA, 18–20 May 1971; pp. 379–385.
22. Kharrat, M.W.; Loulou, M.; Masmoudi, N.; Kamoun, L. A New Method to Implement CORDIC Algorithm. In Proceedings of the International Conference on Electronics, Circuits and Systems ICECS, Malta, Malta, 2–5 September 2001.
23. Eklund, N. CORDIC: Elementary Function Computation Using Recursive Sequences. Coll. Math. J. 2001, 32, 330–333.
24. Llamocca-Obregón, R.D.; Agurto-Ríos, P.C. A fixed-point implementation of the expanded hyperbolic CORDIC algorithm. Lat. Am. Appl. Res. 2007, 37, 83–91.
25. De Dinechin, F.; Pasca, B. Floating-point exponential functions for DSP-enabled FPGAs. In Proceedings of the IEEE International Conference on Field-Programmable Technology, Beijing, China, 8–10 December 2010; pp. 110–117.
26. Langhammer, M.; Pasca, B. Single precision logarithm and exponential architectures for hard floating-point enabled FPGAs. IEEE Trans. Comput. 2017, 66, 2031–2043.
27. Pineiro, J.-A.; Ercegovac, M.D.; Bruguera, J.D. Algorithm and architecture for logarithm, exponential, and powering computation. IEEE Trans. Comput. 2004, 53, 1085–1096.
28. Chen, D.; Han, L.; Ko, S.B. Decimal floating-point antilogarithmic converter based on selection by rounding: Algorithm and architecture. IET Comput. Digit. Technol. 2012, 6, 277–289.
29. Chen, D.; Han, L.; Choi, Y.; Ko, S.-B. Improved decimal floating-point logarithmic converter based on selection by rounding. IEEE Trans. Comput. 2012, 61, 607–621.
30. Meher, P.K.; Valls, J.; Juang, T.-B.; Sridharan, K.; Maharatna, K. 50 years of CORDIC: Algorithms, architectures, and applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2009, 56, 1893–1907.
31. Wang, Y.; Dinavahi, V. Real-time digital multi-function protection system on reconfigurable hardware. IET Gen. Transm. Distrib. 2016, 10, 2295–2305.
32. Phatak, D.S. Double step branching CORDIC: A new algorithm for fast sine and cosine generation. IEEE Trans. Comput. 1998, 47, 587–602.
33. Xia, J.; Fu, W.; Liu, M.; Wang, M. Low-Latency Bit-Accurate Architecture for Configurable Precision Floating-Point Division. Appl. Sci. 2021, 11, 4988.
34. Huai, L.; Li, P.; Sobelman, G.E.; Lilja, D.J. Stochastic computing implementation of trigonometric and hyperbolic functions. In Proceedings of the 2017 IEEE 12th International Conference on ASIC (ASICON), Guiyang, China, 25–28 October 2017; pp. 553–556.
35. Hayes, J.P. Introduction to stochastic computing and its challenges. In Proceedings of the 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 8–12 June 2015; pp. 1–3.
36. Chen, T.; Ting, P.; Hayes, J.P. Achieving progressive precision in stochastic computing. In Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, QC, Canada, 14–16 November 2017; pp. 1320–1324.
m | Mode | Initial Values | Function: Xn | Function: Yn or Zn |
---|---|---|---|---|
1 | R | $X_0 = 1,\ Y_0 = 0,\ Z_0 = \theta$ | $\cos\theta$ | $Y_n = \sin\theta$ |
−1 | R | $X_1 = 1,\ Y_1 = 0,\ Z_1 = \theta$ | $\cosh\theta$ | $Y_n = \sinh\theta$ |
−1 | R | $X_1 = a,\ Y_1 = a,\ Z_1 = \theta$ | $a e^{\theta}$ | $Y_n = a e^{\theta}$ |
1 | V | $X_0 = 1,\ Y_0 = a,\ Z_0 = \pi/2$ | $\sqrt{a^2 + 1}$ | $Z_n = \cot^{-1} a$ |
−1 | V | $X_1 = a,\ Y_1 = 1,\ Z_1 = 0$ | $\sqrt{a^2 - 1}$ | $Z_n = \coth^{-1} a$ |
−1 | V | $X_1 = a + 1,\ Y_1 = a - 1,\ Z_1 = 0$ | $2\sqrt{a}$ | $Z_n = 0.5 \ln a$ |
−1 | V | $X_1 = a + 1/4,\ Y_1 = a - 1/4,\ Z_1 = 0$ | $\sqrt{a}$ | $Z_n = 0.5 \ln(4a)$ |
−1 | V | $X_1 = a + b,\ Y_1 = a - b,\ Z_1 = 0$ | $2\sqrt{ab}$ | $Z_n = 0.5 \ln(a/b)$ |
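
One vectoring-mode row of the table above (m = −1, X1 = a + 1, Y1 = a − 1, Z1 = 0) can be illustrated with a floating-point sketch. The assumptions are the same standard ones, none of which are stated in the table itself: σ_i is chosen to drive Y toward zero, α_i = atanh(2^{-i}), indices 4, 13, 40, ... are repeated, and the scale factor is divided out at the end so that Xn matches the tabulated value. The helper names are hypothetical.

```python
import math


def iteration_indices(n_iter):
    """Index sequence 1, 2, 3, 4, 4, 5, ... with 4, 13, 40, ... repeated (assumed)."""
    indices = []
    i, next_repeat = 1, 4
    while len(indices) < n_iter:
        indices.append(i)
        if i == next_repeat and len(indices) < n_iter:
            indices.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1
    return indices


def hyperbolic_vectoring(x, y, z=0.0, n_iter=40):
    """Vectoring-mode hyperbolic CORDIC sketch.

    Returns (sqrt(x^2 - y^2), z + atanh(y / x)) for x > |y| inside the
    basic range of convergence.
    """
    indices = iteration_indices(n_iter)
    gain = 1.0
    for k in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * k))
    for k in indices:
        sigma = -1.0 if y >= 0 else 1.0  # push Y toward zero (assumes x > 0)
        t = 2.0 ** (-k)
        x, y = x + sigma * t * y, y + sigma * t * x
        z -= sigma * math.atanh(t)
    return x / gain, z


a = 2.0
xn, zn = hyperbolic_vectoring(a + 1.0, a - 1.0)
# xn approximates 2*sqrt(a) and zn approximates 0.5*ln(a)
```

The same routine with X1 = a + b and Y1 = a − b would follow the last row of the table, provided atanh(Y1/X1) stays within the convergence range.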
Case | $\sigma_i$ | $\sigma_{i+1}$ | $\sigma_{i+2}$ | $\sigma_{i+3}$ | $Y_{i+4}$ |
---|---|---|---|---|---|
1 | −1 | −1 | −1 | −1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} + 35 \cdot 2^{-(2i+5)}] + X_i [-15 \cdot 2^{-(i+3)} - 15 \cdot 2^{-(3i+6)}]$ |
2 | −1 | −1 | −1 | 1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} + 21 \cdot 2^{-(2i+5)}] + X_i [-13 \cdot 2^{-(i+3)} - 2^{-(3i+6)}]$ |
3 | −1 | −1 | 1 | −1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} + 9 \cdot 2^{-(2i+5)}] + X_i [-11 \cdot 2^{-(i+3)} + 7 \cdot 2^{-(3i+6)}]$ |
4 | −1 | 1 | −1 | −1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} - 9 \cdot 2^{-(2i+5)}] + X_i [-7 \cdot 2^{-(i+3)} + 11 \cdot 2^{-(3i+6)}]$ |
5 | 1 | −1 | −1 | −1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} - 21 \cdot 2^{-(2i+5)}] + X_i [2^{-(i+3)} + 13 \cdot 2^{-(3i+6)}]$ |
6 | −1 | −1 | 1 | 1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} - 2^{-(2i+5)}] + X_i [-9 \cdot 2^{-(i+3)} + 9 \cdot 2^{-(3i+6)}]$ |
7 | −1 | 1 | −1 | 1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} - 15 \cdot 2^{-(2i+5)}] + X_i [-5 \cdot 2^{-(i+3)} + 5 \cdot 2^{-(3i+6)}]$ |
8 | −1 | 1 | 1 | −1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} - 19 \cdot 2^{-(2i+5)}] + X_i [-3 \cdot 2^{-(i+3)} - 3 \cdot 2^{-(3i+6)}]$ |
9 | 1 | −1 | −1 | 1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} - 19 \cdot 2^{-(2i+5)}] + X_i [3 \cdot 2^{-(i+3)} + 3 \cdot 2^{-(3i+6)}]$ |
10 | 1 | −1 | 1 | −1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} - 15 \cdot 2^{-(2i+5)}] + X_i [5 \cdot 2^{-(i+3)} - 5 \cdot 2^{-(3i+6)}]$ |
11 | 1 | 1 | −1 | −1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} - 2^{-(2i+5)}] + X_i [9 \cdot 2^{-(i+3)} - 9 \cdot 2^{-(3i+6)}]$ |
12 | −1 | 1 | 1 | 1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} - 21 \cdot 2^{-(2i+5)}] + X_i [-2^{-(i+3)} - 13 \cdot 2^{-(3i+6)}]$ |
13 | 1 | −1 | 1 | 1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} - 9 \cdot 2^{-(2i+5)}] + X_i [7 \cdot 2^{-(i+3)} - 11 \cdot 2^{-(3i+6)}]$ |
14 | 1 | 1 | −1 | 1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} + 9 \cdot 2^{-(2i+5)}] + X_i [11 \cdot 2^{-(i+3)} - 7 \cdot 2^{-(3i+6)}]$ |
15 | 1 | 1 | 1 | −1 | $Y_{i+4} = Y_i [1 - 2^{-(4i+6)} + 21 \cdot 2^{-(2i+5)}] + X_i [13 \cdot 2^{-(i+3)} + 2^{-(3i+6)}]$ |
16 | 1 | 1 | 1 | 1 | $Y_{i+4} = Y_i [1 + 2^{-(4i+6)} + 35 \cdot 2^{-(2i+5)}] + X_i [15 \cdot 2^{-(i+3)} + 15 \cdot 2^{-(3i+6)}]$ |
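
The integer multipliers in the sixteen cases above follow directly from the sign pattern, so the table can be regenerated rather than typed by hand. The sketch below does that; `case_coefficients` is an illustrative name, and the tuple layout (quadruple-product sign, pair coefficient, single coefficient, triple coefficient) is a convention chosen here, not one taken from the paper.

```python
from itertools import product


def case_coefficients(s):
    """Integer multipliers of the four power-of-two terms for one sign pattern.

    s = (sigma_i, sigma_{i+1}, sigma_{i+2}, sigma_{i+3}), each +1 or -1.
    Returns (quad, pair, single, triple), the multipliers of
    2^-(4i+6), 2^-(2i+5), 2^-(i+3) and 2^-(3i+6) respectively.
    """
    s0, s1, s2, s3 = s
    quad = s3 * s2 * s1 * s0
    pair = 16 * s1 * s0 + 8 * s2 * s0 + 4 * s2 * s1 + 4 * s3 * s0 + 2 * s3 * s1 + s3 * s2
    single = 8 * s0 + 4 * s1 + 2 * s2 + s3
    triple = 8 * s2 * s1 * s0 + 4 * s3 * s1 * s0 + 2 * s3 * s2 * s0 + s3 * s2 * s1
    return quad, pair, single, triple


# One entry per sign pattern, 16 in total.
coefficient_table = {s: case_coefficients(s) for s in product((-1, 1), repeat=4)}

for signs, coeffs in coefficient_table.items():
    print(signs, coeffs)
```

Negating all four signs leaves the first two multipliers unchanged and flips the last two, which is why cases 9 to 16 mirror cases 8 to 1 in the table.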
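Case | $\sigma_i$ | $\sigma_{i+1}$ | $\sigma_{i+2}$ | $\sigma_{i+3}$ | $Z_{i+4}$ |
---|---|---|---|---|---|
1 | −1 | −1 | −1 | −1 | $Z_{i+4} = Z_i + \alpha_i + \alpha_{i+1} + \alpha_{i+2} + \alpha_{i+3}$ |
2 | −1 | −1 | −1 | 1 | $Z_{i+4} = Z_i + \alpha_i + \alpha_{i+1} + \alpha_{i+2} - \alpha_{i+3}$ |
3 | −1 | −1 | 1 | −1 | $Z_{i+4} = Z_i + \alpha_i + \alpha_{i+1} - \alpha_{i+2} + \alpha_{i+3}$ |
4 | −1 | 1 | −1 | −1 | $Z_{i+4} = Z_i + \alpha_i - \alpha_{i+1} + \alpha_{i+2} + \alpha_{i+3}$ |
5 | 1 | −1 | −1 | −1 | $Z_{i+4} = Z_i - \alpha_i + \alpha_{i+1} + \alpha_{i+2} + \alpha_{i+3}$ |
6 | −1 | −1 | 1 | 1 | $Z_{i+4} = Z_i + \alpha_i + \alpha_{i+1} - \alpha_{i+2} - \alpha_{i+3}$ |
7 | −1 | 1 | −1 | 1 | $Z_{i+4} = Z_i + \alpha_i - \alpha_{i+1} + \alpha_{i+2} - \alpha_{i+3}$ |
8 | −1 | 1 | 1 | −1 | $Z_{i+4} = Z_i + \alpha_i - \alpha_{i+1} - \alpha_{i+2} + \alpha_{i+3}$ |
9 | 1 | −1 | −1 | 1 | $Z_{i+4} = Z_i - \alpha_i + \alpha_{i+1} + \alpha_{i+2} - \alpha_{i+3}$ |
10 | 1 | −1 | 1 | −1 | $Z_{i+4} = Z_i - \alpha_i + \alpha_{i+1} - \alpha_{i+2} + \alpha_{i+3}$ |
11 | 1 | 1 | −1 | −1 | $Z_{i+4} = Z_i - \alpha_i - \alpha_{i+1} + \alpha_{i+2} + \alpha_{i+3}$ |
12 | −1 | 1 | 1 | 1 | $Z_{i+4} = Z_i + \alpha_i - \alpha_{i+1} - \alpha_{i+2} - \alpha_{i+3}$ |
13 | 1 | −1 | 1 | 1 | $Z_{i+4} = Z_i - \alpha_i + \alpha_{i+1} - \alpha_{i+2} - \alpha_{i+3}$ |
14 | 1 | 1 | −1 | 1 | $Z_{i+4} = Z_i - \alpha_i - \alpha_{i+1} + \alpha_{i+2} - \alpha_{i+3}$ |
15 | 1 | 1 | 1 | −1 | $Z_{i+4} = Z_i - \alpha_i - \alpha_{i+1} - \alpha_{i+2} + \alpha_{i+3}$ |
16 | 1 | 1 | 1 | 1 | $Z_{i+4} = Z_i - \alpha_i - \alpha_{i+1} - \alpha_{i+2} - \alpha_{i+3}$ |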
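
Every row of the Z table is the same rule, Z_{i+4} = Z_i − Σ σ_{i+k} α_{i+k} for k = 0..3, so the four angle updates can be folded into one subtraction of a precomputed combined angle. The sketch below builds such a 16-entry table for one value of i. It assumes α_k = atanh(2^{-k}) and uses floating point purely for illustration; whether the paper's hardware stores combined angles this way is not stated in this section.

```python
import math
from itertools import product


def alpha(k):
    """Elementary hyperbolic angle (assumed to be atanh(2^-k))."""
    return math.atanh(2.0 ** (-k))


def merged_angle_table(i):
    """Combined angle sum(sigma_{i+k} * alpha_{i+k}) for each of the 16 sign patterns."""
    return {
        s: sum(s[k] * alpha(i + k) for k in range(4))
        for s in product((-1, 1), repeat=4)
    }


def z_four_single_steps(z, i, s):
    """Reference: apply the four single-step Z updates in sequence."""
    for k in range(4):
        z -= s[k] * alpha(i + k)
    return z


i = 1
angle_lut = merged_angle_table(i)
signs = (1, -1, 1, 1)            # case 13 in the table
z_start = 0.75
z_merged = z_start - angle_lut[signs]
z_reference = z_four_single_steps(z_start, i, signs)
```

The two results agree up to floating-point rounding, since the merged update only reorders the same four additions.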
Metric | Paper [3] | Paper [32] | Proposed |
---|---|---|---|
Clock cycles | 128 (100%) | 64 (50%) | 32 (25%) |
Time taken (ns) | 1280 (100%) | 640 (50%) | 320 (25%) |
Slice | 1106 (100%) | 7624 (689.3%) | 9430 (852.6%) |
Slice flip flops | 337 (100%) | 462 (137.1%) | 512 (151.9%) |
Four-input LUTs | 3403 (100%) | 24168 (710.2%) | 29172 (857.2%) |
Bonded IOBs | 403 (100%) | 425 (105.5%) | 403 (100%) |
Metric | Paper [3] | Paper [32] | Proposed |
---|---|---|---|
Area (μm²) | 451782 (100%) | 909540 (201.3%) | 1321500 (292.5%) |
Power (mW) | 4.11 (100%) | 8.12 (197.6%) | 12.60 (306.6%) |
Latency (cycle) | 137 (100%) | 73 (53.3%) | 41 (29.9%) |
Period (ns) | 3.3 | 3.3 | 3.3 |
Total time (ns) | 452.1 (100%) | 240.9 (53.3%) | 135.3 (29.9%) |
ATP (mm²·ns) | 204.25 (100%) | 219.11 (107.3%) | 178.79 (87.5%) |
Total energy (fJ) | 1858.13 (100%) | 1956.11 (105.3%) | 1580.04 (85%) |
Energy efficiency (fJ/bit) | 14.52 (100%) | 15.28 (105.2%) | 12.34 (84.9%) |
Area efficiency (bit/(mm²·ns)) | 0.63 (100%) | 0.58 (92.1%) | 0.71 (112.7%) |
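
Two of the derived ASIC rows can be recomputed from the raw ones. The definitions used below are assumptions, since the table footnotes are not reproduced here: total time is taken as latency × period, and ATP as area (converted to mm²) × total time. Under those assumptions the sketch reproduces the tabulated total time exactly and the ATP to within the last printed digit.

```python
PERIOD_NS = 3.3  # clock period shared by all three designs

# Raw values copied from the ASIC table above.
designs = {
    "Paper [3]": {"area_um2": 451782, "latency_cycles": 137},
    "Paper [32]": {"area_um2": 909540, "latency_cycles": 73},
    "Proposed": {"area_um2": 1321500, "latency_cycles": 41},
}


def derived_metrics(design):
    """Return (total_time_ns, atp_mm2_ns) under the assumed definitions."""
    total_time_ns = design["latency_cycles"] * PERIOD_NS
    area_mm2 = design["area_um2"] * 1e-6
    return total_time_ns, area_mm2 * total_time_ns


results = {name: derived_metrics(d) for name, d in designs.items()}
for name, (t_ns, atp) in results.items():
    print(f"{name}: total time = {t_ns:.1f} ns, ATP = {atp:.2f} mm^2*ns")
```

The comparison this supports is the one the table makes: the proposed design occupies the most area but, because its latency is lowest, ends up with the smallest area-time product of the three.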
Method | LUT Method | Stochastic Computing | Stochastic Computing | CORDIC Algorithms | CORDIC Algorithms | CORDIC Algorithms | CORDIC Algorithms |
---|---|---|---|---|---|---|---|
Design | Paper [5] | Paper [34] | Paper [20] | Paper [9] | Paper [3] | Paper [32] | Proposed |
Accuracy (bit) | 4 | 10 | 7 | 8 | 4 | 10 | 128 |
Function Error | - | - | MAE = 0.0043 | MRE = 0.45 | MAE = 0.043 | - | $< 2^{-113}$ |
LUT volume | 77 × 14 | No LUTs | 20 × 8 | Entry depth = 8 | Entry depth = 4 | Entry depth = 10 | 136 × 128 |
ROC | [0, 10080] | [0, 1] | [0, 1] | [−1, 1] | [−1.207, 1.207] | [−1.743, 1.743] | $(-2^{15}, 2^{15})$ |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fu, W.; Xia, J.; Lin, X.; Liu, M.; Wang, M. Low-Latency Hardware Implementation of High-Precision Hyperbolic Functions Sinhx and Coshx Based on Improved CORDIC Algorithm. Electronics 2021, 10, 2533. https://doi.org/10.3390/electronics10202533