F_Radish: Enhancing Silent Data Corruption Detection for Aerospace-Based Computing
Abstract
1. Introduction
- (1) Redundant assertions impair the detection efficiency.
- (2) The detection degree and benign detection ratio are not considered in the process of assertion selection.
- (1) The detection degree and benign detection ratio of an assertion are considered during assertion screening. For each program point, the importance of every assertion is evaluated from its detection degree and benign detection ratio, and only the most important assertion is kept at that program point.
- (2) Redundant assertions in neighbouring program points are handled. The redundancy degree of an assertion with respect to its neighbouring assertion is calculated. If it exceeds a specified threshold, the gain and loss of deleting the assertion are evaluated, and the assertion is deleted when the deletion is profitable.
- (3) F_Radish is evaluated against Radish. Its SDC detection efficiency is about twice that of Radish, and its detection degree is 10% higher. In addition, F_Radish reduces the benign detection ratio from 27.8% to 19.2%, a drop of 8.6 percentage points.
2. Related Work
3. Overview of the F_Radish Approach
4. The Stages of F_Radish
4.1. Screening Assertions for Every Program Point
- (1) Determining the benign detection ratio of assertions
- (2) Determining the detection degree of assertions
- (3) Calculating the importance of assertions
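The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 1: the `Assertion` class, the function names, the default weights, and the exact form of the importance score (a weighted, max-normalised detection degree minus a weighted, max-normalised benign detection ratio) are all hypothetical. The benign detection ratio is taken as the fraction of benign-error fault injections that the assertion flags, and the detection degree is assumed to come from an earlier fault-injection campaign.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """Hypothetical record of one assertion at a program point."""
    name: str
    detection_degree: float  # assumed to be measured beforehand
    benign_detected: int     # benign-error injections flagged by this assertion
    benign_total: int        # all benign-error injections on the relevant slice

    @property
    def benign_detection_ratio(self) -> float:
        if self.benign_total == 0:
            return 0.0
        return self.benign_detected / self.benign_total


def importance(a, max_degree, max_ratio, w_degree=0.5, w_ratio=0.5):
    """Assumed score: reward detection degree, penalise benign detections."""
    degree_term = a.detection_degree / max_degree if max_degree > 0 else 0.0
    ratio_term = a.benign_detection_ratio / max_ratio if max_ratio > 0 else 0.0
    return w_degree * degree_term - w_ratio * ratio_term


def screen_program_point(assertions, w_degree=0.5, w_ratio=0.5):
    """Keep only the most important assertion (expects a non-empty list)."""
    max_degree = max(a.detection_degree for a in assertions)
    max_ratio = max(a.benign_detection_ratio for a in assertions)
    return max(
        assertions,
        key=lambda a: importance(a, max_degree, max_ratio, w_degree, w_ratio),
    )


# Illustrative numbers only; they are not taken from the paper.
candidates = [
    Assertion("x >= 0", detection_degree=0.8, benign_detected=3, benign_total=10),
    Assertion("x < size", detection_degree=0.6, benign_detected=0, benign_total=10),
]
kept = screen_program_point(candidates)
print(kept.name)
```

With these illustrative numbers the second assertion is kept: its detection degree is lower, but it never fires on a benign error, so its assumed importance score is higher.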
4.2. Screening Assertions for Neighbouring Program Points
- (1) Calculating the redundancy degree of an assertion with respect to its neighbouring assertion
- (2) Evaluating the profit of deleting the assertion
Algorithm 1. Screening assertions for every program point.
Algorithm 2. Screening assertions for neighbouring program points.
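The second stage can be illustrated with a short Python sketch. It assumes that the redundancy degree is the share of SDC-causing fault injections invalidating an assertion that also invalidate its neighbour, which is one reading of the two fault-injection counts in the symbol table. The function names are hypothetical, and `gain` and `loss` are placeholders that the caller must supply, since the formulas behind them are not reproduced in this section.

```python
def redundancy_degree(n_invalidating_a: int, n_invalidating_both: int) -> float:
    """Assumed ratio: injections that cause SDC and invalidate both assertions,
    divided by injections that cause SDC and invalidate the assertion itself."""
    if n_invalidating_a == 0:
        return 0.0
    return n_invalidating_both / n_invalidating_a


def should_delete(n_invalidating_a: int, n_invalidating_both: int,
                  threshold: float, gain: float, loss: float) -> bool:
    """Delete only if the assertion is redundant enough AND deletion is profitable."""
    if redundancy_degree(n_invalidating_a, n_invalidating_both) <= threshold:
        return False  # not redundant enough, keep it without evaluating profit
    return gain - loss > 0


# Illustrative numbers only; they are not taken from the paper.
print(should_delete(40, 36, threshold=0.8, gain=5.0, loss=2.0))  # redundant and profitable
print(should_delete(40, 20, threshold=0.8, gain=5.0, loss=2.0))  # below the threshold
print(should_delete(40, 36, threshold=0.8, gain=1.0, loss=2.0))  # redundant but not profitable
```

Checking the threshold before the profit mirrors the order described for this stage: the gain and loss are evaluated only for assertions whose redundancy degree exceeds the threshold.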
5. Experimental Analysis
5.1. Experimental Setup
- (1) Fault injection.
- (2) Benchmarks.
- (3) Evaluation Metrics.
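As a rough illustration of how such metrics are usually computed from a fault-injection campaign, the Python sketch below uses definitions that are common in the SDC detection literature. They are assumptions here, not definitions quoted from this paper: coverage is the share of SDC-causing injections that are detected, overhead is the relative slowdown of the hardened program, and efficiency is coverage divided by overhead. All function and parameter names are hypothetical.

```python
import math


def sdc_coverage(detected_sdc: int, total_sdc: int) -> float:
    """Share of SDC-causing fault injections flagged by the detectors (assumed definition)."""
    return detected_sdc / total_sdc if total_sdc else 0.0


def detection_overhead(t_hardened: float, t_original: float) -> float:
    """Relative extra run time of the hardened program (assumed definition)."""
    return (t_hardened - t_original) / t_original


def detection_efficiency(coverage: float, overhead: float) -> float:
    """Coverage obtained per unit of overhead (assumed definition)."""
    return coverage / overhead


# Illustrative numbers only; they are not results from the paper.
coverage = sdc_coverage(detected_sdc=80, total_sdc=100)
overhead = detection_overhead(t_hardened=125.0, t_original=100.0)
efficiency = detection_efficiency(coverage, overhead)
print(coverage, overhead, efficiency)
```

Under these definitions, removing an assertion raises efficiency whenever the overhead it saves outweighs, proportionally, the coverage it gives up.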
5.2. Experimental Evaluation
- (1) SDC coverage, detection overhead and detection efficiency
- (2) Benign detection ratio
- (3) Detection degree
- (4) Different contributions of the two stages of F_Radish
6. Conclusions and Future Work
| Symbol | Description |
|---|---|
| O | The output set of the program, . |
| | The weight of , . |
| | The variable set of , }. |
| | The forward slice set of at , |
| | The size of . |
| | The k-th element of . |
| P | The initial hardened program. |
| | The filtered program of P after the first stage of F_Radish. |
| | The filtered program of after the two stages of F_Radish. |
| | The set of program points of P. |
| p | The p-th program point in . It is also called program point p for convenience. |
| | The set of assertions at p. |
| | The q-th assertion at p. |
| | The variable set of . |
| | The l-th element of . |
| | The forward slice set of at . |
| | The set of instructions that operate on one or more variables in at . |
| | The backward slice set of the instructions in . |
| | The number of fault injections that incur SDC and invalidate . |
| | The number of fault injections that not only result in SDC but also invalidate and . |
| | The threshold of redundancy degree, . |
| | The set of assertion-pairs of . |
| | The backward slice set of instructions that operate on at . |
| | The number of fault injections that are injected on the backward slice set of the instructions that operate on at , and not only result in a benign error but also are detected by . |
| | The number of fault injections that are injected on the backward slice set of the instructions that operate on at and result in a benign error. |
| | The weight of the detection degree. |
| | The weight of the benign detection ratio. |
| | The maximum detection degree. |
| | The maximum benign detection ratio. |
| | The instructions between assertions and . |
| x | The instruction number of the first instruction in . |
| | The number of instructions in . |
| | The SDC detection ratio of . |
| | The number of executions of instruction . |
| y | The instruction number of the first instruction of . |
| z | The instruction number of the last instruction of . |
| Source Code | Instruction | PR | PW | Node ID |
|---|---|---|---|---|
| k = 0 | mov dword ptr [esp + 0x24], 0x0 | - | [esp + 0x24] | 1 |
| i = 0 | mov dword ptr [esp + 0x20], 0x0 | - | [esp + 0x20] | 2 |
| i < size | mov eax, dword ptr [esp + 0x20] | [esp + 0x20] | eax | 3, 9, 15 |
| | cmp eax, dword ptr [esp - 0x8] | eax, [esp + 0x1c] | eflag | 4, 10, 16 |
| | jl 0x80486da | eflag | eip | 5, 11, 17 |
| k = k + i | mov eax, dword ptr [esp + 0x20] | [esp + 0x20] | eax | 6, 12 |
| | add dword ptr [esp + 0x24], eax | eax, [esp + 0x24] | [esp + 0x24] | 7, 13 |
| i++ | add dword ptr [esp + 0x20], 0x1 | [esp + 0x20] | [esp + 0x20] | 8, 14 |
| return k | mov eax, dword ptr [esp + 0x24] | [esp + 0x24] | eax | 18 |
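The table above carries enough information to derive a backward slice by hand, and the Python sketch below does the same mechanically. It assumes that the PR and PW columns list the locations each dynamic instruction reads and writes, and it follows data dependences only (control dependences are ignored). The function names are hypothetical. For the `cmp` instruction the memory operand is taken from the PR column.

```python
def build_trace():
    """Rebuild the 18-node dynamic trace of the table as (node, reads, writes)."""
    k, i, size = "[esp+0x24]", "[esp+0x20]", "[esp+0x1c]"
    check = [({i}, {"eax"}), ({"eax", size}, {"eflag"}), ({"eflag"}, {"eip"})]
    body = [({i}, {"eax"}), ({"eax", k}, {k}), ({i}, {i})]
    steps = [(set(), {k}), (set(), {i})]          # nodes 1-2: k = 0, i = 0
    steps += check + body + check + body + check  # nodes 3-17: two loop iterations
    steps += [({k}, {"eax"})]                     # node 18: return k
    return [(n, r, w) for n, (r, w) in enumerate(steps, start=1)]


def data_dependences(trace):
    """Map each node to the nodes that last wrote the locations it reads."""
    last_writer, deps = {}, {}
    for node, reads, writes in trace:
        deps[node] = {last_writer[loc] for loc in reads if loc in last_writer}
        for loc in writes:
            last_writer[loc] = node
    return deps


def backward_slice(deps, target):
    """All nodes that the target transitively depends on, including itself."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps[node])
    return seen


deps = data_dependences(build_trace())
slice_of_return = sorted(backward_slice(deps, 18))
print(slice_of_return)
```

For this trace the slice of node 18 contains nodes 1, 2, 6, 7, 8, 12, 13 and 18. The loop-condition nodes are absent because only data dependences are followed, and node 14 is absent because nothing in the slice reads `i` after its second increment.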
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yang, N.; Wang, Y. F_Radish: Enhancing Silent Data Corruption Detection for Aerospace-Based Computing. Electronics 2021, 10, 61. https://doi.org/10.3390/electronics10010061