Parallel Dislocation Model Implementation for Earthquake Source Parameter Estimation on Multi-Threaded GPU
Abstract
1. Introduction
2. Background and Related Work
2.1. Earthquake Source Parameter Estimation
2.2. Nonlinear Optimization
2.3. GPU and CUDA
3. Our Proposed Approach
3.1. Dislocation Model and Dataset
3.2. GPU Kernel Optimization
- Common subexpression elimination (CSE);
- Constant caching;
- Thread block size.
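As a minimal illustration of the first item, the sketch below hoists trigonometric terms that depend only on a fault angle out of a per-point loop, so they are evaluated once instead of once per observation point. The function names and the toy rotation are hypothetical and are not the paper's kernel; in a CUDA kernel the same idea would apply within each thread.

```python
import math


def rotate_naive(points, angle_deg):
    """Rotate points by angle_deg, recomputing sin/cos for every point (no CSE)."""
    out = []
    for x, y in points:
        a = math.radians(angle_deg)
        out.append((x * math.cos(a) + y * math.sin(a),
                    -x * math.sin(a) + y * math.cos(a)))
    return out


def rotate_cse(points, angle_deg):
    """Same rotation with the common subexpressions computed once."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)  # precomputed subexpressions
    return [(x * cos_a + y * sin_a, -x * sin_a + y * cos_a) for x, y in points]


# Toy observation points (made up for illustration).
points = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
naive = rotate_naive(points, 30.0)
hoisted = rotate_cse(points, 30.0)
```

Both functions return the same values; only the number of `sin`/`cos` evaluations differs, which is the trade CSE makes: a few extra stored values in exchange for fewer repeated evaluations.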
3.3. Line Search Algorithm for the L-BFGS-B
Algorithm 1. L-BFGS-B algorithm.
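Of the line searches named in this paper (Armijo, MT, BWW), the Armijo rule is the simplest to show. The following is a minimal backtracking sketch under assumed defaults (`c1 = 1e-4`, step halving); the function name, parameters, and the quadratic test objective are illustrative and are not taken from the authors' implementation.

```python
def armijo_backtracking(f, x, fx, grad, direction,
                        alpha0=1.0, c1=1e-4, shrink=0.5, max_iter=50):
    """Return a step alpha with f(x + alpha*d) <= f(x) + c1*alpha*grad.d."""
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope >= 0.0:
        raise ValueError("direction is not a descent direction")
    alpha = alpha0
    for _ in range(max_iter):
        trial = [xi + alpha * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + c1 * alpha * slope:
            return alpha  # sufficient decrease reached
        alpha *= shrink   # otherwise shrink the step and retry
    return alpha


def objective(x):
    # Hypothetical quadratic test objective, not the dislocation misfit.
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


x0 = [0.0, 0.0]
g0 = [2.0 * (x0[0] - 1.0), 20.0 * (x0[1] + 2.0)]
d0 = [-g for g in g0]  # steepest-descent direction
alpha = armijo_backtracking(objective, x0, objective(x0), g0, d0)
x1 = [xi + alpha * di for xi, di in zip(x0, d0)]
```

The Armijo rule checks only sufficient decrease and never evaluates the gradient at trial points, whereas the Wolfe-type searches (MT, BWW) also test a curvature condition.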
4. Experimental Result and Discussion
4.1. Kernel Optimization and Evaluation
4.2. Line Search Algorithm Comparison and Correctness Verification
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
InSAR | Interferometric synthetic aperture radar |
3D | Three-dimensional |
2D | Two-dimensional |
GPU | Graphics processing unit |
CUDA | Compute unified device architecture |
RMSE | Root-mean-square-error |
LOS | Line-of-sight |
MAI | Multiple aperture interferometry |
BFGS | Broyden–Fletcher–Goldfarb–Shanno |
L-BFGS | Limited memory Broyden–Fletcher–Goldfarb–Shanno |
L-BFGS-B | Limited memory Broyden–Fletcher–Goldfarb–Shanno with boundaries |
Armijo | Armijo line search |
BWW | Bisection method for weak-Wolfe conditions |
MT | Moré–Thuente line search |
PTX | Parallel thread execution |
SM | Streaming multiprocessor |
SIMD | Single instruction, multiple data |
DRAM | Dynamic random access memory |
cuLBFGSB | Parallel implementation of the L-BFGS-B |
CSE | Common subexpression elimination |
GFLOPS | Giga floating point operations per second |
Name | Location | Scope | Speed |
---|---|---|---|
Register | On-chip | Thread | Extremely fast |
Constant memory | Off-chip (uncached), on-chip (cached) | GPU | Slow (uncached), fast (cached) |
Shared memory | On-chip | Thread block | Fast |
Local memory | Off-chip | Thread | Slow |
Global memory | Off-chip | GPU | Slow |
Parameter | Unit | Description |
---|---|---|
E | km | Distance from the reference point to the east |
N | km | Distance from the reference point to the north |
Depth | km | Depth of source |
Strike | degrees | Angle of the fault relative to north |
Dip | degrees | Angle between the fault and a horizontal plane |
Length | km | Length of fault |
Width | km | Width of fault |
Rake | degrees | Angle of slip relative to the width direction |
Slip | cm | Dislocation in rake direction |
Open | cm | Dislocation in tensile component |
CSE Option | Number of Precomputed Subexpressions |
---|---|
L | 24 |
ML | 28 |
MH | 40 |
H | 43 |
Subroutine | Computational Cost |
---|---|
Algorithm CP | |
Subspace minimization (direct primal method) | |
Line search (Armijo) | |
Line search (MT, BWW) | |
Update limited memory BFGS matrix | |
Compute objective function f | |
Compute gradient g |
Type | Specification |
---|---|
OS | Ubuntu 18.04 |
CPU | Intel® Core i9-10900K @ 3.70 GHz |
RAM | 128 GB |
GPU | NVIDIA GeForce RTX 2080 SUPER |
CUDA version | 11.0 |
CUDA compute capability | 7.5 |
Host compiler | g++ 7.5.0 |
Programming Model | Mesh Size | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Sequential (CPU) | 47.3466 | 116.4604 | 122.0598 | 128.4378 | 130.2648 | 132.4892 | 133.0033 | 134.3299 | 134.0760 | 134.9387 |
OpenMP-2 threads (CPU) | 24.5715 | 60.4006 | 63.0262 | 66.2300 | 67.2731 | 68.5220 | 68.7152 | 69.3884 | 69.3488 | 69.6882 |
OpenMP-4 threads (CPU) | 12.3336 | 30.5240 | 31.5752 | 33.2385 | 33.7230 | 34.3336 | 34.4576 | 34.8817 | 34.8903 | 35.0240 |
OpenMP-8 threads (CPU) | 6.2591 | 15.3084 | 15.8690 | 16.7954 | 16.9538 | 17.2413 | 17.2489 | 17.4243 | 17.4365 | 17.5468 |
OpenMP-16 threads (CPU) | 4.9698 | 12.0188 | 12.5567 | 13.2865 | 13.5082 | 13.7300 | 13.7568 | 13.9138 | 13.9230 | 14.0098 |
CUDA_baseline (GPU) | 1.3831 | 2.7500 | 2.8542 | 2.9088 | 2.9322 | 2.9490 | 2.9651 | 2.9782 | 2.9864 | 2.9883 |
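The relative gain of each programming model can be read off this table as a ratio of timings. The sketch below computes speedups from the first and last columns only, since the mesh-size labels and the time unit did not survive in this copy. The dictionary layout and the `speedup` helper are illustrative assumptions, and the ratio is unit-free.

```python
# (first column, last column) of the timing table above.
timings = {
    "Sequential (CPU)": (47.3466, 134.9387),
    "OpenMP-16 threads (CPU)": (4.9698, 14.0098),
    "CUDA_baseline (GPU)": (1.3831, 2.9883),
}


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


gpu_first, gpu_last = timings["CUDA_baseline (GPU)"]
seq_first, seq_last = timings["Sequential (CPU)"]
omp_first, omp_last = timings["OpenMP-16 threads (CPU)"]

gpu_vs_seq_first = speedup(seq_first, gpu_first)
gpu_vs_seq_last = speedup(seq_last, gpu_last)
gpu_vs_omp16_last = speedup(omp_last, gpu_last)

print(f"GPU vs sequential, first column: {gpu_vs_seq_first:.1f}x")
print(f"GPU vs sequential, last column:  {gpu_vs_seq_last:.1f}x")
print(f"GPU vs OpenMP-16, last column:   {gpu_vs_omp16_last:.1f}x")
```

On these figures the baseline GPU kernel is roughly 34 to 45 times faster than the sequential run and between 4 and 5 times faster than 16 OpenMP threads.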
Constraints | E (km) | N (km) | Depth (km) | Strike (deg.) | Dip (deg.) | Length (km) | Width (km) | Rake (deg.) | Slip (cm) |
---|---|---|---|---|---|---|---|---|---|
Lower bound | 4.8 | −12.0 | 3.0 | 110.0 | 33.0 | 4.0 | 4.0 | 80.0 | 10.0 |
Upper bound | 10.0 | −6.0 | 5.7 | 235.0 | 55.0 | 6.5 | 6.7 | 150.0 | 30.0 |
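Bound-constrained methods such as L-BFGS-B keep every iterate inside a box like the one tabulated above. The sketch below shows only the elementary projection onto that box, using the tabulated bounds; it is not the full generalized Cauchy point step of L-BFGS-B, and the names `LOWER`, `UPPER`, and `project_to_bounds` are hypothetical.

```python
NAMES = ["E", "N", "Depth", "Strike", "Dip", "Length", "Width", "Rake", "Slip"]
LOWER = [4.8, -12.0, 3.0, 110.0, 33.0, 4.0, 4.0, 80.0, 10.0]
UPPER = [10.0, -6.0, 5.7, 235.0, 55.0, 6.5, 6.7, 150.0, 30.0]


def project_to_bounds(x, lower, upper):
    """Clip each parameter to [lower, upper] (Euclidean projection onto the box)."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def is_feasible(x, lower, upper):
    """True if every parameter lies within its bounds."""
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


# A made-up trial point with three parameters outside the box.
trial = [12.0, -8.0, 2.0, 200.0, 40.0, 5.0, 5.0, 160.0, 12.0]
projected = project_to_bounds(trial, LOWER, UPPER)
clipped = [n for n, a, b in zip(NAMES, trial, projected) if a != b]
```

Here `clipped` names the parameters that were moved onto a bound (E, Depth, and Rake for this trial point); the remaining parameters pass through unchanged.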
Algorithm | RMSE Min. (cm) | RMSE Max. (cm) | RMSE Mean (cm) | RMSE Std. (cm) | Time Min. (s) | Time Max. (s) | Time Mean (s) | Time Std. (s) |
---|---|---|---|---|---|---|---|---|
Armijo | 0.4741 | 0.5103 | 0.4784 | 0.0030 | 0.2465 | 1.9259 | 1.0198 | 0.2481 |
MT | 0.4745 | 0.8243 | 0.5291 | 0.0589 | 0.2309 | 3.6965 | 1.1638 | 0.5253 |
BWW | 0.4748 | 0.8164 | 0.5273 | 0.0613 | 0.1737 | 11.4612 | 1.8269 | 0.9649 |
cuLBFGSB | 0.4739 | 0.9450 | 0.4860 | 0.0258 | 0.1163 | 2.5101 | 1.1922 | 0.3514 |
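For reference, the RMSE reported above is the root-mean-square error between observed and modeled displacements. A minimal sketch of that standard definition follows; the sample values are made up for illustration and are not the paper's data.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up displacements in cm.
observed_cm = [1.0, 2.0, 3.0, 4.0]
predicted_cm = [1.5, 2.0, 2.5, 4.0]
error_cm = rmse(observed_cm, predicted_cm)
```

The result carries the unit of the inputs, which is why the table lists RMSE in cm.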
Algorithm | E (km) | N (km) | Depth (km) | Strike (deg.) | Dip (deg.) | Length (km) | Width (km) | Rake (deg.) | Slip (cm) |
---|---|---|---|---|---|---|---|---|---|
Armijo | 6.7756 ± 0.1814 | −8.0579 ± 0.1563 | 3.7138 ± 0.2603 | 203.9841 ± 6.5069 | 38.7941 ± 2.3886 | 5.0784 ± 0.4434 | 5.3181 ± 0.4994 | 115.3062 ± 4.7805 | 12.5695 ± 2.0428 |
MT | 6.8278 ± 0.8900 | −7.9336 ± 0.9688 | 3.6811 ± 1.0870 | 201.7411 ± 33.9702 | 38.5657 ± 7.9274 | 4.9229 ± 0.8697 | 5.3982 ± 1.0252 | 112.9773 ± 20.5970 | 12.9347 ± 5.2915 |
BWW | 6.7314 ± 1.0308 | −8.0493 ± 1.0263 | 3.7909 ± 0.8933 | 202.7314 ± 31.6251 | 38.7331 ± 8.3463 | 5.0749 ± 0.8959 | 5.6685 ± 0.9614 | 115.2043 ± 18.3948 | 12.2949 ± 5.4515 |
cuLBFGSB | 6.7366 ± 0.3813 | −7.9948 ± 0.4724 | 3.6931 ± 0.7714 | 204.2539 ± 12.2758 | 38.4440 ± 4.1287 | 5.0455 ± 1.4588 | 5.4938 ± 1.8056 | 114.8131 ± 10.1586 | 12.3322 ± 12.0922 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, S.; Kim, T. Parallel Dislocation Model Implementation for Earthquake Source Parameter Estimation on Multi-Threaded GPU. Appl. Sci. 2021, 11, 9434. https://doi.org/10.3390/app11209434