TY - JOUR
T1 - Development of a parallel CUDA algorithm for solving 3D guiding center problems
AU - Bak, Soyoon
AU - Kim, Philsu
AU - Park, Sangbeom
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/7
Y1 - 2022/7
N2 - In this study, we develop a novel compute unified device architecture (CUDA) algorithm, which we call C-ECM3, for solving a three-dimensional (3D) guiding center problem. The C-ECM3 is a parallel algorithm for the iteration-free backward semi-Lagrangian method with third-order temporal accuracy (ECM3). One well-known challenge in speeding up a CUDA program is designing kernel functions that make efficient use of hierarchical memory, whose levels are classified according to access speed. To address this challenge, the C-ECM3 centers on a decomposition strategy for solving the very large number of Cauchy problems that the method generates. The decomposition strategy divides the 9×9 linear system for each Cauchy problem in the ECM3 into two 3×3 linear systems, which are easier to solve. The strategy then solves these small systems explicitly using Cramer's rule. As a result, the proposed C-ECM3 allows us to design an array-free kernel function that uses hierarchical memory efficiently. In addition, the C-ECM3 significantly reduces the run time for tracing particle trajectories compared to other graphics processing unit (GPU) programs that use the usual Gaussian elimination algorithm. The Kelvin-Helmholtz instability and a 3D guiding center problem are simulated to provide numerical evidence for the C-ECM3. These numerical experiments verify that the proposed C-ECM3 significantly improves computational speed compared to other methods while maintaining the accuracy of the central processing unit (CPU) version of ECM3. The validity of the C-ECM3 is also confirmed by showing that it agrees with Shoucri's analysis of the Kelvin-Helmholtz instability.
AB - In this study, we develop a novel compute unified device architecture (CUDA) algorithm, which we call C-ECM3, for solving a three-dimensional (3D) guiding center problem. The C-ECM3 is a parallel algorithm for the iteration-free backward semi-Lagrangian method with third-order temporal accuracy (ECM3). One well-known challenge in speeding up a CUDA program is designing kernel functions that make efficient use of hierarchical memory, whose levels are classified according to access speed. To address this challenge, the C-ECM3 centers on a decomposition strategy for solving the very large number of Cauchy problems that the method generates. The decomposition strategy divides the 9×9 linear system for each Cauchy problem in the ECM3 into two 3×3 linear systems, which are easier to solve. The strategy then solves these small systems explicitly using Cramer's rule. As a result, the proposed C-ECM3 allows us to design an array-free kernel function that uses hierarchical memory efficiently. In addition, the C-ECM3 significantly reduces the run time for tracing particle trajectories compared to other graphics processing unit (GPU) programs that use the usual Gaussian elimination algorithm. The Kelvin-Helmholtz instability and a 3D guiding center problem are simulated to provide numerical evidence for the C-ECM3. These numerical experiments verify that the proposed C-ECM3 significantly improves computational speed compared to other methods while maintaining the accuracy of the central processing unit (CPU) version of ECM3. The validity of the C-ECM3 is also confirmed by showing that it agrees with Shoucri's analysis of the Kelvin-Helmholtz instability.
KW - Backward semi-Lagrangian method
KW - Compute unified device architecture
KW - Graphics processing units
KW - Guiding center problem
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=85125760820&partnerID=8YFLogxK
U2 - 10.1016/j.cpc.2022.108331
DO - 10.1016/j.cpc.2022.108331
M3 - Article
AN - SCOPUS:85125760820
SN - 0010-4655
VL - 276
JO - Computer Physics Communications
JF - Computer Physics Communications
M1 - 108331
ER -