TY - JOUR
T1 - Gem5-AVX
T2 - Extension of the Gem5 Simulator to Support AVX Instruction Sets
AU - Lee, Seungmin
AU - Kim, Youngsok
AU - Nam, Dukyun
AU - Kim, Jong
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Commodity x86 CPUs still power the majority of supercomputers, and most of them implement vector architectures to support single instruction, multiple data (SIMD) execution. Although architectural exploration requires computer architecture simulators, and a number of simulators have been developed, only a few tools support recent x86 SIMD instructions. This paper describes gem5-AVX, an extended version of the gem5 simulator that enables simulation of recent x86 SIMD extensions, targeting high-performance computing (HPC) in particular. gem5-AVX supports Advanced Vector Extensions (AVX), AVX2, and subsets of AVX-512, excluding cache and memory management instructions. Moreover, it covers the full set of Streaming SIMD Extensions (SSE) and subsequent extensions that are required to simulate HPC workloads. It can simulate the key features of AVX, AVX2, and AVX-512, such as 256- and 512-bit-wide registers, three- and four-operand syntax, fused multiply-add (FMA), vector gather-scatter using vector scale-index-base (VSIB) addressing, mask registers, embedded broadcasting, and the compressed-displacement memory addressing mode. We evaluate the accuracy of gem5-AVX by comparing its results to those of real hardware and Intel's Software Development Emulator (SDE) running benchmark suites, namely High-Performance Linpack (HPL), High Performance Conjugate Gradient (HPCG), and the NAS Parallel Benchmarks (NPB), which are representative programs in the HPC field. We also compare gem5 and gem5-AVX in terms of the speed-up of the HPL benchmark across configuration combinations. gem5-AVX, with mean absolute percentage errors of 7.3-9.2% and 9.2-11.9%, is more accurate than gem5, which shows mean absolute percentage errors of 17.9-21.5% and 19.7-29.7% for Haswell and Skylake processors, respectively.
AB - Commodity x86 CPUs still power the majority of supercomputers, and most of them implement vector architectures to support single instruction, multiple data (SIMD) execution. Although architectural exploration requires computer architecture simulators, and a number of simulators have been developed, only a few tools support recent x86 SIMD instructions. This paper describes gem5-AVX, an extended version of the gem5 simulator that enables simulation of recent x86 SIMD extensions, targeting high-performance computing (HPC) in particular. gem5-AVX supports Advanced Vector Extensions (AVX), AVX2, and subsets of AVX-512, excluding cache and memory management instructions. Moreover, it covers the full set of Streaming SIMD Extensions (SSE) and subsequent extensions that are required to simulate HPC workloads. It can simulate the key features of AVX, AVX2, and AVX-512, such as 256- and 512-bit-wide registers, three- and four-operand syntax, fused multiply-add (FMA), vector gather-scatter using vector scale-index-base (VSIB) addressing, mask registers, embedded broadcasting, and the compressed-displacement memory addressing mode. We evaluate the accuracy of gem5-AVX by comparing its results to those of real hardware and Intel's Software Development Emulator (SDE) running benchmark suites, namely High-Performance Linpack (HPL), High Performance Conjugate Gradient (HPCG), and the NAS Parallel Benchmarks (NPB), which are representative programs in the HPC field. We also compare gem5 and gem5-AVX in terms of the speed-up of the HPL benchmark across configuration combinations. gem5-AVX, with mean absolute percentage errors of 7.3-9.2% and 9.2-11.9%, is more accurate than gem5, which shows mean absolute percentage errors of 17.9-21.5% and 19.7-29.7% for Haswell and Skylake processors, respectively.
KW - AVX
KW - AVX-512
KW - AVX2
KW - Gem5 simulator
KW - x86 SIMD
UR - http://www.scopus.com/inward/record.url?scp=85184305698&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3359296
DO - 10.1109/ACCESS.2024.3359296
M3 - Article
AN - SCOPUS:85184305698
SN - 2169-3536
VL - 12
SP - 20767
EP - 20778
JO - IEEE Access
JF - IEEE Access
ER -