TY - GEN
T1 - Enhanced memory management for scalable MPI intra-node communication on many-core processor
AU - Cho, Joong Yeon
AU - Jin, Hyun Wook
AU - Nam, Dukyun
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/9/25
Y1 - 2017/9/25
AB - As the number of cores installed in a single computing node drastically increases, intra-node communication between parallel processes becomes more important. Parallel programming models, such as the Message Passing Interface (MPI), internally perform memory-intensive operations for intra-node communication. Thus, to address the scalability issue on many-core processors, it is critical to exploit the emerging memory features provided by contemporary computer systems. For example, the latest many-core processors are equipped with high-bandwidth on-package memory. Modern 64-bit processors also support a large page size (e.g., 2MB), which can significantly reduce the number of TLB misses. The on-package memory and huge pages have considerable potential for improving the performance of intra-node communication. However, such features have not been thoroughly investigated in terms of intra-node communication in the literature. In this paper, we propose enhanced memory management schemes that efficiently utilize the on-package memory and provide support for huge pages. The proposed schemes can significantly reduce the data copy and memory mapping overheads in MPI intra-node communication. Our experimental results show that our implementation on MVAPICH2 can improve the bandwidth of point-to-point communication by up to 373% and can reduce the latency of collective communication by 79% on an Intel Xeon Phi Knights Landing (KNL) processor.
KW - Huge page
KW - Intra-node communication
KW - Many-core
KW - MPI
KW - On-package memory
UR - http://www.scopus.com/inward/record.url?scp=85054223737&partnerID=8YFLogxK
DO - 10.1145/3127024.3127035
M3 - Conference contribution
AN - SCOPUS:85054223737
SN - 9781450348492
T3 - ACM International Conference Proceeding Series
BT - EuroMPI 2017 - Proceedings of the 24th European MPI Users' Group Meeting
PB - Association for Computing Machinery
T2 - 24th European MPI Users' Group Meeting, EuroMPI 2017
Y2 - 25 September 2017 through 28 September 2017
ER -