Change search
ReferencesLink to record
Permanent link

Direct link
A Performance Evaluation of MPI Shared Memory Programming
KTH, School of Computer Science and Communication (CSC).
2016 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
En utvärdering av MPI shared memory - programmering med inriktning på prestanda (Swedish)
Abstract [en]

The thesis investigates the Message Passing Interface (MPI) support for shared memory programming on modern hardware architecture with multiple Non-Uniform Memory Access (NUMA) domains. We investigate its performance in two case studies: the matrix-matrix multiplication and Conway’s game of life. We compare MPI shared memory performance in terms of execution time and memory consumption with the performance of implementations using OpenMP and MPI point-to-point communication, also called "MPI two-sided". We perform strong scaling tests in both test cases.

We observe that MPI two-sided implementation is 21% and 18% faster than the MPI shared and OpenMP implementations respectively in the matrix-matrix multiplication when using 32 processes. MPI shared uses less memory space: when compared to MPI two-sided, MPI shared uses 45% less memory. In the Conway’s game of life, we find that MPI two-sided implementation is 10% and 82% faster than the MPI shared and OpenMP implementations respectively when using 32 processes. We also observe that not mapping virtual memory to a specific NUMA domain can lead to an increment in execution time of 64% when using 32 processes. The use of MPI shared is viable for intranode communication on modern hardware architecture with multiple NUMA domains.

Abstract [sv]

I detta examensarbete undersöker vi Message Passing Inferfaces (MPI) support för shared memory programmering på modern hårdvaruarkitektur med flera Non-Uniform Memory Access (NUMA) domäner. Vi undersöker prestanda med hjälp av två fallstudier: matris-matris multiplikation och Conway’s game of life. Vi jämför prestandan utav MPI shared med hjälp utav exekveringstid samt minneskonsumtion jämtemot OpenMP och MPI punkt-till-punkt kommunikation, även känd som MPI two-sided. Vi utför strong scaling tests för båda fallstudierna.

Vi observerar att MPI-two sided är 21% snabbare än MPI shared och 18% snabbare än OpenMP för matris-matris multiplikation när 32 processorer användes. För samma testdata har MPI shared en 45% lägre minnesförburkning än MPI two-sided. För Conway’s game of life är MPI two-sided 10% snabbare än MPI shared samt 82% snabbare än OpenMP implementation vid användandet av 32 processorer. Vi kunde också utskilja att om ingen mappning av virtuella minnet till en specifik NUMA domän görs, leder det till en ökning av exekveringstiden med upp till 64% när 32 processorer används. Vi kom fram till att MPI shared är användbart för intranode kommunikation på modern hårdvaruarkitektur med flera NUMA domäner.

Place, publisher, year, edition, pages
2016. , 66 p.
Keyword [en]
MPI, OpenMP, Shared Memory, HPC, Parallel Programming, NUMA
National Category
Computer Science
URN: urn:nbn:se:kth:diva-188676OAI: diva2:938293
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2016-06-16 Created: 2016-06-16 Last updated: 2016-06-16Bibliographically approved

Open Access in DiVA

fulltext(1179 kB)23 downloads
File information
File name FULLTEXT01.pdfFile size 1179 kBChecksum SHA-512
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 23 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 39 hits
ReferencesLink to record
Permanent link

Direct link