Locality-aware Task Scheduling and Data Distribution on NUMA Systems
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0003-3958-4659
(SICS Swedish ICT AB.)
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-9637-2065
2013 (English). In: OpenMP in the Era of Low Power Devices and Accelerators: 9th International Workshop on OpenMP, IWOMP 2013, Canberra, Australia, September 16-18, 2013 / [ed] Alistair P. Rendell, Barbara M. Chapman, Matthias S. Müller, Springer Science+Business Media B.V., 2013. Conference paper, Published paper (Refereed)
Abstract [en]

Modern parallel computer systems exhibit Non-Uniform Memory Access (NUMA) behavior. For best performance, any parallel program therefore has to match data allocation and scheduling of computations to the memory architecture of the machine. When done manually, this becomes a tedious process, and since each individual system has its own peculiarities, it also leads to programs that are not performance-portable.

We propose a data distribution scheme in which NUMA hardware peculiarities are abstracted away from the programmer and data distribution is delegated to a runtime system that is generated once for each machine. In addition, we propose using the task data dependence information made available by the OpenMP 4.0 RC2 proposal to guide the scheduling of OpenMP tasks and further reduce data stall times.

We demonstrate the viability and performance of our proposals on a four-socket AMD Opteron machine with eight NUMA nodes. We find that both data distribution and locality-aware task scheduling improve performance compared to default policies while still providing an architecture-oblivious approach for the programmer.

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2013.
Series
Lecture Notes in Computer Science, 8122
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-124881
DOI: 10.1007/978-3-642-40698-0_12
Scopus ID: 2-s2.0-84883296523
ISBN: 978-3-642-40697-3 (print)
OAI: oai:DiVA.org:kth-124881
DiVA: diva2:638662
Conference
International Workshop on OpenMP (IWOMP), September 16-18, 2013, Canberra, Australia
Note

QC 20130924

Available from: 2013-08-01 Created: 2013-08-01 Last updated: 2014-02-12 Bibliographically approved
In thesis
1. Locality-aware Scheduling and Characterization of Task-based Programs
2014 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Modern computer architectures expose an increasing number of parallel features supported by complex memory access and communication structures. Currently used task scheduling techniques perform poorly since they focus solely on balancing computation load across parallel features and remain oblivious to locality properties of support structures. We contribute locality-aware task scheduling mechanisms that improve execution time by 44% and 11% on average, respectively, on two locality-sensitive architectures: the Tilera TILEPro64 manycore processor and a four-socket SMP machine based on the AMD Opteron 6172 processor.

Programmers need task performance metrics, such as amount of task parallelism and task memory hierarchy utilization, to analyze the performance of task-based programs. However, existing tools indicate performance mainly using thread-centric metrics. Programmers therefore resort to low-level and tedious thread-centric analysis methods to infer task performance. We contribute tools and methods that characterize task-based OpenMP programs at the level of tasks, with which programmers can quickly understand important properties of the task graph, such as critical path and parallelism, as well as properties of individual tasks, such as instruction count and memory behavior.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. vi, 29 p.
Series
TRITA-ICT-ECS AVH, ISSN 1653-6363 ; 14:01
Keyword
Locality-aware, Task scheduling, OpenMP
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-141124 (URN)
978-91-7501-994-9 (ISBN)
Presentation
2014-03-05, Sal/Hall E, Forum, KTH-ICT, Isafjordsgatan 39, Kista, 10:00 (English)
Note

QC 20140212

Available from: 2014-02-12 Created: 2014-02-07 Last updated: 2014-02-12 Bibliographically approved

By author/editor
Muddukrishna, Ananya; Vlassov, Vladimir; Brorsson, Mats