Characterizing task-based OpenMP programs
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0003-3958-4659
(SICS Swedish ICT AB.)
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. SICS Swedish ICT, Sweden. ORCID iD: 0000-0002-9637-2065
2015 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 10, no 4, article e0123545. Article in journal (Refereed). Published
Abstract [en]

Programmers struggle to understand the performance of task-based OpenMP programs because profiling tools report only thread-based performance. Performance tuning also requires task-based performance information in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely used Barcelona OpenMP Tasks Suite. By using our method to characterize task-based performance, programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools.

Place, publisher, year, edition, pages
2015. Vol. 10, no 4, article e0123545.
Keyword [en]
Scheduling Strategies, Performance Analysis, Benchmark
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-141201
DOI: 10.1371/journal.pone.0123545
ISI: 000352590300104
PubMedID: 25860023
Scopus ID: 2-s2.0-84929498034
OAI: oai:DiVA.org:kth-141201
DiVA: diva2:695691
Note

QC 20150623. Updated from "Manuscript" to "Article".

Available from: 2014-02-12 Created: 2014-02-12 Last updated: 2017-12-06 Bibliographically approved
In thesis
1. Locality-aware Scheduling and Characterization of Task-based Programs
2014 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Modern computer architectures expose an increasing number of parallel features supported by complex memory access and communication structures. Currently used task scheduling techniques perform poorly because they focus solely on balancing computation load across parallel features and remain oblivious to the locality properties of the support structures. We contribute locality-aware task scheduling mechanisms that improve execution time on average by 44% and 11%, respectively, on two locality-sensitive architectures: the Tilera TILEPro64 manycore processor and a four-socket SMP machine based on the AMD Opteron 6172 processor.

Programmers need task performance metrics, such as the amount of task parallelism and task memory hierarchy utilization, to analyze the performance of task-based programs. However, existing tools indicate performance mainly through thread-centric metrics. Programmers therefore resort to low-level, tedious thread-centric analysis methods to infer task performance. We contribute tools and methods that characterize task-based OpenMP programs at the level of tasks. With them, programmers can quickly understand important properties of the task graph, such as critical path and parallelism, as well as properties of individual tasks, such as instruction count and memory behavior.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. vi, 29 p.
Series
TRITA-ICT-ECS AVH, ISSN 1653-6363 ; 14:01
Keyword
Locality-aware, Task scheduling, OpenMP
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-141124
ISBN: 978-91-7501-994-9
Presentation
2014-03-05, Sal/Hall E, Forum, KTH-ICT, Isafjordsgatan 39, Kista, 10:00 (English)
Note

QC 20140212

Available from: 2014-02-12 Created: 2014-02-07 Last updated: 2014-02-12 Bibliographically approved
2. Improving OpenMP Productivity with Data Locality Optimizations and High-resolution Performance Analysis
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The combination of high-performance parallel programming and multi-core processors is the dominant approach to meet the ever increasing demand for computing performance today. The thesis is centered around OpenMP, a popular parallel programming API standard that enables programmers to quickly get started with writing parallel programs. However, in contrast to the quickness of getting started, writing high-performance OpenMP programs requires high effort and saps productivity.

Part of the reason for impeded productivity is OpenMP’s lack of abstractions and guidance to exploit the strong architectural locality exhibited in NUMA systems and manycore processors. The thesis contributes data distribution abstractions that enable programmers to distribute data portably in NUMA systems and manycore processors without being aware of low-level system topology details. Data distribution abstractions are supported by the runtime system and leveraged by the second contribution of the thesis: an architecture-specific locality-aware scheduling policy that reduces data access latencies incurred by tasks, allowing programmers to obtain, with minimal effort, up to 69% improved performance for scientific programs compared to state-of-the-art work-stealing scheduling.

Another reason for reduced programmer productivity is the poor support offered by OpenMP performance analysis tools for visualizing, understanding, and resolving problems at the level of grains, that is, task and parallel for-loop chunk instances. The thesis contributes a cost-effective and automatic method to extensively profile and visualize grains. Grain properties and hardware performance are profiled at event notifications from the runtime system with less than 2.5% overhead and visualized using a new method called the Grain Graph. The grain graph shows the program structure that unfolded during execution and highlights problems such as low parallelism, work inflation, and poor parallelization benefit directly at the grain level, with precise links to problem areas in source code. The thesis demonstrates that grain graphs can quickly reveal performance problems that are difficult to detect and characterize in fine detail using existing tools in standard programs from SPEC OMP 2012, Parsec 3.0 and Barcelona OpenMP Tasks Suite (BOTS). Grain profiles are also applied to study the input sensitivity and similarity of BOTS programs.

All thesis contributions are combined into an iterative performance analysis and optimization workflow that enables programmers to achieve desired performance systematically and more quickly than is possible using existing tools. This reduces pressure on experts and removes the need for tedious trial-and-error tuning, simplifying OpenMP performance analysis.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 208 p.
Series
TRITA-ICT, 2016:1
Keyword
OpenMP, Performance Analysis, Scheduling, Locality Optimizations
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-179670
ISBN: 978-91-7595-818-7
Public defence
2016-01-29, Sal C, Electrum, Isafjordsgatan 26, Kista, 09:00 (English)
Note

QC 20151221

Available from: 2015-12-21 Created: 2015-12-18 Last updated: 2016-01-15 Bibliographically approved

Authority records
Muddukrishna, Ananya; Brorsson, Mats
