Exploring heterogeneous scheduling using the task-centric programming model
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-9637-2065
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
2013 (English). In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 7640. Article in journal (Refereed). Published
Abstract [en]

Computer architecture technology is moving towards more heterogeneous solutions, which will contain a number of processing units with different capabilities that may increase the performance of the system as a whole. However, with increased performance comes increased complexity; complexity that is now barely handled in homogeneous multiprocessing systems. The present study tries to solve a small piece of the heterogeneous puzzle: how can we exploit all system resources in a performance-effective and user-friendly way? Our proposed solution includes a run-time system capable of using a variety of different heterogeneous components while providing the user with the already familiar task-centric programming model interface. Furthermore, when dealing with non-uniform workloads, we show that traditional approaches based on centralized or work-stealing queue algorithms do not work well, and we propose a scheduling algorithm based on trend analysis to distribute work in a performance-effective way across resources.
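The abstract does not describe the trend-analysis scheduler in detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each device reports its recent per-task execution times, extrapolates the next time from a least-squares line through those samples, and then shares out tasks in proportion to the predicted throughput. The names `predict_next` and `split_work` and the device labels are hypothetical.

```python
from typing import Dict, List


def predict_next(samples: List[float]) -> float:
    """Extrapolate the next per-task time from a least-squares line (assumed model)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if n == 1:
        return samples[0]
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    predicted = mean_y + slope * (n - mean_x)
    # Clamp so a steep downward trend never yields a zero or negative time.
    return max(predicted, 1e-9)


def split_work(total_tasks: int, history: Dict[str, List[float]]) -> Dict[str, int]:
    """Share tasks in proportion to each device's predicted throughput (1 / time)."""
    rates = {dev: 1.0 / predict_next(times) for dev, times in history.items()}
    total_rate = sum(rates.values())
    shares = {dev: int(total_tasks * r / total_rate) for dev, r in rates.items()}
    # Tasks lost to rounding down go to the device with the highest predicted rate.
    fastest = max(rates, key=rates.get)
    shares[fastest] += total_tasks - sum(shares.values())
    return shares


if __name__ == "__main__":
    # Hypothetical measurements: seconds per task on two devices.
    print(split_work(30, {"cpu": [2.0, 2.0, 2.0], "gpu": [1.0, 1.0, 1.0]}))
```

In this sketch a device whose task times are trending upward receives a smaller share on the next round, which is one simple way a scheduler could react to non-uniform workloads.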

Place, publisher, year, edition, pages
2013. Vol. 7640
Keywords [en]
Task Scheduling, OpenMP, GPU, Tilera, Work-Stealing, Performance
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-120436; DOI: 10.1007/978-3-642-36949-0_16; ISI: 000341240400016; Scopus ID: 2-s2.0-84874433328; OAI: oai:DiVA.org:kth-120436; DiVA, id: diva2:614720
Conference
HeteroPAR'2012: Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms, August 27, 2012, Rhodes Island, Greece
Projects
ENCORE
Funder
Swedish e‐Science Research Center
Note

QC 20130429

Available from: 2013-04-05 Created: 2013-04-05 Last updated: 2017-12-06. Bibliographically approved
In thesis
1. Performance-driven exploration using Task-based Parallel Programming Frameworks
2013 (English). Licentiate thesis, comprehensive summary (Other academic)
Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. p. 39
Series
Trita-ICT-ECS AVH, ISSN 1653-6363 ; 13:08
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-122569 (URN); 978-91-7501-718-1 (ISBN)
Presentation
2013-05-28, Sal D, KTH Kista Forum, Isafjordsagatan 39, Kista, 13:00 (English)
Opponent
Supervisors
Note

QC 20130530

Available from: 2013-05-30 Created: 2013-05-23 Last updated: 2013-06-25. Bibliographically approved
2. Improving Performance and Quality-of-Service through the Task-Parallel Model: Optimizations and Future Directions for OpenMP
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the failure of Dennard scaling, which stated that transistors become more power-efficient as they shrink, computer hardware has today become very divergent. Initially the change concerned only the number of processors on a chip (multicores), but it has since escalated into complex heterogeneous systems with non-intuitive properties -- properties that can improve performance and power consumption but also strain the programmer expected to develop on them.

Answering these challenges is the OpenMP task-parallel model -- a programming model that simplifies writing parallel software. Our focus in the thesis has been to explore performance and quality-of-service directions of the OpenMP task-parallel model, particularly by taking architectural features into account.

The first question tackled is: what capabilities do existing state of the art runtime-systems have, and how do they perform? We empirically evaluated the performance of several modern task-parallel runtime-systems. Performance and power consumption were measured using benchmarks, and we show that the two primary causes of bottlenecks in modern runtime-systems lie in either the task management overheads or how tasks are distributed across processors.

Next, we consider quality-of-service improvements in task-parallel runtime-systems. Striving to improve execution performance, current state of the art runtime-systems seldom take dynamic architectural features such as temperature into account when deciding how work should be distributed across the processors, which can lead to overheating. We developed and evaluated two strategies for thermal-awareness in task-parallel runtime-systems. The first improves performance when the computer system is constrained by temperature while the second strategy strives to reduce temperature while meeting soft real-time objectives.

We end the thesis by focusing on performance. Here we introduce our original contribution called BLYSK -- a prototype OpenMP framework created exclusively for performance research.

We found that overheads in current runtime-systems can be expensive, which often leads to performance degradation. We introduce a novel way of preserving task-graphs throughout application runs: task-graphs are recorded, identified and optimized the first time an OpenMP application is executed and are then re-used in subsequent executions, removing unnecessary overheads. Our proposed solution can nearly double the performance compared with other state of the art runtime-systems.

Performance can also be improved through heterogeneity. Today, manufacturers are placing processors with different capabilities on the same chip. Because they are different, their power-consuming characteristics and performance differ. Heterogeneity adds another dimension to the multiprocessing problem: how should work be distributed across the heterogeneous processors? We evaluated the performance of existing, homogeneous scheduling algorithms and found them to be a poor match for heterogeneous systems. We proposed a novel scheduling algorithm that dynamically adjusts itself to the heterogeneous system in order to improve performance.

The thesis ends with a high-level synthesis approach to improve performance in task-parallel applications. Rather than limiting ourselves to off-the-shelf processors -- which often contain a large amount of unused logic -- our approach is to automatically generate the processors ourselves. Our method allows us to generate application-specific hardware from the OpenMP task-parallel source code. Evaluated using FPGAs, our Systems-on-Chip outperformed other soft-cores such as the Nios II processor and were also comparable in performance with modern state of the art processors such as the Xeon Phi and the AMD Opteron.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. p. 64
Series
TRITA-ICT ; 2015:13
Keywords
Task Parallel, OpenMP, Scheduling, OmpSs, multicore, manycore
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-175539 (URN); 978-91-7595-711-1 (ISBN)
Public defence
2015-11-10, Sal A, KTH Kista, Electrum Kistagången 16, Kista, 10:00 (English)
Opponent
Supervisors
Note

QC 20151016

Available from: 2015-10-16 Created: 2015-10-16 Last updated: 2015-10-16. Bibliographically approved

Open Access in DiVA

fulltext (284 kB), 187 downloads
File information
File name: FULLTEXT01.pdf
File size: 284 kB
Checksum (SHA-512): 75c02851296dd85a09871536ca4f5b02fcd4789e74f5d39b016c36391240d343e3e40d4945211a260d939fdfb4bd357c60dbc433ee0b21d53e44c6b4de173ed8
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text
Scopus
http://pm.bsc.es/heteropar12/papers/heteropar2012_submission_29.pdf

Authority records BETA

Brorsson, Mats

Search in DiVA

By author/editor
Podobas, Artur; Brorsson, Mats; Vlassov, Vladimir