A comparative performance study of common and popular task-centric programming frameworks
Podobas, Artur (KTH, School of Information and Communication Technology (ICT), Software and Computer Systems, SCS)
Brorsson, Mats (KTH, School of Information and Communication Technology (ICT), Software and Computer Systems, SCS; ORCID iD: 0000-0002-9637-2065)
Faxén, Karl-Filip (Swedish Institute of Computer Science)
2013 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634. Article in journal, Editorial material (Refereed) Published
Abstract [en]

SUMMARY: Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. An increasingly popular programming model supporting structured parallel programming patterns in a portable and composable manner is the task-centric programming model. In this study, we compare several popular task-centric programming frameworks, including Cilk Plus, Threading Building Blocks, and various implementations of OpenMP 3.0. We analyzed their performance using the Barcelona OpenMP Tasking Suite benchmarks, both on a 48-core AMD Opteron 6172 server and on a 64-core TILEPro64 embedded many-core processor. Our results show that OpenMP offers the highest flexibility for programmers, but this flexibility comes at a cost. Frameworks supporting only a specific and more restrictive model, such as Cilk Plus and Threading Building Blocks, are generally more efficient in terms of both performance and energy consumption. However, Intel's implementation of OpenMP tasks performs best and comes closest to the specialized run-time systems.
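
For readers unfamiliar with the model, below is a minimal sketch of what "task-centric" means in practice, written against the OpenMP 3.0 task API that the study covers. The Fibonacci kernel is illustrative only, not one of the paper's benchmarks; Cilk Plus (cilk_spawn/cilk_sync) and TBB (task_group) express the same pattern with different keywords.

#include <stdio.h>

/* Recursive Fibonacci decomposed into tasks: each call site marked
 * with "#pragma omp task" becomes a unit of work that the runtime's
 * scheduler may execute on any worker thread. */
static long fib(int n)
{
    if (n < 2)
        return n;
    long x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait   /* join: wait for both child tasks */
    return x + y;
}

int main(void)
{
    long r;
    #pragma omp parallel   /* create the worker team */
    #pragma omp single     /* one thread seeds the task tree */
    r = fib(30);
    printf("fib(30) = %ld\n", r);
    return 0;
}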

Place, publisher, year, edition, pages
2013.
Keywords [en]
OpenMP, Cilk, TBB, Wool, scheduling, task performance
National subject category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-143625
DOI: 10.1002/cpe.3186
ISI: 000349085100001
Scopus ID: 2-s2.0-84919463068
OAI: oai:DiVA.org:kth-143625
DiVA id: diva2:707915
Note

QC 20140626

Available from: 2014-03-26 Created: 2014-03-26 Last updated: 2017-12-05 Bibliographically approved
Part of thesis
1. Improving Performance and Quality-of-Service through the Task-Parallel Model: Optimizations and Future Directions for OpenMP
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the end of Dennard scaling, which held that transistors would become more power-efficient as they shrank, computer hardware has today become very divergent. Initially the change concerned only the number of processors on a chip (multicores), but it has since escalated into complex heterogeneous systems with non-intuitive properties -- properties that can improve performance and power consumption but that also strain the programmers expected to develop for them.

Answering these challenges is the OpenMP task-parallel model -- a programming model that simplifies the writing of parallel software. Our focus in this thesis has been to explore performance and quality-of-service directions of the OpenMP task-parallel model, particularly by taking architectural features into account.

The first question tackled is: what capabilities do existing state-of-the-art runtime systems have, and how do they perform? We empirically evaluated the performance of several modern task-parallel runtime systems. Performance and power consumption were measured using benchmarks, and we show that the two primary causes of bottlenecks in modern runtime systems lie either in task management overheads or in how tasks are distributed across processors.
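
As an illustration of the first bottleneck class, here is a hypothetical micro-benchmark in the same spirit (the actual measurements use established benchmark suites such as the Barcelona OpenMP Tasking Suite; this sketch only estimates per-task management cost by timing empty tasks):

#include <stdio.h>
#include <omp.h>

#define NTASKS 1000000L

/* Spawn many empty tasks and time them: since the task bodies do no
 * work, the elapsed time is dominated by the runtime's task-management
 * overhead (creation, queuing, scheduling, joining). */
int main(void)
{
    double t0 = omp_get_wtime();
    #pragma omp parallel
    #pragma omp single
    {
        for (long i = 0; i < NTASKS; i++) {
            #pragma omp task
            { /* empty body */ }
        }
        #pragma omp taskwait
    }
    double t1 = omp_get_wtime();
    printf("approx. per-task overhead: %.0f ns\n",
           (t1 - t0) * 1e9 / NTASKS);
    return 0;
}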

Next, we consider quality-of-service improvements in task-parallel runtime systems. Striving to improve execution performance, current state-of-the-art runtime systems seldom take dynamic architectural features such as temperature into account when deciding how work should be distributed across processors, which can lead to overheating. We developed and evaluated two strategies for thermal awareness in task-parallel runtime systems. The first improves performance when the computer system is constrained by temperature, while the second strives to reduce temperature while meeting soft real-time objectives.
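
The abstract does not spell out the mechanism, so the following is only a loose sketch of the core idea behind thermal-aware distribution, with invented names (core_temp would be filled by a platform-specific sensor reader, e.g. the coretemp hwmon interface on Linux):

#include <stddef.h>

#define NCORES 48

static double core_temp[NCORES];   /* degrees Celsius, refreshed by a
                                      platform sensor hook (not shown) */

/* Steer the next task toward the coolest core, spreading heat so no
 * core throttles while its neighbours sit idle. */
static int pick_coolest_core(void)
{
    int best = 0;
    for (int c = 1; c < NCORES; c++)
        if (core_temp[c] < core_temp[best])
            best = c;              /* enqueue the next task here */
    return best;
}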

We end the thesis by focusing on performance. Here we introduce our original contribution, BLYSK -- a prototype OpenMP framework created exclusively for performance research.

We found that overheads in current runtime systems can be expensive and often lead to performance degradation. We introduce a novel way of preserving task graphs across application runs: task graphs are recorded, identified, and optimized the first time an OpenMP application is executed, and are then reused in subsequent executions, removing unnecessary overheads. Our proposed solution can nearly double performance compared with other state-of-the-art runtime systems.
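
BLYSK's internals are not detailed in this record; the following is only a schematic of the record-and-reuse idea, with invented data-structure names:

#include <stddef.h>

/* Run 1: tasks and their dependences are discovered, recorded and
 * optimized into a graph. Later runs: replay the stored graph in a
 * precomputed dependence-respecting order, skipping rediscovery and
 * dependence matching entirely. */
typedef struct task_node {
    void  (*body)(void *);      /* the task's work function */
    void   *args;
} task_node;

typedef struct {
    task_node *order;           /* topologically sorted after run 1 */
    size_t     count;
    int        recorded;        /* nonzero once the graph is reusable */
} task_graph;

void execute(task_graph *g)
{
    if (!g->recorded) {
        /* first execution: discover tasks, match dependences,
         * optimize the graph, then set g->recorded = 1 */
        return;
    }
    for (size_t i = 0; i < g->count; i++)   /* replay: no rediscovery */
        g->order[i].body(g->order[i].args);
}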

Performance can also be improved through heterogeneity. Today, manufacturers are placing processors with different capabilities on the same chip. Because the processors differ, so do their power-consumption characteristics and their performance. Heterogeneity adds another dimension to the multiprocessing problem: how should work be distributed across the heterogeneous processors? We evaluated the performance of existing, homogeneous scheduling algorithms and found them to be an ill match for heterogeneous systems. We proposed a novel scheduling algorithm that dynamically adjusts itself to the heterogeneous system in order to improve performance.
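
The proposed algorithm itself is not given in the abstract; as a stand-in, here is a generic sketch of one common heterogeneity-aware policy -- biasing work toward cores by their observed throughput -- with all names invented for illustration:

#include <stddef.h>

#define NCORES 8

/* Observed tasks/second per core, updated online as tasks retire;
 * fast "big" cores accumulate higher rates than slow ones. */
static double throughput[NCORES];

/* Give each core a share of the ready tasks proportional to its
 * measured speed; with no history yet, fall back to an even split. */
static size_t share_for_core(int core, size_t ready)
{
    double total = 0.0;
    for (int c = 0; c < NCORES; c++)
        total += throughput[c];
    if (total == 0.0)
        return ready / NCORES;
    return (size_t)((double)ready * throughput[core] / total);
}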

The thesis ends with a high-level synthesis approach to improving performance in task-parallel applications. Rather than limiting ourselves to off-the-shelf processors -- which often contain a large amount of unused logic -- our approach is to generate the processors ourselves, automatically. Our method allows us to generate application-specific hardware from OpenMP task-parallel source code. Evaluated on FPGAs, our Systems-on-Chip outperformed other soft cores, such as the NiosII processor, and were also comparable in performance to modern state-of-the-art processors such as the Xeon Phi and the AMD Opteron.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. p. 64
Series
TRITA-ICT ; 2015:13
Keywords
Task Parallel, OpenMP, Scheduling, OmpSs, multicore, manycore
National subject category
Communication Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-175539
ISBN: 978-91-7595-711-1
Public defence
2015-11-10, Sal A, KTH Kista, Electrum, Kistagången 16, Kista, 10:00 (English)
Opponent
Supervisors
Note

QC 20151016

Available from: 2015-10-16 Created: 2015-10-16 Last updated: 2015-10-16 Bibliographically approved

Open Access in DiVA

Full text is missing in DiVA

Other links

Publisher's full text
Scopus
