Task Scheduling on Manycore Processors with Home Caches
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0003-3958-4659
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-9637-2065
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
2013 (English). In: Euro-Par 2012 Workshops, 2013. Conference paper, Published paper (Refereed)
Abstract [en]

Modern manycore processors feature a highly scalable and software-configurable cache hierarchy. For performance, manycore programmers will not only have to efficiently utilize the large number of cores but also understand and configure the cache hierarchy to suit the application. Relief from this manycore programming nightmare can be provided by task-based programming models where programmers parallelize using tasks and an architecture-specific runtime system maps tasks to cores and in addition configures the cache hierarchy. In this paper, we focus on the cache hierarchy of the Tilera TILEPro64 processor which features a software-configurable coherence waypoint called the home cache. We first show the runtime system performance bottleneck of scheduling tasks oblivious to the nature of home caches. We then demonstrate a technique in which the runtime system controls the assignment of home caches to memory blocks and schedules tasks to minimize home cache access penalties. Test results of our technique have shown a significant execution time performance improvement on selected benchmarks leading to the conclusion that by taking processor architecture features into account, task-based programming models can indeed provide continued performance and allow programmers to smoothly transit from the multicore to manycore era.

Place, publisher, year, edition, pages
2013.
Series
LNCS 7640
Keyword [en]
Manycore processor, task scheduling, architecture-aware, runtime system
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-107418
DOI: 10.1007/978-3-642-36949-0_39
ISI: 000341240400039
Scopus ID: 2-s2.0-84874439716
OAI: oai:DiVA.org:kth-107418
DiVA: diva2:583641
Conference
Parallel Processing Workshops, Euro-Par 2012: BDMC 2012, CGWS 2012, HeteroPar 2012, HiBB 2012, OMHI 2012, Paraphrase 2012, PROPER 2012, Resilience 2012, UCHPC 2012, VHPC 2012; Rhodes Island; Greece; 27 August 2012 through 31 August 2012
Projects
ENCORE EU project
Funder
Swedish e‐Science Research Center
ICT - The Next Generation
Note

QC 20130108

Available from: 2013-01-08. Created: 2012-12-11. Last updated: 2014-10-03. Bibliographically approved
In thesis
1. Locality-aware Scheduling and Characterization of Task-based Programs
2014 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Modern computer architectures expose an increasing number of parallel features supported by complex memory access and communication structures. Currently used task scheduling techniques perform poorly since they focus solely on balancing computation load across parallel features and remain oblivious to locality properties of support structures. We contribute with locality-aware task scheduling mechanisms which improve execution time performance on average by 44% and 11% respectively on two locality-sensitive architectures: the Tilera TILEPro64 manycore processor and an AMD Opteron 6172 processor-based four-socket SMP machine.

Programmers need task performance metrics such as amount of task parallelism and task memory hierarchy utilization to analyze performance of task-based programs. However, existing tools indicate performance mainly using thread-centric metrics. Programmers therefore resort to using low-level and tedious thread-centric analysis methods to infer task performance. We contribute with tools and methods to characterize task-based OpenMP programs at the level of tasks, with which programmers can quickly understand important properties of the task graph such as critical path and parallelism as well as properties of individual tasks such as instruction count and memory behavior.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. vi, 29 p.
Series
TRITA-ICT-ECS AVH, ISSN 1653-6363 ; 14:01
Keyword
Locality-aware, Task scheduling, OpenMP
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-141124 (URN)
978-91-7501-994-9 (ISBN)
Presentation
2014-03-05, Sal/Hall E, Forum, KTH-ICT, Isafjordsgatan 39, Kista, 10:00 (English)
Opponent
Supervisors
Note

QC 20140212

Available from: 2014-02-12. Created: 2014-02-07. Last updated: 2014-02-12. Bibliographically approved

Authors
Muddukrishna, Ananya; Podobas, Artur; Brorsson, Mats; Vlassov, Vladimir