Pandia: Comprehensive contention-sensitive thread placement
Varisteas, Georgios (KTH; University of Luxembourg, Luxembourg). ORCID iD: 0000-0002-7860-6593
2017 (English). In: Proceedings of the 12th European Conference on Computer Systems (EuroSys 2017), Association for Computing Machinery, 2017, pp. 254-269. Conference paper (refereed).
Abstract [en]

Pandia is a system for modeling the performance of in-memory parallel workloads. It generates a description of a workload from a series of profiling runs, and combines this with a description of the machine's hardware to model the workload's performance over different thread counts and different placements of those threads. The approach is "comprehensive" in that it accounts for contention at multiple resources such as processor functional units and memory channels. The points of contention for a workload can shift between resources as the degree of parallelism and thread placement change. Pandia accounts for these shifts and provides a close correspondence between predicted performance and actual performance. Testing a set of 22 benchmarks on 2-socket Intel machines fitted with chips ranging from Sandy Bridge to Haswell, we see median differences of 0% to 1.05% between the fastest predicted placement and the fastest measured placement, and median errors of 4% to 8% across all placements. Pandia can be used to optimize the performance of a given workload, for instance by identifying whether multiple processor sockets should be used and whether the workload benefits from using multiple threads per core. In addition, Pandia can be used to identify opportunities for reducing resource consumption where additional resources are not matched by additional performance, for instance by limiting a workload to a small number of cores when its scaling is poor.
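The abstract describes how Pandia's predictions can drive placement decisions (how many sockets to use, whether to use multiple threads per core). The sketch below is a minimal, hypothetical illustration of that decision loop: it enumerates candidate thread counts and placements and picks the one with the lowest predicted runtime. The machine parameters and the predicted_runtime stand-in are assumptions for illustration only; Pandia's real model is fitted from profiling runs and accounts for contention at multiple hardware resources.

```python
# Illustrative sketch only: a toy placement search in the spirit of the
# optimization use-case in the abstract. The model below is a hypothetical
# stand-in, not Pandia's actual contention model.
from itertools import product

# Assumed machine description: 2 sockets, 8 cores per socket, 2 SMT threads per core.
SOCKETS = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2

def predicted_runtime(threads, sockets_used, threads_per_core):
    """Hypothetical performance model.

    A real model (as the abstract describes) would be built from profiling
    runs and a hardware description, accounting for contention at functional
    units, memory channels, etc. Here we simply fake diminishing returns plus
    penalties for SMT sharing and cross-socket traffic.
    """
    base = 100.0 / threads                          # idealized speedup
    smt_penalty = 1.0 + 0.3 * (threads_per_core - 1)
    socket_penalty = 1.0 + 0.1 * (sockets_used - 1)
    return base * smt_penalty * socket_penalty

def best_placement():
    """Enumerate candidate placements and return the fastest predicted one."""
    best = None
    for sockets_used, tpc in product(range(1, SOCKETS + 1),
                                     range(1, THREADS_PER_CORE + 1)):
        threads = sockets_used * CORES_PER_SOCKET * tpc
        t = predicted_runtime(threads, sockets_used, tpc)
        if best is None or t < best[0]:
            best = (t, threads, sockets_used, tpc)
    return best

if __name__ == "__main__":
    t, threads, sockets_used, tpc = best_placement()
    print(f"predicted best: {threads} threads on {sockets_used} socket(s), "
          f"{tpc} thread(s)/core, predicted runtime ~{t:.2f}")
```

In this toy setting the search simply trades ideal speedup against the assumed SMT and cross-socket penalties; the paper's contribution is in producing an accurate predicted_runtime for real workloads and machines.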

Place, publisher, year, edition, pages
Association for Computing Machinery, 2017, pp. 254-269.
Keyword [en]
Computer science, Computers, Degree of parallelism, Functional units, Multiple processors, Multiple resources, Multiple threads, Parallel workloads, Predicted performance, Resource consumption, Parallel processing systems
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-216522
DOI: 10.1145/3064176.3064177
Scopus ID: 2-s2.0-85019261110
ISBN: 9781450349383
OAI: oai:DiVA.org:kth-216522
DiVA: diva2:1161834
Conference
12th European Conference on Computer Systems, EuroSys 2017, 23 April 2017 through 26 April 2017
Note

QC 20171201

Available from: 2017-12-01. Created: 2017-12-01. Last updated: 2017-12-01. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text · Scopus

Authority records BETA

Varisteas, Georgios
