Towards Scalable Performance Analysis of MPI Parallel Applications
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). ORCID iD: 0000-0001-9693-6265
2015 (English). Licentiate thesis, comprising papers (Other academic)
Abstract [en]

A considerable fraction of scientific discovery nowadays relies on computer simulations. High Performance Computing (HPC) provides scientists with the means to simulate processes ranging from climate modeling to protein folding. However, achieving good application performance and making optimal use of HPC resources is a heroic task due to the complexity of parallel software. Therefore, performance tools and runtime systems that help users execute applications in the most efficient way are of utmost importance in the HPC landscape. In this thesis, we explore different techniques to tackle the challenges of collecting, storing, and using fine-grained performance data. First, we investigate the automatic use of real-time performance data to run applications in an optimal way. To that end, we present a prototype of an adaptive task-based runtime system that uses real-time performance data for task scheduling. This runtime system has a performance monitoring component that provides real-time access to the performance behavior of an application while it runs. The implementation of this monitoring component is presented and evaluated in this thesis. Second, we explore lossless compression approaches for MPI monitoring. One of the main problems that performance tools face is the huge amount of fine-grained data that an instrumented application can generate. Collecting fine-grained data from a program is the best method to uncover the root causes of performance bottlenecks; however, it is infeasible for extremely parallel applications or applications with long execution times. On the other hand, collecting coarse-grained data is scalable but sometimes not enough to discern the root cause of a performance problem. Thus, we propose a new method for performance monitoring of MPI programs using event flow graphs. Event flow graphs incur very low overhead in terms of execution time and storage size, and can be used to reconstruct fine-grained trace files of application events ordered in time.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015, pp. viii, 39
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2015:05
Keywords [en]
parallel computing, performance monitoring, performance tools, event flow graphs
National subject category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-165043
ISBN: 978-91-7595-518-6 (print)
OAI: oai:DiVA.org:kth-165043
DiVA, id: diva2:806809
Presentation
2015-05-20, The Visualization Studio, room 4451, Lindstedtsvägen 5, KTH, Stockholm, 10:00 (English)
Opponent
Supervisor
Note

QC 20150508

Available from: 2015-05-08. Created: 2015-04-21. Last updated: 2015-05-08. Bibliographically approved.
List of papers
1. Scaling Dalton, a molecular electronic structure program
2011 (English). In: Seventh International Conference on e-Science, e-Science 2011, 5-8 December 2011, Stockholm, Sweden, IEEE conference proceedings, 2011, pp. 256-262. Conference paper, Published paper (Refereed)
Abstract [en]

Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large worldwide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps prevent the master/worker design of Dalton from becoming a performance bottleneck at larger process counts and that increases the parallel efficiency.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2011
Keywords
Chemistry, Image color analysis, Libraries, Measurement, Optimization, Quantum mechanics, Wave functions
National subject category
Identifiers
urn:nbn:se:kth:diva-50421 (URN)
10.1109/eScience.2011.43 (DOI)
2-s2.0-84856350618 (Scopus ID)
978-1-4577-2163-2 (ISBN)
Conference
Seventh International Conference on e-Science, e-Science 2011, 5-8 December 2011, Stockholm, Sweden
Funder
Swedish e-Science Research Center, OpCoReS
EU, FP7, Seventh Framework Programme, INFSO RI-261523
Swedish e-Science Research Center
Note
Copyright 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

QC 20120110

Available from: 2012-01-10. Created: 2011-12-05. Last updated: 2018-01-12. Bibliographically approved.
2. Design and Implementation of a Runtime System for Parallel Numerical Simulations on Large-Scale Clusters
2011 (English). In: Proceedings Of The International Conference On Computational Science (ICCS) / [ed] Sato, M; Matsuoka, S; Sloot, PMA; VanAlbada, GD; Dongarra, J, Elsevier, 2011, Vol. 4, pp. 2105-2114. Conference paper, Published paper (Refereed)
Abstract [en]

The execution of scientific codes will introduce a number of new challenges and intensify some old ones on new high-performance computing infrastructures. Petascale computers are large systems with complex designs using heterogeneous technologies that make the programming and porting of applications difficult, particularly if one wants to use the maximum peak performance of the system. In this paper we present the design and first prototype of a runtime system for parallel numerical simulations on large-scale systems. The proposed runtime system addresses the challenges of performance, scalability, and programmability of large-scale HPC systems. We also present initial results of our prototype implementation using a molecular dynamics application kernel.

Place, publisher, year, edition, pages
Elsevier, 2011
Series
Procedia Computer Science, ISSN 1877-0509 ; 4
Keywords
Hybrid computational methods, Parallel computing, Advanced computing architectures, Runtime systems
National subject category
Identifiers
urn:nbn:se:kth:diva-38886 (URN)
10.1016/j.procs.2011.04.230 (DOI)
000299165200229 ()
2-s2.0-79958278307 (Scopus ID)
Conference
11th International Conference on Computational Science, ICCS 2011. Singapore. 1 June 2011 - 3 June 2011
Funder
Swedish e-Science Research Center
Note

QC 20120110

Available from: 2011-09-02. Created: 2011-09-02. Last updated: 2018-01-12. Bibliographically approved.
3. Online Performance Data Introspection with IPM
2014 (English). In: Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), IEEE Computer Society, 2014, pp. 728-734. Conference paper, Published paper (Refereed)
Abstract [en]

Exascale systems will be heterogeneous architectures with multiple levels of concurrency and energy constraints. In such a complex scenario, performance monitoring and runtime systems play a major role in obtaining good application performance and scalability. Furthermore, online access to performance data becomes a necessity for deciding how to schedule resources and orchestrate computational elements: processes, threads, tasks, etc. We present the Performance Introspection API, an extension of the IPM tool that provides online access to performance data from an application while it runs. We describe its design and implementation and show its overhead on several test benchmarks. We also present a real test case using the Performance Introspection API in conjunction with processor frequency scaling to reduce power consumption.

Place, publisher, year, edition, pages
IEEE Computer Society, 2014
National subject category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-136212 (URN)
10.1109/HPCC.and.EUC.2013.107 (DOI)
2-s2.0-84903964607 (Scopus ID)
978-076955088-6 (ISBN)
Conference
The 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013). Zhangjiajie, China, November 13-15, 2013.
Note

QC 20140602

Available from: 2013-12-04. Created: 2013-12-04. Last updated: 2015-05-08. Bibliographically approved.
4. MPI Trace Compression Using Event Flow Graphs
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Understanding how parallel applications behave is crucial for using high-performance computing (HPC) resources efficiently. However, the task of performance analysis is becoming increasingly difficult due to the growing complexity of scientific codes and the size of machines. Even though many tools have been developed over the past years to help in this task, current approaches either only offer an overview of the application discarding temporal information, or they generate huge trace files that are often difficult to handle.

In this paper we propose the use of event flow graphs for monitoring MPI applications, a new and different approach that balances the low overhead of profiling tools with the abundance of information available from tracers. Event flow graphs are captured with very low overhead, require orders of magnitude less storage than standard trace files, and can still recover the full sequence of events in the application. We test this new approach with the NERSC-8/Trinity Benchmark suite and achieve compression ratios up to 119x.
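
As a rough illustration of how a graph can stand in for a full trace, the sketch below stores, for each event, the run-length-encoded order in which its successors were taken, and then replays that order to recover the original sequence. This is a simplified assumption made for illustration, not the encoding used in the paper: the function names `build_event_flow_graph` and `reconstruct_trace` are hypothetical, events are plain strings from a single process, and timestamps and message parameters are ignored.

```python
from collections import defaultdict


def build_event_flow_graph(trace):
    """Hypothetical helper: compress a list of event names into a graph.

    Returns (start_event, graph), where graph[event] is a list of
    [successor, count] runs in the order the transitions were taken.
    """
    graph = defaultdict(list)
    for current, nxt in zip(trace, trace[1:]):
        runs = graph[current]
        if runs and runs[-1][0] == nxt:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([nxt, 1])     # start a new run
    start = trace[0] if trace else None
    return start, dict(graph)


def reconstruct_trace(start, graph):
    """Hypothetical helper: replay the successor runs to rebuild the trace."""
    if start is None:
        return []
    # Work on a copy so the caller's graph is left untouched.
    pending = {node: [list(run) for run in runs] for node, runs in graph.items()}
    cursor = {node: 0 for node in pending}
    trace = [start]
    node = start
    # Stop when the current event has no successor run left to consume.
    while node in pending and cursor[node] < len(pending[node]):
        run = pending[node][cursor[node]]
        successor = run[0]
        run[1] -= 1
        if run[1] == 0:
            cursor[node] += 1         # this run is used up, move to the next
        trace.append(successor)
        node = successor
    return trace


# Example: an iterative send/receive loop between MPI_Init and MPI_Finalize.
original = ["MPI_Init"] + ["MPI_Send", "MPI_Recv"] * 1000 + ["MPI_Finalize"]
start, graph = build_event_flow_graph(original)
assert reconstruct_trace(start, graph) == original
```

For this regular loop the sketch keeps four successor runs instead of 2002 trace entries; irregular event sequences compress less.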

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8632
Keywords
MPI event flow graphs, trace compression, trace reconstruction, performance monitoring
National subject category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-165042 (URN)
2-s2.0-84958532986 (Scopus ID)
Conference
Euro-Par 2014 Parallel Processing
Note

QC 20150423. QC 20160314

Available from: 2015-04-21. Created: 2015-04-21. Last updated: 2017-04-28. Bibliographically approved.

Open Access in DiVA

Licentiate Thesis (1546 kB), 377 downloads
File information
File: FULLTEXT01.pdf. File size: 1546 kB. Checksum (SHA-512):
ca03734ba8242a84b2108fb439084fc9d0fe6507cfd06d148aa83909ae4d1634852e69781e9661bb337464a56b575c22e9d2549d504458d591dfe79925380bb5
Type: fulltext. Mimetype: application/pdf

Author

Aguilar, Xavier
